This module creates a nodeset data structure intended to be input to the schedmd-slurm-gcp-v6-partition module.
Nodesets allow adding heterogeneous node types to a partition, and hence running jobs that mix multiple node characteristics. See the heterogeneous jobs section of the SchedMD documentation for more information.
To specify nodes from a specific nodeset in a partition, the `--nodelist`
(or `-w`) flag can be used, for example:

```bash
srun -N 3 -p compute --nodelist cluster-compute-group-[0-2] hostname
```

Where the 3 nodes will be selected from the nodes `cluster-compute-group-[0-2]`
in the compute partition.
Additionally, depending on how the nodes differ, a constraint can be added via
the `--constraint` (or `-C`) flag, or other flags such as `--mincpus` can be
used to specify nodes with the desired characteristics.
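As a minimal sketch of such a selection (the `gpu` feature name below is hypothetical and assumes the nodeset's nodes advertise it through their Slurm node configuration, e.g. via `node_conf`):

```bash
# Select nodes advertising the (hypothetical) "gpu" feature
srun -N 2 -p compute --constraint=gpu hostname

# Select nodes providing at least 30 CPUs per node
srun -N 2 -p compute --mincpus=30 hostname
```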
The following code snippet creates a partition module using the nodeset module
as input with:

- a max node count of 200
- VM machine type of c2-standard-30
- partition name of "compute"
- default nodeset name of "ghpc"
- connected to the network module via use
- nodes mounted to homefs via use
```yaml
- id: nodeset
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
  use:
  - network
  settings:
    node_count_dynamic_max: 200
    machine_type: c2-standard-30

- id: compute_partition
  source: community/modules/compute/schedmd-slurm-gcp-v6-partition
  use:
  - homefs
  - nodeset
  settings:
    partition_name: compute
```

For more information on creating valid custom images for the node group VM instances or for custom instance templates, see our vm-images.md documentation page.
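As an illustrative sketch only (the image family and project below are placeholders, not a published image), a custom image can be selected through the `instance_image` setting, with `instance_image_custom` acknowledging that an image outside the supported families is being used:

```yaml
- id: custom_image_nodeset
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
  use:
  - network
  settings:
    node_count_dynamic_max: 20
    instance_image:
      family: my-slurm-image-family   # placeholder image family
      project: my-image-project       # placeholder project hosting the image
    instance_image_custom: true       # skip compatibility checks for custom images
```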
More information on GPU support in Slurm on GCP and other Cluster Toolkit modules can be found at docs/gpu-support.md
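As a hedged sketch of GPU usage (the machine type and accelerator type here are examples chosen for illustration, not recommendations), accelerators can be attached to nodeset VMs either implicitly by choosing an accelerator-optimized machine type or explicitly through `guest_accelerator`:

```yaml
- id: gpu_nodeset
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
  use:
  - network
  settings:
    node_count_dynamic_max: 4
    machine_type: n1-standard-8     # N1 VMs require GPUs to be attached explicitly
    guest_accelerator:
    - type: nvidia-tesla-t4         # example GPU type; availability varies by zone
      count: 1
```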
The Slurm on GCP nodeset module allows you to specify additional zones in which to create VMs through bulk creation. This is valuable when configuring partitions with popular VM families and you desire access to more compute resources across zones.
WARNING: Lenient zone policies can lead to additional egress costs when moving large amounts of data between zones in the same region. For example, traffic between VMs and traffic from VMs to shared filesystems such as Filestore. For more information on egress fees, see the Network Pricing Google Cloud documentation.
To avoid egress charges, ensure your compute nodes are created in a single zone by setting var.zone and leaving var.zones to its default value of the empty list.
NOTE: If a new zone is added to the region while the cluster is active, nodes in the partition may be created in that zone. In this case, the partition may need to be redeployed to ensure the newly added zone is denied.
In the zonal example below, the nodeset's zone implicitly defaults to the
deployment variable vars.zone:
```yaml
vars:
  zone: us-central1-f

- id: zonal-nodeset
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
```

In the example below, we enable creation in additional zones:
```yaml
vars:
  zone: us-central1-f

- id: multi-zonal-nodeset
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
  settings:
    zones:
    - us-central1-a
    - us-central1-b
```

The Cluster Toolkit team maintains the wrapper around the slurm-on-gcp terraform modules. For support with the underlying modules, see the instructions in the slurm-gcp README.
| Name | Version |
|---|---|
| terraform | >= 1.4 |
| google | >= 5.11 |
| Name | Version |
|---|---|
| google | >= 5.11 |
| terraform | n/a |
| Name | Source | Version |
|---|---|---|
| gpu | ../../../../modules/internal/gpu-definition | n/a |
| instance_validation | ../../../../modules/internal/instance_validations | n/a |
| Name | Type |
|---|---|
| terraform_data.machine_type_zone_validation | resource |
| google_compute_machine_types.machine_types_by_zone | data source |
| google_compute_reservation.reservation | data source |
| google_compute_zones.available | data source |
| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| accelerator_topology | Specifies the shape of the Accelerator (GPU/TPU) slice. | string | null | no |
| access_config | Access configurations, i.e. IPs via which the VM instance can be accessed via the Internet. | list(object({ | [] | no |
| additional_disks | Configurations of additional disks to be included on the partition nodes. | list(object({ | [] | no |
| additional_networks | Additional network interface details for GCE, if any. | list(object({ | [] | no |
| advanced_machine_features | See https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance_template#nested_advanced_machine_features | object({ | { | no |
| allow_automatic_updates | If false, disables automatic system package updates on the created instances. This feature is only available on supported images (or images derived from them). For more details, see https://cloud.google.com/compute/docs/instances/create-hpc-vm#disable_automatic_updates | bool | true | no |
| bandwidth_tier | Configures the network interface card and the maximum egress bandwidth for VMs. - Setting platform_default respects the Google Cloud Platform API default values for networking. - Setting virtio_enabled explicitly selects the VirtioNet network adapter. - Setting gvnic_enabled selects the gVNIC network adapter (without Tier 1 high bandwidth). - Setting tier_1_enabled selects both the gVNIC adapter and Tier 1 high bandwidth networking. - Note: both gVNIC and Tier 1 networking require a VM image with gVNIC support as well as specific VM families and shapes. - See official docs for more details. | string | "platform_default" | no |
| can_ip_forward | Enable IP forwarding, for NAT instances for example. | bool | false | no |
| disable_public_ips | DEPRECATED: Use enable_public_ips instead. | bool | null | no |
| disk_auto_delete | Whether or not the boot disk should be auto-deleted. | bool | true | no |
| disk_labels | Labels specific to the boot disk. These will be merged with var.labels. | map(string) | {} | no |
| disk_resource_manager_tags | (Optional) A set of key/value resource manager tag pairs to bind to the instance disks. Keys must be in the format tagKeys/{tag_key_id}, and values are in the format tagValues/456. | map(string) | {} | no |
| disk_size_gb | Size of boot disk to create for the partition compute nodes. | number | 50 | no |
| disk_type | Boot disk type, can be either hyperdisk-balanced, pd-ssd, pd-standard, pd-balanced, or pd-extreme. | string | "pd-standard" | no |
| dws_flex | If set and enabled = true, will utilize the DWS Flex Start to provision nodes. See: https://cloud.google.com/blog/products/compute/introducing-dynamic-workload-scheduler Options: - enable: Enable DWS Flex Start - max_run_duration: Maximum duration in seconds for the job to run, should not exceed 604,800 (one week). - use_job_duration: Use the job duration to determine the max_run_duration, if job duration is not set, max_run_duration will be used. - use_bulk_insert: Uses the legacy implementation of DWS Flex Start with Bulk Insert for non-accelerator instances Limitations: - CAN NOT be used with reservations; - CAN NOT be used with placement groups; - If use_job_duration is enabled nodeset can be used in "exclusive" partitions only | object({ | { | no |
| enable_confidential_vm | Enable the Confidential VM configuration. Note: the instance image must support this option. | bool | false | no |
| enable_maintenance_reservation | Enables slurm reservation for scheduled maintenance. | bool | false | no |
| enable_opportunistic_maintenance | On receiving a maintenance notification, maintenance will be performed as soon as the nodes become idle. | bool | false | no |
| enable_oslogin | Enables Google Cloud os-login for user login and authentication for VMs. See https://cloud.google.com/compute/docs/oslogin | bool | true | no |
| enable_placement | Use placement policy for VMs in this nodeset. See: https://cloud.google.com/compute/docs/instances/placement-policies-overview To set max_distance of the used policy, use the placement_max_distance variable. Enabled by default; reasons for users to disable it: - If a non-dense reservation is used, the user can avoid the extra cost of creating placement policies; - If the user wants to avoid "all or nothing" VM provisioning behaviour; - If the user wants to intentionally have "spread" VMs (e.g. for reliability reasons) | bool | true | no |
| enable_public_ips | If set to true, the node group VMs will have a random public IP assigned to them. Ignored if access_config is set. | bool | false | no |
| enable_shielded_vm | Enable the Shielded VM configuration. Note: the instance image must support this option. | bool | false | no |
| enable_smt | DEPRECATED: Use advanced_machine_features.threads_per_core instead. | bool | null | no |
| enable_spot_vm | Enable the partition to use spot VMs (https://cloud.google.com/spot-vms). | bool | false | no |
| future_reservation | If set, will make use of the future reservation for the nodeset. Input can be either the future reservation name or its selfLink in the format 'projects/PROJECT_ID/zones/ZONE/futureReservations/FUTURE_RESERVATION_NAME'. See https://cloud.google.com/compute/docs/instances/future-reservations-overview | string | "" | no |
| guest_accelerator | List of the type and count of accelerator cards attached to the instance. | list(object({ | [] | no |
| instance_image | Defines the image that will be used in the Slurm node group VM instances. Expected Fields: name: The name of the image. Mutually exclusive with family. family: The image family to use. Mutually exclusive with name. project: The project where the image is hosted. For more information on creating custom images that comply with Slurm on GCP see the "Slurm on GCP Custom Images" section in docs/vm-images.md. | map(string) | { | no |
| instance_image_custom | A flag that designates that the user is aware that they are requesting to use a custom and potentially incompatible image for this Slurm on GCP module. If the field is set to false, only the compatible families and project names will be accepted. The deployment will fail with any other image family or name. If set to true, no checks will be done. See: https://goo.gle/hpc-slurm-images | bool | false | no |
| instance_properties | Override the instance properties. Used to test features not supported by Slurm GCP, recommended for advanced usage only. See https://cloud.google.com/compute/docs/reference/rest/v1/regionInstances/bulkInsert If any sub-field (e.g. scheduling) is set, it will override the values computed by SlurmGCP and ignore the values of the provided vars. | any | null | no |
| instance_template | DEPRECATED: Instance template cannot be specified for compute nodes. | string | null | no |
| labels | Labels to add to partition compute instances. Key-value pairs. | map(string) | {} | no |
| machine_type | Compute Platform machine type to use for this partition's compute nodes. | string | "c2-standard-60" | no |
| maintenance_interval | Sets the maintenance interval for instances in this nodeset. See https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#maintenance_interval. | string | null | no |
| metadata | Metadata, provided as a map. | map(string) | {} | no |
| min_cpu_platform | The name of the minimum CPU platform that you want the instance to use. | string | null | no |
| name | Name of the nodeset. Automatically populated by the module id if not set. If setting manually, ensure a unique value across all nodesets. | string | n/a | yes |
| network_storage | An array of network attached storage mounts to be configured on nodes. | list(object({ | [] | no |
| node_conf | Map of Slurm node line configuration. | map(any) | {} | no |
| node_count_dynamic_max | Maximum number of auto-scaling nodes allowed in this partition. | number | 10 | no |
| node_count_static | Number of nodes to be statically created. | number | 0 | no |
| on_host_maintenance | Instance availability Policy. Note: Placement groups are not supported when on_host_maintenance is set to "MIGRATE" and will be deactivated regardless of the value of enable_placement. To support enable_placement, ensure on_host_maintenance is set to "TERMINATE". | string | "TERMINATE" | no |
| placement_max_distance | Maximum distance between nodes in the placement group. Requires enable_placement to be true. Values must be supported by the chosen machine type. | number | null | no |
| preemptible | Should use preemptibles to burst. | bool | false | no |
| project_id | Project ID to create resources in. | string | n/a | yes |
| region | The default region for Cloud resources. | string | n/a | yes |
| reservation_name | Name of the reservation to use for VM resources, should be in one of the following formats: - projects/PROJECT_ID/reservations/RESERVATION_NAME[/reservationBlocks/BLOCK_ID] - RESERVATION_NAME[/reservationBlocks/BLOCK_ID] Must be a "SPECIFIC" reservation. Set to an empty string if using no reservation or automatically-consumed reservations. | string | "" | no |
| resource_manager_tags | (Optional) A set of key/value resource manager tag pairs to bind to the instances. Keys must be in the format tagKeys/{tag_key_id}, and values are in the format tagValues/456. | map(string) | {} | no |
| service_account | DEPRECATED: Use service_account_email and service_account_scopes instead. | object({ | null | no |
| service_account_email | Service account e-mail address to attach to the compute instances. | string | null | no |
| service_account_scopes | Scopes to attach to the compute instances. | set(string) | [ | no |
| shielded_instance_config | Shielded VM configuration for the instance. Note: not used unless enable_shielded_vm is 'true'. - enable_integrity_monitoring: Compare the most recent boot measurements to the integrity policy baseline and return a pair of pass/fail results depending on whether they match or not. - enable_secure_boot: Verify the digital signature of all boot components, and halt the boot process if signature verification fails. - enable_vtpm: Use a virtualized trusted platform module, which is a specialized computer chip you can use to encrypt objects like keys and certificates. | object({ | { | no |
| spot_instance_config | Configuration for spot VMs. | object({ | null | no |
| startup_script | Startup script used by VMs in this nodeset. | string | "# no-op" | no |
| subnetwork_self_link | Subnet to deploy to. | string | n/a | yes |
| tags | Network tag list. | list(string) | [] | no |
| zone | Zone in which to create compute VMs. Additional zones in the same region can be specified in var.zones. | string | n/a | yes |
| zone_target_shape | Strategy for distributing VMs across zones in a region. ANY: GCE picks zones for creating VM instances to fulfill the requested number of VMs within present resource constraints and to maximize utilization of unused zonal reservations. ANY_SINGLE_ZONE (default): GCE always selects a single zone for all the VMs, optimizing for resource quotas, available reservations and general capacity. BALANCED: GCE prioritizes acquisition of resources, scheduling VMs in zones where resources are available while distributing VMs as evenly as possible across allowed zones to minimize the impact of zonal failure. | string | "ANY_SINGLE_ZONE" | no |
| zones | Additional zones in which to allow creation of partition nodes. Google Cloud will find a zone based on availability, quota, and reservations. Should not be set if a SPECIFIC reservation is used. | set(string) | [] | no |
| Name | Description |
|---|---|
| nodeset | Details of the nodeset. Typically used as input to schedmd-slurm-gcp-v6-partition. |