This module creates a nodeset data structure intended to be input to the schedmd-slurm-gcp-v6-partition module.
Nodesets allow adding heterogeneous node types to a partition, and hence running jobs that mix multiple node characteristics. See the heterogeneous jobs section of the SchedMD documentation for more information.
To specify nodes from a specific nodeset in a partition, the `--nodelist`
(or `-w`) flag can be used, for example:

```bash
srun -N 3 -p compute --nodelist cluster-compute-group-[0-2] hostname
```

Where the 3 nodes will be selected from the nodes `cluster-compute-group-[0-2]`
in the compute partition.
Additionally, depending on how the nodes differ, a constraint can be added via
the `--constraint` (or `-C`) flag, or other flags such as `--mincpus` can be
used to specify nodes with the desired characteristics.
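As a minimal sketch of such a selection (the `gpu` feature name below is hypothetical and assumes the nodeset's nodes advertise it through their Slurm node configuration, e.g. via `node_conf`):

```bash
# Select nodes advertising the (hypothetical) "gpu" feature
srun -N 2 -p compute --constraint=gpu hostname

# Select nodes providing at least 30 CPUs per node
srun -N 2 -p compute --mincpus=30 hostname
```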
The following code snippet creates a partition module using the nodeset module
as input with:

- a max node count of 200
- VM machine type of c2-standard-30
- partition name of "compute"
- default nodeset name of "ghpc"
- connected to the network module via use
- nodes mounted to homefs via use
```yaml
- id: nodeset
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
  use:
  - network
  settings:
    node_count_dynamic_max: 200
    machine_type: c2-standard-30

- id: compute_partition
  source: community/modules/compute/schedmd-slurm-gcp-v6-partition
  use:
  - homefs
  - nodeset
  settings:
    partition_name: compute
```

For more information on creating valid custom images for the node group VM instances or for custom instance templates, see our vm-images.md documentation page.
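As an illustrative sketch only (the image family and project below are placeholders, not a published image), a custom image can be selected through the `instance_image` setting, with `instance_image_custom` acknowledging that an image outside the supported families is being used:

```yaml
- id: custom_image_nodeset
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
  use:
  - network
  settings:
    node_count_dynamic_max: 20
    instance_image:
      family: my-slurm-image-family   # placeholder image family
      project: my-image-project       # placeholder project hosting the image
    instance_image_custom: true       # skip compatibility checks for custom images
```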
More information on GPU support in Slurm on GCP and other Cluster Toolkit modules can be found at docs/gpu-support.md
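As a hedged sketch of GPU usage (the machine type and accelerator type here are examples chosen for illustration, not recommendations), accelerators can be attached to nodeset VMs either implicitly by choosing an accelerator-optimized machine type or explicitly through `guest_accelerator`:

```yaml
- id: gpu_nodeset
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
  use:
  - network
  settings:
    node_count_dynamic_max: 4
    machine_type: n1-standard-8     # N1 VMs require GPUs to be attached explicitly
    guest_accelerator:
    - type: nvidia-tesla-t4         # example GPU type; availability varies by zone
      count: 1
```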
The Slurm on GCP nodeset module allows you to specify additional zones in which to create VMs through bulk creation. This is valuable when configuring partitions with popular VM families and you desire access to more compute resources across zones.
WARNING: Lenient zone policies can lead to additional egress costs when moving large amounts of data between zones in the same region. For example, traffic between VMs and traffic from VMs to shared filesystems such as Filestore. For more information on egress fees, see the Network Pricing Google Cloud documentation.
To avoid egress charges, ensure your compute nodes are created in a single zone by setting var.zone and leaving var.zones to its default value of the empty list.
NOTE: If a new zone is added to the region while the cluster is active, nodes in the partition may be created in that zone. In this case, the partition may need to be redeployed to ensure the newly added zone is denied.
In the zonal example below, the nodeset's zone implicitly defaults to the
deployment variable vars.zone:
```yaml
vars:
  zone: us-central1-f

- id: zonal-nodeset
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
```

In the example below, we enable creation in additional zones:
```yaml
vars:
  zone: us-central1-f

- id: multi-zonal-nodeset
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
  settings:
    zones:
    - us-central1-a
    - us-central1-b
```

The Cluster Toolkit team maintains the wrapper around the slurm-on-gcp terraform modules. For support with the underlying modules, see the instructions in the slurm-gcp README.
| Name | Version |
|---|---|
| terraform | >= 1.4 |
| google | >= 5.11 |
| Name | Version |
|---|---|
| google | >= 5.11 |
| terraform | n/a |
| Name | Source | Version |
|---|---|---|
| gpu | ../../../../modules/internal/gpu-definition | n/a |
| instance_validation | ../../../../modules/internal/instance_validations | n/a |
| Name | Type |
|---|---|
| terraform_data.machine_type_zone_validation | resource |
| google_compute_machine_types.machine_types_by_zone | data source |
| google_compute_reservation.reservation | data source |
| google_compute_zones.available | data source |
| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| accelerator_topology | Specifies the shape of the Accelerator (GPU/TPU) slice. | string | null | no |
| access_config | Access configurations, i.e. IPs via which the VM instance can be accessed via the Internet. | list(object({ | [] | no |
| additional_disks | Configurations of additional disks to be included on the partition nodes. | list(object({ | [] | no |
| additional_networks | Additional network interface details for GCE, if any. | list(object({ | [] | no |
| advanced_machine_features | See https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance_template#nested_advanced_machine_features | object({ | { | no |
| allow_automatic_updates | If false, disables automatic system package updates on the created instances. This feature is only available on supported images (or images derived from them). For more details, see https://cloud.google.com/compute/docs/instances/create-hpc-vm#disable_automatic_updates | bool | true | no |
| bandwidth_tier | Configures the network interface card and the maximum egress bandwidth for VMs. - Setting platform_default respects the Google Cloud Platform API default values for networking. - Setting virtio_enabled explicitly selects the VirtioNet network adapter. - Setting gvnic_enabled selects the gVNIC network adapter (without Tier 1 high bandwidth). - Setting tier_1_enabled selects both the gVNIC adapter and Tier 1 high bandwidth networking. - Note: both gVNIC and Tier 1 networking require a VM image with gVNIC support as well as specific VM families and shapes. - See official docs for more details. | string | "platform_default" | no |
| can_ip_forward | Enable IP forwarding, for NAT instances for example. | bool | false | no |
| disable_public_ips | DEPRECATED: Use enable_public_ips instead. | bool | null | no |
| disk_auto_delete | Whether or not the boot disk should be auto-deleted. | bool | true | no |
| disk_labels | Labels specific to the boot disk. These will be merged with var.labels. | map(string) | {} | no |
| disk_resource_manager_tags | (Optional) A set of key/value resource manager tag pairs to bind to the instance disks. Keys must be in the format tagKeys/{tag_key_id}, and values are in the format tagValues/456. | map(string) | {} | no |
| disk_size_gb | Size of boot disk to create for the partition compute nodes. | number | 50 | no |
| disk_type | Boot disk type, can be either hyperdisk-balanced, pd-ssd, pd-standard, pd-balanced, or pd-extreme. | string | "pd-standard" | no |
| dws_flex | If set and enabled = true, will utilize the DWS Flex Start to provision nodes. See: https://cloud.google.com/blog/products/compute/introducing-dynamic-workload-scheduler Options: - enable: Enable DWS Flex Start - max_run_duration: Maximum duration in seconds for the job to run, should not exceed 604,800 (one week). - use_job_duration: Use the job duration to determine the max_run_duration, if job duration is not set, max_run_duration will be used. - use_bulk_insert: Uses the legacy implementation of DWS Flex Start with Bulk Insert for non-accelerator instances Limitations: - CAN NOT be used with reservations; - CAN NOT be used with placement groups; - If use_job_duration is enabled nodeset can be used in "exclusive" partitions only | object({ | { | no |
| enable_confidential_vm | Enable the Confidential VM configuration. Note: the instance image must support this option. | bool | false | no |
| enable_maintenance_reservation | Enables slurm reservation for scheduled maintenance. | bool | false | no |
| enable_opportunistic_maintenance | On receiving a maintenance notification, maintenance will be performed as soon as the nodes become idle. | bool | false | no |
| enable_oslogin | Enables Google Cloud os-login for user login and authentication for VMs. See https://cloud.google.com/compute/docs/oslogin | bool | true | no |
| enable_placement | Use placement policy for VMs in this nodeset. See: https://cloud.google.com/compute/docs/instances/placement-policies-overview To set max_distance of the used policy, use the placement_max_distance variable. Enabled by default; reasons for users to disable it: - If a non-dense reservation is used, the user can avoid the extra cost of creating placement policies; - If the user wants to avoid "all or nothing" VM provisioning behaviour; - If the user wants to intentionally have "spread" VMs (e.g. for reliability reasons) | bool | true | no |
| enable_public_ips | If set to true, the node group VMs will have a random public IP assigned to them. Ignored if access_config is set. | bool | false | no |
| enable_shielded_vm | Enable the Shielded VM configuration. Note: the instance image must support this option. | bool | false | no |
| enable_smt | DEPRECATED: Use advanced_machine_features.threads_per_core instead. | bool | null | no |
| enable_spot_vm | Enable the partition to use spot VMs (https://cloud.google.com/spot-vms). | bool | false | no |
| future_reservation | If set, will make use of the future reservation for the nodeset. Input can be either the future reservation name or its selfLink in the format 'projects/PROJECT_ID/zones/ZONE/futureReservations/FUTURE_RESERVATION_NAME'. See https://cloud.google.com/compute/docs/instances/future-reservations-overview | string | "" | no |
| guest_accelerator | List of the type and count of accelerator cards attached to the instance. | list(object({ | [] | no |
| instance_image | Defines the image that will be used in the Slurm node group VM instances. Expected Fields: name: The name of the image. Mutually exclusive with family. family: The image family to use. Mutually exclusive with name. project: The project where the image is hosted. For more information on creating custom images that comply with Slurm on GCP see the "Slurm on GCP Custom Images" section in docs/vm-images.md. | map(string) | { | no |
| instance_image_custom | A flag that designates that the user is aware that they are requesting to use a custom and potentially incompatible image for this Slurm on GCP module. If the field is set to false, only the compatible families and project names will be accepted. The deployment will fail with any other image family or name. If set to true, no checks will be done. See: https://goo.gle/hpc-slurm-images | bool | false | no |
| instance_properties | Override the instance properties. Used to test features not supported by Slurm GCP, recommended for advanced usage only. See https://cloud.google.com/compute/docs/reference/rest/v1/regionInstances/bulkInsert If any sub-field (e.g. scheduling) is set, it will override the values computed by SlurmGCP and ignore the values of the provided vars. | any | null | no |
| instance_template | DEPRECATED: Instance template cannot be specified for compute nodes. | string | null | no |
| labels | Labels to add to partition compute instances. Key-value pairs. | map(string) | {} | no |
| machine_type | Compute Platform machine type to use for this partition's compute nodes. | string | "c2-standard-60" | no |
| maintenance_interval | Sets the maintenance interval for instances in this nodeset. See https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#maintenance_interval. | string | null | no |
| metadata | Metadata, provided as a map. | map(string) | {} | no |
| min_cpu_platform | The name of the minimum CPU platform that you want the instance to use. | string | null | no |
| name | Name of the nodeset. Automatically populated by the module id if not set. If setting manually, ensure a unique value across all nodesets. | string | n/a | yes |
| network_storage | An array of network attached storage mounts to be configured on nodes. | list(object({ | [] | no |
| node_conf | Map of Slurm node line configuration. | map(any) | {} | no |
| node_count_dynamic_max | Maximum number of auto-scaling nodes allowed in this partition. | number | 10 | no |
| node_count_static | Number of nodes to be statically created. | number | 0 | no |
| on_host_maintenance | Instance availability Policy. Note: Placement groups are not supported when on_host_maintenance is set to "MIGRATE" and will be deactivated regardless of the value of enable_placement. To support enable_placement, ensure on_host_maintenance is set to "TERMINATE". | string | "TERMINATE" | no |
| placement_max_distance | Maximum distance between nodes in the placement group. Requires enable_placement to be true. Values must be supported by the chosen machine type. | number | null | no |
| preemptible | Should use preemptibles to burst. | bool | false | no |
| project_id | Project ID to create resources in. | string | n/a | yes |
| region | The default region for Cloud resources. | string | n/a | yes |
| reservation_name | Name of the reservation to use for VM resources, should be in one of the following formats: - projects/PROJECT_ID/reservations/RESERVATION_NAME[/reservationBlocks/BLOCK_ID] - RESERVATION_NAME[/reservationBlocks/BLOCK_ID] Must be a "SPECIFIC" reservation. Set to an empty string if using no reservation or automatically-consumed reservations. | string | "" | no |
| resource_manager_tags | (Optional) A set of key/value resource manager tag pairs to bind to the instances. Keys must be in the format tagKeys/{tag_key_id}, and values are in the format tagValues/456. | map(string) | {} | no |
| service_account | DEPRECATED: Use service_account_email and service_account_scopes instead. | object({ | null | no |
| service_account_email | Service account e-mail address to attach to the compute instances. | string | null | no |
| service_account_scopes | Scopes to attach to the compute instances. | set(string) | [ | no |
| shielded_instance_config | Shielded VM configuration for the instance. Note: not used unless enable_shielded_vm is 'true'. - enable_integrity_monitoring: Compare the most recent boot measurements to the integrity policy baseline and return a pair of pass/fail results depending on whether they match or not. - enable_secure_boot: Verify the digital signature of all boot components, and halt the boot process if signature verification fails. - enable_vtpm: Use a virtualized trusted platform module, which is a specialized computer chip you can use to encrypt objects like keys and certificates. | object({ | { | no |
| spot_instance_config | Configuration for spot VMs. | object({ | null | no |
| startup_script | Startup script used by VMs in this nodeset. | string | "# no-op" | no |
| subnetwork_self_link | Subnet to deploy to. | string | n/a | yes |
| tags | Network tag list. | list(string) | [] | no |
| zone | Zone in which to create compute VMs. Additional zones in the same region can be specified in var.zones. | string | n/a | yes |
| zone_target_shape | Strategy for distributing VMs across zones in a region. ANY: GCE picks zones for creating VM instances to fulfill the requested number of VMs within present resource constraints and to maximize utilization of unused zonal reservations. ANY_SINGLE_ZONE (default): GCE always selects a single zone for all the VMs, optimizing for resource quotas, available reservations and general capacity. BALANCED: GCE prioritizes acquisition of resources, scheduling VMs in zones where resources are available while distributing VMs as evenly as possible across allowed zones to minimize the impact of zonal failure. | string | "ANY_SINGLE_ZONE" | no |
| zones | Additional zones in which to allow creation of partition nodes. Google Cloud will find a zone based on availability, quota, and reservations. Should not be set if a SPECIFIC reservation is used. | set(string) | [] | no |
| Name | Description |
|---|---|
| nodeset | Details of the nodeset. Typically used as input to schedmd-slurm-gcp-v6-partition. |