This module creates a partition with a TPU nodeset. TPUs are Google's custom-developed, application-specific integrated circuits (ASICs) used to accelerate machine learning workloads.

The following snippet creates a TPU partition with the following attributes:
- The TPU nodeset module is connected to the `network` module.
- The TPU nodeset is of type `v2-8` with TensorFlow version `2.10.0`; other supported configurations are listed at https://cloud.google.com/tpu/docs/supported-tpu-configurations.
- The TPU VMs are preemptible.
- `preserve_tpu` is set to `false`, which means suspended VMs will be deleted rather than stopped.
- The partition module uses the defined `tpu_nodeset` module, and the resulting partition can be accessed as `tpu`.
```yaml
  - id: tpu_nodeset
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset-tpu
    use: [network]
    settings:
      node_type: v2-8
      tf_version: 2.10.0
      disable_public_ips: false
      preemptible: true
      preserve_tpu: false

  - id: tpu_partition
    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
    use: [tpu_nodeset]
    settings:
      partition_name: tpu
```

## Requirements

| Name | Version |
|---|---|
| terraform | >= 1.3 |
## Providers

No providers.

## Modules

No modules.

## Resources

No resources.
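For context, the snippet above would typically sit inside a complete Cluster Toolkit blueprint. The sketch below is illustrative only: the blueprint name, group name, and the `modules/network/vpc` source for the `network` module are assumptions based on common toolkit conventions, not something this module prescribes.

```yaml
# Hedged sketch of a surrounding blueprint; names and the vpc source are assumptions.
blueprint_name: tpu-cluster

deployment_groups:
- group: primary
  modules:
  # A network module with id "network", referenced by the nodeset's `use` list.
  - id: network
    source: modules/network/vpc

  - id: tpu_nodeset
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset-tpu
    use: [network]
    settings:
      node_type: v2-8
      tf_version: 2.10.0
      preemptible: true
      preserve_tpu: false

  - id: tpu_partition
    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
    use: [tpu_nodeset]
    settings:
      partition_name: tpu
```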
## Inputs

| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| accelerator_config | Nodeset accelerator config, see https://cloud.google.com/tpu/docs/supported-tpu-configurations for details. | `object({...})` | `{...}` | no |
| data_disks | The data disks to include in the TPU node. | `list(string)` | `[]` | no |
| disable_public_ips | DEPRECATED: Use `enable_public_ips` instead. | `bool` | `null` | no |
| docker_image | The GCP Container Registry docker image to use in the TPU VMs; it defaults to `gcr.io/schedmd-slurm-public/tpu:slurm-gcp-6-9-tf-<var.tf_version>`. | `string` | `null` | no |
| enable_public_ips | If set to true, the node group VMs will have a random public IP assigned. Ignored if `access_config` is set. | `bool` | `false` | no |
| name | Name of the nodeset. Automatically populated by the module id if not set. If setting manually, ensure a unique value across all nodesets. | `string` | n/a | yes |
| network_storage | An array of network attached storage mounts to be configured on nodes. | `list(object({...}))` | `[]` | no |
| node_count_dynamic_max | Maximum number of auto-scaling worker nodes allowed in this partition. For larger TPU machines, multiple worker nodes are required per machine (1 for every 8 cores). See https://cloud.google.com/tpu/docs/v4#large-topologies for more information about these machine types. | `number` | `0` | no |
| node_count_static | Number of worker nodes to be statically created. For larger TPU machines, multiple worker nodes are required per machine (1 for every 8 cores). See https://cloud.google.com/tpu/docs/v4#large-topologies for more information about these machine types. | `number` | `0` | no |
| node_type | Specify a node type to base the VM configuration on. | `string` | `""` | no |
| preemptible | Should use preemptibles to burst. | `bool` | `false` | no |
| preserve_tpu | Specify whether TPU VMs are preserved on suspend: if set to true, the VM is stopped on suspend; if false, it is deleted. | `bool` | `false` | no |
| project_id | Project ID to create resources in. | `string` | n/a | yes |
| reserved | Specify whether TPU VMs in this nodeset are created under a reservation. | `bool` | `false` | no |
| service_account | DEPRECATED: Use `service_account_email` and `service_account_scopes` instead. | `object({...})` | `null` | no |
| service_account_email | Service account e-mail address to attach to the TPU VM. | `string` | `null` | no |
| service_account_scopes | Scopes to attach to the TPU VM. | `set(string)` | `[...]` | no |
| subnetwork_self_link | The name of the subnetwork to attach the TPU VMs of this nodeset to. | `string` | n/a | yes |
| tf_version | Nodeset TensorFlow version, see https://cloud.google.com/tpu/docs/supported-tpu-configurations#tpu_vm for details. | `string` | `"2.14.0"` | no |
| zone | Zone in which to create compute VMs. TPU partitions can only specify a single zone. | `string` | n/a | yes |
## Outputs

| Name | Description |
|---|---|
| nodeset_tpu | Details of the TPU nodeset. Typically used as input to `schedmd-slurm-gcp-v6-partition`. |