This module creates a partition with a TPU nodeset. TPUs are Google's custom-developed, application-specific integrated circuits (ASICs) used to accelerate machine learning workloads.

The following snippet creates a TPU partition with the following attributes:
- The TPU nodeset module is connected to the `network` module.
- The TPU nodeset is of type `v2-8` with TensorFlow version `2.10.0`; other supported configurations are listed at https://cloud.google.com/tpu/docs/supported-tpu-configurations.
- The TPU VMs are preemptible.
- `preserve_tpu` is set to `false`, which means suspended VMs will be deleted rather than stopped.
- The partition module uses the defined `tpu_nodeset` module, and the resulting partition can be accessed as `tpu`.
```yaml
  - id: tpu_nodeset
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset-tpu
    use: [network]
    settings:
      node_type: v2-8
      tf_version: 2.10.0
      disable_public_ips: false
      preemptible: true
      preserve_tpu: false

  - id: tpu_partition
    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
    use: [tpu_nodeset]
    settings:
      partition_name: tpu
```

## Requirements

| Name | Version |
|---|---|
| terraform | >= 1.3 |
## Providers

No providers.

## Modules

No modules.

## Resources

No resources.
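For context, the snippet above would typically sit inside a complete Cluster Toolkit blueprint. The sketch below is illustrative only: the blueprint name, group name, and the `modules/network/vpc` source for the `network` module are assumptions based on common toolkit conventions, not something this module prescribes.

```yaml
# Hedged sketch of a surrounding blueprint; names and the vpc source are assumptions.
blueprint_name: tpu-cluster

deployment_groups:
- group: primary
  modules:
  # A network module with id "network", referenced by the nodeset's `use` list.
  - id: network
    source: modules/network/vpc

  - id: tpu_nodeset
    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset-tpu
    use: [network]
    settings:
      node_type: v2-8
      tf_version: 2.10.0
      preemptible: true
      preserve_tpu: false

  - id: tpu_partition
    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
    use: [tpu_nodeset]
    settings:
      partition_name: tpu
```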
## Inputs

| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| accelerator_config | Nodeset accelerator config, see https://cloud.google.com/tpu/docs/supported-tpu-configurations for details. | `object({...})` | `{...}` | no |
| data_disks | The data disks to include in the TPU node. | `list(string)` | `[]` | no |
| disable_public_ips | DEPRECATED: Use `enable_public_ips` instead. | `bool` | `null` | no |
| docker_image | The GCP Container Registry docker image to use in the TPU VMs; it defaults to `gcr.io/schedmd-slurm-public/tpu:slurm-gcp-6-9-tf-<var.tf_version>`. | `string` | `null` | no |
| enable_public_ips | If set to true, the node group VMs will have a random public IP assigned. Ignored if `access_config` is set. | `bool` | `false` | no |
| name | Name of the nodeset. Automatically populated by the module id if not set. If setting manually, ensure a unique value across all nodesets. | `string` | n/a | yes |
| network_storage | An array of network attached storage mounts to be configured on nodes. | `list(object({...}))` | `[]` | no |
| node_count_dynamic_max | Maximum number of auto-scaling worker nodes allowed in this partition. For larger TPU machines, multiple worker nodes are required per machine (1 for every 8 cores). See https://cloud.google.com/tpu/docs/v4#large-topologies for more information about these machine types. | `number` | `0` | no |
| node_count_static | Number of worker nodes to be statically created. For larger TPU machines, multiple worker nodes are required per machine (1 for every 8 cores). See https://cloud.google.com/tpu/docs/v4#large-topologies for more information about these machine types. | `number` | `0` | no |
| node_type | Specify a node type to base the VM configuration on. | `string` | `""` | no |
| preemptible | Should use preemptibles to burst. | `bool` | `false` | no |
| preserve_tpu | Specify whether TPU VMs are preserved on suspend: if set to true, the VM is stopped on suspend; if false, it is deleted. | `bool` | `false` | no |
| project_id | Project ID to create resources in. | `string` | n/a | yes |
| reserved | Specify whether TPU VMs in this nodeset are created under a reservation. | `bool` | `false` | no |
| service_account | DEPRECATED: Use `service_account_email` and `service_account_scopes` instead. | `object({...})` | `null` | no |
| service_account_email | Service account e-mail address to attach to the TPU VM. | `string` | `null` | no |
| service_account_scopes | Scopes to attach to the TPU VM. | `set(string)` | `[...]` | no |
| subnetwork_self_link | The name of the subnetwork to attach the TPU VMs of this nodeset to. | `string` | n/a | yes |
| tf_version | Nodeset TensorFlow version, see https://cloud.google.com/tpu/docs/supported-tpu-configurations#tpu_vm for details. | `string` | `"2.14.0"` | no |
| zone | Zone in which to create compute VMs. TPU partitions can only specify a single zone. | `string` | n/a | yes |
## Outputs

| Name | Description |
|---|---|
| nodeset_tpu | Details of the TPU nodeset. Typically used as input to `schedmd-slurm-gcp-v6-partition`. |