| title | Azure Kubernetes Service (AKS) |
|---|---|
This guide covers setting up an AKS cluster with GPU nodes and deploying Dynamo.
- An active Azure subscription with sufficient GPU VM quota
- Azure CLI (az) installed and logged in
- kubectl installed
- Helm v3.0+ installed
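
Before starting, it can help to confirm that the three CLIs are on your PATH. The following is a minimal sketch, not part of the Dynamo or Azure tooling: the helper name `missing_tools` is hypothetical, and it checks only that each command exists, not its version or login state.

```shell
# Print each listed command that is not found on PATH.
# Returns 1 if anything is missing, 0 otherwise.
# (missing_tools is a hypothetical helper name, not an Azure or Dynamo command.)
missing_tools() {
  _status=0
  for _tool in "$@"; do
    if ! command -v "$_tool" >/dev/null 2>&1; then
      printf '%s\n' "$_tool"
      _status=1
    fi
  done
  return "$_status"
}

# Usage:
#   missing_tools az kubectl helm || echo "Install the tools listed above first."
#   az account show                                  # exits non-zero if the Azure CLI is not logged in
#   az vm list-usage --location <REGION> -o table    # current usage and limits per VM family
```

The last command in the usage comment is one way to check GPU VM quota for a region before creating the node pool.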
az group create \
--name <RESOURCE_GROUP> \
--location <REGION>

az aks create \
--resource-group <RESOURCE_GROUP> \
--name <CLUSTER_NAME> \
--node-count 1 \
--generate-ssh-keys

Then get credentials:

az aks get-credentials \
--resource-group <RESOURCE_GROUP> \
--name <CLUSTER_NAME>

Add a GPU-enabled node pool with driver installation skipped. The --skip-gpu-driver-install flag prevents AKS from managing GPU drivers; the NVIDIA GPU Operator (Step 3) handles that instead.

az aks nodepool add \
--resource-group <RESOURCE_GROUP> \
--cluster-name <CLUSTER_NAME> \
--name gpunp \
--node-count 2 \
--node-vm-size Standard_NC24ads_A100_v4 \
--skip-gpu-driver-install

For RDMA-capable workloads (disaggregated inference), use ND-series VMs such as Standard_ND96asr_v4 or Standard_ND96isr_H100_v5. See the RDMA / InfiniBand guide for the additional setup required on those nodes.
For a full list of GPU VM sizes, see GPU-optimized VM sizes.
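
After the node pool is created, you can confirm that the GPU nodes have joined the cluster. This is a minimal sketch, assuming the pool name gpunp from the command above. AKS labels each node with `agentpool=<pool name>`; the wrapper name `pool_nodes` is hypothetical.

```shell
# List the nodes that belong to one AKS node pool.
# AKS sets the label agentpool=<pool name> on every node in the pool.
# (pool_nodes is a hypothetical helper name.)
pool_nodes() {
  kubectl get nodes -l "agentpool=$1" -o wide
}

# Usage:
#   pool_nodes gpunp
```

With `--node-count 2`, two nodes in the Ready state are expected. Because driver installation is skipped, these nodes should not advertise any GPUs until the GPU Operator is running.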
The GPU Operator manages NVIDIA drivers, container toolkit, device plugin, and monitoring on GPU nodes.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

helm install gpu-operator nvidia/gpu-operator \
--namespace gpu-operator --create-namespace

Verify the pods are running:

kubectl get pods -n gpu-operator

Expected output (abbreviated):

NAME                                       READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-xxxxx                1/1     Running     0          2m
gpu-operator-xxxxx                         1/1     Running     0          2m
nvidia-container-toolkit-daemonset-xxxxx   1/1     Running     0          2m
nvidia-cuda-validator-xxxxx                0/1     Completed   0          1m
nvidia-device-plugin-daemonset-xxxxx       1/1     Running     0          2m
nvidia-driver-daemonset-xxxxx              1/1     Running     0          2m
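
Once the operator pods are running, the device plugin should advertise GPUs on each GPU node as the `nvidia.com/gpu` resource. The sketch below is one way to check that; the helper name `gpu_allocatable` is hypothetical.

```shell
# Show how many GPUs each node reports as allocatable.
# The dot in the resource name nvidia.com/gpu is escaped so that kubectl
# treats it as part of the key rather than as a path separator.
# (gpu_allocatable is a hypothetical helper name.)
gpu_allocatable() {
  kubectl get nodes \
    -o 'custom-columns=NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}

# Usage:
#   gpu_allocatable
```

Nodes without GPUs, such as the default system pool, show `<none>` in the GPUS column. If the GPU nodes also show `<none>`, the driver or device plugin pods are probably not ready yet.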
Note: If you need RDMA / InfiniBand for disaggregated inference, do not install the GPU Operator yet. The RDMA setup requires different Helm values. See RDMA / InfiniBand for the full setup, which includes the correct GPU Operator install command.
Follow the Installation Guide to install the Dynamo Platform and deploy your first model.
RDMA / InfiniBand is required for disaggregated inference in production. Without RDMA, KV cache transfers between prefill and decode workers fall back to TCP, with severe latency degradation (~98s TTFT vs ~200–500ms with RDMA). ND-series VMs (e.g., Standard_ND96asr_v4, Standard_ND96isr_H100_v5) include Mellanox ConnectX InfiniBand NICs but require additional setup beyond the GPU Operator: the NVIDIA Network Operator, a NicClusterPolicy for MOFED drivers, an ib-node-config DaemonSet to configure kernel modules and memlock limits, and an RDMA Shared Device Plugin to expose the NICs to pods.
Shared storage prevents each pod from independently downloading model weights on startup. Without it, large models take hours to load per pod and will hit Hugging Face rate limits at scale. The storage guide covers Azure Managed Lustre, Azure Files, Azure Disk, and Local CSI options, with recommendations per cache type (model cache, compilation cache, performance cache).
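
As one illustration of shared model storage, the sketch below prints a PersistentVolumeClaim that several pods can mount at the same time. It assumes Azure Files through the AKS built-in azurefile-csi-premium storage class. The claim name model-cache, the 500Gi size, and the helper name `model_cache_pvc` are hypothetical, and the storage guide remains the reference for which option suits each cache type.

```shell
# Print a PVC manifest for a shared, read-write-many model cache.
# (model_cache_pvc is a hypothetical helper name.)
model_cache_pvc() {
  cat <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache                          # hypothetical name
spec:
  accessModes:
    - ReadWriteMany                          # lets many pods mount the same volume
  storageClassName: azurefile-csi-premium    # assumption: AKS built-in Azure Files class
  resources:
    requests:
      storage: 500Gi                         # hypothetical size
EOF
}

# Usage:
#   model_cache_pvc | kubectl apply -n <NAMESPACE> -f -
```

ReadWriteMany is the property that matters here: it lets all worker pods read one downloaded copy of the weights instead of each fetching its own.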
Azure Managed Lustre is the recommended storage for large multi-node models that require high-throughput shared access. It is not installed by default; the Azure Managed Lustre guide covers installing and configuring the Lustre CSI driver, which is required before you can use it as a PVC storage class.
Spot VM node pools significantly reduce GPU compute costs by running on preemptible capacity. AKS automatically taints Spot nodes with kubernetes.azure.com/scalesetpriority=spot:NoSchedule, so Dynamo components need explicit tolerations. The Dynamo Helm chart includes a pre-built values-aks-spot.yaml that handles this.
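
For reference, a toleration that matches the Spot taint quoted above has the following shape. This is a sketch of the generic Kubernetes pod-spec fragment only. Where it belongs in the Dynamo chart values is not shown here, so treat values-aks-spot.yaml as the authoritative source. The helper names `spot_toleration` and `spot_nodes` are hypothetical.

```shell
# Print a pod-spec tolerations fragment matching the AKS Spot taint
# kubernetes.azure.com/scalesetpriority=spot:NoSchedule.
# (spot_toleration is a hypothetical helper name.)
spot_toleration() {
  cat <<'EOF'
tolerations:
  - key: kubernetes.azure.com/scalesetpriority
    operator: Equal
    value: spot
    effect: NoSchedule
EOF
}

# List Spot nodes. AKS also applies the same key/value as a node label.
# (spot_nodes is a hypothetical helper name.)
spot_nodes() {
  kubectl get nodes -l kubernetes.azure.com/scalesetpriority=spot
}

# Usage:
#   spot_toleration
#   spot_nodes
```

A toleration only permits scheduling onto Spot nodes; it does not force it. Pods without the toleration stay on regular nodes.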
# Delete all Dynamo Graph Deployments
kubectl delete dynamographdeployments.nvidia.com --all --all-namespaces
# Uninstall Dynamo Platform
export NAMESPACE="dynamo-system"
helm uninstall dynamo-platform -n $NAMESPACE
# If running Dynamo < 1.0 with a separate CRDs chart:
# helm uninstall dynamo-crds -n $NAMESPACE

If you want to delete the GPU Operator, follow the Uninstalling the NVIDIA GPU Operator guide.
If you want to delete the entire AKS cluster, follow the Delete an AKS cluster guide.
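
If you prefer the command line, the deletion can be sketched with the Azure CLI as below. Both commands are destructive and skip the confirmation prompt, so the linked guide remains the reference. The helper names `delete_cluster` and `delete_resource_group` are hypothetical wrappers around `az aks delete` and `az group delete`.

```shell
# Delete one AKS cluster without an interactive prompt.
#   $1 = resource group, $2 = cluster name
# --no-wait returns immediately while deletion continues in the background.
# (delete_cluster is a hypothetical helper name.)
delete_cluster() {
  az aks delete --resource-group "$1" --name "$2" --yes --no-wait
}

# Delete the resource group and every resource inside it.
#   $1 = resource group
# (delete_resource_group is a hypothetical helper name.)
delete_resource_group() {
  az group delete --name "$1" --yes --no-wait
}

# Usage:
#   delete_cluster <RESOURCE_GROUP> <CLUSTER_NAME>
#   delete_resource_group <RESOURCE_GROUP>
```

Deleting the resource group is the broader option and is only appropriate when the group was created solely for this cluster.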