# Dynamo on AKS

This guide covers deploying Dynamo and running LLM inference on Azure Kubernetes Service (AKS). You'll learn how to set up an AKS cluster with GPU nodes, install required components, and deploy your first model.

## Prerequisites

Before you begin, ensure you have:

- An active Azure subscription
- Sufficient Azure quota for GPU VMs
- [kubectl](https://kubernetes.io/docs/tasks/tools/) installed
- [Helm](https://helm.sh/docs/intro/install/) installed
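
A quick way to confirm the command-line prerequisites is a small shell check. This is a minimal sketch; `check_tool` is a hypothetical helper name introduced here for illustration and is not part of Dynamo or the Azure tooling.

```bash
# check_tool is a hypothetical helper: it reports whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# The two CLIs listed in the prerequisites above.
for tool in kubectl helm; do
  check_tool "$tool"
done
```

Any `missing:` line points at a tool to install before continuing.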

## Step 1: Create AKS Cluster with GPU Nodes

If you don't have an AKS cluster yet, create one using the [Azure CLI](https://learn.microsoft.com/en-us/azure/aks/learn/quick-kubernetes-deploy-cli), [Azure PowerShell](https://learn.microsoft.com/en-us/azure/aks/learn/quick-kubernetes-deploy-powershell), or the [Azure portal](https://learn.microsoft.com/en-us/azure/aks/learn/quick-kubernetes-deploy-portal).

Ensure your AKS cluster has a node pool with GPU-enabled nodes. Follow the [Use GPUs for compute-intensive workloads on Azure Kubernetes Service (AKS)](https://learn.microsoft.com/en-us/azure/aks/use-nvidia-gpu?tabs=add-ubuntu-gpu-node-pool#skip-gpu-driver-installation) guide to create a GPU-enabled node pool.

**Important:** We recommend that you **skip the GPU driver installation** during node pool creation, because the NVIDIA GPU Operator installs the driver in the next step.
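
As an illustration of adding such a node pool from the Azure CLI, the sketch below wraps `az aks nodepool add` in a function. It assumes a recent Azure CLI in which that command accepts `--gpu-driver none`; older CLI versions used a different flag, so defer to the linked guide for your version. The resource group, cluster name, pool name, and VM size are placeholder assumptions, and `add_gpu_nodepool` is a hypothetical helper name.

```bash
# Placeholder values (assumptions); replace them with your own.
RESOURCE_GROUP="${RESOURCE_GROUP:-myResourceGroup}"
CLUSTER_NAME="${CLUSTER_NAME:-myAKSCluster}"
NODEPOOL_NAME="${NODEPOOL_NAME:-gpunp}"
GPU_VM_SIZE="${GPU_VM_SIZE:-Standard_NC24ads_A100_v4}"

# add_gpu_nodepool is a hypothetical wrapper around `az aks nodepool add`.
# `--gpu-driver none` asks AKS not to install the NVIDIA driver itself,
# leaving that to the GPU Operator.
add_gpu_nodepool() {
  az aks nodepool add \
    --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER_NAME" \
    --name "$NODEPOOL_NAME" \
    --node-count 1 \
    --node-vm-size "$GPU_VM_SIZE" \
    --gpu-driver none
}

# Run it once the variables match your environment:
# add_gpu_nodepool
```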

## Step 2: Install NVIDIA GPU Operator

Once your AKS cluster is configured with a GPU-enabled node pool, install the NVIDIA GPU Operator. This operator automates the deployment and lifecycle of all NVIDIA software components required to provision GPUs in the Kubernetes cluster, including drivers, container toolkit, device plugin, and monitoring tools.

Follow the [Installing the NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html) guide to install the GPU Operator on your AKS cluster.
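
For orientation, the Helm-based installation described in that guide looks roughly like the sketch below. Treat it as an outline rather than the authoritative procedure: chart versions and extra options are left out, and `install_gpu_operator` is a hypothetical helper name used only to group the commands.

```bash
# install_gpu_operator is a hypothetical wrapper that groups the Helm steps.
install_gpu_operator() {
  # Register NVIDIA's Helm repository and refresh the local index.
  helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
    && helm repo update \
    && helm install --wait --generate-name \
         -n gpu-operator --create-namespace \
         nvidia/gpu-operator
}

# Run it against the cluster your kubectl context points to:
# install_gpu_operator
```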

After the installation completes, list the pods in the cluster. You should see output similar to the example below. This is not the complete output; additional pods will be running. Verify that the GPU Operator pods are in the `Running` state (the `nvidia-cuda-validator` pod finishes as `Completed`).

```bash
NAMESPACE     NAME                                                          READY   STATUS    RESTARTS   AGE
gpu-operator  gpu-feature-discovery-xxxxx                                   1/1     Running   0          2m
gpu-operator  gpu-operator-xxxxx                                            1/1     Running   0          2m
gpu-operator  nvidia-container-toolkit-daemonset-xxxxx                      1/1     Running   0          2m
gpu-operator  nvidia-cuda-validator-xxxxx                                   0/1     Completed 0          1m
gpu-operator  nvidia-device-plugin-daemonset-xxxxx                          1/1     Running   0          2m
gpu-operator  nvidia-driver-daemonset-xxxxx                                 1/1     Running   0          2m
```
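
Beyond pod status, it can help to confirm that the nodes actually advertise GPUs to the scheduler. The sketch below assumes the device plugin exposes the standard `nvidia.com/gpu` resource; `list_gpu_capacity` is a hypothetical helper name.

```bash
# list_gpu_capacity is a hypothetical helper name.
# It prints each node with the number of allocatable GPUs it advertises;
# nodes without GPUs show <none> in the GPUS column.
list_gpu_capacity() {
  kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}

# list_gpu_capacity
```

A GPU node that still shows `<none>` usually means the driver or device plugin pods are not ready yet.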

## Step 3: Deploy Dynamo Kubernetes Operator

Follow the [Deploying Inference Graphs to Kubernetes](../../../docs/pages/kubernetes/README.md) guide to install Dynamo on your AKS cluster.

Validate that the Dynamo pods are running:

```bash
kubectl get pods -n dynamo-system

# Expected output:
# NAME                                                              READY   STATUS    RESTARTS   AGE
# dynamo-platform-dynamo-operator-controller-manager-xxxxxxxxxx     2/2     Running   0          2m50s
# dynamo-platform-etcd-0                                            1/1     Running   0          2m50s
# dynamo-platform-nats-0                                            2/2     Running   0          2m50s
# dynamo-platform-nats-box-xxxxxxxxxx                               1/1     Running   0          2m51s
```

## Step 4: Deploy and Test a Model

Follow the [Deploy Model/Workflow](../../../docs/pages/kubernetes/installation-guide.md#next-steps) guide to deploy and test a model on your AKS cluster.

## AKS Storage Options for Model Caching and Runtime Data

To implement tiered storage, you can combine the storage options available in Azure, such as:

| Storage Option | Performance | Best For |
|----------------|-------------|----------|
| Local CSI (Ephemeral Disk) | Very high | Fast model caching, warm restarts |
| [Azure Managed Lustre](https://learn.microsoft.com/en-us/azure/azure-managed-lustre/use-csi-driver-kubernetes) | Extremely high | Large multi-node models, shared cache |
| [Azure Disk (Managed Disk)](https://learn.microsoft.com/en-us/azure/aks/azure-csi-driver-volume-provisioning?tabs=dynamic-volume-blob%2Cnfs%2Ckubernetes-secret%2Cnfs-3%2Cgeneral%2Cgeneral2%2Cdynamic-volume-disk%2Cgeneral-disk%2Cdynamic-volume-files%2Cgeneral-files%2Cgeneral-files2%2Cdynamic-volume-files-mid%2Coptimize%2Csmb-share&pivots=csi-disk#create-azure-disk-pvs-using-built-in-storage-classes) | High | Persistent single-writer model cache |
| [Azure Files](https://learn.microsoft.com/en-us/azure/aks/azure-csi-driver-volume-provisioning?tabs=dynamic-volume-blob%2Cnfs%2Ckubernetes-secret%2Cnfs-3%2Cgeneral%2Cgeneral2%2Cdynamic-volume-disk%2Cgeneral-disk%2Cdynamic-volume-files%2Cgeneral-files%2Cgeneral-files2%2Cdynamic-volume-files-mid%2Coptimize%2Csmb-share&pivots=csi-files#use-a-persistent-volume-for-storage) | Medium | Shared small/medium models |
| [Azure Blob (via Fuse or init)](https://learn.microsoft.com/en-us/azure/aks/azure-csi-driver-volume-provisioning?tabs=dynamic-volume-blob%2Cnfs%2Ckubernetes-secret%2Cnfs-3%2Cgeneral%2Cgeneral2%2Cdynamic-volume-disk%2Cgeneral-disk%2Cdynamic-volume-files%2Cgeneral-files%2Cgeneral-files2%2Cdynamic-volume-files-mid%2Coptimize%2Csmb-share&pivots=csi-blob#create-a-pvc-using-built-in-storage-class) | Low–Medium | Cold model storage, bootstrap downloads |

Note: Azure Managed Lustre and Local CSI (ephemeral disk) are not installed by default in AKS and require additional setup before use. Azure Disk, Azure Files, and Azure Blob CSI drivers are available out of the box. See the [AKS CSI storage options documentation](https://learn.microsoft.com/azure/aks/csi-storage-drivers) for more details.

In the `cache.yaml` of each [recipe](https://github.com/ai-dynamo/dynamo/tree/main/recipes), set `storageClassName` to one of the storage classes available in your AKS cluster. You can list them with:

```bash
kubectl get storageclass

NAME                           PROVISIONER                 RECLAIMPOLICY
azureblob-csi                  blob.csi.azure.com          Delete
azurefile                      file.csi.azure.com          Delete
azurefile-csi                  file.csi.azure.com          Delete
azurefile-csi-premium          file.csi.azure.com          Delete
azurefile-premium              file.csi.azure.com          Delete
default                        disk.csi.azure.com          Delete
managed                        disk.csi.azure.com          Delete
managed-csi                    disk.csi.azure.com          Delete
managed-csi-premium            disk.csi.azure.com          Delete
managed-premium                disk.csi.azure.com          Delete
sc.azurelustre.csi.azure.com   azurelustre.csi.azure.com   Retain
```

The recommended storage options for the Dynamo caches are:

- Model Cache stores raw model artifacts, configuration files, tokenizers, etc.<br>
  - Persistence: Required to avoid repeated downloads and reduce cold-start latency.<br>
  - Recommended storage: Azure Managed Lustre (shared, high throughput) or Azure Disk (single-replica, persistent).

- Compilation Cache stores backend-specific compiled artifacts (e.g., TensorRT engines).<br>
  - Persistence: Optional<br>
  - Recommended storage: Local CSI (fast, node-local) or Azure Disk (persistent when GPU configuration is fixed).

- Performance Cache stores runtime tuning and profiling data.<br>
  - Persistence: Not required<br>
  - Recommended storage: Local CSI (or other ephemeral storage).

`cache.yaml` example:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
  storageClassName: "sc.azurelustre.csi.azure.com"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: compilation-cache
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  storageClassName: "azurefile-csi"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: perf-cache
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  storageClassName: "local-ephemeral"
```
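
Because a PersistentVolumeClaim that names a storage class the cluster does not provide stays `Pending`, it can be worth checking a `cache.yaml` like the one above before applying it. The sketch below is one way to do that with standard tools; `referenced_storage_classes` and `missing_storage_classes` are hypothetical helper names, and the parsing assumes each `storageClassName` sits on its own line as in the example.

```bash
# Print the storage classes a manifest refers to, one per line, de-duplicated.
referenced_storage_classes() {
  sed -n 's/^[[:space:]]*storageClassName:[[:space:]]*"\{0,1\}\([^"[:space:]]*\)"\{0,1\}[[:space:]]*$/\1/p' "$1" \
    | sort -u
}

# Print "missing: <name>" for every referenced class the cluster lacks.
missing_storage_classes() {
  # `-o name` prints entries such as storageclass.storage.k8s.io/managed-csi;
  # strip everything up to the slash to keep only the class name.
  available="$(kubectl get storageclass -o name | sed 's|.*/||')"
  for sc in $(referenced_storage_classes "$1"); do
    printf '%s\n' "$available" | grep -qxF "$sc" || echo "missing: $sc"
  done
}

# missing_storage_classes cache.yaml
```

No output means every referenced class exists. A `missing:` line typically points at a driver that still needs to be set up, such as Azure Managed Lustre or a local ephemeral disk class.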

## Running on AKS Spot VM-Based GPU Node Pools

When you deploy Dynamo on AKS with GPU-enabled [Spot VM](https://azure.microsoft.com/en-us/products/virtual-machines/spot) node pools, AKS automatically applies the following taint to the Spot nodes, which prevents standard workloads from being scheduled on them by default:
```bash
kubernetes.azure.com/scalesetpriority=spot:NoSchedule
```
Because of this taint, workloads (including the Dynamo CRD controller, platform components, and any GPU workloads) must include the following toleration in their Helm values. Without it, Kubernetes will not schedule pods onto the Spot VM node pools, and GPU resources will remain unused.
```yaml
tolerations:
  - key: kubernetes.azure.com/scalesetpriority
    operator: Equal
    value: spot
    effect: NoSchedule
```
To schedule Dynamo platform components and jobs onto these nodes, use the provided `dynamo/examples/deployments/AKS/values-aks-spot.yaml`, which includes the required tolerations for:
- Dynamo operator controller manager
- Webhook CA inject and cert generation jobs
- etcd
- NATS
- MPI SSH key generation job
- Other core Dynamo platform pods

Use the following commands to install or upgrade Dynamo using the AKS Spot values file:
```bash
helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz \
  --namespace dynamo-system \
  --create-namespace \
  -f ./values-aks-spot.yaml
```
or
```bash
helm upgrade dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz \
  --namespace dynamo-system \
  -f ./values-aks-spot.yaml
```
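
To check that the tolerations took effect, you can compare the Spot nodes with where the platform pods landed. This is a sketch under two assumptions: that AKS labels Spot nodes with the same `kubernetes.azure.com/scalesetpriority=spot` key and value used by the taint, and that Dynamo was installed into `dynamo-system` as in the commands above. `show_spot_placement` is a hypothetical helper name.

```bash
# show_spot_placement is a hypothetical helper name.
show_spot_placement() {
  # Select Spot nodes by label and print the keys of the taints on each one.
  kubectl get nodes -l kubernetes.azure.com/scalesetpriority=spot \
    -o custom-columns='NODE:.metadata.name,TAINTS:.spec.taints[*].key'
  # Show which node each Dynamo platform pod was scheduled onto.
  kubectl get pods -n dynamo-system -o wide
}

# show_spot_placement
```

Pods stuck in `Pending` with no node assigned usually indicate a component that is still missing the toleration.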

## Clean Up Resources

To clean up the Dynamo resources created in this guide, run the following commands:

```bash
# Delete all Dynamo Graph Deployments
kubectl delete dynamographdeployments.nvidia.com --all --all-namespaces

# Uninstall Dynamo Platform and CRDs
helm uninstall dynamo-platform -n dynamo-system
helm uninstall dynamo-crds -n default
```

This will spin down the Dynamo deployment and all associated resources.

If you want to delete the GPU Operator, follow the instructions in the [Uninstalling the NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/uninstall.html) guide.

If you want to delete the entire AKS cluster, follow the instructions in the [Delete an AKS cluster](https://learn.microsoft.com/en-us/azure/aks/delete-cluster) guide.