docs: EKS Auto Mode example (#7369)

Signed-off-by: Erez Zarum <erezz@amazon.com>

docs: EKS Auto Mode example (#7369)
Signed-off-by: Erez Zarum <erezz@amazon.com>
9616c86f · Erez Zarum · GitHub · e252f82d · e252f82d · e252f82d
Unverified Commit 9616c86f authored Mar 23, 2026 by Erez Zarum Committed by GitHub Mar 23, 2026
12 changed files
--- a/examples/deployments/EKS/Create_EKS_EFS.md
+++ b/examples/deployments/EKS/Create_EKS_EFS.md
-# Steps to create EKS cluster with EFS
-## 1. Install CLIs
-### a. Install AWS CLI (steps [here](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html))
-```
-sudo apt install unzip
-curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
-unzip awscliv2.zip
-sudo ./aws/install
-```
-### b. Install Kubernetes CLI (steps [here](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html))
-```
-curl -O https://s3.us-west-2.amazonaws.com/amazon-eks/1.30.0/2024-05-12/bin/linux/amd64/kubectl
-chmod +x ./kubectl
-mkdir -p $HOME/bin && cp ./kubectl $HOME/bin/kubectl && export PATH=$HOME/bin:$PATH
-echo 'export PATH=$HOME/bin:$PATH' >> ~/.bashrc
-```
-### c. Install EKS CLI (steps [here](https://eksctl.io/installation/))
-```
-ARCH=amd64
-PLATFORM=$(uname -s)_$ARCH
-curl -sLO "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_$PLATFORM.tar.gz"
-curl -sL "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_checksums.txt" | grep $PLATFORM | sha256sum --check
-tar -xzf eksctl_$PLATFORM.tar.gz -C /tmp && rm eksctl_$PLATFORM.tar.gz
-sudo mv /tmp/eksctl /usr/local/bin
-```
-### d. Install Helm CLI (steps [here](https://docs.aws.amazon.com/eks/latest/userguide/helm.html))
-```
-curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 > get_helm.sh
-chmod 700 get_helm.sh
-./get_helm.sh
-```
-## 2. Create an EKS cluster
-In this example we create an EKS cluster consisting of 1 `g6e.48xlarge` compute node, each with 8 NVIDIA L40S GPUs and 1 `c5.2xlarge` CPU node as control plane. We also setup EFA between the compute nodes.
-### a. Configure AWS CLI
-```
-aws configure
-```
-### b. Create a config file for EKS cluster creation
-```
-apiVersion: eksctl.io/v1alpha5
-kind: ClusterConfig
-metadata:
-  name: <CLUSTER_NAME>
-  version: "1.32"
-  region: <REGION_NAME>
-iam:
-  withOIDC: true
-managedNodeGroups:
-  - name: sys-ng
-    instanceType: c5.2xlarge
-    minSize: 1
-    desiredCapacity: 1
-    maxSize: 1
-    iam:
-      withAddonPolicies:
-        imageBuilder: true
-        autoScaler: true
-        ebs: true
-        efs: true
-        awsLoadBalancerController: true
-        cloudWatch: true
-        albIngress: true
-  - name: efa-compute-ng
-    instanceType: g6e.48xlarge
-    minSize: 1
-    desiredCapacity: 1
-    maxSize: 1
-    volumeSize: 300
-    efaEnabled: true
-    privateNetworking: true
-    iam:
-      withAddonPolicies:
-        imageBuilder: true
-        autoScaler: true
-        ebs: true
-        efs: true
-        awsLoadBalancerController: true
-        cloudWatch: true
-        albIngress: true
-```
-> [!NOTE]
-> We set `minSize` and `desiredCapacity` to be 1 because AWS does not create your cluster successfully if no nodes are available. For example, if you specify `desiredCapacity` to be 2 but there are no available 2 nodes, your cluster creation will fail due to timeout even though there are no errors. The easiest way to avoid this is to create the cluster with 1 node and increase the number of nodes later in the EKS console. After you increase number of nodes in your node groups, make sure GPU nodes are in the same subnet. This is required for EFA to work.
-### c. Create the EKS cluster
-```
-eksctl create cluster -f eks_cluster_config.yaml
-```
-## 3. Create an EFS file system
-We'll need a common, shared storage location to enable pods deployed to multiple nodes to load shards of the same model. This way, they can be used in coordination to serve inference requests for models too large to loaded by GPUs on a single node. In Kubernetes, these common, shared storage locations are referred to as persistent volumes. Persistent volumes can be volume mapped in to any number of pods and then accessed by processes running inside of said pods as if they were part of the pod's file system. We will be using EFS as persistent volume.
-Additionally, we will need to create a persistent-volume claim which can use to assign the persistent volume to a pod.
-### a. Create an IAM role
-Follow the steps to create an IAM role for your EFS file system: https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html#efs-create-iam-resources. This role will be used later when you install the EFS CSI Driver.
-### b. Install EFS CSI driver
-Install the EFS CSI Driver through the Amazon EKS add-on in AWS console: https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html#efs-install-driver. Once it's done, check the Add-ons section in EKS console, you should see the driver is showing `Active` under Status.
-### c. Create EFS file system
-Follow the steps to create an EFS file system: https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/docs/efs-create-filesystem.md. Make sure you mount subnets in the last step correctly. This will affect whether your nodes are able to access the created EFS file system.
-## 4. Test
-Follow the steps to check if your EFS file system is working properly with your nodes: https://github.com/kubernetes-sigs/aws-efs-csi-driver/tree/master/examples/kubernetes/multiple_pods. This test is going to mount your EFS file system on all of your available nodes and write a text file to the file system.
-## 5. Create StorageClass
-You can find your `fileSystemId` from AWS EFS. It usually start with `fs-`.
-```
-kind: StorageClass
-apiVersion: storage.k8s.io/v1
-metadata:
-  name: efs-sc
-  annotations:
-    storageclass.kubernetes.io/is-default-class: "true"
-provisioner: efs.csi.aws.com
-parameters:
-  fileSystemId: fs-01e72da3fcdbf8a4d
-  provisioningMode: efs-ap
-  directoryPerms: "777"
-  uid: "1000"
-  gid: "1000"
-```
-```
-kubectl apply -f storageclass.yaml
-```
\ No newline at end of file
--- a/examples/deployments/EKS/Deploy_Dynamo_Kubernetes_Platform.md
+++ b/examples/deployments/EKS/Deploy_Dynamo_Kubernetes_Platform.md
-# Steps to install Dynamo Kubernetes Platform from Source
-## 1. Build Dynamo Base Image
-Create 1 ECR repositoriy
-```
-aws configure
-aws ecr create-repository --repository-name <ECR_REPOSITORY>
-```
-Build Image
-```
-export NAMESPACE=dynamo-system
-export DOCKER_SERVER=<ECR_REGISTRY>
-export DOCKER_USERNAME=AWS
-export DOCKER_PASSWORD="$(aws ecr get-login-password --region <ECR_REGION>)"
-export IMAGE_TAG=0.3.2.1
-python container/render.py --framework=dynamo --target=runtime --output-short-filename
-docker build -t dynamo:latest-vllm -f container/rendered.Dockerfile .
-```
-Push Image
-```
-docker tag dynamo:latest-vllm <ECR_REGISTRY>/<ECR_REPOSITORY>:$IMAGE_TAG
-aws ecr get-login-password | docker login --username AWS --password-stdin <ECR_REGISTRY>
-docker push <ECR_REGISTRY>/<ECR_REPOSITORY>:$IMAGE_TAG
-```
-## 2. Install Dynamo Kubernetes Platform
-Build and Push Operator Image
-```
-cd deploy/operator
-docker build -t $DOCKER_SERVER/kubernetes-operator:$IMAGE_TAG .
-docker push $DOCKER_SERVER/kubernetes-operator:$IMAGE_TAG
-```
-Create secrets
-```
-kubectl create namespace ${NAMESPACE}
-kubectl create secret docker-registry docker-imagepullsecret \
-  --docker-server=${DOCKER_SERVER} \
-  --docker-username=${DOCKER_USERNAME} \
-  --docker-password=${DOCKER_PASSWORD} \
-  --namespace=${NAMESPACE}
-export HF_TOKEN=<HF_TOKEN>
-kubectl create secret generic hf-token-secret \
-  --from-literal=HF_TOKEN=${HF_TOKEN} \
-  -n ${NAMESPACE}
-```
-Install Dynamo Kubernetes Platform
-```
-cd deploy/helm/charts
-helm install dynamo-crds ./crds/ \
-  --namespace default \
-  --wait \
-  --atomic
-```
-```
-helm dep build ./platform/
-kubectl create namespace ${NAMESPACE}
-# Create docker registry secret
-kubectl create secret docker-registry docker-imagepullsecret \
-  --docker-server=${DOCKER_SERVER} \
-  --docker-username=${DOCKER_USERNAME} \
-  --docker-password=${DOCKER_PASSWORD} \
-  --namespace=${NAMESPACE}
-# Install platform
-helm install dynamo-platform ./platform/ \
-  --namespace ${NAMESPACE} \
-  --set "dynamo-operator.controllerManager.manager.image.repository=${DOCKER_SERVER}/kubernetes-operator" \
-  --set "dynamo-operator.controllerManager.manager.image.tag=${IMAGE_TAG}" \
-  --set "dynamo-operator.imagePullSecrets[0].name=docker-imagepullsecret"
-```
-Your pods should be running like below
-```
-ubuntu@ip-192-168-83-157:~/dynamo/examples/backends/vllm/deploy$ kubectl get pods -A
-NAMESPACE      NAME                                                              READY   STATUS    RESTARTS   AGE
-dynamo-system   dynamo-platform-dynamo-operator-controller-manager-86795c5f4j4k   2/2     Running   0          4h17m
-dynamo-system   dynamo-platform-etcd-0                                            1/1     Running   0          4h17m
-dynamo-system   dynamo-platform-nats-0                                            2/2     Running   0          4h17m
-dynamo-system   dynamo-platform-nats-box-5dbf45c748-bxqj7                         1/1     Running   0          4h17m
-```
--- a/examples/deployments/EKS/Deploy_VLLM_example.md
+++ b/examples/deployments/EKS/Deploy_VLLM_example.md
-# Steps to deploy vLLM example
-## 1. Deploy Dynamo Graph
-```
-cd dynamo/examples/backends/vllm/deploy
-vim agg_router.yaml    #under metadata add namespace: dynamo-system and change image to your built base image
-kubectl apply -f agg_router.yaml
-```
-Your pods should be running like below
-```
-ubuntu@ip-192-168-83-157:~/dynamo/examples/backends/vllm/deploy$ kubectl get pods -A
-NAMESPACE      NAME                                                              READY   STATUS    RESTARTS   AGE
-dynamo-system   dynamo-platform-dynamo-operator-controller-manager-86795c5f4j4k   2/2     Running   0          4h17m
-dynamo-system   dynamo-platform-etcd-0                                            1/1     Running   0          4h17m
-dynamo-system   dynamo-platform-nats-0                                            2/2     Running   0          4h17m
-dynamo-system   dynamo-platform-nats-box-5dbf45c748-bxqj7                         1/1     Running   0          4h17m
-dynamo-system   vllm-agg-router-frontend-79d599bb9c-fg97p                         1/1     Running   0          4m9s
-dynamo-system   vllm-agg-router-vllmdecodeworker-787d575485-hrcjp                 1/1     Running   0          4m9s
-dynamo-system   vllm-agg-router-vllmdecodeworker-787d575485-zkwdd                 1/1     Running   0          4m9s
-```
-Test the Deployment
-```
-kubectl port-forward deployment/vllm-agg-router-frontend 8000:8000 -n dynamo-system
-curl localhost:8000/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "Qwen/Qwen3-0.6B",
-    "messages": [
-    {
-        "role": "user",
-        "content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
-    }
-    ],
-    "stream": false,
-    "max_tokens": 30
-  }'
-```
-You should output something similar to below
-```
-{"id":"chatcmpl-bbe52b36-90ed-4479-9872-89e1aa412aa7","choices":[{"index":0,"message":{"content":"<think>\nOkay, so the user wants me to develop a character background for an explorer named someone in Eldoria. The character is part of the","refusal":null,"tool_calls":null,"role":"assistant","function_call":null,"audio":null},"finish_reason":"stop","logprobs":null}],"created":1753417848,"model":"Qwen/Qwen3-0.6B","service_tier":null,"system_fingerprint":null,"object":"chat.completion","usage":{"prompt_tokens":196,"completion_tokens":29,"total_tokens":225,"prompt_tokens_details":null,"completion_tokens_details":null}}
-```
\ No newline at end of file
--- a/examples/deployments/EKS/EFS.md
+++ b/examples/deployments/EKS/EFS.md
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+-->
+# Create an Amazon EFS File System for Amazon EKS
+This guide walks through creating an Amazon EFS file system and connecting it to your EKS cluster. The EFS CSI Driver was already installed as an addon via `eksctl.yaml` during cluster creation. Now we need to create the actual file system and make it available to Kubernetes workloads.
+This filesystem will be used by Dynamo to store shared model weights and compilation cache across nodes.
+## Prerequisites
+- EKS cluster created following the [README](README.md)
+- Environment variables set:
+```bash
+export AWS_REGION="us-east-1"
+export CLUSTER_NAME="ai-dynamo"
+export DYNAMO_NAMESPACE="dynamo-system"
+```
+## Retrieve VPC and Subnet Information
+Get the VPC ID associated with your EKS cluster:
+```bash
+export VPC_ID=$(aws eks describe-cluster \
+  --name $CLUSTER_NAME \
+  --region $AWS_REGION \
+  --query "cluster.resourcesVpcConfig.vpcId" \
+  --output text)
+```
+Get the CIDR range for the VPC (used for the security group rule):
+```bash
+export VPC_CIDR=$(aws ec2 describe-vpcs \
+  --vpc-ids $VPC_ID \
+  --query "Vpcs[0].CidrBlock" \
+  --output text)
+```
+## Create a Security Group for EFS
+Create a security group that allows NFS traffic (port 2049) from within the VPC:
+```bash
+export EFS_SG_ID=$(aws ec2 create-security-group \
+  --group-name dynamo-efs-sg \
+  --description "Security group for EFS access from EKS" \
+  --vpc-id $VPC_ID \
+  --region $AWS_REGION \
+  --query "GroupId" \
+  --output text)
+```
+Add an inbound rule to allow NFS traffic from the VPC CIDR:
+```bash
+aws ec2 authorize-security-group-ingress \
+  --group-id $EFS_SG_ID \
+  --protocol tcp \
+  --port 2049 \
+  --cidr $VPC_CIDR \
+  --region $AWS_REGION
+```
+## Create the EFS File System
+```bash
+export EFS_FS_ID=$(aws efs create-file-system \
+  --performance-mode generalPurpose \
+  --throughput-mode elastic \
+  --encrypted \
+  --region $AWS_REGION \
+  --tags Key=Name,Value=dynamo-efs \
+  --query "FileSystemId" \
+  --output text)
+```
+Wait for the file system to become available:
+```bash
+aws efs describe-file-systems \
+  --file-system-id $EFS_FS_ID \
+  --region $AWS_REGION \
+  --query "FileSystems[0].LifeCycleState" \
+  --output text
+```
+You should see `available` before proceeding.
+## Create Mount Targets
+Mount targets allow your EKS nodes to access the EFS file system. You need one mount target per subnet where your nodes run.
+Get the subnet IDs used by your EKS cluster:
+```bash
+export SUBNET_IDS=$(aws eks describe-cluster \
+  --name $CLUSTER_NAME \
+  --region $AWS_REGION \
+  --query "cluster.resourcesVpcConfig.subnetIds[]" \
+  --output text)
+echo "Subnet IDs: $SUBNET_IDS"
+```
+Create a mount target in each subnet:
+```bash
+for SUBNET_ID in $(echo "$SUBNET_IDS" | tr '\t' '\n'); do
+  echo "Creating mount target in subnet: $SUBNET_ID"
+  aws efs create-mount-target \
+    --file-system-id $EFS_FS_ID \
+    --subnet-id $SUBNET_ID \
+    --security-groups $EFS_SG_ID \
+    --region $AWS_REGION 2>/dev/null || echo "  Mount target already exists or subnet is in a duplicate AZ (this is OK)"
+done
+```
+> **Note:** EFS allows only one mount target per Availability Zone. If multiple subnets are in the same AZ, the command will fail for the duplicates, which is expected and safe to ignore.
+Verify mount targets are available:
+```bash
+aws efs describe-mount-targets \
+  --file-system-id $EFS_FS_ID \
+  --region $AWS_REGION \
+  --query "MountTargets[*].{SubnetId:SubnetId,AZ:AvailabilityZoneName,State:LifeCycleState}" \
+  --output table
+```
+Wait until all mount targets show `available` in the State column before proceeding.
+## Create Kubernetes StorageClass
+Create a StorageClass that uses the EFS CSI driver with dynamic provisioning:
+```bash
+kubectl apply -f - << EOF
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: efs-sc-dynamic
+provisioner: efs.csi.aws.com
+parameters:
+  provisioningMode: efs-ap
+  fileSystemId: "${EFS_FS_ID}"
+  directoryPerms: "777"
+  uid: "1000"
+  gid: "1000"
+EOF
+```
+## Create a PersistentVolumeClaim
+We create three separate PVCs because different Dynamo recipe examples reference each one individually:
+* `model-cache` stores downloaded model weights (e.g. from HuggingFace).
+* `compilation-cache` stores vLLM/TRT-LLM compilation artifacts.
+* `perf-cache` stores benchmark traces and performance results.
+```bash
+# Create the namespace we will use for Dynamo if not already exists
+kubectl create namespace ${DYNAMO_NAMESPACE}
+# Create PVCs
+kubectl apply -f - << EOF
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: model-cache
+  namespace: ${DYNAMO_NAMESPACE}
+spec:
+  accessModes:
+    - ReadWriteMany
+  resources:
+    requests:
+      storage: 5Gi
+  storageClassName: "efs-sc-dynamic"
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: compilation-cache
+  namespace: ${DYNAMO_NAMESPACE}
+spec:
+  accessModes:
+    - ReadWriteMany
+  resources:
+    requests:
+      storage: 5Gi
+  storageClassName: "efs-sc-dynamic"
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: perf-cache
+  namespace: ${DYNAMO_NAMESPACE}
+spec:
+  accessModes:
+    - ReadWriteMany
+  resources:
+    requests:
+      storage: 5Gi
+  storageClassName: "efs-sc-dynamic"
+EOF
+```
+> **Note:** EFS is elastic, the `storage` value in the PVC is required by Kubernetes but does not limit the actual storage. EFS will grow and shrink automatically.
+## Verify
+Confirm the PVC is bound:
+```bash
+kubectl get pvc -n ${DYNAMO_NAMESPACE}
+```
+You should see output similar to:
+```
+NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS     VOLUMEATTRIBUTESCLASS   AGE
+compilation-cache   Bound    pvc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx   5Gi        RWX            efs-sc-dynamic   <unset>                 41s
+model-cache         Bound    pvc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx   5Gi        RWX            efs-sc-dynamic   <unset>                 42s
+perf-cache          Bound    pvc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx   5Gi        RWX            efs-sc-dynamic   <unset>                 41s
+```
+## Cleanup
+To delete the EFS resources when no longer needed:
+```bash
+# Delete the Kubernetes resources
+kubectl delete pvc model-cache compilation-cache perf-cache -n ${DYNAMO_NAMESPACE}
+kubectl delete storageclass efs-sc-dynamic
+# Delete mount targets
+for MT_ID in $(aws efs describe-mount-targets --file-system-id $EFS_FS_ID --region $AWS_REGION --query "MountTargets[*].MountTargetId" --output text); do
+  aws efs delete-mount-target --mount-target-id $MT_ID --region $AWS_REGION
+done
+# Delete the EFS file system
+aws efs delete-file-system --file-system-id $EFS_FS_ID --region $AWS_REGION
+# Delete the security group
+aws ec2 delete-security-group --group-id $EFS_SG_ID --region $AWS_REGION
+```
--- a/examples/deployments/EKS/README.md
+++ b/examples/deployments/EKS/README.md
+<!--
+SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-License-Identifier: Apache-2.0
+-->
+# Steps to create an EKS cluster
+This guide demonstrates the Dynamo platform on Amazon Elastic Kubernetes Service (EKS).
+## Setup environment variables
+We will use those environment variables throughout this guide.
+If you would like to use a different region, modify the `AWS_REGION` variable
+```bash
+export AWS_REGION="us-east-1"
+export CLUSTER_NAME="ai-dynamo"
+export DYNAMO_NAMESPACE="dynamo-system"
+export DYNAMO_RELEASE_VERSION="1.0.0"
+```
+## Install CLIs
+### Install AWS CLI ([AWS CLI installation guide](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html))
+```bash
+sudo apt install unzip
+curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
+unzip awscliv2.zip
+sudo ./aws/install
+```
+### Install Kubernetes CLI ([kubectl installation guide for EKS](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html))
+```bash
+curl -O https://s3.us-west-2.amazonaws.com/amazon-eks/1.35.2/2026-02-27/bin/darwin/amd64/kubectl
+chmod +x ./kubectl
+mkdir -p $HOME/bin && cp ./kubectl $HOME/bin/kubectl && export PATH=$HOME/bin:$PATH
+echo 'export PATH=$HOME/bin:$PATH' >> ~/.bashrc
+```
+### Install Eksctl CLI ([eksctl installation guide](https://eksctl.io/installation/))
+```bash
+ARCH=amd64
+PLATFORM=$(uname -s)_$ARCH
+curl -sLO "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_$PLATFORM.tar.gz"
+curl -sL "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_checksums.txt" | grep $PLATFORM | sha256sum --check
+tar -xzf eksctl_$PLATFORM.tar.gz -C /tmp && rm eksctl_$PLATFORM.tar.gz
+sudo mv /tmp/eksctl /usr/local/bin
+```
+### Install Helm CLI ([Helm setup for EKS](https://docs.aws.amazon.com/eks/latest/userguide/helm.html))
+```bash
+curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 > get_helm.sh
+chmod 700 get_helm.sh
+./get_helm.sh
+```
+## Create an EKS Auto Mode cluster
+Creating an EKS Auto Mode cluster using Eksctl with `eksctl.yaml`.
+This will create an EKS Auto Mode cluster with the Amazon EFS CSI Driver installed as an addon, we will later use Amazon EFS to store model weights and compilation to be used by Dynamo.
+```bash
+# Use all availability zones in a region, exclude use1-az3 where EKS control plane is not available
+export EKS_CP_AZS=$(aws ec2 describe-availability-zones \
+      --region ${AWS_REGION} \
+      --filters "Name=opt-in-status,Values=opt-in-not-required" \
+      --query "AvailabilityZones[?ZoneId!='use1-az3'].[ZoneName]" \
+      --output text | sed 's/ /, /g; s/^/  - /')
+eksctl create cluster -f <(envsubst < templates/eksctl.yaml)
+```
+*Note: eksctl will automatically configure kubeconfig context for you, if not you can run: `aws eks update-kubeconfig --region $AWS_REGION --name $CLUSTER_NAME`*
+### Create an EKS Auto Mode GPU NodePool
+Creating a GPU NodePool that targets the **g5,g6,g6e,g7e,p5,p5e,p5en** instance families.
+```bash
+kubectl apply -f automode-np-gpu.yaml
+```
+## Create a default StorageClass
+Create a default StorageClass to use the storage capability of EKS Auto Mode, this will make the default StorageClass to use EBS volumes for Stateful workloads needed by NATS that is used with Dynamo.
+```bash
+kubectl apply -f - << EOF
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: auto-ebs-sc
+  annotations:
+    storageclass.kubernetes.io/is-default-class: "true"
+allowedTopologies:
+- matchLabelExpressions:
+  - key: eks.amazonaws.com/compute-type
+    values:
+    - auto
+provisioner: ebs.csi.eks.amazonaws.com
+volumeBindingMode: WaitForFirstConsumer
+parameters:
+  type: gp3
+  encrypted: "true"
+EOF
+```
+## Create an Amazon EFS shared file system
+Follow the [EFS setup guide](EFS.md) to create an EFS file system and make it available as shared storage for Dynamo workloads.
+## Install Dynamo Kubernetes Platform
+### Install Dynamo Platform
+```bash
+helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-"${DYNAMO_RELEASE_VERSION}".tgz
+helm install dynamo-platform dynamo-platform-"${DYNAMO_RELEASE_VERSION}".tgz --namespace "${DYNAMO_NAMESPACE}" --create-namespace
+```
+### Setup HuggingFace TOKEN
+```bash
+export HF_TOKEN=<HF_TOKEN>
+kubectl create secret generic hf-token-secret \
+  --from-literal=HF_TOKEN=${HF_TOKEN} \
+  -n ${DYNAMO_NAMESPACE}
+```
+### Verify installation
+Validate that the Dynamo platform pods are running, you should see an output similar to output below.
+```bash
+kubectl get pods -n ${DYNAMO_NAMESPACE}
+NAME                                                              READY   STATUS    RESTARTS   AGE
+dynamo-platform-dynamo-operator-controller-manager-ff54b5dstgcq   1/1     Running   0          106s
+dynamo-platform-nats-0                                            2/2     Running   0          106s
+```
+Validate that the Dynamo CRDs were installed
+```bash
+kubectl get crds | grep dynamo
+dynamocheckpoints.nvidia.com                      2026-03-17T13:18:05Z
+dynamocomponentdeployments.nvidia.com             2026-03-17T13:18:06Z
+dynamographdeploymentrequests.nvidia.com          2026-03-17T13:18:08Z
+dynamographdeployments.nvidia.com                 2026-03-17T13:18:09Z
+dynamographdeploymentscalingadapters.nvidia.com   2026-03-17T13:18:10Z
+dynamomodels.nvidia.com                           2026-03-17T13:18:10Z
+dynamoworkermetadatas.nvidia.com                  2026-03-17T13:18:11Z
+```
+## Deploy a Dynamo DynamoGraphDeployment (DGD)
+| Manifest | Description |
+|----------|-------------|
+| `manifests/vllm/disagg.yaml` | Disaggregated prefill/decode DGD using NIXL with LIBFABRIC backend over EFA. Targets `g7e.12xlarge` instances with GPUDirect RDMA support for high-throughput KV-cache transfer between prefill and decode workers. |
+| `manifests/vllm/disagg-p5.yaml` | Disaggregated prefill/decode DGD using NIXL with LIBFABRIC backend over EFA. Targets `p5.48xlarge` reserved instances with 8 EFA devices (4 EFAs per 1 GPU for p5.48xlarge) and TP-2 for Qwen3-32B. Uses 2 decode and 6 prefill replicas on reserved capacity (`karpenter.sh/capacity-type: reserved`). |
+| `manifests/vllm/disagg-tcp.yaml` | Alternative disaggregated prefill/decode inference graph using TCP instead of EFA. Targets `g6e.2xlarge` instances, suitable for instance types without EFA support. |
+| `manifests/vllm/agg.yaml` | Aggregated (single-worker) inference graph where a single vLLM worker handles both prefill and decode phases. Simpler deployment without KV-cache transfer overhead. |
+### Cache Models on EFS
+Before deploying an inference graph, download the model weights onto the shared EFS file system. Each Dynamo recipe includes a `model-cache/model-download.yaml` Job manifest that downloads the model from HuggingFace.
+Copy the recipe's download manifest into the local kustomize directory and apply it:
+```bash
+# Example: cache the Qwen3-32B model which we will be using later
+cp ../../../recipes/qwen3-32b/model-cache/model-download.yaml manifests/model-download/model-download.yaml
+kubectl kustomize manifests/model-download | kubectl -n ${DYNAMO_NAMESPACE} apply -f -
+rm -f manifests/model-download/model-download.yaml
+```
+The recipe manifests don't set any memory resources on the download container. Without a memory request, the Job pod can get OOMKilled during download — especially for large models. The `kustomization.yaml` in `manifests/model-download/` patches in a memory request to prevent this. By default it adds `4Gi`.
+For larger models (e.g. DeepSeek-R1, Nemotron-3-Super-120B) increase this value in `manifests/model-download/kustomization.yaml` before applying:
+```yaml
+patches:
+  - target:
+      kind: Job
+      name: model-download
+    patch: |
+      apiVersion: batch/v1
+      kind: Job
+      metadata:
+        name: model-download
+      spec:
+        template:
+          spec:
+            containers:
+              - name: model-download
+                resources:
+                  requests:
+                    memory: "16Gi"   # increase for larger models
+```
+Then apply:
+```bash
+kubectl kustomize manifests/model-download | kubectl -n ${DYNAMO_NAMESPACE} apply -f -
+```
+Monitor the download Job:
+```bash
+kubectl -n ${DYNAMO_NAMESPACE} get jobs model-download
+kubectl -n ${DYNAMO_NAMESPACE} logs -f job/model-download
+```
+To re-run a download (e.g. after changing the model or fixing an OOM), delete the previous Job first:
+```bash
+kubectl -n ${DYNAMO_NAMESPACE} delete job model-download
+```
+Then copy the new recipe's manifest and apply again.
+### Disaggregated Serving
+This example deploys a disaggregated prefill/decode Dynamo Inference Graph that uses NIXL with the LIBFABRIC backend using [Elastic Fabric Adapter (EFA)](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html) for high-throughput KV-cache transfer between workers.
+It targets `g7e.12xlarge` instances, which support GPUDirect RDMA, and uses the Dynamo EFA-enabled vLLM container `nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0-efa-amd64` that ships with the [EFA Installer](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-changelog.html) pre-installed.
+*Note: For a full list of EFA-supported instance types, see [the AWS EC2 Docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html#efa-instance-types).*
+```yaml
+        nodeSelector:
+          node.kubernetes.io/instance-type: g7e.12xlarge
+```
+KV-cache transfer between workers uses [NIXL](https://github.com/ai-dynamo/nixl) with the LIBFABRIC backend. Enable it by passing the following argument to vLLM:
+`--kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_connector_extra_config": {"backends": ["LIBFABRIC"]}}'`
+*Note: On instance types without EFA support, NIXL's libfabric backend falls back to TCP automatically. However, vLLM's `NixlConnector` defaults to `cuda` as the buffer device, so you must add `"kv_buffer_device":"cpu"` to the `kv-transfer-config` argument for disaggregated serving to work without EFA.*
+Request an EFA device for each worker pod using the `vpc.amazonaws.com/efa` extended resource:
+```yaml
+      resources:
+        requests:
+          gpu: "1"
+          custom:
+            vpc.amazonaws.com/efa: "1"
+        limits:
+          gpu: "1"
+          custom:
+            vpc.amazonaws.com/efa: "1"
+```
+*Note: EKS Auto Mode includes the EFA device plugin making `vpc.amazonaws.com/efa` extended resource available.*
+All workers (prefill and decode) must be co-located in the same availability zone, since EFA traffic does not cross AZ boundaries. Use a pod affinity rule to enforce this:
+```yaml
+        affinity:
+          podAffinity:
+            requiredDuringSchedulingIgnoredDuringExecution:
+              - topologyKey: "topology.kubernetes.io/zone"
+                labelSelector:
+                  matchLabels:
+                    nvidia.com/dynamo-graph-deployment-name: "vllm-disagg"
+```
+```bash
+kubectl -n ${DYNAMO_NAMESPACE} apply -f manifests/vllm/disagg.yaml
+```
+*Note: `manifests/vllm/disagg-tcp.yaml` provides an alternative example that uses TCP instead of EFA, targeting `g6e.2xlarge` instances.*
+Verify that all pods reach `Running` status:
+```bash
+kubectl -n ${DYNAMO_NAMESPACE} get pods
+NAME                                                              READY   STATUS    RESTARTS   AGE
+dynamo-platform-dynamo-operator-controller-manager-ff54b5dstgcq   1/1     Running   0          39m
+dynamo-platform-nats-0                                            2/2     Running   0          39m
+vllm-disagg-frontend-85f8476887-wwtwk                             1/1     Running   0          2m13s
+vllm-disagg-vllmdecodeworker-510a1741-7666987b-tp58w              1/1     Running   0          2m13s
+vllm-disagg-vllmprefillworker-510a1741-54f76d7954-tjgn8           1/1     Running   0          2m13s
+```
+```bash
+kubectl -n ${DYNAMO_NAMESPACE} port-forward svc/vllm-disagg-frontend 8000:8000
+curl localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "Qwen/Qwen3-32B",
+    "messages": [
+    {
+        "role": "user",
+        "content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
+    }
+    ],
+    "stream": false,
+    "max_tokens": 30
+  }'
+```
+You should see output similar to below
+```bash
+{"id":"chatcmpl-23a7c94b-99cb-42ca-ae56-2397aa5a560f","choices":[{"index":0,"message":{"content":"<think>\nOkay, so I need to develop a character background for someone who's an intrepid explorer in Eldoria, specifically focusing on their motivations,","role":"assistant","reasoning_content":null},"finish_reason":"length"}],"created":1773336002,"model":"Qwen/Qwen3-0.6B","object":"chat.completion","usage":{"prompt_tokens":196,"completion_tokens":30,"total_tokens":226,"prompt_tokens_details":{"audio_tokens":null,"cached_tokens":192}},"nvext":{"worker_id":{"prefill_worker_id":4265733549773195,"prefill_dp_rank":0,"decode_worker_id":7535192362430132,"decode_dp_rank":0},"timing":{"request_received_ms":1773336002136,"prefill_wait_time_ms":0.852483,"prefill_time_ms":12.90597,"ttft_ms":13.758453000000001,"total_time_ms":110.89621500000001,"kv_hit_rate":0.0}}}
+```
+*Note: The initial request for each worker will occur increased latency, this is due to the NIXL backend handshake and initialization overhead, this operation is only for the very first transfer*
+Watch logs
+```bash
+kubectl logs -n ${DYNAMO_NAMESPACE} -l nvidia.com/dynamo-graph-deployment-name=vllm-disagg --all-containers=true --max-log-requests=20 --prefix=true --timestamps -f
+```
+Cleanup
+```bash
+kubectl -n ${DYNAMO_NAMESPACE} delete -f manifests/vllm/disagg.yaml
+```
+### Aggregated Serving
+```bash
+kubectl -n ${DYNAMO_NAMESPACE} apply -f manifests/vllm/agg.yaml
+```
+Your pods should be running like below output, making sure they are in status "Running".
+```bash
+kubectl -n ${DYNAMO_NAMESPACE} get pods
+NAME                                                              READY   STATUS    RESTARTS   AGE
+dynamo-platform-dynamo-operator-controller-manager-ff54b5dstgcq   1/1     Running   0          12m
+dynamo-platform-nats-0                                            2/2     Running   0          12m
+vllm-agg-frontend-ff8457bcf-tq9jh                                 1/1     Running   0          4m46s
+vllm-agg-vllmdecodeworker-d0a70291-759df94478-8lc74               1/1     Running   0          4m46s
+```
+Watch logs
+```bash
+kubectl logs -n ${DYNAMO_NAMESPACE} -l nvidia.com/dynamo-graph-deployment-name=vllm-agg --all-containers=true --max-log-requests=20 --prefix=true --timestamps -f
+```
+```bash
+kubectl -n ${DYNAMO_NAMESPACE} port-forward svc/vllm-agg-frontend 8000:8000
+curl localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "Qwen/Qwen3-0.6B",
+    "messages": [
+    {
+        "role": "user",
+        "content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
+    }
+    ],
+    "stream": false,
+    "max_tokens": 30
+  }'
+```
+You should see output similar to below
+```bash
+{"id":"chatcmpl-093fac0e-f75e-43b5-90dc-96c8c77a2e7c","choices":[{"index":0,"message":{"content":"<think>\nOkay, I need to develop a character background for the explorer in Eldoria. Let me start by understanding the user's query. They mentioned","role":"assistant","reasoning_content":null},"finish_reason":"length"}],"created":1773443560,"model":"Qwen/Qwen3-0.6B","object":"chat.completion","usage":{"prompt_tokens":196,"completion_tokens":30,"total_tokens":226},"nvext":{"timing":{"request_received_ms":1773443560878,"total_time_ms":99.89782}}}%
+```
+Cleanup
+```bash
+kubectl -n ${DYNAMO_NAMESPACE} delete -f manifests/vllm/agg.yaml
+```
+## Using On-Demand Capacity Reservations (ODCR) and Capacity Blocks (CBs) for ML
+GPU instances can be difficult to acquire on-demand. AWS provides two reservation mechanisms to guarantee capacity for ML workloads:
+- [On-Demand Capacity Reservations (ODCRs)](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-reservations.html) reserve capacity in a specific AZ for any duration. You pay for the reserved capacity whether or not you use it.
+- [Capacity Blocks for ML](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-blocks.html) reserve GPU instances for a fixed time window (hours to days). Instances are placed in EC2 UltraClusters for low-latency networking. Capacity Blocks have a defined end time, and EC2 will terminate instances before the block expires.
+EKS Auto Mode uses Karpenter under the hood, which models reserved capacity as `karpenter.sh/capacity-type: reserved` and prioritizes it over on-demand and spot.
+> **Note:** By default, EKS Auto Mode can launch into open ODCRs automatically, but does not prioritize them. Capacity Blocks are never used automatically. Both require explicit `capacityReservationSelectorTerms` configuration on a NodeClass to be prioritized and labeled as `reserved`.
+### Create a NodeClass with Capacity Reservation
+Create a NodeClass that references your ODCR or Capacity Block reservation. You can select by reservation ID or by tags.
+First, extract the subnet, security group, and role configuration from the `default` NodeClass that EKS Auto Mode already created:
+```bash
+export NC_SUBNETS=$(kubectl get nodeclass default -o json | jq -c '.spec.subnetSelectorTerms')
+export NC_SG=$(kubectl get nodeclass default -o json | jq -c '.spec.securityGroupSelectorTerms')
+export NC_ROLE=$(kubectl get nodeclass default -o json | jq -r '.spec.role')
+```
+Replace `<CR ID>` with your actual reservation ID from the EC2 console.
+```bash
+export CR_ID=<CR ID>
+kubectl apply -f - << EOF
+apiVersion: eks.amazonaws.com/v1
+kind: NodeClass
+metadata:
+  name: gpu-reserved
+spec:
+  role: ${NC_ROLE}
+  subnetSelectorTerms: ${NC_SUBNETS}
+  securityGroupSelectorTerms: ${NC_SG}
+  capacityReservationSelectorTerms:
+    # Select by reservation ID (ODCR or Capacity Block)
+    - id: "${CR_ID}"
+    # Or select by tags (can be combined)
+    # - tags:
+    #     team: "dynamo"
+EOF
+```
+Wait until the status of the capacityReservation state is `active`.
+```bash
+kubectl get nodeclass gpu-reserved -o json | jq '.status.capacityReservations'
+[
+  {
+    "availabilityZone": "us-east-2c",
+    "endTime": "2026-03-18T11:30:00Z",
+    "id": "cr-xxxxxxxxxxxxxx",
+    "instanceMatchCriteria": "targeted",
+    "instanceType": "p5.48xlarge",
+    "ownerID": "xxxxxxxxxxx",
+    "reservationType": "capacity-block",
+    "state": "active"
+  }
+]
+```
+### Create a NodePool for Reserved Capacity
+Create a NodePool that references the `gpu-reserved` NodeClass and uses the `reserved` capacity type. You can optionally include `on-demand` and `spot` as a fallback when the reservation is exhausted.
+```bash
+kubectl apply -f - << EOF
+apiVersion: karpenter.sh/v1
+kind: NodePool
+metadata:
+  name: gpu-reserved
+spec:
+  disruption:
+    budgets:
+      - nodes: 10%
+    consolidateAfter: 300s
+    consolidationPolicy: WhenEmptyOrUnderutilized
+  template:
+    spec:
+      nodeClassRef:
+        group: eks.amazonaws.com
+        kind: NodeClass
+        name: gpu-reserved
+      requirements:
+        - key: karpenter.sh/capacity-type
+          operator: In
+          values:
+            - reserved
+            # Uncomment to fallback to on-demand or spot when reservation is exhausted
+            # - on-demand
+            # - spot
+        - key: eks.amazonaws.com/instance-family
+          operator: In
+          values:
+            - g6e
+            - g7e
+            - p5
+            - p5e
+            - p5en
+      taints:
+        - effect: NoSchedule
+          key: nvidia.com/gpu
+          value: Exists
+EOF
+```
+Validate that the `gpu-reserved` NodePool is ready
+```bash
+kubectl get nodepool gpu-reserved
+NAME           NODECLASS      NODES   READY   AGE
+gpu-reserved   gpu-reserved   0       True    8s
+```
+> **Note:** When configuring `capacityReservationSelectorTerms` on any NodeClass in the cluster, EKS Auto Mode will stop automatically using open ODCRs for all NodeClasses. Make sure all NodeClasses that should use ODCRs have explicit selector terms configured.
+### Targeting Reserved Nodes from Workloads
+Pods are scheduled onto reserved nodes through the existing NodePool requirements and taints. If you want to ensure a workload only runs on reserved capacity, add a node selector:
+```yaml
+      nodeSelector:
+        karpenter.sh/capacity-type: reserved
+      tolerations:
+        - key: nvidia.com/gpu
+          operator: Exists
+          effect: NoSchedule
+```
+### Capacity Blocks Considerations
+Capacity Blocks have a fixed end time. EC2 begins terminating instances 30 minutes before the block expires (60 minutes for UltraServer types). Karpenter will start draining nodes 10 minutes before EC2 termination begins, giving your workloads time to gracefully shut down.
+Plan your inference workloads accordingly, and consider using `on-demand` as a fallback capacity type in the NodePool if you need continuity beyond the Capacity Block window.
+## Cleanup
+Delete all DynamoGraphDeployment
+```bash
+kubectl -n ${DYNAMO_NAMESPACE} get dgd
+# If you have any, delete them
+kubectl -n ${DYNAMO_NAMESPACE} delete dgd <name>
+```
+Uninstall Dynamo platform
+```bash
+helm uninstall -n ${DYNAMO_NAMESPACE} dynamo-platform
+```
+Clean leftover PVCs related to NATS
+```bash
+kubectl -n ${DYNAMO_NAMESPACE} get pvc
+NAME                                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
+dynamo-platform-nats-js-dynamo-platform-nats-0   Bound    pvc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx   10Gi       RWO            auto-ebs-sc    <unset>                 75m
+kubectl -n ${DYNAMO_NAMESPACE} delete pvc dynamo-platform-nats-js-dynamo-platform-nats-0
+```
+Delete the AutoMode GPU nodepool
+```bash
+kubectl delete nodepool gpu
+```
+Cleanup EFS related resources, follow the [EFS setup guide](EFS.md#cleanup) cleanup section
+Delete the EKS Auto Mode cluster using Eksctl
+```bash
+eksctl delete cluster -f <(envsubst < templates/eksctl.yaml)
+```
\ No newline at end of file
--- a/examples/deployments/EKS/automode-np-gpu.yaml
+++ b/examples/deployments/EKS/automode-np-gpu.yaml
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+apiVersion: karpenter.sh/v1
+kind: NodePool
+metadata:
+  name: gpu
+spec:
+  disruption:
+    budgets:
+      - nodes: 10%
+    consolidateAfter: 300s
+    consolidationPolicy: WhenEmptyOrUnderutilized
+  template:
+    spec:
+      nodeClassRef:
+        group: eks.amazonaws.com
+        kind: NodeClass
+        name: default
+      requirements:
+        - key: karpenter.sh/capacity-type
+          operator: In
+          values:
+            - spot
+            - on-demand
+        - key: eks.amazonaws.com/instance-family
+          operator: In
+          values:
+            - g5
+            - g6
+            - g6e
+            - g7e
+            - p5
+            - p5e
+            - p5en
+      taints:
+        - effect: NoSchedule
+          key: nvidia.com/gpu
+          value: Exists
--- a/examples/deployments/EKS/manifests/model-download/kustomization.yaml
+++ b/examples/deployments/EKS/manifests/model-download/kustomization.yaml
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+resources:
+  - model-download.yaml
+patches:
+  - target:
+      kind: Job
+      name: model-download
+    patch: |
+      apiVersion: batch/v1
+      kind: Job
+      metadata:
+        name: model-download
+      spec:
+        template:
+          spec:
+            containers:
+              - name: model-download
+                resources:
+                  requests:
+                    memory: "4Gi"
--- a/examples/deployments/EKS/manifests/vllm/agg.yaml
+++ b/examples/deployments/EKS/manifests/vllm/agg.yaml
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+apiVersion: nvidia.com/v1alpha1
+kind: DynamoGraphDeployment
+metadata:
+  name: vllm-agg
+spec:
+  services:
+    Frontend:
+      componentType: frontend
+      replicas: 1
+      extraPodSpec:
+        mainContainer:
+          image: nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0
+    VllmDecodeWorker:
+      envFromSecret: hf-token-secret
+      componentType: worker
+      replicas: 1
+      resources:
+        requests:
+          gpu: "1"
+        limits:
+          gpu: "1"
+      extraPodSpec:
+        nodeSelector:
+          node.kubernetes.io/instance-type: g6e.2xlarge
+        tolerations:
+        - key: "nvidia.com/gpu"
+          operator: "Exists"
+          effect: "NoSchedule"
+        mainContainer:
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0-efa-amd64
+          workingDir: /workspace/examples/backends/vllm
+          command:
+          - /bin/bash
+          - -c
+          - |
+            exec python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B
\ No newline at end of file
--- a/examples/deployments/EKS/manifests/vllm/disagg-p5.yaml
+++ b/examples/deployments/EKS/manifests/vllm/disagg-p5.yaml
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+apiVersion: nvidia.com/v1alpha1
+kind: DynamoGraphDeployment
+metadata:
+  name: vllm-disagg
+spec:
+  pvcs:
+  - create: false
+    name: model-cache
+  - create: false
+    name: compilation-cache
+  services:
+    Frontend:
+      componentType: frontend
+      envs:
+        - name: HF_HOME
+          value: /home/dynamo/.cache/huggingface
+      replicas: 1
+      resources:
+        requests:
+          cpu: "8"
+        limits:
+          cpu: "8"
+      volumeMounts:
+      - name: model-cache
+        mountPoint: /home/dynamo/.cache/huggingface
+      extraPodSpec:
+        affinity:
+          podAffinity:
+            requiredDuringSchedulingIgnoredDuringExecution:
+              - topologyKey: "topology.kubernetes.io/zone"
+                labelSelector:
+                  matchLabels:
+                    nvidia.com/dynamo-graph-deployment-name: "vllm-disagg"
+                    nvidia.com/dynamo-component-type: "worker"
+        mainContainer:
+          image: nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0
+          workingDir: /workspace
+          command:
+          - /bin/bash
+          - -c
+          - |
+            exec python3 -m dynamo.frontend --router-mode kv --router-reset-states
+    VllmDecodeWorker:
+      envFromSecret: hf-token-secret
+      componentType: worker
+      subComponentType: decode
+      replicas: 2
+      resources:
+        requests:
+          gpu: "2"
+          custom:
+            vpc.amazonaws.com/efa: "8"
+        limits:
+          gpu: "2"
+          custom:
+            vpc.amazonaws.com/efa: "8"
+      volumeMounts:
+      - name: model-cache
+        mountPoint: /home/dynamo/.cache/huggingface
+      - name: compilation-cache
+        mountPoint: /home/dynamo/.cache/vllm
+        useAsCompilationCache: true
+      extraPodSpec:
+        nodeSelector:
+          node.kubernetes.io/instance-type: p5.48xlarge
+          karpenter.sh/capacity-type: reserved
+        tolerations:
+        - key: "nvidia.com/gpu"
+          operator: "Exists"
+          effect: "NoSchedule"
+        affinity:
+          podAffinity:
+            requiredDuringSchedulingIgnoredDuringExecution:
+              - topologyKey: "topology.kubernetes.io/zone"
+                labelSelector:
+                  matchLabels:
+                    nvidia.com/dynamo-graph-deployment-name: "vllm-disagg"
+          podAntiAffinity:
+            preferredDuringSchedulingIgnoredDuringExecution:
+              - weight: 100
+                podAffinityTerm:
+                  labelSelector:
+                    matchLabels:
+                      nvidia.com/dynamo-component: "VllmPrefillWorker"
+                  topologyKey: "kubernetes.io/hostname"
+        mainContainer:
+          env:
+          - name: HF_HOME
+            value: /home/dynamo/.cache/huggingface
+          securityContext:
+            capabilities:
+              add:
+                - IPC_LOCK
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0-efa-amd64
+          workingDir: /workspace
+          command:
+          - /bin/bash
+          - -c
+          - |
+            exec python3 -m dynamo.vllm \
+             --model Qwen/Qwen3-32B \
+             --tensor-parallel-size 2 \
+             --disaggregation-mode decode \
+             --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_connector_extra_config": {"backends": ["LIBFABRIC"]}}'
+    VllmPrefillWorker:
+      envFromSecret: hf-token-secret
+      componentType: worker
+      subComponentType: prefill
+      replicas: 6
+      resources:
+        requests:
+          gpu: "2"
+          custom:
+            vpc.amazonaws.com/efa: "8"
+        limits:
+          gpu: "2"
+          custom:
+            vpc.amazonaws.com/efa: "8"
+      volumeMounts:
+      - name: model-cache
+        mountPoint: /home/dynamo/.cache/huggingface
+      - name: compilation-cache
+        mountPoint: /home/dynamo/.cache/vllm
+        useAsCompilationCache: true
+      extraPodSpec:
+        nodeSelector:
+          node.kubernetes.io/instance-type: p5.48xlarge
+          karpenter.sh/capacity-type: reserved
+        tolerations:
+        - key: "nvidia.com/gpu"
+          operator: "Exists"
+          effect: "NoSchedule"
+        affinity:
+          podAffinity:
+            requiredDuringSchedulingIgnoredDuringExecution:
+              - topologyKey: "topology.kubernetes.io/zone"
+                labelSelector:
+                  matchLabels:
+                    nvidia.com/dynamo-graph-deployment-name: "vllm-disagg"
+          podAntiAffinity:
+            preferredDuringSchedulingIgnoredDuringExecution:
+              - weight: 100
+                podAffinityTerm:
+                  labelSelector:
+                    matchLabels:
+                      nvidia.com/dynamo-component: "VllmPrefillWorker"
+                  topologyKey: "kubernetes.io/hostname"
+        mainContainer:
+          env:
+          - name: HF_HOME
+            value: /home/dynamo/.cache/huggingface
+          securityContext:
+            capabilities:
+              add:
+                - IPC_LOCK
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0-efa-amd64
+          workingDir: /workspace
+          command:
+          - /bin/bash
+          - -c
+          - |
+            exec python3 -m dynamo.vllm \
+             --model Qwen/Qwen3-32B \
+             --tensor-parallel-size 2 \
+             --disaggregation-mode prefill \
+             --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_connector_extra_config": {"backends": ["LIBFABRIC"]}}'
\ No newline at end of file
--- a/examples/deployments/EKS/manifests/vllm/disagg-tcp.yaml
+++ b/examples/deployments/EKS/manifests/vllm/disagg-tcp.yaml
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+apiVersion: nvidia.com/v1alpha1
+kind: DynamoGraphDeployment
+metadata:
+  name: vllm-disagg
+spec:
+  services:
+    Frontend:
+      componentType: frontend
+      replicas: 1
+      extraPodSpec:
+        affinity:
+          podAffinity:
+            requiredDuringSchedulingIgnoredDuringExecution:
+              - topologyKey: "topology.kubernetes.io/zone"
+                labelSelector:
+                  matchLabels:
+                    nvidia.com/dynamo-graph-deployment-name: "vllm-disagg"
+        mainContainer:
+          image: nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0
+    VllmDecodeWorker:
+      envFromSecret: hf-token-secret
+      componentType: worker
+      subComponentType: decode
+      replicas: 1
+      resources:
+        requests:
+          gpu: "1"
+        limits:
+          gpu: "1"
+      extraPodSpec:
+        tolerations:
+        - key: "nvidia.com/gpu"
+          operator: "Exists"
+          effect: "NoSchedule"
+        nodeSelector:
+          node.kubernetes.io/instance-type: g6e.2xlarge
+        affinity:
+          podAffinity:
+            requiredDuringSchedulingIgnoredDuringExecution:
+              - topologyKey: "topology.kubernetes.io/zone"
+                labelSelector:
+                  matchLabels:
+                    nvidia.com/dynamo-graph-deployment-name: "vllm-disagg"
+          podAntiAffinity:
+            requiredDuringSchedulingIgnoredDuringExecution:
+              - labelSelector:
+                  matchLabels:
+                    nvidia.com/dynamo-component: "VllmPrefillWorker"
+                topologyKey: "kubernetes.io/hostname"
+        mainContainer:
+          securityContext:
+            capabilities:
+              add:
+                - IPC_LOCK
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0-efa-amd64
+          workingDir: /workspace/examples/backends/vllm
+          command:
+          - /bin/bash
+          - -c
+          - |
+            exec python3 -m dynamo.vllm \
+             --model Qwen/Qwen3-0.6B \
+             --disaggregation-mode decode \
+             --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_buffer_device":"cpu","kv_connector_extra_config": {"backends": ["LIBFABRIC"]}}'
+    VllmPrefillWorker:
+      envFromSecret: hf-token-secret
+      componentType: worker
+      subComponentType: prefill
+      replicas: 1
+      resources:
+        requests:
+          gpu: "1"
+        limits:
+          gpu: "1"
+      extraPodSpec:
+        tolerations:
+        - key: "nvidia.com/gpu"
+          operator: "Exists"
+          effect: "NoSchedule"
+        nodeSelector:
+          node.kubernetes.io/instance-type: g6e.2xlarge
+        affinity:
+          podAffinity:
+            requiredDuringSchedulingIgnoredDuringExecution:
+              - topologyKey: "topology.kubernetes.io/zone"
+                labelSelector:
+                  matchLabels:
+                    nvidia.com/dynamo-graph-deployment-name: "vllm-disagg"
+          podAntiAffinity:
+            requiredDuringSchedulingIgnoredDuringExecution:
+              - labelSelector:
+                  matchLabels:
+                    nvidia.com/dynamo-component: "VllmDecodeWorker"
+                topologyKey: "kubernetes.io/hostname"
+        mainContainer:
+          securityContext:
+            capabilities:
+              add:
+                - IPC_LOCK
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0-efa-amd64
+          workingDir: /workspace/examples/backends/vllm
+          command:
+          - /bin/bash
+          - -c
+          - |
+            exec python3 -m dynamo.vllm \
+             --model Qwen/Qwen3-0.6B \
+             --disaggregation-mode prefill \
+             --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_buffer_device":"cpu","kv_connector_extra_config": {"backends": ["LIBFABRIC"]}}'
\ No newline at end of file
--- a/examples/deployments/EKS/manifests/vllm/disagg.yaml
+++ b/examples/deployments/EKS/manifests/vllm/disagg.yaml
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+apiVersion: nvidia.com/v1alpha1
+kind: DynamoGraphDeployment
+metadata:
+  name: vllm-disagg
+spec:
+  pvcs:
+  - create: false
+    name: model-cache
+  - create: false
+    name: compilation-cache
+  services:
+    Frontend:
+      componentType: frontend
+      envs:
+        - name: HF_HOME
+          value: /home/dynamo/.cache/huggingface
+      replicas: 1
+      resources:
+        requests:
+          cpu: "8"
+        limits:
+          cpu: "8"
+      volumeMounts:
+      - name: model-cache
+        mountPoint: /home/dynamo/.cache/huggingface
+      extraPodSpec:
+        affinity:
+          podAffinity:
+            requiredDuringSchedulingIgnoredDuringExecution:
+              - topologyKey: "topology.kubernetes.io/zone"
+                labelSelector:
+                  matchLabels:
+                    nvidia.com/dynamo-graph-deployment-name: "vllm-disagg"
+                    nvidia.com/dynamo-component-type: "worker"
+        mainContainer:
+          image: nvcr.io/nvidia/ai-dynamo/dynamo-frontend:1.0.0
+          workingDir: /workspace
+          command:
+          - /bin/bash
+          - -c
+          - |
+            exec python3 -m dynamo.frontend --router-mode kv --router-reset-states
+    VllmDecodeWorker:
+      envFromSecret: hf-token-secret
+      componentType: worker
+      subComponentType: decode
+      replicas: 1
+      resources:
+        requests:
+          gpu: "1"
+          custom:
+            vpc.amazonaws.com/efa: "1"
+        limits:
+          gpu: "1"
+          custom:
+            vpc.amazonaws.com/efa: "1"
+      volumeMounts:
+      - name: model-cache
+        mountPoint: /home/dynamo/.cache/huggingface
+      - name: compilation-cache
+        mountPoint: /home/dynamo/.cache/vllm
+        useAsCompilationCache: true
+      extraPodSpec:
+        nodeSelector:
+          node.kubernetes.io/instance-type: g7e.12xlarge
+        tolerations:
+        - key: "nvidia.com/gpu"
+          operator: "Exists"
+          effect: "NoSchedule"
+        affinity:
+          podAffinity:
+            requiredDuringSchedulingIgnoredDuringExecution:
+              - topologyKey: "topology.kubernetes.io/zone"
+                labelSelector:
+                  matchLabels:
+                    nvidia.com/dynamo-graph-deployment-name: "vllm-disagg"
+          podAntiAffinity:
+            requiredDuringSchedulingIgnoredDuringExecution:
+              - labelSelector:
+                  matchLabels:
+                    nvidia.com/dynamo-component: "VllmPrefillWorker"
+                topologyKey: "kubernetes.io/hostname"
+        mainContainer:
+          env:
+          - name: HF_HOME
+            value: /home/dynamo/.cache/huggingface
+          securityContext:
+            capabilities:
+              add:
+                - IPC_LOCK
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0-efa-amd64
+          workingDir: /workspace
+          command:
+          - /bin/bash
+          - -c
+          - |
+            exec python3 -m dynamo.vllm \
+             --model Qwen/Qwen3-32B \
+             --disaggregation-mode decode \
+             --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_connector_extra_config": {"backends": ["LIBFABRIC"]}}'
+    VllmPrefillWorker:
+      envFromSecret: hf-token-secret
+      componentType: worker
+      subComponentType: prefill
+      replicas: 1
+      resources:
+        requests:
+          gpu: "1"
+          custom:
+            vpc.amazonaws.com/efa: "1"
+        limits:
+          gpu: "1"
+          custom:
+            vpc.amazonaws.com/efa: "1"
+      volumeMounts:
+      - name: model-cache
+        mountPoint: /home/dynamo/.cache/huggingface
+      - name: compilation-cache
+        mountPoint: /home/dynamo/.cache/vllm
+        useAsCompilationCache: true
+      extraPodSpec:
+        nodeSelector:
+          node.kubernetes.io/instance-type: g7e.12xlarge
+        tolerations:
+        - key: "nvidia.com/gpu"
+          operator: "Exists"
+          effect: "NoSchedule"
+        affinity:
+          podAffinity:
+            requiredDuringSchedulingIgnoredDuringExecution:
+              - topologyKey: "topology.kubernetes.io/zone"
+                labelSelector:
+                  matchLabels:
+                    nvidia.com/dynamo-graph-deployment-name: "vllm-disagg"
+          podAntiAffinity:
+            requiredDuringSchedulingIgnoredDuringExecution:
+              - labelSelector:
+                  matchLabels:
+                    nvidia.com/dynamo-component: "VllmDecodeWorker"
+                topologyKey: "kubernetes.io/hostname"
+        mainContainer:
+          env:
+          - name: HF_HOME
+            value: /home/dynamo/.cache/huggingface
+          securityContext:
+            capabilities:
+              add:
+                - IPC_LOCK
+          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0-efa-amd64
+          workingDir: /workspace
+          command:
+          - /bin/bash
+          - -c
+          - |
+            exec python3 -m dynamo.vllm \
+             --model Qwen/Qwen3-32B \
+             --disaggregation-mode prefill \
+             --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_connector_extra_config": {"backends": ["LIBFABRIC"]}}'
\ No newline at end of file
--- a/examples/deployments/EKS/templates/eksctl.yaml
+++ b/examples/deployments/EKS/templates/eksctl.yaml
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+apiVersion: eksctl.io/v1alpha5
+kind: ClusterConfig
+metadata:
+  name: ${CLUSTER_NAME}
+  region: ${AWS_REGION}
+availabilityZones:
+${EKS_CP_AZS}
+autoModeConfig:
+  enabled: true
+addonsConfig:
+  disableDefaultAddons: true
+addons:
+  - name: aws-efs-csi-driver
+    version: latest
+    useDefaultPodIdentityAssociations: true