fix: update ECS doc (#3500)

Signed-off-by: Kyle H <kylhuang@nvidia.com>

fix: update ECS doc (#3500)
Signed-off-by: Kyle H <kylhuang@nvidia.com>
db192a35 · kYLe · GitHub · bd28e442 · db192a35 · db192a35
Unverified Commit db192a35 authored Oct 09, 2025 by kYLe Committed by GitHub Oct 08, 2025
3 changed files
--- a/examples/deployments/ECS/README.md
+++ b/examples/deployments/ECS/README.md
@@ -3,7 +3,7 @@
 1. Go to AWS ECS console, **Clusters** tab and click on **Create cluster** with name `dynamo-GPU`
 2. Input the cluster name and choose **AWS EC2 instances** as the infrastructure. This option will create a cluster with EC2 instances to deploy containers.
 3. Choose the ECS-optimized GPU AMI `Amazon Linux 2 (GPU)` (Amazon ECS–optimized), which includes NVIDIA drivers and the Docker GPU runtime out of the box.
-4. Choose `g6e.2xlarge` as the **EC2 instance type** and add an `SSH Key pair` so you can log in the instance for debugging purpose.
+4. Choose `g6e.2xlarge` as the **EC2 instance type** and add an `SSH Key pair` so you can log in the instance for debugging purpose. To test with disaggregated serving, we need at least 2 GPUs, so you can choose `g6e.12xlarge` with 4 GPUs
 5. Set **Root EBS volume size** as `200`
 6. For the networking, use the default settings. Make sure the **security group** has
 - an inbound rule which allows "All traffic" from this security group.
@@ -31,6 +31,8 @@ Before creating the task definitions, you need to create the `ecsTaskExecutionRo

 Follow the [AWS documentation on creating the task execution IAM role](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_execution_IAM_role.html#create-task-execution-role) to create a role named `ecsTaskExecutionRole` with the `AmazonECSTaskExecutionRolePolicy` policy attached.

+Based on the task definition, you may need to add Amazon CloudWatch permissions and AWS Secrets Manager permissions to the `ecsTaskExecutionRole`. See details in the [Amazon CloudWatch Logs permissions reference](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/permissions-reference-cwl.html) the [AWS Secrets Manager authentication and access control guide](https://docs.aws.amazon.com/secretsmanager/latest/userguide/auth-and-access.html#auth-and-access_secrets)
+
 > [!NOTE]
 > The role ARN will be `arn:aws:iam::<your-account-id>:role/ecsTaskExecutionRole`. Make sure to update `<your-account-id>` in any task definition JSON files with your actual AWS account ID.

@@ -61,8 +63,8 @@ Follow the [AWS documentation on creating the task execution IAM role](https://d
 > [!Note]
 > Ensure you have created the `ecsTaskExecutionRole` as described in section 3.1 before creating these task definitions.

-1. Dynamo vLLM Frontend Task
-This task will create vLLM frontend, processors, routers and a decode worker.
+1. Dynamo vLLM Frontend and Decoding Worker Task
+This task will create vLLM frontend, processors, routers and a decoding worker.
 Please follow steps below to create this task
 - Set container name as `dynamo-frontend` and use prebuild [Dynamo container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/containers/vllm-runtime).
 - Choose `Amazon EC2 instances` as the **Launch type** with **Task size** `2 vCPU` and `40 GB`memory
@@ -99,11 +101,12 @@ You can create a service or directly run the task from the task definition
 - For **Security group**, select the same security group used by the EC2 cluster
 - Verify that outbound rules allow all traffic for downloading images and external communication
 - Wait for this deployment to finish, and get the **Private IP** of this task.
-2. Dynamo Frontend Task
+2. Dynamo Frontend and Decoding Worker Task
 - Choose the EC2 cluster (`dynamo-GPU`) for **Existing cluster** created in step 1.
 - In the **Container Overrides**, use the IP for ETCD/NATS task for the `ETCD_ENDPOINTS` and `NATS_SERVER` values.
+- After the deployment, an aggregated serving endpoint is created and you can test it with scripts in step 6.
 3. Dynamo PrefillWorker Task
- Choose the EC2 cluster (`dynamo-GPU`) for **Existing cluster** created in step 1.
+- For disaggregated serving, you can deploy a separate prefill worker on another GPU. Choose the EC2 cluster (`dynamo-GPU`) for **Existing cluster** created in step 1 with at least 2 GPUs ( `g6e.12xlarge` for example)
 - In the **Container Overrides**, use the IP for ETCD/NATS task for the `ETCD_ENDPOINTS` and `NATS_SERVER` values.

 ## 6. Testing

--- a/examples/deployments/ECS/task_definition_frontend.json
+++ b/examples/deployments/ECS/task_definition_frontend.json
@@ -5,7 +5,7 @@
            "name": "dynamo-vllm-frontend",
            "image": "nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag",
            "repositoryCredentials": {
-                "credentialsParameter": "arn:aws:secretsmanager:us-east-2:AWS_ID:secret:ngc_nvcr_access"
+                "credentialsParameter": "arn:aws:secretsmanager:AWS_REGION:AWS_ID:secret:ngc_nvcr_access"
            },
            "cpu": 0,
            "portMappings": [
@@ -46,7 +46,7 @@
                    "mode": "non-blocking",
                    "awslogs-create-group": "true",
                    "max-buffer-size": "25m",
-                    "awslogs-region": "us-east-2",
+                    "awslogs-region": "AWS_REGION",
                    "awslogs-stream-prefix": "ecs"
                },
                "secretOptions": []

--- a/examples/deployments/ECS/task_definition_prefillworker.json
+++ b/examples/deployments/ECS/task_definition_prefillworker.json
@@ -5,7 +5,7 @@
            "name": "dynamo-prefill",
            "image": "nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag",
            "repositoryCredentials": {
-                "credentialsParameter": "arn:aws:secretsmanager:us-east-2:AWS_ID:secret:ngc_access"
+                "credentialsParameter": "arn:aws:secretsmanager:AWS_REGION:AWS_ID:secret:ngc_access"
            },
            "cpu": 0,
            "portMappings": [],
@@ -38,7 +38,7 @@
                    "mode": "non-blocking",
                    "awslogs-create-group": "true",
                    "max-buffer-size": "25m",
-                    "awslogs-region": "us-east-2",
+                    "awslogs-region": "AWS_REGION",
                    "awslogs-stream-prefix": "ecs"
                },
                "secretOptions": []