Unverified Commit db192a35 authored by kYLe's avatar kYLe Committed by GitHub
Browse files

fix: update ECS doc (#3500)


Signed-off-by: default avatarKyle H <kylhuang@nvidia.com>
parent bd28e442
...@@ -3,7 +3,7 @@ ...@@ -3,7 +3,7 @@
1. Go to AWS ECS console, **Clusters** tab and click on **Create cluster** with name `dynamo-GPU` 1. Go to AWS ECS console, **Clusters** tab and click on **Create cluster** with name `dynamo-GPU`
2. Input the cluster name and choose **AWS EC2 instances** as the infrastructure. This option will create a cluster with EC2 instances to deploy containers. 2. Input the cluster name and choose **AWS EC2 instances** as the infrastructure. This option will create a cluster with EC2 instances to deploy containers.
3. Choose the ECS-optimized GPU AMI `Amazon Linux 2 (GPU)` (Amazon ECS–optimized), which includes NVIDIA drivers and the Docker GPU runtime out of the box. 3. Choose the ECS-optimized GPU AMI `Amazon Linux 2 (GPU)` (Amazon ECS–optimized), which includes NVIDIA drivers and the Docker GPU runtime out of the box.
4. Choose `g6e.2xlarge` as the **EC2 instance type** and add an `SSH Key pair` so you can log in the instance for debugging purpose. 4. Choose `g6e.2xlarge` as the **EC2 instance type** and add an `SSH Key pair` so you can log in the instance for debugging purpose. To test with disaggregated serving, we need at least 2 GPUs, so you can choose `g6e.12xlarge` with 4 GPUs
5. Set **Root EBS volume size** as `200` 5. Set **Root EBS volume size** as `200`
6. For the networking, use the default settings. Make sure the **security group** has 6. For the networking, use the default settings. Make sure the **security group** has
- an inbound rule which allows "All traffic" from this security group. - an inbound rule which allows "All traffic" from this security group.
...@@ -31,6 +31,8 @@ Before creating the task definitions, you need to create the `ecsTaskExecutionRo ...@@ -31,6 +31,8 @@ Before creating the task definitions, you need to create the `ecsTaskExecutionRo
Follow the [AWS documentation on creating the task execution IAM role](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_execution_IAM_role.html#create-task-execution-role) to create a role named `ecsTaskExecutionRole` with the `AmazonECSTaskExecutionRolePolicy` policy attached. Follow the [AWS documentation on creating the task execution IAM role](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_execution_IAM_role.html#create-task-execution-role) to create a role named `ecsTaskExecutionRole` with the `AmazonECSTaskExecutionRolePolicy` policy attached.
Based on the task definition, you may need to add Amazon CloudWatch permissions and AWS Secrets Manager permissions to the `ecsTaskExecutionRole`. See details in the [Amazon CloudWatch Logs permissions reference](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/permissions-reference-cwl.html) the [AWS Secrets Manager authentication and access control guide](https://docs.aws.amazon.com/secretsmanager/latest/userguide/auth-and-access.html#auth-and-access_secrets)
> [!NOTE] > [!NOTE]
> The role ARN will be `arn:aws:iam::<your-account-id>:role/ecsTaskExecutionRole`. Make sure to update `<your-account-id>` in any task definition JSON files with your actual AWS account ID. > The role ARN will be `arn:aws:iam::<your-account-id>:role/ecsTaskExecutionRole`. Make sure to update `<your-account-id>` in any task definition JSON files with your actual AWS account ID.
...@@ -61,8 +63,8 @@ Follow the [AWS documentation on creating the task execution IAM role](https://d ...@@ -61,8 +63,8 @@ Follow the [AWS documentation on creating the task execution IAM role](https://d
> [!Note] > [!Note]
> Ensure you have created the `ecsTaskExecutionRole` as described in section 3.1 before creating these task definitions. > Ensure you have created the `ecsTaskExecutionRole` as described in section 3.1 before creating these task definitions.
1. Dynamo vLLM Frontend Task 1. Dynamo vLLM Frontend and Decoding Worker Task
This task will create vLLM frontend, processors, routers and a decode worker. This task will create vLLM frontend, processors, routers and a decoding worker.
Please follow steps below to create this task Please follow steps below to create this task
- Set container name as `dynamo-frontend` and use prebuild [Dynamo container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/containers/vllm-runtime). - Set container name as `dynamo-frontend` and use prebuild [Dynamo container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/containers/vllm-runtime).
- Choose `Amazon EC2 instances` as the **Launch type** with **Task size** `2 vCPU` and `40 GB`memory - Choose `Amazon EC2 instances` as the **Launch type** with **Task size** `2 vCPU` and `40 GB`memory
...@@ -99,11 +101,12 @@ You can create a service or directly run the task from the task definition ...@@ -99,11 +101,12 @@ You can create a service or directly run the task from the task definition
- For **Security group**, select the same security group used by the EC2 cluster - For **Security group**, select the same security group used by the EC2 cluster
- Verify that outbound rules allow all traffic for downloading images and external communication - Verify that outbound rules allow all traffic for downloading images and external communication
- Wait for this deployment to finish, and get the **Private IP** of this task. - Wait for this deployment to finish, and get the **Private IP** of this task.
2. Dynamo Frontend Task 2. Dynamo Frontend and Decoding Worker Task
- Choose the EC2 cluster (`dynamo-GPU`) for **Existing cluster** created in step 1. - Choose the EC2 cluster (`dynamo-GPU`) for **Existing cluster** created in step 1.
- In the **Container Overrides**, use the IP for ETCD/NATS task for the `ETCD_ENDPOINTS` and `NATS_SERVER` values. - In the **Container Overrides**, use the IP for ETCD/NATS task for the `ETCD_ENDPOINTS` and `NATS_SERVER` values.
- After the deployment, an aggregated serving endpoint is created and you can test it with scripts in step 6.
3. Dynamo PrefillWorker Task 3. Dynamo PrefillWorker Task
- Choose the EC2 cluster (`dynamo-GPU`) for **Existing cluster** created in step 1. - For disaggregated serving, you can deploy a separate prefill worker on another GPU. Choose the EC2 cluster (`dynamo-GPU`) for **Existing cluster** created in step 1 with at least 2 GPUs ( `g6e.12xlarge` for example)
- In the **Container Overrides**, use the IP for ETCD/NATS task for the `ETCD_ENDPOINTS` and `NATS_SERVER` values. - In the **Container Overrides**, use the IP for ETCD/NATS task for the `ETCD_ENDPOINTS` and `NATS_SERVER` values.
## 6. Testing ## 6. Testing
......
...@@ -5,7 +5,7 @@ ...@@ -5,7 +5,7 @@
"name": "dynamo-vllm-frontend", "name": "dynamo-vllm-frontend",
"image": "nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag", "image": "nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag",
"repositoryCredentials": { "repositoryCredentials": {
"credentialsParameter": "arn:aws:secretsmanager:us-east-2:AWS_ID:secret:ngc_nvcr_access" "credentialsParameter": "arn:aws:secretsmanager:AWS_REGION:AWS_ID:secret:ngc_nvcr_access"
}, },
"cpu": 0, "cpu": 0,
"portMappings": [ "portMappings": [
...@@ -46,7 +46,7 @@ ...@@ -46,7 +46,7 @@
"mode": "non-blocking", "mode": "non-blocking",
"awslogs-create-group": "true", "awslogs-create-group": "true",
"max-buffer-size": "25m", "max-buffer-size": "25m",
"awslogs-region": "us-east-2", "awslogs-region": "AWS_REGION",
"awslogs-stream-prefix": "ecs" "awslogs-stream-prefix": "ecs"
}, },
"secretOptions": [] "secretOptions": []
......
...@@ -5,7 +5,7 @@ ...@@ -5,7 +5,7 @@
"name": "dynamo-prefill", "name": "dynamo-prefill",
"image": "nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag", "image": "nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag",
"repositoryCredentials": { "repositoryCredentials": {
"credentialsParameter": "arn:aws:secretsmanager:us-east-2:AWS_ID:secret:ngc_access" "credentialsParameter": "arn:aws:secretsmanager:AWS_REGION:AWS_ID:secret:ngc_access"
}, },
"cpu": 0, "cpu": 0,
"portMappings": [], "portMappings": [],
...@@ -38,7 +38,7 @@ ...@@ -38,7 +38,7 @@
"mode": "non-blocking", "mode": "non-blocking",
"awslogs-create-group": "true", "awslogs-create-group": "true",
"max-buffer-size": "25m", "max-buffer-size": "25m",
"awslogs-region": "us-east-2", "awslogs-region": "AWS_REGION",
"awslogs-stream-prefix": "ecs" "awslogs-stream-prefix": "ecs"
}, },
"secretOptions": [] "secretOptions": []
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment