Unverified Commit 30942780 authored by atchernych's avatar atchernych Committed by GitHub
Browse files

docs: Create a guide for writing dynamo deployments CR (#1999)

parent f0e382ad
...@@ -115,7 +115,7 @@ For Kubernetes deployment, YAML manifests are provided in the `deploy/` director ...@@ -115,7 +115,7 @@ For Kubernetes deployment, YAML manifests are provided in the `deploy/` director
#### Prerequisites #### Prerequisites
- **Dynamo Cloud**: Follow the [Quickstart Guide](../../docs/guides/dynamo_deploy/quickstart.md) to deploy Dynamo Cloud first. - **Dynamo Cloud**: Follow the [Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to deploy Dynamo Cloud first.
- **Container Images**: The deployment files currently require access to `nvcr.io/nvidian/nim-llm-dev/vllm-runtime`. If you don't have access, build and push your own image: - **Container Images**: The deployment files currently require access to `nvcr.io/nvidian/nim-llm-dev/vllm-runtime`. If you don't have access, build and push your own image:
```bash ```bash
......
...@@ -36,13 +36,14 @@ docker login <CONTAINER_REGISTRY> ...@@ -36,13 +36,14 @@ docker login <CONTAINER_REGISTRY>
#### 🛠️ Build and push images for the Dynamo Cloud platform components #### 🛠️ Build and push images for the Dynamo Cloud platform components
[One-time Action] [One-time Action]
You should build the images for the Dynamo Cloud Platform. You should build the image(s) for the Dynamo Cloud Platform.
If you are a **👤 Dynamo User** you would do this step once. If you are a **👤 Dynamo User** you would do this step once.
```bash ```bash
export DOCKER_SERVER=<your-docker-server> export DOCKER_SERVER=<your-docker-server>
export IMAGE_TAG=<TAG> export IMAGE_TAG=<TAG>
earthly --push +all-docker --DOCKER_SERVER=$DOCKER_SERVER --IMAGE_TAG=$IMAGE_TAG cd deploy/cloud/operator
earthly --push +docker --DOCKER_SERVER=$DOCKER_SERVER --IMAGE_TAG=$IMAGE_TAG
``` ```
If you are a **🧑‍💻 Dynamo Contributor** you would have to rebuild the dynamo platform images as the code evolves. To do so please look at the [Cloud Guide](../../../docs/guides/dynamo_deploy/dynamo_cloud.md). If you are a **🧑‍💻 Dynamo Contributor** you would have to rebuild the dynamo platform images as the code evolves. To do so please look at the [Cloud Guide](../../../docs/guides/dynamo_deploy/dynamo_cloud.md).
......
...@@ -36,7 +36,7 @@ export NAMESPACE=<your-namespace> # the namespace you used to deploy Dynamo clou ...@@ -36,7 +36,7 @@ export NAMESPACE=<your-namespace> # the namespace you used to deploy Dynamo clou
Deploying an example consists of the simple `kubectl apply -f ... -n ${NAMESPACE}` command. For example: Deploying an example consists of the simple `kubectl apply -f ... -n ${NAMESPACE}` command. For example:
```bash ```bash
kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE} kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}
``` ```
You can use `kubectl get dynamoGraphDeployment -n ${NAMESPACE}` to view your deployment. You can use `kubectl get dynamoGraphDeployment -n ${NAMESPACE}` to view your deployment.
......
# Creating Kubernetes Deployments
The scripts in the `components/<backend>/launch` folder like [agg.sh](../../../components/backends/vllm/launch/agg.sh) demonstrate how you can serve your models locally.
The corresponding YAML files like [agg.yaml](../../../components/backends/vllm/deploy/agg.yaml) show you how you could create a kubernetes deployment for your inference graph.
This guide explains how to create your own deployment files.
## Step 1: Choose Your Architecture Pattern
Select the architecture pattern as your template that best fits your use case.
For example, when using the `VLLM` inference backend:
- **Development / Testing**
Use [`agg.yaml`](../../../components/backends/vllm/deploy/agg.yaml) as the base configuration.
- **Production with Load Balancing**
Use [`agg_router.yaml`](../../../components/backends/vllm/deploy/agg_router.yaml) to enable scalable, load-balanced inference.
- **High Performance / Disaggregated Deployment**
Use [`disagg_router.yaml`](../../../components/backends/vllm/deploy/disagg_router.yaml) for maximum throughput and modular scalability.
## Step 2: Customize the Template
You can run the Frontend on one machine, for example a CPU node, and the worker on a different machine (a GPU node).
The Frontend serves as a framework-agnostic HTTP entry point and is likely not to need many changes.
It serves the following roles:
1. OpenAI-Compatible HTTP Server
* Provides `/v1/chat/completions` endpoint
* Handles HTTP request/response formatting
* Supports streaming responses
* Validates incoming requests
2. Service Discovery and Routing
* Auto-discovers backend workers via etcd
* Routes requests to the appropriate Processor/Worker components
* Handles load balancing between multiple workers
3. Request Preprocessing
* Initial request validation
* Model name verification
* Request format standardization
You should then pick a worker and specialize the config. For example,
```yaml
VllmWorker: # vLLM-specific config
enforce-eager: true
enable-prefix-caching: true
SglangWorker: # SGLang-specific config
router-mode: kv
disagg-mode: true
TrtllmWorker: # TensorRT-LLM-specific config
engine-config: ./engine.yaml
kv-cache-transfer: ucx
```
Here's a template structure based on the examples:
```yaml
YourWorker:
dynamoNamespace: your-namespace
componentType: worker
replicas: N
envFromSecret: your-secrets # e.g., hf-token-secret
# Health checks for worker initialization
readinessProbe:
exec:
command: ["/bin/sh", "-c", 'grep "Worker.*initialized" /tmp/worker.log']
resources:
requests:
gpu: "1" # GPU allocation
extraPodSpec:
mainContainer:
image: your-image
command:
- /bin/sh
- -c
args:
- python -m dynamo.YOUR_INFERENCE_ENGINE --model YOUR_MODEL --your-flags
```
Consult the corresponding sh file. Each of the python commands to launch a component will go into your yaml spec under the
`extraPodSpec: -> mainContainer: -> args:`
The front end is launched with "python3 -m dynamo.frontend [--http-port 8000] [--router-mode kv]"
Each worker will launch `python -m dynamo.YOUR_INFERENCE_BACKEND --model YOUR_MODEL --your-flags `command.
If you are a Dynamo contributor the [dynamo run guide](../dynamo_run.md) for details on how to run this command.
## Step 3: Key Customization Points
### Model Configuration
```yaml
args:
- "python -m dynamo.YOUR_INFERENCE_BACKEND --model YOUR_MODEL --your-flag"
```
### Resource Allocation
```yaml
resources:
requests:
cpu: "N"
memory: "NGi"
gpu: "N"
```
### Scaling
```yaml
replicas: N # Number of worker instances
```
### Routing Mode
```yaml
args:
- --router-mode
- kv # Enable KV-cache routing
```
### Worker Specialization
```yaml
args:
- --is-prefill-worker # For disaggregated prefill workers
```
\ No newline at end of file
...@@ -64,13 +64,10 @@ Use this approach when developing or customizing Dynamo as a contributor, or usi ...@@ -64,13 +64,10 @@ Use this approach when developing or customizing Dynamo as a contributor, or usi
Ensure you have the source code checked out and are in the `dynamo` directory: Ensure you have the source code checked out and are in the `dynamo` directory:
```bash
cd deploy/cloud/helm/
```
### Set Environment Variables ### Set Environment Variables
Our examples use the `nvcr.io` but you can setup your own values if you use another docker registry. Our examples use the [`nvcr.io`](nvcr.io/nvidia/ai-dynamo/) but you can setup your own values if you use another docker registry.
```bash ```bash
export NAMESPACE=dynamo-cloud # or whatever you prefer. export NAMESPACE=dynamo-cloud # or whatever you prefer.
...@@ -98,15 +95,13 @@ docker login <your-registry> ...@@ -98,15 +95,13 @@ docker login <your-registry>
docker push <your-registry>/dynamo-base:latest-vllm docker push <your-registry>/dynamo-base:latest-vllm
``` ```
[More on image building](../../../../README.md)
### Install Dynamo Cloud ### Install Dynamo Cloud
You need to build and push the Dynamo Cloud Operator Image by running You need to build and push the Dynamo Cloud Operator Image by running
```bash ```bash
earthly --push +all-docker --DOCKER_SERVER=$DOCKER_SERVER --IMAGE_TAG=$IMAGE_TAG cd deploy/cloud/operator
earthly --push +docker --DOCKER_SERVER=$DOCKER_SERVER --IMAGE_TAG=$IMAGE_TAG
``` ```
The Nvidia Cloud Operator image will be pulled from the `$DOCKER_SERVER/dynamo-operator:$IMAGE_TAG`. The Nvidia Cloud Operator image will be pulled from the `$DOCKER_SERVER/dynamo-operator:$IMAGE_TAG`.
...@@ -196,4 +191,5 @@ kubectl create secret generic hf-token-secret \ ...@@ -196,4 +191,5 @@ kubectl create secret generic hf-token-secret \
-n ${NAMESPACE} -n ${NAMESPACE}
``` ```
Follow the [Examples](../../examples/README.md) Follow the [Examples](../../examples/README.md)
\ No newline at end of file For more details on how to create your own deployments follow [Create Deployment Guide](create_deployment.md)
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment