Commit 95dd9426, authored by atchernych, committed by GitHub

docs: Post-Merge cleanup of the deploy documentation (#1922)

parent cb6de94d
...@@ -83,7 +83,7 @@ docker push <your-registry>/dynamo-base:latest-vllm ...@@ -83,7 +83,7 @@ docker push <your-registry>/dynamo-base:latest-vllm
``` ```
Notes about builds for specific frameworks: Notes about builds for specific frameworks:
- For specific details on the `--framework vllm` build, see [here](examples/llm/README.md). - For specific details on the `--framework vllm` build, see [here](examples/vllm/README.md).
- For specific details on the `--framework tensorrtllm` build, see [here](examples/tensorrt_llm/README.md). - For specific details on the `--framework tensorrtllm` build, see [here](examples/tensorrt_llm/README.md).
Note about AWS environments: Note about AWS environments:
...@@ -99,14 +99,14 @@ export DYNAMO_IMAGE=<your-registry>/dynamo-base:latest-vllm ...@@ -99,14 +99,14 @@ export DYNAMO_IMAGE=<your-registry>/dynamo-base:latest-vllm
### Running and Interacting with an LLM Locally ### Running and Interacting with an LLM Locally
To run a model and interact with it locally you can call `dynamo You can run a model and interact with it locally using commands below.
run` with a hugging face model. `dynamo run` supports several backends We support several backends including: `mistralrs`, `sglang`, `vllm`, and `tensorrtllm`.
including: `mistralrs`, `sglang`, `vllm`, and `tensorrtllm`.
#### Example Command #### Example Commands
``` ```
dynamo run out=vllm deepseek-ai/DeepSeek-R1-Distill-Llama-8B python -m dynamo.frontend [--http-port 8080]
python -m dynamo.vllm deepseek-ai/DeepSeek-R1-Distill-Llama-8B
``` ```
``` ```
......
...@@ -48,7 +48,8 @@ tools. ...@@ -48,7 +48,8 @@ tools.
Try the following to begin interacting with a model: Try the following to begin interacting with a model:
> dynamo --help > dynamo --help
> dynamo run Qwen/Qwen2.5-3B-Instruct > python -m dynamo.frontend [--http-port 8080]
> python -m dynamo.vllm Qwen/Qwen2.5-3B-Instruct
To run more complete deployment examples, instances of etcd and nats need to be To run more complete deployment examples, instances of etcd and nats need to be
accessible within the container. This is generally done by connecting to accessible within the container. This is generally done by connecting to
...@@ -58,6 +59,6 @@ cases, you can start them in the container as well: ...@@ -58,6 +59,6 @@ cases, you can start them in the container as well:
> etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://0.0.0.0:2379 --data-dir /tmp/etcd & > etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://0.0.0.0:2379 --data-dir /tmp/etcd &
With etcd/nats accessible, run the examples: With etcd/nats accessible, run the examples:
> cd examples/hello_world > cd examples
> dynamo serve hello_world:Frontend
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Dynamo Deployment Guide
This directory contains all the necessary files and instructions for deploying Dynamo in various environments. Choose the deployment method that best suits your needs:
## Directory Structure
```
deploy/
├── cloud/ # Cloud deployment configurations and tools
├── helm/ # Helm charts for manual Kubernetes deployment
├── metrics/ # Monitoring and metrics configuration
├── sdk/ # Dynamo SDK and related tools
└── README.md # This file
```
## Deployment Options
### 1. 🚀 Dynamo Cloud Platform [PREFERRED]
The Dynamo Cloud Platform provides a managed deployment experience with:
- Automated infrastructure management
- Built-in monitoring and metrics
- Simplified deployment process via `dynamo deploy` CLI commands
- Production-ready configurations
- Managed NATS and etcd dependencies
For detailed instructions, see:
- [Dynamo Cloud Platform Guide](../docs/guides/dynamo_deploy/dynamo_cloud.md)
- [Operator Deployment Guide](../docs/guides/dynamo_deploy/operator_deployment.md)
### 2. Manual Deployment with Helm Charts
For users who need more control over their deployments:
- Full control over deployment parameters
- Manual management of infrastructure
- Customizable monitoring setup
- Flexible configuration options
- Manual management of NATS and etcd dependencies
Documentation:
- [Manual Helm Deployment Guide](../docs/guides/dynamo_deploy/manual_helm_deployment.md)
- [Minikube Setup Guide](../docs/guides/dynamo_deploy/minikube.md)
## Choosing the Right Deployment Method
- **Dynamo Cloud Platform**: Best for most users, provides managed deployment with built-in monitoring
  - See [Dynamo Cloud Platform Guide](../docs/guides/dynamo_deploy/dynamo_cloud.md)
  - Recommended for production deployments
  - Simplifies dependency management
  - Provides infrastructure for user management
- **Manual Helm Deployment**: For users who need full control over their deployment
  - See [Manual Helm Deployment Guide](../docs/guides/dynamo_deploy/manual_helm_deployment.md)
  - Suitable for custom deployments
  - Requires manual management of dependencies
  - Provides maximum flexibility for users
## Example Deployments
To help you get started, we provide several example deployments:
### Hello World Example
A basic example to learn Dynamo deployment: [Hello World Example](../examples/hello_world/README.md#deploying-to-and-running-the-example-in-kubernetes)
- Shows how to deploy a simple three-service pipeline that processes text
- Provides step-by-step instructions for building your service and testing with port forwarding
- Includes sample output showing the text flow between services
### LLM Examples
Example for deploying LLM services: [LLM Example](../examples/llm/README.md#deploy-to-kubernetes)
- Demonstrates deploying and making inference requests against LLM models
- Includes examples for both aggregated and disaggregated serving
- Provides detailed deployment steps and testing instructions
./docs/guides/dynamo_deploy/README.md
\ No newline at end of file
...@@ -139,6 +139,8 @@ retry_command() { ...@@ -139,6 +139,8 @@ retry_command() {
# Update the helm repo and build the dependencies # Update the helm repo and build the dependencies
retry_command "$HELM_CMD repo add nats https://nats-io.github.io/k8s/helm/charts/" 5 5 && \ retry_command "$HELM_CMD repo add nats https://nats-io.github.io/k8s/helm/charts/" 5 5 && \
# retry_command "$HELM_CMD repo add bitnami https://charts.bitnami.com/bitnami" 5 5 && \
# retry_command "$HELM_CMD repo add minio https://charts.min.io/" 5 5 && \
retry_command "$HELM_CMD repo update" 5 5 retry_command "$HELM_CMD repo update" 5 5
......
...@@ -30,7 +30,7 @@ This approach allows you to install Dynamo directly using a DynamoGraphDeploymen ...@@ -30,7 +30,7 @@ This approach allows you to install Dynamo directly using a DynamoGraphDeploymen
### Basic Installation ### Basic Installation
```bash ```bash
helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud -f ./examples/vllm_v1/deploy/agg.yaml helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud -f ./examples/vllm/deploy/agg.yaml
``` ```
### Customizable Properties ### Customizable Properties
...@@ -39,7 +39,7 @@ You can override the default configuration by setting the following properties: ...@@ -39,7 +39,7 @@ You can override the default configuration by setting the following properties:
```bash ```bash
helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud \ helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud \
-f ./examples/vllm_v1/deploy/agg.yaml \ -f ./examples/vllm/deploy/agg.yaml \
--set "imagePullSecrets[0].name=docker-secret-1" \ --set "imagePullSecrets[0].name=docker-secret-1" \
--set etcdAddr="my-etcd-service:2379" \ --set etcdAddr="my-etcd-service:2379" \
--set natsAddr="nats://my-nats-service:4222" --set natsAddr="nats://my-nats-service:4222"
......
...@@ -13,42 +13,12 @@ This guide provides instructions for setting up the Inference Gateway with Dynam ...@@ -13,42 +13,12 @@ This guide provides instructions for setting up the Inference Gateway with Dynam
1. **Install Dynamo Cloud** 1. **Install Dynamo Cloud**
Follow the instructions in [deploy/cloud/README.md](../../deploy/cloud/README.md) to deploy Dynamo Cloud on your Kubernetes cluster. This will set up the necessary infrastructure components for managing Dynamo inference graphs. [See Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
2. **Launch 2 Dynamo Deployments**
Deploy 2 Dynamo aggregated graphs following the instructions in [examples/llm/README.md](../../examples/llm/README.md): 2. **Launch Dynamo Deployments**
### Deploy Dynamo Graphs [See VLLM Example](../../../examples/vllm/README.md)
Follow the commands to deploy 2 dynamo graphs -
```bash
# Set pre-built vLLM dynamo base container image
export VLLM_RUNTIME_IMAGE=<dynamo-vllm-base-image>
# for example:
# export VLLM_RUNTIME_IMAGE=nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.3.1
# run the following commands from dynamo repo's root folder
# Deploy first graph
export DEPLOYMENT_NAME=llm-agg1
yq eval '
.metadata.name = env(DEPLOYMENT_NAME) |
.spec.services[].extraPodSpec.mainContainer.image = env(VLLM_RUNTIME_IMAGE)
' examples/vllm_v0/deploy/agg.yaml > examples/vllm_v0/deploy/agg1.yaml
kubectl apply -f examples/vllm_v0/deploy/agg1.yaml
# Deploy second graph
export DEPLOYMENT_NAME=llm-agg2
yq eval '
.metadata.name = env(DEPLOYMENT_NAME) |
.spec.services[].extraPodSpec.mainContainer.image = env(VLLM_RUNTIME_IMAGE)
' examples/vllm_v0/deploy/agg.yaml > examples/vllm_v0/deploy/agg2.yaml
kubectl apply -f examples/vllm_v0/deploy/agg2.yaml
```
3. **Deploy Inference Gateway** 3. **Deploy Inference Gateway**
......
...@@ -57,7 +57,6 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container ...@@ -57,7 +57,6 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container
- Start the [components/metrics](../../components/metrics/README.md) application to begin monitoring for metric events from dynamo workers and aggregating them on a Prometheus metrics endpoint: `http://localhost:9091/metrics`. - Start the [components/metrics](../../components/metrics/README.md) application to begin monitoring for metric events from dynamo workers and aggregating them on a Prometheus metrics endpoint: `http://localhost:9091/metrics`.
- Uncomment the appropriate lines in prometheus.yml to poll port 9091. - Uncomment the appropriate lines in prometheus.yml to poll port 9091.
- Start worker(s) that publishes KV Cache metrics: [examples/rust/service_metrics/bin/server](../../lib/runtime/examples/service_metrics/README.md)` can populate dummy KV Cache metrics. - Start worker(s) that publishes KV Cache metrics: [examples/rust/service_metrics/bin/server](../../lib/runtime/examples/service_metrics/README.md)` can populate dummy KV Cache metrics.
- For a real workflow with real data, see the KV Routing example in [examples/llm/utils/vllm.py](../../examples/llm/utils/vllm.py).
## Configuration ## Configuration
......
...@@ -126,7 +126,6 @@ services: ...@@ -126,7 +126,6 @@ services:
- ./grafana_dashboards:/etc/grafana/provisioning/dashboards - ./grafana_dashboards:/etc/grafana/provisioning/dashboards
- ./grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml - ./grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
environment: environment:
# Port 3000 is already used by "dynamo serve", so use 3001
- GF_SERVER_HTTP_PORT=3001 - GF_SERVER_HTTP_PORT=3001
# do not make it admin/admin, because you will be prompted to change the password every time # do not make it admin/admin, because you will be prompted to change the password every time
- GF_SECURITY_ADMIN_USER=dynamo - GF_SECURITY_ADMIN_USER=dynamo
......
...@@ -23,14 +23,6 @@ ...@@ -23,14 +23,6 @@
**Dynamo Cloud** - A Kubernetes platform providing managed deployment experience for Dynamo inference graphs. **Dynamo Cloud** - A Kubernetes platform providing managed deployment experience for Dynamo inference graphs.
**dynamo build** - The CLI command to containerize inference graphs or parts of graphs into Docker containers.
**dynamo deploy** - The CLI command to deploy inference graphs to Kubernetes with Helm charts or custom operators.
**dynamo run** - The CLI command to quickly experiment and test models with various LLM engines.
**dynamo serve** - The CLI command to compose and serve inference graphs locally.
## E ## E
**@endpoint** - A Python decorator used to define service endpoints within a Dynamo component. **@endpoint** - A Python decorator used to define service endpoints within a Dynamo component.
......
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
## Serving examples locally ## Serving examples locally
Follow individual examples to serve models locally. TODO: Follow individual examples to serve models locally.
## Deploying Examples to Kubernetes ## Deploying Examples to Kubernetes
...@@ -16,7 +16,6 @@ If you are a **👤 Dynamo User** first follow the [Quickstart Guide](../guides/ ...@@ -16,7 +16,6 @@ If you are a **👤 Dynamo User** first follow the [Quickstart Guide](../guides/
### Instructions for Dynamo Contributor ### Instructions for Dynamo Contributor
If you are a **🧑‍💻 Dynamo Contributor** first follow the instructions in [deploy/cloud/helm/README.md](../../deploy/cloud/helm/README.md) to create your Dynamo Cloud deployment. If you are a **🧑‍💻 Dynamo Contributor** first follow the instructions in [deploy/cloud/helm/README.md](../../deploy/cloud/helm/README.md) to create your Dynamo Cloud deployment.
Make sure your dynamo cloud the `deploy.sh --crds --interactive` script finished successfully.
You would have to rebuild the dynamo platform images as the code evolves. For more details please look at the [Cloud Guide](../guides/dynamo_deploy/dynamo_cloud.md) You would have to rebuild the dynamo platform images as the code evolves. For more details please look at the [Cloud Guide](../guides/dynamo_deploy/dynamo_cloud.md)
...@@ -27,7 +26,7 @@ export DYNAMO_IMAGE=<your-registry>/<your-image-name>:<your-tag> ...@@ -27,7 +26,7 @@ export DYNAMO_IMAGE=<your-registry>/<your-image-name>:<your-tag>
``` ```
### Post Install Instructions ### Deploying a particular example
```bash ```bash
# Set your dynamo root directory # Set your dynamo root directory
...@@ -36,17 +35,43 @@ export PROJECT_ROOT=$(pwd) ...@@ -36,17 +35,43 @@ export PROJECT_ROOT=$(pwd)
export NAMESPACE=<your-namespace> # the namespace you used to deploy Dynamo cloud to. export NAMESPACE=<your-namespace> # the namespace you used to deploy Dynamo cloud to.
``` ```
Pick your deployment destination. Deploying an example consists of the simple `kubectl apply -f ... -n ${NAMESPACE}` command. For example:
If local ```bash
kubectl apply -f examples/vllm/deploy/agg.yaml -n ${NAMESPACE}
```
You can use `kubectl get dynamoGraphDeployment -n ${NAMESPACE}` to view your deployment.
You can use `kubectl delete dynamoGraphDeployment <your-dep-name> -n ${NAMESPACE}` to delete the deployment.
**Note 1** Example Image
The examples use a prebuilt image from the `nvcr.io/nvidian/nim-llm-dev` registry.
You can build your own image and update the image location in your CR file prior to applying.
See [Building the Dynamo Base Image](../../README.md#building-the-dynamo-base-image)
```bash ```bash
export DYNAMO_CLOUD=http://localhost:8080 extraPodSpec:
mainContainer:
image: <image-in-your-$DYNAMO_IMAGE>
``` ```
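As a rough illustration of updating the image location before applying a CR, the sketch below rewrites `image:` values with `sed`. The helper name `set_image` and the sample image reference are assumptions made for this example, not part of Dynamo, and a YAML-aware tool such as `yq` is a more robust choice for real manifests.

```bash
# Hypothetical helper: replace the value of every "image:" key in a manifest
# read from stdin with the image reference passed as $1.
# Assumes the reference contains no "|" or "&" characters, and that the key
# starts its line (list items written as "- image:" are not matched).
set_image() {
  sed -E "s|^([[:space:]]*image:).*|\1 $1|"
}

# Demo on an inline fragment shaped like the snippet above.
printf 'extraPodSpec:\n  mainContainer:\n    image: placeholder\n' |
  set_image "my-registry/dynamo-base:latest-vllm"
```

Under the same assumptions, the rewritten manifest could be piped straight into `kubectl apply -n ${NAMESPACE} -f -` instead of being saved to a file.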
If kubernetes **Note 2**
Setup port forward if needed when deploying to Kubernetes.
List the services in your namespace:
```bash
kubectl get svc -n ${NAMESPACE}
```
Look for one that ends in `-frontend` and use it for port forward.
```bash ```bash
export DYNAMO_CLOUD=https://dynamo-cloud.nvidia.com SERVICE_NAME=$(kubectl get svc -n ${NAMESPACE} -o name | grep frontend | sed 's|.*/||' | sed 's|-frontend||' | head -n1)
kubectl port-forward svc/${SERVICE_NAME}-frontend 8000:8000 -n ${NAMESPACE}
``` ```
Deploying examples consists of the simple `kubectl apply -f` command. Consult the [Port Forward Documentation](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/)
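To see what the `SERVICE_NAME` pipeline extracts without a cluster, the sketch below runs the same `grep`/`sed`/`head` steps on sample input. The function name `frontend_base_name` and the service names are made up for illustration; real `kubectl get svc -o name` output depends on your deployment.

```bash
# Same text processing as the SERVICE_NAME pipeline, reading
# "kubectl get svc -o name"-style lines from stdin.
frontend_base_name() {
  grep frontend |          # keep only services whose name mentions "frontend"
    sed 's|.*/||' |        # drop the "service/" resource prefix
    sed 's|-frontend||' |  # strip the "-frontend" suffix
    head -n1               # use the first match only
}

# Sample input with hypothetical service names.
printf 'service/etcd\nservice/my-graph-frontend\nservice/my-graph-worker\n' |
  frontend_base_name
```

Because only the first match is kept, a namespace with several deployments may need an explicit service name instead.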
More on [LLM examples](llm_deployment.md)
\ No newline at end of file
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Hello World Example: Basic Pipeline
## Overview
This example demonstrates the basic concepts of Dynamo by creating a simple multi-service pipeline. It shows how to:
1. Create and connect multiple Dynamo services
2. Pass data between services using Dynamo's runtime
3. Set up a simple HTTP API endpoint
4. Deploy and interact with a Dynamo service graph
Graph Architecture:
```
Users/Clients (HTTP)
┌─────────────┐
│ Frontend │ HTTP API endpoint (/generate)
└─────────────┘
│ dynamo/runtime
┌─────────────┐
│ Middle │
└─────────────┘
│ dynamo/runtime
┌─────────────┐
│ Backend │
└─────────────┘
```
## Component Descriptions
### Frontend Service
- Serves as the entry point for external HTTP requests
- Exposes a `/generate` HTTP API endpoint that clients can call
- Processes incoming text and passes it to the Middle service
### Middle Service
- Acts as an intermediary service in the pipeline
- Receives requests from the Frontend
- Appends "-mid" to the text and forwards it to the Backend
### Backend Service
- Functions as the final service in the pipeline
- Processes requests from the Middle service
- Appends "-back" to the text and yields tokens
## Running the Example Locally
Make sure etcd and NATS are running:
```bash
sudo systemctl start etcd
sudo systemctl start nats-server
```
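If the services take a moment to come up, a small retry helper can confirm they are reachable before you launch the graph. This is a minimal sketch: `wait_for` is a hypothetical helper, and the probes in the comments assume bash's `/dev/tcp` feature and the default client ports (2379 for etcd, 4222 for NATS).

```bash
# Hypothetical helper: run a probe command until it succeeds.
# Usage: wait_for <name> <attempts> <delay-seconds> <command...>
wait_for() {
  local name="$1" attempts="$2" delay="$3"
  shift 3
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "$name is ready"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "$name not reachable after $attempts attempts" >&2
  return 1
}

# Example probes (assumed default ports; adjust to your setup):
#   wait_for etcd 5 1 bash -c 'exec 3<>/dev/tcp/127.0.0.1/2379'
#   wait_for nats 5 1 bash -c 'exec 3<>/dev/tcp/127.0.0.1/4222'
```

The probe is passed in as a command, so the same helper works with `nc -z` or a health-check `curl` if those suit your environment better.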
1. Launch all three services using a single command:
```bash
cd /workspace/examples/hello_world
dynamo serve hello_world:Frontend
```
The `dynamo serve` command deploys the entire service graph, automatically handling the dependencies between Frontend, Middle, and Backend services.
2. Send a request to the Frontend using curl:
```bash
curl -X 'POST' \
'http://localhost:8000/generate' \
-H 'accept: text/event-stream' \
-H 'Content-Type: application/json' \
-d '{
"text": "test"
}'
```
# Deploy to Kubernetes
You should first deploy the Dynamo Cloud Platform.
If you are a **👤 Dynamo User**, first follow the [Quickstart Guide](../guides/dynamo_deploy/quickstart.md).
If you are a **🧑‍💻 Dynamo Contributor** and you have changed the platform code, you need to rebuild the Dynamo platform. For details, see the [Cloud Guide](../guides/dynamo_deploy/dynamo_cloud.md).
## Deploy your service using a DynamoGraphDeployment CR
```bash
kubectl apply -f examples/hello_world/deploy/hello_world.yaml -n ${NAMESPACE}
```
## Testing the Deployment
Once the deployment is complete, you can test it using the commands below.
If needed, run the port forward in a separate terminal.
```bash
export DEPLOYMENT_NAME=hello-world
# Forward the frontend service's port to localhost
kubectl port-forward svc/$DEPLOYMENT_NAME-frontend 8000:8000 -n ${NAMESPACE}
```
```bash
# Test the API endpoint
curl -N -X POST http://localhost:8000/generate \
-H "accept: text/event-stream" \
-H "Content-Type: application/json" \
-d '{"text": "test"}'
```
## Expected Output
When you send the request with "test" as input, the response will show how the text flows through each service:
```
Frontend: Middle: Backend: test-mid-back
```
This demonstrates how:
1. The Frontend receives "test"
2. The Middle service adds "-mid" to create "test-mid"
3. The Backend service adds "-back" to create "test-mid-back"
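The text flow above can be mimicked with three plain shell functions. This is only a toy model of the string handling: it does not use the Dynamo runtime, and the real services stream tokens over HTTP rather than returning a single string.

```bash
# Toy stand-ins for the three services (illustrative only, not Dynamo code).
backend()  { printf 'Backend: %s-back' "$1"; }              # appends "-back"
middle()   { printf 'Middle: %s' "$(backend "$1-mid")"; }   # appends "-mid", calls backend
frontend() { printf 'Frontend: %s\n' "$(middle "$1")"; }    # entry point, calls middle

frontend test
# prints: Frontend: Middle: Backend: test-mid-back
```

Each function wraps the result of the next one, which is why the service labels appear in call order while the suffixes accumulate on the original text.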
...@@ -81,27 +81,27 @@ Start required services (etcd and NATS) using [Docker Compose](../../deploy/metr ...@@ -81,27 +81,27 @@ Start required services (etcd and NATS) using [Docker Compose](../../deploy/metr
docker compose -f deploy/metrics/docker-compose.yml up -d docker compose -f deploy/metrics/docker-compose.yml up -d
``` ```
### Build docker ### Build the container image for your platform
```bash ```bash
# On an x86 machine # On an x86 machine
./container/build.sh --framework vllm ./container/build.sh --framework VLLM
# On an ARM machine (ex: GB200) # On an ARM machine (ex: GB200)
./container/build.sh --framework vllm --platform linux/arm64 ./container/build.sh --framework VLLM --platform linux/arm64
``` ```
```{note} ```{note}
Building a vLLM docker image for ARM machines currently involves building vLLM from source, which is known to have performance issues to require exgtensive system RAM; see [vLLM Issue 8878](https://github.com/vllm-project/vllm/issues/8878). Building a vLLM docker image for ARM machines currently involves building vLLM from source, which is known to have performance issues to require extensive system RAM; see [vLLM Issue 8878](https://github.com/vllm-project/vllm/issues/8878).
You can tune the number of parallel build jobs for building VLLM from source You can tune the number of parallel build jobs for building VLLM from source
on ARM based on your available cores and system RAM with `VLLM_MAX_JOBS`. on ARM based on your available cores and system RAM with `VLLM_MAX_JOBS`.
For example, on an ARM machine with low system resources: For example, on an ARM machine with low system resources:
`./container/build.sh --framework vllm --platform linux/arm64 --build-arg VLLM_MAX_JOBS=2` `./container/build.sh --framework VLLM --platform linux/arm64 --build-arg VLLM_MAX_JOBS=2`
For example, on a GB200 which has very high CPU cores and memory resource: For example, on a GB200 which has very high CPU cores and memory resource:
`./container/build.sh --framework vllm --platform linux/arm64 --build-arg VLLM_MAX_JOBS=64` `./container/build.sh --framework VLLM --platform linux/arm64 --build-arg VLLM_MAX_JOBS=64`
When vLLM has pre-built ARM wheels published, this process can be improved. When vLLM has pre-built ARM wheels published, this process can be improved.
...@@ -109,17 +109,17 @@ You can tune the number of parallel build jobs for building VLLM from source ...@@ -109,17 +109,17 @@ You can tune the number of parallel build jobs for building VLLM from source
on ARM based on your available cores and system RAM with `VLLM_MAX_JOBS`. on ARM based on your available cores and system RAM with `VLLM_MAX_JOBS`.
For example, on an ARM machine with low system resources: For example, on an ARM machine with low system resources:
`./container/build.sh --framework vllm --platform linux/arm64 --build-arg VLLM_MAX_JOBS=2` `./container/build.sh --framework VLLM --platform linux/arm64 --build-arg VLLM_MAX_JOBS=2`
For example, on a GB200 which has very high CPU cores and memory resource: For example, on a GB200 which has very high CPU cores and memory resource:
`./container/build.sh --framework vllm --platform linux/arm64 --build-arg VLLM_MAX_JOBS=64` `./container/build.sh --framework VLLM --platform linux/arm64 --build-arg VLLM_MAX_JOBS=64`
When vLLM has pre-built ARM wheels published, this process can be improved. When vLLM has pre-built ARM wheels published, this process can be improved.
``` ```
### Run container ### Run the container you have built
``` ```
./container/run.sh -it --framework vllm ./container/run.sh -it --framework VLLM
``` ```
## Run Deployment ## Run Deployment
...@@ -147,127 +147,6 @@ This figure shows an overview of the major components to deploy: ...@@ -147,127 +147,6 @@ This figure shows an overview of the major components to deploy:
``` ```
```{note} ```{note}
The planner component is enabled by default for all deployment architectures but is set to no-op mode. This means the planner observes metrics but doesn't take scaling actions. To enable active scaling, you can add `--Planner.no-operation=false` to your `dynamo serve` command. For more details, see [PLanner](../architecture/planner_intro.rst). The planner component is enabled by default for all deployment architectures but is set to no-op mode. This means the planner observes metrics but doesn't take scaling actions. To enable active scaling, you can add `--Planner.no-operation=false` to your `dynamo serve` command.
``` For more details, see [Planner Architecture Overview](../architecture/planner_intro.rst).
### Example architectures
```{note}
For a non-dockerized deployment, first export `DYNAMO_HOME` to point to the dynamo repository root, e.g. `export DYNAMO_HOME=$(pwd)`
```
#### Aggregated serving
```bash
cd $DYNAMO_HOME/examples/llm
dynamo serve graphs.agg:Frontend -f ./configs/agg.yaml
```
#### Aggregated serving with KV Routing
```bash
cd $DYNAMO_HOME/examples/llm
dynamo serve graphs.agg_router:Frontend -f ./configs/agg_router.yaml
```
#### Disaggregated serving
```bash
cd $DYNAMO_HOME/examples/llm
dynamo serve graphs.disagg:Frontend -f ./configs/disagg.yaml
```
#### Disaggregated serving with KV Routing
```bash
cd $DYNAMO_HOME/examples/llm
dynamo serve graphs.disagg_router:Frontend -f ./configs/disagg_router.yaml
```
### Client
In another terminal:
```bash
# this test request has around 200 tokens isl
curl localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
"messages": [
{
"role": "user",
"content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
}
],
"stream":false,
"max_tokens": 30
}'
```
### Multinode deployment
See [Multinode Examples](../examples/multinode.md) for more details.
### Close deployment
See [Close deployment](../guides/dynamo_serve.md#close-deployment) in the *Dynamo Run* topic to learn about how to close the deployment.
## Deploy to Kubernetes
These examples can be deployed to a Kubernetes cluster using [Dynamo Cloud](../guides/dynamo_deploy/dynamo_cloud.md) and the Dynamo CLI.
### Prerequisites
You must first follow the instructions in [deploy/cloud/helm/README.md](https://github.com/ai-dynamo/dynamo/blob/main/deploy/cloud/helm/README.md) to install Dynamo Cloud on your Kubernetes cluster.
```{note}
The `KUBE_NS` variable in the following steps must match the Kubernetes namespace where you installed Dynamo Cloud. You must also expose the `dynamo-store` service externally. This will be the endpoint the CLI uses to interface with Dynamo Cloud.
```
### Deployment Steps
For detailed deployment instructions, please refer to the [Operator Deployment Guide](../guides/dynamo_deploy/operator_deployment.md). The following are the specific commands for the LLM examples:
```bash
# Set your project root directory
export PROJECT_ROOT=$(pwd)
# Configure environment variables (see operator_deployment.md for details)
export KUBE_NS=dynamo-cloud
export DYNAMO_CLOUD=http://localhost:8080 # If using port-forward
# OR
# export DYNAMO_CLOUD=https://dynamo-cloud.nvidia.com # If using Ingress/VirtualService
# Build the Dynamo base image (see operator_deployment.md for details)
export DYNAMO_IMAGE=<your-registry>/<your-image-name>:<your-tag>
# Build the service
cd $PROJECT_ROOT/examples/llm
DYNAMO_TAG=$(dynamo build graphs.agg:Frontend | grep "Successfully built" | awk '{ print $NF }' | sed 's/\.$//')
# Deploy to Kubernetes
export DEPLOYMENT_NAME=llm-agg
# TODO: Deploy your service using a DynamoGraphDeployment CR.
```
**Note**: Optionally add `--Planner.no-operation=false` at the end of the deployment command to enable the planner component to take scaling actions on your deployment.
### Testing the Deployment
Once the deployment is complete, you can test it using:
```bash
# Forward the port to localhost
kubectl port-forward svc/$DEPLOYMENT_NAME-frontend 8000:8000 -n ${KUBE_NS}
# Test the API endpoint
curl localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
"messages": [
{
"role": "user",
"content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
}
],
"stream":false,
"max_tokens": 30
}'
``` ```
...@@ -123,13 +123,14 @@ export DYNAMO_IMAGE=<your-registry>/dynamo-base:latest-vllm ...@@ -123,13 +123,14 @@ export DYNAMO_IMAGE=<your-registry>/dynamo-base:latest-vllm
## Running and Interacting with an LLM Locally ## Running and Interacting with an LLM Locally
To run a model and interact with it locally, call `dynamo run` with a Hugging Face model. Dynamo supports several backends, including `mistralrs`, `sglang`, `vllm`, and `tensorrtllm`.
`dynamo run` supports several backends, including `mistralrs`, `sglang`, `vllm`, and `tensorrtllm`. Use the example commands below to launch a model.
### Example Command ### Example Command
```bash ```bash
dynamo run out=vllm deepseek-ai/DeepSeek-R1-Distill-Llama-8B python -m dynamo.frontend [--http-port 8080]
python -m dynamo.vllm deepseek-ai/DeepSeek-R1-Distill-Llama-8B
``` ```
```bash ```bash
...@@ -166,31 +167,7 @@ docker compose -f deploy/docker-compose.yml up -d ...@@ -166,31 +167,7 @@ docker compose -f deploy/docker-compose.yml up -d
### Start Dynamo LLM Serving Components ### Start Dynamo LLM Serving Components
Next, serve a minimal configuration with an http server, basic [Explore the VLLM Example](../examples/vllm/README.md)
round-robin router, and a single worker.
```bash
cd examples/llm
dynamo serve graphs.agg:Frontend -f configs/agg.yaml
```
### Send a Request
```bash
curl localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
],
"stream":false,
"max_tokens": 300
}' | jq
```
## Local Development ## Local Development
...@@ -232,6 +209,6 @@ pip install .[all] ...@@ -232,6 +209,6 @@ pip install .[all]
# To test # To test
docker compose -f deploy/docker-compose.yml up -d docker compose -f deploy/docker-compose.yml up -d
cd examples/llm python -m dynamo.frontend [--http-port 8080]
dynamo serve graphs.agg:Frontend -f configs/agg.yaml python -m dynamo.vllm deepseek-ai/DeepSeek-R1-Distill-Llama-8B
``` ```
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Guide to Dynamo CLI
After installing Dynamo with the following command, Dynamo can be used primarily through its CLI.
```
apt-get update
DEBIAN_FRONTEND=noninteractive apt-get install -yq python3-dev python3-pip python3-venv libucx0
python3 -m venv venv
source venv/bin/activate
pip install "ai-dynamo[all]"
```
## Dynamo workflow
The Dynamo CLI has the following 5 sub-commands.
- :runner: dynamo run: quickly spin up a server to experiment with a specified model, input and output target.
- :palm_up_hand: dynamo serve: compose a graph of workers locally and serve.
- :hammer: (Experimental) dynamo build: containerize either the entire graph or parts of the graph into multiple containers
- :rocket: (Experimental) dynamo deploy: deploy to Kubernetes with Helm charts or custom operators
- :cloud: (Experimental) dynamo cloud: interact with your dynamo cloud server
For more detailed examples of serving LLMs with disaggregated serving, KV-aware routing, and so on, see the [LLM deployment examples](https://github.com/ai-dynamo/dynamo/blob/main/examples/llm/README.md).
...@@ -37,7 +37,7 @@ Use `run` to start an interactive chat session with a model. This command execut ...@@ -37,7 +37,7 @@ Use `run` to start an interactive chat session with a model. This command execut
#### Example #### Example
```bash ```bash
dynamo run deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B dynamo-run deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
``` ```
### `serve` ### `serve`
......
...@@ -15,31 +15,21 @@ See the License for the specific language governing permissions and ...@@ -15,31 +15,21 @@ See the License for the specific language governing permissions and
limitations under the License. limitations under the License.
--> -->
# Deploying Inference Graphs to Kubernetes (`dynamo deploy`) # Deploying Inference Graphs to Kubernetes
This guide explains the deployment options available for Dynamo inference graphs in Kubernetes environments. We expect users to deploy their inference graphs using CRDs or helm charts.
## Deployment Options Prior to deploying an inference graph the user should deploy the Dynamo Cloud Platform.
Dynamo Cloud acts as an orchestration layer between the end user and Kubernetes, handling the complexity of deploying your graphs for you. This is a one-time action, only necessary the first time you deploy a DynamoGraph.
Dynamo provides two distinct deployment options that each serve different use cases:
1. Dynamo Cloud Kubernetes Platform is preferred in cases that support it
2. Manual Deployment with Helm Charts is suited to users who need more control over their deployments
# 1. Please follow [Installing Dynamo Cloud](./dynamo_cloud.md) for steps to install.
For details about the Dynamo Cloud Platform, see the [Dynamo Operator Guide](dynamo_operator.md)
### Dynamo Cloud Kubernetes Platform [PREFERRED] # 2. Follow [Examples](../../examples/README.md) to see how you can deploy your Inference Graphs.
The Dynamo Cloud Platform (`deploy/cloud/`) provides a managed deployment experience:
- Contains the infrastructure components required for the Dynamo cloud platform ## Manual Deployment with Helm Charts
- Used when deploying with the `dynamo deploy` CLI commands
- Provides a managed deployment experience
For detailed instructions on using the Dynamo Cloud Platform, see:
- [Dynamo Cloud Platform Guide](dynamo_cloud.md): walks through installing and configuring the Dynamo cloud components on your Kubernetes cluster.
- [Dynamo Operator Guide](dynamo_operator.md)
### Manual Deployment with Helm Charts
Users who need more control over their deployments can use the manual deployment path (`deploy/helm/`): Users who need more control over their deployments can use the manual deployment path (`deploy/helm/`):
...@@ -50,20 +40,3 @@ Users who need more control over their deployments can use the manual deployment ...@@ -50,20 +40,3 @@ Users who need more control over their deployments can use the manual deployment
- Documentation: - Documentation:
- [Using the Deployment Script](manual_helm_deployment.md#using-the-deployment-script): all-in-one script for manual deployment - [Using the Deployment Script](manual_helm_deployment.md#using-the-deployment-script): all-in-one script for manual deployment
- [Helm Deployment Guide](manual_helm_deployment.md#helm-deployment-guide): detailed instructions for manual deployment - [Helm Deployment Guide](manual_helm_deployment.md#helm-deployment-guide): detailed instructions for manual deployment
## Getting Started with Helm Deploy
1. **For Dynamo Cloud Platform**:
- Follow the [Dynamo Cloud Platform Guide](dynamo_cloud.md)
- Deploy a Hello World pipeline using the [Operator Deployment Guide](operator_deployment.md)
- Deploy a Dynamo LLM pipeline to Kubernetes [Deploy LLM Guide](../../examples/llm_deployment.md#deploy-to-kubernetes)
- Model caching with [Fluid](model_caching_with_fluid.md)
2. **For Manual Deployment**:
- Follow the [Manual Helm Deployment Guide](manual_helm_deployment.md)
## Example Deployments
See the [Hello World example](../../examples/hello_world.md#deploying-to-and-running-the-example-in-kubernetes) for a complete walkthrough of deploying a simple inference graph.
See the [LLM example](../../examples/llm_deployment.md#deploy-to-kubernetes) for a complete walkthrough of deploying a production-ready LLM inference pipeline to Kubernetes.
\ No newline at end of file
...@@ -15,9 +15,9 @@ See the License for the specific language governing permissions and ...@@ -15,9 +15,9 @@ See the License for the specific language governing permissions and
limitations under the License. limitations under the License.
--> -->
# Dynamo Cloud Kubernetes Platform (Dynamo Deploy) # Dynamo Cloud Kubernetes Platform
The Dynamo Cloud platform is a comprehensive solution for deploying and managing Dynamo inference graphs (also referred to as pipelines) in Kubernetes environments. It provides a streamlined experience for deploying, scaling, and monitoring your inference services. You can interface with Dynamo Cloud using the `deploy` subcommand available in the Dynamo CLI (for example, `dynamo deploy`) The Dynamo Cloud platform is a comprehensive solution for deploying and managing Dynamo inference graphs (also referred to as pipelines) in Kubernetes environments. It provides a streamlined experience for deploying, scaling, and monitoring your inference services.
## Overview ## Overview
...@@ -26,11 +26,8 @@ The Dynamo cloud platform consists of several key components: ...@@ -26,11 +26,8 @@ The Dynamo cloud platform consists of several key components:
- **Dynamo Operator**: A Kubernetes operator that manages the lifecycle of Dynamo inference graphs from build ➡️ deploy. For more information on the operator, see [Dynamo Kubernetes Operator Documentation](../dynamo_deploy/dynamo_operator.md) - **Dynamo Operator**: A Kubernetes operator that manages the lifecycle of Dynamo inference graphs from build ➡️ deploy. For more information on the operator, see [Dynamo Kubernetes Operator Documentation](../dynamo_deploy/dynamo_operator.md)
- **Custom Resources**: Kubernetes custom resources for defining and managing Dynamo services - **Custom Resources**: Kubernetes custom resources for defining and managing Dynamo services
These components work together to provide a seamless deployment experience, handling everything from containerization to scaling and monitoring.
![Dynamo Deploy system deployment diagram.](../../images/dynamo-deploy.png) ## Deployment Prerequisites
## Prerequisites
Before getting started with the Dynamo cloud platform, ensure you have: Before getting started with the Dynamo cloud platform, ensure you have:
...@@ -56,58 +53,20 @@ Just export the environment variable. This will be the image used by your indivi ...@@ -56,58 +53,20 @@ Just export the environment variable. This will be the image used by your indivi
export DYNAMO_IMAGE=nvcr.io/nvidia/dynamo:latest-vllm export DYNAMO_IMAGE=nvcr.io/nvidia/dynamo:latest-vllm
``` ```
For advanced examples make sure you have first built and pushed to your registry Dynamo Base Image for Dynamo inference runtime. This is a one-time operation. For a custom setup, build the Dynamo Base Image for the Dynamo inference runtime and push it to your registry. This is a one-time operation.
```bash ```bash
# Run the script to build the default dynamo:latest-vllm image. # Run the script to build the default dynamo:latest-vllm image.
./container/build.sh ./container/build.sh
export IMAGE_TAG=<TAG> export IMAGE_TAG=<TAG>
# retag the image # Tag the image
docker tag dynamo:latest-vllm <your-registry>/dynamo:${IMAGE_TAG} docker tag dynamo:latest-vllm <your-registry>/dynamo:${IMAGE_TAG}
docker push <your-registry>/dynamo:${IMAGE_TAG} docker push <your-registry>/dynamo:${IMAGE_TAG}
``` ```
## Building Docker Images for Dynamo Cloud Components ## 🚀 Deploying the Dynamo Cloud Platform
The Dynamo cloud platform components need to be built and pushed to a container registry before deployment. You can build these components individually or all at once.
### Setting Up Environment Variables
First, set the required environment variables for building and pushing images:
```bash
# Set your container registry
export DOCKER_SERVER=<CONTAINER_REGISTRY>
# Set the image tag (e.g., latest, 0.0.1, etc.)
export IMAGE_TAG=<TAG>
```
The placeholders are:
- `<CONTAINER_REGISTRY>`: Your container registry (e.g., `nvcr.io`, `docker.io/<your-username>`, etc.)
- `<TAG>`: The tag you want to use for the images of the Dynamo cloud components (e.g., `latest`, `0.0.1`, etc.)
If the runtime image tag is not explicitly set, the default is `latest`.
The tag is applied to the `dynamo-operator:<IMAGE_TAG>` image for the Operator. The runtime (base) image, which is built by `build.sh`, contains the inference toolchain and the SDK. The Operator tag does not have to match the runtime image tag, but the images must be compatible.
**Important** Make sure you're logged in to your container registry before pushing images. For example:
```bash
docker login <CONTAINER_REGISTRY>
```
### Building Components
You can build and push all platform components at once:
```bash
earthly --push +all-docker --DOCKER_SERVER=$DOCKER_SERVER --IMAGE_TAG=$IMAGE_TAG
```
### 🚀 Deploying the Dynamo Cloud Platform ## Prerequisites
Once you've built and pushed the components, you can deploy the platform to your Kubernetes cluster.
### Prerequisites
Before deploying Dynamo Cloud, ensure your Kubernetes cluster meets the following requirements: Before deploying Dynamo Cloud, ensure your Kubernetes cluster meets the following requirements:
...@@ -135,144 +94,19 @@ kubectl get storageclass ...@@ -135,144 +94,19 @@ kubectl get storageclass
# standard (default) kubernetes.io/gce-pd Delete Immediate true 1d # standard (default) kubernetes.io/gce-pd Delete Immediate true 1d
``` ```
## Installation
Follow the [Quickstart Guide](./quickstart.md) to install Dynamo Cloud.
### Installation using the helper script ⚠️ **Note:** Omitting `--crds` skips the CRD installation/upgrade. This is useful when installing on a shared cluster, as CRDs are cluster-scoped resources.
1. Set the required environment variables:
```bash
export PROJECT_ROOT=$(pwd)
export DOCKER_USERNAME=<your-docker-username>
export DOCKER_PASSWORD=<your-docker-password>
export DOCKER_SERVER=<your-docker-server>
export IMAGE_TAG=<TAG> # Use the same tag you used when building the images
export NAMESPACE=dynamo-cloud # change this to whatever you want!
export DYNAMO_INGRESS_SUFFIX=dynamo-cloud.com # change this to whatever you want!
```
``` {note}
DOCKER_USERNAME and DOCKER_PASSWORD are optional and only needed if you want to pull docker images from a private registry.
A docker image pull secret is created automatically if these variables are set. Its name is `docker-imagepullsecret` unless overridden by the `DOCKER_SECRET_NAME` environment variable.
```
The Dynamo Cloud Platform auto-generates docker images for pipelines and pushes them to a container registry.
By default, the platform uses the same container registry as the platform components (specified by `DOCKER_SERVER`).
However, you can use a different container registry for the platform components by making sure an associated kubernetes secret is present:
```bash
kubectl create secret docker-registry dynamo-components-imagepullsecret \
--docker-server=<docker-registry-for-dynamo-components> \
--docker-username=<username> \
--docker-password=<password> \
--namespace=${NAMESPACE}
```
If you wish to expose your Dynamo Cloud Platform externally, you can set the following environment variables:
```bash
# if using ingress
export INGRESS_ENABLED="true"
export INGRESS_CLASS="nginx" # or whatever ingress class you have configured
# if using istio
export ISTIO_ENABLED="true"
export ISTIO_GATEWAY="istio-system/istio-ingressgateway" # or whatever istio gateway you have configured
```
Running the installation script with `--interactive` guides you through the process of exposing your Dynamo Cloud Platform externally if you don't want to set these environment variables manually.
2. [One-time Action] Create a new kubernetes namespace and set it as your default.
```bash
cd deploy/cloud/helm
kubectl create namespace $NAMESPACE
kubectl config set-context --current --namespace=$NAMESPACE
```
3. Deploy the Helm charts (install CRDs first, then platform) using the deployment script:
```bash
./deploy.sh --crds
```
If you want guidance during the process, run the deployment script with the `--interactive` flag:
```bash
./deploy.sh --crds --interactive
```
Omitting `--crds` skips the CRD installation/upgrade. This is useful when installing on a shared cluster, as CRDs are cluster-scoped resources.
If you'd like to only generate the generated-values.yaml file without deploying to Kubernetes (e.g., for inspection, CI workflows, or dry-run testing), use: ⚠️ **Note:** If you'd like to only generate the generated-values.yaml file without deploying to Kubernetes (e.g., for inspection, CI workflows, or dry-run testing), use:
```bash ```bash
./deploy_dynamo_cloud.py --yaml-only ./deploy_dynamo_cloud.py --yaml-only
``` ```
### Installation using published helm chart
To install Dynamo Cloud using the published Helm chart, you'll need to configure Docker registry credentials and image settings.
#### Environment Setup
Set the required environment variables:
```bash
# Docker registry configuration
export DOCKER_SERVER="your-registry.com" # Docker registry server where images of dynamo cloud services (operator) are available
export IMAGE_TAG="v1.0.0" # Image tag to deploy
export NAMESPACE="dynamo-cloud" # Target namespace
# Components-specific Docker registry (if different from DOCKER_SERVER)
export COMPONENTS_DOCKER_SERVER="your-pipeline-registry.com" # Registry for Dynamo components images
# Image pull secret for the operator itself
export DOCKER_SECRET_NAME="my-pull-secret" # Secret for pulling images of Dynamo Cloud services (operator)
export COMPONENTS_DOCKER_SECRET_NAME="my-components-pull-secret" # Secret for pulling Dynamo components images (if needed)
```
You can create an image pull secret with the following command:
```bash
kubectl create secret docker-registry ${DOCKER_SECRET_NAME} \
--docker-server=${DOCKER_SERVER} \
--docker-username=<docker-server-username> \
--docker-password=<docker-server-password> \
--namespace=${NAMESPACE}
# Only if using a different registry for Dynamo components
kubectl create secret docker-registry ${COMPONENTS_DOCKER_SECRET_NAME} \
--docker-server=${COMPONENTS_DOCKER_SERVER} \
--docker-username=<components-docker-server-username> \
--docker-password=<components-docker-server-password> \
--namespace=${NAMESPACE}
```
#### Installation Commands
**Step 1: Install Custom Resource Definitions (CRDs)**
```bash
helm install dynamo-crds dynamo-crds-helm-chart.tgz \
--namespace default \
--wait \
--atomic
```
**Step 2: Install Dynamo Platform**
Run the following helm command:
```bash
helm install dynamo-platform dynamo-platform-helm-chart.tgz \
--namespace ${NAMESPACE} \
--set "dynamo-operator.controllerManager.manager.image.repository=${DOCKER_SERVER}/dynamo-operator" \
--set "dynamo-operator.controllerManager.manager.image.tag=${IMAGE_TAG}" \
--set "dynamo-operator.imagePullSecrets[0].name=${DOCKER_SECRET_NAME}"
```
### Cloud Provider-Specific deployment ### Cloud Provider-Specific deployment
...@@ -280,12 +114,3 @@ helm install dynamo-platform dynamo-platform-helm-chart.tgz \ ...@@ -280,12 +114,3 @@ helm install dynamo-platform dynamo-platform-helm-chart.tgz \
You can find detailed instructions for deployment in GKE [here](../dynamo_deploy/gke_setup.md) You can find detailed instructions for deployment in GKE [here](../dynamo_deploy/gke_setup.md)
## Next Steps
After deploying the Dynamo cloud platform, you can:
1. Deploy your first inference graph using the [Dynamo CLI](operator_deployment.md)
2. Deploy Dynamo LLM graphs to Kubernetes using the [Dynamo CLI](../../examples/llm_deployment.md)
3. Manage your deployments using the Dynamo CLI
For more detailed information about deploying inference graphs, see the [Dynamo Deploy Guide](README.md).
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<a id="k8-helm-deploy"></a>
# Deploying Dynamo Inference Graphs to Kubernetes using Helm
This guide describes the deployment process of an inference graph created using the Dynamo SDK onto a Kubernetes cluster.
While this guide covers deployment of Dynamo inference graphs using Helm, the preferred method to deploy an inference graph is to [deploy with the Dynamo cloud platform](operator_deployment.md). The [Dynamo cloud platform](dynamo_cloud.md) simplifies the deployment and management of Dynamo inference graphs. It includes a set of components (Operator, Kubernetes Custom Resources, etc.) that work together to streamline the deployment and management process.
Once an inference graph is defined using the Dynamo SDK, it can be deployed onto a Kubernetes cluster using a simple `dynamo deploy` command that orchestrates the following deployment steps:
1. Building docker images from inference graph components on the cluster
2. Intelligently composing the encoded inference graph into a complete deployment on Kubernetes
3. Enabling autoscaling, monitoring, and observability for the inference graph
4. Easy administration of deployments via UI
## Helm Deployment Guide
### Setting up MicroK8s
Follow these steps to set up a local Kubernetes cluster using MicroK8s:
1. Install MicroK8s:
```bash
sudo snap install microk8s --classic
```
2. Configure user permissions:
```bash
sudo usermod -a -G microk8s $USER
sudo chown -R $USER ~/.kube
```
3. **Important**: Log out and log back in for the permissions to take effect
4. Start MicroK8s:
```bash
microk8s start
```
5. Enable required addons:
```bash
# Enable GPU support
microk8s enable gpu
# Enable storage support
# See: https://microk8s.io/docs/addon-hostpath-storage
microk8s enable storage
```
6. Configure kubectl:
```bash
mkdir -p ~/.kube
microk8s config >> ~/.kube/config
```
After completing these steps, you should be able to use the `kubectl` command to interact with your cluster.
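Before moving on, it can help to confirm that the command-line tools this guide relies on are actually on your `PATH`. The sketch below is only an illustration: `missing_cmds` is a hypothetical helper written for this example, not part of Dynamo or MicroK8s, and the list of tools passed to it is an assumption based on the commands used in this guide.

```bash
# Hypothetical helper (not part of Dynamo or MicroK8s): print each command
# that is not found on PATH and return non-zero if any are missing.
missing_cmds() {
  rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      rc=1
    fi
  done
  return "$rc"
}

# Report which of the tools used in this guide still need to be installed.
missing_cmds kubectl helm microk8s || echo "install the tools listed above before continuing"
```

The function only checks that the commands can be found. It does not verify that `kubectl` can reach the cluster.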
### Installing Required Dependencies
Follow these steps to set up the namespace and install required components:
1. Set environment variables:
```bash
export NAMESPACE=dynamo-playground
export RELEASE_NAME=dynamo-platform
export PROJECT_ROOT=$(pwd)
```
2. Install NATS messaging system:
```bash
# Navigate to dependencies directory
cd $PROJECT_ROOT/deploy/helm/dependencies
# Add and update NATS Helm repository
helm repo add nats https://nats-io.github.io/k8s/helm/charts/
helm repo update
# Install NATS with custom values
helm install --namespace ${NAMESPACE} ${RELEASE_NAME}-nats nats/nats \
--values nats-values.yaml
```
3. Install etcd key-value store:
```bash
# Install etcd using Bitnami chart
helm install --namespace ${NAMESPACE} ${RELEASE_NAME}-etcd \
oci://registry-1.docker.io/bitnamicharts/etcd \
--values etcd-values.yaml
```
After completing these steps, your cluster has the necessary messaging and storage infrastructure for running Dynamo inference graphs.
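The Helm commands above expand `NAMESPACE`, `RELEASE_NAME`, and `PROJECT_ROOT`, and an empty variable silently produces a malformed release name or path. As a minimal sketch, assuming a POSIX-compatible shell, a check like the following could be run before the install steps. `require_env` is a hypothetical helper introduced here for illustration, not something shipped with Dynamo.

```bash
# Hypothetical helper: name every variable that is unset or empty
# and return non-zero if at least one is missing.
require_env() {
  rc=0
  for name in "$@"; do
    # Indirect lookup of the variable called "$name" (names are trusted input here)
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "unset or empty: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Same values as in step 1 above
export NAMESPACE=dynamo-playground
export RELEASE_NAME=dynamo-platform
export PROJECT_ROOT=$(pwd)

require_env NAMESPACE RELEASE_NAME PROJECT_ROOT && echo "environment looks complete"
```

Because the helper uses `eval`, it should only be given variable names you control.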
### Building and Deploying the Pipeline
Follow these steps to containerize and deploy your inference pipeline:
1. Build and containerize the pipeline:
``` {note}
For instructions on building and pushing the Dynamo base image, see [Building the Dynamo Base Image](../../get_started.md#building-the-dynamo-base-image).
```
```bash
# Navigate to example directory
cd $PROJECT_ROOT/examples/hello_world
# Set runtime image name
export DYNAMO_IMAGE=<dynamo_base_image>
# Build and containerize the Frontend service
dynamo build --containerize hello_world:Frontend
```
2. Push container to registry:
```bash
# Tag the built image for your registry
docker tag <BUILT_IMAGE_TAG> <TAG>
# Push to your container registry
docker push <TAG>
```
3. Deploy using Helm:
```bash
# Navigate to the deployment directory
cd $PROJECT_ROOT/deploy/helm
# Set release name for Helm
export HELM_RELEASE=hello-world-manual
# Generate Helm values file from Frontend service
dynamo get frontend > pipeline-values.yaml
# Install/upgrade Helm release
helm upgrade -i "$HELM_RELEASE" ./chart \
-f pipeline-values.yaml \
--set image=<TAG> \
--set dynamoIdentifier="hello_world:Frontend" \
-n "$NAMESPACE"
```
4. Test the deployment:
```bash
# Forward the service port to localhost
kubectl -n ${NAMESPACE} port-forward svc/${HELM_RELEASE}-frontend 3000:80
# Test the API endpoint
curl -X 'POST' 'http://localhost:3000/generate' \
-H 'accept: text/event-stream' \
-H 'Content-Type: application/json' \
-d '{"text": "test"}'
```
### Using the Deployment Script
For convenience, you can use the deployment script at `deploy/helm/deploy.sh` that automates all of these steps:
```bash
export DYNAMO_IMAGE=<dynamo_docker_image_name>
./deploy.sh <docker_registry> <k8s_namespace> <path_to_dynamo_directory> <dynamo_identifier> [<dynamo_config_file>]
# Example: export DYNAMO_IMAGE=nvcr.io/nvidian/nim-llm-dev/dynamo-base-worker:0.0.1
# Example: ./deploy.sh nvcr.io/nvidian/nim-llm-dev my-namespace ../../../examples/hello_world/ hello_world:Frontend
# Example: ./deploy.sh nvcr.io/nvidian/nim-llm-dev my-namespace ../../../examples/llm graphs.disagg_router:Frontend ../../../examples/llm/configs/disagg_router.yaml
```
This script handles:
1. Building and pushing the Docker image
2. Setting up the Helm values
3. Installing/upgrading the Helm release
4. Configuring the necessary Kubernetes resources
# Quickstart # Quickstart
Before deploying your inference graphs you need to install the Dynamo Inference Platform and the Dynamo Cloud. Your onboarding includes 2 steps.
1. Before deploying your inference graphs you need to install the Dynamo Inference Platform and the Dynamo Cloud.
Dynamo Cloud acts as an orchestration layer between the end user and Kubernetes, handling the complexity of deploying your graphs for you.
You can install from [Published Artifacts](#1-installing-dynamo-cloud-from-published-artifacts) or from [Source](#2-installing-dynamo-cloud-from-source).
2. Once you install the Dynamo Cloud, proceed to the [Examples](../../examples/README.md) to deploy an inference graph.
## 1. Installing from Published Artifacts ## 1. Installing Dynamo Cloud from Published Artifacts
Use this approach when installing from pre-built helm charts and docker images published to NGC. Use this approach when installing from pre-built helm charts and docker images published to NGC.
...@@ -17,6 +21,8 @@ Install `envsubst`, `kubectl`, `helm` ...@@ -17,6 +21,8 @@ Install `envsubst`, `kubectl`, `helm`
### Authenticate with NGC ### Authenticate with NGC
Go to https://ngc.nvidia.com/org to get your NGC_CLI_API_KEY.
```bash ```bash
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia --username='$oauthtoken' --password=<YOUR_NGC_CLI_API_KEY> helm repo add nvidia https://helm.ngc.nvidia.com/nvidia --username='$oauthtoken' --password=<YOUR_NGC_CLI_API_KEY>
``` ```
...@@ -50,7 +56,7 @@ kubectl create namespace ${NAMESPACE} ...@@ -50,7 +56,7 @@ kubectl create namespace ${NAMESPACE}
helm install dynamo-platform dynamo-platform-v${RELEASE_VERSION}.tgz --namespace ${NAMESPACE} helm install dynamo-platform dynamo-platform-v${RELEASE_VERSION}.tgz --namespace ${NAMESPACE}
``` ```
## 2. Installing from Source ## 2. Installing Dynamo Cloud from Source
Use this approach when developing or customizing Dynamo as a contributor, or using local helm charts from the source repository. Use this approach when developing or customizing Dynamo as a contributor, or using local helm charts from the source repository.
...@@ -64,12 +70,18 @@ cd deploy/cloud/helm/ ...@@ -64,12 +70,18 @@ cd deploy/cloud/helm/
### Set Environment Variables ### Set Environment Variables
Our examples use `nvcr.io`, but you can set your own values if you use another Docker registry.
```bash ```bash
export NAMESPACE=dynamo-cloud export NAMESPACE=dynamo-cloud # or whatever you prefer.
export DOCKER_USERNAME=your-username export DOCKER_SERVER=nvcr.io/nvidia/ai-dynamo/ # your-docker-registry.com
export DOCKER_PASSWORD=your-password export DOCKER_USERNAME='$oauthtoken' # your-username if not using nvcr.io
export DOCKER_SERVER=your-docker-registry.com export DOCKER_PASSWORD=YOUR_NGC_CLI_API_KEY # your-password if not using nvcr.io
export IMAGE_TAG=your-image-tag ```
```bash
export IMAGE_TAG=RELEASE_VERSION # e.g. 0.3.2: the release you are using, or your-image-tag if you have built your own Dynamo image.
# The NVIDIA Cloud Operator image will be pulled from `$DOCKER_SERVER/dynamo-operator:$IMAGE_TAG`.
``` ```
The operator image will be pulled from `$DOCKER_SERVER/dynamo-operator:$IMAGE_TAG`. The operator image will be pulled from `$DOCKER_SERVER/dynamo-operator:$IMAGE_TAG`.
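Since the image reference is formed by plain concatenation, a `DOCKER_SERVER` value that ends with a slash (such as `nvcr.io/nvidia/ai-dynamo/`) yields a reference containing `//`. The sketch below shows one way to preview the reference with trailing slashes removed. `operator_image_ref` is a hypothetical helper for illustration only; how the deployment script itself handles a trailing slash is not specified here.

```bash
# Hypothetical helper (not part of Dynamo): join a registry and a tag into the
# operator image reference, dropping trailing slashes from the registry first.
operator_image_ref() {
  registry="$1"
  tag="$2"
  # Strip trailing slashes one at a time
  while [ "${registry%/}" != "$registry" ]; do
    registry="${registry%/}"
  done
  printf '%s/dynamo-operator:%s\n' "$registry" "$tag"
}

# Preview the reference for the example values used in this section
operator_image_ref "nvcr.io/nvidia/ai-dynamo/" "0.3.2"
# prints: nvcr.io/nvidia/ai-dynamo/dynamo-operator:0.3.2
```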
...@@ -107,7 +119,9 @@ if you want guidance during the process, run the deployment script with the `--i ...@@ -107,7 +119,9 @@ if you want guidance during the process, run the deployment script with the `--i
./deploy.sh --crds --interactive ./deploy.sh --crds --interactive
``` ```
**Step 1: Install Custom Resource Definitions (CRDs)** **Installing CRDs manually (alternative to the script deploy.sh)**
**Step 1: Install Custom Resource Definitions (CRDs)**
```bash ```bash
helm install dynamo-crds ./crds/ \ helm install dynamo-crds ./crds/ \
...@@ -116,7 +130,7 @@ helm install dynamo-crds ./crds/ \ ...@@ -116,7 +130,7 @@ helm install dynamo-crds ./crds/ \
--atomic --atomic
``` ```
**Step 2: Build Dependencies and Install Platform** **Step 2: Build Dependencies and Install Platform**
```bash ```bash
helm dep build ./platform/ helm dep build ./platform/
...@@ -150,22 +164,6 @@ We provide a script to uninstall CRDs should you need a clean start. ...@@ -150,22 +164,6 @@ We provide a script to uninstall CRDs should you need a clean start.
## Explore Examples ## Explore Examples
### Hello World
For a basic example that doesn't require a GPU, see the [Hello World](../../examples/hello_world.md)
### LLM
Create a Kubernetes secret containing your sensitive values if needed:
```bash
export HF_TOKEN=your_hf_token
kubectl create secret generic hf-token-secret \
--from-literal=HF_TOKEN=${HF_TOKEN} \
-n ${NAMESPACE}
```
Pick your deployment destination. Pick your deployment destination.
If local If local
...@@ -179,9 +177,13 @@ If kubernetes ...@@ -179,9 +177,13 @@ If kubernetes
export DYNAMO_CLOUD=https://dynamo-cloud.nvidia.com export DYNAMO_CLOUD=https://dynamo-cloud.nvidia.com
``` ```
If deploying to Kubernetes, create a Kubernetes secret containing your sensitive values if needed:
```bash ```bash
# Go to your main dynamo directory. export HF_TOKEN=your_hf_token
cd ../../../ kubectl create secret generic hf-token-secret \
kubectl apply -f examples/llm/deploy/agg.yaml -n $NAMESPACE --from-literal=HF_TOKEN=${HF_TOKEN} \
-n ${NAMESPACE}
``` ```
Follow the [Examples](../../examples/README.md)
\ No newline at end of file