Unverified Commit 403344e5 authored by hhzhang16, committed by GitHub

refactor: refactor dynamo deploy subfolder (#927)

parent 99cd9d85
@@ -40,7 +40,7 @@ def setup_and_teardown():
 "serve",
 "pipeline:Frontend",
 "--working-dir",
-"deploy/dynamo/sdk/src/dynamo/sdk/tests",
+"deploy/sdk/src/dynamo/sdk/tests",
 "--Frontend.model=qwentastic",
 "--Middle.bias=0.5",
 "--dry-run",
@@ -54,7 +54,7 @@ def setup_and_teardown():
 "serve",
 "pipeline:Frontend",
 "--working-dir",
-"deploy/dynamo/sdk/src/dynamo/sdk/tests",
+"deploy/sdk/src/dynamo/sdk/tests",
 "--Frontend.model=qwentastic",
 "--Middle.bias=0.5",
 ]
...
@@ -5,7 +5,7 @@ it via `dynamo serve` or `dynamo deploy`, covering basic concepts as well as
 advanced features like enabling KV routing and disaggregated serving.
 For detailed information about `dynamo serve` infrastructure, see the
-[Dynamo SDK Docs](../deploy/dynamo/sdk/docs/sdk/README.md).
+[Dynamo SDK Docs](../deploy/sdk/docs/sdk/README.md).
 For a guide that walks through how to launch a vLLM-based worker with
 implementation of Disaggregated Serving and KV-Aware Routing included,
@@ -19,7 +19,7 @@ a Python class based definition that requires a few key decorators to get going:
 - `@dynamo_endpoint`: marks methods that can be called by other workers or clients
 For more detailed information on these concepts, see the
-[Dynamo SDK Docs](../deploy/dynamo/sdk/docs/sdk/README.md).
+[Dynamo SDK Docs](../deploy/sdk/docs/sdk/README.md).
 ### Worker Skeleton
@@ -52,7 +52,7 @@ based on the definitions above, it would be: `your_namespace/YourWorker/your_end
 - `endpoint="your_endpoint"`: Defined by the `@dynamo_endpoint` decorator, or by default the name of the function being decorated.
 For more details about service configuration, resource management, and dynamo endpoints,
-see the [Dynamo SDK Docs](../deploy/dynamo/sdk/docs/README.md).
+see the [Dynamo SDK Docs](../deploy/sdk/docs/README.md).
 ### Request/Response Types
@@ -628,5 +628,5 @@ For more information on Disaggregated Serving, see the
 ## Additional Resources
 - Check the [examples](../examples/) directory for more detailed implementations
-- Refer to the [Dynamo SDK Docs](../deploy/dynamo/sdk/docs/sdk/README.md) for API details.
+- Refer to the [Dynamo SDK Docs](../deploy/sdk/docs/sdk/README.md) for API details.
 - For Disaggregated Serving, see the [general guide](../docs/disagg_serving.md) and [performance tuning guide](../docs/guides/disagg_perf_tuning.md).
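The `namespace/Component/endpoint` address convention discussed in the hunks above can be sketched in shell; the values are the placeholder names from the docs, and the variables themselves are purely illustrative, not part of the SDK:

```shell
# Illustrative only: a fully qualified Dynamo endpoint address is the
# namespace, the component (worker class) name, and the endpoint name
# joined by "/".
NAMESPACE=your_namespace
COMPONENT=YourWorker     # the decorated worker class
ENDPOINT=your_endpoint   # from @dynamo_endpoint, or the function name by default
ADDRESS="${NAMESPACE}/${COMPONENT}/${ENDPOINT}"
echo "$ADDRESS"          # your_namespace/YourWorker/your_endpoint
```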
@@ -85,7 +85,7 @@ dynamo build hello_world:Frontend --containerize
 ### 4. Run your container
-As a prerequisite, ensure you have NATS and etcd running by running the docker compose in the deploy directory. You can find it [here](../../deploy/docker-compose.yml).
+As a prerequisite, ensure you have NATS and etcd running by running the docker compose in the deploy directory. You can find it [here](../../deploy/metrics/docker-compose.yml).
 ```bash
 docker compose up -d
@@ -145,7 +145,7 @@ dynamo build graphs.agg:Frontend --containerize
 ### 4. Run your container
-As a prerequisite, ensure you have NATS and etcd running by running the docker compose in the deploy directory. You can find it [here](../../deploy/docker-compose.yml).
+As a prerequisite, ensure you have NATS and etcd running by running the docker compose in the deploy directory. You can find it [here](../../deploy/metrics/docker-compose.yml).
 ```bash
 docker compose up -d
...
@@ -25,7 +25,7 @@ Dynamo provides two distinct deployment paths, each serving different use cases:
 ### 1. 🚀 Dynamo Cloud Kubernetes Platform [PREFERRED]
-The Dynamo Cloud Platform (`deploy/dynamo/helm/`) provides a managed deployment experience:
+The Dynamo Cloud Platform (`deploy/cloud/`) provides a managed deployment experience:
 - Contains the infrastructure components required for the Dynamo cloud platform
 - Used when deploying with the `dynamo deploy` CLI commands
@@ -37,15 +37,15 @@ For detailed instructions on using the Dynamo Cloud Platform, see:
 ### 2. Manual Deployment with Helm Charts
-The manual deployment path (`deploy/Kubernetes/`) is available for users who need more control over their deployments:
+The manual deployment path (`deploy/helm/`) is available for users who need more control over their deployments:
 - Used for manually deploying inference graphs to Kubernetes
 - Contains Helm charts and configurations for deploying individual inference pipelines
 - Provides full control over deployment parameters
 - Requires manual management of infrastructure components
 - Documentation:
-  - [Deploying Dynamo Inference Graphs to Kubernetes using Helm](../../Kubernetes/pipeline/README.md): all-in-one script
-  - [Manual Helm Deployment Guide](manual_helm_deployment.md): detailed instructions on manual deployment
+  - [Using the Deployment Script](manual_helm_deployment.md#using-the-deployment-script): all-in-one script for manual deployment
+  - [Helm Deployment Guide](manual_helm_deployment.md#helm-deployment-guide): detailed instructions for manual deployment
 ## Getting Started
...
@@ -141,7 +141,7 @@ Running the installation script with `--interactive` will guide you through the
 2. [One-time Action] Create a new kubernetes namespace and set it as your default.
 ```bash
-cd deploy/dynamo/helm
+cd deploy/cloud/helm
 kubectl create namespace $NAMESPACE
 kubectl config set-context --current --namespace=$NAMESPACE
 ```
...
@@ -21,7 +21,7 @@ This guide will walk you through the process of deploying an inference graph cre
 While this guide covers deployment of Dynamo inference graphs using Helm, the preferred method to deploy an inference graph is to [deploy with the Dynamo cloud platform](operator_deployment.md). The [Dynamo cloud platform](dynamo_cloud.md) simplifies the deployment and management of Dynamo inference graphs. It includes a set of components (Operator, Kubernetes Custom Resources, etc.) that work together to streamline the deployment and management process.
 Once an inference graph is defined using the Dynamo SDK, it can be deployed onto a Kubernetes cluster using a simple `dynamo deploy` command that orchestrates the following deployment steps:
 1. Building docker images from inference graph components on the cluster
 2. Intelligently composing the encoded inference graph into a complete deployment on Kubernetes
@@ -86,7 +86,7 @@ export PROJECT_ROOT=$(pwd)
 2. Install NATS messaging system:
 ```bash
 # Navigate to dependencies directory
-cd $PROJECT_ROOT/deploy/Kubernetes/pipeline/dependencies
+cd $PROJECT_ROOT/deploy/helm/dependencies
 # Add and update NATS Helm repository
 helm repo add nats https://nats-io.github.io/k8s/helm/charts/
@@ -139,7 +139,7 @@ docker push <TAG>
 3. Deploy using Helm:
 ```bash
 # Navigate to the deployment directory
-cd $PROJECT_ROOT/deploy/Kubernetes/pipeline
+cd $PROJECT_ROOT/deploy/helm
 # Set release name for Helm
 export HELM_RELEASE=hello-world-manual
@@ -167,4 +167,21 @@ curl -X 'POST' 'http://localhost:3000/generate' \
 -d '{"text": "test"}'
 ```
-For convenience, you can find a complete deployment script at `deploy/Kubernetes/pipeline/deploy.sh` that automates all of these steps.
+### Using the Deployment Script
+
+For convenience, you can use the deployment script at `deploy/helm/deploy.sh` that automates all of these steps:
+
+```bash
+export DYNAMO_IMAGE=<dynamo_docker_image_name>
+./deploy.sh <docker_registry> <k8s_namespace> <path_to_dynamo_directory> <dynamo_identifier> [<dynamo_config_file>]
+# Example: export DYNAMO_IMAGE=nvcr.io/nvidian/nim-llm-dev/dynamo-base-worker:0.0.1
+# Example: ./deploy.sh nvcr.io/nvidian/nim-llm-dev my-namespace ../../../examples/hello_world/ hello_world:Frontend
+# Example: ./deploy.sh nvcr.io/nvidian/nim-llm-dev my-namespace ../../../examples/llm graphs.disagg_router:Frontend ../../../examples/llm/configs/disagg_router.yaml
+```
+
+This script handles:
+1. Building and pushing the Docker image
+2. Setting up the Helm values
+3. Installing/upgrading the Helm release
+4. Configuring the necessary Kubernetes resources
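A quick way to check the positional-argument order before invoking the real script is to echo the command first. This sketch reuses the example values from the usage text above; it does not run `deploy.sh` itself:

```shell
# Values taken from the hello_world example in the usage text above.
DOCKER_REGISTRY=nvcr.io/nvidian/nim-llm-dev
K8S_NAMESPACE=my-namespace
DYNAMO_DIR=../../../examples/hello_world/
DYNAMO_ID=hello_world:Frontend
# Argument order: registry, namespace, project dir, graph identifier.
CMD="./deploy.sh $DOCKER_REGISTRY $K8S_NAMESPACE $DYNAMO_DIR $DYNAMO_ID"
echo "$CMD"
```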
@@ -13,7 +13,7 @@ Before proceeding with deployment, ensure you have:
 - Helm package manager
 - Rust packages and toolchain
-You must have first followed the instructions in [deploy/dynamo/helm/README.md](../../../deploy/dynamo/helm/README.md) to install Dynamo Cloud on your Kubernetes cluster.
+You must have first followed the instructions in [deploy/cloud/helm/README.md](../../../deploy/cloud/helm/README.md) to install Dynamo Cloud on your Kubernetes cluster.
 **Note**: Note the `KUBE_NS` variable in the following steps must match the Kubernetes namespace where you installed Dynamo Cloud. You must also expose the `dynamo-store` service externally. This will be the endpoint the CLI uses to interface with Dynamo Cloud.
 ## Understanding the Deployment Process
...
@@ -37,7 +37,7 @@ Inference graphs are compositions of service components that work together to ha
 ## Creating an inference graph
-Once you've written your various Dynamo services (docs on how to write these can be found [here](../../deploy/dynamo/sdk/docs/sdk/README.md)), you can create an inference graph by composing these services together using the following two mechanisms:
+Once you've written your various Dynamo services (docs on how to write these can be found [here](../../deploy/sdk/docs/sdk/README.md)), you can create an inference graph by composing these services together using the following two mechanisms:
 ### 1. Dependencies with `depends()`
@@ -144,7 +144,7 @@ We've provided a set of basic configurations for this example [here](../../examp
 ### 4. Serve your graph
-As a prerequisite, ensure you have NATS and etcd running by running the docker compose in the deploy directory. You can find it [here](../../deploy/docker-compose.yml).
+As a prerequisite, ensure you have NATS and etcd running by running the docker compose in the deploy directory. You can find it [here](../../deploy/metrics/docker-compose.yml).
 ```bash
 docker compose up -d
...
@@ -93,7 +93,7 @@ This example can be deployed to a Kubernetes cluster using [Dynamo Cloud](../../
 ### Prerequisites
-You must have first followed the instructions in [deploy/dynamo/helm/README.md](../../deploy/dynamo/helm/README.md) to create your Dynamo cloud deployment.
+You must have first followed the instructions in [deploy/cloud/helm/README.md](../../deploy/cloud/helm/README.md) to create your Dynamo cloud deployment.
 ### Deployment Steps
...
@@ -45,9 +45,9 @@ In this example, we will use 2 nodes to demo the disagg serving.
 - Deploys DummyWorker as the monolith worker
 ### Prerequisites
-On Node 1, start required services (etcd and NATS) using [Docker Compose](../../../deploy/docker-compose.yml)
+On Node 1, start required services (etcd and NATS) using [Docker Compose](../../../deploy/metrics/docker-compose.yml)
 ```bash
-docker compose -f deploy/docker-compose.yml up -d
+docker compose -f deploy/metrics/docker-compose.yml up -d
 ```
 ### Run the Deployment
...
@@ -64,9 +64,9 @@ sequenceDiagram
 ### Prerequisites
-Start required services (etcd and NATS) using [Docker Compose](../../deploy/docker-compose.yml)
+Start required services (etcd and NATS) using [Docker Compose](../../deploy/metrics/docker-compose.yml)
 ```bash
-docker compose -f deploy/docker-compose.yml up -d
+docker compose -f deploy/metrics/docker-compose.yml up -d
 ```
 ### Build docker
@@ -186,7 +186,7 @@ These examples can be deployed to a Kubernetes cluster using [Dynamo Cloud](../..
 ### Prerequisites
-You must have first followed the instructions in [deploy/dynamo/helm/README.md](../../deploy/dynamo/helm/README.md) to install Dynamo Cloud on your Kubernetes cluster.
+You must have first followed the instructions in [deploy/cloud/helm/README.md](../../deploy/cloud/helm/README.md) to install Dynamo Cloud on your Kubernetes cluster.
 **Note**: The `KUBE_NS` variable in the following steps must match the Kubernetes namespace where you installed Dynamo Cloud. You must also expose the `dynamo-store` service externally. This will be the endpoint the CLI uses to interface with Dynamo Cloud.
...
@@ -17,7 +17,7 @@ Note that this can be easily extended to more nodes. You can also run the Fronte
 **Step 1**: Start NATS/ETCD on your head node. Ensure you have the correct firewall rules to allow communication between the nodes as you will need the NATS/ETCD endpoints to be accessible by all other nodes.
 ```bash
 # node 1
-docker compose -f deploy/docker-compose.yml up -d
+docker compose -f deploy/metrics/docker-compose.yml up -d
 ```
 **Step 2**: Create the inference graph for this node. Here we will use the `agg_router.py` (even though we are doing disaggregated serving) graph because we want the `Frontend`, `Processor`, `Router`, and `VllmWorker` to spin up (we will spin up the other decode worker and prefill worker separately on different nodes later).
...
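Workers on the other nodes need to reach the head node's NATS and etcd. A sketch of exporting those endpoints, assuming the stock ports (NATS 4222, etcd client port 2379); the environment-variable names here are illustrative, so confirm the exact names your workers read:

```shell
# Run on each non-head node; replace 10.0.0.1 with your head node's IP.
# Variable names are illustrative -- check your worker's configuration.
HEAD_NODE_IP=10.0.0.1
export NATS_SERVER="nats://${HEAD_NODE_IP}:4222"     # NATS default client port
export ETCD_ENDPOINTS="http://${HEAD_NODE_IP}:2379"  # etcd default client port
echo "$NATS_SERVER $ETCD_ENDPOINTS"
```

These are the same endpoints the firewall rules in Step 1 must leave reachable from every node.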
@@ -35,9 +35,9 @@ Note: TensorRT-LLM disaggregation does not support conditional disaggregation ye
 ### Prerequisites
-Start required services (etcd and NATS) using [Docker Compose](../../deploy/docker-compose.yml)
+Start required services (etcd and NATS) using [Docker Compose](../../deploy/metrics/docker-compose.yml)
 ```bash
-docker compose -f deploy/docker-compose.yml up -d
+docker compose -f deploy/metrics/docker-compose.yml up -d
 ```
 ### Build docker
...
@@ -44,7 +44,7 @@ cargo test
 The simplest way to deploy the pre-requisite services is using
 [docker-compose](https://docs.docker.com/compose/install/linux/),
-defined in [deploy/docker-compose.yml](../../deploy/docker-compose.yml).
+defined in [deploy/metrics/docker-compose.yml](../../deploy/metrics/docker-compose.yml).
 ```
 docker-compose up -d
...
@@ -44,7 +44,7 @@ cargo test
 The simplest way to deploy the pre-requisite services is using
 [docker-compose](https://docs.docker.com/compose/install/linux/),
-defined in the project's root [docker-compose.yml](docker-compose.yml).
+defined in the project's root [docker-compose.yml](../../../docker-compose.yml).
 ```
 docker-compose up -d
...
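After `docker-compose up -d`, it can help to probe etcd and NATS before starting any workers. The sketch below only builds the health URLs, assuming the stock defaults (etcd serves clients on 2379; NATS exposes `/healthz` on its monitoring port 8222, if the compose file enables monitoring); uncomment the curl lines to actually probe:

```shell
# Assumed defaults: etcd client port 2379, NATS monitoring port 8222.
ETCD_HEALTH="http://localhost:2379/health"
NATS_HEALTH="http://localhost:8222/healthz"
echo "$ETCD_HEALTH $NATS_HEALTH"
# curl -sf "$ETCD_HEALTH"   # JSON health report once etcd is up
# curl -sf "$NATS_HEALTH"   # {"status":"ok"} once NATS is up
```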
@@ -78,7 +78,7 @@ requires = ["hatchling"]
 build-backend = "hatchling.build"
 [tool.hatch.build.targets.wheel]
-packages = ["deploy/dynamo/sdk/src/dynamo", "components/planner/src/dynamo"]
+packages = ["deploy/sdk/src/dynamo", "components/planner/src/dynamo"]
 # This section is for including the binaries in the wheel package
 # but doesn't make them executable scripts in the venv bin directory
@@ -132,7 +132,7 @@ addopts = [
 "--mypy",
 "--ignore-glob=*model.py",
 "--ignore-glob=*_inc.py",
-"--ignore-glob=deploy/dynamo/api-store/*",
+"--ignore-glob=deploy/cloud/api-store/*",
 # FIXME: Get relative/generic blob paths to work here
 ]
 xfail_strict = true
...