<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Dynamo Deployment Guide
This directory contains all the necessary files and instructions for deploying Dynamo in various environments. Choose the deployment method that best suits your needs:
## Directory Structure
```
deploy/
├── cloud/ # Cloud deployment configurations and tools
├── helm/ # Helm charts for manual Kubernetes deployment
├── metrics/ # Monitoring and metrics configuration
├── sdk/ # Dynamo SDK and related tools
└── README.md # This file
```
## Deployment Options
### 1. 🚀 Dynamo Cloud Platform [PREFERRED]
The Dynamo Cloud Platform provides a managed deployment experience with:
- Automated infrastructure management
- Built-in monitoring and metrics
- Simplified deployment process via `dynamo deploy` CLI commands
- **Dynamo Cloud Platform**: Best for most users, provides managed deployment with built-in monitoring
  - See [Dynamo Cloud Platform Guide](../docs/guides/dynamo_deploy/dynamo_cloud.md)
  - Recommended for production deployments
  - Simplifies dependency management
  - Provides infrastructure for user management
- **Manual Helm Deployment**: For users who need full control over their deployment
  - See [Manual Helm Deployment Guide](../docs/guides/dynamo_deploy/manual_helm_deployment.md)
  - Suitable for custom deployments
  - Requires manual management of dependencies
  - Provides maximum flexibility for users
## Example Deployments
To help you get started, we provide several example deployments:
### Hello World Example
A basic example to learn Dynamo deployment: [Hello World Example](../examples/hello_world/README.md#deploying-to-and-running-the-example-in-kubernetes)
- Shows how to deploy a simple three-service pipeline that processes text
- Provides step-by-step instructions for building your service and testing with port forwarding
- Includes sample output showing the text flow between services
### LLM Examples
Example for deploying LLM services: [LLM Example](../examples/llm/README.md#deploy-to-kubernetes)
- Demonstrates deploying and making inference requests against LLM models
- Includes examples for both aggregated and disaggregated serving
- Provides detailed deployment steps and testing instructions
@@ -13,42 +13,12 @@ This guide provides instructions for setting up the Inference Gateway with Dynam
1. **Install Dynamo Cloud**
Follow the instructions in [deploy/cloud/README.md](../../deploy/cloud/README.md) to deploy Dynamo Cloud on your Kubernetes cluster. This will set up the necessary infrastructure components for managing Dynamo inference graphs.
[See Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
2. **Launch 2 Dynamo Deployments**
Deploy 2 Dynamo aggregated graphs following the instructions in [examples/llm/README.md](../../examples/llm/README.md):
@@ -57,7 +57,6 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container
- Start the [components/metrics](../../components/metrics/README.md) application to begin monitoring for metric events from dynamo workers and aggregating them on a Prometheus metrics endpoint: `http://localhost:9091/metrics`.
- Uncomment the appropriate lines in prometheus.yml to poll port 9091.
- Start worker(s) that publish KV Cache metrics: [examples/rust/service_metrics/bin/server](../../lib/runtime/examples/service_metrics/README.md) can populate dummy KV Cache metrics.
- For a real workflow with real data, see the KV Routing example in [examples/llm/utils/vllm.py](../../examples/llm/utils/vllm.py).
Follow individual examples to serve models locally.
TODO: Follow individual examples to serve models locally.
## Deploying Examples to Kubernetes
...
...
@@ -16,7 +16,6 @@ If you are a **👤 Dynamo User** first follow the [Quickstart Guide](../guides/
### Instructions for Dynamo Contributor
If you are a **🧑‍💻 Dynamo Contributor**, first follow the instructions in [deploy/cloud/helm/README.md](../../deploy/cloud/helm/README.md) to create your Dynamo Cloud deployment.
Make sure the `deploy.sh --crds --interactive` script for your Dynamo Cloud deployment finished successfully.
You will have to rebuild the Dynamo platform images as the code evolves. For more details, see the [Cloud Guide](../guides/dynamo_deploy/dynamo_cloud.md).
# Hello World Example: Basic Pipeline
## Overview
This example demonstrates the basic concepts of Dynamo by creating a simple multi-service pipeline. It shows how to:
1. Create and connect multiple Dynamo services
2. Pass data between services using Dynamo's runtime
3. Set up a simple HTTP API endpoint
4. Deploy and interact with a Dynamo service graph
Graph Architecture:
```
Users/Clients (HTTP)
│
▼
┌─────────────┐
│ Frontend │ HTTP API endpoint (/generate)
└─────────────┘
│ dynamo/runtime
▼
┌─────────────┐
│ Middle │
└─────────────┘
│ dynamo/runtime
▼
┌─────────────┐
│ Backend │
└─────────────┘
```
## Component Descriptions
### Frontend Service
- Serves as the entry point for external HTTP requests
- Exposes a `/generate` HTTP API endpoint that clients can call
- Processes incoming text and passes it to the Middle service
### Middle Service
- Acts as an intermediary service in the pipeline
- Receives requests from the Frontend
- Appends "-mid" to the text and forwards it to the Backend
### Backend Service
- Functions as the final service in the pipeline
- Processes requests from the Middle service
- Appends "-back" to the text and yields tokens
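The text flow through the three services described above can be sketched in plain Python. This is a minimal illustration that does not use the Dynamo SDK or runtime: the function names `frontend`, `middle`, and `backend` are hypothetical stand-ins for the services, and splitting the final text on whitespace to produce tokens is an assumption made for this sketch.

```python
from typing import Iterator


def backend(text: str) -> Iterator[str]:
    # Final stage: append "-back", then yield the result one token at a time.
    # Splitting on whitespace is an assumption made for this sketch.
    yield from f"{text}-back".split()


def middle(text: str) -> Iterator[str]:
    # Intermediate stage: append "-mid" and forward the text to the backend.
    yield from backend(f"{text}-mid")


def frontend(text: str) -> Iterator[str]:
    # Entry point: pass the incoming text on to the middle stage.
    yield from middle(text)


if __name__ == "__main__":
    # The request body used later in this example is {"text": "test"}.
    print(list(frontend("test")))
```

In the real example each stage is a separate Dynamo service and the calls between stages go through the Dynamo runtime rather than direct function calls.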
## Running the Example Locally
Make sure etcd and NATS are running:
```bash
sudo systemctl start etcd
sudo systemctl start nats-server
```
1. Launch all three services using a single command:
```bash
cd /workspace/examples/hello_world
dynamo serve hello_world:Frontend
```
The `dynamo serve` command deploys the entire service graph, automatically handling the dependencies between Frontend, Middle, and Backend services.
2. Send a request to the Frontend using curl:
```bash
curl -X 'POST' \
  'http://localhost:8000/generate' \
  -H 'accept: text/event-stream' \
  -H 'Content-Type: application/json' \
  -d '{
  "text": "test"
}'
```
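The same request can also be issued from Python using only the standard library. This is a sketch under the assumptions of the curl command above (the Frontend listening on `http://localhost:8000` and a JSON body with a `text` field); the helper names `build_generate_request` and `stream_generate` are hypothetical and not part of Dynamo.

```python
import json
import urllib.request


def build_generate_request(
    text: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    # Mirror the curl call: POST a JSON body and accept an event stream.
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=payload,
        headers={
            "accept": "text/event-stream",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_generate(text: str) -> None:
    # Send the request and print each non-empty line as it arrives.
    # Requires the services started with `dynamo serve` to be running.
    with urllib.request.urlopen(build_generate_request(text)) as response:
        for raw_line in response:
            line = raw_line.decode("utf-8").rstrip()
            if line:
                print(line)
```

With the services running, calling `stream_generate("test")` sends the same payload as the curl command.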
# Deploy to Kubernetes
You should first deploy the Dynamo Cloud Platform.
If you are a **👤 Dynamo User** first follow the [Quickstart Guide](../guides/dynamo_deploy/quickstart.md).
If you are a **🧑‍💻 Dynamo Contributor** and you have changed the platform code, you will have to rebuild the Dynamo platform. To do so, see the [Cloud Guide](../guides/dynamo_deploy/dynamo_cloud.md).
## Deploy your service using a DynamoGraphDeployment CR.
Building a vLLM docker image for ARM machines currently involves building vLLM from source, which is known to have performance issues and to require extensive system RAM; see [vLLM Issue 8878](https://github.com/vllm-project/vllm/issues/8878).
You can tune the number of parallel build jobs for building vLLM from source
on ARM based on your available cores and system RAM with `VLLM_MAX_JOBS`.
For example, on an ARM machine with low system resources:
When vLLM has pre-built ARM wheels published, this process can be improved.
```
### Run container
### Run the container you have built
```
./container/run.sh -it --framework vllm
./container/run.sh -it --framework VLLM
```
## Run Deployment
...
...
@@ -147,127 +147,6 @@ This figure shows an overview of the major components to deploy:
```
```{note}
The planner component is enabled by default for all deployment architectures but is set to no-op mode. This means the planner observes metrics but doesn't take scaling actions. To enable active scaling, you can add `--Planner.no-operation=false` to your `dynamo serve` command. For more details, see [Planner](../architecture/planner_intro.rst).
```
### Example architectures
```{note}
For a non-dockerized deployment, first export `DYNAMO_HOME` to point to the dynamo repository root, e.g. `export DYNAMO_HOME=$(pwd)`
```
#### Aggregated serving
```bash
cd $DYNAMO_HOME/examples/llm
dynamo serve graphs.agg:Frontend -f ./configs/agg.yaml
```
#### Aggregated serving with KV Routing
```bash
cd $DYNAMO_HOME/examples/llm
dynamo serve graphs.agg_router:Frontend -f ./configs/agg_router.yaml
```
#### Disaggregated serving
```bash
cd $DYNAMO_HOME/examples/llm
dynamo serve graphs.disagg:Frontend -f ./configs/disagg.yaml
```
#### Disaggregated serving with KV Routing
```bash
cd $DYNAMO_HOME/examples/llm
dynamo serve graphs.disagg_router:Frontend -f ./configs/disagg_router.yaml
"content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
}
],
"stream":false,
"max_tokens": 30
}'
```
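The four `dynamo serve` invocations in the example architectures follow one pattern: a graph entry point plus a matching config file, optionally followed by the planner flag. The sketch below only builds the command line and does not run it; the `GRAPHS` table and the `serve_command` helper are hypothetical names introduced for illustration, while the graph names, config paths, and the `--Planner.no-operation=false` flag are taken from this guide.

```python
import shlex
from typing import Dict, List, Tuple

# Architecture name -> (graph entry point, config file), as listed in this guide.
GRAPHS: Dict[str, Tuple[str, str]] = {
    "agg": ("graphs.agg:Frontend", "./configs/agg.yaml"),
    "agg_router": ("graphs.agg_router:Frontend", "./configs/agg_router.yaml"),
    "disagg": ("graphs.disagg:Frontend", "./configs/disagg.yaml"),
    "disagg_router": ("graphs.disagg_router:Frontend", "./configs/disagg_router.yaml"),
}


def serve_command(architecture: str, active_planner: bool = False) -> List[str]:
    # Build the argument list for `dynamo serve` without executing it.
    graph, config = GRAPHS[architecture]
    command = ["dynamo", "serve", graph, "-f", config]
    if active_planner:
        # Switch the planner from no-op mode to taking scaling actions.
        command.append("--Planner.no-operation=false")
    return command


if __name__ == "__main__":
    print(shlex.join(serve_command("disagg_router", active_planner=True)))
```

The resulting command is meant to be run from `$DYNAMO_HOME/examples/llm`, since the config paths are relative to that directory.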
### Multinode deployment
See [Multinode Examples](../examples/multinode.md) for more details.
### Close deployment
See [Close deployment](../guides/dynamo_serve.md#close-deployment) in the *Dynamo Run* topic to learn about how to close the deployment.
## Deploy to Kubernetes
These examples can be deployed to a Kubernetes cluster using [Dynamo Cloud](../guides/dynamo_deploy/dynamo_cloud.md) and the Dynamo CLI.
### Prerequisites
You must first follow the instructions in [deploy/cloud/helm/README.md](https://github.com/ai-dynamo/dynamo/blob/main/deploy/cloud/helm/README.md) to install Dynamo Cloud on your Kubernetes cluster.
```{note}
The `KUBE_NS` variable in the following steps must match the Kubernetes namespace where you installed Dynamo Cloud. You must also expose the `dynamo-store` service externally. This will be the endpoint the CLI uses to interface with Dynamo Cloud.
```
### Deployment Steps
For detailed deployment instructions, please refer to the [Operator Deployment Guide](../guides/dynamo_deploy/operator_deployment.md). The following are the specific commands for the LLM examples:
```bash
# Set your project root directory
export PROJECT_ROOT=$(pwd)
# Configure environment variables (see operator_deployment.md for details)
export KUBE_NS=dynamo-cloud
export DYNAMO_CLOUD=http://localhost:8080 # If using port-forward
# OR
# export DYNAMO_CLOUD=https://dynamo-cloud.nvidia.com # If using Ingress/VirtualService
# Build the Dynamo base image (see operator_deployment.md for details)
# TODO: Deploy your service using a DynamoGraphDeployment CR.
```
**Note**: Optionally add `--Planner.no-operation=false` at the end of the deployment command to enable the planner component to take scaling actions on your deployment.
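Before running the deployment steps, it can help to confirm that the variables exported above are set. The following is a minimal preflight sketch, not part of the Dynamo CLI: `check_deploy_env` is a hypothetical helper, and the rule that `DYNAMO_CLOUD` must be an `http` or `https` URL is an assumption based on the two example values shown above.

```python
import os
from typing import List, Mapping, Optional
from urllib.parse import urlparse

REQUIRED_VARS = ("PROJECT_ROOT", "KUBE_NS", "DYNAMO_CLOUD")


def check_deploy_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    # Return a list of human-readable problems; an empty list means all good.
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    endpoint = env.get("DYNAMO_CLOUD", "")
    # Assumption for this sketch: the endpoint is an http(s) URL.
    if endpoint and urlparse(endpoint).scheme not in ("http", "https"):
        problems.append("DYNAMO_CLOUD must be an http(s) URL")
    return problems


if __name__ == "__main__":
    for problem in check_deploy_env():
        print(f"warning: {problem}")
```

Passing a dictionary instead of relying on `os.environ` makes the check easy to exercise without changing the shell environment.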
### Testing the Deployment
Once the deployment is complete, you can test it using:
"content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
}
],
"stream":false,
"max_tokens": 30
}'
The planner component is enabled by default for all deployment architectures but is set to no-op mode. This means the planner observes metrics but doesn't take scaling actions. To enable active scaling, you can add `--Planner.no-operation=false` to your `dynamo serve` command.
For more details, see [Planner Architecture Overview](../architecture/planner_intro.rst).
- :runner: dynamo run: quickly spin up a server to experiment with a specified model, input and output target.
- :palm_up_hand: dynamo serve: compose a graph of workers locally and serve.
- :hammer: (Experimental) dynamo build: containerize either the entire graph or parts of graph to multiple containers
- :rocket: (Experimental) dynamo deploy: deploy to K8 with helm charts or custom operators
- :cloud: (Experimental) dynamo cloud: interact with your dynamo cloud server
For more detailed examples of serving LLMs with disaggregated serving, KV-aware routing, and more, refer to [LLM deployment examples](https://github.com/ai-dynamo/dynamo/blob/main/examples/llm/README.md).
@@ -15,31 +15,21 @@ See the License for the specific language governing permissions and
limitations under the License.
-->
# Deploying Inference Graphs to Kubernetes (`dynamo deploy`)
# Deploying Inference Graphs to Kubernetes
This guide explains the deployment options available for Dynamo inference graphs in Kubernetes environments.
We expect users to deploy their inference graphs using CRDs or helm charts.
## Deployment Options
Prior to deploying an inference graph the user should deploy the Dynamo Cloud Platform.
Dynamo Cloud acts as an orchestration layer between the end user and Kubernetes, handling the complexity of deploying your graphs for you. This is a one-time action, only necessary the first time you deploy a DynamoGraph.
Dynamo provides two distinct deployment options that each serve different use cases:
1. Dynamo Cloud Kubernetes Platform is preferred in cases that support it
2. Manual Deployment with Helm Charts is suited to users who need more control over their deployments
# 1. Please follow [Installing Dynamo Cloud](./dynamo_cloud.md) for steps to install.
For details about the Dynamo Cloud Platform, see the [Dynamo Operator Guide](dynamo_operator.md)
### Dynamo Cloud Kubernetes Platform [PREFERRED]
# 2. Follow [Examples](../../examples/README.md) to see how you can deploy your Inference Graphs.
The Dynamo Cloud Platform (`deploy/cloud/`) provides a managed deployment experience:
- Contains the infrastructure components required for the Dynamo cloud platform
- Used when deploying with the `dynamo deploy` CLI commands
- Provides a managed deployment experience
For detailed instructions on using the Dynamo Cloud Platform, see:
- [Dynamo Cloud Platform Guide](dynamo_cloud.md): walks through installing and configuring the Dynamo cloud components on your Kubernetes cluster.
- [Dynamo Operator Guide](dynamo_operator.md)
### Manual Deployment with Helm Charts
## Manual Deployment with Helm Charts
Users who need more control over their deployments can use the manual deployment path (`deploy/helm/`):
...
...
@@ -50,20 +40,3 @@ Users who need more control over their deployments can use the manual deployment
- Documentation:
  - [Using the Deployment Script](manual_helm_deployment.md#using-the-deployment-script): all-in-one script for manual deployment
  - [Helm Deployment Guide](manual_helm_deployment.md#helm-deployment-guide): detailed instructions for manual deployment
## Getting Started with Helm Deploy
1. **For Dynamo Cloud Platform**:
   - Follow the [Dynamo Cloud Platform Guide](dynamo_cloud.md)
   - Deploy a Hello World pipeline using the [Operator Deployment Guide](operator_deployment.md)
   - Deploy a Dynamo LLM pipeline to Kubernetes [Deploy LLM Guide](../../examples/llm_deployment.md#deploy-to-kubernetes)
   - Model caching with [Fluid](model_caching_with_fluid.md)
2. **For Manual Deployment**:
   - Follow the [Manual Helm Deployment Guide](manual_helm_deployment.md)
## Example Deployments
See the [Hello World example](../../examples/hello_world.md#deploying-to-and-running-the-example-in-kubernetes) for a complete walkthrough of deploying a simple inference graph.
See the [LLM example](../../examples/llm_deployment.md#deploy-to-kubernetes) for a complete walkthrough of deploying a production-ready LLM inference pipeline to Kubernetes.
@@ -15,9 +15,9 @@ See the License for the specific language governing permissions and
limitations under the License.
-->
# Dynamo Cloud Kubernetes Platform (Dynamo Deploy)
# Dynamo Cloud Kubernetes Platform
The Dynamo Cloud platform is a comprehensive solution for deploying and managing Dynamo inference graphs (also referred to as pipelines) in Kubernetes environments. It provides a streamlined experience for deploying, scaling, and monitoring your inference services. You can interface with Dynamo Cloud using the `deploy` subcommand available in the Dynamo CLI (for example, `dynamo deploy`)
The Dynamo Cloud platform is a comprehensive solution for deploying and managing Dynamo inference graphs (also referred to as pipelines) in Kubernetes environments. It provides a streamlined experience for deploying, scaling, and monitoring your inference services.
## Overview
...
...
@@ -26,11 +26,8 @@ The Dynamo cloud platform consists of several key components:
- **Dynamo Operator**: A Kubernetes operator that manages the lifecycle of Dynamo inference graphs from build ➡️ deploy. For more information on the operator, see [Dynamo Kubernetes Operator Documentation](../dynamo_deploy/dynamo_operator.md).
- **Custom Resources**: Kubernetes custom resources for defining and managing Dynamo services
These components work together to provide a seamless deployment experience, handling everything from containerization to scaling and monitoring.

## Prerequisites
## Deployment Prerequisites
Before getting started with the Dynamo cloud platform, ensure you have:
...
...
@@ -56,58 +53,20 @@ Just export the environment variable. This will be the image used by your indivi
For advanced examples make sure you have first built and pushed to your registry Dynamo Base Image for Dynamo inference runtime. This is a one-time operation.
For a custom setup build and push to your registry Dynamo Base Image for Dynamo inference runtime. This is a one-time operation.
```bash
# Run the script to build the default dynamo:latest-vllm image.
./container/build.sh
export IMAGE_TAG=<TAG>
# retag the image
# Tag the image
docker tag dynamo:latest-vllm <your-registry>/dynamo:${IMAGE_TAG}
docker push <your-registry>/dynamo:${IMAGE_TAG}
```
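The tag-and-push step above can be sketched in Python to make the image naming explicit. This is a minimal sketch, not part of Dynamo: `runtime_image_ref` and `tag_and_push_commands` are hypothetical helpers, and the only facts taken from this guide are the local image name `dynamo:latest-vllm` and the target form `<your-registry>/dynamo:<tag>`. The functions only build command lines and do not invoke Docker.

```python
import shlex
from typing import List

# Default image name produced by ./container/build.sh, per this guide.
LOCAL_IMAGE = "dynamo:latest-vllm"


def runtime_image_ref(registry: str, tag: str) -> str:
    # Compose <registry>/dynamo:<tag>, tolerating a trailing slash on the registry.
    registry = registry.strip().rstrip("/")
    if not registry or not tag:
        raise ValueError("both a registry and a tag are required")
    return f"{registry}/dynamo:{tag}"


def tag_and_push_commands(registry: str, tag: str) -> List[List[str]]:
    # Build the `docker tag` and `docker push` argument lists without running them.
    ref = runtime_image_ref(registry, tag)
    return [["docker", "tag", LOCAL_IMAGE, ref], ["docker", "push", ref]]


if __name__ == "__main__":
    # The registry and tag below are placeholder values.
    for command in tag_and_push_commands("registry.example.com/team", "0.0.1"):
        print(shlex.join(command))
```

Keeping the reference in one helper avoids the tag and push commands drifting apart when the registry or tag changes.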
## Building Docker Images for Dynamo Cloud Components
The Dynamo cloud platform components need to be built and pushed to a container registry before deployment. You can build these components individually or all at once.
### Setting Up Environment Variables
First, set the required environment variables for building and pushing images:
```bash
# Set your container registry
export DOCKER_SERVER=<CONTAINER_REGISTRY>
# Set the image tag (e.g., latest, 0.0.1, etc.)
export IMAGE_TAG=<TAG>
```
The placeholders are:
- `<CONTAINER_REGISTRY>`: Your container registry (e.g., `nvcr.io`, `docker.io/<your-username>`, etc.)
- `<TAG>`: The tag you want to use for the images of the Dynamo cloud components (e.g., `latest`, `0.0.1`, etc.)
If the runtime image tag is not explicitly set, it defaults to `latest`.
The tag is applied to the `dynamo-operator:<IMAGE_TAG>` image for the Operator. The runtime (base) image contains the inference toolchain and the SDK and is built by `build.sh`. The Operator tag does not have to match the runtime image tag, but the images must be compatible.
**Important:** Make sure you're logged in to your container registry before pushing images. For example:
```bash
docker login <CONTAINER_REGISTRY>
```
### Building Components
You can build and push all platform components at once:
Once you've built and pushed the components, you can deploy the platform to your Kubernetes cluster.
### Prerequisites
## Prerequisites
Before deploying Dynamo Cloud, ensure your Kubernetes cluster meets the following requirements:
...
...
@@ -135,144 +94,19 @@ kubectl get storageclass
# standard (default) kubernetes.io/gce-pd Delete Immediate true 1d
```
## Installation
Follow the [Quickstart Guide](./quickstart.md) to install Dynamo Cloud.
### Installation using the helper script
1. Set the required environment variables:
```bash
export PROJECT_ROOT=$(pwd)
export DOCKER_USERNAME=<your-docker-username>
export DOCKER_PASSWORD=<your-docker-password>
export DOCKER_SERVER=<your-docker-server>
export IMAGE_TAG=<TAG> # Use the same tag you used when building the images
export NAMESPACE=dynamo-cloud # change this to whatever you want!
export DYNAMO_INGRESS_SUFFIX=dynamo-cloud.com # change this to whatever you want!
```
``` {note}
DOCKER_USERNAME and DOCKER_PASSWORD are optional and only needed if you want to pull docker images from a private registry.
A docker image pull secret is created automatically if these variables are set. Its name is `docker-imagepullsecret` unless overridden by the `DOCKER_SECRET_NAME` environment variable.
```
The Dynamo Cloud Platform auto-generates docker images for pipelines and pushes them to a container registry.
By default, the platform uses the same container registry as the platform components (specified by `DOCKER_SERVER`).
However, you can use a different container registry for the platform components by making sure an associated kubernetes secret is present:
If you wish to expose your Dynamo Cloud Platform externally, you can set up the following environment variables:
```bash
# if using ingress
export INGRESS_ENABLED="true"
export INGRESS_CLASS="nginx" # or whatever ingress class you have configured
# if using istio
export ISTIO_ENABLED="true"
export ISTIO_GATEWAY="istio-system/istio-ingressgateway" # or whatever istio gateway you have configured
```
Running the installation script with `--interactive` guides you through the process of exposing your Dynamo Cloud Platform externally if you don't want to set these environment variables manually.
2. [One-time Action] Create a new kubernetes namespace and set it as your default.
3. Deploy the Helm charts (install CRDs first, then platform) using the deployment script:
```bash
./deploy.sh --crds
```
If you want guidance during the process, run the deployment script with the `--interactive` flag:
```bash
./deploy.sh --crds --interactive
```
⚠️ **Note:** Omitting `--crds` skips the CRD installation/upgrade. This is useful when installing on a shared cluster, as CRDs are cluster-scoped resources.

⚠️ **Note:** If you'd like to only generate the `generated-values.yaml` file without deploying to Kubernetes (e.g., for inspection, CI workflows, or dry-run testing), use:
```bash
./deploy_dynamo_cloud.py --yaml-only
```
### Installation using published helm chart
To install Dynamo Cloud using the published Helm chart, you'll need to configure Docker registry credentials and image settings.
#### Environment Setup
Set the required environment variables:
```bash
# Docker registry configuration
export DOCKER_SERVER="your-registry.com" # Docker registry server where the Dynamo Cloud service (operator) images are available
export IMAGE_TAG="v1.0.0" # Image tag to deploy
export NAMESPACE="dynamo-cloud" # Target namespace
# Components-specific Docker registry (if different from DOCKER_SERVER)
export COMPONENTS_DOCKER_SERVER="your-pipeline-registry.com" # Registry for Dynamo components images
# Image pull secret for the operator itself
export DOCKER_SECRET_NAME="my-pull-secret" # Secret for pulling the Dynamo Cloud service (operator) images
export COMPONENTS_DOCKER_SECRET_NAME="my-components-pull-secret" # Secret for pulling Dynamo components images (if needed)
```
You can create an image pull secret with the following command:
<a id="k8-helm-deploy"></a>
# Deploying Dynamo Inference Graphs to Kubernetes using Helm
This guide describes the deployment process of an inference graph created using the Dynamo SDK onto a Kubernetes cluster.
While this guide covers deployment of Dynamo inference graphs using Helm, the preferred method to deploy an inference graph is to [deploy with the Dynamo cloud platform](operator_deployment.md). The [Dynamo cloud platform](dynamo_cloud.md) simplifies the deployment and management of Dynamo inference graphs. It includes a set of components (Operator, Kubernetes Custom Resources, etc.) that work together to streamline the deployment and management process.
Once an inference graph is defined using the Dynamo SDK, it can be deployed onto a Kubernetes cluster using a simple `dynamo deploy` command that orchestrates the following deployment steps:
1. Building docker images from inference graph components on the cluster
2. Intelligently composing the encoded inference graph into a complete deployment on Kubernetes
3. Enabling autoscaling, monitoring, and observability for the inference graph
4. Easy administration of deployments via UI
## Helm Deployment Guide
### Setting up MicroK8s
Follow these steps to set up a local Kubernetes cluster using MicroK8s:
1. Install MicroK8s:
```bash
sudo snap install microk8s --classic
```
2. Configure user permissions:
```bash
sudo usermod -a -G microk8s $USER
sudo chown -R $USER ~/.kube
```
3. **Important**: Log out and log back in for the permissions to take effect.
After completing these steps, your cluster has the necessary messaging and storage infrastructure for running Dynamo inference graphs.
### Building and Deploying the Pipeline
Follow these steps to containerize and deploy your inference pipeline:
1. Build and containerize the pipeline:
``` {note}
For instructions on building and pushing the Dynamo base image, see [Building the Dynamo Base Image](../../get_started.md#building-the-dynamo-base-image).