docs: Post-Merge cleanup of the deploy documentation (#1922)

Unverified Commit 95dd9426 authored Jul 21, 2025 by atchernych Committed by GitHub Jul 21, 2025
20 changed files
--- a/README.md
+++ b/README.md
@@ -83,7 +83,7 @@ docker push <your-registry>/dynamo-base:latest-vllm
 ```

 Notes about builds for specific frameworks:
- For specific details on the `--framework vllm` build, see [here](examples/llm/README.md).
+- For specific details on the `--framework vllm` build, see [here](examples/vllm/README.md).
 - For specific details on the `--framework tensorrtllm` build, see [here](examples/tensorrt_llm/README.md).

 Note about AWS environments:
@@ -99,14 +99,14 @@ export DYNAMO_IMAGE=<your-registry>/dynamo-base:latest-vllm

 ### Running and Interacting with an LLM Locally

-To run a model and interact with it locally you can call `dynamo
-run` with a hugging face model. `dynamo run` supports several backends
-including: `mistralrs`, `sglang`, `vllm`, and `tensorrtllm`.
+You can run a model and interact with it locally using the commands below.
+Dynamo supports several backends, including `mistralrs`, `sglang`, `vllm`, and `tensorrtllm`.

-#### Example Command
+#### Example Commands

 ```
-dynamo run out=vllm deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+python -m dynamo.frontend [--http-port 8080]
+python -m dynamo.vllm deepseek-ai/DeepSeek-R1-Distill-Llama-8B
 ```

 ```

--- a/container/launch_message.txt
+++ b/container/launch_message.txt
@@ -48,7 +48,8 @@ tools.

 Try the following to begin interacting with a model:
 > dynamo --help
-> dynamo run Qwen/Qwen2.5-3B-Instruct
+> python -m dynamo.frontend [--http-port 8080]
+> python -m dynamo.vllm Qwen/Qwen2.5-3B-Instruct

 To run more complete deployment examples, instances of etcd and nats need to be
 accessible within the container. This is generally done by connecting to
@@ -58,6 +59,6 @@ cases, you can start them in the container as well:
 > etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://0.0.0.0:2379 --data-dir /tmp/etcd &

 With etcd/nats accessible, run the examples:
-> cd examples/hello_world
-> dynamo serve hello_world:Frontend
+> cd examples
+

--- a/deploy/README.md
+++ b/deploy/README.md
-<!--
-SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-SPDX-License-Identifier: Apache-2.0
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-->
-
-# Dynamo Deployment Guide
-
-This directory contains all the necessary files and instructions for deploying Dynamo in various environments. Choose the deployment method that best suits your needs:
-
-## Directory Structure
-
-```
-deploy/
-├── cloud/                    # Cloud deployment configurations and tools
-├── helm/                     # Helm charts for manual Kubernetes deployment
-├── metrics/                  # Monitoring and metrics configuration
-├── sdk/                      # Dynamo SDK and related tools
-└── README.md                 # This file
-```
-
-## Deployment Options
-
-### 1. 🚀 Dynamo Cloud Platform [PREFERRED]
-
-The Dynamo Cloud Platform provides a managed deployment experience with:
- Automated infrastructure management
- Built-in monitoring and metrics
- Simplified deployment process via `dynamo deploy` CLI commands
- Production-ready configurations
- Managed NATS and etcd dependencies
-
-For detailed instructions, see:
- [Dynamo Cloud Platform Guide](../docs/guides/dynamo_deploy/dynamo_cloud.md)
- [Operator Deployment Guide](../docs/guides/dynamo_deploy/operator_deployment.md)
-
-### 2. Manual Deployment with Helm Charts
-
-For users who need more control over their deployments:
- Full control over deployment parameters
- Manual management of infrastructure
- Customizable monitoring setup
- Flexible configuration options
- Manual management of NATS and etcd dependencies
-
-Documentation:
- [Manual Helm Deployment Guide](../docs/guides/dynamo_deploy/manual_helm_deployment.md)
- [Minikube Setup Guide](../docs/guides/dynamo_deploy/minikube.md)
-
-## Choosing the Right Deployment Method
-
- **Dynamo Cloud Platform**: Best for most users, provides managed deployment with built-in monitoring
-  - See [Dynamo Cloud Platform Guide](../docs/guides/dynamo_deploy/dynamo_cloud.md)
-  - Recommended for production deployments
-  - Simplifies dependency management
-  - Provides infrastructure for user management
-
- **Manual Helm Deployment**: For users who need full control over their deployment
-  - See [Manual Helm Deployment Guide](../docs/guides/dynamo_deploy/manual_helm_deployment.md)
-  - Suitable for custom deployments
-  - Requires manual management of dependencies
-  - Provides maximum flexibility for users
-
-## Example Deployments
-
-To help you get started, we provide several example deployments:
-
-### Hello World Example
-A basic example to learn Dynamo deployment: [Hello World Example](../examples/hello_world/README.md#deploying-to-and-running-the-example-in-kubernetes)
- Shows how to deploy a simple three-service pipeline that processes text
- Provides step-by-step instructions for building your service and testing with port forwarding
- Includes sample output showing the text flow between services
-
-### LLM Examples
-Example for deploying LLM services: [LLM Example](../examples/llm/README.md#deploy-to-kubernetes)
- Demonstrates deploying and making inference requests against LLM models
- Includes examples for both aggregated and disaggregated serving
- Provides detailed deployment steps and testing instructions
--- a/deploy/README.md
+++ b/deploy/README.md
+./docs/guides/dynamo_deploy/README.md
\ No newline at end of file
--- a/deploy/cloud/helm/deploy.sh
+++ b/deploy/cloud/helm/deploy.sh
@@ -139,6 +139,8 @@ retry_command() {

 # Update the helm repo and build the dependencies
 retry_command "$HELM_CMD repo add nats https://nats-io.github.io/k8s/helm/charts/" 5 5 && \
+# retry_command "$HELM_CMD repo add bitnami https://charts.bitnami.com/bitnami" 5 5 && \
+# retry_command "$HELM_CMD repo add minio https://charts.min.io/" 5 5 && \
 retry_command "$HELM_CMD repo update" 5 5



--- a/deploy/helm/README.md
+++ b/deploy/helm/README.md
@@ -30,7 +30,7 @@ This approach allows you to install Dynamo directly using a DynamoGraphDeploymen
 ### Basic Installation

 ```bash
-helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud -f ./examples/vllm_v1/deploy/agg.yaml
+helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud -f ./examples/vllm/deploy/agg.yaml
 ```

 ### Customizable Properties
@@ -39,7 +39,7 @@ You can override the default configuration by setting the following properties:

 ```bash
 helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud \
-  -f ./examples/vllm_v1/deploy/agg.yaml \
+  -f ./examples/vllm/deploy/agg.yaml \
  --set "imagePullSecrets[0].name=docker-secret-1" \
  --set etcdAddr="my-etcd-service:2379" \
  --set natsAddr="nats://my-nats-service:4222"

--- a/deploy/inference-gateway/example/README.md
+++ b/deploy/inference-gateway/example/README.md
@@ -13,42 +13,12 @@ This guide provides instructions for setting up the Inference Gateway with Dynam

 1. **Install Dynamo Cloud**

-Follow the instructions in [deploy/cloud/README.md](../../deploy/cloud/README.md) to deploy Dynamo Cloud on your Kubernetes cluster. This will set up the necessary infrastructure components for managing Dynamo inference graphs.
+See the [Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.

-2. **Launch 2 Dynamo Deployments**

-Deploy 2 Dynamo aggregated graphs following the instructions in [examples/llm/README.md](../../examples/llm/README.md):
+2. **Launch Dynamo Deployments**

-### Deploy Dynamo Graphs
-
-Follow the commands to deploy 2 dynamo graphs -
-
-```bash
-# Set pre-built vLLM dynamo base container image
-export VLLM_RUNTIME_IMAGE=<dynamo-vllm-base-image>
-# for example:
-# export VLLM_RUNTIME_IMAGE=nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.3.1
-
-# run the following commands from dynamo repo's root folder
-
-# Deploy first graph
-export DEPLOYMENT_NAME=llm-agg1
-yq eval '
-  .metadata.name = env(DEPLOYMENT_NAME) |
-  .spec.services[].extraPodSpec.mainContainer.image = env(VLLM_RUNTIME_IMAGE)
-' examples/vllm_v0/deploy/agg.yaml > examples/vllm_v0/deploy/agg1.yaml
-
-kubectl apply -f examples/vllm_v0/deploy/agg1.yaml
-
-# Deploy second graph
-export DEPLOYMENT_NAME=llm-agg2
-yq eval '
-  .metadata.name = env(DEPLOYMENT_NAME) |
-  .spec.services[].extraPodSpec.mainContainer.image = env(VLLM_RUNTIME_IMAGE)
-' examples/vllm_v0/deploy/agg.yaml > examples/vllm_v0/deploy/agg2.yaml
-
-kubectl apply -f examples/vllm_v0/deploy/agg2.yaml
-```
+See the [vLLM Example](../../../examples/vllm/README.md).

 3. **Deploy Inference Gateway**


--- a/deploy/metrics/README.md
+++ b/deploy/metrics/README.md
@@ -57,7 +57,6 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container
   - Start the [components/metrics](../../components/metrics/README.md) application to begin monitoring for metric events from dynamo workers and aggregating them on a Prometheus metrics endpoint: `http://localhost:9091/metrics`.
   - Uncomment the appropriate lines in prometheus.yml to poll port 9091.
   - Start worker(s) that publishes KV Cache metrics: [examples/rust/service_metrics/bin/server](../../lib/runtime/examples/service_metrics/README.md)` can populate dummy KV Cache metrics.
-   - For a real workflow with real data, see the KV Routing example in [examples/llm/utils/vllm.py](../../examples/llm/utils/vllm.py).


 ## Configuration

--- a/deploy/metrics/docker-compose.yml
+++ b/deploy/metrics/docker-compose.yml
@@ -126,7 +126,6 @@ services:
      - ./grafana_dashboards:/etc/grafana/provisioning/dashboards
      - ./grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
    environment:
-      # Port 3000 is already used by "dynamo serve", so use 3001
      - GF_SERVER_HTTP_PORT=3001
      # do not make it admin/admin, because you will be prompted to change the password every time
      - GF_SECURITY_ADMIN_USER=dynamo

--- a/docs/dynamo_glossary.md
+++ b/docs/dynamo_glossary.md
@@ -23,14 +23,6 @@

 **Dynamo Cloud** - A Kubernetes platform providing managed deployment experience for Dynamo inference graphs.

-**dynamo build** - The CLI command to containerize inference graphs or parts of graphs into Docker containers.
-
-**dynamo deploy** - The CLI command to deploy inference graphs to Kubernetes with Helm charts or custom operators.
-
-**dynamo run** - The CLI command to quickly experiment and test models with various LLM engines.
-
-**dynamo serve** - The CLI command to compose and serve inference graphs locally.
-
 ## E
 **@endpoint** - A Python decorator used to define service endpoints within a Dynamo component.


--- a/docs/examples/README.md
+++ b/docs/examples/README.md
@@ -2,7 +2,7 @@

 ## Serving examples locally

-Follow individual examples to serve models locally.
+TODO: Follow individual examples to serve models locally.


 ## Deploying Examples to Kubernetes
@@ -16,7 +16,6 @@ If you are a **👤 Dynamo User** first follow the [Quickstart Guide](../guides/
 ### Instructions for Dynamo Contributor
 If you are a **🧑‍💻 Dynamo Contributor** first follow the instructions in [deploy/cloud/helm/README.md](../../deploy/cloud/helm/README.md) to create your Dynamo Cloud deployment.

-Make sure your dynamo cloud the `deploy.sh --crds --interactive` script finished successfully.

 You would have to rebuild the dynamo platform images as the code evolves. For more details please look at the [Cloud Guide](../guides/dynamo_deploy/dynamo_cloud.md)

@@ -27,7 +26,7 @@ export DYNAMO_IMAGE=<your-registry>/<your-image-name>:<your-tag>
 ```


-### Post Install Instructions
+### Deploying a particular example

 ```bash
 # Set your dynamo root directory
@@ -36,17 +35,43 @@ export PROJECT_ROOT=$(pwd)
 export NAMESPACE=<your-namespace> # the namespace you used to deploy Dynamo cloud to.
 ```

-Pick your deployment destination.
+Deploying an example takes a single `kubectl apply -f ... -n ${NAMESPACE}` command. For example:

-If local
+```bash
+kubectl apply -f examples/vllm/deploy/agg.yaml -n ${NAMESPACE}
+```
+
+You can use `kubectl get dynamoGraphDeployment -n ${NAMESPACE}` to view your deployment.
+You can use `kubectl delete dynamoGraphDeployment <your-dep-name> -n ${NAMESPACE}` to delete the deployment.
+
+
+**Note 1** Example Image
+
+The examples use a prebuilt image from the `nvcr.io/nvidian/nim-llm-dev` registry.
+You can build your own image and update the image location in your CR file before applying it.
+See [Building the Dynamo Base Image](../../README.md#building-the-dynamo-base-image).

 ```bash
-export DYNAMO_CLOUD=http://localhost:8080
+extraPodSpec:
+        mainContainer:
+          image: <image-in-your-$DYNAMO_IMAGE>
 ```

-If kubernetes
+**Note 2**
+Set up port forwarding if needed when deploying to Kubernetes.
+
+List the services in your namespace:
+
+```bash
+kubectl get svc -n ${NAMESPACE}
+```
+Look for the service whose name ends in `-frontend` and use it for port forwarding.
+
 ```bash
-export DYNAMO_CLOUD=https://dynamo-cloud.nvidia.com
+SERVICE_NAME=$(kubectl get svc -n ${NAMESPACE} -o name | grep frontend | sed 's|.*/||' | sed 's|-frontend||' | head -n1)
+kubectl port-forward svc/${SERVICE_NAME}-frontend 8000:8000 -n ${NAMESPACE}
 ```

-Deploying examples consists of the simple `kubectl apply -f` command.
+Consult the [Port Forward Documentation](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/) for details.
+
+More on [LLM examples](llm_deployment.md)
\ No newline at end of file
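
The service-name lookup added to `docs/examples/README.md` in the hunk above can be tried without a cluster. The following is a minimal sketch, assuming service names follow the `<deployment>-frontend` pattern described in that hunk. The function name `frontend_service` and the sample service names are hypothetical and do not come from the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the Dynamo repo): read lines in the format
# printed by `kubectl get svc -o name` (for example "service/foo-frontend")
# on stdin and print the first service whose name ends in "-frontend".
frontend_service() {
  sed 's|.*/||' | grep -- '-frontend$' | head -n1
}

# Canned input stands in for a live cluster; the service names are made up.
printf 'service/vllm-agg-frontend\nservice/vllm-agg-worker\n' | frontend_service
```

With a real cluster, the same function could be fed from `kubectl get svc -n "${NAMESPACE}" -o name`, and its output passed as `svc/<name>` to `kubectl port-forward`. Anchoring the match on the `-frontend` suffix avoids picking up services that merely contain the word "frontend" elsewhere in their name.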
--- a/docs/examples/hello_world.md
+++ b/docs/examples/hello_world.md
-<!--
-SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-SPDX-License-Identifier: Apache-2.0
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-->
-
-# Hello World Example: Basic Pipeline
-
-## Overview
-
-This example demonstrates the basic concepts of Dynamo by creating a simple multi-service pipeline. It shows how to:
-
-1. Create and connect multiple Dynamo services
-2. Pass data between services using Dynamo's runtime
-3. Set up a simple HTTP API endpoint
-4. Deploy and interact with a Dynamo service graph
-
-Graph Architecture:
-
-```
-Users/Clients (HTTP)
-      │
-      ▼
-┌─────────────┐
-│  Frontend   │  HTTP API endpoint (/generate)
-└─────────────┘
-      │ dynamo/runtime
-      ▼
-┌─────────────┐
-│   Middle    │
-└─────────────┘
-      │ dynamo/runtime
-      ▼
-┌─────────────┐
-│  Backend    │
-└─────────────┘
-```
-
-## Component Descriptions
-
-### Frontend Service
- Serves as the entry point for external HTTP requests
- Exposes a `/generate` HTTP API endpoint that clients can call
- Processes incoming text and passes it to the Middle service
-
-### Middle Service
- Acts as an intermediary service in the pipeline
- Receives requests from the Frontend
- Appends "-mid" to the text and forwards it to the Backend
-
-### Backend Service
- Functions as the final service in the pipeline
- Processes requests from the Middle service
- Appends "-back" to the text and yields tokens
-
-## Running the Example Locally
-
-Make sure you are running etcd and nats
-```bash
-sudo systemctl start etcd
-sudo systemctl start nats-server
-```
-
-1. Launch all three services using a single command:
-
-```bash
-cd /workspace/examples/hello_world
-dynamo serve hello_world:Frontend
-```
-
-The `dynamo serve` command deploys the entire service graph, automatically handling the dependencies between Frontend, Middle, and Backend services.
-
-2. Send request to frontend using curl:
-
-```bash
-curl -X 'POST' \
-  'http://localhost:8000/generate' \
-  -H 'accept: text/event-stream' \
-  -H 'Content-Type: application/json' \
-  -d '{
-  "text": "test"
-}'
-```
-
-# Deploy to Kubernetes
-
-You should first deploy the Dynamo Cloud Platform.
-If you are a **👤 Dynamo User** first follow the [Quickstart Guide](../guides/dynamo_deploy/quickstart.md).
-If you are a **🧑‍💻 Dynamo Contributor** and you have changed the platform code you would have to rebuild the dynamo platform. To do so please look at the [Cloud Guide](../guides/dynamo_deploy/dynamo_cloud.md).
-
-## Deploy your service using a DynamoGraphDeployment CR.
-
-```bash
-kubectl apply -f examples/hello_world/deploy/hello_world.yaml -n ${NAMESPACE}
-```
-
-## Testing the Deployment
-
-Once the deployment is complete, you can test it using commands below.
-Do the port forward in another terminal if needed.
-
-```bash
-export DEPLOYMENT_NAME=hello-world
-# Forward the pod's port to localhost
-kubectl port-forward svc/$DEPLOYMENT_NAME-frontend 8000:8000 -n ${NAMESPACE}
-```
-
-```bash
-# Test the API endpoint
-curl -N -X POST http://localhost:8000/generate \
-  -H "accept: text/event-stream" \
-  -H "Content-Type: application/json" \
-  -d '{"text": "test"}'
-```
-
-
-## Expected Output
-
-When you send the request with "test" as input, the response will show how the text flows through each service:
-
-```
-Frontend: Middle: Backend: test-mid-back
-```
-
-This demonstrates how:
-1. The Frontend receives "test"
-2. The Middle service adds "-mid" to create "test-mid"
-3. The Backend service adds "-back" to create "test-mid-back"
--- a/docs/examples/llm_deployment.md
+++ b/docs/examples/llm_deployment.md
@@ -81,27 +81,27 @@ Start required services (etcd and NATS) using [Docker Compose](../../deploy/metr
 docker compose -f deploy/metrics/docker-compose.yml up -d
 ```

-### Build docker
+### Build the container image for your platform

 ```bash
 # On an x86 machine
-./container/build.sh --framework vllm
+./container/build.sh --framework VLLM

 # On an ARM machine (ex: GB200)
-./container/build.sh --framework vllm --platform linux/arm64
+./container/build.sh --framework VLLM --platform linux/arm64
 ```

 ```{note}
-Building a vLLM docker image for ARM machines currently involves building vLLM from source, which is known to have performance issues to require exgtensive system RAM; see [vLLM Issue 8878](https://github.com/vllm-project/vllm/issues/8878).
+Building a vLLM docker image for ARM machines currently involves building vLLM from source, which is known to have performance issues and to require extensive system RAM; see [vLLM Issue 8878](https://github.com/vllm-project/vllm/issues/8878).

 You can tune the number of parallel build jobs for building VLLM from source
 on ARM based on your available cores and system RAM with `VLLM_MAX_JOBS`.

 For example, on an ARM machine with low system resources:
-`./container/build.sh --framework vllm --platform linux/arm64 --build-arg VLLM_MAX_JOBS=2`
+`./container/build.sh --framework VLLM --platform linux/arm64 --build-arg VLLM_MAX_JOBS=2`

 For example, on a GB200 which has very high CPU cores and memory resource:
-`./container/build.sh --framework vllm --platform linux/arm64 --build-arg VLLM_MAX_JOBS=64`
+`./container/build.sh --framework VLLM --platform linux/arm64 --build-arg VLLM_MAX_JOBS=64`

 When vLLM has pre-built ARM wheels published, this process can be improved.

@@ -109,17 +109,17 @@ You can tune the number of parallel build jobs for building VLLM from source
 on ARM based on your available cores and system RAM with `VLLM_MAX_JOBS`.

 For example, on an ARM machine with low system resources:
-`./container/build.sh --framework vllm --platform linux/arm64 --build-arg VLLM_MAX_JOBS=2`
+`./container/build.sh --framework VLLM --platform linux/arm64 --build-arg VLLM_MAX_JOBS=2`

 For example, on a GB200 which has very high CPU cores and memory resource:
-`./container/build.sh --framework vllm --platform linux/arm64 --build-arg VLLM_MAX_JOBS=64`
+`./container/build.sh --framework VLLM --platform linux/arm64 --build-arg VLLM_MAX_JOBS=64`

 When vLLM has pre-built ARM wheels published, this process can be improved.
 ```
-### Run container
+### Run the container you have built

 ```
-./container/run.sh -it --framework vllm
+./container/run.sh -it --framework VLLM
 ```

 ## Run Deployment
@@ -147,127 +147,6 @@ This figure shows an overview of the major components to deploy:
 ```

 ```{note}
-The planner component is enabled by default for all deployment architectures but is set to no-op mode. This means the planner observes metrics but doesn't take scaling actions. To enable active scaling, you can add `--Planner.no-operation=false` to your `dynamo serve` command. For more details, see [PLanner](../architecture/planner_intro.rst).
-```
-
-### Example architectures
-
-```{note}
-For a non-dockerized deployment, first export `DYNAMO_HOME` to point to the dynamo repository root, e.g. `export DYNAMO_HOME=$(pwd)`
-```
-
-#### Aggregated serving
-```bash
-cd $DYNAMO_HOME/examples/llm
-dynamo serve graphs.agg:Frontend -f ./configs/agg.yaml
-```
-
-#### Aggregated serving with KV Routing
-```bash
-cd $DYNAMO_HOME/examples/llm
-dynamo serve graphs.agg_router:Frontend -f ./configs/agg_router.yaml
-```
-
-#### Disaggregated serving
-```bash
-cd $DYNAMO_HOME/examples/llm
-dynamo serve graphs.disagg:Frontend -f ./configs/disagg.yaml
-```
-
-#### Disaggregated serving with KV Routing
-```bash
-cd $DYNAMO_HOME/examples/llm
-dynamo serve graphs.disagg_router:Frontend -f ./configs/disagg_router.yaml
-```
-
-### Client
-
-In another terminal:
-```bash
-# this test request has around 200 tokens isl
-
-curl localhost:8000/v1/chat/completions   -H "Content-Type: application/json"   -d '{
-    "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
-    "messages": [
-    {
-        "role": "user",
-        "content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
-    }
-    ],
-    "stream":false,
-    "max_tokens": 30
-  }'
-
-```
-
-### Multinode deployment
-
-See [Multinode Examples](../examples/multinode.md) for more details.
-
-### Close deployment
-
-See [Close deployment](../guides/dynamo_serve.md#close-deployment) in the *Dynamo Run* topic to learn about how to close the deployment.
-
-## Deploy to Kubernetes
-
-These examples can be deployed to a Kubernetes cluster using [Dynamo Cloud](../guides/dynamo_deploy/dynamo_cloud.md) and the Dynamo CLI.
-
-### Prerequisites
-
-You must first follow the instructions in [deploy/cloud/helm/README.md](https://github.com/ai-dynamo/dynamo/blob/main/deploy/cloud/helm/README.md) to install Dynamo Cloud on your Kubernetes cluster.
-
-```{note}
-The `KUBE_NS` variable in the following steps must match the Kubernetes namespace where you installed Dynamo Cloud. You must also expose the `dynamo-store` service externally. This will be the endpoint the CLI uses to interface with Dynamo Cloud.
-```
-
-### Deployment Steps
-
-For detailed deployment instructions, please refer to the [Operator Deployment Guide](../guides/dynamo_deploy/operator_deployment.md). The following are the specific commands for the LLM examples:
-
-```bash
-# Set your project root directory
-export PROJECT_ROOT=$(pwd)
-
-# Configure environment variables (see operator_deployment.md for details)
-export KUBE_NS=dynamo-cloud
-export DYNAMO_CLOUD=http://localhost:8080  # If using port-forward
-# OR
-# export DYNAMO_CLOUD=https://dynamo-cloud.nvidia.com  # If using Ingress/VirtualService
-
-# Build the Dynamo base image (see operator_deployment.md for details)
-export DYNAMO_IMAGE=<your-registry>/<your-image-name>:<your-tag>
-
-# Build the service
-cd $PROJECT_ROOT/examples/llm
-DYNAMO_TAG=$(dynamo build graphs.agg:Frontend | grep "Successfully built" |  awk '{ print $NF }' | sed 's/\.$//')
-
-# Deploy to Kubernetes
-export DEPLOYMENT_NAME=llm-agg
-# TODO: Deploy your service using a DynamoGraphDeployment CR.
-```
-
-**Note**: Optionally add `--Planner.no-operation=false` at the end of the deployment command to enable the planner component to take scaling actions on your deployment.
-
-### Testing the Deployment
-
-Once the deployment is complete, you can test it using:
-
-```bash
-# Forward the port to localhost
-kubectl port-forward svc/$DEPLOYMENT_NAME-frontend 8000:8000 -n ${KUBE_NS}
-
-# Test the API endpoint
-curl localhost:8000/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
-    "messages": [
-    {
-        "role": "user",
-        "content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
-    }
-    ],
-    "stream":false,
-    "max_tokens": 30
-  }'
+The planner component is enabled by default for all deployment architectures but is set to no-op mode. This means the planner observes metrics but doesn't take scaling actions. To enable active scaling, you can add `--Planner.no-operation=false` to your `dynamo serve` command.
+For more details, see [Planner Architecture Overview](../architecture/planner_intro.rst).
 ```
--- a/docs/get_started.md
+++ b/docs/get_started.md
@@ -123,13 +123,14 @@ export DYNAMO_IMAGE=<your-registry>/dynamo-base:latest-vllm

 ## Running and Interacting with an LLM Locally

-To run a model and interact with it locally, call `dynamo run` with a Hugging Face model.
-`dynamo run` supports several backends, including `mistralrs`, `sglang`, `vllm`, and `tensorrtllm`.
+Dynamo supports several backends, including `mistralrs`, `sglang`, `vllm`, and `tensorrtllm`.
+Use the example commands below to launch a model.

 ### Example Command

 ```bash
-dynamo run out=vllm deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+python -m dynamo.frontend [--http-port 8080]
+python -m dynamo.vllm deepseek-ai/DeepSeek-R1-Distill-Llama-8B
 ```

 ```bash
@@ -166,31 +167,7 @@ docker compose -f deploy/docker-compose.yml up -d

 ### Start Dynamo LLM Serving Components

-Next, serve a minimal configuration with an http server, basic
-round-robin router, and a single worker.
-
-```bash
-cd examples/llm
-dynamo serve graphs.agg:Frontend -f configs/agg.yaml
-```
-
-### Send a Request
-
-```bash
-curl localhost:8000/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
-    "messages": [
-    {
-        "role": "user",
-        "content": "Hello, how are you?"
-    }
-    ],
-    "stream":false,
-    "max_tokens": 300
-  }' | jq
-```
+[Explore the VLLM Example](../examples/vllm/README.md)


 ## Local Development
@@ -232,6 +209,6 @@ pip install .[all]

 # To test
 docker compose -f deploy/docker-compose.yml up -d
-cd examples/llm
-dynamo serve graphs.agg:Frontend -f configs/agg.yaml
+python -m dynamo.frontend [--http-port 8080]
+python -m dynamo.vllm deepseek-ai/DeepSeek-R1-Distill-Llama-8B
 ```
--- a/docs/guides/README.md
+++ b/docs/guides/README.md
-<!--
-SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-SPDX-License-Identifier: Apache-2.0
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-->
-
-# Guide to Dynamo CLI
-
-After installing Dynamo with the following command, Dynamo can be used primarily through its CLI.
-```
-apt-get update
-DEBIAN_FRONTEND=noninteractive apt-get install -yq python3-dev python3-pip python3-venv libucx0
-python3 -m venv venv
-source venv/bin/activate
-
-pip install "ai-dynamo[all]"
-```
-
-## Dynamo workflow
-Dynamo CLI has the following 4 sub-commands.
-
- :runner: dynamo run: quickly spin up a server to experiment with a specified model, input and output target.
- :palm_up_hand: dynamo serve: compose a graph of workers locally and serve.
- :hammer: (Experimental) dynamo build: containerize either the entire graph or parts of graph to multiple containers
- :rocket: (Experimental) dynamo deploy: deploy to K8 with helm charts or custom operators
- :cloud: (Experimental) dynamo cloud: interact with your dynamo cloud server
-
-For more detailed examples on serving LLMs with disaggregated serving, KV aware routing, etc,  please refer to [LLM deployment examples](https://github.com/ai-dynamo/dynamo/blob/main/examples/llm/README.md)
-
--- a/docs/guides/cli_overview.md
+++ b/docs/guides/cli_overview.md
@@ -37,7 +37,7 @@ Use `run` to start an interactive chat session with a model. This command execut

 #### Example
 ```bash
-dynamo run deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+dynamo-run deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
 ```

 ### `serve`

--- a/docs/guides/dynamo_deploy/README.md
+++ b/docs/guides/dynamo_deploy/README.md
@@ -15,31 +15,21 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->

-# Deploying Inference Graphs to Kubernetes (`dynamo deploy`)
+# Deploying Inference Graphs to Kubernetes

-This guide explains the deployment options available for Dynamo inference graphs in Kubernetes environments.
+We expect users to deploy their inference graphs using CRDs or Helm charts.

-## Deployment Options
+Before deploying an inference graph, the user should deploy the Dynamo Cloud Platform.
+Dynamo Cloud acts as an orchestration layer between the end user and Kubernetes, handling the complexity of deploying your graphs for you. This is a one-time action, only necessary the first time you deploy a DynamoGraph.

-Dynamo provides two distinct deployment options that each serve different use cases:
-1. Dynamo Cloud Kubernetes Platform is preferred in cases that support it
-2. Manual Deployment with Helm Charts is suited to users who need more control over their deployments

+# 1. Follow [Installing Dynamo Cloud](./dynamo_cloud.md) for installation steps.
+For details about the Dynamo Cloud Platform, see the [Dynamo Operator Guide](dynamo_operator.md).

-### Dynamo Cloud Kubernetes Platform [PREFERRED]
+# 2. Follow [Examples](../../examples/README.md) to see how you can deploy your Inference Graphs.

-The Dynamo Cloud Platform (`deploy/cloud/`) provides a managed deployment experience:

- Contains the infrastructure components required for the Dynamo cloud platform
- Used when deploying with the `dynamo deploy` CLI commands
- Provides a managed deployment experience
-
-For detailed instructions on using the Dynamo Cloud Platform, see:
- [Dynamo Cloud Platform Guide](dynamo_cloud.md): walks through installing and configuring the Dynamo cloud components on your Kubernetes cluster.
- [Dynamo Operator Guide](dynamo_operator.md)
-
-
-### Manual Deployment with Helm Charts
+## Manual Deployment with Helm Charts

 Users who need more control over their deployments can use the manual deployment path (`deploy/helm/`):

@@ -50,20 +40,3 @@ Users who need more control over their deployments can use the manual deployment
 - Documentation:
  - [Using the Deployment Script](manual_helm_deployment.md#using-the-deployment-script): all-in-one script for manual deployment
  - [Helm Deployment Guide](manual_helm_deployment.md#helm-deployment-guide): detailed instructions for manual deployment
-
-## Getting Started with Helm Deploy
-
-1. **For Dynamo Cloud Platform**:
-   - Follow the [Dynamo Cloud Platform Guide](dynamo_cloud.md)
-   - Deploy a Hello World pipeline using the [Operator Deployment Guide](operator_deployment.md)
-   - Deploy a Dynamo LLM pipeline to Kubernetes [Deploy LLM Guide](../../examples/llm_deployment.md#deploy-to-kubernetes)
-   - Model caching with [Fluid](model_caching_with_fluid.md)
-
-2. **For Manual Deployment**:
-   - Follow the [Manual Helm Deployment Guide](manual_helm_deployment.md)
-
-## Example Deployments
-
-See the [Hello World example](../../examples/hello_world.md#deploying-to-and-running-the-example-in-kubernetes) for a complete walkthrough of deploying a simple inference graph.
-
-See the [LLM example](../../examples/llm_deployment.md#deploy-to-kubernetes) for a complete walkthrough of deploying a production-ready LLM inference pipeline to Kubernetes.
\ No newline at end of file
--- a/docs/guides/dynamo_deploy/dynamo_cloud.md
+++ b/docs/guides/dynamo_deploy/dynamo_cloud.md
@@ -15,9 +15,9 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->

-# Dynamo Cloud Kubernetes Platform (Dynamo Deploy)
+# Dynamo Cloud Kubernetes Platform

-The Dynamo Cloud platform is a comprehensive solution for deploying and managing Dynamo inference graphs (also referred to as pipelines) in Kubernetes environments. It provides a streamlined experience for deploying, scaling, and monitoring your inference services. You can interface with Dynamo Cloud using the `deploy` subcommand available in the Dynamo CLI (for example, `dynamo deploy`)
+The Dynamo Cloud platform is a comprehensive solution for deploying and managing Dynamo inference graphs (also referred to as pipelines) in Kubernetes environments. It provides a streamlined experience for deploying, scaling, and monitoring your inference services.

 ## Overview

@@ -26,11 +26,8 @@ The Dynamo cloud platform consists of several key components:
 - **Dynamo Operator**: A Kubernetes operator that manages the lifecycle of Dynamo inference graphs from build ➡️ deploy. For more information on the operator, see [Dynamo Kubernetes Operator Documentation](../dynamo_deploy/dynamo_operator.md)
 - **Custom Resources**: Kubernetes custom resources for defining and managing Dynamo services

-These components work together to provide a seamless deployment experience, handling everything from containerization to scaling and monitoring.

-![Dynamo Deploy system deployment diagram.](../../images/dynamo-deploy.png)
-
-## Prerequisites
+## Deployment Prerequisites

 Before getting started with the Dynamo cloud platform, ensure you have:

@@ -56,58 +53,20 @@ Just export the environment variable. This will be the image used by your indivi
 export DYNAMO_IMAGE=nvcr.io/nvidia/dynamo:latest-vllm
 ```

-For advanced examples make sure you have first built and pushed to your registry Dynamo Base Image for Dynamo inference runtime. This is a one-time operation.
+For a custom setup, build the Dynamo base image for the Dynamo inference runtime and push it to your registry. This is a one-time operation.

 ```bash
 # Run the script to build the default dynamo:latest-vllm image.
 ./container/build.sh
 export IMAGE_TAG=<TAG>
-# retag the image
+# Tag the image
 docker tag dynamo:latest-vllm <your-registry>/dynamo:${IMAGE_TAG}
 docker push <your-registry>/dynamo:${IMAGE_TAG}
 ```

-## Building Docker Images for Dynamo Cloud Components
-
-The Dynamo cloud platform components need to be built and pushed to a container registry before deployment. You can build these components individually or all at once.
-
-### Setting Up Environment Variables
-
-First, set the required environment variables for building and pushing images:
-
-```bash
-# Set your container registry
-export DOCKER_SERVER=<CONTAINER_REGISTRY>
-# Set the image tag (e.g., latest, 0.0.1, etc.)
-export IMAGE_TAG=<TAG>
-```
-
-As a description of the placeholders:
- `<CONTAINER_REGISTRY>`: Your container registry (e.g., `nvcr.io`, `docker.io/<your-username>`, etc.)
- `<TAG>`: The tag you want to use for the images of the Dynamo cloud components (e.g., `latest`, `0.0.1`, etc.)
-If the runtime image tag is not explicitly set, the default is the `latest`.
-
-The tag will go into the dynamo-operator:<IMAGE_TAG> image for the Operator.  The runtime (base) image handles the inference toolchain and the sdk and built by the (`build.sh`). The tags do not have to match the runtime  image tag but the images must be compatible.
-
-**Important** Make sure you're logged in to your container registry before pushing images. For example:
-
-```bash
-docker login <CONTAINER_REGISTRY>
-```
-
-### Building Components
-
-You can build and push all platform components at once:
-
-```bash
-earthly --push +all-docker --DOCKER_SERVER=$DOCKER_SERVER --IMAGE_TAG=$IMAGE_TAG
-```
+## 🚀 Deploying the Dynamo Cloud Platform

-### 🚀 Deploying the Dynamo Cloud Platform
-
-Once you've built and pushed the components, you can deploy the platform to your Kubernetes cluster.
-
-### Prerequisites
+## Prerequisites

 Before deploying Dynamo Cloud, ensure your Kubernetes cluster meets the following requirements:

@@ -135,144 +94,19 @@ kubectl get storageclass
 # standard (default)   kubernetes.io/gce-pd    Delete          Immediate              true                   1d
 ```

+## Installation

+Follow the [Quickstart Guide](./quickstart.md) to install Dynamo Cloud.

-### Installation using the helper script
-
-1. Set the required environment variables:
-```bash
-export PROJECT_ROOT=$(pwd)
-export DOCKER_USERNAME=<your-docker-username>
-export DOCKER_PASSWORD=<your-docker-password>
-export DOCKER_SERVER=<your-docker-server>
-export IMAGE_TAG=<TAG>  # Use the same tag you used when building the images
-export NAMESPACE=dynamo-cloud    # change this to whatever you want!
-export DYNAMO_INGRESS_SUFFIX=dynamo-cloud.com # change this to whatever you want!
-```
-
-``` {note}
-DOCKER_USERNAME and DOCKER_PASSWORD are optional and only needed if you want to pull docker images from a private registry.
-A docker image pull secret is created automatically if these variables are set. Its name is `docker-imagepullsecret` unless overridden by the `DOCKER_SECRET_NAME` environment variable.
-```
-
-The Dynamo Cloud Platform auto-generates docker images for pipelines and pushes them to a container registry.
-By default, the platform uses the same container registry as the platform components (specified by `DOCKER_SERVER`).
-However, you can use a different container registry for the platform components by making sure an associated kubernetes secret is present:
-
-```bash
-kubectl create secret docker-registry dynamo-components-imagepullsecret \
-  --docker-server=<docker-registry-for-dynamo-components> \
-  --docker-username=<username> \
-  --docker-password=<password> \
-  --namespace=${NAMESPACE}
-```
-
-If you wish to expose your Dynamo Cloud Platform externally, you can setup the following environment variables:
-
-```bash
-# if using ingress
-export INGRESS_ENABLED="true"
-export INGRESS_CLASS="nginx" # or whatever ingress class you have configured
-
-# if using istio
-export ISTIO_ENABLED="true"
-export ISTIO_GATEWAY="istio-system/istio-ingressgateway" # or whatever istio gateway you have configured
-```
-
-Running the installation script with `--interactive` guides you through the process of exposing your Dynamo Cloud Platform externally if you don't want to set these environment variables manually.
-
-2. [One-time Action] Create a new kubernetes namespace and set it as your default.
-
-```bash
-cd deploy/cloud/helm
-kubectl create namespace $NAMESPACE
-kubectl config set-context --current --namespace=$NAMESPACE
-```
-
-3. Deploy the Helm charts (install CRDs first, then platform) using the deployment script:
-
-```bash
-./deploy.sh --crds
-```
-
-if you want guidance during the process, run the deployment script with the `--interactive` flag:
-
-```bash
-./deploy.sh --crds --interactive
-```
-
-omitting `--crds` will skip the CRDs installation/upgrade. This is useful when installing on a shared cluster as CRDs are cluster-scoped resources.
+⚠️ **Note:** Omitting `--crds` from the `deploy.sh` invocation skips the CRD installation/upgrade. This is useful when installing on a shared cluster, because CRDs are cluster-scoped resources.

-If you'd like to only generate the generated-values.yaml file without deploying to Kubernetes (e.g., for inspection, CI workflows, or dry-run testing), use:
+⚠️ **Note:** To generate only the `generated-values.yaml` file without deploying to Kubernetes (e.g., for inspection, CI workflows, or dry-run testing), use:

 ```bash
 ./deploy_dynamo_cloud.py --yaml-only
 ```


-### Installation using published helm chart
-
-To install Dynamo Cloud using the published Helm chart, you'll need to configure Docker registry credentials and image settings.
-
-
-#### Environment Setup
-
-Set the required environment variables:
-
-```bash
-# Docker registry configuration
-export DOCKER_SERVER="your-registry.com"                    # Docker registry server where images of dynamo cloud services (operator) are available
-export IMAGE_TAG="v1.0.0"                                   # Image tag to deploy
-export NAMESPACE="dynamo-cloud"                             # Target namespace
-
-# Components-specific Docker registry (if different from DOCKER_SERVER)
-export COMPONENTS_DOCKER_SERVER="your-pipeline-registry.com" # Registry for Dynamo components images
-
-# Image pull secret for the operator itself
-export DOCKER_SECRET_NAME="my-pull-secret"                       # Secret for pulling images of dynamo cloud services (operator) operator images
-export COMPONENTS_DOCKER_SECRET_NAME="my-components-pull-secret" # Secret for pulling images of dynamo components images (if needed)
-```
-
-you can easily create an image pull secret with the following command :
-
-```bash
-kubectl create secret docker-registry ${DOCKER_SECRET_NAME} \
-  --docker-server=${DOCKER_SERVER} \
-  --docker-username=<docker-server-username> \
-  --docker-password=<docker-server-password> \
-  --namespace=${NAMESPACE}
-
-# Only if using a different registry for Dynamo components
-kubectl create secret docker-registry ${COMPONENTS_DOCKER_SECRET_NAME} \
-  --docker-server=${COMPONENTS_DOCKER_SERVER} \
-  --docker-username=<components-docker-server-username> \
-  --docker-password=<components-docker-server-password> \
-  --namespace=${NAMESPACE}
-
-```
-
-#### Installation Commands
-
-**Step 1: Install Custom Resource Definitions (CRDs)**
-
-```bash
-helm install dynamo-crds dynamo-crds-helm-chart.tgz \
-  --namespace default \
-  --wait \
-  --atomic
-```
-
-**Step 2: Install Dynamo Platform**
-
-Run the following helm command:
-
-```bash
-helm install dynamo-platform dynamo-platform-helm-chart.tgz \
-  --namespace ${NAMESPACE} \
-  --set "dynamo-operator.controllerManager.manager.image.repository=${DOCKER_SERVER}/dynamo-operator" \
-  --set "dynamo-operator.controllerManager.manager.image.tag=${IMAGE_TAG}" \
-  --set "dynamo-operator.imagePullSecrets[0].name=${DOCKER_SECRET_NAME}"
-```

 ### Cloud Provider-Specific deployment

@@ -280,12 +114,3 @@ helm install dynamo-platform dynamo-platform-helm-chart.tgz \

 You can find detailed instructions for deployment in GKE [here](../dynamo_deploy/gke_setup.md)

-## Next Steps
-
-After deploying the Dynamo cloud platform, you can:
-
-1. Deploy your first inference graph using the [Dynamo CLI](operator_deployment.md)
-2. Deploy Dynamo LLM graphs to Kubernetes using the [Dynamo CLI](../../examples/llm_deployment.md)
-3. Manage your deployments using the Dynamo CLI
-
-For more detailed information about deploying inference graphs, see the [Dynamo Deploy Guide](README.md).
--- a/docs/guides/dynamo_deploy/manual_helm_deployment.md
+++ b/docs/guides/dynamo_deploy/manual_helm_deployment.md
-<!--
-SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-SPDX-License-Identifier: Apache-2.0
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-->
-
-<a id="k8-helm-deploy"></a>
-# Deploying Dynamo Inference Graphs to Kubernetes using Helm
-
-This guide describes the deployment process of an inference graph created using the Dynamo SDK onto a Kubernetes cluster.
-
-While this guide covers deployment of Dynamo inference graphs using Helm, the preferred method to deploy an inference graph is to [deploy with the Dynamo cloud platform](operator_deployment.md). The [Dynamo cloud platform](dynamo_cloud.md) simplifies the deployment and management of Dynamo inference graphs. It includes a set of components (Operator, Kubernetes Custom Resources, etc.) that work together to streamline the deployment and management process.
-
-Once an inference graph is defined using the Dynamo SDK, it can be deployed onto a Kubernetes cluster using a simple `dynamo deploy` command that orchestrates the following deployment steps:
-
-1. Building docker images from inference graph components on the cluster
-2. Intelligently composing the encoded inference graph into a complete deployment on Kubernetes
-3. Enabling autoscaling, monitoring, and observability for the inference graph
-4. Easy administration of deployments via UI
-
-## Helm Deployment Guide
-
-### Setting up MicroK8s
-
-Follow these steps to set up a local Kubernetes cluster using MicroK8s:
-
-1. Install MicroK8s:
-```bash
-sudo snap install microk8s --classic
-```
-
-2. Configure user permissions:
-```bash
-sudo usermod -a -G microk8s $USER
-sudo chown -R $USER ~/.kube
-```
-
-3. **Important**: Log out and log back in for the permissions to take effect
-
-4. Start MicroK8s:
-```bash
-microk8s start
-```
-
-5. Enable required addons:
-```bash
-# Enable GPU support
-microk8s enable gpu
-
-# Enable storage support
-# See: https://microk8s.io/docs/addon-hostpath-storage
-microk8s enable storage
-```
-
-6. Configure kubectl:
-```bash
-mkdir -p ~/.kube
-microk8s config >> ~/.kube/config
-```
-
-After completing these steps, you should be able to use the `kubectl` command to interact with your cluster.
-
-### Installing Required Dependencies
-
-Follow these steps to set up the namespace and install required components:
-
-1. Set environment variables:
-```bash
-export NAMESPACE=dynamo-playground
-export RELEASE_NAME=dynamo-platform
-export PROJECT_ROOT=$(pwd)
-```
-
-2. Install NATS messaging system:
-```bash
-# Navigate to dependencies directory
-cd $PROJECT_ROOT/deploy/helm/dependencies
-
-# Add and update NATS Helm repository
-helm repo add nats https://nats-io.github.io/k8s/helm/charts/
-helm repo update
-
-# Install NATS with custom values
-helm install --namespace ${NAMESPACE} ${RELEASE_NAME}-nats nats/nats \
-    --values nats-values.yaml
-```
-
-3. Install etcd key-value store:
-```bash
-# Install etcd using Bitnami chart
-helm install --namespace ${NAMESPACE} ${RELEASE_NAME}-etcd \
-    oci://registry-1.docker.io/bitnamicharts/etcd \
-    --values etcd-values.yaml
-```
-
-After completing these steps, your cluster has the necessary messaging and storage infrastructure for running Dynamo inference graphs.
-
-### Building and Deploying the Pipeline
-
-Follow these steps to containerize and deploy your inference pipeline:
-
-1. Build and containerize the pipeline:
-
-``` {note}
-For instructions on building and pushing the Dynamo base image, see [Building the Dynamo Base Image](../../get_started.md#building-the-dynamo-base-image).
-```
-
-```bash
-# Navigate to example directory
-cd $PROJECT_ROOT/examples/hello_world
-
-# Set runtime image name
-export DYNAMO_IMAGE=<dynamo_base_image>
-
-# Build and containerize the Frontend service
-dynamo build --containerize hello_world:Frontend
-```
-
-2. Push container to registry:
-```bash
-# Tag the built image for your registry
-docker tag <BUILT_IMAGE_TAG> <TAG>
-
-# Push to your container registry
-docker push <TAG>
-```
-
-3. Deploy using Helm:
-```bash
-# Navigate to the deployment directory
-cd $PROJECT_ROOT/deploy/helm
-
-# Set release name for Helm
-export HELM_RELEASE=hello-world-manual
-
-# Generate Helm values file from Frontend service
-dynamo get frontend > pipeline-values.yaml
-
-# Install/upgrade Helm release
-helm upgrade -i "$HELM_RELEASE" ./chart \
-    -f pipeline-values.yaml \
-    --set image=<TAG> \
-    --set dynamoIdentifier="hello_world:Frontend" \
-    -n "$NAMESPACE"
-```
-
-4. Test the deployment:
-```bash
-# Forward the service port to localhost
-kubectl -n ${NAMESPACE} port-forward svc/${HELM_RELEASE}-frontend 3000:80
-
-# Test the API endpoint
-curl -X 'POST' 'http://localhost:3000/generate' \
-    -H 'accept: text/event-stream' \
-    -H 'Content-Type: application/json' \
-    -d '{"text": "test"}'
-```
-
-### Using the Deployment Script
-
-For convenience, you can use the deployment script at `deploy/helm/deploy.sh` that automates all of these steps:
-
-```bash
-export DYNAMO_IMAGE=<dynamo_docker_image_name>
-./deploy.sh <docker_registry> <k8s_namespace> <path_to_dynamo_directory> <dynamo_identifier> [<dynamo_config_file>]
-
-# Example: export DYNAMO_IMAGE=nvcr.io/nvidian/nim-llm-dev/dynamo-base-worker:0.0.1
-# Example: ./deploy.sh nvcr.io/nvidian/nim-llm-dev my-namespace ../../../examples/hello_world/ hello_world:Frontend
-# Example: ./deploy.sh nvcr.io/nvidian/nim-llm-dev my-namespace ../../../examples/llm graphs.disagg_router:Frontend ../../../examples/llm/configs/disagg_router.yaml
-```
-
-This script handles:
-1. Building and pushing the Docker image
-2. Setting up the Helm values
-3. Installing/upgrading the Helm release
-4. Configuring the necessary Kubernetes resources
--- a/docs/guides/dynamo_deploy/quickstart.md
+++ b/docs/guides/dynamo_deploy/quickstart.md
 # Quickstart

-Before deploying your inference graphs you need to install the Dynamo Inference Platform and the Dynamo Cloud.
+Onboarding consists of two steps.
+1. Before deploying your inference graphs, install the Dynamo Inference Platform and Dynamo Cloud.
+Dynamo Cloud acts as an orchestration layer between the end user and Kubernetes, handling the complexity of deploying your graphs for you.
+You can install from [Published Artifacts](#1-installing-dynamo-cloud-from-published-artifacts) or from [Source](#2-installing-dynamo-cloud-from-source).
+2. After installing Dynamo Cloud, proceed to the [Examples](../../examples/README.md) to deploy an inference graph.

-## 1. Installing from Published Artifacts
+## 1. Installing Dynamo Cloud from Published Artifacts

 Use this approach when installing from pre-built helm charts and docker images published to NGC.

@@ -17,6 +21,8 @@ Install `envsubst`, `kubectl`, `helm`

 ### Authenticate with NGC

+Go to https://ngc.nvidia.com/org to get your `NGC_CLI_API_KEY`.
+
 ```bash
 helm repo add nvidia https://helm.ngc.nvidia.com/nvidia --username='$oauthtoken' --password=<YOUR_NGC_CLI_API_KEY>
 ```
@@ -50,7 +56,7 @@ kubectl create namespace ${NAMESPACE}
 helm install dynamo-platform dynamo-platform-v${RELEASE_VERSION}.tgz --namespace ${NAMESPACE}
 ```

-## 2. Installing from Source
+## 2. Installing Dynamo Cloud from Source

 Use this approach when developing or customizing Dynamo as a contributor, or using local helm charts from the source repository.

@@ -64,12 +70,18 @@ cd deploy/cloud/helm/

 ### Set Environment Variables

+Our examples use `nvcr.io`, but you can set your own values if you use another Docker registry.
+
 ```bash
-export NAMESPACE=dynamo-cloud
-export DOCKER_USERNAME=your-username
-export DOCKER_PASSWORD=your-password
-export DOCKER_SERVER=your-docker-registry.com
-export IMAGE_TAG=your-image-tag
+export NAMESPACE=dynamo-cloud # or whatever you prefer.
+export DOCKER_SERVER=nvcr.io/nvidia/ai-dynamo  # or your-docker-registry.com (no trailing slash)
+export DOCKER_USERNAME='$oauthtoken'  # your-username if not using nvcr.io
+export DOCKER_PASSWORD=YOUR_NGC_CLI_API_KEY  # your-password if not using nvcr.io
+```
+
+```bash
+export IMAGE_TAG=RELEASE_VERSION # e.g. 0.3.2: the release you are using, or your-image-tag if you have built your own Dynamo image.
+# The NVIDIA Cloud Operator image will be pulled from `$DOCKER_SERVER/dynamo-operator:$IMAGE_TAG`.
 ```

 The operator image will be pulled from `$DOCKER_SERVER/dynamo-operator:$IMAGE_TAG`.
@@ -107,7 +119,9 @@ if you want guidance during the process, run the deployment script with the `--i
 ./deploy.sh --crds --interactive
 ```

-**Step 1: Install Custom Resource Definitions (CRDs)**
+**Installing CRDs manually (alternative to the `deploy.sh` script)**
+
+***Step 1: Install Custom Resource Definitions (CRDs)***

 ```bash
 helm install dynamo-crds ./crds/ \
@@ -116,7 +130,7 @@ helm install dynamo-crds ./crds/ \
  --atomic
 ```

-**Step 2: Build Dependencies and Install Platform**
+***Step 2: Build Dependencies and Install Platform***

 ```bash
 helm dep build ./platform/
@@ -150,22 +164,6 @@ We provide a script to uninstall CRDs should you need a clean start.

 ## Explore Examples

-### Hello World
-
-For a basic example that doesn't require a GPU, see the [Hello World](../../examples/hello_world.md)
-
-### LLM
-
-Create a Kubernetes secret containing your sensitive values if needed:
-
-```bash
-export HF_TOKEN=your_hf_token
-kubectl create secret generic hf-token-secret \
-  --from-literal=HF_TOKEN=${HF_TOKEN} \
-  -n ${NAMESPACE}
-```
-
-
 Pick your deployment destination.

 If local
@@ -179,9 +177,13 @@ If kubernetes
 export DYNAMO_CLOUD=https://dynamo-cloud.nvidia.com
 ```

+If deploying to Kubernetes, create a Kubernetes secret containing your sensitive values if needed:
+
 ```bash
-# Go to your main dynamo directory.
-cd ../../../
-kubectl apply -f examples/llm/deploy/agg.yaml -n $NAMESPACE
+export HF_TOKEN=your_hf_token
+kubectl create secret generic hf-token-secret \
+  --from-literal=HF_TOKEN=${HF_TOKEN} \
+  -n ${NAMESPACE}
 ```

+Follow the [Examples](../../examples/README.md)
\ No newline at end of file