docs: Consolidate documentation and fix redundant headings (#2518)

Commit 129a2444 authored Aug 19, 2025 by Anish; committed by GitHub Aug 19, 2025
15 changed files
--- a/components/backends/sglang/README.md
+++ b/components/backends/sglang/README.md
@@ -50,7 +50,7 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 | **GB200 Support**   | ✅     |                                                              |
-## Quick Start
+## SGLang Quick Start
 Below we provide a guide that lets you run all of our common deployment patterns on a single node.

--- a/components/backends/trtllm/README.md
+++ b/components/backends/trtllm/README.md
@@ -66,7 +66,7 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 | **DP Rank Routing**| ✅           |                                                                 |
 | **GB200 Support**  | ✅           |                                                                 |
-## Quick Start
+## TensorRT-LLM Quick Start
 Below we provide a guide that lets you run all of our common deployment patterns on a single node.

--- a/components/backends/vllm/README.md
+++ b/components/backends/vllm/README.md
@@ -51,7 +51,7 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 | **DP Rank Routing**| ✅   | Supported via external control of DP ranks |
 | **GB200 Support**  | 🚧   | Container functional on main |
-## Quick Start
+## vLLM Quick Start
 Below we provide a guide that lets you run all of our common deployment patterns on a single node.

--- a/deploy/metrics/k8s/README.md
+++ b/deploy/metrics/k8s/README.md
 # Dynamo Metrics Collection on Kubernetes
-For detailed documentation on collecting and visualizing metrics on Kubernetes, see [docs/guides/deploy/k8s_metrics.md](../../../docs/guides/deploy/k8s_metrics.md).
+For detailed documentation on collecting and visualizing metrics on Kubernetes, see [docs/guides/dynamo_deploy/k8s_metrics.md](../../../docs/guides/dynamo_deploy/k8s_metrics.md).
--- a/docs/architecture/architecture.md
+++ b/docs/architecture/architecture.md
@@ -48,7 +48,7 @@ There are multi-faceted challenges:
 To address the growing demands of distributed inference serving, NVIDIA introduces Dynamo. This innovative product tackles key challenges in scheduling, memory management, and data transfer. Dynamo employs KV-aware routing for optimized decoding, leveraging existing KV caches. For efficient global memory management at scale, it strategically stores and evicts KV caches across multiple memory tiers—GPU, CPU, SSD, and object storage—enhancing both time-to-first-token and overall throughput. Dynamo features NIXL (NVIDIA Inference tranXfer Library), a new data transfer engine designed for dynamic scaling and low-latency storage access.
-## High level architecture and key benefits
+## Key benefits
 The following diagram outlines Dynamo's high-level architecture. To enable large-scale distributed and disaggregated inference serving, Dynamo includes five key features:

--- a/docs/architecture/sla_planner.md
+++ b/docs/architecture/sla_planner.md
@@ -17,7 +17,7 @@ The SLA (Service Level Agreement)-based planner is an intelligent autoscaling sy
 * **Performance interpolation**: Leverages profiling results data from pre-deployment profiling for accurate scaling decisions
 * **Correction factors**: Adapts to real-world performance deviations from profiled data
-## Architecture
+## Design
 The SLA planner consists of several key components:
@@ -108,7 +108,7 @@ Finally, SLA planner applies the change by scaling up/down the number of prefill
 For detailed deployment instructions including setup, configuration, troubleshooting, and architecture overview, see the [SLA Planner Deployment Guide](../guides/dynamo_deploy/sla_planner_deployment.md).
-**Quick Start:**
+**To deploy SLA Planner:**
 ```bash
 cd components/backends/vllm/deploy
 kubectl apply -f disagg_planner.yaml -n ${NAMESPACE}

--- a/docs/components/router/README.md
+++ b/docs/components/router/README.md
@@ -9,7 +9,7 @@ SPDX-License-Identifier: Apache-2.0
 The Dynamo KV Router intelligently routes requests by evaluating their computational costs across different workers. It considers both decoding costs (from active blocks) and prefill costs (from newly computed blocks). Optimizing the KV Router is critical for achieving maximum throughput and minimum latency in distributed inference setups.
-## Quick Start
+## KV Router Quick Start
 To launch the Dynamo frontend with the KV Router:

--- a/docs/guides/dynamo_deploy/README.md
+++ b/docs/guides/dynamo_deploy/README.md
@@ -17,85 +17,130 @@ limitations under the License.
 # Deploying Inference Graphs to Kubernetes
- We expect users to deploy their inference graphs using CRDs or helm charts.
+High-level guide to Dynamo Kubernetes deployments. Start here, then dive into specific guides.
-# 1. Install Dynamo Cloud.
+## 1. Install Platform First
+**[Dynamo Kubernetes Platform](dynamo_cloud.md)** - Main installation guide with 3 paths
-Prior to deploying an inference graph the user should deploy the Dynamo Cloud Platform. Reference the [Quickstart Guide](quickstart.md) for steps to install Dynamo Cloud with Helm.
+## 2. Choose Your Backend
-Dynamo Cloud acts as an orchestration layer between the end user and Kubernetes, handling the complexity of deploying your graphs for you. This is a one-time action, only necessary the first time you deploy a DynamoGraph.
+Each backend has deployment examples and configuration options:
-# 2. Deploy your inference graph.
+| Backend | Available Configurations |
+|---------|--------------------------|
+| **[vLLM](../../../components/backends/vllm/deploy/README.md)** | Aggregated, Aggregated + Router, Disaggregated, Disaggregated + Router, Disaggregated + Planner |
+| **[SGLang](../../../components/backends/sglang/deploy/README.md)** | Aggregated, Aggregated + Router, Disaggregated, Disaggregated + Planner, Disaggregated Multi-node |
+| **[TensorRT-LLM](../../../components/backends/trtllm/deploy/README.md)** | Aggregated, Aggregated + Router, Disaggregated, Disaggregated + Router |
-We provide a Custom Resource YAML file for many examples under the components/backends/{engine}/deploy folders. Consult the examples below for the CRs for a specific inference backend.
+## 3. Deploy Your First Model
-[View SGLang K8s](../../../components/backends/sglang/deploy/README.md)
+```bash
+# Set same namespace from platform install
-[View vLLM K8s](../../../components/backends/vllm/deploy/README.md)
+export NAMESPACE=dynamo-cloud
-[View TRT-LLM K8s](../../../components/backends/trtllm/deploy/README.md)
+# Deploy any example (this uses vLLM with Qwen model using aggregated serving)
+kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}
-### Deploying a particular example
+# Check status
+kubectl get dynamoGraphDeployment -n ${NAMESPACE}
-```bash
+# Test it
-# Set your dynamo root directory
+kubectl port-forward svc/agg-vllm-frontend 8000:8000 -n ${NAMESPACE}
-cd <root-dynamo-folder>
+curl http://localhost:8000/v1/models
-export PROJECT_ROOT=$(pwd)
-export NAMESPACE=<your-namespace> # the namespace you used to deploy Dynamo cloud to.
 ```
-Deploying an example consists of the simple `kubectl apply -f ... -n ${NAMESPACE}` command. For example:
+## What's a DynamoGraphDeployment?
-```bash
+It's a Kubernetes Custom Resource that defines your inference pipeline:
-kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}
+- Model configuration
-```
+- Resource allocation (GPUs, memory)
+- Scaling policies
+- Frontend/backend connections
-You can use `kubectl get dynamoGraphDeployment -n ${NAMESPACE}` to view your deployment.
+The scripts in the `components/backends/<backend>/launch` folder, such as `agg.sh`, demonstrate how you can serve your models locally. The corresponding YAML files, such as `agg.yaml`, show how you could create a Kubernetes deployment for your inference graph.
-You can use `kubectl delete dynamoGraphDeployment <your-dep-name> -n ${NAMESPACE}` to delete the deployment.
-We provide a Custom Resource YAML file for many examples under the `deploy/` folder.
+### Choosing Your Architecture Pattern
-Use [VLLM YAML](../../../components/backends/vllm/deploy/agg.yaml) for an example.
-**Note 1** Example Image
+When creating a deployment, select the architecture pattern that best fits your use case:
-The examples use a prebuilt image from the `nvcr.io` registry.
+- **Development / Testing** - Use `agg.yaml` as the base configuration
-You can utilize public images from [Dynamo NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo) or build your own image and update the image location in your CR file prior to applying. Either way, you will need to overwrite the image in the example YAML.
+- **Production with Load Balancing** - Use `agg_router.yaml` to enable scalable, load-balanced inference
+- **High Performance / Disaggregated** - Use `disagg_router.yaml` for maximum throughput and modular scalability
-To build your own image:
+### Frontend and Worker Components
-```bash
+You can run the Frontend on one machine (e.g., a CPU node) and workers on different machines (GPU nodes). The Frontend serves as a framework-agnostic HTTP entry point that:
-./container/build.sh --framework <your-inference-framework>
-```
-For example for the `sglang` run
+- Provides OpenAI-compatible `/v1/chat/completions` endpoint
-```bash
+- Auto-discovers backend workers via etcd
-./container/build.sh --framework sglang
+- Routes requests and handles load balancing
-```
+- Validates and preprocesses requests
-To overwrite the image in the example:
+### Customizing Your Deployment
-```bash
+Example structure:
-extraPodSpec:
+```yaml
+apiVersion: nvidia.com/v1alpha1
+kind: DynamoGraphDeployment
+metadata:
+  name: my-llm
+spec:
+  services:
+    Frontend:
+      dynamoNamespace: my-llm
+      componentType: frontend
+      replicas: 1
+      extraPodSpec:
        mainContainer:
-          image: <image-in-your-$DYNAMO_IMAGE>
+          image: your-image
+    VllmDecodeWorker:  # or SGLangDecodeWorker, TrtllmDecodeWorker
+      dynamoNamespace: my-llm
+      componentType: worker
+      replicas: 1
+      envFromSecret: hf-token-secret  # for HuggingFace models
+      resources:
+        limits:
+          gpu: "1"
+      extraPodSpec:
+        mainContainer:
+          image: your-image
+          command: ["/bin/sh", "-c"]
+          args:
+            - python3 -m dynamo.vllm --model YOUR_MODEL [--your-flags]
 ```
-**Note 2**
+Worker command examples per backend:
-Setup port forward if needed when deploying to Kubernetes.
+```yaml
+# vLLM worker
-List the services in your namespace:
+args:
+  - python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B
-```bash
-kubectl get svc -n ${NAMESPACE}
+# SGLang worker
+args:
+  - >-
+    python3 -m dynamo.sglang
+    --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+    --tp 1
+    --trust-remote-code
+# TensorRT-LLM worker
+args:
+  - python3 -m dynamo.trtllm
+    --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+    --served-model-name deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+    --extra-engine-args engine_configs/agg.yaml
 ```
-Look for one that ends in `-frontend` and use it for port forward.
-```bash
+Key customization points include:
-SERVICE_NAME=$(kubectl get svc -n ${NAMESPACE} -o name | grep frontend | sed 's|.*/||' | sed 's|-frontend||' | head -n1)
+- **Model Configuration**: Specify model in the args command
-kubectl port-forward svc/${SERVICE_NAME}-frontend 8080:8080 -n ${NAMESPACE}
+- **Resource Allocation**: Configure GPU requirements under `resources.limits`
-```
+- **Scaling**: Set `replicas` for number of worker instances
+- **Routing Mode**: Enable KV-cache routing by setting `DYN_ROUTER_MODE=kv` in Frontend envs
+- **Worker Specialization**: Add `--is-prefill-worker` flag for disaggregated prefill workers
-Additional Resources:
+## Additional Resources
- [Port Forward Documentation](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/)
- [Examples Deployment Guide](../../examples/README.md#deploying-a-particular-example)
+- **[Examples](../../examples/README.md)** - Complete working examples
+- **[Create Custom Deployments](create_deployment.md)** - Build your own CRDs
+- **[Operator Documentation](dynamo_operator.md)** - How the platform works
+- **[Helm Charts](../../../deploy/helm/README.md)** - For advanced users
\ No newline at end of file
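Reviewer note: the example structure in this README leaves `image: your-image` as a placeholder, and the platform guide exports `DYNAMO_IMAGE`. A minimal sketch of wiring the two together is below. The helper name `render_manifest` is hypothetical and not part of the repository, and the approach assumes every `image:` line in the manifest should receive the same runtime image, as in the example above.

```bash
#!/usr/bin/env bash
# Sketch only: render_manifest is a hypothetical helper, not a Dynamo tool.

# Print a copy of a manifest with every `image:` value replaced.
# $1: path to the manifest, $2: image reference to substitute.
render_manifest() {
  local manifest="$1"
  local image="$2"
  # '|' is the sed delimiter because image references contain '/' and ':'.
  # The captured prefix (\1) keeps the original indentation and key.
  sed -E "s|^([[:space:]]*image:[[:space:]]*).*$|\1${image}|" "${manifest}"
}
```

One possible use is `render_manifest components/backends/vllm/deploy/agg.yaml "${DYNAMO_IMAGE}" | kubectl apply -f - -n "${NAMESPACE}"`, which leaves the file on disk untouched. If the frontend and workers need different images, this blanket substitution is too coarse and the manifest should be edited by hand.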
--- a/docs/guides/dynamo_deploy/dynamo_cloud.md
+++ b/docs/guides/dynamo_deploy/dynamo_cloud.md
@@ -15,102 +15,167 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Dynamo Cloud Kubernetes Platform
+# Dynamo Kubernetes Platform
-The Dynamo Cloud platform is a comprehensive solution for deploying and managing Dynamo inference graphs (also referred to as pipelines) in Kubernetes environments. It provides a streamlined experience for deploying, scaling, and monitoring your inference services.
+Deploy and manage Dynamo inference graphs on Kubernetes with automated orchestration and scaling, using the Dynamo Kubernetes Platform.
-## Overview
+## Quick Start Paths
-The Dynamo cloud platform consists of several key components:
+**Path A: Production Install**
+Install from published artifacts on your existing cluster → [Jump to Path A](#path-a-production-install)
- **Dynamo Operator**: A Kubernetes operator that manages the lifecycle of Dynamo inference graphs from build ➡️ deploy. For more information on the operator, see [Dynamo Kubernetes Operator Documentation](../dynamo_deploy/dynamo_operator.md)
+**Path B: Local Development**
- **Custom Resources**: Kubernetes custom resources for defining and managing Dynamo services
+Set up Minikube first → [Minikube Setup](minikube.md) → Then follow Path A
+**Path C: Custom Development**
+Build from source for customization → [Jump to Path C](#path-c-custom-development)
-## Deployment Prerequisites
+## Prerequisites
-Before getting started with the Dynamo cloud platform, ensure you have:
- A Kubernetes cluster (version 1.24 or later)
- [Earthly](https://earthly.dev/) installed for building components
- Docker installed and running
- Access to a container registry (e.g., Docker Hub, NVIDIA NGC, etc.)
- `kubectl` configured to access your cluster
- Helm installed (version 3.0 or later)
+```bash
+# Required tools
+kubectl version --client  # v1.24+
+helm version             # v3.0+
+docker version           # Running daemon
+# Set your inference runtime image
+export DYNAMO_IMAGE=nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.4.0
+# Also available: sglang-runtime, tensorrtllm-runtime
+```
 > [!TIP]
-> Don't have a Kubernetes cluster? Check out our [Minikube setup guide](../../../docs/guides/dynamo_deploy/minikube.md) to set up a local environment! 🏠
+> No cluster? See [Minikube Setup](minikube.md) for local development.
-#### 🏗️ Build Dynamo inference runtime.
+## Path A: Production Install
-[One-time Action]
+Install from [NGC published artifacts](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo/artifacts) in 3 steps.
-Before you could use Dynamo make sure you have setup the Inference Runtime Image.
-For basic cases you could use the prebuilt image for the Dynamo Inference Runtime.
-Just export the environment variable. This will be the image used by your individual components. You pick whatever dynamo version you want or use the latest (default)
 ```bash
-export DYNAMO_IMAGE=nvcr.io/nvidia/dynamo:latest-vllm
+# 1. Set environment
+export NAMESPACE=dynamo-kubernetes
+export RELEASE_VERSION=0.4.0 # any version of Dynamo 0.3.2+
+# 2. Install CRDs
+helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
+helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz --namespace default
+# 3. Install Platform
+kubectl create namespace ${NAMESPACE}
+helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
+helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE}
 ```
-For a custom setup build and push to your registry Dynamo Base Image for Dynamo inference runtime. This is a one-time operation.
+→ [Verify Installation](#verify-installation)
-```bash
+## Path C: Custom Development
-# Run the script to build the default dynamo:latest-vllm image.
-./container/build.sh
-export IMAGE_TAG=<TAG>
-# Tag the image
-docker tag dynamo:latest-vllm <your-registry>/dynamo:${IMAGE_TAG}
-docker push <your-registry>/dynamo:${IMAGE_TAG}
-```
-## 🚀 Deploying the Dynamo Cloud Platform
+Build and deploy from source for customization.
-## Prerequisites
+### Quick Deploy Script
+```bash
+# 1. Set environment
+export NAMESPACE=dynamo-cloud
+export DOCKER_SERVER=nvcr.io/nvidia/ai-dynamo  # or your registry
+export DOCKER_USERNAME='$oauthtoken'
+export DOCKER_PASSWORD=<YOUR_NGC_CLI_API_KEY>
+export IMAGE_TAG=0.4.0
+# 2. Build operator
+cd deploy/cloud/operator
+earthly --push +docker --DOCKER_SERVER=$DOCKER_SERVER --IMAGE_TAG=$IMAGE_TAG
+cd -
+# 3. Create namespace and secrets
+kubectl create namespace ${NAMESPACE}
+kubectl create secret docker-registry docker-imagepullsecret \
+  --docker-server=${DOCKER_SERVER} \
+  --docker-username=${DOCKER_USERNAME} \
+  --docker-password=${DOCKER_PASSWORD} \
+  --namespace=${NAMESPACE}
+# 4. Deploy
+helm repo add bitnami https://charts.bitnami.com/bitnami
+./deploy.sh --crds
+```
-Before deploying Dynamo Cloud, ensure your Kubernetes cluster meets the following requirements:
+### Manual Steps (Alternative)
-#### 1. 🛡️ Istio Installation
+<details>
-Dynamo Cloud requires Istio for service mesh capabilities. Verify Istio is installed and running:
+<summary>Click to expand manual installation steps</summary>
+**Step 1: Install CRDs**
 ```bash
-# Check if Istio is installed
+helm install dynamo-crds ./crds/ --namespace default
-kubectl get pods -n istio-system
+```
-# Expected output should show running Istio pods
+**Step 2: Install Platform**
-# istiod-* pods should be in Running state
+```bash
+helm dep build ./platform/
+helm install dynamo-platform ./platform/ \
+  --namespace ${NAMESPACE} \
+  --set "dynamo-operator.controllerManager.manager.image.repository=${DOCKER_SERVER}/dynamo-operator" \
+  --set "dynamo-operator.controllerManager.manager.image.tag=${IMAGE_TAG}" \
+  --set "dynamo-operator.imagePullSecrets[0].name=docker-imagepullsecret"
 ```
+</details>
+→ [Verify Installation](#verify-installation)
-#### 2. 💾 PVC Support with Default Storage Class
+## Verify Installation
-Dynamo Cloud requires Persistent Volume Claim (PVC) support with a default storage class. Verify your cluster configuration:
 ```bash
-# Check if default storage class exists
+# Check CRDs
-kubectl get storageclass
+kubectl get crd | grep dynamo
-# Expected output should show at least one storage class marked as (default)
+# Check operator and platform pods
-# Example:
+kubectl get pods -n ${NAMESPACE}
-# NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
+# Expected: dynamo-operator-* and etcd-* pods Running
-# standard (default)   kubernetes.io/gce-pd    Delete          Immediate              true                   1d
 ```
-## Installation
+## Next Steps
-Follow [Quickstart Guide](./quickstart.md) to install the Dynamo Cloud
+1. **Deploy Model/Workflow**
+   ```bash
+   # Example: Deploy a vLLM workflow with Qwen3-0.6B using aggregated serving
+   kubectl apply -f components/backends/vllm/deploy/agg.yaml -n ${NAMESPACE}
-⚠️ **Note:** that omitting `--crds` will skip the CRDs installation/upgrade. This is useful when installing on a shared cluster as CRDs are cluster-scoped resources.
+   # Port forward and test
+   kubectl port-forward svc/agg-vllm-frontend 8000:8000 -n ${NAMESPACE}
+   curl http://localhost:8000/v1/models
+   ```
-⚠️ **Note:** If you'd like to only generate the generated-values.yaml file without deploying to Kubernetes (e.g., for inspection, CI workflows, or dry-run testing), use:
+2. **Explore Backend Guides**
+   - [vLLM Deployments](../../../components/backends/vllm/deploy/README.md)
+   - [SGLang Deployments](../../../components/backends/sglang/deploy/README.md)
+   - [TensorRT-LLM Deployments](../../../components/backends/trtllm/deploy/README.md)
-```bash
+3. **Optional:**
-./deploy_dynamo_cloud.py --yaml-only
+   - [Set up Prometheus & Grafana](k8s_metrics.md)
-```
+   - [SLA Planner Deployment Guide](sla_planner_deployment.md) (for advanced SLA-aware scheduling and autoscaling)
+## Troubleshooting
+**Pods not starting?**
+```bash
+kubectl describe pod <pod-name> -n ${NAMESPACE}
+kubectl logs <pod-name> -n ${NAMESPACE}
+```
-### Cloud Provider-Specific deployment
+**HuggingFace model access?**
+```bash
+kubectl create secret generic hf-token-secret \
+  --from-literal=HF_TOKEN=${HF_TOKEN} \
+  -n ${NAMESPACE}
+```
-#### Google Kubernetes Engine (GKE) deployment
+**Clean uninstall?**
+```bash
+./uninstall.sh  # Removes all CRDs and platform
+```
-You can find detailed instructions for deployment in GKE [here](../dynamo_deploy/gke_setup.md)
+## Advanced Options
+- [GKE-specific setup](gke_setup.md)
+- [Create custom deployments](create_deployment.md)
+- [Dynamo Operator details](dynamo_operator.md)
\ No newline at end of file
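Reviewer note: the "Verify Installation" and "Next Steps" commands in this guide are meant to be run interactively, and `kubectl port-forward` blocks the terminal until it is interrupted, so the `curl` on the following line needs a second shell. The sketch below shows one way to script both steps. The function names are hypothetical; the service name `agg-vllm-frontend` and port `8000` are taken from the guide, and the 30-attempt retry budget is an arbitrary assumption.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; the kubectl and curl
# invocations mirror the commands shown in the guide.

# Succeed once Dynamo CRDs exist and every pod in the namespace is Ready.
# $1: namespace, $2: optional timeout for kubectl wait (default 300s).
wait_for_platform() {
  local namespace="$1"
  local timeout="${2:-300s}"
  # Fail fast if no Dynamo CRDs are registered at all.
  if ! kubectl get crd -o name | grep dynamo >/dev/null; then
    echo "no Dynamo CRDs found; install the CRDs first" >&2
    return 1
  fi
  # Block until all pods report Ready, or until the timeout expires.
  kubectl wait --for=condition=Ready pods --all \
    -n "${namespace}" --timeout="${timeout}"
}

# Port-forward in the background, query /v1/models, then clean up.
# $1: namespace, $2: optional service name, $3: optional port.
smoke_test_frontend() {
  local namespace="$1"
  local service="${2:-agg-vllm-frontend}"
  local port="${3:-8000}"
  local status=1
  local attempt

  # Run the port-forward in the background so the script can continue.
  kubectl port-forward "svc/${service}" "${port}:${port}" \
    -n "${namespace}" >/dev/null 2>&1 &
  local pf_pid=$!

  # Retry for roughly 30 seconds while the tunnel comes up.
  for attempt in $(seq 1 30); do
    if curl -sf "http://localhost:${port}/v1/models"; then
      status=0
      break
    fi
    sleep 1
  done

  # Always stop the port-forward, whether or not the request succeeded.
  kill "${pf_pid}" 2>/dev/null || true
  wait "${pf_pid}" 2>/dev/null || true
  return "${status}"
}
```

A typical sequence would be `wait_for_platform "${NAMESPACE}" && smoke_test_frontend "${NAMESPACE}"` after applying `agg.yaml`. Note that `kubectl wait --for=condition=Ready pods --all` also waits on pods that have already completed, so a namespace containing finished Jobs may need a label selector instead of `--all`.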
--- a/docs/guides/dynamo_deploy/grove.md
+++ b/docs/guides/dynamo_deploy/grove.md
+# Grove Deployment Guide
+Grove is a Kubernetes API specifically designed to address the orchestration challenges of modern AI workloads, particularly disaggregated inference systems. Grove provides seamless integration with NVIDIA Dynamo for comprehensive AI infrastructure management.
+## Overview
+Grove was originally motivated by the challenges of orchestrating multinode, disaggregated inference systems. It provides a consistent and unified API that allows users to define, configure, and scale prefill, decode, and any other components like routing within a single custom resource.
+### How Grove Works for Disaggregated Serving
+Grove enables disaggregated serving by breaking down large language model inference into separate, specialized components that can be independently scaled and managed. This architecture provides several advantages:
+- **Component Specialization**: Separate prefill, decode, and routing components optimized for their specific tasks
+- **Independent Scaling**: Each component can scale based on its individual resource requirements and workload patterns
+- **Resource Optimization**: Better utilization of hardware resources through specialized workload placement
+- **Fault Isolation**: Issues in one component don't necessarily affect others
+## Core Components and API Resources
+Grove implements disaggregated serving through several custom Kubernetes resources that provide declarative composition of role-based pod groups:
+### PodGangSet
+The top-level Grove object that defines a group of components managed and colocated together. Key features include:
+- Support for autoscaling
+- Topology-aware spread of replicas for availability
+- Unified management of multiple disaggregated components
+### PodClique
+Represents a group of pods with a specific role (e.g., leader, worker, frontend). Each clique features:
+- Independent configuration options
+- Custom scaling logic support
+- Role-specific resource allocation
+### PodCliqueScalingGroup
+A set of PodCliques that scale and are scheduled together, ideal for tightly coupled roles like prefill leader and worker components that need coordinated scaling behavior.
+## Key Capabilities for Disaggregated Serving
+Grove provides several specialized features that make it particularly well-suited for disaggregated serving:
+### Flexible Gang Scheduling
+PodCliques and PodCliqueScalingGroups allow users to specify flexible gang-scheduling requirements at multiple levels within a PodGangSet to prevent resource deadlocks and ensure all components of a disaggregated system start together.
+### Multi-level Horizontal Auto-Scaling
+Supports pluggable horizontal auto-scaling solutions to scale PodGangSet, PodClique, and PodCliqueScalingGroup custom resources independently based on their specific metrics and requirements.
+### Network Topology-Aware Scheduling
+Allows specifying network topology pack and spread constraints to optimize for both network performance and service availability, crucial for disaggregated systems where components need efficient inter-node communication.
+### Custom Startup Dependencies
+Prescribes the order in which PodCliques must start in a declarative specification, with pod startup decoupled from pod creation or scheduling. This ensures proper initialization order for disaggregated components.
+## Use Cases and Examples
+Grove specifically supports:
+- **Multi-node disaggregated inference** for large models such as DeepSeek-R1 and Llama-4-Maverick
+- **Single-node disaggregated inference** for optimized resource utilization
+- **Agentic pipelines of models** for complex AI workflows
+- **Standard aggregated serving** patterns for single node or single GPU inference
+## Integration with NVIDIA Dynamo
+Grove is strategically aligned with NVIDIA Dynamo for seamless integration within the AI infrastructure stack:
+### Complementary Roles
+- **Grove**: Handles the Kubernetes orchestration layer for disaggregated AI workloads
+- **Dynamo**: Provides comprehensive AI infrastructure capabilities including serving backends, routing, and resource management
+### Release Coordination
+Grove is aligning its release schedule with NVIDIA Dynamo to ensure seamless integration, with the finalized release cadence reflected in the project roadmap.
+### Unified AI Platform
+The integration creates a comprehensive platform where:
+- Grove manages complex orchestration of disaggregated components
+- Dynamo provides the serving infrastructure, routing capabilities, and backend integrations
+- Together they enable sophisticated AI serving architectures with simplified management
+## Architecture Benefits
+Grove represents a significant advancement in Kubernetes-based orchestration for AI workloads by:
+1. **Simplifying Complex Deployments**: Provides a unified API that can manage multiple components (prefill, decode, routing) within a single resource definition
+2. **Enabling Sophisticated Architectures**: Supports advanced disaggregated inference patterns that were previously difficult to orchestrate
+3. **Reducing Operational Complexity**: Abstracts away the complexity of coordinating multiple interdependent AI components
+4. **Optimizing Resource Utilization**: Enables fine-grained control over component placement and scaling
+## Getting Started
+> **Note**: Grove is currently in development and aligning with NVIDIA Dynamo's release schedule.
+For installation instructions, see the [Grove Installation Guide](https://github.com/NVIDIA/grove/blob/main/docs/installation.md).
+For practical examples of Grove-based multinode deployments in action, see the [Multinode Deployment Guide](multinode-deployment.md), which demonstrates multi-node disaggregated serving scenarios.
+For the latest updates on Grove, refer to the [official project on GitHub](https://github.com/NVIDIA/grove).
\ No newline at end of file
--- a/docs/guides/deploy/k8s_metrics.md
+++ b/docs/guides/deploy/k8s_metrics.md
@@ -7,7 +7,7 @@ This guide provides a walkthrough for collecting and visualizing metrics from Dy
 ## Prerequisites
 ### Install Dynamo Operator
-Before setting up metrics collection, you'll need to have the Dynamo operator installed in your cluster. Follow our [Quickstart Guide](../dynamo_deploy/quickstart.md) for detailed instructions on deploying the Dynamo operator.
+Before setting up metrics collection, you'll need to have the Dynamo operator installed in your cluster. Follow our [Installation Guide](../dynamo_deploy/dynamo_cloud.md) for detailed instructions on deploying the Dynamo operator.
 ### Install Prometheus Operator
 If you don't have an existing Prometheus setup, you'll need to install the Prometheus Operator. The Prometheus Operator introduces custom resources that make it easy to deploy and manage Prometheus monitoring in Kubernetes:
@@ -39,7 +39,7 @@ This will create two components:
 - A Worker component exposing metrics on its system port
 Both components expose a `/metrics` endpoint following the OpenMetrics format, but with different metrics appropriate to their roles. For details about:
- Deployment configuration: See the [vLLM README](../../../components/backends/vllm/README.md)
+- Deployment configuration: See the [vLLM README](../../components/backends/vllm/README.md)
 - Available metrics: See the [metrics guide](../metrics.md)
 ### Validate the Deployment
@@ -47,7 +47,7 @@ Both components expose a `/metrics` endpoint following the OpenMetrics format, b
 Let's send some test requests to populate metrics:
 ```bash
-curl localhost:8080/v1/chat/completions \
+curl localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",

--- a/docs/guides/dynamo_deploy/minikube.md
+++ b/docs/guides/dynamo_deploy/minikube.md
@@ -17,21 +17,19 @@ limitations under the License.
 # Minikube Setup Guide
-Don't have a Kubernetes cluster? No problem! You can set up a local development environment using Minikube. This guide walks through the set up of everything you need to run Dynamo Cloud locally.
+Don't have a Kubernetes cluster? No problem! You can set up a local development environment using Minikube. This guide walks through the setup of everything you need to run the Dynamo Kubernetes Platform locally.
-## Setting Up Minikube
+## 1. Install Minikube
-### 1. Install Minikube
 First things first! Start by installing Minikube. Follow the official [Minikube installation guide](https://minikube.sigs.k8s.io/docs/start/) for your operating system.
-### 2. Configure GPU Support (Optional)
+## 2. Configure GPU Support (Optional)
 Planning to use GPU-accelerated workloads? You'll need to configure GPU support in Minikube. Follow the [Minikube GPU guide](https://minikube.sigs.k8s.io/docs/tutorials/nvidia/) to set up NVIDIA GPU support before proceeding.
 ```{tip}
 Make sure to configure GPU support before starting Minikube if you plan to use GPU workloads!
 ```
-### 3. Start Minikube
+## 3. Start Minikube
 Time to launch your local cluster!
 ```bash
@@ -44,7 +42,7 @@ minikube addons enable istio
 minikube addons enable storage-provisioner-rancher
 ```
-### 4. Verify Installation
+## 4. Verify Installation
 Let's make sure everything is working correctly!
 ```bash
@@ -60,5 +58,5 @@ kubectl get storageclass
 ## Next Steps
-Once your local environment is set up, you can proceed with the [Dynamo Cloud deployment guide](./dynamo_cloud.md) to deploy the platform to your local cluster.
+Once your local environment is set up, you can proceed with the [Dynamo Kubernetes Platform deployment guide](./dynamo_cloud.md) to deploy the platform to your local cluster.
--- a/docs/guides/dynamo_deploy/model_caching_with_fluid.md
+++ b/docs/guides/dynamo_deploy/model_caching_with_fluid.md
@@ -27,7 +27,7 @@ helm install fluid fluid/fluid -n fluid-system
 ```
 For advanced configuration, see the [Fluid Installation Guide](https://fluid-cloudnative.github.io/docs/get-started/installation).
-## Quick Start
+## Pre-deployment Steps
 1. Install Fluid (see [Installation](#installation)).
 2. Create a Dataset and Runtime (see [the following example](#webufs-example)).

--- a/docs/guides/metrics.md
+++ b/docs/guides/metrics.md
@@ -31,7 +31,7 @@ Dynamo automatically exposes metrics with the `dynamo_` name prefixes. It also a
 **Specialized Component Metrics**: Components can also expose additional metrics specific to their functionality. For example, a `preprocessor` component exposes metrics with the `dynamo_preprocessor_*` prefix. See the [Available Metrics section](../../deploy/metrics/README.md#available-metrics) for details on specialized component metrics.
-**Kubernetes Integration**: For comprehensive Kubernetes deployment and monitoring setup, see the [Kubernetes Metrics Guide](deploy/k8s_metrics.md). This includes Prometheus Operator setup, metrics collection configuration, and visualization in Grafana.
+**Kubernetes Integration**: For comprehensive Kubernetes deployment and monitoring setup, see the [Kubernetes Metrics Guide](dynamo_deploy/k8s_metrics.md). This includes Prometheus Operator setup, metrics collection configuration, and visualization in Grafana.
 ## Metrics Hierarchy

--- a/examples/runtime/hello_world/README.md
+++ b/examples/runtime/hello_world/README.md
@@ -106,7 +106,7 @@ Hello star!
 Note that this is a very simple, degenerate example that does not demonstrate the standard Dynamo FrontEnd-Backend deployment. The hello-world client is not a web server; it is a one-off function that sends the predefined text "world,sun,moon,star" to the backend. The example is meant to show the HelloWorldWorker. As such, you will only see the HelloWorldWorker pod in the deployment. The client will run and exit and the pod will not be operational.
-Follow the [Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
+Follow the [Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Kubernetes Platform.
 Then deploy to Kubernetes using
 ```bash