docs: Simplify sphinx build and table of contents on webpage (#2519)

Commit 766d3f2c (unverified), authored Aug 25, 2025 by Ryan McCormick, committed by GitHub Aug 25, 2025
10 changed files
--- a/docs/examples/README.md
+++ b/docs/examples/README.md
+../../examples/README.md
\ No newline at end of file
--- a/docs/guides/backend.md
+++ b/docs/guides/backend.md
@@ -76,7 +76,7 @@ The `model_type` can be:
 See `components/backends` for full code examples.
-### Component names
+## Component names
 A worker needs three names to register itself: namespace.component.endpoint
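
Reviewer note: the `namespace.component.endpoint` convention in the context line above can be illustrated with a minimal sketch. This is not Dynamo's API; `WorkerAddress`, `parse_worker_address`, and the sample name `dynamo.backend.generate` are hypothetical and exist only to show how a dotted name splits into its three parts.

```python
from typing import NamedTuple


class WorkerAddress(NamedTuple):
    """Hypothetical container for the three names (not part of the Dynamo API)."""

    namespace: str
    component: str
    endpoint: str


def parse_worker_address(dotted: str) -> WorkerAddress:
    """Split a 'namespace.component.endpoint' string into its three parts."""
    parts = dotted.split(".")
    # Require exactly three non-empty segments.
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected namespace.component.endpoint, got {dotted!r}"
        )
    return WorkerAddress(*parts)


# Hypothetical example name, chosen only for illustration.
addr = parse_worker_address("dynamo.backend.generate")
print(addr.namespace, addr.component, addr.endpoint)
```

Validating the segment count up front gives a clear error for names such as `backend.generate`, which would otherwise be unpacked incorrectly.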

--- a/docs/guides/dynamo_deploy/dynamo_cloud.md
+++ b/docs/guides/dynamo_deploy/dynamo_cloud.md
@@ -39,7 +39,7 @@ helm version             # v3.0+
 docker version           # Running daemon
 # Set your inference runtime image
-export DYNAMO_IMAGE=nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.4.0
+export DYNAMO_IMAGE=nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.4.1
 # Also available: sglang-runtime, tensorrtllm-runtime
 ```
@@ -53,7 +53,7 @@ Install from [NGC published artifacts](https://catalog.ngc.nvidia.com/orgs/nvidi
 ```bash
 # 1. Set environment
 export NAMESPACE=dynamo-kubernetes
-export RELEASE_VERSION=0.4.0 # any version of Dynamo 0.3.2+
+export RELEASE_VERSION=0.4.1 # any version of Dynamo 0.3.2+
 # 2. Install CRDs
 helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
@@ -79,7 +79,7 @@ export NAMESPACE=dynamo-cloud
 export DOCKER_SERVER=nvcr.io/nvidia/ai-dynamo/  # or your registry
 export DOCKER_USERNAME='$oauthtoken'
 export DOCKER_PASSWORD=<YOUR_NGC_CLI_API_KEY>
-export IMAGE_TAG=0.4.0
+export IMAGE_TAG=0.4.1
 # 2. Build operator
 cd deploy/cloud/operator
@@ -178,4 +178,4 @@ kubectl create secret generic hf-token-secret \
 - [GKE-specific setup](gke_setup.md)
 - [Create custom deployments](create_deployment.md)
 - [Dynamo Operator details](dynamo_operator.md)
\ No newline at end of file
--- a/docs/guides/dynamo_deploy/dynamo_operator.md
+++ b/docs/guides/dynamo_deploy/dynamo_operator.md
@@ -93,7 +93,7 @@ The GitOps workflow for Dynamo deployments consists of three main steps:
 ### Step 1: Build and Push Dynamo Cloud Operator
-First, follow to [See Install Dynamo Cloud](quickstart.md#install-dynamo-cloud).
+First, follow [Install Dynamo Cloud](README.md).
 ### Step 2: Create Initial Deployment

--- a/docs/guides/dynamo_deploy/operator_deployment.md
+++ b/docs/guides/dynamo_deploy/operator_deployment.md
-../../../guides/dynamo_deploy/operator_deployment.md
\ No newline at end of file
--- a/docs/guides/dynamo_deploy/quickstart.md
+++ b/docs/guides/dynamo_deploy/quickstart.md
-# Quickstart
-Your onboarding includes 2 steps.
-1. Before deploying your inference graphs you need to install the Dynamo Inference Platform and the Dynamo Cloud.
-Dynamo Cloud acts as an orchestration layer between the end user and Kubernetes, handling the complexity of deploying your graphs for you.
-You could install from [Published Artifacts](#1-installing-dynamo-cloud-from-published-artifacts) or [Source](#2-installing-dynamo-cloud-from-source)
-2. Once you install the Dynamo Cloud, proceed to the [Examples](../../examples/README.md) to deploy an inference graph.
-## 1. Installing Dynamo Cloud from Published Artifacts
-Use this approach when installing from pre-built helm charts and docker images published to NGC.
-### Prerequisites
-```bash
-export NAMESPACE=dynamo-cloud
-export RELEASE_VERSION=0.4.0
-```
-Install `envsubst`, `kubectl`, `helm`
-### Authenticate with NGC
-Go to  https://ngc.nvidia.com/org to get your NGC_CLI_API_KEY.
-```bash
-helm repo add nvidia https://helm.ngc.nvidia.com/nvidia --username='$oauthtoken' --password=<YOUR_NGC_CLI_API_KEY>
-```
-### Fetch Helm Charts
-```bash
-# Fetch the CRDs helm chart
-helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
-# Fetch the platform helm chart
-helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
-```
-### Install Dynamo Cloud
-**Step 1: Install Custom Resource Definitions (CRDs)**
-```bash
-helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz \
-  --namespace default \
-  --wait \
-  --atomic
-```
-**Step 2: Install Dynamo Platform**
-```bash
-kubectl create namespace ${NAMESPACE}
-helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE}
-```
-## 2. Installing Dynamo Cloud from Source
-Use this approach when developing or customizing Dynamo as a contributor, or using local helm charts from the source repository.
-### Prerequisites
-Ensure you have the source code checked out and are in the `dynamo` directory:
-### Set Environment Variables
-Our examples use the [`nvcr.io`](https://catalog.ngc.nvidia.com) but you can setup your own values if you use another docker registry.
-```bash
-export NAMESPACE=dynamo-cloud # or whatever you prefer.
-export DOCKER_SERVER=nvcr.io/nvidia/ai-dynamo/  # your-docker-registry.com
-export DOCKER_USERNAME='$oauthtoken'  # your-username if not using nvcr.io
-export DOCKER_PASSWORD=YOUR_NGC_CLI_API_KEY  # your-password if not using nvcr.io
-```
-### Pick the Dynamo Inference Image
-Export the tag of the Dynamo Runtime Image.
-If you are using a pre-defined release:
-```bash
-export IMAGE_TAG=RELEASE_VERSION # i.e. 0.3.2 - the release you are using
-```
-Or build your own image first and tag it with IMAGE_TAG
-```bash
-export IMAGE_TAG=<your-pick>
-./container/build.sh
-docker tag dynamo:latest-vllm <your-registry>/dynamo-base:$IMAGE_TAG
-docker login <your-registry>
-docker push <your-registry>/dynamo-base:latest-vllm
-```
-### Install Dynamo Cloud
-You need to build and push the Dynamo Cloud Operator Image by running
-```bash
-cd deploy/cloud/operator
-earthly --push +docker --DOCKER_SERVER=$DOCKER_SERVER --IMAGE_TAG=$IMAGE_TAG
-```
-The  Nvidia Cloud Operator image will be pulled from the `$DOCKER_SERVER/dynamo-operator:$IMAGE_TAG`.
-You could run the `deploy.sh` or use the manual commands under Step 1 and Step 2.
-**Installing with a script (alternative to the Step 1 and Step 2)**
-Create the namespace and the docker registry secret.
-```bash
-kubectl create namespace ${NAMESPACE}
-kubectl create secret docker-registry docker-imagepullsecret \
-  --docker-server=${DOCKER_SERVER} \
-  --docker-username=${DOCKER_USERNAME} \
-  --docker-password=${DOCKER_PASSWORD} \
-  --namespace=${NAMESPACE}
-```
-You need to add the bitnami helm repository by running:
-```bash
-helm repo add bitnami https://charts.bitnami.com/bitnami
-```
-```bash
-./deploy.sh --crds
-```
-if you want guidance during the process, run the deployment script with the `--interactive` flag:
-```bash
-./deploy.sh --crds --interactive
-```
-**Installing CRDs manually  (alternative to the script deploy.sh)**
-***Step 1: Install Custom Resource Definitions (CRDs)**
-```bash
-helm install dynamo-crds ./crds/ \
-  --namespace default \
-  --wait \
-  --atomic
-```
-***Step 2: Build Dependencies and Install Platform**
-```bash
-cd deploy/cloud/helm
-helm dep build ./platform/
-kubectl create namespace ${NAMESPACE}
-# Create docker registry secret
-kubectl create secret docker-registry docker-imagepullsecret \
-  --docker-server=${DOCKER_SERVER} \
-  --docker-username=${DOCKER_USERNAME} \
-  --docker-password=${DOCKER_PASSWORD} \
-  --namespace=${NAMESPACE}
-# Install platform
-helm install dynamo-platform ./platform/ \
-  --namespace ${NAMESPACE} \
-  --set "dynamo-operator.controllerManager.manager.image.repository=${DOCKER_SERVER}/dynamo-operator" \
-  --set "dynamo-operator.controllerManager.manager.image.tag=${IMAGE_TAG}" \
-  --set "dynamo-operator.imagePullSecrets[0].name=docker-imagepullsecret"
-```
-[More on Deploying to Dynamo Cloud](./dynamo_cloud.md)
-## Uninstall CRDs for a clean start
-We provide a script to uninstall CRDs should you need a clean start.
-```bash
-./uninstall.sh
-```
-## Explore Examples
-If deploying to Kubernetes, create a Kubernetes secret containing your sensitive values if needed:
-```bash
-export HF_TOKEN=your_hf_token
-kubectl create secret generic hf-token-secret \
-  --from-literal=HF_TOKEN=${HF_TOKEN} \
-  -n ${NAMESPACE}
-```
-Follow the [Examples](../../examples/README.md)
-For more details on how to create your own deployments follow [Create Deployment Guide](create_deployment.md)
--- a/docs/hidden_toctree.rst
+++ b/docs/hidden_toctree.rst
@@ -4,18 +4,6 @@
    SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
    SPDX-License-Identifier: Apache-2.0
-    Licensed under the Apache License, Version 2.0 (the "License");
-    you may not use this file except in compliance with the License.
-    You may obtain a copy of the License at
-    http://www.apache.org/licenses/LICENSE-2.0
-    Unless required by applicable law or agreed to in writing, software
-    distributed under the License is distributed on an "AS IS" BASIS,
-    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-    See the License for the specific language governing permissions and
-    limitations under the License.
 .. This hidden toctree includes readmes etc that aren't meant to be in the main table of contents but should be accounted for in the sphinx project structure
@@ -34,24 +22,32 @@
   API/nixl_connect/writable_operation.md
   API/nixl_connect/read_operation.md
   API/nixl_connect/write_operation.md
-   components/backends/sglang/deploy/README.md
+   API/nixl_connect/README.md
-   components/backends/sglang/docs/dsr1-wideep-h100.md
-   components/backends/sglang/docs/multinode-examples.md
-   components/backends/sglang/docs/sgl-http-server.md
-   components/backends/sglang/slurm_jobs/README.md
-   examples/README.md
   guides/dynamo_deploy/create_deployment.md
   guides/dynamo_deploy/sla_planner_deployment.md
-   guides/dynamo_deploy/helm_install.md
   guides/dynamo_deploy/gke_setup.md
+   guides/dynamo_deploy/grove.md
+   guides/dynamo_deploy/k8s_metrics.md
+   guides/dynamo_deploy/model_caching_with_fluid.md
   guides/dynamo_deploy/README.md
   guides/dynamo_run.md
-   components/backends/vllm/README.md
+   guides/metrics.md
-   components/backends/trtllm/README.md
+   guides/run_kvbm_in_vllm.md
-   components/backends/trtllm/deploy/README.md
-   components/backends/trtllm/llama4_plus_eagle.md
+   architecture/kv_cache_routing.md
-   components/backends/trtllm/multinode-examples.md
+   architecture/load_planner.md
-   components/backends/trtllm/kv-cache-transfer.md
+   architecture/request_migration.md
-   components/backends/vllm/deploy/README.md
-   components/backends/vllm/multi-node.md
+   components/backends/trtllm/multinode/multinode-examples.md
+   components/backends/sglang/docs/multinode-examples.md
+   examples/README.md
+   examples/runtime/hello_world/README.md
+   architecture/distributed_runtime.md
+   architecture/dynamo_flow.md
+..   TODO: architecture/distributed_runtime.md and architecture/dynamo_flow.md
+     have some outdated names/references and need a refresh.
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -14,6 +14,10 @@
    See the License for the specific language governing permissions and
    limitations under the License.
+..
+   Main Page
+..
 Welcome to NVIDIA Dynamo
 ========================
@@ -22,156 +26,49 @@ The NVIDIA Dynamo Platform is a high-performance, low-latency inference framewor
 .. admonition:: 💎 Discover the latest developments!
   :class: seealso
-   This guide is a snapshot of the `Dynamo GitHub Repository <https://github.com/ai-dynamo/dynamo>`_ at a specific point in time. For the latest information and examples, see:
+   This guide is a snapshot at a specific point in time. For the latest information and examples, see the `Dynamo GitHub repository <https://github.com/ai-dynamo/dynamo>`_.
-   - `Dynamo README <https://github.com/ai-dynamo/dynamo/blob/main/README.md>`_
-   - `Architecture and features doc <https://github.com/ai-dynamo/dynamo/blob/main/docs/architecture/>`_
-   - `Usage guides <https://github.com/ai-dynamo/dynamo/tree/main/docs/guides>`_
-   - `Dynamo examples repo <https://github.com/ai-dynamo/dynamo/tree/main/examples>`_
-Quick Start
-----------------
-Local Deployment
-~~~~~~~~~~~~~~~~
-Get started with Dynamo locally in just a few commands:
-**1. Install Dynamo**
-.. code-block:: bash
-   # Install uv (recommended Python package manager)
-   curl -LsSf https://astral.sh/uv/install.sh | sh
-   # Create virtual environment and install Dynamo
-   uv venv venv
-   source venv/bin/activate
-   uv pip install "ai-dynamo[sglang]"  # or [vllm], [trtllm]
-**2. Start etcd/NATS**
-.. code-block:: bash
-   # Start etcd and NATS using Docker Compose
-   docker compose -f deploy/docker-compose.yml up -d
-**3. Run Dynamo**
-.. code-block:: bash
-   # Start the OpenAI compatible frontend
-   python -m dynamo.frontend
-   # In another terminal, start an SGLang worker
-   python -m dynamo.sglang.worker deepseek-ai/DeepSeek-R1-Distill-Llama-8B
-**4. Test your deployment**
-.. code-block:: bash
-   curl localhost:8080/v1/chat/completions \
-     -H "Content-Type: application/json" \
-     -d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
-          "messages": [{"role": "user", "content": "Hello!"}],
-          "max_tokens": 50}'
-Kubernetes Deployment
-~~~~~~~~~~~~~~~~~~~~~
-For deployments on Kubernetes, follow the :doc:`Dynamo Platform Quickstart Guide <guides/dynamo_deploy/quickstart>`.
-Dive in: Examples
+Quickstart
-----------------
+==========
+.. include:: _includes/quick_start_local.rst
-The examples below assume you build the latest image yourself from source. If using a prebuilt image follow the examples from the corresponding branch.
-.. grid:: 1 2 2 2
-    :gutter: 3
-    :margin: 0
-    :padding: 3 4 0 0
-    .. grid-item-card:: :doc:`Hello World <examples/runtime/hello_world/README>`
-        :link: examples/runtime/hello_world/README
-        :link-type: doc
-        Demonstrates the basic concepts of Dynamo by creating a simple GPU-unaware graph
-    .. grid-item-card:: :doc:`LLM Serving with VLLM <components/backends/vllm/README>`
-        :link: components/backends/vllm/README
-        :link-type: doc
-        Presents examples and reference implementations for deploying Large Language Models (LLMs) in various configurations with VLLM.
-    .. grid-item-card:: :doc:`Multinode with SGLang <components/backends/sglang/docs/multinode-examples>`
-        :link: components/backends/sglang/docs/multinode-examples
-        :link-type: doc
-        Demonstrates disaggregated serving on several nodes.
-    .. grid-item-card:: :doc:`TensorRT-LLM <components/backends/trtllm/README>`
-        :link: components/backends/trtllm/README
-        :link-type: doc
-        Presents TensorRT-LLM examples and reference implementations for deploying Large Language Models (LLMs) in various configurations.
+..
+   Sidebar
+..
 .. toctree::
   :hidden:
+   :caption: Getting Started
-   Welcome to Dynamo <self>
+   Quickstart <self>
+   Installation <_sections/installation>
   Support Matrix <support_matrix.md>
+   Architecture <_sections/architecture>
+   Examples <_sections/examples>
 .. toctree::
   :hidden:
-   :caption: Architecture & Features
+   :caption: Kubernetes Deployment
-   High Level Architecture <architecture/architecture.md>
+   Quickstart (K8s) <../guides/dynamo_deploy/dynamo_cloud.md>
-   Distributed Runtime <architecture/distributed_runtime.md>
+   Dynamo Operator <../guides/dynamo_deploy/dynamo_operator.md>
-   Disaggregated Serving <architecture/disagg_serving.md>
+   Metrics <../guides/dynamo_deploy/k8s_metrics.md>
-   KV Block Manager <architecture/kvbm_intro.rst>
+   Multinode <../guides/dynamo_deploy/multinode-deployment.md>
-   KV Cache Routing <architecture/kv_cache_routing.md>
+   Minikube Setup <../guides/dynamo_deploy/minikube.md>
-   Planner <architecture/planner_intro.rst>
-   Dynamo Architecture Flow <architecture/dynamo_flow.md>
 .. toctree::
   :hidden:
-   :caption: Using Dynamo
+   :caption: Components
-   Writing Python Workers in Dynamo <guides/backend.md>
+   Backends <_sections/backends>
-   Disaggregation and Performance Tuning <guides/disagg_perf_tuning.md>
+   Router <components/router/README>
-   Working with Dynamo Kubernetes Operator <guides/dynamo_deploy/dynamo_operator.md>
+   Planner <architecture/planner_intro>
+   KVBM <architecture/kvbm_intro>
 .. toctree::
   :hidden:
-   :caption: Deployment Guides
+   :caption: Developer Guide
-   Dynamo Deploy Quickstart <guides/dynamo_deploy/quickstart.md>
-   Dynamo Cloud Kubernetes Platform <guides/dynamo_deploy/dynamo_cloud.md>
-   Manual Helm Deployment <guides/dynamo_deploy/helm_install.md>
-   Minikube Setup Guide <guides/dynamo_deploy/minikube.md>
-   Model Caching with Fluid <guides/dynamo_deploy/model_caching_with_fluid.md>
-.. toctree::
-   :hidden:
-   :caption: Examples
-   Hello World <examples/runtime/hello_world/README.md>
-   LLM Deployment Examples using VLLM <components/backends/vllm/README.md>
-   LLM Deployment Examples using SGLang <components/backends/sglang/README.md>
-   Multinode Examples using SGLang <components/backends/sglang/docs/multinode-examples.md>
-   Planner Benchmark Example <guides/planner_benchmark/README.md>
-   LLM Deployment Examples using TensorRT-LLM <components/backends/trtllm/README.md>
-.. toctree::
-   :hidden:
-   :caption: Reference
+   Tuning Disaggregated Serving Performance <guides/disagg_perf_tuning.md>
+   Writing Python Workers in Dynamo <guides/backend.md>
   Glossary <dynamo_glossary.md>
-   NIXL Connect API <API/nixl_connect/README.md>
-   KVBM Reading <architecture/kvbm_reading.md>
--- a/examples/basics/multinode/README.md
+++ b/examples/basics/multinode/README.md
@@ -315,6 +315,7 @@ Send multiple new conversations to see them distributed across replicas:
 ```python
 import asyncio
 from openai import AsyncOpenAI
+import os
 if os.environ.get("DYN_FRONTEND_IP"):
    frontend_ip=os.environ.get("DYN_FRONTEND_IP")

--- a/examples/runtime/hello_world/README.md
+++ b/examples/runtime/hello_world/README.md
@@ -106,7 +106,7 @@ Hello star!
 Note that this is a very simple, degenerate example that does not demonstrate the standard Dynamo FrontEnd-Backend deployment. The hello-world client is not a web server; it is a one-off function that sends the predefined text "world,sun,moon,star" to the backend. The example is meant to show the HelloWorldWorker, so you will only see the HelloWorldWorker pod in the deployment. The client will run and exit, and the pod will not be operational.
-Follow the [Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Kubernetes Platform.
+Follow the [Quickstart Guide](../../../docs/guides/dynamo_deploy/README.md) to install Dynamo Kubernetes Platform.
 Then deploy to Kubernetes using:
 ```bash
@@ -119,4 +119,4 @@ to delete your deployment:
 ```bash
 kubectl delete dynamographdeployment hello-world -n ${NAMESPACE}
 ```
\ No newline at end of file