docs: Bring back some missed release/0.4.0 doc changes, fix broken links, add...

docs: Bring back some missed release/0.4.0 doc changes, fix broken links, add lychee link checker github action (#2482)

844f8819 · Ryan McCormick · GitHub · 41f095cf · 844f8819 · 844f8819
Commit 844f8819, authored Aug 18, 2025 by Ryan McCormick; committed by GitHub Aug 18, 2025
20 changed files
--- a/docs/guides/dynamo_deploy/model_caching_with_fluid.md
+++ b/docs/guides/dynamo_deploy/model_caching_with_fluid.md
@@ -318,7 +318,7 @@ spec:

 - [Fluid Documentation](https://fluid-cloudnative.github.io/)
 - [Alluxio Documentation](https://docs.alluxio.io/)
- [MinIO Documentation](https://min.io/docs/)
+- [MinIO Documentation](https://docs.min.io/)
 - [Hugging Face Hub](https://huggingface.co/docs/hub/index)
 - [Dynamo README](https://github.com/ai-dynamo/dynamo/blob/main/.devcontainer/README.md)
 - [Dynamo Documentation](https://docs.nvidia.com/dynamo/latest/index.html)
--- a/docs/guides/dynamo_deploy/multinode-deployment.md
+++ b/docs/guides/dynamo_deploy/multinode-deployment.md
@@ -50,8 +50,8 @@ These systems provide enhanced scheduling capabilities including topology-aware

 LWS is a simple multinode deployment mechanism that allows you to deploy a workload across multiple nodes.

- **LWS**: [LWS Installation](https://github.com/NVIDIA/LWS#installation)
- **Volcano**: [Volcano Installation](https://volcano.sh/docs/installation/install-volcano/)
+- **LWS**: [LWS Installation](https://github.com/kubernetes-sigs/lws#installation)
+- **Volcano**: [Volcano Installation](https://volcano.sh/en/docs/installation/)

 Volcano is a Kubernetes native scheduler optimized for AI workloads at scale. It is used in conjunction with LWS to provide gang scheduling support.

@@ -110,8 +110,8 @@ args:

 For additional support and examples, see the working multinode configurations in:

- **SGLang**: [components/backends/sglang/deploy/](../../components/backends/sglang/deploy/)
- **TensorRT-LLM**: [components/backends/trtllm/deploy/](../../components/backends/trtllm/deploy/)
- **vLLM**: [components/backends/vllm/deploy/](../../components/backends/vllm/deploy/)
+- **SGLang**: [components/backends/sglang/deploy/](../../../components/backends/sglang/deploy/)
+- **TensorRT-LLM**: [components/backends/trtllm/deploy/](../../../components/backends/trtllm/deploy/)
+- **vLLM**: [components/backends/vllm/deploy/](../../../components/backends/vllm/deploy/)

 These examples demonstrate proper usage of the `multinode` section with corresponding `gpu` limits and correct `tp-size` configuration.
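
The `../../` to `../../../` change in this hunk can be checked by resolving each link against the directory of the file that contains it. The following is a minimal sketch, not part of the commit: `resolve_link` is a hypothetical helper, and it assumes the links are interpreted relative to the Markdown file's location in the repository.

```python
import posixpath


def resolve_link(source_file, target):
    """Resolve a relative Markdown link against the file that contains it."""
    base_dir = posixpath.dirname(source_file)
    # normpath collapses the ".." segments and drops the trailing slash
    return posixpath.normpath(posixpath.join(base_dir, target))


# Path of the edited file, taken from the diff header above
SOURCE = "docs/guides/dynamo_deploy/multinode-deployment.md"

old_target = resolve_link(SOURCE, "../../components/backends/sglang/deploy/")
new_target = resolve_link(SOURCE, "../../../components/backends/sglang/deploy/")

print(old_target)  # stays under docs/
print(new_target)  # reaches the repository root
```

Because the file sits three directories below the repository root, two `..` segments stop at `docs/`, while three reach the top-level `components/` directory.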
--- a/docs/guides/dynamo_deploy/quickstart.md
+++ b/docs/guides/dynamo_deploy/quickstart.md
@@ -14,7 +14,7 @@ Use this approach when installing from pre-built helm charts and docker images p

 ```bash
 export NAMESPACE=dynamo-cloud
-export RELEASE_VERSION=0.3.2
+export RELEASE_VERSION=0.4.0
 ```

 Install `envsubst`, `kubectl`, `helm`
@@ -67,7 +67,7 @@ Ensure you have the source code checked out and are in the `dynamo` directory:

 ### Set Environment Variables

-Our examples use the [`nvcr.io`](https://nvcr.io/nvidia/ai-dynamo/) but you can setup your own values if you use another docker registry.
+Our examples use the [`nvcr.io`](https://catalog.ngc.nvidia.com) but you can setup your own values if you use another docker registry.

 ```bash
 export NAMESPACE=dynamo-cloud # or whatever you prefer.

--- a/docs/hidden_toctree.rst
+++ b/docs/hidden_toctree.rst
+:orphan:
+
 ..
    SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
    SPDX-License-Identifier: Apache-2.0
@@ -21,5 +23,36 @@
   :maxdepth: 2
   :hidden:

-   guides/README.md
   runtime/README.md
+   API/nixl_connect/connector.md
+   API/nixl_connect/descriptor.md
+   API/nixl_connect/device.md
+   API/nixl_connect/device_kind.md
+   API/nixl_connect/operation_status.md
+   API/nixl_connect/rdma_metadata.md
+   API/nixl_connect/readable_operation.md
+   API/nixl_connect/writable_operation.md
+   API/nixl_connect/read_operation.md
+   API/nixl_connect/write_operation.md
+   components/backends/sglang/deploy/README.md
+   components/backends/sglang/docs/dsr1-wideep-h100.md
+   components/backends/sglang/docs/multinode-examples.md
+   components/backends/sglang/docs/sgl-http-server.md
+   components/backends/sglang/slurm_jobs/README.md
+   components/router/README.md
+   examples/README.md
+   guides/dynamo_deploy/create_deployment.md
+   guides/dynamo_deploy/sla_planner_deployment.md
+   guides/dynamo_deploy/helm_install.md
+   guides/dynamo_deploy/gke_setup.md
+   guides/dynamo_deploy/README.md
+   guides/dynamo_run.md
+   components/backends/vllm/README.md
+   components/backends/trtllm/README.md
+   components/backends/trtllm/deploy/README.md
+   components/backends/trtllm/llama4_plus_eagle.md
+   components/backends/trtllm/multinode-examples.md
+   components/backends/trtllm/kv-cache-transfer.md
+   components/backends/vllm/deploy/README.md
+   components/backends/vllm/multi-node.md
+
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -27,12 +27,60 @@ The NVIDIA Dynamo Platform is a high-performance, low-latency inference framewor
   - `Dynamo README <https://github.com/ai-dynamo/dynamo/blob/main/README.md>`_
   - `Architecture and features doc <https://github.com/ai-dynamo/dynamo/blob/main/docs/architecture/>`_
   - `Usage guides <https://github.com/ai-dynamo/dynamo/tree/main/docs/guides>`_
-   - `Dynamo examples repo <https://github.com/ai-dynamo/examples>`_
+   - `Dynamo examples repo <https://github.com/ai-dynamo/dynamo/tree/main/examples>`_


 Quick Start
 -----------------
-Follow the :doc:`Quick Guide to install Dynamo Platform <guides/dynamo_deploy/quickstart>`.
+
+Local Deployment
+~~~~~~~~~~~~~~~~
+
+Get started with Dynamo locally in just a few commands:
+
+**1. Install Dynamo**
+
+.. code-block:: bash
+
+   # Install uv (recommended Python package manager)
+   curl -LsSf https://astral.sh/uv/install.sh | sh
+
+   # Create virtual environment and install Dynamo
+   uv venv venv
+   source venv/bin/activate
+   uv pip install "ai-dynamo[sglang]"  # or [vllm], [trtllm]
+
+**2. Start etcd/NATS**
+
+.. code-block:: bash
+
+   # Start etcd and NATS using Docker Compose
+   docker compose -f deploy/docker-compose.yml up -d
+
+**3. Run Dynamo**
+
+.. code-block:: bash
+
+   # Start the OpenAI compatible frontend
+   python -m dynamo.frontend
+
+   # In another terminal, start an SGLang worker
+   python -m dynamo.sglang.worker deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+
+**4. Test your deployment**
+
+.. code-block:: bash
+
+   curl localhost:8080/v1/chat/completions \
+     -H "Content-Type: application/json" \
+     -d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
+          "messages": [{"role": "user", "content": "Hello!"}],
+          "max_tokens": 50}'
+
+Kubernetes Deployment
+~~~~~~~~~~~~~~~~~~~~~
+
+For deployments on Kubernetes, follow the :doc:`Dynamo Platform Quickstart Guide <guides/dynamo_deploy/quickstart>`.


 Dive in: Examples
@@ -92,16 +140,8 @@ The examples below assume you build the latest image yourself from source. If us
   :hidden:
   :caption: Using Dynamo

-   Running Inference Graphs Locally (dynamo-run) <guides/dynamo_run.md>
-   Deploying Inference Graphs <guides/dynamo_deploy/README.md>
-
-.. toctree::
-   :hidden:
-   :caption: Usage Guides
-
   Writing Python Workers in Dynamo <guides/backend.md>
   Disaggregation and Performance Tuning <guides/disagg_perf_tuning.md>
-   KV Cache Router Performance Tuning <guides/kv_router_perf_tuning.md>
   Working with Dynamo Kubernetes Operator <guides/dynamo_deploy/dynamo_operator.md>

 .. toctree::
@@ -110,31 +150,19 @@ The examples below assume you build the latest image yourself from source. If us

   Dynamo Deploy Quickstart <guides/dynamo_deploy/quickstart.md>
   Dynamo Cloud Kubernetes Platform <guides/dynamo_deploy/dynamo_cloud.md>
-   Manual Helm Deployment <deploy/helm/README.md>
-   GKE Setup Guide <guides/dynamo_deploy/gke_setup.md>
+   Manual Helm Deployment <guides/dynamo_deploy/helm_install.md>
   Minikube Setup Guide <guides/dynamo_deploy/minikube.md>
   Model Caching with Fluid <guides/dynamo_deploy/model_caching_with_fluid.md>

-.. toctree::
-   :hidden:
-   :caption: Benchmarking
-
-   Planner Benchmark Example <guides/planner_benchmark/README.md>
-
-
-.. toctree::
-   :hidden:
-   :caption: API
-
-   NIXL Connect API <API/nixl_connect/README.md>
-
 .. toctree::
   :hidden:
   :caption: Examples

   Hello World <examples/runtime/hello_world/README.md>
   LLM Deployment Examples using VLLM <components/backends/vllm/README.md>
+   LLM Deployment Examples using SGLang <components/backends/sglang/README.md>
   Multinode Examples using SGLang <components/backends/sglang/docs/multinode-examples.md>
+   Planner Benchmark Example <guides/planner_benchmark/README.md>
   LLM Deployment Examples using TensorRT-LLM <components/backends/trtllm/README.md>

 .. toctree::
@@ -143,6 +171,7 @@ The examples below assume you build the latest image yourself from source. If us


   Glossary <dynamo_glossary.md>
+   NIXL Connect API <API/nixl_connect/README.md>
   KVBM Reading <architecture/kvbm_reading.md>


--- a/examples/basics/disaggregated_serving/README.md
+++ b/examples/basics/disaggregated_serving/README.md
@@ -36,9 +36,9 @@ docker compose -f deploy/metrics/docker-compose.yml up -d

 ## Components

- [Frontend](../../../components/frontend/README) - HTTP API endpoint that receives requests and forwards them to the decode worker
- [vLLM Prefill Worker](../../../components/backends/vllm/README) - Specialized worker for prefill phase execution
- [vLLM Decode Worker](../../../components/backends/vllm/README) - Specialized worker that handles requests and decides between local/remote prefill
+- [Frontend](../../../components/frontend/README.md) - HTTP API endpoint that receives requests and forwards them to the decode worker
+- [vLLM Prefill Worker](../../../components/backends/vllm/README.md) - Specialized worker for prefill phase execution
+- [vLLM Decode Worker](../../../components/backends/vllm/README.md) - Specialized worker that handles requests and decides between local/remote prefill

 ```mermaid
 ---

--- a/examples/basics/multinode/README.md
+++ b/examples/basics/multinode/README.md
@@ -85,7 +85,7 @@ Install Dynamo with [SGLang](https://docs.sglang.ai/) support:
 pip install ai-dynamo[sglang]
 ```

-For more information about the SGLang backend and its integration with Dynamo, see the [SGLang Backend Documentation](../../components/backends/sglang/README.md).
+For more information about the SGLang backend and its integration with Dynamo, see the [SGLang Backend Documentation](../../../components/backends/sglang/README.md).

 ### 3. Network Requirements

@@ -210,7 +210,7 @@ The frontend will:
 - Enable KV-aware routing for intelligent request distribution
 - Monitor worker health and adjust routing accordingly

-For more details about frontend configuration options, see the [Frontend Component Documentation](../../../components/frontend/README).
+For more details about frontend configuration options, see the [Frontend Component Documentation](../../../components/frontend/README.md).

 ## Testing the Setup


--- a/examples/basics/quickstart/README.md
+++ b/examples/basics/quickstart/README.md
@@ -17,8 +17,8 @@ docker compose -f deploy/docker-compose.yml up -d

 ## Components

- [Frontend](../../../components/frontend/README) - A built-in component that launches an OpenAI compliant HTTP server, a pre-processor, and a router in a single process
- [vLLM Backend](../../../components/backends/vllm/README) - A built-in component that runs vLLM within the Dynamo runtime
+- [Frontend](../../../components/frontend/README.md) - A built-in component that launches an OpenAI compliant HTTP server, a pre-processor, and a router in a single process
+- [vLLM Backend](../../../components/backends/vllm/README.md) - A built-in component that runs vLLM within the Dynamo runtime

 ```mermaid
 ---

--- a/examples/multimodal_v1/README.md
+++ b/examples/multimodal_v1/README.md
@@ -60,7 +60,7 @@ flowchart LR
 ```

 ```bash
-cd $DYNAMO_HOME/examples/multimodal_v1
+cd $DYNAMO_HOME/examples/multimodal
 # Serve a LLaVA 1.5 7B model:
 bash launch/agg.sh --model llava-hf/llava-1.5-7b-hf
 # Serve a Qwen2.5-VL model:
@@ -138,7 +138,7 @@ flowchart LR
 ```

 ```bash
-cd $DYNAMO_HOME/examples/multimodal_v1
+cd $DYNAMO_HOME/examples/multimodal
 bash launch/disagg.sh --model llava-hf/llava-1.5-7b-hf
 ```

@@ -215,7 +215,7 @@ flowchart LR
 ```

 ```bash
-cd $DYNAMO_HOME/examples/multimodal_v1
+cd $DYNAMO_HOME/examples/multimodal
 bash launch/agg_llama.sh
 ```

@@ -281,13 +281,13 @@ flowchart LR
 ```

 ```bash
-cd $DYNAMO_HOME/examples/multimodal_v1
+cd $DYNAMO_HOME/examples/multimodal
 bash launch/disagg_llama.sh --head-node

 # On a separate node that has finished standard dynamo setup, i.e.
 # the worker node needs NATS_SERVER and ETCD_ENDPOINTS environment variables
 # pointing to the head node's external IP address for distributed coordination
-cd $DYNAMO_HOME/examples/multimodal_v1
+cd $DYNAMO_HOME/examples/multimodal
 bash launch/disagg_llama.sh
 ```


--- a/examples/multimodal_v1/components/encode_worker.py
+++ b/examples/multimodal_v1/components/encode_worker.py
--- a/examples/multimodal_v1/components/processor.py
+++ b/examples/multimodal_v1/components/processor.py
--- a/examples/multimodal_v1/components/publisher.py
+++ b/examples/multimodal_v1/components/publisher.py
--- a/examples/multimodal_v1/components/worker.py
+++ b/examples/multimodal_v1/components/worker.py
--- a/examples/multimodal_v1/connect/README.md
+++ b/examples/multimodal_v1/connect/README.md
@@ -115,7 +115,7 @@ flowchart LR

 #### Code Examples

-See [prefill_worker](../components/prefill_worker.py#L199) or [decode_worker](../components/decode_worker.py#L239),
+See [prefill_worker](../components/worker.py) or [decode_worker](../components/worker.py),
 for how they coordinate directly with the Encode Worker by creating a [`WritableOperation`](#writableoperation),
 sending the operation's metadata via Dynamo's round-robin dispatcher, and awaiting the operation for completion before making use of the transferred data.

@@ -338,5 +338,5 @@ Use the [`.to_serialized()`](#to_serialized) method on either of the above types

  - [NVIDIA Dynamo](https://developer.nvidia.com/dynamo) @ [GitHub](https://github.com/ai-dynamo/dynamo)
  - [NVIDIA Inference Transfer Library (NIXL)](https://developer.nvidia.com/blog/introducing-nvidia-dynamo-a-low-latency-distributed-inference-framework-for-scaling-reasoning-ai-models/#nvidia_inference_transfer_library_nixl_low-latency_hardware-agnostic_communication%C2%A0) @ [GitHub](https://github.com/ai-dynamo/nixl)
-  - [Dynamo Multimodal Example](https://github.com/ai-dynamo/dynamo/tree/main/examples/multimodal)
+  - [Dynamo Multimodal Example](../../../examples/multimodal)
  - [NVIDIA GPU Direct](https://developer.nvidia.com/gpudirect)
--- a/examples/multimodal_v1/connect/__init__.py
+++ b/examples/multimodal_v1/connect/__init__.py
--- a/examples/multimodal_v1/launch/agg.sh
+++ b/examples/multimodal_v1/launch/agg.sh
--- a/examples/multimodal_v1/launch/agg_llama.sh
+++ b/examples/multimodal_v1/launch/agg_llama.sh
--- a/examples/multimodal_v1/launch/disagg.sh
+++ b/examples/multimodal_v1/launch/disagg.sh
--- a/examples/multimodal_v1/launch/disagg_llama.sh
+++ b/examples/multimodal_v1/launch/disagg_llama.sh
--- a/examples/multimodal_v1/utils/args.py
+++ b/examples/multimodal_v1/utils/args.py
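
Several hunks above change relative links that end in `README` to `README.md`. A check along the following lines could flag that pattern before it reaches a link checker. It is a rough sketch, not the lychee action named in the commit title: `extensionless_links` and the regular expression are illustrative assumptions, and directory links written without a trailing slash would be flagged as well.

```python
import posixpath
import re

# Matches inline Markdown links and captures the target: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def extensionless_links(markdown):
    """Return relative link targets whose last path segment has no extension."""
    hits = []
    for target in LINK_RE.findall(markdown):
        path = target.split("#", 1)[0]  # ignore any #fragment
        # Skip pure anchors, absolute URLs, and explicit directory links
        if not path or "://" in path or path.endswith("/"):
            continue
        if not posixpath.splitext(posixpath.basename(path))[1]:
            hits.append(target)
    return hits


sample = (
    "- [Frontend](../../../components/frontend/README) - HTTP API endpoint\n"
    "- [vLLM Backend](../../../components/backends/vllm/README.md)\n"
    "- [MinIO Documentation](https://docs.min.io/)\n"
    "- [`WritableOperation`](#writableoperation)\n"
)
flagged = extensionless_links(sample)
print(flagged)
```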