Unverified commit 844f8819, authored by Ryan McCormick and committed by GitHub

docs: Bring back some missed release/0.4.0 doc changes, fix broken links, add lychee link checker github action (#2482)
parent 41f095cf
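The commit message says a lychee link-checker GitHub Action was added, but that workflow is not shown on this page. The Bash function below is a rough local stand-in. It is neither lychee nor the workflow from this commit. It checks only relative Markdown links, which is the kind of link this diff repeatedly fixes (for example, `../../components/...` becoming `../../../components/...`). It assumes inline `[text](target)` links and GNU `grep -oE`. It skips remote URLs, `mailto:` links, and same-page anchors, and it ignores `#fragment` suffixes.

```bash
#!/usr/bin/env bash
# Sketch: report relative Markdown links whose targets do not exist on disk.
# Assumptions: inline [text](target) links only; GNU grep; remote URLs,
# mailto: links, and same-page #anchors are skipped rather than checked.
check_relative_links() {
  local file="$1" dir match target path broken=0
  dir="$(dirname "$file")"
  while IFS= read -r match; do
    target="${match#"]("}"    # strip the leading "]("
    target="${target%")"}"    # strip the trailing ")"
    case "$target" in
      *://*|mailto:*|'#'*) continue ;;
    esac
    path="${target%%#*}"      # drop any #fragment such as #L199
    # Relative links resolve against the directory of the linking file.
    if [[ ! -e "$dir/$path" ]]; then
      echo "BROKEN: $file -> $target"
      broken=1
    fi
  done < <(grep -oE ']\([^)[:space:]]+\)' "$file")
  return "$broken"
}
```

For a whole tree you could loop over files, for example `shopt -s globstar; for f in docs/**/*.md; do check_relative_links "$f"; done`. A full checker such as lychee also validates remote URLs and anchors, which this sketch does not.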
......@@ -318,7 +318,7 @@ spec:
- [Fluid Documentation](https://fluid-cloudnative.github.io/)
- [Alluxio Documentation](https://docs.alluxio.io/)
- [MinIO Documentation](https://min.io/docs/)
- [MinIO Documentation](https://docs.min.io/)
- [Hugging Face Hub](https://huggingface.co/docs/hub/index)
- [Dynamo README](https://github.com/ai-dynamo/dynamo/blob/main/.devcontainer/README.md)
- [Dynamo Documentation](https://docs.nvidia.com/dynamo/latest/index.html)
......@@ -50,8 +50,8 @@ These systems provide enhanced scheduling capabilities including topology-aware
LWS is a simple multinode deployment mechanism that allows you to deploy a workload across multiple nodes.
- **LWS**: [LWS Installation](https://github.com/NVIDIA/LWS#installation)
- **Volcano**: [Volcano Installation](https://volcano.sh/docs/installation/install-volcano/)
- **LWS**: [LWS Installation](https://github.com/kubernetes-sigs/lws#installation)
- **Volcano**: [Volcano Installation](https://volcano.sh/en/docs/installation/)
Volcano is a Kubernetes native scheduler optimized for AI workloads at scale. It is used in conjunction with LWS to provide gang scheduling support.
......@@ -110,8 +110,8 @@ args:
For additional support and examples, see the working multinode configurations in:
- **SGLang**: [components/backends/sglang/deploy/](../../components/backends/sglang/deploy/)
- **TensorRT-LLM**: [components/backends/trtllm/deploy/](../../components/backends/trtllm/deploy/)
- **vLLM**: [components/backends/vllm/deploy/](../../components/backends/vllm/deploy/)
- **SGLang**: [components/backends/sglang/deploy/](../../../components/backends/sglang/deploy/)
- **TensorRT-LLM**: [components/backends/trtllm/deploy/](../../../components/backends/trtllm/deploy/)
- **vLLM**: [components/backends/vllm/deploy/](../../../components/backends/vllm/deploy/)
These examples demonstrate proper usage of the `multinode` section with corresponding `gpu` limits and correct `tp-size` configuration.
......@@ -14,7 +14,7 @@ Use this approach when installing from pre-built helm charts and docker images p
```bash
export NAMESPACE=dynamo-cloud
export RELEASE_VERSION=0.3.2
export RELEASE_VERSION=0.4.0
```
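The steps that follow install `envsubst`, which suggests these variables are substituted into templates later. An unset value would then surface only as a confusing error further down. As a sketch, a guard like the hypothetical `require_env` below (not part of Dynamo's tooling) can fail early instead.

```bash
# Sketch (hypothetical helper, not part of Dynamo's tooling): fail fast when a
# variable the install steps rely on is unset or empty.
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is Bash indirect expansion: the value of the variable called $name.
    if [[ -z "${!name:-}" ]]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

export NAMESPACE=dynamo-cloud
export RELEASE_VERSION=0.4.0
require_env NAMESPACE RELEASE_VERSION && echo "Installing Dynamo ${RELEASE_VERSION} into namespace ${NAMESPACE}"
```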
Install `envsubst`, `kubectl`, `helm`
......@@ -67,7 +67,7 @@ Ensure you have the source code checked out and are in the `dynamo` directory:
### Set Environment Variables
Our examples use the [`nvcr.io`](https://nvcr.io/nvidia/ai-dynamo/) but you can setup your own values if you use another docker registry.
Our examples use the [`nvcr.io`](https://catalog.ngc.nvidia.com) but you can setup your own values if you use another docker registry.
```bash
export NAMESPACE=dynamo-cloud # or whatever you prefer.
......
:orphan:
..
SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
......@@ -21,5 +23,36 @@
:maxdepth: 2
:hidden:
guides/README.md
runtime/README.md
API/nixl_connect/connector.md
API/nixl_connect/descriptor.md
API/nixl_connect/device.md
API/nixl_connect/device_kind.md
API/nixl_connect/operation_status.md
API/nixl_connect/rdma_metadata.md
API/nixl_connect/readable_operation.md
API/nixl_connect/writable_operation.md
API/nixl_connect/read_operation.md
API/nixl_connect/write_operation.md
components/backends/sglang/deploy/README.md
components/backends/sglang/docs/dsr1-wideep-h100.md
components/backends/sglang/docs/multinode-examples.md
components/backends/sglang/docs/sgl-http-server.md
components/backends/sglang/slurm_jobs/README.md
components/router/README.md
examples/README.md
guides/dynamo_deploy/create_deployment.md
guides/dynamo_deploy/sla_planner_deployment.md
guides/dynamo_deploy/helm_install.md
guides/dynamo_deploy/gke_setup.md
guides/dynamo_deploy/README.md
guides/dynamo_run.md
components/backends/vllm/README.md
components/backends/trtllm/README.md
components/backends/trtllm/deploy/README.md
components/backends/trtllm/llama4_plus_eagle.md
components/backends/trtllm/multinode-examples.md
components/backends/trtllm/kv-cache-transfer.md
components/backends/vllm/deploy/README.md
components/backends/vllm/multi-node.md
......@@ -27,12 +27,60 @@ The NVIDIA Dynamo Platform is a high-performance, low-latency inference framewor
- `Dynamo README <https://github.com/ai-dynamo/dynamo/blob/main/README.md>`_
- `Architecture and features doc <https://github.com/ai-dynamo/dynamo/blob/main/docs/architecture/>`_
- `Usage guides <https://github.com/ai-dynamo/dynamo/tree/main/docs/guides>`_
- `Dynamo examples repo <https://github.com/ai-dynamo/examples>`_
- `Dynamo examples repo <https://github.com/ai-dynamo/dynamo/tree/main/examples>`_
Quick Start
-----------------
Follow the :doc:`Quick Guide to install Dynamo Platform <guides/dynamo_deploy/quickstart>`.
Local Deployment
~~~~~~~~~~~~~~~~
Get started with Dynamo locally in just a few commands:
**1. Install Dynamo**
.. code-block:: bash
# Install uv (recommended Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment and install Dynamo
uv venv venv
source venv/bin/activate
uv pip install "ai-dynamo[sglang]" # or [vllm], [trtllm]
**2. Start etcd/NATS**
.. code-block:: bash
# Start etcd and NATS using Docker Compose
docker compose -f deploy/docker-compose.yml up -d
**3. Run Dynamo**
.. code-block:: bash
# Start the OpenAI compatible frontend
python -m dynamo.frontend
# In another terminal, start an SGLang worker
python -m dynamo.sglang.worker deepseek-ai/DeepSeek-R1-Distill-Llama-8B
**4. Test your deployment**
.. code-block:: bash
curl localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 50}'
Kubernetes Deployment
~~~~~~~~~~~~~~~~~~~~~
For deployments on Kubernetes, follow the :doc:`Dynamo Platform Quickstart Guide <guides/dynamo_deploy/quickstart>`.
Dive in: Examples
......@@ -92,16 +140,8 @@ The examples below assume you build the latest image yourself from source. If us
:hidden:
:caption: Using Dynamo
Running Inference Graphs Locally (dynamo-run) <guides/dynamo_run.md>
Deploying Inference Graphs <guides/dynamo_deploy/README.md>
.. toctree::
:hidden:
:caption: Usage Guides
Writing Python Workers in Dynamo <guides/backend.md>
Disaggregation and Performance Tuning <guides/disagg_perf_tuning.md>
KV Cache Router Performance Tuning <guides/kv_router_perf_tuning.md>
Working with Dynamo Kubernetes Operator <guides/dynamo_deploy/dynamo_operator.md>
.. toctree::
......@@ -110,31 +150,19 @@ The examples below assume you build the latest image yourself from source. If us
Dynamo Deploy Quickstart <guides/dynamo_deploy/quickstart.md>
Dynamo Cloud Kubernetes Platform <guides/dynamo_deploy/dynamo_cloud.md>
Manual Helm Deployment <deploy/helm/README.md>
GKE Setup Guide <guides/dynamo_deploy/gke_setup.md>
Manual Helm Deployment <guides/dynamo_deploy/helm_install.md>
Minikube Setup Guide <guides/dynamo_deploy/minikube.md>
Model Caching with Fluid <guides/dynamo_deploy/model_caching_with_fluid.md>
.. toctree::
:hidden:
:caption: Benchmarking
Planner Benchmark Example <guides/planner_benchmark/README.md>
.. toctree::
:hidden:
:caption: API
NIXL Connect API <API/nixl_connect/README.md>
.. toctree::
:hidden:
:caption: Examples
Hello World <examples/runtime/hello_world/README.md>
LLM Deployment Examples using VLLM <components/backends/vllm/README.md>
LLM Deployment Examples using SGLang <components/backends/sglang/README.md>
Multinode Examples using SGLang <components/backends/sglang/docs/multinode-examples.md>
Planner Benchmark Example <guides/planner_benchmark/README.md>
LLM Deployment Examples using TensorRT-LLM <components/backends/trtllm/README.md>
.. toctree::
......@@ -143,6 +171,7 @@ The examples below assume you build the latest image yourself from source. If us
Glossary <dynamo_glossary.md>
NIXL Connect API <API/nixl_connect/README.md>
KVBM Reading <architecture/kvbm_reading.md>
......@@ -36,9 +36,9 @@ docker compose -f deploy/metrics/docker-compose.yml up -d
## Components
- [Frontend](../../../components/frontend/README) - HTTP API endpoint that receives requests and forwards them to the decode worker
- [vLLM Prefill Worker](../../../components/backends/vllm/README) - Specialized worker for prefill phase execution
- [vLLM Decode Worker](../../../components/backends/vllm/README) - Specialized worker that handles requests and decides between local/remote prefill
- [Frontend](../../../components/frontend/README.md) - HTTP API endpoint that receives requests and forwards them to the decode worker
- [vLLM Prefill Worker](../../../components/backends/vllm/README.md) - Specialized worker for prefill phase execution
- [vLLM Decode Worker](../../../components/backends/vllm/README.md) - Specialized worker that handles requests and decides between local/remote prefill
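Several hunks in this commit make the same mechanical fix: relative links ending in `/README` gain a `.md` suffix so they point at real files. Assuming inline `[text](path)` links, one way to apply that fix in bulk is a `sed -E` substitution like the sketch below. This is not necessarily how the commit was produced. The sketch leaves links that already end in `.md` untouched and deliberately skips targets containing a `#fragment`.

```bash
# Sketch: append ".md" to inline Markdown links whose target ends in "/README".
# Reads the named files (or stdin when none are given) and writes to stdout.
add_md_suffix() {
  # Group 1 captures "](...path.../README"; the closing ")" is re-added after ".md".
  # Targets containing "#" or whitespace are excluded by the bracket expression.
  sed -E 's|(]\([^)#[:space:]]*/README)\)|\1.md)|g' "$@"
}

printf '%s\n' '- [Frontend](../../../components/frontend/README) - HTTP API endpoint' | add_md_suffix
# prints: - [Frontend](../../../components/frontend/README.md) - HTTP API endpoint
```

To rewrite files in place, GNU sed's `-i` flag can be added to the `sed` call.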
```mermaid
---
......
......@@ -85,7 +85,7 @@ Install Dynamo with [SGLang](https://docs.sglang.ai/) support:
pip install ai-dynamo[sglang]
```
For more information about the SGLang backend and its integration with Dynamo, see the [SGLang Backend Documentation](../../components/backends/sglang/README.md).
For more information about the SGLang backend and its integration with Dynamo, see the [SGLang Backend Documentation](../../../components/backends/sglang/README.md).
### 3. Network Requirements
......@@ -210,7 +210,7 @@ The frontend will:
- Enable KV-aware routing for intelligent request distribution
- Monitor worker health and adjust routing accordingly
For more details about frontend configuration options, see the [Frontend Component Documentation](../../../components/frontend/README).
For more details about frontend configuration options, see the [Frontend Component Documentation](../../../components/frontend/README.md).
## Testing the Setup
......
......@@ -17,8 +17,8 @@ docker compose -f deploy/docker-compose.yml up -d
## Components
- [Frontend](../../../components/frontend/README) - A built-in component that launches an OpenAI compliant HTTP server, a pre-processor, and a router in a single process
- [vLLM Backend](../../../components/backends/vllm/README) - A built-in component that runs vLLM within the Dynamo runtime
- [Frontend](../../../components/frontend/README.md) - A built-in component that launches an OpenAI compliant HTTP server, a pre-processor, and a router in a single process
- [vLLM Backend](../../../components/backends/vllm/README.md) - A built-in component that runs vLLM within the Dynamo runtime
```mermaid
---
......
......@@ -60,7 +60,7 @@ flowchart LR
```
```bash
cd $DYNAMO_HOME/examples/multimodal_v1
cd $DYNAMO_HOME/examples/multimodal
# Serve a LLaVA 1.5 7B model:
bash launch/agg.sh --model llava-hf/llava-1.5-7b-hf
# Serve a Qwen2.5-VL model:
......@@ -138,7 +138,7 @@ flowchart LR
```
```bash
cd $DYNAMO_HOME/examples/multimodal_v1
cd $DYNAMO_HOME/examples/multimodal
bash launch/disagg.sh --model llava-hf/llava-1.5-7b-hf
```
......@@ -215,7 +215,7 @@ flowchart LR
```
```bash
cd $DYNAMO_HOME/examples/multimodal_v1
cd $DYNAMO_HOME/examples/multimodal
bash launch/agg_llama.sh
```
......@@ -281,13 +281,13 @@ flowchart LR
```
```bash
cd $DYNAMO_HOME/examples/multimodal_v1
cd $DYNAMO_HOME/examples/multimodal
bash launch/disagg_llama.sh --head-node
# On a separate node that has finished standard dynamo setup, i.e.
# the worker node needs NATS_SERVER and ETCD_ENDPOINTS environment variables
# pointing to the head node's external IP address for distributed coordination
cd $DYNAMO_HOME/examples/multimodal_v1
cd $DYNAMO_HOME/examples/multimodal
bash launch/disagg_llama.sh
```
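The comments in the block above say the worker node needs `NATS_SERVER` and `ETCD_ENDPOINTS` pointing at the head node's external IP. The sketch below sets them up under two assumptions: NATS and etcd listen on their default client ports (4222 and 2379) on the head node, and URL-style values are accepted. Check the exact format your deployment expects. `set_worker_env` is a hypothetical helper, and `192.0.2.10` is a documentation placeholder address.

```bash
# Hypothetical helper: point this worker at the head node's NATS and etcd.
# Assumes default client ports: NATS 4222, etcd 2379.
set_worker_env() {
  local head_ip="${1:?usage: set_worker_env <head-node-external-ip>}"
  export NATS_SERVER="nats://${head_ip}:4222"
  export ETCD_ENDPOINTS="http://${head_ip}:2379"
}

set_worker_env 192.0.2.10   # placeholder (TEST-NET-1); use your head node's external IP
echo "NATS_SERVER=${NATS_SERVER} ETCD_ENDPOINTS=${ETCD_ENDPOINTS}"
```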
......
......@@ -115,7 +115,7 @@ flowchart LR
#### Code Examples
See [prefill_worker](../components/prefill_worker.py#L199) or [decode_worker](../components/decode_worker.py#L239),
See [prefill_worker](../components/worker.py) or [decode_worker](../components/worker.py),
for how they coordinate directly with the Encode Worker by creating a [`WritableOperation`](#writableoperation),
sending the operation's metadata via Dynamo's round-robin dispatcher, and awaiting the operation for completion before making use of the transferred data.
......@@ -338,5 +338,5 @@ Use the [`.to_serialized()`](#to_serialized) method on either of the above types
- [NVIDIA Dynamo](https://developer.nvidia.com/dynamo) @ [GitHub](https://github.com/ai-dynamo/dynamo)
- [NVIDIA Inference Transfer Library (NIXL)](https://developer.nvidia.com/blog/introducing-nvidia-dynamo-a-low-latency-distributed-inference-framework-for-scaling-reasoning-ai-models/#nvidia_inference_transfer_library_nixl_low-latency_hardware-agnostic_communication%C2%A0) @ [GitHub](https://github.com/ai-dynamo/nixl)
- [Dynamo Multimodal Example](https://github.com/ai-dynamo/dynamo/tree/main/examples/multimodal)
- [Dynamo Multimodal Example](../../../examples/multimodal)
- [NVIDIA GPU Direct](https://developer.nvidia.com/gpudirect)