Commit 766d3f2c authored by Ryan McCormick, committed by GitHub

docs: Simplify sphinx build and table of contents on webpage (#2519)

parent f5a41004
../../examples/README.md
\ No newline at end of file
...@@ -76,7 +76,7 @@ The `model_type` can be: ...@@ -76,7 +76,7 @@ The `model_type` can be:
See `components/backends` for full code examples. See `components/backends` for full code examples.
### Component names ## Component names
A worker needs three names to register itself: namespace.component.endpoint A worker needs three names to register itself: namespace.component.endpoint
......
...@@ -39,7 +39,7 @@ helm version # v3.0+ ...@@ -39,7 +39,7 @@ helm version # v3.0+
docker version # Running daemon docker version # Running daemon
# Set your inference runtime image # Set your inference runtime image
export DYNAMO_IMAGE=nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.4.0 export DYNAMO_IMAGE=nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.4.1
# Also available: sglang-runtime, tensorrtllm-runtime # Also available: sglang-runtime, tensorrtllm-runtime
``` ```
...@@ -53,7 +53,7 @@ Install from [NGC published artifacts](https://catalog.ngc.nvidia.com/orgs/nvidi ...@@ -53,7 +53,7 @@ Install from [NGC published artifacts](https://catalog.ngc.nvidia.com/orgs/nvidi
```bash ```bash
# 1. Set environment # 1. Set environment
export NAMESPACE=dynamo-kubernetes export NAMESPACE=dynamo-kubernetes
export RELEASE_VERSION=0.4.0 # any version of Dynamo 0.3.2+ export RELEASE_VERSION=0.4.1 # any version of Dynamo 0.3.2+
# 2. Install CRDs # 2. Install CRDs
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
...@@ -79,7 +79,7 @@ export NAMESPACE=dynamo-cloud ...@@ -79,7 +79,7 @@ export NAMESPACE=dynamo-cloud
export DOCKER_SERVER=nvcr.io/nvidia/ai-dynamo/ # or your registry export DOCKER_SERVER=nvcr.io/nvidia/ai-dynamo/ # or your registry
export DOCKER_USERNAME='$oauthtoken' export DOCKER_USERNAME='$oauthtoken'
export DOCKER_PASSWORD=<YOUR_NGC_CLI_API_KEY> export DOCKER_PASSWORD=<YOUR_NGC_CLI_API_KEY>
export IMAGE_TAG=0.4.0 export IMAGE_TAG=0.4.1
# 2. Build operator # 2. Build operator
cd deploy/cloud/operator cd deploy/cloud/operator
...@@ -178,4 +178,4 @@ kubectl create secret generic hf-token-secret \ ...@@ -178,4 +178,4 @@ kubectl create secret generic hf-token-secret \
- [GKE-specific setup](gke_setup.md) - [GKE-specific setup](gke_setup.md)
- [Create custom deployments](create_deployment.md) - [Create custom deployments](create_deployment.md)
- [Dynamo Operator details](dynamo_operator.md) - [Dynamo Operator details](dynamo_operator.md)
\ No newline at end of file
...@@ -93,7 +93,7 @@ The GitOps workflow for Dynamo deployments consists of three main steps: ...@@ -93,7 +93,7 @@ The GitOps workflow for Dynamo deployments consists of three main steps:
### Step 1: Build and Push Dynamo Cloud Operator ### Step 1: Build and Push Dynamo Cloud Operator
First, follow the [Install Dynamo Cloud](quickstart.md#install-dynamo-cloud) instructions. First, follow the [Install Dynamo Cloud](README.md) instructions.
### Step 2: Create Initial Deployment ### Step 2: Create Initial Deployment
......
../../../guides/dynamo_deploy/operator_deployment.md
\ No newline at end of file
# Quickstart
Your onboarding includes 2 steps.
1. Before deploying your inference graphs you need to install the Dynamo Inference Platform and the Dynamo Cloud.
Dynamo Cloud acts as an orchestration layer between the end user and Kubernetes, handling the complexity of deploying your graphs for you.
You can install from [Published Artifacts](#1-installing-dynamo-cloud-from-published-artifacts) or from [Source](#2-installing-dynamo-cloud-from-source).
2. Once you install the Dynamo Cloud, proceed to the [Examples](../../examples/README.md) to deploy an inference graph.
## 1. Installing Dynamo Cloud from Published Artifacts
Use this approach when installing from pre-built helm charts and docker images published to NGC.
### Prerequisites
```bash
export NAMESPACE=dynamo-cloud
export RELEASE_VERSION=0.4.0
```
Install `envsubst`, `kubectl`, and `helm`.
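A quick way to confirm the prerequisites is to check that each tool is on your `PATH`. The following is a minimal sketch; `missing_tools` is a hypothetical helper written for this illustration and is not part of Dynamo.

```bash
# missing_tools (hypothetical helper): print the names of the given
# commands that cannot be found on PATH, separated by spaces.
missing_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="${missing}${missing:+ }${tool}"
    fi
  done
  printf '%s\n' "$missing"
}

# Prints the tools that still need to be installed (an empty line if none).
missing_tools envsubst kubectl helm
```

If the output is empty, all three tools were found.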
### Authenticate with NGC
Go to https://ngc.nvidia.com/org to get your NGC_CLI_API_KEY.
```bash
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia --username='$oauthtoken' --password=<YOUR_NGC_CLI_API_KEY>
```
### Fetch Helm Charts
```bash
# Fetch the CRDs helm chart
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
# Fetch the platform helm chart
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
```
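Both chart URLs follow the same pattern, differing only in chart name and version. As a sketch, the hypothetical helper `chart_url` below (not part of Dynamo or Helm) prints the URLs so you can check them before fetching. The default version `0.4.0` is taken from the prerequisites above and is only used if `RELEASE_VERSION` is unset.

```bash
# chart_url (hypothetical helper): build a chart URL using the same
# pattern as the helm fetch commands above.
chart_url() {
  chart_name="$1"
  version="$2"
  printf 'https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/%s-%s.tgz\n' \
    "$chart_name" "$version"
}

# Fall back to 0.4.0 only when RELEASE_VERSION is not already set.
RELEASE_VERSION="${RELEASE_VERSION:-0.4.0}"
for chart in dynamo-crds dynamo-platform; do
  chart_url "$chart" "$RELEASE_VERSION"
done
```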
### Install Dynamo Cloud
**Step 1: Install Custom Resource Definitions (CRDs)**
```bash
helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz \
--namespace default \
--wait \
--atomic
```
**Step 2: Install Dynamo Platform**
```bash
kubectl create namespace ${NAMESPACE}
helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE}
```
## 2. Installing Dynamo Cloud from Source
Use this approach when developing or customizing Dynamo as a contributor, or using local helm charts from the source repository.
### Prerequisites
Ensure you have the source code checked out and are in the `dynamo` directory.
### Set Environment Variables
Our examples use the [`nvcr.io`](https://catalog.ngc.nvidia.com) registry, but you can set your own values if you use another Docker registry.
```bash
export NAMESPACE=dynamo-cloud # or whatever you prefer.
export DOCKER_SERVER=nvcr.io/nvidia/ai-dynamo/ # your-docker-registry.com
export DOCKER_USERNAME='$oauthtoken' # your-username if not using nvcr.io
export DOCKER_PASSWORD=YOUR_NGC_CLI_API_KEY # your-password if not using nvcr.io
```
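Later commands rely on these variables, so it can help to confirm that none of them is empty before continuing. This is a minimal sketch; `unset_vars` is a hypothetical helper, not part of the Dynamo tooling.

```bash
# unset_vars (hypothetical helper): print the name of each given
# variable that is unset or empty, one per line.
unset_vars() {
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Lists any of the four variables that still need a value.
unset_vars NAMESPACE DOCKER_SERVER DOCKER_USERNAME DOCKER_PASSWORD
```

No output means all four variables have a value.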
### Pick the Dynamo Inference Image
Export the tag of the Dynamo Runtime Image.
If you are using a pre-defined release:
```bash
export IMAGE_TAG=<RELEASE_VERSION> # e.g. 0.3.2, the release you are using
```
Or build your own image first and tag it with `IMAGE_TAG`:
```bash
export IMAGE_TAG=<your-pick>
./container/build.sh
docker tag dynamo:latest-vllm <your-registry>/dynamo-base:$IMAGE_TAG
docker login <your-registry>
docker push <your-registry>/dynamo-base:$IMAGE_TAG
```
### Install Dynamo Cloud
You need to build and push the Dynamo Cloud Operator image by running:
```bash
cd deploy/cloud/operator
earthly --push +docker --DOCKER_SERVER=$DOCKER_SERVER --IMAGE_TAG=$IMAGE_TAG
```
The NVIDIA Cloud Operator image will be pulled from `$DOCKER_SERVER/dynamo-operator:$IMAGE_TAG`.
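The example `DOCKER_SERVER` value above ends with a slash, so joining it with `/dynamo-operator` produces a double slash in the image reference. Whether your registry tolerates that is not stated in this guide. As a cautious sketch, the hypothetical helper `image_ref` below (not part of Dynamo) strips one trailing slash before composing the reference, so you can inspect the result first.

```bash
# image_ref (hypothetical helper): compose <server>/<image>:<tag>,
# dropping a single trailing slash from the server if present.
image_ref() {
  server="${1%/}"
  printf '%s/%s:%s\n' "$server" "$2" "$3"
}

# The tag 0.4.0 is an illustrative value.
image_ref "nvcr.io/nvidia/ai-dynamo/" dynamo-operator 0.4.0
# prints nvcr.io/nvidia/ai-dynamo/dynamo-operator:0.4.0
```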
You can either run `deploy.sh` or use the manual commands under Step 1 and Step 2.
**Installing with a script (alternative to Step 1 and Step 2)**
Create the namespace and the docker registry secret.
```bash
kubectl create namespace ${NAMESPACE}
kubectl create secret docker-registry docker-imagepullsecret \
--docker-server=${DOCKER_SERVER} \
--docker-username=${DOCKER_USERNAME} \
--docker-password=${DOCKER_PASSWORD} \
--namespace=${NAMESPACE}
```
You need to add the bitnami helm repository by running:
```bash
helm repo add bitnami https://charts.bitnami.com/bitnami
```
```bash
./deploy.sh --crds
```
If you want guidance during the process, run the deployment script with the `--interactive` flag:
```bash
./deploy.sh --crds --interactive
```
**Installing CRDs manually (alternative to the script deploy.sh)**
**Step 1: Install Custom Resource Definitions (CRDs)**
```bash
helm install dynamo-crds ./crds/ \
--namespace default \
--wait \
--atomic
```
**Step 2: Build Dependencies and Install Platform**
```bash
cd deploy/cloud/helm
helm dep build ./platform/
kubectl create namespace ${NAMESPACE}
# Create docker registry secret
kubectl create secret docker-registry docker-imagepullsecret \
--docker-server=${DOCKER_SERVER} \
--docker-username=${DOCKER_USERNAME} \
--docker-password=${DOCKER_PASSWORD} \
--namespace=${NAMESPACE}
# Install platform
helm install dynamo-platform ./platform/ \
--namespace ${NAMESPACE} \
--set "dynamo-operator.controllerManager.manager.image.repository=${DOCKER_SERVER}/dynamo-operator" \
--set "dynamo-operator.controllerManager.manager.image.tag=${IMAGE_TAG}" \
--set "dynamo-operator.imagePullSecrets[0].name=docker-imagepullsecret"
```
[More on Deploying to Dynamo Cloud](./dynamo_cloud.md)
## Uninstall CRDs for a clean start
We provide a script to uninstall CRDs should you need a clean start.
```bash
./uninstall.sh
```
## Explore Examples
If deploying to Kubernetes, create a Kubernetes secret containing your sensitive values if needed:
```bash
export HF_TOKEN=your_hf_token
kubectl create secret generic hf-token-secret \
--from-literal=HF_TOKEN=${HF_TOKEN} \
-n ${NAMESPACE}
```
Follow the [Examples](../../examples/README.md).
For more details on creating your own deployments, follow the [Create Deployment Guide](create_deployment.md).
...@@ -4,18 +4,6 @@ ...@@ -4,18 +4,6 @@
SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0 SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
.. This hidden toctree includes readmes etc that aren't meant to be in the main table of contents but should be accounted for in the sphinx project structure .. This hidden toctree includes readmes etc that aren't meant to be in the main table of contents but should be accounted for in the sphinx project structure
...@@ -34,24 +22,32 @@ ...@@ -34,24 +22,32 @@
API/nixl_connect/writable_operation.md API/nixl_connect/writable_operation.md
API/nixl_connect/read_operation.md API/nixl_connect/read_operation.md
API/nixl_connect/write_operation.md API/nixl_connect/write_operation.md
components/backends/sglang/deploy/README.md API/nixl_connect/README.md
components/backends/sglang/docs/dsr1-wideep-h100.md
components/backends/sglang/docs/multinode-examples.md
components/backends/sglang/docs/sgl-http-server.md
components/backends/sglang/slurm_jobs/README.md
examples/README.md
guides/dynamo_deploy/create_deployment.md guides/dynamo_deploy/create_deployment.md
guides/dynamo_deploy/sla_planner_deployment.md guides/dynamo_deploy/sla_planner_deployment.md
guides/dynamo_deploy/helm_install.md
guides/dynamo_deploy/gke_setup.md guides/dynamo_deploy/gke_setup.md
guides/dynamo_deploy/grove.md
guides/dynamo_deploy/k8s_metrics.md
guides/dynamo_deploy/model_caching_with_fluid.md
guides/dynamo_deploy/README.md guides/dynamo_deploy/README.md
guides/dynamo_run.md guides/dynamo_run.md
components/backends/vllm/README.md guides/metrics.md
components/backends/trtllm/README.md guides/run_kvbm_in_vllm.md
components/backends/trtllm/deploy/README.md
components/backends/trtllm/llama4_plus_eagle.md architecture/kv_cache_routing.md
components/backends/trtllm/multinode-examples.md architecture/load_planner.md
components/backends/trtllm/kv-cache-transfer.md architecture/request_migration.md
components/backends/vllm/deploy/README.md
components/backends/vllm/multi-node.md components/backends/trtllm/multinode/multinode-examples.md
components/backends/sglang/docs/multinode-examples.md
examples/README.md
examples/runtime/hello_world/README.md
architecture/distributed_runtime.md
architecture/dynamo_flow.md
.. TODO: architecture/distributed_runtime.md and architecture/dynamo_flow.md
have some outdated names/references and need a refresh.
...@@ -14,6 +14,10 @@ ...@@ -14,6 +14,10 @@
See the License for the specific language governing permissions and See the License for the specific language governing permissions and
limitations under the License. limitations under the License.
..
Main Page
..
Welcome to NVIDIA Dynamo Welcome to NVIDIA Dynamo
======================== ========================
...@@ -22,156 +26,49 @@ The NVIDIA Dynamo Platform is a high-performance, low-latency inference framewor ...@@ -22,156 +26,49 @@ The NVIDIA Dynamo Platform is a high-performance, low-latency inference framewor
.. admonition:: 💎 Discover the latest developments! .. admonition:: 💎 Discover the latest developments!
:class: seealso :class: seealso
This guide is a snapshot of the `Dynamo GitHub Repository <https://github.com/ai-dynamo/dynamo>`_ at a specific point in time. For the latest information and examples, see: This guide is a snapshot at a specific point in time. For the latest information and examples, see the `Dynamo GitHub repository <https://github.com/ai-dynamo/dynamo>`_.
- `Dynamo README <https://github.com/ai-dynamo/dynamo/blob/main/README.md>`_
- `Architecture and features doc <https://github.com/ai-dynamo/dynamo/blob/main/docs/architecture/>`_
- `Usage guides <https://github.com/ai-dynamo/dynamo/tree/main/docs/guides>`_
- `Dynamo examples repo <https://github.com/ai-dynamo/dynamo/tree/main/examples>`_
Quick Start
-----------------
Local Deployment
~~~~~~~~~~~~~~~~
Get started with Dynamo locally in just a few commands:
**1. Install Dynamo**
.. code-block:: bash
# Install uv (recommended Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment and install Dynamo
uv venv venv
source venv/bin/activate
uv pip install "ai-dynamo[sglang]" # or [vllm], [trtllm]
**2. Start etcd/NATS**
.. code-block:: bash
# Start etcd and NATS using Docker Compose
docker compose -f deploy/docker-compose.yml up -d
**3. Run Dynamo**
.. code-block:: bash
# Start the OpenAI compatible frontend
python -m dynamo.frontend
# In another terminal, start an SGLang worker
python -m dynamo.sglang.worker deepseek-ai/DeepSeek-R1-Distill-Llama-8B
**4. Test your deployment**
.. code-block:: bash
curl localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 50}'
Kubernetes Deployment
~~~~~~~~~~~~~~~~~~~~~
For deployments on Kubernetes, follow the :doc:`Dynamo Platform Quickstart Guide <guides/dynamo_deploy/quickstart>`.
Dive in: Examples Quickstart
----------------- ==========
.. include:: _includes/quick_start_local.rst
The examples below assume you build the latest image yourself from source. If using a prebuilt image follow the examples from the corresponding branch.
.. grid:: 1 2 2 2
:gutter: 3
:margin: 0
:padding: 3 4 0 0
.. grid-item-card:: :doc:`Hello World <examples/runtime/hello_world/README>`
:link: examples/runtime/hello_world/README
:link-type: doc
Demonstrates the basic concepts of Dynamo by creating a simple GPU-unaware graph
.. grid-item-card:: :doc:`LLM Serving with VLLM <components/backends/vllm/README>`
:link: components/backends/vllm/README
:link-type: doc
Presents examples and reference implementations for deploying Large Language Models (LLMs) in various configurations with VLLM.
.. grid-item-card:: :doc:`Multinode with SGLang <components/backends/sglang/docs/multinode-examples>`
:link: components/backends/sglang/docs/multinode-examples
:link-type: doc
Demonstrates disaggregated serving on several nodes.
.. grid-item-card:: :doc:`TensorRT-LLM <components/backends/trtllm/README>`
:link: components/backends/trtllm/README
:link-type: doc
Presents TensorRT-LLM examples and reference implementations for deploying Large Language Models (LLMs) in various configurations.
..
Sidebar
..
.. toctree:: .. toctree::
:hidden: :hidden:
:caption: Getting Started
Welcome to Dynamo <self> Quickstart <self>
Installation <_sections/installation>
Support Matrix <support_matrix.md> Support Matrix <support_matrix.md>
Architecture <_sections/architecture>
Examples <_sections/examples>
.. toctree:: .. toctree::
:hidden: :hidden:
:caption: Architecture & Features :caption: Kubernetes Deployment
High Level Architecture <architecture/architecture.md> Quickstart (K8s) <../guides/dynamo_deploy/dynamo_cloud.md>
Distributed Runtime <architecture/distributed_runtime.md> Dynamo Operator <../guides/dynamo_deploy/dynamo_operator.md>
Disaggregated Serving <architecture/disagg_serving.md> Metrics <../guides/dynamo_deploy/k8s_metrics.md>
KV Block Manager <architecture/kvbm_intro.rst> Multinode <../guides/dynamo_deploy/multinode-deployment.md>
KV Cache Routing <architecture/kv_cache_routing.md> Minikube Setup <../guides/dynamo_deploy/minikube.md>
Planner <architecture/planner_intro.rst>
Dynamo Architecture Flow <architecture/dynamo_flow.md>
.. toctree:: .. toctree::
:hidden: :hidden:
:caption: Using Dynamo :caption: Components
Writing Python Workers in Dynamo <guides/backend.md> Backends <_sections/backends>
Disaggregation and Performance Tuning <guides/disagg_perf_tuning.md> Router <components/router/README>
Working with Dynamo Kubernetes Operator <guides/dynamo_deploy/dynamo_operator.md> Planner <architecture/planner_intro>
KVBM <architecture/kvbm_intro>
.. toctree:: .. toctree::
:hidden: :hidden:
:caption: Deployment Guides :caption: Developer Guide
Dynamo Deploy Quickstart <guides/dynamo_deploy/quickstart.md>
Dynamo Cloud Kubernetes Platform <guides/dynamo_deploy/dynamo_cloud.md>
Manual Helm Deployment <guides/dynamo_deploy/helm_install.md>
Minikube Setup Guide <guides/dynamo_deploy/minikube.md>
Model Caching with Fluid <guides/dynamo_deploy/model_caching_with_fluid.md>
.. toctree::
:hidden:
:caption: Examples
Hello World <examples/runtime/hello_world/README.md>
LLM Deployment Examples using VLLM <components/backends/vllm/README.md>
LLM Deployment Examples using SGLang <components/backends/sglang/README.md>
Multinode Examples using SGLang <components/backends/sglang/docs/multinode-examples.md>
Planner Benchmark Example <guides/planner_benchmark/README.md>
LLM Deployment Examples using TensorRT-LLM <components/backends/trtllm/README.md>
.. toctree::
:hidden:
:caption: Reference
Tuning Disaggregated Serving Performance <guides/disagg_perf_tuning.md>
Writing Python Workers in Dynamo <guides/backend.md>
Glossary <dynamo_glossary.md> Glossary <dynamo_glossary.md>
NIXL Connect API <API/nixl_connect/README.md>
KVBM Reading <architecture/kvbm_reading.md>
...@@ -315,6 +315,7 @@ Send multiple new conversations to see them distributed across replicas: ...@@ -315,6 +315,7 @@ Send multiple new conversations to see them distributed across replicas:
```python ```python
import asyncio import asyncio
from openai import AsyncOpenAI from openai import AsyncOpenAI
import os
if os.environ.get("DYN_FRONTEND_IP"): if os.environ.get("DYN_FRONTEND_IP"):
frontend_ip=os.environ.get("DYN_FRONTEND_IP") frontend_ip=os.environ.get("DYN_FRONTEND_IP")
......
...@@ -106,7 +106,7 @@ Hello star! ...@@ -106,7 +106,7 @@ Hello star!
Note that this is a very simple degenerate example which does not demonstrate the standard Dynamo FrontEnd-Backend deployment. The hello-world client is not a web server, it is a one-off function which sends the predefined text "world,sun,moon,star" to the backend. The example is meant to show the HelloWorldWorker. As such you will only see the HelloWorldWorker pod in deployment. The client will run and exit and the pod will not be operational. Note that this is a very simple degenerate example which does not demonstrate the standard Dynamo FrontEnd-Backend deployment. The hello-world client is not a web server, it is a one-off function which sends the predefined text "world,sun,moon,star" to the backend. The example is meant to show the HelloWorldWorker. As such you will only see the HelloWorldWorker pod in deployment. The client will run and exit and the pod will not be operational.
Follow the [Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Kubernetes Platform. Follow the [Quickstart Guide](../../../docs/guides/dynamo_deploy/README.md) to install Dynamo Kubernetes Platform.
Then deploy to kubernetes using Then deploy to kubernetes using
```bash ```bash
...@@ -119,4 +119,4 @@ to delete your deployment: ...@@ -119,4 +119,4 @@ to delete your deployment:
```bash ```bash
kubectl delete dynamographdeployment hello-world -n ${NAMESPACE} kubectl delete dynamographdeployment hello-world -n ${NAMESPACE}
``` ```
\ No newline at end of file