Commit 9e3d3f7a, authored by Ben Hamm, committed by GitHub

docs: revise top-level README for Dynamo 1.0 (#7417)


Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Dan Gil <dagil@nvidia.com>
parent b22a9d76
@@ -19,373 +19,201 @@ limitations under the License.
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![GitHub Release](https://img.shields.io/github/v/release/ai-dynamo/dynamo)](https://github.com/ai-dynamo/dynamo/releases/latest) [![GitHub Release](https://img.shields.io/github/v/release/ai-dynamo/dynamo)](https://github.com/ai-dynamo/dynamo/releases/latest)
[![PyPI](https://img.shields.io/pypi/v/ai-dynamo)](https://pypi.org/project/ai-dynamo/)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ai-dynamo/dynamo) [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ai-dynamo/dynamo)
[![Discord](https://dcbadge.limes.pink/api/server/D92uqZRjCZ?style=flat)](https://discord.gg/D92uqZRjCZ) ![Community Contributors](https://img.shields.io/badge/community_contributors-70%2B-brightgreen) [![Discord](https://dcbadge.limes.pink/api/server/D92uqZRjCZ?style=flat)](https://discord.gg/D92uqZRjCZ)
![Community Contributors](https://img.shields.io/badge/community_contributors-70%2B-brightgreen)
| **[Roadmap](https://github.com/ai-dynamo/dynamo/issues/5506)** | **[Support Matrix](https://github.com/ai-dynamo/dynamo/blob/main/docs/reference/support-matrix.md)** | **[Docs](https://docs.nvidia.com/dynamo/)** | **[Recipes](https://github.com/ai-dynamo/dynamo/tree/main/recipes)** | **[Examples](https://github.com/ai-dynamo/dynamo/tree/main/examples)** | **[Prebuilt Containers](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo)** | **[Design Proposals](https://github.com/ai-dynamo/enhancements)** | **[Blogs](https://developer.nvidia.com/blog/tag/nvidia-dynamo)** | **[Docs](https://docs.nvidia.com/dynamo/)** | **[Roadmap](https://github.com/ai-dynamo/dynamo/issues/5506)** | **[Recipes](https://github.com/ai-dynamo/dynamo/tree/main/recipes)** | **[Examples](https://github.com/ai-dynamo/dynamo/tree/main/examples)** | **[Prebuilt Containers](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo)** | **[Blog](https://developer.nvidia.com/blog/nvidia-dynamo-1-production-ready/)** | **[Design Proposals](https://github.com/ai-dynamo/enhancements)** |
# NVIDIA Dynamo # Dynamo
High-throughput, low-latency inference framework designed for serving generative AI and reasoning models in multi-node distributed environments. **The open-source, datacenter-scale inference stack.** Dynamo is the orchestration layer above inference engines — it doesn't replace SGLang, TensorRT-LLM, or vLLM, it turns them into a coordinated multi-node inference system. Disaggregated serving, intelligent routing, multi-tier KV caching, and automatic scaling work together to maximize throughput and minimize latency for LLM, reasoning, multimodal, and video generation workloads.
## Why Dynamo Built in Rust for performance, Python for extensibility.
<p align="center"> ## When to use Dynamo
<img src="./docs/assets/img/frontpage-gpu-vertical.png" alt="Multi Node Multi-GPU topology" width="600" />
</p>
Large language models exceed single-GPU capacity. Tensor parallelism spreads layers across GPUs but creates coordination challenges. Dynamo closes this orchestration gap.
Dynamo is inference engine agnostic (supports SGLang, TRT-LLM, vLLM) and provides:
- **Disaggregated Prefill & Decode** – Maximizes GPU throughput with latency/throughput trade-offs - You're serving LLMs across **multiple GPUs or nodes** and need to coordinate them
- **Dynamic GPU Scheduling** – Optimizes performance based on fluctuating demand - You want **KV-aware routing** to avoid redundant prefill computation
- **LLM-Aware Request Routing** – Eliminates unnecessary KV cache re-computation - You need to **independently scale prefill and decode** (disaggregated serving)
- **Accelerated Data Transfer** – Reduces inference response time using NIXL - You want **automatic scaling** that meets latency SLAs at minimum total cost of ownership (TCO)
- **KV Cache Offloading** – Leverages multiple memory hierarchies for higher throughput - You need **fast cold-starts** when spinning up new replicas
Built in Rust for performance and Python for extensibility, Dynamo is fully open-source with an OSS-first development approach. If you're running a single model on a single GPU, your inference engine alone is probably sufficient.
## Backend Feature Support **Feature support at a glance:**
| | [SGLang](docs/backends/sglang/README.md) | [TensorRT-LLM](docs/backends/trtllm/README.md) | [vLLM](docs/backends/vllm/README.md) | | | [SGLang](https://docs.nvidia.com/dynamo/backends/sg-lang) | [TensorRT-LLM](https://docs.nvidia.com/dynamo/backends/tensor-rt-llm) | [vLLM](https://docs.nvidia.com/dynamo/backends/v-llm) |
|---|:----:|:----------:|:--:| |---|:----:|:----------:|:--:|
| [**Disaggregated Serving**](docs/design-docs/disagg-serving.md) | ✅ | ✅ | ✅ | | [**Disaggregated Serving**](https://docs.nvidia.com/dynamo/design-docs/disaggregated-serving) | ✅ | ✅ | ✅ |
| [**KV-Aware Routing**](docs/components/router/README.md) | ✅ | ✅ | ✅ | | [**KV-Aware Routing**](https://docs.nvidia.com/dynamo/components/router) | ✅ | ✅ | ✅ |
| [**SLA-Based Planner**](docs/components/planner/planner-guide.md) | ✅ | ✅ | ✅ | | [**SLA-Based Planner**](https://docs.nvidia.com/dynamo/components/planner/planner-guide) | ✅ | ✅ | ✅ |
| [**KVBM**](docs/components/kvbm/README.md) | 🚧 | ✅ | ✅ | | [**KVBM**](https://docs.nvidia.com/dynamo/components/kvbm) | 🚧 | ✅ | ✅ |
| [**Multimodal**](docs/features/multimodal/README.md) | ✅ | ✅ | ✅ | | [**Multimodal**](https://docs.nvidia.com/dynamo/user-guides/multimodal) | ✅ | ✅ | ✅ |
| [**Tool Calling**](docs/agents/tool-calling.md) | ✅ | ✅ | ✅ | | [**Tool Calling**](https://docs.nvidia.com/dynamo/user-guides/tool-calling) | ✅ | ✅ | ✅ |
> **[Full Feature Matrix →](docs/reference/feature-matrix.md)** — Detailed compatibility including LoRA, Request Migration, Speculative Decoding, and feature interactions. > **[Full Feature Matrix →](https://docs.nvidia.com/dynamo/resources/feature-matrix)** — LoRA, request migration, speculative decoding, and feature interactions.
## Dynamo Architecture ## Key Results
<p align="center"> | Result | Context |
<img src="./docs/assets/img/frontpage-architecture.png" alt="Dynamo architecture" width="600" /> |--------|---------|
</p> | **7x** higher throughput per GPU | DeepSeek R1 on GB200 NVL72 w/ Dynamo vs B200 without ([InferenceX](https://inferencex.semianalysis.com/)) |
| **7x** faster model startup | ModelExpress weight streaming (DeepSeek-V3 on H200) |
| **2x** faster time to first token | KV-aware routing, Qwen3-Coder 480B ([Baseten benchmark](https://www.baseten.co/blog/how-baseten-achieved-2x-faster-inference-with-nvidia-dynamo/)) |
| **80%** fewer SLA breaches | Planner autoscaling at 5% lower TCO ([Alibaba APSARA 2025 @ 2:50:00](https://yunqi.aliyun.com/2025/session?agendaId=6062)) |
| **750x** higher throughput | DeepSeek-R1 on GB300 NVL72 ([InferenceXv2](https://inferencex.semianalysis.com/)) |
> **[Architecture Deep Dive →](docs/design-docs/architecture.md)**
## Latest News ## What Dynamo Does
- [12/05] [Moonshot AI's Kimi K2 achieves 10x inference speedup with Dynamo on GB200](https://quantumzeitgeist.com/kimi-k2-nvidia-ai-ai-breakthrough/) Most inference engines optimize a single GPU or a single node. Dynamo is the **orchestration layer above them** — it turns a cluster of GPUs into a coordinated inference system.
- [12/02] [Mistral AI runs Mistral Large 3 with 10x faster inference using Dynamo](https://www.marktechpost.com/2025/12/02/nvidia-and-mistral-ai-bring-10x-faster-inference-for-the-mistral-3-family-on-gb200-nvl72-gpu-systems/)
- [12/01] [InfoQ: NVIDIA Dynamo simplifies Kubernetes deployment for LLM inference](https://www.infoq.com/news/2025/12/nvidia-dynamo-kubernetes/)
## Get Started <p align="center">
<img src="./docs/assets/dynamo-readme-overview.svg" alt="Dynamo architecture overview" width="600" />
</p>
| Path | Use Case | Time | Requirements | **[Architecture Deep Dive →](https://docs.nvidia.com/dynamo/design-docs/overall-architecture)**
|------|----------|------|--------------|
| [**Local Quick Start**](#local-quick-start) | Test on a single machine | ~5 min | 1 GPU, Ubuntu 24.04 |
| [**Kubernetes Deployment**](#kubernetes-deployment) | Production multi-node clusters | ~30 min | K8s cluster with GPUs |
| [**Building from Source**](#building-from-source) | Contributors and development | ~15 min | Ubuntu, Rust, Python |
Want to help shape the future of distributed LLM inference? See the **[Contribution Guide](docs/contribution-guide.md)**. ### Core Capabilities
# Local Quick Start | Capability | What it does | Why it matters |
|------------|-------------|----------------|
| [**Disaggregated Prefill/Decode**](https://docs.nvidia.com/dynamo/design-docs/disaggregated-serving) | Separates prefill and decode into independently scalable GPU pools | Maximizes GPU utilization; each phase runs on hardware tuned for its workload |
| [**KV-Aware Routing**](https://docs.nvidia.com/dynamo/components/router) | Routes requests based on worker load and KV cache overlap | Eliminates redundant prefill computation — 2x faster TTFT |
| [**KV Block Manager (KVBM)**](https://docs.nvidia.com/dynamo/components/kvbm) | Offloads KV cache across GPU → CPU → SSD → remote storage | Extends effective context length beyond GPU memory |
| [**ModelExpress**](https://github.com/ai-dynamo/modelexpress) | Streams model weights GPU-to-GPU via NIXL/NVLink | 7x faster cold-start for new replicas |
| [**Planner**](https://docs.nvidia.com/dynamo/components/planner/planner-guide) | SLA-driven autoscaler that profiles workloads and right-sizes pools | Meets latency targets at minimum total cost of ownership (TCO) |
| [**Grove**](https://github.com/ai-dynamo/grove) | K8s operator for topology-aware gang scheduling (NVL72) | Places workloads optimally across racks, hosts, and NUMA nodes |
| [**AIConfigurator**](https://github.com/ai-dynamo/aiconfigurator) | Simulates 10K+ deployment configs in seconds | Finds optimal serving config without burning GPU-hours |
| [**Fault Tolerance**](https://docs.nvidia.com/dynamo/user-guides/fault-tolerance/request-migration) | Canary health checks + in-flight request migration | Workers fail; user requests don't |
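The KV-Aware Routing row says requests are routed on worker load and KV cache overlap. A toy sketch of that trade-off follows. It is not Dynamo's router: the `Worker` fields, the block-hash tuples, and the `load_weight` parameter are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    active_blocks: int    # hypothetical load metric: KV blocks currently in use
    cached_prefix: tuple  # hypothetical: hashes of prefix blocks already cached


def overlap_blocks(request_blocks: tuple, cached: tuple) -> int:
    """Count the leading request blocks that the worker already holds."""
    count = 0
    for want, have in zip(request_blocks, cached):
        if want != have:
            break
        count += 1
    return count


def pick_worker(request_blocks: tuple, workers: list, load_weight: float = 1.0) -> Worker:
    """Pick the worker with the lowest cost: blocks left to prefill plus weighted load."""
    def cost(worker: Worker) -> float:
        new_blocks = len(request_blocks) - overlap_blocks(request_blocks, worker.cached_prefix)
        return new_blocks + load_weight * worker.active_blocks

    return min(workers, key=cost)
```

With this cost, a worker that already caches most of the prompt wins unless it is loaded heavily enough to outweigh the prefill it saves.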
The following examples require a few system level packages. ### New in 1.0
Ubuntu 24.04 with an x86_64 CPU is recommended. See [docs/reference/support-matrix.md](docs/reference/support-matrix.md)
## Install Dynamo - **Zero-config deploy ([DGDR](https://docs.nvidia.com/dynamo/kubernetes-deployment/deployment-guide/deploying-your-first-model))** *(beta):* Specify model, HW, and SLA in one YAML — AIConfigurator auto-profiles the workload, Planner optimizes the topology, and Dynamo deploys
- **Agentic inference:** Per-request hints for latency priority, expected output length, and cache pinning TTL. [LangChain](https://docs.langchain.com/oss/python/integrations/chat/nvidia_ai_endpoints#use-with-nvidia-dynamo) + [NeMo Agent Toolkit](https://github.com/NVIDIA/NeMo-Agent-Toolkit) integrations
- **Multimodal E/P/D:** Disaggregated encode/prefill/decode with embedding cache — 30% faster TTFT on image workloads
- **Video generation:** Native [FastVideo](https://github.com/hao-ai-lab/FastVideo) + [SGLang Diffusion](https://lmsys.org/blog/2026-02-16-sglang-diffusion-advanced-optimizations/) support — real-time 1080p on single B200
- **K8s Inference Gateway plugin:** KV-aware routing inside the standard Kubernetes gateway
- **Storage-tier KV offload:** S3/Azure blob support + global KV events for cluster-wide cache visibility
### Option A: Containers (Recommended) ## Quick Start
Containers have all dependencies pre-installed. No setup required. ### Option A: Container (fastest)
```bash ```bash
# SGLang # Pull a prebuilt container (SGLang example)
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.0 docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.0
# TensorRT-LLM # Inside the container — start frontend and worker
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0 python3 -m dynamo.frontend --http-port 8000 --discovery-backend file > /dev/null 2>&1 &
python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --discovery-backend file &
# vLLM # Send a request
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0 curl -s localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "Qwen/Qwen3-0.6B",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 100
}' | jq
``` ```
> **Tip:** To run frontend and worker in the same container, either run processes in background with `&` (see below), or open a second terminal and use `docker exec -it <container_id> bash`. Also available: [`tensorrtllm-runtime:1.0.0`](https://docs.nvidia.com/dynamo/resources/release-artifacts) and [`vllm-runtime:1.0.0`](https://docs.nvidia.com/dynamo/resources/release-artifacts).
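The same request can be sent from Python. This is a minimal sketch using only the standard library. It assumes the frontend from the quick start is listening on `localhost:8000`; the helper names `build_chat_request` and `send_chat` are illustrative and not part of Dynamo.

```python
import json
import urllib.request

# Assumption: the frontend started in the quick start is reachable here.
BASE_URL = "http://localhost:8000"


def build_chat_request(model: str, prompt: str, max_tokens: int = 100) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(model: str, prompt: str) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=60) as resp:
        return json.load(resp)
```

Calling `send_chat("Qwen/Qwen3-0.6B", "Hello!")` returns the decoded response. In the OpenAI chat format the reply text is expected under `choices[0]["message"]["content"]`.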
See [Release Artifacts](docs/reference/release-artifacts.md#container-images) for available versions.
### Option B: Install from PyPI ### Option B: Install from PyPI
The Dynamo team recommends the `uv` Python package manager, although any Python package manager works.
```bash ```bash
# Install uv (recommended Python package manager) pip install "ai-dynamo[sglang]" # or [vllm] or [trtllm]
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment
uv venv venv
source venv/bin/activate
uv pip install pip
``` ```
Install system dependencies and the Dynamo wheel for your chosen backend: Then start the frontend and a worker as shown above. See the [full installation guide](https://docs.nvidia.com/dynamo/getting-started/quickstart) for system dependencies and backend-specific notes.
**SGLang**
```bash
sudo apt install python3-dev
uv pip install "ai-dynamo[sglang]"
```
> **Note:** For CUDA 13 (B300/GB300), the container is recommended. See [SGLang install docs](https://docs.sglang.io/get_started/install.html) for details.
**TensorRT-LLM**
```bash
sudo apt install python3-dev
pip install torch==2.9.0 torchvision --index-url https://download.pytorch.org/whl/cu130
pip install --pre --extra-index-url https://pypi.nvidia.com "ai-dynamo[trtllm]"
```
> **Note:** TensorRT-LLM requires `pip` due to a transitive Git URL dependency that `uv` doesn't resolve. We recommend using the [TensorRT-LLM container](docs/reference/release-artifacts.md#container-images) for broader compatibility.
**vLLM**
```bash
sudo apt install python3-dev libxcb1
uv pip install "ai-dynamo[vllm]"
```
## Run Dynamo
> **Tip (Optional):** Before running Dynamo, verify your system configuration with `python3 deploy/sanity_check.py`
Dynamo provides a simple way to spin up a local set of inference components including: ### Option C: Kubernetes (recommended)
- **OpenAI Compatible Frontend** – High-performance, OpenAI-compatible HTTP API server written in Rust. For production multi-node clusters, install the [Dynamo Platform](https://docs.nvidia.com/dynamo/kubernetes-deployment/deployment-guide) and deploy with a single manifest:
- **Basic and KV-Aware Router** – Routes and load-balances traffic across a set of workers.
- **Workers** – Set of pre-configured LLM serving engines.
Start the frontend: ```yaml
# Zero-config deploy: specify model + SLA, Dynamo handles the rest
> **Tip:** To run in a single terminal (useful in containers), append `> logfile.log 2>&1 &` to run processes in background. Example: `python3 -m dynamo.frontend --discovery-backend file > dynamo.frontend.log 2>&1 &` apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
```bash metadata:
# Start an OpenAI compatible HTTP server with prompt templating, tokenization, and routing. name: my-model
# For local dev: --discovery-backend file avoids etcd (workers and frontend must share a disk) spec:
python3 -m dynamo.frontend --http-port 8000 --discovery-backend file model: Qwen/Qwen3-0.6B
backend: vllm
sla:
ttft: 200.0 # ms
itl: 20.0 # ms
autoApply: true
``` ```
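The `sla` block in the manifest sets targets of 200 ms for time to first token (`ttft`) and 20 ms for inter-token latency (`itl`). The sketch below shows one way to check measured latencies against those targets. It is not the Planner's logic: the 95th-percentile choice and the helper names are assumptions made for illustration.

```python
import math


def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_sla(ttft_ms: list, itl_ms: list, ttft_target: float = 200.0,
              itl_target: float = 20.0, pct: float = 95.0) -> bool:
    """Return True when both latency percentiles are within their targets."""
    return (percentile(ttft_ms, pct) <= ttft_target
            and percentile(itl_ms, pct) <= itl_target)
```

Checking a high percentile instead of the mean keeps a few slow requests from hiding behind many fast ones.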
In another terminal (or same terminal if using background mode), start a worker for your chosen backend: Pre-built recipes for common models:
```bash
# SGLang
python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --discovery-backend file
# TensorRT-LLM
python3 -m dynamo.trtllm --model-path Qwen/Qwen3-0.6B --discovery-backend file
# vLLM (note: uses --model, not --model-path) | Model | Framework | Mode | Recipe |
python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --discovery-backend file \ |-------|-----------|------|--------|
--kv-events-config '{"enable_kv_cache_events": false}' | Llama-3-70B | vLLM | Aggregated | [View](recipes/llama-3-70b/vllm/) |
``` | DeepSeek-R1 | SGLang | Disaggregated | [View](recipes/deepseek-r1/sglang/) |
| Qwen3-32B-FP8 | TensorRT-LLM | Aggregated | [View](recipes/qwen3-32b-fp8/trtllm/) |
> **Note:** For dependency-free local development, disable KV event publishing (avoids NATS): See [recipes/](recipes/README.md) for the full list. Cloud-specific guides: [AWS EKS](examples/deployments/EKS/) · [Google GKE](examples/deployments/GKE/)
> - **vLLM:** Add `--kv-events-config '{"enable_kv_cache_events": false}'`
> - **SGLang:** No flag needed (KV events disabled by default)
> - **TensorRT-LLM:** No flag needed (KV events disabled by default)
>
> **TensorRT-LLM only:** The warning `Cannot connect to ModelExpress server/transport error. Using direct download.` is expected and can be safely ignored.
>
> See [Service Discovery and Messaging](#service-discovery-and-messaging) for details.
> **Deprecation notice:** vLLM automatically enables KV event publishing when prefix caching is active. In a future release, this will change — KV events will be disabled by default for all backends. Start using `--kv-events-config` explicitly to prepare. ## Building from Source
#### Send a Request For contributors who want to build and develop locally. See the [full build guide](https://docs.nvidia.com/dynamo/getting-started/contribution-guide#building-from-source) for details.
```bash ```bash
curl localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{ # Install system deps (Ubuntu 24.04)
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
],
"stream":false,
"max_tokens": 300
}' | jq
```
Rerun with `curl -N` and change `stream` in the request to `true` to get the responses as soon as the engine issues them.
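With `stream` set to `true`, an OpenAI-compatible server is expected to reply with server-sent events: `data:` lines carrying JSON chunks, ending with `data: [DONE]`. The sketch below assumes the frontend follows that convention. The function name `iter_stream_text` is illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

The function accepts any iterable of decoded lines, so it can be fed from an HTTP response read line by line or from a captured `curl -N` log.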
# Kubernetes Deployment
For production deployments on Kubernetes clusters with multiple GPUs.
## Prerequisites
- Kubernetes cluster with GPU nodes
- [Dynamo Platform installed](docs/kubernetes/README.md)
- HuggingFace token for model downloads
## Production Recipes
Pre-built deployment configurations for common models and topologies:
| Model | Framework | Mode | GPUs | Recipe |
|-------|-----------|------|------|--------|
| Llama-3-70B | vLLM | Aggregated | 4x H100 | [View](recipes/llama-3-70b/vllm/) |
| DeepSeek-R1 | SGLang | Disaggregated | 8x H200 | [View](recipes/deepseek-r1/sglang/) |
| Qwen3-32B-FP8 | TensorRT-LLM | Aggregated | 8x GPU | [View](recipes/qwen3-32b-fp8/trtllm/) |
See [recipes/README.md](recipes/README.md) for the full list and deployment instructions.
## Cloud Deployment Guides
- [Amazon EKS](examples/deployments/EKS/)
- [Google GKE](examples/deployments/GKE/)
# Building from Source
For contributors who want to build Dynamo from source rather than installing from PyPI.
## 1. Install Libraries
**Ubuntu:**
```
sudo apt install -y build-essential libhwloc-dev libudev-dev pkg-config libclang-dev protobuf-compiler python3-dev cmake sudo apt install -y build-essential libhwloc-dev libudev-dev pkg-config libclang-dev protobuf-compiler python3-dev cmake
```
**macOS:**
- [Homebrew](https://brew.sh/)
```
# if brew is not installed on your system, install it
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
- [Xcode](https://developer.apple.com/xcode/) # Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh && source $HOME/.cargo/env
``` # Create venv and build
brew install cmake protobuf uv venv dynamo && source dynamo/bin/activate
## Check that Metal is accessible
xcrun -sdk macosx metal
```
If Metal is accessible, you should see an error like `metal: error: no input files`, which confirms it is installed correctly.
## 2. Install Rust
```
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
```
## 3. Create a Python Virtual Environment
If `uv` is not installed, follow the [uv installation guide](https://docs.astral.sh/uv/#installation). Once `uv` is installed, create a virtual environment and activate it.
- Install uv
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
- Create a virtual environment
```bash
uv venv dynamo
source dynamo/bin/activate
```
## 4. Install Build Tools
```
uv pip install pip maturin uv pip install pip maturin
``` cd lib/bindings/python && maturin develop --uv && cd $PROJECT_ROOT
[Maturin](https://github.com/PyO3/maturin) is the Rust<->Python bindings build tool.
## 5. Build the Rust Bindings
```
cd lib/bindings/python
maturin develop --uv
```
## 6. Install GPU Memory Service
The GPU Memory Service is a Python package with a C++ extension. It requires only Python development headers and a C++ compiler (g++).
```bash
cd $PROJECT_ROOT
uv pip install -e lib/gpu_memory_service uv pip install -e lib/gpu_memory_service
```
## 7. Install the Wheel
```
cd $PROJECT_ROOT
uv pip install -e . uv pip install -e .
``` ```
## 8. Run the Frontend > VSCode/Cursor users: see the [`.devcontainer`](.devcontainer/README.md) for a pre-configured dev environment.
```bash
python3 -m dynamo.frontend
```
## 9. Configure for Local Development
- Pass `--discovery-backend file` to avoid external dependencies (see [Service Discovery and Messaging](#service-discovery-and-messaging))
- Set `DYN_LOG` to adjust the logging level (e.g., `export DYN_LOG=debug`). Uses the same syntax as `RUST_LOG`
> **Note:** VSCode and Cursor users can use the `.devcontainer` folder for a pre-configured dev environment. See the [devcontainer README](.devcontainer/README.md) for details.
# Advanced Topics
## Benchmarking
Dynamo provides comprehensive benchmarking tools:
- **[Benchmarking Guide](docs/benchmarks/benchmarking.md)** – Compare deployment topologies using AIPerf
- **[SLA-Driven Deployments](docs/components/planner/planner-guide.md)** – Optimize deployments to meet SLA requirements
## Frontend OpenAPI Specification
The OpenAI-compatible frontend exposes an OpenAPI 3 spec at `/openapi.json`. To generate without running the server:
```bash
cargo run -p dynamo-llm --bin generate-frontend-openapi
```
This writes to `docs/reference/api/openapi.json`.
## Service Discovery and Messaging
Dynamo uses TCP for inter-component communication. On Kubernetes, native resources ([CRDs + EndpointSlices](docs/kubernetes/service-discovery.md)) handle service discovery. External services are optional for most deployments:
| Deployment | etcd | NATS | Notes |
|------------|------|------|-------|
| **Local Development** | ❌ Not required | ❌ Not required | Pass `--discovery-backend file`; vLLM also needs `--kv-events-config '{"enable_kv_cache_events": false}'` |
| **Kubernetes** | ❌ Not required | ❌ Not required | K8s-native discovery; TCP request plane |
> **Note:** KV-Aware Routing requires NATS for prefix caching coordination. ## Community and Contributing
For Slurm or other distributed deployments (and KV-aware routing): Dynamo is built in the open with an OSS-first development model. We welcome contributions of all kinds.
- [etcd](https://etcd.io/) can be run directly as `./etcd`. - **[Contribution Guide](https://docs.nvidia.com/dynamo/getting-started/contribution-guide)** — How to contribute code, docs, and recipes
- [nats](https://nats.io/) needs JetStream enabled: `nats-server -js`. - **[Design Proposals](https://github.com/ai-dynamo/enhancements)** — RFCs for major features
- **[Office Hours](https://www.youtube.com/playlist?list=PL5B692fm6--tgryKu94h2Zb7jTFM3Go4X)** — Biweekly community calls
- **[Discord](https://discord.gg/D92uqZRjCZ)** — Chat with the team and community
- **[Dynamo Day Recordings](https://nvevents.nvidia.com/dynamoday)** — Deep dives from production users
To quickly set up both: `docker compose -f deploy/docker-compose.yml up -d` ## Latest News
See [TRT-LLM on Slurm](examples/basics/multinode/trtllm/README.md) for deployment examples. - [03/15] [Dynamo 1.0 is here — production-ready with strong community adoption](https://developer.nvidia.com/blog/nvidia-dynamo-1-production-ready/)
- [03/15] [NVIDIA Blackwell Ultra sets new inference records in MLPerf](https://developer.nvidia.com/blog/nvidia-blackwell-ultra-sets-new-inference-records-in-mlperf-debut/)
- [03/15] [NVIDIA Blackwell leads on SemiAnalysis InferenceMax benchmarks](https://developer.nvidia.com/blog/nvidia-blackwell-leads-on-new-semianalysis-inferencemax-benchmarks/)
- [12/05] [Moonshot AI's Kimi K2 achieves 10x inference speedup with Dynamo on GB200](https://quantumzeitgeist.com/kimi-k2-nvidia-ai-ai-breakthrough/)
- [12/02] [Mistral AI runs Mistral Large 3 with 10x faster inference using Dynamo](https://www.marktechpost.com/2025/12/02/nvidia-and-mistral-ai-bring-10x-faster-inference-for-the-mistral-3-family-on-gb200-nvl72-gpu-systems/)
- [11/20] [Dell integrates PowerScale with NIXL for 19x faster TTFT](https://www.dell.com/en-us/dt/corporate/newsroom/announcements/detailpage.press-releases~usa~2025~11~dell-technologies-and-nvidia-advance-enterprise-ai-innovation.htm)
## More News <details>
<summary>Older news</summary>
- [11/20] [Dell integrates PowerScale with Dynamo's NIXL for 19x faster TTFT](https://www.dell.com/en-us/dt/corporate/newsroom/announcements/detailpage.press-releases~usa~2025~11~dell-technologies-and-nvidia-advance-enterprise-ai-innovation.htm)
- [11/20] [WEKA partners with NVIDIA on KV cache storage for Dynamo](https://siliconangle.com/2025/11/20/nvidia-weka-kv-cache-solution-ai-inferencing-sc25/) - [11/20] [WEKA partners with NVIDIA on KV cache storage for Dynamo](https://siliconangle.com/2025/11/20/nvidia-weka-kv-cache-solution-ai-inferencing-sc25/)
- [11/13] [Dynamo Office Hours Playlist](https://www.youtube.com/playlist?list=PL5B692fm6--tgryKu94h2Zb7jTFM3Go4X) - [11/13] [Dynamo Office Hours Playlist](https://www.youtube.com/playlist?list=PL5B692fm6--tgryKu94h2Zb7jTFM3Go4X)
- [10/16] [How Baseten achieved 2x faster inference with NVIDIA Dynamo](https://www.baseten.co/blog/how-baseten-achieved-2x-faster-inference-with-nvidia-dynamo/) - [10/16] [How Baseten achieved 2x faster inference with NVIDIA Dynamo](https://www.baseten.co/blog/how-baseten-achieved-2x-faster-inference-with-nvidia-dynamo/)
- [12/01] [InfoQ: NVIDIA Dynamo simplifies Kubernetes deployment for LLM inference](https://www.infoq.com/news/2025/12/nvidia-dynamo-kubernetes/)
</details>
## Reference
- **[Support Matrix](https://docs.nvidia.com/dynamo/resources/support-matrix)** — Hardware, OS, CUDA, and backend versions
- **[Feature Matrix](https://docs.nvidia.com/dynamo/resources/feature-matrix)** — Detailed backend compatibility
- **[Release Artifacts](https://docs.nvidia.com/dynamo/resources/release-artifacts)** — Containers, wheels, Helm charts
- **[Service Discovery](https://docs.nvidia.com/dynamo/kubernetes-deployment/deployment-guide/service-discovery)** — K8s-native vs etcd vs file-based discovery
- **[Benchmarking Guide](https://docs.nvidia.com/dynamo/user-guides/dynamo-benchmarking)** — Compare deployment topologies with AIPerf
<!-- Reference links for Feature Compatibility Matrix --> <!-- Reference links for Feature Compatibility Matrix -->
[disagg]: docs/design-docs/disagg-serving.md [disagg]: docs/design-docs/disagg-serving.md
......
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 560 595" width="1680" height="1785">
<defs>
<linearGradient id="bg-grad" x1="0%" y1="0%" x2="100%" y2="100%">
<stop offset="0%" stop-color="#F8FAFC"/>
<stop offset="100%" stop-color="#EFF2F6"/>
</linearGradient>
<linearGradient id="green-fill" x1="0%" y1="0%" x2="100%" y2="100%">
<stop offset="0%" stop-color="#8BD420"/>
<stop offset="100%" stop-color="#6AAF00"/>
</linearGradient>
<filter id="shadow" x="-8%" y="-10%" width="116%" height="128%">
<feDropShadow dx="0" dy="3" stdDeviation="6" flood-color="rgba(15,23,42,0.10)"/>
</filter>
<filter id="green-glow" x="-12%" y="-14%" width="124%" height="136%">
<feDropShadow dx="0" dy="3" stdDeviation="8" flood-color="rgba(118,185,0,0.35)"/>
</filter>
<filter id="blue-glow" x="-4%" y="-4%" width="108%" height="114%">
<feDropShadow dx="0" dy="2" stdDeviation="10" flood-color="rgba(59,130,246,0.15)"/>
</filter>
<linearGradient id="accent-fade" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#76B900" stop-opacity="0"/>
<stop offset="12%" stop-color="#76B900" stop-opacity="0.65"/>
<stop offset="88%" stop-color="#76B900" stop-opacity="0.65"/>
<stop offset="100%" stop-color="#76B900" stop-opacity="0"/>
</linearGradient>
<pattern id="grid-dots" width="40" height="40" patternUnits="userSpaceOnUse">
<circle cx="20" cy="20" r="0.5" fill="rgba(148,163,184,0.2)"/>
</pattern>
<marker id="arrow" viewBox="0 0 6 6" refX="5" refY="3"
markerWidth="5" markerHeight="5" orient="auto">
<path d="M0,0.5 L6,3 L0,5.5 Z" fill="#CBD5E1"/>
</marker>
<marker id="arrow-dark" viewBox="0 0 6 6" refX="5" refY="3"
markerWidth="5" markerHeight="5" orient="auto">
<path d="M0,0.5 L6,3 L0,5.5 Z" fill="#64748B"/>
</marker>
<marker id="arrow-green" viewBox="0 0 6 6" refX="5" refY="3"
markerWidth="6" markerHeight="6" orient="auto">
<path d="M0,0.5 L6,3 L0,5.5 Z" fill="#D97706"/>
</marker>
<marker id="arrow-green-rev" viewBox="0 0 6 6" refX="1" refY="3"
markerWidth="6" markerHeight="6" orient="auto">
<path d="M6,0.5 L0,3 L6,5.5 Z" fill="#D97706"/>
</marker>
<symbol id="sym-monitor" viewBox="0 0 14 14">
<rect x="2" y="1" width="10" height="8" rx="1.5" fill="none" stroke="currentColor" stroke-width="1.6"/>
<line x1="7" y1="9" x2="7" y2="12" stroke="currentColor" stroke-width="1.6" stroke-linecap="round"/>
<line x1="4" y1="12" x2="10" y2="12" stroke="currentColor" stroke-width="1.6" stroke-linecap="round"/>
</symbol>
<symbol id="sym-fork" viewBox="0 0 14 14">
<path d="M2,7 L7,7" stroke="currentColor" stroke-width="1.8" stroke-linecap="round"/>
<path d="M7,7 L12,3" stroke="currentColor" stroke-width="1.8" stroke-linecap="round"/>
<path d="M7,7 L12,11" stroke="currentColor" stroke-width="1.8" stroke-linecap="round"/>
<circle cx="12" cy="3" r="1.2" fill="currentColor"/>
<circle cx="12" cy="11" r="1.2" fill="currentColor"/>
</symbol>
<symbol id="sym-gauge" viewBox="0 0 14 14">
<path d="M2,10 A5.5,5.5 0 1,1 12,10" fill="none" stroke="currentColor" stroke-width="1.6" stroke-linecap="round"/>
<line x1="7" y1="9" x2="9.5" y2="4.5" stroke="currentColor" stroke-width="1.8" stroke-linecap="round"/>
<circle cx="7" cy="9" r="1" fill="currentColor"/>
</symbol>
<symbol id="sym-layers" viewBox="0 0 14 14">
<path d="M1,8 L7,11 L13,8" fill="none" stroke="currentColor" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"/>
<path d="M1,5.5 L7,8.5 L13,5.5" fill="none" stroke="currentColor" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"/>
<path d="M1,3 L7,6 L13,3 L7,0 Z" fill="none" stroke="currentColor" stroke-width="1.5" stroke-linejoin="round"/>
</symbol>
<symbol id="sym-stream" viewBox="0 0 14 14">
<line x1="2" y1="3" x2="8" y2="3" stroke="currentColor" stroke-width="1.5" stroke-linecap="round"/>
<line x1="2" y1="7" x2="10" y2="7" stroke="currentColor" stroke-width="1.5" stroke-linecap="round"/>
<line x1="2" y1="11" x2="6" y2="11" stroke="currentColor" stroke-width="1.5" stroke-linecap="round"/>
<line x1="12" y1="0" x2="12" y2="14" stroke="currentColor" stroke-width="1.8" stroke-linecap="round" opacity="0.6"/>
</symbol>
<symbol id="sym-blocks" viewBox="0 0 14 14">
<rect x="0.5" y="0.5" width="5.5" height="5.5" rx="1" fill="none" stroke="currentColor" stroke-width="1.3"/>
<rect x="7" y="0.5" width="5.5" height="5.5" rx="1" fill="none" stroke="currentColor" stroke-width="1.3"/>
<rect x="0.5" y="7" width="5.5" height="5.5" rx="1" fill="none" stroke="currentColor" stroke-width="1.3"/>
<rect x="8" y="8" width="5.5" height="5.5" rx="1" fill="currentColor" opacity="0.25" stroke="currentColor" stroke-width="1.3"/>
</symbol>
<symbol id="sym-gate" viewBox="0 0 14 14">
<rect x="2" y="1" width="4" height="12" rx="1" fill="none" stroke="currentColor" stroke-width="1.4"/>
<rect x="8" y="1" width="4" height="12" rx="1" fill="none" stroke="currentColor" stroke-width="1.4"/>
</symbol>
<symbol id="sym-transfer" viewBox="0 0 14 12">
<line x1="0" y1="6" x2="14" y2="6" stroke="currentColor" stroke-width="0.8" stroke-dasharray="2 2"/>
<path d="M3,2 L7,6 L3,10" fill="none" stroke="currentColor" stroke-width="1.8" stroke-linecap="round" stroke-linejoin="round"/>
<path d="M11,2 L7,6 L11,10" fill="none" stroke="currentColor" stroke-width="1.8" stroke-linecap="round" stroke-linejoin="round"/>
</symbol>
<clipPath id="clip-prefill"><rect x="50" y="256" width="160" height="88" rx="10"/></clipPath>
<clipPath id="clip-decode"><rect x="330" y="256" width="160" height="88" rx="10"/></clipPath>
</defs>
<style>
text { font-family: -apple-system, 'Segoe UI', 'Helvetica Neue', Arial, sans-serif; }
.title { font-size: 24px; font-weight: 700; fill: #0F172A; }
.subtitle { font-size: 12px; font-weight: 400; fill: #94A3B8; letter-spacing: 0.3px; }
.box-label { font-size: 13.5px; font-weight: 600; fill: #1E293B; text-anchor: start; }
.box-label-center { font-size: 14px; font-weight: 600; fill: #1E293B; text-anchor: middle; }
.box-label-white { font-size: 14px; font-weight: 700; fill: #FFFFFF; text-anchor: middle; }
.box-sub { font-size: 10px; font-weight: 400; fill: #64748B; text-anchor: start; }
.box-sub-center { font-size: 10px; font-weight: 400; fill: #64748B; text-anchor: middle; }
.box-sub-white { font-size: 10px; font-weight: 400; fill: rgba(255,255,255,0.75); text-anchor: middle; }
.conn-label { font-size: 10px; font-weight: 600; text-anchor: middle; }
.pill-blue { fill: rgba(59,130,246,0.12); stroke: rgba(59,130,246,0.15); stroke-width: 0.5; }
.pill-green { fill: rgba(217,119,6,0.12); stroke: rgba(217,119,6,0.15); stroke-width: 0.5; }
.pill-slate { fill: rgba(100,116,139,0.08); stroke: rgba(100,116,139,0.12); stroke-width: 0.5; }
.label-blue { fill: #3B82F6; }
.label-green { fill: #B45309; }
.label-slate { fill: #64748B; }
.zone-label { font-size: 9.5px; font-weight: 600; fill: #94A3B8; letter-spacing: 1.2px; text-anchor: middle; }
.connector { fill: none; stroke: #CBD5E1; stroke-width: 1.8; }
.connector-kv { fill: none; stroke: #D97706; stroke-width: 2.4; }
.connector-dashed { fill: none; stroke: #CBD5E1; stroke-width: 1.8; stroke-dasharray: 6 4; }
</style>
<!-- Dynamo Logo -->
<g transform="translate(4,2) scale(1.25)">
<path fill="#76B900" d="M39.49988,24c0-.61621-.04578-1.2207-.1156-1.81787.18384-.31543.37524-.63477.5365-.93945,1.00208-1.90088,1.57373-3.56885,1.57886-4.9292.00232-.81348-.2207-1.55127-.75012-2.07324l-.00049.00049c-.53174-.52588-1.2771-.74365-2.1012-.7417-.4353,0-.8999.05859-1.39343.16797l.21594.97607c.43762-.09668.83337-.14453,1.17749-.14453.66443.00195,1.11035.16797,1.39893.45312l.00073.00146c.28528.28125.44995.71289.45227,1.36035.00488,1.0376-.47583,2.55273-1.38306,4.29688-1.29248-5.78564-5.81421-10.34082-11.57874-11.68896.14099-.07227.28088-.14209.41882-.20947l.44922-.21875-.43726-.89844-.44922.21875c-.15088.07275-.30322.14893-.45703.22803l-.44482.22852.26074.50732c-.93384-.17529-1.89368-.27734-2.87842-.27734-8.56067,0-15.49976,6.93945-15.5,15.5,0,.61621.04565,1.2207.1156,1.81787-.18396.31543-.37549.63477-.53699.93994-1.0022,1.90137-1.57373,3.56982-1.57886,4.93066-.00244.81348.22119,1.55176.75049,2.07373l.00037.00098c.52856.52051,1.27014.73779,2.09155.73584.41211,0,.85132-.05225,1.31714-.1499l-.20581-.97852-.00012-.00049c-.41138.08691-.78467.12939-1.11121.12939-.6604-.00195-1.10413-.16748-1.38965-.44922h-.00012c-.28564-.28271-.45044-.71436-.45276-1.36182-.005-1.03906.47595-2.5542,1.38354-4.29932,1.29224,5.78564,5.81409,10.34082,11.57874,11.68896-.14099.07227-.28076.14209-.4187.20898l-.44971.21826.43677.89941.44971-.21826c.151-.07324.30347-.14941.45728-.22852l.44434-.22852-.26074-.50732c.93396.17529,1.89417.27734,2.87915.27734,8.56055,0,15.49963-6.93945,15.49988-15.49951ZM9.65552,26.01221c.9043-1.50684,2.06372-3.14062,3.45288-4.82471l.00024.00049-.00012-.00049c.61035-.73975,1.26306-1.48828,1.95447-2.23926l-.7356-.67676c-.70337.76318-1.36804,1.52539-1.99023,2.27979l.00012.00049c-1.07947,1.30859-2.01318,2.58545-2.81897,3.81201-.00317-.12207-.01843-.24072-.01843-.36377.00037-4.00586,1.62219-7.62744,4.24695-10.25342,2.62561-2.62451,6.24744-4.24658,10.25317-4.24707.17468,0,.34424.02002.51733.02637-.37134.23877-.74536.48438-1.12598.74854l.57007.82129-.00012.00049c.762
08-.52832,1.49988-.99561,2.21375-1.41602,3.13135.47217,5.93835,1.92773,8.078,4.06641,2.1759,2.17725,3.64624,5.04346,4.09119,8.24072-.9043,1.50732-2.06372,3.1416-3.453,4.82617h0c-.61023.74023-1.26282,1.48877-1.95422,2.23926l.7356.67676c.70312-.7627,1.3678-1.52539,1.98999-2.2793h.00037c1.07922-1.30957,2.01282-2.58691,2.81873-3.81396.00293.12207.01831.24072.01831.36377-.00037,4.00537-1.62219,7.62695-4.24695,10.25293-2.62561,2.62451-6.24744,4.24658-10.25305,4.24707-.17444,0-.34375-.02002-.51672-.02637.37085-.23828.74451-.48389,1.12463-.74756l-.56982-.82227v.00049c-.7616.52832-1.49902.99512-2.2124,1.41504-3.13184-.47217-5.93896-1.92773-8.07886-4.06641-2.17603-2.17725-3.64624-5.04395-4.09131-8.24072Z"/>
<path fill="#76B900" d="M19.56604,28.06836l2.69275,1.34668c-.01807.10156-.03149.20508-.03149.31201.00024.979.79346,1.77246,1.77271,1.77246.97913,0,1.77234-.79346,1.77258-1.77246-.00012-.10693-.01343-.21045-.03162-.31201l2.69275-1.34668c.32361.34717.78137.56738,1.29358.56738.97913,0,1.77222-.79297,1.77258-1.77197-.00024-.97949-.79346-1.77295-1.77258-1.77295-.18311,0-.35632.03564-.52246.0874l-.78577-1.17822.78601-1.17969c.16602.05176.33911.0874.52222.0874.97913,0,1.77234-.79346,1.77258-1.77246-.00024-.97949-.79346-1.77295-1.77258-1.77295-.51245,0-.97034.2207-1.29395.56836l-2.69226-1.34668c.01807-.10107.03137-.20459.03149-.31104-.00024-.97949-.79346-1.77295-1.77258-1.77295-.97925,0-1.77246.79346-1.77271,1.77295,0,.10693.01343.20996.03149.31152l-2.7168,1.3584c-.32202-.33154-.77075-.53857-1.26929-.53857-.97925,0-1.77246.79346-1.77271,1.77295.00037.979.79346,1.77197,1.77271,1.77197.19141,0,.37231-.03809.54517-.09424l.76257,1.14453-.78552,1.17871c-.16602-.05176-.33911-.0874-.52222-.0874-.97925,0-1.77246.79346-1.77271,1.77295.00037.979.79346,1.77197,1.77271,1.77197.51221,0,.96973-.22021,1.29333-.56738ZM20.78296,23.99951l.52478-.78711.95105.4751c-.01807.10205-.03149.20557-.03149.3125,0,.10645.01343.20947.03137.31104l-.95093.47559-.18689-.28027-.33789-.50684ZM26.69226,24.78613l-.95105-.47461c.01807-.10156.03137-.20508.03137-.31152,0-.10693-.01331-.21045-.03149-.31201l.95093-.47559.5249.78711-.52466.78662ZM24.77271,24c-.00024.10254-.02246.19922-.05859.28809l-.0166.03369c-.03455.07422-.08057.1416-.13574.20068-.02173.02393-.04858.0415-.07288.06201-.04578.03857-.09412.07324-.14807.1001-.02441.0127-.04907.02441-.07483.03418-.08362.03125-.17163.05371-.26599.0542-.09448-.00049-.18237-.02295-.26611-.0542-.02576-.00977-.05029-.02148-.07483-.03418-.05371-.02686-.10217-.06152-.14783-.09961-.02441-.021-.05151-.03857-.07324-.0625-.05493-.05908-.10083-.12598-.13525-.2002l-.01709-.03418c-.03613-.08936-.05835-.18604-.05847-.28809.00085-.42725.34619-.77246.77283-.77344.42651.00098.77185.34619
.77271.77344ZM20.04541,26.86377c0-.10693-.01343-.21045-.03149-.31201l.95117-.4751,1.57422,2.36133-2.52539-1.2627c.01807-.10156.03149-.20459.03149-.31152ZM23.53125,28.02539c-.0188.00488-.03931.00391-.05774.00977l-1.0011-1.50098-.60596-.90918.84033-.41992c.10181.10938.21582.20557.34216.28662.01062.00684.02234.01123.03308.01758.11304.06885.23462.12305.36255.16553.03442.01172.06824.02295.10376.03223.14502.03857.29456.06543.45166.06543s.30664-.02686.45166-.06543c.03552-.00928.06921-.02051.10376-.03223.12793-.04199.24951-.09668.36255-.16553.01074-.00635.02246-.01074.03296-.01758.12646-.08105.24036-.17725.34216-.28662l.08997.04492.75037.37549-.81189,1.21777-.79492,1.19189c-.01843-.00586-.03906-.00488-.05786-.00977-.15015-.0415-.30542-.0708-.46875-.0708s-.3186.0293-.46875.0708ZM25.46033,28.43799l1.57434-2.36182.95129.47559c-.01819.10156-.03149.20508-.03149.31201s.01331.20996.03137.31152l-2.52551,1.2627ZM27.59387,25.2373l.224-.33643.44775.67188-.67175-.33545ZM27.59375,22.76172l.67334-.33691-.44885.67334-.22449-.33643ZM24.52246,19.95752l.29602.44434,1.31458,1.97217-.84009.41992c-.32349-.34668-.78101-.56689-1.29297-.56689s-.96948.22021-1.29297.56689l-.84058-.41992.00037-.00049h-.00012l1.6106-2.41602c.16626.05176.33948.0874.52271.0874.18311,0,.3562-.03564.52246-.0874ZM20.21985,22.66846l.18604.09326-.22437.33643-.44836-.67285.48669.24316ZM20.40588,25.2373l-.67126.33496.44727-.6709.224.33594ZM24,30.5c-.42664-.00098-.77197-.34619-.77283-.77295.00012-.10254.02234-.19971.05872-.28906l.01562-.03076c.00757-.0166.02124-.02881.03003-.04492.05322-.09766.125-.1748.20508-.23682.03894-.03027.07764-.06055.12207-.08252.02319-.01172.04578-.02441.07007-.03369.08496-.03223.17505-.05518.27124-.05518s.18628.02295.27124.05518c.02417.00928.04688.02197.07007.03369.04431.02197.08301.05225.12195.08203.08044.0625.15222.14014.20569.23828.00854.01562.02197.02734.02954.04395l.01538.03076c.0365.08936.05859.18652.05884.28906-.00085.42676-.34619.77197-.77271.77295ZM30.5,26.86377c-.00085.42578-.34619.77148-.77
271.77246-.30542-.00049-.56348-.18164-.68921-.43848-.05347-.10986-.08325-.21826-.08362-.33447.00024-.11523.03015-.22461.08362-.33398.12549-.25732.38379-.43848.68921-.43896.42651.00098.77185.34619.77271.77344ZM29.72729,20.36182c.42651.00098.77185.34619.77271.77344-.00085.42676-.34619.77197-.77271.77295-.42676-.00098-.77209-.34619-.77295-.77295.00085-.42725.34619-.77246.77295-.77344ZM27.98608,21.44775l-.60938.30469-.34192.1709-1.57483-2.36182,2.52588,1.2627c-.01807.10156-.03137.20459-.03137.31104,0,.10693.01331.21094.03162.3125ZM24,17.49951c.42651.00098.77185.34619.77271.77344-.00085.42578-.34619.77148-.77271.77246-.42664-.00098-.77197-.34668-.77283-.77246.00085-.42725.34619-.77246.77283-.77344ZM22.53979,19.56152l-1.14355,1.71533-.43091.646-.94751-.47314c.01392-.08936.02747-.1792.02747-.27246,0-.12012-.01257-.2373-.0354-.35107l2.52991-1.26465ZM17.49988,21.17725c.00085-.42725.34619-.77246.77283-.77344.42664.00098.77185.34619.77271.77344-.00085.42578-.34607.77148-.77271.77246-.42664-.00098-.77197-.34668-.77283-.77246ZM17.49988,26.86377c.00085-.42725.34619-.77246.77283-.77344.30542.00049.56372.18164.68921.43896.05322.10986.08325.21924.0835.33398-.00037.11621-.03003.22461-.08374.33447-.12549.25684-.38354.43799-.68896.43848-.42664-.00098-.77197-.34668-.77283-.77246Z"/>
<path fill="#76B900" d="M10.69629,14.45703c.10706-.12354.21558-.24805.32544-.37256l.33057-.375-.75-.66113-.33057.375c-.11218.12695-.2229.25342-.33228.38037l-.32666.37842.75684.65332.32666-.37842Z"/>
<path fill="#76B900" d="M12.36096,12.62549c.1134-.11914.22791-.23779.34351-.35693l.34863-.35889-.71729-.69727-.34863.35889c-.1178.12158-.23438.24268-.34985.36377l-.34521.36182.72363.69043.34521-.36182Z"/>
<polygon fill="#76B900" points="15.93445 9.19141 15.93445 9.19189 15.95569 9.17285 16.3302 8.8418 15.66858 8.09277 15.29407 8.42383 15.27234 8.44238 14.89783 8.77344 15.55994 9.52246 15.93445 9.19141"/>
<path fill="#76B900" d="M14.1106,10.87109h.00012c.11877-.11426.2384-.22803.35913-.3418l.36377-.34326-.68652-.72754-.36377.34326c-.12268.11621-.24438.23193-.36511.34766l-.36084.34619.69238.72168.36084-.34619Z"/>
<path fill="#76B900" d="M7.59961,18.54688l-.00012-.00049c.44568-.69482.95764-1.42041,1.52917-2.16602l.00635-.0083-.00012-.00049c.09912-.12939.20007-.25928.30273-.38965l.30908-.39307-.78589-.61816-.30908.39307c-.10535.13379-.20886.26709-.31067.3999l-.3042.39697.00757.00586c-.46753.62695-.90454,1.24414-1.28687,1.84033l.84204.54004Z"/>
<path fill="#76B900" d="M6.61475,20.23438l-.88721-.46094c-.55151,1.06152-.93237,2.0415-1.11304,2.92432l.97974.2002c.15308-.75342.49805-1.66016,1.02051-2.66357Z"/>
<path fill="#76B900" d="M7.48645,26.49854l.23035-.00342-.03345-.99902-.00012-.00049-.19678.00342c-.79749-.00195-1.31287-.20752-1.61194-.54785-.22632-.25732-.35815-.61182-.37354-1.11475l-.99951.02832c.00977.38867.08789.75537.22681,1.09033l-.00903.00391c.00977.02344.0293.04102.03955.06396.0957.2124.21045.41406.36621.59033.55908.63086,1.40454.88721,2.36145.88525Z"/>
<path fill="#76B900" d="M37.04224,33.44922c-.11047.12109-.22241.24219-.33569.36328l-.34131.36523.73096.68262.34131-.36523c.1156-.12402.22998-.24756.34277-.37109l.3374-.36914-.73804-.6748-.3374.36914Z"/>
<polygon fill="#76B900" points="31.65747 38.56494 31.65759 38.56494 31.63586 38.58301 31.25159 38.90332 31.89221 39.67188 32.27649 39.35156 32.29834 39.33252 32.68213 39.01172 32.04126 38.24414 31.65747 38.56494"/>
<path fill="#76B900" d="M35.32666,35.23291l-.00012.00049c-.11658.11572-.2345.23145-.35352.34717l-.3584.34863.69751.7168.3584-.34863c.12109-.11816.24109-.23584.35999-.35352l.35547-.35205-.70386-.71094-.35547.35205Z"/>
<path fill="#76B900" d="M33.52808,36.93799c-.12195.11035-.24487.2207-.36865.33154l-.37305.33252.66553.74609.37305-.33252c.12598-.1123.25098-.22461.375-.3374l.37012-.33594-.67188-.74023-.37012.33594Z"/>
<path fill="#76B900" d="M40.2533,29.44775h-.00012c-.46692.68506-1.00098,1.39893-1.59607,2.13184l.00037.00049h-.00024l-.00012-.00049c-.10278.12646-.20728.25342-.31372.38135l-.31982.38379.76831.63965.31982-.38379c.10876-.13037.21606-.26074.32141-.39062l.00037.00049c.00452-.00586.00842-.01123.01306-.0166l.30176-.37207-.00708-.00586c.48511-.61426.93933-1.21924,1.33826-1.80469l-.82617-.56348Z"/>
<path fill="#76B900" d="M41.28479,27.78809l.87402.48633c.5813-1.0459.98938-2.01465,1.19495-2.89209l-.97388-.22754c-.17407.74854-.54456,1.64502-1.09509,2.6333Z"/>
<path fill="#76B900" d="M40.43091,21.49951l-.07886.00098.02148.99951.05737-.00098c.86951.00195,1.422.22607,1.72656.5957.21912.26416.34082.62012.34253,1.12598h1c.00098-.39014-.06714-.75977-.19666-1.09863l.00928-.00342c-.00916-.02393-.02844-.04199-.03821-.06543-.08936-.21436-.19824-.41895-.34863-.59863-.57202-.68408-1.46948-.95703-2.49487-.95508Z"/>
<polygon fill="#76B900" points="40.37354 22.50049 40.37354 22.5 40.37341 22.5 40.37354 22.50049"/>
<path fill="#76B900" d="M18.57056,15.4458l-.67651-.73633c-.52161.47949-1.04297.97559-1.56213,1.48828-.20703.2041-.4115.40869-.61328.61328l.71228.70215c.19849-.20166.39966-.40283.60327-.604.51086-.50391,1.02368-.99219,1.53638-1.46338Z"/>
<path fill="#76B900" d="M20.05322,14.12939c.77734-.66602,1.54968-1.29053,2.30933-1.86914l-.6062-.7959c-.77539.59131-1.56238,1.22754-2.35388,1.90576l.65088.75977-.00012-.00049Z"/>
<path fill="#76B900" d="M33.66895,8.30664c.05298.10791.09644.229.12891.36572l.11621.48633.9729-.23242-.11621-.48633c-.04822-.20166-.11572-.39404-.20447-.57471l-.2207-.44873-.89746.44141.2207.44873h.00012Z"/>
<path fill="#76B900" d="M32.02148,7.49951l.0022-.00049c.15356,0,.29688.00977.42908.02734l.49561.06689.13428-.99121-.49561-.06689c-.18176-.02441-.36963-.03564-.56335-.03564h-.50269l.00049,1h.5Z"/>
<path fill="#76B900" d="M29.74451,7.96045v-.00049c.1554-.05371.30737-.10303.45557-.14746l.479-.14404-.28735-.95801-.479.14404c-.16248.04883-.32776.10254-.49561.16064l-.47266.16357.32739.94531.47266-.16357Z"/>
<polygon fill="#76B900" points="40.04736 14.95312 40.04749 14.95312 40.04724 14.95264 40.04736 14.95312"/>
<path fill="#76B900" d="M29.42908,32.55469l.67676.73633c.52148-.479,1.04285-.97559,1.56213-1.4873.20703-.20459.4115-.40918.61328-.61377l-.71216-.70215h-.00012c-.19849.20166-.39966.40283-.60327.604-.51086.50342-1.0238.9917-1.53662,1.46289Z"/>
<path fill="#76B900" d="M27.94641,33.87109c-.77747.66602-1.5498,1.29053-2.30945,1.86914l.60596.7959c.77551-.59131,1.56262-1.22754,2.35413-1.90576l-.65063-.75879v-.00049Z"/>
<rect fill="#76B900" x="13.26849" y="36.94187" width="1.03637" height="1.00006" transform="translate(-25.71802 43.04067) rotate(-77.852)"/>
<path fill="#76B900" d="M14.33044,39.69189c-.05298-.10791-.09644-.22949-.12903-.36572l-.11572-.48633-.97266.23145.11572.48633c.0481.20215.11548.39453.20422.5752l.2207.44873.89746-.44141-.2207-.44873v.00049Z"/>
<path fill="#76B900" d="M15.97791,40.49854l-.00195.00049c-.15369,0-.29688-.00977-.4292-.02734l-.49512-.06738-.13452.99023.49512.06738c.18176.0249.36975.03662.56372.03662h.00244l.5-.00049-.00049-1-.5.00049Z"/>
<path fill="#76B900" d="M18.25537,40.03857h-.00012c-.1554.05371-.30737.10303-.45569.14746l-.479.14355.28711.95801.479-.14355c.16248-.04883.32764-.10205.49561-.16016l.47266-.16357-.3269-.94531-.47266.16357Z"/>
<polygon fill="#76B900" points="37.47046 14.64355 37.47034 14.64307 37.47034 14.64307 37.47046 14.64355"/>
</g>
<!-- Title -->
<text x="72" y="40" class="title">Dynamo</text>
<!-- ================================================ -->
<!-- ROW 1 -->
<!-- ================================================ -->
<!-- Router (green, wider, centered above workers) -->
<rect x="130" y="84" width="280" height="88" rx="10"
fill="url(#green-fill)" filter="url(#green-glow)"/>
<use href="#sym-fork" x="136" y="88" width="20" height="20" style="color:rgba(255,255,255,0.8)"/>
<text x="270" y="125" class="box-label-white">Router</text>
<text x="270" y="138" class="box-sub-white">KV-Aware Routing</text>
<!-- ================================================ -->
<!-- ROW 2 -->
<!-- ================================================ -->
<!-- Worker group container -->
<rect x="30" y="232" width="480" height="120" rx="14"
fill="rgba(219,234,254,0.25)" stroke="rgba(147,197,253,0.5)" stroke-width="1.2"
stroke-dasharray="6 3" filter="url(#blue-glow)"/>
<!-- Prefill Worker -->
<rect x="48" y="256" width="160" height="88" rx="10" fill="#76B900" opacity="0.7"/>
<rect x="50" y="256" width="160" height="88" rx="10"
fill="white" stroke="rgba(226,232,240,0.6)" stroke-width="1" filter="url(#shadow)"/>
<use href="#sym-layers" x="54" y="258" width="20" height="20" style="color:#76B900;opacity:0.8"/>
<text x="60" y="296" class="box-label">Prefill</text>
<text x="60" y="309" class="box-sub">Worker</text>
<!-- Decode Worker -->
<rect x="328" y="256" width="160" height="88" rx="10" fill="#76B900" opacity="0.7"/>
<rect x="330" y="256" width="160" height="88" rx="10"
fill="white" stroke="rgba(226,232,240,0.6)" stroke-width="1" filter="url(#shadow)"/>
<use href="#sym-stream" x="334" y="258" width="20" height="20" style="color:#76B900;opacity:0.8"/>
<text x="340" y="296" class="box-label">Decode</text>
<text x="340" y="309" class="box-sub">Worker</text>
<!-- KV Cache lines (drawn before NIXL so NIXL renders on top) -->
<line x1="130" y1="348" x2="130" y2="496" class="connector" marker-end="url(#arrow)"/>
<line x1="410" y1="348" x2="410" y2="496" class="connector" marker-end="url(#arrow)"/>
<!-- ================================================ -->
<!-- ROW 3: NIXL -->
<!-- ================================================ -->
<g>
<rect x="30" y="420" width="480" height="48" rx="10"
fill="url(#green-fill)" filter="url(#green-glow)" opacity="0.75"/>
</g>
<use href="#sym-transfer" x="36" y="422" width="20" height="18" style="color:white"/>
<text x="270" y="440" class="box-label-white">NIXL</text>
<text x="270" y="453" class="box-sub-white">Accelerated KV Transfer</text>
<!-- ================================================ -->
<!-- ROW 4: Memory Hierarchy container -->
<!-- ================================================ -->
<rect x="20" y="500" width="500" height="80" rx="14"
fill="rgba(219,234,254,0.2)" stroke="rgba(147,197,253,0.4)" stroke-width="1"
stroke-dasharray="6 3"/>
<rect x="40" y="520" width="100" height="44" rx="10"
fill="white" stroke="rgba(226,232,240,0.6)" stroke-width="1" filter="url(#shadow)"/>
<text x="90" y="547" class="box-label-center" style="font-size:12px;">Block</text>
<rect x="160" y="520" width="100" height="44" rx="10"
fill="white" stroke="rgba(226,232,240,0.6)" stroke-width="1" filter="url(#shadow)"/>
<text x="210" y="547" class="box-label-center" style="font-size:12px;">Local File</text>
<rect x="280" y="520" width="100" height="44" rx="10"
fill="white" stroke="rgba(226,232,240,0.6)" stroke-width="1" filter="url(#shadow)"/>
<text x="330" y="540" class="box-label-center" style="font-size:12px;">Remote File</text>
<text x="330" y="555" class="box-label-center" style="font-size:12px;">/ Object</text>
<rect x="400" y="520" width="100" height="44" rx="10"
fill="white" stroke="rgba(226,232,240,0.6)" stroke-width="1" filter="url(#shadow)"/>
<text x="450" y="547" class="box-label-center" style="font-size:12px;">Cloud</text>
<!-- ================================================ -->
<!-- ZONE ANNOTATIONS -->
<!-- ================================================ -->
<text x="270" y="245" class="zone-label" style="fill:rgba(59,130,246,0.45);">DISAGGREGATED SERVING</text>
<text x="270" y="513" class="zone-label" style="fill:rgba(59,130,246,0.45);">MEMORY HIERARCHY</text>
<!-- ================================================ -->
<!-- CONNECTORS -->
<!-- ================================================ -->
<!-- Tokens: arrow stops 4px outside Prefill (y=252) -->
<path d="M230,176 L230,210 L130,210 L130,252" class="connector" marker-end="url(#arrow)"/>
<rect x="154" y="188" width="48" height="16" rx="8" class="pill-blue"/>
<text x="178" y="199" class="conn-label label-blue">Tokens</text>
<!-- Route: arrow stops 4px outside Decode (y=252) -->
<path d="M310,176 L310,210 L410,210 L410,252" class="connector" marker-end="url(#arrow)"/>
<rect x="338" y="188" width="42" height="16" rx="8" class="pill-blue"/>
<text x="359" y="199" class="conn-label label-blue">Route</text>
<!-- KV Transfer: bidirectional between Prefill and Decode -->
<line x1="214" y1="300" x2="326" y2="300" class="connector-kv" marker-start="url(#arrow-green-rev)" marker-end="url(#arrow-green)"/>
<rect x="236" y="278" width="68" height="16" rx="8" class="pill-green"/>
<text x="270" y="289" class="conn-label label-green">KV Transfer</text>
<!-- KV Cache pills (lines drawn earlier, before NIXL) -->
<rect x="136" y="384" width="60" height="16" rx="8" class="pill-green"/>
<text x="166" y="395" class="conn-label label-green">KV Cache</text>
<rect x="344" y="384" width="60" height="16" rx="8" class="pill-green"/>
<text x="374" y="395" class="conn-label label-green">KV Cache</text>
</svg>