docs: add Local Installation and Building from Source guides (#7490)

Signed-off-by: Dan Gil <dagil@nvidia.com>

Unverified Commit 3f6c6249 authored Mar 18, 2026 by dagil-nvidia Committed by GitHub Mar 18, 2026
5 changed files
--- a/README.md
+++ b/README.md
@@ -126,7 +126,7 @@ Also available: [`tensorrtllm-runtime:1.0.1`](https://docs.nvidia.com/dynamo/res
 pip install "ai-dynamo[sglang]"   # or [vllm] or [trtllm]
 ```

-Then start the frontend and a worker as shown above. See the [full installation guide](https://docs.nvidia.com/dynamo/getting-started/quickstart) for system dependencies and backend-specific notes.
+Then start the frontend and a worker as shown above. See the [full installation guide](docs/getting-started/local-installation.md) for system dependencies and backend-specific notes.

 ### Option C: Kubernetes (recommended)

@@ -159,7 +159,7 @@ See [recipes/](recipes/README.md) for the full list. Cloud-specific guides: [AWS

 ## Building from Source

-For contributors who want to build and develop locally. See the [full build guide](https://docs.nvidia.com/dynamo/getting-started/contribution-guide#building-from-source) for details.
+For contributors who want to build and develop locally. See the [full build guide](docs/getting-started/building-from-source.md) for details.

 ```bash
 # Install system deps (Ubuntu 24.04)

--- a/docs/getting-started/building-from-source.md
+++ b/docs/getting-started/building-from-source.md
+---
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+sidebar-title: Building from Source
+description: Build Dynamo from source for development and contributions
+---
+
+# Building from Source
+
+Build Dynamo from source when you want to contribute code, test features on the development branch, or customize the build. If you just want to run Dynamo, the [Local Installation](local-installation.md) guide is faster.
+
+This guide covers Ubuntu and macOS. For a containerized dev environment that handles all of this automatically, see [DevContainer](#devcontainer).
+
+## 1. Install System Libraries
+
+**Ubuntu:**
+
+```bash
+sudo apt install -y build-essential libhwloc-dev libudev-dev pkg-config libclang-dev protobuf-compiler python3-dev cmake
+```
+
+**macOS:**
+
+```bash
+# Install Homebrew if needed
+/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
+
+brew install cmake protobuf
+
+# Verify Metal is accessible
+xcrun -sdk macosx metal
+```
+
+## 2. Install Rust
+
+```bash
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
+source "$HOME/.cargo/env"
+```
+
+## 3. Create a Python Virtual Environment
+
+Install [uv](https://docs.astral.sh/uv/#installation) if you don't have it:
+
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+```
+
+Create and activate a virtual environment:
+
+```bash
+uv venv .venv
+source .venv/bin/activate
+```
+
+## 4. Install Build Tools
+
+```bash
+uv pip install pip maturin
+```
+
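+[Maturin](https://github.com/PyO3/maturin) builds Rust crates as Python packages. The next step uses it to compile Dynamo's Rust bindings and install them into the active virtual environment.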
+
+## 5. Build the Rust Bindings
+
+```bash
+cd lib/bindings/python
+maturin develop --uv
+```
+
+## 6. Install GPU Memory Service
+
+```bash
+# Return to project root
+cd "$(git rev-parse --show-toplevel)"
+uv pip install -e lib/gpu_memory_service
+```
+
+## 7. Install Dynamo in Editable Mode
+
+```bash
+uv pip install -e .
+```
+
+## 8. Verify the Build
+
+```bash
+python3 -m dynamo.frontend --help
+```
+
+You should see the frontend command help output.
+
+## DevContainer
+
+VS Code and Cursor users can skip the manual setup by using the pre-configured development containers. The DevContainer installs all toolchains, builds the project, and sets up the Python environment automatically.
+
+Framework-specific containers are available for vLLM, SGLang, and TensorRT-LLM. See the [DevContainer README](https://github.com/ai-dynamo/dynamo/tree/main/.devcontainer) for setup instructions.
+
+## Set Up Pre-commit Hooks
+
+Before submitting PRs, install the pre-commit hooks so that the same checks CI runs are applied locally on every commit:
+
+```bash
+uv pip install pre-commit
+pre-commit install
+```
+
+Run checks manually on all files:
+
+```bash
+pre-commit run --all-files
+```
+
+## Troubleshooting
+
+**Missing system packages**
+
+If `maturin develop` fails with linker errors, verify all system dependencies are installed. On Ubuntu:
+
+```bash
+sudo apt install -y build-essential libhwloc-dev libudev-dev pkg-config libclang-dev protobuf-compiler python3-dev cmake
+```
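To narrow down which tool is missing before reinstalling everything, a small check along the following lines can help. This is a sketch, not part of the Dynamo repository: `missing_tools` is a hypothetical helper, and the command list in the usage comment assumes the binaries installed by the earlier steps (`protoc` is provided by `protobuf-compiler`).

```bash
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Usage (adjust the list to your setup):
#   missing_tools cargo cmake protoc pkg-config uv maturin
```

An empty result means every listed command was found. Missing libraries such as `libhwloc-dev` are not covered by this check, since they do not install a command.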
+
+**Virtual environment not activated**
+
+Maturin builds against the active Python interpreter. If you see errors about Python or site-packages, ensure your virtual environment is activated:
+
+```bash
+source .venv/bin/activate
+```
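A quick way to confirm activation is to inspect `VIRTUAL_ENV`, which the standard `activate` script exports. The `venv_status` helper below is hypothetical and shown only as a sketch.

```bash
# Hypothetical helper: report whether a virtual environment is active,
# based on the VIRTUAL_ENV variable exported by the activate script.
venv_status() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "active: $VIRTUAL_ENV"
  else
    echo "inactive"
    return 1
  fi
}
```

Running `venv_status` before `maturin develop` shows which environment the build will target.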
+
+**Disk space**
+
+The Rust `target/` directory can grow to 10+ GB during development. If builds fail with disk space errors, clean the build cache:
+
+```bash
+cargo clean
+```
+
+## Next Steps
+
+- [Contribution Guide](../contribution-guide.md) -- Workflow for contributing code
+- [Examples](https://github.com/ai-dynamo/dynamo/tree/main/examples) -- Explore the codebase
+- [Good First Issues](https://github.com/ai-dynamo/dynamo/labels/good-first-issue) -- Find a task to work on
--- a/docs/getting-started/local-installation.md
+++ b/docs/getting-started/local-installation.md
+---
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+sidebar-title: Local Installation
+description: Install and run Dynamo on a local machine or VM with containers or PyPI
+---
+
+# Local Installation
+
+This guide walks through installing and running Dynamo on a local machine or VM with one or more GPUs. By the end, you'll have a working OpenAI-compatible endpoint serving a model.
+
+For production multi-node clusters, see the [Kubernetes Deployment Guide](../kubernetes/README.md). To build from source for development, see [Building from Source](building-from-source.md).
+
+## System Requirements
+
+| Requirement | Supported |
+|---|---|
+| **GPU** | NVIDIA Ampere, Ada Lovelace, Hopper, Blackwell |
+| **OS** | Ubuntu 22.04, Ubuntu 24.04 |
+| **Architecture** | x86_64, ARM64 (ARM64 requires Ubuntu 24.04) |
+| **CUDA** | 12.9+ or 13.0+ (B300/GB300 require CUDA 13) |
+| **Python** | 3.10, 3.12 |
+| **Driver** | 575.51.03+ (CUDA 12) or 580.00.03+ (CUDA 13) |
+
+TensorRT-LLM does not support Python 3.11.
+
+For the full compatibility matrix including backend framework versions, see the [Support Matrix](../reference/support-matrix.md).
+
+## Install Dynamo
+
+### Option A: Containers (Recommended)
+
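+Containers ship with all Dynamo dependencies pre-installed, so no further package setup is needed. The host still needs Docker with NVIDIA GPU support for `--gpus all` to work.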
+
+```bash
+# SGLang
+docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.0
+
+# TensorRT-LLM
+docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0
+
+# vLLM
+docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
+```
+
+To run frontend and worker in the same container, either:
+
+- Run processes in the background with `&` (see [Run Dynamo](#run-dynamo) below), or
+- Open a second terminal and use `docker exec -it <container_id> bash`
+
+See [Release Artifacts](../reference/release-artifacts.md#container-images) for available
+versions, and the backend guides for run instructions: [SGLang](../backends/sglang/README.md) |
+[TensorRT-LLM](../backends/trtllm/README.md) | [vLLM](../backends/vllm/README.md)
+
+### Option B: Install from PyPI
+
+```bash
+# Install uv (recommended Python package manager)
+curl -LsSf https://astral.sh/uv/install.sh | sh
+
+# Create virtual environment
+uv venv venv
+source venv/bin/activate
+uv pip install pip
+```
+
+Install system dependencies and the Dynamo wheel for your chosen backend:
+
+**SGLang**
+
+```bash
+sudo apt install python3-dev
+uv pip install --prerelease=allow "ai-dynamo[sglang]"
+```
+
+For CUDA 13 (B300/GB300), the container is recommended. See
+[SGLang install docs](https://docs.sglang.io/get_started/install.html) for details.
+
+**TensorRT-LLM**
+
+```bash
+sudo apt install python3-dev
+pip install torch==2.9.0 torchvision --index-url https://download.pytorch.org/whl/cu130
+pip install --pre --extra-index-url https://pypi.nvidia.com "ai-dynamo[trtllm]"
+```
+
+TensorRT-LLM requires `pip` due to a transitive Git URL dependency that
+`uv` doesn't resolve. We recommend using the TensorRT-LLM container for
+broader compatibility. See the [TRT-LLM backend guide](../backends/trtllm/README.md)
+for details.
+
+**vLLM**
+
+```bash
+sudo apt install python3-dev libxcb1
+uv pip install --prerelease=allow "ai-dynamo[vllm]"
+```
+
+## Run Dynamo
+
+### Discovery Backend
+
+Dynamo components discover each other through a shared backend. Two options are available:
+
+| Backend | When to Use | Setup |
+|---|---|---|
+| **File** | Single machine, local development | No setup -- pass `--discovery-backend file` to all components |
+| **etcd** | Multi-node, production | Requires a running etcd instance (default if no flag is specified) |
+
+This guide uses `--discovery-backend file`. For etcd setup, see [Service Discovery](../kubernetes/service-discovery.md).
+
+### Verify Installation (Optional)
+
+Verify the CLI is installed and callable:
+
+```bash
+python3 -m dynamo.frontend --help
+```
+
+If you cloned the repository, you can run additional system checks:
+
+```bash
+python3 deploy/sanity_check.py
+```
+
+### Start the Frontend
+
+```bash
+# Start the OpenAI compatible frontend (default port is 8000)
+python3 -m dynamo.frontend --discovery-backend file
+```
+
+To run everything in a single terminal (useful in containers), append `> logfile.log 2>&1 &`
+to send the output to a log file and run the process in the background:
+
+```bash
+python3 -m dynamo.frontend --discovery-backend file > dynamo.frontend.log 2>&1 &
+```
+
+### Start a Worker
+
+In another terminal (or the same terminal if you are using background mode), start a worker for your chosen backend:
+
+**SGLang**
+
+```bash
+python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --discovery-backend file
+```
+
+**TensorRT-LLM**
+
+```bash
+python3 -m dynamo.trtllm --model-path Qwen/Qwen3-0.6B --discovery-backend file
+```
+
+The warning `Cannot connect to ModelExpress server/transport error. Using direct download.`
+is expected in local deployments and can be safely ignored.
+
+**vLLM**
+
+```bash
+python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --discovery-backend file \
+  --kv-events-config '{"enable_kv_cache_events": false}'
+```
+
+### KV Events Configuration
+
+For dependency-free local development, disable KV event publishing (avoids NATS):
+
+- **vLLM:** Add `--kv-events-config '{"enable_kv_cache_events": false}'`
+- **SGLang:** No flag needed (KV events disabled by default)
+- **TensorRT-LLM:** No flag needed (KV events disabled by default)
+
+vLLM automatically enables KV event publishing when prefix caching is active. In a future release, KV events will be disabled by default for all backends. Start using `--kv-events-config` explicitly to prepare.
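The per-backend differences can be captured in a small launcher helper. This is only a sketch: `worker_cmd` is a hypothetical function, not part of Dynamo, and it merely prints the worker commands already shown in this guide so the flag differences are visible side by side.

```bash
# Hypothetical helper: print the worker command from this guide for a backend.
#   $1 = backend (sglang, trtllm, or vllm), $2 = model (defaults to the guide's example)
worker_cmd() {
  backend="$1"
  model="${2:-Qwen/Qwen3-0.6B}"
  kv_off='{"enable_kv_cache_events": false}'
  case "$backend" in
    sglang|trtllm)
      # KV events are disabled by default, so no extra flag is needed.
      printf 'python3 -m dynamo.%s --model-path %s --discovery-backend file\n' \
        "$backend" "$model"
      ;;
    vllm)
      # vLLM takes --model and needs KV events disabled explicitly.
      printf "python3 -m dynamo.vllm --model %s --discovery-backend file --kv-events-config '%s'\n" \
        "$model" "$kv_off"
      ;;
    *)
      echo "unknown backend: $backend" >&2
      return 1
      ;;
  esac
}
```

For example, `worker_cmd vllm` prints the vLLM command including the `--kv-events-config` flag, while `worker_cmd sglang` prints the SGLang command without it.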
+
+## Test Your Deployment
+
+```bash
+curl localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "Qwen/Qwen3-0.6B",
+       "messages": [{"role": "user", "content": "Hello!"}],
+       "max_tokens": 50}'
+```
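If the frontend and worker were just started in the background, the request above may fail until the worker has finished loading the model. A retry loop avoids guessing how long to wait. The `retry_until` helper is a hypothetical sketch, and the `/v1/models` path in the usage comment is an assumption based on the frontend being OpenAI-compatible; this guide does not confirm it.

```bash
# Hypothetical helper: rerun a command until it succeeds or the given number
# of attempts is used up, waiting one second between attempts.
#   $1 = maximum attempts, remaining arguments = command to run
retry_until() {
  attempts="$1"
  shift
  while [ "$attempts" -gt 0 ]; do
    if "$@"; then
      return 0
    fi
    attempts=$((attempts - 1))
    if [ "$attempts" -gt 0 ]; then
      sleep 1
    fi
  done
  return 1
}

# Usage (endpoint path is an assumption):
#   retry_until 120 curl -sf -o /dev/null localhost:8000/v1/models
```

Once the helper returns successfully, the chat completion request can be sent.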
+
+## Troubleshooting
+
+**CUDA/driver version mismatch**
+
+Run `nvidia-smi` to check your driver version. Dynamo requires driver 575.51.03+ for CUDA 12 or 580.00.03+ for CUDA 13. B300/GB300 GPUs require CUDA 13. See the [Support Matrix](../reference/support-matrix.md) for full requirements.
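The comparison against the minimum driver version can be scripted. The sketch below assumes GNU `sort -V` (available on Ubuntu) for version ordering; `version_ge` is a hypothetical helper, not a Dynamo tool.

```bash
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Relies on `sort -V` (version sort) placing the smaller version first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage, with the CUDA 12 minimum from this guide:
#   driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
#   version_ge "$driver" 575.51.03 || echo "driver $driver is too old for CUDA 12"
```

Version sort matters here because a plain string comparison would order `575.9.1` after `575.51.03`.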
+
+**Model doesn't fit on GPU (OOM)**
+
+The example model used in this guide, `Qwen/Qwen3-0.6B`, requires ~2 GB of GPU memory. Larger models need more VRAM:
+
+| Model Size | Approximate VRAM |
+|---|---|
+| 7B | 14-16 GB |
+| 13B | 26-28 GB |
+| 70B | 140+ GB (multi-GPU) |
+
+Start with a small model and scale up based on your hardware.
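The lower bounds in the table are consistent with a back-of-the-envelope rule: weight memory is roughly the parameter count times the bytes per parameter. The sketch below assumes 16-bit weights (2 bytes per parameter), which this guide does not state explicitly, and it counts weights only; KV cache and activations need additional memory. `weights_gb` is a hypothetical helper.

```bash
# Hypothetical helper: estimate weight memory in GB (weights only).
#   $1 = parameters in billions
#   $2 = bytes per parameter (default 2, i.e. 16-bit weights; an assumption)
weights_gb() {
  LC_ALL=C awk -v params="$1" -v bytes="${2:-2}" \
    'BEGIN { printf "%.1f\n", params * bytes }'
}
```

With the default, `weights_gb 7` prints `14.0` and `weights_gb 70` prints `140.0`, matching the lower ends of the ranges in the table.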
+
+**Python 3.11 with TensorRT-LLM**
+
+TensorRT-LLM does not support Python 3.11. If you see installation failures with TensorRT-LLM, check your Python version with `python3 --version`. Use Python 3.10 or 3.12 instead.
+
+**Container runs but GPU not detected**
+
+Ensure you passed `--gpus all` to `docker run`. Without this flag, the container won't have access to GPUs:
+
+```bash
+# Correct
+docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.0
+
+# Wrong -- no GPU access
+docker run --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.0
+```
+
+## Next Steps
+
+- [Backend Guides](../backends/sglang/README.md) -- Backend-specific configuration and features
+- [Disaggregated Serving](../features/disaggregated-serving/README.md) -- Scale prefill and decode independently
+- [KV Cache Aware Routing](../components/router/router-guide.md) -- Smart request routing
+- [Kubernetes Deployment](../kubernetes/README.md) -- Production multi-node deployments
--- a/docs/getting-started/quickstart.md
+++ b/docs/getting-started/quickstart.md
@@ -11,6 +11,14 @@ This guide covers running Dynamo **using the CLI on your local machine or VM**.
 > See the [Kubernetes Installation Guide](../kubernetes/installation-guide.md)
 > and [Kubernetes Quickstart](../kubernetes/README.md) for cluster deployments.

+## Choose Your Install Path
+
+| Path | Best For | Guide |
+|---|---|---|
+| **Local Install** | Running Dynamo on a single machine or VM | [Local Installation](local-installation.md) |
+| **Kubernetes** | Production multi-node cluster deployments | [Kubernetes Deployment Guide](../kubernetes/README.md) |
+| **Building from Source** | Contributors and local development | [Building from Source](building-from-source.md) |
+
 ## Install Dynamo

 **Option A: Containers (Recommended)**
@@ -28,12 +36,6 @@ docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtl
 docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
 ```

-> [!TIP]
-> To run frontend and worker in the same container, either:
->
-> - Run processes in background with `&` (see Run Dynamo section below), or
-> - Open a second terminal and use `docker exec -it <container_id> bash`
-
 See [Release Artifacts](../reference/release-artifacts.md#container-images) for available
 versions and backend guides for run instructions: [SGLang](../backends/sglang/README.md) |
 [TensorRT-LLM](../backends/trtllm/README.md) | [vLLM](../backends/vllm/README.md)
@@ -59,10 +61,6 @@ sudo apt install python3-dev
 uv pip install --prerelease=allow "ai-dynamo[sglang]"
 ```

-> [!NOTE]
-> For CUDA 13 (B300/GB300), the container is recommended. See
-> [SGLang install docs](https://docs.sglang.io/get_started/install.html) for details.
-
 **TensorRT-LLM**

 ```bash
@@ -71,12 +69,6 @@ pip install torch==2.9.0 torchvision --index-url https://download.pytorch.org/wh
 pip install --pre --extra-index-url https://pypi.nvidia.com "ai-dynamo[trtllm]"
 ```

-> [!NOTE]
-> TensorRT-LLM requires `pip` due to a transitive Git URL dependency that
-> `uv` doesn't resolve. We recommend using the TensorRT-LLM container for
-> broader compatibility. See the [TRT-LLM backend guide](../backends/trtllm/README.md)
-> for details.
-
 **vLLM**

 ```bash
@@ -86,10 +78,6 @@ uv pip install --prerelease=allow "ai-dynamo[vllm]"

 ## Run Dynamo

-> [!TIP]
-> **(Optional)** Before running Dynamo, verify your system configuration:
-> `python3 deploy/sanity_check.py`
-
 Start the frontend, then start a worker for your chosen backend.

 > [!TIP]
@@ -123,19 +111,6 @@ python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --discovery-backend file \
  --kv-events-config '{"enable_kv_cache_events": false}'
 ```

-> [!NOTE]
-> For dependency-free local development, disable KV event publishing (avoids NATS):
->
-> - **vLLM:** Add `--kv-events-config '{"enable_kv_cache_events": false}'`
-> - **SGLang:** No flag needed (KV events disabled by default)
-> - **TensorRT-LLM:** No flag needed (KV events disabled by default)
->
-> **TensorRT-LLM only:** The warning `Cannot connect to ModelExpress server/transport error. Using direct download.`
-> is expected and can be safely ignored.
-
-> [!NOTE]
-> **Deprecation notice:** vLLM automatically enables KV event publishing when prefix caching is active. In a future release, this will change — KV events will be disabled by default for all backends. Start using `--kv-events-config` explicitly to prepare.
-
 ## Test Your Deployment

 ```bash

--- a/docs/index.yml
+++ b/docs/index.yml
@@ -24,6 +24,10 @@ navigation:
        path: getting-started/quickstart.md
      - page: Introduction
        path: getting-started/introduction.md
+      - page: Local Installation
+        path: getting-started/local-installation.md
+      - page: Building from Source
+        path: getting-started/building-from-source.md
      - page: Contribution Guide
        path: contribution-guide.md