Unverified commit 3f6c6249, authored by dagil-nvidia, committed by GitHub

docs: add Local Installation and Building from Source guides (#7490)


Signed-off-by: Dan Gil <dagil@nvidia.com>
parent 3357c53f
......@@ -126,7 +126,7 @@ Also available: [`tensorrtllm-runtime:1.0.1`](https://docs.nvidia.com/dynamo/res
pip install "ai-dynamo[sglang]" # or [vllm] or [trtllm]
```
Then start the frontend and a worker as shown above. See the [full installation guide](https://docs.nvidia.com/dynamo/getting-started/quickstart) for system dependencies and backend-specific notes.
Then start the frontend and a worker as shown above. See the [full installation guide](docs/getting-started/local-installation.md) for system dependencies and backend-specific notes.
### Option C: Kubernetes (recommended)
......@@ -159,7 +159,7 @@ See [recipes/](recipes/README.md) for the full list. Cloud-specific guides: [AWS
## Building from Source
For contributors who want to build and develop locally. See the [full build guide](https://docs.nvidia.com/dynamo/getting-started/contribution-guide#building-from-source) for details.
For contributors who want to build and develop locally. See the [full build guide](docs/getting-started/building-from-source.md) for details.
```bash
# Install system deps (Ubuntu 24.04)
......
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
sidebar-title: Building from Source
description: Build Dynamo from source for development and contributions
---
# Building from Source
Build Dynamo from source when you want to contribute code, test features on the development branch, or customize the build. If you just want to run Dynamo, the [Local Installation](local-installation.md) guide is faster.
This guide covers Ubuntu and macOS. For a containerized dev environment that handles all of this automatically, see [DevContainer](#devcontainer).
## 1. Install System Libraries
**Ubuntu:**
```bash
sudo apt install -y build-essential libhwloc-dev libudev-dev pkg-config libclang-dev protobuf-compiler python3-dev cmake
```
**macOS:**
```bash
# Install Homebrew if needed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install cmake protobuf
# Verify Metal is accessible
xcrun -sdk macosx metal
```
## 2. Install Rust
```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
```
## 3. Create a Python Virtual Environment
Install [uv](https://docs.astral.sh/uv/#installation) if you don't have it:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
Create and activate a virtual environment:
```bash
uv venv .venv
source .venv/bin/activate
```
## 4. Install Build Tools
```bash
uv pip install pip maturin
```
[Maturin](https://github.com/PyO3/maturin) is the Rust-Python bindings build tool.
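Before moving on to the build, it can help to confirm that the tools installed so far are actually on your `PATH`. The following is a minimal sketch, not part of the Dynamo repository: `missing_tools` is a hypothetical helper name, and the tool list is an assumption drawn from the steps above (`protoc` is the binary shipped by the `protobuf-compiler` / `protobuf` packages).

```bash
# Print the name of every command in the argument list that is not on PATH.
# Returns 0 when nothing is missing, 1 otherwise.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Tools used by the steps in this guide (assumed list; adjust as needed).
if missing_tools cargo uv maturin cmake protoc; then
    echo "All build tools found"
else
    echo "Install the tools printed above before building" >&2
fi
```

Run it inside the activated virtual environment, since `maturin` is installed there.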
## 5. Build the Rust Bindings
```bash
cd lib/bindings/python
maturin develop --uv
```
## 6. Install GPU Memory Service
```bash
# Return to project root
cd "$(git rev-parse --show-toplevel)"
uv pip install -e lib/gpu_memory_service
```
## 7. Install the Wheel
```bash
uv pip install -e .
```
## 8. Verify the Build
```bash
python3 -m dynamo.frontend --help
```
You should see the frontend command help output.
## DevContainer
VSCode and Cursor users can skip manual setup using pre-configured development containers. The DevContainer installs all toolchains, builds the project, and sets up the Python environment automatically.
Framework-specific containers are available for vLLM, SGLang, and TensorRT-LLM. See the [DevContainer README](https://github.com/ai-dynamo/dynamo/tree/main/.devcontainer) for setup instructions.
## Set Up Pre-commit Hooks
Before submitting PRs, install the pre-commit hooks to ensure your code passes CI checks:
```bash
uv pip install pre-commit
pre-commit install
```
Run checks manually on all files:
```bash
pre-commit run --all-files
```
## Troubleshooting
**Missing system packages**
If `maturin develop` fails with linker errors, verify all system dependencies are installed. On Ubuntu:
```bash
sudo apt install -y build-essential libhwloc-dev libudev-dev pkg-config libclang-dev protobuf-compiler python3-dev cmake
```
**Virtual environment not activated**
Maturin builds against the active Python interpreter. If you see errors about Python or site-packages, ensure your virtual environment is activated:
```bash
source .venv/bin/activate
```
**Disk space**
The Rust `target/` directory can grow to 10+ GB during development. If builds fail with disk space errors, clean the build cache:
```bash
cargo clean
```
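To see whether the build cache is the culprit before deleting it, you can measure the directory first. This is a rough sketch under stated assumptions: `dir_size_gb` is a hypothetical helper, the 10 GB threshold simply mirrors the figure quoted above, and the script assumes it is run from the directory that contains `target/`.

```bash
# Print the size of a directory in whole gigabytes (rounded down).
# du -sk reports kilobyte blocks; dividing by 1024 twice gives GB.
dir_size_gb() {
    kib=$(du -sk "$1" | awk '{print $1}')
    echo $(( kib / 1024 / 1024 ))
}

# Assumed threshold of 10 GB, taken from the figure in this section.
if [ -d target ] && [ "$(dir_size_gb target)" -ge 10 ]; then
    echo "target/ is over 10 GB; consider running cargo clean"
fi
```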
## Next Steps
- [Contribution Guide](../contribution-guide.md) -- Workflow for contributing code
- [Examples](https://github.com/ai-dynamo/dynamo/tree/main/examples) -- Explore the codebase
- [Good First Issues](https://github.com/ai-dynamo/dynamo/labels/good-first-issue) -- Find a task to work on
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
sidebar-title: Local Installation
description: Install and run Dynamo on a local machine or VM with containers or PyPI
---
# Local Installation
This guide walks through installing and running Dynamo on a local machine or VM with one or more GPUs. By the end, you'll have a working OpenAI-compatible endpoint serving a model.
For production multi-node clusters, see the [Kubernetes Deployment Guide](../kubernetes/README.md). To build from source for development, see [Building from Source](building-from-source.md).
## System Requirements
| Requirement | Supported |
|---|---|
| **GPU** | NVIDIA Ampere, Ada Lovelace, Hopper, Blackwell |
| **OS** | Ubuntu 22.04, Ubuntu 24.04 |
| **Architecture** | x86_64, ARM64 (ARM64 requires Ubuntu 24.04) |
| **CUDA** | 12.9+ or 13.0+ (B300/GB300 require CUDA 13) |
| **Python** | 3.10, 3.12 |
| **Driver** | 575.51.03+ (CUDA 12) or 580.00.03+ (CUDA 13) |
TensorRT-LLM does not support Python 3.11.
For the full compatibility matrix including backend framework versions, see the [Support Matrix](../reference/support-matrix.md).
## Install Dynamo
### Option A: Containers (Recommended)
Containers have all dependencies pre-installed. No setup required.
```bash
# SGLang
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.0
# TensorRT-LLM
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0
# vLLM
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
```
To run frontend and worker in the same container, either:
- Run processes in background with `&` (see Run Dynamo section below), or
- Open a second terminal and use `docker exec -it <container_id> bash`
See [Release Artifacts](../reference/release-artifacts.md#container-images) for available
versions and backend guides for run instructions: [SGLang](../backends/sglang/README.md) |
[TensorRT-LLM](../backends/trtllm/README.md) | [vLLM](../backends/vllm/README.md)
### Option B: Install from PyPI
```bash
# Install uv (recommended Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment
uv venv venv
source venv/bin/activate
uv pip install pip
```
Install system dependencies and the Dynamo wheel for your chosen backend:
**SGLang**
```bash
sudo apt install python3-dev
uv pip install --prerelease=allow "ai-dynamo[sglang]"
```
For CUDA 13 (B300/GB300), the container is recommended. See
[SGLang install docs](https://docs.sglang.io/get_started/install.html) for details.
**TensorRT-LLM**
```bash
sudo apt install python3-dev
pip install torch==2.9.0 torchvision --index-url https://download.pytorch.org/whl/cu130
pip install --pre --extra-index-url https://pypi.nvidia.com "ai-dynamo[trtllm]"
```
TensorRT-LLM requires `pip` due to a transitive Git URL dependency that
`uv` doesn't resolve. We recommend using the TensorRT-LLM container for
broader compatibility. See the [TRT-LLM backend guide](../backends/trtllm/README.md)
for details.
**vLLM**
```bash
sudo apt install python3-dev libxcb1
uv pip install --prerelease=allow "ai-dynamo[vllm]"
```
## Run Dynamo
### Discovery Backend
Dynamo components discover each other through a shared backend. Two options are available:
| Backend | When to Use | Setup |
|---|---|---|
| **File** | Single machine, local development | No setup -- pass `--discovery-backend file` to all components |
| **etcd** | Multi-node, production | Requires a running etcd instance (default if no flag is specified) |
This guide uses `--discovery-backend file`. For etcd setup, see [Service Discovery](../kubernetes/service-discovery.md).
### Verify Installation (Optional)
Verify the CLI is installed and callable:
```bash
python3 -m dynamo.frontend --help
```
If you cloned the repository, you can run additional system checks:
```bash
python3 deploy/sanity_check.py
```
### Start the Frontend
```bash
# Start the OpenAI compatible frontend (default port is 8000)
python3 -m dynamo.frontend --discovery-backend file
```
To run everything in a single terminal (useful in containers), append `> logfile.log 2>&1 &`
to a command; this redirects its output to a log file and runs it in the background:
```bash
python3 -m dynamo.frontend --discovery-backend file > dynamo.frontend.log 2>&1 &
```
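If you start several background processes this way, keeping track of their PIDs makes them easier to stop later. The sketch below wraps the same redirection in a small function; `run_bg` is a hypothetical helper name, not a Dynamo command.

```bash
# Run a command in the background, write stdout and stderr to a log file,
# and print the PID of the background process.
run_bg() {
    log_file="$1"
    shift
    "$@" > "$log_file" 2>&1 &
    echo "$!"
}

# Usage (commented out so pasting the sketch has no side effects):
# frontend_pid=$(run_bg dynamo.frontend.log python3 -m dynamo.frontend --discovery-backend file)
# ... later, stop the frontend:
# kill "$frontend_pid"
```

Because the process is started inside a command substitution, use `kill` with the saved PID rather than the shell's `wait` or `jobs` builtins to manage it.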
### Start a Worker
In another terminal (or the same terminal if using background mode), start a worker for your chosen backend:
**SGLang**
```bash
python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --discovery-backend file
```
**TensorRT-LLM**
```bash
python3 -m dynamo.trtllm --model-path Qwen/Qwen3-0.6B --discovery-backend file
```
The warning `Cannot connect to ModelExpress server/transport error. Using direct download.`
is expected in local deployments and can be safely ignored.
**vLLM**
```bash
python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --discovery-backend file \
--kv-events-config '{"enable_kv_cache_events": false}'
```
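When the frontend and worker run in the background, a request sent too early can fail simply because the server is not listening yet. A minimal polling sketch follows. `wait_for_http` is a hypothetical helper, and the `/v1/models` path in the usage comment is an assumption based on the frontend being OpenAI-compatible; a successful response there indicates the frontend is answering, not necessarily that the worker has finished registering its model.

```bash
# Poll a URL until it answers with a successful HTTP status.
# Arguments: URL, maximum attempts (default 30), seconds between attempts (default 2).
wait_for_http() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl exit non-zero on HTTP errors; -s silences progress output.
        if curl -sf -o /dev/null "$url"; then
            return 0
        fi
        sleep "$delay"
        i=$(( i + 1 ))
    done
    return 1
}

# Usage, assuming the frontend serves an OpenAI-style model list on port 8000:
# wait_for_http http://localhost:8000/v1/models && echo "frontend is up"
```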
### KV Events Configuration
For dependency-free local development, disable KV event publishing (avoids NATS):
- **vLLM:** Add `--kv-events-config '{"enable_kv_cache_events": false}'`
- **SGLang:** No flag needed (KV events disabled by default)
- **TensorRT-LLM:** No flag needed (KV events disabled by default)
vLLM automatically enables KV event publishing when prefix caching is active. In a future release, KV events will be disabled by default for all backends. Start using `--kv-events-config` explicitly to prepare.
## Test Your Deployment
```bash
curl localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "Qwen/Qwen3-0.6B",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 50}'
```
## Troubleshooting
**CUDA/driver version mismatch**
Run `nvidia-smi` to check your driver version. Dynamo requires driver 575.51.03+ for CUDA 12 or 580.00.03+ for CUDA 13. B300/GB300 GPUs require CUDA 13. See the [Support Matrix](../reference/support-matrix.md) for full requirements.
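The comparison against the minimum driver version can be scripted. The sketch below is one way to do it, assuming GNU `sort` with the `-V` (version sort) option is available; `version_ge` and the two variable names are hypothetical, and the minimum versions are the ones quoted in this guide.

```bash
# Return 0 when version $1 is greater than or equal to version $2.
# Relies on sort -V (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum drivers quoted in this guide.
MIN_DRIVER_CUDA12="575.51.03"
MIN_DRIVER_CUDA13="580.00.03"

# Usage on a machine with an NVIDIA driver installed:
# driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
# version_ge "$driver" "$MIN_DRIVER_CUDA12" && echo "driver OK for CUDA 12"
# version_ge "$driver" "$MIN_DRIVER_CUDA13" && echo "driver OK for CUDA 13"
```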
**Model doesn't fit on GPU (OOM)**
The example model used in this guide, `Qwen/Qwen3-0.6B`, requires ~2 GB of GPU memory. Larger models need more VRAM:
| Model Size | Approximate VRAM |
|---|---|
| 7B | 14-16 GB |
| 13B | 26-28 GB |
| 70B | 140+ GB (multi-GPU) |
Start with a small model and scale up based on your hardware.
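The lower bounds in the table appear consistent with a common rule of thumb: weights stored at 2 bytes per parameter (16-bit precision), with the remainder going to KV cache and runtime overhead. The guide does not state the precision, so treat the 2-byte default below as an assumption; `weights_gb` is a hypothetical helper for back-of-the-envelope sizing only.

```bash
# Estimate the memory needed for model weights alone, in GB.
# Arguments: parameter count in billions, bytes per parameter
# (default 2, i.e. 16-bit weights; an assumption, not stated by the guide).
weights_gb() {
    params_billions="$1"
    bytes_per_param="${2:-2}"
    # 1 billion parameters at 1 byte each is roughly 1 GB, so the product is in GB.
    echo $(( params_billions * bytes_per_param ))
}

weights_gb 7     # prints 14
weights_gb 13    # prints 26
weights_gb 70    # prints 140
```

Actual usage is higher than these figures because the estimate covers weights only.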
**Python 3.11 with TensorRT-LLM**
TensorRT-LLM does not support Python 3.11. If you see installation failures with TensorRT-LLM, check your Python version with `python3 --version`. Use Python 3.10 or 3.12 instead.
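A quick scripted form of that check, as a sketch: `is_supported_python` is a hypothetical helper whose accepted versions are the ones this guide lists (3.10 and 3.12).

```bash
# Return 0 when the given "major.minor" version is one this guide lists as supported.
is_supported_python() {
    case "$1" in
        3.10|3.12) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: check the interpreter that python3 resolves to.
# version=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
# is_supported_python "$version" || echo "Python $version is not listed as supported"
```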
**Container runs but GPU not detected**
Ensure you passed `--gpus all` to `docker run`. Without this flag, the container won't have access to GPUs:
```bash
# Correct
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.0
# Wrong -- no GPU access
docker run --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.0
```
## Next Steps
- [Backend Guides](../backends/sglang/README.md) -- Backend-specific configuration and features
- [Disaggregated Serving](../features/disaggregated-serving/README.md) -- Scale prefill and decode independently
- [KV Cache Aware Routing](../components/router/router-guide.md) -- Smart request routing
- [Kubernetes Deployment](../kubernetes/README.md) -- Production multi-node deployments
......@@ -11,6 +11,14 @@ This guide covers running Dynamo **using the CLI on your local machine or VM**.
> See the [Kubernetes Installation Guide](../kubernetes/installation-guide.md)
> and [Kubernetes Quickstart](../kubernetes/README.md) for cluster deployments.
## Choose Your Install Path
| Path | Best For | Guide |
|---|---|---|
| **Local Install** | Running Dynamo on a single machine or VM | [Local Installation](local-installation.md) |
| **Kubernetes** | Production multi-node cluster deployments | [Kubernetes Deployment Guide](../kubernetes/README.md) |
| **Building from Source** | Contributors and local development | [Building from Source](building-from-source.md) |
## Install Dynamo
**Option A: Containers (Recommended)**
......@@ -28,12 +36,6 @@ docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtl
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
```
> [!TIP]
> To run frontend and worker in the same container, either:
>
> - Run processes in background with `&` (see Run Dynamo section below), or
> - Open a second terminal and use `docker exec -it <container_id> bash`
See [Release Artifacts](../reference/release-artifacts.md#container-images) for available
versions and backend guides for run instructions: [SGLang](../backends/sglang/README.md) |
[TensorRT-LLM](../backends/trtllm/README.md) | [vLLM](../backends/vllm/README.md)
......@@ -59,10 +61,6 @@ sudo apt install python3-dev
uv pip install --prerelease=allow "ai-dynamo[sglang]"
```
> [!NOTE]
> For CUDA 13 (B300/GB300), the container is recommended. See
> [SGLang install docs](https://docs.sglang.io/get_started/install.html) for details.
**TensorRT-LLM**
```bash
......@@ -71,12 +69,6 @@ pip install torch==2.9.0 torchvision --index-url https://download.pytorch.org/wh
pip install --pre --extra-index-url https://pypi.nvidia.com "ai-dynamo[trtllm]"
```
> [!NOTE]
> TensorRT-LLM requires `pip` due to a transitive Git URL dependency that
> `uv` doesn't resolve. We recommend using the TensorRT-LLM container for
> broader compatibility. See the [TRT-LLM backend guide](../backends/trtllm/README.md)
> for details.
**vLLM**
```bash
......@@ -86,10 +78,6 @@ uv pip install --prerelease=allow "ai-dynamo[vllm]"
## Run Dynamo
> [!TIP]
> **(Optional)** Before running Dynamo, verify your system configuration:
> `python3 deploy/sanity_check.py`
Start the frontend, then start a worker for your chosen backend.
> [!TIP]
......@@ -123,19 +111,6 @@ python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --discovery-backend file \
--kv-events-config '{"enable_kv_cache_events": false}'
```
> [!NOTE]
> For dependency-free local development, disable KV event publishing (avoids NATS):
>
> - **vLLM:** Add `--kv-events-config '{"enable_kv_cache_events": false}'`
> - **SGLang:** No flag needed (KV events disabled by default)
> - **TensorRT-LLM:** No flag needed (KV events disabled by default)
>
> **TensorRT-LLM only:** The warning `Cannot connect to ModelExpress server/transport error. Using direct download.`
> is expected and can be safely ignored.
> [!NOTE]
> **Deprecation notice:** vLLM automatically enables KV event publishing when prefix caching is active. In a future release, this will change — KV events will be disabled by default for all backends. Start using `--kv-events-config` explicitly to prepare.
## Test Your Deployment
```bash
......
......@@ -24,6 +24,10 @@ navigation:
path: getting-started/quickstart.md
- page: Introduction
path: getting-started/introduction.md
- page: Local Installation
path: getting-started/local-installation.md
- page: Building from Source
path: getting-started/building-from-source.md
- page: Contribution Guide
path: contribution-guide.md
......