@@ -126,7 +126,7 @@ Also available: [`tensorrtllm-runtime:1.0.1`](https://docs.nvidia.com/dynamo/res
pip install "ai-dynamo[sglang]"  # or [vllm] or [trtllm]
```
Then start the frontend and a worker as shown above. See the [full installation guide](https://docs.nvidia.com/dynamo/getting-started/quickstart) for system dependencies and backend-specific notes.
Then start the frontend and a worker as shown above. See the [full installation guide](docs/getting-started/local-installation.md) for system dependencies and backend-specific notes.
### Option C: Kubernetes (recommended)
...
...
@@ -159,7 +159,7 @@ See [recipes/](recipes/README.md) for the full list. Cloud-specific guides: [AWS
## Building from Source
For contributors who want to build and develop locally. See the [full build guide](https://docs.nvidia.com/dynamo/getting-started/contribution-guide#building-from-source) for details.
For contributors who want to build and develop locally. See the [full build guide](docs/getting-started/building-from-source.md) for details.
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
sidebar-title: Building from Source
description: Build Dynamo from source for development and contributions
---
# Building from Source
Build Dynamo from source when you want to contribute code, test features on the development branch, or customize the build. If you just want to run Dynamo, the [Local Installation](local-installation.md) guide is faster.
This guide covers Ubuntu and macOS. For a containerized dev environment that handles all of this automatically, see [DevContainer](#devcontainer).
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
```
## 3. Create a Python Virtual Environment
Install [uv](https://docs.astral.sh/uv/#installation) if you don't have it:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
Create and activate a virtual environment:
```bash
uv venv .venv
source .venv/bin/activate
```
## 4. Install Build Tools
```bash
uv pip install pip maturin
```
[Maturin](https://github.com/PyO3/maturin) is the Rust-Python bindings build tool.
## 5. Build the Rust Bindings
```bash
cd lib/bindings/python
maturin develop --uv
```
## 6. Install GPU Memory Service
```bash
# Return to project root
cd "$(git rev-parse --show-toplevel)"
uv pip install -e lib/gpu_memory_service
```
## 7. Install the Wheel
```bash
uv pip install -e .
```
## 8. Verify the Build
```bash
python3 -m dynamo.frontend --help
```
You should see the frontend command help output.
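If the help command fails, it can be useful to confirm that the virtual environment can locate the package at all. The following is a minimal sketch using only the standard library; the helper name `module_available` is hypothetical and not part of Dynamo, and the module name `dynamo.frontend` is taken from the command above.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the import system can find `name` in this environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False


# Run inside the activated .venv created in step 3.
print("dynamo.frontend importable:", module_available("dynamo.frontend"))
```

A `False` result usually means the build steps were run outside the virtual environment, so re-activating `.venv` and repeating steps 5 to 7 is a reasonable first thing to try.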
## DevContainer
VSCode and Cursor users can skip manual setup using pre-configured development containers. The DevContainer installs all toolchains, builds the project, and sets up the Python environment automatically.
Framework-specific containers are available for vLLM, SGLang, and TensorRT-LLM. See the [DevContainer README](https://github.com/ai-dynamo/dynamo/tree/main/.devcontainer) for setup instructions.
## Set Up Pre-commit Hooks
Before submitting PRs, install the pre-commit hooks to ensure your code passes CI checks:
```bash
uv pip install pre-commit
pre-commit install
```
Run checks manually on all files:
```bash
pre-commit run --all-files
```
## Troubleshooting
**Missing system packages**
If `maturin develop` fails with linker errors, verify all system dependencies are installed. On Ubuntu:
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
sidebar-title: Local Installation
description: Install and run Dynamo on a local machine or VM with containers or PyPI
---
# Local Installation
This guide walks through installing and running Dynamo on a local machine or VM with one or more GPUs. By the end, you'll have a working OpenAI-compatible endpoint serving a model.
For production multi-node clusters, see the [Kubernetes Deployment Guide](../kubernetes/README.md). To build from source for development, see [Building from Source](building-from-source.md).
## System Requirements
| Requirement | Supported |
|---|---|
| **GPU** | NVIDIA Ampere, Ada Lovelace, Hopper, Blackwell |
- **SGLang:** No flag needed (KV events disabled by default)
- **TensorRT-LLM:** No flag needed (KV events disabled by default)
vLLM automatically enables KV event publishing when prefix caching is active. In a future release, KV events will be disabled by default for all backends. Start using `--kv-events-config` explicitly to prepare.
Run `nvidia-smi` to check your driver version. Dynamo requires driver 575.51.03+ for CUDA 12 or 580.00.03+ for CUDA 13. B300/GB300 GPUs require CUDA 13. See the [Support Matrix](../reference/support-matrix.md) for full requirements.
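The driver check can also be scripted. The sketch below compares a driver version string against the minimums quoted above; the function names are hypothetical and not part of Dynamo, and the comparison assumes the driver version is a dotted numeric string such as `575.51.03`.

```python
# Minimum driver versions quoted in this guide, keyed by CUDA major version.
MIN_DRIVER = {12: (575, 51, 3), 13: (580, 0, 3)}


def parse_driver_version(text: str) -> tuple:
    """Turn a dotted version string like '575.51.03' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(driver_version: str, cuda_major: int) -> bool:
    """Return True if the driver is at least the minimum for the CUDA major version."""
    return parse_driver_version(driver_version) >= MIN_DRIVER[cuda_major]


print(driver_meets_minimum("575.51.03", 12))  # True
print(driver_meets_minimum("575.57.08", 13))  # False
```

To obtain the version string without reading the full `nvidia-smi` table, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints it on its own line.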
**Model doesn't fit on GPU (OOM)**
The default model `Qwen/Qwen3-0.6B` requires ~2GB of GPU memory. Larger models need more VRAM:
| Model Size | Approximate VRAM |
|---|---|
| 7B | 14-16 GB |
| 13B | 26-28 GB |
| 70B | 140+ GB (multi-GPU) |
Start with a small model and scale up based on your hardware.
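The lower bounds in the table are close to a weights-only estimate at 16-bit precision (two bytes per parameter). The following is a rough sketch of that arithmetic, assuming FP16/BF16 weights and ignoring KV cache, activations, and runtime overhead, which is why real usage is higher. The function name is hypothetical and not part of Dynamo.

```python
def estimate_weight_vram_gb(num_params_billion: float, bytes_per_param: int = 2) -> float:
    """Rough VRAM for model weights alone, in GB (1 GB = 1e9 bytes)."""
    # billions of parameters x bytes per parameter = gigabytes
    return num_params_billion * bytes_per_param


for size in (7, 13, 70):
    print(f"{size}B -> ~{estimate_weight_vram_gb(size):.0f} GB for weights")
```

Passing `bytes_per_param=1` approximates an 8-bit quantized model under the same assumptions.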
**Python 3.11 with TensorRT-LLM**
TensorRT-LLM does not support Python 3.11. If you see installation failures with TensorRT-LLM, check your Python version with `python3 --version`. Use Python 3.10 or 3.12 instead.
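The same check can be done from Python before attempting the install. This sketch encodes only the constraint stated above (3.11 is unsupported); it does not validate the full supported range, and the helper name is hypothetical.

```python
import sys

# Assumption: only the restriction named in this guide is encoded here.
UNSUPPORTED_FOR_TRTLLM = {(3, 11)}


def python_ok_for_trtllm(version=None) -> bool:
    """Return False if the interpreter version is the one this guide flags."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) not in UNSUPPORTED_FOR_TRTLLM


if not python_ok_for_trtllm():
    print("Python 3.11 detected: use 3.10 or 3.12 for TensorRT-LLM")
```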
**Container runs but GPU not detected**
Ensure you passed `--gpus all` to `docker run`. Without this flag, the container won't have access to GPUs:
```bash
# Correct
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.0
# Wrong -- no GPU access
docker run --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.0
```
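To confirm GPU access from inside a running container, one option is to list the devices that `nvidia-smi -L` reports. This is a minimal sketch, assuming `nvidia-smi` is on the `PATH` when GPUs are exposed; the helper name `visible_gpus` is hypothetical.

```python
import shutil
import subprocess


def visible_gpus() -> list:
    """Return the GPU lines printed by `nvidia-smi -L`, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return []
    # Device lines start with "GPU <index>:"; indented sub-entries are skipped.
    return [line for line in result.stdout.splitlines() if line.startswith("GPU ")]


gpus = visible_gpus()
print(f"{len(gpus)} GPU(s) visible")
```

An empty result inside the container, when the host does list GPUs, points to a missing `--gpus all` flag.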
## Next Steps
- [Backend Guides](../backends/sglang/README.md) -- Backend-specific configuration and features
- [Disaggregated Serving](../features/disaggregated-serving/README.md) -- Scale prefill and decode independently
> - **SGLang:** No flag needed (KV events disabled by default)
> - **TensorRT-LLM:** No flag needed (KV events disabled by default)
>
> **TensorRT-LLM only:** The warning `Cannot connect to ModelExpress server/transport error. Using direct download.`
> is expected and can be safely ignored.
> [!NOTE]
> **Deprecation notice:** vLLM automatically enables KV event publishing when prefix caching is active. In a future release, this will change — KV events will be disabled by default for all backends. Start using `--kv-events-config` explicitly to prepare.