---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: Quickstart
---

This guide covers running Dynamo **using the CLI on your local machine or VM**.

> [!IMPORTANT]
> **Looking to deploy on Kubernetes instead?**
> See the [Kubernetes Installation Guide](../kubernetes/installation-guide.md)
> and [Kubernetes Quickstart](../kubernetes/README.md) for cluster deployments.

## Choose Your Install Path

| Path | Best For | Guide |
|---|---|---|
| **Local Install** | Running Dynamo on a single machine or VM | [Local Installation](local-installation.md) |
| **Kubernetes** | Production multi-node cluster deployments | [Kubernetes Deployment Guide](../kubernetes/README.md) |
| **Building from Source** | Contributors and local development | [Building from Source](building-from-source.md) |

## Install Dynamo

**Option A: Containers (Recommended)**

The containers ship with all dependencies pre-installed, so no further setup is required.

```bash
# SGLang
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/sglang-runtime:1.0.1

# TensorRT-LLM
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.1

# vLLM
docker run --gpus all --network host --rm -it nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.1
```

See [Release Artifacts](../reference/release-artifacts.md#container-images) for the available
versions, and the backend guides for run instructions: [SGLang](../backends/sglang/README.md) |
[TensorRT-LLM](../backends/trtllm/README.md) | [vLLM](../backends/vllm/README.md)
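
If you switch between backends often, the three image names above can be derived from a backend name instead of being copied by hand. The helper below is a minimal sketch: `dynamo_image` is a hypothetical function name, the short backend keys (`sglang`, `trtllm`, `vllm`) are chosen for illustration, and the default tag simply mirrors the `1.0.1` tag used in the commands above.

```bash
# Hypothetical helper (not part of Dynamo): print the runtime image for a backend.
# Image names are taken from the docker commands above; the tag defaults to 1.0.1.
dynamo_image() {
  local backend="$1"
  local version="${2:-1.0.1}"
  local repo
  case "$backend" in
    sglang) repo="sglang-runtime" ;;
    trtllm) repo="tensorrtllm-runtime" ;;
    vllm)   repo="vllm-runtime" ;;
    *)
      echo "unknown backend: $backend" >&2
      return 1
      ;;
  esac
  echo "nvcr.io/nvidia/ai-dynamo/${repo}:${version}"
}
```

With the function defined, `docker run --gpus all --network host --rm -it "$(dynamo_image vllm)"` is equivalent to the vLLM command above.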

**Option B: Install from PyPI**

```bash
# Install uv (recommended Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment
uv venv venv
source venv/bin/activate
uv pip install pip
```

Install system dependencies and the Dynamo wheel for your chosen backend:

**SGLang**

```bash
sudo apt install python3-dev
uv pip install --prerelease=allow "ai-dynamo[sglang]"
```

**TensorRT-LLM**

```bash
sudo apt install python3-dev
pip install torch==2.9.0 torchvision --index-url https://download.pytorch.org/whl/cu130
pip install --pre --extra-index-url https://pypi.nvidia.com "ai-dynamo[trtllm]"
```

**vLLM**

```bash
sudo apt install python3-dev libxcb1
uv pip install --prerelease=allow "ai-dynamo[vllm]"
```

## Run Dynamo

Start the frontend, then start a worker for your chosen backend.

> [!TIP]
> To run everything in a single terminal (useful in containers), append `> logfile.log 2>&1 &`
> to each command to run it in the background. Example: `python3 -m dynamo.frontend --discovery-backend file > dynamo.frontend.log 2>&1 &`

```bash
# Start the OpenAI-compatible frontend (default port is 8000)
# --discovery-backend file avoids needing etcd. Frontend and workers must share a disk.
# The event plane automatically defaults to ZMQ (no NATS required) with this backend.
python3 -m dynamo.frontend --discovery-backend file
```

In another terminal (or the same terminal, if you are using background mode), start a worker:

**SGLang**

```bash
python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --discovery-backend file
```

**TensorRT-LLM**

```bash
python3 -m dynamo.trtllm --model-path Qwen/Qwen3-0.6B --discovery-backend file
```

**vLLM**

```bash
python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --discovery-backend file \
  --kv-events-config '{"enable_kv_cache_events": false}'
```
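
The single-terminal approach from the tip above can be wrapped in a small helper so that each process gets its own log file and a recorded PID. This is a minimal sketch: `run_bg` is a hypothetical function name, and the `<name>.log` and `<name>.pid` file naming is an assumption made for illustration.

```bash
# Hypothetical helper (not part of Dynamo): run a command in the background,
# sending stdout and stderr to <name>.log and saving its PID in <name>.pid.
run_bg() {
  local name="$1"
  shift
  "$@" > "${name}.log" 2>&1 &
  echo "$!" > "${name}.pid"
}
```

For example, `run_bg dynamo.frontend python3 -m dynamo.frontend --discovery-backend file` starts the frontend in the background, and `kill "$(cat dynamo.frontend.pid)"` stops it again. The same call works for any of the worker commands above.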

## Test Your Deployment

```bash
curl localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-0.6B",
       "messages": [{"role": "user", "content": "Hello!"}],
       "max_tokens": 50}'
```
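
A worker may need some time to load the model, so a request sent immediately after startup can fail. One way to handle this is to poll until a probe command succeeds before sending the request. The sketch below is generic: `wait_until` is a hypothetical helper, and the attempt count and delay are arbitrary choices.

```bash
# Hypothetical helper (not part of Dynamo): retry a probe command until it
# succeeds, up to <attempts> times, sleeping <delay> seconds between tries.
wait_until() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

A possible probe is `wait_until 60 2 curl -sf localhost:8000/v1/models`. That probe assumes the frontend also serves the OpenAI-style `/v1/models` route, which this guide does not confirm. If it does not, the chat completions request above can serve as the probe instead.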