README.md 18.4 KB
Newer Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
# Container Development Guide

## Overview

The NVIDIA Dynamo project uses containerized development and deployment to maintain consistent environments across different AI inference frameworks and deployment scenarios. This directory contains the tools for building and running Dynamo containers:

### Core Components

- **`build.sh`** - A Docker image builder that creates containers for different AI inference frameworks (vLLM, TensorRT-LLM, SGLang). It handles framework-specific dependencies, multi-stage builds, and development vs production configurations.

- **`run.sh`** - A container runtime manager that launches Docker containers with proper GPU access, volume mounts, and environment configurations. It supports different development workflows from root-based legacy setups to user-based development environments.

- **Multiple Dockerfiles** - Framework-specific Dockerfiles that define the container images:
  - `Dockerfile.vllm` - For vLLM inference backend
  - `Dockerfile.trtllm` - For TensorRT-LLM inference backend
  - `Dockerfile.sglang` - For SGLang inference backend
  - `Dockerfile` - Base/standalone configuration

### Why Containerization?

Each inference framework (vLLM, TensorRT-LLM, SGLang) has specific CUDA versions, Python dependencies, and system libraries. Containers provide consistent environments, framework isolation, and proper GPU configurations across development and production.

The scripts in this directory abstract away the complexity of Docker commands while providing fine-grained control over build and runtime configurations.

### Convenience Scripts vs Direct Docker Commands

The `build.sh` and `run.sh` scripts are convenience wrappers that simplify common Docker operations. They automatically handle:
- Framework-specific image selection and tagging
- GPU access configuration and runtime selection
- Volume mount setup for development workflows
- Environment variable management
- Build argument construction for multi-stage builds

**You can always use Docker commands directly** if you prefer more control or want to customize beyond what the scripts provide. The scripts use `--dry-run` flags to show you the exact Docker commands they would execute, making it easy to understand and modify the underlying operations.

## Development Targets Feature Matrix

These targets are specified with `build.sh --target <target>` and correspond to Docker multi-stage build targets defined in the Dockerfiles (e.g., `FROM somebase AS <target>`). Some commonly used targets include:

- `runtime` - For running pre-built containers without development tools (minimal size)
41
42
- `dev` - For development (inferencing/benchmarking/etc, runs as root user)
- `local-dev` - For development with local user permissions matching host UID/GID. This is useful when mounting host partitions (with local user permissions) to Docker partitions.
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75

Additional targets are available in the Dockerfiles for specific build stages and use cases.

```
Feature           │ 1. dev + `run.sh`     │ 2. local-dev + `run.sh`  │ 3. local-dev + Dev Container
──────────────────┼───────────────────────┼──────────────────────────┼────────────────────────────
Default User      │ root                  │ ubuntu                   │ ubuntu
──────────────────┼───────────────────────┼──────────────────────────┼────────────────────────────
User Setup        │ None                  │ Matches UID/GID of       │ Matches UID/GID of
                  │                       │ `build.sh` user          │ `build.sh` user
──────────────────┼───────────────────────┼──────────────────────────┼────────────────────────────
Permissions       │ root                  │ ubuntu with sudo         │ ubuntu with sudo
──────────────────┼───────────────────────┼──────────────────────────┼────────────────────────────
Home Directory    │ /root                 │ /home/ubuntu             │ /home/ubuntu
──────────────────┼───────────────────────┼──────────────────────────┼────────────────────────────
Working Directory │ /workspace            │ /workspace               │ /home/ubuntu/dynamo
──────────────────┼───────────────────────┼──────────────────────────┼────────────────────────────
Rust Toolchain    │ System install        │ User install (~/.rustup, │ User install (~/.rustup,
                  │ (/usr/local/rustup,   │  ~/.cargo)               │  ~/.cargo)
                  │  /usr/local/cargo)    │                          │
──────────────────┼───────────────────────┼──────────────────────────┼────────────────────────────
Python Env        │ root owned            │ User owned venv          │ User owned venv
──────────────────┼───────────────────────┼──────────────────────────┼────────────────────────────
File Permissions  │ root-level            │ user-level, safe         │ user-level, safe
──────────────────┼───────────────────────┼──────────────────────────┼────────────────────────────
Compatibility     │ Legacy workflows,     │ workspace writable on NFS│workspace writable on NFS
                  │   workspace not       │                          │
                  │   writable on NFS     │                          │
──────────────────┼───────────────────────┼──────────────────────────┼────────────────────────────
```

## Usage Guidelines

76
77
- **Use dev + `run.sh`**: for command-line testing. Runs as root user
- **Use local-dev + `run.sh`**: for command-line development and Docker mounted partitions using your local user ID
78
79
80
81
- **Use local-dev + Dev Container**: VS Code/Cursor Dev Container Plugin, using your local user ID

## Example Commands

82
### 1. dev + `run.sh` (runs as root):
83
```bash
84
run.sh ...
85
86
```

87
### 2. local-dev + `run.sh` (runs as the local user):
88
```bash
89
run.sh --mount-workspace -it --image dynamo:latest-vllm-local-dev ...
90
91
```

92
93
### 3. local-dev + Dev Container Extension:
Use VS Code/Cursor Dev Container Extension with devcontainer.json configuration
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109

## Build and Run Scripts Overview

### build.sh - Docker Image Builder

The `build.sh` script is responsible for building Docker images for different AI inference frameworks. It supports multiple frameworks and configurations:

**Purpose:**
- Builds Docker images for NVIDIA Dynamo with support for vLLM, TensorRT-LLM, SGLang, or standalone configurations
- Handles framework-specific dependencies and optimizations
- Manages build contexts, caching, and multi-stage builds
- Configures development vs production targets

**Key Features:**
- **Framework Support**: vLLM (default when --framework not specified), TensorRT-LLM, SGLang, or NONE
- **Multi-stage Builds**: Build process with base images
110
- **Development Targets**: Supports `dev` target and `local-dev` target
111
112
113
114
115
116
- **Build Caching**: Docker layer caching and sccache support
- **GPU Optimization**: CUDA, EFA, and NIXL support

**Common Usage Examples:**

```bash
117
# Build vLLM dev image called dynamo:latest-vllm (default). This runs as root and is fine to use for inferencing/benchmarking, etc.
118
119
./build.sh

120
# Build both development and local-dev images (integrated into build.sh). While the dev image runs as root, the local-dev image will run as the local user, which is useful when mounting partitions. It will also contain development tools.
121
122
./build.sh --framework vllm --target local-dev

123
124
125
# Build TensorRT-LLM development image called dynamo:latest-trtllm
./build.sh --framework trtllm

126
127
128
129
130
131
132
133
134
135
136
137
138
# Build with custom tag
./build.sh --framework sglang --tag my-custom-tag

# Dry run to see commands
./build.sh --dry-run

# Build with no cache
./build.sh --no-cache

# Build with build arguments
./build.sh --build-arg CUSTOM_ARG=value
```

139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
### build.sh --dev-image - Local Development Image Builder

The `build.sh --dev-image` option takes a dev image and then builds a local-dev image, which contains proper local user permissions. It also includes extra developer utilities (debugging tools, text editors, system monitors, etc.).

**Common Usage Examples:**

```bash
# Build local-dev image from dev image dynamo:latest-vllm
./build.sh --dev-image dynamo:latest-vllm --framework vllm

# Build with custom tag from dev image dynamo:latest-vllm
./build.sh --dev-image dynamo:latest-vllm --framework vllm --tag my-local:dev

# Dry run to see what would be built
./build.sh --dev-image dynamo:latest-vllm --framework vllm --dry-run
```

156
157
158
159
160
161
162
163
164
165
166
167
168
169
### run.sh - Container Runtime Manager

The `run.sh` script launches Docker containers with the appropriate configuration for development and inference workloads.

**Purpose:**
- Runs pre-built Dynamo Docker images with proper GPU access
- Configures volume mounts, networking, and environment variables
- Supports different development workflows (root vs user-based)
- Manages container lifecycle and resource allocation

**Key Features:**
- **GPU Management**: Automatic GPU detection and allocation
- **Volume Mounting**: Workspace and HuggingFace cache mounting
- **User Management**: Root or user-based container execution
170
- **Network Configuration**: Configurable networking modes (host, bridge, none, container sharing)
171
172
173
174
175
- **Resource Limits**: Memory, file descriptors, and IPC configuration

**Common Usage Examples:**

```bash
176
177
# Basic container launch (inference/production, runs as root user)
./run.sh --image dynamo:latest-vllm -v $HOME/.cache:/home/ubuntu/.cache
178

179
# Mount workspace for development (use local-dev image for local host user permissions)
180
./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -it -v $HOME/.cache:/home/ubuntu/.cache
181

182
# Use specific image and framework for development
183
./run.sh --image v0.1.0.dev.08cc44965-vllm-local-dev --framework vllm --mount-workspace -it -v $HOME/.cache:/home/ubuntu/.cache
184

185
# Interactive development shell with workspace mounted
186
./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache -it -- bash
187

188
# Development with custom environment variables
189
./run.sh --image dynamo:latest-vllm-local-dev -e CUDA_VISIBLE_DEVICES=0,1 --mount-workspace -it -v $HOME/.cache:/home/ubuntu/.cache
190
191
192
193

# Dry run to see docker command
./run.sh --dry-run

194
# Development with custom volume mounts
195
./run.sh --image dynamo:latest-vllm-local-dev -v /host/path:/container/path --mount-workspace -it -v $HOME/.cache:/home/ubuntu/.cache
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
```

### Network Configuration Options

The `run.sh` script supports different networking modes via the `--network` flag (defaults to `host`):

#### Host Networking (Default)
```bash
# Same examples with local host user permissions
./run.sh --image dynamo:latest-vllm-local-dev --network host -v $HOME/.cache:/home/ubuntu/.cache
./run.sh --image dynamo:latest-vllm-local-dev -v $HOME/.cache:/home/ubuntu/.cache
```
**Use cases:**
- High-performance ML inference (default for GPU workloads)
- Services that need direct host port access
- Maximum network performance with minimal overhead
- Sharing services with the host machine (NATS, etcd, etc.)

**⚠️ Port Sharing Limitation:** Host networking shares all ports with the host machine, which means you can only run **one instance** of services like NATS (port 4222) or etcd (port 2379) across all containers and the host.

#### Bridge Networking (Isolated)
```bash
# CI/testing with isolated bridge networking and host cache sharing
219
./run.sh --image dynamo:latest-vllm --mount-workspace -it --network bridge -v $HOME/.cache:/home/ubuntu/.cache
220
221
222
223
224
225
226
227
228
229
230
231
```
**Use cases:**
- Secure isolation from host network
- CI/CD pipelines requiring complete isolation
- When you need absolute control of ports
- Exposing specific services to host while maintaining isolation

**Note:** For port sharing with the host, use the `--port` or `-p` option with format `host_port:container_port` (e.g., `--port 8000:8000` or `-p 9081:8081`) to expose specific container ports to the host.

#### No Networking ⚠️ **LIMITED FUNCTIONALITY**
```bash
# Complete network isolation - no external connectivity
232
./run.sh --image dynamo:latest-vllm --network none --mount-workspace -it -v $HOME/.cache:/home/ubuntu/.cache
233
234

# Same with local user permissions
235
./run.sh --image dynamo:latest-vllm-local-dev --network none --mount-workspace -it -v $HOME/.cache:/home/ubuntu/.cache
236
```
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
**⚠️ WARNING: `--network none` severely limits Dynamo functionality:**
- **No model downloads** - HuggingFace models cannot be downloaded
- **No API access** - Cannot reach external APIs or services
- **No distributed inference** - Multi-node setups won't work
- **No monitoring/logging** - External monitoring systems unreachable
- **Limited debugging** - Cannot access external debugging tools

**Very limited use cases:**
- Pre-downloaded models with purely local processing
- Air-gapped security environments (models must be pre-staged)

#### Container Network Sharing
Use `--network container:name` to share the network namespace with another container.

**Use cases:**
- Sidecar patterns (logging, monitoring, caching)
- Service mesh architectures
- Sharing network namespaces between related containers

See Docker documentation for `--network container:name` usage.

#### Custom Networks
Use custom Docker networks for multi-container applications. Create with `docker network create` and specify with `--network network-name`.

**Use cases:**
- Multi-container applications
- Service discovery by container name

See Docker documentation for custom network creation and management.

#### Network Mode Comparison

| Mode | Performance | Security | Use Case | Dynamo Compatibility | Port Sharing | Port Publishing |
|------|-------------|----------|----------|---------------------|---------------|-----------------|
| `host` | Highest | Lower | ML/GPU workloads, high-performance services | ✅ Full | ⚠️ **Shared with host** (one NATS/etcd only) | ❌ Not needed |
| `bridge` | Good | Higher | General web services, controlled port exposure | ✅ Full | ✅ Isolated ports | ✅ `-p host:container` |
| `none` | N/A | Highest | Air-gapped environments only | ⚠️ **Very Limited** | ✅ No network | ❌ No network |
| `container:name` | Good | Medium | Sidecar patterns, shared network stacks | ✅ Full | ⚠️ Shared with target container | ❌ Use target's ports |
| Custom networks | Good | Medium | Multi-container applications | ✅ Full | ✅ Isolated ports | ✅ `-p host:container` |
276
277
278
279
280

## Workflow Examples

### Development Workflow
```bash
281
# 1. Build local-dev image (creates both dynamo:latest-vllm and dynamo:latest-vllm-local-dev)
282
283
./build.sh --framework vllm --target local-dev

284
# 2. Run development container using the local-dev image
285
./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache -it
286
287
288
289
290
291

# 3. Inside container, run inference (requires both frontend and backend)
# Start frontend
python -m dynamo.frontend &

# Start backend (vLLM example)
292
python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.20 &
293
294
295
296
297
298
299
```

### Production Workflow
```bash
# 1. Build production image
./build.sh --framework vllm --release-build

300
301
# 2. Run production container (runs as root)
./run.sh --image dynamo:latest-vllm --gpus all
302
303
```

304
### CI/CD Workflow
305
```bash
306
# 1. Build image for CI
307
308
./build.sh --framework vllm --no-cache

309
# 2. Run tests with network isolation for reproducible results
310
./run.sh --image dynamo:latest-vllm --mount-workspace -it --network bridge -v $HOME/.cache:/home/ubuntu/.cache -- python -m pytest tests/
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326

# 3. Inside the container with bridge networking, start services
# Note: Services are only accessible from the same container - no port conflicts with host
nats-server -js &
etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://0.0.0.0:2379 --data-dir /tmp/etcd &
python -m dynamo.frontend &

# 4. Start worker backend (choose one framework):
# vLLM
DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.20 --enforce-eager --no-enable-prefix-caching --max-num-seqs 64 &

# SGLang
DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 python -m dynamo.sglang --model Qwen/Qwen3-0.6B --mem-fraction-static 0.20 --max-running-requests 64 &

# TensorRT-LLM
DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 python -m dynamo.trtllm --model Qwen/Qwen3-0.6B --free-gpu-memory-fraction 0.20 --max-num-tokens 8192 --max-batch-size 64 &
327
```
328
329
330
331
332

**Framework-Specific GPU Memory Arguments:**
- **vLLM**: `--gpu-memory-utilization 0.20` (use 20% GPU memory), `--enforce-eager` (disable CUDA graphs), `--no-enable-prefix-caching` (save memory), `--max-num-seqs 64` (max concurrent sequences)
- **SGLang**: `--mem-fraction-static 0.20` (20% GPU memory for static allocation), `--max-running-requests 64` (max concurrent requests)
- **TensorRT-LLM**: `--free-gpu-memory-fraction 0.20` (reserve 20% GPU memory), `--max-num-tokens 8192` (max tokens in batch), `--max-batch-size 64` (max batch size)