# Container Development Guide

## Overview

The NVIDIA Dynamo project uses containerized development and deployment to maintain consistent environments across different AI inference frameworks and deployment scenarios. This directory contains the tools for building and running Dynamo containers:

### Core Components

- **`build.sh`** - A Docker image builder that creates containers for different AI inference frameworks (vLLM, TensorRT-LLM, SGLang). It handles framework-specific dependencies, multi-stage builds, and development vs production configurations.

- **`run.sh`** - A container runtime manager that launches Docker containers with proper GPU access, volume mounts, and environment configurations. It supports different development workflows from root-based legacy setups to user-based development environments.

- **Multiple Dockerfiles** - Framework-specific Dockerfiles that define the container images:
  - `Dockerfile.vllm` - For vLLM inference backend
  - `Dockerfile.trtllm` - For TensorRT-LLM inference backend
  - `Dockerfile.sglang` - For SGLang inference backend
  - `Dockerfile` - Base/standalone configuration
  - `Dockerfile.frontend` - For Kubernetes Gateway API Inference Extension integration with EPP
  - `Dockerfile.epp` - For building the Endpoint Picker (EPP) image

### Why Containerization?

Each inference framework (vLLM, TensorRT-LLM, SGLang) has specific CUDA versions, Python dependencies, and system libraries. Containers provide consistent environments, framework isolation, and proper GPU configurations across development and production.

The scripts in this directory abstract away the complexity of Docker commands while providing fine-grained control over build and runtime configurations.

### Convenience Scripts vs Direct Docker Commands

The `build.sh` and `run.sh` scripts are convenience wrappers that simplify common Docker operations. They automatically handle:
- Framework-specific image selection and tagging
- GPU access configuration and runtime selection
- Volume mount setup for development workflows
- Environment variable management
- Build argument construction for multi-stage builds

**You can always use Docker commands directly** if you prefer more control or want to customize beyond what the scripts provide. Both scripts accept a `--dry-run` flag that prints the exact Docker commands they would execute, making it easy to understand and modify the underlying operations.
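
As a rough illustration of what a hand-written equivalent could look like, the sketch below prints (but does not execute) a `docker run` command with GPU access, host networking, and a workspace mount. The function name and the flag set are assumptions made for illustration, built from standard Docker options. They are not the actual output of `run.sh --dry-run`, which remains the authoritative source.

```bash
# Hypothetical helper: print a docker run command instead of executing it.
# The flag set is an assumption for illustration, not what run.sh emits.
print_docker_run() {
  local image="$1"      # e.g. dynamo:latest-vllm
  local workspace="$2"  # host directory to mount at /workspace
  printf 'docker run --rm --gpus all --network host -v %s:/workspace -w /workspace %s\n' \
    "$workspace" "$image"
}

print_docker_run dynamo:latest-vllm "$PWD"
```

Comparing a line like this with the real `--dry-run` output is a quick way to see which mounts, environment variables, and resource limits the scripts add on your behalf.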

## Development Targets Feature Matrix

These targets are specified with `build.sh --target <target>` and correspond to Docker multi-stage build targets defined in the Dockerfiles (e.g., `FROM somebase AS <target>`). Some commonly used targets include:

- `runtime` - For running pre-built containers without development tools (minimal size, runs as non-root `dynamo` user with UID 1000 and GID 0)
- `dev` - For development (inferencing/benchmarking/etc, runs as root user for maximum flexibility)
- `local-dev` - For development with local user permissions matching the host UID/GID. This is useful when mounting host directories (owned by your local user) into the container, because the `dynamo` user's UID/GID is remapped to match the host user.

Additional targets are available in the Dockerfiles for specific build stages and use cases.

| Feature | **dev + `run.sh`** | **local-dev + `run.sh`** | **local-dev + Dev Container** |
|---------|-------------------|--------------------------|-------------------------------|
| **Default User** | root | dynamo (matched to host UID/GID) | dynamo (matched to host UID/GID) |
| **User Setup** | None (root) | Matches UID/GID of `build.sh` user | Matches UID/GID of `build.sh` user |
| **Permissions** | root | dynamo with sudo | dynamo with sudo |
| **Home Directory** | `/root` | `/home/dynamo` | `/home/dynamo` |
| **Working Directory** | `/workspace` | `/workspace` | `/workspace` |
| **Rust Toolchain** | System install (`/usr/local/rustup`, `/usr/local/cargo`) | User install (`~/.rustup`, `~/.cargo`) | User install (`~/.rustup`, `~/.cargo`) |
| **Python Env** | dynamo user owned | dynamo owned venv | dynamo owned venv |
| **File Permissions** | root-level | user-level (dynamo), safe | user-level (dynamo), safe |
| **Compatibility** | Legacy workflows, maximum flexibility | workspace writable on NFS, non-root security | workspace writable on NFS, non-root security |
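
To confirm which column of the matrix a running container falls into, a quick check of the effective user, home directory, and workspace writability can help. This is a minimal sketch using only standard shell utilities. `/workspace` comes from the table above, and the function name is made up for illustration.

```bash
# Hypothetical helper: summarize the identity the current shell runs as
# and whether the given directory (default: /workspace) is writable.
describe_container_user() {
  local dir="${1:-/workspace}"
  echo "uid=$(id -u) gid=$(id -g) home=${HOME:-unset}"
  if [ -w "$dir" ]; then
    echo "$dir is writable"
  else
    echo "$dir is not writable (or does not exist)"
  fi
}

describe_container_user /workspace
```

In a `local-dev` container the reported UID/GID would be expected to match `id -u` and `id -g` on the host, while a `dev` container would report `uid=0`.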

## Environment Variables Across Build Stages

Understanding how environment variables change across different build stages is crucial for development and debugging. The Dynamo build system uses a multi-stage Docker build process where environment variables are set, inherited, and overridden at different stages.

### Build Stage Flow

```
Dockerfile → base → dev (dynamo-base image)

Dockerfile.vllm → framework → runtime → dev (vllm dev image)

Dockerfile.local_dev → local-dev (from vllm dev image)
```

### Environment Variables by Stage

| Variable | **base** | **base→dev** | **vllm→framework** | **vllm→runtime** | **vllm→dev** | **local-dev** |
|----------|----------|--------------|--------------------|------------------|--------------|---------------|
| **DYNAMO_HOME** | ❌ Not set | `/opt/dynamo` | ❌ Not set | `/opt/dynamo` | `/workspace` **OVERRIDE** | `/workspace` (inherited) |
| **WORKSPACE_DIR** | ❌ Not set | ❌ Not set | ❌ Not set | ❌ Not set | `/workspace` | `/workspace` (inherited) |
| **CARGO_TARGET_DIR** | ❌ Not set | `/opt/dynamo/target` | ❌ Not set | ❌ Not set | `/workspace/target` **OVERRIDE** | `/workspace/target` (inherited) |
| **VIRTUAL_ENV** | `/opt/dynamo/venv` | (inherited) | `/opt/dynamo/venv` | `/opt/dynamo/venv` | `/opt/dynamo/venv` **REDEFINE** | `/opt/dynamo/venv` (inherited) |
| **RUSTUP_HOME** | `/usr/local/rustup` | (inherited) | ❌ Not set | ❌ Not set | `/usr/local/rustup` | `/home/dynamo/.rustup` **OVERRIDE** |
| **CARGO_HOME** | `/usr/local/cargo` | (inherited) | ❌ Not set | ❌ Not set | `/usr/local/cargo` | `/home/dynamo/.cargo` **OVERRIDE** |
| **USERNAME** | ❌ Not set | `dynamo` | ❌ Not set | `dynamo` | ❌ Not set | `dynamo` |
| **HOME** | (system default) | `/home/dynamo` | (system default) | `/home/dynamo` | (system default) | `/home/dynamo` |
| **PATH** | (includes cargo) | (inherited) | (system default) | (includes venv, etcd, ucx) | `/usr/local/cargo/bin:$PATH` | `/home/dynamo/.cargo/bin:$PATH` **OVERRIDE** |

### Key Insights

**1. DYNAMO_HOME Dual Purpose:**
- `base→dev` and `vllm→runtime`: `/opt/dynamo` - For **installed/packaged** Dynamo (CI, production)
- `vllm→dev` and `local-dev`: `/workspace` - For **development** with source code mounted from host

**2. Rust Toolchain Location:**
- `dev` target: System-wide at `/usr/local/rustup` and `/usr/local/cargo` (suitable for root)
- `local-dev` target: User-specific at `/home/dynamo/.rustup` and `/home/dynamo/.cargo` (proper UID/GID ownership)

**3. Build Artifacts Location:**
- `base→dev`: `/opt/dynamo/target` - Build artifacts with installed package
- `vllm→dev` onward: `/workspace/target` - Build artifacts in mounted workspace for persistence

**4. Variables That Stay Constant:**
- `VIRTUAL_ENV`: Always `/opt/dynamo/venv` (ownership changes in local-dev via rsync)
- `WORKSPACE_DIR`: Always `/workspace` once set in vllm→dev
- `DYNAMO_HOME`: Always `/workspace` once overridden in vllm→dev (for development)

**5. local-dev Specific Changes:**
From `Dockerfile.local_dev`, the Rust toolchain is moved to user home because:
- Workspace mount points may change, breaking toolchain paths
- User needs ownership of cargo binaries and registry for package installation
- Toolchain requires consistent system paths that don't depend on workspace location

The Python venv ownership is also updated via rsync in local-dev to match the user's UID/GID, ensuring package installation permissions work correctly.

**6. Non-Root User Architecture:**
Dynamo containers implement a multi-stage user strategy:
- **runtime stage**: Runs as non-root `dynamo` user (UID 1000, GID 0) for production workloads
- **dev stage**: Runs as root for maximum development flexibility (builds on runtime but switches to root)
- **local-dev stage**: Runs as `dynamo` user with UID/GID matched to host user for safe file system operations
- **Security**: Runtime and local-dev use non-root execution to reduce attack surface
- **File Ownership**: All application files, virtual environments, and build artifacts are owned by `dynamo:root` (1000:0) in runtime stage
- **Environment Setup**: Launch banner moved to `/opt/dynamo/.launch_screen` (shared across all users) and venv activation configured in `/etc/bash.bashrc` for system-wide availability. This replaces the previous per-user `~/.launch_screen` and `~/.bashrc` approach.
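
When debugging a path problem, it can help to print the variables from the "Environment Variables by Stage" table as the current shell actually sees them and compare them against the expected column. The following is a minimal sketch. The function name is hypothetical, and the variable list is copied from that table.

```bash
# Hypothetical helper: print the stage-dependent variables as seen by this shell.
show_dynamo_env() {
  local name
  for name in DYNAMO_HOME WORKSPACE_DIR CARGO_TARGET_DIR VIRTUAL_ENV \
              RUSTUP_HOME CARGO_HOME USERNAME HOME; do
    # printenv prints the value, or fails if the variable is not in the environment
    printf '%-18s %s\n' "$name" "$(printenv "$name" || echo '(not set)')"
  done
}

show_dynamo_env
```

Running this inside a container started from each target is a simple way to confirm, for example, that `DYNAMO_HOME` points at `/workspace` in `dev` and `local-dev` images.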

## Usage Guidelines

- **Use the `runtime` target** for production deployments. It runs as the non-root `dynamo` user (UID 1000, GID 0) for security.
- **Use dev + `run.sh`** for command-line testing and inferencing. It runs as root for maximum flexibility.
- **Use local-dev + `run.sh`** for command-line development with host directories mounted into the container. It runs as the `dynamo` user with UID/GID matched to your local user. Add the `-it` flag for interactive sessions.
- **Use local-dev + Dev Container** for development in VS Code/Cursor through the Dev Container extension, using the `dynamo` user with UID/GID matched to your local user.

## Example Commands

### 1. dev + `run.sh` (runs as root):
```bash
run.sh ...
```

### 2. local-dev + `run.sh` (runs as dynamo user with matched host UID/GID):
```bash
run.sh --mount-workspace -it --image dynamo:latest-vllm-local-dev ...
```

### 3. local-dev + Dev Container Extension:
Use VS Code/Cursor Dev Container Extension with devcontainer.json configuration. The `dynamo` user UID/GID is automatically matched to your local user.

### 4. runtime target (runs as non-root dynamo user):
```bash
# Build runtime image
./build.sh --framework vllm --target runtime

# Run runtime container
./run.sh --image dynamo:latest-vllm-runtime
```

## Build and Run Scripts Overview

### build.sh - Docker Image Builder

The `build.sh` script is responsible for building Docker images for different AI inference frameworks. It supports multiple frameworks and configurations:

**Purpose:**
- Builds Docker images for NVIDIA Dynamo with support for vLLM, TensorRT-LLM, SGLang, or standalone configurations
- Handles framework-specific dependencies and optimizations
- Manages build contexts, caching, and multi-stage builds
- Configures development vs production targets

**Key Features:**
- **Framework Support**: vLLM (default when --framework not specified), TensorRT-LLM, SGLang, or NONE
- **Multi-stage Builds**: Multi-stage build process layered on base images
- **Development Targets**: Supports `dev` target and `local-dev` target
- **Build Caching**: Docker layer caching and sccache support
- **GPU Optimization**: CUDA, EFA, and NIXL support

**Common Usage Examples:**

```bash
# Build vLLM dev image called dynamo:latest-vllm (default). This runs as root and is fine to use for inferencing/benchmarking, etc.
./build.sh

# Build both development and local-dev images (integrated into build.sh). While the dev image runs as root, the local-dev image will run as dynamo user with UID/GID matched to your host user, which is useful when mounting partitions. It will also contain development tools.
./build.sh --framework vllm --target local-dev

# Build TensorRT-LLM development image called dynamo:latest-trtllm
./build.sh --framework trtllm

# Build with custom tag
./build.sh --framework sglang --tag my-custom-tag

# Dry run to see commands
./build.sh --dry-run

# Build with no cache
./build.sh --no-cache

# Build with build arguments
./build.sh --build-arg CUSTOM_ARG=value
```

### build.sh --dev-image - Local Development Image Builder

The `build.sh --dev-image` option takes a dev image and then builds a local-dev image, which contains proper local user permissions. It also includes extra developer utilities (debugging tools, text editors, system monitors, etc.).

**Common Usage Examples:**

```bash
# Build local-dev image from dev image dynamo:latest-vllm
./build.sh --dev-image dynamo:latest-vllm --framework vllm

# Build with custom tag from dev image dynamo:latest-vllm
./build.sh --dev-image dynamo:latest-vllm --framework vllm --tag my-local:dev

# Dry run to see what would be built
./build.sh --dev-image dynamo:latest-vllm --framework vllm --dry-run
```

### Building the Frontend Image

The frontend image is a specialized container that includes the Dynamo components (NATS, etcd, dynamo, NIXL, etc) along with the Endpoint Picker (EPP) for Kubernetes Gateway API Inference Extension integration. This image is primarily used for inference gateway deployments.

**Step 1: Build the Custom Dynamo EPP Image**

Follow the instructions in [`deploy/inference-gateway/README.md`](../deploy/inference-gateway/README.md) under "Build the custom EPP image" section. This process:
- Clones the Gateway API Inference Extension repository
- Applies Dynamo-specific patches for custom routing
- Builds the Dynamo router as a static library
- Creates a custom EPP image with integrated Dynamo routing capabilities

**Step 2: Build the Dynamo Base Image**

The base image contains the core Dynamo runtime components, NATS server, etcd, and Python dependencies:
```bash
# Build the base dev image (framework=none for frontend-only deployment)
./build.sh --framework none --target dev
```

**Step 3: Build the Frontend Image**

Now build the frontend image that combines the Dynamo base with the EPP:

```bash
# Build the frontend image using the pre-built EPP
docker buildx build --load --platform linux/amd64 \
  --build-arg DYNAMO_BASE_IMAGE=dynamo:latest-none-dev \
  --build-arg EPP_IMAGE={EPP_IMAGE_TAG} \
  --build-arg PYTHON_VERSION=3.12 \
  -f container/Dockerfile.frontend \
  -t dynamo:latest-none-frontend \
  .
```
#### Frontend Image Contents

The frontend image includes:
- **EPP (Endpoint Picker)**: Handles request routing and load balancing for inference gateway
- **Dynamo Runtime**: Core platform components and routing logic
- **NIXL**: NVIDIA Inference Xfer Library for high-performance network communication
- **Benchmarking Tools**: Performance testing utilities (aiperf, aiconfigurator, etc)
- **Python Environment**: Virtual environment with all required dependencies
- **NATS Server**: Message broker for Dynamo's distributed communication
- **etcd**: Distributed key-value store for configuration and coordination

#### Deployment

The frontend image is designed for Kubernetes deployment with the Gateway API Inference Extension. See [`deploy/inference-gateway/README.md`](../deploy/inference-gateway/README.md) for complete deployment instructions using Helm charts.

### run.sh - Container Runtime Manager

The `run.sh` script launches Docker containers with the appropriate configuration for development and inference workloads.

**Purpose:**
- Runs pre-built Dynamo Docker images with proper GPU access
- Configures volume mounts, networking, and environment variables
- Supports different development workflows (root vs user-based)
- Manages container lifecycle and resource allocation

**Key Features:**
- **GPU Management**: Automatic GPU detection and allocation
- **Volume Mounting**: Workspace and HuggingFace cache mounting
- **User Management**: Non-root `dynamo` user execution (UID 1000, GID 0), with optional `--user` flag to override
- **Network Configuration**: Configurable networking modes (host, bridge, none, container sharing)
- **Resource Limits**: Memory, file descriptors, and IPC configuration
- **Interactive Mode**: Use `-it` flag for interactive terminal sessions (required for shells, debugging, and interactive development)

**Common Usage Examples:**

```bash
# Basic container launch with dev image (runs as root by default, non-interactive)
./run.sh --image dynamo:latest-vllm -v $HOME/.cache:/root/.cache

# Interactive development with workspace mounted using dev image (runs as root)
./run.sh --image dynamo:latest-vllm --mount-workspace -it -v $HOME/.cache:/root/.cache

# Interactive development with local-dev image (runs as dynamo user with matched host UID/GID)
./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -it -v $HOME/.cache:/home/dynamo/.cache

# Use specific image and framework for development
./run.sh --image v0.1.0.dev.08cc44965-vllm-local-dev --framework vllm --mount-workspace -it -v $HOME/.cache:/home/dynamo/.cache

# Interactive development shell with workspace mounted (local-dev)
./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -v $HOME/.cache:/home/dynamo/.cache -it -- bash

# Development with custom environment variables
./run.sh --image dynamo:latest-vllm-local-dev -e CUDA_VISIBLE_DEVICES=0,1 --mount-workspace -it -v $HOME/.cache:/home/dynamo/.cache

# Dry run to see docker command
./run.sh --dry-run

# Development with custom volume mounts
./run.sh --image dynamo:latest-vllm-local-dev -v /host/path:/container/path --mount-workspace -it -v $HOME/.cache:/home/dynamo/.cache

# Run runtime image as non-root dynamo user (for production)
./run.sh --image dynamo:latest-vllm-runtime -v $HOME/.cache:/home/dynamo/.cache

# Run dev image as specific user (override default root)
./run.sh --image dynamo:latest-vllm --user dynamo -v $HOME/.cache:/home/dynamo/.cache
```

### Network Configuration Options

The `run.sh` script supports different networking modes via the `--network` flag (defaults to `host`):

#### Host Networking (Default)
```bash
# Examples with dynamo user
./run.sh --image dynamo:latest-vllm-local-dev --network host -v $HOME/.cache:/home/dynamo/.cache
./run.sh --image dynamo:latest-vllm-local-dev -v $HOME/.cache:/home/dynamo/.cache
```
**Use cases:**
- High-performance ML inference (default for GPU workloads)
- Services that need direct host port access
- Maximum network performance with minimal overhead
- Sharing services with the host machine (NATS, etcd, etc.)

**⚠️ Port Sharing Limitation:** Host networking shares all ports with the host machine, which means you can only run **one instance** of services like NATS (port 4222) or etcd (port 2379) across all containers and the host.
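
Because of this limitation, it can be worth checking whether the NATS and etcd ports are already taken before starting a host-networked container. The sketch below is one way to do that with bash's `/dev/tcp` pseudo-device. The function name is hypothetical, and the check only probes `127.0.0.1`, so a service bound exclusively to another interface would not be detected.

```bash
# Hypothetical helper: succeed if something already listens on the local TCP port.
# Relies on bash's /dev/tcp pseudo-device, so it needs bash rather than plain sh.
port_in_use() {
  local port="$1"
  (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
}

for port in 4222 2379; do   # NATS and etcd ports named above
  if port_in_use "$port"; then
    echo "port $port is already taken on the host"
  else
    echo "port $port is free"
  fi
done
```

If either port is reported as taken, bridge networking (described next) avoids the conflict by giving the container its own port space.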

#### Bridge Networking (Isolated)
```bash
# CI/testing with isolated bridge networking and host cache sharing (no -it for automated CI)
./run.sh --image dynamo:latest-vllm --mount-workspace --network bridge -v $HOME/.cache:/home/dynamo/.cache
```
**Use cases:**
- Secure isolation from host network
- CI/CD pipelines requiring complete isolation
- When you need absolute control of ports
- Exposing specific services to host while maintaining isolation

**Note:** For port sharing with the host, use the `--port` or `-p` option with format `host_port:container_port` (e.g., `--port 8000:8000` or `-p 9081:8081`) to expose specific container ports to the host.

#### No Networking ⚠️ **LIMITED FUNCTIONALITY**
```bash
# Complete network isolation - no external connectivity
./run.sh --image dynamo:latest-vllm --network none --mount-workspace -it -v $HOME/.cache:/home/dynamo/.cache

# Same with local-dev image (dynamo user with matched host UID/GID)
./run.sh --image dynamo:latest-vllm-local-dev --network none --mount-workspace -it -v $HOME/.cache:/home/dynamo/.cache
```
**⚠️ WARNING: `--network none` severely limits Dynamo functionality:**
- **No model downloads** - HuggingFace models cannot be downloaded
- **No API access** - Cannot reach external APIs or services
- **No distributed inference** - Multi-node setups won't work
- **No monitoring/logging** - External monitoring systems unreachable
- **Limited debugging** - Cannot access external debugging tools

**Very limited use cases:**
- Pre-downloaded models with purely local processing
- Air-gapped security environments (models must be pre-staged)

#### Container Network Sharing
Use `--network container:name` to share the network namespace with another container.

**Use cases:**
- Sidecar patterns (logging, monitoring, caching)
- Service mesh architectures
- Sharing network namespaces between related containers

See Docker documentation for `--network container:name` usage.

#### Custom Networks
Use custom Docker networks for multi-container applications. Create with `docker network create` and specify with `--network network-name`.

**Use cases:**
- Multi-container applications
- Service discovery by container name

See Docker documentation for custom network creation and management.
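
The two steps described above might be combined as in the following sketch. The network name `dynamo-net` and the `run` wrapper are assumptions for illustration. The wrapper only echoes each command unless `DRY_RUN=0` is set, so the snippet is safe to paste and inspect first.

```bash
# Echo commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

NETWORK_NAME="dynamo-net"   # hypothetical network name

# 1. Create the user-defined network once
run docker network create "$NETWORK_NAME"

# 2. Attach a Dynamo container to it by name
run ./run.sh --image dynamo:latest-vllm-local-dev --network "$NETWORK_NAME" --mount-workspace -it
```

Containers attached to the same user-defined network can then reach each other by container name, which is the service-discovery use case listed above.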

#### Network Mode Comparison

| Mode | Performance | Security | Use Case | Dynamo Compatibility | Port Sharing | Port Publishing |
|------|-------------|----------|----------|---------------------|---------------|-----------------|
| `host` | Highest | Lower | ML/GPU workloads, high-performance services | ✅ Full | ⚠️ **Shared with host** (one NATS/etcd only) | ❌ Not needed |
| `bridge` | Good | Higher | General web services, controlled port exposure | ✅ Full | ✅ Isolated ports | ✅ `-p host:container` |
| `none` | N/A | Highest | Air-gapped environments only | ⚠️ **Very Limited** | ✅ No network | ❌ No network |
| `container:name` | Good | Medium | Sidecar patterns, shared network stacks | ✅ Full | ⚠️ Shared with target container | ❌ Use target's ports |
| Custom networks | Good | Medium | Multi-container applications | ✅ Full | ✅ Isolated ports | ✅ `-p host:container` |

## Workflow Examples

### Development Workflow
```bash
# 1. Build local-dev image (creates both dynamo:latest-vllm and dynamo:latest-vllm-local-dev)
./build.sh --framework vllm --target local-dev

# 2. Run development container using the local-dev image
./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -v $HOME/.cache:/home/dynamo/.cache -it

# 3. Inside container, run inference (requires both frontend and backend)
# Start frontend
python -m dynamo.frontend &

# Start backend (vLLM example)
python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.20 &
```

### Production Workflow
```bash
# 1. Build production runtime image (runs as non-root dynamo user)
./build.sh --framework vllm --target runtime

# 2. Run production container as non-root dynamo user
./run.sh --image dynamo:latest-vllm-runtime --gpus all -v $HOME/.cache:/home/dynamo/.cache
```

### Testing Workflow
```bash
# 1. Build dev image
./build.sh --framework vllm --no-cache

# 2. Run tests with network isolation for reproducible results (no -it needed for CI)
./run.sh --image dynamo:latest-vllm --mount-workspace --network bridge -v $HOME/.cache:/home/dynamo/.cache -- python -m pytest tests/

# 3. Inside the container with bridge networking, start services
# Note: Services are only accessible from the same container - no port conflicts with host
nats-server -js &
etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://0.0.0.0:2379 --data-dir /tmp/etcd &
python -m dynamo.frontend &

# 4. Start worker backend (choose one framework):
# vLLM
DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.20 --enforce-eager --no-enable-prefix-caching --max-num-seqs 64 &

# SGLang
DYN_SYSTEM_PORT=8081 python -m dynamo.sglang --model Qwen/Qwen3-0.6B --mem-fraction-static 0.20 --max-running-requests 64 &

# TensorRT-LLM
DYN_SYSTEM_PORT=8081 python -m dynamo.trtllm --model Qwen/Qwen3-0.6B --free-gpu-memory-fraction 0.20 --max-num-tokens 8192 --max-batch-size 64 &
```

**Framework-Specific GPU Memory Arguments:**
- **vLLM**: `--gpu-memory-utilization 0.20` (use 20% GPU memory), `--enforce-eager` (disable CUDA graphs), `--no-enable-prefix-caching` (save memory), `--max-num-seqs 64` (max concurrent sequences)
- **SGLang**: `--mem-fraction-static 0.20` (20% GPU memory for static allocation), `--max-running-requests 64` (max concurrent requests)
- **TensorRT-LLM**: `--free-gpu-memory-fraction 0.20` (use 20% of free GPU memory for the KV cache), `--max-num-tokens 8192` (max tokens in batch), `--max-batch-size 64` (max batch size)
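
For a back-of-the-envelope sense of what a `0.20` fraction means, the sketch below converts a percentage into MiB. The function name and the 80 GiB figure are illustrative assumptions. Each framework applies its fraction to a different baseline (for example total versus currently free GPU memory), so treat the result as an estimate only.

```bash
# Hypothetical helper: convert a memory fraction (as a whole percentage) into MiB.
gpu_budget_mib() {
  local total_mib="$1"
  local percent="$2"
  echo $(( total_mib * percent / 100 ))
}

# Example: an 80 GiB GPU (81920 MiB) with a 0.20 fraction
gpu_budget_mib 81920 20   # prints 16384
```

On a real system, the total could be read from `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits` and passed in as the first argument.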