---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: SGLang
---

# Running SGLang with Dynamo

## Use the Latest Release

We recommend using the latest stable release of Dynamo to avoid breaking changes:

[![GitHub Release](https://img.shields.io/github/v/release/ai-dynamo/dynamo)](https://github.com/ai-dynamo/dynamo/releases/latest)

Find the latest release on the [releases page](https://github.com/ai-dynamo/dynamo/releases/latest), then check out the corresponding tag:

```bash
git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
```
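
The one-liner above nests two Git commands. As a minimal sketch, the same steps can be split out so each one is visible. The function name `checkout_latest_tag` is hypothetical and not part of Dynamo. Note that this selects the most recently tagged commit, which may be a pre-release if the repository tags those.

```bash
#!/usr/bin/env bash
# Hypothetical helper: same steps as the one-liner, split out for readability.
checkout_latest_tag() {
  local latest_commit latest_tag
  # Newest commit that is reachable from any tag.
  latest_commit=$(git rev-list --tags --max-count=1) || return 1
  # Name of the tag closest to that commit (lightweight tags included).
  latest_tag=$(git describe --tags "$latest_commit") || return 1
  echo "Checking out ${latest_tag}"
  git checkout "$latest_tag"
}
```

Run `checkout_latest_tag` from inside the cloned Dynamo repository.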

---

Dynamo SGLang integrates [SGLang](https://github.com/sgl-project/sglang) engines into Dynamo's distributed runtime, enabling disaggregated serving, KV-aware routing, and request cancellation while maintaining full compatibility with SGLang's native engine arguments. It supports LLM inference, embedding models, multimodal vision models, and diffusion-based generation (LLM, image, video).

## Installation

### Install Latest Release

We recommend using [uv](https://github.com/astral-sh/uv) to install:

```bash
uv venv --python 3.12 --seed
uv pip install "ai-dynamo[sglang]"
```

This installs Dynamo with the compatible SGLang version.

### Install for Development

<Accordion title="Development installation">
Requires Rust and the CUDA toolkit (`nvcc`).

```bash
# install dynamo
uv venv --python 3.12 --seed
uv pip install maturin nixl
cd $DYNAMO_HOME/lib/bindings/python
maturin develop --uv
cd $DYNAMO_HOME
uv pip install -e .

# install sglang
git clone https://github.com/sgl-project/sglang.git
cd sglang && uv pip install -e "python"
```

This setup also suits coding agents: give the agent the paths to both repositories and the virtual environment, and have it rerun these commands as it makes changes.
</Accordion>

### Docker

<Accordion title="Build and run container">
```bash
cd $DYNAMO_ROOT
python container/render.py --framework sglang --output-short-filename
docker build -f container/rendered.Dockerfile -t dynamo:latest-sglang .
```

```bash
docker run \
    --gpus all -it --rm \
    --network host --shm-size=10G \
    --ulimit memlock=-1 --ulimit stack=67108864 \
    --ulimit nofile=65536:65536 \
    --cap-add CAP_SYS_PTRACE --ipc host \
    dynamo:latest-sglang
```
</Accordion>

## Feature Support Matrix

| Feature | Status | Notes |
|---------|--------|-------|
| [**Disaggregated Serving**](../../design-docs/disagg-serving.md) | ✅ | Prefill/decode separation with NIXL KV transfer |
| [**KV-Aware Routing**](../../components/router/README.md) | ✅ | |
| [**SLA-Based Planner**](../../components/planner/planner-guide.md) | ✅ | |
| [**Multimodal Support**](../../features/multimodal/multimodal-sglang.md) | ✅ | Image via EPD, E/PD, E/P/D patterns |
| [**Diffusion Models**](sglang-diffusion.md) | ✅ | LLM diffusion, image, and video generation |
| [**Request Cancellation**](../../fault-tolerance/request-cancellation.md) | ✅ | Aggregated full; disaggregated decode-only |
| [**Graceful Shutdown**](../../fault-tolerance/graceful-shutdown.md) | ✅ | Discovery unregister + grace period |
| [**Observability**](sglang-observability.md) | ✅ | Metrics, tracing, and Grafana dashboards |
| [**KVBM**](../../components/kvbm/README.md) | ❌ | Planned |

## Quick Start

### Python / CLI Deployment

Start infrastructure services for local development:

```bash
docker compose -f deploy/docker-compose.yml up -d
```


Launch an aggregated serving deployment:

```bash
cd $DYNAMO_HOME/examples/backends/sglang
./launch/agg.sh
```

Verify the deployment:

```bash
curl localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true,
    "max_tokens": 30
  }'
```
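
The request may fail if it is sent while the worker is still loading the model. As a minimal sketch, a small retry helper can poll the frontend before sending traffic. The `retry` function is a hypothetical helper, not part of Dynamo, and the `/v1/models` route in the usage comment is an assumption based on the frontend exposing an OpenAI-style API.

```bash
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of Dynamo): runs COMMAND until it succeeds
# or ATTEMPTS is exhausted, sleeping DELAY_SECONDS between failed tries.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Skip the pause after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (assumes the frontend listens on localhost:8000 and serves /v1/models):
#   retry 30 2 curl -sf -o /dev/null localhost:8000/v1/models
```

The helper returns 0 as soon as the command succeeds and 1 once every attempt has failed, so it can gate the verification request in a script.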

### Kubernetes Deployment

You can deploy SGLang with Dynamo on Kubernetes using a `DynamoGraphDeployment`. For more details, see the [SGLang Kubernetes Deployment Guide](https://github.com/ai-dynamo/dynamo/tree/main/examples/backends/sglang/deploy).

## Next Steps

- **[Reference Guide](sglang-reference-guide.md)**: Worker types, architecture, and configuration
- **[Examples](sglang-examples.md)**: All deployment patterns with launch scripts
- **[Disaggregation](sglang-disaggregation.md)**: P/D architecture and KV transfer details
- **[Diffusion](sglang-diffusion.md)**: LLM, image, and video diffusion models
- **[Observability](sglang-observability.md)**: Metrics, tracing, and Grafana dashboards
- **[Deploying SGLang with Dynamo on Kubernetes](https://github.com/ai-dynamo/dynamo/tree/main/examples/backends/sglang/deploy)**: Kubernetes deployment guide