---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: SGLang
---

## Use the Latest Release

We recommend using the [latest stable release](https://github.com/ai-dynamo/dynamo/releases/latest) of Dynamo to avoid breaking changes.

---

Dynamo SGLang integrates [SGLang](https://github.com/sgl-project/sglang) engines into Dynamo's distributed runtime, enabling disaggregated serving, KV-aware routing, and request cancellation while maintaining full compatibility with SGLang's native engine arguments. It supports LLM inference, embedding models, multimodal vision models, and diffusion-based generation (LLM, image, video).

## Installation

### Install Latest Release

We recommend using [uv](https://github.com/astral-sh/uv) to install:

```bash
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install --prerelease=allow "ai-dynamo[sglang]"
```

This installs Dynamo with the compatible SGLang version.
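
Before you launch anything, you can check that both packages import from the virtual environment. The sketch below makes some assumptions. `check_module` is a hypothetical helper and is not part of Dynamo. The check also assumes that Dynamo's Python package imports as `dynamo`.

```bash
# Activate the environment created by `uv venv` (default location: ./.venv), if present.
if [ -f .venv/bin/activate ]; then
  source .venv/bin/activate
fi

# check_module: hypothetical helper; succeeds if the named Python module imports cleanly.
check_module() {
  python3 -c "import $1" >/dev/null 2>&1
}

# Assumption: Dynamo's Python package is importable as `dynamo`.
for mod in dynamo sglang; do
  if check_module "$mod"; then
    echo "ok:      $mod"
  else
    echo "missing: $mod"
  fi
done
```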

### Install for Development

<Accordion title="Development installation">
Requires Rust and the CUDA toolkit (`nvcc`).

```bash
# install dynamo (run from the root of the dynamo checkout)
cd $DYNAMO_HOME
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install maturin nixl

# build the Rust bindings, then install dynamo in editable mode
cd $DYNAMO_HOME/lib/bindings/python
maturin develop --uv
cd $DYNAMO_HOME
uv pip install -e .

# install sglang from source in editable mode
git clone https://github.com/sgl-project/sglang.git
cd sglang && uv pip install -e "python"
```

This setup also works well for coding agents. Give the agent the paths to both repositories and to the virtual environment, and have it rerun these commands as it makes changes.
</Accordion>
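
While you iterate, you usually only need to rerun the build steps. The following is a minimal sketch of a rebuild helper. `rebuild_dynamo` is a hypothetical name. It assumes that `DYNAMO_HOME` points at the dynamo checkout and that the virtual environment is already active.

```bash
# rebuild_dynamo: hypothetical helper that reruns the editable-install steps
# from the development installation. Assumes DYNAMO_HOME points at the dynamo
# checkout and the virtual environment is active.
rebuild_dynamo() {
  if [ -z "${DYNAMO_HOME:-}" ] || [ ! -d "$DYNAMO_HOME/lib/bindings/python" ]; then
    echo "rebuild_dynamo: DYNAMO_HOME must point at a dynamo checkout" >&2
    return 1
  fi
  # Run each step in a subshell so the caller's working directory is unchanged.
  (cd "$DYNAMO_HOME/lib/bindings/python" && maturin develop --uv) &&
    (cd "$DYNAMO_HOME" && uv pip install -e .)
}
```

For example, export `DYNAMO_HOME` once and then call `rebuild_dynamo` after each change to the Rust sources. The helper stops at the first failing step.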

### Docker

<Accordion title="Build and run container">
```bash
cd $DYNAMO_HOME
python container/render.py --framework sglang --output-short-filename
docker build -f container/rendered.Dockerfile -t dynamo:latest-sglang .
```

```bash
docker run \
    --gpus all -it --rm \
    --network host --shm-size=10G \
    --ulimit memlock=-1 --ulimit stack=67108864 \
    --ulimit nofile=65536:65536 \
    --cap-add CAP_SYS_PTRACE --ipc host \
    dynamo:latest-sglang
```
</Accordion>

## Feature Support Matrix

| Feature | Status | Notes |
|---------|--------|-------|
| [**Disaggregated Serving**](../../design-docs/disagg-serving.md) | ✅ | Prefill/decode separation with NIXL KV transfer |
| [**KV-Aware Routing**](../../components/router/README.md) | ✅ | |
| [**SLA-Based Planner**](../../components/planner/planner-guide.md) | ✅ | |
| [**Multimodal Support**](../../features/multimodal/multimodal-sglang.md) | ✅ | Image via EPD, E/PD, E/P/D patterns |
| [**Diffusion Models**](sglang-diffusion.md) | ✅ | LLM diffusion, image, and video generation |
| [**Request Cancellation**](../../fault-tolerance/request-cancellation.md) | ✅ | Aggregated full; disaggregated decode-only |
| [**Graceful Shutdown**](../../fault-tolerance/graceful-shutdown.md) | ✅ | Discovery unregister + grace period |
| [**Observability**](sglang-observability.md) | ✅ | Metrics, tracing, and Grafana dashboards |
| [**KVBM**](../../components/kvbm/README.md) | ❌ | Planned |

## Quick Start

### Python / CLI Deployment

Start infrastructure services for local development:

```bash
docker compose -f deploy/docker-compose.yml up -d
```
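
Before you launch workers, you can wait until the infrastructure accepts connections. The sketch below assumes that the compose file exposes etcd and NATS on localhost at their default client ports, 2379 and 4222. `wait_for_port` is a hypothetical helper. It relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command.

```bash
# wait_for_port: hypothetical helper; polls HOST:PORT until a TCP connection
# succeeds or TIMEOUT seconds (default 30) elapse. Uses bash's /dev/tcp device.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-30}
  local deadline=$((SECONDS + timeout))
  while (( SECONDS < deadline )); do
    # Open and close a TCP connection in a child bash; `timeout 1` caps each attempt.
    if timeout 1 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "wait_for_port: $host:$port not reachable after ${timeout}s" >&2
  return 1
}
```

For example, run `wait_for_port localhost 2379 && wait_for_port localhost 4222` before you launch the deployment. This assumes the default etcd and NATS ports.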


Launch an aggregated serving deployment:

```bash
cd $DYNAMO_HOME/examples/backends/sglang
./launch/agg.sh
```
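
Loading the model can take a while, so a request sent too early may fail. The following is a sketch of a readiness check that you can run from another terminal if the launch script stays in the foreground. It polls the OpenAI-compatible `/v1/models` listing until the model ID appears. `wait_for_model` is a hypothetical helper, and it assumes that the frontend serves `/v1/models`. The URL and model ID come from the verification request in the next step.

```bash
# wait_for_model: hypothetical helper; polls GET $BASE_URL/v1/models until the
# response contains "MODEL_ID", or gives up after TIMEOUT seconds (default 300).
wait_for_model() {
  local base_url=$1 model_id=$2 timeout=${3:-300}
  local deadline=$((SECONDS + timeout))
  while (( SECONDS < deadline )); do
    # -s: silent, -f: fail on HTTP errors; grep -F matches the quoted ID literally.
    if curl -sf --max-time 5 "$base_url/v1/models" | grep -qF "\"$model_id\""; then
      return 0
    fi
    sleep 2
  done
  echo "wait_for_model: $model_id not served at $base_url after ${timeout}s" >&2
  return 1
}
```

For example: `wait_for_model http://localhost:8000 Qwen/Qwen3-0.6B && echo ready`.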

Verify the deployment:

```bash
curl localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Explain why Roger Federer is considered one of the greatest tennis players of all time"}],
    "stream": true,
    "max_tokens": 30
  }'
```
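
With `"stream": true`, the response follows the OpenAI chat-completions streaming format. It arrives as server-sent events, one `data: {...}` line per chunk, and ends with `data: [DONE]`. To read only the generated text, you can pipe the stream through a small filter. The sketch below uses a hypothetical helper, `extract_stream_text`. It assumes that `python3` is on `PATH` and that each chunk carries its text in `choices[].delta.content`.

```bash
# extract_stream_text: hypothetical helper; reads an OpenAI-style SSE stream on
# stdin and prints the concatenated choices[].delta.content text plus a newline.
extract_stream_text() {
  python3 -c '
import json
import sys

for raw in sys.stdin:
    line = raw.strip()
    # Skip blank separators, SSE comments, and non-data fields such as "event:".
    if not line.startswith("data:"):
        continue
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    for choice in chunk.get("choices", []):
        delta = choice.get("delta") or {}
        sys.stdout.write(delta.get("content") or "")
    sys.stdout.flush()
print()
'
}
```

For example, add `-sN` (silent, no output buffering) to the `curl` command above and append `| extract_stream_text`.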

### Kubernetes Deployment

You can deploy SGLang with Dynamo on Kubernetes using a `DynamoGraphDeployment`. For more details, see the [SGLang Kubernetes Deployment Guide](https://github.com/ai-dynamo/dynamo/tree/main/examples/backends/sglang/deploy).

## Next Steps

- **[Reference Guide](sglang-reference-guide.md)**: Worker types, architecture, and configuration
- **[Examples](sglang-examples.md)**: All deployment patterns with launch scripts
- **[Disaggregation](sglang-disaggregation.md)**: P/D architecture and KV transfer details
- **[Diffusion](sglang-diffusion.md)**: LLM, image, and video diffusion models
- **[Observability](sglang-observability.md)**: Metrics, tracing, and Grafana dashboards
- **[Deploying SGLang with Dynamo on Kubernetes](https://github.com/ai-dynamo/dynamo/tree/main/examples/backends/sglang/deploy)**: Kubernetes deployment guide