# SGLang Router

The SGLang Router is a standalone Rust module that enables data parallelism across SGLang instances, providing high-performance request routing and advanced load balancing. It supports multiple load-balancing algorithms (cache-aware, power of two, random, and round robin) and also acts as a specialized load balancer for prefill-decode disaggregated serving architectures.

## Documentation

- **User Guide**: [docs.sglang.ai/router/router.html](https://docs.sglang.ai/router/router.html)

## Quick Start

### Prerequisites

**Rust and Cargo:**
```bash
# Install rustup (Rust installer and version manager)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Follow the installation prompts, then reload your shell
source "$HOME/.cargo/env"

# Verify installation
rustc --version
cargo --version
```

**Python:** a working Python installation with `pip` available.

### Installation

#### Option A: Build and Install Wheel (Recommended)
```bash
# Install build dependencies
pip install setuptools-rust wheel build

# Build the wheel package
python -m build

# Install the generated wheel
pip install dist/*.whl

# One-liner for development (rebuild + install)
python -m build && pip install --force-reinstall dist/*.whl
```

#### Option B: Development Mode
```bash
pip install -e .
```

⚠️ **Warning**: Editable installs may suffer performance degradation. Use wheel builds for performance testing.

### Basic Usage

```bash
# Build Rust components
cargo build

# Launch router with worker URLs
python -m sglang_router.launch_router \
    --worker-urls http://worker1:8000 http://worker2:8000
```

## Configuration

### Logging

Enable structured logging with optional file output:

```python
from sglang_router import Router

# Console logging (default)
router = Router(worker_urls=["http://worker1:8000", "http://worker2:8000"])

# File logging enabled
router = Router(
    worker_urls=["http://worker1:8000", "http://worker2:8000"],
    log_dir="./logs"  # Daily log files created here
)
```

Set the log level with the `--log-level` flag ([documentation](https://docs.sglang.ai/backend/server_arguments.html#logging)).

### Metrics

The Prometheus metrics endpoint is available at `127.0.0.1:29000` by default.

```bash
# Custom metrics configuration
python -m sglang_router.launch_router \
    --worker-urls http://localhost:8080 http://localhost:8081 \
    --prometheus-host 0.0.0.0 \
    --prometheus-port 9000
```

## Advanced Features

### Kubernetes Service Discovery

Automatic worker discovery and management in Kubernetes environments.

#### Basic Service Discovery

```bash
python -m sglang_router.launch_router \
    --service-discovery \
    --selector app=sglang-worker role=inference \
    --service-discovery-namespace default
```

#### PD (Prefill-Decode) Mode

For disaggregated prefill/decode routing:

```bash
python -m sglang_router.launch_router \
    --pd-disaggregation \
    --policy cache_aware \
    --service-discovery \
    --prefill-selector app=sglang component=prefill \
    --decode-selector app=sglang component=decode \
    --service-discovery-namespace sglang-system
```

#### Kubernetes Pod Configuration

**Prefill Server Pod:**
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sglang-prefill-1
  labels:
    app: sglang
    component: prefill
  annotations:
    sglang.ai/bootstrap-port: "9001"  # Optional: Bootstrap port
spec:
  containers:
  - name: sglang
    image: lmsys/sglang:latest
    ports:
    - containerPort: 8000  # Main API port
    - containerPort: 9001  # Optional: Bootstrap port
```

**Decode Server Pod:**
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sglang-decode-1
  labels:
    app: sglang
    component: decode
spec:
  containers:
  - name: sglang
    image: lmsys/sglang:latest
    ports:
    - containerPort: 8000
```
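
As a rough illustration of how the pod metadata above could be turned into a routable endpoint, the sketch below derives a worker URL and an optional bootstrap port from a pod-like dictionary. The function name `prefill_endpoint`, the dictionary layout (mirroring the Kubernetes Pod fields `metadata.annotations` and `status.podIP`), and the example IP address are assumptions made for illustration; this is not the router's actual implementation.

```python
from typing import Optional, Tuple

# Annotation key taken from the prefill pod manifest above.
BOOTSTRAP_ANNOTATION = "sglang.ai/bootstrap-port"


def prefill_endpoint(pod: dict, port: int = 8000) -> Tuple[str, Optional[int]]:
    """Return (worker_url, bootstrap_port) for a pod-like dict (hypothetical helper).

    `port` plays the role of --service-discovery-port (default 8000).
    The bootstrap port is None when the optional annotation is absent.
    """
    ip = pod["status"]["podIP"]
    annotations = pod.get("metadata", {}).get("annotations") or {}
    raw = annotations.get(BOOTSTRAP_ANNOTATION)
    bootstrap_port = int(raw) if raw is not None else None
    return f"http://{ip}:{port}", bootstrap_port


# Example pod data; the IP address is made up for illustration.
pod = {
    "metadata": {"annotations": {"sglang.ai/bootstrap-port": "9001"}},
    "status": {"podIP": "10.0.0.12"},
}
print(prefill_endpoint(pod))  # ('http://10.0.0.12:8000', 9001)
```

A decode pod, which carries no bootstrap annotation, would yield `None` as the second element.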

#### RBAC Configuration

**Namespace-scoped (recommended):**
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sglang-router
  namespace: sglang-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: sglang-system
  name: sglang-router
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sglang-router
  namespace: sglang-system
subjects:
- kind: ServiceAccount
  name: sglang-router
  namespace: sglang-system
roleRef:
  kind: Role
  name: sglang-router
  apiGroup: rbac.authorization.k8s.io
```

#### Complete PD Example

```bash
python -m sglang_router.launch_router \
    --pd-disaggregation \
    --policy cache_aware \
    --service-discovery \
    --prefill-selector app=sglang component=prefill environment=production \
    --decode-selector app=sglang component=decode environment=production \
    --service-discovery-namespace production \
    --host 0.0.0.0 \
    --port 8080 \
    --prometheus-host 0.0.0.0 \
    --prometheus-port 9090
```
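
If the router is launched from a Python deployment script rather than a shell, the same command line can be assembled programmatically. The sketch below uses only flags that appear in this README; the helper names `selector_args` and `build_pd_command` are hypothetical and not part of `sglang_router`.

```python
import shlex
import sys
from typing import Dict, List


def selector_args(labels: Dict[str, str]) -> List[str]:
    """Render labels as the space-separated key=value tokens the CLI expects."""
    return [f"{key}={value}" for key, value in labels.items()]


def build_pd_command(
    prefill_labels: Dict[str, str],
    decode_labels: Dict[str, str],
    namespace: str,
    host: str = "0.0.0.0",
    port: int = 8080,
    policy: str = "cache_aware",
) -> List[str]:
    """Build an argv list for a PD-mode router using service discovery."""
    return [
        sys.executable, "-m", "sglang_router.launch_router",
        "--pd-disaggregation",
        "--policy", policy,
        "--service-discovery",
        "--prefill-selector", *selector_args(prefill_labels),
        "--decode-selector", *selector_args(decode_labels),
        "--service-discovery-namespace", namespace,
        "--host", host,
        "--port", str(port),
    ]


cmd = build_pd_command(
    {"app": "sglang", "component": "prefill"},
    {"app": "sglang", "component": "decode"},
    namespace="production",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; building an argv list instead of a shell string avoids quoting problems in label values.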

### Command Line Arguments Reference

#### Service Discovery
- `--service-discovery`: Enable Kubernetes service discovery
- `--service-discovery-port`: Port for worker URLs (default: 8000)
- `--service-discovery-namespace`: Kubernetes namespace to watch
- `--selector`: Label selectors for regular mode (format: `key1=value1 key2=value2`)
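
To make the `key1=value1 key2=value2` selector format concrete, here is a minimal sketch of parsing such tokens and checking them against a pod's labels. It assumes equality-based matching in which every pair must match, as with Kubernetes equality-based label selectors; the router's real matching logic may differ, and the function names are hypothetical.

```python
from typing import Dict, List


def parse_selector(tokens: List[str]) -> Dict[str, str]:
    """Parse tokens such as ["app=sglang-worker", "role=inference"] into a dict."""
    selector: Dict[str, str] = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        selector[key] = value
    return selector


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Return True when every selector pair is present in the pod labels."""
    return all(labels.get(key) == value for key, value in selector.items())


selector = parse_selector("app=sglang-worker role=inference".split())
print(matches(selector, {"app": "sglang-worker", "role": "inference", "zone": "a"}))  # True
print(matches(selector, {"app": "sglang-worker"}))  # False
```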

#### PD Mode
- `--pd-disaggregation`: Enable Prefill-Decode disaggregated mode
- `--prefill`: Initial prefill server (format: `URL BOOTSTRAP_PORT`)
- `--decode`: Initial decode server URL
- `--prefill-selector`: Label selector for prefill pods
- `--decode-selector`: Label selector for decode pods
- `--policy`: Routing policy (`cache_aware`, `random`, `power_of_two`)

## Development

### Build Process

```bash
# Build Rust project
cargo build

# Build Python binding (see Installation section above)
```

**Note**: When modifying Rust code, you must rebuild the wheel for changes to take effect.

### Troubleshooting

**VSCode Rust Analyzer Issues:**
Set `rust-analyzer.linkedProjects` to the absolute path of `Cargo.toml`:

```json
{
  "rust-analyzer.linkedProjects": ["/workspaces/sglang/sgl-router/Cargo.toml"]
}
```

### CI/CD Pipeline

The continuous integration pipeline includes comprehensive testing, benchmarking, and publishing:

#### Build & Test
1. **Build Wheels**: Uses `cibuildwheel` for manylinux x86_64 packages
2. **Build Source Distribution**: Creates source distribution for pip fallback
3. **Rust HTTP Server Benchmarking**: Performance testing of router overhead
4. **Basic Inference Testing**: End-to-end validation through the router
5. **PD Disaggregation Testing**: Benchmark and sanity checks for prefill-decode load balancing

#### Publishing
- **PyPI Publishing**: Wheels and source distributions are published only when the version changes in `pyproject.toml`
- **Container Images**: Docker images are published using `/docker/Dockerfile.router`

## Features

- **High Performance**: Rust-based routing with connection pooling and optimized request handling
- **Advanced Load Balancing**: Multiple algorithms including:
  - **Cache-Aware**: Intelligent routing based on cache locality for optimal performance
  - **Power of Two**: Chooses the less loaded of two randomly selected workers
  - **Random**: Distributes requests randomly across available workers
  - **Round Robin**: Sequential distribution across workers in rotation
- **Prefill-Decode Disaggregation**: Specialized load balancing for separated prefill and decode servers
- **Service Discovery**: Automatic Kubernetes worker discovery and health management
- **Monitoring**: Comprehensive Prometheus metrics and structured logging
- **Scalability**: Handles thousands of concurrent connections with efficient resource utilization
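
The simpler load-balancing policies listed above can be sketched in a few lines of Python. This is an illustrative model only, not the router's Rust implementation: the function names and the `load` mapping (worker URL to in-flight request count) are assumptions, and the cache-aware policy is omitted because its details are not described here.

```python
import itertools
import random
from typing import Callable, Dict, List


def pick_random(workers: List[str], rng: random.Random) -> str:
    """Random: choose any available worker uniformly."""
    return rng.choice(workers)


def make_round_robin(workers: List[str]) -> Callable[[], str]:
    """Round robin: return a picker that cycles through workers in order."""
    cycle = itertools.cycle(workers)
    return lambda: next(cycle)


def pick_power_of_two(workers: List[str], load: Dict[str, int], rng: random.Random) -> str:
    """Power of two: sample two distinct workers and keep the less loaded one."""
    if len(workers) == 1:
        return workers[0]
    first, second = rng.sample(workers, 2)
    return first if load[first] <= load[second] else second


workers = ["http://worker1:8000", "http://worker2:8000", "http://worker3:8000"]
load = {workers[0]: 7, workers[1]: 2, workers[2]: 4}  # hypothetical in-flight counts
rng = random.Random(0)

next_worker = make_round_robin(workers)
print([next_worker() for _ in range(4)])
print(pick_power_of_two(workers, load, rng))
print(pick_random(workers, rng))
```

Power of two needs only two load lookups per request, which keeps routing cheap while still steering traffic away from the busiest workers.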