SGL Router supports automatic service discovery for worker nodes in Kubernetes environments. This feature works with both regular (single-server) routing and PD (Prefill-Decode) routing modes. When enabled, the router will automatically:
- Discover and add worker pods with matching labels
- Remove unhealthy or deleted worker pods
- Dynamically adjust the worker pool based on pod health and availability
- For PD mode: distinguish between prefill and decode servers based on labels
#### Regular Mode Service Discovery
For traditional single-server routing:
```bash
python -m sglang_router.launch_router \
    --service-discovery \
    --selector app=sglang-worker role=inference \
    --service-discovery-port 8000 \
    --service-discovery-namespace default
```
#### PD Mode Service Discovery
For PD (Prefill-Decode) disaggregated routing, service discovery can automatically discover and classify pods as either prefill or decode servers based on their labels:
```bash
python -m sglang_router.launch_router \
    --pd-disaggregation \
    --policy cache_aware \
    --service-discovery \
    --prefill-selector app=sglang component=prefill \
    --decode-selector app=sglang component=decode \
    --service-discovery-namespace sglang-system
```
You can also specify initial prefill and decode servers and let service discovery add more:
```bash
python -m sglang_router.launch_router \
    --pd-disaggregation \
    --policy cache_aware \
    --prefill http://prefill-1:8000 8001 \
    --decode http://decode-1:8000 \
    --service-discovery \
    --prefill-selector app=sglang component=prefill \
    --decode-selector app=sglang component=decode \
    --service-discovery-namespace sglang-system
```
#### Kubernetes Pod Configuration for PD Mode
When using PD service discovery, your Kubernetes pods need specific labels to be classified as prefill or decode servers:
**Prefill Server Pod:**
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sglang-prefill-1
  labels:
    app: sglang
    component: prefill
  annotations:
    sglang.ai/bootstrap-port: "9001"  # Optional: Bootstrap port for Mooncake prefill coordination
spec:
  containers:
  - name: sglang
    image: lmsys/sglang:latest
    ports:
    - containerPort: 8000  # Main API port
    - containerPort: 9001  # Optional: Bootstrap coordination port
    # ... rest of configuration
```
**Decode Server Pod:**
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sglang-decode-1
  labels:
    app: sglang
    component: decode
spec:
  containers:
  - name: sglang
    image: lmsys/sglang:latest
    ports:
    - containerPort: 8000  # Main API port
    # ... rest of configuration
```
**Key Requirements:**
- Prefill pods must have labels matching your `--prefill-selector`
- Decode pods must have labels matching your `--decode-selector`
- Prefill pods can optionally declare a bootstrap port with the `sglang.ai/bootstrap-port` annotation (defaults to None if not specified)
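Before starting the router, it can help to check which pods each selector would match. The sketch below assumes the router requires every listed pair to match, which is the same meaning as kubectl's comma-separated label selector. The helper `to_label_selector` and the namespace value are illustrative and not part of SGL Router.

```bash
# Join the router's space-separated label pairs into kubectl's
# comma-separated label-selector syntax.
to_label_selector() {
    local IFS=','
    echo "$*"
}

NAMESPACE="sglang-system"   # assumption: the namespace used in the PD examples
PREFILL_SELECTOR="$(to_label_selector app=sglang component=prefill)"
DECODE_SELECTOR="$(to_label_selector app=sglang component=decode)"

# Print the commands; run them against your cluster to list matching pods.
echo "kubectl get pods -n ${NAMESPACE} -l ${PREFILL_SELECTOR}"
echo "kubectl get pods -n ${NAMESPACE} -l ${DECODE_SELECTOR}"
```

A pod that appears in neither listing would not be picked up by PD service discovery.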
#### Service Discovery Arguments
**General Arguments:**
- `--service-discovery`: Enable Kubernetes service discovery feature
- `--service-discovery-port`: Port to use when generating worker URLs (default: 8000)
- `--service-discovery-namespace`: Optional. Kubernetes namespace to watch for pods. If not provided, watches all namespaces (requires cluster-wide permissions)
- `--selector`: One or more label key-value pairs for pod selection in regular mode (format: key1=value1 key2=value2)
1. Enable PD (Prefill-Decode) disaggregated routing mode with automatic pod classification
2. Watch for pods in the `production` namespace
3. Automatically add prefill servers with labels `app=sglang`, `component=prefill`, `environment=production`
4. Automatically add decode servers with labels `app=sglang`, `component=decode`, `environment=production`
5. Extract bootstrap ports from the `sglang.ai/bootstrap-port` annotation on prefill pods
6. Use cache-aware load balancing for optimal performance
7. Expose the router API on port 8080 and metrics on port 9090
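The seven steps above describe a PD deployment in the `production` namespace. A launch command consistent with them might look like the sketch below. The selector, namespace, and policy flags are taken from the earlier examples, while `--port` and `--prometheus-port` are assumed flag names for the API and metrics ports and should be checked against `python -m sglang_router.launch_router --help`. Bootstrap port extraction (step 5) needs no flag here, since it is driven by the pod annotation.

```bash
# Build the command as an array so it can be inspected before launching.
ROUTER_CMD=(
    python -m sglang_router.launch_router
    --pd-disaggregation
    --policy cache_aware
    --service-discovery
    --prefill-selector app=sglang component=prefill environment=production
    --decode-selector app=sglang component=decode environment=production
    --service-discovery-namespace production
    --port 8080              # assumption: flag for the router API port
    --prometheus-port 9090   # assumption: flag for the metrics port
)

# Print the assembled command for review.
printf '%s ' "${ROUTER_CMD[@]}"
echo

# To launch the router, execute the array:
# "${ROUTER_CMD[@]}"
```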
**Note:** In PD mode with service discovery, pods MUST match either the prefill or decode selector to be added. Pods that don't match either selector are ignored.

- **If watching all namespaces** (without specifying namespace):
  - Set up a ServiceAccount, ClusterRole, and ClusterRoleBinding with permissions to list/watch pods at the cluster level
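One way to create these three objects is with imperative `kubectl` commands, sketched below. The namespace, ServiceAccount, and role names are hypothetical placeholders, and the `run` wrapper only prints each command unless `DRY_RUN=0` is set, so the script can be previewed without touching a cluster.

```bash
# Print each command unless DRY_RUN=0 is set, so the script is safe to preview.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical names; substitute your own.
NAMESPACE="sglang-system"
SERVICE_ACCOUNT="sglang-router"
ROLE="sglang-router-pod-reader"

run kubectl create serviceaccount "$SERVICE_ACCOUNT" -n "$NAMESPACE"
run kubectl create clusterrole "$ROLE" --verb=list,watch --resource=pods
run kubectl create clusterrolebinding "$ROLE" \
    --clusterrole="$ROLE" \
    --serviceaccount="${NAMESPACE}:${SERVICE_ACCOUNT}"

# Confirm the binding grants cluster-wide pod listing.
run kubectl auth can-i list pods --all-namespaces \
    --as="system:serviceaccount:${NAMESPACE}:${SERVICE_ACCOUNT}"
```

The router pod would then need to run under this ServiceAccount (via `serviceAccountName` in its pod spec) for the permissions to take effect.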