- **[Examples](sglang-examples.md)**: All deployment patterns with launch scripts
- **[Disaggregation](sglang-disaggregation.md)**: P/D architecture and KV transfer details
- **[Diffusion](sglang-diffusion.md)**: LLM, image, and video diffusion models
- **[Observability](sglang-observability.md)**: Metrics, tracing, and Grafana dashboards
- **[Deploying SGLang with Dynamo on Kubernetes](https://github.com/ai-dynamo/dynamo/tree/main/examples/backends/sglang/deploy)**: Kubernetes deployment guide
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: Observability
---
# SGLang Observability
## Overview
This guide covers metrics, tracing, and visualization for SGLang deployments running through Dynamo.
## Prometheus Metrics
When running SGLang through Dynamo, SGLang engine metrics are automatically passed through and exposed on Dynamo's `/metrics` endpoint (default port 8081). This allows you to access both SGLang engine metrics (prefixed with `sglang:`) and Dynamo runtime metrics (prefixed with `dynamo_*`) from a single worker backend endpoint.
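To confirm that both metric families are actually being served, you can split a scrape by prefix. The sketch below is a minimal example. It assumes a worker started with `DYN_SYSTEM_PORT=8081` and `curl` on the path. `count_by_prefix` is a hypothetical helper, not part of Dynamo.

```bash
# Hypothetical helper: count sample lines per metric family in Prometheus text.
# Reads exposition-format text on stdin; comment lines (# HELP / # TYPE) are skipped.
count_by_prefix() {
  awk '
    /^#/ || NF == 0 { next }                  # skip HELP/TYPE comments and blank lines
    index($0, "sglang:") == 1 { sglang++ }    # SGLang engine metrics
    index($0, "dynamo_") == 1 { dynamo++ }    # Dynamo runtime metrics
    END { printf "sglang:%d dynamo_:%d\n", sglang, dynamo }'
}

# Usage against a running worker (assumes DYN_SYSTEM_PORT=8081):
#   curl -s http://localhost:8081/metrics | count_by_prefix
```

If either count is zero, the corresponding family is not being exported yet.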
**For visualization setup instructions**, see the [Prometheus and Grafana Setup Guide](../../observability/prometheus-grafana.md).
### Environment Variables
| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| `DYN_SYSTEM_PORT` | System metrics/health port | `-1` (disabled) | `8081` |
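Because the default of `-1` disables the endpoint, a script that scrapes metrics should check the variable first. The following is a minimal sketch. It assumes a single-machine setup where the worker listens on `localhost`. `metrics_url` is a hypothetical helper.

```bash
# Hypothetical helper: resolve the worker's /metrics URL from DYN_SYSTEM_PORT.
# Mirrors the documented default of -1, which means the system server is disabled.
metrics_url() {
  local port="${DYN_SYSTEM_PORT:--1}"
  if [ "$port" = "-1" ]; then
    echo "DYN_SYSTEM_PORT is unset or -1; the metrics endpoint is disabled" >&2
    return 1
  fi
  printf 'http://localhost:%s/metrics\n' "$port"
}

# Usage: export DYN_SYSTEM_PORT=8081 before launching the worker, then
#   curl -s "$(metrics_url)" | head
```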
### Getting Started Quickly
This is a single machine example.
#### Start Observability Stack
For visualizing metrics with Prometheus and Grafana, start the observability stack. See [Observability Getting Started](../../observability/README.md#getting-started-quickly) for instructions.
#### Launch Dynamo Components
Launch a frontend and SGLang backend to test metrics:
SGLang exposes metrics in Prometheus Exposition Format text at the `/metrics` HTTP endpoint. All SGLang engine metrics use the `sglang:` prefix and include labels (e.g., `model_name`, `engine_type`, `tp_rank`, `pp_rank`) to identify the source.
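Each engine rank reports its own series, distinguished by labels such as `tp_rank`, so a quick health check often needs those series summed. The sketch below assumes sample lines carry no trailing timestamp, which the `prometheus_client` text output omits by default. It uses `sglang:num_running_reqs` only as an illustrative name. `sum_metric` is hypothetical.

```bash
# Hypothetical helper: sum the values of one metric across all of its label sets.
# Usage: sum_metric NAME < exposition.txt   (exits 1 if NAME is not found)
sum_metric() {
  awk -v n="$1" '
    index($0, n "{") == 1 {             # labeled sample: name{...} value
      v = $0
      sub(/.*[}]/, "", v)               # drop everything through the closing brace
      split(v, f, " ")
      sum += f[1]; found = 1
      next
    }
    index($0, n " ") == 1 {             # unlabeled sample: name value
      sum += $2; found = 1
    }
    END {
      if (!found) exit 1
      print sum
    }'
}

# Example: total running requests across tensor-parallel ranks
#   curl -s http://localhost:8081/metrics | sum_metric 'sglang:num_running_reqs'
```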
**Note:** Specific metrics are subject to change between SGLang versions. Always refer to the [official documentation](https://docs.sglang.io/references/production_metrics.html) or inspect the `/metrics` endpoint for your SGLang version.
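One way to inspect what your SGLang version exposes is to read the `# TYPE` comment lines from a live scrape. The following is a minimal sketch. `list_sglang_metrics` is a hypothetical helper, and the usage line assumes the worker's metrics port is 8081.

```bash
# Hypothetical helper: list the SGLang metric names and types the running
# version actually exposes, using the "# TYPE <name> <type>" comment lines.
list_sglang_metrics() {
  awk '$1 == "#" && $2 == "TYPE" && index($3, "sglang:") == 1 { print $3, $4 }' | sort
}

# Usage (worker started with DYN_SYSTEM_PORT=8081):
#   curl -s http://localhost:8081/metrics | list_sglang_metrics
```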
### Available Metrics
The official SGLang documentation includes complete metric definitions with:
- HELP and TYPE descriptions
For the complete and authoritative list of all SGLang metrics, see the [official SGLang Production Metrics documentation](https://docs.sglang.io/references/production_metrics.html).
### Implementation Details
- SGLang uses multiprocess metrics collection via `prometheus_client.multiprocess.MultiProcessCollector`
- Metrics are filtered by the `sglang:` prefix before being exposed
- The integration uses Dynamo's `register_engine_metrics_callback()` function
- Metrics appear after SGLang engine initialization completes
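Engine metrics only appear once initialization completes, so a script that scrapes immediately after launch may see only `dynamo_*` series. The following is a minimal polling sketch. It assumes `curl` is installed and the default port 8081. Both function names are hypothetical.

```bash
# Hypothetical helpers: detect engine metrics in a scrape, and poll until they appear.
has_engine_metrics() {
  grep -q '^sglang:'   # any sample line starting with the sglang: prefix
}

# Usage: wait_for_engine_metrics [URL] [LIMIT_SECONDS]
wait_for_engine_metrics() {
  local url="${1:-http://localhost:8081/metrics}" limit="${2:-300}" waited=0
  until curl -fs "$url" 2>/dev/null | has_engine_metrics; do
    if [ "$waited" -ge "$limit" ]; then
      echo "no sglang: metrics at $url after ${limit}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
}
```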
---
## Distributed Tracing
Dynamo propagates [W3C Trace Context](https://www.w3.org/TR/trace-context/) headers through the SGLang request pipeline, allowing you to correlate traces across the frontend, router, and individual SGLang workers in a disaggregated deployment.
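For reference, a W3C `traceparent` value has the form `00-<trace-id>-<parent-id>-<flags>`. The trace id is 32 lowercase hex digits and the parent (span) id is 16. The sketch below builds and parses such values in bash. The helper names are hypothetical. The parser only accepts version `00` and does not reject the all-zero ids that the specification forbids.

```bash
# Hypothetical helpers for W3C Trace Context "traceparent" values:
#   00-<32 hex trace-id>-<16 hex parent-id>-<2 hex trace-flags>

# Build a fresh traceparent with random ids and the "sampled" flag (01) set.
make_traceparent() {
  local trace_id span_id
  trace_id=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
  span_id=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
  printf '00-%s-%s-01\n' "$trace_id" "$span_id"
}

# Print the trace id from a version-00 traceparent, or fail if malformed.
# Does not check for the all-zero ids that the spec forbids.
traceparent_trace_id() {
  local re='^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$'
  if [[ ! $1 =~ $re ]]; then
    echo "invalid traceparent: $1" >&2
    return 1
  fi
  printf '%s\n' "${BASH_REMATCH[1]}"
}
```

The decode handler passes `rid=trace_id` to the engine (see the file list under How Trace Propagation Works). The extracted trace id is therefore also the SGLang request id for that request.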
### Prerequisites
SGLang's engine-internal tracing requires the `opentelemetry` packages. These are declared as SGLang's `[tracing]` extra. Install them into your Dynamo environment:
Without these packages, Dynamo-side spans (frontend, handler) will still work, but SGLang's internal engine spans will not be emitted and you will see a warning: `"Tracing is disabled because the packages cannot be imported."`
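A quick preflight before you enable tracing can catch two issues described in this section: missing OpenTelemetry packages, and an endpoint given with a URL scheme when `--otlp-traces-endpoint` expects a bare `host:port`. The following is a minimal sketch. Both helpers are hypothetical. The import check only covers the top-level `opentelemetry` package, not every module SGLang may import.

```bash
# Hypothetical preflight helpers for SGLang tracing.

# Succeeds if the interpreter in $PYTHON (default: python3) can import the
# top-level opentelemetry package. Coarse check: it does not verify every
# module SGLang's tracing code imports.
check_tracing_deps() {
  local py="${PYTHON:-python3}"
  if ! "$py" -c 'import opentelemetry' 2>/dev/null; then
    echo "opentelemetry is not importable by $py; install SGLang's [tracing] extra" >&2
    return 1
  fi
}

# Normalize an OTLP endpoint to the bare host:port form that
# --otlp-traces-endpoint expects, e.g. http://localhost:4317/ -> localhost:4317.
# IPv6 literals are not handled.
normalize_otlp_endpoint() {
  local re='^[^:/]+:[0-9]+$'
  local ep="${1#*://}"   # strip a scheme such as http:// or grpc://
  ep="${ep%/}"           # strip one trailing slash
  if [[ ! $ep =~ $re ]]; then
    echo "expected host:port, got: $1" >&2
    return 1
  fi
  printf '%s\n' "$ep"
}
```

You would then pass the normalized value alongside `--enable-trace`, for example `--otlp-traces-endpoint "$(normalize_otlp_endpoint http://localhost:4317)"`.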
### How Trace Propagation Works
```
Frontend (Rust)
creates span, embeds trace_id + span_id in Context
```
- `components/src/dynamo/sglang/request_handlers/handler_base.py:71-84` - Extracts trace context from Dynamo `Context` object
- `components/src/dynamo/sglang/request_handlers/llm/decode_handler.py` - Passes `external_trace_header` and `rid=trace_id` to `engine.async_generate()`
| Flag | Description |
|------|-------------|
| `--otlp-traces-endpoint` | OTLP gRPC endpoint for SGLang's internal trace export (bare `host:port` format, e.g. `localhost:4317`) |
Both flags are required for end-to-end tracing through the SGLang engine. Without `--enable-trace`, the Dynamo handler still creates spans, but SGLang's internal engine spans will not be linked.
### Launch with Tracing
The disaggregated launch script supports `--enable-otel` to enable tracing across all components:
```bash
# Start observability stack first
docker compose -f deploy/docker-compose.yml up -d
docker compose -f deploy/docker-observability.yml up -d
```
For more details on the Tempo/Grafana tracing infrastructure, see the [Dynamo Tracing Guide](../../observability/tracing.md).
---
## SGLang Grafana Dashboard
Dynamo ships a pre-provisioned Grafana dashboard for SGLang at `deploy/observability/grafana_dashboards/sglang.json`. It is automatically loaded when the observability stack starts.
When developing on a remote VM (cloud instance, bare metal, etc.), the observability ports are only bound to `localhost` inside the VM. You have two options to access them.
### Option 1: SSH Port Forwarding (Recommended)
Forward the relevant ports through your SSH connection. No firewall changes are needed, and the traffic is encrypted.
```bash
# Forward Grafana (3000), Prometheus (9090), and Tempo (3200)
ssh -L 3000:localhost:3000 \
-L 9090:localhost:9090 \
-L 3200:localhost:3200 \
user@your-vm-ip
```
Then open `http://localhost:3000` in your local browser.
For a long-running tunnel in the background:
```bash
ssh -fN \
-L 3000:localhost:3000 \
-L 9090:localhost:9090 \
-L 3200:localhost:3200 \
user@your-vm-ip
```
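After starting either tunnel, you can confirm that the forwarded ports respond before switching to the browser. The following is a minimal sketch using bash's built-in `/dev/tcp` redirection, which is available in most Linux bash builds. `check_ports` is a hypothetical helper.

```bash
# Hypothetical helper: check that each port accepts TCP connections on the
# local machine. Uses bash's /dev/tcp redirection, so no extra tools are needed.
check_ports() {
  local port failed=0
  for port in "$@"; do
    if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
      echo "port ${port}: open"
    else
      echo "port ${port}: closed"
      failed=1
    fi
  done
  return "$failed"
}

# Grafana, Prometheus, and Tempo as forwarded above:
#   check_ports 3000 9090 3200
```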
### Option 2: Firewall Rules
Open the ports directly. Only use this on trusted networks.
```bash
# Ubuntu/Debian
sudo ufw allow 3000/tcp # Grafana
sudo ufw allow 9090/tcp # Prometheus
# Or for cloud VMs, add inbound rules in your security group for ports 3000, 9090
```
Then access `http://<vm-ip>:3000` directly.
### Headless / Agent Access
For CI pipelines, AI coding agents, or headless workflows where no browser is available, you can query Grafana and Prometheus directly via their APIs:
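The following is a minimal sketch of such queries with `curl`. It assumes the default local ports from the forwarding example above and no authentication on these particular endpoints. The helper names and `*_URL` variables are hypothetical. Prometheus's `/api/v1/query`, Grafana's `/api/health`, and Tempo's `/api/traces/<trace-id>` are standard HTTP API endpoints of those projects.

```bash
# Hypothetical helpers for querying the observability stack without a browser.
# Defaults match the local ports used above; override via the *_URL variables.
PROM_URL="${PROM_URL:-http://localhost:9090}"
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
TEMPO_URL="${TEMPO_URL:-http://localhost:3200}"

# Instant PromQL query via Prometheus's HTTP API; prints the JSON response.
prom_query() {
  curl -fsSG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Grafana liveness check (this endpoint does not require authentication).
grafana_health() {
  curl -fsS "${GRAFANA_URL}/api/health"
}

# Fetch one trace from Tempo by its 32-hex-digit trace id.
tempo_trace() {
  curl -fsS "${TEMPO_URL}/api/traces/$1"
}

# Examples (metric names vary by SGLang version; check /metrics first):
#   prom_query 'sum by (model_name) (sglang:num_running_reqs)'
#   grafana_health
#   tempo_trace 4bf92f3577b34da6a3ce929d0e0e4736
```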
Both SGLang engine metrics (`sglang:*` prefix) and Dynamo runtime metrics (`dynamo_*` prefix) are served from the same endpoint.
For metric details, see [SGLang Observability](sglang-observability.md). For visualization setup, see [Prometheus + Grafana](../../observability/prometheus-grafana.md).