To set up a monitoring dashboard, use the following Docker Compose file: [examples/monitoring/docker-compose.yaml](../examples/monitoring/docker-compose.yaml).
This assumes an SGLang server is already running at `localhost:30000`.
To start the monitoring stack (Prometheus + Grafana), `cd` to `examples/monitoring` and run:
```bash
docker compose -f docker-compose.yaml -p monitoring up
```
Then you can access the Grafana dashboard at http://localhost:3000.
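
If the panels stay empty, it can help to check that the server is actually exporting metrics for Prometheus to scrape. The following is a minimal sketch, not part of the example files. It assumes the server serves Prometheus text-format metrics at `/metrics` (this section does not state the path) and that metric names carry the `sglang:` prefix, as the dashboard queries do. `count_sglang_metrics` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Count Prometheus sample lines whose metric name starts with "sglang:".
# Reads exposition-format text on stdin; "# HELP" / "# TYPE" comment lines
# do not match because they start with "#".
count_sglang_metrics() {
  # grep -c exits non-zero when nothing matches; "|| true" keeps the
  # function usable under "set -e" while still printing 0.
  grep -c '^sglang:' || true
}

# Usage (assumed endpoint path, adjust to your deployment):
#   curl -s http://localhost:30000/metrics | count_sglang_metrics
```

A result of `0` suggests that metrics export is not enabled on the server or that the endpoint differs from the one assumed here.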
### Grafana Dashboard
To import the Grafana dashboard, click `+` -> `Import` -> `Upload JSON file` -> `Upload` and select [grafana.json](../examples/monitoring/grafana.json).
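
The dashboard's end-to-end latency panels compute P50, P90, P95, and P99 with `histogram_quantile` over `sglang:e2e_request_latency_seconds_bucket`. To sanity-check those numbers outside Grafana, the same expression can be built by hand. The sketch below is an illustration under assumptions: it replaces the Grafana-only `$__rate_interval` variable with a fixed window (`5m` by default) and drops the `instance` and `name` label filters, so it aggregates over every series Prometheus has scraped. `latency_quantile_query` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Build the PromQL expression used by the end-to-end latency panels,
# with $__rate_interval replaced by a fixed rate window.
# Usage: latency_quantile_query <quantile> [window]
latency_quantile_query() {
  local quantile="$1"
  local window="${2:-5m}"   # assumed default window, not taken from the dashboard
  printf 'histogram_quantile(%s, sum by(le) (rate(sglang:e2e_request_latency_seconds_bucket[%s])))\n' \
    "$quantile" "$window"
}

# Print the four quantiles charted by the dashboard.
for q in 0.5 0.9 0.95 0.99; do
  latency_quantile_query "$q"
done
```

Each printed expression can be pasted into the Prometheus expression browser or sent to the Prometheus HTTP API, for example `curl -sG http://localhost:9090/api/v1/query --data-urlencode "query=$(latency_quantile_query 0.99)"`. Port `9090` is the Prometheus default and is assumed here; check the Compose file for the port it actually publishes.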
"expr":"histogram_quantile(0.99, sum by(le) (rate(sglang:e2e_request_latency_seconds_bucket{instance=\"$instance\", name=\"$name\"}[$__rate_interval])))",
"fullMetaSearch":false,
"includeNullMetadata":true,
"instant":false,
"legendFormat":"P99",
"range":true,
"refId":"A",
"useBackend":false
},
{
"datasource":{
"type":"prometheus",
"uid":"ddyfngn31dg5cf"
},
"disableTextWrap":false,
"editorMode":"code",
"expr":"histogram_quantile(0.9, sum by(le) (rate(sglang:e2e_request_latency_seconds_bucket{instance=\"$instance\", name=\"$name\"}[$__rate_interval])))",
"fullMetaSearch":false,
"hide":false,
"includeNullMetadata":true,
"instant":false,
"legendFormat":"P90",
"range":true,
"refId":"B",
"useBackend":false
},
{
"datasource":{
"type":"prometheus",
"uid":"ddyfngn31dg5cf"
},
"disableTextWrap":false,
"editorMode":"builder",
"expr":"histogram_quantile(0.95, sum by(le) (rate(sglang:e2e_request_latency_seconds_bucket{instance=\"$instance\", name=\"$model_name\"}[$__rate_interval])))",
"fullMetaSearch":false,
"hide":false,
"includeNullMetadata":true,
"instant":false,
"legendFormat":"P95",
"range":true,
"refId":"C",
"useBackend":false
},
{
"datasource":{
"type":"prometheus",
"uid":"ddyfngn31dg5cf"
},
"disableTextWrap":false,
"editorMode":"builder",
"expr":"histogram_quantile(0.5, sum by(le) (rate(sglang:e2e_request_latency_seconds_bucket{instance=\"$instance\", name=\"$model_name\"}[$__rate_interval])))",