To set up a monitoring dashboard, you can use the following Docker Compose file: [examples/monitoring/docker-compose.yaml](../examples/monitoring/docker-compose.yaml).
This guide assumes an SGLang server is running at `localhost:30000`. When starting the server, make sure the `--enable-metrics` flag is set.
To start the monitoring dashboard (Prometheus + Grafana), change to the `examples/monitoring` directory and run:
...
Then you can access the Grafana dashboard at http://localhost:3000.
### Grafana Dashboard
In a new Grafana setup, make sure the `Prometheus` data source is enabled. To check, go to `http://localhost:3000/connections/datasources` and look for `Prometheus` in the list.
If it is missing, click `Add data source` -> `Prometheus`, set the Prometheus URL to `http://localhost:9090`, and click `Save & Test`.
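
If `Save & Test` fails, it can help to confirm that Prometheus itself is up and scraping its targets before debugging Grafana. The following is a minimal sketch, assuming Prometheus is reachable at `http://localhost:9090` as above and using Prometheus's standard `/api/v1/targets` HTTP endpoint. The helper names `target_health` and `fetch_targets` are hypothetical and not part of SGLang.

```python
import json
import urllib.request

# Assumption: same Prometheus URL as the one entered in the Grafana data source.
PROMETHEUS_URL = "http://localhost:9090"


def target_health(payload):
    """Map each active scrape target's URL to its health ("up", "down", "unknown")."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return {t.get("scrapeUrl", "?"): t.get("health", "unknown") for t in targets}


def fetch_targets(base_url=PROMETHEUS_URL, timeout=5):
    """Query Prometheus's /api/v1/targets endpoint and return the decoded JSON."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


# Example usage (requires a running Prometheus):
#   for url, health in target_health(fetch_targets()).items():
#       print(health, url)
```

A target that reports anything other than `up` usually means Prometheus cannot reach the scrape URL, which is worth fixing before looking at the dashboard.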
To import the Grafana dashboard, click `+` -> `Import` -> `Upload JSON file` -> `Upload` and select [grafana.json](../examples/monitoring/grafana.json).
### Troubleshooting
#### Check if the variables are created
The example dashboard assumes that the variables referenced by its queries, such as `instance` and `model_name`, are available.
If you don't have these variables, you can create them manually.
To create a variable, go to dashboard settings, `Variables` -> `New variable`.
You should then see a preview of the values (e.g. `meta-llama/Llama-3.1-8B-Instruct` for `model_name`).
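
To find out which variables are missing without clicking through every panel, the dashboard JSON can be scanned directly. The sketch below assumes the usual Grafana dashboard layout, where variables live under `templating.list` and queries are stored in `expr` fields. The function names are hypothetical, and Grafana built-ins such as `$__rate_interval` are skipped.

```python
import json
import re

# Matches $var and ${var...}; built-ins like $__rate_interval are filtered out later.
VAR_PATTERN = re.compile(r"\$\{?([A-Za-z_]\w*)")


def iter_exprs(node):
    """Yield every "expr" string found anywhere in the dashboard JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "expr" and isinstance(value, str):
                yield value
            else:
                yield from iter_exprs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_exprs(item)


def missing_variables(dashboard):
    """Return variables used in queries but not defined under templating.list."""
    defined = {v.get("name") for v in dashboard.get("templating", {}).get("list", [])}
    used = set()
    for expr in iter_exprs(dashboard):
        used.update(n for n in VAR_PATTERN.findall(expr) if not n.startswith("__"))
    return sorted(used - defined)


def load_dashboard(path):
    """Read a dashboard JSON file, e.g. examples/monitoring/grafana.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling `missing_variables(load_dashboard("examples/monitoring/grafana.json"))` returns the names that still need to be created under `Variables` -> `New variable`.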
#### Check if the metrics are being collected
Run `python3 -m sglang.bench_serving --backend sglang --dataset-name random --num-prompts 3000 --random-input 1024 --random-output 1024 --random-range-ratio 0.5` to generate some requests.
Then you should be able to see the metrics in the Grafana dashboard.
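
If the dashboard stays empty, it is worth checking whether the server exposes any metrics at all. This is a minimal sketch, assuming the server publishes Prometheus-format metrics at `http://localhost:30000/metrics` when `--enable-metrics` is set (the endpoint path is an assumption here) and that metric names start with `sglang:`, as in the dashboard queries. The helper names are hypothetical.

```python
import urllib.request

# Assumption: metrics endpoint of the SGLang server started with --enable-metrics.
METRICS_URL = "http://localhost:30000/metrics"


def sglang_metric_names(metrics_text, prefix="sglang:"):
    """Return the sorted, de-duplicated metric names that start with `prefix`."""
    names = set()
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


def fetch_metrics(url=METRICS_URL, timeout=5):
    """Download the raw metrics text from the server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example usage (requires a running server):
#   print(sglang_metric_names(fetch_metrics()))
```

An empty list suggests the server was started without `--enable-metrics`. A non-empty list with an empty dashboard points instead at the Prometheus scrape configuration or the dashboard variables.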
"expr":"histogram_quantile(0.99, sum by(le) (rate(sglang:e2e_request_latency_seconds_bucket{instance=\"$instance\", name=\"$name\"}[$__rate_interval])))",
"fullMetaSearch":false,
"includeNullMetadata":true,
"instant":false,
"legendFormat":"P99",
"range":true,
"refId":"A",
"useBackend":false
},
{
"datasource":{
"type":"prometheus",
"uid":"ddyfngn31dg5cf"
},
"disableTextWrap":false,
"editorMode":"code",
"expr":"histogram_quantile(0.9, sum by(le) (rate(sglang:e2e_request_latency_seconds_bucket{instance=\"$instance\", name=\"$name\"}[$__rate_interval])))",
"fullMetaSearch":false,
"hide":false,
"includeNullMetadata":true,
"instant":false,
"legendFormat":"P90",
"range":true,
"refId":"B",
"useBackend":false
},
{
"datasource":{
"type":"prometheus",
"uid":"ddyfngn31dg5cf"
},
"disableTextWrap":false,
"editorMode":"builder",
"expr":"histogram_quantile(0.95, sum by(le) (rate(sglang:e2e_request_latency_seconds_bucket{instance=\"$instance\", name=\"$model_name\"}[$__rate_interval])))",
"fullMetaSearch":false,
"hide":false,
"includeNullMetadata":true,
"instant":false,
"legendFormat":"P95",
"range":true,
"refId":"C",
"useBackend":false
},
{
"datasource":{
"type":"prometheus",
"uid":"ddyfngn31dg5cf"
},
"disableTextWrap":false,
"editorMode":"builder",
"expr":"histogram_quantile(0.5, sum by(le) (rate(sglang:e2e_request_latency_seconds_bucket{instance=\"$instance\", name=\"$model_name\"}[$__rate_interval])))",