Commit b35db6e2 authored by Jacky, committed by GitHub

docs: Request rejection metrics (#8139)


Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
parent eac6a5fc
@@ -197,38 +197,19 @@ def send_with_retry(request, max_retries=5):
 Track rejection behavior with these metrics:
-| Metric | Type | Description |
-|--------|------|-------------|
-| `dynamo_tasks_rejected_total` | Counter | Total number of rejected tasks |
-| `dynamo_queued_requests` | Gauge | Requests waiting in HTTP queue |
-### Example Prometheus Queries
-```promql
-# Rejection rate over 5 minutes
-rate(dynamo_tasks_rejected_total[5m])
-# Percentage of requests rejected
-sum(rate(dynamo_tasks_rejected_total[5m])) /
-sum(rate(dynamo_tasks_issued_total[5m])) * 100
+- `dynamo_frontend_model_rejection_total`: Counter tracking the total number of requests rejected due to resource exhaustion
+  - Labels:
+    - `model`: The model name being served
+    - `endpoint`: The API endpoint that received the request (e.g., `chat_completions`, `completions`, `embeddings`)
+  - This metric is incremented when the router returns a `ResourceExhausted` error because all workers are busy. The rejected request is surfaced to the client as an HTTP 503 response.
+**Example metrics output:**
+```text
+dynamo_frontend_model_rejection_total{endpoint="chat_completions",model="Qwen/Qwen3-0.6B"} 32
+dynamo_frontend_model_rejection_total{endpoint="completions",model="Qwen/Qwen3-0.6B"} 5
 ```
-### Grafana Alerting
-Example alert for high rejection rate:
-```yaml
-alert: HighRequestRejectionRate
-expr: |
-  sum(rate(dynamo_tasks_rejected_total[5m])) /
-  sum(rate(dynamo_tasks_issued_total[5m])) > 0.1
-for: 5m
-labels:
-  severity: warning
-annotations:
-  summary: "High request rejection rate"
-  description: "More than 10% of requests are being rejected"
-```
+**Endpoint:** Available on the frontend HTTP service at `/metrics`.
 ## Tuning Thresholds
......
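Since this change drops the Prometheus query examples, a reader may want a quick way to inspect the new counter directly from the `/metrics` text output. The following is a minimal sketch, not part of the commit: the helper name `rejections_by_endpoint` is hypothetical, the input is the sample output shown in the new docs, and the parsing assumes simple label values (no escaped quotes or `}` characters inside them).

```python
import re
from collections import defaultdict

# Metric name taken from the new documentation text.
METRIC = "dynamo_frontend_model_rejection_total"

# Matches one labelled sample line: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def rejections_by_endpoint(text):
    """Sum the rejection counter per `endpoint` label (hypothetical helper)."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != METRIC:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        totals[labels.get("endpoint", "")] += float(match.group("value"))
    return dict(totals)


# Sample copied from the "Example metrics output" block in the new docs.
SAMPLE = (
    'dynamo_frontend_model_rejection_total'
    '{endpoint="chat_completions",model="Qwen/Qwen3-0.6B"} 32\n'
    'dynamo_frontend_model_rejection_total'
    '{endpoint="completions",model="Qwen/Qwen3-0.6B"} 5\n'
)

print(rejections_by_endpoint(SAMPLE))
```

In practice the text would come from an HTTP GET against the frontend's `/metrics` path rather than a hard-coded string. Because the metric is a counter, a single scrape shows cumulative rejections, so a rate requires comparing two scrapes taken some interval apart.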