[Doc] Changed explanation of generation_tokens_total and prompt_tokens_total counter type metrics to avoid confusion (#16784)

Signed-off-by: insukim1994 <insu.kim@moreh.io>

Commit 7c02d6a1 (unverified), authored Apr 17, 2025 by Insu Kim, committed by GitHub Apr 17, 2025

Showing with 2 additions and 2 deletions

docs/source/design/v1/metrics.md +2 -2

--- a/docs/source/design/v1/metrics.md
+++ b/docs/source/design/v1/metrics.md
@@ -66,8 +66,8 @@ vLLM also provides [a reference example](https://docs.vllm.ai/en/latest/getting_
 The subset of metrics exposed in the Grafana dashboard gives us an indication of which metrics are especially important:

 - `vllm:e2e_request_latency_seconds_bucket` - End to end request latency measured in seconds
- `vllm:prompt_tokens_total` - Prompt Tokens/Sec
- `vllm:generation_tokens_total` - Generation Tokens/Sec
+- `vllm:prompt_tokens_total` - Prompt Tokens
+- `vllm:generation_tokens_total` - Generation Tokens
 - `vllm:time_per_output_token_seconds` - Inter token latency (Time Per Output Token, TPOT) in second.
 - `vllm:time_to_first_token_seconds` - Time to First Token (TTFT) latency in seconds.
 - `vllm:num_requests_running` (also, `_swapped` and `_waiting`) - Number of requests in RUNNING, WAITING, and SWAPPED state
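
Note on the change: as the commit title says, `vllm:prompt_tokens_total` and `vllm:generation_tokens_total` are counter-type metrics, so the exported value is a cumulative total, not a rate. A tokens-per-second figure has to be derived from two samples of the counter (in a Prometheus setup this is typically done with the PromQL `rate()` function). The sketch below illustrates the arithmetic only; `CounterSample` and `tokens_per_second` are hypothetical names, the sample values are made up, and the reset handling is a simplification of what Prometheus does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """One scrape of a cumulative counter (hypothetical helper type)."""
    timestamp: float  # scrape time, in seconds
    value: float      # cumulative token count at that time


def tokens_per_second(earlier: CounterSample, later: CounterSample) -> float:
    """Derive an average tokens/sec rate from two counter samples."""
    elapsed = later.timestamp - earlier.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")

    delta = later.value - earlier.value
    if delta < 0:
        # A counter only decreases when it was reset (e.g. a server restart).
        # Simplifying assumption: count only what accumulated since the reset.
        delta = later.value

    return delta / elapsed


# Made-up example: 500 prompt tokens processed over a 10-second window.
first = CounterSample(timestamp=0.0, value=1000.0)
second = CounterSample(timestamp=10.0, value=1500.0)
print(tokens_per_second(first, second))
```

This is why the description "Prompt Tokens/Sec" was misleading for the raw metric: the per-second value only exists after a rate calculation such as the one above is applied.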