OpenDAS / vllm_cscc
Commit 690cc3ef (Unverified)
Authored Dec 05, 2025 by TimWang
Committed by GitHub, Dec 04, 2025
docs: update metrics design doc to use new vllm:kv_cache_usage_perc (#30041)
Signed-off-by: Tim <tim.wang03@sap.com>
parent 1f0d1845
Showing 1 changed file with 1 addition and 1 deletion

docs/design/metrics.md (+1, -1)
@@ -62,7 +62,7 @@ The subset of metrics exposed in the Grafana dashboard gives us an indication of
 - `vllm:time_per_output_token_seconds` - Inter-token latency (Time Per Output Token, TPOT) in seconds.
 - `vllm:time_to_first_token_seconds` - Time to First Token (TTFT) latency in seconds.
 - `vllm:num_requests_running` (also, `_swapped` and `_waiting`) - Number of requests in the RUNNING, WAITING, and SWAPPED states.
-- `vllm:gpu_cache_usage_perc` - Percentage of used cache blocks by vLLM.
+- `vllm:kv_cache_usage_perc` - Percentage of used cache blocks by vLLM.
 - `vllm:request_prompt_tokens` - Request prompt length.
 - `vllm:request_generation_tokens` - Request generation length.
 - `vllm:request_success` - Number of finished requests by their finish reason: either an EOS token was generated or the max sequence length was reached.
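
For consumers of these metrics, the rename in this hunk means any script or dashboard query that reads `vllm:gpu_cache_usage_perc` needs to look for `vllm:kv_cache_usage_perc` instead. The following is a minimal sketch, not part of the commit, of reading the gauge from Prometheus text-format output with a fallback to the old name. It assumes some deployments may still expose the old name. The helper names (`read_gauge`, `kv_cache_usage`), the `model_name` label value, and the sample payload are hypothetical, and the parser is deliberately simplified (it takes the first matching sample and does not handle every corner of the exposition format).

```python
from typing import Optional

NEW_NAME = "vllm:kv_cache_usage_perc"   # name used after this commit
OLD_NAME = "vllm:gpu_cache_usage_perc"  # name removed from the doc by this commit


def read_gauge(exposition: str, name: str) -> Optional[float]:
    """Return the first sample value for `name` in Prometheus text output, or None."""
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comment lines
        if "{" in line:
            # Sample with labels: name{label="value",...} value [timestamp]
            metric, _, tail = line.partition("{")
            _, _, tail = tail.rpartition("}")
        else:
            # Sample without labels: name value [timestamp]
            metric, _, tail = line.partition(" ")
        if metric != name:
            continue
        fields = tail.split()
        if fields:
            return float(fields[0])
    return None


def kv_cache_usage(exposition: str) -> Optional[float]:
    """Prefer the new metric name and fall back to the old one (hypothetical helper)."""
    value = read_gauge(exposition, NEW_NAME)
    if value is None:
        value = read_gauge(exposition, OLD_NAME)
    return value


# Made-up payload for illustration only; the number is not a real measurement.
SAMPLE = """\
# TYPE vllm:kv_cache_usage_perc gauge
vllm:kv_cache_usage_perc{model_name="example-model"} 0.25
"""

if __name__ == "__main__":
    print(kv_cache_usage(SAMPLE))
```

Looking up the new name first keeps the script correct against a server that follows the updated doc, while the fallback is only a convenience for a mixed fleet and can be dropped once the old name is gone everywhere.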