- 14 Apr, 2026 1 commit
-
-
Mark McLoughlin authored
[Core][Metrics][BugFix] Replace num_cached_tokens/num_external_computed_tokens with PrefillStats (#37460) Related to `Counters can only be incremented by non-negative amounts` error with the `vllm:prompt_tokens_by_source_total` metric. Signed-off-by:
Mark McLoughlin <markmc@redhat.com> Co-authored-by:
Or Ozeri <or@ozery.com>
-
- 12 Apr, 2026 1 commit
-
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
- 04 Feb, 2026 1 commit
-
-
zhanqiuhu authored
Add labeled Prometheus metrics to distinguish where prompt tokens come from in P/D disaggregated deployments. In P/D disaggregation, decode instances receive KV cache from prefill instances. Currently, decode reports inflated prompt throughput because it counts all prompt tokens as "computed", even though most were transferred. This PR adds labeled metrics so users can understand actual compute work vs transferred work: vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all Signed-off-by:Zhanqiu Hu <zh338@cornell.edu>
-
- 09 Dec, 2025 1 commit
-
-
Victor Ziliang Peng authored
Signed-off-by:Ziliang Peng <ziliang@character.ai>
-
- 10 Nov, 2025 1 commit
-
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
- 05 Nov, 2025 1 commit
-
-
Snehlata authored
Signed-off-by:atalhens <sneh.lata@nutanix.com>
-
- 05 Oct, 2025 2 commits
-
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
22quinn authored
Signed-off-by:22quinn <33176974+22quinn@users.noreply.github.com>
-