- 14 Apr, 2026 1 commit
-
-
Mark McLoughlin authored
[Core][Metrics][BugFix] Replace num_cached_tokens/num_external_computed_tokens with PrefillStats (#37460)
Related to the `Counters can only be incremented by non-negative amounts` error with the `vllm:prompt_tokens_by_source_total` metric.
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
Co-authored-by:Or Ozeri <or@ozery.com>
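The error quoted in this commit message is what a Prometheus-style counter raises when asked to increment by a negative amount. The following is a minimal sketch of how that can happen, using a stand-in counter class rather than the real client library. `StandInCounter` and `derived_local_hits` are hypothetical names, and the subtraction shown is only an assumed illustration of the failure mode, not the code this commit changed.

```python
class StandInCounter:
    """Stand-in for a Prometheus-style counter (hypothetical, stdlib only)."""

    def __init__(self) -> None:
        self.value = 0

    def inc(self, amount: int = 1) -> None:
        # Counters are monotonic, so a negative increment is rejected.
        if amount < 0:
            raise ValueError(
                "Counters can only be incremented by non-negative amounts."
            )
        self.value += amount


def derived_local_hits(num_cached_tokens: int, num_external_tokens: int) -> int:
    """Derive a per-source count by subtraction (assumed, illustrative only)."""
    return num_cached_tokens - num_external_tokens


counter = StandInCounter()
counter.inc(derived_local_hits(num_cached_tokens=10, num_external_tokens=4))

error_message = None
try:
    # If the two inputs are tracked inconsistently, the difference is negative.
    counter.inc(derived_local_hits(num_cached_tokens=3, num_external_tokens=8))
except ValueError as exc:
    error_message = str(exc)
```

Keeping the per-source counts together in one record, rather than deriving one from two separately tracked totals, is one way to make such an inconsistency harder to produce.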
-
- 13 Apr, 2026 1 commit
-
-
mukesh-hai authored
Signed-off-by:Mukesh Baphna <mukesh@hippocraticai.com>
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
Co-authored-by:Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by:Mark McLoughlin <markmc@redhat.com>
-
- 12 Apr, 2026 1 commit
-
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
- 01 Apr, 2026 1 commit
-
-
Yifan Qiao authored
Signed-off-by:Yifan Qiao <yifanqiao@berkeley.edu>
Signed-off-by:Yifan Qiao <yifanqiao@inferact.ai>
-
- 14 Feb, 2026 1 commit
-
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by:Cyrus Leung <cyrus.tl.leung@gmail.com>
-
- 04 Feb, 2026 1 commit
-
-
zhanqiuhu authored
Add labeled Prometheus metrics to distinguish where prompt tokens come from in P/D disaggregated deployments.
In P/D disaggregation, decode instances receive KV cache from prefill instances. Currently, decode reports inflated prompt throughput because it counts all prompt tokens as "computed", even though most were transferred.
This PR adds labeled metrics so users can distinguish actual compute work from transferred work:
vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally
vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer
vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache
vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all
Signed-off-by:Zhanqiu Hu <zh338@cornell.edu>
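The split described in this commit message can be sketched as simple bookkeeping over one request's prompt. This is an illustration under stated assumptions, not the code from the PR: `PromptTokenSources`, `split_by_source`, and `to_labeled_samples` are hypothetical names, and the sketch assumes the three sources are disjoint and sum to the prompt length.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PromptTokenSources:
    """Per-request prompt token counts, keyed by where the tokens came from."""

    local_compute: int
    local_cache_hit: int
    external_kv_transfer: int


def split_by_source(
    num_prompt_tokens: int, num_local_cached: int, num_external: int
) -> PromptTokenSources:
    """Attribute each prompt token to exactly one source (assumed disjoint)."""
    if min(num_prompt_tokens, num_local_cached, num_external) < 0:
        raise ValueError("token counts must be non-negative")
    if num_local_cached + num_external > num_prompt_tokens:
        raise ValueError("cached + external tokens exceed the prompt length")
    return PromptTokenSources(
        # Whatever was neither cached locally nor transferred was prefilled here.
        local_compute=num_prompt_tokens - num_local_cached - num_external,
        local_cache_hit=num_local_cached,
        external_kv_transfer=num_external,
    )


def to_labeled_samples(sources: PromptTokenSources) -> dict[str, int]:
    """Render the counts as labeled sample names in Prometheus text style."""
    return {
        f'vllm:prompt_tokens_by_source_total{{source="{name}"}}': value
        for name, value in asdict(sources).items()
    }


# A decode instance that received most of its prompt via KV transfer.
sources = split_by_source(
    num_prompt_tokens=1000, num_local_cached=64, num_external=900
)
samples = to_labeled_samples(sources)
```

In this example only 36 of the 1000 prompt tokens count as local compute, which is the distinction the labeled metric is meant to expose on decode instances.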
-
- 31 Jan, 2026 1 commit
-
-
jma99_2333 authored
Signed-off-by:Roger Wang <hey@rogerw.io>
Co-authored-by:Roger Wang <hey@rogerw.io>
-
- 18 Dec, 2025 1 commit
-
-
SungMinCho authored
Signed-off-by:SungMinCho <tjdals4565@gmail.com>
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
Co-authored-by:Mark McLoughlin <markmc@redhat.com>
-
- 09 Dec, 2025 1 commit
-
-
Victor Ziliang Peng authored
Signed-off-by:Ziliang Peng <ziliang@character.ai>
-
- 03 Dec, 2025 1 commit
-
-
Yong Hoon Shin authored
Signed-off-by:Yong Hoon Shin <yhshin@meta.com>
-
- 01 Dec, 2025 1 commit
-
-
shivampr authored
Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior:
vllm:kv_block_lifetime_seconds - total lifetime from allocation to free
vllm:kv_block_idle_before_evict_seconds - idle duration before eviction
vllm:kv_block_reuse_gap_seconds - time between consecutive reuses of the same block
These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates.
Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled.
Two new runtime flags are introduced:
--kv-cache-metrics - enable KV cache residency metrics
--kv-cache-metrics-sample - control sampling ratio (default: 0.01)
Signed-off-by:Shivam <shivamprasad91@gmail.com>
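The three measurements named in this commit message can be sketched with monotonic timestamps and per-block sampling. This is a rough illustration, not the implementation from the commit: `BlockResidencySampler` and its methods are hypothetical, plain lists stand in for the histograms, free and eviction are treated as a single event, and the sketch is not thread-safe.

```python
import random
import time
from typing import Callable


class BlockResidencySampler:
    """Tracks residency timings for a sampled subset of KV cache blocks."""

    def __init__(
        self,
        sample_rate: float = 0.01,
        clock: Callable[[], float] = time.monotonic,
        rng: Callable[[], float] = random.random,
    ) -> None:
        self._sample_rate = sample_rate
        self._clock = clock
        self._rng = rng
        self._alloc_time: dict[int, float] = {}
        self._last_access: dict[int, float] = {}
        # Plain lists stand in for the three histograms.
        self.lifetimes: list[float] = []
        self.idle_before_evict: list[float] = []
        self.reuse_gaps: list[float] = []

    def on_alloc(self, block_id: int) -> None:
        # Decide once, at allocation, whether this block is tracked at all.
        if self._rng() >= self._sample_rate:
            return
        now = self._clock()
        self._alloc_time[block_id] = now
        self._last_access[block_id] = now

    def on_access(self, block_id: int) -> None:
        last = self._last_access.get(block_id)
        if last is None:
            return  # block was not sampled
        now = self._clock()
        self.reuse_gaps.append(now - last)
        self._last_access[block_id] = now

    def on_evict(self, block_id: int) -> None:
        start = self._alloc_time.pop(block_id, None)
        if start is None:
            return  # block was not sampled
        now = self._clock()
        self.lifetimes.append(now - start)
        self.idle_before_evict.append(now - self._last_access.pop(block_id))


# Deterministic walk-through with a fake clock and every block sampled.
ticks = iter([0.0, 2.0, 5.0])
sampler = BlockResidencySampler(sample_rate=1.0, clock=lambda: next(ticks))
sampler.on_alloc(7)   # allocated at t=0
sampler.on_access(7)  # reused at t=2
sampler.on_evict(7)   # evicted at t=5
```

Sampling at allocation time means unsampled blocks cost only a dictionary lookup on later events, which is one plausible way to keep overhead low at a 1% sampling ratio.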
-
- 10 Nov, 2025 1 commit
-
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
- 05 Nov, 2025 1 commit
-
-
Snehlata authored
Signed-off-by:atalhens <sneh.lata@nutanix.com>
-
- 23 Oct, 2025 1 commit
-
-
Tova Movshovitz authored
Signed-off-by:tovam <tovam@pliops.com>
-
- 12 Oct, 2025 1 commit
-
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
- 10 Oct, 2025 2 commits
-
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
- 05 Oct, 2025 2 commits
-
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
22quinn authored
Signed-off-by:22quinn <33176974+22quinn@users.noreply.github.com>
-
- 27 Sep, 2025 1 commit
-
-
Zhuohan Li authored
Signed-off-by:Zhuohan Li <zhuohan123@gmail.com>
-
- 24 Sep, 2025 1 commit
-
-
baxingpiaochong authored
Signed-off-by:baxingpiaochong <771405853@qq.com>
-
- 19 Sep, 2025 1 commit
-
-
Nicolò Lucchesi authored
Signed-off-by:NickLucche <nlucches@redhat.com>
-
- 12 Sep, 2025 1 commit
-
-
RichardoMu authored
Signed-off-by:Mu Huai <tianbowen.tbw@antgroup.com>
Signed-off-by:Ye Zhang <zhysishu@gmail.com>
Signed-off-by:RichardoMu <44485717+RichardoMrMu@users.noreply.github.com>
Signed-off-by:simon-mo <simon.mo@hey.com>
Signed-off-by:Aaron Pham <contact@aarnphm.xyz>
Signed-off-by:22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by:Mu Huai <tianbowen.tbw@antgroup.com>
Co-authored-by:Ye Zhang <zhysishu@gmail.com>
Co-authored-by:Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by:simon-mo <simon.mo@hey.com>
Co-authored-by:瑜琮 <ly186375@antfin.com>
Co-authored-by:Aaron Pham <contact@aarnphm.xyz>
Co-authored-by:22quinn <33176974+22quinn@users.noreply.github.com>
-
- 02 Sep, 2025 2 commits
-
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
Didier Durand authored
Signed-off-by:Didier Durand <durand.didier@gmail.com>
Co-authored-by:Jee Jee Li <pandaleefree@gmail.com>
-
- 02 Aug, 2025 1 commit
-
-
Nick Hill authored
Signed-off-by:Nick Hill <nhill@redhat.com>
-
- 20 Jun, 2025 1 commit
-
-
Vlad Tiberiu Mihailescu authored
Signed-off-by:Vlad Mihailescu <vtmihailescu@gmail.com>
-
- 19 Jun, 2025 1 commit
-
-
Maximilien de Bayser authored
Signed-off-by:Max de Bayser <mbayser@br.ibm.com>
Signed-off-by:Max de Bayser <maxdebayser@gmail.com>
Signed-off-by:22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by:22quinn <33176974+22quinn@users.noreply.github.com>
-
- 14 Jun, 2025 1 commit
-
-
Saheli Bhattacharjee authored
Signed-off-by:Saheli Bhattacharjee <saheli@krai.ai>
-
- 03 Jun, 2025 1 commit
-
-
Simon Mo authored
Signed-off-by:simon-mo <simon.mo@hey.com>
-
- 12 May, 2025 1 commit
-
-
Chen Zhang authored
Signed-off-by:Chen Zhang <zhangch99@outlook.com>
-
- 01 Apr, 2025 1 commit
-
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
- 24 Mar, 2025 1 commit
-
-
Nick Hill authored
Signed-off-by:Nick Hill <nhill@redhat.com>
-
- 19 Mar, 2025 1 commit
-
-
Wang Ran (汪然) authored
-
- 07 Mar, 2025 1 commit
-
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
- 03 Mar, 2025 3 commits
-
-
Mark McLoughlin authored
[WIP][V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics (#14055)
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
Harry Mellor authored
-
- 27 Feb, 2025 1 commit
-
-
Mark McLoughlin authored
-
- 25 Feb, 2025 1 commit
-
-
Mark McLoughlin authored
-