- 14 Apr, 2026 1 commit
-
-
Mark McLoughlin authored
[Core][Metrics][BugFix] Replace num_cached_tokens/num_external_computed_tokens with PrefillStats (#37460) Related to `Counters can only be incremented by non-negative amounts` error with the `vllm:prompt_tokens_by_source_total` metric. Signed-off-by:
Mark McLoughlin <markmc@redhat.com> Co-authored-by:
Or Ozeri <or@ozery.com>
-
- 13 Apr, 2026 1 commit
-
-
mukesh-hai authored
Signed-off-by:
Mukesh Baphna <mukesh@hippocraticai.com> Signed-off-by:
Mark McLoughlin <markmc@redhat.com> Co-authored-by:
Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by:
Mark McLoughlin <markmc@redhat.com>
-
- 12 Apr, 2026 1 commit
-
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
- 01 Apr, 2026 1 commit
-
-
Yifan Qiao authored
Signed-off-by:
Yifan Qiao <yifanqiao@berkeley.edu> Signed-off-by:
Yifan Qiao <yifanqiao@inferact.ai>
-
- 20 Mar, 2026 1 commit
-
-
Martin Hickey authored
Signed-off-by:Martin Hickey <martin.hickey@ie.ibm.com>
-
- 18 Mar, 2026 1 commit
-
-
Thillai Chithambaram authored
Signed-off-by:Thillai Chithambaram <thillaichithambaram.a@gmail.com>
-
- 23 Feb, 2026 1 commit
-
-
Mark McLoughlin authored
Export the existing Model FLOPs Utilization (MFU) metrics via Prometheus. `--enable-mfu-metrics` is required for these to be exposed. Co-authored-by:
Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by:
Mark McLoughlin <markmc@redhat.com>
-
- 14 Feb, 2026 1 commit
-
-
Cyrus Leung authored
Signed-off-by:
DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by:
Cyrus Leung <cyrus.tl.leung@gmail.com>
-
- 13 Feb, 2026 1 commit
-
-
Jaewon authored
Signed-off-by:
Jaewon Lee <jaewon@meta.com> Co-authored-by:
Lu Fang <30275821+houseroad@users.noreply.github.com>
-
- 04 Feb, 2026 1 commit
-
-
zhanqiuhu authored
Add labeled Prometheus metrics to distinguish where prompt tokens come from in P/D disaggregated deployments. In P/D disaggregation, decode instances receive KV cache from prefill instances. Currently, decode reports inflated prompt throughput because it counts all prompt tokens as "computed", even though most were transferred. This PR adds labeled metrics so users can understand actual compute work vs transferred work: vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all Signed-off-by:Zhanqiu Hu <zh338@cornell.edu>
-
- 31 Jan, 2026 1 commit
-
-
jma99_2333 authored
Signed-off-by:
Roger Wang <hey@rogerw.io> Co-authored-by:
Roger Wang <hey@rogerw.io>
-
- 27 Jan, 2026 1 commit
-
-
omkhalil authored
Fix UnembedMetrics to correctly count FLOPs for the unembedding (LM head) layer. The bug: UnembedMetrics used total_num_tokens() which counts all tokens in the batch for projection flops, vocab projections are run on just the last token for the autoregressive use case. Co-authored-by:Omar Mohamed Khalil <omarkhalil@meta.com>
-
- 20 Jan, 2026 1 commit
-
-
杨朱 · Kiki authored
This PR completes the removal of the deprecated vllm:time_per_output_token_seconds metric that was deprecated in v0.11, hidden in v0.12, scheduled for removal in v0.13, but delayed until v0.15. Signed-off-by:
carlory <baofa.fan@daocloud.io> Co-authored-by:
Claude Haiku 4.5 <noreply@anthropic.com>
-
- 18 Dec, 2025 1 commit
-
-
SungMinCho authored
Signed-off-by:
SungMinCho <tjdals4565@gmail.com> Signed-off-by:
Mark McLoughlin <markmc@redhat.com> Co-authored-by:
Mark McLoughlin <markmc@redhat.com>
-
- 09 Dec, 2025 1 commit
-
-
Victor Ziliang Peng authored
Signed-off-by:Ziliang Peng <ziliang@character.ai>
-
- 08 Dec, 2025 1 commit
-
-
Nick Hill authored
Signed-off-by:Nick Hill <nhill@redhat.com>
-
- 03 Dec, 2025 1 commit
-
-
Yong Hoon Shin authored
Signed-off-by:Yong Hoon Shin <yhshin@meta.com>
-
- 02 Dec, 2025 1 commit
-
-
Seiji Eicher authored
Signed-off-by:
Seiji Eicher <seiji@anyscale.com> Co-authored-by:
rongfu.leng <1275177125@qq.com>
-
- 01 Dec, 2025 1 commit
-
-
shivampr authored
Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior: vllm:kv_block_lifetime_seconds — total lifetime from allocation to free vllm:kv_block_idle_before_evict_seconds — idle duration before eviction vllm:kv_block_reuse_gap_seconds — time between consecutive reuses of the same block These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates. Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled. Two new runtime flags are introduced: --kv-cache-metrics – enable KV cache residency metrics --kv-cache-metrics-sample – control sampling ratio (default: 0.01) Signed-off-by:Shivam <shivamprasad91@gmail.com>
-
- 28 Nov, 2025 1 commit
-
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
- 25 Nov, 2025 1 commit
-
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
- 21 Nov, 2025 1 commit
-
-
wangxiyuan authored
Signed-off-by:wangxiyuan <wangxiyuan1007@gmail.com>
-
- 17 Nov, 2025 1 commit
-
-
Jae-Won Chung authored
The `vllm:kv_cache_usage_perc` Gauge metric is missing `multiprocess_mode="mostrecent"` and ends up returning ``` vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="277"} 0.0 vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="275"} 0.0 vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="273"} 0.6530455880475035 ... ``` The deprecated `vllm:gpu_cache_usage_perc` Gauge metric has `multiprocess_mode="mostrecent"`. Signed-off-by:Jae-Won Chung <jwnchung@umich.edu>
-
- 14 Nov, 2025 1 commit
-
-
lyn610 authored
Add tracking and periodic logging for the number of preempted requests in the metrics logger. This helps monitor system behavior under load. Signed-off-by:Yining Liu <610lyn@gmail.com>
-
- 10 Nov, 2025 1 commit
-
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
- 05 Nov, 2025 1 commit
-
-
Snehlata authored
Signed-off-by:atalhens <sneh.lata@nutanix.com>
-
- 04 Nov, 2025 1 commit
-
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
- 30 Oct, 2025 1 commit
-
-
Sumanth R Hegde authored
Signed-off-by:SumanthRH <sumanthrh99@gmail.com>
-
- 29 Oct, 2025 2 commits
-
-
Nicolò Lucchesi authored
Signed-off-by:
NickLucche <nlucches@redhat.com> Signed-off-by:
Mark McLoughlin <markmc@redhat.com> Co-authored-by:
Mark McLoughlin <markmc@redhat.com>
-
Braulio Dumba authored
Signed-off-by:Braulio Dumba <Braulio.Dumba@ibm.com>
-
- 24 Oct, 2025 1 commit
-
-
Wentao Ye authored
Signed-off-by:yewentao256 <zhyanwentao@126.com>
-
- 23 Oct, 2025 1 commit
-
-
Tova Movshovitz authored
Signed-off-by:tovam <tovam@pliops.com>
-
- 18 Oct, 2025 1 commit
-
-
Tova Movshovitz authored
Signed-off-by:tovam <tovam@pliops.com>
-
- 14 Oct, 2025 1 commit
-
-
Lucia Fang authored
Signed-off-by:Lu Fang <fanglu@fb.com>
-
- 12 Oct, 2025 1 commit
-
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
- 10 Oct, 2025 2 commits
-
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
- 05 Oct, 2025 2 commits
-
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
22quinn authored
Signed-off-by:22quinn <33176974+22quinn@users.noreply.github.com>
-
- 03 Oct, 2025 1 commit
-
-
Nicolò Lucchesi authored
Signed-off-by:NickLucche <nlucches@redhat.com>
-