Commits · d3af8c18317c0dc008d42e4367fbb9045cfb7bf6 · OpenDAS / vllm_cscc

14 Apr, 2026 1 commit

[Core][Metrics][BugFix] Replace num_cached_tokens/num_external_computed_tokens... · d3af8c18

Mark McLoughlin authored Apr 14, 2026


[Core][Metrics][BugFix] Replace num_cached_tokens/num_external_computed_tokens with PrefillStats (#37460)

Related to `Counters can only be incremented by non-negative amounts`
error with the `vllm:prompt_tokens_by_source_total` metric.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Or Ozeri <or@ozery.com>

d3af8c18

13 Apr, 2026 1 commit

[Core][Metrics] expose waiting request breakdown via labeled metric (capacity/deferred) (#38435) · 5c18b961

mukesh-hai authored Apr 13, 2026


Signed-off-by: Mukesh Baphna <mukesh@hippocraticai.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>

5c18b961

12 Apr, 2026 1 commit
- [Core][Metrics] Remove `vllm:prompt_tokens_recomputed` metric (#38709) · 72ff142c
  Mark McLoughlin authored Apr 12, 2026
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
  72ff142c
01 Apr, 2026 1 commit
- [Feat][v1] Simple yet General CPU KV Cache Offloading (#37160) · 91e4521f
  Yifan Qiao authored Mar 31, 2026
```
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
```
  91e4521f
14 Feb, 2026 1 commit

[Renderer] Move InputPreprocessor into Renderer (1/2) (#34510) · 73391a1b

Cyrus Leung authored Feb 15, 2026


Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

73391a1b

04 Feb, 2026 1 commit

[Metrics] Add labeled prompt token metrics for P/D disaggregation (#33290) · 4403e3ed

zhanqiuhu authored Feb 04, 2026

Add labeled Prometheus metrics to distinguish where prompt tokens come
from in P/D disaggregated deployments.

In P/D disaggregation, decode instances receive KV cache from prefill instances.
Currently, decode reports inflated prompt throughput because it counts all
prompt tokens as "computed", even though most were transferred.

This PR adds labeled metrics so users can understand actual compute work vs
transferred work:

vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally
vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer
vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache
vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all

Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
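
The source breakdown described in this commit message can be illustrated with a small sketch. It uses only the three `source` label values listed above; the `PromptTokenBreakdown` class and the `split_prompt_tokens` and `accumulate` helpers are hypothetical names for illustration, not vLLM code. It assumes the locally computed count is whatever remains after local cache hits and externally transferred tokens are subtracted from the prompt length, and it does not model the truncated "-1 when all" adjustment.

```python
from dataclasses import dataclass

# Label values taken from the commit message above.
SOURCES = ("local_compute", "external_kv_transfer", "local_cache_hit")


@dataclass(frozen=True)
class PromptTokenBreakdown:
    """Hypothetical per-request record; not a vLLM type."""

    local_compute: int
    external_kv_transfer: int
    local_cache_hit: int

    @property
    def total(self) -> int:
        return self.local_compute + self.external_kv_transfer + self.local_cache_hit

    @property
    def cached(self) -> int:
        # "Total cached (local + external ..." per the commit message.
        return self.local_cache_hit + self.external_kv_transfer


def split_prompt_tokens(prompt_len: int, local_hit: int, external: int) -> PromptTokenBreakdown:
    """Assumption: locally computed tokens are the remainder of the prompt."""
    if min(prompt_len, local_hit, external) < 0:
        raise ValueError("token counts must be non-negative")
    if local_hit + external > prompt_len:
        raise ValueError("cached tokens exceed prompt length")
    return PromptTokenBreakdown(
        local_compute=prompt_len - local_hit - external,
        external_kv_transfer=external,
        local_cache_hit=local_hit,
    )


def accumulate(totals: dict[str, int], breakdown: PromptTokenBreakdown) -> None:
    """Add one request's counts to per-source running totals (a counter stand-in)."""
    for source in SOURCES:
        totals[source] = totals.get(source, 0) + getattr(breakdown, source)
```

On a decode instance where 60 of 100 prompt tokens arrive via KV transfer and 30 hit the local prefix cache, `split_prompt_tokens(100, 30, 60)` attributes only 10 tokens to `local_compute`, which is the distinction the labeled metric is meant to expose.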

4403e3ed

31 Jan, 2026 1 commit
- Support clear mm and encoder cache (#33452) · 22d9a056
  jma99_2333 authored Jan 31, 2026
```
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
```
  22d9a056
18 Dec, 2025 1 commit

[Metrics] Model FLOPs Utilization estimation (#30738) · a0b782f9

SungMinCho authored Dec 17, 2025


Signed-off-by: SungMinCho <tjdals4565@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>

a0b782f9

09 Dec, 2025 1 commit
- feat(metrics): Add prefill KV compute metric excluding cached tokens (#30189) · f1599ca5
  Victor Ziliang Peng authored Dec 08, 2025
```
Signed-off-by: Ziliang Peng <ziliang@character.ai>
```
  f1599ca5
03 Dec, 2025 1 commit
- Add logging for cudagraph related info (#29825) · 69520bc6
  Yong Hoon Shin authored Dec 02, 2025
```
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
```
  69520bc6
01 Dec, 2025 1 commit

[Core][Observability] Add KV cache residency metrics (#27793) · cabc77cc

shivampr authored Dec 01, 2025



Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior:

vllm:kv_block_lifetime_seconds — total lifetime from allocation to free
vllm:kv_block_idle_before_evict_seconds — idle duration before eviction
vllm:kv_block_reuse_gap_seconds — time between consecutive reuses of the same block

These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates.

The implementation uses monotonic timestamps for accuracy and 1% sampling to keep overhead low (~48 bytes/block). It is fully thread-safe and has zero runtime cost when disabled.

Two new runtime flags are introduced:

--kv-cache-metrics – enable KV cache residency metrics
--kv-cache-metrics-sample – control sampling ratio (default: 0.01)

Signed-off-by: Shivam <shivamprasad91@gmail.com>
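
A minimal sketch of how sampled residency timings like those described above could be collected with monotonic timestamps. The `BlockResidencyTracker` class, its hook names, and the plain lists standing in for histograms are all hypothetical, not vLLM code. The sketch assumes eviction is the point at which a block is freed, treats allocation as the first access when measuring reuse gaps, and is not thread-safe.

```python
import random
import time
from collections import defaultdict
from typing import Callable


class BlockResidencyTracker:
    """Hypothetical sampler for KV block residency timings; not vLLM code."""

    def __init__(
        self,
        sample_rate: float = 0.01,
        clock: Callable[[], float] = time.monotonic,
        rng: Callable[[], float] = random.random,
    ) -> None:
        self.sample_rate = sample_rate
        self.clock = clock  # monotonic clock, injectable for testing
        self.rng = rng
        # block_id -> (allocated_at, last_access_at); only sampled blocks are stored.
        self._tracked: dict[int, tuple[float, float]] = {}
        # metric name -> observed durations in seconds (stand-in for histograms).
        self.observations: dict[str, list[float]] = defaultdict(list)

    def on_alloc(self, block_id: int) -> None:
        # Sample a fraction of blocks so unsampled blocks cost no memory.
        if self.rng() < self.sample_rate:
            now = self.clock()
            self._tracked[block_id] = (now, now)

    def on_access(self, block_id: int) -> None:
        entry = self._tracked.get(block_id)
        if entry is None:
            return
        allocated_at, last_access = entry
        now = self.clock()
        # Assumption: the gap is measured from the previous access (or allocation).
        self.observations["kv_block_reuse_gap_seconds"].append(now - last_access)
        self._tracked[block_id] = (allocated_at, now)

    def on_evict(self, block_id: int) -> None:
        entry = self._tracked.pop(block_id, None)
        if entry is None:
            return
        allocated_at, last_access = entry
        now = self.clock()
        self.observations["kv_block_lifetime_seconds"].append(now - allocated_at)
        self.observations["kv_block_idle_before_evict_seconds"].append(now - last_access)
```

Passing `sample_rate=0.01` mirrors the default sampling ratio stated in the commit message; the injectable `clock` and `rng` arguments exist only to make the sketch deterministic under test.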

cabc77cc

10 Nov, 2025 1 commit
- [Metrics] Refactor LoRA state tracking (#26801) · 6f7de33b
  Mark McLoughlin authored Nov 10, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
  6f7de33b
05 Nov, 2025 1 commit
- [Feature]: Add corrupted request metric to V1 metrics system. (#27306) · e1560178
  Snehlata authored Nov 06, 2025
```
Signed-off-by: atalhens <sneh.lata@nutanix.com>
```
  e1560178
23 Oct, 2025 1 commit
- [Metrics] [KVConnector] Add connector prefix cache hit rate stats (#26245) · 88afa110
  Tova Movshovitz authored Oct 23, 2025
```
Signed-off-by: tovam <tovam@pliops.com>
```
  88afa110
12 Oct, 2025 1 commit
- Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633) · 8fcaaf6a
  Harry Mellor authored Oct 12, 2025
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
  8fcaaf6a
10 Oct, 2025 2 commits
- [Metrics] Add test for multi-modal cache stats logging (#26588) · e5192819
  Mark McLoughlin authored Oct 10, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
  e5192819
- [Metrics] Log multi-modal cache stats and fix reset (#26285) · ad430a67
  Cyrus Leung authored Oct 10, 2025
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  ad430a67
05 Oct, 2025 2 commits
- Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247) · d6953beb
  Harry Mellor authored Oct 05, 2025
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
  d6953beb
- [Easy] Add str repr for IterationStats (#26232) · 78c1d5bf
  22quinn authored Oct 04, 2025
```
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
```
  78c1d5bf
27 Sep, 2025 1 commit
- [Core] Don't count preempted tokens in prefix cache hit rate (#25787) · 8bf8f458
  Zhuohan Li authored Sep 26, 2025
```
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
```
  8bf8f458
24 Sep, 2025 1 commit
- [V1][Metrics] Add per-request TPOT histogram (#24015) · d06b5a95
  baxingpiaochong authored Sep 24, 2025
```
Signed-off-by: baxingpiaochong <771405853@qq.com>
```
  d06b5a95
19 Sep, 2025 1 commit
- [P/D][Nixl] Introduce `KVTransferMetrics` and aggregation strategy (#22188) · a3d087ad
  Nicolò Lucchesi authored Sep 19, 2025
```
Signed-off-by: NickLucche <nlucches@redhat.com>
```
  a3d087ad
12 Sep, 2025 1 commit

[V1] feat:add engine v1 tracing (#20372) · 40b6c912

RichardoMu authored Sep 12, 2025


Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: RichardoMu <44485717+RichardoMrMu@users.noreply.github.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Mu Huai <tianbowen.tbw@antgroup.com>
Co-authored-by: Ye Zhang <zhysishu@gmail.com>
Co-authored-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: 瑜琮 <ly186375@antfin.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>

40b6c912

02 Sep, 2025 2 commits
- [Metrics] Deprecate TPOT in favor of ITL (#24110) · 24177984
  Mark McLoughlin authored Sep 02, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
  24177984
- [Doc]: fix typos in Python comments (#24042) · 0235103c
  Didier Durand authored Sep 02, 2025
```
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
```
  0235103c
02 Aug, 2025 1 commit
- [BugFix] Improve internal DP load balancing (#21617) · 8d524ce7
  Nick Hill authored Aug 02, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
  8d524ce7
20 Jun, 2025 1 commit
- Export NaNs in logits to scheduler_stats if output is corrupted (#18777) · 2e3e3c86
  Vlad Tiberiu Mihailescu authored Jun 20, 2025
```
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>
```
  2e3e3c86
19 Jun, 2025 1 commit

Support embedding models in V1 (#16188) · 799397ee

Maximilien de Bayser authored Jun 19, 2025


Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>

799397ee

14 Jun, 2025 1 commit
- [V1][Metrics] Deprecate metrics with gpu_ prefix for non GPU specific metrics. (#18354) · d1e34cc9
  Saheli Bhattacharjee authored Jun 14, 2025
```
Signed-off-by: Saheli Bhattacharjee <saheli@krai.ai>
```
  d1e34cc9
03 Jun, 2025 1 commit
- [Misc] Add SPDX-FileCopyrightText (#19100) · 02f0c7b2
  Simon Mo authored Jun 03, 2025
```
Signed-off-by: simon-mo <simon.mo@hey.com>
```
  02f0c7b2
12 May, 2025 1 commit
- [v1][KVCacheManager] Change prefix caching metric from counting blocks to counting tokens (#18003) · 302f3aca
  Chen Zhang authored May 13, 2025
```
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
```
  302f3aca
01 Apr, 2025 1 commit
- [V1][Metrics] Initial speculative decoding metrics (#15151) · a79cc68b
  Mark McLoughlin authored Apr 01, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
  a79cc68b
24 Mar, 2025 1 commit
- [V1] Aggregate chunked prompt logprobs in model runner (#14875) · 3aee6573
  Nick Hill authored Mar 24, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
  3aee6573
19 Mar, 2025 1 commit
- simple bugfix: Update stats.py (#15139) · 8310e0b5
  Wang Ran (汪然) authored Mar 20, 2025
  
  8310e0b5
07 Mar, 2025 1 commit
- [V1][Metrics] Fix traceback with preemptions+LoRA (#14220) · e1f0835a
  Mark McLoughlin authored Mar 07, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
  e1f0835a
03 Mar, 2025 3 commits
- [WIP][[V1][Metrics] Implement max_num_generation_tokens, request_params_n,... · ae122b1c
  Mark McLoughlin authored Mar 03, 2025
```
[WIP][[V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics (#14055)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
  ae122b1c
- [V1] Refactor parallel sampling support (#13774) · 4167252e
  Mark McLoughlin authored Mar 03, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
  4167252e
- Update deprecated Python 3.8 typing (#13971) · cf069aa8
  Harry Mellor authored Mar 03, 2025
  
  cf069aa8
27 Feb, 2025 1 commit
- [V1][Metrics] Handle preemptions (#13169) · cd711c48
  Mark McLoughlin authored Feb 27, 2025
  
  cd711c48
25 Feb, 2025 1 commit
- [V1][Metrics] Implement vllm:lora_requests_info metric (#13504) · bc32bc73
  Mark McLoughlin authored Feb 25, 2025
  
  bc32bc73