Commits · d3af8c18317c0dc008d42e4367fbb9045cfb7bf6 · OpenDAS / vllm_cscc

14 Apr, 2026 1 commit

[Core][Metrics][BugFix] Replace num_cached_tokens/num_external_computed_tokens... · d3af8c18

Mark McLoughlin authored Apr 14, 2026


[Core][Metrics][BugFix] Replace num_cached_tokens/num_external_computed_tokens with PrefillStats (#37460)

Related to `Counters can only be incremented by non-negative amounts`
error with the `vllm:prompt_tokens_by_source_total` metric.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Or Ozeri <or@ozery.com>
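The error quoted above is what a Prometheus client counter raises on a negative `inc()`. The stdlib-only sketch below reproduces that failure mode. It also shows the general remedy: carry the prefill token counts in one validated record, rather than deriving them later by subtracting separately tracked fields. `NonNegativeCounter` and `PrefillBreakdown` are hypothetical names, not vLLM's actual `PrefillStats`, whose fields may differ.

```python
from dataclasses import dataclass


class NonNegativeCounter:
    """Hypothetical stand-in for a Prometheus counter (stdlib only)."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        # Same rule as a Prometheus counter: totals may only grow.
        if amount < 0:
            raise ValueError(
                "Counters can only be incremented by non-negative amounts.")
        self.value += amount


@dataclass(frozen=True)
class PrefillBreakdown:
    """Hypothetical per-request prefill split (not vLLM's PrefillStats)."""

    num_prompt_tokens: int
    num_local_cached_tokens: int
    num_external_computed_tokens: int

    def __post_init__(self) -> None:
        # Validate once, where the numbers are produced, so an inconsistent
        # pair of counts fails loudly here instead of turning into a negative
        # counter increment later in the metrics path.
        reused = self.num_local_cached_tokens + self.num_external_computed_tokens
        if (self.num_local_cached_tokens < 0
                or self.num_external_computed_tokens < 0
                or reused > self.num_prompt_tokens):
            raise ValueError(f"inconsistent prefill breakdown: {self!r}")

    @property
    def num_computed_tokens(self) -> int:
        # Tokens actually prefilled on this instance; >= 0 by construction.
        return (self.num_prompt_tokens - self.num_local_cached_tokens
                - self.num_external_computed_tokens)


counter = NonNegativeCounter()
stats = PrefillBreakdown(num_prompt_tokens=100, num_local_cached_tokens=64,
                         num_external_computed_tokens=32)
counter.inc(stats.num_computed_tokens)  # counter.value is now 4.0
```

Constructing an inconsistent record (for example 8 cached plus 5 external tokens for a 10-token prompt) raises immediately, which points at the producer of the bad numbers rather than at the metrics logger.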


13 Apr, 2026 1 commit

[Core][Metrics] expose waiting request breakdown via labeled metric (capacity/deferred) (#38435) · 5c18b961

mukesh-hai authored Apr 13, 2026


Signed-off-by: Mukesh Baphna <mukesh@hippocraticai.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>


12 Apr, 2026 1 commit
- [Core][Metrics] Remove `vllm:prompt_tokens_recomputed` metric (#38709) · 72ff142c
  Mark McLoughlin authored Apr 12, 2026
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
01 Apr, 2026 1 commit
- [Feat][v1] Simple yet General CPU KV Cache Offloading (#37160) · 91e4521f
  Yifan Qiao authored Mar 31, 2026
```
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
```
20 Mar, 2026 1 commit
- [Metrics] Some small refactoring for better maintainability (#33898) · 880be2b1
  Martin Hickey authored Mar 20, 2026
```
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
```
18 Mar, 2026 1 commit
- [Bugfix] Expand quantization method support in perf metrics (#37231) · 828f862a
  Thillai Chithambaram authored Mar 18, 2026
```
Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com>
```
23 Feb, 2026 1 commit

[Metrics] Add Prometheus counters for Model FLOPs Utilization (MFU) (#30950) · 5cc7c445

Mark McLoughlin authored Feb 23, 2026



Export the existing Model FLOPs Utilization (MFU) metrics via Prometheus.

`--enable-mfu-metrics` is required for these to be exposed.
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
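MFU is achieved FLOPs per second divided by the hardware's peak FLOPs per second. Given a monotonically increasing FLOPs counter, the achieved rate is the counter delta over a time window, which is what a PromQL `rate()` computes. The minimal sketch below assumes you sample such a counter twice and know the aggregate peak of the devices it covers. The helper name is hypothetical, and the exported metric names are not shown here.

```python
def mfu_from_counter_samples(
    flops_t0: float,
    flops_t1: float,
    time_t0: float,
    time_t1: float,
    peak_flops_per_s: float,
) -> float:
    """Estimate MFU from two samples of a cumulative FLOPs counter.

    Hypothetical helper: equivalent to rate(counter) / peak over the window.
    """
    if time_t1 <= time_t0:
        raise ValueError("samples must be strictly increasing in time")
    if flops_t1 < flops_t0:
        # Counters only go down when the process restarts; skip this window.
        raise ValueError("counter reset detected")
    if peak_flops_per_s <= 0:
        raise ValueError("peak_flops_per_s must be positive")
    achieved_flops_per_s = (flops_t1 - flops_t0) / (time_t1 - time_t0)
    return achieved_flops_per_s / peak_flops_per_s


# 4e15 FLOPs over 10 s on hardware with a 1e15 FLOP/s peak
print(mfu_from_counter_samples(0.0, 4.0e15, 0.0, 10.0, 1.0e15))
```

Exporting counters rather than a precomputed ratio lets the monitoring system choose the averaging window after the fact.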


14 Feb, 2026 1 commit

[Renderer] Move InputPreprocessor into Renderer (1/2) (#34510) · 73391a1b

Cyrus Leung authored Feb 15, 2026


Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>


13 Feb, 2026 1 commit

[Core] Profiler improvements and lazy initialization (#33198) · 4453ba8d

Jaewon authored Feb 12, 2026


Signed-off-by: Jaewon Lee <jaewon@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>


04 Feb, 2026 1 commit

[Metrics] Add labeled prompt token metrics for P/D disaggregation (#33290) · 4403e3ed

zhanqiuhu authored Feb 04, 2026

Add labeled Prometheus metrics to distinguish where prompt tokens come
from in P/D disaggregated deployments.

In P/D disaggregation, decode instances receive KV cache from prefill instances.
Currently, decode reports inflated prompt throughput because it counts all
prompt tokens as "computed", even though most were transferred.

This PR adds labeled metrics so users can understand actual compute work vs
transferred work:

- `vllm:prompt_tokens_by_source_total{source="local_compute"}`: tokens prefilled locally
- `vllm:prompt_tokens_by_source_total{source="external_kv_transfer"}`: tokens received via KV transfer
- `vllm:prompt_tokens_by_source_total{source="local_cache_hit"}`: tokens from the local prefix cache
- `vllm:prompt_tokens_cached_total`: total cached (local + external, -1 when all
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
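This sketch assumes each prompt token is attributed to exactly one source. Under that assumption, the three `prompt_tokens_by_source_total` labels for a request sum to its prompt length, and the locally computed share is whatever remains after subtracting local cache hits and externally transferred tokens. The helper and its input shape are hypothetical and illustrate the bookkeeping only. This is not vLLM code.

```python
from typing import Dict, Iterable, Tuple


def prompt_tokens_by_source(
    requests: Iterable[Tuple[int, int, int]],
) -> Dict[str, int]:
    """Sum prompt tokens per source label.

    Each request is (prompt_len, local_cache_hit, external_kv_transfer);
    this input shape is a hypothetical choice for illustration.
    """
    totals = {"local_compute": 0, "external_kv_transfer": 0, "local_cache_hit": 0}
    for prompt_len, local_cache_hit, external in requests:
        local_compute = prompt_len - local_cache_hit - external
        if min(local_cache_hit, external, local_compute) < 0:
            raise ValueError("token sources must be non-negative")
        totals["local_compute"] += local_compute
        totals["external_kv_transfer"] += external
        totals["local_cache_hit"] += local_cache_hit
    return totals


# A decode instance that received most of the first prompt via KV transfer,
# plus a second request that hit the local prefix cache.
print(prompt_tokens_by_source([(1000, 0, 992), (200, 128, 0)]))
# {'local_compute': 80, 'external_kv_transfer': 992, 'local_cache_hit': 128}
```

Only the `local_compute` total reflects prefill work done on this instance. An unlabeled total would instead report all 1200 tokens as computed.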


31 Jan, 2026 1 commit
- Support clear mm and encoder cache (#33452) · 22d9a056
  jma99_2333 authored Jan 31, 2026
```
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
```
27 Jan, 2026 1 commit

[Metrics][MFU] Fix UnembedMetrics FLOP overcounting for prefill (#33045) · 5ec44056

omkhalil authored Jan 27, 2026

Fix UnembedMetrics to correctly count FLOPs for the unembedding (LM head) layer.

The bug: UnembedMetrics used `total_num_tokens()`, which counts every token in the
batch when computing projection FLOPs. In the autoregressive use case, however, the
vocab projection runs only on the last token of each request, so prefill was overcounted.
Co-authored-by: Omar Mohamed Khalil <omarkhalil@meta.com>
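The size of the overcount can be seen from the standard cost of `2 * hidden_size * vocab_size` FLOPs per projected row. This sketch assumes each request needs logits only for its last position, and it ignores cases that need more rows. The function names and model sizes are hypothetical.

```python
from typing import List


def unembed_flops(hidden_size: int, vocab_size: int, num_rows: int) -> int:
    # [num_rows, hidden] x [hidden, vocab]: one multiply and one add per term.
    return 2 * hidden_size * vocab_size * num_rows


def unembed_flops_overcounted(hidden_size: int, vocab_size: int,
                              prompt_lens: List[int]) -> int:
    # Buggy pattern: every scheduled token is treated as a projected row.
    return unembed_flops(hidden_size, vocab_size, sum(prompt_lens))


def unembed_flops_fixed(hidden_size: int, vocab_size: int,
                        prompt_lens: List[int]) -> int:
    # Fixed pattern: one projected row (the last token) per request.
    return unembed_flops(hidden_size, vocab_size, len(prompt_lens))


prompt_lens = [512, 512, 512]
bad = unembed_flops_overcounted(4096, 32000, prompt_lens)
good = unembed_flops_fixed(4096, 32000, prompt_lens)
print(bad // good)  # 512
```

In a prefill-heavy batch, the overcount factor equals the average prompt length. This is why the bug mainly inflated prefill MFU, while decode steps (one token per request) were unaffected.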


20 Jan, 2026 1 commit

[Metrics] Complete removal of deprecated vllm:time_per_output_token_seconds metric (#32661) · bb917203

杨朱 · Kiki authored Jan 20, 2026



This PR completes the removal of the deprecated vllm:time_per_output_token_seconds
metric that was deprecated in v0.11, hidden in v0.12, scheduled for removal in v0.13,
but delayed until v0.15.
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>


18 Dec, 2025 1 commit

[Metrics] Model FLOPs Utilization estimation (#30738) · a0b782f9

SungMinCho authored Dec 17, 2025


Signed-off-by: SungMinCho <tjdals4565@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>


09 Dec, 2025 1 commit
- feat(metrics): Add prefill KV compute metric excluding cached tokens (#30189) · f1599ca5
  Victor Ziliang Peng authored Dec 08, 2025
```
Signed-off-by: Ziliang Peng <ziliang@character.ai>
```
08 Dec, 2025 1 commit
- [BugFix] Unblock use of LoRA with data parallel mode (#30220) · d726a7b0
  Nick Hill authored Dec 07, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
03 Dec, 2025 1 commit
- Add logging for cudagraph related info (#29825) · 69520bc6
  Yong Hoon Shin authored Dec 02, 2025
```
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
```
02 Dec, 2025 1 commit

[Misc] Add ReplicaId to Ray metrics (#24267) · 22274b21

Seiji Eicher authored Dec 01, 2025


Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: rongfu.leng <1275177125@qq.com>


01 Dec, 2025 1 commit

[Core][Observability] Add KV cache residency metrics (#27793) · cabc77cc

shivampr authored Dec 01, 2025



Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior:

- `vllm:kv_block_lifetime_seconds`: total lifetime from allocation to free
- `vllm:kv_block_idle_before_evict_seconds`: idle duration before eviction
- `vllm:kv_block_reuse_gap_seconds`: time between consecutive reuses of the same block

These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates.

The implementation uses monotonic timestamps for accuracy and 1% sampling by default for minimal overhead (~48 bytes/block). It is fully thread-safe and has zero runtime cost when disabled.

Two new runtime flags are introduced:

- `--kv-cache-metrics`: enable KV cache residency metrics
- `--kv-cache-metrics-sample`: control the sampling ratio (default: 0.01)
Signed-off-by: Shivam <shivamprasad91@gmail.com>
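The three histograms can be fed from a small set of block lifecycle hooks. The stdlib sketch below is a hypothetical model, not vLLM's implementation, and makes several assumptions: whether to track a block is decided once at allocation time, allocation counts as an access, and "free" and "evict" are separate events because a freed block may stay cached until it is evicted. For brevity it omits the locking a thread-safe version would need.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class _BlockRecord:
    allocated_at: float
    last_access: float
    last_reuse: Optional[float] = None


class KVResidencyTracker:
    """Hypothetical sampled tracker feeding three residency histograms.

    Not thread-safe: a real implementation would guard state with a lock.
    """

    def __init__(self, sample_rate: float = 0.01,
                 clock: Callable[[], float] = time.monotonic,
                 rng: Optional[random.Random] = None) -> None:
        self.sample_rate = sample_rate
        self.clock = clock  # monotonic: immune to wall-clock jumps
        self.rng = rng if rng is not None else random.Random()
        self._tracked: Dict[int, _BlockRecord] = {}
        self.lifetimes: List[float] = []
        self.idle_before_evict: List[float] = []
        self.reuse_gaps: List[float] = []

    def on_allocate(self, block_id: int) -> None:
        # Decide once per allocation whether to track this block at all,
        # so unsampled blocks cost one random draw and nothing else.
        if self.rng.random() < self.sample_rate:
            now = self.clock()
            self._tracked[block_id] = _BlockRecord(allocated_at=now, last_access=now)

    def on_access(self, block_id: int) -> None:
        rec = self._tracked.get(block_id)
        if rec is None:
            return
        now = self.clock()
        if rec.last_reuse is not None:
            # Gap between this reuse and the previous one.
            self.reuse_gaps.append(now - rec.last_reuse)
        rec.last_reuse = now
        rec.last_access = now

    def on_free(self, block_id: int) -> None:
        rec = self._tracked.get(block_id)
        if rec is not None:
            self.lifetimes.append(self.clock() - rec.allocated_at)

    def on_evict(self, block_id: int) -> None:
        rec = self._tracked.pop(block_id, None)
        if rec is not None:
            self.idle_before_evict.append(self.clock() - rec.last_access)
```

In a real exporter, each appended value would be a histogram observation. With `sample_rate=1.0` and an injected fake clock, the hooks can be exercised deterministically.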


28 Nov, 2025 1 commit
- [mypy] Enable type checking for more directories (#29674) · 9e6bcda3
  Cyrus Leung authored Nov 29, 2025
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
25 Nov, 2025 1 commit
- [Metrics] Scheduled removal of deprecated metrics (#29330) · 9cf4edae
  Mark McLoughlin authored Nov 25, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
21 Nov, 2025 1 commit
- [Doc] Update plugin doc (#28532) · 4050bae4
  wangxiyuan authored Nov 21, 2025
```
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
```
17 Nov, 2025 1 commit

[Metrics] Fix KV cache usage percent metric multiproc (#28792) · d4acf518

Jae-Won Chung authored Nov 17, 2025



The `vllm:kv_cache_usage_perc` Gauge metric is missing `multiprocess_mode="mostrecent"`, so it ends up returning one sample per process:

```
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="277"} 0.0
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="275"} 0.0
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="273"} 0.6530455880475035
...
```

By contrast, the deprecated `vllm:gpu_cache_usage_perc` Gauge metric already sets `multiprocess_mode="mostrecent"`.
Signed-off-by: Jae-Won Chung <jwnchung@umich.edu>
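In `prometheus_client`'s multiprocess mode, each process writes its own gauge value, and the collector merges them according to the gauge's `multiprocess_mode`. The default for gauges keeps one series per `pid`, which is what the output above shows. Declaring `Gauge(..., multiprocess_mode="mostrecent")` instead reports only the most recently written value. The stdlib sketch below models the two selection rules conceptually. It is not the library's code, and the write timestamps are made up.

```python
from typing import Dict, NamedTuple, Sequence


class GaugeSample(NamedTuple):
    pid: int
    value: float
    written_at: float  # time the process last wrote its value


def merge_all(samples: Sequence[GaugeSample]) -> Dict[int, float]:
    # One series per pid: zeros from processes that never update stay visible.
    return {s.pid: s.value for s in samples}


def merge_mostrecent(samples: Sequence[GaugeSample]) -> float:
    # One value overall: whichever process wrote last wins.
    if not samples:
        raise ValueError("no samples to merge")
    return max(samples, key=lambda s: s.written_at).value


samples = [
    GaugeSample(pid=277, value=0.0, written_at=100.0),
    GaugeSample(pid=275, value=0.0, written_at=100.5),
    GaugeSample(pid=273, value=0.6530455880475035, written_at=101.0),
]
print(merge_mostrecent(samples))  # pid 273's value
```

For a gauge that only one process meaningfully updates, the "most recent" rule is a natural fit. Summing or averaging across pids would instead dilute the real value with the zeros.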


14 Nov, 2025 1 commit

[Metrics] Log number of preempted requests (#28522) · ecf8230d

lyn610 authored Nov 14, 2025

Add tracking and periodic logging for the number of preempted requests in the
metrics logger. This helps monitor system behavior under load.
Signed-off-by: Yining Liu <610lyn@gmail.com>
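A minimal sketch of periodic preemption logging. It assumes the scheduler reports a per-step preemption count and the logger flushes on a fixed interval. `PreemptionLogger` and its methods are hypothetical, not vLLM's logger API.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("preemption_sketch")


class PreemptionLogger:
    """Hypothetical interval logger for preempted-request counts."""

    def __init__(self, interval_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval_s = interval_s
        self.clock = clock
        self._last_log = clock()
        self._num_preempted = 0

    def record(self, num_preempted: int) -> None:
        # Called once per scheduler step with that step's preemption count.
        self._num_preempted += num_preempted

    def maybe_log(self) -> Optional[int]:
        now = self.clock()
        elapsed = now - self._last_log
        if elapsed < self.interval_s:
            return None
        # Report and reset, so each log line covers one interval only.
        count, self._num_preempted = self._num_preempted, 0
        self._last_log = now
        logger.info("Preempted %d requests in the last %.1f s", count, elapsed)
        return count
```

Resetting on each flush makes every log line a per-interval count, which is easier to read under load than an ever-growing total.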


10 Nov, 2025 1 commit
- [Metrics] Refactor LoRA state tracking (#26801) · 6f7de33b
  Mark McLoughlin authored Nov 10, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
05 Nov, 2025 1 commit
- [Feature]: Add corrupted request metric to V1 metrics system. (#27306) · e1560178
  Snehlata authored Nov 06, 2025
```
Signed-off-by: atalhens <sneh.lata@nutanix.com>
```
04 Nov, 2025 1 commit
- [Metrics] Enable sleep state metric outside of dev mode (#27867) · 380ba681
  Mark McLoughlin authored Nov 04, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
30 Oct, 2025 1 commit
- [Fix] Skip `record_sleep_state` logic in `PrometheusStatsLogger` if not in dev mode (#27789) · 49170025
  Sumanth R Hegde authored Oct 30, 2025
```
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
```
29 Oct, 2025 2 commits
- [KVConnector] Add metrics to Prometheus-Grafana dashboard (#26811) · accb8fab
  Nicolò Lucchesi authored Oct 29, 2025
```
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
```
- [Core] Exposing engine sleep & wake_up state as prometheus metrics (#24176) · 1da3309a
  Braulio Dumba authored Oct 29, 2025
```
Signed-off-by: Braulio Dumba <Braulio.Dumba@ibm.com>
```
24 Oct, 2025 1 commit
- [Log] Optimize Startup Log (#26740) · 52efc34e
  Wentao Ye authored Oct 24, 2025
```
Signed-off-by: yewentao256 <zhyanwentao@126.com>
```
23 Oct, 2025 1 commit
- [Metrics] [KVConnector] Add connector prefix cache hit rate stats (#26245) · 88afa110
  Tova Movshovitz authored Oct 23, 2025
```
Signed-off-by: tovam <tovam@pliops.com>
```
18 Oct, 2025 1 commit
- [V1][Metrics][Plugin] Add plugin support for custom `StatLoggerBase` implementations (#22456) · 83e760c5
  Tova Movshovitz authored Oct 19, 2025
```
Signed-off-by: tovam <tovam@pliops.com>
```
14 Oct, 2025 1 commit
- [Misc][DP] support customized aggregated logger for dp (#24354) · 8317f723
  Lucia Fang authored Oct 13, 2025
```
Signed-off-by: Lu Fang <fanglu@fb.com>
```
12 Oct, 2025 1 commit
- Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633) · 8fcaaf6a
  Harry Mellor authored Oct 12, 2025
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
10 Oct, 2025 2 commits
- [Metrics] Add test for multi-modal cache stats logging (#26588) · e5192819
  Mark McLoughlin authored Oct 10, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
- [Metrics] Log multi-modal cache stats and fix reset (#26285) · ad430a67
  Cyrus Leung authored Oct 10, 2025
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
05 Oct, 2025 2 commits
- Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247) · d6953beb
  Harry Mellor authored Oct 05, 2025
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
- [Easy] Add str repr for IterationStats (#26232) · 78c1d5bf
  22quinn authored Oct 04, 2025
```
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
```
03 Oct, 2025 1 commit
- [NIXL][Misc] Expose metrics from NIXL for logging to CLI (#25388) · 48f30902
  Nicolò Lucchesi authored Oct 03, 2025
```
Signed-off-by: NickLucche <nlucches@redhat.com>
```