Commits · 9f9c38c392476fd35b9154221c00a2255dcfd010 · OpenDAS / vllm_cscc

27 Jul, 2025 1 commit
- Refactor: Remove numpy dependency from LoggingStatLogger (#20529) · a8936e51
  ZiTian.Zhao authored Jul 27, 2025
```
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
```
  a8936e51
21 Jul, 2025 1 commit
- [DP] Fix Prometheus Logging (#21257) · 29d1ffc5
  Robert Shaw authored Jul 21, 2025
```
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
```
  29d1ffc5
20 Jul, 2025 1 commit
- Enable v1 metrics tests (#20953) · d1fb65bd
  Seiji Eicher authored Jul 19, 2025
```
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
```
  d1fb65bd
20 Jun, 2025 1 commit
- Export NaNs in logits to scheduler_stats if output is corrupted (#18777) · 2e3e3c86
  Vlad Tiberiu Mihailescu authored Jun 20, 2025
```
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>
```
  2e3e3c86
19 Jun, 2025 1 commit
- Support embedding models in V1 (#16188) · 799397ee
  Maximilien de Bayser authored Jun 19, 2025
```
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
```
  799397ee
14 Jun, 2025 1 commit
- [V1][Metrics] Deprecate metrics with gpu_ prefix for non GPU specific metrics. (#18354) · d1e34cc9
  Saheli Bhattacharjee authored Jun 14, 2025
```
Signed-off-by: Saheli Bhattacharjee <saheli@krai.ai>
```
  d1e34cc9
04 Jun, 2025 1 commit
- Fix ValueError: Missing value for tag key(s): model_name,engine. (#19113) · 2669a0d7
  Seiji Eicher authored Jun 04, 2025
```
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
```
  2669a0d7
03 Jun, 2025 1 commit
- [Misc] Add SPDX-FileCopyrightText (#19100) · 02f0c7b2
  Simon Mo authored Jun 03, 2025
```
Signed-off-by: simon-mo <simon.mo@hey.com>
```
  02f0c7b2
01 Jun, 2025 1 commit
- [BugFix] Fix incorrect metrics shutdown error log message (#18992) · 2b102d51
  Nick Hill authored May 31, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
  2b102d51
30 May, 2025 1 commit
- [Perf] API-server scaleout with many-to-many server-engine comms (#17546) · 2dbe8c07
  Nick Hill authored May 30, 2025
  
  2dbe8c07
27 May, 2025 1 commit
- [V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010) · 06a03380
  Mark McLoughlin authored May 27, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
  06a03380
16 May, 2025 1 commit
- [Misc] Add Ray Prometheus logger to V1 (#17925) · 54181767
  Seiji Eicher authored May 16, 2025
```
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
```
  54181767
12 May, 2025 1 commit
- [v1][KVCacheManager] Change prefix caching metric from counting blocks to counting tokens (#18003) · 302f3aca
  Chen Zhang authored May 13, 2025
```
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
```
  302f3aca
10 May, 2025 1 commit
- [V1][Spec Decoding] Log accumulated metrics after system goes idle (#17913) · 7042cc96
  Mark McLoughlin authored May 10, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
  7042cc96
30 Apr, 2025 1 commit
- [V1][Bugfix]: vllm v1 verison metric num_gpu_blocks is None (#15755) · d8037867
  rongfu.leng authored Apr 30, 2025
```
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
```
  d8037867
27 Apr, 2025 2 commits
- [Metrics] Fix minor inconsistencies in bucket progression (#17262) · 4213475e
  Cyrus Leung authored Apr 28, 2025
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  4213475e
- [Misc] Change buckets of histogram_iteration_tokens to [1, 8, 16, 32, 64, 128,... · 18445edd
  Flex Wang authored Apr 27, 2025
```
[Misc] Change buckets of histogram_iteration_tokens to [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens (#17033)
Signed-off-by: sfc-gh-zhwang <flex.wang@snowflake.com>
```
  18445edd
26 Apr, 2025 1 commit
- [V1][Metrics] Allow V1 AsyncLLM to use custom logger (#14661) · 53e8cf53
  Zijing Liu authored Apr 25, 2025
```
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
```
  53e8cf53
24 Apr, 2025 1 commit
- [V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics (#16665) · 340d7b1b
  Mark McLoughlin authored Apr 24, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
  340d7b1b
17 Apr, 2025 1 commit
- [V1] Remove log noise when idle (#16735) · 9dbf7a2d
  Russell Bryant authored Apr 17, 2025
```
Signed-off-by: Russell Bryant <rbryant@redhat.com>
```
  9dbf7a2d
07 Apr, 2025 1 commit
- [Metrics] Add bucket for `request_latency`, `time_to_first_token` and... · 86fc2321
  Kay Yan authored Apr 07, 2025
```
[Metrics] Add bucket for `request_latency`, `time_to_first_token` and `time_per_output_token` (#15202)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
```
  86fc2321
01 Apr, 2025 1 commit
- [V1][Metrics] Initial speculative decoding metrics (#15151) · a79cc68b
  Mark McLoughlin authored Apr 01, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
  a79cc68b
27 Mar, 2025 1 commit
- [V1] AsyncLLM data parallel (#13923) · 15dac210
  Nick Hill authored Mar 27, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
  15dac210
24 Mar, 2025 1 commit
- [V1] Aggregate chunked prompt logprobs in model runner (#14875) · 3aee6573
  Nick Hill authored Mar 24, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
  3aee6573
19 Mar, 2025 1 commit
- simple bugfix: Update stats.py (#15139) · 8310e0b5
  Wang Ran (汪然) authored Mar 20, 2025
  
  8310e0b5
07 Mar, 2025 2 commits
- [V1][Metrics] Fix traceback with preemptions+LoRA (#14220) · e1f0835a
  Mark McLoughlin authored Mar 07, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
  e1f0835a
- [V1] Eagerly remove finished requests from the batch (#14388) · 8ed5421a
  Nick Hill authored Mar 07, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
  8ed5421a
05 Mar, 2025 1 commit
- [V1][Bugfix] Do not reset prefix caching metrics (#14235) · ade3f7d9
  Cody Yu authored Mar 04, 2025
  
  ade3f7d9
03 Mar, 2025 4 commits
- [WIP][[V1][Metrics] Implement max_num_generation_tokens, request_params_n,... · ae122b1c
  Mark McLoughlin authored Mar 03, 2025
```
[WIP][[V1][Metrics] Implement max_num_generation_tokens,  request_params_n, and request_params_max_tokens metrics (#14055)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
  ae122b1c
- [V1] Simplify stats logging (#14082) · 872db2be
  Nick Hill authored Mar 03, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
  872db2be
- [V1] Refactor parallel sampling support (#13774) · 4167252e
  Mark McLoughlin authored Mar 03, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
  4167252e
- Update deprecated Python 3.8 typing (#13971) · cf069aa8
  Harry Mellor authored Mar 03, 2025
  
  cf069aa8
27 Feb, 2025 1 commit
- [V1][Metrics] Handle preemptions (#13169) · cd711c48
  Mark McLoughlin authored Feb 27, 2025
  
  cd711c48
25 Feb, 2025 1 commit
- [V1][Metrics] Implement vllm:lora_requests_info metric (#13504) · bc32bc73
  Mark McLoughlin authored Feb 25, 2025
  
  bc32bc73
22 Feb, 2025 2 commits
- [Metrics] Add `--show-hidden-metrics-for-version` CLI arg (#13295) · 2cb8c154
  Mark McLoughlin authored Feb 22, 2025
  
  2cb8c154
- [V1][Metrics] Support `vllm:cache_config_info` (#13299) · 1cd981da
  Mark McLoughlin authored Feb 22, 2025
  
  1cd981da
15 Feb, 2025 1 commit
- [V1][Metrics] Add iteration_tokens_total histogram from V0 (#13288) · 2ad1bc7a
  Mark McLoughlin authored Feb 15, 2025
  
  2ad1bc7a
11 Feb, 2025 2 commits
- [V1][Metrics] Add several request timing histograms (#12644) · 75e6e145
  Mark McLoughlin authored Feb 11, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
  75e6e145
- [V1][Metrics] Add GPU prefix cache hit rate % gauge (#12592) · 41c5dd45
  Cody Yu authored Feb 11, 2025
  
  41c5dd45
07 Feb, 2025 1 commit
- [V1] Logprobs and prompt logprobs support (#9880) · 0630d453
  afeldman-nm authored Feb 07, 2025
```
This PR is adding support for sample logprobs & prompt logprobs to vLLM v1.

New behavior:

- During model execution, model runner computes sample logprobs (if user-provided logprobs setting is not None) and prompt logprobs (if user-provided prompt_logprobs setting is not None). For both sample and prompt logprobs, the engine core returns 3 vectors: token ids, token logprob values, token ranks. Ranks reflect tokens' 1-indexed positions in the vocabulary vector after sorting the vocabulary by log probability in descending order.
- In scheduler.update_from_output(), sample and prompt logprobs are incorporated into the EngineCoreOutput data structure which is transferred to the engine client. If multiprocessing is enabled, then sample and prompt logprobs will be (de)serialized when the EngineCoreOutput data structure is (de)serialized.
- During output processing, the LogprobsProcessor transforms the triplet of token ids, token logprobs values, and token ranks into the OpenAI-compatible List[Dict[token id,Logprob]] format (for sample and prompt logprobs respectively.)
- Each Logprob instance (whether sample- or prompt-) consists of a token's log-probability, rank, and detokenized string representation. Note that logprob detokenization is handled by the LogprobsProcessor not the detokenizer.

Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
```
  0630d453
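
The rank definition and the triplet-to-dictionary conversion described in the 0630d453 commit message can be illustrated with a minimal pure-Python sketch. This is not vLLM's implementation: the `Logprob` dataclass, the helper names (`log_softmax`, `token_rank`, `top_k_triplet`, `build_position_dict`) and the toy vocabulary are hypothetical and exist only for illustration. The sketch assumes that tied log probabilities share the best rank, which the commit message does not specify.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Logprob:
    # Hypothetical stand-in for the per-token record described above:
    # log-probability, rank, and detokenized string.
    logprob: float
    rank: Optional[int] = None
    decoded_token: Optional[str] = None


def log_softmax(logits: list[float]) -> list[float]:
    # Numerically stable log-softmax over one vocabulary vector.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_rank(logprobs: list[float], token_id: int) -> int:
    # 1-indexed position of token_id after sorting the vocabulary by
    # log probability in descending order. Counting strictly greater
    # entries means ties share the best rank (an assumption).
    target = logprobs[token_id]
    return 1 + sum(1 for lp in logprobs if lp > target)


def top_k_triplet(logprobs: list[float], k: int):
    # Build the three parallel vectors: token ids, logprob values, ranks.
    ids = sorted(range(len(logprobs)), key=lambda i: logprobs[i], reverse=True)[:k]
    values = [logprobs[i] for i in ids]
    ranks = [token_rank(logprobs, i) for i in ids]
    return ids, values, ranks


def build_position_dict(ids, values, ranks, vocab: list[str]) -> dict[int, Logprob]:
    # Turn one position's triplet into a {token id: Logprob} mapping,
    # detokenizing each id with a toy vocabulary lookup.
    return {
        tid: Logprob(logprob=val, rank=rnk, decoded_token=vocab[tid])
        for tid, val, rnk in zip(ids, values, ranks)
    }


# Toy data: two positions over a four-token vocabulary (hypothetical).
VOCAB = ["a", "b", "c", "d"]
LOGITS_PER_POSITION = [[2.0, 0.5, 1.0, -1.0], [0.0, 3.0, 1.0, 2.0]]

# One dict per position, mirroring the List[Dict[token id, Logprob]] shape.
result = [
    build_position_dict(*top_k_triplet(log_softmax(logits), 2), VOCAB)
    for logits in LOGITS_PER_POSITION
]
```

Here `result[0]` holds entries for token ids 0 and 2 (the two most likely tokens at the first position) with ranks 1 and 2. A real engine would compute these vectors on tensors in batch; the list-based version only makes the rank semantics concrete.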