Commits · f1599ca55d79cb686cb94dc3ff2f65d82db94940 · OpenDAS / vllm_cscc

09 Dec, 2025 1 commit
- feat(metrics): Add prefill KV compute metric excluding cached tokens (#30189) · f1599ca5
  Victor Ziliang Peng authored Dec 08, 2025
```
Signed-off-by: Ziliang Peng <ziliang@character.ai>
```
29 Nov, 2025 1 commit
- [Misc] Refactor tokenizer interface (#29693) · 34a98427
  Cyrus Leung authored Nov 29, 2025
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
20 Nov, 2025 1 commit
- [RL] Add Pause and Resume Generation for Asynchronous RL Training (#28037) · 371b1d4c
  Samit authored Nov 20, 2025
```
Signed-off-by: SamitHuang <285365963@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Signed-off-by: samithuang <285365963@qq.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
```
13 Nov, 2025 1 commit
- [Perf] Support stream interval for reducing host overhead (#27869) · 5d6ce2b9
  elvischenv authored Nov 14, 2025
```
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
```
10 Nov, 2025 1 commit
- [Metrics] Refactor LoRA state tracking (#26801) · 6f7de33b
  Mark McLoughlin authored Nov 10, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
30 Oct, 2025 1 commit
- [CI] Fix mypy for `vllm/v1/core` and `vllm/v1/engine` (#27108) · c01f6e52
  Wentao Ye authored Oct 30, 2025
```
Signed-off-by: yewentao256 <zhyanwentao@126.com>
```
23 Oct, 2025 1 commit
- [Model] Add num_cached_tokens for PoolingRequestOutput (#27378) · 3729ed00
  wang.yuqi authored Oct 23, 2025
```
Signed-off-by: wang.yuqi <noooop@126.com>
```
12 Oct, 2025 1 commit
- Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633) · 8fcaaf6a
  Harry Mellor authored Oct 12, 2025
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
05 Oct, 2025 1 commit
- Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247) · d6953beb
  Harry Mellor authored Oct 05, 2025
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
03 Oct, 2025 1 commit
- add(v1): RequestStatesStats to RequestOutput (#24947) · 3e70e3d4
  HUIJONG JEONG authored Oct 03, 2025
```
Signed-off-by: huijjj <huijong.jeong@squeezebits.com>
```
26 Sep, 2025 1 commit
- [Bugfix] Properly abort pooling request. (#25734) · fe6b19c3
  wang.yuqi authored Sep 26, 2025
```
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
```
19 Sep, 2025 1 commit
- [CORE] Prompt Embeddings Support for v1 Engine (#24278) · 9a4600e4
  Andrew Sansom authored Sep 18, 2025
```
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: Andrew Sansom <qthequartermasterman@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
```
17 Sep, 2025 1 commit
- [Core] Remove tokenizer group in vLLM (#24078) · 6c47f6bf
  Zhuohan Li authored Sep 17, 2025
```
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
```
15 Sep, 2025 1 commit
- [gpt-oss] Add IncompleteDetails to ResponsesRepsonse (#24561) · 25aba2b6
  Andrew Xia authored Sep 15, 2025
```
Signed-off-by: Andrew Xia <axia@meta.com>
```
12 Sep, 2025 1 commit
- [V1] feat:add engine v1 tracing (#20372) · 40b6c912
  RichardoMu authored Sep 12, 2025
```
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: RichardoMu <44485717+RichardoMrMu@users.noreply.github.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Mu Huai <tianbowen.tbw@antgroup.com>
Co-authored-by: Ye Zhang <zhysishu@gmail.com>
Co-authored-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: 瑜琮 <ly186375@antfin.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
```
14 Aug, 2025 1 commit
- [Core] Return final response for aborted requests from `AsyncLLM.generate` (#22283) · ebcce2cd
  Nick Hill authored Aug 14, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
23 Jul, 2025 1 commit
- [Core][Model] PrithviMAE Enablement on vLLM v1 engine (#20577) · 8560a5b2
  Christian Pinto authored Jul 23, 2025
```
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
```
19 Jun, 2025 1 commit
- Support embedding models in V1 (#16188) · 799397ee
  Maximilien de Bayser authored Jun 19, 2025
```
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
```
03 Jun, 2025 1 commit
- [Misc] Add SPDX-FileCopyrightText (#19100) · 02f0c7b2
  Simon Mo authored Jun 03, 2025
```
Signed-off-by: simon-mo <simon.mo@hey.com>
```
23 May, 2025 1 commit
- [Feature][V1]: suupports cached_tokens in response usage (#18149) · b046cf79
  Chauncey authored May 23, 2025
```
Co-authored-by: simon-mo <xmo@berkeley.edu>
```
12 May, 2025 1 commit
- [P/D] NIXL Integration (#17751) · d1911020
  Robert Shaw authored May 12, 2025
```
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Brent Salisbury <bsalisbu@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Brent Salisbury <bsalisbu@redhat.com>
```
04 May, 2025 1 commit
- Add full API docs and improve the UX of navigating them (#17485) · d6484ef3
  Harry Mellor authored May 04, 2025
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
26 Apr, 2025 1 commit
- [Core] Remove prompt string from engine core data structures (#17214) · df6f3ce8
  Nick Hill authored Apr 25, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
24 Apr, 2025 1 commit
- Simplify `TokenizerGroup` (#16790) · 0a05ed57
  Harry Mellor authored Apr 24, 2025
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
22 Apr, 2025 1 commit
- [Bugfix]: fix issue with n>1 sampling on v1 requests overriding each other (#16863) · 0e425449
  Jeffrey Li authored Apr 21, 2025
```
Signed-off-by: Jeffrey Li <jeffrey.dot.li@gmail.com>
```
17 Apr, 2025 1 commit
- [V1][Frontend] Improve Shutdown And Logs (#11737) · 2b05b8ce
  Robert Shaw authored Apr 16, 2025
```
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com>
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
```
29 Mar, 2025 1 commit
- [Misc][V1] Misc code streamlining (#15723) · 6d531ad7
  Nick Hill authored Mar 28, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
24 Mar, 2025 2 commits
- [V1][Perf] Simpler request output queues (#15156) · 9d72daf4
  Nick Hill authored Mar 24, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
```
- [V1] Aggregate chunked prompt logprobs in model runner (#14875) · 3aee6573
  Nick Hill authored Mar 24, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
13 Mar, 2025 1 commit
- [V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output (#14624) · 02fcaa3d
  afeldman-nm authored Mar 13, 2025
```
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
```
12 Mar, 2025 1 commit
- [BugFix][V1] Fix parallel sampling finishing/aborts (#14512) · f5d3acd4
  Nick Hill authored Mar 12, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
10 Mar, 2025 1 commit
- Correct capitalisation: `VLLM` -> `vLLM` (#14562) · 3b352a2f
  Harry Mellor authored Mar 10, 2025
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
06 Mar, 2025 1 commit
- [V1] Do not detokenize if sampling param detokenize is False (#14224) · cd579352
  Himanshu Jaju authored Mar 06, 2025
```
Signed-off-by: Himanshu Jaju <hj@mistral.ai>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
```
03 Mar, 2025 3 commits
- [WIP][[V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics (#14055) · ae122b1c
  Mark McLoughlin authored Mar 03, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
- [V1] Refactor parallel sampling support (#13774) · 4167252e
  Mark McLoughlin authored Mar 03, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
- Update deprecated Python 3.8 typing (#13971) · cf069aa8
  Harry Mellor authored Mar 03, 2025
25 Feb, 2025 1 commit
- [V1][Metrics] Implement vllm:lora_requests_info metric (#13504) · bc32bc73
  Mark McLoughlin authored Feb 25, 2025
12 Feb, 2025 1 commit
- [Bug] [V1] Try fetching stop_reason from EngineOutput before checking the request (#13108) · f4d97e4f
  bnellnm authored Feb 12, 2025
11 Feb, 2025 1 commit
- [V1][Metrics] Add several request timing histograms (#12644) · 75e6e145
  Mark McLoughlin authored Feb 11, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
07 Feb, 2025 1 commit
- [V1] Logprobs and prompt logprobs support (#9880) · 0630d453
  afeldman-nm authored Feb 07, 2025
```
This PR adds support for sample logprobs and prompt logprobs to vLLM v1.

New behavior:

- During model execution, the model runner computes sample logprobs (if the user-provided logprobs setting is not None) and prompt logprobs (if the user-provided prompt_logprobs setting is not None). For both sample and prompt logprobs, the engine core returns three vectors: token ids, token logprob values, and token ranks. Ranks reflect tokens' 1-indexed positions in the vocabulary vector after sorting the vocabulary by log probability in descending order.
- In scheduler.update_from_output(), sample and prompt logprobs are incorporated into the EngineCoreOutput data structure, which is transferred to the engine client. If multiprocessing is enabled, sample and prompt logprobs are (de)serialized together with the EngineCoreOutput data structure.
- During output processing, the LogprobsProcessor transforms the triplet of token ids, token logprob values, and token ranks into the OpenAI-compatible List[Dict[token id, Logprob]] format (separately for sample and prompt logprobs).
- Each Logprob instance (sample or prompt) consists of a token's log-probability, rank, and detokenized string representation. Note that logprob detokenization is handled by the LogprobsProcessor, not the detokenizer.

Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
```
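The logprobs flow described in the #9880 commit message can be sketched in plain Python. This is a minimal, hypothetical illustration, not vLLM's implementation: `log_softmax`, `token_ranks`, and `to_openai_format` are made-up helper names, and the `Logprob` dataclass is a simplified stand-in holding only the three fields the message lists (log-probability, rank, decoded string). It also assumes a rank equals 1 plus the number of vocabulary entries with a strictly greater logprob, so tied tokens share the best position.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Logprob:
    # Simplified stand-in with the three fields named in the commit message.
    logprob: float
    rank: int
    decoded_token: Optional[str] = None


def log_softmax(logits: List[float]) -> List[float]:
    # Numerically stable log-softmax over one row of vocabulary logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_ranks(row: List[float], token_ids: List[int]) -> List[int]:
    # 1-indexed rank after sorting the vocabulary by logprob, descending.
    # Assumption: ties resolve to the best (lowest) rank.
    return [1 + sum(1 for lp in row if lp > row[t]) for t in token_ids]


def to_openai_format(
    token_ids: List[List[int]],
    logprob_values: List[List[float]],
    ranks: List[List[int]],
    decode: Callable[[int], str],
) -> List[Dict[int, Logprob]]:
    # Turn the (ids, values, ranks) triplet into one dict per position,
    # decoding each token id to its string while building the Logprob.
    return [
        {tid: Logprob(lp, rk, decode(tid)) for tid, lp, rk in zip(ids, vals, rks)}
        for ids, vals, rks in zip(token_ids, logprob_values, ranks)
    ]


# Toy example: a 4-token vocabulary and a single generated position.
vocab = ["a", "b", "c", "d"]
row = log_softmax([2.0, 1.0, 0.5, 3.0])
ids = [0, 3]  # hypothetical: sampled token 0 plus top-1 alternative 3
rks = token_ranks(row, ids)
out = to_openai_format([ids], [[row[t] for t in ids]], [rks], lambda t: vocab[t])
print(rks)  # [2, 1]
```

Keeping ids, values, and ranks as parallel vectors until the last step mirrors the split the commit message describes: compact arrays cross the engine-core boundary, and the per-token dicts with decoded strings are only built during output processing.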