Commits · 79c7e092350e4ae82d679ea4b2cdaaa4b580944b · OpenDAS / vllm_cscc

15 Feb, 2026 1 commit

[KV Connector] Add temporary, off-by-default... · 79c7e092

Seiji Eicher authored Feb 14, 2026


[KV Connector] Add temporary, off-by-default `VLLM_DISABLE_REQUEST_ID_RANDOMIZATION` workaround (#34415)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>

79c7e092

14 Feb, 2026 1 commit

[Renderer] Move InputPreprocessor into Renderer (1/2) (#34510) · 73391a1b

Cyrus Leung authored Feb 15, 2026


Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

73391a1b

13 Feb, 2026 5 commits

[Core] Move pause and resume functions into engine (#34125) · dddbff46

Aaron Hao authored Feb 13, 2026


Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>

dddbff46

[Refactor] Pass full VllmConfig to Renderer (#34485) · 2f308214
Cyrus Leung authored Feb 13, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
2f308214
[Refactor] Simplify BOS/EOS token handling (#34435) · ea5ff3a1
Cyrus Leung authored Feb 13, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
ea5ff3a1

[Core] Profiler improvements and lazy initialization (#33198) · 4453ba8d

Jaewon authored Feb 12, 2026


Signed-off-by: Jaewon Lee <jaewon@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>

4453ba8d

[Core] Add sleep level 0 mode with enqueue/wait pattern (#33195) · aa181c92

Jaewon authored Feb 12, 2026


Signed-off-by: Jaewon Lee <jaewon@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>

aa181c92

12 Feb, 2026 3 commits
- [BUG] Reset running requests when clearing cache for pause/resume (#34382) · 7b5a8b4a
  Aaron Hao authored Feb 12, 2026
```
Signed-off-by: hao-aaron <ahao@anyscale.com>
```
  7b5a8b4a
- [Refactor] Pass Renderer to Input Processor (#34329) · b96f7314
  Cyrus Leung authored Feb 12, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  b96f7314
- [Refactor] Move validation to params definitions (#34362) · ced2a92f
  Cyrus Leung authored Feb 12, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  ced2a92f
11 Feb, 2026 3 commits
- [Docs] Reduce time spent generating API docs (#34255) · 40b8f553
  Harry Mellor authored Feb 11, 2026
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
  40b8f553
- [Frontend] Exploit tokenizers "new stream" in FastIncrementalDetokenizer (#34217) · e09546cf
  Nick Hill authored Feb 11, 2026
```
Signed-off-by: Nick Hill <nickhill123@gmail.com>
```
  e09546cf
- [Misc] Clean up validation logic in input processor (#34144) · b5dcb372
  Cyrus Leung authored Feb 11, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  b5dcb372
10 Feb, 2026 1 commit

[Perf] Optimize detokenizer python logic (#32975) · e1060a71

Wentao Ye authored Feb 10, 2026


Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>

e1060a71

09 Feb, 2026 1 commit

[Tiny] Rename encoder budget file to more specific name (#34103) · 7c233dbb

Reagan Lee authored Feb 08, 2026


Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”>
Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”>

7c233dbb

07 Feb, 2026 2 commits

[Misc] Make `PlaceholderRange.get_num_embeds` a method (#34035) · 48312e57
Cyrus Leung authored Feb 07, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
48312e57

[Feat][RL] Pause and Resume with keep requests for single engine (#32351) · 89a385d7

Aaron Hao authored Feb 06, 2026


Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

89a385d7

06 Feb, 2026 2 commits
- [Refactor] Consolidate sequence normalization and enc-dec parsing (#33928) · cd8b405b
  Cyrus Leung authored Feb 06, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  cd8b405b
- [Feature] OTEL tracing during loading (#31162) · 325ab6b0
  emricksini-h authored Feb 06, 2026
  
  325ab6b0
05 Feb, 2026 4 commits
- [Feat][RL][1/2] Native Weight Syncing API: NCCL (#31943) · c1858b7e
  Aaron Hao authored Feb 05, 2026
```
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: SumanthRH <sumanthrh99@gmail.com>
```
  c1858b7e
- [Refactor] Move `task` outside of `PoolingParams.verify` (#33796) · 038914b7
  Cyrus Leung authored Feb 05, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
```
  038914b7
- [Perf] Optimize the performance of structured output + reasoning (#33557) · 6abb0454
  Chauncey authored Feb 05, 2026
```
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
```
  6abb0454
- [Minor] Include `StreamingInput` in inputs package (#33856) · add9f1fb
  Nick Hill authored Feb 04, 2026
```
Signed-off-by: Nick Hill <nickhill123@gmail.com>
```
  add9f1fb
04 Feb, 2026 1 commit

[Metrics] Add labeled prompt token metrics for P/D disaggregation (#33290) · 4403e3ed

zhanqiuhu authored Feb 04, 2026

Add labeled Prometheus metrics to distinguish where prompt tokens come
from in P/D disaggregated deployments.

In P/D disaggregation, decode instances receive KV cache from prefill instances.
Currently, decode reports inflated prompt throughput because it counts all
prompt tokens as "computed", even though most were transferred.

This PR adds labeled metrics so users can understand actual compute work vs
transferred work:

vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally
vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer
vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache
vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all

Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>

4403e3ed
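
The per-source accounting described in the commit message above (#33290) can be illustrated with a minimal, standard-library-only sketch. This is not vLLM's implementation. The metric name and the three `source` label values come from the commit message; `split_prompt_tokens`, `PromptTokenStats`, and their parameters are hypothetical names introduced here. The sketch also assumes the three sources partition a request's prompt tokens, which the commit message implies but does not state outright.

```python
from collections import Counter

# Label values as listed in the commit message.
SOURCES = ("local_compute", "external_kv_transfer", "local_cache_hit")
# Metric name as listed in the commit message.
METRIC = "vllm:prompt_tokens_by_source_total"


def split_prompt_tokens(num_prompt_tokens, num_local_cached, num_external):
    """Split one request's prompt tokens by where their KV came from.

    Assumption: every prompt token is either served from the local prefix
    cache, received via KV transfer, or prefilled locally.
    """
    if min(num_prompt_tokens, num_local_cached, num_external) < 0:
        raise ValueError("token counts must be non-negative")
    if num_local_cached + num_external > num_prompt_tokens:
        raise ValueError("cached + transferred tokens exceed prompt length")
    return {
        "local_compute": num_prompt_tokens - num_local_cached - num_external,
        "external_kv_transfer": num_external,
        "local_cache_hit": num_local_cached,
    }


class PromptTokenStats:
    """Hypothetical in-memory stand-in for a counter labeled by source."""

    def __init__(self):
        self.by_source = Counter({source: 0 for source in SOURCES})

    def record(self, num_prompt_tokens, num_local_cached, num_external):
        # Counter.update adds the per-source counts to the running totals.
        self.by_source.update(
            split_prompt_tokens(num_prompt_tokens, num_local_cached, num_external)
        )

    def render(self):
        # One line per label value, in Prometheus text exposition style.
        return [
            f'{METRIC}{{source="{source}"}} {self.by_source[source]}'
            for source in SOURCES
        ]


stats = PromptTokenStats()
# A decode instance: 100 prompt tokens, 20 from local cache, 70 transferred.
stats.record(num_prompt_tokens=100, num_local_cached=20, num_external=70)
# A request with no cache hit and no transfer: everything is prefilled locally.
stats.record(num_prompt_tokens=50, num_local_cached=0, num_external=0)
for line in stats.render():
    print(line)
```

With this split, only the `local_compute` series reflects prefill work actually done on the instance, so it avoids the inflated prompt throughput that the commit message attributes to counting transferred tokens as computed.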

02 Feb, 2026 2 commits
- [Refactor] Move profiling methods to MM budget (#33559) · d7e17aaa
  Cyrus Leung authored Feb 02, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  d7e17aaa
- [Chore] Remove redundant input parsing methods (#33542) · a502831d
  Cyrus Leung authored Feb 02, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  a502831d
01 Feb, 2026 4 commits
- [Redo] #33110 with threading limit (#33502) · 21997f45
  Cyrus Leung authored Feb 01, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: YunzhuLu <lucia.yunzhu@gmail.com>
```
  21997f45
- [Critical] Revert #33110 (#33500) · b6bb2842
  Cyrus Leung authored Feb 01, 2026
  
  b6bb2842
- [Bugfix] Fix inconsistent handling of cache reset (#33481) · 79b6ec6a
  Cyrus Leung authored Feb 01, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  79b6ec6a
- [Refactor] Make Renderer an abstract class (#33479) · a358e4df
  Cyrus Leung authored Feb 01, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  a358e4df
31 Jan, 2026 6 commits
- [Refactor] Move MM data parsing outside processor (#33408) · 88c3e114
  Cyrus Leung authored Feb 01, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  88c3e114
- [Bugfix] Early-reject requests with MM data longer than encode cache capacity (#33110) · 27cb2f67
  YunzhuLu authored Feb 01, 2026
```
Signed-off-by: YunzhuLu <lucia.yunzhu@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
```
  27cb2f67
- Support clear mm and encoder cache (#33452) · 22d9a056
  jma99_2333 authored Jan 31, 2026
```
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
```
  22d9a056
- [Frontend] Use new Renderer for Completions and Tokenize API (#32863) · f0a1c845
  Cyrus Leung authored Jan 31, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  f0a1c845
- [ModelRunner V2] Fix spec decoding + logprobs (#33391) · 876a16f4
  Nick Hill authored Jan 30, 2026
```
Signed-off-by: Nick Hill <nickhill123@gmail.com>
```
  876a16f4
- [Bugfix] Fix typo in read_offset variable name (#33426) · 64a40a7a
  Alberto Ferrer authored Jan 30, 2026
```
Signed-off-by: Alberto Ferrer <albertof@barrahome.org>
```
  64a40a7a
30 Jan, 2026 2 commits
- [Realtime API] Adds minimal realtime API based on websockets (#33187) · 10152d21
  Patrick von Platen authored Jan 30, 2026
```
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
```
  10152d21
- [Misc] Replace Optional[X] with X | None syntax (#33332) · 1a7894db
  杨朱 · Kiki authored Jan 30, 2026
```
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
```
  1a7894db
28 Jan, 2026 1 commit
- Add flake8-implicit-str-concat rules to Ruff (#33191) · 2eb673a0
  Harry Mellor authored Jan 28, 2026
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
  2eb673a0
27 Jan, 2026 1 commit
- [BugFix] Fix P/D with non-MoE DP (#33037) · 0cd259b2
  Nick Hill authored Jan 27, 2026
```
Signed-off-by: Nick Hill <nickhill123@gmail.com>
```
  0cd259b2