Commits · a75a5b54c7f76bc2e15d3025d61e63cc91c7b0d7 · OpenDAS / vllm_cscc

07 Feb, 2026 2 commits
- [Misc] Make `PlaceholderRange.get_num_embeds` a method (#34035) · 48312e57
  Cyrus Leung authored Feb 07, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
- [Feat][RL] Pause and Resume with keep requests for single engine (#32351) · 89a385d7
  Aaron Hao authored Feb 06, 2026
```
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
```
06 Feb, 2026 2 commits
- [Refactor] Consolidate sequence normalization and enc-dec parsing (#33928) · cd8b405b
  Cyrus Leung authored Feb 06, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
- [Feature] OTEL tracing during loading (#31162) · 325ab6b0
  emricksini-h authored Feb 06, 2026
05 Feb, 2026 4 commits
- [Feat][RL][1/2] Native Weight Syncing API: NCCL (#31943) · c1858b7e
  Aaron Hao authored Feb 05, 2026
```
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: SumanthRH <sumanthrh99@gmail.com>
```
- [Refactor] Move `task` outside of `PoolingParams.verify` (#33796) · 038914b7
  Cyrus Leung authored Feb 05, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
```
- [Perf] Optimize the performance of structured output + reasoning (#33557) · 6abb0454
  Chauncey authored Feb 05, 2026
```
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
```
- [Minor] Include `StreamingInput` in inputs package (#33856) · add9f1fb
  Nick Hill authored Feb 04, 2026
```
Signed-off-by: Nick Hill <nickhill123@gmail.com>
```
04 Feb, 2026 1 commit
- [Metrics] Add labeled prompt token metrics for P/D disaggregation (#33290) · 4403e3ed
  zhanqiuhu authored Feb 04, 2026
```
Add labeled Prometheus metrics to distinguish where prompt tokens come
from in P/D disaggregated deployments.

In P/D disaggregation, decode instances receive KV cache from prefill instances.
Currently, decode reports inflated prompt throughput because it counts all
prompt tokens as "computed", even though most were transferred.

This PR adds labeled metrics so users can understand actual compute work vs
transferred work:

vllm:prompt_tokens_by_source_total{source="local_compute"} # Tokens prefilled locally
vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer
vllm:prompt_tokens_by_source_total{source="local_cache_hit"} # Tokens from local prefix cache
vllm:prompt_tokens_cached_total # Total cached (local + external, -1 when all
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
```
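The per-source counters introduced in #33290 can be combined on the consumer side, for example to estimate how much of a decode instance's prompt throughput was real local prefill work. The following is a minimal, hypothetical sketch, not part of vLLM: it assumes the counter appears in the standard Prometheus text exposition format, treats any labels other than `source` (such as an engine label) as assumed extras that are summed over, and the helper names `prompt_tokens_by_source` and `local_compute_fraction` as well as the sample numbers are invented for illustration.

```python
import re
from collections import defaultdict

# Counter name taken from the #33290 commit message. Labels other than
# `source` are an assumption of this sketch and are summed over.
METRIC = "vllm:prompt_tokens_by_source_total"

# One sample line of the Prometheus text format: name, optional {labels}, value.
_SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
)
# A single label pair such as source="local_compute" (escaped quotes allowed).
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def prompt_tokens_by_source(exposition: str) -> dict[str, float]:
    """Hypothetical helper: total prompt tokens per `source` label value."""
    totals: dict[str, float] = defaultdict(float)
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE_RE.match(line)
        if match is None or match.group("name") != METRIC:
            continue
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        totals[labels.get("source", "unknown")] += float(match.group("value"))
    return dict(totals)


def local_compute_fraction(totals: dict[str, float]) -> float:
    """Hypothetical helper: share of prompt tokens prefilled on this instance."""
    total = sum(totals.values())
    return totals.get("local_compute", 0.0) / total if total else 0.0


# Made-up scrape of a decode instance, for illustration only.
SAMPLE = """\
# TYPE vllm:prompt_tokens_by_source_total counter
vllm:prompt_tokens_by_source_total{engine="0",source="local_compute"} 100.0
vllm:prompt_tokens_by_source_total{engine="0",source="external_kv_transfer"} 850.0
vllm:prompt_tokens_by_source_total{engine="0",source="local_cache_hit"} 50.0
"""

if __name__ == "__main__":
    totals = prompt_tokens_by_source(SAMPLE)
    print(totals)
    print(local_compute_fraction(totals))  # 0.1
```

Plain regular expressions keep the sketch dependency-free; in practice the same breakdown is more naturally obtained by querying Prometheus and aggregating by the `source` label, or by using the text parser shipped with the `prometheus_client` package.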
02 Feb, 2026 2 commits
- [Refactor] Move profiling methods to MM budget (#33559) · d7e17aaa
  Cyrus Leung authored Feb 02, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
- [Chore] Remove redundant input parsing methods (#33542) · a502831d
  Cyrus Leung authored Feb 02, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
01 Feb, 2026 4 commits
- [Redo] #33110 with threading limit (#33502) · 21997f45
  Cyrus Leung authored Feb 01, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: YunzhuLu <lucia.yunzhu@gmail.com>
```
- [Critical] Revert #33110 (#33500) · b6bb2842
  Cyrus Leung authored Feb 01, 2026
- [Bugfix] Fix inconsistent handling of cache reset (#33481) · 79b6ec6a
  Cyrus Leung authored Feb 01, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
- [Refactor] Make Renderer an abstract class (#33479) · a358e4df
  Cyrus Leung authored Feb 01, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
31 Jan, 2026 6 commits
- [Refactor] Move MM data parsing outside processor (#33408) · 88c3e114
  Cyrus Leung authored Feb 01, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
- [Bugfix] Early-reject requests with MM data longer than encode cache capacity (#33110) · 27cb2f67
  YunzhuLu authored Feb 01, 2026
```
Signed-off-by: YunzhuLu <lucia.yunzhu@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
```
- Support clear mm and encoder cache (#33452) · 22d9a056
  jma99_2333 authored Jan 31, 2026
```
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
```
- [Frontend] Use new Renderer for Completions and Tokenize API (#32863) · f0a1c845
  Cyrus Leung authored Jan 31, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
- [ModelRunner V2] Fix spec decoding + logprobs (#33391) · 876a16f4
  Nick Hill authored Jan 30, 2026
```
Signed-off-by: Nick Hill <nickhill123@gmail.com>
```
- [Bugfix] Fix typo in read_offset variable name (#33426) · 64a40a7a
  Alberto Ferrer authored Jan 30, 2026
```
Signed-off-by: Alberto Ferrer <albertof@barrahome.org>
```
30 Jan, 2026 2 commits
- [Realtime API] Adds minimal realtime API based on websockets (#33187) · 10152d21
  Patrick von Platen authored Jan 30, 2026
```
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
```
- [Misc] Replace Optional[X] with X | None syntax (#33332) · 1a7894db
  杨朱 · Kiki authored Jan 30, 2026
```
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
```
28 Jan, 2026 1 commit
- Add flake8-implicit-str-concat rules to Ruff (#33191) · 2eb673a0
  Harry Mellor authored Jan 28, 2026
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
27 Jan, 2026 1 commit
- [BugFix] Fix P/D with non-MoE DP (#33037) · 0cd259b2
  Nick Hill authored Jan 27, 2026
```
Signed-off-by: Nick Hill <nickhill123@gmail.com>
```
26 Jan, 2026 1 commit
- [Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955) · 11b55687
  Cyrus Leung authored Jan 26, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
24 Jan, 2026 1 commit
- [Feature] add session based streaming input support to v1 (#28973) · 91601ff4
  Joshua Deng authored Jan 24, 2026
```
Signed-off-by: Joshua Deng <joshuakdeng@gmail.com>
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
```
22 Jan, 2026 1 commit
- [Frontend] Introduce Renderer for processing chat messages (using `ModelConfig`) (#30200) · d117a4d1
  Cyrus Leung authored Jan 22, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
20 Jan, 2026 1 commit
- [Core] Cleanup shm based object store on engine shutdown (#32429) · 8be263c3
  Walter Beller-Morales authored Jan 20, 2026
```
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
```
15 Jan, 2026 3 commits
- [2/N] Move cache factories to MM registry (#32382) · cbbae38f
  Cyrus Leung authored Jan 15, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
- [Bugfix] Strengthen the check of X-data-parallel-rank in Hybrid LB mode (#32314) · 1e584823
  dtc authored Jan 15, 2026
```
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
```
- [code clean] remove duplicate check (#32376) · 9d7ae3fc
  Ning Xie authored Jan 15, 2026
```
Signed-off-by: Andy Xie <andy.xning@gmail.com>
```
14 Jan, 2026 1 commit
- [1/N] Reorganize multimodal processing code (#32327) · 9ea07b41
  Cyrus Leung authored Jan 14, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
13 Jan, 2026 3 commits
- [Refactor] Remove `get_encoder_dummy_data` (#32241) · eb28e806
  Cyrus Leung authored Jan 13, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
- [Perf] Optimize requests abort (#32211) · 2a719e08
  Wentao Ye authored Jan 12, 2026
```
Signed-off-by: yewentao256 <zhyanwentao@126.com>
```
- [BugFix] Fix engine crash caused by chat tools + response_format (#32127) · c6bb5b56
  Nick Hill authored Jan 12, 2026
```
Signed-off-by: Nick Hill <nickhill123@gmail.com>
```
12 Jan, 2026 3 commits
- [Misc] Change log level for batch queue log (#32192) · 08e8e99c
  Nicolò Lucchesi authored Jan 12, 2026
```
Signed-off-by: NickLucche <nlucches@redhat.com>
```
- [Misc] Set default torch num threads for input processing (#31879) · 16abe6b8
  Roger Wang authored Jan 12, 2026
```
Signed-off-by: Roger Wang <hey@rogerw.io>
```
- [Feature] Support recording expert indices for rollout router replay (#28284) · 49e6b86c
  Hongxin Xu authored Jan 12, 2026
```
Signed-off-by: xhx1022 <1737006628@qq.com>
Signed-off-by: Hongxin Xu <70438206+xhx1022@users.noreply.github.com>
Signed-off-by: arlenxu <arlenxu@tencent.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: arlenxu <arlenxu@tencent.com>
```
11 Jan, 2026 1 commit
- [Misc] fix this log format not space (#32112) · d70249e2
  rongfu.leng authored Jan 11, 2026
```
Signed-off-by: lengrongfu <lenronfu@gmail.com>
```