Commits · 4c6fd258808ed42fc98a94f3a849f5fc9efebc20 · OpenDAS / vllm_cscc

09 Dec, 2025 7 commits
- kv_transfer: Rename the shared storage connectors (#30201) · 4c6fd258
  Or Ozeri authored Dec 09, 2025
```
Signed-off-by: Or Ozeri <oro@il.ibm.com>
```
  4c6fd258
- [Kernel]Support W4A8 Grouped GEMM on Hopper (#29691) · f6227c22
  czhu-cohere authored Dec 08, 2025
```
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
```
  f6227c22
- Lora MoE Align Improvements (#29257) · ea657f20
  gnovack authored Dec 08, 2025
```
Signed-off-by: gnovack <gnovack@amazon.com>
```
  ea657f20
- Mark qwen2_5_vl as xfail (#30283) · 7b35011a
  Yanan Cao authored Dec 08, 2025
```
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
```
  7b35011a
- [Feature] Batch invariant: Enable `TRITON_MLA` without prefix-caching (#29125) · d9417096
  Wentao Ye authored Dec 08, 2025
```
Signed-off-by: yewentao256 <zhyanwentao@126.com>
```
  d9417096
- feat(metrics): Add prefill KV compute metric excluding cached tokens (#30189) · f1599ca5
  Victor Ziliang Peng authored Dec 08, 2025
```
Signed-off-by: Ziliang Peng <ziliang@character.ai>
```
  f1599ca5
- [Disagg] Support large batch size in proxy server and update NixlConnector doc for DP (#28782) · 60d17251
  Ming Yang authored Dec 08, 2025
```
Signed-off-by: Ming Yang <minos.future@gmail.com>
```
  60d17251

08 Dec, 2025 8 commits

[ROCm][CI] Fix test_max_len.py for Rocm (#29916) · 6af70e11
Charlie Fu authored Dec 08, 2025
```
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
```
6af70e11
Add SpecDec support to `selective_state_update` (#29488) · ae0f69b1
roikoren755 authored Dec 08, 2025
```
Signed-off-by: Roi Koren <roik@nvidia.com>
```
ae0f69b1
[Misc] Split the LoRA code (#30253) · 67312cad
Jee Jee Li authored Dec 09, 2025
```
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
```
67312cad
Add evaluate_guards option to DynamicShapesConfig (#27432) · 87aee9ed
Laith Sakka authored Dec 08, 2025
```
Signed-off-by: Laith Sakka <lsakka@meta.com>
```
87aee9ed

[DeepSeek v3.2] Make top-k work for any logit values. (#27568) · 184076c3

Daniel Cámpora authored Dec 08, 2025


Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

184076c3

[Frontend] Binary embedding response does not return metadata by setting encoding_format to bytes_only. (#30249) · 2e660c24

wang.yuqi authored Dec 08, 2025

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

2e660c24

[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API (#26686) · 9e77ffca

wang.yuqi authored Dec 08, 2025

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>

9e77ffca

[Frontend] Add MCP type support infrastructure to Responses API (#30054) · 444f0e3f
daniel-salib authored Dec 07, 2025
```
Signed-off-by: Daniel Salib <danielsalib@meta.com>
```
444f0e3f

07 Dec, 2025 9 commits
- [Performance] Fused blockwise quant RMS norm (#27883) · af0444bf
  ElizaWszola authored Dec 07, 2025
```
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
```
  af0444bf
- [v1] Add PrefixLM support to FlexAttention backend (#27938) · b952f4d3
  Isotr0py authored Dec 07, 2025
```
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
```
  b952f4d3
- [CI/Build]Temporary workaround for test_default_mm_loras timeout (#30202) · b0f4866a
  Jee Jee Li authored Dec 07, 2025
```
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
```
  b0f4866a
- [Kernel][MoE] optimize `moe_align_block_size` (#29642) · 879ddb09
  Jinzhen Lin authored Dec 07, 2025
```
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
```
  879ddb09
- Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145)" (#30199) · e83b7e37
  Cyrus Leung authored Dec 07, 2025
  
  e83b7e37
- [Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145) · 27f4c2fd
  Cyrus Leung authored Dec 07, 2025
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  27f4c2fd
- [Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement (#29558) · 17eb25e3
  Wentao Ye authored Dec 06, 2025
```
Signed-off-by: yewentao256 <zhyanwentao@126.com>
```
  17eb25e3
- Support multiple image/audio embeddings per requests (#29988) · dce6d229
  jeremyteboul authored Dec 06, 2025
```
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
```
  dce6d229
- [Frontend] Remove confusing -O.xx flag error (#30169) · cbedb703
  Yanan Cao authored Dec 06, 2025
```
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
```
  cbedb703

06 Dec, 2025 8 commits

[ez] move harmony utils to parser folder (#30117) · 421125d0
Andrew Xia authored Dec 06, 2025
```
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
```
421125d0
[Model] Move `multimodal_cpu_fields` definition to field config (#30181) · 671427ef
Cyrus Leung authored Dec 06, 2025
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
671427ef
Gigachat 3 tool parser and tests (#29905) · 21bb3235
Viacheslav authored Dec 06, 2025
```
Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>
```
21bb3235
Support tokenization_kwargs override (#29794) · 43e75930
Yu Jiaqi authored Dec 06, 2025
```
Signed-off-by: piood <2477084691@qq.com>
```
43e75930

[CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN (#29985) · b12f4a98

rasmith authored Dec 05, 2025

Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>

b12f4a98

[CI/Build][AMD] Skip marlin, machete, and hadacore tests since these require _C functions not defined for ROCm (#30109) · 62079d86

rasmith authored Dec 05, 2025

Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>

62079d86

[CI]: Remove unnecessary imports from test_lmache_integration (#30157) · 7e31c3a3
Samuel Shen authored Dec 05, 2025
```
Signed-off-by: Samuel Shen <slshen@uchicago.edu>
Co-authored-by: Samuel Shen <slshen@uchicago.edu>
```
7e31c3a3
[Tests] Tool call tests for openai/gpt-oss-20b (#26237) · 02a41691
Deboleina authored Dec 05, 2025
```
Signed-off-by: Debolina Roy <debroy@redhat.com>
```
02a41691

05 Dec, 2025 8 commits

[Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926) · 962d7038
Divakar Verma authored Dec 05, 2025
```
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
```
962d7038
[CI] Re-use whisper_client for all tests (#30148) · e23ca3a0
Nicolò Lucchesi authored Dec 05, 2025
```
Signed-off-by: NickLucche <nlucches@redhat.com>
```
e23ca3a0
[Misc] Rename CohereForAI references to CohereLabs (#30147) · 3633035a
Russell Bryant authored Dec 05, 2025
```
Signed-off-by: Russell Bryant <rbryant@redhat.com>
```
3633035a

[Compile] Conditional compilation. Introduce compile_ranges (#24252) · 4e26d3b0

Ilya Markov authored Dec 05, 2025


Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>

4e26d3b0

[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315) · 66e674cd

Matthew Bonanni authored Dec 05, 2025

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>

66e674cd

[NIXL] Add remote_request_id to kv_transfer_params (#29665) · dff0a2b3
Mark McLoughlin authored Dec 05, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
dff0a2b3

[BugFix] Eagerly abort cancelled final-step requests (#29987) · dc264bce

Nick Hill authored Dec 05, 2025



Currently, when requests are cancelled while executing their final
step, "completion" is handled based on normal stop processing (e.g.
length or stop token), so the abort has no effect. This is typically
not a problem, but when a kv connector is involved it thinks the
request completed successfully rather than being aborted.

This is problematic for disaggregated prefill which will free kv
cache blocks if the request was aborted but not if it completed
successfully—since the cancelled request will never be sent to
the decode side, kv cache blocks remain pinned until the fall-back
timeout expires. The problem is exacerbated when many requests
are cancelled and/or there are large prefills whose forward pass
takes a long time (since the window is bigger).

This PR fixes the problem by processing pending aborts
immediately prior to processing model output each step; we process
only aborts, not new requests, since it's preferable for latency to
process model outputs before new incoming requests.

Fixes #26400.
Signed-off-by: Nick Hill <nhill@redhat.com>

dc264bce
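
The ordering described in the dc264bce commit message (apply pending aborts first, then handle model output, and leave new requests queued) can be illustrated with a toy loop. This is only a sketch under stated assumptions: `ToyEngine`, `drain_aborts`, `process_output`, and the `"add"`/`"abort"` message tuples are hypothetical names invented for illustration and are not taken from the vLLM code base.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    """Hypothetical engine loop; all names are illustrative, not vLLM's."""

    inbox: deque = field(default_factory=deque)   # queued ("add" | "abort", request_id)
    running: set = field(default_factory=set)     # requests currently scheduled
    finished: dict = field(default_factory=dict)  # request_id -> "stop" | "abort"

    def drain_aborts(self) -> None:
        """Apply only the queued aborts; new requests stay queued in order."""
        kept = deque()
        while self.inbox:
            kind, req_id = self.inbox.popleft()
            if kind == "abort":
                if req_id in self.running:
                    self.running.discard(req_id)
                    self.finished[req_id] = "abort"
            else:
                kept.append((kind, req_id))
        self.inbox = kept

    def process_output(self, completed: list) -> None:
        """Handle one step of model output; `completed` hit a normal stop."""
        # Aborts are applied first, so a request cancelled during its final
        # step is recorded as aborted rather than as a normal completion.
        self.drain_aborts()
        for req_id in completed:
            if req_id in self.running:
                self.running.discard(req_id)
                self.finished[req_id] = "stop"


engine = ToyEngine(running={"a", "b"})
engine.inbox.extend([("abort", "a"), ("add", "c")])
engine.process_output(completed=["a", "b"])
```

In this toy run, request `a` is cancelled during the same step in which it would have stopped normally, so it ends up marked as aborted, while the new request `c` stays in the queue until after the model output has been handled.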

[NIXL] Small cleanup of unused variables (#29618) · 78c44fd7

Nicolò Lucchesi authored Dec 05, 2025


Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

78c44fd7