Commits · 21bb323542bad9d7a7206d949f33734caf48c40c · OpenDAS / vllm_cscc

06 Dec, 2025 19 commits
- Gigachat 3 tool parser and tests (#29905) · 21bb3235
  Viacheslav authored Dec 06, 2025
```
Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>
```
- simplify requires_files list creation (#29656) · 17a9abec
  Chukwuma Nwaugha authored Dec 06, 2025
```
Signed-off-by: Chukwuma Nwaugha <nwaughac@gmail.com>
```
- [Misc] Fix circular import in vllm.transformers_utils.config (#30179) · 92c35abb
  Ye (Charlotte) Qi authored Dec 06, 2025
```
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
```
- Support tokenization_kwargs override (#29794) · 43e75930
  Yu Jiaqi authored Dec 06, 2025
```
Signed-off-by: piood <2477084691@qq.com>
```
- [Chore] Deprecate `SupportsMultiModal.merge_by_field_config` (#30170) · c46b932d
  Cyrus Leung authored Dec 06, 2025
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
- prefix caching design doc sha256 now default (#29261) · 64763823
  redwrasse authored Dec 05, 2025
```
Signed-off-by: redwrasse <mail@redwrasse.io>
```
- [bugfix] fix type[AttentionBackend] bug in kv_connector_base_v1 (#30051) · d6aeaddf
  kx authored Dec 06, 2025
```
Signed-off-by: 01267596 <xiongkai123@cmbchina.com>
Co-authored-by: 01267596 <xiongkai123@cmbchina.com>
```
- [Model Runner V2] Support min-p sampling (#30171) · a238cbd8
  Woosuk Kwon authored Dec 05, 2025
```
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
```
- [Misc] Move `disable_nccl_for_dp_synchronization` init logic into `VllmConfig` (#30161) · 4026ae31
  Nick Hill authored Dec 05, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
- [CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for... · b12f4a98
  rasmith authored Dec 05, 2025
```
[CI/Build][AMD] Use ROCM_ATTN instead of FLASH_ATTN test for test_register_kv_caches for ROCm and update test for TRITON_ATTN (#29985)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
```
- [Bugfix]: Fix `TokenizerLike` interface (#30009) · 40a046cd
  Rohan Potdar authored Dec 05, 2025
```
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
```
- [Model] Add support for transformer-based Ultravox v0.7 projector (#30089) · e858bc4d
  Peter Salas authored Dec 05, 2025
```
Signed-off-by: Peter Salas <peter@fixie.ai>
```
- fix#30092 Kimi-Linear model loading failure with missing indexer_rotary_emb (#30093) · e3fbb6f1
  Dongjie Zou authored Dec 05, 2025
```
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
```
- Fix AWQ MoE marlin check issue in marlin_utils.py for AMD backend (#30102) · c4d62618
  yuttian1 authored Dec 06, 2025
```
Signed-off-by: yuttian1 <yuttian@amd.com>
```
- [CI/Build][AMD] Skip marlin, machete, and hadacore tests since these require... · 62079d86
  rasmith authored Dec 05, 2025
```
[CI/Build][AMD] Skip marlin, machete, and hadacore tests since these require _C functions not defined for ROCm (#30109)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
```
- Better error when world size is larger than node and... · bf4a901a
  Harry Mellor authored Dec 06, 2025
```
Better error when world size is larger than node and `distributed_executor_backend` is not set (#30140)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
- [CI]: Remove unnecessary imports from test_lmache_integration (#30157) · 7e31c3a3
  Samuel Shen authored Dec 05, 2025
```
Signed-off-by: Samuel Shen <slshen@uchicago.edu>
Co-authored-by: Samuel Shen <slshen@uchicago.edu>
```
- [CI/Build][AMD][Quantization] Fix test_int8_kernel.py by updating int8_utils... · dc839ad0
  rasmith authored Dec 05, 2025
```
[CI/Build][AMD][Quantization] Fix test_int8_kernel.py by updating int8_utils to use hip.libdevice.round (#30151)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
```
- [Tests] Tool call tests for openai/gpt-oss-20b (#26237) · 02a41691
  Deboleina authored Dec 05, 2025
```
Signed-off-by: Debolina Roy <debroy@redhat.com>
```

05 Dec, 2025 21 commits
- [Bug] Fix vLLM config is not set error (#29999) · 7b5575fa
  Wentao Ye authored Dec 05, 2025
```
Signed-off-by: yewentao256 <zhyanwentao@126.com>
```
- let draft model follow target model's config_format (#30152) · 77e44728
  Bangsheng Tang authored Dec 05, 2025
- [Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926) · 962d7038
  Divakar Verma authored Dec 05, 2025
```
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
```
- [CI] Re-use whisper_client for all tests (#30148) · e23ca3a0
  Nicolò Lucchesi authored Dec 05, 2025
```
Signed-off-by: NickLucche <nlucches@redhat.com>
```
- [Misc] Rename CohereForAI references to CohereLabs (#30147) · 3633035a
  Russell Bryant authored Dec 05, 2025
```
Signed-off-by: Russell Bryant <rbryant@redhat.com>
```
- [Enc-Dec] Fix OOT tokenizer issue (#30144) · bff78310
  Nicolò Lucchesi authored Dec 05, 2025
```
Signed-off-by: NickLucche <nlucches@redhat.com>
```
- [KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache (#27170) · adb31506
  Tova Movshovitz authored Dec 05, 2025
```
Signed-off-by: tovam <tovam@pliops.com>
Signed-off-by: Tova Movshovitz <tovam@pliops.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
```
- [Compile] Conditional compilation. Introduce compile_ranges (#24252) · 4e26d3b0
  Ilya Markov authored Dec 05, 2025
```
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
```
- [Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315) · 66e674cd
  Matthew Bonanni authored Dec 05, 2025
```
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
```
- [NIXL] Add remote_request_id to kv_transfer_params (#29665) · dff0a2b3
  Mark McLoughlin authored Dec 05, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
- [BugFix] Eagerly abort cancelled final-step requests (#29987) · dc264bce
  Nick Hill authored Dec 05, 2025
```
Currently, when requests are cancelled while executing their final
step, "completion" is handled based on normal stop processing (e.g.
length or stop token), so the abort has no effect. This is typically
not a problem, but when a kv connector is involved it thinks the
request completed successfully rather than being aborted.

This is problematic for disaggregated prefill, which will free kv
cache blocks if the request was aborted but not if it completed
successfully. Because the cancelled request will never be sent to
the decode side, kv cache blocks remain pinned until the fall-back
timeout expires. The problem is exacerbated when many requests
are cancelled and/or there are large prefills whose forward pass
takes a long time (since the window is bigger).

This PR fixes the problem by processing pending aborts
immediately prior to processing model output each step; we process
only aborts, not new requests, since it's preferable for latency to
process model outputs before new incoming requests.

Fixes #26400.
Signed-off-by: Nick Hill <nhill@redhat.com>
```
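The ordering described in the dc264bce commit message can be illustrated with a minimal sketch. It is a toy model only: `ToyEngine`, its `inbox` queue, and the method names are hypothetical and are not vLLM's actual classes or API. It assumes incoming messages are `("abort" | "add", request_id)` tuples and shows aborts being drained just before model output is handled, while new requests stay queued.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    """Hypothetical stand-in for an engine core step loop (not vLLM code)."""

    running: set = field(default_factory=set)
    inbox: deque = field(default_factory=deque)    # ("abort" | "add", request_id)
    finished: dict = field(default_factory=dict)   # request_id -> finish reason

    def _process_pending_aborts(self) -> None:
        # Apply only abort messages; every other message stays queued in order.
        deferred = deque()
        while self.inbox:
            kind, req_id = self.inbox.popleft()
            if kind == "abort":
                if req_id in self.running:
                    self.running.discard(req_id)
                    self.finished[req_id] = "aborted"
            else:
                deferred.append((kind, req_id))
        self.inbox = deferred

    def process_model_output(self, stopped: list) -> None:
        # Drain aborts first so a request cancelled during its final step
        # is recorded as aborted instead of completed.
        self._process_pending_aborts()
        for req_id in stopped:
            if req_id in self.running:
                self.running.discard(req_id)
                self.finished[req_id] = "completed"

    def process_new_requests(self) -> None:
        # Runs after model output has been handled; admits queued requests.
        while self.inbox:
            kind, req_id = self.inbox.popleft()
            if kind == "add":
                self.running.add(req_id)
            elif kind == "abort" and req_id in self.running:
                self.running.discard(req_id)
                self.finished[req_id] = "aborted"


engine = ToyEngine(running={"a", "b"})
engine.inbox.extend([("abort", "a"), ("add", "c")])
engine.process_model_output(stopped=["a", "b"])  # both hit a stop condition
engine.process_new_requests()
```

In this sketch request `a` ends up recorded as aborted even though it also reached a stop condition in the same step, which is the distinction a KV connector would need in order to free blocks promptly; request `c` is only admitted after the model output has been processed.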

- [NIXL] Small cleanup of unused variables (#29618) · 78c44fd7
  Nicolò Lucchesi authored Dec 05, 2025
```
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
```
- [bugfix] Pass globals to aot_compiled function (#29428) · e7296b08
  Angela Yi authored Dec 05, 2025
```
Signed-off-by: angelayi <yiangela7@gmail.com>
```
- [responsesAPI][5] ResponsesParser with tools for full MCP python loop (#29798) · da7bc54e
  Andrew Xia authored Dec 05, 2025
```
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Andrew Xia <axia@fb.com>
```
- [NIXL] Add compatibility checking to NIXL KV connector handshake (#29503) · 949a6a19
  Mark McLoughlin authored Dec 05, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
- Reduce validation to a warning (#28749) · 2c174420
  Alec S authored Dec 05, 2025
```
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
- [Compressed Tensors] Add XPU `wNa16` support (#29484) · 0d8a7d8a
  Yi Liu authored Dec 05, 2025
```
Signed-off-by: yiliu30 <yi4.liu@intel.com>
```
- [CPU][Perf] Add fast vectorized exp impl from Arm Optimized Routines (#30068) · 9843e332
  Elham authored Dec 05, 2025
```
Signed-off-by: Ubuntu <ubuntu@ip-10-252-30-150.eu-west-1.compute.internal>
Signed-off-by: Elham Harirpoush <elham.harirpoush@arm.com>
Co-authored-by: Ubuntu <ubuntu@ip-10-252-30-150.eu-west-1.compute.internal>
```
- [CI] Have pre-commit comment on a PR if pre-commit was not used (#30077) · b7d85cf2
  Harry Mellor authored Dec 05, 2025
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
- [Feature] Add Layer-wise NVTX Support (#29990) · c2894d38
  Max Hu authored Dec 05, 2025
```
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Signed-off-by: Max Hu <maxhu@nvidia.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
```
- [ROCm][MXFP4] Infer w4a4 quant method in rocm aiter fused moe (#29775) · 3628bcaa
  Zhiwei authored Dec 05, 2025
```
Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>
```