Commits · 98b4d389ed27f09fd185ade889a02f640a3ff0b4 · OpenDAS / vllm_cscc

15 Nov, 2025 8 commits

Cyrus Leung authored Nov 15, 2025


Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

98b4d389

[Performance][DeepGEMM] Estimate expected_m (#28694) · 6965ef43

Varun Sundar Rabindranath authored Nov 15, 2025


Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>

6965ef43

[NIXL] heterogeneous block_size support (#26759) · c9e66585

Chendi.Xue authored Nov 14, 2025


Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>

c9e66585

Fix IntermediateTensors initialization and add type hints (#28743) · 363aaeef
Mohammad Othman authored Nov 15, 2025
```
Signed-off-by: Mohammad Othman <Mo@MohammadOthman.com>
Co-authored-by: Mohammad Othman <Mo@MohammadOthman.com>
```
363aaeef
Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… (#28773) · ac86bff8
Nick Hill authored Nov 14, 2025

ac86bff8
[Model][Qwen3VL] Use `mm_position` to compute mrope positions (#28730) · f05d474c
Lukas Geiger authored Nov 15, 2025
```
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
```
f05d474c
[TPU] Fix import error in tpu launch (#28758) · 9fc81ec7
QiliangCui authored Nov 14, 2025
```
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
```
9fc81ec7

[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368) · 186352b2

Jialin Ouyang authored Nov 14, 2025

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

186352b2

14 Nov, 2025 32 commits

[Log] Save profiler results to file instead of stdout (#28144) · ba041d98
rasmith authored Nov 14, 2025
```
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
```
ba041d98
[Hybrid] [Kernel] Fix chunk scan kernel when BLOCK_SIZE_DSTATE > 128 (#28295) · e0c910bb
Thomas Parnell authored Nov 14, 2025
```
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
```
e0c910bb
[Bugfix] Fix ChunkedLocalAttention CUDA Graph setting (#28739) · bf3ffb61
Benjamin Chislett authored Nov 14, 2025
```
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
```
bf3ffb61

[Bugfix] Fix incorrect use of hidden_states for shared_experts due to do_naive_dispatch_combine (#28740) · e5c78956

Alexander Matveev authored Nov 14, 2025

Signed-off-by: Alexander Matveev <amatveev@redhat.com>

e5c78956

Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110) · 2e0ad629
Laith Sakka authored Nov 14, 2025
```
Signed-off-by: Laith Sakka <lsakka@meta.com>
```
2e0ad629
[BugFix] Fix misprint introduced by modular_kernel refactoring. (#28728) · fd455508
Andrey Khalyavin authored Nov 14, 2025
```
Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>
```
fd455508

[Bugfix] resolve Qwen3-VL GPTQModel quantized model loading failure (#28663) · cec275ef

GuanH authored Nov 15, 2025


Signed-off-by: GuanH <guansdrailib@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

cec275ef

[Chore] Rename `SchedulerConfig.chunked_prefill_enabled` (#28735) · e2741f6c
Cyrus Leung authored Nov 15, 2025
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
e2741f6c
[Bugfix] [ROCm] [AITER]: Fix aiter block quant not compatible with torch compile dynamo (#28716) · a425dc25
TJian authored Nov 14, 2025
```
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
```
a425dc25

LLaMA4 LoRA Adapter Enablement (#28602) · 964d65de

Fardin Hoque authored Nov 14, 2025


Signed-off-by: Fardin Hoque <kfhfar@amazon.com>
Co-authored-by: Wei Wei <wwei6@meta.com>

964d65de

docs(lora_resolvers): clarify multi-resolver order and storage path requirement (#28153) · 9261eb3d

Chen Wang authored Nov 14, 2025

Signed-off-by: Chen Wang <Chen.Wang1@ibm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

9261eb3d

Fix typo in comment: existance -> existence (#28737) · a17e36f2
Mohammad Othman authored Nov 14, 2025
```
Signed-off-by: Mohammad Othman <emranm226@hotmail.com>
```
a17e36f2

[DisaggEverything] Tokens in<>out `/generate` endpoint (#24261) · 6f1e7f72

Nicolò Lucchesi authored Nov 14, 2025

Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

6f1e7f72

[Docs] Update the name of `Transformers backend` -> `Transformers modeling backend` (#28725) · 5f3cd7f7
Harry Mellor authored Nov 14, 2025
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
5f3cd7f7

[Fix] improve aspect ratio in dummy image generation and add common VLM tests for PaddleOCR-VL (#28711) · c934caee

dongbo910220 authored Nov 15, 2025

Signed-off-by: dongbo910220 <1275604947@qq.com>

c934caee

[Kernels] Enable FlashInfer FP8 Blockscale on SM90 (for TEP DSR1) (#27134) · 3f8a8740

Duncan Moss authored Nov 14, 2025

Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

3f8a8740

[Config] Clean up SchedulerConfig initialization (#28665) · 511a6b61
Cyrus Leung authored Nov 14, 2025
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
511a6b61
[Bugfix][Nixl] Fix kernel physical<>logical block_size issue (#28677) · 96b23b8e
Nicolò Lucchesi authored Nov 14, 2025
```
Signed-off-by: NickLucche <nlucches@redhat.com>
```
96b23b8e
[Model] Fix bailing_moe accuracy problem (#28277) · 433c0f86
zhaozx-cn authored Nov 14, 2025
```
Signed-off-by: zhaozx-cn <zhaozx2116@163.com>
```
433c0f86
[BugFix] Fix FA3 IMA with FULL_AND_PIECEWISE and cascade attention (default) (#28702) · db56a599
Lucas Wilkinson authored Nov 14, 2025

db56a599
Fix KV sharing fast prefill with cudagraph enabled (#28537) · 9324e102
Yong Hoon Shin authored Nov 14, 2025
```
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
```
9324e102

[DCP] Support Decode Context Parallel (DCP) for GQA with Flashinfer (#25438) · 4516d44b

Jingchun Gao authored Nov 14, 2025


Signed-off-by: gaojc <1055866782@qq.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: gaojingchun (A) <g00955623@china.huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>

4516d44b

[Model][MM] Extract conv layer as CustomOp (#28455) · 41b92f7d

Shanshan Shen authored Nov 14, 2025


Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

41b92f7d

[Frontend] Added chat-style multimodal support to /classify. (#27516) · 360bd876

Srreyansh Sethi authored Nov 14, 2025


Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: vnadathur <glvikramn@gmail.com>
Co-authored-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>

360bd876

[Metrics] Log number of preempted requests (#28522) · ecf8230d

lyn610 authored Nov 14, 2025

Add tracking and periodic logging for the number of preempted requests in the
metrics logger. This helps monitor system behavior under load.
Signed-off-by: Yining Liu <610lyn@gmail.com>

ecf8230d

[BugFix] Fix multi-modal async scheduling race condition (#28706) · bc3e4306
Nick Hill authored Nov 14, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
bc3e4306
[Bugfix] fix dots.ocr pp support (#28705) · c36bcfe6
Jiangyun Zhu authored Nov 14, 2025
```
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
```
c36bcfe6
use default CCL_ZE_IPC_EXCHANGE (#28700) · 529cea34
Yan Ma authored Nov 14, 2025
```
Signed-off-by: Yan Ma <yan.ma@intel.com>
```
529cea34

[Bugfix][CI/Test][Spec Decode] Fix illegal memory access in offline_inference/spec_decode.py (Issue 27619) (#28432) · 15ae8e07

rasmith authored Nov 14, 2025

Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>

15ae8e07

[Misc] add ignore mapper for quark quantization (#28275) · 0b254989
haoyangli-amd authored Nov 14, 2025
```
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
```
0b254989
[Misc] Remove `warn_for_unimplemented_methods` (#28613) · 01bea115
Cyrus Leung authored Nov 14, 2025
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
01bea115
[quantization][config] enable override existing quant_config (#28510) · 4d5943bd
Hank_ authored Nov 14, 2025
```
Signed-off-by: Hank <hcc.mayday@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
```
4d5943bd