Commits · 46794958f0c60bc3a4f30562032e991222ab5d56 · OpenDAS / vllm_cscc

"vscode:/vscode.git/clone" did not exist on "06e16a27eb251805c8e07b3a2e3bbd980fcf1592"

22 Apr, 2026 3 commits
- test: add nan/inf clamp regression test for fused_topk_bias (#40553) · 46794958
  Jhao-Ting Chen authored Apr 21, 2026
```
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
```
- [Bugfix] avoid warmup if text only expectation in multi_modal run (#40409) · 6ff8dea0
  Khushali Desai authored Apr 21, 2026
```
Signed-off-by: khushali9 <khushali.desai9@gmail.com>
```
- [ROCm] [Wheel] [Bugfix] [Critical] Remove any packages installed from github... · 583e6f22
  TJian authored Apr 22, 2026
```
[ROCm] [Wheel] [Bugfix] [Critical] Remove any packages installed from github from rocm.txt e.g  `fastsafetensors` as it is incompatible with `uv pip` (#40461)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
```
21 Apr, 2026 37 commits

[Startup][UX] Enable CUDAGraph memory profiling by default (#38284) · 96a85c57
Matthew Bonanni authored Apr 21, 2026
```
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
```

[MoE Refactor] Add more MoE layer tests (#39349) · 9db4650e

bnellnm authored Apr 21, 2026

Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

[MoE Refactor] Remove SharedFusedMoE class (#35782) · 5e584ce9
bnellnm authored Apr 21, 2026
```
Signed-off-by: Bill Nell <bnell@redhat.com>
```
[Refactor] Remove unused param (#39750) · 1842447c
Wentao Ye authored Apr 21, 2026
```
Signed-off-by: yewentao256 <zhyanwentao@126.com>
```
[Perf] Optimize batch invariant with fused rms norm, 2.1% E2E latency improvement (#40413) · 16688b26
Wentao Ye authored Apr 21, 2026
```
Signed-off-by: yewentao256 <zhyanwentao@126.com>
```

[Bugfix][Kernel] nvfp4 cutlass MoE: fix nvfp4 experts quant out-of-bounds read... · 6fbec8ed

Jakub Zakrzewski authored Apr 21, 2026


[Bugfix][Kernel] nvfp4 cutlass MoE: fix nvfp4 experts quant out-of-bounds read for expert counts not divisible by 4 or 16 (#40351)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>

[Performance] Add is_reasoning_end_streaming() override to GptOssReasoningParser (#35745) · 5544f8c1

Fergus authored Apr 21, 2026


Signed-off-by: Fergus <fergus.barratt00@gmail.com>
Signed-off-by: fergus barratt <fergus.barratt00@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

[Bugfix] Fix spec decode test failures on Blackwell (SM100+) (#39546) · 9f39b380

Rishi Puri authored Apr 21, 2026


Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Signed-off-by: Rishi Puri <puririshi98@berkeley.edu>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>

[MRv2]fix: model accuracy regression caused by reusing the stale... · 9a6a66f3

Zijing Liu authored Apr 21, 2026


[MRv2]fix: model accuracy regression caused by reusing the stale last_sampled_tokens and draft_tokens (#39833)
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

Revert "[Misc] Move `pyav` and `soundfile` to common requirements" (#40276) · 67eb6083
Isotr0py authored Apr 22, 2026
```
Co-authored-by: Roger Wang <hey@rogerw.io>
```
Add new tp plan styles to the Transformers modelling backend (#40467) · 6ee081d1
Harry Mellor authored Apr 21, 2026
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```

[Model Runner V2] Multiple prompt logprobs support (#39937) · 66cc3fa5

Wentao Ye authored Apr 21, 2026


Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>

Revert #38730 and #38791 (#40032) · 6d85b36a

Vadim Gimpelson authored Apr 21, 2026


Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>

[UX] Bump version in CG memory profiling log message (#40465) · ab5666eb
Matthew Bonanni authored Apr 21, 2026
```
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
```

Default to 'align' mamba cache mode for Mamba-based models when speculative... · f819265a

roikoren755 authored Apr 21, 2026


Default to 'align' mamba cache mode for Mamba-based models when speculative decoding is enabled (#40454)
Signed-off-by: Roi Koren <roik@nvidia.com>

[MM][CG] Optimize default `max_frames_per_batch` auto-infer for ViT CUDA graph... · 936e0b79

Shanshan Shen authored Apr 21, 2026


[MM][CG] Optimize default `max_frames_per_batch` auto-infer for ViT CUDA graph video inference (#40445)
Signed-off-by: shen-shanshan <467638484@qq.com>

[XPU][CI] Add misc, engine and lora cases on Intel GPU in CI (#39887) · b2a55186
xiangdong authored Apr 21, 2026
```
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
```
[Bugfix] LoRA: extend expert base_layer loading to Qwen3.5 and Step3.x (#37114) · 908a7134
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 authored Apr 21, 2026
```
Signed-off-by: Hollow Man <hollowman@opensuse.org>
```

[Doc] Add Qwen3 AWQ models to documentation (#40034) · ec5ef0ac

Yusuf Mohammad authored Apr 21, 2026


Signed-off-by: Yusuf <yusufmohammad@live.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

[Bugfix] Fix dataset name and path argument validation bug in vllm bench serve (#40288) · 7b1e0b07

Talor Abramovich authored Apr 21, 2026

Signed-off-by: talora <talora@nvidia.com>
Signed-off-by: Talor Abramovich <talor19@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Add Granite 4.1 Vision as built-in multimodal model (#40282) · d249a9e9

artem-spector authored Apr 21, 2026


Signed-off-by: Artem Spector <artems@il.ibm.com>
Signed-off-by: artemspector <artems@il.ibm.com>
Co-authored-by: artemspector <artems@il.ibm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

[Frontend] Remove frontend pooling multi task support. (#37861) · d2e2e856

wang.yuqi authored Apr 21, 2026


Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

feat(multimodal): support externally processed mm_kwargs with cache injection (#39502) · 766cb65d

Kris Hung authored Apr 21, 2026


Signed-off-by: Krish Hung <krishung5@gmail.com>
Signed-off-by: krishung5 <krish@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: clamp NaN/Inf in topk_softmax to prevent duplicate expert IDs (#39391) · 28c22215
Jhao-Ting Chen authored Apr 21, 2026
```
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
```

Revert "[Startup] Parallelize torch/transformers import + weight prefetch +... · 3975eb6d

wang.yuqi authored Apr 21, 2026


Revert "[Startup] Parallelize torch/transformers import + weight prefetch + forkserver prewarm" (#40438)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>

[Bugfix] Normalize malformed dict prompts that carry token IDs in `prompt` (#40339) · 5a94a198
Zeyu Zhang authored Apr 21, 2026
```
Signed-off-by: Alchuang22-dev <2584829494@qq.com>
```
[Feat] dflash support for ROCm (#39703) · f95c11a8
hangy-amd authored Apr 21, 2026
```
Signed-off-by: Hang Yang <hangy@amd.com>
```
[MoE] Triton MoE Perf regression - restore low latency path (#39016) · 257015d5
milesial authored Apr 20, 2026

[MM][Misc] Support image+video mixed inputs (per prompt) for VLM examples (#40335) · b4784001
Shanshan Shen authored Apr 21, 2026
```
Signed-off-by: shen-shanshan <467638484@qq.com>
```
[Fix] Add missing space in IP fallback warning (#40359) · 989cc12d
SeongJun Lee authored Apr 21, 2026
```
Signed-off-by: lesj0610 <lesj0610@gmail.com>
```
[Deprecation] Deprecate cprofile and cprofile_context (#39100) · 301024aa
Wentao Ye authored Apr 20, 2026
```
Signed-off-by: yewentao256 <zhyanwentao@126.com>
```
[Startup] Parallelize torch/transformers import + weight prefetch + forkserver prewarm (#40331) · 8256833f
Simon Mo authored Apr 20, 2026
```
Signed-off-by: simon-mo <simon@inferact.ai>
```
[Doc] Update ViT CUDA graph doc for mixed (image+video) inputs (#40355) · 80975912
Shanshan Shen authored Apr 21, 2026
```
Signed-off-by: shen-shanshan <467638484@qq.com>
```

[Bugfix] Gemma4: fix multimodal embedder norm order to match HF reference (#40411) · 20d37434

Luciano Martins authored Apr 20, 2026


Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>

[Misc] Reduce attention logging levels (#40086) · 18563f20
Chauncey authored Apr 21, 2026
```
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
```

[Bugfix] Fix `_CONFIG_REGISTRY` types getting wrong config class when on-disk... · 0e884fe6

Misa authored Apr 20, 2026


[Bugfix] Fix `_CONFIG_REGISTRY` types getting wrong config class when on-disk model_type differs (#39554)
Signed-off-by: Misa <misaAle@users.noreply.github.com>
Signed-off-by: Misael Casarez <misacasa@amazon.com>
Co-authored-by: Misael Casarez <misacasa@amazon.com>

[vLLM IR] Add IR op testing and benchmarking infrastructure (#40167) · fe5c115e

Yanan Cao authored Apr 20, 2026


Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Theresa Shan <Theresa.Shan@amd.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
