Commits · 967146e7bdfb0cc3cb16fb5cc547bff9667ae0a2 · OpenDAS / vllm_cscc

10 Apr, 2026 8 commits
- [model] support FireRedLID (#39290) · 967146e7
  PatchyTIS authored Apr 10, 2026
```
Signed-off-by: PatchouliTaisa <patchychen@tencent.com>
Co-authored-by: PatchouliTaisa <patchychen@tencent.com>
```
- [PluggableLayer][3/N] Apply PluggableLayer to llm_head and vocab embedding layer (#33465) · 1dfd64c1
  Hexiang Wang authored Apr 10, 2026
```
Signed-off-by: whx-sjtu <2952154980@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
```
- Nemotron Nano VL: Streamline pixel shuffle (#37580) · 270e8a41
  milesial authored Apr 10, 2026
```
Signed-off-by: milesial <milesial@users.noreply.github.com>
```
- [compile] Allow strings in custom ops without regressing compilation times (#38123) · f44afef6
  Richard Zou authored Apr 10, 2026
```
Signed-off-by: Richard Zou <zou3519@gmail.com>
```
- Add EXAONE-4.5 (#39388) · e7a1387e
  Kyungmin Lee authored Apr 10, 2026
```
Signed-off-by: lkm2835 <lkm2835@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
```
- [CT][FP8][Marlin] refactor CompressedTensorsW8A16Fp8 to use kernel abstraction (#38244) · 55d037e2
  Kunshang Ji authored Apr 10, 2026
```
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
```
- fix(gdn): Align prefill warmup with real prefill path (#39169) · 9853a3c1
  Ibrahim Arshad authored Apr 10, 2026
```
Signed-off-by: Ibrahim Arshad <38925737+ibrahim1023@users.noreply.github.com>
```
- [Model][Perf] Enable checkpoints prefetching for Lustre FS by default (#39422) · bb6047db
  Artem Perevedentsev authored Apr 10, 2026
```
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
```
09 Apr, 2026 11 commits
- [ASR] Fix spacing bw chunks in multi chunk audio transcription (#39116) · f7cad674
  Ekagra Ranjan authored Apr 09, 2026
```
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
```
- [Refactor] Move NVFP4 GEMM management into NvFp4LinearKernel (#39129) · 2800706f
  Michael Goin authored Apr 09, 2026
```
Signed-off-by: mgoin <mgoin64@gmail.com>
```
- [Quantization] Support Quark W8A8 INT8 MoE inference (#36320) · 827268e9
  PikaPikachu authored Apr 10, 2026
```
Signed-off-by: kangletian <Letian.Kang@amd.com>
```
- [BugFix] fix tests/kernels/moe/test_moe_layer.py (#39404) · 6c749399
  Richard Zou authored Apr 09, 2026
```
Signed-off-by: Richard Zou <zou3519@gmail.com>
```
- nemotron-nano-vl: Allow `use_audio_in_video` to be passed at `vllm serve` time (#38538) · df2503e1
  Andrii Skliar authored Apr 09, 2026
```
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Co-authored-by: Andrii Skliar <askliar@nvidia.com>
```
- [Refactor] Improve indexer decode path metadata preparation (#38865) · 2e984060
  Yongye Zhu authored Apr 08, 2026
- [Model] Update ColModernVBERT to support latest HF checkpoint (#39307) · d37b3787
  Ilya Boytsov authored Apr 09, 2026
```
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
```
- [Bug] Fix routing bias dtype for trtllm per-block fp8 moe (#38989) · 92fbec39
  Wei Zhao authored Apr 08, 2026
```
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
```
- [Gemma4] Support quantized MoE (#39045) · 3aecdf08
  Dipika Sikka authored Apr 08, 2026
```
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
```
- [W8A8 Block Linear Refactor][2/N] Remove W8A8Fp8BlockLinearOp and adopt Fp8... · 2e9034c9
  Maral authored Apr 09, 2026
```
[W8A8 Block Linear Refactor][2/N] Remove W8A8Fp8BlockLinearOp and adopt Fp8 block linear kernel selections. (#33892)
Signed-off-by: maral <maralbahari.98@gmail.com>
Signed-off-by: Maral <maralbahari.98@gmail.com>
```
- [Bugfix] FlashInfer MXINT4 MoE crashes, missing do_finalize (#39315) · 8332078c
  Benjamin Chislett authored Apr 08, 2026
```
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
```
08 Apr, 2026 7 commits
- [Bugfix]Fix EP precision for Qwen3.5, Qwen3-Next (#39181) · f3c7941e
  Kai Song authored Apr 09, 2026
  Signed-off-by: Song Kai <songkai05@baidu.com>
- [Feature] Batch invariant nvfp4 linear support (#39322) · 20181372
  Wentao Ye authored Apr 08, 2026
  Signed-off-by: yewentao256 <zhyanwentao@126.com>
- [MoE] Move DEEP_GEMM into experts/ subdirectory (#39005) · a776a48b
  Jackmin801 authored Apr 08, 2026
  Signed-off-by: Jackmin801 <ongjackm@gmail.com>
  Signed-off-by: Robert Shaw <robshaw@redhat.com>
  Co-authored-by: Robert Shaw <robshaw@redhat.com>
  Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
- [Perf][Kernel] Persistent TopK scheduler: unified CUDAGraph-safe kernel with... · b55d830e
  Roberto L. Castro authored Apr 08, 2026
  [Perf][Kernel] Persistent TopK scheduler: unified CUDAGraph-safe kernel with dynamic per-row dispatch - DeepSeek-V3.2 DSA decode (#37421)
  Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
  Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
  Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
  Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
- [CI][AMD][BugFix][Kernel] Cast induction variable to int64 on MI350 for... · 78434b92
  rasmith authored Apr 08, 2026
  [CI][AMD][BugFix][Kernel] Cast induction variable to int64 on MI350 for chunk_gated_delta_rule_fwd_kernel_h_blockdim64 to avoid illegal memory access (#39087)
  Signed-off-by: Randall Smith <Randall.Smith@amd.com>
- [release 2.11] Update to torch 2.11 (#34644) · 2111997f
  Andrey Talman authored Apr 07, 2026
- [XPU] add xpu backend implementation of mxfp8 quant (#38682) · ad330442
  zofia authored Apr 08, 2026
  Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
  Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
07 Apr, 2026 12 commits
- [Attention][V0 Deprecation] Deprecate accept output buffer (#39125) · 70406eb1
  Lucas Wilkinson authored Apr 07, 2026
```
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
```
- [Bugfix] Fix extract_hidden_states crash with quantized KV cache dtype (#39160) · 08bfedc1
  Yubo Wang authored Apr 07, 2026
```
Signed-off-by: Yubo Wang <yubowang2019@gmail.com>
```
- [CI][Bugfix][AMD][ Ensure weights created when using emulating OCP MXFP4 (#36993) · 83d09d36
  rasmith authored Apr 07, 2026
```
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
```
- [XPU] Quick fix for TritonMLA to remove cuda hardcode (#39088) · 92b9afee
  Chendi.Xue authored Apr 07, 2026
```
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
```
- [Bugfix] Fix marlin nvfp4 rescaling (#37502) · 73105554
  Jinzhen Lin authored Apr 07, 2026
```
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
```
- [Bugfix][Quantization] Fix PerTensorScale loading with tuple shard_id in... · 98e1a43a
  kkyyxhll authored Apr 07, 2026
```
[Bugfix][Quantization] Fix PerTensorScale loading with tuple shard_id in MergedColumnParallelLinear (#38517)
Signed-off-by: loukang <loukang@xiaohongshu.com>
```
- [Bug] Fix Trtllm Fp8 MoE Weight Shuffle Memory Fragamentation (#39054) · 0be9516e
  Wei Zhao authored Apr 07, 2026
```
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
```
- [vLLM IR] rework gemma_rms_norm (#39014) · 8060bb03
  Jiangyun Zhu authored Apr 07, 2026
```
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
```
- [Model] Use AutoWeightsLoader for FalconH1 (#39092) · da4c0e4d
  Rishapveer Singh authored Apr 07, 2026
```
Signed-off-by: Rishapveer Singh <215205492+rishaps@users.noreply.github.com>
```
- nano-nemotron-vl: get_mm_max_tokens_per_item for audio, video, image == seq_len (#38727) · a9a0e055
  Netanel Haber authored Apr 07, 2026
```
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
```
- [Kernels][MoE] Fix legacy_routing to use bitmatrix-based routing path (#38504) · 2df2c85b
  Andreas Karatzas authored Apr 06, 2026
```
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
```
- [MoE Refactor] Split up compressed_tensors_moe.py (#38960) · b2b2c523
  bnellnm authored Apr 06, 2026
```
Signed-off-by: Bill Nell <bnell@redhat.com>
```
06 Apr, 2026 2 commits
- [NVFP4] Support NVFP4 dense models from `modelopt` and `compressed-tensors` on... · 00d7b497
  fxmarty-amd authored Apr 07, 2026
  [NVFP4] Support NVFP4 dense models from `modelopt` and `compressed-tensors` on AMD Instinct MI300, MI355X and Hopper through emulation (#35733)
  Signed-off-by: Felix Marty <Felix.Marty@amd.com>
  Signed-off-by: fxmarty-amd <felmarty@amd.com>
  Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
- NemotronH default mamba_ssm_cache_dtype=float32; enable auto-hook for... · dfa5062a
  Netanel Haber authored Apr 06, 2026
  NemotronH default mamba_ssm_cache_dtype=float32; enable auto-hook for NemotronHNanoVLV2Config (#39032)
  Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>