Commits · 8510c10c99e68ba142e4331d9a9a8777921ad910 · OpenDAS / vllm_cscc

10 Feb, 2026 1 commit
- feat: implement FP8 blockwise GEMM with hipblaslt · 8510c10c
  lixh authored Feb 09, 2026
05 Feb, 2026 1 commit
- Merge tag 'v0.15.1' into v0.15.1-dev · 45a060d6
  zhuwenwen authored Feb 05, 2026
04 Feb, 2026 13 commits
- fix load error · 99fc9fc3
  zhuwenwen authored Feb 04, 2026
- [perf] use optimized topk_softmax + renormalize (lightop) · e9e95d0f
  zhuwenwen authored Feb 04, 2026
- [perf] update op.moe_fused_gate · 06e16a27
  zhuwenwen authored Feb 04, 2026
- update VLLM_USE_OPT_RESHAPE_AND_CACHE to support bf16 and qwen3-dense · 263f45a4
  zhuwenwen authored Feb 04, 2026
- [perf] add VLLM_USE_FLASH_ATTN_FP8 to use fa fp8 attention · ac28ab22
  zhuwenwen authored Feb 04, 2026
- [perf] add VLLM_USE_FUSED_FILL_RMS_CAT to use lightop for dpsk mtp fill + rms*2 + cat · 5fe03549
  zhuwenwen authored Feb 04, 2026
- skip SPLIT_K · b8c7ba0a
  zhuwenwen authored Feb 04, 2026
- remove VLLM_USE_OPT_MOE_SUM · 2703e2e9
  zhuwenwen authored Feb 04, 2026
- update mla interface · 1cb851b0
  zhuwenwen authored Feb 04, 2026
- skip AiterInt8ScaledMMLinearKernel · 4599e05f
  zhuwenwen authored Feb 04, 2026
- [BugFix][Spec Decoding] Fix negative accepted tokens metric crash (#33729) · 1892993b
  Nick Hill authored Feb 03, 2026
```
Signed-off-by: Nick Hill <nickhill123@gmail.com>
```
- cherry pick · 7d98f09b
  Michael Goin authored Feb 03, 2026
```
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
```
- [Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM... · daa2784b
  Michael Goin authored Feb 03, 2026
```
[Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE (#33620)
Signed-off-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit e346e2d0)
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
```
03 Feb, 2026 13 commits
- skip cupy · f509adcb
  zhuwenwen authored Feb 03, 2026
- update triton version · a3363dca
  zhuwenwen authored Feb 03, 2026
- add cupy · 1d73cdad
  zhuwenwen authored Feb 03, 2026
- __syncwarp isn't defined · 76855bdc
  zhuwenwen authored Feb 03, 2026
- skip aiter · e5f2ff72
  zhuwenwen authored Feb 03, 2026
- add moe_fused_gate · 0386844b
  zhuwenwen authored Feb 03, 2026
- [torch.compile] Don't do the fast moe cold start optimization if there is... · e4bf6ed9
  Richard Zou authored Feb 02, 2026
```
[torch.compile] Don't do the fast moe cold start optimization if there is speculative decoding (#33624)
Signed-off-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
(cherry picked from commit 5eac9a1b)
```
- [torch.compile] Speed up MOE handling in forward_context (#33184) · 611b1875
  Richard Zou authored Jan 27, 2026
```
Signed-off-by: Richard Zou <zou3519@gmail.com>
(cherry picked from commit d9aa39a3)
```
- [Misc][Build] Lazy load cv2 in nemotron_parse.py (#33189) · eec3546b
  Kiersten Stokes authored Jan 29, 2026
```
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
(cherry picked from commit 9e138cb0)
```
- Patch Protobuf for CVE 2026-0994 (#33619) · 7c023baf
  zaristei2 authored Feb 03, 2026
```
Signed-off-by: Zachary Aristei <zaristei@nvidia.com>
Co-authored-by: Zachary Aristei <zaristei@nvidia.com>
```
- Patch aiohttp for CVE-2025-69223 (#33621) · 099a787e
  zaristei2 authored Feb 03, 2026
```
Signed-off-by: Zachary Aristei <zaristei@nvidia.com>
Co-authored-by: Zachary Aristei <zaristei@nvidia.com>
```
- fix run error · b31c7251
  zhuwenwen authored Feb 03, 2026
- [Release] Fix format and cherry-pick (#33618) · 31a64c63
  Zhewen Li authored Feb 02, 2026
```
Signed-off-by: zhewenli <zhewen@inferact.ai>
Co-authored-by: zhewenli <zhewen@inferact.ai>
```
02 Feb, 2026 12 commits
- [Release] patch step3p5 attention class in v0.15.1 release (#33602) · 57eae2f8
  Zhewen Li authored Feb 02, 2026
```
Signed-off-by: zhewenli <zhewen@inferact.ai>
Co-authored-by: zhewenli <zhewen@inferact.ai>
```
- [Fix] prefix cache hit rate == 0 bug with gpt-oss style models (#33524) · f0d00586
  Yifan Qiao authored Feb 01, 2026
```
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
(cherry picked from commit a01ef3fa)
```
- [Nightly CI] Remove CT Model (#33530) · 94cbe0a3
  Robert Shaw authored Feb 01, 2026
```
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
(cherry picked from commit 318b1207)
```
- [Models] Step-3.5-Flash (#33523) · 8b45c58f
  csy0225 authored Feb 02, 2026
```
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: i-zhangmingming <i-zhangmingming@stepfun.com>
Co-authored-by: xiewuxun <xiewuxun@stepfun.com>
Co-authored-by: zetaohong <i-hongzetao@stepfun.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
(cherry picked from commit c3b40dc3)
```
- pin LMCache to v0.3.9 or greater with vLLM v0.15.0 (#33440) · c7039a80
  Greg Pereira authored Jan 31, 2026
```
Signed-off-by: greg pereira <grpereir@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
(cherry picked from commit d6416fdd)
```
- fix: Add SM120 (RTX Blackwell) support for FlashInfer CUTLASS NVFP4 MoE kernels (#33417) · 15ebd0ce
  René Honig authored Jan 31, 2026
```
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit 07978117)
```
- [fix][torch.compile] Fix cold-start compilation time increase by adding kv... · 29152683
  Luka Govedič authored Jan 31, 2026
```
[fix][torch.compile] Fix cold-start compilation time increase by adding kv cache update to splitting ops (#33441)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Richard Zou <zou3519@gmail.com>
(cherry picked from commit 15f40b20)
```
- [BugFix] Fix whisper FA2 + full cudagraphs (#33360) · d984d664
  Lucas Wilkinson authored Jan 30, 2026
```
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
(cherry picked from commit 0a3c71e7)
```
- [Bugfix][ROCm] Fixing the skinny gemm dispatch logic from #32831 (#33366) · 5f45b0b7
  Gregory Shtrasberg authored Jan 30, 2026
```
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
(cherry picked from commit 31aedfe7)
```
- [release] Minor fixes to release annotation and wheel upload (#33129) · a2dba556
  Kevin H. Luu authored Jan 29, 2026
```
Signed-off-by: khluu <khluu000@gmail.com>
(cherry picked from commit 2284461d)
```
- [Bugfix] Enable Triton MoE for FP8 per-tensor dynamic (#33300) · 6ff16b77
  Michael Goin authored Jan 29, 2026
```
Signed-off-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit bfb9bdaf)
```
- [Bugfix] Fix Qwen3-VL-Reranker load. (#33298) · 1ed963d4
  wang.yuqi authored Jan 29, 2026
```
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit abb34ac4)
```