Commits · 4e4db0b411c8358e0e6f0fd94478d30ca1e61ecd · OpenDAS / vllm_cscc

03 Feb, 2026 1 commit
- [feat] 1. Adapt ds w8a8 and MTP; 2. Add relaxed MTP; 3. Adapt w8a8 DEEPEP; 4. Fix ds 671B accuracy anomaly · fd1b3940
  王敏 authored Feb 03, 2026
  
  fd1b3940
16 Jan, 2026 1 commit
- Switch default w8a8 gemm impl to blaslt. · f06d1125
  zhuwenwen authored Jan 16, 2026
```
fix _forward_encoder_attention
remove medusa
set VLLM_PCIE_USE_CUSTOM_ALLREDUCE=1
```
  f06d1125
12 Jan, 2026 1 commit
- [3/N][Attention] Move AttentionMetadata-related code from utils.py to backend.py (#32054) · 20228cb8
  Matthew Bonanni authored Jan 12, 2026
```
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
```
  20228cb8
09 Jan, 2026 1 commit
- [1/N][Attention] Restructure attention: move files (#31916) · 2612ba92
  Matthew Bonanni authored Jan 09, 2026
```
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
```
  2612ba92
07 Jan, 2026 4 commits
- [Chore] Migrate V0 attention utils (#31891) · b665bbc2
  Cyrus Leung authored Jan 07, 2026
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  b665bbc2
- [ROCm][AITER] fix wrong argument passed to AITER `flash_attn_varlen_func` (#31880) · 41cfa506
  vllmellm authored Jan 07, 2026
```
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
```
  41cfa506
- [ROCm][AITER] bugfix accuracy regression in ROCM_AITER_TRITON_MLA backend (#31816) · 6409004b
  vllmellm authored Jan 07, 2026
```
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
```
  6409004b
- fixed mypy warnings for files vllm/v1/attention with TEMPORARY workaround (#31465) · 0a2c2dc3
  Jack Yang authored Jan 06, 2026
```
Signed-off-by: Zhuohao Yang <zy242@cornell.edu>
Co-authored-by: Zhuohao Yang <zy242@cornell.edu>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
```
  0a2c2dc3
06 Jan, 2026 3 commits

- [Attention][1/n] Remove usage of deprecated `seq_lens_cpu` and `num_computed_tokens_cpu` CommonAttentionMetadata properties (#31773) · e0327c9d
  Lucas Wilkinson authored Jan 06, 2026
```
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
```
  e0327c9d
- update lightop import · 05e8b083
  zhuwenwen authored Jan 06, 2026
  
  05e8b083
- fix weights_not_loaded · 451af742
  zhuwenwen authored Jan 06, 2026
```
update weights_not_loaded and flash_mla_with_kvcache
update paged_mqa_logits
```
  451af742
02 Jan, 2026 1 commit
- [Bugfix][Hardware][AMD] Fix last_page_len calculation in AITER MLA decode (#31282) · 825c2dc1
  Kevin McKay authored Jan 01, 2026
```
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
```
  825c2dc1
31 Dec, 2025 1 commit

- [Bug] Fix log issue with `\n` (#31390) · 357d435c
  Wentao Ye authored Dec 31, 2025
```
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
```
  357d435c
23 Dec, 2025 2 commits
- update flash_mla_with_kvcache · 9663a03f
  zhuwenwen authored Dec 23, 2025
  
  9663a03f
- Revert "[SM100] Enable fp8 compute for prefill MLA (#30746)" (#31197) · 3e102623
  Pavani Majety authored Dec 22, 2025
```
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
```
  3e102623
22 Dec, 2025 3 commits
- [Bug] Fix `'CutlassMLAImpl' object has no attribute '_workspace_buffer'` (#31173) · 5312a728
  Wentao Ye authored Dec 22, 2025
```
Signed-off-by: yewentao256 <zhyanwentao@126.com>
```
  5312a728
- [SpecDecode] Simplified alternative padded-speculation acceptance rate fix (#29845) · de717476
  Lucas Wilkinson authored Dec 22, 2025
```
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
```
  de717476
- [SM100] Enable fp8 compute for prefill MLA (#30746) · b10f41c8
  Pavani Majety authored Dec 22, 2025
```
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
```
  b10f41c8
18 Dec, 2025 1 commit
- remove unused code · c1c5e4f6
  zhuwenwen authored Dec 18, 2025
  
  c1c5e4f6
17 Dec, 2025 1 commit

- Synchronize the modifications from the 12th to the 17th: · b66c8e4b
  zhuwenwen authored Dec 17, 2025
```
Fix the w4a16 conflict in CompressedTensorsLinearMethod
feat(moe): add Marlin W16A16 fused MoE behind VLLM_USE_MARLIN_W16A16_MOE
replace the fp8_mqa_logits and fp8_paged_mqa_logits interfaces in deepgemm with mqa_logits and paged_mqa_logits from lightop
```
  b66c8e4b
16 Dec, 2025 1 commit
- [ROCm][MTP] Support MTP for AITER MLA backend (#28624) · 9dbbc59b
  Pleaplusone authored Dec 16, 2025
```
Signed-off-by: ganyi <ygan@amd.com>
```
  9dbbc59b
13 Dec, 2025 1 commit

- [Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484) · 4fa7ce46
  Roberto L. Castro authored Dec 13, 2025
```
Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
```
  4fa7ce46
12 Dec, 2025 1 commit
- [Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 (#27532) · 3e41992f
  Lucas Wilkinson authored Dec 12, 2025
```
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
```
  3e41992f
11 Dec, 2025 1 commit
- [perf] Use direct copy (broadcast) instead of cat for k_nope/k_pe in MLA prefill (#29710) · fba89069
  Ming Yang authored Dec 11, 2025
```
Signed-off-by: Ming Yang <minos.future@gmail.com>
```
  fba89069
09 Dec, 2025 1 commit
- [DCP][Bugfix][CI] Fix accuracy issue of DCP when using FLASH_ATTN_MLA (#30309) · 67475a6e
  Jaya Yuan authored Dec 09, 2025
```
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
```
  67475a6e
08 Dec, 2025 1 commit
- [Perf] Improve fp8 quant in mla; replace ReduceSum with ReduceScatterSum (#29795) · 1fb632fd
  Lain authored Dec 08, 2025
```
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
```
  1fb632fd
05 Dec, 2025 1 commit

- [Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315) · 66e674cd
  Matthew Bonanni authored Dec 05, 2025
```
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
```
  66e674cd
04 Dec, 2025 3 commits
- update MLACommonBaseImpl get_and_maybe_dequant_weights · b8412df6
  zhuwenwen authored Dec 04, 2025
  
  b8412df6
- Revert "update kv_b_proj_weight" · 7f1d5aff
  zhuwenwen authored Dec 04, 2025
```
This reverts commit 3b121add.
```
  7f1d5aff
- update kv_b_proj_weight · 3b121add
  zhuwenwen authored Dec 04, 2025
  
  3b121add
02 Dec, 2025 1 commit
- [fix] Fix some MTP startup errors · 26084d72
  王敏 authored Dec 02, 2025
  
  26084d72
30 Nov, 2025 1 commit
- Fix AttributeError about _use_fi_prefill (#29734) · 82c795d6
  Huamin Li authored Nov 29, 2025
```
Signed-off-by: Huamin Li <3ericli@gmail.com>
```
  82c795d6
28 Nov, 2025 1 commit
- bugfix: correct attn output with base 2 or e (#28840) · 9726e645
  Augusto Yao authored Nov 29, 2025
```
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
```
  9726e645
26 Nov, 2025 1 commit
- [Performance][MLA][ROCm] Remove redundant D2D copy in deepseek (#27457) · d9d342d2
  Pleaplusone authored Nov 26, 2025
```
Signed-off-by: ganyi <ygan@amd.com>
```
  d9d342d2
25 Nov, 2025 2 commits

- [ROCm][MLA] enable fp8 MLA decode on ROCm (#28032) · cb7214d8
  gbyu-amd authored Nov 25, 2025
```
Signed-off-by: guanbao <gyu@amd.com>
Signed-off-by: Guanbao Yu <gyu@amd.com>
Signed-off-by: gbyu-amd <Guanbao.Yu@amd.com>
Co-authored-by: guanbao <gyu@amd.com>
```
  cb7214d8
- [Perf][Deepseek] optimize gather_and_maybe_dequant_cache kernel's perf for extremely long sequence (#28029) · 77e10c9c
  Pleaplusone authored Nov 25, 2025
```
Signed-off-by: ganyi <ygan@amd.com>
```
  77e10c9c
22 Nov, 2025 1 commit
- [Attention] Refactor FA `block_size` limitations to hybrid models only (#29084) · 066209a0
  Nicolò Lucchesi authored Nov 22, 2025
```
Signed-off-by: NickLucche <nlucches@redhat.com>
```
  066209a0
20 Nov, 2025 3 commits
- [KVConnector][Core] Support cross-layer KV blocks (#27743) · 64746471
  Or Ozeri authored Nov 20, 2025
```
Signed-off-by: Or Ozeri <oro@il.ibm.com>
```
  64746471
- [ROCm] Add AMD GPU support on Deepseek v3.2 and SparseMLA (#26670) · 06c20c99
  Pleaplusone authored Nov 20, 2025
```
Signed-off-by: ganyi <ygan@amd.com>
```
  06c20c99
- [AMD] Use Decoupled Kernel Block Size to Support AITER MLA block_size=1 (#27715) · 3fb0d909
  Qiang Zhang authored Nov 20, 2025
```
Signed-off-by: chiangzhang <chiangzhang@tencent.com>
```
  3fb0d909