Commits · 2f1c19b2456d4fb15f3475c9db5b077777feab76 · OpenDAS / vllm_cscc

12 Jun, 2025 1 commit
- [CI] change spell checker from codespell to typos (#18711) · 2f1c19b2
  Ning Xie authored Jun 12, 2025
```
Signed-off-by: Andy Xie <andy.xning@gmail.com>
```
  2f1c19b2
11 Jun, 2025 2 commits
- [AMD] [Quantization] Add override flag for attention dtype instead of using... · c7ea0b56
  rasmith authored Jun 11, 2025
```
[AMD] [Quantization] Add override flag for attention dtype instead of using kv_cache_dtype trigger (#17331)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
```
  c7ea0b56
- [Misc] Fix misleading ROCm warning (#19486) · 04a55612
  Jee Jee Li authored Jun 12, 2025
```
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
```
  04a55612
04 Jun, 2025 1 commit
- [CPU] V1 support for the CPU backend (#16441) · 4555143e
  Li, Jiang authored Jun 04, 2025
  
  4555143e
03 Jun, 2025 2 commits
- [V1] Support cross-layer KV sharing (#18212) · bdf13965
  Yong Hoon Shin authored Jun 03, 2025
```
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
```
  bdf13965
- [Misc] Add SPDX-FileCopyrightText (#19100) · 02f0c7b2
  Simon Mo authored Jun 03, 2025
```
Signed-off-by: simon-mo <simon.mo@hey.com>
```
  02f0c7b2
30 May, 2025 1 commit
- [ROCm] Remove unnecessary assertion of max_model_len in ROCM_AITER_MLA attention backend. (#18938) · 77b6e74f
  vllmellm authored May 30, 2025
```
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
```
  77b6e74f
29 May, 2025 2 commits
- [ROCm][V0][Attention] Revert to the previous FA triton kernel (#18226) · 1b7cfd5a
  Gregory Shtrasberg authored May 29, 2025
```
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
```
  1b7cfd5a
- [Attention][V1] Toggle for v1 attention backend (#18275) · da4b69d0
  Gregory Shtrasberg authored May 29, 2025
```
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
```
  da4b69d0
28 May, 2025 2 commits
- [Bugfix][ROCm] fix the power of 2 exception from triton_unified_attention.py... · 269d9017
  Hongxia Yang authored May 28, 2025
```
[Bugfix][ROCm] fix the power of 2 exception from triton_unified_attention.py when running llama4 models and unit test fix (#18100)
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
```
  269d9017
- [BugFix] FA2 MLA Accuracy Issue (#18807) · ce75efee
  Lucas Wilkinson authored May 28, 2025
```
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>
```
  ce75efee
21 May, 2025 1 commit
- [ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (#17004) · dd5fa7e0
  Hosang authored May 21, 2025
```
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
```
  dd5fa7e0
20 May, 2025 1 commit
- [Kernel] update comment for KV shape in unified triton attn (#18099) · 980a1724
  Percy authored May 20, 2025
```
Signed-off-by: haochengxia <xhc_1007@163.com>
```
  980a1724
15 May, 2025 2 commits
- [Kernel] [V1] Fix performance regression for triton unified attention (#18161) · 01c22335
  Thomas Parnell authored May 15, 2025
```
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
```
  01c22335
- [Bugfix] Fix fp8 tests for triton_unified_attention for Triton 3.3 (#18013) · e6b8e65d
  Thomas Parnell authored May 15, 2025
```
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
```
  e6b8e65d
14 May, 2025 2 commits
- [BugFix][AMD] Compatible patch for AITER lib after 04/20 (#17912) · 4f8b3732
  qli88 authored May 14, 2025
```
Signed-off-by: Qiang Li <qiang.li2@amd.com>
```
  4f8b3732
- [Fix] Support CUDAGraph capture for encoder-decoder on ROCm (#18104) · 176a95c6
  Luka Govedič authored May 13, 2025
```
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
```
  176a95c6
13 May, 2025 1 commit
- Implements dual-chunk-flash-attn backend for dual chunk attention with sparse... · 60f76243
  Tao He authored May 13, 2025
```
Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844)
```
  60f76243
11 May, 2025 2 commits
- [FP8][ROCm][Attention] Enable FP8 KV cache on ROCm for V1 (#17870) · 06c0922a
  Gregory Shtrasberg authored May 11, 2025
```
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
```
  06c0922a
- fix amd triton mla path (#17871) · eea22a56
  Shiyan Deng authored May 11, 2025
  
  eea22a56
09 May, 2025 4 commits
- Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" (#17910) · 85b72cb7
  Michael Goin authored May 09, 2025
  
  85b72cb7
- [BugFix][AMD] Compatible patch for latest AITER(05/07/2025) (#17864) · 9f64e934
  qli88 authored May 09, 2025
```
Signed-off-by: Qiang Li <qiang.li2@amd.com>
```
  9f64e934
- [Attention] MLA move rotary embedding to cuda-graph region (#17668) · 5e6f9394
  Lucas Wilkinson authored May 08, 2025
```
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
```
  5e6f9394
- [FEAT][ROCm]: Support AITER MLA on V1 Engine (#17523) · 3c9396a6
  vllmellm authored May 09, 2025
```
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: qli88 <qiang.li2@amd.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
```
  3c9396a6
08 May, 2025 1 commit
- [Hardware][Intel-Gaudi] Support Automatic Prefix Caching on HPU (#17648) · 843b2227
  Agata Dobrzyniewicz authored May 08, 2025
```
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
```
  843b2227
07 May, 2025 1 commit
- [TPU] Add kernel test for moe_pallas (#17496) · e50a1f1a
  Michael Goin authored May 06, 2025
```
Signed-off-by: Michael Goin <mgoin64@gmail.com>
```
  e50a1f1a
06 May, 2025 3 commits
- [Kernel] Unified Triton kernel that doesn't distinguish between prefill + decode (#16828) · 2f925e57
  Thomas Parnell authored May 06, 2025
```
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
```
  2f925e57
- [v1] AttentionMetadata for each layer (#17394) · cba31c47
  Chen Zhang authored May 06, 2025
```
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
```
  cba31c47
- [Bugfix] Fix triton import with local TritonPlaceholder (#17446) · f9bc5a06
  Mengqing Cao authored May 06, 2025
```
Signed-off-by: Mengqing Cao <cmq0113@163.com>
```
  f9bc5a06
04 May, 2025 1 commit
- Add full API docs and improve the UX of navigating them (#17485) · d6484ef3
  Harry Mellor authored May 04, 2025
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
  d6484ef3
02 May, 2025 3 commits
- [Bugfix] fix tmp_out and exp_sums dimensions (#17438) · 4c33d673
  Hui Liu authored May 02, 2025
```
Signed-off-by: Hui Liu <96135754+hliuca@users.noreply.github.com>
```
  4c33d673
- [Core] [Bugfix] Add Input Embeddings (#15428) · cc2a77d7
  Andrew Sansom authored May 02, 2025
```
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: 临景 <linjing.yx@alibaba-inc.com>
Co-authored-by: Bryce1010 <bryceyx@gmail.com>
Co-authored-by: Nan2018 <nan@protopia.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  cc2a77d7
- [Attention] MLA move o_proj q_proj into cuda-graph region (#17484) · afcb3f88
  Lucas Wilkinson authored May 01, 2025
```
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
```
  afcb3f88
01 May, 2025 2 commits
- [ROCm] remove unsupported archs from rocm triton flash-attention supported list (#17536) · 28566d73
  Hongxia Yang authored May 01, 2025
```
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
```
  28566d73
- [BugFix] Fix mla cpu - missing 3 required positional arguments (#17494) · 3c3d7672
  Lucas Wilkinson authored May 01, 2025
```
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
```
  3c3d7672
30 Apr, 2025 2 commits
- [Hardware][Intel GPU] Upgrade to torch 2.7 (#17444) · ed6cfb90
  Kunshang Ji authored Apr 30, 2025
```
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Qiming Zhang <qiming1.zhang@intel.com>
```
  ed6cfb90
- Update PyTorch to 2.7.0 (#16859) · 2c4f59af
  Huy Do authored Apr 29, 2025
  
  2c4f59af
28 Apr, 2025 1 commit
- [BugFix] Fix vllm_flash_attn install issues (#17267) · d8bccde6
  Lucas Wilkinson authored Apr 27, 2025
```
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
```
  d8bccde6
27 Apr, 2025 2 commits
- [Bugfix] Get a specific type of layer from forward context (#17222) · 838cedad
  Chen Zhang authored Apr 27, 2025
```
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
```
  838cedad
- [Kernel][Triton][FP8] Adding fp8 and variable length sequence support to... · 8e4b351a
  rasmith authored Apr 26, 2025
```
[Kernel][Triton][FP8] Adding fp8 and variable length sequence support to Triton FAv2 kernel (#12591)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
```
  8e4b351a