- 26 Mar, 2026 1 commit
-
-
laibao authored
feat(v1 attention): integrate the unified KV layout into ROCm FlashAttention and pass through mm_prefix, qq_bias, and use_alibi_sqrt. Add unified KV layout selection logic to the ROCm FlashAttention backend; wire up the unified varlen kernel call path; add mm_prefix_range and qq_bias pass-through to the FlashAttention metadata.
-
- 12 Mar, 2026 2 commits
- 09 Mar, 2026 1 commit
-
-
zhangshao authored
-
- 06 Mar, 2026 1 commit
-
-
王敏 authored
-
- 03 Mar, 2026 1 commit
-
-
zhuwenwen authored
-
- 30 Jan, 2026 1 commit
-
-
zhuwenwen authored
add prepare_so_files to prepare so
-
- 24 Jan, 2026 1 commit
-
-
ElizaWszola authored
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
-
- 22 Jan, 2026 1 commit
-
-
Eldar Kurtić authored
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>
-
- 18 Jan, 2026 1 commit
-
-
Li Xie authored
Signed-off-by: xieli <xieli@stepfun.com>
-
- 16 Jan, 2026 1 commit
-
-
zhuwenwen authored
Distinguish between PCIe and hglink custom allreduce usage; vllm: export VLLM_CUSTOM_CACHE=1; dtk: export HIP_KERNEL_EVENT_SYSTENFENCE=1; set VLLM_USE_FUSED_RMS_ROPE=1; add SUPPORT_MOE_MARLIN_W16A16 to use moe marlin on bw; support fa kvcache fp8 (todo: add VLLM_USE_QUERY_QUANT to not use q quant); update moe_align_block_size
-
- 15 Jan, 2026 1 commit
-
-
Li Wang authored
Signed-off-by: wangli <wangli858794774@gmail.com>
-
- 09 Jan, 2026 1 commit
-
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
- 02 Jan, 2026 1 commit
-
-
Andreas Karatzas authored
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
-
- 30 Dec, 2025 1 commit
-
-
yt0428 authored
Signed-off-by: yuantao <2422264527@qq.com>
Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-
- 28 Dec, 2025 1 commit
-
-
Boyuan Feng authored
Signed-off-by: Boyuan Feng <boyuan@meta.com>
-
- 18 Dec, 2025 1 commit
-
-
Isotr0py authored
[MM Encoder]: Migrate legacy ViT `MultiHeadAttention` to new `MMEncoderAttention` interface (#30684)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
-
- 15 Dec, 2025 2 commits
-
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
Shanshan Shen authored
[CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. (#30125)
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
-
- 13 Dec, 2025 1 commit
-
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
- 09 Dec, 2025 2 commits
-
-
rasmith authored
[CI/Build] Make test_mha_attn.py run on correct platform only and check for flash_attn_varlen_func in layer.py (#29145)
-
Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
- 07 Dec, 2025 1 commit
-
-
Isotr0py authored
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
-
- 05 Dec, 2025 1 commit
-
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
-
- 28 Nov, 2025 1 commit
-
-
Mingyuan Ma authored
Signed-off-by: mingyuanm <mingyuanm@nvidia.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
-
- 26 Nov, 2025 1 commit
-
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
- 24 Nov, 2025 1 commit
-
-
Roger Wang authored
Signed-off-by: Roger Wang <hey@rogerw.io>
-
- 18 Nov, 2025 1 commit
-
-
Song Zhixin authored
Signed-off-by: jesse <szxfml@gmail.com>
Signed-off-by: Song Zhixin <szxfml@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
- 13 Nov, 2025 2 commits
-
-
Huamin Li authored
Signed-off-by: Huamin Li <3ericli@gmail.com>
-
zhuwenwen authored
set default_max_num_batched_tokens = 10240; update qwen3_moe of layernorm; off lightop of moe_fused_gate
-
- 12 Nov, 2025 1 commit
-
-
Nicolò Lucchesi authored
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
-
- 11 Nov, 2025 2 commits
-
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
-
David Ben-David authored
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
-
- 10 Nov, 2025 1 commit
-
-
Adrian Abeyta authored
Signed-off-by: adabeyta <aabeyta@redhat.com>
-
- 01 Nov, 2025 1 commit
-
-
Yan Ma authored
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yejing Lai <yejing.lai@intel.com>
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
-
- 31 Oct, 2025 1 commit
-
-
zhuwenwen authored
-
- 26 Oct, 2025 1 commit
-
-
JartX authored
[BUGFIX][ROCM] ViT FlashAttention on ROCm (no GFX9) and contiguous on qwen3vl ROCm TORCH_SDPA (#27190)
Signed-off-by: JartX <sagformas@epdcenter.es>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
-
- 25 Oct, 2025 1 commit
-
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
- 23 Oct, 2025 1 commit
-
-
Bradley D authored
Co-authored-by: Bradley D <4551889+bradleyhd@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
-
- 21 Oct, 2025 1 commit
-
-
Roger Wang authored
Signed-off-by: Roger Wang <hey@rogerw.io>
-