- 26 Mar, 2026 1 commit
-
-
laibao authored
feat(v1 attention): integrate the unified KV layout into ROCm FlashAttention and pass through mm_prefix, qq_bias, and use_alibi_sqrt. Add unified KV layout selection logic to the ROCm FlashAttention backend. Wire in the unified varlen kernel call path. Add mm_prefix_range and qq_bias pass-through to the FlashAttention metadata.
-
- 06 Feb, 2026 1 commit
-
-
zhuwenwen authored
Restrict fp8_e4m3 support to nmz and support fp8 for q and KV cache; set VLLM_PCIE_USE_CUSTOM_ALLREDUCE=1
-
- 04 Feb, 2026 1 commit
-
-
zhuwenwen authored
-
- 02 Feb, 2026 1 commit
-
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
(cherry picked from commit 0a3c71e7)
-
- 30 Jan, 2026 1 commit
-
-
zhuwenwen authored
Add prepare_so_files to prepare .so files
-
- 29 Jan, 2026 1 commit
-
-
zhuwenwen authored
FlashMLASchedMeta is not supported
-
- 28 Jan, 2026 1 commit
-
-
Nicolò Lucchesi authored
Signed-off-by: NickLucche <nlucches@redhat.com>
(cherry picked from commit 1f3a2c29)
-
- 24 Jan, 2026 1 commit
-
-
ElizaWszola authored
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
-
- 22 Jan, 2026 2 commits
-
-
Eldar Kurtić authored
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
- 21 Jan, 2026 1 commit
-
-
zhuwenwen authored
-
- 16 Jan, 2026 2 commits
-
-
zhuwenwen authored
Distinguish between PCIe and hglink custom allreduce usage; vllm: export VLLM_CUSTOM_CACHE=1; dtk: export HIP_KERNEL_EVENT_SYSTENFENCE=1; set VLLM_USE_FUSED_RMS_ROPE=1; add SUPPORT_MOE_MARLIN_W16A16 to use moe marlin on bw; support FA KV cache fp8 (todo: add VLLM_USE_QUERY_QUANT to not use q quant); update moe_align_block_size
-
zhuwenwen authored
Fix _forward_encoder_attention; remove medusa; set VLLM_PCIE_USE_CUSTOM_ALLREDUCE=1
-
- 12 Jan, 2026 1 commit
-
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
- 09 Jan, 2026 1 commit
-
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
- 07 Jan, 2026 1 commit
-
-
Jack Yang authored
Signed-off-by: Zhuohao Yang <zy242@cornell.edu>
Co-authored-by: Zhuohao Yang <zy242@cornell.edu>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
-
- 05 Jan, 2026 1 commit
-
-
zhuwenwen authored
-
- 25 Dec, 2025 1 commit
-
-
zhuwenwen authored
-
- 16 Dec, 2025 1 commit
-
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>
-
- 05 Dec, 2025 1 commit
-
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
-
- 03 Dec, 2025 1 commit
-
-
zhuwenwen authored
Add VLLM_USE_OPT_RESHAPE_AND_CACHE, VLLM_USE_FUSE_SILU_AND_MUL, and VLLM_USE_TOPK_RENORM for qwen3-30b
-
- 27 Nov, 2025 2 commits
-
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
Matthew Bonanni authored
[Attention][Async] Eliminate `seq_lens_cpu` in FlashAttention metadata building with DCP > 1 (#29449)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
- 26 Nov, 2025 1 commit
-
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
- 22 Nov, 2025 1 commit
-
-
Nicolò Lucchesi authored
Signed-off-by: NickLucche <nlucches@redhat.com>
-
- 20 Nov, 2025 1 commit
-
-
Or Ozeri authored
Signed-off-by: Or Ozeri <oro@il.ibm.com>
-
- 19 Nov, 2025 2 commits
-
-
Qiu authored
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com>
Co-authored-by: LookAround <lixushi@huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
- 14 Nov, 2025 1 commit
-
-
Lucas Wilkinson authored
-
- 13 Nov, 2025 2 commits
-
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
Huamin Li authored
Signed-off-by: Huamin Li <3ericli@gmail.com>
-
- 12 Nov, 2025 1 commit
-
-
Benjamin Chislett authored
[Perf] Refactor cudagraph_support to enable full CUDA graphs for spec decoding with FlashInfer (#28479)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
-
- 11 Nov, 2025 1 commit
-
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
-
- 10 Nov, 2025 1 commit
-
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
-
- 08 Nov, 2025 2 commits
-
-
zhangsicheng5 authored
Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com>
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: Qiu <qiuchunshuo@huawei.com>
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>
-
22quinn authored
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
-
- 05 Nov, 2025 1 commit
-
-
Kunshang Ji authored
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
-
- 03 Nov, 2025 1 commit
-
-
Thomas Parnell authored
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
-
- 26 Oct, 2025 1 commit
-
-
Yeshwanth N authored
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com>
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com>
Signed-off-by: yeshsurya <yeshsurya@gmail.com>
-
- 24 Oct, 2025 1 commit
-
-
fhl2000 authored
Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-