- 16 Jan, 2026 2 commits
-
-
zhuwenwen authored
Distinguish PCIe and hglink custom allreduce usage; vllm: export VLLM_CUSTOM_CACHE=1; dtk: export HIP_KERNEL_EVENT_SYSTENFENCE=1; set VLLM_USE_FUSED_RMS_ROPE=1; add SUPPORT_MOE_MARLIN_W16A16 to use MoE Marlin on bw; support FA KV cache FP8 (todo: add VLLM_USE_QUERY_QUANT to not use q quant); update moe_align_block_size
-
zhuwenwen authored
fix _forward_encoder_attention; remove medusa; set VLLM_PCIE_USE_CUSTOM_ALLREDUCE=1
-
- 07 Jan, 2026 1 commit
-
-
zhuwenwen authored
-
- 19 Dec, 2025 1 commit
-
-
zhuwenwen authored
-
- 18 Dec, 2025 1 commit
-
-
zhuwenwen authored
-
- 17 Dec, 2025 1 commit
-
-
zhuwenwen authored
Fix the w4a16 conflict in CompressedTensorsLinearMethod; feat(moe): add Marlin W16A16 fused MoE behind VLLM_USE_MARLIN_W16A16_MOE; replace the fp8_mqa_logits and fp8_paged_mqa_logits interfaces in deepgemm with mqa_logits and paged_mqa_logits from lightop
-
- 13 Dec, 2025 1 commit
-
-
zhuwenwen authored
-
- 12 Dec, 2025 1 commit
-
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
-
- 11 Dec, 2025 2 commits
-
-
Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 10 Dec, 2025 1 commit
-
-
Jialin Ouyang authored
[Perf] Enable environment cache in EngineCore to enable the feature for UniProcExecutor as well (#29289) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
-
- 09 Dec, 2025 3 commits
-
-
Benjamin Chislett authored
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Wentao Ye authored
[Compile] Fix torch warning `TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled` (#29897) Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
Ming Yang authored
Signed-off-by: Ming Yang <minos.future@gmail.com>
-
- 04 Dec, 2025 1 commit
-
-
dtc authored
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: dtc <dtcccc@linux.alibaba.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
-
- 03 Dec, 2025 4 commits
-
-
Shengqi Chen authored
[CI] fix docker image build by specifying merge-base commit id when downloading pre-compiled wheels (#29930) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
-
Elizabeth Thomas authored
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Jane Xu <janeyx@meta.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Johnny Yang <johnnyyang@google.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: bruceszchen <bruceszchen@tencent.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Johnny Yang <24908445+jcyang43@users.noreply.github.com>
-
Amr Mahdi authored
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
-
zhuwenwen authored
add VLLM_USE_OPT_RESHAPE_AND_CACHE, VLLM_USE_FUSE_SILU_AND_MUL and VLLM_USE_TOPK_RENORM for qwen3-30b
-
- 02 Dec, 2025 3 commits
-
-
Andrew Xia authored
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
-
zhuwenwen authored
-
Shengqi Chen authored
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
-
- 01 Dec, 2025 4 commits
-
-
Kevin H. Luu authored
-
Shengqi Chen authored
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
-
Yifei Zhang authored
Signed-off-by: Yifei Zhang <yifei.zhang1992@outlook.com>
-
Shu Wang authored
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
-
- 29 Nov, 2025 1 commit
-
-
Jinzhen Lin authored
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
-
- 28 Nov, 2025 1 commit
-
-
杰兮 authored
Signed-off-by: zhyajie <yajizhan@amd.com>
Co-authored-by: zhyajie <yajizhan@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
-
- 26 Nov, 2025 1 commit
-
-
zhuwenwen authored
-
- 24 Nov, 2025 1 commit
-
-
Roger Wang authored
Signed-off-by: Roger Wang <hey@rogerw.io>
-
- 21 Nov, 2025 2 commits
-
-
Lucas Wilkinson authored
[BugFix] Make sure to allocate worst case MoE workspace during profile run in the DP + EP case (#27426) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
- 20 Nov, 2025 4 commits
-
-
zhuwenwen authored
update VLLM_USE_PD_SPLIT=0 (for dspk) and VLLM_USE_PD_SPLIT=1 (for others)
-
Benjamin Chislett authored
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
zhuwenwen authored
add VLLM_USE_PP_SYNC to use pp sync; update rmsnorm for qwen3
-
Nick Hill authored
Signed-off-by: Nick Hill <nhill@redhat.com>
-
- 19 Nov, 2025 4 commits
-
-
Shu Wang authored
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
-
vnadathur authored
Signed-off-by: vnadathur <glvikramn@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com>
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com>
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
-
Didier Durand authored
Signed-off-by: Didier Durand <durand.didier@gmail.com>
-
Li, Jiang authored
Signed-off-by: jiang1.li <jiang1.li@intel.com>
-