- 18 Mar, 2026 1 commit
laibao authored
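Add the environment variable VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD to toggle the fused sum+mul+add path. Add a fused path to DeepseekV2MoE that precomputes shared_output and passes iqis and routed_scaling_factor down. Extend the FusedMoE/SharedFusedMoE and related custom op interfaces so that i_q/i_s/shared_output/routed_scaling_factor are passed through uniformly. Adapt the Triton, Marlin W16A16, SlimQuant W4A8, CompressedTensors W8A8, and other implementations so that sum+mul+add can be completed inside the kernel.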
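As a rough illustration of what a fused sum+mul+add step computes, here is a minimal unfused reference sketch. It assumes the three operations are: summing the routed expert outputs, multiplying by routed_scaling_factor, and adding shared_output. The commit message does not spell this out, so treat the formula as an assumption. The helper names `moe_sum_mul_add` and `fused_path_enabled` are hypothetical, and so is the convention that the value "1" enables the switch.

```python
import os


def moe_sum_mul_add(expert_outputs, shared_output, routed_scaling_factor):
    """Unfused reference: sum over experts, scale, then add the shared output.

    expert_outputs: list of per-expert vectors (already weighted by the router).
    shared_output:  vector from the shared expert, same length as each expert vector.
    """
    result = []
    for j in range(len(shared_output)):
        routed = sum(vec[j] for vec in expert_outputs)                    # sum
        result.append(routed * routed_scaling_factor + shared_output[j])  # mul + add
    return result


def fused_path_enabled(env=None):
    """Hypothetical switch check; assumes "1" means enabled and unset means off."""
    env = os.environ if env is None else env
    return env.get("VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD", "0") == "1"


if __name__ == "__main__":
    experts = [[1.0, 2.0], [3.0, 4.0]]
    shared = [0.5, 0.5]
    print(moe_sum_mul_add(experts, shared, 2.0))
```

A fused kernel would produce the same values in a single pass instead of three separate elementwise steps, which is presumably why shared_output is precomputed and handed to the kernel.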
- 17 Mar, 2026 1 commit
caihl authored
- 16 Mar, 2026 3 commits
- 15 Mar, 2026 1 commit
fanwl authored
- Add the VLLM_V1_USE_FA_UNIFIED_ATTN_2D environment variable (0: Triton attention, 1: FA unified attention)
- 13 Mar, 2026 3 commits
- 12 Mar, 2026 8 commits
- 11 Mar, 2026 3 commits
- 10 Mar, 2026 2 commits
- 09 Mar, 2026 1 commit
yangql authored
- 07 Mar, 2026 1 commit
wanglong3 authored
- 06 Mar, 2026 3 commits
- 05 Mar, 2026 3 commits
- 04 Mar, 2026 1 commit
liuchy5 authored
- 03 Mar, 2026 1 commit
zhuwenwen authored
- 02 Mar, 2026 2 commits
- 28 Feb, 2026 1 commit
yangql1 authored
- 25 Feb, 2026 1 commit
SAC_fanth authored
- 24 Feb, 2026 2 commits
- 19 Feb, 2026 1 commit
王敏 authored
- 16 Feb, 2026 1 commit
Rayyyyy authored