- 08 Apr, 2026 1 commit
-
-
wujl5 authored
-
- 02 Apr, 2026 1 commit
-
-
yangql authored
-
- 28 Mar, 2026 1 commit
-
-
wanglong3 authored
-
- 27 Mar, 2026 1 commit
-
-
laibao authored
-
- 19 Mar, 2026 1 commit
-
-
laibao authored
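Remove the state mutation of experts.use_overlapped/_shared_experts in forward, so the shared/non-shared paths no longer diverge during torch.compile startup. FusedMoE.forward_impl now computes shared experts only when shared_output is empty, so a passed-through value is not overwritten by a local recomputation.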
-
- 18 Mar, 2026 4 commits
-
-
guanyu1 authored
-
guanyu1 authored
-
laibao authored
Add the environment variable VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD to toggle fused sum+mul+add. Add a fused path in DeepseekV2MoE that precomputes shared_output and passes iqis and routed_scaling_factor downstream. Extend the FusedMoE/SharedFusedMoE and related custom op interfaces to pass i_q/i_s/shared_output/routed_scaling_factor through uniformly. Adapt the Triton, Marlin W16A16, SlimQuant W4A8, and CompressedTensors W8A8 implementations so that sum+mul+add can be completed inside the kernel.
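For readers unfamiliar with the term, the following is a minimal pure-Python reference of what a "sum+mul+add" epilogue plausibly computes: sum the per-token top-k expert outputs, multiply by routed_scaling_factor, then add shared_output. The function name `sum_mul_add`, the nested-list layout, and the exact order of operations are assumptions for illustration; the commit message does not spell them out, and the real kernels operate on tensors.

```python
def sum_mul_add(expert_outputs, routed_scaling_factor, shared_output=None):
    """Unfused reference for a sum+mul+add epilogue (illustrative only).

    expert_outputs: one entry per token, each a list of top-k vectors
        (assumed to be already weighted by the router).
    routed_scaling_factor: scalar applied to the summed routed output.
    shared_output: optional per-token vectors from the shared experts.
    """
    results = []
    for token_idx, per_token in enumerate(expert_outputs):
        hidden = len(per_token[0])
        # sum: reduce over the top-k expert outputs of this token
        routed = [sum(vec[d] for vec in per_token) for d in range(hidden)]
        # mul: apply the routed scaling factor
        out = [x * routed_scaling_factor for x in routed]
        # add: fold in the precomputed shared-expert output, if any
        if shared_output is not None:
            out = [x + s for x, s in zip(out, shared_output[token_idx])]
        results.append(out)
    return results


if __name__ == "__main__":
    tokens = [[[1.0, 2.0], [3.0, 4.0]]]  # one token, top-2 experts, hidden size 2
    print(sum_mul_add(tokens, 2.0, [[0.5, 0.5]]))
```

A fused kernel would perform these three steps in a single pass instead of materializing the intermediate results.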
-
yangql authored
-
- 17 Mar, 2026 2 commits
- 16 Mar, 2026 2 commits
- 15 Mar, 2026 1 commit
-
-
fanwl authored
- Add the VLLM_V1_USE_FA_UNIFIED_ATTN_2D environment variable - 0: Triton attention, 1: FA unified attention
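A minimal sketch of how such a 0/1 switch could be read. The helper `select_attention_backend`, the returned labels, and the default of "0" when the variable is unset are assumptions for illustration, not the code from this commit.

```python
import os


def select_attention_backend(environ=None):
    """Map VLLM_V1_USE_FA_UNIFIED_ATTN_2D to a backend label (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Assumption: an unset variable behaves like "0" (Triton attention).
    raw = env.get("VLLM_V1_USE_FA_UNIFIED_ATTN_2D", "0")
    return "fa_unified" if raw.strip() == "1" else "triton"


if __name__ == "__main__":
    print(select_attention_backend())
```

Passing a plain dict instead of reading os.environ directly keeps the helper easy to exercise in tests.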
-
- 12 Mar, 2026 2 commits
- 11 Mar, 2026 1 commit
-
-
yangql authored
-
- 07 Mar, 2026 1 commit
-
-
wanglong3 authored
-
- 06 Mar, 2026 3 commits
- 05 Mar, 2026 2 commits
- 03 Mar, 2026 1 commit
-
-
zhuwenwen authored
-
- 02 Mar, 2026 2 commits
- 24 Feb, 2026 1 commit
-
-
laibao authored
Add a router_capture tool that filters MoE router logits by num_tokens/rank and writes them to disk. Hook the capture call into Qwen3MoeSparseMoeBlock and skip it automatically under torch.compile. Add the VLLM_MOE_ROUTER_CAPTURE* environment variables.
-
- 16 Feb, 2026 2 commits
- 13 Feb, 2026 1 commit
-
-
王敏 authored
-
- 11 Feb, 2026 1 commit
-
-
laibao authored
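Reference and port the key commit logic from 011/vllm. Add the VLLM_USE_MOE_W16A16_TRITON switch, wired to lightop-based runtime capability detection with caching of the enablement result. After weight loading, pre-pack w13 and w2 for W16A16 Marlin. When W16A16 Marlin is enabled, keep the monolithic execution path and add a fast path for packed weights in fused_experts_impl. Fallback behavior when Marlin or lightop is unavailable is unchanged.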
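The "runtime capability detection with a cached result" pattern can be sketched as below. This is an illustration under stated assumptions: `backend_available` and `choose_moe_path` are hypothetical helpers, the probe only checks whether a module is importable, and how the real switch maps onto the Marlin and fallback paths is not stated in the commit message.

```python
import functools
import importlib.util


@functools.lru_cache(maxsize=None)
def backend_available(module_name):
    """Probe once whether an optional module can be imported; later calls hit the cache."""
    return importlib.util.find_spec(module_name) is not None


def choose_moe_path(switch_on, module_name="lightop"):
    """Pick the fast path only if the switch is on and the probe succeeds."""
    if switch_on and backend_available(module_name):
        return "fast"
    # Anything else keeps the pre-existing fallback behavior.
    return "fallback"


if __name__ == "__main__":
    print(choose_moe_path(True))
    print(backend_available.cache_info())
```

Caching the probe means the import lookup cost is paid once per process rather than on every forward call.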
-
- 10 Feb, 2026 1 commit
-
-
zhuwenwen authored
-
- 06 Feb, 2026 4 commits
- 04 Feb, 2026 3 commits
-
-
zhuwenwen authored
-
zhuwenwen authored
-
Michael Goin authored
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
-
- 03 Feb, 2026 1 commit
-
-
Kiersten Stokes authored
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com> (cherry picked from commit 9e138cb0)
-