- 08 Apr, 2026 2 commits
- 03 Apr, 2026 2 commits
- 02 Apr, 2026 2 commits
- 01 Apr, 2026 3 commits
- 28 Mar, 2026 1 commit
-
-
wanglong3 authored
-
- 27 Mar, 2026 3 commits
-
-
flyingdown authored
-
laibao authored
-
flyingdown authored
-
- 26 Mar, 2026 6 commits
-
-
laibao authored
-
laibao authored
feat(v1 attention): integrate the unified KV layout into ROCm FlashAttention and pass through mm_prefix, qq_bias, and use_alibi_sqrt. Add unified KV layout selection logic to the ROCm FlashAttention backend. Wire in the unified varlen kernel call path. Pass mm_prefix_range and qq_bias through the FlashAttention metadata.
-
wanghl6 authored
-
wanghl6 authored
-
wanghl6 authored
-
wanglong3 authored
-
- 24 Mar, 2026 6 commits
- 23 Mar, 2026 1 commit
-
-
guanyu1 authored
-
- 21 Mar, 2026 6 commits
- 20 Mar, 2026 1 commit
-
-
laibao authored
-
- 19 Mar, 2026 2 commits
- 18 Mar, 2026 5 commits
-
-
guanyu1 authored
-
guanyu1 authored
-
guanyu1 authored
-
laibao authored
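Add the environment variable VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD to toggle the fused sum+mul+add path. Add a fused path in DeepseekV2MoE that precomputes shared_output and passes iqis and routed_scaling_factor downstream. Extend the FusedMoE/SharedFusedMoE interfaces and the related custom ops to pass i_q/i_s/shared_output/routed_scaling_factor through uniformly. Adapt the Triton, Marlin W16A16, SlimQuant W4A8, CompressedTensors W8A8, and other implementations accordingly, so that sum+mul+add can be performed inside the kernel.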
-
renzhc authored
-