- 19 Mar, 2026 1 commit
-
-
laibao authored
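Remove the state mutation of experts.use_overlapped/_shared_experts in forward, to avoid inconsistent shared/non-shared paths during torch.compile startup. FusedMoE.forward_impl now computes shared experts only when shared_output is empty, so a passed-through value is not overwritten by a local recomputation.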
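The guard described in this commit can be illustrated with a minimal sketch. It is not the actual FusedMoE.forward_impl code: the function name `resolve_shared_output` and its parameters are hypothetical, and plain Python lists stand in for tensors. The point is only that shared experts run when no precomputed value was passed through.

```python
from typing import Callable, List, Optional

Vector = List[float]


def resolve_shared_output(
    hidden_states: Vector,
    shared_experts: Optional[Callable[[Vector], Vector]],
    shared_output: Optional[Vector] = None,
) -> Optional[Vector]:
    """Hypothetical helper: return the shared-expert output to use.

    If the caller already passed a precomputed shared_output, keep it
    untouched. Only compute locally when nothing was passed through.
    """
    if shared_output is None and shared_experts is not None:
        shared_output = shared_experts(hidden_states)
    return shared_output


# Usage: count how often the (stand-in) shared experts are evaluated.
calls = {"n": 0}


def fake_shared_experts(x: Vector) -> Vector:
    calls["n"] += 1
    return [2.0 * v for v in x]


computed = resolve_shared_output([1.0, 2.0], fake_shared_experts)
passed_through = resolve_shared_output([1.0, 2.0], fake_shared_experts, [9.0, 9.0])
```

Because the function reads its inputs without rewriting any module attributes, the same branch is taken for the same arguments, which is the property the commit message ties to torch.compile startup.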
-
- 18 Mar, 2026 1 commit
-
-
laibao authored
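Add the environment variable VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD to control the fused sum+mul+add switch. Add a fused path in DeepseekV2MoE that precomputes shared_output and passes iqis and routed_scaling_factor downstream. Extend the FusedMoE/SharedFusedMoE and related custom op interfaces to pass through i_q/i_s/shared_output/routed_scaling_factor uniformly. Adapt the Triton, Marlin W16A16, SlimQuant W4A8, CompressedTensors W8A8 and other implementations so that sum+mul+add can be completed on the kernel side.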
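As a reference for what a fused sum+mul+add could compute, here is a small unfused sketch. It assumes the three steps are: sum the per-expert outputs for a token, multiply by routed_scaling_factor, then add shared_output. That reading is an assumption based on the commit message, and the function name `moe_sum_mul_add` is hypothetical. Plain Python lists stand in for tensors.

```python
from typing import List, Optional

Vector = List[float]


def moe_sum_mul_add(
    expert_outputs: List[Vector],
    routed_scaling_factor: float,
    shared_output: Optional[Vector] = None,
) -> Vector:
    """Unfused reference for one token (assumed semantics, see lead-in).

    expert_outputs holds one already-weighted vector per selected expert.
    """
    hidden_size = len(expert_outputs[0])
    result = []
    for j in range(hidden_size):
        acc = sum(out[j] for out in expert_outputs)  # sum over experts
        acc *= routed_scaling_factor                 # mul by scaling factor
        if shared_output is not None:
            acc += shared_output[j]                  # add shared experts
        result.append(acc)
    return result


fused_like = moe_sum_mul_add([[1.0, 2.0], [3.0, 4.0]], 0.5, [10.0, 20.0])
no_shared = moe_sum_mul_add([[1.0, 2.0], [3.0, 4.0]], 0.5)
```

A kernel-side implementation would perform the same arithmetic in a single pass instead of three separate elementwise operations.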
-
- 17 Mar, 2026 4 commits
- 16 Mar, 2026 16 commits
-
-
wangmin6 authored
fix: resolve block_shape conflicts between DeepEP MoE and non-DeepEP quantization See merge request dcutoolkit/deeplearing/vllm!507
-
chenhw5 authored
-
zhangqha authored
Merge v0.15.1-dev-pd into v0.15.1-dev See merge request dcutoolkit/deeplearing/vllm!506
-
xuxz authored
-
zhangqha authored
Fix Qwen3/Qwen3.5 Reasoning Parser (#34779) See merge request dcutoolkit/deeplearing/vllm!504
-
xuxz authored
-
xuxz authored
-
wangmin6 authored
perf: GLM4.7 adds an rmsQuant call in MoE; fix: resolve the error caused by fused_moe passing None downstream See merge request dcutoolkit/deeplearing/vllm!505
-
wujl5 authored
-
jujl1 authored
-
zhangqha authored
Add FA Unified Attention 2D See merge request dcutoolkit/deeplearing/vllm!501
-
fanwl authored
-
wangmin6 authored
[feat] DeepSeek MTP supports PP mode See merge request dcutoolkit/deeplearing/vllm!503
-
王敏 authored
-
wangmin6 authored
[feat] Support Ray distributed async scheduling, controlled by the VLLM_ENABLE_RAY_ASYNC_SCHEDULING environment variable See merge request dcutoolkit/deeplearing/vllm!502
-
王敏 authored
-
- 15 Mar, 2026 1 commit
-
-
fanwl authored
- Add VLLM_V1_USE_FA_UNIFIED_ATTN_2D environment variable - 0: Triton attention, 1: FA unified attention
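The 0/1 switch in this commit can be sketched as follows. This is illustrative only: the function name `select_attention_backend` and the returned labels are hypothetical, and treating an unset variable as "0" is an assumption, not something the commit states.

```python
import os
from typing import Mapping, Optional


def select_attention_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical selector keyed on VLLM_V1_USE_FA_UNIFIED_ATTN_2D.

    "1" selects FA unified attention, "0" selects Triton attention.
    An unset variable is treated as "0" here (assumption).
    """
    source = os.environ if env is None else env
    value = source.get("VLLM_V1_USE_FA_UNIFIED_ATTN_2D", "0")
    return "fa_unified_attention" if value == "1" else "triton_attention"


# Usage with explicit mappings, so the example does not depend on the shell.
default_backend = select_attention_backend({})
fa_backend = select_attention_backend({"VLLM_V1_USE_FA_UNIFIED_ATTN_2D": "1"})
```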
-
- 13 Mar, 2026 6 commits
-
-
wangmin6 authored
Resolve the rms_norm_opt precision issue (switched to a different kernel) See merge request dcutoolkit/deeplearing/vllm!499
-
wangmin6 authored
fix: fix the impact of MoE quantization tensors on other models See merge request dcutoolkit/deeplearing/vllm!500
-
guanyu1 authored
-
wujl5 authored
-
wangmin6 authored
Modify the sparse_attn HIP backend See merge request dcutoolkit/deeplearing/vllm!498
-
liuchy5 authored
-
- 12 Mar, 2026 11 commits
-
-
wangmin6 authored
[fix] Add VLLM_USE_LIGHTOP_FUSED_TOPP_TOPK to control the switch for the lightop topp_topk fused operator See merge request dcutoolkit/deeplearing/vllm!496
-
王敏 authored
-
王敏 authored
-
王敏 authored
-
王敏 authored
-
wangmin6 authored
Fix: GLM-5 quantized model mla_attention layout fix && sparse_attn fp8 support See merge request dcutoolkit/deeplearing/vllm!495
-
lixh6 authored
-
wangmin6 authored
feat(deepseek-mla): integrate the VLLM_USE_LIGHTOP_RMS_ROPE_CONCAT fused path See merge request dcutoolkit/deeplearing/vllm!486
-
laibao authored
-
laibao authored
Add the environment variable and the MLA fusion wiring (wrapper -> attention -> impl). Integrate lightop fused_rms_norm_rope_contiguous, keeping the fallback path
-
wangmin6 authored
moe: complete the semantics of the fill+moe_align fusion switch See merge request dcutoolkit/deeplearing/vllm!484
-