- 21 Mar, 2026 3 commits
- 20 Mar, 2026 2 commits
- 19 Mar, 2026 5 commits
-
-
wangmin6 authored
[fix] Fix GLM MTP accuracy issue See merge request dcutoolkit/deeplearing/vllm!518
-
王敏 authored
-
wangmin6 authored
Merge fused_morpe into qwen3.py See merge request dcutoolkit/deeplearing/vllm!516
-
wangmin6 authored
feat(moe): Fix shared_output passthrough being overwritten and make it compatible with the torch.compile startup path See merge request dcutoolkit/deeplearing/vllm!517
-
laibao authored
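Remove the state rewrites of experts.use_overlapped/_shared_experts in forward, avoiding inconsistent shared/non-shared paths during torch.compile startup. FusedMoE.forward_impl now computes shared experts only when shared_output is empty, so a passed-through value is not overwritten by a local recomputation.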
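The guard described in the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual FusedMoE code: `forward_impl_sketch`, `routed_fn`, and `shared_experts_fn` are hypothetical names, and plain Python values stand in for tensors.

```python
from typing import Any, Callable, Optional, Tuple


def forward_impl_sketch(
    hidden: Any,
    routed_fn: Callable[[Any], Any],
    shared_experts_fn: Optional[Callable[[Any], Any]] = None,
    shared_output: Optional[Any] = None,
) -> Tuple[Optional[Any], Any]:
    """Hypothetical stand-in for a forward_impl-style method.

    Shared experts are evaluated only when the caller did not already
    pass a precomputed shared_output, so a passed-through value is
    never replaced by a local recomputation.
    """
    if shared_output is None and shared_experts_fn is not None:
        shared_output = shared_experts_fn(hidden)
    routed_output = routed_fn(hidden)
    return shared_output, routed_output
```

Note that the sketch does not mutate any state on the module inside the forward call; the decision depends only on the arguments, which is the property the commit message says matters for torch.compile startup.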
-
- 18 Mar, 2026 12 commits
-
-
guanyu1 authored
-
wangmin6 authored
Integrate the mla_cat operator; it takes effect only with nmz and kvcache-fp8, is disabled by default, and is enabled with export VLLM_USE_CAT_MLA=1 See merge request dcutoolkit/deeplearing/vllm!513
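A minimal sketch of the gating this commit message describes, assuming the flag is read as the string "1" from the environment. The helper name `should_use_mla_cat` and its `nmz_enabled` and `kv_cache_is_fp8` parameters are hypothetical; the fork's actual parsing and condition checks may differ.

```python
import os
from typing import Mapping, Optional


def should_use_mla_cat(
    nmz_enabled: bool,
    kv_cache_is_fp8: bool,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical helper: decide whether the mla_cat path applies.

    Off by default; requires VLLM_USE_CAT_MLA=1 and both the nmz and
    FP8 KV-cache conditions named in the commit message.
    """
    env = os.environ if env is None else env
    flag_on = env.get("VLLM_USE_CAT_MLA", "0") == "1"
    return flag_on and nmz_enabled and kv_cache_is_fp8
```

Passing `env` explicitly keeps the check easy to test without touching the process environment.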
-
wangmin6 authored
fix prompt_is_reasoning_end_arr not defined See merge request dcutoolkit/deeplearing/vllm!515
-
wangmin6 authored
feat(deepseek-moe): Integrate the VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD fused path See merge request dcutoolkit/deeplearing/vllm!485
-
guanyu1 authored
-
guanyu1 authored
-
laibao authored
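Add the environment variable VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD to toggle the fused sum+mul+add path. Add a fused path in DeepseekV2MoE that precomputes shared_output and passes i_q/i_s and routed_scaling_factor downstream. Extend the FusedMoE/SharedFusedMoE and related custom op interfaces to pass i_q/i_s/shared_output/routed_scaling_factor through uniformly. Adapt the Triton, Marlin W16A16, SlimQuant W4A8, CompressedTensors W8A8, and other implementations so that sum+mul+add can be completed on the kernel side.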
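For orientation, here is an unfused reference for what a "sum+mul+add" epilogue plausibly computes for a single token. The order of operations (sum the top-k expert outputs, multiply by routed_scaling_factor, add shared_output) is a reading of the option name, not something the commit message confirms, and `sum_mul_add_reference` is a hypothetical name using plain Python lists instead of tensors.

```python
from typing import List, Sequence


def sum_mul_add_reference(
    expert_outputs: Sequence[Sequence[float]],
    routed_scaling_factor: float,
    shared_output: Sequence[float],
) -> List[float]:
    """Unfused reference for one token (assumed semantics).

    expert_outputs: one already-weighted vector per selected expert.
    Returns sum(expert_outputs) * routed_scaling_factor + shared_output,
    computed element by element.
    """
    out = []
    for j, shared_j in enumerate(shared_output):
        total = sum(vec[j] for vec in expert_outputs)  # sum over top-k experts
        out.append(total * routed_scaling_factor + shared_j)  # mul, then add
    return out
```

A fused kernel would produce the same values in one pass instead of three separate elementwise steps, which is why shared_output has to be computed before the kernel is launched.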
-
renzhc authored
-
wangmin6 authored
feat: Support an FP8 implementation of MQA See merge request dcutoolkit/deeplearing/vllm!514
-
lixh6 authored
-
wangmin6 authored
feat: Integrate the VLLM_USE_FUSED_FILL_RMS_CAT optimization See merge request dcutoolkit/deeplearing/vllm!512
-
yangql authored
-
- 17 Mar, 2026 10 commits
-
-
liuchy5 authored
-
wangmin6 authored
[perf] Use full graph by default See merge request dcutoolkit/deeplearing/vllm!511
-
王敏 authored
-
王敏 authored
-
wangmin6 authored
[perf] Eliminate copy-scheduling bubbles during sparse MLA build See merge request dcutoolkit/deeplearing/vllm!510
-
王敏 authored
-
wangmin6 authored
Add import check for fa unified attn See merge request dcutoolkit/deeplearing/vllm!509
-
wangmin6 authored
invoke flash_attn in the Qwen2AudioEncoder (transformers) See merge request dcutoolkit/deeplearing/vllm!508
-
fanwl authored
-
caihl authored
-
- 16 Mar, 2026 8 commits
-
-
wangmin6 authored
fix: resolve block_shape conflicts between DeepEP MoE and non-DeepEP quantization See merge request dcutoolkit/deeplearing/vllm!507
-
chenhw5 authored
-
zhangqha authored
Merge v0.15.1-dev-pd into v0.15.1-dev See merge request dcutoolkit/deeplearing/vllm!506
-
xuxz authored
-
zhangqha authored
Fix Qwen3/Qwen3.5 Reasoning Parser (#34779) See merge request dcutoolkit/deeplearing/vllm!504
-
xuxz authored
-
xuxz authored
-
wangmin6 authored
perf: GLM4.7 MoE now calls rmsQuant; fix: resolve the error caused by fused_moe passing None downstream See merge request dcutoolkit/deeplearing/vllm!505
-