- 26 Mar, 2026 6 commits
- 25 Mar, 2026 3 commits
-
-
wangmin6 authored
Support RMS_ROPE_CONCAT for kvcache fp8_e4m3 See merge request dcutoolkit/deeplearing/vllm!531
-
wangmin6 authored
Fix the AWQ inference bug caused by VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD See merge request dcutoolkit/deeplearing/vllm!528
-
wangmin6 authored
fix(moe): complete shared_output/routed_scaling_factor pass-through for non-Marlin quantization paths See merge request dcutoolkit/deeplearing/vllm!529
-
- 24 Mar, 2026 7 commits
- 23 Mar, 2026 4 commits
-
-
wangmin6 authored
Remove 1d_mrope See merge request dcutoolkit/deeplearing/vllm!525
-
guanyu1 authored
-
wangmin6 authored
Disable padding of sparse_mla num_head to 64/128, and add an environment variable to control fp8_use_mixed_batch mode, FP8_USE_MI... See merge request dcutoolkit/deeplearing/vllm!524
-
yangql authored
# Conflicts:
# vllm/model_executor/layers/sparse_attn_indexer.py
# vllm/v1/attention/backends/mla/flashmla_sparse.py
-
- 21 Mar, 2026 8 commits
-
-
wangmin6 authored
feat: flash_mla, remove padding of q See merge request dcutoolkit/deeplearing/vllm!522
-
yangql authored
-
yangql authored
-
yangql authored
Disable padding of sparse_mla num_head to 64/128, and add the environment variable FP8_USE_MIXED_BATCH to control fp8_use_mixed_batch mode. It defaults to false, which selects separated mode.
-
liuchy5 authored
-
wangmin6 authored
[perf] Support mtp>1 for DSA-architecture models See merge request dcutoolkit/deeplearing/vllm!521
-
yangql authored
-
王敏 authored
-
- 20 Mar, 2026 2 commits
- 19 Mar, 2026 5 commits
-
-
wangmin6 authored
[fix] Fix GLM MTP accuracy issue See merge request dcutoolkit/deeplearing/vllm!518
-
王敏 authored
-
wangmin6 authored
Integrate fused_morpe into qwen3.py See merge request dcutoolkit/deeplearing/vllm!516
-
wangmin6 authored
feat(moe): fix shared_output pass-through being overwritten and support the torch.compile startup path See merge request dcutoolkit/deeplearing/vllm!517
-
laibao authored
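Remove the state mutation of experts.use_overlapped/_shared_experts in forward, which avoids inconsistent shared/non-shared paths during torch.compile startup. FusedMoE.forward_impl computes shared experts only when shared_output is empty, so the passed-through value is not overwritten by a local recomputation.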
-
- 18 Mar, 2026 5 commits
-
-
guanyu1 authored
-
wangmin6 authored
Integrate the mla_cat operator, which takes effect only in the nmz and kvcache-fp8 cases. It is disabled by default. To enable it, run export VLLM_USE_CAT_MLA=1 See merge request dcutoolkit/deeplearing/vllm!513
-
wangmin6 authored
fix prompt_is_reasoning_end_arr not defined See merge request dcutoolkit/deeplearing/vllm!515
-
wangmin6 authored
feat(deepseek-moe): integrate the VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD fusion path See merge request dcutoolkit/deeplearing/vllm!485
-
guanyu1 authored
-