- 15 Mar, 2026 1 commit
-
-
fanwl authored
- Add the VLLM_V1_USE_FA_UNIFIED_ATTN_2D environment variable - 0: Triton attention, 1: FA unified attention
-
- 13 Mar, 2026 6 commits
-
-
wangmin6 authored
Fix the rms_norm_opt precision issue (switched to a different kernel) See merge request dcutoolkit/deeplearing/vllm!499
-
wangmin6 authored
fix: fix the impact of the MOE quantization tensor on other models See merge request dcutoolkit/deeplearing/vllm!500
-
guanyu1 authored
-
wujl5 authored
-
wangmin6 authored
Modify the sparse_attn HIP backend See merge request dcutoolkit/deeplearing/vllm!498
-
liuchy5 authored
-
- 12 Mar, 2026 27 commits
-
-
wangmin6 authored
[fix] Add VLLM_USE_LIGHTOP_FUSED_TOPP_TOPK to toggle the lightop fused topp_topk operator See merge request dcutoolkit/deeplearing/vllm!496
-
王敏 authored
-
王敏 authored
-
王敏 authored
-
王敏 authored
-
wangmin6 authored
Fix: correct the mla_attention layout for quantized GLM-5 models and add fp8 support for sparse_attn See merge request dcutoolkit/deeplearing/vllm!495
-
lixh6 authored
-
wangmin6 authored
feat(deepseek-mla): wire in the VLLM_USE_LIGHTOP_RMS_ROPE_CONCAT fusion path See merge request dcutoolkit/deeplearing/vllm!486
-
laibao authored
-
laibao authored
Add the environment variable and the MLA fusion wiring (wrapper -> attention -> impl); integrate lightop fused_rms_norm_rope_contiguous and keep the fallback path
-
wangmin6 authored
moe: complete the semantics of the fill+moe_align fusion switch See merge request dcutoolkit/deeplearing/vllm!484
-
laibao authored
-
wangmin6 authored
Add max_cudagraph_capture_size and refine the capture range See merge request dcutoolkit/deeplearing/vllm!494
-
wujl5 authored
-
wangmin6 authored
perf: add rmsQuant to the MOE part of the DS V2 model See merge request dcutoolkit/deeplearing/vllm!492
-
wujl5 authored
-
wangmin6 authored
fix: fix bug http://hpczentao.sugon.com/bug-view-118388.html See merge request dcutoolkit/deeplearing/vllm!491
-
-
wangmin6 authored
Modify mrope_1d See merge request dcutoolkit/deeplearing/vllm!490
-
wangmin6 authored
perf: add rmsQuant to MLA in the DS V2 model See merge request dcutoolkit/deeplearing/vllm!487
-
wujl5 authored
-
wangmin6 authored
mla: opt-cat concatenation routing for prefill and decode See merge request dcutoolkit/deeplearing/vllm!483
-
wangmin6 authored
perf: add DTBMM fusion for DS v2, disabled by default See merge request dcutoolkit/deeplearing/vllm!480
-
wujl5 authored
-
guanyu1 authored
-
guanyu1 authored
-
guanyu1 authored
-
- 11 Mar, 2026 6 commits
-
-
zhangqha authored
Adapt dense loading for the mtp layer of dpsk_v32 See merge request dcutoolkit/deeplearing/vllm!479
-
zhangqha authored
feat: fix the dsa mqa interface for glm5 compatibility See merge request dcutoolkit/deeplearing/vllm!478
-
yangql authored
-
liuchy5 authored
-
laibao authored
-
zhangqha authored
Fix: add the parameters missing when calling Triton MoE gemm and align the interface See merge request dcutoolkit/deeplearing/vllm!476
-