- 21 Jan, 2026 2 commits
- 20 Jan, 2026 9 commits
-
-
laibao authored
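  - Use get_moe_cuda_marlin_config_w16a16(status) to determine whether W16A16 Marlin MoE is available
  - Compute and cache _marlin_w16a16_moe_enabled during FusedMoE initialization; when the conditions are met, force use_nn_moe=False
  - After weight loading, perform a one-time Marlin pack based on the cached result; at runtime, take the Marlin fast path according to the packed flag
  - Remove the definition and parsing logic of the VLLM_USE_MARLIN_W16A16_MOE environment variable from envs.py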
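The flow described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the class FusedMoESketch, the helper marlin_w16a16_available, the status key "marlin_w16a16_supported", and the tuple() placeholder for packing are all hypothetical. Only the names _marlin_w16a16_moe_enabled and use_nn_moe come from the message itself.

```python
def marlin_w16a16_available(status):
    """Hypothetical stand-in for get_moe_cuda_marlin_config_w16a16(status)."""
    return bool(status.get("marlin_w16a16_supported", False))


class FusedMoESketch:
    """Toy stand-in for the FusedMoE layer; not the real vLLM class."""

    def __init__(self, status, use_nn_moe=True):
        # Decide once at initialization and cache the result.
        self._marlin_w16a16_moe_enabled = marlin_w16a16_available(status)
        # When Marlin is enabled, the NN MoE path is forced off.
        self.use_nn_moe = False if self._marlin_w16a16_moe_enabled else use_nn_moe
        self.weights = None
        self._marlin_packed = False
        self.pack_calls = 0  # only here to show that packing happens once

    def load_weights(self, weights):
        self.weights = list(weights)

    def process_weights_after_loading(self):
        # One-time pack, driven by the cached flag rather than an env variable.
        if self._marlin_w16a16_moe_enabled and not self._marlin_packed:
            self.weights = tuple(self.weights)  # placeholder for the real pack
            self._marlin_packed = True
            self.pack_calls += 1

    def forward(self):
        # Runtime dispatch looks only at the packed flag.
        return "marlin_fast_path" if self._marlin_packed else "default_path"


layer = FusedMoESketch({"marlin_w16a16_supported": True})
layer.load_weights([1, 2, 3])
layer.process_weights_after_loading()
print(layer.use_nn_moe, layer.forward())
```

Caching the decision at initialization means the runtime path never re-queries the configuration, which is presumably why the environment variable could be dropped.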
-
zhuwenwen authored
-
zhuwenwen authored
[fix] Resolve accuracy anomaly with glm4 moe + mtp. See merge request dcutoolkit/deeplearing/vllm!374
-
王敏 authored
-
王敏 authored
-
zhuwenwen authored
-
zhuwenwen authored
-
laibao authored
-
zhuwenwen authored
[fix] Fix weight loading error for gpt oss nn moe. See merge request dcutoolkit/deeplearing/vllm!372
-
- 19 Jan, 2026 4 commits
- 16 Jan, 2026 5 commits
- 15 Jan, 2026 3 commits
- 14 Jan, 2026 7 commits
-
-
zhuwenwen authored
-
zhuwenwen authored
Adapt to the block-wise fp8 interface. See merge request dcutoolkit/deeplearing/vllm!366
-
-
zhuwenwen authored
set VLLM_USE_PD_SPLIT=1 update moe_align_block_size
-
SAC_fanth authored
-
zhuwenwen authored
Switch default w8a8 gemm impl to blaslt. See merge request dcutoolkit/deeplearing/vllm!365
-
wanglong3 authored
-
- 12 Jan, 2026 2 commits
- 10 Jan, 2026 8 commits
-
-
zhuwenwen authored
perf(fused-moe): Pre-pack Marlin W16A16 MoE weights to reduce peak GPU memory during warmup. See merge request dcutoolkit/deeplearing/vllm!358
-
laibao authored
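Perform a per-expert Marlin pack of w13/w2 in the post-load hook and replace them with packed parameters. The Marlin fast path accepts only packed weights; if weights were not pre-packed it fails fast, avoiding the memory peak and nondeterminism of runtime packing. Update the Marlin wrapper's input arguments and shape derivation (K/N are computed from the packed layout).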
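A rough sketch of the three ideas in this message (per-expert packing in a post-load hook, a fast path that fails fast on unpacked weights, and deriving K/N from the packed layout) might look like the following. The tile size, the PackedWeight type, and the packing scheme are all hypothetical and chosen only to make the example runnable; the real Marlin layout is different.

```python
from dataclasses import dataclass
from typing import List

TILE = 16  # hypothetical tile size, not the real Marlin layout


@dataclass
class PackedWeight:
    rows: List[List[float]]  # toy packed shape: (K // TILE, N * TILE)


def pack_expert(matrix):
    """Pack one expert's K x N matrix by folding TILE rows into one packed row."""
    k, n = len(matrix), len(matrix[0])
    if k % TILE != 0:
        raise ValueError("K must be a multiple of TILE")
    rows = []
    for r0 in range(0, k, TILE):
        rows.append([matrix[r][c] for c in range(n) for r in range(r0, r0 + TILE)])
    return PackedWeight(rows)


def post_load_hook(experts):
    """Pack every expert once, right after loading, and return the replacements."""
    return [pack_expert(w) for w in experts]


def derive_kn(packed):
    """Recover the logical K and N from the toy packed layout."""
    return len(packed.rows) * TILE, len(packed.rows[0]) // TILE


def marlin_fast_path(experts):
    """Accept only pre-packed weights; never pack at runtime."""
    for w in experts:
        if not isinstance(w, PackedWeight):
            raise RuntimeError("weights were not pre-packed in the post-load hook")
    return [derive_kn(w) for w in experts]


raw = [[[float(r * 4 + c) for c in range(4)] for r in range(32)] for _ in range(2)]
packed = post_load_hook(raw)
print(marlin_fast_path(packed))
```

Failing fast here trades convenience for predictability: an unpacked weight is reported as a setup error instead of triggering a packing pass during inference.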
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
Distinguish between PCIe and hglink usage of custom allreduce. vllm: export VLLM_CUSTOM_CACHE=1; dtk: export HIP_KERNEL_EVENT_SYSTENFENCE=1
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-