- 11 Feb, 2026 1 commit
-
-
laibao authored
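Reference and port the key commit logic from 011/vllm. Add the VLLM_USE_MOE_W16A16_TRITON switch and wire it to lightop-based runtime capability detection, with the enablement result cached. After weight loading, apply W16A16 Marlin pre-packing to w13 and w2. When W16A16 Marlin is enabled, keep the monolithic execution path and add a packed-weight fast path in fused_experts_impl. Fallback behavior when Marlin or lightop is unavailable remains unchanged.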
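The gating described in this commit message (an environment switch, a cached runtime capability probe, and an unchanged fallback) could look roughly like the sketch below. Only the variable name VLLM_USE_MOE_W16A16_TRITON and the module name lightop come from the commit message; the function names, the accepted flag values, and the returned path labels are hypothetical and are not taken from the actual patch.

```python
import importlib.util
import os
from functools import lru_cache

# Switch name taken from the commit message.
ENV_FLAG = "VLLM_USE_MOE_W16A16_TRITON"


def _flag_enabled() -> bool:
    # Assumption: "1" or "true" turns the feature on; the real parsing may differ.
    return os.environ.get(ENV_FLAG, "0").strip().lower() in {"1", "true"}


@lru_cache(maxsize=1)
def w16a16_marlin_enabled() -> bool:
    """Hypothetical helper: probe once and cache the enablement result."""
    if not _flag_enabled():
        return False
    # Assumption: capability detection is reduced here to "lightop is importable".
    return importlib.util.find_spec("lightop") is not None


def select_moe_path() -> str:
    """Hypothetical dispatcher: packed fast path if enabled, else the existing path."""
    if w16a16_marlin_enabled():
        return "w16a16_marlin_packed"
    return "default_fallback"
```

Because the probe result is cached, changing the environment variable after the first call has no effect until `w16a16_marlin_enabled.cache_clear()` is called.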
-
- 10 Feb, 2026 2 commits
- 09 Feb, 2026 2 commits
- 08 Feb, 2026 4 commits
- 06 Feb, 2026 10 commits
- 05 Feb, 2026 3 commits
- 04 Feb, 2026 13 commits
-
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
Nick Hill authored
Signed-off-by: Nick Hill <nickhill123@gmail.com>
-
Michael Goin authored
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
-
Michael Goin authored
[Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE (#33620)
Signed-off-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit e346e2d0)
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
-
- 03 Feb, 2026 5 commits
-
-
zhuwenwen authored
-
zhuwenwen authored
-
Richard Zou authored
[torch.compile] Don't do the fast moe cold start optimization if there is speculative decoding (#33624)
Signed-off-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
(cherry picked from commit 5eac9a1b)
-
Richard Zou authored
Signed-off-by: Richard Zou <zou3519@gmail.com>
(cherry picked from commit d9aa39a3)
-
Kiersten Stokes authored
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
(cherry picked from commit 9e138cb0)
-