- 06 Feb, 2026 10 commits
-
-
王敏 authored
-
王敏 authored
# Conflicts:
#	vllm/model_executor/layers/fused_moe/modular_kernel.py
-
zhuwenwen authored
-
王敏 authored
# Conflicts:
#	vllm/model_executor/layers/fused_moe/config.py
#	vllm/model_executor/layers/fused_moe/layer.py
#	vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_marlin.py
-
王敏 authored
-
zhuwenwen authored
-
zhuwenwen authored
-
王敏 authored
-
zhuwenwen authored
-
zhuwenwen authored
-
- 05 Feb, 2026 5 commits
- 04 Feb, 2026 13 commits
-
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
Nick Hill authored
Signed-off-by: Nick Hill <nickhill123@gmail.com>
-
Michael Goin authored
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
-
Michael Goin authored
[Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE (#33620)
Signed-off-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit e346e2d0)
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
-
- 03 Feb, 2026 12 commits
-
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
Richard Zou authored
[torch.compile] Don't do the fast moe cold start optimization if there is speculative decoding (#33624)
Signed-off-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
(cherry picked from commit 5eac9a1b)
-
Richard Zou authored
Signed-off-by: Richard Zou <zou3519@gmail.com>
(cherry picked from commit d9aa39a3)
-
Kiersten Stokes authored
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
(cherry picked from commit 9e138cb0)
-
zaristei2 authored
Signed-off-by: Zachary Aristei <zaristei@nvidia.com>
Co-authored-by: Zachary Aristei <zaristei@nvidia.com>
-
zaristei2 authored
Signed-off-by: Zachary Aristei <zaristei@nvidia.com>
Co-authored-by: Zachary Aristei <zaristei@nvidia.com>
-
zhuwenwen authored
-