- 09 Feb, 2026 3 commits
- 08 Feb, 2026 4 commits
- 06 Feb, 2026 13 commits
-
-
zhuwenwen authored
Restrict fp8_e4m3 support to nmz and add fp8 support for Q and KV cache; set VLLM_PCIE_USE_CUSTOM_ALLREDUCE=1
-
zhuwenwen authored
[feat] Support relaxed MTP. See merge request dcutoolkit/deeplearing/vllm!414
-
王敏 authored
-
王敏 authored
-
王敏 authored
# Conflicts:
#   vllm/model_executor/layers/fused_moe/modular_kernel.py
-
zhuwenwen authored
-
王敏 authored
# Conflicts:
#   vllm/model_executor/layers/fused_moe/config.py
#   vllm/model_executor/layers/fused_moe/layer.py
#   vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_marlin.py
-
王敏 authored
-
zhuwenwen authored
-
zhuwenwen authored
-
王敏 authored
-
zhuwenwen authored
-
zhuwenwen authored
-
- 05 Feb, 2026 5 commits
- 04 Feb, 2026 13 commits
-
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
-
Nick Hill authored
Signed-off-by: Nick Hill <nickhill123@gmail.com>
-
Michael Goin authored
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
-
Michael Goin authored
[Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE (#33620)
Signed-off-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit e346e2d0)
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
-
- 03 Feb, 2026 2 commits