- 11 Feb, 2026 1 commit
-
-
laibao authored
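Port the key commit logic from 011/vllm. Add the VLLM_USE_MOE_W16A16_TRITON switch, wired to lightop-based runtime capability detection, and cache the enablement result. After weight loading, pre-pack w13 and w2 for W16A16 Marlin. When W16A16 Marlin is enabled, keep the monolithic execution path and add a fast path for packed weights in fused_experts_impl. Fallback behavior when Marlin or lightop is unavailable is unchanged.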
-
- 10 Feb, 2026 2 commits
- 09 Feb, 2026 5 commits
-
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
[feat] Relaxed MTP supports sampling parameters such as temp and top-p. See merge request dcutoolkit/deeplearing/vllm!420
-
zhuwenwen authored
[feat] Support separate scheduling of prefill and decode. See merge request dcutoolkit/deeplearing/vllm!419
-
zhuwenwen authored
Adapt w8a8 deepep and integrate the lightop version of deepgemm. See merge request dcutoolkit/deeplearing/vllm!418
-
- 08 Feb, 2026 4 commits
- 06 Feb, 2026 13 commits
-
-
zhuwenwen authored
Allow fp8_e4m3 only on nmz, where it is supported, and support fp8 for q and the KV cache. Set VLLM_PCIE_USE_CUSTOM_ALLREDUCE=1.
-
zhuwenwen authored
[feat] Support relaxed MTP. See merge request dcutoolkit/deeplearing/vllm!414
-
王敏 authored
-
王敏 authored
-
王敏 authored
Conflicts: vllm/model_executor/layers/fused_moe/modular_kernel.py
-
zhuwenwen authored
-
王敏 authored
Conflicts: vllm/model_executor/layers/fused_moe/config.py, vllm/model_executor/layers/fused_moe/layer.py, vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_marlin.py
-
王敏 authored
-
zhuwenwen authored
-
zhuwenwen authored
-
王敏 authored
-
zhuwenwen authored
-
zhuwenwen authored
-
- 05 Feb, 2026 5 commits
- 04 Feb, 2026 10 commits