"cacheflow/model_executor/memory_analyzer.py" did not exist on "80a2f812f17add5838f84288054fbe0b915622cc"
- 21 Dec, 2025 2 commits
- 18 Dec, 2025 2 commits
- 17 Dec, 2025 7 commits
- 16 Dec, 2025 2 commits
- 15 Dec, 2025 5 commits
- 11 Dec, 2025 1 commit
-
-
zhuwenwen authored
[fix] Fix the vmfault issue in DeepEP high-throughput mode. See merge request dcutoolkit/deeplearing/vllm!291
-
- 10 Dec, 2025 1 commit
-
-
王敏 authored
-
- 08 Dec, 2025 3 commits
- 07 Dec, 2025 1 commit
-
-
zhuwenwen authored
Add ALLOW_MNNVL, default false; set VLLM_ALLOW_MNNVL=1 to enable it. See merge request dcutoolkit/deeplearing/vllm!286
-
- 05 Dec, 2025 1 commit
-
-
yangql authored
-
- 02 Dec, 2025 2 commits
- 28 Nov, 2025 2 commits
- 24 Nov, 2025 2 commits
- 17 Nov, 2025 1 commit
-
-
zhuwenwen authored
[feat] 1. Adapt w8a8 Marlin to DeepEP low-latency mode; 2. In non-naive EP mode, remove the redundant DP padding to avoid allreduce overhead. See merge request dcutoolkit/deeplearing/vllm!256
-
- 15 Nov, 2025 1 commit
-
-
王敏 authored
-
- 13 Nov, 2025 7 commits
-
-
zhuwenwen authored
Restore the default setting of disable_cascade_attn; add VLLM_USE_OPT_ZEROS to replace triton_ (torch.zeros); set default_max_num_batched_tokens = 10240; update the layernorm of qwen3_moe
-
zhuwenwen authored
Fix the OOM issue when Marlin is enabled with w8a8 and pp16
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
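- Introduce the `VLLM_SCHED_ENABLE_MINIMAL_INJECTION` environment variable to control minimal injection in pipeline-parallel scheduling. - Adjust the Scheduler logic to use the new minimal-injection feature. - Update the scheduling logic to use output placeholders so that 0-token stalls are avoided during decoding. - Enhance the Scheduler to manage minimal-progress injection based on the batch queue state.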
-
zhuwenwen authored
[feat] Adapt w4a8 and w8a8 to DeepEP low-latency mode. See merge request dcutoolkit/deeplearing/vllm!255
-
王敏 authored
-