- 13 Nov, 2025 1 commit
zhuwenwen authored
- 12 Nov, 2025 4 commits
- 11 Nov, 2025 2 commits
- 10 Nov, 2025 4 commits
- 09 Nov, 2025 2 commits
- 08 Nov, 2025 1 commit
王敏 authored
- 07 Nov, 2025 7 commits
zhuwenwen authored
-
zhuwenwen authored
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
fix: resolve the pipeline-parallel (PP) hang. See merge request dcutoolkit/deeplearing/vllm!245
-
laibao authored
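- Update the environment variables to control minimum injection for pipeline-parallel scheduling. - Remove num_output_placeholders from the Request class and adjust the Scheduler logic to use the new minimum-injection feature. - Enhance the Scheduler to manage minimum-progress injection based on the batch queue state.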
-
- 06 Nov, 2025 4 commits
- 05 Nov, 2025 1 commit
zhuwenwen authored
-
- 04 Nov, 2025 5 commits
liuchy5 authored
-
zhuwenwen authored
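- Introduce the `VLLM_ENABLE_OUTPUT_PLACEHOLDERS` environment variable to control whether output placeholders are enabled. - Add a `num_output_placeholders` attribute to the `Request` class to track the number of tokens expected to be generated. - Update the scheduling logic to use output placeholders so that 0-token stalls are avoided during decoding. - Remove the no-longer-used minimum-progress injection code, simplifying the scheduler implementation.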
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
feat: integrate w8a8_marlin (enabled with -q slimquant_marlin) and optimize the w4a8_marlin code. See merge request dcutoolkit/deeplearing/vllm!240
-
- 03 Nov, 2025 1 commit
jujl1 authored
-
- 31 Oct, 2025 6 commits
- 29 Oct, 2025 2 commits