- 06 Nov, 2025 3 commits
- 05 Nov, 2025 1 commit
zhuwenwen authored
- 04 Nov, 2025 5 commits
liuchy5 authored
-
zhuwenwen authored
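- Introduce the `VLLM_ENABLE_OUTPUT_PLACEHOLDERS` environment variable to control whether output placeholders are enabled.
- Add a `num_output_placeholders` attribute to the `Request` class to track the number of tokens expected to be generated.
- Update the scheduling logic to use output placeholders so that decoding avoids 0-token stalls.
- Remove the no-longer-used minimum-progress injection code, simplifying the scheduler implementation.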
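The placeholder idea in this commit can be illustrated with a minimal sketch. This is not the code from the commit: the `Request` class below is a toy stand-in, the helper names `placeholders_enabled` and `tokens_to_schedule` are hypothetical, the accepted flag value `"1"` is an assumption, and so is the reading of a placeholder as a token that has been scheduled but not yet sampled.

```python
import os
from dataclasses import dataclass, field


def placeholders_enabled(env=None):
    """Read the flag named in the commit (assumption: "1" means enabled)."""
    env = os.environ if env is None else env
    return env.get("VLLM_ENABLE_OUTPUT_PLACEHOLDERS", "0") == "1"


@dataclass
class Request:
    """Toy stand-in for a scheduler request; not the real vLLM class."""
    token_ids: list = field(default_factory=list)
    num_computed_tokens: int = 0
    # Assumption: tokens that were scheduled but whose sampled ids
    # have not been appended to token_ids yet.
    num_output_placeholders: int = 0

    def tokens_to_schedule(self, use_placeholders):
        """Tokens the scheduler may hand out for this request in the next step."""
        pending = self.num_output_placeholders if use_placeholders else 0
        return len(self.token_ids) + pending - self.num_computed_tokens


# A 4-token prompt has been fully computed, and one output token is in flight.
req = Request(token_ids=[1, 2, 3, 4], num_computed_tokens=4,
              num_output_placeholders=1)
stalled = req.tokens_to_schedule(use_placeholders=False)   # no known work left
progress = req.tokens_to_schedule(use_placeholders=True)   # in-flight token counted
```

Under these assumptions, counting the placeholder lets the scheduler keep issuing one decode token while the previous output is still pending, whereas ignoring it leaves the request with nothing to schedule.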
-
zhuwenwen authored
-
zhuwenwen authored
-
zhuwenwen authored
feat: integrate w8a8_marlin, enabled via `-q slimquant_marlin`; optimize the w4a8_marlin code. See merge request dcutoolkit/deeplearing/vllm!240
-
- 03 Nov, 2025 1 commit
jujl1 authored
-
- 31 Oct, 2025 6 commits
- 29 Oct, 2025 2 commits
- 27 Oct, 2025 2 commits
- 24 Oct, 2025 4 commits
- 23 Oct, 2025 2 commits
- 20 Oct, 2025 2 commits
- 17 Oct, 2025 2 commits
- 16 Oct, 2025 3 commits
- 15 Oct, 2025 6 commits
- 13 Oct, 2025 1 commit
zhuwenwen authored
Remove all2all EP-related code. See merge request dcutoolkit/deeplearing/vllm!226