1. 13 Nov, 2025 2 commits
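    • feat: Add output placeholder support to optimize scheduling · 613edd7d
      zhuwenwen authored
      - Introduce the `VLLM_SCHED_ENABLE_MINIMAL_INJECTION` environment variable to control minimal injection in pipeline-parallel scheduling.
      - Adjust the Scheduler logic to use the new minimal injection feature.
      - Update the scheduling logic to use output placeholders so that decoding avoids 0-token stalls.
      - Enhance the Scheduler to manage minimal-progress injection based on the batch queue state.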
    • [feat] Adapt w4a8 and w8a8 to DeepEP low latency · 92761bde
      王敏 authored
  2. 10 Nov, 2025 1 commit
  3. 08 Nov, 2025 1 commit
  4. 07 Nov, 2025 4 commits
  5. 06 Nov, 2025 2 commits
  6. 04 Nov, 2025 1 commit
  7. 03 Nov, 2025 3 commits
  8. 01 Nov, 2025 1 commit
  9. 31 Oct, 2025 1 commit
  10. 29 Oct, 2025 3 commits
  11. 28 Oct, 2025 1 commit
  12. 27 Oct, 2025 1 commit
  13. 24 Oct, 2025 1 commit
    • add VLLM_USE_LIGHTOP_MOE_SUM_MUL_ADD · c2e6f453
      zhuwenwen authored
      support prefix cache on kme
      fix the error in test_moe caused by moe align not supporting 511 and 211
      multi-modal switching to torch implementation on z100l&k100
  14. 20 Oct, 2025 1 commit
  15. 17 Oct, 2025 3 commits
  16. 15 Oct, 2025 7 commits
  17. 14 Oct, 2025 1 commit
  18. 13 Oct, 2025 6 commits