1. 11 Feb, 2026 1 commit
    • feat(moe): complete the end-to-end Marlin W16A16 MoE integration for v0.15 · 1e2fe58f
      laibao authored
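        Reference and port the key commit logic from 011/vllm.
        Add the VLLM_USE_MOE_W16A16_TRITON switch, and wire in lightop-based runtime capability probing with caching of the enablement result.
        After weight loading, pre-pack w13 and w2 into the W16A16 Marlin layout.
        When W16A16 Marlin is enabled, keep the monolithic execution path and add a fast path for packed weights in fused_experts_impl.
        Keep the fallback behavior unchanged when Marlin or lightop is unavailable.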
  2. 06 Feb, 2026 1 commit
  3. 26 Jan, 2026 1 commit
  4. 24 Jan, 2026 1 commit
  5. 21 Jan, 2026 1 commit
  6. 16 Jan, 2026 1 commit
  7. 07 Jan, 2026 1 commit
  8. 06 Jan, 2026 1 commit
  9. 17 Dec, 2025 1 commit
    • Synchronize the modifications from the 12th to the 17th: · b66c8e4b
      zhuwenwen authored
      Fix the w4a16 conflict in CompressedTensorsLinearMethod
      feat(moe): add Marlin W16A16 fused MoE behind VLLM_USE_MARLIN_W16A16_MOE
      replace the fp8_mqa_logits and fp8_paged_mqa_logits interfaces in deepgemm with mqa_logits and paged_mqa_logits from lightop
  10. 11 Dec, 2025 1 commit
  11. 10 Dec, 2025 1 commit
  12. 26 Nov, 2025 1 commit
  13. 19 Nov, 2025 1 commit
  14. 13 Nov, 2025 2 commits
  15. 10 Nov, 2025 1 commit
  16. 05 Nov, 2025 1 commit
  17. 12 Oct, 2025 1 commit
  18. 11 Oct, 2025 2 commits
  19. 05 Oct, 2025 1 commit
  20. 28 Sep, 2025 3 commits
  21. 27 Sep, 2025 1 commit
  22. 26 Sep, 2025 1 commit
  23. 21 Sep, 2025 1 commit
  24. 17 Sep, 2025 3 commits
  25. 13 Sep, 2025 1 commit
  26. 12 Sep, 2025 1 commit
  27. 01 Sep, 2025 1 commit
  28. 29 Aug, 2025 1 commit
  29. 25 Aug, 2025 1 commit
  30. 20 Aug, 2025 1 commit
  31. 19 Aug, 2025 1 commit
  32. 13 Aug, 2025 1 commit
  33. 12 Aug, 2025 1 commit
  34. 11 Aug, 2025 1 commit