  1. 22 Aug, 2024 1 commit
  2. 21 Aug, 2024 1 commit
  3. 20 Aug, 2024 4 commits
  4. 17 Aug, 2024 1 commit
  5. 15 Aug, 2024 1 commit
  6. 13 Aug, 2024 1 commit
  7. 12 Aug, 2024 1 commit
  8. 10 Aug, 2024 1 commit
  9. 09 Aug, 2024 2 commits
  10. 06 Aug, 2024 2 commits
  11. 05 Aug, 2024 1 commit
  12. 29 Jul, 2024 1 commit
  13. 24 Jul, 2024 1 commit
  14. 23 Jul, 2024 2 commits
  15. 22 Jul, 2024 2 commits
  16. 20 Jul, 2024 2 commits
  17. 18 Jul, 2024 1 commit
  18. 17 Jul, 2024 1 commit
  19. 10 Jul, 2024 2 commits
  20. 02 Jul, 2024 2 commits
  21. 13 Jun, 2024 2 commits
  22. 12 Jun, 2024 2 commits
      [Kernel] Vectorized FP8 quantize kernel (#5396) · 5985e342
      Cody Yu authored
      Inspired by #5146, this PR improves the FP8 quantize kernel by vectorizing data transfers to make better use of memory bandwidth. Microbenchmarks show that the improved kernel achieves a 1.0x-1.5x speedup, with the largest gains when the hidden size is large.
      
      In detail, we applied three optimizations:
      
      - Use the inverted scale so that most divisions become multiplications.
      - Unroll the loop by a factor of 4 to improve instruction-level parallelism (ILP).
      - Use 4-element vectorized accesses to transfer data between HBM and SRAM.
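
The first two optimizations can be illustrated with a small CPU-side sketch. This is not the kernel from the PR: the function names are hypothetical, the final rounding to an FP8 bit pattern is omitted, and the clamp bound assumes the FP8 E4M3 format, whose largest finite value is 448. It only shows how the per-element division is replaced by a multiplication with a precomputed inverse scale, and how elements are processed in groups of four with a scalar tail.

```python
FP8_E4M3_MAX = 448.0  # largest finite value in FP8 E4M3 (assumed target format)


def clamp(x):
    """Saturate x to the representable FP8 E4M3 range."""
    return max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))


def scale_and_clamp_reference(values, scale):
    """Baseline: one division per element."""
    return [clamp(v / scale) for v in values]


def scale_and_clamp_vec4(values, scale):
    """Hypothetical sketch: inverted scale plus a loop unrolled by 4."""
    inv_scale = 1.0 / scale  # computed once; the loop body only multiplies
    n = len(values)
    out = [0.0] * n
    n_vec = n - n % 4  # number of elements covered by full groups of 4

    # Main loop: load four elements at once, then four independent
    # multiply-and-clamp operations (the source of the ILP on a GPU).
    for i in range(0, n_vec, 4):
        a, b, c, d = values[i:i + 4]
        out[i] = clamp(a * inv_scale)
        out[i + 1] = clamp(b * inv_scale)
        out[i + 2] = clamp(c * inv_scale)
        out[i + 3] = clamp(d * inv_scale)

    # Scalar tail for lengths that are not a multiple of 4.
    for i in range(n_vec, n):
        out[i] = clamp(values[i] * inv_scale)
    return out
```

With a power-of-two scale, the two functions return identical results; for other scales, multiplying by the inverse can differ from dividing in the last bit, which is a trade-off of this optimization.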
      skip fp8 · 103f3110
      zhuwenwen authored
  23. 09 Jun, 2024 1 commit
  24. 07 Jun, 2024 2 commits
  25. 05 Jun, 2024 1 commit
  26. 03 Jun, 2024 2 commits