1. 17 Sep, 2024 1 commit
  2. 13 Sep, 2024 1 commit
  3. 11 Sep, 2024 1 commit
  4. 29 Aug, 2024 1 commit
  5. 27 Aug, 2024 1 commit
  6. 21 Aug, 2024 1 commit
  7. 16 Aug, 2024 3 commits
  8. 14 Aug, 2024 1 commit
  9. 13 Aug, 2024 1 commit
  10. 07 Aug, 2024 2 commits
  11. 05 Aug, 2024 1 commit
  12. 01 Aug, 2024 1 commit
  13. 30 Jul, 2024 1 commit
  14. 25 Jul, 2024 1 commit
  15. 23 Jul, 2024 3 commits
  16. 21 Jul, 2024 1 commit
  17. 16 Jul, 2024 1 commit
  18. 11 Jul, 2024 1 commit
  19. 07 Jul, 2024 1 commit
  20. 03 Jul, 2024 3 commits
  21. 02 Jul, 2024 1 commit
  22. 01 Jul, 2024 1 commit
  23. 30 Jun, 2024 1 commit
  24. 25 Jun, 2024 1 commit
  25. 19 Jun, 2024 1 commit
  26. 18 Jun, 2024 1 commit
  27. 17 Jun, 2024 1 commit
  28. 16 Jun, 2024 1 commit
  29. 15 Jun, 2024 1 commit
  30. 13 Jun, 2024 2 commits
  31. 12 Jun, 2024 2 commits
    • [Kernel] Vectorized FP8 quantize kernel (#5396) · 5985e342
      Cody Yu authored
      Inspired by #5146, this PR improves the FP8 quantize kernel by vectorizing data transfers to make better use of memory bandwidth. Microbenchmarks show that the improved kernel achieves a 1.0x-1.5x speedup, especially when the hidden size is large.
      
      In detail, we applied three optimizations:
      
      - Use the inverted scale so that most divisions become multiplications.
      - Unroll the loop 4 times to improve instruction-level parallelism (ILP).
      - Use 4-wide vectorized loads and stores to transfer data between HBM and SRAM.
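
The first and third points can be illustrated with a small host-side sketch. This is not the kernel from the PR: it is plain Python, the function names and the `width` parameter are hypothetical, and the clamp bound assumes the FP8 E4M3 format (largest finite value 448). It only scales and clamps; it does not round to FP8.

```python
# Illustrative sketch only; names and the E4M3 bound are assumptions,
# not taken from the PR.
FP8_E4M3_MAX = 448.0  # largest finite value, assuming the FP8 E4M3 format


def clamp(x):
    """Clamp x into the representable range (no FP8 rounding is done)."""
    return max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))


def scale_naive(values, scale):
    """Baseline: one division per element."""
    return [clamp(v / scale) for v in values]


def scale_inverted_chunked(values, scale, width=4):
    """Compute 1/scale once, then multiply; walk the input in chunks of
    `width` elements, mimicking one wide load/store per chunk."""
    inv_scale = 1.0 / scale  # the only division
    out = [0.0] * len(values)
    n_vec = len(values) // width * width
    for i in range(0, n_vec, width):
        chunk = values[i:i + width]  # stands in for one vectorized load
        out[i:i + width] = [clamp(v * inv_scale) for v in chunk]
    for i in range(n_vec, len(values)):  # leftover tail, element by element
        out[i] = clamp(values[i] * inv_scale)
    return out
```

With a power-of-two scale, both functions return identical results. For other scales, multiplying by the reciprocal can differ from a true division in the last bit, which is the usual trade-off of this optimization.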
    • Revert "[CI/Build] Add `is_quant_method_supported` to control quantization... · e3c12bf6
      Simon Mo authored
      Revert "[CI/Build] Add `is_quant_method_supported` to control quantization test configurations" (#5463)