1. 03 Apr, 2025 1 commit
    • Yu Cheng's avatar
      [Dev] Add FP8 Quantization Examples and Absolute Maximum Reduction Operation Support (#320) · 4b705eb2
      Yu Cheng authored
      * [Dev] Add FP8 Quantization Examples and Absolute Maximum Reduction Operation Support
      
      * Added `example_per_token_cast_to_fp8.py` in examples/cast, providing token-wise FP8 quantization implementation.
      * Added `example_triton_cast_to_fp8.py` in examples/cast, providing Triton-based FP8 quantization implementation.
      * Added support for absolute maximum (absmax) reduction operation in reduce.cc and reduce.h.
      * Implemented `reduce_absmax` function in reduce.py, allowing absolute maximum reduction on input buffers.
      * Updated tilelang.language module to include the new `reduce_absmax` function.
      
      These changes enhance FP8 quantization capabilities and extend reduction operation support.
      
      * [Enhancement] Update per_token_cast_to_fp8 for improved FP8 quantization
      
      * Modified the `per_token_cast_to_fp8` function to support variable block sizes and improved memory layout annotations.
      * Adjusted the handling of absolute maximum values and scaling factors for better performance and accuracy.
      * Updated the main execution block to allow for larger matrix dimensions and refined the profiler setup for benchmarking.
      
      These changes enhance the flexibility and efficiency of the FP8 quantization process.
      
      * lint
      
      * [Dev] Update per_token_cast_fp8.py
      4b705eb2