    [Dev] Add FP8 Quantization Examples and Absolute Maximum Reduction Operation Support (#320) · 4b705eb2
    Yu Cheng authored
    * [Dev] Add FP8 Quantization Examples and Absolute Maximum Reduction Operation Support
    
    * Added `example_per_token_cast_to_fp8.py` in `examples/cast`, providing a token-wise FP8 quantization implementation.
    * Added `example_triton_cast_to_fp8.py` in `examples/cast`, providing a Triton-based FP8 quantization implementation.
    * Added support for the absolute maximum (absmax) reduction operation in `reduce.cc` and `reduce.h`.
    * Implemented the `reduce_absmax` function in `reduce.py`, which performs an absolute maximum reduction over an input buffer.
    * Updated the `tilelang.language` module to include the new `reduce_absmax` function.
    
    These changes enhance FP8 quantization capabilities and extend reduction operation support.
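
For readers unfamiliar with the operation, the following NumPy sketch illustrates what an absmax reduction computes: the maximum of the absolute values along one dimension. It is only a reference for the semantics, not the code added in `reduce.cc` or `reduce.py`; the helper name `reduce_absmax_ref` and its `dim` parameter are hypothetical.

```python
import numpy as np


def reduce_absmax_ref(x: np.ndarray, dim: int = -1) -> np.ndarray:
    """Hypothetical reference: max(|x|) along `dim` (not the tilelang API)."""
    return np.max(np.abs(x), axis=dim)


x = np.array([[1.0, -3.0, 2.0],
              [-0.5, 0.25, 0.0]], dtype=np.float32)

# One absmax value per row (per "token" in the quantization use case).
row_absmax = reduce_absmax_ref(x, dim=1)
```

In per-token quantization, this per-row value is what determines the scaling factor for that row.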
    
    * [Enhancement] Update per_token_cast_to_fp8 for improved FP8 quantization
    
    * Modified the `per_token_cast_to_fp8` function to support variable block sizes and to improve its memory layout annotations.
    * Adjusted the handling of absolute maximum values and scaling factors for better performance and accuracy.
    * Updated the main execution block to allow for larger matrix dimensions and refined the profiler setup for benchmarking.
    
    These changes enhance the flexibility and efficiency of the FP8 quantization process.
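
As a rough illustration of how absmax values and scaling factors interact in token-wise quantization, here is a minimal NumPy sketch. It assumes the FP8 e4m3 format (largest finite value 448), a group size of 128 elements per token, and a small `eps` floor on the absmax; all three are assumptions, as are the function and parameter names. NumPy has no FP8 dtype, so the sketch stops at the scaled and clipped values and does not perform the actual FP8 rounding.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3


def per_token_group_scale(x: np.ndarray, group_size: int = 128, eps: float = 1e-4):
    """Hypothetical sketch: scale each (token, group) slice into the FP8 range.

    Returns the scaled values (still float32, no FP8 rounding applied)
    and one scaling factor per group.
    """
    m, n = x.shape
    assert n % group_size == 0, "columns must be divisible by group_size"
    groups = x.reshape(m, n // group_size, group_size).astype(np.float32)

    # Absmax per group, floored at eps to avoid division by zero (assumed guard).
    absmax = np.maximum(np.abs(groups).max(axis=-1), eps)
    scale = absmax / FP8_E4M3_MAX

    # Divide by the scale so each group's absmax maps to FP8_E4M3_MAX.
    scaled = np.clip(groups / scale[..., None], -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return scaled.reshape(m, n), scale


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 256)).astype(np.float32)
q, s = per_token_group_scale(x, group_size=128)
```

Multiplying each group of `q` by its entry in `s` recovers the input up to floating-point error; a real FP8 cast would add rounding error on top of that.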
    
    * Fixed lint issues
    
    * [Dev] Update per_token_cast_fp8.py
reduce.cc 7.94 KB