-
Yu Cheng authored
* [Dev] Add FP8 Quantization Examples and Absolute Maximum Reduction Operation Support * Added `example_per_token_cast_to_fp8.py` in examples/cast, providing token-wise FP8 quantization implementation. * Added `example_triton_cast_to_fp8.py` in examples/cast, providing Triton-based FP8 quantization implementation. * Added support for absolute maximum (absmax) reduction operation in reduce.cc and reduce.h. * Implemented `reduce_absmax` function in reduce.py, allowing absolute maximum reduction on input buffers. * Updated tilelang.language module to include the new `reduce_absmax` function. These changes enhance FP8 quantization capabilities and extend reduction operation support. * [Enhancement] Update per_token_cast_to_fp8 for improved FP8 quantization * Modified the `per_token_cast_to_fp8` function to support variable block sizes and improved memory layout annotations. * Adjusted the handling of absolute maximum values and scaling factors for better performance and accuracy. * Updated the main execution block to allow for larger matrix dimensions and refined the profiler setup for benchmarking. These changes enhance the flexibility and efficiency of the FP8 quantization process. * lint * [Dev] Update per_token_cast_fp8.py
4b705eb2