- 08 Apr, 2025 1 commit
-
-
Yu Cheng authored
- Modified the `group_per_split_token_cast_to_fp8` function to support `bfloat16`, `float`, and `float16` data types. - Updated local fragment allocations to use the new `accum_dtype` for consistency. - Enhanced the main execution block to handle different tensor data types based on the specified `dtype`, improving flexibility in tensor operations.
-
- 06 Apr, 2025 1 commit
-
-
Yu Cheng authored
- Modified the `group_per_split_token_cast_to_fp8` function to include a conditional check for batch sizes, ensuring that the scaling factor is applied only when within the valid range. This change enhances the robustness of the FP8 conversion process for grouped per-split tokens.
-
- 05 Apr, 2025 1 commit
-
-
Yu Cheng authored
Implements FP8 type conversion functionality for grouped per-split tokens. The script includes several helper functions for handling tensor TMA alignment and FP8 conversion, enhancing support for FP8 data types and providing performance benchmarks. This change provides users with more flexible examples of FP8 operations.
-
- 03 Apr, 2025 1 commit
-
-
Yu Cheng authored
* [Dev] Add FP8 Quantization Examples and Absolute Maximum Reduction Operation Support * Added `example_per_token_cast_to_fp8.py` in examples/cast, providing token-wise FP8 quantization implementation. * Added `example_triton_cast_to_fp8.py` in examples/cast, providing Triton-based FP8 quantization implementation. * Added support for absolute maximum (absmax) reduction operation in reduce.cc and reduce.h. * Implemented `reduce_absmax` function in reduce.py, allowing absolute maximum reduction on input buffers. * Updated tilelang.language module to include the new `reduce_absmax` function. These changes enhance FP8 quantization capabilities and extend reduction operation support. * [Enhancement] Update per_token_cast_to_fp8 for improved FP8 quantization * Modified the `per_token_cast_to_fp8` function to support variable block sizes and improved memory layout annotations. * Adjusted the handling of absolute maximum values and scaling factors for better performance and accuracy. * Updated the main execution block to allow for larger matrix dimensions and refined the profiler setup for benchmarking. These changes enhance the flexibility and efficiency of the FP8 quantization process. * lint * [Dev] Update per_token_cast_fp8.py
-