Commits · c99b7056b489a5d222e9c7b41b954d243cff97da · OpenDAS / tilelang

08 Apr, 2025 1 commit

[Enhancement] Update group_per_split_token_cast_to_fp8 to support multiple data types (#356) · a686f0f1

Yu Cheng authored Apr 08, 2025

- Modified the `group_per_split_token_cast_to_fp8` function to support `bfloat16`, `float`, and `float16` data types.
- Updated local fragment allocations to use the new `accum_dtype` for consistency.
- Updated the main execution block to create input tensors in the data type selected by `dtype`, so the example can be run with any of the supported types.


06 Apr, 2025 1 commit

[Bugfix] Fix X_amax Correctness Issue in Group Cast FP8 (#345) · 847a461b

Yu Cheng authored Apr 06, 2025

- Added a conditional check on batch sizes to the `group_per_split_token_cast_to_fp8` function, so that the scaling factor is applied only to indices within the valid range. This fixes the `X_amax` correctness issue in the FP8 conversion for grouped per-split tokens.
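
The bounds check described above can be illustrated with a minimal pure-Python sketch. It is one plausible reading of the fix, not the kernel code: the function name `masked_row_scales`, the `valid_rows` parameter, and the `eps` floor are hypothetical, and the constant 448 is the largest finite value of the FP8 E4M3 format.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def masked_row_scales(padded_group, valid_rows, eps=1e-4):
    """Per-row scaling factors for one padded group (illustrative only).

    Rows at index >= valid_rows are treated as padding: they get a scale
    of 0.0 instead of a value derived from whatever data they hold.
    """
    scales = []
    for i, row in enumerate(padded_group):
        if i < valid_rows:
            # Floor the absolute maximum so an all-zero row never yields scale 0.
            amax = max(max(abs(v) for v in row), eps)
            scales.append(amax / FP8_E4M3_MAX)
        else:
            scales.append(0.0)
    return scales
```

Without the `i < valid_rows` guard, the padded rows would contribute scales computed from out-of-range data.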


05 Apr, 2025 1 commit

[Dev] Add Group Cast FP8 Example (#338) · 73885cfd

Yu Cheng authored Apr 05, 2025

Adds an example script that casts grouped per-split tokens to FP8. The script includes helper functions for TMA alignment of tensors and for FP8 conversion, along with performance benchmarks, giving users a further example of FP8 operations.
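
As a rough illustration of what a TMA alignment helper can look like, the sketch below rounds an element count up to a multiple of 16 bytes. The names `ceil_div` and `tma_aligned_size` and the 16-byte default are assumptions made for this sketch; the helpers in the example script may differ.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b


def tma_aligned_size(num_elements, element_size_bytes, alignment_bytes=16):
    """Round num_elements up so the row occupies a multiple of alignment_bytes.

    Assumes alignment_bytes is a multiple of element_size_bytes.
    """
    assert alignment_bytes % element_size_bytes == 0
    elements_per_unit = alignment_bytes // element_size_bytes
    return ceil_div(num_elements, elements_per_unit) * elements_per_unit
```

For example, five `float32` values (4 bytes each) would be padded to eight elements, that is 32 bytes.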


03 Apr, 2025 1 commit

[Dev] Add FP8 Quantization Examples and Absolute Maximum Reduction Operation Support (#320) · 4b705eb2

Yu Cheng authored Apr 03, 2025

* [Dev] Add FP8 Quantization Examples and Absolute Maximum Reduction Operation Support

* Added `example_per_token_cast_to_fp8.py` in `examples/cast`, providing a token-wise FP8 quantization implementation.
* Added `example_triton_cast_to_fp8.py` in `examples/cast`, providing a Triton-based FP8 quantization implementation.
* Added support for the absolute maximum (absmax) reduction operation in `reduce.cc` and `reduce.h`.
* Implemented the `reduce_absmax` function in `reduce.py`, allowing absolute maximum reduction on input buffers.
* Updated the `tilelang.language` module to include the new `reduce_absmax` function.

These changes enhance FP8 quantization capabilities and extend reduction operation support.
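
For reference, the semantics of an absolute maximum reduction can be written in a few lines of plain Python. This is only a sketch of what the operation computes along the last axis of a 2-D input; `absmax_rows` is a hypothetical name and does not reflect the signature of `reduce_absmax` in tilelang.

```python
def absmax_rows(matrix):
    """Return max(|x|) over each row of a 2-D list of numbers.

    An empty row reduces to 0.0, the identity for a maximum over
    absolute values.
    """
    return [max((abs(x) for x in row), default=0.0) for row in matrix]
```

In FP8 quantization this value is what the scaling factor is derived from, since it bounds the magnitude of every element in the reduced range.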

* [Enhancement] Update per_token_cast_to_fp8 for improved FP8 quantization

* Modified the `per_token_cast_to_fp8` function to support variable block sizes and improved memory layout annotations.
* Adjusted the handling of absolute maximum values and scaling factors for better performance and accuracy.
* Updated the main execution block to allow for larger matrix dimensions and refined the profiler setup for benchmarking.

These changes enhance the flexibility and efficiency of the FP8 quantization process.
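
The interaction between block size, absolute maximum, and scaling factor can be sketched as follows. This is a pure-Python approximation under stated assumptions: `per_token_scale`, `block_size`, and the `eps` floor are hypothetical, and the final rounding to actual FP8 values is omitted, so the output is the scaled data before the cast.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def per_token_scale(row, block_size=128, eps=1e-4):
    """Scale one token (row) block by block into the FP8 E4M3 range.

    Returns (scaled_values, scales). Each block of block_size elements
    gets its own scale = absmax / 448, so after division every value
    lies within [-448, 448] up to floating-point rounding.
    """
    scaled, scales = [], []
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        # Floor the absmax so an all-zero block does not divide by zero.
        amax = max(max(abs(v) for v in block), eps)
        scale = amax / FP8_E4M3_MAX
        scales.append(scale)
        scaled.extend(v / scale for v in block)
    return scaled, scales
```

Multiplying a scaled block by its scale recovers the original values, which is how a consumer would dequantize.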

* lint

* [Dev] Update per_token_cast_fp8.py
