src/tl_templates/hip/common.h · 319bc6b1bdd1325a3ce99084f9823ca29d213d96 · OpenDAS / tilelang

[AMD][Enhancement] Add support for Vectorized FP8 DataPacking (#542) · 319bc6b1

Lei Wang authored Jun 04, 2025

* [Enhancement] Add support for new FP8 types in HIP code generation

* Updated `PrintConst` function in `codegen_hip.cc` to handle `float8_e4m3fnuz` type.
* Introduced new functions in `hip_fp8.h` for creating FP8 types, including `make_fp8_e4_4_t` and `make_fp8_e4_8_t`, enhancing type handling for FP8 data structures.
* Improved overall compatibility and performance for FP8 data types in HIP.
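A minimal sketch of the packing idea behind the vectorized FP8 helpers, written in Python for illustration only. The names `pack_fp8x4` and `unpack_fp8x4` are hypothetical and are not the `hip_fp8.h` API; the sketch assumes that a 4-wide FP8 vector is four raw 8-bit patterns stored in one 32-bit word with lane 0 in the low byte, which the commit message does not confirm.

```python
def pack_fp8x4(x, y, z, w):
    """Pack four raw 8-bit FP8 bit patterns into one 32-bit word.

    Assumption: lane 0 occupies the lowest byte (little-endian lane order).
    """
    lanes = (x, y, z, w)
    word = 0
    for i, bits in enumerate(lanes):
        if not 0 <= bits <= 0xFF:
            raise ValueError(f"lane {i} is not an 8-bit pattern: {bits!r}")
        word |= bits << (8 * i)  # place lane i in byte i of the word
    return word


def unpack_fp8x4(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return tuple((word >> (8 * i)) & 0xFF for i in range(4))
```

Packing lanes this way lets a single 32-bit load or store move four FP8 values at once, which is the usual motivation for vectorized packing.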

* workaround for competition

* enhance autotune

* autotune cache fix

* Implement validation for unused keys in AutoTuner configuration

* Added a check in the AutoTuner class to raise a ValueError if there are unused keys in the configuration, enhancing error handling and ensuring configuration integrity.
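The unused-key check could look roughly like the sketch below. The helper name `check_unused_keys` is hypothetical, and comparing the configuration against the kernel function's parameter names is an assumption; the commit message does not say how the real `AutoTuner` decides which keys count as used.

```python
import inspect


def check_unused_keys(config, kernel_fn):
    """Raise ValueError if `config` has keys that `kernel_fn` does not accept.

    Assumption: a key is "used" when it matches a parameter name of kernel_fn.
    """
    accepted = set(inspect.signature(kernel_fn).parameters)
    unused = sorted(set(config) - accepted)
    if unused:
        raise ValueError(f"Unused keys in config: {unused}")
```

Failing fast here surfaces a misspelled tuning parameter immediately, where it would otherwise be silently ignored.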

* lint fix

* revert changes of threads

* Update pipelining in `example_mla_decode.py` to improve performance

* Changed the number of stages in the pipelined loop from 0 to 2, enhancing the efficiency of the attention mechanism in the decoding process.

* Enhance Cython kernel validation by adding tensor attribute checks

* Updated the `CythonKernelWrapper` to include dedicated methods for validating tensor device, dtype, and static shape.
* Modified the `forward` method to utilize these new validation methods, improving error handling and ensuring input integrity.
* Updated the `lambda_forward` function in `CythonKernelAdapter` to reflect changes in validation parameters.
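The three checks described above (device, dtype, static shape) can be sketched in plain Python as follows. The function names and the convention of marking dynamic dimensions with `None` are assumptions for illustration; this is not the actual `CythonKernelWrapper` code, and it only relies on the tensor exposing `device`, `dtype`, and `shape` attributes.

```python
def check_device(tensor, expected_device, name):
    """Reject a tensor that lives on a different device than expected."""
    if tensor.device != expected_device:
        raise ValueError(
            f"{name}: expected device {expected_device}, got {tensor.device}")


def check_dtype(tensor, expected_dtype, name):
    """Reject a tensor whose dtype differs from the compiled kernel's dtype."""
    if tensor.dtype != expected_dtype:
        raise ValueError(
            f"{name}: expected dtype {expected_dtype}, got {tensor.dtype}")


def check_static_shape(tensor, expected_shape, name):
    """Compare only the static dimensions; None marks a dynamic dimension."""
    actual = tuple(tensor.shape)
    if len(actual) != len(expected_shape):
        raise ValueError(
            f"{name}: expected rank {len(expected_shape)}, got {len(actual)}")
    for axis, (got, want) in enumerate(zip(actual, expected_shape)):
        if want is not None and got != want:
            raise ValueError(
                f"{name}: dim {axis} expected {want}, got {got}")
```

Keeping each check in its own function lets the caller report exactly which attribute of which input was wrong.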

