-
Lei Wang authored
* [Enhancement] Add support for new FP8 types in HIP code generation * Updated `PrintConst` function in `codegen_hip.cc` to handle `float8_e4m3fnuz` type. * Introduced new functions in `hip_fp8.h` for creating FP8 types, including `make_fp8_e4_4_t` and `make_fp8_e4_8_t`, enhancing type handling for FP8 data structures. * Improved overall compatibility and performance for FP8 data types in HIP. * workaround for competition * enhance autotune * autotune cache fix * Implement validation for unused keys in AutoTuner configuration * Added a check in the AutoTuner class to raise a ValueError if there are unused keys in the configuration, enhancing error handling and ensuring configuration integrity. * lint fix * revert changes of threads * Update pipelining in `example_mla_decode.py` to improve performance * Changed the number of stages in the pipelined loop from 0 to 2, enhancing the efficiency of the attention mechanism in the decoding process. * Enhance Cython kernel validation by adding tensor attribute checks * Updated the `CythonKernelWrapper` to include dedicated methods for validating tensor device, dtype, and static shape. * Modified the `forward` method to utilize these new validation methods, improving error handling and ensuring input integrity. * Updated the `lambda_forward` function in `CythonKernelAdapter` to reflect changes in validation parameters.
319bc6b1