    [AMD][Enhancement] Add support for Vectorized FP8 DataPacking (#542) · 319bc6b1
    Lei Wang authored
    * [Enhancement] Add support for new FP8 types in HIP code generation
    
    * Updated `PrintConst` function in `codegen_hip.cc` to handle `float8_e4m3fnuz` type.
    * Introduced new functions in `hip_fp8.h` for creating FP8 types, including `make_fp8_e4_4_t` and `make_fp8_e4_8_t`, enhancing type handling for FP8 data structures.
    * Improved overall compatibility and performance for FP8 data types in HIP.
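As a rough illustration of what packing FP8 values into a wider vector word involves, here is a minimal Python sketch. It is not the code in `hip_fp8.h`: the names `pack_fp8x4` and `unpack_fp8x4` are hypothetical, and the lane order (first value in the lowest byte) is an assumption made for the example. Each FP8 value is treated as a raw 8-bit pattern; no float conversion is performed.

```python
def pack_fp8x4(b0: int, b1: int, b2: int, b3: int) -> int:
    """Pack four raw FP8 bit patterns (0..255 each) into one 32-bit word.

    Assumption: lane 0 occupies the least significant byte.
    """
    lanes = (b0, b1, b2, b3)
    for lane in lanes:
        if not 0 <= lane <= 0xFF:
            raise ValueError(f"FP8 bit pattern out of range: {lane}")
    word = 0
    for i, lane in enumerate(lanes):
        word |= lane << (8 * i)  # place lane i in byte i
    return word


def unpack_fp8x4(word: int) -> tuple:
    """Recover the four 8-bit lanes from a packed 32-bit word."""
    return tuple((word >> (8 * i)) & 0xFF for i in range(4))
```

Packing several 8-bit values into one word lets a single wide load or store move them together, which is the general idea behind a vectorized FP8 type.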
    
    * workaround for competition
    
    * enhance autotune
    
    * autotune cache fix
    
    * Implement validation for unused keys in AutoTuner configuration
    
    * Added a check in the `AutoTuner` class that raises a `ValueError` if the configuration contains unused keys, improving error handling and ensuring configuration integrity.
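The unused-key check described above could look roughly like the following sketch. It is an assumption about the general approach, not the actual `AutoTuner` code: `find_unused_keys`, `validate_config`, and `example_kernel` are hypothetical names, and "unused" is taken here to mean "not a parameter of the kernel function".

```python
import inspect


def find_unused_keys(config: dict, kernel_fn) -> list:
    """Return config keys that do not match any parameter of kernel_fn."""
    params = inspect.signature(kernel_fn).parameters
    # A **kwargs parameter accepts any key, so nothing counts as unused.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(key for key in config if key not in params)


def validate_config(config: dict, kernel_fn) -> None:
    """Raise ValueError if the config carries keys the kernel never reads."""
    unused = find_unused_keys(config, kernel_fn)
    if unused:
        raise ValueError(f"Unused keys in config: {unused}")


def example_kernel(block_M, block_N, num_stages):
    """Hypothetical kernel factory used only to demonstrate the check."""
    return (block_M, block_N, num_stages)
```

Failing early on an unused key catches misspelled tuning parameters that would otherwise be silently ignored during a tuning run.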
    
    * lint fix
    
    * revert changes of threads
    
    * Update pipelining in `example_mla_decode.py` to improve performance
    
    * Changed the number of stages in the pipelined loop from 0 to 2, enhancing the efficiency of the attention mechanism in the decoding process.
    
    * Enhance Cython kernel validation by adding tensor attribute checks
    
    * Updated the `CythonKernelWrapper` to include dedicated methods for validating tensor device, dtype, and static shape.
    * Modified the `forward` method to utilize these new validation methods, improving error handling and ensuring input integrity.
    * Updated the `lambda_forward` function in `CythonKernelAdapter` to reflect changes in validation parameters.
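The validation split described above (separate checks for device, dtype, and static shape) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `TensorInfo` is a hypothetical stand-in for a real tensor, the function names are invented for the example, and `None` is assumed to mark a dynamic dimension that is not checked. The real `CythonKernelWrapper` methods may differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TensorInfo:
    """Hypothetical stand-in carrying only the attributes being validated."""
    device: str
    dtype: str
    shape: Tuple[int, ...]


def check_device(t: TensorInfo, expected: str) -> None:
    if t.device != expected:
        raise ValueError(f"expected device {expected}, got {t.device}")


def check_dtype(t: TensorInfo, expected: str) -> None:
    if t.dtype != expected:
        raise ValueError(f"expected dtype {expected}, got {t.dtype}")


def check_static_shape(t: TensorInfo, expected: Tuple[Optional[int], ...]) -> None:
    """Compare only static dimensions; None marks a dynamic dimension."""
    if len(t.shape) != len(expected):
        raise ValueError(f"expected rank {len(expected)}, got {len(t.shape)}")
    for axis, (actual, want) in enumerate(zip(t.shape, expected)):
        if want is not None and actual != want:
            raise ValueError(f"axis {axis}: expected {want}, got {actual}")


def validate_input(t: TensorInfo, device: str, dtype: str,
                   shape: Tuple[Optional[int], ...]) -> None:
    """Run all three checks before the kernel would be launched."""
    check_device(t, device)
    check_dtype(t, dtype)
    check_static_shape(t, shape)
```

Keeping each check in its own function lets the error message name the attribute that failed.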
common.h 3.8 KB