    [Bugfix] Add missing FP8 header include (#752) · cf7be057
    Lei Wang authored
    
    
    * [Enhancement] Add DispatchInstruction specialization for fp8 types in gemm_sm90.h
    
    - Added `DispatchInstruction` template specializations for the `fp8_e4_t` and `fp8_e5_t` types, so CUDA GEMM operations can dispatch on fp8 operands.
    - Each specialization defines the `MMA` and `MMA_Group` types for its configuration.
    Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
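
The specialization pattern described above can be sketched roughly as follows. This is a minimal illustration in plain C++, not the code from gemm_sm90.h: the stub element types, the stub MMA atoms, `GroupStub`, and the three-parameter template signature are all assumptions made here so the example compiles without CUDA headers. Only the names `DispatchInstruction`, `MMA`, and `MMA_Group` come from the commit message.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins: the real fp8_e4_t / fp8_e5_t element types and the
// real MMA atom types are NOT reproduced here.
struct fp8_e4_stub {};
struct fp8_e5_stub {};
struct MmaAtomE4Stub {};
struct MmaAtomE5Stub {};

// Assumed placeholder for a tile/group descriptor.
template <int WarpsM, int WarpsN>
struct GroupStub {
  static constexpr int warps_m = WarpsM;
  static constexpr int warps_n = WarpsN;
};

// The primary template is declared but never defined, so an element type
// without a specialization fails at compile time instead of falling back.
// The parameter list is an assumption for this sketch.
template <typename Elem, int WarpsM, int WarpsN>
struct DispatchInstruction;

// Specialization for the e4 stub: picks its MMA atom and group type.
template <int WarpsM, int WarpsN>
struct DispatchInstruction<fp8_e4_stub, WarpsM, WarpsN> {
  using MMA = MmaAtomE4Stub;
  using MMA_Group = GroupStub<WarpsM, WarpsN>;
};

// Specialization for the e5 stub: same shape, different MMA atom.
template <int WarpsM, int WarpsN>
struct DispatchInstruction<fp8_e5_stub, WarpsM, WarpsN> {
  using MMA = MmaAtomE5Stub;
  using MMA_Group = GroupStub<WarpsM, WarpsN>;
};

// Compile-time check that dispatch selects the expected atom.
static_assert(
    std::is_same<DispatchInstruction<fp8_e4_stub, 2, 2>::MMA,
                 MmaAtomE4Stub>::value,
    "e4 stub should dispatch to the e4 MMA atom");
```

Leaving the primary template undefined is a design choice in this sketch: a GEMM instantiated with an unsupported element type then stops at compile time rather than selecting a wrong instruction.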
    
    * [Enhancement] Include cuda_fp8.h in gemm_sm90.h
    
    - Included the "cuda_fp8.h" header so that the fp8 types used in CUDA GEMM operations are available.
    Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
    
    * lint fix
    
    * [Refactor] Remove unused tl_shuffle_elect and related functions from common.h
    
    - Deleted the `tl_shuffle_elect` function and its associated comments to streamline the codebase.
    - Included "intrin.h" to provide intrinsic support for CUDA operations.
    - Removed unnecessary template parameters and functions from the file.
    
    * lint fix
    
    * [Refactor] Update header inclusions in common.h and gemm_sm90.h
    
    - Removed the "intrin.h" include from common.h to reduce its dependencies.
    - Included "intrin.h" in gemm_sm90.h instead, so the intrinsics remain available to the CUDA GEMM code.
    
    * bug fix
intrin.h 2.32 KB