    [Refactor] Use new namespace and enhance dispatch macros for mma (#801) · b62a0b43
    Lei Wang authored
    * Refactor CUDA GEMM operations to use new namespace and enhance dispatch macros
    
    - Moved GEMM-related dispatch instructions to the `cute::tl_mma` namespace for better organization.
    - Introduced `TL_DISPATCH_MMA` and `TL_DISPATCH_MMA_TEMPLATE` macros to streamline the definition of dispatch instructions for various data types and architectures.
    - Extended the CUDA architecture checks to cover newer architectures.
    - Restructured the layout of the dispatch instructions to improve clarity and maintainability.
    - Ensured consistent usage of tensor views and memory clearing operations across different GEMM implementations.
    
    * Remove deprecated `DispatchInstruction` templates and `tl_mma` namespace from CUDA GEMM implementation. This cleanup enhances code clarity and maintainability by eliminating unused structures and streamlining the overall organization of the GEMM operations.
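
The commit message names the `TL_DISPATCH_MMA` macro but does not show its definition. As a rough illustration of the general pattern (a macro that stamps out one template specialization per operand-type combination), here is a minimal, host-only sketch. Everything in it is hypothetical and chosen for illustration: the `sketch` namespace, `SketchDispatch`, `SKETCH_DISPATCH_MMA`, the stand-in element types, and the instruction tags. It does not use CuTe and is not the definition found in `gemm_sm90.h`.

```cpp
#include <string_view>
#include <type_traits>

// Stand-in element types (hypothetical). Real code would use the library's
// half/bfloat16 types; empty tags keep this sketch host-only.
struct half_t {};
struct bfloat16_t {};

// Stand-in instruction tags (hypothetical). Real code would name MMA atoms.
struct MmaF16F16F32 {
  static constexpr std::string_view name = "mma.f16.f16.f32";
};
struct MmaBF16BF16F32 {
  static constexpr std::string_view name = "mma.bf16.bf16.f32";
};

namespace sketch {

// Primary template is declared but never defined, so an unsupported
// combination of operand types is rejected at compile time.
template <typename A, typename B, typename C>
struct SketchDispatch;

// One macro invocation emits one full specialization that maps an
// (A, B, C) type triple to an instruction tag.
#define SKETCH_DISPATCH_MMA(A_T, B_T, C_T, MMA_T) \
  template <>                                     \
  struct SketchDispatch<A_T, B_T, C_T> {          \
    using MMA = MMA_T;                            \
  };

SKETCH_DISPATCH_MMA(half_t, half_t, float, MmaF16F16F32)
SKETCH_DISPATCH_MMA(bfloat16_t, bfloat16_t, float, MmaBF16BF16F32)

#undef SKETCH_DISPATCH_MMA

// Helper that resolves the instruction name for a type triple.
template <typename A, typename B, typename C>
constexpr std::string_view instruction_name() {
  return SketchDispatch<A, B, C>::MMA::name;
}

// Compile-time check that the dispatch table resolves as intended.
static_assert(
    std::is_same_v<SketchDispatch<half_t, half_t, float>::MMA, MmaF16F16F32>);

}  // namespace sketch
```

Under these assumptions, supporting another type combination costs one extra `SKETCH_DISPATCH_MMA(...)` line, and requesting an unregistered combination such as `sketch::instruction_name<float, float, float>()` fails to compile because the primary template has no definition. The sketch requires C++17.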
gemm_sm90.h 15.2 KB