    [Refactor] Update GEMM layout and operand traits for improved CUDA compatibility (#500) · 33937683
    Lei Wang authored
    * [Enhancement] Improve GEMM layout function and documentation
    
    * Added detailed documentation for the makeGemmABLayout function, explaining parameters and layout selection strategies.
    * Updated the layout selection logic to use mat_continuous consistently, enhancing clarity and correctness in memory layout calculations.
    * Adjusted the InferLayout method to reflect changes in the layout function, ensuring accurate matrix dimension handling for transposed cases.
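    Using mat_continuous consistently matters because transposing an operand changes which of its extents is contiguous in shared memory, and the swizzle choice should follow that extent. The following is a minimal C++ sketch of that idea. It assumes the convention A: M x K (K x M when transposed) and B: K x N (N x K when transposed), and it assumes a "widest swizzle that divides one contiguous row" rule with 128/64/32-byte thresholds. All names (SwizzleKind, ContinuousExtent, ChooseSwizzle, SelectLayoutForA) are hypothetical and are not the makeGemmABLayout signature.

```cpp
#include <cassert>

// Hypothetical names throughout: this illustrates selecting a shared-memory
// layout from the contiguous extent (mat_continuous). It is not the tilelang
// makeGemmABLayout implementation.
enum class SwizzleKind { kFullBank128B, kHalfBank64B, kQuarterBank32B, kNone };

// Contiguous (innermost) extent of an operand tile, assuming the convention
// A: M x K (K x M when transposed), B: K x N (N x K when transposed).
int ContinuousExtent(bool is_a, bool transposed, int m_or_n, int k) {
  if (is_a) return transposed ? m_or_n : k;
  return transposed ? k : m_or_n;
}

// Pick the widest swizzle whose byte width evenly divides one contiguous row.
SwizzleKind ChooseSwizzle(int mat_continuous, int element_bits) {
  assert(mat_continuous > 0 && element_bits > 0);
  const long row_bits = static_cast<long>(mat_continuous) * element_bits;
  if (row_bits % (128 * 8) == 0) return SwizzleKind::kFullBank128B;
  if (row_bits % (64 * 8) == 0) return SwizzleKind::kHalfBank64B;
  if (row_bits % (32 * 8) == 0) return SwizzleKind::kQuarterBank32B;
  return SwizzleKind::kNone;  // fall back to an unswizzled layout
}

// Operand A: trans_a decides whether M or K is the contiguous extent.
SwizzleKind SelectLayoutForA(bool trans_a, int m, int k, int element_bits) {
  return ChooseSwizzle(ContinuousExtent(/*is_a=*/true, trans_a, m, k),
                       element_bits);
}
```

    For example, a 128 x 32 fp16 tile of A has a 32-element (64-byte) contiguous row when stored M x K, but a 128-element (256-byte) row when transposed, so under these assumptions the two cases pick different swizzles.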
    
    * Lint fix
    
    * [Refactor] Update GEMM layout and operand traits for improved CUDA compatibility
    
    * Adjusted the InferLayout method in gemm.cc to include trans_A in fragment creation, enhancing layout inference for transposed matrices.
    * Updated OperandTraits in gemm_sm89.h and gemm_sm90.h to change the Copy type from SM75_U16x4_LDSM_N to SM75_U16x4_LDSM_T, switching to the transposed (.trans) form of ldmatrix to optimize memory access patterns for different warp configurations.
    * Enhanced static assertions in gemm_sm90.h to clarify requirements for num_warp_m, ensuring compatibility with Hopper architecture.
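    The commit only says the assertion clarifies the num_warp_m requirement for Hopper. The condition used in the C++ sketch below (num_warp_m a multiple of 4) is an assumption, motivated by two facts about Hopper: a wgmma is issued by a warpgroup of four consecutive warps, and every wgmma shape covers 64 rows of M. The trait name HopperWarpPartitionM and its members are hypothetical and do not reflect the contents of gemm_sm90.h.

```cpp
// Hypothetical sketch of a Hopper warp-partition check along M. The trait
// name and members are illustrative, not the contents of gemm_sm90.h.
constexpr int kWarpsPerWarpgroup = 4;  // a warpgroup is 4 consecutive warps
constexpr int kWgmmaM = 64;            // wgmma shapes always have M = 64

template <int M, int num_warp_m>
struct HopperWarpPartitionM {
  static_assert(num_warp_m > 0, "num_warp_m must be positive");
  static_assert(num_warp_m % kWarpsPerWarpgroup == 0,
                "num_warp_m must be a multiple of 4 so that whole warpgroups "
                "issue wgmma along M");
  // Warpgroups stacked along M.
  static constexpr int kWarpgroupsM = num_warp_m / kWarpsPerWarpgroup;
  static_assert(M % (kWarpgroupsM * kWgmmaM) == 0,
                "M must split into 64-row wgmma tiles per warpgroup");
  // Number of 64-row wgmma tiles each warpgroup covers along M.
  static constexpr int kTilesPerWarpgroupM = M / (kWarpgroupsM * kWgmmaM);
};
```

    Instantiating, for example, HopperWarpPartitionM<128, 2> fails at compile time with the num_warp_m message, which is the kind of early, readable failure a clarified static assertion is meant to give.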
    
    * [Refactor] Clean up formatting in GEMM implementation and CUDA templates
    
    * Simplified the formatting of the fragment creation in the InferLayout method of gemm.cc for better readability.
    * Adjusted the static assertion message in gemm_sm90.h to enhance clarity regarding the num_warp_m requirement for Hopper architecture.