    [Enhancement] Support cute mma tile mxn8ky (#434) · d1c15bc5
    Lei Wang authored
    * [Enhancement] Improve error handling in layout inference and update profiler type in tests
    
    * Added a detailed error message in the layout inference for local.fragment to clarify the requirement for trans_B.
    * Updated the profiler type in the cumulative sum test from TensorSupplyType.One to TensorDistributionType.Randn for better profiling accuracy.
    
    * lint fix
    
    * [Refactor] Update OperandTraits to include num_warp_n parameter
    
    * Modified OperandTraits templates across gemm_sm80.h, gemm_sm89.h, and gemm_sm90.h to include an additional num_warp_n parameter for improved flexibility in layout and copy operations.
    * Adjusted Copy type selection based on the new parameter to enhance performance and adaptability in various scenarios.
    
    * lint fix
    
    * [Refactor] Update DispatchInstruction templates to include N parameter
    
    * Modified DispatchInstruction templates in gemm_sm80.h, gemm_sm89.h, and gemm_sm90.h to include an additional N parameter, enhancing flexibility in tile size calculations.
    * Adjusted MMA_Group definitions to use std::min for improved handling of warp sizes, ensuring better performance and adaptability in various scenarios.
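
The std::min adjustment described above can be illustrated with a minimal sketch. It is not the code from gemm_sm80.h: the helper name mma_group_n and the per-warp extent of 16 are assumptions chosen for illustration. The idea shown is that the N extent of the MMA group is capped at the tile's N, so a narrow tile is not over-covered when several warps are laid out along N.

```cpp
#include <algorithm>

// Hypothetical constant: the N extent one warp contributes to an MMA group.
// The value 16 is an assumption for this sketch, not taken from the commit.
constexpr int kWarpGroupN = 16;

// Hypothetical helper: clamp the group's N extent so it never exceeds N.
// std::min is constexpr from C++14 onward, so this works at compile time.
constexpr int mma_group_n(int num_warp_n, int N) {
  return std::min(num_warp_n * kWarpGroupN, N);
}

// Wide tile: the warp layout is the limiting factor (4 * 16 = 64 <= 128).
static_assert(mma_group_n(4, 128) == 64, "warp layout limits the group");
// Narrow tile: N is the limiting factor, so the group shrinks to fit it.
static_assert(mma_group_n(4, 32) == 32, "group clamped to N");
```

In a template setting the same expression would typically appear as a non-type template argument, with num_warp_n and N supplied by the enclosing template's parameters.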
gemm.cc 11.2 KB