[Enhancement] Support cute mma tile mxn8ky (#434)
* [Enhancement] Improve error handling in layout inference and update profiler type in tests

* Added a detailed error message in the layout inference for local.fragment to clarify the requirement for trans_B.
* Updated the profiler type in the cumulative sum test from TensorSupplyType.One to TensorDistributionType.Randn for better profiling accuracy.

* lint fix

* [Refactor] Update OperandTraits to include num_warp_n parameter

* Modified OperandTraits templates across gemm_sm80.h, gemm_sm89.h, and gemm_sm90.h to include an additional num_warp_n parameter for improved flexibility in layout and copy operations.
* Adjusted Copy type selection based on the new parameter to enhance performance and adaptability in various scenarios.

* lint fix

* [Refactor] Update DispatchInstruction templates to include N parameter

* Modified DispatchInstruction templates in gemm_sm80.h, gemm_sm89.h, and gemm_sm90.h to include an additional N parameter, enhancing flexibility in tile size calculations.
* Adjusted MMA_Group definitions to use std::min for improved handling of warp sizes, ensuring better performance and adaptability in various scenarios.
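
The MMA_Group adjustment described above can be illustrated with a minimal, self-contained sketch. It is not the code from gemm_sm80.h, gemm_sm89.h, or gemm_sm90.h: `DispatchSketch`, `group_n`, and the per-warp extent `kWarpTileN = 16` are hypothetical names and values chosen only to show how clamping with std::min keeps a group extent from exceeding the tile's N dimension.

```cpp
#include <algorithm>

// Hypothetical per-warp extent along N. The value 16 is an assumption
// for illustration and is not taken from the commit.
constexpr int kWarpTileN = 16;

// Hypothetical stand-in for a dispatch trait that receives both the
// number of warps along N and the tile's N dimension.
template <int num_warp_n, int N>
struct DispatchSketch {
  // Clamp the group extent so it never exceeds N. Without the clamp,
  // a small N combined with several warps would request a group that
  // is wider than the tile itself.
  static constexpr int group_n = std::min(num_warp_n * kWarpTileN, N);
};

// Large tile: the warps' combined extent (4 * 16) is the limit.
static_assert(DispatchSketch<4, 128>::group_n == 64, "warp-limited");
// Small tile (N = 8): the tile's own N dimension is the limit.
static_assert(DispatchSketch<4, 8>::group_n == 8, "tile-limited");
```

Under these assumptions, passing N into the trait is what makes the clamp possible at compile time; a trait that only knows num_warp_n cannot tell when the tile is narrower than the warps' combined extent.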