"...composable_kernel_onnxruntime.git" did not exist on "913afaeb5d5ec0732d7b5a3da468ffd07609538e"
[Refactor] Use new namespace and enhance dispatch macros for mma (#801)
* Refactor CUDA GEMM operations to use the new namespace and enhanced dispatch macros
  - Moved GEMM-related dispatch instructions into the `cute::tl_mma` namespace for better organization.
  - Introduced `TL_DISPATCH_MMA` and `TL_DISPATCH_MMA_TEMPLATE` macros to streamline the definition of dispatch instructions for various data types and architectures.
  - Updated the CUDA architecture checks to include support for newer architectures.
  - Improved clarity and maintainability by restructuring the layout and organization of the dispatch instructions.
  - Ensured consistent usage of tensor views and memory-clearing operations across the different GEMM implementations.

* Remove deprecated `DispatchInstruction` templates and the `tl_mma` namespace from the CUDA GEMM implementation. This cleanup improves code clarity and maintainability by eliminating unused structures and streamlining the overall organization of the GEMM operations.
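The sketch below illustrates the dispatch-macro pattern the commit describes: a templated `DispatchInstruction` trait inside `cute::tl_mma` whose specializations are stamped out by a macro, one per (data type, architecture) combination. Only the macro name `TL_DISPATCH_MMA` and the `cute::tl_mma` namespace come from the commit message; the macro's argument list, the `DispatchInstruction` signature, and the placeholder `half_t` / `SM80_F16F16F32_Atom` types are hypothetical stand-ins, not the repository's actual code.

```cpp
// Minimal, self-contained sketch of the dispatch pattern (assumptions noted above).
#include <cstdio>

namespace cute {
namespace tl_mma {

// Stand-in for a concrete CUTE MMA atom selected by the dispatch logic.
struct SM80_F16F16F32_Atom {
  static constexpr const char *name = "sm80 f16 x f16 -> f32";
};

// Stand-in for the real half-precision element type.
struct half_t {};

// Primary template: maps (A type, B type, accumulator type, architecture tag)
// to the MMA instruction to use. Specializations are generated by the macro.
template <typename AType, typename BType, typename CType, int ArchTag>
struct DispatchInstruction;

// Macro that stamps out one specialization per supported combination,
// avoiding repetitive boilerplate for every dtype/architecture pair.
#define TL_DISPATCH_MMA(ATYPE, BTYPE, CTYPE, ARCH, ATOM)                      \
  template <> struct DispatchInstruction<ATYPE, BTYPE, CTYPE, ARCH> {         \
    using MMA = ATOM;                                                          \
  };

// Example specialization; real code would guard this with architecture
// checks such as __CUDA_ARCH__ >= 800.
TL_DISPATCH_MMA(half_t, half_t, float, 80, SM80_F16F16F32_Atom)

} // namespace tl_mma
} // namespace cute

int main() {
  // Resolve the dispatch at compile time and print which atom was chosen.
  using Dispatch = cute::tl_mma::DispatchInstruction<cute::tl_mma::half_t,
                                                     cute::tl_mma::half_t,
                                                     float, 80>;
  std::printf("selected MMA: %s\n", Dispatch::MMA::name);
  return 0;
}
```

Centralizing the specializations behind a macro keeps the per-architecture dispatch table short and makes adding support for a new data type or architecture a one-line change.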