src/tl_templates/cuda/gemm_sm90.h · d9a0f13176253cc1cc2b114b2314163787f1663f · OpenDAS / tilelang

[Refactor] Use new namespace and enhance dispatch macros for mma (#801) · b62a0b43

Lei Wang authored Sep 11, 2025

* Refactor CUDA GEMM operations to use new namespace and enhance dispatch macros

- Moved GEMM-related dispatch instructions to the `cute::tl_mma` namespace for better organization.
- Introduced `TL_DISPATCH_MMA` and `TL_DISPATCH_MMA_TEMPLATE` macros to streamline the definition of dispatch instructions for various data types and architectures.
- Updated the handling of CUDA architecture checks to include additional support for newer architectures.
- Improved clarity and maintainability of the code by restructuring the layout and organization of dispatch instructions.
- Ensured consistent usage of tensor views and memory clearing operations across different GEMM implementations.

* Removed the deprecated `DispatchInstruction` templates and the `tl_mma` namespace from the CUDA GEMM implementation. This cleanup eliminates unused structures and simplifies the organization of the GEMM operations.


gemm_sm90.h 15.2 KB
