src/tl_templates/cuda/intrin.h · 77e31e52b4d87e25eb828ba1b176a08b4a44bfd4 · OpenDAS / tilelang

[Bugfix] Add missing FP8 header include (#752) · cf7be057

Lei Wang authored Aug 24, 2025



* [Enhancement] Add DispatchInstruction specialization for fp8 types in gemm_sm90.h

- Introduced specialized DispatchInstruction templates for fp8_e4_t and fp8_e5_t types, enhancing support for new data formats in CUDA GEMM operations.
- Each specialization defines the corresponding MMA and MMA_Group types, optimizing performance for specific configurations.
Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>

* [Enhancement] Include cuda_fp8.h in gemm_sm90.h

- Added the inclusion of the "cuda_fp8.h" header file to support new data formats in CUDA GEMM operations, enhancing compatibility with recent updates for fp8 types.
Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>

* lint fix

* [Refactor] Remove unused tl_shuffle_elect and related functions from common.h

- Deleted the `tl_shuffle_elect` function and its associated comments to streamline the codebase.
- Added inclusion of "intrin.h" for improved intrinsic support in CUDA operations.
- Cleaned up the file by removing unnecessary template parameters and functions, enhancing clarity and maintainability.

* lint fix

* [Refactor] Update header inclusions in common.h and gemm_sm90.h

- Removed the inclusion of "intrin.h" from common.h to streamline dependencies.
- Added "intrin.h" inclusion in gemm_sm90.h to ensure intrinsic support for CUDA operations, enhancing functionality and maintainability.

* bug fix

cf7be057

intrin.h 2.32 KB

Replace intrin.h