[Enhancement] Add DispatchInstruction specialization for fp8 types in gemm_sm90.h (#751)
- Introduced specialized DispatchInstruction templates for the fp8_e4_t and fp8_e5_t types, extending CUDA GEMM support to these data formats.
- Each specialization defines the corresponding MMA and MMA_Group types, optimizing performance for specific configurations.
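
For readers unfamiliar with the pattern, the sketch below illustrates how a per-type specialization can expose MMA and MMA_Group aliases. It is not the code from this commit: the template parameter list, the placeholder structs (Fp8E4Atom, Fp8E5Atom, GroupShape) and the tile numbers are assumptions made up for illustration, and fp8_e4_t / fp8_e5_t are redefined here as simple stand-ins so the snippet compiles on its own. The real header presumably maps to actual MMA instruction types instead.

```cpp
#include <cstdint>
#include <type_traits>

// Stand-ins for the fp8 element types (assumption: the real header
// defines these elsewhere; only the names come from the commit message).
struct fp8_e4_t { std::uint8_t bits; };
struct fp8_e5_t { std::uint8_t bits; };

// Hypothetical placeholder tags standing in for real MMA instruction types.
struct Fp8E4Atom {};
struct Fp8E5Atom {};

// Hypothetical tile descriptor standing in for the real MMA_Group type.
template <int M, int N, int K>
struct GroupShape {
  static constexpr int m = M;
  static constexpr int n = N;
  static constexpr int k = K;
};

// Primary template is declared but not defined, so an unsupported type
// combination fails at compile time instead of silently picking a default.
// The parameter list is an assumption, not the signature in gemm_sm90.h.
template <typename A, typename B, typename C, int num_warp_m, int num_warp_n>
struct DispatchInstruction;

// Specialization for fp8_e4_t inputs with float accumulation.
// The tile numbers are illustrative only.
template <int num_warp_m, int num_warp_n>
struct DispatchInstruction<fp8_e4_t, fp8_e4_t, float, num_warp_m, num_warp_n> {
  using MMA = Fp8E4Atom;
  using MMA_Group = GroupShape<16 * num_warp_m, 8 * num_warp_n, 32>;
};

// Specialization for fp8_e5_t inputs with float accumulation.
template <int num_warp_m, int num_warp_n>
struct DispatchInstruction<fp8_e5_t, fp8_e5_t, float, num_warp_m, num_warp_n> {
  using MMA = Fp8E5Atom;
  using MMA_Group = GroupShape<16 * num_warp_m, 8 * num_warp_n, 32>;
};

// Example lookup: the element types select the specialization.
using E4Dispatch = DispatchInstruction<fp8_e4_t, fp8_e4_t, float, 4, 1>;
using E5Dispatch = DispatchInstruction<fp8_e5_t, fp8_e5_t, float, 2, 2>;
```

A kernel written against this pattern would read `typename DispatchInstruction<...>::MMA` and `::MMA_Group` without branching on the element type, which is why adding a new data format only requires a new specialization.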