[Bugfix] Legalize Datatype for mma intrinsic codegen (#1179)
* fix
* lint fix
* Enhance CUDA code generation by updating register type handling for float data types. Introduce a workaround for TF32 type compatibility and improve the registration of MMA register types for the A and B operands.
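
For readers unfamiliar with the idea, the kind of legalization described above might be sketched as follows. This is an illustrative sketch only, not the code from this change: the names `MMA_AB_REGISTER`, `legalize_ab_dtype`, and `ab_register_type` are hypothetical, and the assumption that `float32` A/B operands are reinterpreted as TF32 is a guess at the workaround's intent. The packing counts follow the PTX `mma.sync` fragment layout, in which A and B fragments for 16-bit, TF32, and 8-bit inputs live in 32-bit registers.

```python
# Hypothetical sketch: none of these names come from the actual patch.
# Maps an operand dtype to (C register type, elements packed per register),
# following the PTX mma.sync fragment layout for the A and B operands.
MMA_AB_REGISTER = {
    "float16": ("unsigned", 2),   # two f16 values per 32-bit register
    "bfloat16": ("unsigned", 2),  # two bf16 values per 32-bit register
    "tf32": ("unsigned", 1),      # one tf32 value per 32-bit register
    "int8": ("unsigned", 4),      # four 8-bit values per 32-bit register
    "float64": ("double", 1),     # one f64 value per 64-bit register
}


def legalize_ab_dtype(dtype: str) -> str:
    """Assumed workaround: treat float32 A/B operands as TF32."""
    return "tf32" if dtype == "float32" else dtype


def ab_register_type(dtype: str) -> str:
    """Return the C type used to declare an A/B fragment register."""
    legal = legalize_ab_dtype(dtype)
    if legal not in MMA_AB_REGISTER:
        raise ValueError(f"unsupported MMA operand dtype: {dtype}")
    return MMA_AB_REGISTER[legal][0]


if __name__ == "__main__":
    for name in ("float16", "float32", "float64"):
        print(name, "->", ab_register_type(name))
```

Keeping the dtype legalization separate from the register lookup means the TF32 special case sits in one place instead of being repeated at every site that emits a register declaration.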