• [Bugfix] Legalize Datatype for mma intrinsic codegen (#1179) · 7c61d31a
    Lei Wang authored
    * fix
    
    * lint fix
    
    * Enhance CUDA code generation by updating register type handling for float data types. Introduce a workaround for TF32 type compatibility and improve the registration of MMA register types for A and B operands.
codegen_cuda.cc 120 KB