Lei Wang authored
* Enhance CUDA code generation by improving register type handling for float data types and introducing a workaround for TF32 compatibility. Updated MMA register type registration for A and B operands to boost performance and ensure correctness.
* lint fix

---------

Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>
8119550b