[Bugfix][Language V2] Capture closure variables from program (#1206)
* Improve CUDA code generation: refine register type handling for float data types and add a workaround for TF32 compatibility. Update the MMA register type registration for the A and B operands to improve performance and ensure correctness.
* lint fix
---------
Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>