"vscode:/vscode.git/clone" did not exist on "57a8ccf3bafb87e40f62a88d927fcbd01de7eb4c"
- 12 Dec, 2025 1 commit
Lei Wang authored
- 01 Dec, 2025 1 commit
Lei Wang authored
* [Enhancement] Implement dynamic unroll factor in CUDA code generation

This commit introduces support for specifying a dynamic unroll factor in the CUDA code generation. The `unroll_factor` map is added to store unroll factors for loop variables, allowing for more flexible and optimized loop unrolling. Additionally, the `unroll` function is integrated into the loop language, enabling users to define unroll factors directly in their code. This enhancement improves performance by allowing tailored unrolling strategies based on specific loop characteristics.

* lint fix

* [Bugfix] Correct initialization of non-zero counters in custom compress kernel and update TIR registration for gemm_sp_py to use the correct tile operation
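
To illustrate the idea of a per-loop-variable unroll factor, here is a minimal Python sketch of a code emitter. It is not the project's actual code generator: the function `emit_for_loop` and its parameters are hypothetical, and only the name `unroll_factor` is taken from the commit message. The sketch assumes the factor is emitted as the standard CUDA `#pragma unroll N` directive in front of the loop.

```python
def emit_for_loop(var, extent, body, unroll_factor):
    """Return CUDA source text for a counted loop (hypothetical helper).

    unroll_factor maps a loop variable name to an integer factor. When the
    variable has an entry, a `#pragma unroll N` line precedes the loop;
    otherwise the loop is emitted without any pragma.
    """
    lines = []
    factor = unroll_factor.get(var)
    if factor is not None:
        lines.append(f"#pragma unroll {factor}")
    lines.append(f"for (int {var} = 0; {var} < {extent}; ++{var}) {{")
    lines.extend("  " + line for line in body)
    lines.append("}")
    return "\n".join(lines)


# Example: request a factor of 4 for loop variable "i".
source = emit_for_loop("i", 16, ["acc += a[i];"], {"i": 4})
print(source)
```

Keeping the factors in a map keyed by loop variable lets each loop carry its own setting, while loops without an entry are left to the compiler's default behavior.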