"vscode:/vscode.git/clone" did not exist on "b83e3b3bc6cedf64e2cfd13d34ee6f80de8df676"
- 11 Jun, 2025 1 commit
-
-
Yu Cheng authored
* [Feature] Added Support for Synchronizing Grids and Persistent Threadblock Transformation - Defined the sync_grid operation in builtin.cc and builtin.h, allowing synchronization of all threads within a grid. - Implemented support for sync_grid in codegen_cuda.cc, ensuring proper handling of this operation in the generated CUDA code. - Added the PersistThreadblock transformation, enabling the conversion of thread blocks to persistent thread blocks, enhancing support for persistent kernels. - Updated relevant documentation and comments to reflect the addition of new features and usage instructions. * [Example] Add MLA Decode With Persistent Threadblock Example * [Feature] Introduce Persistent Loop and Update GEMM Example - Added a new persistent loop construct in the TIR framework, enabling more efficient kernel execution. - Updated the GEMM example to utilize the new persistent primitive, enhancing performance for matrix multiplication. - Introduced a `loop_break` intrinsic for better control flow within persistent loops. - Updated relevant files to support the new features, including changes in code generation and language interface. * lint fix
-
- 07 Jun, 2025 1 commit
-
-
Yu Cheng authored
* [Enhancement] Fix multi-version buffer index in nested-loop * [Feature] Support persistent kernels and add persistent GEMM example * lint fix * lint fix * [CI] Remove test_tilelang_transform_annotate_device_regions.py
-