[Feature] Add ptx_cp_async_barrier_noinc intrinsic and related functionality (#809)
- Introduced a new intrinsic `ptx_cp_async_barrier_noinc` for handling the `cp.async.mbarrier.arrive.noinc` operation in TileLang.
- Updated the CUDA code generation to support the new barrier operation.
- Added a corresponding function in the TileLang Python API for ease of use.
- Enhanced the barrier handling in CUDA templates to include the new no-increment operation, so that the completion of a thread's pending `cp.async` copies can count toward an mbarrier's expected arrivals without first incrementing its pending count.
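To illustrate what the no-increment variant changes, here is a simplified Python model of an mbarrier's pending count, following the PTX ISA description of `cp.async.mbarrier.arrive`. Without `.noinc`, the pending count is incremented before the asynchronous arrive-on, so the net effect on the count is zero. With `.noinc` it is not incremented, so the asynchronous arrival must be accounted for in the barrier's initial expected count. `MBarrierModel` and its methods are hypothetical illustrations, not part of TileLang or CUDA, and the model ignores transaction counts and phase-parity waits.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MBarrierModel:
    """Hypothetical, simplified model of an mbarrier's pending count (not a real API)."""
    expected_count: int
    pending: int = 0
    phase: int = 0

    def __post_init__(self) -> None:
        # A freshly initialized barrier waits for `expected_count` arrivals.
        self.pending = self.expected_count

    def _decrement(self) -> None:
        # One arrive-on operation; reaching zero completes the current phase.
        self.pending -= 1
        if self.pending == 0:
            self.phase += 1
            self.pending = self.expected_count

    def arrive(self) -> None:
        """Models an ordinary mbarrier.arrive issued by a thread."""
        self._decrement()

    def cp_async_arrive(self, noinc: bool) -> Callable[[], None]:
        """Models cp.async.mbarrier.arrive{.noinc}.

        Returns a callback standing in for the moment the thread's prior
        cp.async operations complete and trigger the arrive-on operation.
        """
        if not noinc:
            # Default form: bump the pending count first, so the later
            # asynchronous decrement has no net effect on the count.
            self.pending += 1
        return self._decrement


# Default form: the async arrival alone never completes the phase.
bar = MBarrierModel(expected_count=2)
done = bar.cp_async_arrive(noinc=False)
done()
print(bar.pending, bar.phase)

# .noinc form: two async arrivals satisfy an expected count of 2.
bar_noinc = MBarrierModel(expected_count=2)
first = bar_noinc.cp_async_arrive(noinc=True)
second = bar_noinc.cp_async_arrive(noinc=True)
first()
second()
print(bar_noinc.pending, bar_noinc.phase)
```

In this model the default form leaves the count unchanged, so the threads still need their own `arrive` calls, while the `.noinc` form lets the copy completions themselves satisfy the barrier, provided the expected count was initialized to include them.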