    [Feature] Add ptx_cp_async_barrier_noinc intrinsic and related functionality (#809) · ae9b7063
    Yu Cheng authored
    - Introduced a new intrinsic, `ptx_cp_async_barrier_noinc`, for the PTX `cp.async.mbarrier.arrive.noinc` instruction in TileLang.
    - Updated the CUDA code generation to support the new barrier operation.
    - Added a corresponding function in the TileLang Python API for ease of use.
    - Extended the barrier handling in the CUDA templates with the new no-increment arrive operation.
builtin.h 8.26 KB