1. 14 Sep, 2025 1 commit
    • Yu Cheng's avatar
      [Feature] Add ptx_cp_async_barrier_noinc intrinsic and related functionality (#809) · ae9b7063
      Yu Cheng authored
      - Introduced a new intrinsic `ptx_cp_async_barrier_noinc` for handling the `cp.async.mbarrier.arrive.noinc` operation in TileLang.
      - Updated the CUDA code generation to support the new barrier operation.
      - Added a corresponding function in the TileLang Python API for ease of use.
      - Enhanced the barrier handling in CUDA templates to include the new no-increment operation, improving synchronization capabilities in parallel execution contexts.
      ae9b7063