"include/ck/utility/tuple_helper.hpp" did not exist on "e823d518cb46ad61ddb3c70eac8529e0a58af1f8"
Yu Cheng authored
- Introduced a new intrinsic `ptx_cp_async_barrier_noinc` for handling the `cp.async.mbarrier.arrive.noinc` operation in TileLang.
- Updated the CUDA code generation to support the new barrier operation.
- Added a corresponding function in the TileLang Python API for ease of use.
- Enhanced the barrier handling in CUDA templates to include the new no-increment operation, improving synchronization capabilities in parallel execution contexts.
ae9b7063