    [Feature] Add TMA Store Synchronization Support (#195) · eba7dd5a
    Yu Cheng authored
    - Introduce TMAStoreArrive and TMAStoreWait operations for CUDA TMA store synchronization
    - Add new builtin operations in op/builtin.cc and op/builtin.h
    - Implement TMAStoreSyncInjector to automatically inject TMA store synchronization calls
    - Update CUDA codegen to support new TMA store synchronization intrinsics
    - Add Python language bindings for new TMA store synchronization operations
codegen_cuda.cc 60.3 KB