Yu Cheng authored
* [Feature] Add TMA Store Synchronization Support
  - Introduce TMAStoreArrive and TMAStoreWait operations for CUDA TMA store synchronization
  - Add new builtin operations in op/builtin.cc and op/builtin.h
  - Implement TMAStoreSyncInjector to automatically inject TMA store synchronization calls
  - Update CUDA codegen to support new TMA store synchronization intrinsics
  - Add Python language bindings for new TMA store synchronization operations
* [CMake] Add CUDA Major Version Detection for Conditional Compilation
  - Introduce CUDA_MAJOR_VERSION CMake variable to dynamically detect CUDA toolkit version
  - Update runtime and transform files to use CUDA_MAJOR_VERSION for version-specific code paths
  - Replace hardcoded __CUDACC_VER_MAJOR__ with dynamically set CUDA_MAJOR_VERSION
  - Improve cross-version compatibility for CUDA-dependent code sections
20f19611