-
Lei Wang authored
- Added conditional compilation for BFLOAT16 atomic operations to ensure compatibility with CUDA architectures greater than 7.5. - Improved code clarity by organizing the AtomicAdd functions and adding relevant comments for better understanding.
9ad9d9cd