-
Lei Wang authored
* [Enhancement] Add atomic addition functions for FLOAT16x2 and FLOAT16x4 in CUDA
* Introduced `AtomicAddx2` and `AtomicAddx4` functions for performing atomic addition operations on double-width float types in CUDA.
* Updated `customize.py` to include the new `atomic_addx4` function for external calls.
* Modified `__init__.py` to export the new atomic addition function, ensuring accessibility in the module.
* lint fix
46798f25