"vscode:/vscode.git/clone" did not exist on "857df64eba915d076fa4b7f6e7c1e4a06c52aab4"
-
Lei Wang authored
[Enhancement] Add atomic addition functions for FLOAT16x2 and FLOAT16x4 in CUDA

* Introduced `AtomicAddx2` and `AtomicAddx4` functions for performing atomic addition operations on double-width float types in CUDA.
* Updated `customize.py` to include the new `atomic_addx4` function for external calls.
* Modified `__init__.py` to export the new atomic addition function, ensuring accessibility in the module.
* lint fix
46798f25