"vscode:/vscode.git/clone" did not exist on "857df64eba915d076fa4b7f6e7c1e4a06c52aab4"
    [Enhancement] Add atomicAdd for FLOAT16x2 and FLOAT16x4 (#522) · 46798f25
    Lei Wang authored
    * [Enhancement] Add atomic addition functions for FLOAT16x2 and FLOAT16x4 in CUDA
    
    * Introduced `AtomicAddx2` and `AtomicAddx4` functions for atomic addition on packed half-precision vectors (FLOAT16x2 and FLOAT16x4) in CUDA.
    * Updated `customize.py` to include the new `atomic_addx4` function for external calls.
    * Modified `__init__.py` to export the new atomic addition function, ensuring accessibility in the module.
    
    * lint fix
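
For readers unfamiliar with packed half-precision atomics, the following is a minimal sketch of how such helpers could be written. It is not the code from this commit. The names `sketch_atomic_add_x2`, `sketch_atomic_add_x4`, and `accumulate` are hypothetical, and building the x4 variant from two `__half2` atomics is an assumption made for illustration. It relies on the `atomicAdd(__half2 *, __half2)` overload from `cuda_fp16.h`, which needs compute capability 6.0 or newer (for example `nvcc -arch=sm_60`).

```cuda
#include <cuda_fp16.h>
#include <cstdio>

// Hypothetical helper (not from the commit): add two packed halves atomically.
// `address` and `val` must be 4-byte aligned. Each half is updated atomically,
// but the pair is not guaranteed to be updated as a single 32-bit access.
__device__ void sketch_atomic_add_x2(__half *address, const __half *val) {
  atomicAdd(reinterpret_cast<__half2 *>(address),
            *reinterpret_cast<const __half2 *>(val));
}

// Hypothetical helper: four halves handled as two independent __half2 atomics.
// This is an assumption for illustration; the commit may do it differently.
__device__ void sketch_atomic_add_x4(__half *address, const __half *val) {
  sketch_atomic_add_x2(address, val);
  sketch_atomic_add_x2(address + 2, val + 2);
}

// Every thread adds 1.0 to each of the four output elements.
__global__ void accumulate(__half *out) {
  __align__(8) __half ones[4];
  for (int i = 0; i < 4; ++i) {
    ones[i] = __float2half(1.0f);
  }
  sketch_atomic_add_x4(out, ones);
}

int main() {
  __half *out = nullptr;
  cudaMalloc(&out, 4 * sizeof(__half));
  cudaMemset(out, 0, 4 * sizeof(__half));  // all-zero bits encode 0.0 in half

  accumulate<<<1, 64>>>(out);

  __half host[4];
  cudaMemcpy(host, out, sizeof(host), cudaMemcpyDeviceToHost);
  for (int i = 0; i < 4; ++i) {
    printf("out[%d] = %g\n", i, __half2float(host[i]));
  }

  cudaFree(out);
  return 0;
}
```

Because the four elements are updated by separate atomics, a concurrent reader may observe a partially updated vector. That is usually acceptable for accumulation patterns such as gradient reduction, where only the final sum matters.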
common.h 7.59 KB