[GQA] Add regional atomic add to slightly boost performance (#1093)
* [Lint]
* [BugFix] Freeze the memory order of all atomic_add operations
* [Lint]
* [Atomic] Move on to regional atomic add
* [Lint]