[Feature] Add GQA backward kernel with varlen input (#1082)
* [Feature] Add GQA backward kernel with varlen input
* [Lint]
* [BugFix] Freeze the memory order of all atomic_add operations
* [Lint]
* [Lint]
* [BugFix] Use release order to boost performance