Merge pull request #905 from ROCmSoftwarePlatform/mha-train-develop-grad-bias
flash attention output bias grad