Fuses dropout and softmax in backward pass, add bias support to CPP MHA, add...
Fuses dropout and softmax in backward pass, add bias support to CPP MHA, add additive mask support, separate Q/K/V parameters (#854)
Co-authored-by:
Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
Showing
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Please register or sign in to comment