Fuse dropout and softmax in backward pass, add bias support to CPP MHA, add additive mask support, separate Q/K/V parameters (#854)
Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>