Oleg Goncharov authored
* Added new unfused softmax CUDA kernel to support causal attention mask

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Added test suite for unfused causal softmax kernel

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Removed test cases with large matrices from the causal softmax test suite

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Cleaned up the code per lint

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Added a compute buffer to the causal softmax test suite to store intermediate results without casting

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Added more test cases

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Relaxed absolute tolerance (atol)

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

* Relaxed absolute tolerance for BF16

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

---------

Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
d9eb1991