Initialize output tensors to 0 for THD (temporary) (#1009)

* initialize output tensors to 0 for THD while waiting for cuDNN bug fix

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move fill_() to F16 loop

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* fix fused_attn_bwd()

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* correct typo in check_set_window_size

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* use nvtx3 instead

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

---------

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>