Unverified Commit 2aee0591 authored by cyanguwa, committed by GitHub

Update cudnn-frontend to 1.0.3 to fix cuDNN v9 SDPA NaNs (#650)



* Update cudnn-frontend to 1.0.3 to fix cuDNN v9 NaNs
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* Make d_out contiguous in the backward pass (see the sketch after the sign-off trailers)
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* Remove cudnnDestroy and let PyTorch handle cleanup
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* Update transformer_engine/pytorch/attention.py
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>

* Update transformer_engine/pytorch/attention.py
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>

* Update transformer_engine/pytorch/attention.py
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>

---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
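
Background for the d_out fix: the gradient an autograd backward() receives inherits the memory layout of whatever downstream op produced it, so it is not guaranteed to be contiguous, while fused attention kernels read their inputs as densely packed buffers. A minimal sketch of how a non-contiguous gradient arises (shapes are illustrative, not from the real model):

import torch

# The tensor passed into an autograd backward() keeps the strides of the
# op that produced it, so it can be a non-contiguous view.
x = torch.randn(2, 4, 8)
d_out = x.transpose(1, 2)     # same storage, swapped strides, no copy
print(d_out.is_contiguous())  # False

# A fused kernel that assumes a densely packed buffer would read the wrong
# elements from such a view. Repacking first is the guard this commit adds
# at the top of each fused-attention backward():
d_out = d_out.contiguous()
print(d_out.is_contiguous())  # True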
parent ce163f9e
-Subproject commit 9f82dda5c029d15a5f371f0fe003dc0c74a0c987
+Subproject commit a86ad708db725e4d29919bb6fadf8e6cdfa5dc06
@@ -152,11 +152,6 @@ class cudnnExecutionPlanManager {
     }

     ~cudnnExecutionPlanManager() {
-        static thread_local std::once_flag flag;
-        std::call_once(flag, [&] {
-            if (handle_ != nullptr) {
-                cudnnDestroy(handle_);
-            }});
     }

 private:
@@ -1823,6 +1823,7 @@ class FusedAttnFunc_qkvpacked(torch.autograd.Function):
     @staticmethod
     def backward(ctx, d_out):
+        d_out = d_out.contiguous()
         qkv, out, cu_seqlens = ctx.saved_tensors
         if not ctx.aux_ctx_tensors[0].is_contiguous():
             ctx.aux_ctx_tensors[0] = ctx.aux_ctx_tensors[0].contiguous()
@@ -1892,6 +1893,7 @@ class FusedAttnFunc_kvpacked(torch.autograd.Function):
     @staticmethod
     def backward(ctx, d_out):
+        d_out = d_out.contiguous()
         q, kv, out, cu_seqlens_q, cu_seqlens_kv = ctx.saved_tensors
         if not ctx.aux_ctx_tensors[0].is_contiguous():
             ctx.aux_ctx_tensors[0] = ctx.aux_ctx_tensors[0].contiguous()
@@ -1973,6 +1975,7 @@ class FusedAttnFunc(torch.autograd.Function):
     @staticmethod
     def backward(ctx, d_out):
+        d_out = d_out.contiguous()
         q, k, v, out, cu_seqlens_q, cu_seqlens_kv = ctx.saved_tensors
         if not ctx.aux_ctx_tensors[0].is_contiguous():
             ctx.aux_ctx_tensors[0] = ctx.aux_ctx_tensors[0].contiguous()
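
All three hunks add the same one-line guard at the entry of backward(). A self-contained sketch of the pattern, with a hypothetical fused_backward standing in for Transformer Engine's fused attention backward call; without the guard, the dense-input assumption is violated, which in real kernels typically surfaces as wrong gradients rather than an exception:

import torch

def fused_backward(d_out, scale):
    # Hypothetical stand-in for a fused kernel that requires dense input.
    assert d_out.is_contiguous()
    return d_out * scale

class ScaledIdentity(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale
        return x * scale

    @staticmethod
    def backward(ctx, d_out):
        d_out = d_out.contiguous()  # same guard as in this commit
        return fused_backward(d_out, ctx.scale), None

x = torch.randn(2, 3, 4, requires_grad=True)
out = ScaledIdentity.apply(x, 2.0)
grad = torch.ones(2, 4, 3).transpose(1, 2)  # deliberately non-contiguous
out.backward(grad)                          # succeeds: backward() repacks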