Unverified Commit a326e351 authored by Marks101's avatar Marks101 Committed by GitHub
Browse files

[PyTorch] Fix issues with cross attention (#1069)


Signed-off-by: Markus Schnoes <markus.schnoes@gmx.de>
Co-authored-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
parent cc329b79
...@@ -5778,7 +5778,8 @@ class DotProductAttention(TransformerEngineBaseModule): ...@@ -5778,7 +5778,8 @@ class DotProductAttention(TransformerEngineBaseModule):
assert ( assert (
attention_mask is not None attention_mask is not None
), "Please provide attention_mask for padding!" ), "Please provide attention_mask for padding!"
if max_seqlen_q == max_seqlen_kv: if self.attention_type == "self":
assert max_seqlen_q == max_seqlen_kv
cu_seqlens_q = get_cu_seqlens(attention_mask) cu_seqlens_q = get_cu_seqlens(attention_mask)
cu_seqlens_kv = cu_seqlens_q cu_seqlens_kv = cu_seqlens_q
else: else:
......
...@@ -652,7 +652,7 @@ class TransformerLayer(torch.nn.Module): ...@@ -652,7 +652,7 @@ class TransformerLayer(torch.nn.Module):
hidden_states, hidden_states,
attention_mask=attention_mask, attention_mask=attention_mask,
attn_mask_type=self_attn_mask_type, attn_mask_type=self_attn_mask_type,
window_size=enc_dec_window_size, window_size=window_size,
inference_params=inference_params, inference_params=inference_params,
is_first_microbatch=is_first_microbatch, is_first_microbatch=is_first_microbatch,
checkpoint_core_attention=checkpoint_core_attention, checkpoint_core_attention=checkpoint_core_attention,
...@@ -679,6 +679,8 @@ class TransformerLayer(torch.nn.Module): ...@@ -679,6 +679,8 @@ class TransformerLayer(torch.nn.Module):
inter_attention_outputs = self.inter_attention( inter_attention_outputs = self.inter_attention(
hidden_states, hidden_states,
attention_mask=enc_dec_attn_mask, attention_mask=enc_dec_attn_mask,
attn_mask_type=enc_dec_attn_mask_type,
window_size=enc_dec_window_size,
encoder_output=encoder_output, encoder_output=encoder_output,
is_first_microbatch=is_first_microbatch, is_first_microbatch=is_first_microbatch,
checkpoint_core_attention=checkpoint_core_attention, checkpoint_core_attention=checkpoint_core_attention,
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or sign in to comment