[mistral] Fix FA2 attention reshape for Mistral Nemo (#32065)

* [mistral] Fix FA2 attention reshape
* [run-slow] mistral

Commit 22f888b3 (unverified), authored Jul 19, 2024 by Joshua Lochner, committed by GitHub Jul 19, 2024

Showing with 1 addition and 1 deletion

src/transformers/models/mistral/modeling_mistral.py +1 -1

--- a/src/transformers/models/mistral/modeling_mistral.py
+++ b/src/transformers/models/mistral/modeling_mistral.py
@@ -387,7 +387,7 @@ class MistralFlashAttention2(MistralAttention):
            is_causal=self.is_causal,
        )

-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size).contiguous()
+        attn_output = attn_output.reshape(bsz, q_len, self.num_heads * self.head_dim).contiguous()
        attn_output = self.o_proj(attn_output)

        if not output_attentions:
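
Note on why the reshape target matters: the old line is only valid when hidden_size equals num_heads * head_dim. The sketch below checks element counts for both reshape targets. The concrete sizes (32 heads, head_dim 128, hidden_size 5120) and the (bsz, q_len, num_heads, head_dim) layout of the attention output are assumptions chosen for illustration, not values taken from this commit; `can_reshape` is a hypothetical helper, not part of transformers.

```python
from math import prod


def can_reshape(src_shape, dst_shape):
    """Return True if both shapes hold the same number of elements,
    which is the precondition for a reshape to succeed."""
    return prod(src_shape) == prod(dst_shape)


# Illustrative sizes (assumed, not taken from the commit): a config where
# head_dim is set independently, so num_heads * head_dim != hidden_size.
bsz, q_len = 2, 7
num_heads, head_dim, hidden_size = 32, 128, 5120

# Assumed layout of the attention output before the reshape.
attn_shape = (bsz, q_len, num_heads, head_dim)

# Old target: (bsz, q_len, hidden_size)
old_ok = can_reshape(attn_shape, (bsz, q_len, hidden_size))

# New target: (bsz, q_len, num_heads * head_dim)
new_ok = can_reshape(attn_shape, (bsz, q_len, num_heads * head_dim))

print("old reshape valid:", old_ok)
print("new reshape valid:", new_ok)
```

Under these assumed sizes the old target has a different element count from the attention output, so the reshape would fail, while the new target always matches by construction. The o_proj layer that follows is then expected to map num_heads * head_dim back to hidden_size.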