Unverified Commit e201864b authored by Younes Belkada's avatar Younes Belkada Committed by GitHub
Browse files

[`GPTNeoX`] Fix GPTNeoX + Flash Attention 2 issue (#28645)

Update modeling_gpt_neox.py
parent dafd5951
......@@ -390,7 +390,7 @@ class GPTNeoXFlashAttention2(GPTNeoXAttention):
elif hasattr(self.config, "_pre_quantization_dtype"):
target_dtype = self.config._pre_quantization_dtype
else:
target_dtype = self.q_proj.weight.dtype
target_dtype = self.query_key_value.weight.dtype
logger.warning_once(
f"The input hidden states seems to be silently casted in float32, this might be related to"
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or sign in to comment