Update flash_attention_patch.py

For compatibility with a change in the Transformers library that added a new argument 'padding_mask' to the forward function of the attention layer: https://github.com/huggingface/transformers/pull/25598
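
The following is a minimal, torch-free sketch of the failure mode this commit appears to address. It assumes that the patched function replaces `LlamaAttention.forward` and that the calling layer now passes `padding_mask` as a keyword argument. The function names and the caller are hypothetical. Only the parameter names come from the diff, and the caller does not reproduce the actual Transformers call site.

```python
# Hypothetical sketch: why a replacement forward() with a fixed signature
# breaks once the caller starts passing a new keyword argument.

def forward_without_kwargs(hidden_states, attention_mask=None,
                           position_ids=None, past_key_value=None,
                           output_attentions=False, use_cache=False):
    # Fixed signature: any unexpected keyword raises TypeError.
    return hidden_states


def forward_with_kwargs(hidden_states, attention_mask=None,
                        position_ids=None, past_key_value=None,
                        output_attentions=False, use_cache=False,
                        **kwargs):
    # Extra keywords such as padding_mask are collected in kwargs and ignored.
    return hidden_states


def call_like_new_transformers(forward):
    # Hypothetical caller that mimics a layer passing the new padding_mask argument.
    return forward([1.0, 2.0], attention_mask=None, padding_mask=None)


def try_call(forward):
    # Report whether the call succeeds or which TypeError it raises.
    try:
        call_like_new_transformers(forward)
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"


print(try_call(forward_without_kwargs))
print(try_call(forward_with_kwargs))
```

Accepting `**kwargs` keeps the patch callable against both older and newer Transformers versions, but it silently drops any new argument. Whether ignoring `padding_mask` is correct depends on how the flash-attention path already handles padding, which this commit does not show.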

Commit 7768afba authored Oct 13, 2023 by Zian(Andy) Zheng

Showing with 1 addition and 0 deletions

applications/Colossal-LLaMA-2/colossal_llama2/utils/flash_attention_patch.py +1 -0

--- a/applications/Colossal-LLaMA-2/colossal_llama2/utils/flash_attention_patch.py
+++ b/applications/Colossal-LLaMA-2/colossal_llama2/utils/flash_attention_patch.py
@@ -65,6 +65,7 @@ def attention_forward(
    past_key_value: Optional[Tuple[torch.Tensor]] = None,
    output_attentions: bool = False,
    use_cache: bool = False,
+    **kwargs
 ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
    """
    Re-define LLaMA-2 `LlamaAttention` forward method using flash-attention.