Fix sliding window attention used in Gemma2FlashAttention2 (#32522)
* fix sliding window attention (flash2) in gemma2 model
* [run-slow] gemma
* fix slicing attention_mask for flash_attn2
* fix slicing attention_mask when flash_attn is used
* add missing comment
* slice the last seq_len tokens in the key, value states
* revert code of slicing key, value states
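
For illustration only, here is a minimal sketch of the mask-slicing idea the bullets refer to: when a padding mask is longer than the sliding window, keep just its most recent columns. This is not the code from this commit. `slice_mask_to_window` and its parameters are hypothetical names, and the mask is a plain list of lists rather than a tensor.

```python
def slice_mask_to_window(attention_mask, sliding_window):
    """Keep only the last `sliding_window` columns of a 2D padding mask.

    Hypothetical helper for illustration; the real change presumably
    works on tensors inside the Gemma2 flash-attention code path.
    """
    # No window configured: leave the mask untouched.
    if sliding_window is None or sliding_window <= 0:
        return attention_mask
    # Negative slicing returns the whole row when it is shorter than the window.
    return [row[-sliding_window:] for row in attention_mask]


# Two sequences of length 6; the first has two left-padding positions.
mask = [
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
]
windowed = slice_mask_to_window(mask, sliding_window=4)
```

With a window of 4, each row of `windowed` holds the last four positions of the corresponding row of `mask`; a mask shorter than the window is returned unchanged.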