don't zero out the attention_mask when using sliding window with flash attention (#31670)
* don't zero out the attention_mask when using sliding window with flash attention
* chore: lint