[Longformer] Better handling of global attention mask vs local attention mask (#4672)
* better api
* improve automatic setting of global attention mask
* fix longformer bug
* fix global attention mask in test
* fix global attn mask flatten
* fix slow tests
* update docstring
* update docs and make more robust
* improve attention mask