Improve BERT-like models performance with better self attention (#9124)

* Improve BERT-like models attention layers
* Apply style
* Put back error raising instead of assert
* Update template
* Fix copies
* Apply raising valueerror in MPNet
* Restore the copy check for the Intermediate layer in Longformer
* Update longformer