[Longformer] fix longformer global attention output (#5659)
* fix longformer global attention output
* fix multi gpu problem
* replace -10000 with 0
* better comment
* make attention output equal local and global
* Update src/transformers/modeling_longformer.py
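The "replace -10000 with 0" bullet refers to the additive attention-mask convention: masked positions get a large negative value added to the attention scores before the softmax, so they receive near-zero probability, while attended positions get 0. A minimal toy sketch of that convention in plain PyTorch (illustrative only, not the actual Longformer code):

```python
import torch

# raw attention scores for one query over four key positions
scores = torch.tensor([[1.0, 2.0, 3.0, 4.0]])

# additive mask: 0.0 = attend, -10000.0 = masked out
mask = torch.tensor([[0.0, 0.0, -10000.0, -10000.0]])

# masked positions collapse to ~0 probability after softmax
probs = torch.softmax(scores + mask, dim=-1)
```

Returning 0 instead of -10000 in the *output* attention probabilities (as this PR does for the global-attention slots) keeps the reported weights interpretable, since -10000 is only meaningful pre-softmax.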