Fix GPT-NeoX-20B past handling, attention computation (#17811)
* Fix GPT-NeoX-20B past handling, swap attention computation to hopefully avoid NaN, update docs
* 20B tests
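The message does not say how the attention computation was changed. One common source of NaNs in fp16 attention is that the raw query-key dot products overflow to `inf` before they are divided by the square root of the head size. The max-subtracted softmax then computes `inf - inf`, which is NaN. The following is a generic, hedged sketch of that failure mode and of one standard mitigation (scaling the query before the dot product). It is not the code from this commit. It emulates fp16 rounding with the standard library's half-precision `struct` format `"e"`. The helpers `to_fp16`, `dot_fp16`, `softmax`, and `attention_scores` are hypothetical names chosen for illustration.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE half precision, overflowing to +/-inf like fp16 hardware."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # Values beyond the fp16 range (max 65504) become infinite.
        return math.copysign(math.inf, x)


def dot_fp16(a, b):
    """Dot product where every product and partial sum is rounded to fp16."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(x * y))
    return acc


def softmax(scores):
    """Max-subtracted softmax in full Python float precision.

    If any score is inf, max - max is inf - inf = NaN and every output is NaN.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_scores(q, keys, prescale):
    """Scaled dot-product scores of one query against several keys (fp16 emulated)."""
    norm = math.sqrt(len(q))
    if prescale:
        # Scale the query first so the fp16 dot products stay in range.
        q_scaled = [to_fp16(x / norm) for x in q]
        return [dot_fp16(q_scaled, k) for k in keys]
    # Scale after the dot product: the raw sum can already have overflowed to inf.
    return [to_fp16(dot_fp16(q, k) / norm) for k in keys]
```

With a 64-dimensional query and key filled with `40.0`, the unscaled dot product (64 × 1600 = 102400) exceeds the fp16 maximum, so scaling afterwards yields `inf` and the softmax yields NaN. Pre-scaling the query by 1/8 keeps that score at 12800, and the softmax stays finite. Upcasting the scores to fp32 before the softmax is another common mitigation. This sketch does not establish which approach the commit chose.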