Fix GPT-NeoX-20B past handling, attention computation (#17811)
* Fix GPT-NeoX-20B past handling, swap attention computation to hopefully avoid NaN, update docs
* 20B tests