Fix GPT-NeoX-20B past handling, attention computation (#17811)
* Fix GPT-NeoX-20B past handling, swap attention computation to hopefully avoid NaN, update docs
* 20B tests