FlaxGPTJ (#14396)
* add flax gptj
* no bias in attention dense
* no wpe
* fix rotary embeddings
* fix rotary embeds
* fix rotary embeds
* quality
* doc and quality
* fix equivalence tests