flax model parallel training (#12590)
* update scripts
* add copyright
* add logging
* cleanup
* add z loss
* add readme
* shard description
* update readme