Fix bug when initializing model-parallel process groups for GPT-3 (#1435)

* Hack to enable training GPT-3

  Seems to fix bug from #1416

* Add test to initialize model-parallelism for decoder-only Transformers

  Namely GPT-3.