Tim Moon authored
* Hack to enable training GPT-3

  Seems to fix bug from #1416

* Add test to initialize model-parallelism for decoder-only Transformers

  Namely GPT-3.
fb21698e