Commit adebe364 authored by Jared Casper

Merge branch 'patch-1' of https://github.com/jamesr66a/Megatron-LM into github-pr

parents 20f6169f 98b457b1
@@ -263,7 +263,7 @@ Second, we developed a simple and efficient two-dimensional model-parallel approach
<!-- The number of microbatches in a per-pipeline minibatch is controlled by the `--num-microbatches-in-minibatch` argument. With `WORLD_SIZE` GPUs, `TENSOR_MP_SIZE` tensor-model-parallel size, `PIPELINE_MP_SIZE` pipeline-model-parallel-size, `WORLD_SIZE`/(`TENSOR_MP_SIZE` * `PIPELINE_MP_SIZE`) GPUs will be used for data parallelism. The default values for `--tensor-model-parallel-size` and `--pipeline-model-parallel-size` is 1, which will not implement either form of model parallelism. -->
-We have examples of how to use these two different forms of model parallelism the example scripts ending in `distributed_with_mp.sh`, note that pipeline parallelism is not currently supported in the T5 model:
+We have examples of how to use these two different forms of model parallelism the example scripts ending in `distributed_with_mp.sh`:
Other than these minor changes, the distributed training is identical to the training on a single GPU.
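
The commented-out paragraph above describes how the data-parallel size falls out of the world size and the two model-parallel degrees. The following is a minimal sketch of that arithmetic only; the helper function and the standalone script are illustrative and not part of Megatron-LM:

```python
# Illustrative sketch (not Megatron-LM code): the data-parallel size implied by
# WORLD_SIZE, --tensor-model-parallel-size, and --pipeline-model-parallel-size.

def data_parallel_size(world_size: int,
                       tensor_mp_size: int = 1,
                       pipeline_mp_size: int = 1) -> int:
    """Return WORLD_SIZE / (TENSOR_MP_SIZE * PIPELINE_MP_SIZE)."""
    model_parallel_size = tensor_mp_size * pipeline_mp_size
    if world_size % model_parallel_size != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor_mp_size * pipeline_mp_size = {model_parallel_size}"
        )
    return world_size // model_parallel_size


if __name__ == "__main__":
    # 16 GPUs with tensor-model-parallel size 2 and pipeline-model-parallel
    # size 4 leave 16 / (2 * 4) = 2-way data parallelism.
    print(data_parallel_size(world_size=16, tensor_mp_size=2, pipeline_mp_size=4))
```

With the default value of 1 for both model-parallel sizes, every GPU is used for data parallelism and neither form of model parallelism is applied.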