Commit adebe364 authored by Jared Casper

Merge branch 'patch-1' of https://github.com/jamesr66a/Megatron-LM into github-pr

parents 20f6169f 98b457b1
@@ -263,7 +263,7 @@ Second, we developed a simple and efficient two-dimensional model-parallel approach
<!-- The number of microbatches in a per-pipeline minibatch is controlled by the `--num-microbatches-in-minibatch` argument. With `WORLD_SIZE` GPUs, `TENSOR_MP_SIZE` tensor-model-parallel size, `PIPELINE_MP_SIZE` pipeline-model-parallel-size, `WORLD_SIZE`/(`TENSOR_MP_SIZE` * `PIPELINE_MP_SIZE`) GPUs will be used for data parallelism. The default values for `--tensor-model-parallel-size` and `--pipeline-model-parallel-size` is 1, which will not implement either form of model parallelism. -->
-We have examples of how to use these two different forms of model parallelism the example scripts ending in `distributed_with_mp.sh`, note that pipeline parallelism is not currently supported in the T5 model:
+We have examples of how to use these two different forms of model parallelism the example scripts ending in `distributed_with_mp.sh`:
Other than these minor changes, the distributed training is identical to the training on a single GPU.
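
The commented-out paragraph above describes how the data-parallel size falls out of the world size and the two model-parallel degrees. The following is a minimal sketch of that arithmetic only; the helper function and the standalone script are illustrative and not part of Megatron-LM:

```python
# Illustrative sketch (not Megatron-LM code): the data-parallel size implied by
# WORLD_SIZE, --tensor-model-parallel-size, and --pipeline-model-parallel-size.

def data_parallel_size(world_size: int,
                       tensor_mp_size: int = 1,
                       pipeline_mp_size: int = 1) -> int:
    """Return WORLD_SIZE / (TENSOR_MP_SIZE * PIPELINE_MP_SIZE)."""
    model_parallel_size = tensor_mp_size * pipeline_mp_size
    if world_size % model_parallel_size != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor_mp_size * pipeline_mp_size = {model_parallel_size}"
        )
    return world_size // model_parallel_size


if __name__ == "__main__":
    # 16 GPUs with tensor-model-parallel size 2 and pipeline-model-parallel
    # size 4 leave 16 / (2 * 4) = 2-way data parallelism.
    print(data_parallel_size(world_size=16, tensor_mp_size=2, pipeline_mp_size=4))
```

With the default value of 1 for both model-parallel sizes, every GPU is used for data parallelism and neither form of model parallelism is applied.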