Unverified commit 0311ba21, authored by Stas Bekman, committed by GitHub

typo (#11152)

* typo

* style
parent 269c9638
@@ -355,9 +355,9 @@ Notes:
able to use significantly larger batch sizes using the same hardware (e.g. 3x and even bigger) which should lead to
significantly shorter training time.
3. To use the second version of Sharded data-parallelism, add ``--sharded_ddp zero_dp_2`` or ``--sharded_ddp
zero_dp_3`` to the command line arguments, and make sure you have added the distributed launcher ``-m
torch.distributed.launch --nproc_per_node=NUMBER_OF_GPUS_YOU_HAVE`` if you haven't been using it already.
For example here is how you could use it for ``run_translation.py`` with 2 GPUs:
...
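The concrete command is elided in this diff hunk. As a hedged sketch only, a 2-GPU launch combining ``torch.distributed.launch`` with the ``--sharded_ddp`` flag might look like the following; the script path and all translation-specific arguments (model name, dataset, language pair, output directory) are illustrative assumptions, not values taken from this commit:

```shell
# Sketch: launch run_translation.py on 2 GPUs with sharded DDP (ZeRO stage 2).
# Script path and task flags below are illustrative placeholders,
# not values from this diff.
python -m torch.distributed.launch --nproc_per_node=2 \
    examples/pytorch/translation/run_translation.py \
    --model_name_or_path t5-small \
    --source_lang en --target_lang ro \
    --dataset_name wmt16 --dataset_config_name ro-en \
    --output_dir /tmp/test-translation \
    --do_train \
    --sharded_ddp zero_dp_2
```

The only parts confirmed by the diff itself are the ``-m torch.distributed.launch --nproc_per_node=...`` launcher and the ``--sharded_ddp zero_dp_2`` (or ``zero_dp_3``) flag.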