"git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "f9582c205afaa4bb117bb67a4bf5184b053417b3"
Unverified commit 5da7c78e, authored by Stas Bekman, committed by GitHub

update to new script; notebook notes (#10241)

parent dee876ce
@@ -258,17 +258,16 @@ To deploy this feature:
 2. Add ``--sharded_ddp`` to the command line arguments, and make sure you have added the distributed launcher ``-m
    torch.distributed.launch --nproc_per_node=NUMBER_OF_GPUS_YOU_HAVE`` if you haven't been using it already.
 
-For example here is how you could use it for ``finetune_trainer.py`` with 2 GPUs:
+For example here is how you could use it for ``run_seq2seq.py`` with 2 GPUs:
 
 .. code-block:: bash
 
-   cd examples/seq2seq
-   python -m torch.distributed.launch --nproc_per_node=2 ./finetune_trainer.py \
-   --model_name_or_path sshleifer/distill-mbart-en-ro-12-4 --data_dir wmt_en_ro \
+   python -m torch.distributed.launch --nproc_per_node=2 examples/seq2seq/run_seq2seq.py \
+   --model_name_or_path t5-small --per_device_train_batch_size 1 \
    --output_dir output_dir --overwrite_output_dir \
-   --do_train --n_train 500 --num_train_epochs 1 \
-   --per_device_train_batch_size 1 --freeze_embeds \
-   --src_lang en_XX --tgt_lang ro_RO --task translation \
+   --do_train --max_train_samples 500 --num_train_epochs 1 \
+   --dataset_name wmt16 --dataset_config "ro-en" \
+   --task translation_en_to_ro --source_prefix "translate English to Romanian: " \
    --fp16 --sharded_ddp
 
 Notes:
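For orientation (this note and the sketch below are not part of the patch): ``--sharded_ddp``, ``--fp16`` and friends are ordinary :class:`~transformers.TrainingArguments` fields, so any Trainer-based script that parses its command line with ``HfArgumentParser`` (as the example scripts do) picks them up automatically. A minimal sketch with a hypothetical ``my_train.py``:

.. code-block:: python

   # my_train.py -- hypothetical minimal script, not one of the shipped examples.
   # It only shows that --sharded_ddp / --fp16 / --local_rank end up in
   # TrainingArguments; a real script would also build a model, datasets and a Trainer.
   from transformers import HfArgumentParser, TrainingArguments

   parser = HfArgumentParser(TrainingArguments)
   (training_args,) = parser.parse_args_into_dataclasses()

   print("sharded_ddp:", training_args.sharded_ddp)
   print("fp16:", training_args.fp16)
   print("local_rank:", training_args.local_rank)  # set per process by torch.distributed.launch

Launched the same way, e.g. ``python -m torch.distributed.launch --nproc_per_node=2 my_train.py --output_dir output_dir --sharded_ddp --fp16``, each process prints its own ``local_rank``.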
@@ -344,17 +343,18 @@ In fact, you can continue using ``-m torch.distributed.launch`` with DeepSpeed
 the ``deepspeed`` launcher. But since in the DeepSpeed documentation it'll be used everywhere, for consistency we will
 use it here as well.
 
-Here is an example of running ``finetune_trainer.py`` under DeepSpeed deploying all available GPUs:
+Here is an example of running ``run_seq2seq.py`` under DeepSpeed deploying all available GPUs:
 
 .. code-block:: bash
 
-   cd examples/seq2seq
-   deepspeed ./finetune_trainer.py --deepspeed ds_config.json \
-   --model_name_or_path sshleifer/distill-mbart-en-ro-12-4 --data_dir wmt_en_ro \
-   --output_dir output_dir --overwrite_output_dir \
-   --do_train --n_train 500 --num_train_epochs 1 \
-   --per_device_train_batch_size 1 --freeze_embeds \
-   --src_lang en_XX --tgt_lang ro_RO --task translation
+   deepspeed examples/seq2seq/run_seq2seq.py \
+   --deepspeed examples/tests/deepspeed/ds_config.json \
+   --model_name_or_path t5-small --per_device_train_batch_size 1 \
+   --output_dir output_dir --overwrite_output_dir --fp16 \
+   --do_train --max_train_samples 500 --num_train_epochs 1 \
+   --dataset_name wmt16 --dataset_config "ro-en" \
+   --task translation_en_to_ro --source_prefix "translate English to Romanian: "
 
 Note that in the DeepSpeed documentation you are likely to see ``--deepspeed --deepspeed_config ds_config.json`` - i.e.
 two DeepSpeed-related arguments, but for the sake of simplicity, and since there are already so many arguments to deal
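As a side note (not part of the patch): if you build the :class:`~transformers.Trainer` from your own code rather than one of the example scripts, the combined ``--deepspeed ds_config.json`` form corresponds to the ``deepspeed`` training argument, which takes the path to the config file. A minimal sketch; the path and values below simply mirror the command above:

.. code-block:: python

   # Sketch only: where the combined --deepspeed <config> value lands when you
   # construct the arguments in Python rather than on the command line.
   from transformers import TrainingArguments

   training_args = TrainingArguments(
       output_dir="output_dir",
       overwrite_output_dir=True,
       fp16=True,
       per_device_train_batch_size=1,
       deepspeed="examples/tests/deepspeed/ds_config.json",  # same file as in the command above
   )
   # Pass training_args to Trainer(...) as usual; the DeepSpeed engine itself is only
   # initialized when trainer.train() runs (under the ``deepspeed`` launcher or an
   # emulated one, see the notebook section below).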
@@ -372,13 +372,13 @@ To deploy DeepSpeed with one GPU adjust the :class:`~transformers.Trainer` command
 .. code-block:: bash
 
-   cd examples/seq2seq
-   deepspeed --num_gpus=1 ./finetune_trainer.py --deepspeed ds_config.json \
-   --model_name_or_path sshleifer/distill-mbart-en-ro-12-4 --data_dir wmt_en_ro \
-   --output_dir output_dir --overwrite_output_dir \
-   --do_train --n_train 500 --num_train_epochs 1 \
-   --per_device_train_batch_size 1 --freeze_embeds \
-   --src_lang en_XX --tgt_lang ro_RO --task translation
+   deepspeed --num_gpus=1 examples/seq2seq/run_seq2seq.py \
+   --deepspeed examples/tests/deepspeed/ds_config.json \
+   --model_name_or_path t5-small --per_device_train_batch_size 1 \
+   --output_dir output_dir --overwrite_output_dir --fp16 \
+   --do_train --max_train_samples 500 --num_train_epochs 1 \
+   --dataset_name wmt16 --dataset_config "ro-en" \
+   --task translation_en_to_ro --source_prefix "translate English to Romanian: "
 
 This is almost the same as with multiple GPUs, but here we tell DeepSpeed explicitly to use just one GPU. By default,
 DeepSpeed deploys all GPUs it can see. If you have only 1 GPU to start with, then you don't need this argument. The
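Since DeepSpeed deploys every GPU it can see, it can help to confirm what is actually visible before deciding whether you need ``--num_gpus=1``. A quick check (an illustration, not from the docs):

.. code-block:: python

   # Quick sanity check of GPU visibility before launching.
   import os
   import torch

   print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"))
   print("GPUs visible to torch:", torch.cuda.device_count())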
@@ -424,17 +424,17 @@ Notes:
 .. code-block:: bash
 
-   deepspeed --include localhost:1 ./finetune_trainer.py
+   deepspeed --include localhost:1 examples/seq2seq/run_seq2seq.py ...
 
-In this example, we tell DeepSpeed to use GPU 1.
+In this example, we tell DeepSpeed to use GPU 1 (the second GPU).
 
 
 Deployment in Notebooks
 =======================================================================================================================
 
-The problem with notebooks is that there is no normal ``deepspeed`` launcher to rely on, so under certain setups we
-have to emulate it.
+The problem with running notebook cells as a script is that there is no normal ``deepspeed`` launcher to rely on, so
+under certain setups we have to emulate it.
 
 Here is how you'd have to adjust your training code in the notebook to use DeepSpeed.
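In practice the emulation boils down to providing, in a cell that runs before the :class:`~transformers.Trainer` is created, the environment variables the launcher would otherwise set. A minimal single-process sketch (not part of the patch; the port value is arbitrary, pick a free one):

.. code-block:: python

   # Sketch of emulating the launcher inside a notebook for a single process/GPU.
   # These are the standard torch.distributed environment variables that the
   # ``deepspeed`` launcher (or torch.distributed.launch) would otherwise set.
   import os

   os.environ["MASTER_ADDR"] = "localhost"
   os.environ["MASTER_PORT"] = "9994"  # change if the port is already in use
   os.environ["RANK"] = "0"
   os.environ["LOCAL_RANK"] = "0"
   os.environ["WORLD_SIZE"] = "1"

   # From here on, build TrainingArguments (including the deepspeed="..." config path,
   # as sketched earlier) and the Trainer as usual, then call trainer.train() in a cell.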
@@ -510,6 +510,24 @@ cell with:
 EOT
 
+That said, if the script is not in the notebook cells, you can launch ``deepspeed`` normally via the shell from a cell
+with:
+
+.. code-block::
+
+   !deepspeed examples/seq2seq/run_seq2seq.py ...
+
+or with ``%%bash`` magic, where you can write multi-line code for the shell to run:
+
+.. code-block::
+
+   %%bash
+
+   cd /somewhere
+   deepspeed examples/seq2seq/run_seq2seq.py ...
+
 
 Configuration
 =======================================================================================================================