Unverified commit 6a176880, authored by Lysandre Debut, committed by GitHub

per_device instead of per_gpu/error thrown when argument unknown (#4618)



* per_device instead of per_gpu/error thrown when argument unknown

* [docs] Restore examples.md symlink

* Correct absolute links so that symlink to the doc works correctly

* Update src/transformers/hf_argparser.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Warning + reorder

* Docs

* Style

* not for squad
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
parent 1381b6d0
@@ -340,8 +340,8 @@ python ./examples/text-classification/run_glue.py \
 --do_eval \
 --data_dir $GLUE_DIR/$TASK_NAME \
 --max_seq_length 128 \
---per_gpu_eval_batch_size=8 \
---per_gpu_train_batch_size=8 \
+--per_device_eval_batch_size=8 \
+--per_device_train_batch_size=8 \
 --learning_rate 2e-5 \
 --num_train_epochs 3.0 \
 --output_dir /tmp/$TASK_NAME/
@@ -367,8 +367,8 @@ python ./examples/text-classification/run_glue.py \
 --data_dir=${GLUE_DIR}/STS-B \
 --output_dir=./proc_data/sts-b-110 \
 --max_seq_length=128 \
---per_gpu_eval_batch_size=8 \
---per_gpu_train_batch_size=8 \
+--per_device_eval_batch_size=8 \
+--per_device_train_batch_size=8 \
 --gradient_accumulation_steps=1 \
 --max_steps=1200 \
 --model_name=xlnet-large-cased \
@@ -391,8 +391,8 @@ python -m torch.distributed.launch --nproc_per_node 8 ./examples/text-classifica
 --do_eval \
 --data_dir $GLUE_DIR/MRPC/ \
 --max_seq_length 128 \
---per_gpu_eval_batch_size=8 \
---per_gpu_train_batch_size=8 \
+--per_device_eval_batch_size=8 \
+--per_device_train_batch_size=8 \
 --learning_rate 2e-5 \
 --num_train_epochs 3.0 \
 --output_dir /tmp/mrpc_output/ \
@@ -428,8 +428,8 @@ python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answer
 --max_seq_length 384 \
 --doc_stride 128 \
 --output_dir ../models/wwm_uncased_finetuned_squad/ \
---per_gpu_eval_batch_size=3 \
---per_gpu_train_batch_size=3 \
+--per_device_eval_batch_size=3 \
+--per_device_train_batch_size=3 \
 ```
 Training with these hyper-parameters gave us the following results:
...
(collapsed diff — restored symlink, contents:)
+../../examples/README.md
\ No newline at end of file
@@ -16,17 +16,17 @@ This is still a work-in-progress – in particular documentation is still sparse
 | Task | Example datasets | Trainer support | TFTrainer support | pytorch-lightning | Colab
 |---|---|:---:|:---:|:---:|:---:|
-| [**`language-modeling`**](./language-modeling) | Raw text | ✅ | - | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb)
-| [**`text-classification`**](./text-classification) | GLUE, XNLI | ✅ | ✅ | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/trainer/01_text_classification.ipynb)
-| [**`token-classification`**](./token-classification) | CoNLL NER | ✅ | ✅ | ✅ | -
-| [**`multiple-choice`**](./multiple-choice) | SWAG, RACE, ARC | ✅ | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ViktorAlm/notebooks/blob/master/MPC_GPU_Demo_for_TF_and_PT.ipynb)
-| [**`question-answering`**](./question-answering) | SQuAD | - | ✅ | - | -
-| [**`text-generation`**](./text-generation) | - | - | - | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/02_how_to_generate.ipynb)
-| [**`distillation`**](./distillation) | All | - | - | - | -
-| [**`summarization`**](./summarization) | CNN/Daily Mail | - | - | - | -
-| [**`translation`**](./translation) | WMT | - | - | - | -
-| [**`bertology`**](./bertology) | - | - | - | - | -
-| [**`adversarial`**](./adversarial) | HANS | - | - | - | -
+| [**`language-modeling`**](https://github.com/huggingface/transformers/tree/master/examples/language-modeling) | Raw text | ✅ | - | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb)
+| [**`text-classification`**](https://github.com/huggingface/transformers/tree/master/examples/text-classification) | GLUE, XNLI | ✅ | ✅ | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/trainer/01_text_classification.ipynb)
+| [**`token-classification`**](https://github.com/huggingface/transformers/tree/master/examples/token-classification) | CoNLL NER | ✅ | ✅ | ✅ | -
+| [**`multiple-choice`**](https://github.com/huggingface/transformers/tree/master/examples/multiple-choice) | SWAG, RACE, ARC | ✅ | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ViktorAlm/notebooks/blob/master/MPC_GPU_Demo_for_TF_and_PT.ipynb)
+| [**`question-answering`**](https://github.com/huggingface/transformers/tree/master/examples/question-answering) | SQuAD | - | ✅ | - | -
+| [**`text-generation`**](https://github.com/huggingface/transformers/tree/master/examples/text-generation) | - | - | - | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/02_how_to_generate.ipynb)
+| [**`distillation`**](https://github.com/huggingface/transformers/tree/master/examples/distillation) | All | - | - | - | -
+| [**`summarization`**](https://github.com/huggingface/transformers/tree/master/examples/summarization) | CNN/Daily Mail | - | - | - | -
+| [**`translation`**](https://github.com/huggingface/transformers/tree/master/examples/translation) | WMT | - | - | - | -
+| [**`bertology`**](https://github.com/huggingface/transformers/tree/master/examples/bertology) | - | - | - | - | -
+| [**`adversarial`**](https://github.com/huggingface/transformers/tree/master/examples/adversarial) | HANS | - | - | - | -
 <br>
@@ -57,7 +57,7 @@ When using Tensorflow, TPUs are supported out of the box as a `tf.distribute.Str
 When using PyTorch, we support TPUs thanks to `pytorch/xla`. For more context and information on how to setup your TPU environment refer to Google's documentation and to the
 very detailed [pytorch/xla README](https://github.com/pytorch/xla/blob/master/README.md).
-In this repo, we provide a very simple launcher script named [xla_spawn.py](./xla_spawn.py) that lets you run our example scripts on multiple TPU cores without any boilerplate.
+In this repo, we provide a very simple launcher script named [xla_spawn.py](https://github.com/huggingface/transformers/tree/master/examples/xla_spawn.py) that lets you run our example scripts on multiple TPU cores without any boilerplate.
 Just pass a `--num_cores` flag to this script, then your regular training script with its arguments (this is similar to the `torch.distributed.launch` helper for torch.distributed).
 For example for `run_glue`:
...
@@ -19,7 +19,7 @@ python ./examples/multiple-choice/run_multiple_choice.py \
 --max_seq_length 80 \
 --output_dir models_bert/swag_base \
 --per_gpu_eval_batch_size=16 \
---per_gpu_train_batch_size=16 \
+--per_device_train_batch_size=16 \
 --gradient_accumulation_steps 2 \
 --overwrite_output
 ```
@@ -46,7 +46,7 @@ python ./examples/multiple-choice/run_tf_multiple_choice.py \
 --max_seq_length 80 \
 --output_dir models_bert/swag_base \
 --per_gpu_eval_batch_size=16 \
---per_gpu_train_batch_size=16 \
+--per_device_train_batch_size=16 \
 --logging-dir logs \
 --gradient_accumulation_steps 2 \
 --overwrite_output
...
@@ -61,8 +61,8 @@ class ExamplesTests(unittest.TestCase):
 --do_train
 --do_eval
 --output_dir ./tests/fixtures/tests_samples/temp_dir
---per_gpu_train_batch_size=2
---per_gpu_eval_batch_size=1
+--per_device_train_batch_size=2
+--per_device_eval_batch_size=1
 --learning_rate=1e-4
 --max_steps=10
 --warmup_steps=2
...
@@ -68,7 +68,7 @@ python run_glue.py \
 --do_eval \
 --data_dir $GLUE_DIR/$TASK_NAME \
 --max_seq_length 128 \
---per_gpu_train_batch_size 32 \
+--per_device_train_batch_size 32 \
 --learning_rate 2e-5 \
 --num_train_epochs 3.0 \
 --output_dir /tmp/$TASK_NAME/
@@ -141,7 +141,7 @@ python run_glue.py \
 --do_eval \
 --data_dir $GLUE_DIR/MRPC/ \
 --max_seq_length 128 \
---per_gpu_train_batch_size 32 \
+--per_device_train_batch_size 32 \
 --learning_rate 2e-5 \
 --num_train_epochs 3.0 \
 --output_dir /tmp/mrpc_output/
@@ -166,7 +166,7 @@ python run_glue.py \
 --do_eval \
 --data_dir $GLUE_DIR/MRPC/ \
 --max_seq_length 128 \
---per_gpu_train_batch_size 32 \
+--per_device_train_batch_size 32 \
 --learning_rate 2e-5 \
 --num_train_epochs 3.0 \
 --output_dir /tmp/mrpc_output/ \
@@ -189,7 +189,7 @@ python -m torch.distributed.launch \
 --do_eval \
 --data_dir $GLUE_DIR/MRPC/ \
 --max_seq_length 128 \
---per_gpu_train_batch_size 8 \
+--per_device_train_batch_size 8 \
 --learning_rate 2e-5 \
 --num_train_epochs 3.0 \
 --output_dir /tmp/mrpc_output/
@@ -221,7 +221,7 @@ python -m torch.distributed.launch \
 --do_eval \
 --data_dir $GLUE_DIR/MNLI/ \
 --max_seq_length 128 \
---per_gpu_train_batch_size 8 \
+--per_device_train_batch_size 8 \
 --learning_rate 2e-5 \
 --num_train_epochs 3.0 \
 --output_dir output_dir \
@@ -280,7 +280,7 @@ python run_xnli.py \
 --do_train \
 --do_eval \
 --data_dir $XNLI_DIR \
---per_gpu_train_batch_size 32 \
+--per_device_train_batch_size 32 \
 --learning_rate 5e-5 \
 --num_train_epochs 2.0 \
 --max_seq_length 128 \
...
@@ -69,7 +69,7 @@ python3 run_ner.py --data_dir ./ \
 --output_dir $OUTPUT_DIR \
 --max_seq_length $MAX_LENGTH \
 --num_train_epochs $NUM_EPOCHS \
---per_gpu_train_batch_size $BATCH_SIZE \
+--per_device_train_batch_size $BATCH_SIZE \
 --save_steps $SAVE_STEPS \
 --seed $SEED \
 --do_train \
@@ -91,7 +91,7 @@ Instead of passing all parameters via commandline arguments, the `run_ner.py` sc
     "output_dir": "germeval-model",
     "max_seq_length": 128,
     "num_train_epochs": 3,
-    "per_gpu_train_batch_size": 32,
+    "per_device_train_batch_size": 32,
     "save_steps": 750,
     "seed": 1,
     "do_train": true,
...
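The JSON route shown above can be sketched generically: load the file and pass its keys to a dataclass constructor, so a stale key (e.g. the old `per_gpu_train_batch_size`) fails loudly, just as unknown CLI flags now do. The `NerConfig` fields here are a trimmed, hypothetical subset for illustration, not the real script's full argument set:

```python
import json
from dataclasses import dataclass

@dataclass
class NerConfig:
    # Hypothetical subset of run_ner.py's arguments, for illustration only.
    output_dir: str
    max_seq_length: int = 128
    num_train_epochs: int = 3
    per_device_train_batch_size: int = 32
    do_train: bool = False

def load_config(path: str) -> NerConfig:
    """Read a JSON config and build the dataclass from it."""
    with open(path) as f:
        raw = json.load(f)
    # Unexpected keys raise TypeError here, mirroring the new unknown-argument error.
    return NerConfig(**raw)
```

Keys absent from the file fall back to the dataclass defaults, matching how omitted CLI flags behave.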
@@ -126,6 +126,9 @@ class HfArgumentParser(ArgumentParser):
         if return_remaining_strings:
             return (*outputs, remaining_args)
         else:
+            if remaining_args:
+                raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
             return (*outputs,)

     def parse_json_file(self, json_file: str) -> Tuple[DataClass, ...]:
...
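The behavior added above (fail fast on unrecognized flags instead of silently dropping them) can be sketched with plain `argparse`; the function name and the two registered flags are stand-ins for illustration, not the actual `HfArgumentParser` internals:

```python
import argparse

def parse_args_strict(argv):
    """Parse known flags; raise instead of silently ignoring unknown ones."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--per_device_train_batch_size", type=int, default=8)
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    # parse_known_args returns (namespace, leftover_strings)
    args, remaining = parser.parse_known_args(argv)
    if remaining:
        # Mirrors the patch: a misspelled or removed flag now fails loudly.
        raise ValueError(f"Some specified arguments are not used by the parser: {remaining}")
    return args
```

With this, `parse_args_strict(["--per_device_train_batch_size", "4"])` succeeds, while passing a flag the parser does not know raises `ValueError` instead of being dropped.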
@@ -416,7 +416,7 @@ class Trainer:
         logger.info("***** Running training *****")
         logger.info(" Num examples = %d", self.num_examples(train_dataloader))
         logger.info(" Num Epochs = %d", num_train_epochs)
-        logger.info(" Instantaneous batch size per device = %d", self.args.per_gpu_train_batch_size)
+        logger.info(" Instantaneous batch size per device = %d", self.args.per_device_train_batch_size)
         logger.info(" Total train batch size (w. parallel, distributed & accumulation) = %d", total_train_batch_size)
         logger.info(" Gradient Accumulation steps = %d", self.args.gradient_accumulation_steps)
         logger.info(" Total optimization steps = %d", t_total)
...
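The log lines above distinguish the instantaneous per-device batch size from the total train batch size. The relationship is a back-of-the-envelope calculation (a sketch, not the actual Trainer code; the parameter names are assumptions):

```python
def total_train_batch_size(per_device_batch_size: int,
                           n_gpu: int,
                           gradient_accumulation_steps: int) -> int:
    """Effective examples per optimizer step: each of max(1, n_gpu) devices
    processes per_device_batch_size examples per forward pass, and gradients
    are accumulated over gradient_accumulation_steps passes before updating."""
    return per_device_batch_size * max(1, n_gpu) * gradient_accumulation_steps
```

For example, 8 examples per device on 8 GPUs with 2 accumulation steps gives an effective batch of 128; on CPU (`n_gpu == 0`) the `max(1, ...)` keeps the multiplier at 1.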
@@ -58,8 +58,28 @@ class TrainingArguments:
         default=False, metadata={"help": "Run evaluation during training at each logging step."},
     )
-    per_gpu_train_batch_size: int = field(default=8, metadata={"help": "Batch size per GPU/CPU for training."})
-    per_gpu_eval_batch_size: int = field(default=8, metadata={"help": "Batch size per GPU/CPU for evaluation."})
+    per_device_train_batch_size: int = field(
+        default=8, metadata={"help": "Batch size per GPU/TPU core/CPU for training."}
+    )
+    per_device_eval_batch_size: int = field(
+        default=8, metadata={"help": "Batch size per GPU/TPU core/CPU for evaluation."}
+    )
+    per_gpu_train_batch_size: Optional[int] = field(
+        default=None,
+        metadata={
+            "help": "Deprecated, the use of `--per_device_train_batch_size` is preferred. "
+            "Batch size per GPU/TPU core/CPU for training."
+        },
+    )
+    per_gpu_eval_batch_size: Optional[int] = field(
+        default=None,
+        metadata={
+            "help": "Deprecated, the use of `--per_device_eval_batch_size` is preferred. "
+            "Batch size per GPU/TPU core/CPU for evaluation."
+        },
+    )
     gradient_accumulation_steps: int = field(
         default=1,
         metadata={"help": "Number of updates steps to accumulate before performing a backward/update pass."},
@@ -115,11 +135,23 @@ class TrainingArguments:
     @property
     def train_batch_size(self) -> int:
-        return self.per_gpu_train_batch_size * max(1, self.n_gpu)
+        if self.per_gpu_train_batch_size:
+            logger.warning(
+                "Using deprecated `--per_gpu_train_batch_size` argument which will be removed in a future "
+                "version. Using `--per_device_train_batch_size` is preferred."
+            )
+        per_device_batch_size = self.per_gpu_train_batch_size or self.per_device_train_batch_size
+        return per_device_batch_size * max(1, self.n_gpu)

     @property
     def eval_batch_size(self) -> int:
-        return self.per_gpu_eval_batch_size * max(1, self.n_gpu)
+        if self.per_gpu_eval_batch_size:
+            logger.warning(
+                "Using deprecated `--per_gpu_eval_batch_size` argument which will be removed in a future "
+                "version. Using `--per_device_eval_batch_size` is preferred."
+            )
+        per_device_batch_size = self.per_gpu_eval_batch_size or self.per_device_eval_batch_size
+        return per_device_batch_size * max(1, self.n_gpu)

     @cached_property
     @torch_required
...
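Stripped of the Trainer specifics, the backward-compatibility pattern in this hunk is: keep the old field as an `Optional` defaulting to `None`, warn when it is set, and otherwise fall back to the new field. A self-contained sketch (field names mirror the patch, but `Args` is a toy class and `n_gpu` a plain attribute here, not a detected device count):

```python
import warnings
from dataclasses import dataclass
from typing import Optional

@dataclass
class Args:
    per_device_train_batch_size: int = 8
    per_gpu_train_batch_size: Optional[int] = None  # deprecated alias, None when unset
    n_gpu: int = 1

    @property
    def train_batch_size(self) -> int:
        if self.per_gpu_train_batch_size:
            warnings.warn(
                "`per_gpu_train_batch_size` is deprecated; "
                "use `per_device_train_batch_size` instead."
            )
        # `or` falls through to the new field whenever the old one is None/unset.
        per_device = self.per_gpu_train_batch_size or self.per_device_train_batch_size
        return per_device * max(1, self.n_gpu)
```

So `Args(n_gpu=4).train_batch_size` uses the new field (8 × 4 = 32), while explicitly setting the deprecated field overrides it and emits a warning. One caveat of the `or`-based fallback: an explicit legacy value of `0` is treated as unset.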