Unverified Commit dabeb152 authored by Sylvain Gugger, committed by GitHub

Examples reorg (#11350)



* Base move

* Examples reorganization

* Update references

* Put back test data

* Move conftest

* More fixes

* Move test data to test fixtures

* Update path

* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments and clean
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
parent ca7ff64f
@@ -14,9 +14,9 @@ See the License for the specific language governing permissions and
limitations under the License.
-->
# SQuAD

Based on the script [`run_qa.py`](https://github.com/huggingface/transformers/blob/master/examples/pytorch/question-answering/run_qa.py).

**Note:** This script only works with models that have a fast tokenizer (backed by the 🤗 Tokenizers library) as it
uses special features of those tokenizers. You can check if your favorite model has a fast tokenizer in
@@ -29,7 +29,9 @@ The old version of this script can be found [here](https://github.com/huggingfac
Note that if your dataset contains samples with no possible answers (like SQuAD version 2), you need to pass along the flag `--version_2_with_negative`.
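For instance, here is a minimal sketch of a SQuAD v2.0 run; the `squad_v2` dataset name and the hyper-parameters are illustrative, so adjust them to your setup:

```bash
python run_qa.py \
--model_name_or_path bert-base-uncased \
--dataset_name squad_v2 \
--version_2_with_negative \
--do_train \
--do_eval \
--per_device_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /tmp/debug_squad_v2/
```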
## Trainer-based scripts
### Fine-tuning BERT on SQuAD1.0
This example code fine-tunes BERT on the SQuAD1.0 dataset. It runs in 24 min (with BERT-base) or 68 min (with BERT-large)
on a single Tesla V100 16GB.
@@ -57,7 +59,6 @@ exact_match = 81.22
#### Distributed training

Here is an example using distributed training on 8 V100 GPUs and the Bert Whole Word Masking uncased model to reach an F1 > 93 on SQuAD1.1:

```bash
@@ -128,6 +129,71 @@ python run_qa_beam_search.py \
--save_steps 5000
```
## With Accelerate

Based on the scripts `run_qa_no_trainer.py` and `run_qa_beam_search_no_trainer.py`.

Like `run_qa.py` and `run_qa_beam_search.py`, these scripts allow you to fine-tune any of the supported models on
SQuAD or a similar dataset. The main difference is that they expose the bare training loop, so you can quickly
experiment and add any customization you would like. They offer fewer options than the `Trainer`-based scripts (though
you can easily change the options for the optimizer or the dataloaders directly in the script), but they still run in a
distributed setup, on TPU, and support mixed precision by means of the
[🤗 `Accelerate`](https://github.com/huggingface/accelerate) library. You can use the scripts normally after installing
it:
```bash
pip install accelerate
```
then
```bash
python run_qa_no_trainer.py \
--model_name_or_path bert-base-uncased \
--dataset_name squad \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ~/tmp/debug_squad
```
You can then use your usual launchers to run it in a distributed environment, but the easiest way is to run
```bash
accelerate config
```
and reply to the questions asked. Then
```bash
accelerate test
```
that will check everything is ready for training. Finally, you can launch training with
```bash
accelerate launch run_qa_no_trainer.py \
--model_name_or_path bert-base-uncased \
--dataset_name squad \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ~/tmp/debug_squad
```
This command is the same and will work for:
- a CPU-only setup
- a setup with one GPU
- a distributed training with several GPUs (single or multi node)
- a training on TPUs
Note that this library is in alpha release so your feedback is more than welcome if you encounter any problem using it.
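If you prefer to use your usual distributed launcher instead of `accelerate launch`, here is a minimal sketch assuming a single machine with 8 GPUs; it relies on the launcher exporting the rank environment variables (hence `--use_env` with `torch.distributed.launch`), which Accelerate picks up:

```bash
python -m torch.distributed.launch --nproc_per_node 8 --use_env run_qa_no_trainer.py \
--model_name_or_path bert-base-uncased \
--dataset_name squad \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ~/tmp/debug_squad
```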
## Results
Larger batch size may improve the performance while costing more memory.

##### Results for SQuAD1.0 with the previously defined hyper-parameters:
@@ -223,22 +289,3 @@ python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answer
```
Training with the above command leads to the f1 score of 93.52, which is slightly better than the f1 score of 93.15 for
`bert-large-uncased-whole-word-masking`.
## SQuAD with the TensorFlow Trainer
```bash
python run_tf_squad.py \
--model_name_or_path bert-base-uncased \
--output_dir model \
--max_seq_length 384 \
--num_train_epochs 2 \
--per_gpu_train_batch_size 8 \
--per_gpu_eval_batch_size 16 \
--do_train \
--logging_dir logs \
--logging_steps 10 \
--learning_rate 3e-5 \
--doc_stride 128
```
For the moment, only training is available in the TensorFlow Trainer; evaluation is not supported yet.
@@ -14,9 +14,9 @@ See the License for the specific language governing permissions and
limitations under the License.
-->
## Summarization

This directory contains examples for finetuning and evaluating transformers on summarization tasks.
Please tag @patil-suraj with any issues/unexpected behaviors, or send a PR!
For deprecated `bertabs` instructions, see [`bertabs/README.md`](https://github.com/huggingface/transformers/blob/master/examples/research_projects/bertabs/README.md).
For the old `finetune_trainer.py` and related utils, see [`examples/legacy/seq2seq`](https://github.com/huggingface/transformers/blob/master/examples/legacy/seq2seq).
@@ -30,16 +30,16 @@ For the old `finetune_trainer.py` and related utils, see [`examples/legacy/seq2s
- `PegasusForConditionalGeneration`
- `T5ForConditionalGeneration`

`run_summarization.py` is a lightweight example of how to download and preprocess a dataset from the [🤗 Datasets](https://github.com/huggingface/datasets) library or use your own files (jsonlines or csv), then fine-tune one of the architectures above on it.
For custom datasets in `jsonlines` format please see: https://huggingface.co/docs/datasets/loading_datasets.html#json-files
and you will also find examples of these below.
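For reference, a jsonlines file is simply one JSON object per line. Here is a minimal sketch of a training file; the `text` and `summary` keys are illustrative and should match whatever you pass as `--text_column` and `--summary_column`:

```bash
# Write a tiny jsonlines training file (illustrative field names and content).
cat <<'EOF' > train.json
{"text": "The full document to summarize goes here.", "summary": "A short reference summary."}
{"text": "Another document to summarize.", "summary": "Its summary."}
EOF
```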
## With Trainer

Here is an example on a summarization task:

```bash
python examples/pytorch/summarization/run_summarization.py \
--model_name_or_path t5-small \
--do_train \
--do_eval \
@@ -63,7 +63,7 @@ And here is how you would use it on your own files, after adjusting the values f
`--train_file`, `--validation_file`, `--text_column` and `--summary_column` to match your setup:

```bash
python examples/pytorch/summarization/run_summarization.py \
--model_name_or_path t5-small \
--do_train \
--do_eval \
@@ -134,115 +134,64 @@ And as with the CSV files, you can specify which values to select from the file,
--summary_column summary \
```
## With Accelerate

Based on the script [`run_summarization_no_trainer.py`](https://github.com/huggingface/transformers/blob/master/examples/pytorch/summarization/run_summarization_no_trainer.py).

Like `run_summarization.py`, this script allows you to fine-tune any of the supported models on a
summarization task. The main difference is that this script exposes the bare training loop, to allow you to quickly
experiment and add any customization you would like. It offers fewer options than the script with `Trainer` (though
you can easily change the options for the optimizer or the dataloaders directly in the script), but it still runs in a
distributed setup, on TPU, and supports mixed precision by means of the
[🤗 `Accelerate`](https://github.com/huggingface/accelerate) library. You can use the script normally after installing
it:

```bash
pip install accelerate
```

then

```bash
python run_summarization_no_trainer.py \
--model_name_or_path t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \
--output_dir ~/tmp/tst-summarization
```

You can then use your usual launchers to run it in a distributed environment, but the easiest way is to run

```bash
accelerate config
```

and reply to the questions asked. Then

```bash
accelerate test
```

that will check everything is ready for training. Finally, you can launch training with

```bash
accelerate launch run_summarization_no_trainer.py \
--model_name_or_path t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \
--output_dir ~/tmp/tst-summarization
```

This command is the same and will work for:
- a CPU-only setup
- a setup with one GPU
- a distributed training with several GPUs (single or multi node)
- a training on TPUs

Note that this library is in alpha release so your feedback is more than welcome if you encounter any problem using it.

### Translation

Here is an example of a translation fine-tuning with a MarianMT model:

```bash
python examples/seq2seq/run_translation.py \
--model_name_or_path Helsinki-NLP/opus-mt-en-ro \
--do_train \
--do_eval \
--source_lang en \
--target_lang ro \
--dataset_name wmt16 \
--dataset_config_name ro-en \
--output_dir /tmp/tst-translation \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```

MBart and some T5 models require special handling.

T5 models `t5-small`, `t5-base`, `t5-large`, `t5-3b` and `t5-11b` must use an additional argument: `--source_prefix "translate {source_lang} to {target_lang}"`. For example:

```bash
python examples/seq2seq/run_translation.py \
--model_name_or_path t5-small \
--do_train \
--do_eval \
--source_lang en \
--target_lang ro \
--source_prefix "translate English to Romanian: " \
--dataset_name wmt16 \
--dataset_config_name ro-en \
--output_dir /tmp/tst-translation \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```

If you get a terrible BLEU score, make sure that you didn't forget to use the `--source_prefix` argument.

For the aforementioned group of T5 models it's important to remember that if you switch to a different language pair, make sure to adjust the source and target values in all 3 language-specific command line arguments: `--source_lang`, `--target_lang` and `--source_prefix`.

MBart models require a different format for `--source_lang` and `--target_lang` values, e.g. instead of `en` it expects `en_XX`, for `ro` it expects `ro_RO`. The full MBart specification for language codes can be found [here](https://huggingface.co/facebook/mbart-large-cc25). For example:

```bash
python examples/seq2seq/run_translation.py \
--model_name_or_path facebook/mbart-large-en-ro \
--do_train \
--do_eval \
--dataset_name wmt16 \
--dataset_config_name ro-en \
--source_lang en_XX \
--target_lang ro_RO \
--output_dir /tmp/tst-translation \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```

And here is how you would use the translation fine-tuning on your own files, after adjusting the
values for the arguments `--train_file`, `--validation_file` to match your setup:

```bash
python examples/seq2seq/run_translation.py \
--model_name_or_path t5-small \
--do_train \
--do_eval \
--source_lang en \
--target_lang ro \
--source_prefix "translate English to Romanian: " \
--dataset_name wmt16 \
--dataset_config_name ro-en \
--train_file path_to_jsonlines_file \
--validation_file path_to_jsonlines_file \
--output_dir /tmp/tst-translation \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```

The task of translation supports only custom JSONLINES files, with each line being a dictionary with a key `"translation"` whose value is another dictionary whose keys are the language pair. For example:

```json
{ "translation": { "en": "Others have dismissed him as a joke.", "ro": "Alții l-au numit o glumă." } }
{ "translation": { "en": "And some are holding out for an implosion.", "ro": "Iar alții așteaptă implozia." } }
```

Here the languages are Romanian (`ro`) and English (`en`).

If you want to use a pre-processed dataset that leads to high BLEU scores, but for the `en-de` language pair, you can use `--dataset_name stas/wmt14-en-de-pre-processed`, as follows:

```bash
python examples/seq2seq/run_translation.py \
--model_name_or_path t5-small \
--do_train \
--do_eval \
--source_lang en \
--target_lang de \
--source_prefix "translate English to German: " \
--dataset_name stas/wmt14-en-de-pre-processed \
--output_dir /tmp/tst-translation \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
datasets >= 1.1.3
sentencepiece != 0.1.92
protobuf
rouge-score
nltk
py7zr
torch >= 1.3
@@ -36,7 +36,8 @@ SRC_DIRS = [
"language-modeling",
"multiple-choice",
"question-answering",
"summarization",
"translation",
]
]
sys.path.extend(SRC_DIRS)
...
@@ -16,7 +16,7 @@ limitations under the License.

# Text classification examples

## GLUE tasks

Based on the script [`run_glue.py`](https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_glue.py).

@@ -129,7 +129,7 @@ and reply to the questions asked. Then
accelerate test
```
that will check everything is ready for training. Finally, you can launch training with
```bash
export TASK_NAME=mrpc
@@ -152,84 +152,3 @@ This command is the same and will work for:
- a training on TPUs

Note that this library is in alpha release so your feedback is more than welcome if you encounter any problem using it.
## TensorFlow 2.0 version
Based on the script [`run_tf_glue.py`](https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_tf_glue.py).
Fine-tuning the library's TensorFlow 2.0 BERT model for sequence classification on the MRPC task of the GLUE benchmark: [General Language Understanding Evaluation](https://gluebenchmark.com/).
This script has an option for mixed precision (Automatic Mixed Precision / AMP) to run models on Tensor Cores (NVIDIA Volta/Turing GPUs) and future hardware and an option for XLA, which uses the XLA compiler to reduce model runtime.
Options are toggled using `USE_XLA` or `USE_AMP` variables in the script.
These options and the below benchmark are provided by @tlkh.
Quick benchmarks from the script (no other modifications):
| GPU | Mode | Time (2nd epoch) | Val Acc (3 runs) |
| --------- | -------- | ----------------------- | ----------------------|
| Titan V | FP32 | 41s | 0.8438/0.8281/0.8333 |
| Titan V | AMP | 26s | 0.8281/0.8568/0.8411 |
| V100 | FP32 | 35s | 0.8646/0.8359/0.8464 |
| V100 | AMP | 22s | 0.8646/0.8385/0.8411 |
| 1080 Ti | FP32 | 55s | - |
Mixed precision (AMP) reduces the training time considerably for the same hardware and hyper-parameters (same batch size was used).
## Run generic text classification script in TensorFlow
The script [run_tf_text_classification.py](https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_tf_text_classification.py) allows users to run a text classification on their own CSV files. For now there are a few restrictions: the CSV files must have a header corresponding to the column names and no more than three columns: one column for the id, one column for the text, and another column for a second piece of text (in the case of an entailment classification, for example).

To use the script, one has to run the following command line:
```bash
# train_file: training dataset file location (mandatory if running with the --do_train option)
# dev_file: development dataset file location (mandatory if running with the --do_eval option)
# test_file: test dataset file location (mandatory if running with the --do_predict option)
# label_column_id: which column corresponds to the labels
python run_tf_text_classification.py \
--train_file train.csv \
--dev_file dev.csv \
--test_file test.csv \
--label_column_id 0 \
--model_name_or_path bert-base-multilingual-uncased \
--output_dir model \
--num_train_epochs 4 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 32 \
--do_train \
--do_eval \
--do_predict \
--logging_steps 10 \
--evaluation_strategy steps \
--save_steps 10 \
--overwrite_output_dir \
--max_seq_length 128
```
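For reference, here is a minimal sketch of what such a `train.csv` could look like under the restrictions above; the header names, the position of the label column (column 0 here, to match `--label_column_id 0`), and the values are all illustrative:

```bash
# Write a tiny CSV file with a header row (illustrative column names and content).
cat <<'EOF' > train.csv
label,sentence1,sentence2
1,The cat sat on the mat.,A cat is sitting on a mat.
0,He bought a new car.,She sold her bicycle.
EOF
```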
## XNLI
Based on the script [`run_xnli.py`](https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_xnli.py).
[XNLI](https://www.nyu.edu/projects/bowman/xnli/) is a crowd-sourced dataset based on [MultiNLI](http://www.nyu.edu/projects/bowman/multinli/). It is an evaluation benchmark for cross-lingual text representations. Pairs of text are labeled with textual entailment annotations for 15 different languages (including both high-resource languages such as English and low-resource languages such as Swahili).
#### Fine-tuning on XNLI
This example code fine-tunes mBERT (multi-lingual BERT) on the XNLI dataset. It runs in 106 mins on a single Tesla V100 16GB.
```bash
python run_xnli.py \
--model_name_or_path bert-base-multilingual-cased \
--language de \
--train_language en \
--do_train \
--do_eval \
--per_device_train_batch_size 32 \
--learning_rate 5e-5 \
--num_train_epochs 2.0 \
--max_seq_length 128 \
--output_dir /tmp/debug_xnli/ \
--save_steps -1
```
Training with the previously defined hyper-parameters yields the following results on the **test** set:
```bash
acc = 0.7093812375249501
```
@@ -2,3 +2,4 @@ accelerate
datasets >= 1.1.3
sentencepiece != 0.1.92
protobuf
torch >= 1.3