Examples reorg (#11350)

* Base move
* Examples reorganization
* Update references
* Put back test data
* Move conftest
* More fixes
* Move test data to test fixtures
* Update path
* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments and clean

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Unverified Commit dabeb152 authored Apr 21, 2021 by Sylvain Gugger Committed by GitHub Apr 21, 2021
20 changed files
--- a/examples/text-classification/run_glue_no_trainer.py
+++ b/examples/text-classification/run_glue_no_trainer.py
--- a/examples/text-classification/run_xnli.py
+++ b/examples/text-classification/run_xnli.py
--- a/examples/text-generation/README.md
+++ b/examples/text-generation/README.md
--- a/examples/multiple-choice/requirements.txt
+++ b/examples/multiple-choice/requirements.txt
 sentencepiece != 0.1.92
 protobuf
+torch >= 1.3
--- a/examples/text-generation/run_generation.py
+++ b/examples/text-generation/run_generation.py
--- a/examples/token-classification/README.md
+++ b/examples/token-classification/README.md
@@ -61,7 +61,7 @@ You can find the old version of the PyTorch script [here](https://github.com/hug
 ## Pytorch version, no Trainer
-Based on the script [run_ner_no_trainer.py](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_ner_no_trainer.py).
+Based on the script [run_ner_no_trainer.py](https://github.com/huggingface/transformers/blob/master/examples/pytorch/token-classification/run_ner_no_trainer.py).
 Like `run_ner.py`, this script allows you to fine-tune any of the models on the [hub](https://huggingface.co/models) on a
 token classification task, either NER, POS or CHUNKS tasks or your own data in a csv or a JSON file. The main difference is that this
@@ -126,66 +126,3 @@ This command is the same and will work for:
 - a training on TPUs
 Note that this library is in alpha release so your feedback is more than welcome if you encounter any problem using it.
-### TensorFlow version
-The following examples are covered in this section:
-* NER on the GermEval 2014 (German NER) dataset
-* Emerging and Rare Entities task: WNUT’17 (English NER) dataset
-Details and results for the fine-tuning provided by @stefan-it.
-### GermEval 2014 (German NER) dataset
-#### Data (Download and pre-processing steps)
-Data can be obtained from the [GermEval 2014](https://sites.google.com/site/germeval2014ner/data) shared task page.
-Here are the commands for downloading and pre-processing train, dev and test datasets. The original data format has four (tab-separated) columns, in a pre-processing step only the two relevant columns (token and outer span NER annotation) are extracted:
-```bash
-curl -L 'https://drive.google.com/uc?export=download&id=1Jjhbal535VVz2ap4v4r_rN1UEHTdLK5P' \
-| grep -v "^#" | cut -f 2,3 | tr '\t' ' ' > train.txt.tmp
-curl -L 'https://drive.google.com/uc?export=download&id=1ZfRcQThdtAR5PPRjIDtrVP7BtXSCUBbm' \
-| grep -v "^#" | cut -f 2,3 | tr '\t' ' ' > dev.txt.tmp
-curl -L 'https://drive.google.com/uc?export=download&id=1u9mb7kNJHWQCWyweMDRMuTFoOHOfeBTH' \
-| grep -v "^#" | cut -f 2,3 | tr '\t' ' ' > test.txt.tmp
-```
-The GermEval 2014 dataset contains some strange "control character" tokens like `'\x96', '\u200e', '\x95', '\xad' or '\x80'`.
-One problem with these tokens is, that `BertTokenizer` returns an empty token for them, resulting in misaligned `InputExample`s.
-The `preprocess.py` script located in the `scripts` folder a) filters these tokens and b) splits longer sentences into smaller ones (once the max. subtoken length is reached).
-Let's define some variables that we need for further pre-processing steps and training the model:
-```bash
-export MAX_LENGTH=128
-export BERT_MODEL=bert-base-multilingual-cased
-```
-Run the pre-processing script on training, dev and test datasets:
-```bash
-python3 scripts/preprocess.py train.txt.tmp $BERT_MODEL $MAX_LENGTH > train.txt
-python3 scripts/preprocess.py dev.txt.tmp $BERT_MODEL $MAX_LENGTH > dev.txt
-python3 scripts/preprocess.py test.txt.tmp $BERT_MODEL $MAX_LENGTH > test.txt
-```
-The GermEval 2014 dataset has much more labels than CoNLL-2002/2003 datasets, so an own set of labels must be used:
-```bash
-cat train.txt dev.txt test.txt | cut -d " " -f 2 | grep -v "^$"| sort | uniq > labels.txt
-```
-#### Prepare the run
-Additional environment variables must be set:
-```bash
-export OUTPUT_DIR=germeval-model
-export BATCH_SIZE=32
-export NUM_EPOCHS=3
-export SAVE_STEPS=750
-export SEED=1
-```
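The shell pipelines in the GermEval section removed above do two things: keep the token and outer-span tag columns of each non-comment line, then collect the distinct labels. A minimal Python sketch of both steps, assuming the four-column tab-separated format that section describes; `extract_token_and_tag` and `collect_labels` are hypothetical helpers, and the sample lines are invented for illustration:

```python
def extract_token_and_tag(lines):
    """Mimic `grep -v "^#" | cut -f 2,3 | tr '\t' ' '` on GermEval-style lines."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            continue  # drop comment lines, like grep -v "^#"
        if not line.strip():
            out.append("")  # keep the blank lines that separate sentences
            continue
        columns = line.split("\t")
        out.append(" ".join(columns[1:3]))  # fields 2 and 3 (1-based), tab replaced by a space
    return out


def collect_labels(lines):
    """Mimic `cut -d " " -f 2 | grep -v "^$" | sort | uniq` on "token label" lines."""
    labels = set()
    for line in lines:
        parts = line.split(" ")
        if len(parts) > 1 and parts[1]:
            labels.add(parts[1])
    # Python's default ordering, which may differ from a locale-aware `sort`
    return sorted(labels)


# Invented sample lines: index, token, outer NER tag, inner NER tag
sample = ["# comment line", "1\tSchartau\tB-PER\tO", "2\tsagte\tO\tO", ""]
pairs = extract_token_and_tag(sample)
print(pairs)                  # ['Schartau B-PER', 'sagte O', '']
print(collect_labels(pairs))  # ['B-PER', 'O']
```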
--- a/examples/token-classification/requirements.txt
+++ b/examples/token-classification/requirements.txt
 seqeval
 datasets >= 1.1.3
+torch >= 1.3
--- a/examples/token-classification/run.sh
+++ b/examples/token-classification/run.sh
--- a/examples/token-classification/run_ner.py
+++ b/examples/token-classification/run_ner.py
--- a/examples/token-classification/run_ner_no_trainer.py
+++ b/examples/token-classification/run_ner_no_trainer.py
--- a/examples/token-classification/run_no_trainer.sh
+++ b/examples/token-classification/run_no_trainer.sh
--- a/examples/pytorch/translation/README.md
+++ b/examples/pytorch/translation/README.md
+<!---
+Copyright 2020 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+    http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+## Translation
+This directory contains examples for finetuning and evaluating transformers on translation tasks.
+Please tag @patil-suraj with any issues/unexpected behaviors, or send a PR!
+For deprecated `bertabs` instructions, see [`bertabs/README.md`](https://github.com/huggingface/transformers/blob/master/examples/research_projects/bertabs/README.md).
+For the old `finetune_trainer.py` and related utils, see [`examples/legacy/seq2seq`](https://github.com/huggingface/transformers/blob/master/examples/legacy/seq2seq).
+### Supported Architectures
+- `BartForConditionalGeneration`
+- `FSMTForConditionalGeneration` (translation only)
+- `MBartForConditionalGeneration`
+- `MarianMTModel`
+- `PegasusForConditionalGeneration`
+- `T5ForConditionalGeneration`
+`run_translation.py` is a lightweight example of how to download and preprocess a dataset from the [🤗 Datasets](https://github.com/huggingface/datasets) library or use your own files (in jsonlines format), then fine-tune one of the architectures above on it.
+For custom datasets in `jsonlines` format, please see https://huggingface.co/docs/datasets/loading_datasets.html#json-files;
+you will also find examples below.
+## With Trainer
+Here is an example of a translation fine-tuning with a MarianMT model:
+```bash
+python examples/pytorch/seq2seq/run_translation.py \
+    --model_name_or_path Helsinki-NLP/opus-mt-en-ro \
+    --do_train \
+    --do_eval \
+    --source_lang en \
+    --target_lang ro \
+    --dataset_name wmt16 \
+    --dataset_config_name ro-en \
+    --output_dir /tmp/tst-translation \
+    --per_device_train_batch_size=4 \
+    --per_device_eval_batch_size=4 \
+    --overwrite_output_dir \
+    --predict_with_generate
+```
+MBart and some T5 models require special handling.
+T5 models `t5-small`, `t5-base`, `t5-large`, `t5-3b` and `t5-11b` must use an additional argument, `--source_prefix "translate {source_lang} to {target_lang}: "`, with the language names written out in English. For example:
+```bash
+python examples/pytorch/seq2seq/run_translation.py \
+    --model_name_or_path t5-small \
+    --do_train \
+    --do_eval \
+    --source_lang en \
+    --target_lang ro \
+    --source_prefix "translate English to Romanian: " \
+    --dataset_name wmt16 \
+    --dataset_config_name ro-en \
+    --output_dir /tmp/tst-translation \
+    --per_device_train_batch_size=4 \
+    --per_device_eval_batch_size=4 \
+    --overwrite_output_dir \
+    --predict_with_generate
+```
+If you get a terrible BLEU score, make sure that you didn't forget to use the `--source_prefix` argument.
+For the aforementioned group of T5 models, if you switch to a different language pair, make sure to adjust the source and target values in all three language-specific command-line arguments: `--source_lang`, `--target_lang` and `--source_prefix`.
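For the T5 checkpoints listed above, the prefix is plain text placed in front of every source sentence before tokenization. A minimal sketch of that step, assuming the prefix is joined to each source text by simple string concatenation; `LANGUAGE_NAMES`, `build_t5_prefix` and `add_source_prefix` are hypothetical helpers, not part of `run_translation.py`:

```python
# Hypothetical table from language codes to the English names used in T5 prefixes
LANGUAGE_NAMES = {"en": "English", "ro": "Romanian", "de": "German", "fr": "French"}


def build_t5_prefix(source_lang, target_lang):
    """Build a prefix such as "translate English to Romanian: " from two language codes."""
    return f"translate {LANGUAGE_NAMES[source_lang]} to {LANGUAGE_NAMES[target_lang]}: "


def add_source_prefix(texts, prefix):
    """Prepend the prefix to every source sentence (assumed plain concatenation)."""
    return [prefix + text for text in texts]


prefix = build_t5_prefix("en", "ro")
print(add_source_prefix(["Others have dismissed him as a joke."], prefix))
# ['translate English to Romanian: Others have dismissed him as a joke.']
```

Deriving the prefix from the same codes passed to `--source_lang` and `--target_lang` is one way to keep the three arguments consistent when switching language pairs.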
+MBart models require a different format for `--source_lang` and `--target_lang` values, e.g. instead of `en` it expects `en_XX`, for `ro` it expects `ro_RO`. The full MBart specification for language codes can be found [here](https://huggingface.co/facebook/mbart-large-cc25). For example:
+```bash
+python examples/pytorch/seq2seq/run_translation.py \
+    --model_name_or_path facebook/mbart-large-en-ro  \
+    --do_train \
+    --do_eval \
+    --dataset_name wmt16 \
+    --dataset_config_name ro-en \
+    --source_lang en_XX \
+    --target_lang ro_RO \
+    --output_dir /tmp/tst-translation \
+    --per_device_train_batch_size=4 \
+    --per_device_eval_batch_size=4 \
+    --overwrite_output_dir \
+    --predict_with_generate
+```
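Passing a plain code such as `en` where MBart expects `en_XX` is an easy mistake to make. A hypothetical pre-flight check (not part of the example scripts) that only validates the `xx_YY` shape followed by codes like `en_XX` and `ro_RO`; it does not verify that a code is actually supported by the checkpoint, for which the linked model page remains the reference:

```python
import re

# Two lowercase letters, an underscore, two uppercase letters, e.g. "en_XX" or "ro_RO"
MBART_CODE_PATTERN = re.compile(r"[a-z]{2}_[A-Z]{2}")


def looks_like_mbart_code(code):
    """Return True if `code` has the shape of an MBart language code (format check only)."""
    return MBART_CODE_PATTERN.fullmatch(code) is not None


for code in ["en_XX", "ro_RO", "en"]:
    print(code, looks_like_mbart_code(code))
# en_XX True
# ro_RO True
# en False
```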
+And here is how you would use the translation finetuning on your own files, after adjusting the
+values for the arguments `--train_file`, `--validation_file` to match your setup:
+```bash
+python examples/pytorch/seq2seq/run_translation.py \
+    --model_name_or_path t5-small \
+    --do_train \
+    --do_eval \
+    --source_lang en \
+    --target_lang ro \
+    --source_prefix "translate English to Romanian: " \
+    --train_file path_to_jsonlines_file \
+    --validation_file path_to_jsonlines_file \
+    --output_dir /tmp/tst-translation \
+    --per_device_train_batch_size=4 \
+    --per_device_eval_batch_size=4 \
+    --overwrite_output_dir \
+    --predict_with_generate
+```
+The translation task supports only custom JSONLINES files, where each line is a dictionary with a `"translation"` key whose value is another dictionary keyed by the two language codes of the pair. For example:
+```json
+{ "translation": { "en": "Others have dismissed him as a joke.", "ro": "Alții l-au numit o glumă." } }
+{ "translation": { "en": "And some are holding out for an implosion.", "ro": "Iar alții așteaptă implozia." } }
+```
+Here the languages are Romanian (`ro`) and English (`en`).
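A minimal sketch for producing and checking a file in this layout with the standard library, one JSON object per line. `write_translation_jsonl` and `read_translation_jsonl` are hypothetical helpers, the file name is a placeholder, and the sentence pair is taken from the sample above:

```python
import json
import os
import tempfile


def write_translation_jsonl(path, pairs, source_lang, target_lang):
    """Write (source, target) pairs as {"translation": {source_lang: ..., target_lang: ...}} lines."""
    with open(path, "w", encoding="utf-8") as f:
        for source, target in pairs:
            record = {"translation": {source_lang: source, target_lang: target}}
            # ensure_ascii=False keeps characters such as "ț" readable in the file
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_translation_jsonl(path, source_lang, target_lang):
    """Read the file back as (source, target) pairs, failing on lines missing a language key."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            translation = json.loads(line)["translation"]
            if source_lang not in translation or target_lang not in translation:
                raise ValueError(f"line {line_number}: missing {source_lang!r} or {target_lang!r}")
            pairs.append((translation[source_lang], translation[target_lang]))
    return pairs


pairs = [("Others have dismissed him as a joke.", "Alții l-au numit o glumă.")]
path = os.path.join(tempfile.mkdtemp(), "train.json")
write_translation_jsonl(path, pairs, "en", "ro")
print(read_translation_jsonl(path, "en", "ro") == pairs)  # True
```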
+If you want to use a pre-processed dataset that leads to high BLEU scores, but for the `en-de` language pair, you can use `--dataset_name stas/wmt14-en-de-pre-processed`, as follows:
+```bash
+python examples/pytorch/seq2seq/run_translation.py \
+    --model_name_or_path t5-small \
+    --do_train \
+    --do_eval \
+    --source_lang en \
+    --target_lang de \
+    --source_prefix "translate English to German: " \
+    --dataset_name stas/wmt14-en-de-pre-processed \
+    --output_dir /tmp/tst-translation \
+    --per_device_train_batch_size=4 \
+    --per_device_eval_batch_size=4 \
+    --overwrite_output_dir \
+    --predict_with_generate
+```
+## With Accelerate
+Based on the script [`run_translation_no_trainer.py`](https://github.com/huggingface/transformers/blob/master/examples/pytorch/translation/run_translation_no_trainer.py).
+Like `run_translation.py`, this script allows you to fine-tune any of the supported models on a
+translation task. The main difference is that this
+script exposes the bare training loop, so you can quickly experiment and add any customization you would like.
+It offers fewer options than the script with `Trainer` (though you can, for instance, change the options for the optimizer
+or the dataloaders directly in the script), but it still runs in a distributed setup and on TPU, and supports mixed precision by
+means of the [🤗 `Accelerate`](https://github.com/huggingface/accelerate) library. You can use the script normally
+after installing it:
+```bash
+pip install accelerate
+```
+then
+```bash
+python run_translation_no_trainer.py \
+    --model_name_or_path Helsinki-NLP/opus-mt-en-ro \
+    --source_lang en \
+    --target_lang ro \
+    --dataset_name wmt16 \
+    --dataset_config_name ro-en \
+    --output_dir ~/tmp/tst-translation
+```
+You can then use your usual launchers to run it in a distributed environment, but the easiest way is to run
+```bash
+accelerate config
+```
+and reply to the questions asked. Then
+```bash
+accelerate test
+```
+which will check that everything is ready for training. Finally, you can launch training with
+```bash
+accelerate launch run_translation_no_trainer.py \
+    --model_name_or_path Helsinki-NLP/opus-mt-en-ro \
+    --source_lang en \
+    --target_lang ro \
+    --dataset_name wmt16 \
+    --dataset_config_name ro-en \
+    --output_dir ~/tmp/tst-translation
+```
+This command is the same and will work for:
+- a CPU-only setup
+- a setup with one GPU
+- a distributed training with several GPUs (single or multi node)
+- a training on TPUs
+Note that this library is in alpha release so your feedback is more than welcome if you encounter any problem using it.
--- a/examples/seq2seq/requirements.txt
+++ b/examples/seq2seq/requirements.txt
@@ -2,6 +2,5 @@ datasets >= 1.1.3
 sentencepiece != 0.1.92
 protobuf
 sacrebleu >= 1.4.12
-rouge-score
-nltk
 py7zr
+torch >= 1.3
--- a/examples/seq2seq/run_translation.py
+++ b/examples/seq2seq/run_translation.py
--- a/examples/seq2seq/run_translation_no_trainer.py
+++ b/examples/seq2seq/run_translation_no_trainer.py
--- a/examples/xla_spawn.py
+++ b/examples/xla_spawn.py
--- a/examples/tensorflow/README.md
+++ b/examples/tensorflow/README.md
+<!---
+Copyright 2020 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+    http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Examples
+This folder contains actively maintained examples of using 🤗 Transformers with the TensorFlow backend, organized by NLP task. It is under construction, so we thank you for your patience!
+## The Big Table of Tasks
+Here is the list of all our examples:
+- with information on whether they are **built on top of `Keras`** (if not, they still work but may
+  lack some features),
+- whether or not they leverage the [🤗 Datasets](https://github.com/huggingface/datasets) library,
+- links to **Colab notebooks** to walk through the scripts and run them easily.
+<!--
+Coming soon!
+- links to **Cloud deployments** to be able to deploy large-scale trainings in the Cloud with little to no setup.
+-->
+| Task | Example datasets | Keras support | 🤗 Datasets | Colab
+|---|---|:---:|:---:|:---:|
+| **`language-modeling`** | WikiText-2 | - | - | -
+| [**`multiple-choice`**](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/multiple-choice) | SWAG | - | - | -
+| [**`question-answering`**](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/question-answering) | SQuAD | - | - | -
+| **`summarization`** | XSum | - | -  | -
+| [**`text-classification`**](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/text-classification) | GLUE | - | - | -
+| **`text-generation`** | n/a | - | n/a | -
+| **`token-classification`** | CoNLL NER | - | - | - 
+| **`translation`** | WMT | -  | - | -
--- a/examples/tensorflow/benchmarking/README.md
+++ b/examples/tensorflow/benchmarking/README.md
+<!---
+Copyright 2020 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+    http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# 🤗 Benchmark results
+Here, you can find a list of the different benchmark results created by the community.
+If you would like to list benchmark results on your favorite models of the [model hub](https://huggingface.co/models) here, please open a Pull Request and add it below.
+| Benchmark description | Results | Environment info |      Author      |
+|:----------|:-------------|:-------------|------:|
+| PyTorch Benchmark on inference for `bert-base-cased` |[memory](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_memory.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Patrick von Platen](https://github.com/patrickvonplaten) |
+| PyTorch Benchmark on inference for `bert-base-cased` |[time](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/inference_time.csv) | [env](https://github.com/patrickvonplaten/files_to_link_to/blob/master/bert_benchmark/env.csv) | [Patrick von Platen](https://github.com/patrickvonplaten) |
--- a/examples/tensorflow/benchmarking/plot_csv_file.py
+++ b/examples/tensorflow/benchmarking/plot_csv_file.py
+# Copyright 2020 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import csv
+from collections import defaultdict
+from dataclasses import dataclass, field
+from typing import List, Optional
+import matplotlib.pyplot as plt
+import numpy as np
+from matplotlib.ticker import ScalarFormatter
+from transformers import HfArgumentParser
+def list_field(default=None, metadata=None):
+    return field(default_factory=lambda: default, metadata=metadata)
+@dataclass
+class PlotArguments:
+    """
+    Arguments pertaining to the benchmark CSV file to plot and how to plot it.
+    """
+    csv_file: str = field(
+        metadata={"help": "The csv file to plot."},
+    )
+    plot_along_batch: bool = field(
+        default=False,
+        metadata={"help": "Whether to plot along batch size or sequence length. Defaults to sequence length."},
+    )
+    is_time: bool = field(
+        default=False,
+        metadata={"help": "Whether the csv file has time results or memory results. Defaults to memory results."},
+    )
+    no_log_scale: bool = field(
+        default=False,
+        metadata={"help": "Disable logarithmic scale when plotting"},
+    )
+    is_train: bool = field(
+        default=False,
+        metadata={
+            "help": "Whether the csv file has training results or inference results. Defaults to inference results."
+        },
+    )
+    figure_png_file: Optional[str] = field(
+        default=None,
+        metadata={"help": "Filename under which the plot will be saved. If not set, the plot is shown instead."},
+    )
+    short_model_names: Optional[List[str]] = list_field(
+        default=None, metadata={"help": "List of model names that are used instead of the ones in the csv file."}
+    )
+def can_convert_to_int(string):
+    try:
+        int(string)
+        return True
+    except ValueError:
+        return False
+def can_convert_to_float(string):
+    try:
+        float(string)
+        return True
+    except ValueError:
+        return False
+class Plot:
+    def __init__(self, args):
+        self.args = args
+        self.result_dict = defaultdict(lambda: dict(bsz=[], seq_len=[], result={}))
+        with open(self.args.csv_file, newline="") as csv_file:
+            reader = csv.DictReader(csv_file)
+            for row in reader:
+                model_name = row["model"]
+                self.result_dict[model_name]["bsz"].append(int(row["batch_size"]))
+                self.result_dict[model_name]["seq_len"].append(int(row["sequence_length"]))
+                if can_convert_to_int(row["result"]):
+                    # value is not None
+                    self.result_dict[model_name]["result"][
+                        (int(row["batch_size"]), int(row["sequence_length"]))
+                    ] = int(row["result"])
+                elif can_convert_to_float(row["result"]):
+                    # value is not None
+                    self.result_dict[model_name]["result"][
+                        (int(row["batch_size"]), int(row["sequence_length"]))
+                    ] = float(row["result"])
+    def plot(self):
+        fig, ax = plt.subplots()
+        title_str = "Time usage" if self.args.is_time else "Memory usage"
+        title_str = title_str + " for training" if self.args.is_train else title_str + " for inference"
+        if not self.args.no_log_scale:
+            # set logarithm scales
+            ax.set_xscale("log")
+            ax.set_yscale("log")
+        for axis in [ax.xaxis, ax.yaxis]:
+            axis.set_major_formatter(ScalarFormatter())
+        for model_name_idx, model_name in enumerate(self.result_dict.keys()):
+            batch_sizes = sorted(list(set(self.result_dict[model_name]["bsz"])))
+            sequence_lengths = sorted(list(set(self.result_dict[model_name]["seq_len"])))
+            results = self.result_dict[model_name]["result"]
+            (x_axis_array, inner_loop_array) = (
+                (batch_sizes, sequence_lengths) if self.args.plot_along_batch else (sequence_lengths, batch_sizes)
+            )
+            label_model_name = (
+                model_name if self.args.short_model_names is None else self.args.short_model_names[model_name_idx]
+            )
+            for inner_loop_value in inner_loop_array:
+                # Build the (batch_size, sequence_length) keys for this curve
+                if self.args.plot_along_batch:
+                    keys = [(x, inner_loop_value) for x in x_axis_array]
+                else:
+                    keys = [(inner_loop_value, x) for x in x_axis_array]
+                # Keep only x values that have a numeric result, so x and y stay aligned;
+                # use a new name so x_axis_array is not truncated for the next curve
+                x_values = np.asarray([x for x, key in zip(x_axis_array, keys) if key in results], dtype=int)
+                # Floats for both time (s) and memory (MB): an int dtype would truncate timings
+                y_axis_array = np.asarray([results[key] for key in keys if key in results], dtype=np.float32)
+                (x_axis_label, inner_loop_label) = (
+                    ("batch_size", "len") if self.args.plot_along_batch else ("in #tokens", "bsz")
+                )
+                plt.scatter(
+                    x_values, y_axis_array, label=f"{label_model_name} - {inner_loop_label}: {inner_loop_value}"
+                )
+                plt.plot(x_values, y_axis_array, "--")
+            title_str += f" {label_model_name} vs."
+        title_str = title_str[:-4]
+        y_axis_label = "Time in s" if self.args.is_time else "Memory in MB"
+        # plot
+        plt.title(title_str)
+        plt.xlabel(x_axis_label)
+        plt.ylabel(y_axis_label)
+        plt.legend()
+        if self.args.figure_png_file is not None:
+            plt.savefig(self.args.figure_png_file)
+        else:
+            plt.show()
+def main():
+    parser = HfArgumentParser(PlotArguments)
+    plot_args = parser.parse_args_into_dataclasses()[0]
+    plot = Plot(args=plot_args)
+    plot.plot()
+if __name__ == "__main__":
+    main()
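`Plot.__init__` in `plot_csv_file.py` above reads a CSV with the columns `model`, `batch_size`, `sequence_length` and `result`, and stores each numeric result per model under a `(batch_size, sequence_length)` key. A standard-library sketch of that grouping, handy for checking a file before plotting; `load_results` is a hypothetical helper that converts every numeric result to `float` (the script keeps integers as `int`), and the model name and numbers are placeholders, not benchmark measurements:

```python
import csv
import io
from collections import defaultdict


def load_results(csv_text):
    """Group results by model, keyed by (batch_size, sequence_length), as Plot.__init__ does."""
    results = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (int(row["batch_size"]), int(row["sequence_length"]))
        try:
            value = float(row["result"])
        except ValueError:
            continue  # non-numeric results are skipped, as in the script
        results[row["model"]][key] = value
    return dict(results)


# Placeholder values for illustration only
csv_text = """model,batch_size,sequence_length,result
my-model,8,128,0.012
my-model,8,512,N/A
"""
print(load_results(csv_text))
# {'my-model': {(8, 128): 0.012}}
```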
--- a/examples/tensorflow/benchmarking/requirements.txt
+++ b/examples/tensorflow/benchmarking/requirements.txt
+tensorflow >= 2.3
\ No newline at end of file