Unverified Commit dabeb152 authored by Sylvain Gugger, committed by GitHub

Examples reorg (#11350)



* Base move

* Examples reorganization

* Update references

* Put back test data

* Move conftest

* More fixes

* Move test data to test fixtures

* Update path

* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments and clean
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
parent ca7ff64f
@@ -306,12 +306,12 @@ jobs:
             - v0.4-{{ checksum "setup.py" }}
         - run: pip install --upgrade pip
         - run: pip install .[sklearn,torch,sentencepiece,testing]
-        - run: pip install -r examples/_tests_requirements.txt
+        - run: pip install -r examples/pytorch/_tests_requirements.txt
         - save_cache:
             key: v0.4-torch_examples-{{ checksum "setup.py" }}
             paths:
                 - '~/.cache/pip'
-        - run: TRANSFORMERS_IS_CI=1 python -m pytest -n 8 --dist=loadfile -s --make-reports=examples_torch ./examples/ | tee examples_output.txt
+        - run: TRANSFORMERS_IS_CI=1 python -m pytest -n 8 --dist=loadfile -s --make-reports=examples_torch ./examples/pytorch/ | tee examples_output.txt
         - store_artifacts:
             path: ~/transformers/examples_output.txt
         - store_artifacts:
...
@@ -59,7 +59,7 @@ jobs:
         HF_HOME: /mnt/cache
         TRANSFORMERS_IS_CI: yes
       run: |
-        pip install -r examples/_tests_requirements.txt
+        pip install -r examples/pytorch/_tests_requirements.txt
         python -m pytest -n 1 --dist=loadfile --make-reports=examples_torch_gpu examples
     - name: Failure short reports
...
@@ -285,7 +285,7 @@ $ python -m pytest -n auto --dist=loadfile -s -v ./tests/
 and for the examples:
 ```bash
-$ pip install -r examples/requirements.txt  # only needed the first time
+$ pip install -r examples/xxx/requirements.txt  # only needed the first time
 $ python -m pytest -n auto --dist=loadfile -s -v ./examples/
 ```
 In fact, that's how `make test` and `make test-examples` are implemented (sans the `pip install` line)!
...
@@ -73,7 +73,7 @@ test:
 # Run tests for examples
 test-examples:
-	python -m pytest -n auto --dist=loadfile -s -v ./examples/
+	python -m pytest -n auto --dist=loadfile -s -v ./examples/pytorch/
 # Run tests for SageMaker DLC release
...
@@ -53,7 +53,7 @@ RUN git clone https://github.com/huggingface/transformers.git && \
     git checkout CI && \
     cd .. && \
     pip install ./transformers && \
-    pip install -r ./transformers/examples/requirements.txt && \
+    pip install -r ./transformers/examples/pytorch/_test_requirements.txt && \
     pip install pytest
 RUN python -c "import torch_xla; print(torch_xla.__version__)"
...
@@ -27,7 +27,7 @@ local bertBaseCased = base.BaseTest {
   },
   command: utils.scriptCommand(
     |||
-      python -m pytest -s transformers/examples/test_xla_examples.py -v
+      python -m pytest -s transformers/examples/pytorch/test_xla_examples.py -v
       test_exit_code=$?
       echo "\nFinished running commands.\n"
       test $test_exit_code -eq 0
...
@@ -65,10 +65,10 @@ respectively.
 .. code-block:: bash
     ## PYTORCH CODE
-    python examples/benchmarking/run_benchmark.py --help
+    python examples/pytorch/benchmarking/run_benchmark.py --help
     ## TENSORFLOW CODE
-    python examples/benchmarking/run_benchmark_tf.py --help
+    python examples/tensorflow/benchmarking/run_benchmark_tf.py --help
 An instantiated benchmark object can then simply be run by calling ``benchmark.run()``.
...
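For context on the scripts touched above: the same measurements can also be run from Python via the library's benchmark classes. A minimal sketch; the model name, batch size, and sequence length are illustrative, not values from this commit:

```python
from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

# Configure what to measure; all values here are illustrative.
args = PyTorchBenchmarkArguments(
    models=["bert-base-uncased"],
    batch_sizes=[8],
    sequence_lengths=[128],
)

benchmark = PyTorchBenchmark(args)
results = benchmark.run()  # prints inference speed and memory tables
```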
@@ -33,8 +33,8 @@ You can convert any TensorFlow checkpoint for BERT (in particular `the pre-train
 This CLI takes as input a TensorFlow checkpoint (three files starting with ``bert_model.ckpt``\ ) and the associated
 configuration file (\ ``bert_config.json``\ ), and creates a PyTorch model for this configuration, loads the weights
 from the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that
-can be imported using ``from_pretrained()`` (see example in :doc:`quicktour` , `run_glue.py
-<https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_glue.py>`_\ ).
+can be imported using ``from_pretrained()`` (see example in :doc:`quicktour` , :prefix_link:`run_glue.py
+<examples/pytorch/text-classification/run_glue.py>` \ ).
 You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow
 checkpoint (the three files starting with ``bert_model.ckpt``\ ) but be sure to keep the configuration file (\
...
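As a point of reference, a TF 1.x BERT checkpoint can also be loaded from Python without the CLI, since `from_pretrained()` accepts a `.ckpt.index` path together with `from_tf=True`. A sketch assuming hypothetical local `bert_config.json` and `bert_model.ckpt` files:

```python
from transformers import BertConfig, BertForPreTraining

# Paths below are placeholders for a locally downloaded TF checkpoint.
config = BertConfig.from_json_file("bert_config.json")
model = BertForPreTraining.from_pretrained(
    "bert_model.ckpt.index", from_tf=True, config=config
)
model.save_pretrained("converted_bert")  # writes standard PyTorch save files
```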
@@ -168,13 +168,13 @@ Here is an example of how this can be used on a filesystem that is shared betwee
 On the instance with the normal network run your program which will download and cache models (and optionally datasets if you use 🤗 Datasets). For example:
 ```
-python examples/seq2seq/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
+python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
 ```
 and then with the same filesystem you can now run the same program on a firewalled instance:
 ```
 HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
-python examples/seq2seq/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
+python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
 ```
 and it should succeed without any hanging waiting to timeout.
...
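The same offline behavior can also be requested per call rather than through environment variables. A minimal sketch, assuming the model files are already in the local cache:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# With the files already cached, these calls never touch the network.
tokenizer = AutoTokenizer.from_pretrained("t5-small", local_files_only=True)
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small", local_files_only=True)
```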
@@ -68,8 +68,8 @@ Additionally, the following method can be used to load values from a data file a
 Example usage
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-An example using these processors is given in the `run_glue.py
-<https://github.com/huggingface/pytorch-transformers/blob/master/examples/text-classification/run_glue.py>`__ script.
+An example using these processors is given in the :prefix_link:`run_glue.py
+<examples/legacy/text-classification/run_glue.py>` script.
 XNLI
@@ -89,8 +89,8 @@ This library hosts the processor to load the XNLI data:
 Please note that since the gold labels are available on the test set, evaluation is performed on the test set.
-An example using these processors is given in the `run_xnli.py
-<https://github.com/huggingface/pytorch-transformers/blob/master/examples/text-classification/run_xnli.py>`__ script.
+An example using these processors is given in the :prefix_link:`run_xnli.py
+<examples/legacy/text-classification/run_xnli.py>` script.
 SQuAD
@@ -169,4 +169,4 @@ Using `tensorflow_datasets` is as easy as using a data file:
 Another example using these processors is given in the :prefix_link:`run_squad.py
-<examples/question-answering/run_squad.py>` script.
+<examples/legacy/question-answering/run_squad.py>` script.
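For context, these processors can be used directly from Python as well as through the scripts. A short sketch using the GLUE processors; the data directory is a placeholder for a local GLUE download:

```python
from transformers import glue_processors

# "mrpc" is one of the GLUE task keys; the directory below is a placeholder.
processor = glue_processors["mrpc"]()
print(processor.get_labels())  # ['0', '1']
examples = processor.get_train_examples("/path/to/glue/MRPC")
print(examples[0].text_a, examples[0].label)
```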
@@ -338,7 +338,7 @@ For example here is how you could use it for ``run_translation.py`` with 2 GPUs:
 .. code-block:: bash
-    python -m torch.distributed.launch --nproc_per_node=2 examples/seq2seq/run_translation.py \
+    python -m torch.distributed.launch --nproc_per_node=2 examples/pytorch/translation/run_translation.py \
     --model_name_or_path t5-small --per_device_train_batch_size 1 \
     --output_dir output_dir --overwrite_output_dir \
     --do_train --max_train_samples 500 --num_train_epochs 1 \
@@ -363,7 +363,7 @@ For example here is how you could use it for ``run_translation.py`` with 2 GPUs:
 .. code-block:: bash
-    python -m torch.distributed.launch --nproc_per_node=2 examples/seq2seq/run_translation.py \
+    python -m torch.distributed.launch --nproc_per_node=2 examples/pytorch/translation/run_translation.py \
     --model_name_or_path t5-small --per_device_train_batch_size 1 \
     --output_dir output_dir --overwrite_output_dir \
     --do_train --max_train_samples 500 --num_train_epochs 1 \
@@ -540,7 +540,7 @@ Here is an example of running ``run_translation.py`` under DeepSpeed deploying a
 .. code-block:: bash
-    deepspeed examples/seq2seq/run_translation.py \
+    deepspeed examples/pytorch/translation/run_translation.py \
     --deepspeed tests/deepspeed/ds_config.json \
     --model_name_or_path t5-small --per_device_train_batch_size 1 \
     --output_dir output_dir --overwrite_output_dir --fp16 \
@@ -565,7 +565,7 @@ To deploy DeepSpeed with one GPU adjust the :class:`~transformers.Trainer` comma
 .. code-block:: bash
-    deepspeed --num_gpus=1 examples/seq2seq/run_translation.py \
+    deepspeed --num_gpus=1 examples/pytorch/translation/run_translation.py \
     --deepspeed tests/deepspeed/ds_config.json \
     --model_name_or_path t5-small --per_device_train_batch_size 1 \
     --output_dir output_dir --overwrite_output_dir --fp16 \
@@ -617,7 +617,7 @@ Notes:
 .. code-block:: bash
-    deepspeed --include localhost:1 examples/seq2seq/run_translation.py ...
+    deepspeed --include localhost:1 examples/pytorch/translation/run_translation.py ...
 In this example, we tell DeepSpeed to use GPU 1 (second gpu).
@@ -711,7 +711,7 @@ shell from a cell. For example, to use ``run_translation.py`` you would launch i
 .. code-block::
     !git clone https://github.com/huggingface/transformers
-    !cd transformers; deepspeed examples/seq2seq/run_translation.py ...
+    !cd transformers; deepspeed examples/pytorch/translation/run_translation.py ...
 or with ``%%bash`` magic, where you can write a multi-line code for the shell program to run:
@@ -721,7 +721,7 @@ or with ``%%bash`` magic, where you can write a multi-line code for the shell pr
     git clone https://github.com/huggingface/transformers
     cd transformers
-    deepspeed examples/seq2seq/run_translation.py ...
+    deepspeed examples/pytorch/translation/run_translation.py ...
 In such case you don't need any of the code presented at the beginning of this section.
...
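For context on these launcher commands: inside a script, the DeepSpeed integration is enabled by pointing `TrainingArguments` at the config file (or an equivalent dict). A minimal sketch mirroring the commands above:

```python
from transformers import TrainingArguments

# Passing `deepspeed` a config file path (or dict) enables the integration;
# the script is then started with the `deepspeed` launcher as shown above.
training_args = TrainingArguments(
    output_dir="output_dir",
    deepspeed="tests/deepspeed/ds_config.json",
    fp16=True,
)
```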
@@ -43,7 +43,7 @@ Examples
 _______________________________________________________________________________________________________________________
 - Examples and scripts for fine-tuning BART and other models for sequence to sequence tasks can be found in
-  :prefix_link:`examples/seq2seq/ <examples/seq2seq/README.md>`.
+  :prefix_link:`examples/pytorch/summarization/ <examples/pytorch/summarization/README.md>`.
 - An example of how to train :class:`~transformers.BartForConditionalGeneration` with a Hugging Face :obj:`datasets`
   object can be found in this `forum discussion
   <https://discuss.huggingface.co/t/train-bart-for-conditional-generation-e-g-summarization/1904>`__.
...
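As a quick illustration of the task these scripts cover, inference with a summarization-finetuned BART checkpoint looks roughly like this; the checkpoint, input text, and generation settings are illustrative:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

article = "PG&E stated it scheduled the blackouts in response to forecasts for high winds."
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```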
@@ -43,7 +43,7 @@ Examples
 _______________________________________________________________________________________________________________________
 - BARThez can be fine-tuned on sequence-to-sequence tasks in a similar way as BART, check:
-  :prefix_link:`examples/seq2seq/ <examples/seq2seq/README.md>`.
+  :prefix_link:`examples/pytorch/summarization/ <examples/pytorch/summarization/README.md>`.
 BarthezTokenizer
...
@@ -44,8 +44,8 @@ Tips:
 - DistilBERT doesn't have options to select the input positions (:obj:`position_ids` input). This could be added if
   necessary though, just let us know if you need this option.
-This model was contributed by `victorsanh <https://huggingface.co/victorsanh>`__. The original code can be found `here
-<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
+This model was contributed by `victorsanh <https://huggingface.co/victorsanh>`__. The original code can be found
+:prefix_link:`here <examples/research-projects/distillation>`.
 DistilBertConfig
...
@@ -53,7 +53,8 @@ Examples
 _______________________________________________________________________________________________________________________
 - :prefix_link:`Script <examples/research_projects/seq2seq-distillation/finetune_pegasus_xsum.sh>` to fine-tune pegasus
-  on the XSUM dataset. Data download instructions at :prefix_link:`examples/seq2seq/ <examples/seq2seq/README.md>`.
+  on the XSUM dataset. Data download instructions at :prefix_link:`examples/pytorch/summarization/
+  <examples/pytorch/summarization/README.md>`.
 - FP16 is not supported (help/ideas on this appreciated!).
 - The adafactor optimizer is recommended for pegasus fine-tuning.
...
@@ -21,7 +21,7 @@ Question Answering <https://yjernite.github.io/lfqa.html>`__. RetriBERT is a sma
 pair of BERT encoders with lower-dimension projection for dense semantic indexing of text.
 This model was contributed by `yjernite <https://huggingface.co/yjernite>`__. Code to train and use the model can be
-found `here <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
+found :prefix_link:`here <examples/research-projects/distillation>`.
 RetriBertConfig
...
@@ -41,7 +41,7 @@ Tips:
   using only a sub-set of the output tokens as target which are selected with the :obj:`target_mapping` input.
 - To use XLNet for sequential decoding (i.e. not in fully bi-directional setting), use the :obj:`perm_mask` and
   :obj:`target_mapping` inputs to control the attention span and outputs (see examples in
-  `examples/text-generation/run_generation.py`)
+  `examples/pytorch/text-generation/run_generation.py`)
 - XLNet is one of the few models that has no sequence length limit.
 This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
...
@@ -682,7 +682,8 @@ The `mbart-large-en-ro checkpoint <https://huggingface.co/facebook/mbart-large-e
 romanian translation.
 The `mbart-large-cc25 <https://huggingface.co/facebook/mbart-large-cc25>`_ checkpoint can be finetuned for other
-translation and summarization tasks, using code in ```examples/seq2seq/``` , but is not very useful without finetuning.
+translation and summarization tasks, using code in ```examples/pytorch/translation/``` , but is not very useful without
+finetuning.
 ProphetNet
...
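For a sense of what the finetuned en-ro checkpoint does, a translation sketch following the MBart API of this era; the input sentence is illustrative and the forced target language is an assumption taken from the model's documented usage:

```python
from transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro", src_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")

inputs = tokenizer("UN Chief Says There Is No Plan to Stop Chemical Weapons in Syria", return_tensors="pt")
# Force Romanian as the target language for generation.
generated = model.generate(**inputs, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"])
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```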
@@ -90,8 +90,8 @@ You can then feed it all as input to your model:
 >>> outputs = model(input_ids, langs=langs)
-The example :prefix_link:`run_generation.py <examples/text-generation/run_generation.py>` can generate text using the
-CLM checkpoints from XLM, using the language embeddings.
+The example :prefix_link:`run_generation.py <examples/pytorch/text-generation/run_generation.py>` can generate text
+using the CLM checkpoints from XLM, using the language embeddings.
 XLM without Language Embeddings
 -----------------------------------------------------------------------------------------------------------------------
...
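Putting the `langs` pattern from the hunk above into one self-contained snippet; the checkpoint and prompt are illustrative:

```python
import torch
from transformers import XLMTokenizer, XLMWithLMHeadModel

tokenizer = XLMTokenizer.from_pretrained("xlm-clm-enfr-1024")
model = XLMWithLMHeadModel.from_pretrained("xlm-clm-enfr-1024")

input_ids = torch.tensor([tokenizer.encode("Wikipedia was used to")])
# One language id per token, here English for the whole sequence.
language_id = tokenizer.lang2id["en"]
langs = torch.full_like(input_ids, language_id)
outputs = model(input_ids, langs=langs)
```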
@@ -325,7 +325,7 @@ When you create a `HuggingFace` Estimator, you can specify a [training script th
 If you are using `git_config` to run the [🤗 Transformers examples scripts](https://github.com/huggingface/transformers/tree/master/examples) keep in mind that you need to configure the right `'branch'` for you `transformers_version`, e.g. if you use `transformers_version='4.4.2` you have to use `'branch':'v4.4.2'`.
-As an example to use `git_config` with an [example script from the transformers repository](https://github.com/huggingface/transformers/tree/master/examples/text-classification).
+As an example to use `git_config` with an [example script from the transformers repository](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification).
 _Tip: define `output_dir` as `/opt/ml/model` in the hyperparameter for the script to save your model to S3 after training._
@@ -338,7 +338,7 @@ git_config = {'repo': 'https://github.com/huggingface/transformers.git','branch'
 # create the Estimator
 huggingface_estimator = HuggingFace(
     entry_point='run_glue.py',
-    source_dir='./examples/text-classification',
+    source_dir='./examples/pytorch/text-classification',
     git_config=git_config,
     instance_type='ml.p3.2xlarge',
     instance_count=1,
...
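For completeness, a hedged sketch of how such an Estimator is typically assembled and launched; the role ARN, version pins, and hyperparameters below are placeholders rather than values from this commit:

```python
from sagemaker.huggingface import HuggingFace

# Branch must match transformers_version, per the note above.
git_config = {"repo": "https://github.com/huggingface/transformers.git", "branch": "v4.6.1"}

huggingface_estimator = HuggingFace(
    entry_point="run_glue.py",
    source_dir="./examples/pytorch/text-classification",
    git_config=git_config,
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    role="arn:aws:iam::111122223333:role/sagemaker-execution-role",  # placeholder role
    transformers_version="4.6.1",  # illustrative version pins
    pytorch_version="1.7.1",
    py_version="py36",
    hyperparameters={
        "model_name_or_path": "bert-base-uncased",
        "task_name": "mrpc",
        "do_train": True,
        "output_dir": "/opt/ml/model",  # saved to S3 after training
    },
)
huggingface_estimator.fit()  # starts the SageMaker training job
```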