Convert rst files (#14888)

* Convert all tutorials and guides
* Convert all remaining rst to mdx
* Track and fix bad links

Unverified Commit 207594be authored Dec 22, 2021 by Sylvain Gugger Committed by GitHub Dec 22, 2021
20 changed files
--- a/docs/source/add_new_model.rst
+++ b/docs/source/add_new_model.rst
--- a/docs/source/add_new_pipeline.mdx
+++ b/docs/source/add_new_pipeline.mdx
+<!--Copyright 2020 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+# How to add a pipeline to 🤗 Transformers?
+First and foremost, you need to decide the raw entries the pipeline will be able to take. It can be strings, raw bytes,
+dictionaries or whatever seems to be the most likely desired input. Try to keep these inputs as pure Python as possible
+as it makes compatibility easier (even through other languages via JSON). Those will be the `inputs` of the
+pipeline (`preprocess`).
+Then define the `outputs`. Same policy as the `inputs`. The simpler, the better. Those will be the outputs of
+the `postprocess` method.
+Start by inheriting from the base class `Pipeline` and implementing the 4 required methods: `preprocess`,
+`_forward`, `postprocess` and `_sanitize_parameters`.
+```python
+from transformers import Pipeline
+class MyPipeline(Pipeline):
+    def _sanitize_parameters(self, **kwargs):
+        preprocess_kwargs = {}
+        if "maybe_arg" in kwargs:
+            preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]
+        return preprocess_kwargs, {}, {}
+    def preprocess(self, inputs, maybe_arg=2):
+        model_input = Tensor(....)
+        return {"model_input": model_input}
+    def _forward(self, model_inputs):
+        # model_inputs == {"model_input": model_input}
+        outputs = self.model(**model_inputs)
+        # Maybe {"logits": Tensor(...)}
+        return outputs
+    def postprocess(self, model_outputs):
+        best_class = model_outputs["logits"].softmax(-1)
+        return best_class
+```
+This breakdown exists to provide relatively seamless support for CPU/GPU, while allowing
+pre/postprocessing to run on the CPU on different threads.
+`preprocess` will take the originally defined inputs, and turn them into something feedable to the model. It might
+contain more information and is usually a `Dict`.
+`_forward` is the implementation detail and is not meant to be called directly. `forward` is the preferred
+method to call, as it contains safeguards to make sure everything is working on the expected device. Anything
+linked to a real model belongs in the `_forward` method; anything else belongs in preprocess/postprocess.
+The `postprocess` method takes the output of `_forward` and turns it into the final output that was decided
+earlier.
+`_sanitize_parameters` exists to allow users to pass any parameters whenever they wish, be it at initialization
+time `pipeline(...., maybe_arg=4)` or at call time `pipe = pipeline(...); output = pipe(...., maybe_arg=4)`.
+`_sanitize_parameters` returns the 3 dicts of kwargs that will be passed directly to `preprocess`,
+`_forward` and `postprocess`. Don't fill anything if the caller didn't pass any extra parameter. That
+makes it possible to keep the default arguments in the function definition, which is always more "natural".
+A classic example would be a `top_k` argument in the postprocessing of classification tasks.
+```python
+>>> pipe = pipeline("my-new-task")
+>>> pipe("This is a test")
+[{"label": "1-star", "score": 0.8}, {"label": "2-star", "score": 0.1}, {"label": "3-star", "score": 0.05},
+{"label": "4-star", "score": 0.025}, {"label": "5-star", "score": 0.025}]
+>>> pipe("This is a test", top_k=2)
+[{"label": "1-star", "score": 0.8}, {"label": "2-star", "score": 0.1}]
+```
+In order to achieve that, we'll update our `postprocess` method with a `top_k` parameter that defaults to `5`, and
+edit `_sanitize_parameters` to allow this new parameter.
+```python
+def postprocess(self, model_outputs, top_k=5):
+    best_class = model_outputs["logits"].softmax(-1)
+    # Add logic to handle top_k
+    return best_class
+def _sanitize_parameters(self, **kwargs):
+    preprocess_kwargs = {}
+    if "maybe_arg" in kwargs:
+        preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]
+    postprocess_kwargs = {}
+    if "top_k" in kwargs:
+        postprocess_kwargs["top_k"] = kwargs["top_k"]
+    return preprocess_kwargs, {}, postprocess_kwargs
+```
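The parameter routing and `top_k` handling described above can be sketched without any framework. This is only an illustration under stated assumptions: the standalone function names, the `labels` argument, and the use of plain Python lists for the logits are hypothetical, and a real pipeline would implement these as methods on the `Pipeline` subclass and operate on tensors.

```python
import math


def sanitize_parameters(**kwargs):
    # Route each user-supplied kwarg to the stage that consumes it.
    # Nothing is filled in when the caller passes nothing, so the
    # defaults in the function signatures below stay in effect.
    preprocess_kwargs = {}
    if "maybe_arg" in kwargs:
        preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]
    postprocess_kwargs = {}
    if "top_k" in kwargs:
        postprocess_kwargs["top_k"] = kwargs["top_k"]
    return preprocess_kwargs, {}, postprocess_kwargs


def postprocess(logits, labels, top_k=5):
    # Numerically stable softmax over a plain list of floats
    # (stand-in for model_outputs["logits"].softmax(-1)).
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    scored = [{"label": label, "score": e / total} for label, e in zip(labels, exps)]
    # Keep only the top_k highest-scoring entries.
    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:top_k]


# Call-time usage: top_k ends up in the postprocess kwargs only.
_, _, post_kwargs = sanitize_parameters(top_k=2)
best = postprocess([2.0, 1.0, 0.0], ["1-star", "2-star", "3-star"], **post_kwargs)
```

Because `sanitize_parameters` returns empty dicts when no extra argument is given, the default `top_k=5` lives in exactly one place: the `postprocess` signature.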
+Try to keep the inputs/outputs very simple and ideally JSON-serializable, as it makes the pipeline very easy to use
+without requiring users to understand new kinds of objects. It's also relatively common to support many different types
+of arguments for ease of use (audio files, for example, can be passed as filenames, URLs or pure bytes).
+## Adding it to the list of supported tasks
+Go to `src/transformers/pipelines/__init__.py` and fill in `SUPPORTED_TASKS` with your newly created pipeline.
+If possible it should provide a default model.
+## Adding tests
+Create a new file `tests/test_pipelines_MY_PIPELINE.py`, following the example of the other tests.
+The `run_pipeline_test` function will be very generic and run on small random models on every possible
+architecture as defined by `model_mapping` and `tf_model_mapping`.
+This is very important to test future compatibility, meaning if someone adds a new model for
+`XXXForQuestionAnswering` then the pipeline test will attempt to run on it. Because the models are random, it's
+impossible to check for actual values, which is why there is a helper `ANY` that simply attempts to match the
+TYPE of the pipeline output.
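To illustrate what matching on type rather than value can look like, here is a small sketch of such a helper. It is a hypothetical stand-in written for this example and is not necessarily the implementation used in the test suite; the sample `output` is made up.

```python
class ANY:
    """Compares equal to any object that is an instance of the given types."""

    def __init__(self, *types):
        self._types = types

    def __eq__(self, other):
        # Only the type is checked; the actual value is ignored.
        return isinstance(other, self._types)

    def __repr__(self):
        return "ANY(" + ", ".join(t.__name__ for t in self._types) + ")"


# A random model produces arbitrary scores, so only the shape and the
# types of the output are asserted.
output = [{"label": "LABEL_0", "score": 0.5012}]
matches = output == [{"label": ANY(str), "score": ANY(float)}]
```

The comparison works because Python falls back to the right-hand operand's `__eq__` when `str` or `float` does not know how to compare itself with `ANY`.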
+You also *need* to implement 2 (ideally 4) tests.
+- `test_small_model_pt` : Define 1 small model for this pipeline (doesn't matter if the results don't make sense)
+  and test the pipeline outputs. The results should be the same as `test_small_model_tf`.
+- `test_small_model_tf` : Define 1 small model for this pipeline (doesn't matter if the results don't make sense)
+  and test the pipeline outputs. The results should be the same as `test_small_model_pt`.
+- `test_large_model_pt` (`optional`): Tests the pipeline on a real pipeline where the results are supposed to
+  make sense. These tests are slow and should be marked as such. Here the goal is to showcase the pipeline and to make
+  sure there is no drift in future releases.
+- `test_large_model_tf` (`optional`): Tests the pipeline on a real pipeline where the results are supposed to
+  make sense. These tests are slow and should be marked as such. Here the goal is to showcase the pipeline and to make
+  sure there is no drift in future releases.
--- a/docs/source/add_new_pipeline.rst
+++ b/docs/source/add_new_pipeline.rst
-.. 
-    Copyright 2020 The HuggingFace Team. All rights reserved.
-    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
-    the License. You may obtain a copy of the License at
-        http://www.apache.org/licenses/LICENSE-2.0
-    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
-    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-How to add a pipeline to 🤗 Transformers?
-=======================================================================================================================
-First and foremost, you need to decide the raw entries the pipeline will be able to take. It can be strings, raw bytes,
-dictionaries or whatever seems to be the most likely desired input. Try to keep these inputs as pure Python as possible
-as it makes compatibility easier (even through other languages via JSON). Those will be the :obj:`inputs` of the
-pipeline (:obj:`preprocess`).
-Then define the :obj:`outputs`. Same policy as the :obj:`inputs`. The simpler, the better. Those will be the outputs of
-:obj:`postprocess` method.
-Start by inheriting the base class :obj:`Pipeline`. with the 4 methods needed to implement :obj:`preprocess`,
-:obj:`_forward`, :obj:`postprocess` and :obj:`_sanitize_parameters`.
-.. code-block::
-    from transformers import Pipeline
-    class MyPipeline(Pipeline):
-        def _sanitize_parameters(self, **kwargs):
-            preprocess_kwargs = {}
-            if "maybe_arg" in kwargs:
-                preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]
-            return preprocess_kwargs, {}, {}
-        def preprocess(self, inputs, maybe_arg=2):
-            model_input = Tensor(....)
-            return {"model_input": model_input}
-        def _forward(self, model_inputs):
-            # model_inputs == {"model_input": model_input}
-            outputs = self.model(**model_inputs)
-            # Maybe {"logits": Tensor(...)}
-            return outputs
-        def postprocess(self, model_outputs):
-            best_class = model_outputs["logits"].softmax(-1)
-            return best_class
-The structure of this breakdown is to support relatively seamless support for CPU/GPU, while supporting doing
-pre/postprocessing on the CPU on different threads
-:obj:`preprocess` will take the originally defined inputs, and turn them into something feedable to the model. It might
-contain more information and is usually a :obj:`Dict`.
-:obj:`_forward` is the implementation detail and is not meant to be called directly. :obj:`forward` is the preferred
-called method as it contains safeguards to make sure everything is working on the expected device. If anything is
-linked to a real model it belongs in the :obj:`_forward` method, anything else is in the preprocess/postprocess.
-:obj:`postprocess` methods will take the output of :obj:`_forward` and turn it into the final output that were decided
-earlier.
-:obj:`_sanitize_parameters` exists to allow users to pass any parameters whenever they wish, be it at initialization
-time ``pipeline(...., maybe_arg=4)`` or at call time ``pipe = pipeline(...); output = pipe(...., maybe_arg=4)``.
-The returns of :obj:`_sanitize_parameters` are the 3 dicts of kwargs that will be passed directly to :obj:`preprocess`,
-:obj:`_forward` and :obj:`postprocess`. Don't fill anything if the caller didn't call with any extra parameter. That
-allows to keep the default arguments in the function definition which is always more "natural".
-A classic example would be a :obj:`top_k` argument in the post processing in classification tasks.
-.. code-block::
-    >>> pipe = pipeline("my-new-task")
-    >>> pipe("This is a test")
-    [{"label": "1-star", "score": 0.8}, {"label": "2-star", "score": 0.1}, {"label": "3-star", "score": 0.05}
-    {"label": "4-star", "score": 0.025}, {"label": "5-star", "score": 0.025}]
-    >>> pipe("This is a test", top_k=2)
-    [{"label": "1-star", "score": 0.8}, {"label": "2-star", "score": 0.1}]
-In order to achieve that, we'll update our :obj:`postprocess` method with a default parameter to :obj:`5`. and edit
-:obj:`_sanitize_parameters` to allow this new parameter.
-.. code-block::
-        def postprocess(self, model_outputs, top_k=5):
-            best_class = model_outputs["logits"].softmax(-1)
-            # Add logic to handle top_k
-            return best_class
-        def _sanitize_parameters(self, **kwargs):
-            preprocess_kwargs = {}
-            if "maybe_arg" in kwargs:
-                preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]
-            postprocess_kwargs = {}
-            if "top_k" in kwargs:
-                preprocess_kwargs["top_k"] = kwargs["top_k"]
-            return preprocess_kwargs, {}, postprocess_kwargs
-Try to keep the inputs/outputs very simple and ideally JSON-serializable as it makes the pipeline usage very easy
-without requiring users to understand new kind of objects. It's also relatively common to support many different types
-of arguments for ease of use (audio files, can be filenames, URLs or pure bytes)
-Adding it to the list of supported tasks
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Go to ``src/transformers/pipelines/__init__.py`` and fill in :obj:`SUPPORTED_TASKS` with your newly created pipeline.
-If possible it should provide a default model.
-Adding tests
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Create a new file ``tests/test_pipelines_MY_PIPELINE.py`` with example with the other tests.
-The :obj:`run_pipeline_test` function will be very generic and run on small random models on every possible
-architecture as defined by :obj:`model_mapping` and :obj:`tf_model_mapping`.
-This is very important to test future compatibility, meaning if someone adds a new model for
-:obj:`XXXForQuestionAnswering` then the pipeline test will attempt to run on it. Because the models are random it's
-impossible to check for actual values, that's why There is a helper :obj:`ANY` that will simply attempt to match the
-output of the pipeline TYPE.
-You also *need* to implement 2 (ideally 4) tests.
- :obj:`test_small_model_pt` : Define 1 small model for this pipeline (doesn't matter if the results don't make sense)
-  and test the pipeline outputs. The results should be the same as :obj:`test_small_model_tf`.
- :obj:`test_small_model_tf` : Define 1 small model for this pipeline (doesn't matter if the results don't make sense)
-  and test the pipeline outputs. The results should be the same as :obj:`test_small_model_pt`.
- :obj:`test_large_model_pt` (:obj:`optional`): Tests the pipeline on a real pipeline where the results are supposed to
-  make sense. These tests are slow and should be marked as such. Here the goal is to showcase the pipeline and to make
-  sure there is no drift in future releases
- :obj:`test_large_model_tf` (:obj:`optional`): Tests the pipeline on a real pipeline where the results are supposed to
-  make sense. These tests are slow and should be marked as such. Here the goal is to showcase the pipeline and to make
-  sure there is no drift in future releases
--- a/docs/source/bertology.rst
+++ b/docs/source/bertology.rst
-.. 
+<!--Copyright 2020 The HuggingFace Team. All rights reserved.
-    Copyright 2020 The HuggingFace Team. All rights reserved.
-    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
-    the License. You may obtain a copy of the License at
+the License. You may obtain a copy of the License at
-        http://www.apache.org/licenses/LICENSE-2.0
+http://www.apache.org/licenses/LICENSE-2.0
-    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
-    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-    specific language governing permissions and limitations under the License.
+specific language governing permissions and limitations under the License.
+-->
-BERTology
+# BERTology
-----------------------------------------------------------------------------------------------------------------------
 There is a growing field of study concerned with investigating the inner working of large-scale transformers like BERT
 (that some call "BERTology"). Some good examples of this field are:
-* BERT Rediscovers the Classical NLP Pipeline by Ian Tenney, Dipanjan Das, Ellie Pavlick:
+- BERT Rediscovers the Classical NLP Pipeline by Ian Tenney, Dipanjan Das, Ellie Pavlick:
  https://arxiv.org/abs/1905.05950
-* Are Sixteen Heads Really Better than One? by Paul Michel, Omer Levy, Graham Neubig: https://arxiv.org/abs/1905.10650
+- Are Sixteen Heads Really Better than One? by Paul Michel, Omer Levy, Graham Neubig: https://arxiv.org/abs/1905.10650
-* What Does BERT Look At? An Analysis of BERT's Attention by Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D.
+- What Does BERT Look At? An Analysis of BERT's Attention by Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D.
  Manning: https://arxiv.org/abs/1906.04341
 In order to help this new field develop, we have included a few additional features in the BERT/GPT/GPT-2 models to
@@ -28,11 +27,10 @@ help people access the inner representations, mainly adapted from the great work
 (https://arxiv.org/abs/1905.10650):
-* accessing all the hidden-states of BERT/GPT/GPT-2,
+- accessing all the hidden-states of BERT/GPT/GPT-2,
-* accessing all the attention weights for each head of BERT/GPT/GPT-2,
+- accessing all the attention weights for each head of BERT/GPT/GPT-2,
-* retrieving heads output values and gradients to be able to compute head importance score and prune head as explained
+- retrieving heads output values and gradients to be able to compute head importance score and prune head as explained
  in https://arxiv.org/abs/1905.10650.
-To help you understand and use these features, we have added a specific example script: :prefix_link:`bertology.py
+To help you understand and use these features, we have added a specific example script: [bertology.py](https://github.com/huggingface/transformers/tree/master/examples/research_projects/bertology/run_bertology.py), which extracts information from and prunes a model pre-trained on
-<examples/research_projects/bertology/run_bertology.py>` while extract information and prune a model pre-trained on
 GLUE.
--- a/docs/source/community.md
+++ b/docs/source/community.md
--- a/docs/source/converting_tensorflow_models.mdx
+++ b/docs/source/converting_tensorflow_models.mdx
+<!--Copyright 2020 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+# Converting Tensorflow Checkpoints
+A command-line interface is provided to convert original Bert/GPT/GPT-2/Transformer-XL/XLNet/XLM checkpoints to models
+that can be loaded using the `from_pretrained` methods of the library.
+<Tip>
+Since 2.3.0 the conversion script is now part of the transformers CLI (**transformers-cli**) available in any
+transformers >= 2.3.0 installation.
+The documentation below reflects the **transformers-cli convert** command format.
+</Tip>
+## BERT
+You can convert any TensorFlow checkpoint for BERT (in particular [the pre-trained models released by Google](https://github.com/google-research/bert#pre-trained-models)) in a PyTorch save file by using the
+[convert_bert_original_tf_checkpoint_to_pytorch.py](https://github.com/huggingface/transformers/tree/master/src/transformers/models/bert/convert_bert_original_tf_checkpoint_to_pytorch.py) script.
+This CLI takes as input a TensorFlow checkpoint (three files starting with `bert_model.ckpt`) and the associated
+configuration file (`bert_config.json`), and creates a PyTorch model for this configuration, loads the weights from
+the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that can
+be imported using `from_pretrained()` (see example in [quicktour](quicktour), [run_glue.py](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification/run_glue.py)).
+You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow
+checkpoint (the three files starting with `bert_model.ckpt`) but be sure to keep the configuration file
+(`bert_config.json`) and the vocabulary file (`vocab.txt`) as these are needed for the PyTorch model too.
+To run this specific conversion script you will need to have TensorFlow and PyTorch installed (`pip install tensorflow`). The rest of the repository only requires PyTorch.
+Here is an example of the conversion process for a pre-trained `BERT-Base Uncased` model:
+```bash
+export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
+transformers-cli convert --model_type bert \
+  --tf_checkpoint $BERT_BASE_DIR/bert_model.ckpt \
+  --config $BERT_BASE_DIR/bert_config.json \
+  --pytorch_dump_output $BERT_BASE_DIR/pytorch_model.bin
+```
+You can download Google's pre-trained models for the conversion [here](https://github.com/google-research/bert#pre-trained-models).
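Before running the conversion, it can help to confirm that the directory contains the files mentioned above. The following is a minimal sketch using only the standard library; the helper name `check_bert_conversion_inputs` and its return format are assumptions made for this example and are not part of `transformers-cli`.

```python
from pathlib import Path


def check_bert_conversion_inputs(bert_dir):
    """Report which conversion inputs are present in a BERT checkpoint directory."""
    bert_dir = Path(bert_dir)
    # The TensorFlow checkpoint is a set of files sharing the bert_model.ckpt prefix.
    checkpoint_files = sorted(p.name for p in bert_dir.glob("bert_model.ckpt*"))
    # The configuration and vocabulary files must be kept for the PyTorch model too.
    missing = [
        name
        for name in ("bert_config.json", "vocab.txt")
        if not (bert_dir / name).is_file()
    ]
    return {"checkpoint_files": checkpoint_files, "missing": missing}
```

For example, `check_bert_conversion_inputs("/path/to/bert/uncased_L-12_H-768_A-12")` returns a dictionary listing the checkpoint files that were found and any of the two companion files that are absent.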
+## ALBERT
+Convert TensorFlow model checkpoints of ALBERT to PyTorch using the
+[convert_albert_original_tf_checkpoint_to_pytorch.py](https://github.com/huggingface/transformers/tree/master/src/transformers/models/albert/convert_albert_original_tf_checkpoint_to_pytorch.py) script.
+The CLI takes as input a TensorFlow checkpoint (three files starting with `model.ckpt-best`) and the accompanying
+configuration file (`albert_config.json`), then creates and saves a PyTorch model. To run this conversion you will
+need to have TensorFlow and PyTorch installed.
+Here is an example of the conversion process for the pre-trained `ALBERT Base` model:
+```bash
+export ALBERT_BASE_DIR=/path/to/albert/albert_base
+transformers-cli convert --model_type albert \
+  --tf_checkpoint $ALBERT_BASE_DIR/model.ckpt-best \
+  --config $ALBERT_BASE_DIR/albert_config.json \
+  --pytorch_dump_output $ALBERT_BASE_DIR/pytorch_model.bin
+```
+You can download Google's pre-trained models for the conversion [here](https://github.com/google-research/albert#pre-trained-models).
+## OpenAI GPT
+Here is an example of the conversion process for a pre-trained OpenAI GPT model, assuming that your NumPy checkpoint
+is saved in the same format as the OpenAI pretrained model (see [here](https://github.com/openai/finetune-transformer-lm)):
+```bash
+export OPENAI_GPT_CHECKPOINT_FOLDER_PATH=/path/to/openai/pretrained/numpy/weights
+transformers-cli convert --model_type gpt \
+  --tf_checkpoint $OPENAI_GPT_CHECKPOINT_FOLDER_PATH \
+  --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
+  [--config OPENAI_GPT_CONFIG] \
+  [--finetuning_task_name OPENAI_GPT_FINETUNED_TASK]
+```
+## OpenAI GPT-2
+Here is an example of the conversion process for a pre-trained OpenAI GPT-2 model (see [here](https://github.com/openai/gpt-2))
+```bash
+export OPENAI_GPT2_CHECKPOINT_PATH=/path/to/gpt2/pretrained/weights
+transformers-cli convert --model_type gpt2 \
+  --tf_checkpoint $OPENAI_GPT2_CHECKPOINT_PATH \
+  --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
+  [--config OPENAI_GPT2_CONFIG] \
+  [--finetuning_task_name OPENAI_GPT2_FINETUNED_TASK]
+```
+## Transformer-XL
+Here is an example of the conversion process for a pre-trained Transformer-XL model (see [here](https://github.com/kimiyoung/transformer-xl/tree/master/tf#obtain-and-evaluate-pretrained-sota-models))
+```bash
+export TRANSFO_XL_CHECKPOINT_FOLDER_PATH=/path/to/transfo/xl/checkpoint
+transformers-cli convert --model_type transfo_xl \
+  --tf_checkpoint $TRANSFO_XL_CHECKPOINT_FOLDER_PATH \
+  --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
+  [--config TRANSFO_XL_CONFIG] \
+  [--finetuning_task_name TRANSFO_XL_FINETUNED_TASK]
+```
+## XLNet
+Here is an example of the conversion process for a pre-trained XLNet model:
+```bash
+export XLNET_CHECKPOINT_PATH=/path/to/xlnet/checkpoint
+export XLNET_CONFIG_PATH=/path/to/xlnet/config
+transformers-cli convert --model_type xlnet \
+  --tf_checkpoint $XLNET_CHECKPOINT_PATH \
+  --config $XLNET_CONFIG_PATH \
+  --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
+  [--finetuning_task_name XLNET_FINETUNED_TASK]
+```
+## XLM
+Here is an example of the conversion process for a pre-trained XLM model:
+```bash
+export XLM_CHECKPOINT_PATH=/path/to/xlm/checkpoint
+transformers-cli convert --model_type xlm \
+  --tf_checkpoint $XLM_CHECKPOINT_PATH \
+  --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
+  [--config XLM_CONFIG] \
+  [--finetuning_task_name XLM_FINETUNED_TASK]
+```
+## T5
+Here is an example of the conversion process for a pre-trained T5 model:
+```bash
+export T5=/path/to/t5/uncased_L-12_H-768_A-12
+transformers-cli convert --model_type t5 \
+  --tf_checkpoint $T5/t5_model.ckpt \
+  --config $T5/t5_config.json \
+  --pytorch_dump_output $T5/pytorch_model.bin
+```
--- a/docs/source/converting_tensorflow_models.rst
+++ b/docs/source/converting_tensorflow_models.rst
-.. 
-    Copyright 2020 The HuggingFace Team. All rights reserved.
-    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
-    the License. You may obtain a copy of the License at
-        http://www.apache.org/licenses/LICENSE-2.0
-    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
-    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-    specific language governing permissions and limitations under the License.
-Converting Tensorflow Checkpoints
-=======================================================================================================================
-A command-line interface is provided to convert original Bert/GPT/GPT-2/Transformer-XL/XLNet/XLM checkpoints to models
-that can be loaded using the ``from_pretrained`` methods of the library.
-.. note::
-    Since 2.3.0 the conversion script is now part of the transformers CLI (**transformers-cli**) available in any
-    transformers >= 2.3.0 installation.
-    The documentation below reflects the **transformers-cli convert** command format.
-BERT
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-You can convert any TensorFlow checkpoint for BERT (in particular `the pre-trained models released by Google
-<https://github.com/google-research/bert#pre-trained-models>`_) in a PyTorch save file by using the
-:prefix_link:`convert_bert_original_tf_checkpoint_to_pytorch.py
-<src/transformers/models/bert/convert_bert_original_tf_checkpoint_to_pytorch.py>` script.
-This CLI takes as input a TensorFlow checkpoint (three files starting with ``bert_model.ckpt``) and the associated
-configuration file (``bert_config.json``), and creates a PyTorch model for this configuration, loads the weights from
-the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that can
-be imported using ``from_pretrained()`` (see example in :doc:`quicktour` , :prefix_link:`run_glue.py
-<examples/pytorch/text-classification/run_glue.py>` ).
-You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow
-checkpoint (the three files starting with ``bert_model.ckpt``) but be sure to keep the configuration file (\
-``bert_config.json``) and the vocabulary file (``vocab.txt``) as these are needed for the PyTorch model too.
-To run this specific conversion script you will need to have TensorFlow and PyTorch installed (``pip install
-tensorflow``). The rest of the repository only requires PyTorch.
-Here is an example of the conversion process for a pre-trained ``BERT-Base Uncased`` model:
-.. code-block:: shell
-    export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
-    transformers-cli convert --model_type bert \
-      --tf_checkpoint $BERT_BASE_DIR/bert_model.ckpt \
-      --config $BERT_BASE_DIR/bert_config.json \
-      --pytorch_dump_output $BERT_BASE_DIR/pytorch_model.bin
-You can download Google's pre-trained models for the conversion `here
-<https://github.com/google-research/bert#pre-trained-models>`__.
-ALBERT
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Convert TensorFlow model checkpoints of ALBERT to PyTorch using the
-:prefix_link:`convert_albert_original_tf_checkpoint_to_pytorch.py
-<src/transformers/models/albert/convert_albert_original_tf_checkpoint_to_pytorch.py>` script.
-The CLI takes as input a TensorFlow checkpoint (three files starting with ``model.ckpt-best``) and the accompanying
-configuration file (``albert_config.json``), then creates and saves a PyTorch model. To run this conversion you will
-need to have TensorFlow and PyTorch installed.
-Here is an example of the conversion process for the pre-trained ``ALBERT Base`` model:
-.. code-block:: shell
-    export ALBERT_BASE_DIR=/path/to/albert/albert_base
-    transformers-cli convert --model_type albert \
-      --tf_checkpoint $ALBERT_BASE_DIR/model.ckpt-best \
-      --config $ALBERT_BASE_DIR/albert_config.json \
-      --pytorch_dump_output $ALBERT_BASE_DIR/pytorch_model.bin
-You can download Google's pre-trained models for the conversion `here
-<https://github.com/google-research/albert#pre-trained-models>`__.
-OpenAI GPT
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Here is an example of the conversion process for a pre-trained OpenAI GPT model, assuming that your NumPy checkpoint
-save as the same format than OpenAI pretrained model (see `here <https://github.com/openai/finetune-transformer-lm>`__\
-)
-.. code-block:: shell
-    export OPENAI_GPT_CHECKPOINT_FOLDER_PATH=/path/to/openai/pretrained/numpy/weights
-    transformers-cli convert --model_type gpt \
-      --tf_checkpoint $OPENAI_GPT_CHECKPOINT_FOLDER_PATH \
-      --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
-      [--config OPENAI_GPT_CONFIG] \
-      [--finetuning_task_name OPENAI_GPT_FINETUNED_TASK] \
-OpenAI GPT-2
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Here is an example of the conversion process for a pre-trained OpenAI GPT-2 model (see `here
-<https://github.com/openai/gpt-2>`__)
-.. code-block:: shell
-    export OPENAI_GPT2_CHECKPOINT_PATH=/path/to/gpt2/pretrained/weights
-    transformers-cli convert --model_type gpt2 \
-      --tf_checkpoint $OPENAI_GPT2_CHECKPOINT_PATH \
-      --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
-      [--config OPENAI_GPT2_CONFIG] \
-      [--finetuning_task_name OPENAI_GPT2_FINETUNED_TASK]
-Transformer-XL
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Here is an example of the conversion process for a pre-trained Transformer-XL model (see `here
-<https://github.com/kimiyoung/transformer-xl/tree/master/tf#obtain-and-evaluate-pretrained-sota-models>`__)
-.. code-block:: shell
-    export TRANSFO_XL_CHECKPOINT_FOLDER_PATH=/path/to/transfo/xl/checkpoint
-    transformers-cli convert --model_type transfo_xl \
-      --tf_checkpoint $TRANSFO_XL_CHECKPOINT_FOLDER_PATH \
-      --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
-      [--config TRANSFO_XL_CONFIG] \
-      [--finetuning_task_name TRANSFO_XL_FINETUNED_TASK]
-XLNet
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Here is an example of the conversion process for a pre-trained XLNet model:
-.. code-block:: shell
-    export XLNET_CHECKPOINT_PATH=/path/to/xlnet/checkpoint
-    export XLNET_CONFIG_PATH=/path/to/xlnet/config
-    transformers-cli convert --model_type xlnet \
-      --tf_checkpoint $XLNET_CHECKPOINT_PATH \
-      --config $XLNET_CONFIG_PATH \
-      --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
-      [--finetuning_task_name XLNET_FINETUNED_TASK]
-XLM
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Here is an example of the conversion process for a pre-trained XLM model:
-.. code-block:: shell
-    export XLM_CHECKPOINT_PATH=/path/to/xlm/checkpoint
-    transformers-cli convert --model_type xlm \
-      --tf_checkpoint $XLM_CHECKPOINT_PATH \
-      --pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
-      [--config XLM_CONFIG] \
-      [--finetuning_task_name XLM_FINETUNED_TASK]
-T5
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Here is an example of the conversion process for a pre-trained T5 model:
-.. code-block:: shell
-    export T5=/path/to/t5/uncased_L-12_H-768_A-12
-    transformers-cli convert --model_type t5 \
-      --tf_checkpoint $T5/t5_model.ckpt \
-      --config $T5/t5_config.json \
-      --pytorch_dump_output $T5/pytorch_model.bin
--- a/docs/source/fast_tokenizers.mdx
+++ b/docs/source/fast_tokenizers.mdx
+<!--Copyright 2020 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+# Using tokenizers from 🤗 Tokenizers
+The [`PreTrainedTokenizerFast`] depends on the [🤗 Tokenizers](https://huggingface.co/docs/tokenizers) library. The tokenizers obtained from the 🤗 Tokenizers library can be
+loaded very simply into 🤗 Transformers.
+Before getting into the specifics, let's start by creating a dummy tokenizer in a few lines:
+```python
+>>> from tokenizers import Tokenizer
+>>> from tokenizers.models import BPE
+>>> from tokenizers.trainers import BpeTrainer
+>>> from tokenizers.pre_tokenizers import Whitespace
+>>> tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
+>>> trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
+>>> tokenizer.pre_tokenizer = Whitespace()
+>>> files = [...]
+>>> tokenizer.train(files, trainer)
+```
+We now have a tokenizer trained on the files we defined. We can either continue using it in that runtime, or save it to
+a JSON file for future re-use.
+## Loading directly from the tokenizer object
+Let's see how to leverage this tokenizer object in the 🤗 Transformers library. The
+[`PreTrainedTokenizerFast`] class allows for easy instantiation, by accepting the instantiated
+*tokenizer* object as an argument:
+```python
+>>> from transformers import PreTrainedTokenizerFast
+>>> fast_tokenizer = PreTrainedTokenizerFast(tokenizer_object=tokenizer)
+```
+This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to [the tokenizer
+page](main_classes/tokenizer) for more information.
+## Loading from a JSON file
+In order to load a tokenizer from a JSON file, let's first start by saving our tokenizer:
+```python
+>>> tokenizer.save("tokenizer.json")
+```
+The path to which we saved this file can be passed to the [`PreTrainedTokenizerFast`] initialization
+method using the `tokenizer_file` parameter:
+```python
+>>> from transformers import PreTrainedTokenizerFast
+>>> fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
+```
+This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to [the tokenizer
+page](main_classes/tokenizer) for more information.
--- a/docs/source/fast_tokenizers.rst
+++ b/docs/source/fast_tokenizers.rst
-Using tokenizers from 🤗 Tokenizers
-=======================================================================================================================
-The :class:`~transformers.PreTrainedTokenizerFast` depends on the `tokenizers
-<https://huggingface.co/docs/tokenizers>`__ library. The tokenizers obtained from the 🤗 Tokenizers library can be
-loaded very simply into 🤗 Transformers.
-Before getting in the specifics, let's first start by creating a dummy tokenizer in a few lines:
-.. code-block::
-    >>> from tokenizers import Tokenizer
-    >>> from tokenizers.models import BPE
-    >>> from tokenizers.trainers import BpeTrainer
-    >>> from tokenizers.pre_tokenizers import Whitespace
-    >>> tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
-    >>> trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
-    >>> tokenizer.pre_tokenizer = Whitespace()
-    >>> files = [...]
-    >>> tokenizer.train(files, trainer)
-We now have a tokenizer trained on the files we defined. We can either continue using it in that runtime, or save it to
-a JSON file for future re-use.
-Loading directly from the tokenizer object
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Let's see how to leverage this tokenizer object in the 🤗 Transformers library. The
-:class:`~transformers.PreTrainedTokenizerFast` class allows for easy instantiation, by accepting the instantiated
-`tokenizer` object as an argument:
-.. code-block::
-    >>> from transformers import PreTrainedTokenizerFast
-    >>> fast_tokenizer = PreTrainedTokenizerFast(tokenizer_object=tokenizer)
-This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to :doc:`the tokenizer
-page <main_classes/tokenizer>` for more information.
-Loading from a JSON file
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In order to load a tokenizer from a JSON file, let's first start by saving our tokenizer:
-.. code-block::
-    >>> tokenizer.save("tokenizer.json")
-The path to which we saved this file can be passed to the :class:`~transformers.PreTrainedTokenizerFast` initialization
-method using the :obj:`tokenizer_file` parameter:
-.. code-block::
-    >>> from transformers import PreTrainedTokenizerFast
-    >>> fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
-This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to :doc:`the tokenizer
-page <main_classes/tokenizer>` for more information.
--- a/docs/source/glossary.rst
+++ b/docs/source/glossary.rst
--- a/docs/source/installation.md
+++ b/docs/source/installation.md
--- a/docs/source/internal/file_utils.mdx
+++ b/docs/source/internal/file_utils.mdx
+<!--Copyright 2021 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+# General Utilities
+This page lists all of the general utility functions in 🤗 Transformers that are found in the file `file_utils.py`.
+Most of those are only useful if you are studying the general code in the library.
+## Enums and namedtuples
+[[autodoc]] file_utils.ExplicitEnum
+[[autodoc]] file_utils.PaddingStrategy
+[[autodoc]] file_utils.TensorType
+## Special Decorators
+[[autodoc]] file_utils.add_start_docstrings
+[[autodoc]] file_utils.add_start_docstrings_to_model_forward
+[[autodoc]] file_utils.add_end_docstrings
+[[autodoc]] file_utils.add_code_sample_docstrings
+[[autodoc]] file_utils.replace_return_docstrings
+## Special Properties
+[[autodoc]] file_utils.cached_property
+## Other Utilities
+[[autodoc]] file_utils._LazyModule
--- a/docs/source/internal/file_utils.rst
+++ b/docs/source/internal/file_utils.rst
-.. 
-    Copyright 2021 The HuggingFace Team. All rights reserved.
-    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
-    the License. You may obtain a copy of the License at
-        http://www.apache.org/licenses/LICENSE-2.0
-    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
-    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-    specific language governing permissions and limitations under the License.
-General Utilities
-----------------------------------------------------------------------------------------------------------------------
-This page lists all of Transformers general utility functions that are found in the file ``file_utils.py``.
-Most of those are only useful if you are studying the general code in the library.
-Enums and namedtuples
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: transformers.file_utils.ExplicitEnum
-.. autoclass:: transformers.file_utils.PaddingStrategy
-.. autoclass:: transformers.file_utils.TensorType
-Special Decorators
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autofunction:: transformers.file_utils.add_start_docstrings
-.. autofunction:: transformers.file_utils.add_start_docstrings_to_model_forward
-.. autofunction:: transformers.file_utils.add_end_docstrings
-.. autofunction:: transformers.file_utils.add_code_sample_docstrings
-.. autofunction:: transformers.file_utils.replace_return_docstrings
-Special Properties
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: transformers.file_utils.cached_property
-Other Utilities
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: transformers.file_utils._LazyModule
--- a/docs/source/internal/generation_utils.mdx
+++ b/docs/source/internal/generation_utils.mdx
+<!--Copyright 2020 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+# Utilities for Generation
+This page lists all the utility functions used by [`~generation_utils.GenerationMixin.generate`],
+[`~generation_utils.GenerationMixin.greedy_search`],
+[`~generation_utils.GenerationMixin.sample`],
+[`~generation_utils.GenerationMixin.beam_search`],
+[`~generation_utils.GenerationMixin.beam_sample`], and
+[`~generation_utils.GenerationMixin.group_beam_search`].
+Most of those are only useful if you are studying the code of the generate methods in the library.
+## Generate Outputs
+The output of [`~generation_utils.GenerationMixin.generate`] is an instance of a subclass of
+[`~file_utils.ModelOutput`]. This output is a data structure containing all the information returned
+by [`~generation_utils.GenerationMixin.generate`], and it can also be used as a tuple or a dictionary.
+Here's an example:
+```python
+from transformers import GPT2Tokenizer, GPT2LMHeadModel
+tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
+model = GPT2LMHeadModel.from_pretrained('gpt2')
+inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
+generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
+```
+The `generation_output` object is a [`~generation_utils.GreedySearchDecoderOnlyOutput`]. As we can
+see in the documentation of that class below, it has the following attributes:
+- `sequences`: the generated sequences of tokens
+- `scores` (optional): the prediction scores of the language modelling head, for each generation step
+- `hidden_states` (optional): the hidden states of the model, for each generation step
+- `attentions` (optional): the attention weights of the model, for each generation step
+Here we have the `scores` since we passed along `output_scores=True`, but we don't have `hidden_states` and
+`attentions` because we didn't pass `output_hidden_states=True` or `output_attentions=True`.
+You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
+will get `None`. Here for instance `generation_output.scores` are all the generated prediction scores of the
+language modeling head, and `generation_output.attentions` is `None`.
+When using our `generation_output` object as a tuple, it only keeps the attributes that don't have `None` values.
+Here, for instance, it has two elements, `sequences` then `scores`, so
+```python
+generation_output[:2]
+```
+will return the tuple `(generation_output.sequences, generation_output.scores)` for instance.
+When using our `generation_output` object as a dictionary, it only keeps the attributes that don't have `None`
+values. Here, for instance, it has two keys that are `sequences` and `scores`.
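The tuple and dictionary behaviour described above can be illustrated with a small standalone sketch. It does not use the library's `ModelOutput`; `SketchOutput` and its methods are hypothetical names introduced only to show the idea that attributes left as `None` are dropped from both views.

```python
from dataclasses import dataclass, fields
from typing import Any, List, Optional, Tuple


@dataclass
class SketchOutput:
    """Hypothetical stand-in for a generation output (not the library class)."""

    sequences: Any = None
    scores: Optional[Tuple[Any, ...]] = None
    hidden_states: Optional[Tuple[Any, ...]] = None
    attentions: Optional[Tuple[Any, ...]] = None

    def keys(self) -> List[str]:
        # Dictionary view: names of the attributes that are not None,
        # in declaration order.
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]

    def to_tuple(self) -> Tuple[Any, ...]:
        # Tuple view: values of the attributes that are not None.
        return tuple(getattr(self, name) for name in self.keys())

    def __getitem__(self, key):
        # String keys behave like a dictionary, integers and slices like a tuple.
        if isinstance(key, str):
            if key not in self.keys():
                raise KeyError(key)
            return getattr(self, key)
        return self.to_tuple()[key]


# Only `sequences` and `scores` are set, mirroring the example above.
output = SketchOutput(sequences=[[15496, 11, 616]], scores=([0.1, 0.9],))
first_two = output[:2]
```

In this sketch `output[:2]` holds `sequences` and `scores`, while `output.attentions` stays `None` and is absent from `output.keys()`.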
+We document here all output types.
+### GreedySearchOutput
+[[autodoc]] generation_utils.GreedySearchDecoderOnlyOutput
+[[autodoc]] generation_utils.GreedySearchEncoderDecoderOutput
+[[autodoc]] generation_flax_utils.FlaxGreedySearchOutput
+### SampleOutput
+[[autodoc]] generation_utils.SampleDecoderOnlyOutput
+[[autodoc]] generation_utils.SampleEncoderDecoderOutput
+[[autodoc]] generation_flax_utils.FlaxSampleOutput
+### BeamSearchOutput
+[[autodoc]] generation_utils.BeamSearchDecoderOnlyOutput
+[[autodoc]] generation_utils.BeamSearchEncoderDecoderOutput
+### BeamSampleOutput
+[[autodoc]] generation_utils.BeamSampleDecoderOnlyOutput
+[[autodoc]] generation_utils.BeamSampleEncoderDecoderOutput
+## LogitsProcessor
+A [`LogitsProcessor`] can be used to modify the prediction scores of a language model head for
+generation.
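As a rough illustration of what "modifying the prediction scores" means, here is a minimal pure-Python sketch. It is not the library's implementation: the real classes operate on batched tensors, whereas the hypothetical helpers below (`temperature_warper`, `top_k_warper`, `min_length_processor`, `apply_all`) work on a single list of floats and are named only for this example.

```python
import math
from typing import Callable, List

# A processor maps (tokens generated so far, next-token scores) to new scores.
Processor = Callable[[List[int], List[float]], List[float]]


def temperature_warper(temperature: float) -> Processor:
    if temperature <= 0:
        raise ValueError("temperature must be strictly positive")

    def process(input_ids: List[int], scores: List[float]) -> List[float]:
        # Dividing by the temperature sharpens (<1) or flattens (>1) the distribution.
        return [s / temperature for s in scores]

    return process


def top_k_warper(top_k: int, filter_value: float = -math.inf) -> Processor:
    if top_k < 1:
        raise ValueError("top_k must be at least 1")

    def process(input_ids: List[int], scores: List[float]) -> List[float]:
        # Keep the k highest scores (ties with the k-th score are kept too).
        k = min(top_k, len(scores))
        threshold = sorted(scores, reverse=True)[k - 1]
        return [s if s >= threshold else filter_value for s in scores]

    return process


def min_length_processor(min_length: int, eos_token_id: int) -> Processor:
    def process(input_ids: List[int], scores: List[float]) -> List[float]:
        # Forbid the end-of-sequence token until enough tokens exist.
        if len(input_ids) < min_length:
            scores = list(scores)  # copy so the caller's list is untouched
            scores[eos_token_id] = -math.inf
        return scores

    return process


def apply_all(processors: List[Processor], input_ids: List[int], scores: List[float]) -> List[float]:
    # Apply each processor in order, like a processor list would.
    for processor in processors:
        scores = processor(input_ids, scores)
    return scores


scores = [2.0, 1.0, 4.0, 3.0]
processed = apply_all(
    [min_length_processor(min_length=5, eos_token_id=2), temperature_warper(2.0), top_k_warper(2)],
    input_ids=[7, 8],
    scores=scores,
)
```

Chaining small single-purpose functions mirrors the idea of combining several processors in a list, each seeing the scores produced by the previous one.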
+[[autodoc]] LogitsProcessor
+    - __call__
+[[autodoc]] LogitsProcessorList
+    - __call__
+[[autodoc]] LogitsWarper
+    - __call__
+[[autodoc]] MinLengthLogitsProcessor
+    - __call__
+[[autodoc]] TemperatureLogitsWarper
+    - __call__
+[[autodoc]] RepetitionPenaltyLogitsProcessor
+    - __call__
+[[autodoc]] TopPLogitsWarper
+    - __call__
+[[autodoc]] TopKLogitsWarper
+    - __call__
+[[autodoc]] NoRepeatNGramLogitsProcessor
+    - __call__
+[[autodoc]] NoBadWordsLogitsProcessor
+    - __call__
+[[autodoc]] PrefixConstrainedLogitsProcessor
+    - __call__
+[[autodoc]] HammingDiversityLogitsProcessor
+    - __call__
+[[autodoc]] ForcedBOSTokenLogitsProcessor
+    - __call__
+[[autodoc]] ForcedEOSTokenLogitsProcessor
+    - __call__
+[[autodoc]] InfNanRemoveLogitsProcessor
+    - __call__
+[[autodoc]] FlaxLogitsProcessor
+    - __call__
+[[autodoc]] FlaxLogitsProcessorList
+    - __call__
+[[autodoc]] FlaxLogitsWarper
+    - __call__
+[[autodoc]] FlaxTemperatureLogitsWarper
+    - __call__
+[[autodoc]] FlaxTopPLogitsWarper
+    - __call__
+[[autodoc]] FlaxTopKLogitsWarper
+    - __call__
+[[autodoc]] FlaxForcedBOSTokenLogitsProcessor
+    - __call__
+[[autodoc]] FlaxForcedEOSTokenLogitsProcessor
+    - __call__
+[[autodoc]] FlaxMinLengthLogitsProcessor
+    - __call__
+## StoppingCriteria
+A [`StoppingCriteria`] can be used to change when to stop generation (other than EOS token).
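The idea of a stopping criterion can be sketched without the library. This is an assumption-laden toy, not the library's API: `max_length_criterion`, `max_time_criterion`, `should_stop` and `toy_generate` are hypothetical names, and the "model" is a plain function that returns the next token id.

```python
import time
from typing import Callable, List, Optional

# A criterion inspects the tokens generated so far and returns True to stop.
Criterion = Callable[[List[int]], bool]


def max_length_criterion(max_length: int) -> Criterion:
    # Stop once the sequence (prompt included) reaches max_length tokens.
    return lambda input_ids: len(input_ids) >= max_length


def max_time_criterion(max_time: float, start: Optional[float] = None) -> Criterion:
    # Stop once more than max_time seconds have elapsed since `start`.
    started = time.monotonic() if start is None else start
    return lambda input_ids: time.monotonic() - started > max_time


def should_stop(criteria: List[Criterion], input_ids: List[int]) -> bool:
    # Generation stops as soon as any single criterion is met.
    return any(criterion(input_ids) for criterion in criteria)


def toy_generate(
    prompt: List[int],
    next_token: Callable[[List[int]], int],
    criteria: List[Criterion],
    eos_token_id: int,
) -> List[int]:
    input_ids = list(prompt)
    while not should_stop(criteria, input_ids):
        token = next_token(input_ids)
        input_ids.append(token)
        if token == eos_token_id:
            # The EOS token still ends generation independently of the criteria.
            break
    return input_ids


generated = toy_generate(
    prompt=[1, 2],
    next_token=lambda ids: ids[-1] + 1,
    criteria=[max_length_criterion(5), max_time_criterion(60.0)],
    eos_token_id=99,
)
```

Here the length criterion fires first, so `generated` ends after five tokens even though the EOS token was never produced.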
+[[autodoc]] StoppingCriteria
+    - __call__
+[[autodoc]] StoppingCriteriaList
+    - __call__
+[[autodoc]] MaxLengthCriteria
+    - __call__
+[[autodoc]] MaxTimeCriteria
+    - __call__
+## BeamSearch
+[[autodoc]] BeamScorer
+    - process
+    - finalize
+[[autodoc]] BeamSearchScorer
+    - process
+    - finalize
+## Utilities
+[[autodoc]] top_k_top_p_filtering
+[[autodoc]] tf_top_k_top_p_filtering
--- a/docs/source/internal/generation_utils.rst
+++ b/docs/source/internal/generation_utils.rst
-.. 
-    Copyright 2020 The HuggingFace Team. All rights reserved.
-    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
-    the License. You may obtain a copy of the License at
-        http://www.apache.org/licenses/LICENSE-2.0
-    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
-    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-    specific language governing permissions and limitations under the License.
-Utilities for Generation
-----------------------------------------------------------------------------------------------------------------------
-This page lists all the utility functions used by :meth:`~transformers.generation_utils.GenerationMixin.generate`,
-:meth:`~transformers.generation_utils.GenerationMixin.greedy_search`,
-:meth:`~transformers.generation_utils.GenerationMixin.sample`,
-:meth:`~transformers.generation_utils.GenerationMixin.beam_search`,
-:meth:`~transformers.generation_utils.GenerationMixin.beam_sample`, and
-:meth:`~transformers.generation_utils.GenerationMixin.group_beam_search`.
-Most of those are only useful if you are studying the code of the generate methods in the library.
-Generate Outputs
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The output of :meth:`~transformers.generation_utils.GenerationMixin.generate` is an instance of a subclass of
-:class:`~transformers.file_utils.ModelOutput`. This output is a data structure containing all the information returned
-by :meth:`~transformers.generation_utils.GenerationMixin.generate`, but that can also be used as tuple or dictionary.
-Here's an example:
-.. code-block::
-    from transformers import GPT2Tokenizer, GPT2LMHeadModel
-    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
-    model = GPT2LMHeadModel.from_pretrained('gpt2')
-    inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
-    generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
-The ``generation_output`` object is a :class:`~transformers.generation_utils.GreedySearchDecoderOnlyOutput`, as we can
-see in the documentation of that class below, it means it has the following attributes:
- ``sequences``: the generated sequences of tokens
- ``scores`` (optional): the prediction scores of the language modelling head, for each generation step
- ``hidden_states`` (optional): the hidden states of the model, for each generation step
- ``attentions`` (optional): the attention weights of the model, for each generation step
-Here we have the ``scores`` since we passed along ``output_scores=True``, but we don't have ``hidden_states`` and
-``attentions`` because we didn't pass ``output_hidden_states=True`` or ``output_attentions=True``.
-You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
-will get ``None``. Here for instance ``generation_output.scores`` are all the generated prediction scores of the
-language modeling head, and ``generation_output.attentions`` is ``None``.
-When using our ``generation_output`` object as a tuple, it only keeps the attributes that don't have ``None`` values.
-Here, for instance, it has two elements, ``sequences`` then ``scores``, so
-.. code-block::
-    generation_output[:2]
-will return the tuple ``(generation_output.sequences, generation_output.scores)`` for instance.
-When using our ``generation_output`` object as a dictionary, it only keeps the attributes that don't have ``None``
-values. Here, for instance, it has two keys that are ``sequences`` and ``scores``.
-We document here all output types.
-GreedySearchOutput
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-.. autoclass:: transformers.generation_utils.GreedySearchDecoderOnlyOutput
-    :members:
-.. autoclass:: transformers.generation_utils.GreedySearchEncoderDecoderOutput
-    :members:
-.. autoclass:: transformers.generation_flax_utils.FlaxGreedySearchOutput
-    :members:
-SampleOutput
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-.. autoclass:: transformers.generation_utils.SampleDecoderOnlyOutput
-    :members:
-.. autoclass:: transformers.generation_utils.SampleEncoderDecoderOutput
-    :members:
-.. autoclass:: transformers.generation_flax_utils.FlaxSampleOutput
-    :members:
-BeamSearchOutput
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-.. autoclass:: transformers.generation_utils.BeamSearchDecoderOnlyOutput
-    :members:
-.. autoclass:: transformers.generation_utils.BeamSearchEncoderDecoderOutput
-    :members:
-BeamSampleOutput
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-.. autoclass:: transformers.generation_utils.BeamSampleDecoderOnlyOutput
-    :members:
-.. autoclass:: transformers.generation_utils.BeamSampleEncoderDecoderOutput
-    :members:
-LogitsProcessor
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-A :class:`~transformers.LogitsProcessor` can be used to modify the prediction scores of a language model head for
-generation.
-.. autoclass:: transformers.LogitsProcessor
-    :members: __call__
-.. autoclass:: transformers.LogitsProcessorList
-    :members: __call__
-.. autoclass:: transformers.LogitsWarper
-    :members: __call__
-.. autoclass:: transformers.MinLengthLogitsProcessor
-    :members: __call__
-.. autoclass:: transformers.TemperatureLogitsWarper
-    :members: __call__
-.. autoclass:: transformers.RepetitionPenaltyLogitsProcessor
-    :members: __call__
-.. autoclass:: transformers.TopPLogitsWarper
-    :members: __call__
-.. autoclass:: transformers.TopKLogitsWarper
-    :members: __call__
-.. autoclass:: transformers.NoRepeatNGramLogitsProcessor
-    :members: __call__
-.. autoclass:: transformers.NoBadWordsLogitsProcessor
-    :members: __call__
-.. autoclass:: transformers.PrefixConstrainedLogitsProcessor
-    :members: __call__
-.. autoclass:: transformers.HammingDiversityLogitsProcessor
-    :members: __call__
-.. autoclass:: transformers.ForcedBOSTokenLogitsProcessor
-    :members: __call__
-.. autoclass:: transformers.ForcedEOSTokenLogitsProcessor
-    :members: __call__
-.. autoclass:: transformers.InfNanRemoveLogitsProcessor
-    :members: __call__
-.. autoclass:: transformers.FlaxLogitsProcessor
-    :members: __call__
-.. autoclass:: transformers.FlaxLogitsProcessorList
-    :members: __call__
-.. autoclass:: transformers.FlaxLogitsWarper
-    :members: __call__
-.. autoclass:: transformers.FlaxTemperatureLogitsWarper
-    :members: __call__
-.. autoclass:: transformers.FlaxTopPLogitsWarper
-    :members: __call__
-.. autoclass:: transformers.FlaxTopKLogitsWarper
-    :members: __call__
-.. autoclass:: transformers.FlaxForcedBOSTokenLogitsProcessor
-    :members: __call__
-.. autoclass:: transformers.FlaxForcedEOSTokenLogitsProcessor
-    :members: __call__
-.. autoclass:: transformers.FlaxMinLengthLogitsProcessor
-    :members: __call__
-StoppingCriteria
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-A :class:`~transformers.StoppingCriteria` can be used to change when to stop generation (other than EOS token).
-.. autoclass:: transformers.StoppingCriteria
-    :members: __call__
-.. autoclass:: transformers.StoppingCriteriaList
-    :members: __call__
-.. autoclass:: transformers.MaxLengthCriteria
-    :members: __call__
-.. autoclass:: transformers.MaxTimeCriteria
-    :members: __call__
-BeamSearch
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: transformers.BeamScorer
-    :members: process, finalize
-.. autoclass:: transformers.BeamSearchScorer
-    :members: process, finalize
-Utilities
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autofunction:: transformers.top_k_top_p_filtering
-.. autofunction:: transformers.tf_top_k_top_p_filtering
--- a/docs/source/internal/modeling_utils.mdx
+++ b/docs/source/internal/modeling_utils.mdx
+<!--Copyright 2020 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+# Custom Layers and Utilities
+This page lists all the custom layers used by the library, as well as the utility functions it provides for modeling.
+Most of those are only useful if you are studying the code of the models in the library.
+## PyTorch custom modules
+[[autodoc]] modeling_utils.Conv1D
+[[autodoc]] modeling_utils.PoolerStartLogits
+    - forward
+[[autodoc]] modeling_utils.PoolerEndLogits
+    - forward
+[[autodoc]] modeling_utils.PoolerAnswerClass
+    - forward
+[[autodoc]] modeling_utils.SquadHeadOutput
+[[autodoc]] modeling_utils.SQuADHead
+    - forward
+[[autodoc]] modeling_utils.SequenceSummary
+    - forward
+## PyTorch Helper Functions
+[[autodoc]] apply_chunking_to_forward
+[[autodoc]] modeling_utils.find_pruneable_heads_and_indices
+[[autodoc]] modeling_utils.prune_layer
+[[autodoc]] modeling_utils.prune_conv1d_layer
+[[autodoc]] modeling_utils.prune_linear_layer
+## TensorFlow custom layers
+[[autodoc]] modeling_tf_utils.TFConv1D
+[[autodoc]] modeling_tf_utils.TFSharedEmbeddings
+    - call
+[[autodoc]] modeling_tf_utils.TFSequenceSummary
+## TensorFlow loss functions
+[[autodoc]] modeling_tf_utils.TFCausalLanguageModelingLoss
+[[autodoc]] modeling_tf_utils.TFMaskedLanguageModelingLoss
+[[autodoc]] modeling_tf_utils.TFMultipleChoiceLoss
+[[autodoc]] modeling_tf_utils.TFQuestionAnsweringLoss
+[[autodoc]] modeling_tf_utils.TFSequenceClassificationLoss
+[[autodoc]] modeling_tf_utils.TFTokenClassificationLoss
+## TensorFlow Helper Functions
+[[autodoc]] modeling_tf_utils.get_initializer
+[[autodoc]] modeling_tf_utils.keras_serializable
+[[autodoc]] modeling_tf_utils.shape_list
--- a/docs/source/internal/modeling_utils.rst
+++ b/docs/source/internal/modeling_utils.rst
-.. 
-    Copyright 2020 The HuggingFace Team. All rights reserved.
-    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
-    the License. You may obtain a copy of the License at
-        http://www.apache.org/licenses/LICENSE-2.0
-    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
-    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-    specific language governing permissions and limitations under the License.
-Custom Layers and Utilities
-----------------------------------------------------------------------------------------------------------------------
-This page lists all the custom layers used by the library, as well as the utility functions it provides for modeling.
-Most of those are only useful if you are studying the code of the models in the library.
-Pytorch custom modules
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: transformers.modeling_utils.Conv1D
-.. autoclass:: transformers.modeling_utils.PoolerStartLogits
-    :members: forward
-.. autoclass:: transformers.modeling_utils.PoolerEndLogits
-    :members: forward
-.. autoclass:: transformers.modeling_utils.PoolerAnswerClass
-    :members: forward
-.. autoclass:: transformers.modeling_utils.SquadHeadOutput
-.. autoclass:: transformers.modeling_utils.SQuADHead
-    :members: forward
-.. autoclass:: transformers.modeling_utils.SequenceSummary
-    :members: forward
-PyTorch Helper Functions
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autofunction:: transformers.apply_chunking_to_forward
-.. autofunction:: transformers.modeling_utils.find_pruneable_heads_and_indices
-.. autofunction:: transformers.modeling_utils.prune_layer
-.. autofunction:: transformers.modeling_utils.prune_conv1d_layer
-.. autofunction:: transformers.modeling_utils.prune_linear_layer
-TensorFlow custom layers
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: transformers.modeling_tf_utils.TFConv1D
-.. autoclass:: transformers.modeling_tf_utils.TFSharedEmbeddings
-    :members: call
-.. autoclass:: transformers.modeling_tf_utils.TFSequenceSummary
-TensorFlow loss functions
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: transformers.modeling_tf_utils.TFCausalLanguageModelingLoss
-    :members:
-.. autoclass:: transformers.modeling_tf_utils.TFMaskedLanguageModelingLoss
-    :members:
-.. autoclass:: transformers.modeling_tf_utils.TFMultipleChoiceLoss
-    :members:
-.. autoclass:: transformers.modeling_tf_utils.TFQuestionAnsweringLoss
-    :members:
-.. autoclass:: transformers.modeling_tf_utils.TFSequenceClassificationLoss
-    :members:
-.. autoclass:: transformers.modeling_tf_utils.TFTokenClassificationLoss
-    :members:
-TensorFlow Helper Functions
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autofunction:: transformers.modeling_tf_utils.get_initializer
-.. autofunction:: transformers.modeling_tf_utils.keras_serializable
-.. autofunction:: transformers.modeling_tf_utils.shape_list
--- a/docs/source/internal/pipelines_utils.mdx
+++ b/docs/source/internal/pipelines_utils.mdx
+<!--Copyright 2020 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+# Utilities for pipelines
+This page lists all the utility functions the library provides for pipelines.
+Most of those are only useful if you are studying the code of the models in the library.
+## Argument handling
+[[autodoc]] pipelines.ArgumentHandler
+[[autodoc]] pipelines.ZeroShotClassificationArgumentHandler
+[[autodoc]] pipelines.QuestionAnsweringArgumentHandler
+## Data format
+[[autodoc]] pipelines.PipelineDataFormat
+[[autodoc]] pipelines.CsvPipelineDataFormat
+[[autodoc]] pipelines.JsonPipelineDataFormat
+[[autodoc]] pipelines.PipedPipelineDataFormat
+## Utilities
+[[autodoc]] pipelines.PipelineException
--- a/docs/source/internal/pipelines_utils.rst
+++ b/docs/source/internal/pipelines_utils.rst
-.. 
-    Copyright 2020 The HuggingFace Team. All rights reserved.
-    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
-    the License. You may obtain a copy of the License at
-        http://www.apache.org/licenses/LICENSE-2.0
-    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
-    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-    specific language governing permissions and limitations under the License.
-Utilities for pipelines
-----------------------------------------------------------------------------------------------------------------------
-This page lists all the utility functions the library provides for pipelines.
-Most of those are only useful if you are studying the code of the models in the library.
-Argument handling
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: transformers.pipelines.ArgumentHandler
-.. autoclass:: transformers.pipelines.ZeroShotClassificationArgumentHandler
-.. autoclass:: transformers.pipelines.QuestionAnsweringArgumentHandler
-Data format
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: transformers.pipelines.PipelineDataFormat
-    :members:
-.. autoclass:: transformers.pipelines.CsvPipelineDataFormat
-    :members:
-.. autoclass:: transformers.pipelines.JsonPipelineDataFormat
-    :members:
-.. autoclass:: transformers.pipelines.PipedPipelineDataFormat
-    :members:
-Utilities
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: transformers.pipelines.PipelineException
--- a/docs/source/internal/tokenization_utils.mdx
+++ b/docs/source/internal/tokenization_utils.mdx
+<!--Copyright 2020 The HuggingFace Team. All rights reserved.
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+# Utilities for Tokenizers
+This page lists all the utility functions used by the tokenizers, mainly the class
+[`~tokenization_utils_base.PreTrainedTokenizerBase`] that implements the common methods between
+[`PreTrainedTokenizer`] and [`PreTrainedTokenizerFast`] and the mixin
+[`~tokenization_utils_base.SpecialTokensMixin`].
+Most of those are only useful if you are studying the code of the tokenizers in the library.
+## PreTrainedTokenizerBase
+[[autodoc]] tokenization_utils_base.PreTrainedTokenizerBase
+    - __call__
+    - all
+## SpecialTokensMixin
+[[autodoc]] tokenization_utils_base.SpecialTokensMixin
+## Enums and namedtuples
+[[autodoc]] tokenization_utils_base.TruncationStrategy
+[[autodoc]] tokenization_utils_base.CharSpan
+[[autodoc]] tokenization_utils_base.TokenSpan