Unverified Commit 4df7d05a authored by Sylvain Gugger, committed by GitHub

Doc new front (#14590)



* Convert PretrainedConfig doc to Markdown

* Use syntax

* Add necessary doc files (#14496)

* Doc fixes (#14499)

* Fixes for the new front

* Convert DETR file for table

* Title is needed

* Simplify a bit

* Even simpler

* Remove imports

* Fix typo in toctree (#14516)

* Fix checkpoints badge

* Update versions.yml format (#14517)

* Doc new front github actions (#14512)

* Doc new front github actions

* Fix docstring

* Fix feature extraction utils import (#14515)

* Address Julien's comments

* Push to doc-builder

* Ready for merge

* Remove old build and deploy

* Doc misc fixes (#14583)

* Rm versions.yml from doc

* Fix converting.rst

* Rm pretrained_models from toctree

* Fix index links (#14567)

* Fix links in README

* Localized READMEs

* Fix copy script

* Fix find doc script

* Update README_ko.md
Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Adapt build command to new CLI tools (#14578)

* Fix typo

* Fix doc interlinks (#14589)

* Convert PretrainedConfig doc to Markdown

* Use syntax

* Rm pattern <[a-z]+(.html).*>

* Rm huggingface.co/transformers/master

* Rm .html

* Rm .html from index.mdx

* Rm .html from model_summary.rst

* Update index.mdx rm html

* Update remove .html

* Fix inner doc links

* Fix interlink in preprocessing.rst

* Update pr_checks
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


* Styling
Co-authored-by: Mishig Davaadorj <mishig.davaadorj@coloradocollege.edu>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
parent 14cc50d0
@@ -31,7 +31,7 @@ This introduces two breaking changes:
##### How to obtain the same behavior as v3.x in v4.x
- The pipelines now contain additional features out of the box. See the [token-classification pipeline with the `grouped_entities` flag](https://huggingface.co/transformers/main_classes/pipelines.html?highlight=textclassification#tokenclassificationpipeline).
- The pipelines now contain additional features out of the box. See the [token-classification pipeline with the `grouped_entities` flag](main_classes/pipelines#transformers.TokenClassificationPipeline).
- The auto-tokenizers now return rust tokenizers. In order to obtain the python tokenizers instead, the user may use the `use_fast` flag by setting it to `False`:
In version `v3.x`:
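As a hedged sketch (not the original snippet), the v4.x call that recovers the Python tokenizer, with `bert-base-cased` standing in as an example checkpoint:

```python
from transformers import AutoTokenizer

# v4.x returns a fast (Rust-backed) tokenizer by default; use_fast=False
# restores the pure-Python tokenizer that v3.x used to return.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased", use_fast=False)
```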
@@ -98,7 +98,7 @@ from transformers.models.bert.modeling_bert import BertLayer
#### 4. Switching the `return_dict` argument to `True` by default
The [`return_dict` argument](https://huggingface.co/transformers/main_classes/output.html) enables the return of dict-like python objects containing the model outputs, instead of the standard tuples. This object is self-documented as keys can be used to retrieve values, while also behaving as a tuple as users may retrieve objects by index or by slice.
The [`return_dict` argument](main_classes/output) enables the return of dict-like python objects containing the model outputs, instead of the standard tuples. This object is self-documented as keys can be used to retrieve values, while also behaving as a tuple as users may retrieve objects by index or by slice.
This is a breaking change as the limitation of that tuple is that it cannot be unpacked: `value0, value1 = outputs` will not work.
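As a hedged illustration of the new default (model and checkpoint names are examples only):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")

# return_dict=True is the v4.x default, so this returns a ModelOutput.
outputs = model(**tokenizer("Hello world", return_tensors="pt"))

hidden = outputs.last_hidden_state  # self-documenting keyed access
assert outputs[0] is hidden         # indexing still behaves tuple-style
```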
......
@@ -47,7 +47,7 @@ Implementation Notes
- Available checkpoints can be found in the `model hub <https://huggingface.co/models?search=blenderbot>`__.
- This is the `default` Blenderbot model class. However, some smaller checkpoints, such as
``facebook/blenderbot_small_90M``, have a different architecture and consequently should be used with
`BlenderbotSmall <https://huggingface.co/transformers/master/model_doc/blenderbot_small.html>`__.
`BlenderbotSmall <blenderbot_small>`__.
Usage
......
@@ -25,12 +25,12 @@ Overview
The DeiT model was proposed in `Training data-efficient image transformers & distillation through attention
<https://arxiv.org/abs/2012.12877>`__ by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre
Sablayrolles, Hervé Jégou. The `Vision Transformer (ViT) <https://huggingface.co/transformers/model_doc/vit.html>`__
introduced in `Dosovitskiy et al., 2020 <https://arxiv.org/abs/2010.11929>`__ has shown that one can match or even
outperform existing convolutional neural networks using a Transformer encoder (BERT-like). However, the ViT models
introduced in that paper required training on expensive infrastructure for multiple weeks, using external data. DeiT
(data-efficient image transformers) are more efficiently trained transformers for image classification, requiring far
less data and far less computing resources compared to the original ViT models.
Sablayrolles, Hervé Jégou. The `Vision Transformer (ViT) <vit>`__ introduced in `Dosovitskiy et al., 2020
<https://arxiv.org/abs/2010.11929>`__ has shown that one can match or even outperform existing convolutional neural
networks using a Transformer encoder (BERT-like). However, the ViT models introduced in that paper required training on
expensive infrastructure for multiple weeks, using external data. DeiT (data-efficient image transformers) are more
efficiently trained transformers for image classification, requiring far less data and far less computing resources
compared to the original ViT models.
The abstract from the paper is the following:
......
@@ -18,9 +18,8 @@ Overview
The LayoutLMV2 model was proposed in `LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
<https://arxiv.org/abs/2012.14740>`__ by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu,
Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou. LayoutLMV2 improves `LayoutLM
<https://huggingface.co/transformers/model_doc/layoutlm.html>`__ to obtain state-of-the-art results across several
document image understanding benchmarks:
Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou. LayoutLMV2 improves `LayoutLM <layoutlm>`__ to obtain
state-of-the-art results across several document image understanding benchmarks:
- information extraction from scanned documents: the `FUNSD <https://guillaumejaume.github.io/FUNSD/>`__ dataset (a
collection of 199 annotated forms comprising more than 30,000 words), the `CORD <https://github.com/clovaai/cord>`__
......
@@ -80,7 +80,7 @@ Original GPT
<a href="https://huggingface.co/models?filter=openai-gpt">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-openai--gpt-blueviolet">
</a>
<a href="model_doc/gpt.html">
<a href="model_doc/gpt">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-openai--gpt-blueviolet">
</a>
@@ -100,7 +100,7 @@ GPT-2
<a href="https://huggingface.co/models?filter=gpt2">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-gpt2-blueviolet">
</a>
<a href="model_doc/gpt2.html">
<a href="model_doc/gpt2">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-gpt2-blueviolet">
</a>
@@ -122,7 +122,7 @@ CTRL
<a href="https://huggingface.co/models?filter=ctrl">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-ctrl-blueviolet">
</a>
<a href="model_doc/ctrl.html">
<a href="model_doc/ctrl">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-ctrl-blueviolet">
</a>
@@ -143,7 +143,7 @@ Transformer-XL
<a href="https://huggingface.co/models?filter=transfo-xl">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-transfo--xl-blueviolet">
</a>
<a href="model_doc/transformerxl.html">
<a href="model_doc/transformerxl">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-transfo--xl-blueviolet">
</a>
@@ -174,7 +174,7 @@ Reformer
<a href="https://huggingface.co/models?filter=reformer">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-reformer-blueviolet">
</a>
<a href="model_doc/reformer.html">
<a href="model_doc/reformer">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-reformer-blueviolet">
</a>
@@ -208,7 +208,7 @@ XLNet
<a href="https://huggingface.co/models?filter=xlnet">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-xlnet-blueviolet">
</a>
<a href="model_doc/xlnet.html">
<a href="model_doc/xlnet">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlnet-blueviolet">
</a>
@@ -248,7 +248,7 @@ BERT
<a href="https://huggingface.co/models?filter=bert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-bert-blueviolet">
</a>
<a href="model_doc/bert.html">
<a href="model_doc/bert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-bert-blueviolet">
</a>
@@ -277,7 +277,7 @@ ALBERT
<a href="https://huggingface.co/models?filter=albert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-albert-blueviolet">
</a>
<a href="model_doc/albert.html">
<a href="model_doc/albert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-albert-blueviolet">
</a>
@@ -306,7 +306,7 @@ RoBERTa
<a href="https://huggingface.co/models?filter=roberta">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-roberta-blueviolet">
</a>
<a href="model_doc/roberta.html">
<a href="model_doc/roberta">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-roberta-blueviolet">
</a>
@@ -331,7 +331,7 @@ DistilBERT
<a href="https://huggingface.co/models?filter=distilbert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-distilbert-blueviolet">
</a>
<a href="model_doc/distilbert.html">
<a href="model_doc/distilbert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-distilbert-blueviolet">
</a>
@@ -356,7 +356,7 @@ ConvBERT
<a href="https://huggingface.co/models?filter=convbert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-convbert-blueviolet">
</a>
<a href="model_doc/convbert.html">
<a href="model_doc/convbert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-convbert-blueviolet">
</a>
@@ -386,7 +386,7 @@ XLM
<a href="https://huggingface.co/models?filter=xlm">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-xlm-blueviolet">
</a>
<a href="model_doc/xlm.html">
<a href="model_doc/xlm">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlm-blueviolet">
</a>
@@ -420,7 +420,7 @@ XLM-RoBERTa
<a href="https://huggingface.co/models?filter=xlm-roberta">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-xlm--roberta-blueviolet">
</a>
<a href="model_doc/xlmroberta.html">
<a href="model_doc/xlmroberta">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlm--roberta-blueviolet">
</a>
@@ -442,7 +442,7 @@ FlauBERT
<a href="https://huggingface.co/models?filter=flaubert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-flaubert-blueviolet">
</a>
<a href="model_doc/flaubert.html">
<a href="model_doc/flaubert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-flaubert-blueviolet">
</a>
@@ -460,7 +460,7 @@ ELECTRA
<a href="https://huggingface.co/models?filter=electra">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-electra-blueviolet">
</a>
<a href="model_doc/electra.html">
<a href="model_doc/electra">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-electra-blueviolet">
</a>
@@ -484,7 +484,7 @@ Funnel Transformer
<a href="https://huggingface.co/models?filter=funnel">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-funnel-blueviolet">
</a>
<a href="model_doc/funnel.html">
<a href="model_doc/funnel">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-funnel-blueviolet">
</a>
@@ -518,7 +518,7 @@ Longformer
<a href="https://huggingface.co/models?filter=longformer">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-longformer-blueviolet">
</a>
<a href="model_doc/longformer.html">
<a href="model_doc/longformer">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-longformer-blueviolet">
</a>
@@ -558,7 +558,7 @@ BART
<a href="https://huggingface.co/models?filter=bart">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-bart-blueviolet">
</a>
<a href="model_doc/bart.html">
<a href="model_doc/bart">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-bart-blueviolet">
</a>
@@ -585,7 +585,7 @@ Pegasus
<a href="https://huggingface.co/models?filter=pegasus">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-pegasus-blueviolet">
</a>
<a href="model_doc/pegasus.html">
<a href="model_doc/pegasus">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-pegasus-blueviolet">
</a>
@@ -616,7 +616,7 @@ MarianMT
<a href="https://huggingface.co/models?filter=marian">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-marian-blueviolet">
</a>
<a href="model_doc/marian.html">
<a href="model_doc/marian">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-marian-blueviolet">
</a>
@@ -635,7 +635,7 @@ T5
<a href="https://huggingface.co/models?filter=t5">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-t5-blueviolet">
</a>
<a href="model_doc/t5.html">
<a href="model_doc/t5">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-t5-blueviolet">
</a>
@@ -668,7 +668,7 @@ MT5
<a href="https://huggingface.co/models?filter=mt5">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-mt5-blueviolet">
</a>
<a href="model_doc/mt5.html">
<a href="model_doc/mt5">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-mt5-blueviolet">
</a>
@@ -689,7 +689,7 @@ MBart
<a href="https://huggingface.co/models?filter=mbart">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-mbart-blueviolet">
</a>
<a href="model_doc/mbart.html">
<a href="model_doc/mbart">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-mbart-blueviolet">
</a>
@@ -718,7 +718,7 @@ ProphetNet
<a href="https://huggingface.co/models?filter=prophetnet">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-prophetnet-blueviolet">
</a>
<a href="model_doc/prophetnet.html">
<a href="model_doc/prophetnet">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-prophetnet-blueviolet">
</a>
@@ -743,7 +743,7 @@ XLM-ProphetNet
<a href="https://huggingface.co/models?filter=xprophetnet">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-xprophetnet-blueviolet">
</a>
<a href="model_doc/xlmprophetnet.html">
<a href="model_doc/xlmprophetnet">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xprophetnet-blueviolet">
</a>
@@ -781,7 +781,7 @@ model know which part of the input vector corresponds to the text and which to t
The pretrained model only works for classification.
..
More information in this :doc:`model documentation </model_doc/mmbt.html>`. TODO: write this page
More information in this :doc:`model documentation <model_doc/mmbt>`. TODO: write this page
.. _retrieval-based-models:
@@ -799,7 +799,7 @@ DPR
<a href="https://huggingface.co/models?filter=dpr">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-dpr-blueviolet">
</a>
<a href="model_doc/dpr.html">
<a href="model_doc/dpr">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-dpr-blueviolet">
</a>
@@ -828,7 +828,7 @@ RAG
<a href="https://huggingface.co/models?filter=rag">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-rag-blueviolet">
</a>
<a href="model_doc/rag.html">
<a href="model_doc/rag">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-rag-blueviolet">
</a>
......
@@ -46,7 +46,7 @@ Most users with just 2 GPUs already enjoy the increased training speed up thanks
## ZeRO Data Parallel
ZeRO-powered data parallelism (ZeRO-DP) is described on the following diagram from this [blog post](https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/)
![DeepSpeed-Image-1](imgs/parallelism-zero.png)
![DeepSpeed-Image-1](/transformers/_images/parallelism-zero.png)
It can be difficult to wrap one's head around it, but in reality the concept is quite simple. This is just the usual DataParallel (DP), except, instead of replicating the full model params, gradients and optimizer states, each GPU stores only a slice of it. And then at run-time when the full layer params are needed just for the given layer, all GPUs synchronize to give each other parts that they miss - this is it.
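The sharding idea can be sketched in plain NumPy (a conceptual toy only, not the DeepSpeed API):

```python
import numpy as np

world_size = 4
full_params = np.arange(16.0)               # stand-in for one layer's flat weights
shards = np.split(full_params, world_size)  # each GPU permanently stores one shard

def all_gather(shards):
    # Stand-in for the collective op: ranks exchange shards so that each one
    # can momentarily materialize the full layer.
    return np.concatenate(shards)

layer = all_gather(shards)                  # built just-in-time for the forward pass
assert np.array_equal(layer, full_params)   # ...and dropped again afterwards
```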
@@ -122,7 +122,7 @@ Implementations:
- [DeepSpeed](https://www.deepspeed.ai/features/#the-zero-redundancy-optimizer) ZeRO-DP stages 1+2+3
- [Fairscale](https://github.com/facebookresearch/fairscale/#optimizer-state-sharding-zero) ZeRO-DP stages 1+2+3
- [`transformers` integration](https://huggingface.co/transformers/master/main_classes/trainer.html#trainer-integrations)
- [`transformers` integration](main_classes/trainer#trainer-integrations)
## Naive Model Parallel (Vertical) and Pipeline Parallel
@@ -150,7 +150,7 @@ Pipeline Parallel (PP) is almost identical to a naive MP, but it solves the GPU
The following illustration from the [GPipe paper](https://ai.googleblog.com/2019/03/introducing-gpipe-open-source-library.html) shows the naive MP on the top, and PP on the bottom:
![mp-pp](imgs/parallelism-gpipe-bubble.png)
![mp-pp](/transformers/_images/parallelism-gpipe-bubble.png)
It's easy to see from the bottom diagram how PP has less dead zones, where GPUs are idle. The idle parts are referred to as the "bubble".
@@ -203,7 +203,7 @@ Implementations:
Other approaches:
DeepSpeed, Varuna and SageMaker use the concept of an [Interleaved Pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-core-features.html)
![interleaved-pipeline-execution](imgs/parallelism-sagemaker-interleaved-pipeline.png)
![interleaved-pipeline-execution](/transformers/_images/parallelism-sagemaker-interleaved-pipeline.png)
Here the bubble (idle time) is further minimized by prioritizing backward passes.
@@ -221,16 +221,16 @@ The main building block of any transformer is a fully connected `nn.Linear` foll
Following the Megatron's paper notation, we can write the dot-product part of it as `Y = GeLU(XA)`, where `X` and `Y` are the input and output vectors, and `A` is the weight matrix.
If we look at the computation in matrix form, it's easy to see how the matrix multiplication can be split between multiple GPUs:
![Parallel GEMM](imgs/parallelism-tp-parallel_gemm.png)
![Parallel GEMM](/transformers/_images/parallelism-tp-parallel_gemm.png)
If we split the weight matrix `A` column-wise across `N` GPUs and perform matrix multiplications `XA_1` through `XA_n` in parallel, then we will end up with `N` output vectors `Y_1, Y_2, ..., Y_n` which can be fed into `GeLU` independently:
![independent GeLU](imgs/parallelism-tp-independent-gelu.png)
![independent GeLU](/transformers/_images/parallelism-tp-independent-gelu.png)
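The claim is easy to check numerically; here is a small NumPy sketch (using the common tanh approximation of GeLU):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GeLU, applied element-wise
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 8))   # input activations
A = rng.standard_normal((8, 6))   # weight matrix

Y = gelu(X @ A)                   # single-device reference

# "Two GPUs": split A column-wise, apply GeLU on each shard independently,
# and concatenate the results. No sync is needed until this very last step.
A1, A2 = np.hsplit(A, 2)
Y_parallel = np.concatenate([gelu(X @ A1), gelu(X @ A2)], axis=1)

assert np.allclose(Y, Y_parallel)
```

Because GeLU acts element-wise, it commutes with the column split, which is why the shards need no communication before the final concatenation.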
Using this principle, we can update an MLP of arbitrary depth, without the need for any synchronization between GPUs until the very end, where we need to reconstruct the output vector from shards. The Megatron-LM paper authors provide a helpful illustration for that:
![parallel shard processing](imgs/parallelism-tp-parallel_shard_processing.png)
![parallel shard processing](/transformers/_images/parallelism-tp-parallel_shard_processing.png)
Parallelizing the multi-headed attention layers is even simpler, since they are already inherently parallel, due to having multiple independent heads!
![parallel self-attention](imgs/parallelism-tp-parallel_self_attention.png)
![parallel self-attention](/transformers/_images/parallelism-tp-parallel_self_attention.png)
Special considerations: TP requires very fast network, and therefore it's not advisable to do TP across more than one node. Practically, if a node has 4 GPUs, the highest TP degree is therefore 4. If you need a TP degree of 8, you need to use nodes that have at least 8 GPUs.
@@ -258,7 +258,7 @@ Implementations:
The following diagram from the DeepSpeed [pipeline tutorial](https://www.deepspeed.ai/tutorials/pipeline/) demonstrates how one combines DP with PP.
![dp-pp-2d](imgs/parallelism-zero-dp-pp.png)
![dp-pp-2d](/transformers/_images/parallelism-zero-dp-pp.png)
Here it's important to see how DP rank 0 doesn't see GPU2 and DP rank 1 doesn't see GPU3. To DP there is just GPUs 0 and 1 where it feeds data as if there were just 2 GPUs. GPU0 "secretly" offloads some of its load to GPU2 using PP. And GPU1 does the same by enlisting GPU3 to its aid.
@@ -277,7 +277,7 @@ Implementations:
To get an even more efficient training a 3D parallelism is used where PP is combined with TP and DP. This can be seen in the following diagram.
![dp-pp-tp-3d](imgs/parallelism-deepspeed-3d.png)
![dp-pp-tp-3d](/transformers/_images/parallelism-deepspeed-3d.png)
This diagram is from a blog post [3D parallelism: Scaling to trillion-parameter models](https://www.microsoft.com/en-us/research/blog/deepspeed-extreme-scale-model-training-for-everyone/), which is a good read as well.
@@ -342,7 +342,7 @@ We have 10 batches of 512 length. If we parallelize them by attribute dimension
It is similar with tensor model parallelism or naive layer-wise model parallelism.
![flex-flow-soap](imgs/parallelism-flexflow.jpeg)
![flex-flow-soap](/transformers/_images/parallelism-flexflow.jpeg)
The significance of this framework is that it takes resources like (1) GPU/TPU/CPU vs. (2) RAM/DRAM vs. (3) fast-intra-connect/slow-inter-connect and it automatically optimizes all these algorithmically deciding which parallelisation to use where.
......
@@ -56,10 +56,9 @@ is its ``__call__``: you just need to feed your sentence to your tokenizer objec
'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
This returns a dictionary string to list of ints. The `input_ids <glossary.html#input-ids>`__ are the indices
corresponding to each token in our sentence. We will see below what the `attention_mask
<glossary.html#attention-mask>`__ is used for and in :ref:`the next section <sentence-pairs>` the goal of
`token_type_ids <glossary.html#token-type-ids>`__.
This returns a dictionary string to list of ints. The `input_ids <glossary#input-ids>`__ are the indices corresponding
to each token in our sentence. We will see below what the `attention_mask <glossary#attention-mask>`__ is used for and
in :ref:`the next section <preprocessing-pairs-of-sentences>` the goal of `token_type_ids <glossary#token-type-ids>`__.
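A hedged sketch of such a call (the checkpoint is only an example; the exact outputs vary by model):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
encoded = tokenizer("This is a single sentence.")

print(encoded["input_ids"])       # indices of the tokens in the vocabulary
print(encoded["attention_mask"])  # 1 for every real token
print(encoded["token_type_ids"])  # segment ids, used for sentence pairs
```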
The tokenizer can decode a list of token ids in a proper sentence:
@@ -132,8 +131,8 @@ You can do all of this by using the following options when feeding your list of
[1, 1, 1, 1, 1, 1, 1, 1, 0]])}
It returns a dictionary with string keys and tensor values. We can now see what the `attention_mask
<glossary.html#attention-mask>`__ is all about: it points out which tokens the model should pay attention to and which
ones it should not (because they represent padding in this case).
<glossary#attention-mask>`__ is all about: it points out which tokens the model should pay attention to and which ones
it should not (because they represent padding in this case).
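For instance (a hedged sketch; the checkpoint name is illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
batch = ["A short sentence.", "A noticeably longer second sentence to force padding."]
encoded = tokenizer(batch, padding=True, truncation=True, return_tensors="pt")

# Zeros in the mask line up exactly with the padding tokens added above.
print(encoded["attention_mask"])
```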
Note that if your model does not have a maximum length associated to it, the command above will throw a warning. You
@@ -166,8 +165,8 @@ This will once again return a dict string to list of ints:
'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
This shows us what the `token_type_ids <glossary.html#token-type-ids>`__ are for: they indicate to the model which part
of the inputs correspond to the first sentence and which part corresponds to the second sentence. Note that
This shows us what the `token_type_ids <glossary#token-type-ids>`__ are for: they indicate to the model which part of
the inputs correspond to the first sentence and which part corresponds to the second sentence. Note that
`token_type_ids` are not required or handled by all models. By default, a tokenizer will only return the inputs that
its associated model expects. You can force the return (or the non-return) of any of those special arguments by using
``return_input_ids`` or ``return_token_type_ids``.
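A hedged sketch of a sentence-pair call (example checkpoint):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
pair = tokenizer("How old are you?", "I'm 6 years old")
print(pair["token_type_ids"])  # 0s cover the first sentence, 1s the second

# The special outputs can also be forced off (or on) explicitly:
no_types = tokenizer("How old are you?", "I'm 6 years old", return_token_type_ids=False)
```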
......
@@ -197,10 +197,9 @@ To apply these steps on a given text, we can just feed it to our tokenizer:
>>> inputs = tokenizer("We are very happy to show you the 🤗 Transformers library.")
This returns a dictionary string to list of ints. It contains the `ids of the tokens <glossary.html#input-ids>`__, as
This returns a dictionary string to list of ints. It contains the `ids of the tokens <glossary#input-ids>`__, as
mentioned before, but also additional arguments that will be useful to the model. Here for instance, we also have an
`attention mask <glossary.html#attention-mask>`__ that the model will use to have a better understanding of the
sequence:
`attention mask <glossary#attention-mask>`__ that the model will use to have a better understanding of the sequence:
.. code-block::
......
- sections:
- local: index
title: 🤗 Transformers
- local: quicktour
title: Quick tour
- local: installation
title: Installation
- local: philosophy
title: Philosophy
- local: glossary
title: Glossary
title: Get started
- sections:
- local: task_summary
title: Summary of the tasks
- local: model_summary
title: Summary of the models
- local: preprocessing
title: Preprocessing data
- local: training
title: Fine-tuning a pretrained model
- local: model_sharing
title: Model sharing and uploading
- local: tokenizer_summary
title: Summary of the tokenizers
- local: multilingual
title: Multi-lingual models
title: "Using 🤗 Transformers"
- sections:
- local: examples
title: Examples
- local: troubleshooting
title: Troubleshooting
- local: custom_datasets
title: Fine-tuning with custom datasets
- local: notebooks
title: "🤗 Transformers Notebooks"
- local: sagemaker
title: Run training on Amazon SageMaker
- local: community
title: Community
- local: converting_tensorflow_models
title: Converting Tensorflow Checkpoints
- local: migration
title: Migrating from previous packages
- local: contributing
title: How to contribute to transformers?
- local: add_new_model
title: "How to add a model to 🤗 Transformers?"
- local: add_new_pipeline
title: "How to add a pipeline to 🤗 Transformers?"
- local: fast_tokenizers
title: "Using tokenizers from 🤗 Tokenizers"
- local: performance
title: 'Performance and Scalability: How To Fit a Bigger Model and Train It Faster'
- local: parallelism
title: Model Parallelism
- local: testing
title: Testing
- local: debugging
title: Debugging
- local: serialization
title: Exporting transformers models
title: Advanced guides
- sections:
- local: bertology
title: BERTology
- local: perplexity
title: Perplexity of fixed-length models
- local: benchmarks
title: Benchmarks
title: Research
- sections:
- sections:
- local: main_classes/callback
title: Callbacks
- local: main_classes/configuration
title: Configuration
- local: main_classes/data_collator
title: Data Collator
- local: main_classes/keras_callbacks
title: Keras callbacks
- local: main_classes/logging
title: Logging
- local: main_classes/model
title: Models
- local: main_classes/optimizer_schedules
title: Optimization
- local: main_classes/output
title: Model outputs
- local: main_classes/pipelines
title: Pipelines
- local: main_classes/processors
title: Processors
- local: main_classes/tokenizer
title: Tokenizer
- local: main_classes/trainer
title: Trainer
- local: main_classes/deepspeed
title: DeepSpeed Integration
- local: main_classes/feature_extractor
title: Feature Extractor
title: Main Classes
- sections:
- local: model_doc/albert
title: ALBERT
- local: model_doc/auto
title: Auto Classes
- local: model_doc/bart
title: BART
- local: model_doc/barthez
title: BARThez
- local: model_doc/bartpho
title: BARTpho
- local: model_doc/beit
title: BEiT
- local: model_doc/bert
title: BERT
- local: model_doc/bertweet
title: Bertweet
- local: model_doc/bertgeneration
title: BertGeneration
- local: model_doc/bert_japanese
title: BertJapanese
- local: model_doc/bigbird
title: BigBird
- local: model_doc/bigbird_pegasus
title: BigBirdPegasus
- local: model_doc/blenderbot
title: Blenderbot
- local: model_doc/blenderbot_small
title: Blenderbot Small
- local: model_doc/bort
title: BORT
- local: model_doc/byt5
title: ByT5
- local: model_doc/camembert
title: CamemBERT
- local: model_doc/canine
title: CANINE
- local: model_doc/clip
title: CLIP
- local: model_doc/convbert
title: ConvBERT
- local: model_doc/cpm
title: CPM
- local: model_doc/ctrl
title: CTRL
- local: model_doc/deberta
title: DeBERTa
- local: model_doc/deberta_v2
title: DeBERTa-v2
- local: model_doc/deit
title: DeiT
- local: model_doc/detr
title: DETR
- local: model_doc/dialogpt
title: DialoGPT
- local: model_doc/distilbert
title: DistilBERT
- local: model_doc/dpr
title: DPR
- local: model_doc/electra
title: ELECTRA
- local: model_doc/encoderdecoder
title: Encoder Decoder Models
- local: model_doc/flaubert
title: FlauBERT
- local: model_doc/fnet
title: FNet
- local: model_doc/fsmt
title: FSMT
- local: model_doc/funnel
title: Funnel Transformer
- local: model_doc/herbert
title: herBERT
- local: model_doc/ibert
title: I-BERT
- local: model_doc/imagegpt
title: ImageGPT
- local: model_doc/layoutlm
title: LayoutLM
- local: model_doc/layoutlmv2
title: LayoutLMV2
- local: model_doc/layoutxlm
title: LayoutXLM
- local: model_doc/led
title: LED
- local: model_doc/longformer
title: Longformer
- local: model_doc/luke
title: LUKE
- local: model_doc/lxmert
title: LXMERT
- local: model_doc/marian
title: MarianMT
- local: model_doc/m2m_100
title: M2M100
- local: model_doc/mbart
title: MBart and MBart-50
- local: model_doc/megatron_bert
title: MegatronBERT
- local: model_doc/megatron_gpt2
title: MegatronGPT2
- local: model_doc/mobilebert
title: MobileBERT
- local: model_doc/mpnet
title: MPNet
- local: model_doc/mt5
title: MT5
- local: model_doc/gpt
title: OpenAI GPT
- local: model_doc/gpt2
title: OpenAI GPT2
- local: model_doc/gptj
title: GPT-J
- local: model_doc/gpt_neo
title: GPT Neo
- local: model_doc/hubert
title: Hubert
- local: model_doc/pegasus
title: Pegasus
- local: model_doc/phobert
title: PhoBERT
- local: model_doc/prophetnet
title: ProphetNet
- local: model_doc/qdqbert
title: QDQBert
- local: model_doc/rag
title: RAG
- local: model_doc/reformer
title: Reformer
- local: model_doc/rembert
title: RemBERT
- local: model_doc/retribert
title: RetriBERT
- local: model_doc/roberta
title: RoBERTa
- local: model_doc/roformer
title: RoFormer
- local: model_doc/segformer
title: SegFormer
- local: model_doc/sew
title: SEW
- local: model_doc/sew_d
title: SEW-D
- local: model_doc/speechencoderdecoder
title: Speech Encoder Decoder Models
- local: model_doc/speech_to_text
title: Speech2Text
- local: model_doc/speech_to_text_2
title: Speech2Text2
- local: model_doc/splinter
title: Splinter
- local: model_doc/squeezebert
title: SqueezeBERT
- local: model_doc/t5
title: T5
- local: model_doc/t5v1.1
title: T5v1.1
- local: model_doc/tapas
title: TAPAS
- local: model_doc/transformerxl
title: Transformer XL
- local: model_doc/trocr
title: TrOCR
- local: model_doc/unispeech
title: UniSpeech
- local: model_doc/unispeech_sat
title: UniSpeech-SAT
- local: model_doc/visionencoderdecoder
title: Vision Encoder Decoder Models
- local: model_doc/vit
title: Vision Transformer (ViT)
- local: model_doc/visual_bert
title: VisualBERT
- local: model_doc/wav2vec2
title: Wav2Vec2
- local: model_doc/xlm
title: XLM
- local: model_doc/xlmprophetnet
title: XLM-ProphetNet
- local: model_doc/xlmroberta
title: XLM-RoBERTa
- local: model_doc/xlnet
title: XLNet
- local: model_doc/xlsr_wav2vec2
title: XLSR-Wav2Vec2
title: Models
- sections:
- local: internal/modeling_utils
title: Custom Layers and Utilities
- local: internal/pipelines_utils
title: Utilities for pipelines
- local: internal/tokenization_utils
title: Utilities for Tokenizers
- local: internal/trainer_utils
title: Utilities for Trainer
- local: internal/generation_utils
title: Utilities for Generation
- local: internal/file_utils
title: General Utilities
title: Internal Helpers
title: API
@@ -417,5 +417,5 @@ To look at more fine-tuning examples you can refer to:
- `🤗 Transformers Examples <https://github.com/huggingface/transformers/tree/master/examples>`__ which includes scripts
to train on all common NLP tasks in PyTorch and TensorFlow.
- `🤗 Transformers Notebooks <notebooks.html>`__ which contains various notebooks and in particular one per task (look
for the `how to finetune a model on xxx`).
- `🤗 Transformers Notebooks <notebooks>`__ which contains various notebooks and in particular one per task (look for
the `how to finetune a model on xxx`).
@@ -27,4 +27,4 @@ ValueError: Connection error, and we cannot find the requested files in the cach
Please try again or make sure your Internet connection is on.
```
One possible solution in this situation is to use the ["offline-mode"](https://huggingface.co/transformers/installation.html#offline-mode).
One possible solution in this situation is to use the ["offline-mode"](installation#offline-mode).
@@ -42,34 +42,35 @@ To browse the examples corresponding to released versions of 🤗 Transformers,
<details>
<summary>Examples for older versions of 🤗 Transformers</summary>
- [v4.5.1](https://github.com/huggingface/transformers/tree/v4.5.1/examples)
- [v4.4.2](https://github.com/huggingface/transformers/tree/v4.4.2/examples)
- [v4.3.3](https://github.com/huggingface/transformers/tree/v4.3.3/examples)
- [v4.2.2](https://github.com/huggingface/transformers/tree/v4.2.2/examples)
- [v4.1.1](https://github.com/huggingface/transformers/tree/v4.1.1/examples)
- [v4.0.1](https://github.com/huggingface/transformers/tree/v4.0.1/examples)
- [v3.5.1](https://github.com/huggingface/transformers/tree/v3.5.1/examples)
- [v3.4.0](https://github.com/huggingface/transformers/tree/v3.4.0/examples)
- [v3.3.1](https://github.com/huggingface/transformers/tree/v3.3.1/examples)
- [v3.2.0](https://github.com/huggingface/transformers/tree/v3.2.0/examples)
- [v3.1.0](https://github.com/huggingface/transformers/tree/v3.1.0/examples)
- [v3.0.2](https://github.com/huggingface/transformers/tree/v3.0.2/examples)
- [v2.11.0](https://github.com/huggingface/transformers/tree/v2.11.0/examples)
- [v2.10.0](https://github.com/huggingface/transformers/tree/v2.10.0/examples)
- [v2.9.1](https://github.com/huggingface/transformers/tree/v2.9.1/examples)
- [v2.8.0](https://github.com/huggingface/transformers/tree/v2.8.0/examples)
- [v2.7.0](https://github.com/huggingface/transformers/tree/v2.7.0/examples)
- [v2.6.0](https://github.com/huggingface/transformers/tree/v2.6.0/examples)
- [v2.5.1](https://github.com/huggingface/transformers/tree/v2.5.1/examples)
- [v2.4.0](https://github.com/huggingface/transformers/tree/v2.4.0/examples)
- [v2.3.0](https://github.com/huggingface/transformers/tree/v2.3.0/examples)
- [v2.2.0](https://github.com/huggingface/transformers/tree/v2.2.0/examples)
- [v2.1.1](https://github.com/huggingface/transformers/tree/v2.1.0/examples)
- [v2.0.0](https://github.com/huggingface/transformers/tree/v2.0.0/examples)
- [v1.2.0](https://github.com/huggingface/transformers/tree/v1.2.0/examples)
- [v1.1.0](https://github.com/huggingface/transformers/tree/v1.1.0/examples)
- [v1.0.0](https://github.com/huggingface/transformers/tree/v1.0.0/examples)
<ul>
<li><a href="https://github.com/huggingface/transformers/tree/v4.5.1/examples">v4.5.1</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v4.4.2/examples">v4.4.2</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v4.3.3/examples">v4.3.3</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v4.2.2/examples">v4.2.2</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v4.1.1/examples">v4.1.1</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v4.0.1/examples">v4.0.1</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v3.5.1/examples">v3.5.1</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v3.4.0/examples">v3.4.0</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v3.3.1/examples">v3.3.1</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v3.2.0/examples">v3.2.0</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v3.1.0/examples">v3.1.0</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v3.0.2/examples">v3.0.2</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v2.11.0/examples">v2.11.0</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v2.10.0/examples">v2.10.0</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v2.9.1/examples">v2.9.1</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v2.8.0/examples">v2.8.0</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v2.7.0/examples">v2.7.0</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v2.6.0/examples">v2.6.0</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v2.5.1/examples">v2.5.1</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v2.4.0/examples">v2.4.0</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v2.3.0/examples">v2.3.0</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v2.2.0/examples">v2.2.0</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v2.1.0/examples">v2.1.1</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v2.0.0/examples">v2.0.0</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v1.2.0/examples">v1.2.0</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v1.1.0/examples">v1.1.0</a></li>
<li><a href="https://github.com/huggingface/transformers/tree/v1.0.0/examples">v1.0.0</a></li>
</ul>
</details>
Alternatively, you can switch your cloned 🤗 Transformers to a specific version (for instance with v3.5.1) with
......
@@ -85,4 +85,4 @@ You can open any page of the documentation as a notebook in colab (there is a bu
## Community notebooks:
More notebooks developed by the community are available [here](https://huggingface.co/transformers/master/community.html#community-notebooks).
More notebooks developed by the community are available [here](community#community-notebooks).
@@ -38,8 +38,9 @@ class BeitFeatureExtractor(FeatureExtractionMixin, ImageFeatureExtractionMixin):
r"""
Constructs a BEiT feature extractor.
This feature extractor inherits from :class:`~transformers.FeatureExtractionMixin` which contains most of the main
methods. Users should refer to this superclass for more information regarding those methods.
This feature extractor inherits from :class:`~transformers.feature_extraction_utils.FeatureExtractionMixin` which
contains most of the main methods. Users should refer to this superclass for more information regarding those
methods.
Args:
do_resize (:obj:`bool`, `optional`, defaults to :obj:`True`):
......
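A hedged usage sketch for this class (the checkpoint name and image URL are illustrative):

```python
import requests
from PIL import Image
from transformers import BeitFeatureExtractor

feature_extractor = BeitFeatureExtractor.from_pretrained("microsoft/beit-base-patch16-224")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Resizes/normalizes the image and returns a batch of pixel_values tensors.
inputs = feature_extractor(images=image, return_tensors="pt")
```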
@@ -275,54 +275,6 @@ def get_model_list(filename, start_prompt, end_prompt):
return "".join(result)
def split_long_line_with_indent(line, max_per_line, indent):
"""Split the `line` so that it doesn't go over `max_per_line` and adds `indent` to new lines."""
words = line.split(" ")
lines = []
current_line = words[0]
for word in words[1:]:
if len(f"{current_line} {word}") > max_per_line:
lines.append(current_line)
current_line = " " * indent + word
else:
current_line = f"{current_line} {word}"
lines.append(current_line)
return "\n".join(lines)
def convert_to_rst(model_list, max_per_line=None):
"""Convert `model_list` to rst format."""
# Convert **[description](link)** to `description <link>`__
def _rep_link(match):
title, link = match.groups()
# Keep hard links for the models not released yet
if "master" in link or not link.startswith("https://huggingface.co/transformers"):
return f"`{title} <{link}>`__"
# Convert links to relative links otherwise
else:
link = link[len("https://huggingface.co/transformers/") : -len(".html")]
return f":doc:`{title} <{link}>`"
model_list = re.sub(r"\*\*\[([^\]]*)\]\(([^\)]*)\)\*\*", _rep_link, model_list)
# Convert [description](link) to `description <link>`__
model_list = re.sub(r"\[([^\]]*)\]\(([^\)]*)\)", r"`\1 <\2>`__", model_list)
# Enumerate the lines properly
lines = model_list.split("\n")
result = []
for i, line in enumerate(lines):
line = re.sub(r"^\s*(\d+)\.", f"{i+1}.", line)
# Split the lines that are too long
if max_per_line is not None and len(line) > max_per_line:
prompt = re.search(r"^(\s*\d+\.\s+)\S", line)
indent = len(prompt.groups()[0]) if prompt is not None else 0
line = split_long_line_with_indent(line, max_per_line, indent)
result.append(line)
return "\n".join(result)
def convert_to_localized_md(model_list, localized_model_list, format_str):
"""Convert `model_list` to each localized README."""
@@ -376,6 +328,11 @@ def convert_to_localized_md(model_list, localized_model_list, format_str):
return num_models_equal, "\n".join(map(lambda x: x[1], sorted_index)) + "\n"
def convert_readme_to_index(model_list):
model_list = model_list.replace("https://huggingface.co/docs/transformers/master/", "")
return model_list.replace("https://huggingface.co/docs/transformers/", "")
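# A hedged illustration with a sample string (not taken from the repo):
#     convert_readme_to_index("[BERT](https://huggingface.co/docs/transformers/model_doc/bert)")
#     -> "[BERT](model_doc/bert)"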
def _find_text_in_file(filename, start_prompt, end_prompt):
"""
Find the text in `filename` between a line beginning with `start_prompt` and before `end_prompt`, removing empty
@@ -406,10 +363,10 @@ def check_model_list_copy(overwrite=False, max_per_line=119):
"""Check the model lists in the README and index.rst are consistent and maybe `overwrite`."""
# If the introduction or the conclusion of the list change, the prompts may need to be updated.
rst_list, start_index, end_index, lines = _find_text_in_file(
filename=os.path.join(PATH_TO_DOCS, "index.rst"),
start_prompt=" This list is updated automatically from the README",
end_prompt="Supported frameworks",
index_list, start_index, end_index, lines = _find_text_in_file(
filename=os.path.join(PATH_TO_DOCS, "index.mdx"),
start_prompt="<!--This list is updated automatically from the README",
end_prompt="### Supported frameworks",
)
md_list = get_model_list(
filename="README.md",
@@ -417,8 +374,6 @@ def check_model_list_copy(overwrite=False, max_per_line=119):
end_prompt=LOCALIZED_READMES["README.md"]["end_prompt"],
)
converted_rst_list = convert_to_rst(md_list, max_per_line=max_per_line)
converted_md_lists = []
for filename, value in LOCALIZED_READMES.items():
_start_prompt = value["start_prompt"]
@@ -430,13 +385,14 @@ def convert_to_localized_md(model_list, localized_model_list, format_str):
converted_md_lists.append((filename, num_models_equal, converted_md_list, _start_prompt, _end_prompt))
if converted_rst_list != rst_list:
converted_md_list = convert_readme_to_index(md_list)
if converted_md_list != index_list:
if overwrite:
with open(os.path.join(PATH_TO_DOCS, "index.rst"), "w", encoding="utf-8", newline="\n") as f:
f.writelines(lines[:start_index] + [converted_rst_list] + lines[end_index:])
with open(os.path.join(PATH_TO_DOCS, "index.mdx"), "w", encoding="utf-8", newline="\n") as f:
f.writelines(lines[:start_index] + [converted_md_list] + lines[end_index:])
else:
raise ValueError(
"The model list in the README changed and the list in `index.rst` has not been updated. Run "
"The model list in the README changed and the list in `index.mdx` has not been updated. Run "
"`make fix-copies` to fix this."
)
......