Unverified Commit 74712e22 authored by Sylvain Gugger, committed by GitHub

Honor contributors to models (#11329)

* Honor contributors to models

* Fix typo

* Address review comments

* Add more authors
parent aad95c7c
@@ -54,7 +54,8 @@ Tips:
 :class:`~transformers.ElectraForPreTraining` model (the classification head will be randomly initialized as it
 doesn't exist in the generator).
-The original code can be found `here <https://github.com/google-research/electra>`__.
+This model was contributed by `lysandre <https://huggingface.co/lysandre>`__. The original code can be found `here
+<https://github.com/google-research/electra>`__.
 ElectraConfig
...
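The ELECTRA hunk above mentions loading a generator checkpoint into :class:`~transformers.ElectraForPreTraining`, with the classification head randomly initialized. A minimal sketch of that tip, assuming the ``google/electra-small-generator`` checkpoint id (not named in the diff):

from transformers import ElectraForPreTraining, ElectraTokenizerFast

# Loading a generator checkpoint into the pre-training (discriminator-style) head:
# the head's weights do not exist in the generator, so they are randomly initialized
# and a "newly initialized weights" warning is expected.
model = ElectraForPreTraining.from_pretrained("google/electra-small-generator")
tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-small-generator")

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
logits = model(**inputs).logits  # one original-vs-replaced score per token
print(logits.shape)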
@@ -35,7 +35,8 @@ time they outperform other pretraining approaches. Different versions of FlauBER
 protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared to the research
 community for further reproducible experiments in French NLP.*
-The original code can be found `here <https://github.com/getalp/Flaubert>`__.
+This model was contributed by `formiel <https://huggingface.co/formiel>`__. The original code can be found `here
+<https://github.com/getalp/Flaubert>`__.
 FlaubertConfig
...
@@ -34,7 +34,8 @@ data, then decode using noisy channel model reranking. Our submissions are ranke
 human evaluation campaign. On En->De, our system significantly outperforms other systems as well as human translations.
 This system improves upon our WMT'18 submission by 4.5 BLEU points.*
-The original code can be found here <https://github.com/pytorch/fairseq/tree/master/examples/wmt19>__.
+This model was contributed by `stas <https://huggingface.co/stas>`__. The original code can be found here
+<https://github.com/pytorch/fairseq/tree/master/examples/wmt19>__.
 Implementation Notes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
@@ -49,7 +49,8 @@ Tips:
 :class:`~transformers.FunnelBaseModel`, :class:`~transformers.FunnelForSequenceClassification` and
 :class:`~transformers.FunnelForMultipleChoice`.
-The original code can be found `here <https://github.com/laiguokun/Funnel-Transformer>`__.
+This model was contributed by `sgugger <https://huggingface.co/sgugger>`__. The original code can be found `here
+<https://github.com/laiguokun/Funnel-Transformer>`__.
 FunnelConfig
...
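The Funnel Transformer hunk above lists the classes built on the decoder-less base model (:class:`~transformers.FunnelBaseModel`, sequence classification, multiple choice). A hedged sketch, assuming the ``funnel-transformer/small-base`` checkpoint id and that the ``-base`` variants are the ones without the upsampling decoder:

from transformers import FunnelForSequenceClassification, FunnelTokenizerFast

name = "funnel-transformer/small-base"  # assumed checkpoint id
tokenizer = FunnelTokenizerFast.from_pretrained(name)
# The classification head is task-specific and starts out randomly initialized.
model = FunnelForSequenceClassification.from_pretrained(name, num_labels=2)

inputs = tokenizer("Funnel pools the sequence as it gets deeper.", return_tensors="pt")
print(model(**inputs).logits.shape)  # (1, 2)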
@@ -45,7 +45,8 @@ Tips:
 `Write With Transformer <https://transformer.huggingface.co/doc/gpt>`__ is a webapp created and hosted by Hugging Face
 showcasing the generative capabilities of several models. GPT is one of them.
-The original code can be found `here <https://github.com/openai/finetune-transformer-lm>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://github.com/openai/finetune-transformer-lm>`__.
 Note:
...
@@ -45,7 +45,8 @@ Tips:
 Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
 different sizes: small, medium, large, xl and a distilled version of the small checkpoint: `distilgpt-2`.
-The original code can be found `here <https://openai.com/blog/better-language-models/>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://openai.com/blog/better-language-models/>`__.
 GPT2Config
...
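The GPT-2 hunk above mentions the five available sizes, including the distilled small checkpoint. A minimal generation sketch via the pipeline API, assuming the distilled checkpoint id is ``distilgpt2`` on the Hub:

from transformers import pipeline

# Any of the other sizes (gpt2, gpt2-medium, gpt2-large, gpt2-xl) can be swapped in.
generator = pipeline("text-generation", model="distilgpt2")
print(generator("GPT-2 comes in several sizes,", max_length=30, num_return_sequences=1))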
@@ -23,6 +23,8 @@ Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. It is a GPT2 like c
 The architecture is similar to GPT2 except that GPT Neo uses local attention in every other layer with a window size of
 256 tokens.
+This model was contributed by `valhalla <https://huggingface.co/valhalla>`__.
 Generation
 _______________________________________________________________________________________________________________________
...
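The GPT Neo hunk above sits just before that page's "Generation" section. A short generation sketch, assuming the ``EleutherAI/gpt-neo-1.3B`` checkpoint id:

from transformers import GPTNeoForCausalLM, GPT2Tokenizer

# GPT Neo reuses the GPT-2 tokenizer; the alternation of global and local
# (256-token window) attention layers is handled inside the model.
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")

input_ids = tokenizer("GPT Neo is", return_tensors="pt").input_ids
generated = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=40)
print(tokenizer.decode(generated[0], skip_special_tokens=True))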
@@ -56,7 +56,9 @@ Examples of use:
 >>> model = AutoModel.from_pretrained("allegro/herbert-klej-cased-v1")
-The original code can be found `here <https://github.com/allegro/HerBERT>`__.
+This model was contributed by `rmroczkowski <https://huggingface.co/rmroczkowski>`__. The original code can be found
+`here <https://github.com/allegro/HerBERT>`__.
 HerbertTokenizer
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
@@ -36,8 +36,9 @@ the full-precision baseline. Furthermore, our preliminary implementation of I-BE
 INT8 inference on a T4 GPU system as compared to FP32 inference. The framework has been developed in PyTorch and has
 been open-sourced.*
-The original code can be found `here <https://github.com/kssteven418/I-BERT>`__.
+This model was contributed by `kssteven <https://huggingface.co/kssteven>`__. The original code can be found `here
+<https://github.com/kssteven418/I-BERT>`__.
 IBertConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
@@ -80,7 +80,8 @@ occurs. Those can be obtained using the Python Image Library (PIL) library for e
 <https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb>`__.
 It includes an inference part, which shows how to use Google's Tesseract on a new document.
-The original code can be found `here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.
+This model was contributed by `liminghao1630 <https://huggingface.co/liminghao1630>`__. The original code can be found
+`here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.
 LayoutLMConfig
...
@@ -53,6 +53,8 @@ Tips:
 - A notebook showing how to fine-tune LED, can be accessed `here
 <https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing>`__.
+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__.
 LEDConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
@@ -40,7 +40,8 @@ Tips:
 token belongs to which segment. Just separate your segments with the separation token :obj:`tokenizer.sep_token` (or
 :obj:`</s>`).
-The Authors' code can be found `here <https://github.com/allenai/longformer>`__.
+This model was contributed by `beltagy <https://huggingface.co/beltagy>`__. The Authors' code can be found `here
+<https://github.com/allenai/longformer>`__.
 Longformer Self Attention
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
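The Longformer tip above says to separate segments with :obj:`tokenizer.sep_token` instead of relying on token type ids. A small sketch, assuming the ``allenai/longformer-base-4096`` checkpoint id:

from transformers import LongformerModel, LongformerTokenizerFast

tokenizer = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

# Passing the two segments as a pair lets the tokenizer insert the </s> separators
# itself, so no token_type_ids are needed.
inputs = tokenizer("First segment.", "Second segment.", return_tensors="pt")
print(model(**inputs).last_hidden_state.shape)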
@@ -52,7 +52,8 @@ Tips:
 contains self-attention for each respective modality and cross-attention, only the cross attention is returned and
 both self attention outputs are disregarded.
-The original code can be found `here <https://github.com/airsplay/lxmert>`__.
+This model was contributed by `eltoto1219 <https://huggingface.co/eltoto1219>`__. The original code can be found `here
+<https://github.com/airsplay/lxmert>`__.
 LxmertConfig
...
@@ -34,6 +34,8 @@ to create high quality models. Our focus on non-English-Centric models brings ga
 translating between non-English directions while performing competitively to the best single systems of WMT. We
 open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model.*
+This model was contributed by `valhalla <https://huggingface.co/valhalla>`__.
 Training and Generation
 _______________________________________________________________________________________________________________________
...
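The M2M-100 hunk above precedes that page's "Training and Generation" section. A hedged translation sketch, assuming the ``facebook/m2m100_418M`` checkpoint id:

from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

# The source language is set on the tokenizer; the target language is forced
# as the first generated token.
tokenizer.src_lang = "en"
encoded = tokenizer("Life is like a box of chocolates.", return_tensors="pt")
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("fr"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))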
@@ -37,6 +37,7 @@ Implementation Notes
 - the model starts generating with :obj:`pad_token_id` (which has 0 as a token_embedding) as the prefix (Bart uses
 :obj:`<s/>`),
 - Code to bulk convert models can be found in ``convert_marian_to_pytorch.py``.
+- This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__.
 Naming
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
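The Marian implementation notes above mention that generation starts from :obj:`pad_token_id` as the decoder prefix. A small sketch showing that ``generate()`` takes care of this internally, assuming the ``Helsinki-NLP/opus-mt-en-de`` checkpoint id:

from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-de"  # assumed checkpoint id
tokenizer = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name)

batch = tokenizer(["Machine translation is fun."], return_tensors="pt")
# generate() starts the decoder from pad_token_id under the hood.
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))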
@@ -29,7 +29,8 @@ corpora in many languages using the BART objective. mBART is one of the first me
 sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only
 on the encoder, decoder, or reconstructing parts of the text.
-The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__
+This model was contributed by `valhalla <https://huggingface.co/valhalla>`__. The Authors' code can be found `here
+<https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__
 Training of MBart
 _______________________________________________________________________________________________________________________
...
@@ -77,9 +77,10 @@ The following commands allow you to do the conversion. We assume that the folder
 python3 $PATH_TO_TRANSFORMERS/models/megatron_bert/convert_megatron_bert_checkpoint.py megatron_bert_345m_v0_1_cased.zip
-The original code can be found `here <https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU
-and multi-node implementation of the Megatron Language models. In particular, it contains a hybrid model parallel
-approach using "tensor parallel" and "pipeline parallel" techniques.
+This model was contributed by `jdemouth <https://huggingface.co/jdemouth>`__. The original code can be found `here
+<https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU and multi-node implementation of the
+Megatron Language models. In particular, it contains a hybrid model parallel approach using "tensor parallel" and
+"pipeline parallel" techniques.
 MegatronBertConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
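The Megatron-BERT hunk above shows the checkpoint-conversion command. As a hedged follow-up, once the script has written a ``config.json`` and ``pytorch_model.bin``, the result can be loaded with the usual ``from_pretrained`` API; the local directory name and tokenizer choice below are assumptions, not something stated in the diff:

from transformers import BertTokenizer, MegatronBertForMaskedLM

# Hypothetical path: wherever convert_megatron_bert_checkpoint.py wrote its output.
checkpoint_dir = "./megatron_bert_345m_v0_1_cased"
model = MegatronBertForMaskedLM.from_pretrained(checkpoint_dir)

# The 345M cased checkpoint is assumed to use a standard cased BERT vocabulary.
tokenizer = BertTokenizer.from_pretrained("bert-large-cased")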
@@ -64,7 +64,8 @@ The following command allows you to do the conversion. We assume that the folder
 python3 $PATH_TO_TRANSFORMERS/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py megatron_gpt2_345m_v0_0.zip
-The original code can be found `here <https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU
-and multi-node implementation of the Megatron Language models. In particular, it contains a hybrid model parallel
-approach using "tensor parallel" and "pipeline parallel" techniques.
+This model was contributed by `jdemouth <https://huggingface.co/jdemouth>`__. The original code can be found `here
+<https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU and multi-node implementation of the
+Megatron Language models. In particular, it contains a hybrid model parallel approach using "tensor parallel" and
+"pipeline parallel" techniques.
@@ -44,7 +44,8 @@ Tips:
 efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation. Models trained
 with a causal language modeling (CLM) objective are better in that regard.
-The original code can be found `here <https://github.com/google-research/mobilebert>`__.
+This model was contributed by `vshampor <https://huggingface.co/vshampor>`__. The original code can be found `here
+<https://github.com/google-research/mobilebert>`__.
 MobileBertConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
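The MobileBERT tip above notes the model was trained with masked language modeling, so it is suited to masked-token prediction rather than generation. A quick fill-mask sketch, assuming the ``google/mobilebert-uncased`` checkpoint id:

from transformers import pipeline

# Fill-mask plays to an MLM-trained model's strengths, unlike open-ended generation.
unmasker = pipeline("fill-mask", model="google/mobilebert-uncased")
print(unmasker("MobileBERT is designed to run on [MASK] devices."))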
@@ -28,7 +28,8 @@ multilingual variant of T5 that was pre-trained on a new Common Crawl-based data
 the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual
 benchmarks. All of the code and model checkpoints*
-The original code can be found `here <https://github.com/google-research/multilingual-t5>`__.
+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The original code can be
+found `here <https://github.com/google-research/multilingual-t5>`__.
 MT5Config
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...