Unverified Commit 74712e22 authored by Sylvain Gugger, committed by GitHub

Honor contributors to models (#11329)

* Honor contributors to models

* Fix typo

* Address review comments

* Add more authors
parent aad95c7c
@@ -54,7 +54,8 @@ Tips:
 :class:`~transformers.ElectraForPreTraining` model (the classification head will be randomly initialized as it
 doesn't exist in the generator).
-The original code can be found `here <https://github.com/google-research/electra>`__.
+This model was contributed by `lysandre <https://huggingface.co/lysandre>`__. The original code can be found `here
+<https://github.com/google-research/electra>`__.
 ElectraConfig
...
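The ELECTRA hunk above mentions loading a generator checkpoint into :class:`~transformers.ElectraForPreTraining`, with the classification head randomly initialized. A minimal sketch of that tip, assuming the ``google/electra-small-generator`` checkpoint id (not named in the diff):

from transformers import ElectraForPreTraining, ElectraTokenizerFast

# Loading a generator checkpoint into the pre-training (discriminator-style) head:
# the head's weights do not exist in the generator, so they are randomly initialized
# and a "newly initialized weights" warning is expected.
model = ElectraForPreTraining.from_pretrained("google/electra-small-generator")
tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-small-generator")

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
logits = model(**inputs).logits  # one original-vs-replaced score per token
print(logits.shape)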
@@ -35,7 +35,8 @@ time they outperform other pretraining approaches. Different versions of FlauBER
 protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared to the research
 community for further reproducible experiments in French NLP.*
-The original code can be found `here <https://github.com/getalp/Flaubert>`__.
+This model was contributed by `formiel <https://huggingface.co/formiel>`__. The original code can be found `here
+<https://github.com/getalp/Flaubert>`__.
 FlaubertConfig
...
@@ -34,7 +34,8 @@ data, then decode using noisy channel model reranking. Our submissions are ranke
 human evaluation campaign. On En->De, our system significantly outperforms other systems as well as human translations.
 This system improves upon our WMT'18 submission by 4.5 BLEU points.*
-The original code can be found here <https://github.com/pytorch/fairseq/tree/master/examples/wmt19>__.
+This model was contributed by `stas <https://huggingface.co/stas>`__. The original code can be found here
+<https://github.com/pytorch/fairseq/tree/master/examples/wmt19>__.
 Implementation Notes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
@@ -49,7 +49,8 @@ Tips:
 :class:`~transformers.FunnelBaseModel`, :class:`~transformers.FunnelForSequenceClassification` and
 :class:`~transformers.FunnelForMultipleChoice`.
-The original code can be found `here <https://github.com/laiguokun/Funnel-Transformer>`__.
+This model was contributed by `sgugger <https://huggingface.co/sgugger>`__. The original code can be found `here
+<https://github.com/laiguokun/Funnel-Transformer>`__.
 FunnelConfig
...
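The Funnel Transformer hunk above lists the classes built on the decoder-less base model (:class:`~transformers.FunnelBaseModel`, sequence classification, multiple choice). A hedged sketch, assuming the ``funnel-transformer/small-base`` checkpoint id and that the ``-base`` variants are the ones without the upsampling decoder:

from transformers import FunnelForSequenceClassification, FunnelTokenizerFast

name = "funnel-transformer/small-base"  # assumed checkpoint id
tokenizer = FunnelTokenizerFast.from_pretrained(name)
# The classification head is task-specific and starts out randomly initialized.
model = FunnelForSequenceClassification.from_pretrained(name, num_labels=2)

inputs = tokenizer("Funnel pools the sequence as it gets deeper.", return_tensors="pt")
print(model(**inputs).logits.shape)  # (1, 2)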
@@ -45,7 +45,8 @@ Tips:
 `Write With Transformer <https://transformer.huggingface.co/doc/gpt>`__ is a webapp created and hosted by Hugging Face
 showcasing the generative capabilities of several models. GPT is one of them.
-The original code can be found `here <https://github.com/openai/finetune-transformer-lm>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://github.com/openai/finetune-transformer-lm>`__.
 Note:
...
@@ -45,7 +45,8 @@ Tips:
 Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
 different sizes: small, medium, large, xl and a distilled version of the small checkpoint: `distilgpt-2`.
-The original code can be found `here <https://openai.com/blog/better-language-models/>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://openai.com/blog/better-language-models/>`__.
 GPT2Config
...
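The GPT-2 hunk above mentions the five available sizes, including the distilled small checkpoint. A minimal generation sketch via the pipeline API, assuming the distilled checkpoint id is ``distilgpt2`` on the Hub:

from transformers import pipeline

# Any of the other sizes (gpt2, gpt2-medium, gpt2-large, gpt2-xl) can be swapped in.
generator = pipeline("text-generation", model="distilgpt2")
print(generator("GPT-2 comes in several sizes,", max_length=30, num_return_sequences=1))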
@@ -23,6 +23,8 @@ Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. It is a GPT2 like c
 The architecture is similar to GPT2 except that GPT Neo uses local attention in every other layer with a window size of
 256 tokens.
+This model was contributed by `valhalla <https://huggingface.co/valhalla>`__.
 Generation
 _______________________________________________________________________________________________________________________
...
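The GPT Neo hunk above sits just before that page's "Generation" section. A short generation sketch, assuming the ``EleutherAI/gpt-neo-1.3B`` checkpoint id:

from transformers import GPTNeoForCausalLM, GPT2Tokenizer

# GPT Neo reuses the GPT-2 tokenizer; the alternation of global and local
# (256-token window) attention layers is handled inside the model.
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")

input_ids = tokenizer("GPT Neo is", return_tensors="pt").input_ids
generated = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=40)
print(tokenizer.decode(generated[0], skip_special_tokens=True))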
@@ -56,7 +56,9 @@ Examples of use:
 >>> model = AutoModel.from_pretrained("allegro/herbert-klej-cased-v1")
-The original code can be found `here <https://github.com/allegro/HerBERT>`__.
+This model was contributed by `rmroczkowski <https://huggingface.co/rmroczkowski>`__. The original code can be found
+`here <https://github.com/allegro/HerBERT>`__.
 HerbertTokenizer
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
@@ -36,8 +36,9 @@ the full-precision baseline. Furthermore, our preliminary implementation of I-BE
 INT8 inference on a T4 GPU system as compared to FP32 inference. The framework has been developed in PyTorch and has
 been open-sourced.*
-The original code can be found `here <https://github.com/kssteven418/I-BERT>`__.
+This model was contributed by `kssteven <https://huggingface.co/kssteven>`__. The original code can be found `here
+<https://github.com/kssteven418/I-BERT>`__.
 IBertConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
@@ -80,7 +80,8 @@ occurs. Those can be obtained using the Python Image Library (PIL) library for e
 <https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb>`__.
 It includes an inference part, which shows how to use Google's Tesseract on a new document.
-The original code can be found `here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.
+This model was contributed by `liminghao1630 <https://huggingface.co/liminghao1630>`__. The original code can be found
+`here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.
 LayoutLMConfig
...
@@ -53,6 +53,8 @@ Tips:
 - A notebook showing how to fine-tune LED, can be accessed `here
 <https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing>`__.
+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__.
 LEDConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
@@ -40,7 +40,8 @@ Tips:
 token belongs to which segment. Just separate your segments with the separation token :obj:`tokenizer.sep_token` (or
 :obj:`</s>`).
-The Authors' code can be found `here <https://github.com/allenai/longformer>`__.
+This model was contributed by `beltagy <https://huggingface.co/beltagy>`__. The Authors' code can be found `here
+<https://github.com/allenai/longformer>`__.
 Longformer Self Attention
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
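The Longformer tip above says to separate segments with :obj:`tokenizer.sep_token` instead of relying on token type ids. A small sketch, assuming the ``allenai/longformer-base-4096`` checkpoint id:

from transformers import LongformerModel, LongformerTokenizerFast

tokenizer = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

# Passing the two segments as a pair lets the tokenizer insert the </s> separators
# itself, so no token_type_ids are needed.
inputs = tokenizer("First segment.", "Second segment.", return_tensors="pt")
print(model(**inputs).last_hidden_state.shape)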
@@ -52,7 +52,8 @@ Tips:
 contains self-attention for each respective modality and cross-attention, only the cross attention is returned and
 both self attention outputs are disregarded.
-The original code can be found `here <https://github.com/airsplay/lxmert>`__.
+This model was contributed by `eltoto1219 <https://huggingface.co/eltoto1219>`__. The original code can be found `here
+<https://github.com/airsplay/lxmert>`__.
 LxmertConfig
...
@@ -34,6 +34,8 @@ to create high quality models. Our focus on non-English-Centric models brings ga
 translating between non-English directions while performing competitively to the best single systems of WMT. We
 open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model.*
+This model was contributed by `valhalla <https://huggingface.co/valhalla>`__.
 Training and Generation
 _______________________________________________________________________________________________________________________
...
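The M2M-100 hunk above precedes that page's "Training and Generation" section. A hedged translation sketch, assuming the ``facebook/m2m100_418M`` checkpoint id:

from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

# The source language is set on the tokenizer; the target language is forced
# as the first generated token.
tokenizer.src_lang = "en"
encoded = tokenizer("Life is like a box of chocolates.", return_tensors="pt")
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("fr"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))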
@@ -37,6 +37,7 @@ Implementation Notes
 - the model starts generating with :obj:`pad_token_id` (which has 0 as a token_embedding) as the prefix (Bart uses
 :obj:`<s/>`),
 - Code to bulk convert models can be found in ``convert_marian_to_pytorch.py``.
+- This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__.
 Naming
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
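The Marian implementation notes above mention that generation starts from :obj:`pad_token_id` as the decoder prefix. A small sketch showing that ``generate()`` takes care of this internally, assuming the ``Helsinki-NLP/opus-mt-en-de`` checkpoint id:

from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-de"  # assumed checkpoint id
tokenizer = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name)

batch = tokenizer(["Machine translation is fun."], return_tensors="pt")
# generate() starts the decoder from pad_token_id under the hood.
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))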
@@ -29,7 +29,8 @@ corpora in many languages using the BART objective. mBART is one of the first me
 sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only
 on the encoder, decoder, or reconstructing parts of the text.
-The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__
+This model was contributed by `valhalla <https://huggingface.co/valhalla>`__. The Authors' code can be found `here
+<https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__
 Training of MBart
 _______________________________________________________________________________________________________________________
...
@@ -77,9 +77,10 @@ The following commands allow you to do the conversion. We assume that the folder
 python3 $PATH_TO_TRANSFORMERS/models/megatron_bert/convert_megatron_bert_checkpoint.py megatron_bert_345m_v0_1_cased.zip
-The original code can be found `here <https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU
-and multi-node implementation of the Megatron Language models. In particular, it contains a hybrid model parallel
-approach using "tensor parallel" and "pipeline parallel" techniques.
+This model was contributed by `jdemouth <https://huggingface.co/jdemouth>`__. The original code can be found `here
+<https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU and multi-node implementation of the
+Megatron Language models. In particular, it contains a hybrid model parallel approach using "tensor parallel" and
+"pipeline parallel" techniques.
 MegatronBertConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
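The Megatron-BERT hunk above shows the checkpoint-conversion command. As a hedged follow-up, once the script has written a ``config.json`` and ``pytorch_model.bin``, the result can be loaded with the usual ``from_pretrained`` API; the local directory name and tokenizer choice below are assumptions, not something stated in the diff:

from transformers import BertTokenizer, MegatronBertForMaskedLM

# Hypothetical path: wherever convert_megatron_bert_checkpoint.py wrote its output.
checkpoint_dir = "./megatron_bert_345m_v0_1_cased"
model = MegatronBertForMaskedLM.from_pretrained(checkpoint_dir)

# The 345M cased checkpoint is assumed to use a standard cased BERT vocabulary.
tokenizer = BertTokenizer.from_pretrained("bert-large-cased")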
@@ -64,7 +64,8 @@ The following command allows you to do the conversion. We assume that the folder
 python3 $PATH_TO_TRANSFORMERS/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py megatron_gpt2_345m_v0_0.zip
-The original code can be found `here <https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU
-and multi-node implementation of the Megatron Language models. In particular, it contains a hybrid model parallel
-approach using "tensor parallel" and "pipeline parallel" techniques.
+This model was contributed by `jdemouth <https://huggingface.co/jdemouth>`__. The original code can be found `here
+<https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU and multi-node implementation of the
+Megatron Language models. In particular, it contains a hybrid model parallel approach using "tensor parallel" and
+"pipeline parallel" techniques.
@@ -44,7 +44,8 @@ Tips:
 efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation. Models trained
 with a causal language modeling (CLM) objective are better in that regard.
-The original code can be found `here <https://github.com/google-research/mobilebert>`__.
+This model was contributed by `vshampor <https://huggingface.co/vshampor>`__. The original code can be found `here
+<https://github.com/google-research/mobilebert>`__.
 MobileBertConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
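The MobileBERT tip above notes the model was trained with masked language modeling, so it is suited to masked-token prediction rather than generation. A quick fill-mask sketch, assuming the ``google/mobilebert-uncased`` checkpoint id:

from transformers import pipeline

# Fill-mask plays to an MLM-trained model's strengths, unlike open-ended generation.
unmasker = pipeline("fill-mask", model="google/mobilebert-uncased")
print(unmasker("MobileBERT is designed to run on [MASK] devices."))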
@@ -28,7 +28,8 @@ multilingual variant of T5 that was pre-trained on a new Common Crawl-based data
 the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual
 benchmarks. All of the code and model checkpoints*
-The original code can be found `here <https://github.com/google-research/multilingual-t5>`__.
+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The original code can be
+found `here <https://github.com/google-research/multilingual-t5>`__.
 MT5Config
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...