Unverified Commit 74712e22 authored by Sylvain Gugger, committed by GitHub

Honor contributors to models (#11329)

* Honor contributors to models

* Fix typo

* Address review comments

* Add more authors
parent aad95c7c
@@ -43,7 +43,8 @@ Tips:
similar to a BERT-like architecture with the same number of hidden layers as it has to iterate through the same
number of (repeating) layers.

-The original code can be found `here <https://github.com/google-research/ALBERT>`__.
+This model was contributed by `lysandre <https://huggingface.co/lysandre>`__. The original code can be found `here
+<https://github.com/google-research/ALBERT>`__.

AlbertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......
@@ -35,7 +35,8 @@ According to the abstract,
state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
of up to 6 ROUGE.

-The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/bart>`__.
+This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__. The Authors' code can be found `here
+<https://github.com/pytorch/fairseq/tree/master/examples/bart>`__.

Examples
......
@@ -16,7 +16,7 @@ BARThez
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The BARThez model was proposed in `BARThez: a Skilled Pretrained French Sequence-to-Sequence Model`
+The BARThez model was proposed in `BARThez: a Skilled Pretrained French Sequence-to-Sequence Model
<https://arxiv.org/abs/2010.12321>`__ by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis on 23 Oct,
2020.
@@ -35,7 +35,8 @@ summarization dataset, OrangeSum, that we release with this paper. We also conti
pretrained multilingual BART on BARThez's corpus, and we show that the resulting model, which we call mBARTHez,
provides a significant boost over vanilla BARThez, and is on par with or outperforms CamemBERT and FlauBERT.*

-The Authors' code can be found `here <https://github.com/moussaKam/BARThez>`__.
+This model was contributed by `moussakam <https://huggingface.co/moussakam>`__. The Authors' code can be found `here
+<https://github.com/moussaKam/BARThez>`__.

Examples
......
@@ -42,7 +42,8 @@ Tips:
- BERT was trained with the masked language modeling (MLM) and next sentence prediction (NSP) objectives. It is
  efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation.

-The original code can be found `here <https://github.com/google-research/bert>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://github.com/google-research/bert>`__.

BertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......
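A minimal sketch of the masked-token prediction the tip above refers to; the checkpoint name is only a common example, not something this change prescribes:

.. code-block:: python

    from transformers import pipeline

    # Predict the [MASK] token with a pretrained BERT checkpoint (illustrative choice).
    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    print(unmasker("Paris is the [MASK] of France."))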
@@ -71,6 +71,8 @@ Tips:
- This implementation is the same as BERT, except for tokenization method. Refer to the :doc:`documentation of BERT
  <bert>` for more usage examples.

+This model was contributed by `cl-tohoku <https://huggingface.co/cl-tohoku>`__.
+
BertJapaneseTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......
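To illustrate the tip above (same architecture as BERT, only the tokenization differs), a short sketch assuming the `cl-tohoku/bert-base-japanese` checkpoint and its MeCab dependency (`fugashi`):

.. code-block:: python

    from transformers import AutoModel, AutoTokenizer

    # The tokenizer applies MeCab word segmentation followed by WordPiece;
    # the model itself is used exactly like a regular BERT model.
    tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")
    model = AutoModel.from_pretrained("cl-tohoku/bert-base-japanese")

    inputs = tokenizer("吾輩は猫である。", return_tensors="pt")
    outputs = model(**inputs)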
@@ -79,7 +79,8 @@ Tips:
- For summarization, sentence splitting, sentence fusion and translation, no special tokens are required for the input.
  Therefore, no EOS token should be added to the end of the input.

-The original code can be found `here <https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder>`__.
+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The original code can be
+found `here <https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder>`__.

BertGenerationConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......
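A hedged sketch of the no-EOS tip above; the encoder-decoder checkpoint `google/roberta2roberta_L-24_bbc` is assumed here purely for illustration:

.. code-block:: python

    from transformers import AutoTokenizer, EncoderDecoderModel

    tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_bbc")
    model = EncoderDecoderModel.from_pretrained("google/roberta2roberta_L-24_bbc")

    # Encode without special tokens so no EOS is appended to the input.
    input_ids = tokenizer(
        "This is the first sentence. This is the second sentence.",
        add_special_tokens=False,
        return_tensors="pt",
    ).input_ids
    outputs = model.generate(input_ids)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))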
@@ -54,8 +54,8 @@ Example of use:
    >>> # from transformers import TFAutoModel
    >>> # bertweet = TFAutoModel.from_pretrained("vinai/bertweet-base")

-The original code can be found `here <https://github.com/VinAIResearch/BERTweet>`__.
+This model was contributed by `dqnguyen <https://huggingface.co/dqnguyen>`__. The original code can be found `here
+<https://github.com/VinAIResearch/BERTweet>`__.

BertweetTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......
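For completeness, a PyTorch counterpart of the commented-out TensorFlow lines in the excerpt above (a sketch, not part of the diff):

.. code-block:: python

    import torch
    from transformers import AutoModel, AutoTokenizer

    bertweet = AutoModel.from_pretrained("vinai/bertweet-base")
    tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")

    # Encode a tweet-like input and extract its features.
    line = "SC has first two presumptive cases of coronavirus, DHEC confirms HTTPURL via @USER"
    input_ids = torch.tensor([tokenizer.encode(line)])
    with torch.no_grad():
        features = bertweet(input_ids)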
@@ -50,7 +50,8 @@ Tips:
- Current implementation supports only **ITC**.
- Current implementation doesn't support **num_random_blocks = 0**

-The original code can be found `here <https://github.com/google-research/bigbird>`__.
+This model was contributed by `vasudevgupta <https://huggingface.co/vasudevgupta>`__. The original code can be found
+`here <https://github.com/google-research/bigbird>`__.

BigBirdConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......
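The tips above constrain the sparse-attention settings; a small configuration sketch consistent with them (hyper-parameter values are illustrative):

.. code-block:: python

    from transformers import BigBirdConfig, BigBirdModel

    # Block-sparse (ITC) attention; num_random_blocks must stay > 0.
    config = BigBirdConfig(attention_type="block_sparse", block_size=64, num_random_blocks=3)
    model = BigBirdModel(config)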
@@ -36,7 +36,8 @@ and code publicly available. Human evaluations show our best models are superior
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
failure cases of our models.*

-The authors' code can be found `here <https://github.com/facebookresearch/ParlAI>`__ .
+This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__. The authors' code can be found `here
+<https://github.com/facebookresearch/ParlAI>`__ .

Implementation Notes
......
@@ -39,7 +39,8 @@ and code publicly available. Human evaluations show our best models are superior
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
failure cases of our models.*

-The authors' code can be found `here <https://github.com/facebookresearch/ParlAI>`__ .
+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The authors' code can be
+found `here <https://github.com/facebookresearch/ParlAI>`__ .

BlenderbotSmallConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......
@@ -43,4 +43,5 @@ Tips:
that is sadly not open-sourced yet. It would be very useful for the community, if someone tries to implement the
algorithm to make BORT fine-tuning work.

-The original code can be found `here <https://github.com/alexa/bort/>`__.
+This model was contributed by `stefan-it <https://huggingface.co/stefan-it>`__. The original code can be found `here
+<https://github.com/alexa/bort/>`__.
@@ -37,7 +37,8 @@ Tips:
- This implementation is the same as RoBERTa. Refer to the :doc:`documentation of RoBERTa <roberta>` for usage examples
  as well as the information relative to the inputs and outputs.

-The original code can be found `here <https://camembert-model.fr/>`__.
+This model was contributed by `camembert <https://huggingface.co/camembert>`__. The original code can be found `here
+<https://camembert-model.fr/>`__.

CamembertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......
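Since the implementation mirrors RoBERTa (see the tip above), usage only differs in the checkpoint; a short sketch with the reference `camembert-base` checkpoint:

.. code-block:: python

    from transformers import CamembertForMaskedLM, CamembertTokenizer

    tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
    model = CamembertForMaskedLM.from_pretrained("camembert-base")

    # Same masked-LM interface as RoBERTa, with the RoBERTa-style <mask> token.
    inputs = tokenizer("Le camembert est <mask> !", return_tensors="pt")
    outputs = model(**inputs)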
@@ -34,8 +34,10 @@ ConvBERT significantly outperforms BERT and its variants in various downstream t
fewer model parameters. Remarkably, ConvBERTbase model achieves 86.4 GLUE score, 0.7 higher than ELECTRAbase, while
using less than 1/4 training cost. Code and pre-trained models will be released.*

-ConvBERT training tips are similar to those of BERT. The original implementation can be found here:
-https://github.com/yitu-opensource/ConvBert
+ConvBERT training tips are similar to those of BERT.
+
+This model was contributed by `abhishek <https://huggingface.co/abhishek>`__. The original implementation can be found
+here: https://github.com/yitu-opensource/ConvBert

ConvBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......
@@ -33,7 +33,8 @@ language model, which could facilitate several downstream Chinese NLP tasks, suc
cloze test, and language understanding. Extensive experiments demonstrate that CPM achieves strong performance on many
NLP tasks in the settings of few-shot (even zero-shot) learning.*

-The original implementation can be found here: https://github.com/TsinghuaAI/CPM-Generate
+This model was contributed by `canwenxu <https://huggingface.co/canwenxu>`__. The original implementation can be found
+here: https://github.com/TsinghuaAI/CPM-Generate

Note: We only have a tokenizer here, since the model architecture is the same as GPT-2.
......
@@ -46,7 +46,8 @@ Tips:
`reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage of
this argument.

-The original code can be found `here <https://github.com/salesforce/ctrl>`__.
+This model was contributed by `keskarnitishr <https://huggingface.co/keskarnitishr>`__. The original code can be found
+`here <https://github.com/salesforce/ctrl>`__.

CTRLConfig
......
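A sketch of generation with cached past states, as the tip above discusses; the checkpoint identifier and the control code prefix are illustrative assumptions:

.. code-block:: python

    from transformers import CTRLLMHeadModel, CTRLTokenizer

    tokenizer = CTRLTokenizer.from_pretrained("ctrl")
    model = CTRLLMHeadModel.from_pretrained("ctrl")

    # CTRL prompts typically start with a control code such as "Books" or "Links";
    # generate() reuses the cached key/value states internally when use_cache=True.
    inputs = tokenizer("Books Once upon a time,", return_tensors="pt")
    outputs = model.generate(**inputs, max_length=50, use_cache=True)
    print(tokenizer.decode(outputs[0]))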
@@ -38,7 +38,8 @@ the training data performs consistently better on a wide range of NLP tasks, ach
pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*

-The original code can be found `here <https://github.com/microsoft/DeBERTa>`__.
+This model was contributed by `DeBERTa <https://huggingface.co/DeBERTa>`__. The original code can be found `here
+<https://github.com/microsoft/DeBERTa>`__.

DebertaConfig
......
@@ -58,7 +58,8 @@ New in v2:
- **900M model & 1.5B model** Two additional model sizes are available: 900M and 1.5B, which significantly improves the
  performance of downstream tasks.

-The original code can be found `here <https://github.com/microsoft/DeBERTa>`__.
+This model was contributed by `DeBERTa <https://huggingface.co/DeBERTa>`__. The original code can be found `here
+<https://github.com/microsoft/DeBERTa>`__.

DebertaV2Config
......
@@ -73,6 +73,8 @@ Tips:
`facebook/deit-base-patch16-384`. Note that one should use :class:`~transformers.DeiTFeatureExtractor` in order to
prepare images for the model.

+This model was contributed by `nielsr <https://huggingface.co/nielsr>`__.
+
DeiTConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......
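A sketch of the image-preparation step the tip above refers to; the distilled checkpoint and the test image URL are illustrative choices:

.. code-block:: python

    import requests
    from PIL import Image
    from transformers import DeiTFeatureExtractor, DeiTForImageClassificationWithTeacher

    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)

    # The feature extractor resizes and normalizes the image into pixel_values.
    feature_extractor = DeiTFeatureExtractor.from_pretrained("facebook/deit-base-distilled-patch16-224")
    model = DeiTForImageClassificationWithTeacher.from_pretrained("facebook/deit-base-distilled-patch16-224")

    inputs = feature_extractor(images=image, return_tensors="pt")
    outputs = model(**inputs)
    print(model.config.id2label[outputs.logits.argmax(-1).item()])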
@@ -44,7 +44,7 @@ Tips:
- DistilBERT doesn't have options to select the input positions (:obj:`position_ids` input). This could be added if
  necessary though, just let us know if you need this option.

-The original code can be found `here
+This model was contributed by `victorsanh <https://huggingface.co/victorsanh>`__. The original code can be found `here
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
......
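As the tip above notes, there is no :obj:`position_ids` argument; a minimal forward pass therefore only needs :obj:`input_ids` and :obj:`attention_mask`:

.. code-block:: python

    from transformers import DistilBertModel, DistilBertTokenizer

    tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
    model = DistilBertModel.from_pretrained("distilbert-base-uncased")

    inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    # No position_ids here: positions always default to the 0..n-1 range.
    outputs = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
    last_hidden_state = outputs.last_hidden_state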
@@ -30,7 +30,8 @@ our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% ab
retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA
benchmarks.*

-The original code can be found `here <https://github.com/facebookresearch/DPR>`__.
+This model was contributed by `lhoestq <https://huggingface.co/lhoestq>`__. The original code can be found `here
+<https://github.com/facebookresearch/DPR>`__.

DPRConfig
......