Transformers
=======================================================================================================================

State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow

🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with more than 32 pretrained models in over 100 languages and deep interoperability between
Jax, PyTorch and TensorFlow.

This is the documentation of our repository `transformers <https://github.com/huggingface/transformers>`__. You can
also follow our `online course <https://huggingface.co/course>`__ that teaches how to use this library, as well as the
other libraries developed by Hugging Face and the Hub.

If you are looking for custom support from the Hugging Face team
-----------------------------------------------------------------------------------------------------------------------

.. raw:: html

    <a target="_blank" href="https://huggingface.co/support">
        <img alt="HuggingFace Expert Acceleration Program" src="https://huggingface.co/front/thumbnails/support.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
    </a><br>

Features
-----------------------------------------------------------------------------------------------------------------------

- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners

State-of-the-art NLP for everyone:

- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators

..
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Lower compute costs, smaller carbon footprint:

- Researchers can share trained models instead of always retraining
- Practitioners can reduce compute time and production costs
- 8 architectures with over 30 pretrained models, some in more than 100 languages

Choose the right framework for every part of a model's lifetime:

- Train state-of-the-art models in 3 lines of code
- Deep interoperability between Jax, PyTorch and TensorFlow models
- Move a single model between Jax/PyTorch/TensorFlow frameworks at will
- Seamlessly pick the right framework for training, evaluation, production

Support for Jax is still experimental (only a few models are available right now); expect it to grow in the coming
months!

`All the model checkpoints <https://huggingface.co/models>`__ are seamlessly integrated from the huggingface.co `model
hub <https://huggingface.co>`__ where they are uploaded directly by `users <https://huggingface.co/users>`__ and
`organizations <https://huggingface.co/organizations>`__.

Current number of checkpoints: |checkpoints|

.. |checkpoints| image:: https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen

Contents
-----------------------------------------------------------------------------------------------------------------------

The documentation is organized in five parts:

- **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy
  and a glossary.
- **USING 🤗 TRANSFORMERS** contains general tutorials on how to use the library.
- **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
- **RESEARCH** focuses on tutorials that are less about how to use the library and more about general research in
  transformer models.
- The last three sections contain the documentation of each public class and function, grouped in:

    - **MAIN CLASSES** for the main classes exposing the important APIs of the library.
    - **MODELS** for the classes and functions related to each model implemented in the library.
    - **INTERNAL HELPERS** for the classes and functions we use internally.

The library currently contains Jax, PyTorch and TensorFlow implementations, pretrained model weights, usage scripts and
conversion utilities for the following models.

Supported models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

..
    This list is updated automatically from the README with `make fix-copies`. Do not update manually!

1. :doc:`ALBERT <model_doc/albert>` (from Google Research and the Toyota Technological Institute at Chicago) released
   with the paper `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
   <https://arxiv.org/abs/1909.11942>`__, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush
   Sharma, Radu Soricut.
2. :doc:`BART <model_doc/bart>` (from Facebook) released with the paper `BART: Denoising Sequence-to-Sequence
   Pre-training for Natural Language Generation, Translation, and Comprehension
   <https://arxiv.org/pdf/1910.13461.pdf>`__ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman
   Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
3. :doc:`BARThez <model_doc/barthez>` (from École polytechnique) released with the paper `BARThez: a Skilled Pretrained
   French Sequence-to-Sequence Model <https://arxiv.org/abs/2010.12321>`__ by Moussa Kamal Eddine, Antoine J.-P.
   Tixier, Michalis Vazirgiannis.
4. :doc:`BARTpho <model_doc/bartpho>` (from VinAI Research) released with the paper `BARTpho: Pre-trained
   Sequence-to-Sequence Models for Vietnamese <https://arxiv.org/abs/2109.09701>`__ by Nguyen Luong Tran, Duong Minh Le
   and Dat Quoc Nguyen.
5. :doc:`BEiT <model_doc/beit>` (from Microsoft) released with the paper `BEiT: BERT Pre-Training of Image Transformers
   <https://arxiv.org/abs/2106.08254>`__ by Hangbo Bao, Li Dong, Furu Wei.
6. :doc:`BERT <model_doc/bert>` (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional
   Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`__ by Jacob Devlin, Ming-Wei Chang,
   Kenton Lee and Kristina Toutanova.
7. :doc:`BERTweet <model_doc/bertweet>` (from VinAI Research) released with the paper `BERTweet: A pre-trained language
   model for English Tweets <https://aclanthology.org/2020.emnlp-demos.2/>`__ by Dat Quoc Nguyen, Thanh Vu and Anh Tuan
   Nguyen.
8. :doc:`BERT For Sequence Generation <model_doc/bertgeneration>` (from Google) released with the paper `Leveraging
   Pre-trained Checkpoints for Sequence Generation Tasks <https://arxiv.org/abs/1907.12461>`__ by Sascha Rothe, Shashi
   Narayan, Aliaksei Severyn.
9. :doc:`BigBird-RoBERTa <model_doc/bigbird>` (from Google Research) released with the paper `Big Bird: Transformers
   for Longer Sequences <https://arxiv.org/abs/2007.14062>`__ by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua
   Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
10. :doc:`BigBird-Pegasus <model_doc/bigbird_pegasus>` (from Google Research) released with the paper `Big Bird:
    Transformers for Longer Sequences <https://arxiv.org/abs/2007.14062>`__ by Manzil Zaheer, Guru Guruganesh, Avinava
    Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr
    Ahmed.
11. :doc:`Blenderbot <model_doc/blenderbot>` (from Facebook) released with the paper `Recipes for building an
    open-domain chatbot <https://arxiv.org/abs/2004.13637>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary
    Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
12. :doc:`BlenderbotSmall <model_doc/blenderbot_small>` (from Facebook) released with the paper `Recipes for building
    an open-domain chatbot <https://arxiv.org/abs/2004.13637>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju,
    Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
13. :doc:`BORT <model_doc/bort>` (from Alexa) released with the paper `Optimal Subarchitecture Extraction For BERT
    <https://arxiv.org/abs/2010.10499>`__ by Adrian de Wynter and Daniel J. Perry.
14. :doc:`ByT5 <model_doc/byt5>` (from Google Research) released with the paper `ByT5: Towards a token-free future with
    pre-trained byte-to-byte models <https://arxiv.org/abs/2105.13626>`__ by Linting Xue, Aditya Barua, Noah Constant,
    Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
15. :doc:`CamemBERT <model_doc/camembert>` (from Inria/Facebook/Sorbonne) released with the paper `CamemBERT: a Tasty
    French Language Model <https://arxiv.org/abs/1911.03894>`__ by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz
    Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
16. :doc:`CANINE <model_doc/canine>` (from Google Research) released with the paper `CANINE: Pre-training an Efficient
    Tokenization-Free Encoder for Language Representation <https://arxiv.org/abs/2103.06874>`__ by Jonathan H. Clark,
    Dan Garrette, Iulia Turc, John Wieting.
17. :doc:`CLIP <model_doc/clip>` (from OpenAI) released with the paper `Learning Transferable Visual Models From
    Natural Language Supervision <https://arxiv.org/abs/2103.00020>`__ by Alec Radford, Jong Wook Kim, Chris Hallacy,
    Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen
    Krueger, Ilya Sutskever.
18. :doc:`ConvBERT <model_doc/convbert>` (from YituTech) released with the paper `ConvBERT: Improving BERT with
    Span-based Dynamic Convolution <https://arxiv.org/abs/2008.02496>`__ by Zihang Jiang, Weihao Yu, Daquan Zhou,
    Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
19. :doc:`CPM <model_doc/cpm>` (from Tsinghua University) released with the paper `CPM: A Large-scale Generative
    Chinese Pre-trained Language Model <https://arxiv.org/abs/2012.00413>`__ by Zhengyan Zhang, Xu Han, Hao Zhou, Pei
    Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng,
    Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang,
    Juanzi Li, Xiaoyan Zhu, Maosong Sun.
20. :doc:`CTRL <model_doc/ctrl>` (from Salesforce) released with the paper `CTRL: A Conditional Transformer Language
    Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`__ by Nitish Shirish Keskar*, Bryan McCann*,
    Lav R. Varshney, Caiming Xiong and Richard Socher.
21. :doc:`DeBERTa <model_doc/deberta>` (from Microsoft) released with the paper `DeBERTa: Decoding-enhanced BERT with
    Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu
    Chen.
22. :doc:`DeBERTa-v2 <model_doc/deberta_v2>` (from Microsoft) released with the paper `DeBERTa: Decoding-enhanced BERT
    with Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao,
    Weizhu Chen.
23. :doc:`DeiT <model_doc/deit>` (from Facebook) released with the paper `Training data-efficient image transformers &
    distillation through attention <https://arxiv.org/abs/2012.12877>`__ by Hugo Touvron, Matthieu Cord, Matthijs
    Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
24. :doc:`DETR <model_doc/detr>` (from Facebook) released with the paper `End-to-End Object Detection with Transformers
    <https://arxiv.org/abs/2005.12872>`__ by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier,
    Alexander Kirillov, Sergey Zagoruyko.
25. :doc:`DialoGPT <model_doc/dialogpt>` (from Microsoft Research) released with the paper `DialoGPT: Large-Scale
    Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`__ by Yizhe
    Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
26. :doc:`DistilBERT <model_doc/distilbert>` (from HuggingFace), released together with the paper `DistilBERT, a
    distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__ by Victor
    Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into `DistilGPT2
    <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, RoBERTa into `DistilRoBERTa
    <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, Multilingual BERT into
    `DistilmBERT <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__ and a German
    version of DistilBERT.
27. :doc:`DPR <model_doc/dpr>` (from Facebook) released with the paper `Dense Passage Retrieval for Open-Domain
    Question Answering <https://arxiv.org/abs/2004.04906>`__ by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick
    Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
28. :doc:`EncoderDecoder <model_doc/encoderdecoder>` (from Google Research) released with the paper `Leveraging
    Pre-trained Checkpoints for Sequence Generation Tasks <https://arxiv.org/abs/1907.12461>`__ by Sascha Rothe, Shashi
    Narayan, Aliaksei Severyn.
29. :doc:`ELECTRA <model_doc/electra>` (from Google Research/Stanford University) released with the paper `ELECTRA:
    Pre-training text encoders as discriminators rather than generators <https://arxiv.org/abs/2003.10555>`__ by Kevin
    Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
30. :doc:`FlauBERT <model_doc/flaubert>` (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model
    Pre-training for French <https://arxiv.org/abs/1912.05372>`__ by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne,
    Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
31. :doc:`FNet <model_doc/fnet>` (from Google Research) released with the paper `FNet: Mixing Tokens with Fourier
    Transforms <https://arxiv.org/abs/2105.03824>`__ by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago
    Ontanon.
32. :doc:`Funnel Transformer <model_doc/funnel>` (from CMU/Google Brain) released with the paper `Funnel-Transformer:
    Filtering out Sequential Redundancy for Efficient Language Processing <https://arxiv.org/abs/2006.03236>`__ by
    Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
33. :doc:`GPT <model_doc/gpt>` (from OpenAI) released with the paper `Improving Language Understanding by Generative
    Pre-Training <https://blog.openai.com/language-unsupervised/>`__ by Alec Radford, Karthik Narasimhan, Tim Salimans
    and Ilya Sutskever.
34. :doc:`GPT-2 <model_doc/gpt2>` (from OpenAI) released with the paper `Language Models are Unsupervised Multitask
    Learners <https://blog.openai.com/better-language-models/>`__ by Alec Radford*, Jeffrey Wu*, Rewon Child, David
    Luan, Dario Amodei** and Ilya Sutskever**.
35. :doc:`GPT-J <model_doc/gptj>` (from EleutherAI) released in the repository `kingoflolz/mesh-transformer-jax
    <https://github.com/kingoflolz/mesh-transformer-jax/>`__ by Ben Wang and Aran Komatsuzaki.
36. :doc:`GPT Neo <model_doc/gpt_neo>` (from EleutherAI) released in the repository `EleutherAI/gpt-neo
    <https://github.com/EleutherAI/gpt-neo>`__ by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
37. :doc:`Hubert <model_doc/hubert>` (from Facebook) released with the paper `HuBERT: Self-Supervised Speech
    Representation Learning by Masked Prediction of Hidden Units <https://arxiv.org/abs/2106.07447>`__ by Wei-Ning Hsu,
    Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
38. :doc:`I-BERT <model_doc/ibert>` (from Berkeley) released with the paper `I-BERT: Integer-only BERT Quantization
    <https://arxiv.org/abs/2101.01321>`__ by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
39. `ImageGPT <https://huggingface.co/transformers/master/model_doc/imagegpt.html>`__ (from OpenAI) released with the
    paper `Generative Pretraining from Pixels <https://openai.com/blog/image-gpt/>`__ by Mark Chen, Alec Radford, Rewon
    Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
40. :doc:`LayoutLM <model_doc/layoutlm>` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training
    of Text and Layout for Document Image Understanding <https://arxiv.org/abs/1912.13318>`__ by Yiheng Xu, Minghao Li,
    Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
41. :doc:`LayoutLMv2 <model_doc/layoutlmv2>` (from Microsoft Research Asia) released with the paper `LayoutLMv2:
    Multi-modal Pre-training for Visually-Rich Document Understanding <https://arxiv.org/abs/2012.14740>`__ by Yang Xu,
    Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min
    Zhang, Lidong Zhou.
42. :doc:`LayoutXLM <model_doc/layoutlmv2>` (from Microsoft Research Asia) released with the paper `LayoutXLM:
    Multimodal Pre-training for Multilingual Visually-rich Document Understanding <https://arxiv.org/abs/2104.08836>`__
    by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
43. :doc:`LED <model_doc/led>` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer
    <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
44. :doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper `Longformer: The Long-Document
    Transformer <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
45. :doc:`LUKE <model_doc/luke>` (from Studio Ousia) released with the paper `LUKE: Deep Contextualized Entity
    Representations with Entity-aware Self-attention <https://arxiv.org/abs/2010.01057>`__ by Ikuya Yamada, Akari Asai,
    Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
46. :doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality
    Encoder Representations from Transformers for Open-Domain Question Answering <https://arxiv.org/abs/1908.07490>`__
    by Hao Tan and Mohit Bansal.
47. :doc:`M2M100 <model_doc/m2m_100>` (from Facebook) released with the paper `Beyond English-Centric Multilingual
    Machine Translation <https://arxiv.org/abs/2010.11125>`__ by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma,
    Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal,
    Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
48. :doc:`MarianMT <model_doc/marian>` Machine translation models trained using `OPUS <http://opus.nlpl.eu/>`__ data by
    Jörg Tiedemann. The `Marian Framework <https://marian-nmt.github.io/>`__ is being developed by the Microsoft
    Translator Team.
49. :doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Denoising Pre-training for
    Neural Machine Translation <https://arxiv.org/abs/2001.08210>`__ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li,
    Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
50. :doc:`MBart-50 <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Translation with Extensible
    Multilingual Pretraining and Finetuning <https://arxiv.org/abs/2008.00401>`__ by Yuqing Tang, Chau Tran, Xian Li,
    Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
51. :doc:`Megatron-BERT <model_doc/megatron_bert>` (from NVIDIA) released with the paper `Megatron-LM: Training
    Multi-Billion Parameter Language Models Using Model Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad
    Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
52. :doc:`Megatron-GPT2 <model_doc/megatron_gpt2>` (from NVIDIA) released with the paper `Megatron-LM: Training
    Multi-Billion Parameter Language Models Using Model Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad
    Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
53. :doc:`MPNet <model_doc/mpnet>` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted
    Pre-training for Language Understanding <https://arxiv.org/abs/2004.09297>`__ by Kaitao Song, Xu Tan, Tao Qin,
    Jianfeng Lu, Tie-Yan Liu.
54. :doc:`MT5 <model_doc/mt5>` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained
    text-to-text transformer <https://arxiv.org/abs/2010.11934>`__ by Linting Xue, Noah Constant, Adam Roberts, Mihir
    Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
55. :doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper `PEGASUS: Pre-training with Extracted
    Gap-sentences for Abstractive Summarization <https://arxiv.org/abs/1912.08777>`__ by Jingqing Zhang, Yao Zhao,
    Mohammad Saleh and Peter J. Liu.
56. :doc:`PhoBERT <model_doc/phobert>` (from VinAI Research) released with the paper `PhoBERT: Pre-trained language
    models for Vietnamese <https://www.aclweb.org/anthology/2020.findings-emnlp.92/>`__ by Dat Quoc Nguyen and Anh Tuan
    Nguyen.
57. :doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting
    Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi,
    Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
58. :doc:`QDQBert <model_doc/qdqbert>` (from NVIDIA) released with the paper `Integer Quantization for Deep Learning
    Inference: Principles and Empirical Evaluation <https://arxiv.org/abs/2004.09602>`__ by Hao Wu, Patrick Judd,
    Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
59. :doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper `Reformer: The Efficient
    Transformer <https://arxiv.org/abs/2001.04451>`__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
60. :doc:`RemBERT <model_doc/rembert>` (from Google Research) released with the paper `Rethinking embedding coupling in
    pre-trained language models <https://arxiv.org/pdf/2010.12821.pdf>`__ by Hyung Won Chung, Thibault Févry, Henry
    Tsai, M. Johnson, Sebastian Ruder.
61. :doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper `RoBERTa: A Robustly
    Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`__ by Yinhan Liu, Myle Ott, Naman Goyal,
    Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
62. :doc:`RoFormer <model_doc/roformer>` (from ZhuiyiTechnology), released together with the paper `RoFormer:
    Enhanced Transformer with Rotary Position Embedding <https://arxiv.org/pdf/2104.09864v1.pdf>`__ by Jianlin Su and
    Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
63. :doc:`SegFormer <model_doc/segformer>` (from NVIDIA) released with the paper `SegFormer: Simple and Efficient
    Design for Semantic Segmentation with Transformers <https://arxiv.org/abs/2105.15203>`__ by Enze Xie, Wenhai Wang,
    Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
64. :doc:`SEW <model_doc/sew>` (from ASAPP) released with the paper `Performance-Efficiency Trade-offs in Unsupervised
    Pre-training for Speech Recognition <https://arxiv.org/abs/2109.06870>`__ by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu
    Han, Kilian Q. Weinberger, Yoav Artzi.
65. :doc:`SEW-D <model_doc/sew_d>` (from ASAPP) released with the paper `Performance-Efficiency Trade-offs in
    Unsupervised Pre-training for Speech Recognition <https://arxiv.org/abs/2109.06870>`__ by Felix Wu, Kwangyoun Kim,
    Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
66. :doc:`SpeechToTextTransformer <model_doc/speech_to_text>` (from Facebook), released together with the paper
    `fairseq S2T: Fast Speech-to-Text Modeling with fairseq <https://arxiv.org/abs/2010.05171>`__ by Changhan Wang, Yun
    Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
67. :doc:`SpeechToTextTransformer2 <model_doc/speech_to_text_2>` (from Facebook), released together with the paper
    `Large-Scale Self- and Semi-Supervised Learning for Speech Translation <https://arxiv.org/abs/2104.06678>`__ by
    Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
68. :doc:`Splinter <model_doc/splinter>` (from Tel Aviv University), released together with the paper `Few-Shot
    Question Answering by Pretraining Span Selection <https://arxiv.org/abs/2101.00438>`__ by Ori Ram, Yuval Kirstain,
    Jonathan Berant, Amir Globerson, Omer Levy.
69. :doc:`SqueezeBert <model_doc/squeezebert>` (from Berkeley) released with the paper `SqueezeBERT: What can computer
    vision teach NLP about efficient neural networks? <https://arxiv.org/abs/2006.11316>`__ by Forrest N. Iandola,
    Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
70. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
    Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel and Noam Shazeer and Adam
    Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
71. :doc:`T5v1.1 <model_doc/t5v1.1>` (from Google AI) released in the repository
    `google-research/text-to-text-transfer-transformer
    <https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511>`__ by
    Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi
    Zhou and Wei Li and Peter J. Liu.
72. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
    Pre-training <https://arxiv.org/abs/2004.02349>`__ by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller,
    Francesco Piccinno and Julian Martin Eisenschlos.
73. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
    Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*,
    Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
74. :doc:`TrOCR <model_doc/trocr>` (from Microsoft), released together with the paper `TrOCR: Transformer-based Optical
    Character Recognition with Pre-trained Models <https://arxiv.org/abs/2109.10282>`__ by Minghao Li, Tengchao Lv, Lei
    Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
75. :doc:`UniSpeech <model_doc/unispeech>` (from Microsoft Research) released with the paper `UniSpeech: Unified Speech
    Representation Learning with Labeled and Unlabeled Data <https://arxiv.org/abs/2101.07597>`__ by Chengyi Wang, Yu
    Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
76. :doc:`UniSpeechSat <model_doc/unispeech_sat>` (from Microsoft Research) released with the paper `UNISPEECH-SAT:
    UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING <https://arxiv.org/abs/2110.05752>`__ by
    Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li,
    Xiangzhan Yu.
77. :doc:`Vision Transformer (ViT) <model_doc/vit>` (from Google AI) released with the paper `An Image is Worth 16x16
    Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`__ by Alexey Dosovitskiy,
    Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias
    Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
78. :doc:`VisualBERT <model_doc/visual_bert>` (from UCLA NLP) released with the paper `VisualBERT: A Simple and
    Performant Baseline for Vision and Language <https://arxiv.org/pdf/1908.03557>`__ by Liunian Harold Li, Mark
    Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
79. :doc:`Wav2Vec2 <model_doc/wav2vec2>` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for
    Self-Supervised Learning of Speech Representations <https://arxiv.org/abs/2006.11477>`__ by Alexei Baevski, Henry
    Zhou, Abdelrahman Mohamed, Michael Auli.
80. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
    Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
81. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
    Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan,
    Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
82. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
    Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay
    Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
    Zettlemoyer and Veselin Stoyanov.
83. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
    Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming
    Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
84. :doc:`XLSR-Wav2Vec2 <model_doc/xlsr_wav2vec2>` (from Facebook AI) released with the paper `Unsupervised
    Cross-Lingual Representation Learning For Speech Recognition <https://arxiv.org/abs/2006.13979>`__ by Alexis
    Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.


Supported frameworks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The table below shows the current support in the library for each of those models: whether they have a Python
tokenizer (called "slow"), whether they have a "fast" tokenizer backed by the 🤗 Tokenizers library, and whether they
are supported in Jax (via Flax), PyTorch, and/or TensorFlow.

..
    This table is updated automatically from the auto modules with `make fix-copies`. Do not update manually!

.. rst-class:: center-aligned-table

+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            Model            | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax Support |
+=============================+================+================+=================+====================+==============+
|           ALBERT            |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            BART             |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            BEiT             |       ❌       |       ❌       |       ✅        |         ❌         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            BERT             |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       Bert Generation       |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           BigBird           |       ✅       |       ✅       |       ✅        |         ❌         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       BigBirdPegasus        |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         Blenderbot          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       BlenderbotSmall       |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          CamemBERT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           Canine            |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            CLIP             |       ✅       |       ✅       |       ✅        |         ❌         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          ConvBERT           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            CTRL             |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           DeBERTa           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         DeBERTa-v2          |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            DeiT             |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            DETR             |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         DistilBERT          |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             DPR             |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           ELECTRA           |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       Encoder decoder       |       ❌       |       ❌       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| FairSeq Machine-Translation |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          FlauBERT           |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            FNet             |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|     Funnel Transformer      |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           GPT Neo           |       ❌       |       ❌       |       ✅        |         ❌         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            GPT-J            |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           Hubert            |       ❌       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           I-BERT            |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          ImageGPT           |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          LayoutLM           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         LayoutLMv2          |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             LED             |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         Longformer          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            LUKE             |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           LXMERT            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           M2M100            |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           Marian            |       ✅       |       ❌       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            mBART            |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|        MegatronBert         |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         MobileBERT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            MPNet            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             mT5             |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         OpenAI GPT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|        OpenAI GPT-2         |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           Pegasus           |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         ProphetNet          |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           QDQBert           |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             RAG             |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          Reformer           |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           RemBERT           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          RetriBERT          |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           RoBERTa           |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          RoFormer           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          SegFormer          |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             SEW             |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            SEW-D            |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|   Speech Encoder decoder    |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         Speech2Text         |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|        Speech2Text2         |       ✅       |       ❌       |       ❌        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          Splinter           |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         SqueezeBERT         |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             T5              |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            TAPAS            |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       Transformer-XL        |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            TrOCR            |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          UniSpeech          |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|        UniSpeechSat         |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|   Vision Encoder decoder    |       ❌       |       ❌       |       ✅        |         ❌         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         VisualBert          |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             ViT             |       ❌       |       ❌       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          Wav2Vec2           |       ✅       |       ❌       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             XLM             |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         XLM-RoBERTa         |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|        XLMProphetNet        |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            XLNet            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
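
As a rough illustration of how the table can be read, the sketch below copies four of its rows into a plain Python
dictionary and lists the models that have support in at least one backend found in the current environment. It is
only a sketch: ``SUPPORT``, ``BACKEND_PACKAGES``, ``installed_backends`` and ``usable_models`` are hypothetical names
that are not part of the library, and the rows are copied by hand rather than generated from the auto modules.

```python
from importlib.util import find_spec

# Hand-copied excerpt of the support table above (assumption: not generated
# from the library). Each model maps to the backends marked with a check mark.
SUPPORT = {
    "ALBERT": {"pytorch", "tensorflow", "flax"},
    "BigBird": {"pytorch", "flax"},
    "CTRL": {"pytorch", "tensorflow"},
    "GPT-J": {"pytorch"},
}

# Import name of the Python package behind each backend column.
BACKEND_PACKAGES = {"pytorch": "torch", "tensorflow": "tensorflow", "flax": "flax"}


def installed_backends():
    """Return the backends whose package can be located in this environment."""
    return {name for name, pkg in BACKEND_PACKAGES.items() if find_spec(pkg) is not None}


def usable_models(backends, support=SUPPORT):
    """Return, sorted by name, the models supported by at least one given backend."""
    wanted = set(backends)
    return sorted(model for model, supported in support.items() if supported & wanted)


if __name__ == "__main__":
    # Models from the excerpt that can run with whatever is installed locally.
    print(usable_models(installed_backends()))
```

For example, ``usable_models({"flax"})`` returns only the excerpted models with a check mark in the Flax column.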

.. toctree::
    :maxdepth: 2
    :caption: Get started

    quicktour
    installation
    philosophy
    glossary

.. toctree::
    :maxdepth: 2
    :caption: Using 🤗 Transformers

    task_summary
    model_summary
    preprocessing
    training
    model_sharing
    tokenizer_summary
    multilingual

.. toctree::
    :maxdepth: 2
    :caption: Advanced guides

    pretrained_models
    examples
    troubleshooting
    custom_datasets
    notebooks
    sagemaker
    community
    converting_tensorflow_models
    migration
    contributing
    add_new_model
    add_new_pipeline
    fast_tokenizers
    performance
    parallelism
    testing
    debugging
    serialization
    pr_checks

.. toctree::
    :maxdepth: 2
    :caption: Research

    bertology
    perplexity
    benchmarks

.. toctree::
    :maxdepth: 2
    :caption: Main Classes

    main_classes/callback
    main_classes/configuration
    main_classes/data_collator
    main_classes/keras_callbacks
    main_classes/logging
    main_classes/model
    main_classes/optimizer_schedules
    main_classes/output
    main_classes/pipelines
    main_classes/processors
    main_classes/tokenizer
    main_classes/trainer
    main_classes/deepspeed
    main_classes/feature_extractor

.. toctree::
    :maxdepth: 2
    :caption: Models

    model_doc/albert
    model_doc/auto
    model_doc/bart
    model_doc/barthez
    model_doc/bartpho
    model_doc/beit
    model_doc/bert
    model_doc/bertweet
    model_doc/bertgeneration
    model_doc/bert_japanese
    model_doc/bigbird
    model_doc/bigbird_pegasus
    model_doc/blenderbot
    model_doc/blenderbot_small
    model_doc/bort
    model_doc/byt5
    model_doc/camembert
    model_doc/canine
    model_doc/clip
    model_doc/convbert
    model_doc/cpm
    model_doc/ctrl
    model_doc/deberta
    model_doc/deberta_v2
    model_doc/deit
    model_doc/detr
    model_doc/dialogpt
    model_doc/distilbert
    model_doc/dpr
    model_doc/electra
    model_doc/encoderdecoder
    model_doc/flaubert
    model_doc/fnet
    model_doc/fsmt
    model_doc/funnel
    model_doc/herbert
    model_doc/ibert
    model_doc/imagegpt
    model_doc/layoutlm
    model_doc/layoutlmv2
    model_doc/layoutxlm
    model_doc/led
    model_doc/longformer
    model_doc/luke
    model_doc/lxmert
    model_doc/marian
    model_doc/m2m_100
    model_doc/mbart
    model_doc/megatron_bert
    model_doc/megatron_gpt2
    model_doc/mobilebert
    model_doc/mpnet
    model_doc/mt5
    model_doc/gpt
    model_doc/gpt2
    model_doc/gptj
    model_doc/gpt_neo
    model_doc/hubert
    model_doc/pegasus
    model_doc/phobert
    model_doc/prophetnet
    model_doc/qdqbert
    model_doc/rag
    model_doc/reformer
    model_doc/rembert
    model_doc/retribert
    model_doc/roberta
    model_doc/roformer
    model_doc/segformer
    model_doc/sew
    model_doc/sew_d
    model_doc/speechencoderdecoder
    model_doc/speech_to_text
    model_doc/speech_to_text_2
    model_doc/splinter
    model_doc/squeezebert
    model_doc/t5
    model_doc/t5v1.1
    model_doc/tapas
    model_doc/transformerxl
    model_doc/trocr
    model_doc/unispeech
    model_doc/unispeech_sat
    model_doc/visionencoderdecoder
    model_doc/vit
    model_doc/visual_bert
    model_doc/wav2vec2
    model_doc/xlm
    model_doc/xlmprophetnet
    model_doc/xlmroberta
    model_doc/xlnet
    model_doc/xlsr_wav2vec2

.. toctree::
    :maxdepth: 2
    :caption: Internal Helpers

    internal/modeling_utils
    internal/pipelines_utils
    internal/tokenization_utils
    internal/trainer_utils
    internal/generation_utils
    internal/file_utils