Transformers
=======================================================================================================================

State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow

🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with 32+ pretrained models in 100+ languages and deep interoperability between Jax,
PyTorch and TensorFlow.

This is the documentation of our repository `transformers <https://github.com/huggingface/transformers>`__. You can
also follow our `online course <https://huggingface.co/course>`__ that teaches how to use this library, as well as the
other libraries developed by Hugging Face and the Hub.

If you are looking for custom support from the Hugging Face team
-----------------------------------------------------------------------------------------------------------------------

.. raw:: html

    <a target="_blank" href="https://huggingface.co/support">
        <img alt="HuggingFace Expert Acceleration Program" src="https://huggingface.co/front/thumbnails/support.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
    </a><br>

Features
-----------------------------------------------------------------------------------------------------------------------

- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners

State-of-the-art NLP for everyone:

- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators

..
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Lower compute costs, smaller carbon footprint:

- Researchers can share trained models instead of always retraining
- Practitioners can reduce compute time and production costs
- 8 architectures with over 30 pretrained models, some in more than 100 languages

Choose the right framework for every part of a model's lifetime:

- Train state-of-the-art models in 3 lines of code
- Deep interoperability between Jax, PyTorch and TensorFlow models
- Move a single model between Jax/PyTorch/TensorFlow frameworks at will
- Seamlessly pick the right framework for training, evaluation, production

Support for Jax is still experimental (only a few models right now); expect to see it grow in the coming months!

`All the model checkpoints <https://huggingface.co/models>`__ are seamlessly integrated from the huggingface.co `model
hub <https://huggingface.co>`__ where they are uploaded directly by `users <https://huggingface.co/users>`__ and
`organizations <https://huggingface.co/organizations>`__.

Current number of checkpoints: |checkpoints|

.. |checkpoints| image:: https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen

Contents
-----------------------------------------------------------------------------------------------------------------------

The documentation is organized in five parts:

- **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy
  and a glossary.
- **USING 🤗 TRANSFORMERS** contains general tutorials on how to use the library.
- **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
- **RESEARCH** focuses on tutorials that have less to do with how to use the library and more to do with general
  research in transformer models.
- The last three sections contain the documentation of each public class and function, grouped in:

    - **MAIN CLASSES** for the main classes exposing the important APIs of the library.
    - **MODELS** for the classes and functions related to each model implemented in the library.
    - **INTERNAL HELPERS** for the classes and functions we use internally.

The library currently contains Jax, PyTorch and TensorFlow implementations, pretrained model weights, usage scripts and
conversion utilities for the following models.

Supported models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

..
    This list is updated automatically from the README with `make fix-copies`. Do not update manually!

1. :doc:`ALBERT <model_doc/albert>` (from Google Research and the Toyota Technological Institute at Chicago) released
   with the paper `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
   <https://arxiv.org/abs/1909.11942>`__, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush
   Sharma, Radu Soricut.
2. :doc:`BART <model_doc/bart>` (from Facebook) released with the paper `BART: Denoising Sequence-to-Sequence
   Pre-training for Natural Language Generation, Translation, and Comprehension
   <https://arxiv.org/pdf/1910.13461.pdf>`__ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman
   Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
3. :doc:`BARThez <model_doc/barthez>` (from École polytechnique) released with the paper `BARThez: a Skilled Pretrained
   French Sequence-to-Sequence Model <https://arxiv.org/abs/2010.12321>`__ by Moussa Kamal Eddine, Antoine J.-P.
   Tixier, Michalis Vazirgiannis.
4. :doc:`BEiT <model_doc/beit>` (from Microsoft) released with the paper `BEiT: BERT Pre-Training of Image Transformers
   <https://arxiv.org/abs/2106.08254>`__ by Hangbo Bao, Li Dong, Furu Wei.
5. :doc:`BERT <model_doc/bert>` (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional
   Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`__ by Jacob Devlin, Ming-Wei Chang,
   Kenton Lee and Kristina Toutanova.
6. :doc:`BERT For Sequence Generation <model_doc/bertgeneration>` (from Google) released with the paper `Leveraging
   Pre-trained Checkpoints for Sequence Generation Tasks <https://arxiv.org/abs/1907.12461>`__ by Sascha Rothe, Shashi
   Narayan, Aliaksei Severyn.
7. :doc:`BigBird-RoBERTa <model_doc/bigbird>` (from Google Research) released with the paper `Big Bird: Transformers
   for Longer Sequences <https://arxiv.org/abs/2007.14062>`__ by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua
   Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
8. :doc:`BigBird-Pegasus <model_doc/bigbird_pegasus>` (from Google Research) released with the paper `Big Bird:
   Transformers for Longer Sequences <https://arxiv.org/abs/2007.14062>`__ by Manzil Zaheer, Guru Guruganesh, Avinava
   Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
9. :doc:`Blenderbot <model_doc/blenderbot>` (from Facebook) released with the paper `Recipes for building an
   open-domain chatbot <https://arxiv.org/abs/2004.13637>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary
   Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
10. :doc:`BlenderbotSmall <model_doc/blenderbot_small>` (from Facebook) released with the paper `Recipes for building
    an open-domain chatbot <https://arxiv.org/abs/2004.13637>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju,
    Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
11. :doc:`BORT <model_doc/bort>` (from Alexa) released with the paper `Optimal Subarchitecture Extraction For BERT
    <https://arxiv.org/abs/2010.10499>`__ by Adrian de Wynter and Daniel J. Perry.
12. :doc:`ByT5 <model_doc/byt5>` (from Google Research) released with the paper `ByT5: Towards a token-free future with
    pre-trained byte-to-byte models <https://arxiv.org/abs/2105.13626>`__ by Linting Xue, Aditya Barua, Noah Constant,
    Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
13. :doc:`CamemBERT <model_doc/camembert>` (from Inria/Facebook/Sorbonne) released with the paper `CamemBERT: a Tasty
    French Language Model <https://arxiv.org/abs/1911.03894>`__ by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz
    Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
14. :doc:`CANINE <model_doc/canine>` (from Google Research) released with the paper `CANINE: Pre-training an Efficient
    Tokenization-Free Encoder for Language Representation <https://arxiv.org/abs/2103.06874>`__ by Jonathan H. Clark,
    Dan Garrette, Iulia Turc, John Wieting.
15. :doc:`CLIP <model_doc/clip>` (from OpenAI) released with the paper `Learning Transferable Visual Models From
    Natural Language Supervision <https://arxiv.org/abs/2103.00020>`__ by Alec Radford, Jong Wook Kim, Chris Hallacy,
    Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen
    Krueger, Ilya Sutskever.
16. :doc:`ConvBERT <model_doc/convbert>` (from YituTech) released with the paper `ConvBERT: Improving BERT with
    Span-based Dynamic Convolution <https://arxiv.org/abs/2008.02496>`__ by Zihang Jiang, Weihao Yu, Daquan Zhou,
    Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
17. :doc:`CPM <model_doc/cpm>` (from Tsinghua University) released with the paper `CPM: A Large-scale Generative
    Chinese Pre-trained Language Model <https://arxiv.org/abs/2012.00413>`__ by Zhengyan Zhang, Xu Han, Hao Zhou, Pei
    Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng,
    Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang,
    Juanzi Li, Xiaoyan Zhu, Maosong Sun.
18. :doc:`CTRL <model_doc/ctrl>` (from Salesforce) released with the paper `CTRL: A Conditional Transformer Language
    Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`__ by Nitish Shirish Keskar*, Bryan McCann*,
    Lav R. Varshney, Caiming Xiong and Richard Socher.
19. :doc:`DeBERTa <model_doc/deberta>` (from Microsoft) released with the paper `DeBERTa: Decoding-enhanced BERT with
    Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu
    Chen.
20. :doc:`DeBERTa-v2 <model_doc/deberta_v2>` (from Microsoft) released with the paper `DeBERTa: Decoding-enhanced BERT
    with Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao,
    Weizhu Chen.
21. :doc:`DeiT <model_doc/deit>` (from Facebook) released with the paper `Training data-efficient image transformers &
    distillation through attention <https://arxiv.org/abs/2012.12877>`__ by Hugo Touvron, Matthieu Cord, Matthijs
    Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
22. :doc:`DETR <model_doc/detr>` (from Facebook) released with the paper `End-to-End Object Detection with Transformers
    <https://arxiv.org/abs/2005.12872>`__ by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier,
    Alexander Kirillov, Sergey Zagoruyko.
23. :doc:`DialoGPT <model_doc/dialogpt>` (from Microsoft Research) released with the paper `DialoGPT: Large-Scale
    Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`__ by Yizhe
    Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
24. :doc:`DistilBERT <model_doc/distilbert>` (from HuggingFace), released together with the paper `DistilBERT, a
    distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__ by Victor
    Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into `DistilGPT2
    <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, RoBERTa into `DistilRoBERTa
    <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, Multilingual BERT into
    `DistilmBERT <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__ and a German
    version of DistilBERT.
25. :doc:`DPR <model_doc/dpr>` (from Facebook) released with the paper `Dense Passage Retrieval for Open-Domain
    Question Answering <https://arxiv.org/abs/2004.04906>`__ by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick
    Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
26. :doc:`EncoderDecoder <model_doc/encoderdecoder>` (from Google Research) released with the paper `Leveraging
    Pre-trained Checkpoints for Sequence Generation Tasks <https://arxiv.org/abs/1907.12461>`__ by Sascha Rothe, Shashi
    Narayan, Aliaksei Severyn.
27. :doc:`ELECTRA <model_doc/electra>` (from Google Research/Stanford University) released with the paper `ELECTRA:
    Pre-training text encoders as discriminators rather than generators <https://arxiv.org/abs/2003.10555>`__ by Kevin
    Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
28. :doc:`FlauBERT <model_doc/flaubert>` (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model
    Pre-training for French <https://arxiv.org/abs/1912.05372>`__ by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne,
    Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
29. `FNet <https://huggingface.co/transformers/master/model_doc/fnet.html>`__ (from Google Research) released with the
    paper `FNet: Mixing Tokens with Fourier Transforms <https://arxiv.org/abs/2105.03824>`__ by James Lee-Thorp, Joshua
    Ainslie, Ilya Eckstein, Santiago Ontanon.
30. :doc:`Funnel Transformer <model_doc/funnel>` (from CMU/Google Brain) released with the paper `Funnel-Transformer:
    Filtering out Sequential Redundancy for Efficient Language Processing <https://arxiv.org/abs/2006.03236>`__ by
    Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
31. :doc:`GPT <model_doc/gpt>` (from OpenAI) released with the paper `Improving Language Understanding by Generative
    Pre-Training <https://blog.openai.com/language-unsupervised/>`__ by Alec Radford, Karthik Narasimhan, Tim Salimans
    and Ilya Sutskever.
32. :doc:`GPT-2 <model_doc/gpt2>` (from OpenAI) released with the paper `Language Models are Unsupervised Multitask
    Learners <https://blog.openai.com/better-language-models/>`__ by Alec Radford*, Jeffrey Wu*, Rewon Child, David
    Luan, Dario Amodei** and Ilya Sutskever**.
33. :doc:`GPT-J <model_doc/gptj>` (from EleutherAI) released in the repository `kingoflolz/mesh-transformer-jax
    <https://github.com/kingoflolz/mesh-transformer-jax/>`__ by Ben Wang and Aran Komatsuzaki.
34. :doc:`GPT Neo <model_doc/gpt_neo>` (from EleutherAI) released in the repository `EleutherAI/gpt-neo
    <https://github.com/EleutherAI/gpt-neo>`__ by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
35. :doc:`Hubert <model_doc/hubert>` (from Facebook) released with the paper `HuBERT: Self-Supervised Speech
    Representation Learning by Masked Prediction of Hidden Units <https://arxiv.org/abs/2106.07447>`__ by Wei-Ning Hsu,
    Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
36. :doc:`I-BERT <model_doc/ibert>` (from Berkeley) released with the paper `I-BERT: Integer-only BERT Quantization
    <https://arxiv.org/abs/2101.01321>`__ by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
37. :doc:`LayoutLM <model_doc/layoutlm>` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training
    of Text and Layout for Document Image Understanding <https://arxiv.org/abs/1912.13318>`__ by Yiheng Xu, Minghao Li,
    Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
38. :doc:`LayoutLMv2 <model_doc/layoutlmv2>` (from Microsoft Research Asia) released with the paper `LayoutLMv2:
    Multi-modal Pre-training for Visually-Rich Document Understanding <https://arxiv.org/abs/2012.14740>`__ by Yang Xu,
    Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min
    Zhang, Lidong Zhou.
39. :doc:`LayoutXLM <model_doc/layoutlmv2>` (from Microsoft Research Asia) released with the paper `LayoutXLM:
    Multimodal Pre-training for Multilingual Visually-rich Document Understanding <https://arxiv.org/abs/2104.08836>`__
    by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
40. :doc:`LED <model_doc/led>` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer
    <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
41. :doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper `Longformer: The Long-Document
    Transformer <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
42. :doc:`LUKE <model_doc/luke>` (from Studio Ousia) released with the paper `LUKE: Deep Contextualized Entity
    Representations with Entity-aware Self-attention <https://arxiv.org/abs/2010.01057>`__ by Ikuya Yamada, Akari Asai,
    Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
43. :doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality
    Encoder Representations from Transformers for Open-Domain Question Answering <https://arxiv.org/abs/1908.07490>`__
    by Hao Tan and Mohit Bansal.
44. :doc:`M2M100 <model_doc/m2m_100>` (from Facebook) released with the paper `Beyond English-Centric Multilingual
    Machine Translation <https://arxiv.org/abs/2010.11125>`__ by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma,
    Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal,
    Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
45. :doc:`MarianMT <model_doc/marian>` Machine translation models trained using `OPUS <http://opus.nlpl.eu/>`__ data by
    Jörg Tiedemann. The `Marian Framework <https://marian-nmt.github.io/>`__ is being developed by the Microsoft
    Translator Team.
46. :doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Denoising Pre-training for
    Neural Machine Translation <https://arxiv.org/abs/2001.08210>`__ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li,
    Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
47. :doc:`MBart-50 <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Translation with Extensible
    Multilingual Pretraining and Finetuning <https://arxiv.org/abs/2008.00401>`__ by Yuqing Tang, Chau Tran, Xian Li,
    Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
48. :doc:`Megatron-BERT <model_doc/megatron_bert>` (from NVIDIA) released with the paper `Megatron-LM: Training
    Multi-Billion Parameter Language Models Using Model Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad
    Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
49. :doc:`Megatron-GPT2 <model_doc/megatron_gpt2>` (from NVIDIA) released with the paper `Megatron-LM: Training
    Multi-Billion Parameter Language Models Using Model Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad
    Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
50. :doc:`MPNet <model_doc/mpnet>` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted
    Pre-training for Language Understanding <https://arxiv.org/abs/2004.09297>`__ by Kaitao Song, Xu Tan, Tao Qin,
    Jianfeng Lu, Tie-Yan Liu.
51. :doc:`MT5 <model_doc/mt5>` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained
    text-to-text transformer <https://arxiv.org/abs/2010.11934>`__ by Linting Xue, Noah Constant, Adam Roberts, Mihir
    Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
52. :doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper `PEGASUS: Pre-training with Extracted
    Gap-sentences for Abstractive Summarization <https://arxiv.org/abs/1912.08777>`__ by Jingqing Zhang, Yao Zhao,
    Mohammad Saleh and Peter J. Liu.
53. :doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting
    Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi,
    Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
54. :doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper `Reformer: The Efficient
    Transformer <https://arxiv.org/abs/2001.04451>`__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
55. :doc:`RemBERT <model_doc/rembert>` (from Google Research) released with the paper `Rethinking embedding coupling in
    pre-trained language models <https://arxiv.org/pdf/2010.12821.pdf>`__ by Hyung Won Chung, Thibault Févry, Henry
    Tsai, M. Johnson, Sebastian Ruder.
56. :doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper `RoBERTa: A Robustly Optimized
    BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`__ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du,
    Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
57. :doc:`RoFormer <model_doc/roformer>` (from ZhuiyiTechnology), released together with the paper `RoFormer:
    Enhanced Transformer with Rotary Position Embedding <https://arxiv.org/pdf/2104.09864v1.pdf>`__ by Jianlin Su and
    Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
58. `SpeechEncoderDecoder <https://huggingface.co/transformers/master/model_doc/speechencoderdecoder.html>`__
59. :doc:`SpeechToTextTransformer <model_doc/speech_to_text>` (from Facebook), released together with the paper
    `fairseq S2T: Fast Speech-to-Text Modeling with fairseq <https://arxiv.org/abs/2010.05171>`__ by Changhan Wang, Yun
    Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
60. `SpeechToTextTransformer2 <https://huggingface.co/transformers/master/model_doc/speech_to_text_2.html>`__ (from
    Facebook), released together with the paper `Large-Scale Self- and Semi-Supervised Learning for Speech Translation
    <https://arxiv.org/abs/2104.06678>`__ by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis
    Conneau.
61. :doc:`Splinter <model_doc/splinter>` (from Tel Aviv University), released together with the paper `Few-Shot
    Question Answering by Pretraining Span Selection <https://arxiv.org/abs/2101.00438>`__ by Ori Ram, Yuval Kirstain,
    Jonathan Berant, Amir Globerson, Omer Levy.
62. :doc:`SqueezeBert <model_doc/squeezebert>` (from Berkeley) released with the paper `SqueezeBERT: What can computer
    vision teach NLP about efficient neural networks? <https://arxiv.org/abs/2006.11316>`__ by Forrest N. Iandola,
    Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
63. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
    Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel and Noam Shazeer and Adam
    Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
64. :doc:`T5v1.1 <model_doc/t5v1.1>` (from Google AI) released in the repository
    `google-research/text-to-text-transfer-transformer
    <https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511>`__ by
    Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi
    Zhou and Wei Li and Peter J. Liu.
65. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
    Pre-training <https://arxiv.org/abs/2004.02349>`__ by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller,
    Francesco Piccinno and Julian Martin Eisenschlos.
66. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
    Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*,
    Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
67. :doc:`Vision Transformer (ViT) <model_doc/vit>` (from Google AI) released with the paper `An Image is Worth 16x16
    Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`__ by Alexey Dosovitskiy,
    Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias
    Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
68. :doc:`VisualBERT <model_doc/visual_bert>` (from UCLA NLP) released with the paper `VisualBERT: A Simple and
    Performant Baseline for Vision and Language <https://arxiv.org/pdf/1908.03557>`__ by Liunian Harold Li, Mark
    Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
69. :doc:`Wav2Vec2 <model_doc/wav2vec2>` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for
    Self-Supervised Learning of Speech Representations <https://arxiv.org/abs/2006.11477>`__ by Alexei Baevski, Henry
    Zhou, Abdelrahman Mohamed, Michael Auli.
70. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
    Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
71. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
    Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan,
    Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
72. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
    Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay
    Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
    Zettlemoyer and Veselin Stoyanov.
73. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
    Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming
    Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
74. :doc:`XLSR-Wav2Vec2 <model_doc/xlsr_wav2vec2>` (from Facebook AI) released with the paper `Unsupervised
    Cross-Lingual Representation Learning For Speech Recognition <https://arxiv.org/abs/2006.13979>`__ by Alexis
    Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.


Supported frameworks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The table below shows the current support in the library for each of those models: whether they have a Python
tokenizer (called "slow"), a "fast" tokenizer backed by the 🤗 Tokenizers library, and whether they are supported in Jax
(via Flax), PyTorch, and/or TensorFlow.

..
    This table is updated automatically from the auto modules with `make fix-copies`. Do not update manually!

.. rst-class:: center-aligned-table

+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            Model            | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax Support |
+=============================+================+================+=================+====================+==============+
|           ALBERT            |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            BART             |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            BeiT             |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            BERT             |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       Bert Generation       |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           BigBird           |       ✅       |       ✅       |       ✅        |         ❌         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       BigBirdPegasus        |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         Blenderbot          |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       BlenderbotSmall       |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          CamemBERT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           Canine            |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            CLIP             |       ✅       |       ✅       |       ✅        |         ❌         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          ConvBERT           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            CTRL             |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           DeBERTa           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         DeBERTa-v2          |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            DeiT             |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            DETR             |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         DistilBERT          |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             DPR             |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           ELECTRA           |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       Encoder decoder       |       ❌       |       ❌       |       ✅        |         ❌         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| FairSeq Machine-Translation |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          FlauBERT           |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            FNet             |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|     Funnel Transformer      |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           GPT Neo           |       ❌       |       ❌       |       ✅        |         ❌         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            GPT-J            |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           Hubert            |       ❌       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           I-BERT            |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          LayoutLM           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         LayoutLMv2          |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             LED             |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         Longformer          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            LUKE             |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           LXMERT            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           M2M100            |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           Marian            |       ✅       |       ❌       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            mBART            |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|        MegatronBert         |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         MobileBERT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            MPNet            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             mT5             |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         OpenAI GPT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|        OpenAI GPT-2         |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           Pegasus           |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         ProphetNet          |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             RAG             |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          Reformer           |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           RemBERT           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          RetriBERT          |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           RoBERTa           |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          RoFormer           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|   Speech Encoder decoder    |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         Speech2Text         |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|        Speech2Text2         |       ✅       |       ❌       |       ❌        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          Splinter           |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         SqueezeBERT         |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             T5              |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            TAPAS            |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       Transformer-XL        |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         VisualBert          |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             ViT             |       ❌       |       ❌       |       ✅        |         ❌         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          Wav2Vec2           |       ✅       |       ❌       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             XLM             |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         XLM-RoBERTa         |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|        XLMProphetNet        |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            XLNet            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+

.. toctree::
    :maxdepth: 2
    :caption: Get started

    quicktour
    installation
    philosophy
    glossary

.. toctree::
    :maxdepth: 2
    :caption: Using 🤗 Transformers

    task_summary
    model_summary
    preprocessing
    training
    model_sharing
    tokenizer_summary
    multilingual

.. toctree::
    :maxdepth: 2
    :caption: Advanced guides

    pretrained_models
    examples
    troubleshooting
    custom_datasets
    notebooks
    sagemaker
    community
    converting_tensorflow_models
    migration
    contributing
    add_new_model
    add_new_pipeline
    fast_tokenizers
    performance
    parallelism
    testing
    debugging
    serialization

.. toctree::
    :maxdepth: 2
    :caption: Research

    bertology
    perplexity
    benchmarks

.. toctree::
    :maxdepth: 2
    :caption: Main Classes

    main_classes/callback
    main_classes/configuration
    main_classes/data_collator
    main_classes/logging
    main_classes/model
    main_classes/optimizer_schedules
    main_classes/output
    main_classes/pipelines
    main_classes/processors
    main_classes/tokenizer
    main_classes/trainer
    main_classes/deepspeed
    main_classes/feature_extractor

.. toctree::
    :maxdepth: 2
    :caption: Models

    model_doc/albert
    model_doc/auto
    model_doc/bart
    model_doc/barthez
    model_doc/beit
    model_doc/bert
    model_doc/bertweet
    model_doc/bertgeneration
    model_doc/bert_japanese
    model_doc/bigbird
    model_doc/bigbird_pegasus
    model_doc/blenderbot
    model_doc/blenderbot_small
    model_doc/bort
    model_doc/byt5
    model_doc/camembert
    model_doc/canine
    model_doc/clip
    model_doc/convbert
    model_doc/cpm
    model_doc/ctrl
    model_doc/deberta
    model_doc/deberta_v2
    model_doc/deit
    model_doc/detr
    model_doc/dialogpt
    model_doc/distilbert
    model_doc/dpr
    model_doc/electra
    model_doc/encoderdecoder
    model_doc/flaubert
    model_doc/fnet
    model_doc/fsmt
    model_doc/funnel
    model_doc/herbert
    model_doc/ibert
    model_doc/layoutlm
    model_doc/layoutlmv2
    model_doc/layoutxlm
    model_doc/led
    model_doc/longformer
    model_doc/luke
    model_doc/lxmert
    model_doc/marian
    model_doc/m2m_100
    model_doc/mbart
    model_doc/megatron_bert
    model_doc/megatron_gpt2
    model_doc/mobilebert
    model_doc/mpnet
    model_doc/mt5
    model_doc/gpt
    model_doc/gpt2
    model_doc/gptj
    model_doc/gpt_neo
    model_doc/hubert
    model_doc/pegasus
    model_doc/phobert
    model_doc/prophetnet
    model_doc/rag
    model_doc/reformer
    model_doc/rembert
    model_doc/retribert
    model_doc/roberta
    model_doc/roformer
    model_doc/speechencoderdecoder
    model_doc/speech_to_text
    model_doc/speech_to_text_2
    model_doc/splinter
    model_doc/squeezebert
    model_doc/t5
    model_doc/t5v1.1
    model_doc/tapas
    model_doc/transformerxl
    model_doc/vit
    model_doc/visual_bert
    model_doc/wav2vec2
    model_doc/xlm
    model_doc/xlmprophetnet
    model_doc/xlmroberta
    model_doc/xlnet
    model_doc/xlsr_wav2vec2

.. toctree::
    :maxdepth: 2
    :caption: Internal Helpers

    internal/modeling_utils
    internal/pipelines_utils
    internal/tokenization_utils
    internal/trainer_utils
    internal/generation_utils
    internal/file_utils