index.rst 40.1 KB
Newer Older
1
Transformers
Sylvain Gugger's avatar
Sylvain Gugger committed
2
=======================================================================================================================
3

4
State-of-the-art Natural Language Processing for Jax, Pytorch and TensorFlow
thomwolf's avatar
thomwolf committed
5

6
7
🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
8
9
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between Jax,
PyTorch and TensorFlow.
10
11

This is the documentation of our repository `transformers <https://github.com/huggingface/transformers>`_.
12

13
14
15
16
17
18
19
20
21
If you are looking for custom support from the Hugging Face team
-----------------------------------------------------------------------------------------------------------------------

.. raw:: html

    <a target="_blank" href="https://huggingface.co/support">
        <img alt="HuggingFace Expert Acceleration Program" src="https://huggingface.co/front/thumbnails/support.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
    </a><br>

LysandreJik's avatar
LysandreJik committed
22
Features
Sylvain Gugger's avatar
Sylvain Gugger committed
23
-----------------------------------------------------------------------------------------------------------------------
LysandreJik's avatar
LysandreJik committed
24
25
26
27

- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners

LysandreJik's avatar
LysandreJik committed
28
29
State-of-the-art NLP for everyone:

LysandreJik's avatar
LysandreJik committed
30
31
32
33
- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators

34
..
Sylvain Gugger's avatar
Sylvain Gugger committed
35
36
37
38
39
40
41
42
43
44
45
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

LysandreJik's avatar
LysandreJik committed
46
47
Lower compute costs, smaller carbon footprint:

LysandreJik's avatar
LysandreJik committed
48
49
50
51
- Researchers can share trained models instead of always retraining
- Practitioners can reduce compute time and production costs
- 8 architectures with over 30 pretrained models, some in more than 100 languages

LysandreJik's avatar
LysandreJik committed
52
53
Choose the right framework for every part of a model's lifetime:

LysandreJik's avatar
LysandreJik committed
54
- Train state-of-the-art models in 3 lines of code
55
56
- Deep interoperability between Jax, Pytorch and TensorFlow models
- Move a single model between Jax/PyTorch/TensorFlow frameworks at will
LysandreJik's avatar
LysandreJik committed
57
58
- Seamlessly pick the right framework for training, evaluation, production

59
The support for Jax is still experimental (with a few models right now), expect to see it grow in the coming months!
Sylvain Gugger's avatar
Sylvain Gugger committed
60

61
62
63
64
65
66
67
68
`All the model checkpoints <https://huggingface.co/models>`__ are seamlessly integrated from the huggingface.co `model
hub <https://huggingface.co>`__ where they are uploaded directly by `users <https://huggingface.co/users>`__ and
`organizations <https://huggingface.co/organizations>`__.

Current number of checkpoints: |checkpoints|

.. |checkpoints| image:: https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen

LysandreJik's avatar
LysandreJik committed
69
Contents
Sylvain Gugger's avatar
Sylvain Gugger committed
70
-----------------------------------------------------------------------------------------------------------------------
LysandreJik's avatar
LysandreJik committed
71

Sylvain Gugger's avatar
Sylvain Gugger committed
72
73
74
75
The documentation is organized in five parts:

- **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy
  and a glossary.
Sylvain Gugger's avatar
Sylvain Gugger committed
76
- **USING 🤗 TRANSFORMERS** contains general tutorials on how to use the library.
Sylvain Gugger's avatar
Sylvain Gugger committed
77
- **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
Santiago Castro's avatar
Santiago Castro committed
78
- **RESEARCH** focuses on tutorials that have less to do with how to use the library but more about general research in
Sylvain Gugger's avatar
Sylvain Gugger committed
79
  transformers model
80
- The three last section contain the documentation of each public class and function, grouped in:
Sylvain Gugger's avatar
Sylvain Gugger committed
81

82
83
84
    - **MAIN CLASSES** for the main classes exposing the important APIs of the library.
    - **MODELS** for the classes and functions related to each model implemented in the library.
    - **INTERNAL HELPERS** for the classes and functions we use internally.
Sylvain Gugger's avatar
Sylvain Gugger committed
85

86
The library currently contains Jax, PyTorch and Tensorflow implementations, pretrained model weights, usage scripts and
87
88
89
90
conversion utilities for the following models.

Supported models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
91

92
93
94
..
    This list is updated automatically from the README with `make fix-copies`. Do not update manually!

95
96
97
98
99
100
1. :doc:`ALBERT <model_doc/albert>` (from Google Research and the Toyota Technological Institute at Chicago) released
   with the paper `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
   <https://arxiv.org/abs/1909.11942>`__, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush
   Sharma, Radu Soricut.
2. :doc:`BART <model_doc/bart>` (from Facebook) released with the paper `BART: Denoising Sequence-to-Sequence
   Pre-training for Natural Language Generation, Translation, and Comprehension
101
102
   <https://arxiv.org/pdf/1910.13461.pdf>`__ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman
   Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
103
104
105
106
3. :doc:`BARThez <model_doc/barthez>` (from École polytechnique) released with the paper `BARThez: a Skilled Pretrained
   French Sequence-to-Sequence Model <https://arxiv.org/abs/2010.12321>`__ by Moussa Kamal Eddine, Antoine J.-P.
   Tixier, Michalis Vazirgiannis.
4. :doc:`BERT <model_doc/bert>` (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional
107
108
   Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`__ by Jacob Devlin, Ming-Wei Chang,
   Kenton Lee and Kristina Toutanova.
109
5. :doc:`BERT For Sequence Generation <model_doc/bertgeneration>` (from Google) released with the paper `Leveraging
110
111
   Pre-trained Checkpoints for Sequence Generation Tasks <https://arxiv.org/abs/1907.12461>`__ by Sascha Rothe, Shashi
   Narayan, Aliaksei Severyn.
Vasudev Gupta's avatar
Vasudev Gupta committed
112
113
114
6. :doc:`BigBird-RoBERTa <model_doc/bigbird>` (from Google Research) released with the paper `Big Bird: Transformers
   for Longer Sequences <https://arxiv.org/abs/2007.14062>`__ by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua
   Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
Vasudev Gupta's avatar
Vasudev Gupta committed
115
116
117
118
7. :doc:`BigBird-Pegasus <model_doc/bigbird_pegasus>` (from Google Research) released with the paper `Big Bird:
   Transformers for Longer Sequences <https://arxiv.org/abs/2007.14062>`__ by Manzil Zaheer, Guru Guruganesh, Avinava
   Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
8. :doc:`Blenderbot <model_doc/blenderbot>` (from Facebook) released with the paper `Recipes for building an
Lysandre's avatar
Lysandre committed
119
120
   open-domain chatbot <https://arxiv.org/abs/2004.13637>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary
   Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
Vasudev Gupta's avatar
Vasudev Gupta committed
121
9. :doc:`BlenderbotSmall <model_doc/blenderbot_small>` (from Facebook) released with the paper `Recipes for building an
122
123
   open-domain chatbot <https://arxiv.org/abs/2004.13637>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary
   Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
Vasudev Gupta's avatar
Vasudev Gupta committed
124
125
10. :doc:`BORT <model_doc/bort>` (from Alexa) released with the paper `Optimal Subarchitecture Extraction For BERT
    <https://arxiv.org/abs/2010.10499>`__ by Adrian de Wynter and Daniel J. Perry.
Patrick von Platen's avatar
Patrick von Platen committed
126
127
128
129
11. :doc:`ByT5 <model_doc/byt5>` (from Google Research) released with the paper `ByT5: Towards a token-free future with
    pre-trained byte-to-byte models <https://arxiv.org/abs/2105.13626>`__ by Linting Xue, Aditya Barua, Noah Constant,
    Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
12. :doc:`CamemBERT <model_doc/camembert>` (from Inria/Facebook/Sorbonne) released with the paper `CamemBERT: a Tasty
Vasudev Gupta's avatar
Vasudev Gupta committed
130
131
    French Language Model <https://arxiv.org/abs/1911.03894>`__ by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz
    Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
Patrick von Platen's avatar
Patrick von Platen committed
132
13. :doc:`CLIP <model_doc/clip>` from (OpenAI) released with the paper `Learning Transferable Visual Models From
Suraj Patil's avatar
Suraj Patil committed
133
134
135
    Natural Language Supervision <https://arxiv.org/abs/2103.00020>`__ by Alec Radford, Jong Wook Kim, Chris Hallacy,
    Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen
    Krueger, Ilya Sutskever.
Patrick von Platen's avatar
Patrick von Platen committed
136
14. :doc:`ConvBERT <model_doc/convbert>` (from YituTech) released with the paper `ConvBERT: Improving BERT with
Stefan Schweter's avatar
Stefan Schweter committed
137
138
    Span-based Dynamic Convolution <https://arxiv.org/abs/2008.02496>`__ by Zihang Jiang, Weihao Yu, Daquan Zhou,
    Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
Patrick von Platen's avatar
Patrick von Platen committed
139
15. :doc:`CPM <model_doc/cpm>` (from Tsinghua University) released with the paper `CPM: A Large-scale Generative
140
141
142
143
    Chinese Pre-trained Language Model <https://arxiv.org/abs/2012.00413>`__ by Zhengyan Zhang, Xu Han, Hao Zhou, Pei
    Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng,
    Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang,
    Juanzi Li, Xiaoyan Zhu, Maosong Sun.
Patrick von Platen's avatar
Patrick von Platen committed
144
16. :doc:`CTRL <model_doc/ctrl>` (from Salesforce) released with the paper `CTRL: A Conditional Transformer Language
abhishek thakur's avatar
abhishek thakur committed
145
146
    Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`__ by Nitish Shirish Keskar*, Bryan McCann*,
    Lav R. Varshney, Caiming Xiong and Richard Socher.
Patrick von Platen's avatar
Patrick von Platen committed
147
17. :doc:`DeBERTa <model_doc/deberta>` (from Microsoft) released with the paper `DeBERTa: Decoding-enhanced BERT with
Lysandre Debut's avatar
Lysandre Debut committed
148
149
    Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu
    Chen.
Patrick von Platen's avatar
Patrick von Platen committed
150
18. :doc:`DeBERTa-v2 <model_doc/deberta_v2>` (from Microsoft) released with the paper `DeBERTa: Decoding-enhanced BERT
Lysandre's avatar
Lysandre committed
151
152
    with Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao,
    Weizhu Chen.
Patrick von Platen's avatar
Patrick von Platen committed
153
19. :doc:`DeiT <model_doc/deit>` (from Facebook) released with the paper `Training data-efficient image transformers &
NielsRogge's avatar
NielsRogge committed
154
155
    distillation through attention <https://arxiv.org/abs/2012.12877>`__ by Hugo Touvron, Matthieu Cord, Matthijs
    Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
Patrick von Platen's avatar
Patrick von Platen committed
156
20. :doc:`DialoGPT <model_doc/dialogpt>` (from Microsoft Research) released with the paper `DialoGPT: Large-Scale
157
158
    Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`__ by Yizhe
    Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
Patrick von Platen's avatar
Patrick von Platen committed
159
21. :doc:`DistilBERT <model_doc/distilbert>` (from HuggingFace), released together with the paper `DistilBERT, a
160
161
162
163
164
165
    distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__ by Victor
    Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into `DistilGPT2
    <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, RoBERTa into `DistilRoBERTa
    <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, Multilingual BERT into
    `DistilmBERT <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__ and a German
    version of DistilBERT.
Patrick von Platen's avatar
Patrick von Platen committed
166
22. :doc:`DPR <model_doc/dpr>` (from Facebook) released with the paper `Dense Passage Retrieval for Open-Domain
167
168
    Question Answering <https://arxiv.org/abs/2004.04906>`__ by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick
    Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
Patrick von Platen's avatar
Patrick von Platen committed
169
23. :doc:`ELECTRA <model_doc/electra>` (from Google Research/Stanford University) released with the paper `ELECTRA:
170
171
    Pre-training text encoders as discriminators rather than generators <https://arxiv.org/abs/2003.10555>`__ by Kevin
    Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
Patrick von Platen's avatar
Patrick von Platen committed
172
24. :doc:`FlauBERT <model_doc/flaubert>` (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model
173
174
    Pre-training for French <https://arxiv.org/abs/1912.05372>`__ by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne,
    Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
Patrick von Platen's avatar
Patrick von Platen committed
175
25. :doc:`Funnel Transformer <model_doc/funnel>` (from CMU/Google Brain) released with the paper `Funnel-Transformer:
176
177
    Filtering out Sequential Redundancy for Efficient Language Processing <https://arxiv.org/abs/2006.03236>`__ by
    Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
Patrick von Platen's avatar
Patrick von Platen committed
178
26. :doc:`GPT <model_doc/gpt>` (from OpenAI) released with the paper `Improving Language Understanding by Generative
179
180
    Pre-Training <https://blog.openai.com/language-unsupervised/>`__ by Alec Radford, Karthik Narasimhan, Tim Salimans
    and Ilya Sutskever.
Patrick von Platen's avatar
Patrick von Platen committed
181
27. :doc:`GPT-2 <model_doc/gpt2>` (from OpenAI) released with the paper `Language Models are Unsupervised Multitask
182
183
    Learners <https://blog.openai.com/better-language-models/>`__ by Alec Radford*, Jeffrey Wu*, Rewon Child, David
    Luan, Dario Amodei** and Ilya Sutskever**.
Patrick von Platen's avatar
Patrick von Platen committed
184
28. :doc:`GPT Neo <model_doc/gpt_neo>` (from EleutherAI) released in the repository `EleutherAI/gpt-neo
Suraj Patil's avatar
Suraj Patil committed
185
    <https://github.com/EleutherAI/gpt-neo>`__ by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
Patrick von Platen's avatar
Patrick von Platen committed
186
29. :doc:`I-BERT <model_doc/ibert>` (from Berkeley) released with the paper `I-BERT: Integer-only BERT Quantization
Lysandre's avatar
Lysandre committed
187
    <https://arxiv.org/abs/2101.01321>`__ by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer
Patrick von Platen's avatar
Patrick von Platen committed
188
30. :doc:`LayoutLM <model_doc/layoutlm>` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training
189
190
    of Text and Layout for Document Image Understanding <https://arxiv.org/abs/1912.13318>`__ by Yiheng Xu, Minghao Li,
    Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
Patrick von Platen's avatar
Patrick von Platen committed
191
31. :doc:`LED <model_doc/led>` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer
192
    <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
Patrick von Platen's avatar
Patrick von Platen committed
193
32. :doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper `Longformer: The Long-Document
194
    Transformer <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
Patrick von Platen's avatar
Patrick von Platen committed
195
33. :doc:`LUKE <model_doc/luke>` (from Studio Ousia) released with the paper `LUKE: Deep Contextualized Entity
NielsRogge's avatar
NielsRogge committed
196
197
    Representations with Entity-aware Self-attention <https://arxiv.org/abs/2010.01057>`__ by Ikuya Yamada, Akari Asai,
    Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
Patrick von Platen's avatar
Patrick von Platen committed
198
34. :doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality
199
200
    Encoder Representations from Transformers for Open-Domain Question Answering <https://arxiv.org/abs/1908.07490>`__
    by Hao Tan and Mohit Bansal.
Patrick von Platen's avatar
Patrick von Platen committed
201
35. :doc:`M2M100 <model_doc/m2m_100>` (from Facebook) released with the paper `Beyond English-Centric Multilingual
Suraj Patil's avatar
Suraj Patil committed
202
203
204
    Machine Translation <https://arxiv.org/abs/2010.11125>`__ by by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi
    Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman
    Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
Patrick von Platen's avatar
Patrick von Platen committed
205
36. :doc:`MarianMT <model_doc/marian>` Machine translation models trained using `OPUS <http://opus.nlpl.eu/>`__ data by
206
207
    Jörg Tiedemann. The `Marian Framework <https://marian-nmt.github.io/>`__ is being developed by the Microsoft
    Translator Team.
Patrick von Platen's avatar
Patrick von Platen committed
208
37. :doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Denoising Pre-training for
209
210
    Neural Machine Translation <https://arxiv.org/abs/2001.08210>`__ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li,
    Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
Patrick von Platen's avatar
Patrick von Platen committed
211
38. :doc:`MBart-50 <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Translation with Extensible
Suraj Patil's avatar
Suraj Patil committed
212
213
    Multilingual Pretraining and Finetuning <https://arxiv.org/abs/2008.00401>`__ by Yuqing Tang, Chau Tran, Xian Li,
    Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
Patrick von Platen's avatar
Patrick von Platen committed
214
39. :doc:`Megatron-BERT <model_doc/megatron_bert>` (from NVIDIA) released with the paper `Megatron-LM: Training
215
216
    Multi-Billion Parameter Language Models Using Model Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad
    Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
Patrick von Platen's avatar
Patrick von Platen committed
217
40. :doc:`Megatron-GPT2 <model_doc/megatron_gpt2>` (from NVIDIA) released with the paper `Megatron-LM: Training
218
219
    Multi-Billion Parameter Language Models Using Model Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad
    Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
Patrick von Platen's avatar
Patrick von Platen committed
220
41. :doc:`MPNet <model_doc/mpnet>` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted
StillKeepTry's avatar
StillKeepTry committed
221
222
    Pre-training for Language Understanding <https://arxiv.org/abs/2004.09297>`__ by Kaitao Song, Xu Tan, Tao Qin,
    Jianfeng Lu, Tie-Yan Liu.
Patrick von Platen's avatar
Patrick von Platen committed
223
42. :doc:`MT5 <model_doc/mt5>` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained
Patrick von Platen's avatar
Patrick von Platen committed
224
225
    text-to-text transformer <https://arxiv.org/abs/2010.11934>`__ by Linting Xue, Noah Constant, Adam Roberts, Mihir
    Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
Patrick von Platen's avatar
Patrick von Platen committed
226
43. :doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper `PEGASUS: Pre-training with Extracted
227
228
    Gap-sentences for Abstractive Summarization <https://arxiv.org/abs/1912.08777>`__> by Jingqing Zhang, Yao Zhao,
    Mohammad Saleh and Peter J. Liu.
Patrick von Platen's avatar
Patrick von Platen committed
229
44. :doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting
Lysandre's avatar
Lysandre committed
230
231
    Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi,
    Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
Patrick von Platen's avatar
Patrick von Platen committed
232
45. :doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper `Reformer: The Efficient
233
    Transformer <https://arxiv.org/abs/2001.04451>`__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
Patrick von Platen's avatar
Patrick von Platen committed
234
46. :doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper a `Robustly Optimized BERT
235
    Pretraining Approach <https://arxiv.org/abs/1907.11692>`__ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar
236
    Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
Patrick von Platen's avatar
Patrick von Platen committed
237
47. :doc:`RoFormer <model_doc/roformer>` (from ZhuiyiTechnology), released together with the paper a `RoFormer:
238
239
    Enhanced Transformer with Rotary Position Embedding <https://arxiv.org/pdf/2104.09864v1.pdf>`__ by Jianlin Su and
    Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
Patrick von Platen's avatar
Patrick von Platen committed
240
48. :doc:`SpeechToTextTransformer <model_doc/speech_to_text>` (from Facebook), released together with the paper
Suraj Patil's avatar
Suraj Patil committed
241
242
    `fairseq S2T: Fast Speech-to-Text Modeling with fairseq <https://arxiv.org/abs/2010.05171>`__ by Changhan Wang, Yun
    Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
Patrick von Platen's avatar
Patrick von Platen committed
243
49. :doc:`SqueezeBert <model_doc/squeezebert>` released with the paper `SqueezeBERT: What can computer vision teach NLP
Lysandre's avatar
Lysandre committed
244
245
    about efficient neural networks? <https://arxiv.org/abs/2006.11316>`__ by Forrest N. Iandola, Albert E. Shaw, Ravi
    Krishna, and Kurt W. Keutzer.
Patrick von Platen's avatar
Patrick von Platen committed
246
50. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
247
248
    Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel and Noam Shazeer and Adam
    Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
Patrick von Platen's avatar
Patrick von Platen committed
249
51. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
Lysandre's avatar
Lysandre committed
250
251
    Pre-training <https://arxiv.org/abs/2004.02349>`__ by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller,
    Francesco Piccinno and Julian Martin Eisenschlos.
Patrick von Platen's avatar
Patrick von Platen committed
252
52. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
253
254
    Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*,
    Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
Patrick von Platen's avatar
Patrick von Platen committed
255
53. :doc:`Vision Transformer (ViT) <model_doc/vit>` (from Google AI) released with the paper `An Image is Worth 16x16
256
257
258
    Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`__ by Alexey Dosovitskiy,
    Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias
    Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
Gunjan Chhablani's avatar
Gunjan Chhablani committed
259
260
261
262
54. :doc:`VisualBERT <model_doc/visual_bert>` (from UCLA NLP) released with the paper `VisualBERT: A Simple and
    Performant Baseline for Vision and Language <https://arxiv.org/pdf/1908.03557>`__ by Liunian Harold Li, Mark
    Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
55. :doc:`Wav2Vec2 <model_doc/wav2vec2>` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for
Patrick von Platen's avatar
Patrick von Platen committed
263
264
    Self-Supervised Learning of Speech Representations <https://arxiv.org/abs/2006.11477>`__ by Alexei Baevski, Henry
    Zhou, Abdelrahman Mohamed, Michael Auli.
Gunjan Chhablani's avatar
Gunjan Chhablani committed
265
56. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
266
    Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
Gunjan Chhablani's avatar
Gunjan Chhablani committed
267
57. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
Lysandre's avatar
Lysandre committed
268
269
    Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan,
    Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
Gunjan Chhablani's avatar
Gunjan Chhablani committed
270
58. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
271
272
273
    Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay
    Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
    Zettlemoyer and Veselin Stoyanov.
Gunjan Chhablani's avatar
Gunjan Chhablani committed
274
59. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `​XLNet: Generalized Autoregressive
275
276
    Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming
    Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
Gunjan Chhablani's avatar
Gunjan Chhablani committed
277
60. :doc:`XLSR-Wav2Vec2 <model_doc/xlsr_wav2vec2>` (from Facebook AI) released with the paper `Unsupervised
278
279
    Cross-Lingual Representation Learning For Speech Recognition <https://arxiv.org/abs/2006.13979>`__ by Alexis
    Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
LysandreJik's avatar
LysandreJik committed
280

Sylvain Gugger's avatar
Sylvain Gugger committed
281

282
283
Supported frameworks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
284

Sylvain Gugger's avatar
Sylvain Gugger committed
285
The table below represents the current support in the library for each of those models, whether they have a Python
286
287
tokenizer (called "slow"). A "fast" tokenizer backed by the 🤗 Tokenizers library, whether they have support in Jax (via
Flax), PyTorch, and/or TensorFlow.
Sylvain Gugger's avatar
Sylvain Gugger committed
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304

..
    This table is updated automatically from the auto modules with `make fix-copies`. Do not update manually!

.. rst-class:: center-aligned-table

+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            Model            | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax Support |
+=============================+================+================+=================+====================+==============+
|           ALBERT            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            BART             |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            BERT             |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       Bert Generation       |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
305
|           BigBird           |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
Vasudev Gupta's avatar
Vasudev Gupta committed
306
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Vasudev Gupta's avatar
Vasudev Gupta committed
307
308
|       BigBirdPegasus        |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Sylvain Gugger's avatar
Sylvain Gugger committed
309
310
|         Blenderbot          |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
311
|       BlenderbotSmall       |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
312
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Suraj Patil's avatar
Suraj Patil committed
313
|            CLIP             |       ✅       |       ✅       |       ✅        |         ❌         |      ✅      |
Suraj Patil's avatar
Suraj Patil committed
314
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Sylvain Gugger's avatar
Sylvain Gugger committed
315
316
317
318
|            CTRL             |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          CamemBERT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
abhishek thakur's avatar
abhishek thakur committed
319
320
|          ConvBERT           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Sylvain Gugger's avatar
Sylvain Gugger committed
321
322
|             DPR             |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
323
|           DeBERTa           |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
Sylvain Gugger's avatar
Sylvain Gugger committed
324
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
325
326
|         DeBERTa-v2          |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
NielsRogge's avatar
NielsRogge committed
327
328
|            DeiT             |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Sylvain Gugger's avatar
Sylvain Gugger committed
329
330
|         DistilBERT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
331
|           ELECTRA           |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
Sylvain Gugger's avatar
Sylvain Gugger committed
332
333
334
335
336
337
338
339
340
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       Encoder decoder       |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| FairSeq Machine-Translation |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          FlauBERT           |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|     Funnel Transformer      |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Suraj Patil's avatar
Suraj Patil committed
341
342
|           GPT Neo           |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Sehoon Kim's avatar
Sehoon Kim committed
343
344
|           I-BERT            |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Patrick von Platen's avatar
Patrick von Platen committed
345
346
|             LED             |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
NielsRogge's avatar
NielsRogge committed
347
348
|            LUKE             |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Sylvain Gugger's avatar
Sylvain Gugger committed
349
350
|           LXMERT            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Amir Tahmasbi's avatar
Amir Tahmasbi committed
351
|          LayoutLM           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
Sylvain Gugger's avatar
Sylvain Gugger committed
352
353
354
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         Longformer          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Suraj Patil's avatar
Suraj Patil committed
355
356
|           M2M100            |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
StillKeepTry's avatar
StillKeepTry committed
357
358
|            MPNet            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Sylvain Gugger's avatar
Sylvain Gugger committed
359
360
|           Marian            |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
361
362
|        MegatronBert         |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Sylvain Gugger's avatar
Sylvain Gugger committed
363
364
365
366
|         MobileBERT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         OpenAI GPT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Suraj Patil's avatar
Suraj Patil committed
367
|        OpenAI GPT-2         |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
Sylvain Gugger's avatar
Sylvain Gugger committed
368
369
370
371
372
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           Pegasus           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         ProphetNet          |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Ratthachat (Jung)'s avatar
Ratthachat (Jung) committed
373
|             RAG             |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
Sylvain Gugger's avatar
Sylvain Gugger committed
374
375
376
377
378
379
380
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          Reformer           |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          RetriBERT          |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           RoBERTa           |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
381
382
|          RoFormer           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Suraj Patil's avatar
Suraj Patil committed
383
384
|         Speech2Text         |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Sylvain Gugger's avatar
Sylvain Gugger committed
385
386
387
388
|         SqueezeBERT         |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             T5              |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
NielsRogge's avatar
NielsRogge committed
389
390
|            TAPAS            |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Sylvain Gugger's avatar
Sylvain Gugger committed
391
392
|       Transformer-XL        |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
393
394
|             ViT             |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Gunjan Chhablani's avatar
Gunjan Chhablani committed
395
396
|         VisualBert          |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Patrick von Platen's avatar
Patrick von Platen committed
397
398
|          Wav2Vec2           |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
Sylvain Gugger's avatar
Sylvain Gugger committed
399
400
401
402
403
404
405
406
407
408
409
410
411
|             XLM             |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         XLM-RoBERTa         |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|        XLMProphetNet        |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            XLNet            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            mBART            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             mT5             |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+

412
413
.. toctree::
    :maxdepth: 2
414
    :caption: Get started
415

Sylvain Gugger's avatar
Sylvain Gugger committed
416
    quicktour
417
    installation
Sylvain Gugger's avatar
Sylvain Gugger committed
418
    philosophy
Lysandre's avatar
Lysandre committed
419
    glossary
420
421
422

.. toctree::
    :maxdepth: 2
Sylvain Gugger's avatar
Sylvain Gugger committed
423
    :caption: Using 🤗 Transformers
424

Sylvain Gugger's avatar
Sylvain Gugger committed
425
426
    task_summary
    model_summary
Sylvain Gugger's avatar
Sylvain Gugger committed
427
    preprocessing
428
    training
429
    model_sharing
Sylvain Gugger's avatar
Sylvain Gugger committed
430
    tokenizer_summary
431
432
433
434
435
436
437
    multilingual

.. toctree::
    :maxdepth: 2
    :caption: Advanced guides

    pretrained_models
438
    examples
439
    troubleshooting
440
    custom_datasets
441
    notebooks
442
    sagemaker
443
    community
444
    converting_tensorflow_models
445
    migration
446
    contributing
447
    add_new_model
448
    fast_tokenizers
449
    testing
450
    debugging
Funtowicz Morgan's avatar
Funtowicz Morgan committed
451
    serialization
452
453
454
455
456
457

.. toctree::
    :maxdepth: 2
    :caption: Research

    bertology
458
    perplexity
459
    benchmarks
460

thomwolf's avatar
thomwolf committed
461
462
.. toctree::
    :maxdepth: 2
463
    :caption: Main Classes
thomwolf's avatar
thomwolf committed
464

Sylvain Gugger's avatar
Sylvain Gugger committed
465
    main_classes/callback
thomwolf's avatar
thomwolf committed
466
    main_classes/configuration
467
    main_classes/data_collator
468
    main_classes/logging
thomwolf's avatar
thomwolf committed
469
470
    main_classes/model
    main_classes/optimizer_schedules
471
472
    main_classes/output
    main_classes/pipelines
LysandreJik's avatar
LysandreJik committed
473
    main_classes/processors
474
475
    main_classes/tokenizer
    main_classes/trainer
476
    main_classes/deepspeed
477
    main_classes/feature_extractor
478
479
480
481
482
483

.. toctree::
    :maxdepth: 2
    :caption: Models

    model_doc/albert
thomwolf's avatar
thomwolf committed
484
    model_doc/auto
485
    model_doc/bart
486
    model_doc/barthez
487
    model_doc/bert
488
    model_doc/bertweet
489
    model_doc/bertgeneration
490
    model_doc/bert_japanese
Vasudev Gupta's avatar
Vasudev Gupta committed
491
    model_doc/bigbird
Vasudev Gupta's avatar
Vasudev Gupta committed
492
    model_doc/bigbird_pegasus
Sam Shleifer's avatar
Sam Shleifer committed
493
    model_doc/blenderbot
494
    model_doc/blenderbot_small
Stefan Schweter's avatar
Stefan Schweter committed
495
    model_doc/bort
Patrick von Platen's avatar
Patrick von Platen committed
496
    model_doc/byt5
Lysandre's avatar
Lysandre committed
497
    model_doc/camembert
Suraj Patil's avatar
Suraj Patil committed
498
    model_doc/clip
abhishek thakur's avatar
abhishek thakur committed
499
    model_doc/convbert
500
    model_doc/cpm
501
    model_doc/ctrl
Pengcheng He's avatar
Pengcheng He committed
502
    model_doc/deberta
503
    model_doc/deberta_v2
NielsRogge's avatar
NielsRogge committed
504
    model_doc/deit
505
    model_doc/dialogpt
506
    model_doc/distilbert
Quentin Lhoest's avatar
Quentin Lhoest committed
507
    model_doc/dpr
508
509
510
    model_doc/electra
    model_doc/encoderdecoder
    model_doc/flaubert
511
    model_doc/fsmt
Sylvain Gugger's avatar
Sylvain Gugger committed
512
    model_doc/funnel
513
    model_doc/herbert
Sehoon Kim's avatar
Sehoon Kim committed
514
    model_doc/ibert
Minghao Li's avatar
Minghao Li committed
515
    model_doc/layoutlm
Patrick von Platen's avatar
Patrick von Platen committed
516
    model_doc/led
517
    model_doc/longformer
NielsRogge's avatar
NielsRogge committed
518
    model_doc/luke
519
520
    model_doc/lxmert
    model_doc/marian
Suraj Patil's avatar
Suraj Patil committed
521
    model_doc/m2m_100
522
    model_doc/mbart
523
524
    model_doc/megatron_bert
    model_doc/megatron_gpt2
525
    model_doc/mobilebert
StillKeepTry's avatar
StillKeepTry committed
526
    model_doc/mpnet
Patrick von Platen's avatar
Patrick von Platen committed
527
    model_doc/mt5
528
529
    model_doc/gpt
    model_doc/gpt2
Suraj Patil's avatar
Suraj Patil committed
530
    model_doc/gpt_neo
531
    model_doc/pegasus
532
    model_doc/phobert
Weizhen's avatar
Weizhen committed
533
    model_doc/prophetnet
Sylvain Gugger's avatar
Sylvain Gugger committed
534
    model_doc/rag
535
536
537
    model_doc/reformer
    model_doc/retribert
    model_doc/roberta
538
    model_doc/roformer
Suraj Patil's avatar
Suraj Patil committed
539
    model_doc/speech_to_text
540
    model_doc/squeezebert
541
    model_doc/t5
NielsRogge's avatar
NielsRogge committed
542
    model_doc/tapas
543
    model_doc/transformerxl
544
    model_doc/vit
Gunjan Chhablani's avatar
Gunjan Chhablani committed
545
    model_doc/visual_bert
Patrick von Platen's avatar
Patrick von Platen committed
546
    model_doc/wav2vec2
547
    model_doc/xlm
Weizhen's avatar
Weizhen committed
548
    model_doc/xlmprophetnet
549
550
    model_doc/xlmroberta
    model_doc/xlnet
551
    model_doc/xlsr_wav2vec2
552
553
554
555
556

.. toctree::
    :maxdepth: 2
    :caption: Internal Helpers

Sylvain Gugger's avatar
Sylvain Gugger committed
557
    internal/modeling_utils
558
    internal/pipelines_utils
559
    internal/tokenization_utils
Sylvain Gugger's avatar
Sylvain Gugger committed
560
    internal/trainer_utils
561
    internal/generation_utils
562
    internal/file_utils