Transformers
=======================================================================================================================

State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.

🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32 pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.

This is the documentation of our repository `transformers <https://github.com/huggingface/transformers>`_.

Features
-----------------------------------------------------------------------------------------------------------------------

- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners

State-of-the-art NLP for everyone:

- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators

.. 
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Lower compute costs, smaller carbon footprint:

- Researchers can share trained models instead of always retraining
- Practitioners can reduce compute time and production costs
- 8 architectures with over 30 pretrained models, some in more than 100 languages

Choose the right framework for every part of a model's lifetime:

- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models
- Move a single model between TF2.0/PyTorch frameworks at will
- Seamlessly pick the right framework for training, evaluation, production
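
As a rough illustration of moving one model between frameworks, the sketch below saves a PyTorch model and reloads the
same weights as a TensorFlow 2.0 model. It assumes ``transformers``, ``torch`` and ``tensorflow`` are all installed;
the checkpoint name ``bert-base-uncased`` and the ``./bert-checkpoint`` directory are illustrative choices, not
requirements of the library.

```python
def round_trip_pytorch_to_tensorflow(checkpoint="bert-base-uncased", save_dir="./bert-checkpoint"):
    """Save a PyTorch model to disk, then reload the same weights as a TensorFlow 2 model."""
    # Imported lazily so that defining this helper does not require the heavy dependencies;
    # calling it needs `transformers`, `torch` and `tensorflow` to be installed.
    from transformers import AutoModel, TFAutoModel

    # Load the PyTorch weights (downloaded from the model hub on first use).
    pt_model = AutoModel.from_pretrained(checkpoint)

    # Write the configuration and the PyTorch weights to a local directory.
    pt_model.save_pretrained(save_dir)

    # Reload in TensorFlow; `from_pt=True` asks the library to convert the PyTorch weights on load.
    tf_model = TFAutoModel.from_pretrained(save_dir, from_pt=True)
    return pt_model, tf_model
```

Calling ``round_trip_pytorch_to_tensorflow()`` downloads the checkpoint, so it needs network access the first time it
runs. The reverse direction is expected to work the same way with ``from_tf=True`` on the PyTorch class.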

Flax support is experimental and currently covers only a few models; coverage is expected to grow in the coming months.

`All the model checkpoints <https://huggingface.co/models>`__ are seamlessly integrated from the huggingface.co `model
hub <https://huggingface.co>`__ where they are uploaded directly by `users <https://huggingface.co/users>`__ and
`organizations <https://huggingface.co/organizations>`__.

Current number of checkpoints: |checkpoints|

.. |checkpoints| image:: https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen

Contents
-----------------------------------------------------------------------------------------------------------------------

The documentation is organized in five parts:

- **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy
  and a glossary.
- **USING 🤗 TRANSFORMERS** contains general tutorials on how to use the library.
- **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
- **RESEARCH** focuses on tutorials that have less to do with how to use the library and more to do with general
  research on transformer models.
- The last three sections contain the documentation of each public class and function, grouped in:

    - **MAIN CLASSES** for the main classes exposing the important APIs of the library.
    - **MODELS** for the classes and functions related to each model implemented in the library.
    - **INTERNAL HELPERS** for the classes and functions we use internally.

The library currently contains PyTorch, TensorFlow and Flax implementations, pretrained model weights, usage scripts
and conversion utilities for the following models:

..
    This list is updated automatically from the README with `make fix-copies`. Do not update manually!

1. :doc:`ALBERT <model_doc/albert>` (from Google Research and the Toyota Technological Institute at Chicago) released
   with the paper `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
   <https://arxiv.org/abs/1909.11942>`__, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush
   Sharma, Radu Soricut.
2. :doc:`BART <model_doc/bart>` (from Facebook) released with the paper `BART: Denoising Sequence-to-Sequence
   Pre-training for Natural Language Generation, Translation, and Comprehension
   <https://arxiv.org/pdf/1910.13461.pdf>`__ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman
   Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
3. :doc:`BARThez <model_doc/barthez>` (from École polytechnique) released with the paper `BARThez: a Skilled Pretrained
   French Sequence-to-Sequence Model <https://arxiv.org/abs/2010.12321>`__ by Moussa Kamal Eddine, Antoine J.-P.
   Tixier, Michalis Vazirgiannis.
4. :doc:`BERT <model_doc/bert>` (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional
   Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`__ by Jacob Devlin, Ming-Wei Chang,
   Kenton Lee and Kristina Toutanova.
5. :doc:`BERT For Sequence Generation <model_doc/bertgeneration>` (from Google) released with the paper `Leveraging
   Pre-trained Checkpoints for Sequence Generation Tasks <https://arxiv.org/abs/1907.12461>`__ by Sascha Rothe, Shashi
   Narayan, Aliaksei Severyn.
6. :doc:`Blenderbot <model_doc/blenderbot>` (from Facebook) released with the paper `Recipes for building an
   open-domain chatbot <https://arxiv.org/abs/2004.13637>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary
   Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
7. :doc:`BlenderbotSmall <model_doc/blenderbot_small>` (from Facebook) released with the paper `Recipes for building an
   open-domain chatbot <https://arxiv.org/abs/2004.13637>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary
   Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
8. :doc:`BORT <model_doc/bort>` (from Alexa) released with the paper `Optimal Subarchitecture Extraction For BERT
   <https://arxiv.org/abs/2010.10499>`__ by Adrian de Wynter and Daniel J. Perry.
9. :doc:`CamemBERT <model_doc/camembert>` (from Inria/Facebook/Sorbonne) released with the paper `CamemBERT: a Tasty
   French Language Model <https://arxiv.org/abs/1911.03894>`__ by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz
   Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
10. :doc:`ConvBERT <model_doc/convbert>` (from YituTech) released with the paper `ConvBERT: Improving BERT with
    Span-based Dynamic Convolution <https://arxiv.org/abs/2008.02496>`__ by Zihang Jiang, Weihao Yu, Daquan Zhou,
    Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
11. :doc:`CTRL <model_doc/ctrl>` (from Salesforce) released with the paper `CTRL: A Conditional Transformer Language
    Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`__ by Nitish Shirish Keskar*, Bryan McCann*,
    Lav R. Varshney, Caiming Xiong and Richard Socher.
12. :doc:`DeBERTa <model_doc/deberta>` (from Microsoft Research) released with the paper `DeBERTa: Decoding-enhanced
    BERT with Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao,
    Weizhu Chen.
13. :doc:`DialoGPT <model_doc/dialogpt>` (from Microsoft Research) released with the paper `DialoGPT: Large-Scale
    Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`__ by Yizhe
    Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
14. :doc:`DistilBERT <model_doc/distilbert>` (from HuggingFace), released together with the paper `DistilBERT, a
    distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__ by Victor
    Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into `DistilGPT2
    <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, RoBERTa into `DistilRoBERTa
    <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, Multilingual BERT into
    `DistilmBERT <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__ and a German
    version of DistilBERT.
15. :doc:`DPR <model_doc/dpr>` (from Facebook) released with the paper `Dense Passage Retrieval for Open-Domain
    Question Answering <https://arxiv.org/abs/2004.04906>`__ by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick
    Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
16. :doc:`ELECTRA <model_doc/electra>` (from Google Research/Stanford University) released with the paper `ELECTRA:
    Pre-training text encoders as discriminators rather than generators <https://arxiv.org/abs/2003.10555>`__ by Kevin
    Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
17. :doc:`FlauBERT <model_doc/flaubert>` (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model
    Pre-training for French <https://arxiv.org/abs/1912.05372>`__ by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne,
    Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
18. :doc:`Funnel Transformer <model_doc/funnel>` (from CMU/Google Brain) released with the paper `Funnel-Transformer:
    Filtering out Sequential Redundancy for Efficient Language Processing <https://arxiv.org/abs/2006.03236>`__ by
    Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
19. :doc:`GPT <model_doc/gpt>` (from OpenAI) released with the paper `Improving Language Understanding by Generative
    Pre-Training <https://blog.openai.com/language-unsupervised/>`__ by Alec Radford, Karthik Narasimhan, Tim Salimans
    and Ilya Sutskever.
20. :doc:`GPT-2 <model_doc/gpt2>` (from OpenAI) released with the paper `Language Models are Unsupervised Multitask
    Learners <https://blog.openai.com/better-language-models/>`__ by Alec Radford*, Jeffrey Wu*, Rewon Child, David
    Luan, Dario Amodei** and Ilya Sutskever**.
21. :doc:`LayoutLM <model_doc/layoutlm>` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training
    of Text and Layout for Document Image Understanding <https://arxiv.org/abs/1912.13318>`__ by Yiheng Xu, Minghao Li,
    Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
22. :doc:`LED <model_doc/led>` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer
    <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
23. :doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper `Longformer: The Long-Document
    Transformer <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
24. :doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality
    Encoder Representations from Transformers for Open-Domain Question Answering <https://arxiv.org/abs/1908.07490>`__
    by Hao Tan and Mohit Bansal.
25. :doc:`MarianMT <model_doc/marian>` Machine translation models trained using `OPUS <http://opus.nlpl.eu/>`__ data by
    Jörg Tiedemann. The `Marian Framework <https://marian-nmt.github.io/>`__ is being developed by the Microsoft
    Translator Team.
26. :doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Denoising Pre-training for
    Neural Machine Translation <https://arxiv.org/abs/2001.08210>`__ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li,
    Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
27. :doc:`MBart-50 <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Translation with Extensible
    Multilingual Pretraining and Finetuning <https://arxiv.org/abs/2008.00401>`__ by Yuqing Tang, Chau Tran, Xian Li,
    Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
28. :doc:`MPNet <model_doc/mpnet>` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted
    Pre-training for Language Understanding <https://arxiv.org/abs/2004.09297>`__ by Kaitao Song, Xu Tan, Tao Qin,
    Jianfeng Lu, Tie-Yan Liu.
29. :doc:`MT5 <model_doc/mt5>` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained
    text-to-text transformer <https://arxiv.org/abs/2010.11934>`__ by Linting Xue, Noah Constant, Adam Roberts, Mihir
    Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
30. :doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper `PEGASUS: Pre-training with Extracted
    Gap-sentences for Abstractive Summarization <https://arxiv.org/abs/1912.08777>`__ by Jingqing Zhang, Yao Zhao,
    Mohammad Saleh and Peter J. Liu.
31. :doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting
    Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi,
    Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
32. :doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper `Reformer: The Efficient
    Transformer <https://arxiv.org/abs/2001.04451>`__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
33. :doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper `RoBERTa: A Robustly Optimized
    BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`__ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du,
    Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
34. :doc:`SqueezeBert <model_doc/squeezebert>` released with the paper `SqueezeBERT: What can computer vision teach NLP
    about efficient neural networks? <https://arxiv.org/abs/2006.11316>`__ by Forrest N. Iandola, Albert E. Shaw, Ravi
    Krishna, and Kurt W. Keutzer.
35. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
    Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel and Noam Shazeer and Adam
    Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
36. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
    Pre-training <https://arxiv.org/abs/2004.02349>`__ by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller,
    Francesco Piccinno and Julian Martin Eisenschlos.
37. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
    Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*,
    Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
38. :doc:`Wav2Vec2 <model_doc/wav2vec2>` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for
    Self-Supervised Learning of Speech Representations <https://arxiv.org/abs/2006.11477>`__ by Alexei Baevski, Henry
    Zhou, Abdelrahman Mohamed, Michael Auli.
39. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
    Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
40. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
    Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan,
    Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
41. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
    Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay
    Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
    Zettlemoyer and Veselin Stoyanov.
42. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
    Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming
    Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.


.. _bigtable:

The table below shows the current support in the library for each of those models: whether they have a Python
tokenizer (called "slow"), a "fast" tokenizer backed by the 🤗 Tokenizers library, and whether they are supported in
PyTorch, TensorFlow and/or Flax.

..
    This table is updated automatically from the auto modules with `make fix-copies`. Do not update manually!

.. rst-class:: center-aligned-table

+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            Model            | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax Support |
+=============================+================+================+=================+====================+==============+
|           ALBERT            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            BART             |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            BERT             |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       Bert Generation       |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         Blenderbot          |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       BlenderbotSmall       |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            CTRL             |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          CamemBERT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          ConvBERT           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             DPR             |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           DeBERTa           |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         DistilBERT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           ELECTRA           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       Encoder decoder       |       ❌       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| FairSeq Machine-Translation |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          FlauBERT           |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|     Funnel Transformer      |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             LED             |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           LXMERT            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          LayoutLM           |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         Longformer          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            MPNet            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           Marian            |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         MobileBERT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         OpenAI GPT          |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|        OpenAI GPT-2         |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           Pegasus           |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         ProphetNet          |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             RAG             |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          Reformer           |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          RetriBERT          |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|           RoBERTa           |       ✅       |       ✅       |       ✅        |         ✅         |      ✅      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         SqueezeBERT         |       ✅       |       ✅       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             T5              |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            TAPAS            |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|       Transformer-XL        |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|          Wav2Vec2           |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             XLM             |       ✅       |       ❌       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|         XLM-RoBERTa         |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|        XLMProphetNet        |       ✅       |       ❌       |       ✅        |         ❌         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            XLNet            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|            mBART            |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|             mT5             |       ✅       |       ✅       |       ✅        |         ✅         |      ❌      |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+

.. toctree::
    :maxdepth: 2
    :caption: Get started

    quicktour
    installation
    philosophy
    glossary

.. toctree::
    :maxdepth: 2
    :caption: Using 🤗 Transformers

    task_summary
    model_summary
    preprocessing
    training
    model_sharing
    tokenizer_summary
    multilingual

.. toctree::
    :maxdepth: 2
    :caption: Advanced guides

    pretrained_models
    examples
    custom_datasets
    notebooks
    community
    converting_tensorflow_models
    migration
    contributing
    add_new_model
    testing
    serialization

.. toctree::
    :maxdepth: 2
    :caption: Research

    bertology
    perplexity
    benchmarks

.. toctree::
    :maxdepth: 2
    :caption: Main Classes

    main_classes/callback
    main_classes/configuration
    main_classes/logging
    main_classes/model
    main_classes/optimizer_schedules
    main_classes/output
    main_classes/pipelines
    main_classes/processors
    main_classes/tokenizer
    main_classes/trainer

.. toctree::
    :maxdepth: 2
    :caption: Models

    model_doc/albert
    model_doc/auto
    model_doc/bart
    model_doc/barthez
    model_doc/bert
    model_doc/bertweet
    model_doc/bertgeneration
    model_doc/blenderbot
    model_doc/blenderbot_small
    model_doc/bort
    model_doc/camembert
    model_doc/convbert
    model_doc/ctrl
    model_doc/deberta
    model_doc/dialogpt
    model_doc/distilbert
    model_doc/dpr
    model_doc/electra
    model_doc/encoderdecoder
    model_doc/flaubert
    model_doc/fsmt
    model_doc/funnel
    model_doc/herbert
    model_doc/layoutlm
    model_doc/led
    model_doc/longformer
    model_doc/lxmert
    model_doc/marian
    model_doc/mbart
    model_doc/mobilebert
    model_doc/mpnet
    model_doc/mt5
    model_doc/gpt
    model_doc/gpt2
    model_doc/pegasus
    model_doc/phobert
    model_doc/prophetnet
    model_doc/rag
    model_doc/reformer
    model_doc/retribert
    model_doc/roberta
    model_doc/squeezebert
    model_doc/t5
    model_doc/tapas
    model_doc/transformerxl
    model_doc/wav2vec2
    model_doc/xlm
    model_doc/xlmprophetnet
    model_doc/xlmroberta
    model_doc/xlnet

.. toctree::
    :maxdepth: 2
    :caption: Internal Helpers

    internal/modeling_utils
    internal/pipelines_utils
    internal/tokenization_utils
    internal/trainer_utils
    internal/generation_utils