Transformers
=======================================================================================================================

State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.

馃 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
9
10
11
TensorFlow 2.0 and PyTorch.

This is the documentation of our repository `transformers <https://github.com/huggingface/transformers>`_.

Features
-----------------------------------------------------------------------------------------------------------------------

- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners (see the quick example below)
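
As a quick illustration of this low barrier to entry, a pretrained model can be put to work in a couple of lines
with the ``pipeline`` API. A minimal sketch (the quick tour covers it in detail; the printed scores are
illustrative):

.. code-block:: python

    from transformers import pipeline

    # Download and cache a default pretrained model, then run inference with it.
    classifier = pipeline("sentiment-analysis")
    print(classifier("We are very happy to include pipelines in the transformers repository."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.9997}]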

State-of-the-art NLP for everyone:

- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators

Lower compute costs, smaller carbon footprint:

- Researchers can share trained models instead of always retraining (see the sketch after this list)
- Practitioners can reduce compute time and production costs
- Over 30 architectures with pretrained models, some in more than 100 languages
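
For instance, reusing a shared checkpoint and saving a fine-tuned copy for others boils down to a few calls. A
minimal sketch (``./my-fine-tuned-bert`` is a hypothetical output directory; the model sharing tutorial covers the
full workflow):

.. code-block:: python

    from transformers import AutoModel, AutoTokenizer

    # Reuse a checkpoint trained by someone else instead of retraining from scratch.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    # ... fine-tune the model on your task ...

    # Save the weights and tokenizer so they can be shared in turn.
    model.save_pretrained("./my-fine-tuned-bert")
    tokenizer.save_pretrained("./my-fine-tuned-bert")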

Choose the right framework for every part of a model's lifetime:

- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models
- Move a single model between TF2.0/PyTorch frameworks at will (see the sketch after this list)
- Seamlessly pick the right framework for training, evaluation and production
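
A minimal sketch of this interoperability (``./shared-checkpoint`` is a hypothetical directory): a checkpoint saved
from a PyTorch model can be reloaded as its TensorFlow 2.0 counterpart with the ``from_pt`` flag, and back again
with ``from_tf``:

.. code-block:: python

    from transformers import AutoModel, TFAutoModel

    # Train or fine-tune in PyTorch, then save a checkpoint...
    pt_model = AutoModel.from_pretrained("bert-base-uncased")
    pt_model.save_pretrained("./shared-checkpoint")

    # ...and reload the same weights as a TensorFlow 2.0 model, e.g. for serving.
    tf_model = TFAutoModel.from_pretrained("./shared-checkpoint", from_pt=True)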

Contents
-----------------------------------------------------------------------------------------------------------------------

The documentation is organized into five parts:

- **GET STARTED** contains a quick tour, the installation instructions, some useful information about our philosophy,
  and a glossary.
- **USING 馃 TRANSFORMERS** contains general tutorials on how to use the library.
- **ADVANCED GUIDES** contains more advanced guides that are specific to a given script or part of the library.
- **RESEARCH** focuses on tutorials that have less to do with how to use the library and more with general research on
  transformers models.
- The last three sections contain the documentation of each public class and function, grouped in:

    - **MAIN CLASSES** for the main classes exposing the important APIs of the library.
    - **MODELS** for the classes and functions related to each model implemented in the library.
    - **INTERNAL HELPERS** for the classes and functions we use internally.

The library currently contains PyTorch and TensorFlow implementations, pre-trained model weights, usage scripts and
conversion utilities for the following models:

..
    This list is updated automatically from the README with `make fix-copies`. Do not update manually!

1. :doc:`ALBERT <model_doc/albert>` (from Google Research and the Toyota Technological Institute at Chicago) released
   with the paper `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
   <https://arxiv.org/abs/1909.11942>`__ by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush
   Sharma, Radu Soricut.
2. :doc:`BART <model_doc/bart>` (from Facebook) released with the paper `BART: Denoising Sequence-to-Sequence
   Pre-training for Natural Language Generation, Translation, and Comprehension
   <https://arxiv.org/pdf/1910.13461.pdf>`__ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman
   Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
3. :doc:`BERT <model_doc/bert>` (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional
   Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`__ by Jacob Devlin, Ming-Wei Chang,
   Kenton Lee and Kristina Toutanova.
4. :doc:`BERT For Sequence Generation <model_doc/bertgeneration>` (from Google) released with the paper `Leveraging
   Pre-trained Checkpoints for Sequence Generation Tasks <https://arxiv.org/abs/1907.12461>`__ by Sascha Rothe, Shashi
   Narayan, Aliaksei Severyn.
5. :doc:`Blenderbot <model_doc/blenderbot>` (from Facebook) released with the paper `Recipes for building an
   open-domain chatbot <https://arxiv.org/abs/2004.13637>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary
   Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
6. :doc:`CamemBERT <model_doc/camembert>` (from Inria/Facebook/Sorbonne) released with the paper `CamemBERT: a Tasty
   French Language Model <https://arxiv.org/abs/1911.03894>`__ by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz
   Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
7. :doc:`CTRL <model_doc/ctrl>` (from Salesforce) released with the paper `CTRL: A Conditional Transformer Language
   Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`__ by Nitish Shirish Keskar*, Bryan McCann*,
   Lav R. Varshney, Caiming Xiong and Richard Socher.
8. :doc:`DeBERTa <model_doc/deberta>` (from Microsoft Research) released with the paper `DeBERTa: Decoding-enhanced
   BERT with Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao,
   Weizhu Chen.
9. :doc:`DialoGPT <model_doc/dialogpt>` (from Microsoft Research) released with the paper `DialoGPT: Large-Scale
   Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`__ by Yizhe Zhang,
   Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
10. :doc:`DistilBERT <model_doc/distilbert>` (from HuggingFace), released together with the paper `DistilBERT, a
    distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__ by Victor
    Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into `DistilGPT2
    <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, RoBERTa into `DistilRoBERTa
    <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, Multilingual BERT into
    `DistilmBERT <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__ and a German
    version of DistilBERT.
11. :doc:`DPR <model_doc/dpr>` (from Facebook) released with the paper `Dense Passage Retrieval for Open-Domain
    Question Answering <https://arxiv.org/abs/2004.04906>`__ by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick
    Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
12. :doc:`ELECTRA <model_doc/electra>` (from Google Research/Stanford University) released with the paper `ELECTRA:
    Pre-training text encoders as discriminators rather than generators <https://arxiv.org/abs/2003.10555>`__ by Kevin
    Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
13. :doc:`FlauBERT <model_doc/flaubert>` (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model
    Pre-training for French <https://arxiv.org/abs/1912.05372>`__ by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne,
    Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
14. :doc:`Funnel Transformer <model_doc/funnel>` (from CMU/Google Brain) released with the paper `Funnel-Transformer:
    Filtering out Sequential Redundancy for Efficient Language Processing <https://arxiv.org/abs/2006.03236>`__ by
    Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
15. :doc:`GPT <model_doc/gpt>` (from OpenAI) released with the paper `Improving Language Understanding by Generative
    Pre-Training <https://blog.openai.com/language-unsupervised/>`__ by Alec Radford, Karthik Narasimhan, Tim Salimans
    and Ilya Sutskever.
16. :doc:`GPT-2 <model_doc/gpt2>` (from OpenAI) released with the paper `Language Models are Unsupervised Multitask
    Learners <https://blog.openai.com/better-language-models/>`__ by Alec Radford*, Jeffrey Wu*, Rewon Child, David
    Luan, Dario Amodei** and Ilya Sutskever**.
17. :doc:`LayoutLM <model_doc/layoutlm>` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training
    of Text and Layout for Document Image Understanding <https://arxiv.org/abs/1912.13318>`__ by Yiheng Xu, Minghao Li,
    Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
18. :doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper `Longformer: The Long-Document
    Transformer <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
19. :doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality
    Encoder Representations from Transformers <https://arxiv.org/abs/1908.07490>`__ by Hao Tan and Mohit Bansal.
20. :doc:`MarianMT <model_doc/marian>` Machine translation models trained using `OPUS <http://opus.nlpl.eu/>`__ data by
    Jörg Tiedemann. The `Marian Framework <https://marian-nmt.github.io/>`__ is being developed by the Microsoft
    Translator Team.
21. :doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Denoising Pre-training for
    Neural Machine Translation <https://arxiv.org/abs/2001.08210>`__ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li,
    Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
22. :doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper `PEGASUS: Pre-training with Extracted
    Gap-sentences for Abstractive Summarization <https://arxiv.org/abs/1912.08777>`__ by Jingqing Zhang, Yao Zhao,
    Mohammad Saleh and Peter J. Liu.
23. :doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting
    Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi,
    Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
24. :doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper `Reformer: The Efficient
    Transformer <https://arxiv.org/abs/2001.04451>`__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
25. :doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper `RoBERTa: A Robustly Optimized BERT
    Pretraining Approach <https://arxiv.org/abs/1907.11692>`__ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar
    Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
26. :doc:`SqueezeBert <model_doc/squeezebert>` released with the paper `SqueezeBERT: What can computer vision teach NLP
    about efficient neural networks? <https://arxiv.org/abs/2006.11316>`__ by Forrest N. Iandola, Albert E. Shaw, Ravi
    Krishna, and Kurt W. Keutzer.
27. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
    Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel, Noam Shazeer, Adam
    Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li and Peter J. Liu.
28. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
    Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*,
    Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
29. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
    Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
30. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
    Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan,
    Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
31. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
    Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay
    Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
    Zettlemoyer and Veselin Stoyanov.
32. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
    Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming
    Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
33. `Other community models <https://huggingface.co/models>`__, contributed by the `community
    <https://huggingface.co/users>`__.

.. toctree::
    :maxdepth: 2
    :caption: Get started

    quicktour
    installation
    philosophy
    glossary

.. toctree::
    :maxdepth: 2
    :caption: Using 🤗 Transformers

    task_summary
    model_summary
    preprocessing
    training
    model_sharing
    tokenizer_summary
    multilingual

.. toctree::
    :maxdepth: 2
    :caption: Advanced guides

    pretrained_models
    examples
    custom_datasets
    notebooks
    converting_tensorflow_models
    migration
    contributing
    testing
    serialization

.. toctree::
    :maxdepth: 2
    :caption: Research

    bertology
    perplexity
    benchmarks

.. toctree::
    :maxdepth: 2
    :caption: Main Classes

    main_classes/callback
    main_classes/configuration
    main_classes/logging
    main_classes/model
    main_classes/optimizer_schedules
    main_classes/output
    main_classes/pipelines
    main_classes/processors
    main_classes/tokenizer
    main_classes/trainer

.. toctree::
    :maxdepth: 2
    :caption: Models

    model_doc/albert
    model_doc/auto
    model_doc/bart
    model_doc/bert
    model_doc/bertgeneration
    model_doc/blenderbot
    model_doc/camembert
    model_doc/ctrl
    model_doc/deberta
    model_doc/dialogpt
    model_doc/distilbert
    model_doc/dpr
    model_doc/electra
    model_doc/encoderdecoder
    model_doc/flaubert
    model_doc/fsmt
    model_doc/funnel
    model_doc/layoutlm
    model_doc/longformer
    model_doc/lxmert
    model_doc/marian
    model_doc/mbart
    model_doc/mobilebert
    model_doc/gpt
    model_doc/gpt2
    model_doc/pegasus
    model_doc/prophetnet
    model_doc/rag
    model_doc/reformer
    model_doc/retribert
    model_doc/roberta
    model_doc/squeezebert
    model_doc/t5
    model_doc/transformerxl
    model_doc/xlm
    model_doc/xlmprophetnet
    model_doc/xlmroberta
    model_doc/xlnet

.. toctree::
    :maxdepth: 2
    :caption: Internal Helpers

    internal/modeling_utils
    internal/pipelines_utils
    internal/tokenization_utils
    internal/trainer_utils