"docs/vscode:/vscode.git/clone" did not exist on "dc7d2daa4c6080c14eb370076a62a17e6b30ab15"
index.rst 14.1 KB
Newer Older
1
Transformers
=======================================================================================================================

State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.

馃 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.

This is the documentation of our repository `transformers <https://github.com/huggingface/transformers>`_.
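
The easiest way to get a feel for the library is the ``pipeline`` API. The sketch below uses the
``sentiment-analysis`` task; the task, example sentence and output are illustrative choices (see the quick tour for a
full walkthrough):

.. code-block:: python

    from transformers import pipeline

    # Downloads a default pretrained model and tokenizer for the task the first time it runs
    classifier = pipeline("sentiment-analysis")
    print(classifier("We are very happy to show you the 🤗 Transformers library."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}] -- the exact score varies with the model version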

Features
-----------------------------------------------------------------------------------------------------------------------

- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners

State-of-the-art NLP for everyone:

- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators

Lower compute costs, smaller carbon footprint:

- Researchers can share trained models instead of always retraining (a short sketch follows this list)
- Practitioners can reduce compute time and production costs
- 8 architectures with over 30 pretrained models, some in more than 100 languages
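
A minimal sketch of what sharing amounts to in practice (the ``bert-base-uncased`` checkpoint and the
``./my-finetuned-model`` directory are illustrative choices): a fine-tuned model and its tokenizer are serialized with
``save_pretrained`` and can then be reloaded by anyone with ``from_pretrained``.

.. code-block:: python

    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained("bert-base-uncased")
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    # ... fine-tune the model on your own data ...

    # Save the weights, configuration and vocabulary to a directory
    model.save_pretrained("./my-finetuned-model")
    tokenizer.save_pretrained("./my-finetuned-model")

    # Anyone with access to that directory (or the uploaded checkpoint) can reload it
    reloaded = AutoModel.from_pretrained("./my-finetuned-model")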

Choose the right framework for every part of a model's lifetime:

- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models
- Move a single model between TF2.0/PyTorch frameworks at will (see the sketch after this list)
- Seamlessly pick the right framework for training, evaluation, production
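
As a minimal sketch of this interoperability (``bert-base-uncased`` and the ``./shared-checkpoint`` directory are
illustrative choices), the same pretrained weights can be loaded as a PyTorch or a TensorFlow 2.0 model, and a
checkpoint saved from one framework can be reloaded in the other:

.. code-block:: python

    from transformers import AutoModel, TFAutoModel

    pt_model = AutoModel.from_pretrained("bert-base-uncased")    # PyTorch version
    tf_model = TFAutoModel.from_pretrained("bert-base-uncased")  # TensorFlow 2.0 version

    # Save from PyTorch, then reload in TensorFlow; the weights are converted on the fly
    pt_model.save_pretrained("./shared-checkpoint")
    tf_from_pt = TFAutoModel.from_pretrained("./shared-checkpoint", from_pt=True)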

Contents
-----------------------------------------------------------------------------------------------------------------------

The documentation is organized in five parts:

- **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy
  and a glossary.
- **USING 馃 TRANSFORMERS** contains general tutorials on how to use the library.
- **ADVANCED GUIDES** contains more advanced guides that are specific to a given script or part of the library.
- **RESEARCH** focuses on tutorials that have less to do with how to use the library and more with general research on
  transformer models.
- The last three sections contain the documentation of each public class and function, grouped in:

    - **MAIN CLASSES** for the main classes exposing the important APIs of the library.
    - **MODELS** for the classes and functions related to each model implemented in the library.
    - **INTERNAL HELPERS** for the classes and functions we use internally.

The library currently contains PyTorch and TensorFlow implementations, pre-trained model weights, usage scripts and
conversion utilities for the following models:

1. `ALBERT <https://github.com/google-research/ALBERT>`_ (from Google Research), released together with the paper
   `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_
   by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut.
2. `BART <https://github.com/pytorch/fairseq/tree/master/examples/bart>`_ (from Facebook) released with the paper
   `BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
   <https://arxiv.org/pdf/1910.13461.pdf>`_ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman
   Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer.
3. `BERT <https://github.com/google-research/bert>`_ (from Google) released with the paper `BERT: Pre-training of Deep
   Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`_ by Jacob Devlin, Ming-Wei
   Chang, Kenton Lee, and Kristina Toutanova.
4. `BERT For Sequence Generation <https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder>`_
   (from Google) released with the paper `Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
   <https://arxiv.org/abs/1907.12461>`_ by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
5. `CamemBERT <https://huggingface.co/transformers/model_doc/camembert.html>`_ (from FAIR, Inria, Sorbonne Université)
   released together with the paper `CamemBERT: a Tasty French Language Model <https://arxiv.org/abs/1911.03894>`_ by
   Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suarez, Yoann Dupont, Laurent Romary, Eric Villemonte de la
   Clergerie, Djame Seddah, and Benoît Sagot.
6. `CTRL <https://github.com/salesforce/ctrl>`_ (from Salesforce), released together with the
   paper `CTRL: A Conditional Transformer Language Model for Controllable Generation
   <https://arxiv.org/abs/1909.05858>`_ by Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong,
   and Richard Socher.
7. `DeBERTa <https://huggingface.co/transformers/model_doc/deberta.html>`_ (from Microsoft Research) released with the
   paper `DeBERTa: Decoding-enhanced BERT with Disentangled Attention <https://arxiv.org/abs/2006.03654>`_ by Pengcheng
   He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
8. `DialoGPT <https://github.com/microsoft/DialoGPT>`_ (from Microsoft Research) released with the paper `DialoGPT:
   Large-Scale Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`_ by
   Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu,
   and Bill Dolan.
9. `DistilBERT <https://huggingface.co/transformers/model_doc/distilbert.html>`_ (from HuggingFace) released together
   with the paper `DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
   <https://arxiv.org/abs/1910.01108>`_ by Victor Sanh, Lysandre Debut, and Thomas Wolf. The same method has been
   applied to compress GPT2 into
   `DistilGPT2 <https://github.com/huggingface/transformers/tree/master/examples/distillation>`_.
10. `DPR <https://github.com/facebookresearch/DPR>`_ (from Facebook) released with the paper `Dense Passage Retrieval
    for Open-Domain Question Answering <https://arxiv.org/abs/2004.04906>`_ by Vladimir Karpukhin, Barlas Oğuz, Sewon
    Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
11. `ELECTRA <https://github.com/google-research/electra>`_ (from Google Research/Stanford University) released with
    the paper `ELECTRA: Pre-training text encoders as discriminators rather than generators
    <https://arxiv.org/abs/2003.10555>`_ by Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning.
12. `FlauBERT <https://github.com/getalp/Flaubert>`_ (from CNRS) released with the paper `FlauBERT: Unsupervised
    Language Model Pre-training for French <https://arxiv.org/abs/1912.05372>`_ by Hang Le, Loïc Vial, Jibril Frej,
    Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, and
    Didier Schwab.
13. `Funnel Transformer <https://github.com/laiguokun/Funnel-Transformer>`_ (from CMU/Google Brain) released with the paper
    `Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
    <https://arxiv.org/abs/2006.03236>`_ by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
14. `GPT <https://github.com/openai/finetune-transformer-lm>`_ (from OpenAI) released with the paper `Improving Language
    Understanding by Generative Pre-Training <https://blog.openai.com/language-unsupervised>`_ by Alec Radford, Karthik
    Narasimhan, Tim Salimans, and Ilya Sutskever.
15. `GPT-2 <https://blog.openai.com/better-language-models>`_ (from OpenAI) released with the paper `Language Models are
    Unsupervised Multitask Learners <https://blog.openai.com/better-language-models>`_ by Alec Radford, Jeffrey Wu,
    Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever.
16. `LayoutLM <https://github.com/microsoft/unilm/tree/master/layoutlm>`_ (from Microsoft Research Asia) released with
    the paper `LayoutLM: Pre-training of Text and Layout for Document Image Understanding
    <https://arxiv.org/abs/1912.13318>`_ by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
17. `Longformer <https://github.com/allenai/longformer>`_ (from AllenAI) released with the paper `Longformer: The
    Long-Document Transformer <https://arxiv.org/abs/2004.05150>`_ by Iz Beltagy, Matthew E. Peters, and Arman Cohan.
18. `LXMERT <https://github.com/airsplay/lxmert>`_ (from UNC Chapel Hill) released with the paper `LXMERT: Learning
    Cross-Modality Encoder Representations from Transformers for Open-Domain Question
    Answering <https://arxiv.org/abs/1908.07490>`_ by Hao Tan and Mohit Bansal.
19. `MarianMT <https://marian-nmt.github.io/>`_ (developed by the Microsoft Translator Team) machine translation models
    trained using `OPUS <http://opus.nlpl.eu/>`_ data by Jörg Tiedemann.
20. `MBart <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`_ (from Facebook) released with the paper
    `Multilingual Denoising Pre-training for Neural Machine Translation <https://arxiv.org/abs/2001.08210>`_ by Yinhan
    Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
21. `MMBT <https://github.com/facebookresearch/mmbt/>`_ (from Facebook), released together with the paper `Supervised
    Multimodal Bitransformers for Classifying Images and Text <https://arxiv.org/pdf/1909.02950.pdf>`_ by Douwe Kiela,
    Suvrat Bhooshan, Hamed Firooz, and Davide Testuggine.
22. `Pegasus <https://github.com/google-research/pegasus>`_ (from Google) released with the paper `PEGASUS:
    Pre-training with Extracted Gap-sentences for Abstractive Summarization <https://arxiv.org/abs/1912.08777>`_ by
    Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
23. `Reformer <https://github.com/google/trax/tree/master/trax/models/reformer>`_ (from Google Research) released with
    the paper `Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451>`_ by Nikita Kitaev, Łukasz
    Kaiser, and Anselm Levskaya.
24. `RoBERTa <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_ (from Facebook), released together with
    the paper `RoBERTa: A Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_ by Yinhan
    Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and
    Veselin Stoyanov.
25. `T5 <https://github.com/google-research/text-to-text-transfer-transformer>`_ (from Google) released with the paper
    `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
    <https://arxiv.org/abs/1910.10683>`_ by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang,
    Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu.
26. `Transformer-XL <https://github.com/kimiyoung/transformer-xl>`_ (from Google/CMU) released with the paper
    `Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`_ by
    Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov.
27. `XLM <https://github.com/facebookresearch/XLM>`_ (from Facebook) released together with the paper `Cross-lingual
    Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_ by Guillaume Lample and Alexis Conneau.
28. `XLM-RoBERTa <https://github.com/pytorch/fairseq/tree/master/examples/xlmr>`_ (from Facebook AI), released together
    with the paper `Unsupervised Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`_ by
    Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard
    Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov.
29. `XLNet <https://github.com/zihangdai/xlnet>`_ (from Google/CMU) released with the paper `XLNet: Generalized
    Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_ by Zhilin Yang, Zihang
    Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le.
30. SqueezeBERT (from UC Berkeley) released with the paper
    `SqueezeBERT: What can computer vision teach NLP about efficient neural networks? <https://arxiv.org/abs/2006.11316>`_
    by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
31. `Other community models <https://huggingface.co/models>`_, contributed by the `community
    <https://huggingface.co/users>`_.

.. toctree::
    :maxdepth: 2
    :caption: Get started

    quicktour
    installation
    philosophy
    glossary

.. toctree::
    :maxdepth: 2
    :caption: Using 馃 Transformers

    task_summary
    model_summary
    preprocessing
    training
    model_sharing
    tokenizer_summary
    multilingual

.. toctree::
    :maxdepth: 2
    :caption: Advanced guides

    pretrained_models
    examples
    custom_datasets
    notebooks
    converting_tensorflow_models
    migration
    contributing
    testing
    serialization

.. toctree::
    :maxdepth: 2
    :caption: Research

    bertology
    perplexity
    benchmarks

.. toctree::
    :maxdepth: 2
    :caption: Main Classes

    main_classes/configuration
    main_classes/logging
    main_classes/model
    main_classes/optimizer_schedules
    main_classes/output
    main_classes/pipelines
    main_classes/processors
    main_classes/tokenizer
    main_classes/trainer

.. toctree::
    :maxdepth: 2
    :caption: Models

    model_doc/albert
    model_doc/auto
    model_doc/bart
    model_doc/bert
    model_doc/bertgeneration
    model_doc/camembert
    model_doc/ctrl
    model_doc/deberta
    model_doc/dialogpt
    model_doc/distilbert
    model_doc/dpr
    model_doc/electra
    model_doc/encoderdecoder
    model_doc/flaubert
    model_doc/fsmt
    model_doc/funnel
    model_doc/layoutlm
    model_doc/longformer
    model_doc/lxmert
    model_doc/marian
    model_doc/mbart
    model_doc/mobilebert
    model_doc/gpt
    model_doc/gpt2
    model_doc/pegasus
    model_doc/rag
    model_doc/reformer
    model_doc/retribert
    model_doc/roberta
    model_doc/squeezebert
    model_doc/t5
    model_doc/transformerxl
    model_doc/xlm
    model_doc/xlmroberta
    model_doc/xlnet

.. toctree::
    :maxdepth: 2
    :caption: Internal Helpers

    internal/modeling_utils
    internal/pipelines_utils
    internal/tokenization_utils