Transformers
================================================================================================================================================

State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.

馃 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose 
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural 
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between 
TensorFlow 2.0 and PyTorch.

This is the documentation of our repository `transformers <https://github.com/huggingface/transformers>`_.
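
As a minimal sketch of what this looks like in practice (``bert-base-uncased`` is one of the pretrained checkpoints;
any other identifier from the `model hub <https://huggingface.co/models>`_ can be substituted):

.. code-block:: python

    from transformers import AutoModel, AutoTokenizer

    # Download a pretrained checkpoint and its matching tokenizer
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    # Encode a sentence and run it through the model
    inputs = tokenizer("Hello, Transformers!", return_tensors="pt")
    outputs = model(**inputs)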

Features
---------------------------------------------------

- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners

State-of-the-art NLP for everyone:

- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators

Lower compute costs, smaller carbon footprint:

- Researchers can share trained models instead of always retraining (see the sketch after this list)
- Practitioners can reduce compute time and production costs
- 24 architectures with over 30 pretrained models, some in more than 100 languages
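
A minimal sketch of model sharing (the ``./my-finetuned-bert`` directory name is just an example): weights saved with
``save_pretrained`` can be reloaded by anyone with ``from_pretrained``, instead of being retrained from scratch.

.. code-block:: python

    from transformers import BertForSequenceClassification

    # Save a (fine-tuned) model and its configuration to a local directory...
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
    model.save_pretrained("./my-finetuned-bert")

    # ...so that others can reload the trained weights instead of retraining
    reloaded = BertForSequenceClassification.from_pretrained("./my-finetuned-bert")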

Choose the right framework for every part of a model's lifetime:

- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models
- Move a single model between TF2.0/PyTorch frameworks at will (see the sketch after this list)
- Seamlessly pick the right framework for training, evaluation, production
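
A minimal sketch of the TF2.0/PyTorch interoperability (the ``./my-bert`` directory name is hypothetical): weights
saved from a PyTorch model load directly into its TensorFlow 2.0 counterpart.

.. code-block:: python

    from transformers import BertModel, TFBertModel

    # Train or fine-tune in PyTorch, then save the weights...
    pt_model = BertModel.from_pretrained("bert-base-uncased")
    pt_model.save_pretrained("./my-bert")

    # ...and load the very same weights in TensorFlow 2.0 for evaluation or serving
    tf_model = TFBertModel.from_pretrained("./my-bert", from_pt=True)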

Contents
---------------------------------

The documentation is organized into five parts:

- **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy
  and a glossary.
- **USING 馃 TRANSFORMERS** contains general tutorials on how to use the library.
- **ADVANCED GUIDES** contains more advanced guides, each specific to a given script or part of the library.
- **RESEARCH** focuses on tutorials that have less to do with how to use the library and more to do with general
  research in transformer models.
- **PACKAGE REFERENCE** contains the documentation of each public class and function.

The library currently contains PyTorch and TensorFlow implementations, pre-trained model weights, usage scripts and
conversion utilities for the following models:

1. `BERT <https://github.com/google-research/bert>`_ (from Google) released with the paper `BERT: Pre-training of Deep
   Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`_ by Jacob Devlin, Ming-Wei
   Chang, Kenton Lee, and Kristina Toutanova.
2. `GPT <https://github.com/openai/finetune-transformer-lm>`_ (from OpenAI) released with the paper `Improving Language
   Understanding by Generative Pre-Training <https://blog.openai.com/language-unsupervised>`_ by Alec Radford, Karthik
   Narasimhan, Tim Salimans, and Ilya Sutskever.
3. `GPT-2 <https://blog.openai.com/better-language-models>`_ (from OpenAI) released with the paper `Language Models are
   Unsupervised Multitask Learners <https://blog.openai.com/better-language-models>`_ by Alec Radford, Jeffrey Wu,
   Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever.
4. `Transformer-XL <https://github.com/kimiyoung/transformer-xl>`_ (from Google/CMU) released with the paper
   `Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`_ by
   Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov.
5. `XLNet <https://github.com/zihangdai/xlnet>`_ (from Google/CMU) released with the paper `XLNet: Generalized
   Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_ by Zhilin Yang, Zihang
   Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le.
6. `XLM <https://github.com/facebookresearch/XLM>`_ (from Facebook) released together with the paper `Cross-lingual
   Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_ by Guillaume Lample and Alexis Conneau.
7. `RoBERTa <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_ (from Facebook), released together with
   the paper `RoBERTa: A Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_ by Yinhan Liu, Myle
   Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin
   Stoyanov.
8. `DistilBERT <https://huggingface.co/transformers/model_doc/distilbert.html>`_ (from HuggingFace) released together
   with the paper `DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
   <https://arxiv.org/abs/1910.01108>`_ by Victor Sanh, Lysandre Debut, and Thomas Wolf. The same method has been
   applied to compress GPT2 into
   `DistilGPT2 <https://github.com/huggingface/transformers/tree/master/examples/distillation>`_.
9. `CTRL <https://github.com/salesforce/ctrl>`_ (from Salesforce), released together with the
   paper `CTRL: A Conditional Transformer Language Model for Controllable Generation
   <https://arxiv.org/abs/1909.05858>`_ by Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong,
   and Richard Socher.
10. `CamemBERT <https://huggingface.co/transformers/model_doc/camembert.html>`_ (from FAIR, Inria, Sorbonne Université)
    released together with the paper `CamemBERT: a Tasty French Language Model <https://arxiv.org/abs/1911.03894>`_ by
    Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suarez, Yoann Dupont, Laurent Romary, Éric Villemonte de la
    Clergerie, Djamé Seddah, and Benoît Sagot.
11. `ALBERT <https://github.com/google-research/ALBERT>`_ (from Google Research), released together with the paper
    `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_
    by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut.
12. `T5 <https://github.com/google-research/text-to-text-transfer-transformer>`_ (from Google) released with the paper
    `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
    <https://arxiv.org/abs/1910.10683>`_ by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang,
    Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu.
13. `XLM-RoBERTa <https://github.com/pytorch/fairseq/tree/master/examples/xlmr>`_ (from Facebook AI), released together
    with the paper `Unsupervised Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`_ by
    Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard
    Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov.
14. `MMBT <https://github.com/facebookresearch/mmbt/>`_ (from Facebook), released together with the paper `Supervised
    Multimodal Bitransformers for Classifying Images and Text <https://arxiv.org/pdf/1909.02950.pdf>`_ by Douwe Kiela,
    Suvrat Bhooshan, Hamed Firooz, and Davide Testuggine.
15. `FlauBERT <https://github.com/getalp/Flaubert>`_ (from CNRS) released with the paper `FlauBERT: Unsupervised
    Language Model Pre-training for French <https://arxiv.org/abs/1912.05372>`_ by Hang Le, Loïc Vial, Jibril Frej,
    Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, and
    Didier Schwab.
16. `BART <https://github.com/pytorch/fairseq/tree/master/examples/bart>`_ (from Facebook) released with the paper
    `BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
    <https://arxiv.org/pdf/1910.13461.pdf>`_ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman
    Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer.
17. `ELECTRA <https://github.com/google-research/electra>`_ (from Google Research/Stanford University) released with
    the paper `ELECTRA: Pre-training text encoders as discriminators rather than generators
    <https://arxiv.org/abs/2003.10555>`_ by Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning.
18. `DialoGPT <https://github.com/microsoft/DialoGPT>`_ (from Microsoft Research) released with the paper `DialoGPT:
    Large-Scale Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`_ by
    Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu,
    and Bill Dolan.
19. `Reformer <https://github.com/google/trax/tree/master/trax/models/reformer>`_ (from Google Research) released with
    the paper `Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451>`_ by Nikita Kitaev, Łukasz
    Kaiser, and Anselm Levskaya.
20. `MarianMT <https://marian-nmt.github.io/>`_ (developed by the Microsoft Translator Team) machine translation models
    trained using `OPUS <http://opus.nlpl.eu/>`_ data by Jörg Tiedemann.
21. `Longformer <https://github.com/allenai/longformer>`_ (from AllenAI) released with the paper `Longformer: The
    Long-Document Transformer <https://arxiv.org/abs/2004.05150>`_ by Iz Beltagy, Matthew E. Peters, and Arman Cohan.
22. `DPR <https://github.com/facebookresearch/DPR>`_ (from Facebook) released with the paper `Dense Passage Retrieval
    for Open-Domain Question Answering <https://arxiv.org/abs/2004.04906>`_ by Vladimir Karpukhin, Barlas Oğuz, Sewon
    Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
23. `Pegasus <https://github.com/google-research/pegasus>`_ (from Google) released with the paper `PEGASUS: Pre-training
    with Extracted Gap-sentences for Abstractive Summarization <https://arxiv.org/abs/1912.08777>`_ by Jingqing Zhang,
    Yao Zhao, Mohammad Saleh, and Peter J. Liu.
24. `MBart <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`_ (from Facebook) released with the paper
    `Multilingual Denoising Pre-training for Neural Machine Translation <https://arxiv.org/abs/2001.08210>`_ by Yinhan
    Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, and Luke Zettlemoyer.
25. `Other community models <https://huggingface.co/models>`_, contributed by the `community
    <https://huggingface.co/users>`_.

.. toctree::
    :maxdepth: 2
    :caption: Get started

    quicktour
    installation
    philosophy
    glossary

.. toctree::
    :maxdepth: 2
    :caption: Using 馃 Transformers

    task_summary
    model_summary
    preprocessing
    training
    model_sharing
    tokenizer_summary
    multilingual

.. toctree::
    :maxdepth: 2
    :caption: Advanced guides

    pretrained_models
    examples
    custom_datasets
    notebooks
    converting_tensorflow_models
    migration
    contributing
    serialization

.. toctree::
    :maxdepth: 2
    :caption: Research

    bertology
    perplexity
    benchmarks

.. toctree::
    :maxdepth: 2
    :caption: Package Reference

    main_classes/configuration
    main_classes/output
    main_classes/model
    main_classes/tokenizer
    main_classes/pipelines
    main_classes/trainer
    main_classes/optimizer_schedules
    main_classes/processors
    model_doc/auto
    model_doc/encoderdecoder
    model_doc/bert
    model_doc/gpt
    model_doc/transformerxl
    model_doc/gpt2
    model_doc/xlm
    model_doc/xlnet
    model_doc/roberta
    model_doc/distilbert
    model_doc/ctrl
    model_doc/camembert
    model_doc/albert
    model_doc/xlmroberta
    model_doc/flaubert
    model_doc/bart
    model_doc/t5
    model_doc/electra
    model_doc/dialogpt
    model_doc/reformer
    model_doc/marian
    model_doc/longformer
    model_doc/retribert
    model_doc/mobilebert
    model_doc/dpr
    model_doc/pegasus
    model_doc/mbart
    internal/modeling_utils
    internal/tokenization_utils
    internal/pipelines_utils