DistilBERT
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The DistilBERT model was proposed in the blog post `Smaller, faster, cheaper, lighter: Introducing DistilBERT, a
distilled version of BERT <https://medium.com/huggingface/distilbert-8cf3380435b5>`__, and the paper `DistilBERT, a
distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__. DistilBERT is a
small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than
`bert-base-uncased` and runs 60% faster, while preserving over 95% of BERT's performance as measured on the GLUE
language understanding benchmark.
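The parameter reduction is easy to check. The following is a minimal sketch (assuming ``transformers`` and ``torch``
are installed); both models are instantiated from their default configurations, which mirror ``bert-base-uncased``
and ``distilbert-base-uncased``, so no pretrained weights need to be downloaded:

```python
# Compare parameter counts of randomly initialized BERT base and DistilBERT.
# The default configs match bert-base-uncased and distilbert-base-uncased.
from transformers import BertConfig, BertModel, DistilBertConfig, DistilBertModel

bert = BertModel(BertConfig())
distilbert = DistilBertModel(DistilBertConfig())

n_bert = sum(p.numel() for p in bert.parameters())
n_distilbert = sum(p.numel() for p in distilbert.parameters())

print(f"BERT base:  {n_bert / 1e6:.0f}M parameters")
print(f"DistilBERT: {n_distilbert / 1e6:.0f}M parameters "
      f"({1 - n_distilbert / n_bert:.0%} fewer)")
```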

The abstract from the paper is the following:

*As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP),
operating these large models in on-the-edge and/or under constrained computational training or inference budgets
remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation
model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger
counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage
knowledge distillation during the pre-training phase and show that it is possible to reduce the size of a BERT model by
40%, while retaining 97% of its language understanding capabilities and being 60% faster. To leverage the inductive
biases learned by larger models during pre-training, we introduce a triple loss combining language modeling,
distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train and we
demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative on-device
study.*
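The triple loss mentioned in the abstract can be sketched as a weighted sum of a soft-target distillation loss
against the teacher's logits, the usual masked language modeling loss, and a cosine loss aligning student and teacher
hidden states. The sketch below (assuming PyTorch) is illustrative, not the authors' training code; the ``alpha_*``
weights and the temperature are hypothetical values:

```python
# Simplified sketch of a distillation triple loss: soft-target cross-entropy
# (via KL divergence on temperature-softened distributions), masked language
# modeling loss on the hard labels, and a cosine loss on hidden states.
import torch
import torch.nn.functional as F

def triple_loss(student_logits, teacher_logits, student_hidden, teacher_hidden,
                labels, temperature=2.0, alpha_ce=5.0, alpha_mlm=2.0, alpha_cos=1.0):
    # Distillation loss: KL divergence between softened student/teacher distributions.
    ce_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Standard masked language modeling loss on the hard labels.
    mlm_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1), ignore_index=-100,
    )
    # Cosine loss pulling student hidden states toward the teacher's.
    target = torch.ones(student_hidden.size(0) * student_hidden.size(1))
    cos_loss = F.cosine_embedding_loss(
        student_hidden.view(-1, student_hidden.size(-1)),
        teacher_hidden.view(-1, teacher_hidden.size(-1)),
        target,
    )
    return alpha_ce * ce_loss + alpha_mlm * mlm_loss + alpha_cos * cos_loss
```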

Tips:

- DistilBERT doesn't have :obj:`token_type_ids`, so you don't need to indicate which token belongs to which segment.
  Just separate your segments with the separation token :obj:`tokenizer.sep_token` (or :obj:`[SEP]`).
- DistilBERT doesn't have options to select the input positions (:obj:`position_ids` input). This could be added if
  necessary though; just let us know if you need this option.
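For example, encoding a sentence pair returns no :obj:`token_type_ids`, and the two segments are simply joined by
:obj:`[SEP]`. The snippet below is a minimal sketch using a made-up toy vocabulary so it runs without downloading
anything; in practice you would load the tokenizer with ``DistilBertTokenizer.from_pretrained("distilbert-base-uncased")``:

```python
# Encode a sentence pair with a DistilBertTokenizer built from a tiny, made-up
# vocabulary (for illustration only -- real checkpoints ship their own vocab).
import os
import tempfile

from transformers import DistilBertTokenizer

vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world", "again"]
with tempfile.TemporaryDirectory() as tmp:
    vocab_file = os.path.join(tmp, "vocab.txt")
    with open(vocab_file, "w") as f:
        f.write("\n".join(vocab))
    tokenizer = DistilBertTokenizer(vocab_file)

encoded = tokenizer("hello world", "hello again")
print(sorted(encoded.keys()))  # no token_type_ids, unlike BERT
print(encoded["input_ids"])    # [CLS] segment 1 [SEP] segment 2 [SEP]
```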

The original code can be found `here
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.


DistilBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertConfig
    :members:


DistilBertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertTokenizer
    :members:


DistilBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertTokenizerFast
    :members:


DistilBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertModel
    :members: forward


DistilBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertForMaskedLM
    :members: forward


DistilBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertForSequenceClassification
    :members: forward


DistilBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertForMultipleChoice
    :members: forward


DistilBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertForTokenClassification
    :members: forward


DistilBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertForQuestionAnswering
    :members: forward

TFDistilBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertModel
    :members: call


TFDistilBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertForMaskedLM
    :members: call


TFDistilBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertForSequenceClassification
    :members: call



TFDistilBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertForMultipleChoice
    :members: call



TFDistilBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertForTokenClassification
    :members: call


TFDistilBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertForQuestionAnswering
    :members: call