.. 
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

DistilBERT
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The DistilBERT model was proposed in the blog post `Smaller, faster, cheaper, lighter: Introducing DistilBERT, a
distilled version of BERT <https://medium.com/huggingface/distilbert-8cf3380435b5>`__, and the paper `DistilBERT, a
distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__. DistilBERT is a
small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than
``bert-base-uncased``, runs 60% faster, and preserves over 95% of BERT's performance as measured on the GLUE language
understanding benchmark.
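
As a rough check of the parameter reduction, the size of a BERT-style encoder can be tallied from its hyperparameters.
The sketch below assumes the ``bert-base-uncased`` sizes (a 30,522-token vocabulary, 512 positions, hidden size 768,
feed-forward size 3,072, 12 layers) and models DistilBERT as the same architecture with half the layers and without the
token-type embeddings and the pooler, as described in the paper. The helper ``encoder_param_count`` is hypothetical and
only counts the weights and biases of the encoder, not any task-specific head.

.. code-block:: python

    def encoder_param_count(vocab_size, max_positions, hidden, ffn, n_layers, type_vocab_size=0, pooler=False):
        """Count the weights and biases of a BERT-style Transformer encoder (hypothetical helper)."""
        # Embeddings: token, position and (optional) token-type lookup tables, followed by one LayerNorm.
        embeddings = (vocab_size + max_positions + type_vocab_size) * hidden + 2 * hidden
        # Self-attention: query, key, value and output projections, each with a bias.
        attention = 4 * (hidden * hidden + hidden)
        # Feed-forward network: hidden -> ffn -> hidden, each projection with a bias.
        feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
        # Two LayerNorms per layer (after attention and after the feed-forward network), each with scale and shift.
        layer_norms = 2 * (2 * hidden)
        per_layer = attention + feed_forward + layer_norms
        # Optional pooler: a dense layer applied to the first token's hidden state.
        pooler_params = (hidden * hidden + hidden) if pooler else 0
        return embeddings + n_layers * per_layer + pooler_params


    bert_base = encoder_param_count(30522, 512, 768, 3072, n_layers=12, type_vocab_size=2, pooler=True)
    distilbert = encoder_param_count(30522, 512, 768, 3072, n_layers=6)
    reduction = 1 - distilbert / bert_base
    print(f"BERT base: {bert_base:,}  DistilBERT: {distilbert:,}  reduction: {reduction:.1%}")
    # BERT base: 109,482,240  DistilBERT: 66,362,880  reduction: 39.4%

The roughly 39% reduction is consistent with the rounded 40% quoted above. Halving the layers does not halve the model,
because about 23.8M of DistilBERT's parameters sit in the embeddings, which are kept at the same size apart from the
dropped token-type table.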

The abstract from the paper is the following:

*As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP),
operating these large models in on-the-edge and/or under constrained computational training or inference budgets
remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation
model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger
counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage
knowledge distillation during the pretraining phase and show that it is possible to reduce the size of a BERT model by
40%, while retaining 97% of its language understanding capabilities and being 60% faster. To leverage the inductive
biases learned by larger models during pretraining, we introduce a triple loss combining language modeling,
distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train and we
demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative on-device
study.*
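
The triple loss mentioned in the abstract can be sketched in PyTorch as a weighted sum of three terms: a distillation
loss between the temperature-softened output distributions of the student and the teacher, the usual masked language
modeling loss, and a cosine embedding loss that aligns the directions of the student and teacher hidden states. This is
an illustrative sketch, not the original training code. The function name, the temperature and the equal loss weights
are placeholder assumptions; the distillation term is written as a KL divergence, which differs from a soft-target
cross-entropy only by the teacher's entropy (a constant with respect to the student); and the T² scaling is a common
convention from the knowledge distillation literature.

.. code-block:: python

    import torch
    import torch.nn.functional as F


    def distillation_triple_loss(student_logits, teacher_logits, labels, student_hidden, teacher_hidden,
                                 temperature=2.0, alpha_ce=1.0, alpha_mlm=1.0, alpha_cos=1.0):
        """Hypothetical helper: weighted sum of distillation, MLM and cosine-embedding losses.

        student_logits, teacher_logits: (batch, seq_len, vocab_size)
        labels: (batch, seq_len), with -100 at positions that should not be scored
        student_hidden, teacher_hidden: (batch, seq_len, dim)
        """
        vocab_size = student_logits.size(-1)
        s_logits = student_logits.reshape(-1, vocab_size)
        t_logits = teacher_logits.reshape(-1, vocab_size)
        # Distillation term: KL(teacher || student) on temperature-softened distributions, scaled by T^2.
        loss_ce = F.kl_div(
            F.log_softmax(s_logits / temperature, dim=-1),
            F.softmax(t_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        # Masked language modeling term on the hard labels (-100 positions are ignored).
        loss_mlm = F.cross_entropy(s_logits, labels.reshape(-1), ignore_index=-100)
        # Cosine term: target 1 asks each student hidden vector to point the same way as the teacher's.
        s_hidden = student_hidden.reshape(-1, student_hidden.size(-1))
        t_hidden = teacher_hidden.reshape(-1, teacher_hidden.size(-1))
        target = torch.ones(s_hidden.size(0), device=s_hidden.device)
        loss_cos = F.cosine_embedding_loss(s_hidden, t_hidden, target)
        return alpha_ce * loss_ce + alpha_mlm * loss_mlm + alpha_cos * loss_cos


    # Toy tensors standing in for real student and teacher outputs.
    torch.manual_seed(0)
    batch, seq_len, vocab_size, dim = 2, 8, 50, 16
    teacher_logits = torch.randn(batch, seq_len, vocab_size)
    teacher_hidden = torch.randn(batch, seq_len, dim)
    student_logits = torch.randn(batch, seq_len, vocab_size, requires_grad=True)
    student_hidden = torch.randn(batch, seq_len, dim, requires_grad=True)
    labels = torch.full((batch, seq_len), -100, dtype=torch.long)
    labels[:, 3] = torch.randint(0, vocab_size, (batch,))  # pretend position 3 was masked

    loss = distillation_triple_loss(student_logits, teacher_logits, labels, student_hidden, teacher_hidden)
    loss.backward()  # gradients flow into the student tensors only

If the student reproduced the teacher exactly, the distillation and cosine terms would both be zero, leaving only the
masked language modeling term.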

Tips:

- DistilBERT doesn't have :obj:`token_type_ids`, so you don't need to indicate which token belongs to which segment. Just
  separate your segments with the separation token :obj:`tokenizer.sep_token` (or :obj:`[SEP]`).
- DistilBERT doesn't have options to select the input positions (:obj:`position_ids` input). This could be added if
  necessary, though; just let us know if you need this option.
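
To see the first tip in practice, the sketch below encodes a pair of segments with
:class:`~transformers.DistilBertTokenizer`. It assumes a tiny hypothetical vocabulary written to a temporary file so
that it runs offline; in practice you would load a pretrained vocabulary with :obj:`from_pretrained`. The two segments
are joined with :obj:`[SEP]`, and the returned encoding is expected to contain :obj:`input_ids` and
:obj:`attention_mask` but no :obj:`token_type_ids` (worth checking against your installed version).

.. code-block:: python

    import os
    import tempfile

    from transformers import DistilBertTokenizer

    # Hypothetical toy vocabulary (one token per line) so the example needs no download.
    toy_vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world", "how", "are", "you"]

    with tempfile.TemporaryDirectory() as tmp_dir:
        vocab_file = os.path.join(tmp_dir, "vocab.txt")
        with open(vocab_file, "w", encoding="utf-8") as f:
            f.write("\n".join(toy_vocab) + "\n")
        # The vocabulary is read when the tokenizer is created, so the file can be deleted afterwards.
        tokenizer = DistilBertTokenizer(vocab_file)

    # Passing two texts builds "[CLS] first [SEP] second [SEP]"; no segment ids are needed.
    encoding = tokenizer("hello world", "how are you")
    tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"])
    print(tokens)
    print(list(encoding.keys()))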

The original code can be found `here
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.


DistilBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertConfig
    :members:
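
The configuration stores the architecture hyperparameters under DistilBERT-specific names such as :obj:`n_layers`,
:obj:`n_heads`, :obj:`dim` and :obj:`hidden_dim`. A minimal sketch that inspects the defaults, which describe the
6-layer base architecture, and round-trips a smaller configuration through :obj:`save_pretrained` and
:obj:`from_pretrained` in a temporary directory; the small sizes are arbitrary illustration values.

.. code-block:: python

    import tempfile

    from transformers import DistilBertConfig

    # Default values correspond to the 6-layer, 768-dimensional base architecture.
    config = DistilBertConfig()
    print(config.n_layers, config.n_heads, config.dim, config.hidden_dim, config.vocab_size)
    # 6 12 768 3072 30522

    # A smaller, hypothetical configuration; the model requires dim to be divisible by n_heads.
    small = DistilBertConfig(n_layers=2, n_heads=4, dim=128, hidden_dim=512)
    with tempfile.TemporaryDirectory() as tmp_dir:
        small.save_pretrained(tmp_dir)  # writes config.json into tmp_dir
        reloaded = DistilBertConfig.from_pretrained(tmp_dir)
    print(reloaded.n_layers, reloaded.dim)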


DistilBertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertTokenizer
    :members:


DistilBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertTokenizerFast
    :members:


DistilBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertModel
    :members: forward
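
A minimal forward pass through :class:`~transformers.DistilBertModel`. To keep it self-contained and offline, it builds a
deliberately tiny, randomly initialized model from a hypothetical configuration (the sizes are arbitrary); with a
pretrained checkpoint you would call ``DistilBertModel.from_pretrained("distilbert-base-uncased")`` instead and feed it
the tokenizer's output. In line with the tips above, only :obj:`input_ids` and :obj:`attention_mask` are passed.

.. code-block:: python

    import torch
    from transformers import DistilBertConfig, DistilBertModel

    # Tiny, hypothetical sizes so the randomly initialized model is fast to build.
    config = DistilBertConfig(vocab_size=100, max_position_embeddings=64, n_layers=2, n_heads=4, dim=32, hidden_dim=64)
    torch.manual_seed(0)
    model = DistilBertModel(config).eval()  # eval() disables dropout

    input_ids = torch.randint(0, config.vocab_size, (2, 10))  # (batch_size, sequence_length)
    attention_mask = torch.ones_like(input_ids)  # 1 = real token, 0 = padding

    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)

    last_hidden_state = outputs[0]  # (batch_size, sequence_length, dim)
    print(last_hidden_state.shape)  # torch.Size([2, 10, 32])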


DistilBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertForMaskedLM
    :members: forward
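
For :class:`~transformers.DistilBertForMaskedLM`, label positions set to ``-100`` are ignored by the loss, so only the
masked positions are scored. The sketch below uses a tiny, randomly initialized model (hypothetical sizes) and a
hypothetical mask token id of 4 in place of the tokenizer's :obj:`mask_token_id`. With random weights the predicted
token is meaningless; the point is the shape of the logits, which cover the whole vocabulary at every position.

.. code-block:: python

    import torch
    from transformers import DistilBertConfig, DistilBertForMaskedLM

    config = DistilBertConfig(vocab_size=100, max_position_embeddings=64, n_layers=2, n_heads=4, dim=32, hidden_dim=64)
    torch.manual_seed(0)
    model = DistilBertForMaskedLM(config).eval()

    mask_token_id = 4  # hypothetical id; use tokenizer.mask_token_id with a real vocabulary
    input_ids = torch.randint(5, config.vocab_size, (2, 10))
    labels = torch.full_like(input_ids, -100)  # -100 = not scored by the loss
    labels[:, 4] = input_ids[:, 4]  # remember the original token at position 4...
    input_ids[:, 4] = mask_token_id  # ...then hide it behind the mask token

    with torch.no_grad():
        outputs = model(input_ids=input_ids, labels=labels)

    loss, logits = outputs[0], outputs[1]  # logits: (batch_size, sequence_length, vocab_size)
    predicted_ids = logits[:, 4].argmax(dim=-1)  # most likely token at the masked position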


DistilBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertForSequenceClassification
    :members: forward
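
:class:`~transformers.DistilBertForSequenceClassification` produces one vector of :obj:`config.num_labels` logits per
input sequence. A minimal sketch with a tiny, randomly initialized model and three hypothetical classes; passing integer
class labels also returns a classification loss.

.. code-block:: python

    import torch
    from transformers import DistilBertConfig, DistilBertForSequenceClassification

    config = DistilBertConfig(
        vocab_size=100, max_position_embeddings=64, n_layers=2, n_heads=4, dim=32, hidden_dim=64, num_labels=3
    )
    torch.manual_seed(0)
    model = DistilBertForSequenceClassification(config).eval()

    input_ids = torch.randint(0, config.vocab_size, (4, 12))  # 4 sequences of 12 tokens
    attention_mask = torch.ones_like(input_ids)
    labels = torch.tensor([0, 2, 1, 0])  # one class index per sequence

    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)

    loss, logits = outputs[0], outputs[1]  # logits: (batch_size, num_labels)
    probabilities = logits.softmax(dim=-1)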


DistilBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertForMultipleChoice
    :members: forward
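
:class:`~transformers.DistilBertForMultipleChoice` expects inputs with an extra choices dimension,
:obj:`(batch_size, num_choices, sequence_length)`, and returns one score per choice. In practice each choice is usually
the question paired with one candidate answer; the sketch below skips tokenization and uses random ids with a tiny,
randomly initialized model (hypothetical sizes).

.. code-block:: python

    import torch
    from transformers import DistilBertConfig, DistilBertForMultipleChoice

    config = DistilBertConfig(vocab_size=100, max_position_embeddings=64, n_layers=2, n_heads=4, dim=32, hidden_dim=64)
    torch.manual_seed(0)
    model = DistilBertForMultipleChoice(config).eval()

    batch_size, num_choices, seq_len = 2, 3, 8
    input_ids = torch.randint(0, config.vocab_size, (batch_size, num_choices, seq_len))
    attention_mask = torch.ones_like(input_ids)
    labels = torch.tensor([1, 0])  # index of the correct choice for each example

    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)

    loss, logits = outputs[0], outputs[1]  # logits: (batch_size, num_choices)
    best_choice = logits.argmax(dim=-1)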


DistilBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertForTokenClassification
    :members: forward
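
:class:`~transformers.DistilBertForTokenClassification` returns a vector of logits for every token, which suits tagging
tasks such as named-entity recognition. The sketch below assumes five hypothetical tag classes and a tiny, randomly
initialized model; label positions set to ``-100`` (here the first token of each sequence, standing in for
:obj:`[CLS]`) are ignored by the loss.

.. code-block:: python

    import torch
    from transformers import DistilBertConfig, DistilBertForTokenClassification

    config = DistilBertConfig(
        vocab_size=100, max_position_embeddings=64, n_layers=2, n_heads=4, dim=32, hidden_dim=64, num_labels=5
    )
    torch.manual_seed(0)
    model = DistilBertForTokenClassification(config).eval()

    input_ids = torch.randint(0, config.vocab_size, (2, 9))
    attention_mask = torch.ones_like(input_ids)
    labels = torch.randint(0, config.num_labels, (2, 9))  # one tag per token
    labels[:, 0] = -100  # do not score the first (special) token

    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)

    loss, logits = outputs[0], outputs[1]  # logits: (batch_size, sequence_length, num_labels)
    predicted_tags = logits.argmax(dim=-1)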


DistilBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DistilBertForQuestionAnswering
    :members: forward
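
:class:`~transformers.DistilBertForQuestionAnswering` scores every token as a possible start and end of the answer span.
A minimal sketch with a tiny, randomly initialized model (hypothetical sizes) and made-up gold span positions. The
decoding step at the end, which takes the argmax of each set of logits independently, is only illustrative; a real
pipeline would also require the end to come after the start and the span to lie inside the context.

.. code-block:: python

    import torch
    from transformers import DistilBertConfig, DistilBertForQuestionAnswering

    config = DistilBertConfig(vocab_size=100, max_position_embeddings=64, n_layers=2, n_heads=4, dim=32, hidden_dim=64)
    torch.manual_seed(0)
    model = DistilBertForQuestionAnswering(config).eval()

    input_ids = torch.randint(0, config.vocab_size, (2, 16))  # question and context tokens, concatenated
    attention_mask = torch.ones_like(input_ids)
    start_positions = torch.tensor([3, 5])  # hypothetical gold answer spans (token indices)
    end_positions = torch.tensor([4, 9])

    with torch.no_grad():
        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            start_positions=start_positions,
            end_positions=end_positions,
        )

    # Each logits tensor has shape (batch_size, sequence_length).
    loss, start_logits, end_logits = outputs[0], outputs[1], outputs[2]
    # Naive decoding: pick the most likely start and end independently.
    start_index = start_logits.argmax(dim=-1)
    end_index = end_logits.argmax(dim=-1)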

TFDistilBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertModel
    :members: call


TFDistilBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertForMaskedLM
    :members: call


TFDistilBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertForSequenceClassification
    :members: call



TFDistilBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertForMultipleChoice
    :members: call



TFDistilBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertForTokenClassification
    :members: call


TFDistilBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDistilBertForQuestionAnswering
    :members: call