.. 
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

ELECTRA
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ELECTRA model was proposed in the paper `ELECTRA: Pre-training Text Encoders as Discriminators Rather Than
Generators <https://openreview.net/pdf?id=r1xMH1BtvB>`__. ELECTRA is a pretraining approach that trains two
transformer models: the generator and the discriminator. The generator replaces tokens in a sequence and is therefore
trained as a masked language model. The discriminator, which is the model we're interested in, tries to identify which
tokens in the sequence were replaced by the generator.
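
As a rough illustration of the discriminator's training target, the sketch below derives per-token labels by comparing
an original sequence with a corrupted one. The helper name ``replaced_token_labels`` and the example sentence are
assumptions made for illustration; this is not the library's implementation, and it works on plain strings rather than
token ids.

.. code-block:: python

    def replaced_token_labels(original, corrupted):
        """Return one label per position: 1 if the token was replaced, 0 otherwise."""
        if len(original) != len(corrupted):
            raise ValueError("original and corrupted sequences must have the same length")
        # A position where the generator happened to produce the original token
        # is indistinguishable from an untouched one, so it is labeled 0.
        return [int(o != c) for o, c in zip(original, corrupted)]


    original = ["the", "chef", "cooked", "the", "meal"]
    corrupted = ["the", "chef", "ate", "the", "meal"]  # hypothetical generator output
    labels = replaced_token_labels(original, corrupted)
    print(labels)  # [0, 0, 1, 0, 0]

Every position receives a label, not only the positions that were corrupted.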

The abstract from the paper is the following:

*Masked language modeling (MLM) pretraining methods such as BERT corrupt the input by replacing some tokens with [MASK]
and then train a model to reconstruct the original tokens. While they produce good results when transferred to
downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a
more sample-efficient pretraining task called replaced token detection. Instead of masking the input, our approach
corrupts it by replacing some tokens with plausible alternatives sampled from a small generator network. Then, instead
of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that
predicts whether each token in the corrupted input was replaced by a generator sample or not. Thorough experiments
demonstrate this new pretraining task is more efficient than MLM because the task is defined over all input tokens
rather than just the small subset that was masked out. As a result, the contextual representations learned by our
approach substantially outperform the ones learned by BERT given the same model size, data, and compute. The gains are
particularly strong for small models; for example, we train a model on one GPU for 4 days that outperforms GPT (trained
using 30x more compute) on the GLUE natural language understanding benchmark. Our approach also works well at scale,
where it performs comparably to RoBERTa and XLNet while using less than 1/4 of their compute and outperforms them when
using the same amount of compute.*

Tips:

- ELECTRA is a pretraining approach, so the underlying model, BERT, is almost unchanged. The only change is that the
  embedding size is decoupled from the hidden size: the embedding size is generally smaller, while the hidden size is
  larger. An additional linear projection layer projects the embeddings from the embedding size to the hidden size.
  When the embedding size equals the hidden size, no projection layer is used.
- The ELECTRA checkpoints saved using `Google Research's implementation <https://github.com/google-research/electra>`__
  contain both the generator and the discriminator. The conversion script requires the user to name which model to
  export into the correct architecture. Once converted to the HuggingFace format, however, these checkpoints may be
  loaded into any of the available ELECTRA models. This means that the discriminator may be loaded into the
  :class:`~transformers.ElectraForMaskedLM` model, and the generator may be loaded into the
  :class:`~transformers.ElectraForPreTraining` model (the classification head will be randomly initialized, as it
  doesn't exist in the generator).
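
The first tip can be sketched in plain Python: an embedding table of width ``embedding_size`` is followed by a linear
projection to ``hidden_size`` only when the two sizes differ. The helper ``make_embedder``, the random weights, and
the omission of a bias term are all simplifying assumptions for illustration; the sketch does not reproduce the
library's code.

.. code-block:: python

    import random


    def make_embedder(vocab_size, embedding_size, hidden_size, seed=0):
        """Build a toy embedding lookup with an optional projection to hidden_size."""
        rng = random.Random(seed)
        # Embedding table: one vector of length embedding_size per vocabulary entry.
        table = [[rng.uniform(-1, 1) for _ in range(embedding_size)] for _ in range(vocab_size)]

        # The projection matrix (hidden_size x embedding_size) exists only if the sizes differ.
        projection = None
        if embedding_size != hidden_size:
            projection = [[rng.uniform(-1, 1) for _ in range(embedding_size)] for _ in range(hidden_size)]

        def embed(token_ids):
            vectors = [table[i] for i in token_ids]
            if projection is None:
                return vectors
            # Matrix-vector product: each output component is a dot product with one row.
            return [[sum(w * x for w, x in zip(row, vec)) for row in projection] for vec in vectors]

        return embed, projection


    embed, projection = make_embedder(vocab_size=10, embedding_size=4, hidden_size=8)
    hidden = embed([1, 2, 3])
    print(len(hidden), len(hidden[0]))  # 3 8

    same_embed, same_projection = make_embedder(vocab_size=10, embedding_size=8, hidden_size=8)
    print(same_projection is None)  # True

Keeping the embedding table narrow reduces the number of parameters tied to the vocabulary, while the projection
restores the width expected by the rest of the encoder.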

This model was contributed by `lysandre <https://huggingface.co/lysandre>`__. The original code can be found `here
<https://github.com/google-research/electra>`__.


ElectraConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraConfig
    :members:


ElectraTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraTokenizer
    :members:


ElectraTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraTokenizerFast
    :members:


Electra specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.models.electra.modeling_electra.ElectraForPreTrainingOutput
    :members:

.. autoclass:: transformers.models.electra.modeling_tf_electra.TFElectraForPreTrainingOutput
    :members:


ElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraModel
    :members: forward


ElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForPreTraining
    :members: forward


ElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForMaskedLM
    :members: forward


ElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForSequenceClassification
    :members: forward


ElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForMultipleChoice
    :members: forward


ElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForTokenClassification
    :members: forward


ElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForQuestionAnswering
    :members: forward


TFElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraModel
    :members: call


TFElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForPreTraining
    :members: call


TFElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForMaskedLM
    :members: call


TFElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForSequenceClassification
    :members: call


TFElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForMultipleChoice
    :members: call


TFElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForTokenClassification
    :members: call


TFElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForQuestionAnswering
    :members: call


FlaxElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxElectraModel
    :members: __call__


FlaxElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxElectraForPreTraining
    :members: __call__


FlaxElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxElectraForMaskedLM
    :members: __call__


FlaxElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxElectraForSequenceClassification
    :members: __call__


FlaxElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxElectraForMultipleChoice
    :members: __call__


FlaxElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxElectraForTokenClassification
    :members: __call__


FlaxElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxElectraForQuestionAnswering
    :members: __call__