.. 
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

ELECTRA
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ELECTRA model was proposed in the paper `ELECTRA: Pre-training Text Encoders as Discriminators Rather Than
Generators <https://openreview.net/pdf?id=r1xMH1BtvB>`__. ELECTRA is a pretraining approach that trains two
transformer models: a generator and a discriminator. The generator replaces tokens in a sequence and is therefore
trained as a masked language model. The discriminator, which is the model we're interested in, tries to identify which
tokens in the sequence were replaced by the generator.

The abstract from the paper is the following:

*Masked language modeling (MLM) pretraining methods such as BERT corrupt the input by replacing some tokens with [MASK]
and then train a model to reconstruct the original tokens. While they produce good results when transferred to
downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a
more sample-efficient pretraining task called replaced token detection. Instead of masking the input, our approach
corrupts it by replacing some tokens with plausible alternatives sampled from a small generator network. Then, instead
of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that
predicts whether each token in the corrupted input was replaced by a generator sample or not. Thorough experiments
demonstrate this new pretraining task is more efficient than MLM because the task is defined over all input tokens
rather than just the small subset that was masked out. As a result, the contextual representations learned by our
approach substantially outperform the ones learned by BERT given the same model size, data, and compute. The gains are
particularly strong for small models; for example, we train a model on one GPU for 4 days that outperforms GPT (trained
using 30x more compute) on the GLUE natural language understanding benchmark. Our approach also works well at scale,
where it performs comparably to RoBERTa and XLNet while using less than 1/4 of their compute and outperforms them when
using the same amount of compute.*
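
The replaced token detection objective described in the abstract can be illustrated with a minimal sketch in plain
Python. The helper ``replaced_token_labels`` and the example sentence are hypothetical and are not part of the library;
the sketch only assumes that a position counts as replaced when the corrupted token differs from the original one.

```python
from typing import List, Sequence


def replaced_token_labels(original: Sequence[str], corrupted: Sequence[str]) -> List[int]:
    """Return 1 where the corrupted token differs from the original token, else 0."""
    if len(original) != len(corrupted):
        raise ValueError("original and corrupted sequences must have the same length")
    return [int(o != c) for o, c in zip(original, corrupted)]


# Hypothetical example: a generator replaced "cooked" with "ate".
original = ["the", "chef", "cooked", "the", "meal"]
corrupted = ["the", "chef", "ate", "the", "meal"]

labels = replaced_token_labels(original, corrupted)
print(labels)  # [0, 0, 1, 0, 0]
```

A discriminator would be trained to predict these per-token labels, so every position contributes to the loss rather
than only the masked subset.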

Tips:

- ELECTRA is a pretraining approach, so the underlying model, BERT, is left almost unchanged. The only change is the
  separation of the embedding size and the hidden size: the embedding size is generally smaller, while the hidden size
  is larger. An additional linear projection layer projects the embeddings from the embedding size to the hidden size.
  When the embedding size is the same as the hidden size, no projection layer is used.
- The ELECTRA checkpoints saved using `Google Research's implementation <https://github.com/google-research/electra>`__
  contain both the generator and the discriminator. The conversion script requires the user to name which model to
  export into the correct architecture. Once converted to the HuggingFace format, however, these checkpoints may be
  loaded into all available ELECTRA models. This means that the discriminator may be loaded in the
  :class:`~transformers.ElectraForMaskedLM` model, and the generator may be loaded in the
  :class:`~transformers.ElectraForPreTraining` model (the classification head will be randomly initialized as it
  doesn't exist in the generator).
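
To make the first tip concrete, the following sketch counts the word-embedding parameters with and without a separate
embedding size. The helper name and the sizes are illustrative assumptions (they are not read from any checkpoint), and
the projection is assumed to be a linear layer with a bias.

```python
def embedding_parameter_count(vocab_size: int, embedding_size: int, hidden_size: int) -> int:
    """Count word-embedding parameters, plus the linear projection when one is needed."""
    table = vocab_size * embedding_size
    if embedding_size == hidden_size:
        # Same sizes: the embeddings feed the encoder directly, no projection layer.
        return table
    # Assumed projection: weight matrix (embedding_size x hidden_size) plus a bias vector.
    projection = embedding_size * hidden_size + hidden_size
    return table + projection


# Illustrative sizes only (assumptions).
VOCAB_SIZE = 30522

separate_sizes = embedding_parameter_count(VOCAB_SIZE, embedding_size=128, hidden_size=256)
same_sizes = embedding_parameter_count(VOCAB_SIZE, embedding_size=256, hidden_size=256)

print(separate_sizes)  # 3939840
print(same_sizes)      # 7813632
```

With these assumed sizes, the smaller embedding table plus the projection needs roughly half as many parameters as an
embedding table built directly at the hidden size.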

The original code can be found `here <https://github.com/google-research/electra>`__.
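
For checkpoints that have already been converted, loading the two models into their matching heads might look like the
sketch below. The helper ``load_converted_electra`` is hypothetical, and the default checkpoint names are assumed to be
available on the model hub; calling it needs the library installed with PyTorch and network access.

```python
def load_converted_electra(
    discriminator_name: str = "google/electra-small-discriminator",
    generator_name: str = "google/electra-small-generator",
):
    """Load a converted discriminator and generator into their matching model classes."""
    # Imported lazily so that defining this helper does not require the library.
    from transformers import ElectraForMaskedLM, ElectraForPreTraining

    # The discriminator carries the replaced-token-detection head.
    discriminator = ElectraForPreTraining.from_pretrained(discriminator_name)
    # The generator carries the masked language modeling head.
    generator = ElectraForMaskedLM.from_pretrained(generator_name)
    return discriminator, generator
```

Passing a generator checkpoint to :class:`~transformers.ElectraForPreTraining` instead would follow the same pattern,
with the classification head randomly initialized as noted in the tips.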


ElectraConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraConfig
    :members:


ElectraTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraTokenizer
    :members:


ElectraTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraTokenizerFast
    :members:


Electra specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.models.electra.modeling_electra.ElectraForPreTrainingOutput
    :members:

.. autoclass:: transformers.models.electra.modeling_tf_electra.TFElectraForPreTrainingOutput
    :members:


ElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraModel
    :members: forward


ElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForPreTraining
    :members: forward


ElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForMaskedLM
    :members: forward


ElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForSequenceClassification
    :members: forward


ElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForMultipleChoice
    :members: forward


ElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForTokenClassification
    :members: forward


ElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForQuestionAnswering
    :members: forward


TFElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraModel
    :members: call


TFElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForPreTraining
    :members: call


TFElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForMaskedLM
    :members: call


TFElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForSequenceClassification
    :members: call


TFElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForMultipleChoice
    :members: call


TFElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForTokenClassification
    :members: call


TFElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForQuestionAnswering
    :members: call