ELECTRA
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ELECTRA model was proposed in the paper `ELECTRA: Pre-training Text Encoders as Discriminators Rather Than
Generators <https://openreview.net/pdf?id=r1xMH1BtvB>`__. ELECTRA is a pretraining approach that trains two
transformer models: the generator and the discriminator. The generator's role is to replace tokens in a sequence, and
it is therefore trained as a masked language model. The discriminator, which is the model we're interested in, tries
to identify which tokens were replaced by the generator in the sequence.

The abstract from the paper is the following:

*Masked language modeling (MLM) pre-training methods such as BERT corrupt
the input by replacing some tokens with [MASK] and then train a model to
reconstruct the original tokens. While they produce good results when transferred
to downstream NLP tasks, they generally require large amounts of compute to be
effective. As an alternative, we propose a more sample-efficient pre-training task
called replaced token detection. Instead of masking the input, our approach
corrupts it by replacing some tokens with plausible alternatives sampled from a small
generator network. Then, instead of training a model that predicts the original
identities of the corrupted tokens, we train a discriminative model that predicts
whether each token in the corrupted input was replaced by a generator sample
or not. Thorough experiments demonstrate this new pre-training task is more
efficient than MLM because the task is defined over all input tokens rather than
just the small subset that was masked out. As a result, the contextual representations
learned by our approach substantially outperform the ones learned by BERT
given the same model size, data, and compute. The gains are particularly strong
for small models; for example, we train a model on one GPU for 4 days that
outperforms GPT (trained using 30x more compute) on the GLUE natural language
understanding benchmark. Our approach also works well at scale, where it
performs comparably to RoBERTa and XLNet while using less than 1/4 of their
compute and outperforms them when using the same amount of compute.*
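
The replaced token detection objective described in the abstract can be illustrated with a small, library-free sketch.
This is only an illustration under stated assumptions: the helper name ``replaced_token_labels`` is hypothetical, the
generator samples are hard-coded rather than drawn from a real masked language model, and the sentence is the example
used in the paper. Note that a masked position where the generator happens to reproduce the original token is labeled
as original, not replaced.

```python
def replaced_token_labels(original, mask_positions, generator_samples):
    """Build the corrupted sequence and the per-token discriminator targets.

    Hypothetical helper for illustration only; it is not part of transformers.
    """
    corrupted = list(original)
    # Put the generator's sample at each masked position.
    for position, sample in zip(mask_positions, generator_samples):
        corrupted[position] = sample
    # 1 = replaced, 0 = original. A token counts as replaced only if it differs
    # from the original, even when its position was masked.
    labels = [int(new != old) for new, old in zip(corrupted, original)]
    return corrupted, labels


original = ["the", "chef", "cooked", "the", "meal"]
# Assumed generator behaviour: positions 0 and 2 were masked, and the
# generator sampled "the" (same as the original) and "ate" (different).
corrupted, labels = replaced_token_labels(original, [0, 2], ["the", "ate"])
print(corrupted)
print(labels)
```

The discriminator is trained on every position of ``corrupted``, which is why the task is defined over all input
tokens rather than only the masked subset.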

Tips:

- ELECTRA is the pretraining approach, so there are nearly no changes to the underlying model, BERT. The only change
  is the separation of the embedding size and the hidden size: the embedding size is generally smaller, while the
  hidden size is larger. An additional linear projection layer projects the embeddings from their embedding size to
  the hidden size. When the embedding size is the same as the hidden size, no projection layer is used.
- The ELECTRA checkpoints saved using `Google Research's implementation <https://github.com/google-research/electra>`__
  contain both the generator and the discriminator. The conversion script requires the user to name which model to
  export into the correct architecture. Once converted to the HuggingFace format, however, these checkpoints may be
  loaded into all available ELECTRA models. This means that the discriminator may be loaded in the
  :class:`~transformers.ElectraForMaskedLM` model, and the generator may be loaded in the
  :class:`~transformers.ElectraForPreTraining` model (the classification head will be randomly initialized as it
  doesn't exist in the generator).
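
The effect of separating the embedding size from the hidden size (first tip above) can be seen with some simple
arithmetic. The sketch below is illustrative only: the function name is hypothetical, the sizes are example values
chosen for the illustration, only the word embedding matrix is counted (position and token type embeddings are
ignored), and the projection is assumed to be a linear layer with a bias term.

```python
def word_embedding_parameter_count(vocab_size, embedding_size, hidden_size):
    """Count word-embedding parameters, plus the projection when one is needed.

    Hypothetical helper for illustration only; it is not part of transformers.
    """
    count = vocab_size * embedding_size
    if embedding_size != hidden_size:
        # Assumed linear projection from embedding size to hidden size:
        # one weight matrix plus one bias vector.
        count += embedding_size * hidden_size + hidden_size
    # When the two sizes match, no projection layer is used.
    return count


# Example values only (assumptions, not read from any checkpoint).
vocab_size, embedding_size, hidden_size = 30522, 128, 256

factorized = word_embedding_parameter_count(vocab_size, embedding_size, hidden_size)
tied = word_embedding_parameter_count(vocab_size, hidden_size, hidden_size)
print(factorized, tied)
```

With these example sizes, the smaller embedding plus projection needs roughly half as many parameters as an embedding
matrix that uses the hidden size directly.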

The original code can be found `here <https://github.com/google-research/electra>`__.


ElectraConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraConfig
    :members:


ElectraTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraTokenizer
    :members:


ElectraTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraTokenizerFast
    :members:


Electra specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_electra.ElectraForPreTrainingOutput
    :members:

.. autoclass:: transformers.modeling_tf_electra.TFElectraForPreTrainingOutput
    :members:


ElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraModel
    :members: forward


ElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForPreTraining
    :members: forward


ElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForMaskedLM
    :members: forward


ElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForSequenceClassification
    :members: forward


ElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForMultipleChoice
    :members: forward


ElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForTokenClassification
    :members: forward


ElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForQuestionAnswering
    :members: forward


TFElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraModel
    :members: call


TFElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForPreTraining
    :members: call


TFElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForMaskedLM
    :members: call


TFElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForSequenceClassification
    :members: call


TFElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForMultipleChoice
    :members: call


TFElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForTokenClassification
    :members: call


TFElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForQuestionAnswering
    :members: call