ELECTRA
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ELECTRA model was proposed in the paper `ELECTRA: Pre-training Text Encoders as Discriminators Rather Than
Generators <https://openreview.net/pdf?id=r1xMH1BtvB>`__. ELECTRA is a pretraining approach that trains two
transformer models: the generator and the discriminator. The generator replaces tokens in a sequence and is therefore
trained as a masked language model. The discriminator, which is the model we're interested in, tries to identify which
tokens in the sequence were replaced by the generator.
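
As a minimal sketch of the discriminator's task, the snippet below builds a small, randomly initialized
:class:`~transformers.ElectraForPreTraining` model and scores a hand-corrupted sequence. The configuration sizes and
token ids are arbitrary values chosen for illustration, and the two replacements are written by hand rather than
sampled from a generator, so the predictions are not meaningful until the model has been trained.

.. code-block:: python

    import torch
    from transformers import ElectraConfig, ElectraForPreTraining

    torch.manual_seed(0)

    # Tiny, arbitrary sizes chosen only so the sketch runs quickly (assumption).
    config = ElectraConfig(
        vocab_size=100,
        embedding_size=16,
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=2,
        intermediate_size=64,
    )
    discriminator = ElectraForPreTraining(config)  # randomly initialized weights
    discriminator.eval()

    # Stand-in for a sequence in which a generator has replaced some tokens.
    original_ids = torch.tensor([[5, 17, 42, 8, 23, 61]])
    corrupted_ids = original_ids.clone()
    corrupted_ids[0, 2] = 77  # hand-written "replacement" (illustrative)
    corrupted_ids[0, 4] = 3   # hand-written "replacement" (illustrative)

    # Replaced token detection labels: 1 where the token was replaced, 0 otherwise.
    labels = (corrupted_ids != original_ids).long()

    with torch.no_grad():
        outputs = discriminator(input_ids=corrupted_ids, labels=labels)

    # One logit per input token; a positive logit means "predicted as replaced".
    predicted_replaced = (outputs.logits > 0).long()

    print(labels.tolist())       # [[0, 0, 1, 0, 1, 0]]
    print(outputs.logits.shape)  # torch.Size([1, 6])
    print(outputs.loss.item())   # binary cross-entropy over all six tokens

Because every position carries a label, the loss is computed over the whole sequence rather than only over a masked
subset.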

The abstract from the paper is the following:

*Masked language modeling (MLM) pretraining methods such as BERT corrupt the input by replacing some tokens with [MASK]
and then train a model to reconstruct the original tokens. While they produce good results when transferred to
downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a
more sample-efficient pretraining task called replaced token detection. Instead of masking the input, our approach
corrupts it by replacing some tokens with plausible alternatives sampled from a small generator network. Then, instead
of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that
predicts whether each token in the corrupted input was replaced by a generator sample or not. Thorough experiments
demonstrate this new pretraining task is more efficient than MLM because the task is defined over all input tokens
rather than just the small subset that was masked out. As a result, the contextual representations learned by our
approach substantially outperform the ones learned by BERT given the same model size, data, and compute. The gains are
particularly strong for small models; for example, we train a model on one GPU for 4 days that outperforms GPT (trained
using 30x more compute) on the GLUE natural language understanding benchmark. Our approach also works well at scale,
where it performs comparably to RoBERTa and XLNet while using less than 1/4 of their compute and outperforms them when
using the same amount of compute.*

Tips:

- ELECTRA is a pretraining approach, so there are almost no changes to the underlying model, BERT. The only change is
  the separation of the embedding size and the hidden size: the embedding size is generally smaller, while the hidden
  size is larger. An additional linear projection layer projects the embeddings from the embedding size to the hidden
  size. When the embedding size equals the hidden size, no projection layer is used.
- The ELECTRA checkpoints saved using `Google Research's implementation <https://github.com/google-research/electra>`__
  contain both the generator and the discriminator. The conversion script requires the user to specify which model to
  export so that it is converted into the correct architecture. Once converted to the HuggingFace format, however,
  these checkpoints may be loaded into any of the available ELECTRA models. This means that the discriminator may be
  loaded into the :class:`~transformers.ElectraForMaskedLM` model, and the generator may be loaded into the
  :class:`~transformers.ElectraForPreTraining` model (the classification head will be randomly initialized, as it
  doesn't exist in the generator).
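
Both tips can be checked with a short sketch. It uses tiny, randomly initialized models whose sizes are arbitrary
assumptions (they do not match any released checkpoint), and a temporary directory stands in for a converted
checkpoint. The ``embeddings_project`` attribute name reflects the current PyTorch implementation and may change
between library versions.

.. code-block:: python

    import tempfile

    import torch
    from transformers import (
        ElectraConfig,
        ElectraForMaskedLM,
        ElectraForPreTraining,
        ElectraModel,
    )

    # Arbitrary tiny sizes (assumption), shared by every model in this sketch.
    small = dict(vocab_size=100, num_hidden_layers=1, num_attention_heads=2, intermediate_size=64)

    # embedding_size != hidden_size: a linear projection layer is created.
    projected = ElectraModel(ElectraConfig(embedding_size=16, hidden_size=32, **small))
    has_projection = hasattr(projected, "embeddings_project")

    # embedding_size == hidden_size: no projection layer is created.
    unprojected = ElectraModel(ElectraConfig(embedding_size=32, hidden_size=32, **small))
    lacks_projection = not hasattr(unprojected, "embeddings_project")

    # Save a generator-style model, then load the same files into the discriminator
    # architecture. The shared encoder weights are reused; the discriminator head is
    # newly (randomly) initialized because it does not exist in the saved model.
    generator = ElectraForMaskedLM(ElectraConfig(embedding_size=16, hidden_size=32, **small))
    with tempfile.TemporaryDirectory() as checkpoint_dir:
        generator.save_pretrained(checkpoint_dir)
        discriminator = ElectraForPreTraining.from_pretrained(checkpoint_dir)

    same_encoder = torch.equal(
        generator.electra.embeddings.word_embeddings.weight,
        discriminator.electra.embeddings.word_embeddings.weight,
    )

    print(has_projection, lacks_projection, same_encoder)

Loading across architectures typically logs a warning listing the weights that were newly initialized, which is the
expected behaviour described in the tip above.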

The original code can be found `here <https://github.com/google-research/electra>`__.


ElectraConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraConfig
    :members:


ElectraTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraTokenizer
    :members:


ElectraTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraTokenizerFast
    :members:


Electra specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.models.electra.modeling_electra.ElectraForPreTrainingOutput
    :members:

.. autoclass:: transformers.models.electra.modeling_tf_electra.TFElectraForPreTrainingOutput
    :members:


ElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraModel
    :members: forward


ElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForPreTraining
    :members: forward


ElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForMaskedLM
    :members: forward


ElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForSequenceClassification
    :members: forward


ElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForMultipleChoice
    :members: forward


ElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForTokenClassification
    :members: forward


ElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.ElectraForQuestionAnswering
    :members: forward


TFElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraModel
    :members: call


TFElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForPreTraining
    :members: call


TFElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForMaskedLM
    :members: call


TFElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForSequenceClassification
    :members: call


TFElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForMultipleChoice
    :members: call


TFElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForTokenClassification
    :members: call


TFElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFElectraForQuestionAnswering
    :members: call