Add electra and alectra model cards (#3524)

Commit ebceeeac, authored Mar 31, 2020 by Sho Arora; committed by GitHub Mar 31, 2020
Showing with 119 additions and 0 deletions

model_cards/shoarora/alectra-small-owt/README.md +60 -0

model_cards/shoarora/electra-small-owt/README.md +59 -0

--- a/model_cards/shoarora/alectra-small-owt/README.md
+++ b/model_cards/shoarora/alectra-small-owt/README.md
+# ALECTRA-small-OWT
+This is an extension of the
+[ELECTRA](https://openreview.net/forum?id=r1xMH1BtvB) small model, trained on the
+[OpenWebText corpus](https://skylion007.github.io/OpenWebTextCorpus/).
+The training task (discriminative LM / replaced-token-detection) can be generalized to any transformer type.  Here, we train an ALBERT model under the same scheme.
+## Pretraining task
+![electra task diagram](https://github.com/shoarora/lmtuners/raw/master/assets/electra.png)
+(figure from [Clark et al. 2020](https://openreview.net/pdf?id=r1xMH1BtvB))
+ELECTRA uses discriminative LM / replaced-token-detection for pretraining.
+This involves a generator (a Masked LM model) creating examples for a discriminator
+to classify as original or replaced for each token.
+The generator generalizes to any `*ForMaskedLM` model and the discriminator could be
+any `*ForTokenClassification` model.  Therefore, we can extend the task to ALBERT models,
+not just BERT as in the original paper.
+## Usage
+```python
+from transformers import AlbertForSequenceClassification, BertTokenizer
+# Both models use the bert-base-uncased tokenizer and vocab.
+tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
+alectra = AlbertForSequenceClassification.from_pretrained('shoarora/alectra-small-owt')
+```
+NOTE: this ALBERT model uses a BERT WordPiece tokenizer.
+## Code
+The PyTorch module that implements this task is available [here](https://github.com/shoarora/lmtuners/blob/master/lmtuners/lightning_modules/discriminative_lm.py).
+Further implementation information is available [here](https://github.com/shoarora/lmtuners/tree/master/experiments/disc_lm_small),
+and [here](https://github.com/shoarora/lmtuners/blob/master/experiments/disc_lm_small/train_alectra_small.py) is the script that created this model.
+This specific model was trained with the following params:
+- `batch_size: 512`
+- `training_steps: 5e5`
+- `warmup_steps: 4e4`
+- `learning_rate: 2e-3`
+## Downstream tasks
+#### GLUE Dev results
+| Model                    | # Params | CoLA | SST | MRPC | STS  | QQP  | MNLI | QNLI | RTE |
+| ---                      | ---      | ---  | --- | ---  | ---  | ---  | ---  | ---  | --- |
+| ELECTRA-Small++          | 14M      | 57.0 | 91. | 88.0 | 87.5 | 89.0 | 81.3 | 88.4 | 66.7|
+| ELECTRA-Small-OWT        | 14M      | 56.8 | 88.3| 87.4 | 86.8 | 88.3 | 78.9 | 87.9 | 68.5|
+| ELECTRA-Small-OWT (ours) | 17M      | 56.3 | 88.4| 75.0 | 86.1 | 89.1 | 77.9 | 83.0 | 67.1|
+| ALECTRA-Small-OWT (ours) |  4M      | 50.6 | 89.1| 86.3 | 87.2 | 89.1 | 78.2 | 85.9 | 69.6|
+#### GLUE Test results
+| Model                    | # Params | CoLA | SST | MRPC | STS  | QQP  | MNLI | QNLI | RTE |
+| ---                      | ---      | ---  | --- | ---  | ---  | ---  | ---  | ---  | --- |
+| BERT-Base                | 110M     | 52.1 | 93.5| 84.8 | 85.9 | 89.2 | 84.6 | 90.5 | 66.4|
+| GPT                      | 117M     | 45.4 | 91.3| 75.7 | 80.0 | 88.5 | 82.1 | 88.1 | 56.0|
+| ELECTRA-Small++          | 14M      | 57.0 | 91.2| 88.0 | 87.5 | 89.0 | 81.3 | 88.4 | 66.7|
+| ELECTRA-Small-OWT (ours) | 17M      | 57.4 | 89.3| 76.2 | 81.9 | 87.5 | 78.1 | 82.4 | 68.1|
+| ALECTRA-Small-OWT (ours) |  4M      | 43.9 | 87.9| 82.1 | 82.0 | 87.6 | 77.9 | 85.8 | 67.5|
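The replaced-token-detection task described in the card above can be illustrated with a minimal sketch of how one training example is built. This is not the code from the linked `lmtuners` module. The function name `build_rtd_example`, the `generator_sample` callable (a stand-in for sampling from a `*ForMaskedLM` model), and the 15% masking rate are assumptions made for illustration. Following Clark et al. 2020, a position where the generator happens to reproduce the original token is labeled as original, not replaced.

```python
import random


def build_rtd_example(original_ids, generator_sample, mask_prob=0.15, rng=None):
    """Build one replaced-token-detection example (illustrative sketch).

    original_ids: list of token ids for the uncorrupted sequence.
    generator_sample: hypothetical callable (position, original_id) -> token id,
        standing in for a sample from a masked LM at a masked position.
    mask_prob: assumed fraction of positions handed to the generator.
    Returns (discriminator_input, labels); labels[i] is 1 when the token at
    position i differs from the original and 0 otherwise.
    """
    rng = rng or random.Random()
    discriminator_input = list(original_ids)
    for i, token_id in enumerate(original_ids):
        # Each position is independently chosen for replacement.
        if rng.random() < mask_prob:
            discriminator_input[i] = generator_sample(i, token_id)
    # A generator sample equal to the original token counts as "original".
    labels = [int(new != old) for new, old in zip(discriminator_input, original_ids)]
    return discriminator_input, labels
```

The discriminator (any `*ForTokenClassification` model, per the card) would then be trained to predict `labels` from `discriminator_input`, one binary decision per token.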
--- a/model_cards/shoarora/electra-small-owt/README.md
+++ b/model_cards/shoarora/electra-small-owt/README.md
+# ELECTRA-small-OWT
+This is an unofficial implementation of an
+[ELECTRA](https://openreview.net/forum?id=r1xMH1BtvB) small model, trained on the
+[OpenWebText corpus](https://skylion007.github.io/OpenWebTextCorpus/).
+Differences from official ELECTRA models:
+ - we use a `BertForMaskedLM` as the generator and a `BertForTokenClassification` as the discriminator
+ - the official models use an embedding projection layer, which BERT does not have
+## Pretraining task
+![electra task diagram](https://github.com/shoarora/lmtuners/raw/master/assets/electra.png)
+(figure from [Clark et al. 2020](https://openreview.net/pdf?id=r1xMH1BtvB))
+ELECTRA uses discriminative LM / replaced-token-detection for pretraining.
+This involves a generator (a Masked LM model) creating examples for a discriminator
+to classify as original or replaced for each token.
+## Usage
+```python
+from transformers import BertForSequenceClassification, BertTokenizer
+tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
+electra = BertForSequenceClassification.from_pretrained('shoarora/electra-small-owt')
+```
+## Code
+The PyTorch module that implements this task is available [here](https://github.com/shoarora/lmtuners/blob/master/lmtuners/lightning_modules/discriminative_lm.py).
+Further implementation information is available [here](https://github.com/shoarora/lmtuners/tree/master/experiments/disc_lm_small),
+and [here](https://github.com/shoarora/lmtuners/blob/master/experiments/disc_lm_small/train_electra_small.py) is the script that created this model.
+This specific model was trained with the following params:
+- `batch_size: 512`
+- `training_steps: 5e5`
+- `warmup_steps: 4e4`
+- `learning_rate: 2e-3`
+## Downstream tasks
+#### GLUE Dev results
+| Model                    | # Params | CoLA | SST | MRPC | STS  | QQP  | MNLI | QNLI | RTE |
+| ---                      | ---      | ---  | --- | ---  | ---  | ---  | ---  | ---  | --- |
+| ELECTRA-Small++          | 14M      | 57.0 | 91. | 88.0 | 87.5 | 89.0 | 81.3 | 88.4 | 66.7|
+| ELECTRA-Small-OWT        | 14M      | 56.8 | 88.3| 87.4 | 86.8 | 88.3 | 78.9 | 87.9 | 68.5|
+| ELECTRA-Small-OWT (ours) | 17M      | 56.3 | 88.4| 75.0 | 86.1 | 89.1 | 77.9 | 83.0 | 67.1|
+| ALECTRA-Small-OWT (ours) |  4M      | 50.6 | 89.1| 86.3 | 87.2 | 89.1 | 78.2 | 85.9 | 69.6|
+- Table initialized from the [ELECTRA GitHub repo](https://github.com/google-research/electra)
+#### GLUE Test results
+| Model                    | # Params | CoLA | SST | MRPC | STS  | QQP  | MNLI | QNLI | RTE |
+| ---                      | ---      | ---  | --- | ---  | ---  | ---  | ---  | ---  | --- |
+| BERT-Base                | 110M     | 52.1 | 93.5| 84.8 | 85.9 | 89.2 | 84.6 | 90.5 | 66.4|
+| GPT                      | 117M     | 45.4 | 91.3| 75.7 | 80.0 | 88.5 | 82.1 | 88.1 | 56.0|
+| ELECTRA-Small++          | 14M      | 57.0 | 91.2| 88.0 | 87.5 | 89.0 | 81.3 | 88.4 | 66.7|
+| ELECTRA-Small-OWT (ours) | 17M      | 57.4 | 89.3| 76.2 | 81.9 | 87.5 | 78.1 | 82.4 | 68.1|
+| ALECTRA-Small-OWT (ours) |  4M      | 43.9 | 87.9| 82.1 | 82.0 | 87.6 | 77.9 | 85.8 | 67.5|
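Both cards list the same training parameters (`learning_rate: 2e-3`, `warmup_steps: 4e4`, `training_steps: 5e5`) but do not state the shape of the learning-rate schedule. As a rough sketch only, assuming linear warmup followed by linear decay to zero (an assumption, not something the cards confirm), the learning rate at a given step could be computed as follows. The function name `learning_rate_at` is hypothetical.

```python
def learning_rate_at(step, peak_lr=2e-3, warmup_steps=40_000, training_steps=500_000):
    """Learning rate at `step` under an ASSUMED linear warmup / linear decay schedule.

    Defaults mirror the values listed in the model cards; the schedule shape
    itself is an illustrative assumption.
    """
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr over the warmup period.
        return peak_lr * step / warmup_steps
    # Decay linearly from peak_lr down to 0 at the final training step.
    remaining = max(0, training_steps - step)
    return peak_lr * remaining / (training_steps - warmup_steps)
```

Under this assumption the peak rate is reached at step 40,000 and the rate falls to zero at step 500,000; the script linked in each card is the authoritative source for the schedule actually used.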