Commit 3fe5c8e8 authored by VictorSanh

update bert-base-uncased rslts

parent 354944e6
@@ -97,20 +97,20 @@ Fine-tuning the library models for sequence classification on the GLUE benchmark
 Evaluation](https://gluebenchmark.com/). This script can fine-tune the following models: BERT, XLM, XLNet and RoBERTa.
 GLUE is made up of a total of 9 different tasks. We get the following results on the dev set of the benchmark with an
 uncased BERT base model (the checkpoint `bert-base-uncased`). All experiments ran on 8 V100 GPUs with a total train
 batch size of 24. Some of these tasks have a small dataset and training can lead to high variance in the results
 between different runs. We report the median on 5 runs (with different seeds) for each of the metrics.
 | Task | Metric | Result |
 |-------|------------------------------|-------------|
-| CoLA | Matthew's corr | 55.75 |
-| SST-2 | Accuracy | 92.09 |
-| MRPC | F1/Accuracy | 90.48/86.27 |
-| STS-B | Person/Spearman corr. | 89.03/88.64 |
-| QQP | Accuracy/F1 | 90.92/87.72 |
-| MNLI | Matched acc./Mismatched acc. | 83.74/84.06 |
-| QNLI | Accuracy | 91.07 |
-| RTE | Accuracy | 68.59 |
+| CoLA | Matthew's corr | 48.87 |
+| SST-2 | Accuracy | 91.74 |
+| MRPC | F1/Accuracy | 90.70/86.27 |
+| STS-B | Person/Spearman corr. | 91.39/91.04 |
+| QQP | Accuracy/F1 | 90.79/87.66 |
+| MNLI | Matched acc./Mismatched acc. | 83.70/84.83 |
+| QNLI | Accuracy | 89.31 |
+| RTE | Accuracy | 71.43 |
 | WNLI | Accuracy | 43.66 |
 Some of these results are significantly different from the ones reported on the test set
......
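The README text in this diff says each reported number is the median over 5 runs with different seeds. A minimal sketch of that aggregation follows. The function name `median_over_seeds`, the metric keys, and the scores are hypothetical placeholders and are not taken from this commit or from the library's scripts.

```python
from statistics import median


def median_over_seeds(runs):
    """Return the per-metric median across runs.

    `runs` is a list of dicts, one per seed, mapping metric name -> score.
    Assumes every run reports the same set of metrics.
    """
    if not runs:
        raise ValueError("need at least one run")
    metric_names = runs[0].keys()
    return {name: median(run[name] for run in runs) for name in metric_names}


# Placeholder scores for five hypothetical seeds (NOT results from this commit).
example_runs = [
    {"f1": 0.90, "acc": 0.85},
    {"f1": 0.88, "acc": 0.86},
    {"f1": 0.91, "acc": 0.84},
    {"f1": 0.89, "acc": 0.87},
    {"f1": 0.92, "acc": 0.83},
]

print(median_over_seeds(example_runs))
```

In this sketch the median is taken independently for each metric, so a reported pair such as F1/Accuracy may combine values from two different runs. Whether the numbers in the table were aggregated this way is not stated in the source.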