Commit 6f877d9d authored by VictorSanh

Update dev results on GLUE (bert-base-uncased) w/ median on 5 runs

parent 07681b6b
@@ -68,7 +68,9 @@ GLUE results on dev set
 ~~~~~~~~~~~~~~~~~~~~~~~
 
 We get the following results on the dev set of GLUE benchmark with an uncased BERT base
-model. All experiments were run on a P100 GPU with a batch size of 32.
+model (`bert-base-uncased`). All experiments ran on 8 V100 GPUs with a total train batch size of 24. Some of
+these tasks have a small dataset and training can lead to high variance in the results between different runs.
+We report the median on 5 runs (with different seeds) for each of the metrics.
 
 .. list-table::
    :header-rows: 1
@@ -78,31 +80,31 @@ model. All experiments were run on a P100 GPU with a batch size of 32.
      - Result
    * - CoLA
      - Matthew's corr.
-     - 57.29
+     - 55.75
    * - SST-2
      - accuracy
-     - 93.00
+     - 92.09
    * - MRPC
      - F1/accuracy
-     - 88.85/83.82
+     - 90.48/86.27
    * - STS-B
      - Pearson/Spearman corr.
-     - 89.70/89.37
+     - 89.03/88.64
    * - QQP
      - accuracy/F1
-     - 90.72/87.41
+     - 90.92/87.72
    * - MNLI
      - matched acc./mismatched acc.
-     - 83.95/84.39
+     - 83.74/84.06
    * - QNLI
      - accuracy
-     - 89.04
+     - 91.07
    * - RTE
      - accuracy
-     - 61.01
+     - 68.59
    * - WNLI
      - accuracy
-     - 53.52
+     - 43.66
 
 Some of these results are significantly different from the ones reported on the test set
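
The protocol described in the diff above (5 runs with different seeds, median reported per
metric) can be reproduced with a short aggregation step. Below is a minimal sketch, assuming
you have already launched `run_glue.py` once per seed (e.g. distributing across the 8 GPUs
with `torch.distributed.launch`) and collected each run's dev metrics yourself; the metric
names and zeroed values are placeholders for illustration, not the reported numbers::

    # Aggregate per-seed GLUE dev metrics into a per-metric median, matching
    # how the results above were reported. Placeholder values only; substitute
    # the numbers collected from your own 5 runs.
    from statistics import median

    # One list per metric, one entry per seed (hypothetical names and values).
    results_by_seed = {
        "mrpc_f1":  [0.0, 0.0, 0.0, 0.0, 0.0],
        "mrpc_acc": [0.0, 0.0, 0.0, 0.0, 0.0],
    }

    for metric, values in results_by_seed.items():
        assert len(values) == 5, "expected one value per seed"
        print(f"{metric}: median of {len(values)} runs = {median(values):.2f}")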