Commit 6f877d9d authored by VictorSanh

Update dev results on GLUE (bert-base-uncased) w/ median on 5 runs

parent 07681b6b
@@ -68,7 +68,9 @@ GLUE results on dev set
 ~~~~~~~~~~~~~~~~~~~~~~~
 
 We get the following results on the dev set of GLUE benchmark with an uncased BERT base
-model. All experiments were run on a P100 GPU with a batch size of 32.
+model (`bert-base-uncased`). All experiments ran on 8 V100 GPUs with a total train batch size of 24. Some of
+these tasks have a small dataset and training can lead to high variance in the results between different runs.
+We report the median on 5 runs (with different seeds) for each of the metrics.
 
 .. list-table::
    :header-rows: 1
@@ -78,31 +80,31 @@ model. All experiments were run on a P100 GPU with a batch size of 32.
      - Result
    * - CoLA
      - Matthew's corr.
-     - 57.29
+     - 55.75
    * - SST-2
      - accuracy
-     - 93.00
+     - 92.09
    * - MRPC
      - F1/accuracy
-     - 88.85/83.82
+     - 90.48/86.27
    * - STS-B
      - Pearson/Spearman corr.
-     - 89.70/89.37
+     - 89.03/88.64
    * - QQP
      - accuracy/F1
-     - 90.72/87.41
+     - 90.92/87.72
    * - MNLI
      - matched acc./mismatched acc.
-     - 83.95/84.39
+     - 83.74/84.06
    * - QNLI
      - accuracy
-     - 89.04
+     - 91.07
    * - RTE
      - accuracy
-     - 61.01
+     - 68.59
    * - WNLI
      - accuracy
-     - 53.52
+     - 43.66
 
 Some of these results are significantly different from the ones reported on the test set
......
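The updated text says each reported number is the median over 5 runs with different seeds, taken separately for each metric. The following is a minimal sketch of that aggregation, not the script used for this commit. The function name `median_per_metric` and the scores in `runs` are hypothetical and exist only to illustrate the computation.

```python
from statistics import median


def median_per_metric(runs):
    """Return the median of each metric across runs.

    `runs` is a list of dicts, one per seed, mapping a metric name
    to its dev-set score. Every run must report the same metrics.
    """
    metric_names = runs[0].keys()
    return {name: median(run[name] for run in runs) for name in metric_names}


# Hypothetical scores for 5 seeds on a two-metric task (NOT values from the commit).
runs = [
    {"f1": 89.0, "accuracy": 84.0},
    {"f1": 90.5, "accuracy": 86.5},
    {"f1": 88.2, "accuracy": 83.1},
    {"f1": 91.0, "accuracy": 86.0},
    {"f1": 90.0, "accuracy": 87.2},
]

print(median_per_metric(runs))
```

Because the median is taken per metric, the two numbers in a pair such as F1/accuracy need not come from the same run. In the hypothetical data above, the median F1 comes from the fifth run and the median accuracy from the fourth.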