gaoqiong / lm-evaluation-harness

Commit 7bd37779 (unverified)
Authored Feb 14, 2021 by Leo Gao; committed by GitHub on Feb 14, 2021

Update README.md

Parent: c155698f
Showing 1 changed file with 80 additions and 39 deletions.

README.md
@@ -13,7 +13,7 @@ The goal of this project is to build a set of tools for evaluating LMs on typica
 ### Overview of Tasks
 | Task Name |Train|Val|Test| Metrics |
-|---------------|-----|---|----|--------------------|
+|---------------|-----|---|----|---------------|
 |cola |✓ |✓ |✓ |mcc |
 |mnli |✓ |✓ |✓ |acc |
 |mnli_mismatched|✓ |✓ |✓ |acc |
@@ -27,17 +27,27 @@ The goal of this project is to build a set of tools for evaluating LMs on typica
 |cb |✓ |✓ |✓ |acc, f1 |
 |copa |✓ |✓ |✓ |acc |
 |multirc |✓ |✓ |✓ |acc |
+|record |✓ |✓ | |f1, em |
 |wic |✓ |✓ |✓ |acc |
 |wsc |✓ |✓ |✓ |acc |
-|lambada | |✓ | |perplexity, accuracy|
+|coqa |✓ |✓ | |f1, em |
+|lambada | |✓ | |ppl, acc |
 |piqa |✓ |✓ | |acc |
+|pubmedqa | | |✓ |acc |
+|sciq |✓ |✓ |✓ |acc |
+|qa4mre_2011 | | |✓ |acc |
+|qa4mre_2012 | | |✓ |acc |
+|qa4mre_2013 | | |✓ |acc |
 |arc_easy |✓ |✓ |✓ |acc |
 |arc_challenge |✓ |✓ |✓ |acc |
-|hellaswag |✓ |✓ |✓ |acc |
+|hellaswag |✓ |✓ | |acc |
+|openbookqa |✓ |✓ |✓ |acc |
 |race |✓ |✓ |✓ |acc |
+|headqa |✓ |✓ |✓ |acc |
+|mathqa |✓ |✓ |✓ |acc |
 |webqs |✓ | |✓ |acc |
 |wsc273 | | |✓ |acc |
-|winogrande |✓ |✓ |✓ |acc |
+|winogrande |✓ |✓ | |acc |
 |anli_r1 |✓ |✓ |✓ |acc |
 |anli_r2 |✓ |✓ |✓ |acc |
 |anli_r3 |✓ |✓ |✓ |acc |
@@ -51,6 +61,37 @@ The goal of this project is to build a set of tools for evaluating LMs on typica
 |arithmetic_5ds | |✓ | |acc |
 |arithmetic_2dm | |✓ | |acc |
 |arithmetic_1dc | |✓ | |acc |
+|wmt14-en-fr | | |✓ |bleu, chrf, ter|
+|wmt14-fr-en | | |✓ |bleu, chrf, ter|
+|wmt16-en-ro | | |✓ |bleu, chrf, ter|
+|wmt16-ro-en | | |✓ |bleu, chrf, ter|
+|wmt16-de-en | | |✓ |bleu, chrf, ter|
+|wmt16-en-de | | |✓ |bleu, chrf, ter|
+|wmt20-cs-en | | |✓ |bleu, chrf, ter|
+|wmt20-de-en | | |✓ |bleu, chrf, ter|
+|wmt20-de-fr | | |✓ |bleu, chrf, ter|
+|wmt20-en-cs | | |✓ |bleu, chrf, ter|
+|wmt20-en-de | | |✓ |bleu, chrf, ter|
+|wmt20-en-iu | | |✓ |bleu, chrf, ter|
+|wmt20-en-ja | | |✓ |bleu, chrf, ter|
+|wmt20-en-km | | |✓ |bleu, chrf, ter|
+|wmt20-en-pl | | |✓ |bleu, chrf, ter|
+|wmt20-en-ps | | |✓ |bleu, chrf, ter|
+|wmt20-en-ru | | |✓ |bleu, chrf, ter|
+|wmt20-en-ta | | |✓ |bleu, chrf, ter|
+|wmt20-en-zh | | |✓ |bleu, chrf, ter|
+|wmt20-fr-de | | |✓ |bleu, chrf, ter|
+|wmt20-iu-en | | |✓ |bleu, chrf, ter|
+|wmt20-ja-en | | |✓ |bleu, chrf, ter|
+|wmt20-km-en | | |✓ |bleu, chrf, ter|
+|wmt20-pl-en | | |✓ |bleu, chrf, ter|
+|wmt20-ps-en | | |✓ |bleu, chrf, ter|
+|wmt20-ru-en | | |✓ |bleu, chrf, ter|
+|wmt20-ta-en | | |✓ |bleu, chrf, ter|
+|wmt20-zh-en | | |✓ |bleu, chrf, ter|
+|iwslt17-en-ar | | |✓ |bleu, chrf, ter|
+|iwslt17-ar-en | | |✓ |bleu, chrf, ter|
 ## Usage
...