gaoqiong / lm-evaluation-harness
Commit 36485d7a (Unverified)
Authored Mar 07, 2021 by Leo Gao
Committed by GitHub, Mar 07, 2021

Update README.md

Parent: 1de3b743
Showing 1 changed file with 91 additions and 84 deletions

README.md (+91 -84)
@@ -13,10 +13,10 @@ The goal of this project is to build a set of tools for evaluating LMs on typica
 ### Overview of Tasks
 | Task Name |Train|Val|Test| Metrics |
-|---------------|-----|---|----|---------------|
+|------------------------------|-----|---|----|---------------|
 |cola |✓ |✓ |✓ |mcc |
 |mnli |✓ |✓ |✓ |acc |
-|mnli_mismatched|✓ |✓ |✓ |acc |
+|mnli_mismatched |✓ |✓ |✓ |acc |
 |mrpc |✓ |✓ |✓ |acc, f1 |
 |rte |✓ |✓ |✓ |acc |
 |qnli |✓ |✓ |✓ |acc |
@@ -31,6 +31,7 @@ The goal of this project is to build a set of tools for evaluating LMs on typica
 |wic |✓ |✓ |✓ |acc |
 |wsc |✓ |✓ |✓ |acc |
 |coqa |✓ |✓ | |f1, em |
+|drop |✓ |✓ | |em, f1 |
 |lambada | |✓ | |ppl, acc |
 |piqa |✓ |✓ | |acc |
 |pubmedqa | | |✓ |acc |
@@ -52,10 +53,11 @@ The goal of this project is to build a set of tools for evaluating LMs on typica
 |anli_r2 |✓ |✓ |✓ |acc |
 |anli_r3 |✓ |✓ |✓ |acc |
 |ethics_cm |✓ |✓ |✓ |acc |
-|ethics_deontology |✓ |✓ |✓ |acc |
+|ethics_deontology |✓ |✓ |✓ |acc, em |
-|ethics_justice |✓ |✓ |✓ |acc |
+|ethics_justice |✓ |✓ |✓ |acc, em |
+|ethics_utilitarianism_original|✓ |✓ |✓ |acc |
 |ethics_utilitarianism |✓ |✓ |✓ |acc |
-|ethics_virtue |✓ |✓ |✓ |acc |
+|ethics_virtue |✓ |✓ |✓ |acc, em |
 |arithmetic_2da | |✓ | |acc |
 |arithmetic_2ds | |✓ | |acc |
 |arithmetic_3da | |✓ | |acc |
@@ -96,6 +98,11 @@ The goal of this project is to build a set of tools for evaluating LMs on typica
 |wmt20-zh-en | | |✓ |bleu, chrf, ter|
 |iwslt17-en-ar | | |✓ |bleu, chrf, ter|
 |iwslt17-ar-en | | |✓ |bleu, chrf, ter|
+|anagrams1 | |✓ | |acc |
+|anagrams2 | |✓ | |acc |
+|cycle_letters | |✓ | |acc |
+|random_insertion | |✓ | |acc |
+|reversed_words | |✓ | |acc |