gaoqiong / lm-evaluation-harness
Commit 4907defd (unverified)
Authored Aug 10, 2023 by Hailey Schoelkopf; committed by GitHub on Aug 10, 2023.
Merge pull request #764 from EleutherAI/lintangsutawika-patch-2
Update README.md
Parents: d1a44c85, f0cc1507
Showing 1 changed file with 12 additions and 12 deletions.

lm_eval/tasks/README.md (+12, -12)
@@ -5,15 +5,15 @@ Boxes should be checked iff tasks are implemented in the refactor and tested for
 - [ ] Glue (Lintang)
 - [x] SuperGlue
-- [ ] CoQA
-- [ ] DROP
+- [ ] CoQA (Lintang)
+- [ ] DROP (Lintang)
 - [x] ~~Lambada~~
 - [x] Lambada (Cloze variants)
 - [x] ~~Lambada (Multilingual)~~
 - [x] Wikitext
 - [x] PiQA
 - [x] PROST
-- [ ] MCTACO
+- [ ] MCTACO (Lintang)
 - [x] Pubmed QA
 - [x] SciQ
 - [ ] QASPER

@@ -24,20 +24,20 @@ Boxes should be checked iff tasks are implemented in the refactor and tested for
 - [x] HellaSwag
 - [x] SWAG
 - [x] OpenBookQA
-- [ ] SQuADv2
+- [ ] SQuADv2 (Lintang)
 - [x] RACE
 - [x] HeadQA
 - [x] MathQA
 - [x] WebQs
-- [ ] WSC273
+- [ ] WSC273 (Lintang)
 - [x] Winogrande
 - [x] ANLI
 - [x] Hendrycks Ethics (missing some tasks/metrics, see PR 660: <https://github.com/EleutherAI/lm-evaluation-harness/pull/660> for more info)
-- [x] TruthfulQA (mc1)
-- [ ] TruthfulQA (mc2)
-- [ ] TruthfulQA (gen)
+- [x] TruthfulQA (mc1) (Lintang)
+- [ ] TruthfulQA (mc2) (Lintang)
+- [ ] TruthfulQA (gen) (Lintang)
 - [ ] MuTual
-- [ ] Hendrycks Math
+- [ ] Hendrycks Math (Hailey)
 - [ ] Asdiv
 - [ ] GSM8k
 - [x] Arithmetic

@@ -47,8 +47,8 @@ Boxes should be checked iff tasks are implemented in the refactor and tested for
 - [x] ~~Pile (perplexity)~~
 - [ ] BLiMP (Lintang)
 - [x] ToxiGen
-- [ ] StoryCloze
-- [ ] NaturalQs
+- [ ] StoryCloze (Lintang)
+- [ ] NaturalQs (Hailey)
 - [x] CrowS-Pairs
 - [x] XCopa
 - [ ] BIG-Bench (Hailey)

@@ -56,7 +56,7 @@ Boxes should be checked iff tasks are implemented in the refactor and tested for
 - [x] XWinograd
 - [ ] PAWS-X (Lintang)
 - [ ] XNLI (Lintang)
-- [ ] MGSM
+- [ ] MGSM (Lintang)
 - [ ] SCROLLS
 - [x] Babi
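
The task-list lines in this README follow a simple convention: `[x]` or `[ ]` marks status, and this commit appends an owner's name in parentheses to twelve items. As a minimal sketch (our own illustration, not code from lm-evaluation-harness), the Python below splits one such line into status, task name, and owner. The names `TaskItem`, `parse_item`, and `KNOWN_OWNERS` are hypothetical, and the sketch assumes that only a trailing parenthesized name from `KNOWN_OWNERS` (here just the two names appearing in the list) counts as an owner, so qualifiers such as `(mc1)` or `(Cloze variants)` stay part of the task name.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical: only these names (the two that appear in the list) are treated
# as owner annotations; any other trailing "(...)" stays part of the task name.
KNOWN_OWNERS = {"Lintang", "Hailey"}

# A Markdown task-list item: "- [ ] Name" or "- [x] Name".
ITEM_RE = re.compile(r"^- \[([ x])\] (.+)$")
# A single parenthesized word at the very end of the text, e.g. " (Lintang)".
OWNER_RE = re.compile(r"\s*\((\w+)\)$")


@dataclass
class TaskItem:
    done: bool
    name: str
    owner: Optional[str]


def parse_item(line: str) -> Optional[TaskItem]:
    """Split one checklist line into status, task name, and optional owner.

    Returns None for lines that are not task-list items.
    """
    match = ITEM_RE.match(line.strip())
    if match is None:
        return None
    done = match.group(1) == "x"
    name = match.group(2)
    owner = None
    tail = OWNER_RE.search(name)
    # Strip the trailing "(Name)" only when it is a known owner.
    if tail is not None and tail.group(1) in KNOWN_OWNERS:
        owner = tail.group(1)
        name = name[: tail.start()]
    return TaskItem(done=done, name=name, owner=owner)


if __name__ == "__main__":
    for line in [
        "- [ ] CoQA (Lintang)",
        "- [x] TruthfulQA (mc1) (Lintang)",
        "- [x] Lambada (Cloze variants)",
    ]:
        print(parse_item(line))
```

Restricting owners to a known set is a deliberate trade-off: a generic "last parenthesized word" rule would misread the pre-commit line `- [x] TruthfulQA (mc1)` as a task owned by "mc1".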