Unverified commit 4907defd authored by Hailey Schoelkopf, committed by GitHub

Merge pull request #764 from EleutherAI/lintangsutawika-patch-2

Update README.md
parents d1a44c85 f0cc1507
@@ -5,15 +5,15 @@ Boxes should be checked iff tasks are implemented in the refactor and tested for
 - [ ] Glue (Lintang)
 - [x] SuperGlue
-- [ ] CoQA
-- [ ] DROP
+- [ ] CoQA (Lintang)
+- [ ] DROP (Lintang)
 - [x] ~~Lambada~~
 - [x] Lambada (Cloze variants)
 - [x] ~~Lambada (Multilingual)~~
 - [x] Wikitext
 - [x] PiQA
 - [x] PROST
-- [ ] MCTACO
+- [ ] MCTACO (Lintang)
 - [x] Pubmed QA
 - [x] SciQ
 - [ ] QASPER
@@ -24,20 +24,20 @@ Boxes should be checked iff tasks are implemented in the refactor and tested for
 - [x] HellaSwag
 - [x] SWAG
 - [x] OpenBookQA
-- [ ] SQuADv2
+- [ ] SQuADv2 (Lintang)
 - [x] RACE
 - [x] HeadQA
 - [x] MathQA
 - [x] WebQs
-- [ ] WSC273
+- [ ] WSC273 (Lintang)
 - [x] Winogrande
 - [x] ANLI
 - [x] Hendrycks Ethics (missing some tasks/metrics, see PR 660: <https://github.com/EleutherAI/lm-evaluation-harness/pull/660> for more info)
-- [x] TruthfulQA (mc1)
-- [ ] TruthfulQA (mc2)
-- [ ] TruthfulQA (gen)
+- [x] TruthfulQA (mc1) (Lintang)
+- [ ] TruthfulQA (mc2) (Lintang)
+- [ ] TruthfulQA (gen) (Lintang)
 - [ ] MuTual
-- [ ] Hendrycks Math
+- [ ] Hendrycks Math (Hailey)
 - [ ] Asdiv
 - [ ] GSM8k
 - [x] Arithmetic
@@ -47,8 +47,8 @@ Boxes should be checked iff tasks are implemented in the refactor and tested for
 - [x] ~~Pile (perplexity)~~
 - [ ] BLiMP (Lintang)
 - [x] ToxiGen
-- [ ] StoryCloze
-- [ ] NaturalQs
+- [ ] StoryCloze (Lintang)
+- [ ] NaturalQs (Hailey)
 - [x] CrowS-Pairs
 - [x] XCopa
 - [ ] BIG-Bench (Hailey)
@@ -56,7 +56,7 @@ Boxes should be checked iff tasks are implemented in the refactor and tested for
 - [x] XWinograd
 - [ ] PAWS-X (Lintang)
 - [ ] XNLI (Lintang)
-- [ ] MGSM
+- [ ] MGSM (Lintang)
 - [ ] SCROLLS
 - [x] Babi