Unverified commit 8ebc85e8, authored by Lintang Sutawika, committed by GitHub

Update README.md

parent 546fd5cd
@@ -3,7 +3,7 @@ This list keeps track of which tasks' implementations have been ported to YAML /
 Boxes should be checked iff tasks are implemented in the refactor and tested for regression. Tasks should be struck through if checked *against original introducing paper* implementation or popularizing implementation. (WIP) Denotes that there exists a PR or person working on this task already.
-- [ ] Glue (WIP)
+- [ ] Glue (Lintang)
 - [x] SuperGlue
 - [ ] CoQA
 - [ ] DROP
@@ -20,14 +20,14 @@ Boxes should be checked iff tasks are implemented in the refactor and tested for
 - [x] QA4MRE
 - [ ] TriviaQA
 - [x] AI2 ARC
-- [ ] LogiQA (WIP)
+- [ ] LogiQA [(WIP)](https://github.com/EleutherAI/lm-evaluation-harness/pull/711)
 - [x] HellaSwag
 - [x] SWAG
 - [x] OpenBookQA
-- [ ] SQuADv2 (WIP)
+- [ ] SQuADv2
 - [x] RACE
 - [x] HeadQA
-- [ ] MathQA (WIP)
+- [x] MathQA
 - [ ] WebQs
 - [ ] WSC273
 - [x] Winogrande
@@ -37,28 +37,27 @@ Boxes should be checked iff tasks are implemented in the refactor and tested for
 - [ ] TruthfulQA (mc2)
 - [ ] TruthfulQA (gen)
 - [ ] MuTual
-- [ ] Hendrycks Math (WIP)
-- [ ] Asdiv (WIP)
+- [ ] Hendrycks Math
+- [ ] Asdiv
 - [ ] GSM8k
 - [x] Arithmetic
 - [ ] MMMLU
 - [ ] Translation (WMT) suite
 - [x] Unscramble
 - [x] ~~Pile (perplexity)~~
-- [ ] BLiMP
+- [ ] BLiMP (Lintang)
 - [x] ToxiGen
 - [ ] StoryCloze
-- [ ] NaturalQs (WIP)
+- [ ] NaturalQs
 - [ ] CrowS-Pairs
 - [ ] XCopa
 - [ ] BIG-Bench
 - [ ] XStoryCloze
-- [ ] XWinograd
+- [x] XWinograd
 - [ ] PAWS-X
 - [ ] XNLI
 - [ ] MGSM
 - [ ] SCROLLS
-- [ ] JSON Task (reference: https://github.com/EleutherAI/lm-evaluation-harness/pull/481)
 - [ ] Babi
 # Novel Tasks