Commit 4d0ac35c (unverified), authored by Hailey Schoelkopf, committed by GitHub

Update task statuses on tracking list

parent ef4a26f6
 # v1.0 Tasks
 This list keeps track of which tasks' implementations have been ported to YAML / v2.0 of the Eval Harness.
-Boxes should be checked iff tasks are implemented in the refactor and tested for regression. Tasks should be struck through if checked *against original introducing paper* implementation or popularizing implementation.
+Boxes should be checked iff tasks are implemented in the refactor and tested for regression. Tasks should be struck through if checked *against original introducing paper* implementation or popularizing implementation. (WIP) Denotes that there exists a PR or person working on this task already.
 - [ ] Glue (WIP)
 - [x] SuperGlue
@@ -14,23 +14,23 @@ Boxes should be checked iff tasks are implemented in the refactor and tested for
 - [x] PiQA
 - [ ] PROST
 - [ ] MCTACO
-- [ ] Pubmed QA
+- [ ] Pubmed QA (WIP)
 - [x] SciQ
 - [ ] QASPER
 - [ ] QA4MRE
 - [ ] TriviaQA
 - [x] AI2 ARC
 - [ ] LogiQA
-- [ ] HellaSwag
+- [x] HellaSwag
-- [ ] SWAG
+- [ ] SWAG (WIP)
 - [x] OpenBookQA
 - [ ] SQuADv2
-- [ ] RACE
+- [ ] RACE (WIP)
 - [ ] HeadQA
 - [ ] MathQA
 - [ ] WebQs
 - [ ] WSC273
-- [ ] Winogrande
+- [ ] Winogrande (WIP)
 - [x] ANLI
 - [ ] Hendrycks Ethics
 - [ ] TruthfulQA
@@ -38,7 +38,7 @@ Boxes should be checked iff tasks are implemented in the refactor and tested for
 - [ ] Hendrycks Math
 - [ ] Asdiv
 - [ ] GSM8k
-- [ ] Arithmetic
+- [ ] Arithmetic (WIP)
 - [ ] MMMLU
 - [ ] Translation (WMT) suite
 - [ ] Unscramble