@@ -28,7 +28,7 @@ Boxes should be checked iff tasks are implemented in the refactor and tested for
 - [x] RACE
 - [x] HeadQA
 - [x] MathQA
-- [] WebQs
+- [x] WebQs
 - [ ] WSC273
 - [x] Winogrande
 - [x] ANLI
...
@@ -50,15 +50,15 @@ Boxes should be checked iff tasks are implemented in the refactor and tested for
 - [ ] StoryCloze
 - [ ] NaturalQs
 - [x] CrowS-Pairs
-- [ ] XCopa
+- [ ] XCopa (Lintang)
-- [ ] BIG-Bench
+- [ ] BIG-Bench (Hailey)
-- [ ] XStoryCloze
+- [ ] XStoryCloze (Lintang)
 - [x] XWinograd
-- [ ] PAWS-X
+- [ ] PAWS-X (Lintang)
-- [ ] XNLI
+- [ ] XNLI (Lintang)
 - [ ] MGSM
 - [ ] SCROLLS
-- [] Babi
+- [x] Babi
 # Novel Tasks
 Tasks added in the revamped harness that were not previously available. Again, a strikethrough denotes checking performed *against the original task's implementation or published results introducing the task*.