Unverified Commit 759da8d5 authored by Lintang Sutawika's avatar Lintang Sutawika Committed by GitHub

Merge pull request #757 from EleutherAI/add-readme

[Refactor] Add README.md
parents 73912efb c05a5ad4
@@ -27,8 +27,27 @@ Homepage: `https://github.com/sylinrl/TruthfulQA`
}
```
### Groups and Tasks
#### Groups
* Not part of a group yet.
#### Tasks
* `truthfulqa_mc1`: `Multiple-choice, single answer`
* `truthfulqa_mc2`: `Multiple-choice, multiple answers`
* `truthfulqa_gen`: `Answer generation`
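For intuition, the multiple-answer MC2 score is the normalized probability mass a model places on the true reference answers. A minimal sketch, where `ll_true`/`ll_false` are hypothetical stand-ins for the per-choice log-likelihoods the harness would request from a model:

```python
import math

def truthfulqa_mc2(ll_true, ll_false):
    """MC2 score: total probability mass assigned to the true reference
    answers, normalized over all candidate answers.

    ll_true / ll_false are per-answer log-likelihoods; the values here
    are illustrative stand-ins, not real model outputs."""
    p_true = [math.exp(ll) for ll in ll_true]
    p_false = [math.exp(ll) for ll in ll_false]
    return sum(p_true) / (sum(p_true) + sum(p_false))
```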
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
@@ -28,7 +28,13 @@ Homepage: https://github.com/openai/gpt-3/tree/master/data
}
```
### Groups and Tasks
#### Groups
* `unscramble`
#### Tasks
* `anagrams1` - Anagrams of all but the first and last letter.
* `anagrams2` - Anagrams of all but the first and last 2 letters.
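The two anagram variants can be illustrated with a toy scrambler (a sketch only; the published datasets ship pre-scrambled words). `keep=1` corresponds to `anagrams1`, `keep=2` to `anagrams2`:

```python
import random

def scramble_mid(word, keep=1, seed=0):
    """Shuffle a word's interior letters, leaving `keep` letters fixed
    at each end. Illustrative only, not the dataset's actual generator."""
    if len(word) <= 2 * keep + 1:
        return word  # at most one movable letter: nothing to shuffle
    mid = list(word[keep:-keep])
    random.Random(seed).shuffle(mid)
    return word[:keep] + "".join(mid) + word[-keep:]
```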
...
group:
- unscramble
task: anagrams1
dataset_path: EleutherAI/unscramble
dataset_name: mid_word_1_anagrams
...
group:
- unscramble
task: anagrams2
dataset_path: EleutherAI/unscramble
dataset_name: mid_word_2_anagrams
...
group:
- unscramble
task: cycle_letters
dataset_path: EleutherAI/unscramble
dataset_name: cycle_letters_in_word
...
group:
- unscramble
task: random_insertion
dataset_path: EleutherAI/unscramble
dataset_name: random_insertion_in_word
...
group:
- unscramble
task: reversed_words
dataset_path: EleutherAI/unscramble
dataset_name: reversed_words
...
# WEBQs
### Paper
@@ -33,9 +33,14 @@ Homepage: `https://worksheets.codalab.org/worksheets/0xba659fe363cb46e7a505c5b6a
}
```
### Groups and Tasks
#### Groups
* `freebase`
#### Tasks
* `webqs`: `Questions with multiple accepted answers.`
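Scoring against multiple accepted answers reduces to an any-match exact match. A minimal sketch (the lowercasing/stripping normalization is an assumption for illustration, not necessarily the harness's exact processing):

```python
def webqs_exact_match(prediction, accepted_answers):
    """Return 1.0 if the generated answer matches any accepted answer
    after simple normalization, else 0.0. Normalization choices here
    are illustrative assumptions."""
    pred = prediction.strip().lower()
    return float(any(pred == ans.strip().lower() for ans in accepted_answers))
```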
### Checklist
...
group:
- freebase
- question_answer
task: webqs
dataset_path: web_questions
dataset_name: null
...
@@ -26,7 +26,13 @@ Homepage: https://www.salesforce.com/products/einstein/ai-research/the-wikitext-
}
```
### Groups and Tasks
#### Groups
* Not part of a group yet.
#### Tasks
* `wikitext`: measure perplexity on the Wikitext dataset, via rolling loglikelihoods.
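Rolling loglikelihoods sum the per-token log-probabilities over each full document; perplexity is then the exponential of the negative average. A small sketch with made-up numbers (not real model outputs):

```python
import math

def perplexity(doc_loglikelihoods, n_tokens):
    """Perplexity from per-document summed log-likelihoods, as produced
    by rolling-loglikelihood requests: exp(-total_ll / n_tokens).
    Inputs here are illustrative, not real model scores."""
    total_ll = sum(doc_loglikelihoods)
    return math.exp(-total_ll / n_tokens)
```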
...
group:
- perplexity
- loglikelihood_rolling
task: wikitext
dataset_path: EleutherAI/wikitext_document_level
dataset_name: wikitext-2-raw-v1
...
# WinoGrande
### Paper
Title: `WinoGrande: An Adversarial Winograd Schema Challenge at Scale`
Abstract: https://arxiv.org/abs/1907.10641
WinoGrande is a collection of 44k problems, inspired by Winograd Schema Challenge
(Levesque, Davis, and Morgenstern 2011), but adjusted to improve the scale and
robustness against the dataset-specific bias. Formulated as a fill-in-a-blank
task with binary options, the goal is to choose the right option for a given
sentence which requires commonsense reasoning.
NOTE: This evaluation of Winogrande uses partial evaluation as described by
Trinh & Le in Simple Method for Commonsense Reasoning (2018).
See: https://arxiv.org/abs/1806.02847
Homepage: https://leaderboard.allenai.org/winogrande/submissions/public
### Citation
```
@article{sakaguchi2019winogrande,
title={WinoGrande: An Adversarial Winograd Schema Challenge at Scale},
author={Sakaguchi, Keisuke and Bras, Ronan Le and Bhagavatula, Chandra and Choi, Yejin},
journal={arXiv preprint arXiv:1907.10641},
year={2019}
}
```
### Groups and Tasks
#### Groups
* Not part of a group yet.
#### Tasks
* `winogrande`
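The partial evaluation noted above can be sketched as follows: each candidate is substituted into the blank, and only the text *after* the blank is scored, conditioned on everything before it. `loglikelihood(context, continuation)` is a hypothetical stand-in for a model scoring call:

```python
def partial_eval_pick(sentence, options, loglikelihood):
    """Choose the blank-filler whose substitution makes the remainder of
    the sentence most likely (partial evaluation, Trinh & Le 2018).

    `loglikelihood` stands in for a model call; its signature is an
    assumption for illustration."""
    before, after = sentence.split("_", 1)
    return max(options, key=lambda opt: loglikelihood(before + opt, after))
```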
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
# XCOPA
### Paper
Title: `XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning`
Abstract: https://ducdauge.github.io/files/xcopa.pdf
The Cross-lingual Choice of Plausible Alternatives dataset is a benchmark to evaluate the ability of machine learning models to transfer commonsense reasoning across languages.
The dataset is the translation and reannotation of the English COPA (Roemmele et al. 2011) and covers 11 languages from 11 families and several areas around the globe.
@@ -8,6 +13,8 @@ All the details about the creation of XCOPA and the implementation of the baseli
Homepage: https://github.com/cambridgeltl/xcopa
### Citation
```
@inproceedings{ponti2020xcopa,
title={{XCOPA: A} Multilingual Dataset for Causal Commonsense Reasoning},
@@ -17,3 +24,37 @@ Homepage: https://github.com/cambridgeltl/xcopa
url={https://ducdauge.github.io/files/xcopa.pdf}
}
```
### Groups and Tasks
#### Groups
* `xcopa`
#### Tasks
* `xcopa_et`: Estonian
* `xcopa_ht`: Haitian Creole
* `xcopa_id`: Indonesian
* `xcopa_it`: Italian
* `xcopa_qu`: Cusco-Collao Quechua
* `xcopa_sw`: Kiswahili
* `xcopa_ta`: Tamil
* `xcopa_th`: Thai
* `xcopa_tr`: Turkish
* `xcopa_vi`: Vietnamese
* `xcopa_zh`: Mandarin Chinese
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
# XStoryCloze
### Paper
Title: `Few-shot Learning with Multilingual Language Models`
Abstract: https://arxiv.org/abs/2112.10668
XStoryCloze consists of the professionally translated version of the [English StoryCloze dataset](https://cs.rochester.edu/nlp/rocstories/) (Spring 2016 version) to 10 non-English languages. This dataset is released by Meta AI.
Homepage: https://github.com/facebookresearch/fairseq/pull/4820
### Citation
```
@article{DBLP:journals/corr/abs-2112-10668,
author = {Xi Victoria Lin and
Todor Mihaylov and
Mikel Artetxe and
Tianlu Wang and
Shuohui Chen and
Daniel Simig and
Myle Ott and
Naman Goyal and
Shruti Bhosale and
Jingfei Du and
Ramakanth Pasunuru and
Sam Shleifer and
Punit Singh Koura and
Vishrav Chaudhary and
Brian O'Horo and
Jeff Wang and
Luke Zettlemoyer and
Zornitsa Kozareva and
Mona T. Diab and
Veselin Stoyanov and
Xian Li},
title = {Few-shot Learning with Multilingual Language Models},
journal = {CoRR},
volume = {abs/2112.10668},
year = {2021},
url = {https://arxiv.org/abs/2112.10668},
eprinttype = {arXiv},
eprint = {2112.10668},
timestamp = {Tue, 04 Jan 2022 15:59:27 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2112-10668.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
### Groups and Tasks
#### Groups
* `xstorycloze`
#### Tasks
* `xstorycloze_ar`: Arabic
* `xstorycloze_en`: English
* `xstorycloze_es`: Spanish
* `xstorycloze_eu`: Basque
* `xstorycloze_hi`: Hindi
* `xstorycloze_id`: Indonesian
* `xstorycloze_my`: Burmese
* `xstorycloze_ru`: Russian
* `xstorycloze_sw`: Swahili
* `xstorycloze_te`: Telugu
* `xstorycloze_zh`: Chinese
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
@@ -31,7 +31,13 @@ Homepage: `https://huggingface.co/datasets/Muennighoff/xwinograd`
}
```
### Groups and Tasks
#### Groups
* `xwinograd`
#### Tasks
* `xwinograd_en`: Winograd schema challenges in English.
...
@@ -2,9 +2,7 @@
# It doesn't have a yaml file extension as it is not meant to be imported directly
# by the harness.
group:
- xwinograd
dataset_path: Muennighoff/xwinograd
dataset_name: null # Overridden by language-specific config.
output_type: multiple_choice
...
@@ -2,7 +2,8 @@
### Paper
Title: `paper title goes here`
Abstract: `link to paper PDF or arXiv abstract goes here`
`Short description of paper / benchmark goes here:`
@@ -16,11 +17,16 @@ Homepage: `homepage to the benchmark's website goes here, if applicable`
BibTeX-formatted citation goes here BibTeX-formatted citation goes here
```
### Groups and Tasks
#### Groups
* `group_name`: `Short description`
#### Tasks
List or describe tasks defined in this folder, and their names here:
* `task_name`: `1-sentence description of what this particular task does`
* `task_name2`: ...
### Checklist
...