Unverified Commit 759da8d5 authored by Lintang Sutawika's avatar Lintang Sutawika Committed by GitHub
Browse files

Merge pull request #757 from EleutherAI/add-readme

[Refactor] Add README.md
parents 73912efb c05a5ad4
......@@ -27,8 +27,27 @@ Homepage: `https://github.com/sylinrl/TruthfulQA`
}
```
### Subtasks
### Groups and Tasks
#### Groups
* Not part of a group yet.
#### Tasks
* `truthfulqa_mc1`: `Multiple-choice, single answer`
* `truthfulqa_mc2`: `Multiple-choice, multiple answers`
* `truthfulqa_gen`: `Answer generation`
* (MISSING)`truthfulqa_mc2`: `Multiple-choice, multiple answers`
* (MISSING)`truthfulqa_gen`: `Answer generation`
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
......@@ -28,7 +28,13 @@ Homepage: https://github.com/openai/gpt-3/tree/master/data
}
```
### Subtasks
### Groups and Tasks
#### Groups
* `unscramble`
#### Tasks
* `anagrams1` - Anagrams of all but the first and last letter.
* `anagrams2` - Anagrams of all but the first and last 2 letters.
......
group:
- greedy_until
- unscramble
task: anagrams1
dataset_path: EleutherAI/unscramble
dataset_name: mid_word_1_anagrams
......
group:
- greedy_until
- unscramble
task: anagrams2
dataset_path: EleutherAI/unscramble
dataset_name: mid_word_2_anagrams
......
group:
- greedy_until
- unscramble
task: cycle_letters
dataset_path: EleutherAI/unscramble
dataset_name: cycle_letters_in_word
......
group:
- greedy_until
- unscramble
task: random_insertion
dataset_path: EleutherAI/unscramble
dataset_name: random_insertion_in_word
......
group:
- greedy_until
- unscramble
task: reversed_words
dataset_path: EleutherAI/unscramble
dataset_name: reversed_words
......
# Task-name
# WEBQs
### Paper
......@@ -33,9 +33,14 @@ Homepage: `https://worksheets.codalab.org/worksheets/0xba659fe363cb46e7a505c5b6a
}
```
### Subtasks
### Groups and Tasks
#### Groups
* `freebase`
#### Tasks
List or describe tasks defined in this folder, and their names here:
* `webqs`: `Questions with multiple accepted answers.`
### Checklist
......
group:
- freebase
- question_answer
task: webqs
dataset_path: web_questions
dataset_name: null
......
......@@ -26,7 +26,13 @@ Homepage: https://www.salesforce.com/products/einstein/ai-research/the-wikitext-
}
```
### Subtasks
### Groups and Tasks
#### Groups
* Not part of a group yet.
#### Tasks
* `wikitext`: measure perplexity on the Wikitext dataset, via rolling loglikelihoods.
......
group:
- perplexity
- loglikelihood_rolling
task: wikitext
dataset_path: EleutherAI/wikitext_document_level
dataset_name: wikitext-2-raw-v1
......
# WinoGrande
### Paper
Title: `WinoGrande: An Adversarial Winograd Schema Challenge at Scale`
Abstract: https://arxiv.org/abs/1907.10641
WinoGrande is a collection of 44k problems, inspired by Winograd Schema Challenge
(Levesque, Davis, and Morgenstern 2011), but adjusted to improve the scale and
robustness against the dataset-specific bias. Formulated as a fill-in-a-blank
task with binary options, the goal is to choose the right option for a given
sentence which requires commonsense reasoning.
NOTE: This evaluation of Winogrande uses partial evaluation as described by
Trinh & Le in Simple Method for Commonsense Reasoning (2018).
See: https://arxiv.org/abs/1806.02847
Homepage: https://leaderboard.allenai.org/winogrande/submissions/public
### Citation
```
@article{sakaguchi2019winogrande,
title={WinoGrande: An Adversarial Winograd Schema Challenge at Scale},
author={Sakaguchi, Keisuke and Bras, Ronan Le and Bhagavatula, Chandra and Choi, Yejin},
journal={arXiv preprint arXiv:1907.10641},
year={2019}
}
```
### Groups and Tasks
#### Groups
* Not part of a group yet.
#### Tasks
* `winogrande`
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
## XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
https://ducdauge.github.io/files/xcopa.pdf
# XCOPA
### Paper
Title: `XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning`
Abstract: https://ducdauge.github.io/files/xcopa.pdf
The Cross-lingual Choice of Plausible Alternatives dataset is a benchmark to evaluate the ability of machine learning models to transfer commonsense reasoning across languages.
The dataset is the translation and reannotation of the English COPA (Roemmele et al. 2011) and covers 11 languages from 11 families and several areas around the globe.
......@@ -8,6 +13,8 @@ All the details about the creation of XCOPA and the implementation of the baseli
Homepage: https://github.com/cambridgeltl/xcopa
### Citation
```
@inproceedings{ponti2020xcopa,
title={{XCOPA: A} Multilingual Dataset for Causal Commonsense Reasoning},
......@@ -17,3 +24,37 @@ Homepage: https://github.com/cambridgeltl/xcopa
url={https://ducdauge.github.io/files/xcopa.pdf}
}
```
### Groups and Tasks
#### Groups
* `xcopa`
#### Tasks
* `xcopa_et`: Estonian
* `xcopa_ht`: Haitian Creole
* `xcopa_id`: Indonesian
* `xcopa_it`: Italian
* `xcopa_qu`: Cusco-Collao Quechua
* `xcopa_sw`: Kiswahili
* `xcopa_ta`: Tamil
* `xcopa_th`: Thai
* `xcopa_tr`: Turkish
* `xcopa_vi`: Vietnamese
* `xcopa_zh`: Mandarin Chinese
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
# XStoryCloze
### Paper
Title: `Few-shot Learning with Multilingual Language Models`
Abstract: https://arxiv.org/abs/2112.10668
XStoryCloze consists of the professionally translated version of the [English StoryCloze dataset](https://cs.rochester.edu/nlp/rocstories/) (Spring 2016 version) to 10 non-English languages. This dataset is released by Meta AI.
Homepage: https://github.com/facebookresearch/fairseq/pull/4820
### Citation
```
@article{DBLP:journals/corr/abs-2112-10668,
author = {Xi Victoria Lin and
Todor Mihaylov and
Mikel Artetxe and
Tianlu Wang and
Shuohui Chen and
Daniel Simig and
Myle Ott and
Naman Goyal and
Shruti Bhosale and
Jingfei Du and
Ramakanth Pasunuru and
Sam Shleifer and
Punit Singh Koura and
Vishrav Chaudhary and
Brian O'Horo and
Jeff Wang and
Luke Zettlemoyer and
Zornitsa Kozareva and
Mona T. Diab and
Veselin Stoyanov and
Xian Li},
title = {Few-shot Learning with Multilingual Language Models},
journal = {CoRR},
volume = {abs/2112.10668},
year = {2021},
url = {https://arxiv.org/abs/2112.10668},
eprinttype = {arXiv},
eprint = {2112.10668},
timestamp = {Tue, 04 Jan 2022 15:59:27 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2112-10668.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
### Groups and Tasks
#### Groups
* `xstorycloze`
#### Tasks
* `xstorycloze_ar`: Arabic
* `xstorycloze_en`: English
* `xstorycloze_es`: Spanish
* `xstorycloze_eu`: Basque
* `xstorycloze_hi`: Hindi
* `xstorycloze_id`: Indonesian
* `xstorycloze_my`: Burmese
* `xstorycloze_ru`: Russian
* `xstorycloze_sw`: Swahili
* `xstorycloze_te`: Telugu
* `xstorycloze_zh`: Chinese
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
......@@ -31,7 +31,13 @@ Homepage: `https://huggingface.co/datasets/Muennighoff/xwinograd`
}
```
### Subtasks
### Groups and Tasks
#### Groups
* `xwinograd`
#### Tasks
List or describe tasks defined in this folder, and their names here:
* `xwinograd_en`: Winograd schema challenges in English.
......
......@@ -2,9 +2,7 @@
# It doesn't have a yaml file extension as it is not meant to be imported directly
# by the harness.
group:
- winograd
- commonsense
- multilingual
- xwinograd
dataset_path: Muennighoff/xwinograd
dataset_name: null # Overridden by language-specific config.
output_type: multiple_choice
......
......@@ -2,7 +2,8 @@
### Paper
Title: `paper title goes here`
Title: `paper titles goes here`
Abstract: `link to paper PDF or arXiv abstract goes here`
`Short description of paper / benchmark goes here:`
......@@ -16,11 +17,16 @@ Homepage: `homepage to the benchmark's website goes here, if applicable`
BibTeX-formatted citation goes here
```
### Subtasks
### Groups and Tasks
#### Groups
* `group_name`: `Short description`
#### Tasks
List or describe tasks defined in this folder, and their names here:
* `task_name`: `1-sentence description of what this particular task does`
* `task_name2`: .....
* `task_name2`: ...
### Checklist
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment