Commit e85ca1a9 authored by Lintang Sutawika, committed by GitHub

Merge pull request #759 from EleutherAI/xstorycloze

[Refactor] XStoryCloze
parents a68a3092 e332a1ec
......@@ -45,14 +45,14 @@ Boxes should be checked iff tasks are implemented in the refactor and tested for
 - [ ] Translation (WMT) suite (Hailey)
 - [x] Unscramble
 - [x] ~~Pile (perplexity)~~
-- [ ] BLiMP (Lintang)
+- [x] BLiMP
 - [x] ToxiGen
-- [ ] StoryCloze (Lintang)
+- [x] StoryCloze
 - [ ] NaturalQs (Hailey)
 - [x] CrowS-Pairs
 - [x] XCopa
 - [ ] BIG-Bench (Hailey)
-- [ ] XStoryCloze (Lintang)
+- [x] XStoryCloze
 - [x] XWinograd
 - [ ] PAWS-X (Lintang)
 - [x] XNLI
......
# StoryCloze
### Paper
Title: `Few-shot Learning with Multilingual Language Models`
Abstract: https://arxiv.org/abs/2112.10668
XStoryCloze consists of professional translations of the [English StoryCloze dataset](https://cs.rochester.edu/nlp/rocstories/) (Spring 2016 version) into 10 non-English languages. The dataset was released by Meta AI.
Homepage: https://github.com/facebookresearch/fairseq/pull/4820
### Citation
```
@article{DBLP:journals/corr/abs-2112-10668,
author = {Xi Victoria Lin and
Todor Mihaylov and
Mikel Artetxe and
Tianlu Wang and
Shuohui Chen and
Daniel Simig and
Myle Ott and
Naman Goyal and
Shruti Bhosale and
Jingfei Du and
Ramakanth Pasunuru and
Sam Shleifer and
Punit Singh Koura and
Vishrav Chaudhary and
Brian O'Horo and
Jeff Wang and
Luke Zettlemoyer and
Zornitsa Kozareva and
Mona T. Diab and
Veselin Stoyanov and
Xian Li},
title = {Few-shot Learning with Multilingual Language Models},
journal = {CoRR},
volume = {abs/2112.10668},
year = {2021},
url = {https://arxiv.org/abs/2112.10668},
eprinttype = {arXiv},
eprint = {2112.10668},
timestamp = {Tue, 04 Jan 2022 15:59:27 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2112-10668.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
### Subtasks
The following tasks are defined in this folder:
* `storycloze_2016`: Story-ending prediction on the `2016` configuration of the `story_cloze` dataset.
* `storycloze_2018`: Story-ending prediction on the `2018` configuration of the `story_cloze` dataset.
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
group: storycloze
task: storycloze_2016
dataset_path: story_cloze
dataset_name: 2016
output_type: multiple_choice
validation_split: validation
test_split: test
doc_to_text: "{{[input_sentence_1, input_sentence_2, input_sentence_3, input_sentence_4]|join(' ')}}"
doc_to_target: "{{answer_right_ending-1}}"
doc_to_choice: "{{[sentence_quiz1, sentence_quiz2]}}"
should_decontaminate: true
doc_to_decontamination_query: "{{[input_sentence_1, input_sentence_2, input_sentence_3, input_sentence_4]|join(' ')}}"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
group: storycloze
task: storycloze_2018
dataset_path: story_cloze
dataset_name: 2018
output_type: multiple_choice
validation_split: validation
test_split: test
doc_to_text: "{{[input_sentence_1, input_sentence_2, input_sentence_3, input_sentence_4]|join(' ')}}"
doc_to_target: "{{answer_right_ending-1}}"
doc_to_choice: "{{[sentence_quiz1, sentence_quiz2]}}"
should_decontaminate: true
doc_to_decontamination_query: "{{[input_sentence_1, input_sentence_2, input_sentence_3, input_sentence_4]|join(' ')}}"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
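
The three templates in the StoryCloze configs (`doc_to_text`, `doc_to_choice`, `doc_to_target`) can be read as small functions over one dataset row. The sketch below mirrors them in plain Python rather than Jinja. The example row is made up for illustration, and the helper functions are hypothetical stand-ins, not the harness's own code.

```python
# Hypothetical row that uses the field names referenced by the templates.
doc = {
    "input_sentence_1": "Anna planted tomato seeds in spring.",
    "input_sentence_2": "She watered them every morning.",
    "input_sentence_3": "By July the plants were heavy with fruit.",
    "input_sentence_4": "She picked a full basket.",
    "sentence_quiz1": "Anna made a big salad for dinner.",
    "sentence_quiz2": "Anna threw the seeds away.",
    "answer_right_ending": 1,  # 1-based label in the dataset
}


def doc_to_text(doc):
    """Join the four context sentences with single spaces."""
    return " ".join(doc[f"input_sentence_{i}"] for i in range(1, 5))


def doc_to_choice(doc):
    """Return the two candidate endings in dataset order."""
    return [doc["sentence_quiz1"], doc["sentence_quiz2"]]


def doc_to_target(doc):
    """Convert the 1-based label to a 0-based index into the choices."""
    return doc["answer_right_ending"] - 1


context = doc_to_text(doc)
choices = doc_to_choice(doc)
target = doc_to_target(doc)

print(context)
print("gold ending:", choices[target])
```

The `-1` in `doc_to_target` matters because the label is 1-based while the choice list is indexed from 0.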
# XStoryCloze
### Paper
Title: `Few-shot Learning with Multilingual Language Models`
Abstract: https://arxiv.org/abs/2112.10668
XStoryCloze consists of professional translations of the [English StoryCloze dataset](https://cs.rochester.edu/nlp/rocstories/) (Spring 2016 version) into 10 non-English languages. The dataset was released by Meta AI.
Homepage: https://github.com/facebookresearch/fairseq/pull/4820
### Citation
```
@article{DBLP:journals/corr/abs-2112-10668,
author = {Xi Victoria Lin and
Todor Mihaylov and
Mikel Artetxe and
Tianlu Wang and
Shuohui Chen and
Daniel Simig and
Myle Ott and
Naman Goyal and
Shruti Bhosale and
Jingfei Du and
Ramakanth Pasunuru and
Sam Shleifer and
Punit Singh Koura and
Vishrav Chaudhary and
Brian O'Horo and
Jeff Wang and
Luke Zettlemoyer and
Zornitsa Kozareva and
Mona T. Diab and
Veselin Stoyanov and
Xian Li},
title = {Few-shot Learning with Multilingual Language Models},
journal = {CoRR},
volume = {abs/2112.10668},
year = {2021},
url = {https://arxiv.org/abs/2112.10668},
eprinttype = {arXiv},
eprint = {2112.10668},
timestamp = {Tue, 04 Jan 2022 15:59:27 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2112-10668.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
### Subtasks
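The following tasks are defined in this folder, one per language configuration of `juletxara/xstory_cloze`:
* `xstorycloze_ar`: Arabic
* `xstorycloze_en`: English
* `xstorycloze_es`: Spanish
* `xstorycloze_eu`: Basque
* `xstorycloze_hi`: Hindi
* `xstorycloze_id`: Indonesian
* `xstorycloze_my`: Burmese
* `xstorycloze_ru`: Russian
* `xstorycloze_sw`: Swahili
* `xstorycloze_te`: Telugu
* `xstorycloze_zh`: Chinese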
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
group: xstorycloze
task: xstorycloze_ar
dataset_path: juletxara/xstory_cloze
dataset_name: ar
output_type: multiple_choice
training_split: train
validation_split: eval
doc_to_text: "{{[input_sentence_1, input_sentence_2, input_sentence_3, input_sentence_4]|join(' ')}}"
doc_to_target: "{{answer_right_ending-1}}"
doc_to_choice: "{{[sentence_quiz1, sentence_quiz2]}}"
should_decontaminate: true
doc_to_decontamination_query: "{{[input_sentence_1, input_sentence_2, input_sentence_3, input_sentence_4]|join(' ')}}"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
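
As a rough illustration of the `acc` metric with `aggregation: mean`, the sketch below assumes that a `multiple_choice` task scores each candidate ending with a log-likelihood, predicts the highest-scoring one, and averages the per-document hits. The scores are made up, and `pick_choice` and `mean_accuracy` are hypothetical helpers, not functions from the harness.

```python
def pick_choice(loglikelihoods):
    """Return the index of the highest-scoring choice (first one on ties)."""
    return max(range(len(loglikelihoods)), key=lambda i: loglikelihoods[i])


def mean_accuracy(scored_docs):
    """Average per-document correctness.

    scored_docs is a list of (per-choice log-likelihoods, gold index) pairs.
    """
    hits = [1.0 if pick_choice(lls) == gold else 0.0 for lls, gold in scored_docs]
    return sum(hits) / len(hits)


# Made-up scores for three documents with two endings each.
scored_docs = [
    ([-12.3, -15.1], 0),  # predicts 0, gold 0: correct
    ([-9.8, -7.4], 1),    # predicts 1, gold 1: correct
    ([-11.0, -10.2], 0),  # predicts 1, gold 0: wrong
]

accuracy = mean_accuracy(scored_docs)
print(f"acc = {accuracy:.3f}")
```

With only two endings per story, a model that guesses at random would be expected to score about 0.5 under this metric.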
include: default_ar.yaml
task: xstorycloze_en
dataset_name: en
include: default_ar.yaml
task: xstorycloze_es
dataset_name: es
include: default_ar.yaml
task: xstorycloze_eu
dataset_name: eu
include: default_ar.yaml
task: xstorycloze_hi
dataset_name: hi
include: default_ar.yaml
task: xstorycloze_id
dataset_name: id
include: default_ar.yaml
task: xstorycloze_my
dataset_name: my
include: default_ar.yaml
task: xstorycloze_ru
dataset_name: ru
include: default_ar.yaml
task: xstorycloze_sw
dataset_name: sw
include: default_ar.yaml
task: xstorycloze_te
dataset_name: te
include: default_ar.yaml
task: xstorycloze_zh
dataset_name: zh
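
Each per-language file above sets only `task` and `dataset_name` and pulls everything else from `default_ar.yaml` via `include`. A minimal sketch of that pattern follows, under the assumption that the included file is loaded first and the including file's keys take precedence. The shallow merge and the `resolve` helper are illustrative assumptions, not the harness's actual loader.

```python
# Subset of the keys from default_ar.yaml, written as a plain dict.
base = {
    "group": "xstorycloze",
    "task": "xstorycloze_ar",
    "dataset_path": "juletxara/xstory_cloze",
    "dataset_name": "ar",
    "output_type": "multiple_choice",
    "training_split": "train",
    "validation_split": "eval",
}


def resolve(base_config, overrides):
    """Return a new config in which keys in overrides replace those in the base."""
    merged = dict(base_config)  # copy so the base stays untouched
    merged.update(overrides)
    return merged


# The Spanish file contributes only these two keys.
es_overrides = {"task": "xstorycloze_es", "dataset_name": "es"}
es_config = resolve(base, es_overrides)

for key, value in es_config.items():
    print(f"{key}: {value}")
```

Keeping the shared fields in one base file means a change to the prompt template or metric only has to be made once for all eleven languages.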