"src/diffusers/pipelines/flux/pipeline_flux_img2img.py" did not exist on "62863bb1ea35b5624053f7b4a83e0fbc7ae4d3a0"
Unverified Commit 3839125a authored by Lintang Sutawika's avatar Lintang Sutawika Committed by GitHub
Browse files

Merge pull request #784 from EleutherAI/mgsm

[Refactor] mgsm
parents 784fe037 78522c94
......@@ -56,7 +56,7 @@ Boxes should be checked iff tasks are implemented in the refactor and tested for
- [x] XWinograd
- [x] PAWS-X
- [x] XNLI
- [ ] MGSM (Lintang)
- [x] MGSM
- [ ] SCROLLS
- [x] Babi
......
# MGSM
### Paper
Title: `Language Models are Multilingual Chain-of-Thought Reasoners`
Abstract: https://arxiv.org/abs/2210.03057
Multilingual Grade School Math Benchmark (MGSM) is a benchmark of grade-school math problems, proposed in the paper [Language models are multilingual chain-of-thought reasoners](http://arxiv.org/abs/2210.03057).
The same 250 problems from [GSM8K](https://arxiv.org/abs/2110.14168) are each translated via human annotators in 10 languages. The 10 languages are:
- Spanish
- French
- German
- Russian
- Chinese
- Japanese
- Thai
- Swahili
- Bengali
- Telugu
GSM8K (Grade School Math 8K) is a dataset of 8.5K high quality linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.
You can find the input and targets for each of the ten languages (and English) as `.tsv` files.
We also include few-shot exemplars that are also manually translated from each language in `exemplars.py`.
Homepage: https://github.com/google-research/url-nlp/tree/main/mgsm
### Citation
```
@misc{cobbe2021training,
title={Training Verifiers to Solve Math Word Problems},
author={Karl Cobbe and Vineet Kosaraju and Mohammad Bavarian and Jacob Hilton and Reiichiro Nakano and Christopher Hesse and John Schulman},
year={2021},
eprint={2110.14168},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{shi2022language,
title={Language Models are Multilingual Chain-of-Thought Reasoners},
author={Freda Shi and Mirac Suzgun and Markus Freitag and Xuezhi Wang and Suraj Srivats and Soroush Vosoughi and Hyung Won Chung and Yi Tay and Sebastian Ruder and Denny Zhou and Dipanjan Das and Jason Wei},
year={2022},
eprint={2210.03057},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
### Groups and Tasks
#### Groups
* `mgsm_direct`: Direct question
* `mgsm_direct_bn`: Bengali
* `mgsm_direct_de`: German
* `mgsm_direct_en`: English
* `mgsm_direct_es`: Spanish
* `mgsm_direct_fr`: French
* `mgsm_direct_ja`: Japanese
* `mgsm_direct_ru`: Russian
* `mgsm_direct_sw`: Swahili
* `mgsm_direct_te`: Telugu
* `mgsm_direct_th`: Thai
* `mgsm_direct_zh`: Chinese
* `mgsm_cot_native`: Question with Answer followed by CoT prompt in the same language as the dataset.
* `mgsm_cot_native_bn`: Bengali
* `mgsm_cot_native_de`: German
* `mgsm_cot_native_en`: English
* `mgsm_cot_native_es`: Spanish
* `mgsm_cot_native_fr`: French
* `mgsm_cot_native_ja`: Japanese
* `mgsm_cot_native_ru`: Russian
* `mgsm_cot_native_sw`: Swahili
* `mgsm_cot_native_te`: Telugu
* `mgsm_cot_native_th`: Thai
* `mgsm_cot_native_zh`: Chinese
Examplar Samples: https://github.com/google-research/url-nlp/blob/main/mgsm/exemplars.py
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
# This file will be included in the generated language-specific task configs.
# It doesn't have a yaml file extension as it is not meant to be imported directly
# by the harness.
group: mgsm_direct
dataset_path: juletxara/mgsm
dataset_name: null # Overridden by language-specific config.
output_type: greedy_until
training_split: train
test_split: test
target_delimiter: ""
generation_kwargs:
until:
- "\n\n"
- "\n"
do_sample: false
temperature: 0.0
filter_list:
- name: remove_whitespace
filter:
- function: remove_whitespace
- function: take_first
metric_list:
- metric: exact_match
aggregation: mean
higher_is_better: true
ignore_case: true
ignore_punctuation: true
# Generated by utils.py
dataset_name: bn
doc_to_target: '{% if answer is not none %}{{answer[6+1]}}{% else %}{{answer_number|string}}{%
endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer"}}{% else %}{{"প্রশ্ন:
"+question+"\nAnswer"}}{% endif %}'
include: direct_yaml
task: mgsm_direct_bn
# Generated by utils.py
dataset_name: de
doc_to_target: '{% if answer is not none %}{{answer[7+1]}}{% else %}{{answer_number|string}}{%
endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAntwort"}}{% else %}{{"Frage:
"+question+"\nAntwort"}}{% endif %}'
include: direct_yaml
task: mgsm_direct_de
# Generated by utils.py
dataset_name: en
doc_to_target: '{% if answer is not none %}{{answer[6+1]}}{% else %}{{answer_number|string}}{%
endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer"}}{% else %}{{"Question:
"+question+"\nAnswer"}}{% endif %}'
include: direct_yaml
task: mgsm_direct_en
# Generated by utils.py
dataset_name: es
doc_to_target: '{% if answer is not none %}{{answer[6+1]}}{% else %}{{answer_number|string}}{%
endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer"}}{% else %}{{"Pregunta:
"+question+"\nAnswer"}}{% endif %}'
include: direct_yaml
task: mgsm_direct_es
# Generated by utils.py
dataset_name: fr
doc_to_target: '{% if answer is not none %}{{answer[6+1]}}{% else %}{{answer_number|string}}{%
endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer"}}{% else %}{{"Question
: "+question+"\nAnswer"}}{% endif %}'
include: direct_yaml
task: mgsm_direct_fr
# Generated by utils.py
dataset_name: ja
doc_to_target: '{% if answer is not none %}{{answer[6+1]}}{% else %}{{answer_number|string}}{%
endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer"}}{% else %}{{"問題: "+question+"\nAnswer"}}{%
endif %}'
include: direct_yaml
task: mgsm_direct_ja
# Generated by utils.py
dataset_name: ru
doc_to_target: '{% if answer is not none %}{{answer[6+1]}}{% else %}{{answer_number|string}}{%
endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer"}}{% else %}{{"Задача:
"+question+"\nAnswer"}}{% endif %}'
include: direct_yaml
task: mgsm_direct_ru
# Generated by utils.py
dataset_name: sw
doc_to_target: '{% if answer is not none %}{{answer[6+1]}}{% else %}{{answer_number|string}}{%
endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer"}}{% else %}{{"Swali:
"+question+"\nAnswer"}}{% endif %}'
include: direct_yaml
task: mgsm_direct_sw
# Generated by utils.py
dataset_name: te
doc_to_target: '{% if answer is not none %}{{answer[6+1]}}{% else %}{{answer_number|string}}{%
endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer"}}{% else %}{{"ప్రశ్న:
"+question+"\nAnswer"}}{% endif %}'
include: direct_yaml
task: mgsm_direct_te
# Generated by utils.py
dataset_name: th
doc_to_target: '{% if answer is not none %}{{answer[6+1]}}{% else %}{{answer_number|string}}{%
endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer"}}{% else %}{{"โจทย์:
"+question+"\nAnswer"}}{% endif %}'
include: direct_yaml
task: mgsm_direct_th
# Generated by utils.py
dataset_name: zh
doc_to_target: '{% if answer is not none %}{{answer[6+1]}}{% else %}{{answer_number|string}}{%
endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer"}}{% else %}{{"问题: "+question+"\nAnswer"}}{%
endif %}'
include: direct_yaml
task: mgsm_direct_zh
# This file will be included in the generated language-specific task configs.
# It doesn't have a yaml file extension as it is not meant to be imported directly
# by the harness.
group: mgsm_cot_native
dataset_path: juletxara/mgsm
dataset_name: null # Overridden by language-specific config.
output_type: greedy_until
training_split: train
test_split: test
target_delimiter: ""
generation_kwargs:
until:
- "\n\n"
- "\n"
do_sample: false
temperature: 0.0
target_delimiter: " "
metric_list:
- metric: exact_match
aggregation: mean
higher_is_better: true
ignore_case: true
ignore_punctuation: true
# Generated by utils.py
dataset_name: bn
doc_to_target: '{% if answer is not none %}{{answer[16+1]}}{% else %}{{answer_number|string}}{%
endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nধাপে ধাপে উত্তর:"}}{% else
%}{{"প্রশ্ন: "+question+"\nধাপে ধাপে উত্তর:"}}{% endif %}'
filter:
- function: regex
regex_pattern: The answer is (\-?[0-9\.\,]+)
- function: take_first
filter_list:
- name: get-answer
include: cot_yaml
task: mgsm_bn_direct
# Generated by utils.py
dataset_name: de
doc_to_target: '{% if answer is not none %}{{answer[28+1]}}{% else %}{{answer_number|string}}{%
endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nSchritt-für-Schritt-Antwort:"}}{%
else %}{{"Frage: "+question+"\nSchritt-für-Schritt-Antwort:"}}{% endif %}'
filter:
- function: regex
regex_pattern: The answer is (\-?[0-9\.\,]+)
- function: take_first
filter_list:
- name: get-answer
include: cot_yaml
task: mgsm_de_direct
# Generated by utils.py
dataset_name: en
doc_to_target: '{% if answer is not none %}{{answer[20+1]}}{% else %}{{answer_number|string}}{%
endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nStep-by-Step Answer:"}}{% else
%}{{"Question: "+question+"\nStep-by-Step Answer:"}}{% endif %}'
filter:
- function: regex
regex_pattern: The answer is (\-?[0-9\.\,]+)
- function: take_first
filter_list:
- name: get-answer
include: cot_yaml
task: mgsm_en_direct
# Generated by utils.py
dataset_name: es
doc_to_target: '{% if answer is not none %}{{answer[22+1]}}{% else %}{{answer_number|string}}{%
endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nRespuesta paso a paso:"}}{%
else %}{{"Pregunta: "+question+"\nRespuesta paso a paso:"}}{% endif %}'
filter:
- function: regex
regex_pattern: The answer is (\-?[0-9\.\,]+)
- function: take_first
filter_list:
- name: get-answer
include: cot_yaml
task: mgsm_es_direct
# Generated by utils.py
dataset_name: fr
doc_to_target: '{% if answer is not none %}{{answer[25+1]}}{% else %}{{answer_number|string}}{%
endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nRéponse étape par étape :"}}{%
else %}{{"Question : "+question+"\nRéponse étape par étape :"}}{% endif %}'
filter:
- function: regex
regex_pattern: The answer is (\-?[0-9\.\,]+)
- function: take_first
filter_list:
- name: get-answer
include: cot_yaml
task: mgsm_fr_direct
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment