"app/git@developer.sourcefind.cn:Fzc7075/nunchaku.git" did not exist on "37a2771246e827c1eca51326d593f2e9e8c4fd48"
Commit 24754ee4 authored by lintangsutawika

add bbh

parent 88e1cdb0
# BigBenchHard
## Paper
Title: `Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them`
Abstract: https://arxiv.org/abs/2210.09261
A suite of 23 challenging BIG-Bench tasks which we call BIG-Bench Hard (BBH).
These are the tasks for which prior language-model evaluations did not outperform
the average human rater.
Homepage: https://github.com/suzgunmirac/BIG-Bench-Hard
## Citation
```
@article{suzgun2022challenging,
title={Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them},
author={Suzgun, Mirac and Scales, Nathan and Sch{\"a}rli, Nathanael and Gehrmann, Sebastian and Tay, Yi and Chung, Hyung Won and Chowdhery, Aakanksha and Le, Quoc V and Chi, Ed H and Zhou, Denny and Wei, Jason},
journal={arXiv preprint arXiv:2210.09261},
year={2022}
}
```
### Groups and Tasks
#### Groups
- `bbh`
#### Tasks
- ...
### Checklist
- [x] Is in Eval-harness v1.0?
- [ ] Has been checked for regression from v1.0?
- [ ] Has been checked for equivalence with original paper methodology?
- [ ] "Main" checked variant clearly denoted?
### Variant Wishlist
- [ ] Variant with Calculator (see https://github.com/openai/grade-school-math/blob/master/grade_school_math/calculator.py for example implementation)
- [ ] Using Verifiers
- [ ] Majority voting "without CoT"
import yaml
import datasets
from tqdm import tqdm


def main() -> None:
    dataset_path = "lukaemon/bbh"
    # Emit one YAML config per BBH subtask; each file includes the shared template.
    for task in tqdm(datasets.get_dataset_infos(dataset_path).keys()):
        file_name = f"{task}.yaml"
        try:
            # Mode "x" skips subtasks whose config file already exists.
            with open(file_name, "x") as f:
                f.write("# Generated by _generate_configs.py\n")
                yaml.dump(
                    {
                        "include": "_template_yaml",
                        "task": f"{dataset_path.split('/')[-1]}_{task}",
                        "dataset_name": task,
                    },
                    f,
                )
        except FileExistsError:
            pass


if __name__ == "__main__":
    main()
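For reference, a minimal sketch of the config dict the script builds for one subtask and how PyYAML serializes it. Note that `yaml.dump` sorts keys alphabetically by default, which is why the generated files below list `dataset_name` before `include` and `task` even though the script builds the dict in a different order:

```python
import yaml

dataset_path = "lukaemon/bbh"
task = "boolean_expressions"

# Same dict the generator script writes for each subtask.
config = {
    "include": "_template_yaml",
    "task": f"{dataset_path.split('/')[-1]}_{task}",
    "dataset_name": task,
}

# yaml.dump sorts keys alphabetically by default, so dataset_name
# comes first in every generated file.
serialized = yaml.dump(config)
```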
group: bbh
dataset_path: lukaemon/bbh
output_type: greedy_until
test_split: test
doc_to_text: "{{input}}"
doc_to_target: "{{target}}"
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: false
generation_kwargs:
  until:
    - "\n\n"
  do_sample: false
  temperature: 0.0
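Each per-task file below inherits the template through its `include` key, overriding or adding task-specific fields. A minimal sketch of how such an include can be resolved (`resolve_include` is a hypothetical helper for illustration, not the harness's actual loader): template keys form the base, task-level keys win, and the `include` key itself is dropped from the merged result:

```python
def resolve_include(task_cfg: dict, templates: dict) -> dict:
    """Merge a task config over its included template (task keys win)."""
    # Look up the included template; fall back to an empty base if absent.
    base = dict(templates.get(task_cfg.get("include"), {}))
    merged = {**base, **task_cfg}
    # The include directive is resolved, so it does not survive the merge.
    merged.pop("include", None)
    return merged


# Abbreviated contents of _template_yaml and one generated task file.
template = {
    "group": "bbh",
    "dataset_path": "lukaemon/bbh",
    "output_type": "greedy_until",
    "test_split": "test",
}

task = {
    "include": "_template_yaml",
    "task": "bbh_boolean_expressions",
    "dataset_name": "boolean_expressions",
}

cfg = resolve_include(task, {"_template_yaml": template})
```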
# Generated by _generate_configs.py
dataset_name: boolean_expressions
include: _template_yaml
task: bbh_boolean_expressions
# Generated by _generate_configs.py
dataset_name: causal_judgement
include: _template_yaml
task: bbh_causal_judgement
# Generated by _generate_configs.py
dataset_name: date_understanding
include: _template_yaml
task: bbh_date_understanding
# Generated by _generate_configs.py
dataset_name: disambiguation_qa
include: _template_yaml
task: bbh_disambiguation_qa
# Generated by _generate_configs.py
dataset_name: dyck_languages
include: _template_yaml
task: bbh_dyck_languages
# Generated by _generate_configs.py
dataset_name: formal_fallacies
include: _template_yaml
task: bbh_formal_fallacies
# Generated by _generate_configs.py
dataset_name: geometric_shapes
include: _template_yaml
task: bbh_geometric_shapes
# Generated by _generate_configs.py
dataset_name: hyperbaton
include: _template_yaml
task: bbh_hyperbaton
# Generated by _generate_configs.py
dataset_name: logical_deduction_five_objects
include: _template_yaml
task: bbh_logical_deduction_five_objects
# Generated by _generate_configs.py
dataset_name: logical_deduction_seven_objects
include: _template_yaml
task: bbh_logical_deduction_seven_objects
# Generated by _generate_configs.py
dataset_name: logical_deduction_three_objects
include: _template_yaml
task: bbh_logical_deduction_three_objects
# Generated by _generate_configs.py
dataset_name: movie_recommendation
include: _template_yaml
task: bbh_movie_recommendation
# Generated by _generate_configs.py
dataset_name: multistep_arithmetic_two
include: _template_yaml
task: bbh_multistep_arithmetic_two
# Generated by _generate_configs.py
dataset_name: navigate
include: _template_yaml
task: bbh_navigate
# Generated by _generate_configs.py
dataset_name: object_counting
include: _template_yaml
task: bbh_object_counting
# Generated by _generate_configs.py
dataset_name: penguins_in_a_table
include: _template_yaml
task: bbh_penguins_in_a_table
# Generated by _generate_configs.py
dataset_name: reasoning_about_colored_objects
include: _template_yaml
task: bbh_reasoning_about_colored_objects