Commit 88486e57 authored by lintangsutawika

Merge branch 'group-agg-rework' of https://github.com/EleutherAI/lm-evaluation-harness into multiprompt
parents 5971f2ca ba73d131
-group: glue
+tag: glue
 task: cola
 dataset_path: glue
 dataset_name: cola
......
-group: glue
+tag: glue
 task: mnli
 dataset_path: glue
 dataset_name: mnli
......
-group: glue
+tag: glue
 task: mrpc
 dataset_path: glue
 dataset_name: mrpc
......
-group: glue
+tag: glue
 task: qnli
 dataset_path: glue
 dataset_name: qnli
......
-group: glue
+tag: glue
 task: qqp
 dataset_path: glue
 dataset_name: qqp
......
-group: glue
+tag: glue
 task: rte
 dataset_path: glue
 dataset_name: rte
......
-group: glue
+tag: glue
 task: sst2
 dataset_path: glue
 dataset_name: sst2
......
-group: glue
+tag: glue
 task: wnli
 dataset_path: glue
 dataset_name: wnli
......
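The GLUE hunks above all make the same one-line change: the top-level `group:` key becomes `tag:`. As a rough sketch of how that rename could be scripted for similar task configs, the snippet below rewrites the key in raw YAML text. The helper `rename_group_to_tag` is hypothetical and not part of lm-evaluation-harness. It assumes `group:` sits unindented at the start of a line, and it works on text rather than parsed YAML because tags such as `!function` would need a custom loader.

```python
import re

# Matches a top-level "group:" key only: start of a line, no indentation,
# followed by whitespace or end of line (so "group_alias:" is not touched).
_GROUP_KEY = re.compile(r"^group:(?=\s|$)", flags=re.MULTILINE)


def rename_group_to_tag(yaml_text: str) -> str:
    """Return yaml_text with every top-level `group:` key renamed to `tag:`.

    Hypothetical helper for illustration; values, list items, and all
    other keys are left unchanged.
    """
    return _GROUP_KEY.sub("tag:", yaml_text)


if __name__ == "__main__":
    before = "group: glue\ntask: cola\ndataset_path: glue\ndataset_name: cola\n"
    print(rename_group_to_tag(before))
```

A rename like this only suits configs where `group:` acted as a label. A file that defines an actual group, with a `task` list and an `aggregate_metric_list`, keeps its `group:` key.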
@@ -25,11 +25,15 @@ Homepage: `https://github.com/idavidrein/gpqa/tree/main`
 This dataset is gated, so you will have to accept the terms of use at https://huggingface.co/datasets/Idavidrein/gpqa and login via `huggingface-cli login` using your HF Hub token before running this task.
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
-* `gpqa`
+None
+#### Tags
+* `gpqa`: runs all GPQA variants.
 #### Tasks
......
 dataset_path: Idavidrein/gpqa
-group: gpqa
+tag: gpqa
 output_type: generate_until
 process_docs: !function utils.process_docs
 training_split: train
......
 dataset_path: Idavidrein/gpqa
-group: gpqa
+tag: gpqa
 output_type: generate_until
 process_docs: !function utils.process_docs
 training_split: train
......
 dataset_path: Idavidrein/gpqa
-group: gpqa
+tag: gpqa
 output_type: generate_until
 process_docs: !function utils.process_docs
 training_split: train
......
 dataset_path: Idavidrein/gpqa
-group: gpqa
+tag: gpqa
 output_type: multiple_choice
 process_docs: !function utils.process_docs
 training_split: train
......
 dataset_path: Idavidrein/gpqa
-group: gpqa
+tag: gpqa
 output_type: multiple_choice
 process_docs: !function utils.process_docs
 training_split: train
......
 include: gsm8k-cot.yaml
-group:
+tag:
 - chain_of_thought
 - self_consistency
 task: gsm8k_cot_self_consistency
......
-group:
+tag:
 - math_word_problems
 task: gsm8k_cot_zeroshot
 dataset_path: gsm8k
......
@@ -61,7 +61,7 @@ generation_kwargs:
 - 'Q:'
 - </s>
 - <|im_end|>
-group:
+tag:
 - chain_of_thought
 metadata:
 version: 3.0
......
-group:
+tag:
 - math_word_problems
 task: gsm8k
 dataset_path: gsm8k
-group: haerae
 dataset_path: HAERAE-HUB/HAE_RAE_BENCH
 test_split: test
 fewshot_split: test
......
group: haerae
task:
- haerae_gk
- haerae_hi
- haerae_lw
- haerae_rw
- haerae_sn
aggregate_metric_list:
- metric: acc
aggregation: mean
weight_by_size: true
- metric: acc_norm
aggregation: mean
weight_by_size: true
metadata:
version: 1.0
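The `aggregate_metric_list` above asks for a `mean` with `weight_by_size: true`. Below is a minimal sketch of what a size-weighted mean computes, assuming the flag means each subtask's score is weighted by its number of documents. The function name, scores, and sizes are illustrative only; they are not taken from the harness or from HAE-RAE results.

```python
from typing import Sequence


def aggregate_mean(
    scores: Sequence[float],
    sizes: Sequence[int],
    weight_by_size: bool = True,
) -> float:
    """Combine per-subtask scores into one group score.

    Assumed semantics (not confirmed by this diff):
    weight_by_size=True  -> each score is weighted by its subtask's
                            document count (micro-average);
    weight_by_size=False -> plain unweighted mean (macro-average).
    """
    if len(scores) != len(sizes):
        raise ValueError("scores and sizes must have the same length")
    if not scores:
        raise ValueError("at least one subtask score is required")
    if not weight_by_size:
        return sum(scores) / len(scores)
    total = sum(sizes)
    if total <= 0:
        raise ValueError("total size must be positive")
    return sum(score * size for score, size in zip(scores, sizes)) / total


if __name__ == "__main__":
    # Two made-up subtasks: 100 documents at acc 0.5, 300 documents at acc 1.0.
    print(aggregate_mean([0.5, 1.0], [100, 300]))                        # 0.875
    print(aggregate_mean([0.5, 1.0], [100, 300], weight_by_size=False))  # 0.75
```

Under this reading, larger subtasks pull the group score toward their own accuracy, while the unweighted variant treats every subtask equally regardless of size.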