Unverified commit 517aadc4, authored by Lintang Sutawika and committed by GitHub

Group agg rework (#1741)



* add group_config arg

* add a group config that allows disabling table for group score and group aggregate in general

* fixed size configuration

* adjust config

* add group config

* adjust mmlu to use group_config

* fixed args input in aggregate_subtask_metrics

* fixed issues related to printing alias of group and updated yaml

* update all mmlu variants to include group_config

* edit format

* modify mmlu tasks

* adjust group to also be a configurable group

* add configurable group

* simplify get_task_list

* adjust group scoring with using ConfigurableGroup

* adjust args

* update mmlu

* update mmlu

* update to work with new group and task configuration

* readd group_agg

* readd files

* move prepare_print_tasks to evaluator_utils

* sort set to False by default, fix predict_only arg

* add version for groups

* reversed task list

* update additional condition when loading a group in a group yaml

* update truthfulqa

* add description regarding tags replacing group

* replace group to tag

* fixed conditional statement

* remove warning

* update loading of task group and newly added tags

* reformat with pre-commit

* fixed info log

* update

* fix bug

* fix bug

* use task id to differentiate tasks

* convert all groups to configurable groups

* use task_id

* reformat

* add task_id for python tasks as well

* add task_id for python tasks as well

* add task_id for python tasks as well

* revert truthfulqa

* revert mmlu tasks

* new mmlu config

* new group config parameter `tag_to_task`

* Update truthfulqa_mc2.yaml

* reformat

* add _process_group_config

* adjust task_id

* add get_subtask_list function to get proper subtask list

* group config to_dict update

* remove tag check

* update mmlu

* fix config passing issues

* add test yaml

* format fix

* add documentation

* corner case for single tag being called

* fix indentation

* formatting

* update all mmlu variants

* Update docs/task_guide.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* remove group_alias

* Update docs/task_guide.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* remove version for metadata

* Update docs/task_guide.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* update mmlu/

* removed " " in make_table

* change how aggregate_metric is loaded

* change how aggregate_metric is loaded

* update aggregate_metric arg

* update format

* update format

* some docs fixes

* add groups for agieval, aexams, aclue

* add more explicit aggregation groups

* add more groupings / tags distinctions

* add more groupings

* more groupings

* add many explicit group configs

* add many explicit group configs

* add more explicit group configs

* add more explicit group configs

* add more error msgs, agg_metric -> agg_metric_list

* some docs updates

* update task_id to be updateable and use group:task format

* make KMMLU a tag for now

* update docs

* don't duplicate task names

* fix merge conflicts?

* giving this a try

* clean up diff

* switch mmlu variants over to using

* don't use to-be-deprecated group: config field in overview notebook

* Python tasks which subclass ConfigurableTask now run

* update mmlu

* pre-commit format

* fixed sorting for multi-level printing

* move group api to separate file

* fix bbh aggregation filter usage

* track api/group.py

* adjust group and tags loading

* make explicit group configs for leaderboard and other newer tasks

* fix arabicmmlu

* update

* change arabicmmlu template name???

* update group alias

* fix printing bugs

* check table printing is correct ; update tests

* use mmlu_stem to have a group included in print tests

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
parent 5a7ed3ee
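
For orientation before the file diffs: this rework distinguishes plain `tag:` entries, which merely collect tasks under a shared name, from dedicated group config files, which also declare how subtask metrics are aggregated and carry their own version. A minimal sketch of the group-config shape used throughout the diffs below (names are illustrative placeholders; field meanings are inferred from the YAML and docs changes in this commit):

```yaml
# illustrative group config; "my_group" and its subtasks are placeholders
group: my_group
task:
  - my_subtask_a
  - my_subtask_b
aggregate_metric_list:
  - metric: acc          # subtask metric to aggregate into a group score
    aggregation: mean
    weight_by_size: true # weight each subtask by its number of documents
metadata:
  version: 1.0
```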
-group: basque-glue
+tag: basque-glue
task: qnlieu
dataset_path: orai-nlp/basqueGLUE
dataset_name: qnli
...
-group: basque-glue
+tag: basque-glue
task: vaxx_stance
dataset_path: orai-nlp/basqueGLUE
dataset_name: vaxx
...
-group: basque-glue
+tag: basque-glue
task: wiceu
dataset_path: orai-nlp/basqueGLUE
dataset_name: wic
...
@@ -21,15 +21,19 @@ Homepage: https://github.com/suzgunmirac/BIG-Bench-Hard
}
```
-### Groups and Tasks
+### Groups, Tags, and Tasks
#### Groups
+- `bbh`: is the same as `bbh_cot_fewshot`.
- `bbh_zeroshot`
- `bbh_fewshot`
- `bbh_cot_fewshot`
- `bbh_cot_zeroshot`
+#### Tags
+None.
#### Tasks
...
group: bbh
task:
- bbh_cot_fewshot_boolean_expressions
- bbh_cot_fewshot_causal_judgement
- bbh_cot_fewshot_date_understanding
- bbh_cot_fewshot_disambiguation_qa
- bbh_cot_fewshot_dyck_languages
- bbh_cot_fewshot_formal_fallacies
- bbh_cot_fewshot_geometric_shapes
- bbh_cot_fewshot_hyperbaton
- bbh_cot_fewshot_logical_deduction_five_objects
- bbh_cot_fewshot_logical_deduction_seven_objects
- bbh_cot_fewshot_logical_deduction_three_objects
- bbh_cot_fewshot_movie_recommendation
- bbh_cot_fewshot_multistep_arithmetic_two
- bbh_cot_fewshot_navigate
- bbh_cot_fewshot_object_counting
- bbh_cot_fewshot_penguins_in_a_table
- bbh_cot_fewshot_reasoning_about_colored_objects
- bbh_cot_fewshot_ruin_names
- bbh_cot_fewshot_salient_translation_error_detection
- bbh_cot_fewshot_snarks
- bbh_cot_fewshot_sports_understanding
- bbh_cot_fewshot_temporal_sequences
- bbh_cot_fewshot_tracking_shuffled_objects_five_objects
- bbh_cot_fewshot_tracking_shuffled_objects_seven_objects
- bbh_cot_fewshot_tracking_shuffled_objects_three_objects
- bbh_cot_fewshot_web_of_lies
- bbh_cot_fewshot_word_sorting
aggregate_metric_list:
  - metric: exact_match
    aggregation: mean
    weight_by_size: true
    filter_list: get-answer
metadata:
  version: 2.0
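
To make `weight_by_size` concrete (assuming, as the field name and the `aggregate_subtask_metrics` changes in the commit message suggest, that it weights each subtask score by its document count), a hypothetical two-subtask example:

```yaml
# hypothetical numbers, for illustration only
# subtask A: 100 docs, exact_match = 0.80
# subtask B: 300 docs, exact_match = 0.60
# weight_by_size: true  -> (100 * 0.80 + 300 * 0.60) / 400 = 0.65
# weight_by_size: false -> (0.80 + 0.60) / 2 = 0.70
```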
group: bbh_cot_fewshot
task:
- bbh_cot_fewshot_boolean_expressions
- bbh_cot_fewshot_causal_judgement
- bbh_cot_fewshot_date_understanding
- bbh_cot_fewshot_disambiguation_qa
- bbh_cot_fewshot_dyck_languages
- bbh_cot_fewshot_formal_fallacies
- bbh_cot_fewshot_geometric_shapes
- bbh_cot_fewshot_hyperbaton
- bbh_cot_fewshot_logical_deduction_five_objects
- bbh_cot_fewshot_logical_deduction_seven_objects
- bbh_cot_fewshot_logical_deduction_three_objects
- bbh_cot_fewshot_movie_recommendation
- bbh_cot_fewshot_multistep_arithmetic_two
- bbh_cot_fewshot_navigate
- bbh_cot_fewshot_object_counting
- bbh_cot_fewshot_penguins_in_a_table
- bbh_cot_fewshot_reasoning_about_colored_objects
- bbh_cot_fewshot_ruin_names
- bbh_cot_fewshot_salient_translation_error_detection
- bbh_cot_fewshot_snarks
- bbh_cot_fewshot_sports_understanding
- bbh_cot_fewshot_temporal_sequences
- bbh_cot_fewshot_tracking_shuffled_objects_five_objects
- bbh_cot_fewshot_tracking_shuffled_objects_seven_objects
- bbh_cot_fewshot_tracking_shuffled_objects_three_objects
- bbh_cot_fewshot_web_of_lies
- bbh_cot_fewshot_word_sorting
aggregate_metric_list:
  - metric: exact_match
    aggregation: mean
    weight_by_size: true
    filter_list: get-answer
metadata:
  version: 2.0
-group:
-  - bbh
-  - bbh_cot_fewshot
dataset_path: lukaemon/bbh
output_type: generate_until
test_split: test
...
group: bbh_cot_zeroshot
task:
- bbh_cot_zeroshot_boolean_expressions
- bbh_cot_zeroshot_causal_judgement
- bbh_cot_zeroshot_date_understanding
- bbh_cot_zeroshot_disambiguation_qa
- bbh_cot_zeroshot_dyck_languages
- bbh_cot_zeroshot_formal_fallacies
- bbh_cot_zeroshot_geometric_shapes
- bbh_cot_zeroshot_hyperbaton
- bbh_cot_zeroshot_logical_deduction_five_objects
- bbh_cot_zeroshot_logical_deduction_seven_objects
- bbh_cot_zeroshot_logical_deduction_three_objects
- bbh_cot_zeroshot_movie_recommendation
- bbh_cot_zeroshot_multistep_arithmetic_two
- bbh_cot_zeroshot_navigate
- bbh_cot_zeroshot_object_counting
- bbh_cot_zeroshot_penguins_in_a_table
- bbh_cot_zeroshot_reasoning_about_colored_objects
- bbh_cot_zeroshot_ruin_names
- bbh_cot_zeroshot_salient_translation_error_detection
- bbh_cot_zeroshot_snarks
- bbh_cot_zeroshot_sports_understanding
- bbh_cot_zeroshot_temporal_sequences
- bbh_cot_zeroshot_tracking_shuffled_objects_five_objects
- bbh_cot_zeroshot_tracking_shuffled_objects_seven_objects
- bbh_cot_zeroshot_tracking_shuffled_objects_three_objects
- bbh_cot_zeroshot_web_of_lies
- bbh_cot_zeroshot_word_sorting
aggregate_metric_list:
  - metric: exact_match
    aggregation: mean
    weight_by_size: true
    filter_list: flexible-extract
metadata:
  version: 2.0
-group: bbh_cot_zeroshot
dataset_path: lukaemon/bbh
output_type: generate_until
test_split: test
...
group: bbh_fewshot
task:
- bbh_fewshot_boolean_expressions
- bbh_fewshot_causal_judgement
- bbh_fewshot_date_understanding
- bbh_fewshot_disambiguation_qa
- bbh_fewshot_dyck_languages
- bbh_fewshot_formal_fallacies
- bbh_fewshot_geometric_shapes
- bbh_fewshot_hyperbaton
- bbh_fewshot_logical_deduction_five_objects
- bbh_fewshot_logical_deduction_seven_objects
- bbh_fewshot_logical_deduction_three_objects
- bbh_fewshot_movie_recommendation
- bbh_fewshot_multistep_arithmetic_two
- bbh_fewshot_navigate
- bbh_fewshot_object_counting
- bbh_fewshot_penguins_in_a_table
- bbh_fewshot_reasoning_about_colored_objects
- bbh_fewshot_ruin_names
- bbh_fewshot_salient_translation_error_detection
- bbh_fewshot_snarks
- bbh_fewshot_sports_understanding
- bbh_fewshot_temporal_sequences
- bbh_fewshot_tracking_shuffled_objects_five_objects
- bbh_fewshot_tracking_shuffled_objects_seven_objects
- bbh_fewshot_tracking_shuffled_objects_three_objects
- bbh_fewshot_web_of_lies
- bbh_fewshot_word_sorting
aggregate_metric_list:
  - metric: exact_match
    aggregation: mean
    weight_by_size: true
metadata:
  version: 2.0
-group: bbh_fewshot
dataset_path: lukaemon/bbh
output_type: generate_until
test_split: test
...
group: bbh_zeroshot
task:
- bbh_zeroshot_boolean_expressions
- bbh_zeroshot_causal_judgement
- bbh_zeroshot_date_understanding
- bbh_zeroshot_disambiguation_qa
- bbh_zeroshot_dyck_languages
- bbh_zeroshot_formal_fallacies
- bbh_zeroshot_geometric_shapes
- bbh_zeroshot_hyperbaton
- bbh_zeroshot_logical_deduction_five_objects
- bbh_zeroshot_logical_deduction_seven_objects
- bbh_zeroshot_logical_deduction_three_objects
- bbh_zeroshot_movie_recommendation
- bbh_zeroshot_multistep_arithmetic_two
- bbh_zeroshot_navigate
- bbh_zeroshot_object_counting
- bbh_zeroshot_penguins_in_a_table
- bbh_zeroshot_reasoning_about_colored_objects
- bbh_zeroshot_ruin_names
- bbh_zeroshot_salient_translation_error_detection
- bbh_zeroshot_snarks
- bbh_zeroshot_sports_understanding
- bbh_zeroshot_temporal_sequences
- bbh_zeroshot_tracking_shuffled_objects_five_objects
- bbh_zeroshot_tracking_shuffled_objects_seven_objects
- bbh_zeroshot_tracking_shuffled_objects_three_objects
- bbh_zeroshot_web_of_lies
- bbh_zeroshot_word_sorting
aggregate_metric_list:
  - metric: exact_match
    aggregation: mean
    weight_by_size: true
    filter_list: flexible-extract
metadata:
  version: 2.0
-group: bbh_zeroshot
dataset_path: lukaemon/bbh
output_type: generate_until
test_split: test
...
group: belebele
task:
- belebele_acm_Arab
- belebele_arz_Arab
- belebele_ceb_Latn
- belebele_fin_Latn
- belebele_hin_Deva
- belebele_ita_Latn
- belebele_khm_Khmr
- belebele_lvs_Latn
- belebele_npi_Deva
- belebele_pol_Latn
- belebele_slv_Latn
- belebele_swe_Latn
- belebele_tso_Latn
- belebele_xho_Latn
- belebele_afr_Latn
- belebele_asm_Beng
- belebele_ces_Latn
- belebele_fra_Latn
- belebele_hin_Latn
- belebele_jav_Latn
- belebele_kin_Latn
- belebele_mal_Mlym
- belebele_npi_Latn
- belebele_por_Latn
- belebele_sna_Latn
- belebele_swh_Latn
- belebele_tur_Latn
- belebele_yor_Latn
- belebele_als_Latn
- belebele_azj_Latn
- belebele_ckb_Arab
- belebele_fuv_Latn
- belebele_hrv_Latn
- belebele_jpn_Jpan
- belebele_kir_Cyrl
- belebele_mar_Deva
- belebele_nso_Latn
- belebele_snd_Arab
- belebele_tam_Taml
- belebele_ukr_Cyrl
- belebele_zho_Hans
- belebele_amh_Ethi
- belebele_bam_Latn
- belebele_dan_Latn
- belebele_gaz_Latn
- belebele_hun_Latn
- belebele_kac_Latn
- belebele_kor_Hang
- belebele_mkd_Cyrl
- belebele_nya_Latn
- belebele_ron_Latn
- belebele_som_Latn
- belebele_tel_Telu
- belebele_urd_Arab
- belebele_zho_Hant
- belebele_apc_Arab
- belebele_ben_Beng
- belebele_deu_Latn
- belebele_grn_Latn
- belebele_hye_Armn
- belebele_kan_Knda
- belebele_lao_Laoo
- belebele_mlt_Latn
- belebele_ory_Orya
- belebele_rus_Cyrl
- belebele_sot_Latn
- belebele_tgk_Cyrl
- belebele_urd_Latn
- belebele_zsm_Latn
- belebele_arb_Arab
- belebele_ben_Latn
- belebele_ell_Grek
- belebele_guj_Gujr
- belebele_ibo_Latn
- belebele_kat_Geor
- belebele_lin_Latn
- belebele_mri_Latn
- belebele_pan_Guru
- belebele_shn_Mymr
- belebele_spa_Latn
- belebele_tgl_Latn
- belebele_uzn_Latn
- belebele_zul_Latn
- belebele_arb_Latn
- belebele_bod_Tibt
- belebele_eng_Latn
- belebele_hat_Latn
- belebele_ilo_Latn
- belebele_kaz_Cyrl
- belebele_lit_Latn
- belebele_mya_Mymr
- belebele_pbt_Arab
- belebele_sin_Latn
- belebele_srp_Cyrl
- belebele_tha_Thai
- belebele_vie_Latn
- belebele_ars_Arab
- belebele_bul_Cyrl
- belebele_est_Latn
- belebele_hau_Latn
- belebele_ind_Latn
- belebele_kea_Latn
- belebele_lug_Latn
- belebele_nld_Latn
- belebele_pes_Arab
- belebele_sin_Sinh
- belebele_ssw_Latn
- belebele_tir_Ethi
- belebele_war_Latn
- belebele_ary_Arab
- belebele_cat_Latn
- belebele_eus_Latn
- belebele_heb_Hebr
- belebele_isl_Latn
- belebele_khk_Cyrl
- belebele_luo_Latn
- belebele_nob_Latn
- belebele_plt_Latn
- belebele_slk_Latn
- belebele_sun_Latn
- belebele_tsn_Latn
- belebele_wol_Latn
aggregate_metric_list:
  - aggregation: mean
    metric: acc
    weight_by_size: true
  - aggregation: mean
    metric: acc_norm
    weight_by_size: true
metadata:
  version: 0.0
-group: belebele
dataset_path: facebook/belebele
fewshot_config:
  sampler: first_n
...
@@ -65,3 +65,36 @@ if __name__ == "__main__":
                allow_unicode=True,
                default_style='"',
            )
+
+    # write group config out
+    group_yaml_dict = {
+        "group": f"belebele_{args.task_prefix}"
+        if args.task_prefix != ""
+        else "belebele",
+        "task": [
+            (
+                f"belebele_{args.task_prefix}_{lang}"
+                if args.task_prefix != ""
+                else f"belebele_{lang}"
+            )
+            for lang in languages
+            if "default" not in lang
+        ],
+        "aggregate_metric_list": [
+            {"metric": "acc", "aggregation": "mean", "weight_by_size": False},
+            {"metric": "acc_norm", "aggregation": "mean", "weight_by_size": False},
+        ],
+        "metadata": {"version": 0.0},
+    }
+
+    file_save_path = "_" + args.save_prefix_path + f"{args.task_prefix}.yaml"
+    with open(file_save_path, "w", encoding="utf-8") as group_yaml_file:
+        yaml.dump(
+            group_yaml_dict,
+            group_yaml_file,
+            width=float("inf"),
+            allow_unicode=True,
+            default_style='"',
+        )
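
For reference, with an empty task_prefix the block above would write a group file (named per the `"_" + args.save_prefix_path` convention in the code) that renders roughly as follows. This is a hand-rendering of `group_yaml_dict`, not captured script output, and it ignores the key sorting and quoting that `yaml.dump` with `default_style='"'` would apply:

```yaml
group: belebele
task:
  - belebele_acm_Arab
  - belebele_arz_Arab
  # ... one entry per remaining language ...
aggregate_metric_list:
  - metric: acc
    aggregation: mean
    weight_by_size: false
  - metric: acc_norm
    aggregation: mean
    weight_by_size: false
metadata:
  version: 0.0
```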
@@ -4,48 +4,51 @@ task:
  # ANLI R1
  - group: anli_r1_flan
    group_alias: ANLI R1
+   aggregate_metric_list:
+     - metric: acc
+       weight_by_size: True
    task:
-     - task: anli_r1
+     - task: anli_r1_prompt-0
        task_alias: prompt-0
        include: _held_in_template_yaml
        doc_to_text: "{{premise}}\n\nChoose your answer: based on the paragraph above can we conclude that \"{{hypothesis}}\"?\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nI think the answer is"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r1
+     - task: anli_r1_prompt-1
        task_alias: prompt-1
        include: _held_in_template_yaml
        doc_to_text: "{{premise}}\n\nBased on that paragraph can we conclude that this sentence is true?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r1
+     - task: anli_r1_prompt-2
        task_alias: prompt-2
        include: _held_in_template_yaml
        doc_to_text: "{{premise}}\n\nCan we draw the following conclusion?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r1
+     - task: anli_r1_prompt-3
        task_alias: prompt-3
        include: _held_in_template_yaml
        doc_to_text: "{{premise}}\nDoes this next sentence follow, given the preceding text?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r1
+     - task: anli_r1_prompt-4
        task_alias: prompt-4
        include: _held_in_template_yaml
        doc_to_text: "{{premise}}\nCan we infer the following?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nThe answer is:"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r1
+     - task: anli_r1_prompt-5
        task_alias: prompt-5
        include: _held_in_template_yaml
        doc_to_text: "Read the following paragraph and determine if the hypothesis is true:\n\n{{premise}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nHypothesis: {{hypothesis}}\n\n\n"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r1
+     - task: anli_r1_prompt-6
        task_alias: prompt-6
        include: _held_in_template_yaml
        doc_to_text: "Read the text and determine if the sentence is true (see options at the end):\n\n{{premise}}\n\nSentence: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r1
+     - task: anli_r1_prompt-7
        task_alias: prompt-7
        include: _held_in_template_yaml
        doc_to_text: "Can we draw the following hypothesis from the context (see options)? \n\nContext:\n\n{{premise}}\n\nHypothesis: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r1
+     - task: anli_r1_prompt-8
        task_alias: prompt-8
        include: _held_in_template_yaml
        doc_to_text: "Choose from options: Determine if the sentence is true based on the text below:\n{{hypothesis}}\n\n{{premise}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
@@ -53,48 +56,51 @@ task:
  # ANLI R2
  - group: anli_r2_flan
    group_alias: ANLI R2
+   aggregate_metric_list:
+     - metric: acc
+       weight_by_size: True
    task:
-     - task: anli_r2
+     - task: anli_r2_prompt-0
        task_alias: prompt-0
        include: _held_in_template_yaml
        doc_to_text: "{{premise}}\n\nChoose your answer: based on the paragraph above can we conclude that \"{{hypothesis}}\"?\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nI think the answer is"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r2
+     - task: anli_r2_prompt-1
        task_alias: prompt-1
        include: _held_in_template_yaml
        doc_to_text: "{{premise}}\n\nBased on that paragraph can we conclude that this sentence is true?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r2
+     - task: anli_r2_prompt-2
        task_alias: prompt-2
        include: _held_in_template_yaml
        doc_to_text: "{{premise}}\n\nCan we draw the following conclusion?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r2
+     - task: anli_r2_prompt-3
        task_alias: prompt-3
        include: _held_in_template_yaml
        doc_to_text: "{{premise}}\nDoes this next sentence follow, given the preceding text?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r2
+     - task: anli_r2_prompt-4
        task_alias: prompt-4
        include: _held_in_template_yaml
        doc_to_text: "{{premise}}\nCan we infer the following?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nThe answer is:"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r2
+     - task: anli_r2_prompt-5
        task_alias: prompt-5
        include: _held_in_template_yaml
        doc_to_text: "Read the following paragraph and determine if the hypothesis is true:\n\n{{premise}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nHypothesis: {{hypothesis}}\n\n\n"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r2
+     - task: anli_r2_prompt-6
        task_alias: prompt-6
        include: _held_in_template_yaml
        doc_to_text: "Read the text and determine if the sentence is true (see options at the end):\n\n{{premise}}\n\nSentence: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r2
+     - task: anli_r2_prompt-7
        task_alias: prompt-7
        include: _held_in_template_yaml
        doc_to_text: "Can we draw the following hypothesis from the context (see options)? \n\nContext:\n\n{{premise}}\n\nHypothesis: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r2
+     - task: anli_r2_prompt-8
        task_alias: prompt-8
        include: _held_in_template_yaml
        doc_to_text: "Choose from options: Determine if the sentence is true based on the text below:\n{{hypothesis}}\n\n{{premise}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
@@ -102,48 +108,51 @@ task:
  # ANLI R3
  - group: anli_r3_flan
    group_alias: ANLI R3
+   aggregate_metric_list:
+     - metric: acc
+       weight_by_size: True
    task:
-     - task: anli_r3
+     - task: anli_r3_prompt-0
        task_alias: prompt-0
        include: _held_in_template_yaml
        doc_to_text: "{{premise}}\n\nChoose your answer: based on the paragraph above can we conclude that \"{{hypothesis}}\"?\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nI think the answer is"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r3
+     - task: anli_r3_prompt-1
        task_alias: prompt-1
        include: _held_in_template_yaml
        doc_to_text: "{{premise}}\n\nBased on that paragraph can we conclude that this sentence is true?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r3
+     - task: anli_r3_prompt-2
        task_alias: prompt-2
        include: _held_in_template_yaml
        doc_to_text: "{{premise}}\n\nCan we draw the following conclusion?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r3
+     - task: anli_r3_prompt-3
        task_alias: prompt-3
        include: _held_in_template_yaml
        doc_to_text: "{{premise}}\nDoes this next sentence follow, given the preceding text?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r3
+     - task: anli_r3_prompt-4
        task_alias: prompt-4
        include: _held_in_template_yaml
        doc_to_text: "{{premise}}\nCan we infer the following?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nThe answer is:"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r3
+     - task: anli_r3_prompt-5
        task_alias: prompt-5
        include: _held_in_template_yaml
        doc_to_text: "Read the following paragraph and determine if the hypothesis is true:\n\n{{premise}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nHypothesis: {{hypothesis}}\n\n\n"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r3
+     - task: anli_r3_prompt-6
        task_alias: prompt-6
        include: _held_in_template_yaml
        doc_to_text: "Read the text and determine if the sentence is true (see options at the end):\n\n{{premise}}\n\nSentence: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r3
+     - task: anli_r3_prompt-7
        task_alias: prompt-7
        include: _held_in_template_yaml
        doc_to_text: "Can we draw the following hypothesis from the context (see options)? \n\nContext:\n\n{{premise}}\n\nHypothesis: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
        doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
-     - task: anli_r3
+     - task: anli_r3_prompt-8
        task_alias: prompt-8
        include: _held_in_template_yaml
        doc_to_text: "Choose from options: Determine if the sentence is true based on the text below:\n{{hypothesis}}\n\n{{premise}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
@@ -151,38 +160,41 @@ task:
  # Arc Easy
  - group: arc_easy_flan
    group_alias: Arc Easy
+   aggregate_metric_list:
+     - metric: acc
+       weight_by_size: True
    task:
-     - task: arc_easy
+     - task: arc_easy_prompt-0
        task_alias: prompt-0
        include: _held_in_template_yaml
        doc_to_text: "{{question}}\n\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
        doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
-     - task: arc_easy
+     - task: arc_easy_prompt-1
        task_alias: prompt-1
        include: _held_in_template_yaml
        doc_to_text: "Question: {{question}}\nOPTIONS:\n- {{choices.text|join('\n- ')}}\nAnswer:"
        doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
-     - task: arc_easy
+     - task: arc_easy_prompt-2
        task_alias: prompt-2
        include: _held_in_template_yaml
        doc_to_text: "Question: {{question}}\n\nWhat is the correct answer to the question from the following choices?\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
        doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
-     - task: arc_easy
+     - task: arc_easy_prompt-3
        task_alias: prompt-3
        include: _held_in_template_yaml
        doc_to_text: "Q: {{question}}\nWhat is the correct answer to this question?\nOPTIONS:\n- {{choices.text|join('\n- ')}}...A:"
        doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
-     - task: arc_easy
+     - task: arc_easy_prompt-4
        task_alias: prompt-4
        include: _held_in_template_yaml
        doc_to_text: "Choose your answer?\n\n{{question}}\n\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
        doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
-     - task: arc_easy
+     - task: arc_easy_prompt-5
        task_alias: prompt-5
        include: _held_in_template_yaml
        doc_to_text: "Answer the question\n\n{{question}}\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
        doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
-     - task: arc_easy
+     - task: arc_easy_prompt-6
        task_alias: prompt-6
        include: _held_in_template_yaml
        doc_to_text: "{{question}}\n\nPick the answer from these options\n\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
@@ -190,38 +202,41 @@ task:
  # Arc Challenge
  - group: arc_challenge_flan
    group_alias: Arc Challenge
+   aggregate_metric_list:
+     - metric: acc
+       weight_by_size: True
    task:
-     - task: arc_challenge
+     - task: arc_challenge_prompt-0
        task_alias: prompt-0
        include: _held_in_template_yaml
        doc_to_text: "{{question}}\n\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
        doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
-     - task: arc_challenge
+     - task: arc_challenge_prompt-1
        task_alias: prompt-1
        include: _held_in_template_yaml
        doc_to_text: "Question: {{question}}\nOPTIONS:\n- {{choices.text|join('\n- ')}}\nAnswer:"
        doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
-     - task: arc_challenge
+     - task: arc_challenge_prompt-2
        task_alias: prompt-2
        include: _held_in_template_yaml
        doc_to_text: "Question: {{question}}\n\nWhat is the correct answer to the question from the following choices?\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
        doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
-     - task: arc_challenge
+     - task: arc_challenge_prompt-3
        task_alias: prompt-3
        include: _held_in_template_yaml
        doc_to_text: "Q: {{question}}\nWhat is the correct answer to this question?\nOPTIONS:\n- {{choices.text|join('\n- ')}}...A:"
        doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
-     - task: arc_challenge
+     - task: arc_challenge_prompt-4
        task_alias: prompt-4
        include: _held_in_template_yaml
        doc_to_text: "Choose your answer?\n\n{{question}}\n\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
        doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
-     - task: arc_challenge
+     - task: arc_challenge_prompt-5
        task_alias: prompt-5
        include: _held_in_template_yaml
        doc_to_text: "Answer the question\n\n{{question}}\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
        doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
-     - task: arc_challenge
+     - task: arc_challenge_prompt-6
        task_alias: prompt-6
        include: _held_in_template_yaml
        doc_to_text: "{{question}}\n\nPick the answer from these options\n\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
@@ -229,53 +244,56 @@ task:
  # BoolQ
  - group: boolq_flan
    group_alias: BoolQ
+   aggregate_metric_list:
+     - metric: acc
+       weight_by_size: True
    task:
-     - task: boolq
+     - task: boolq_prompt-0
        task_alias: prompt-0
        include: _held_in_template_yaml
        doc_to_text: "{{passage}}\n\nCan we conclude that {{question}}?\n\nOPTIONS:\n- no\n- yes"
        doc_to_target: "{{['no', 'yes'][label]}}"
-     - task: boolq
+     - task: boolq_prompt-1
        task_alias: prompt-1
        include: _held_in_template_yaml
        doc_to_text: "{{passage}}\n\nIs it true that {{question}}?\n\nOPTIONS:\n- no\n- yes"
        doc_to_target: "{{['no', 'yes'][label]}}"
-     - task: boolq
+     - task: boolq_prompt-2
        task_alias: prompt-2
        include: _held_in_template_yaml
        doc_to_text: "{{passage}}\n\n{{question}}?\n\nOPTIONS:\n- no\n- yes"
        doc_to_target: "{{['no', 'yes'][label]}}"
-     - task: boolq
+     - task: boolq_prompt-3
        task_alias: prompt-3
        include: _held_in_template_yaml
        doc_to_text: "Text: {{passage}}\n\nQuestion: {{question}}?\n\nOPTIONS:\n- no\n- yes"
        doc_to_target: "{{['no', 'yes'][label]}}"
-     - task: boolq
+     - task: boolq_prompt-4
        task_alias: prompt-4
        include: _held_in_template_yaml
        doc_to_text: "{{passage}}\n\nWhat's the best answer to this question: {{question}}?\n\nOPTIONS:\n- no\n- yes"
        doc_to_target: "{{['no', 'yes'][label]}}"
-     - task: boolq
+     - task: boolq_prompt-5
        task_alias: prompt-5
        include: _held_in_template_yaml
        doc_to_text: "{{passage}}\nBased on the above text what's the best answer to this question: {{question}}?\n\nOPTIONS:\n- no\n- yes"
        doc_to_target: "{{['no', 'yes'][label]}}"
-     - task: boolq
+     - task: boolq_prompt-6
        task_alias: prompt-6
        include: _held_in_template_yaml
        doc_to_text: "{{passage}}\nAnswer this question making sure that the answer is supposed by the text: {{question}}?\n\nOPTIONS:\n- no\n- yes"
        doc_to_target: "{{['no', 'yes'][label]}}"
-     - task: boolq
+     - task: boolq_prompt-7
        task_alias: prompt-7
        include: _held_in_template_yaml
        doc_to_text: "{{passage}}\n\nIs the following statement correct based on the text\n\n{{question}}\n\nOPTIONS:\n- no\n- yes"
        doc_to_target: "{{['no', 'yes'][label]}}"
-     - task: boolq
+     - task: boolq_prompt-8
        task_alias: prompt-8
        include: _held_in_template_yaml
        doc_to_text: "{{passage}}\n\nIs this statement correct \"{{question}}\"?\n\nOPTIONS:\n- no\n- yes"
        doc_to_target: "{{['no', 'yes'][label]}}"
-     - task: boolq
+     - task: boolq_prompt-9
        task_alias: prompt-9
        include: _held_in_template_yaml
        doc_to_text: "Is it true that {{question}} based on the following text?\n\n{{passage}}\n\nOPTIONS:\n- no\n- yes"
@@ -283,48 +301,51 @@ task:
  # RTE
  - group: rte_flan
    group_alias: RTE
+   aggregate_metric_list:
+     - metric: acc
+       weight_by_size: True
    task:
-     - task: rte
+     - task: rte_prompt-0
        task_alias: prompt-0
        include: _held_in_template_yaml
        doc_to_text: "{{sentence1}}\n\nQuestion with options: Based on the paragraph above can we conclude that \"{{sentence2}}\"?\n\nOPTIONS:\n- yes\n- no"
        doc_to_target: "{{['yes', 'no'][label]}}"
-     - task: rte
+     - task: rte_prompt-1
        task_alias: prompt-1
        include: _held_in_template_yaml
        doc_to_text: "{{sentence1}}\n\nBased on that paragraph can we conclude that the sentence below is true?\n{{sentence2}}\n\nOPTIONS:\n- yes\n- no"
        doc_to_target: "{{['yes', 'no'][label]}}"
-     - task: rte
+     - task: rte_prompt-1
        task_alias: prompt-2
        include: _held_in_template_yaml
        doc_to_text: "{{sentence1}}\n\nQ with options: Can we draw the following conclusion?\n{{sentence2}}\n\nOPTIONS:\n- yes\n- no"
        doc_to_target: "{{['yes', 'no'][label]}}"
-     - task: rte
+     - task: rte_prompt-3
        task_alias: prompt-3
        include: _held_in_template_yaml
        doc_to_text: "{{sentence1}}\nDoes this next sentence follow, given the preceding text?\n{{sentence2}}\n\nOPTIONS:\n- yes\n- no"
        doc_to_target: "{{['yes', 'no'][label]}}"
-     - task: rte
+     - task: rte_prompt-4
        task_alias: prompt-4
        include: _held_in_template_yaml
        doc_to_text: "{{sentence1}}\nOPTIONS:\n- yes\n- no\nQuestion: Can we infer the following?\n{{sentence2}}"
        doc_to_target: "{{['yes', 'no'][label]}}"
-     - task: rte
+     - task: rte_prompt-5
        task_alias: prompt-5
        include: _held_in_template_yaml
        doc_to_text: "Read the following paragraph and determine if the hypothesis is true. Select from options at the end:\n\n{{sentence1}}\n\nHypothesis: {{sentence2}}\nOPTIONS:\n- yes\n- no\nThe answer is"
        doc_to_target: "{{['yes', 'no'][label]}}"
-     - task: rte
+     - task: rte_prompt-6
        task_alias: prompt-6
        include: _held_in_template_yaml
        doc_to_text: "Read the text and determine if the sentence is true:\n\n{{sentence1}}\n\nSentence: {{sentence2}}\nOPTIONS:\n- yes\n- no\nA:"
        doc_to_target: "{{['yes', 'no'][label]}}"
-     - task: rte
+     - task: rte_prompt-7
        task_alias: prompt-7
        include: _held_in_template_yaml
        doc_to_text: "Question with options: can we draw the following hypothesis from the context? \n\nContext:\n\n{{sentence1}}\n\nHypothesis: {{sentence2}}\nOPTIONS:\n- yes\n- no\nA:"
        doc_to_target: "{{['yes', 'no'][label]}}"
-     - task: rte
+     - task: rte_prompt-8
        task_alias: prompt-8
        include: _held_in_template_yaml
        doc_to_text: "Determine if the sentence is true based on the text below. Choose from options.\n{{sentence2}}\n\n{{sentence1}}\nOPTIONS:\n- yes\n- no"
...
@@ -7,3 +7,9 @@ task:
  - minerva_math_num_theory
  - minerva_math_prealgebra
  - minerva_math_precalc
+aggregate_metric_list:
+  - metric: exact_match
+    aggregation: mean
+    weight_by_size: true
+metadata:
+  version: 1.0
@@ -15,3 +15,7 @@ task:
    task_alias: "professional_medicine (mmlu)"
  - task: mmlu_college_biology
    task_alias: "college_biology (mmlu)"
+aggregate_metric_list:
+  - metric: acc
+    aggregation: mean
+    weight_by_size: True
-group: bertaqa
+tag: bertaqa
dataset_path: HiTZ/BertaQA
dataset_name: null
validation_split: null
...