Commit 7d09b24c authored by haileyschoelkopf

fix alllll the merge conflicts

parents 96dfe976 6348b947
@@ -29,10 +29,14 @@ Homepage: https://allenai.org/data/arc
} }
``` ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
+None.
+#### Tags
 * `ai2_arc`: Evaluates `arc_easy` and `arc_challenge`
 #### Tasks
...
-group:
+tag:
 - ai2_arc
 task: arc_easy
 dataset_path: allenai/ai2_arc
...
@@ -27,9 +27,9 @@ Homepage: https://github.com/openai/gpt-3/tree/master/data
} }
``` ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
-#### Groups
+#### Tags
 * `arithmetic`: Evaluates `1dc` to `5ds`
...
-group:
+tag:
 - arithmetic
 task: arithmetic_1dc
 dataset_path: EleutherAI/arithmetic
...
@@ -32,7 +32,7 @@ Homepage: https://github.com/chaochun/nlu-asdiv-dataset
} }
``` ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
...
@@ -21,12 +21,16 @@ Homepage: https://github.com/facebookarchive/bAbI-tasks
} }
``` ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
 * Not part of a group yet
+#### Tags
+* No tags applied.
 #### Tasks
 * `babi`
...
@@ -43,11 +43,15 @@ Homepage: `https://github.com/hitz-zentroa/latxa`
} }
``` ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
-* `basque-glue`: First version of the implementation
+None.
+#### Tags
+* `basque-glue`: First version of the implementation. Calls all subtasks, but does not average.
 #### Tasks
...
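The README hunks above move `ai2_arc`, `arithmetic`, and `basque-glue` from groups to tags, and the new `basque-glue` entry states the difference directly: a tag "calls all subtasks, but does not average." A minimal sketch of that distinction, assuming per-task scores have already been computed; `Tag`, `Group`, and `run` are hypothetical names for illustration, not the harness's own classes:

```python
from dataclasses import dataclass
from statistics import mean
from typing import Union


@dataclass
class Tag:
    """Hypothetical: a label that only expands to its member tasks."""
    name: str
    tasks: list[str]


@dataclass
class Group:
    """Hypothetical: like a tag, but its results also include an aggregate."""
    name: str
    tasks: list[str]


def run(entry: Union[Tag, Group], scores: dict[str, float]) -> dict[str, float]:
    """Collect per-task scores; only a Group adds an unweighted mean under its name."""
    results = {task: scores[task] for task in entry.tasks}
    if isinstance(entry, Group):
        results[entry.name] = mean(results[task] for task in entry.tasks)
    return results


scores = {"arc_easy": 0.75, "arc_challenge": 0.5}
print(run(Tag("ai2_arc", ["arc_easy", "arc_challenge"]), scores))
# {'arc_easy': 0.75, 'arc_challenge': 0.5}
print(run(Group("ai2_arc", ["arc_easy", "arc_challenge"]), scores))
# {'arc_easy': 0.75, 'arc_challenge': 0.5, 'ai2_arc': 0.625}
```

Both entries run the same subtasks; only the group contributes an extra aggregate row.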
@@ -21,15 +21,19 @@ Homepage: https://github.com/suzgunmirac/BIG-Bench-Hard
} }
``` ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
+- `bbh`: is the same as `bbh_cot_fewshot`.
 - `bbh_zeroshot`
 - `bbh_fewshot`
 - `bbh_cot_fewshot`
 - `bbh_cot_zeroshot`
+#### Tags
+None.
 #### Tasks
...
group: bbh
task:
- bbh_cot_fewshot_boolean_expressions
- bbh_cot_fewshot_causal_judgement
- bbh_cot_fewshot_date_understanding
- bbh_cot_fewshot_disambiguation_qa
- bbh_cot_fewshot_dyck_languages
- bbh_cot_fewshot_formal_fallacies
- bbh_cot_fewshot_geometric_shapes
- bbh_cot_fewshot_hyperbaton
- bbh_cot_fewshot_logical_deduction_five_objects
- bbh_cot_fewshot_logical_deduction_seven_objects
- bbh_cot_fewshot_logical_deduction_three_objects
- bbh_cot_fewshot_movie_recommendation
- bbh_cot_fewshot_multistep_arithmetic_two
- bbh_cot_fewshot_navigate
- bbh_cot_fewshot_object_counting
- bbh_cot_fewshot_penguins_in_a_table
- bbh_cot_fewshot_reasoning_about_colored_objects
- bbh_cot_fewshot_ruin_names
- bbh_cot_fewshot_salient_translation_error_detection
- bbh_cot_fewshot_snarks
- bbh_cot_fewshot_sports_understanding
- bbh_cot_fewshot_temporal_sequences
- bbh_cot_fewshot_tracking_shuffled_objects_five_objects
- bbh_cot_fewshot_tracking_shuffled_objects_seven_objects
- bbh_cot_fewshot_tracking_shuffled_objects_three_objects
- bbh_cot_fewshot_web_of_lies
- bbh_cot_fewshot_word_sorting
aggregate_metric_list:
  - metric: exact_match
    aggregation: mean
    weight_by_size: true
    filter_list: get-answer
metadata:
  version: 2.0
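The new YAML group configs set `weight_by_size: true` in `aggregate_metric_list`. A minimal sketch of what that switch plausibly controls, assuming it chooses between a mean weighted by each subtask's number of examples and a plain mean over subtasks; the `aggregate` function and the sample sizes are hypothetical:

```python
def aggregate(scores: list[float], sizes: list[int], weight_by_size: bool) -> float:
    """Combine subtask scores into a single group score (sketch)."""
    if not scores or len(scores) != len(sizes):
        raise ValueError("expected one size per score and at least one score")
    if weight_by_size:
        # Each subtask contributes in proportion to its number of examples.
        return sum(s * n for s, n in zip(scores, sizes)) / sum(sizes)
    # Every subtask contributes equally, regardless of size.
    return sum(scores) / len(scores)


# Hypothetical subtasks: a large one scoring 0.9 and a small one scoring 0.3.
print(round(aggregate([0.9, 0.3], [250, 50], weight_by_size=True), 4))   # 0.8
print(round(aggregate([0.9, 0.3], [250, 50], weight_by_size=False), 4))  # 0.6
```

With weighting on, the larger subtask dominates the group score; with it off, both subtasks count equally.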
group: bbh_cot_fewshot
task:
- bbh_cot_fewshot_boolean_expressions
- bbh_cot_fewshot_causal_judgement
- bbh_cot_fewshot_date_understanding
- bbh_cot_fewshot_disambiguation_qa
- bbh_cot_fewshot_dyck_languages
- bbh_cot_fewshot_formal_fallacies
- bbh_cot_fewshot_geometric_shapes
- bbh_cot_fewshot_hyperbaton
- bbh_cot_fewshot_logical_deduction_five_objects
- bbh_cot_fewshot_logical_deduction_seven_objects
- bbh_cot_fewshot_logical_deduction_three_objects
- bbh_cot_fewshot_movie_recommendation
- bbh_cot_fewshot_multistep_arithmetic_two
- bbh_cot_fewshot_navigate
- bbh_cot_fewshot_object_counting
- bbh_cot_fewshot_penguins_in_a_table
- bbh_cot_fewshot_reasoning_about_colored_objects
- bbh_cot_fewshot_ruin_names
- bbh_cot_fewshot_salient_translation_error_detection
- bbh_cot_fewshot_snarks
- bbh_cot_fewshot_sports_understanding
- bbh_cot_fewshot_temporal_sequences
- bbh_cot_fewshot_tracking_shuffled_objects_five_objects
- bbh_cot_fewshot_tracking_shuffled_objects_seven_objects
- bbh_cot_fewshot_tracking_shuffled_objects_three_objects
- bbh_cot_fewshot_web_of_lies
- bbh_cot_fewshot_word_sorting
aggregate_metric_list:
  - metric: exact_match
    aggregation: mean
    weight_by_size: true
    filter_list: get-answer
metadata:
  version: 2.0
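The README now says `bbh` is the same as `bbh_cot_fewshot`, yet each group config spells out its own `task:` list, so the two can drift apart. A small consistency check, sketched with hypothetical, truncated lists standing in for the parsed `task:` entries, reports any subtask that appears in only one of them:

```python
def list_mismatches(a: list[str], b: list[str]) -> tuple[list[str], list[str]]:
    """Return (only_in_a, only_in_b), sorted; order and duplicates are ignored."""
    set_a, set_b = set(a), set(b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


# Hypothetical, truncated task lists, not the full configs.
bbh = [
    "bbh_cot_fewshot_boolean_expressions",
    "bbh_cot_fewshot_navigate",
]
bbh_cot_fewshot = [
    "bbh_cot_fewshot_boolean_expressions",
    "bbh_cot_fewshot_navigate",
    "bbh_cot_fewshot_snarks",
]

only_bbh, only_cot = list_mismatches(bbh, bbh_cot_fewshot)
print(only_bbh)  # []
print(only_cot)  # ['bbh_cot_fewshot_snarks']
```

In practice the lists could come from loading both YAML files (for example with PyYAML's `yaml.safe_load`) and reading their `task` keys; a non-empty result on either side means the two groups do not cover the same subtasks.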
-group:
-  - bbh
-  - bbh_cot_fewshot
 dataset_path: lukaemon/bbh
 output_type: generate_until
 test_split: test
...
group: bbh_cot_zeroshot
task:
- bbh_cot_zeroshot_boolean_expressions
- bbh_cot_zeroshot_causal_judgement
- bbh_cot_zeroshot_date_understanding
- bbh_cot_zeroshot_disambiguation_qa
- bbh_cot_zeroshot_dyck_languages
- bbh_cot_zeroshot_formal_fallacies
- bbh_cot_zeroshot_geometric_shapes
- bbh_cot_zeroshot_hyperbaton
- bbh_cot_zeroshot_logical_deduction_five_objects
- bbh_cot_zeroshot_logical_deduction_seven_objects
- bbh_cot_zeroshot_logical_deduction_three_objects
- bbh_cot_zeroshot_movie_recommendation
- bbh_cot_zeroshot_multistep_arithmetic_two
- bbh_cot_zeroshot_navigate
- bbh_cot_zeroshot_object_counting
- bbh_cot_zeroshot_penguins_in_a_table
- bbh_cot_zeroshot_reasoning_about_colored_objects
- bbh_cot_zeroshot_ruin_names
- bbh_cot_zeroshot_salient_translation_error_detection
- bbh_cot_zeroshot_snarks
- bbh_cot_zeroshot_sports_understanding
- bbh_cot_zeroshot_temporal_sequences
- bbh_cot_zeroshot_tracking_shuffled_objects_five_objects
- bbh_cot_zeroshot_tracking_shuffled_objects_seven_objects
- bbh_cot_zeroshot_tracking_shuffled_objects_three_objects
- bbh_cot_zeroshot_web_of_lies
- bbh_cot_zeroshot_word_sorting
aggregate_metric_list:
  - metric: exact_match
    aggregation: mean
    weight_by_size: true
    filter_list: flexible-extract
metadata:
  version: 2.0
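The `bbh_cot_fewshot` group aggregates `exact_match` under the `get-answer` filter, `bbh_cot_zeroshot` and `bbh_zeroshot` use `flexible-extract`, and the `bbh_fewshot` config that follows names no filter. A hedged sketch of how such an entry could select which per-task score feeds the group mean, assuming each task's scores are keyed by `(metric, filter)` pairs and that an entry without `filter_list` falls back to a filter named `"none"` (both assumptions, not confirmed by this diff); `group_score` is a hypothetical name, and weighting is ignored here:

```python
from statistics import mean

# Assumed layout: each task maps (metric, filter) -> score.
TaskScores = dict[tuple[str, str], float]


def group_score(
    per_task: dict[str, TaskScores], metric: str, filter_name: str = "none"
) -> float:
    """Unweighted mean of one (metric, filter) score across subtasks."""
    missing = [name for name, s in per_task.items() if (metric, filter_name) not in s]
    if missing:
        raise KeyError(f"{metric},{filter_name} missing for: {missing}")
    return mean(s[(metric, filter_name)] for s in per_task.values())


per_task = {
    "task_a": {("exact_match", "get-answer"): 0.5, ("exact_match", "flexible-extract"): 0.75},
    "task_b": {("exact_match", "get-answer"): 0.25, ("exact_match", "flexible-extract"): 0.5},
}
print(group_score(per_task, "exact_match", "get-answer"))        # 0.375
print(group_score(per_task, "exact_match", "flexible-extract"))  # 0.625
```

The same subtasks can therefore yield different group scores depending on which filter the group config names.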
-group: bbh_cot_zeroshot
 dataset_path: lukaemon/bbh
 output_type: generate_until
 test_split: test
...
group: bbh_fewshot
task:
- bbh_fewshot_boolean_expressions
- bbh_fewshot_causal_judgement
- bbh_fewshot_date_understanding
- bbh_fewshot_disambiguation_qa
- bbh_fewshot_dyck_languages
- bbh_fewshot_formal_fallacies
- bbh_fewshot_geometric_shapes
- bbh_fewshot_hyperbaton
- bbh_fewshot_logical_deduction_five_objects
- bbh_fewshot_logical_deduction_seven_objects
- bbh_fewshot_logical_deduction_three_objects
- bbh_fewshot_movie_recommendation
- bbh_fewshot_multistep_arithmetic_two
- bbh_fewshot_navigate
- bbh_fewshot_object_counting
- bbh_fewshot_penguins_in_a_table
- bbh_fewshot_reasoning_about_colored_objects
- bbh_fewshot_ruin_names
- bbh_fewshot_salient_translation_error_detection
- bbh_fewshot_snarks
- bbh_fewshot_sports_understanding
- bbh_fewshot_temporal_sequences
- bbh_fewshot_tracking_shuffled_objects_five_objects
- bbh_fewshot_tracking_shuffled_objects_seven_objects
- bbh_fewshot_tracking_shuffled_objects_three_objects
- bbh_fewshot_web_of_lies
- bbh_fewshot_word_sorting
aggregate_metric_list:
  - metric: exact_match
    aggregation: mean
    weight_by_size: true
metadata:
  version: 2.0
-group: bbh_fewshot
 dataset_path: lukaemon/bbh
 output_type: generate_until
 test_split: test
...
group: bbh_zeroshot
task:
- bbh_zeroshot_boolean_expressions
- bbh_zeroshot_causal_judgement
- bbh_zeroshot_date_understanding
- bbh_zeroshot_disambiguation_qa
- bbh_zeroshot_dyck_languages
- bbh_zeroshot_formal_fallacies
- bbh_zeroshot_geometric_shapes
- bbh_zeroshot_hyperbaton
- bbh_zeroshot_logical_deduction_five_objects
- bbh_zeroshot_logical_deduction_seven_objects
- bbh_zeroshot_logical_deduction_three_objects
- bbh_zeroshot_movie_recommendation
- bbh_zeroshot_multistep_arithmetic_two
- bbh_zeroshot_navigate
- bbh_zeroshot_object_counting
- bbh_zeroshot_penguins_in_a_table
- bbh_zeroshot_reasoning_about_colored_objects
- bbh_zeroshot_ruin_names
- bbh_zeroshot_salient_translation_error_detection
- bbh_zeroshot_snarks
- bbh_zeroshot_sports_understanding
- bbh_zeroshot_temporal_sequences
- bbh_zeroshot_tracking_shuffled_objects_five_objects
- bbh_zeroshot_tracking_shuffled_objects_seven_objects
- bbh_zeroshot_tracking_shuffled_objects_three_objects
- bbh_zeroshot_web_of_lies
- bbh_zeroshot_word_sorting
aggregate_metric_list:
  - metric: exact_match
    aggregation: mean
    weight_by_size: true
    filter_list: flexible-extract
metadata:
  version: 2.0
-group: bbh_zeroshot
 dataset_path: lukaemon/bbh
 output_type: generate_until
 test_split: test
...
group: belebele
task:
- belebele_acm_Arab
- belebele_arz_Arab
- belebele_ceb_Latn
- belebele_fin_Latn
- belebele_hin_Deva
- belebele_ita_Latn
- belebele_khm_Khmr
- belebele_lvs_Latn
- belebele_npi_Deva
- belebele_pol_Latn
- belebele_slv_Latn
- belebele_swe_Latn
- belebele_tso_Latn
- belebele_xho_Latn
- belebele_afr_Latn
- belebele_asm_Beng
- belebele_ces_Latn
- belebele_fra_Latn
- belebele_hin_Latn
- belebele_jav_Latn
- belebele_kin_Latn
- belebele_mal_Mlym
- belebele_npi_Latn
- belebele_por_Latn
- belebele_sna_Latn
- belebele_swh_Latn
- belebele_tur_Latn
- belebele_yor_Latn
- belebele_als_Latn
- belebele_azj_Latn
- belebele_ckb_Arab
- belebele_fuv_Latn
- belebele_hrv_Latn
- belebele_jpn_Jpan
- belebele_kir_Cyrl
- belebele_mar_Deva
- belebele_nso_Latn
- belebele_snd_Arab
- belebele_tam_Taml
- belebele_ukr_Cyrl
- belebele_zho_Hans
- belebele_amh_Ethi
- belebele_bam_Latn
- belebele_dan_Latn
- belebele_gaz_Latn
- belebele_hun_Latn
- belebele_kac_Latn
- belebele_kor_Hang
- belebele_mkd_Cyrl
- belebele_nya_Latn
- belebele_ron_Latn
- belebele_som_Latn
- belebele_tel_Telu
- belebele_urd_Arab
- belebele_zho_Hant
- belebele_apc_Arab
- belebele_ben_Beng
- belebele_deu_Latn
- belebele_grn_Latn
- belebele_hye_Armn
- belebele_kan_Knda
- belebele_lao_Laoo
- belebele_mlt_Latn
- belebele_ory_Orya
- belebele_rus_Cyrl
- belebele_sot_Latn
- belebele_tgk_Cyrl
- belebele_urd_Latn
- belebele_zsm_Latn
- belebele_arb_Arab
- belebele_ben_Latn
- belebele_ell_Grek
- belebele_guj_Gujr
- belebele_ibo_Latn
- belebele_kat_Geor
- belebele_lin_Latn
- belebele_mri_Latn
- belebele_pan_Guru
- belebele_shn_Mymr
- belebele_spa_Latn
- belebele_tgl_Latn
- belebele_uzn_Latn
- belebele_zul_Latn
- belebele_arb_Latn
- belebele_bod_Tibt
- belebele_eng_Latn
- belebele_hat_Latn
- belebele_ilo_Latn
- belebele_kaz_Cyrl
- belebele_lit_Latn
- belebele_mya_Mymr
- belebele_pbt_Arab
- belebele_sin_Latn
- belebele_srp_Cyrl
- belebele_tha_Thai
- belebele_vie_Latn
- belebele_ars_Arab
- belebele_bul_Cyrl
- belebele_est_Latn
- belebele_hau_Latn
- belebele_ind_Latn
- belebele_kea_Latn
- belebele_lug_Latn
- belebele_nld_Latn
- belebele_pes_Arab
- belebele_sin_Sinh
- belebele_ssw_Latn
- belebele_tir_Ethi
- belebele_war_Latn
- belebele_ary_Arab
- belebele_cat_Latn
- belebele_eus_Latn
- belebele_heb_Hebr
- belebele_isl_Latn
- belebele_khk_Cyrl
- belebele_luo_Latn
- belebele_nob_Latn
- belebele_plt_Latn
- belebele_slk_Latn
- belebele_sun_Latn
- belebele_tsn_Latn
- belebele_wol_Latn
aggregate_metric_list:
  - aggregation: mean
    metric: acc
    weight_by_size: true
  - aggregation: mean
    metric: acc_norm
    weight_by_size: true
metadata:
  version: 0.0
-group: belebele
 dataset_path: facebook/belebele
 fewshot_config:
   sampler: first_n
...
@@ -65,3 +65,36 @@ if __name__ == "__main__":
                 allow_unicode=True,
                 default_style='"',
             )
+    # write group config out
+    group_yaml_dict = {
+        "group": f"belebele_{args.task_prefix}"
+        if args.task_prefix != ""
+        else "belebele",
+        "task": [
+            (
+                f"belebele_{args.task_prefix}_{lang}"
+                if args.task_prefix != ""
+                else f"belebele_{lang}"
+            )
+            for lang in languages
+            if "default" not in lang
+        ],
+        "aggregate_metric_list": [
+            {"metric": "acc", "aggregation": "mean", "weight_by_size": False},
+            {"metric": "acc_norm", "aggregation": "mean", "weight_by_size": False},
+        ],
+        "metadata": {"version": 0.0},
+    }
+    file_save_path = "_" + args.save_prefix_path + f"{args.task_prefix}.yaml"
+    with open(file_save_path, "w", encoding="utf-8") as group_yaml_file:
+        yaml.dump(
+            group_yaml_dict,
+            group_yaml_file,
+            width=float("inf"),
+            allow_unicode=True,
+            default_style='"',
+        )
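The added block builds the `belebele` group config from the list of language splits, skipping any split whose name contains `default` and prefixing task names only when a task prefix is given. The same logic, pulled out into a standalone function so it can be checked without `argparse` or PyYAML, might look like this; `build_group_config` and the `"mc"` prefix are hypothetical, and JSON printing stands in for `yaml.dump` purely for illustration:

```python
import json


def build_group_config(languages: list[str], task_prefix: str = "") -> dict:
    """Mirror the group-config logic from the generator script (sketch)."""
    # An empty prefix yields plain "belebele" / "belebele_<lang>" names.
    group = f"belebele_{task_prefix}" if task_prefix else "belebele"
    tasks = [
        f"belebele_{task_prefix}_{lang}" if task_prefix else f"belebele_{lang}"
        for lang in languages
        if "default" not in lang  # skip the non-language "default" split
    ]
    return {
        "group": group,
        "task": tasks,
        "aggregate_metric_list": [
            {"metric": "acc", "aggregation": "mean", "weight_by_size": False},
            {"metric": "acc_norm", "aggregation": "mean", "weight_by_size": False},
        ],
        "metadata": {"version": 0.0},
    }


config = build_group_config(["eng_Latn", "fra_Latn", "default"], task_prefix="mc")
print(json.dumps(config["task"]))  # ["belebele_mc_eng_Latn", "belebele_mc_fra_Latn"]
```

Keeping the name-building logic in one pure function makes it easy to test that the group's `task` list matches the per-language task names the script writes alongside it.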