gaoqiong / lm-evaluation-harness
Commit e4db76cb
Authored Jul 09, 2024 by haileyschoelkopf
Merge branch 'main' into multimodal-prototyping
Parents: 6cc6e9cd, ad80f555
Changes: 871
Showing 20 changed files with 362 additions and 16 deletions
lm_eval/tasks/basqueglue/README.md  +6 -2
lm_eval/tasks/basqueglue/bec.yaml  +1 -1
lm_eval/tasks/basqueglue/bhtc.yaml  +1 -1
lm_eval/tasks/basqueglue/coref.yaml  +1 -1
lm_eval/tasks/basqueglue/qnli.yaml  +1 -1
lm_eval/tasks/basqueglue/vaxx.yaml  +1 -1
lm_eval/tasks/basqueglue/wic.yaml  +1 -1
lm_eval/tasks/bbh/README.md  +5 -1
lm_eval/tasks/bbh/cot_fewshot/_bbh.yaml  +36 -0
lm_eval/tasks/bbh/cot_fewshot/_bbh_cot_fewshot.yaml  +36 -0
lm_eval/tasks/bbh/cot_fewshot/_cot_fewshot_template_yaml  +0 -3
lm_eval/tasks/bbh/cot_zeroshot/_bbh_cot_zeroshot.yaml  +36 -0
lm_eval/tasks/bbh/cot_zeroshot/_cot_zeroshot_template_yaml  +0 -1
lm_eval/tasks/bbh/fewshot/_bbh_fewshot.yaml  +35 -0
lm_eval/tasks/bbh/fewshot/_fewshot_template_yaml  +0 -1
lm_eval/tasks/bbh/zeroshot/_bbh_zeroshot.yaml  +36 -0
lm_eval/tasks/bbh/zeroshot/_zeroshot_template_yaml  +0 -1
lm_eval/tasks/belebele/_belebele.yaml  +133 -0
lm_eval/tasks/belebele/_default_template_yaml  +0 -1
lm_eval/tasks/belebele/_generate_configs.py  +33 -0
lm_eval/tasks/basqueglue/README.md
@@ -43,11 +43,15 @@ Homepage: `https://github.com/hitz-zentroa/latxa`
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
-* `basque-glue`: First version of the implementation
+None.
+#### Tags
+* `basque-glue`: First version of the implementation. Calls all subtasks, but does not average.
 #### Tasks
 ...
lm_eval/tasks/basqueglue/bec.yaml
-group: basque-glue
+tag: basque-glue
 task: bec2016eu
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: bec
 ...
lm_eval/tasks/basqueglue/bhtc.yaml
-group: basque-glue
+tag: basque-glue
 task: bhtc_v2
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: bhtc
 ...
lm_eval/tasks/basqueglue/coref.yaml
-group: basque-glue
+tag: basque-glue
 task: epec_koref_bin
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: coref
 ...
lm_eval/tasks/basqueglue/qnli.yaml
-group: basque-glue
+tag: basque-glue
 task: qnlieu
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: qnli
 ...
lm_eval/tasks/basqueglue/vaxx.yaml
-group: basque-glue
+tag: basque-glue
 task: vaxx_stance
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: vaxx
 ...
lm_eval/tasks/basqueglue/wic.yaml
-group: basque-glue
+tag: basque-glue
 task: wiceu
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: wic
 ...
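The six basqueglue diffs above all make the same one-line change: a top-level scalar `group:` key becomes `tag:`. A minimal sketch of that rename applied to YAML text follows, assuming the key sits unindented on its own line with its value on the same line. The helper name `group_to_tag` is hypothetical and is not part of the harness; list-valued `group:` keys, like the one removed from the BBH template later in this commit, are deliberately left untouched.

```python
import re

# Matches an unindented `group:` key whose value is on the same line.
# A bare `group:` followed by a list on the next lines does not match.
_GROUP_LINE = re.compile(r"^group:(?=[ \t]+\S)", flags=re.MULTILINE)


def group_to_tag(yaml_text):
    """Rename a top-level scalar `group:` key to `tag:` (hypothetical helper)."""
    return _GROUP_LINE.sub("tag:", yaml_text, count=1)


before = "group: basque-glue\ntask: bec2016eu\ndataset_path: orai-nlp/basqueGLUE\n"
after = group_to_tag(before)

# A list-valued group and an indented key are both left as they are.
list_valued = group_to_tag("group:\n- bbh\n- bbh_cot_fewshot\n")
nested = group_to_tag("outer:\n  group: x\n")
```

A text-level substitution keeps comments and key order intact, which a load-and-dump round trip through a YAML library would not guarantee.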
lm_eval/tasks/bbh/README.md
@@ -21,15 +21,19 @@ Homepage: https://github.com/suzgunmirac/BIG-Bench-Hard
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
+- `bbh`: is the same as `bbh_cot_fewshot`.
 - `bbh_zeroshot`
 - `bbh_fewshot`
 - `bbh_cot_fewshot`
 - `bbh_cot_zeroshot`
+#### Tags
+None.
 #### Tasks
 ...
lm_eval/tasks/bbh/cot_fewshot/_bbh.yaml
0 → 100644
+group: bbh
+task:
+  - bbh_cot_fewshot_boolean_expressions
+  - bbh_cot_fewshot_causal_judgement
+  - bbh_cot_fewshot_date_understanding
+  - bbh_cot_fewshot_disambiguation_qa
+  - bbh_cot_fewshot_dyck_languages
+  - bbh_cot_fewshot_formal_fallacies
+  - bbh_cot_fewshot_geometric_shapes
+  - bbh_cot_fewshot_hyperbaton
+  - bbh_cot_fewshot_logical_deduction_five_objects
+  - bbh_cot_fewshot_logical_deduction_seven_objects
+  - bbh_cot_fewshot_logical_deduction_three_objects
+  - bbh_cot_fewshot_movie_recommendation
+  - bbh_cot_fewshot_multistep_arithmetic_two
+  - bbh_cot_fewshot_navigate
+  - bbh_cot_fewshot_object_counting
+  - bbh_cot_fewshot_penguins_in_a_table
+  - bbh_cot_fewshot_reasoning_about_colored_objects
+  - bbh_cot_fewshot_ruin_names
+  - bbh_cot_fewshot_salient_translation_error_detection
+  - bbh_cot_fewshot_snarks
+  - bbh_cot_fewshot_sports_understanding
+  - bbh_cot_fewshot_temporal_sequences
+  - bbh_cot_fewshot_tracking_shuffled_objects_five_objects
+  - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects
+  - bbh_cot_fewshot_tracking_shuffled_objects_three_objects
+  - bbh_cot_fewshot_web_of_lies
+  - bbh_cot_fewshot_word_sorting
+aggregate_metric_list:
+  - metric: exact_match
+    aggregation: mean
+    weight_by_size: true
+    filter_list: get-answer
+metadata:
+  version: 2.0
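The `aggregate_metric_list` entry in this group config asks for a `mean` of `exact_match` over the subtasks with `weight_by_size: true`. A minimal sketch of what a size-weighted mean computes is given below, assuming each subtask reports one score and one sample count. The function name `aggregate_mean` and the numbers are illustrative assumptions and are not taken from the harness.

```python
def aggregate_mean(scores, sizes, weight_by_size=True):
    """Combine per-subtask scores into one group score.

    Hypothetical helper for illustration; not the harness's own code.
    """
    if len(scores) != len(sizes):
        raise ValueError("scores and sizes must have the same length")
    if not scores:
        raise ValueError("need at least one subtask")
    if not weight_by_size:
        # Unweighted: every subtask counts equally.
        return sum(scores) / len(scores)
    # Weighted: every sample counts equally, so large subtasks dominate.
    total = sum(sizes)
    return sum(score * n for score, n in zip(scores, sizes)) / total


# Made-up numbers: two subtasks of different sizes.
scores = [0.5, 1.0]
sizes = [300, 100]
weighted = aggregate_mean(scores, sizes, weight_by_size=True)
unweighted = aggregate_mean(scores, sizes, weight_by_size=False)
```

With these inputs the weighted result is 0.625 and the unweighted result is 0.75, which shows how the flag changes the reported group score when subtasks differ in size.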
lm_eval/tasks/bbh/cot_fewshot/_bbh_cot_fewshot.yaml
0 → 100644
+group: bbh_cot_fewshot
+task:
+  - bbh_cot_fewshot_boolean_expressions
+  - bbh_cot_fewshot_causal_judgement
+  - bbh_cot_fewshot_date_understanding
+  - bbh_cot_fewshot_disambiguation_qa
+  - bbh_cot_fewshot_dyck_languages
+  - bbh_cot_fewshot_formal_fallacies
+  - bbh_cot_fewshot_geometric_shapes
+  - bbh_cot_fewshot_hyperbaton
+  - bbh_cot_fewshot_logical_deduction_five_objects
+  - bbh_cot_fewshot_logical_deduction_seven_objects
+  - bbh_cot_fewshot_logical_deduction_three_objects
+  - bbh_cot_fewshot_movie_recommendation
+  - bbh_cot_fewshot_multistep_arithmetic_two
+  - bbh_cot_fewshot_navigate
+  - bbh_cot_fewshot_object_counting
+  - bbh_cot_fewshot_penguins_in_a_table
+  - bbh_cot_fewshot_reasoning_about_colored_objects
+  - bbh_cot_fewshot_ruin_names
+  - bbh_cot_fewshot_salient_translation_error_detection
+  - bbh_cot_fewshot_snarks
+  - bbh_cot_fewshot_sports_understanding
+  - bbh_cot_fewshot_temporal_sequences
+  - bbh_cot_fewshot_tracking_shuffled_objects_five_objects
+  - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects
+  - bbh_cot_fewshot_tracking_shuffled_objects_three_objects
+  - bbh_cot_fewshot_web_of_lies
+  - bbh_cot_fewshot_word_sorting
+aggregate_metric_list:
+  - metric: exact_match
+    aggregation: mean
+    weight_by_size: true
+    filter_list: get-answer
+metadata:
+  version: 2.0
lm_eval/tasks/bbh/cot_fewshot/_cot_fewshot_template_yaml
-group:
-- bbh
-- bbh_cot_fewshot
 dataset_path: lukaemon/bbh
 output_type: generate_until
 test_split: test
 ...
lm_eval/tasks/bbh/cot_zeroshot/_bbh_cot_zeroshot.yaml
0 → 100644
+group: bbh_cot_zeroshot
+task:
+  - bbh_cot_zeroshot_boolean_expressions
+  - bbh_cot_zeroshot_causal_judgement
+  - bbh_cot_zeroshot_date_understanding
+  - bbh_cot_zeroshot_disambiguation_qa
+  - bbh_cot_zeroshot_dyck_languages
+  - bbh_cot_zeroshot_formal_fallacies
+  - bbh_cot_zeroshot_geometric_shapes
+  - bbh_cot_zeroshot_hyperbaton
+  - bbh_cot_zeroshot_logical_deduction_five_objects
+  - bbh_cot_zeroshot_logical_deduction_seven_objects
+  - bbh_cot_zeroshot_logical_deduction_three_objects
+  - bbh_cot_zeroshot_movie_recommendation
+  - bbh_cot_zeroshot_multistep_arithmetic_two
+  - bbh_cot_zeroshot_navigate
+  - bbh_cot_zeroshot_object_counting
+  - bbh_cot_zeroshot_penguins_in_a_table
+  - bbh_cot_zeroshot_reasoning_about_colored_objects
+  - bbh_cot_zeroshot_ruin_names
+  - bbh_cot_zeroshot_salient_translation_error_detection
+  - bbh_cot_zeroshot_snarks
+  - bbh_cot_zeroshot_sports_understanding
+  - bbh_cot_zeroshot_temporal_sequences
+  - bbh_cot_zeroshot_tracking_shuffled_objects_five_objects
+  - bbh_cot_zeroshot_tracking_shuffled_objects_seven_objects
+  - bbh_cot_zeroshot_tracking_shuffled_objects_three_objects
+  - bbh_cot_zeroshot_web_of_lies
+  - bbh_cot_zeroshot_word_sorting
+aggregate_metric_list:
+  - metric: exact_match
+    aggregation: mean
+    weight_by_size: true
+    filter_list: flexible-extract
+metadata:
+  version: 2.0
lm_eval/tasks/bbh/cot_zeroshot/_cot_zeroshot_template_yaml
-group: bbh_cot_zeroshot
 dataset_path: lukaemon/bbh
 output_type: generate_until
 test_split: test
 ...
lm_eval/tasks/bbh/fewshot/_bbh_fewshot.yaml
0 → 100644
+group: bbh_fewshot
+task:
+  - bbh_fewshot_boolean_expressions
+  - bbh_fewshot_causal_judgement
+  - bbh_fewshot_date_understanding
+  - bbh_fewshot_disambiguation_qa
+  - bbh_fewshot_dyck_languages
+  - bbh_fewshot_formal_fallacies
+  - bbh_fewshot_geometric_shapes
+  - bbh_fewshot_hyperbaton
+  - bbh_fewshot_logical_deduction_five_objects
+  - bbh_fewshot_logical_deduction_seven_objects
+  - bbh_fewshot_logical_deduction_three_objects
+  - bbh_fewshot_movie_recommendation
+  - bbh_fewshot_multistep_arithmetic_two
+  - bbh_fewshot_navigate
+  - bbh_fewshot_object_counting
+  - bbh_fewshot_penguins_in_a_table
+  - bbh_fewshot_reasoning_about_colored_objects
+  - bbh_fewshot_ruin_names
+  - bbh_fewshot_salient_translation_error_detection
+  - bbh_fewshot_snarks
+  - bbh_fewshot_sports_understanding
+  - bbh_fewshot_temporal_sequences
+  - bbh_fewshot_tracking_shuffled_objects_five_objects
+  - bbh_fewshot_tracking_shuffled_objects_seven_objects
+  - bbh_fewshot_tracking_shuffled_objects_three_objects
+  - bbh_fewshot_web_of_lies
+  - bbh_fewshot_word_sorting
+aggregate_metric_list:
+  - metric: exact_match
+    aggregation: mean
+    weight_by_size: true
+metadata:
+  version: 2.0
lm_eval/tasks/bbh/fewshot/_fewshot_template_yaml
-group: bbh_fewshot
 dataset_path: lukaemon/bbh
 output_type: generate_until
 test_split: test
 ...
lm_eval/tasks/bbh/zeroshot/_bbh_zeroshot.yaml
0 → 100644
+group: bbh_zeroshot
+task:
+  - bbh_zeroshot_boolean_expressions
+  - bbh_zeroshot_causal_judgement
+  - bbh_zeroshot_date_understanding
+  - bbh_zeroshot_disambiguation_qa
+  - bbh_zeroshot_dyck_languages
+  - bbh_zeroshot_formal_fallacies
+  - bbh_zeroshot_geometric_shapes
+  - bbh_zeroshot_hyperbaton
+  - bbh_zeroshot_logical_deduction_five_objects
+  - bbh_zeroshot_logical_deduction_seven_objects
+  - bbh_zeroshot_logical_deduction_three_objects
+  - bbh_zeroshot_movie_recommendation
+  - bbh_zeroshot_multistep_arithmetic_two
+  - bbh_zeroshot_navigate
+  - bbh_zeroshot_object_counting
+  - bbh_zeroshot_penguins_in_a_table
+  - bbh_zeroshot_reasoning_about_colored_objects
+  - bbh_zeroshot_ruin_names
+  - bbh_zeroshot_salient_translation_error_detection
+  - bbh_zeroshot_snarks
+  - bbh_zeroshot_sports_understanding
+  - bbh_zeroshot_temporal_sequences
+  - bbh_zeroshot_tracking_shuffled_objects_five_objects
+  - bbh_zeroshot_tracking_shuffled_objects_seven_objects
+  - bbh_zeroshot_tracking_shuffled_objects_three_objects
+  - bbh_zeroshot_web_of_lies
+  - bbh_zeroshot_word_sorting
+aggregate_metric_list:
+  - metric: exact_match
+    aggregation: mean
+    weight_by_size: true
+    filter_list: flexible-extract
+metadata:
+  version: 2.0
lm_eval/tasks/bbh/zeroshot/_zeroshot_template_yaml
-group: bbh_zeroshot
 dataset_path: lukaemon/bbh
 output_type: generate_until
 test_split: test
 ...
lm_eval/tasks/belebele/_belebele.yaml
0 → 100644
+group: belebele
+task:
+  - belebele_acm_Arab
+  - belebele_arz_Arab
+  - belebele_ceb_Latn
+  - belebele_fin_Latn
+  - belebele_hin_Deva
+  - belebele_ita_Latn
+  - belebele_khm_Khmr
+  - belebele_lvs_Latn
+  - belebele_npi_Deva
+  - belebele_pol_Latn
+  - belebele_slv_Latn
+  - belebele_swe_Latn
+  - belebele_tso_Latn
+  - belebele_xho_Latn
+  - belebele_afr_Latn
+  - belebele_asm_Beng
+  - belebele_ces_Latn
+  - belebele_fra_Latn
+  - belebele_hin_Latn
+  - belebele_jav_Latn
+  - belebele_kin_Latn
+  - belebele_mal_Mlym
+  - belebele_npi_Latn
+  - belebele_por_Latn
+  - belebele_sna_Latn
+  - belebele_swh_Latn
+  - belebele_tur_Latn
+  - belebele_yor_Latn
+  - belebele_als_Latn
+  - belebele_azj_Latn
+  - belebele_ckb_Arab
+  - belebele_fuv_Latn
+  - belebele_hrv_Latn
+  - belebele_jpn_Jpan
+  - belebele_kir_Cyrl
+  - belebele_mar_Deva
+  - belebele_nso_Latn
+  - belebele_snd_Arab
+  - belebele_tam_Taml
+  - belebele_ukr_Cyrl
+  - belebele_zho_Hans
+  - belebele_amh_Ethi
+  - belebele_bam_Latn
+  - belebele_dan_Latn
+  - belebele_gaz_Latn
+  - belebele_hun_Latn
+  - belebele_kac_Latn
+  - belebele_kor_Hang
+  - belebele_mkd_Cyrl
+  - belebele_nya_Latn
+  - belebele_ron_Latn
+  - belebele_som_Latn
+  - belebele_tel_Telu
+  - belebele_urd_Arab
+  - belebele_zho_Hant
+  - belebele_apc_Arab
+  - belebele_ben_Beng
+  - belebele_deu_Latn
+  - belebele_grn_Latn
+  - belebele_hye_Armn
+  - belebele_kan_Knda
+  - belebele_lao_Laoo
+  - belebele_mlt_Latn
+  - belebele_ory_Orya
+  - belebele_rus_Cyrl
+  - belebele_sot_Latn
+  - belebele_tgk_Cyrl
+  - belebele_urd_Latn
+  - belebele_zsm_Latn
+  - belebele_arb_Arab
+  - belebele_ben_Latn
+  - belebele_ell_Grek
+  - belebele_guj_Gujr
+  - belebele_ibo_Latn
+  - belebele_kat_Geor
+  - belebele_lin_Latn
+  - belebele_mri_Latn
+  - belebele_pan_Guru
+  - belebele_shn_Mymr
+  - belebele_spa_Latn
+  - belebele_tgl_Latn
+  - belebele_uzn_Latn
+  - belebele_zul_Latn
+  - belebele_arb_Latn
+  - belebele_bod_Tibt
+  - belebele_eng_Latn
+  - belebele_hat_Latn
+  - belebele_ilo_Latn
+  - belebele_kaz_Cyrl
+  - belebele_lit_Latn
+  - belebele_mya_Mymr
+  - belebele_pbt_Arab
+  - belebele_sin_Latn
+  - belebele_srp_Cyrl
+  - belebele_tha_Thai
+  - belebele_vie_Latn
+  - belebele_ars_Arab
+  - belebele_bul_Cyrl
+  - belebele_est_Latn
+  - belebele_hau_Latn
+  - belebele_ind_Latn
+  - belebele_kea_Latn
+  - belebele_lug_Latn
+  - belebele_nld_Latn
+  - belebele_pes_Arab
+  - belebele_sin_Sinh
+  - belebele_ssw_Latn
+  - belebele_tir_Ethi
+  - belebele_war_Latn
+  - belebele_ary_Arab
+  - belebele_cat_Latn
+  - belebele_eus_Latn
+  - belebele_heb_Hebr
+  - belebele_isl_Latn
+  - belebele_khk_Cyrl
+  - belebele_luo_Latn
+  - belebele_nob_Latn
+  - belebele_plt_Latn
+  - belebele_slk_Latn
+  - belebele_sun_Latn
+  - belebele_tsn_Latn
+  - belebele_wol_Latn
+aggregate_metric_list:
+  - aggregation: mean
+    metric: acc
+    weight_by_size: true
+  - aggregation: mean
+    metric: acc_norm
+    weight_by_size: true
+metadata:
+  version: 0.0
lm_eval/tasks/belebele/_default_template_yaml
-group: belebele
 dataset_path: facebook/belebele
 fewshot_config:
   sampler: first_n
 ...
lm_eval/tasks/belebele/_generate_configs.py
@@ -65,3 +65,36 @@ if __name__ == "__main__":
                 allow_unicode=True,
                 default_style='"',
             )
+    # write group config out
+    group_yaml_dict = {
+        "group": f"belebele_{args.task_prefix}"
+        if args.task_prefix != ""
+        else "belebele",
+        "task": [
+            (
+                f"belebele_{args.task_prefix}_{lang}"
+                if args.task_prefix != ""
+                else f"belebele_{lang}"
+            )
+            for lang in languages
+            if "default" not in lang
+        ],
+        "aggregate_metric_list": [
+            {"metric": "acc", "aggregation": "mean", "weight_by_size": False},
+            {"metric": "acc_norm", "aggregation": "mean", "weight_by_size": False},
+        ],
+        "metadata": {"version": 0.0},
+    }
+    file_save_path = "_" + args.save_prefix_path + f"{args.task_prefix}.yaml"
+    with open(file_save_path, "w", encoding="utf-8") as group_yaml_file:
+        yaml.dump(
+            group_yaml_dict,
+            group_yaml_file,
+            width=float("inf"),
+            allow_unicode=True,
+            default_style='"',
+        )
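The dict construction added in this hunk can be read more easily as a standalone function. The sketch below mirrors that logic under a few assumptions: `task_prefix` and `languages` stand in for `args.task_prefix` and the script's `languages` list, the function name `build_group_config` is hypothetical, and the `"cot"` prefix in the usage lines is a made-up value. The YAML dump step is left out so the example needs no third-party package.

```python
def build_group_config(task_prefix, languages):
    """Build the group config dict the way the added hunk does.

    Hypothetical standalone version; the script itself builds the dict inline.
    """
    # An empty prefix yields the plain `belebele` group name.
    name = f"belebele_{task_prefix}" if task_prefix != "" else "belebele"
    return {
        "group": name,
        # Entries containing "default" are skipped, as in the diff.
        "task": [f"{name}_{lang}" for lang in languages if "default" not in lang],
        "aggregate_metric_list": [
            {"metric": "acc", "aggregation": "mean", "weight_by_size": False},
            {"metric": "acc_norm", "aggregation": "mean", "weight_by_size": False},
        ],
        "metadata": {"version": 0.0},
    }


# Usage with illustrative inputs.
config = build_group_config("", ["eng_Latn", "default", "fra_Latn"])
prefixed = build_group_config("cot", ["eng_Latn"])
```

Here `config["task"]` holds `belebele_eng_Latn` and `belebele_fra_Latn`, and `prefixed["task"]` holds `belebele_cot_eng_Latn`, matching the two f-string branches in the hunk.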