gaoqiong / lm-evaluation-harness
Commit 7d09b24c, authored Jul 03, 2024 by haileyschoelkopf
fix alllll the merge conflicts
Parents: 96dfe976, 6348b947
Changes: 395. Showing 20 changed files with 371 additions and 17 deletions.
Changed files on this page:
lm_eval/tasks/arc/README.md +5 -1
lm_eval/tasks/arc/arc_easy.yaml +1 -1
lm_eval/tasks/arithmetic/README.md +2 -2
lm_eval/tasks/arithmetic/arithmetic_1dc.yaml +1 -1
lm_eval/tasks/asdiv/README.md +1 -1
lm_eval/tasks/babi/README.md +5 -1
lm_eval/tasks/basqueglue/README.md +6 -2
lm_eval/tasks/bbh/README.md +5 -1
lm_eval/tasks/bbh/cot_fewshot/_bbh.yaml +36 -0
lm_eval/tasks/bbh/cot_fewshot/_bbh_cot_fewshot.yaml +36 -0
lm_eval/tasks/bbh/cot_fewshot/_cot_fewshot_template_yaml +0 -3
lm_eval/tasks/bbh/cot_zeroshot/_bbh_cot_zeroshot.yaml +36 -0
lm_eval/tasks/bbh/cot_zeroshot/_cot_zeroshot_template_yaml +0 -1
lm_eval/tasks/bbh/fewshot/_bbh_fewshot.yaml +35 -0
lm_eval/tasks/bbh/fewshot/_fewshot_template_yaml +0 -1
lm_eval/tasks/bbh/zeroshot/_bbh_zeroshot.yaml +36 -0
lm_eval/tasks/bbh/zeroshot/_zeroshot_template_yaml +0 -1
lm_eval/tasks/belebele/_belebele.yaml +133 -0
lm_eval/tasks/belebele/_default_template_yaml +0 -1
lm_eval/tasks/belebele/_generate_configs.py +33 -0
lm_eval/tasks/arc/README.md
@@ -29,10 +29,14 @@ Homepage: https://allenai.org/data/arc
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
+None.
+#### Tags
 * `ai2_arc`: Evaluates `arc_easy` and `arc_challenge`
 #### Tasks
...
lm_eval/tasks/arc/arc_easy.yaml
-group:
+tag:
   - ai2_arc
 task: arc_easy
 dataset_path: allenai/ai2_arc
...
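The `arc_easy.yaml` change above renames the `group:` key to `tag:` and leaves its value untouched. A minimal sketch of that rename on an already-parsed config dict follows. `group_to_tag` is a hypothetical helper written for illustration, not part of lm-evaluation-harness. It covers only the rename case; the BBH and Belebele templates later in this diff drop their `group:` key entirely.

```python
def group_to_tag(config: dict) -> dict:
    """Return a copy of a task config with a legacy `group` key renamed to `tag`.

    Hypothetical helper for illustration; not part of lm-evaluation-harness.
    """
    migrated = dict(config)  # shallow copy, the input dict is left unchanged
    if "group" not in migrated:
        return migrated
    legacy = migrated.pop("group")
    # A config may hold a bare string or a list; normalise both to a list.
    legacy = [legacy] if isinstance(legacy, str) else list(legacy)
    existing = migrated.get("tag", [])
    existing = [existing] if isinstance(existing, str) else list(existing)
    # Merge with any tags already present, keeping order and dropping duplicates.
    migrated["tag"] = list(dict.fromkeys(existing + legacy))
    return migrated


# Values copied from the arc_easy.yaml hunk above.
old_config = {
    "group": ["ai2_arc"],
    "task": "arc_easy",
    "dataset_path": "allenai/ai2_arc",
}
new_config = group_to_tag(old_config)
print(new_config)
```

Returning a copy instead of mutating in place keeps the old config available for comparison, which helps when checking a bulk migration.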
lm_eval/tasks/arithmetic/README.md
@@ -27,9 +27,9 @@ Homepage: https://github.com/openai/gpt-3/tree/master/data
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
-#### Groups
+#### Tags
 * `arithmetic`: Evaluates `1dc` to `5ds`
...
lm_eval/tasks/arithmetic/arithmetic_1dc.yaml
-group:
+tag:
   - arithmetic
 task: arithmetic_1dc
 dataset_path: EleutherAI/arithmetic
...
lm_eval/tasks/asdiv/README.md
@@ -32,7 +32,7 @@ Homepage: https://github.com/chaochun/nlu-asdiv-dataset
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
...
lm_eval/tasks/babi/README.md
@@ -21,12 +21,16 @@ Homepage: https://github.com/facebookarchive/bAbI-tasks
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
 * Not part of a group yet
+#### Tags
+* No tags applied.
 #### Tasks
 * `babi`
...
lm_eval/tasks/basqueglue/README.md
@@ -43,11 +43,15 @@ Homepage: `https://github.com/hitz-zentroa/latxa`
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
-* `basque-glue`: First version of the implementation
+None.
+#### Tags
+* `basque-glue`: First version of the implementation. Calls all subtasks, but does not average.
 #### Tasks
...
lm_eval/tasks/bbh/README.md
@@ -21,15 +21,19 @@ Homepage: https://github.com/suzgunmirac/BIG-Bench-Hard
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
 - `bbh`: is the same as `bbh_cot_fewshot`.
 - `bbh_zeroshot`
 - `bbh_fewshot`
 - `bbh_cot_fewshot`
 - `bbh_cot_zeroshot`
+#### Tags
+None.
 #### Tasks
...
lm_eval/tasks/bbh/cot_fewshot/_bbh.yaml (new file, mode 100644)
group: bbh
task:
  - bbh_cot_fewshot_boolean_expressions
  - bbh_cot_fewshot_causal_judgement
  - bbh_cot_fewshot_date_understanding
  - bbh_cot_fewshot_disambiguation_qa
  - bbh_cot_fewshot_dyck_languages
  - bbh_cot_fewshot_formal_fallacies
  - bbh_cot_fewshot_geometric_shapes
  - bbh_cot_fewshot_hyperbaton
  - bbh_cot_fewshot_logical_deduction_five_objects
  - bbh_cot_fewshot_logical_deduction_seven_objects
  - bbh_cot_fewshot_logical_deduction_three_objects
  - bbh_cot_fewshot_movie_recommendation
  - bbh_cot_fewshot_multistep_arithmetic_two
  - bbh_cot_fewshot_navigate
  - bbh_cot_fewshot_object_counting
  - bbh_cot_fewshot_penguins_in_a_table
  - bbh_cot_fewshot_reasoning_about_colored_objects
  - bbh_cot_fewshot_ruin_names
  - bbh_cot_fewshot_salient_translation_error_detection
  - bbh_cot_fewshot_snarks
  - bbh_cot_fewshot_sports_understanding
  - bbh_cot_fewshot_temporal_sequences
  - bbh_cot_fewshot_tracking_shuffled_objects_five_objects
  - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects
  - bbh_cot_fewshot_tracking_shuffled_objects_three_objects
  - bbh_cot_fewshot_web_of_lies
  - bbh_cot_fewshot_word_sorting
aggregate_metric_list:
  - metric: exact_match
    aggregation: mean
    weight_by_size: true
    filter_list: get-answer
metadata:
  version: 2.0
lm_eval/tasks/bbh/cot_fewshot/_bbh_cot_fewshot.yaml (new file, mode 100644)
group: bbh_cot_fewshot
task:
  - bbh_cot_fewshot_boolean_expressions
  - bbh_cot_fewshot_causal_judgement
  - bbh_cot_fewshot_date_understanding
  - bbh_cot_fewshot_disambiguation_qa
  - bbh_cot_fewshot_dyck_languages
  - bbh_cot_fewshot_formal_fallacies
  - bbh_cot_fewshot_geometric_shapes
  - bbh_cot_fewshot_hyperbaton
  - bbh_cot_fewshot_logical_deduction_five_objects
  - bbh_cot_fewshot_logical_deduction_seven_objects
  - bbh_cot_fewshot_logical_deduction_three_objects
  - bbh_cot_fewshot_movie_recommendation
  - bbh_cot_fewshot_multistep_arithmetic_two
  - bbh_cot_fewshot_navigate
  - bbh_cot_fewshot_object_counting
  - bbh_cot_fewshot_penguins_in_a_table
  - bbh_cot_fewshot_reasoning_about_colored_objects
  - bbh_cot_fewshot_ruin_names
  - bbh_cot_fewshot_salient_translation_error_detection
  - bbh_cot_fewshot_snarks
  - bbh_cot_fewshot_sports_understanding
  - bbh_cot_fewshot_temporal_sequences
  - bbh_cot_fewshot_tracking_shuffled_objects_five_objects
  - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects
  - bbh_cot_fewshot_tracking_shuffled_objects_three_objects
  - bbh_cot_fewshot_web_of_lies
  - bbh_cot_fewshot_word_sorting
aggregate_metric_list:
  - metric: exact_match
    aggregation: mean
    weight_by_size: true
    filter_list: get-answer
metadata:
  version: 2.0
lm_eval/tasks/bbh/cot_fewshot/_cot_fewshot_template_yaml
-group:
-  - bbh
-  - bbh_cot_fewshot
 dataset_path: lukaemon/bbh
 output_type: generate_until
 test_split: test
...
lm_eval/tasks/bbh/cot_zeroshot/_bbh_cot_zeroshot.yaml (new file, mode 100644)
group: bbh_cot_zeroshot
task:
  - bbh_cot_zeroshot_boolean_expressions
  - bbh_cot_zeroshot_causal_judgement
  - bbh_cot_zeroshot_date_understanding
  - bbh_cot_zeroshot_disambiguation_qa
  - bbh_cot_zeroshot_dyck_languages
  - bbh_cot_zeroshot_formal_fallacies
  - bbh_cot_zeroshot_geometric_shapes
  - bbh_cot_zeroshot_hyperbaton
  - bbh_cot_zeroshot_logical_deduction_five_objects
  - bbh_cot_zeroshot_logical_deduction_seven_objects
  - bbh_cot_zeroshot_logical_deduction_three_objects
  - bbh_cot_zeroshot_movie_recommendation
  - bbh_cot_zeroshot_multistep_arithmetic_two
  - bbh_cot_zeroshot_navigate
  - bbh_cot_zeroshot_object_counting
  - bbh_cot_zeroshot_penguins_in_a_table
  - bbh_cot_zeroshot_reasoning_about_colored_objects
  - bbh_cot_zeroshot_ruin_names
  - bbh_cot_zeroshot_salient_translation_error_detection
  - bbh_cot_zeroshot_snarks
  - bbh_cot_zeroshot_sports_understanding
  - bbh_cot_zeroshot_temporal_sequences
  - bbh_cot_zeroshot_tracking_shuffled_objects_five_objects
  - bbh_cot_zeroshot_tracking_shuffled_objects_seven_objects
  - bbh_cot_zeroshot_tracking_shuffled_objects_three_objects
  - bbh_cot_zeroshot_web_of_lies
  - bbh_cot_zeroshot_word_sorting
aggregate_metric_list:
  - metric: exact_match
    aggregation: mean
    weight_by_size: true
    filter_list: flexible-extract
metadata:
  version: 2.0
lm_eval/tasks/bbh/cot_zeroshot/_cot_zeroshot_template_yaml
-group: bbh_cot_zeroshot
 dataset_path: lukaemon/bbh
 output_type: generate_until
 test_split: test
...
lm_eval/tasks/bbh/fewshot/_bbh_fewshot.yaml (new file, mode 100644)
group: bbh_fewshot
task:
  - bbh_fewshot_boolean_expressions
  - bbh_fewshot_causal_judgement
  - bbh_fewshot_date_understanding
  - bbh_fewshot_disambiguation_qa
  - bbh_fewshot_dyck_languages
  - bbh_fewshot_formal_fallacies
  - bbh_fewshot_geometric_shapes
  - bbh_fewshot_hyperbaton
  - bbh_fewshot_logical_deduction_five_objects
  - bbh_fewshot_logical_deduction_seven_objects
  - bbh_fewshot_logical_deduction_three_objects
  - bbh_fewshot_movie_recommendation
  - bbh_fewshot_multistep_arithmetic_two
  - bbh_fewshot_navigate
  - bbh_fewshot_object_counting
  - bbh_fewshot_penguins_in_a_table
  - bbh_fewshot_reasoning_about_colored_objects
  - bbh_fewshot_ruin_names
  - bbh_fewshot_salient_translation_error_detection
  - bbh_fewshot_snarks
  - bbh_fewshot_sports_understanding
  - bbh_fewshot_temporal_sequences
  - bbh_fewshot_tracking_shuffled_objects_five_objects
  - bbh_fewshot_tracking_shuffled_objects_seven_objects
  - bbh_fewshot_tracking_shuffled_objects_three_objects
  - bbh_fewshot_web_of_lies
  - bbh_fewshot_word_sorting
aggregate_metric_list:
  - metric: exact_match
    aggregation: mean
    weight_by_size: true
metadata:
  version: 2.0
lm_eval/tasks/bbh/fewshot/_fewshot_template_yaml
-group: bbh_fewshot
 dataset_path: lukaemon/bbh
 output_type: generate_until
 test_split: test
...
lm_eval/tasks/bbh/zeroshot/_bbh_zeroshot.yaml (new file, mode 100644)
group: bbh_zeroshot
task:
  - bbh_zeroshot_boolean_expressions
  - bbh_zeroshot_causal_judgement
  - bbh_zeroshot_date_understanding
  - bbh_zeroshot_disambiguation_qa
  - bbh_zeroshot_dyck_languages
  - bbh_zeroshot_formal_fallacies
  - bbh_zeroshot_geometric_shapes
  - bbh_zeroshot_hyperbaton
  - bbh_zeroshot_logical_deduction_five_objects
  - bbh_zeroshot_logical_deduction_seven_objects
  - bbh_zeroshot_logical_deduction_three_objects
  - bbh_zeroshot_movie_recommendation
  - bbh_zeroshot_multistep_arithmetic_two
  - bbh_zeroshot_navigate
  - bbh_zeroshot_object_counting
  - bbh_zeroshot_penguins_in_a_table
  - bbh_zeroshot_reasoning_about_colored_objects
  - bbh_zeroshot_ruin_names
  - bbh_zeroshot_salient_translation_error_detection
  - bbh_zeroshot_snarks
  - bbh_zeroshot_sports_understanding
  - bbh_zeroshot_temporal_sequences
  - bbh_zeroshot_tracking_shuffled_objects_five_objects
  - bbh_zeroshot_tracking_shuffled_objects_seven_objects
  - bbh_zeroshot_tracking_shuffled_objects_three_objects
  - bbh_zeroshot_web_of_lies
  - bbh_zeroshot_word_sorting
aggregate_metric_list:
  - metric: exact_match
    aggregation: mean
    weight_by_size: true
    filter_list: flexible-extract
metadata:
  version: 2.0
lm_eval/tasks/bbh/zeroshot/_zeroshot_template_yaml
-group: bbh_zeroshot
 dataset_path: lukaemon/bbh
 output_type: generate_until
 test_split: test
...
lm_eval/tasks/belebele/_belebele.yaml (new file, mode 100644)
group: belebele
task:
  - belebele_acm_Arab
  - belebele_arz_Arab
  - belebele_ceb_Latn
  - belebele_fin_Latn
  - belebele_hin_Deva
  - belebele_ita_Latn
  - belebele_khm_Khmr
  - belebele_lvs_Latn
  - belebele_npi_Deva
  - belebele_pol_Latn
  - belebele_slv_Latn
  - belebele_swe_Latn
  - belebele_tso_Latn
  - belebele_xho_Latn
  - belebele_afr_Latn
  - belebele_asm_Beng
  - belebele_ces_Latn
  - belebele_fra_Latn
  - belebele_hin_Latn
  - belebele_jav_Latn
  - belebele_kin_Latn
  - belebele_mal_Mlym
  - belebele_npi_Latn
  - belebele_por_Latn
  - belebele_sna_Latn
  - belebele_swh_Latn
  - belebele_tur_Latn
  - belebele_yor_Latn
  - belebele_als_Latn
  - belebele_azj_Latn
  - belebele_ckb_Arab
  - belebele_fuv_Latn
  - belebele_hrv_Latn
  - belebele_jpn_Jpan
  - belebele_kir_Cyrl
  - belebele_mar_Deva
  - belebele_nso_Latn
  - belebele_snd_Arab
  - belebele_tam_Taml
  - belebele_ukr_Cyrl
  - belebele_zho_Hans
  - belebele_amh_Ethi
  - belebele_bam_Latn
  - belebele_dan_Latn
  - belebele_gaz_Latn
  - belebele_hun_Latn
  - belebele_kac_Latn
  - belebele_kor_Hang
  - belebele_mkd_Cyrl
  - belebele_nya_Latn
  - belebele_ron_Latn
  - belebele_som_Latn
  - belebele_tel_Telu
  - belebele_urd_Arab
  - belebele_zho_Hant
  - belebele_apc_Arab
  - belebele_ben_Beng
  - belebele_deu_Latn
  - belebele_grn_Latn
  - belebele_hye_Armn
  - belebele_kan_Knda
  - belebele_lao_Laoo
  - belebele_mlt_Latn
  - belebele_ory_Orya
  - belebele_rus_Cyrl
  - belebele_sot_Latn
  - belebele_tgk_Cyrl
  - belebele_urd_Latn
  - belebele_zsm_Latn
  - belebele_arb_Arab
  - belebele_ben_Latn
  - belebele_ell_Grek
  - belebele_guj_Gujr
  - belebele_ibo_Latn
  - belebele_kat_Geor
  - belebele_lin_Latn
  - belebele_mri_Latn
  - belebele_pan_Guru
  - belebele_shn_Mymr
  - belebele_spa_Latn
  - belebele_tgl_Latn
  - belebele_uzn_Latn
  - belebele_zul_Latn
  - belebele_arb_Latn
  - belebele_bod_Tibt
  - belebele_eng_Latn
  - belebele_hat_Latn
  - belebele_ilo_Latn
  - belebele_kaz_Cyrl
  - belebele_lit_Latn
  - belebele_mya_Mymr
  - belebele_pbt_Arab
  - belebele_sin_Latn
  - belebele_srp_Cyrl
  - belebele_tha_Thai
  - belebele_vie_Latn
  - belebele_ars_Arab
  - belebele_bul_Cyrl
  - belebele_est_Latn
  - belebele_hau_Latn
  - belebele_ind_Latn
  - belebele_kea_Latn
  - belebele_lug_Latn
  - belebele_nld_Latn
  - belebele_pes_Arab
  - belebele_sin_Sinh
  - belebele_ssw_Latn
  - belebele_tir_Ethi
  - belebele_war_Latn
  - belebele_ary_Arab
  - belebele_cat_Latn
  - belebele_eus_Latn
  - belebele_heb_Hebr
  - belebele_isl_Latn
  - belebele_khk_Cyrl
  - belebele_luo_Latn
  - belebele_nob_Latn
  - belebele_plt_Latn
  - belebele_slk_Latn
  - belebele_sun_Latn
  - belebele_tsn_Latn
  - belebele_wol_Latn
aggregate_metric_list:
  - aggregation: mean
    metric: acc
    weight_by_size: true
  - aggregation: mean
    metric: acc_norm
    weight_by_size: true
metadata:
  version: 0.0
lm_eval/tasks/belebele/_default_template_yaml
-group: belebele
 dataset_path: facebook/belebele
 fewshot_config:
   sampler: first_n
...
lm_eval/tasks/belebele/_generate_configs.py
@@ -65,3 +65,36 @@ if __name__ == "__main__":
             allow_unicode=True,
             default_style='"',
         )
+
+    # write group config out
+
+    group_yaml_dict = {
+        "group": f"belebele_{args.task_prefix}"
+        if args.task_prefix != ""
+        else "belebele",
+        "task": [
+            (
+                f"belebele_{args.task_prefix}_{lang}"
+                if args.task_prefix != ""
+                else f"belebele_{lang}"
+            )
+            for lang in languages
+            if "default" not in lang
+        ],
+        "aggregate_metric_list": [
+            {"metric": "acc", "aggregation": "mean", "weight_by_size": False},
+            {"metric": "acc_norm", "aggregation": "mean", "weight_by_size": False},
+        ],
+        "metadata": {"version": 0.0},
+    }
+
+    file_save_path = "_" + args.save_prefix_path + f"{args.task_prefix}.yaml"
+
+    with open(file_save_path, "w", encoding="utf-8") as group_yaml_file:
+        yaml.dump(
+            group_yaml_dict,
+            group_yaml_file,
+            width=float("inf"),
+            allow_unicode=True,
+            default_style='"',
+        )
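The block added to `_generate_configs.py` derives the group name and its task list from `args.task_prefix`. A small sketch of only that naming rule, without the YAML dump, is shown below. `group_and_tasks`, the sample language list and the `"cot"` prefix are assumptions made for illustration and do not come from the script.

```python
def group_and_tasks(languages, task_prefix=""):
    """Mirror the naming rule used in the added block.

    With a non-empty prefix, names look like `belebele_<prefix>_<lang>`;
    otherwise `belebele_<lang>`. Entries containing "default" are skipped.
    """
    base = f"belebele_{task_prefix}" if task_prefix != "" else "belebele"
    tasks = [f"{base}_{lang}" for lang in languages if "default" not in lang]
    return base, tasks


# Sample input for illustration; the real script builds `languages` elsewhere.
sample = ["default", "eng_Latn", "fra_Latn"]
print(group_and_tasks(sample))
print(group_and_tasks(sample, task_prefix="cot"))
```

Building the task names from the same base string as the group name keeps the two in step, which is the property the conditional expressions in the script maintain by hand.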