gaoqiong / lm-evaluation-harness · Commit 7d09b24c

Commit 7d09b24c, authored Jul 03, 2024 by haileyschoelkopf
Parents: 96dfe976, 6348b947

    fix alllll the merge conflicts
Changes: 395 files. Showing 20 changed files with 387 additions and 70 deletions (+387 −70).
Files shown on this page:

- lm_eval/tasks/benchmarks/flan/flan_held_in.yaml (+67 −67)
- lm_eval/tasks/benchmarks/minerva_math.yaml (+6 −0)
- lm_eval/tasks/benchmarks/multimedqa/multimedqa.yaml (+4 −0)
- lm_eval/tasks/blimp/_blimp.yaml (+75 −0)
- lm_eval/tasks/blimp/_template_yaml (+0 −1)
- lm_eval/tasks/ceval/_ceval-valid.yaml (+63 −0)
- lm_eval/tasks/ceval/_default_ceval_yaml (+0 −1)
- lm_eval/tasks/ceval/_generate_configs.py (+24 −1)
- lm_eval/tasks/cmmlu/_cmmlu.yaml (+78 −0)
- lm_eval/tasks/cmmlu/_generate_configs.py (+30 −0)
- lm_eval/tasks/cmmlu/cmmlu_agronomy.yaml (+4 −0)
- lm_eval/tasks/cmmlu/cmmlu_anatomy.yaml (+4 −0)
- lm_eval/tasks/cmmlu/cmmlu_ancient_chinese.yaml (+4 −0)
- lm_eval/tasks/cmmlu/cmmlu_arts.yaml (+4 −0)
- lm_eval/tasks/cmmlu/cmmlu_astronomy.yaml (+4 −0)
- lm_eval/tasks/cmmlu/cmmlu_business_ethics.yaml (+4 −0)
- lm_eval/tasks/cmmlu/cmmlu_chinese_civil_service_exam.yaml (+4 −0)
- lm_eval/tasks/cmmlu/cmmlu_chinese_driving_rule.yaml (+4 −0)
- lm_eval/tasks/cmmlu/cmmlu_chinese_food_culture.yaml (+4 −0)
- lm_eval/tasks/cmmlu/cmmlu_chinese_foreign_policy.yaml (+4 −0)
lm_eval/tasks/benchmarks/flan/flan_held_in.yaml

(diff collapsed; not shown)
lm_eval/tasks/benchmarks/minerva_math.yaml

@@ -7,3 +7,9 @@ task:
   - minerva_math_num_theory
   - minerva_math_prealgebra
   - minerva_math_precalc
+aggregate_metric_list:
+  - metric: exact_match
+    aggregation: mean
+    weight_by_size: true
+metadata:
+  version: 1.0
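The `weight_by_size: true` setting above requests a document-count-weighted mean over the subtasks, rather than a plain average of per-subtask scores. A minimal sketch of that aggregation (illustrative numbers; not the harness's actual aggregation code):

```python
def weighted_mean(scores, sizes):
    """Aggregate per-subtask scores, weighting each score by its subtask's
    number of documents (what `weight_by_size: true` asks for)."""
    total_docs = sum(sizes)
    return sum(score * n for score, n in zip(scores, sizes)) / total_docs

# Two hypothetical subtasks: exact_match 0.5 over 100 docs, 0.8 over 300 docs.
print(weighted_mean([0.5, 0.8], [100, 300]))  # 0.725, not the plain mean 0.65
```

With `weight_by_size: false` (as in the BLiMP group below), each subtask instead contributes equally regardless of its size.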
lm_eval/tasks/benchmarks/multimedqa/multimedqa.yaml

@@ -15,3 +15,7 @@ task:
     task_alias: "professional_medicine (mmlu)"
   - task: mmlu_college_biology
     task_alias: "college_biology (mmlu)"
+aggregate_metric_list:
+  - metric: acc
+    aggregation: mean
+    weight_by_size: True
lm_eval/tasks/blimp/_blimp.yaml (new file, mode 100644)

group: blimp
task:
  - "blimp_adjunct_island"
  - "blimp_anaphor_gender_agreement"
  - "blimp_anaphor_number_agreement"
  - "blimp_animate_subject_passive"
  - "blimp_animate_subject_trans"
  - "blimp_causative"
  - "blimp_complex_NP_island"
  - "blimp_coordinate_structure_constraint_complex_left_branch"
  - "blimp_coordinate_structure_constraint_object_extraction"
  - "blimp_determiner_noun_agreement_1"
  - "blimp_determiner_noun_agreement_2"
  - "blimp_determiner_noun_agreement_irregular_1"
  - "blimp_determiner_noun_agreement_irregular_2"
  - "blimp_determiner_noun_agreement_with_adj_2"
  - "blimp_determiner_noun_agreement_with_adj_irregular_1"
  - "blimp_determiner_noun_agreement_with_adj_irregular_2"
  - "blimp_determiner_noun_agreement_with_adjective_1"
  - "blimp_distractor_agreement_relational_noun"
  - "blimp_distractor_agreement_relative_clause"
  - "blimp_drop_argument"
  - "blimp_ellipsis_n_bar_1"
  - "blimp_ellipsis_n_bar_2"
  - "blimp_existential_there_object_raising"
  - "blimp_existential_there_quantifiers_1"
  - "blimp_existential_there_quantifiers_2"
  - "blimp_existential_there_subject_raising"
  - "blimp_expletive_it_object_raising"
  - "blimp_inchoative"
  - "blimp_intransitive"
  - "blimp_irregular_past_participle_adjectives"
  - "blimp_irregular_past_participle_verbs"
  - "blimp_irregular_plural_subject_verb_agreement_1"
  - "blimp_irregular_plural_subject_verb_agreement_2"
  - "blimp_left_branch_island_echo_question"
  - "blimp_left_branch_island_simple_question"
  - "blimp_matrix_question_npi_licensor_present"
  - "blimp_npi_present_1"
  - "blimp_npi_present_2"
  - "blimp_only_npi_licensor_present"
  - "blimp_only_npi_scope"
  - "blimp_passive_1"
  - "blimp_passive_2"
  - "blimp_principle_A_c_command"
  - "blimp_principle_A_case_1"
  - "blimp_principle_A_case_2"
  - "blimp_principle_A_domain_1"
  - "blimp_principle_A_domain_2"
  - "blimp_principle_A_domain_3"
  - "blimp_principle_A_reconstruction"
  - "blimp_regular_plural_subject_verb_agreement_1"
  - "blimp_regular_plural_subject_verb_agreement_2"
  - "blimp_sentential_negation_npi_licensor_present"
  - "blimp_sentential_negation_npi_scope"
  - "blimp_sentential_subject_island"
  - "blimp_superlative_quantifiers_1"
  - "blimp_superlative_quantifiers_2"
  - "blimp_tough_vs_raising_1"
  - "blimp_tough_vs_raising_2"
  - "blimp_transitive"
  - "blimp_wh_island"
  - "blimp_wh_questions_object_gap"
  - "blimp_wh_questions_subject_gap"
  - "blimp_wh_questions_subject_gap_long_distance"
  - "blimp_wh_vs_that_no_gap"
  - "blimp_wh_vs_that_no_gap_long_distance"
  - "blimp_wh_vs_that_with_gap"
  - "blimp_wh_vs_that_with_gap_long_distance"
aggregate_metric_list:
  - metric: acc
    aggregation: mean
    weight_by_size: False
metadata:
  version: 2.0
lm_eval/tasks/blimp/_template_yaml

-group: blimp
 dataset_path: blimp
 output_type: multiple_choice
 validation_split: train
 ...
lm_eval/tasks/ceval/_ceval-valid.yaml (new file, mode 100644)

aggregate_metric_list:
  - aggregation: mean
    metric: acc
    weight_by_size: true
  - aggregation: mean
    metric: acc_norm
    weight_by_size: true
group: ceval-valid
metadata:
  version: 1.0
task:
  - ceval-valid_computer_network
  - ceval-valid_operating_system
  - ceval-valid_computer_architecture
  - ceval-valid_college_programming
  - ceval-valid_college_physics
  - ceval-valid_college_chemistry
  - ceval-valid_advanced_mathematics
  - ceval-valid_probability_and_statistics
  - ceval-valid_discrete_mathematics
  - ceval-valid_electrical_engineer
  - ceval-valid_metrology_engineer
  - ceval-valid_high_school_mathematics
  - ceval-valid_high_school_physics
  - ceval-valid_high_school_chemistry
  - ceval-valid_high_school_biology
  - ceval-valid_middle_school_mathematics
  - ceval-valid_middle_school_biology
  - ceval-valid_middle_school_physics
  - ceval-valid_middle_school_chemistry
  - ceval-valid_veterinary_medicine
  - ceval-valid_college_economics
  - ceval-valid_business_administration
  - ceval-valid_marxism
  - ceval-valid_mao_zedong_thought
  - ceval-valid_education_science
  - ceval-valid_teacher_qualification
  - ceval-valid_high_school_politics
  - ceval-valid_high_school_geography
  - ceval-valid_middle_school_politics
  - ceval-valid_middle_school_geography
  - ceval-valid_modern_chinese_history
  - ceval-valid_ideological_and_moral_cultivation
  - ceval-valid_logic
  - ceval-valid_law
  - ceval-valid_chinese_language_and_literature
  - ceval-valid_art_studies
  - ceval-valid_professional_tour_guide
  - ceval-valid_legal_professional
  - ceval-valid_high_school_chinese
  - ceval-valid_high_school_history
  - ceval-valid_middle_school_history
  - ceval-valid_civil_servant
  - ceval-valid_sports_science
  - ceval-valid_plant_protection
  - ceval-valid_basic_medicine
  - ceval-valid_clinical_medicine
  - ceval-valid_urban_and_rural_planner
  - ceval-valid_accountant
  - ceval-valid_fire_engineer
  - ceval-valid_environmental_impact_assessment_engineer
  - ceval-valid_tax_accountant
  - ceval-valid_physician
lm_eval/tasks/ceval/_default_ceval_yaml

-group: ceval-valid
 dataset_path: ceval/ceval-exam
 validation_split: val
 fewshot_split: dev
 ...
lm_eval/tasks/ceval/_generate_configs.py

@@ -8,7 +8,7 @@ import os
 import yaml
 from tqdm import tqdm

-from lm_eval.logger import eval_logger
+from lm_eval.utils import eval_logger

 SUBJECTS = {

@@ -117,3 +117,26 @@ if __name__ == "__main__":
             allow_unicode=True,
             default_style='"',
         )
+
+    # write group config out
+    group_yaml_dict = {
+        "group": "ceval-valid",
+        "task": [f"ceval-valid_{task_name}" for task_name in SUBJECTS.keys()],
+        "aggregate_metric_list": [
+            {"metric": "acc", "aggregation": "mean", "weight_by_size": True},
+            {"metric": "acc_norm", "aggregation": "mean", "weight_by_size": True},
+        ],
+        "metadata": {"version": 1.0},
+    }
+
+    file_save_path = "_" + args.save_prefix_path + ".yaml"
+
+    with open(file_save_path, "w", encoding="utf-8") as group_yaml_file:
+        yaml.dump(
+            group_yaml_dict,
+            group_yaml_file,
+            width=float("inf"),
+            allow_unicode=True,
+            default_style='"',
+        )
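The added block builds the group config as a plain dict and serializes it with `yaml.dump`. A trimmed, self-contained sketch of that dump step (two illustrative subjects, not the script's full SUBJECTS mapping): `width=float("inf")` prevents line wrapping, `allow_unicode=True` keeps non-ASCII text readable, and `default_style='"'` double-quotes every scalar in the output.

```python
import yaml

# Illustrative subset of subjects; the real script derives these from SUBJECTS.
subjects = ["computer_network", "operating_system"]

group_yaml_dict = {
    "group": "ceval-valid",
    "task": [f"ceval-valid_{s}" for s in subjects],
}

text = yaml.dump(
    group_yaml_dict,
    width=float("inf"),   # never wrap long task lines
    allow_unicode=True,   # emit non-ASCII characters as-is
    default_style='"',    # double-quote every scalar
)
# Every key and value comes out quoted, e.g.: - "ceval-valid_computer_network"
assert yaml.safe_load(text)["group"] == "ceval-valid"
```

The quoting style matches the per-subject YAML files later in this commit, whose keys and values are all double-quoted.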
lm_eval/tasks/cmmlu/_cmmlu.yaml (new file, mode 100644)

aggregate_metric_list:
  - aggregation: mean
    metric: acc
    weight_by_size: true
  - aggregation: mean
    metric: acc_norm
    weight_by_size: true
group: cmmlu
metadata:
  version: 0.0
task:
  - cmmlu_agronomy
  - cmmlu_anatomy
  - cmmlu_ancient_chinese
  - cmmlu_arts
  - cmmlu_astronomy
  - cmmlu_business_ethics
  - cmmlu_chinese_civil_service_exam
  - cmmlu_chinese_driving_rule
  - cmmlu_chinese_food_culture
  - cmmlu_chinese_foreign_policy
  - cmmlu_chinese_history
  - cmmlu_chinese_literature
  - cmmlu_chinese_teacher_qualification
  - cmmlu_clinical_knowledge
  - cmmlu_college_actuarial_science
  - cmmlu_college_education
  - cmmlu_college_engineering_hydrology
  - cmmlu_college_law
  - cmmlu_college_mathematics
  - cmmlu_college_medical_statistics
  - cmmlu_college_medicine
  - cmmlu_computer_science
  - cmmlu_computer_security
  - cmmlu_conceptual_physics
  - cmmlu_construction_project_management
  - cmmlu_economics
  - cmmlu_education
  - cmmlu_electrical_engineering
  - cmmlu_elementary_chinese
  - cmmlu_elementary_commonsense
  - cmmlu_elementary_information_and_technology
  - cmmlu_elementary_mathematics
  - cmmlu_ethnology
  - cmmlu_food_science
  - cmmlu_genetics
  - cmmlu_global_facts
  - cmmlu_high_school_biology
  - cmmlu_high_school_chemistry
  - cmmlu_high_school_geography
  - cmmlu_high_school_mathematics
  - cmmlu_high_school_physics
  - cmmlu_high_school_politics
  - cmmlu_human_sexuality
  - cmmlu_international_law
  - cmmlu_journalism
  - cmmlu_jurisprudence
  - cmmlu_legal_and_moral_basis
  - cmmlu_logical
  - cmmlu_machine_learning
  - cmmlu_management
  - cmmlu_marketing
  - cmmlu_marxist_theory
  - cmmlu_modern_chinese
  - cmmlu_nutrition
  - cmmlu_philosophy
  - cmmlu_professional_accounting
  - cmmlu_professional_law
  - cmmlu_professional_medicine
  - cmmlu_professional_psychology
  - cmmlu_public_relations
  - cmmlu_security_study
  - cmmlu_sociology
  - cmmlu_sports_science
  - cmmlu_traditional_chinese_medicine
  - cmmlu_virology
  - cmmlu_world_history
  - cmmlu_world_religions
lm_eval/tasks/cmmlu/_generate_configs.py

@@ -132,3 +132,33 @@ if __name__ == "__main__":
             allow_unicode=True,
             default_style='"',
         )
+
+    # write group config out
+    group_yaml_dict = {
+        "group": "cmmlu",
+        "task": [
+            (
+                f"cmmlu_{args.task_prefix}_{subject_eng}"
+                if args.task_prefix != ""
+                else f"cmmlu_{subject_eng}"
+            )
+            for subject_eng in SUBJECTS.keys()
+        ],
+        "aggregate_metric_list": [
+            {"metric": "acc", "aggregation": "mean", "weight_by_size": True},
+            {"metric": "acc_norm", "aggregation": "mean", "weight_by_size": True},
+        ],
+        "metadata": {"version": 0.0},
+    }
+
+    file_save_path = "_" + args.save_prefix_path + ".yaml"
+
+    with open(file_save_path, "w", encoding="utf-8") as group_yaml_file:
+        yaml.dump(
+            group_yaml_dict,
+            group_yaml_file,
+            width=float("inf"),
+            allow_unicode=True,
+            default_style='"',
+        )
lm_eval/tasks/cmmlu/cmmlu_agronomy.yaml (new file, mode 100644)

"dataset_name": "agronomy"
"description": "以下是关于农学的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_agronomy"

lm_eval/tasks/cmmlu/cmmlu_anatomy.yaml (new file, mode 100644)

"dataset_name": "anatomy"
"description": "以下是关于解剖学的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_anatomy"

lm_eval/tasks/cmmlu/cmmlu_ancient_chinese.yaml (new file, mode 100644)

"dataset_name": "ancient_chinese"
"description": "以下是关于古汉语的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_ancient_chinese"

lm_eval/tasks/cmmlu/cmmlu_arts.yaml (new file, mode 100644)

"dataset_name": "arts"
"description": "以下是关于艺术学的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_arts"

lm_eval/tasks/cmmlu/cmmlu_astronomy.yaml (new file, mode 100644)

"dataset_name": "astronomy"
"description": "以下是关于天文学的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_astronomy"

lm_eval/tasks/cmmlu/cmmlu_business_ethics.yaml (new file, mode 100644)

"dataset_name": "business_ethics"
"description": "以下是关于商业伦理的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_business_ethics"

lm_eval/tasks/cmmlu/cmmlu_chinese_civil_service_exam.yaml (new file, mode 100644)

"dataset_name": "chinese_civil_service_exam"
"description": "以下是关于中国公务员考试的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_civil_service_exam"

lm_eval/tasks/cmmlu/cmmlu_chinese_driving_rule.yaml (new file, mode 100644)

"dataset_name": "chinese_driving_rule"
"description": "以下是关于中国驾驶规则的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_driving_rule"

lm_eval/tasks/cmmlu/cmmlu_chinese_food_culture.yaml (new file, mode 100644)

"dataset_name": "chinese_food_culture"
"description": "以下是关于中国饮食文化的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_food_culture"

lm_eval/tasks/cmmlu/cmmlu_chinese_foreign_policy.yaml (new file, mode 100644)

"dataset_name": "chinese_foreign_policy"
"description": "以下是关于中国外交政策的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_foreign_policy"
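Each per-subject file above carries only its own `dataset_name`, `description`, and `task`, pulling shared settings in via `include: "_default_template_yaml"`. A hypothetical sketch of how such an include can be resolved by merging the template's keys underneath the subject's (the subject's keys win); lm-eval's real config loader differs, and the `BASE` values here are assumptions, not taken from this diff:

```python
# Hypothetical base template standing in for _default_template_yaml.
BASE = {
    "dataset_path": "haonan-li/cmmlu",   # assumed dataset path, illustrative only
    "output_type": "multiple_choice",    # assumed shared setting
}

def resolve_include(config: dict, base: dict) -> dict:
    """Replace an `include` reference with the base template's keys,
    letting the including config override any shared key."""
    if "include" not in config:
        return dict(config)
    merged = dict(base)
    merged.update({k: v for k, v in config.items() if k != "include"})
    return merged

subject = {
    "dataset_name": "agronomy",
    "include": "_default_template_yaml",
    "task": "cmmlu_agronomy",
}
merged = resolve_include(subject, BASE)
print(merged["output_type"], merged["task"])  # multiple_choice cmmlu_agronomy
```

This is why removing `group:` from the templates (as done for blimp and ceval above) moves group membership entirely into the new `_cmmlu.yaml`-style group files.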