gaoqiong / lm-evaluation-harness
Commit c171fa30, authored Jun 21, 2024 by haileyschoelkopf
add more groupings / tags distinctions
parent 46e8c8e6
Changes: 81
Showing 20 changed files with 330 additions and 1 deletion (+330 -1)
lm_eval/tasks/blimp/_blimp.yaml  +75 -0
lm_eval/tasks/ceval/_ceval-valid.yaml  +63 -0
lm_eval/tasks/ceval/_generate_configs.py  +24 -1
lm_eval/tasks/cmmlu/_cmmlu.yaml  +78 -0
lm_eval/tasks/cmmlu/_generate_configs.py  +30 -0
lm_eval/tasks/cmmlu/cmmlu_agronomy.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_anatomy.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_ancient_chinese.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_arts.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_astronomy.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_business_ethics.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_civil_service_exam.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_driving_rule.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_food_culture.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_foreign_policy.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_history.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_literature.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_teacher_qualification.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_clinical_knowledge.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_college_actuarial_science.yaml  +4 -0
lm_eval/tasks/blimp/_blimp.yaml (0 → 100644) @ c171fa30

group: blimp
task:
  - "blimp_adjunct_island"
  - "blimp_anaphor_gender_agreement"
  - "blimp_anaphor_number_agreement"
  - "blimp_animate_subject_passive"
  - "blimp_animate_subject_trans"
  - "blimp_causative"
  - "blimp_complex_NP_island"
  - "blimp_coordinate_structure_constraint_complex_left_branch"
  - "blimp_coordinate_structure_constraint_object_extraction"
  - "blimp_determiner_noun_agreement_1"
  - "blimp_determiner_noun_agreement_2"
  - "blimp_determiner_noun_agreement_irregular_1"
  - "blimp_determiner_noun_agreement_irregular_2"
  - "blimp_determiner_noun_agreement_with_adj_2"
  - "blimp_determiner_noun_agreement_with_adj_irregular_1"
  - "blimp_determiner_noun_agreement_with_adj_irregular_2"
  - "blimp_determiner_noun_agreement_with_adjective_1"
  - "blimp_distractor_agreement_relational_noun"
  - "blimp_distractor_agreement_relative_clause"
  - "blimp_drop_argument"
  - "blimp_ellipsis_n_bar_1"
  - "blimp_ellipsis_n_bar_2"
  - "blimp_existential_there_object_raising"
  - "blimp_existential_there_quantifiers_1"
  - "blimp_existential_there_quantifiers_2"
  - "blimp_existential_there_subject_raising"
  - "blimp_expletive_it_object_raising"
  - "blimp_inchoative"
  - "blimp_intransitive"
  - "blimp_irregular_past_participle_adjectives"
  - "blimp_irregular_past_participle_verbs"
  - "blimp_irregular_plural_subject_verb_agreement_1"
  - "blimp_irregular_plural_subject_verb_agreement_2"
  - "blimp_left_branch_island_echo_question"
  - "blimp_left_branch_island_simple_question"
  - "blimp_matrix_question_npi_licensor_present"
  - "blimp_npi_present_1"
  - "blimp_npi_present_2"
  - "blimp_only_npi_licensor_present"
  - "blimp_only_npi_scope"
  - "blimp_passive_1"
  - "blimp_passive_2"
  - "blimp_principle_A_c_command"
  - "blimp_principle_A_case_1"
  - "blimp_principle_A_case_2"
  - "blimp_principle_A_domain_1"
  - "blimp_principle_A_domain_2"
  - "blimp_principle_A_domain_3"
  - "blimp_principle_A_reconstruction"
  - "blimp_regular_plural_subject_verb_agreement_1"
  - "blimp_regular_plural_subject_verb_agreement_2"
  - "blimp_sentential_negation_npi_licensor_present"
  - "blimp_sentential_negation_npi_scope"
  - "blimp_sentential_subject_island"
  - "blimp_superlative_quantifiers_1"
  - "blimp_superlative_quantifiers_2"
  - "blimp_tough_vs_raising_1"
  - "blimp_tough_vs_raising_2"
  - "blimp_transitive"
  - "blimp_wh_island"
  - "blimp_wh_questions_object_gap"
  - "blimp_wh_questions_subject_gap"
  - "blimp_wh_questions_subject_gap_long_distance"
  - "blimp_wh_vs_that_no_gap"
  - "blimp_wh_vs_that_no_gap_long_distance"
  - "blimp_wh_vs_that_with_gap"
  - "blimp_wh_vs_that_with_gap_long_distance"
aggregate_metric_list:
  - metric: acc
    aggregation: mean
    weight_by_size: False
metadata:
  version: 2.0
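
The `weight_by_size` key in a group's `aggregate_metric_list` suggests a choice between averaging subtask scores equally and averaging them in proportion to subtask size. The following is a minimal sketch of that difference, not the harness's own implementation: the helper `aggregate_mean` and the scores and sizes are made up for illustration.

```python
from typing import Sequence


def aggregate_mean(
    scores: Sequence[float],
    sizes: Sequence[int],
    weight_by_size: bool,
) -> float:
    """Combine per-subtask scores into one group score.

    Hypothetical helper for illustration only; not lm-evaluation-harness code.
    """
    if len(scores) != len(sizes) or not scores:
        raise ValueError("scores and sizes must be non-empty and of equal length")
    if weight_by_size:
        # Size-weighted mean: every document counts equally.
        return sum(s * n for s, n in zip(scores, sizes)) / sum(sizes)
    # Unweighted mean: every subtask counts equally.
    return sum(scores) / len(scores)


# Made-up numbers: two subtasks with 100 and 300 documents.
scores = [0.5, 0.9]
sizes = [100, 300]
unweighted = aggregate_mean(scores, sizes, weight_by_size=False)
weighted = aggregate_mean(scores, sizes, weight_by_size=True)
```

With these made-up inputs the unweighted mean is about 0.7 and the size-weighted mean is about 0.8, because the larger subtask pulls the weighted score toward its own accuracy.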
lm_eval/tasks/ceval/_ceval-valid.yaml (0 → 100644) @ c171fa30

aggregate_metric_list:
- aggregation: mean
  metric: acc
  weight_by_size: true
- aggregation: mean
  metric: acc_norm
  weight_by_size: true
group: ceval-valid
metadata:
  version: 1.0
task:
- ceval-valid_computer_network
- ceval-valid_operating_system
- ceval-valid_computer_architecture
- ceval-valid_college_programming
- ceval-valid_college_physics
- ceval-valid_college_chemistry
- ceval-valid_advanced_mathematics
- ceval-valid_probability_and_statistics
- ceval-valid_discrete_mathematics
- ceval-valid_electrical_engineer
- ceval-valid_metrology_engineer
- ceval-valid_high_school_mathematics
- ceval-valid_high_school_physics
- ceval-valid_high_school_chemistry
- ceval-valid_high_school_biology
- ceval-valid_middle_school_mathematics
- ceval-valid_middle_school_biology
- ceval-valid_middle_school_physics
- ceval-valid_middle_school_chemistry
- ceval-valid_veterinary_medicine
- ceval-valid_college_economics
- ceval-valid_business_administration
- ceval-valid_marxism
- ceval-valid_mao_zedong_thought
- ceval-valid_education_science
- ceval-valid_teacher_qualification
- ceval-valid_high_school_politics
- ceval-valid_high_school_geography
- ceval-valid_middle_school_politics
- ceval-valid_middle_school_geography
- ceval-valid_modern_chinese_history
- ceval-valid_ideological_and_moral_cultivation
- ceval-valid_logic
- ceval-valid_law
- ceval-valid_chinese_language_and_literature
- ceval-valid_art_studies
- ceval-valid_professional_tour_guide
- ceval-valid_legal_professional
- ceval-valid_high_school_chinese
- ceval-valid_high_school_history
- ceval-valid_middle_school_history
- ceval-valid_civil_servant
- ceval-valid_sports_science
- ceval-valid_plant_protection
- ceval-valid_basic_medicine
- ceval-valid_clinical_medicine
- ceval-valid_urban_and_rural_planner
- ceval-valid_accountant
- ceval-valid_fire_engineer
- ceval-valid_environmental_impact_assessment_engineer
- ceval-valid_tax_accountant
- ceval-valid_physician
lm_eval/tasks/ceval/_generate_configs.py @ c171fa30

@@ -7,7 +7,7 @@ import os
 import yaml
 from tqdm import tqdm

-from lm_eval.logger import eval_logger
+from lm_eval.utils import eval_logger


 SUBJECTS = {
@@ -116,3 +116,26 @@ if __name__ == "__main__":
                 allow_unicode=True,
                 default_style='"',
             )
+
+    # write group config out
+
+    group_yaml_dict = {
+        "group": "ceval-valid",
+        "task": [f"ceval-valid_{task_name}" for task_name in SUBJECTS.keys()],
+        "aggregate_metric_list": [
+            {"metric": "acc", "aggregation": "mean", "weight_by_size": True},
+            {"metric": "acc_norm", "aggregation": "mean", "weight_by_size": True},
+        ],
+        "metadata": {"version": 1.0},
+    }
+
+    file_save_path = "_" + args.save_prefix_path + ".yaml"
+
+    with open(file_save_path, "w", encoding="utf-8") as group_yaml_file:
+        yaml.dump(
+            group_yaml_dict,
+            group_yaml_file,
+            width=float("inf"),
+            allow_unicode=True,
+            default_style='"',
+        )
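
The hunk above builds a group config dict from `SUBJECTS` and derives the output file name from `args.save_prefix_path`. The same shape can be sketched without PyYAML or argparse, which makes it easy to inspect. The helpers `build_group_config` and `group_file_name` and the two-subject subset are hypothetical, introduced only for this illustration.

```python
from typing import Any, Dict, Iterable, List


def build_group_config(
    group: str,
    subjects: Iterable[str],
    metrics: Iterable[str] = ("acc", "acc_norm"),
    version: float = 1.0,
) -> Dict[str, Any]:
    """Return a dict shaped like the group config built in the diff.

    Hypothetical helper; the real script builds the dict inline.
    """
    tasks: List[str] = [f"{group}_{name}" for name in subjects]
    return {
        "group": group,
        "task": tasks,
        "aggregate_metric_list": [
            {"metric": m, "aggregation": "mean", "weight_by_size": True}
            for m in metrics
        ],
        "metadata": {"version": version},
    }


def group_file_name(save_prefix_path: str) -> str:
    """Mirror the `"_" + args.save_prefix_path + ".yaml"` naming in the diff."""
    return "_" + save_prefix_path + ".yaml"


# Hypothetical two-subject subset, for illustration only.
config = build_group_config("ceval-valid", ["computer_network", "law"])
file_name = group_file_name("ceval-valid")
```

Passing `"ceval-valid"` as the prefix yields `_ceval-valid.yaml`, which matches the group file added in this commit.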
lm_eval/tasks/cmmlu/_cmmlu.yaml (0 → 100644) @ c171fa30

aggregate_metric_list:
- aggregation: mean
  metric: acc
  weight_by_size: true
- aggregation: mean
  metric: acc_norm
  weight_by_size: true
group: cmmlu
metadata:
  version: 0.0
task:
- cmmlu_agronomy
- cmmlu_anatomy
- cmmlu_ancient_chinese
- cmmlu_arts
- cmmlu_astronomy
- cmmlu_business_ethics
- cmmlu_chinese_civil_service_exam
- cmmlu_chinese_driving_rule
- cmmlu_chinese_food_culture
- cmmlu_chinese_foreign_policy
- cmmlu_chinese_history
- cmmlu_chinese_literature
- cmmlu_chinese_teacher_qualification
- cmmlu_clinical_knowledge
- cmmlu_college_actuarial_science
- cmmlu_college_education
- cmmlu_college_engineering_hydrology
- cmmlu_college_law
- cmmlu_college_mathematics
- cmmlu_college_medical_statistics
- cmmlu_college_medicine
- cmmlu_computer_science
- cmmlu_computer_security
- cmmlu_conceptual_physics
- cmmlu_construction_project_management
- cmmlu_economics
- cmmlu_education
- cmmlu_electrical_engineering
- cmmlu_elementary_chinese
- cmmlu_elementary_commonsense
- cmmlu_elementary_information_and_technology
- cmmlu_elementary_mathematics
- cmmlu_ethnology
- cmmlu_food_science
- cmmlu_genetics
- cmmlu_global_facts
- cmmlu_high_school_biology
- cmmlu_high_school_chemistry
- cmmlu_high_school_geography
- cmmlu_high_school_mathematics
- cmmlu_high_school_physics
- cmmlu_high_school_politics
- cmmlu_human_sexuality
- cmmlu_international_law
- cmmlu_journalism
- cmmlu_jurisprudence
- cmmlu_legal_and_moral_basis
- cmmlu_logical
- cmmlu_machine_learning
- cmmlu_management
- cmmlu_marketing
- cmmlu_marxist_theory
- cmmlu_modern_chinese
- cmmlu_nutrition
- cmmlu_philosophy
- cmmlu_professional_accounting
- cmmlu_professional_law
- cmmlu_professional_medicine
- cmmlu_professional_psychology
- cmmlu_public_relations
- cmmlu_security_study
- cmmlu_sociology
- cmmlu_sports_science
- cmmlu_traditional_chinese_medicine
- cmmlu_virology
- cmmlu_world_history
- cmmlu_world_religions
lm_eval/tasks/cmmlu/_generate_configs.py @ c171fa30

@@ -131,3 +131,33 @@ if __name__ == "__main__":
                 allow_unicode=True,
                 default_style='"',
             )
+
+    # write group config out
+
+    group_yaml_dict = {
+        "group": "cmmlu",
+        "task": [
+            (
+                f"cmmlu_{args.task_prefix}_{subject_eng}"
+                if args.task_prefix != ""
+                else f"cmmlu_{subject_eng}"
+            )
+            for subject_eng in SUBJECTS.keys()
+        ],
+        "aggregate_metric_list": [
+            {"metric": "acc", "aggregation": "mean", "weight_by_size": True},
+            {"metric": "acc_norm", "aggregation": "mean", "weight_by_size": True},
+        ],
+        "metadata": {"version": 0.0},
+    }
+
+    file_save_path = "_" + args.save_prefix_path + ".yaml"
+
+    with open(file_save_path, "w", encoding="utf-8") as group_yaml_file:
+        yaml.dump(
+            group_yaml_dict,
+            group_yaml_file,
+            width=float("inf"),
+            allow_unicode=True,
+            default_style='"',
+        )
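
The task list in the hunk above inserts `args.task_prefix` into each task name only when the prefix is non-empty. A small sketch of that naming rule in isolation follows. The function `cmmlu_task_name` and the prefix value `"fewshot"` are hypothetical and do not appear in the commit.

```python
def cmmlu_task_name(subject_eng: str, task_prefix: str = "") -> str:
    """Build a CMMLU task name, adding the prefix only when it is non-empty.

    Hypothetical helper mirroring the conditional expression in the diff.
    """
    if task_prefix != "":
        return f"cmmlu_{task_prefix}_{subject_eng}"
    return f"cmmlu_{subject_eng}"


# Without a prefix the names match the entries in _cmmlu.yaml.
plain_names = [cmmlu_task_name(s) for s in ("agronomy", "anatomy")]

# "fewshot" is a made-up prefix used only to show the second branch.
prefixed_name = cmmlu_task_name("agronomy", task_prefix="fewshot")
```

Checking for the empty string avoids a doubled underscore (`cmmlu__agronomy`) when no prefix is supplied.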
lm_eval/tasks/cmmlu/cmmlu_agronomy.yaml (0 → 100644) @ c171fa30

"dataset_name": "agronomy"
"description": "以下是关于农学的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_agronomy"
lm_eval/tasks/cmmlu/cmmlu_anatomy.yaml (0 → 100644) @ c171fa30

"dataset_name": "anatomy"
"description": "以下是关于解剖学的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_anatomy"
lm_eval/tasks/cmmlu/cmmlu_ancient_chinese.yaml (0 → 100644) @ c171fa30

"dataset_name": "ancient_chinese"
"description": "以下是关于古汉语的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_ancient_chinese"
lm_eval/tasks/cmmlu/cmmlu_arts.yaml (0 → 100644) @ c171fa30

"dataset_name": "arts"
"description": "以下是关于艺术学的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_arts"
lm_eval/tasks/cmmlu/cmmlu_astronomy.yaml (0 → 100644) @ c171fa30

"dataset_name": "astronomy"
"description": "以下是关于天文学的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_astronomy"
lm_eval/tasks/cmmlu/cmmlu_business_ethics.yaml (0 → 100644) @ c171fa30

"dataset_name": "business_ethics"
"description": "以下是关于商业伦理的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_business_ethics"
lm_eval/tasks/cmmlu/cmmlu_chinese_civil_service_exam.yaml (0 → 100644) @ c171fa30

"dataset_name": "chinese_civil_service_exam"
"description": "以下是关于中国公务员考试的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_civil_service_exam"
lm_eval/tasks/cmmlu/cmmlu_chinese_driving_rule.yaml (0 → 100644) @ c171fa30

"dataset_name": "chinese_driving_rule"
"description": "以下是关于中国驾驶规则的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_driving_rule"
lm_eval/tasks/cmmlu/cmmlu_chinese_food_culture.yaml (0 → 100644) @ c171fa30

"dataset_name": "chinese_food_culture"
"description": "以下是关于中国饮食文化的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_food_culture"
lm_eval/tasks/cmmlu/cmmlu_chinese_foreign_policy.yaml (0 → 100644) @ c171fa30

"dataset_name": "chinese_foreign_policy"
"description": "以下是关于中国外交政策的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_foreign_policy"
lm_eval/tasks/cmmlu/cmmlu_chinese_history.yaml (0 → 100644) @ c171fa30

"dataset_name": "chinese_history"
"description": "以下是关于中国历史的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_history"
lm_eval/tasks/cmmlu/cmmlu_chinese_literature.yaml (0 → 100644) @ c171fa30

"dataset_name": "chinese_literature"
"description": "以下是关于中国文学的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_literature"
lm_eval/tasks/cmmlu/cmmlu_chinese_teacher_qualification.yaml (0 → 100644) @ c171fa30

"dataset_name": "chinese_teacher_qualification"
"description": "以下是关于中国教师资格的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_teacher_qualification"
lm_eval/tasks/cmmlu/cmmlu_clinical_knowledge.yaml (0 → 100644) @ c171fa30

"dataset_name": "clinical_knowledge"
"description": "以下是关于临床知识的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_clinical_knowledge"
lm_eval/tasks/cmmlu/cmmlu_college_actuarial_science.yaml (0 → 100644) @ c171fa30

"dataset_name": "college_actuarial_science"
"description": "以下是关于大学精算学的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_college_actuarial_science"
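
The per-subject files above share one four-key shape and differ only in the English subject key and the Chinese subject name inside the prompt. A rough sketch of rendering that shape is shown here. The helper `render_subject_config` is hypothetical, and `json.dumps` stands in for YAML double-quoted output, an approximation that happens to match for these simple strings. The actual script presumably writes the files with `yaml.dump`.

```python
import json
from typing import Dict


def render_subject_config(subject_eng: str, subject_zh: str) -> str:
    """Render one per-subject config as four double-quoted key/value lines.

    Hypothetical helper; json.dumps approximates YAML double-quoted scalars.
    """
    fields: Dict[str, str] = {
        "dataset_name": subject_eng,
        "description": f"以下是关于{subject_zh}的单项选择题,请直接给出正确答案的选项。\n\n",
        "include": "_default_template_yaml",
        "task": f"cmmlu_{subject_eng}",
    }
    # ensure_ascii=False keeps the Chinese prompt readable; the trailing
    # newlines in the description are emitted as the escape sequence \n\n.
    lines = [
        f"{json.dumps(key)}: {json.dumps(value, ensure_ascii=False)}"
        for key, value in fields.items()
    ]
    return "\n".join(lines) + "\n"


text = render_subject_config("agronomy", "农学")
```

For `agronomy` this produces the same four lines as `cmmlu_agronomy.yaml` in this commit.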