gaoqiong / lm-evaluation-harness
Commit a2af2101 (Unverified)
Authored Jul 12, 2024 by Yen-Ting Lin
Committed by GitHub Jul 12, 2024
Merge branch 'EleutherAI:main' into main
Parents: 82cb25c1, d5f39bf8
Showing 20 changed files with 249 additions and 4 deletions
lm_eval/tasks/blimp/_template_yaml +0 -1
lm_eval/tasks/ceval/_ceval-valid.yaml +63 -0
lm_eval/tasks/ceval/_default_ceval_yaml +0 -1
lm_eval/tasks/ceval/_generate_configs.py +25 -1
lm_eval/tasks/cmmlu/_cmmlu.yaml +78 -0
lm_eval/tasks/cmmlu/_default_template_yaml +0 -1
lm_eval/tasks/cmmlu/_generate_configs.py +31 -0
lm_eval/tasks/cmmlu/cmmlu_agronomy.yaml +4 -0
lm_eval/tasks/cmmlu/cmmlu_anatomy.yaml +4 -0
lm_eval/tasks/cmmlu/cmmlu_ancient_chinese.yaml +4 -0
lm_eval/tasks/cmmlu/cmmlu_arts.yaml +4 -0
lm_eval/tasks/cmmlu/cmmlu_astronomy.yaml +4 -0
lm_eval/tasks/cmmlu/cmmlu_business_ethics.yaml +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_civil_service_exam.yaml +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_driving_rule.yaml +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_food_culture.yaml +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_foreign_policy.yaml +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_history.yaml +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_literature.yaml +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_teacher_qualification.yaml +4 -0
Too many changes to show. To preserve performance, only 1000 of 1000+ files are displayed.
lm_eval/tasks/blimp/_template_yaml
group: blimp
dataset_path: blimp
output_type: multiple_choice
validation_split: train
...
...
lm_eval/tasks/ceval/_ceval-valid.yaml
0 → 100644
aggregate_metric_list:
- aggregation: mean
  metric: acc
  weight_by_size: true
- aggregation: mean
  metric: acc_norm
  weight_by_size: true
group: ceval-valid
metadata:
  version: 1.0
task:
- ceval-valid_computer_network
- ceval-valid_operating_system
- ceval-valid_computer_architecture
- ceval-valid_college_programming
- ceval-valid_college_physics
- ceval-valid_college_chemistry
- ceval-valid_advanced_mathematics
- ceval-valid_probability_and_statistics
- ceval-valid_discrete_mathematics
- ceval-valid_electrical_engineer
- ceval-valid_metrology_engineer
- ceval-valid_high_school_mathematics
- ceval-valid_high_school_physics
- ceval-valid_high_school_chemistry
- ceval-valid_high_school_biology
- ceval-valid_middle_school_mathematics
- ceval-valid_middle_school_biology
- ceval-valid_middle_school_physics
- ceval-valid_middle_school_chemistry
- ceval-valid_veterinary_medicine
- ceval-valid_college_economics
- ceval-valid_business_administration
- ceval-valid_marxism
- ceval-valid_mao_zedong_thought
- ceval-valid_education_science
- ceval-valid_teacher_qualification
- ceval-valid_high_school_politics
- ceval-valid_high_school_geography
- ceval-valid_middle_school_politics
- ceval-valid_middle_school_geography
- ceval-valid_modern_chinese_history
- ceval-valid_ideological_and_moral_cultivation
- ceval-valid_logic
- ceval-valid_law
- ceval-valid_chinese_language_and_literature
- ceval-valid_art_studies
- ceval-valid_professional_tour_guide
- ceval-valid_legal_professional
- ceval-valid_high_school_chinese
- ceval-valid_high_school_history
- ceval-valid_middle_school_history
- ceval-valid_civil_servant
- ceval-valid_sports_science
- ceval-valid_plant_protection
- ceval-valid_basic_medicine
- ceval-valid_clinical_medicine
- ceval-valid_urban_and_rural_planner
- ceval-valid_accountant
- ceval-valid_fire_engineer
- ceval-valid_environmental_impact_assessment_engineer
- ceval-valid_tax_accountant
- ceval-valid_physician
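The `weight_by_size: true` entries in this group config suggest that the group score is a size-weighted (micro) average of the subtask scores rather than a plain mean of per-subtask scores. A minimal sketch of that arithmetic follows, assuming each subtask reports one accuracy and one sample count. The function name `weighted_mean` and all numbers are hypothetical illustrations, not code or results from lm-evaluation-harness.

```python
def weighted_mean(scores, sizes):
    """Average per-subtask scores, weighting each by its number of samples."""
    if len(scores) != len(sizes):
        raise ValueError("scores and sizes must have the same length")
    total = sum(sizes)
    if total == 0:
        raise ValueError("at least one subtask must have samples")
    return sum(score * size for score, size in zip(scores, sizes)) / total


# Hypothetical per-subtask accuracies and sample counts (not real results).
subtask_acc = [0.50, 0.80]
subtask_size = [100, 300]

micro = weighted_mean(subtask_acc, subtask_size)  # size-weighted mean
macro = sum(subtask_acc) / len(subtask_acc)       # unweighted mean, for contrast
```

With these made-up inputs the larger subtask dominates the weighted score, so the two averages differ; that difference is what the `weight_by_size` flag would control under the assumption above.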
lm_eval/tasks/ceval/_default_ceval_yaml
group: ceval-valid
dataset_path: ceval/ceval-exam
validation_split: val
fewshot_split: dev
...
...
lm_eval/tasks/ceval/_generate_configs.py
"""
Take in a YAML, and output all other splits with this YAML
"""
import argparse
import os
import yaml
from tqdm import tqdm
from lm_eval.logger import eval_logger
from lm_eval.utils import eval_logger
SUBJECTS = {
...
...
@@ -116,3 +117,26 @@ if __name__ == "__main__":
                allow_unicode=True,
                default_style='"',
            )

    # write group config out
    group_yaml_dict = {
        "group": "ceval-valid",
        "task": [f"ceval-valid_{task_name}" for task_name in SUBJECTS.keys()],
        "aggregate_metric_list": [
            {"metric": "acc", "aggregation": "mean", "weight_by_size": True},
            {"metric": "acc_norm", "aggregation": "mean", "weight_by_size": True},
        ],
        "metadata": {"version": 1.0},
    }

    file_save_path = "_" + args.save_prefix_path + ".yaml"

    with open(file_save_path, "w", encoding="utf-8") as group_yaml_file:
        yaml.dump(
            group_yaml_dict,
            group_yaml_file,
            width=float("inf"),
            allow_unicode=True,
            default_style='"',
        )
lm_eval/tasks/cmmlu/_cmmlu.yaml
0 → 100644
group: cmmlu
task:
- cmmlu_agronomy
- cmmlu_anatomy
- cmmlu_ancient_chinese
- cmmlu_arts
- cmmlu_astronomy
- cmmlu_business_ethics
- cmmlu_chinese_civil_service_exam
- cmmlu_chinese_driving_rule
- cmmlu_chinese_food_culture
- cmmlu_chinese_foreign_policy
- cmmlu_chinese_history
- cmmlu_chinese_literature
- cmmlu_chinese_teacher_qualification
- cmmlu_clinical_knowledge
- cmmlu_college_actuarial_science
- cmmlu_college_education
- cmmlu_college_engineering_hydrology
- cmmlu_college_law
- cmmlu_college_mathematics
- cmmlu_college_medical_statistics
- cmmlu_college_medicine
- cmmlu_computer_science
- cmmlu_computer_security
- cmmlu_conceptual_physics
- cmmlu_construction_project_management
- cmmlu_economics
- cmmlu_education
- cmmlu_electrical_engineering
- cmmlu_elementary_chinese
- cmmlu_elementary_commonsense
- cmmlu_elementary_information_and_technology
- cmmlu_elementary_mathematics
- cmmlu_ethnology
- cmmlu_food_science
- cmmlu_genetics
- cmmlu_global_facts
- cmmlu_high_school_biology
- cmmlu_high_school_chemistry
- cmmlu_high_school_geography
- cmmlu_high_school_mathematics
- cmmlu_high_school_physics
- cmmlu_high_school_politics
- cmmlu_human_sexuality
- cmmlu_international_law
- cmmlu_journalism
- cmmlu_jurisprudence
- cmmlu_legal_and_moral_basis
- cmmlu_logical
- cmmlu_machine_learning
- cmmlu_management
- cmmlu_marketing
- cmmlu_marxist_theory
- cmmlu_modern_chinese
- cmmlu_nutrition
- cmmlu_philosophy
- cmmlu_professional_accounting
- cmmlu_professional_law
- cmmlu_professional_medicine
- cmmlu_professional_psychology
- cmmlu_public_relations
- cmmlu_security_study
- cmmlu_sociology
- cmmlu_sports_science
- cmmlu_traditional_chinese_medicine
- cmmlu_virology
- cmmlu_world_history
- cmmlu_world_religions
aggregate_metric_list:
- aggregation: mean
  metric: acc
  weight_by_size: true
- aggregation: mean
  metric: acc_norm
  weight_by_size: true
metadata:
  version: 0.0
lm_eval/tasks/cmmlu/_default_template_yaml
group: cmmlu
dataset_path: haonan-li/cmmlu
test_split: test
fewshot_split: dev
...
...
lm_eval/tasks/cmmlu/_generate_configs.py
"""
Take in a YAML, and output all other splits with this YAML
"""
import argparse
import os
...
...
@@ -131,3 +132,33 @@ if __name__ == "__main__":
                allow_unicode=True,
                default_style='"',
            )

    # write group config out
    group_yaml_dict = {
        "group": "cmmlu",
        "task": [
            (
                f"cmmlu_{args.task_prefix}_{subject_eng}"
                if args.task_prefix != ""
                else f"cmmlu_{subject_eng}"
            )
            for subject_eng in SUBJECTS.keys()
        ],
        "aggregate_metric_list": [
            {"metric": "acc", "aggregation": "mean", "weight_by_size": True},
            {"metric": "acc_norm", "aggregation": "mean", "weight_by_size": True},
        ],
        "metadata": {"version": 0.0},
    }

    file_save_path = "_" + args.save_prefix_path + ".yaml"

    with open(file_save_path, "w", encoding="utf-8") as group_yaml_file:
        yaml.dump(
            group_yaml_dict,
            group_yaml_file,
            width=float("inf"),
            allow_unicode=True,
            default_style='"',
        )
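The task list in this hunk is built with a conditional expression so that an empty `task_prefix` does not leave a stray underscore in the task name. A small standalone sketch of that naming rule is below. `build_task_names` and `SUBJECTS_SAMPLE` are hypothetical names introduced for illustration; the real `SUBJECTS` mapping is elided in this diff and is not reproduced here.

```python
def build_task_names(subjects, task_prefix=""):
    """Return one task name per subject, inserting the prefix only when it is non-empty."""
    return [
        f"cmmlu_{task_prefix}_{subject}" if task_prefix != "" else f"cmmlu_{subject}"
        for subject in subjects
    ]


# Hypothetical two-entry stand-in for the real SUBJECTS mapping.
SUBJECTS_SAMPLE = {"agronomy": "农学", "anatomy": "解剖学"}

default_names = build_task_names(SUBJECTS_SAMPLE.keys())
prefixed_names = build_task_names(SUBJECTS_SAMPLE.keys(), task_prefix="demo")
```

With no prefix the names match the entries listed in `_cmmlu.yaml` (for example `cmmlu_agronomy`); with a prefix, the prefix sits between `cmmlu_` and the subject.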
lm_eval/tasks/cmmlu/cmmlu_agronomy.yaml
0 → 100644
"dataset_name": "agronomy"
"description": "以下是关于农学的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_agronomy"
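Each per-subject file carries only the fields that differ between subjects and points at the shared template through `include`. A minimal sketch of how such a layout can be resolved is below, assuming the included template is loaded first and the per-subject keys then override it. `resolve_include` is a hypothetical helper operating on already-parsed dictionaries, not the harness's own loader; the template values are a few fields copied from the `_default_template_yaml` excerpt in this diff.

```python
def resolve_include(config, templates):
    """Merge the template named by config["include"] with the config's own keys."""
    merged = dict(templates[config["include"]])  # start from a copy of the template
    # Per-subject keys win over template keys; the "include" pointer itself is dropped.
    merged.update({key: value for key, value in config.items() if key != "include"})
    return merged


# Assumed, partial contents of the shared template (already parsed).
templates = {
    "_default_template_yaml": {
        "dataset_path": "haonan-li/cmmlu",
        "test_split": "test",
        "fewshot_split": "dev",
    }
}

subject_config = {
    "dataset_name": "agronomy",
    "include": "_default_template_yaml",
    "task": "cmmlu_agronomy",
}

resolved = resolve_include(subject_config, templates)
```

Under this assumption, the resolved config holds the shared dataset settings plus the subject-specific `dataset_name` and `task`, which is why each generated file can stay at four lines.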
lm_eval/tasks/cmmlu/cmmlu_anatomy.yaml
0 → 100644
"dataset_name": "anatomy"
"description": "以下是关于解剖学的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_anatomy"
lm_eval/tasks/cmmlu/cmmlu_ancient_chinese.yaml
0 → 100644
"dataset_name": "ancient_chinese"
"description": "以下是关于古汉语的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_ancient_chinese"
lm_eval/tasks/cmmlu/cmmlu_arts.yaml
0 → 100644
"dataset_name": "arts"
"description": "以下是关于艺术学的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_arts"
lm_eval/tasks/cmmlu/cmmlu_astronomy.yaml
0 → 100644
"dataset_name": "astronomy"
"description": "以下是关于天文学的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_astronomy"
lm_eval/tasks/ammlu/ammlu_business_ethics.yaml → lm_eval/tasks/cmmlu/cmmlu_business_ethics.yaml
 "dataset_name": "business_ethics"
-"description": "فم بعملية التقييم في مجال علوم أخرى\n\n"
+"description": "以下是关于商业伦理的单项选择题,请直接给出正确答案的选项。\n\n"
 "include": "_default_template_yaml"
-"task": "ammlu_business_ethics"
+"task": "cmmlu_business_ethics"
lm_eval/tasks/cmmlu/cmmlu_chinese_civil_service_exam.yaml
0 → 100644
"dataset_name": "chinese_civil_service_exam"
"description": "以下是关于中国公务员考试的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_civil_service_exam"
lm_eval/tasks/cmmlu/cmmlu_chinese_driving_rule.yaml
0 → 100644
"dataset_name": "chinese_driving_rule"
"description": "以下是关于中国驾驶规则的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_driving_rule"
lm_eval/tasks/cmmlu/cmmlu_chinese_food_culture.yaml
0 → 100644
"dataset_name": "chinese_food_culture"
"description": "以下是关于中国饮食文化的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_food_culture"
lm_eval/tasks/cmmlu/cmmlu_chinese_foreign_policy.yaml
0 → 100644
"dataset_name": "chinese_foreign_policy"
"description": "以下是关于中国外交政策的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_foreign_policy"
lm_eval/tasks/cmmlu/cmmlu_chinese_history.yaml
0 → 100644
"dataset_name": "chinese_history"
"description": "以下是关于中国历史的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_history"
lm_eval/tasks/cmmlu/cmmlu_chinese_literature.yaml
0 → 100644
"dataset_name": "chinese_literature"
"description": "以下是关于中国文学的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_literature"
lm_eval/tasks/cmmlu/cmmlu_chinese_teacher_qualification.yaml
0 → 100644
"dataset_name": "chinese_teacher_qualification"
"description": "以下是关于中国教师资格的单项选择题,请直接给出正确答案的选项。\n\n"
"include": "_default_template_yaml"
"task": "cmmlu_chinese_teacher_qualification"