Repository: gaoqiong / lm-evaluation-harness
Commit a2af2101 (Unverified)
Authored Jul 12, 2024 by Yen-Ting Lin; committed by GitHub, Jul 12, 2024
Merge branch 'EleutherAI:main' into main
Parents: 82cb25c1, d5f39bf8
Showing 20 changed files with 249 additions and 4 deletions (+249 -4)
lm_eval/tasks/blimp/_template_yaml  +0 -1
lm_eval/tasks/ceval/_ceval-valid.yaml  +63 -0
lm_eval/tasks/ceval/_default_ceval_yaml  +0 -1
lm_eval/tasks/ceval/_generate_configs.py  +25 -1
lm_eval/tasks/cmmlu/_cmmlu.yaml  +78 -0
lm_eval/tasks/cmmlu/_default_template_yaml  +0 -1
lm_eval/tasks/cmmlu/_generate_configs.py  +31 -0
lm_eval/tasks/cmmlu/cmmlu_agronomy.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_anatomy.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_ancient_chinese.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_arts.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_astronomy.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_business_ethics.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_civil_service_exam.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_driving_rule.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_food_culture.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_foreign_policy.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_history.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_literature.yaml  +4 -0
lm_eval/tasks/cmmlu/cmmlu_chinese_teacher_qualification.yaml  +4 -0
Too many changes to show. To preserve performance, only 1000 of 1000+ files are displayed.
lm_eval/tasks/blimp/_template_yaml
-group: blimp
 dataset_path: blimp
 output_type: multiple_choice
 validation_split: train
 ...
lm_eval/tasks/ceval/_ceval-valid.yaml (new file, mode 100644)
+aggregate_metric_list:
+  - aggregation: mean
+    metric: acc
+    weight_by_size: true
+  - aggregation: mean
+    metric: acc_norm
+    weight_by_size: true
+group: ceval-valid
+metadata:
+  version: 1.0
+task:
+  - ceval-valid_computer_network
+  - ceval-valid_operating_system
+  - ceval-valid_computer_architecture
+  - ceval-valid_college_programming
+  - ceval-valid_college_physics
+  - ceval-valid_college_chemistry
+  - ceval-valid_advanced_mathematics
+  - ceval-valid_probability_and_statistics
+  - ceval-valid_discrete_mathematics
+  - ceval-valid_electrical_engineer
+  - ceval-valid_metrology_engineer
+  - ceval-valid_high_school_mathematics
+  - ceval-valid_high_school_physics
+  - ceval-valid_high_school_chemistry
+  - ceval-valid_high_school_biology
+  - ceval-valid_middle_school_mathematics
+  - ceval-valid_middle_school_biology
+  - ceval-valid_middle_school_physics
+  - ceval-valid_middle_school_chemistry
+  - ceval-valid_veterinary_medicine
+  - ceval-valid_college_economics
+  - ceval-valid_business_administration
+  - ceval-valid_marxism
+  - ceval-valid_mao_zedong_thought
+  - ceval-valid_education_science
+  - ceval-valid_teacher_qualification
+  - ceval-valid_high_school_politics
+  - ceval-valid_high_school_geography
+  - ceval-valid_middle_school_politics
+  - ceval-valid_middle_school_geography
+  - ceval-valid_modern_chinese_history
+  - ceval-valid_ideological_and_moral_cultivation
+  - ceval-valid_logic
+  - ceval-valid_law
+  - ceval-valid_chinese_language_and_literature
+  - ceval-valid_art_studies
+  - ceval-valid_professional_tour_guide
+  - ceval-valid_legal_professional
+  - ceval-valid_high_school_chinese
+  - ceval-valid_high_school_history
+  - ceval-valid_middle_school_history
+  - ceval-valid_civil_servant
+  - ceval-valid_sports_science
+  - ceval-valid_plant_protection
+  - ceval-valid_basic_medicine
+  - ceval-valid_clinical_medicine
+  - ceval-valid_urban_and_rural_planner
+  - ceval-valid_accountant
+  - ceval-valid_fire_engineer
+  - ceval-valid_environmental_impact_assessment_engineer
+  - ceval-valid_tax_accountant
+  - ceval-valid_physician
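The `aggregation: mean` entries with `weight_by_size: true` in the group file above suggest that the group score is a mean over subtasks weighted by each subtask's sample count, rather than a plain average of subtask scores. The following is a minimal sketch of that arithmetic only, not the harness's own code; `weighted_mean` is a hypothetical helper and the accuracies and sizes are made-up illustration values, not results.

```python
from typing import Sequence


def weighted_mean(scores: Sequence[float], sizes: Sequence[int]) -> float:
    """Size-weighted mean: each subtask counts in proportion to its sample count."""
    if len(scores) != len(sizes):
        raise ValueError("scores and sizes must have the same length")
    total = sum(sizes)
    if total == 0:
        raise ValueError("at least one subtask must contain samples")
    return sum(score * size for score, size in zip(scores, sizes)) / total


# Hypothetical per-subtask accuracies and sample counts (illustration only).
subtask_acc = [0.50, 0.80]
subtask_size = [10, 30]

# Weighted by size: the larger subtask dominates the group score.
size_weighted = weighted_mean(subtask_acc, subtask_size)

# Unweighted alternative: every subtask counts equally.
unweighted = sum(subtask_acc) / len(subtask_acc)
```

With these numbers the size-weighted score leans toward the larger subtask, which is the practical difference a reader should expect when comparing the two averaging modes.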
lm_eval/tasks/ceval/_default_ceval_yaml
-group: ceval-valid
 dataset_path: ceval/ceval-exam
 validation_split: val
 fewshot_split: dev
 ...
lm_eval/tasks/ceval/_generate_configs.py
 """
 Take in a YAML, and output all other splits with this YAML
 """
 import argparse
 import os

 import yaml
 from tqdm import tqdm

-from lm_eval.logger import eval_logger
+from lm_eval.utils import eval_logger

 SUBJECTS = {
 ...
@@ -116,3 +117,26 @@ if __name__ == "__main__":
                 allow_unicode=True,
                 default_style='"',
             )
+
+    # write group config out
+    group_yaml_dict = {
+        "group": "ceval-valid",
+        "task": [f"ceval-valid_{task_name}" for task_name in SUBJECTS.keys()],
+        "aggregate_metric_list": [
+            {"metric": "acc", "aggregation": "mean", "weight_by_size": True},
+            {"metric": "acc_norm", "aggregation": "mean", "weight_by_size": True},
+        ],
+        "metadata": {"version": 1.0},
+    }
+
+    file_save_path = "_" + args.save_prefix_path + ".yaml"
+
+    with open(file_save_path, "w", encoding="utf-8") as group_yaml_file:
+        yaml.dump(
+            group_yaml_dict,
+            group_yaml_file,
+            width=float("inf"),
+            allow_unicode=True,
+            default_style='"',
+        )
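The group file name and task list written by the hunk above come down to two small string operations. Here is a standard-library sketch of them, assuming `save_prefix_path` is `ceval-valid` (inferred from the `_ceval-valid.yaml` file added in this commit; the argument's value is not shown in the diff). The helper names and the two-entry `SUBJECTS` excerpt with placeholder values are hypothetical.

```python
from typing import Any, Dict, Iterable


def build_group_config(subjects: Iterable[str]) -> Dict[str, Any]:
    """Build a dict with the same shape as group_yaml_dict in the hunk above."""
    return {
        "group": "ceval-valid",
        "task": [f"ceval-valid_{task_name}" for task_name in subjects],
        "aggregate_metric_list": [
            {"metric": "acc", "aggregation": "mean", "weight_by_size": True},
            {"metric": "acc_norm", "aggregation": "mean", "weight_by_size": True},
        ],
        "metadata": {"version": 1.0},
    }


def group_config_filename(save_prefix_path: str) -> str:
    """Same concatenation as file_save_path in the hunk above."""
    return "_" + save_prefix_path + ".yaml"


# Hypothetical excerpt; the real SUBJECTS mapping is elided in the diff.
SUBJECTS = {"computer_network": "placeholder", "operating_system": "placeholder"}

group_config = build_group_config(SUBJECTS.keys())
filename = group_config_filename("ceval-valid")  # assumed prefix value
```

Under that assumption the generated name matches the new group file in this commit, which would explain why `group:` could be dropped from `_default_ceval_yaml`.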
lm_eval/tasks/cmmlu/_cmmlu.yaml (new file, mode 100644)
+group: cmmlu
+task:
+  - cmmlu_agronomy
+  - cmmlu_anatomy
+  - cmmlu_ancient_chinese
+  - cmmlu_arts
+  - cmmlu_astronomy
+  - cmmlu_business_ethics
+  - cmmlu_chinese_civil_service_exam
+  - cmmlu_chinese_driving_rule
+  - cmmlu_chinese_food_culture
+  - cmmlu_chinese_foreign_policy
+  - cmmlu_chinese_history
+  - cmmlu_chinese_literature
+  - cmmlu_chinese_teacher_qualification
+  - cmmlu_clinical_knowledge
+  - cmmlu_college_actuarial_science
+  - cmmlu_college_education
+  - cmmlu_college_engineering_hydrology
+  - cmmlu_college_law
+  - cmmlu_college_mathematics
+  - cmmlu_college_medical_statistics
+  - cmmlu_college_medicine
+  - cmmlu_computer_science
+  - cmmlu_computer_security
+  - cmmlu_conceptual_physics
+  - cmmlu_construction_project_management
+  - cmmlu_economics
+  - cmmlu_education
+  - cmmlu_electrical_engineering
+  - cmmlu_elementary_chinese
+  - cmmlu_elementary_commonsense
+  - cmmlu_elementary_information_and_technology
+  - cmmlu_elementary_mathematics
+  - cmmlu_ethnology
+  - cmmlu_food_science
+  - cmmlu_genetics
+  - cmmlu_global_facts
+  - cmmlu_high_school_biology
+  - cmmlu_high_school_chemistry
+  - cmmlu_high_school_geography
+  - cmmlu_high_school_mathematics
+  - cmmlu_high_school_physics
+  - cmmlu_high_school_politics
+  - cmmlu_human_sexuality
+  - cmmlu_international_law
+  - cmmlu_journalism
+  - cmmlu_jurisprudence
+  - cmmlu_legal_and_moral_basis
+  - cmmlu_logical
+  - cmmlu_machine_learning
+  - cmmlu_management
+  - cmmlu_marketing
+  - cmmlu_marxist_theory
+  - cmmlu_modern_chinese
+  - cmmlu_nutrition
+  - cmmlu_philosophy
+  - cmmlu_professional_accounting
+  - cmmlu_professional_law
+  - cmmlu_professional_medicine
+  - cmmlu_professional_psychology
+  - cmmlu_public_relations
+  - cmmlu_security_study
+  - cmmlu_sociology
+  - cmmlu_sports_science
+  - cmmlu_traditional_chinese_medicine
+  - cmmlu_virology
+  - cmmlu_world_history
+  - cmmlu_world_religions
+aggregate_metric_list:
+  - aggregation: mean
+    metric: acc
+    weight_by_size: true
+  - aggregation: mean
+    metric: acc_norm
+    weight_by_size: true
+metadata:
+  version: 0.0
lm_eval/tasks/cmmlu/_default_template_yaml
-group: cmmlu
 dataset_path: haonan-li/cmmlu
 test_split: test
 fewshot_split: dev
 ...
lm_eval/tasks/cmmlu/_generate_configs.py
 """
 Take in a YAML, and output all other splits with this YAML
 """
 import argparse
 import os
 ...
@@ -131,3 +132,33 @@ if __name__ == "__main__":
                 allow_unicode=True,
                 default_style='"',
             )
+
+    # write group config out
+    group_yaml_dict = {
+        "group": "cmmlu",
+        "task": [
+            (
+                f"cmmlu_{args.task_prefix}_{subject_eng}"
+                if args.task_prefix != ""
+                else f"cmmlu_{subject_eng}"
+            )
+            for subject_eng in SUBJECTS.keys()
+        ],
+        "aggregate_metric_list": [
+            {"metric": "acc", "aggregation": "mean", "weight_by_size": True},
+            {"metric": "acc_norm", "aggregation": "mean", "weight_by_size": True},
+        ],
+        "metadata": {"version": 0.0},
+    }
+
+    file_save_path = "_" + args.save_prefix_path + ".yaml"
+
+    with open(file_save_path, "w", encoding="utf-8") as group_yaml_file:
+        yaml.dump(
+            group_yaml_dict,
+            group_yaml_file,
+            width=float("inf"),
+            allow_unicode=True,
+            default_style='"',
+        )
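The conditional expression in the `task` list above inserts `args.task_prefix` between `cmmlu_` and the subject name only when the prefix is non-empty. A small sketch of that naming rule in isolation follows; `group_task_names` is a hypothetical helper, the `SUBJECTS` excerpt is illustrative (the real mapping is elided in the diff), and `v2` is a made-up prefix.

```python
from typing import Iterable, List


def group_task_names(subjects: Iterable[str], task_prefix: str = "") -> List[str]:
    """Apply the same naming rule as the list comprehension in the hunk above."""
    names = []
    for subject_eng in subjects:
        if task_prefix != "":
            names.append(f"cmmlu_{task_prefix}_{subject_eng}")
        else:
            names.append(f"cmmlu_{subject_eng}")
    return names


# Hypothetical excerpt; keys are English subject ids, values are display names.
SUBJECTS = {"agronomy": "农学", "anatomy": "解剖学"}

default_names = group_task_names(SUBJECTS.keys())
prefixed_names = group_task_names(SUBJECTS.keys(), task_prefix="v2")
```

With an empty prefix the names line up with the `task` values in the per-subject files of this commit, such as `cmmlu_agronomy`.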
lm_eval/tasks/cmmlu/cmmlu_agronomy.yaml (new file, mode 100644)
+"dataset_name": "agronomy"
+"description": "以下是关于农学的单项选择题,请直接给出正确答案的选项。\n\n"
+"include": "_default_template_yaml"
+"task": "cmmlu_agronomy"
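Each per-subject file carries only four keys and pulls everything else from `_default_template_yaml` through `include`. A rough sketch of that layering follows, assuming keys in the including file override keys from the template. `resolve_include` and the in-memory `templates` mapping are hypothetical stand-ins for illustration, not the harness's loader, and the template excerpt holds only the three keys visible in this diff.

```python
from typing import Any, Dict


def resolve_include(
    task_cfg: Dict[str, Any], templates: Dict[str, Dict[str, Any]]
) -> Dict[str, Any]:
    """Merge a task config over the template named by its "include" key."""
    cfg = dict(task_cfg)  # copy so the caller's dict is left untouched
    include_name = cfg.pop("include", None)
    merged: Dict[str, Any] = {}
    if include_name is not None:
        merged.update(templates[include_name])  # template values first
    merged.update(cfg)  # per-subject keys win on conflict (assumed precedence)
    return merged


# Template excerpt: only the keys shown in the _default_template_yaml diff.
templates = {
    "_default_template_yaml": {
        "dataset_path": "haonan-li/cmmlu",
        "test_split": "test",
        "fewshot_split": "dev",
    }
}

task_cfg = {
    "dataset_name": "agronomy",
    "include": "_default_template_yaml",
    "task": "cmmlu_agronomy",
}

resolved = resolve_include(task_cfg, templates)
```

Under this reading, removing `group: cmmlu` from the template means a resolved subject config no longer declares a group itself; membership is listed once in `_cmmlu.yaml`.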
lm_eval/tasks/cmmlu/cmmlu_anatomy.yaml (new file, mode 100644)
+"dataset_name": "anatomy"
+"description": "以下是关于解剖学的单项选择题,请直接给出正确答案的选项。\n\n"
+"include": "_default_template_yaml"
+"task": "cmmlu_anatomy"
lm_eval/tasks/cmmlu/cmmlu_ancient_chinese.yaml (new file, mode 100644)
+"dataset_name": "ancient_chinese"
+"description": "以下是关于古汉语的单项选择题,请直接给出正确答案的选项。\n\n"
+"include": "_default_template_yaml"
+"task": "cmmlu_ancient_chinese"
lm_eval/tasks/cmmlu/cmmlu_arts.yaml (new file, mode 100644)
+"dataset_name": "arts"
+"description": "以下是关于艺术学的单项选择题,请直接给出正确答案的选项。\n\n"
+"include": "_default_template_yaml"
+"task": "cmmlu_arts"
lm_eval/tasks/cmmlu/cmmlu_astronomy.yaml (new file, mode 100644)
+"dataset_name": "astronomy"
+"description": "以下是关于天文学的单项选择题,请直接给出正确答案的选项。\n\n"
+"include": "_default_template_yaml"
+"task": "cmmlu_astronomy"
lm_eval/tasks/ammlu/ammlu_business_ethics.yaml → lm_eval/tasks/cmmlu/cmmlu_business_ethics.yaml
 "dataset_name": "business_ethics"
-"description": "فم بعملية التقييم في مجال علوم أخرى \n\n"
+"description": "以下是关于商业伦理的单项选择题,请直接给出正确答案的选项。\n\n"
 "include": "_default_template_yaml"
-"task": "ammlu_business_ethics"
+"task": "cmmlu_business_ethics"
lm_eval/tasks/cmmlu/cmmlu_chinese_civil_service_exam.yaml (new file, mode 100644)
+"dataset_name": "chinese_civil_service_exam"
+"description": "以下是关于中国公务员考试的单项选择题,请直接给出正确答案的选项。\n\n"
+"include": "_default_template_yaml"
+"task": "cmmlu_chinese_civil_service_exam"
lm_eval/tasks/cmmlu/cmmlu_chinese_driving_rule.yaml (new file, mode 100644)
+"dataset_name": "chinese_driving_rule"
+"description": "以下是关于中国驾驶规则的单项选择题,请直接给出正确答案的选项。\n\n"
+"include": "_default_template_yaml"
+"task": "cmmlu_chinese_driving_rule"
lm_eval/tasks/cmmlu/cmmlu_chinese_food_culture.yaml (new file, mode 100644)
+"dataset_name": "chinese_food_culture"
+"description": "以下是关于中国饮食文化的单项选择题,请直接给出正确答案的选项。\n\n"
+"include": "_default_template_yaml"
+"task": "cmmlu_chinese_food_culture"
lm_eval/tasks/cmmlu/cmmlu_chinese_foreign_policy.yaml (new file, mode 100644)
+"dataset_name": "chinese_foreign_policy"
+"description": "以下是关于中国外交政策的单项选择题,请直接给出正确答案的选项。\n\n"
+"include": "_default_template_yaml"
+"task": "cmmlu_chinese_foreign_policy"
lm_eval/tasks/cmmlu/cmmlu_chinese_history.yaml (new file, mode 100644)
+"dataset_name": "chinese_history"
+"description": "以下是关于中国历史的单项选择题,请直接给出正确答案的选项。\n\n"
+"include": "_default_template_yaml"
+"task": "cmmlu_chinese_history"
lm_eval/tasks/cmmlu/cmmlu_chinese_literature.yaml (new file, mode 100644)
+"dataset_name": "chinese_literature"
+"description": "以下是关于中国文学的单项选择题,请直接给出正确答案的选项。\n\n"
+"include": "_default_template_yaml"
+"task": "cmmlu_chinese_literature"
lm_eval/tasks/cmmlu/cmmlu_chinese_teacher_qualification.yaml (new file, mode 100644)
+"dataset_name": "chinese_teacher_qualification"
+"description": "以下是关于中国教师资格的单项选择题,请直接给出正确答案的选项。\n\n"
+"include": "_default_template_yaml"
+"task": "cmmlu_chinese_teacher_qualification"