gaoqiong / lm-evaluation-harness
Commit ded890f3 (Unverified)
Authored Mar 28, 2025 by Jinho Heo; committed by GitHub, Mar 28, 2025
Add kmmlu multiple-choice(accuracy) task (#2849)
Parent: febd19d8
Changes: 52
Showing 20 changed files with 109 additions and 0 deletions (+109 -0)
lm_eval/tasks/kmmlu/README.md  +1 -0
lm_eval/tasks/kmmlu/default/_default_kmmlu_yaml  +13 -0
lm_eval/tasks/kmmlu/default/_kmmlu_applied_science.yaml  +8 -0
lm_eval/tasks/kmmlu/default/_kmmlu_default.yaml  +11 -0
lm_eval/tasks/kmmlu/default/_kmmlu_humss.yaml  +8 -0
lm_eval/tasks/kmmlu/default/_kmmlu_other.yaml  +8 -0
lm_eval/tasks/kmmlu/default/_kmmlu_stem.yaml  +8 -0
lm_eval/tasks/kmmlu/default/kmmlu_accounting.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_agricultural_sciences.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_aviation_engineering_and_maintenance.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_biology.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_chemical_engineering.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_chemistry.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_civil_engineering.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_computer_science.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_construction.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_criminal_law.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_ecology.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_economics.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_education.yaml  +4 -0
lm_eval/tasks/kmmlu/README.md
...
...
@@ -32,6 +32,7 @@ Homepage: https://huggingface.co/datasets/HAERAE-HUB/KMMLU
#### Tasks
The following tasks evaluate subjects in the KMMLU dataset
- `kmmlu_{subject_english}`
- `kmmlu_direct_{subject_english}`
The following tasks evaluate subjects in the KMMLU-Hard dataset
...
...
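The `kmmlu_{subject_english}` pattern in the README can be illustrated with a small sketch. It assumes the task name is the dataset subset name lowercased with hyphens turned into underscores, which matches the files shown in this commit; the helper `kmmlu_task_name` is hypothetical and not part of the harness.

```python
def kmmlu_task_name(dataset_name: str) -> str:
    """Map a KMMLU subset name to the task name used in this commit's YAML files.

    Assumption: lowercase the subset name and replace hyphens with underscores,
    as observed in the file names listed on this page.
    """
    return "kmmlu_" + dataset_name.lower().replace("-", "_")


if __name__ == "__main__":
    for name in ["Accounting", "Criminal-Law", "Aviation-Engineering-and-Maintenance"]:
        print(name, "->", kmmlu_task_name(name))
```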
lm_eval/tasks/kmmlu/default/_default_kmmlu_yaml
0 → 100644
dataset_path: HAERAE-HUB/KMMLU
output_type: multiple_choice
test_split: test
fewshot_split: dev
doc_to_text: "{{question.strip()}}\nA. {{A}}\nB. {{B}}\nC. {{C}}\nD. {{D}}\n정답:"
doc_to_choice: ["A", "B", "C", "D"]
doc_to_target: "{{answer-1}}"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  version: 2.0
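To make the prompt and target fields above concrete, here is a minimal sketch that reproduces them by hand in plain Python rather than through the harness's Jinja rendering. The sample row is hypothetical, and the 1-based `answer` field is an assumption inferred from the `answer-1` expression in `doc_to_target`.

```python
CHOICES = ["A", "B", "C", "D"]  # mirrors doc_to_choice


def render_prompt(doc: dict) -> str:
    """Hand-written equivalent of the doc_to_text template."""
    return (
        f"{doc['question'].strip()}\n"
        f"A. {doc['A']}\nB. {doc['B']}\nC. {doc['C']}\nD. {doc['D']}\n"
        "정답:"
    )


def target_index(doc: dict) -> int:
    """Equivalent of "{{answer-1}}": 1-based answer -> 0-based choice index."""
    return int(doc["answer"]) - 1


# Hypothetical row, shaped like the fields the template references.
doc = {
    "question": " 2 + 2 = ? ",
    "A": "3", "B": "4", "C": "5", "D": "6",
    "answer": 2,
}

if __name__ == "__main__":
    print(render_prompt(doc))
    print("gold choice:", CHOICES[target_index(doc)])  # gold choice: B
```

With `output_type: multiple_choice`, the model is scored on which of the four letter continuations it prefers after the prompt, and `acc` counts how often that matches the gold index.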
lm_eval/tasks/kmmlu/default/_kmmlu_applied_science.yaml
0 → 100644
group: kmmlu_applied_science
task:
  - kmmlu_applied_science_tasks
aggregate_metric_list:
  - metric: acc
    weight_by_size: True
metadata:
  version: 2.0
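The effect of `weight_by_size: True` can be sketched as a size-weighted mean of subtask accuracies, compared with a plain mean. This is an illustration of the idea only; the function name and the numbers are hypothetical, and the harness's own aggregation code may differ in detail.

```python
def aggregate_acc(results, weight_by_size=True):
    """Aggregate subtask accuracies.

    results: list of (accuracy, n_docs) pairs, one per subtask.
    With weight_by_size=True each subtask counts in proportion to its size;
    otherwise every subtask counts equally.
    """
    if weight_by_size:
        total = sum(n for _, n in results)
        return sum(acc * n for acc, n in results) / total
    return sum(acc for acc, _ in results) / len(results)


# Hypothetical subtasks: (accuracy, number of test documents).
subtasks = [(0.50, 100), (0.75, 300)]

if __name__ == "__main__":
    print(aggregate_acc(subtasks, weight_by_size=True))   # 0.6875
    print(aggregate_acc(subtasks, weight_by_size=False))  # 0.625
```

Weighting by size makes the group score equal to the accuracy over all pooled questions, so large subjects are not averaged down to the same weight as small ones.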
lm_eval/tasks/kmmlu/default/_kmmlu_default.yaml
0 → 100644
group: kmmlu
task:
  - kmmlu_stem
  - kmmlu_other
  - kmmlu_applied_science
  - kmmlu_humss
aggregate_metric_list:
  - metric: acc
    weight_by_size: True
metadata:
  version: 2.0
lm_eval/tasks/kmmlu/default/_kmmlu_humss.yaml
0 → 100644
group: kmmlu_humss
task:
  - kmmlu_humss_tasks
aggregate_metric_list:
  - metric: acc
    weight_by_size: True
metadata:
  version: 2.0
lm_eval/tasks/kmmlu/default/_kmmlu_other.yaml
0 → 100644
group: kmmlu_other
task:
  - kmmlu_other_tasks
aggregate_metric_list:
  - metric: acc
    weight_by_size: True
metadata:
  version: 2.0
lm_eval/tasks/kmmlu/default/_kmmlu_stem.yaml
0 → 100644
group: kmmlu_stem
task:
  - kmmlu_stem_tasks
aggregate_metric_list:
  - metric: acc
    weight_by_size: True
metadata:
  version: 2.0
lm_eval/tasks/kmmlu/default/kmmlu_accounting.yaml
0 → 100644
dataset_name: Accounting
include: _default_kmmlu_yaml
task: kmmlu_accounting
tag: kmmlu_humss_tasks
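Each subject file is four lines because `include` pulls in the shared settings from `_default_kmmlu_yaml`. A rough sketch of that idea follows, assuming a shallow merge in which the subject file's keys override the base; the `resolve` helper is hypothetical and the harness's actual loader may behave differently.

```python
# Subset of the shared settings from _default_kmmlu_yaml (abbreviated).
BASE = {
    "dataset_path": "HAERAE-HUB/KMMLU",
    "output_type": "multiple_choice",
    "test_split": "test",
    "fewshot_split": "dev",
    "doc_to_choice": ["A", "B", "C", "D"],
}

# Contents of kmmlu_accounting.yaml as shown in this commit.
ACCOUNTING = {
    "dataset_name": "Accounting",
    "include": "_default_kmmlu_yaml",
    "task": "kmmlu_accounting",
    "tag": "kmmlu_humss_tasks",
}


def resolve(child: dict, base: dict) -> dict:
    """Shallow merge (assumption): start from the base, let the child override."""
    merged = dict(base)
    merged.update({k: v for k, v in child.items() if k != "include"})
    return merged


if __name__ == "__main__":
    for key, value in resolve(ACCOUNTING, BASE).items():
        print(f"{key}: {value}")
```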
lm_eval/tasks/kmmlu/default/kmmlu_agricultural_sciences.yaml
0 → 100644
dataset_name: Agricultural-Sciences
include: _default_kmmlu_yaml
task: kmmlu_agricultural_sciences
tag: kmmlu_other_tasks
lm_eval/tasks/kmmlu/default/kmmlu_aviation_engineering_and_maintenance.yaml
0 → 100644
dataset_name: Aviation-Engineering-and-Maintenance
include: _default_kmmlu_yaml
task: kmmlu_aviation_engineering_and_maintenance
tag: kmmlu_applied_science_tasks
lm_eval/tasks/kmmlu/default/kmmlu_biology.yaml
0 → 100644
dataset_name: Biology
include: _default_kmmlu_yaml
task: kmmlu_biology
tag: kmmlu_stem_tasks
lm_eval/tasks/kmmlu/default/kmmlu_chemical_engineering.yaml
0 → 100644
dataset_name: Chemical-Engineering
include: _default_kmmlu_yaml
task: kmmlu_chemical_engineering
tag: kmmlu_stem_tasks
lm_eval/tasks/kmmlu/default/kmmlu_chemistry.yaml
0 → 100644
dataset_name: Chemistry
include: _default_kmmlu_yaml
task: kmmlu_chemistry
tag: kmmlu_stem_tasks
lm_eval/tasks/kmmlu/default/kmmlu_civil_engineering.yaml
0 → 100644
dataset_name: Civil-Engineering
include: _default_kmmlu_yaml
task: kmmlu_civil_engineering
tag: kmmlu_stem_tasks
lm_eval/tasks/kmmlu/default/kmmlu_computer_science.yaml
0 → 100644
dataset_name: Computer-Science
include: _default_kmmlu_yaml
task: kmmlu_computer_science
tag: kmmlu_stem_tasks
lm_eval/tasks/kmmlu/default/kmmlu_construction.yaml
0 → 100644
dataset_name: Construction
include: _default_kmmlu_yaml
task: kmmlu_construction
tag: kmmlu_other_tasks
lm_eval/tasks/kmmlu/default/kmmlu_criminal_law.yaml
0 → 100644
dataset_name: Criminal-Law
include: _default_kmmlu_yaml
task: kmmlu_criminal_law
tag: kmmlu_humss_tasks
lm_eval/tasks/kmmlu/default/kmmlu_ecology.yaml
0 → 100644
dataset_name: Ecology
include: _default_kmmlu_yaml
task: kmmlu_ecology
tag: kmmlu_stem_tasks
lm_eval/tasks/kmmlu/default/kmmlu_economics.yaml
0 → 100644
dataset_name: Economics
include: _default_kmmlu_yaml
task: kmmlu_economics
tag: kmmlu_humss_tasks
lm_eval/tasks/kmmlu/default/kmmlu_education.yaml
0 → 100644
dataset_name: Education
include: _default_kmmlu_yaml
task: kmmlu_education
tag: kmmlu_humss_tasks
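The group files list a tag (for example `kmmlu_humss_tasks`) rather than individual tasks, and each subject file declares which tag it belongs to. The sketch below shows how a group could expand to its member tasks, using only the subject files visible on this page of the diff, so the lists are partial. The helper names are hypothetical and this is not the harness's own resolution code.

```python
from collections import defaultdict

# task -> tag, copied from the subject files shown on this page (partial list).
TASK_TAGS = {
    "kmmlu_accounting": "kmmlu_humss_tasks",
    "kmmlu_agricultural_sciences": "kmmlu_other_tasks",
    "kmmlu_aviation_engineering_and_maintenance": "kmmlu_applied_science_tasks",
    "kmmlu_biology": "kmmlu_stem_tasks",
    "kmmlu_chemical_engineering": "kmmlu_stem_tasks",
    "kmmlu_chemistry": "kmmlu_stem_tasks",
    "kmmlu_civil_engineering": "kmmlu_stem_tasks",
    "kmmlu_computer_science": "kmmlu_stem_tasks",
    "kmmlu_construction": "kmmlu_other_tasks",
    "kmmlu_criminal_law": "kmmlu_humss_tasks",
    "kmmlu_ecology": "kmmlu_stem_tasks",
    "kmmlu_economics": "kmmlu_humss_tasks",
    "kmmlu_education": "kmmlu_humss_tasks",
}

# group -> tag, copied from the _kmmlu_*.yaml group files.
GROUP_TAGS = {
    "kmmlu_applied_science": "kmmlu_applied_science_tasks",
    "kmmlu_humss": "kmmlu_humss_tasks",
    "kmmlu_other": "kmmlu_other_tasks",
    "kmmlu_stem": "kmmlu_stem_tasks",
}


def tasks_by_tag(task_tags: dict) -> dict:
    """Invert the task -> tag mapping into tag -> [tasks]."""
    grouped = defaultdict(list)
    for task, tag in task_tags.items():
        grouped[tag].append(task)
    return dict(grouped)


def expand_group(group: str) -> list:
    """Return the sorted tasks whose tag matches the tag listed by the group."""
    return sorted(tasks_by_tag(TASK_TAGS).get(GROUP_TAGS[group], []))


if __name__ == "__main__":
    for group in sorted(GROUP_TAGS):
        print(group, "->", expand_group(group))
```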