gaoqiong/lm-evaluation-harness: Commit ded890f3 (unverified)
Authored Mar 28, 2025 by Jinho Heo; committed by GitHub on Mar 28, 2025
Add kmmlu multiple-choice(accuracy) task (#2849)
Parent: febd19d8

Changes: 52
Showing 12 changed files with 48 additions and 0 deletions (+48 -0)

lm_eval/tasks/kmmlu/default/kmmlu_mechanical_engineering.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_nondestructive_testing.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_patent.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_political_science_and_sociology.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_psychology.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_public_safety.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_railway_and_automotive_engineering.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_real_estate.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_refrigerating_machinery.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_social_welfare.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_taxation.yaml  +4 -0
lm_eval/tasks/kmmlu/default/kmmlu_telecommunications_and_wireless_technology.yaml  +4 -0

lm_eval/tasks/kmmlu/default/kmmlu_mechanical_engineering.yaml (new file, mode 100644)
dataset_name: Mechanical-Engineering
include: _default_kmmlu_yaml
task: kmmlu_mechanical_engineering
tag: kmmlu_stem_tasks
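
Each file in this commit sets only four keys (`dataset_name`, `include`, `task`, `tag`) and inherits everything else from the shared `_default_kmmlu_yaml` through `include`. That base file is not shown on this page. The fragment below is a hypothetical sketch of the kind of fields a multiple-choice, accuracy-scored base config in lm-evaluation-harness typically carries. The dataset id, split names, column names (`question`, `A` to `D`, `answer`), prompt wording, and version are all assumptions, not the actual file contents.

```yaml
# Hypothetical sketch only: NOT the actual contents of _default_kmmlu_yaml.
# Dataset id, splits, column names, prompt, and version are assumptions.
dataset_path: HAERAE-HUB/KMMLU      # assumed Hugging Face dataset id
output_type: multiple_choice        # each choice is scored by log-likelihood
test_split: test
fewshot_split: dev
doc_to_text: "{{question}}\nA. {{A}}\nB. {{B}}\nC. {{C}}\nD. {{D}}\nAnswer:"
doc_to_choice: ["A", "B", "C", "D"]
doc_to_target: "{{answer - 1}}"     # index into doc_to_choice; assumes a 1-based answer column
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0                      # illustrative value
```

With a base like this, a per-subject file only needs to pick the dataset subset (`dataset_name`), name the task, and attach a tag, which is exactly the four-line pattern used throughout this commit.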

lm_eval/tasks/kmmlu/default/kmmlu_nondestructive_testing.yaml (new file, mode 100644)
dataset_name: Nondestructive-Testing
include: _default_kmmlu_yaml
task: kmmlu_nondestructive_testing
tag: kmmlu_applied_science_tasks

lm_eval/tasks/kmmlu/default/kmmlu_patent.yaml (new file, mode 100644)
dataset_name: Patent
include: _default_kmmlu_yaml
task: kmmlu_patent
tag: kmmlu_other_tasks

lm_eval/tasks/kmmlu/default/kmmlu_political_science_and_sociology.yaml (new file, mode 100644)
dataset_name: Political-Science-and-Sociology
include: _default_kmmlu_yaml
task: kmmlu_political_science_and_sociology
tag: kmmlu_humss_tasks

lm_eval/tasks/kmmlu/default/kmmlu_psychology.yaml (new file, mode 100644)
dataset_name: Psychology
include: _default_kmmlu_yaml
task: kmmlu_psychology
tag: kmmlu_humss_tasks

lm_eval/tasks/kmmlu/default/kmmlu_public_safety.yaml (new file, mode 100644)
dataset_name: Public-Safety
include: _default_kmmlu_yaml
task: kmmlu_public_safety
tag: kmmlu_other_tasks

lm_eval/tasks/kmmlu/default/kmmlu_railway_and_automotive_engineering.yaml (new file, mode 100644)
dataset_name: Railway-and-Automotive-Engineering
include: _default_kmmlu_yaml
task: kmmlu_railway_and_automotive_engineering
tag: kmmlu_applied_science_tasks

lm_eval/tasks/kmmlu/default/kmmlu_real_estate.yaml (new file, mode 100644)
dataset_name: Real-Estate
include: _default_kmmlu_yaml
task: kmmlu_real_estate
tag: kmmlu_other_tasks

lm_eval/tasks/kmmlu/default/kmmlu_refrigerating_machinery.yaml (new file, mode 100644)
dataset_name: Refrigerating-Machinery
include: _default_kmmlu_yaml
task: kmmlu_refrigerating_machinery
tag: kmmlu_other_tasks

lm_eval/tasks/kmmlu/default/kmmlu_social_welfare.yaml (new file, mode 100644)
dataset_name: Social-Welfare
include: _default_kmmlu_yaml
task: kmmlu_social_welfare
tag: kmmlu_humss_tasks

lm_eval/tasks/kmmlu/default/kmmlu_taxation.yaml (new file, mode 100644)
dataset_name: Taxation
include: _default_kmmlu_yaml
task: kmmlu_taxation
tag: kmmlu_humss_tasks

lm_eval/tasks/kmmlu/default/kmmlu_telecommunications_and_wireless_technology.yaml (new file, mode 100644)
dataset_name: Telecommunications-and-Wireless-Technology
include: _default_kmmlu_yaml
task: kmmlu_telecommunications_and_wireless_technology
tag: kmmlu_applied_science_tasks
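
The `tag` values on this page sort the subjects into four categories: `kmmlu_stem_tasks`, `kmmlu_applied_science_tasks`, `kmmlu_humss_tasks`, and `kmmlu_other_tasks`. A tag lets every task carrying it be selected under one name. If a single aggregated score per category is wanted, lm-evaluation-harness group configs can list a tag in their `task` list. The sketch below is hypothetical: the group name `kmmlu_humss`, the size weighting, and the version are illustrative assumptions, not files from this commit.

```yaml
# Hypothetical group config; name, weighting, and version are illustrative.
group: kmmlu_humss
task:
  - kmmlu_humss_tasks        # expands to every task tagged kmmlu_humss_tasks
aggregate_metric_list:
  - metric: acc
    aggregation: mean
    weight_by_size: true     # weight each subject by its number of examples
metadata:
  version: 1.0
```

Either the tag or such a group can then be requested by name on the command line, for example `lm_eval --model hf --model_args pretrained=<model> --tasks kmmlu_humss_tasks`.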