gaoqiong / lm-evaluation-harness
Commit ded890f3 (Unverified)
Authored Mar 28, 2025 by Jinho Heo; committed by GitHub, Mar 28, 2025
Add kmmlu multiple-choice(accuracy) task (#2849)
Parent: febd19d8
Changes: 52
Showing 20 changed files with 80 additions and 0 deletions
lm_eval/tasks/kmmlu/default/kmmlu_electrical_engineering.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_electronics_engineering.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_energy_management.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_environmental_science.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_fashion.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_food_processing.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_gas_technology_and_engineering.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_geomatics.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_health.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_industrial_engineer.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_information_technology.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_interior_architecture_and_design.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_korean_history.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_law.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_machine_design_and_manufacturing.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_management.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_maritime_engineering.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_marketing.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_materials_engineering.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_math.yaml (+4 -0)
lm_eval/tasks/kmmlu/default/kmmlu_electrical_engineering.yaml
0 → 100644
dataset_name: Electrical-Engineering
include: _default_kmmlu_yaml
task: kmmlu_electrical_engineering
tag: kmmlu_stem_tasks
lm_eval/tasks/kmmlu/default/kmmlu_electronics_engineering.yaml
0 → 100644
dataset_name: Electronics-Engineering
include: _default_kmmlu_yaml
task: kmmlu_electronics_engineering
tag: kmmlu_applied_science_tasks
lm_eval/tasks/kmmlu/default/kmmlu_energy_management.yaml
0 → 100644
dataset_name: Energy-Management
include: _default_kmmlu_yaml
task: kmmlu_energy_management
tag: kmmlu_applied_science_tasks
lm_eval/tasks/kmmlu/default/kmmlu_environmental_science.yaml
0 → 100644
dataset_name: Environmental-Science
include: _default_kmmlu_yaml
task: kmmlu_environmental_science
tag: kmmlu_applied_science_tasks
lm_eval/tasks/kmmlu/default/kmmlu_fashion.yaml
0 → 100644
dataset_name: Fashion
include: _default_kmmlu_yaml
task: kmmlu_fashion
tag: kmmlu_other_tasks
lm_eval/tasks/kmmlu/default/kmmlu_food_processing.yaml
0 → 100644
dataset_name: Food-Processing
include: _default_kmmlu_yaml
task: kmmlu_food_processing
tag: kmmlu_other_tasks
lm_eval/tasks/kmmlu/default/kmmlu_gas_technology_and_engineering.yaml
0 → 100644
dataset_name: Gas-Technology-and-Engineering
include: _default_kmmlu_yaml
task: kmmlu_gas_technology_and_engineering
tag: kmmlu_applied_science_tasks
lm_eval/tasks/kmmlu/default/kmmlu_geomatics.yaml
0 → 100644
dataset_name: Geomatics
include: _default_kmmlu_yaml
task: kmmlu_geomatics
tag: kmmlu_applied_science_tasks
lm_eval/tasks/kmmlu/default/kmmlu_health.yaml
0 → 100644
dataset_name: Health
include: _default_kmmlu_yaml
task: kmmlu_health
tag: kmmlu_other_tasks
lm_eval/tasks/kmmlu/default/kmmlu_industrial_engineer.yaml
0 → 100644
dataset_name: Industrial-Engineer
include: _default_kmmlu_yaml
task: kmmlu_industrial_engineer
tag: kmmlu_applied_science_tasks
lm_eval/tasks/kmmlu/default/kmmlu_information_technology.yaml
0 → 100644
dataset_name: Information-Technology
include: _default_kmmlu_yaml
task: kmmlu_information_technology
tag: kmmlu_stem_tasks
lm_eval/tasks/kmmlu/default/kmmlu_interior_architecture_and_design.yaml
0 → 100644
dataset_name: Interior-Architecture-and-Design
include: _default_kmmlu_yaml
task: kmmlu_interior_architecture_and_design
tag: kmmlu_other_tasks
lm_eval/tasks/kmmlu/default/kmmlu_korean_history.yaml
0 → 100644
dataset_name: Korean-History
include: _default_kmmlu_yaml
task: kmmlu_korean_history
tag: kmmlu_humss_tasks
lm_eval/tasks/kmmlu/default/kmmlu_law.yaml
0 → 100644
dataset_name: Law
include: _default_kmmlu_yaml
task: kmmlu_law
tag: kmmlu_humss_tasks
lm_eval/tasks/kmmlu/default/kmmlu_machine_design_and_manufacturing.yaml
0 → 100644
dataset_name: Machine-Design-and-Manufacturing
include: _default_kmmlu_yaml
task: kmmlu_machine_design_and_manufacturing
tag: kmmlu_applied_science_tasks
lm_eval/tasks/kmmlu/default/kmmlu_management.yaml
0 → 100644
dataset_name: Management
include: _default_kmmlu_yaml
task: kmmlu_management
tag: kmmlu_humss_tasks
lm_eval/tasks/kmmlu/default/kmmlu_maritime_engineering.yaml
0 → 100644
dataset_name: Maritime-Engineering
include: _default_kmmlu_yaml
task: kmmlu_maritime_engineering
tag: kmmlu_applied_science_tasks
lm_eval/tasks/kmmlu/default/kmmlu_marketing.yaml
0 → 100644
dataset_name: Marketing
include: _default_kmmlu_yaml
task: kmmlu_marketing
tag: kmmlu_other_tasks
lm_eval/tasks/kmmlu/default/kmmlu_materials_engineering.yaml
0 → 100644
dataset_name: Materials-Engineering
include: _default_kmmlu_yaml
task: kmmlu_materials_engineering
tag: kmmlu_stem_tasks
lm_eval/tasks/kmmlu/default/kmmlu_math.yaml
0 → 100644
dataset_name: Math
include: _default_kmmlu_yaml
task: kmmlu_math
tag: kmmlu_stem_tasks
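
Every file on this page follows the same four-key pattern: a hyphenated dataset_name, a shared include, a task name that looks like the lowercased dataset name with a kmmlu_ prefix, and a category tag. The sketch below illustrates that pattern only; it is not the script the author used, and the helper names (task_name, render_config) and the SUBJECT_TAGS mapping are hypothetical, with the mapping values copied from four of the files above.

```python
# Hypothetical helper: reproduces the four-key layout seen in this commit's
# per-subject YAML files. Not part of lm-evaluation-harness itself.

# Subject -> tag pairs copied from four of the files shown on this page.
SUBJECT_TAGS = {
    "Electrical-Engineering": "kmmlu_stem_tasks",
    "Geomatics": "kmmlu_applied_science_tasks",
    "Korean-History": "kmmlu_humss_tasks",
    "Fashion": "kmmlu_other_tasks",
}


def task_name(subject: str) -> str:
    """Derive the task name: lowercase, hyphens to underscores, kmmlu_ prefix."""
    return "kmmlu_" + subject.lower().replace("-", "_")


def render_config(subject: str, tag: str) -> str:
    """Render the four-line YAML body for one subject."""
    return (
        f"dataset_name: {subject}\n"
        f"include: _default_kmmlu_yaml\n"
        f"task: {task_name(subject)}\n"
        f"tag: {tag}\n"
    )


if __name__ == "__main__":
    for subject, tag in SUBJECT_TAGS.items():
        # Print the file name each body would be written to, then the body.
        print(f"# {task_name(subject)}.yaml")
        print(render_config(subject, tag))
```

The tag has to come from an explicit mapping because it cannot be derived from the subject name: Electrical-Engineering is tagged kmmlu_stem_tasks while Electronics-Engineering is tagged kmmlu_applied_science_tasks.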