Added KMMLU evaluation method and changed ReadMe (#1447)

* update kmmlu default formatting
* Update _default_kmmlu_yaml
* Delete lm_eval/tasks/kmmlu/utils.py
* new tasks implemented
* add direct tasks
* update direct evaluate
* update direct eval
* add cot sample
* update cot
* add cot
* Update _cot_kmmlu_yaml
* add kmmlu90
* Update and rename _cot_kmmlu.yaml to _cot_kmmlu_yaml
* Create kmmlu90.yaml
* Update _cot_kmmlu_yaml
* add direct
* Update _cot_kmmlu_yaml
* Update and rename kmmlu90.yaml to kmmlu90_cot.yaml
* Update kmmlu90_direct.yaml
* add kmmlu hard
* Update _cot_kmmlu_yaml
* Update _cot_kmmlu_yaml
* update cot
* update cot
* erase typo
* Update _cot_kmmlu_yaml
* update cot
* Rename dataset to match k-mmlu-hard
* removed kmmlu90
* fixed name 'kmmlu_cot' to 'kmmlu_hard_cot' and revised README
* applied pre-commit before pull requests
* rename datasets and add notes
* Remove DS_Store cache
* Update lm_eval/tasks/kmmlu/README.md

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Change citations and reflect reviews on version
* Added kmmlu_hard and fixed other errors
* fixing minor errors
* remove duplicated
* Rename files
* try ".index"
* minor fix
* minor fix again
* fix revert.
* minor fix. thank for hailey

---------

Co-authored-by: GUIJIN SON <spthsrbwls123@yonsei.ac.kr>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

Commit c26a6ac7, authored Feb 21, 2024 by Hanwool Albert Lee; committed by GitHub Feb 21, 2024
20 changed files
--- a/lm_eval/tasks/kmmlu/direct/kmmlu_direct_math.yaml
+++ b/lm_eval/tasks/kmmlu/direct/kmmlu_direct_math.yaml
+dataset_name: Math
+include: _direct_kmmlu_yaml
+task: kmmlu_direct_math
--- a/lm_eval/tasks/kmmlu/direct/kmmlu_direct_mechanical_engineering.yaml
+++ b/lm_eval/tasks/kmmlu/direct/kmmlu_direct_mechanical_engineering.yaml
+dataset_name: Mechanical-Engineering
+include: _direct_kmmlu_yaml
+task: kmmlu_direct_mechanical_engineering
--- a/lm_eval/tasks/kmmlu/direct/kmmlu_direct_nondestructive_testing.yaml
+++ b/lm_eval/tasks/kmmlu/direct/kmmlu_direct_nondestructive_testing.yaml
+dataset_name: Nondestructive-Testing
+include: _direct_kmmlu_yaml
+task: kmmlu_direct_nondestructive_testing
--- a/lm_eval/tasks/kmmlu/direct/kmmlu_direct_patent.yaml
+++ b/lm_eval/tasks/kmmlu/direct/kmmlu_direct_patent.yaml
+dataset_name: Patent
+include: _direct_kmmlu_yaml
+task: kmmlu_direct_patent
--- a/lm_eval/tasks/kmmlu/direct/kmmlu_direct_political_science_and_sociology.yaml
+++ b/lm_eval/tasks/kmmlu/direct/kmmlu_direct_political_science_and_sociology.yaml
+dataset_name: Political-Science-and-Sociology
+include: _direct_kmmlu_yaml
+task: kmmlu_direct_political_science_and_sociology
--- a/lm_eval/tasks/kmmlu/direct/kmmlu_direct_psychology.yaml
+++ b/lm_eval/tasks/kmmlu/direct/kmmlu_direct_psychology.yaml
+dataset_name: Psychology
+include: _direct_kmmlu_yaml
+task: kmmlu_direct_psychology
--- a/lm_eval/tasks/kmmlu/direct/kmmlu_direct_public_safety.yaml
+++ b/lm_eval/tasks/kmmlu/direct/kmmlu_direct_public_safety.yaml
+dataset_name: Public-Safety
+include: _direct_kmmlu_yaml
+task: kmmlu_direct_public_safety
--- a/lm_eval/tasks/kmmlu/direct/kmmlu_direct_railway_and_automotive_engineering.yaml
+++ b/lm_eval/tasks/kmmlu/direct/kmmlu_direct_railway_and_automotive_engineering.yaml
+dataset_name: Railway-and-Automotive-Engineering
+include: _direct_kmmlu_yaml
+task: kmmlu_direct_railway_and_automotive_engineering
--- a/lm_eval/tasks/kmmlu/direct/kmmlu_direct_real_estate.yaml
+++ b/lm_eval/tasks/kmmlu/direct/kmmlu_direct_real_estate.yaml
+dataset_name: Real-Estate
+include: _direct_kmmlu_yaml
+task: kmmlu_direct_real_estate
--- a/lm_eval/tasks/kmmlu/direct/kmmlu_direct_refrigerating_machinery.yaml
+++ b/lm_eval/tasks/kmmlu/direct/kmmlu_direct_refrigerating_machinery.yaml
+dataset_name: Refrigerating-Machinery
+include: _direct_kmmlu_yaml
+task: kmmlu_direct_refrigerating_machinery
--- a/lm_eval/tasks/kmmlu/direct/kmmlu_direct_social_welfare.yaml
+++ b/lm_eval/tasks/kmmlu/direct/kmmlu_direct_social_welfare.yaml
+dataset_name: Social-Welfare
+include: _direct_kmmlu_yaml
+task: kmmlu_direct_social_welfare
--- a/lm_eval/tasks/kmmlu/direct/kmmlu_direct_taxation.yaml
+++ b/lm_eval/tasks/kmmlu/direct/kmmlu_direct_taxation.yaml
+dataset_name: Taxation
+include: _direct_kmmlu_yaml
+task: kmmlu_direct_taxation
--- a/lm_eval/tasks/kmmlu/direct/kmmlu_direct_telecommunications_and_wireless_technology.yaml
+++ b/lm_eval/tasks/kmmlu/direct/kmmlu_direct_telecommunications_and_wireless_technology.yaml
+dataset_name: Telecommunications-and-Wireless-Technology
+include: _direct_kmmlu_yaml
+task: kmmlu_direct_telecommunications_and_wireless_technology
--- a/lm_eval/tasks/kmmlu/direct_hard/_direct_hard_kmmlu_yaml
+++ b/lm_eval/tasks/kmmlu/direct_hard/_direct_hard_kmmlu_yaml
+group:
+    - kmmlu
+    - kmmlu_hard_direct
+dataset_path: HAERAE-HUB/KMMLU-HARD
+output_type: generate_until
+test_split: test
+fewshot_split: dev
+doc_to_text: "{{question.strip()}}\nA. {{A}}\nB. {{B}}\nC. {{C}}\nD. {{D}}\n정답："
+doc_to_target: "{{['A', 'B', 'C', 'D'][answer-1]}}"
+metric_list:
+  - metric: exact_match
+    aggregation: mean
+    higher_is_better: true
+    ignore_case: true
+    ignore_punctuation: true
+    regexes_to_ignore:
+          - " "
+generation_kwargs:
+  until:
+    - "Q:"
+    - "\n\n"
+    - "</s>"
+    - "."
+  do_sample: false
+  temperature: 0.0
+metadata:
+  version: 2.0
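
The base config above can be read as three steps: render a four-choice prompt, cut the generation at the first stop sequence, and compare it to the gold letter after normalization. The following is a minimal Python sketch of that reading, not the harness's own implementation. The helper names (`render_prompt`, `render_target`, `truncate`, `normalize`, `exact_match`) and the sample document are hypothetical, and the normalization order is an assumption.

```python
import re
import string

CHOICES = ["A", "B", "C", "D"]
STOPS = ("Q:", "\n\n", "</s>", ".")  # the `until` list from generation_kwargs


def render_prompt(doc):
    """Mirror the doc_to_text template: question, four options, answer cue."""
    return (
        f"{doc['question'].strip()}\n"
        f"A. {doc['A']}\nB. {doc['B']}\nC. {doc['C']}\nD. {doc['D']}\n정답："
    )


def render_target(doc):
    """Mirror doc_to_target: `answer` is 1-based, so subtract 1 to index."""
    return CHOICES[doc["answer"] - 1]


def truncate(generation, stops=STOPS):
    """Keep only the text before the earliest stop sequence, if any occurs."""
    cut = len(generation)
    for stop in stops:
        idx = generation.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return generation[:cut]


def normalize(text):
    """Assumed normalization: drop spaces, lowercase, strip ASCII punctuation."""
    text = re.sub(" ", "", text)  # regexes_to_ignore: " "
    text = text.lower()  # ignore_case: true
    return text.translate(str.maketrans("", "", string.punctuation))


def exact_match(prediction, reference):
    """Return 1.0 when the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


# Hypothetical document, for illustration only.
doc = {"question": " 1 + 1 = ? ", "A": "1", "B": "2", "C": "3", "D": "4", "answer": 2}
prompt = render_prompt(doc)
target = render_target(doc)
score = exact_match(truncate(" b.\n\nQ: next question"), target)
```

Because `"."` is one of the stop sequences, a generation such as `" b."` is cut to `" b"` before scoring, and the space and case handling then reduce it to `b`, which matches the normalized gold letter.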
--- a/lm_eval/tasks/kmmlu/direct_hard/kmmlu_direct_hard_accounting.yaml
+++ b/lm_eval/tasks/kmmlu/direct_hard/kmmlu_direct_hard_accounting.yaml
+dataset_name: accounting
+include: _direct_hard_kmmlu_yaml
+task: kmmlu_hard_direct_accounting
--- a/lm_eval/tasks/kmmlu/direct_hard/kmmlu_direct_hard_agricultural_sciences.yaml
+++ b/lm_eval/tasks/kmmlu/direct_hard/kmmlu_direct_hard_agricultural_sciences.yaml
+dataset_name: agricultural_sciences
+include: _direct_hard_kmmlu_yaml
+task: kmmlu_hard_direct_agricultural_sciences
--- a/lm_eval/tasks/kmmlu/direct_hard/kmmlu_direct_hard_aviation_engineering_and_maintenance.yaml
+++ b/lm_eval/tasks/kmmlu/direct_hard/kmmlu_direct_hard_aviation_engineering_and_maintenance.yaml
+dataset_name: aviation_engineering_and_maintenance
+include: _direct_hard_kmmlu_yaml
+task: kmmlu_hard_direct_aviation_engineering_and_maintenance
--- a/lm_eval/tasks/kmmlu/direct_hard/kmmlu_direct_hard_biology.yaml
+++ b/lm_eval/tasks/kmmlu/direct_hard/kmmlu_direct_hard_biology.yaml
+dataset_name: biology
+include: _direct_hard_kmmlu_yaml
+task: kmmlu_hard_direct_biology
--- a/lm_eval/tasks/kmmlu/direct_hard/kmmlu_direct_hard_chemical_engineering.yaml
+++ b/lm_eval/tasks/kmmlu/direct_hard/kmmlu_direct_hard_chemical_engineering.yaml
+dataset_name: chemical_engineering
+include: _direct_hard_kmmlu_yaml
+task: kmmlu_hard_direct_chemical_engineering
--- a/lm_eval/tasks/kmmlu/direct_hard/kmmlu_direct_hard_chemistry.yaml
+++ b/lm_eval/tasks/kmmlu/direct_hard/kmmlu_direct_hard_chemistry.yaml
+dataset_name: chemistry
+include: _direct_hard_kmmlu_yaml
+task: kmmlu_hard_direct_chemistry
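
Each per-subject file above carries only `dataset_name`, `include`, and `task`, and relies on the shared `_direct_hard_kmmlu_yaml` base for everything else. A minimal sketch of how such a layering could resolve is shown below, assuming that keys in the subject file take precedence over the included base. The `resolve_include` helper is hypothetical and stands in for whatever loader the harness actually uses; only a subset of the base keys is reproduced.

```python
def resolve_include(base, overrides):
    """Merge a subject config over its included base config.

    Assumption: keys in the subject file win over the base, and the
    `include` key itself is consumed rather than kept in the result.
    """
    merged = dict(base)
    merged.update({k: v for k, v in overrides.items() if k != "include"})
    return merged


# Subset of the shared base shown in _direct_hard_kmmlu_yaml.
base = {
    "dataset_path": "HAERAE-HUB/KMMLU-HARD",
    "output_type": "generate_until",
    "test_split": "test",
    "fewshot_split": "dev",
}

# Contents of kmmlu_direct_hard_chemistry.yaml.
subject = {
    "dataset_name": "chemistry",
    "include": "_direct_hard_kmmlu_yaml",
    "task": "kmmlu_hard_direct_chemistry",
}

config = resolve_include(base, subject)
```

Under that assumption, the resolved chemistry task combines the dataset path, splits, and output type of the base with its own `dataset_name` and `task`, which is why each new subject only needs a three-line file.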