Commit 558d0d71 (unverified) authored by Baber Abbasi, committed by GitHub

mmlu-pro: add newlines to task descriptions (not leaderboard) (#2334)



* add newlines to task descriptions; increment versions

* fix task tests (with groups)

* Apply suggestions from code review

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
parent 7d242381
@@ -57,3 +57,8 @@ If other tasks on this dataset are already supported:
 * [ ] Is the "Main" variant of this task clearly denoted?
 * [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
 * [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
+### Changelog
+* (tasks, group) 2024-09-23 -- (version 1 --> version 2)
+  * Added one newline to task description(s) as per [reference implementation](https://github.com/TIGER-AI-Lab/MMLU-Pro/blob/47b9891aacb8bd7cda29d5c5ba17b9434dd333bc/evaluate_from_local.py#L93)
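This sketch shows why a single trailing newline matters. It assumes the harness prepends the task `description` directly to the first few-shot example and adds no separator of its own, which is the premise of this change. Under that assumption, the old string ran straight into the first question. The prompt builder below is a simplified, hypothetical stand-in, not lm_eval's actual prompt-construction code. The example strings and the `"\n\n"` example delimiter are placeholders.

```python
from typing import List

# Description strings before and after this commit (biology shown).
OLD_DESCRIPTION = (
    "The following are multiple choice questions (with answers) about biology. "
    'Think step by step and then finish your answer with "the answer is (X)" '
    "where X is the correct letter choice."
)
NEW_DESCRIPTION = OLD_DESCRIPTION + "\n"


def build_prompt(description: str, examples: List[str], question: str,
                 delimiter: str = "\n\n") -> str:
    """Hypothetical builder: description + delimited few-shot examples + target question.

    Assumption: nothing is inserted between the description and the first
    example, so any line break there must come from the description itself.
    """
    return description + delimiter.join(examples + [question])


# Placeholder few-shot example and target question (not real MMLU-Pro items).
examples = ["Question: <example question>\nAnswer: ... the answer is (A)."]
question = "Question: <target question>\nAnswer: Let's think step by step."

old_prompt = build_prompt(OLD_DESCRIPTION, examples, question)
new_prompt = build_prompt(NEW_DESCRIPTION, examples, question)

# Before: the description and the first question share one line.
print(old_prompt.splitlines()[0].endswith("Question: <example question>"))  # True
# After: the first question starts on its own line.
print(new_prompt.splitlines()[1])  # Question: <example question>
```

The only difference between the two prompts is the character at the description/example boundary. Everything after it is identical.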
@@ -30,4 +30,4 @@ metric_list:
     ignore_case: true
     ignore_punctuation: true
 metadata:
-  version: 0.0
+  version: 1.0
@@ -20,4 +20,4 @@ aggregate_metric_list:
     weight_by_size: true
     filter_list: custom-extract
 metadata:
-  version: 1.0
+  version: 2.0
-description: "The following are multiple choice questions (with answers) about biology. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice."
+description: "The following are multiple choice questions (with answers) about biology. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice.\n"
 include: "_default_template_yaml"
 task: "mmlu_pro_biology"
 task_alias: "biology"
......
-description: "The following are multiple choice questions (with answers) about business. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice."
+description: "The following are multiple choice questions (with answers) about business. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice.\n"
 include: "_default_template_yaml"
 task: "mmlu_pro_business"
 task_alias: "business"
......
-description: "The following are multiple choice questions (with answers) about chemistry. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice."
+description: "The following are multiple choice questions (with answers) about chemistry. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice.\n"
 include: "_default_template_yaml"
 task: "mmlu_pro_chemistry"
 task_alias: "chemistry"
......
-description: "The following are multiple choice questions (with answers) about computer science. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice."
+description: "The following are multiple choice questions (with answers) about computer science. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice.\n"
 include: "_default_template_yaml"
 task: "mmlu_pro_computer_science"
 task_alias: "computer_science"
......
-description: "The following are multiple choice questions (with answers) about economics. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice."
+description: "The following are multiple choice questions (with answers) about economics. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice.\n"
 include: "_default_template_yaml"
 task: "mmlu_pro_economics"
 task_alias: "economics"
......
-description: "The following are multiple choice questions (with answers) about engineering. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice."
+description: "The following are multiple choice questions (with answers) about engineering. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice.\n"
 include: "_default_template_yaml"
 task: "mmlu_pro_engineering"
 task_alias: "engineering"
......
-description: "The following are multiple choice questions (with answers) about health. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice."
+description: "The following are multiple choice questions (with answers) about health. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice.\n"
 include: "_default_template_yaml"
 task: "mmlu_pro_health"
 task_alias: "health"
......
-description: "The following are multiple choice questions (with answers) about history. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice."
+description: "The following are multiple choice questions (with answers) about history. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice.\n"
 include: "_default_template_yaml"
 task: "mmlu_pro_history"
 task_alias: "history"
......
-description: "The following are multiple choice questions (with answers) about law. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice."
+description: "The following are multiple choice questions (with answers) about law. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice.\n"
 include: "_default_template_yaml"
 task: "mmlu_pro_law"
 task_alias: "law"
......
-description: "The following are multiple choice questions (with answers) about math. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice."
+description: "The following are multiple choice questions (with answers) about math. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice.\n"
 include: "_default_template_yaml"
 task: "mmlu_pro_math"
 task_alias: "math"
......
-description: "The following are multiple choice questions (with answers) about other. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice."
+description: "The following are multiple choice questions (with answers) about other. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice.\n"
 include: "_default_template_yaml"
 task: "mmlu_pro_other"
 task_alias: "other"
......
-description: "The following are multiple choice questions (with answers) about philosophy. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice."
+description: "The following are multiple choice questions (with answers) about philosophy. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice.\n"
 include: "_default_template_yaml"
 task: "mmlu_pro_philosophy"
 task_alias: "philosophy"
......
-description: "The following are multiple choice questions (with answers) about physics. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice."
+description: "The following are multiple choice questions (with answers) about physics. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice.\n"
 include: "_default_template_yaml"
 task: "mmlu_pro_physics"
 task_alias: "physics"
......
-description: "The following are multiple choice questions (with answers) about psychology. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice."
+description: "The following are multiple choice questions (with answers) about psychology. Think step by step and then finish your answer with \"the answer is (X)\" where X is the correct letter choice.\n"
 include: "_default_template_yaml"
 task: "mmlu_pro_psychology"
 task_alias: "psychology"
......
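The fourteen per-subject configs in this diff differ only in the subject name. The hypothetical generator below illustrates that pattern. It also shows how the `\n` escape sits inside a double-quoted YAML string. It is not a script from this commit; `render_subject_config` and `SUBJECTS` are names introduced here for illustration. It relies on the fact that `json.dumps` emits a double-quoted string whose `\"` and `\n` escapes YAML also accepts in a double-quoted scalar.

```python
import json

# Subject keys as they appear in the task names in this diff.
SUBJECTS = [
    "biology", "business", "chemistry", "computer_science", "economics",
    "engineering", "health", "history", "law", "math", "other",
    "philosophy", "physics", "psychology",
]


def render_subject_config(subject: str) -> str:
    """Render one per-subject config in the four-line shape used by this diff (hypothetical)."""
    readable = subject.replace("_", " ")  # e.g. "computer_science" -> "computer science"
    description = (
        f"The following are multiple choice questions (with answers) about {readable}. "
        'Think step by step and then finish your answer with "the answer is (X)" '
        "where X is the correct letter choice.\n"  # trailing newline added by this commit
    )
    lines = [
        # json.dumps yields a double-quoted string with \" and \n escapes,
        # which a YAML loader reads back as the same string.
        f"description: {json.dumps(description)}",
        'include: "_default_template_yaml"',
        f'task: "mmlu_pro_{subject}"',
        f'task_alias: "{subject}"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for subject in SUBJECTS:
        print(render_subject_config(subject))
```

Generating the description from one template keeps the trailing `\n` identical across all subjects.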
@@ -5,6 +5,7 @@ import pytest
 import lm_eval.tasks as tasks
 from lm_eval.api.task import ConfigurableTask
+from lm_eval.evaluator_utils import get_task_list
 from .utils import new_tasks
@@ -20,10 +21,11 @@ def task_class():
     # CI: new_tasks checks if any modifications have been made
     task_classes = new_tasks()
     # Check if task_classes is empty
-    if task_classes:
-        return list(task_manager.load_task_or_group(task_classes).values())
-    else:
-        return list(task_manager.load_task_or_group(TASKS).values())
+    task_classes = task_classes if task_classes else TASKS
+    res = tasks.get_task_dict(task_classes, task_manager)
+    res = [x.task for x in get_task_list(res)]
+
+    return res
 @pytest.fixture()
......
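According to the commit message ("fix task tests (with groups)"), the fixture rewrite targets groups. When a group is requested, the returned task dictionary can nest the group's subtasks in an inner dict. Calling `.values()` on the top level can then yield a dict instead of a task. `get_task_list` flattens that structure, and the fixture takes `.task` from each entry.

The self-contained sketch below imitates the flattening with plain dicts and string keys. `TaskOutput`, `flatten_task_dict`, and `select_tasks` are hypothetical stand-ins, not lm_eval's implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class TaskOutput:
    """Hypothetical stand-in for a per-task wrapper; only `.task` is modelled."""
    name: str
    task: Any


def flatten_task_dict(task_dict: Dict[str, Any]) -> List[TaskOutput]:
    """Recursively collect leaf tasks; a nested dict stands for a group."""
    outputs: List[TaskOutput] = []
    for name, obj in task_dict.items():
        if isinstance(obj, dict):
            outputs.extend(flatten_task_dict(obj))  # descend into the group
        else:
            outputs.append(TaskOutput(name=name, task=obj))
    return outputs


def select_tasks(changed: List[str], default: List[str]) -> List[str]:
    """Same fallback as the fixture: test the changed tasks if any, else the defaults."""
    return changed if changed else default


# Placeholder strings stand in for task objects.
task_dict = {
    "mmlu_pro": {
        "mmlu_pro_biology": "<biology task>",
        "mmlu_pro_law": "<law task>",
    },
    "standalone_task": "<standalone task>",
}

old_style = list(task_dict.values())  # first element is the group's dict, not a task
new_style = [x.task for x in flatten_task_dict(task_dict)]
print(new_style)  # ['<biology task>', '<law task>', '<standalone task>']
```

Flattening recursively lets the fixture handle groups nested at any depth without special-casing them.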