Commit adaa79ec authored by Baber

add math_verify to minerva_math and leaderboard_math

parent 47051bd8
-dataset_path: lighteval/MATH-Hard
+dataset_path: EleutherAI/hendrycks_math
 process_docs: !function utils.process_docs
 output_type: generate_until
 training_split: train
@@ -16,6 +16,9 @@ metric_list:
   - metric: exact_match
     aggregation: mean
     higher_is_better: true
+  - metric: math_verify
+    aggregation: mean
+    higher_is_better: true
 num_fewshot: 4
 metadata:
   version: 2.0
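Note on the `metric_list` entries above: both metrics use `aggregation: mean`, so each document contributes a 0/1 score per metric and the reported value is the average. A minimal sketch of that averaging, assuming per-document dicts shaped like the ones `process_results` returns in this commit. `aggregate_mean` is a hypothetical helper for illustration, not an lm_eval function, and it assumes a non-empty list.

```python
from statistics import mean
from typing import Dict, List


def aggregate_mean(per_doc: List[Dict[str, int]]) -> Dict[str, float]:
    """Average each metric's per-document 0/1 scores (hypothetical helper)."""
    metrics = per_doc[0].keys()  # assumes at least one document
    return {m: mean(d[m] for d in per_doc) for m in metrics}


# Made-up per-document scores, one dict per evaluated problem.
scores = [
    {"exact_match": 1, "math_verify": 1},
    {"exact_match": 0, "math_verify": 1},
    {"exact_match": 0, "math_verify": 0},
    {"exact_match": 1, "math_verify": 1},
]
summary = aggregate_mean(scores)
```

Because the two metrics are averaged independently, a gap between them indicates answers that one check accepts and the other rejects.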
......
@@ -3,6 +3,7 @@ import signal
 from typing import Dict, List, Optional
 import datasets
+from math_verify import parse, verify
 from lm_eval.utils import eval_logger
@@ -39,7 +40,7 @@ def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:
             out_doc["few_shot"] = True
         return out_doc
-    return dataset.map(_process_doc)
+    return dataset.filter(lambda x: x["level"] == "Level 5").map(_process_doc)
 def list_fewshot_samples() -> list[dict]:
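Note on the new `return` line in this hunk: the filter keeps only rows whose `level` field equals `"Level 5"` before `_process_doc` is mapped over them. The sketch below applies the same predicate to a plain list of dicts so it runs without the `datasets` library. `keep_level_5` and the sample rows are hypothetical and only illustrate the selection.

```python
from typing import Dict, List


def keep_level_5(docs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep rows whose "level" is exactly "Level 5" (same predicate as the lambda in the diff)."""
    return [d for d in docs if d["level"] == "Level 5"]


# Made-up rows standing in for dataset records.
docs = [
    {"problem": "p1", "level": "Level 1"},
    {"problem": "p2", "level": "Level 5"},
    {"problem": "p3", "level": "Level 3"},
    {"problem": "p4", "level": "Level 5"},
]
hard = keep_level_5(docs)
```

The comparison is an exact string match, so a row with a differently formatted level value (for example `"level 5"`) would be dropped.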
@@ -48,21 +49,25 @@ def list_fewshot_samples() -> list[dict]:
             "problem": "Find the domain of the expression $\\frac{\\sqrt{x-2}}{\\sqrt{5-x}}$.}",
             "solution": "The expressions inside each square root must be non-negative. Therefore, $x-2 \\ge 0$, so $x\\ge2$, and $5 - x \\ge 0$, so $x \\le 5$. Also, the denominator cannot be equal to zero, so $5-x>0$, which gives $x<5$. Therefore, the domain of the expression is $\\boxed{[2,5)}$.\nFinal Answer: The final answer is $[2,5)$. I hope it is correct.",
             "few_shot": "1",
+            "level": "Level 5",
         },
         {
             "problem": "If $\\det \\mathbf{A} = 2$ and $\\det \\mathbf{B} = 12,$ then find $\\det (\\mathbf{A} \\mathbf{B}).$",
             "solution": "We have that $\\det (\\mathbf{A} \\mathbf{B}) = (\\det \\mathbf{A})(\\det \\mathbf{B}) = (2)(12) = \\boxed{24}.$\nFinal Answer: The final answer is $24$. I hope it is correct.",
             "few_shot": "1",
+            "level": "Level 5",
         },
         {
             "problem": "Terrell usually lifts two 20-pound weights 12 times. If he uses two 15-pound weights instead, how many times must Terrell lift them in order to lift the same total weight?",
             "solution": "If Terrell lifts two 20-pound weights 12 times, he lifts a total of $2\\cdot 12\\cdot20=480$ pounds of weight. If he lifts two 15-pound weights instead for $n$ times, he will lift a total of $2\\cdot15\\cdot n=30n$ pounds of weight. Equating this to 480 pounds, we can solve for $n$:\n\\begin{align*}\n30n&=480\\\n\\Rightarrow\\qquad n&=480/30=\\boxed{16}\n\\end{align*}\nFinal Answer: The final answer is $16$. I hope it is correct.",
             "few_shot": "1",
+            "level": "Level 5",
         },
         {
             "problem": "If the system of equations\n\n\\begin{align*}\n6x-4y&=a,\\\n6y-9x &=b.\n\\end{align*}has a solution $(x, y)$ where $x$ and $y$ are both nonzero,\nfind $\\frac{a}{b},$ assuming $b$ is nonzero.",
             "solution": "If we multiply the first equation by $-\\frac{3}{2}$, we obtain\n\n$$6y-9x=-\\frac{3}{2}a.$$Since we also know that $6y-9x=b$, we have\n\n$$-\\frac{3}{2}a=b\\Rightarrow\\frac{a}{b}=\\boxed{-\\frac{2}{3}}.$$\nFinal Answer: The final answer is $-\\frac{2}{3}$. I hope it is correct.",
             "few_shot": "1",
+            "level": "Level 5",
         },
     ]
@@ -81,9 +86,9 @@ def process_results(doc: dict, results: List[str]) -> Dict[str, int]:
     else:
         retval = 0
-    results = {
-        "exact_match": retval,
-    }
+    res = verify(parse(doc["answer"]), parse(candidates))
+    mathval = 1 if res else 0
+    results = {"exact_match": retval, "math_verify": mathval}
     return results
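Note on the hunk above: `process_results` now returns two 0/1 scores per document, the existing `exact_match` value and a new one taken from `verify(parse(...), parse(...))`. The sketch below shows why a second check can disagree with string matching. It uses `fractions.Fraction` as a deliberately simple stand-in for an equivalence check. This is an assumption for illustration only and is not how `math_verify` works. `exact_match`, `numeric_equiv`, and `score` are hypothetical names.

```python
from fractions import Fraction
from typing import Dict


def exact_match(gold: str, pred: str) -> int:
    """Score 1 only when the two answer strings are identical after trimming."""
    return 1 if gold.strip() == pred.strip() else 0


def numeric_equiv(gold: str, pred: str) -> int:
    """Stand-in equivalence check: compare both answers as exact rationals.

    Illustration only; not the algorithm used by math_verify.
    Answers that do not parse as numbers score 0.
    """
    try:
        return 1 if Fraction(gold.strip()) == Fraction(pred.strip()) else 0
    except (ValueError, ZeroDivisionError):
        return 0


def score(gold: str, pred: str) -> Dict[str, int]:
    """Return both 0/1 scores, mirroring the shape of the dict built in the diff."""
    return {"exact_match": exact_match(gold, pred), "equiv": numeric_equiv(gold, pred)}


same_value = score("1/2", "0.5")   # different strings, same number
same_text = score("24", "24")      # identical strings
wrong = score("16", "15")          # different numbers
```

Here `"1/2"` and `"0.5"` fail the string comparison but pass the stand-in equivalence check, which is the kind of case a verification-based metric is meant to credit.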
......
@@ -19,6 +19,9 @@ metric_list:
   - metric: exact_match
     aggregation: mean
     higher_is_better: true
+  - metric: math_verify
+    aggregation: mean
+    higher_is_better: true
 num_fewshot: 4
 metadata:
   version: 1.0
......