Many intellectual endeavors require mathematical problem solving, but this skill remains beyond the capabilities of computers. To measure this ability in machine learning models, we introduce MATH, a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations.
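Each record in MATH pairs a problem statement with metadata and a full LaTeX solution. The sketch below shows an illustrative record with the dataset's field names (`problem`, `level`, `type`, `solution`); the specific problem and solution text are made-up placeholders, not an actual entry from the dataset.

```python
# Illustrative MATH record; field names match the released dataset,
# but the problem shown here is a placeholder, not a real entry.
example = {
    "problem": "What is $1 + 2 + \\cdots + 10$?",
    "level": "Level 1",                      # difficulty, "Level 1" .. "Level 5"
    "type": "Algebra",                       # subject area
    "solution": "The sum is $\\frac{10 \\cdot 11}{2} = \\boxed{55}$.",
}

print(example["problem"])
print(example["solution"])
```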
NOTE: The few-shot prompting and the generated-answer extraction are based on the [Minerva](https://arxiv.org/abs/2206.14858) paper, and exact-match equivalence is calculated using the `sympy` library.
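The idea behind `sympy`-based equivalence is to count two answers as matching when they are symbolically equal, not just string-identical. The following is a minimal sketch of that check, assuming LaTeX answer strings; the function name `is_equiv` and the exact parsing/simplification steps are illustrative rather than the evaluation code's exact implementation.

```python
# Minimal sketch of sympy-based exact-match equivalence for LaTeX answers.
# Note: parse_latex requires the antlr4-python3-runtime package.
from sympy import simplify
from sympy.parsing.latex import parse_latex


def is_equiv(predicted: str, target: str) -> bool:
    """Return True if two LaTeX answer strings are symbolically equivalent."""
    # Trivial case: identical strings are equivalent.
    if predicted.strip() == target.strip():
        return True
    try:
        # Parse both answers and check that their difference simplifies to zero.
        diff = parse_latex(predicted) - parse_latex(target)
        return simplify(diff) == 0
    except Exception:
        # If either answer fails to parse, treat the pair as non-equivalent.
        return False


# Example: "\frac{1}{2}" and "0.5" count as the same answer.
print(is_equiv(r"\frac{1}{2}", "0.5"))  # True
```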
Author = {Aitor Lewkowycz and Anders Andreassen and David Dohan and Ethan Dyer and Henryk Michalewski and Vinay Ramasesh and Ambrose Slone and Cem Anil and Imanol Schlag and Theo Gutman-Solo and Yuhuai Wu and Behnam Neyshabur and Guy Gur-Ari and Vedant Misra},
Title = {Solving Quantitative Reasoning Problems with Language Models},