Commit c9bbec6e (unverified), authored by Hailey Schoelkopf, committed by GitHub

Merge pull request #1060 from EleutherAI/fix-mmlu

[Refactor] Fix fewshot cot mmlu descriptions
parents 7afae7b5 57e017ff
Pipeline #2992 failed with stages
This source diff could not be displayed because it is too large. You can view the blob instead.
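Each hunk in this diff appends `\n\n` to the end of a task's few-shot description string. A minimal Python sketch of why that matters, assuming (the diff does not confirm this) that the description is joined to the next question by plain string concatenation. `build_prompt` and the sample strings are hypothetical, not the harness's actual API:

```python
def build_prompt(description: str, question: str) -> str:
    """Hypothetical helper: prepend the few-shot description to the question.

    Assumption: no separator is inserted automatically, so any blank line
    between the two parts must come from the description itself.
    """
    return description + question


# Tail of a description before and after the change shown in the hunks.
old_description = "The answer is (B)."
new_description = "The answer is (B).\n\n"

# Illustrative question text (not taken from the task files).
question = "Q: Which expression is equivalent to 5 x 9?"

old_prompt = build_prompt(old_description, question)
new_prompt = build_prompt(new_description, question)

# Without the trailing newlines, the last exemplar runs into the new question.
print(repr(old_prompt))
# With them, a blank line separates the exemplars from the new question.
print(repr(new_prompt))
```

Under that assumption, the old strings would glue the final exemplar's answer directly onto the next `Q:`, while the new strings keep a blank line between them.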
@@ -35,7 +35,7 @@
 \ then x^2 + c = x^2 + 1 = 0 + 1 for x = 0, 1 + 1 = 2 for x = 1 and 1 + 1 = 2 for\
 \ x = 2, hence x^2 + 1 does not have any roots. For c = 2 the polynomial x^2 + 2\
 \ has two roots at x = 1 and x = 2. Hence Z_3[x]/(x^2 + c) is a field if and only\
-\ if c = 1. The answer is (B)."
+\ if c = 1. The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_abstract_algebra"
@@ -51,7 +51,7 @@
 \ of the hyoid bone; therefore, the embryological origin of the hyoid bone are the\
 \ second and the third pharyngeal arches—this information is covered in the last\
 \ option (D). Therefore, we conclude that (D) must be the correct answer. The answer\
-\ is (D)."
+\ is (D).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_anatomy"
@@ -49,7 +49,7 @@
 \ red. Options (C) and (D) are not specific enough about why the color of the surface\
 \ would be red, while (A) is correct because it explains that the surface is red\
 \ due to the rusted materials on the surface and the red color comes from the rust.\
-\ So the correct option is (A). The answer is (A)."
+\ So the correct option is (A). The answer is (A).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_astronomy"
@@ -50,7 +50,7 @@
 \ that best uses the possible options above is “Beyond the business case for engaging\
 \ the CSR there are a number of moral arguments relating to: negative *externalities*,\
 \ the *power* that corporations possess and the *mutual independence* of business\
-\ and society. The answer is (D)."
+\ and society. The answer is (D).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_business_ethics"
@@ -29,7 +29,7 @@
 \ (D) oxidative phosphorylation.\nA: Let's think step by step. We refer to Wikipedia\
 \ articles on clinical knowledge for help. The energy for muscular contraction is\
 \ provided by ATP (adenosine triphosphate), which is the powerhouse of the cell.\
-\ The answer is (A)."
+\ The answer is (A).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_clinical_knowledge"
@@ -55,7 +55,7 @@
 \ resemblance of structures that have different origins, which is not the case for\
 \ the human and bird forearms, which rules out (D). Humans and birds do belong to\
 \ the same clade - a group of organisms composed of a common ancestor. The answer\
-\ is (C)."
+\ is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_biology"
@@ -32,7 +32,7 @@
 \ hyperfine interaction with the 13C (nuclear spin $I = \nrac{1}{2}$) which will\
 \ split the spectrum into 2 lines. This will be further split into 4 lines by the\
 \ interaction with three equivalent 1H nuclei. The total number of lines is therefore\
-\ $2 \\cdot 4 = 8$. The answer is (E)."
+\ $2 \\cdot 4 = 8$. The answer is (E).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_chemistry"
@@ -73,7 +73,7 @@
 Thus we can see that on average a single processor will lock the bus for:\nlock_ns_per_miss\
 \ * misses_per_instruction * instructions_per_ns =\n(1000 nanoseconds / cache miss)\
 \ * (1 cache miss / 50 instructions) * (50 instructions / 27000 nanoseconds) = 1000\
-\ * (1/50) * (50/27000) = 1000/27000 = 1/27. The answer is (B)."
+\ * (1/50) * (50/27000) = 1000/27000 = 1/27. The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_computer_science"
@@ -44,7 +44,7 @@
 \ $t \\in \\mathbb{R}, \\ln ((s(t)-2))=-[t / 25]+C$. Let $K:=e^{C}$. Then, for all\
 \ $t \\in \\mathbb{R}$, we have $(s(t))-2=K e^{-t / 25}$, and so $s(t)=2+K e^{-t\
 \ / 25}$. Then $3=s(0)=2+K e^{0}=2+K$, so $K=1$. Then $s(100)=2+K e^{-100 / 25}=2+1\
-\ \\cdot e^{-4}=2+e^{-4}$. The answer is (D)."
+\ \\cdot e^{-4}=2+e^{-4}$. The answer is (D).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_mathematics"
@@ -46,7 +46,7 @@
 \ monocarbylic acid transporters.\nA: Let's think step by step. We refer to Wikipedia\
 \ articles on medicine for help. Glucose (also known as the blood sugar) is the\
 \ main sugar found in the human body. It is transported into the muscle cell via\
-\ diffusion through protein transporters called GLUT4. The answer is (A)."
+\ diffusion through protein transporters called GLUT4. The answer is (A).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_medicine"
@@ -38,7 +38,7 @@
 \ go into the gases internal energy or work done against an external force. However,\
 \ if the volume of the gas container is constant, no work will be done (since work\
 \ is pressure times change in volume). So, at constant volume, all of the heat goes\
-\ into the internal energy. The answer is (B)."
+\ into the internal energy. The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_physics"
@@ -30,7 +30,7 @@
 \ resulted from improper input validation (due to a missing bounds check) in the\
 \ implementation of the TLS heartbeat extension. The vulnerability was classified\
 \ as a buffer over-read, a situation where more data can be read than should be\
-\ allowed. The answer is (C)."
+\ allowed. The answer is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_computer_security"
@@ -27,7 +27,7 @@
 \ speed in the direction of the wind is greater than it would be in the absence\
 \ of wind, and its direction orthogonal to the wind is the same as it would be in\
 \ the absence of the wind. The total speed, which is these two components added\
-\ in quadrature, is thus greater than the speed in still air. The answer is (B)."
+\ in quadrature, is thus greater than the speed in still air. The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_conceptual_physics"
@@ -57,7 +57,7 @@
 \ die away (B) Persist indefinitely (C) Grow exponentially (D) Never occur\nA: Let's\
 \ think step by step. We refer to Wikipedia articles on econometrics for help. This\
 \ is a formal logic problem about stationally process. For a stationary autoregressive\
-\ process, shocks will eventually die away. The answer is (A)."
+\ process, shocks will eventually die away. The answer is (A).\n\n"
 "group": "mmlu_flan_cot_fewshot_social_sciences"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_econometrics"
@@ -28,7 +28,7 @@
 \ is 100. Find the total resistance\n(A) 200Ω (B) 100Ω (C) 50Ω (D) 10Ω\nA: Let's\
 \ think step by step. In lap winding, effectively two resistors are connected in\
 \ parallel, so the actual resistance of each pair is 1 Ohm. Since we have 50 pairs,\
-\ we get a total resistance of 50 Ohms. The answer is (C)."
+\ we get a total resistance of 50 Ohms. The answer is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_electrical_engineering"
@@ -35,7 +35,7 @@
 \nQ: Which expression is equivalent to 5 x 9?\n(A) (5 x 4) x (6 x 5)\n(B) (5 x 5)\
 \ + (5 x 4)\n(C) (5 x 5) + (5 x 9)\n(D) (5 x 9) x (6 x 9)\nA: Let's think step by\
 \ step. We know that 9 = (5 + 4), so 5 x 9 = 5 x (5 + 4) = (5 x 5) + (5 x 4). The\
-\ answer is (B)."
+\ answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_elementary_mathematics"
@@ -47,7 +47,7 @@
 \ (∀x)(Px ~Dx) For all x, x is on Mars implies that x do not drive on Mars.\n\
 Option (D): ~Dp: p do not drive on Mars.\nOf all these options, Option (C) appears\
 \ to be the best and most meaningful interpretation of the argument “No people drive\
-\ on Mars.” The answer is (C)."
+\ on Mars.” The answer is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_humanities"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_formal_logic"
@@ -28,7 +28,7 @@
 \ of their nation or the world.\nA: Let's think step by step. We refer to Wikipedia\
 \ articles on global facts for help. As of 2019, most people tend to be optimistic\
 \ about their own future but pessimistic about the future of their nation or the\
-\ world. The answer is (B)."
+\ world. The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_global_facts"