Commit 6ac42518 authored by lintangsutawika
Merge branch 'big-refactor' of https://github.com/EleutherAI/lm-evaluation-harness into openai_completions
parents 9c3ba7d4 e3644fcc
"dataset_name": "moral_scenarios"
"description": "The following are multiple choice questions (with answers) about moral\
\ scenarios.\n\n"
"group": "mmlu_humanities"
"group_alias": "humanities"
"include": "_default_template_yaml"
"task": "mmlu_moral_scenarios"
"task_alias": "moral_scenarios"
"dataset_name": "nutrition"
"description": "The following are multiple choice questions (with answers) about nutrition.\n\
\n"
"group": "mmlu_other"
"group_alias": "other"
"include": "_default_template_yaml"
"task": "mmlu_nutrition"
"task_alias": "nutrition"
"dataset_name": "philosophy"
"description": "The following are multiple choice questions (with answers) about philosophy.\n\
\n"
"group": "mmlu_humanities"
"group_alias": "humanities"
"include": "_default_template_yaml"
"task": "mmlu_philosophy"
"task_alias": "philosophy"
"dataset_name": "prehistory"
"description": "The following are multiple choice questions (with answers) about prehistory.\n\
\n"
"group": "mmlu_humanities"
"group_alias": "humanities"
"include": "_default_template_yaml"
"task": "mmlu_prehistory"
"task_alias": "prehistory"
"dataset_name": "professional_accounting"
"description": "The following are multiple choice questions (with answers) about professional\
\ accounting.\n\n"
"group": "mmlu_other"
"group_alias": "other"
"include": "_default_template_yaml"
"task": "mmlu_professional_accounting"
"task_alias": "professional_accounting"
"dataset_name": "professional_law"
"description": "The following are multiple choice questions (with answers) about professional\
\ law.\n\n"
"group": "mmlu_humanities"
"group_alias": "humanities"
"include": "_default_template_yaml"
"task": "mmlu_professional_law"
"task_alias": "professional_law"
"dataset_name": "professional_medicine"
"description": "The following are multiple choice questions (with answers) about professional\
\ medicine.\n\n"
"group": "mmlu_other"
"group_alias": "other"
"include": "_default_template_yaml"
"task": "mmlu_professional_medicine"
"task_alias": "professional_medicine"
"dataset_name": "professional_psychology"
"description": "The following are multiple choice questions (with answers) about professional\
\ psychology.\n\n"
"group": "mmlu_social_sciences"
"group_alias": "social_sciences"
"include": "_default_template_yaml"
"task": "mmlu_professional_psychology"
"task_alias": "professional_psychology"
"dataset_name": "public_relations"
"description": "The following are multiple choice questions (with answers) about public\
\ relations.\n\n"
"group": "mmlu_social_sciences"
"group_alias": "social_sciences"
"include": "_default_template_yaml"
"task": "mmlu_public_relations"
"task_alias": "public_relations"
"dataset_name": "security_studies"
"description": "The following are multiple choice questions (with answers) about security\
\ studies.\n\n"
"group": "mmlu_social_sciences"
"group_alias": "social_sciences"
"include": "_default_template_yaml"
"task": "mmlu_security_studies"
"task_alias": "security_studies"
"dataset_name": "sociology"
"description": "The following are multiple choice questions (with answers) about sociology.\n\
\n"
"group": "mmlu_social_sciences"
"group_alias": "social_sciences"
"include": "_default_template_yaml"
"task": "mmlu_sociology"
"task_alias": "sociology"
"dataset_name": "us_foreign_policy"
"description": "The following are multiple choice questions (with answers) about us\
\ foreign policy.\n\n"
"group": "mmlu_social_sciences"
"group_alias": "social_sciences"
"include": "_default_template_yaml"
"task": "mmlu_us_foreign_policy"
"task_alias": "us_foreign_policy"
"dataset_name": "virology"
"description": "The following are multiple choice questions (with answers) about virology.\n\
\n"
"group": "mmlu_other"
"group_alias": "other"
"include": "_default_template_yaml"
"task": "mmlu_virology"
"task_alias": "virology"
"dataset_name": "world_religions"
"description": "The following are multiple choice questions (with answers) about world\
\ religions.\n\n"
"group": "mmlu_humanities"
"group_alias": "humanities"
"include": "_default_template_yaml"
"task": "mmlu_world_religions"
"task_alias": "world_religions"
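The per-subject entries above all follow one pattern: the subject name fills `dataset_name`, `task`, `task_alias`, and the description sentence, while the category fills `group` and `group_alias`. A minimal Python sketch of that pattern follows. The helper name `make_config` and the three-entry mapping are hypothetical and only illustrate the structure; this is not the script the repository actually uses to produce these files.

```python
# Hypothetical subject -> category mapping; only a few entries from the diff are shown.
SUBJECT_TO_CATEGORY = {
    "nutrition": "other",
    "philosophy": "humanities",
    "sociology": "social_sciences",
}


def make_config(subject: str, category: str) -> dict:
    """Build one per-subject config with the same keys as the entries above."""
    readable = subject.replace("_", " ")
    return {
        "dataset_name": subject,
        "description": (
            "The following are multiple choice questions (with answers) "
            f"about {readable}.\n\n"
        ),
        "group": f"mmlu_{category}",
        "group_alias": category,
        "include": "_default_template_yaml",
        "task": f"mmlu_{subject}",
        "task_alias": subject,
    }


configs = [make_config(s, c) for s, c in SUBJECT_TO_CATEGORY.items()]
for cfg in configs:
    print(cfg["task"], "->", cfg["group"])
```

Keeping the shared settings in `_default_template_yaml` and pulling them in through `include` means each subject file only carries the fields that differ.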
This source diff could not be displayed because it is too large. You can view the blob instead.
group: mmlu_flan_cot_fewshot
task:
- mmlu_flan_cot_fewshot_stem
- mmlu_flan_cot_fewshot_other
- mmlu_flan_cot_fewshot_social_sciences
- mmlu_flan_cot_fewshot_humanities
dataset_path: hails/mmlu_no_train # a copy of `cais/mmlu` with no auxiliary_train split
validation_split: validation
fewshot_split: dev
output_type: generate_until
doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA: Let's think step by step."
doc_to_target: "{{['(A)', '(B)', '(C)', '(D)'][answer]}}"
filter_list:
  - name: "get-answer"
    filter:
      - function: "regex"
        regex_pattern: "(?<=The answer is )(.*)(?=.)"
      - function: "take_first"
generation_kwargs:
  until:
    - "</s>"
  do_sample: false
  temperature: 0.0
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
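To make the template concrete, here is a rough Python sketch of what it describes: render the question, pull the answer out of a generation with the `get-answer` regex, and score it with a case- and punctuation-insensitive exact match. It uses plain string formatting instead of Jinja, and every function name here is hypothetical. The `"[invalid]"` fallback for a missing match is an assumption. This approximates the configured behaviour and is not the harness's own code.

```python
import re
import string

CHOICE_LABELS = ["(A)", "(B)", "(C)", "(D)"]
# Same pattern as regex_pattern in the template above.
ANSWER_PATTERN = re.compile(r"(?<=The answer is )(.*)(?=.)")


def render_prompt(doc: dict) -> str:
    """Plain-Python stand-in for the doc_to_text Jinja template."""
    options = " ".join(
        f"{label} {choice}" for label, choice in zip(CHOICE_LABELS, doc["choices"])
    )
    return f"Q: {doc['question'].strip()}\n{options}\nA: Let's think step by step."


def render_target(doc: dict) -> str:
    """Stand-in for doc_to_target: map the answer index to its label."""
    return CHOICE_LABELS[doc["answer"]]


def extract_answer(generation: str) -> str:
    """Regex filter followed by take_first; the fallback value is an assumption."""
    matches = ANSWER_PATTERN.findall(generation)
    return matches[0] if matches else "[invalid]"


def normalize(text: str) -> str:
    """Lower-case and strip punctuation (ignore_case, ignore_punctuation)."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def exact_match(prediction: str, target: str) -> float:
    return float(normalize(prediction) == normalize(target))


doc = {
    "question": " Find the characteristic of the ring 2Z. ",
    "choices": ["0", "3", "12", "30"],
    "answer": 0,
}
generation = "Hence $k=0$ and $n=0$. The answer is (A)."
print(render_prompt(doc))
print(exact_match(extract_answer(generation), render_target(doc)))
```

The trailing `(?=.)` lookahead makes the capture drop its final character, which removes the closing period. If a generation ends without a period, the capture loses the closing parenthesis instead, and `ignore_punctuation` is what keeps the comparison working in that case.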
"dataset_name": "abstract_algebra"
"description": "The following are multiple choice questions (with answers) about abstract\
\ algebra.\n\nQ: Statement 1 | Every element of a group generates a cyclic subgroup\
\ of the group. Statement 2 | The symmetric group S_10 has 10 elements.\n(A) True,\
\ True (B) False, False (C) True, False (D) False, True\nA: Let's think step by\
\ step. A cyclic group is a group that is generated by a single element. Hence a\
\ subgroup generated by a single element of a group is cyclic and Statement 1 is\
\ True. The symmetric group $S_n$ has $n!$ elements, hence it is not true that\
\ $S_{10}$ has 10 elements, so Statement 2 is False. The answer is (C).\n\nQ: Find\
\ the characteristic of the ring 2Z.\n(A) 0 (B) 3 (C) 12 (D) 30\nA: Let's think step\
\ by step. The characteristic of a ring R is $n$ if the statement $ka = 0$ for all\
\ $a\\in 2Z$ implies that\
\ $k$ is a multiple of $n$. Assume that $ka = 0$ for all $a\\in 2Z$ for some $k$.\
\ In particular $2k = 0$. Hence $k=0$ and $n=0$. The answer is (A).\n\nQ: Statement\
\ 1| Every function from a finite set onto itself must be one to one. Statement\
\ 2 | Every subgroup of an abelian group is abelian.\n(A) True, True (B) False,\
\ False (C) True, False (D) False, True\nA: Let's think step by step. Statement\
\ 1 is true. Let $S$ be a finite set. If $f:S \\rightarrow S$ is an onto function,\
\ then $|S| = |f(S)|$. If $f$ were not one to one, then for finite domain $S$ the\
\ image would have fewer than $|S|$ elements, a contradiction.\nStatement 2 is true.\
\ Let $G$ be an abelian group and $H$ be a subgroup of $G$. We need to show that\
\ $H$ is abelian. Let $a,b \\in H$. Then $a,b \\in G$ and $ab=ba$. Since $G$ is\
\ abelian, $ab=ba$. Since $H$ is a subgroup of $G$, $ab \\in H$. Therefore, $ab=ba$\
\ and $H$ is abelian. The answer is (A).\n\nQ: Statement 1 | If aH is an element\
\ of a factor group, then |aH| divides |a|. Statement 2 | If H and K are subgroups\
\ of G then HK is a subgroup of G.\n(A) True, True (B) False, False (C) True, False\
\ (D) False, True\nA: Let's think step by step. Statement 2 is false. Let $H$ be\
\ a subgroup of $S_3$ generated by the cycle $(1,2)$ and $K$ be a subgroup of $S_3$\
\ generated by the cycle $(1,3)$. Both $H$ and $K$ have two elements, the generators\
\ and the identity. However $HK$ contains cycles (1,2), (1,3) and (2,3,1), but the\
\ inverse of (2,3,1) is (2,1,3) and it does not belong to HK, hence HK is not a\
\ subgroup. The answer is (B).\n\nQ: Find all c in Z_3 such that Z_3[x]/(x^2 + c)\
\ is a field.\n(A) 0 (B) 1 (C) 2 (D) 3\nA: Let's think step by step. Z_3[x]/(x^2\
\ + c) is a field if and only if x^2 + c does not have roots in Z_3. That is x^2\
\ + c != 0 for every x in Z_3. If c = 0, then x^2 + c = x^2 has root 0. If c = 1\
\ then x^2 + c = x^2 + 1 = 0 + 1 for x = 0, 1 + 1 = 2 for x = 1 and 1 + 1 = 2 for\
\ x = 2, hence x^2 + 1 does not have any roots. For c = 2 the polynomial x^2 + 2\
\ has two roots at x = 1 and x = 2. Hence Z_3[x]/(x^2 + c) is a field if and only\
\ if c = 1. The answer is (B)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_abstract_algebra"
"dataset_name": "anatomy"
"description": "The following are multiple choice questions (with answers) about anatomy.\n\
\nQ: Which of the following is the body cavity that contains the pituitary gland?\n\
(A) Abdominal (B) Cranial (C) Pleural (D) Spinal\nA: Let's think step by step. We\
\ refer to Wikipedia articles on anatomy for help. Let’s solve this problem step\
\ by step. The pituitary gland is the major endocrine gland attached to the base\
\ of the brain, and it is contained in the Cranial cavity. The answer is (B).\n\n\
Q: Which of these branches of the trigeminal nerve contain somatic motor processes?\n\
(A) The supraorbital nerve (B) The infraorbital nerve (C) The mental nerve (D) None\
\ of the above\nA: Let's think step by step. We refer to Wikipedia articles on anatomy\
\ for help. Let’s solve this problem step by step. \nWe know the following: (A)\
\ The supraorbital nerve (also known as the frontal nerve) is the largest branch\
\ of the ophthalmic nerve and branch of ophthalmic division of the trigeminal nerve.\
\ (B) The infraorbital nerve is a branch of the maxillary division of the trigeminal\
\ nerve. (C) The mental nerve is a branch of the mandibular division of the trigeminal\
\ nerve. All of these nerves are purely sensory nerves and do not contain any\
\ somatic motor processes. Therefore, the answer should be none of the above, which\
\ is (D). The answer is (D).\n\nQ: In Angle's Class II Div 2 occlusion there is\n\
(A) excess overbite of the upper lateral incisors. (B) negative overjet of the upper\
\ central incisors. (C) excess overjet of the upper lateral incisors. (D) excess\
\ overjet of the upper central incisors.\nA: Let's think step by step. We refer\
\ to Wikipedia articles on anatomy for help. Let’s solve this problem step by step.\
\ This is a question related to anatomy and orthodontics. Excess overjet is associated\
\ with Class II occlusions; therefore, we can safely eliminate (B) from the list,\
\ as negative overjet is often associated with Class III occlusions. Now, we need\
\ to determine the location of the excess overjet, and that would be the upper (maxillary)\
\ lateral incisors. Only (C) has the correct information. The answer is (C).\n\n\
Q: The pleura\n(A) have no sensory innervation. (B) are separated by a 2 mm space.\
\ (C) extend into the neck. (D) are composed of respiratory epithelium.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on anatomy for help. Let’s\
\ solve this problem step by step. First, recall that the pleura refers to the thin\
\ layer of tissue that covers the lungs and lines the interior wall of the chest\
\ cavity. Now, let’s look at each option:\nOption (A): “The pleura have no sensory\
\ innervation.” This information is not correct. The pleura do have a sensory innervation.\n\
Option (B): “The pleura are separated by a 2 mm space.” This information is not\
\ correct. There is a very thin “potential” space between the layers of the pleura;\
\ however, it is typically filled with serous pleural fluid. \nOption (C): “The\
\ pleura extend into the neck.” This information is actually true. The cervical\
\ pleura, also known as the dome of the pleura, lines the extension\
\ of the pleural cavity into the neck.\nOption (D): “The pleura are composed of\
\ respiratory epithelium.” This information is not correct. The pleura are composed\
\ of connective tissue (CT).\nBecause (A), (B), and (D) are all incorrect, (C) is\
\ the only correct answer. The answer is (C).\n\nQ: What is the embryological origin\
\ of the hyoid bone?\n(A) The first pharyngeal arch (B) The first and second pharyngeal\
\ arches (C) The second pharyngeal arch (D) The second and third pharyngeal arches\n\
A: Let's think step by step. We refer to Wikipedia articles on anatomy for help.\
\ Let’s solve this problem step by step. The hyoid bone, which is also known as\
\ the hyooid, is a small U-shaped bone located in the anterior neck. In its resting\
\ position, it lies between the base of the mandible and the third cervical vertebra.\
\ We know that the second and the third pharyngeal arches give rise to the horns\
\ of the hyoid bone; therefore, the embryological origin of the hyoid bone is the\
\ second and the third pharyngeal arches—this information is covered in the last\
\ option (D). Therefore, we conclude that (D) must be the correct answer. The answer\
\ is (D)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_anatomy"
"dataset_name": "astronomy"
"description": "The following are multiple choice questions (with answers) about astronomy.\n\
\nQ: Where do most short-period comets come from and how do we know?\n(A) The Kuiper\
\ belt; short period comets tend to be in the plane of the solar system just like\
\ the Kuiper belt. (B) The Kuiper belt; short period comets tend to come from random\
\ directions indicating a spherical distribution of comets called the Kuiper belt.\
\ (C) The asteroid belt; short period comets have orbital periods similar to asteroids\
\ like Vesta and are found in the plane of the solar system just like the asteroid\
\ belt. (D) The Oort cloud; short period comets tend to be in the plane of the solar\
\ system just like the Oort cloud.\nA: Let's think step by step. Most short-period\
\ comets come from the Kuiper belt, and we know because short period comets tend\
\ to be in the plane of the solar system, just like the Kuiper belt is. The answer\
\ is (A).\n\nQ: You are pushing a truck along a road. Would it be easier to accelerate\
\ this truck on Mars? Why? (Assume there is no friction)\n(A) It would be harder\
\ since the truck is heavier on Mars. (B) It would be easier since the truck is\
\ lighter on Mars. (C) It would be harder since the truck is lighter on Mars. (D)\
\ It would be the same no matter where you are.\nA: Let's think step by step. If\
\ we assume that there is no friction, the force needed to accelerate the truck\
\ is by Newton’s second law only dependent on the mass of the truck. Hence (A),\
\ (B) and (C) are incorrect since it doesn’t matter that it’s on Mars, and (D) is\
\ the correct answer. The answer is (D).\n\nQ: Say the pupil of your eye has a diameter\
\ of 5 mm and you have a telescope with an aperture of 50 cm. How much more light\
\ can the telescope gather than your eye?\n(A) 10000 times more (B) 100 times more\
\ (C) 1000 times more (D) 10 times more\nA: Let's think step by step. The amount\
\ of light is proportional to the aperture area $A = \\pi D^2/4$ for a lens with\
\ diameter $D$, so the relative amounts of light between the eye with diameter 5 mm\
\ and the telescope with diameter 50 cm is $(50 cm)^2/(5mm)^2 = 10000$. The answer\
\ is (A).\n\nQ: Why isn't there a planet where the asteroid belt is located?\n(A)\
\ A planet once formed here but it was broken apart by a catastrophic collision.\
\ (B) There was not enough material in this part of the solar nebula to form a planet.\
\ (C) There was too much rocky material to form a terrestrial planet but not enough\
\ gaseous material to form a jovian planet. (D) Resonance with Jupiter prevented\
\ material from collecting together to form a planet.\nA: Let's think step by step.\
\ The asteroid belt is a circumstellar disc consisting of a large number of asteroids\
\ between Mars and Jupiter's orbits. The asteroids in this belt are affected by\
\ the gravitational pull from both other asteroids and nearby planets. Due to the\
\ strong gravitational force of Jupiter there are resonances that give rise to low\
\ density regions of asteroids known as the Kirkwood gaps. So (B) and (C) are not\
\ correct since it’s not a lack of material that prevents a planet from being formed,\
\ and (A) is incorrect because the Kirkwood gap would have prevented a planet from\
\ forming in the first place, and (D) is the correct option. The answer is (D).\n\
\nQ: Why is Mars red?\n(A) Because the surface is covered with heavily oxidized\
\ (\"rusted\") minerals. (B) Because the atmosphere scatters more light at bluer\
\ wavelengths transmitting mostly red light. (C) Because Mars is covered with ancient\
\ lava flows which are red in color. (D) Because flowing water on Mars's surface\
\ altered the surface minerals several billion years ago.\nA: Let's think step by\
\ step. Option (B) is not correct because if the red color was caused by the scattering\
\ off the atmosphere, then the earth with a much thicker atmosphere would also look\
\ red. Options (C) and (D) are not specific enough about why the color of the surface\
\ would be red, while (A) is correct because it explains that the surface is red\
\ due to the rusted materials on the surface and the red color comes from the rust.\
\ So the correct option is (A). The answer is (A)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_astronomy"
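Each chain-of-thought `description` above embeds several worked examples that are expected to end with "The answer is (X).", which is the phrase the answer regex relies on. A small sanity check along those lines can be sketched as follows. The helper names and the shortened sample description are hypothetical, and the sketch assumes that examples are separated by a blank line followed by `Q: `.

```python
import re
from typing import List, Optional

# Matches the closing sentence of a worked example, e.g. "The answer is (A)."
FINAL_ANSWER = re.compile(r"The answer is \(([A-D])\)\.\s*$")


def split_exemplars(description: str) -> List[str]:
    """Return the worked examples, i.e. the blocks that start with 'Q:'."""
    blocks = description.split("\n\nQ: ")
    # blocks[0] is the introductory sentence, not an example.
    return ["Q: " + block for block in blocks[1:]]


def final_answers(description: str) -> List[Optional[str]]:
    """Extract the final answer letter of each example (None if it is missing)."""
    answers: List[Optional[str]] = []
    for exemplar in split_exemplars(description):
        match = FINAL_ANSWER.search(exemplar)
        answers.append(match.group(1) if match else None)
    return answers


# Shortened, hypothetical description in the same shape as the entries above.
SAMPLE_DESCRIPTION = (
    "The following are multiple choice questions (with answers) about astronomy.\n\n"
    "Q: Why is Mars red?\n(A) Rust (B) Scattering (C) Lava (D) Water\n"
    "A: Let's think step by step. The surface is covered with oxidized minerals. "
    "The answer is (A).\n\n"
    "Q: Is it easier to accelerate a truck on Mars?\n"
    "(A) Harder (B) Easier (C) Harder (D) Same\n"
    "A: Let's think step by step. Only the mass matters. The answer is (D)."
)

print(final_answers(SAMPLE_DESCRIPTION))
```

A `None` in the result flags an example whose closing sentence is missing or malformed. The check does not verify that the reasoning agrees with the stated letter.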