Commit e1ae8a2f authored by Herbie Bradley

Merge remote-tracking branch 'origin/big-refactor' into calibration

parents 50e99bd7 30936bc7
"dataset_name": "us_foreign_policy"
"description": "The following are multiple choice questions (with answers) about us\
\ foreign policy.\n\n"
"group": "mmlu_social_sciences"
"group_alias": "social_sciences"
"include": "_default_template_yaml"
"task": "mmlu_us_foreign_policy"
"task_alias": "us_foreign_policy"
"dataset_name": "virology"
"description": "The following are multiple choice questions (with answers) about virology.\n\
\n"
"group": "mmlu_other"
"group_alias": "other"
"include": "_default_template_yaml"
"task": "mmlu_virology"
"task_alias": "virology"
"dataset_name": "world_religions"
"description": "The following are multiple choice questions (with answers) about world\
\ religions.\n\n"
"group": "mmlu_humanities"
"group_alias": "humanities"
"include": "_default_template_yaml"
"task": "mmlu_world_religions"
"task_alias": "world_religions"
group: mmlu_flan_cot_fewshot
task:
- mmlu_flan_cot_fewshot_stem
- mmlu_flan_cot_fewshot_other
- mmlu_flan_cot_fewshot_social_sciences
- mmlu_flan_cot_fewshot_humanities
dataset_path: hails/mmlu_no_train # a copy of `cais/mmlu` with no auxiliary_train split
validation_split: validation
fewshot_split: dev
output_type: generate_until
doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA: Let's think step by step."
doc_to_target: "{{['(A)', '(B)', '(C)', '(D)'][answer]}}"
filter_list:
  - name: "get-answer"
    filter:
      - function: "regex"
        regex_pattern: "(?<=The answer is )(.*)(?=.)"
      - function: "take_first"
generation_kwargs:
  until:
    - "</s>"
  do_sample: false
  temperature: 0.0
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
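
The `get-answer` filter above can be tried outside the harness. The following is a minimal sketch that applies the same pattern with Python's standard `re` module; it is not the harness's own filter class. The helper names `extract_answer` and `normalize` are hypothetical, and `normalize` only approximates what `ignore_case` and `ignore_punctuation` are assumed to do.

```python
import re
import string
from typing import Optional

# Pattern copied verbatim from the `regex_pattern` field of the template.
ANSWER_PATTERN = re.compile(r"(?<=The answer is )(.*)(?=.)")


def extract_answer(generation: str) -> Optional[str]:
    """Return the first captured span after 'The answer is ', or None."""
    match = ANSWER_PATTERN.search(generation)
    return match.group(1) if match else None


def normalize(text: str) -> str:
    """Lowercase and strip ASCII punctuation (assumed metric behaviour)."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


if __name__ == "__main__":
    clean = "Hence k=0 and n=0. The answer is (A)."
    print(extract_answer(clean))             # (A)
    print(normalize(extract_answer(clean)))  # a

    # The dot in the lookahead is unescaped, so it matches any character:
    # the pattern drops only the last character of the line.
    trailing = "The answer is (B). Q: next"
    print(extract_answer(trailing))          # (B). Q: nex
```

Because the lookahead `(?=.)` matches any character rather than a literal period, the capture is clean only when the generation ends right after the final period. That is the case when generation stops at `</s>` directly after the answer sentence.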
"dataset_name": "abstract_algebra"
"description": "The following are multiple choice questions (with answers) about abstract\
\ algebra.\n\nQ: Statement 1 | Every element of a group generates a cyclic subgroup\
\ of the group. Statement 2 | The symmetric group S_10 has 10 elements.\n(A) True,\
\ True (B) False, False (C) True, False (D) False, True\nA: Let's think step by\
\ step. A cyclic group is a group that is generated by a single element. Hence a\
\ subgroup generated by a single element of a group is cyclic and Statement 1 is\
\ True. The symmetric group $S_n$ has $n!$ elements, hence it is not true that $S_{10}$\
\ has 10 elements, and Statement 2 is False. The answer is (C).\n\nQ: Find the characteristic of\
\ the ring 2Z.\n(A) 0 (B) 3 (C) 12 (D) 30\nA: Let's think step by step. The characteristic\
\ of a ring R is $n$ if the statement $ka = 0$ for all $a\\in 2Z$ implies that\
\ $k$ is a multiple of $n$. Assume that $ka = 0$ for all $a\\in 2Z$ for some $k$.\
\ In particular $2k = 0$. Hence $k=0$ and $n=0$. The answer is (A).\n\nQ: Statement\
\ 1| Every function from a finite set onto itself must be one to one. Statement\
\ 2 | Every subgroup of an abelian group is abelian.\n(A) True, True (B) False,\
\ False (C) True, False (D) False, True\nA: Let's think step by step. Statement\
\ 1 is true. Let $S$ be a finite set. If $f:S \\rightarrow S$ is an onto function,\
\ then $|S| = |f(S)|$. If $f$ were not one to one, then for finite domain $S$ the\
\ image would have fewer than $|S|$ elements, a contradiction.\nStatement 2 is true.\
\ Let $G$ be an abelian group and $H$ be a subgroup of $G$. We need to show that\
\ $H$ is abelian. Let $a,b \\in H$. Then $a,b \\in G$ and $ab=ba$. Since $G$ is\
\ abelian, $ab=ba$. Since $H$ is a subgroup of $G$, $ab \\in H$. Therefore, $ab=ba$\
\ and $H$ is abelian. The answer is (A).\n\nQ: Statement 1 | If aH is an element\
\ of a factor group, then |aH| divides |a|. Statement 2 | If H and K are subgroups\
\ of G then HK is a subgroup of G.\n(A) True, True (B) False, False (C) True, False\
\ (D) False, True\nA: Let's think step by step. Statement 2 is false. Let $H$ be\
\ a subgroup of $S_3$ generated by the cycle $(1,2)$ and $K$ be a subgroup of $S_3$\
\ generated by the cycle $(1,3)$. Both $H$ and $K$ have two elements, the generators\
\ and the identity. However $HK$ contains cycles (1,2), (1,3) and (2,3,1), but the\
\ inverse of (2,3,1) is (2,1,3) and it does not belong to HK, hence HK is not a\
\ subgroup. The answer is (B).\n\nQ: Find all c in Z_3 such that Z_3[x]/(x^2 + c)\
\ is a field.\n(A) 0 (B) 1 (C) 2 (D) 3\nA: Let's think step by step. Z_3[x]/(x^2\
\ + c) is a field if and only if x^2 + c does not have roots in Z_3. That is x^2\
\ + c != 0 for every x in Z_3. If c = 0, then x^2 + c = x^2 has root 0. If c = 1\
\ then x^2 + c = x^2 + 1 = 0 + 1 for x = 0, 1 + 1 = 2 for x = 1 and 1 + 1 = 2 for\
\ x = 2, hence x^2 + 1 does not have any roots. For c = 2 the polynomial x^2 + 2\
\ has two roots at x = 1 and x = 2. Hence Z_3[x]/(x^2 + c) is a field if and only\
\ if c = 1. The answer is (B)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_abstract_algebra"
"dataset_name": "anatomy"
"description": "The following are multiple choice questions (with answers) about anatomy.\n\
\nQ: Which of the following is the body cavity that contains the pituitary gland?\n\
(A) Abdominal (B) Cranial (C) Pleural (D) Spinal\nA: Let's think step by step. We\
\ refer to Wikipedia articles on anatomy for help. Let’s solve this problem step\
\ by step. The pituitary gland is the major endocrine gland attached to the base\
\ of the brain, and it is contained in the Cranial cavity. The answer is (B).\n\n\
Q: Which of these branches of the trigeminal nerve contain somatic motor processes?\n\
(A) The supraorbital nerve (B) The infraorbital nerve (C) The mental nerve (D) None\
\ of the above\nA: Let's think step by step. We refer to Wikipedia articles on anatomy\
\ for help. Let’s solve this problem step by step. \nWe know the following: (A)\
\ The supraorbital nerve (also known as the frontal nerve) is the largest branch\
\ of the ophthalmic nerve and branch of ophthalmic division of the trigeminal nerve.\
\ (B) The infraorbital nerve is a branch of the maxillary division of the trigeminal\
\ nerve. (C) The mental nerve is a branch of the mandibular division of the trigeminal\
\ nerve. All these nerves are purely sensory nerves and do not contain any\
\ somatic motor processes. Therefore, the answer should be none of the above, which\
\ is (D). The answer is (D).\n\nQ: In Angle's Class II Div 2 occlusion there is\n\
(A) excess overbite of the upper lateral incisors. (B) negative overjet of the upper\
\ central incisors. (C) excess overjet of the upper lateral incisors. (D) excess\
\ overjet of the upper central incisors.\nA: Let's think step by step. We refer\
\ to Wikipedia articles on anatomy for help. Let’s solve this problem step by step.\
\ This is a question related to anatomy and orthodontics. Excess overjet is associated\
\ with Class II occlusions; therefore, we can safely eliminate (B) from the list,\
\ as negative overjet is often associated with Class III occlusions. Now, we need\
\ to determine the location of the excess overjet, and that would be the upper (maxillary)\
\ lateral incisors. Only (C) has the correct information. The answer is (C).\n\n\
Q: The pleura\n(A) have no sensory innervation. (B) are separated by a 2 mm space.\
\ (C) extend into the neck. (D) are composed of respiratory epithelium.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on anatomy for help. Let’s\
\ solve this problem step by step. First, recall that the pleura refers to the thin\
\ layer of tissue that covers the lungs and lines the interior wall of the chest\
\ cavity. Now, let’s look at each option:\nOption (A): “The pleura have no sensory\
\ innervation.” This information is not correct. The pleura do have a sensory innervation.\n\
Option (B): “The pleura are separated by a 2 mm space.” This information is not\
\ correct. There is a very thin “potential” space between the layers of the pleura;\
\ however, it is typically filled with serous pleural fluid. \nOption (C): “The\
\ pleura extend into the neck.” This information is actually true. The cervical\
\ pleura, also known as the dome of the pleura, lines the extension\
\ of the pleural cavity into the neck.\nOption (D): “The pleura are composed of\
\ respiratory epithelium.” This information is not correct. The pleura are composed\
\ of connective tissue (CT).\nBecause (A), (B), and (D) are all incorrect, (C) is\
\ the only correct answer. The answer is (C).\n\nQ: What is the embryological origin\
\ of the hyoid bone?\n(A) The first pharyngeal arch (B) The first and second pharyngeal\
\ arches (C) The second pharyngeal arch (D) The second and third pharyngeal arches\n\
A: Let's think step by step. We refer to Wikipedia articles on anatomy for help.\
\ Let’s solve this problem step by step. The hyoid bone, which is also known as\
\ the hyooid, is a small U-shaped bone located in the anterior neck. In its resting\
\ position, it lies between the base of the mandible and the third cervical vertebra.\
\ We know that the second and the third pharyngeal arches give rise to the horns\
\ of the hyoid bone; therefore, the embryological origin of the hyoid bone are the\
\ second and the third pharyngeal arches—this information is covered in the last\
\ option (D). Therefore, we conclude that (D) must be the correct answer. The answer\
\ is (D)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_anatomy"
"dataset_name": "astronomy"
"description": "The following are multiple choice questions (with answers) about astronomy.\n\
\nQ: Where do most short-period comets come from and how do we know?\n(A) The Kuiper\
\ belt; short period comets tend to be in the plane of the solar system just like\
\ the Kuiper belt. (B) The Kuiper belt; short period comets tend to come from random\
\ directions indicating a spherical distribution of comets called the Kuiper belt.\
\ (C) The asteroid belt; short period comets have orbital periods similar to asteroids\
\ like Vesta and are found in the plane of the solar system just like the asteroid\
\ belt. (D) The Oort cloud; short period comets tend to be in the plane of the solar\
\ system just like the Oort cloud.\nA: Let's think step by step. Most short-period\
\ comets come from the Kuiper belt, and we know because short period comets tend\
\ to be in the plane of the solar system, just like the Kuiper belt is. The answer\
\ is (A).\n\nQ: You are pushing a truck along a road. Would it be easier to accelerate\
\ this truck on Mars? Why? (Assume there is no friction)\n(A) It would be harder\
\ since the truck is heavier on Mars. (B) It would be easier since the truck is\
\ lighter on Mars. (C) It would be harder since the truck is lighter on Mars. (D)\
\ It would be the same no matter where you are.\nA: Let's think step by step. If\
\ we assume that there is no friction, the force needed to accelerate the truck\
\ is by Newton’s second law only dependent on the mass of the truck. Hence (A),\
\ (B) and (C) are incorrect since it doesn’t matter that it’s on Mars, and (D) is\
\ the correct answer. The answer is (D).\n\nQ: Say the pupil of your eye has a diameter\
\ of 5 mm and you have a telescope with an aperture of 50 cm. How much more light\
\ can the telescope gather than your eye?\n(A) 10000 times more (B) 100 times more\
\ (C) 1000 times more (D) 10 times more\nA: Let's think step by step. The amount\
\ of light is proportional to the aperture area $A = \\pi D^2/4$ for a lens with\
\ diameter $D$, so the relative amounts of light between the eye with diameter 5mm\
\ and the telescope with diameter 50 cm is $(50 cm)^2/(5mm)^2 = 10000$. The answer\
\ is (A).\n\nQ: Why isn't there a planet where the asteroid belt is located?\n(A)\
\ A planet once formed here but it was broken apart by a catastrophic collision.\
\ (B) There was not enough material in this part of the solar nebula to form a planet.\
\ (C) There was too much rocky material to form a terrestrial planet but not enough\
\ gaseous material to form a jovian planet. (D) Resonance with Jupiter prevented\
\ material from collecting together to form a planet.\nA: Let's think step by step.\
\ The asteroid belt is a stellar disc consisting of a large number of asteroids\
\ between Mars and Jupiter's orbits. The asteroids in this belt are affected by\
\ the gravitational pull from both other asteroids and nearby planets. Due to the\
\ strong gravitational force of Jupiter there are resonances that give rise to low\
\ density regions of asteroids known as the Kirkwood gap. So (B) and (C) are not\
\ correct since it’s not a lack of material that prevents a planet from being formed,\
\ and (A) is incorrect because the Kirkwood gap would have prevented a planet from\
\ forming in the first place, and (D) is the correct option. The answer is (D).\n\
\nQ: Why is Mars red?\n(A) Because the surface is covered with heavily oxidized\
\ (\"rusted\") minerals. (B) Because the atmosphere scatters more light at bluer\
\ wavelengths transmitting mostly red light. (C) Because Mars is covered with ancient\
\ lava flows which are red in color. (D) Because flowing water on Mars's surface\
\ altered the surface minerals several billion years ago.\nA: Let's think step by\
\ step. Option (B) is not correct because if the red color was caused by the scattering\
\ off the atmosphere, then the earth with a much thicker atmosphere would also look\
\ red. Options (C) and (D) are not specific enough about why the color of the surface\
\ would be red, while (A) is correct because it explains that the surface is red\
\ due to the rusted materials on the surface and the red color comes from the rust.\
\ So the correct option is (A). The answer is (A)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_astronomy"
"dataset_name": "business_ethics"
"description": "The following are multiple choice questions (with answers) about business\
\ ethics.\n\nQ: In contrast to _______, _______ aim to reward favourable behaviour\
\ by companies. The success of such campaigns have been heightened through the use\
\ of ___________, which allow campaigns to facilitate the company in achieving _________\
\ .\n(A) Buycotts, Boycotts, Blockchain technology, Charitable donations (B) Buycotts,\
\ Boycotts, Digital technology, Increased Sales (C) Boycotts, Buyalls, Blockchain\
\ technology, Charitable donations (D) Boycotts, Buycotts, Digital technology, Increased\
\ Sales\nA: Let's think step by step. We refer to Wikipedia articles on business\
\ ethics for help. The sentence that best uses the possible options above is “In\
\ contrast to *boycotts*, *buycotts* aim to reward favourable behavior by companies.\
\ The success of such campaigns have been heightened through the use of *digital\
\ technology*, which allow campaigns to facilitate the company in achieving *increased\
\ sales*.” The answer is (D).\n\nQ: _______ is the direct attempt to formally or\
\ informally manage ethical issues or problems, through specific policies, practices\
\ and programmes.\n(A) Corporate social responsibility (B) Business ethics management\
\ (C) Sustainability (D) Environmental management\nA: Let's think step by step.\
\ We refer to Wikipedia articles on business ethics for help. The direct attempt\
\ to manage ethical issues through specific policies, practices, and programs is business\
\ ethics management. The answer is (B).\n\nQ: Three contrasting tactics that CSO's\
\ can engage in to meet their aims are ________ which typically involves research\
\ and communication, ________, which may involve physically attacking a company's\
\ operations or ________, often involving some form of _______.\n(A) Non-violent\
\ direct action, Violent direct action, Indirect action, Boycott (B) Indirect action,\
\ Instrumental action, Non-violent direct action, Information campaign (C) Indirect\
\ action, Violent direct action, Non-violent direct-action Boycott (D) Non-violent\
\ direct action, Instrumental action, Indirect action, Information campaign\nA:\
\ Let's think step by step. We refer to Wikipedia articles on business ethics for\
\ help. The sentence that best uses the possible options above is “Three contrasting\
\ tactics that CSO's can engage in to meet their aims are *indirect action*, which\
\ typically involves research and communication, *violent direct action*, which\
\ may involve physically attacking a company's operations or *non-violent direct\
\ action*, often involving some form of *boycott*.” The answer is (C).\n\nQ: To\
\ ensure the independence of the non-executive board members, there are a number\
\ of steps which can be taken, which include non-executives being drawn from _______\
\ the company, being appointed for a _________ time period as well as being appointed\
\ _________.\n(A) Outside, Limited, Independently (B) Inside, Limited, Intermittently\
\ (C) Outside, Unlimited, Intermittently (D) Inside, Unlimited, Independently\n\
A: Let's think step by step. We refer to Wikipedia articles on business ethics for\
\ help. The sentence that best uses the possible options above is “To ensure the\
\ independence of the non-executive board members, there are a number of steps which\
\ can be taken, which include non-executives being drawn from *outside* the company,\
\ being appointed for a *limited* time period as well as being appointed *independently*.”\
\ The answer is (A).\n\nQ: Beyond the business case for engaging in CSR there are\
\ a number of moral arguments relating to: negative _______, the _______that corporations\
\ possess and the ________ of business and society.\n(A) Externalities, Power, Independence\
\ (B) Publicity, Insubstantial resources, Mutual dependence (C) Publicity, Power,\
\ Independence (D) Externalities, Power, Mutual dependence\nA: Let's think step\
\ by step. We refer to Wikipedia articles on business ethics for help. The sentence\
\ that best uses the possible options above is “Beyond the business case for engaging\
\ in CSR there are a number of moral arguments relating to: negative *externalities*,\
\ the *power* that corporations possess and the *mutual dependence* of business\
\ and society.” The answer is (D)."
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_business_ethics"
"dataset_name": "clinical_knowledge"
"description": "The following are multiple choice questions (with answers) about clinical\
\ knowledge.\n\nQ: Glycolysis is the name given to the pathway involving the conversion\
\ of:\n(A) glycogen to glucose-1-phosphate. (B) glycogen or glucose to fructose.\
\ (C) glycogen or glucose to pyruvate or lactate. (D) glycogen or glucose to pyruvate\
\ or acetyl CoA.\nA: Let's think step by step. We refer to Wikipedia articles on\
\ clinical knowledge for help. Glycolysis is the name given to the pathway involving\
\ conversion of glycogen or glucose to pyruvate or lactate. The answer is (C).\n\
\nQ: What is the difference between a male and a female catheter?\n(A) Male and\
\ female catheters are different colours. (B) Male catheters are longer than female\
\ catheters. (C) Male catheters are bigger than female catheters. (D) Female catheters\
\ are longer than male catheters.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on clinical knowledge for help. The difference between a male and female\
\ catheter is that male catheters tend to be longer than female catheters. The answer\
\ is (B).\n\nQ: How many attempts should you make to cannulate a patient before\
\ passing the job on to a senior colleague, according to the medical knowledge of\
\ 2020?\n(A) 4 (B) 3 (C) 2 (D) 1\nA: Let's think step by step. We refer to Wikipedia\
\ articles on clinical knowledge for help. According to the medical protocol as\
\ of 2020, you should make two attempts to cannulate a patient before passing the\
\ job on to a more-senior practitioner. The answer is (C).\n\nQ: In the assessment\
\ of the hand function which of the following is true?\n(A) Abduction of the thumb\
\ is supplied by spinal root T2 (B) Opposition of the thumb by opponens policis\
\ is supplied by spinal root T1 (C) Finger adduction is supplied by the median nerve\
\ (D) Finger abduction is mediated by the palmar interossei\nA: Let's think step\
\ by step. We refer to Wikipedia articles on clinical knowledge for help. Of all\
\ the options, it is only true that the opposition of the thumb by opponens pollicis\
\ is supplied by spinal root T1. The answer is (B).\n\nQ: The energy for all forms\
\ of muscle contraction is provided by:\n(A) ATP. (B) ADP. (C) phosphocreatine.\
\ (D) oxidative phosphorylation.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on clinical knowledge for help. The energy for muscular contraction is\
\ provided by ATP (adenosine triphosphate), which is the main energy currency of the cell.\
\ The answer is (A)."
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_clinical_knowledge"
"dataset_name": "college_biology"
"description": "The following are multiple choice questions (with answers) about college\
\ biology.\n\nQ: Which of the following represents an accurate statement concerning\
\ arthropods?\n(A) They possess an exoskeleton composed primarily of peptidoglycan.\
\ (B) They possess an open circulatory system with a dorsal heart. (C) They are\
\ members of a biologically unsuccessful phylum incapable of exploiting diverse\
\ habitats and nutrition sources. (D) They lack paired, jointed appendages.\nA:\
\ Let's think step by step. Peptidoglycan is known to comprise the cell wall\
\ of most bacteria, rather than the exoskeleton of arthropods, which is made of\
\ chitin, which rules out (A). The answer (C) is false because arthropods are a\
\ highly successful phylum. Likewise, arthropods have paired, jointed appendages,\
\ which rules out (D). The only remaining option is (B), as arthropods have an open\
\ circulatory system with a dorsal tubular heart. The answer is (B).\n\nQ: In a\
\ given population, 1 out of every 400 people has a cancer caused by a completely\
\ recessive allele, b. Assuming the population is in Hardy-Weinberg equilibrium,\
\ which of the following is the expected proportion of individuals who carry the\
\ b allele but are not expected to develop the cancer?\n(A) 1/400 (B) 19/400 (C)\
\ 20/400 (D) 38/400\nA: Let's think step by step. According to the Hardy Weinberg\
\ Law, $p^2 + 2 p q + q^2 = 1$, and $p + q = 1$ where $p$ is the frequency of the\
\ dominant allele, $q$ is the frequency of the recessive allele, and $p^2$, $q^2$,\
\ and $2pq$ are the frequencies of dominant homozygous, recessive homozygous, and\
\ heterozygous individuals, respectively. The frequency of the recessive allele\
\ (q) is $\\sqrt{\\frac{1}{400}} = 0.05$. We have $p = 1 - q = 0.95$. The frequency\
\ of heterozygous individuals is $2pq = 2 \\cdot 0.05 \\cdot 0.95 = 0.095$. The\
\ number of heterozygous individuals is equal to the frequency of heterozygous individuals\
\ times the size of the population, or $0.095 * 400 = 38$. So we end up with 38/400.\
\ The answer is (D).\n\nQ: According to the pressure-flow model of movement of phloem\
\ contents, photosynthate movement from source to sink is driven by\n(A) an ATP-dependent\
\ pressure-flow pump (B) a water-pressure potential gradient (C) transpiration (D)\
\ apoplastic diffusion\nA: Let's think step by step. It is a gradient in water pressure\
\ that induces the movement of phloem content, which refers to answer (B). The mechanism\
\ of movement does not rely on metabolism, which rules out (A). Transpiration refers\
\ to the exhalation of water vapor through plant stomata, and is also not related,\
\ which rules out (C). While the apoplastic pathway is one of two main pathways\
\ for water transport in plants, it is not central to the pressure flow model, which\
\ rules out (D). The answer is (B).\n\nQ: Which of the following contain DNA sequences\
\ required for the segregation of chromosomes in mitosis and meiosis?\n(A) Telomeres\
\ (B) Centromeres (C) Nucleosomes (D) Spliceosomes\nA: Let's think step by step.\
\ The genetic material in Telomeres is not used, which rules out (A). Nucleosomes\
\ are the repeating subunit that comprises chromatin packed in a cell nucleus, and\
\ do not specifically refer to DNA sequences necessary for segregating chromosomes\
\ in cell division, which rules out (C). A spliceosome is a large ribonucleoprotein\
\ that removes introns from transcribed pre-mRNA rather than governing chromosome\
\ segregation. Centromeres are directly responsible for segregating chromosomes\
\ in cell division. The answer is (B).\n\nQ: The presence of homologous structures\
\ in two different organisms, such as the humerus in the front limb of a human and\
\ a bird, indicates that\n(A) the human and bird are polyphyletic species (B) a\
\ human's and bird's evolution is convergent (C) the human and bird belong to a\
\ clade (D) the human and bird developed by analogy\nA: Let's think step by step.\
\ Polyphyletic species are organisms that are grouped due to having similar characteristics\
\ but which do not have a common ancestor. This is not the case for humans and birds,\
\ which rules out (A). Convergent evolution refers to the independent development\
\ of similar features in different species at different periods, which is also not\
\ the case for humans and birds, which rules out (B). Analogy refers to the superficial\
\ resemblance of structures that have different origins, which is not the case for\
\ the human and bird forearms, which rules out (D). Humans and birds do belong to\
\ the same clade - a group of organisms composed of a common ancestor and all of its descendants. The answer\
\ is (C)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_biology"
"dataset_name": "college_chemistry"
"description": "The following are multiple choice questions (with answers) about college\
\ chemistry.\n\nQ: 3 Cl−(aq) + 4 CrO_4^2−(aq) + 23 H+(aq) → 3 HClO2(aq) + 4 Cr3+(aq)\
\ + 10 H2O(l). In the reaction shown above, Cl−(aq) behaves as\n(A) an acid (B)\
\ a base (C) a catalyst (D) a reducing agent\nA: Let's think step by step. A molecule\
\ that behaves as a base accepts an H+ ion (or proton) from another molecule, whereas\
\ a molecule that behaves as an acid donates an H+ ion (or proton) to another molecule.\
\ Neither of these is the case for Cl in this reaction, which rules out (A) and\
\ (B). A catalyst is a substance that only accelerates a reaction without itself\
\ undergoing chemical change, which is not the case here. This rules out (C). Instead,\
\ the $Cl^{-}$ ions carry a negative charge, which they donate in the reaction\
\ to form 3 HClO2. This is the behavior of a reducing agent, or (D). The answer\
\ is (D).\n\nQ: Which of the following statements about the lanthanide elements\
\ is NOT true?\n(A) The most common oxidation state for the lanthanide elements\
\ is +3. (B) Lanthanide complexes often have high coordination numbers (> 6). (C)\
\ All of the lanthanide elements react with aqueous acid to liberate hydrogen. (D)\
\ The atomic radii of the lanthanide elements increase across the period from La\
\ to Lu.\nA: Let's think step by step. The atomic radii of the lanthanide elements\
\ in fact decrease across the period from La to Lu. Options (A), (B), and (C) are\
\ all true. This means that only (D) is NOT true. The answer is (D).\n\nQ: Which\
\ of the following lists the hydrides of group-14 elements in order of thermal stability,\
\ from lowest to highest?\n(A) PbH4 < SnH4 < GeH4 < SiH4 < CH4 (B) PbH4 < SnH4 <\
\ CH4 < GeH4 < SiH4 (C) CH4 < SiH4 < GeH4 < SnH4 < PbH4 (D) CH4 < PbH4 < GeH4 <\
\ SnH4 < SiH4\nA: Let's think step by step. The thermal stability of group-14 hydrides\
\ decreases as we move from the top of group 14 to the bottom. The order of elements\
\ in the group from top to bottom is C, Si, Ge, Sn, Pb. Therefore in order of increasing\
\ thermal stability we have PbH4, SnH4, GeH4, SiH4, and CH4, or answer (A). The\
\ answer is (A).\n\nQ: Predict the number of lines in the EPR spectrum of a solution\
\ of 13C-labelled methyl radical (13CH3•), assuming the lines do not overlap.\n\
(A) 4 (B) 3 (C) 6 (D) 24 (E) 8\nA: Let's think step by step. The electron paramagnetic\
\ resonance spectrum will be split by two forms of interactions. The first is the\
\ hyperfine interaction with the 13C (nuclear spin $I = \\frac{1}{2}$) which will\
\ split the spectrum into 2 lines. This will be further split into 4 lines by the\
\ interaction with three equivalent 1H nuclei. The total number of lines is therefore\
\ $2 \\cdot 4 = 8$. The answer is (E)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_chemistry"
"dataset_name": "college_computer_science"
"description": "The following are multiple choice questions (with answers) about college\
\ computer science.\n\nQ: Which of the following regular expressions is equivalent\
\ to (describes the same set of strings as) (a* + b)*(c + d)?\n(A) a*(c + d)+ b(c\
\ + d)\n(B) a*(c + d)* + b(c + d)*\n(C) a*(c + d)+ b*(c + d)\n(D) (a + b)*c +(a\
\ + b)*d\nA: Let's think step by step. We know that:\n1. (X* + Y)* = (X + Y)*\n\
2. X(Y + Z)? = XY + XZ\nUsing equation 1 we can rewrite (a* + b)*(c + d)? as:\n\
3. (a + b)*(c + d)?\nUsing equation 2 we can rewrite equation 3 as:\n(a + b)*c +\
\ (a + b)*d The answer is (D).\n\nQ: The Singleton design pattern is used to guarantee\
\ that only a single instance of a class may be instantiated. Which of the following\
\ is (are) true of this design pattern?\nI. The Singleton class has a static factory\
\ method to provide its instance.\nII. The Singleton class can be a subclass of\
\ another class.\nIII. The Singleton class has a private constructor.\n(A) I only\n\
(B) II only\n(C) III only\n(D) I, II, and III\nA: Let's think step by step. Statement\
\ I is a correct statement about a Singleton, because a Singleton restricts instantiation\
\ to a single, static method. Statement II is also correct, because there is no\
\ inherent restriction regarding the inheritance of a Singleton. Statement III is\
\ also correct, because a Singleton must be instantiated only once, so its constructor\
\ is made private to prevent any construction except via its static factory method.\n\
Given these facts, statements I, II, and III are all correct. The answer is (D).\n\
\nQ: A certain pipelined RISC machine has 8 general-purpose registers R0, R1, .\
\ . . , R7 and supports the following operations:\nADD Rs1, Rs2, Rd (Add Rs1 to\
\ Rs2 and put the sum in Rd)\nMUL Rs1, Rs2, Rd (Multiply Rs1 by Rs2 and put the\
\ product in Rd)\nAn operation normally takes one cycle; however, an operation takes\
\ two cycles if it produces a result required by the immediately following operation\
\ in an operation sequence.\nConsider the expression AB + ABC + BC, where variables\
\ A, B, C are located in registers R0, R1, R2. If the contents of these three registers\
\ must not be modified, what is the minimum number of clock cycles required for\
\ an operation sequence that computes the value of AB + ABC + BC?\n(A) 5 (B) 6 (C)\
\ 7 (D) 8\nA: Let's think step by step. First, we are given that A is in R0, B is\
\ in R1, and C is in R2.\nNext, we can see that we must compute three multiplies\
\ (AB, BC, and ABC) and two adds (AB + ABC, (AB + ABC) + BC) to compute our final\
\ answer, resulting in a minimum of five clock cycles.\nNext, we can see that there\
\ is no way to avoid at least one pipeline stall when computing our final answer,\
\ because to compute our final sum we must wait at least one cycle for the results\
\ from the previous stage to be ready. Thus, our minimum number of cycles must be\
\ 6.\nWe can verify that we can create a solution that requires only six cycles\
\ as follows:\ncompute AB: MUL R0, R1, R3\ncompute BC: MUL R1, R2, R4\ncompute ABC:\
\ MUL R3, R2, R5\ncompute AB + BC: ADD R3, R4, R6\nSTALL\ncompute AB + ABC + BC:\
\ ADD R5, R6, R7\nSo there are 6 cycles. The answer is (B).\n\nQ: A compiler generates\
\ code for the following assignment statement.\nG := (A + B) * C - (D + E) * F\n\
The target machine has a single accumulator and a single-address instruction set\
\ consisting of instructions load, store, add, subtract, and multiply. For the arithmetic\
\ operations, the left operand is taken from the accumulator and the result appears\
\ in the accumulator. The smallest possible number of instructions in the resulting\
\ code is\n(A) 5 (B) 6 (C) 7 (D) 9\nA: Let's think step by step. We can compute\
\ the final answer with the following sequence of operations:\n1. LOAD D (accumulator\
\ = D)\n2. ADD E (accumulator = D+E)\n3. MUL F (accumulator = (D+E)*F)\n4. STORE\
\ X (X = (D+E)*F)\n5. LOAD A (accumulator = A)\n6. ADD B (accumulator = A+B)\n\
7. MUL C (accumulator = (A+B)*C)\n8. SUB X (accumulator = (A+B)*C - (D+E)*F)\n\
9. STORE G (G = (A+B)*C - (D+E)*F)\nThis sequence takes 9 instructions. The answer\
\ is (D).\n\nQ: Consider a computer design in which multiple processors, each with\
\ a private cache memory, share global memory using a single bus. This bus is the\
\ critical system resource. Each processor can execute one instruction every 500\
\ nanoseconds as long as memory references are satisfied by its local cache. When\
\ a cache miss occurs, the processor is delayed for an additional 2,000 nanoseconds.\
\ During half of this additional delay, the bus is dedicated to serving the cache\
\ miss. During the other half, the processor cannot continue, but the bus is free\
\ to service requests from other processors. On average, each instruction requires\
\ 2 memory references. On average, cache misses occur on 1 percent of references.\
\ What proportion of the capacity of the bus would a single processor consume, ignoring\
\ delays due to competition from other processors?\n(A) 1/50 (B) 1/27 (C) 1/25 (D)\
\ 2/27\nA: Let's think step by step. We know that each instruction requires two\
\ memory references, and that there is an average cache miss rate\
\ of one percent.\nThus a given processor has:\n(1 cache miss / 100 references)\
\ * (2 references / instruction) =\n(2 cache misses / 100 instructions), so:\nmisses_per_instruction\
\ = 1 cache miss / 50 instructions.\nNext, we know that each instruction requires\
\ 500 nanoseconds when there is no cache miss, and 500 + 2000 = 2500 nanoseconds\
\ when there is a cache miss. Thus:\n50 instructions / ((49 * 500) + (1 * 2500)) nanoseconds,\
\ so:\ninstructions_per_ns = 50 instructions / 27000 nanoseconds.\nNow, we know\
\ that each cache miss locks the bus for half of the 2000 nanosecond cache miss\
\ delay, or 1000 nanoseconds, so:\nlock_ns_per_miss = 1000 nanoseconds / cache miss.\n\
Thus we can see that on average a single processor will lock the bus for:\nlock_ns_per_miss\
\ * misses_per_instruction * instructions_per_ns =\n(1000 nanoseconds / cache miss)\
\ * (1 cache miss / 50 instructions) * (50 instructions / 27000 nanoseconds) = 1000\
\ * (1/50) * (50/27000) = 1000/27000 = 1/27. The answer is (B)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_computer_science"
"dataset_name": "college_mathematics"
"description": "The following are multiple choice questions (with answers) about college\
\ mathematics.\n\nQ: Let V be the set of all real polynomials p(x). Let transformations\
\ T, S be defined on V by T:p(x) -> xp(x) and S:p(x) -> p'(x) = d/dx p(x), and interpret\
\ (ST)(p(x)) as S(T(p(x))). Which of the following is true?\n(A) ST = 0 (B) ST =\
\ T (C) ST = TS (D) ST - TS is the identity map of V onto itself.\nA: Let's think\
\ step by step. For a given polynomial $p$ we have\n\\[ST(p) = (xp(x))’ = p(x) +\
\ xp’(x)\\]\nand\n\\[TS(p) = xp’(x).\\]\nHence \\[ST(p) - TS(p) = p(x) + xp’(x)\
\ - xp’(x) = p(x).\\] The answer is (D).\n\nQ: Suppose that f(1 + x) = f(x) for all real\
\ x. If f is a polynomial and f(5) = 11, then f(15/2)\n(A) -11 (B) 0 (C) 11 (D)\
\ 33/2\nA: Let's think step by step. The only polynomial so that $f(1 + x) = f(x)$\
\ is a constant polynomial. Hence $f(5) = 11 = f(15/2)$. The answer is (C).\n\n\
Q: Let A be a real 2x2 matrix. Which of the following statements must be true?\n\
I. All of the entries of A^2 are nonnegative.\nII. The determinant of A^2 is nonnegative.\n\
III. If A has two distinct eigenvalues, then A^2 has two distinct eigenvalues.\n\
(A) I only (B) II only (C) III only (D) II and III only\nA: Let's think step by\
\ step. We have \\[ det(A^2) = (det(A))^2 \\geq 0,\\] hence II holds.\nIII is false:\
\ as a counterexample take a diagonal matrix with -1 and 1 on the diagonal. Then\
\ $A^2$ is the identity matrix. The answer is (B).\n\nQ: Let A be the set of all\
\ ordered pairs of integers (m, n) such that 7m + 12n = 22. What is the greatest\
\ negative number in the set B = {m + n : (m, n) \\in A}?\n(A) -5 (B) -4 (C) -3\
\ (D) -2\nA: Let's think step by step. We have 12n = 22 - 7m and one of the solutions\
\ is $m = -2$, $n = 3$. Then $m + n = 1$, hence we need to look for smaller $m$\
\ in order to make $m + n$ negative. The next solution is $m = -14$ and $n = 10$.\
\ For smaller $m$ we have $m + n$ smaller than $-4$. The answer is (B).\n\nQ: A\
\ tank initially contains a salt solution of 3 grams of salt dissolved in 100 liters\
\ of water. A salt solution containing 0.02 grams of salt per liter of water is\
\ sprayed into the tank at a rate of 4 liters per minute. The sprayed solution is\
\ continually mixed with the salt solution in the tank, and the mixture flows out\
\ of the tank at a rate of 4 liters per minute. If the mixing is instantaneous,\
\ how many grams of salt are in the tank after 100 minutes have elapsed?\n(A) 2\
\ (B) 2 - e^-2 (C) 2 + e^-2 (D) 2 + e^-4\nA: Let's think step by step. For all $t\
\ \\in \\mathbb{R}$, let $s(t)$ denote the number of grams of salt in the tank at the\
\ $t$ minute mark. Then $s(0) = 3$.\nWe use $s$ and $s(t)$ interchangeably. We also\
\ use $s^{\\prime}$ and $s^{\\prime}(t)$ interchangeably. The solution sprayed into\
\ the tank adds $(0.02) 4=2 / 25$ grams of salt per minute. There are always 100\
\ liters of liquid in the tank, containing $s$ grams of salt. So the density of\
\ salt in the tank is $s / 100$ grams per liter. The flow of water out of the tank\
\ therefore subtracts $4(s / 100)=s / 25$ grams of salt per minute. Then, for all\
\ $t \\in \\mathbb{R}$, we have $s^{\\prime}(t)=(2 / 25)-(s / 25)=(2-s) / 25$, and\
\ so $[s(t)=2] \\Rightarrow\\left[s^{\\prime}(t)=0\\right]$. For all $t \\in \\mathbb{R}$,\n\
$$\n\\frac{d}{d t}[\\ln (s-2)]=\\frac{s^{\\prime}}{s-2}=\\frac{-1}{25}=\\frac{d}{d t}\\\
left[-\\frac{t}{25}\\right] .\n$$\nChoose $C \\in \\mathbb{R}$ such that, for all\
\ $t \\in \\mathbb{R}, \\ln ((s(t)-2))=-[t / 25]+C$. Let $K:=e^{C}$. Then, for all\
\ $t \\in \\mathbb{R}$, we have $(s(t))-2=K e^{-t / 25}$, and so $s(t)=2+K e^{-t\
\ / 25}$. Then $3=s(0)=2+K e^{0}=2+K$, so $K=1$. Then $s(100)=2+K e^{-100 / 25}=2+1\
\ \\cdot e^{-4}=2+e^{-4}$. The answer is (D)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_mathematics"
"dataset_name": "college_medicine"
"description": "The following are multiple choice questions (with answers) about college\
\ medicine.\n\nQ: An expected side effect of creatine supplementation is:\n(A) muscle\
\ weakness. (B) gain in body mass. (C) muscle cramps. (D) loss of electrolytes.\n\
A: Let's think step by step. We refer to Wikipedia articles on medicine for help.\
\ Creatine supplementation is a dietary supplement that results in body mass gain.\
\ The answer is (B).\n\nQ: Which of the following is not a true statement?\n(A)\
\ Muscle glycogen is broken down enzymatically to glucose-1-phosphate (B) Elite\
\ endurance runners have a high proportion of Type I fibres in their leg muscles\
\ (C) Liver glycogen is important in the maintenance of the blood glucose concentration\
\ (D) Insulin promotes glucose uptake by all tissues in the body\nA: Let's think\
\ step by step. We refer to Wikipedia articles on medicine for help. Let’s solve\
\ this step by step and go over each choice: \n(A) “Muscle glycogen is broken down\
\ enzymatically to glucose-1-phosphate”: This is a correct statement.\n(B) “Elite\
\ endurance runners have a high proportion of Type I fibres in their leg muscles”:\
\ This is a correct statement.\n(C) “Liver glycogen is important in the maintenance\
\ of the blood glucose concentration”: This is a correct statement. \n(D) “Insulin\
\ promotes glucose uptake by all tissues in the body”: This is not a correct statement,\
\ because insulin promotes glucose uptake by the liver, adipose tissue, and muscle,\
\ but not all tissues. For instance, the tissues in the brain and red blood cells\
\ are not affected by insulin. The answer is (D).\n\nQ: A high school science teacher\
\ fills a 1 liter bottle with pure nitrogen and seals the lid. The pressure is 1.70\
\ atm, and the room temperature is 25°C. Which two variables will both increase\
\ the pressure of the system, if all other variables are held constant?\n(A) Increasing\
\ temperature, increasing moles of gas (B) Increasing temperature, increasing volume\
\ (C) Decreasing volume, decreasing temperature (D) Decreasing moles of gas, increasing\
\ volume\nA: Let's think step by step. We refer to Wikipedia articles on medicine\
\ for help. The relevant equation for this is the ideal gas law: PV=nRT. To increase\
\ the pressure of the system (P), either n (number of moles of the gas) or\
\ T (temperature) has to increase. The answer is (A).\n\nQ: In a genetic test of\
\ a newborn, a rare genetic disorder is found that has X-linked recessive transmission.\
\ Which of the following statements is likely true regarding the pedigree of this\
\ disorder?\n(A) All descendants on the maternal side will have the disorder. (B)\
\ Females will be approximately twice as affected as males in this family. (C) All\
\ daughters of an affected male will be affected. (D) There will be equal distribution\
\ of males and females affected.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on medicine for help. Let’s solve this step by step. Let's recall first\
\ that females have two X chromosomes, while males have one X and one Y chromosome.\
\ This is an important fact we need to know before answering this question. \nBecause\
\ a male can pass only his single X chromosome to a daughter, if he is affected\
\ by this rare genetic disorder, then we know for sure that he will pass this rare\
\ genetic disorder to all his future-born daughters. Therefore, “(C): All daughters\
\ of an affected male will be affected” is a correct statement. The answer is (C).\n\
\nQ: Glucose is transported into the muscle cell:\n(A) via protein transporters\
\ called GLUT4. (B) only in the presence of insulin. (C) via hexokinase. (D) via\
\ monocarboxylic acid transporters.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on medicine for help. Glucose (also known as the blood sugar) is the\
\ main sugar found in the human body. It is transported into the muscle cell via\
\ diffusion through protein transporters called GLUT4. The answer is (A)."
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_medicine"
"dataset_name": "college_physics"
"description": "The following are multiple choice questions (with answers) about college\
\ physics.\n\nQ: A refracting telescope consists of two converging lenses separated\
\ by 100 cm. The eye-piece lens has a focal length of 20 cm. The angular magnification\
\ of the telescope is\n(A) 4 (B) 5 (C) 6 (D) 20\nA: Let's think step by step. In\
\ a refracting telescope, if both lenses are converging, the focus of both lenses\
\ must be between the two lenses, and thus the focal lengths of the two lenses must\
\ add up to their separation. Since the focal length of one lens is 20 cm, the focal\
\ length of the other must be 80 cm. The magnification is the ratio of these two\
\ focal lengths, or 4. The answer is (A).\n\nQ: The muon decays with a characteristic\
\ lifetime of about 10^-6 second into an electron, a muon neutrino, and an electron\
\ antineutrino. The muon is forbidden from decaying into an electron and just a\
\ single neutrino by the law of conservation of\n(A) charge (B) mass (C) energy\
\ and momentum (D) lepton number\nA: Let's think step by step. Lepton number must\
\ be conserved, meaning the total number of leptons minus the number of antileptons.\
\ If a muon decays into an electron and a single neutrino, the total lepton number\
\ would go from one to two, violating lepton number conservation. The answer is\
\ (D).\n\nQ: One end of a Nichrome wire of length 2L and cross-sectional area A\
\ is attached to an end of another Nichrome wire of length L and cross-sectional\
\ area 2A. If the free end of the longer wire is at an electric potential of 8.0\
\ volts, and the free end of the shorter wire is at an electric potential of 1.0\
\ volt, the potential at the junction of the two wires is most nearly equal to\n\
(A) 2.4 V (B) 3.3 V (C) 4.5 V (D) 5.7 V\nA: Let's think step by step. This is a\
\ simple voltage divider problem, where the longer wire has a resistance four times\
\ that of the shorter wire. So the voltage divider ratio is 1 / 5, meaning that the\
\ potential in the middle is 1.0 V + (8.0 V - 1.0 V) * 1/5 = 2.4 V. The answer is\
\ (A).\n\nQ: For which of the following thermodynamic\
\ processes is the increase in the internal energy of an ideal gas equal to the\
\ heat added to the gas?\n(A) Constant temperature (B) Constant volume (C) Constant\
\ pressure (D) Adiabatic\nA: Let's think step by step. Heat added to the gas can\
\ go into the gas's internal energy or work done against an external force. However,\
\ if the volume of the gas container is constant, no work will be done (since work\
\ is pressure times change in volume). So, at constant volume, all of the heat goes\
\ into the internal energy. The answer is (B)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_physics"
"dataset_name": "computer_security"
"description": "The following are multiple choice questions (with answers) about computer\
\ security.\n\nQ: SHA-1 has a message digest of\n(A) 160 bits (B) 512 bits (C) 628\
\ bits (D) 820 bits\nA: Let's think step by step. Since SHA-1 is a hash function\
\ which takes an input and produces a 160-bit (20-byte) hash value, its message\
\ digest is 160 bits. The answer is (A).\n\nQ: _____________ can modify data on\
\ your system so that your system doesn’t run correctly or you can no longer access\
\ specific data, or it may even ask for ransom in order to give you access.\n(A)\
\ IM Trojans (B) Backdoor Trojans (C) Trojan-Downloader (D) Ransom Trojan\nA:\
\ Let's think step by step. The question describes a trojan that demands a ransom\
\ to restore access, which is a Ransom Trojan. The answer is (D).\n\nQ: What is ethical hacking?\n\
(A) \"Hacking\" ethics so they justify unintended selfish behavior (B) Hacking systems\
\ (e.g., during penetration testing) to expose vulnerabilities so they can be fixed,\
\ rather than exploited (C) Hacking into systems run by those whose ethics you disagree\
\ with (D) A slang term for rapid software development, e.g., as part of hackathons\n\
A: Let's think step by step. Ethical hacking is a process of detecting vulnerabilities\
\ in an application, system, or organization's infrastructure that an attacker can\
\ use to exploit an individual or organization. They use this process to prevent\
\ cyberattacks and security breaches by lawfully hacking into the systems and looking\
\ for weak points. The answer is (B).\n\nQ: The ____________ is anything which your\
\ search engine cannot search.\n(A) Haunted web (B) World Wide Web (C) Surface web\
\ (D) Deep Web\nA: Let's think step by step. The search engine searches on the Surface\
\ Web, which is the portion of the world wide web which is visible so (B,C) are\
\ wrong. The Haunted Web doesn’t correspond to an internet concept. The Deep Web\
\ is the part of the World Wide Web which is not indexed. The answer is (D).\n\n\
Q: Exploitation of the Heartbleed bug permits\n(A) overwriting cryptographic keys\
\ in memory (B) a kind of code injection (C) a read outside bounds of a buffer (D)\
\ a format string attack\nA: Let's think step by step. The Heartbleed Bug is a serious\
\ vulnerability in the popular OpenSSL cryptographic software library. Heartbleed\
\ resulted from improper input validation (due to a missing bounds check) in the\
\ implementation of the TLS heartbeat extension. The vulnerability was classified\
\ as a buffer over-read, a situation where more data can be read than should be\
\ allowed. The answer is (C)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_computer_security"
"dataset_name": "conceptual_physics"
"description": "\nThe following are multiple choice questions (with answers) about\
\ conceptual physics.\n\nQ: Colors in a soap bubble result from light\n(A) converted\
\ to a different frequency (B) deflection (C) interference (D) polarization\nA:\
\ Let's think step by step. In a soap bubble film, the light bounces between the\
\ two soap-air interfaces many times, interfering with itself constructively or\
\ destructively depending on the width of the film. This results in different colors\
\ being visible. The answer is (C).\n\nQ: Compared with the mass of a uranium atom\
\ undergoing fission, the combined masses of the products after fission are\n(A)\
\ less (B) more (C) the same (D) zero\nA: Let's think step by step. Fission releases\
\ energy, which comes from the rest mass of its initial nucleus. Thus the mass of\
\ the products is less than the mass of the reactant uranium nucleus. The answer\
\ is (A).\n\nQ: Things that are equivalent according to the equivalence principle\
\ are\n(A) space and time. (B) a traveling twin and a stay-at-home twin. (C) gravity\
\ and acceleration. (D) mass and energy.\nA: Let's think step by step. Einstein’s\
\ famous equivalence principle states that gravity and acceleration are equivalent.\
\ The answer is (C).\n\nQ: Which of these three elements has the most mass per nucleon?\n\
(A) Hydrogen (B) Iron (C) Uranium (D) Same in each\nA: Let's think step by step.\
\ Due to nuclear binding energy, the mass of an atomic nucleus is less than the\
\ sum of individual masses of the free constituent protons and neutrons; this is\
\ known as the mass defect. Hydrogen has no mass defect because it has only a single\
\ nucleon, so it will have the most mass per nucleon. The answer is (A).\n\nQ: A\
\ model airplane flies slower when flying into the wind and faster with wind at\
\ its back. When launched at right angles to the wind (a cross wind), its groundspeed\
\ compared with flying in still air is\n(A) the same (B) greater (C) less (D) either\
\ greater or less depending on wind speed\nA: Let's think step by step. The plane’s\
\ speed in the direction of the wind is greater than it would be in the absence\
\ of wind, and its speed orthogonal to the wind is the same as it would be in\
\ the absence of the wind. The total speed, which is these two components added\
\ in quadrature, is thus greater than the speed in still air. The answer is (B)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_conceptual_physics"
"dataset_name": "econometrics"
"description": "The following are multiple choice questions (with answers) about econometrics.\n\
\nQ: Suppose now that a researcher wishes to use information criteria to determine\
\ the optimal lag length for a VAR. 500 observations are available for the bi-variate\
\ VAR, and the values of the determinant of the variance-covariance matrix of residuals\
\ are 0.0336, 0.0169, 0.0084, and 0.0062 for 1, 2, 3, and 4 lags respectively. What\
\ is the optimal model order according to Akaike's information criterion?\n(A) 1\
\ lag (B) 2 lags (C) 3 lags (D) 4 lags\nA: Let's think step by step. We refer to\
\ Wikipedia articles on econometrics for help. Let’s solve this problem step by\
\ step. First of all, let’s recall that for a given set of data, Akaike's information\
\ criterion (AIC) allows us to measure how well a statistical model fits the data;\
\ it is an estimator of prediction error. Here in this problem we will need to use\
\ the formula ln(det(sigma_hat)) + (2 * k / T) to determine the values of Akaike’s\
\ criterion, where ln denotes the natural log function, det the determinant function,\
\ k the total number of parameters (across both equations), and T the number\
\ of observations (which, in this case, is equal to 500). For 1 lag, the number\
\ of parameters in total is equal to 6; for 2 lags, it is 10; for 3 lags, it is\
\ 14; and for 4 lags, it is 18. Now, let’s calculate the values of the criterion\
\ for each lag:\n(A) 1 lag: ln(0.0336) + (2 * 6 / 500) = ln(0.0336) + (12 / 500)\
\ = -3.369\n(B) 2 lags: ln(0.0169) + (2 * 10 / 500) = ln(0.0169) + (20 / 500) =\
\ -4.040\n(C) 3 lags: ln(0.0084) + (2 * 14 / 500) = ln(0.0084) + (28 / 500) = -4.724\n\
(D) 4 lags: ln(0.0062) + (2 * 18 / 500) = ln(0.0062) + (36 / 500) = -5.011\nBecause\
\ the optimal model order according to AIC minimizes the information criterion,\
\ the answer should be the one with the lowest value. In this case, (D) has the\
\ lowest value. The answer is (D).\n\nQ: Consider the following AR(1) model with\
\ the disturbances having zero mean and unit variance\nyt = 0.2 + 0.4 yt-1 + ut\n\
The (unconditional) mean of y will be given by\n(A) 0.2 (B) 0.4 (C) 0.5 (D) 0.33\n\
A: Let's think step by step. We refer to Wikipedia articles on econometrics for\
\ help. Let’s solve this problem step by step. If we have an AR(1) model with\
\ the disturbances having zero mean and unit variance, then the unconditional mean\
\ of y is equal to the following:\nunconditional mean of y = (the intercept term)\
\ / (1 - autoregressive coefficient)\nWe know that the intercept term is 0.2 and\
\ the autoregressive coefficient is 0.4; thus, we have:\nunconditional mean of y\
\ = (0.2) / (1 - 0.4) = (0.2) / (0.6) = 2 / 6 = 1 / 3, which is approximately 0.33.\
\ That means that the answer should be (D) 0.33. The answer is (D).\n\nQ: What would\
\ be the consequences for the OLS estimator if heteroscedasticity is present in\
\ a regression model but ignored?\n(A) It will be biased (B) It will be inconsistent\
\ (C) It will be inefficient (D) All of (a), (b) and (c) will be true.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on econometrics for help. Heteroscedasticity\
\ refers to the condition where the variance of the error terms is not constant\
\ across multiple observations. If heteroscedasticity is present in a regression\
\ model, then the OLS coefficient estimates will still be unbiased\
\ and consistent, but they will be inefficient. Because (A) and (B) are incorrect choices\
\ and (C) is a correct choice, (D) cannot be the right answer. Ultimately, (C) is\
\ the only true choice. The answer is (C).\n\nQ: Suppose that a test statistic has\
\ associated with it a p-value of 0.08. Which one of the following statements is\
\ true?\n(i) If the size of the test were exactly 8%, we would be indifferent between\
\ rejecting and not rejecting the null hypothesis\n(ii) The null would be rejected\
\ if a 10% size of test were used\n(iii) The null would not be rejected if a 1%\
\ size of test were used\n(iv) The null would be rejected if a 5% size of test were\
\ used.\n(A) (ii) and (iv) only (B) (i) and (iii) only (C) (i), (ii), and (iii)\
\ only (D) (i), (ii), (iii), and (iv).\nA: Let's think step by step. We refer to\
\ Wikipedia articles on econometrics for help. Let’s reason about each of the options.\n\
(i) is a true statement.\n(ii) is a true statement.\n(iii) is a true statement.\n\
(iv) is not a true statement. Thus, (i), (ii), and (iii) are true. The answer is\
\ (C).\n\nQ: For a stationary autoregressive process, shocks will\n(A) Eventually\
\ die away (B) Persist indefinitely (C) Grow exponentially (D) Never occur\nA: Let's\
\ think step by step. We refer to Wikipedia articles on econometrics for help. This\
\ is a question about stationary processes. For a stationary autoregressive\
\ process, shocks will eventually die away. The answer is (A)."
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_econometrics"