Commit da211969 authored by Jess, committed by GitHub

Merge branch 'EleutherAI:main' into main

parents 1b97e487 801322e0
"dataset_name": "anatomy"
"description": "The following are multiple choice questions (with answers) about anatomy.\n\
\nQ: Which of the following is the body cavity that contains the pituitary gland?\n\
(A) Abdominal (B) Cranial (C) Pleural (D) Spinal\nA: Let's think step by step. We\
\ refer to Wikipedia articles on anatomy for help. Let’s solve this problem step\
\ by step. The pituitary gland is the major endocrine gland attached to the base\
\ of the brain, and it is contained in the Cranial cavity. The answer is (B).\n\n\
Q: Which of these branches of the trigeminal nerve contain somatic motor processes?\n\
(A) The supraorbital nerve (B) The infraorbital nerve (C) The mental nerve (D) None\
\ of the above\nA: Let's think step by step. We refer to Wikipedia articles on anatomy\
\ for help. Let’s solve this problem step by step. \nWe know the following: (A)\
\ The supraorbital nerve (also known as the frontal nerve) is the largest branch\
\ of the ophthalmic nerve and branch of ophthalmic division of the trigeminal nerve.\
\ (B) The infraorbital nerve is a branch of the maxillary division of the trigeminal\
\ nerve. (C) The mental nerve is a branch of the mandibular division of the trigeminal\
\ nerve. All these nerves are purely sensory nerves and do not contain any\
\ somatic motor processes. Therefore, the answer should be none of the above, which\
\ is (D). The answer is (D).\n\nQ: In Angle's Class II Div 2 occlusion there is\n\
(A) excess overbite of the upper lateral incisors. (B) negative overjet of the upper\
\ central incisors. (C) excess overjet of the upper lateral incisors. (D) excess\
\ overjet of the upper central incisors.\nA: Let's think step by step. We refer\
\ to Wikipedia articles on anatomy for help. Let’s solve this problem step by step.\
\ This is a question related to anatomy and orthodontics. Excess overjet is associated\
\ with Class II occlusions; therefore, we can safely eliminate (B) from the list,\
\ as negative overjet is often associated with Class III occlusions. Now, we need\
\ to determine the location of the excess overjet, and that would be the upper (maxillary)\
\ lateral incisors. Only (C) has the correct information. The answer is (C).\n\n\
Q: The pleura\n(A) have no sensory innervation. (B) are separated by a 2 mm space.\
\ (C) extend into the neck. (D) are composed of respiratory epithelium.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on anatomy for help. Let’s\
\ solve this problem step by step. First, recall that the pleura refers to the thin\
\ layer of tissue that covers the lungs and lines the interior wall of the chest\
\ cavity. Now, let’s look at each option:\nOption (A): “The pleura have no sensory\
\ innervation.” This information is not correct. The pleura do have a sensory innervation.\n\
Option (B): “The pleura are separated by a 2 mm space.” This information is not\
\ correct. There is a very thin “potential” space between the layers of the pleura;\
\ however, it is typically filled with serous pleural fluid. \nOption (C): “The\
\ pleura extend into the neck.” This information is actually true. The cervical\
\ pleura, also known as the dome of the pleura, lines the extension\
\ of the pleural cavity into the neck.\nOption (D): “The pleura are composed of\
\ respiratory epithelium.” This information is not correct. The pleura are composed\
\ of connective tissue (CT).\nBecause (A), (B), and (D) are all incorrect, (C) is\
\ the only correct answer. The answer is (C).\n\nQ: What is the embryological origin\
\ of the hyoid bone?\n(A) The first pharyngeal arch (B) The first and second pharyngeal\
\ arches (C) The second pharyngeal arch (D) The second and third pharyngeal arches\n\
A: Let's think step by step. We refer to Wikipedia articles on anatomy for help.\
\ Let’s solve this problem step by step. The hyoid bone, which is also known as\
\ the hyoid, is a small U-shaped bone located in the anterior neck. In its resting\
\ position, it lies between the base of the mandible and the third cervical vertebra.\
\ We know that the second and the third pharyngeal arches give rise to the horns\
\ of the hyoid bone; therefore, the embryological origin of the hyoid bone are the\
\ second and the third pharyngeal arches—this information is covered in the last\
\ option (D). Therefore, we conclude that (D) must be the correct answer. The answer\
\ is (D).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_anatomy"
dataset_name: anatomy
description: The following are multiple choice questions (with answers) about anatomy.
fewshot_config:
sampler: first_n
samples:
- question: 'Which of the following is the body cavity that contains the pituitary
gland?
(A) Abdominal (B) Cranial (C) Pleural (D) Spinal'
target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
\ help. Let\u2019s solve this problem step by step. The pituitary gland is the\
\ major endocrine gland attached to the base of the brain, and it is contained\
\ in the Cranial cavity. The answer is (B)."
- question: 'Which of these branches of the trigeminal nerve contain somatic motor
processes?
(A) The supraorbital nerve (B) The infraorbital nerve (C) The mental nerve (D)
None of the above'
target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
\ help. Let\u2019s solve this problem step by step. \nWe know the following:\
\ (A) The supraorbital nerve (also known as the frontal nerve) is the largest\
\ branch of the ophthalmic nerve and branch of ophthalmic division of the trigeminal\
\ nerve. (B) The infraorbital nerve is a branch of the maxillary division of\
\ the trigeminal nerve. (C) The mental nerve is a branch of the mandibular division\
\ of the trigeminal nerve. All these nerves are purely sensory nerves\
\ and do not contain any somatic motor processes. Therefore, the answer should\
\ be none of the above, which is (D). The answer is (D)."
- question: 'In Angle''s Class II Div 2 occlusion there is
(A) excess overbite of the upper lateral incisors. (B) negative overjet of the
upper central incisors. (C) excess overjet of the upper lateral incisors. (D)
excess overjet of the upper central incisors.'
target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
\ help. Let\u2019s solve this problem step by step. This is a question related\
\ to anatomy and orthodontics. Excess overjet is associated with Class II occlusions;\
\ therefore, we can safely eliminate (B) from the list, as negative overjet\
\ is often associated with Class III occlusions. Now, we need to determine the\
\ location of the excess overjet, and that would be the upper (maxillary) lateral\
\ incisors. Only (C) has the correct information. The answer is (C)."
- question: 'The pleura
(A) have no sensory innervation. (B) are separated by a 2 mm space. (C) extend
into the neck. (D) are composed of respiratory epithelium.'
target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
\ help. Let\u2019s solve this problem step by step. First, recall that the pleura\
\ refers to the thin layer of tissue that covers the lungs and lines the interior\
\ wall of the chest cavity. Now, let\u2019s look at each option:\nOption (A):\
\ \u201CThe pleura have no sensory innervation.\u201D This information is not\
\ correct. The pleura do have a sensory innervation.\nOption (B): \u201CThe\
\ pleura are separated by a 2 mm space.\u201D This information is not correct.\
\ There is a very thin \u201Cpotential\u201D space between the layers of the\
\ pleura; however, it is typically filled with serous pleural fluid. \nOption\
\ (C): \u201CThe pleura extend into the neck.\u201D This information is actually\
\ true. The cervical pleura, also known as the dome of the\
\ pleura, lines the extension of the pleural cavity into the neck.\nOption\
\ (D): \u201CThe pleura are composed of respiratory epithelium.\u201D This information\
\ is not correct. The pleura are composed of connective tissue (CT).\nBecause\
\ (A), (B), and (D) are all incorrect, (C) is the only correct answer. The answer\
\ is (C)."
- question: 'What is the embryological origin of the hyoid bone?
(A) The first pharyngeal arch (B) The first and second pharyngeal arches (C)
The second pharyngeal arch (D) The second and third pharyngeal arches'
target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
\ help. Let\u2019s solve this problem step by step. The hyoid bone, which is\
\ also known as the hyoid, is a small U-shaped bone located in the anterior\
\ neck. In its resting position, it lies between the base of the mandible and\
\ the third cervical vertebra. We know that the second and the third pharyngeal\
\ arches give rise to the horns of the hyoid bone; therefore, the embryological\
\ origin of the hyoid bone are the second and the third pharyngeal arches\u2014\
this information is covered in the last option (D). Therefore, we conclude that\
\ (D) must be the correct answer. The answer is (D).\n\n"
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_anatomy
"dataset_name": "astronomy"
"description": "The following are multiple choice questions (with answers) about astronomy.\n\
\nQ: Where do most short-period comets come from and how do we know?\n(A) The Kuiper\
\ belt; short period comets tend to be in the plane of the solar system just like\
\ the Kuiper belt. (B) The Kuiper belt; short period comets tend to come from random\
\ directions indicating a spherical distribution of comets called the Kuiper belt.\
\ (C) The asteroid belt; short period comets have orbital periods similar to asteroids\
\ like Vesta and are found in the plane of the solar system just like the asteroid\
\ belt. (D) The Oort cloud; short period comets tend to be in the plane of the solar\
\ system just like the Oort cloud.\nA: Let's think step by step. Most short-period\
\ comets come from the Kuiper belt, and we know because short period comets tend\
\ to be in the plane of the solar system, just like the Kuiper belt is. The answer\
\ is (A).\n\nQ: You are pushing a truck along a road. Would it be easier to accelerate\
\ this truck on Mars? Why? (Assume there is no friction)\n(A) It would be harder\
\ since the truck is heavier on Mars. (B) It would be easier since the truck is\
\ lighter on Mars. (C) It would be harder since the truck is lighter on Mars. (D)\
\ It would be the same no matter where you are.\nA: Let's think step by step. If\
\ we assume that there is no friction, the force needed to accelerate the truck\
\ is by Newton’s second law only dependent on the mass of the truck. Hence (A),\
\ (B) and (C) are incorrect since it doesn’t matter that it’s on Mars, and (D) is\
\ the correct answer. The answer is (D).\n\nQ: Say the pupil of your eye has a diameter\
\ of 5 mm and you have a telescope with an aperture of 50 cm. How much more light\
\ can the telescope gather than your eye?\n(A) 10000 times more (B) 100 times more\
\ (C) 1000 times more (D) 10 times more\nA: Let's think step by step. The amount\
\ of light is proportional to the aperture area $A = \\pi D^2/4$ for a lens with\
\ diameter $D$, so the relative amounts of light between the eye with diameter 5mm\
\ and the telescope with diameter 50 cm is $(50 cm)^2/(5mm)^2 = 10000$. The answer\
\ is (A).\n\nQ: Why isn't there a planet where the asteroid belt is located?\n(A)\
\ A planet once formed here but it was broken apart by a catastrophic collision.\
\ (B) There was not enough material in this part of the solar nebula to form a planet.\
\ (C) There was too much rocky material to form a terrestrial planet but not enough\
\ gaseous material to form a jovian planet. (D) Resonance with Jupiter prevented\
\ material from collecting together to form a planet.\nA: Let's think step by step.\
\ The asteroid belt is a circumstellar disc consisting of a large number of asteroids\
\ between Mars and Jupiter's orbits. The asteroids in this belt are affected by\
\ the gravitational pull from both other asteroids and nearby planets. Due to the\
\ strong gravitational force of Jupiter there are resonances that give rise to low\
\ density regions of asteroids known as the Kirkwood gap. So (B) and (C) are not\
\ correct since it’s not a lack of material that prevents a planet from being formed,\
\ and (A) is incorrect because the Kirkwood gap would have prevented a planet from\
\ forming in the first place, and (D) is the correct option. The answer is (D).\n\
\nQ: Why is Mars red?\n(A) Because the surface is covered with heavily oxidized\
\ (\"rusted\") minerals. (B) Because the atmosphere scatters more light at bluer\
\ wavelengths transmitting mostly red light. (C) Because Mars is covered with ancient\
\ lava flows which are red in color. (D) Because flowing water on Mars's surface\
\ altered the surface minerals several billion years ago.\nA: Let's think step by\
\ step. Option (B) is not correct because if the red color was caused by the scattering\
\ off the atmosphere, then the earth with a much thicker atmosphere would also look\
\ red. Options (C) and (D) are not specific enough about why the color of the surface\
\ would be red, while (A) is correct because it explains that the surface is red\
\ due to the rusted materials on the surface and the red color comes from the rust.\
\ So the correct option is (A). The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_astronomy"
dataset_name: astronomy
description: The following are multiple choice questions (with answers) about astronomy.
fewshot_config:
sampler: first_n
samples:
- question: 'Where do most short-period comets come from and how do we know?
(A) The Kuiper belt; short period comets tend to be in the plane of the solar
system just like the Kuiper belt. (B) The Kuiper belt; short period comets tend
to come from random directions indicating a spherical distribution of comets
called the Kuiper belt. (C) The asteroid belt; short period comets have orbital
periods similar to asteroids like Vesta and are found in the plane of the solar
system just like the asteroid belt. (D) The Oort cloud; short period comets
tend to be in the plane of the solar system just like the Oort cloud.'
target: Let's think step by step. Most short-period comets come from the Kuiper
belt, and we know because short period comets tend to be in the plane of the
solar system, just like the Kuiper belt is. The answer is (A).
- question: 'You are pushing a truck along a road. Would it be easier to accelerate
this truck on Mars? Why? (Assume there is no friction)
(A) It would be harder since the truck is heavier on Mars. (B) It would be easier
since the truck is lighter on Mars. (C) It would be harder since the truck is
lighter on Mars. (D) It would be the same no matter where you are.'
target: "Let's think step by step. If we assume that there is no friction, the\
\ force needed to accelerate the truck is by Newton\u2019s second law only dependent\
\ on the mass of the truck. Hence (A), (B) and (C) are incorrect since it doesn\u2019\
t matter that it\u2019s on Mars, and (D) is the correct answer. The answer is\
\ (D)."
- question: 'Say the pupil of your eye has a diameter of 5 mm and you have a telescope
with an aperture of 50 cm. How much more light can the telescope gather than
your eye?
(A) 10000 times more (B) 100 times more (C) 1000 times more (D) 10 times more'
target: Let's think step by step. The amount of light is proportional to the aperture
area $A = \pi D^2/4$ for a lens with diameter $D$, so the relative amounts of
light between the eye with diameter 5mm and the telescope with diameter 50 cm
is $(50 cm)^2/(5mm)^2 = 10000$. The answer is (A).
- question: 'Why isn''t there a planet where the asteroid belt is located?
(A) A planet once formed here but it was broken apart by a catastrophic collision.
(B) There was not enough material in this part of the solar nebula to form a
planet. (C) There was too much rocky material to form a terrestrial planet but
not enough gaseous material to form a jovian planet. (D) Resonance with Jupiter
prevented material from collecting together to form a planet.'
target: "Let's think step by step. The asteroid belt is a circumstellar disc consisting\
\ of a large number of asteroids between Mars and Jupiter's orbits. The asteroids\
\ in this belt are affected by the gravitational pull from both other asteroids\
\ and nearby planets. Due to the strong gravitational force of Jupiter there\
\ are resonances that give rise to low density regions of asteroids known as\
\ the Kirkwood gap. So (B) and (C) are not correct since it\u2019s not a lack\
\ of material that prevents a planet from being formed, and (A) is incorrect\
\ because the Kirkwood gap would have prevented a planet from forming in the\
\ first place, and (D) is the correct option. The answer is (D)."
- question: 'Why is Mars red?
(A) Because the surface is covered with heavily oxidized ("rusted") minerals.
(B) Because the atmosphere scatters more light at bluer wavelengths transmitting
mostly red light. (C) Because Mars is covered with ancient lava flows which
are red in color. (D) Because flowing water on Mars''s surface altered the surface
minerals several billion years ago.'
target: 'Let''s think step by step. Option (B) is not correct because if the red
color was caused by the scattering off the atmosphere, then the earth with a
much thicker atmosphere would also look red. Options (C) and (D) are not specific
enough about why the color of the surface would be red, while (A) is correct
because it explains that the surface is red due to the rusted materials on the
surface and the red color comes from the rust. So the correct option is (A).
The answer is (A).'
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_astronomy
"dataset_name": "business_ethics"
"description": "The following are multiple choice questions (with answers) about business\
\ ethics.\n\nQ: In contrast to _______, _______ aim to reward favourable behaviour\
\ by companies. The success of such campaigns have been heightened through the use\
\ of ___________, which allow campaigns to facilitate the company in achieving _________\
\ .\n(A) Buycotts, Boycotts, Blockchain technology, Charitable donations (B) Buycotts,\
\ Boycotts, Digital technology, Increased Sales (C) Boycotts, Buyalls, Blockchain\
\ technology, Charitable donations (D) Boycotts, Buycotts, Digital technology, Increased\
\ Sales\nA: Let's think step by step. We refer to Wikipedia articles on business\
\ ethics for help. The sentence that best uses the possible options above is “In\
\ contrast to *boycotts*, *buycotts* aim to reward favourable behavior by companies.\
\ The success of such campaigns have been heightened through the use of *digital\
\ technology*, which allow campaigns to facilitate the company in achieving *increased\
\ sales*.” The answer is (D).\n\nQ: _______ is the direct attempt to formally or\
\ informally manage ethical issues or problems, through specific policies, practices\
\ and programmes.\n(A) Corporate social responsibility (B) Business ethics management\
\ (C) Sustainability (D) Environmental management\nA: Let's think step by step.\
\ We refer to Wikipedia articles on business ethics for help. The direct attempt\
\ to manage ethical issues through specific policies, practices, and programs is business\
\ ethics management. The answer is (B).\n\nQ: Three contrasting tactics that CSO's\
\ can engage in to meet their aims are ________ which typically involves research\
\ and communication, ________, which may involve physically attacking a company's\
\ operations or ________, often involving some form of _______.\n(A) Non-violent\
\ direct action, Violent direct action, Indirect action, Boycott (B) Indirect action,\
\ Instrumental action, Non-violent direct action, Information campaign (C) Indirect\
\ action, Violent direct action, Non-violent direct-action Boycott (D) Non-violent\
\ direct action, Instrumental action, Indirect action, Information campaign\nA:\
\ Let's think step by step. We refer to Wikipedia articles on business ethics for\
\ help. The sentence that best uses the possible options above is “Three contrasting\
\ tactics that CSO's can engage in to meet their aims are *indirect action*, which\
\ typically involves research and communication, *violent direct action*, which\
\ may involve physically attacking a company's operations or *non-violent direct\
\ action*, often involving some form of *boycott*.” The answer is (C).\n\nQ: To\
\ ensure the independence of the non-executive board members, there are a number\
\ of steps which can be taken, which include non-executives being drawn from _______\
\ the company, being appointed for a _________ time period as well as being appointed\
\ _________.\n(A) Outside, Limited, Independently (B) Inside, Limited, Intermittently\
\ (C) Outside, Unlimited, Intermittently (D) Inside, Unlimited, Independently\n\
A: Let's think step by step. We refer to Wikipedia articles on business ethics for\
\ help. The sentence that best uses the possible options above is “To ensure the\
\ independence of the non-executive board members, there are a number of steps which\
\ can be taken, which include non-executives being drawn from *outside* the company,\
\ being appointed for a *limited* time period as well as being appointed *independently*.”\
\ The answer is (A).\n\nQ: Beyond the business case for engaging in CSR there are\
\ a number of moral arguments relating to: negative _______, the _______that corporations\
\ possess and the ________ of business and society.\n(A) Externalities, Power, Independence\
\ (B) Publicity, Insubstantial resources, Mutual dependence (C) Publicity, Power,\
\ Independence (D) Externalities, Power, Mutual dependence\nA: Let's think step\
\ by step. We refer to Wikipedia articles on business ethics for help. The sentence\
\ that best uses the possible options above is “Beyond the business case for engaging\
\ in CSR there are a number of moral arguments relating to: negative *externalities*,\
\ the *power* that corporations possess and the *mutual dependence* of business\
\ and society.” The answer is (D).\n\n"
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_business_ethics"
dataset_name: business_ethics
description: The following are multiple choice questions (with answers) about business
ethics.
fewshot_config:
sampler: first_n
samples:
- question: 'In contrast to _______, _______ aim to reward favourable behaviour by
companies. The success of such campaigns have been heightened through the use
of ___________, which allow campaigns to facilitate the company in achieving
_________ .
(A) Buycotts, Boycotts, Blockchain technology, Charitable donations (B) Buycotts,
Boycotts, Digital technology, Increased Sales (C) Boycotts, Buyalls, Blockchain
technology, Charitable donations (D) Boycotts, Buycotts, Digital technology,
Increased Sales'
target: "Let's think step by step. We refer to Wikipedia articles on business\
\ ethics for help. The sentence that best uses the possible options above is\
\ \u201CIn contrast to *boycotts*, *buycotts* aim to reward favourable behavior\
\ by companies. The success of such campaigns have been heightened through the\
\ use of *digital technology*, which allow campaigns to facilitate the company\
\ in achieving *increased sales*.\u201D The answer is (D)."
- question: '_______ is the direct attempt to formally or informally manage ethical
issues or problems, through specific policies, practices and programmes.
(A) Corporate social responsibility (B) Business ethics management (C) Sustainability
(D) Environmental management'
target: Let's think step by step. We refer to Wikipedia articles on business ethics
for help. The direct attempt to manage ethical issues through specific policies,
practices, and programs is business ethics management. The answer is (B).
- question: 'Three contrasting tactics that CSO''s can engage in to meet their aims
are ________ which typically involves research and communication, ________,
which may involve physically attacking a company''s operations or ________,
often involving some form of _______.
(A) Non-violent direct action, Violent direct action, Indirect action, Boycott
(B) Indirect action, Instrumental action, Non-violent direct action, Information
campaign (C) Indirect action, Violent direct action, Non-violent direct-action
Boycott (D) Non-violent direct action, Instrumental action, Indirect action,
Information campaign'
target: "Let's think step by step. We refer to Wikipedia articles on business\
\ ethics for help. The sentence that best uses the possible options above is\
\ \u201CThree contrasting tactics that CSO's can engage in to meet their aims\
\ are *indirect action*, which typically involves research and communication,\
\ *violent direct action*, which may involve physically attacking a company's\
\ operations or *non-violent direct action*, often involving some form of *boycott*.\u201D\
\ The answer is (C)."
- question: 'To ensure the independence of the non-executive board members, there are
a number of steps which can be taken, which include non-executives being drawn
from _______ the company, being appointed for a _________ time period as well
as being appointed _________.
(A) Outside, Limited, Independently (B) Inside, Limited, Intermittently (C)
Outside, Unlimited, Intermittently (D) Inside, Unlimited, Independently'
target: "Let's think step by step. We refer to Wikipedia articles on business\
\ ethics for help. The sentence that best uses the possible options above is\
\ \u201CTo ensure the independence of the non-executive board members, there\
\ are a number of steps which can be taken, which include non-executives being\
\ drawn from *outside* the company, being appointed for a *limited* time period\
\ as well as being appointed *independently*.\u201D The answer is (A)."
- question: 'Beyond the business case for engaging in CSR there are a number of moral
arguments relating to: negative _______, the _______that corporations possess
and the ________ of business and society.
(A) Externalities, Power, Independence (B) Publicity, Insubstantial resources,
Mutual dependence (C) Publicity, Power, Independence (D) Externalities, Power,
Mutual dependence'
target: "Let's think step by step. We refer to Wikipedia articles on business\
\ ethics for help. The sentence that best uses the possible options above is\
\ \u201CBeyond the business case for engaging in CSR there are a number of\
\ moral arguments relating to: negative *externalities*, the *power* that corporations\
\ possess and the *mutual dependence* of business and society.\u201D The answer\
\ is (D).\n\n"
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_business_ethics
"dataset_name": "clinical_knowledge"
"description": "The following are multiple choice questions (with answers) about clinical\
\ knowledge.\n\nQ: Glycolysis is the name given to the pathway involving the conversion\
\ of:\n(A) glycogen to glucose-1-phosphate. (B) glycogen or glucose to fructose.\
\ (C) glycogen or glucose to pyruvate or lactate. (D) glycogen or glucose to pyruvate\
\ or acetyl CoA.\nA: Let's think step by step. We refer to Wikipedia articles on\
\ clinical knowledge for help. Glycolysis is the name given to the pathway involving\
\ conversion of glycogen or glucose to pyruvate or lactate. The answer is (C).\n\
\nQ: What is the difference between a male and a female catheter?\n(A) Male and\
\ female catheters are different colours. (B) Male catheters are longer than female\
\ catheters. (C) Male catheters are bigger than female catheters. (D) Female catheters\
\ are longer than male catheters.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on clinical knowledge for help. The difference between a male and female\
\ catheter is that male catheters tend to be longer than female catheters. The answer\
\ is (B).\n\nQ: How many attempts should you make to cannulate a patient before\
\ passing the job on to a senior colleague, according to the medical knowledge of\
\ 2020?\n(A) 4 (B) 3 (C) 2 (D) 1\nA: Let's think step by step. We refer to Wikipedia\
\ articles on clinical knowledge for help. According to the medical protocol as\
\ of 2020, you should make two attempts to cannulate a patient before passing the\
\ job on to a more-senior practitioner. The answer is (C).\n\nQ: In the assessment\
\ of the hand function which of the following is true?\n(A) Abduction of the thumb\
\ is supplied by spinal root T2 (B) Opposition of the thumb by opponens policis\
\ is supplied by spinal root T1 (C) Finger adduction is supplied by the median nerve\
\ (D) Finger abduction is mediated by the palmar interossei\nA: Let's think step\
\ by step. We refer to Wikipedia articles on clinical knowledge for help. Of all\
\ the options, it is only true that the opposition of the thumb by opponens pollicis\
\ is supplied by spinal root T1. The answer is (B).\n\nQ: The energy for all forms\
\ of muscle contraction is provided by:\n(A) ATP. (B) ADP. (C) phosphocreatine.\
\ (D) oxidative phosphorylation.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on clinical knowledge for help. The energy for muscular contraction is\
\ provided by ATP (adenosine triphosphate), which is the energy currency of the cell.\
\ The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_clinical_knowledge"
dataset_name: clinical_knowledge
description: The following are multiple choice questions (with answers) about clinical
knowledge.
fewshot_config:
sampler: first_n
samples:
- question: 'Glycolysis is the name given to the pathway involving the conversion of:
(A) glycogen to glucose-1-phosphate. (B) glycogen or glucose to fructose. (C)
glycogen or glucose to pyruvate or lactate. (D) glycogen or glucose to pyruvate
or acetyl CoA.'
target: Let's think step by step. We refer to Wikipedia articles on clinical knowledge
for help. Glycolysis is the name given to the pathway involving conversion of
glycogen or glucose to pyruvate or lactate. The answer is (C).
- question: 'What is the difference between a male and a female catheter?
(A) Male and female catheters are different colours. (B) Male catheters are
longer than female catheters. (C) Male catheters are bigger than female catheters.
(D) Female catheters are longer than male catheters.'
target: Let's think step by step. We refer to Wikipedia articles on clinical knowledge
for help. The difference between a male and female catheter is that male catheters
tend to be longer than female catheters. The answer is (B).
- question: 'How many attempts should you make to cannulate a patient before passing
the job on to a senior colleague, according to the medical knowledge of 2020?
(A) 4 (B) 3 (C) 2 (D) 1'
target: Let's think step by step. We refer to Wikipedia articles on clinical knowledge
for help. According to the medical protocol as of 2020, you should make two
attempts to cannulate a patient before passing the job on to a more-senior practitioner.
The answer is (C).
- question: 'In the assessment of the hand function which of the following is true?
(A) Abduction of the thumb is supplied by spinal root T2 (B) Opposition of the
thumb by opponens policis is supplied by spinal root T1 (C) Finger adduction
is supplied by the median nerve (D) Finger abduction is mediated by the palmar
interossei'
target: Let's think step by step. We refer to Wikipedia articles on clinical knowledge
for help. Of all the options, it is only true that the opposition of the thumb
by opponens pollicis is supplied by spinal root T1. The answer is (B).
- question: 'The energy for all forms of muscle contraction is provided by:
(A) ATP. (B) ADP. (C) phosphocreatine. (D) oxidative phosphorylation.'
target: 'Let''s think step by step. We refer to Wikipedia articles on clinical
knowledge for help. The energy for muscular contraction is provided by ATP (adenosine
triphosphate), the main energy carrier of the cell. The answer is (A).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_clinical_knowledge
"dataset_name": "college_biology"
"description": "The following are multiple choice questions (with answers) about college\
\ biology.\n\nQ: Which of the following represents an accurate statement concerning\
\ arthropods?\n(A) They possess an exoskeleton composed primarily of peptidoglycan.\
\ (B) They possess an open circulatory system with a dorsal heart. (C) They are\
\ members of a biologically unsuccessful phylum incapable of exploiting diverse\
\ habitats and nutrition sources. (D) They lack paired, jointed appendages.\nA:\
\ Let's think step by step. Peptidoglycan is known to comprise the cell wall\
\ of most bacteria, rather than the exoskeleton of arthropods, which is made of\
\ chitin, which rules out (A). The answer (C) is false because arthropods are a\
\ highly successful phylum. Likewise, arthropods have paired, jointed appendages,\
\ which rules out (D). The only remaining option is (B), as arthropods have an open\
\ circulatory system with a dorsal tubular heart. The answer is (B).\n\nQ: In a\
\ given population, 1 out of every 400 people has a cancer caused by a completely\
\ recessive allele, b. Assuming the population is in Hardy-Weinberg equilibrium,\
\ which of the following is the expected proportion of individuals who carry the\
\ b allele but are not expected to develop the cancer?\n(A) 1/400 (B) 19/400 (C)\
\ 20/400 (D) 38/400\nA: Let's think step by step. According to the Hardy Weinberg\
\ Law, $p^2 + 2 p q + q^2 = 1$, and $p + q = 1$ where $p$ is the frequency of the\
\ dominant allele, $q$ is the frequency of the recessive allele, and $p^2$, $q^2$,\
\ and $2pq$ are the frequencies of dominant homozygous, recessive homozygous, and\
\ heterozygous individuals, respectively. The frequency of the recessive allele\
\ (q) is $\\sqrt{\\frac{1}{400}} = 0.05$. We have $p = 1 - q = 0.95$. The frequency\
\ of heterozygous individuals is $2pq = 2 \\cdot 0.05 \\cdot 0.95 = 0.095$. The\
\ number of heterozygous individuals is equal to the frequency of heterozygous individuals\
\ times the size of the population, or $0.095 * 400 = 38$. So we end up with 38/400.\
\ The answer is (D).\n\nQ: According to the pressure-flow model of movement of phloem\
\ contents, photosynthate movement from source to sink is driven by\n(A) an ATP-dependent\
\ pressure-flow pump (B) a water-pressure potential gradient (C) transpiration (D)\
\ apoplastic diffusion\nA: Let's think step by step. It is a gradient in water pressure\
\ that induces the movement of phloem content, which refers to answer (B). The mechanism\
\ of movement does not rely on metabolism, which rules out (A). Transpiration refers\
\ to the exhalation of water vapor through plant stomata, and is also not related,\
\ which rules out (C). While the apoplastic pathway is one of two main pathways\
\ for water transport in plants, it is not central to the pressure flow model, which\
\ rules out (D). The answer is (B).\n\nQ: Which of the following contain DNA sequences\
\ required for the segregation of chromosomes in mitosis and meiosis?\n(A) Telomeres\
\ (B) Centromeres (C) Nucleosomes (D) Spliceosomes\nA: Let's think step by step.\
\ The genetic material in Telomeres is not used, which rules out (A). Nucleosomes\
\ are the repeating subunit that comprises chromatin packed in a cell nucleus, and\
\ do not specifically refer to DNA sequences necessary for segregating chromosomes\
\ in cell division, which rules out (C). A spliceosome is a large ribonucleoprotein\
\ that removes introns from transcribed pre-mRNA rather than governing chromosome\
\ segregation. Centromeres are directly responsible for segregating chromosomes\
\ in cell division. The answer is (B).\n\nQ: The presence of homologous structures\
\ in two different organisms, such as the humerus in the front limb of a human and\
\ a bird, indicates that\n(A) the human and bird are polyphyletic species (B) a\
\ human's and bird's evolution is convergent (C) the human and bird belong to a\
\ clade (D) the human and bird developed by analogy\nA: Let's think step by step.\
\ Polyphyletic species are organisms that are grouped due to having similar characteristics\
\ but which do not have a common ancestor. This is not the case for humans and birds,\
\ which rules out (A). Convergent evolution refers to the independent development\
\ of similar features in different species at different periods, which is also not\
\ the case for humans and birds, which rules out (B). Analogy refers to the superficial\
\ resemblance of structures that have different origins, which is not the case for\
\ the human and bird forearms, which rules out (D). Humans and birds do belong to\
\ the same clade - a group of organisms composed of a common ancestor and all of its descendants. The answer\
\ is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_biology"
dataset_name: college_biology
description: The following are multiple choice questions (with answers) about college
biology.
fewshot_config:
sampler: first_n
samples:
- question: 'Which of the following represents an accurate statement concerning arthropods?
(A) They possess an exoskeleton composed primarily of peptidoglycan. (B) They
possess an open circulatory system with a dorsal heart. (C) They are members
of a biologically unsuccessful phylum incapable of exploiting diverse habitats
and nutrition sources. (D) They lack paired, jointed appendages.'
target: Let's think step by step. Peptidoglycan is known to comprise the cell
wall of most bacteria, rather than the exoskeleton of arthropods, which
is made of chitin, which rules out (A). The answer (C) is false because arthropods
are a highly successful phylum. Likewise, arthropods have paired, jointed appendages,
which rules out (D). The only remaining option is (B), as arthropods have an
open circulatory system with a dorsal tubular heart. The answer is (B).
- question: 'In a given population, 1 out of every 400 people has a cancer caused by
a completely recessive allele, b. Assuming the population is in Hardy-Weinberg
equilibrium, which of the following is the expected proportion of individuals
who carry the b allele but are not expected to develop the cancer?
(A) 1/400 (B) 19/400 (C) 20/400 (D) 38/400'
target: "Let's think step by step. According to the Hardy Weinberg Law, $p^2 +\
\ 2 p q + q^2 = 1$, and $p + q = 1$ where $p$ is the frequency of the dominant\
\ allele, $q$ is the frequency of the recessive allele, and $p^2$, $q^2$, and\
\ $2pq$ are the frequencies of dominant homozygous, recessive homozygous, and\
\ heterozygous individuals, respectively. The frequency of the recessive\
\ allele (q) is $\\sqrt{\\frac{1}{400}} = 0.05$. We have $p = 1 - q = 0.95$.\
\ The frequency of heterozygous individuals is $2pq = 2 \\cdot 0.05 \\cdot 0.95\
\ = 0.095$. The number of heterozygous individuals is equal to the frequency\
\ of heterozygous individuals times the size of the population, or $0.095 *\
\ 400 = 38$. So we end up with 38/400. The answer is (D)."
- question: 'According to the pressure-flow model of movement of phloem contents, photosynthate
movement from source to sink is driven by
(A) an ATP-dependent pressure-flow pump (B) a water-pressure potential gradient
(C) transpiration (D) apoplastic diffusion'
target: Let's think step by step. It is a gradient in water pressure that induces
the movement of phloem content, which refers to answer (B). The mechanism of
movement does not rely on metabolism, which rules out (A). Transpiration refers
to the exhalation of water vapor through plant stomata, and is also not related,
which rules out (C). While the apoplastic pathway is one of two main pathways
for water transport in plants, it is not central to the pressure flow model,
which rules out (D). The answer is (B).
- question: 'Which of the following contain DNA sequences required for the segregation
of chromosomes in mitosis and meiosis?
(A) Telomeres (B) Centromeres (C) Nucleosomes (D) Spliceosomes'
target: Let's think step by step. The genetic material in Telomeres is not used,
which rules out (A). Nucleosomes are the repeating subunit that comprises chromatin
packed in a cell nucleus, and do not specifically refer to DNA sequences necessary
for segregating chromosomes in cell division, which rules out (C). A spliceosome
is a large ribonucleoprotein that removes introns from transcribed pre-mRNA
rather than governing chromosome segregation. Centromeres are directly responsible
for segregating chromosomes in cell division. The answer is (B).
- question: 'The presence of homologous structures in two different organisms, such
as the humerus in the front limb of a human and a bird, indicates that
(A) the human and bird are polyphyletic species (B) a human''s and bird''s evolution
is convergent (C) the human and bird belong to a clade (D) the human and bird
developed by analogy'
target: 'Let''s think step by step. Polyphyletic species are organisms that are
grouped due to having similar characteristics but which do not have a common
ancestor. This is not the case for humans and birds, which rules out (A). Convergent
evolution refers to the independent development of similar features in different
species at different periods, which is also not the case for humans and birds,
which rules out (B). Analogy refers to the superficial resemblance of structures
that have different origins, which is not the case for the human and bird forearms,
which rules out (D). Humans and birds do belong to the same clade - a group
of organisms composed of a common ancestor and all of its descendants. The answer is (C).'
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_college_biology
"dataset_name": "college_chemistry"
"description": "The following are multiple choice questions (with answers) about college\
\ chemistry.\n\nQ: 3 Cl−(aq) + 4 CrO_4^2−(aq) + 23 H+(aq) → 3 HClO2(aq) + 4 Cr3+(aq)\
\ + 10 H2O(l). In the reaction shown above, Cl−(aq) behaves as\n(A) an acid (B)\
\ a base (C) a catalyst (D) a reducing agent\nA: Let's think step by step. A molecule\
\ that behaves as a base accepts an H+ ion (or proton) from another molecule, whereas\
\ a molecule that behaves as an acid donates an H+ ion (or proton) to another molecule.\
\ Neither of these is the case for Cl in this reaction, which rules out (A) and\
\ (B). A catalyst is a substance that only accelerates a reaction without itself\
\ undergoing chemical change, which is not the case here. This rules out (C). Instead,\
\ the $Cl^{-}$ molecules carry a negative charge, which they donate in the reaction\
\ to form 3 HClO2. This is the behavior of a reducing agent, or (D). The answer\
\ is (D).\n\nQ: Which of the following statements about the lanthanide elements\
\ is NOT true?\n(A) The most common oxidation state for the lanthanide elements\
\ is +3. (B) Lanthanide complexes often have high coordination numbers (> 6). (C)\
\ All of the lanthanide elements react with aqueous acid to liberate hydrogen. (D)\
\ The atomic radii of the lanthanide elements increase across the period from La\
\ to Lu.\nA: Let's think step by step. The atomic radii of the lanthanide elements\
\ in fact decrease across the period from La to Lu. Options (A), (B), and (C) are\
\ all true. This means that only (D) is NOT true. The answer is (D).\n\nQ: Which\
\ of the following lists the hydrides of group-14 elements in order of thermal stability,\
\ from lowest to highest?\n(A) PbH4 < SnH4 < GeH4 < SiH4 < CH4 (B) PbH4 < SnH4 <\
\ CH4 < GeH4 < SiH4 (C) CH4 < SiH4 < GeH4 < SnH4 < PbH4 (D) CH4 < PbH4 < GeH4 <\
\ SnH4 < SiH4\nA: Let's think step by step. The thermal stability of group-14 hydrides\
\ decreases as we move from the top of group 14 to the bottom. The order of elements\
\ in the group from top to bottom is C, Si, Ge, Sn, Pb. Therefore in order of increasing\
\ thermal stability we have PbH4, SnH4, GeH4, SiH4, and CH4, or answer (A). The\
\ answer is (A).\n\nQ: Predict the number of lines in the EPR spectrum of a solution\
\ of 13C-labelled methyl radical (13CH3•), assuming the lines do not overlap.\n\
(A) 4 (B) 3 (C) 6 (D) 24 (E) 8\nA: Let's think step by step. The electron paramagnetic\
\ resonance spectrum will be split by two forms of interactions. The first is the\
\ hyperfine interaction with the 13C (nuclear spin $I = \\frac{1}{2}$) which will\
\ split the spectrum into 2 lines. This will be further split into 4 lines by the\
\ interaction with three equivalent 1H nuclei. The total number of lines is therefore\
\ $2 \\cdot 4 = 8$. The answer is (E).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_chemistry"
dataset_name: college_chemistry
description: The following are multiple choice questions (with answers) about college
chemistry.
fewshot_config:
sampler: first_n
samples:
- question: "3 Cl\u2212(aq) + 4 CrO_4^2\u2212(aq) + 23 H+(aq) \u2192 3 HClO2(aq) +\
\ 4 Cr3+(aq) + 10 H2O(l). In the reaction shown above, Cl\u2212(aq) behaves\
\ as\n(A) an acid (B) a base (C) a catalyst (D) a reducing agent"
target: Let's think step by step. A molecule that behaves as a base accepts an
H+ ion (or proton) from another molecule, whereas a molecule that behaves as
an acid donates an H+ ion (or proton) to another molecule. Neither of these
is the case for Cl in this reaction, which rules out (A) and (B). A catalyst
is a substance that only accelerates a reaction without itself undergoing chemical
change, which is not the case here. This rules out (C). Instead, the $Cl^{-}$
molecules carry a negative charge, which they donate in the reaction to form
3 HClO2. This is the behavior of a reducing agent, or (D). The answer is (D).
- question: 'Which of the following statements about the lanthanide elements is NOT
true?
(A) The most common oxidation state for the lanthanide elements is +3. (B) Lanthanide
complexes often have high coordination numbers (> 6). (C) All of the lanthanide
elements react with aqueous acid to liberate hydrogen. (D) The atomic radii
of the lanthanide elements increase across the period from La to Lu.'
target: Let's think step by step. The atomic radii of the lanthanide elements
in fact decrease across the period from La to Lu. Options (A), (B), and (C)
are all true. This means that only (D) is NOT true. The answer is (D).
- question: 'Which of the following lists the hydrides of group-14 elements in order
of thermal stability, from lowest to highest?
(A) PbH4 < SnH4 < GeH4 < SiH4 < CH4 (B) PbH4 < SnH4 < CH4 < GeH4 < SiH4 (C)
CH4 < SiH4 < GeH4 < SnH4 < PbH4 (D) CH4 < PbH4 < GeH4 < SnH4 < SiH4'
target: Let's think step by step. The thermal stability of group-14 hydrides decreases
as we move from the top of group 14 to the bottom. The order of elements in
the group from top to bottom is C, Si, Ge, Sn, Pb. Therefore in order of increasing
thermal stability we have PbH4, SnH4, GeH4, SiH4, and CH4, or answer (A). The
answer is (A).
- question: "Predict the number of lines in the EPR spectrum of a solution of 13C-labelled\
\ methyl radical (13CH3\u2022), assuming the lines do not overlap.\n(A) 4 (B)\
\ 3 (C) 6 (D) 24 (E) 8"
target: "Let's think step by step. The electron paramagnetic resonance spectrum\
\ will be split by two forms of interactions. The first is the hyperfine interaction\
\ with the 13C (nuclear spin $I = \\frac{1}{2}$) which will split the spectrum\
\ into 2 lines. This will be further split into 4 lines by the interaction with\
\ three equivalent 1H nuclei. The total number of lines is therefore $2 \\cdot\
\ 4 = 8$. The answer is (E).\n\n"
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_college_chemistry
"dataset_name": "college_computer_science"
"description": "The following are multiple choice questions (with answers) about college\
\ computer science.\n\nQ: Which of the following regular expressions is equivalent\
\ to (describes the same set of strings as) (a* + b)*(c + d)?\n(A) a*(c + d)+ b(c\
\ + d)\n(B) a*(c + d)* + b(c + d)*\n(C) a*(c + d)+ b*(c + d)\n(D) (a + b)*c +(a\
\ + b)*d\nA: Let's think step by step. We know that:\n1. (X* + Y)* = (X + Y)*\n\
2. X(Y + Z)? = XY + XZ\nUsing equation 1 we can rewrite (a* + b)*(c + d)? as:\n\
3. (a + b)*(c + d)?\nUsing equation 2 we can rewrite equation 3 as:\n(a + b)*c +\
\ (a + b)*d The answer is (D).\n\nQ: The Singleton design pattern is used to guarantee\
\ that only a single instance of a class may be instantiated. Which of the following\
\ is (are) true of this design pattern?\nI. The Singleton class has a static factory\
\ method to provide its instance.\nII. The Singleton class can be a subclass of\
\ another class.\nIII. The Singleton class has a private constructor.\n(A) I only\n\
(B) II only\n(C) III only\n(D) I, II, and III\nA: Let's think step by step. Statement\
\ I is a correct statement about a Singleton, because a Singleton restricts instantiation\
\ to a single, static method. Statement II is also correct, because there is no\
\ inherent restriction regarding the inheritance of a Singleton. Statement III is\
\ also correct, because a Singleton must be instantiated only once, so its constructor\
\ is made private to prevent any construction except via its static factory method.\n\
Given these facts, statements I, II, and III are all correct. The answer is (D).\n\
\nQ: A certain pipelined RISC machine has 8 general-purpose registers R0, R1, .\
\ . . , R7 and supports the following operations:\nADD Rs1, Rs2, Rd (Add Rs1 to\
\ Rs2 and put the sum in Rd)\nMUL Rs1, Rs2, Rd (Multiply Rs1 by Rs2 and put the\
\ product in Rd)\nAn operation normally takes one cycle; however, an operation takes\
\ two cycles if it produces a result required by the immediately following operation\
\ in an operation sequence.\nConsider the expression AB + ABC + BC, where variables\
\ A, B, C are located in registers R0, R1, R2. If the contents of these three registers\
\ must not be modified, what is the minimum number of clock cycles required for\
\ an operation sequence that computes the value of AB + ABC + BC?\n(A) 5 (B) 6 (C)\
\ 7 (D) 8\nA: Let's think step by step. First, we are given that A is in R0, B is\
\ in R1, and C is in R2.\nNext, we can see that we must compute three multiplies\
\ (AB, BC, and ABC) and two adds (AB + ABC, (AB + ABC) + BC) to compute our final\
\ answer, resulting in a minimum of five clock cycles.\nNext, we can see that there\
\ is no way to avoid at least one pipeline stall when computing our final answer,\
\ because to compute our final sum we must wait at least one cycle for the results\
\ from the previous stage to be ready. Thus, our minimum number of cycles must be\
\ 6.\nWe can verify that we can create a solution that requires only six cycles\
\ as follows:\ncompute AB: MUL R0, R1, R3\ncompute BC: MUL R1, R2, R4\ncompute ABC:\
\ MUL R3, R4, R5\ncompute AB + BC: ADD R3, R4, R6\nSTALL\ncompute AB + ABC + BC:\
\ ADD R5, R6, R7\nSo there are 6 cycles. The answer is (B).\n\nQ: A compiler generates\
\ code for the following assignment statement.\nG := (A + B) * C - (D + E) * F\n\
The target machine has a single accumulator and a single-address instruction set\
\ consisting of instructions load, store, add, subtract, and multiply. For the arithmetic\
\ operations, the left operand is taken from the accumulator and the result appears\
\ in the accumulator. The smallest possible number of instructions in the resulting\
\ code is\n(A) 5 (B) 6 (C) 7 (D) 9\nA: Let's think step by step. We can compute\
\ the final answer with the following sequence of operations:\n1. LOAD D (accumulator\
\ = D)\n2. ADD E (accumulator = D+E)\n3. MUL F (accumulator = (D+E)*F)\n4. STORE\
\ X (X = (D+E)*F)\n5. LOAD A (accumulator = A)\n6. ADD B (accumulator = A+B)\n\
7. MUL C (accumulator = (A+B)*C)\n8. SUB X (accumulator = (A+B)*C - (D+E)*F)\n\
9. STORE G (G = (A+B)*C - (D+E)*F)\nThis sequence takes 9 instructions. The answer\
\ is (D).\n\nQ: Consider a computer design in which multiple processors, each with\
\ a private cache memory, share global memory using a single bus. This bus is the\
\ critical system resource. Each processor can execute one instruction every 500\
\ nanoseconds as long as memory references are satisfied by its local cache. When\
\ a cache miss occurs, the processor is delayed for an additional 2,000 nanoseconds.\
\ During half of this additional delay, the bus is dedicated to serving the cache\
\ miss. During the other half, the processor cannot continue, but the bus is free\
\ to service requests from other processors. On average, each instruction requires\
\ 2 memory references. On average, cache misses occur on 1 percent of references.\
\ What proportion of the capacity of the bus would a single processor consume, ignoring\
\ delays due to competition from other processors?\n(A) 1/50 (B) 1/27 (C) 1/25 (D)\
\ 2/27\nA: Let's think step by step. We know that each instruction requires two\
\ memory references per instruction, and that there is an average cache miss rate\
\ of one percent.\nThus a given processor has:\n(1 cache miss / 100 references)\
\ * (2 references / instruction) =\n(2 cache misses / 100 instructions), so:\nmisses_per_instruction\
\ = 1 cache miss / 50 instructions.\nNext, we know that each instruction requires\
\ 500 nanoseconds when there is no cache miss, and 500 + 2000 = 2500 nanoseconds\
\ when there is a cache miss. Thus:\n50 instructions / ((49 * 500) + (1 * 2500)) nanoseconds,\
\ so:\ninstructions_per_ns = 50 instructions / 27000 nanoseconds.\nNow, we know\
\ that each cache miss locks the bus for half of the 2000 nanosecond cache miss\
\ delay, or 1000 nanoseconds, so:\nlock_ns_per_miss = 1000 nanoseconds / cache miss.\n\
Thus we can see that on average a single processor will lock the bus for:\nlock_ns_per_miss\
\ * misses_per_instruction * instructions_per_ns =\n(1000 nanoseconds / cache miss)\
\ * (1 cache miss / 50 instructions) * (50 instructions / 27000 nanoseconds) = 1000\
\ * (1/50) * (50/27000) = 1000/27000 = 1/27. The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_computer_science"
dataset_name: college_computer_science
description: The following are multiple choice questions (with answers) about college
computer science.
fewshot_config:
sampler: first_n
samples:
- question: 'Which of the following regular expressions is equivalent to (describes
the same set of strings as) (a* + b)*(c + d)?
(A) a*(c + d)+ b(c + d)
(B) a*(c + d)* + b(c + d)*
(C) a*(c + d)+ b*(c + d)
(D) (a + b)*c +(a + b)*d'
target: 'Let''s think step by step. We know that:
1. (X* + Y)* = (X + Y)*
2. X(Y + Z)? = XY + XZ
Using equation 1 we can rewrite (a* + b)*(c + d)? as:
3. (a + b)*(c + d)?
Using equation 2 we can rewrite equation 3 as:
(a + b)*c + (a + b)*d The answer is (D).'
- question: 'The Singleton design pattern is used to guarantee that only a single instance
of a class may be instantiated. Which of the following is (are) true of this
design pattern?
I. The Singleton class has a static factory method to provide its instance.
II. The Singleton class can be a subclass of another class.
III. The Singleton class has a private constructor.
(A) I only
(B) II only
(C) III only
(D) I, II, and III'
target: 'Let''s think step by step. Statement I is a correct statement about a
Singleton, because a Singleton restricts instantiation to a single, static method.
Statement II is also correct, because there is no inherent restriction regarding
the inheritance of a Singleton. Statement III is also correct, because a Singleton
must be instantiated only once, so its constructor is made private to prevent
any construction except via its static factory method.
Given these facts, statements I, II, and III are all correct. The answer is
(D).'
- question: 'A certain pipelined RISC machine has 8 general-purpose registers R0, R1,
. . . , R7 and supports the following operations:
ADD Rs1, Rs2, Rd (Add Rs1 to Rs2 and put the sum in Rd)
MUL Rs1, Rs2, Rd (Multiply Rs1 by Rs2 and put the product in Rd)
An operation normally takes one cycle; however, an operation takes two cycles
if it produces a result required by the immediately following operation in an
operation sequence.
Consider the expression AB + ABC + BC, where variables A, B, C are located in
registers R0, R1, R2. If the contents of these three registers must not be modified,
what is the minimum number of clock cycles required for an operation sequence
that computes the value of AB + ABC + BC?
(A) 5 (B) 6 (C) 7 (D) 8'
target: 'Let''s think step by step. First, we are given that A is in R0, B is
in R1, and C is in R2.
Next, we can see that we must compute three multiplies (AB, BC, and ABC) and
two adds (AB + ABC, (AB + ABC) + BC) to compute our final answer, resulting
in a minimum of five clock cycles.
Next, we can see that there is no way to avoid at least one pipeline stall when
computing our final answer, because to compute our final sum we must wait at
least one cycle for the results from the previous stage to be ready. Thus, our
minimum number of cycles must be 6.
We can verify that we can create a solution that requires only six cycles as
follows:
compute AB: MUL R0, R1, R3
compute BC: MUL R1, R2, R4
compute ABC: MUL R3, R4, R5
compute AB + BC: ADD R3, R4, R6
STALL
compute AB + ABC + BC: ADD R5, R6, R7
So there are 6 cycles. The answer is (B).'
- question: 'A compiler generates code for the following assignment statement.
G := (A + B) * C - (D + E) * F
The target machine has a single accumulator and a single-address instruction
set consisting of instructions load, store, add, subtract, and multiply. For
the arithmetic operations, the left operand is taken from the accumulator and
the result appears in the accumulator. The smallest possible number of instructions
in the resulting code is
(A) 5 (B) 6 (C) 7 (D) 9'
target: 'Let''s think step by step. We can compute the final answer with the following
sequence of operations:
1. LOAD D (accumulator = D)
2. ADD E (accumulator = D+E)
3. MUL F (accumulator = (D+E)*F)
4. STORE X (X = (D+E)*F)
5. LOAD A (accumulator = A)
6. ADD B (accumulator = A+B)
7. MUL C (accumulator = (A+B)*C)
8. SUB X (accumulator = (A+B)*C - (D+E)*F)
9. STORE G (G = (A+B)*C - (D+E)*F)
This sequence takes 9 instructions. The answer is (D).'
- question: 'Consider a computer design in which multiple processors, each with a private
cache memory, share global memory using a single bus. This bus is the critical
system resource. Each processor can execute one instruction every 500 nanoseconds
as long as memory references are satisfied by its local cache. When a cache
miss occurs, the processor is delayed for an additional 2,000 nanoseconds. During
half of this additional delay, the bus is dedicated to serving the cache miss.
During the other half, the processor cannot continue, but the bus is free to
service requests from other processors. On average, each instruction requires
2 memory references. On average, cache misses occur on 1 percent of references.
What proportion of the capacity of the bus would a single processor consume,
ignoring delays due to competition from other processors?
(A) 1/50 (B) 1/27 (C) 1/25 (D) 2/27'
target: 'Let''s think step by step. We know that each instruction requires two
memory references per instruction, and that there is an average cache miss rate
of one percent.
Thus a given processor has:
(1 cache miss / 100 references) * (2 references / instruction) =
(2 cache misses / 100 instructions), so:
misses_per_instruction = 1 cache miss / 50 instructions.
Next, we know that each instruction requires 500 nanoseconds when there is no
cache miss, and 500 + 2000 = 2500 nanoseconds when there is a cache miss. Thus:
50 instructions / ((49 * 500) + (1 * 2500)) nanoseconds, so:
instructions_per_ns = 50 instructions / 27000 nanoseconds.
Now, we know that each cache miss locks the bus for half of the 2000 nanosecond
cache miss delay, or 1000 nanoseconds, so:
lock_ns_per_miss = 1000 nanoseconds / cache miss.
Thus we can see that on average a single processor will lock the bus for:
lock_ns_per_miss * misses_per_instruction * instructions_per_ns =
(1000 nanoseconds / cache miss) * (1 cache miss / 50 instructions) * (50 instructions
/ 27000 nanoseconds) = 1000 * (1/50) * (50/27000) = 1000/27000 = 1/27. The answer
is (B).'
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_college_computer_science
"dataset_name": "college_mathematics"
"description": "The following are multiple choice questions (with answers) about college\
\ mathematics.\n\nQ: Let V be the set of all real polynomials p(x). Let transformations\
\ T, S be defined on V by T:p(x) -> xp(x) and S:p(x) -> p'(x) = d/dx p(x), and interpret\
\ (ST)(p(x)) as S(T(p(x))). Which of the following is true?\n(A) ST = 0 (B) ST =\
\ T (C) ST = TS (D) ST - TS is the identity map of V onto itself.\nA: Let's think\
\ step by step. For a given polynomial $p$ we have\n\\[ST(p) = (xp(x))’ = p(x) +\
\ xp’(x)\\]\nand\n\\[TS(p) = xp’(x).\\]\nHence \\[ST(p) - TS(p) = p(x) + xp’(x)\
\ - xp’(x) = p(x).\\] The answer is (D).\n\nQ: Suppose that f(1 + x) = f(x) for all real\
\ x. If f is a polynomial and f(5) = 11, then f(15/2)\n(A) -11 (B) 0 (C) 11 (D)\
\ 33/2\nA: Let's think step by step. The only polynomial so that $f(1 + x) = f(x)$\
\ is a constant polynomial. Hence $f(5) = 11 = f(15/2)$. The answer is (C).\n\n\
Q: Let A be a real 2x2 matrix. Which of the following statements must be true?\n\
I. All of the entries of A^2 are nonnegative.\nII. The determinant of A^2 is nonnegative.\n\
III. If A has two distinct eigenvalues, then A^2 has two distinct eigenvalues.\n\
(A) I only (B) II only (C) III only (D) II and III only\nA: Let's think step by\
\ step. We have \\[ det(A^2) = (det(A))^2 \\geq 0,\\] hence II holds.\nIII is false:\
\ as a counterexample take a diagonal matrix with -1 and 1 on the diagonal. Then\
\ $A^2$ is the identity matrix. The answer is (B).\n\nQ: Let A be the set of all\
\ ordered pairs of integers (m, n) such that 7m + 12n = 22. What is the greatest\
\ negative number in the set B = {m + n : (m, n) \\in A}?\n(A) -5 (B) -4 (C) -3\
\ (D) -2\nA: Let's think step by step. We have 12n = 22 - 7m and one of the solutions\
\ is $m = -2$, $n = 3$. Then $m + n = 1$, hence we need to look for smaller $m$\
\ in order to make $m + n$ negative. The next solution is $m = -14$ and $n = 10$.\
\ For smaller $m$ we have $m + n$ smaller than $-4$. The answer is (B).\n\nQ: A\
\ tank initially contains a salt solution of 3 grams of salt dissolved in 100 liters\
\ of water. A salt solution containing 0.02 grams of salt per liter of water is\
\ sprayed into the tank at a rate of 4 liters per minute. The sprayed solution is\
\ continually mixed with the salt solution in the tank, and the mixture flows out\
\ of the tank at a rate of 4 liters per minute. If the mixing is instantaneous,\
\ how many grams of salt are in the tank after 100 minutes have elapsed?\n(A) 2\
\ (B) 2 - e^-2 (C) 2 + e^-2 (D) 2 + e^-4\nA: Let's think step by step. For all $t\
\ \\in \\mathbb{R}$, let $s(t)$ denote the number of grams of salt in the tank at the\
\ $t$ minute mark. Then $s(0) = 3$.\nWe use $s$ and $s(t)$ interchangeably. We also\
\ use $s^{\\prime}$ and $s^{\\prime}(t)$ interchangeably. The solution sprayed into\
\ the tank adds $(0.02) 4=2 / 25$ grams of salt per minute. There are always 100\
\ liters of liquid in the tank, containing $s$ grams of salt. So the density of\
\ salt in the tank is $s / 100$ grams per liter. The flow of water out of the tank\
\ therefore subtracts $4(s / 100)=s / 25$ grams of salt per minute. Then, for all\
\ $t \\in \\mathbb{R}$, we have $s^{\\prime}(t)=(2 / 25)-(s / 25)=(2-s) / 25$, and\
\ so $[s(t)=2] \\Rightarrow\\left[s^{\\prime}(t)=0\\right]$. For all $t \\in \\mathbb{R}$,\n\
$$\n\\frac{d}{d t}[\\ln (s-2)]=\\frac{s^{\\prime}}{s-2}=\\frac{-1}{25}=\\frac{d}{d t}\\\
left[-\\frac{t}{25}\\right] .\n$$\nChoose $C \\in \\mathbb{R}$ such that, for all\
\ $t \\in \\mathbb{R}, \\ln ((s(t)-2))=-[t / 25]+C$. Let $K:=e^{C}$. Then, for all\
\ $t \\in \\mathbb{R}$, we have $(s(t))-2=K e^{-t / 25}$, and so $s(t)=2+K e^{-t\
\ / 25}$. Then $3=s(0)=2+K e^{0}=2+K$, so $K=1$. Then $s(100)=2+K e^{-100 / 25}=2+1\
\ \\cdot e^{-4}=2+e^{-4}$. The answer is (D).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_mathematics"
dataset_name: college_mathematics
description: The following are multiple choice questions (with answers) about college
mathematics.
fewshot_config:
sampler: first_n
samples:
- question: 'Let V be the set of all real polynomials p(x). Let transformations T,
S be defined on V by T:p(x) -> xp(x) and S:p(x) -> p''(x) = d/dx p(x), and interpret
(ST)(p(x)) as S(T(p(x))). Which of the following is true?
(A) ST = 0 (B) ST = T (C) ST = TS (D) ST - TS is the identity map of V onto
itself.'
target: "Let's think step by step. For a given polynomial $p$ we have\n\\[ST(p)\
\ = (xp(x))\u2019 = p(x) + xp\u2019(x)\\]\nand\n\\[TS(p) = xp\u2019(x).\\]\n\
Hence \\[ST(p) - TS(p) = p(x) + xp\u2019(x) - xp\u2019(x) = p(x).\\] The answer is\
\ (D)."
- question: 'Suppose that f(1 + x) = f(x) for all real x. If f is a polynomial and
f(5) = 11, then f(15/2)
(A) -11 (B) 0 (C) 11 (D) 33/2'
target: Let's think step by step. The only polynomial such that $f(1 + x) = f(x)$
is a constant polynomial. Hence $f(5) = 11 = f(15/2)$. The answer is (C).
- question: 'Let A be a real 2x2 matrix. Which of the following statements must be
true?
I. All of the entries of A^2 are nonnegative.
II. The determinant of A^2 is nonnegative.
III. If A has two distinct eigenvalues, then A^2 has two distinct eigenvalues.
(A) I only (B) II only (C) III only (D) II and III only'
target: 'Let''s think step by step. We have \[ det(A^2) = (det(A))^2 \geq 0,\]
hence II holds.
III is false: as a counterexample take a diagonal matrix with -1 and 1 on the
diagonal. Then $A^2$ is the identity matrix. The answer is (B).'
- question: 'Let A be the set of all ordered pairs of integers (m, n) such that 7m
+ 12n = 22. What is the greatest negative number in the set B = {m + n : (m,
n) \in A}?
(A) -5 (B) -4 (C) -3 (D) -2'
target: Let's think step by step. We have 12n = 22 - 7m and one of the solutions
is $m = -2$, $n = 3$. Then $m + n = 1$, hence we need to look for smaller $m$
in order to make $m + n$ negative. The next solution is $m = -14$ and $n = 10$.
For smaller $m$ we have $m + n$ smaller than $-4$. The answer is (B).
- question: 'A tank initially contains a salt solution of 3 grams of salt dissolved
in 100 liters of water. A salt solution containing 0.02 grams of salt per liter
of water is sprayed into the tank at a rate of 4 liters per minute. The sprayed
solution is continually mixed with the salt solution in the tank, and the mixture
flows out of the tank at a rate of 4 liters per minute. If the mixing is instantaneous,
how many grams of salt are in the tank after 100 minutes have elapsed?
(A) 2 (B) 2 - e^-2 (C) 2 + e^-2 (D) 2 + e^-4'
target: "Let's think step by step. For all $t \\in \\mathbb{R}$, let $s(t)$ denote\
\ the number of grams of salt in the tank at the $t$ minute mark. Then $s(0) =\
\ 3$.\nWe use $s$ and $s(t)$ interchangeably. We also use $s^{\\prime}$ and\
\ $s^{\\prime}(t)$ interchangeably. The solution sprayed into the tank adds\
\ $(0.02) 4=2 / 25$ grams of salt per minute. There are always 100 liters of\
\ liquid in the tank, containing $s$ grams of salt. So the density of salt in\
\ the tank is $s / 100$ grams per liter. The flow of water out of the tank therefore\
\ subtracts $4(s / 100)=s / 25$ grams of salt per minute. Then, for all $t \\\
in \\mathbb{R}$, we have $s^{\\prime}(t)=(2 / 25)-(s / 25)=(2-s) / 25$, and\
\ so $[s(t)=2] \\Rightarrow\\left[s^{\\prime}(t)=0\\right]$. For all $t \\in\
\ \\mathbb{R}$,\n$$\n\\frac{d}{d t}[\\ln (s-2)]=\\frac{s^{\\prime}}{s-2}=\\frac{-1}{25}=\\frac\
{d}{d t}\\left[-\\frac{t}{25}\\right] .\n$$\nChoose $C \\in \\mathbb{R}$ such\
\ that, for all $t \\in \\mathbb{R}, \\ln ((s(t)-2))=-[t / 25]+C$. Let $K:=e^{C}$.\
\ Then, for all $t \\in \\mathbb{R}$, we have $(s(t))-2=K e^{-t / 25}$, and\
\ so $s(t)=2+K e^{-t / 25}$. Then $3=s(0)=2+K e^{0}=2+K$, so $K=1$. Then $s(100)=2+K\
\ e^{-100 / 25}=2+1 \\cdot e^{-4}=2+e^{-4}$. The answer is (D).\n\n"
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_college_mathematics
"dataset_name": "college_medicine"
"description": "The following are multiple choice questions (with answers) about college\
\ medicine.\n\nQ: An expected side effect of creatine supplementation is:\n(A) muscle\
\ weakness. (B) gain in body mass. (C) muscle cramps. (D) loss of electrolytes.\n\
A: Let's think step by step. We refer to Wikipedia articles on medicine for help.\
\ Creatine supplementation is a dietary supplement that results in body mass gain.\
\ The answer is (B).\n\nQ: Which of the following is not a true statement?\n(A)\
\ Muscle glycogen is broken down enzymatically to glucose-1-phosphate (B) Elite\
\ endurance runners have a high proportion of Type I fibres in their leg muscles\
\ (C) Liver glycogen is important in the maintenance of the blood glucose concentration\
\ (D) Insulin promotes glucose uptake by all tissues in the body\nA: Let's think\
\ step by step. We refer to Wikipedia articles on medicine for help. Let’s solve\
\ this step by step and go over each choice: \n(A) “Muscle glycogen is broken down\
\ enzymatically to glucose-1-phosphate”: This is a correct statement.\n(B) “Elite\
\ endurance runners have a high proportion of Type I fibres in their leg muscles”:\
\ This is a correct statement.\n(C) “Liver glycogen is important in the maintenance\
\ of the blood glucose concentration”: This is a correct statement. \n(D) “Insulin\
\ promotes glucose uptake by all tissues in the body”: This is not a correct statement,\
\ because insulin promotes glucose uptake by the liver, adipose tissue, and muscle,\
\ but not all tissues. For instance, the tissues in the brain and red blood cells\
\ are not affected by insulin. The answer is (D).\n\nQ: A high school science teacher\
\ fills a 1 liter bottle with pure nitrogen and seals the lid. The pressure is 1.70\
\ atm, and the room temperature is 25°C. Which two variables will both increase\
\ the pressure of the system, if all other variables are held constant?\n(A) Increasing\
\ temperature, increasing moles of gas (B) Increasing temperature, increasing volume\
\ (C) Decreasing volume, decreasing temperature (D) Decreasing moles of gas, increasing\
\ volume\nA: Let's think step by step. We refer to Wikipedia articles on medicine\
\ for help. The relevant equation for this is the ideal gas law: PV=nRT. To increase\
\ the pressure of the system (P), either n (number of moles of the gas) or\
\ T (temperature) has to increase. The answer is (A).\n\nQ: In a genetic test of\
\ a newborn, a rare genetic disorder is found that has X-linked recessive transmission.\
\ Which of the following statements is likely true regarding the pedigree of this\
\ disorder?\n(A) All descendants on the maternal side will have the disorder. (B)\
\ Females will be approximately twice as affected as males in this family. (C) All\
\ daughters of an affected male will be affected. (D) There will be equal distribution\
\ of males and females affected.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on medicine for help. Let’s solve this step by step. Let's recall first\
\ that females have two X chromosomes, while males have one X and one Y chromosome.\
\ This is an important fact we need to know before answering this question. \nBecause\
\ a male can pass only his single X chromosome to a daughter, if he is affected\
\ by this rare genetic disorder, then we know for sure that he will pass this rare\
\ genetic disorder to all his future-born daughters. Therefore, “(C): All daughters\
\ of an affected male will be affected” is a correct statement. The answer is (C).\n\
\nQ: Glucose is transported into the muscle cell:\n(A) via protein transporters\
\ called GLUT4. (B) only in the presence of insulin. (C) via hexokinase. (D) via\
\ monocarboxylic acid transporters.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on medicine for help. Glucose (also known as the blood sugar) is the\
\ main sugar found in the human body. It is transported into the muscle cell via\
\ diffusion through protein transporters called GLUT4. The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_medicine"
dataset_name: college_medicine
description: The following are multiple choice questions (with answers) about college
medicine.
fewshot_config:
sampler: first_n
samples:
- question: 'An expected side effect of creatine supplementation is:
(A) muscle weakness. (B) gain in body mass. (C) muscle cramps. (D) loss of electrolytes.'
target: Let's think step by step. We refer to Wikipedia articles on medicine for
help. Creatine supplementation is a dietary supplement that results in body
mass gain. The answer is (B).
- question: 'Which of the following is not a true statement?
(A) Muscle glycogen is broken down enzymatically to glucose-1-phosphate (B)
Elite endurance runners have a high proportion of Type I fibres in their leg
muscles (C) Liver glycogen is important in the maintenance of the blood glucose
concentration (D) Insulin promotes glucose uptake by all tissues in the body'
target: "Let's think step by step. We refer to Wikipedia articles on medicine\
\ for help. Let\u2019s solve this step by step and go over each choice: \n(A)\
\ \u201CMuscle glycogen is broken down enzymatically to glucose-1-phosphate\u201D\
: This is a correct statement.\n(B) \u201CElite endurance runners have a high\
\ proportion of Type I fibres in their leg muscles\u201D: This is a correct\
\ statement.\n(C) \u201CLiver glycogen is important in the maintenance of the\
\ blood glucose concentration\u201D: This is a correct statement. \n(D) \u201C\
Insulin promotes glucose uptake by all tissues in the body\u201D: This is not\
\ a correct statement, because insulin promotes glucose uptake by the liver,\
\ adipose tissue, and muscle, but not all tissues. For instance, the tissues\
\ in the brain and red blood cells are not affected by insulin. The answer is\
\ (D)."
- question: "A high school science teacher fills a 1 liter bottle with pure nitrogen\
\ and seals the lid. The pressure is 1.70 atm, and the room temperature is 25\xB0\
C. Which two variables will both increase the pressure of the system, if all\
\ other variables are held constant?\n(A) Increasing temperature, increasing\
\ moles of gas (B) Increasing temperature, increasing volume (C) Decreasing\
\ volume, decreasing temperature (D) Decreasing moles of gas, increasing volume"
target: 'Let''s think step by step. We refer to Wikipedia articles on medicine
for help. The relevant equation for this is the ideal gas law: PV=nRT. To increase
the pressure of the system (P), either n (number of moles of the gas) or
T (temperature) has to increase. The answer is (A).'
- question: 'In a genetic test of a newborn, a rare genetic disorder is found that
has X-linked recessive transmission. Which of the following statements is likely
true regarding the pedigree of this disorder?
(A) All descendants on the maternal side will have the disorder. (B) Females
will be approximately twice as affected as males in this family. (C) All daughters
of an affected male will be affected. (D) There will be equal distribution of
males and females affected.'
target: "Let's think step by step. We refer to Wikipedia articles on medicine\
\ for help. Let\u2019s solve this step by step. Let's recall first that females\
\ have two X chromosomes, while males have one X and one Y chromosome. This\
\ is an important fact we need to know before answering this question. \nBecause\
\ a male can pass only his single X chromosome to a daughter, if he is affected\
\ by this rare genetic disorder, then we know for sure that he will pass this\
\ rare genetic disorder to all his future-born daughters. Therefore, \u201C\
(C): All daughters of an affected male will be affected\u201D is a correct statement.\
\ The answer is (C)."
- question: 'Glucose is transported into the muscle cell:
(A) via protein transporters called GLUT4. (B) only in the presence of insulin.
(C) via hexokinase. (D) via monocarboxylic acid transporters.'
target: 'Let''s think step by step. We refer to Wikipedia articles on medicine
for help. Glucose (also known as the blood sugar) is the main sugar found in
the human body. It is transported into the muscle cell via diffusion through
protein transporters called GLUT4. The answer is (A).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_college_medicine
"dataset_name": "college_physics"
"description": "The following are multiple choice questions (with answers) about college\
\ physics.\n\nQ: A refracting telescope consists of two converging lenses separated\
\ by 100 cm. The eye-piece lens has a focal length of 20 cm. The angular magnification\
\ of the telescope is\n(A) 4 (B) 5 (C) 6 (D) 20\nA: Let's think step by step. In\
\ a refracting telescope, if both lenses are converging, the focus of both lenses\
\ must be between the two lenses, and thus the focal lengths of the two lenses must\
\ add up to their separation. Since the focal length of one lens is 20 cm, the focal\
\ length of the other must be 80 cm. The magnification is the ratio of these two\
\ focal lengths, or 4. The answer is (A).\n\nQ: The muon decays with a characteristic\
\ lifetime of about 10^-6 second into an electron, a muon neutrino, and an electron\
\ antineutrino. The muon is forbidden from decaying into an electron and just a\
\ single neutrino by the law of conservation of\n(A) charge (B) mass (C) energy\
\ and momentum (D) lepton number\nA: Let's think step by step. Lepton number must\
\ be conserved, meaning the total number of leptons minus the number of antileptons.\
\ If a muon decays into an electron and a single neutrino, the total lepton number\
\ would go from one to two, violating lepton number conservation. The answer is\
\ (D).\n\nQ: One end of a Nichrome wire of length 2L and cross-sectional area A\
\ is attached to an end of another Nichrome wire of length L and cross-sectional\
\ area 2A. If the free end of the longer wire is at an electric potential of 8.0\
\ volts, and the free end of the shorter wire is at an electric potential of 1.0\
\ volt, the potential at the junction of the two wires is most nearly equal to\n\
(A) 2.4 V (B) 3.3 V (C) 4.5 V (D) 5.7 V\nA: Let's think step by step. This is a\
\ simple voltage divider problem, where the longer wire has a resistance four times\
\ that of the shorter wire. So the voltage divider ratio is 1 / 5, meaning that the\
\ potential in the middle is 1.0 V + (8.0 V - 1.0 V) * 1/5 = 2.4 V. The answer is\
\ (A).\n\nQ: A refracting telescope consists of two converging lenses separated\
\ by 100 cm. The eye-piece lens has a focal length of 20 cm. The angular magnification\
\ of the telescope is\n(A) 4 (B) 5 (C) 6 (D) 20\nA: Let's think step by step. In\
\ a refracting telescope, if both lenses are converging, the focus of both lenses\
\ must be between the two lenses, and thus the focal lengths of the two lenses must\
\ add up to their separation. Since the focal length of one lens is 20 cm, the focal\
\ length of the other must be 80 cm. The magnification is the ratio of these two\
\ focal lengths, or 4. The answer is (A).\n\nQ: For which of the following thermodynamic\
\ processes is the increase in the internal energy of an ideal gas equal to the\
\ heat added to the gas?\n(A) Constant temperature (B) Constant volume (C) Constant\
\ pressure (D) Adiabatic\nA: Let's think step by step. Heat added to the gas can\
\ go into the gas's internal energy or work done against an external force. However,\
\ if the volume of the gas container is constant, no work will be done (since work\
\ is pressure times change in volume). So, at constant volume, all of the heat goes\
\ into the internal energy. The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_college_physics"
dataset_name: college_physics
description: The following are multiple choice questions (with answers) about college
physics.
fewshot_config:
sampler: first_n
samples:
- question: 'A refracting telescope consists of two converging lenses separated by
100 cm. The eye-piece lens has a focal length of 20 cm. The angular magnification
of the telescope is
(A) 4 (B) 5 (C) 6 (D) 20'
target: Let's think step by step. In a refracting telescope, if both lenses are
converging, the focus of both lenses must be between the two lenses, and thus
the focal lengths of the two lenses must add up to their separation. Since the
focal length of one lens is 20 cm, the focal length of the other must be 80
cm. The magnification is the ratio of these two focal lengths, or 4. The answer
is (A).
- question: 'The muon decays with a characteristic lifetime of about 10^-6 second into
an electron, a muon neutrino, and an electron antineutrino. The muon is forbidden
from decaying into an electron and just a single neutrino by the law of conservation
of
(A) charge (B) mass (C) energy and momentum (D) lepton number'
target: Let's think step by step. Lepton number must be conserved, meaning the
total number of leptons minus the number of antileptons. If a muon decays into
an electron and a single neutrino, the total lepton number would go from one
to two, violating lepton number conservation. The answer is (D).
- question: 'One end of a Nichrome wire of length 2L and cross-sectional area A is
attached to an end of another Nichrome wire of length L and cross-sectional
area 2A. If the free end of the longer wire is at an electric potential of 8.0
volts, and the free end of the shorter wire is at an electric potential of 1.0
volt, the potential at the junction of the two wires is most nearly equal to
(A) 2.4 V (B) 3.3 V (C) 4.5 V (D) 5.7 V'
target: Let's think step by step. This is a simple voltage divider problem, where
the longer wire has a resistance four times that of the shorter wire. So the
voltage divider ratio is 1 / 5, meaning that the potential in the middle is
1.0 V + (8.0 V - 1.0 V) * 1/5 = 2.4 V. The answer is (A).
- question: 'A refracting telescope consists of two converging lenses separated by
100 cm. The eye-piece lens has a focal length of 20 cm. The angular magnification
of the telescope is
(A) 4 (B) 5 (C) 6 (D) 20'
target: Let's think step by step. In a refracting telescope, if both lenses are
converging, the focus of both lenses must be between the two lenses, and thus
the focal lengths of the two lenses must add up to their separation. Since the
focal length of one lens is 20 cm, the focal length of the other must be 80
cm. The magnification is the ratio of these two focal lengths, or 4. The answer
is (A).
- question: 'For which of the following thermodynamic processes is the increase in
the internal energy of an ideal gas equal to the heat added to the gas?
(A) Constant temperature (B) Constant volume (C) Constant pressure (D) Adiabatic'
target: 'Let''s think step by step. Heat added to the gas can go into the gas''s
internal energy or work done against an external force. However, if the volume
of the gas container is constant, no work will be done (since work is pressure
times change in volume). So, at constant volume, all of the heat goes into the
internal energy. The answer is (B).'
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_college_physics
"dataset_name": "computer_security"
"description": "The following are multiple choice questions (with answers) about computer\
\ security.\n\nQ: SHA-1 has a message digest of\n(A) 160 bits (B) 512 bits (C) 628\
\ bits (D) 820 bits\nA: Let's think step by step. Since SHA-1 is a hash function\
\ which takes an input and produces a 160-bit (20-byte) hash value, its message\
\ digest is 160 bits. The answer is (A).\n\nQ: _____________ can modify data on\
\ your system so that your system doesn’t run correctly or you can no longer access\
\ specific data, or it may even ask for ransom in order to give you access.\n(A)\
\ IM Trojans (B) Backdoor Trojans (C) Trojan-Downloader (D) Ransom Trojan\nA:\
\ Let's think step by step. The question describes a trojan that may demand a ransom,\
\ which is a Ransom Trojan. The answer is (D).\n\nQ: What is ethical hacking?\n\
(A) \"Hacking\" ethics so they justify unintended selfish behavior (B) Hacking systems\
\ (e.g., during penetration testing) to expose vulnerabilities so they can be fixed,\
\ rather than exploited (C) Hacking into systems run by those whose ethics you disagree\
\ with (D) A slang term for rapid software development, e.g., as part of hackathons\n\
A: Let's think step by step. Ethical hacking is a process of detecting vulnerabilities\
\ in an application, system, or organization's infrastructure that an attacker can\
\ use to exploit an individual or organization. They use this process to prevent\
\ cyberattacks and security breaches by lawfully hacking into the systems and looking\
\ for weak points. The answer is (B).\n\nQ: The ____________ is anything which your\
\ search engine cannot search.\n(A) Haunted web (B) World Wide Web (C) Surface web\
\ (D) Deep Web\nA: Let's think step by step. The search engine searches on the Surface\
\ Web, which is the portion of the world wide web which is visible so (B,C) are\
\ wrong. The Haunted Web doesn’t correspond to an internet concept. The Deep Web\
\ is the part of the World Wide Web which is not indexed. The answer is (D).\n\n\
Q: Exploitation of the Heartbleed bug permits\n(A) overwriting cryptographic keys\
\ in memory (B) a kind of code injection (C) a read outside bounds of a buffer (D)\
\ a format string attack\nA: Let's think step by step. The Heartbleed Bug is a serious\
\ vulnerability in the popular OpenSSL cryptographic software library. Heartbleed\
\ resulted from improper input validation (due to a missing bounds check) in the\
\ implementation of the TLS heartbeat extension. The vulnerability was classified\
\ as a buffer over-read, a situation where more data can be read than should be\
\ allowed. The answer is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_computer_security"
dataset_name: computer_security
description: The following are multiple choice questions (with answers) about computer
security.
fewshot_config:
sampler: first_n
samples:
- question: 'SHA-1 has a message digest of
(A) 160 bits (B) 512 bits (C) 628 bits (D) 820 bits'
target: Let's think step by step. Since SHA-1 is a hash function which takes an
input and produces a 160-bit (20-byte) hash value, its message digest is 160
bits. The answer is (A).
- question: "_____________ can modify data on your system \u2013 so that your system\
\ doesn\u2019t run correctly or you can no longer access specific data, or it\
\ may even ask for ransom in order to give you access.\n(A) IM \u2013 Trojans\
\ (B) Backdoor Trojans (C) Trojan-Downloader (D) Ransom Trojan"
target: Let's think step by step. The question describes a trojan that may demand
a ransom, which is a Ransom Trojan. The answer is (D).
- question: 'What is ethical hacking?
(A) "Hacking" ethics so they justify unintended selfish behavior (B) Hacking
systems (e.g., during penetration testing) to expose vulnerabilities so they
can be fixed, rather than exploited (C) Hacking into systems run by those whose
ethics you disagree with (D) A slang term for rapid software development, e.g.,
as part of hackathons'
target: Let's think step by step. Ethical hacking is a process of detecting vulnerabilities
in an application, system, or organization's infrastructure that an attacker
can use to exploit an individual or organization. They use this process to prevent
cyberattacks and security breaches by lawfully hacking into the systems and
looking for weak points. The answer is (B).
- question: 'The ____________ is anything which your search engine cannot search.
(A) Haunted web (B) World Wide Web (C) Surface web (D) Deep Web'
target: "Let's think step by step. The search engine searches on the Surface Web,\
\ which is the portion of the world wide web which is visible so (B,C) are wrong.\
\ The Haunted Web doesn\u2019t correspond to an internet concept. The Deep Web\
\ is the part of the World Wide Web which is not indexed. The answer is (D)."
- question: 'Exploitation of the Heartbleed bug permits
(A) overwriting cryptographic keys in memory (B) a kind of code injection (C)
a read outside bounds of a buffer (D) a format string attack'
target: 'Let''s think step by step. The Heartbleed Bug is a serious vulnerability
in the popular OpenSSL cryptographic software library. Heartbleed resulted from
improper input validation (due to a missing bounds check) in the implementation
of the TLS heartbeat extension. The vulnerability was classified as a buffer
over-read, a situation where more data can be read than should be allowed. The
answer is (C).'
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_computer_security
"dataset_name": "conceptual_physics"
"description": "\nThe following are multiple choice questions (with answers) about\
\ conceptual physics.\n\nQ: Colors in a soap bubble result from light\n(A) converted\
\ to a different frequency (B) deflection (C) interference (D) polarization\nA:\
\ Let's think step by step. In a soap bubble film, the light bounces between the\
\ two soap-air interfaces many times, interfering with itself constructively or\
\ destructively depending on the width of the film. This results in different colors\
\ being visible. The answer is (C).\n\nQ: Compared with the mass of a uranium atom\
\ undergoing fission, the combined masses of the products after fission are\n(A)\
\ less (B) more (C) the same (D) zero\nA: Let's think step by step. Fission releases\
\ energy, which comes from the rest mass of its initial nucleus. Thus the mass of\
\ the products is less than the mass of the reactant uranium nucleus. The answer\
\ is (A).\n\nQ: Things that are equivalent according to the equivalence principle\
\ are\n(A) space and time. (B) a traveling twin and a stay-at-home twin. (C) gravity\
\ and acceleration. (D) mass and energy.\nA: Let's think step by step. Einstein’s\
\ famous equivalence principle states that gravity and acceleration are equivalent.\
\ The answer is (C).\n\nQ: Which of these three elements has the most mass per nucleon?\n\
(A) Hydrogen (B) Iron (C) Uranium (D) Same in each\nA: Let's think step by step.\
\ Due to nuclear binding energy, the mass of an atomic nucleus is less than the\
\ sum of individual masses of the free constituent protons and neutrons; this is\
\ known as the mass defect. Hydrogen has no mass defect because it has only a single\
\ nucleon, so it will have the most mass per nucleon. The answer is (A).\n\nQ: A\
\ model airplane flies slower when flying into the wind and faster with wind at\
\ its back. When launched at right angles to the wind, a cross wind, its groundspeed\
\ compared with flying in still air is\n(A) the same (B) greater (C) less (D) either\
\ greater or less depending on wind speed\nA: Let's think step by step. The plane’s\
\ speed in the direction of the wind is greater than it would be in the absence\
\ of wind, and its speed orthogonal to the wind is the same as it would be in\
\ the absence of the wind. The total speed, which is these two components added\
\ in quadrature, is thus greater than the speed in still air. The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_conceptual_physics"
dataset_name: conceptual_physics
description: '
The following are multiple choice questions (with answers) about conceptual physics.'
fewshot_config:
sampler: first_n
samples:
- question: 'Colors in a soap bubble result from light
(A) converted to a different frequency (B) deflection (C) interference (D) polarization'
target: Let's think step by step. In a soap bubble film, the light bounces between
the two soap-air interfaces many times, interfering with itself constructively
or destructively depending on the width of the film. This results in different
colors being visible. The answer is (C).
- question: 'Compared with the mass of a uranium atom undergoing fission, the combined
masses of the products after fission are
(A) less (B) more (C) the same (D) zero'
target: Let's think step by step. Fission releases energy, which comes from the
rest mass of its initial nucleus. Thus the mass of the products is less than
the mass of the reactant uranium nucleus. The answer is (A).
- question: 'Things that are equivalent according to the equivalence principle are
(A) space and time. (B) a traveling twin and a stay-at-home twin. (C) gravity
and acceleration. (D) mass and energy.'
target: "Let's think step by step. Einstein\u2019s famous equivalence principle\
\ states that gravity and acceleration are equivalent. The answer is (C)."
- question: 'Which of these three elements has the most mass per nucleon?
(A) Hydrogen (B) Iron (C) Uranium (D) Same in each'
target: Let's think step by step. Due to nuclear binding energy, the mass of an
atomic nucleus is less than the sum of individual masses of the free constituent
protons and neutrons; this is known as the mass defect. Hydrogen has no mass
defect because it has only a single nucleon, so it will have the most mass per
nucleon. The answer is (A).
- question: 'A model airplane flies slower when flying into the wind and faster with
wind at its back. When launched at right angles to the wind, a cross wind, its
groundspeed compared with flying in still air is
(A) the same (B) greater (C) less (D) either greater or less depending on wind
speed'
target: "Let's think step by step. The plane\u2019s speed in the direction of\
\ the wind is greater than it would be in the absence of wind, and its speed\
\ orthogonal to the wind is the same as it would be in the absence of the wind.\
\ The total speed, which is these two components added in quadrature, is thus\
\ greater than the speed in still air. The answer is (B).\n\n"
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_conceptual_physics
"dataset_name": "econometrics"
"description": "The following are multiple choice questions (with answers) about econometrics.\n\
\nQ: Suppose now that a researcher wishes to use information criteria to determine\
\ the optimal lag length for a VAR. 500 observations are available for the bi-variate\
\ VAR, and the values of the determinant of the variance-covariance matrix of residuals\
\ are 0.0336, 0.0169, 0.0084, and 0.0062 for 1, 2, 3, and 4 lags respectively. What\
\ is the optimal model order according to Akaike's information criterion?\n(A) 1\
\ lag (B) 2 lags (C) 3 lags (D) 4 lags\nA: Let's think step by step. We refer to\
\ Wikipedia articles on econometrics for help. Let’s solve this problem step by\
\ step. First of all, let’s recall that for a given set of data, Akaike's information\
\ criterion (AIC) allows us to measure how well a statistical model fits the data;\
\ it is an estimator of prediction error. Here in this problem we will need to use\
\ the formula ln(det(sigma_hat)) + (2 * k / T) to determine the values of Akaike’s\
\ criterion, where ln denotes the natural log function, det the determinant function,\
\ k the total number of parameters in total (across both equations), and T the number\
\ of observations (which, in this case, is equal to 500). For 1 lag, the number\
\ of parameters in total is equal to 6; for 2 lags, it is 10; for 3 lags, it is\
\ 14; and for 4 lags, it is 18. Now, let’s calculate the values of the criterion\
\ for each lag:\n(A) 1 lag: ln(0.0336) + (2 * 6 / 500) = ln(0.0336) + (12 / 500)\
\ = -3.369\n(B) 2 lags: ln(0.0169) + (2 * 10 / 500) = ln(0.0169) + (20 / 500) =\
\ -4.040\n(C) 3 lags: ln(0.0084) + (2 * 14 / 500) = ln(0.0084) + (28 / 500) =-4.724\n\
(D) 4 lags: ln(0.0062) + (2 * 18 / 500) = ln(0.0062) + (36 / 500) =-5.011\nBecause\
\ the optimal model order according to AIC minimizes the information criterion,\
\ the answer should be the one with the lowest value. In this case, (D) has the\
\ lowest value. The answer is (C).\n\nQ: Consider the following AR(1) model with\
\ the disturbances having zero mean and unit variance\nyt = 0.2 + 0.4 yt-1 + ut\n\
The (unconditional) mean of y will be given by\n(A) 0.2 (B) 0.4 (C) 0.5 (D) 0.33\n\
A: Let's think step by step. We refer to Wikipedia articles on econometrics for\
\ help. Let’s solve this problem step by step. If we have an AR(1) model with\
\ the disturbances having zero mean and unit variance, then the unconditional mean\
\ of y is equal to the following:\nunconditional mean of y = (the intercept term)\
\ / (1 - autoregressive coefficient)\nWe know that the intercept term is 0.2 and\
\ the autoregressive coefficient is 0.4; thus, we have:\nunconditional mean of y\
\ = (0.2) / (1 - 0.4) = (0.2) / (0.6) = 2 / 6 = 1 / 3, which is approximately 0.33.\
\ That means that the answer should be (D) 0.33. The answer is (D).\n\nQ: What would\
\ then be the consequences for the OLS estimator if heteroscedasticity is present in\
\ a regression model but ignored?\n(A) It will be biased (B) It will be inconsistent\
\ (C) It will be inefficient (D) All of (a), (b) and (c) will be true.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on econometrics for help. Heteroscedasticity\
\ refers to the condition where the variance of the error terms is not constant\
\ across multiple observations. If heteroscedasticity is present in a regression\
\ model, then the coefficient estimates in the OLS estimator will be not only unbiased\
\ and consistent but also inefficient. Because (A) and (B) are incorrect choices\
\ and (C) is a correct choice, (D) cannot be the right answer. Ultimately, (C) is\
\ the only true choice. The answer is (C).\n\nQ: Suppose that a test statistic has\
\ associated with it a p-value of 0.08. Which one of the following statements is\
\ true?\n(i) If the size of the test were exactly 8%, we would be indifferent between\
\ rejecting and not rejecting the null hypothesis\n(ii) The null would be rejected\
\ if a 10% size of test were used\n(iii) The null would not be rejected if a 1%\
\ size of test were used\n(iv) The null would be rejected if a 5% size of test were\
\ used.\n(A) (ii) and (iv) only (B) (i) and (iii) only (C) (i), (ii), and (iii)\
\ only (D) (i), (ii), (iii), and (iv).\nA: Let's think step by step. We refer to\
\ Wikipedia articles on econometrics for help. Let’s reason about each of the options.\n\
(i) is a true statement.\n(ii) is a true statement.\n(iii) is a true statement.\n\
(iv) is not a true statement. Thus, (i), (ii), and (iii) are true. The answer is\
\ (C).\n\nQ: For a stationary autoregressive process, shocks will\n(A) Eventually\
\ die away (B) Persist indefinitely (C) Grow exponentially (D) Never occur\nA: Let's\
\ think step by step. We refer to Wikipedia articles on econometrics for help. This\
\ is an econometrics problem about stationary processes. For a stationary autoregressive\
\ process, shocks will eventually die away. The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_econometrics"
dataset_name: econometrics
description: The following are multiple choice questions (with answers) about econometrics.
fewshot_config:
sampler: first_n
samples:
- question: 'Suppose now that a researcher wishes to use information criteria to determine
the optimal lag length for a VAR. 500 observations are available for the bi-variate
VAR, and the values of the determinant of the variance-covariance matrix of
residuals are 0.0336, 0.0169, 0.0084, and 0.0062 for 1, 2, 3, and 4 lags respectively.
What is the optimal model order according to Akaike''s information criterion?
(A) 1 lag (B) 2 lags (C) 3 lags (D) 4 lags'
target: "Let's think step by step. We refer to Wikipedia articles on econometrics\
\ for help. Let\u2019s solve this problem step by step. First of all, let\u2019\
s recall that for a given set of data, Akaike's information criterion (AIC)\
\ allows us to measure how well a statistical model fits the data; it is an\
\ estimator of prediction error. Here in this problem we will need to use the\
\ formula ln(det(sigma_hat)) + (2 * k / T) to determine the values of Akaike\u2019\
s criterion, where ln denotes the natural log function, det the determinant\
\ function, k the total number of parameters in total (across both equations),\
\ and T the number of observations (which, in this case, is equal to 500). For\
\ 1 lag, the number of parameters in total is equal to 6; for 2 lags, it is\
\ 10; for 3 lags, it is 14; and for 4 lags, it is 18. Now, let\u2019s calculate\
\ the values of the criterion for each lag:\n(A) 1 lag: ln(0.0336) + (2 * 6\
\ / 500) = ln(0.0336) + (12 / 500) = -3.369\n(B) 2 lags: ln(0.0169) + (2 * 10\
\ / 500) = ln(0.0169) + (20 / 500) = -4.040\n(C) 3 lags: ln(0.0084) + (2 * 14\
\ / 500) = ln(0.0084) + (28 / 500) =-4.724\n(D) 4 lags: ln(0.0062) + (2 * 18\
\ / 500) = ln(0.0062) + (36 / 500) =-5.011\nBecause the optimal model order\
\ according to AIC minimizes the information criterion, the answer should be\
\ the one with the lowest value. In this case, (D) has the lowest value. The\
\ answer is (C)."
- question: 'Consider the following AR(1) model with the disturbances having zero mean
and unit variance
yt = 0.2 + 0.4 yt-1 + ut
The (unconditional) mean of y will be given by
(A) 0.2 (B) 0.4 (C) 0.5 (D) 0.33'
target: "Let's think step by step. We refer to Wikipedia articles on econometrics\
\ for help. Let\u2019s solve this problem step by step. If we have an AR(1)\
\ model with the disturbances having zero mean and unit variance, then the unconditional\
\ mean of y is equal to the following:\nunconditional mean of y = (the intercept\
\ term) / (1 - autoregressive coefficient)\nWe know that the intercept term\
\ is 0.2 and the autoregressive coefficient is 0.4; thus, we have:\nunconditional\
\ mean of y = (0.2) / (1 - 0.4) = (0.2) / (0.6) = 2 / 6 = 1 / 3, which is approximately\
\ 0.33. That means that the answer should be (D) 0.33. The answer is (D)."
- question: 'What would then be the consequences for the OLS estimator if heteroscedasticity
is present in a regression model but ignored?
(A) It will be biased (B) It will be inconsistent (C) It will be inefficient
(D) All of (a), (b) and (c) will be true.'
target: Let's think step by step. We refer to Wikipedia articles on econometrics
for help. Heteroscedasticity refers to the condition where the variance of the
error terms is not constant across multiple observations. If heteroscedasticity
is present in a regression model, then the coefficient estimates in the OLS
estimator will be not only unbiased and consistent but also inefficient. Because
(A) and (B) are incorrect choices and (C) is a correct choice, (D) cannot be
the right answer. Ultimately, (C) is the only true choice. The answer is (C).
- question: 'Suppose that a test statistic has associated with it a p-value of 0.08.
Which one of the following statements is true?
(i) If the size of the test were exactly 8%, we would be indifferent between
rejecting and not rejecting the null hypothesis
(ii) The null would be rejected if a 10% size of test were used
(iii) The null would not be rejected if a 1% size of test were used
(iv) The null would be rejected if a 5% size of test were used.
(A) (ii) and (iv) only (B) (i) and (iii) only (C) (i), (ii), and (iii) only
(D) (i), (ii), (iii), and (iv).'
target: "Let's think step by step. We refer to Wikipedia articles on econometrics\
\ for help. Let\u2019s reason about each of the options.\n(i) is a true statement.\n\
(ii) is a true statement.\n(iii) is a true statement.\n(iv) is not a true statement.\
\ Thus, (i), (ii), and (iii) are true. The answer is (C)."
- question: 'For a stationary autoregressive process, shocks will
(A) Eventually die away (B) Persist indefinitely (C) Grow exponentially (D)
Never occur'
target: 'Let''s think step by step. We refer to Wikipedia articles on econometrics
for help. This is an econometrics problem about stationary processes. For a stationary
autoregressive process, shocks will eventually die away. The answer is (A).'
group: mmlu_flan_cot_fewshot_social_sciences
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_econometrics
"dataset_name": "electrical_engineering"
"description": "\nThe following are multiple choice questions (with answers) about\
\ electrical engineering.\n\nQ: A point pole has a strength of 4π * 10^-4 weber.\
\ The force in newtons on a point pole of 4π * 1.5 * 10^-4 weber placed at a distance\
\ of 10 cm from it will be\n(A) 15 N. (B) 20 N. (C) 7.5 N. (D) 3.75 N.\nA: Let's\
\ think step by step. The force between two point poles is given by m_1m_2/(mu_0\
\ 4 \\pi r^2), in analogy to Coulomb’s law. Plugging in the values given in the\
\ question, we calculate that the force is approximately 15 N. The answer is (A).\n\
\nQ: The coil of a moving coil meter has 100 turns, is 40 mm long and 30 mm wide.\
\ The control torque is 240*10^-6 N-m on full scale. If magnetic flux density is\
\ 1 Wb/m^2, the range of the meter is\n(A) 1 mA. (B) 2 mA. (C) 3 mA. (D) 4 mA.\nA: Let's think\
\ step by step. The torque on a coil in a uniform magnetic field is given by BANI,\
\ where B is the magnetic flux density, A is the area of the coil, N is the number\
\ of turns, and I is the current. So we have that I = (Torque)/(BAN), or 240e-6/(1200e-6\
\ * 100 * 1) = 2e-3. The answer is (B).\n\nQ: In an SR latch built from NOR gates,\
\ which condition is not allowed\n(A) S=0, R=0 (B) S=0, R=1 (C) S=1, R=0 (D) S=1,\
\ R=1\nA: Let's think step by step. An SR latch is a set-reset latch; in the case\
\ where S=1 and R=1, the circuit has no stable state; instead a race condition will\
\ be produced within the circuit, so the device will be in an undefined state. So\
\ S=1, R=1 is an illegal input. The answer is (D).\n\nQ: Two long parallel conductors\
\ carry 100 A. If the conductors are separated by 20 mm, the force per meter of\
\ length of each conductor will be\n(A) 100 N. (B) 0.1 N. (C) 1 N. (D) 0.01 N.\n\
A: Let's think step by step. The magnetic force-per-length between two current-carrying\
\ conductors is given by \\mu_0 I_1 I_2 / (2 \\pi r), where $r$ is the separation\
\ distance and I_1 and I_2 are the currents. Plugging in 100 A for I_1 and I_2,\
\ and 20 mm for r, gives 0.1 N. The answer is (B).\n\nQ: In a 2 pole lap winding\
\ dc machine, the resistance of one conductor is 2Ω and total number of conductors\
\ is 100. Find the total resistance\n(A) 200Ω (B) 100Ω (C) 50Ω (D) 10Ω\nA: Let's\
\ think step by step. In lap winding, effectively two resistors are connected in\
\ parallel, so the actual resistance of each pair is 1 Ohm. Since we have 50 pairs,\
\ we get a total resistance of 50 Ohms. The answer is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_electrical_engineering"
dataset_name: electrical_engineering
description: '
The following are multiple choice questions (with answers) about electrical engineering.'
fewshot_config:
sampler: first_n
samples:
- question: "A point pole has a strength of 4\u03C0 * 10^-4 weber. The force in newtons\
\ on a point pole of 4\u03C0 * 1.5 * 10^-4 weber placed at a distance of 10\
\ cm from it will be\n(A) 15 N. (B) 20 N. (C) 7.5 N. (D) 3.75 N."
target: "Let's think step by step. The force between two point poles is given\
\ by m_1m_2/(mu_0 4 \\pi r^2), in analogy to Coulomb\u2019s law. Plugging in\
\ the values given in the question, we calculate that the force is approximately\
\ 15 N. The answer is (A)."
- question: 'The coil of a moving coil meter has 100 turns, is 40 mm long and 30 mm
wide. The control torque is 240*10^-6 N-m on full scale. If magnetic flux density
is 1 Wb/m^2, the range of the meter is
(A) 1 mA. (B) 2 mA. (C) 3 mA. (D) 4 mA.'
target: Let's think step by step. The torque on a coil in a uniform magnetic field
is given by BANI, where B is the magnetic flux density, A is the area of the
coil, N is the number of turns, and I is the current. So we have that I = (Torque)/(BAN),
or 240e-6/(1200e-6 * 100 * 1) = 2e-3. The answer is (B).
- question: 'In an SR latch built from NOR gates, which condition is not allowed
(A) S=0, R=0 (B) S=0, R=1 (C) S=1, R=0 (D) S=1, R=1'
target: Let's think step by step. An SR latch is a set-reset latch; in the case
where S=1 and R=1, the circuit has no stable state; instead a race condition
will be produced within the circuit, so the device will be in an undefined state.
So S=1, R=1 is an illegal input. The answer is (D).
- question: 'Two long parallel conductors carry 100 A. If the conductors are separated
by 20 mm, the force per meter of length of each conductor will be
(A) 100 N. (B) 0.1 N. (C) 1 N. (D) 0.01 N.'
target: Let's think step by step. The magnetic force-per-length between two current-carrying
conductors is given by \mu_0 I_1 I_2 / (2 \pi r), where $r$ is the separation
distance and I_1 and I_2 are the currents. Plugging in 100 A for I_1 and I_2,
and 20 mm for r, gives 0.1 N. The answer is (B).
- question: "In a 2 pole lap winding dc machine, the resistance of one conductor is\
\ 2\u03A9 and total number of conductors is 100. Find the total resistance\n\
(A) 200\u03A9 (B) 100\u03A9 (C) 50\u03A9 (D) 10\u03A9"
target: 'Let''s think step by step. In lap winding, effectively two resistors
are connected in parallel, so the actual resistance of each pair is 1 Ohm. Since
we have 50 pairs, we get a total resistance of 50 Ohms. The answer is (C).'
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_electrical_engineering
"dataset_name": "elementary_mathematics"
"description": "The following are multiple choice questions (with answers) about elementary\
\ mathematics.\n\nQ: Olivia used the rule \"Add 11\" to create the number pattern\
\ shown below. 10, 21, 32, 43, 54. Which statement about the number pattern is true?\n\
(A) The 10th number in the pattern will be an even number.\n(B) The number pattern\
\ will never have two even numbers next to each other.\n(C) The next two numbers\
\ in the pattern will be an even number then an odd number.\n(D) If the number pattern\
\ started with an odd number then the pattern would have only odd numbers in it.\n\
A: Let's think step by step. Choice A is incorrect because every even-numbered term\
\ in the pattern is odd, and 10 is an even number. Choice B is correct, because\
\ adding an odd number (in this case 11) to an odd number produces an even number,\
\ and adding an odd number to an even number produces an odd number. Thus the terms\
\ in the pattern will alternate between odd and even, so there will never be two\
\ even numbers next to each other. Choice C is incorrect because the last term in\
\ the example is even (54), and we know that the terms will alternate between even\
\ and odd. Choice D is incorrect because the terms in the pattern will alternate\
\ between odd and even, regardless of the value of the first term. The answer is\
\ (B).\n\nQ: The population of the city where Michelle was born is 145,826. What\
\ is the value of the 5 in the number 145,826?\n(A) 5 thousands\n(B) 5 hundreds\n\
(C) 5 tens\n(D) 5 ones\nA: Let's think step by step. Choice A is correct, because\
\ there are three digits following the 5, so the 5 is in the thousands place. Thus\
\ the other choices are incorrect. The answer is (A).\n\nQ: A store sells 107 different\
\ colors of paint. They have 25 cans of each color in storage. The number of cans\
\ of paint the store has in storage can be found using the expression below. 107\
\ × 25. How many cans of paint does the store have in storage?\n(A) 749\n(B) 2,675\n\
(C) 2,945\n(D) 4,250\nA: Let's think step by step. We can calculate 107 x 25 = (100\
\ x 25) + (7 x 25) = 2500 + 175 = 2675. The answer is (B).\n\nQ: A total of 30 players\
\ will play basketball at a park. There will be exactly 5 players on each team.\
\ Which statement correctly explains how to find the number of teams needed?\n(A)\
\ Add 5 to 30 to find 35 teams.\n(B) Divide 30 by 5 to find 6 teams.\n(C) Multiply\
\ 30 and 5 to find 150 teams.\n(D) Subtract 5 from 30 to find 25 teams.\nA: Let's\
\ think step by step. We want to find the number of teams. We know that there are\
\ 5 players/team, and 30 players. Thus to get the number of teams we divide players\
\ by players/team, so 30 players / 5 players/team = 6 teams. The answer is (B).\n\
\nQ: Which expression is equivalent to 5 x 9?\n(A) (5 x 4) x (6 x 5)\n(B) (5 x 5)\
\ + (5 x 4)\n(C) (5 x 5) + (5 x 9)\n(D) (5 x 9) x (6 x 9)\nA: Let's think step by\
\ step. We know that 9 = (5 + 4), so 5 x 9 = 5 x (5 + 4) = (5 x 5) + (5 x 4). The\
\ answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_elementary_mathematics"
dataset_name: elementary_mathematics
description: The following are multiple choice questions (with answers) about elementary
mathematics.
fewshot_config:
sampler: first_n
samples:
- question: 'Olivia used the rule "Add 11" to create the number pattern shown below.
10, 21, 32, 43, 54. Which statement about the number pattern is true?
(A) The 10th number in the pattern will be an even number.
(B) The number pattern will never have two even numbers next to each other.
(C) The next two numbers in the pattern will be an even number then an odd number.
(D) If the number pattern started with an odd number then the pattern would
have only odd numbers in it.'
target: Let's think step by step. Choice A is incorrect because every even-numbered
term in the pattern is odd, and 10 is an even number. Choice B is correct, because
adding an odd number (in this case 11) to an odd number produces an even number,
and adding an odd number to an even number produces an odd number. Thus the
terms in the pattern will alternate between odd and even, so there will never
be two even numbers next to each other. Choice C is incorrect because the last
term in the example is even (54), and we know that the terms will alternate
between even and odd. Choice D is incorrect because the terms in the pattern
will alternate between odd and even, regardless of the value of the first term.
The answer is (B).
- question: 'The population of the city where Michelle was born is 145,826. What is
the value of the 5 in the number 145,826?
(A) 5 thousands
(B) 5 hundreds
(C) 5 tens
(D) 5 ones'
target: 'Let''s think step by step. Choice A is correct, because there are three
digits following the 5, so
the 5 is in the thousands place. Thus the other choices are incorrect. The answer
is (A).'
- question: "A store sells 107 different colors of paint. They have 25 cans of each\
\ color in storage. The number of cans of paint the store has in storage can\
\ be found using the expression below. 107 \xD7 25. How many cans of paint does\
\ the store have in storage?\n(A) 749\n(B) 2,675\n(C) 2,945\n(D) 4,250"
target: Let's think step by step. We can calculate 107 x 25 = (100 x 25) + (7
x 25) = 2500 + 175 = 2675. The answer is (B).
- question: 'A total of 30 players will play basketball at a park. There will be exactly
5 players on each team. Which statement correctly explains how to find the number
of teams needed?
(A) Add 5 to 30 to find 35 teams.
(B) Divide 30 by 5 to find 6 teams.
(C) Multiply 30 and 5 to find 150 teams.
(D) Subtract 5 from 30 to find 25 teams.'
target: Let's think step by step. We want to find the number of teams. We know
that there are 5 players/team, and 30 players. Thus to get the number of teams
we divide players by players/team, so 30 players / 5 players/team = 6 teams.
The answer is (B).
- question: 'Which expression is equivalent to 5 x 9?
(A) (5 x 4) x (6 x 5)
(B) (5 x 5) + (5 x 4)
(C) (5 x 5) + (5 x 9)
(D) (5 x 9) x (6 x 9)'
target: 'Let''s think step by step. We know that 9 = (5 + 4), so 5 x 9 = 5 x (5
+ 4) = (5 x 5) + (5 x 4). The answer is (B).'
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_elementary_mathematics
"dataset_name": "formal_logic"
"description": "The following are multiple choice questions (with answers) about formal\
\ logic.\n\nQ: Which of the given formulas of PL is the best symbolization of the\
\ following sentence?\nTurtles live long lives and are happy creatures, unless they\
\ are injured.\n(A) (L • H) ≡ I (B) (L • H) ∨ I (C) L • (H ∨ I) (D) L • (H ⊃ R).\n\
A: Let's think step by step. We refer to Wikipedia articles on formal logic for\
\ help. Let’s solve this step by step. Let “L” denote “living long”, H “being happy”,\
\ and “I” “being injured”. Now, consider each choice:\n(A) means (living long AND\
\ being happy) is equivalent to (being injured). \n(B) means (living long AND being\
\ happy) OR (being injured). \n(C) means (living long) AND (being happy OR being\
\ injured). \n(D) means (living long) AND (being happy implies being R), but what\
\ R denotes is not clear.\nObviously, (B) is the best symbolization of the original\
\ sentence. The answer is (B).\n\nQ: Select the best translation into predicate\
\ logic. George borrows Hector's lawnmower. (g: George; h: Hector; l: Hector's lawnmower;\
\ Bxyz: x borrows y from z).\n(A) Blgh (B) Bhlg (C) Bglh (D) Bghl\nA: Let's think\
\ step by step. We refer to Wikipedia articles on formal logic for help. Let’s solve\
\ this step by step. We are told that “Bxyz” means “x borrows y from z”. We can\
\ rewrite “George borrows Hector's lawnmower” as “George borrows a lawnmower from\
\ Hector”, which can then be translated into predicate logic as “Bglh”. The answer\
\ “Bglh” appears in (C); therefore, (C) must be the correct answer. The answer is\
\ (C).\n\nQ: \nSelect the best English interpretation of the given arguments in\
\ predicate logic.\nDm\n(∀x)(Wx ⊃ ~Dx). \n(∀x)Wx ∨ Ag\t/ (∃x)Ax\n(A) Marina is a\
\ dancer. Some weaklings are not dancers. Either everything is a weakling or Georgia\
\ plays volleyball. So something plays volleyball. (B) Marina is a dancer. No weakling\
\ is a dancer. Everything is either a weakling or plays volleyball. So something\
\ plays volleyball. (C) Marina is a dancer. Some weaklings are not dancers. Everything\
\ is either a weakling or plays volleyball. So something plays volleyball. (D) Marina\
\ is a dancer. No weakling is a dancer. Either everything is a weakling or Georgia\
\ plays volleyball. So something plays volleyball.\nA: Let's think step by step.\
\ We refer to Wikipedia articles on formal logic for help. Let’s solve this step\
\ by step. Let “D” denote “being a dancer”, “m” denote “Marina”, “g” denote “Georgia”,\
\ “W” denote “weakling”, “A” denote “playing volleyball”. Then, we have the following:\n\
1. Dm → Marina is a dancer.\n2. (∀x)(Wx ⊃ ~Dx). → For all x, if x is a weakling, then\
\ x is not a dancer. In other words, no weakling is a dancer.\n3. (∀x)Wx ∨ Ag\t\
/ (∃x)Ax → For all x, x is a weakling or Georgia plays volleyball. So there exists\
\ an x that plays volleyball. \nOptions (A) and (C) do claim that some weaklings\
\ are not dancers, but the second argument strongly states that no weakling is a\
\ dancer. Thus, we can eliminate them. Option (B) omits the important detail about\
\ Georgia playing volleyball. Option (D) has all the details presented in the arguments\
\ and is the best English interpretation of the arguments. The answer is (D).\n\n\
Q: Select the best translation into predicate logic: No people drive on Mars.\n\
(A) ~Pd (B) (∀x)(Px ∨ ~Dx) (C) (∀x)(Px ⊃ ~Dx) (D) ~Dp\nA: Let's think step by step.\
\ We refer to Wikipedia articles on formal logic for help. Let’s solve this step\
\ by step. Let “P” denote “being on Mars” and “D” denote “driving on Mars”. Then\
\ let’s consider each option:\nOption (A): ~Pd → d is not on Mars.\nOption (B):\
\ (∀x)(Px ∨ ~Dx) → For all x, x is on Mars or x does not drive on Mars.\nOption (C):\
\ (∀x)(Px ⊃ ~Dx) → For all x, x is on Mars implies that x does not drive on Mars.\n\
Option (D): ~Dp: → p does not drive on Mars.\nOf all these options, Option (C) appears\
\ to be the best and most meaningful interpretation of the argument “No people drive\
\ on Mars.” The answer is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_formal_logic"
dataset_name: formal_logic
description: The following are multiple choice questions (with answers) about formal
logic.
fewshot_config:
sampler: first_n
samples:
- question: "Which of the given formulas of PL is the best symbolization of the following\
\ sentence?\nTurtles live long lives and are happy creatures, unless they are\
\ injured.\n(A) (L \u2022 H) \u2261 I (B) (L \u2022 H) \u2228 I (C) L \u2022\
\ (H \u2228 I) (D) L \u2022 (H \u2283 R)."
target: "Let's think step by step. We refer to Wikipedia articles on formal logic\
\ for help. Let\u2019s solve this step by step. Let \u201CL\u201D denote \u201C\
living long\u201D, H \u201Cbeing happy\u201D, and \u201CI\u201D \u201Cbeing\
\ injured\u201D. Now, consider each choice:\n(A) means (living long AND being\
\ happy) is equivalent to (being injured). \n(B) means (living long AND being\
\ happy) OR (being injured). \n(C) means (living long) AND (being happy OR being\
\ injured). \n(D) means (living long) AND (being happy implies being R), but\
\ what R denotes is not clear.\nObviously, (B) is the best symbolization of\
\ the original sentence. The answer is (B)."
- question: 'Select the best translation into predicate logic. George borrows Hector''s
lawnmower. (g: George; h: Hector; l: Hector''s lawnmower; Bxyz: x borrows y
from z).
(A) Blgh (B) Bhlg (C) Bglh (D) Bghl'
target: "Let's think step by step. We refer to Wikipedia articles on formal logic\
\ for help. Let\u2019s solve this step by step. We are told that \u201CBxyz\u201D\
\ means \u201Cx borrows y from z\u201D. We can rewrite \u201CGeorge borrows\
\ Hector's lawnmower\u201D as \u201CGeorge borrows a lawnmower from Hector\u201D\
, which can then be translated into predicate logic as \u201CBglh\u201D. The\
\ answer \u201CBglh\u201D appears in (C); therefore, (C) must be the correct\
\ answer. The answer is (C)."
- question: "\nSelect the best English interpretation of the given arguments in predicate\
\ logic.\nDm\n(\u2200x)(Wx \u2283 ~Dx). \n(\u2200x)Wx \u2228 Ag\t/ (\u2203x)Ax\n\
(A) Marina is a dancer. Some weaklings are not dancers. Either everything is\
\ a weakling or Georgia plays volleyball. So something plays volleyball. (B)\
\ Marina is a dancer. No weakling is a dancer. Everything is either a weakling\
\ or plays volleyball. So something plays volleyball. (C) Marina is a dancer.\
\ Some weaklings are not dancers. Everything is either a weakling or plays volleyball.\
\ So something plays volleyball. (D) Marina is a dancer. No weakling is a dancer.\
\ Either everything is a weakling or Georgia plays volleyball. So something\
\ plays volleyball."
target: "Let's think step by step. We refer to Wikipedia articles on formal logic\
\ for help. Let\u2019s solve this step by step. Let \u201CD\u201D denote \u201C\
being a dancer\u201D, \u201Cm\u201D denote \u201CMarina\u201D, \u201Cg\u201D\
\ denote \u201CGeorgia\u201D, \u201CW\u201D denote \u201Cweakling\u201D, \u201C\
A\u201D denote \u201Cplaying volleyball\u201D. Then, we have the following:\n\
1. Dm \u2192 Marina is a dancer.\n2. (\u2200x)(Wx \u2283 ~Dx). \u2192 For all\
\ x, if x is a weakling, then x is not a dancer. In other words, no weakling\
\ is a dancer.\n3. (\u2200x)Wx \u2228 Ag\t/ (\u2203x)Ax \u2192 For all x, x\
\ is a weakling or Georgia plays volleyball. So there exists an x that plays\
\ volleyball. \nOptions (A) and (C) do claim that some weaklings are not dancers,\
\ but the second argument strongly states that no weakling is a dancer. Thus,\
\ we can eliminate them. Option (B) omits the important detail about Georgia\
\ playing volleyball. Option (D) has all the details presented in the arguments\
\ and is the best English interpretation of the arguments. The answer is (D)."
- question: "Select the best translation into predicate logic: No people drive on Mars.\n\
(A) ~Pd (B) (\u2200x)(Px \u2228 ~Dx) (C) (\u2200x)(Px \u2283 ~Dx) (D) ~Dp"
target: "Let's think step by step. We refer to Wikipedia articles on formal logic\
\ for help. Let\u2019s solve this step by step. Let \u201CP\u201D denote \u201C\
being on Mars\u201D and \u201CD\u201D denote \u201Cdriving on Mars\u201D. Then\
\ let\u2019s consider each option:\nOption (A): ~Pd \u2192 d is not on Mars.\n\
Option (B): (\u2200x)(Px \u2228 ~Dx) \u2192 For all x, x is on Mars or x does\
\ not drive on Mars.\nOption (C): (\u2200x)(Px \u2283 ~Dx) \u2192 For all x,\
\ x is on Mars implies that x does not drive on Mars.\nOption (D): ~Dp: \u2192\
\ p does not drive on Mars.\nOf all these options, Option (C) appears to be the\
\ best and most meaningful interpretation of the argument \u201CNo people drive\
\ on Mars.\u201D The answer is (C).\n\n"
group: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_formal_logic
"dataset_name": "global_facts"
"description": "The following are multiple choice questions (with answers) about global\
\ facts.\n\nQ: As of 2017, how many of the world’s 1-year-old children today have\
\ been vaccinated against some disease? *\n(A) 80% (B) 60% (C) 40% (D) 20%\nA: Let's\
\ think step by step. We refer to Wikipedia articles on global facts for help. According\
\ to data published by the World Health Organization, the number of 1-year-old\
\ children vaccinated in 2017 exceeds 80%. The answer is (A).\n\nQ: As of 2019,\
\ about what percentage of Americans agree that the state is run for the benefit\
\ of all the people?\n(A) 31% (B) 46% (C) 61% (D) 76%\nA: Let's think step by step.\
\ We refer to Wikipedia articles on global facts for help. In 2019, about 46%\
\ of Americans agree that the state is run for the benefit of all the people. The\
\ answer is (B).\n\nQ: As of 2019, about what percentage of Russians say it is very\
\ important to have free media in our country without government/state censorship?\n\
(A) 38% (B) 53% (C) 68% (D) 83%\nA: Let's think step by step. We refer to Wikipedia\
\ articles on global facts for help. As of 2019, about 38% of Russians say it is\
\ very important to have free media in our country. The answer is (A).\n\nQ: As\
\ of 2015, since 1990 forests have ____ in Europe and have ____ in Africa and the\
\ Americas.\n(A) increased, increased (B) increased, decreased (C) decreased, increased\
\ (D) decreased, decreased\nA: Let's think step by step. We refer to Wikipedia articles\
\ on global facts for help. As of 2015, since 1990 forests have increased in Europe\
\ and have decreased in Africa and the Americas. The answer is (B).\n\nQ: Which\
\ of the following pairs of statements are both true (as of 2019)?\n(A) People tend\
\ to be optimistic about their own future and the future of their nation or the\
\ world. (B) People tend to be optimistic about their own future but pessimistic\
\ about the future of their nation or the world. (C) People tend to be pessimistic\
\ about their own future but optimistic about the future of their nation or the\
\ world. (D) People tend to be pessimistic about their own future and the future\
\ of their nation or the world.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on global facts for help. As of 2019, most people tend to be optimistic\
\ about their own future but pessimistic about the future of their nation or the\
\ world. The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_global_facts"
dataset_name: global_facts
description: The following are multiple choice questions (with answers) about global
facts.
fewshot_config:
sampler: first_n
samples:
- question: "As of 2017, how many of the world\u2019s 1-year-old children today have\
\ been vaccinated against some disease? *\n(A) 80% (B) 60% (C) 40% (D) 20%"
target: Let's think step by step. We refer to Wikipedia articles on global facts
for help. According to data published by the World Health Organization, the
number of 1-year-old children vaccinated in 2017 exceeds 80%. The answer is
(A).
- question: 'As of 2019, about what percentage of Americans agree that the state is
run for the benefit of all the people?
(A) 31% (B) 46% (C) 61% (D) 76%'
target: Let's think step by step. We refer to Wikipedia articles on global facts
for help. In 2019, about 46% of Americans agree that the state is
run for the benefit of all the people. The answer is (B).
- question: 'As of 2019, about what percentage of Russians say it is very important
to have free media in our country without government/state censorship?
(A) 38% (B) 53% (C) 68% (D) 83%'
target: Let's think step by step. We refer to Wikipedia articles on global facts
for help. As of 2019, about 38% of Russians say it is very important to have
free media in our country. The answer is (A).
- question: 'As of 2015, since 1990 forests have ____ in Europe and have ____ in Africa
and the Americas.
(A) increased, increased (B) increased, decreased (C) decreased, increased (D)
decreased, decreased'
target: Let's think step by step. We refer to Wikipedia articles on global facts
for help. As of 2015, since 1990 forests have increased in Europe and have decreased
in Africa and the Americas. The answer is (B).
- question: 'Which of the following pairs of statements are both true (as of 2019)?
(A) People tend to be optimistic about their own future and the future of their
nation or the world. (B) People tend to be optimistic about their own future
but pessimistic about the future of their nation or the world. (C) People tend
to be pessimistic about their own future but optimistic about the future of
their nation or the world. (D) People tend to be pessimistic about their own
future and the future of their nation or the world.'
target: 'Let''s think step by step. We refer to Wikipedia articles on global facts
for help. As of 2019, most people tend to be optimistic about their own future
but pessimistic about the future of their nation or the world. The answer is
(B).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_global_facts
"dataset_name": "high_school_biology"
"description": "The following are multiple choice questions (with answers) about high\
\ school biology.\n\nQ: In animal cells, which of the following represents the most\
\ likely pathway that a secretory protein takes as it is synthesized in a cell?\n\
(A) Plasma membrane–Golgi apparatus–ribosome–secretory vesicle–rough ER (B) Ribosome–Golgi\
\ apparatus–rough ER–secretory vesicle–plasma membrane (C) Plasma membrane–Golgi\
\ apparatus–ribosome–secretory vesicle–rough ER (D) Ribosome–rough ER–Golgi apparatus–secretory\
\ vesicle–plasma membrane\nA: Let's think step by step. Protein synthesis starts\
\ at the ribosome, so we can eliminate (A) and (C). The ribosome is often in the\
\ endoplasmic reticulum and moves from there to the Golgi apparatus, where it is\
\ modified and packaged into a vesicle. The vesicle then floats to the plasma membrane\
\ and is secreted. The answer is (D).\n\nQ: A mutation in a bacterial enzyme changed\
\ a previously polar amino acid into a nonpolar amino acid. This amino acid was\
\ located at a site distant from the enzyme’s active site. How might this mutation\
\ alter the enzyme’s substrate specificity?\n(A) By changing the enzyme’s pH optimum\
\ (B) By changing the enzyme’s location in the cell (C) By changing the shape of\
\ the protein (D) An amino acid change away from the active site cannot alter the\
\ enzyme’s substrate specificity.\nA: Let's think step by step. A change in an amino\
\ acid leads to a change in the primary structure of the protein. A change in the\
\ primary structure may lead to a change in the secondary and the tertiary structure\
\ of the protein. A change in the tertiary structure means a change in the shape\
\ of the protein, so (C) has to be correct. Since the change does not affect the\
\ active site of the enzyme, we do not expect the activity of the enzyme to be affected.\
\ The answer is (C).\n\nQ: Which of the following is not a way to form recombinant\
\ DNA?\n(A) Translation (B) Conjugation (C) Specialized transduction (D) Transformation\n\
A: Let's think step by step. The introduction of foreign DNA or RNA into bacteria\
\ or eukaryotic cells is a common technique in molecular biology and scientific\
\ research. There are multiple ways foreign DNA can be introduced into cells including\
\ transformation, transduction, conjugation, and transfection. In contrast, (A)\
\ is not a way to form DNA: during translation the ribosomes synthesize proteins\
\ from RNA. The answer is (A).\n\nQ: Homologous structures are often cited as evidence\
\ for the process of natural selection. All of the following are examples of homologous\
\ structures EXCEPT\n(A) the wings of a bird and the wings of a bat (B) the flippers\
\ of a whale and the arms of a man (C) the pectoral fins of a porpoise and the flippers\
\ of a seal (D) the forelegs of an insect and the forelimbs of a dog\nA: Let's think\
\ step by step. Homologous structures are similar physical features in organisms\
\ that share a common ancestor but different functions. Comparisons (B) and (C)\
\ are clearly homologous because they share a common ancestor and the structures\
\ serve different purposes. Bat wings and bird wings are also homologous, while\
\ they are both wings, the forelimbs serve different purposes. Insects and dogs\
\ are very far ancestors since one is vertebrate while the other is invertebrate\
\ and the forelimbs serve the same purpose, so they are not homologous. The answer\
\ is (D).\n\nQ: Which of the following is not known to be involved in the control\
\ of cell division?\n(A) Cyclins (B) Protein kinases (C) Checkpoints (D) Fibroblast\
\ cells\nA: Let's think step by step. Normal cells move through the cell cycle in\
\ a regulated way. At the checkpoint stage, they use information about their own\
\ internal state and cues from the environment around them to decide whether to\
\ proceed with cell division. Cues like these act by changing the activity of core\
\ cell cycle regulators inside the cell. The most common regulators are cyclins\
\ and cyclin-dependent kinases. Fibroblast cells do not play any role in cell division.\
\ The answer is (D).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_biology"
dataset_name: high_school_biology
description: The following are multiple choice questions (with answers) about high
school biology.
fewshot_config:
sampler: first_n
samples:
- question: "In animal cells, which of the following represents the most likely pathway\
\ that a secretory protein takes as it is synthesized in a cell?\n(A) Plasma\
\ membrane\u2013Golgi apparatus\u2013ribosome\u2013secretory vesicle\u2013rough\
\ ER (B) Ribosome\u2013Golgi apparatus\u2013rough ER\u2013secretory vesicle\u2013\
plasma membrane (C) Plasma membrane\u2013Golgi apparatus\u2013ribosome\u2013\
secretory vesicle\u2013rough ER (D) Ribosome\u2013rough ER\u2013Golgi apparatus\u2013\
secretory vesicle\u2013plasma membrane"
target: Let's think step by step. Protein synthesis starts at the ribosome, so
we can eliminate (A) and (C). The ribosome is often in the endoplasmic reticulum
and moves from there to the Golgi apparatus, where it is modified and packaged
into a vesicle. The vesicle then floats to the plasma membrane and is secreted.
The answer is (D).
- question: "A mutation in a bacterial enzyme changed a previously polar amino acid\
\ into a nonpolar amino acid. This amino acid was located at a site distant\
\ from the enzyme\u2019s active site. How might this mutation alter the enzyme\u2019\
s substrate specificity?\n(A) By changing the enzyme\u2019s pH optimum (B) By\
\ changing the enzyme\u2019s location in the cell (C) By changing the shape\
\ of the protein (D) An amino acid change away from the active site cannot alter\
\ the enzyme\u2019s substrate specificity."
target: Let's think step by step. A change in an amino acid leads to a change
in the primary structure of the protein. A change in the primary structure may
lead to a change in the secondary and the tertiary structure of the protein.
A change in the tertiary structure means a change in the shape of the protein,
so (C) has to be correct. Since the change does not affect the active site of
the enzyme, we do not expect the activity of the enzyme to be affected. The
answer is (C).
- question: 'Which of the following is not a way to form recombinant DNA?
(A) Translation (B) Conjugation (C) Specialized transduction (D) Transformation'
target: 'Let''s think step by step. The introduction of foreign DNA or RNA into
bacteria or eukaryotic cells is a common technique in molecular biology and
scientific research. There are multiple ways foreign DNA can be introduced into
cells including transformation, transduction, conjugation, and transfection.
In contrast, (A) is not a way to form DNA: during translation the ribosomes
synthesize proteins from RNA. The answer is (A).'
- question: 'Homologous structures are often cited as evidence for the process of natural
selection. All of the following are examples of homologous structures EXCEPT
(A) the wings of a bird and the wings of a bat (B) the flippers of a whale and
the arms of a man (C) the pectoral fins of a porpoise and the flippers of a
seal (D) the forelegs of an insect and the forelimbs of a dog'
target: "Let's think step by step. Homologous structures are similar\
\ physical features in organisms that share a common ancestor but\
\ different functions. Comparisons (B) and (C) are clearly homologous because\
\ they share a common ancestor and the structures serve different purposes.\
\ Bat wings and bird wings are also homologous, while they are both wings, the\
\ forelimbs serve different purposes. Insects and dogs are very far ancestors\
\ since one is vertebrate while the other is invertebrate and the forelimbs\
\ serve the same purpose, so they are not homologous. The answer is (D)."
- question: 'Which of the following is not known to be involved in the control of cell
division?
(A) Cyclins (B) Protein kinases (C) Checkpoints (D) Fibroblast cells'
target: 'Let''s think step by step. Normal cells move through the cell cycle in
a regulated way. At the checkpoint stage, they use information about their own
internal state and cues from the environment around them to decide whether to
proceed with cell division. Cues like these act by changing the activity of
core cell cycle regulators inside the cell. The most common regulators are cyclins
and cyclin-dependent kinases. Fibroblast cells do not play any role in cell
division. The answer is (D).'
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_high_school_biology
"dataset_name": "high_school_chemistry"
"description": "The following are multiple choice questions (with answers) about high\
\ school chemistry.\n\nQ: Which of the following is considered an acid anhydride?\n\
(A) HCl (B) H2SO3 (C) SO2 (D) Al(NO3)3\nA: Let's think step by step. An acid anhydride\
\ is a compound that is derived by removing water from an acid. The chemical formula\
\ for water is H2O, which means that we need to determine which of these options,\
\ when combined with H2O, forms an acid. SO2, or Sulfur dioxide, when combined with\
\ H2O, makes H2SO3, or sulfurous acid. The answer is (C).\n\nQ: Which of the following\
\ is expected to be a polar molecule?\n(A) PCl4F (B) BF3 (C) CO2 (D) Si(CH3)4\n\
A: Let's think step by step. A polar molecule is one that has a slightly positive\
\ charge on one end of the molecule and a slightly negative charge on the other\
\ end. Boron trifluoride (BF3) has Boron as the center atom and three fluorine atoms\
\ attached to it; it is trigonal planar and symmetric, so it is nonpolar. Carbon\
\ Dioxide (CO2) has Carbon as the central atom with double bonds to two Oxygen atoms\
\ - this is also symmetrical and therefore nonpolar. The same is the case for tetramethyl\
\ silane (Si(CH3)4), which is a Silicon atom surrounded by four methyl groups. The\
\ structure of PCl4F is that Phosphorus is the central atom, attached to four chlorines\
\ and one fluorine atom. This is asymmetrical, and therefore has a net dipole and\
\ is expected to be a polar molecule. The answer is (A).\n\nQ: From the solubility\
\ rules, which of the following is true?\n(A) All chlorides, bromides, and iodides\
\ are soluble (B) All sulfates are soluble (C) All hydroxides are soluble (D) All\
\ ammonium-containing compounds are soluble\nA: Let's think step by step. The chlorides,\
\ bromides, and iodides of lead, silver, and mercury are not soluble in water. This\
\ rules out (A). The sulfates of lead, barium, and calcium are not soluble in water,\
\ which rules out (B). The hydroxides of any metal besides sodium, potassium, ammonium,\
\ calcium, and barium are insoluble. This rules out (C). Typically ammonium ions\
\ indicate a soluble ionic substance. The answer is (D).\n\nQ: A new compound is\
\ synthesized and found to be a monoprotic acid with a molar mass of 248 g/mol.\
\ When 0.0050 mol of this acid are dissolved in 0.500 L of water, the pH is measured\
\ as 3.89. What is the pKa of this acid?\n(A) 3.89 (B) 7.78 (C) 5.78 (D) 2.33\n\
A: Let's think step by step. Recall that $[A^{-}] = [H^{+}]$. Here, this is equal to\
\ $10^{-3.89}$. Then we have $K_{a} = \\frac{[H^{+}][A^{-}]}{[HA]} = \\frac{10^{-3.89}\
\ \\cdot 10^{-3.89}}{10^{-2}}$. The resulting exponent is $-3.89 + (-3.89) - (-2)\
\ = -5.78$, therefore $K_a = 10^{-5.78}$. The $pK_a$ is the negative log of $K_a$,\
\ which is equal to $5.78$. The answer is (C).\n\nQ: A solution contains 2.00 mole\
\ of acetic acid, CH3COOH, and 1.00 mole of calcium acetate, Ca(CH3COO)2. The solution\
\ is able to resist the addition of a small amount of strong acid or strong base\
\ with only minor changes in the pH of the solution. Larger quantities of strong\
\ acid or strong base can cause a significant change in pH. How many moles of nitric\
\ acid, HNO3, may be added before the pH begins to change significantly?\n(A) 0.500\
\ mole (B) 1.00 mole (C) 2.00 mole (D) 3.00 mole\nA: Let's think step by step. We\
\ would like to compute the buffer capacity of this solution. First we write the\
\ equation for the ionization of the weak acid, in this case of acetic acid. $CH_{3}COOH\
\ (aq) + H_{2}O \\rightarrow H_{3}O^{+} + CH_{3}COO^{-}$. The conjugate base is therefore\
\ the acetate ion. The added strong acid, Nitric acid, will react with the conjugate\
\ base. Therefore the maximum amount of acid that can be added will be equal to\
\ the amount of acetate ion, or 2 moles. The answer is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_chemistry"
dataset_name: high_school_chemistry
description: The following are multiple choice questions (with answers) about high
school chemistry.
fewshot_config:
sampler: first_n
samples:
- question: 'Which of the following is considered an acid anhydride?
(A) HCl (B) H2SO3 (C) SO2 (D) Al(NO3)3'
target: Let's think step by step. An acid anhydride is a compound that is derived
by removing water from an acid. The chemical formula for water is H2O, which
means that we need to determine which of these options, when combined with H2O,
forms an acid. SO2, or Sulfur dioxide, when combined with H2O, makes H2SO3,
or sulfurous acid. The answer is (C).
- question: 'Which of the following is expected to be a polar molecule?
(A) PCl4F (B) BF3 (C) CO2 (D) Si(CH3)4'
target: Let's think step by step. A polar molecule is one that has a slightly
positive charge on one end of the molecule and a slightly negative charge on
the other end. Boron trifluoride (BF3) has Boron as the center atom and three
fluorine atoms attached to it; it is trigonal planar and symmetric, so it is
nonpolar. Carbon Dioxide (CO2) has Carbon as the central atom with double bonds
to two Oxygen atoms - this is also symmetrical and therefore nonpolar. The same
is the case for tetramethyl silane (Si(CH3)4), which is a Silicon atom surrounded
by four methyl groups. The structure of PCl4F is that Phosphorus is the central
atom, attached to four chlorines and one fluorine atom. This is asymmetrical,
and therefore has a net dipole and is expected to be a polar molecule. The answer
is (A).
- question: 'From the solubility rules, which of the following is true?
(A) All chlorides, bromides, and iodides are soluble (B) All sulfates are soluble
(C) All hydroxides are soluble (D) All ammonium-containing compounds are soluble'
target: Let's think step by step. The chlorides, bromides, and iodides of lead,
silver, and mercury are not soluble in water. This rules out (A). The sulfates
of lead, barium, and calcium are not soluble in water, which rules out (B).
The hydroxides of any metal besides sodium, potassium, ammonium, calcium, and
barium are insoluble. This rules out (C). Typically ammonium ions indicate a
soluble ionic substance. The answer is (D).
- question: 'A new compound is synthesized and found to be a monoprotic acid with a
molar mass of 248 g/mol. When 0.0050 mol of this acid are dissolved in 0.500
L of water, the pH is measured as 3.89. What is the pKa of this acid?
(A) 3.89 (B) 7.78 (C) 5.78 (D) 2.33'
target: "Let's think step by step. Recall that $[A^{-}] = [H^{+}]$. Here, this is\
\ equal to $10^{-3.89}$. Then we have $K_{a} = \\frac{[H^{+}][A^{-}]}{[HA]}\
\ = \\frac{10^{-3.89} \\cdot 10^{-3.89}}{10^{-2}}$. The resulting exponent is\
\ $-3.89 + (-3.89) - (-2) = -5.78$, therefore $K_a = 10^{-5.78}$. The $pK_a$\
\ is the negative log of $K_a$, which is equal to $5.78$. The answer is (C)."
- question: 'A solution contains 2.00 mole of acetic acid, CH3COOH, and 1.00 mole of
calcium acetate, Ca(CH3COO)2. The solution is able to resist the addition of
a small amount of strong acid or strong base with only minor changes in the
pH of the solution. Larger quantities of strong acid or strong base can cause
a significant change in pH. How many moles of nitric acid, HNO3, may be added
before the pH begins to change significantly?
(A) 0.500 mole (B) 1.00 mole (C) 2.00 mole (D) 3.00 mole'
target: "Let's think step by step. We would like to compute the buffer capacity\
\ of this solution. First we write the equation for the ionization of the weak\
\ acid, in this case of acetic acid. $CH_{3}COOH (aq) + H_{2}O \\rightarrow H_{3}O^{+}\
\ + CH_{3}COO^{-}$. The conjugate base is therefore the acetate ion. The added\
\ strong acid, Nitric acid, will react with the conjugate base. Therefore the\
\ maximum amount of acid that can be added will be equal to the amount of acetate\
\ ion, or 2 moles. The answer is (C)."
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_high_school_chemistry
"dataset_name": "high_school_computer_science"
"description": "The following are multiple choice questions (with answers) about high\
\ school computer science.\n\nQ: Which of the following is an example of the use\
\ of a device on the Internet of Things (IoT) ?\n(A) A car alerts a driver that\
\ it is about to hit an object. (B) A hiker uses a G P S watch to keep track of\
\ her position. (C) A refrigerator orders milk from an online delivery service when\
\ the milk in the refrigerator is almost gone. (D) A runner uses a watch with optical\
\ sensors to monitor his heart rate.\nA: Let's think step by step. The term Internet\
\ of Things (IoT) refers to common devices which are connected to the internet,\
\ enabling new functionality. Choice A is incorrect because it does not describe\
\ an internet connected device. In choice B, the watch is only described as having\
\ GPS functionality but no internet connectivity. Choice C describes a common device\
\ (a refrigerator) which has internet connectivity enabling new functionality (online\
\ ordering). Choice D does not mention internet connectivity for the watch, only\
\ optical sensors. The answer is (C).\n\nQ: Many Web browsers allow users to open\
\ anonymous windows. During a browsing session in an anonymous window, the browser\
\ does not record a browsing history or a list of downloaded files. When the anonymous\
\ window is exited, cookies created during the session are deleted. Which of the\
\ following statements about browsing sessions in an anonymous window is true?\n\
(A) The activities of a user browsing in an anonymous window will not be visible\
\ to people who monitor the user's network, such as the system administrator. (B)\
\ Items placed in a Web store's shopping cart for future purchase during the anonymous\
\ browsing session will not be saved on the user's computer. (C) A user will not\
\ be able to log in to e-mail or social media accounts during the anonymous browsing\
\ session. (D) A user browsing in an anonymous window will be protected from viruses\
\ launched from any web sites visited or files downloaded.\nA: Let's think step\
\ by step. Choice A is incorrect as it only describes network traffic, which an\
\ anonymous browser does not change. Choice B is correct as it correctly describes\
\ how an anonymous browser will prevent saving data on the user’s computer after\
\ the session is ended. Choice C is incorrect because an anonymous browser will\
\ not prevent logging in to email or social media accounts. Choice D is incorrect\
\ because an anonymous browser in itself performs no virus protection. The answer\
\ is (B).\n\nQ: In the program below, the initial value of X is 5 and the initial\
\ value of Y is 10.\nIF (X < 0){\n DISPLAY (\"Foxtrot\")\n} ELSE {\n IF (X > Y){\n\
\ DISPLAY (\"Hotel\")\n } ELSE {\n IF (Y > 0){\n DISPLAY (\"November\")\n }\
\ ELSE {\n DISPLAY (\"Yankee\")\n }\n }\n}\nWhat is displayed as a result of\
\ running the program?\n(A) Foxtrot (B) Hotel (C) November (D) Yankee\nA: Let's\
\ think step by step. Because X has the value 5, the first conditional IF (X < 0)\
\ is false, so we move to the first ELSE clause. Because X is 5 and Y is 10, the\
\ second conditional IF (X > Y) is false, so we move to the following ELSE clause.\
\ Since Y is 10, the conditional IF (Y > 0) is true, so the command DISPLAY (\"\
November\") is executed. The answer is (C).\n\nQ: What is the output of \"abc\"\
[::-1] in Python 3?\n(A) Error (B) abc (C) cba (D) c\nA: Let's think step by step.\
\ We know that the slicing operator [::-1] takes all of the elements in the string\
\ in reverse order, so we reverse the order of the string \"abc\", resulting in\
\ \"cba\". The answer is (C).\n\nQ: A list of numbers has n elements, indexed from\
\ 1 to n. The following algorithm is intended to display the number of elements\
\ in the list that have a value greater than 100. The algorithm uses the variables\
\ count and position. Steps 3 and 4 are missing.\n Step 1: Set count to 0 and position\
\ to 1.\n Step 2: If the value of the element at index position is greater than\
\ 100, increase the value of count by 1.\n Step 3: (missing step)\n Step 4: (missing\
\ step)\n Step 5: Display the value of count.\nWhich of the following could be used\
\ to replace steps 3 and 4 so that the algorithm works as intended?\n(A) Step 3:\
\ Increase the value of position by 1.\n Step 4: Repeat steps 2 and 3 until the\
\ value of count is greater than 100.\n(B) Step 3: Increase the value of position\
\ by 1.\n Step 4: Repeat steps 2 and 3 until the value of position is greater than\
\ n.\n(C) Step 3: Repeat step 2 until the value of count is greater than 100.\n\
\ Step 4: Increase the value of position by 1.\n(D) Step 3: Repeat step 2 until\
\ the value of position is greater than n.\n Step 4: Increase the value of count\
\ by 1.\nA: Let's think step by step. Choice A is incorrect, because its Step 4\
\ has an incorrect termination condition, stopping when count is greater than 100.\
\ We need to stop after inspecting all elements in the list. Choice B is correct\
\ because it correctly increments both count and position, and correctly repeats\
\ these steps and terminates when all elements in the list have been inspected.\
\ Choice C is incorrect because it incorrectly increments the variable count until\
\ its value is greater than 100, regardless of the elements in the list. Choice\
\ D is incorrect because its step 3 does not increment the value of position, so\
\ it will repeat forever. The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_computer_science"
dataset_name: high_school_computer_science
description: The following are multiple choice questions (with answers) about high
school computer science.
fewshot_config:
sampler: first_n
samples:
- question: 'Which of the following is an example of the use of a device on the Internet
of Things (IoT) ?
(A) A car alerts a driver that it is about to hit an object. (B) A hiker uses
a G P S watch to keep track of her position. (C) A refrigerator orders milk
from an online delivery service when the milk in the refrigerator is almost
gone. (D) A runner uses a watch with optical sensors to monitor his heart rate.'
target: Let's think step by step. The term Internet of Things (IoT) refers to
common devices which are connected to the internet, enabling new functionality.
Choice A is incorrect because it does not describe an internet connected device.
In choice B, the watch is only described as having GPS functionality but no
internet connectivity. Choice C describes a common device (a refrigerator) which
has internet connectivity enabling new functionality (online ordering). Choice
D does not mention internet connectivity for the watch, only optical sensors.
The answer is (C).
- question: 'Many Web browsers allow users to open anonymous windows. During a browsing
session in an anonymous window, the browser does not record a browsing history
or a list of downloaded files. When the anonymous window is exited, cookies
created during the session are deleted. Which of the following statements about
browsing sessions in an anonymous window is true?
(A) The activities of a user browsing in an anonymous window will not be visible
to people who monitor the user''s network, such as the system administrator.
(B) Items placed in a Web store''s shopping cart for future purchase during
the anonymous browsing session will not be saved on the user''s computer. (C)
A user will not be able to log in to e-mail or social media accounts during
the anonymous browsing session. (D) A user browsing in an anonymous window will
be protected from viruses launched from any web sites visited or files downloaded.'
target: "Let's think step by step. Choice A is incorrect as it only describes\
\ network traffic, which an anonymous browser does not change. Choice B is correct\
\ as it correctly describes how an anonymous browser will prevent saving data\
\ on the user\u2019s computer after the session is ended. Choice C is incorrect\
\ because an anonymous browser will not prevent logging in to email or social\
\ media accounts. Choice D is incorrect because an anonymous browser in itself\
\ performs no virus protection. The answer is (B)."
- question: "In the program below, the initial value of X is 5 and the initial value\
\ of Y is 10.\nIF (X < 0){\n DISPLAY (\"Foxtrot\")\n} ELSE {\n IF (X > Y){\n\
\ DISPLAY (\"Hotel\")\n } ELSE {\n IF (Y > 0){\n DISPLAY (\"November\")\n\
\ } ELSE {\n DISPLAY (\"Yankee\")\n }\n }\n}\nWhat is displayed as a result\
\ of running the program?\n(A) Foxtrot (B) Hotel (C) November (D) Yankee"
target: Let's think step by step. Because X has the value 5, the first conditional
IF (X < 0) is false, so we move to the first ELSE clause. Because X is 5 and
Y is 10, the second conditional IF (X > Y) is false, so we move to the following
ELSE clause. Since Y is 10, the conditional IF (Y > 0) is true, so the command
DISPLAY ("November") is executed. The answer is (C).
- question: 'What is the output of "abc"[::-1] in Python 3?
(A) Error (B) abc (C) cba (D) c'
target: Let's think step by step. We know that the slicing operator [::-1] takes
all of the elements in the string in reverse order, so we reverse the order
of the string "abc", resulting in "cba". The answer is (C).
- question: "A list of numbers has n elements, indexed from 1 to n. The following algorithm\
\ is intended to display the number of elements in the list that have a value\
\ greater than 100. The algorithm uses the variables count and position. Steps\
\ 3 and 4 are missing.\n Step 1: Set count to 0 and position to 1.\n Step 2:\
\ If the value of the element at index position is greater than 100, increase\
\ the value of count by 1.\n Step 3: (missing step)\n Step 4: (missing step)\n\
\ Step 5: Display the value of count.\nWhich of the following could be used\
\ to replace steps 3 and 4 so that the algorithm works as intended?\n(A) Step\
\ 3: Increase the value of position by 1.\n Step 4: Repeat steps 2 and 3 until\
\ the value of count is greater than 100.\n(B) Step 3: Increase the value of\
\ position by 1.\n Step 4: Repeat steps 2 and 3 until the value of position\
\ is greater than n.\n(C) Step 3: Repeat step 2 until the value of count is\
\ greater than 100.\n Step 4: Increase the value of position by 1.\n(D) Step\
\ 3: Repeat step 2 until the value of position is greater than n.\n Step 4:\
\ Increase the value of count by 1."
target: 'Let''s think step by step. Choice A is incorrect, because its Step 4
has an incorrect termination condition, stopping when count is greater than
100. We need to stop after inspecting all elements in the list. Choice B is
correct because it correctly increments both count and position, and correctly
repeats these steps and terminates when all elements in the list have been inspected.
Choice C is incorrect because it incorrectly increments the variable count until
its value is greater than 100, regardless of the elements in the list. Choice
D is incorrect because its step 3 does not increment the value of position,
so it will repeat forever. The answer is (B).'
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_high_school_computer_science