Commit 6ac42518 authored by lintangsutawika

Merge branch 'big-refactor' of https://github.com/EleutherAI/lm-evaluation-harness into openai_completions
parents 9c3ba7d4 e3644fcc
"dataset_name": "high_school_government_and_politics"
"description": "The following are multiple choice questions (with answers) about high\
\ school government and politics.\n\nQ: Which of the following best states an argument\
\ made by James Madison in The Federalist number 10?\n(A) Honest politicians can\
\ prevent factions from developing. (B) Factions are more likely to occur in large\
\ republics than in small ones. (C) The negative effects of factionalism can be\
\ reduced by a republican government. (D) Free elections are the people's best defense\
\ against factionalism.\nA: Let's think step by step. We refer to Wikipedia articles\
\ on government and politics for help. In the Federalist number 10, James Madison\
\ advocated for a representative republican form of government to guard against\
\ factionalism. The answer is (C).\n\nQ: The term \"budget deficit\" refers to the\n\
(A) annual increase in federal spending on the military (B) amount of interest on\
\ the national debt (C) difference between the initial budget proposals made by\
\ the president and Congress (D) amount the government spends in excess of its revenues\n\
A: Let's think step by step. We refer to Wikipedia articles on government and politics\
\ for help. When the government spends more than it earns, the difference is the\
\ budget deficit. The answer is (D).\n\nQ: Which of the following statements about\
\ cabinet departments is FALSE?\n(A) They are established by the legislative branch.\
\ (B) Their members often don't have much influence over presidential decisions.\
\ (C) They cannot all be run by leaders who belong to the same political party the\
\ president does. (D) Not every federal agency is a cabinet department.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on government and politics\
\ for help. There is no law stipulating that some cabinet department leaders have\
\ to belong to a political party different from that of the president. The answer\
\ is (C).\n\nQ: Which of the following cases established the precedent that a defendant\
\ must be informed of the right to remain silent, the right to a lawyer, and protection\
\ from self-incrimination?\n(A) Weeks v. United States (B) Betts v. Brady (C) Mapp\
\ v. Ohio (D) Miranda v. Arizona\nA: Let's think step by step. We refer to Wikipedia\
\ articles on government and politics for help. In the landmark case Miranda v. Arizona\
\ in 1966, the US Supreme Court, based on the Fifth and Sixth Amendments of the US\
\ Constitution, guaranteed a defendant's right to an attorney and protection from\
\ self-incrimination. The answer is (D).\n\nQ: Uncertainty over the limits to presidential\
\ power is caused primarily by the fact that\n(A) the constitutional definition\
\ of those powers is broad and unspecific (B) most people agree that the Constitution\
\ places too many limits on presidential power (C) the Supreme Court consistently\
\ refuses to rule on cases concerning presidential powers (D) constitutional amendments\
\ have greatly increased presidential powers\nA: Let's think step by step. We refer\
\ to Wikipedia articles on government and politics for help. The US Constitution\
\ is not very specific about the powers of the president, leading to uncertainty\
\ over its limits. The answer is (A)."
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_government_and_politics"
"dataset_name": "high_school_macroeconomics"
"description": "The following are multiple choice questions (with answers) about high\
\ school macroeconomics.\n\nQ: Which of the following policies best describes supply-side\
\ fiscal policy?\n(A) An increase in the money supply (B) Increased government spending\
\ (C) Lower taxes on research and development of new technology (D) Higher taxes\
\ on household income\nA: Let's think step by step. We refer to Wikipedia articles\
\ on macroeconomics for help. Supply-side fiscal policy stimulates the economy by\
\ encouraging more production of goods and services through reduction in taxes and\
\ deregulation. The answer is (C).\n\nQ: The short-run Phillips curve indicates\
\ a\n(A) direct relation between unemployment and inflation (B) direct relation\
\ between price and quantity demanded (C) inverse relation between price and quantity\
\ demanded (D) inverse relation between unemployment and inflation\nA: Let's think\
\ step by step. We refer to Wikipedia articles on macroeconomics for help. The short-run\
\ Phillips curve shows that whenever unemployment decreases below a natural level,\
\ inflation starts increasing, and vice versa. The answer is (D).\n\nQ: Holding\
\ all else equal which of the following monetary policies would be used to boost\
\ U.S. exports?\n(A) Increasing the discount rate (B) Increasing the reserve ratio\
\ (C) Buying government securities (D) Lowering tariffs\nA: Let's think step by\
\ step. We refer to Wikipedia articles on macroeconomics for help. Buying government\
\ securities increases the money supply and lowers interest rates, which reduces\
\ foreign demand for US dollars. The dollar depreciates, making US exports cheaper\
\ and more attractive. The answer is (C).\n\
\nQ: A federal deficit occurs when\n(A) exports exceed imports. (B) imports exceed\
\ exports. (C) federal tax collections exceed spending. (D) federal spending exceeds\
\ federal tax revenues.\nA: Let's think step by step. We refer to Wikipedia articles\
\ on macroeconomics for help. A federal deficit occurs when federal spending exceeds\
\ federal income, which comes primarily from tax revenues. The answer is (D).\n\nQ:\
\ Which of the following is not included in the U.S. GDP?\n(A) The U.S. military\
\ opens a new base in a foreign country with 1000 U.S. personnel. (B) Japanese consumers\
\ buy thousands of CDs produced in the United States. (C) An American pop singer\
\ performs a sold-out concert in Paris. (D) A French theatrical production tours\
\ dozens of American cities.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on macroeconomics for help. The economic transactions related to the\
\ performance of the American pop singer in Paris happen entirely outside the U.S.\
\ and hence are not included in U.S. GDP. The answer is (C)."
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_macroeconomics"
"dataset_name": "high_school_mathematics"
"description": "The following are multiple choice questions (with answers) about high\
\ school mathematics.\n\nQ: Simplify and write the result with a rational denominator:\
\ $$\\sqrt{\\sqrt[3]{\\sqrt{\\frac{1}{729}}}}$$\n(A) \\frac{3\\sqrt{3}}{3} (B) \\\
frac{1}{3} (C) \\sqrt{3} (D) \\frac{\\sqrt{3}}{3}\nA: Let's think step by step.\
\ Factoring $729=3^6$ and combining the roots $\\frac{1}{2}\\frac{1}{3}\\frac{1}{2}=\\\
frac{1}{12}$, we get that $\\sqrt{\\sqrt[3]{\\sqrt{\\frac{1}{729}}}}=\\left(\\frac{1}{3^6}\\\
right)^{\\frac{1}{12}}=\\frac{1}{3^{\\frac{1}{2}}}=\\frac{1}{\\sqrt{3}}=\\frac{\\sqrt{3}}{3}$. The answer\
\ is (D).\n\nQ: Five thousand dollars compounded annually at an $x\\%$ interest\
\ rate takes six years to double. At the same interest rate, how many years will\
\ it take $\\$300$ to grow to $\\$9600$?\n(A) 12 (B) 1 (C) 30 (D) 5\nA: Let's think\
\ step by step. To go from $\\$300$ to $\\$9600$, the value must go up by a factor\
\ of $9600/300=32=2^5$. Since at this interest rate it takes six years for it to\
\ double, it will take $5*6=30$ years to grow to $\\$9600$. The answer is (C).\n\
\nQ: Ten students take a biology test and receive the following scores: 45, 55,\
\ 50, 70, 65, 80, 40, 90, 70, 85. What is the mean of the students’ test scores?\n\
(A) 55 (B) 60 (C) 62 (D) 65\nA: Let's think step by step. There are 10 students\
\ and the sum of their scores is $45 + 55 + 50 + 70 + 65 + 80 + 40 + 90 + 70 + 85\
\ = 650$, so the mean is $650/10=65$. The answer is (D).\n\nQ: The variable $x$ varies\
\ directly as the square of $y$, and $y$ varies directly as the cube of $z$. If\
\ $x$ equals $-16$ when $z$ equals 2, what is the value of $x$ when $z$ equals $\\\
frac{1}{2}$?\n(A) -1 (B) 16 (C) -\\frac{1}{256} (D) \\frac{1}{16}\nA: Let's think\
\ step by step. We know that $x \\propto y^2$ and $y \\propto z^3$, so $x = k z^6$\
\ for some constant $k$. Plugging in for $x=-16$ and $z=2$, the constant value is\
\ $k=\\frac{x}{z^6}=\\frac{-16}{64}=-\\frac{1}{4}$. So, when $z=\\frac{1}{2}$, the\
\ value of $x$ is $x=kz^6=-\\frac{1}{4}\\frac{1}{2^6}=-\\frac{1}{256}$. The answer\
\ is (C).\n\nQ: Joe was in charge of lights for a dance. The red light blinks every\
\ two seconds, the yellow light every three seconds, and the blue light every five\
\ seconds. If we include the very beginning and very end of the dance, how many\
\ times during a seven minute dance will all the lights come on at the same time?\
\ (Assume that all three lights blink simultaneously at the very beginning of the\
\ dance.)\n(A) 3 (B) 15 (C) 6 (D) 5\nA: Let's think step by step. The least common\
\ multiple of 2, 3 and 5 is 30, so all three lights come on together every 30 seconds,\
\ i.e. twice per minute. Counting the very beginning, during a 7 minute dance they\
\ come on at the same time $2*7+1=15$ times. The answer is (B)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_mathematics"
"dataset_name": "high_school_microeconomics"
"description": "The following are multiple choice questions (with answers) about high\
\ school microeconomics.\n\nQ: Which of the following is necessarily a characteristic\
\ of oligopoly?\n(A) Free entry into and exit from the market (B) A few large producers\
\ (C) One producer of a good with no close substitutes (D) A homogenous product\n\
A: Let's think step by step. We refer to Wikipedia articles on microeconomics for\
\ help. An oligopoly is a market dominated by a small number of sellers or producers.\
\ Oligopolies typically have high barriers to new entry, and the product is often\
\ differentiated. The answer is (B).\n\nQ: If the\
\ government subsidizes producers in a perfectly competitive market, then\n(A) the\
\ demand for the product will increase (B) the demand for the product will decrease\
\ (C) the consumer surplus will increase (D) the consumer surplus will decrease\n\
A: Let's think step by step. We refer to Wikipedia articles on microeconomics for\
\ help. (A) and (B) are wrong because the demand curve does not change at all. If\
\ the government subsidizes producers, supply increases and the market price falls,\
\ so consumer surplus increases. The answer is (C).\n\nQ: Which of the following is true\
\ of a price floor?\n(A) The price floor shifts the demand curve to the left. (B)\
\ An effective floor creates a shortage of the good. (C) The price floor shifts\
\ the supply curve of the good to the right. (D) To be an effective floor, it must\
\ be set above the equilibrium price.\nA: Let's think step by step. We refer to\
\ Wikipedia articles on microeconomics for help. A price floor does not shift the\
\ demand or supply curve. An effective price floor must be set above the equilibrium\
\ price; otherwise the market clears at the equilibrium price and the floor has no effect.\
\ The answer is (D).\n\nQ: The concentration ratio for a monopoly is\n(A) 0 (B)\
\ 5 (C) 10 (D) 100\nA: Let's think step by step. We refer to Wikipedia articles\
\ on microeconomics for help. The concentration ratio is calculated as the sum of\
\ the market shares of a specific number of the largest companies. Monopoly means one company\
\ or entity controls the entire market, therefore, the concentration ratio is 100\
\ percent. The answer is (D).\n\nQ: In a competitive labor market for housepainters,\
\ which of the following would increase the demand for housepainters?\n(A) An effective\
\ minimum wage imposed on this labor market. (B) An increase in the price of gallons\
\ of paint. (C) An increase in the construction of new houses. (D) An increase in\
\ the price of mechanical painters so long as the output effect exceeds the substitution\
\ effect.\nA: Let's think step by step. We refer to Wikipedia articles on microeconomics\
\ for help. An increase in the construction of new houses means an increased demand\
\ for house painting, which increases the demand for housepainters. The answer\
\ is (C)."
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_microeconomics"
"dataset_name": "high_school_physics"
"description": "The following are multiple choice questions (with answers) about high\
\ school physics.\n\nQ: A microwave oven is connected to an outlet, 120 V, and draws\
\ a current of 2 amps. At what rate is energy being used by the microwave oven?\n\
(A) 10 W (B) 30 W (C) 60 W (D) 240 W\nA: Let's think step by step. Rate of energy\
\ usage is known as power; in a dissipative electrical circuit, power is given\
\ by voltage times current. So in our case, the power is 120 V times 2 amps, or\
\ 240 W. The answer is (D).\n\nQ: A point charge, Q = +1 mC, is fixed at the origin.\
\ How much work is required to move a charge, Q = +8 µC, from the point (0, 4 meters)\
\ to the point (3 meters, 0)?\n(A) 3.5 J (B) 6.0 J (C) 22.5 J (D) 40 J\nA: Let's\
\ think step by step. To calculate the work required to move a charge from one location\
\ to another in a fixed electric field, it is enough to calculate the potential energy\
\ difference between the two locations. Here, the potential energy only depends on the\
\ distance between the charges; it’s $k q_1 q_2 / r$, where $k$ is Coulomb’s constant,\
\ so the work is $k q_1 q_2 (1/r_2 - 1/r_1)$ with $r_1 = 4$ m and $r_2 = 3$ m.\
\ Plugging in $q_1 = 1$ mC and $q_2 = 8 \\mu$C gives 5.992\
\ J, which rounds to 6 J. The answer is (B).\n\nQ: Which of the following conditions\
\ will ensure that angular momentum is conserved? I. Conservation of linear momentum\
\ II. Zero net external force III. Zero net external torque\n(A) I and II only (B)\
\ I and III only (C) II and III only (D) III only\nA: Let's think step by step.\
\ Torque is defined as the rate of change of angular momentum; if there is zero external\
\ torque, angular momentum is conserved. The answer is (D).\n\nQ: A photocell of\
\ work function ϕ = 2eV is connected to a resistor in series. Light of frequency\
\ f = 1 × 10^15 Hz hits a metal plate of the photocell. If the power of the light\
\ is P = 100 W, what is the current through the resistor?\n(A) 2:00 AM (B) 6:00\
\ AM (C) 12:00 AM (D) 24 A\nA: Let's think step by step. The only answer above which\
\ has units of current is D, 24 A. The answer is (D).\n\nQ: A pipe full of air is\
\ closed at one end. A standing wave is produced in the pipe, causing the pipe to\
\ sound a note. Which of the following is a correct statement about the wave’s properties\
\ at the closed end of the pipe?\n(A) The pressure is at a node, but the particle\
\ displacement is at an antinode. (B) The pressure is at an antinode, but the particle\
\ displacement is at a node. (C) The pressure and the particle displacement are\
\ both at nodes. (D) The pressure and the particle displacement are both at antinodes.\n\
A: Let's think step by step. At the closed end of the pipe, the particles cannot\
\ have any net displacement because the pipe closure stops them. So the particle\
\ displacement is at a node. This closure also causes the pressure to be maximal,\
\ i.e. an antinode. The answer is (B)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_physics"
"dataset_name": "high_school_psychology"
"description": "The following are multiple choice questions (with answers) about high\
\ school psychology.\n\nQ: Pascale is interested in the processing strategies children\
\ use to learn new information. Pascale would best be classified as what type of\
\ psychologist?\n(A) sociocultural (B) clinical (C) cognitive (D) behaviorist\n\
A: Let's think step by step. We refer to Wikipedia articles on psychology for help.\
\ A sociocultural psychologist focuses on the effect of societal factors on people.\
\ A clinical psychologist focuses on people with mental health issues. A cognitive psychologist\
\ focuses on how people think and learn, including their processing strategies. A behaviorist\
\ focuses more on the effects of environment and experience on people. The answer is\
\ (C).\n\nQ: According to Caplan's model of consultee-centered case consultation,\
\ the consultant is primarily interested in\n(A) identifying the causes and solutions\
\ of the client's presenting problems (B) identifying and eliminating the causes\
\ of the consultee's difficulties in handling a problem (C) establishing a hierarchy\
\ of authority to enable effective decision making (D) presenting a single, well-defined\
\ and unambiguous course of action for the consultant to overcome skills deficits\n\
A: Let's think step by step. We refer to Wikipedia articles on psychology for help.\
\ Caplan defines two types of consultation. Client-centered case consultation aims\
\ to handle the client's problems, while consultee-centered case consultation aims to\
\ identify the reasons for the consultee's difficulty in handling a problem. The answer is (B).\n\
\nQ: According to the Individuals with Disabilities Education Improvement Act, which\
\ of the following must an educational agency do before it changes the educational\
\ placement of a student with a disability?\n(A) Give the child a trial period in\
\ the new environment (B) Notify the parents in writing (C) Obtain school board\
\ approval (D) Obtain parental consent\nA: Let's think step by step. We refer to\
\ Wikipedia articles on psychology for help. When the decision to change the educational\
\ placement of a student with a disability is made, the educational agency must\
\ notify the parents in writing on that date. The answer is (B).\n\nQ: While swimming\
\ in the ocean, Ivan is frightened by a dark shadow in the water even before he\
\ has the chance to identify what the shadow is. The synaptic connections taking\
\ place during this incident of fright are best described by which of the following?\n\
(A) Messages are sent from the thalamus directly to the amygdala. (B) Messages are\
\ sent from the thalamus to the \"what\" and \"where\" pathways. (C) Messages are\
\ sent from the parasympathetic nervous system to the cerebral cortex. (D) Messages\
\ are sent from the frontal lobes to the pituitary gland.\nA: Let's think step by\
\ step. We refer to Wikipedia articles on psychology for help. Our neural system\
\ has a mechanism that can respond to an immediate emotional signal before it reaches\
\ the thought centers. In Ivan's case, messages travel directly from the thalamus to\
\ the amygdala. The answer is (A).\n\nQ: Ani believes that her attitudes and behavior\
\ play a central role in what happens to her. Such a belief is likely to be associated\
\ with\n(A) a strong superego. (B) low self-esteem. (C) low self-efficacy. (D) an\
\ internal locus of control.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on psychology for help. People with an external locus of control believe\
\ fate and luck play an important role in their lives, while people with an internal\
\ locus of control believe they control their lives. The answer is (D)."
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_psychology"
"dataset_name": "high_school_statistics"
"description": "The following are multiple choice questions (with answers) about high\
\ school statistics.\n\nQ: A new smartwatch is manufactured in one part of a factory,\
\ then secured for shipping in another, independent part of the factory. The weight\
\ of the smartwatch has a mean of 62 grams and a standard deviation of 1.0 grams.\
\ The weight of the packaging (box, user's guide, bubble wrap, etc.) has a mean\
\ of 456 grams and a standard deviation of 6 grams. Together, the distribution of\
\ the weight of the smartwatch and its packaging would have the following mean and\
\ standard deviation:\n(A) Mean 518 grams; standard deviation 7.0 grams (B) Mean\
\ 518 grams; standard deviation 3.5 grams (C) Mean 518 grams; standard deviation\
\ 6.1 grams (D) Mean 394 grams; standard deviation 6.1 grams\nA: Let's think step\
\ by step. Since the weight of the watch and the weight of the packaging are independent\
\ random variables, the mean and variance of their sum are equal to the sum of their\
\ individual means and variances. So the mean is 62 + 456 = 518 grams, and the variance\
\ is 1.0^2 + 6.0^2 = 37, leading to a standard deviation of about 6.1 grams. The answer\
\ is (C).\n\nQ: After a frost warning was issued, the owner of a large orange grove\
\ asked his workers to spray all his trees with water. The water was supposed to\
\ freeze and form a protective covering of ice around the orange blossom. Nevertheless,\
\ the owner suspected that some trees suffered considerable damage due to the frost.\
\ To estimate the proportion of trees that suffered more than 50 percent damage\
\ due to the frost, he took a random sample of 100 trees from his grove. What is\
\ the response variable in this experiment?\n(A) The proportion of trees that suffered\
\ more than 50 percent damage due to frost. (B) The number of trees affected by\
\ the frost. (C) The number of trees sampled from the grove. (D) For each sampled\
\ tree, whether it suffered more than 50 percent damage or at most 50 percent damage.\n\
A: Let's think step by step. In this experiment, the response variable is what is\
\ measured. For each tree, what is measured is whether or not it suffered more than\
\ 50 percent damage due to the frost. The answer is (D).\n\nQ: Suppose X and Y are\
\ random variables with E(X) = 37, var(X) = 5, E(Y) = 62, and var(Y) = 12. What\
\ are the expected value and variance of the random variable X + Y?\n(A) E(X + Y)\
\ = 99, var(X + Y) = 8.5 (B) E(X + Y) = 99, var(X + Y) = 13 (C) E(X + Y) = 99, var(X\
\ + Y) = 17 (D) There is insufficient information to answer this question.\nA: Let's\
\ think step by step. While means of sums of random variables add (regardless of\
\ whether the variables are independent), in order to determine the variance of a\
\ sum of random variables, we need to know not just their individual variances but\
\ the covariance of the two variables, which is not given in this problem. The answer\
\ is (D).\n\nQ: Which of the following sets has the smallest standard deviation?\
\ Which has the largest?\nI: {1,2,3}\nII: {-10,10}\nIII: {100}\n(A) I, II (B) II,\
\ III (C) III, I (D) III, II\nA: Let's think step by step. The variance of distribution\
\ I is the expected squared deviation from its mean (which is 2), so the variance\
\ is 2/3. The variance of distribution II is 10^2 (because both elements are 10\
\ away from the mean of zero). The variance of distribution III is 0, since it has\
\ a single entry. So distribution III has the smallest standard deviation and distribution\
\ II has the largest. The answer is (D).\n\nQ: Which of the following is a correct\
\ statement about correlation?\n(A) If the slope of the regression line is exactly\
\ 1, then the correlation is exactly 1. (B) If the correlation is 0, then the slope\
\ of the regression line is undefined. (C) Switching which variable is called x\
\ and which is called y changes the sign of the correlation. (D) The correlation\
\ r is equal to the slope of the regression line when z-scores for the y-variable\
\ are plotted against z-scores for the x-variable.\nA: Let's think step by step.\
\ Statement A is false because the slope of the regression line being exactly 1\
\ can occur even when the two variables are not perfectly correlated. Statement\
\ B is false because if the correlation is 0, the slope of the regression line is 0, not undefined.\
\ Statement C is false because correlation is symmetric in the two random variables.\
\ The answer is (D)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_statistics"
"dataset_name": "high_school_us_history"
"description": "The following are multiple choice questions (with answers) about high\
\ school us history.\n\nQ: This question refers to the following information.\n\
I come not to urge personal claims, nor to seek individual benefits; I appear as\
\ the advocate of those who cannot plead their own cause; I come as the friend of\
\ those who are deserted, oppressed, and desolate. In the Providence of God, I am\
\ the voice of the maniac whose piercing cries from the dreary dungeons of your\
\ jails penetrate not your Halls of Legislation. I am the Hope of the poor crazed\
\ beings who pine in the cells, and stalls, and cages, and waste rooms of your poor-houses.\
\ I am the Revelation of hundreds of wailing, suffering creatures, hidden in your\
\ private dwellings, and in pens and cabins—shut out, cut off from all healing influences,\
\ from all mind-restoring cares.… Could their melancholy histories be spread before\
\ you as revealed to my grieved spirit during the last three months, how promptly,\
\ how earnestly would you search out the most approved means of relief; how trifling,\
\ how insignificant, by comparison, would appear the sacrifices you are asked to\
\ make; how would a few dimes and dollars, gathered from each citizen, diminish\
\ in value as a possession, compared with the certain benefits and vast good to\
\ be secured for the suffering insane...by the consecration and application of a\
\ sufficient fund to the construction of a suitable hospital.…\n—Dorothea Dix, Memorial\
\ Soliciting a State Hospital for the Protection and Cure of the Insane,\nSubmitted\
\ to the General Assembly of North Carolina, November 1848\nDorothea Dix can best\
\ be compared to whom?\n(A) Abigail Adams (B) Clara Barton (C) Shirley Temple (D)\
\ Hillary Clinton\nA: Let's think step by step. We refer to Wikipedia articles on\
\ us history for help. Both Dorothea Dix and Clara Barton were American nurses. The\
\ answer is (B).\n\nQ: This question refers to the following information.\n\"As\
\ our late Conduct at the Conestoga Manor and Lancaster have occasioned much Speculation\
\ & a great diversity of Sentiments in this and neighboring Governments; some vindicating\
\ & others condemning it; some charitably alleviating the Crime, & others maliciously\
\ painting it in the most odious & detestable Colours, we think it our duty to lay\
\ before the Publick, the whole Matter as it appeared, & still appears, to us. .\
\ . .\n\"If these things are not sufficient to prove an unjustifiable Attachment\
\ in the Quakers to the Indians Savages, a fixed Resolution to befriend them & an\
\ utter insensibility to human Distresses, let us consider a few more recent Facts.\
\ When we found the last Summer that we were likely to get no Assistance from the\
\ Government, some Volunteers went out at our own Expense, determined to drive our\
\ Enemies from our Borders; & when we came near to the great Island, we understood\
\ that a Number of their Warriors had gone out against our Frontiers. Upon this\
\ we returned and came up with them and fought with them at the Munfey Hill where\
\ we lost some of our Men & killed some of their Warriors & thereby saved our Frontiers\
\ from this Story in another Expedition. But no sooner had we destroyed their Provisions\
\ on the great Island, & ruined their trade with the good People at Bethlehem, but\
\ these very Indians, who were justly suspected of having murdered our Friends in\
\ Northampton County, were by the Influence of some Quakers taken under the Protection\
\ of the Government to screen them from the Resentments of the Friends and Relations\
\ of the Murdered, & to support them thro the Winter.\"\n\"Apology of the Paxton\
\ Boys\" (pamphlet), 1764 (Note: \"apology\" in this context should be read as an\
\ explanation, not an admission of guilt or regret.)\nThe sentiments expressed in\
\ the explanation above reflect which of the ongoing tensions during the colonial\
\ period of American history?\n(A) Tensions between British policies and the aspirations\
\ of North American colonists. (B) Tensions between American Indians allied with\
\ the French and those allied with the British. (C) Tensions between freed African\
\ Americans and white planters. (D) Tensions between backcountry settlers and elites\
\ within colonial America.\nA: Let's think step by step. We refer to Wikipedia articles\
\ on us history for help. After the French and Indian War, the Scotch-Irish settlers\
\ attacked American Indians. After the attacks on the Conestoga, about 250 Paxton\
\ Boys presented their grievances to the Pennsylvania legislature. As mentioned in\
\ the information, the Paxton Boys cited resentment at local elites. The answer\
\ is (D).\n\nQ: This question refers to the following information.\nOur leaders\
\ talk about stopping aggression from the north, but this was a struggle among groups\
\ of Vietnamese until we intervened. We seem bent upon saving the Vietnamese from\
\ Ho Chi Minh even if we have to kill them and demolish their country to do it.\
\ As the native people survey bombed-out villages, women and children burned by\
\ napalm, rice crops destroyed and cities overrun with our military personnel, they\
\ are doubtless saying secretly of the Vietcong guerillas and of the American forces,\
\ \"A plague on both your houses.\" Stop the bombing, north and south, end search\
\ and destroy offensive sweeps, and confine our military action to holding operations\
\ on the ground. Bombing the north has failed to halt or seriously check the flow\
\ of troops to the south and may, in fact, have prompted a much greater war effort\
\ by Hanoi.\n—Senator George McGovern, \"The Lessons of Vietnam,\" April 25, 1967\n\
Which of the following opinions from the 1960s most directly reflects the perspective\
\ of George McGovern's speech?\n(A) Americans must maximize their technological\
\ edge in Vietnam. (B) American bombing in Vietnam is step by step leading to progress\
\ in the war. (C) American bombing in Vietnam is a failure. (D) America must not\
\ give in to defeatism about the war in Vietnam.\nA: Let's think step by step. We\
\ refer to Wikipedia articles on us history for help. \"Stop the bombing\" and \"\
Bombing the north has failed to halt or seriously check the flow of troops to the\
\ south\" indicate that the perspective of George McGovern's speech is that American\
\ bombing in Vietnam is a failure. The answer is (C).\n\nQ: This question refers\
\ to the following information.\n\"In the new Code of Laws which I suppose it will\
\ be necessary for you to make I desire you would Remember the Ladies, and be more\
\ generous and favorable to them than your ancestors. Do not put such unlimited\
\ power into the hands of the Husbands. Remember all Men would be tyrants if they\
\ could. If particular care and attention is not paid to the Ladies we are determined\
\ to foment a Rebellion, and will not hold ourselves bound by any Laws in which\
\ we have no voice, or Representation.\"\nAbigail Adams, in a letter to John Adams,\
\ 1776\n\"Special legislation for woman has placed us in a most anomalous position.\
\ Women invested with the rights of citizens in one section—voters, jurors, office-holders—crossing\
\ an imaginary line, are subjects in the next. In some States, a married woman may\
\ hold property and transact business in her own name; in others, her earnings belong\
\ to her husband. In some States, a woman may testify against her husband, sue and\
\ be sued in the courts; in others, she has no redress in case of damage to person,\
\ property, or character. In case of divorce on account of adultery in the husband,\
\ the innocent wife is held to possess no right to children or property, unless\
\ by special decree of the court. But in no State of the Union has the wife the\
\ right to her own person, or to any part of the joint earnings of the co-partnership\
\ during the life of her husband. In some States women may enter the law schools\
\ and practice in the courts; in others they are forbidden. In some universities\
\ girls enjoy equal educational advantages with boys, while many of the proudest\
\ institutions in the land deny them admittance, though the sons of China, Japan\
\ and Africa are welcomed there. But the privileges already granted in the several\
\ States are by no means secure.\"\nSusan B. Anthony, \"Declaration of Rights for\
\ Women,\" July 4, 1876\nThe sentiments expressed in the second excerpt by Susan\
\ B. Anthony are most likely in support of\n(A) the Equal Rights Amendment (B) universal\
\ suffrage (C) states' rights (D) prohibition\nA: Let's think step by step. We refer\
\ to Wikipedia articles on us history for help. The passage states that women are\
\ in an anomalous position in terms of legislation: in some States a married woman's\
\ earnings belong to her husband, and she has no redress in the courts. Anthony\
\ believes women should have the same legal rights as men, including the right to\
\ vote. The answer is (B).\n\nQ:\
\ This question refers to the following information.\n\"Society in every state is\
\ a blessing, but government even in its best state is but a necessary evil; in\
\ its worst state an intolerable one; for when we suffer, or are exposed to the\
\ same miseries by a government, which we might expect in a country without government,\
\ our calamity is heightened by reflecting that we furnish the means by which we\
\ suffer. Government, like dress, is the badge of lost innocence; the palaces of\
\ kings are built on the ruins of the bowers of paradise. For were the impulses\
\ of conscience clear, uniform, and irresistibly obeyed, man would need no other\
\ lawgiver; but that not being the case, he finds it necessary to surrender up a\
\ part of his property to furnish means for the protection of the rest; and this\
\ he is induced to do by the same prudence which in every other case advises him\
\ out of two evils to choose the least. Wherefore, security being the true design\
\ and end of government, it unanswerably follows that whatever form thereof appears\
\ most likely to ensure it to us, with the least expense and greatest benefit, is\
\ preferable to all others.\"\nThomas Paine, Common Sense, 1776\nWhich of the following\
\ \"miseries\" alluded to above were most condemned by Anti-Federalists of the post-Revolutionary\
\ era?\n(A) Organized response to Bacon's Rebellion (B) Federal response to Shays's\
\ Rebellion (C) Federal response to the Whiskey Rebellion (D) Federal response to\
\ Pontiac's Rebellion\nA: Let's think step by step. We refer to Wikipedia articles\
\ on us history for help. Anti-Federalists distrusted centralized government power\
\ and condemned Washington's federal military response to the Whiskey Rebellion.\
\ Shays's Rebellion was put down by Massachusetts militia rather than a federal\
\ force, and Bacon's Rebellion and Pontiac's Rebellion happened before the Revolution,\
\ so they can be ruled out. The answer is (C)."
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_us_history"
"dataset_name": "high_school_world_history"
"description": "The following are multiple choice questions (with answers) about high\
\ school world history.\n\nQ: This question refers to the following information.\n\
\"At least one of the [world's] societies would have to somehow enormously increase\
\ its productivity [in order to achieve global hegemony]. That quantum jump would\
\ have to be made before the various scientific, technological, agricultural, and\
\ industrial revolutions on which our post-quantum-leap world rests. It could only\
\ be accomplished by exploiting the ecosystems, mineral resources, and human assets\
\ of whole continents outside the lands of the society making the jump. Western\
\ Europe did just that by means of its brutality and guns and, more important, by\
\ geographical and ecological luck.\"\nCopyright © 2015 Cambridge University Press.\n\
Alfred Crosby, historian, Ecological Imperialism, 2004\nThe \"quantum jump\" mentioned\
\ in the passage most directly contributed to which of the following developments\
\ in the period 1450–1750 C.E.?\n(A) A breakdown in trade routes through the collapse\
\ of the established state structure (B) An increase in the population of the world\
\ through more plentiful supplies of food (C) The spread of Chinese and Indian belief\
\ systems across the world (D) An increase in social unrest\nA: Let's think step\
\ by step. We refer to Wikipedia articles on world history for help. The \"quantum\
\ jump\" mentioned in the passage refers to the conquest of the New World and the\
\ Columbian Exchange. Choices (A) and (C) did not result from it, while New World\
\ crops such as maize and potatoes spread across the globe, increasing food supplies\
\ and population. The answer is (B).\n\nQ: This question refers to the following\
\ information.\n\"The struggle against neo-colonialism is not aimed at excluding\
\ the capital of the developed world from operating in less developed countries.\
\ It is aimed at preventing the financial power of the developed countries being\
\ used in such a way as to impoverish the less developed.\nNon-alignment, as practiced\
\ by Ghana and many other countries, is based on co-operation with all States whether\
\ they be capitalist, socialist or have a mixed economy. Such a policy, therefore,\
\ involves foreign investment from capitalist countries, but it must be invested\
\ in accordance with a national plan drawn up by the government of the non-aligned\
\ State with its own interests in mind. The issue is not what return the foreign\
\ investor receives on his investments…The question is one of power. A State in\
\ the grip of neo-colonialism is not master of its own destiny.\"\nKwame Nkrumah,\
\ Neo-Colonialism, 1965\nWhich of the following provides the best context for Nkrumah's\
\ writings?\n(A) The Industrial Revolution (B) Decolonization (C) Regional Free\
\ Trade Associations (D) Autarky\nA: Let's think step by step. We refer to Wikipedia\
\ articles on world history for help. The passage expresses the concern that the gains\
\ of decolonization were in danger and that newly independent nations like Ghana\
\ could be re-colonized via the financial power of the developed countries.\
\ The answer is (B).\n\nQ: This question refers to the following information.\n\"\
Indeed, as both the fatwas of distinguished [scholars] who base their opinion on\
\ reason and tradition alike and the consensus of the Sunni community agree that\
\ the ancient obligation of extirpation, extermination, and expulsion of evil innovation\
\ must be the aim of our exalted aspiration, for \"Religious zeal is a victory for\
\ the Faith of God the Beneficent\"; then, in accordance with the words of the Prophet\
\ (Peace upon him!) \"Whosoever introduces evil innovation into our order must be\
\ expelled\" and \"Whosoever does aught against our order must be expelled,\" action\
\ has become necessary and exigent…\"\nLetter from Ottoman Sultan Selim I to Safavid\
\ Shah Ismail I, 1514\nThe letter from Selim I is most clearly an example of which\
\ of the following?\n(A) The maintenance of military supremacy at all costs (B)\
\ Expanding tensions between religious sects (C) Factors that brought about the\
\ collapse of the Ottoman Empire (D) Peacemaking efforts among the Islamic empires\n\
A: Let's think step by step. We refer to Wikipedia articles on world history for\
\ help. The passage is an example of expanding tensions between the Sunni Ottoman\
\ Selim and the Shia Safavid Ismail. In the passage, Selim cites fatwas and the consensus\
\ of the Sunni community to justify action against whoever introduces evil innovation.\
\ The answer is (B).\n\nQ: This question refers\
\ to the following information.\n\"The real grievance of the worker is the insecurity\
\ of his existence; he is not sure that he will always have work, he is not sure\
\ that he will always be healthy, and he foresees that he will one day be old and\
\ unfit to work. If he falls into poverty, even if only through a prolonged illness,\
\ he is then completely helpless, left to his own devices, and society does\
\ not currently recognize any real obligation towards him beyond the usual help\
\ for the poor, even if he has been working all the time ever so faithfully and\
\ diligently. The usual help for the poor, however, leaves a lot to be desired,\
\ especially in large cities, where it is very much worse than in the country.\"\
\nOtto von Bismarck, 1884\nOtto von Bismarck likely made this speech in reaction\
\ to which of the following issues?\n(A) Social acceptance of child labor (B) Declining\
\ life expectancy in Germany (C) Criticisms of German trade tariffs (D) Negative\
\ effects attributed to industrial capitalism\nA: Let's think step by step. We refer\
\ to Wikipedia articles on world history for help. The passage describes the insecurity\
\ of workers under industrial capitalism. The answer is (D).\n\nQ: This question\
\ refers to the following information.\nHe contains all works and desires and all\
\ perfumes and all tastes. He enfolds the whole universe and in silence is loving\
\ to all. This is the Spirit that is in my heart, this is Brahman. To him I shall\
\ come when I go beyond this life, and to him will come he who has faith and doubts\
\ not.\n—The Upanishads, India, c. 1000 BCE\nTo which religion does the speaker\
\ most likely belong?\n(A) Hinduism (B) Buddhism (C) Shintoism (D) Zoroastrianism\n\
A: Let's think step by step. We refer to Wikipedia articles on world history for\
\ help. Brahman refers to the ultimate reality of all things in the Hindu religion.\
\ In contrast, Buddhism does not have a concept of a supreme God. The answer is (A)."
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_world_history"
"dataset_name": "human_aging"
"description": "The following are multiple choice questions (with answers) about human\
\ aging.\n\nQ: All other things being equal, which of the following persons is more\
\ likely to show osteoporosis?\n(A) An older Hispanic American woman (B) An older\
\ African American woman (C) An older Asian American woman (D) An older Native American\
\ woman\nA: Let's think step by step. We refer to Wikipedia articles on human aging\
\ for help. Although osteoporosis can occur at any age, the risk is higher for older\
\ people. It is most common in Asian and non-Hispanic white women. The answer is\
\ (C).\n\nQ: The finding that adults tend to remember events from their adolescence\
\ better than from other periods in their lives is referred to as the\n(A) Adolescence\
\ advantage (B) Reminiscence bump (C) Memorial memorial (D) Quadratic retrieval\
\ spike\nA: Let's think step by step. We refer to Wikipedia articles on human aging\
\ for help. The reminiscence bump is the tendency of older adults to recollect more\
\ events from their youth. People usually have a period of childhood amnesia\
\ from birth to around age 5, and a reminiscence bump between 10 and 30. The answer\
\ is (B).\n\nQ: Which element in tobacco smoke is responsible for cancers?\n(A)\
\ Nicotine (B) Tar (C) Carbon monoxide (D) Smoke particles\nA: Let's think step\
\ by step. We refer to Wikipedia articles on human aging for help. The benzene,\
\ acrylamide and acrylonitrile in tar interact with the lungs and cause DNA mutations\
\ in cells of the lungs, and lead to cancer. The answer is (B).\n\nQ: When older\
\ adults move to a new state after retirement, which of the following is the more\
\ likely destination?\n(A) Texas (B) California (C) Hawaii (D) Vermont\nA: Let's\
\ think step by step. We refer to Wikipedia articles on human aging for help. Texas\
\ has no state income tax and a low cost of living compared with the other three\
\ options. The answer is (A)."
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_human_aging"
"dataset_name": "human_sexuality"
"description": "The following are multiple choice questions (with answers) about human\
\ sexuality.\n\nQ: The most common disorder among men who seek sexual therapy is:\n\
(A) premature ejaculation (B) inhibited ejaculation (C) erectile disorder (D) ejaculatory\
\ disorder\nA: Let's think step by step. We refer to Wikipedia articles on human\
\ sexuality for help. The most common type of arousal disorder for men is erectile\
\ dysfunction, meaning that a man is unable to achieve or maintain an erection,\
\ although he may be interested in sex. The answer is (C).\n\
\nQ: A woman who knows she has active herpes and untreated syphilis but continues\
\ to have sex without informing her partners of her condition has, in psychoanalytic\
\ terms:\n(A) a strong ego (B) a weak superego (C) a weak id (D) a strong superego\n\
A: Let's think step by step. We refer to Wikipedia articles on human sexuality for\
\ help. A person with a weak superego tends to be delinquent, criminal, or have an antisocial\
\ personality. The action of the woman who knows she has active venereal disease\
\ but still has sex with her partners indicates she may have an antisocial personality.\
\ The answer is (B).\n\nQ: The nature of homosexual activities that occur during\
\ preadolescence include all but which of the following?\n(A) sexual intercourse\
\ (B) circle jerks (C) exhibitionism (D) touching each other's genitals\nA: Let's\
\ think step by step. We refer to Wikipedia articles on human sexuality for help.\
\ Sexual intercourse is prohibited during preadolescence. The answer is (A).\n\n\
Q: Women's ability to have multiple orgasms is primarily due to:\n(A) the fact that\
\ they do not have a refractory period. (B) the response of the inner layers of\
\ the vagina. (C) having alternating orgasms in different locations. (D) the G-Spot.\n\
A: Let's think step by step. We refer to Wikipedia articles on human sexuality for\
\ help. The refractory period is the time after orgasm when a person is not able\
\ to be erect or is not interested in sex. Women generally do not have one. The answer\
\ is (A).\n\nQ: Morning sickness is typically\
\ a problem:\n(A) during the first trimester (B) during the second trimester (C)\
\ during the third trimester (D) all through the pregnancy\nA: Let's think step\
\ by step. We refer to Wikipedia articles on human sexuality for help. Morning sickness\
\ usually begins by nine weeks after conception, corresponding to the first trimester.\
\ The answer is (A)."
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_human_sexuality"
"dataset_name": "international_law"
"description": "The following are multiple choice questions (with answers) about international\
\ law.\n\nQ: How the consent to be bound of a State may be expressed?\n(A) The consent\
\ of a State to be bound is expressed only by ratification (B) The consent of a\
\ state to be bound by a treaty may be expressed by signature, ratification, acceptance,\
\ approval or accession (C) The consent of a State to be bound is expressed by signature\
\ (D) The consent of a State to be bound is expressed by whatever means they choose\n\
A: Let's think step by step. We refer to Wikipedia articles on international law\
\ for help. Article 11 of the Vienna Convention on the Law of Treaties, signed in 1969,\
\ states that \"the consent of a State to be bound by a treaty may be expressed\
\ by signature, exchange of instruments constituting a treaty, ratification, acceptance,\
\ approval or accession, or by any other means if so agreed.\" (B) is the most precise\
\ and accurate answer. The answer is (B).\n\nQ: What is the judge ad hoc?\n(A) If\
\ a party to a contentious case before the ICJ does not have a national sitting\
\ as judge, it is entitled to nominate someone as a judge solely for that case,\
\ with the title of judge ad hoc (B) Judge ad hoc is the member of the bench of\
\ the ICJ with a casting vote (C) Judge ad hoc is a surrogate judge, in case a judge\
\ is disqualified or passes away (D) Judge ad hoc is the judge that each party will\
\ always nominate in every contentious case\nA: Let's think step by step. We refer\
\ to Wikipedia articles on international law for help. As \"ad hoc\" implies, a\
\ judge ad hoc is appointed only for a specific case or period, when a party to\
\ a contentious case before the International Court of Justice does not have a regular\
\ national sitting as judge. The answer is (A).\n\nQ: When 'consent' can serve as\
\ a circumstance precluding the wrongfulness of a State conduct?\n(A) Consent can\
\ serve as a circumstance precluding the wrongfulness whenever it is given (B) Consent\
\ can never serve as a circumstance precluding wrongfulness (C) Consent can serve\
\ as a circumstance precluding wrongfulness, provided the consent is valid and to\
\ the extent that the conduct remains within the limits of the consent given (D)\
\ Consent can always serve as a circumstance precluding wrongfulness, no matter\
\ which organ of the State gives it\nA: Let's think step by step. We refer to Wikipedia\
\ articles on international law for help. Valid consent can serve as a circumstance\
\ precluding the wrongfulness of a State's conduct if the conduct remains within the\
\ limits of that consent, according to Chapter V of the International Law Commission's\
\ 2001 Articles on Responsibility of States for Internationally Wrongful Acts. The\
\ answer is (C).\n\n\
Q: Would a reservation to the definition of torture in the ICCPR be acceptable in\
\ contemporary practice?\n(A) This is an acceptable reservation if the reserving\
\ country's legislation employs a different definition (B) This is an unacceptable\
\ reservation because it contravenes the object and purpose of the ICCPR (C) This\
\ is an unacceptable reservation because the definition of torture in the ICCPR\
\ is consistent with customary international law (D) This is an acceptable reservation\
\ because under general international law States have the right to enter reservations\
\ to treaties\nA: Let's think step by step. We refer to Wikipedia articles on international\
\ law for help. Because it contravenes the object and purpose of the ICCPR, this is\
\ an unacceptable reservation in contemporary practice. The answer is (B).\n\nQ:\
\ What types of force does Article 2(4) of the UN Charter prohibit?\n(A) Article\
\ 2(4) encompasses only armed force (B) Article 2(4) encompasses all types of force,\
\ including sanctions (C) Article 2(4) encompasses all interference in the domestic\
\ affairs of States (D) Article 2(4) encompasses force directed only against a State's\
\ territorial integrity\nA: Let's think step by step. We refer to Wikipedia articles\
\ on international law for help. Article 2(4) of the UN Charter prohibits the threat\
\ or use of force in international relations, and \"force\" there is generally\
\ understood to mean armed force. The answer is (A)."
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_international_law"
"dataset_name": "jurisprudence"
"description": "The following are multiple choice questions (with answers) about jurisprudence.\n\
\nQ: Iverson Jewelers wrote a letter to Miller, 'We have received an exceptionally\
\ fine self winding Rolox watch which we will sell to you at a very favorable price.'\n\
(A) The letter is an offer to sell (B) A valid offer cannot be made by letter. (C)\
\ The letter contains a valid offer which will terminate within a reasonable time.\
\ (D) The letter lacks one of the essential elements of an offer.\nA: Let's think\
\ step by step. We refer to Wikipedia articles on jurisprudence for help. An offer\
\ shows the intent to enter into a mutually-beneficial contract with specific terms.\
\ An offer can be made by a letter. While this letter indicates the willingness\
\ to sell, the lack of specific terms, such as transaction price and offer expiration\
\ date, makes it an incomplete offer. The answer is (D).\n\nQ: Functions of the\
\ law include all but which of the following?\n(A) maximizing individual freedom\
\ (B) providing a basis for compromise (C) keeping the peace (D) promoting the principles\
\ of the free enterprise system\nA: Let's think step by step. We refer to Wikipedia\
\ articles on jurisprudence for help. Laws are fundamentally about helping resolve\
\ disputes between individuals, and therefore essential for maximizing individual\
\ freedom, providing a basis for compromise, and keeping the peace. The answer is\
\ (D).\n\nQ: The ________ School of jurisprudence postulates that the law is based\
\ on what is \"correct.\"\n(A) Natural Law (B) Analytical (C) Historical (D) Sociological\n\
A: Let's think step by step. We refer to Wikipedia articles on jurisprudence for\
\ help. Natural Law School of jurisprudence focuses on the laws of nature, and states\
\ that the law should be based on ethics, morals, and what is \"correct\". Analytical\
\ deals with the law as it already exists, Historical postulates that the law was\
\ found and not made, and Sociological studies how the law and society impact each\
\ other. The answer is (A).\n\nQ: Which word best summarizes Weber's explanation\
\ of the development of formally rational law?\n(A) Authority. (B) Charisma. (C)\
\ Co-operation. (D) Capitalism.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on jurisprudence for help. Weber explained the development of formal\
\ rationality in laws as how the modern society moved from tradition to rationality,\
\ where people decide actions based less on how they were culturally done and more\
\ on expected utilities. How rational individuals optimize efficiency of accomplishing\
\ tasks for higher rewards is a core principle of Capitalism. The answer is (D).\n\
\nQ: Which position does Rawls claim is the least likely to be adopted by the POP\
\ (people in the original position)?\n(A) The POP would choose equality above liberty.\
\ (B) The POP would opt for the 'maximin' strategy. (C) The POP would opt for the\
\ 'difference principle'. (D) The POP would reject the 'system of natural liberty.'\n\
A: Let's think step by step. We refer to Wikipedia articles on jurisprudence for\
\ help. The POP would opt for the 'maximin' strategy, opt for the 'difference principle',\
\ and reject the 'system of natural liberty', but the POP would not choose equality\
\ above liberty, since the POP assume both equal and free citizens. The answer is\
\ (A)."
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_jurisprudence"
"dataset_name": "logical_fallacies"
"description": "The following are multiple choice questions (with answers) about logical\
\ fallacies.\n\nQ: When an arguer causes confusion during refutation because of\
\ real or feigned lack of an ability to engage in refutation, that arguer may have\
\ committed the fallacy of\n(A) poor sportsmanship (B) appeal to compassion (C)\
\ argument against the person (D) ignorance of refutation\nA: Let's think step by\
\ step. We refer to Wikipedia articles on logical fallacies for help. Ignorance\
\ of refutation, one of Aristotle's original list of logical fallacies in his Organon,\
\ is when someone causes confusion in an argument through real or feigned inability\
\ to engage in refutation, in order to win the argument. The answer is (D).\n\n\
Q: The complex question fallacy consists of\n(A) arguing something is inferior just\
\ because it doesn't do something it was never intended to do. (B) including more\
\ than one claim in the proposition and treating proof for one claim as proof for\
\ all the claims. (C) drawing a conclusion before examining the evidence, and only\
\ considering evidence that supports that conclusion. (D) asking a question that\
\ includes either an unproven assumption or more than one question, thus making\
\ a straightforward yes or no answer meaningless.\nA: Let's think step by step.\
\ We refer to Wikipedia articles on logical fallacies for help. The complex question\
\ fallacy is when someone makes a single yes or no answer to a question meaningless,\
\ by including either an unproven assumption or many questions. The latter is also\
\ known as the many questions fallacy. The answer is (D).\n\nQ: Arguing that what\
\ is true of the parts must be true of the whole is the fallacy of...\n(A) Division\
\ (B) Composition (C) Appeal to the person (D) Appeal to ignorance\nA: Let's think\
\ step by step. We refer to Wikipedia articles on logical fallacies for help. Fallacy\
\ of composition occurs when someone argues what is true of the parts must be true\
\ of the whole. The answer is (B).\n\nQ: Which of the following is true of a valid\
\ categorical syllogism?\n(A) The minor premise must deny the antecedent (B) The\
\ major premise must affirm the consequent (C) The middle term must be used in at\
\ least one premise in a universal or unqualified sense (D) All of the above\nA:\
\ Let's think step by step. We refer to Wikipedia articles on logical fallacies\
\ for help. A valid categorical syllogism must satisfy several conditions: (1) the\
\ syllogism must have exactly three terms (2) every term of the syllogism must be\
\ used twice exactly, (3) a term may be used only once in any premise, and (4) the\
\ middle term must be used in at least one premise in a universal or unqualified\
\ sense, etc. Only (C) is true. The answer is (C).\n\nQ: If someone attacks the\
\ character of an opposing arguer, instead of responding to that opponent's arguments,\
\ the first person has probably committed which of the following fallacies?\n(A)\
\ tu quoque (B) horse laugh (C) argument against the person (D) ignoratio elenchi\n\
A: Let's think step by step. We refer to Wikipedia articles on logical fallacies\
\ for help. The argument against the person fallacy occurs when someone irrelevantly\
\ attacks the character of an opposing arguer, instead of addressing that opponent's\
\ arguments. The answer is (C)."
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_logical_fallacies"
"dataset_name": "machine_learning"
"description": "The following are multiple choice questions (with answers) about machine\
\ learning.\n\nQ: Which image data augmentation is most common for natural images?\n\
(A) random crop and horizontal flip (B) random crop and vertical flip (C) posterization\
\ (D) dithering\nA: Let's think step by step. Data augmentation is used to increase\
\ the diversity of images in the training dataset. It is important that natural\
\ images are kept natural after being augmented. Vertical flips of images are not\
\ natural, so (B) is false. Posterization makes the image look like a poster, and\
\ dithering simulates extra colors with noise patterns. Neither preserves the natural property.\
\ The only natural data augmentation technique is (A). The answer is (A).\n\nQ:\
\ Traditionally, when we have a real-valued input attribute during decision-tree\
\ learning we consider a binary split according to whether the attribute is above\
\ or below some threshold. Pat suggests that instead we should just have a multiway\
\ split with one branch for each of the distinct values of the attribute. From the\
\ list below choose the single biggest problem with Pat’s suggestion:\n(A) It is\
\ too computationally expensive. (B) It would probably result in a decision tree\
\ that scores badly on the training set and a testset. (C) It would probably result\
\ in a decision tree that scores well on the training set but badly on a testset.\
\ (D) It would probably result in a decision tree that scores well on a testset\
\ but badly on a training set.\nA: Let's think step by step. Because the input is\
\ real valued, it is unlikely that the same values appear both at training and test\
\ time. This means that while such a decision tree could yield good performance\
\ on the training data, when evaluated on the test data it will perform badly because\
\ the decision tree won’t know what to do with numbers that did not appear in the\
\ training data. The answer is (C).\n\nQ: You are reviewing papers for the World’s\
\ Fanciest Machine Learning Conference, and you see submissions with the following\
\ claims. Which ones would you consider accepting?\n(A) My method achieves a training\
\ error lower than all previous methods! (B) My method achieves a test error lower\
\ than all previous methods! (Footnote: When regularisation parameter λ is chosen\
\ so as to minimise test error.) (C) My method achieves a test error lower than\
\ all previous methods! (Footnote: When regularisation parameter λ is chosen so\
\ as to minimise cross-validation error.) (D) My method achieves a cross-validation\
\ error lower than all previous methods! (Footnote: When regularisation parameter\
\ λ is chosen so as to minimise cross-validation error.)\nA: Let's think step by\
\ step. In machine learning, we train with some data and fixed hyperparameters and\
\ the training error can be arbitrarily low, so (A) can’t be right. Then, one compares\
\ different hyperparameters by selecting the model with the lowest cross-validation\
\ error; choosing λ to minimise test error, as in (B), leaks test information, and\
\ reporting the same cross-validation error used to choose λ, as in (D), is optimistically\
\ biased. The only relevant number after tuning is the test error, and thus (C) is\
\ the right answer. The answer\
\ is (C).\n\nQ: A 6-sided die is rolled 15 times and the results are: side 1 comes\
\ up 0 times; side 2: 1 time; side 3: 2 times; side 4: 3 times; side 5: 4 times;\
\ side 6: 5 times. Based on these results, what is the probability of side 3 coming\
\ up when using Add-1 Smoothing?\n(A) 2.0/15 (B) 1.0/7 (C) 3.0/16 (D) 1.0/5\nA:\
\ Let's think step by step. Add-1 smoothing adds the value of one to the different\
\ counts and then normalizes the probabilities accordingly. The counts after adding\
\ one will be: side 1 comes up 1 time; side 2: 2 times; side 3: 3 times; side 4:\
\ 4 times; side 5: 5 times; side 6: 6 times. The smoothed counts sum to 15 + 6 =\
\ 21, so the probability of rolling a three is 3/21 = 1/7. The answer is (B).\n\
\nQ: To achieve an 0/1 loss estimate that is less than 1 percent of the true 0/1\
\ loss (with probability 95%), according to Hoeffding's inequality the IID test\
\ set must have how many examples?\n(A) around 10 examples (B) around 100 examples\
\ (C) between 100 and 500 examples (D) more than 1000 examples\nA: Let's think step\
\ by step. By Hoeffding’s inequality, with 95% probability the in-sample and out-of-sample\
\ errors differ by at most epsilon when we have N samples if 2 exp(-2 epsilon^2 N)\
\ < 0.05. With epsilon = 0.01, this implies N > -1/(2*epsilon^2) log(0.05/2) = 5000\
\ log(40). Since log(40) > 1, one needs more than 1000 examples.\
\ The answer is (D)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_machine_learning"
"dataset_name": "management"
"description": "The following are multiple choice questions (with answers) about management.\n\
\nQ: How can organisational structures that are characterised by democratic and\
\ inclusive styles of management be described?\n(A) Hierarchical (B) Bureaucratic\
\ (C) Flat (D) Functional\nA: Let's think step by step. We refer to Wikipedia articles\
\ on management for help. Flat organizational structures are characterized by democratic\
\ and inclusive styles of management, and have few (if any) levels of management\
\ between the workers and managers. The answer is (C).\n\nQ: Hygiene factors are\
\ associated with which writer?\n(A) Frederick Hertzberg (B) D.C. McClelland (C)\
\ Abraham Maslow (D) Douglas McGregor\nA: Let's think step by step. We refer to\
\ Wikipedia articles on management for help. Hygiene factors include compensation,\
\ company policies, supervision, interpersonal relations, and work environments.\
\ Hertzberg lists them as factors that cannot motivate employees but can minimize\
\ job dissatisfaction. The answer is (A).\n\nQ: What characteristic is not a key\
\ feature of the 'open systems' model of management?\n(A) Morale (B) Innovation\
\ (C) Growth resource (D) Adaptation\nA: Let's think step by step. We refer to Wikipedia\
\ articles on management for help. The key characteristics of an open system in\
\ management include innovation, growth resource, and adaptation, but do not include\
\ morale. The answer is (A).\n\nQ: Which element of the cultural web forms regalia?\n\
(A) Symbols (B) Rituals and routines (C) Power structures (D) Control systems\n\
A: Let's think step by step. We refer to Wikipedia articles on management for help.\
\ The cultural web is a tool for mapping an organization's culture, where symbols\
\ form the regalia that visually expresses the values that the organization holds\
\ as important. The answer is (A).\n\nQ: What are the two main dimensions of the\
\ Ohio Studies into leadership?\n(A) Starting position and end position (B) Initial\
\ environment and changed environment (C) Organisational structure and conditioning\
\ (D) Initiating structure and considerations\nA: Let's think step by step. We refer\
\ to Wikipedia articles on management for help. The Ohio State Leadership Studies\
\ conducted in the 1940s identified initiating structure and consideration as the\
\ two main dimensions of leader behavior. The answer is (D)."
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_management"
"dataset_name": "marketing"
"description": "The following are multiple choice questions (with answers) about marketing.\n\
\nQ: Although the content and quality can be as controlled as direct mail, response\
\ rates of this medium are lower because of the lack of a personal address mechanism.\
\ This media format is known as:\n(A) Care lines. (B) Direct mail. (C) Inserts.\
\ (D) Door to door.\nA: Let's think step by step. We refer to Wikipedia articles\
\ on marketing for help. Door to door marketing delivers non-addressed items within\
\ all buildings within a geographic area. While it can control the content and quality\
\ as well as direct mail marketing, its response rate is lower because of the lack\
\ of a personal address mechanism. The answer is (D).\n\nQ: In an organization,\
\ the group of people tasked with buying decisions is referred to as the _______________.\n\
(A) Outsourcing unit. (B) Procurement centre. (C) Chief executive unit. (D) Decision-making\
\ unit.\nA: Let's think step by step. We refer to Wikipedia articles on marketing\
\ for help. In an organization, the group of people tasked with buying decisions\
\ is referred to as the decision-making unit. The answer is (D).\n\nQ: The single\
\ group within society that is most vulnerable to reference group influence is:\n\
(A) The older consumer who feels somewhat left out of things. (B) The married women,\
\ many of whom feel a need for stability in their lives. (C) New immigrants who\
\ really want to assimilate into their new culture. (D) Children, who base most\
\ of their buying decisions on outside influences.\nA: Let's think step by step.\
\ We refer to Wikipedia articles on marketing for help. Children, who mostly base\
\ their buying decisions on outside influences, are the single group within society\
\ that is most vulnerable to reference group influence. The answer is (D).\n\nQ:\
\ Which of the following is an assumption in Maslow's hierarchy of needs?\n(A) Needs\
\ are dependent on culture and also on social class. (B) Lower-level needs must\
\ be at least partially satisfied before higher needs can affect behaviour. (C)\
\ Needs are not prioritized or arranged in any particular order. (D) Satisfied needs\
\ are motivators, and new needs emerge when current needs remain unmet.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on marketing for help. Maslow's\
\ hierarchy of needs, from the bottom upwards, consists of physiological (food and clothing),\
\ safety, love and belonging needs, esteem, and self-actualization. Lower-level\
\ needs must be at least partially satisfied before higher ones can affect behavior.\
\ The answer is (B).\n\nQ: _____________ is a natural outcome when combining demographic\
\ and geographic variables.\n(A) Geodemographics (B) Product differentiation. (C)\
\ ANSOFF matrix. (D) Brand management.\nA: Let's think step by step. We refer to\
\ Wikipedia articles on marketing for help. Geodemographics is a natural outcome\
\ when combining demographic and geographic variables. The answer is (A)."
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_marketing"
"dataset_name": "medical_genetics"
"description": "The following are multiple choice questions (with answers) about medical\
\ genetics.\n\nQ: The stage of meiosis in which chromosomes pair and cross over\
\ is:\n(A) prophase I (B) metaphase I (C) prophase II (D) metaphase II\nA: Let's\
\ think step by step. We refer to Wikipedia articles on medical genetics for help.\
\ Prophase I is the stage of meiosis where homologous chromosomes pair with each\
\ other and exchange genetic material. The answer is (A).\n\nQ: DNA ligase is\n\
(A) an enzyme that joins fragments in normal DNA replication (B) an enzyme of bacterial\
\ origin which cuts DNA at defined base sequences (C) an enzyme that facilitates\
\ transcription of specific genes (D) an enzyme which limits the level to which\
\ a particular nutrient reaches\nA: Let's think step by step. We refer to Wikipedia\
\ articles on medical genetics for help. DNA ligase is a type of enzyme (EC 6.5.1.1)\
\ responsible for joining DNA strands together by catalyzing a phosphodiester bond.\
\ The answer is (A).\n\nQ: Which of the following conditions does not show multifactorial\
\ inheritance?\n(A) Pyloric stenosis (B) Schizophrenia (C) Spina bifida (neural\
\ tube defects) (D) Marfan syndrome\nA: Let's think step by step. We refer to Wikipedia\
\ articles on medical genetics for help. Multifactorial inheritance is when more\
\ than a single factor is responsible for causing a given trait or health problem.\
\ Genes cannot be the only factor. Marfan syndrome, on the other hand, requires\
\ only one abnormal copy of the Marfan gene, from one parent, to inherit\
\ the trait. The answer is (D).\n\nQ: A gene showing codominance\n(A) has both alleles\
\ independently expressed in the heterozygote (B) has one allele dominant to the\
\ other (C) has alleles tightly linked on the same chromosome (D) has alleles expressed\
\ at the same time in development\nA: Let's think step by step. We refer to Wikipedia\
\ articles on medical genetics for help. Codominance, as it relates to genetics,\
\ refers to a type of genetic inheritance where the phenotype of both the parents\
\ is easily observed in the offspring. A heterozygote is an individual having two\
\ different alleles of a gene. The answer is (A).\n\nQ: Large triplet repeat expansions\
\ can be detected by:\n(A) polymerase chain reaction. (B) single strand conformational\
\ polymorphism analysis. (C) Southern blotting. (D) Western blotting.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on medical genetics for help.\
\ A Southern blot is a method in molecular biology for detecting specific DNA sequences\
\ in a sample. Large triplet repeat expansions are usually detected with this method.\
\ The answer is (C)."
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_medical_genetics"
"dataset_name": "miscellaneous"
"description": "The following are multiple choice questions (with answers) about miscellaneous.\n\
\nQ: Which of these songs was a Top 10 hit for the rock band The Police?\n(A) 'Radio\
\ Ga-Ga' (B) 'Ob-la-di Ob-la-da' (C) 'De Do Do Do De Da Da Da' (D) 'In-a-Gadda-Da-Vida'\n\
A: Let's think step by step. We refer to Wikipedia for help. Radio Ga-Ga is by Queen.\
\ Ob-la-di Ob-la-da is by The Beatles. And In-a-Gadda-Da-Vida is by Iron Butterfly.\
\ This leaves 'De Do Do Do De Da Da Da' as the only song by The Police, and also a Top\
\ 10 hit. The answer is (C).\n\nQ: What place is named in the title of the 1979\
\ live album by rock legends Cheap Trick?\n(A) Budapest (B) Budokan (C) Bhutan (D)\
\ Britain\nA: Let's think step by step. We refer to Wikipedia for help. Nippon Budokan\
\ is an indoor arena in Tokyo, Japan renowned for hosting rock music concerts including\
\ Cheap Trick in 1978. 'Cheap Trick at Budokan' became the name of their album.\
\ The answer is (B).\n\nQ: What is produced during photosynthesis?\n(A) hydrogen\
\ (B) nylon (C) oxygen (D) light\nA: Let's think step by step. We refer to Wikipedia\
\ for help. Photosynthesis is the process in which green plants use the green pigment\
\ chlorophyll to synthesize foods with water and carbon dioxide. Oxygen is the byproduct\
\ of this process. The answer is (C).\n\nQ: Who is the shortest man to ever win\
\ an NBA slam dunk competition?\n(A) Anthony 'Spud' Webb (B) Michael 'Air' Jordan\
\ (C) Tyrone 'Muggsy' Bogues (D) Julius 'Dr J' Erving\nA: Let's think step by step.\
\ We refer to Wikipedia for help. In 1986, Spud Webb, standing only 5'7\" became\
\ the shortest NBA player in history to win an official slam dunk contest. The answer\
\ is (A).\n\nQ: How many axles does a standard automobile have?\n(A) one (B) two\
\ (C) four (D) eight\nA: Let's think step by step. We refer to Wikipedia for help.\
\ Most cars have two axles to rotate the wheels. The answer is (B)."
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_miscellaneous"
"dataset_name": "moral_disputes"
"description": "The following are multiple choice questions (with answers) about moral\
\ disputes.\n\nQ: Baron admits that the versions of the ticking bomb hypothetical\
\ she discusses are \"stunningly stupid,\" but she claims this is actually evidence\
\ of\n(A) the stupidity of most traditional philosophical examples. (B) a general\
\ lack of intelligence among people with advanced degrees. (C) the wrongness of\
\ torture. (D) the readiness on the part of many intelligent people to see torture\
\ as the best solution to deal with terrorism.\nA: Let's think step by step. We\
\ refer to Wikipedia articles on moral disputes for help. The ticking bomb hypothetical\
\ poses a problem where many people will die from an exploding bomb if the hypothetical\
\ terrorist does not disclose how to defuse it. Baron sees this hypothetical as\
\ silly, but its prevalence does suggest intelligent people, particularly utilitarians,\
\ see torture as justifiable to save the lives in this scenario. The answer is (D).\n\
\nQ: A fertilized ovum is also known as\n(A) a zygote. (B) an embryo. (C) a viability.\
\ (D) a blastocyst.\nA: Let's think step by step. We refer to Wikipedia articles\
\ on moral disputes for help. Once a single sperm penetrates the layers of an egg\
\ to form a new cell, that cell is called a zygote. The answer is (A).\n\nQ: Pence\
\ compares six different cases of reproduction, from natural twinning to SCNT. What\
\ conclusion does he draw from this comparison?\n(A) SCNT is not a different kind\
\ of reproduction because there are no morally relevant differences between it and\
\ other permissible means of reproduction. (B) Because there is a low risk of harm\
\ for natural twinning, there will be a low risk of harm for SCNT. (C) both A and\
\ B (D) neither A nor B\nA: Let's think step by step. We refer to Wikipedia articles\
\ on moral disputes for help. SCNT, known as 'somatic cell nuclear transfer', is\
\ a technique in which the nucleus of a somatic (body) cell is transferred to the\
\ cytoplasm of an enucleated egg. This is used in farm animal cloning. Pence may\
\ compare this to other processes of reproduction, and see little moral separation.\
\ The answer is (A).\n\nQ: Which of the following is an intrinsic good, according\
\ to Singer?\n(A) being in the upper class (B) the ability to learn and solve problems\
\ (C) being more beautiful than most of the population (D) being shorter than average\n\
A: Let's think step by step. We refer to Wikipedia articles on moral disputes for\
\ help. Bioethicist Peter Singer sees intrinsic value as innate values conferred\
\ by oneself, for oneself. Inanimate objects can be beautiful, short, or have some\
\ valuable criteria, but capabilities are intrinsically good. The answer is (B).\n\
\nQ: According to Metz, what is wrong with consequentialist arguments against capital\
\ punishment based on African values?\n(A) It is unclear as of yet whether or not\
\ capital punishment deters harm to the community. (B) It is unclear as of yet whether\
\ or not capital punishment deters harm to any individuals. (C) Consequentialism\
\ is not supported by African values. (D) Even though consequentialism is supported\
\ by African values, no consequentialist arguments framed in terms of African values\
\ have been offered.\nA: Let's think step by step. We refer to Wikipedia articles\
\ on moral disputes for help. Thaddeus Metz is a humanities research professor focusing\
\ on ethical philosophy in South Africa. Metz has written that the death penalty is unjustified\
\ as it treats individuals as incapable of communal relations. It is unclear that\
\ capital punishment is to the benefit of, or a deterrent of harm to the community.\
\ The answer is (A)."
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_moral_disputes"