Commit 6769119f, authored by Hailey Schoelkopf and committed by GitHub

Merge pull request #816 from EleutherAI/flan-benchmark

[Refactor] Flan benchmark
parents 4824a832 7d5e511c
dataset_name: international_law
description: 'The following are multiple choice questions (with answers) about international
law.
Q: How the consent to be bound of a State may be expressed?
(A) The consent of a State to be bound is expressed only by ratification (B) The
consent of a state to be bound by a treaty may be expressed by signature, ratification,
acceptance, approval or accession (C) The consent of a State to be bound is expressed
by signature (D) The consent of a State to be bound is expressed by whatever means
they choose
A: Let''s think step by step. We refer to Wikipedia articles on international law
for help. Article 11 of the Vienna Convention on the Law of Treaties signed in 1969
states that "the consent of a State to be bound by a treaty may be expressed by
signature, exchange of instruments constituting a treaty, ratification, acceptance,
approval or accession, or by any other means if so agreed." (B) is the most precise
and accurate answer. The answer is (B).
Q: What is the judge ad hoc?
(A) If a party to a contentious case before the ICJ does not have a national sitting
as judge, it is entitled to nominate someone as a judge solely for that case, with
the title of judge ad hoc (B) Judge ad hoc is the member of the bench of the ICJ
with a casting vote (C) Judge ad hoc is a surrogate judge, in case a judge is disqualified
or passes away (D) Judge ad hoc is the judge that each party will always nominate
in every contentious case
A: Let''s think step by step. We refer to Wikipedia articles on international law
for help. As "ad hoc" implies, a judge ad hoc is appointed only for a specific case
or period, when a party to a contentious case before the International Court of
Justice does not have a regular national sitting as judge. The answer is (A).
Q: When ''consent'' can serve as a circumstance precluding the wrongfulness of a
State conduct?
(A) Consent can serve as a circumstance precluding the wrongfulness whenever it
is given (B) Consent can never serve as a circumstance precluding wrongfulness (C)
Consent can serve as a circumstance precluding wrongfulness, provided the consent
is valid and to the extent that the conduct remains within the limits of the consent
given (D) Consent can always serve as a circumstance precluding wrongfulness, no
matter which organ of the State gives it
A: Let''s think step by step. We refer to Wikipedia articles on international law
for help. Valid consent can serve as a circumstance precluding the wrongfulness
of a State conduct if the conduct remains within the limits of that consent, according
to Chapter V of the Responsibility of States for Internationally Wrongful Acts,
2001, United Nations. The answer is (C).
Q: Would a reservation to the definition of torture in the ICCPR be acceptable in
contemporary practice?
(A) This is an acceptable reservation if the reserving country''s legislation employs
a different definition (B) This is an unacceptable reservation because it contravenes
the object and purpose of the ICCPR (C) This is an unacceptable reservation because
the definition of torture in the ICCPR is consistent with customary international
law (D) This is an acceptable reservation because under general international law
States have the right to enter reservations to treaties
A: Let''s think step by step. We refer to Wikipedia articles on international law
for help. For it contravenes the object and purpose of the ICCPR, this is an unacceptable
reservation in contemporary practice. The answer is (B).
Q: What types of force does Article 2(4) of the UN Charter prohibit?
(A) Article 2(4) encompasses only armed force (B) Article 2(4) encompasses all types
of force, including sanctions (C) Article 2(4) encompasses all interference in the
domestic affairs of States (D) Article 2(4) encompasses force directed only against
a State''s territorial integrity
A: Let''s think step by step. We refer to Wikipedia articles on international law
for help. Article 2(4) of the UN Charter prohibits states from using armed force
in their international relations. The answer is (A).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_international_law
dataset_name: jurisprudence
description: 'The following are multiple choice questions (with answers) about jurisprudence.
Q: Iverson Jewelers wrote a letter to Miller, ''We have received an exceptionally
fine self winding Rolox watch which we will sell to you at a very favorable price.''
(A) The letter is an offer to sell (B) A valid offer cannot be made by letter. (C)
The letter contains a valid offer which will terminate within a reasonable time.
(D) The letter lacks one of the essential elements of an offer.
A: Let''s think step by step. We refer to Wikipedia articles on jurisprudence for
help. An offer shows the intent to enter into a mutually-beneficial contract with
specific terms. An offer can be made by a letter. While this letter indicates the
willingness to sell, the lack of specific terms, such as transaction price and offer
expiration date, makes it an incomplete offer. The answer is (D).
Q: Functions of the law include all but which of the following?
(A) maximizing individual freedom (B) providing a basis for compromise (C) keeping
the peace (D) promoting the principles of the free enterprise system
A: Let''s think step by step. We refer to Wikipedia articles on jurisprudence for
help. Laws are fundamentally about helping resolve disputes between individuals,
and therefore essential for maximizing individual freedom, providing a basis for
compromise, and keeping the peace. The answer is (D).
Q: The ________ School of jurisprudence postulates that the law is based on what
is "correct."
(A) Natural Law (B) Analytical (C) Historical (D) Sociological
A: Let''s think step by step. We refer to Wikipedia articles on jurisprudence for
help. Natural Law School of jurisprudence focuses on the laws of nature, and states
that the law should be based on ethics, morals, and what is "correct". Analytical
deals with the law as it already exists, Historical postulates that the law was
found and not made, and Sociological studies how the law and society impact each
other. The answer is (A).
Q: Which word best summarizes Weber''s explanation of the development of formally
rational law?
(A) Authority. (B) Charisma. (C) Co-operation. (D) Capitalism.
A: Let''s think step by step. We refer to Wikipedia articles on jurisprudence for
help. Weber explained the development of formal rationality in laws as how the modern
society moved from tradition to rationality, where people decide actions based less
on how they were culturally done and more on expected utilities. How rational individuals
optimize efficiency of accomplishing tasks for higher rewards is a core principle
of Capitalism. The answer is (D).
Q: Which position does Rawls claim is the least likely to be adopted by the POP
(people in the original position)?
(A) The POP would choose equality above liberty. (B) The POP would opt for the ''maximin''
strategy. (C) The POP would opt for the ''difference principle''. (D) The POP would
reject the ''system of natural liberty.''
A: Let''s think step by step. We refer to Wikipedia articles on jurisprudence for
help. The POP would opt for the ''maximin'' strategy, opt for the ''difference principle'',
and reject the ''system of natural liberty'', but the POP would not choose equality
above liberty, since the POP assume both equal and free citizens. The answer is
(A).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_jurisprudence
dataset_name: logical_fallacies
description: 'The following are multiple choice questions (with answers) about logical
fallacies.
Q: When an arguer causes confusion during refutation because of real or feigned
lack of an ability to engage in refutation, that arguer may have committed the fallacy
of
(A) poor sportsmanship (B) appeal to compassion (C) argument against the person
(D) ignorance of refutation
A: Let''s think step by step. We refer to Wikipedia articles on logical fallacies
for help. Ignorance of refutation, one of Aristotle''s original list of logical
fallacies in his Organon, is when someone causes confusion in an argument through
real or feigned inability to engage in refutation, in order to win the argument.
The answer is (D).
Q: The complex question fallacy consists of
(A) arguing something is inferior just because it doesn''t do something it was never
intended to do. (B) including more than one claim in the proposition and treating
proof for one claim as proof for all the claims. (C) drawing a conclusion before
examining the evidence, and only considering evidence that supports that conclusion.
(D) asking a question that includes either an unproven assumption or more than one
question, thus making a straightforward yes or no answer meaningless.
A: Let''s think step by step. We refer to Wikipedia articles on logical fallacies
for help. The complex question fallacy is when someone makes a single yes or no
answer to a question meaningless, by including either an unproven assumption or
many questions. The latter is also known as the many questions fallacy. The answer
is (D).
Q: Arguing that what is true of the parts must be true of the whole is the fallacy
of...
(A) Division (B) Composition (C) Appeal to the person (D) Appeal to ignorance
A: Let''s think step by step. We refer to Wikipedia articles on logical fallacies
for help. Fallacy of composition occurs when someone argues what is true of the
parts must be true of the whole. The answer is (B).
Q: Which of the following is true of a valid categorical syllogism?
(A) The minor premise must deny the antecedent (B) The major premise must affirm
the consequent (C) The middle term must be used in at least one premise in a universal
or unqualified sense (D) All of the above
A: Let''s think step by step. We refer to Wikipedia articles on logical fallacies
for help. A valid categorical syllogism must satisfy several conditions: (1) the
syllogism must have exactly three terms, (2) every term of the syllogism must be
used exactly twice, (3) a term may be used only once in any premise, and (4) the
middle term must be used in at least one premise in a universal or unqualified sense,
etc. Only (C) is true. The answer is (C).
Q: If someone attacks the character of an opposing arguer, instead of responding
to that opponent''s arguments, the first person has probably committed which of
the following fallacies?
(A) tu quoque (B) horse laugh (C) argument against the person (D) ignoratio elenchi
A: Let''s think step by step. We refer to Wikipedia articles on logical fallacies
for help. The argument against the person fallacy occurs when someone irrelevantly
attacks the character of an opposing arguer, instead of addressing that opponent''s
arguments. The answer is (C).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_logical_fallacies
dataset_name: machine_learning
description: "The following are multiple choice questions (with answers) about machine\
\ learning.\n\nQ: Which image data augmentation is most common for natural images?\n\
(A) random crop and horizontal flip (B) random crop and vertical flip (C) posterization\
\ (D) dithering\nA: Let's think step by step. Data augmentation is used to increase\
\ the diversity of images in the training dataset. It is important that natural\
\ images are kept natural after being augmented. Vertical flips of images are not\
\ natural, so (B) is false. Posterization makes the image look like a poster and\
\ dithering increases color depth. Neither of these preserves the natural property.\
\ The only natural data augmentation technique is (A). The answer is (A).\n\nQ:\
\ Traditionally, when we have a real-valued input attribute during decision-tree\
\ learning we consider a binary split according to whether the attribute is above\
\ or below some threshold. Pat suggests that instead we should just have a multiway\
\ split with one branch for each of the distinct values of the attribute. From the\
\ list below choose the single biggest problem with Pat\u2019s suggestion:\n(A)\
\ It is too computationally expensive. (B) It would probably result in a decision\
\ tree that scores badly on the training set and a testset. (C) It would probably\
\ result in a decision tree that scores well on the training set but badly on a\
\ testset. (D) It would probably result in a decision tree that scores well on a\
\ testset but badly on a training set.\nA: Let's think step by step. Because the\
\ input is real valued, it is unlikely that the same values appear both at training\
\ and test time. This means that while such a decision tree could yield good performance\
\ on the training data, when evaluated on the test data it will perform badly because\
\ the decision tree won\u2019t know what to do with numbers that did not appear\
\ in the training data. The answer is (C).\n\nQ: You are reviewing papers for the\
\ World\u2019s Fanciest Machine Learning Conference, and you see submissions with\
\ the following claims. Which ones would you consider accepting?\n(A) My method\
\ achieves a training error lower than all previous methods! (B) My method achieves\
\ a test error lower than all previous methods! (Footnote: When regularisation parameter\
\ \u03BB is chosen so as to minimise test error.) (C) My method achieves a test\
\ error lower than all previous methods! (Footnote: When regularisation parameter\
\ \u03BB is chosen so as to minimise cross-validation error.) (D) My method achieves\
\ a cross-validation error lower than all previous methods! (Footnote: When regularisation\
\ parameter \u03BB is chosen so as to minimise cross-validation error.)\nA: Let's\
\ think step by step. In machine learning, we train with some data and fixed hyperparameters\
\ and the training error can be arbitrarily low, so (A) can\u2019t be right. Then,\
\ one compares different hyperparameters by selecting the model with the lowest\
\ cross-validation error; this means that (B) and (D) are not the right procedure.\
\ The only relevant number after these is the test error and thus (C) is the right\
\ answer. The answer is (C).\n\nQ: A 6-sided die is rolled 15 times and the results\
\ are: side 1 comes up 0 times; side 2: 1 time; side 3: 2 times; side 4: 3 times;\
\ side 5: 4 times; side 6: 5 times. Based on these results, what is the probability\
\ of side 3 coming up when using Add-1 Smoothing?\n(A) 2.0/15 (B) 1.0/7 (C) 3.0/16\
\ (D) 1.0/5\nA: Let's think step by step. Add-1 smoothing adds the value of one\
\ to the different counts and then normalizes the probabilities accordingly. The\
\ counts after adding one will be: side 1 comes up 1 time; side 2: 2 times; side\
\ 3: 3 times; side 4: 4 times; side 5: 5 times; side 6: 6 times. The smoothed counts\
\ sum to 21, so the probability of rolling a three is 3/21 = 1/7.\
\ The answer is (B).\n\nQ: To achieve an 0/1 loss estimate that is less than 1 percent\
\ of the true 0/1 loss (with probability 95%), according to Hoeffding's inequality\
\ the IID test set must have how many examples?\n(A) around 10 examples (B) around\
\ 100 examples (C) between 100 and 500 examples (D) more than 1000 examples\nA:\
\ Let's think step by step. By Hoeffding\u2019s inequality, with 95% probability\
\ the in-sample and out-of-sample errors differ by less than epsilon when we have\
\ N samples if 2 exp(-2 epsilon^2 N) < 0.05. With epsilon = 0.01, this implies that\
\ N > -1/(2 epsilon^2) log(0.05/2) = 5000 log(40). Since log(40) > 1, one needs more\
\ than 1000 examples. The answer is (D)."
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_machine_learning
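The arithmetic in the last two worked answers of the machine_learning prompt can be checked with a short script. This is a minimal sketch, not part of the commit: the helper names `add_one_smoothing` and `hoeffding_min_samples` are hypothetical, and reading "less than 1 percent" as epsilon = 0.01 is an assumption taken from the worked answer.

```python
import math
from fractions import Fraction


def add_one_smoothing(counts):
    """Return add-1 smoothed probabilities for a list of observed counts."""
    # Each outcome gets one extra pseudo-count, so the total grows by len(counts).
    total = sum(counts) + len(counts)
    return [Fraction(c + 1, total) for c in counts]


def hoeffding_min_samples(epsilon, delta):
    """Return the bound on N from 2 * exp(-2 * epsilon**2 * N) < delta."""
    # Solving for N gives N > log(2 / delta) / (2 * epsilon**2).
    return math.log(2 / delta) / (2 * epsilon ** 2)


# Die rolled 15 times: side 1 seen 0 times, side 2 once, ..., side 6 five times.
probs = add_one_smoothing([0, 1, 2, 3, 4, 5])
print(probs[2])  # probability of side 3, prints 1/7

# Assumed reading of the question: epsilon = 0.01, confidence 95% (delta = 0.05).
n_bound = hoeffding_min_samples(0.01, 0.05)
print(n_bound > 1000)  # prints True
```

Using `Fraction` keeps the smoothed probability exact, so it can be compared directly with answer choice (B).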
dataset_name: management
description: 'The following are multiple choice questions (with answers) about management.
Q: How can organisational structures that are characterised by democratic and inclusive
styles of management be described?
(A) Hierarchical (B) Bureaucratic (C) Flat (D) Functional
A: Let''s think step by step. We refer to Wikipedia articles on management for help.
Flat organizational structures are characterized by democratic and inclusive styles
of management, and have few (if any) levels of management between the workers and
managers. The answer is (C).
Q: Hygiene factors are associated with which writer?
(A) Frederick Hertzberg (B) D.C. McClelland (C) Abraham Maslow (D) Douglas McGregor
A: Let''s think step by step. We refer to Wikipedia articles on management for help.
Hygiene factors include compensation, company policies, supervision, interpersonal
relations, and work environments. Hertzberg lists them as factors that cannot motivate
employees but can minimize job dissatisfaction. The answer is (A).
Q: What characteristic is not a key feature of the ''open systems'' model of management?
(A) Morale (B) Innovation (C) Growth resource (D) Adaptation
A: Let''s think step by step. We refer to Wikipedia articles on management for help.
The key characteristics of an open system in management include innovation, growth
resource, and adaptation, but do not include morale. The answer is (A).
Q: Which element of the cultural web forms regalia?
(A) Symbols (B) Rituals and routines (C) Power structures (D) Control systems
A: Let''s think step by step. We refer to Wikipedia articles on management for help.
The cultural web is a tool for mapping an organization''s culture, where symbols
form the regalia that visually expresses the values that the organization holds
as important. The answer is (A).
Q: What are the two main dimensions of the Ohio Studies into leadership?
(A) Starting position and end position (B) Initial environment and changed environment
(C) Organisational structure and conditioning (D) Initiating structure and considerations
A: Let''s think step by step. We refer to Wikipedia articles on management for help.
The Ohio State Leadership Studies conducted in the 1940s identified initiating structure
and consideration as the two main dimensions of leader behavior. The answer is (D).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_management
dataset_name: marketing
description: 'The following are multiple choice questions (with answers) about marketing.
Q: Although the content and quality can be as controlled as direct mail, response
rates of this medium are lower because of the lack of a personal address mechanism.
This media format is known as:
(A) Care lines. (B) Direct mail. (C) Inserts. (D) Door to door.
A: Let''s think step by step. We refer to Wikipedia articles on marketing for help.
Door to door marketing delivers non-addressed items within all buildings within
a geographic area. While it can control the content and quality as well as direct
mail marketing, its response rate is lower because of the lack of a personal address
mechanism. The answer is (D).
Q: In an organization, the group of people tasked with buying decisions is referred
to as the _______________.
(A) Outsourcing unit. (B) Procurement centre. (C) Chief executive unit. (D) Decision-making
unit.
A: Let''s think step by step. We refer to Wikipedia articles on marketing for help.
In an organization, the group of people tasked with buying decisions is referred
to as the decision-making unit. The answer is (D).
Q: The single group within society that is most vulnerable to reference group influence
is:
(A) The older consumer who feels somewhat left out of things. (B) The married women,
many of whom feel a need for stability in their lives. (C) New immigrants who really
want to assimilate into their new culture. (D) Children, who base most of their
buying decisions on outside influences.
A: Let''s think step by step. We refer to Wikipedia articles on marketing for help.
Children, who base most of their buying decisions on outside influences, are the
single group within society that is most vulnerable to reference group influence.
The answer is (D).
Q: Which of the following is an assumption in Maslow''s hierarchy of needs?
(A) Needs are dependent on culture and also on social class. (B) Lower-level needs
must be at least partially satisfied before higher needs can affect behaviour. (C)
Needs are not prioritized or arranged in any particular order. (D) Satisfied needs
are motivators, and new needs emerge when current needs remain unmet.
A: Let''s think step by step. We refer to Wikipedia articles on marketing for help.
The levels of Maslow''s hierarchy of needs, from the bottom upwards, are physiological (food and
clothing), safety, love and belonging needs, esteem, and self-actualization. Lower-level
needs must be at least partially satisfied before higher ones can affect behavior.
The answer is (B).
Q: _____________ is a natural outcome when combining demographic and geographic
variables.
(A) Geodemographics (B) Product differentiation. (C) ANSOFF matrix. (D) Brand management.
A: Let''s think step by step. We refer to Wikipedia articles on marketing for help.
Geodemographics is a natural outcome when combining demographic and geographic variables.
The answer is (A).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_marketing
dataset_name: medical_genetics
description: 'The following are multiple choice questions (with answers) about medical
genetics.
Q: The stage of meiosis in which chromosomes pair and cross over is:
(A) prophase I (B) metaphase I (C) prophase II (D) metaphase II
A: Let''s think step by step. We refer to Wikipedia articles on medical genetics
for help. Prophase I is the stage of meiosis where homologous chromosomes pair with
each other and exchange genetic material. The answer is (A).
Q: DNA ligase is
(A) an enzyme that joins fragments in normal DNA replication (B) an enzyme of bacterial
origin which cuts DNA at defined base sequences (C) an enzyme that facilitates transcription
of specific genes (D) an enzyme which limits the level to which a particular nutrient
reaches
A: Let''s think step by step. We refer to Wikipedia articles on medical genetics
for help. DNA ligase is a type of enzyme (EC 6.5.1.1) responsible for joining DNA
strands together by catalyzing a phosphodiester bond. The answer is (A).
Q: Which of the following conditions does not show multifactorial inheritance?
(A) Pyloric stenosis (B) Schizophrenia (C) Spina bifida (neural tube defects) (D)
Marfan syndrome
A: Let''s think step by step. We refer to Wikipedia articles on medical genetics
for help. Multifactorial inheritance is when more than a single factor is responsible
for causing a given trait or health problem. Genes cannot be the only factor. Marfan
syndrome, on the other hand, requires only one abnormal copy of the Marfan
gene, from one parent, to inherit the trait. The answer is (D).
Q: A gene showing codominance
(A) has both alleles independently expressed in the heterozygote (B) has one allele
dominant to the other (C) has alleles tightly linked on the same chromosome (D)
has alleles expressed at the same time in development
A: Let''s think step by step. We refer to Wikipedia articles on medical genetics
for help. Codominance, as it relates to genetics, refers to a type of genetic inheritance
where the phenotype of both the parents is easily observed in the offspring. A heterozygote
is an individual having two different alleles of a gene. The answer is (A).
Q: Large triplet repeat expansions can be detected by:
(A) polymerase chain reaction. (B) single strand conformational polymorphism analysis.
(C) Southern blotting. (D) Western blotting.
A: Let''s think step by step. We refer to Wikipedia articles on medical genetics
for help. A Southern blot is a method in molecular biology for detecting specific
DNA sequences in a sample. Large triplet repeat expansions are usually detected
with this method. The answer is (C).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_medical_genetics
dataset_name: miscellaneous
description: 'The following are multiple choice questions (with answers) about miscellaneous.
Q: Which of these songs was a Top 10 hit for the rock band The Police?
(A) ''Radio Ga-Ga'' (B) ''Ob-la-di Ob-la-da'' (C) ''De Do Do Do De Da Da Da'' (D)
''In-a-Gadda-Da-Vida''
A: Let''s think step by step. We refer to Wikipedia for help. Radio Ga-Ga is by
Queen. Ob-la-di Ob-la-da is by The Beatles. In-a-Gadda-Da-Vida is by Iron Butterfly.
That leaves ''De Do Do Do De Da Da Da'' as the only song by The Police, and it was
also a Top 10 hit. The answer is (C).
Q: What place is named in the title of the 1979 live album by rock legends Cheap
Trick?
(A) Budapest (B) Budokan (C) Bhutan (D) Britain
A: Let''s think step by step. We refer to Wikipedia for help. Nippon Budokan is
an indoor arena in Tokyo, Japan renowned for hosting rock music concerts including
Cheap Trick in 1978. ''Cheap Trick at Budokan'' became the name of their album.
The answer is (B).
Q: What is produced during photosynthesis?
(A) hydrogen (B) nylon (C) oxygen (D) light
A: Let''s think step by step. We refer to Wikipedia for help. Photosynthesis is
the process in which green plants use the green pigment chlorophyll to synthesize
foods with water and carbon dioxide. Oxygen is the byproduct of this process. The
answer is (C).
Q: Who is the shortest man to ever win an NBA slam dunk competition?
(A) Anthony ''Spud'' Webb (B) Michael ''Air'' Jordan (C) Tyrone ''Muggsy'' Bogues
(D) Julius ''Dr J'' Erving
A: Let''s think step by step. We refer to Wikipedia for help. In 1986, Spud Webb,
standing only 5''7" became the shortest NBA player in history to win an official
slam dunk contest. The answer is (A).
Q: How many axles does a standard automobile have?
(A) one (B) two (C) four (D) eight
A: Let''s think step by step. We refer to Wikipedia for help. Most cars have two
axles to rotate the wheels. The answer is (B).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_miscellaneous
dataset_name: moral_disputes
description: 'The following are multiple choice questions (with answers) about moral
disputes.
Q: Baron admits that the versions of the ticking bomb hypothetical she discusses
are "stunningly stupid," but she claims this is actually evidence of
(A) the stupidity of most traditional philosophical examples. (B) a general lack
of intelligence among people with advanced degrees. (C) the wrongness of torture.
(D) the readiness on the part of many intelligent people to see torture as the best
solution to deal with terrorism.
A: Let''s think step by step. We refer to Wikipedia articles on moral disputes for
help. The ticking bomb hypothetical poses a problem where many people will die from
an exploding bomb, if the hypothetical terrorist does not disclose how to defuse
it. Baron sees this hypothetical as silly, but its prevalence does suggest intelligent
people, particularly utilitarians, see torture as justifiable to save the lives
in this scenario. The answer is (D).
Q: A fertilized ovum is also known as
(A) a zygote. (B) an embryo. (C) a viability. (D) a blastocyst.
A: Let''s think step by step. We refer to Wikipedia articles on moral disputes for
help. Once a single sperm penetrates the layers of an egg to form a new cell, that
cell is called a zygote. The answer is (A).
Q: Pence compares six different cases of reproduction, from natural twinning to
SCNT. What conclusion does he draw from this comparison?
(A) SCNT is not a different kind of reproduction because there are no morally relevant
differences between it and other permissible means of reproduction. (B) Because
there is a low risk of harm for natural twinning, there will be a low risk of harm
for SCNT. (C) both A and B (D) neither A nor B
A: Let''s think step by step. We refer to Wikipedia articles on moral disputes for
help. SCNT, known as ''somatic cell nuclear transfer'', is a technique in which
the nucleus of a somatic (body) cell is transferred to the cytoplasm of an enucleated
egg. This is used in farm animal cloning. Pence may compare this to other processes
of reproduction, and see little moral separation. The answer is (A).
Q: Which of the following is an intrinsic good, according to Singer?
(A) being in the upper class (B) the ability to learn and solve problems (C) being
more beautiful than most of the population (D) being shorter than average
A: Let''s think step by step. We refer to Wikipedia articles on moral disputes for
help. Bioethicist Peter Singer sees intrinsic value as innate values conferred by
oneself, for oneself. Inanimate objects can be beautiful, short, or have some valuable
criteria, but capabilities are intrinsically good. The answer is (B).
Q: According to Metz, what is wrong with consequentialist arguments against capital
punishment based on African values?
(A) It is unclear as of yet whether or not capital punishment deters harm to the
community. (B) It is unclear as of yet whether or not capital punishment deters
harm to any individuals. (C) Consequentialism is not supported by African values.
(D) Even though consequentialism is supported by African values, no consequentialist
arguments framed in terms of African values have been offered.
A: Let''s think step by step. We refer to Wikipedia articles on moral disputes for
help. Thaddeus Metz is a humanities research professor focusing on ethical philosophy
in South Africa. Metz has written that the death penalty is unjustified as it treats
individuals as incapable of communal relations. It is unclear that capital punishment
is to the benefit of, or a deterrent of harm to the community. The answer is (A).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_moral_disputes
dataset_name: moral_scenarios
description: 'The following are multiple choice questions (with answers) about moral
scenarios.
Q: For which of these two scenarios does the main character (who uses I/me/my) do
something clearly morally wrong, according to ordinary moral standards in the US
as of 2020? Scenario 1 | I took his wallet after he harassed me at the bar. Scenario
2 | I made love to my newborn baby boy.
(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not wrong
A: Let''s think step by step. We refer to Wikipedia articles on moral scenarios
for help. Stealing from someone is not considered an ethical response to harassment.
And having sex with a minor is egregiously unethical. The answer is (A).
Q: For which of these two scenarios does the main character (who uses I/me/my) do
something clearly morally wrong, according to ordinary moral standards in the US
as of 2020? Scenario 1 | I asked for a cigarette from the guy outside the store.
Scenario 2 | When the security guard left his post, I pocketed a rare coin.
(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not wrong
A: Let''s think step by step. We refer to Wikipedia articles on moral scenarios
for help. Asking for something is not morally wrong. Stealing is morally wrong.
The answer is (C).
Q: For which of these two scenarios does the main character (who uses I/me/my) do
something clearly morally wrong, according to ordinary moral standards in the US
as of 2020? Scenario 1 | During the house''s demolition, I pushed him into a wall.
Scenario 2 | When I choose a babysitter for my twin nieces, I don''t care about
her racial background.
(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not wrong
A: Let''s think step by step. We refer to Wikipedia articles on moral scenarios
for help. Physical violence is morally wrong. Being open to hiring people from different
racial backgrounds is an expected moral stance. The answer is (B).
Q: For which of these two scenarios does the main character (who uses I/me/my) do
something clearly morally wrong, according to ordinary moral standards in the US
as of 2020? Scenario 1 | I bought an old pick-up truck for work. Scenario 2 | I
looked up at the gorgeous waterfall and threw a rock at it.
(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not wrong
A: Let''s think step by step. We refer to Wikipedia articles on moral scenarios
for help. Neither of these is wrong, as no one is wronged by buying a pick-up truck
or throwing a rock at a waterfall. The answer is (D).
Q: For which of these two scenarios does the main character (who uses I/me/my) do
something clearly morally wrong, according to ordinary moral standards in the US
as of 2020? Scenario 1 | I really loved my newborn baby boy. Scenario 2 | The new
girl is embarrassed to live in a trailer park, so I invite the class to her house.
(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not wrong
A: Let''s think step by step. We refer to Wikipedia articles on moral scenarios
for help. Loving someone is not wrong. However, exposing something that someone
is embarrassed about could be considered quite mean. The answer is (C).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_moral_scenarios
dataset_name: nutrition
description: 'The following are multiple choice questions (with answers) about nutrition.
Q: What is the first-line drug for patients with type 2 diabetes and obesity, as
of 2020?
(A) Acarbose (B) Metformin (C) Sulphonylureas (D) Insulin
A: Let''s think step by step. We refer to Wikipedia articles on nutrition for help.
Metformin (Fortamet, Glumetza, or others) is usually the first medication prescribed
for type 2 diabetes, as well as obesity. It works by lowering glucose production
in the liver and improving the body''s sensitivity to insulin. The answer is (B).
Q: Which of the following statements is correct (according to knowledge in 2020)?
(A) Consumers with phenylketonuria must avoid the consumption of the sweetener aspartame
(B) Consumers with phenylketonuria must avoid the consumption of the sweetener saccharin
(C) Consumers with phenylketonuria must avoid the consumption of the sweetener sucralose
(D) Consumers with phenylketonuria must avoid the consumption of the sweetener acesulfame
K
A: Let''s think step by step. We refer to Wikipedia articles on nutrition for help.
People with phenylketonuria (PKU) cannot break down the amino acid phenylalanine.
As it builds up in the blood and brain it can lead to brain damage. People with
PKU should avoid foods that are converted to phenylalanine in the body, such as
aspartame. The answer is (A).
Q: Which of the following statements about iodine is correct, as of 2020?
(A) 50% of adults consume iodine at levels below the RNI (B) Dairy products are
a poor source of iodine (C) The iodine content of organic milk is generally lower
than the level in non-organic milk (D) UK dietary reference values recommend an
increase in iodine intake in pregnancy
A: Let''s think step by step. We refer to Wikipedia articles on nutrition for help.
Organic milk usually has less iodine content than non-organic milk. The answer is
(C).
Q: Which of the following is the most plausible explanation for the protective effect
of dietary fibre against cancer of the colon, as of 2020?
(A) Propionic acid, formed during colonic fibre fermentation inhibits liver fatty
acid synthesis (B) Butyric acid, formed during colonic fibre fermentation stimulates
"silencing" of the SLC5A8 tumour suppressor gene (C) None of these options are correct
(D) Butyric acid, formed during colonic fibre fermentation stimulates anti-oxidant
defences in the colon
A: Let''s think step by step. We refer to Wikipedia articles on nutrition for help.
Dietary fibre intake is inversely associated with the risk of colorectal cancer. This is
presumed to be because butyric acid (BA) stimulates anti-oxidant defences, which help protect the
colon from cancerous tumours. The answer is (D).
Q: In a cohort study, the risk ratio of developing diabetes was 0.86 when comparing
consumers of tea (the exposed) to those who did not drink tea (the unexposed). Which
one statement is correct (according to knowledge in 2020)?
(A) The tea drinkers have lower risk of developing diabetes. (B) The tea drinkers
have higher risk of developing diabetes. (C) Based on the information given we cannot
tell if the observed difference in disease risk is the result of chance. (D) The
risk ratio is close to the value one, so there is no difference in disease risk
between the two groups.
A: Let''s think step by step. We refer to Wikipedia articles on nutrition for help.
A risk ratio of 0.86 alone, without a confidence interval or significance test, does not
show whether the observed difference is due to chance. The answer is (C).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_nutrition
dataset_name: philosophy
description: "The following are multiple choice questions (with answers) about philosophy.\n\
\nQ: The study of reality in the broadest sense, an inquiry into the elemental nature\
\ of the universe and the things in it, is known as _____.\n(A) metaphysics (B)\
\ epistemology (C) quantum physics (D) axiology\nA: Let's think step by step. We\
\ refer to Wikipedia articles on philosophy for help. Among the options, only metaphysics\
\ studies the nature of reality and existence. The answer is (A).\n\nQ: According\
\ to Moore\u2019s \u201Cideal utilitarianism,\u201D the right action is the one\
\ that brings about the greatest amount of:\n(A) pleasure. (B) happiness. (C) good.\
\ (D) virtue.\nA: Let's think step by step. We refer to Wikipedia articles on philosophy\
\ for help. Moore's \"ideal utilitarianism\" states that one's actions should maximize\
\ intrinsic goods. The answer is (C).\n\nQ: Before Tolstoy's Christian conversion,\
\ what was his perspective on the meaning of life?\n(A) optimist (B) satisfied (C)\
\ nominally religious (D) pessimist\nA: Let's think step by step. We refer to Wikipedia\
\ articles on philosophy for help. Before his conversion, Tolstoy felt that life\
\ was uncertain, which is a pessimist's point of view. The answer is (D).\n\nQ:\
\ According to d'Holbach, people always act according to _____.\n(A) free choices\
\ (B) dictates of the soul (C) necessary natural laws (D) undetermined will\nA:\
\ Let's think step by step. We refer to Wikipedia articles on philosophy for help.\
\ d'Holbach believes that people act according to necessary laws, and it proves\
\ nothing about people's free will. The answer is (C).\n\nQ: Psychological egoism\
\ is:\n(A) an ethical theory about how we ought to behave. (B) a generalization\
\ concerning the way people tend to behave. (C) a claim about human nature and the\
\ ways people are capable of behaving. (D) none of the above.\nA: Let's think step\
\ by step. We refer to Wikipedia articles on philosophy for help. Psychological\
\ egoism suggests that one behaves based on what makes one feel good, hence it\
\ is a claim about human nature and how humans are capable of behaving. The answer\
\ is (C)."
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_philosophy
dataset_name: prehistory
description: 'The following are multiple choice questions (with answers) about prehistory.
Q: What is the approximate mean cranial capacity of Homo erectus?
(A) under 650 cc (B) about 800 cc (C) just under 1000 cc (D) 1200 cc
A: Let''s think step by step. We refer to Wikipedia articles on prehistory for help.
The average cranial capacity of Homo erectus is less than 1000 cubic cm. The answer
is (C).
Q: According to Timothy Pauketat, the evidence for social stratification and political
power at Cahokia suggests:
(A) a center of Mississippian civilization with conditions similar to the rise of
early states. (B) the limitations of authority in a Native American society of egalitarian
foragers. (C) a simple chiefdom or perhaps a complex chiefdom had evolved by A.D.
1500. (D) a center of Mississippian civilization with conditions similar to societies
on the Northwest Coast of North America.
A: Let''s think step by step. We refer to Wikipedia articles on prehistory for help.
Timothy Pauketat is known for his research on Cahokia, the center of the Mississippian
culture, where he found similar conditions to the rise of early states. The answer
is (A).
Q: Recent research on hominid species dating from the Middle Pliocene indicates
there was (as of 2020):
(A) a great amount of species diversity, or a single species that exhibited a lot
of diversity. (B) very little species diversity during this period and very few
hominids. (C) decreased species diversity due to a prolonged ice age followed by
a severe drought. (D) decreased species diversity but increased numbers of hammerstones
and flakes, indicating stone tool manufacture.
A: Let''s think step by step. We refer to Wikipedia articles on prehistory for help.
Recent research has recognized multiple hominid species from the Middle Pliocene,
meaning that there is a great amount of species diversity or diversity in a single
species. The answer is (A).
Q: Researchers now believe that the decline of the Maya was caused chiefly by:
(A) a cataclysm of some kind, such as an earthquake, volcano, or tsunami. (B) ecological
degradation resulting from slash-and-burn farming techniques. (C) endless wars between
neighboring Mayan city-states. (D) practices of interbreeding that led to a steep
rise in congenital disorders.
A: Let''s think step by step. We refer to Wikipedia articles on prehistory for help.
Researchers believe that the Maya collapse was mainly caused by over-exploitation
of natural resources through practices such as slash-and-burn farming. The answer is (B).
Q: The great Mayan king Pacal built temples in the city of Palenque in order to:
(A) satisfy the powerful Mayan astronomer priests. (B) display his generosity to
the common people, since they were allowed to live in the temples. (C) frighten
away enemies, in particular the Spaniards. (D) legitimize his kingship, since his
father was not royal.
A: Let''s think step by step. We refer to Wikipedia articles on prehistory for help.
Pacal built the temples as funerary monuments to legitimize his kingship. The
answer is (D).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_prehistory
dataset_name: professional_accounting
description: "The following are multiple choice questions (with answers) about professional\
\ accounting.\n\nQ: An auditor traces the serial numbers on equipment to a nonissuer\u2019\
s subledger. Which of the following management assertions is supported by this test?\n\
(A) Valuation and allocation (B) Completeness (C) Rights and obligations (D) Presentation\
\ and disclosure\nA: Let's think step by step. We refer to Wikipedia articles on\
\ accounting for help. The completeness assertion is tested by tracing supporting\
\ documents to the record entries. The answer is (B).\n\nQ: One hundred years ago,\
\ your great-great-grandmother invested $100 at 5% yearly interest. What is the\
\ investment worth today?\n(A) $13,000 (B) $600 (C) $15,000 (D) $28,000\nA: Let's\
\ think step by step. We refer to Wikipedia articles on accounting for help. A $100\
\ investment at 5% yearly interest is worth 100*(1.05)^100=13150 after 100 years,\
\ which is around $13,000. The answer is (A).\n\nQ: On January 1, year 1, Alpha\
\ Co. signed an annual maintenance agreement with a software provider for $15,000\
\ and the maintenance period begins on March 1, year 1. Alpha also incurred $5,000\
\ of costs on January 1, year 1, related to software modification requests that\
\ will increase the functionality of the software. Alpha depreciates and amortizes\
\ its computer and software assets over five years using the straight-line method.\
\ What amount is the total expense that Alpha should recognize related to the maintenance\
\ agreement and the software modifications for the year ended December 31, year\
\ 1?\n(A) $5,000 (B) $13,500 (C) $16,000 (D) $20,000\nA: Let's think step by step.\
\ We refer to Wikipedia articles on accounting for help. The maintenance period\
\ begins on March 1, so only 10 months of expenses should be recognized, which is\
\ $15,000/12*10=$12,500. The software modification cost is amortized over 5 years,\
\ so each year is $5,000/5=$1,000. So the total expense is $12,500+$1,000=$13,500.\
\ The answer is (B).\n\nQ: Krete is an unmarried taxpayer with income exclusively\
\ from wages. By December 31, year 1, Krete's employer has withheld $16,000 in federal\
\ income taxes and Krete has made no estimated tax payments. On April 15, year 2,\
\ Krete timely filed for an extension request to file her individual tax return,\
\ and paid $300 of additional taxes. Krete's year 1 tax liability was $16,500 when\
\ she timely filed her return on April 30, year 2, and paid the remaining tax liability\
\ balance. What amount would be subject to the penalty for underpayment of estimated\
\ taxes?\n(A) $0 (B) $500 (C) $1,650 (D) $16,500\nA: Let's think step by step. We\
\ refer to Wikipedia articles on accounting for help. The tax due after withholding\
\ is $16,500-$16,000=$500, which is less than $1000, hence there is no underpayment\
\ penalty of estimated taxes. The answer is (A).\n\nQ: Box, a nongovernmental not-for-profit\
\ organization, had the following transactions during the year: Proceeds from sale\
\ of investments $80000 Purchase of property plant and equipment $10000 Proceeds\
\ from long-term debt $100000 Loss on sale of investment $5000 What amount should\
\ be reported as net cash provided by financing activities in Box's statement of\
\ cash flows?\n(A) $70,000 (B) $75,000 (C) $80,000 (D) 100000\nA: Let's think step\
\ by step. We refer to Wikipedia articles on accounting for help. Among the four\
\ transactions, only Proceeds from long-term debt belongs to the financing activities\
\ section of cashflow, hence the amount reported should be $100000. The answer is\
\ (D)."
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_professional_accounting
dataset_name: professional_law
description: "The following are multiple choice questions (with answers) about professional\
\ law.\n\nQ: A son owed a creditor $5,000. The son's father contacted the creditor\
\ and told him that he wanted to pay the son's debt. The father signed a document\
\ that stated the father would pay the son's debt at a rate of $500 a month for\
\ 10 months. The creditor made no written or oral commitment to forbear to sue the\
\ son to collect the $5,000 debt, and the father made no oral or written request\
\ for any such forbearance. For the next five months, the father made and the creditor\
\ accepted the $500 monthly payments as agreed. During that period, the creditor,\
\ in fact, did forbear to take any legal action against the son. However, the father\
\ then informed the creditor that he would make no further payments on the debt.\
\ Which of the following is the most persuasive argument that the father is liable\
\ to the creditor under the terms of their agreement?\n(A) The father's promise\
\ and the creditor's reliance thereon, if proved, gave rise to a valid claim by\
\ the creditor against the father based on the doctrine of promissory estoppel.\
\ (B) Because it was foreseeable that the father's promise would induce the creditor\
\ to forbear taking any action against the son, such forbearance was, as a matter\
\ of law, a bargained-for consideration for the father's promise. (C) The father's\
\ five payments to the creditor totaling $2,500 manifested a serious intent on the\
\ father's part to be contractually bound, and such manifestation is generally recognized\
\ as an effective substitute for consideration. (D) By assuming the antecedent debt\
\ obligation that the son owed to the creditor, the father became a surety whose\
\ promise to the creditor was enforceable, since it was in writing and supported\
\ by adequate consideration. \nA: Let's think step by step. We refer to Wikipedia\
\ articles on law for help. The doctrine of promissory estoppel stops a person from\
\ going back on a promise in contract law, hence option (A) should be the most persuasive\
\ argument. The answer is (A).\n\nQ: A state has recently enacted a statute prohibiting\
\ the disposal of any nuclear wastes within the state. This law does not contravene\
\ or conflict with any federal statutes. A man operates a company in the state that\
\ is engaged in the disposal of nuclear wastes. Subsequent to the passage of the\
\ state statute, the man, not yet aware of the new law, entered into contracts with\
\ many out-of-state firms to dispose of their nuclear wastes in the state. On account\
\ of this new law, however, the man will be unable to perform these contracts. Assume\
\ that the man has standing to challenge this state law. Which of the following\
\ presents his strongest constitutional grounds to challenge the state law prohibiting\
\ the disposal of nuclear wastes within the state?\n(A) The commerce clause. (B)\
\ The equal protection clause of the Fourteenth Amendment. (C) The privileges and\
\ immunities clause of Article IV, Section 2. (D) The contract clause.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on law for help. The commerce\
\ clause states that Congress shall have the power to regulate commerce with foreign\
\ Nations, and among the several States, and with the Indian Tribes. The statute\
\ affects inter-state commerce which puts it into question. Hence the man's strongest\
\ argument should be the commerce clause. The answer is (A).\n\nQ: On October 1,\
\ 1980, a developer, owner of several hundred acres in a rural county, drafted a\
\ general development plan for the area. The duly recorded plan imposed elaborate\
\ limitations and restrictions upon the land in the plan, which was to be developed\
\ as a residential district. The restrictions were to extend to all persons acquiring\
\ any of the lots and to their heirs, assigns, and lessees. It was further provided\
\ that all subsequent owners would be charged with due notice of the restrictions.\
\ Among those restrictions in the general plan were the following: (22) A franchise\
\ right is created in a strip of land 10 feet in width along the rear of each lot\
\ for the use of public utility companies with right of ingress and egress. (23)\
\ No house or structure of any kind shall be built on the aforementioned strip of\
\ land running through the said blocks. In 2000, a retiree purchased one of the\
\ lots, built a house, and erected a fence in the rear of his property within the\
\ restricted area. In 2004, a teacher purchased a lot adjacent to the retiree's\
\ property and built a new house. Two years later, a librarian purchased the lot\
\ that adjoined the teacher's property. The three deeds to those properties each\
\ contained references to the deed book where the general plan was recorded. In\
\ 2008, the librarian began the construction of a seven-foot post-and-rail fence\
\ along the line dividing his lot with the teacher's, and along the center of the\
\ area subject to the franchise right. Although the teacher objected to its construction,\
\ the fence was completed. If the teacher seeks a mandatory injunction to compel\
\ removal of the librarian's fence, the court will most likely\n(A) grant relief,\
\ because the fence was in violation of the easement restriction. (B) grant relief,\
\ because the encroachment of the fence violated the restriction in the original\
\ plan. (C) deny relief, because the teacher failed to enforce the restriction against\
\ the retiree. (D) deny relief, because the fence would not be construed as \"a\
\ structure\" within the terms of the restriction. \nA: Let's think step by step.\
\ We refer to Wikipedia articles on law for help. The restrictions in the original\
\ plan say no house or structure of any kind shall be built on the aforementioned\
\ strip of land running through the said blocks. Hence the court will most likely\
\ grant relief because the fence violated the restriction in the original plan.\
\ The answer is (B).\n\nQ: Judge took judicial notice of some facts at the beginning\
\ of the trial. Which of the following is not an appropriate kind of fact for judicial\
\ notice?\n(A) Indisputable facts. (B) Facts that have been asserted by individual\
\ political organizations. (C) Facts recognized to be true by common knowledge.\
\ (D) Facts capable of scientific verification.\nA: Let's think step by step. We\
\ refer to Wikipedia articles on law for help. Among the options, facts that have\
\ been asserted by individual political organizations is not an appropriate kind\
\ of fact for judicial notice. The answer is (B).\n\nQ: A state legislature has\
\ recently enacted a statute making it a misdemeanor to curse or revile or use obscene\
\ or opprobrious language toward or in reference to a police officer performing\
\ his duties. A student at a state university organized a demonstration on campus\
\ to protest the war. The rally was attended by a group of 50 students who shouted\
\ anti-war messages at cars passing by. To show his contempt for the United States,\
\ the student sewed the American flag to the rear of his jeans. When a police officer\
\ saw the flag sewn on the student's jeans, he approached and told him to remove\
\ the flag or he would be placed under arrest. The student became angered and shouted\
\ at the police officer, \"Listen, you bastard, I'll wear this rag anywhere I please.\"\
\ The student was subsequently placed under arrest and charged with violating\
\ the state statute. The student subsequently brings suit in state court challenging\
\ the constitutionality of the statute. The strongest constitutional argument for\
\ the student is that\n(A) the statute is void for vagueness under the Fourteenth\
\ Amendment's due process clause. (B) the statute is invalid because it violates\
\ the petitioner's freedom of speech under the First Amendment. (C) the statute\
\ is an abridgment of freedom of speech under the First Amendment because less restrictive\
\ means are available for achieving the same purpose. (D) the statute is overbroad\
\ and consequently invalid under the First and Fourteenth Amendments.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on law for help. The Fourteenth\
\ Amendment further supports the First Amendment by establishing a due process clause.\
\ Hence the strongest argument should be the statute is overbroad and consequently\
\ invalid under the First and Fourteenth Amendments. The answer is (D)."
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_professional_law
dataset_name: professional_medicine
description: "The following are multiple choice questions (with answers) about professional\
\ medicine.\n\nQ: A 22-year-old male marathon runner presents to the office with\
\ the complaint of right-sided rib pain when he runs long distances. Physical examination\
\ reveals normal heart and lung findings and an exhalation dysfunction at ribs\_\
4-5 on the right. Which of the following muscles or muscle groups will be most useful\
\ in correcting this dysfunction utilizing a direct method?\n(A) anterior scalene\
\ (B) latissimus dorsi (C) pectoralis minor (D) quadratus lumborum\nA: Let's think\
\ step by step. We refer to Wikipedia articles on medicine for help. Among the options,\
\ only the pectoralis minor muscle originates from the outer surfaces of the 3rd to 5th\
\ ribs. The answer is (C).\n\nQ: A 36-year-old male presents to the office with\
\ a\_3-week\_history of low back pain. He denies any recent trauma but says that\
\ he climbs in and out of his truck numerous times a day for his job. Examination\
\ of the patient in the prone position reveals a deep sacral sulcus on the left,\
\ a posterior inferior lateral angle on the right, and a lumbosacral junction that\
\ springs freely on compression. The most likely diagnosis is\n(A) left-on-left\
\ sacral torsion (B) left-on-right sacral torsion (C) right unilateral sacral flexion\
\ (D) right-on-right sacral torsion\nA: Let's think step by step. We refer to Wikipedia\
\ articles on medicine for help. The deep sulcus on the left, a posterior ILA on\
\ the right, with a negative spring test suggests a right-on-right sacral torsion.\
\ All other options have a deep sulcus on the right. The answer is (D).\n\nQ: A\
\ 44-year-old man comes to the office because of a 3-day history of sore throat,\
\ nonproductive cough, runny nose, and frontal headache. He says the headache is\
\ worse in the morning and ibuprofen does provide some relief. He has not had shortness\
\ of breath. Medical history is unremarkable. He takes no medications other than\
\ the ibuprofen for pain. Vital signs are temperature 37.4\xB0C (99.4\xB0F), pulse\
\ 88/min, respirations 18/min, and blood pressure 120/84 mm Hg. Examination of the\
\ nares shows erythematous mucous membranes. Examination of the throat shows erythema\
\ and follicular lymphoid hyperplasia on the posterior oropharynx. There is no palpable\
\ cervical adenopathy. Lungs are clear to auscultation. Which of the following is\
\ the most likely cause of this patient's symptoms?\n(A) Allergic rhinitis (B) Epstein-Barr\
\ virus (C) Mycoplasma pneumonia (D) Rhinovirus\nA: Let's think step by step. We\
\ refer to Wikipedia articles on medicine for help. The symptoms, especially the\
\ headache, suggest that the most likely cause is Rhinovirus. Epstein-Barr virus\
\ will cause swollen lymph nodes but there is no palpable cervical adenopathy. Clear\
\ lungs on auscultation suggest it's not Mycoplasma pneumonia. The answer is\
\ (D).\n\nQ: A previously healthy 32-year-old woman comes to the physician 8 months\
\ after her husband was killed in a car crash. Since that time, she has had a decreased\
\ appetite and difficulty falling asleep. She states that she is often sad and cries\
\ frequently. She has been rechecking the door lock five times before leaving her\
\ house and has to count exactly five pieces of toilet paper before she uses it.\
\ She says that she has always been a perfectionist but these urges and rituals\
\ are new. Pharmacotherapy should be targeted to which of the following neurotransmitters?\n\
(A) Dopamine (B) Glutamate (C) Norepinephrine (D) Serotonin\nA: Let's think step\
\ by step. We refer to Wikipedia articles on medicine for help. The patient feels\
\ sad and among the options, only Dopamine and Serotonin can help increase positive\
\ emotions. Serotonin also affects digestion and metabolism, which can help the\
\ patient's decreased appetite and sleep difficulty. The answer is (D).\n\nQ: A\
\ 42-year-old man comes to the office for preoperative evaluation prior to undergoing\
\ adrenalectomy scheduled in 2 weeks. One month ago, he received care in the emergency\
\ department for pain over his right flank following a motor vehicle collision.\
\ At that time, blood pressure was 160/100 mm Hg and CT scan of the abdomen showed\
\ an incidental 10-cm left adrenal mass. Results of laboratory studies, including\
\ complete blood count, serum electrolyte concentrations, and liver function tests,\
\ were within the reference ranges. The patient otherwise had been healthy and had\
\ never been told that he had elevated blood pressure. He takes no medications.\
\ A follow-up visit in the office 2 weeks ago disclosed elevated urinary normetanephrine\
\ and metanephrine and plasma aldosterone concentrations. The patient was referred\
\ to a surgeon, who recommended the adrenalectomy. Today, vital signs are temperature\
\ 36.6\xB0C (97.9\xB0F), pulse 100/min, respirations 14/min, and blood pressure\
\ 170/95 mm Hg. Physical examination discloses no significant findings. Initial\
\ preoperative preparation should include treatment with which of the following?\n\
(A) Labetalol (B) A loading dose of potassium chloride (C) Nifedipine (D) Phenoxybenzamine\n\
A: Let's think step by step. We refer to Wikipedia articles on medicine for help.\
\ The symptoms and the adrenal mass suggested pheochromocytoma, and the blood pressure\
\ indicates hypertension. Phenoxybenzamine is used to treat hypertension caused\
\ by pheochromocytoma. The answer is (D)."
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_professional_medicine
dataset_name: professional_psychology
description: "The following are multiple choice questions (with answers) about professional\
\ psychology.\n\nQ: In the construction of a multiple regression equation for purposes\
\ of prediction, the optimal combination of measures is one in which the predictors\n\
(A) are uncorrelated with each other but are moderately correlated with the criterion\
\ (B) have low correlations with each other and low correlations with the criterion\
\ (C) are highly intercorrelated with each other and moderately correlated with\
\ the criterion (D) have low correlations with the criterion but are moderately\
\ correlated with each other\nA: Let's think step by step. We refer to Wikipedia\
\ articles on psychology for help. The basis of multiple regression is to assess\
\ the relationship between one continuous variable and a set of independent variables.\
\ So the predictors should be uncorrelated with each other but moderately correlated\
\ with the criterion. The answer is (A).\n\nQ: There are three ways to measure the\
\ Central Tendency: the Mean, the Median and the Mode. From your knowledge about\
\ them, what is the mode?\n(A) less sensitive to extreme scores than the mean (B)\
\ more useful for skewed distributions (C) sensitive to extreme values and highly\
\ skewed distributions (D) the most frequently occurring number\nA: Let's think\
\ step by step. We refer to Wikipedia articles on psychology for help. The definition\
\ of mode is the most frequently occurring number. The answer is (D).\n\nQ: Carl\
\ Jung believed that a client's transference:\n(A) is a fantasy that distracts the\
\ client from reality. (B) represents \u201Cmixed feelings\u201D toward the therapist.\
\ (C) is a form of \"acting out.\" (D) reflects the client\u2019s personal\
\ and collective unconscious.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on psychology for help. Transference is a phenomenon in which a person's\
\ feelings are unconsciously redirected, so it reflects the client's personal and\
\ collective unconscious. The answer is (D).\n\nQ: In terms of Hofstede\u2019s (1980)\
\ five cultural dimensions, the United States scores at the top of the scale on:\n\
(A) individualism. (B) individualism and power distance. (C) power distance and\
\ masculinity. (D) uncertainty avoidance.\nA: Let's think step by step. We refer\
\ to Wikipedia articles on psychology for help. The US scores highest on individualism\
\ among the five cultural dimensions. The answer is (A).\n\nQ: One of your therapy\
\ clients asks your advice about a good weight-reduction program. You have investigated\
\ the programs in the community and are enrolled in the one you consider the best.\
\ This program offers a $50 bonus to its patrons for each new person they bring\
\ into the program. Under these circumstances, your most appropriate response would\
\ be to\n(A) tell your client the pros and cons of each program you know about except\
\ for the one in which you are enrolled (B) recommend to your client the program\
\ in which you are enrolled and explain the $50 bonus you will receive (C) recommend\
\ to your client the program in which you are enrolled and offer to have the $50\
\ bonus credited to your client's account in the program (D) tell your client the\
\ pros and cons of each program you know about, but do not claim the $50 bonus if\
\ your client enrolls in your program\nA: Let's think step by step. We refer to\
\ Wikipedia articles on psychology for help. Based on the circumstances, you should\
\ tell your client about the pros and cons of each program, but it would be inappropriate\
\ to receive the bonus, so you should not claim the $50 bonus. The answer is (D)."
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_professional_psychology
dataset_name: public_relations
description: 'The following are multiple choice questions (with answers) about public
relations.
Q: Earth Hour was a campaign launched by which organization?
(A) Greenpeace (B) The UN (C) Oxfam (D) World Wildlife Fund
A: Let''s think step by step. We refer to Wikipedia articles on public relations
for help. Earth Hour is a worldwide movement organized by the World Wildlife
Fund. The answer is (D).
Q: In issues management, what is the most proactive approach to addressing negative
or misleading information posted online about your organization?
(A) Buy domain names that could be used by opposition groups. (B) Post anonymous
comments on blogs to combat this information. (C) Prepare a news release that discredits
the inaccurate information. (D) Make policy changes to address complaints highlighted
on these sites.
A: Let''s think step by step. We refer to Wikipedia articles on public relations
for help. In issues management, the most proactive approach to addressing negative
or misleading information posted online is to make policy changes to address complaints
highlighted on those sites. The answer is (D).
Q: At which stage in the planning process would a situation analysis be carried
out?
(A) Defining the program (B) Planning the program (C) Taking action and implementing
ideas (D) Evaluation of the program
A: Let''s think step by step. We refer to Wikipedia articles on public relations
for help. Situation analyses are typically carried out during the planning process
stage of defining the program. The answer is (A).
Q: Which of these statements is true of the Vatican in 2010 at the time of the accusations
of child abuse cover-ups?
(A) There was a coordinated media response. (B) Consistent messages were communicated.
(C) Criticisms were taken as attacks on the Catholic Church. (D) The credibility
of the Vatican was upheld.
A: Let''s think step by step. We refer to Wikipedia articles on public relations
for help. In 2010 when there were accusations of child abuse cover-ups, the Vatican
took those criticisms as attacks on the Catholic Church. The answer is (C).
Q: What should a public relations media practitioner do if she does not know the
answer to a reporter''s question?
(A) Give the reporter other information she is certain is correct. (B) Say that
the information is ''off the record'' and will be disseminated later. (C) Say ''I
don''t know'' and promise to provide the information later. (D) Say ''no comment,''
rather than appear uninformed.
A: Let''s think step by step. We refer to Wikipedia articles on public relations
for help. If a public relations media practitioner does not know the answer to a
reporter''s question, they should say ''I don''t know'' and offer to provide the
information later. The answer is (C).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_public_relations
dataset_name: security_studies
description: "The following are multiple choice questions (with answers) about security\
\ studies.\n\nQ: What are the frameworks of analysis within which terrorism has\
\ been considered (as of 2020)?\n(A) Competition between larger nations has resulted\
\ in some countries actively supporting terrorist groups to undermine the strength\
\ of rival states. Terrorist networks are extended patronage clubs maintained and\
\ paid for by their donor states and are conceptualised as being like state actors,\
\ to be dealt with using military force. (B) Globalization has enabled the internationalization\
\ of terrorist activities by opening up their operational space, although coordination\
\ is still managed from a geographical base. This suggests that terrorist groups\
\ are nationally structured which means that terrorism cannot be considered in terms\
\ of a war to be defeated militarily without having serious implications on the\
\ indigenous population. (C) Terrorism can be viewed as a problem to be resolved\
\ by military means (war on terrorism), by normal police techniques (terrorism as\
\ crime), or as a medical problem with underlying causes and symptoms (terrorism\
\ as disease). (D) Terrorism is viewed as a criminal problem. The criminalization\
\ of terrorism has two important implications. Firstly, it suggests that terrorism\
\ can be eradicated - terrorists can be caught and brought to trial by normal judicial\
\ proceedings thereby removing the threat from society - and secondly, it suggests\
\ that preventative crime techniques are applicable to prevent its development.\n\
A: Let's think step by step. We refer to Wikipedia articles on security studies\
\ for help. (A) is wrong because it is not competition between larger nations that\
\ causes terrorism. \n(B) is wrong because globalization is not the cause of terrorism.\n\
(C) is correct because the US undertook the war on terrorism. \n(D) is wrong because\
\ preventative crime techniques will likely not end terrorism. The answer is (C).\n\
\nQ: Which of the following is the best lens through which to investigate the role\
\ of child soldiers?\n(A) Child soldiers are victims of combat that need re-education\
\ and rehabilitation. (B) Children and their mothers are not active subjects in\
\ warfare and are best considered as subjects in the private sphere. (C) Children\
\ are most often innocent bystanders in war and are best used as signifiers of peace.\
\ (D) Children have political subjecthood that is missed when they are considered\
\ as passive victims of warfare.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on security studies for help. Child soldiers as a political topic can\
\ be missed when they are considered passive victims of warfare. The answer is (D).\n\
\nQ: How can we best describe the relationship between the state-centric approach\
\ and the concept of human security?\n(A) There are such wide divisions within the\
\ human security framework regarding the nature of threats and referent objects\
\ that no widely applicable comparisons between state-centric approaches and human\
\ security can be drawn. (B) By adopting the framework of human security, the limitations\
\ of the realist state-centric approach become evident. Whilst human security defines\
\ the referent object as the person or population, state-centric approaches prioritise\
\ the security of the state, de-prioritizing the pursuit of human security. (C)\
\ The state-centric approach to security is a faction of human security, usually\
\ defined within the broad school of human security. By being state-centric this\
\ approach prioritises the individual as the referent object in security studies.\
\ (D) Both the state-centric and human-centric approaches to security are mutually\
\ exclusive and offer a sufficient analytic framework with which to understand the\
\ international security system. It is therefore the role of security analysts to\
\ determine which of these substantial concepts is correct, and which should be\
\ discarded.\nA: Let's think step by step. We refer to Wikipedia articles on security\
\ studies for help. Human security focuses on a person or population whereas state-centric\
\ approaches focus on the state while deprioritizing human security. The answer\
\ is (B).\n\nQ: In order to become securitized, a threat must be presented in which\
\ of these ways?\n(A) As an existential threat that requires immediate and extraordinary\
\ action, posing a threat to the survival of the state or to societal security.\
\ (B) As requiring immediate and extraordinary action by the state, threatening\
\ the survival of a referent object and therefore warranting the use of measures\
\ not normally employed in the political realm. (C) As an urgent threat to the survival\
\ of the referent object, so serious that it legitimises the employment of extraordinary\
\ action in response. (D) As an urgent threat to the survival of the audience that\
\ requires extraordinary or emergency measures.\nA: Let's think step by step. We\
\ refer to Wikipedia articles on security studies for help. To be securitized, a\
\ threat must be an urgent threat to the survival of the referent object. The answer\
\ is (C).\n\nQ: What distinguishes coercive diplomacy from military force?\n(A)\
\ Compellence is another term for coercive diplomacy, but covering a narrower set\
\ of criteria; compellence covers those threats aimed at initiating adversary action.\
\ A threat to coerce a state to give up part of its territory would count as coercive\
\ diplomacy, as long as that threat proactively initiates action before reactive\
\ diplomacy is taken. (B) Coercive diplomacy constitutes the threats of limited\
\ force to induce adversary's incentive to comply with the coercer's demands. It\
\ is an influence strategy that is intended to obtain compliance: the use of force\
\ to defeat an opponent first does not count. It leaves an element of choice with\
\ the target to comply, or to continue. (C) Military force, or the threat of military\
\ force, utilises fear to achieve strategic objectives. Coercive diplomacy is differentiated\
\ from this approach, because it does not use fear as a tool for coercing an adversary.\
\ (D) Coercive diplomacy is employed to use force but to limit its effects on the\
\ international community. Coercive diplomacy is an aggressive strategy that is\
\ intended to obtain compliance through defeat. It does not leave an element of\
\ choice with the target, the target either being forced to comply or engage in\
\ conflict. It seeks to control by imposing compliance by removing any opportunity\
\ for negotiation or concession.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on security studies for help. Coercive diplomacy uses the threat of force\
\ to induce the opponent to comply with demands. The answer is (B)."
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_security_studies
dataset_name: sociology
description: 'The following are multiple choice questions (with answers) about sociology.
Q: Which of the following is not a problem associated with official statistics on
strike action?
(A) most strikes go unnoticed by employers and the mass media (B) not all industrial
disputes will be reported by the employer (C) the definition of strikes excludes
those that involve fewer than ten workers or last less than one day (D) it is hard
to compare strikes that were measured in different ways
A: Let''s think step by step. We refer to Wikipedia articles on sociology for help.
Official statistics on strike action can be problematic because not all industrial
disputes will be reported by employers, the definition of strikes excludes those
that involve fewer than ten workers or last less than one day, and it is hard to
compare strikes that were measured in different ways. Thus, (A) is not a problem
associated with official statistics on strike action. The answer is (A).
Q: What does Berger (1963) describe as a metaphor for social reality?
(A) a fairground ride (B) a circus (C) a puppet theatre (D) a ballet
A: Let''s think step by step. We refer to Wikipedia articles on sociology for help.
Berger describes social reality using the metaphor of a puppet theatre. The answer
is (C).
Q: The term ''hegemony'' refers to:
(A) the tendency for the working class not to realize their own interests (B) a
dominant ideology that legitimates economic, political and cultural power (C) a
form of dual consciousness based on ideology and everyday experiences (D) a mode
of payment given for outstanding topiary
A: Let''s think step by step. We refer to Wikipedia articles on sociology for help.
Hegemony refers to a dominant ideology that legitimates economic, political, and
cultural power. The answer is (B).
Q: The shift from ''civil religion'' to ''common religion'' means that:
(A) the increasing bureaucracy of the state has made religion only a marginal part
of our lives (B) despite the weakening of traditional authority, our everyday lives
and ''common sense'' remain shaped by religious beliefs and values (C) religious
participation in collective worship may have declined, but people still practise
their faiths in private (D) people are much more likely to discuss their religious
beliefs in public, informal settings
A: Let''s think step by step. We refer to Wikipedia articles on sociology for help.
The shift from civil religion to common religion means that despite the weakening
of traditional authority, our everyday lives and common sense remain shaped by religious
beliefs and values. The answer is (B).
Q: Which of the following did the post-war welfare state of 1948 not aim to provide:
(A) free health care and education for all (B) a minimum wage (C) full employment
(D) universal welfare
A: Let''s think step by step. We refer to Wikipedia articles on sociology for help.
The post-war welfare state of 1948 aimed to provide free healthcare and education,
full employment, and universal welfare. But it did not aim to provide a minimum
wage. The answer is (B).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_sociology