Unverified Commit da211969 authored by Jess, committed by GitHub

Merge branch 'EleutherAI:main' into main

parents 1b97e487 801322e0
"dataset_name": "miscellaneous"
"description": "The following are multiple choice questions (with answers) about miscellaneous.\n\
\nQ: Which of these songs was a Top 10 hit for the rock band The Police?\n(A) 'Radio\
\ Ga-Ga' (B) 'Ob-la-di Ob-la-da' (C) 'De Do Do Do De Da Da Da' (D) 'In-a-Gadda-Da-Vida'\n\
A: Let's think step by step. We refer to Wikipedia for help. Radio Ga-Ga is by Queen.\
\ Ob-la-di Ob-la-da is by The Beatles. And In-a-Gadda-Da-Vida is by Iron Butterfly.\
\ Leaving 'De Do Do Do De Da Da Da' as the only song by The Police, and also a Top\
\ 10 hit. The answer is (C).\n\nQ: What place is named in the title of the 1979\
\ live album by rock legends Cheap Trick?\n(A) Budapest (B) Budokan (C) Bhutan (D)\
\ Britain\nA: Let's think step by step. We refer to Wikipedia for help. Nippon Budokan\
\ is an indoor arena in Tokyo, Japan renowned for hosting rock music concerts including\
\ Cheap Trick in 1978. 'Cheap Trick at Budokan' became the name of their album.\
\ The answer is (B).\n\nQ: What is produced during photosynthesis?\n(A) hydrogen\
\ (B) nylon (C) oxygen (D) light\nA: Let's think step by step. We refer to Wikipedia\
\ for help. Photosynthesis is the process in which green plants use the green pigment\
\ chlorophyll to synthesize foods with water and carbon dioxide. Oxygen is the byproduct\
\ of this process. The answer is (C).\n\nQ: Who is the shortest man to ever win\
\ an NBA slam dunk competition?\n(A) Anthony 'Spud' Webb (B) Michael 'Air' Jordan\
\ (C) Tyrone 'Muggsy' Bogues (D) Julius 'Dr J' Erving\nA: Let's think step by step.\
\ We refer to Wikipedia for help. In 1986, Spud Webb, standing only 5'7\" became\
\ the shortest NBA player in history to win an official slam dunk contest. The answer\
\ is (A).\n\nQ: How many axles does a standard automobile have?\n(A) one (B) two\
\ (C) four (D) eight\nA: Let's think step by step. We refer to Wikipedia for help.\
\ Most cars have two axles to rotate the wheels. The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_miscellaneous"
dataset_name: miscellaneous
description: The following are multiple choice questions (with answers) about miscellaneous.
fewshot_config:
sampler: first_n
samples:
- question: 'Which of these songs was a Top 10 hit for the rock band The Police?
(A) ''Radio Ga-Ga'' (B) ''Ob-la-di Ob-la-da'' (C) ''De Do Do Do De Da Da Da''
(D) ''In-a-Gadda-Da-Vida'''
target: Let's think step by step. We refer to Wikipedia for help. Radio Ga-Ga
is by Queen. Ob-la-di Ob-la-da is by The Beatles. And In-a-Gadda-Da-Vida is
by Iron Butterfly. Leaving 'De Do Do Do De Da Da Da' as the only song by The
Police, and also a Top 10 hit. The answer is (C).
- question: 'What place is named in the title of the 1979 live album by rock legends
Cheap Trick?
(A) Budapest (B) Budokan (C) Bhutan (D) Britain'
target: Let's think step by step. We refer to Wikipedia for help. Nippon Budokan
is an indoor arena in Tokyo, Japan renowned for hosting rock music concerts
including Cheap Trick in 1978. 'Cheap Trick at Budokan' became the name of their
album. The answer is (B).
- question: 'What is produced during photosynthesis?
(A) hydrogen (B) nylon (C) oxygen (D) light'
target: Let's think step by step. We refer to Wikipedia for help. Photosynthesis
is the process in which green plants use the green pigment chlorophyll to synthesize
foods with water and carbon dioxide. Oxygen is the byproduct of this process.
The answer is (C).
- question: 'Who is the shortest man to ever win an NBA slam dunk competition?
(A) Anthony ''Spud'' Webb (B) Michael ''Air'' Jordan (C) Tyrone ''Muggsy'' Bogues
(D) Julius ''Dr J'' Erving'
target: Let's think step by step. We refer to Wikipedia for help. In 1986, Spud
Webb, standing only 5'7" became the shortest NBA player in history to win an
official slam dunk contest. The answer is (A).
- question: 'How many axles does a standard automobile have?
(A) one (B) two (C) four (D) eight'
target: 'Let''s think step by step. We refer to Wikipedia for help. Most cars
have two axles to rotate the wheels. The answer is (B).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_miscellaneous
"dataset_name": "moral_disputes"
"description": "The following are multiple choice questions (with answers) about moral\
\ disputes.\n\nQ: Baron admits that the versions of the ticking bomb hypothetical\
\ she discusses are \"stunningly stupid,\" but she claims this is actually evidence\
\ of\n(A) the stupidity of most traditional philosophical examples. (B) a general\
\ lack of intelligence among people with advanced degrees. (C) the wrongness of\
\ torture. (D) the readiness on the part of many intelligent people to see torture\
\ as the best solution to deal with terrorism.\nA: Let's think step by step. We\
\ refer to Wikipedia articles on moral disputes for help. The ticking bomb hypothetical\
\ poses a problem where many people will die to an exploding bomb, if the hypothetical\
\ terrorist does not disclose how to defuse it. Baron sees this hypothetical as\
\ silly, but its prevalence does suggest intelligent people, particularly utilitarians,\
\ see torture as justifiable to save the lives in this scenario. The answer is (D).\n\
\nQ: A fertilized ovum is also known as\n(A) a zygote. (B) an embryo. (C) a viability.\
\ (D) a blastocyst.\nA: Let's think step by step. We refer to Wikipedia articles\
\ on moral disputes for help. Once a single sperm penetrates the layers of an egg\
\ to form a new cell, that cell is called a zygote. The answer is (A).\n\nQ: Pence\
\ compares six different cases of reproduction, from natural twinning to SCNT. What\
\ conclusion does he draw from this comparison?\n(A) SCNT is not a different kind\
\ of reproduction because there are no morally relevant differences between it and\
\ other permissible means of reproduction. (B) Because there is a low risk of harm\
\ for natural twinning, there will be a low risk of harm for SCNT. (C) both A and\
\ B (D) neither A nor B\nA: Let's think step by step. We refer to Wikipedia articles\
\ on moral disputes for help. SCNT, known as 'somatic cell nuclear transfer', is\
\ a technique in which the nucleus of a somatic (body) cell is transferred to the\
\ cytoplasm of an enucleated egg. This is used in farm animal cloning. Pence may\
\ compare this to other processes of reproduction, and see little moral separation.\
\ The answer is (A).\n\nQ: Which of the following is an intrinsic good, according\
\ to Singer?\n(A) being in the upper class (B) the ability to learn and solve problems\
\ (C) being more beautiful than most of the population (D) being shorter than average\n\
A: Let's think step by step. We refer to Wikipedia articles on moral disputes for\
\ help. Bioethicist Peter Singer sees intrinsic value as innate values conferred\
\ by oneself, for oneself. Inanimate objects can be beautiful, short, or have some\
\ valuable criteria, but capabilities are intrinsically good. The answer is (B).\n\
\nQ: According to Metz, what is wrong with consequentialist arguments against capital\
\ punishment based on African values?\n(A) It is unclear as of yet whether or not\
\ capital punishment deters harm to the community. (B) It is unclear as of yet whether\
\ or not capital punishment deters harm to any individuals. (C) Consequentialism\
\ is not supported by African values. (D) Even though consequentialism is supported\
\ by African values, no consequentialist arguments framed in terms of African values\
\ have been offered.\nA: Let's think step by step. We refer to Wikipedia articles\
\ on moral disputes for help. Thaddeus Metz is a humanities research professor focusing\
\ on ethical philosophy in South Africa. Metz has written the death penalty is unjustified\
\ as it treats individuals as incapable of communal relations. It is unclear that\
\ capital punishment is to the benefit of, or a deterrent of harm to the community.\
\ The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_moral_disputes"
dataset_name: moral_disputes
description: The following are multiple choice questions (with answers) about moral
disputes.
fewshot_config:
sampler: first_n
samples:
- question: 'Baron admits that the versions of the ticking bomb hypothetical she discusses
are "stunningly stupid," but she claims this is actually evidence of
(A) the stupidity of most traditional philosophical examples. (B) a general
lack of intelligence among people with advanced degrees. (C) the wrongness of
torture. (D) the readiness on the part of many intelligent people to see torture
as the best solution to deal with terrorism.'
target: Let's think step by step. We refer to Wikipedia articles on moral disputes
for help. The ticking bomb hypothetical poses a problem where many people will
die to an exploding bomb, if the hypothetical terrorist does not disclose how
to defuse it. Baron sees this hypothetical as silly, but its prevalence does
suggest intelligent people, particularly utilitarians, see torture as justifiable
to save the lives in this scenario. The answer is (D).
- question: 'A fertilized ovum is also known as
(A) a zygote. (B) an embryo. (C) a viability. (D) a blastocyst.'
target: Let's think step by step. We refer to Wikipedia articles on moral disputes
for help. Once a single sperm penetrates the layers of an egg to form a new
cell, that cell is called a zygote. The answer is (A).
- question: 'Pence compares six different cases of reproduction, from natural twinning
to SCNT. What conclusion does he draw from this comparison?
(A) SCNT is not a different kind of reproduction because there are no morally
relevant differences between it and other permissible means of reproduction.
(B) Because there is a low risk of harm for natural twinning, there will be
a low risk of harm for SCNT. (C) both A and B (D) neither A nor B'
target: Let's think step by step. We refer to Wikipedia articles on moral disputes
for help. SCNT, known as 'somatic cell nuclear transfer', is a technique in
which the nucleus of a somatic (body) cell is transferred to the cytoplasm of
an enucleated egg. This is used in farm animal cloning. Pence may compare this
to other processes of reproduction, and see little moral separation. The answer
is (A).
- question: 'Which of the following is an intrinsic good, according to Singer?
(A) being in the upper class (B) the ability to learn and solve problems (C)
being more beautiful than most of the population (D) being shorter than average'
target: Let's think step by step. We refer to Wikipedia articles on moral disputes
for help. Bioethicist Peter Singer sees intrinsic value as innate values conferred
by oneself, for oneself. Inanimate objects can be beautiful, short, or have
some valuable criteria, but capabilities are intrinsically good. The answer
is (B).
- question: 'According to Metz, what is wrong with consequentialist arguments against
capital punishment based on African values?
(A) It is unclear as of yet whether or not capital punishment deters harm to
the community. (B) It is unclear as of yet whether or not capital punishment
deters harm to any individuals. (C) Consequentialism is not supported by African
values. (D) Even though consequentialism is supported by African values, no
consequentialist arguments framed in terms of African values have been offered.'
target: 'Let''s think step by step. We refer to Wikipedia articles on moral disputes
for help. Thaddeus Metz is a humanities research professor focusing on ethical
philosophy in South Africa. Metz has written the death penalty is unjustified
as it treats individuals as incapable of communal relations. It is unclear that
capital punishment is to the benefit of, or a deterrent of harm to the community.
The answer is (A).'
group: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_moral_disputes
"dataset_name": "moral_scenarios"
"description": "The following are multiple choice questions (with answers) about moral\
\ scenarios.\n\nQ: For which of these two scenarios does the main character (who\
\ uses I/me/my) do something clearly morally wrong, according to ordinary moral\
\ standards in the US as of 2020? Scenario 1 | I took his wallet after he harassed\
\ me at the bar. Scenario 2 | I made love to my newborn baby boy.\n(A) Wrong, Wrong\
\ (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not wrong\nA: Let's think\
\ step by step. We refer to Wikipedia articles on moral scenarios for help. Stealing\
\ from someone is not considered an ethical response to harassment. And having\
\ sex with a minor is egregiously unethical. The answer is (A).\n\nQ: For which\
\ of these two scenarios does the main character (who uses I/me/my) do something\
\ clearly morally wrong, according to ordinary moral standards in the US as of 2020?\
\ Scenario 1 | I asked for a cigarette from the guy outside the store. Scenario\
\ 2 | When the security guard left his post, I pocketed a rare coin.\n(A) Wrong,\
\ Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not wrong\nA: Let's\
\ think step by step. We refer to Wikipedia articles on moral scenarios for help.\
\ Asking for something is not morally wrong. Stealing is morally wrong. The answer\
\ is (C).\n\nQ: For which of these two scenarios does the main character (who uses\
\ I/me/my) do something clearly morally wrong, according to ordinary moral standards\
\ in the US as of 2020? Scenario 1 | During the house's demolition, I pushed him\
\ into a wall. Scenario 2 | When I choose a babysitter for my twin nieces, I don't\
\ care about her racial background.\n(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not\
\ wrong, Wrong (D) Not wrong, Not wrong\nA: Let's think step by step. We refer to\
\ Wikipedia articles on moral scenarios for help. Physical violence is morally wrong.\
\ Being open to hiring people from different racial backgrounds is an expected moral\
\ stance. The answer is (B).\n\nQ: For which of these two scenarios does the main\
\ character (who uses I/me/my) do something clearly morally wrong, according to\
\ ordinary moral standards in the US as of 2020? Scenario 1 | I bought an old pick-up\
\ truck for work. Scenario 2 | I looked up at the gorgeous waterfall and threw a\
\ rock at it.\n(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not\
\ wrong, Not wrong\nA: Let's think step by step. We refer to Wikipedia articles\
\ on moral scenarios for help. Neither of these are wrong as no one is wronged by\
\ buying a pick-up truck or throwing a rock into a stream. The answer is (D).\n\n\
Q: For which of these two scenarios does the main character (who uses I/me/my) do\
\ something clearly morally wrong, according to ordinary moral standards in the\
\ US as of 2020? Scenario 1 | I really loved my newborn baby boy. Scenario 2 | The\
\ new girl is embarrassed to live in a trailer park, so I invite the class to her\
\ house.\n(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong,\
\ Not wrong\nA: Let's think step by step. We refer to Wikipedia articles on moral\
\ scenarios for help. Loving someone is not wrong. However, exposing something that\
\ someone is embarrassed about could be considered quite mean. The answer is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_moral_scenarios"
dataset_name: moral_scenarios
description: The following are multiple choice questions (with answers) about moral
scenarios.
fewshot_config:
sampler: first_n
samples:
- question: 'For which of these two scenarios does the main character (who uses I/me/my)
do something clearly morally wrong, according to ordinary moral standards in
the US as of 2020? Scenario 1 | I took his wallet after he harassed me at the
bar. Scenario 2 | I made love to my newborn baby boy.
(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not
wrong'
target: Let's think step by step. We refer to Wikipedia articles on moral scenarios
for help. Stealing from someone is not considered an ethical response to harassment.
And having sex with a minor is egregiously unethical. The answer is (A).
- question: 'For which of these two scenarios does the main character (who uses I/me/my)
do something clearly morally wrong, according to ordinary moral standards in
the US as of 2020? Scenario 1 | I asked for a cigarette from the guy outside
the store. Scenario 2 | When the security guard left his post, I pocketed a
rare coin.
(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not
wrong'
target: Let's think step by step. We refer to Wikipedia articles on moral scenarios
for help. Asking for something is not morally wrong. Stealing is morally wrong.
The answer is (C).
- question: 'For which of these two scenarios does the main character (who uses I/me/my)
do something clearly morally wrong, according to ordinary moral standards in
the US as of 2020? Scenario 1 | During the house''s demolition, I pushed him
into a wall. Scenario 2 | When I choose a babysitter for my twin nieces, I don''t
care about her racial background.
(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not
wrong'
target: Let's think step by step. We refer to Wikipedia articles on moral scenarios
for help. Physical violence is morally wrong. Being open to hiring people from
different racial backgrounds is an expected moral stance. The answer is (B).
- question: 'For which of these two scenarios does the main character (who uses I/me/my)
do something clearly morally wrong, according to ordinary moral standards in
the US as of 2020? Scenario 1 | I bought an old pick-up truck for work. Scenario
2 | I looked up at the gorgeous waterfall and threw a rock at it.
(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not
wrong'
target: Let's think step by step. We refer to Wikipedia articles on moral scenarios
for help. Neither of these are wrong as no one is wronged by buying a pick-up
truck or throwing a rock into a stream. The answer is (D).
- question: 'For which of these two scenarios does the main character (who uses I/me/my)
do something clearly morally wrong, according to ordinary moral standards in
the US as of 2020? Scenario 1 | I really loved my newborn baby boy. Scenario
2 | The new girl is embarrassed to live in a trailer park, so I invite the class
to her house.
(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not
wrong'
target: 'Let''s think step by step. We refer to Wikipedia articles on moral scenarios
for help. Loving someone is not wrong. However, exposing something that someone
is embarrassed about could be considered quite mean. The answer is (C).'
group: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_moral_scenarios
"dataset_name": "nutrition"
"description": "The following are multiple choice questions (with answers) about nutrition.\n\
\nQ: What is the first-line drug for patients with type 2 diabetes and obesity,\
\ as of 2020?\n(A) Acarbose (B) Metformin (C) Sulphonylureas (D) Insulin\nA: Let's\
\ think step by step. We refer to Wikipedia articles on nutrition for help. Metformin\
\ (Fortamet, Glumetza, or others) is usually the first medication prescribed for\
\ type 2 diabetes, as well as obesity. It works by lowering glucose production in\
\ the liver and improving the body's sensitivity to insulin. The answer is (B).\n\
\nQ: Which of the following statements is correct (according to knowledge in 2020)?\n\
(A) Consumers with phenylketonuria must avoid the consumption of the sweetener aspartame\
\ (B) Consumers with phenylketonuria must avoid the consumption of the sweetener\
\ saccharin (C) Consumers with phenylketonuria must avoid the consumption of the\
\ sweetener sucralose (D) Consumers with phenylketonuria must avoid the consumption\
\ of the sweetener acesulfame K\nA: Let's think step by step. We refer to Wikipedia\
\ articles on nutrition for help. People with phenylketonuria (PKU) cannot break\
\ down the amino acid phenylalanine. As it builds up in the blood and brain it can\
\ lead to brain damage. People with PKU should avoid foods that are converted to\
\ phenylalanine in the body, such as aspartame. The answer is (A).\n\nQ: Which of\
\ the following statements about iodine is correct, as of 2020?\n(A) 50% of adults\
\ consume iodine at levels below the RNI (B) Dairy products are a poor source of\
\ iodine (C) The iodine content of organic milk is generally lower than the level\
\ in non-organic milk (D) UK dietary reference values recommend an increase in iodine\
\ intake in pregnancy\nA: Let's think step by step. We refer to Wikipedia articles\
\ on nutrition for help. Organic milk usually has less iodine content than non-organic\
\ milk. The answer is (C).\n\nQ: Which of the following is the most plausible explanation\
\ for the protective effect of dietary fibre against cancer of the colon, as of\
\ 2020?\n(A) Propionic acid, formed during colonic fibre fermentation inhibits liver\
\ fatty acid synthesis (B) Butyric acid, formed during colonic fibre fermentation\
\ stimulates \"silencing\" of the SLC5A8 tumour suppressor gene (C) None of these\
\ options are correct (D) Butyric acid, formed during colonic fibre fermentation\
\ stimulates anti-oxidant defences in the colon\nA: Let's think step by step. We\
\ refer to Wikipedia articles on nutrition for help. Dietary fibre is inversely\
\ proportional to the risk of colorectal cancer. This is presumed because butyric\
\ acid (BA) stimulates antioxidants which help protect the colon from cancerous\
\ tumors. The answer is (D).\n\nQ: In a cohort study, the risk ratio of developing\
\ diabetes was 0.86 when comparing consumers of tea (the exposed) to those who did\
\ not drink tea (the unexposed). Which one statement is correct (according to knowledge\
\ in 2020)?\n(A) The tea drinkers have lower risk of developing diabetes. (B) The\
\ tea drinkers have higher risk of developing diabetes. (C) Based on the information\
\ given we cannot tell if the observed difference in disease risk is the result\
\ of chance. (D) The risk ratio is close to the value one, so there is no difference\
\ in disease risk between the two groups.\nA: Let's think step by step. We refer\
\ to Wikipedia articles on nutrition for help. The risk ratio is not sufficiently\
\ reduced that it could not be explained by random chance given the study's sample\
\ size. The answer is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_nutrition"
dataset_name: nutrition
description: The following are multiple choice questions (with answers) about nutrition.
fewshot_config:
sampler: first_n
samples:
- question: 'What is the first-line drug for patients with type 2 diabetes and obesity,
as of 2020?
(A) Acarbose (B) Metformin (C) Sulphonylureas (D) Insulin'
target: Let's think step by step. We refer to Wikipedia articles on nutrition
for help. Metformin (Fortamet, Glumetza, or others) is usually the first medication
prescribed for type 2 diabetes, as well as obesity. It works by lowering glucose
production in the liver and improving the body's sensitivity to insulin. The
answer is (B).
- question: 'Which of the following statements is correct (according to knowledge in
2020)?
(A) Consumers with phenylketonuria must avoid the consumption of the sweetener
aspartame (B) Consumers with phenylketonuria must avoid the consumption of the
sweetener saccharin (C) Consumers with phenylketonuria must avoid the consumption
of the sweetener sucralose (D) Consumers with phenylketonuria must avoid the
consumption of the sweetener acesulfame K'
target: Let's think step by step. We refer to Wikipedia articles on nutrition
for help. People with phenylketonuria (PKU) cannot break down the amino acid
phenylalanine. As it builds up in the blood and brain it can lead to brain damage.
People with PKU should avoid foods that are converted to phenylalanine in the
body, such as aspartame. The answer is (A).
- question: 'Which of the following statements about iodine is correct, as of 2020?
(A) 50% of adults consume iodine at levels below the RNI (B) Dairy products
are a poor source of iodine (C) The iodine content of organic milk is generally
lower than the level in non-organic milk (D) UK dietary reference values recommend
an increase in iodine intake in pregnancy'
target: Let's think step by step. We refer to Wikipedia articles on nutrition
for help. Organic milk usually has less iodine content than non-organic milk.
The answer is (C).
- question: 'Which of the following is the most plausible explanation for the protective
effect of dietary fibre against cancer of the colon, as of 2020?
(A) Propionic acid, formed during colonic fibre fermentation inhibits liver
fatty acid synthesis (B) Butyric acid, formed during colonic fibre fermentation
stimulates "silencing" of the SLC5A8 tumour suppressor gene (C) None of these
options are correct (D) Butyric acid, formed during colonic fibre fermentation
stimulates anti-oxidant defences in the colon'
target: Let's think step by step. We refer to Wikipedia articles on nutrition
for help. Dietary fibre is inversely proportional to the risk of colorectal
cancer. This is presumed because butyric acid (BA) stimulates antioxidants which
help protect the colon from cancerous tumors. The answer is (D).
- question: 'In a cohort study, the risk ratio of developing diabetes was 0.86 when
comparing consumers of tea (the exposed) to those who did not drink tea (the
unexposed). Which one statement is correct (according to knowledge in 2020)?
(A) The tea drinkers have lower risk of developing diabetes. (B) The tea drinkers
have higher risk of developing diabetes. (C) Based on the information given
we cannot tell if the observed difference in disease risk is the result of chance.
(D) The risk ratio is close to the value one, so there is no difference in disease
risk between the two groups.'
target: 'Let''s think step by step. We refer to Wikipedia articles on nutrition
for help. The risk ratio is not sufficiently reduced that it could not be explained
by random chance given the study''s sample size. The answer is (C).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_nutrition
"dataset_name": "philosophy"
"description": "The following are multiple choice questions (with answers) about philosophy.\n\
\nQ: The study of reality in the broadest sense, an inquiry into the elemental nature\
\ of the universe and the things in it, is known as _____.\n(A) metaphysics (B)\
\ epistemology (C) quantum physics (D) axiology\nA: Let's think step by step. We\
\ refer to Wikipedia articles on philosophy for help. Among the options, only metaphysics\
\ studies the nature of reality and existence. The answer is (A).\n\nQ: According\
\ to Moore’s “ideal utilitarianism,” the right action is the one that brings about\
\ the greatest amount of:\n(A) pleasure. (B) happiness. (C) good. (D) virtue.\n\
A: Let's think step by step. We refer to Wikipedia articles on philosophy for help.\
\ Moore's \"ideal utilitarianism\" states that one's actions should maximize intrinsic\
\ goods. The answer is (C).\n\nQ: Before Tolstoy's Christian conversion, what was\
\ his perspective on the meaning of life?\n(A) optimist (B) satisfied (C) nominally\
\ religious (D) pessimist\nA: Let's think step by step. We refer to Wikipedia articles\
\ on philosophy for help. Before his conversion, Tolstoy felt that life was uncertain,\
\ which is a pessimist's point of view. The answer is (D).\n\nQ: According to d'Holbach,\
\ people always act according to _____.\n(A) free choices (B) dictates of the soul\
\ (C) necessary natural laws (D) undetermined will\nA: Let's think step by step.\
\ We refer to Wikipedia articles on philosophy for help. d'Holbach believes that\
\ people act according to necessary laws, and it proves nothing about people's free\
\ will. The answer is (C).\n\nQ: Psychological egoism is:\n(A) an ethical theory\
\ about how we ought to behave. (B) a generalization concerning the way people tend\
\ to behave. (C) a claim about human nature and the ways people are capable of behaving.\
\ (D) none of the above.\nA: Let's think step by step. We refer to Wikipedia articles\
\ on philosophy for help. Psychological egoism suggests that one behaves based on\
\ what makes one feel good, hence it is a claim about human nature and how humans\
\ are capable of behaving. The answer is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_philosophy"
dataset_name: philosophy
description: The following are multiple choice questions (with answers) about philosophy.
fewshot_config:
sampler: first_n
samples:
- question: 'The study of reality in the broadest sense, an inquiry into the elemental
nature of the universe and the things in it, is known as _____.
(A) metaphysics (B) epistemology (C) quantum physics (D) axiology'
target: Let's think step by step. We refer to Wikipedia articles on philosophy
for help. Among the options, only metaphysics studies the nature of reality
and existence. The answer is (A).
- question: "According to Moore\u2019s \u201Cideal utilitarianism,\u201D the right\
\ action is the one that brings about the greatest amount of:\n(A) pleasure.\
\ (B) happiness. (C) good. (D) virtue."
target: Let's think step by step. We refer to Wikipedia articles on philosophy
for help. Moore's "ideal utilitarianism" states that one's actions should maximize
intrinsic goods. The answer is (C).
- question: 'Before Tolstoy''s Christian conversion, what was his perspective on the
meaning of life?
(A) optimist (B) satisfied (C) nominally religious (D) pessimist'
target: Let's think step by step. We refer to Wikipedia articles on philosophy
for help. Before his conversion, Tolstoy felt that life was uncertain, which
is a pessimist's point of view. The answer is (D).
- question: 'According to d''Holbach, people always act according to _____.
(A) free choices (B) dictates of the soul (C) necessary natural laws (D) undetermined
will'
target: Let's think step by step. We refer to Wikipedia articles on philosophy
for help. d'Holbach believes that people act according to necessary laws, and
it proves nothing about people's free will. The answer is (C).
- question: 'Psychological egoism is:
(A) an ethical theory about how we ought to behave. (B) a generalization concerning
the way people tend to behave. (C) a claim about human nature and the ways people
are capable of behaving. (D) none of the above.'
target: 'Let''s think step by step. We refer to Wikipedia articles on philosophy
for help. Psychological egoism suggests that one behaves based on what makes
one feel good, hence it is a claim about human nature and how humans are capable
of behaving. The answer is (C).'
group: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_philosophy
"dataset_name": "prehistory"
"description": "The following are multiple choice questions (with answers) about prehistory.\n\
\nQ: What is the approximate mean cranial capacity of Homo erectus?\n(A) under 650\
\ cc (B) about 800 cc (C) just under 1000 cc (D) 1200 cc\nA: Let's think step by\
\ step. We refer to Wikipedia articles on prehistory for help. The average cranium\
\ capacity of Homo erectus is less than 1000 cubic cm. The answer is (C).\n\nQ:\
\ According to Timothy Pauketat, the evidence for social stratification and political\
\ power at Cahokia suggests:\n(A) a center of Mississippian civilization with conditions\
\ similar to the rise of early states. (B) the limitations of authority in a Native\
\ American society of egalitarian foragers. (C) a simple chiefdom or perhaps a complex\
\ chiefdom had evolved by A.D. 1500. (D) a center of Mississippian civilization\
\ with conditions similar to societies on the Northwest Coast of North America.\n\
A: Let's think step by step. We refer to Wikipedia articles on prehistory for help.\
\ Timothy Pauketat is known for his research on Cahokia, the center of the Mississippian\
\ culture, where he found similar conditions to the rise of early states. The answer\
\ is (A).\n\nQ: Recent research on hominid species dating from the Middle Pliocene\
\ indicates there was (as of 2020):\n(A) a great amount of species diversity, or\
\ a single species that exhibited a lot of diversity. (B) very little species diversity\
\ during this period and very few hominids. (C) decreased species diversity due\
\ to a prolonged ice age followed by a severe drought. (D) decreased species diversity\
\ but increased numbers of hammerstones and flakes, indicating stone tool manufacture.\n\
A: Let's think step by step. We refer to Wikipedia articles on prehistory for help.\
\ Recent research has recognized multiple hominid species from the Middle Pliocene,\
\ meaning that there is a great amount of species diversity or diversity in a single\
\ species. The answer is (A).\n\nQ: Researchers now believe that the decline of\
\ the Maya was caused chiefly by:\n(A) a cataclysm of some kind, such as an earthquake,\
\ volcano, or tsunami. (B) ecological degradation resulting from slash-and-burn\
\ farming techniques. (C) endless wars between neighboring Mayan city-states. (D)\
\ practices of interbreeding that led to a steep rise in congenital disorders.\n\
A: Let's think step by step. We refer to Wikipedia articles on prehistory for help.\
\ Researchers believe that the Maya collapse was mainly caused by over-exploitation\
\ of natural resources, for example through slash-and-burn farming techniques. The answer is\
\ (B).\n\nQ: The great Mayan king Pacal built temples in the city of Palenque in\
\ order to:\n(A) satisfy the powerful Mayan astronomer priests. (B) display his\
\ generosity to the common people, since they were allowed to live in the temples.\
\ (C) frighten away enemies, in particular the Spaniards. (D) legitimize his kingship,\
\ since his father was not royal.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on prehistory for help. Pacal built the temples as the funerary monument\
\ to legitimize his kingship. The answer is (D).\n\n"
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_prehistory"
dataset_name: prehistory
description: The following are multiple choice questions (with answers) about prehistory.
fewshot_config:
sampler: first_n
samples:
- question: 'What is the approximate mean cranial capacity of Homo erectus?
(A) under 650 cc (B) about 800 cc (C) just under 1000 cc (D) 1200 cc'
target: Let's think step by step. We refer to Wikipedia articles on prehistory
for help. The average cranial capacity of Homo erectus is less than 1000 cubic
cm. The answer is (C).
- question: 'According to Timothy Pauketat, the evidence for social stratification
and political power at Cahokia suggests:
(A) a center of Mississippian civilization with conditions similar to the rise
of early states. (B) the limitations of authority in a Native American society
of egalitarian foragers. (C) a simple chiefdom or perhaps a complex chiefdom
had evolved by A.D. 1500. (D) a center of Mississippian civilization with conditions
similar to societies on the Northwest Coast of North America.'
target: Let's think step by step. We refer to Wikipedia articles on prehistory
for help. Timothy Pauketat is known for his research on Cahokia, the center
of the Mississippian culture, where he found similar conditions to the rise
of early states. The answer is (A).
- question: 'Recent research on hominid species dating from the Middle Pliocene indicates
there was (as of 2020):
(A) a great amount of species diversity, or a single species that exhibited
a lot of diversity. (B) very little species diversity during this period and
very few hominids. (C) decreased species diversity due to a prolonged ice age
followed by a severe drought. (D) decreased species diversity but increased
numbers of hammerstones and flakes, indicating stone tool manufacture.'
target: Let's think step by step. We refer to Wikipedia articles on prehistory
for help. Recent research has recognized multiple hominid species from the Middle
Pliocene, meaning that there is a great amount of species diversity or diversity
in a single species. The answer is (A).
- question: 'Researchers now believe that the decline of the Maya was caused chiefly
by:
(A) a cataclysm of some kind, such as an earthquake, volcano, or tsunami. (B)
ecological degradation resulting from slash-and-burn farming techniques. (C)
endless wars between neighboring Mayan city-states. (D) practices of interbreeding
that led to a steep rise in congenital disorders.'
target: Let's think step by step. We refer to Wikipedia articles on prehistory
for help. Researchers believe that the Maya collapse was mainly caused by over-exploitation
of natural resources, for example through slash-and-burn farming techniques. The answer
is (B).
- question: 'The great Mayan king Pacal built temples in the city of Palenque in order
to:
(A) satisfy the powerful Mayan astronomer priests. (B) display his generosity
to the common people, since they were allowed to live in the temples. (C) frighten
away enemies, in particular the Spaniards. (D) legitimize his kingship, since
his father was not royal.'
target: 'Let''s think step by step. We refer to Wikipedia articles on prehistory
for help. Pacal built the temples as the funerary monument to legitimize his
kingship. The answer is (D).'
group: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_prehistory
"dataset_name": "professional_accounting"
"description": "The following are multiple choice questions (with answers) about professional\
\ accounting.\n\nQ: An auditor traces the serial numbers on equipment to a nonissuer’s\
\ subledger. Which of the following management assertions is supported by this test?\n\
(A) Valuation and allocation (B) Completeness (C) Rights and obligations (D) Presentation\
\ and disclosure\nA: Let's think step by step. We refer to Wikipedia articles on\
\ accounting for help. The completeness assertion is tested by tracing supporting\
\ documents to the record entries. The answer is (B).\n\nQ: One hundred years ago,\
\ your great-great-grandmother invested $100 at 5% yearly interest. What is the\
\ investment worth today?\n(A) $13,000 (B) $600 (C) $15,000 (D) $28,000\nA: Let's\
\ think step by step. We refer to Wikipedia articles on accounting for help. A $100\
\ investment at 5% yearly interest is worth 100*(1.05)^100=13150 after 100 years,\
\ which is around $13,000. The answer is (A).\n\nQ: On January 1, year 1, Alpha\
\ Co. signed an annual maintenance agreement with a software provider for $15,000\
\ and the maintenance period begins on March 1, year 1. Alpha also incurred $5,000\
\ of costs on January 1, year 1, related to software modification requests that\
\ will increase the functionality of the software. Alpha depreciates and amortizes\
\ its computer and software assets over five years using the straight-line method.\
\ What amount is the total expense that Alpha should recognize related to the maintenance\
\ agreement and the software modifications for the year ended December 31, year\
\ 1?\n(A) $5,000 (B) $13,500 (C) $16,000 (D) $20,000\nA: Let's think step by step.\
\ We refer to Wikipedia articles on accounting for help. The maintenance period\
\ begins on March 1, so only 10 months of expenses should be recognized, which is\
\ $15,000/12*10=$12,500. The software modification cost is amortized over 5 years,\
\ so each year is $5,000/5=$1,000. So the total expense is $12,500+$1,000=$13,500.\
\ The answer is (B).\n\nQ: Krete is an unmarried taxpayer with income exclusively\
\ from wages. By December 31, year 1, Krete's employer has withheld $16,000 in federal\
\ income taxes and Krete has made no estimated tax payments. On April 15, year 2,\
\ Krete timely filed for an extension request to file her individual tax return,\
\ and paid $300 of additional taxes. Krete's year 1 tax liability was $16,500 when\
\ she timely filed her return on April 30, year 2, and paid the remaining tax liability\
\ balance. What amount would be subject to the penalty for underpayment of estimated\
\ taxes?\n(A) $0 (B) $500 (C) $1,650 (D) $16,500\nA: Let's think step by step. We\
\ refer to Wikipedia articles on accounting for help. The tax due after withholding\
\ is $16,500-$16,000=$500, which is less than $1,000, hence there is no underpayment\
\ penalty of estimated taxes. The answer is (A).\n\nQ: Box, a nongovernmental not-for-profit\
\ organization, had the following transactions during the year: Proceeds from sale\
\ of investments, $80,000; Purchase of property, plant, and equipment, $10,000; Proceeds\
\ from long-term debt, $100,000; Loss on sale of investment, $5,000. What amount should\
\ be reported as net cash provided by financing activities in Box's statement of\
\ cash flows?\n(A) $70,000 (B) $75,000 (C) $80,000 (D) $100,000\nA: Let's think step\
\ by step. We refer to Wikipedia articles on accounting for help. Among the four\
\ transactions, only Proceeds from long-term debt belongs to the financing activities\
\ section of the cash flow statement, hence the amount reported should be $100,000. The answer is\
\ (D).\n\n"
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_professional_accounting"
dataset_name: professional_accounting
description: The following are multiple choice questions (with answers) about professional
accounting.
fewshot_config:
sampler: first_n
samples:
- question: "An auditor traces the serial numbers on equipment to a nonissuer\u2019\
s subledger. Which of the following management assertions is supported by this\
\ test?\n(A) Valuation and allocation (B) Completeness (C) Rights and obligations\
\ (D) Presentation and disclosure"
target: Let's think step by step. We refer to Wikipedia articles on accounting
for help. The completeness assertion is tested by tracing supporting documents
to the record entries. The answer is (B).
- question: 'One hundred years ago, your great-great-grandmother invested $100 at 5%
yearly interest. What is the investment worth today?
(A) $13,000 (B) $600 (C) $15,000 (D) $28,000'
target: Let's think step by step. We refer to Wikipedia articles on accounting
for help. A $100 investment at 5% yearly interest is worth 100*(1.05)^100=13150
after 100 years, which is around $13,000. The answer is (A).
- question: 'On January 1, year 1, Alpha Co. signed an annual maintenance agreement
with a software provider for $15,000 and the maintenance period begins on March
1, year 1. Alpha also incurred $5,000 of costs on January 1, year 1, related
to software modification requests that will increase the functionality of the
software. Alpha depreciates and amortizes its computer and software assets over
five years using the straight-line method. What amount is the total expense
that Alpha should recognize related to the maintenance agreement and the software
modifications for the year ended December 31, year 1?
(A) $5,000 (B) $13,500 (C) $16,000 (D) $20,000'
target: Let's think step by step. We refer to Wikipedia articles on accounting
for help. The maintenance period begins on March 1, so only 10 months of expenses
should be recognized, which is $15,000/12*10=$12,500. The software modification
cost is amortized over 5 years, so each year is $5,000/5=$1,000. So the total
expense is $12,500+$1,000=$13,500. The answer is (B).
- question: 'Krete is an unmarried taxpayer with income exclusively from wages. By
December 31, year 1, Krete''s employer has withheld $16,000 in federal income
taxes and Krete has made no estimated tax payments. On April 15, year 2, Krete
timely filed for an extension request to file her individual tax return, and
paid $300 of additional taxes. Krete''s year 1 tax liability was $16,500 when
she timely filed her return on April 30, year 2, and paid the remaining tax
liability balance. What amount would be subject to the penalty for underpayment
of estimated taxes?
(A) $0 (B) $500 (C) $1,650 (D) $16,500'
target: Let's think step by step. We refer to Wikipedia articles on accounting
for help. The tax due after withholding is $16,500-$16,000=$500, which is less
than $1,000, hence there is no underpayment penalty of estimated taxes. The answer
is (A).
- question: 'Box, a nongovernmental not-for-profit organization, had the following transactions
during the year: Proceeds from sale of investments, $80,000; Purchase of property,
plant, and equipment, $10,000; Proceeds from long-term debt, $100,000; Loss on sale
of investment, $5,000. What amount should be reported as net cash provided by financing
activities in Box''s statement of cash flows?
(A) $70,000 (B) $75,000 (C) $80,000 (D) $100,000'
target: 'Let''s think step by step. We refer to Wikipedia articles on accounting
for help. Among the four transactions, only Proceeds from long-term debt belongs
to the financing activities section of the cash flow statement, hence the amount reported should
be $100,000. The answer is (D).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_professional_accounting
"dataset_name": "professional_law"
"description": "The following are multiple choice questions (with answers) about professional\
\ law.\n\nQ: A son owed a creditor $5,000. The son's father contacted the creditor\
\ and told him that he wanted to pay the son's debt. The father signed a document\
\ that stated the father would pay the son's debt at a rate of $500 a month for\
\ 10 months. The creditor made no written or oral commitment to forbear to sue the\
\ son to collect the $5,000 debt, and the father made no oral or written request\
\ for any such forbearance. For the next five months, the father made and the creditor\
\ accepted the $500 monthly payments as agreed. During that period, the creditor,\
\ in fact, did forbear to take any legal action against the son. However, the father\
\ then informed the creditor that he would make no further payments on the debt.\
\ Which of the following is the most persuasive argument that the father is liable\
\ to the creditor under the terms of their agreement?\n(A) The father's promise\
\ and the creditor's reliance thereon, if proved, gave rise to a valid claim by\
\ the creditor against the father based on the doctrine of promissory estoppel.\
\ (B) Because it was foreseeable that the father's promise would induce the creditor\
\ to forbear taking any action against the son, such forbearance was, as a matter\
\ of law, a bargained-for consideration for the father's promise. (C) The father's\
\ five payments to the creditor totaling $2,500 manifested a serious intent on the\
\ father's part to be contractually bound, and such manifestation is generally recognized\
\ as an effective substitute for consideration. (D) By assuming the antecedent debt\
\ obligation that the son owed to the creditor, the father became a surety whose\
\ promise to the creditor was enforceable, since it was in writing and supported\
\ by adequate consideration. \nA: Let's think step by step. We refer to Wikipedia\
\ articles on law for help. The doctrine of promissory estoppel stops a person from\
\ going back on a promise in contract law, hence option (A) should be the most persuasive\
\ argument. The answer is (A).\n\nQ: A state has recently enacted a statute prohibiting\
\ the disposal of any nuclear wastes within the state. This law does not contravene\
\ or conflict with any federal statutes. A man operates a company in the state that\
\ is engaged in the disposal of nuclear wastes. Subsequent to the passage of the\
\ state statute, the man, not yet aware of the new law, entered into contracts with\
\ many out-of-state firms to dispose of their nuclear wastes in the state. On account\
\ of this new law, however, the man will be unable to perform these contracts. Assume\
\ that the man has standing to challenge this state law. Which of the following\
\ presents his strongest constitutional grounds to challenge the state law prohibiting\
\ the disposal of nuclear wastes within the state?\n(A) The commerce clause. (B)\
\ The equal protection clause of the Fourteenth Amendment. (C) The privileges and\
\ immunities clause of Article IV, Section 2. (D) The contract clause.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on law for help. The commerce\
\ clause states that Congress shall have the power to regulate commerce with foreign\
\ Nations, and among the several States, and with the Indian Tribes. The statute\
\ affects inter-state commerce which puts it into question. Hence the man's strongest\
\ argument should be the commerce clause. The answer is (A).\n\nQ: On October 1,\
\ 1980, a developer, owner of several hundred acres in a rural county, drafted a\
\ general development plan for the area. The duly recorded plan imposed elaborate\
\ limitations and restrictions upon the land in the plan, which was to be developed\
\ as a residential district. The restrictions were to extend to all persons acquiring\
\ any of the lots and to their heirs, assigns, and lessees. It was further provided\
\ that all subsequent owners would be charged with due notice of the restrictions.\
\ Among those restrictions in the general plan were the following: (22) A franchise\
\ right is created in a strip of land 10 feet in width along the rear of each lot\
\ for the use of public utility companies with right of ingress and egress. (23)\
\ No house or structure of any kind shall be built on the aforementioned strip of\
\ land running through the said blocks. In 2000, a retiree purchased one of the\
\ lots, built a house, and erected a fence in the rear of his property within the\
\ restricted area. In 2004, a teacher purchased a lot adjacent to the retiree's\
\ property and built a new house. Two years later, a librarian purchased the lot\
\ that adjoined the teacher's property. The three deeds to those properties each\
\ contained references to the deed book where the general plan was recorded. In\
\ 2008, the librarian began the construction of a seven-foot post-and-rail fence\
\ along the line dividing his lot with the teacher's, and along the center of the\
\ area subject to the franchise right. Although the teacher objected to its construction,\
\ the fence was completed. If the teacher seeks a mandatory injunction to compel\
\ removal of the librarian's fence, the court will most likely\n(A) grant relief,\
\ because the fence was in violation of the easement restriction. (B) grant relief,\
\ because the encroachment of the fence violated the restriction in the original\
\ plan. (C) deny relief, because the teacher failed to enforce the restriction against\
\ the retiree. (D) deny relief, because the fence would not be construed as \"a\
\ structure\" within the terms of the restriction. \nA: Let's think step by step.\
\ We refer to Wikipedia articles on law for help. The restrictions in the original\
\ plan say no house or structure of any kind shall be built on the aforementioned\
\ strip of land running through the said blocks. Hence the court will most likely\
\ grant relief because the fence violated the restriction in the original plan.\
\ The answer is (B).\n\nQ: A judge took judicial notice of some facts at the beginning\
\ of the trial. Which of the following is not an appropriate kind of fact for judicial\
\ notice?\n(A) Indisputable facts. (B) Facts that have been asserted by individual\
\ political organizations. (C) Facts recognized to be true by common knowledge.\
\ (D) Facts capable of scientific verification.\nA: Let's think step by step. We\
\ refer to Wikipedia articles on law for help. Among the options, facts that have\
\ been asserted by individual political organizations are not an appropriate kind\
\ of fact for judicial notice. The answer is (B).\n\nQ: A state legislature has\
\ recently enacted a statute making it a misdemeanor to curse or revile or use obscene\
\ or opprobrious language toward or in reference to a police officer performing\
\ his duties. A student at a state university organized a demonstration on campus\
\ to protest the war. The rally was attended by a group of 50 students who shouted\
\ anti-war messages at cars passing by. To show his contempt for the United States,\
\ the student sewed the American flag to the rear of his jeans. When a police officer\
\ saw the flag sewn on the student's jeans, he approached and told him to remove\
\ the flag or he would be placed under arrest. The student became angered and shouted\
\ at the police officer, \"Listen, you bastard, I'll wear this rag anywhere I please.\"\
\ The student was subsequently placed under arrest and charged with violating\
\ the state statute. The student subsequently brings suit in state court challenging\
\ the constitutionality of the statute. The strongest constitutional argument for\
\ the student is that\n(A) the statute is void for vagueness under the Fourteenth\
\ Amendment's due process clause. (B) the statute is invalid because it violates\
\ the petitioner's freedom of speech under the First Amendment. (C) the statute\
\ is an abridgment of freedom of speech under the First Amendment because less restrictive\
\ means are available for achieving the same purpose. (D) the statute is overbroad\
\ and consequently invalid under the First and Fourteenth Amendments.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on law for help. The Fourteenth\
\ Amendment further supports the First Amendment by establishing a due process clause.\
\ Hence the strongest argument should be the statute is overbroad and consequently\
\ invalid under the First and Fourteenth Amendments. The answer is (D).\n\n"
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_professional_law"
dataset_name: professional_law
description: The following are multiple choice questions (with answers) about professional
law.
fewshot_config:
sampler: first_n
samples:
- question: 'A son owed a creditor $5,000. The son''s father contacted the creditor
and told him that he wanted to pay the son''s debt. The father signed a document
that stated the father would pay the son''s debt at a rate of $500 a month for
10 months. The creditor made no written or oral commitment to forbear to sue
the son to collect the $5,000 debt, and the father made no oral or written request
for any such forbearance. For the next five months, the father made and the
creditor accepted the $500 monthly payments as agreed. During that period, the
creditor, in fact, did forbear to take any legal action against the son. However,
the father then informed the creditor that he would make no further payments
on the debt. Which of the following is the most persuasive argument that the
father is liable to the creditor under the terms of their agreement?
(A) The father''s promise and the creditor''s reliance thereon, if proved, gave
rise to a valid claim by the creditor against the father based on the doctrine
of promissory estoppel. (B) Because it was foreseeable that the father''s promise
would induce the creditor to forbear taking any action against the son, such
forbearance was, as a matter of law, a bargained-for consideration for the father''s
promise. (C) The father''s five payments to the creditor totaling $2,500 manifested
a serious intent on the father''s part to be contractually bound, and such manifestation
is generally recognized as an effective substitute for consideration. (D) By
assuming the antecedent debt obligation that the son owed to the creditor, the
father became a surety whose promise to the creditor was enforceable, since
it was in writing and supported by adequate consideration. '
target: Let's think step by step. We refer to Wikipedia articles on law for help.
The doctrine of promissory estoppel stops a person from going back on a promise
in contract law, hence option (A) should be the most persuasive argument. The
answer is (A).
- question: 'A state has recently enacted a statute prohibiting the disposal of any
nuclear wastes within the state. This law does not contravene or conflict with
any federal statutes. A man operates a company in the state that is engaged
in the disposal of nuclear wastes. Subsequent to the passage of the state statute,
the man, not yet aware of the new law, entered into contracts with many out-of-state
firms to dispose of their nuclear wastes in the state. On account of this new
law, however, the man will be unable to perform these contracts. Assume that
the man has standing to challenge this state law. Which of the following presents
his strongest constitutional grounds to challenge the state law prohibiting
the disposal of nuclear wastes within the state?
(A) The commerce clause. (B) The equal protection clause of the Fourteenth Amendment.
(C) The privileges and immunities clause of Article IV, Section 2. (D) The contract
clause.'
target: Let's think step by step. We refer to Wikipedia articles on law for help.
The commerce clause states that Congress shall have the power to regulate commerce
with foreign Nations, and among the several States, and with the Indian Tribes.
The statute affects inter-state commerce which puts it into question. Hence
the man's strongest argument should be the commerce clause. The answer is (A).
- question: 'On October 1, 1980, a developer, owner of several hundred acres in a rural
county, drafted a general development plan for the area. The duly recorded plan
imposed elaborate limitations and restrictions upon the land in the plan, which
was to be developed as a residential district. The restrictions were to extend
to all persons acquiring any of the lots and to their heirs, assigns, and lessees.
It was further provided that all subsequent owners would be charged with due
notice of the restrictions. Among those restrictions in the general plan were
the following: (22) A franchise right is created in a strip of land 10 feet in
width along the rear of each lot for the use of public utility companies with
right of ingress and egress. (23) No house or structure of any kind shall be
built on the aforementioned strip of land running through the said blocks. In
2000, a retiree purchased one of the lots, built a house, and erected a fence
in the rear of his property within the restricted area. In 2004, a teacher purchased
a lot adjacent to the retiree''s property and built a new house. Two years later,
a librarian purchased the lot that adjoined the teacher''s property. The three
deeds to those properties each contained references to the deed book where the
general plan was recorded. In 2008, the librarian began the construction of
a seven-foot post-and-rail fence along the line dividing his lot with the teacher''s,
and along the center of the area subject to the franchise right. Although the
teacher objected to its construction, the fence was completed. If the teacher
seeks a mandatory injunction to compel removal of the librarian''s fence, the
court will most likely
(A) grant relief, because the fence was in violation of the easement restriction.
(B) grant relief, because the encroachment of the fence violated the restriction
in the original plan. (C) deny relief, because the teacher failed to enforce
the restriction against the retiree. (D) deny relief, because the fence would
not be construed as "a structure" within the terms of the restriction. '
target: Let's think step by step. We refer to Wikipedia articles on law for help.
The restrictions in the original plan say no house or structure of any kind
shall be built on the aforementioned strip of land running through the said
blocks. Hence the court will most likely grant relief because the fence violated
the restriction in the original plan. The answer is (B).
- question: 'A judge took judicial notice of some facts at the beginning of the trial.
Which of the following is not an appropriate kind of fact for judicial notice?
(A) Indisputable facts. (B) Facts that have been asserted by individual political
organizations. (C) Facts recognized to be true by common knowledge. (D) Facts
capable of scientific verification.'
target: Let's think step by step. We refer to Wikipedia articles on law for help.
Among the options, facts that have been asserted by individual political organizations
are not an appropriate kind of fact for judicial notice. The answer is (B).
- question: 'A state legislature has recently enacted a statute making it a misdemeanor
to curse or revile or use obscene or opprobrious language toward or in reference
to a police officer performing his duties. A student at a state university organized
a demonstration on campus to protest the war. The rally was attended by a group
of 50 students who shouted anti-war messages at cars passing by. To show his
contempt for the United States, the student sewed the American flag to the rear
of his jeans. When a police officer saw the flag sewn on the student''s jeans,
he approached and told him to remove the flag or he would be placed under arrest.
The student became angered and shouted at the police officer, "Listen, you bastard,
I''ll wear this rag anywhere I please." The student was subsequently placed
under arrest and charged with violating the state statute. The student subsequently
brings suit in state court challenging the constitutionality of the statute.
The strongest constitutional argument for the student is that
(A) the statute is void for vagueness under the Fourteenth Amendment''s due
process clause. (B) the statute is invalid because it violates the petitioner''s
freedom of speech under the First Amendment. (C) the statute is an abridgment
of freedom of speech under the First Amendment because less restrictive means
are available for achieving the same purpose. (D) the statute is overbroad and
consequently invalid under the First and Fourteenth Amendments.'
target: 'Let''s think step by step. We refer to Wikipedia articles on law for
help. The Fourteenth Amendment further supports the First Amendment by establishing
a due process clause. Hence the strongest argument should be the statute is
overbroad and consequently invalid under the First and Fourteenth Amendments.
The answer is (D).'
group: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_professional_law
"dataset_name": "professional_medicine"
"description": "The following are multiple choice questions (with answers) about professional\
\ medicine.\n\nQ: A 22-year-old male marathon runner presents to the office with\
\ the complaint of right-sided rib pain when he runs long distances. Physical examination\
\ reveals normal heart and lung findings and an exhalation dysfunction at ribs 4-5\
\ on the right. Which of the following muscles or muscle groups will be most useful\
\ in correcting this dysfunction utilizing a direct method?\n(A) anterior scalene\
\ (B) latissimus dorsi (C) pectoralis minor (D) quadratus lumborum\nA: Let's think\
\ step by step. We refer to Wikipedia articles on medicine for help. Among the options,\
\ only the pectoralis minor muscle originates from the outer surfaces of the 3rd to 5th\
\ ribs. The answer is (C).\n\nQ: A 36-year-old male presents to the office with\
\ a 3-week history of low back pain. He denies any recent trauma but says that he\
\ climbs in and out of his truck numerous times a day for his job. Examination of\
\ the patient in the prone position reveals a deep sacral sulcus on the left, a\
\ posterior inferior lateral angle on the right, and a lumbosacral junction that\
\ springs freely on compression. The most likely diagnosis is\n(A) left-on-left\
\ sacral torsion (B) left-on-right sacral torsion (C) right unilateral sacral flexion\
\ (D) right-on-right sacral torsion\nA: Let's think step by step. We refer to Wikipedia\
\ articles on medicine for help. The deep sulcus on the left, a posterior ILA on\
\ the right, with a negative spring test suggests a right-on-right sacral torsion.\
\ All other options have a deep sulcus on the right. The answer is (D).\n\nQ: A\
\ 44-year-old man comes to the office because of a 3-day history of sore throat,\
\ nonproductive cough, runny nose, and frontal headache. He says the headache is\
\ worse in the morning and ibuprofen does provide some relief. He has not had shortness\
\ of breath. Medical history is unremarkable. He takes no medications other than\
\ the ibuprofen for pain. Vital signs are temperature 37.4°C (99.4°F), pulse 88/min,\
\ respirations 18/min, and blood pressure 120/84 mm Hg. Examination of the nares\
\ shows erythematous mucous membranes. Examination of the throat shows erythema\
\ and follicular lymphoid hyperplasia on the posterior oropharynx. There is no palpable\
\ cervical adenopathy. Lungs are clear to auscultation. Which of the following is\
\ the most likely cause of this patient's symptoms?\n(A) Allergic rhinitis (B) Epstein-Barr\
\ virus (C) Mycoplasma pneumonia (D) Rhinovirus\nA: Let's think step by step. We\
\ refer to Wikipedia articles on medicine for help. The symptoms, especially the\
\ headache, suggest that the most likely cause is Rhinovirus. Epstein-Barr virus\
\ will cause swollen lymph nodes but there is no palpable cervical adenopathy. That the lungs\
\ are clear to auscultation suggests it's not Mycoplasma pneumonia. The answer is\
\ (D).\n\nQ: A previously healthy 32-year-old woman comes to the physician 8 months\
\ after her husband was killed in a car crash. Since that time, she has had a decreased\
\ appetite and difficulty falling asleep. She states that she is often sad and cries\
\ frequently. She has been rechecking the door lock five times before leaving her\
\ house and has to count exactly five pieces of toilet paper before she uses it.\
\ She says that she has always been a perfectionist but these urges and rituals\
\ are new. Pharmacotherapy should be targeted to which of the following neurotransmitters?\n\
(A) Dopamine (B) Glutamate (C) Norepinephrine (D) Serotonin\nA: Let's think step\
\ by step. We refer to Wikipedia articles on medicine for help. The patient feels\
\ sad and among the options, only Dopamine and Serotonin can help increase positive\
\ emotions. Serotonin also affects digestion and metabolism, which can help the\
\ patient's decreased appetite and sleep difficulty. The answer is (D).\n\nQ: A\
\ 42-year-old man comes to the office for preoperative evaluation prior to undergoing\
\ adrenalectomy scheduled in 2 weeks. One month ago, he received care in the emergency\
\ department for pain over his right flank following a motor vehicle collision.\
\ At that time, blood pressure was 160/100 mm Hg and CT scan of the abdomen showed\
\ an incidental 10-cm left adrenal mass. Results of laboratory studies, including\
\ complete blood count, serum electrolyte concentrations, and liver function tests,\
\ were within the reference ranges. The patient otherwise had been healthy and had\
\ never been told that he had elevated blood pressure. He takes no medications.\
\ A follow-up visit in the office 2 weeks ago disclosed elevated urinary normetanephrine\
\ and metanephrine and plasma aldosterone concentrations. The patient was referred\
\ to a surgeon, who recommended the adrenalectomy. Today, vital signs are temperature\
\ 36.6°C (97.9°F), pulse 100/min, respirations 14/min, and blood pressure 170/95\
\ mm Hg. Physical examination discloses no significant findings. Initial preoperative\
\ preparation should include treatment with which of the following?\n(A) Labetalol\
\ (B) A loading dose of potassium chloride (C) Nifedipine (D) Phenoxybenzamine\n\
A: Let's think step by step. We refer to Wikipedia articles on medicine for help.\
\ The symptoms and the adrenal mass suggested pheochromocytoma, and the blood pressure\
\ indicates hypertension. Phenoxybenzamine is used to treat hypertension caused\
\ by pheochromocytoma. The answer is (D).\n\n"
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_professional_medicine"
dataset_name: professional_medicine
description: The following are multiple choice questions (with answers) about professional
medicine.
fewshot_config:
sampler: first_n
samples:
- question: "A 22-year-old male marathon runner presents to the office with the complaint\
\ of right-sided rib pain when he runs long distances. Physical examination\
\ reveals normal heart and lung findings and an exhalation dysfunction at ribs\_\
4-5 on the right. Which of the following muscles or muscle groups will be most\
\ useful in correcting this dysfunction utilizing a direct method?\n(A) anterior\
\ scalene (B) latissimus dorsi (C) pectoralis minor (D) quadratus lumborum"
target: Let's think step by step. We refer to Wikipedia articles on medicine for
help. Among the options, only the pectoralis minor muscle originates from the outer
surfaces of the 3rd to 5th ribs. The answer is (C).
- question: "A 36-year-old male presents to the office with a\_3-week\_history of low\
\ back pain. He denies any recent trauma but says that he climbs in and out\
\ of his truck numerous times a day for his job. Examination of the patient\
\ in the prone position reveals a deep sacral sulcus on the left, a posterior\
\ inferior lateral angle on the right, and a lumbosacral junction that springs\
\ freely on compression. The most likely diagnosis is\n(A) left-on-left sacral\
\ torsion (B) left-on-right sacral torsion (C) right unilateral sacral flexion\
\ (D) right-on-right sacral torsion"
target: Let's think step by step. We refer to Wikipedia articles on medicine for
help. The deep sulcus on the left, a posterior ILA on the right, with a negative
spring test suggests a right-on-right sacral torsion. All other options have
a deep sulcus on the right. The answer is (D).
- question: "A 44-year-old man comes to the office because of a 3-day history of sore\
\ throat, nonproductive cough, runny nose, and frontal headache. He says the\
\ headache is worse in the morning and ibuprofen does provide some relief. He\
\ has not had shortness of breath. Medical history is unremarkable. He takes\
\ no medications other than the ibuprofen for pain. Vital signs are temperature\
\ 37.4\xB0C (99.4\xB0F), pulse 88/min, respirations 18/min, and blood pressure\
\ 120/84 mm Hg. Examination of the nares shows erythematous mucous membranes.\
\ Examination of the throat shows erythema and follicular lymphoid hyperplasia\
\ on the posterior oropharynx. There is no palpable cervical adenopathy. Lungs\
\ are clear to auscultation. Which of the following is the most likely cause\
\ of this patient's symptoms?\n(A) Allergic rhinitis (B) Epstein-Barr virus\
\ (C) Mycoplasma pneumonia (D) Rhinovirus"
target: Let's think step by step. We refer to Wikipedia articles on medicine for
help. The symptoms, especially the headache, suggest that the most likely cause
is Rhinovirus. Epstein-Barr virus will cause swollen lymph nodes but there is
no palpable cervical adenopathy. That the lungs are clear to auscultation suggests it's
not Mycoplasma pneumonia. The answer is (D).
- question: 'A previously healthy 32-year-old woman comes to the physician 8 months
after her husband was killed in a car crash. Since that time, she has had a
decreased appetite and difficulty falling asleep. She states that she is often
sad and cries frequently. She has been rechecking the door lock five times before
leaving her house and has to count exactly five pieces of toilet paper before
she uses it. She says that she has always been a perfectionist but these urges
and rituals are new. Pharmacotherapy should be targeted to which of the following
neurotransmitters?
(A) Dopamine (B) Glutamate (C) Norepinephrine (D) Serotonin'
target: Let's think step by step. We refer to Wikipedia articles on medicine for
help. The patient feels sad and among the options, only Dopamine and Serotonin
can help increase positive emotions. Serotonin also affects digestion and metabolism,
which can help the patient's decreased appetite and sleep difficulty. The answer
is (D).
- question: "A 42-year-old man comes to the office for preoperative evaluation prior\
\ to undergoing adrenalectomy scheduled in 2 weeks. One month ago, he received\
\ care in the emergency department for pain over his right flank following a\
\ motor vehicle collision. At that time, blood pressure was 160/100 mm Hg and\
\ CT scan of the abdomen showed an incidental 10-cm left adrenal mass. Results\
\ of laboratory studies, including complete blood count, serum electrolyte concentrations,\
\ and liver function tests, were within the reference ranges. The patient otherwise\
\ had been healthy and had never been told that he had elevated blood pressure.\
\ He takes no medications. A follow-up visit in the office 2 weeks ago disclosed\
\ elevated urinary normetanephrine and metanephrine and plasma aldosterone concentrations.\
\ The patient was referred to a surgeon, who recommended the adrenalectomy.\
\ Today, vital signs are temperature 36.6\xB0C (97.9\xB0F), pulse 100/min, respirations\
\ 14/min, and blood pressure 170/95 mm Hg. Physical examination discloses no\
\ significant findings. Initial preoperative preparation should include treatment\
\ with which of the following?\n(A) Labetalol (B) A loading dose of potassium\
\ chloride (C) Nifedipine (D) Phenoxybenzamine"
target: 'Let''s think step by step. We refer to Wikipedia articles on medicine
for help. The symptoms and the adrenal mass suggested pheochromocytoma, and
the blood pressure indicates hypertension. Phenoxybenzamine is used to treat
hypertension caused by pheochromocytoma. The answer is (D).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_professional_medicine
"dataset_name": "professional_psychology"
"description": "The following are multiple choice questions (with answers) about professional\
\ psychology.\n\nQ: In the construction of a multiple regression equation for purposes\
\ of prediction, the optimal combination of measures is one in which the predictors\n\
(A) are uncorrelated with each other but are moderately correlated with the criterion\
\ (B) have low correlations with each other and low correlations with the criterion\
\ (C) are highly intercorrelated with each other and moderately correlated with\
\ the criterion (D) have low correlations with the criterion but are moderately\
\ correlated with each other\nA: Let's think step by step. We refer to Wikipedia\
\ articles on psychology for help. The basis of multiple regression is to assess\
\ the relationship between one continuous variable and a set of independent variables.\
\ So the predictors should be uncorrelated with each other but are moderately correlated\
\ with the criterion. The answer is (A).\n\nQ: There are three ways to measure the\
\ Central Tendency: the Mean, the Median and the Mode. From your knowledge about\
\ them, what is the mode?\n(A) less sensitive to extreme scores than the mean (B)\
\ more useful for skewed distributions (C) sensitive to extreme values and highly\
\ skewed distributions (D) the most frequently occurring number\nA: Let's think\
\ step by step. We refer to Wikipedia articles on psychology for help. The definition\
\ of mode is the most frequently occurring number. The answer is (D).\n\nQ: Carl\
\ Jung believed that a client's transference:\n(A) is a fantasy that distracts the\
\ client from reality. (B) represents “mixed feelings” toward the therapist. (C)\
\ is a form of \"acting out.\" (D) reflects the client’s personal and collective\
\ unconscious.\nA: Let's think step by step. We refer to Wikipedia articles on psychology\
\ for help. Transference is a phenomenon in which a person's feelings are unconsciously\
\ redirected, so it reflects the client's personal and collective unconscious. The\
\ answer is (D).\n\nQ: In terms of Hofstede’s (1980) five cultural dimensions, the\
\ United States scores at the top of the scale on:\n(A) individualism. (B) individualism\
\ and power distance. (C) power distance and masculinity. (D) uncertainty avoidance.\n\
A: Let's think step by step. We refer to Wikipedia articles on psychology for help.\
\ US scores highest on individualism among the five cultural dimensions. The answer\
\ is (A).\n\nQ: One of your therapy clients asks your advice about a good weight-reduction\
\ program. You have investigated the programs in the community and are
\ enrolled in the one you consider the best. This program offers a $50 bonus to\
\ its patrons for each new person they bring into the program. Under these circumstances,\
\ your most appropriate response would be to\n(A) tell your client the pros and\
\ cons of each program you know about except for the one in which you are enrolled\
\ (B) recommend to your client the program in which you are enrolled and explain\
\ the $50 bonus you will receive (C) recommend to your client the program in which\
\ you are enrolled and offer to have the $50 bonus credited to your client's account\
\ in the program (D) tell your client the pros and cons of each program you know\
\ about, but do not claim the $50 bonus if your client enrolls in your program\n\
A: Let's think step by step. We refer to Wikipedia articles on psychology for help.\
\ Based on the circumstances, you should tell your client about the pros and cons\
\ of each program, but it would be inappropriate to receive the bonus, so you should\
\ not claim the $50 bonus. The answer is (D).\n\n"
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_professional_psychology"
dataset_name: professional_psychology
description: The following are multiple choice questions (with answers) about professional
psychology.
fewshot_config:
sampler: first_n
samples:
- question: 'In the construction of a multiple regression equation for purposes of
prediction, the optimal combination of measures is one in which the predictors
(A) are uncorrelated with each other but are moderately correlated with the
criterion (B) have low correlations with each other and low correlations with
the criterion (C) are highly intercorrelated with each other and moderately
correlated with the criterion (D) have low correlations with the criterion but
are moderately correlated with each other'
target: Let's think step by step. We refer to Wikipedia articles on psychology
for help. The basis of multiple regression is to assess the relationship between
one continuous variable and a set of independent variables. So the predictors
should be uncorrelated with each other but are moderately correlated with the
criterion. The answer is (A).
- question: 'There are three ways to measure the Central Tendency: the Mean, the Median
and the Mode. From your knowledge about them, what is the mode?
(A) less sensitive to extreme scores than the mean (B) more useful for skewed
distributions (C) sensitive to extreme values and highly skewed distributions
(D) the most frequently occurring number'
target: Let's think step by step. We refer to Wikipedia articles on psychology
for help. The definition of mode is the most frequently occurring number. The
answer is (D).
- question: "Carl Jung believed that a client's transference:\n(A) is a fantasy that\
\ distracts the client from reality. (B) represents \u201Cmixed feelings\u201D\
\ toward the therapist. (C) is a form of \"acting out.\" (D) reflects\
\ the client\u2019s personal and collective unconscious."
target: Let's think step by step. We refer to Wikipedia articles on psychology
for help. Transference is a phenomenon in which a person's feelings are unconsciously
redirected, so it reflects the client's personal and collective unconscious.
The answer is (D).
- question: "In terms of Hofstede\u2019s (1980) five cultural dimensions, the United\
\ States scores at the top of the scale on:\n(A) individualism. (B) individualism\
\ and power distance. (C) power distance and masculinity. (D) uncertainty avoidance."
target: Let's think step by step. We refer to Wikipedia articles on psychology
for help. US scores highest on individualism among the five cultural dimensions.
The answer is (A).
- question: 'One of your therapy clients asks your advice about a good weight-reduction
program. You have investigated the programs in the community and are enrolled
in the one you consider the best. This program offers a $50 bonus to its patrons
for each new person they bring into the program. Under these circumstances,
your most appropriate response would be to
(A) tell your client the pros and cons of each program you know about except
for the one in which you are enrolled (B) recommend to your client the program
in which you are enrolled and explain the $50 bonus you will receive (C) recommend
to your client the program in which you are enrolled and offer to have the $50
bonus credited to your client''s account in the program (D) tell your client
the pros and cons of each program you know about, but do not claim the $50 bonus
if your client enrolls in your program'
target: 'Let''s think step by step. We refer to Wikipedia articles on psychology
for help. Based on the circumstances, you should tell your client about the
pros and cons of each program, but it would be inappropriate to receive the
bonus, so you should not claim the $50 bonus. The answer is (D).'
group: mmlu_flan_cot_fewshot_social_sciences
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_professional_psychology
"dataset_name": "public_relations"
"description": "The following are multiple choice questions (with answers) about public\
\ relations.\n\nQ: Earth Hour was a campaign launched by which organization?\n(A)\
\ Greenpeace (B) The UN (C) Oxfam (D) World Wildlife Fund\nA: Let's think step by\
\ step. We refer to Wikipedia articles on public relations for help. Earth Hour\
\ is a worldwide movement organized by the World Wildlife Fund. The answer\
\ is (D).\n\nQ: In issues management, what is the most proactive approach to addressing\
\ negative or misleading information posted online about your organization?\n(A)\
\ Buy domain names that could be used by opposition groups. (B) Post anonymous comments\
\ on blogs to combat this information. (C) Prepare a news release that discredits\
\ the inaccurate information. (D) Make policy changes to address complaints highlighted\
\ on these sites.\nA: Let's think step by step. We refer to Wikipedia articles on\
\ public relations for help. In issues management, the most proactive approach to\
\ addressing negative or misleading information posted online is to make policy\
\ changes to address complaints highlighted on those sites. The answer is (D).\n\
\nQ: At which stage in the planning process would a situation analysis be carried\
\ out?\n(A) Defining the program (B) Planning the program (C) Taking action and\
\ implementing ideas (D) Evaluation of the program\nA: Let's think step by step.\
\ We refer to Wikipedia articles on public relations for help. Situation analyses\
\ are typically carried out during the planning process stage of defining the program.\
\ The answer is (A).\n\nQ: Which of these statements is true of the Vatican in 2010\
\ at the time of the accusations of child abuse cover-ups?\n(A) There was a coordinated\
\ media response. (B) Consistent messages were communicated. (C) Criticisms were\
\ taken as attacks on the Catholic Church. (D) The credibility of the Vatican was\
\ upheld.\nA: Let's think step by step. We refer to Wikipedia articles on public\
\ relations for help. In 2010 when there were accusations of child abuse cover-ups,\
\ the Vatican took those criticisms as attacks on the Catholic Church. The answer\
\ is (C).\n\nQ: What should a public relations media practitioner do if she does\
\ not know the answer to a reporter's question?\n(A) Give the reporter other information\
\ she is certain is correct. (B) Say that the information is 'off the record' and\
\ will be disseminated later. (C) Say 'I don't know' and promise to provide the\
\ information later. (D) Say 'no comment,' rather than appear uninformed.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on public relations for help.\
\ If a public relations media practitioner does not know the answer to a reporter's\
\ question, they should say 'I don't know' and offer to provide the information\
\ later. The answer is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_public_relations"
dataset_name: public_relations
description: The following are multiple choice questions (with answers) about public
relations.
fewshot_config:
sampler: first_n
samples:
- question: 'Earth Hour was a campaign launched by which organization?
(A) Greenpeace (B) The UN (C) Oxfam (D) World Wildlife Fund'
target: Let's think step by step. We refer to Wikipedia articles on public relations
for help. Earth Hour is a worldwide movement organized by the World
Wildlife Fund. The answer is (D).
- question: 'In issues management, what is the most proactive approach to addressing
negative or misleading information posted online about your organization?
(A) Buy domain names that could be used by opposition groups. (B) Post anonymous
comments on blogs to combat this information. (C) Prepare a news release that
discredits the inaccurate information. (D) Make policy changes to address complaints
highlighted on these sites.'
target: Let's think step by step. We refer to Wikipedia articles on public relations
for help. In issues management, the most proactive approach to addressing negative
or misleading information posted online is to make policy changes to address
complaints highlighted on those sites. The answer is (D).
- question: 'At which stage in the planning process would a situation analysis be carried
out?
(A) Defining the program (B) Planning the program (C) Taking action and implementing
ideas (D) Evaluation of the program'
target: Let's think step by step. We refer to Wikipedia articles on public relations
for help. Situation analyses are typically carried out during the planning process
stage of defining the program. The answer is (A).
- question: 'Which of these statements is true of the Vatican in 2010 at the time of
the accusations of child abuse cover-ups?
(A) There was a coordinated media response. (B) Consistent messages were communicated.
(C) Criticisms were taken as attacks on the Catholic Church. (D) The credibility
of the Vatican was upheld.'
target: Let's think step by step. We refer to Wikipedia articles on public relations
for help. In 2010 when there were accusations of child abuse cover-ups, the
Vatican took those criticisms as attacks on the Catholic Church. The answer
is (C).
- question: 'What should a public relations media practitioner do if she does not know
the answer to a reporter''s question?
(A) Give the reporter other information she is certain is correct. (B) Say that
the information is ''off the record'' and will be disseminated later. (C) Say
''I don''t know'' and promise to provide the information later. (D) Say ''no
comment,'' rather than appear uninformed.'
target: 'Let''s think step by step. We refer to Wikipedia articles on public relations
for help. If a public relations media practitioner does not know the answer
to a reporter''s question, they should say ''I don''t know'' and offer to provide
the information later. The answer is (C).'
group: mmlu_flan_cot_fewshot_social_sciences
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_public_relations
"dataset_name": "security_studies"
"description": "The following are multiple choice questions (with answers) about security\
\ studies.\n\nQ: What are the frameworks of analysis within which terrorism has\
\ been considered (as of 2020)?\n(A) Competition between larger nations has resulted\
\ in some countries actively supporting terrorist groups to undermine the strength\
\ of rival states. Terrorist networks are extended patronage clubs maintained and\
\ paid for by their donor states and are conceptualised as being like state actors,\
\ to be dealt with using military force. (B) Globalization has enabled the internationalization\
\ of terrorist activities by opening up their operational space, although coordination\
\ is still managed from a geographical base. This suggests that terrorist groups\
\ are nationally structured which means that terrorism cannot be considered in terms\
\ of a war to be defeated militarily without having serious implications on the\
\ indigenous population. (C) Terrorism can be viewed as a problem to be resolved\
\ by military means (war on terrorism), by normal police techniques (terrorism as\
\ crime), or as a medical problem with underlying causes and symptoms (terrorism\
\ as disease). (D) Terrorism is viewed as a criminal problem. The criminalization\
\ of terrorism has two important implications. Firstly, it suggests that terrorism\
\ can be eradicated - terrorists can be caught and brought to trial by normal judicial\
\ proceedings thereby removing the threat from society - and secondly, it suggests\
\ that preventative crime techniques are applicable to prevent its development.\n\
A: Let's think step by step. We refer to Wikipedia articles on security studies\
\ for help. (A) is wrong because it is not competition between larger nations that\
\ causes terrorism. \n(B) is wrong because globalization is not the cause of terrorism.\n\
(C) is correct because the US undertook the war on terrorism. \n(D) is wrong because\
\ preventative crime techniques will likely not end terrorism. The answer is (C).\n\
\nQ: Which of the following is the best lens through which to investigate the role\
\ of child soldiers?\n(A) Child soldiers are victims of combat that need re-education\
\ and rehabilitation. (B) Children and their mothers are not active subjects in\
\ warfare and are best considered as subjects in the private sphere. (C) Children\
\ are most often innocent bystanders in war and are best used as signifiers of peace.\
\ (D) Children have political subjecthood that is missed when they are considered\
\ as passive victims of warfare.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on security studies for help. Child soldiers as a political topic can\
\ be missed when they are considered passive victims of warfare. The answer is (D).\n\
\nQ: How can we best describe the relationship between the state-centric approach\
\ and the concept of human security?\n(A) There are such wide divisions within the\
\ human security framework regarding the nature of threats and referent objects\
\ that no widely applicable comparisons between state-centric approaches and human\
\ security can be drawn. (B) By adopting the framework of human security, the limitations\
\ of the realist state-centric approach become evident. Whilst human security defines\
\ the referent object as the person or population, state-centric approaches prioritise\
\ the security of the state, de-prioritizing the pursuit of human security. (C)\
\ The state-centric approach to security is a faction of human security, usually\
\ defined within the broad school of human security. By being state-centric this\
\ approach prioritises the individual as the referent object in security studies.\
\ (D) Both the state-centric and human-centric approaches to security are mutually\
\ exclusive and offer a sufficient analytic framework with which to understand the\
\ international security system. It is therefore the role of security analysts to\
\ determine which of these substantial concepts is correct, and which should be\
\ discarded.\nA: Let's think step by step. We refer to Wikipedia articles on security\
\ studies for help. Human security focuses on a person or population whereas state-centric\
\ approaches focus on the state while deprioritizing human security. The answer\
\ is (B).\n\nQ: In order to become securitized, a threat must be presented in which\
\ of these ways?\n(A) As an existential threat that requires immediate and extraordinary\
\ action, posing a threat to the survival of the state or to societal security.\
\ (B) As requiring immediate and extraordinary action by the state, threatening\
\ the survival of a referent object and therefore warranting the use of measures\
\ not normally employed in the political realm. (C) As an urgent threat to the survival\
\ of the referent object, so serious that it legitimises the employment of extraordinary\
\ action in response. (D) As an urgent threat to the survival of the audience that\
\ requires extraordinary or emergency measures.\nA: Let's think step by step. We\
\ refer to Wikipedia articles on security studies for help. To be securitized, a\
\ threat must be an urgent threat to the survival of the referent object. The answer\
\ is (C).\n\nQ: What distinguishes coercive diplomacy from military force?\n(A)\
\ Compellence is another term for coercive diplomacy, but covering a narrower set\
\ of criteria; compellence covers those threats aimed at initiating adversary action.\
\ A threat to coerce a state to give up part of its territory would count as coercive\
\ diplomacy, as long as that threat proactively initiates action before reactive\
\ diplomacy is taken. (B) Coercive diplomacy constitutes the threats of limited\
\ force to induce adversary's incentive to comply with the coercer's demands. It\
\ is an influence strategy that is intended to obtain compliance: the use of force\
\ to defeat an opponent first does not count. It leaves an element of choice with\
\ the target to comply, or to continue. (C) Military force, or the threat of military\
\ force, utilises fear to achieve strategic objectives. Coercive diplomacy is differentiated\
\ from this approach, because it does not use fear as a tool for coercing an adversary.\
\ (D) Coercive diplomacy is employed to use force but to limit its effects on the\
\ international community. Coercive diplomacy is an aggressive strategy that is\
\ intended to obtain compliance through defeat. It does not leave an element of\
\ choice with the target, the target either being forced to comply or engage in\
\ conflict. It seeks to control by imposing compliance by removing any opportunity\
\ for negotiation or concession.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on security studies for help. Coercive diplomacy uses the threat of force\
\ to induce the opponent to comply with demands. The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_security_studies"
dataset_name: security_studies
description: The following are multiple choice questions (with answers) about security
studies.
fewshot_config:
sampler: first_n
samples:
- question: 'What are the frameworks of analysis within which terrorism has been considered
(as of 2020)?
(A) Competition between larger nations has resulted in some countries actively
supporting terrorist groups to undermine the strength of rival states. Terrorist
networks are extended patronage clubs maintained and paid for by their donor
states and are conceptualised as being like state actors, to be dealt with using
military force. (B) Globalization has enabled the internationalization of terrorist
activities by opening up their operational space, although coordination is still
managed from a geographical base. This suggests that terrorist groups are nationally
structured which means that terrorism cannot be considered in terms of a war
to be defeated militarily without having serious implications on the indigenous
population. (C) Terrorism can be viewed as a problem to be resolved by military
means (war on terrorism), by normal police techniques (terrorism as crime),
or as a medical problem with underlying causes and symptoms (terrorism as disease).
(D) Terrorism is viewed as a criminal problem. The criminalization of terrorism
has two important implications. Firstly, it suggests that terrorism can be eradicated
- terrorists can be caught and brought to trial by normal judicial proceedings
thereby removing the threat from society - and secondly, it suggests that preventative
crime techniques are applicable to prevent its development.'
target: "Let's think step by step. We refer to Wikipedia articles on security\
\ studies for help. (A) is wrong because it is not competition between larger\
\ nations that causes terrorism. \n(B) is wrong because globalization is not\
\ the cause of terrorism.\n(C) is correct because the US undertook the war on\
\ terrorism. \n(D) is wrong because preventative crime techniques will likely\
\ not end terrorism. The answer is (C)."
- question: 'Which of the following is the best lens through which to investigate the
role of child soldiers?
(A) Child soldiers are victims of combat that need re-education and rehabilitation.
(B) Children and their mothers are not active subjects in warfare and are best
considered as subjects in the private sphere. (C) Children are most often innocent
bystanders in war and are best used as signifiers of peace. (D) Children have
political subjecthood that is missed when they are considered as passive victims
of warfare.'
target: Let's think step by step. We refer to Wikipedia articles on security studies
for help. Child soldiers as a political topic can be missed when they are considered
passive victims of warfare. The answer is (D).
- question: 'How can we best describe the relationship between the state-centric approach
and the concept of human security?
(A) There are such wide divisions within the human security framework regarding
the nature of threats and referent objects that no widely applicable comparisons
between state-centric approaches and human security can be drawn. (B) By adopting
the framework of human security, the limitations of the realist state-centric
approach become evident. Whilst human security defines the referent object as
the person or population, state-centric approaches prioritise the security of
the state, de-prioritizing the pursuit of human security. (C) The state-centric
approach to security is a faction of human security, usually defined within
the broad school of human security. By being state-centric this approach prioritises
the individual as the referent object in security studies. (D) Both the state-centric
and human-centric approaches to security are mutually exclusive and offer a
sufficient analytic framework with which to understand the international security
system. It is therefore the role of security analysts to determine which of
these substantial concepts is correct, and which should be discarded.'
target: Let's think step by step. We refer to Wikipedia articles on security studies
for help. Human security focuses on a person or population whereas state-centric
approaches focus on the state while deprioritizing human security. The answer
is (B).
- question: 'In order to become securitized, a threat must be presented in which of
these ways?
(A) As an existential threat that requires immediate and extraordinary action,
posing a threat to the survival of the state or to societal security. (B) As
requiring immediate and extraordinary action by the state, threatening the survival
of a referent object and therefore warranting the use of measures not normally
employed in the political realm. (C) As an urgent threat to the survival of
the referent object, so serious that it legitimises the employment of extraordinary
action in response. (D) As an urgent threat to the survival of the audience
that requires extraordinary or emergency measures.'
target: Let's think step by step. We refer to Wikipedia articles on security studies
for help. To be securitized, a threat must be an urgent threat to the survival
of the referent object. The answer is (C).
- question: 'What distinguishes coercive diplomacy from military force?
(A) Compellence is another term for coercive diplomacy, but covering a narrower
set of criteria; compellence covers those threats aimed at initiating adversary
action. A threat to coerce a state to give up part of its territory would count
as coercive diplomacy, as long as that threat proactively initiates action before
reactive diplomacy is taken. (B) Coercive diplomacy constitutes the threats
of limited force to induce adversary''s incentive to comply with the coercer''s
demands. It is an influence strategy that is intended to obtain compliance:
the use of force to defeat an opponent first does not count. It leaves an element
of choice with the target to comply, or to continue. (C) Military force, or
the threat of military force, utilises fear to achieve strategic objectives.
Coercive diplomacy is differentiated from this approach, because it does not
use fear as a tool for coercing an adversary. (D) Coercive diplomacy is employed
to use force but to limit its effects on the international community. Coercive
diplomacy is an aggressive strategy that is intended to obtain compliance through
defeat. It does not leave an element of choice with the target, the target either
being forced to comply or engage in conflict. It seeks to control by imposing
compliance by removing any opportunity for negotiation or concession.'
target: 'Let''s think step by step. We refer to Wikipedia articles on security
studies for help. Coercive diplomacy uses the threat of force to induce the
opponent to comply with demands. The answer is (B).'
group: mmlu_flan_cot_fewshot_social_sciences
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_security_studies
"dataset_name": "sociology"
"description": "The following are multiple choice questions (with answers) about sociology.\n\
\nQ: Which of the following is not a problem associated with official statistics\
\ on strike action?\n(A) most strikes go unnoticed by employers and the mass media\
\ (B) not all industrial disputes will be reported by the employer (C) the definition\
\ of strikes excludes those that involve fewer than ten workers or last less than\
\ one day (D) it is hard to compare strikes that were measured in different ways\n\
A: Let's think step by step. We refer to Wikipedia articles on sociology for help.\
\ Official statistics on strike action can be problematic because not all industrial\
\ disputes will be reported by employers, the definition of strikes excludes those\
\ that involve fewer than ten workers or last less than one day, and it is hard\
\ to compare strikes that were measured in different ways. Thus, (A) is not a problem\
\ associated with official statistics on strike action. The answer is (A).\n\nQ:\
\ What does Berger (1963) describe as a metaphor for social reality?\n(A) a fairground\
\ ride (B) a circus (C) a puppet theatre (D) a ballet\nA: Let's think step by step.\
\ We refer to Wikipedia articles on sociology for help. Berger describes social\
\ reality using the metaphor of a puppet theatre. The answer is (C).\n\nQ: The term\
\ 'hegemony' refers to:\n(A) the tendency for the working class not to realize their\
\ own interests (B) a dominant ideology that legitimates economic, political and\
\ cultural power (C) a form of dual consciousness based on ideology and everyday\
\ experiences (D) a mode of payment given for outstanding topiary\nA: Let's think\
\ step by step. We refer to Wikipedia articles on sociology for help. Hegemony refers\
\ to a dominant ideology that legitimates economic, political, and cultural power.\
\ The answer is (B).\n\nQ: The shift from 'civil religion' to 'common religion'\
\ means that:\n(A) the increasing bureaucracy of the state has made religion only\
\ a marginal part of our lives (B) despite the weakening of traditional authority,\
\ our everyday lives and 'common sense' remain shaped by religious beliefs and values\
\ (C) religious participation in collective worship may have declined, but people\
\ still practise their faiths in private (D) people are much more likely to discuss\
\ their religious beliefs in public, informal settings\nA: Let's think step by step.\
\ We refer to Wikipedia articles on sociology for help. The shift from civil religion\
\ to common religion means that despite the weakening of traditional authority,\
\ our everyday lives and common sense remain shaped by religious beliefs and values.\
\ The answer is (B).\n\nQ: Which of the following did the post-war welfare state\
\ of 1948 not aim to provide:\n(A) free health care and education for all (B) a\
\ minimum wage (C) full employment (D) universal welfare\nA: Let's think step by\
\ step. We refer to Wikipedia articles on sociology for help. The post-war welfare\
\ state of 1948 aimed to provide free healthcare and education, full employment,\
\ and universal welfare. But it did not aim to provide a minimum wage. The answer\
\ is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_sociology"
dataset_name: sociology
description: The following are multiple choice questions (with answers) about sociology.
fewshot_config:
sampler: first_n
samples:
- question: 'Which of the following is not a problem associated with official statistics
on strike action?
(A) most strikes go unnoticed by employers and the mass media (B) not all industrial
disputes will be reported by the employer (C) the definition of strikes excludes
those that involve fewer than ten workers or last less than one day (D) it is
hard to compare strikes that were measured in different ways'
target: Let's think step by step. We refer to Wikipedia articles on sociology
for help. Official statistics on strike action can be problematic because not
all industrial disputes will be reported by employers, the definition of strikes
excludes those that involve fewer than ten workers or last less than one day,
and it is hard to compare strikes that were measured in different ways. Thus,
(A) is not a problem associated with official statistics on strike action. The
answer is (A).
- question: 'What does Berger (1963) describe as a metaphor for social reality?
(A) a fairground ride (B) a circus (C) a puppet theatre (D) a ballet'
target: Let's think step by step. We refer to Wikipedia articles on sociology
for help. Berger describes social reality using the metaphor of a puppet theatre.
The answer is (C).
- question: 'The term ''hegemony'' refers to:
(A) the tendency for the working class not to realize their own interests (B)
a dominant ideology that legitimates economic, political and cultural power
(C) a form of dual consciousness based on ideology and everyday experiences
(D) a mode of payment given for outstanding topiary'
target: Let's think step by step. We refer to Wikipedia articles on sociology
for help. Hegemony refers to a dominant ideology that legitimates economic,
political, and cultural power. The answer is (B).
- question: 'The shift from ''civil religion'' to ''common religion'' means that:
(A) the increasing bureaucracy of the state has made religion only a marginal
part of our lives (B) despite the weakening of traditional authority, our everyday
lives and ''common sense'' remain shaped by religious beliefs and values (C)
religious participation in collective worship may have declined, but people
still practise their faiths in private (D) people are much more likely to discuss
their religious beliefs in public, informal settings'
target: Let's think step by step. We refer to Wikipedia articles on sociology
for help. The shift from civil religion to common religion means that despite
the weakening of traditional authority, our everyday lives and common sense
remain shaped by religious beliefs and values. The answer is (B).
- question: 'Which of the following did the post-war welfare state of 1948 not aim
to provide:
(A) free health care and education for all (B) a minimum wage (C) full employment
(D) universal welfare'
target: 'Let''s think step by step. We refer to Wikipedia articles on sociology
for help. The post-war welfare state of 1948 aimed to provide free healthcare
and education, full employment, and universal welfare. But it did not aim to
provide a minimum wage. The answer is (B).'
group: mmlu_flan_cot_fewshot_social_sciences
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_sociology
"dataset_name": "us_foreign_policy"
"description": "The following are multiple choice questions (with answers) about us\
\ foreign policy.\n\nQ: How did Donald Trump attack globalization in the 2016 campaign?\n\
(A) Globalization had made men like him too rich (B) Globalization only benefited\
\ certain American states, such as New York (C) Liberal elites had encouraged globalization,\
\ while 'ordinary Americans' lost jobs because of it (D) Globalization encouraged\
\ damaging trade wars\nA: Let's think step by step. We refer to Wikipedia articles\
\ on us foreign policy for help. Trump attacked globalization because he believed\
\ ordinary Americans lost jobs due to it, and so he wanted to blame liberals who\
\ had encouraged it. The answer is (C).\n\nQ: How did NSC-68 change U.S. strategy?\n\
(A) It globalized containment. (B) It militarized containment. (C) It called for\
\ the development of the hydrogen bomb. (D) All of the above\nA: Let's think step\
\ by step. We refer to Wikipedia articles on us foreign policy for help. NSC-68\
\ outlined a variety of courses of action, including globalization of containment,\
\ militarization of containment, and the development of the hydrogen bomb. The answer\
\ is (D).\n\nQ: How do Defensive Realism and Offensive Realism differ in their explanation\
\ of state behaviour?\n(A) Defensive realists place greater emphasis on the role\
\ of international institutions (B) Defensive realists place less emphasis on geographical\
\ factors (C) Offensive realists give more priority to the national interest than\
\ Defensive realists. (D) Defensive realists believe states are security maximizers,\
\ while Offensive realists believe states to be power maximizers\nA: Let's think\
\ step by step. We refer to Wikipedia articles on us foreign policy for help. While\
\ defensive realism advocates that states are security maximizers, offensive realists\
\ think of states as power maximizers. The answer is (D).\n\nQ: The realm of policy\
\ decisions concerned primarily with relations between the United States and the\
\ rest of the world is known as\n(A) terrorism policy. (B) economic policy. (C)\
\ foreign policy. (D) international policy.\nA: Let's think step by step. We refer\
\ to Wikipedia articles on us foreign policy for help. The topic of policy decisions\
\ concerned with relations between the US and the rest of the world is known as foreign\
\ policy. The answer is (C).\n\nQ: How did the 2008 financial crisis affect America's\
\ international reputation?\n(A) It damaged support for the US model of political\
\ economy and capitalism (B) It created anger at the United States for exaggerating\
\ the crisis (C) It increased support for American global leadership under President\
\ Obama (D) It reduced global use of the US dollar\nA: Let's think step by step.\
\ We refer to Wikipedia articles on us foreign policy for help. The 2008 financial\
\ crisis damaged the international reputation of the American model of political\
\ economy and capitalism. The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_us_foreign_policy"
dataset_name: us_foreign_policy
description: The following are multiple choice questions (with answers) about us foreign
policy.
fewshot_config:
sampler: first_n
samples:
- question: 'How did Donald Trump attack globalization in the 2016 campaign?
(A) Globalization had made men like him too rich (B) Globalization only benefited
certain American states, such as New York (C) Liberal elites had encouraged
globalization, while ''ordinary Americans'' lost jobs because of it (D) Globalization
encouraged damaging trade wars'
target: Let's think step by step. We refer to Wikipedia articles on us foreign
policy for help. Trump attacked globalization because he believed ordinary Americans
lost jobs due to it, and so he wanted to blame liberals who had encouraged it.
The answer is (C).
- question: 'How did NSC-68 change U.S. strategy?
(A) It globalized containment. (B) It militarized containment. (C) It called
for the development of the hydrogen bomb. (D) All of the above'
target: Let's think step by step. We refer to Wikipedia articles on us foreign
policy for help. NSC-68 outlined a variety of courses of action, including globalization
of containment, militarization of containment, and the development of the hydrogen
bomb. The answer is (D).
- question: 'How do Defensive Realism and Offensive Realism differ in their explanation
of state behaviour?
(A) Defensive realists place greater emphasis on the role of international institutions
(B) Defensive realists place less emphasis on geographical factors (C) Offensive
realists give more priority to the national interest than Defensive realists.
(D) Defensive realists believe states are security maximizers, while Offensive
realists believe states to be power maximizers'
target: Let's think step by step. We refer to Wikipedia articles on us foreign
policy for help. While defensive realism advocates that states are security
maximizers, offensive realists think of states as power maximizers. The answer
is (D).
- question: 'The realm of policy decisions concerned primarily with relations between
the United States and the rest of the world is known as
(A) terrorism policy. (B) economic policy. (C) foreign policy. (D) international
policy.'
target: Let's think step by step. We refer to Wikipedia articles on us foreign
policy for help. The topic of policy decisions concerned with relations between
the US and the rest of the world is known as foreign policy. The answer is (C).
- question: 'How did the 2008 financial crisis affect America''s international reputation?
(A) It damaged support for the US model of political economy and capitalism
(B) It created anger at the United States for exaggerating the crisis (C) It
increased support for American global leadership under President Obama (D) It
reduced global use of the US dollar'
target: 'Let''s think step by step. We refer to Wikipedia articles on us foreign
policy for help. The 2008 financial crisis damaged the international reputation
of the American model of political economy and capitalism. The answer is (A).'
group: mmlu_flan_cot_fewshot_social_sciences
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_us_foreign_policy
"dataset_name": "virology"
"description": "The following are multiple choice questions (with answers) about virology.\n\
\nQ: The median survival time to AIDS and death was established by following:\n\
(A) Seroprevalent HIV-infected individuals (B) Seronegatives (C) Seroconverters\
\ (D) High-risk seronegatives\nA: Let's think step by step. We refer to Wikipedia\
\ articles on virology for help. The median survival time to AIDS and death was\
\ established as a result of the development of seroconverters. The answer is (C).\n\
\nQ: Which of the following is a morphological characteristic of the paramyxoviruses.\n\
(A) Fragile viruses often visualised with RNA spewing from the inside (B) Elongate\
\ viruses (C) Icosahedral viruses with envelope (D) Very large viruses\nA: Let's\
\ think step by step. We refer to Wikipedia articles on virology for help. Paramyxoviruses\
\ are fragile viruses often visualised with RNA spewing from the inside. The answer\
\ is (A).\n\nQ: The most important goal of a behavioral intervention is:\n(A) Change\
\ in behavior (B) Comprehensive coverage (C) Effective use of behavioral theory\
\ (D) Sustained behavior change\nA: Let's think step by step. We refer to Wikipedia\
\ articles on virology for help. The primary goal of a behavioral intervention is to\
\ cause sustained behavior change. The answer is (D).\n\nQ: A key factor facilitating\
\ the application of nested case-control studies from the MACS was:\n(A) Data collection\
\ (B) Establishment of a repository of biologic specimens (C) Participant interest\
\ (D) Administration of the questionnaire by staff\nA: Let's think step by step.\
\ We refer to Wikipedia articles on virology for help. The Multicenter AIDS Cohort\
\ Study's use of nested case-control studies was facilitated by the establishment\
\ of a repository of biologic specimens. The answer is (B).\n\nQ: Why are parvoviruses\
\ a highly impactful parasite?\n(A) Because they have no nucleic acid (B) They require\
\ a helper virus (C) Only replicate in dividing cells (D) Can integrate into host\
\ chromosomes\nA: Let's think step by step. We refer to Wikipedia articles on virology\
\ for help. Parvoviruses are highly impactful because they do not have nucleic acid.\
\ The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_virology"
dataset_name: virology
description: The following are multiple choice questions (with answers) about virology.
fewshot_config:
sampler: first_n
samples:
- question: 'The median survival time to AIDS and death was established by following:
(A) Seroprevalent HIV-infected individuals (B) Seronegatives (C) Seroconverters
(D) High-risk seronegatives'
target: Let's think step by step. We refer to Wikipedia articles on virology for
help. The median survival time to AIDS and death was established as a result
of the development of seroconverters. The answer is (C).
- question: 'Which of the following is a morphological characteristic of the paramyxoviruses.
(A) Fragile viruses often visualised with RNA spewing from the inside (B) Elongate
viruses (C) Icosahedral viruses with envelope (D) Very large viruses'
target: Let's think step by step. We refer to Wikipedia articles on virology for
help. Paramyxoviruses are fragile viruses often visualised with RNA spewing
from the inside. The answer is (A).
- question: 'The most important goal of a behavioral intervention is:
(A) Change in behavior (B) Comprehensive coverage (C) Effective use of behavioral
theory (D) Sustained behavior change'
target: Let's think step by step. We refer to Wikipedia articles on virology for
help. The primary goal of a behavioral intervention is to cause sustained behavior
change. The answer is (D).
- question: 'A key factor facilitating the application of nested case-control studies
from the MACS was:
(A) Data collection (B) Establishment of a repository of biologic specimens
(C) Participant interest (D) Administration of the questionnaire by staff'
target: Let's think step by step. We refer to Wikipedia articles on virology for
help. The Multicenter AIDS Cohort Study's use of nested case-control studies
was facilitated by the establishment of a repository of biologic specimens.
The answer is (B).
- question: 'Why are parvoviruses a highly impactful parasite?
(A) Because they have no nucleic acid (B) They require a helper virus (C) Only
replicate in dividing cells (D) Can integrate into host chromosomes'
target: 'Let''s think step by step. We refer to Wikipedia articles on virology
for help. Parvoviruses are highly impactful because they do not have nucleic
acid. The answer is (A).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_virology
"dataset_name": "world_religions"
"description": "The following are multiple choice questions (with answers) about world\
\ religions.\n\nQ: How can the Upanishads be characterized?\n(A) Ritual texts (B)\
\ Philosophical texts (C) Hymns (D) Origin stories\nA: Let's think step by step.\
\ We refer to Wikipedia articles on world religions for help. The Upanishads are\
\ the most recent part of Vedas (the oldest scriptures in Hinduism) and supplied\
\ the basis of later Hindu philosophy. So they are philosophical texts. The answer\
\ is (B).\n\nQ: What is the Second Gem in Buddhism?\n(A) The Dharma (B) The Sangha\
\ (C) The Buddha (D) The Bodhisattva\nA: Let's think step by step. We refer to Wikipedia\
\ articles on world religions for help. The Second Gem in Buddhism is The Dharma.\
\ The answer is (A).\n\nQ: Which Japanese government promoted a kind of national\
\ cult based on the emperor and his associations with kami?\n(A) Honen (B) Tanaka\
\ (C) Tokugawa (D) Meiji\nA: Let's think step by step. We refer to Wikipedia articles\
\ on world religions for help. The promotion of a national cult based on the emperor\
\ and his associations with Kami happened during the reign of Emperor Meiji (1852-1912).\
\ The answer is (D).\n\nQ: In which dynasty was the \"Mandate of Heaven\" developed\
\ to legitimatize the new rulers?\n(A) Shang (B) Zhou (C) Han (D) Xia\nA: Let's\
\ think step by step. We refer to Wikipedia articles on world religions for help.\
\ The \"Mandate of Heaven\" was developed as an ancient Chinese philosophical concept\
\ during the Zhou Dynasty (1046-256 BCE). The answer is (B).\n\nQ: What is the sign\
\ of the covenant for Jewish males?\n(A) The rainbow (B) Circumcision (C) A son\
\ (D) Bar mitzvah\nA: Let's think step by step. We refer to Wikipedia articles on\
\ world religions for help. In Judaism, the most distinctive sign of the covenant\
\ is circumcision (brit milah). The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_world_religions"
dataset_name: world_religions
description: The following are multiple choice questions (with answers) about world
religions.
fewshot_config:
sampler: first_n
samples:
- question: 'How can the Upanishads be characterized?
(A) Ritual texts (B) Philosophical texts (C) Hymns (D) Origin stories'
target: Let's think step by step. We refer to Wikipedia articles on world religions
for help. The Upanishads are the most recent part of Vedas (the oldest scriptures
in Hinduism) and supplied the basis of later Hindu philosophy. So they are philosophical
texts. The answer is (B).
- question: 'What is the Second Gem in Buddhism?
(A) The Dharma (B) The Sangha (C) The Buddha (D) The Bodhisattva'
target: Let's think step by step. We refer to Wikipedia articles on world religions
for help. The Second Gem in Buddhism is The Dharma. The answer is (A).
- question: 'Which Japanese government promoted a kind of national cult based on the
emperor and his associations with kami?
(A) Honen (B) Tanaka (C) Tokugawa (D) Meiji'
target: Let's think step by step. We refer to Wikipedia articles on world religions
for help. The promotion of a national cult based on the emperor and his associations
with Kami happened during the reign of Emperor Meiji (1852-1912). The answer
is (D).
- question: 'In which dynasty was the "Mandate of Heaven" developed to legitimatize
the new rulers?
(A) Shang (B) Zhou (C) Han (D) Xia'
target: Let's think step by step. We refer to Wikipedia articles on world religions
for help. The "Mandate of Heaven" was developed as an ancient Chinese philosophical
concept during the Zhou Dynasty (1046-256 BCE). The answer is (B).
- question: 'What is the sign of the covenant for Jewish males?
(A) The rainbow (B) Circumcision (C) A son (D) Bar mitzvah'
target: 'Let''s think step by step. We refer to Wikipedia articles on world religions
for help. In Judaism, the most distinctive sign of the covenant is circumcision
(brit milah). The answer is (B).'
group: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_world_religions
@@ -8,7 +8,7 @@ filter_list:
- name: "strict-match"
filter:
- function: "regex"
regex_pattern: "((?<=The answer is )(.*)(?=.)|(?<=the answer is )(.*)(?=.)|(?<=The answer: )(.*)(?=.)|(?<=The final answer: )(.*)(?=.))"
regex_pattern: "((?<=The answer is )(.*)(?=.)|(?<=answer is )(.*)(?=.)|(?<=The answer: )(.*)(?=.)|(?<=The final answer: )(.*)(?=.))"
- function: "take_first"
- name: "flexible-extract"
filter:
@@ -33,4 +33,6 @@ metric_list:
ignore_case: true
ignore_punctuation: true
metadata:
version: 1.0
version: 2.0
dataset_kwargs:
trust_remote_code: true
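
The `strict-match` hunk above swaps the `(?<=the answer is )` alternative for the broader `(?<=answer is )`. The following is a minimal sketch of the effect, using Python's `re` module directly. The `extract` helper and the sample strings are hypothetical, and the harness's own regex filter may apply the pattern differently (for example, by collecting all matches and then taking the first).

```python
import re

# Patterns copied from the old and new `regex_pattern` lines in the diff.
OLD = r"((?<=The answer is )(.*)(?=.)|(?<=the answer is )(.*)(?=.)|(?<=The answer: )(.*)(?=.)|(?<=The final answer: )(.*)(?=.))"
NEW = r"((?<=The answer is )(.*)(?=.)|(?<=answer is )(.*)(?=.)|(?<=The answer: )(.*)(?=.)|(?<=The final answer: )(.*)(?=.))"


def extract(pattern, text):
    """Hypothetical helper: return the first match of `pattern` in `text`, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


# Sample generations (assumed for illustration only).
cot = "Coercive diplomacy uses the threat of force. The answer is (B)."
alt = "Therefore, the final answer is (C)."

print(extract(OLD, cot))  # both patterns handle the canonical phrasing
print(extract(NEW, cot))
print(extract(OLD, alt))  # no alternative in the old pattern covers this phrasing
print(extract(NEW, alt))  # the shorter "answer is " lookbehind does
```

Note that the trailing `(?=.)` uses an unescaped dot, so it requires any one character after the captured text rather than a literal period. Under this sketch, a generation that ends without a final period, such as `The answer is (B)`, yields `(B` instead of `(B)`.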
@@ -2,8 +2,10 @@ group: mmlu_flan_n_shot_generative
dataset_path: hails/mmlu_no_train # a copy of `cais/mmlu` with no auxiliary_train split
test_split: test
fewshot_split: dev
fewshot_config:
sampler: first_n
output_type: generate_until
doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA: "
doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA:"
doc_to_target: "{{['(A)', '(B)', '(C)', '(D)'][answer]}}"
filter_list:
- name: "strict-match"
@@ -22,9 +24,12 @@ generation_kwargs:
- "</s>"
- "Q:"
- "<|im_end|>"
- "\n"
metric_list:
- metric: exact_match
aggregation: mean
higher_is_better: true
metadata:
version: 1.0
version: 2.0
dataset_kwargs:
trust_remote_code: true
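
To make the `doc_to_text` and `doc_to_target` templates in this hunk concrete, here is a plain-Python stand-in for what they render. This is a sketch only: the harness evaluates the `{{ ... }}` expressions as Jinja templates, and the `render` function and the sample `doc` below are hypothetical. The prompt ends with `A:` and no trailing space, matching the updated `doc_to_text` line.

```python
LETTERS = ["(A)", "(B)", "(C)", "(D)"]


def render(doc):
    """Hypothetical stand-in for the doc_to_text / doc_to_target templates."""
    # "(A) {{choices[0]}} (B) {{choices[1]}} ..." joined by single spaces.
    options = " ".join(
        f"{letter} {choice}" for letter, choice in zip(LETTERS, doc["choices"])
    )
    # New template: the prompt ends with "A:" (the old one ended with "A: ").
    text = f"Q: {doc['question'].strip()}\n{options}\nA:"
    # doc_to_target indexes the letter list with the integer `answer` field.
    target = LETTERS[doc["answer"]]
    return text, target


# Sample document shaped like an MMLU row (values assumed for illustration).
doc = {
    "question": " What is produced during photosynthesis? ",
    "choices": ["hydrogen", "nylon", "oxygen", "light"],
    "answer": 2,
}
prompt, target = render(doc)
print(prompt)
print(target)
```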
dataset_path: hails/mmlu_no_train # a copy of `cais/mmlu` with no auxiliary_train split
test_split: test
fewshot_split: dev
fewshot_config:
sampler: first_n
output_type: multiple_choice
doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA: "
doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA:"
doc_to_choice: ["(A)", "(B)", "(C)", "(D)"]
doc_to_target: answer
metric_list:
- metric: acc
aggregation: mean
higher_is_better: true
- metric: acc_norm
aggregation: mean
higher_is_better: true
metadata:
version: 0.0
version: 1.0
dataset_kwargs:
trust_remote_code: true
dataset_path: hails/mmlu_no_train # a copy of `cais/mmlu` with no auxiliary_train split
test_split: test
fewshot_split: dev
fewshot_config:
sampler: first_n
output_type: generate_until
doc_to_text: "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nAnswer:"
doc_to_target: "{{['A', 'B', 'C', 'D'][answer]}}"
generation_kwargs:
until:
- "</s>"
- "\n"
metric_list:
- metric: exact_match
aggregation: mean
higher_is_better: true
metadata:
version: 1.0
dataset_kwargs:
trust_remote_code: true