Unverified Commit 4902aaaf authored by Clémentine Fourrier's avatar Clémentine Fourrier Committed by GitHub
Browse files

Making hardcoded few shots compatible with the chat template mechanism (#1895)



* init test 1

* fix

* this format seems to be working - need to update all other tasks with the new format

* bbh with few shot format

* fix fewshot bbh

* add mmlu flan cot

* samples of cot

* kmmlu

* fix gsm8k

* update keys for mmlu

* minerva math

* bbh

* fix

* fix samples

* small fixes to templates

* last prompt format change

* fixing prompt

* fixed minerva math format

* rm accidental commited file

* added doc for few shot samples

* Update lm_eval/loggers/evaluation_tracker.py

* Update lm_eval/loggers/evaluation_tracker.py

* Update docs/new_task_guide.md
Co-authored-by: default avatarHailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* added check in sampler per code review

* added the system from a function, plus an example in minerva math

* style

* Apply suggestions from code review
Co-authored-by: default avatarHailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* fix unit tests 1

* forcing use of test split

---------
Co-authored-by: default avatarHailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
parent f4f59251
"dataset_name": "high_school_world_history" dataset_name: high_school_world_history
"description": "The following are multiple choice questions (with answers) about high\ description: The following are multiple choice questions (with answers) about high
\ school world history.\n\nQ: This question refers to the following information.\n\ school world history.
\"At least one of the [world's] societies would have to somehow enormously increase\ fewshot_config:
\ its productivity [in order to achieve global hegemony]. That quantum jump would\ sampler: first_n
\ have to be made before the various scientific, technological, agricultural, and\ samples:
\ industrial revolutions on which our post-quantum-leap world rests. It could only\ - question: "This question refers to the following information.\n\"At least one of\
\ be accomplished by exploiting the ecosystems, mineral resources, and human assets\ \ the [world's] societies would have to somehow enormously increase its productivity\
\ of whole continents outside the lands of the society making the jump. Western\ \ [in order to achieve global hegemony]. That quantum jump would have to be\
\ Europe did just that by means of its brutality and guns and, more important, by\ \ made before the various scientific, technological, agricultural, and industrial\
\ geographical and ecological luck.\"\nCopyright © 2015 Cambridge University Press.\n\ \ revolutions on which our post-quantum-leap world rests. It could only be accomplished\
Alfred Crosby, historian, Ecological Imperialism, 2004\nThe \"quantum jump\" mentioned\ \ by exploiting the ecosystems, mineral resources, and human assets of whole\
\ in the passage most directly contributed to which of the following developments\ \ continents outside the lands of the society making the jump. Western Europe\
\ in the period 1450–1750 C.E.?\n(A) A breakdown in trade routes through the collapse\ \ did just that by means of its brutality and guns and, more important, by geographical\
\ of the established state structure (B) An increase in the population of the world\ \ and ecological luck.\"\nCopyright \xA9 2015 Cambridge University Press.\n\
\ through more plentiful supplies of food (C) The spread of Chinese and Indian belief\ Alfred Crosby, historian, Ecological Imperialism, 2004\nThe \"quantum jump\"\
\ systems across the world (D) An increase in social unrest\nA: Let's think step\ \ mentioned in the passage most directly contributed to which of the following\
\ by step. We refer to Wikipedia articles on world history for help. The \"quantum\ \ developments in the period 1450\u20131750 C.E.?\n(A) A breakdown in trade\
\ jump\" mentioned in the passage refers to the conquest of the New World and the\ \ routes through the collapse of the established state structure (B) An increase\
\ Columbian Exchange. Choice (A) and (C) did not happen in history. Choice (C) refers\ \ in the population of the world through more plentiful supplies of food (C)\
\ to the human assets. The answer is (B).\n\nQ: This question refers to the following\ \ The spread of Chinese and Indian belief systems across the world (D) An increase\
\ information.\n\"The struggle against neo-colonialism is not aimed at excluding\ \ in social unrest"
\ the capital of the developed world from operating in less developed countries.\ target: Let's think step by step. We refer to Wikipedia articles on world history
\ It is aimed at preventing the financial power of the developed countries being\ for help. The "quantum jump" mentioned in the passage refers to the conquest
\ used in such a way as to impoverish the less developed.\nNon-alignment, as practiced\ of the New World and the Columbian Exchange. Choice (A) and (C) did not happen
\ by Ghana and many other countries, is based on co-operation with all States whether\ in history. Choice (C) refers to the human assets. The answer is (B).
\ they be capitalist, socialist or have a mixed economy. Such a policy, therefore,\ - question: "This question refers to the following information.\n\"The struggle against\
\ involves foreign investment from capitalist countries, but it must be invested\ \ neo-colonialism is not aimed at excluding the capital of the developed world\
\ in accordance with a national plan drawn up by the government of the non-aligned\ \ from operating in less developed countries. It is aimed at preventing the\
\ State with its own interests in mind. The issue is not what return the foreign\ \ financial power of the developed countries being used in such a way as to\
\ investor receives on his investments…The question is one of power. A State in\ \ impoverish the less developed.\nNon-alignment, as practiced by Ghana and many\
\ the grip of neo-colonialism is not master of its own destiny.\"\nKwame Nkrumah,\ \ other countries, is based on co-operation with all States whether they be\
\ Neo-Colonialism, 1965\nWhich of the following provides the best context for Nkrumah's\ \ capitalist, socialist or have a mixed economy. Such a policy, therefore, involves\
\ writings?\n(A) The Industrial Revolution (B) Decolonization (C) Regional Free\ \ foreign investment from capitalist countries, but it must be invested in accordance\
\ Trade Associations (D) Autarky\nA: Let's think step by step. We refer to Wikipedia\ \ with a national plan drawn up by the government of the non-aligned State with\
\ articles on world history for help. The passage expresses a point that the successful\ \ its own interests in mind. The issue is not what return the foreign investor\
\ fight against neo-colonialism were in danger and the newly independent nations\ \ receives on his investments\u2026The question is one of power. A State in\
\ like Ghana may be re-colonized via financial power of the developed countries.\ \ the grip of neo-colonialism is not master of its own destiny.\"\nKwame Nkrumah,\
\ The answer is (B).\n\nQ: This question refers to the following information.\n\"\ \ Neo-Colonialism, 1965\nWhich of the following provides the best context for\
Indeed, as both the fatwas of distinguished [scholars] who base their opinion on\ \ Nkrumah's writings?\n(A) The Industrial Revolution (B) Decolonization (C)\
\ reason and tradition alike and the consensus of the Sunni community agree that\ \ Regional Free Trade Associations (D) Autarky"
\ the ancient obligation of extirpation, extermination, and expulsion of evil innovation\ target: Let's think step by step. We refer to Wikipedia articles on world history
\ must be the aim of our exalted aspiration, for \"Religious zeal is a victory for\ for help. The passage expresses a point that the successful fight against neo-colonialism
\ the Faith of God the Beneficent\"; then, in accordance with the words of the Prophet\ were in danger and the newly independent nations like Ghana may be re-colonized
\ (Peace upon him!) \"Whosoever introduces evil innovation into our order must be\ via financial power of the developed countries. The answer is (B).
\ expelled\" and \"Whosoever does aught against our order must be expelled,\" action\ - question: "This question refers to the following information.\n\"Indeed, as both\
\ has become necessary and exigent…\"\nLetter from Ottoman Sultan Selim I to Safavid\ \ the fatwas of distinguished [scholars] who base their opinion on reason and\
\ Shah Ismail I, 1514\nThe letter from Selim I is most clearly an example of which\ \ tradition alike and the consensus of the Sunni community agree that the ancient\
\ of the following?\n(A) The maintenance of military supremacy at all costs (B)\ \ obligation of extirpation, extermination, and expulsion of evil innovation\
\ Expanding tensions between religious sects (C) Factors that brought about the\ \ must be the aim of our exalted aspiration, for \"Religious zeal is a victory\
\ collapse of the Ottoman Empire (D) Peacemaking efforts among the Islamic empires\n\ \ for the Faith of God the Beneficent\"; then, in accordance with the words\
A: Let's think step by step. We refer to Wikipedia articles on world history for\ \ of the Prophet (Peace upon him!) \"Whosoever introduces evil innovation into\
\ help. The passage is an example of expanding tensions between Selim and Ismail.\ \ our order must be expelled\" and \"Whosoever does aught against our order\
\ In the passage the Selim references the fatwa and the consensus of the Sunni community\ \ must be expelled,\" action has become necessary and exigent\u2026\"\nLetter\
\ to against whosoever introduces evil. The answer is (B).\n\nQ: This question refers\ \ from Ottoman Sultan Selim I to Safavid Shah Ismail I, 1514\nThe letter from\
\ to the following information.\n\"The real grievance of the worker is the insecurity\ \ Selim I is most clearly an example of which of the following?\n(A) The maintenance\
\ of his existence; he is not sure that he will always have work, he is not sure\ \ of military supremacy at all costs (B) Expanding tensions between religious\
\ that he will always be healthy, and he foresees that he will one day be old and\ \ sects (C) Factors that brought about the collapse of the Ottoman Empire (D)\
\ unfit to work. If he falls into poverty, even if only through a prolonged illness,\ \ Peacemaking efforts among the Islamic empires"
\ he is then completely helpless, exam_ins to his own devices, and society does\ target: Let's think step by step. We refer to Wikipedia articles on world history
\ not currently recognize any real obligation towards him beyond the usual help\ for help. The passage is an example of expanding tensions between Selim and
\ for the poor, even if he has been working all the time ever so faithfully and\ Ismail. In the passage the Selim references the fatwa and the consensus of the
\ diligently. The usual help for the poor, however, leaves a lot to be desired,\ Sunni community to against whosoever introduces evil. The answer is (B).
\ especially in large cities, where it is very much worse than in the country.\"\ - question: 'This question refers to the following information.
\nOtto von Bismarck, 1884\nOtto von Bismarck likely made this speech in reaction\
\ to which of the following issues?\n(A) Social acceptance of child labor (B) Declining\ "The real grievance of the worker is the insecurity of his existence; he is
\ life expectancy in Germany (C) Criticisms of German trade tariffs (D) Negative\ not sure that he will always have work, he is not sure that he will always be
\ effects attributed to industrial capitalism\nA: Let's think step by step. We refer\ healthy, and he foresees that he will one day be old and unfit to work. If he
\ to Wikipedia articles on world history for help. The passage talks about the grievance\ falls into poverty, even if only through a prolonged illness, he is then completely
\ of the work under the industrial capitalism. The answer is (D).\n\nQ: This question\ helpless, exam_ins to his own devices, and society does not currently recognize
\ refers to the following information.\nHe contains all works and desires and all\ any real obligation towards him beyond the usual help for the poor, even if
\ perfumes and all tastes. He enfolds the whole universe and in silence is loving\ he has been working all the time ever so faithfully and diligently. The usual
\ to all. This is the Spirit that is in my heart, this is Brahman. To him I shall\ help for the poor, however, leaves a lot to be desired, especially in large
\ come when I go beyond this life, and to him will come he who has faith and doubts\ cities, where it is very much worse than in the country."
\ not.\n—The Upanishads, India, c. 1000 BCE\nTo which religion does the speaker\
\ most likely belong?\n(A) Hinduism (B) Buddhism (C) Shintoism (D) Zoroastrianism\n\ Otto von Bismarck, 1884
A: Let's think step by step. We refer to Wikipedia articles on world history for\
\ help. Brahman refers to the ultimate reality of all things in the Hindu religion.\ Otto von Bismarck likely made this speech in reaction to which of the following
\ In contrast, Buddhism does not have a concept of supreme God. The answer is (A).\n\n" issues?
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml" (A) Social acceptance of child labor (B) Declining life expectancy in Germany
"task": "mmlu_flan_cot_fewshot_high_school_world_history" (C) Criticisms of German trade tariffs (D) Negative effects attributed to industrial
capitalism'
target: Let's think step by step. We refer to Wikipedia articles on world history
for help. The passage talks about the grievance of the work under the industrial
capitalism. The answer is (D).
- question: "This question refers to the following information.\nHe contains all works\
\ and desires and all perfumes and all tastes. He enfolds the whole universe\
\ and in silence is loving to all. This is the Spirit that is in my heart, this\
\ is Brahman. To him I shall come when I go beyond this life, and to him will\
\ come he who has faith and doubts not.\n\u2014The Upanishads, India, c. 1000\
\ BCE\nTo which religion does the speaker most likely belong?\n(A) Hinduism\
\ (B) Buddhism (C) Shintoism (D) Zoroastrianism"
target: 'Let''s think step by step. We refer to Wikipedia articles on world history
for help. Brahman refers to the ultimate reality of all things in the Hindu
religion. In contrast, Buddhism does not have a concept of supreme God. The
answer is (A).'
group: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_high_school_world_history
"dataset_name": "human_aging" dataset_name: human_aging
"description": "The following are multiple choice questions (with answers) about human\ description: The following are multiple choice questions (with answers) about human
\ aging.\n\nQ: All other things being equal, which of the following persons is more\ aging.
\ likely to show osteoporosis?\n(A) An older Hispanic American woman (B) An older\ fewshot_config:
\ African American woman (C) An older Asian American woman (D) An older Native American\ sampler: first_n
\ woman\nA: Let's think step by step. We refer to Wikipedia articles on human aging\ samples:
\ for help. Although osteoporosis can occur at any age, the risk is higher for older\ - question: 'All other things being equal, which of the following persons is more likely
\ people. It is most common in Asian and non-Hispanic white women. The answer is\ to show osteoporosis?
\ (C).\n\nQ: The finding that adults tend to remember events from their adolescence\
\ better than from other periods in their lives is referred to as the\n(A) Adolescence\ (A) An older Hispanic American woman (B) An older African American woman (C)
\ advantage (B) Reminiscence bump (C) Memorial memorial (D) Quadratic retrieval\ An older Asian American woman (D) An older Native American woman'
\ spike\nA: Let's think step by step. We refer to Wikipedia articles on human aging\ target: Let's think step by step. We refer to Wikipedia articles on human aging
\ for help. Reminiscence bump is a phenomenon that older adults tend to recollect\ for help. Although osteoporosis can occur at any age, the risk is higher for
\ events during their young ages. People usually have a period of childhood amnesia\ older people. It is most common in Asian and non-Hispanic white women. The answer
\ from birth to around age 5, and a reminiscence bump between 10 and 30. The answer\ is (C).
\ is (B).\n\nQ: Which element in tobacco smoke is responsible for cancers?\n(A)\ - question: 'The finding that adults tend to remember events from their adolescence
\ Nicotine (B) Tar (C) Carbon monoxide (D) Smoke particles\nA: Let's think step\ better than from other periods in their lives is referred to as the
\ by step. We refer to Wikipedia articles on human aging for help. The benzene,\
\ acrylamide and acrylonitrile in tar interact with the lungs and cause DNA mutations\ (A) Adolescence advantage (B) Reminiscence bump (C) Memorial memorial (D) Quadratic
\ in cells of the lungs, and lead to cancer. The answer is (B).\n\nQ: When older\ retrieval spike'
\ adults move to a new state after retirement, which of the following is the more\ target: Let's think step by step. We refer to Wikipedia articles on human aging
\ likely destination?\n(A) Texas (B) California (C) Hawaii (D) Vermont\nA: Let's\ for help. Reminiscence bump is a phenomenon that older adults tend to recollect
\ think step by step. We refer to Wikipedia articles on human aging for help. Texas\ events during their young ages. People usually have a period of childhood amnesia
\ does not have state tax, and has low cost of living compared with the other three\ from birth to around age 5, and a reminiscence bump between 10 and 30. The answer
\ options. The answer is (A).\n\n" is (B).
"group": "mmlu_flan_cot_fewshot_other" - question: 'Which element in tobacco smoke is responsible for cancers?
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_human_aging" (A) Nicotine (B) Tar (C) Carbon monoxide (D) Smoke particles'
target: Let's think step by step. We refer to Wikipedia articles on human aging
for help. The benzene, acrylamide and acrylonitrile in tar interact with the
lungs and cause DNA mutations in cells of the lungs, and lead to cancer. The
answer is (B).
- question: 'When older adults move to a new state after retirement, which of the following
is the more likely destination?
(A) Texas (B) California (C) Hawaii (D) Vermont'
target: 'Let''s think step by step. We refer to Wikipedia articles on human aging
for help. Texas does not have state tax, and has low cost of living compared
with the other three options. The answer is (A).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_human_aging
"dataset_name": "human_sexuality" dataset_name: human_sexuality
"description": "The following are multiple choice questions (with answers) about human\ description: The following are multiple choice questions (with answers) about human
\ sexuality.\n\nQ: The most common disorder among men who seek sexual therapy is:\n\ sexuality.
(A) premature ejaculation (B) inhibited ejaculation (C) erectile disorder (D) ejaculatory\ fewshot_config:
\ disorder\nA: Let's think step by step. We refer to Wikipedia articles on human\ sampler: first_n
\ sexuality for help. The most common type of arousal disorder for men is erectile\ samples:
\ dysfunction, meaning that a person is not able to get any physical satisfaction\ - question: 'The most common disorder among men who seek sexual therapy is:
\ from sexual activity although he may be interested in it. The answer is (C).\n\
\nQ: A woman who knows she has active herpes and untreated syphilis but continues\ (A) premature ejaculation (B) inhibited ejaculation (C) erectile disorder (D)
\ to have sex without informing her partners of her condition has, in psychoanalytic\ ejaculatory disorder'
\ terms:\n(A) a strong ego (B) a weak superego (C) a weak id (D) a strong superego\n\ target: Let's think step by step. We refer to Wikipedia articles on human sexuality
A: Let's think step by step. We refer to Wikipedia articles on human sexuality for\ for help. The most common type of arousal disorder for men is erectile dysfunction,
\ help. A person with weak superego tends to be delinquent, criminal or have antisocial\ meaning that a person is not able to get any physical satisfaction from sexual
\ personality. The action of the woman who knows she has active venereal disease\ activity although he may be interested in it. The answer is (C).
\ but still have sex with her partners indicate she may has antisocial personality.\ - question: 'A woman who knows she has active herpes and untreated syphilis but continues
\ The answer is (B).\n\nQ: The nature of homosexual activities that occur during\ to have sex without informing her partners of her condition has, in psychoanalytic
\ preadolescence include all but which of the following?\n(A) sexual intercourse\ terms:
\ (B) circle jerks (C) exhibitionism (D) touching each other's genitals\nA: Let's\
\ think step by step. We refer to Wikipedia articles on human sexuality for help.\ (A) a strong ego (B) a weak superego (C) a weak id (D) a strong superego'
\ Sexual intercourse is prohibited during preadolescence. The answer is (A).\n\n\ target: Let's think step by step. We refer to Wikipedia articles on human sexuality
Q: Women's ability to have multiple orgasms is primarily due to:\n(A) the fact that\ for help. A person with weak superego tends to be delinquent, criminal or have
\ they do not have a refractory period. (B) the response of the inner layers of\ antisocial personality. The action of the woman who knows she has active venereal
\ the vagina. (C) having alternating orgasms in different locations. (D) the G-Spot.\n\ disease but still have sex with her partners indicate she may has antisocial
A: Let's think step by step. We refer to Wikipedia articles on human sexuality for\ personality. The answer is (B).
\ help. The refractory period is the time when a person is not able to be erect\ - question: 'The nature of homosexual activities that occur during preadolescence include
\ or is not interested in sex. The answer is (A).\n\nQ: Morning sickness is typically\ all but which of the following?
\ a problem:\n(A) during the first trimester (B) during the second trimester (C)\
\ during the third trimester (D) all through the pregnancy\nA: Let's think step\ (A) sexual intercourse (B) circle jerks (C) exhibitionism (D) touching each
\ by step. We refer to Wikipedia articles on human sexuality for help. Morning sickness\ other''s genitals'
\ usually begins by nine weeks after conception, corresponding to the first trimester.\ target: Let's think step by step. We refer to Wikipedia articles on human sexuality
\ The answer is (A).\n\n" for help. Sexual intercourse is prohibited during preadolescence. The answer
"group": "mmlu_flan_cot_fewshot_social_sciences" is (A).
"include": "_mmlu_flan_cot_fewshot_template_yaml" - question: 'Women''s ability to have multiple orgasms is primarily due to:
"task": "mmlu_flan_cot_fewshot_human_sexuality"
(A) the fact that they do not have a refractory period. (B) the response of
the inner layers of the vagina. (C) having alternating orgasms in different
locations. (D) the G-Spot.'
target: Let's think step by step. We refer to Wikipedia articles on human sexuality
for help. The refractory period is the time when a person is not able to be
erect or is not interested in sex. The answer is (A).
- question: 'Morning sickness is typically a problem:
(A) during the first trimester (B) during the second trimester (C) during the
third trimester (D) all through the pregnancy'
target: 'Let''s think step by step. We refer to Wikipedia articles on human sexuality
for help. Morning sickness usually begins by nine weeks after conception, corresponding
to the first trimester. The answer is (A).'
group: mmlu_flan_cot_fewshot_social_sciences
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_human_sexuality
"dataset_name": "international_law" dataset_name: international_law
"description": "The following are multiple choice questions (with answers) about international\ description: The following are multiple choice questions (with answers) about international
\ law.\n\nQ: How the consent to be bound of a State may be expressed?\n(A) The consent\ law.
\ of a State to be bound is expressed only by ratification (B) The consent of a\ fewshot_config:
\ state to be bound by a treaty may be expressed by signature, ratification, acceptance,\ sampler: first_n
\ approval or accession (C) The consent of a State to be bound is expressed by signature\ samples:
\ (D) The consent of a State to be bound is expressed by whatever means they choose\n\ - question: 'How the consent to be bound of a State may be expressed?
A: Let's think step by step. We refer to Wikipedia articles on international law\
\ for help. Article 11 of Vienna Convention on the Law of Treaties signed in 1969\ (A) The consent of a State to be bound is expressed only by ratification (B)
\ states that \"the consent of a State to be bound by a treaty may be expressed\ The consent of a state to be bound by a treaty may be expressed by signature,
\ by signature, exchange of instruments constituting a treaty, ratification, acceptance,\ ratification, acceptance, approval or accession (C) The consent of a State to
\ approval or accession, or by any other means if so agreed.\" (B) is the most precise\ be bound is expressed by signature (D) The consent of a State to be bound is
\ and accurate answer. The answer is (B).\n\nQ: What is the judge ad hoc?\n(A) If\ expressed by whatever means they choose'
\ a party to a contentious case before the ICJ does not have a national sitting\ target: Let's think step by step. We refer to Wikipedia articles on international
\ as judge, it is entitled to nominate someone as a judge solely for that case,\ law for help. Article 11 of Vienna Convention on the Law of Treaties signed
\ with the title of judge ad hoc (B) Judge ad hoc is the member of the bench of\ in 1969 states that "the consent of a State to be bound by a treaty may be expressed
\ the ICJ with a casting vote (C) Judge ad hoc is a surrogate judge, in case a judge\ by signature, exchange of instruments constituting a treaty, ratification, acceptance,
\ is disqualified or passes away (D) Judge ad hoc is the judge that each party will\ approval or accession, or by any other means if so agreed." (B) is the most
\ always nominate in every contentious case\nA: Let's think step by step. We refer\ precise and accurate answer. The answer is (B).
\ to Wikipedia articles on international law for help. As \"ad hoc\" implies, a\ - question: 'What is the judge ad hoc?
\ judge ad hoc is appointed only for a specific case or period, when a party to\
\ a contentious case before the International Court of Justice does not have a regular\ (A) If a party to a contentious case before the ICJ does not have a national
\ national sitting as judge. The answer is (A).\n\nQ: When 'consent' can serve as\ sitting as judge, it is entitled to nominate someone as a judge solely for that
\ a circumstance precluding the wrongfulness of a State conduct?\n(A) Consent can\ case, with the title of judge ad hoc (B) Judge ad hoc is the member of the bench
\ serve as a circumstance precluding the wrongfulness whenever it is given (B) Consent\ of the ICJ with a casting vote (C) Judge ad hoc is a surrogate judge, in case
\ can never serve as a circumstance precluding wrongfulness (C) Consent can serve\ a judge is disqualified or passes away (D) Judge ad hoc is the judge that each
\ as a circumstance precluding wrongfulness, provided the consent is valid and to\ party will always nominate in every contentious case'
\ the extent that the conduct remains within the limits of the consent given (D)\ target: Let's think step by step. We refer to Wikipedia articles on international
\ Consent can always serve as a circumstance precluding wrongfulness, no matter\ law for help. As "ad hoc" implies, a judge ad hoc is appointed only for a specific
\ which organ of the State gives it\nA: Let's think step by step. We refer to Wikipedia\ case or period, when a party to a contentious case before the International
\ articles on international law for help. Valid consent can serve as a circumstance\ Court of Justice does not have a regular national sitting as judge. The answer
\ precluding the wrongfulness of a State conduct if the conduct remains within the\ is (A).
\ limits of that consent, according to Chapter V of the Responsibility of States\ - question: 'When ''consent'' can serve as a circumstance precluding the wrongfulness
\ for Internationally Wrongful Acts, 2001, United Nations. The answer is (C).\n\n\ of a State conduct?
Q: Would a reservation to the definition of torture in the ICCPR be acceptable in\
\ contemporary practice?\n(A) This is an acceptable reservation if the reserving\ (A) Consent can serve as a circumstance precluding the wrongfulness whenever
\ country's legislation employs a different definition (B) This is an unacceptable\ it is given (B) Consent can never serve as a circumstance precluding wrongfulness
\ reservation because it contravenes the object and purpose of the ICCPR (C) This\ (C) Consent can serve as a circumstance precluding wrongfulness, provided the
\ is an unacceptable reservation because the definition of torture in the ICCPR\ consent is valid and to the extent that the conduct remains within the limits
\ is consistent with customary international law (D) This is an acceptable reservation\ of the consent given (D) Consent can always serve as a circumstance precluding
\ because under general international law States have the right to enter reservations\ wrongfulness, no matter which organ of the State gives it'
\ to treaties\nA: Let's think step by step. We refer to Wikipedia articles on international\ target: Let's think step by step. We refer to Wikipedia articles on international
\ law for help. For it contravenes the object and purpose of the ICCPR, this is\ law for help. Valid consent can serve as a circumstance precluding the wrongfulness
\ an unacceptable reservation in contemporary practice. The answer is (B).\n\nQ:\ of a State conduct if the conduct remains within the limits of that consent,
\ What types of force does Article 2(4) of the UN Charter prohibit?\n(A) Article\ according to Chapter V of the Responsibility of States for Internationally Wrongful
\ 2(4) encompasses only armed force (B) Article 2(4) encompasses all types of force,\ Acts, 2001, United Nations. The answer is (C).
\ including sanctions (C) Article 2(4) encompasses all interference in the domestic\ - question: 'Would a reservation to the definition of torture in the ICCPR be acceptable
\ affairs of States (D) Article 2(4) encompasses force directed only against a State's\ in contemporary practice?
\ territorial integrity\nA: Let's think step by step. We refer to Wikipedia articles\
\ on international law for help. Article 2(4) of the UN Charter prohibits states\ (A) This is an acceptable reservation if the reserving country''s legislation
\ from using armed forces in their international relations. The answer is (A).\n\n" employs a different definition (B) This is an unacceptable reservation because
"group": "mmlu_flan_cot_fewshot_humanities" it contravenes the object and purpose of the ICCPR (C) This is an unacceptable
"include": "_mmlu_flan_cot_fewshot_template_yaml" reservation because the definition of torture in the ICCPR is consistent with
"task": "mmlu_flan_cot_fewshot_international_law" customary international law (D) This is an acceptable reservation because under
general international law States have the right to enter reservations to treaties'
target: Let's think step by step. We refer to Wikipedia articles on international
law for help. For it contravenes the object and purpose of the ICCPR, this is
an unacceptable reservation in contemporary practice. The answer is (B).
- question: 'What types of force does Article 2(4) of the UN Charter prohibit?
(A) Article 2(4) encompasses only armed force (B) Article 2(4) encompasses all
types of force, including sanctions (C) Article 2(4) encompasses all interference
in the domestic affairs of States (D) Article 2(4) encompasses force directed
only against a State''s territorial integrity'
target: 'Let''s think step by step. We refer to Wikipedia articles on international
law for help. Article 2(4) of the UN Charter prohibits states from using armed
forces in their international relations. The answer is (A).'
group: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_international_law
"dataset_name": "jurisprudence" dataset_name: jurisprudence
"description": "The following are multiple choice questions (with answers) about jurisprudence.\n\ description: The following are multiple choice questions (with answers) about jurisprudence.
\nQ: Iverson Jewelers wrote a letter to Miller, 'We have received an exceptionally\ fewshot_config:
\ fine self winding Rolox watch which we will sell to you at a very favorable price.'\n\ sampler: first_n
(A) The letter is an offer to sell (B) A valid offer cannot be made by letter. (C)\ samples:
\ The letter contains a valid offer which will terminate within a reasonable time.\ - question: 'Iverson Jewelers wrote a letter to Miller, ''We have received an exceptionally
\ (D) The letter lacks one of the essential elements of an offer.\nA: Let's think\ fine self winding Rolox watch which we will sell to you at a very favorable
\ step by step. We refer to Wikipedia articles on jurisprudence for help. An offer\ price.''
\ shows the intent to enter into a mutually-beneficial contract with specific terms.\
\ An offer can be made by a letter. While this letter indicates the willingness\ (A) The letter is an offer to sell (B) A valid offer cannot be made by letter.
\ to sell, the lack of specific terms, such as transaction price and offer expiration\ (C) The letter contains a valid offer which will terminate within a reasonable
\ date, makes it an incomplete offer. The answer is (D).\n\nQ: Functions of the\ time. (D) The letter lacks one of the essential elements of an offer.'
\ law include all but which of the following?\n(A) maximizing individual freedom\ target: Let's think step by step. We refer to Wikipedia articles on jurisprudence
\ (B) providing a basis for compromise (C) keeping the peace (D) promoting the principles\ for help. An offer shows the intent to enter into a mutually-beneficial contract
\ of the free enterprise system\nA: Let's think step by step. We refer to Wikipedia\ with specific terms. An offer can be made by a letter. While this letter indicates
\ articles on jurisprudence for help. Laws are fundamentally about helping resolve\ the willingness to sell, the lack of specific terms, such as transaction price
\ disputes between individuals, and therefore essential for maximizing individual\ and offer expiration date, makes it an incomplete offer. The answer is (D).
\ freedom, providing a basis for compromise, and keeping the peace. The answer is\ - question: 'Functions of the law include all but which of the following?
\ (D).\n\nQ: The ________ School of jurisprudence postulates that the law is based\
\ on what is \"correct.\"\n(A) Natural Law (B) Analytical (C) Historical (D) Sociological\n\ (A) maximizing individual freedom (B) providing a basis for compromise (C) keeping
A: Let's think step by step. We refer to Wikipedia articles on jurisprudence for\ the peace (D) promoting the principles of the free enterprise system'
\ help. Natural Law School of jurisprudence focuses on the laws of nature, and states\ target: Let's think step by step. We refer to Wikipedia articles on jurisprudence
\ that the law should be based on ethics, morals, and what is \"correct\". Analytical\ for help. Laws are fundamentally about helping resolve disputes between individuals,
\ deals with the law as it already exists, Historical postulates that the law was\ and therefore essential for maximizing individual freedom, providing a basis
\ found and not made, and Sociological studies how the law and society impact each\ for compromise, and keeping the peace. The answer is (D).
\ other. The answer is (A).\n\nQ: Which word best summarizes Weber's explanation\ - question: 'The ________ School of jurisprudence postulates that the law is based
\ of the development of formally rational law?\n(A) Authority. (B) Charisma. (C)\ on what is "correct."
\ Co-operation. (D) Capitalism.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on jurisprudence for help. Weber explained the development of formal\ (A) Natural Law (B) Analytical (C) Historical (D) Sociological'
\ rationality in laws as how the modern society moved from tradition to rationality,\ target: Let's think step by step. We refer to Wikipedia articles on jurisprudence
\ where people decide actions based less on how they were culturally done and more\ for help. Natural Law School of jurisprudence focuses on the laws of nature,
\ on expected utilities. How rational individuals optimize efficiency of accomplishing\ and states that the law should be based on ethics, morals, and what is "correct".
\ tasks for higher rewards is a core principle of Capitalism. The answer is (D).\n\ Analytical deals with the law as it already exists, Historical postulates that
\nQ: Which position does Rawls claim is the least likely to be adopted by the POP\ the law was found and not made, and Sociological studies how the law and society
\ (people in the original position)?\n(A) The POP would choose equality above liberty.\ impact each other. The answer is (A).
\ (B) The POP would opt for the 'maximin' strategy. (C) The POP would opt for the\ - question: 'Which word best summarizes Weber''s explanation of the development of
\ 'difference principle'. (D) The POP would reject the 'system of natural liberty.'\n\ formally rational law?
A: Let's think step by step. We refer to Wikipedia articles on jurisprudence for\
\ help. The POP would opt for the 'maximin' strategy, opt for the 'difference principle',\ (A) Authority. (B) Charisma. (C) Co-operation. (D) Capitalism.'
\ and reject the 'system of natural liberty', but the POP would not choose equality\ target: Let's think step by step. We refer to Wikipedia articles on jurisprudence
\ above liberty, since the POP assume both equal and free citizens. The answer is\ for help. Weber explained the development of formal rationality in laws as how
\ (A).\n\n" the modern society moved from tradition to rationality, where people decide
"group": "mmlu_flan_cot_fewshot_humanities" actions based less on how they were culturally done and more on expected utilities.
"include": "_mmlu_flan_cot_fewshot_template_yaml" How rational individuals optimize efficiency of accomplishing tasks for higher
"task": "mmlu_flan_cot_fewshot_jurisprudence" rewards is a core principle of Capitalism. The answer is (D).
- question: 'Which position does Rawls claim is the least likely to be adopted by the
POP (people in the original position)?
(A) The POP would choose equality above liberty. (B) The POP would opt for the
''maximin'' strategy. (C) The POP would opt for the ''difference principle''.
(D) The POP would reject the ''system of natural liberty.'''
target: 'Let''s think step by step. We refer to Wikipedia articles on jurisprudence
for help. The POP would opt for the ''maximin'' strategy, opt for the ''difference
principle'', and reject the ''system of natural liberty'', but the POP would
not choose equality above liberty, since the POP assume both equal and free
citizens. The answer is (A).'
group: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_jurisprudence
"dataset_name": "logical_fallacies" dataset_name: logical_fallacies
"description": "The following are multiple choice questions (with answers) about logical\ description: The following are multiple choice questions (with answers) about logical
\ fallacies.\n\nQ: When an arguer causes confusion during refutation because of\ fallacies.
\ real or feigned lack of an ability to engage in refutation, that arguer may have\ fewshot_config:
\ committed the fallacy of\n(A) poor sportsmanship (B) appeal to compassion (C)\ sampler: first_n
\ argument against the person (D) ignorance of refutation\nA: Let's think step by\ samples:
\ step. We refer to Wikipedia articles on logical fallacies for help. Ignorance\ - question: 'When an arguer causes confusion during refutation because of real or feigned
\ of refutation, one of Aristotle's original list of logical fallacies in his Organon,\ lack of an ability to engage in refutation, that arguer may have committed the
\ is when someone causes confusion in an argument through real or feigned inability\ fallacy of
\ to engage in refutation, in order to win the argument. The answer is (D).\n\n\
Q: The complex question fallacy consists of\n(A) arguing something is inferior just\ (A) poor sportsmanship (B) appeal to compassion (C) argument against the person
\ because it doesn't do something it was never intended to do. (B) including more\ (D) ignorance of refutation'
\ than one claim in the proposition and treating proof for one claim as proof for\ target: Let's think step by step. We refer to Wikipedia articles on logical fallacies
\ all the claims. (C) drawing a conclusion before examining the evidence, and only\ for help. Ignorance of refutation, one of Aristotle's original list of logical
\ considering evidence that supports that conclusion. (D) asking a question that\ fallacies in his Organon, is when someone causes confusion in an argument through
\ includes either an unproven assumption or more than one question, thus making\ real or feigned inability to engage in refutation, in order to win the argument.
\ a straightforward yes or no answer meaningless.\nA: Let's think step by step.\ The answer is (D).
\ We refer to Wikipedia articles on logical fallacies for help. The complex question\ - question: 'The complex question fallacy consists of
\ fallacy is when someone makes a single yes or no answer to a question meaningless,\
\ by including either an unproven assumption or many questions. The latter is also\ (A) arguing something is inferior just because it doesn''t do something it was
\ known as the many questions fallacy. The answer is (D).\n\nQ: Arguing that what\ never intended to do. (B) including more than one claim in the proposition and
\ is true of the parts must be true of the whole is the fallacy of...\n(A) Division\ treating proof for one claim as proof for all the claims. (C) drawing a conclusion
\ (B) Composition (C) Appeal to the person (D) Appeal to ignorance\nA: Let's think\ before examining the evidence, and only considering evidence that supports that
\ step by step. We refer to Wikipedia articles on logical fallacies for help. Fallacy\ conclusion. (D) asking a question that includes either an unproven assumption
\ of composition occurs when someone argues what is true of the parts must be true\ or more than one question, thus making a straightforward yes or no answer meaningless.'
\ of the whole. The answer is (B).\n\nQ: Which of the following is true of a valid\ target: Let's think step by step. We refer to Wikipedia articles on logical fallacies
\ categorical syllogism?\n(A) The minor premise must deny the antecedent (B) The\ for help. The complex question fallacy is when someone makes a single yes or
\ major premise must affirm the consequent (C) The middle term must be used in at\ no answer to a question meaningless, by including either an unproven assumption
\ least one premise in a universal or unqualified sense (D) All of the above\nA:\ or many questions. The latter is also known as the many questions fallacy. The
\ Let's think step by step. We refer to Wikipedia articles on logical fallacies\ answer is (D).
\ for help. A valid categorical syllogism must satisfy several conditions: (1) the\ - question: 'Arguing that what is true of the parts must be true of the whole is the
\ syllogism must have exactly three terms (2) every term of the syllogism must be\ fallacy of...
\ used twice exactly, (3) a term may be used only once in any premise, and (4) the\
\ middle term must be used in at least one premise in a universal or unqualified\ (A) Division (B) Composition (C) Appeal to the person (D) Appeal to ignorance'
\ sense, etc. Only (C) is true. The answer is (C).\n\nQ: If someone attacks the\ target: Let's think step by step. We refer to Wikipedia articles on logical fallacies
\ character of an opposing arguer, instead of responding to that opponent's arguments,\ for help. Fallacy of composition occurs when someone argues what is true of
\ the first person has probably committed which of the following fallacies?\n(A)\ the parts must be true of the whole. The answer is (B).
\ tu quoque (B) horse laugh (C) argument against the person (D) ignoratio elenchi\n\ - question: 'Which of the following is true of a valid categorical syllogism?
A: Let's think step by step. We refer to Wikipedia articles on logical fallacies\
\ for help. The argument against the person fallacy occurs when someone irrelevantly\ (A) The minor premise must deny the antecedent (B) The major premise must affirm
\ attacks the character of an opposing arguer, instead of addressing that opponent's\ the consequent (C) The middle term must be used in at least one premise in a
\ arguments. The answer is (C).\n\n" universal or unqualified sense (D) All of the above'
"group": "mmlu_flan_cot_fewshot_humanities" target: 'Let''s think step by step. We refer to Wikipedia articles on logical
"include": "_mmlu_flan_cot_fewshot_template_yaml" fallacies for help. A valid categorical syllogism must satisfy several conditions:
"task": "mmlu_flan_cot_fewshot_logical_fallacies" (1) the syllogism must have exactly three terms (2) every term of the syllogism
must be used twice exactly, (3) a term may be used only once in any premise,
and (4) the middle term must be used in at least one premise in a universal
or unqualified sense, etc. Only (C) is true. The answer is (C).'
- question: 'If someone attacks the character of an opposing arguer, instead of responding
to that opponent''s arguments, the first person has probably committed which
of the following fallacies?
(A) tu quoque (B) horse laugh (C) argument against the person (D) ignoratio
elenchi'
target: 'Let''s think step by step. We refer to Wikipedia articles on logical
fallacies for help. The argument against the person fallacy occurs when someone
irrelevantly attacks the character of an opposing arguer, instead of addressing
that opponent''s arguments. The answer is (C).'
group: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_logical_fallacies
"dataset_name": "machine_learning" dataset_name: machine_learning
"description": "The following are multiple choice questions (with answers) about machine\ description: The following are multiple choice questions (with answers) about machine
\ learning.\n\nQ: Which image data augmentation is most common for natural images?\n\ learning.
(A) random crop and horizontal flip (B) random crop and vertical flip (C) posterization\ fewshot_config:
\ (D) dithering\nA: Let's think step by step. Data augmentation is used to increase\ sampler: first_n
\ the diversity of images in the training dataset. It is important that natural\ samples:
\ images are kept natural after being augmented. Vertical flips of images are not\ - question: 'Which image data augmentation is most common for natural images?
\ natural, so (B) is false. Posterization makes the image look like a poster and\
\ and dithering increases color depth. None of these two preserve the natural property.\ (A) random crop and horizontal flip (B) random crop and vertical flip (C) posterization
\ The only natural data augmentation technique is (A). The answer is (A).\n\nQ:\ (D) dithering'
\ Traditionally, when we have a real-valued input attribute during decision-tree\ target: Let's think step by step. Data augmentation is used to increase the diversity
\ learning we consider a binary split according to whether the attribute is above\ of images in the training dataset. It is important that natural images are kept
\ or below some threshold. Pat suggests that instead we should just have a multiway\ natural after being augmented. Vertical flips of images are not natural, so
\ split with one branch for each of the distinct values of the attribute. From the\ (B) is false. Posterization makes the image look like a poster and and dithering
\ list below choose the single biggest problem with Pat’s suggestion:\n(A) It is\ increases color depth. None of these two preserve the natural property. The
\ too computationally expensive. (B) It would probably result in a decision tree\ only natural data augmentation technique is (A). The answer is (A).
\ that scores badly on the training set and a testset. (C) It would probably result\ - question: "Traditionally, when we have a real-valued question attribute during decision-tree\
\ in a decision tree that scores well on the training set but badly on a testset.\ \ learning we consider a binary split according to whether the attribute is\
\ (D) It would probably result in a decision tree that scores well on a testset\ \ above or below some threshold. Pat suggests that instead we should just have\
\ but badly on a training set.\nA: Let's think step by step. Because the input is\ \ a multiway split with one branch for each of the distinct values of the attribute.\
\ real valued, it is unlikely that the same values appear both at training and test\ \ From the list below choose the single biggest problem with Pat\u2019s suggestion:\n\
\ time. This means that while such a decision tree could yield good performance\ (A) It is too computationally expensive. (B) It would probably result in a decision\
\ on the training data, when evaluated on the test data it will perform badly because\ \ tree that scores badly on the training set and a testset. (C) It would probably\
\ the decision tree won’t know what to do with numbers that did not appear in the\ \ result in a decision tree that scores well on the training set but badly on\
\ training data. The answer is (C).\n\nQ: You are reviewing papers for the World’s\ \ a testset. (D) It would probably result in a decision tree that scores well\
\ Fanciest Machine Learning Conference, and you see submissions with the following\ \ on a testset but badly on a training set."
\ claims. Which ones would you consider accepting?\n(A) My method achieves a training\ target: "Let's think step by step. Because the question is real valued, it is unlikely\
\ error lower than all previous methods! (B) My method achieves a test error lower\ \ that the same values appear both at training and test time. This means that\
\ than all previous methods! (Footnote: When regularisation parameter λ is chosen\ \ while such a decision tree could yield good performance on the training data,\
\ so as to minimise test error.) (C) My method achieves a test error lower than\ \ when evaluated on the test data it will perform badly because the decision\
\ all previous methods! (Footnote: When regularisation parameter λ is chosen so\ \ tree won\u2019t know what to do with numbers that did not appear in the training\
\ as to minimise cross-validaton error.) (D) My method achieves a cross-validation\ \ data. The answer is (C)."
\ error lower than all previous methods! (Footnote: When regularisation parameter\ - question: "You are reviewing papers for the World\u2019s Fanciest Machine Learning\
\ λ is chosen so as to minimise cross-validaton error.)\nA: Let's think step by\ \ Conference, and you see submissions with the following claims. Which ones\
\ step. In machine learning, we train with some data and fixed hyperparameters and\ \ would you consider accepting?\n(A) My method achieves a training error lower\
\ the training error can be arbitrarily low, so (A) can’t be right. Then, one compares\ \ than all previous methods! (B) My method achieves a test error lower than\
\ different hyperparameters by selecting the model with the lowest cross-validation\ \ all previous methods! (Footnote: When regularisation parameter \u03BB is chosen\
\ error, this means that (B) and (D) are not the right procedure. The only relevant\ \ so as to minimise test error.) (C) My method achieves a test error lower than\
\ number after these is the test error and thus (C) is the right answer. The answer\ \ all previous methods! (Footnote: When regularisation parameter \u03BB is chosen\
\ is (C).\n\nQ: A 6-sided die is rolled 15 times and the results are: side 1 comes\ \ so as to minimise cross-validaton error.) (D) My method achieves a cross-validation\
\ up 0 times; side 2: 1 time; side 3: 2 times; side 4: 3 times; side 5: 4 times;\ \ error lower than all previous methods! (Footnote: When regularisation parameter\
\ side 6: 5 times. Based on these results, what is the probability of side 3 coming\ \ \u03BB is chosen so as to minimise cross-validaton error.)"
\ up when using Add-1 Smoothing?\n(A) 2.0/15 (B) 1.0/7 (C) 3.0/16 (D) 1.0/5\nA:\ target: "Let's think step by step. In machine learning, we train with some data\
\ Let's think step by step. Add-1 smoothing adds the value of one to the different\ \ and fixed hyperparameters and the training error can be arbitrarily low, so\
\ counts and then normalizes the probabilities accordingly. The counts after adding\ \ (A) can\u2019t be right. Then, one compares different hyperparameters by selecting\
\ one will be: side 1 comes up 1 time; side 2: 2 times; side 3: 3 times; side 4:\ \ the model with the lowest cross-validation error, this means that (B) and\
\ 4 times; side 5: 5 times; side 6: 6 times. The number of sum one die rolls will\ \ (D) are not the right procedure. The only relevant number after these is the\
\ be 21, so the probability of drawing a three is 3/21 = 1/7. The answer is (B).\n\ \ test error and thus (C) is the right answer. The answer is (C)."
\nQ: To achieve an 0/1 loss estimate that is less than 1 percent of the true 0/1\ - question: 'A 6-sided die is rolled 15 times and the results are: side 1 comes up
\ loss (with probability 95%), according to Hoeffding's inequality the IID test\ 0 times; side 2: 1 time; side 3: 2 times; side 4: 3 times; side 5: 4 times;
\ set must have how many examples?\n(A) around 10 examples (B) around 100 examples\ side 6: 5 times. Based on these results, what is the probability of side 3 coming
\ (C) between 100 and 500 examples (D) more than 1000 examples\nA: Let's think step\ up when using Add-1 Smoothing?
\ by step. By the Hoeffding’s inequality, we expect that with 95% probability the\
\ in-sample and out-of-sample errors differ by epsilon when we have N samples if\ (A) 2.0/15 (B) 1.0/7 (C) 3.0/16 (D) 1.0/5'
\ 2 exp(-2 epsilon^2 N)<0.05, this implies that N > -1/(2*epsilon**2) log ( 0.05/2\ target: 'Let''s think step by step. Add-1 smoothing adds the value of one to the
\ )= log (40)*5000. Since log(40)>1, we have that one needs more than 1000 examples.\ different counts and then normalizes the probabilities accordingly. The counts
\ The answer is (D).\n\n" after adding one will be: side 1 comes up 1 time; side 2: 2 times; side 3: 3
"group": "mmlu_flan_cot_fewshot_stem" times; side 4: 4 times; side 5: 5 times; side 6: 6 times. The number of sum
"include": "_mmlu_flan_cot_fewshot_template_yaml" one die rolls will be 21, so the probability of drawing a three is 3/21 = 1/7.
"task": "mmlu_flan_cot_fewshot_machine_learning" The answer is (B).'
- question: 'To achieve an 0/1 loss estimate that is less than 1 percent of the true
0/1 loss (with probability 95%), according to Hoeffding''s inequality the IID
test set must have how many examples?
(A) around 10 examples (B) around 100 examples (C) between 100 and 500 examples
(D) more than 1000 examples'
target: "Let's think step by step. By the Hoeffding\u2019s inequality, we expect\
\ that with 95% probability the in-sample and out-of-sample errors differ by\
\ epsilon when we have N samples if 2 exp(-2 epsilon^2 N)<0.05, this implies\
\ that N > -1/(2*epsilon**2) log ( 0.05/2 )= log (40)*5000. Since log(40)>1,\
\ we have that one needs more than 1000 examples. The answer is (D).\n\n"
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_machine_learning
"dataset_name": "management" dataset_name: management
"description": "The following are multiple choice questions (with answers) about management.\n\ description: The following are multiple choice questions (with answers) about management.
\nQ: How can organisational structures that are characterised by democratic and\ fewshot_config:
\ inclusive styles of management be described?\n(A) Hierarchical (B) Bureaucratic\ sampler: first_n
\ (C) Flat (D) Functional\nA: Let's think step by step. We refer to Wikipedia articles\ samples:
\ on management for help. Flat organizational structures are characterized by democratic\ - question: 'How can organisational structures that are characterised by democratic
\ and inclusive styles of management, and have few (if any) levels of management\ and inclusive styles of management be described?
\ between the workers and managers. The answer is (C).\n\nQ: Hygiene factors are\
\ associated with which writer?\n(A) Frederick Hertzberg (B) D.C. McClelland (C)\ (A) Hierarchical (B) Bureaucratic (C) Flat (D) Functional'
\ Abraham Maslow (D) Douglas McGregor\nA: Let's think step by step. We refer to\ target: Let's think step by step. We refer to Wikipedia articles on management
\ Wikipedia articles on management for help. Hygiene factors include compensation,\ for help. Flat organizational structures are characterized by democratic and
\ company policies, supervision, interpersonal relations, and work environments.\ inclusive styles of management, and have few (if any) levels of management between
\ Hertzberg lists them as factors that cannot motivate employees but can minimize\ the workers and managers. The answer is (C).
\ job dissatisfaction. The answer is (A).\n\nQ: What characteristic is not a key\ - question: 'Hygiene factors are associated with which writer?
\ feature of the 'open systems' model of management?\n(A) Morale (B) Innovation\
\ (C) Growth resource (D) Adaptation\nA: Let's think step by step. We refer to Wikipedia\ (A) Frederick Hertzberg (B) D.C. McClelland (C) Abraham Maslow (D) Douglas McGregor'
\ articles on management for help. The key characteristics of an open system in\ target: Let's think step by step. We refer to Wikipedia articles on management
\ management include innovation, growth resource, and adaption, but do not include\ for help. Hygiene factors include compensation, company policies, supervision,
\ morale. The answer is (A).\n\nQ: Which element of the cultural web forms regalia?\n\ interpersonal relations, and work environments. Hertzberg lists them as factors
(A) Symbols (B) Rituals and routines (C) Power structures (D) Control systems\n\ that cannot motivate employees but can minimize job dissatisfaction. The answer
A: Let's think step by step. We refer to Wikipedia articles on management for help.\ is (A).
\ The cultural web is a tool for mapping an organization's culture, where symbols\ - question: 'What characteristic is not a key feature of the ''open systems'' model
\ form the regalia that visually expresses the values that the organization holds\ of management?
\ as important. The answer is (A).\n\nQ: What are the two main dimensions of the\
\ Ohio Studies into leadership?\n(A) Starting position and end position (B) Initial\ (A) Morale (B) Innovation (C) Growth resource (D) Adaptation'
\ environment and changed environment (C) Organisational structure and conditioning\ target: Let's think step by step. We refer to Wikipedia articles on management
\ (D) Initiating structure and considerations\nA: Let's think step by step. We refer\ for help. The key characteristics of an open system in management include innovation,
\ to Wikipedia articles on management for help. The Ohio State Leadership Studies\ growth resource, and adaption, but do not include morale. The answer is (A).
\ conducted in the 1940s identified initiating structure and consideration as the\ - question: 'Which element of the cultural web forms regalia?
\ two main dimensions of leader behavior. The answer is (D).\n\n"
"group": "mmlu_flan_cot_fewshot_other" (A) Symbols (B) Rituals and routines (C) Power structures (D) Control systems'
"include": "_mmlu_flan_cot_fewshot_template_yaml" target: Let's think step by step. We refer to Wikipedia articles on management
"task": "mmlu_flan_cot_fewshot_management" for help. The cultural web is a tool for mapping an organization's culture,
where symbols form the regalia that visually expresses the values that the organization
holds as important. The answer is (A).
- question: 'What are the two main dimensions of the Ohio Studies into leadership?
(A) Starting position and end position (B) Initial environment and changed environment
(C) Organisational structure and conditioning (D) Initiating structure and considerations'
target: 'Let''s think step by step. We refer to Wikipedia articles on management
for help. The Ohio State Leadership Studies conducted in the 1940s identified
initiating structure and consideration as the two main dimensions of leader
behavior. The answer is (D).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_management
"dataset_name": "marketing" dataset_name: marketing
"description": "The following are multiple choice questions (with answers) about marketing.\n\ description: The following are multiple choice questions (with answers) about marketing.
\nQ: Although the content and quality can be as controlled as direct mail, response\ fewshot_config:
\ rates of this medium are lower because of the lack of a personal address mechanism.\ sampler: first_n
\ This media format is known as:\n(A) Care lines. (B) Direct mail. (C) Inserts.\ samples:
\ (D) Door to door.\nA: Let's think step by step. We refer to Wikipedia articles\ - question: 'Although the content and quality can be as controlled as direct mail,
\ on marketing for help. Door to door marketing delivers non-addressed items within\ response rates of this medium are lower because of the lack of a personal address
\ all buildings within a geographic area. While it can control the content and quality\ mechanism. This media format is known as:
\ as well as direct mail marketing, its response rate is lower because of the lack\
\ of a personal address mechanism. The answer is (D).\n\nQ: In an organization,\ (A) Care lines. (B) Direct mail. (C) Inserts. (D) Door to door.'
\ the group of people tasked with buying decisions is referred to as the _______________.\n\ target: Let's think step by step. We refer to Wikipedia articles on marketing
(A) Outsourcing unit. (B) Procurement centre. (C) Chief executive unit. (D) Decision-making\ for help. Door to door marketing delivers non-addressed items within all buildings
\ unit.\nA: Let's think step by step. We refer to Wikipedia articles on marketing\ within a geographic area. While it can control the content and quality as well
\ for help. In an organization, the group of the people tasked with buying decision\ as direct mail marketing, its response rate is lower because of the lack of
\ is referred to as the decision-making unit. The answer is (D).\n\nQ: The single\ a personal address mechanism. The answer is (D).
\ group within society that is most vulnerable to reference group influence is:\n\ - question: 'In an organization, the group of people tasked with buying decisions is
(A) The older consumer who feels somewhat left out of things. (B) The married women,\ referred to as the _______________.
\ many of whom feel a need for stability in their lives. (C) New immigrants who\
\ really want to assimilate into their new culture. (D) Children, who base most\ (A) Outsourcing unit. (B) Procurement centre. (C) Chief executive unit. (D)
\ of their buying decisions on outside influences.\nA: Let's think step by step.\ Decision-making unit.'
\ We refer to Wikipedia articles on marketing for help. Children, who mostly based\ target: Let's think step by step. We refer to Wikipedia articles on marketing
\ their buying decisions on outside influences, are the single group within society\ for help. In an organization, the group of the people tasked with buying decision
\ that is more vulnerable to reference group influence. The answer is (D).\n\nQ:\ is referred to as the decision-making unit. The answer is (D).
\ Which of the following is an assumption in Maslow's hierarchy of needs?\n(A) Needs\ - question: 'The single group within society that is most vulnerable to reference group
\ are dependent on culture and also on social class. (B) Lower-level needs must\ influence is:
\ be at least partially satisfied before higher needs can affect behaviour. (C)\
\ Needs are not prioritized or arranged in any particular order. (D) Satisfied needs\ (A) The older consumer who feels somewhat left out of things. (B) The married
\ are motivators, and new needs emerge when current needs remain unmet.\nA: Let's\ women, many of whom feel a need for stability in their lives. (C) New immigrants
\ think step by step. We refer to Wikipedia articles on marketing for help. Maslow's\ who really want to assimilate into their new culture. (D) Children, who base
\ hierarchy of needs, from the bottom upwards, are physiological (food and clothing),\ most of their buying decisions on outside influences.'
\ safety, love and belonging needs, esteem, and self-actualization. Lower-level\ target: Let's think step by step. We refer to Wikipedia articles on marketing
\ needs must be at least partially satisfied before higher ones can affect behavior.\ for help. Children, who mostly based their buying decisions on outside influences,
\ The answer is (B).\n\nQ: _____________ is a natural outcome when combining demographic\ are the single group within society that is more vulnerable to reference group
\ and geographic variables.\n(A) Geodemographics (B) Product differentiation. (C)\ influence. The answer is (D).
\ ANSOFF matrix. (D) Brand management.\nA: Let's think step by step. We refer to\ - question: 'Which of the following is an assumption in Maslow''s hierarchy of needs?
\ Wikipedia articles on marketing for help. Geodemographics is a natural outcome\
\ when combining demographic and geographic variables. The answer is (A).\n\n" (A) Needs are dependent on culture and also on social class. (B) Lower-level
"group": "mmlu_flan_cot_fewshot_other" needs must be at least partially satisfied before higher needs can affect behaviour.
"include": "_mmlu_flan_cot_fewshot_template_yaml" (C) Needs are not prioritized or arranged in any particular order. (D) Satisfied
"task": "mmlu_flan_cot_fewshot_marketing" needs are motivators, and new needs emerge when current needs remain unmet.'
target: Let's think step by step. We refer to Wikipedia articles on marketing
for help. Maslow's hierarchy of needs, from the bottom upwards, are physiological
(food and clothing), safety, love and belonging needs, esteem, and self-actualization.
Lower-level needs must be at least partially satisfied before higher ones can
affect behavior. The answer is (B).
- question: '_____________ is a natural outcome when combining demographic and geographic
variables.
(A) Geodemographics (B) Product differentiation. (C) ANSOFF matrix. (D) Brand
management.'
target: 'Let''s think step by step. We refer to Wikipedia articles on marketing
for help. Geodemographics is a natural outcome when combining demographic and
geographic variables. The answer is (A).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_marketing
"dataset_name": "medical_genetics" dataset_name: medical_genetics
"description": "The following are multiple choice questions (with answers) about medical\ description: The following are multiple choice questions (with answers) about medical
\ genetics.\n\nQ: The stage of meiosis in which chromosomes pair and cross over\ genetics.
\ is:\n(A) prophase I (B) metaphase I (C) prophase II (D) metaphase II\nA: Let's\ fewshot_config:
\ think step by step. We refer to Wikipedia articles on medical genetics for help.\ sampler: first_n
\ Prophase I is the stage of meiosis where homologous chromosomes pair with each\ samples:
\ other and exchange genetic material. The answer is (A).\n\nQ: DNA ligase is\n\ - question: 'The stage of meiosis in which chromosomes pair and cross over is:
(A) an enzyme that joins fragments in normal DNA replication (B) an enzyme of bacterial\
\ origin which cuts DNA at defined base sequences (C) an enzyme that facilitates\ (A) prophase I (B) metaphase I (C) prophase II (D) metaphase II'
\ transcription of specific genes (D) an enzyme which limits the level to which\ target: Let's think step by step. We refer to Wikipedia articles on medical genetics
\ a particular nutrient reaches\nA: Let's think step by step. We refer to Wikipedia\ for help. Prophase I is the stage of meiosis where homologous chromosomes pair
\ articles on medical genetics for help. DNA ligase is a type of enzyme (EC 6.5.1.1)\ with each other and exchange genetic material. The answer is (A).
\ responsible for joining DNA strands together by catalyzing a phosphodiester bond.\ - question: 'DNA ligase is
\ The answer is (A).\n\nQ: Which of the following conditions does not show multifactorial\
\ inheritance?\n(A) Pyloric stenosis (B) Schizophrenia (C) Spina bifida (neural\ (A) an enzyme that joins fragments in normal DNA replication (B) an enzyme of
\ tube defects) (D) Marfan syndrome\nA: Let's think step by step. We refer to Wikipedia\ bacterial origin which cuts DNA at defined base sequences (C) an enzyme that
\ articles on medical genetics for help. Multifactorial inheritance is when more\ facilitates transcription of specific genes (D) an enzyme which limits the level
\ than a single factor is responsible for causing a given trait or health problem.\ to which a particular nutrient reaches'
\ Genes cannot be the only factor. Marfan syndrome, on the other hand, requires\ target: Let's think step by step. We refer to Wikipedia articles on medical genetics
\ only one abnormal copy of the of the Marfan gene, from one parent, to inherit\ for help. DNA ligase is a type of enzyme (EC 6.5.1.1) responsible for joining
\ the trait. The answer is (D).\n\nQ: A gene showing codominance\n(A) has both alleles\ DNA strands together by catalyzing a phosphodiester bond. The answer is (A).
\ independently expressed in the heterozygote (B) has one allele dominant to the\ - question: 'Which of the following conditions does not show multifactorial inheritance?
\ other (C) has alleles tightly linked on the same chromosome (D) has alleles expressed\
\ at the same time in development\nA: Let's think step by step. We refer to Wikipedia\ (A) Pyloric stenosis (B) Schizophrenia (C) Spina bifida (neural tube defects)
\ articles on medical genetics for help. Codominance, as it relates to genetics,\ (D) Marfan syndrome'
\ refers to a type of genetic inheritance where the phenotype of both the parents\ target: Let's think step by step. We refer to Wikipedia articles on medical genetics
\ is easily observed in the offspring. A heterozygote is an individual having two\ for help. Multifactorial inheritance is when more than a single factor is responsible
\ different alleles of a gene. The answer is (A).\n\nQ: Large triplet repeat expansions\ for causing a given trait or health problem. Genes cannot be the only factor.
\ can be detected by:\n(A) polymerase chain reaction. (B) single strand conformational\ Marfan syndrome, on the other hand, requires only one abnormal copy of the of
\ polymorphism analysis. (C) Southern blotting. (D) Western blotting.\nA: Let's\ the Marfan gene, from one parent, to inherit the trait. The answer is (D).
\ think step by step. We refer to Wikipedia articles on medical genetics for help.\ - question: 'A gene showing codominance
\ A Southern blot is a method in molecular biology for detecting specific DNA sequences\
\ in a sample. Large triplet repeat expansions are usually detected with this method.\ (A) has both alleles independently expressed in the heterozygote (B) has one
\ The answer is (C).\n\n" allele dominant to the other (C) has alleles tightly linked on the same chromosome
"group": "mmlu_flan_cot_fewshot_other" (D) has alleles expressed at the same time in development'
"include": "_mmlu_flan_cot_fewshot_template_yaml" target: Let's think step by step. We refer to Wikipedia articles on medical genetics
"task": "mmlu_flan_cot_fewshot_medical_genetics" for help. Codominance, as it relates to genetics, refers to a type of genetic
inheritance where the phenotype of both the parents is easily observed in the
offspring. A heterozygote is an individual having two different alleles of a
gene. The answer is (A).
- question: 'Large triplet repeat expansions can be detected by:
(A) polymerase chain reaction. (B) single strand conformational polymorphism
analysis. (C) Southern blotting. (D) Western blotting.'
target: 'Let''s think step by step. We refer to Wikipedia articles on medical
genetics for help. A Southern blot is a method in molecular biology for detecting
specific DNA sequences in a sample. Large triplet repeat expansions are usually
detected with this method. The answer is (C).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_medical_genetics
"dataset_name": "miscellaneous" dataset_name: miscellaneous
"description": "The following are multiple choice questions (with answers) about miscellaneous.\n\ description: The following are multiple choice questions (with answers) about miscellaneous.
\nQ: Which of these songs was a Top 10 hit for the rock band The Police?\n(A) 'Radio\ fewshot_config:
\ Ga-Ga' (B) 'Ob-la-di Ob-la-da' (C) 'De Do Do Do De Da Da Da' (D) 'In-a-Gadda-Da-Vida'\n\ sampler: first_n
A: Let's think step by step. We refer to Wikipedia for help. Radio Ga-Ga is by Queen.\ samples:
\ Ob-la-di Ob-la-da is by The Beatles. And In-a-Gadda-Da-Vida is by Iron Butterfly.\ - question: 'Which of these songs was a Top 10 hit for the rock band The Police?
\ Leaving 'De Do Do Do De Da Da Da' as the only song by The Police, and also a Top\
\ 10 hit. The answer is (C).\n\nQ: What place is named in the title of the 1979\ (A) ''Radio Ga-Ga'' (B) ''Ob-la-di Ob-la-da'' (C) ''De Do Do Do De Da Da Da''
\ live album by rock legends Cheap Trick?\n(A) Budapest (B) Budokan (C) Bhutan (D)\ (D) ''In-a-Gadda-Da-Vida'''
\ Britain\nA: Let's think step by step. We refer to Wikipedia for help. Nippon Budokan\ target: Let's think step by step. We refer to Wikipedia for help. Radio Ga-Ga
\ is an indoor arena in Tokyo, Japan renowned for hosting rock music concerts including\ is by Queen. Ob-la-di Ob-la-da is by The Beatles. And In-a-Gadda-Da-Vida is
\ Cheap Trick in 1978. 'Cheap Trick at Budokan' became the name of their album.\ by Iron Butterfly. Leaving 'De Do Do Do De Da Da Da' as the only song by The
\ The answer is (B).\n\nQ: What is produced during photosynthesis?\n(A) hydrogen\ Police, and also a Top 10 hit. The answer is (C).
\ (B) nylon (C) oxygen (D) light\nA: Let's think step by step. We refer to Wikipedia\ - question: 'What place is named in the title of the 1979 live album by rock legends
\ for help. Photosynthesis is the process in which green plants use the green pigment\ Cheap Trick?
\ chlorophyll to synthesize foods with water and carbon dioxide. Oxygen is the byproduct\
\ of this process. The answer is (C).\n\nQ: Who is the shortest man to ever win\ (A) Budapest (B) Budokan (C) Bhutan (D) Britain'
\ an NBA slam dunk competition?\n(A) Anthony 'Spud' Webb (B) Michael 'Air' Jordan\ target: Let's think step by step. We refer to Wikipedia for help. Nippon Budokan
\ (C) Tyrone 'Muggsy' Bogues (D) Julius 'Dr J' Erving\nA: Let's think step by step.\ is an indoor arena in Tokyo, Japan renowned for hosting rock music concerts
\ We refer to Wikipedia for help. In 1986, Spud Webb, standing only 5'7\" became\ including Cheap Trick in 1978. 'Cheap Trick at Budokan' became the name of their
\ the shortest NBA player in history to win an official slam dunk contest. The answer\ album. The answer is (B).
\ is (A).\n\nQ: How many axles does a standard automobile have?\n(A) one (B) two\ - question: 'What is produced during photosynthesis?
\ (C) four (D) eight\nA: Let's think step by step. We refer to Wikipedia for help.\
\ Most cars have two axles to rotate the wheels.. The answer is (B).\n\n" (A) hydrogen (B) nylon (C) oxygen (D) light'
"group": "mmlu_flan_cot_fewshot_other" target: Let's think step by step. We refer to Wikipedia for help. Photosynthesis
"include": "_mmlu_flan_cot_fewshot_template_yaml" is the process in which green plants use the green pigment chlorophyll to synthesize
"task": "mmlu_flan_cot_fewshot_miscellaneous" foods with water and carbon dioxide. Oxygen is the byproduct of this process.
The answer is (C).
- question: 'Who is the shortest man to ever win an NBA slam dunk competition?
(A) Anthony ''Spud'' Webb (B) Michael ''Air'' Jordan (C) Tyrone ''Muggsy'' Bogues
(D) Julius ''Dr J'' Erving'
target: Let's think step by step. We refer to Wikipedia for help. In 1986, Spud
Webb, standing only 5'7" became the shortest NBA player in history to win an
official slam dunk contest. The answer is (A).
- question: 'How many axles does a standard automobile have?
(A) one (B) two (C) four (D) eight'
target: 'Let''s think step by step. We refer to Wikipedia for help. Most cars
have two axles to rotate the wheels.. The answer is (B).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_miscellaneous
"dataset_name": "moral_disputes" dataset_name: moral_disputes
"description": "The following are multiple choice questions (with answers) about moral\ description: The following are multiple choice questions (with answers) about moral
\ disputes.\n\nQ: Baron admits that the versions of the ticking bomb hypothetical\ disputes.
\ she discusses are \"stunningly stupid,\" but she claims this is actually evidence\ fewshot_config:
\ of\n(A) the stupidity of most traditional philosophical examples. (B) a general\ sampler: first_n
\ lack of intelligence among people with advanced degrees. (C) the wrongness of\ samples:
\ torture. (D) the readiness on the part of many intelligent people to see torture\ - question: 'Baron admits that the versions of the ticking bomb hypothetical she discusses
\ as the best solution to deal with terrorism.\nA: Let's think step by step. We\ are "stunningly stupid," but she claims this is actually evidence of
\ refer to Wikipedia articles on moral disputes for help. The ticking bomb hypothetical\
\ poses a problem where many people will die to an exploding bomb, if the hypothetical\ (A) the stupidity of most traditional philosophical examples. (B) a general
\ terrorist does not disclose how to defuse it. Baron sees this hypothetical as\ lack of intelligence among people with advanced degrees. (C) the wrongness of
\ silly, but its prevalence does suggest intelligent people, particularly utilitarians,\ torture. (D) the readiness on the part of many intelligent people to see torture
\ see torture as justifiable to save the lives in this scenario. The answer is (D).\n\ as the best solution to deal with terrorism.'
\nQ: A fertilized ovum is also known as\n(A) a zygote. (B) an embryo. (C) a viability.\ target: Let's think step by step. We refer to Wikipedia articles on moral disputes
\ (D) a blastocyst.\nA: Let's think step by step. We refer to Wikipedia articles\ for help. The ticking bomb hypothetical poses a problem where many people will
\ on moral disputes for help. Once a single sperm penetrates the layers of an egg\ die to an exploding bomb, if the hypothetical terrorist does not disclose how
\ to form a new cell, that cell is called a zygote. The answer is (A).\n\nQ: Pence\ to defuse it. Baron sees this hypothetical as silly, but its prevalence does
\ compares six different cases of reproduction, from natural twinning to SCNT. What\ suggest intelligent people, particularly utilitarians, see torture as justifiable
\ conclusion does he draw from this comparison?\n(A) SCNT is not a different kind\ to save the lives in this scenario. The answer is (D).
\ of reproduction because there are no morally relevant differences between it and\ - question: 'A fertilized ovum is also known as
\ other permissible means of reproduction. (B) Because there is a low risk of harm\
\ for natural twinning, there will be a low risk of harm for SCNT. (C) both A and\ (A) a zygote. (B) an embryo. (C) a viability. (D) a blastocyst.'
\ B (D) neither A nor B\nA: Let's think step by step. We refer to Wikipedia articles\ target: Let's think step by step. We refer to Wikipedia articles on moral disputes
\ on moral disputes for help. SCNT, known as 'somatic cell nuclear transfer', is\ for help. Once a single sperm penetrates the layers of an egg to form a new
\ a technique in which the nucleus of a somatic (body) cell is transferred to the\ cell, that cell is called a zygote. The answer is (A).
\ cytoplasm of an enucleated egg. This is used in farm animal cloning. Pence may\ - question: 'Pence compares six different cases of reproduction, from natural twinning
\ compare this to other processes of reproduction, and see little moral separation.\ to SCNT. What conclusion does he draw from this comparison?
\ The answer is (A).\n\nQ: Which of the following is an intrinsic good, according\
\ to Singer?\n(A) being in the upper class (B) the ability to learn and solve problems\ (A) SCNT is not a different kind of reproduction because there are no morally
\ (C) being more beautiful than most of the population (D) being shorter than average\n\ relevant differences between it and other permissible means of reproduction.
A: Let's think step by step. We refer to Wikipedia articles on moral disputes for\ (B) Because there is a low risk of harm for natural twinning, there will be
\ help. Bioethicist Peter Singer sees intrinsic value as innate values conferred\ a low risk of harm for SCNT. (C) both A and B (D) neither A nor B'
\ by oneself, for oneself. Innanimate objects can be beautiful, short, or have some\ target: Let's think step by step. We refer to Wikipedia articles on moral disputes
\ valuable criteria, but capabilities are intrinsically good. The answer is (B).\n\ for help. SCNT, known as 'somatic cell nuclear transfer', is a technique in
\nQ: According to Metz, what is wrong with consequentialist arguments against capital\ which the nucleus of a somatic (body) cell is transferred to the cytoplasm of
\ punishment based on African values?\n(A) It is unclear as of yet whether or not\ an enucleated egg. This is used in farm animal cloning. Pence may compare this
\ capital punishment deters harm to the community. (B) It is unclear as of yet whether\ to other processes of reproduction, and see little moral separation. The answer
\ or not capital punishment deters harm to any individuals. (C) Consequentialism\ is (A).
\ is not supported by African values. (D) Even though consequentialism is supported\ - question: 'Which of the following is an intrinsic good, according to Singer?
\ by African values, no consequentialist arguments framed in terms of African values\
\ have been offered.\nA: Let's think step by step. We refer to Wikipedia articles\ (A) being in the upper class (B) the ability to learn and solve problems (C)
\ on moral disputes for help. Thaddeus Metz is a humanities research professor focusing\ being more beautiful than most of the population (D) being shorter than average'
\ on ethical philosophy in South Africa. Metz has written the death penalty is unjustified\ target: Let's think step by step. We refer to Wikipedia articles on moral disputes
\ as it treats individuals as incapable of communal relations. It is unclear that\ for help. Bioethicist Peter Singer sees intrinsic value as innate values conferred
\ capital punishment is to the benefit of, or a deterrent of harm to the community.\ by oneself, for oneself. Innanimate objects can be beautiful, short, or have
\ The answer is (A).\n\n" some valuable criteria, but capabilities are intrinsically good. The answer
"group": "mmlu_flan_cot_fewshot_humanities" is (B).
"include": "_mmlu_flan_cot_fewshot_template_yaml" - question: 'According to Metz, what is wrong with consequentialist arguments against
"task": "mmlu_flan_cot_fewshot_moral_disputes" capital punishment based on African values?
(A) It is unclear as of yet whether or not capital punishment deters harm to
the community. (B) It is unclear as of yet whether or not capital punishment
deters harm to any individuals. (C) Consequentialism is not supported by African
values. (D) Even though consequentialism is supported by African values, no
consequentialist arguments framed in terms of African values have been offered.'
target: 'Let''s think step by step. We refer to Wikipedia articles on moral disputes
for help. Thaddeus Metz is a humanities research professor focusing on ethical
philosophy in South Africa. Metz has written the death penalty is unjustified
as it treats individuals as incapable of communal relations. It is unclear that
capital punishment is to the benefit of, or a deterrent of harm to the community.
The answer is (A).'
group: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_moral_disputes
"dataset_name": "moral_scenarios" dataset_name: moral_scenarios
"description": "The following are multiple choice questions (with answers) about moral\ description: The following are multiple choice questions (with answers) about moral
\ scenarios.\n\nQ: For which of these two scenarios does the main character (who\ scenarios.
\ uses I/me/my) do something clearly morally wrong, according to ordinary moral\ fewshot_config:
\ standards in the US as of 2020? Scenario 1 | I took his wallet after he harassed\ sampler: first_n
\ me at the bar. Scenario 2 | I made love to my newborn baby boy.\n(A) Wrong, Wrong\ samples:
\ (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not wrong\nA: Let's think\ - question: 'For which of these two scenarios does the main character (who uses I/me/my)
\ step by step. We refer to Wikipedia articles on moral scenarios for help. Stealing\ do something clearly morally wrong, according to ordinary moral standards in
\ from someone is not considered an ethical response to harrassment. And having\ the US as of 2020? Scenario 1 | I took his wallet after he harassed me at the
\ sex with a minor is egregiously unethical. The answer is (A).\n\nQ: For which\ bar. Scenario 2 | I made love to my newborn baby boy.
\ of these two scenarios does the main character (who uses I/me/my) do something\
\ clearly morally wrong, according to ordinary moral standards in the US as of 2020?\ (A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not
\ Scenario 1 | I asked for a cigarette from the guy outside the store. Scenario\ wrong'
\ 2 | When the security guard left his post, I pocketed a rare coin.\n(A) Wrong,\ target: Let's think step by step. We refer to Wikipedia articles on moral scenarios
\ Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not wrong\nA: Let's\ for help. Stealing from someone is not considered an ethical response to harrassment.
\ think step by step. We refer to Wikipedia articles on moral scenarios for help.\ And having sex with a minor is egregiously unethical. The answer is (A).
\ Asking for something is not morally wrong. Stealing is morally wrong. The answer\ - question: 'For which of these two scenarios does the main character (who uses I/me/my)
\ is (C).\n\nQ: For which of these two scenarios does the main character (who uses\ do something clearly morally wrong, according to ordinary moral standards in
\ I/me/my) do something clearly morally wrong, according to ordinary moral standards\ the US as of 2020? Scenario 1 | I asked for a cigarette from the guy outside
\ in the US as of 2020? Scenario 1 | During the house's demolition, I pushed him\ the store. Scenario 2 | When the security guard left his post, I pocketed a
\ into a wall. Scenario 2 | When I choose a babysitter for my twin nieces, I don't\ rare coin.
\ care about her racial background.\n(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not\
\ wrong, Wrong (D) Not wrong, Not wrong\nA: Let's think step by step. We refer to\ (A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not
\ Wikipedia articles on moral scenarios for help. Physical violence is morally wrong.\ wrong'
\ Being open to hiring people from different racial backgrounds is an expected moral\ target: Let's think step by step. We refer to Wikipedia articles on moral scenarios
\ stance. The answer is (B).\n\nQ: For which of these two scenarios does the main\ for help. Asking for something is not morally wrong. Stealing is morally wrong.
\ character (who uses I/me/my) do something clearly morally wrong, according to\ The answer is (C).
\ ordinary moral standards in the US as of 2020? Scenario 1 | I bought an old pick-up\ - question: 'For which of these two scenarios does the main character (who uses I/me/my)
\ truck for work. Scenario 2 | I looked up at the gorgeous waterfall and threw a\ do something clearly morally wrong, according to ordinary moral standards in
\ rock at it.\n(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not\ the US as of 2020? Scenario 1 | During the house''s demolition, I pushed him
\ wrong, Not wrong\nA: Let's think step by step. We refer to Wikipedia articles\ into a wall. Scenario 2 | When I choose a babysitter for my twin nieces, I don''t
\ on moral scenarios for help. Neither of these are wrong as no one is wronged by\ care about her racial background.
\ buing a pick-up truck or throwing a rock into a stream. The answer is (D).\n\n\
Q: For which of these two scenarios does the main character (who uses I/me/my) do\ (A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not
\ something clearly morally wrong, according to ordinary moral standards in the\ wrong'
\ US as of 2020? Scenario 1 | I really loved my newborn baby boy. Scenario 2 | The\ target: Let's think step by step. We refer to Wikipedia articles on moral scenarios
\ new girl is embarrassed to live in a trailer park, so I invite the class to her\ for help. Physical violence is morally wrong. Being open to hiring people from
\ house.\n(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong,\ different racial backgrounds is an expected moral stance. The answer is (B).
\ Not wrong\nA: Let's think step by step. We refer to Wikipedia articles on moral\ - question: 'For which of these two scenarios does the main character (who uses I/me/my)
\ scenarios for help. Loving someone is not wrong. However, exposing something that\ do something clearly morally wrong, according to ordinary moral standards in
\ someone is embarrassed about could be considered quite mean. The answer is (C).\n\n" the US as of 2020? Scenario 1 | I bought an old pick-up truck for work. Scenario
"group": "mmlu_flan_cot_fewshot_humanities" 2 | I looked up at the gorgeous waterfall and threw a rock at it.
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_moral_scenarios" (A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not
wrong'
target: Let's think step by step. We refer to Wikipedia articles on moral scenarios
for help. Neither of these are wrong as no one is wronged by buing a pick-up
truck or throwing a rock into a stream. The answer is (D).
- question: 'For which of these two scenarios does the main character (who uses I/me/my)
do something clearly morally wrong, according to ordinary moral standards in
the US as of 2020? Scenario 1 | I really loved my newborn baby boy. Scenario
2 | The new girl is embarrassed to live in a trailer park, so I invite the class
to her house.
(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not
wrong'
target: 'Let''s think step by step. We refer to Wikipedia articles on moral scenarios
for help. Loving someone is not wrong. However, exposing something that someone
is embarrassed about could be considered quite mean. The answer is (C).'
group: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_moral_scenarios
"dataset_name": "nutrition" dataset_name: nutrition
"description": "The following are multiple choice questions (with answers) about nutrition.\n\ description: The following are multiple choice questions (with answers) about nutrition.
\nQ: What is the first-line drug for patients with type 2 diabetes and obesity,\ fewshot_config:
\ as of 2020?\n(A) Acarbose (B) Metformin (C) Sulphonylureas (D) Insulin\nA: Let's\ sampler: first_n
\ think step by step. We refer to Wikipedia articles on nutrition for help. Metformin\ samples:
\ (Fortamet, Glumetza, or others) is usually the first medication prescribed for\ - question: 'What is the first-line drug for patients with type 2 diabetes and obesity,
\ type 2 diabetes, as well as obesity. It works by lowering glucose production in\ as of 2020?
\ the liver and improving the body's sensitivity to insulin. The answer is (B).\n\
\nQ: Which of the following statements is correct (according to knowledge in 2020)?\n\ (A) Acarbose (B) Metformin (C) Sulphonylureas (D) Insulin'
(A) Consumers with phenylketonuria must avoid the consumption of the sweetener aspartame\ target: Let's think step by step. We refer to Wikipedia articles on nutrition
\ (B) Consumers with phenylketonuria must avoid the consumption of the sweetener\ for help. Metformin (Fortamet, Glumetza, or others) is usually the first medication
\ saccharin (C) Consumers with phenylketonuria must avoid the consumption of the\ prescribed for type 2 diabetes, as well as obesity. It works by lowering glucose
\ sweetener sucralose (D) Consumers with phenylketonuria must avoid the consumption\ production in the liver and improving the body's sensitivity to insulin. The
\ of the sweetener acesulfame K\nA: Let's think step by step. We refer to Wikipedia\ answer is (B).
\ articles on nutrition for help. People with phenylketonuria (PKU) cannot break\ - question: 'Which of the following statements is correct (according to knowledge in
\ down the amino acid phenylalanine. As it builds up in the blood and brain it can\ 2020)?
\ lead to brain damage. People with PKU should avoid foods that are converted to\
\ phenylalanine in the body, such as aspartame. The answer is (A).\n\nQ: Which of\ (A) Consumers with phenylketonuria must avoid the consumption of the sweetener
\ the following statements about iodine is correct, as of 2020?\n(A) 50% of adults\ aspartame (B) Consumers with phenylketonuria must avoid the consumption of the
\ consume iodine at levels below the RNI (B) Dairy products are a poor source of\ sweetener saccharin (C) Consumers with phenylketonuria must avoid the consumption
\ iodine (C) The iodine content of organic milk is generally lower that the level\ of the sweetener sucralose (D) Consumers with phenylketonuria must avoid the
\ in non-organic milk (D) UK dietary reference values recommend an increase in iodine\ consumption of the sweetener acesulfame K'
\ intake in pregnancy\nA: Let's think step by step. We refer to Wikipedia articles\ target: Let's think step by step. We refer to Wikipedia articles on nutrition
\ on nutrition for help. Organic milk usually has less iodine content than non-organic\ for help. People with phenylketonuria (PKU) cannot break down the amino acid
\ milk. The answer is (C).\n\nQ: Which of the following is the most plausible explanation\ phenylalanine. As it builds up in the blood and brain it can lead to brain damage.
\ for the protective effect of dietary fibre against cancer of the colon, as of\ People with PKU should avoid foods that are converted to phenylalanine in the
\ 2020?\n(A) Propionic acid, formed during colonic fibre fermentation inhibits liver\ body, such as aspartame. The answer is (A).
\ fatty acid synthesis (B) Butyric acid, formed during colonic fibre fermentation\ - question: 'Which of the following statements about iodine is correct, as of 2020?
\ stimulates \"silencing\" of the SLC5A8 tumour suppressor gene (C) None of these\
\ options are correct (D) Butyric acid, formed during colonic fibre fermentation\ (A) 50% of adults consume iodine at levels below the RNI (B) Dairy products
\ stimulates anti-oxidant defences in the colon\nA: Let's think step by step. We\ are a poor source of iodine (C) The iodine content of organic milk is generally
\ refer to Wikipedia articles on nutrition for help. Dietary fibre is inversely\ lower that the level in non-organic milk (D) UK dietary reference values recommend
\ proportional to the risk of colorectal cancer. This is presumed because butyric\ an increase in iodine intake in pregnancy'
\ acid (BA) stimulates antioxidants which help protect the colon from cancerous\ target: Let's think step by step. We refer to Wikipedia articles on nutrition
\ tumors. The answer is (D).\n\nQ: In a cohort study, the risk ratio of developing\ for help. Organic milk usually has less iodine content than non-organic milk.
\ diabetes was 0.86 when comparing consumers of tea (the exposed) to those who did\ The answer is (C).
\ not drink tea (the unexposed). Which one statement is correct (according to knowledge\ - question: 'Which of the following is the most plausible explanation for the protective
\ in 2020)?\n(A) The tea drinkers have lower risk of developing diabetes. (B) The\ effect of dietary fibre against cancer of the colon, as of 2020?
\ tea drinkers have higher risk of developing diabetes. (C) Based on the information\
\ given we cannot tell if the observed difference in disease risk is the result\ (A) Propionic acid, formed during colonic fibre fermentation inhibits liver
\ of chance. (D) The risk ratio is close to the value one, so there is no difference\ fatty acid synthesis (B) Butyric acid, formed during colonic fibre fermentation
\ in disease risk between the two groups.\nA: Let's think step by step. We refer\ stimulates "silencing" of the SLC5A8 tumour suppressor gene (C) None of these
\ to Wikipedia articles on nutrition for help. The risk ratio is not sufficiently\ options are correct (D) Butyric acid, formed during colonic fibre fermentation
\ reduced that it could not be explained by random chance given the studies sample\ stimulates anti-oxidant defences in the colon'
\ size. The answer is (C).\n\n" target: Let's think step by step. We refer to Wikipedia articles on nutrition
"group": "mmlu_flan_cot_fewshot_other" for help. Dietary fibre is inversely proportional to the risk of colorectal
"include": "_mmlu_flan_cot_fewshot_template_yaml" cancer. This is presumed because butyric acid (BA) stimulates antioxidants which
"task": "mmlu_flan_cot_fewshot_nutrition" help protect the colon from cancerous tumors. The answer is (D).
- question: 'In a cohort study, the risk ratio of developing diabetes was 0.86 when
comparing consumers of tea (the exposed) to those who did not drink tea (the
unexposed). Which one statement is correct (according to knowledge in 2020)?
(A) The tea drinkers have lower risk of developing diabetes. (B) The tea drinkers
have higher risk of developing diabetes. (C) Based on the information given
we cannot tell if the observed difference in disease risk is the result of chance.
(D) The risk ratio is close to the value one, so there is no difference in disease
risk between the two groups.'
target: 'Let''s think step by step. We refer to Wikipedia articles on nutrition
for help. The risk ratio is not sufficiently reduced that it could not be explained
by random chance given the studies sample size. The answer is (C).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_nutrition
"dataset_name": "philosophy" dataset_name: philosophy
"description": "The following are multiple choice questions (with answers) about philosophy.\n\ description: The following are multiple choice questions (with answers) about philosophy.
\nQ: The study of reality in the broadest sense, an inquiry into the elemental nature\ fewshot_config:
\ of the universe and the things in it, is known as _____.\n(A) metaphysics (B)\ sampler: first_n
\ epistemology (C) quantum physics (D) axiology\nA: Let's think step by step. We\ samples:
\ refer to Wikipedia articles on philosophy for help. Among the options, only metaphysics\ - question: 'The study of reality in the broadest sense, an inquiry into the elemental
\ studies the nature of reality and existence. The answer is (A).\n\nQ: According\ nature of the universe and the things in it, is known as _____.
\ to Moore’s “ideal utilitarianism,” the right action is the one that brings about\
\ the greatest amount of:\n(A) pleasure. (B) happiness. (C) good. (D) virtue.\n\ (A) metaphysics (B) epistemology (C) quantum physics (D) axiology'
A: Let's think step by step. We refer to Wikipedia articles on philosophy for help.\ target: Let's think step by step. We refer to Wikipedia articles on philosophy
\ Moore's \"ideal utilitarianism\" states that one's actions should maximize intrinsic\ for help. Among the options, only metaphysics studies the nature of reality
\ goods. The answer is (C).\n\nQ: Before Tolstoy's Christian conversion, what was\ and existence. The answer is (A).
\ his perspective on the meaning of life?\n(A) optimist (B) satisfied (C) nominally\ - question: "According to Moore\u2019s \u201Cideal utilitarianism,\u201D the right\
\ religious (D) pessimist\nA: Let's think step by step. We refer to Wikipedia articles\ \ action is the one that brings about the greatest amount of:\n(A) pleasure.\
\ on philosophy for help. Before his conversion, Tolstoy feels that life was uncertain,\ \ (B) happiness. (C) good. (D) virtue."
\ which is a pessimist's point of view. The answer is (D).\n\nQ: According to d'Holbach,\ target: Let's think step by step. We refer to Wikipedia articles on philosophy
\ people always act according to _____.\n(A) free choices (B) dictates of the soul\ for help. Moore's "ideal utilitarianism" states that one's actions should maximize
\ (C) necessary natural laws (D) undetermined will\nA: Let's think step by step.\ intrinsic goods. The answer is (C).
\ We refer to Wikipedia articles on philosophy for help. d'Holbach believes that\ - question: 'Before Tolstoy''s Christian conversion, what was his perspective on the
\ people act according to necessary laws, and it proves nothing about people's free\ meaning of life?
\ will. The answer is (C).\n\nQ: Psychological egoism is:\n(A) an ethical theory\
\ about how we ought to behave. (B) a generalization concerning the way people tend\ (A) optimist (B) satisfied (C) nominally religious (D) pessimist'
\ to behave. (C) a claim about human nature and the ways people are capable of behaving.\ target: Let's think step by step. We refer to Wikipedia articles on philosophy
\ (D) none of the above.\nA: Let's think step by step. We refer to Wikipedia articles\ for help. Before his conversion, Tolstoy feels that life was uncertain, which
\ on philosophy for help. Psychological egoism suggests that one behaves based on\ is a pessimist's point of view. The answer is (D).
\ what makes one feels good, hence it is a claim about human nature and how humans\ - question: 'According to d''Holbach, people always act according to _____.
\ are capable of behaving. The answer is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_humanities" (A) free choices (B) dictates of the soul (C) necessary natural laws (D) undetermined
"include": "_mmlu_flan_cot_fewshot_template_yaml" will'
"task": "mmlu_flan_cot_fewshot_philosophy" target: Let's think step by step. We refer to Wikipedia articles on philosophy
for help. d'Holbach believes that people act according to necessary laws, and
it proves nothing about people's free will. The answer is (C).
- question: 'Psychological egoism is:
(A) an ethical theory about how we ought to behave. (B) a generalization concerning
the way people tend to behave. (C) a claim about human nature and the ways people
are capable of behaving. (D) none of the above.'
target: 'Let''s think step by step. We refer to Wikipedia articles on philosophy
for help. Psychological egoism suggests that one behaves based on what makes
one feels good, hence it is a claim about human nature and how humans are capable
of behaving. The answer is (C).'
group: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_philosophy
"dataset_name": "prehistory" dataset_name: prehistory
"description": "The following are multiple choice questions (with answers) about prehistory.\n\ description: The following are multiple choice questions (with answers) about prehistory.
\nQ: What is the approximate mean cranial capacity of Homo erectus?\n(A) under 650\ fewshot_config:
\ cc (B) about 800 cc (C) just under 1000 cc (D) 1200 cc\nA: Let's think step by\ sampler: first_n
\ step. We refer to Wikipedia articles on prehistory for help. The average cranium\ samples:
\ capacity of Homo erectus is less than 1000 cubic cm. The answer is (C).\n\nQ:\ - question: 'What is the approximate mean cranial capacity of Homo erectus?
\ According to Timothy Pauketat, the evidence for social stratification and political\
\ power at Cahokia suggests:\n(A) a center of Mississippian civilization with conditions\ (A) under 650 cc (B) about 800 cc (C) just under 1000 cc (D) 1200 cc'
\ similar to the rise of early states. (B) the limitations of authority in a Native\ target: Let's think step by step. We refer to Wikipedia articles on prehistory
\ American society of egalitarian foragers. (C) a simple chiefdom or perhaps a complex\ for help. The average cranium capacity of Homo erectus is less than 1000 cubic
\ chiefdom had evolved by A.D. 1500. (D) a center of Mississippian civilization\ cm. The answer is (C).
\ with conditions similar to societies on the Northwest Coast of North America.\n\ - question: 'According to Timothy Pauketat, the evidence for social stratification
A: Let's think step by step. We refer to Wikipedia articles on prehistory for help.\ and political power at Cahokia suggests:
\ Timothy Pauketat is known for his research on Cahokia, the center of the Mississippian\
\ culture, where he found similar conditions to the rise of early states. The answer\ (A) a center of Mississippian civilization with conditions similar to the rise
\ is (A).\n\nQ: Recent research on hominid species dating from the Middle Pliocene\ of early states. (B) the limitations of authority in a Native American society
\ indicates there was (as of 2020):\n(A) a great amount of species diversity, or\ of egalitarian foragers. (C) a simple chiefdom or perhaps a complex chiefdom
\ a single species that exhibited a lot of diversity. (B) very little species diversity\ had evolved by A.D. 1500. (D) a center of Mississippian civilization with conditions
\ during this period and very few hominids. (C) decreased species diversity due\ similar to societies on the Northwest Coast of North America.'
\ to a prolonged ice age followed by a severe drought. (D) decreased species diversity\ target: Let's think step by step. We refer to Wikipedia articles on prehistory
\ but increased numbers of hammerstones and flakes, indicating stone tool manufacture.\n\ for help. Timothy Pauketat is known for his research on Cahokia, the center
A: Let's think step by step. We refer to Wikipedia articles on prehistory for help.\ of the Mississippian culture, where he found similar conditions to the rise
\ Recent research has recognized multiple hominid species from the Middle Pliocene,\ of early states. The answer is (A).
\ meaning that there is a great amount of species diversity or diversity in a single\ - question: 'Recent research on hominid species dating from the Middle Pliocene indicates
\ species. The answer is (A).\n\nQ: Researchers now believe that the decline of\ there was (as of 2020):
\ the Maya was caused chiefly by:\n(A) a cataclysm of some kind, such as an earthquake,\
\ volcano, or tsunami. (B) ecological degradation resulting from slash-and-burn\ (A) a great amount of species diversity, or a single species that exhibited
\ farming techniques. (C) endless wars between neighboring Mayan city-states. (D)\ a lot of diversity. (B) very little species diversity during this period and
\ practices of interbreeding that led to a steep rise in congenital disorders.\n\ very few hominids. (C) decreased species diversity due to a prolonged ice age
A: Let's think step by step. We refer to Wikipedia articles on prehistory for help.\ followed by a severe drought. (D) decreased species diversity but increased
\ Researchers believe that the Maya collapse was mainly caused by over-exploitation\ numbers of hammerstones and flakes, indicating stone tool manufacture.'
\ of natural resources like the slash-and-burn farming techniques. The answer is\ target: Let's think step by step. We refer to Wikipedia articles on prehistory
\ (B).\n\nQ: The great Mayan king Pacal built temples in the city of Palenque in\ for help. Recent research has recognized multiple hominid species from the Middle
\ order to:\n(A) satisfy the powerful Mayan astronomer priests. (B) display his\ Pliocene, meaning that there is a great amount of species diversity or diversity
\ generosity to the common people, since they were allowed to live in the temples.\ in a single species. The answer is (A).
\ (C) frighten away enemies, in particular the Spaniards. (D) legitimize his kingship,\ - question: 'Researchers now believe that the decline of the Maya was caused chiefly
\ since his father was not royal.\nA: Let's think step by step. We refer to Wikipedia\ by:
\ articles on prehistory for help. Pacal built the temples as the funerary monument\
\ to legitimize his kingship. The answer is (D).\n\n" (A) a cataclysm of some kind, such as an earthquake, volcano, or tsunami. (B)
"group": "mmlu_flan_cot_fewshot_humanities" ecological degradation resulting from slash-and-burn farming techniques. (C)
"include": "_mmlu_flan_cot_fewshot_template_yaml" endless wars between neighboring Mayan city-states. (D) practices of interbreeding
"task": "mmlu_flan_cot_fewshot_prehistory" that led to a steep rise in congenital disorders.'
target: Let's think step by step. We refer to Wikipedia articles on prehistory
for help. Researchers believe that the Maya collapse was mainly caused by over-exploitation
of natural resources like the slash-and-burn farming techniques. The answer
is (B).
- question: 'The great Mayan king Pacal built temples in the city of Palenque in order
to:
(A) satisfy the powerful Mayan astronomer priests. (B) display his generosity
to the common people, since they were allowed to live in the temples. (C) frighten
away enemies, in particular the Spaniards. (D) legitimize his kingship, since
his father was not royal.'
target: 'Let''s think step by step. We refer to Wikipedia articles on prehistory
for help. Pacal built the temples as the funerary monument to legitimize his
kingship. The answer is (D).'
group: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_prehistory
"dataset_name": "professional_accounting" dataset_name: professional_accounting
"description": "The following are multiple choice questions (with answers) about professional\ description: The following are multiple choice questions (with answers) about professional
\ accounting.\n\nQ: An auditor traces the serial numbers on equipment to a nonissuer’s\ accounting.
\ subledger. Which of the following management assertions is supported by this test?\n\ fewshot_config:
(A) Valuation and allocation (B) Completeness (C) Rights and obligations (D) Presentation\ sampler: first_n
\ and disclosure\nA: Let's think step by step. We refer to Wikipedia articles on\ samples:
\ accounting for help. The completeness assertion is tested by tracing supporting\ - question: "An auditor traces the serial numbers on equipment to a nonissuer\u2019\
\ documents to the record entries. The answer is (B).\n\nQ: One hundred years ago,\ s subledger. Which of the following management assertions is supported by this\
\ your great-great-grandmother invested $100 at 5% yearly interest. What is the\ \ test?\n(A) Valuation and allocation (B) Completeness (C) Rights and obligations\
\ investment worth today?\n(A) $13,000 (B) $600 (C) $15,000 (D) $28,000\nA: Let's\ \ (D) Presentation and disclosure"
\ think step by step. We refer to Wikipedia articles on accounting for help. A $100\ target: Let's think step by step. We refer to Wikipedia articles on accounting
\ investment at 5% yearly interest is worth 100*(1.05)^100=13150 after 100 years,\ for help. The completeness assertion is tested by tracing supporting documents
\ which is around $13,000. The answer is (A).\n\nQ: On January 1, year 1, Alpha\ to the record entries. The answer is (B).
\ Co. signed an annual maintenance agreement with a software provider for $15,000\ - question: 'One hundred years ago, your great-great-grandmother invested $100 at 5%
\ and the maintenance period begins on March 1, year 1. Alpha also incurred $5,000\ yearly interest. What is the investment worth today?
\ of costs on January 1, year 1, related to software modification requests that\
\ will increase the functionality of the software. Alpha depreciates and amortizes\ (A) $13,000 (B) $600 (C) $15,000 (D) $28,000'
\ its computer and software assets over five years using the straight-line method.\ target: Let's think step by step. We refer to Wikipedia articles on accounting
\ What amount is the total expense that Alpha should recognize related to the maintenance\ for help. A $100 investment at 5% yearly interest is worth 100*(1.05)^100=13150
\ agreement and the software modifications for the year ended December 31, year\ after 100 years, which is around $13,000. The answer is (A).
\ 1?\n(A) $5,000 (B) $13,500 (C) $16,000 (D) $20,000\nA: Let's think step by step.\ - question: 'On January 1, year 1, Alpha Co. signed an annual maintenance agreement
\ We refer to Wikipedia articles on accounting for help. The maintenance period\ with a software provider for $15,000 and the maintenance period begins on March
\ begins on March 1, so only 10 months of expenses should be recognized, which is\ 1, year 1. Alpha also incurred $5,000 of costs on January 1, year 1, related
\ $15,000/12*10=$12,500. The software modification cost is amortized over 5 years,\ to software modification requests that will increase the functionality of the
\ so each year is $5,000/5=$1,000. So the total expense is $12,500+$1,000=$13,500.\ software. Alpha depreciates and amortizes its computer and software assets over
\ The answer is (B).\n\nQ: Krete is an unmarried taxpayer with income exclusively\ five years using the straight-line method. What amount is the total expense
\ from wages. By December 31, year 1, Krete's employer has withheld $16,000 in federal\ that Alpha should recognize related to the maintenance agreement and the software
\ income taxes and Krete has made no estimated tax payments. On April 15, year 2,\ modifications for the year ended December 31, year 1?
\ Krete timely filed for an extension request to file her individual tax return,\
\ and paid $300 of additional taxes. Krete's year 1 tax liability was $16,500 when\ (A) $5,000 (B) $13,500 (C) $16,000 (D) $20,000'
\ she timely filed her return on April 30, year 2, and paid the remaining tax liability\ target: Let's think step by step. We refer to Wikipedia articles on accounting
\ balance. What amount would be subject to the penalty for underpayment of estimated\ for help. The maintenance period begins on March 1, so only 10 months of expenses
\ taxes?\n(A) $0 (B) $500 (C) $1,650 (D) $16,500\nA: Let's think step by step. We\ should be recognized, which is $15,000/12*10=$12,500. The software modification
\ refer to Wikipedia articles on accounting for help. The tax due after withholding\ cost is amortized over 5 years, so each year is $5,000/5=$1,000. So the total
\ is $16,500-$16,000=$500, which is less than $1000, hence there is no underpayment\ expense is $12,500+$1,000=$13,500. The answer is (B).
\ penalty of estimated taxes. The answer is (A).\n\nQ: Box a nongovernmental not-for-profit\ - question: 'Krete is an unmarried taxpayer with income exclusively from wages. By
\ organization had the following transactions during the year: Proceeds from sale\ December 31, year 1, Krete''s employer has withheld $16,000 in federal income
\ of investments $80000 Purchase of property plant and equipment $10000 Proceeds\ taxes and Krete has made no estimated tax payments. On April 15, year 2, Krete
\ from long-term debt $100000 Loss on sale of investment $5000 What amount should\ timely filed for an extension request to file her individual tax return, and
\ be reported as net cash provided by financing activities in Box's statement of\ paid $300 of additional taxes. Krete''s year 1 tax liability was $16,500 when
\ cash flows?\n(A) $70,000 (B) $75,000 (C) $80,000 (D) 100000\nA: Let's think step\ she timely filed her return on April 30, year 2, and paid the remaining tax
\ by step. We refer to Wikipedia articles on accounting for help. Among the four\ liability balance. What amount would be subject to the penalty for underpayment
\ transactions, only Proceeds from long-term debt belongs to the financing activities\ of estimated taxes?
\ section of cashflow, hence the amount reported should be $100000. The answer is\
\ (D).\n\n" (A) $0 (B) $500 (C) $1,650 (D) $16,500'
"group": "mmlu_flan_cot_fewshot_other" target: Let's think step by step. We refer to Wikipedia articles on accounting
"include": "_mmlu_flan_cot_fewshot_template_yaml" for help. The tax due after withholding is $16,500-$16,000=$500, which is less
"task": "mmlu_flan_cot_fewshot_professional_accounting" than $1000, hence there is no underpayment penalty of estimated taxes. The answer
is (A).
- question: 'Box a nongovernmental not-for-profit organization had the following transactions
during the year: Proceeds from sale of investments $80000 Purchase of property
plant and equipment $10000 Proceeds from long-term debt $100000 Loss on sale
of investment $5000 What amount should be reported as net cash provided by financing
activities in Box''s statement of cash flows?
(A) $70,000 (B) $75,000 (C) $80,000 (D) 100000'
target: 'Let''s think step by step. We refer to Wikipedia articles on accounting
for help. Among the four transactions, only Proceeds from long-term debt belongs
to the financing activities section of cashflow, hence the amount reported should
be $100000. The answer is (D).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_professional_accounting
"dataset_name": "professional_law" dataset_name: professional_law
"description": "The following are multiple choice questions (with answers) about professional\ description: The following are multiple choice questions (with answers) about professional
\ law.\n\nQ: A son owed a creditor $5,000. The son's father contacted the creditor\ law.
\ and told him that he wanted to pay the son's debt. The father signed a document\ fewshot_config:
\ that stated the father would pay the son's debt at a rate of $500 a month for\ sampler: first_n
\ 10 months. The creditor made no written or oral commitment to forbear to sue the\ samples:
\ son to collect the $5,000 debt, and the father made no oral or written request\ - question: 'A son owed a creditor $5,000. The son''s father contacted the creditor
\ for any such forbearance. For the next five months, the father made and the creditor\ and told him that he wanted to pay the son''s debt. The father signed a document
\ accepted the $500 monthly payments as agreed. During that period, the creditor,\ that stated the father would pay the son''s debt at a rate of $500 a month for
\ in fact, did forbear to take any legal action against the son. However, the father\ 10 months. The creditor made no written or oral commitment to forbear to sue
\ then informed the creditor that he would make no further payments on the debt.\ the son to collect the $5,000 debt, and the father made no oral or written request
\ Which of the following is the most persuasive argument that the father is liable\ for any such forbearance. For the next five months, the father made and the
\ to the creditor under the terms of their agreement?\n(A) The father's promise\ creditor accepted the $500 monthly payments as agreed. During that period, the
\ and the creditor's reliance thereon, if proved, gave rise to a valid claim by\ creditor, in fact, did forbear to take any legal action against the son. However,
\ the creditor against the father based on the doctrine of promissory estoppel.\ the father then informed the creditor that he would make no further payments
\ (B) Because it was foreseeable that the father's promise would induce the creditor\ on the debt. Which of the following is the most persuasive argument that the
\ to forbear taking any action against the son, such forbearance was, as a matter\ father is liable to the creditor under the terms of their agreement?
\ of law, a bargained-for consideration for the father's promise. (C) The father's\
\ five payments to the creditor totaling $2,500 manifested a serious intent on the\ (A) The father''s promise and the creditor''s reliance thereon, if proved, gave
\ father's part to be contractually bound, and such manifestation is generally recognized\ rise to a valid claim by the creditor against the father based on the doctrine
\ as an effective substitute for consideration. (D) By assuming the antecedent debt\ of promissory estoppel. (B) Because it was foreseeable that the father''s promise
\ obligation that the son owed to the creditor, the father became a surety whose\ would induce the creditor to forbear taking any action against the son, such
\ promise to the creditor was enforceable, since it was in writing and supported\ forbearance was, as a matter of law, a bargained-for consideration for the father''s
\ by adequate consideration. \nA: Let's think step by step. We refer to Wikipedia\ promise. (C) The father''s five payments to the creditor totaling $2,500 manifested
\ articles on law for help. The doctrine of promissory estoppel stops a person from\ a serious intent on the father''s part to be contractually bound, and such manifestation
\ going back on a promise in contract law, hence option (A) should be the most persuasive\ is generally recognized as an effective substitute for consideration. (D) By
\ argument. The answer is (A).\n\nQ: A state has recently enacted a statute prohibiting\ assuming the antecedent debt obligation that the son owed to the creditor, the
\ the disposal of any nuclear wastes within the state. This law does not contravene\ father became a surety whose promise to the creditor was enforceable, since
\ or conflict with any federal statutes. A man operates a company in the state that\ it was in writing and supported by adequate consideration. '
\ is engaged in the disposal of nuclear wastes. Subsequent to the passage of the\ target: Let's think step by step. We refer to Wikipedia articles on law for help.
\ state statute, the man, not yet aware of the new law, entered into contracts with\ The doctrine of promissory estoppel stops a person from going back on a promise
\ many out-of-state firms to dispose of their nuclear wastes in the state. On account\ in contract law, hence option (A) should be the most persuasive argument. The
\ of this new law, however, the man will be unable to perform these contracts. Assume\ answer is (A).
\ that the man has standing to challenge this state law. Which of the following\ - question: 'A state has recently enacted a statute prohibiting the disposal of any
\ presents his strongest constitutional grounds to challenge the state law prohibiting\ nuclear wastes within the state. This law does not contravene or conflict with
\ the disposal of nuclear wastes within the state?\n(A) The commerce clause. (B)\ any federal statutes. A man operates a company in the state that is engaged
\ The equal protection clause of the Fourteenth Amendment. (C) The privileges and\ in the disposal of nuclear wastes. Subsequent to the passage of the state statute,
\ immunities clause of Article IV, Section 2. (D) The contract clause.\nA: Let's\ the man, not yet aware of the new law, entered into contracts with many out-of-state
\ think step by step. We refer to Wikipedia articles on law for help. The commerce\ firms to dispose of their nuclear wastes in the state. On account of this new
\ clause states that Congress shall have the power to regulate commerce with foreign\ law, however, the man will be unable to perform these contracts. Assume that
\ Nations, and among the several States, and with the Indian Tribes. The statute\ the man has standing to challenge this state law. Which of the following presents
\ affects inter-state commerce which puts it into question. Hence the man's strongest\ his strongest constitutional grounds to challenge the state law prohibiting
\ argument should be the commerce clause. The answer is (A).\n\nQ: On October 1,\ the disposal of nuclear wastes within the state?
\ 1980, a developer, owner of several hundred acres in a rural county, drafted a\
\ general development plan for the area. The duly recorded plan imposed elaborate\ (A) The commerce clause. (B) The equal protection clause of the Fourteenth Amendment.
\ limitations and restrictions upon the land in the plan, which was to be developed\ (C) The privileges and immunities clause of Article IV, Section 2. (D) The contract
\ as a residential district. The restrictions were to extend to all persons acquiring\ clause.'
\ any of the lots and to their heirs, assigns, and lessees. It was further provided\ target: Let's think step by step. We refer to Wikipedia articles on law for help.
\ that all subsequent owners would be charged with due notice of the restrictions.\ The commerce clause states that Congress shall have the power to regulate commerce
\ Among those restrictions in the general plan were the following:(22) A franchise\ with foreign Nations, and among the several States, and with the Indian Tribes.
\ right is created in a strip of land 10 feet in width along the rear of each lot\ The statute affects inter-state commerce which puts it into question. Hence
\ for the use of public utility companies with right of ingress and egress. (23)\ the man's strongest argument should be the commerce clause. The answer is (A).
\ No house or structure of any kind shall be built on the aforementioned strip of\ - question: 'On October 1, 1980, a developer, owner of several hundred acres in a rural
\ land running through the said blocks. In 2000, a retiree purchased one of the\ county, drafted a general development plan for the area. The duly recorded plan
\ lots, built a house, and erected a fence in the rear of his property within the\ imposed elaborate limitations and restrictions upon the land in the plan, which
\ restricted area. In 2004, a teacher purchased a lot adjacent to the retiree's\ was to be developed as a residential district. The restrictions were to extend
\ property and built a new house. Two years later, a librarian purchased the lot\ to all persons acquiring any of the lots and to their heirs, assigns, and lessees.
\ that adjoined the teacher's property. The three deeds to those properties each\ It was further provided that all subsequent owners would be charged with due
\ contained references to the deed book where the general plan was recorded. In\ notice of the restrictions. Among those restrictions in the general plan were
\ 2008, the librarian began the construction of a seven-foot post-and-rail fence\ the following:(22) A franchise right is created in a strip of land 10 feet in
\ along the line dividing his lot with the teacher's, and along the center of the\ width along the rear of each lot for the use of public utility companies with
\ area subject to the franchise right. Although the teacher objected to its construction,\ right of ingress and egress. (23) No house or structure of any kind shall be
\ the fence was completed. If the teacher seeks a mandatory injunction to compel\ built on the aforementioned strip of land running through the said blocks. In
\ removal of the librarian's fence, the court will most likely\n(A) grant relief,\ 2000, a retiree purchased one of the lots, built a house, and erected a fence
\ because the fence was in violation of the easement restriction. (B) grant relief,\ in the rear of his property within the restricted area. In 2004, a teacher purchased
\ because the encroachment of the fence violated the restriction in the original\ a lot adjacent to the retiree''s property and built a new house. Two years later,
\ plan. (C) deny relief, because the teacher failed to enforce the restriction against\ a librarian purchased the lot that adjoined the teacher''s property. The three
\ the retiree. (D) deny relief, because the fence would not be construed as \"a\ deeds to those properties each contained references to the deed book where the
\ structure\" within the terms of the restriction. \nA: Let's think step by step.\ general plan was recorded. In 2008, the librarian began the construction of
\ We refer to Wikipedia articles on law for help. The restrictions in the original\ a seven-foot post-and-rail fence along the line dividing his lot with the teacher''s,
\ plan say no house or structure of any kind shall be built on the aforementioned\ and along the center of the area subject to the franchise right. Although the
\ strip of land running through the said blocks. Hence the court will most likely\ teacher objected to its construction, the fence was completed. If the teacher
\ grant relief because the fence violated the restriction in the original plan.\ seeks a mandatory injunction to compel removal of the librarian''s fence, the
\ The answer is (B).\n\nQ: Judge took judicial notice of some facts at the beginning\ court will most likely
\ of the trial. Which of the following is not an appropriate kind of fact for judicial\
\ notice?\n(A) Indisputable facts. (B) Facts that have been asserted by individual\ (A) grant relief, because the fence was in violation of the easement restriction.
\ political organizations. (C) Facts recognized to be true by common knowledge.\ (B) grant relief, because the encroachment of the fence violated the restriction
\ (D) Facts capable of scientific verification.\nA: Let's think step by step. We\ in the original plan. (C) deny relief, because the teacher failed to enforce
\ refer to Wikipedia articles on law for help. Among the options, facts that have\ the restriction against the retiree. (D) deny relief, because the fence would
\ been asserted by individual political organizations is not an appropriate kind\ not be construed as "a structure" within the terms of the restriction. '
\ of fact for judicial notice. The answer is (B).\n\nQ: A state legislature has\ target: Let's think step by step. We refer to Wikipedia articles on law for help.
\ recently enacted a statute making it a misdemeanor to curse or revile or use obscene\ The restrictions in the original plan say no house or structure of any kind
\ or opprobrious language toward or in reference to a police officer perfonning\ shall be built on the aforementioned strip of land running through the said
\ his duties. A student at a state university organized a demonstration on campus\ blocks. Hence the court will most likely grant relief because the fence violated
\ to protest the war. The rally was attended by a group of 50 students who shouted\ the restriction in the original plan. The answer is (B).
\ anti-war messages at cars passing by. To show his contempt for the United States,\ - question: 'Judge took judicial notice of some facts at the beginning of the trial.
\ the student sewed the American flag to the rear of his jeans. When a police officer\ Which of the following is not an appropriate kind of fact for judicial notice?
\ saw the flag sown on the student's jeans, he approached and told him to remove\
\ the flag or he would be placed under arrest. The student became angered and shouted\ (A) Indisputable facts. (B) Facts that have been asserted by individual political
\ at the police officer, \"Listen, you bastard, I'll wear this rag anywhere I please.\ organizations. (C) Facts recognized to be true by common knowledge. (D) Facts
\ \" The student was subsequently placed under arrest and charged with violating\ capable of scientific verification.'
\ the state statute. The student subsequently brings suit in state court challenging\ target: Let's think step by step. We refer to Wikipedia articles on law for help.
\ the constitutionality of the statute. The strongest constitutional argument for\ Among the options, facts that have been asserted by individual political organizations
\ the student is that\n(A) the statute is void for vagueness under the Fourteenth\ is not an appropriate kind of fact for judicial notice. The answer is (B).
\ Amendment's due process clause. (B) the statute is invalid because it violates\ - question: 'A state legislature has recently enacted a statute making it a misdemeanor
\ the petitioner's freedom of speech under the First Amendment. (C) the statute\ to curse or revile or use obscene or opprobrious language toward or in reference
\ is an abridgment of freedom of speech under the First Amendment because less restrictive\ to a police officer perfonning his duties. A student at a state university organized
\ means are available for achieving the same purpose. (D) the statute is overbroad\ a demonstration on campus to protest the war. The rally was attended by a group
\ and consequently invalid under the First and Fourteenth Amendments.\nA: Let's\ of 50 students who shouted anti-war messages at cars passing by. To show his
\ think step by step. We refer to Wikipedia articles on law for help. The Fourteenth\ contempt for the United States, the student sewed the American flag to the rear
\ Amendment further supports the First Amendment by establishing a due process clause.\ of his jeans. When a police officer saw the flag sown on the student''s jeans,
\ Hence the strongest argument should be the statute is overbroad and consequently\ he approached and told him to remove the flag or he would be placed under arrest.
\ invalid under the First and Fourteenth Amendments. The answer is (D).\n\n" The student became angered and shouted at the police officer, "Listen, you bastard,
"group": "mmlu_flan_cot_fewshot_humanities" I''ll wear this rag anywhere I please. " The student was subsequently placed
"include": "_mmlu_flan_cot_fewshot_template_yaml" under arrest and charged with violating the state statute. The student subsequently
"task": "mmlu_flan_cot_fewshot_professional_law" brings suit in state court challenging the constitutionality of the statute.
The strongest constitutional argument for the student is that
(A) the statute is void for vagueness under the Fourteenth Amendment''s due
process clause. (B) the statute is invalid because it violates the petitioner''s
freedom of speech under the First Amendment. (C) the statute is an abridgment
of freedom of speech under the First Amendment because less restrictive means
are available for achieving the same purpose. (D) the statute is overbroad and
consequently invalid under the First and Fourteenth Amendments.'
target: 'Let''s think step by step. We refer to Wikipedia articles on law for
help. The Fourteenth Amendment further supports the First Amendment by establishing
a due process clause. Hence the strongest argument should be the statute is
overbroad and consequently invalid under the First and Fourteenth Amendments.
The answer is (D).'
group: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_professional_law
"dataset_name": "professional_medicine" dataset_name: professional_medicine
"description": "The following are multiple choice questions (with answers) about professional\ description: The following are multiple choice questions (with answers) about professional
\ medicine.\n\nQ: A 22-year-old male marathon runner presents to the office with\ medicine.
\ the complaint of right-sided rib pain when he runs long distances. Physical examination\ fewshot_config:
\ reveals normal heart and lung findings and an exhalation dysfunction at ribs 4-5\ sampler: first_n
\ on the right. Which of the following muscles or muscle groups will be most useful\ samples:
\ in correcting this dysfunction utilizing a direct method?\n(A) anterior scalene\ - question: "A 22-year-old male marathon runner presents to the office with the complaint\
\ (B) latissimus dorsi (C) pectoralis minor (D) quadratus lumborum\nA: Let's think\ \ of right-sided rib pain when he runs long distances. Physical examination\
\ step by step. We refer to Wikipedia articles on medicine for help. Among the options,\ \ reveals normal heart and lung findings and an exhalation dysfunction at ribs\_\
\ only pectoralis minor muscle origins from the outer surfaces of the 3rd to 5th\ 4-5 on the right. Which of the following muscles or muscle groups will be most\
\ ribs. The answer is (C).\n\nQ: A 36-year-old male presents to the office with\ \ useful in correcting this dysfunction utilizing a direct method?\n(A) anterior\
\ a 3-week history of low back pain. He denies any recent trauma but says that he\ \ scalene (B) latissimus dorsi (C) pectoralis minor (D) quadratus lumborum"
\ climbs in and out of his truck numerous times a day for his job. Examination of\ target: Let's think step by step. We refer to Wikipedia articles on medicine for
\ the patient in the prone position reveals a deep sacral sulcus on the left, a\ help. Among the options, only pectoralis minor muscle origins from the outer
\ posterior inferior lateral angle on the right, and a lumbosacral junction that\ surfaces of the 3rd to 5th ribs. The answer is (C).
\ springs freely on compression. The most likely diagnosis is\n(A) left-on-left\ - question: "A 36-year-old male presents to the office with a\_3-week\_history of low\
\ sacral torsion (B) left-on-right sacral torsion (C) right unilateral sacral flexion\ \ back pain. He denies any recent trauma but says that he climbs in and out\
\ (D) right-on-right sacral torsion\nA: Let's think step by step. We refer to Wikipedia\ \ of his truck numerous times a day for his job. Examination of the patient\
\ articles on medicine for help. The deep sulcus on the left, a posterior ILA on\ \ in the prone position reveals a deep sacral sulcus on the left, a posterior\
\ the right, with a negative spring test suggests a right-on-right sacral torsion.\ \ inferior lateral angle on the right, and a lumbosacral junction that springs\
\ All other options have a deep sulcus on the right. The answer is (D).\n\nQ: A\ \ freely on compression. The most likely diagnosis is\n(A) left-on-left sacral\
\ 44-year-old man comes to the office because of a 3-day history of sore throat,\ \ torsion (B) left-on-right sacral torsion (C) right unilateral sacral flexion\
\ nonproductive cough, runny nose, and frontal headache. He says the headache is\ \ (D) right-on-right sacral torsion"
\ worse in the morning and ibuprofen does provide some relief. He has not had shortness\ target: Let's think step by step. We refer to Wikipedia articles on medicine for
\ of breath. Medical history is unremarkable. He takes no medications other than\ help. The deep sulcus on the left, a posterior ILA on the right, with a negative
\ the ibuprofen for pain. Vital signs are temperature 37.4°C (99.4°F), pulse 88/min,\ spring test suggests a right-on-right sacral torsion. All other options have
\ respirations 18/min, and blood pressure 120/84 mm Hg. Examination of the nares\ a deep sulcus on the right. The answer is (D).
\ shows erythematous mucous membranes. Examination of the throat shows erythema\ - question: "A 44-year-old man comes to the office because of a 3-day history of sore\
\ and follicular lymphoid hyperplasia on the posterior oropharynx. There is no palpable\ \ throat, nonproductive cough, runny nose, and frontal headache. He says the\
\ cervical adenopathy. Lungs are clear to auscultation. Which of the following is\ \ headache is worse in the morning and ibuprofen does provide some relief. He\
\ the most likely cause of this patient's symptoms?\n(A) Allergic rhinitis (B) Epstein-Barr\ \ has not had shortness of breath. Medical history is unremarkable. He takes\
\ virus (C) Mycoplasma pneumonia (D) Rhinovirus\nA: Let's think step by step. We\ \ no medications other than the ibuprofen for pain. Vital signs are temperature\
\ refer to Wikipedia articles on medicine for help. The symptoms, especially the\ \ 37.4\xB0C (99.4\xB0F), pulse 88/min, respirations 18/min, and blood pressure\
\ headache, suggest that the most likely cause is Rhinovirus. Epstein-Barr virus\ \ 120/84 mm Hg. Examination of the nares shows erythematous mucous membranes.\
\ will cause swollen lymph nodes but there is no palpable cervical adenopathy. Lungs\ \ Examination of the throat shows erythema and follicular lymphoid hyperplasia\
\ are clear to auscultation suggests it's not Mycoplasma pneumonia. The answer is\ \ on the posterior oropharynx. There is no palpable cervical adenopathy. Lungs\
\ (D).\n\nQ: A previously healthy 32-year-old woman comes to the physician 8 months\ \ are clear to auscultation. Which of the following is the most likely cause\
\ after her husband was killed in a car crash. Since that time, she has had a decreased\ \ of this patient's symptoms?\n(A) Allergic rhinitis (B) Epstein-Barr virus\
\ appetite and difficulty falling asleep. She states that she is often sad and cries\ \ (C) Mycoplasma pneumonia (D) Rhinovirus"
\ frequently. She has been rechecking the door lock five times before leaving her\ target: Let's think step by step. We refer to Wikipedia articles on medicine for
\ house and has to count exactly five pieces of toilet paper before she uses it.\ help. The symptoms, especially the headache, suggest that the most likely cause
\ She says that she has always been a perfectionist but these urges and rituals\ is Rhinovirus. Epstein-Barr virus will cause swollen lymph nodes but there is
\ are new. Pharmacotherapy should be targeted to which of the following neurotransmitters?\n\ no palpable cervical adenopathy. Lungs are clear to auscultation suggests it's
(A) Dopamine (B) Glutamate (C) Norepinephrine (D) Serotonin\nA: Let's think step\ not Mycoplasma pneumonia. The answer is (D).
\ by step. We refer to Wikipedia articles on medicine for help. The patient feels\ - question: 'A previously healthy 32-year-old woman comes to the physician 8 months
\ sad and among the options, only Dopamine and Serotonin can help increase positive\ after her husband was killed in a car crash. Since that time, she has had a
\ emotions. Serotonin also affects digestion and metabolism, which can help the\ decreased appetite and difficulty falling asleep. She states that she is often
\ patient's decreased appetite and sleep difficulty. The answer is (D).\n\nQ: A\ sad and cries frequently. She has been rechecking the door lock five times before
\ 42-year-old man comes to the office for preoperative evaluation prior to undergoing\ leaving her house and has to count exactly five pieces of toilet paper before
\ adrenalectomy scheduled in 2 weeks. One month ago, he received care in the emergency\ she uses it. She says that she has always been a perfectionist but these urges
\ department for pain over his right flank following a motor vehicle collision.\ and rituals are new. Pharmacotherapy should be targeted to which of the following
\ At that time, blood pressure was 160/100 mm Hg and CT scan of the abdomen showed\ neurotransmitters?
\ an incidental 10-cm left adrenal mass. Results of laboratory studies, including\
\ complete blood count, serum electrolyte concentrations, and liver function tests,\ (A) Dopamine (B) Glutamate (C) Norepinephrine (D) Serotonin'
\ were within the reference ranges. The patient otherwise had been healthy and had\ target: Let's think step by step. We refer to Wikipedia articles on medicine for
\ never been told that he had elevated blood pressure. He takes no medications.\ help. The patient feels sad and among the options, only Dopamine and Serotonin
\ A follow-up visit in the office 2 weeks ago disclosed elevated urinary normetanephrine\ can help increase positive emotions. Serotonin also affects digestion and metabolism,
\ and metanephrine and plasma aldosterone concentrations. The patient was referred\ which can help the patient's decreased appetite and sleep difficulty. The answer
\ to a surgeon, who recommended the adrenalectomy. Today, vital signs are temperature\ is (D).
\ 36.6°C (97.9°F), pulse 100/min, respirations 14/min, and blood pressure 170/95\ - question: "A 42-year-old man comes to the office for preoperative evaluation prior\
\ mm Hg. Physical examination discloses no significant findings. Initial preoperative\ \ to undergoing adrenalectomy scheduled in 2 weeks. One month ago, he received\
\ preparation should include treatment with which of the following?\n(A) Labetalol\ \ care in the emergency department for pain over his right flank following a\
\ (B) A loading dose of potassium chloride (C) Nifedipine (D) Phenoxybenzamine\n\ \ motor vehicle collision. At that time, blood pressure was 160/100 mm Hg and\
A: Let's think step by step. We refer to Wikipedia articles on medicine for help.\ \ CT scan of the abdomen showed an incidental 10-cm left adrenal mass. Results\
\ The symptoms and the adrenal mass suggested pheochromocytoma, and the blood pressure\ \ of laboratory studies, including complete blood count, serum electrolyte concentrations,\
\ indicates hypertension. Phenoxybenzamine is used to treat hypertension caused\ \ and liver function tests, were within the reference ranges. The patient otherwise\
\ by pheochromocytoma. The answer is (D).\n\n" \ had been healthy and had never been told that he had elevated blood pressure.\
"group": "mmlu_flan_cot_fewshot_other" \ He takes no medications. A follow-up visit in the office 2 weeks ago disclosed\
"include": "_mmlu_flan_cot_fewshot_template_yaml" \ elevated urinary normetanephrine and metanephrine and plasma aldosterone concentrations.\
"task": "mmlu_flan_cot_fewshot_professional_medicine" \ The patient was referred to a surgeon, who recommended the adrenalectomy.\
\ Today, vital signs are temperature 36.6\xB0C (97.9\xB0F), pulse 100/min, respirations\
\ 14/min, and blood pressure 170/95 mm Hg. Physical examination discloses no\
\ significant findings. Initial preoperative preparation should include treatment\
\ with which of the following?\n(A) Labetalol (B) A loading dose of potassium\
\ chloride (C) Nifedipine (D) Phenoxybenzamine"
target: 'Let''s think step by step. We refer to Wikipedia articles on medicine
for help. The symptoms and the adrenal mass suggested pheochromocytoma, and
the blood pressure indicates hypertension. Phenoxybenzamine is used to treat
hypertension caused by pheochromocytoma. The answer is (D).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_professional_medicine
"dataset_name": "professional_psychology" dataset_name: professional_psychology
"description": "The following are multiple choice questions (with answers) about professional\ description: The following are multiple choice questions (with answers) about professional
\ psychology.\n\nQ: In the construction of a multiple regression equation for purposes\ psychology.
\ of prediction, the optimal combination of measures is one in which the predictors\n\ fewshot_config:
(A) are uncorrelated with each other but are moderately correlated with the criterion\ sampler: first_n
\ (B) have low correlations with each other and low correlations with the criterion\ samples:
\ (C) are highly intercorrelated with each other and moderately correlated with\ - question: 'In the construction of a multiple regression equation for purposes of
\ the criterion (D) have low correlations with the criterion bur are moderately\ prediction, the optimal combination of measures is one in which the predictors
\ correlated with each other\nA: Let's think step by step. We refer to Wikipedia\
\ articles on psychology for help. The basis of multiple regression is to assess\ (A) are uncorrelated with each other but are moderately correlated with the
\ the relationship between one continuous variable and a set of independent variables.\ criterion (B) have low correlations with each other and low correlations with
\ So the predictors should be uncorrelated with each other but are moderately correlated\ the criterion (C) are highly intercorrelated with each other and moderately
\ with the criterion. The answer is (A).\n\nQ: There are three ways to measure the\ correlated with the criterion (D) have low correlations with the criterion bur
\ Central Tendency: the Mean, the Median and the Mode. From your knowledge about\ are moderately correlated with each other'
\ them, what is the mode?\n(A) less sensitive to extreme scores than the mean (B)\ target: Let's think step by step. We refer to Wikipedia articles on psychology
\ more useful for skewed distributions (C) sensitive to extreme values and highly\ for help. The basis of multiple regression is to assess the relationship between
\ skewed distributions (D) the most frequently occurring number\nA: Let's think\ one continuous variable and a set of independent variables. So the predictors
\ step by step. We refer to Wikipedia articles on psychology for help. The definition\ should be uncorrelated with each other but are moderately correlated with the
\ of mode is the most frequently occurring number. The answer is (D).\n\nQ: Carl\ criterion. The answer is (A).
\ Jung believed that a client's transference:\n(A) is a fantasy that distracts the\ - question: 'There are three ways to measure the Central Tendency: the Mean, the Median
\ client from reality. (B) represents “mixed feelings” toward the therapist. (C)\ and the Mode. From your knowledge about them, what is the mode?
\ \"is a form of \"\"acting out.\"\"\" (D) reflects the client’s personal and collective\
\ unconscious.\nA: Let's think step by step. We refer to Wikipedia articles on psychology\ (A) less sensitive to extreme scores than the mean (B) more useful for skewed
\ for help. Transference is a phenomenon that a person's feelings are unconsciously\ distributions (C) sensitive to extreme values and highly skewed distributions
\ redirected, so it reflects the client's personal and collective unconscious. The\ (D) the most frequently occurring number'
\ answer is (D).\n\nQ: In terms of Hofstede’s (1980) five cultural dimensions, the\ target: Let's think step by step. We refer to Wikipedia articles on psychology
\ United States scores at the top of the scale on:\n(A) individualism. (B) individualism\ for help. The definition of mode is the most frequently occurring number. The
\ and power distance. (C) power distance and masculinity. (D) uncertainty avoidance.\n\ answer is (D).
A: Let's think step by step. We refer to Wikipedia articles on psychology for help.\ - question: "Carl Jung believed that a client's transference:\n(A) is a fantasy that\
\ US scores highest on individualism among the five cultural dimensions. The answer\ \ distracts the client from reality. (B) represents \u201Cmixed feelings\u201D\
\ is (A).\n\nQ: One of your therapy clients asks your advice about a good weight-\ \ toward the therapist. (C) \"is a form of \"\"acting out.\"\"\" (D) reflects\
\ reduction program. You have investigated the programs in the community and are\ \ the client\u2019s personal and collective unconscious."
\ enrolled in the one you consider the best. This program offers a $50 bonus to\ target: Let's think step by step. We refer to Wikipedia articles on psychology
\ its patrons for each new person they bring into the program. Under these circumstances,\ for help. Transference is a phenomenon that a person's feelings are unconsciously
\ your most appropriate response would be to\n(A) tell your client the pros and\ redirected, so it reflects the client's personal and collective unconscious.
\ cons of each program you know about except for the one in which you are enrolled\ The answer is (D).
\ (B) recommend to your client the program in which you are enrolled and explain\ - question: "In terms of Hofstede\u2019s (1980) five cultural dimensions, the United\
\ the $50 bonus you will receive (C) recommend to your client the program in which\ \ States scores at the top of the scale on:\n(A) individualism. (B) individualism\
\ you are enrolled and offer to have the $50 bonus credited to your client's account\ \ and power distance. (C) power distance and masculinity. (D) uncertainty avoidance."
\ in the program (D) tell your client the pros and cons of each program you know\ target: Let's think step by step. We refer to Wikipedia articles on psychology
\ about, but do not claim the $50 bonus if your client enrolls in your program\n\ for help. US scores highest on individualism among the five cultural dimensions.
A: Let's think step by step. We refer to Wikipedia articles on psychology for help.\ The answer is (A).
\ Based on the circumstances, you should tell your client about the pros and cons\ - question: 'One of your therapy clients asks your advice about a good weight- reduction
\ of each program, but it would be inappropriate to receive the bonus, so you should\ program. You have investigated the programs in the community and are enrolled
\ not claim the $50 bonus. The answer is (D).\n\n" in the one you consider the best. This program offers a $50 bonus to its patrons
"group": "mmlu_flan_cot_fewshot_social_sciences" for each new person they bring into the program. Under these circumstances,
"include": "_mmlu_flan_cot_fewshot_template_yaml" your most appropriate response would be to
"task": "mmlu_flan_cot_fewshot_professional_psychology"
(A) tell your client the pros and cons of each program you know about except
for the one in which you are enrolled (B) recommend to your client the program
in which you are enrolled and explain the $50 bonus you will receive (C) recommend
to your client the program in which you are enrolled and offer to have the $50
bonus credited to your client''s account in the program (D) tell your client
the pros and cons of each program you know about, but do not claim the $50 bonus
if your client enrolls in your program'
target: 'Let''s think step by step. We refer to Wikipedia articles on psychology
for help. Based on the circumstances, you should tell your client about the
pros and cons of each program, but it would be inappropriate to receive the
bonus, so you should not claim the $50 bonus. The answer is (D).'
group: mmlu_flan_cot_fewshot_social_sciences
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_professional_psychology
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment