Unverified Commit 4902aaaf authored by Clémentine Fourrier's avatar Clémentine Fourrier Committed by GitHub
Browse files

Making hardcoded few shots compatible with the chat template mechanism (#1895)



* init test 1

* fix

* this format seems to be working - need to update all other tasks with the new format

* bbh with few shot format

* fix fewshot bbh

* add mmlu flan cot

* samples of cot

* kmmlu

* fix gsm8k

* update keys for mmlu

* minerva math

* bbh

* fix

* fix samples

* small fixes to templates

* last prompt format change

* fixing prompt

* fixed minerva math format

* rm accidental commited file

* added doc for few shot samples

* Update lm_eval/loggers/evaluation_tracker.py

* Update lm_eval/loggers/evaluation_tracker.py

* Update docs/new_task_guide.md
Co-authored-by: default avatarHailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* added check in sampler per code review

* added the system from a function, plus an example in minerva math

* style

* Apply suggestions from code review
Co-authored-by: default avatarHailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* fix unit tests 1

* forcing use of test split

---------
Co-authored-by: default avatarHailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
parent f4f59251
"dataset_name": "public_relations"
"description": "The following are multiple choice questions (with answers) about public\
\ relations.\n\nQ: Earth Hour was a campaign launched by which organization?\n(A)\
\ Greenpeace (B) The UN (C) Oxfam (D) World Wildlife Fund\nA: Let's think step by\
\ step. We refer to Wikipedia articles on public relations for help. Earth Hour\
\ is a worldwide movement oragnized launched by the World Wildlife Fund. The answer\
\ is (D).\n\nQ: In issues management, what is the most proactive approach to addressing\
\ negative or misleading information posted online about your organization?\n(A)\
\ Buy domain names that could be used by opposition groups. (B) Post anonymous comments\
\ on blogs to combat this information. (C) Prepare a news release that discredits\
\ the inaccurate information. (D) Make policy changes to address complaints highlighted\
\ on these sites.\nA: Let's think step by step. We refer to Wikipedia articles on\
\ public relations for help. In issues management, the most proactive approach to\
\ addressing negative or misleading information posted online is to make policy\
\ changes to address complaints highlighted on those sites. The answer is (D).\n\
\nQ: At which stage in the planning process would a situation analysis be carried\
\ out?\n(A) Defining the program (B) Planning the program (C) Taking action and\
\ implementing ideas (D) Evaluation of the program\nA: Let's think step by step.\
\ We refer to Wikipedia articles on public relations for help. Situation analyses\
\ are typically carried out during the planning process stage of defining the program.\
\ The answer is (A).\n\nQ: Which of these statements is true of the Vatican in 2010\
\ at the time of the accusations of child abuse cover-ups?\n(A) There was a coordinated\
\ media response. (B) Consistent messages were communicated. (C) Criticisms were\
\ taken as attacks on the Catholic Church. (D) The credibility of the Vatican was\
\ upheld.\nA: Let's think step by step. We refer to Wikipedia articles on public\
\ relations for help. In 2010 when there were accusations of child abuse cover-ups,\
\ the Vatican took those criticisms as attacks on the Catholic Church. The answer\
\ is (C).\n\nQ: What should a public relations media practitioner do if she does\
\ not know the answer to a reporter's question?\n(A) Give the reporter other information\
\ she is certain is correct. (B) Say that the information is 'off the record' and\
\ will be disseminated later. (C) Say 'I don't know' and promise to provide the\
\ information later. (D) Say 'no comment,' rather than appear uninformed.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on public relations for help.\
\ If a public relations media practitioner does not know the answer to a reporter's\
\ question, they should say 'I don't know' and offer to provide the information\
\ later. The answer is (C).\n\n"
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_public_relations"
dataset_name: public_relations
description: The following are multiple choice questions (with answers) about public
relations.
fewshot_config:
sampler: first_n
samples:
- question: 'Earth Hour was a campaign launched by which organization?
(A) Greenpeace (B) The UN (C) Oxfam (D) World Wildlife Fund'
target: Let's think step by step. We refer to Wikipedia articles on public relations
for help. Earth Hour is a worldwide movement oragnized launched by the World
Wildlife Fund. The answer is (D).
- question: 'In issues management, what is the most proactive approach to addressing
negative or misleading information posted online about your organization?
(A) Buy domain names that could be used by opposition groups. (B) Post anonymous
comments on blogs to combat this information. (C) Prepare a news release that
discredits the inaccurate information. (D) Make policy changes to address complaints
highlighted on these sites.'
target: Let's think step by step. We refer to Wikipedia articles on public relations
for help. In issues management, the most proactive approach to addressing negative
or misleading information posted online is to make policy changes to address
complaints highlighted on those sites. The answer is (D).
- question: 'At which stage in the planning process would a situation analysis be carried
out?
(A) Defining the program (B) Planning the program (C) Taking action and implementing
ideas (D) Evaluation of the program'
target: Let's think step by step. We refer to Wikipedia articles on public relations
for help. Situation analyses are typically carried out during the planning process
stage of defining the program. The answer is (A).
- question: 'Which of these statements is true of the Vatican in 2010 at the time of
the accusations of child abuse cover-ups?
(A) There was a coordinated media response. (B) Consistent messages were communicated.
(C) Criticisms were taken as attacks on the Catholic Church. (D) The credibility
of the Vatican was upheld.'
target: Let's think step by step. We refer to Wikipedia articles on public relations
for help. In 2010 when there were accusations of child abuse cover-ups, the
Vatican took those criticisms as attacks on the Catholic Church. The answer
is (C).
- question: 'What should a public relations media practitioner do if she does not know
the answer to a reporter''s question?
(A) Give the reporter other information she is certain is correct. (B) Say that
the information is ''off the record'' and will be disseminated later. (C) Say
''I don''t know'' and promise to provide the information later. (D) Say ''no
comment,'' rather than appear uninformed.'
target: 'Let''s think step by step. We refer to Wikipedia articles on public relations
for help. If a public relations media practitioner does not know the answer
to a reporter''s question, they should say ''I don''t know'' and offer to provide
the information later. The answer is (C).'
group: mmlu_flan_cot_fewshot_social_sciences
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_public_relations
"dataset_name": "security_studies"
"description": "The following are multiple choice questions (with answers) about security\
\ studies.\n\nQ: What are the frameworks of analysis within which terrorism has\
\ been considered (as of 2020)?\n(A) Competition between larger nations has resulted\
\ in some countries actively supporting terrorist groups to undermine the strength\
\ of rival states. Terrorist networks are extended patronage clubs maintained and\
\ paid for by their donor states and are conceptualised as being like state actors,\
\ to be dealt with using military force. (B) Globalization has enabled the internationalization\
\ of terrorist activities by opening up their operational space, although coordination\
\ is still managed from a geographical base. This suggests that terrorist groups\
\ are nationally structured which means that terrorism cannot be considered in terms\
\ of a war to be defeated militarily without having serious implications on the\
\ indigenous population. (C) Terrorism can be viewed as a problem to be resolved\
\ by military means (war on terrorism), by normal police techniques (terrorism as\
\ crime), or as a medical problem with underlying causes and symptoms (terrorism\
\ as disease). (D) Terrorism is viewed as a criminal problem. The criminalization\
\ of terrorism has two important implications. Firstly, it suggests that terrorism\
\ can be eradicated - terrorists can be caught and brought to trial by normal judicial\
\ proceedings thereby removing the threat from society - and secondly, it suggests\
\ that preventative crime techniques are applicable to prevent its development.\n\
A: Let's think step by step. We refer to Wikipedia articles on security studies\
\ for help. (A) is wrong because it is not competition between larger nations that\
\ causes terrorism. \n(B) is wrong because globalization is not the cause of terrorism.\n\
(C) is correct because the US undertook the war on terrorism. \n(D) is wrong because\
\ preventative crime techniques will likely not end terrorism. The answer is (C).\n\
\nQ: Which of the following is the best lens through which to investigate the role\
\ of child soldiers?\n(A) Child soldiers are victims of combat that need re-education\
\ and rehabilitation. (B) Children and their mothers are not active subjects in\
\ warfare and are best considered as subjects in the private sphere. (C) Children\
\ are most often innocent bystanders in war and are best used as signifiers of peace.\
\ (D) Children have political subjecthood that is missed when they are considered\
\ as passive victims of warfare.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on security studies for help. Child soliders as a political topic can\
\ be missed when they are considered passive victims of warfare. The answer is (D).\n\
\nQ: How can we best describe the relationship between the state-centric approach\
\ and the concept of human security?\n(A) There are such wide divisions within the\
\ human security framework regarding the nature of threats and referent objects\
\ that no widely applicable comparisons between state-centric approaches and human\
\ security can be drawn. (B) By adopting the framework of human security, the limitations\
\ of the realist state-centric approach become evident. Whilst human security defines\
\ the referent object as the person or population, state-centric approaches prioritise\
\ the security of the state, de-prioritizing the pursuit of human security. (C)\
\ The state-centric approach to security is a faction of human security, usually\
\ defined within the broad school of human security. By being state-centric this\
\ approach prioritises the individual as the referent object in security studies.\
\ (D) Both the state-centric and human-centric approaches to security are mutually\
\ exclusive and offer a sufficient analytic framework with which to understand the\
\ international security system. It is therefore the role of security analysts to\
\ determine which of these substantial concepts is correct, and which should be\
\ discarded.\nA: Let's think step by step. We refer to Wikipedia articles on security\
\ studies for help. Human security focuses on a person or population whereas state-centric\
\ approaches focus on the state while deprioritizing human security. The answer\
\ is (B).\n\nQ: In order to become securitized, a threat must be presented in which\
\ of these ways?\n(A) As an existential threat that requires immediate and extraordinary\
\ action, posing a threat to the survival of the state or to societal security.\
\ (B) As requiring immediate and extraordinary action by the state, threatening\
\ the survival of a referent object and therefore warranting the use of measures\
\ not normally employed in the political realm. (C) As an urgent threat to the survival\
\ of the referent object, so serious that it legitimises the employment of extraordinary\
\ action in response. (D) As an urgent threat to the survival of the audience that\
\ requires extraordinary or emergency measures.\nA: Let's think step by step. We\
\ refer to Wikipedia articles on security studies for help. To be securitized, a\
\ threat must be an urgent threat to the survival of the referent object. The answer\
\ is (C).\n\nQ: What distinguishes coercive diplomacy from military force?\n(A)\
\ Compellence is another term for coercive diplomacy, but covering a narrower set\
\ of criteria; compellence covers those threats aimed at initiating adversary action.\
\ A threat to coerce a state to give up part of its territory would count as coercive\
\ diplomacy, as long as that threat proactively initiates action before reactive\
\ diplomacy is taken. (B) Coercive diplomacy constitutes the threats of limited\
\ force to induce adversary's incentive to comply with the coercer's demands. It\
\ is an influence strategy that is intended to obtain compliance: the use of force\
\ to defeat an opponent first does not count. It leaves an element of choice with\
\ the target to comply, or to continue. (C) Military force, or the threat of military\
\ force, utilises fear to achieve strategic objectives. Coercive diplomacy is differentiated\
\ from this approach, because it does not use fear as a tool for coercing an adversary.\
\ (D) Coercive diplomacy is employed to use force but to limit its effects on the\
\ international community. Coercive diplomacy is an aggressive strategy that is\
\ intended to obtain compliance through defeat. It does not leave an element of\
\ choice with the target, the target either being forced to comply or engage in\
\ conflict. It seeks to control by imposing compliance by removing any opportunity\
\ for negotiation or concession.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on security studies for help. Coercive diplomacy uses the threat of force\
\ to induce the opponent to comply with demands. The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_security_studies"
dataset_name: security_studies
description: The following are multiple choice questions (with answers) about security
studies.
fewshot_config:
sampler: first_n
samples:
- question: 'What are the frameworks of analysis within which terrorism has been considered
(as of 2020)?
(A) Competition between larger nations has resulted in some countries actively
supporting terrorist groups to undermine the strength of rival states. Terrorist
networks are extended patronage clubs maintained and paid for by their donor
states and are conceptualised as being like state actors, to be dealt with using
military force. (B) Globalization has enabled the internationalization of terrorist
activities by opening up their operational space, although coordination is still
managed from a geographical base. This suggests that terrorist groups are nationally
structured which means that terrorism cannot be considered in terms of a war
to be defeated militarily without having serious implications on the indigenous
population. (C) Terrorism can be viewed as a problem to be resolved by military
means (war on terrorism), by normal police techniques (terrorism as crime),
or as a medical problem with underlying causes and symptoms (terrorism as disease).
(D) Terrorism is viewed as a criminal problem. The criminalization of terrorism
has two important implications. Firstly, it suggests that terrorism can be eradicated
- terrorists can be caught and brought to trial by normal judicial proceedings
thereby removing the threat from society - and secondly, it suggests that preventative
crime techniques are applicable to prevent its development.'
target: "Let's think step by step. We refer to Wikipedia articles on security\
\ studies for help. (A) is wrong because it is not competition between larger\
\ nations that causes terrorism. \n(B) is wrong because globalization is not\
\ the cause of terrorism.\n(C) is correct because the US undertook the war on\
\ terrorism. \n(D) is wrong because preventative crime techniques will likely\
\ not end terrorism. The answer is (C)."
- question: 'Which of the following is the best lens through which to investigate the
role of child soldiers?
(A) Child soldiers are victims of combat that need re-education and rehabilitation.
(B) Children and their mothers are not active subjects in warfare and are best
considered as subjects in the private sphere. (C) Children are most often innocent
bystanders in war and are best used as signifiers of peace. (D) Children have
political subjecthood that is missed when they are considered as passive victims
of warfare.'
target: Let's think step by step. We refer to Wikipedia articles on security studies
for help. Child soliders as a political topic can be missed when they are considered
passive victims of warfare. The answer is (D).
- question: 'How can we best describe the relationship between the state-centric approach
and the concept of human security?
(A) There are such wide divisions within the human security framework regarding
the nature of threats and referent objects that no widely applicable comparisons
between state-centric approaches and human security can be drawn. (B) By adopting
the framework of human security, the limitations of the realist state-centric
approach become evident. Whilst human security defines the referent object as
the person or population, state-centric approaches prioritise the security of
the state, de-prioritizing the pursuit of human security. (C) The state-centric
approach to security is a faction of human security, usually defined within
the broad school of human security. By being state-centric this approach prioritises
the individual as the referent object in security studies. (D) Both the state-centric
and human-centric approaches to security are mutually exclusive and offer a
sufficient analytic framework with which to understand the international security
system. It is therefore the role of security analysts to determine which of
these substantial concepts is correct, and which should be discarded.'
target: Let's think step by step. We refer to Wikipedia articles on security studies
for help. Human security focuses on a person or population whereas state-centric
approaches focus on the state while deprioritizing human security. The answer
is (B).
- question: 'In order to become securitized, a threat must be presented in which of
these ways?
(A) As an existential threat that requires immediate and extraordinary action,
posing a threat to the survival of the state or to societal security. (B) As
requiring immediate and extraordinary action by the state, threatening the survival
of a referent object and therefore warranting the use of measures not normally
employed in the political realm. (C) As an urgent threat to the survival of
the referent object, so serious that it legitimises the employment of extraordinary
action in response. (D) As an urgent threat to the survival of the audience
that requires extraordinary or emergency measures.'
target: Let's think step by step. We refer to Wikipedia articles on security studies
for help. To be securitized, a threat must be an urgent threat to the survival
of the referent object. The answer is (C).
- question: 'What distinguishes coercive diplomacy from military force?
(A) Compellence is another term for coercive diplomacy, but covering a narrower
set of criteria; compellence covers those threats aimed at initiating adversary
action. A threat to coerce a state to give up part of its territory would count
as coercive diplomacy, as long as that threat proactively initiates action before
reactive diplomacy is taken. (B) Coercive diplomacy constitutes the threats
of limited force to induce adversary''s incentive to comply with the coercer''s
demands. It is an influence strategy that is intended to obtain compliance:
the use of force to defeat an opponent first does not count. It leaves an element
of choice with the target to comply, or to continue. (C) Military force, or
the threat of military force, utilises fear to achieve strategic objectives.
Coercive diplomacy is differentiated from this approach, because it does not
use fear as a tool for coercing an adversary. (D) Coercive diplomacy is employed
to use force but to limit its effects on the international community. Coercive
diplomacy is an aggressive strategy that is intended to obtain compliance through
defeat. It does not leave an element of choice with the target, the target either
being forced to comply or engage in conflict. It seeks to control by imposing
compliance by removing any opportunity for negotiation or concession.'
target: 'Let''s think step by step. We refer to Wikipedia articles on security
studies for help. Coercive diplomacy uses the threat of force to induce the
opponent to comply with demands. The answer is (B).'
group: mmlu_flan_cot_fewshot_social_sciences
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_security_studies
"dataset_name": "sociology"
"description": "The following are multiple choice questions (with answers) about sociology.\n\
\nQ: Which of the following is not a problem associated with official statistics\
\ on strike action?\n(A) most strikes go unnoticed by employers and the mass media\
\ (B) not all industrial disputes will be reported by the employer (C) the definition\
\ of strikes excludes those that involve fewer than ten workers or last less than\
\ one day (D) it is hard to compare strikes that were measured in different ways\n\
A: Let's think step by step. We refer to Wikipedia articles on sociology for help.\
\ Official statistics on strike action can be problematic because not all industrial\
\ disputes will be reported by employers, the definition of strikes excludes those\
\ that involves fewer than ten workers or last less than one day, and it is hard\
\ to compare strikes that were measured in different ways. Thus, (A) is not a problem\
\ associated with official statistics on strike action. The answer is (A).\n\nQ:\
\ What does Berger (1963) describe as a metaphor for social reality?\n(A) a fairground\
\ ride (B) a circus (C) a puppet theatre (D) a ballet\nA: Let's think step by step.\
\ We refer to Wikipedia articles on sociology for help. Berger describes social\
\ reality using the metaphor of a puppet theatre. The answer is (C).\n\nQ: The term\
\ 'hegemony' refers to:\n(A) the tendency for the working class not to realize their\
\ own interests (B) a dominant ideology that legitimates economic, political and\
\ cultural power (C) a form of dual consciousness based on ideology and everyday\
\ experiences (D) a mode of payment given for outstanding topiary\nA: Let's think\
\ step by step. We refer to Wikipedia articles on sociology for help. Hegemony refers\
\ to a dominant ideology that legitimates economic, policital, and cultural power.\
\ The answer is (B).\n\nQ: The shift from 'civil religion' to 'common religion'\
\ means that:\n(A) the increasing bureaucracy of the state has made religion only\
\ a marginal part of our lives (B) despite the weakening of traditional authority,\
\ our everyday lives and 'common sense' remain shaped by religious beliefs and values\
\ (C) religious participation in collective worship may have declined, but people\
\ still practise their faiths in private (D) people are much more likely to discuss\
\ their religious beliefs in public, informal settings\nA: Let's think step by step.\
\ We refer to Wikipedia articles on sociology for help. The shift from civil religion\
\ to common religion means that despite the weakening of traditional authority,\
\ our everyday lives and common sense remain shaped by religious beliefs and values.\
\ The answer is (B).\n\nQ: Which of the following did the post-war welfare state\
\ of 1948 not aim to provide:\n(A) free health care and education for all (B) a\
\ minimum wage (C) full employment (D) universal welfare\nA: Let's think step by\
\ step. We refer to Wikipedia articles on sociology for help. The post-war welfare\
\ state of 1948 aimed to provide free healthcare and education, full employment,\
\ and universal welfare. But it did not aim to provide a minimum wage. The answer\
\ is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_sociology"
dataset_name: sociology
description: The following are multiple choice questions (with answers) about sociology.
fewshot_config:
sampler: first_n
samples:
- question: 'Which of the following is not a problem associated with official statistics
on strike action?
(A) most strikes go unnoticed by employers and the mass media (B) not all industrial
disputes will be reported by the employer (C) the definition of strikes excludes
those that involve fewer than ten workers or last less than one day (D) it is
hard to compare strikes that were measured in different ways'
target: Let's think step by step. We refer to Wikipedia articles on sociology
for help. Official statistics on strike action can be problematic because not
all industrial disputes will be reported by employers, the definition of strikes
excludes those that involves fewer than ten workers or last less than one day,
and it is hard to compare strikes that were measured in different ways. Thus,
(A) is not a problem associated with official statistics on strike action. The
answer is (A).
- question: 'What does Berger (1963) describe as a metaphor for social reality?
(A) a fairground ride (B) a circus (C) a puppet theatre (D) a ballet'
target: Let's think step by step. We refer to Wikipedia articles on sociology
for help. Berger describes social reality using the metaphor of a puppet theatre.
The answer is (C).
- question: 'The term ''hegemony'' refers to:
(A) the tendency for the working class not to realize their own interests (B)
a dominant ideology that legitimates economic, political and cultural power
(C) a form of dual consciousness based on ideology and everyday experiences
(D) a mode of payment given for outstanding topiary'
target: Let's think step by step. We refer to Wikipedia articles on sociology
for help. Hegemony refers to a dominant ideology that legitimates economic,
policital, and cultural power. The answer is (B).
- question: 'The shift from ''civil religion'' to ''common religion'' means that:
(A) the increasing bureaucracy of the state has made religion only a marginal
part of our lives (B) despite the weakening of traditional authority, our everyday
lives and ''common sense'' remain shaped by religious beliefs and values (C)
religious participation in collective worship may have declined, but people
still practise their faiths in private (D) people are much more likely to discuss
their religious beliefs in public, informal settings'
target: Let's think step by step. We refer to Wikipedia articles on sociology
for help. The shift from civil religion to common religion means that despite
the weakening of traditional authority, our everyday lives and common sense
remain shaped by religious beliefs and values. The answer is (B).
- question: 'Which of the following did the post-war welfare state of 1948 not aim
to provide:
(A) free health care and education for all (B) a minimum wage (C) full employment
(D) universal welfare'
target: 'Let''s think step by step. We refer to Wikipedia articles on sociology
for help. The post-war welfare state of 1948 aimed to provide free healthcare
and education, full employment, and universal welfare. But it did not aim to
provide a minimum wage. The answer is (B).'
group: mmlu_flan_cot_fewshot_social_sciences
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_sociology
"dataset_name": "us_foreign_policy"
"description": "The following are multiple choice questions (with answers) about us\
\ foreign policy.\n\nQ: How did Donald Trump attack globalization in the 2016 campaign?\n\
(A) Globalization had made men like him too rich (B) Globalization only benefited\
\ certain American states, such as New York (C) Liberal elites had encouraged globalization,\
\ while 'ordinary Americans' lost jobs because of it (D) Globalization encouraged\
\ damaging trade wars\nA: Let's think step by step. We refer to Wikipedia articles\
\ on us foreign policy for help. Trump attacked globalization because he believed\
\ ordinary Americans lost jobs due to it, and so he wanted to blame liberals who\
\ had encouraged it. The answer is (C).\n\nQ: How did NSC-68 change U.S. strategy?\n\
(A) It globalized containment. (B) It militarized containment. (C) It called for\
\ the development of the hydrogen bomb. (D) All of the above\nA: Let's think step\
\ by step. We refer to Wikipedia articles on us foreign policy for help. NSC-68\
\ outlined a variety of courses of action, including globalization of containment,\
\ militarization of contaiment, and the development of the hydrogen bomb. The answer\
\ is (D).\n\nQ: How do Defensive Realism and Offensive Realism differ in their explanation\
\ of state behaviour?\n(A) Defensive realists place greater emphasis on the role\
\ of international institutions (B) Defensive realists place less emphasis on geographical\
\ factors (C) Offensive realists give more priority to the national interest than\
\ Defensive realists. (D) Defensive realists believe states are security maximizers,\
\ while Offensive realists believe states to be power maximizers\nA: Let's think\
\ step by step. We refer to Wikipedia articles on us foreign policy for help. While\
\ defensive realism advocates that states are security maximizers, offensive realists\
\ think of states as power maximizers. The answer is (D).\n\nQ: The realm of policy\
\ decisions concerned primarily with relations between the United States and the\
\ rest of the world is known as\n(A) terrorism policy. (B) economic policy. (C)\
\ foreign policy. (D) international policy.\nA: Let's think step by step. We refer\
\ to Wikipedia articles on us foreign policy for help. The topic of policy decisions\
\ concerns with relations between the US and the rest of the world is known as foreign\
\ policy. The answer is (C).\n\nQ: How did the 2008 financial crisis affect America's\
\ international reputation?\n(A) It damaged support for the US model of political\
\ economy and capitalism (B) It created anger at the United States for exaggerating\
\ the crisis (C) It increased support for American global leadership under President\
\ Obama (D) It reduced global use of the US dollar\nA: Let's think step by step.\
\ We refer to Wikipedia articles on us foreign policy for help. The 2008 financial\
\ crisis damanged the international reputation of the American model of political\
\ economy and capitalism. The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_us_foreign_policy"
dataset_name: us_foreign_policy
description: The following are multiple choice questions (with answers) about us foreign
policy.
fewshot_config:
sampler: first_n
samples:
- question: 'How did Donald Trump attack globalization in the 2016 campaign?
(A) Globalization had made men like him too rich (B) Globalization only benefited
certain American states, such as New York (C) Liberal elites had encouraged
globalization, while ''ordinary Americans'' lost jobs because of it (D) Globalization
encouraged damaging trade wars'
target: Let's think step by step. We refer to Wikipedia articles on us foreign
policy for help. Trump attacked globalization because he believed ordinary Americans
lost jobs due to it, and so he wanted to blame liberals who had encouraged it.
The answer is (C).
- question: 'How did NSC-68 change U.S. strategy?
(A) It globalized containment. (B) It militarized containment. (C) It called
for the development of the hydrogen bomb. (D) All of the above'
target: Let's think step by step. We refer to Wikipedia articles on us foreign
policy for help. NSC-68 outlined a variety of courses of action, including globalization
of containment, militarization of contaiment, and the development of the hydrogen
bomb. The answer is (D).
- question: 'How do Defensive Realism and Offensive Realism differ in their explanation
of state behaviour?
(A) Defensive realists place greater emphasis on the role of international institutions
(B) Defensive realists place less emphasis on geographical factors (C) Offensive
realists give more priority to the national interest than Defensive realists.
(D) Defensive realists believe states are security maximizers, while Offensive
realists believe states to be power maximizers'
target: Let's think step by step. We refer to Wikipedia articles on us foreign
policy for help. While defensive realism advocates that states are security
maximizers, offensive realists think of states as power maximizers. The answer
is (D).
- question: 'The realm of policy decisions concerned primarily with relations between
the United States and the rest of the world is known as
(A) terrorism policy. (B) economic policy. (C) foreign policy. (D) international
policy.'
target: Let's think step by step. We refer to Wikipedia articles on us foreign
policy for help. The topic of policy decisions concerns with relations between
the US and the rest of the world is known as foreign policy. The answer is (C).
- question: 'How did the 2008 financial crisis affect America''s international reputation?
(A) It damaged support for the US model of political economy and capitalism
(B) It created anger at the United States for exaggerating the crisis (C) It
increased support for American global leadership under President Obama (D) It
reduced global use of the US dollar'
target: 'Let''s think step by step. We refer to Wikipedia articles on us foreign
policy for help. The 2008 financial crisis damanged the international reputation
of the American model of political economy and capitalism. The answer is (A).'
group: mmlu_flan_cot_fewshot_social_sciences
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_us_foreign_policy
"dataset_name": "virology"
"description": "The following are multiple choice questions (with answers) about virology.\n\
\nQ: The median survival time to AIDS and death was established by following:\n\
(A) Seroprevalent HIV-infected individuals (B) Seronegatives (C) Seroconverters\
\ (D) High-risk seronegatives\nA: Let's think step by step. We refer to Wikipedia\
\ articles on virology for help. The median survival time to AIDS and death was\
\ established as a result of the development of seroconverters. The answer is (C).\n\
\nQ: Which of the following is a morphological characteristic of the paramyxoviruses.\n\
(A) Fragile viruses often visualised with RNA spewing from the inside (B) Elongate\
\ viruses (C) Icosahedral viruses with envelope (D) Very large viruses\nA: Let's\
\ think step by step. We refer to Wikipedia articles on virology for help. Paramyxoviruses\
\ are fragile viruses often visualised with RNA spewing from the inside. The answer\
\ is (A).\n\nQ: The most important goal of a behavioral intervention is:\n(A) Change\
\ in behavior (B) Comprehensive coverage (C) Effective use of behavioral theory\
\ (D) Sustained behavior change\nA: Let's think step by step. We refer to Wikipedia\
\ articles on virology for help. The prim goal of a behavioral intervention is to\
\ cause sustained behavior change. The answer is (D).\n\nQ: A key factor facilitating\
\ the application of nested case-control studies from the MACS was:\n(A) Data collection\
\ (B) Establishment of a repository of biologic specimens (C) Participant interest\
\ (D) Administration of the questionnaire by staff\nA: Let's think step by step.\
\ We refer to Wikipedia articles on virology for help. The Multicenter AIDS Cohort\
\ Study's use of nested case-control studies was facilitated by the establishment\
\ of a repository of biologic specimens. The answer is (B).\n\nQ: Why are parvoviruses\
\ a highly impactful parasite?\n(A) Because they have no nucleic acid (B) They require\
\ a helper virus (C) Only replicate in dividing cells (D) Can integrate into host\
\ chromosomes\nA: Let's think step by step. We refer to Wikipedia articles on virology\
\ for help. Paroviruses are highly impactful because they do not have nucleic acid.\
\ The answer is (A).\n\n"
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_virology"
dataset_name: virology
description: The following are multiple choice questions (with answers) about virology.
fewshot_config:
sampler: first_n
samples:
- question: 'The median survival time to AIDS and death was established by following:
(A) Seroprevalent HIV-infected individuals (B) Seronegatives (C) Seroconverters
(D) High-risk seronegatives'
target: Let's think step by step. We refer to Wikipedia articles on virology for
help. The median survival time to AIDS and death was established as a result
of the development of seroconverters. The answer is (C).
- question: 'Which of the following is a morphological characteristic of the paramyxoviruses.
(A) Fragile viruses often visualised with RNA spewing from the inside (B) Elongate
viruses (C) Icosahedral viruses with envelope (D) Very large viruses'
target: Let's think step by step. We refer to Wikipedia articles on virology for
help. Paramyxoviruses are fragile viruses often visualised with RNA spewing
from the inside. The answer is (A).
- question: 'The most important goal of a behavioral intervention is:
(A) Change in behavior (B) Comprehensive coverage (C) Effective use of behavioral
theory (D) Sustained behavior change'
target: Let's think step by step. We refer to Wikipedia articles on virology for
help. The prim goal of a behavioral intervention is to cause sustained behavior
change. The answer is (D).
- question: 'A key factor facilitating the application of nested case-control studies
from the MACS was:
(A) Data collection (B) Establishment of a repository of biologic specimens
(C) Participant interest (D) Administration of the questionnaire by staff'
target: Let's think step by step. We refer to Wikipedia articles on virology for
help. The Multicenter AIDS Cohort Study's use of nested case-control studies
was facilitated by the establishment of a repository of biologic specimens.
The answer is (B).
- question: 'Why are parvoviruses a highly impactful parasite?
(A) Because they have no nucleic acid (B) They require a helper virus (C) Only
replicate in dividing cells (D) Can integrate into host chromosomes'
target: 'Let''s think step by step. We refer to Wikipedia articles on virology
for help. Paroviruses are highly impactful because they do not have nucleic
acid. The answer is (A).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_virology
"dataset_name": "world_religions"
"description": "The following are multiple choice questions (with answers) about world\
\ religions.\n\nQ: How can the Upanishads be characterized?\n(A) Ritual texts (B)\
\ Philosophical texts (C) Hymns (D) Origin stories\nA: Let's think step by step.\
\ We refer to Wikipedia articles on world religions for help. The Upanishads are\
\ the most recent part of Vedas (the oldest scriptures in Hinduism) and supplied\
\ the basis of later Hindu philosophy. So they are philosophical texts. The answer\
\ is (B).\n\nQ: What is the Second Gem in Buddhism?\n(A) The Dharma (B) The Sangha\
\ (C) The Buddha (D) The Bodhisattva\nA: Let's think step by step. We refer to Wikipedia\
\ articles on world religions for help. The Second Gem in Buddhism is The Dharma.\
\ The answer is (A).\n\nQ: Which Japanese government promoted a kind of national\
\ cult based on the emperor and his associations with kami?\n(A) Honen (B) Tanaka\
\ (C) Tokugawa (D) Meiji\nA: Let's think step by step. We refer to Wikipedia articles\
\ on world religions for help. The promotion of a national cult based on the emperor\
\ and his associations with Kami happened during the reign of Emperor Meiji (1852-1912).\
\ The answer is (D).\n\nQ: In which dynasty was the \"Mandate of Heaven\" developed\
\ to legitimatize the new rulers?\n(A) Shang (B) Zhou (C) Han (D) Xia\nA: Let's\
\ think step by step. We refer to Wikipedia articles on world religions for help.\
\ The \"Mandate of Heaven\" was developed as an ancient Chinese philosophical concept\
\ during the Zhou Dynasty (1046-256 BCE). The answer is (B).\n\nQ: What is the sign\
\ of the covenant for Jewish males?\n(A) The rainbow (B) Circumcision (C) A son\
\ (D) Bar mitzvah\nA: Let's think step by step. We refer to Wikipedia articles on\
\ world religions for help. In Judaism, the most distinctive sign of the covenant\
\ is circumcision (brit milah). The answer is (B).\n\n"
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_world_religions"
dataset_name: world_religions
description: The following are multiple choice questions (with answers) about world
religions.
fewshot_config:
sampler: first_n
samples:
- question: 'How can the Upanishads be characterized?
(A) Ritual texts (B) Philosophical texts (C) Hymns (D) Origin stories'
target: Let's think step by step. We refer to Wikipedia articles on world religions
for help. The Upanishads are the most recent part of Vedas (the oldest scriptures
in Hinduism) and supplied the basis of later Hindu philosophy. So they are philosophical
texts. The answer is (B).
- question: 'What is the Second Gem in Buddhism?
(A) The Dharma (B) The Sangha (C) The Buddha (D) The Bodhisattva'
target: Let's think step by step. We refer to Wikipedia articles on world religions
for help. The Second Gem in Buddhism is The Dharma. The answer is (A).
- question: 'Which Japanese government promoted a kind of national cult based on the
emperor and his associations with kami?
(A) Honen (B) Tanaka (C) Tokugawa (D) Meiji'
target: Let's think step by step. We refer to Wikipedia articles on world religions
for help. The promotion of a national cult based on the emperor and his associations
with Kami happened during the reign of Emperor Meiji (1852-1912). The answer
is (D).
- question: 'In which dynasty was the "Mandate of Heaven" developed to legitimatize
the new rulers?
(A) Shang (B) Zhou (C) Han (D) Xia'
target: Let's think step by step. We refer to Wikipedia articles on world religions
for help. The "Mandate of Heaven" was developed as an ancient Chinese philosophical
concept during the Zhou Dynasty (1046-256 BCE). The answer is (B).
- question: 'What is the sign of the covenant for Jewish males?
(A) The rainbow (B) Circumcision (C) A son (D) Bar mitzvah'
target: 'Let''s think step by step. We refer to Wikipedia articles on world religions
for help. In Judaism, the most distinctive sign of the covenant is circumcision
(brit milah). The answer is (B).'
group: mmlu_flan_cot_fewshot_humanities
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_world_religions
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment