Commit 6769119f authored by Hailey Schoelkopf, committed by GitHub

Merge pull request #816 from EleutherAI/flan-benchmark

[Refactor] Flan benchmark
parents 4824a832 7d5e511c
dataset_name: us_foreign_policy
description: 'The following are multiple choice questions (with answers) about us
foreign policy.
Q: How did Donald Trump attack globalization in the 2016 campaign?
(A) Globalization had made men like him too rich (B) Globalization only benefited
certain American states, such as New York (C) Liberal elites had encouraged globalization,
while ''ordinary Americans'' lost jobs because of it (D) Globalization encouraged
damaging trade wars
A: Let''s think step by step. We refer to Wikipedia articles on us foreign policy
for help. Trump attacked globalization because he believed ordinary Americans lost
jobs due to it, and so he wanted to blame liberals who had encouraged it. The answer
is (C).
Q: How did NSC-68 change U.S. strategy?
(A) It globalized containment. (B) It militarized containment. (C) It called for
the development of the hydrogen bomb. (D) All of the above
A: Let''s think step by step. We refer to Wikipedia articles on us foreign policy
for help. NSC-68 outlined a variety of courses of action, including globalization
of containment, militarization of containment, and the development of the hydrogen
bomb. The answer is (D).
Q: How do Defensive Realism and Offensive Realism differ in their explanation of
state behaviour?
(A) Defensive realists place greater emphasis on the role of international institutions
(B) Defensive realists place less emphasis on geographical factors (C) Offensive
realists give more priority to the national interest than Defensive realists. (D)
Defensive realists believe states are security maximizers, while Offensive realists
believe states to be power maximizers
A: Let''s think step by step. We refer to Wikipedia articles on us foreign policy
for help. While defensive realism advocates that states are security maximizers,
offensive realists think of states as power maximizers. The answer is (D).
Q: The realm of policy decisions concerned primarily with relations between the
United States and the rest of the world is known as
(A) terrorism policy. (B) economic policy. (C) foreign policy. (D) international
policy.
A: Let''s think step by step. We refer to Wikipedia articles on us foreign policy
for help. The realm of policy decisions concerned with relations between the US and
the rest of the world is known as foreign policy. The answer is (C).
Q: How did the 2008 financial crisis affect America''s international reputation?
(A) It damaged support for the US model of political economy and capitalism (B)
It created anger at the United States for exaggerating the crisis (C) It increased
support for American global leadership under President Obama (D) It reduced global
use of the US dollar
A: Let''s think step by step. We refer to Wikipedia articles on us foreign policy
for help. The 2008 financial crisis damaged the international reputation of the
American model of political economy and capitalism. The answer is (A).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_us_foreign_policy
dataset_name: virology
description: 'The following are multiple choice questions (with answers) about virology.
Q: The median survival time to AIDS and death was established by following:
(A) Seroprevalent HIV-infected individuals (B) Seronegatives (C) Seroconverters
(D) High-risk seronegatives
A: Let''s think step by step. We refer to Wikipedia articles on virology for help.
The median survival time to AIDS and death was established as a result of the development
of seroconverters. The answer is (C).
Q: Which of the following is a morphological characteristic of the paramyxoviruses.
(A) Fragile viruses often visualised with RNA spewing from the inside (B) Elongate
viruses (C) Icosahedral viruses with envelope (D) Very large viruses
A: Let''s think step by step. We refer to Wikipedia articles on virology for help.
Paramyxoviruses are fragile viruses often visualised with RNA spewing from the inside.
The answer is (A).
Q: The most important goal of a behavioral intervention is:
(A) Change in behavior (B) Comprehensive coverage (C) Effective use of behavioral
theory (D) Sustained behavior change
A: Let''s think step by step. We refer to Wikipedia articles on virology for help.
The primary goal of a behavioral intervention is to cause sustained behavior change.
The answer is (D).
Q: A key factor facilitating the application of nested case-control studies from
the MACS was:
(A) Data collection (B) Establishment of a repository of biologic specimens (C)
Participant interest (D) Administration of the questionnaire by staff
A: Let''s think step by step. We refer to Wikipedia articles on virology for help.
The Multicenter AIDS Cohort Study''s use of nested case-control studies was facilitated
by the establishment of a repository of biologic specimens. The answer is (B).
Q: Why are parvoviruses a highly impactful parasite?
(A) Because they have no nucleic acid (B) They require a helper virus (C) Only replicate
in dividing cells (D) Can integrate into host chromosomes
A: Let''s think step by step. We refer to Wikipedia articles on virology for help.
Parvoviruses are highly impactful because they do not have nucleic acid. The answer
is (A).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_virology
dataset_name: world_religions
description: 'The following are multiple choice questions (with answers) about world
religions.
Q: How can the Upanishads be characterized?
(A) Ritual texts (B) Philosophical texts (C) Hymns (D) Origin stories
A: Let''s think step by step. We refer to Wikipedia articles on world religions
for help. The Upanishads are the most recent part of Vedas (the oldest scriptures
in Hinduism) and supplied the basis of later Hindu philosophy. So they are philosophical
texts. The answer is (B).
Q: What is the Second Gem in Buddhism?
(A) The Dharma (B) The Sangha (C) The Buddha (D) The Bodhisattva
A: Let''s think step by step. We refer to Wikipedia articles on world religions
for help. The Second Gem in Buddhism is The Dharma. The answer is (A).
Q: Which Japanese government promoted a kind of national cult based on the emperor
and his associations with kami?
(A) Honen (B) Tanaka (C) Tokugawa (D) Meiji
A: Let''s think step by step. We refer to Wikipedia articles on world religions
for help. The promotion of a national cult based on the emperor and his associations
with Kami happened during the reign of Emperor Meiji (1852-1912). The answer is
(D).
Q: In which dynasty was the "Mandate of Heaven" developed to legitimatize the new
rulers?
(A) Shang (B) Zhou (C) Han (D) Xia
A: Let''s think step by step. We refer to Wikipedia articles on world religions
for help. The "Mandate of Heaven" was developed as an ancient Chinese philosophical
concept during the Zhou Dynasty (1046-256 BCE). The answer is (B).
Q: What is the sign of the covenant for Jewish males?
(A) The rainbow (B) Circumcision (C) A son (D) Bar mitzvah
A: Let''s think step by step. We refer to Wikipedia articles on world religions
for help. In Judaism, the most distinctive sign of the covenant is circumcision
(brit milah). The answer is (B).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_world_religions
group: mmlu_flan_cot_zeroshot
dataset_path: cais/mmlu
validation_split: validation
fewshot_split: dev
output_type: greedy_until
doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA: Let's think step by step."
doc_to_target: "{{['(A)', '(B)', '(C)', '(D)'][answer]}}"
filter_list:
  - name: "get-answer"
    filter:
      - function: "regex"
        regex_pattern: "((?<=The answer is )(.*)(?=.)|(?<=the answer is )(.*)(?=.)|(?<=The answer: )(.*)(?=.)|(?<=The final answer: )(.*)(?=.))"
      - function: "take_first"
generation_kwargs:
  until:
    - "</s>"
  do_sample: false
  temperature: 0.0
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
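
To make the template above easier to review, here is a minimal Python sketch of how its pieces might fit together: prompt rendering (`doc_to_text`), the gold label (`doc_to_target`), the `get-answer` regex filter, and an `exact_match` comparison with `ignore_case` and `ignore_punctuation`. It is not the harness's implementation. The function names, the `[invalid]` fallback, and the normalization rule are assumptions made for illustration; only the regex and the prompt layout are copied from the config.

```python
import re
import string

# Pattern copied verbatim from regex_pattern in the template above.
ANSWER_PATTERN = re.compile(
    r"((?<=The answer is )(.*)(?=.)|(?<=the answer is )(.*)(?=.)"
    r"|(?<=The answer: )(.*)(?=.)|(?<=The final answer: )(.*)(?=.))"
)

CHOICE_LABELS = ["(A)", "(B)", "(C)", "(D)"]


def render_prompt(doc: dict) -> str:
    """Plain-Python stand-in for the doc_to_text Jinja template."""
    options = " ".join(
        f"{label} {choice}" for label, choice in zip(CHOICE_LABELS, doc["choices"])
    )
    return f"Q: {doc['question'].strip()}\n{options}\nA: Let's think step by step."


def render_target(doc: dict) -> str:
    """Stand-in for doc_to_target: map the answer index to its label."""
    return CHOICE_LABELS[doc["answer"]]


def extract_answer(generation: str, fallback: str = "[invalid]") -> str:
    """Return the first regex match, or a hypothetical fallback marker."""
    match = ANSWER_PATTERN.search(generation)
    return match.group(0) if match else fallback


def normalize(text: str) -> str:
    """Assumed reading of ignore_case + ignore_punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def exact_match(prediction: str, target: str) -> bool:
    return normalize(prediction) == normalize(target)


if __name__ == "__main__":
    doc = {
        "question": "The realm of policy decisions concerned primarily with "
        "relations between the United States and the rest of the world is known as",
        "choices": [
            "terrorism policy.",
            "economic policy.",
            "foreign policy.",
            "international policy.",
        ],
        "answer": 2,
    }
    generation = " We refer to Wikipedia articles for help. The answer is (C)."
    print(render_prompt(doc))
    prediction = extract_answer(generation)
    print(prediction, exact_match(prediction, render_target(doc)))
```

One detail worth checking in review: the trailing lookahead `(?=.)` uses an unescaped dot, so it matches any character rather than a literal period. In this sketch, a generation that ends in `the answer is (B)` with no final period yields `(B`. Punctuation stripping hides the difference for parenthesized letters, but it may matter for other answer formats.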
dataset_name: abstract_algebra
description: 'The following are multiple choice questions (with answers) about abstract
algebra.
'
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_abstract_algebra
dataset_name: anatomy
description: 'The following are multiple choice questions (with answers) about anatomy.
'
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_anatomy
dataset_name: astronomy
description: 'The following are multiple choice questions (with answers) about astronomy.
'
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_astronomy
dataset_name: business_ethics
description: 'The following are multiple choice questions (with answers) about business
ethics.
'
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_business_ethics
dataset_name: clinical_knowledge
description: 'The following are multiple choice questions (with answers) about clinical
knowledge.
'
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_clinical_knowledge
dataset_name: college_biology
description: 'The following are multiple choice questions (with answers) about college
biology.
'
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_college_biology
dataset_name: college_chemistry
description: 'The following are multiple choice questions (with answers) about college
chemistry.
'
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_college_chemistry
dataset_name: college_computer_science
description: 'The following are multiple choice questions (with answers) about college
computer science.
'
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_college_computer_science
dataset_name: college_mathematics
description: 'The following are multiple choice questions (with answers) about college
mathematics.
'
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_college_mathematics
dataset_name: college_medicine
description: 'The following are multiple choice questions (with answers) about college
medicine.
'
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_college_medicine
dataset_name: college_physics
description: 'The following are multiple choice questions (with answers) about college
physics.
'
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_college_physics
dataset_name: computer_security
description: 'The following are multiple choice questions (with answers) about computer
security.
'
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_computer_security
dataset_name: conceptual_physics
description: 'The following are multiple choice questions (with answers) about conceptual
physics.
'
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_conceptual_physics
dataset_name: econometrics
description: 'The following are multiple choice questions (with answers) about econometrics.
'
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_econometrics
dataset_name: electrical_engineering
description: 'The following are multiple choice questions (with answers) about electrical
engineering.
'
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_electrical_engineering
dataset_name: elementary_mathematics
description: 'The following are multiple choice questions (with answers) about elementary
mathematics.
'
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_elementary_mathematics