Commit e1ae8a2f authored by Herbie Bradley

Merge remote-tracking branch 'origin/big-refactor' into calibration

parents 50e99bd7 30936bc7
"dataset_name": "us_foreign_policy"
"description": "The following are multiple choice questions (with answers) about us\
\ foreign policy.\n\nQ: How did Donald Trump attack globalization in the 2016 campaign?\n\
(A) Globalization had made men like him too rich (B) Globalization only benefited\
\ certain American states, such as New York (C) Liberal elites had encouraged globalization,\
\ while 'ordinary Americans' lost jobs because of it (D) Globalization encouraged\
\ damaging trade wars\nA: Let's think step by step. We refer to Wikipedia articles\
\ on us foreign policy for help. Trump attacked globalization because he believed\
\ ordinary Americans lost jobs due to it, and so he wanted to blame liberals who\
\ had encouraged it. The answer is (C).\n\nQ: How did NSC-68 change U.S. strategy?\n\
(A) It globalized containment. (B) It militarized containment. (C) It called for\
\ the development of the hydrogen bomb. (D) All of the above\nA: Let's think step\
\ by step. We refer to Wikipedia articles on us foreign policy for help. NSC-68\
\ outlined a variety of courses of action, including globalization of containment,\
\ militarization of containment, and the development of the hydrogen bomb. The answer\
\ is (D).\n\nQ: How do Defensive Realism and Offensive Realism differ in their explanation\
\ of state behaviour?\n(A) Defensive realists place greater emphasis on the role\
\ of international institutions (B) Defensive realists place less emphasis on geographical\
\ factors (C) Offensive realists give more priority to the national interest than\
\ Defensive realists. (D) Defensive realists believe states are security maximizers,\
\ while Offensive realists believe states to be power maximizers\nA: Let's think\
\ step by step. We refer to Wikipedia articles on us foreign policy for help. While\
\ defensive realism advocates that states are security maximizers, offensive realists\
\ think of states as power maximizers. The answer is (D).\n\nQ: The realm of policy\
\ decisions concerned primarily with relations between the United States and the\
\ rest of the world is known as\n(A) terrorism policy. (B) economic policy. (C)\
\ foreign policy. (D) international policy.\nA: Let's think step by step. We refer\
\ to Wikipedia articles on us foreign policy for help. The topic of policy decisions\
\ concerned with relations between the US and the rest of the world is known as foreign\
\ policy. The answer is (C).\n\nQ: How did the 2008 financial crisis affect America's\
\ international reputation?\n(A) It damaged support for the US model of political\
\ economy and capitalism (B) It created anger at the United States for exaggerating\
\ the crisis (C) It increased support for American global leadership under President\
\ Obama (D) It reduced global use of the US dollar\nA: Let's think step by step.\
\ We refer to Wikipedia articles on us foreign policy for help. The 2008 financial\
\ crisis damaged the international reputation of the American model of political\
\ economy and capitalism. The answer is (A)."
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_us_foreign_policy"
"dataset_name": "virology"
"description": "The following are multiple choice questions (with answers) about virology.\n\
\nQ: The median survival time to AIDS and death was established by following:\n\
(A) Seroprevalent HIV-infected individuals (B) Seronegatives (C) Seroconverters\
\ (D) High-risk seronegatives\nA: Let's think step by step. We refer to Wikipedia\
\ articles on virology for help. The median survival time to AIDS and death was\
\ established as a result of the development of seroconverters. The answer is (C).\n\
\nQ: Which of the following is a morphological characteristic of the paramyxoviruses.\n\
(A) Fragile viruses often visualised with RNA spewing from the inside (B) Elongate\
\ viruses (C) Icosahedral viruses with envelope (D) Very large viruses\nA: Let's\
\ think step by step. We refer to Wikipedia articles on virology for help. Paramyxoviruses\
\ are fragile viruses often visualised with RNA spewing from the inside. The answer\
\ is (A).\n\nQ: The most important goal of a behavioral intervention is:\n(A) Change\
\ in behavior (B) Comprehensive coverage (C) Effective use of behavioral theory\
\ (D) Sustained behavior change\nA: Let's think step by step. We refer to Wikipedia\
\ articles on virology for help. The primary goal of a behavioral intervention is to\
\ cause sustained behavior change. The answer is (D).\n\nQ: A key factor facilitating\
\ the application of nested case-control studies from the MACS was:\n(A) Data collection\
\ (B) Establishment of a repository of biologic specimens (C) Participant interest\
\ (D) Administration of the questionnaire by staff\nA: Let's think step by step.\
\ We refer to Wikipedia articles on virology for help. The Multicenter AIDS Cohort\
\ Study's use of nested case-control studies was facilitated by the establishment\
\ of a repository of biologic specimens. The answer is (B).\n\nQ: Why are parvoviruses\
\ a highly impactful parasite?\n(A) Because they have no nucleic acid (B) They require\
\ a helper virus (C) Only replicate in dividing cells (D) Can integrate into host\
\ chromosomes\nA: Let's think step by step. We refer to Wikipedia articles on virology\
\ for help. Parvoviruses are highly impactful because they do not have nucleic acid.\
\ The answer is (A)."
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_virology"
"dataset_name": "world_religions"
"description": "The following are multiple choice questions (with answers) about world\
\ religions.\n\nQ: How can the Upanishads be characterized?\n(A) Ritual texts (B)\
\ Philosophical texts (C) Hymns (D) Origin stories\nA: Let's think step by step.\
\ We refer to Wikipedia articles on world religions for help. The Upanishads are\
\ the most recent part of Vedas (the oldest scriptures in Hinduism) and supplied\
\ the basis of later Hindu philosophy. So they are philosophical texts. The answer\
\ is (B).\n\nQ: What is the Second Gem in Buddhism?\n(A) The Dharma (B) The Sangha\
\ (C) The Buddha (D) The Bodhisattva\nA: Let's think step by step. We refer to Wikipedia\
\ articles on world religions for help. The Second Gem in Buddhism is The Dharma.\
\ The answer is (A).\n\nQ: Which Japanese government promoted a kind of national\
\ cult based on the emperor and his associations with kami?\n(A) Honen (B) Tanaka\
\ (C) Tokugawa (D) Meiji\nA: Let's think step by step. We refer to Wikipedia articles\
\ on world religions for help. The promotion of a national cult based on the emperor\
\ and his associations with Kami happened during the reign of Emperor Meiji (1852-1912).\
\ The answer is (D).\n\nQ: In which dynasty was the \"Mandate of Heaven\" developed\
\ to legitimatize the new rulers?\n(A) Shang (B) Zhou (C) Han (D) Xia\nA: Let's\
\ think step by step. We refer to Wikipedia articles on world religions for help.\
\ The \"Mandate of Heaven\" was developed as an ancient Chinese philosophical concept\
\ during the Zhou Dynasty (1046-256 BCE). The answer is (B).\n\nQ: What is the sign\
\ of the covenant for Jewish males?\n(A) The rainbow (B) Circumcision (C) A son\
\ (D) Bar mitzvah\nA: Let's think step by step. We refer to Wikipedia articles on\
\ world religions for help. In Judaism, the most distinctive sign of the covenant\
\ is circumcision (brit milah). The answer is (B)."
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_world_religions"
group: mmlu_flan_cot_zeroshot
task:
- mmlu_flan_cot_zeroshot_stem
- mmlu_flan_cot_zeroshot_other
- mmlu_flan_cot_zeroshot_social_sciences
- mmlu_flan_cot_zeroshot_humanities
dataset_path: hails/mmlu_no_train # a copy of `cais/mmlu` with no auxiliary_train split
validation_split: validation
fewshot_split: dev
output_type: generate_until
doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA: Let's think step by step."
doc_to_target: "{{['(A)', '(B)', '(C)', '(D)'][answer]}}"
filter_list:
  - name: "get-answer"
    filter:
      - function: "regex"
        regex_pattern: "((?<=The answer is )(.*)(?=.)|(?<=the answer is )(.*)(?=.)|(?<=The answer: )(.*)(?=.)|(?<=The final answer: )(.*)(?=.))"
      - function: "take_first"
generation_kwargs:
  until:
    - "</s>"
  do_sample: false
  temperature: 0.0
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
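
Note on the `get-answer` filter: the lookahead `(?=.)` uses an unescaped dot, so it matches any character rather than a literal period. In practice the greedy `(.*)` therefore drops the last character of the line that follows "The answer is ". The sketch below reproduces that behaviour with Python's `re` module. The helper names (`extract_answer`, `normalize`, `exact_match`) are hypothetical, and `normalize` is only an assumed approximation of what `ignore_case` and `ignore_punctuation` do, not the harness's own code.

```python
import re
import string

# Pattern copied verbatim from the `regex_pattern` field of the template.
ANSWER_PATTERN = re.compile(
    r"((?<=The answer is )(.*)(?=.)|(?<=the answer is )(.*)(?=.)"
    r"|(?<=The answer: )(.*)(?=.)|(?<=The final answer: )(.*)(?=.))"
)


def extract_answer(generation):
    """Return the first span matched by the pattern, or None (hypothetical helper)."""
    match = ANSWER_PATTERN.search(generation)
    return match.group(0) if match else None


def normalize(text):
    """Assumed stand-in for ignore_case / ignore_punctuation: lowercase, drop punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).strip()


def exact_match(prediction, target):
    """Compare prediction and target after the assumed normalization."""
    return normalize(prediction) == normalize(target)


if __name__ == "__main__":
    with_period = extract_answer("States are security maximizers. The answer is (D).")
    without_period = extract_answer("States are security maximizers. The answer is (D)")
    print(repr(with_period))     # trailing "." is excluded by the lookahead
    print(repr(without_period))  # the closing ")" is excluded instead
    print(exact_match(without_period, "(D)"))
```

Under this reading, stripping punctuation in the metric masks the missing parenthesis, which may be why the unescaped dot is harmless for answers of the form `(A)` to `(D)`.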
"dataset_name": "abstract_algebra"
"description": "The following are multiple choice questions (with answers) about abstract\
\ algebra.\n\n"
"group": "mmlu_flan_cot_zeroshot_stem"
"include": "_mmlu_flan_cot_zeroshot_template_yaml"
"task": "mmlu_flan_cot_zeroshot_abstract_algebra"
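
As a rough illustration of how a per-subject config and the shared template combine, the sketch below builds a zero-shot prompt for one document. It assumes the `description` is prepended directly to the rendered `doc_to_text`, and it uses plain Python f-strings in place of the Jinja-style `{{ ... }}` templating so that it has no dependencies. The sample document and the function names are hypothetical and are not taken from the dataset or the harness.

```python
# Description copied from the abstract_algebra config.
DESCRIPTION = (
    "The following are multiple choice questions (with answers) about abstract"
    " algebra.\n\n"
)

LETTERS = ["(A)", "(B)", "(C)", "(D)"]


def render_question(doc):
    """Mirror the `doc_to_text` template using an f-string."""
    a, b, c, d = doc["choices"]
    return (
        f"Q: {doc['question'].strip()}\n"
        f"(A) {a} (B) {b} (C) {c} (D) {d}\n"
        "A: Let's think step by step."
    )


def render_target(doc):
    """Mirror the `doc_to_target` template: map the answer index to a letter."""
    return LETTERS[doc["answer"]]


def build_prompt(doc):
    """Assumed composition: description followed by the rendered question."""
    return DESCRIPTION + render_question(doc)


if __name__ == "__main__":
    # Hypothetical document, for illustration only.
    sample = {
        "question": "  How many elements does the cyclic group Z_4 have? ",
        "choices": ["1", "2", "4", "8"],
        "answer": 2,
    }
    print(build_prompt(sample))
    print(render_target(sample))
```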
"dataset_name": "anatomy"
"description": "The following are multiple choice questions (with answers) about anatomy.\n\
\n"
"group": "mmlu_flan_cot_zeroshot_stem"
"include": "_mmlu_flan_cot_zeroshot_template_yaml"
"task": "mmlu_flan_cot_zeroshot_anatomy"
"dataset_name": "astronomy"
"description": "The following are multiple choice questions (with answers) about astronomy.\n\
\n"
"group": "mmlu_flan_cot_zeroshot_stem"
"include": "_mmlu_flan_cot_zeroshot_template_yaml"
"task": "mmlu_flan_cot_zeroshot_astronomy"
"dataset_name": "business_ethics"
"description": "The following are multiple choice questions (with answers) about business\
\ ethics.\n\n"
"group": "mmlu_flan_cot_zeroshot_other"
"include": "_mmlu_flan_cot_zeroshot_template_yaml"
"task": "mmlu_flan_cot_zeroshot_business_ethics"
"dataset_name": "clinical_knowledge"
"description": "The following are multiple choice questions (with answers) about clinical\
\ knowledge.\n\n"
"group": "mmlu_flan_cot_zeroshot_other"
"include": "_mmlu_flan_cot_zeroshot_template_yaml"
"task": "mmlu_flan_cot_zeroshot_clinical_knowledge"
"dataset_name": "college_biology"
"description": "The following are multiple choice questions (with answers) about college\
\ biology.\n\n"
"group": "mmlu_flan_cot_zeroshot_stem"
"include": "_mmlu_flan_cot_zeroshot_template_yaml"
"task": "mmlu_flan_cot_zeroshot_college_biology"
"dataset_name": "college_chemistry"
"description": "The following are multiple choice questions (with answers) about college\
\ chemistry.\n\n"
"group": "mmlu_flan_cot_zeroshot_stem"
"include": "_mmlu_flan_cot_zeroshot_template_yaml"
"task": "mmlu_flan_cot_zeroshot_college_chemistry"
"dataset_name": "college_computer_science"
"description": "The following are multiple choice questions (with answers) about college\
\ computer science.\n\n"
"group": "mmlu_flan_cot_zeroshot_stem"
"include": "_mmlu_flan_cot_zeroshot_template_yaml"
"task": "mmlu_flan_cot_zeroshot_college_computer_science"
"dataset_name": "college_mathematics"
"description": "The following are multiple choice questions (with answers) about college\
\ mathematics.\n\n"
"group": "mmlu_flan_cot_zeroshot_stem"
"include": "_mmlu_flan_cot_zeroshot_template_yaml"
"task": "mmlu_flan_cot_zeroshot_college_mathematics"
"dataset_name": "college_medicine"
"description": "The following are multiple choice questions (with answers) about college\
\ medicine.\n\n"
"group": "mmlu_flan_cot_zeroshot_other"
"include": "_mmlu_flan_cot_zeroshot_template_yaml"
"task": "mmlu_flan_cot_zeroshot_college_medicine"
"dataset_name": "college_physics"
"description": "The following are multiple choice questions (with answers) about college\
\ physics.\n\n"
"group": "mmlu_flan_cot_zeroshot_stem"
"include": "_mmlu_flan_cot_zeroshot_template_yaml"
"task": "mmlu_flan_cot_zeroshot_college_physics"
"dataset_name": "computer_security"
"description": "The following are multiple choice questions (with answers) about computer\
\ security.\n\n"
"group": "mmlu_flan_cot_zeroshot_stem"
"include": "_mmlu_flan_cot_zeroshot_template_yaml"
"task": "mmlu_flan_cot_zeroshot_computer_security"
"dataset_name": "conceptual_physics"
"description": "The following are multiple choice questions (with answers) about conceptual\
\ physics.\n\n"
"group": "mmlu_flan_cot_zeroshot_stem"
"include": "_mmlu_flan_cot_zeroshot_template_yaml"
"task": "mmlu_flan_cot_zeroshot_conceptual_physics"
"dataset_name": "econometrics"
"description": "The following are multiple choice questions (with answers) about econometrics.\n\
\n"
"group": "mmlu_flan_cot_zeroshot_social_sciences"
"include": "_mmlu_flan_cot_zeroshot_template_yaml"
"task": "mmlu_flan_cot_zeroshot_econometrics"
"dataset_name": "electrical_engineering"
"description": "The following are multiple choice questions (with answers) about electrical\
\ engineering.\n\n"
"group": "mmlu_flan_cot_zeroshot_stem"
"include": "_mmlu_flan_cot_zeroshot_template_yaml"
"task": "mmlu_flan_cot_zeroshot_electrical_engineering"
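
The zero-shot configs above all follow one pattern: the subject name fills `dataset_name`, `description`, and `task`, and a category suffix fills `group`. A minimal sketch of producing such entries programmatically is shown below. The function `make_zeroshot_config` and the small subject-to-category mapping are hypothetical and only restate the values visible in this diff; the script actually used to generate these files may differ.

```python
# Subject -> category pairs copied from configs shown in this diff (subset only).
SUBJECT_CATEGORIES = {
    "abstract_algebra": "stem",
    "business_ethics": "other",
    "econometrics": "social_sciences",
    "electrical_engineering": "stem",
}


def make_zeroshot_config(subject, category):
    """Build one per-subject config dict following the pattern in the diff."""
    readable = subject.replace("_", " ")
    return {
        "dataset_name": subject,
        "description": (
            "The following are multiple choice questions (with answers) "
            f"about {readable}.\n\n"
        ),
        "group": f"mmlu_flan_cot_zeroshot_{category}",
        "include": "_mmlu_flan_cot_zeroshot_template_yaml",
        "task": f"mmlu_flan_cot_zeroshot_{subject}",
    }


if __name__ == "__main__":
    for subject, category in SUBJECT_CATEGORIES.items():
        config = make_zeroshot_config(subject, category)
        print(config["task"], "->", config["group"])
```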