Commit e1ae8a2f authored by Herbie Bradley

Merge remote-tracking branch 'origin/big-refactor' into calibration

parents 50e99bd7 30936bc7
"dataset_name": "international_law"
"description": "The following are multiple choice questions (with answers) about international\
\ law.\n\nQ: How the consent to be bound of a State may be expressed?\n(A) The consent\
\ of a State to be bound is expressed only by ratification (B) The consent of a\
\ state to be bound by a treaty may be expressed by signature, ratification, acceptance,\
\ approval or accession (C) The consent of a State to be bound is expressed by signature\
\ (D) The consent of a State to be bound is expressed by whatever means they choose\n\
A: Let's think step by step. We refer to Wikipedia articles on international law\
\ for help. Article 11 of the Vienna Convention on the Law of Treaties, signed in 1969,\
\ states that \"the consent of a State to be bound by a treaty may be expressed\
\ by signature, exchange of instruments constituting a treaty, ratification, acceptance,\
\ approval or accession, or by any other means if so agreed.\" (B) is the most precise\
\ and accurate answer. The answer is (B).\n\nQ: What is the judge ad hoc?\n(A) If\
\ a party to a contentious case before the ICJ does not have a national sitting\
\ as judge, it is entitled to nominate someone as a judge solely for that case,\
\ with the title of judge ad hoc (B) Judge ad hoc is the member of the bench of\
\ the ICJ with a casting vote (C) Judge ad hoc is a surrogate judge, in case a judge\
\ is disqualified or passes away (D) Judge ad hoc is the judge that each party will\
\ always nominate in every contentious case\nA: Let's think step by step. We refer\
\ to Wikipedia articles on international law for help. As \"ad hoc\" implies, a\
\ judge ad hoc is appointed only for a specific case or period, when a party to\
\ a contentious case before the International Court of Justice does not have a regular\
\ national sitting as judge. The answer is (A).\n\nQ: When 'consent' can serve as\
\ a circumstance precluding the wrongfulness of a State conduct?\n(A) Consent can\
\ serve as a circumstance precluding the wrongfulness whenever it is given (B) Consent\
\ can never serve as a circumstance precluding wrongfulness (C) Consent can serve\
\ as a circumstance precluding wrongfulness, provided the consent is valid and to\
\ the extent that the conduct remains within the limits of the consent given (D)\
\ Consent can always serve as a circumstance precluding wrongfulness, no matter\
\ which organ of the State gives it\nA: Let's think step by step. We refer to Wikipedia\
\ articles on international law for help. Valid consent can serve as a circumstance\
\ precluding the wrongfulness of a State conduct if the conduct remains within the\
\ limits of that consent, according to Chapter V of the Responsibility of States\
\ for Internationally Wrongful Acts, 2001, United Nations. The answer is (C).\n\n\
Q: Would a reservation to the definition of torture in the ICCPR be acceptable in\
\ contemporary practice?\n(A) This is an acceptable reservation if the reserving\
\ country's legislation employs a different definition (B) This is an unacceptable\
\ reservation because it contravenes the object and purpose of the ICCPR (C) This\
\ is an unacceptable reservation because the definition of torture in the ICCPR\
\ is consistent with customary international law (D) This is an acceptable reservation\
\ because under general international law States have the right to enter reservations\
\ to treaties\nA: Let's think step by step. We refer to Wikipedia articles on international\
\ law for help. Because it contravenes the object and purpose of the ICCPR, this is\
\ an unacceptable reservation in contemporary practice. The answer is (B).\n\nQ:\
\ What types of force does Article 2(4) of the UN Charter prohibit?\n(A) Article\
\ 2(4) encompasses only armed force (B) Article 2(4) encompasses all types of force,\
\ including sanctions (C) Article 2(4) encompasses all interference in the domestic\
\ affairs of States (D) Article 2(4) encompasses force directed only against a State's\
\ territorial integrity\nA: Let's think step by step. We refer to Wikipedia articles\
\ on international law for help. Article 2(4) of the UN Charter prohibits states\
\ from using armed forces in their international relations. The answer is (A)."
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_international_law"
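Every exemplar in these descriptions ends its reasoning with the fixed phrase "The answer is (X).", which makes the final letter easy to recover from a model's chain-of-thought output. The following is a minimal sketch of such extraction, assuming completions follow the same phrasing; the regex and `extract_answer` are hypothetical and are not taken from the harness's own filter configuration.

```python
import re
from typing import Optional

# Hypothetical pattern: assumes completions end like the exemplars,
# with "The answer is (X)." where X is one of A-D.
ANSWER_RE = re.compile(r"The answer is \(([A-D])\)")


def extract_answer(completion: str) -> Optional[str]:
    """Return the last answer letter stated in a chain-of-thought completion."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1] if matches else None


example = (
    "Let's think step by step. Article 2(4) of the UN Charter prohibits "
    "states from using armed forces in their international relations. "
    "The answer is (A)."
)
print(extract_answer(example))  # prints A
```

Taking the last match, rather than the first, is a design choice for completions that restate or revise their answer.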
"dataset_name": "jurisprudence"
"description": "The following are multiple choice questions (with answers) about jurisprudence.\n\
\nQ: Iverson Jewelers wrote a letter to Miller, 'We have received an exceptionally\
\ fine self winding Rolox watch which we will sell to you at a very favorable price.'\n\
(A) The letter is an offer to sell (B) A valid offer cannot be made by letter. (C)\
\ The letter contains a valid offer which will terminate within a reasonable time.\
\ (D) The letter lacks one of the essential elements of an offer.\nA: Let's think\
\ step by step. We refer to Wikipedia articles on jurisprudence for help. An offer\
\ shows the intent to enter into a mutually-beneficial contract with specific terms.\
\ An offer can be made by a letter. While this letter indicates the willingness\
\ to sell, the lack of specific terms, such as transaction price and offer expiration\
\ date, makes it an incomplete offer. The answer is (D).\n\nQ: Functions of the\
\ law include all but which of the following?\n(A) maximizing individual freedom\
\ (B) providing a basis for compromise (C) keeping the peace (D) promoting the principles\
\ of the free enterprise system\nA: Let's think step by step. We refer to Wikipedia\
\ articles on jurisprudence for help. Laws are fundamentally about helping resolve\
\ disputes between individuals, and therefore essential for maximizing individual\
\ freedom, providing a basis for compromise, and keeping the peace. The answer is\
\ (D).\n\nQ: The ________ School of jurisprudence postulates that the law is based\
\ on what is \"correct.\"\n(A) Natural Law (B) Analytical (C) Historical (D) Sociological\n\
A: Let's think step by step. We refer to Wikipedia articles on jurisprudence for\
\ help. Natural Law School of jurisprudence focuses on the laws of nature, and states\
\ that the law should be based on ethics, morals, and what is \"correct\". Analytical\
\ deals with the law as it already exists, Historical postulates that the law was\
\ found and not made, and Sociological studies how the law and society impact each\
\ other. The answer is (A).\n\nQ: Which word best summarizes Weber's explanation\
\ of the development of formally rational law?\n(A) Authority. (B) Charisma. (C)\
\ Co-operation. (D) Capitalism.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on jurisprudence for help. Weber explained the development of formal\
\ rationality in laws as how the modern society moved from tradition to rationality,\
\ where people decide actions based less on how they were culturally done and more\
\ on expected utilities. How rational individuals optimize efficiency of accomplishing\
\ tasks for higher rewards is a core principle of Capitalism. The answer is (D).\n\
\nQ: Which position does Rawls claim is the least likely to be adopted by the POP\
\ (people in the original position)?\n(A) The POP would choose equality above liberty.\
\ (B) The POP would opt for the 'maximin' strategy. (C) The POP would opt for the\
\ 'difference principle'. (D) The POP would reject the 'system of natural liberty.'\n\
A: Let's think step by step. We refer to Wikipedia articles on jurisprudence for\
\ help. The POP would opt for the 'maximin' strategy, opt for the 'difference principle',\
\ and reject the 'system of natural liberty', but the POP would not choose equality\
\ above liberty, since the POP assume both equal and free citizens. The answer is\
\ (A)."
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_jurisprudence"
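The Rawls exemplar relies on the "maximin" strategy: from behind the veil of ignorance, choose the arrangement whose worst outcome is best. Below is a toy sketch of that decision rule. The society names and payoff numbers are invented purely for illustration.

```python
from typing import Dict, List


def maximin(options: Dict[str, List[float]]) -> str:
    """Pick the option whose worst-case payoff is largest."""
    return max(options, key=lambda name: min(options[name]))


# Hypothetical payoffs for the positions (worst-off to best-off) in each society.
societies = {
    "high_variance": [1.0, 5.0, 20.0],
    "moderate": [4.0, 6.0, 8.0],
    "flat": [3.0, 3.0, 3.0],
}
print(maximin(societies))  # prints moderate
```

Maximin ignores the large best-case payoff of `high_variance` and compares only the minima (1, 4, and 3).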
"dataset_name": "logical_fallacies"
"description": "The following are multiple choice questions (with answers) about logical\
\ fallacies.\n\nQ: When an arguer causes confusion during refutation because of\
\ real or feigned lack of an ability to engage in refutation, that arguer may have\
\ committed the fallacy of\n(A) poor sportsmanship (B) appeal to compassion (C)\
\ argument against the person (D) ignorance of refutation\nA: Let's think step by\
\ step. We refer to Wikipedia articles on logical fallacies for help. Ignorance\
\ of refutation, one of Aristotle's original list of logical fallacies in his Organon,\
\ is when someone causes confusion in an argument through real or feigned inability\
\ to engage in refutation, in order to win the argument. The answer is (D).\n\n\
Q: The complex question fallacy consists of\n(A) arguing something is inferior just\
\ because it doesn't do something it was never intended to do. (B) including more\
\ than one claim in the proposition and treating proof for one claim as proof for\
\ all the claims. (C) drawing a conclusion before examining the evidence, and only\
\ considering evidence that supports that conclusion. (D) asking a question that\
\ includes either an unproven assumption or more than one question, thus making\
\ a straightforward yes or no answer meaningless.\nA: Let's think step by step.\
\ We refer to Wikipedia articles on logical fallacies for help. The complex question\
\ fallacy is when someone makes a single yes or no answer to a question meaningless,\
\ by including either an unproven assumption or many questions. The latter is also\
\ known as the many questions fallacy. The answer is (D).\n\nQ: Arguing that what\
\ is true of the parts must be true of the whole is the fallacy of...\n(A) Division\
\ (B) Composition (C) Appeal to the person (D) Appeal to ignorance\nA: Let's think\
\ step by step. We refer to Wikipedia articles on logical fallacies for help. Fallacy\
\ of composition occurs when someone argues what is true of the parts must be true\
\ of the whole. The answer is (B).\n\nQ: Which of the following is true of a valid\
\ categorical syllogism?\n(A) The minor premise must deny the antecedent (B) The\
\ major premise must affirm the consequent (C) The middle term must be used in at\
\ least one premise in a universal or unqualified sense (D) All of the above\nA:\
\ Let's think step by step. We refer to Wikipedia articles on logical fallacies\
\ for help. A valid categorical syllogism must satisfy several conditions: (1) the\
\ syllogism must have exactly three terms (2) every term of the syllogism must be\
\ used twice exactly, (3) a term may be used only once in any premise, and (4) the\
\ middle term must be used in at least one premise in a universal or unqualified\
\ sense, etc. Only (C) is true. The answer is (C).\n\nQ: If someone attacks the\
\ character of an opposing arguer, instead of responding to that opponent's arguments,\
\ the first person has probably committed which of the following fallacies?\n(A)\
\ tu quoque (B) horse laugh (C) argument against the person (D) ignoratio elenchi\n\
A: Let's think step by step. We refer to Wikipedia articles on logical fallacies\
\ for help. The argument against the person fallacy occurs when someone irrelevantly\
\ attacks the character of an opposing arguer, instead of addressing that opponent's\
\ arguments. The answer is (C)."
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_logical_fallacies"
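Several of the syllogism conditions listed in the logical-fallacies exemplar can be checked mechanically. The following is a partial sketch, assuming propositions are encoded as hypothetical `(form, subject, predicate)` tuples in standard A/E/I/O form. It checks the term-count conditions and the distributed-middle condition only. A full validity test would also need rules this sketch omits, such as illicit major/minor and the rules on negative premises.

```python
from collections import Counter
from typing import Set, Tuple

# A categorical proposition as (form, subject, predicate), with form in "AEIO".
Proposition = Tuple[str, str, str]

# Positions distributed by each standard form.
DISTRIBUTED = {
    "A": {"subject"},               # All S are P
    "E": {"subject", "predicate"},  # No S are P
    "I": set(),                     # Some S are P
    "O": {"predicate"},             # Some S are not P
}


def distributed_terms(prop: Proposition) -> Set[str]:
    """Return the terms that a proposition distributes."""
    form, subject, predicate = prop
    terms: Set[str] = set()
    if "subject" in DISTRIBUTED[form]:
        terms.add(subject)
    if "predicate" in DISTRIBUTED[form]:
        terms.add(predicate)
    return terms


def check_terms_and_middle(major: Proposition, minor: Proposition,
                           conclusion: Proposition) -> bool:
    """Check only: no term repeated within one proposition, exactly three
    terms each used exactly twice, and a middle term distributed in at
    least one premise. Other validity rules are not checked."""
    props = [major, minor, conclusion]
    if any(subject == predicate for _, subject, predicate in props):
        return False
    counts = Counter(term for _, s, p in props for term in (s, p))
    if len(counts) != 3 or any(n != 2 for n in counts.values()):
        return False
    # The middle term occurs in both premises but not in the conclusion.
    middle = ({major[1], major[2]} & {minor[1], minor[2]}) - {conclusion[1], conclusion[2]}
    if len(middle) != 1:
        return False
    (m,) = middle
    return m in distributed_terms(major) or m in distributed_terms(minor)


# All M are P; all S are M; so all S are P (middle term M is distributed).
barbara = (("A", "M", "P"), ("A", "S", "M"), ("A", "S", "P"))
# All P are M; all S are M; so all S are P (undistributed middle).
undistributed = (("A", "P", "M"), ("A", "S", "M"), ("A", "S", "P"))
print(check_terms_and_middle(*barbara))        # prints True
print(check_terms_and_middle(*undistributed))  # prints False
```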
"dataset_name": "machine_learning"
"description": "The following are multiple choice questions (with answers) about machine\
\ learning.\n\nQ: Which image data augmentation is most common for natural images?\n\
(A) random crop and horizontal flip (B) random crop and vertical flip (C) posterization\
\ (D) dithering\nA: Let's think step by step. Data augmentation is used to increase\
\ the diversity of images in the training dataset. It is important that natural\
\ images are kept natural after being augmented. Vertical flips of images are not\
\ natural, so (B) is false. Posterization makes the image look like a poster,\
\ and dithering adds pixel-level noise patterns to approximate colors from a limited palette. Neither of these preserves the natural property.\
\ The only natural data augmentation technique is (A). The answer is (A).\n\nQ:\
\ Traditionally, when we have a real-valued input attribute during decision-tree\
\ learning we consider a binary split according to whether the attribute is above\
\ or below some threshold. Pat suggests that instead we should just have a multiway\
\ split with one branch for each of the distinct values of the attribute. From the\
\ list below choose the single biggest problem with Pat’s suggestion:\n(A) It is\
\ too computationally expensive. (B) It would probably result in a decision tree\
\ that scores badly on the training set and a testset. (C) It would probably result\
\ in a decision tree that scores well on the training set but badly on a testset.\
\ (D) It would probably result in a decision tree that scores well on a testset\
\ but badly on a training set.\nA: Let's think step by step. Because the input is\
\ real valued, it is unlikely that the same values appear both at training and test\
\ time. This means that while such a decision tree could yield good performance\
\ on the training data, when evaluated on the test data it will perform badly because\
\ the decision tree won’t know what to do with numbers that did not appear in the\
\ training data. The answer is (C).\n\nQ: You are reviewing papers for the World’s\
\ Fanciest Machine Learning Conference, and you see submissions with the following\
\ claims. Which ones would you consider accepting?\n(A) My method achieves a training\
\ error lower than all previous methods! (B) My method achieves a test error lower\
\ than all previous methods! (Footnote: When regularisation parameter λ is chosen\
\ so as to minimise test error.) (C) My method achieves a test error lower than\
\ all previous methods! (Footnote: When regularisation parameter λ is chosen so\
\ as to minimise cross-validaton error.) (D) My method achieves a cross-validation\
\ error lower than all previous methods! (Footnote: When regularisation parameter\
\ λ is chosen so as to minimise cross-validaton error.)\nA: Let's think step by\
\ step. In machine learning, we train with some data and fixed hyperparameters and\
\ the training error can be arbitrarily low, so (A) can’t be right. Then, one compares\
\ different hyperparameters by selecting the model with the lowest cross-validation\
\ error, this means that (B) and (D) are not the right procedure. The only relevant\
\ number after these is the test error and thus (C) is the right answer. The answer\
\ is (C).\n\nQ: A 6-sided die is rolled 15 times and the results are: side 1 comes\
\ up 0 times; side 2: 1 time; side 3: 2 times; side 4: 3 times; side 5: 4 times;\
\ side 6: 5 times. Based on these results, what is the probability of side 3 coming\
\ up when using Add-1 Smoothing?\n(A) 2.0/15 (B) 1.0/7 (C) 3.0/16 (D) 1.0/5\nA:\
\ Let's think step by step. Add-1 smoothing adds the value of one to the different\
\ counts and then normalizes the probabilities accordingly. The counts after adding\
\ one will be: side 1 comes up 1 time; side 2: 2 times; side 3: 3 times; side 4:\
\ 4 times; side 5: 5 times; side 6: 6 times. The smoothed counts sum\
\ to 21, so the probability of rolling a three is 3/21 = 1/7. The answer is (B).\n\
\nQ: To achieve an 0/1 loss estimate that is less than 1 percent of the true 0/1\
\ loss (with probability 95%), according to Hoeffding's inequality the IID test\
\ set must have how many examples?\n(A) around 10 examples (B) around 100 examples\
\ (C) between 100 and 500 examples (D) more than 1000 examples\nA: Let's think step\
\ by step. By the Hoeffding’s inequality, we expect that with 95% probability the\
\ in-sample and out-of-sample errors differ by at most epsilon when we have N samples if\
\ 2 exp(-2 epsilon^2 N)<0.05, this implies that N > -1/(2*epsilon**2) log ( 0.05/2\
\ )= log (40)*5000. Since log(40)>1, we have that one needs more than 1000 examples.\
\ The answer is (D)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_machine_learning"
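The add-1 smoothing arithmetic in the machine-learning exemplar can be reproduced exactly with rational numbers. The following is a minimal sketch. `add_one_probabilities` is a hypothetical helper name, and the counts are the ones given in the exemplar.

```python
from fractions import Fraction
from typing import Dict


def add_one_probabilities(counts: Dict[int, int]) -> Dict[int, Fraction]:
    """Laplace (add-1) smoothing: add one to every count, then renormalize."""
    smoothed = {outcome: c + 1 for outcome, c in counts.items()}
    total = sum(smoothed.values())
    return {outcome: Fraction(c, total) for outcome, c in smoothed.items()}


# Counts from the exemplar: 15 rolls of a six-sided die.
counts = {1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5}
probs = add_one_probabilities(counts)
print(probs[3])  # prints 1/7
```

Using `Fraction` instead of floats keeps the result in the same 3/21 = 1/7 form as the exemplar.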
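The exemplar's Hoeffding bound can also be evaluated directly, rather than only argued to exceed 1000. The following is a minimal sketch. Like the exemplar, it treats "less than 1 percent" as an absolute gap of epsilon = 0.01 and uses delta = 0.05 for 95% confidence. `hoeffding_sample_size` is a hypothetical helper name.

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest integer N with 2 * exp(-2 * epsilon**2 * N) <= delta."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


n = hoeffding_sample_size(epsilon=0.01, delta=0.05)
print(n)  # prints 18445, i.e. log(40) * 5000 rounded up
```

The result, roughly 18,000 examples, is consistent with the exemplar's conclusion that more than 1000 are needed.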
"dataset_name": "management"
"description": "The following are multiple choice questions (with answers) about management.\n\
\nQ: How can organisational structures that are characterised by democratic and\
\ inclusive styles of management be described?\n(A) Hierarchical (B) Bureaucratic\
\ (C) Flat (D) Functional\nA: Let's think step by step. We refer to Wikipedia articles\
\ on management for help. Flat organizational structures are characterized by democratic\
\ and inclusive styles of management, and have few (if any) levels of management\
\ between the workers and managers. The answer is (C).\n\nQ: Hygiene factors are\
\ associated with which writer?\n(A) Frederick Hertzberg (B) D.C. McClelland (C)\
\ Abraham Maslow (D) Douglas McGregor\nA: Let's think step by step. We refer to\
\ Wikipedia articles on management for help. Hygiene factors include compensation,\
\ company policies, supervision, interpersonal relations, and work environments.\
\ Hertzberg lists them as factors that cannot motivate employees but can minimize\
\ job dissatisfaction. The answer is (A).\n\nQ: What characteristic is not a key\
\ feature of the 'open systems' model of management?\n(A) Morale (B) Innovation\
\ (C) Growth resource (D) Adaptation\nA: Let's think step by step. We refer to Wikipedia\
\ articles on management for help. The key characteristics of an open system in\
\ management include innovation, growth resource, and adaptation, but do not include\
\ morale. The answer is (A).\n\nQ: Which element of the cultural web forms regalia?\n\
(A) Symbols (B) Rituals and routines (C) Power structures (D) Control systems\n\
A: Let's think step by step. We refer to Wikipedia articles on management for help.\
\ The cultural web is a tool for mapping an organization's culture, where symbols\
\ form the regalia that visually expresses the values that the organization holds\
\ as important. The answer is (A).\n\nQ: What are the two main dimensions of the\
\ Ohio Studies into leadership?\n(A) Starting position and end position (B) Initial\
\ environment and changed environment (C) Organisational structure and conditioning\
\ (D) Initiating structure and considerations\nA: Let's think step by step. We refer\
\ to Wikipedia articles on management for help. The Ohio State Leadership Studies\
\ conducted in the 1940s identified initiating structure and consideration as the\
\ two main dimensions of leader behavior. The answer is (D)."
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_management"
"dataset_name": "marketing"
"description": "The following are multiple choice questions (with answers) about marketing.\n\
\nQ: Although the content and quality can be as controlled as direct mail, response\
\ rates of this medium are lower because of the lack of a personal address mechanism.\
\ This media format is known as:\n(A) Care lines. (B) Direct mail. (C) Inserts.\
\ (D) Door to door.\nA: Let's think step by step. We refer to Wikipedia articles\
\ on marketing for help. Door to door marketing delivers non-addressed items within\
\ all buildings within a geographic area. While it can control the content and quality\
\ as well as direct mail marketing, its response rate is lower because of the lack\
\ of a personal address mechanism. The answer is (D).\n\nQ: In an organization,\
\ the group of people tasked with buying decisions is referred to as the _______________.\n\
(A) Outsourcing unit. (B) Procurement centre. (C) Chief executive unit. (D) Decision-making\
\ unit.\nA: Let's think step by step. We refer to Wikipedia articles on marketing\
\ for help. In an organization, the group of people tasked with buying decisions\
\ is referred to as the decision-making unit. The answer is (D).\n\nQ: The single\
\ group within society that is most vulnerable to reference group influence is:\n\
(A) The older consumer who feels somewhat left out of things. (B) The married women,\
\ many of whom feel a need for stability in their lives. (C) New immigrants who\
\ really want to assimilate into their new culture. (D) Children, who base most\
\ of their buying decisions on outside influences.\nA: Let's think step by step.\
\ We refer to Wikipedia articles on marketing for help. Children, who mostly base\
\ their buying decisions on outside influences, are the single group within society\
\ that is most vulnerable to reference group influence. The answer is (D).\n\nQ:\
\ Which of the following is an assumption in Maslow's hierarchy of needs?\n(A) Needs\
\ are dependent on culture and also on social class. (B) Lower-level needs must\
\ be at least partially satisfied before higher needs can affect behaviour. (C)\
\ Needs are not prioritized or arranged in any particular order. (D) Satisfied needs\
\ are motivators, and new needs emerge when current needs remain unmet.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on marketing for help. Maslow's\
\ hierarchy of needs consists of, from the bottom upwards, physiological needs (food and clothing),\
\ safety, love and belonging needs, esteem, and self-actualization. Lower-level\
\ needs must be at least partially satisfied before higher ones can affect behavior.\
\ The answer is (B).\n\nQ: _____________ is a natural outcome when combining demographic\
\ and geographic variables.\n(A) Geodemographics (B) Product differentiation. (C)\
\ ANSOFF matrix. (D) Brand management.\nA: Let's think step by step. We refer to\
\ Wikipedia articles on marketing for help. Geodemographics is a natural outcome\
\ when combining demographic and geographic variables. The answer is (A)."
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_marketing"
"dataset_name": "medical_genetics"
"description": "The following are multiple choice questions (with answers) about medical\
\ genetics.\n\nQ: The stage of meiosis in which chromosomes pair and cross over\
\ is:\n(A) prophase I (B) metaphase I (C) prophase II (D) metaphase II\nA: Let's\
\ think step by step. We refer to Wikipedia articles on medical genetics for help.\
\ Prophase I is the stage of meiosis where homologous chromosomes pair with each\
\ other and exchange genetic material. The answer is (A).\n\nQ: DNA ligase is\n\
(A) an enzyme that joins fragments in normal DNA replication (B) an enzyme of bacterial\
\ origin which cuts DNA at defined base sequences (C) an enzyme that facilitates\
\ transcription of specific genes (D) an enzyme which limits the level to which\
\ a particular nutrient reaches\nA: Let's think step by step. We refer to Wikipedia\
\ articles on medical genetics for help. DNA ligase is a type of enzyme (EC 6.5.1.1)\
\ responsible for joining DNA strands together by catalyzing a phosphodiester bond.\
\ The answer is (A).\n\nQ: Which of the following conditions does not show multifactorial\
\ inheritance?\n(A) Pyloric stenosis (B) Schizophrenia (C) Spina bifida (neural\
\ tube defects) (D) Marfan syndrome\nA: Let's think step by step. We refer to Wikipedia\
\ articles on medical genetics for help. Multifactorial inheritance is when more\
\ than a single factor is responsible for causing a given trait or health problem.\
\ Genes cannot be the only factor. Marfan syndrome, on the other hand, requires\
\ only one abnormal copy of the Marfan gene, from one parent, to inherit\
\ the trait. The answer is (D).\n\nQ: A gene showing codominance\n(A) has both alleles\
\ independently expressed in the heterozygote (B) has one allele dominant to the\
\ other (C) has alleles tightly linked on the same chromosome (D) has alleles expressed\
\ at the same time in development\nA: Let's think step by step. We refer to Wikipedia\
\ articles on medical genetics for help. Codominance, as it relates to genetics,\
\ refers to a type of genetic inheritance where the phenotype of both the parents\
\ is easily observed in the offspring. A heterozygote is an individual having two\
\ different alleles of a gene. The answer is (A).\n\nQ: Large triplet repeat expansions\
\ can be detected by:\n(A) polymerase chain reaction. (B) single strand conformational\
\ polymorphism analysis. (C) Southern blotting. (D) Western blotting.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on medical genetics for help.\
\ A Southern blot is a method in molecular biology for detecting specific DNA sequences\
\ in a sample. Large triplet repeat expansions are usually detected with this method.\
\ The answer is (C)."
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_medical_genetics"
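The Marfan rationale rests on autosomal dominant inheritance: a single abnormal allele from one parent is enough. The following is a minimal single-gene Punnett-square sketch of that arithmetic. It assumes one affected heterozygous parent and one unaffected parent, and the allele labels `D`/`d` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from typing import Dict


def offspring_genotypes(parent1: str, parent2: str) -> Dict[str, Fraction]:
    """Single-gene Punnett square: each parent passes one allele with equal chance."""
    # Sorting puts the uppercase (dominant) allele first, so "dD" and "Dd" merge.
    combos = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(combos.values())
    return {g: Fraction(n, total) for g, n in combos.items()}


# "D" = dominant disease allele, "d" = normal allele (hypothetical labels).
genotypes = offspring_genotypes("Dd", "dd")
affected = sum(p for g, p in genotypes.items() if "D" in g)
print(affected)  # prints 1/2
```

Any genotype carrying at least one `D` counts as affected, which is what "dominant" means here.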
"dataset_name": "miscellaneous"
"description": "The following are multiple choice questions (with answers) about miscellaneous.\n\
\nQ: Which of these songs was a Top 10 hit for the rock band The Police?\n(A) 'Radio\
\ Ga-Ga' (B) 'Ob-la-di Ob-la-da' (C) 'De Do Do Do De Da Da Da' (D) 'In-a-Gadda-Da-Vida'\n\
A: Let's think step by step. We refer to Wikipedia for help. Radio Ga-Ga is by Queen.\
\ Ob-la-di Ob-la-da is by The Beatles. And In-a-Gadda-Da-Vida is by Iron Butterfly.\
\ That leaves 'De Do Do Do De Da Da Da' as the only song by The Police, and it was also a Top\
\ 10 hit. The answer is (C).\n\nQ: What place is named in the title of the 1979\
\ live album by rock legends Cheap Trick?\n(A) Budapest (B) Budokan (C) Bhutan (D)\
\ Britain\nA: Let's think step by step. We refer to Wikipedia for help. Nippon Budokan\
\ is an indoor arena in Tokyo, Japan renowned for hosting rock music concerts including\
\ Cheap Trick in 1978. 'Cheap Trick at Budokan' became the name of their album.\
\ The answer is (B).\n\nQ: What is produced during photosynthesis?\n(A) hydrogen\
\ (B) nylon (C) oxygen (D) light\nA: Let's think step by step. We refer to Wikipedia\
\ for help. Photosynthesis is the process in which green plants use the green pigment\
\ chlorophyll to synthesize foods with water and carbon dioxide. Oxygen is the byproduct\
\ of this process. The answer is (C).\n\nQ: Who is the shortest man to ever win\
\ an NBA slam dunk competition?\n(A) Anthony 'Spud' Webb (B) Michael 'Air' Jordan\
\ (C) Tyrone 'Muggsy' Bogues (D) Julius 'Dr J' Erving\nA: Let's think step by step.\
\ We refer to Wikipedia for help. In 1986, Spud Webb, standing only 5'7\" became\
\ the shortest NBA player in history to win an official slam dunk contest. The answer\
\ is (A).\n\nQ: How many axles does a standard automobile have?\n(A) one (B) two\
\ (C) four (D) eight\nA: Let's think step by step. We refer to Wikipedia for help.\
\ Most cars have two axles to rotate the wheels. The answer is (B)."
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_miscellaneous"
"dataset_name": "moral_disputes"
"description": "The following are multiple choice questions (with answers) about moral\
\ disputes.\n\nQ: Baron admits that the versions of the ticking bomb hypothetical\
\ she discusses are \"stunningly stupid,\" but she claims this is actually evidence\
\ of\n(A) the stupidity of most traditional philosophical examples. (B) a general\
\ lack of intelligence among people with advanced degrees. (C) the wrongness of\
\ torture. (D) the readiness on the part of many intelligent people to see torture\
\ as the best solution to deal with terrorism.\nA: Let's think step by step. We\
\ refer to Wikipedia articles on moral disputes for help. The ticking bomb hypothetical\
\ poses a problem where many people will die from an exploding bomb if the hypothetical\
\ terrorist does not disclose how to defuse it. Baron sees this hypothetical as\
\ silly, but its prevalence does suggest intelligent people, particularly utilitarians,\
\ see torture as justifiable to save the lives in this scenario. The answer is (D).\n\
\nQ: A fertilized ovum is also known as\n(A) a zygote. (B) an embryo. (C) a viability.\
\ (D) a blastocyst.\nA: Let's think step by step. We refer to Wikipedia articles\
\ on moral disputes for help. Once a single sperm penetrates the layers of an egg\
\ to form a new cell, that cell is called a zygote. The answer is (A).\n\nQ: Pence\
\ compares six different cases of reproduction, from natural twinning to SCNT. What\
\ conclusion does he draw from this comparison?\n(A) SCNT is not a different kind\
\ of reproduction because there are no morally relevant differences between it and\
\ other permissible means of reproduction. (B) Because there is a low risk of harm\
\ for natural twinning, there will be a low risk of harm for SCNT. (C) both A and\
\ B (D) neither A nor B\nA: Let's think step by step. We refer to Wikipedia articles\
\ on moral disputes for help. SCNT, known as 'somatic cell nuclear transfer', is\
\ a technique in which the nucleus of a somatic (body) cell is transferred to the\
\ cytoplasm of an enucleated egg. This is used in farm animal cloning. Pence may\
\ compare this to other processes of reproduction, and see little moral separation.\
\ The answer is (A).\n\nQ: Which of the following is an intrinsic good, according\
\ to Singer?\n(A) being in the upper class (B) the ability to learn and solve problems\
\ (C) being more beautiful than most of the population (D) being shorter than average\n\
A: Let's think step by step. We refer to Wikipedia articles on moral disputes for\
\ help. Bioethicist Peter Singer sees intrinsic value as innate values conferred\
\ by oneself, for oneself. Inanimate objects can be beautiful, short, or have some\
\ valuable criteria, but capabilities are intrinsically good. The answer is (B).\n\
\nQ: According to Metz, what is wrong with consequentialist arguments against capital\
\ punishment based on African values?\n(A) It is unclear as of yet whether or not\
\ capital punishment deters harm to the community. (B) It is unclear as of yet whether\
\ or not capital punishment deters harm to any individuals. (C) Consequentialism\
\ is not supported by African values. (D) Even though consequentialism is supported\
\ by African values, no consequentialist arguments framed in terms of African values\
\ have been offered.\nA: Let's think step by step. We refer to Wikipedia articles\
\ on moral disputes for help. Thaddeus Metz is a humanities research professor focusing\
\ on ethical philosophy in South Africa. Metz has written that the death penalty is unjustified\
\ as it treats individuals as incapable of communal relations. It is unclear that\
\ capital punishment is to the benefit of, or a deterrent of harm to the community.\
\ The answer is (A)."
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_moral_disputes"
"dataset_name": "moral_scenarios"
"description": "The following are multiple choice questions (with answers) about moral\
\ scenarios.\n\nQ: For which of these two scenarios does the main character (who\
\ uses I/me/my) do something clearly morally wrong, according to ordinary moral\
\ standards in the US as of 2020? Scenario 1 | I took his wallet after he harassed\
\ me at the bar. Scenario 2 | I made love to my newborn baby boy.\n(A) Wrong, Wrong\
\ (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not wrong\nA: Let's think\
\ step by step. We refer to Wikipedia articles on moral scenarios for help. Stealing\
\ from someone is not considered an ethical response to harassment. And having\
\ sex with a minor is egregiously unethical. The answer is (A).\n\nQ: For which\
\ of these two scenarios does the main character (who uses I/me/my) do something\
\ clearly morally wrong, according to ordinary moral standards in the US as of 2020?\
\ Scenario 1 | I asked for a cigarette from the guy outside the store. Scenario\
\ 2 | When the security guard left his post, I pocketed a rare coin.\n(A) Wrong,\
\ Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not wrong\nA: Let's\
\ think step by step. We refer to Wikipedia articles on moral scenarios for help.\
\ Asking for something is not morally wrong. Stealing is morally wrong. The answer\
\ is (C).\n\nQ: For which of these two scenarios does the main character (who uses\
\ I/me/my) do something clearly morally wrong, according to ordinary moral standards\
\ in the US as of 2020? Scenario 1 | During the house's demolition, I pushed him\
\ into a wall. Scenario 2 | When I choose a babysitter for my twin nieces, I don't\
\ care about her racial background.\n(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not\
\ wrong, Wrong (D) Not wrong, Not wrong\nA: Let's think step by step. We refer to\
\ Wikipedia articles on moral scenarios for help. Physical violence is morally wrong.\
\ Being open to hiring people from different racial backgrounds is an expected moral\
\ stance. The answer is (B).\n\nQ: For which of these two scenarios does the main\
\ character (who uses I/me/my) do something clearly morally wrong, according to\
\ ordinary moral standards in the US as of 2020? Scenario 1 | I bought an old pick-up\
\ truck for work. Scenario 2 | I looked up at the gorgeous waterfall and threw a\
\ rock at it.\n(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not\
\ wrong, Not wrong\nA: Let's think step by step. We refer to Wikipedia articles\
\ on moral scenarios for help. Neither of these is wrong, as no one is wronged by
\ buying a pick-up truck or throwing a rock at a waterfall. The answer is (D).\n\n\
Q: For which of these two scenarios does the main character (who uses I/me/my) do\
\ something clearly morally wrong, according to ordinary moral standards in the\
\ US as of 2020? Scenario 1 | I really loved my newborn baby boy. Scenario 2 | The\
\ new girl is embarrassed to live in a trailer park, so I invite the class to her\
\ house.\n(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong,\
\ Not wrong\nA: Let's think step by step. We refer to Wikipedia articles on moral\
\ scenarios for help. Loving someone is not wrong. However, deliberately exposing something that
\ someone is embarrassed about to the whole class is cruel and clearly wrong. The answer is (C)."
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_moral_scenarios"
"dataset_name": "nutrition"
"description": "The following are multiple choice questions (with answers) about nutrition.\n\
\nQ: What is the first-line drug for patients with type 2 diabetes and obesity,\
\ as of 2020?\n(A) Acarbose (B) Metformin (C) Sulphonylureas (D) Insulin\nA: Let's\
\ think step by step. We refer to Wikipedia articles on nutrition for help. Metformin\
\ (Fortamet, Glumetza, or others) is usually the first medication prescribed for\
\ type 2 diabetes, including in patients with obesity. It works by lowering glucose production in\
\ the liver and improving the body's sensitivity to insulin. The answer is (B).\n\
\nQ: Which of the following statements is correct (according to knowledge in 2020)?\n\
(A) Consumers with phenylketonuria must avoid the consumption of the sweetener aspartame\
\ (B) Consumers with phenylketonuria must avoid the consumption of the sweetener\
\ saccharin (C) Consumers with phenylketonuria must avoid the consumption of the\
\ sweetener sucralose (D) Consumers with phenylketonuria must avoid the consumption\
\ of the sweetener acesulfame K\nA: Let's think step by step. We refer to Wikipedia\
\ articles on nutrition for help. People with phenylketonuria (PKU) cannot break\
\ down the amino acid phenylalanine. As it builds up in the blood and brain, it can\
\ lead to brain damage. People with PKU should avoid foods that are converted to\
\ phenylalanine in the body, such as aspartame. The answer is (A).\n\nQ: Which of\
\ the following statements about iodine is correct, as of 2020?\n(A) 50% of adults\
\ consume iodine at levels below the RNI (B) Dairy products are a poor source of\
\ iodine (C) The iodine content of organic milk is generally lower than the level\
\ in non-organic milk (D) UK dietary reference values recommend an increase in iodine\
\ intake in pregnancy\nA: Let's think step by step. We refer to Wikipedia articles\
\ on nutrition for help. Organic milk usually has a lower iodine content than non-organic\
\ milk. The answer is (C).\n\nQ: Which of the following is the most plausible explanation\
\ for the protective effect of dietary fibre against cancer of the colon, as of\
\ 2020?\n(A) Propionic acid, formed during colonic fibre fermentation inhibits liver\
\ fatty acid synthesis (B) Butyric acid, formed during colonic fibre fermentation\
\ stimulates \"silencing\" of the SLC5A8 tumour suppressor gene (C) None of these\
\ options are correct (D) Butyric acid, formed during colonic fibre fermentation\
\ stimulates anti-oxidant defences in the colon\nA: Let's think step by step. We\
\ refer to Wikipedia articles on nutrition for help. Dietary fibre intake is inversely\
\ associated with the risk of colorectal cancer. This is presumed to be because butyric\
\ acid (BA), formed during colonic fibre fermentation, stimulates antioxidant defences that help protect the colon from cancerous\
\ tumors. The answer is (D).\n\nQ: In a cohort study, the risk ratio of developing\
\ diabetes was 0.86 when comparing consumers of tea (the exposed) to those who did\
\ not drink tea (the unexposed). Which one statement is correct (according to knowledge\
\ in 2020)?\n(A) The tea drinkers have lower risk of developing diabetes. (B) The\
\ tea drinkers have higher risk of developing diabetes. (C) Based on the information\
\ given we cannot tell if the observed difference in disease risk is the result\
\ of chance. (D) The risk ratio is close to the value one, so there is no difference\
\ in disease risk between the two groups.\nA: Let's think step by step. We refer\
\ to Wikipedia articles on nutrition for help. A risk ratio of 0.86 points toward lower risk\
\ among tea drinkers, but without a confidence interval, a p-value, or the study's sample\
\ size, we cannot tell whether this difference could be explained by random chance. The answer is (C)."
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_nutrition"
"dataset_name": "philosophy"
"description": "The following are multiple choice questions (with answers) about philosophy.\n\
\nQ: The study of reality in the broadest sense, an inquiry into the elemental nature\
\ of the universe and the things in it, is known as _____.\n(A) metaphysics (B)\
\ epistemology (C) quantum physics (D) axiology\nA: Let's think step by step. We\
\ refer to Wikipedia articles on philosophy for help. Among the options, only metaphysics\
\ studies the nature of reality and existence. The answer is (A).\n\nQ: According\
\ to Moore’s “ideal utilitarianism,” the right action is the one that brings about\
\ the greatest amount of:\n(A) pleasure. (B) happiness. (C) good. (D) virtue.\n\
A: Let's think step by step. We refer to Wikipedia articles on philosophy for help.\
\ Moore's \"ideal utilitarianism\" states that one's actions should maximize intrinsic\
\ goods. The answer is (C).\n\nQ: Before Tolstoy's Christian conversion, what was\
\ his perspective on the meaning of life?\n(A) optimist (B) satisfied (C) nominally\
\ religious (D) pessimist\nA: Let's think step by step. We refer to Wikipedia articles\
\ on philosophy for help. Before his conversion, Tolstoy felt that life was meaningless,\
\ which is a pessimist's point of view. The answer is (D).\n\nQ: According to d'Holbach,\
\ people always act according to _____.\n(A) free choices (B) dictates of the soul\
\ (C) necessary natural laws (D) undetermined will\nA: Let's think step by step.\
\ We refer to Wikipedia articles on philosophy for help. d'Holbach was a determinist who held that\
\ people always act according to necessary natural laws, leaving no room for free\
\ will. The answer is (C).\n\nQ: Psychological egoism is:\n(A) an ethical theory\
\ about how we ought to behave. (B) a generalization concerning the way people tend\
\ to behave. (C) a claim about human nature and the ways people are capable of behaving.\
\ (D) none of the above.\nA: Let's think step by step. We refer to Wikipedia articles\
\ on philosophy for help. Psychological egoism holds that people always act out of\
\ self-interest, so it is a claim about human nature and how humans\
\ are capable of behaving. The answer is (C)."
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_philosophy"
"dataset_name": "prehistory"
"description": "The following are multiple choice questions (with answers) about prehistory.\n\
\nQ: What is the approximate mean cranial capacity of Homo erectus?\n(A) under 650\
\ cc (B) about 800 cc (C) just under 1000 cc (D) 1200 cc\nA: Let's think step by\
\ step. We refer to Wikipedia articles on prehistory for help. The average cranial\
\ capacity of Homo erectus is just under 1000 cubic cm. The answer is (C).\n\nQ:\
\ According to Timothy Pauketat, the evidence for social stratification and political\
\ power at Cahokia suggests:\n(A) a center of Mississippian civilization with conditions\
\ similar to the rise of early states. (B) the limitations of authority in a Native\
\ American society of egalitarian foragers. (C) a simple chiefdom or perhaps a complex\
\ chiefdom had evolved by A.D. 1500. (D) a center of Mississippian civilization\
\ with conditions similar to societies on the Northwest Coast of North America.\n\
A: Let's think step by step. We refer to Wikipedia articles on prehistory for help.\
\ Timothy Pauketat is known for his research on Cahokia, the center of the Mississippian\
\ culture, where he found conditions similar to those of the rise of early states. The answer\
\ is (A).\n\nQ: Recent research on hominid species dating from the Middle Pliocene\
\ indicates there was (as of 2020):\n(A) a great amount of species diversity, or\
\ a single species that exhibited a lot of diversity. (B) very little species diversity\
\ during this period and very few hominids. (C) decreased species diversity due\
\ to a prolonged ice age followed by a severe drought. (D) decreased species diversity\
\ but increased numbers of hammerstones and flakes, indicating stone tool manufacture.\n\
A: Let's think step by step. We refer to Wikipedia articles on prehistory for help.\
\ Recent research has recognized multiple hominid species from the Middle Pliocene,\
\ meaning that there was a great amount of species diversity, or a lot of diversity within a single\
\ species. The answer is (A).\n\nQ: Researchers now believe that the decline of\
\ the Maya was caused chiefly by:\n(A) a cataclysm of some kind, such as an earthquake,\
\ volcano, or tsunami. (B) ecological degradation resulting from slash-and-burn\
\ farming techniques. (C) endless wars between neighboring Mayan city-states. (D)\
\ practices of interbreeding that led to a steep rise in congenital disorders.\n\
A: Let's think step by step. We refer to Wikipedia articles on prehistory for help.\
\ Researchers believe that the Maya collapse was mainly caused by over-exploitation\
\ of natural resources through practices such as slash-and-burn farming. The answer is\
\ (B).\n\nQ: The great Mayan king Pacal built temples in the city of Palenque in\
\ order to:\n(A) satisfy the powerful Mayan astronomer priests. (B) display his\
\ generosity to the common people, since they were allowed to live in the temples.\
\ (C) frighten away enemies, in particular the Spaniards. (D) legitimize his kingship,\
\ since his father was not royal.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on prehistory for help. Pacal built the temples as funerary monuments\
\ to legitimize his kingship. The answer is (D)."
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_prehistory"
"dataset_name": "professional_accounting"
"description": "The following are multiple choice questions (with answers) about professional\
\ accounting.\n\nQ: An auditor traces the serial numbers on equipment to a nonissuer’s\
\ subledger. Which of the following management assertions is supported by this test?\n\
(A) Valuation and allocation (B) Completeness (C) Rights and obligations (D) Presentation\
\ and disclosure\nA: Let's think step by step. We refer to Wikipedia articles on\
\ accounting for help. The completeness assertion is tested by tracing from the physical items or supporting\
\ documents to the accounting records. The answer is (B).\n\nQ: One hundred years ago,\
\ your great-great-grandmother invested $100 at 5% yearly interest. What is the\
\ investment worth today?\n(A) $13,000 (B) $600 (C) $15,000 (D) $28,000\nA: Let's\
\ think step by step. We refer to Wikipedia articles on accounting for help. A $100\
\ investment at 5% yearly interest is worth 100*(1.05)^100 ≈ $13,150 after 100 years,\
\ which is around $13,000. The answer is (A).\n\nQ: On January 1, year 1, Alpha\
\ Co. signed an annual maintenance agreement with a software provider for $15,000\
\ and the maintenance period begins on March 1, year 1. Alpha also incurred $5,000\
\ of costs on January 1, year 1, related to software modification requests that\
\ will increase the functionality of the software. Alpha depreciates and amortizes\
\ its computer and software assets over five years using the straight-line method.\
\ What amount is the total expense that Alpha should recognize related to the maintenance\
\ agreement and the software modifications for the year ended December 31, year\
\ 1?\n(A) $5,000 (B) $13,500 (C) $16,000 (D) $20,000\nA: Let's think step by step.\
\ We refer to Wikipedia articles on accounting for help. The maintenance period\
\ begins on March 1, so only 10 months of expenses should be recognized, which is\
\ $15,000/12*10=$12,500. The software modification cost is amortized over 5 years,\
\ so the annual amortization is $5,000/5=$1,000. The total expense is therefore $12,500+$1,000=$13,500.\
\ The answer is (B).\n\nQ: Krete is an unmarried taxpayer with income exclusively\
\ from wages. By December 31, year 1, Krete's employer has withheld $16,000 in federal\
\ income taxes and Krete has made no estimated tax payments. On April 15, year 2,\
\ Krete timely filed for an extension request to file her individual tax return,\
\ and paid $300 of additional taxes. Krete's year 1 tax liability was $16,500 when\
\ she timely filed her return on April 30, year 2, and paid the remaining tax liability\
\ balance. What amount would be subject to the penalty for underpayment of estimated\
\ taxes?\n(A) $0 (B) $500 (C) $1,650 (D) $16,500\nA: Let's think step by step. We\
\ refer to Wikipedia articles on accounting for help. The tax due after withholding\
\ is $16,500-$16,000=$500, which is less than the $1,000 threshold, so no penalty for\
\ underpayment of estimated taxes applies. The answer is (A).\n\nQ: Box a nongovernmental not-for-profit\
\ organization had the following transactions during the year: proceeds from sale\
\ of investments, $80,000; purchase of property, plant, and equipment, $10,000; proceeds\
\ from long-term debt, $100,000; loss on sale of investment, $5,000. What amount should\
\ be reported as net cash provided by financing activities in Box's statement of\
\ cash flows?\n(A) $70,000 (B) $75,000 (C) $80,000 (D) $100,000\nA: Let's think step\
\ by step. We refer to Wikipedia articles on accounting for help. Among the four\
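\ transactions, only the proceeds from long-term debt belong to the financing activities\
\ section of the statement of cash flows; the sale of investments and the purchase of equipment are investing activities, and the loss on sale is a non-cash item. Hence the amount reported should be $100,000. The answer is\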
\ (D)."
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_professional_accounting"
"dataset_name": "professional_law"
"description": "The following are multiple choice questions (with answers) about professional\
\ law.\n\nQ: A son owed a creditor $5,000. The son's father contacted the creditor\
\ and told him that he wanted to pay the son's debt. The father signed a document\
\ that stated the father would pay the son's debt at a rate of $500 a month for\
\ 10 months. The creditor made no written or oral commitment to forbear to sue the\
\ son to collect the $5,000 debt, and the father made no oral or written request\
\ for any such forbearance. For the next five months, the father made and the creditor\
\ accepted the $500 monthly payments as agreed. During that period, the creditor,\
\ in fact, did forbear to take any legal action against the son. However, the father\
\ then informed the creditor that he would make no further payments on the debt.\
\ Which of the following is the most persuasive argument that the father is liable\
\ to the creditor under the terms of their agreement?\n(A) The father's promise\
\ and the creditor's reliance thereon, if proved, gave rise to a valid claim by\
\ the creditor against the father based on the doctrine of promissory estoppel.\
\ (B) Because it was foreseeable that the father's promise would induce the creditor\
\ to forbear taking any action against the son, such forbearance was, as a matter\
\ of law, a bargained-for consideration for the father's promise. (C) The father's\
\ five payments to the creditor totaling $2,500 manifested a serious intent on the\
\ father's part to be contractually bound, and such manifestation is generally recognized\
\ as an effective substitute for consideration. (D) By assuming the antecedent debt\
\ obligation that the son owed to the creditor, the father became a surety whose\
\ promise to the creditor was enforceable, since it was in writing and supported\
\ by adequate consideration.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on law for help. The doctrine of promissory estoppel prevents a person from\
\ going back on a promise that the other party reasonably relied on, even without consideration. The creditor relied on the father's promise by forbearing to sue the son, so option (A) is the most persuasive\
\ argument. The answer is (A).\n\nQ: A state has recently enacted a statute prohibiting\
\ the disposal of any nuclear wastes within the state. This law does not contravene\
\ or conflict with any federal statutes. A man operates a company in the state that\
\ is engaged in the disposal of nuclear wastes. Subsequent to the passage of the\
\ state statute, the man, not yet aware of the new law, entered into contracts with\
\ many out-of-state firms to dispose of their nuclear wastes in the state. On account\
\ of this new law, however, the man will be unable to perform these contracts. Assume\
\ that the man has standing to challenge this state law. Which of the following\
\ presents his strongest constitutional grounds to challenge the state law prohibiting\
\ the disposal of nuclear wastes within the state?\n(A) The commerce clause. (B)\
\ The equal protection clause of the Fourteenth Amendment. (C) The privileges and\
\ immunities clause of Article IV, Section 2. (D) The contract clause.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on law for help. The commerce\
\ clause states that Congress shall have the power to regulate commerce with foreign\
\ Nations, and among the several States, and with the Indian Tribes. The statute\
\ affects interstate commerce by preventing him from performing contracts with out-of-state firms, which puts its validity in question. Hence the man's strongest\
\ argument should be the commerce clause. The answer is (A).\n\nQ: On October 1,\
\ 1980, a developer, owner of several hundred acres in a rural county, drafted a\
\ general development plan for the area. The duly recorded plan imposed elaborate\
\ limitations and restrictions upon the land in the plan, which was to be developed\
\ as a residential district. The restrictions were to extend to all persons acquiring\
\ any of the lots and to their heirs, assigns, and lessees. It was further provided\
\ that all subsequent owners would be charged with due notice of the restrictions.\
\ Among those restrictions in the general plan were the following: (22) A franchise\
\ right is created in a strip of land 10 feet in width along the rear of each lot\
\ for the use of public utility companies with right of ingress and egress. (23)\
\ No house or structure of any kind shall be built on the aforementioned strip of\
\ land running through the said blocks. In 2000, a retiree purchased one of the\
\ lots, built a house, and erected a fence in the rear of his property within the\
\ restricted area. In 2004, a teacher purchased a lot adjacent to the retiree's\
\ property and built a new house. Two years later, a librarian purchased the lot\
\ that adjoined the teacher's property. The three deeds to those properties each\
\ contained references to the deed book where the general plan was recorded. In\
\ 2008, the librarian began the construction of a seven-foot post-and-rail fence\
\ along the line dividing his lot with the teacher's, and along the center of the\
\ area subject to the franchise right. Although the teacher objected to its construction,\
\ the fence was completed. If the teacher seeks a mandatory injunction to compel\
\ removal of the librarian's fence, the court will most likely\n(A) grant relief,\
\ because the fence was in violation of the easement restriction. (B) grant relief,\
\ because the encroachment of the fence violated the restriction in the original\
\ plan. (C) deny relief, because the teacher failed to enforce the restriction against\
\ the retiree. (D) deny relief, because the fence would not be construed as \"a\
\ structure\" within the terms of the restriction. \nA: Let's think step by step.\
\ We refer to Wikipedia articles on law for help. The restrictions in the original\
\ plan say no house or structure of any kind shall be built on the aforementioned\
\ strip of land running through the said blocks. Hence the court will most likely\
\ grant relief because the fence violated the restriction in the original plan.\
\ The answer is (B).\n\nQ: Judge took judicial notice of some facts at the beginning\
\ of the trial. Which of the following is not an appropriate kind of fact for judicial\
\ notice?\n(A) Indisputable facts. (B) Facts that have been asserted by individual\
\ political organizations. (C) Facts recognized to be true by common knowledge.\
\ (D) Facts capable of scientific verification.\nA: Let's think step by step. We\
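\ refer to Wikipedia articles on law for help. Judicial notice covers only facts that are not subject to reasonable dispute. Among the options, facts that have\
\ been asserted by individual political organizations are open to dispute, so they are not an appropriate kind\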
\ of fact for judicial notice. The answer is (B).\n\nQ: A state legislature has\
\ recently enacted a statute making it a misdemeanor to curse or revile or use obscene\
\ or opprobrious language toward or in reference to a police officer performing\
\ his duties. A student at a state university organized a demonstration on campus\
\ to protest the war. The rally was attended by a group of 50 students who shouted\
\ anti-war messages at cars passing by. To show his contempt for the United States,\
\ the student sewed the American flag to the rear of his jeans. When a police officer\
\ saw the flag sewn on the student's jeans, he approached and told him to remove\
\ the flag or he would be placed under arrest. The student became angered and shouted\
\ at the police officer, \"Listen, you bastard, I'll wear this rag anywhere I please.\"\
\ The student was subsequently placed under arrest and charged with violating\
\ the state statute. The student subsequently brings suit in state court challenging\
\ the constitutionality of the statute. The strongest constitutional argument for\
\ the student is that\n(A) the statute is void for vagueness under the Fourteenth\
\ Amendment's due process clause. (B) the statute is invalid because it violates\
\ the petitioner's freedom of speech under the First Amendment. (C) the statute\
\ is an abridgment of freedom of speech under the First Amendment because less restrictive\
\ means are available for achieving the same purpose. (D) the statute is overbroad\
\ and consequently invalid under the First and Fourteenth Amendments.\nA: Let's\
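\ think step by step. We refer to Wikipedia articles on law for help. The First\
\ Amendment applies to the states through the Fourteenth Amendment's due process clause.\
\ Because the statute punishes any opprobrious language toward a police officer, it reaches protected speech as well as unprotected fighting words. Hence the strongest argument is that the statute is overbroad and consequently\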
\ invalid under the First and Fourteenth Amendments. The answer is (D)."
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_professional_law"
"dataset_name": "professional_medicine"
"description": "The following are multiple choice questions (with answers) about professional\
\ medicine.\n\nQ: A 22-year-old male marathon runner presents to the office with\
\ the complaint of right-sided rib pain when he runs long distances. Physical examination\
\ reveals normal heart and lung findings and an exhalation dysfunction at ribs 4-5\
\ on the right. Which of the following muscles or muscle groups will be most useful\
\ in correcting this dysfunction utilizing a direct method?\n(A) anterior scalene\
\ (B) latissimus dorsi (C) pectoralis minor (D) quadratus lumborum\nA: Let's think\
\ step by step. We refer to Wikipedia articles on medicine for help. Among the options,\
\ only the pectoralis minor originates from the outer surfaces of the 3rd to 5th\
\ ribs. The answer is (C).\n\nQ: A 36-year-old male presents to the office with\
\ a 3-week history of low back pain. He denies any recent trauma but says that he\
\ climbs in and out of his truck numerous times a day for his job. Examination of\
\ the patient in the prone position reveals a deep sacral sulcus on the left, a\
\ posterior inferior lateral angle on the right, and a lumbosacral junction that\
\ springs freely on compression. The most likely diagnosis is\n(A) left-on-left\
\ sacral torsion (B) left-on-right sacral torsion (C) right unilateral sacral flexion\
\ (D) right-on-right sacral torsion\nA: Let's think step by step. We refer to Wikipedia\
\ articles on medicine for help. A deep sulcus on the left and a posterior ILA on\
\ the right, with a negative spring test, suggest a right-on-right sacral torsion.\
\ All other options have a deep sulcus on the right. The answer is (D).\n\nQ: A\
\ 44-year-old man comes to the office because of a 3-day history of sore throat,\
\ nonproductive cough, runny nose, and frontal headache. He says the headache is\
\ worse in the morning and ibuprofen does provide some relief. He has not had shortness\
\ of breath. Medical history is unremarkable. He takes no medications other than\
\ the ibuprofen for pain. Vital signs are temperature 37.4°C (99.4°F), pulse 88/min,\
\ respirations 18/min, and blood pressure 120/84 mm Hg. Examination of the nares\
\ shows erythematous mucous membranes. Examination of the throat shows erythema\
\ and follicular lymphoid hyperplasia on the posterior oropharynx. There is no palpable\
\ cervical adenopathy. Lungs are clear to auscultation. Which of the following is\
\ the most likely cause of this patient's symptoms?\n(A) Allergic rhinitis (B) Epstein-Barr\
\ virus (C) Mycoplasma pneumonia (D) Rhinovirus\nA: Let's think step by step. We\
\ refer to Wikipedia articles on medicine for help. The symptoms (sore throat, cough, runny nose, and headache) are typical of a common cold, so the\
\ most likely cause is rhinovirus. Epstein-Barr virus\
\ typically causes swollen lymph nodes, but there is no palpable cervical adenopathy. Lungs\
\ that are clear to auscultation argue against Mycoplasma pneumonia. The answer is\
\ (D).\n\nQ: A previously healthy 32-year-old woman comes to the physician 8 months\
\ after her husband was killed in a car crash. Since that time, she has had a decreased\
\ appetite and difficulty falling asleep. She states that she is often sad and cries\
\ frequently. She has been rechecking the door lock five times before leaving her\
\ house and has to count exactly five pieces of toilet paper before she uses it.\
\ She says that she has always been a perfectionist but these urges and rituals\
\ are new. Pharmacotherapy should be targeted to which of the following neurotransmitters?\n\
(A) Dopamine (B) Glutamate (C) Norepinephrine (D) Serotonin\nA: Let's think step\
\ by step. We refer to Wikipedia articles on medicine for help. The patient has\
\ depressive symptoms (sadness, decreased appetite, and difficulty sleeping) along with new obsessive-compulsive\
\ rituals. Both depression and obsessive-compulsive disorder are treated first-line with selective serotonin\
\ reuptake inhibitors, so pharmacotherapy should target serotonin. The answer is (D).\n\nQ: A\
\ 42-year-old man comes to the office for preoperative evaluation prior to undergoing\
\ adrenalectomy scheduled in 2 weeks. One month ago, he received care in the emergency\
\ department for pain over his right flank following a motor vehicle collision.\
\ At that time, blood pressure was 160/100 mm Hg and CT scan of the abdomen showed\
\ an incidental 10-cm left adrenal mass. Results of laboratory studies, including\
\ complete blood count, serum electrolyte concentrations, and liver function tests,\
\ were within the reference ranges. The patient otherwise had been healthy and had\
\ never been told that he had elevated blood pressure. He takes no medications.\
\ A follow-up visit in the office 2 weeks ago disclosed elevated urinary normetanephrine\
\ and metanephrine and plasma aldosterone concentrations. The patient was referred\
\ to a surgeon, who recommended the adrenalectomy. Today, vital signs are temperature\
\ 36.6°C (97.9°F), pulse 100/min, respirations 14/min, and blood pressure 170/95\
\ mm Hg. Physical examination discloses no significant findings. Initial preoperative\
\ preparation should include treatment with which of the following?\n(A) Labetalol\
\ (B) A loading dose of potassium chloride (C) Nifedipine (D) Phenoxybenzamine\n\
A: Let's think step by step. We refer to Wikipedia articles on medicine for help.\
\ The hypertension, the adrenal mass, and the elevated urinary metanephrines suggest pheochromocytoma.\
\ Phenoxybenzamine, an alpha-adrenergic blocker, is used before surgery to control hypertension caused\
\ by pheochromocytoma. The answer is (D)."
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_professional_medicine"
"dataset_name": "professional_psychology"
"description": "The following are multiple choice questions (with answers) about professional\
\ psychology.\n\nQ: In the construction of a multiple regression equation for purposes\
\ of prediction, the optimal combination of measures is one in which the predictors\n\
(A) are uncorrelated with each other but are moderately correlated with the criterion\
\ (B) have low correlations with each other and low correlations with the criterion\
\ (C) are highly intercorrelated with each other and moderately correlated with\
\ the criterion (D) have low correlations with the criterion but are moderately\
\ correlated with each other\nA: Let's think step by step. We refer to Wikipedia\
\ articles on psychology for help. Multiple regression assesses\
\ the relationship between one continuous criterion variable and a set of predictor variables.\
\ Predictors that are uncorrelated with each other each contribute unique information about the criterion, so the predictors should be uncorrelated with each other but moderately correlated\
\ with the criterion. The answer is (A).\n\nQ: There are three ways to measure the\
\ Central Tendency: the Mean, the Median and the Mode. From your knowledge about\
\ them, what is the mode?\n(A) less sensitive to extreme scores than the mean (B)\
\ more useful for skewed distributions (C) sensitive to extreme values and highly\
\ skewed distributions (D) the most frequently occurring number\nA: Let's think\
\ step by step. We refer to Wikipedia articles on psychology for help. The definition\
\ of mode is the most frequently occurring number. The answer is (D).\n\nQ: Carl\
\ Jung believed that a client's transference:\n(A) is a fantasy that distracts the\
\ client from reality. (B) represents “mixed feelings” toward the therapist. (C)\
\ is a form of \"acting out.\" (D) reflects the client’s personal and collective\
\ unconscious.\nA: Let's think step by step. We refer to Wikipedia articles on psychology\
\ for help. Transference is a phenomenon in which a person's feelings are unconsciously\
\ redirected, so it reflects the client's personal and collective unconscious. The\
\ answer is (D).\n\nQ: In terms of Hofstede’s (1980) five cultural dimensions, the\
\ United States scores at the top of the scale on:\n(A) individualism. (B) individualism\
\ and power distance. (C) power distance and masculinity. (D) uncertainty avoidance.\n\
A: Let's think step by step. We refer to Wikipedia articles on psychology for help.\
\ The US scores highest on individualism among the five cultural dimensions. The answer\
\ is (A).\n\nQ: One of your therapy clients asks your advice about a good weight-\
\ reduction program. You have investigated the programs in the community and are\
\ enrolled in the one you consider the best. This program offers a $50 bonus to\
\ its patrons for each new person they bring into the program. Under these circumstances,\
\ your most appropriate response would be to\n(A) tell your client the pros and\
\ cons of each program you know about except for the one in which you are enrolled\
\ (B) recommend to your client the program in which you are enrolled and explain\
\ the $50 bonus you will receive (C) recommend to your client the program in which\
\ you are enrolled and offer to have the $50 bonus credited to your client's account\
\ in the program (D) tell your client the pros and cons of each program you know\
\ about, but do not claim the $50 bonus if your client enrolls in your program\n\
A: Let's think step by step. We refer to Wikipedia articles on psychology for help.\
\ Based on the circumstances, you should tell your client about the pros and cons\
\ of each program, but it would be inappropriate to receive the bonus, so you should\
\ not claim the $50 bonus. The answer is (D)."
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_professional_psychology"
"dataset_name": "public_relations"
"description": "The following are multiple choice questions (with answers) about public\
\ relations.\n\nQ: Earth Hour was a campaign launched by which organization?\n(A)\
\ Greenpeace (B) The UN (C) Oxfam (D) World Wildlife Fund\nA: Let's think step by\
\ step. We refer to Wikipedia articles on public relations for help. Earth Hour\
\ is a worldwide movement launched by the World Wildlife Fund. The answer\
\ is (D).\n\nQ: In issues management, what is the most proactive approach to addressing\
\ negative or misleading information posted online about your organization?\n(A)\
\ Buy domain names that could be used by opposition groups. (B) Post anonymous comments\
\ on blogs to combat this information. (C) Prepare a news release that discredits\
\ the inaccurate information. (D) Make policy changes to address complaints highlighted\
\ on these sites.\nA: Let's think step by step. We refer to Wikipedia articles on\
\ public relations for help. In issues management, the most proactive approach to\
\ addressing negative or misleading information posted online is to make policy\
\ changes to address complaints highlighted on those sites. The answer is (D).\n\
\nQ: At which stage in the planning process would a situation analysis be carried\
\ out?\n(A) Defining the program (B) Planning the program (C) Taking action and\
\ implementing ideas (D) Evaluation of the program\nA: Let's think step by step.\
\ We refer to Wikipedia articles on public relations for help. Situation analyses\
\ are typically carried out during the planning process stage of defining the program.\
\ The answer is (A).\n\nQ: Which of these statements is true of the Vatican in 2010\
\ at the time of the accusations of child abuse cover-ups?\n(A) There was a coordinated\
\ media response. (B) Consistent messages were communicated. (C) Criticisms were\
\ taken as attacks on the Catholic Church. (D) The credibility of the Vatican was\
\ upheld.\nA: Let's think step by step. We refer to Wikipedia articles on public\
\ relations for help. In 2010 when there were accusations of child abuse cover-ups,\
\ the Vatican took those criticisms as attacks on the Catholic Church. The answer\
\ is (C).\n\nQ: What should a public relations media practitioner do if she does\
\ not know the answer to a reporter's question?\n(A) Give the reporter other information\
\ she is certain is correct. (B) Say that the information is 'off the record' and\
\ will be disseminated later. (C) Say 'I don't know' and promise to provide the\
\ information later. (D) Say 'no comment,' rather than appear uninformed.\nA: Let's\
\ think step by step. We refer to Wikipedia articles on public relations for help.\
\ If a public relations media practitioner does not know the answer to a reporter's\
\ question, they should say 'I don't know' and offer to provide the information\
\ later. The answer is (C)."
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_public_relations"
"dataset_name": "security_studies"
"description": "The following are multiple choice questions (with answers) about security\
\ studies.\n\nQ: What are the frameworks of analysis within which terrorism has\
\ been considered (as of 2020)?\n(A) Competition between larger nations has resulted\
\ in some countries actively supporting terrorist groups to undermine the strength\
\ of rival states. Terrorist networks are extended patronage clubs maintained and\
\ paid for by their donor states and are conceptualised as being like state actors,\
\ to be dealt with using military force. (B) Globalization has enabled the internationalization\
\ of terrorist activities by opening up their operational space, although coordination\
\ is still managed from a geographical base. This suggests that terrorist groups\
\ are nationally structured which means that terrorism cannot be considered in terms\
\ of a war to be defeated militarily without having serious implications on the\
\ indigenous population. (C) Terrorism can be viewed as a problem to be resolved\
\ by military means (war on terrorism), by normal police techniques (terrorism as\
\ crime), or as a medical problem with underlying causes and symptoms (terrorism\
\ as disease). (D) Terrorism is viewed as a criminal problem. The criminalization\
\ of terrorism has two important implications. Firstly, it suggests that terrorism\
\ can be eradicated - terrorists can be caught and brought to trial by normal judicial\
\ proceedings thereby removing the threat from society - and secondly, it suggests\
\ that preventative crime techniques are applicable to prevent its development.\n\
A: Let's think step by step. We refer to Wikipedia articles on security studies\
\ for help. (A) is wrong because it is not competition between larger nations that\
\ causes terrorism. \n(B) is wrong because globalization is not the cause of terrorism.\n\
(C) is correct because the US undertook the war on terrorism. \n(D) is wrong because\
\ preventative crime techniques will likely not end terrorism. The answer is (C).\n\
\nQ: Which of the following is the best lens through which to investigate the role\
\ of child soldiers?\n(A) Child soldiers are victims of combat that need re-education\
\ and rehabilitation. (B) Children and their mothers are not active subjects in\
\ warfare and are best considered as subjects in the private sphere. (C) Children\
\ are most often innocent bystanders in war and are best used as signifiers of peace.\
\ (D) Children have political subjecthood that is missed when they are considered\
\ as passive victims of warfare.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on security studies for help. The political subjecthood of child soldiers can\
\ be missed when they are considered passive victims of warfare. The answer is (D).\n\
\nQ: How can we best describe the relationship between the state-centric approach\
\ and the concept of human security?\n(A) There are such wide divisions within the\
\ human security framework regarding the nature of threats and referent objects\
\ that no widely applicable comparisons between state-centric approaches and human\
\ security can be drawn. (B) By adopting the framework of human security, the limitations\
\ of the realist state-centric approach become evident. Whilst human security defines\
\ the referent object as the person or population, state-centric approaches prioritise\
\ the security of the state, de-prioritizing the pursuit of human security. (C)\
\ The state-centric approach to security is a faction of human security, usually\
\ defined within the broad school of human security. By being state-centric this\
\ approach prioritises the individual as the referent object in security studies.\
\ (D) Both the state-centric and human-centric approaches to security are mutually\
\ exclusive and offer a sufficient analytic framework with which to understand the\
\ international security system. It is therefore the role of security analysts to\
\ determine which of these substantial concepts is correct, and which should be\
\ discarded.\nA: Let's think step by step. We refer to Wikipedia articles on security\
\ studies for help. Human security focuses on a person or population whereas state-centric\
\ approaches focus on the state while deprioritizing human security. The answer\
\ is (B).\n\nQ: In order to become securitized, a threat must be presented in which\
\ of these ways?\n(A) As an existential threat that requires immediate and extraordinary\
\ action, posing a threat to the survival of the state or to societal security.\
\ (B) As requiring immediate and extraordinary action by the state, threatening\
\ the survival of a referent object and therefore warranting the use of measures\
\ not normally employed in the political realm. (C) As an urgent threat to the survival\
\ of the referent object, so serious that it legitimises the employment of extraordinary\
\ action in response. (D) As an urgent threat to the survival of the audience that\
\ requires extraordinary or emergency measures.\nA: Let's think step by step. We\
\ refer to Wikipedia articles on security studies for help. To be securitized, a\
\ threat must be an urgent threat to the survival of the referent object. The answer\
\ is (C).\n\nQ: What distinguishes coercive diplomacy from military force?\n(A)\
\ Compellence is another term for coercive diplomacy, but covering a narrower set\
\ of criteria; compellence covers those threats aimed at initiating adversary action.\
\ A threat to coerce a state to give up part of its territory would count as coercive\
\ diplomacy, as long as that threat proactively initiates action before reactive\
\ diplomacy is taken. (B) Coercive diplomacy constitutes the threats of limited\
\ force to induce adversary's incentive to comply with the coercer's demands. It\
\ is an influence strategy that is intended to obtain compliance: the use of force\
\ to defeat an opponent first does not count. It leaves an element of choice with\
\ the target to comply, or to continue. (C) Military force, or the threat of military\
\ force, utilises fear to achieve strategic objectives. Coercive diplomacy is differentiated\
\ from this approach, because it does not use fear as a tool for coercing an adversary.\
\ (D) Coercive diplomacy is employed to use force but to limit its effects on the\
\ international community. Coercive diplomacy is an aggressive strategy that is\
\ intended to obtain compliance through defeat. It does not leave an element of\
\ choice with the target, the target either being forced to comply or engage in\
\ conflict. It seeks to control by imposing compliance by removing any opportunity\
\ for negotiation or concession.\nA: Let's think step by step. We refer to Wikipedia\
\ articles on security studies for help. Coercive diplomacy uses the threat of force\
\ to induce the opponent to comply with demands. The answer is (B)."
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_security_studies"
"dataset_name": "sociology"
"description": "The following are multiple choice questions (with answers) about sociology.\n\
\nQ: Which of the following is not a problem associated with official statistics\
\ on strike action?\n(A) most strikes go unnoticed by employers and the mass media\
\ (B) not all industrial disputes will be reported by the employer (C) the definition\
\ of strikes excludes those that involve fewer than ten workers or last less than\
\ one day (D) it is hard to compare strikes that were measured in different ways\n\
A: Let's think step by step. We refer to Wikipedia articles on sociology for help.\
\ Official statistics on strike action can be problematic because not all industrial\
\ disputes will be reported by employers, the definition of strikes excludes those\
\ that involve fewer than ten workers or last less than one day, and it is hard\
\ to compare strikes that were measured in different ways. Thus, (A) is not a problem\
\ associated with official statistics on strike action. The answer is (A).\n\nQ:\
\ What does Berger (1963) describe as a metaphor for social reality?\n(A) a fairground\
\ ride (B) a circus (C) a puppet theatre (D) a ballet\nA: Let's think step by step.\
\ We refer to Wikipedia articles on sociology for help. Berger describes social\
\ reality using the metaphor of a puppet theatre. The answer is (C).\n\nQ: The term\
\ 'hegemony' refers to:\n(A) the tendency for the working class not to realize their\
\ own interests (B) a dominant ideology that legitimates economic, political and\
\ cultural power (C) a form of dual consciousness based on ideology and everyday\
\ experiences (D) a mode of payment given for outstanding topiary\nA: Let's think\
\ step by step. We refer to Wikipedia articles on sociology for help. Hegemony refers\
\ to a dominant ideology that legitimates economic, political, and cultural power.\
\ The answer is (B).\n\nQ: The shift from 'civil religion' to 'common religion'\
\ means that:\n(A) the increasing bureaucracy of the state has made religion only\
\ a marginal part of our lives (B) despite the weakening of traditional authority,\
\ our everyday lives and 'common sense' remain shaped by religious beliefs and values\
\ (C) religious participation in collective worship may have declined, but people\
\ still practise their faiths in private (D) people are much more likely to discuss\
\ their religious beliefs in public, informal settings\nA: Let's think step by step.\
\ We refer to Wikipedia articles on sociology for help. The shift from civil religion\
\ to common religion means that despite the weakening of traditional authority,\
\ our everyday lives and common sense remain shaped by religious beliefs and values.\
\ The answer is (B).\n\nQ: Which of the following did the post-war welfare state\
\ of 1948 not aim to provide:\n(A) free health care and education for all (B) a\
\ minimum wage (C) full employment (D) universal welfare\nA: Let's think step by\
\ step. We refer to Wikipedia articles on sociology for help. The post-war welfare\
\ state of 1948 aimed to provide free healthcare and education, full employment,\
\ and universal welfare. But it did not aim to provide a minimum wage. The answer\
\ is (B)."
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_sociology"