Commit ee098a47 authored by lintangsutawika

more tasks

parent 068f8ab2
group:
  - anli
task: anli_r1
dataset_path: anli
output_type: multiple_choice
training_split: train_r1
validation_split: dev_r1
test_split: test_r1
doc_to_text: "{{premise}}\nQuestion: {{hypothesis}}. True, False, or Neither?\nAnswer:"
doc_to_target: " {{label}}" # this will be cast to an int.
# Choice order follows the ANLI label ids: 0 = entailment, 1 = neutral, 2 = contradiction.
template_aliases: "{% set answer_choices = ['True', 'Neither', 'False'] %}"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
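As a rough illustration of what the two templates in this config produce, the sketch below mimics `doc_to_text` and `doc_to_target` with plain Python string formatting instead of the Jinja engine the harness presumably uses. The helper names `render_prompt` and `render_target` and the sample document are hypothetical and are not part of this commit.

```python
def render_prompt(doc: dict) -> str:
    """Mimic the doc_to_text template with plain string formatting."""
    return (
        f"{doc['premise']}\n"
        f"Question: {doc['hypothesis']}. True, False, or Neither?\n"
        "Answer:"
    )


def render_target(doc: dict) -> int:
    """Mimic doc_to_target: render ' {{label}}', then cast the result to int."""
    rendered = f" {doc['label']}"
    # int() ignores the leading space that the template adds.
    return int(rendered)


# Hypothetical document with the fields the templates reference.
example = {
    "premise": "A man is playing a guitar on stage.",
    "hypothesis": "A person is performing music",
    "label": 0,
}

prompt = render_prompt(example)
target = render_target(example)
print(prompt)
print(target)
```

The integer target is then meant to index into the `answer_choices` list declared in `template_aliases`.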
include: anli_r1.yaml
task: anli_r2
training_split: train_r2
validation_split: dev_r2
test_split: test_r2
include: anli_r1.yaml
task: anli_r3
training_split: train_r3
validation_split: dev_r3
test_split: test_r3
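The `anli_r2` and `anli_r3` configs above only restate the keys that differ from `anli_r1.yaml`. A minimal sketch of how such an `include` could be resolved is shown below, assuming it behaves like a shallow dictionary merge in which the including file's keys win. The function `resolve_include` and the in-memory `files` mapping are hypothetical; the harness's actual loader may work differently.

```python
def resolve_include(config: dict, files: dict) -> dict:
    """Merge a config over the file it includes; the including file's keys win."""
    merged = {}
    included_name = config.get("include")
    if included_name is not None:
        # Resolve the base first so nested includes would also work.
        merged.update(resolve_include(files[included_name], files))
    # Overlay the including file's own keys, dropping the include directive.
    merged.update({k: v for k, v in config.items() if k != "include"})
    return merged


# In-memory stand-ins for the YAML files (only a subset of keys shown).
files = {
    "anli_r1.yaml": {
        "task": "anli_r1",
        "dataset_path": "anli",
        "output_type": "multiple_choice",
        "training_split": "train_r1",
        "validation_split": "dev_r1",
        "test_split": "test_r1",
    }
}

anli_r3 = {
    "include": "anli_r1.yaml",
    "task": "anli_r3",
    "training_split": "train_r3",
    "validation_split": "dev_r3",
    "test_split": "test_r3",
}

resolved = resolve_include(anli_r3, files)
print(resolved)
```

Under this assumption, `anli_r3` inherits `dataset_path` and `output_type` from the round-1 config and overrides only the task name and the three splits.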
{"doc_id": 0, "doc": {"question": "does ethanol take more energy make that produces", "passage": "Ethanol fuel -- All biomass goes through at least some of these steps: it needs to be grown, collected, dried, fermented, distilled, and burned. All of these steps require resources and an infrastructure. The total amount of energy input into the process compared to the energy released by burning the resulting ethanol fuel is known as the energy balance (or ``energy returned on energy invested''). Figures compiled in a 2007 report by National Geographic Magazine point to modest results for corn ethanol produced in the US: one unit of fossil-fuel energy is required to create 1.3 energy units from the resulting ethanol. The energy balance for sugarcane ethanol produced in Brazil is more favorable, with one unit of fossil-fuel energy required to create 8 from the ethanol. Energy balance estimates are not easily produced, thus numerous such reports have been generated that are contradictory. For instance, a separate survey reports that production of ethanol from sugarcane, which requires a tropical climate to grow productively, returns from 8 to 9 units of energy for each unit expended, as compared to corn, which only returns about 1.34 units of fuel energy for each unit of energy expended. A 2006 University of California Berkeley study, after analyzing six separate studies, concluded that producing ethanol from corn uses much less petroleum than producing gasoline.", "idx": 0, "label": 0}, "target": "no", "arguments": ["Euro sign -- The euro sign (€) is the currency sign used for the euro, the official currency of the Eurozone in the European Union (EU). The design was presented to the public by the European Commission on 12 December 1996. The international three-letter code (according to ISO standard ISO 4217) for the euro is EUR. In Unicode it is encoded at U+20AC € euro sign (HTML € € ). 
In English, the sign precedes the value (for instance, €10, not 10 €, unlike most other European languages). In some style guides, but not others, the euro sign is unspaced.\nQuestion: does the euro sign go before the number\nAnswer: yes\n\nProperties of metals, metalloids and nonmetals -- Metals comprise the large majority of the elements, and can be subdivided into several different categories. From left to right in the periodic table, these categories include the highly reactive alkali metals; the less reactive alkaline earth metals, lanthanides and radioactive actinides; the archetypal transition metals, and the physically and chemically weak post-transition metals. Specialized subcategories such as the refractory metals and the noble metals also exist.\nQuestion: are there more metallic elements than non metallic elements\nAnswer: yes\n\nBank of America -- The history of Bank of America dates back to October 17, 1904, when Amadeo Pietro Giannini founded the Bank of Italy in San Francisco. The Bank of Italy served the needs of many immigrants settling in the United States at that time, providing services denied to them by the existing American banks which typically discriminated against them and often denied service to all but the wealthiest. Giannini was raised by his mother and stepfather Lorenzo Scatena, as his father was fatally shot over a pay dispute with an employee. When the 1906 San Francisco earthquake struck, Giannini was able to save all deposits out of the bank building and away from the fires. Because San Francisco's banks were in smoldering ruins and unable to open their vaults, Giannini was able to use the rescued funds to commence lending within a few days of the disaster. 
From a makeshift desk consisting of a few planks over two barrels, he lent money to those who wished to rebuild.\nQuestion: the organization that today is known as the bank of america did start out in america\nAnswer: yes\n\nFroot Loops -- Froot Loops is a brand of sweetened, fruit-flavored breakfast cereal produced by Kellogg's and sold in many countries. The cereal pieces are ring-shaped (hence ``loops'') and come in a variety of bright colors and a blend of fruit flavors (hence ``froot'', a cacography of fruit). However, there is no actual fruit in Froot Loops and they are all the same flavor. Kellogg's introduced Froot Loops in 1963. Originally, there were only red, orange, and yellow loops, but green, purple, and blue were added during the 1990s. Different methods of production are used in the UK where, due to the lack of natural colourings for yellow, red and blue Froot Loops are purple, green and orange, and the loops are also larger in size. Although the marketing side of Kellogg's sold the idea that each individual loop color was a different flavor, Kellogg's has acknowledged that all share the same fruit-blend flavor.\nQuestion: do different colored fruit loops have different flavors\nAnswer: no\n\nEthanol fuel -- All biomass goes through at least some of these steps: it needs to be grown, collected, dried, fermented, distilled, and burned. All of these steps require resources and an infrastructure. The total amount of energy input into the process compared to the energy released by burning the resulting ethanol fuel is known as the energy balance (or ``energy returned on energy invested''). Figures compiled in a 2007 report by National Geographic Magazine point to modest results for corn ethanol produced in the US: one unit of fossil-fuel energy is required to create 1.3 energy units from the resulting ethanol. 
The energy balance for sugarcane ethanol produced in Brazil is more favorable, with one unit of fossil-fuel energy required to create 8 from the ethanol. Energy balance estimates are not easily produced, thus numerous such reports have been generated that are contradictory. For instance, a separate survey reports that production of ethanol from sugarcane, which requires a tropical climate to grow productively, returns from 8 to 9 units of energy for each unit expended, as compared to corn, which only returns about 1.34 units of fuel energy for each unit of energy expended. A 2006 University of California Berkeley study, after analyzing six separate studies, concluded that producing ethanol from corn uses much less petroleum than producing gasoline.\nQuestion: does ethanol take more energy make that produces\nAnswer:", {"until": ["\n\n", "\n"], "do_sample": false, "temperature": 0.0}], "resps": [[" no"]], "filtered_resps": [" no"], "exact_match": 0.0}
{"doc_id": 8, "doc": {"question": "can u drive in canada with us license", "passage": "American entry into Canada by land -- Persons driving into Canada must have their vehicle's registration document and proof of insurance.", "idx": 8, "label": 1}, "target": "yes", "arguments": ["Blinking -- Some animals, such as tortoises and hamsters, blink their eyes independently of each other. Humans use winking, the blinking of only one eye, as a form of body language.\nQuestion: can you blink one eye at a time\nAnswer: yes\n\nMajor League Baseball schedule -- Note that rainouts and other cancellations are often rescheduled ad hoc during the season, sometimes as doubleheaders. However, if two teams are scheduled to meet for the final time in the last two weeks of the season, and the game is cancelled, it may not be rescheduled if there is no impact on the divisional or wild card races. For example, in 2016, the September 29 game between the Cleveland Indians and Detroit Tigers was cancelled due to rain because the teams were unable to reschedule a make-up date before the end of the season on October 2, and it did not affect the divisional race. In contrast, a 2008 AL Central division game between Detroit and the Chicago White Sox needed to be made up following the last day of the regular season because it affected a division race involving the White Sox and the Minnesota Twins.\nQuestion: can a mlb team play less than 162 games\nAnswer: yes\n\nPanama national football team -- Panama qualified for the FIFA World Cup for the first time for the 2018 tournament in Russia and scored their first goal of the World Cup against England, although they lost the match 6-1.\nQuestion: has panama ever been in the world cup\nAnswer: yes\n\nLegal drinking age -- The minimum age to purchase and consume varies, but the most common age is 18 years. However, in North America the age limits varies between 18 and 21 years of age. 
Throughout the United States the minimum legal age to purchase any alcoholic beverage from a shop, supermarket, liquor store, bar, club or any other licensed premises is 21 years of age. In Canada each province can decide which minimum age limit is to be set to buy or consume alcohol. Most provinces have a minimum age of 19 years, while Alberta, Manitoba and Quebec have set a minimum age of 18 years. In South America all countries have set a minimum purchase age of 18 years, except for Guyana where minors aged 16 or 17 may consume a glass of beer, wine or cider in a restaurant provided they buy a meal, and Paraguay the only country with a minimum legal purchase and drinking age of 20 years.\nQuestion: can you drink at 18 in the us\nAnswer: no\n\nAmerican entry into Canada by land -- Persons driving into Canada must have their vehicle's registration document and proof of insurance.\nQuestion: can u drive in canada with us license\nAnswer:", {"until": ["\n\n", "\n"], "do_sample": false, "temperature": 0.0}], "resps": [[" yes"]], "filtered_resps": [" yes"], "exact_match": 0.0}
{"doc_id": 1, "doc": {"question": "is house tax and property tax are same", "passage": "Property tax -- Property tax or 'house tax' is a local tax on buildings, along with appurtenant land. It is and imposed on the Possessor (not the custodian of property as per 1978, 44th amendment of constitution). It resembles the US-type wealth tax and differs from the excise-type UK rate. The tax power is vested in the states and is delegated to local bodies, specifying the valuation method, rate band, and collection procedures. The tax base is the annual rental value (ARV) or area-based rating. Owner-occupied and other properties not producing rent are assessed on cost and then converted into ARV by applying a percentage of cost, usually four percent. Vacant land is generally exempt. Central government properties are exempt. Instead a 'service charge' is permissible under executive order. Properties of foreign missions also enjoy tax exemption without requiring reciprocity. The tax is usually accompanied by service taxes, e.g., water tax, drainage tax, conservancy (sanitation) tax, lighting tax, all using the same tax base. The rate structure is flat on rural (panchayat) properties, but in the urban (municipal) areas it is mildly progressive with about 80% of assessments falling in the first two brackets.", "idx": 1, "label": 1}, "target": "yes", "arguments": ["Dante's View -- Dante's View provides a panoramic view of the southern Death Valley basin. To the south, the Owlshead Mountains, 30 km (19 mi) away can be seen, and to the north, the Funeral Mountains 50 km (31 mi) distant, are visible beyond Furnace Creek. To the West, across Badwater Basin, the Panamint Range rises dramatically to Telescope Peak. To the east is found the Greenwater Range. 
On very clear days, the highest and lowest points in the contiguous 48 states of the United States: Mount Whitney 4,421 m (14,505 ft) high and Badwater −86 m (−282 ft) can be seen.\nQuestion: can you see mount whitney from death valley\nAnswer: yes\n\nOnce Upon a Time (season 7) -- The storyline was softly rebooted with a main narrative led by an adult Henry Mills, set several years after last season's events. In February 2018, it was announced the seventh season would serve as the final season of the series; the season and series concluded on May 18, 2018.\nQuestion: will there be new episodes of once upon a time\nAnswer: no\n\nCrazy Heart -- Crazy Heart is a 2009 American drama film, written and directed by Scott Cooper and based on the 1987 novel of the same name by Thomas Cobb. The film centers around a down-and-out country music singer-songwriter (Bridges) who tries to turn his life around after beginning a relationship with a young journalist (Gyllenhaal). Other supporting roles are played by Colin Farrell, Robert Duvall, and child actor Jack Nation. Bridges, Farrell, and Duvall also sing in the film.\nQuestion: did jeff bridges sing all the songs in crazy heart\nAnswer: no\n\nColombian emeralds -- Emeralds are green precious gemstones that are mined in various geological settings. They are minerals in the beryl group of silicates. For more than 4,000 years, emeralds have been among the most valuable of all jewels on Earth. Colombia, located on the continent of South America, is the country that mines and produces the most emeralds for the global market. It is estimated that Colombia accounts for 70-90% of the world's emerald market. While commercial grade emeralds are quite plentiful, fine and extra fine quality emeralds are extremely rare. 
Colombian emeralds over 50 carat can cost much more than diamonds of the same size.\nQuestion: the iconic precious gemstone of colombia is the emerald\nAnswer: yes\n\nProperty tax -- Property tax or 'house tax' is a local tax on buildings, along with appurtenant land. It is and imposed on the Possessor (not the custodian of property as per 1978, 44th amendment of constitution). It resembles the US-type wealth tax and differs from the excise-type UK rate. The tax power is vested in the states and is delegated to local bodies, specifying the valuation method, rate band, and collection procedures. The tax base is the annual rental value (ARV) or area-based rating. Owner-occupied and other properties not producing rent are assessed on cost and then converted into ARV by applying a percentage of cost, usually four percent. Vacant land is generally exempt. Central government properties are exempt. Instead a 'service charge' is permissible under executive order. Properties of foreign missions also enjoy tax exemption without requiring reciprocity. The tax is usually accompanied by service taxes, e.g., water tax, drainage tax, conservancy (sanitation) tax, lighting tax, all using the same tax base. The rate structure is flat on rural (panchayat) properties, but in the urban (municipal) areas it is mildly progressive with about 80% of assessments falling in the first two brackets.\nQuestion: is house tax and property tax are same\nAnswer:", {"until": ["\n\n", "\n"], "do_sample": false, "temperature": 0.0}], "resps": [[" yes"]], "filtered_resps": [" yes"], "exact_match": 0.0}
{"doc_id": 9, "doc": {"question": "is there a play off for third place in the world cup", "passage": "2018 FIFA World Cup knockout stage -- The knockout stage of the 2018 FIFA World Cup was the second and final stage of the competition, following the group stage. It began on 30 June with the round of 16 and ended on 15 July with the final match, held at the Luzhniki Stadium in Moscow. The top two teams from each group (16 in total) advanced to the knockout stage to compete in a single-elimination style tournament. A third place play-off was also played between the two losing teams of the semi-finals.", "idx": 9, "label": 1}, "target": "yes", "arguments": ["Fifty Shades (film series) -- Fifty Shades is an American film series that consists of three erotic romantic drama films, based on the Fifty Shades trilogy by English author E.L. James. It is distributed by Universal Studios and stars Dakota Johnson and Jamie Dornan as the lead roles Anastasia Steele and Christian Grey, respectively. Sam Taylor-Johnson directed the first film and initially she was slated to be the director of the sequels as well, however subsequently the second and third films were directed by James Foley.\nQuestion: is there 3 fifty shades of grey movies\nAnswer: yes\n\nMandalay Bay -- Mandalay Bay is a 43-story luxury resort and casino on the Las Vegas Strip in Paradise, Nevada. It is owned and operated by MGM Resorts International. One of the property's towers operates as the Delano; the Four Seasons Hotel is independently operated within the Mandalay Bay tower, occupying 5 floors (35--39).\nQuestion: is four seasons las vegas part of mandalay bay\nAnswer: yes\n\nThe Fosters (season 5) -- The fifth and final season of The Fosters premiered on July 11, 2017. 
The season consisted of 22 episodes and stars Teri Polo and Sherri Saum as Stef Foster and Lena Adams, an interracial lesbian couple, who have adopted a girl (Maia Mitchell) and her younger brother (Hayden Byerly) while also trying to juggle raising Latino twin teenagers (Cierra Ramirez and Noah Centineo) and Stef's biological son (David Lambert). Danny Nucci also returns as Mike Foster in a semi-series regular role.\nQuestion: is season 5 the last season of the fosters\nAnswer: yes\n\nGreat Lakes Waterway -- The Great Lakes Waterway is a system of natural channels and canals which enable navigation between the North American Great Lakes. Though all of the lakes are naturally connected as a chain, water travel between the lakes was impeded for centuries by obstacles such as Niagara Falls and the rapids of the St. Marys River.\nQuestion: are the great lakes connected to each other\nAnswer: yes\n\n2018 FIFA World Cup knockout stage -- The knockout stage of the 2018 FIFA World Cup was the second and final stage of the competition, following the group stage. It began on 30 June with the round of 16 and ended on 15 July with the final match, held at the Luzhniki Stadium in Moscow. The top two teams from each group (16 in total) advanced to the knockout stage to compete in a single-elimination style tournament. A third place play-off was also played between the two losing teams of the semi-finals.\nQuestion: is there a play off for third place in the world cup\nAnswer:", {"until": ["\n\n", "\n"], "do_sample": false, "temperature": 0.0}], "resps": [[" yes"]], "filtered_resps": [" yes"], "exact_match": 0.0}
{"doc_id": 2, "doc": {"question": "is pain experienced in a missing body part or paralyzed area", "passage": "Phantom pain -- Phantom pain sensations are described as perceptions that an individual experiences relating to a limb or an organ that is not physically part of the body. Limb loss is a result of either removal by amputation or congenital limb deficiency. However, phantom limb sensations can also occur following nerve avulsion or spinal cord injury.", "idx": 2, "label": 1}, "target": "yes", "arguments": ["Mahi-mahi -- The United States and the Caribbean countries are the primary consumers of this fish, but many European countries are increasing their consumption every year. It is a popular eating fish in Australia, usually caught and sold as a byproduct by tuna and swordfish commercial fishing operators. Japan and Hawaii are significant consumers. The Arabian Sea, particularly the coast of Oman, also has mahi-mahi. At first, mahi-mahi were mostly bycatch in the tuna and swordfish longline fishery. Now, they are sought by commercial fishermen on their own merits.\nQuestion: is mahi-mahi a type of tuna\nAnswer: no\n\nCanada at the FIFA World Cup -- This is a record of Canada's results at the FIFA World Cup. Canada has appeared in the FIFA World Cup on one occasion, which was in 1986.\nQuestion: has canada ever had a team in the world cup\nAnswer: yes\n\nCalgary Flames -- The team was founded in 1972 in Atlanta as the Atlanta Flames until relocating to Calgary in 1980. The Flames played their first three seasons in Calgary at the Stampede Corral before moving into their current home arena, the Scotiabank Saddledome (originally known as the Olympic Saddledome), in 1983. In 1985--86, the Flames became the first Calgary team since the 1923--24 Tigers to compete for the Stanley Cup. In 1988--89, the Flames won their first and only championship. 
The Flames' unexpected run to the 2004 Stanley Cup Finals gave rise to the Red Mile, and in 2011 the team hosted and won the second Heritage Classic outdoor game.\nQuestion: have the calgary flames won a stanley cup\nAnswer: yes\n\nPigeon post -- Pigeon post is the use of homing pigeons to carry messages. Pigeons were effective as messengers due to their natural homing abilities. The pigeons were transported to a destination in cages, where they would be attached with messages, then naturally the pigeon would fly back to its home where the owner could read their mail. They have been used in many places around the world. Pigeons have also been used to great effect in military situations, and are in this case referred to as war pigeon.\nQuestion: did they really use birds to send messages\nAnswer: yes\n\nPhantom pain -- Phantom pain sensations are described as perceptions that an individual experiences relating to a limb or an organ that is not physically part of the body. Limb loss is a result of either removal by amputation or congenital limb deficiency. However, phantom limb sensations can also occur following nerve avulsion or spinal cord injury.\nQuestion: is pain experienced in a missing body part or paralyzed area\nAnswer:", {"until": ["\n\n", "\n"], "do_sample": false, "temperature": 0.0}], "resps": [[" yes"]], "filtered_resps": [" yes"], "exact_match": 0.0}
{"doc_id": 10, "doc": {"question": "can minors drink with parents in new york", "passage": "Alcohol laws of New York -- In response to the National Minimum Drinking Age Act in 1984, which reduced by up to 10% the federal highway funding of any state which did not have a minimum purchasing age of 21, the New York Legislature raised the drinking age from 19 to 21, effective December 1, 1985. (The drinking age had been 18 for many years before the first raise on December 4th, 1982, to 19.) Persons under 21 are prohibited from purchasing alcohol or possessing alcohol with the intent to consume, unless the alcohol was given to that person by their parent or legal guardian. There is no law prohibiting where people under 21 may possess or consume alcohol that was given to them by their parents. Persons under 21 are prohibited from having a blood alcohol level of 0.02% or higher while driving.", "idx": 10, "label": 1}, "target": "yes", "arguments": ["List of Red Hot Chili Peppers band members -- In late 1983, two weeks before signing with EMI, Slovak and Irons obtained a record deal with MCA Records with their other band, What Is This?, and left Red Hot Chili Peppers. Rather than dissolving the band, Kiedis and Flea decided to recruit new members. Cliff Martinez was hired as the band's new drummer and Martinez's bandmate in The Weirdos, guitarist, Dix Denney was expected to replace Slovak. After a few rehearsals, however, the band felt he didn't fit and auditions continued. Their final candidates were guitarists Mark Nine and Jack Sherman, about whom the band knew nothing. After practicing with Sherman, they felt that he was the best fit because he worked well with Flea and Martinez. With Martinez and Sherman aboard, the band released their eponymous debut album on August 10, 1984. During the ensuing tour, continuing musical and lifestyle tension between Kiedis and Sherman complicated the transition between concert and daily band life. 
Sherman was fired soon after, with Slovak returning to the Chili Peppers in 1985 after growing tired of What Is This?. At one point, Chuck Biscuits filled in on drums during the 1985 tour. The band dismissed Cliff Martinez from the group in April 1986 due to personal differences and replaced him with founding member Jack Irons, who was out of work and finally separated from other commitments. During this period, however, Kiedis and Slovak had both developed serious drug addictions, which resulted in Kiedis being briefly fired that same year. At one performance, longtime friend and then Circle Jerks frontman, Keith Morris filled in for an absent Kiedis who was out scoring drugs while his band was playing a show. On June 25, 1988, Slovak died of a heroin overdose shortly after the completion of The Uplift Mofo Party Plan tour. Kiedis would then retreat into hiding, further fueling his drug habit and even skipping Slovak's funeral. After a band meeting with manager Lindy Goetz, Irons subsequently left the group, saying that he did not want to be part of a band where his friends were dying.\nQuestion: did a member of red hot chili peppers died\nAnswer: yes\n\nDefensive three-second violation -- A defensive three-second violation, also known as illegal defense, is a basketball rules infraction in the National Basketball Association (NBA) introduced in the 2001-2002 season. It is assessed when a member of the defending team spends more than three seconds in the free throw lane (as well called the 16-foot lane, or as otherwise known- ``in the paint'') while not actively guarding an opponent. To be considered actively guarding, a defender must be within arm's length of an opponent and in a guarding position. A three-second count is suspended if:\nQuestion: can you get a 3 second violation on defense\nAnswer: yes\n\n1965–66 Manchester City F.C. 
season -- This season is widely believed to have been the start of Manchester City's golden era, a period largely concurrent with the reign of Joe Mercer and Malcolm Allison as managers at the club, and then of the aftermath of the break-up of the partnership. This season began City's highest concentration of silverware to seasons played - winning the Second Division for a record sixth time in this season, the club would claim the Football League First Division title, the FA Cup, the Football League Cup twice, the Charity Shield twice, and in European competition the UEFA Cup Winners' Cup all within the space of the ten seasons following.\nQuestion: did manchester city win the football league in 1966\nAnswer: yes\n\nBanknotes of the pound sterling -- Not since 1945 has there been bigger notes than £50 issued for general circulation by the Bank of England, although banks in Scotland and Northern Ireland still use £100 notes. However, the Bank of England does produce higher-value notes that are used to maintain parity with Scottish and Northern Irish notes. Banknotes issued by Scottish and Northern Irish banks have to be backed pound for pound by Bank of England notes (other than a small amount representing the currency in circulation in 1845), and special £1 million and £100 million notes are used for this purpose. Their design is based on the old Series A notes.\nQuestion: is there such thing as a 100 pound note\nAnswer: yes\n\nAlcohol laws of New York -- In response to the National Minimum Drinking Age Act in 1984, which reduced by up to 10% the federal highway funding of any state which did not have a minimum purchasing age of 21, the New York Legislature raised the drinking age from 19 to 21, effective December 1, 1985. (The drinking age had been 18 for many years before the first raise on December 4th, 1982, to 19.) 
Persons under 21 are prohibited from purchasing alcohol or possessing alcohol with the intent to consume, unless the alcohol was given to that person by their parent or legal guardian. There is no law prohibiting where people under 21 may possess or consume alcohol that was given to them by their parents. Persons under 21 are prohibited from having a blood alcohol level of 0.02% or higher while driving.\nQuestion: can minors drink with parents in new york\nAnswer:", {"until": ["\n\n", "\n"], "do_sample": false, "temperature": 0.0}], "resps": [[" yes"]], "filtered_resps": [" yes"], "exact_match": 0.0}
{"doc_id": 3, "doc": {"question": "is harry potter and the escape from gringotts a roller coaster ride", "passage": "Harry Potter and the Escape from Gringotts -- Harry Potter and the Escape from Gringotts is an indoor steel roller coaster at Universal Studios Florida, a theme park located within the Universal Orlando Resort. Similar to dark rides, the roller coaster utilizes special effects in a controlled-lighting environment and also employs motion-based 3-D projection of both animation and live-action sequences to enhance the experience. The ride, which is themed to the Gringotts Wizarding Bank, became the flagship attraction for the expanded Wizarding World of Harry Potter when it opened on July 8, 2014.", "idx": 3, "label": 1}, "target": "yes", "arguments": ["State church of the Roman Empire -- Nicene Christianity became the state church of the Roman Empire with the Edict of Thessalonica in 380 AD, when Emperor Theodosius I made it the Empire's sole authorized religion. The Eastern Orthodox Church, Oriental Orthodoxy, and the Catholic Church each claim to be the historical continuation of this church in its original form, but do not identify with it in the caesaropapist form that it took later. 
Unlike Constantine I, who with the Edict of Milan of 313 AD had established tolerance for Christianity without placing it above other religions and whose involvement in matters of the Christian faith extended to convoking councils of bishops who were to determine doctrine and to presiding at their meetings, but not to determining doctrine himself, Theodosius established a single Christian doctrine (specified as that professed by Pope Damasus I of Rome and Pope Peter II of Alexandria) as the Empire's official religion.\nQuestion: by the end of the fourth century christianity was the official religion of the roman empire\nAnswer: yes\n\nEiffel Tower -- When originally built, the first level contained three restaurants--one French, one Russian and one Flemish--and an ``Anglo-American Bar''. After the exposition closed, the Flemish restaurant was converted to a 250-seat theatre. A promenade 2.6-metre (8 ft 6 in) wide ran around the outside of the first level. At the top, there were laboratories for various experiments, and a small apartment reserved for Gustave Eiffel to entertain guests, which is now open to the public, complete with period decorations and lifelike mannequins of Eiffel and some of his notable guests.\nQuestion: is there a house at the top of the eiffel tower\nAnswer: no\n\nArticles of Confederation -- Beyond improving their existing association, the records of the Second Continental Congress show that the need for a declaration of independence was intimately linked with the demands of international relations. On June 7, 1776, Richard Henry Lee introduced a resolution before the Continental Congress declaring the colonies independent; at the same time he also urged Congress to resolve ``to take the most effectual measures for forming foreign Alliances'' and to prepare a plan of confederation for the newly independent states. Congress then created three overlapping committees to draft the Declaration, a Model Treaty, and the Articles of Confederation. 
The Declaration announced the states' entry into the international system; the model treaty was designed to establish amity and commerce with other states; and the Articles of Confederation, which established ``a firm league'' among the thirteen free and independent states, constituted an international agreement to set up central institutions for the conduct of vital domestic and foreign affairs.\nQuestion: did the articles of confederation come before the declaration of independence\nAnswer: no\n\nLotto Max -- Lotto Max is played similarly to its predecessor, with players selecting seven numbers from a field of 49. A single board costs $5, and each purchased board also includes two additional quick picks. The main drawing features a jackpot prize starting at CDN$10 million. After the jackpot reaches at least $50 million, additional drawings are held for auxiliary ``MaxMillions'' prizes of $1 million each. MaxMillions prizes are carried over until they are won, and additional MaxMillions prizes are added for each week a main jackpot of at least $50 million is not won. Initially capped at $50 million, the main jackpot is capped at $60 million as of the July 17, 2015 draw. Once a jackpot is won, unclaimed MaxMillions prizes, if any, are placed in the main jackpot on top of the $10 million minimum. As with all Canadian lottery jackpot games, winners receive their prizes in a tax-free lump sum.\nQuestion: is lotto max paid in a lump sum\nAnswer: yes\n\nHarry Potter and the Escape from Gringotts -- Harry Potter and the Escape from Gringotts is an indoor steel roller coaster at Universal Studios Florida, a theme park located within the Universal Orlando Resort. Similar to dark rides, the roller coaster utilizes special effects in a controlled-lighting environment and also employs motion-based 3-D projection of both animation and live-action sequences to enhance the experience. 
The ride, which is themed to the Gringotts Wizarding Bank, became the flagship attraction for the expanded Wizarding World of Harry Potter when it opened on July 8, 2014.\nQuestion: is harry potter and the escape from gringotts a roller coaster ride\nAnswer:", {"until": ["\n\n", "\n"], "do_sample": false, "temperature": 0.0}], "resps": [[" yes"]], "filtered_resps": [" yes"], "exact_match": 0.0}
{"doc_id": 11, "doc": {"question": "is the show bloodline based on a true story", "passage": "Bloodline (TV series) -- Bloodline was announced in October 2014 as part of a partnership between Netflix and Sony Pictures Television, representing Netflix's first major deal with a major film studio for a television series. The series was created and executive produced by Todd A. Kessler, Glenn Kessler, and Daniel Zelman, who previously created the FX series Damages. According to its official synopsis released by Netflix, Bloodline ``centers on a close-knit family of four adult siblings whose secrets and scars are revealed when their black sheep brother returns home.''", "idx": 11, "label": 0}, "target": "no", "arguments": ["Prison escape -- In Mexico, Belgium, Germany and Austria, the philosophy of the law holds that it is human nature to want to escape. In those countries, escapees who do not break any other laws are not charged for anything and no extra time is added to their sentence. However, in Mexico, officers are allowed to shoot prisoners attempting to escape, and an escape is illegal if violence is used against prison personnel or property, or if prison inmates or officials aid the escape.\nQuestion: legal to break out of prison in germany\nAnswer: yes\n\nFargo (season 3) -- The third season of Fargo, an American anthology black comedy--crime drama television series created by Noah Hawley, premiered on April 19, 2017, on the basic cable network FX. The season had ten episodes, and its initial airing concluded on June 21, 2017. As an anthology, each Fargo season possesses its own self-contained narrative, following a disparate set of characters in various settings, albeit in a connected shared universe.\nQuestion: is there going to be a season 3 of fargo\nAnswer: yes\n\nPeru at the FIFA World Cup -- Peru had its best result in Mexico 1970, finishing in seventh place. Peru again finished in the top eight at the World Cup in Argentina 1978. 
They finished first in their group during the first round of the tournament, but were eliminated after losing all their games in the second round. Peru reached the next World Cup finals in Spain 1982, although the team was eliminated in the first round after 2 draws and 1 loss. From 1986 to 2014, Peru did not advance past the CONMEBOL qualifying round.\nQuestion: has peru ever won the fifa world cup\nAnswer: no\n\nJunior varsity team -- Members of a junior varsity team are underclassmen determined by the coaching staff to have less experience or ability than those on the varsity roster. As such, junior varsity teams are used to prepare these athletes to compete at the varsity level. In other schools, the line between JV and varsity is arbitrary, with all players at a certain grade level (usually seniors and, in smaller schools, juniors) at varsity and all others below that grade level at JV, with only a few exceptions for highly talented (or well-connected) student athletes, or much smaller schools where - due to their low enrollment - are limited in the number of upperclassmen athletes.\nQuestion: can a senior in high school play junior varsity\nAnswer: yes\n\nBloodline (TV series) -- Bloodline was announced in October 2014 as part of a partnership between Netflix and Sony Pictures Television, representing Netflix's first major deal with a major film studio for a television series. The series was created and executive produced by Todd A. Kessler, Glenn Kessler, and Daniel Zelman, who previously created the FX series Damages. According to its official synopsis released by Netflix, Bloodline ``centers on a close-knit family of four adult siblings whose secrets and scars are revealed when their black sheep brother returns home.''\nQuestion: is the show bloodline based on a true story\nAnswer:", {"until": ["\n\n", "\n"], "do_sample": false, "temperature": 0.0}], "resps": [[" yes"]], "filtered_resps": [" yes"], "exact_match": 0.0}
{"doc_id": 4, "doc": {"question": "is there a difference between hydroxyzine hcl and hydroxyzine pam", "passage": "Hydroxyzine -- Hydroxyzine preparations require a doctor's prescription. The drug is available in two formulations, the pamoate and the dihydrochloride or hydrochloride salts. Vistaril, Equipose, Masmoran, and Paxistil are preparations of the pamoate salt, while Atarax, Alamon, Aterax, Durrax, Tran-Q, Orgatrax, Quiess, and Tranquizine are of the hydrochloride salt.", "idx": 4, "label": 1}, "target": "yes", "arguments": ["Lenses for SLR and DSLR cameras -- DSLRs became affordable around the mid-1990s, and have become extremely popular in recent years. Some manufacturers, for example Minolta, Canon and Nikon, chose to make their DSLRs 100% compatible with their existing SLR lenses in the beginning, allowing owners of new DSLRs to continue to use their existing lenses and get a longer lifespan from their investment. Others, for example Olympus, chose to create a completely new lens mount and series of lenses for their DSLRs. The Pentax SLR camera K-mount system is backward compatible to all previous lens generations from Pentax, including the latest digital SLRs like the K-3 and K-50. A Pentax K-mount lens from the early 70s can be used on the newest Pentax DSLR although it may not provide features that are included in newer lenses (e.g. autofocus). There are a few exceptions from the MZ and ZX series of Pentax film cameras that do not work with some of the older lenses.\nQuestion: do all dslr lenses fit all dslr cameras\nAnswer: no\n\nRemote Play -- Remote Play is a feature of Sony video game consoles that allows the PlayStation 3 and PlayStation 4 to transmit its video and audio output to a PlayStation Portable or PlayStation Vita. Similar functionality is provided on Nintendo's Wii U console, using the Off-TV Play function. This feature essentially allows compatible home console games to be played on the handheld. 
In 2014, it was expanded to include the use of PlayStation TV, Xperia smartphones and tablets (Z2 and later), and PlayStation Now. In 2016, it was expanded to Microsoft Windows PCs and macOS.\nQuestion: can you remote play ps3 games on vita\nAnswer: yes\n\nBlue Bell Creameries -- According to figures gathered by Statista, a market data and statistics portal, while combined private labels sold more, in 2014 Blue Bell was the best-selling ice cream brand in the United States. The sales area is primarily concentrated in the Southern United States, and has been sold as far west as Las Vegas, as far north as Indianapolis and Denver, and as far east as Richmond, Virginia. Overall, this area comprises only 20% of the United States. By comparison, each of Blue Bell's top four competitors sells its products in 100% of the United States. To become one of the three biggest ice cream manufacturers, Blue Bell has consistently been the top seller in the majority of the markets the company has entered. For example, in its home state of Texas, the company has a 52% market share. Within five months of its entry into Baton Rouge, Louisiana, the company had garnered 35% of the ice cream market. People living outside the sales area can have the ice cream shipped to them (although this has temporarily been halted while the company is ramping up production after the recalls), and former President George W. Bush (a former Governor of Texas) often had the ice cream shipped to Camp David during his administration. In 2006 and 2012, astronauts aboard the International Space Station were also treated to Blue Bell ice cream ``to help out (the crew's) happiness quotient.''\nQuestion: is blue bell ice cream only sold in texas\nAnswer: no\n\nList of backward compatible games for Xbox One -- The Xbox One gaming console has received updates from Microsoft since its launch in 2013 that enable it to play select games from its two predecessor consoles, Xbox and Xbox 360. 
On June 15, 2015, backward compatibility with supported Xbox 360 games became available to eligible Xbox Preview program users with a beta update to the Xbox One system software. The dashboard update containing backward compatibility was released publicly on November 12, 2015. On October 24, 2017, another such update added games from the original Xbox library. The following is a list of all backward compatible games on Xbox One under this functionality.\nQuestion: do xbox 360 games work on the xbox one s\nAnswer: yes\n\nHydroxyzine -- Hydroxyzine preparations require a doctor's prescription. The drug is available in two formulations, the pamoate and the dihydrochloride or hydrochloride salts. Vistaril, Equipose, Masmoran, and Paxistil are preparations of the pamoate salt, while Atarax, Alamon, Aterax, Durrax, Tran-Q, Orgatrax, Quiess, and Tranquizine are of the hydrochloride salt.\nQuestion: is there a difference between hydroxyzine hcl and hydroxyzine pam\nAnswer:", {"until": ["\n\n", "\n"], "do_sample": false, "temperature": 0.0}], "resps": [[" yes"]], "filtered_resps": [" yes"], "exact_match": 0.0}
{"doc_id": 12, "doc": {"question": "is it bad to wash your hair with shower gel", "passage": "Shower gel -- Shower gels for men may contain the ingredient menthol, which gives a cooling and stimulating sensation on the skin, and some men's shower gels are also designed specifically for use on hair and body. Shower gels contain milder surfactant bases than shampoos, and some also contain gentle conditioning agents in the formula. This means that shower gels can also double as an effective and perfectly acceptable substitute to shampoo, even if they are not labelled as a hair and body wash. Washing hair with shower gel should give approximately the same result as using a moisturising shampoo.", "idx": 12, "label": 1}, "target": "yes", "arguments": ["Mustang -- The BLM has established Herd Management Areas to determine where and how many animals will be sustained as free-roaming populations. Some populations of free-roaming horses and burros remain protected under the Act, but others have disappeared from places where there were once established populations. A few hundred free-roaming horses survive in Alberta and British Columbia. The BLM considers roughly 26,000 individuals a manageable number, but the feral mustang population in February 2010 was 33,700 horses and 4,700 burros. More than half of all mustangs in North America are found in Nevada (which features the horses on its State Quarter), with other significant populations in California, Oregon, Utah, Montana, and Wyoming. Another 34,000 horses are in holding facilities.\nQuestion: are there any wild horses left in the united states\nAnswer: yes\n\nUnited States at the Olympics -- U.S. athletes have won a total of 2,522 medals (1,022 of them gold) at the Summer Olympic Games, the most of any nation, and another 305 at the Winter Olympic Games, the second highest result. 
The United States has topped the gold medal count (as the medals are listed on the IOC website, and internationally by tradition) at seventeen Summer Olympics, the most of any nation, and one Winter Olympics. The United States holds the record both for the most medals of any nation won in a single Summer Olympics and the most gold medals of any nation won in a single Summer Olympics.\nQuestion: has usa ever won winter olympic medal count\nAnswer: yes\n\nOne-child policy -- On October 29, 2015, it was reported that the existing law would be changed to a two-child policy, citing a statement from the Communist Party of China . The new law became effective on January 1, 2016, following its passage in the standing committee of the National People's Congress on December 27, 2015.\nQuestion: can you have more than 1 kid in china\nAnswer: yes\n\nMini DisplayPort -- Apple replaced the DVI port from the MacBook, MacBook Air, MacBook Pro, iMac, Mac Mini, and the Mac Pro with the Mini DisplayPort. Its use as the video connector for the 24-inch Cinema Display may complicate compatibility:\nQuestion: does macbook air have a mini display port\nAnswer: yes\n\nShower gel -- Shower gels for men may contain the ingredient menthol, which gives a cooling and stimulating sensation on the skin, and some men's shower gels are also designed specifically for use on hair and body. Shower gels contain milder surfactant bases than shampoos, and some also contain gentle conditioning agents in the formula. This means that shower gels can also double as an effective and perfectly acceptable substitute to shampoo, even if they are not labelled as a hair and body wash. Washing hair with shower gel should give approximately the same result as using a moisturising shampoo.\nQuestion: is it bad to wash your hair with shower gel\nAnswer:", {"until": ["\n\n", "\n"], "do_sample": false, "temperature": 0.0}], "resps": [[" yes"]], "filtered_resps": [" yes"], "exact_match": 0.0}
{"doc_id": 5, "doc": {"question": "is barq's root beer a pepsi product", "passage": "Barq's -- Barq's /ˈbɑːrks/ is an American soft drink. Its brand of root beer is notable for having caffeine. Barq's, created by Edward Barq and bottled since the turn of the 20th century, is owned by the Barq family but bottled by the Coca-Cola Company. It was known as Barq's Famous Olde Tyme Root Beer until 2012.", "idx": 5, "label": 0}, "target": "no", "arguments": ["I Am Number Four (film) -- In a 2015 interview, James Frey, the co-author of the series, said that he hoped more movies would be made.\nQuestion: is there a sequel to i am number four film\nAnswer: no\n\nComplex conjugate root theorem -- In mathematics, the complex conjugate root theorem states that if P is a polynomial in one variable with real coefficients, and a + bi is a root of P with a and b real numbers, then its complex conjugate a − bi is also a root of P.\nQuestion: if a + bi is a complex zero of a polynomial with real coefficients then so is its a − bi\nAnswer: yes\n\nPhone hacking -- Phone hacking, being a form of surveillance, is illegal in many countries unless it is carried out as lawful interception by a government agency. In the News International phone hacking scandal, private investigator Glenn Mulcaire was found to have violated the Regulation of Investigatory Powers Act 2000. He was sentenced to six months in prison in January 2007. Renewed controversy over the phone hacking claims led to the closure of the News of the World in July 2011.\nQuestion: is it illegal to hack into someones phone\nAnswer: yes\n\nGolden State Warriors -- The Golden State Warriors are an American professional basketball team based in the San Francisco Bay Area in Oakland, California. The Warriors compete in the National Basketball Association (NBA) as a member of the league's Western Conference Pacific Division. The Warriors play their home games at the Oracle Arena in Oakland. 
The Warriors have reached ten NBA Finals, winning six NBA championships in 1947, 1956, 1975, 2015, 2017, and 2018. Golden State's six NBA championships are tied for third-most in NBA history with the Chicago Bulls, and behind only the Boston Celtics (17) and Los Angeles Lakers (16).\nQuestion: has golden state ever won an nba championship\nAnswer: yes\n\nBarq's -- Barq's /ˈbɑːrks/ is an American soft drink. Its brand of root beer is notable for having caffeine. Barq's, created by Edward Barq and bottled since the turn of the 20th century, is owned by the Barq family but bottled by the Coca-Cola Company. It was known as Barq's Famous Olde Tyme Root Beer until 2012.\nQuestion: is barq's root beer a pepsi product\nAnswer:", {"until": ["\n\n", "\n"], "do_sample": false, "temperature": 0.0}], "resps": [[" yes"]], "filtered_resps": [" yes"], "exact_match": 0.0}
{"doc_id": 13, "doc": {"question": "is the liver part of the excretory system", "passage": "Excretory system -- The liver detoxifies and breaks down chemicals, poisons and other toxins that enter the body. For example, the liver transforms ammonia (which is poisonous) into urea in fish, amphibians and mammals, and into uric acid in birds and reptiles. Urea is filtered by the kidney into urine or through the gills in fish and tadpoles. Uric acid is paste-like and expelled as a semi-solid waste (the ``white'' in bird excrements). The liver also produces bile, and the body uses bile to break down fats into usable fats and unusable waste.", "idx": 13, "label": 1}, "target": "yes", "arguments": ["Remember Me (2010 film) -- Remember Me is a 2010 American romantic coming-of-age drama film directed by Allen Coulter, and screenplay by Will Fetters. It stars Robert Pattinson, Emilie de Ravin, Chris Cooper, Lena Olin and Pierce Brosnan.\nQuestion: is the movie remember me based on a book\nAnswer: no\n\nAlice Through the Looking Glass (2016 film) -- Alice Through the Looking Glass is a 2016 American fantasy adventure film directed by James Bobin, written by Linda Woolverton and produced by Tim Burton, Joe Roth, Suzanne Todd, and Jennifer Todd. It is based on the characters created by Lewis Carroll and is the sequel to the 2010 film Alice in Wonderland. The film stars Johnny Depp, Anne Hathaway, Mia Wasikowska, Matt Lucas, Rhys Ifans, Helena Bonham Carter, and Sacha Baron Cohen and features the voices of Stephen Fry, Michael Sheen, Timothy Spall, and Alan Rickman.\nQuestion: is there a prequel to alice through the looking glass\nAnswer: yes\n\nList of FIFA World Cup penalty shoot-outs -- This is a list of all penalty shoot-outs that have occurred in the Finals tournament of the FIFA World Cup. Penalty shoot-outs were introduced as tie-breakers in the 1978 World Cup but did not occur before 1982. The first time a World Cup title was won by penalty shoot-out was in 1994. 
The only other time was in 2006. By the end of the 2018 edition, 30 shoot-outs have taken place in the World Cup. Of these, only two reached the sudden death stage after still being tied at the end of ``best of five kicks''.\nQuestion: is there a penalty shoot out in the world cup final\nAnswer: yes\n\nVicks VapoRub -- Vicks VapoRub ointment is a mentholated topical ointment. VapoRub is indicated for use on the chest, back and throat for cough suppression due to the common cold or on muscles and joints for minor aches and pains. Vicks VapoRub has also been used to treat mosquito bites. Users of VapoRub often apply it immediately before sleep. VapoRub was originally manufactured by the family-owned company Richardson-Vicks, Inc., based in Greensboro, North Carolina.\nQuestion: does vicks vapor rub have alcohol in it\nAnswer: no\n\nExcretory system -- The liver detoxifies and breaks down chemicals, poisons and other toxins that enter the body. For example, the liver transforms ammonia (which is poisonous) into urea in fish, amphibians and mammals, and into uric acid in birds and reptiles. Urea is filtered by the kidney into urine or through the gills in fish and tadpoles. Uric acid is paste-like and expelled as a semi-solid waste (the ``white'' in bird excrements). The liver also produces bile, and the body uses bile to break down fats into usable fats and unusable waste.\nQuestion: is the liver part of the excretory system\nAnswer:", {"until": ["\n\n", "\n"], "do_sample": false, "temperature": 0.0}], "resps": [[" yes"]], "filtered_resps": [" yes"], "exact_match": 0.0}
{"doc_id": 6, "doc": {"question": "can an odd number be divided by an even number", "passage": "Parity (mathematics) -- In mathematics, parity is the property of an integer's inclusion in one of two categories: even or odd. An integer is even if it is evenly divisible by two and odd if it is not even. For example, 6 is even because there is no remainder when dividing it by 2. By contrast, 3, 5, 7, 21 leave a remainder of 1 when divided by 2. Examples of even numbers include −4, 0, 82 and 178. In particular, zero is an even number. Some examples of odd numbers are −5, 3, 29, and 73.", "idx": 6, "label": 1}, "target": "yes", "arguments": ["Fantastic Beasts and Where to Find Them -- Fantastic Beasts and Where to Find Them is a 2001 book written by British author J.K. Rowling (under the pen name of the fictitious author Newt Scamander) about the magical creatures in the Harry Potter universe. The original version, illustrated by the author herself, purports to be Harry Potter's copy of the textbook of the same name mentioned in Harry Potter and the Philosopher's Stone (or Harry Potter and the Sorcerer's Stone in the US), the first novel of the Harry Potter series. It includes several notes inside it supposedly handwritten by Harry, Ron Weasley, and Hermione Granger, detailing their own experiences with some of the beasts described, and including in-jokes relating to the original series.\nQuestion: is fantastic beasts and where to find them related to harry potter\nAnswer: yes\n\nExpungement in Texas -- Texas expungement law allows expungement of arrests which did not lead to a finding of guilt, and class C misdemeanors if the defendant received deferred adjudication, and completed a community supervision. 
If the defendant was found guilty, pleaded guilty, or pleaded no contest to any offense other than a class ``C'' misdemeanor, it is not eligible for expungement; however, it may be eligible for non-disclosure if deferred adjudication was granted.\nQuestion: can i get a felony expunged in texas\nAnswer: no\n\nNicole Walker -- After her departure, it is mentioned that Nicole had plans to marry ``some soap actor.'' This was presumably a nod to Zucker's marriage to her former Days of Our Lives costar, Kyle Lowder, who played Brady.\nQuestion: brady and nicole days of our lives married\nAnswer: yes\n\nThe Grass Roots -- In December 2015, the American Pop Music Hall of Fame released their 2016 inductees as follows: Barbra Streisand, The Grass Roots, Barry Manilow, Neil Sedaka, The Association, Dion, The Lettermen, Paul Revere & the Raiders, The Temptations and Three Dog Night.\nQuestion: are the grass roots in the rock and roll hall of fame\nAnswer: no\n\nParity (mathematics) -- In mathematics, parity is the property of an integer's inclusion in one of two categories: even or odd. An integer is even if it is evenly divisible by two and odd if it is not even. For example, 6 is even because there is no remainder when dividing it by 2. By contrast, 3, 5, 7, 21 leave a remainder of 1 when divided by 2. Examples of even numbers include −4, 0, 82 and 178. In particular, zero is an even number. Some examples of odd numbers are −5, 3, 29, and 73.\nQuestion: can an odd number be divided by an even number\nAnswer:", {"until": ["\n\n", "\n"], "do_sample": false, "temperature": 0.0}], "resps": [[" no"]], "filtered_resps": [" no"], "exact_match": 0.0}
{"doc_id": 14, "doc": {"question": "is fantastic beasts and where to find them a prequel", "passage": "Fantastic Beasts and Where to Find Them (film) -- Fantastic Beasts and Where to Find Them is a 2016 fantasy film directed by David Yates. A joint British and American production, it is a spin-off and prequel to the Harry Potter film series, and it was produced and written by J.K. Rowling in her screenwriting debut, and inspired by her 2001 book of the same name. The film stars Eddie Redmayne as Newt Scamander, with Katherine Waterston, Dan Fogler, Alison Sudol, Ezra Miller, Samantha Morton, Jon Voight, Carmen Ejogo, Ron Perlman, Colin Farrell and Johnny Depp in supporting roles. It is the first installment in the Fantastic Beasts film series, and ninth overall in the Wizarding World franchise, that began with the Harry Potter films.", "idx": 14, "label": 1}, "target": "yes", "arguments": ["Sati (practice) -- Sati or suttee is an obsolete funeral custom where a widow immolates herself on her husband's pyre or commits suicide in another fashion shortly after her husband's death.\nQuestion: in india woman once burned at husband's pyre\nAnswer: yes\n\nMagnetic moment -- Electrons and many elementary particles also have intrinsic magnetic moments, an explanation of which requires a quantum mechanical treatment and relates to the intrinsic angular momentum of the particles as discussed in the article Electron magnetic moment. 
It is these intrinsic magnetic moments that give rise to the macroscopic effects of magnetism, and other phenomena, such as electron paramagnetic resonance.\nQuestion: do all electrons have a net magnetic moment\nAnswer: yes\n\nOuray, Colorado -- The Netflix original series The Ranch, starring Ashton Kutcher, Danny Masterson, Sam Elliott and Debra Winger is set in the fictional town of Garrison, Colorado, but the opening shot of the town during the credit sequence is of Ouray, and the San Juan Valley just north of Ouray.\nQuestion: is there a town called garrison in colorado\nAnswer: no\n\nTennis scoring system -- A tennis match is composed of points, games, and sets. A set consists of a number of games (a minimum of six), which in turn each consist of points. A set is won by the first side to win 6 games, with a margin of at least 2 games over the other side (e.g. 6--3 or 7--5). There is usually a tie-break if the set is tied at six games per player. A match is won when a player or a doubles team wins the majority of prescribed sets. Matches employ either a best-of-three or best-of-five set format. The best-of-five set format is typically only played in the men's singles or doubles matches at Grand Slam and Davis Cup matches.\nQuestion: do you have to win a tennis set by 2\nAnswer: yes\n\nFantastic Beasts and Where to Find Them (film) -- Fantastic Beasts and Where to Find Them is a 2016 fantasy film directed by David Yates. A joint British and American production, it is a spin-off and prequel to the Harry Potter film series, and it was produced and written by J.K. Rowling in her screenwriting debut, and inspired by her 2001 book of the same name. The film stars Eddie Redmayne as Newt Scamander, with Katherine Waterston, Dan Fogler, Alison Sudol, Ezra Miller, Samantha Morton, Jon Voight, Carmen Ejogo, Ron Perlman, Colin Farrell and Johnny Depp in supporting roles. 
It is the first installment in the Fantastic Beasts film series, and ninth overall in the Wizarding World franchise, that began with the Harry Potter films.\nQuestion: is fantastic beasts and where to find them a prequel\nAnswer:", {"until": ["\n\n", "\n"], "do_sample": false, "temperature": 0.0}], "resps": [[" yes"]], "filtered_resps": [" yes"], "exact_match": 0.0}
{"doc_id": 7, "doc": {"question": "is there a word with q without u", "passage": "List of English words containing Q not followed by U -- Of the 71 words in this list, 67 are nouns, and most would generally be considered loanwords; the only modern-English words that contain Q not followed by U and are not borrowed from another language are qiana, qwerty, and tranq. However, all of the loanwords on this list are considered to be naturalised in English according to at least one major dictionary (see References), often because they refer to concepts or societal roles that do not have an accurate equivalent in English. For words to appear here, they must appear in their own entry in a dictionary; words which occur only as part of a longer phrase are not included.", "idx": 7, "label": 1}, "target": "yes", "arguments": ["Statue of Liberty -- The Statue of Liberty (Liberty Enlightening the World; French: La Liberté éclairant le monde) is a colossal neoclassical sculpture on Liberty Island in New York Harbor in New York City, in the United States. The copper statue, a gift from the people of France to the people of the United States, was designed by French sculptor Frédéric Auguste Bartholdi and built by Gustave Eiffel. The statue was dedicated on October 28, 1886.\nQuestion: is the statue of liberty in new york city\nAnswer: yes\n\nSleeping while on duty -- Sleeping while on duty or sleeping on the job refers to falling asleep while on the time clock or equivalent, or else while responsible for performing some active or passive job duty. While in some jobs, this is a minor transgression or not even worthy of sanctioning, in other workplaces, this is considered gross misconduct and may be grounds for disciplinary action, including possible termination of employment. 
Recently however, there has been a movement in support of sleeping, or napping at work, with scientific studies highlighting health and productivity benefits, and over 6% of employers in some countries providing facilities to do so. In some types of work, such as firefighting or live-in caregiving, sleeping at least part of the shift may be an expected part of paid work time. While some employees who sleep while on duty in violation do so intentionally and hope not to get caught, others intend in good faith to stay awake, and accidentally doze.\nQuestion: can i be sacked for falling asleep at work\nAnswer: yes\n\nUnderworld (film series) -- In 2017, Wiseman revealed that a sixth film is also in development with Beckinsale reprising her role as Selene.\nQuestion: are they coming out with another underworld movie\nAnswer: yes\n\nPostal holiday -- Part 608, section 3.2 of the DMM (U.S. Domestic Mail Manual) groups holidays into ``Widely Observed'' and ``Not Widely Observed''. Holidays ``Widely Observed'' include New Year's Day, Memorial Day, Independence Day, Labor Day, Thanksgiving Day, and Christmas Day. Holidays ``Not Widely Observed'' are Martin Luther King, Jr.'s Birthday; Presidents Day; Columbus Day; and Veterans Day.\nQuestion: does the united states post office deliver mail on columbus day\nAnswer: no\n\nList of English words containing Q not followed by U -- Of the 71 words in this list, 67 are nouns, and most would generally be considered loanwords; the only modern-English words that contain Q not followed by U and are not borrowed from another language are qiana, qwerty, and tranq. However, all of the loanwords on this list are considered to be naturalised in English according to at least one major dictionary (see References), often because they refer to concepts or societal roles that do not have an accurate equivalent in English. 
For words to appear here, they must appear in their own entry in a dictionary; words which occur only as part of a longer phrase are not included.\nQuestion: is there a word with q without u\nAnswer:", {"until": ["\n\n", "\n"], "do_sample": false, "temperature": 0.0}], "resps": [[" yes"]], "filtered_resps": [" yes"], "exact_match": 0.0}
{"doc_id": 15, "doc": {"question": "will there be a season 8 of vampire diaries", "passage": "The Vampire Diaries (season 8) -- The Vampire Diaries, an American supernatural drama, was renewed for an eighth season by The CW on March 11, 2016. On July 23, 2016, the CW announced that the upcoming season would be the series' last and would consist of 16 episodes. The season premiered on October 21, 2016 and concluded on March 10, 2017.", "idx": 15, "label": 1}, "target": "yes", "arguments": ["Avengers: Infinity War -- Additionally, several other actors reprise their MCU roles: Danai Gurira as Okoye, the head of the Dora Milaje; Letitia Wright as T'Challa's sister Shuri; William Hurt as Thaddeus Ross, the U.S. Secretary of State; Kerry Condon as the voice of Stark's A.I. F.R.I.D.A.Y.; Winston Duke as M'Baku, the leader of Wakanda's mountain tribe the Jabari; Florence Kasumba as Ayo, a member of the Dora Milaje; Jacob Batalon as Parker's friend Ned; Isabella Amara as Parker's classmate Sally; Tiffany Espensen as Parker's classmate Cindy; and Ethan Dizon as Parker's classmate Tiny. Samuel L. Jackson and Cobie Smulders make uncredited cameos as Nick Fury and Maria Hill, the former director and deputy director of S.H.I.E.L.D, respectively, in the film's post-credits scene.\nQuestion: is there an end scene in infonity war\nAnswer: yes\n\nList of backward compatible games for Xbox One -- There are currently 33 on this list out of 1047 released for the Xbox. All Original Xbox games run at 4 times the original resolution on Xbox One and Xbox One S consoles (up to 960p), and 16 times on Xbox One X (up to 1920p).\nQuestion: can xbox one s play xbox original games\nAnswer: yes\n\nTreatment of human head lice -- A standard home blow dryer will kill 96.7% of eggs with proper technique. 
To be effective, the blow dryer must be used repeatedly (every 1 to 7 days since eggs hatch in 7 to 10 days) until the natural life cycle of the lice is over (about 4 weeks).\nQuestion: can you kill lice by blow drying your hair\nAnswer: yes\n\nJeep Grand Cherokee -- When it was first introduced in April 1992 as an early 1993 model year vehicle, the Grand Cherokee only had one powertrain choice: the 4.0 L AMC-derived straight-six engine that made 190 horsepower. This became the ``volume'' engine for the Grand Cherokee. Transmission choices included a four-speed automatic transmission (early production ZJs used the AW4 -- the A500SE (later 42RE) replaced the AW4 during the latter half of the 1993 model year) or an Aisin AX15 manual transmission. Low demand for the manual transmission resulted in its discontinuation after 1994, but European-market ZJs retained it when coupled to the diesel engine (which was unavailable in North America). The drive train choices included rear-wheel drive or four-wheel-drive. In 1995, the engine dropped 5 horsepower to 185 due to new EPA regulations imposed on the 1996 model year.\nQuestion: is a jeep grand cherokee front wheel drive\nAnswer: no\n\nThe Vampire Diaries (season 8) -- The Vampire Diaries, an American supernatural drama, was renewed for an eighth season by The CW on March 11, 2016. On July 23, 2016, the CW announced that the upcoming season would be the series' last and would consist of 16 episodes. The season premiered on October 21, 2016 and concluded on March 10, 2017.\nQuestion: will there be a season 8 of vampire diaries\nAnswer:", {"until": ["\n\n", "\n"], "do_sample": false, "temperature": 0.0}], "resps": [[" yes"]], "filtered_resps": [" yes"], "exact_match": 0.0}
{"doc_id": 0, "doc": {"Problem": "a shopkeeper sold an article offering a discount of 5 % and earned a profit of 31.1 % . what would have been the percentage of profit earned if no discount had been offered ?", "Rationale": "\"giving no discount to customer implies selling the product on printed price . suppose the cost price of the article is 100 . then printed price = 100 ã — ( 100 + 31.1 ) / ( 100 â ˆ ’ 5 ) = 138 hence , required % profit = 138 â € “ 100 = 38 % answer a\"", "options": "a ) 38 , b ) 27.675 , c ) 30 , d ) data inadequate , e ) none of these", "correct": "a", "annotated_formula": "subtract(divide(multiply(add(const_100, 31.1), const_100), subtract(const_100, 5)), const_100)", "linear_formula": "add(n1,const_100)|subtract(const_100,n0)|multiply(#0,const_100)|divide(#2,#1)|subtract(#3,const_100)|", "category": "gain"}, "target": "38", "arguments": ["Question: a circular well with a diameter of 4 metres , is dug to a depth of 14 metres . what is the volume of the earth dug out ?\nAnswer: 176 m 3\n\nQuestion: what is the hcf of 2 / 3 , 4 / 9 and 6 / 18\nAnswer: 1 / 9\n\nQuestion: a car traveled 75 % of the way from town a to town b at an average speed of 50 miles per hour . the car travels at an average speed of s miles per hour for the remaining part of the trip . the average speed for the entire trip was 60 miles per hour . what is s ?\nAnswer: 150\n\nQuestion: on dividing 265 by a number , the quotient is 12 and the remainder is 1 . find the divisor .\nAnswer: e ) 22\n\nQuestion: a shopkeeper sold an article offering a discount of 5 % and earned a profit of 31.1 % . 
what would have been the percentage of profit earned if no discount had been offered ?\nAnswer:", " 38"], "resps": [[[-5.551372528076172, false]], [[-15.428857803344727, false]], [[-4.153011322021484, false]], [[-25.281858444213867, false]], [[-16.140947341918945, false]]], "filtered_resps": [[-5.551372528076172, false], [-15.428857803344727, false], [-4.153011322021484, false], [-25.281858444213867, false], [-16.140947341918945, false]], "acc": 0.0, "acc_norm": 0.0}
{"doc_id": 1, "doc": {"Problem": "what will be the difference between simple and compound interest at 14 % per annum on a sum of rs . 1000 after 4 years ?", "Rationale": "\"s . i . = ( 1000 * 14 * 4 ) / 100 = rs . 560 c . i . = [ 1000 * ( 1 + 14 / 100 ) 4 - 1000 ] = rs . 689 difference = ( 689 - 560 ) = rs . 129 answer : a\"", "options": "a ) 129 , b ) 130 , c ) 124 , d ) 133 , e ) 145", "correct": "a", "annotated_formula": "subtract(subtract(multiply(1000, power(add(divide(14, const_100), const_1), 4)), 1000), multiply(multiply(1000, divide(14, const_100)), 4))", "linear_formula": "divide(n0,const_100)|add(#0,const_1)|multiply(n1,#0)|multiply(n2,#2)|power(#1,n2)|multiply(n1,#4)|subtract(#5,n1)|subtract(#6,#3)|", "category": "gain"}, "target": "129", "arguments": ["Question: gwen drove an average speed of 15 miles per hour for the first 40 miles of a tripthen at a average speed of 30 miles / hr for the remaining 40 miles of the trip if she made no stops during the trip what was gwen ' s avg speed in miles / hr for the entire trip\nAnswer: 20\n\nQuestion: following an increase in prices , the price of a candy box was 10 pounds and the price of a can of soda was 15 pounds . if the price of a candy box was raised by 25 % , and the price of a can of soda was raised by 50 % . what was the price of a box of candy plus a can of soda before prices were raised ?\nAnswer: 18 .\n\nQuestion: sachin is younger than rahul by 18 years . if the ratio of their ages is 7 : 9 , find the age of sachin\nAnswer: 63\n\nQuestion: a goods train runs at the speed of 72 kmph and crosses a 230 m long platform in 26 seconds . what is the length of the goods train ?\nAnswer: 290 m\n\nQuestion: what will be the difference between simple and compound interest at 14 % per annum on a sum of rs . 
1000 after 4 years ?\nAnswer:", " 129"], "resps": [[[-8.612970352172852, false]], [[-6.711771488189697, false]], [[-8.007883071899414, false]], [[-8.116739273071289, false]], [[-7.87969446182251, false]]], "filtered_resps": [[-8.612970352172852, false], [-6.711771488189697, false], [-8.007883071899414, false], [-8.116739273071289, false], [-7.87969446182251, false]], "acc": 0.0, "acc_norm": 0.0}
{"doc_id": 2, "doc": {"Problem": "there are 28 stations between hyderabad and bangalore . how many second class tickets have to be printed , so that a passenger can travel from any station to any other station ?", "Rationale": "\"the total number of stations = 30 from 30 stations we have to choose any two stations and the direction of travel ( i . e . , hyderabad to bangalore is different from bangalore to hyderabad ) in 3 ⁰ p ₂ ways . 30 p ₂ = 30 * 29 = 870 . answer : c\"", "options": "a ) 156 , b ) 167 , c ) 870 , d ) 352 , e ) 380", "correct": "c", "annotated_formula": "multiply(add(28, const_1), add(add(28, const_1), const_1))", "linear_formula": "add(n0,const_1)|add(#0,const_1)|multiply(#0,#1)|", "category": "physics"}, "target": "870", "arguments": ["Question: a sum is divided among b , c and d in such a way that for each rupee b gets , c gets 150 paisa and d gets 50 paisa . if the share of c is rs . 40 , what is the total amount ?\nAnswer: 80\n\nQuestion: mike earns $ 14 per hour and phil earns $ 10.5 per hour . approximately how much less , as a percentage , does phil earn than mike per hour ?\nAnswer: 25 %\n\nQuestion: a batsman makes a score of 82 runs in the 17 th inning and thus increases his averages by 3 . what is his average after 17 th inning ?\nAnswer: 34\n\nQuestion: a , b , k start from the same place and travel in the same direction at speeds of 30 km / hr , 40 km / hr , 60 km / hr respectively . b starts three hours after a . if b and k overtake a at the same instant , how many hours after a did k start ?\nAnswer: 6\n\nQuestion: there are 28 stations between hyderabad and bangalore . 
how many second class tickets have to be printed , so that a passenger can travel from any station to any other station ?\nAnswer:", " 156"], "resps": [[[-8.808416366577148, false]], [[-8.590215682983398, false]], [[-10.443174362182617, false]], [[-9.836599349975586, false]], [[-9.175664901733398, false]]], "filtered_resps": [[-8.808416366577148, false], [-8.590215682983398, false], [-10.443174362182617, false], [-9.836599349975586, false], [-9.175664901733398, false]], "acc": 0.0, "acc_norm": 0.0}
{"doc_id": 3, "doc": {"Problem": "the present population of a town is 3888 . population increase rate is 20 % p . a . find the population of town before 2 years ?", "Rationale": "\"p = 3888 r = 20 % required population of town = p / ( 1 + r / 100 ) ^ t = 3888 / ( 1 + 20 / 100 ) ^ 2 = 3888 / ( 6 / 5 ) ^ 2 = 2700 ( approximately ) answer is e\"", "options": "a ) 2500 , b ) 2100 , c ) 3500 , d ) 3600 , e ) 2700", "correct": "e", "annotated_formula": "add(3888, divide(multiply(3888, 20), const_100))", "linear_formula": "multiply(n0,n1)|divide(#0,const_100)|add(n0,#1)|", "category": "gain"}, "target": "2700", "arguments": ["Question: the weight of every type a widget is the same , the weight of every type b widget is the same , and the weight of every type c widget is the same . if the weight of 7 type a widgets is equal to the weight of 2 type b widgets , and the weight of 4 type b widgets is equal to the weight of 7 type c widgets . what is the ratio of the total weight of 1 type a widget and 1 type b widget , to the total weight of 1 type b widget and 1 type c widget ?\nAnswer: 9 : 11\n\nQuestion: a 9 - meter long wire is cut into two pieces . if the longer piece is then used to form a perimeter of a square , what is the probability that the area of the square will be more than 4 if the original wire was cut at an arbitrary point ?\nAnswer: 2 / 9\n\nQuestion: the number of timeshare condos available at sunset beach is 2 / 5 the number of timeshare condos available at playa del mar . if the total number of timeshare condos available at the two beaches combined is 210 , what is the difference between the number of condos available at sunset beach and the number of condos available at playa del mar ?\nAnswer: 90\n\nQuestion: a and b can do a piece of work in 60 days and 60 days respectively . they work together for 10 days and b leaves . in how many days the whole work is completed ?\nAnswer: 30 days\n\nQuestion: the present population of a town is 3888 . 
population increase rate is 20 % p . a . find the population of town before 2 years ?\nAnswer:", " 2500"], "resps": [[[-7.6685380935668945, false]], [[-8.16148853302002, false]], [[-7.777310371398926, false]], [[-8.022066116333008, false]], [[-8.186005592346191, false]]], "filtered_resps": [[-7.6685380935668945, false], [-8.16148853302002, false], [-7.777310371398926, false], [-8.022066116333008, false], [-8.186005592346191, false]], "acc": 0.0, "acc_norm": 0.0}
{"doc_id": 4, "doc": {"Problem": "the triplicate ratio of 1 : 9 is ?", "Rationale": "\"13 : 93 = 1 : 729 answer : e\"", "options": "a ) 1 : 0 , b ) 1 : 8 , c ) 1 : 7 , d ) 1 : 2 , e ) 1 : 729", "correct": "e", "annotated_formula": "divide(power(const_2.0, 9), power(const_3.0, 9))", "linear_formula": "power(const_2.0,n1)|power(const_3.0,n1)|divide(#0,#1)|", "category": "other"}, "target": "1 : 729", "arguments": ["Question: if 5 ^ 21 x 4 ^ 11 = 2 x 10 ^ n . what is the value of n ?\nAnswer: 21\n\nQuestion: excluding stoppages , the speed of a train is 54 kmph and including stoppages it is 40 kmph . of how many minutes does the train stop per hour ?\nAnswer: c ) 15.55\n\nQuestion: the sum of the numbers is 660 . if the first number be twice the second and third number be one - third of the first , then the second number is :\nAnswer: 180\n\nQuestion: the banker ' s gain of a certain sum due 2 years hence at 10 % per annum is rs . 24 . the present worth is\nAnswer: rs . 600\n\nQuestion: the triplicate ratio of 1 : 9 is ?\nAnswer:", " 1 : 0"], "resps": [[[-8.735991477966309, false]], [[-7.541701316833496, false]], [[-8.25577449798584, false]], [[-8.393889427185059, false]], [[-17.276824951171875, false]]], "filtered_resps": [[-8.735991477966309, false], [-7.541701316833496, false], [-8.25577449798584, false], [-8.393889427185059, false], [-17.276824951171875, false]], "acc": 0.0, "acc_norm": 0.0}
{"doc_id": 5, "doc": {"Problem": "the sum of all the integers s such that - 26 < s < 24 is", "Rationale": "\"easy one - - 25 , - 24 , - 23 , - 22 , . . . . . . - 1,0 , 1 , 2 . . . . , 22 , 23 cancel everyhitng and we ' re left with - - 25 and - 24 s = - 49 . d is the answer .\"", "options": "a ) 0 , b ) - 2 , c ) - 25 , d ) - 49 , e ) - 51", "correct": "d", "annotated_formula": "add(add(negate(26), const_1), add(add(negate(26), const_1), const_1))", "linear_formula": "negate(n0)|add(#0,const_1)|add(#1,const_1)|add(#1,#2)|", "category": "general"}, "target": "- 49", "arguments": ["Question: if p ^ 2 – 13 p + 40 = h , and p is a positive integer between 1 and 10 , inclusive , what is the probability that h < 0 ?\nAnswer: 1 / 5\n\nQuestion: after getting 2 successive discounts , a shirt with a list price of rs 150 is available at rs 105 . if the second discount is 12.55 , find the first discount .\nAnswer: 20 %\n\nQuestion: joe has a total of $ 200 in his two pockets . he takes one fourth of the money in his left pocket and puts it in his right pocket . he then takes $ 20 from his left pocket and puts it in his right pocket . if he now has an equal amount of money in each pocket , then how much money did he originally have in his left pocket ?\nAnswer: $ 160\n\nQuestion: there was a cycle race going on . 1 / 5 th of the those in front of a person and 5 / 6 th of those behind him gives the total number of participants . how many people took part in the race ?\nAnswer: 31\n\nQuestion: the sum of all the integers s such that - 26 < s < 24 is\nAnswer:", " 0"], "resps": [[[-3.6299359798431396, false]], [[-6.2971296310424805, false]], [[-6.412287712097168, false]], [[-9.66511058807373, false]], [[-9.517909049987793, false]]], "filtered_resps": [[-3.6299359798431396, false], [-6.2971296310424805, false], [-6.412287712097168, false], [-9.66511058807373, false], [-9.517909049987793, false]], "acc": 0.0, "acc_norm": 0.0}
{"doc_id": 6, "doc": {"Problem": "a full stationary oil tank that is a right circular cylinder has a radius of 100 feet and a height of 25 feet . oil is pumped from the stationary tank to an oil truck that has a tank that is a right circular cylinder until the truck ' s tank is completely filled . if the truck ' s tank has a radius of 6 feet and a height of 10 feet , how far ( in feet ) did the oil level drop in the stationary tank ?", "Rationale": "\"the volume of oil pumped to the tank = the volume of oil taken away from stationary cylinder . pi * 36 * 10 = pi * h * 100 * 100 ( h is distance that the oil level dropped ) h = 360 / 10,000 = 36 / 1000 = 0.036 ft the answer is a .\"", "options": "a ) 0.036 , b ) 0.36 , c ) 0.6 , d ) 6 , e ) 3.6", "correct": "a", "annotated_formula": "divide(volume_cylinder(6, 10), circle_area(100))", "linear_formula": "circle_area(n0)|volume_cylinder(n2,n3)|divide(#1,#0)|", "category": "geometry"}, "target": "0.036", "arguments": ["Question: what is 1 percent of 12,356 ?\nAnswer: 123.56\n\nQuestion: a certain store sold pens for $ 0.35 each and pencils for $ 0.25 each . if a customer purchased both pens and pencils from the store for a total of $ 1.60 , what total number of pens and pencils did the customer purchase ?\nAnswer: 6\n\nQuestion: in a certain egg - processing plant , every egg must be inspected , and is either accepted for processing or rejected . for every 96 eggs accepted for processing , 4 eggs are rejected . if , on a particular day , 12 additional eggs were accepted , but the overall number of eggs inspected remained the same , the ratio of those accepted to those rejected would be 99 to 1 . how many e eggs does the plant process per day ?\nAnswer: 400\n\nQuestion: a , b and c started a business with capitals of rs . 8000 , rs . 10000 and rs . 12000 respectively . at the end of the year , the profit share of b is rs . 1700 . 
the difference between the profit shares of a and c is ?\nAnswer: 680\n\nQuestion: a full stationary oil tank that is a right circular cylinder has a radius of 100 feet and a height of 25 feet . oil is pumped from the stationary tank to an oil truck that has a tank that is a right circular cylinder until the truck ' s tank is completely filled . if the truck ' s tank has a radius of 6 feet and a height of 10 feet , how far ( in feet ) did the oil level drop in the stationary tank ?\nAnswer:", " 0.036"], "resps": [[[-11.045950889587402, false]], [[-9.370139122009277, false]], [[-7.750082969665527, false]], [[-3.544638156890869, false]], [[-9.736152648925781, false]]], "filtered_resps": [[-11.045950889587402, false], [-9.370139122009277, false], [-7.750082969665527, false], [-3.544638156890869, false], [-9.736152648925781, false]], "acc": 0.0, "acc_norm": 1.0}
{"doc_id": 7, "doc": {"Problem": "each week a restaurant serving mexican food uses the same volume of chili paste , which comes in either 35 - ounce cans or 25 - ounce cans of chili paste . if the restaurant must order 20 more of the smaller cans than the larger cans to fulfill its weekly needs , then how manysmallercans are required to fulfill its weekly needs ?", "Rationale": "\"let x be the number of 35 ounce cans . therefore ( x + 20 ) is the number of 25 ounce cans . total volume is same , therefore 35 x = 25 ( x + 20 ) 10 x = 500 x = 50 therefore , number of 15 ounce cans = 50 + 20 = 70 ans - b\"", "options": "a ) 60 , b ) 70 , c ) 80 , d ) 100 , e ) 120", "correct": "b", "annotated_formula": "add(25, 20)", "linear_formula": "add(n1,n2)|", "category": "general"}, "target": "70", "arguments": ["Question: claire has a total of 92 pets consisting of gerbils and hamsters only . one - quarter of the gerbils are male , and one - third of the hamsters are male . if there are 25 males altogether , how many gerbils does claire have ?\nAnswer: 68\n\nQuestion: the average age of a class is 15.8 years . the average age of the boys in the class is 16.4 years and that of the girls is 15.4 years . what is the ratio of boys to girls in the class ?\nAnswer: 2 : 3\n\nQuestion: a candidate got 35 % of the votes polled and he lost to his rival by 2370 votes . how many votes were cast ?\nAnswer: 7900\n\nQuestion: working simultaneously and independently at an identical constant rate , 4 machines of a certain type can produce a total of x units of product p in 6 days . how many of these machines , working simultaneously and independently at this constant rate , can produce a total of 2 x units of product p in 3 days ?\nAnswer: 16\n\nQuestion: each week a restaurant serving mexican food uses the same volume of chili paste , which comes in either 35 - ounce cans or 25 - ounce cans of chili paste . 
if the restaurant must order 20 more of the smaller cans than the larger cans to fulfill its weekly needs , then how manysmallercans are required to fulfill its weekly needs ?\nAnswer:", " 60"], "resps": [[[-5.162363052368164, false]], [[-5.422571182250977, false]], [[-5.283609390258789, false]], [[-4.289575576782227, false]], [[-6.304437637329102, false]]], "filtered_resps": [[-5.162363052368164, false], [-5.422571182250977, false], [-5.283609390258789, false], [-4.289575576782227, false], [-6.304437637329102, false]], "acc": 0.0, "acc_norm": 0.0}
{"doc_id": 8, "doc": {"Problem": "if n is an integer and 101 n ^ 2 is less than or equal to 10000 , what is the greatest possible value of n ?", "Rationale": "\"101 * n ^ 2 < = 10000 n ^ 2 < = 10000 / 101 which will be less than 100 since 10000 / 100 = 100 which is the square of 9 next closest value of n where n ^ 2 < = 100 is 9 ans c\"", "options": "a ) 7 , b ) 8 , c ) 9 , d ) 10 , e ) 11", "correct": "c", "annotated_formula": "floor(sqrt(divide(10000, 101)))", "linear_formula": "divide(n2,n0)|sqrt(#0)|floor(#1)|", "category": "general"}, "target": "9", "arguments": ["Question: out of 410 students of a school , 325 play football , 175 play cricket and 50 neither play football nor cricket . how many students play both football and cricket ?\nAnswer: 140\n\nQuestion: 12 is what % of 80 ?\nAnswer: 15\n\nQuestion: a number when divided by a divisor leaves a remainder of 27 . when twice the original number is divided by the same divisor , the remainder is 11 . what is the value of the divisor ?\nAnswer: 40\n\nQuestion: if the population of a certain country increases at the rate of one person every 15 seconds , by how many persons does the population increase in 60 minutes ?\nAnswer: 240\n\nQuestion: if n is an integer and 101 n ^ 2 is less than or equal to 10000 , what is the greatest possible value of n ?\nAnswer:", " 7"], "resps": [[[-4.0766143798828125, false]], [[-3.978180170059204, false]], [[-3.9305574893951416, false]], [[-3.0353853702545166, false]], [[-4.7505340576171875, false]]], "filtered_resps": [[-4.0766143798828125, false], [-3.978180170059204, false], [-3.9305574893951416, false], [-3.0353853702545166, false], [-4.7505340576171875, false]], "acc": 0.0, "acc_norm": 0.0}
{"doc_id": 9, "doc": {"Problem": "a constructor estimates that 10 people can paint mr khans house in 4 days . if he uses 5 people instead of 10 , how long will they take to complete the job ?", "Rationale": "\"explanation : use formula for a work members ã — days = constant 10 ã — 4 = 5 ã — a a = 8 so answer is 8 days answer : d\"", "options": "a ) 10 , b ) 4 , c ) 5 , d ) 8 , e ) 6", "correct": "d", "annotated_formula": "divide(const_1, multiply(divide(const_1, multiply(const_4.0, 10)), 4))", "linear_formula": "multiply(n0,n1)|divide(const_1,#0)|multiply(n2,#1)|divide(const_1,#2)|", "category": "physics"}, "target": "8", "arguments": ["Question: the average ( arithmetic mean ) of 16 students first quiz scores in a difficult english class is 62.5 . when one student dropped the class , the average of the remaining scores increased to 62.0 . what is the quiz score of the student who dropped the class ?\nAnswer: e ) 70\n\nQuestion: 1,5 , 13,25 , __\nAnswer: 41\n\nQuestion: a man sold 20 articles for $ 60 and gained 20 % . how many articles should he sell for $ 60 to incur a loss 20 % ?\nAnswer: 30\n\nQuestion: what is the greatest value of x such that 4 ^ x is a factor of 21 ! ?\nAnswer: 6\n\nQuestion: a constructor estimates that 10 people can paint mr khans house in 4 days . if he uses 5 people instead of 10 , how long will they take to complete the job ?\nAnswer:", " 10"], "resps": [[[-2.838005542755127, false]], [[-2.9682393074035645, false]], [[-2.762840747833252, false]], [[-3.325218677520752, false]], [[-2.895805835723877, false]]], "filtered_resps": [[-2.838005542755127, false], [-2.9682393074035645, false], [-2.762840747833252, false], [-3.325218677520752, false], [-2.895805835723877, false]], "acc": 0.0, "acc_norm": 0.0}
{"doc_id": 10, "doc": {"Problem": "the population of a town is 8000 . it decreases annually at the rate of 20 % p . a . what will be its population after 3 years ?", "Rationale": "\"formula : ( after = 100 denominator ago = 100 numerator ) 8000 ã — 80 / 100 ã — 80 / 100 x 80 / 100 = 4096 answer : b\"", "options": "a ) 5100 , b ) 4096 , c ) 5200 , d ) 5400 , e ) 5500", "correct": "b", "annotated_formula": "subtract(subtract(8000, multiply(8000, divide(20, const_100))), multiply(subtract(8000, multiply(8000, divide(20, const_100))), divide(20, const_100)))", "linear_formula": "divide(n1,const_100)|multiply(n0,#0)|subtract(n0,#1)|multiply(#0,#2)|subtract(#2,#3)|", "category": "gain"}, "target": "4096", "arguments": ["Question: if greg buys 5 shirts , 4 trousers and 2 ties , the total cost is $ 80 . if greg buys 7 shirts , 4 trousers and 2 ties , the total cost is $ 70 . how much will it cost him to buy 2 trousers , 3 shirts and 1 ties ?\nAnswer: $ 37.5\n\nQuestion: at the end of year x , automobile installment credit accounted for 36 % of all outstanding consumer installment credit . at that time automobile finance companies extended $ 35 billion of credit , or 1 / 3 of the automobile installment credit . how many billion dollars of consumer installment credit was outstanding at that time ?\nAnswer: 291.67\n\nQuestion: a $ 500 investment and a $ 1,500 investment have a combined yearly return of 13 percent of the total of the two investments . if the $ 500 investment has a yearly return of 7 percent , what percent yearly return does the $ 1,500 investment have ?\nAnswer: 15 %\n\nQuestion: one - seventh of the light switches produced by a certain factory are defective . 4 - fifths of the defective switches are rejected and 1 / 15 of the non defective switches are rejected by mistake . if all the switches not rejected are sold , what percent of the switches sold by the factory are defective ?\nAnswer: 3.4 %\n\nQuestion: the population of a town is 8000 . 
it decreases annually at the rate of 20 % p . a . what will be its population after 3 years ?\nAnswer:", " 5100"], "resps": [[[-9.836970329284668, false]], [[-9.663614273071289, false]], [[-9.417353630065918, false]], [[-10.409930229187012, false]], [[-9.057307243347168, false]]], "filtered_resps": [[-9.836970329284668, false], [-9.663614273071289, false], [-9.417353630065918, false], [-10.409930229187012, false], [-9.057307243347168, false]], "acc": 0.0, "acc_norm": 0.0}
{"doc_id": 11, "doc": {"Problem": "the percentage profit earned by selling an article for rs . 1920 is equal to the percentage loss incurred by selling the same article for rs . 1280 . at what price should the article be sold to make 40 % profit ?", "Rationale": "\"let c . p . be rs . x . then , ( 1920 - x ) / x * 100 = ( x - 1280 ) / x * 100 1920 - x = x - 1280 2 x = 3200 = > x = 1600 required s . p . = 140 % of rs . 1600 = 140 / 100 * 1600 = rs . 2240 . answer : e\"", "options": "a ) 2000 , b ) 2778 , c ) 2299 , d ) 2778 , e ) 2240", "correct": "e", "annotated_formula": "multiply(divide(add(const_100, 40), const_100), divide(add(1920, 1280), const_2))", "linear_formula": "add(n2,const_100)|add(n0,n1)|divide(#0,const_100)|divide(#1,const_2)|multiply(#2,#3)|", "category": "gain"}, "target": "2240", "arguments": ["Question: a cube of edge 5 cm is immersed completely in a rectangular vessel containing water . if the dimensions of the base of vessel are 10 cm * 5 cm , find the rise in water level ?\nAnswer: 2.5 cm\n\nQuestion: find large number from below question the difference of two numbers is 1365 . on dividing the larger number by the smaller , we get 8 as quotient and the 15 as remainder\nAnswer: 1557.9\n\nQuestion: a group of boy scouts and girls scouts is going on a rafting trip . 45 % of the scouts arrived with signed permission slips . if 60 % of the scouts were boy scouts and 25 % of the boy scouts arrived with signed permission slips , then what percentage of the scouts were girl scouts who arrived with signed permission slips ?\nAnswer: 30\n\nQuestion: when a person aged 39 is added to a group of n people , the average age increases by 2 . when a person aged 15 is added instead , the average age decreases by 1 . what is the value of e ?\nAnswer: 7\n\nQuestion: the percentage profit earned by selling an article for rs . 1920 is equal to the percentage loss incurred by selling the same article for rs . 1280 . 
at what price should the article be sold to make 40 % profit ?\nAnswer:", " 2000"], "resps": [[[-5.748497009277344, false]], [[-12.249731063842773, false]], [[-15.916696548461914, false]], [[-12.249731063842773, false]], [[-10.228025436401367, false]]], "filtered_resps": [[-5.748497009277344, false], [-12.249731063842773, false], [-15.916696548461914, false], [-12.249731063842773, false], [-10.228025436401367, false]], "acc": 0.0, "acc_norm": 0.0}
{"doc_id": 12, "doc": {"Problem": "running at the same constant rate , 6 identical machines can produce a total of 360 bottles per minute . at this rate , how many bottles could 10 such machines produce in 4 minutes ?", "Rationale": "\"let the required number of bottles be x . more machines , more bottles ( direct proportion ) more minutes , more bottles ( direct proportion ) machines 6 : 10 : : 360 : x time ( in minutes ) 1 : 4 6 x 1 x x = 10 x 4 x 360 x = ( 10 x 4 x 360 ) / ( 6 ) x = 2400 . answer : c\"", "options": "a ) 648 , b ) 1800 , c ) 2400 , d ) 10800 , e ) 10900", "correct": "c", "annotated_formula": "multiply(multiply(divide(360, 6), 4), 10)", "linear_formula": "divide(n1,n0)|multiply(n3,#0)|multiply(n2,#1)|", "category": "gain"}, "target": "2400", "arguments": ["Question: a mixture contains alcohol and water in the ratio 4 : 3 . if 5 liters of water is added to the mixture , the ratio becomes 4 : 5 . find the quantity of alcohol in the given mixture .\nAnswer: 10\n\nQuestion: a bag contains 5 red , 4 blue and 3 green balls . if 2 ballsare picked at random , what is the probability that both are red ?\nAnswer: 5 / 33\n\nQuestion: the population of a town is 10000 . it increases annually at the rate of 20 % p . a . what will be its population after 1 year ?\nAnswer: 12000\n\nQuestion: if 1.5 x = 0.04 y then the value of ( y - x ) / ( y + x ) is\nAnswer: 73 / 77\n\nQuestion: running at the same constant rate , 6 identical machines can produce a total of 360 bottles per minute . at this rate , how many bottles could 10 such machines produce in 4 minutes ?\nAnswer:", " 648"], "resps": [[[-10.437644004821777, false]], [[-8.180549621582031, false]], [[-7.8147501945495605, false]], [[-10.662771224975586, false]], [[-11.975011825561523, false]]], "filtered_resps": [[-10.437644004821777, false], [-8.180549621582031, false], [-7.8147501945495605, false], [-10.662771224975586, false], [-11.975011825561523, false]], "acc": 1.0, "acc_norm": 1.0}
{"doc_id": 13, "doc": {"Problem": "there are 1000 buildings in a street . a sign - maker is contracted to number the houses from 1 to 1000 . how many zeroes will he need ?", "Rationale": "divide as ( 1 - 100 ) ( 100 - 200 ) . . . . ( 900 - 1000 ) total 192 answer : c", "options": "a ) 190 , b ) 191 , c ) 192 , d ) 193 , e ) 194", "correct": "c", "annotated_formula": "add(add(divide(1000, const_10), multiply(subtract(const_10, 1), const_10)), const_2)", "linear_formula": "divide(n0,const_10)|subtract(const_10,n1)|multiply(#1,const_10)|add(#0,#2)|add(#3,const_2)", "category": "general"}, "target": "192", "arguments": ["Question: excluding the stoppages , the speed of a bus is 32 km / hr and including the stoppages the speed of the bus is 16 km / hr . for how many minutes does the bus stop per hour ?\nAnswer: 30 min\n\nQuestion: two trains travel in opposite directions at 36 kmph and 45 kmph and a man sitting in slower train passes the faster train in 4 seconds . the length of the faster train is\nAnswer: 90 m\n\nQuestion: if the sides of a triangle are 91 cm , 84 cm and 35 cm , what is its area ?\nAnswer: 1470 cm 2\n\nQuestion: the profit earned by selling an article for rs . 832 is equal to the loss incurred when the same article is sold for rs . 448 . what should be the sale price for making 40 % profit ?\nAnswer: 896\n\nQuestion: there are 1000 buildings in a street . a sign - maker is contracted to number the houses from 1 to 1000 . how many zeroes will he need ?\nAnswer:", " 190"], "resps": [[[-8.255971908569336, false]], [[-9.087362289428711, false]], [[-8.321828842163086, false]], [[-9.539297103881836, false]], [[-9.773458480834961, false]]], "filtered_resps": [[-8.255971908569336, false], [-9.087362289428711, false], [-8.321828842163086, false], [-9.539297103881836, false], [-9.773458480834961, false]], "acc": 0.0, "acc_norm": 0.0}
{"doc_id": 14, "doc": {"Problem": "a man bought 20 shares of rs . 50 at 5 discount , the rate of dividend being 13 . the rate of interest obtained is :", "Rationale": "\"investment = rs . [ 20 x ( 50 - 5 ) ] = rs . 900 . face value = rs . ( 50 x 20 ) = rs . 1000 . dividend = rs . 27 x 1000 = rs . 135 . 2 100 interest obtained = 135 x 100 % = 15 % 900 view answer discuss in forum answer : c\"", "options": "a ) 27 % , b ) 87 % , c ) 15 % , d ) 66 % , e ) 88 %", "correct": "c", "annotated_formula": "divide(multiply(multiply(20, 50), divide(13, const_100)), multiply(20, subtract(50, 5)))", "linear_formula": "divide(n3,const_100)|multiply(n0,n1)|subtract(n1,n2)|multiply(#0,#1)|multiply(n0,#2)|divide(#3,#4)|", "category": "gain"}, "target": "15 %", "arguments": ["Question: a 6 - liter solution is 25 % alcohol . how many liters of pure alcohol must be added to produce a solution that is 50 % alcohol ?\nAnswer: 3.0\n\nQuestion: if a rectangular room measures 10 meters by 5 meters by 4 meters , what is the volume of the room in cubic centimeters ? ( 1 meter = 100 centimeters )\nAnswer: 200\n\nQuestion: the shopkeeper increased the price of a product by 25 % so that customer finds it difficult to purchase the required amount . but somehow the customer managed to purchase only 68 % of the required amount . what is the net difference in the expenditure on that product ?\nAnswer: 15 %\n\nQuestion: a person can swim in still water at 4 km / h . if the speed of water 2 km / h , how many hours will the man take to swim back against the current for 14 km ?\nAnswer: 7\n\nQuestion: a man bought 20 shares of rs . 50 at 5 discount , the rate of dividend being 13 . the rate of interest obtained is :\nAnswer:", " 27 %"], "resps": [[[-9.225954055786133, false]], [[-11.084755897521973, false]], [[-5.061804294586182, false]], [[-10.36075210571289, false]], [[-10.414409637451172, false]]], "filtered_resps": [[-9.225954055786133, false], [-11.084755897521973, false], [-5.061804294586182, false], [-10.36075210571289, false], [-10.414409637451172, false]], "acc": 1.0, "acc_norm": 1.0}
{"doc_id": 15, "doc": {"Problem": "? % of 360 = 108", "Rationale": "\"? % of 360 = 108 or , ? = 108 × 100 / 360 = 30 answer a\"", "options": "a ) 30 , b ) 36 , c ) 64 , d ) 72 , e ) none of these", "correct": "a", "annotated_formula": "divide(multiply(108, const_100), 360)", "linear_formula": "multiply(n1,const_100)|divide(#0,n0)|", "category": "gain"}, "target": "30", "arguments": ["Question: a circular garden is surrounded by a fence of negligible width along the boundary . if the length of the fence is 1 / 5 of th area of the garden . what is the radius of the circular garden ?\nAnswer: 10\n\nQuestion: calculate 469111 x 9999 = ?\nAnswer: 4690640889\n\nQuestion: machine – a produces 40 % of the total output and machine - b produces 60 % of the total output . an average of 9 units out of a 1000 goods manufactured by machine - a and one unit of 50 units produced by machine - b prove to be defective . what is the probability that a unit chosen at random from the total daily output of the factory is defective ?\nAnswer: c . 0.0156\n\nQuestion: sides of a rectangular park are in the ratio 3 : 2 and its area is 5766 sq m , the cost of fencing it at 50 ps per meter is ?\nAnswer: 155\n\nQuestion: ? % of 360 = 108\nAnswer:", " 30"], "resps": [[[-5.9170942306518555, false]], [[-3.830981492996216, false]], [[-7.3772077560424805, false]], [[-6.005724906921387, false]], [[-13.513760566711426, false]]], "filtered_resps": [[-5.9170942306518555, false], [-3.830981492996216, false], [-7.3772077560424805, false], [-6.005724906921387, false], [-13.513760566711426, false]], "acc": 0.0, "acc_norm": 0.0}
{"doc_id": 0, "doc": {"pubid": 21645374, "question": "Do mitochondria play a role in remodelling lace plant leaves during programmed cell death?", "context": {"contexts": ["Programmed cell death (PCD) is the regulated death of cells within an organism. The lace plant (Aponogeton madagascariensis) produces perforations in its leaves through PCD. The leaves of the plant consist of a latticework of longitudinal and transverse veins enclosing areoles. PCD occurs in the cells at the center of these areoles and progresses outwards, stopping approximately five cells from the vasculature. The role of mitochondria during PCD has been recognized in animals; however, it has been less studied during PCD in plants.", "The following paper elucidates the role of mitochondrial dynamics during developmentally regulated PCD in vivo in A. madagascariensis. A single areole within a window stage leaf (PCD is occurring) was divided into three areas based on the progression of PCD; cells that will not undergo PCD (NPCD), cells in early stages of PCD (EPCD), and cells in late stages of PCD (LPCD). Window stage leaves were stained with the mitochondrial dye MitoTracker Red CMXRos and examined. Mitochondrial dynamics were delineated into four categories (M1-M4) based on characteristics including distribution, motility, and membrane potential (ΔΨm). A TUNEL assay showed fragmented nDNA in a gradient over these mitochondrial stages. Chloroplasts and transvacuolar strands were also examined using live cell imaging. The possible importance of mitochondrial permeability transition pore (PTP) formation during PCD was indirectly examined via in vivo cyclosporine A (CsA) treatment. 
This treatment resulted in lace plant leaves with a significantly lower number of perforations compared to controls, and that displayed mitochondrial dynamics similar to that of non-PCD cells."], "labels": ["BACKGROUND", "RESULTS"], "meshes": ["Alismataceae", "Apoptosis", "Cell Differentiation", "Mitochondria", "Plant Leaves"], "reasoning_required_pred": ["y", "e", "s"], "reasoning_free_pred": ["y", "e", "s"]}, "long_answer": "Results depicted mitochondrial dynamics in vivo as PCD progresses within the lace plant, and highlight the correlation of this organelle with other organelles during developmental PCD. To the best of our knowledge, this is the first report of mitochondria and chloroplasts moving on transvacuolar strands to form a ring structure surrounding the nucleus during developmental PCD. Also, for the first time, we have shown the feasibility for the use of CsA in a whole plant system. Overall, our findings implicate the mitochondria as playing a critical and early role in developmentally regulated PCD in the lace plant.", "final_decision": "yes"}, "target": "yes", "arguments": ["Abstract: The data analysis was conducted to describe the rate of unsuccessful copper T380A intrauterine device (IUD) insertions among women using the IUD for emergency contraception (EC) at community family planning clinics in Utah.\nThese data were obtained from a prospective observational trial of women choosing the copper T380A IUD for EC. Insertions were performed by nurse practitioners at two family planning clinics in order to generalize findings to the type of service setting most likely to employ this intervention. Adjuvant measures to facilitate difficult IUD insertions (cervical anesthesia, dilation, pain medication, and use of ultrasound guidance) were not utilized. 
The effect of parity on IUD insertion success was determined using exact logistic regression models adjusted for individual practitioner failure rates.\nSix providers performed 197 IUD insertion attempts. These providers had a mean of 14.1 years of experience (range 1-27, S.D. ±12.5). Among nulliparous women, 27 of 138 (19.6%) IUD insertions were unsuccessful. In parous women, 8 of 59 IUD insertions were unsuccessful (13.6%). The adjusted odds ratio (aOR) showed that IUD insertion failure was more likely in nulliparous women compared to parous women (aOR=2.31, 95% CI 0.90-6.52, p=.09).\nQuestion: Failed IUD insertions in community practice: an under-recognized problem?\nAnswer: yes\n\nAbstract: The aim of this study was to analyze the properties of the immune cell microenvironment of regional lymph nodes (LNs) positive for lung cancer.\nTwenty-four patients operated on for stages T1 and T2 of the NSCLC, were enrolled in the study. Peripheral blood and LN tissue were obtained from different lymph node sites and levels. As a control, LN tissue was taken from patients diagnosed with emphysema or pneumothorax. The cells from randomly chosen LN were tested by multi-color flow cytometry. Separate portions of LN were snap-frozen and examined for the presence of cytokeratin positive cells (CK). Propensity for apoptosis, level of TCR zeta chain expression of T cells and the number and maturation status of dendritic cells were confronted with the presence of CK-positive cells.\nThe presence of metastases correlated with the downregulation of TCR zeta, especially CD8(+) T cells. The most striking feature was the reduction in the number of myeloid CD11c(+) dendritic cells in the LN of patients with LN metastases. This could be a reflection of the immunodeficient state observed in lung cancer patients. 
Even in the absence of metastases in the regional LN, the same type of changes in the LN microenvironment were observed in those LN located nearer the primary tumor.\nQuestion: Can the condition of the cell microenvironment of mediastinal lymph nodes help predict the risk of metastases in non-small cell lung cancer?\nAnswer: yes\n\nAbstract: The objective of the current study is to determine to what extent the reduction of Chile's traffic fatalities and injuries during 2000-2012 was related to the police traffic enforcement increment registered after the introduction of its 2005 traffic law reform.\nA unique dataset with assembled information from public institutions and analyses based on ordinary least square and robust random effects models was carried out. Dependent variables were traffic fatality and severe injury rates per population and vehicle fleet. Independent variables were: (1) presence of new national traffic law; (2) police officers per population; (3) number of traffic tickets per police officer; and (4) interaction effect of number of traffic tickets per police officer with traffic law reform. Oil prices, alcohol consumption, proportion of male population 15-24 years old, unemployment, road infrastructure investment, years' effects and regions' effects represented control variables.\nEmpirical estimates from instrumental variables suggest that the enactment of the traffic law reform in interaction with number of traffic tickets per police officer is significantly associated with a decrease of 8% in traffic fatalities and 7% in severe injuries. Piecewise regression model results for the 2007-2012 period suggest that police traffic enforcement reduced traffic fatalities by 59% and severe injuries by 37%.\nQuestion: Did Chile's traffic law reform push police enforcement?\nAnswer: yes\n\nAbstract: To compare atropine with placebo as an adjunct to ketamine sedation in children undergoing minor painful procedures. 
Outcome measures included hypersalivation, side effect profile, parental/patient satisfaction, and procedural success rate.\nChildren aged between 1 and 16 years of age requiring ketamine procedural sedation in a tertiary emergency department were randomised to receive 0.01 mg/kg of atropine or placebo. All received 4 mg/kg of intramuscular ketamine. Tolerance and sedation scores were recorded throughout the procedure. Side effects were recorded from the start of sedation until discharge. Parental and patient satisfaction scores were obtained at discharge and three to five days after the procedure, with the opportunity to report side effects encountered at home.\nA total of 83 patients aged 13 months to 14.5 years (median age 3.4 years) were enrolled over a 16 month period. Hypersalivation occurred in 11.4% of patients given atropine compared with 30.8% given placebo (odds ratio (OR) 0.29, 95% confidence interval (CI) 0.09 to 0.91). A transient rash was observed in 22.7% of the atropine group compared with 5.1% of the placebo group (OR 5.44, 95% CI 1.11 to 26.6). Vomiting during recovery occurred in 9.1% of atropine patients compared with 25.6% of placebo patients (OR 0.29, 95% CI 0.09 to 1.02). There was a trend towards better tolerance in the placebo group. No patient experienced serious side effects.\nQuestion: Is atropine needed with ketamine sedation?\nAnswer: yes\n\nAbstract: Programmed cell death (PCD) is the regulated death of cells within an organism. The lace plant (Aponogeton madagascariensis) produces perforations in its leaves through PCD. The leaves of the plant consist of a latticework of longitudinal and transverse veins enclosing areoles. PCD occurs in the cells at the center of these areoles and progresses outwards, stopping approximately five cells from the vasculature. 
The role of mitochondria during PCD has been recognized in animals; however, it has been less studied during PCD in plants.\nThe following paper elucidates the role of mitochondrial dynamics during developmentally regulated PCD in vivo in A. madagascariensis. A single areole within a window stage leaf (PCD is occurring) was divided into three areas based on the progression of PCD; cells that will not undergo PCD (NPCD), cells in early stages of PCD (EPCD), and cells in late stages of PCD (LPCD). Window stage leaves were stained with the mitochondrial dye MitoTracker Red CMXRos and examined. Mitochondrial dynamics were delineated into four categories (M1-M4) based on characteristics including distribution, motility, and membrane potential (ΔΨm). A TUNEL assay showed fragmented nDNA in a gradient over these mitochondrial stages. Chloroplasts and transvacuolar strands were also examined using live cell imaging. The possible importance of mitochondrial permeability transition pore (PTP) formation during PCD was indirectly examined via in vivo cyclosporine A (CsA) treatment. This treatment resulted in lace plant leaves with a significantly lower number of perforations compared to controls, and that displayed mitochondrial dynamics similar to that of non-PCD cells.\nQuestion: Do mitochondria play a role in remodelling lace plant leaves during programmed cell death?\nAnswer:", " yes"], "resps": [[[-1.6232320070266724, false]], [[-1.1910191774368286, true]], [[-9.578462600708008, false]]], "filtered_resps": [[-1.6232320070266724, false], [-1.1910191774368286, true], [-9.578462600708008, false]], "acc": 0.0}
{"doc_id": 8, "doc": {"pubid": 17113061, "question": "Do mutations causing low HDL-C promote increased carotid intima-media thickness?", "context": {"contexts": ["Although observational data support an inverse relationship between high-density lipoprotein (HDL) cholesterol and coronary heart disease (CHD), genetic HDL deficiency states often do not correlate with premature CHD.", "Carotid intima-media thickness (cIMT) measurements were obtained in cases comprising 10 different mutations in LCAT, ABCA1 and APOA1 to further evaluate the relationship between low HDL resulting from genetic variation and early atherosclerosis.", "In a 1:2 case-control study of sex and age-related (+/-5 y) subjects (n=114), cIMT was nearly identical between cases (0.66+/-0.17 cm) and controls (0.65+/-0.18 cm) despite significantly lower HDL cholesterol (0.67 vs. 1.58 mmol/l) and apolipoprotein A-I levels (96.7 vs. 151.4 mg/dl) (P<0.05)"], "labels": ["BACKGROUND", "METHODS", "RESULTS"], "meshes": ["Cholesterol, HDL", "Contrast Media", "Coronary Disease", "Female", "Humans", "Male", "Mutation", "Risk Factors"], "reasoning_required_pred": ["n", "o"], "reasoning_free_pred": ["n", "o"]}, "long_answer": "Genetic variants identified in the present study may be insufficient to promote early carotid atherosclerosis.", "final_decision": "no"}, "target": "no", "arguments": ["Abstract: Recent studies have implicated the human cytomegalovirus (HCMV) as a possible pathogen for causing hypertension. We aimed to study the association between HCMV infection and hypertension in the United States National Health and Nutrition Examination Survey (NHANES).\nWe analyzed data on 2979 men and 3324 women in the NHANES 1999-2002. We included participants aged 16-49 years who had valid data on HCMV infection and hypertension.\nOf the participants, 54.7% had serologic evidence of HCMV infection and 17.5% had hypertension. 
There were ethnic differences in the prevalence of HCMV infection (P<0.001) and hypertension (P<0.001). The prevalence of both increased with age (P<0.001). Before adjustment, HCMV seropositivity was significantly associated with hypertension in women (OR=1.63, 95% CI=1.25-2.13, P=0.001) but not in men. After adjustment for race/ethnicity, the association between HCMV seropositivity and hypertension in women remained significant (OR=1.55, 95% CI=1.20-2.02, P=0.002). Further adjustment for body mass index, diabetes status and hypercholesterolemia attenuated the association (OR=1.44, 95% CI=1.10-1.90, P=0.010). However, after adjusting for age, the association was no longer significant (OR=1.24, 95% CI=0.91-1.67, P=0.162).\nQuestion: Is human cytomegalovirus infection associated with hypertension?\nAnswer: no\n\nAbstract: Cardiovascular disease is prevalent among workers with high levels of occupational physical activity. The increased risk may be due to a high relative aerobic workload, possibly leading to increased blood pressure. However, studies investigating the relation between relative aerobic workload and ambulatory blood pressure (ABP) are lacking. The aim was to explore the relationship between objectively measured relative aerobic workload and ABP.\nA total of 116 cleaners aged 18-65 years were included after informed consent was obtained. A portable device (Spacelabs 90217) was mounted for 24-h measurements of ABP, and an Actiheart was mounted for 24-h heart rate measurements to calculate relative aerobic workload as percentage of relative heart rate reserve. A repeated-measure multi-adjusted mixed model was applied for analysis.\nA fully adjusted mixed model of measurements throughout the day showed significant positive relations (p<0.001): a 1% increase in mean relative aerobic workload was associated with an increase of 0.42 ± 0.05 mmHg (95% CI 0.32-0.52 mmHg) in systolic ABP and 0.30 ± 0.04 mmHg (95% CI 0.22-0.38 mmHg) in diastolic ABP. 
Correlations between relative aerobic workload and ABP were significant.\nQuestion: Is aerobic workload positively related to ambulatory blood pressure?\nAnswer: yes\n\nAbstract: To evaluate the effectiveness of the role of a discharge coordinator whose sole responsibility was to plan and coordinate the discharge of patients from medical wards.\nAn intervention study in which the quality of discharge planning was assessed before and after the introduction of a discharge coordinator. Patients were interviewed on the ward before discharge and seven to 10 days after being discharged home.\nThe three medical wards at the Homerton Hospital in Hackney, East London.\n600 randomly sampled adult patients admitted to the medical wards of the study hospital, who were resident in the district (but not in institutions), were under the care of physicians (excluding psychiatry), and were discharged home from one of the medical wards. The sampling was conducted in three study phases, over 18 months.\nPhase I comprised base line data collection; in phase II data were collected after the introduction of the district discharge planning policy and a discharge form (checklist) for all patients; in phase III data were collected after the introduction of the discharge coordinator.\nThe quality and out come of discharge planning. Readmission rates, duration of stay, appropriateness of days of care, patients' health and satisfaction, problems after discharge, and receipt of services.\nThe discharge coordinator resulted in an improved discharge planning process, and there was a reduction in problems experienced by patients after discharge, and in perceived need for medical and healthcare services. 
There was no evidence that the discharge coordinator resulted in a more timely or effective provision of community services after discharge, or that the appropriateness or efficiency of bed use was improved.\nQuestion: Does a dedicated discharge coordinator improve the quality of hospital discharge?\nAnswer: yes\n\nAbstract: We observed an endoscopic abnormally in a group of children with histological esophagitis. We termed this finding \"vertical lines in esophageal mucosa\" (VLEM). We examined the relationship between the presence of VLEM and significant histologic changes in esophageal mucosal biopsies.\nBetween January 1, 1992, and August 31, 1994, the senior author (JFF) performed 255 esophageal biopsies. The procedure reports, available endoscopic photographs, and histology reports were reviewed to establish the endoscopic and histologic appearance of the esophageal mucosa. Intraepithelial cells were counted in a blind review of 42 randomly selected biopsies.\nThe esophageal mucosa had a normal appearance on 160 endoscopic studies (Group 1) and VLEM were the only mucosal abnormalities in 41 endoscopies (Group 2). Histology was normal in 92 of 160 biopsies (57.5%) from Group 1, and 1 of 41 biopsies (2.4%) from Group 2. 
Most patients in Group 2 had eosinophilic esophagitis (34 of 41, 83%, specificity 0.85, sensitivity 0.5, p>0.001) which was of moderate to severe intensity (31 of 34, 91.2%, specificity 0.88, sensitivity 0.73, p<0.001).\nQuestion: Vertical lines in distal esophageal mucosa (VLEM): a true endoscopic manifestation of esophagitis in children?\nAnswer: yes\n\nAbstract: Although observational data support an inverse relationship between high-density lipoprotein (HDL) cholesterol and coronary heart disease (CHD), genetic HDL deficiency states often do not correlate with premature CHD.\nCarotid intima-media thickness (cIMT) measurements were obtained in cases comprising 10 different mutations in LCAT, ABCA1 and APOA1 to further evaluate the relationship between low HDL resulting from genetic variation and early atherosclerosis.\nIn a 1:2 case-control study of sex and age-related (+/-5 y) subjects (n=114), cIMT was nearly identical between cases (0.66+/-0.17 cm) and controls (0.65+/-0.18 cm) despite significantly lower HDL cholesterol (0.67 vs. 1.58 mmol/l) and apolipoprotein A-I levels (96.7 vs. 151.4 mg/dl) (P<0.05)\nQuestion: Do mutations causing low HDL-C promote increased carotid intima-media thickness?\nAnswer:", " yes"], "resps": [[[-1.337092638015747, false]], [[-0.7598450779914856, true]], [[-9.417948722839355, false]]], "filtered_resps": [[-1.337092638015747, false], [-0.7598450779914856, true], [-9.417948722839355, false]], "acc": 1.0}
{"doc_id": 1, "doc": {"pubid": 16418930, "question": "Landolt C and snellen e acuity: differences in strabismus amblyopia?", "context": {"contexts": ["Assessment of visual acuity depends on the optotypes used for measurement. The ability to recognize different optotypes differs even if their critical details appear under the same visual angle. Since optotypes are evaluated on individuals with good visual acuity and without eye disorders, differences in the lower visual acuity range cannot be excluded. In this study, visual acuity measured with the Snellen E was compared to the Landolt C acuity.", "100 patients (age 8 - 90 years, median 60.5 years) with various eye disorders, among them 39 with amblyopia due to strabismus, and 13 healthy volunteers were tested. Charts with the Snellen E and the Landolt C (Precision Vision) which mimic the ETDRS charts were used to assess visual acuity. Three out of 5 optotypes per line had to be correctly identified, while wrong answers were monitored. In the group of patients, the eyes with the lower visual acuity, and the right eyes of the healthy subjects, were evaluated.", "Differences between Landolt C acuity (LR) and Snellen E acuity (SE) were small. The mean decimal values for LR and SE were 0.25 and 0.29 in the entire group and 0.14 and 0.16 for the eyes with strabismus amblyopia. The mean difference between LR and SE was 0.55 lines in the entire group and 0.55 lines for the eyes with strabismus amblyopia, with higher values of SE in both groups. 
The results of the other groups were similar with only small differences between LR and SE."], "labels": ["BACKGROUND", "PATIENTS AND METHODS", "RESULTS"], "meshes": ["Adolescent", "Adult", "Aged", "Aged, 80 and over", "Amblyopia", "Cataract", "Child", "Eye Diseases", "Female", "Humans", "Male", "Middle Aged", "Reference Values", "Refractive Errors", "Reproducibility of Results", "Retinal Diseases", "Strabismus", "Vision Tests", "Visual Acuity"], "reasoning_required_pred": ["n", "o"], "reasoning_free_pred": ["n", "o"]}, "long_answer": "Using the charts described, there was only a slight overestimation of visual acuity by the Snellen E compared to the Landolt C, even in strabismus amblyopia. Small differences in the lower visual acuity range have to be considered.", "final_decision": "no"}, "target": "no", "arguments": ["Abstract: Current guidelines recommend total thyroidectomy for nearly all children with well-differentiated thyroid cancer (WDTC). These guidelines, however, derive from older data accrued prior to current high-resolution imaging. We speculate that there is a subpopulation of children who may be adequately treated with lobectomy.\nRetrospective analysis of prospectively maintained database.\nSeventy-three children with WDTC treated between 2004 and 2015.\nWe applied two different risk-stratification criteria to this population. First, we determined the number of patients meeting American Thyroid Association (ATA) 'low-risk' criteria, defined as disease grossly confined to the thyroid with either N0/Nx or incidental microscopic N1a disease. Second, we defined a set of 'very-low-risk' histopathological criteria, comprising unifocal tumours ≤4 cm without predefined high-risk factors, and determined the proportion of patients that met these criteria.\nTwenty-seven (37%) males and 46 (63%) females were included in this study, with a mean age of 13·4 years. 
Ipsilateral- and contralateral multifocality were identified in 27 (37·0%) and 19 (26·0%) of specimens. Thirty-seven (51%) patients had lymph node metastasis (N1a = 18/N1b = 19). Pre-operative ultrasound identified all cases with clinically significant nodal disease. Of the 73 patients, 39 (53·4%) met ATA low-risk criteria and 16 (21·9%) met 'very-low-risk' criteria. All 'very-low-risk' patients demonstrated excellent response to initial therapy without persistence/recurrence after a mean follow-up of 36·4 months.\nQuestion: Is it time to reconsider lobectomy in low-risk paediatric thyroid cancer?\nAnswer: yes\n\nAbstract: Little is known about how information needs change over time in the early postpartum period or about how these needs might differ given socioeconomic circumstances. This study's aim was to examine women's concerns at the time of hospital discharge and unmet learning needs as self-identified at 4 weeks after discharge.\nData were collected as part of a cross-sectional survey of postpartum health outcomes, service use, and costs of care in the first 4 weeks after postpartum hospital discharge. Recruitment of 250 women was conducted from each of 5 hospitals in Ontario, Canada (n = 1,250). Women who had given vaginal birth to a single live infant, and who were being discharged at the same time as their infant, assuming care of their infant, competent to give consent, and able to communicate in one of the study languages were eligible. Participants completed a self-report questionnaire in hospital; 890 (71.2%) took part in a structured telephone interview 4 weeks after hospital discharge.\nApproximately 17 percent of participants were of low socioeconomic status. Breastfeeding and signs of infant illness were the most frequently identified concerns by women, regardless of their socioeconomic status. Signs of infant illness and infant care/behavior were the main unmet learning needs. 
Although few differences in identified concerns were evident, women of low socioeconomic status were significantly more likely to report unmet learning needs related to 9 of 10 topics compared with women of higher socioeconomic status. For most topics, significantly more women of both groups identified learning needs 4 weeks after discharge compared with the number who identified corresponding concerns while in hospital.\nQuestion: Learning needs of postpartum women: does socioeconomic status matter?\nAnswer: yes\n\nAbstract: To determine the ability of early sonogram to predict the presentation of twin A at birth.\nA retrospective cohort study was conducted on all twin pregnancies evaluated at our Fetal Evaluation Unit from 2007 to 2009. Sonogram records were reviewed for the presentation of twin A at seven gestational age intervals and inpatient medical records were reviewed for the presentation of twin A at delivery. The positive predictive value, sensitivity, and specificity of presentation as determined by ultrasound, at each gestational age interval, for the same presentation at delivery were calculated.\nTwo hundred and thirty-eight twin pregnancies met inclusion criteria. A total of 896 ultrasounds were reviewed. The positive predictive value of cephalic presentation of twin A as determined by ultrasound for the persistence of cephalic presentation at delivery reached 95% after 28 weeks gestation. 
The positive predictive value for noncephalic presentation as established by sonogram for noncephalic at delivery was>90% after 32 weeks gestation.\nQuestion: Can third trimester ultrasound predict the presentation of the first twin at delivery?\nAnswer: yes\n\nAbstract: To evaluate the impact of patient-prosthesis mismatch (PPM) on survival, functional status, and quality of life (QoL) after aortic valve replacement (AVR) with small prosthesis size in elderly patients.\nBetween January 2005 and December 2013, 152 patients with pure aortic stenosis, aged at least 75 years, underwent AVR, with a 19 or 21 mm prosthetic heart valve. PPM was defined as an indexed effective orifice area less than 0.85 cm/m. Median age was 82 years (range 75-93 years). Mean follow-up was 56 months (range 1-82 months) and was 98% complete. Late survival rate, New York Heart Association functional class, and QoL (RAND SF-36) were assessed.\nOverall, PPM was found in 78 patients (53.8%). Among them, 42 patients (29%) had an indexed effective orifice area less than 0.75 cm/m and 17 less than 0.65 cm/m (11.7%). Overall survival at 5 years was 78 ± 4.5% and was not influenced by PPM (P = NS). The mean New York Heart Association class for long-term survivors with PPM improved from 3.0 to 1.7 (P < 0.001). QoL (physical functioning 45.18 ± 11.35, energy/fatigue 49.36 ± 8.64, emotional well being 58.84 ± 15.44, social functioning 61.29 ± 6.15) was similar to that of no-PPM patients (P = NS).\nQuestion: Does patient-prosthesis mismatch after aortic valve replacement affect survival and quality of life in elderly patients?\nAnswer: no\n\nAbstract: Assessment of visual acuity depends on the optotypes used for measurement. The ability to recognize different optotypes differs even if their critical details appear under the same visual angle. Since optotypes are evaluated on individuals with good visual acuity and without eye disorders, differences in the lower visual acuity range cannot be excluded. 
In this study, visual acuity measured with the Snellen E was compared to the Landolt C acuity.\n100 patients (age 8 - 90 years, median 60.5 years) with various eye disorders, among them 39 with amblyopia due to strabismus, and 13 healthy volunteers were tested. Charts with the Snellen E and the Landolt C (Precision Vision) which mimic the ETDRS charts were used to assess visual acuity. Three out of 5 optotypes per line had to be correctly identified, while wrong answers were monitored. In the group of patients, the eyes with the lower visual acuity, and the right eyes of the healthy subjects, were evaluated.\nDifferences between Landolt C acuity (LR) and Snellen E acuity (SE) were small. The mean decimal values for LR and SE were 0.25 and 0.29 in the entire group and 0.14 and 0.16 for the eyes with strabismus amblyopia. The mean difference between LR and SE was 0.55 lines in the entire group and 0.55 lines for the eyes with strabismus amblyopia, with higher values of SE in both groups. The results of the other groups were similar with only small differences between LR and SE.\nQuestion: Landolt C and snellen e acuity: differences in strabismus amblyopia?\nAnswer:", " yes"], "resps": [[[-2.011251926422119, false]], [[-0.9465469121932983, true]], [[-10.619100570678711, false]]], "filtered_resps": [[-2.011251926422119, false], [-0.9465469121932983, true], [-10.619100570678711, false]], "acc": 1.0}
{"doc_id": 9, "doc": {"pubid": 10966337, "question": "A short stay or 23-hour ward in a general and academic children's hospital: are they effective?", "context": {"contexts": ["We evaluated the usefulness of a short stay or 23-hour ward in a pediatric unit of a large teaching hospital, Westmead Hospital, and an academic Children's hospital, The New Children's Hospital, to determine if they are a useful addition to the emergency service.", "This is a descriptive comparison of prospectively collected data on all children admitted to the short stay ward at Westmead Hospital (WH) during 1994 and the short stay ward at the New Children's Hospital (NCH) during 1997-98. These hospitals service an identical demographic area with the latter (NCH) a tertiary referral center. The following outcome measures were used: length of stay, appropriateness of stay, rate of admission to an in-hospital bed, and rate of unscheduled visits within 72 hours of discharge. Adverse events were reported and patient follow-up was attempted at 48 hours after discharge in all cases.", "The short stay ward accounted for 10.3% (Westmead Hospital) and 14.7% (New Children's Hospital) of admissions, with 56% medical in nature, 30% surgical, and the remainder procedural or psychological. Admission patterns were similar, with asthma, gastroenteritis, convulsion, pneumonia, and simple surgical conditions accounting for most short stay ward admissions. The short stay ward increased hospital efficiency with an average length of stay of 17.5 hours (Westmead Hospital) compared to 20.5 hours (New Children's Hospital). The users of the short stay ward were children of young age less than 2 years, with stay greater than 23 hours reported in only 1% of all admissions to the short stay ward. The rate of patient admission to an in-hospital bed was low, (4% [Westmead Hospital] compared to 6% [New Children's Hospital]), with the number of unscheduled visits within 72 hours of short stay ward discharge less than 1%. 
There were no adverse events reported at either short stay ward, with parental satisfaction high. The short stay ward was developed through reallocation of resources from within the hospital to the short stay ward. This resulted in estimated savings of $1/2 million (Westmead Hospital) to $2.3 million (New Children's Hospital) to the hospital, due to more efficient bed usage."], "labels": ["OBJECTIVE", "METHODS", "RESULTS"], "meshes": ["Academic Medical Centers", "Acute Disease", "Adolescent", "Child", "Child, Preschool", "Critical Pathways", "Emergency Service, Hospital", "Follow-Up Studies", "Hospital Units", "Hospitals, General", "Hospitals, Pediatric", "Humans", "Infant", "Length of Stay", "New South Wales", "Outcome Assessment (Health Care)", "Pediatrics", "Prospective Studies", "Time Factors"], "reasoning_required_pred": ["y", "e", "s"], "reasoning_free_pred": ["y", "e", "s"]}, "long_answer": "This data demonstrates the robust nature of the short stay ward. At these two very different institutions we have shown improved bed efficient and patient care in a cost-effective way. We have also reported on greater parental satisfaction and early return of the child with their family to the community.", "final_decision": "yes"}, "target": "yes", "arguments": ["Abstract: The aims of the study were to report the rates of recurrent and residual cholesteatoma following primary CAT surgery and to report the rate of conversion to a modified radical mastoidectomy.\nThis was a retrospective review of a single surgeon series between 2006 and 2012.\nIn total 132 second-look operations were undertaken, with a mean interval between primary surgery and second-look procedures of 6 months. The rate of cholesteatoma at second-look surgery was 19.7%, which was split into residual disease (10.6%) and recurrent disease (9.09%). New tympanic membrane defects with cholesteatoma were considered as recurrent disease. 
Residual disease was defined as cholesteatoma present behind an intact tympanic membrane. The majority of recurrent and residual disease was easily removed at second look (73.1%). Only four cases were converted to a modified radical mastoidectomy (3%) and three cases required a third-look procedure.\nQuestion: Can early second-look tympanoplasty reduce the rate of conversion to modified radical mastoidectomy?\nAnswer: yes\n\nAbstract: To determine the ability of early sonogram to predict the presentation of twin A at birth.\nA retrospective cohort study was conducted on all twin pregnancies evaluated at our Fetal Evaluation Unit from 2007 to 2009. Sonogram records were reviewed for the presentation of twin A at seven gestational age intervals and inpatient medical records were reviewed for the presentation of twin A at delivery. The positive predictive value, sensitivity, and specificity of presentation as determined by ultrasound, at each gestational age interval, for the same presentation at delivery were calculated.\nTwo hundred and thirty-eight twin pregnancies met inclusion criteria. A total of 896 ultrasounds were reviewed. The positive predictive value of cephalic presentation of twin A as determined by ultrasound for the persistence of cephalic presentation at delivery reached 95% after 28 weeks gestation. 
The positive predictive value for noncephalic presentation as established by sonogram for noncephalic at delivery was>90% after 32 weeks gestation.\nQuestion: Can third trimester ultrasound predict the presentation of the first twin at delivery?\nAnswer: yes\n\nAbstract: To assess the results of transsphenoidal pituitary surgery in patients with Cushing's disease over a period of 18 years, and to determine if there are factors which will predict the outcome.\nSixty-nine sequential patients treated surgically by a single surgeon in Newcastle upon Tyne between 1980 and 1997 were identified and data from 61 of these have been analysed.\nRetrospective analysis of outcome measures.\nPatients were divided into three groups (remission, failure and relapse) depending on the late outcome of their treatment as determined at the time of analysis, i.e. 88 months (median) years after surgery. Remission is defined as biochemical reversal of hypercortisolism with re-emergence of diurnal circadian rhythm, resolution of clinical features and adequate suppression on low-dose dexamethasone testing. Failure is defined as the absence of any of these features. Relapse is defined as the re-emergence of Cushing's disease more than one year after operation. Clinical features such as weight, sex, hypertension, associated endocrine disorders and smoking, biochemical studies including preoperative and postoperative serum cortisol, urine free cortisol, serum ACTH, radiological, histological and surgical findings were assessed in relation to these three groups to determine whether any factors could reliably predict failure or relapse after treatment.\nOf the 61 patients included in this study, 48 (78.7%) achieved initial remission and 13 (21.3%) failed treatment. Seven patients suffered subsequent relapse (range 22-158 months) in their condition after apparent remission, leaving a final group of 41 patients (67.2%) in the remission group. 
Tumour was identified at surgery in 52 patients, of whom 38 achieved remission. In comparison, only 3 of 9 patients in whom no tumour was identified achieved remission. This difference was significant (P = 0.048). When both radiological and histological findings were positive, the likelihood of achieving remission was significantly higher than if both modalities were negative (P = 0.038). There were significant differences between remission and failure groups when 2- and 6-week postoperative serum cortisol levels (P = 0.002 and 0.001, respectively) and 6-week postoperative urine free cortisol levels (P = 0.026) were compared. This allowed identification of patients who failed surgical treatment in the early postoperative period. Complications of surgery included transitory DI in 13, transitory CSF leak in 8 and transitory nasal discharge and cacosmia in 3. Twelve of 41 patients required some form of hormonal replacement therapy despite achieving long-term remission. Thirteen patients underwent a second operation, of whom 5 achieved remission.\nQuestion: Transsphenoidal pituitary surgery in Cushing's disease: can we predict outcome?\nAnswer: yes\n\nAbstract: The objective was to evaluate the efficacy of diffusion-weighted imaging (DWI) in predicting the development of vascularization in hypovascular hepatocellular lesions (HHLs).\nForty-two HHLs that were diagnosed by computed tomographic (CT) arteriography were evaluated retrospectively. The lesion on DWI was classified as isointense, hypointense, or hyperintense. Follow-up studies that included intravenous dynamic CT or magnetic resonance imaging were performed.\nThe 730-day cumulative developments of vascularization in hypointense, isointense, and hyperintense lesions were 17%, 30%, and 40%, respectively. 
The differences among these developments were not statistically significant.\nQuestion: Is diffusion-weighted imaging a significant indicator of the development of vascularization in hypovascular hepatocellular lesions?\nAnswer: no\n\nAbstract: We evaluated the usefulness of a short stay or 23-hour ward in a pediatric unit of a large teaching hospital, Westmead Hospital, and an academic Children's hospital, The New Children's Hospital, to determine if they are a useful addition to the emergency service.\nThis is a descriptive comparison of prospectively collected data on all children admitted to the short stay ward at Westmead Hospital (WH) during 1994 and the short stay ward at the New Children's Hospital (NCH) during 1997-98. These hospitals service an identical demographic area with the latter (NCH) a tertiary referral center. The following outcome measures were used: length of stay, appropriateness of stay, rate of admission to an in-hospital bed, and rate of unscheduled visits within 72 hours of discharge. Adverse events were reported and patient follow-up was attempted at 48 hours after discharge in all cases.\nThe short stay ward accounted for 10.3% (Westmead Hospital) and 14.7% (New Children's Hospital) of admissions, with 56% medical in nature, 30% surgical, and the remainder procedural or psychological. Admission patterns were similar, with asthma, gastroenteritis, convulsion, pneumonia, and simple surgical conditions accounting for most short stay ward admissions. The short stay ward increased hospital efficiency with an average length of stay of 17.5 hours (Westmead Hospital) compared to 20.5 hours (New Children's Hospital). The users of the short stay ward were children of young age less than 2 years, with stay greater than 23 hours reported in only 1% of all admissions to the short stay ward. 
The rate of patient admission to an in-hospital bed was low, (4% [Westmead Hospital] compared to 6% [New Children's Hospital]), with the number of unscheduled visits within 72 hours of short stay ward discharge less than 1%. There were no adverse events reported at either short stay ward, with parental satisfaction high. The short stay ward was developed through reallocation of resources from within the hospital to the short stay ward. This resulted in estimated savings of $1/2 million (Westmead Hospital) to $2.3 million (New Children's Hospital) to the hospital, due to more efficient bed usage.\nQuestion: A short stay or 23-hour ward in a general and academic children's hospital: are they effective?\nAnswer:", " yes"], "resps": [[[-1.6780028343200684, false]], [[-0.6899580359458923, true]], [[-9.939301490783691, false]]], "filtered_resps": [[-1.6780028343200684, false], [-0.6899580359458923, true], [-9.939301490783691, false]], "acc": 0.0}
{"doc_id": 2, "doc": {"pubid": 9488747, "question": "Syncope during bathing in infants, a pediatric form of water-induced urticaria?", "context": {"contexts": ["Apparent life-threatening events in infants are a difficult and frequent problem in pediatric practice. The prognosis is uncertain because of risk of sudden infant death syndrome.", "Eight infants aged 2 to 15 months were admitted during a period of 6 years; they suffered from similar maladies in the bath: on immersion, they became pale, hypotonic, still and unreactive; recovery took a few seconds after withdrawal from the bath and stimulation. Two diagnoses were initially considered: seizure or gastroesophageal reflux but this was doubtful. The hypothesis of an equivalent of aquagenic urticaria was then considered; as for patients with this disease, each infant's family contained members suffering from dermographism, maladies or eruption after exposure to water or sun. All six infants had dermographism. We found an increase in blood histamine levels after a trial bath in the two infants tested. The evolution of these \"aquagenic maladies\" was favourable after a few weeks without baths. After a 2-7 year follow-up, three out of seven infants continue to suffer from troubles associated with sun or water."], "labels": ["BACKGROUND", "CASE REPORTS"], "meshes": ["Baths", "Histamine", "Humans", "Infant", "Syncope", "Urticaria", "Water"], "reasoning_required_pred": ["y", "e", "s"], "reasoning_free_pred": ["y", "e", "s"]}, "long_answer": "\"Aquagenic maladies\" could be a pediatric form of the aquagenic urticaria.", "final_decision": "yes"}, "target": "yes", "arguments": ["Abstract: Blood stream infection (BSI) and the subsequent development of sepsis are among the most common infection complications occurring in severe burn patients. 
This study was designed to evaluate the relationship between the burn wound flora and BSI pathogens.\nDocumentation of all bacterial and fungal wound and blood isolates from severe burn patients hospitalized in the burn unit and intensive care unit was obtained from medical records retrieved retrospectively from a computerized, hospital-wide database over a 13-year period. All data were recorded in relation to the Ryan score.\nOf 195 severe burn patients, 88 had at least 1 BSI episode. Transmission of the same pathogen from wound to blood was documented in 30% of the patients, with a rising BSI frequency as the Ryan score increased. There were a total of 263 bacteremic episodes in 88 study patients, 44% of blood isolates were documented previously in wound cultures, and transmission of the same pathogen from wound to blood was noted in 65% of bacteremic patients.\nQuestion: Do Wound Cultures Give Information About the Microbiology of Blood Cultures in Severe Burn Patients?\nAnswer: yes\n\nAbstract: To determine if composite measures based on process indicators are consistent with short-term outcome indicators in surgical colorectal cancer care.\nLongitudinal analysis of consistency between composite measures based on process indicators and outcome indicators for 85 Dutch hospitals.\nThe Dutch Surgical Colorectal Audit database, the Netherlands.\n4732 elective patients with colon carcinoma and 2239 with rectum carcinoma treated in 85 hospitals were included in the analyses.\nAll available process indicators were aggregated into five different composite measures. The association of the different composite measures with risk-adjusted postoperative mortality and morbidity was analysed at the patient and hospital level.\nAt the patient level, only one of the composite measures was negatively associated with morbidity for rectum carcinoma. 
At the hospital level, a strong negative association was found between composite measures and hospital mortality and morbidity rates for rectum carcinoma (p<0.05), and hospital morbidity rates for colon carcinoma.\nQuestion: Combining process indicators to evaluate quality of care for surgical patients with colorectal cancer: are scores consistent with short-term outcome?\nAnswer: maybe\n\nAbstract: The effect of preoperative education on anxiety and postoperative outcomes of cardiac surgery patients remains unclear.AIM: The aim of the study was to estimate the effectiveness of a nurse-led preoperative education on anxiety and postoperative outcomes.\nA randomised controlled study was designed. All the patients who were admitted for elective cardiac surgery in a general hospital in Athens with knowledge of the Greek language were eligible to take part in the study. Patients in the intervention group received preoperative education by specially trained nurses. The control group received the standard information by the ward personnel. Measurements of anxiety were conducted on admission-A, before surgery-B and before discharge-C by the state-trait anxiety inventory.\nThe sample consisted of 395 patients (intervention group: 205, control group: 190). The state anxiety on the day before surgery decreased only in the intervention group (34.0 (8.4) versus 36.9 (10.7); P=0.001). The mean decrease in state score during the follow-up period was greater in the intervention group (P=0.001). No significant difference was found in the length of stay or readmission. Lower proportions of chest infection were found in the intervention group (10 (5.3) versus 1 (0.5); P=0.004). 
Multivariate linear regression revealed that education and score in trait anxiety scale on admission are independent predictors of a reduction in state anxiety.\nQuestion: Can nurse-led preoperative education reduce anxiety and postoperative complications of patients undergoing cardiac surgery?\nAnswer: yes\n\nAbstract: It is generally believed that positioning of the patient in a head-down tilt (Trendelenberg position) decreases the likelihood of a venous air embolism during liver resection.\nThe physiological effect of variation in horizontal attitude on central and hepatic venous pressure was measured in 10 patients during liver surgery. Hemodynamic indices were recorded with the operating table in the horizontal, 20 degrees head-up and 20 degrees head-down positions.\nThere was no demonstrable pressure gradient between the hepatic and central venous levels in any of the positions. The absolute pressures did, however, vary in a predictable way, being highest in the head-down and lowest during head-up tilt. However, on no occasion was a negative intraluminal pressure recorded.\nQuestion: Does patient position during liver surgery influence the risk of venous air embolism?\nAnswer: no\n\nAbstract: Apparent life-threatening events in infants are a difficult and frequent problem in pediatric practice. The prognosis is uncertain because of risk of sudden infant death syndrome.\nEight infants aged 2 to 15 months were admitted during a period of 6 years; they suffered from similar maladies in the bath: on immersion, they became pale, hypotonic, still and unreactive; recovery took a few seconds after withdrawal from the bath and stimulation. Two diagnoses were initially considered: seizure or gastroesophageal reflux but this was doubtful. The hypothesis of an equivalent of aquagenic urticaria was then considered; as for patients with this disease, each infant's family contained members suffering from dermographism, maladies or eruption after exposure to water or sun. 
All six infants had dermographism. We found an increase in blood histamine levels after a trial bath in the two infants tested. The evolution of these \"aquagenic maladies\" was favourable after a few weeks without baths. After a 2-7 year follow-up, three out of seven infants continue to suffer from troubles associated with sun or water.\nQuestion: Syncope during bathing in infants, a pediatric form of water-induced urticaria?\nAnswer:", " yes"], "resps": [[[-1.3580057621002197, false]], [[-0.8325159549713135, true]], [[-4.511409759521484, false], [-4.511409759521484, false], [-4.511409759521484, false], [-4.511409759521484, false]]], "filtered_resps": [[-1.3580057621002197, false], [-0.8325159549713135, true], [-4.511409759521484, false]], "acc": 0.0}
{"doc_id": 3, "doc": {"pubid": 17208539, "question": "Are the long-term results of the transanal pull-through equal to those of the transabdominal pull-through?", "context": {"contexts": ["The transanal endorectal pull-through (TERPT) is becoming the most popular procedure in the treatment of Hirschsprung disease (HD), but overstretching of the anal sphincters remains a critical issue that may impact the continence. This study examined the long-term outcome of TERPT versus conventional transabdominal (ABD) pull-through for HD.", "Records of 41 patients more than 3 years old who underwent a pull-through for HD (TERPT, n = 20; ABD, n = 21) were reviewed, and their families were thoroughly interviewed and scored via a 15-item post-pull-through long-term outcome questionnaire. Patients were operated on between the years 1995 and 2003. During this time, our group transitioned from the ABD to the TERPT technique. Total scoring ranged from 0 to 40: 0 to 10, excellent; 11 to 20 good; 21 to 30 fair; 31 to 40 poor. A 2-tailed Student t test, analysis of covariance, as well as logistic and linear regression were used to analyze the collected data with confidence interval higher than 95%.", "Overall scores were similar. However, continence score was significantly better in the ABD group, and the stool pattern score was better in the TERPT group. 
A significant difference in age at interview between the 2 groups was noted; we therefore reanalyzed the data controlling for age, and this showed that age did not significantly affect the long-term scoring outcome between groups."], "labels": ["PURPOSE", "METHODS", "RESULTS"], "meshes": ["Child", "Child, Preschool", "Colectomy", "Female", "Hirschsprung Disease", "Humans", "Male", "Treatment Outcome"], "reasoning_required_pred": ["y", "e", "s"], "reasoning_free_pred": ["n", "o"]}, "long_answer": "Our long-term study showed significantly better (2-fold) results regarding the continence score for the abdominal approach compared with the transanal pull-through. The stool pattern and enterocolitis scores were somewhat better for the TERPT group. These findings raise an important issue about the current surgical management of HD; however, more cases will need to be studied before a definitive conclusion can be drawn.", "final_decision": "no"}, "target": "no", "arguments": ["Abstract: To test if secular growth acceleration occurs during fetal life.\nANOVA Kruskal-Wallis and Mann-Whitney U-test have been used for the biometric characteristics comparison of nowadays fetal population with those three decades ago and to test the hypothesis about the existence of secular growth acceleration during fetal life. For this purpose, we first calculated mean values of particular biometric parameters for the whole pregnancy. During the period 2002-2009 biparietal diameter, fetal length and abdominal circumference measurements in singleton uncomplicated pregnancies between 22 and 41 gestational weeks were obtained. Gestational age was estimated according to Naegele's rule and confirmed with an early ultrasound examination. 
Pregnancies with fetal cromosomopathies and malformations were excluded as well as those resulting in perinatal death.\nThere were no statistically significant differences of the examined fetal biometric parameters measured by ultrasound between contemporary fetal population and those from 35 years ago.\nQuestion: The secular growth acceleration: does it appear during fetal life?\nAnswer: no\n\nAbstract: Phacodonesis can occur in pseudoexfoliation syndrome because of impaired zonular support. This study investigates whether the increased mobility of the lens influences anterior chamber depth in patients with pseudoexfoliation while assuming a prone position.\nCentral anterior chamber depth was measured in 39 patients with clinically apparent unilateral pseudoexfoliation and elevated intraocular pressure. Patients were placed in a face-up position for 5 minutes, at which time anterior chamber depth and axial length were measured by A scan, and intraocular pressure was measured by Tonopen (Oculab, La Jolla, CA) in both eyes. The measurements were repeated on both eyes after 5 minutes in a face-down position.\nNo significant differences in intraocular pressure or axial length between the prone and supine positions were found in either eye. Anterior chamber depth in eyes with pseudoexfoliation decreased from a mean of 3.08 mm in the supine position to a mean of 2.95 mm in the prone position, whereas mean anterior chamber depth in the fellow eyes decreased from 3.01 mm to 2.97 mm. The decrease in anterior chamber depth when facing down in the eyes with pseudoexfoliation was significantly greater than in the fellow eyes.\nQuestion: Does head positioning influence anterior chamber depth in pseudoexfoliation syndrome?\nAnswer: yes\n\nAbstract: The aim of this study was to describe the evolution and epidemiologic characteristics of shigellosis patients over a 25 year period in a large city.\nShigellosis is a notifiable disease in Spain since 1988. 
Cases are analyzed in Barcelona residents included in the registry between 1988-2012. A descriptive analysis by sex, age, mode of transmission and Shigella species is presented. Trend analysis and time series were performed.\nOf the 559 cases analyzed, 60.15% were males. A sustained increase was observed in the trend since 2008 in males (p<0,05), especially at the expense of males who had no history of food poisoning or travel to endemic areas. The increasing tendency was greater in males from 21 to 60 years, both for S. flexneri (since 2009), and for S. sonnei (since 2004). In 2012 it was noted that in the men with S. flexneri, the 63% were men who have sex with men.\nQuestion: Analysis of the epidemiological pattern of Shigellosis in Barcelona between 1988 and 2012: Is it an emerging sexually transmitted infection?\nAnswer: yes\n\nAbstract: To investigate the role of human T-lymphotrophic virus type I (HTLV-I) infection in four patients who developed slowly progressive myelopathy with abnormal MRI lesions in the cervical cord levels.\nClinical and neuroradiologic examinations were performed, and the odds that an HTLV-I-infected individual of specified genotype, age, and provirus load had HTLV-I-associated myelopathy (HAM)/tropical spastic paraparesis (TSP) were calculated.\nAnti-HTLV-I antibodies were positive in both the serum and the CSF in all of the patients. Biopsied sample from spinal cord lesions showed inflammatory changes in Patient 1. Patient 2 had a demyelinating type of sensorimotor polyneuropathy. 
Two of the three patients examined showed high risk of developing HAM/TSP in virologic and immunologic aspects.\nQuestion: Chronic progressive cervical myelopathy with HTLV-I infection: Variant form of HAM/TSP?\nAnswer: yes\n\nAbstract: The transanal endorectal pull-through (TERPT) is becoming the most popular procedure in the treatment of Hirschsprung disease (HD), but overstretching of the anal sphincters remains a critical issue that may impact the continence. This study examined the long-term outcome of TERPT versus conventional transabdominal (ABD) pull-through for HD.\nRecords of 41 patients more than 3 years old who underwent a pull-through for HD (TERPT, n = 20; ABD, n = 21) were reviewed, and their families were thoroughly interviewed and scored via a 15-item post-pull-through long-term outcome questionnaire. Patients were operated on between the years 1995 and 2003. During this time, our group transitioned from the ABD to the TERPT technique. Total scoring ranged from 0 to 40: 0 to 10, excellent; 11 to 20 good; 21 to 30 fair; 31 to 40 poor. A 2-tailed Student t test, analysis of covariance, as well as logistic and linear regression were used to analyze the collected data with confidence interval higher than 95%.\nOverall scores were similar. However, continence score was significantly better in the ABD group, and the stool pattern score was better in the TERPT group. 
A significant difference in age at interview between the 2 groups was noted; we therefore reanalyzed the data controlling for age, and this showed that age did not significantly affect the long-term scoring outcome between groups.\nQuestion: Are the long-term results of the transanal pull-through equal to those of the transabdominal pull-through?\nAnswer:", " yes"], "resps": [[[-1.1190098524093628, true]], [[-1.6553257703781128, false]], [[-9.717825889587402, false], [-9.717825889587402, false], [-9.717825889587402, false], [-9.717825889587402, false]]], "filtered_resps": [[-1.1190098524093628, true], [-1.6553257703781128, false], [-9.717825889587402, false]], "acc": 0.0}
{"doc_id": 4, "doc": {"pubid": 10808977, "question": "Can tailored interventions increase mammography use among HMO women?", "context": {"contexts": ["Telephone counseling and tailored print communications have emerged as promising methods for promoting mammography screening. However, there has been little research testing, within the same randomized field trial, of the efficacy of these two methods compared to a high-quality usual care system for enhancing screening. This study addressed the question: Compared to usual care, is tailored telephone counseling more effective than tailored print materials for promoting mammography screening?", "Three-year randomized field trial.", "One thousand ninety-nine women aged 50 and older recruited from a health maintenance organization in North Carolina.", "Women were randomized to 1 of 3 groups: (1) usual care, (2) tailored print communications, and (3) tailored telephone counseling.", "Adherence to mammography screening based on self-reports obtained during 1995, 1996, and 1997.", "Compared to usual care alone, telephone counseling promoted a significantly higher proportion of women having mammograms on schedule (71% vs 61%) than did tailored print (67% vs 61%) but only after the first year of intervention (during 1996). 
Furthermore, compared to usual care, telephone counseling was more effective than tailored print materials at promoting being on schedule with screening during 1996 and 1997 among women who were off-schedule during the previous year."], "labels": ["BACKGROUND", "DESIGN", "PARTICIPANTS", "INTERVENTION", "MAIN OUTCOME", "RESULTS"], "meshes": ["Cost-Benefit Analysis", "Female", "Health Maintenance Organizations", "Humans", "Logistic Models", "Mammography", "Marketing of Health Services", "Middle Aged", "North Carolina", "Odds Ratio", "Pamphlets", "Patient Acceptance of Health Care", "Patient Satisfaction", "Reminder Systems", "Telephone"], "reasoning_required_pred": ["y", "e", "s"], "reasoning_free_pred": ["n", "o"]}, "long_answer": "The effects of the intervention were most pronounced after the first intervention. Compared to usual care, telephone counseling seemed particularly effective at promoting change among nonadherent women, the group for whom the intervention was developed. These results suggest that telephone counseling, rather than tailored print, might be the preferred first-line intervention for getting nonadherent women on schedule for mammography screening. Many questions would have to be answered about why the tailored print intervention was not more powerful. Nevertheless, it is clear that additional interventions will be needed to maintain women's adherence to mammography. Medical Subject Headings (MeSH): mammography screening, telephone counseling, tailored print communications, barriers.", "final_decision": "yes"}, "target": "yes", "arguments": ["Abstract: In recent years the role of trace elements in lithogenesis has received steadily increasing attention.\nThis study was aimed to attempt to find the correlations between the chemical content of the stones and the concentration of chosen elements in the urine and hair of stone formers.\nThe proposal for the study was approved by the local ethics committee. 
Specimens were taken from 219 consecutive stone-formers. The content of the stone was evaluated using atomic absorption spectrometry, spectrophotometry, and colorimetric methods. An analysis of 29 elements in hair and 21 elements in urine was performed using inductively coupled plasma-atomic emission spectrometry.\nOnly a few correlations between the composition of stones and the distribution of elements in urine and in hair were found. All were considered incidental.\nQuestion: Can we predict urinary stone composition based on an analysis of microelement concentration in the hair and urine?\nAnswer: no\n\nAbstract: Refusal of patients to participate in intervention programs is an important problem in clinical trials but, in general, researchers devote relatively little attention to it. In this article, a comparison is made between patients who, after having been invited, agreed to participate in a self-management intervention (participants) and those who refused (refusers). Compared with other studies of refusers, relatively more information could be gathered with regard to both their characteristics and reasons for refusing, because all potential participants were invited personally.\nOlder patients from a Dutch outpatient clinic were invited to participate in a self-management intervention, and their characteristics were assessed. Demographic data were collected, as well as data on physical functioning and lack of emotional support. People who refused to participate were asked to give their reasons for refusing.\nOf the 361 patients invited, 267 (74%) refused participation. These refusers were more restricted in their mobility, lived further away from the location of the intervention, and had a partner more often than did the participants. No differences were found in level of education, age or gender. 
The main reasons given by respondents for refusing to participate were lack of time, travel distance, and transport problems.\nQuestion: Do older patients who refuse to participate in a self-management intervention in the Netherlands differ from older patients who agree to participate?\nAnswer: yes\n\nAbstract: Studies have identified clinical predictors to guide radiologic evaluation of the cervical spine in geriatric patients. We hypothesized that clinical predictors are not adequate in the identification of cervical spine fractures in geriatric blunt trauma patients with low-energy mechanism.\nA retrospective case-control study was performed on geriatric blunt trauma patients sustaining low-energy trauma from January 2000 to January 2006. A data form including 8 clinical predictors was completed for each group.\nThere were 35 study and 64 control patients identified. Both groups were similar in age (study 83.6 vs control 81.2) and injury severity score (study 9.06 vs control 9.61). Only neck tenderness exceeded the expected occurrence in the presence of a cervical spine injury (chi(2) = 18.1, P = .001) in just 45.5% of the study group.\nQuestion: Cervical spine fractures in geriatric blunt trauma patients with low-energy mechanism: are clinical predictors adequate?\nAnswer: no\n\nAbstract: Recently, increasing number of literature has identified the posterior tibial slope (PTS) as one of the risk factors of primary anterior cruciate ligament (ACL) injury. However, few studies concerning the association between failure of ACL reconstruction (ACLR) and PTS have been published. The objective of this study was to explore the association between the failure of ACLR and PTS at a minimum of two years follow-up.\nTwo hundred and thirty eight eligible patients from June 2009 to October 2010 were identified from our database. A total of 20 failure cases of ACLR and 20 randomly selected controls were included in this retrospective study. 
The demographic data and the results of manual maximum side-to-side difference with KT-1000 arthrometer at 30° of knee flexion and pivot-shift test before the ACLR and at the final follow-up were collected. The medial and lateral PTSs were measured using the magnetic resonance imaging (MRI) scan, based on Hudek's measurement. A comparison of PTS between the two groups was performed.\nThe overall failure rate of the present study was 8.4%. Of the 40 participants, the mean medial PTS was 4.1° ± 3.2° and the mean lateral PTS was 4.6° ± 2.6°. The medial PTS of the ACLR failure group was significantly steeper than the control group (3.5° ± 2.5° vs. 6.1° ± 2.1°, P = 0.000). Similarly, the lateral PTS of the ACLR failure group was significantly steeper than the control group (2.9° ± 2.1° vs. 5.5° ± 3.0°, P = 0.006). For medial PTS ≥ 5°, the odds ratio of ACLR failure was 6.8 (P = 0.007); for lateral PTS ≥5°, the odds ratio of ACLR failure was 10.8 (P = 0.000).\nQuestion: Are failures of anterior cruciate ligament reconstruction associated with steep posterior tibial slopes?\nAnswer: yes\n\nAbstract: Telephone counseling and tailored print communications have emerged as promising methods for promoting mammography screening. However, there has been little research testing, within the same randomized field trial, of the efficacy of these two methods compared to a high-quality usual care system for enhancing screening. 
This study addressed the question: Compared to usual care, is tailored telephone counseling more effective than tailored print materials for promoting mammography screening?\nThree-year randomized field trial.\nOne thousand ninety-nine women aged 50 and older recruited from a health maintenance organization in North Carolina.\nWomen were randomized to 1 of 3 groups: (1) usual care, (2) tailored print communications, and (3) tailored telephone counseling.\nAdherence to mammography screening based on self-reports obtained during 1995, 1996, and 1997.\nCompared to usual care alone, telephone counseling promoted a significantly higher proportion of women having mammograms on schedule (71% vs 61%) than did tailored print (67% vs 61%) but only after the first year of intervention (during 1996). Furthermore, compared to usual care, telephone counseling was more effective than tailored print materials at promoting being on schedule with screening during 1996 and 1997 among women who were off-schedule during the previous year.\nQuestion: Can tailored interventions increase mammography use among HMO women?\nAnswer:", " yes"], "resps": [[[-1.5089194774627686, false]], [[-1.9235846996307373, false]], [[-9.15413761138916, false], [-9.15413761138916, false], [-9.15413761138916, false], [-9.15413761138916, false]]], "filtered_resps": [[-1.5089194774627686, false], [-1.9235846996307373, false], [-9.15413761138916, false]], "acc": 1.0}
{"doc_id": 5, "doc": {"pubid": 23831910, "question": "Double balloon enteroscopy: is it efficacious and safe in a community setting?", "context": {"contexts": ["From March 2007 to January 2011, 88 DBE procedures were performed on 66 patients. Indications included evaluation anemia/gastrointestinal bleed, small bowel IBD and dilation of strictures. Video-capsule endoscopy (VCE) was used prior to DBE in 43 of the 66 patients prior to DBE evaluation.", "The mean age was 62 years. Thirty-two patients were female, 15 were African-American; 44 antegrade and 44 retrograde DBEs were performed. The mean time per antegrade DBE was 107.4±30.0 minutes with a distance of 318.4±152.9 cm reached past the pylorus. The mean time per lower DBE was 100.7±27.3 minutes with 168.9±109.1 cm meters past the ileocecal valve reached. Endoscopic therapy in the form of electrocautery to ablate bleeding sources was performed in 20 patients (30.3%), biopsy in 17 patients (25.8%) and dilation of Crohn's-related small bowel strictures in 4 (6.1%). 43 VCEs with pathology noted were performed prior to DBE, with findings endoscopically confirmed in 32 cases (74.4%). In 3 cases the DBE showed findings not noted on VCE."], "labels": ["METHODS", "RESULTS"], "meshes": ["Community Health Centers", "Double-Balloon Enteroscopy", "Female", "Humans", "Intestinal Diseases", "Male", "Middle Aged"], "reasoning_required_pred": ["y", "e", "s"], "reasoning_free_pred": ["y", "e", "s"]}, "long_answer": "DBE appears to be equally safe and effective when performed in the community setting as compared to a tertiary referral center with a comparable yield, efficacy, and complication rate.", "final_decision": "yes"}, "target": "yes", "arguments": ["Abstract: This study aimed to show the relationship between serum paraoxonase 1 level and the epicardial fat tissue thickness.\nTwo hundred and seven patients without any atherosclerotic disease history were included in this cross-sectional observational study. 
Correlation analysis was performed to determine the correlation between epicardial fat tissue thickness, which was measured by echocardiography and serum paraoxonase 1 level. Also correlation analysis was performed to show correlation between patients' clinical and laboratory findings and the level of serum paraoxonase 1 (PON 1) and the epicardial fat tissue thickness. Pearson and Spearman test were used for correlation analysis.\nNo linear correlation between epicardial fat tissue thickness and serum PON 1 found (correlation coefficient: -0.127, p=0.069). When epicardial fat tissue thickness were grouped as 7 mm and over, and below, and 5 mm and over, and below, serum PON 1 level were significantly lower in ≥7 mm group (PON1 : 168.9 U/L) than<7 mm group (PON 1: 253.9 U/L) (p<0.001). Also hypertension prevalence was increased in ≥7 mm group (p=0.001). Serum triglyceride was found to be higher in ≥7 mm group (p=0.014), body mass index was found higher in ≥5 mm group (p=0.006).\nQuestion: Is there a relationship between serum paraoxonase level and epicardial fat tissue thickness?\nAnswer: no\n\nAbstract: In primary and secondary prevention trials, statins have been shown to reduce the risk of stroke. In addition to lipid lowering, statins have a number of antiatherothrombotic and neuroprotective properties. 
In a preliminary observational study, we explored whether clinical outcome is improved in patients who are on treatment with statins when stroke occurs.\nWe conducted a population-based case-referent study of 25- to 74-year-old stroke patients with, for each case of a patient who was on statin treatment at the onset of stroke (n=125), 2 referent patients who were not treated with statins but were matched for age, gender, year of onset, and stroke subtype (n=250).\nThe unadjusted odds ratio for early discharge to home (versus late discharge or death) was 1.41 (95% CI 0.91 to 2.17) when patients on statin treatment were compared with referent stroke patients not on statins. Prognostic factors were, in general, more unfavorable among patients on statins. When this was adjusted for in a logistic regression model, the use of statins was a moderately strong but statistically nonsignificant predictor of discharge to home (multiple-adjusted odds ratio 1.42, 95% CI 0.90 to 2.22).\nQuestion: Does pretreatment with statins improve clinical outcome after stroke?\nAnswer: no\n\nAbstract: Blood stream infection (BSI) and the subsequent development of sepsis are among the most common infection complications occurring in severe burn patients. This study was designed to evaluate the relationship between the burn wound flora and BSI pathogens.\nDocumentation of all bacterial and fungal wound and blood isolates from severe burn patients hospitalized in the burn unit and intensive care unit was obtained from medical records retrieved retrospectively from a computerized, hospital-wide database over a 13-year period. All data were recorded in relation to the Ryan score.\nOf 195 severe burn patients, 88 had at least 1 BSI episode. Transmission of the same pathogen from wound to blood was documented in 30% of the patients, with a rising BSI frequency as the Ryan score increased. 
There were a total of 263 bacteremic episodes in 88 study patients, 44% of blood isolates were documented previously in wound cultures, and transmission of the same pathogen from wound to blood was noted in 65% of bacteremic patients.\nQuestion: Do Wound Cultures Give Information About the Microbiology of Blood Cultures in Severe Burn Patients?\nAnswer: yes\n\nAbstract: Gallbladder carcinoma is characterized by delayed diagnosis, ineffective treatment and poor prognosis. Surgical resection has been thought to be the treatment of choice, while the role of radiotherapy as adjuvant or palliative treatment has not been fully clarified in the literature.\nWe present the case of a 45-year-old female, with unresectable gallbladder carcinoma, grade IV, histologically diagnosed during laparotomy. The patient was treated with palliative intent with percutaneous transhepatic biliary drainage. Furthermore, she received external radiotherapy by (60)Co, using a three-field technique (anterior-posterior and right lateral). The total dose was 3,000 cGy in 10 fractions, with 300 cGy per fraction, 5 days weekly.\nThe patient showed clinico-laboratory improvement and was discharged with a permanent percutaneous transhepatic endoprosthesis. During follow-up (10 and 12 months postirradiation), abdominal CTs showed no local extension of the tumor, while the patient had a good performance status. So far, 1 year after the diagnosis of gallbladder cancer she is still alive.\nQuestion: Is external palliative radiotherapy for gallbladder carcinoma effective?\nAnswer: yes\n\nAbstract: From March 2007 to January 2011, 88 DBE procedures were performed on 66 patients. Indications included evaluation anemia/gastrointestinal bleed, small bowel IBD and dilation of strictures. Video-capsule endoscopy (VCE) was used prior to DBE in 43 of the 66 patients prior to DBE evaluation.\nThe mean age was 62 years. 
Thirty-two patients were female, 15 were African-American; 44 antegrade and 44 retrograde DBEs were performed. The mean time per antegrade DBE was 107.4±30.0 minutes with a distance of 318.4±152.9 cm reached past the pylorus. The mean time per lower DBE was 100.7±27.3 minutes with 168.9±109.1 cm meters past the ileocecal valve reached. Endoscopic therapy in the form of electrocautery to ablate bleeding sources was performed in 20 patients (30.3%), biopsy in 17 patients (25.8%) and dilation of Crohn's-related small bowel strictures in 4 (6.1%). 43 VCEs with pathology noted were performed prior to DBE, with findings endoscopically confirmed in 32 cases (74.4%). In 3 cases the DBE showed findings not noted on VCE.\nQuestion: Double balloon enteroscopy: is it efficacious and safe in a community setting?\nAnswer:", " yes"], "resps": [[[-0.7459797263145447, true]], [[-1.4071965217590332, false]], [[-9.121963500976562, false], [-9.121963500976562, false], [-9.121963500976562, false], [-9.121963500976562, false]]], "filtered_resps": [[-0.7459797263145447, true], [-1.4071965217590332, false], [-9.121963500976562, false]], "acc": 1.0}
{"doc_id": 6, "doc": {"pubid": 26037986, "question": "30-Day and 1-year mortality in emergency general surgery laparotomies: an area of concern and need for improvement?", "context": {"contexts": ["Emergency surgery is associated with poorer outcomes and higher mortality with recent studies suggesting the 30-day mortality to be 14-15%. The aim of this study was to analyse the 30-day mortality, age-related 30-day mortality and 1-year mortality following emergency laparotomy. We hope this will encourage prospective data collection, improvement of care and initiate strategies to establish best practice in this area.", "This was a retrospective study of patients who underwent emergency laparotomy from June 2010 to May 2012. The primary end point of the study was 30-day mortality, age-related 30-day mortality and 1-year all-cause mortality.", "477 laparotomies were performed in 446 patients. 57% were aged<70 and 43% aged>70 years. 30-day mortality was 12, 4% in those aged<70 years and 22% in those>70 years (p<0.001). 1-year mortality was 25, 15% in those aged under 70 years and 38% in those aged>70 years (p<0.001)."], "labels": ["AIMS", "METHODS", "RESULTS"], "meshes": ["Adult", "Age Factors", "Aged", "Aged, 80 and over", "Cause of Death", "Cohort Studies", "Emergency Treatment", "Female", "General Surgery", "Humans", "Incidence", "Laparotomy", "Male", "Middle Aged", "Needs Assessment", "Retrospective Studies", "Risk Assessment", "Time Factors", "United Kingdom"], "reasoning_required_pred": ["m", "a", "y", "b", "e"], "reasoning_free_pred": ["y", "e", "s"]}, "long_answer": "Emergency laparotomy carries a high rate of mortality, especially in those over the age of 70 years, and more needs to be done to improve outcomes, particularly in this group. 
This could involve increasing acute surgical care manpower, early recognition of patients requiring emergency surgery, development of clear management protocols for such patients or perhaps even considering centralisation of emergency surgical services to specialist centres with multidisciplinary teams involving emergency surgeons and care of the elderly physicians in hospital and related community outreach services for post-discharge care.", "final_decision": "maybe"}, "target": "maybe", "arguments": ["Abstract: Several studies have suggested a protective effect of folic acid (FA) on congenital heart anomalies. Down syndrome (DS) infants are known to have a high frequency of heart anomalies. Not all children with DS suffer from heart anomalies, which raises the question whether maternal factors might affect the risk of these anomalies. Our objectives were to investigate whether first-trimester FA use protects against heart anomalies among DS children.\nWomen with liveborn DS children participating in the Slone Epidemiology Center Birth Defects Study between 1976 and 1997 were included. We performed case-control analyses using DS, with heart anomalies as cases and DS, without heart anomalies as controls. Subanalyses were performed for defects that have been associated with FA in non-DS populations (conotruncal, ventricular septal [VSD]) and for those that are associated with DS (ostium secundum type atrial septal defects [ASD]and endocardial cushion defects [ECD]). Exposure was defined as the use of any FA-containing product for an average of at least 4 days per week during the first 12 weeks of pregnancy, whereas no exposure was defined as no use of FA in these 12 weeks.\nOf the 223 cases, 110 (49%) were exposed versus 84 (46%) of the 184 controls. 
After adjustment for possible confounders, no protective effect of FA was found on heart anomalies overall (OR 0.95, 95% CI: 0.61-1.47) nor separately for conotruncal defects, VSDs, ASDs, or ECDs.\nQuestion: Can folic acid protect against congenital heart defects in Down syndrome?\nAnswer: no\n\nAbstract: The use of open access endoscopy is increasing. Its effect on the adequacy of patient informed consent, procedure acceptance and the impact on subsequent communication/transfer of procedure results to the patient have not been evaluated. The aim of our study was to compare the extent of preknowledge of procedures and test explanation, patient medical complexity, information transfer and overall patient satisfaction between a patient group referred for outpatient open access endoscopy versus a patient group from a gastrointestinal (GI) subspecialty clinic.\nInformation was obtained from all patients presenting for outpatient upper and lower endoscopy by using a 1-page questionnaire. Patients from the two groups who had an outpatient upper/lower endoscopic procedure were contacted by phone after the procedure to obtain information with a standardized questionnaire.\nThe open access patients reported receiving significantly less information to help them identify the procedure (p<0.01) and less explanation concerning the nature of the procedure than the group of patients referred from the subspecialty clinic (p<0.005). There was no difference between the two groups in satisfaction scores for examinations performed under conscious sedation. For flexible sigmoidoscopy without sedation, however, the GI clinic patient group were more satisfied with their procedure. The majority of patients, regardless of access, were more likely to receive endoscopic results from a gastroenterologist than the referring physician. 
Furthermore, the patients in the GI clinic group who underwent colonoscopy felt significantly better at follow-up.\nQuestion: Does open access endoscopy close the door to an adequately informed patient?\nAnswer: yes\n\nAbstract: Treatment delays in breast cancer are generally thought to affect prognosis but the impact on survival remains unclear. Indicators for breast cancer care include time to primary treatment. The purpose of this study was to evaluate whether time to primary treatment (TPT) in breast cancer impacts survival.\nA total of 648 breast cancer patients treated in the University Malaya Medical Center (UMMC), Malaysia between 2004 and 2005 were included in the study. TPT was calculated from the date of pathological diagnosis to the date of primary treatment. Mortality data was obtained from the National Registry of Births and Deaths. Last date of follow-up was November 2010.\nMedian TPT was 18 days. Majority 508 (69.1%) of the patients received treatment within 30 days after diagnosis. The majority was surgically treated. Ethnicity (p=0.002) and stage at presentation (p=0.007) were significantly associated with delayed TPT. Malay ethnicity had delayed TPT compared to the Chinese; Hazard Ratio (HR) 1.9 (Confidence Interval (CI) 1.237, 2.987). Delayed TPT did not affect overall survival on univariate and multivariate analyses.\nQuestion: Delays in time to primary treatment after a diagnosis of breast cancer: does it impact survival?\nAnswer: no\n\nAbstract: To evaluate the outcome of a new modification of percutaneous needle suspension, using a bone anchor system for fixing the suture at the public bone, and to compare the results with those published previously.\nFrom March 1996, 37 patients with stress urinary incontinence (>2 years) were treated using a bone anchor system. On each side the suture was attached to the pubocervical fascia and the vaginal wall via a broad 'Z'-stitch. 
A urodynamic investigation performed preoperatively in all patients confirmed stress incontinence and excluded detrusor instability. The outcome was assessed by either by a clinical follow-up investigation or using a standardized questionnaire, over a mean follow-up of 11 months (range 6-18).\nIn the 37 patients, the procedure was successful in 25 (68%), with 16 (43%) of the patients completely dry and nine (24%) significantly improved. Removal of the bone anchor and suture was necessary in two patients, because of unilateral bacterial infection in one and a bilateral soft tissue granuloma in the other. One bone anchor became dislocated in a third patient. In two cases where the treatment failed, new detrusor instability was documented urodynamically. Minor complications were prolonged wound pain in 10 (26%) and transient urinary retention or residual urine in 12 patients (32%).\nQuestion: Does bone anchor fixation improve the outcome of percutaneous bladder neck suspension in female stress urinary incontinence?\nAnswer: yes\n\nAbstract: Emergency surgery is associated with poorer outcomes and higher mortality with recent studies suggesting the 30-day mortality to be 14-15%. The aim of this study was to analyse the 30-day mortality, age-related 30-day mortality and 1-year mortality following emergency laparotomy. We hope this will encourage prospective data collection, improvement of care and initiate strategies to establish best practice in this area.\nThis was a retrospective study of patients who underwent emergency laparotomy from June 2010 to May 2012. The primary end point of the study was 30-day mortality, age-related 30-day mortality and 1-year all-cause mortality.\n477 laparotomies were performed in 446 patients. 57% were aged<70 and 43% aged>70 years. 30-day mortality was 12, 4% in those aged<70 years and 22% in those>70 years (p<0.001). 
1-year mortality was 25, 15% in those aged under 70 years and 38% in those aged>70 years (p<0.001).\nQuestion: 30-Day and 1-year mortality in emergency general surgery laparotomies: an area of concern and need for improvement?\nAnswer:", " yes"], "resps": [[[-0.7148197293281555, true]], [[-1.489363431930542, false]], [[-9.937323570251465, false], [-9.937323570251465, false], [-9.937323570251465, false], [-9.937323570251465, false]]], "filtered_resps": [[-0.7148197293281555, true], [-1.489363431930542, false], [-9.937323570251465, false]], "acc": 0.0}
{"doc_id": 7, "doc": {"pubid": 26852225, "question": "Is adjustment for reporting heterogeneity necessary in sleep disorders?", "context": {"contexts": ["Anchoring vignettes are brief texts describing a hypothetical character who illustrates a certain fixed level of a trait under evaluation. This research uses vignettes to elucidate factors associated with sleep disorders in adult Japanese before and after adjustment for reporting heterogeneity in self-reports. This study also evaluates the need for adjusting for reporting heterogeneity in the management of sleep and energy related problems in Japan.", "We investigated a dataset of 1002 respondents aged 18 years and over from the Japanese World Health Survey, which collected information through face-to-face interview from 2002 to 2003. The ordered probit model and the Compound Hierarchical Ordered Probit (CHOPIT) model, which incorporated anchoring vignettes, were employed to estimate and compare associations of sleep and energy with socio-demographic and life-style factors before and after adjustment for differences in response category cut-points for each individual.", "The prevalence of self-reported problems with sleep and energy was 53 %. Without correction of cut-point shifts, age, sex, and the number of comorbidities were significantly associated with a greater severity of sleep-related problems. After correction, age, the number of comorbidities, and regular exercise were significantly associated with a greater severity of sleep-related problems; sex was no longer a significant factor. 
Compared to the ordered probit model, the CHOPIT model provided two changes with a subtle difference in the magnitude of regression coefficients after correction for reporting heterogeneity."], "labels": ["BACKGROUND", "METHODS", "RESULTS"], "meshes": ["Adult", "Aged", "Female", "Health Status Disparities", "Health Surveys", "Humans", "Japan", "Male", "Middle Aged", "Physical Fitness", "Prevalence", "Self Report", "Self-Assessment", "Sleep Wake Disorders", "Socioeconomic Factors"], "reasoning_required_pred": ["y", "e", "s"], "reasoning_free_pred": ["n", "o"]}, "long_answer": "Sleep disorders are common in the general adult population of Japan. Correction for reporting heterogeneity using anchoring vignettes is not a necessary tool for proper management of sleep and energy related problems among Japanese adults. Older age, gender differences in communicating sleep-related problems, the presence of multiple morbidities, and regular exercise should be the focus of policies and clinical practice to improve sleep and energy management in Japan.", "final_decision": "no"}, "target": "no", "arguments": ["Abstract: To determine if composite measures based on process indicators are consistent with short-term outcome indicators in surgical colorectal cancer care.\nLongitudinal analysis of consistency between composite measures based on process indicators and outcome indicators for 85 Dutch hospitals.\nThe Dutch Surgical Colorectal Audit database, the Netherlands.\n4732 elective patients with colon carcinoma and 2239 with rectum carcinoma treated in 85 hospitals were included in the analyses.\nAll available process indicators were aggregated into five different composite measures. The association of the different composite measures with risk-adjusted postoperative mortality and morbidity was analysed at the patient and hospital level.\nAt the patient level, only one of the composite measures was negatively associated with morbidity for rectum carcinoma. 
At the hospital level, a strong negative association was found between composite measures and hospital mortality and morbidity rates for rectum carcinoma (p<0.05), and hospital morbidity rates for colon carcinoma.\nQuestion: Combining process indicators to evaluate quality of care for surgical patients with colorectal cancer: are scores consistent with short-term outcome?\nAnswer: maybe\n\nAbstract: Human immunodeficiency virus (HIV)-infected patients have generally been excluded from transplantation. Recent advances in the management and prognosis of these patients suggest that this policy should be reevaluated.\nTo explore the current views of U.S. transplant centers toward transplanting asymptomatic HIV-infected patients with end-stage renal disease, a written survey was mailed to the directors of transplantation at all 248 renal transplant centers in the United States.\nAll 148 responding centers said they require HIV testing of prospective kidney recipients, and 84% of these centers would not transplant an individual who refuses HIV testing. The vast majority of responding centers would not transplant a kidney from a cadaveric (88%) or a living donor (91%) into an asymptomatic HIV-infected patient who is otherwise a good candidate for transplantation. Among the few centers that would consider transplanting an HIV-infected patient, not a single center had performed such a transplant in the year prior to the survey. 
Most centers fear that transplantation in the face of HIV infection would be harmful to the individual, and some believe that it would be a waste of precious organs.\nQuestion: Should all human immunodeficiency virus-infected patients with end-stage renal disease be excluded from transplantation?\nAnswer: no\n\nAbstract: In a prospective study 218 preschool children were enrolled (stratified in 2 training programs, one specialized for phonologic awareness in order to prevent dyslexia, the other consisting in training of general perception) during the last year of kindergarten. After finishing the first grade 131 children were compared in their reading and writing abilities.\nIn the whole group only a slight difference was found between both training modalities concerning their writing abilities. However, children with a history of hearing loss, actual hearing loss or pathologic middle ear findings profited most from the specialized training program compared to the control in their reading abilities.\nQuestion: Is a specialised training of phonological awareness indicated in every preschool child?\nAnswer: maybe\n\nAbstract: Epidemiological data show significant associations of vitamin D deficiency and autoimmune diseases. Vitamin D may prevent autoimmunity by stimulating naturally occurring regulatory T cells.\nTo elucidate whether vitamin D supplementation increases Tregs frequency (%Tregs) within circulating CD4+ T cells.\nWe performed an uncontrolled vitamin D supplementation trial among 50 apparently healthy subjects including supplementation of 140,000 IU at baseline and after 4 weeks (visit 1). The final follow-up visit was performed 8 weeks after the baseline examination (visit 2). Blood was drawn at each study visit to determine 25-hydroxyvitamin D levels and %Tregs. 
Tregs were characterized as CD4+CD25++ T cells with expression of the transcription factor forkhead box P3 and low or absent expression of CD127.\nForty-six study participants (65% females, mean age +/- SD 31 +/- 8 years) completed the trial. 25(OH)D levels increased from 23.9 +/- 12.9 ng/ml at baseline to 45.9 +/- 14.0 ng/ml at visit 1 and 58.0 +/- 15.1 ng/ml at visit 2. %Tregs at baseline were 4.8 +/- 1.4. Compared to baseline levels we noticed a significant increase of %Tregs at study visit 1 (5.9 +/- 1.7, P<0.001) and 2 (5.6 +/- 1.6, P<0.001).\nQuestion: Vitamin D supplementation and regulatory T cells in apparently healthy subjects: vitamin D treatment for autoimmune diseases?\nAnswer: yes\n\nAbstract: Anchoring vignettes are brief texts describing a hypothetical character who illustrates a certain fixed level of a trait under evaluation. This research uses vignettes to elucidate factors associated with sleep disorders in adult Japanese before and after adjustment for reporting heterogeneity in self-reports. This study also evaluates the need for adjusting for reporting heterogeneity in the management of sleep and energy related problems in Japan.\nWe investigated a dataset of 1002 respondents aged 18 years and over from the Japanese World Health Survey, which collected information through face-to-face interview from 2002 to 2003. The ordered probit model and the Compound Hierarchical Ordered Probit (CHOPIT) model, which incorporated anchoring vignettes, were employed to estimate and compare associations of sleep and energy with socio-demographic and life-style factors before and after adjustment for differences in response category cut-points for each individual.\nThe prevalence of self-reported problems with sleep and energy was 53 %. Without correction of cut-point shifts, age, sex, and the number of comorbidities were significantly associated with a greater severity of sleep-related problems. 
After correction, age, the number of comorbidities, and regular exercise were significantly associated with a greater severity of sleep-related problems; sex was no longer a significant factor. Compared to the ordered probit model, the CHOPIT model provided two changes with a subtle difference in the magnitude of regression coefficients after correction for reporting heterogeneity.\nQuestion: Is adjustment for reporting heterogeneity necessary in sleep disorders?\nAnswer:", " yes"], "resps": [[[-1.0884227752685547, true]], [[-1.1592693328857422, false]], [[-5.038801193237305, false], [-5.038801193237305, false], [-5.038801193237305, false], [-5.038801193237305, false]]], "filtered_resps": [[-1.0884227752685547, true], [-1.1592693328857422, false], [-5.038801193237305, false]], "acc": 0.0}
{"doc_id": 0, "doc": {"question": "What came into force after the new constitution was herald?", "sentence": "As of that day, the new constitution heralding the Second Republic came into force.", "label": 0, "idx": 0}, "target": "yes", "arguments": ["This airport is Latin America's second busiest and one of the largests in traffic, with daily flights to United States and Canada, mainland Mexico, Central America and the Caribbean, South America, Europe and Asia.\nDoes that sentence have all you need to answer the question \"How many passengers come through Adolfo Lopez Mateos International Airport?\"? no\n\nA more elaborate form of urban AC is the rhythmic oldies format, which focuses primarily on \"old school\" R&B and soul hits from the 1960s to the 1990s, including Motown and disco hits.\nDoes that sentence have all you need to answer the question \"What is the target demographic of the rhythmic oldies format?\"? no\n\nWith the rapid growth of industrial workers in the auto factories, labor unions such as the American Federation of Labor and the United Auto Workers fought to organize workers to gain them better working conditions and wages.\nDoes that sentence have all you need to answer the question \"Who was the labor leader of the Teamsters?\"? no\n\nAs economic and demographic methods were applied to the study of history, the trend was increasingly to see the late Middle Ages as a period of recession and crisis.\nDoes that sentence have all you need to answer the question \"Which countries were the focus of Huizinga's research?\"? no\n\nAs of that day, the new constitution heralding the Second Republic came into force.\nDoes that sentence have all you need to answer the question \"What came into force after the new constitution was herald?\"?", " yes"], "resps": [[[-2.623875617980957, false]], [[-0.15189073979854584, true]]], "filtered_resps": [[-2.623875617980957, false], [-0.15189073979854584, true]], "acc": 0.0}
{"doc_id": 8, "doc": {"question": "Where did Temüjin hide during his escape from the Tayichi'ud?", "sentence": "Temüjin's reputation also became widespread after his escape from the Tayichi'ud.", "label": 1, "idx": 8}, "target": "no", "arguments": ["The one described by Bernard Lewis as \"most degrading\" was the requirement of distinctive clothing, not found in the Quran or hadith but invented in early medieval Baghdad; its enforcement was highly erratic.\nDoes that sentence have all you need to answer the question \"What was the disability described by Bernard Lewis as \"most degrading?\"\"? yes\n\nMethodism identifies principally with the theology of John Wesley—an Anglican priest and evangelist.\nDoes that sentence have all you need to answer the question \"Who was the inspiration for Methodism?\"? yes\n\nShortly after the British occupation began, the Great Fire of New York occurred, a large conflagration on the West Side of Lower Manhattan, which destroyed about a quarter of the buildings in the city, including Trinity Church.\nDoes that sentence have all you need to answer the question \"On what date did the peace conference on Staten Island occur?\"? no\n\nSwitzerland is notable for the variety of grapes grown because of the large variations in terroirs, with their specific mixes of soil, air, altitude and light.\nDoes that sentence have all you need to answer the question \"When were vineyards first cultivated in Switzerland?\"? no\n\nTemüjin's reputation also became widespread after his escape from the Tayichi'ud.\nDoes that sentence have all you need to answer the question \"Where did Temüjin hide during his escape from the Tayichi'ud?\"?", " yes"], "resps": [[[-0.8704229593276978, true]], [[-0.9366308450698853, false]]], "filtered_resps": [[-0.8704229593276978, true], [-0.9366308450698853, false]], "acc": 0.0}
{"doc_id": 1, "doc": {"question": "What is the first major city in the stream of the Rhine?", "sentence": "The most important tributaries in this area are the Ill below of Strasbourg, the Neckar in Mannheim and the Main across from Mainz.", "label": 1, "idx": 1}, "target": "no", "arguments": ["In 1855, Washington Territorial Governor Isaac Stevens negotiated the Hellgate treaty between the United States Government and the Salish, Pend d'Oreille, and the Kootenai people of western Montana, which established boundaries for the tribal nations.\nDoes that sentence have all you need to answer the question \"What did the treaty establish?\"? no\n\nThe tenth museum, the Museum for African Art, joined the ensemble in 2009, however its Museum at 110th Street, the first new museum constructed on the Mile since the Guggenheim in 1959, opened in late 2012.\nDoes that sentence have all you need to answer the question \"When was the Guggenheim built?\"? yes\n\nOn April 12, 1776, the colony became the first to instruct its delegates to the Continental Congress to vote for independence from the British Crown, through the Halifax Resolves passed by the North Carolina Provincial Congress.\nDoes that sentence have all you need to answer the question \"What year did North Carolina instruct its delegates to vote for independence?\"? yes\n\nThe courses are normally tuned in a succession of perfect fifths.\nDoes that sentence have all you need to answer the question \"How many courses does a mandolin commonly have?\"? no\n\nThe most important tributaries in this area are the Ill below of Strasbourg, the Neckar in Mannheim and the Main across from Mainz.\nDoes that sentence have all you need to answer the question \"What is the first major city in the stream of the Rhine?\"?", " yes"], "resps": [[[-0.7019525766372681, true]], [[-0.8973566293716431, false]]], "filtered_resps": [[-0.7019525766372681, true], [-0.8973566293716431, false]], "acc": 0.0}
{"doc_id": 9, "doc": {"question": "What are the most active parts of ctenophora?", "sentence": "These branch through the mesoglea to the most active parts of the animal: the mouth and pharynx; the roots of the tentacles, if present; all along the underside of each comb row; and four branches round the sensory complex at the far end from the mouth – two of these four branches terminate in anal pores.", "label": 0, "idx": 9}, "target": "yes", "arguments": ["If the compensation system uses chains, the chain is guided by a bar mounted between the counterweight railway lines.\nDoes that sentence have all you need to answer the question \"What is the chain guided by in a compensation system that uses chains?\"? yes\n\nDecoding, on the other hand, is carefully defined in the standard.\nDoes that sentence have all you need to answer the question \"The ISO/IEC high standard document states that the decompressed output produced from a given MP3 file will be the same within what standards?\"? no\n\nWhile many regional heads of state tried to emulate Nasser, Podeh opined that the \"parochialism\" of successive Arab leaders \"transformed imitation [of Nasser] into parody\".\nDoes that sentence have all you need to answer the question \"Which leader considered Nasser his hero?\"? no\n\nMore recently, new fossil and molecular evidence is providing an increasingly clear picture of the evolution of modern bird orders.\nDoes that sentence have all you need to answer the question \"What do scientists tend to agree on?\"? 
no\n\nThese branch through the mesoglea to the most active parts of the animal: the mouth and pharynx; the roots of the tentacles, if present; all along the underside of each comb row; and four branches round the sensory complex at the far end from the mouth – two of these four branches terminate in anal pores.\nDoes that sentence have all you need to answer the question \"What are the most active parts of ctenophora?\"?", " yes"], "resps": [[[-1.3451168537139893, false]], [[-0.43398401141166687, true]]], "filtered_resps": [[-1.3451168537139893, false], [-0.43398401141166687, true]], "acc": 0.0}
{"doc_id": 2, "doc": {"question": "What is the minimum required if you want to teach in Canada?", "sentence": "In most provinces a second Bachelor's Degree such as a Bachelor of Education is required to become a qualified teacher.", "label": 1, "idx": 2}, "target": "no", "arguments": ["Sandra Laing is a South African woman who was classified as Coloured by authorities during the apartheid era, due to her skin colour and hair texture, although her parents could prove at least three generations of European ancestors.\nDoes that sentence have all you need to answer the question \"At was age was Sandra Laing expelled from school?\"? no\n\nEaster was the Sunday after the 15th day of this moon, whose 14th day was allowed to precede the equinox.\nDoes that sentence have all you need to answer the question \"On what border of the Byzantine Empire were the last holdouts for celebrating according the Alexandrian Easter?\"? no\n\nWhen Emperor Haile Selassie unilaterally dissolved the Eritrean parliament and annexed the country in 1962, the Eritrean Liberation Front (ELF) waged an armed struggle for independence.\nDoes that sentence have all you need to answer the question \"How long did the Eritrean War for Independence last?\"? no\n\nWinter sports are practiced by the natives and tourists since the second half of the 19th century with the invention of bobsleigh in St. Moritz.\nDoes that sentence have all you need to answer the question \"What 3 mountain sports are among the most popular in Switzerland?\"? no\n\nIn most provinces a second Bachelor's Degree such as a Bachelor of Education is required to become a qualified teacher.\nDoes that sentence have all you need to answer the question \"What is the minimum required if you want to teach in Canada?\"?", " yes"], "resps": [[[-1.9936968088150024, false]], [[-0.22315843403339386, true]]], "filtered_resps": [[-1.9936968088150024, false], [-0.22315843403339386, true]], "acc": 1.0}
{"doc_id": 10, "doc": {"question": "Who decides the fate of protesters most of the time?", "sentence": "Brownlee argues, \"Bringing in deterrence at the level of justification detracts from the law’s engagement in a moral dialogue with the offender as a rational person because it focuses attention on the threat of punishment and not the moral reasons to follow this law.\"", "label": 1, "idx": 10}, "target": "no", "arguments": ["In November 2009, 1080p HD support was added.\nDoes that sentence have all you need to answer the question \"When was 720p HD support added to youtube?\"? no\n\nOn 3 May 1505 King Alexander I Jagiellon granted the Act of \"Nihil novi nisi commune consensu\" (Latin: \"I accept nothing new except by common consent\").\nDoes that sentence have all you need to answer the question \"Who granted the act Act of \"Nihil novi nisi commune consensu\"?\"? yes\n\nIt is the seat of Wayne County, the most populous county in the state.\nDoes that sentence have all you need to answer the question \"What is the name of the river that runs through Detroit?\"? no\n\nMuch of the growth has occurred after World War II, when decolonization of Africa and abolition of various restrictions against Protestants in Latin American countries occurred.\nDoes that sentence have all you need to answer the question \"When did much of the spread of Protestantism occur in the 20th century?\"? yes\n\nBrownlee argues, \"Bringing in deterrence at the level of justification detracts from the law’s engagement in a moral dialogue with the offender as a rational person because it focuses attention on the threat of punishment and not the moral reasons to follow this law.\"\nDoes that sentence have all you need to answer the question \"Who decides the fate of protesters most of the time?\"?", " yes"], "resps": [[[-0.8131738305091858, true]], [[-0.9923344254493713, false]]], "filtered_resps": [[-0.8131738305091858, true], [-0.9923344254493713, false]], "acc": 0.0}
{"doc_id": 3, "doc": {"question": "How was Temüjin kept imprisoned by the Tayichi'ud?", "sentence": "The Tayichi'ud enslaved Temüjin (reportedly with a cangue, a sort of portable stocks), but with the help of a sympathetic guard, the father of Chilaun (who later became a general of Genghis Khan), he was able to escape from the ger (yurt) in the middle of the night by hiding in a river crevice.[citation needed]", "label": 0, "idx": 3}, "target": "yes", "arguments": ["In the same publication, Feynman also talks about his worries in the atomic bomb age, feeling for some considerable time that there was a high risk that the bomb would be used again soon, so that it was pointless to build for the future.\nDoes that sentence have all you need to answer the question \"After feeling guilty for helping make an atomic bomb, Feynman went through what mental disorder?\"? no\n\nThe infrared channel, in combination with the other channels, is used to detect the location of scratches and dust.\nDoes that sentence have all you need to answer the question \"What is the name of the technique used in scanners to minimize the effects of dust and scratches?\"? no\n\nThe Super Nintendo Entertainment System (officially abbreviated the Super NES[b] or SNES[c], and commonly shortened to Super Nintendo[d]) is a 16-bit home video game console developed by Nintendo that was released in 1990 in Japan and South Korea, 1991 in North America, 1992 in Europe and Australasia (Oceania), and 1993 in South America.\nDoes that sentence have all you need to answer the question \"What was the SNES called in Japan?\"? no\n\nCurrently, the largest professional wrestling company worldwide is the United States-based WWE, which bought out many smaller regional companies in the late 20th century, as well as its primary US competitors WCW and Extreme Championship Wrestling (ECW) in early 2001.\nDoes that sentence have all you need to answer the question \"What is the biggest wrestling company?\"? 
yes\n\nThe Tayichi'ud enslaved Temüjin (reportedly with a cangue, a sort of portable stocks), but with the help of a sympathetic guard, the father of Chilaun (who later became a general of Genghis Khan), he was able to escape from the ger (yurt) in the middle of the night by hiding in a river crevice.[citation needed]\nDoes that sentence have all you need to answer the question \"How was Temüjin kept imprisoned by the Tayichi'ud?\"?", " yes"], "resps": [[[-0.9148472547531128, false]], [[-0.9005993604660034, true]]], "filtered_resps": [[-0.9148472547531128, false], [-0.9005993604660034, true]], "acc": 0.0}
{"doc_id": 11, "doc": {"question": "What act sets forth the functions of the Scottish Parliament?", "sentence": "The Scotland Act 1998, which was passed by the Parliament of the United Kingdom and given royal assent by Queen Elizabeth II on 19 November 1998, governs the functions and role of the Scottish Parliament and delimits its legislative competence.", "label": 0, "idx": 11}, "target": "yes", "arguments": ["The Arabic population kept slaves well into the 20th century, until slavery was suppressed by French authorities around the mid-20th century.\nDoes that sentence have all you need to answer the question \"What region of the country is historical slavery well known?\"? no\n\nPhilip V, who came to power when Doson died in 221 BC, was the last Macedonian ruler with both the talent and the opportunity to unite Greece and preserve its independence against the \"cloud rising in the west\": the ever-increasing power of Rome.\nDoes that sentence have all you need to answer the question \"What was Philip V known as?\"? no\n\nFederal law originates with the Constitution, which gives Congress the power to enact statutes for certain limited purposes like regulating interstate commerce.\nDoes that sentence have all you need to answer the question \"What gives Congress limited power to enact statutes?\"? yes\n\nA government report covered by the Guardian in 2002 indicates that between 1940 and 1979, the Ministry of Defence \"turned large parts of the country into a giant laboratory to conduct a series of secret germ warfare tests on the public\" and many of these tests \"involved releasing potentially dangerous chemicals and micro-organisms over vast swaths of the population without the public being told.\nDoes that sentence have all you need to answer the question \"When was the supposed germ warfare testing taking place?\"? 
yes\n\nThe Scotland Act 1998, which was passed by the Parliament of the United Kingdom and given royal assent by Queen Elizabeth II on 19 November 1998, governs the functions and role of the Scottish Parliament and delimits its legislative competence.\nDoes that sentence have all you need to answer the question \"What act sets forth the functions of the Scottish Parliament?\"?", " yes"], "resps": [[[-0.5465704798698425, true]], [[-1.1663055419921875, false]]], "filtered_resps": [[-0.5465704798698425, true], [-1.1663055419921875, false]], "acc": 1.0}
{"doc_id": 4, "doc": {"question": "What did Herr Gott, dich loben wir become known as ?", "sentence": "He paraphrased the Te Deum as \"Herr Gott, dich loben wir\" with a simplified form of the melody.", "label": 1, "idx": 4}, "target": "no", "arguments": ["New York is the most important source of political fundraising in the United States, as four of the top five ZIP codes in the nation for political contributions are in Manhattan.\nDoes that sentence have all you need to answer the question \"Four-fifths of the ZIP codes that provide the highest amount of political contributions in the United States are located in what borough?\"? yes\n\nOn May 26, 2009, Microsoft announced the future release of the Zune HD (in the fall of 2009), the next addition to the Zune product range.\nDoes that sentence have all you need to answer the question \"What was the new name of this Live video store?\"? no\n\nShe was told by Gaddafi's father that even as a child he had been \"always serious, even taciturn\", a trait he also exhibited in adulthood.\nDoes that sentence have all you need to answer the question \"Who stated that Gaddafi was fond of children?\"? no\n\nTools crafted by proto-humans that have been dated back two million years have been discovered in the northwestern part of the subcontinent.\nDoes that sentence have all you need to answer the question \"What type of remains were found in the Narmada Valley?\"? no\n\nHe paraphrased the Te Deum as \"Herr Gott, dich loben wir\" with a simplified form of the melody.\nDoes that sentence have all you need to answer the question \"What did Herr Gott, dich loben wir become known as ?\"?", " yes"], "resps": [[[-1.176910400390625, false]], [[-0.501701295375824, true]]], "filtered_resps": [[-1.176910400390625, false], [-0.501701295375824, true]], "acc": 1.0}
{"doc_id": 12, "doc": {"question": "What is the name for a response of the immune system that damages the body's native tissues?", "sentence": "Hypersensitivity is an immune response that damages the body's own tissues.", "label": 0, "idx": 12}, "target": "yes", "arguments": ["The anarchist Proudhon (best known for declaring that \"property is theft\") used the word \"humanism\" to describe a \"culte, déification de l’humanité\" (\"worship, deification of humanity\") and Ernest Renan in L’avenir de la science: pensées de 1848 (\"The Future of Knowledge: Thoughts on 1848\") (1848–49), states: \"\nDoes that sentence have all you need to answer the question \"Who felt that humanism would surely be a major \"religion\" today?\"? yes\n\nOf these, the Bureau of Land Management manages 87 million acres (35 million hectares), or 23.8% of the state.\nDoes that sentence have all you need to answer the question \"How much of the state is controlled by the Bureau of Land Management?\"? yes\n\nOthers have more or less equated postmodern music with the \"contemporary music\" composed from the late 20th century through to the early 21st century.\nDoes that sentence have all you need to answer the question \"Postmodern music is also know as what?\"? yes\n\nIn the 15th and 16th centuries, the language spread worldwide as Portugal established a colonial and commercial empire between 1415 and 1999.\nDoes that sentence have all you need to answer the question \"Between what years did Portugal establish a colonial and commercial empire?\"? yes\n\nHypersensitivity is an immune response that damages the body's own tissues.\nDoes that sentence have all you need to answer the question \"What is the name for a response of the immune system that damages the body's native tissues?\"?", " yes"], "resps": [[[-0.6128409504890442, true]], [[-2.741220712661743, false]]], "filtered_resps": [[-0.6128409504890442, true], [-2.741220712661743, false]], "acc": 1.0}
{"doc_id": 5, "doc": {"question": "What year did the the case go before the supreme court?", "sentence": "For example, Joseph Haas was arrested for allegedly sending an email to the Lebanon, New Hampshire city councilors stating, \"Wise up or die.\"", "label": 1, "idx": 5}, "target": "no", "arguments": ["The coset eN = N serves as the identity in this group, and the inverse of gN in the quotient group is (gN)−1 = (g−1)N.e[›]\nDoes that sentence have all you need to answer the question \"What group does the inverse of gN include?\"? yes\n\nDietary reference values for water or adequate daily intakes (ADI).\nDoes that sentence have all you need to answer the question \"What does ADI stand for?\"? yes\n\nOn October 1, 2011, Kanye West premiered his women's fashion label, DW Kanye West at Paris Fashion Week.\nDoes that sentence have all you need to answer the question \"What was the name of Kanye West's women's clothing line?\"? yes\n\nChrome's user-base continued to grow and in May 2012, Chrome's usage passed the usage of all versions of Internet Explorer combined.\nDoes that sentence have all you need to answer the question \"When did Chrome become more used than all versions of Internet Explorer?\"? yes\n\nFor example, Joseph Haas was arrested for allegedly sending an email to the Lebanon, New Hampshire city councilors stating, \"Wise up or die.\"\nDoes that sentence have all you need to answer the question \"What year did the the case go before the supreme court?\"?", " yes"], "resps": [[[-1.6329673528671265, true]], [[-3.1929268836975098, false]]], "filtered_resps": [[-1.6329673528671265, true], [-3.1929268836975098, false]], "acc": 0.0}
{"doc_id": 13, "doc": {"question": "When did the Warsaw Uprising begin?", "sentence": "the Polish government-in-exile in London gave orders to the underground Home Army (AK) to try to seize control of Warsaw from the Germans before the Red Army arrived.", "label": 1, "idx": 13}, "target": "no", "arguments": ["The Spanish thought the Great Plains were the location of the mythological Quivira and Cíbola, a place said to be rich in gold.\nDoes that sentence have all you need to answer the question \"who was the spanish conquistador who had encounters with europeans and native americans?\"? no\n\nEach term for the mayor and council members lasts four years and has a three consecutive-term limit, but can resume after a four-year break.\nDoes that sentence have all you need to answer the question \"What is the official journal of New York City?\"? no\n\nIn the mid-18th century, Paris became the center of an explosion of philosophic and scientific activity challenging traditional doctrines and dogmas.\nDoes that sentence have all you need to answer the question \"Which city in the mid-18th century became the center of an explosion of philosophic and scientific activity?\"? yes\n\nSince the middle of the 19th century, Masonic historians have sought the origins of the movement in a series of similar documents known as the Old Charges, dating from the Regius Poem in about 1425 to the beginning of the 18th century.\nDoes that sentence have all you need to answer the question \"The fifteenth century also shows evidence of what in Masonic history?\"? 
no\n\nthe Polish government-in-exile in London gave orders to the underground Home Army (AK) to try to seize control of Warsaw from the Germans before the Red Army arrived.\nDoes that sentence have all you need to answer the question \"When did the Warsaw Uprising begin?\"?", " yes"], "resps": [[[-2.2435457706451416, false]], [[-0.20550179481506348, true]]], "filtered_resps": [[-2.2435457706451416, false], [-0.20550179481506348, true]], "acc": 1.0}
{"doc_id": 6, "doc": {"question": "What does UMC stand for?", "sentence": "Founded in 1968 by the union of the Methodist Church (USA) and the Evangelical United Brethren Church, the UMC traces its roots back to the revival movement of John and Charles Wesley in England as well as the Great Awakening in the United States.", "label": 1, "idx": 6}, "target": "no", "arguments": ["During this period winter sports were slowly introduced: in 1882 the first figure skating championship was held in St. Moritz, and downhill skiing became a popular sport with English visitors early in the 20th century, as the first ski-lift was installed in 1908 above Grindelwald.\nDoes that sentence have all you need to answer the question \"When was the first figure skating championship held?\"? yes\n\nUnder Idris, Libya's armed forces were trained by the British military; this angered Gaddafi, who viewed the British as imperialists, and accordingly he refused to learn English and was rude to the British officers, ultimately failing his exams.\nDoes that sentence have all you need to answer the question \"Why didn't Gaddafi learn to speak English?\"? yes\n\nBrace has criticized this, the practice of forensic anthropologists for using the controversial concept \"race\" out of convention when they in fact should be talking about regional ancestry.\nDoes that sentence have all you need to answer the question \"Why is it bad that a category is merely socially constructed?\"? no\n\nBeer served unchilled—either cool or at room temperature, reveal more of their flavours.\nDoes that sentence have all you need to answer the question \"What technology support the drinking of chilled beer?\"? 
no\n\nFounded in 1968 by the union of the Methodist Church (USA) and the Evangelical United Brethren Church, the UMC traces its roots back to the revival movement of John and Charles Wesley in England as well as the Great Awakening in the United States.\nDoes that sentence have all you need to answer the question \"What does UMC stand for?\"?", " yes"], "resps": [[[-1.5628972053527832, false]], [[-0.9259495735168457, true]]], "filtered_resps": [[-1.5628972053527832, false], [-0.9259495735168457, true]], "acc": 1.0}
{"doc_id": 14, "doc": {"question": "The Tower District is centered around which historic theatre?", "sentence": "The theater was built in 1939 and is at Olive and Wishon Avenues in the heart of the Tower District.", "label": 1, "idx": 14}, "target": "no", "arguments": ["The Hebrew University of Jerusalem and Tel Aviv University are ranked among the world's top 100 universities by Times Higher Education magazine.\nDoes that sentence have all you need to answer the question \"How many private colleges does Israel have?\"? no\n\nThe Museum is within a short walk from the Boudhnath stupa, which itself can be seen from the Museum tower.\nDoes that sentence have all you need to answer the question \"What is the Taragaon Museum dedicated to?\"? no\n\nIt was occupied by piratical kingdoms of the Barbary Coast, de facto independent since the 18th century.\nDoes that sentence have all you need to answer the question \"What was one region included that was occupied by piratical kingdoms?\"? no\n\nFrom 1903 onwards, 12-inch records (30.5 cm) were also sold commercially, mostly of classical music or operatic selections, with four to five minutes of music per side.\nDoes that sentence have all you need to answer the question \"How long could a recording be on a 10 inch disc in the 1903?\"? yes\n\nThe theater was built in 1939 and is at Olive and Wishon Avenues in the heart of the Tower District.\nDoes that sentence have all you need to answer the question \"The Tower District is centered around which historic theatre?\"?", " yes"], "resps": [[[-0.321319043636322, true]], [[-1.3541712760925293, false]]], "filtered_resps": [[-0.321319043636322, true], [-1.3541712760925293, false]], "acc": 0.0}
{"doc_id": 7, "doc": {"question": "What is the name of the professional skateboarder that lives in southern California?", "sentence": "Southern California is also important to the world of yachting.", "label": 1, "idx": 7}, "target": "no", "arguments": ["The Srijana Contemporary Art Gallery, located inside the Bhrikutimandap Exhibition grounds, hosts the work of contemporary painters and sculptors, and regularly organizes exhibitions.\nDoes that sentence have all you need to answer the question \"In which of Nepal's art galleries can one take an art class?\"? yes\n\nDuring the 1970s and 1980s Theo Angelopoulos directed a series of notable and appreciated movies.\nDoes that sentence have all you need to answer the question \"Who directed the film Eternity and a Day?\"? yes\n\nForms of lighting include alcove lighting, which like most other uplighting is indirect.\nDoes that sentence have all you need to answer the question \"When was fluorescent lighting first available?\"? no\n\nJehovah's Witnesses are directed by the Governing Body of Jehovah's Witnesses, a group of elders in Brooklyn, New York, which establishes all doctrines based on its interpretations of the Bible.\nDoes that sentence have all you need to answer the question \"What are the Jehovah Witnesses?\"? no\n\nSouthern California is also important to the world of yachting.\nDoes that sentence have all you need to answer the question \"What is the name of the professional skateboarder that lives in southern California?\"?", " yes"], "resps": [[[-0.8552224636077881, false]], [[-0.7644964456558228, true]]], "filtered_resps": [[-0.8552224636077881, false], [-0.7644964456558228, true]], "acc": 1.0}
{"doc_id": 15, "doc": {"question": "What is the English translation of Het Scheur?", "sentence": "The largest and southern main branch begins as Waal and continues as Boven Merwede (\"Upper Merwede\"), Beneden Merwede (\"Lower Merwede\"), Noord River (\"North River\"), Nieuwe Maas (\"New Meuse\"), Het Scheur (\"the Rip\") and Nieuwe Waterweg (\"New Waterway\").", "label": 0, "idx": 15}, "target": "yes", "arguments": ["Efforts to deploy modern computers and networking equipment were generally successful, but attempts to develop new investigation software, outsourced to Science Applications International Corporation (SAIC), were not.\nDoes that sentence have all you need to answer the question \"What project centered on upgrading FBI Information Technology Infrastructure?\"? no\n\nThe Government of Texas, through Section 2054.116 of the Government Code, mandates that state agencies provide information on their websites in Spanish to assist residents who have limited English proficiency.\nDoes that sentence have all you need to answer the question \"What is Texas official language?\"? no\n\nIn the field of communication sciences, critical organizational scholars have examined the role of emotions in organizations, from the perspectives of managers, employees, and even customers.\nDoes that sentence have all you need to answer the question \"What field of study studies the organizational role of emotions?\"? yes\n\nBeiDou-2 (formerly known as COMPASS) is not an extension to the older BeiDou-1, but rather supersedes it outright.\nDoes that sentence have all you need to answer the question \"How many geostationary orbit satellites will the BeiDou-2 system have?\"? 
no\n\nThe largest and southern main branch begins as Waal and continues as Boven Merwede (\"Upper Merwede\"), Beneden Merwede (\"Lower Merwede\"), Noord River (\"North River\"), Nieuwe Maas (\"New Meuse\"), Het Scheur (\"the Rip\") and Nieuwe Waterweg (\"New Waterway\").\nDoes that sentence have all you need to answer the question \"What is the English translation of Het Scheur?\"?", " yes"], "resps": [[[-1.1684300899505615, false]], [[-0.4478108286857605, true]]], "filtered_resps": [[-1.1684300899505615, false], [-0.4478108286857605, true]], "acc": 0.0}
{"doc_id": 0, "doc": {"sentence": "Sarah was a much better surgeon than Maria so _ always got the easier cases.", "option1": "Sarah", "option2": "Maria", "answer": "2"}, "target": "always got the easier cases.", "arguments": ["Sarah was a much better surgeon than Maria so Sarah", " always got the easier cases."], "resps": [[[-27.389801025390625, false]], [[-26.199748992919922, false]]], "filtered_resps": [[-27.389801025390625, false], [-26.199748992919922, false]], "acc": 1.0}
{"doc_id": 1, "doc": {"sentence": "Sarah was a much better surgeon than Maria so _ always got the harder cases.", "option1": "Sarah", "option2": "Maria", "answer": "1"}, "target": "always got the harder cases.", "arguments": ["Sarah was a much better surgeon than Maria so Sarah", " always got the harder cases."], "resps": [[[-26.916120529174805, false]], [[-25.329540252685547, false]]], "filtered_resps": [[-26.916120529174805, false], [-25.329540252685547, false]], "acc": 0.0}
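The `acc` values in the yes/no samples logged above look consistent with a simple rule: pick the choice with the highest log-likelihood in `filtered_resps` and compare its index with the gold index. The sketch below reproduces that check for two of those samples. It assumes the gold index equals the `label` field for these records, and `score_sample` is a hypothetical helper written for illustration, not a function of the evaluation harness.

```python
def score_sample(filtered_resps, gold_index):
    """Return 1.0 if the highest log-likelihood choice is the gold one, else 0.0."""
    # Each entry is [log-likelihood, is_greedy]; only the log-likelihood is used here.
    loglikelihoods = [ll for ll, _is_greedy in filtered_resps]
    predicted = max(range(len(loglikelihoods)), key=loglikelihoods.__getitem__)
    return 1.0 if predicted == gold_index else 0.0


# Values copied from the "teach in Canada" sample (label 1, logged acc 1.0).
correct = score_sample(
    [[-1.9936968088150024, False], [-0.22315843403339386, True]], gold_index=1
)
# Values copied from the "new constitution" sample (label 0, logged acc 0.0).
wrong = score_sample(
    [[-2.623875617980957, False], [-0.15189073979854584, True]], gold_index=0
)
print(correct, wrong)
```

Both results agree with the `acc` fields in the corresponding records.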
{
"results": {
"boolq": {
"acc,none": 0.5859327217125382,
"acc_stderr,none": 0.008614932353134956
}
},
"configs": {
"boolq": {
"task": "boolq",
"group": [
"super-glue-lm-eval-v1"
],
"dataset_path": "super_glue",
"dataset_name": "boolq",
"training_split": "train",
"validation_split": "validation",
"doc_to_text": "{{passage}}\nQuestion: {{question}}\nAnswer:",
"doc_to_target": "label",
"doc_to_choice": {
"0": "no",
"1": "yes"
},
"description": "",
"target_delimiter": " ",
"fewshot_delimiter": "\n\n",
"num_fewshot": 0,
"metric_list": [
{
"metric": "acc"
}
],
"output_type": "multiple_choice",
"repeats": 1,
"should_decontaminate": false
}
},
"versions": {
"boolq": "Yaml"
},
"config": {
"model": "hf",
"model_args": "pretrained=facebook/xglm-1.7B",
"num_fewshot": 0,
"batch_size": 4,
"batch_sizes": [],
"device": null,
"use_cache": null,
"limit": null,
"bootstrap_iters": 100000
},
"git_hash": "c37ad6e"
}
\ No newline at end of file
{
"results": {
"cb": {
"acc,none": 0.5,
"acc_stderr,none": 0.5,
"f1,none": 0.2222222222222222
}
},
"configs": {
"cb": {
"task": "cb",
"group": [
"super-glue-lm-eval-v1"
],
"dataset_path": "super_glue",
"dataset_name": "cb",
"training_split": "train",
"validation_split": "validation",
"doc_to_text": "{{premise}}\nQuestion: {{hypothesis}}. True, False, or Neither?\nAnswer:",
"doc_to_target": "label",
"doc_to_choice": [
"True",
"False",
"Neither"
],
"description": "",
"target_delimiter": " ",
"fewshot_delimiter": "\n\n",
"num_fewshot": 4,
"metric_list": [
{
"metric": "acc"
},
{
"metric": "f1",
"aggregation": "<function cb_multi_fi at 0x7f3212743d30>"
}
],
"output_type": "multiple_choice",
"repeats": 1,
"should_decontaminate": false
}
},
"versions": {
"cb": "Yaml"
},
"config": {
"model": "hf",
"model_args": "",
"num_fewshot": 4,
"batch_size": 1,
"batch_sizes": [],
"device": null,
"use_cache": null,
"limit": 2.0,
"bootstrap_iters": 100000
},
"git_hash": "656c310"
}
\ No newline at end of file
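The `cb` run above was limited to two documents (`"limit": 2.0`) and reports `acc,none` and `acc_stderr,none` both equal to 0.5. A minimal sketch of how such a pair of numbers can arise, assuming the standard error is the sample standard deviation (n - 1 in the denominator) divided by sqrt(n); the helper name `mean_stderr` and the per-document scores `[1.0, 0.0]` are hypothetical and not taken from the harness:

```python
import math


def mean_stderr(scores):
    """Standard error of the mean, using the sample variance (n - 1)."""
    n = len(scores)
    mean = sum(scores) / n
    sample_var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return math.sqrt(sample_var / n)


# Hypothetical per-document accuracies for a two-document run:
# one correct, one incorrect.
scores = [1.0, 0.0]
print(sum(scores) / len(scores), mean_stderr(scores))  # prints: 0.5 0.5
```

With only two documents the standard error is as large as the accuracy itself, so the `cb` numbers here (and those of any other run with `"limit": 2.0`) look more like a smoke test than a meaningful score.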
{
"results": {
"boolq-seq2seq": {
"exact_match,none": 0.0,
"exact_match_stderr,none": 0.0
}
},
"configs": {
"boolq-seq2seq": {
"task": "boolq-seq2seq",
"group": [
"super-glue-lm-eval-v1-seq2seq"
],
"dataset_path": "super_glue",
"dataset_name": "boolq",
"training_split": "train",
"validation_split": "validation",
"doc_to_text": "{{passage}}\nQuestion: {{question}}\nAnswer:",
"doc_to_target": "{{['no', 'yes'][label]}}",
"description": "",
"target_delimiter": " ",
"fewshot_delimiter": "\n\n",
"num_fewshot": 4,
"metric_list": [
{
"metric": "exact_match",
"aggregation": "mean",
"higher_is_better": true,
"ignore_case": true,
"ignore_punctuation": true
}
],
"output_type": "greedy_until",
"generation_kwargs": {
"until": [
"\n\n",
"\n"
],
"do_sample": false,
"temperature": 0.0
},
"repeats": 1,
"should_decontaminate": false
}
},
"versions": {
"boolq-seq2seq": "Yaml"
},
"config": {
"model": "hf",
"model_args": "",
"num_fewshot": 4,
"batch_size": 1,
"batch_sizes": [],
"device": "cuda",
"use_cache": null,
"limit": 16.0,
"bootstrap_iters": 100000
},
"git_hash": "ae41f67"
}
\ No newline at end of file
{"doc_id": 0, "doc": {"question": "Who was President when the first Peanuts cartoon was published?", "question_id": "tc_0", "question_source": "http://www.triviacountry.com/", "entity_pages": {"doc_source": [], "filename": [], "title": [], "wiki_context": []}, "search_results": {"description": [], "filename": [], "rank": [], "title": [], "url": [], "search_context": []}, "answer": {"aliases": ["33rd President of the United States", "H. S. Truman", "H. Truman", "H.S. Truman", "HST (president)", "Harold Truman", "Harry S Truman", "Harry S. Truman", "Harry S.Truman", "Harry Shipp Truman", "Harry Shippe Truman", "Harry Solomon Truman", "Harry Truman", "Harry Truman's", "Harry truman", "Hary truman", "Mary Jane Truman", "Mr. Citizen", "Presidency of Harry S. Truman", "Presidency of Harry Truman", "President Harry Truman", "President Truman", "S truman", "Truman Administration", "Truman administration"], "normalized_aliases": ["presidency of harry s truman", "33rd president of united states", "truman administration", "s truman", "mr citizen", "harry truman s", "harry truman", "hary truman", "harry shipp truman", "h truman", "harry shippe truman", "h s truman", "president truman", "president harry truman", "hst president", "presidency of harry truman", "mary jane truman", "harry solomon truman", "harold truman", "harry s truman"], "matched_wiki_entity_name": "", "normalized_matched_wiki_entity_name": "", "normalized_value": "harry truman", "type": "WikipediaEntity", "value": "Harry Truman"}}, "target": " Harry Truman", "arguments": ["Q:Who was President when the first Peanuts cartoon was published? A:", {"do_sample": false, "temperature": 0.0}], "resps": [[" President"]], "filtered_resps": [" President"], "acc": 0, "f1": "tc_0"}
{"doc_id": 1, "doc": {"question": "Which American-born Sinclair won the Nobel Prize for Literature in 1930?", "question_id": "tc_1", "question_source": "http://www.triviacountry.com/", "entity_pages": {"doc_source": [], "filename": [], "title": [], "wiki_context": []}, "search_results": {"description": [], "filename": [], "rank": [], "title": [], "url": [], "search_context": []}, "answer": {"aliases": ["(Harry) Sinclair Lewis", "Grace Hegger", "Harry Sinclair Lewis", "Lewis, (Harry) Sinclair", "Sinclair Lewis"], "normalized_aliases": ["grace hegger", "lewis harry sinclair", "sinclair lewis", "harry sinclair lewis"], "matched_wiki_entity_name": "", "normalized_matched_wiki_entity_name": "", "normalized_value": "sinclair lewis", "type": "WikipediaEntity", "value": "Sinclair Lewis"}}, "target": " Sinclair Lewis", "arguments": ["Q:Which American-born Sinclair won the Nobel Prize for Literature in 1930? A:", {"do_sample": false, "temperature": 0.0}], "resps": [[" Sinclair"]], "filtered_resps": [" Sinclair"], "acc": 0, "f1": "tc_1"}
{"doc_id": 0, "doc": {"sentence": "Sarah was a much better surgeon than Maria so _ always got the easier cases.", "option1": "Sarah", "option2": "Maria", "answer": "2"}, "target": "always got the easier cases.", "arguments": ["Sarah was a much better surgeon than Maria so Sarah", " always got the easier cases."], "resps": [[[-15.5546875, false]], [[-15.78125, false]]], "filtered_resps": [[-15.5546875, false], [-15.78125, false]], "acc": 0.0}
{"doc_id": 1, "doc": {"sentence": "Sarah was a much better surgeon than Maria so _ always got the harder cases.", "option1": "Sarah", "option2": "Maria", "answer": "1"}, "target": "always got the harder cases.", "arguments": ["Sarah was a much better surgeon than Maria so Sarah", " always got the harder cases."], "resps": [[[-17.328125, false]], [[-17.21875, false]]], "filtered_resps": [[-17.328125, false], [-17.21875, false]], "acc": 0.0}
{"doc_id": 2, "doc": {"sentence": "They were worried the wine would ruin the bed and the blanket, but the _ was't ruined.", "option1": "blanket", "option2": "bed", "answer": "2"}, "target": "was't ruined.", "arguments": ["They were worried the wine would ruin the bed and the blanket, but the blanket", " was't ruined."], "resps": [[[-17.5625, false]], [[-18.171875, false]]], "filtered_resps": [[-17.5625, false], [-18.171875, false]], "acc": 0.0}
{"doc_id": 3, "doc": {"sentence": "Terry tried to bake the eggplant in the toaster oven but the _ was too big.", "option1": "eggplant", "option2": "toaster", "answer": "1"}, "target": "was too big.", "arguments": ["Terry tried to bake the eggplant in the toaster oven but the eggplant", " was too big."], "resps": [[[-4.9296875, true]], [[-7.11328125, false]]], "filtered_resps": [[-4.9296875, true], [-7.11328125, false]], "acc": 1.0}
{
"results": {
"winogrande": {
"acc,none": 0.5,
"acc_stderr,none": 0.5
}
},
"configs": {
"winogrande": {
"task": "winogrande",
"dataset_path": "winogrande",
"dataset_name": "winogrande_xl",
"training_split": "train",
"validation_split": "validation",
"doc_to_text": "<function doc_to_text at 0x7efcf0c7b1f0>",
"doc_to_target": "<function doc_to_target at 0x7efcf0c7be50>",
"gold_alias": "<function gold_alias at 0x7efcf0aaf1f0>",
"description": "",
"target_delimiter": " ",
"fewshot_delimiter": "\n\n",
"num_fewshot": 0,
"metric_list": [
{
"metric": "acc",
"aggregation": "mean",
"higher_is_better": true
}
],
"output_type": "multiple_choice",
"repeats": 1,
"should_decontaminate": false
}
},
"versions": {
"winogrande": "Yaml"
},
"config": {
"model": "hf",
"model_args": "",
"num_fewshot": 0,
"batch_size": 1,
"batch_sizes": [],
"device": null,
"use_cache": null,
"limit": 2.0,
"bootstrap_iters": 100000
},
"git_hash": "656c310"
}
\ No newline at end of file