Commit f9cc0267 authored by Leo Gao

Use hashed version stability test instead

parent 10d4b64a
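
As a rough illustration of what a hashed stability check could look like: the 64-character hex strings in the files below are consistent with SHA-256 digests, but the exact hashing scheme is not shown on this page. The sketch below assumes the digest is taken over a canonical JSON dump of the request list; the function name `fingerprint_requests` and the canonicalization steps are hypothetical, not taken from the commit.

```python
import hashlib
import json


def fingerprint_requests(requests):
    """Return a SHA-256 hex digest for a list of [context, continuation] pairs.

    Assumption: requests are JSON-serializable. They are sorted by their
    JSON text so the digest does not depend on the order they were built in.
    """
    ordered = sorted(requests, key=json.dumps)
    canonical = json.dumps(ordered, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two requests in the same shape as the data files in this diff.
example_requests = [
    ["Problem: Compute $\\cos 180^\\circ$.\nAnswer:", ["\n"]],
    ["Problem: What is the value of $6 + (8 \\div 2)$?\nAnswer:", ["\n"]],
]
example_digest = fingerprint_requests(example_requests)
```

Storing only the digest keeps the test data small: a stability test can recompute the digest from freshly generated requests and compare it with the stored string, at the cost of no longer showing which request changed when the comparison fails.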
752cdf343d7152e476b0273065024f6ea0e0f47ea385c6bdf9067736cb39724a
\ No newline at end of file
[["Problem: Compute $36^{10} \\div 6^{19}$.\nAnswer:", ["\n"]], ["Problem: A math club is having a bake sale as a fundraiser to raise money for an upcoming trip. They sell $54$ cookies at three for $\\$1$, and $20$ cupcakes at $\\$2$ each, and $35$ brownies at $\\$1$ each. If it cost the math club $\\$15$ to bake these items, what was their profit?\nAnswer:", ["\n"]], ["Problem: A rectangular quilt's length is twice the length of a rectangular picture, and the quilt's width is three times the width of the same picture. The area of the picture is 2 square feet. What is the area of the quilt, in square feet?\nAnswer:", ["\n"]], ["Problem: In the diagram, the three concentric circles have radii of $4,$ $6,$ and $7.$ Three regions are labeled $X,$ $Y,$ or $Z$ below. Of these three regions, what is the difference between the area of the region with the greatest area and the area of the region with the smallest area? Express your answer in exact form.\n\n[asy]\nimport graph;\nfilldraw(circle((0,0),7), lightgray, black+linewidth(1));\nfilldraw(circle((0,0),6), gray, black+linewidth(1));\nfilldraw(circle((0,0),4), white, black+linewidth(1));\ndot((0,0));\nlabel(\"$X$\",(2,0));\nlabel(\"$Y$\",(5,0));\nlabel(\"$Z$\",(6.5,0));\n[/asy]\nAnswer:", ["\n"]], ["Problem: What is the value of $6 + (8 \\div 2)$?\nAnswer:", ["\n"]], ["Problem: Find $76-(-4\\cdot8-2)+13.$\nAnswer:", ["\n"]], ["Problem: A whole number larger than 2 leaves a remainder of 2 when divided by each of the numbers 3, 4, 5, and 6. What is the smallest such number?\nAnswer:", ["\n"]], ["Problem: A figure skater is facing north when she begins to spin to her right. She spins 2250 degrees. Which direction (north, south, east or west) is she facing when she finishes her spin?\nAnswer:", ["\n"]], ["Problem: What is the value of $(4 \\times 12)-(4+12)$?\nAnswer:", ["\n"]], ["Problem: The temperature in a desert rose $1.5$ degrees in $15$ minutes. 
If this rate of increase remains constant, how many degrees will the temperature rise in the next $2$ hours?\nAnswer:", ["\n"]]]
\ No newline at end of file
bc834b06fd79473ca6fe38a51b714aad0bf0478c1b0eec787eca34dbdf69cb71
\ No newline at end of file
[["Problem: All solutions of the equation $\\cos 4x = -\\frac{1}{2}$ can be expressed in the form $\\frac{(kn \\pm 1) \\pi}{6},$ where $n$ is an integer. Find the positive value of $k.$\nAnswer:", ["\n"]], ["Problem: Simplify\n\\[\\frac{\\cos x}{1 - \\sin x} - \\frac{\\cos x}{1 + \\sin x}.\\]\nAnswer:", ["\n"]], ["Problem: Compute $\\cos 180^\\circ$.\nAnswer:", ["\n"]], ["Problem: If $e^{i \\alpha} = \\frac{3}{5} +\\frac{4}{5} i$ and $e^{i \\beta} = -\\frac{12}{13} + \\frac{5}{13} i,$ then find $\\cos (\\alpha - \\beta).$\nAnswer:", ["\n"]], ["Problem: Let $0, a, b, c$ be the vertices of a square in counterclockwise order. Compute\n\\[\\frac{ac + b^2}{ab}.\\]Enter your answer in rectangular form.\nAnswer:", ["\n"]], ["Problem: What is the value of $ \\sum_{n=1}^\\infty (\\tan^{-1}\\sqrt{n}-\\tan^{-1}\\sqrt{n+1})$?\n\nYour answer should be in radians.\nAnswer:", ["\n"]], ["Problem: Let $L$ be the line in space that passes through the origin and the point $(2,1,-2).$ Find the reflection of the point $(3,6,15)$ across $L.$\nAnswer:", ["\n"]], ["Problem: Let $S$ be the set of all points $(x,y,z)$ such that $x^2 + y^2 + z^2 \\le 25$ and $z \\ge 0.$ Compute the side length of the largest cube contained in $S.$\nAnswer:", ["\n"]], ["Problem: Vectors $\\mathbf{a}$ and $\\mathbf{b}$ satisfy $\\|\\mathbf{a}\\| = 5$ and $\\|\\mathbf{b}\\| = 4.$ Also, the angle between vectors $\\mathbf{a}$ and $\\mathbf{b}$ is $60^\\circ.$ Find $\\|\\mathbf{a} - \\mathbf{b}\\|.$\nAnswer:", ["\n"]], ["Problem: Let $\\mathbf{a}$ and $\\mathbf{b}$ be two nonzero vectors such that $\\mathbf{a} + \\mathbf{b}$ and $\\mathbf{b}$ are orthogonal, and $\\mathbf{a} + 2 \\mathbf{b}$ and $\\mathbf{a}$ are orthogonal. Find $\\frac{\\|\\mathbf{a}\\|}{\\|\\mathbf{b}\\|}.$\nAnswer:", ["\n"]]]
\ No newline at end of file
a45260e49f02c7cb8886b3746db4d388890860b202dd8a9f0267e3c324e0af13
\ No newline at end of file
[["Question: in a group of ducks and cows , the total number of legs are 26 more than twice the no . of heads . find the total no . of buffaloes .\nAnswer:", " 11"], ["Question: in a group of ducks and cows , the total number of legs are 26 more than twice the no . of heads . find the total no . of buffaloes .\nAnswer:", " 12"], ["Question: in a group of ducks and cows , the total number of legs are 26 more than twice the no . of heads . find the total no . of buffaloes .\nAnswer:", " 13"], ["Question: in a group of ducks and cows , the total number of legs are 26 more than twice the no . of heads . find the total no . of buffaloes .\nAnswer:", " 16"], ["Question: in a group of ducks and cows , the total number of legs are 26 more than twice the no . of heads . find the total no . of buffaloes .\nAnswer:", " 18"], ["Question: how many words , with or without meaning , can be formed using all letters of the word good using each letter exactly once ?\nAnswer:", " 18"], ["Question: how many words , with or without meaning , can be formed using all letters of the word good using each letter exactly once ?\nAnswer:", " 20"], ["Question: how many words , with or without meaning , can be formed using all letters of the word good using each letter exactly once ?\nAnswer:", " 22"], ["Question: how many words , with or without meaning , can be formed using all letters of the word good using each letter exactly once ?\nAnswer:", " 23"], ["Question: how many words , with or without meaning , can be formed using all letters of the word good using each letter exactly once ?\nAnswer:", " 24"], ["Question: how many different ways can 2 students be seated in a row of 4 desks , so that there is always at least one empty desk between the students ?\nAnswer:", " 2"], ["Question: how many different ways can 2 students be seated in a row of 4 desks , so that there is always at least one empty desk between the students ?\nAnswer:", " 3"], ["Question: how many different ways can 2 
students be seated in a row of 4 desks , so that there is always at least one empty desk between the students ?\nAnswer:", " 4"], ["Question: how many different ways can 2 students be seated in a row of 4 desks , so that there is always at least one empty desk between the students ?\nAnswer:", " 6"], ["Question: how many different ways can 2 students be seated in a row of 4 desks , so that there is always at least one empty desk between the students ?\nAnswer:", " 12"], ["Question: what is the length of the diagonal of a square whose area is 4 times of another square with diagonal as 5 v 2 cm ?\nAnswer:", " 20 v 2'"], ["Question: what is the length of the diagonal of a square whose area is 4 times of another square with diagonal as 5 v 2 cm ?\nAnswer:", " 10'"], ["Question: what is the length of the diagonal of a square whose area is 4 times of another square with diagonal as 5 v 2 cm ?\nAnswer:", " 10 v 2'"], ["Question: what is the length of the diagonal of a square whose area is 4 times of another square with diagonal as 5 v 2 cm ?\nAnswer:", " 20'"], ["Question: what is the length of the diagonal of a square whose area is 4 times of another square with diagonal as 5 v 2 cm ?\nAnswer:", " 25']"], ["Question: the sides of a square region , measured to the nearest centimeter , are 10 centimeters long . the least possible value of the actual area of the square region is\nAnswer:", " 96.25 sq cm"], ["Question: the sides of a square region , measured to the nearest centimeter , are 10 centimeters long . the least possible value of the actual area of the square region is\nAnswer:", " 98.25 sq cm"], ["Question: the sides of a square region , measured to the nearest centimeter , are 10 centimeters long . the least possible value of the actual area of the square region is\nAnswer:", " 92.25 sq cm"], ["Question: the sides of a square region , measured to the nearest centimeter , are 10 centimeters long . 
the least possible value of the actual area of the square region is\nAnswer:", " 100.25 sq cm"], ["Question: the sides of a square region , measured to the nearest centimeter , are 10 centimeters long . the least possible value of the actual area of the square region is\nAnswer:", " 90.25 sq cm"], ["Question: a group of n students can be divided into equal groups of 4 with 2 student left over or equal groups of 5 with 2 students left over . what is the sum of the two smallest possible values of n ?\nAnswer:", " 33"], ["Question: a group of n students can be divided into equal groups of 4 with 2 student left over or equal groups of 5 with 2 students left over . what is the sum of the two smallest possible values of n ?\nAnswer:", " 46"], ["Question: a group of n students can be divided into equal groups of 4 with 2 student left over or equal groups of 5 with 2 students left over . what is the sum of the two smallest possible values of n ?\nAnswer:", " 49"], ["Question: a group of n students can be divided into equal groups of 4 with 2 student left over or equal groups of 5 with 2 students left over . what is the sum of the two smallest possible values of n ?\nAnswer:", " 53"], ["Question: a group of n students can be divided into equal groups of 4 with 2 student left over or equal groups of 5 with 2 students left over . what is the sum of the two smallest possible values of n ?\nAnswer:", " 86"], ["Question: two pipes a and b can fill a tank in 36 hours and 45 hours respectively . if both the pipes are opened simultaneously , how much time will be taken to fill the tank ?\nAnswer:", " 20 hrs"], ["Question: two pipes a and b can fill a tank in 36 hours and 45 hours respectively . if both the pipes are opened simultaneously , how much time will be taken to fill the tank ?\nAnswer:", " 22 hrs"], ["Question: two pipes a and b can fill a tank in 36 hours and 45 hours respectively . 
if both the pipes are opened simultaneously , how much time will be taken to fill the tank ?\nAnswer:", " 23 hrs"], ["Question: two pipes a and b can fill a tank in 36 hours and 45 hours respectively . if both the pipes are opened simultaneously , how much time will be taken to fill the tank ?\nAnswer:", " 24 hrs"], ["Question: two pipes a and b can fill a tank in 36 hours and 45 hours respectively . if both the pipes are opened simultaneously , how much time will be taken to fill the tank ?\nAnswer:", " 21 hrs"], ["Question: two passenger trains start at the same hour in the day from two different stations and move towards each other at the rate of 25 kmph and 21 kmph respectively . when they meet , it is found that one train has traveled 60 km more than the other one . the distance between the two stations is ?\nAnswer:", " 457 km"], ["Question: two passenger trains start at the same hour in the day from two different stations and move towards each other at the rate of 25 kmph and 21 kmph respectively . when they meet , it is found that one train has traveled 60 km more than the other one . the distance between the two stations is ?\nAnswer:", " 444 km"], ["Question: two passenger trains start at the same hour in the day from two different stations and move towards each other at the rate of 25 kmph and 21 kmph respectively . when they meet , it is found that one train has traveled 60 km more than the other one . the distance between the two stations is ?\nAnswer:", " 552 km"], ["Question: two passenger trains start at the same hour in the day from two different stations and move towards each other at the rate of 25 kmph and 21 kmph respectively . when they meet , it is found that one train has traveled 60 km more than the other one . 
the distance between the two stations is ?\nAnswer:", " 645 km"], ["Question: two passenger trains start at the same hour in the day from two different stations and move towards each other at the rate of 25 kmph and 21 kmph respectively . when they meet , it is found that one train has traveled 60 km more than the other one . the distance between the two stations is ?\nAnswer:", " 453 km"], ["Question: in a forest 140 deer were caught , tagged with electronic markers , then released . a week later , 50 deer were captured in the same forest . of these 50 deer , it was found that 5 had been tagged with the electronic markers . if the percentage of tagged deer in the second sample approximates the percentage of tagged deer in the forest , and if no deer had either left or entered the forest over the preceding week , what is the approximate number of deer in the forest ?\nAnswer:", " 150"], ["Question: in a forest 140 deer were caught , tagged with electronic markers , then released . a week later , 50 deer were captured in the same forest . of these 50 deer , it was found that 5 had been tagged with the electronic markers . if the percentage of tagged deer in the second sample approximates the percentage of tagged deer in the forest , and if no deer had either left or entered the forest over the preceding week , what is the approximate number of deer in the forest ?\nAnswer:", " 750"], ["Question: in a forest 140 deer were caught , tagged with electronic markers , then released . a week later , 50 deer were captured in the same forest . of these 50 deer , it was found that 5 had been tagged with the electronic markers . 
if the percentage of tagged deer in the second sample approximates the percentage of tagged deer in the forest , and if no deer had either left or entered the forest over the preceding week , what is the approximate number of deer in the forest ?\nAnswer:", " 1,250"], ["Question: in a forest 140 deer were caught , tagged with electronic markers , then released . a week later , 50 deer were captured in the same forest . of these 50 deer , it was found that 5 had been tagged with the electronic markers . if the percentage of tagged deer in the second sample approximates the percentage of tagged deer in the forest , and if no deer had either left or entered the forest over the preceding week , what is the approximate number of deer in the forest ?\nAnswer:", " 1,400"], ["Question: in a forest 140 deer were caught , tagged with electronic markers , then released . a week later , 50 deer were captured in the same forest . of these 50 deer , it was found that 5 had been tagged with the electronic markers . if the percentage of tagged deer in the second sample approximates the percentage of tagged deer in the forest , and if no deer had either left or entered the forest over the preceding week , what is the approximate number of deer in the forest ?\nAnswer:", " 2,500"], ["Question: the manufacturer \u2019 s suggested retail price ( msrp ) of a certain item is $ 60 . store a sells the item for 20 percent more than the msrp . the regular price of the item at store b is 30 percent more than the msrp , but the item is currently on sale for 10 percent less than the regular price . if sales tax is 5 percent of the purchase price at both stores , what is the result when the total cost of the item at store b is subtracted from the total cost of the item at store a ?\nAnswer:", " $ 0"], ["Question: the manufacturer \u2019 s suggested retail price ( msrp ) of a certain item is $ 60 . store a sells the item for 20 percent more than the msrp . 
the regular price of the item at store b is 30 percent more than the msrp , but the item is currently on sale for 10 percent less than the regular price . if sales tax is 5 percent of the purchase price at both stores , what is the result when the total cost of the item at store b is subtracted from the total cost of the item at store a ?\nAnswer:", " $ 0.63"], ["Question: the manufacturer \u2019 s suggested retail price ( msrp ) of a certain item is $ 60 . store a sells the item for 20 percent more than the msrp . the regular price of the item at store b is 30 percent more than the msrp , but the item is currently on sale for 10 percent less than the regular price . if sales tax is 5 percent of the purchase price at both stores , what is the result when the total cost of the item at store b is subtracted from the total cost of the item at store a ?\nAnswer:", " $ 1.80"], ["Question: the manufacturer \u2019 s suggested retail price ( msrp ) of a certain item is $ 60 . store a sells the item for 20 percent more than the msrp . the regular price of the item at store b is 30 percent more than the msrp , but the item is currently on sale for 10 percent less than the regular price . if sales tax is 5 percent of the purchase price at both stores , what is the result when the total cost of the item at store b is subtracted from the total cost of the item at store a ?\nAnswer:", " $ 1.89"], ["Question: the manufacturer \u2019 s suggested retail price ( msrp ) of a certain item is $ 60 . store a sells the item for 20 percent more than the msrp . the regular price of the item at store b is 30 percent more than the msrp , but the item is currently on sale for 10 percent less than the regular price . if sales tax is 5 percent of the purchase price at both stores , what is the result when the total cost of the item at store b is subtracted from the total cost of the item at store a ?\nAnswer:", " $ 2.10"]]
\ No newline at end of file
{"results": {"mathqa": {"acc": 0.1, "acc_stderr": 0.09999999999999999, "acc_norm": 0.1, "acc_norm_stderr": 0.09999999999999999}}, "versions": {"mathqa": 0}}
\ No newline at end of file
{"results": {"mathqa": {"acc": 0.20770519262981574, "acc_norm": 0.2050251256281407, "acc_norm_stderr": 0.007390619359738901, "acc_stderr": 0.007426217631188539}}, "versions": {"mathqa": 0}}
\ No newline at end of file
4fc7b56b8f1e37e38f4a052b227baec2df914c898c3405d3e994726ba4fba976
\ No newline at end of file
[["The show, which begins each evening at 9:00 p.m. , relates in melodramatic fashion the history of Istanbul while coloured floodlights illuminate the spectacular architecture of the Blue Mosque.\nQuestion: The history of Istanbul is the subject of the show. True, False or Neither?\nAnswer:", " True"], ["The show, which begins each evening at 9:00 p.m. , relates in melodramatic fashion the history of Istanbul while coloured floodlights illuminate the spectacular architecture of the Blue Mosque.\nQuestion: The history of Istanbul is the subject of the show. True, False or Neither?\nAnswer:", " Neither"], ["The show, which begins each evening at 9:00 p.m. , relates in melodramatic fashion the history of Istanbul while coloured floodlights illuminate the spectacular architecture of the Blue Mosque.\nQuestion: The history of Istanbul is the subject of the show. True, False or Neither?\nAnswer:", " False"], ["It's thought he used the same architect who worked on the Taj Mahal.\nQuestion: Everyone thinks he used a different architect from the one who worked on the Taj Mahal. True, False or Neither?\nAnswer:", " True"], ["It's thought he used the same architect who worked on the Taj Mahal.\nQuestion: Everyone thinks he used a different architect from the one who worked on the Taj Mahal. True, False or Neither?\nAnswer:", " Neither"], ["It's thought he used the same architect who worked on the Taj Mahal.\nQuestion: Everyone thinks he used a different architect from the one who worked on the Taj Mahal. True, False or Neither?\nAnswer:", " False"], ["as long as you got congressmen and senators that are getting kickbacks kickbacks from these different companies that are getting awarded for the defense contracts that's never going to happen\nQuestion: It will never happen as long as there are congressmen and senators taking kickbacks from different companies. 
True, False or Neither?\nAnswer:", " True"], ["as long as you got congressmen and senators that are getting kickbacks kickbacks from these different companies that are getting awarded for the defense contracts that's never going to happen\nQuestion: It will never happen as long as there are congressmen and senators taking kickbacks from different companies. True, False or Neither?\nAnswer:", " Neither"], ["as long as you got congressmen and senators that are getting kickbacks kickbacks from these different companies that are getting awarded for the defense contracts that's never going to happen\nQuestion: It will never happen as long as there are congressmen and senators taking kickbacks from different companies. True, False or Neither?\nAnswer:", " False"], ["There always will be a need for an attorney to do general law.\nQuestion: There is not much need for attorney's to practice law. True, False or Neither?\nAnswer:", " True"], ["There always will be a need for an attorney to do general law.\nQuestion: There is not much need for attorney's to practice law. True, False or Neither?\nAnswer:", " Neither"], ["There always will be a need for an attorney to do general law.\nQuestion: There is not much need for attorney's to practice law. True, False or Neither?\nAnswer:", " False"], ["the the Iranian borders are still open uh from what i understand understand um\nQuestion: The borders of Iran are closed. True, False or Neither?\nAnswer:", " True"], ["the the Iranian borders are still open uh from what i understand understand um\nQuestion: The borders of Iran are closed. True, False or Neither?\nAnswer:", " Neither"], ["the the Iranian borders are still open uh from what i understand understand um\nQuestion: The borders of Iran are closed. 
True, False or Neither?\nAnswer:", " False"], ["According to the Natural Resources Conservation Service, this single, voluntary program will provide flexible technical, financial, and educational assistance to farmers and ranchers who face serious threats to soil, water, and related natural resources on agricultural and other lands, including grazing lands, wetlands, forest lands, and wildlife habitats.\nQuestion: Farmers and ranchers must have all of their licenses and permits to qualify. True, False or Neither?\nAnswer:", " True"], ["According to the Natural Resources Conservation Service, this single, voluntary program will provide flexible technical, financial, and educational assistance to farmers and ranchers who face serious threats to soil, water, and related natural resources on agricultural and other lands, including grazing lands, wetlands, forest lands, and wildlife habitats.\nQuestion: Farmers and ranchers must have all of their licenses and permits to qualify. True, False or Neither?\nAnswer:", " Neither"], ["According to the Natural Resources Conservation Service, this single, voluntary program will provide flexible technical, financial, and educational assistance to farmers and ranchers who face serious threats to soil, water, and related natural resources on agricultural and other lands, including grazing lands, wetlands, forest lands, and wildlife habitats.\nQuestion: Farmers and ranchers must have all of their licenses and permits to qualify. True, False or Neither?\nAnswer:", " False"], ["i wish it was as good over here as it is over there but if you're the\nQuestion: I wish it was as nice in America as it was in Iraq. True, False or Neither?\nAnswer:", " True"], ["i wish it was as good over here as it is over there but if you're the\nQuestion: I wish it was as nice in America as it was in Iraq. 
True, False or Neither?\nAnswer:", " Neither"], ["i wish it was as good over here as it is over there but if you're the\nQuestion: I wish it was as nice in America as it was in Iraq. True, False or Neither?\nAnswer:", " False"], ["He pulled his cloak tighter and wished for a moment that he had not shaved his head.\nQuestion: The man pulled his super hero cape around himself to show off. True, False or Neither?\nAnswer:", " True"], ["He pulled his cloak tighter and wished for a moment that he had not shaved his head.\nQuestion: The man pulled his super hero cape around himself to show off. True, False or Neither?\nAnswer:", " Neither"], ["He pulled his cloak tighter and wished for a moment that he had not shaved his head.\nQuestion: The man pulled his super hero cape around himself to show off. True, False or Neither?\nAnswer:", " False"], ["uh the one we thought would be the most timid uh turned out to be the one that stuck with it and was the first to learn\nQuestion: The one we thought would be timid was the first one to learn how to climb without a harness. True, False or Neither?\nAnswer:", " True"], ["uh the one we thought would be the most timid uh turned out to be the one that stuck with it and was the first to learn\nQuestion: The one we thought would be timid was the first one to learn how to climb without a harness. True, False or Neither?\nAnswer:", " Neither"], ["uh the one we thought would be the most timid uh turned out to be the one that stuck with it and was the first to learn\nQuestion: The one we thought would be timid was the first one to learn how to climb without a harness. True, False or Neither?\nAnswer:", " False"], ["Local legend claims that he wrote part of his great saga, Os Lusadas, in what is now called the Camees Grotto, situated in the spacious tropical Camees Garden.\nQuestion: Local legend makes the claim that he didn't write any of his great saga in the Camees Grotto. 
True, False or Neither?\nAnswer:", " True"], ["Local legend claims that he wrote part of his great saga, Os Lusadas, in what is now called the Camees Grotto, situated in the spacious tropical Camees Garden.\nQuestion: Local legend makes the claim that he didn't write any of his great saga in the Camees Grotto. True, False or Neither?\nAnswer:", " Neither"], ["Local legend claims that he wrote part of his great saga, Os Lusadas, in what is now called the Camees Grotto, situated in the spacious tropical Camees Garden.\nQuestion: Local legend makes the claim that he didn't write any of his great saga in the Camees Grotto. True, False or Neither?\nAnswer:", " False"]]
\ No newline at end of file
{"results": {"mnli": {"acc": 0.2, "acc_stderr": 0.13333333333333333}}, "versions": {"mnli": 0}}
\ No newline at end of file
{"results": {"mnli": {"acc": 0.32868059093224655, "acc_stderr": 0.004741640290753859}}, "versions": {"mnli": 0}}
\ No newline at end of file
3784acf322e79f31702a7a0612030e4ba5c4fc466ad976a34ee3f3d7278c01f0
\ No newline at end of file
[["This occurred not only because of the greater concentration of businesses in the textile industry, but because textile companies generally plan to run their expensive capital equipment at full capacity around the clock.\nQuestion: This didn't occur because of anything involving the textile industry. True, False or Neither?\nAnswer:", " True"], ["This occurred not only because of the greater concentration of businesses in the textile industry, but because textile companies generally plan to run their expensive capital equipment at full capacity around the clock.\nQuestion: This didn't occur because of anything involving the textile industry. True, False or Neither?\nAnswer:", " Neither"], ["This occurred not only because of the greater concentration of businesses in the textile industry, but because textile companies generally plan to run their expensive capital equipment at full capacity around the clock.\nQuestion: This didn't occur because of anything involving the textile industry. True, False or Neither?\nAnswer:", " False"], ["Still no response.\nQuestion: There had been multiple attempts to make contact. True, False or Neither?\nAnswer:", " True"], ["Still no response.\nQuestion: There had been multiple attempts to make contact. True, False or Neither?\nAnswer:", " Neither"], ["Still no response.\nQuestion: There had been multiple attempts to make contact. True, False or Neither?\nAnswer:", " False"], ["The simplest is for one or more of the members to simply donate one million dollars to the IGGS Scholarship Fund.\nQuestion: The most complicated is to arrange a donation of one million dollars to the IGGS Scholarship Fund. True, False or Neither?\nAnswer:", " True"], ["The simplest is for one or more of the members to simply donate one million dollars to the IGGS Scholarship Fund.\nQuestion: The most complicated is to arrange a donation of one million dollars to the IGGS Scholarship Fund. 
True, False or Neither?\nAnswer:", " Neither"], ["The simplest is for one or more of the members to simply donate one million dollars to the IGGS Scholarship Fund.\nQuestion: The most complicated is to arrange a donation of one million dollars to the IGGS Scholarship Fund. True, False or Neither?\nAnswer:", " False"], [" the evidence in the published work is quite thin\nQuestion: There is evidence that shows the work has false claims. True, False or Neither?\nAnswer:", " True"], [" the evidence in the published work is quite thin\nQuestion: There is evidence that shows the work has false claims. True, False or Neither?\nAnswer:", " Neither"], [" the evidence in the published work is quite thin\nQuestion: There is evidence that shows the work has false claims. True, False or Neither?\nAnswer:", " False"], ["I might add that a gift of $400 qualifies you for recognition in the School of Liberal Arts Dean's Council, a critical group of supporters whose generosity does a lot to advance the calibre of the School.\nQuestion: You will be qualified for recognition after gifting $400. True, False or Neither?\nAnswer:", " True"], ["I might add that a gift of $400 qualifies you for recognition in the School of Liberal Arts Dean's Council, a critical group of supporters whose generosity does a lot to advance the calibre of the School.\nQuestion: You will be qualified for recognition after gifting $400. True, False or Neither?\nAnswer:", " Neither"], ["I might add that a gift of $400 qualifies you for recognition in the School of Liberal Arts Dean's Council, a critical group of supporters whose generosity does a lot to advance the calibre of the School.\nQuestion: You will be qualified for recognition after gifting $400. True, False or Neither?\nAnswer:", " False"], ["There's, my dad reminds me from time to time.\nQuestion: My dad reminds me that from time to time. 
True, False or Neither?\nAnswer:", " True"], ["There's, my dad reminds me from time to time.\nQuestion: My dad reminds me that from time to time. True, False or Neither?\nAnswer:", " Neither"], ["There's, my dad reminds me from time to time.\nQuestion: My dad reminds me that from time to time. True, False or Neither?\nAnswer:", " False"], ["One spoke very little English and one spoke excellent English.\nQuestion: One spoke barely any English and the other spoke perfect English. True, False or Neither?\nAnswer:", " True"], ["One spoke very little English and one spoke excellent English.\nQuestion: One spoke barely any English and the other spoke perfect English. True, False or Neither?\nAnswer:", " Neither"], ["One spoke very little English and one spoke excellent English.\nQuestion: One spoke barely any English and the other spoke perfect English. True, False or Neither?\nAnswer:", " False"], ["Machines are laid out in a manner that speeds up shuttling a bin of garment bundles from operator to operator.\nQuestion: The machines were laid out without regard to the shuttling speed of the bins. True, False or Neither?\nAnswer:", " True"], ["Machines are laid out in a manner that speeds up shuttling a bin of garment bundles from operator to operator.\nQuestion: The machines were laid out without regard to the shuttling speed of the bins. True, False or Neither?\nAnswer:", " Neither"], ["Machines are laid out in a manner that speeds up shuttling a bin of garment bundles from operator to operator.\nQuestion: The machines were laid out without regard to the shuttling speed of the bins. True, False or Neither?\nAnswer:", " False"], ["Does parenting really matter?\nQuestion: There's no question that parenting is extremely important. True, False or Neither?\nAnswer:", " True"], ["Does parenting really matter?\nQuestion: There's no question that parenting is extremely important. 
True, False or Neither?\nAnswer:", " Neither"], ["Does parenting really matter?\nQuestion: There's no question that parenting is extremely important. True, False or Neither?\nAnswer:", " False"], ["One may well wonder what Miss Stowe had in mind when, in Uncle Tom's Cabin (Chapter 5), she narrates how Mrs.\nQuestion: Miss Stowe never spoke a word in Uncle Tom's Cabin. True, False or Neither?\nAnswer:", " True"], ["One may well wonder what Miss Stowe had in mind when, in Uncle Tom's Cabin (Chapter 5), she narrates how Mrs.\nQuestion: Miss Stowe never spoke a word in Uncle Tom's Cabin. True, False or Neither?\nAnswer:", " Neither"], ["One may well wonder what Miss Stowe had in mind when, in Uncle Tom's Cabin (Chapter 5), she narrates how Mrs.\nQuestion: Miss Stowe never spoke a word in Uncle Tom's Cabin. True, False or Neither?\nAnswer:", " False"]]
\ No newline at end of file
{"results": {"mnli_mismatched": {"acc": 0.3, "acc_stderr": 0.15275252316519464}}, "versions": {"mnli_mismatched": 0}}
\ No newline at end of file
{"results": {"mnli_mismatched": {"acc": 0.3360455655004068, "acc_stderr": 0.004763973908606819}}, "versions": {"mnli_mismatched": 0}}
\ No newline at end of file
9f54cbff8d6accba99cfa2c4c4b359563313941018173d7dcf9e32dc28c06583
\ No newline at end of file
[["Sentence 1: In fiction : Edward P. Jones (\"The Known World\") and Scott Spencer (\"A Ship Made of Paper\").\nSentence 2: The fifth nominee for fiction is Scott Spencer, for A Ship Made of Paper.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " yes"], ["Sentence 1: In fiction : Edward P. Jones (\"The Known World\") and Scott Spencer (\"A Ship Made of Paper\").\nSentence 2: The fifth nominee for fiction is Scott Spencer, for A Ship Made of Paper.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " no"], ["Sentence 1: Corixa shares rose 54 cents to $ 7.74 yesterday on the Nasdaq Stock Market.\nSentence 2: Shares of Corixa rose 54 cents, or about 8 percent, to close at $ 7.74.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " yes"], ["Sentence 1: Corixa shares rose 54 cents to $ 7.74 yesterday on the Nasdaq Stock Market.\nSentence 2: Shares of Corixa rose 54 cents, or about 8 percent, to close at $ 7.74.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " no"], ["Sentence 1: Nearly 300 mutinous troops who seized a Manila shopping and apartment complex demanding the government resign gave up and retreated peacefully after some 19 hours.\nSentence 2: Mutinous troops who seized a Manila shopping and apartment complex demanding the government resign ended a 19-hour standoff late Sunday and returned to barracks without a shot fired.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " yes"], ["Sentence 1: Nearly 300 mutinous troops who seized a Manila shopping and apartment complex demanding the government resign gave up and retreated peacefully after some 19 hours.\nSentence 2: Mutinous troops who seized a Manila shopping and apartment complex demanding the government resign ended a 19-hour standoff late Sunday and returned to barracks without a shot fired.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " no"], ["Sentence 1: Drax faced a financial crisis late last year after it lost its most 
lucrative sales contract, held with insolvent utility TXU Europe.\nSentence 2: Drax \u2019 s troubles began late last year when it lost its most lucrative sales contract, with the insolvent utility TXU Europe.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " yes"], ["Sentence 1: Drax faced a financial crisis late last year after it lost its most lucrative sales contract, held with insolvent utility TXU Europe.\nSentence 2: Drax \u2019 s troubles began late last year when it lost its most lucrative sales contract, with the insolvent utility TXU Europe.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " no"], ["Sentence 1: They will help draft a plan to attack obesity that Kraft will implement over three to four years.\nSentence 2: The team will help draft a plan by the end of the year to attack obesity.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " yes"], ["Sentence 1: They will help draft a plan to attack obesity that Kraft will implement over three to four years.\nSentence 2: The team will help draft a plan by the end of the year to attack obesity.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " no"], ["Sentence 1: The dollar was at 116.92 yen against the yen, flat on the session, and at 1.2891 against the Swiss franc, also flat.\nSentence 2: The dollar was at 116.78 yen JPY =, virtually flat on the session, and at 1.2871 against the Swiss franc CHF =, down 0.1 percent.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " yes"], ["Sentence 1: The dollar was at 116.92 yen against the yen, flat on the session, and at 1.2891 against the Swiss franc, also flat.\nSentence 2: The dollar was at 116.78 yen JPY =, virtually flat on the session, and at 1.2871 against the Swiss franc CHF =, down 0.1 percent.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " no"], ["Sentence 1: Nelson, 27, is being retried on civil-rights charges stemming from the disturbance which led to Rosenbaum's 
death.\nSentence 2: Nelson, 27, is being retried on civil rights charges stemming from the disturbance that led to Rosenbaum's death.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " yes"], ["Sentence 1: Nelson, 27, is being retried on civil-rights charges stemming from the disturbance which led to Rosenbaum's death.\nSentence 2: Nelson, 27, is being retried on civil rights charges stemming from the disturbance that led to Rosenbaum's death.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " no"], ["Sentence 1: \"Sanitation is poor... there could be typhoid and cholera,\"he said.\nSentence 2: \"Sanitation is poor, drinking water is generally left behind... there could be typhoid and cholera.\"\nQuestion: Do both sentences mean the same thing?\nAnswer:", " yes"], ["Sentence 1: \"Sanitation is poor... there could be typhoid and cholera,\"he said.\nSentence 2: \"Sanitation is poor, drinking water is generally left behind... there could be typhoid and cholera.\"\nQuestion: Do both sentences mean the same thing?\nAnswer:", " no"], ["Sentence 1: The legal ruling follows three days of intense speculation Hewlett-Packard Co. may be bidding for the company.\nSentence 2: The legal ruling follows three days of wild volatility in RIM's stock over speculation that PC giant Hewlett-Packard Co. may be bidding for the company.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " yes"], ["Sentence 1: The legal ruling follows three days of intense speculation Hewlett-Packard Co. may be bidding for the company.\nSentence 2: The legal ruling follows three days of wild volatility in RIM's stock over speculation that PC giant Hewlett-Packard Co. 
may be bidding for the company.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " no"], ["Sentence 1: Rumsfeld, who has been feuding for two years with Army leadership, passed over nine active-duty four-star generals.\nSentence 2: Rumsfeld has been feuding for a long time with Army leadership, and he passed over nine active-duty four-star generals.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " yes"], ["Sentence 1: Rumsfeld, who has been feuding for two years with Army leadership, passed over nine active-duty four-star generals.\nSentence 2: Rumsfeld has been feuding for a long time with Army leadership, and he passed over nine active-duty four-star generals.\nQuestion: Do both sentences mean the same thing?\nAnswer:", " no"]]
\ No newline at end of file
{"results": {"mrpc": {"acc": 0.6, "acc_stderr": 0.16329931618554522, "f1": 0.5, "f1_stderr": 0.19843281497713464}}, "versions": {"mrpc": 0}}
\ No newline at end of file
{"results": {"mrpc": {"acc": 0.5392156862745098, "acc_stderr": 0.024707732873723128, "f1": 0.5982905982905982, "f1_stderr": 0.028928325246283727}}, "versions": {"mrpc": 0}}
\ No newline at end of file
cdb026c027437a8b4653212d0944d36fc16f49921dcb8e4bef899d15a55e9f80
\ No newline at end of file
[["The bar was manned by an expensive humanoid robot. It turned toward Sarah's wave and acknowledged her with a nod, moments later setting a fluted glass of sparkling liquid in front of her. I marveled at the robot's smoothness and coordination. Clearly, it was a high-end model. Sarah transferred the glass to my free hand and pulled me away from the bar for more introductions, with Alexis trailing after us. I spent the evening listening, mostly. Listening and stuffing my face with all the bits of fine food provided. No one minded; Sarah's inner circle was content to fill our circle of couches with plenty of chatter. Ray, a plump man who was grey where he wasn't bald. Zheng, short and dark and lean, with a very intense gaze. He made me a little uncomfortable. Kishori, petite, her hair strung out in a series of braids that reached nearly to her waist. I categorized them based on their appearances, hoping I'd be able to pick them out of the crowd again later. Most of their chatter was meaningless to me\u2014stories of day-to-day activities, how so-and-so had been seen in so-and-so's table at lunch and my wasn't that a surprise, and why hadn't the chef concocted this delectable a selection of appetizers for the dance the other night, but of course those rolled meat pastries reminded one of the pastries back on Earth, didn't they, and this was somehow an interesting fact. After the first half-hour, I stopped expending effort to keep names and stories and gossip straight. I wasn't learning anything useful. I could have started asking questions, but I wanted to get my bearings first. Tonight was for observation. I didn't bother trying to seek out a different group of potentially more interesting people, though. They all looked the same: clusters of social butterflies surrounded by the less apt, the hangers-on, the circle with whom the gossip was shared. \nQuestion: What were Zheng's traits?\nAnswer:", " yes, Short"], ["The bar was manned by an expensive humanoid robot. 
It turned toward Sarah's wave and acknowledged her with a nod, moments later setting a fluted glass of sparkling liquid in front of her. I marveled at the robot's smoothness and coordination. Clearly, it was a high-end model. Sarah transferred the glass to my free hand and pulled me away from the bar for more introductions, with Alexis trailing after us. I spent the evening listening, mostly. Listening and stuffing my face with all the bits of fine food provided. No one minded; Sarah's inner circle was content to fill our circle of couches with plenty of chatter. Ray, a plump man who was grey where he wasn't bald. Zheng, short and dark and lean, with a very intense gaze. He made me a little uncomfortable. Kishori, petite, her hair strung out in a series of braids that reached nearly to her waist. I categorized them based on their appearances, hoping I'd be able to pick them out of the crowd again later. Most of their chatter was meaningless to me\u2014stories of day-to-day activities, how so-and-so had been seen in so-and-so's table at lunch and my wasn't that a surprise, and why hadn't the chef concocted this delectable a selection of appetizers for the dance the other night, but of course those rolled meat pastries reminded one of the pastries back on Earth, didn't they, and this was somehow an interesting fact. After the first half-hour, I stopped expending effort to keep names and stories and gossip straight. I wasn't learning anything useful. I could have started asking questions, but I wanted to get my bearings first. Tonight was for observation. I didn't bother trying to seek out a different group of potentially more interesting people, though. They all looked the same: clusters of social butterflies surrounded by the less apt, the hangers-on, the circle with whom the gossip was shared. 
\nQuestion: What were Zheng's traits?\nAnswer:", " no, Short"], ["Before 9/11, the CIA did not invest in developing a robust capability to conduct paramilitary operations with U.S. personnel. It relied on proxies instead, organized by CIA operatives without the requisite military training. The results were unsatisfactory. Whether the price is measured in either money or people, the United States cannot afford to build two separate capabilities for carrying out secret military operations, secretly operating standoff missiles, and secretly training foreign military or paramilitary forces. The United States should concentrate responsibility and necessary legal authorities in one entity. The post-9/11 Afghanistan precedent of using joint CIA-military teams for covert and clandestine operations was a good one. We believe this proposal to be consistent with it. Each agency would concentrate on its comparative advantages in building capabilities for joint missions. The operation itself would be planned in common. The CIA has a reputation for agility in operations. The military has a reputation for being methodical and cumbersome. We do not know if these stereotypes match current reality; they may also be one more symptom of the civil-military misunderstandings we described in chapter 4. It is a problem to be resolved in policy guidance and agency management, not in the creation of redundant, overlapping capabilities and authorities in such sensitive work. The CIA's experts should be integrated into the military's training, exercises, and planning. To quote a CIA official now serving in the field:\"One fight, one team.\" Finally, to combat the secrecy and complexity we have described, the overall amounts of money being appropriated for national intelligence and to its component agencies should no longer be kept secret. 
Congress should pass a separate appropriations act for intelligence, defending the broad allocation of how these tens of billions of dollars have been assigned among the varieties of intelligence work. The specifics of the intelligence appropriation would remain classified, as they are today. \nQuestion: Who should concentrate on one entity instead of two separate capabilities?\nAnswer:", " yes, The United States Army branches"], ["Before 9/11, the CIA did not invest in developing a robust capability to conduct paramilitary operations with U.S. personnel. It relied on proxies instead, organized by CIA operatives without the requisite military training. The results were unsatisfactory. Whether the price is measured in either money or people, the United States cannot afford to build two separate capabilities for carrying out secret military operations, secretly operating standoff missiles, and secretly training foreign military or paramilitary forces. The United States should concentrate responsibility and necessary legal authorities in one entity. The post-9/11 Afghanistan precedent of using joint CIA-military teams for covert and clandestine operations was a good one. We believe this proposal to be consistent with it. Each agency would concentrate on its comparative advantages in building capabilities for joint missions. The operation itself would be planned in common. The CIA has a reputation for agility in operations. The military has a reputation for being methodical and cumbersome. We do not know if these stereotypes match current reality; they may also be one more symptom of the civil-military misunderstandings we described in chapter 4. It is a problem to be resolved in policy guidance and agency management, not in the creation of redundant, overlapping capabilities and authorities in such sensitive work. The CIA's experts should be integrated into the military's training, exercises, and planning. 
To quote a CIA official now serving in the field:\"One fight, one team.\" Finally, to combat the secrecy and complexity we have described, the overall amounts of money being appropriated for national intelligence and to its component agencies should no longer be kept secret. Congress should pass a separate appropriations act for intelligence, defending the broad allocation of how these tens of billions of dollars have been assigned among the varieties of intelligence work. The specifics of the intelligence appropriation would remain classified, as they are today. \nQuestion: Who should concentrate on one entity instead of two separate capabilities?\nAnswer:", " no, The United States Army branches"], ["The judge leaned back in his chair and beckoned to Mr. Andrews. It was finished. Spear was free, and from different parts of the courtroom people were moving toward the door. Their numbers showed that the friends of the young man had been many. Mr. Thorndike felt a certain twinge of disappointment. Even though the result relieved and pleased him, he wished, in bringing it about, he had had some part. He begrudged to Isaacs & Sons the credit of having given Spear his liberty. His morning had been wasted. He had neglected his own interests, and in no way assisted those of Spear. He was moving out of the railed enclosure when Andrews called him by name. \"His honor,\" he said impressively, \"wishes to speak to you.\" The judge leaned over his desk and shook Mr. Thorndike by the hand. Then he made a speech. The speech was about public-spirited citizens who, to the neglect of their own interests, came to assist the ends of justice, and fellow-creatures in misfortune. He purposely spoke in a loud voice, and every one stopped to listen. \"The law, Mr. Thorndike, is not vindictive,\" he said. \"It wishes only to be just. Nor can it be swayed by wealth or political or social influences. 
But when there is good in a man, I, personally, want to know it, and when gentlemen like yourself, of your standing in this city, come here to speak a good word for a man, we would stultify the purpose of justice if we did not listen. I thank you for coming, and I wish more of our citizens were as unselfish and public-spirited.\" It was all quite absurd and most embarrassing, but inwardly Mr. Thorndike glowed with pleasure. It was a long time since any one had had the audacity to tell him he had done well. \nQuestion: Who told Mr. Thorndike he had done well?\nAnswer:", " yes, The lawyers"], ["The judge leaned back in his chair and beckoned to Mr. Andrews. It was finished. Spear was free, and from different parts of the courtroom people were moving toward the door. Their numbers showed that the friends of the young man had been many. Mr. Thorndike felt a certain twinge of disappointment. Even though the result relieved and pleased him, he wished, in bringing it about, he had had some part. He begrudged to Isaacs & Sons the credit of having given Spear his liberty. His morning had been wasted. He had neglected his own interests, and in no way assisted those of Spear. He was moving out of the railed enclosure when Andrews called him by name. \"His honor,\" he said impressively, \"wishes to speak to you.\" The judge leaned over his desk and shook Mr. Thorndike by the hand. Then he made a speech. The speech was about public-spirited citizens who, to the neglect of their own interests, came to assist the ends of justice, and fellow-creatures in misfortune. He purposely spoke in a loud voice, and every one stopped to listen. \"The law, Mr. Thorndike, is not vindictive,\" he said. \"It wishes only to be just. Nor can it be swayed by wealth or political or social influences. 
But when there is good in a man, I, personally, want to know it, and when gentlemen like yourself, of your standing in this city, come here to speak a good word for a man, we would stultify the purpose of justice if we did not listen. I thank you for coming, and I wish more of our citizens were as unselfish and public-spirited.\" It was all quite absurd and most embarrassing, but inwardly Mr. Thorndike glowed with pleasure. It was a long time since any one had had the audacity to tell him he had done well. \nQuestion: Who told Mr. Thorndike he had done well?\nAnswer:", " no, The lawyers"], ["The bar was manned by an expensive humanoid robot. It turned toward Sarah's wave and acknowledged her with a nod, moments later setting a fluted glass of sparkling liquid in front of her. I marveled at the robot's smoothness and coordination. Clearly, it was a high-end model. Sarah transferred the glass to my free hand and pulled me away from the bar for more introductions, with Alexis trailing after us. I spent the evening listening, mostly. Listening and stuffing my face with all the bits of fine food provided. No one minded; Sarah's inner circle was content to fill our circle of couches with plenty of chatter. Ray, a plump man who was grey where he wasn't bald. Zheng, short and dark and lean, with a very intense gaze. He made me a little uncomfortable. Kishori, petite, her hair strung out in a series of braids that reached nearly to her waist. I categorized them based on their appearances, hoping I'd be able to pick them out of the crowd again later. Most of their chatter was meaningless to me\u2014stories of day-to-day activities, how so-and-so had been seen in so-and-so's table at lunch and my wasn't that a surprise, and why hadn't the chef concocted this delectable a selection of appetizers for the dance the other night, but of course those rolled meat pastries reminded one of the pastries back on Earth, didn't they, and this was somehow an interesting fact. 
After the first half-hour, I stopped expending effort to keep names and stories and gossip straight. I wasn't learning anything useful. I could have started asking questions, but I wanted to get my bearings first. Tonight was for observation. I didn't bother trying to seek out a different group of potentially more interesting people, though. They all looked the same: clusters of social butterflies surrounded by the less apt, the hangers-on, the circle with whom the gossip was shared. \nQuestion: Sarah introduces him to three other guests. Name them.\nAnswer:", " yes, Kishori"], ["The bar was manned by an expensive humanoid robot. It turned toward Sarah's wave and acknowledged her with a nod, moments later setting a fluted glass of sparkling liquid in front of her. I marveled at the robot's smoothness and coordination. Clearly, it was a high-end model. Sarah transferred the glass to my free hand and pulled me away from the bar for more introductions, with Alexis trailing after us. I spent the evening listening, mostly. Listening and stuffing my face with all the bits of fine food provided. No one minded; Sarah's inner circle was content to fill our circle of couches with plenty of chatter. Ray, a plump man who was grey where he wasn't bald. Zheng, short and dark and lean, with a very intense gaze. He made me a little uncomfortable. Kishori, petite, her hair strung out in a series of braids that reached nearly to her waist. I categorized them based on their appearances, hoping I'd be able to pick them out of the crowd again later. Most of their chatter was meaningless to me\u2014stories of day-to-day activities, how so-and-so had been seen in so-and-so's table at lunch and my wasn't that a surprise, and why hadn't the chef concocted this delectable a selection of appetizers for the dance the other night, but of course those rolled meat pastries reminded one of the pastries back on Earth, didn't they, and this was somehow an interesting fact. 
After the first half-hour, I stopped expending effort to keep names and stories and gossip straight. I wasn't learning anything useful. I could have started asking questions, but I wanted to get my bearings first. Tonight was for observation. I didn't bother trying to seek out a different group of potentially more interesting people, though. They all looked the same: clusters of social butterflies surrounded by the less apt, the hangers-on, the circle with whom the gossip was shared. \nQuestion: Sarah introduces him to three other guests. Name them.\nAnswer:", " no, Kishori"], ["While the Base Ball writers of the cities which comprise the Southern Association have no organized membership similar to the Base Ball Writers' Association of the major leagues and the organizations which are best known as the class AA leagues, they are a clever, hard-working group of young men, who have labored in season and out of season, not only to build up Base Ball but to build it up on the right lines. Experience of more than a quarter of a century has most abundantly proved that the standard of Base Ball has steadily been elevated. It needs no compilation of fact nor any dogmatic assertion on the part of the Editor of the GUIDE to attest that fact. It is a present condition which speaks for itself. The general tone of the players is far higher than it was and there has come into evidence a marked improvement in the spirit of the men who own Base Ball clubs. In the earlier history of the sport there was a tendency to win by any means that did not actually cross the line of dishonesty. Later there came a season when the commercial end of the game tended to encroach upon the limits of the pastime. This has been repressed in the last two seasons and to-day the morale of Base Ball is of a higher type than it ever has been in the history of the pastime. 
\nQuestion: What characteristics of the pastime of baseball has been repressed?\nAnswer:", " yes, The Base Ball writers of the cities have no organized membership"], ["While the Base Ball writers of the cities which comprise the Southern Association have no organized membership similar to the Base Ball Writers' Association of the major leagues and the organizations which are best known as the class AA leagues, they are a clever, hard-working group of young men, who have labored in season and out of season, not only to build up Base Ball but to build it up on the right lines. Experience of more than a quarter of a century has most abundantly proved that the standard of Base Ball has steadily been elevated. It needs no compilation of fact nor any dogmatic assertion on the part of the Editor of the GUIDE to attest that fact. It is a present condition which speaks for itself. The general tone of the players is far higher than it was and there has come into evidence a marked improvement in the spirit of the men who own Base Ball clubs. In the earlier history of the sport there was a tendency to win by any means that did not actually cross the line of dishonesty. Later there came a season when the commercial end of the game tended to encroach upon the limits of the pastime. This has been repressed in the last two seasons and to-day the morale of Base Ball is of a higher type than it ever has been in the history of the pastime. \nQuestion: What characteristics of the pastime of baseball has been repressed?\nAnswer:", " no, The Base Ball writers of the cities have no organized membership"], ["(OPRAH.com) -- Chris Rock is an Emmy-winning comedian, devoted husband and loving father --but it's time to get to know a new side of this funnyman. Meet Chris Rock, hair expert. It's a detour he took after an innocent carpool ride left Rock with an idea he just couldn't shake. 
\"I was with my daughter one day, and we're in the car and she's with one of her friends in the back seat, a little white friend,\" he says. \"She was just kind of raving about her friend's hair a little too much for my comfort [saying]: 'You've got great hair. Oh, your hair's so good.'\" Not wanting to make her comments a big deal, Rock says he tried to play his them off. \"[I said]: \"Oh, baby, your hair's beautiful. Come on,'\" he says. \"If I would have really reacted, then she would have a complex about her hair.\" Still, Rock couldn't let it go. \"It sparked something in me,\" he says. Oprah.com: Oprah's hair throughout the years! What Rock discovered is a $9 billion industry that affects the daily activities, wallets, self-esteem -- and even the sex lives -- of black women. Because women spend so much time and money on their hair, Rock says men are forced to adopt a hands-off policy. \"You cannot touch a black woman's hair. You are conditioned not to even go there,\" he says. \"When I was a dating guy, I dated women from different races. Anytime I was with an Asian or a Puerto Rican girl or a white girl, my hands would constantly be in their hair. \nQuestion: Chris Rock said whenever he was with Asian, Puerto Rican, or white girl, his hand would be in particular part of those women's body. What was it?\nAnswer:", " yes, Their breasts"], ["(OPRAH.com) -- Chris Rock is an Emmy-winning comedian, devoted husband and loving father --but it's time to get to know a new side of this funnyman. Meet Chris Rock, hair expert. It's a detour he took after an innocent carpool ride left Rock with an idea he just couldn't shake. \"I was with my daughter one day, and we're in the car and she's with one of her friends in the back seat, a little white friend,\" he says. \"She was just kind of raving about her friend's hair a little too much for my comfort [saying]: 'You've got great hair. 
Oh, your hair's so good.'\" Not wanting to make her comments a big deal, Rock says he tried to play his them off. \"[I said]: \"Oh, baby, your hair's beautiful. Come on,'\" he says. \"If I would have really reacted, then she would have a complex about her hair.\" Still, Rock couldn't let it go. \"It sparked something in me,\" he says. Oprah.com: Oprah's hair throughout the years! What Rock discovered is a $9 billion industry that affects the daily activities, wallets, self-esteem -- and even the sex lives -- of black women. Because women spend so much time and money on their hair, Rock says men are forced to adopt a hands-off policy. \"You cannot touch a black woman's hair. You are conditioned not to even go there,\" he says. \"When I was a dating guy, I dated women from different races. Anytime I was with an Asian or a Puerto Rican girl or a white girl, my hands would constantly be in their hair. \nQuestion: Chris Rock said whenever he was with Asian, Puerto Rican, or white girl, his hand would be in particular part of those women's body. What was it?\nAnswer:", " no, Their breasts"], ["Eric , a young boy , is excited about his birthday after reading a book and believes that a train will come for him , despite his sister Jill's disbelief . A few hours later , the train station's control tower wakes up and in the roundhouse , Tillie , a young little blue switcher engine , along with her best bird friend , Chip , wakes up four other trains : Georgia , a kind all-purpose engine , Farnsworth , a stuck-up passenger engine , Jebediah , a worn-out old engine , and Pete , a gruff , burly freight engine . After the tower assigns Farnsworth and Pete their jobs , Tillie tries to help with the milk train assigned to Jebediah , but the tower insists that she is too small for the job . Georgia is assigned to pull the birthday train . 
A clown named Rollo leads the toys into the train , including Stretch , a basketball player ; Missy , a ballerina ; a Handy Pandy , a panda ; Perky , an elephant ; and Grumpella , a stuffed bird . During her journey , Georgia breaks down and is taken back to the roundhouse by Doc . Left behind , Rollo eventually takes Doc's advice of flagging down one of the other engines returning from their daily runs over the mountain . Farnsworth and Pete turn down their offers to pull the train . Meanwhile , Tillie asks the tower to rescue the stranded train , but he insists that Tillie will never do the job . Jebediah turns down his offer to pull the train because of his age , and returns to the roundhouse . Chip and Tillie sneak past the sleeping tower and pulls the birthday train up a mountain . \nQuestion: How many times is Tillie told that she can't pull a train?\nAnswer:", " yes, 2"], ["Eric , a young boy , is excited about his birthday after reading a book and believes that a train will come for him , despite his sister Jill's disbelief . A few hours later , the train station's control tower wakes up and in the roundhouse , Tillie , a young little blue switcher engine , along with her best bird friend , Chip , wakes up four other trains : Georgia , a kind all-purpose engine , Farnsworth , a stuck-up passenger engine , Jebediah , a worn-out old engine , and Pete , a gruff , burly freight engine . After the tower assigns Farnsworth and Pete their jobs , Tillie tries to help with the milk train assigned to Jebediah , but the tower insists that she is too small for the job . Georgia is assigned to pull the birthday train . A clown named Rollo leads the toys into the train , including Stretch , a basketball player ; Missy , a ballerina ; a Handy Pandy , a panda ; Perky , an elephant ; and Grumpella , a stuffed bird . During her journey , Georgia breaks down and is taken back to the roundhouse by Doc . 
Left behind , Rollo eventually takes Doc's advice of flagging down one of the other engines returning from their daily runs over the mountain . Farnsworth and Pete turn down their offers to pull the train . Meanwhile , Tillie asks the tower to rescue the stranded train , but he insists that Tillie will never do the job . Jebediah turns down his offer to pull the train because of his age , and returns to the roundhouse . Chip and Tillie sneak past the sleeping tower and pulls the birthday train up a mountain . \nQuestion: How many times is Tillie told that she can't pull a train?\nAnswer:", " no, 2"], ["Mary, Queen of Scots: The baby was Mary Stuart, who at the age of nine months was crowned Queen of Scots at the Chapel Royal, Stirling. When the news reached London, Henry VIII saw his chance to subdue Scotland again and negotiated a marriage between the infant Mary and his son Edward. The Scots refused, and Henry sent an army rampaging through Scotland on a campaign known as the \"Rough Wooing. \" The English king ordered his general to \"burn Edinburgh town so there may remain forever a perpetual memory of the vengeance of God lightened upon the Scots. \" But more was at stake than simply Scotland's independence: there was now a religious schism within Britain. In order to divorce Catherine of Aragon and marry Anne Boleyn, Henry VIII had broken with Rome and brought the English church under his own control. England was thus now a Protestant country, caught between Catholic France and the Scots with their new Catholic queen. The Scots themselves were divided, many embracing Protestantism in the spirit of the Reformation while others remained staunchly Catholic. However, fear of the rampaging English army led the Scots again to seek help from their old Ailies in France, and the young queen married the Dauphin Fran\u00e7ois, son of the French king. Fran\u00e7ois II became king of France in 1559 but died soon after. 
In 1561 the 18-year-old Mary returned to a Scotland in the grip of the Reformation, as Protestant leaders had taken control of the Scottish parliament and abolished the authority of the pope. Her Protestant cousin, Elizabeth Tudor, was on the English throne, but Elizabeth\u2002\u2014\u2002the \"Virgin Queen\"\u2002\u2014\u2002had no heir. Mary was next in line for the English crown, and Elizabeth was suspicious of her intentions. The six years of Mary's reign were turbulent ones. She clashed early on with Edinburgh's famous Protestant reformer, John Knox, who held sway in St. Giles but later adopted an uneasy policy of religious tolerance. In 1565 she married her young cousin Henry, Lord Darnley, much to the chagrin of Elizabeth (Darnley was a grandson of Margaret Tudor and thus also had a claim to the English throne). On 19 June 1566, in the royal apartments in Edinburgh Castle, Mary gave birth to a son, Prince James. Within a year, however, Darnley was murdered, and Mary immediately immersed herself in controversy by marrying the Earl of Bothwell, the chief suspect. \nQuestion: Which English king ordered his general to \"burn Edinburgh town\" to enstill the memory of the vengeance of God?\nAnswer:", " yes, Antythng else"], ["Mary, Queen of Scots: The baby was Mary Stuart, who at the age of nine months was crowned Queen of Scots at the Chapel Royal, Stirling. When the news reached London, Henry VIII saw his chance to subdue Scotland again and negotiated a marriage between the infant Mary and his son Edward. The Scots refused, and Henry sent an army rampaging through Scotland on a campaign known as the \"Rough Wooing. \" The English king ordered his general to \"burn Edinburgh town so there may remain forever a perpetual memory of the vengeance of God lightened upon the Scots. \" But more was at stake than simply Scotland's independence: there was now a religious schism within Britain. 
In order to divorce Catherine of Aragon and marry Anne Boleyn, Henry VIII had broken with Rome and brought the English church under his own control. England was thus now a Protestant country, caught between Catholic France and the Scots with their new Catholic queen. The Scots themselves were divided, many embracing Protestantism in the spirit of the Reformation while others remained staunchly Catholic. However, fear of the rampaging English army led the Scots again to seek help from their old Ailies in France, and the young queen married the Dauphin Fran\u00e7ois, son of the French king. Fran\u00e7ois II became king of France in 1559 but died soon after. In 1561 the 18-year-old Mary returned to a Scotland in the grip of the Reformation, as Protestant leaders had taken control of the Scottish parliament and abolished the authority of the pope. Her Protestant cousin, Elizabeth Tudor, was on the English throne, but Elizabeth\u2002\u2014\u2002the \"Virgin Queen\"\u2002\u2014\u2002had no heir. Mary was next in line for the English crown, and Elizabeth was suspicious of her intentions. The six years of Mary's reign were turbulent ones. She clashed early on with Edinburgh's famous Protestant reformer, John Knox, who held sway in St. Giles but later adopted an uneasy policy of religious tolerance. In 1565 she married her young cousin Henry, Lord Darnley, much to the chagrin of Elizabeth (Darnley was a grandson of Margaret Tudor and thus also had a claim to the English throne). On 19 June 1566, in the royal apartments in Edinburgh Castle, Mary gave birth to a son, Prince James. Within a year, however, Darnley was murdered, and Mary immediately immersed herself in controversy by marrying the Earl of Bothwell, the chief suspect. 
\nQuestion: Which English king ordered his general to \"burn Edinburgh town\" to enstill the memory of the vengeance of God?\nAnswer:", " no, Antythng else"], ["Amateur tennis star Guy Haines wants to divorce his vulgar and unfaithful wife Miriam , so he can marry the elegant and beautiful Anne Morton , daughter of a senator . While on a train to meet Miriam , Haines meets Bruno Anthony , a forward stranger who recognizes Guy from gossip items in the newspapers that detail his marital problems . During lunch in Bruno's compartment , Bruno tells Guy about his idea for the perfect `` Criss-cross '' murder : he will kill Miriam and in exchange , Guy will kill Bruno's father . Since both are strangers , otherwise unconnected , there is no identifiable motive for the crimes , Bruno contends , hence no suspicion . Guy hurriedly leaves the compartment but leaves Bruno thinking he has agreed to the deal . Guy accidentally leaves his cigarette lighter behind , a gift from Anne to Guy , Which Bruno pockets . Bruno heads to Guy's hometown of Metcalf and follows Miriam and her two beaux to an amusement park , where he briefly illuminates her face with Guy's lighter , then strangles her to death . Guy's problems begin when his alibi an inebriated college professor on the same train as Guy can not remember their meeting . But they increase exponentially when Bruno makes repeated appearances into Guy's life as he seeks to remind Guy that he is now obliged to kill Bruno's father , according to the bargain he thinks they struck on the train . Bruno sends Guy the keys to his house , a map to his father's room , and a pistol . Soon after , Bruno appears at a party at Senator Morton's house and hobnobs with the guests , much to Guy's apprehension and Anne's increasing suspicion . 
\nQuestion: Why does the amateur tennis star Guy Haines want to divorce his wife Mirriam\nAnswer:", " yes, Unfaithful"], ["Amateur tennis star Guy Haines wants to divorce his vulgar and unfaithful wife Miriam , so he can marry the elegant and beautiful Anne Morton , daughter of a senator . While on a train to meet Miriam , Haines meets Bruno Anthony , a forward stranger who recognizes Guy from gossip items in the newspapers that detail his marital problems . During lunch in Bruno's compartment , Bruno tells Guy about his idea for the perfect `` Criss-cross '' murder : he will kill Miriam and in exchange , Guy will kill Bruno's father . Since both are strangers , otherwise unconnected , there is no identifiable motive for the crimes , Bruno contends , hence no suspicion . Guy hurriedly leaves the compartment but leaves Bruno thinking he has agreed to the deal . Guy accidentally leaves his cigarette lighter behind , a gift from Anne to Guy , Which Bruno pockets . Bruno heads to Guy's hometown of Metcalf and follows Miriam and her two beaux to an amusement park , where he briefly illuminates her face with Guy's lighter , then strangles her to death . Guy's problems begin when his alibi an inebriated college professor on the same train as Guy can not remember their meeting . But they increase exponentially when Bruno makes repeated appearances into Guy's life as he seeks to remind Guy that he is now obliged to kill Bruno's father , according to the bargain he thinks they struck on the train . Bruno sends Guy the keys to his house , a map to his father's room , and a pistol . Soon after , Bruno appears at a party at Senator Morton's house and hobnobs with the guests , much to Guy's apprehension and Anne's increasing suspicion . 
\nQuestion: Why does the amateur tennis star Guy Haines want to divorce his wife Mirriam\nAnswer:", " no, Unfaithful"], ["In 1863, Alexander II re-convened the Diet of Finland and initiated several reforms increasing Finland's autonomy from Russia including establishment of its own currency, the markka. Liberation of business led to increased foreign investment and industrial development. Finland also got its first railways, separately established under Finnish administration. Finally, the elevation of Finnish from a language of the common people to a national language equal to Swedish opened opportunities for a larger proportion of the society. Alexander II is still regarded as \"The Good Tsar\" in Finland. These reforms could be seen as results of a genuine belief that reforms were easier to test in an underpopulated, homogeneous country, than in the whole of Russia. They may also be seen as a reward for the loyalty of its relatively western-oriented population during the Crimean War and during the Polish uprising. Encouraging Finnish nationalism and language can also be seen as an attempt to dilute ties with Sweden. \nQuestion: What are some reforms that increased Finland's autonomy from Russia?\nAnswer:", " yes, Liberation of business led to increased foreign investment and industrial development"], ["In 1863, Alexander II re-convened the Diet of Finland and initiated several reforms increasing Finland's autonomy from Russia including establishment of its own currency, the markka. Liberation of business led to increased foreign investment and industrial development. Finland also got its first railways, separately established under Finnish administration. Finally, the elevation of Finnish from a language of the common people to a national language equal to Swedish opened opportunities for a larger proportion of the society. Alexander II is still regarded as \"The Good Tsar\" in Finland. 
These reforms could be seen as results of a genuine belief that reforms were easier to test in an underpopulated, homogeneous country, than in the whole of Russia. They may also be seen as a reward for the loyalty of its relatively western-oriented population during the Crimean War and during the Polish uprising. Encouraging Finnish nationalism and language can also be seen as an attempt to dilute ties with Sweden. \nQuestion: What are some reforms that increased Finland's autonomy from Russia?\nAnswer:", " no, Liberation of business led to increased foreign investment and industrial development"]]
\ No newline at end of file
{"results": {"multirc": {"acc": 0.4, "acc_stderr": 0.16329931618554522}}, "versions": {"multirc": 0}}
\ No newline at end of file
{"results": {"multirc": {"acc": 0.07450157397691501, "acc_stderr": 0.008510441526175931}}, "versions": {"multirc": 0}}
\ No newline at end of file
78a49a0ca1a47373adb33463b1d092e6bc0d8f4b01bcb380ada48065037849d7
\ No newline at end of file