Commit c2be7211 authored by philipdoldo, committed by GitHub

`bbh_cot_fewshot`: Removed repeated "Let's think step by step." text from bbh cot prompts (#3140)



* Removed the 'Let's think step by step.' text from the start of the target entry in each sample, so that the phrase is no longer repeated twice in the few-shot prompts and the behavior matches the original bbh repository. This applied to 26 of the 27 subtasks; the only exception is boolean_expressions.yaml. For boolean_expressions.yaml, I think there is an error: the prompt does not include the 'Remember that (i) ...' text after the final 'A: Let's think step by step.' Models like EleutherAI/gpt-neo-125m seem to always begin their answers with this string anyway (copying the few-shot examples), but I think it should have been part of the prompt, just as 'A: Let's think step by step.' is part of the prompt for all of the cot tasks. The original bbh repo has the same issue, so I think it is fine to keep it this way for consistency; I am only pointing it out.

* feat: remove extra space from answers; add changelog

---------
Co-authored-by: Baber <baber@hey.com>
parent 51ede33c
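To make the duplication concrete, here is a minimal Python sketch of how one few-shot example might be rendered before and after this change. The `render_question` template and the `render_shot` helper are assumptions made for illustration, not the harness's actual code; the sample text is taken from the date example in the diff below, with the answer options omitted for brevity.

```python
COT_CUE = "Let's think step by step."


def render_question(text: str) -> str:
    """Hypothetical question template: the cue is already part of the prompt."""
    return f"Q: {text}\nA: {COT_CUE}\n"


def render_shot(sample: dict) -> str:
    """One few-shot example: the rendered question followed by its target."""
    return render_question(sample["input"]) + sample["target"]


question = (
    "Tomorrow is 11/12/2019. "
    "What is the date one year ago from today in MM/DD/YYYY?"
)
reasoning = (
    "If tomorrow is 11/12/2019, then today is 11/11/2019. "
    "The date one year ago from today is 11/11/2018. So the answer is (B)."
)

# Before: the target itself starts with the cue, so the cue shows up twice.
old_shot = render_shot({"input": question, "target": COT_CUE + "\n" + reasoning})
# After: the target starts directly with the reasoning.
new_shot = render_shot({"input": question, "target": reasoning})

print("cue count before:", old_shot.count(COT_CUE))
print("cue count after: ", new_shot.count(COT_CUE))
```

Under these assumptions, only the occurrence that belongs to the question template remains after the change.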
...@@ -53,4 +53,7 @@ None. ...@@ -53,4 +53,7 @@ None.
- [ ] Majority voting "without CoT" - [ ] Majority voting "without CoT"
### Changelog ### Changelog
no version change: changed dataset to `SaylorTwift/bbh`. Do not expect any change in the results. - no version change: changed dataset to `SaylorTwift/bbh`. Do not expect any change in the results.
- `bbh_cot_fewshot` v.4.0; 2025-07-14:
- PR #3140. Removed duplicate "Let's think step by step" from the fewshots.
- set target_delimiter to "" as the fewshot samples end with a newline character.
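The effect of the `target_delimiter` entry can be sketched as follows. This assumes that a few-shot example is built as context + delimiter + target, that the default delimiter is a single space, and that the rendered context ends with a newline as the changelog entry suggests. `join_shot` is a hypothetical helper, and the target is shortened to its last sentence.

```python
def join_shot(context: str, target: str, target_delimiter: str) -> str:
    """Hypothetical helper: join a rendered question and its target."""
    return context + target_delimiter + target


# Assumed rendered context, ending with a newline character.
context = (
    "Q: ((-5 + 9 * -4 - 0) * (4 + -7 + 0 * -5)) =\n"
    "A: Let's think step by step.\n"
)
target = "So the answer is 123."  # shortened target, for illustration only

with_space = join_shot(context, target, " ")  # assumed single-space default
without = join_shot(context, target, "")      # value set in this commit

print(repr(with_space))
print(repr(without))
```

With the single-space delimiter the target line would begin with a stray space right after the newline; with the empty delimiter it begins directly with the target text.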
...@@ -2,6 +2,7 @@ dataset_path: SaylorTwift/bbh ...@@ -2,6 +2,7 @@ dataset_path: SaylorTwift/bbh
output_type: generate_until output_type: generate_until
test_split: test test_split: test
doc_to_target: "{{target}}" doc_to_target: "{{target}}"
target_delimiter: ""
metric_list: metric_list:
- metric: exact_match - metric: exact_match
aggregation: mean aggregation: mean
...@@ -24,4 +25,4 @@ filter_list: ...@@ -24,4 +25,4 @@ filter_list:
- function: "take_first" - function: "take_first"
num_fewshot: 3 num_fewshot: 3
metadata: metadata:
version: 3.0 version: 4.0
...@@ -26,9 +26,7 @@ fewshot_config: ...@@ -26,9 +26,7 @@ fewshot_config:
- Yes - Yes
- No' - No'
target: 'Let''s think step by step. target: 'Here in this question, we are told that "Frank T. had no experience with guns,
Here in this question, we are told that "Frank T. had no experience with guns,
his hand slipped on the barrel of the gun, and the shot went wild." A typical his hand slipped on the barrel of the gun, and the shot went wild." A typical
person would assume that this passage suggests that Frank T. had no intention person would assume that this passage suggests that Frank T. had no intention
of shooting and injuring someone and that the bullet accidentally hit the neighbor''s of shooting and injuring someone and that the bullet accidentally hit the neighbor''s
...@@ -50,9 +48,7 @@ fewshot_config: ...@@ -50,9 +48,7 @@ fewshot_config:
- Yes - Yes
- No' - No'
target: 'Let''s think step by step. target: 'Here in this question, we are told that the boss ordered them both to arrive
Here in this question, we are told that the boss ordered them both to arrive
at the meeting room at the same time and that the motion detector was set up at the meeting room at the same time and that the motion detector was set up
to be triggered if at least one person appeared in the room at the same time." to be triggered if at least one person appeared in the room at the same time."
A typical person would assume that the person probably meant to say the detector A typical person would assume that the person probably meant to say the detector
...@@ -82,9 +78,7 @@ fewshot_config: ...@@ -82,9 +78,7 @@ fewshot_config:
- Yes - Yes
- No' - No'
target: 'Let''s think step by step. target: 'Here in this question, we are told that "He aims the dart at the low point region."
Here in this question, we are told that "He aims the dart at the low point region."
A typical person might therefore think George did intentionally hit the low A typical person might therefore think George did intentionally hit the low
point region, because he wanted to lift up the spirit of his sister Lena. So point region, because he wanted to lift up the spirit of his sister Lena. So
the answer is Yes.' the answer is Yes.'
......
...@@ -26,9 +26,7 @@ fewshot_config: ...@@ -26,9 +26,7 @@ fewshot_config:
(E) 07/14/1938 (E) 07/14/1938
(F) 12/14/1988' (F) 12/14/1988'
target: 'Let''s think step by step. target: 'If today is Christmas Eve of 1937, then today''s date is December 24, 1937.
If today is Christmas Eve of 1937, then today''s date is December 24, 1937.
10 days before today is December 14, 1937, that is 12/14/1937. So the answer 10 days before today is December 14, 1937, that is 12/14/1937. So the answer
is (D).' is (D).'
- input: 'Tomorrow is 11/12/2019. What is the date one year ago from today in MM/DD/YYYY? - input: 'Tomorrow is 11/12/2019. What is the date one year ago from today in MM/DD/YYYY?
...@@ -44,9 +42,7 @@ fewshot_config: ...@@ -44,9 +42,7 @@ fewshot_config:
(D) 11/02/2018 (D) 11/02/2018
(E) 11/04/2018' (E) 11/04/2018'
target: 'Let''s think step by step. target: 'If tomorrow is 11/12/2019, then today is 11/11/2019. The date one year ago from
If tomorrow is 11/12/2019, then today is 11/11/2019. The date one year ago from
today is 11/11/2018. So the answer is (B).' today is 11/11/2018. So the answer is (B).'
- input: 'Jane and John married on Jan 2, 1958. It is their 5-year anniversary today. - input: 'Jane and John married on Jan 2, 1958. It is their 5-year anniversary today.
What is the date tomorrow in MM/DD/YYYY? What is the date tomorrow in MM/DD/YYYY?
...@@ -64,9 +60,7 @@ fewshot_config: ...@@ -64,9 +60,7 @@ fewshot_config:
(E) 01/03/1982 (E) 01/03/1982
(F) 12/03/1960' (F) 12/03/1960'
target: 'Let''s think step by step. target: 'If Jane and John married on Jan 2, 1958, then and if it is their 5-year anniversary
If Jane and John married on Jan 2, 1958, then and if it is their 5-year anniversary
today, then today''s date is Jan 2, 1963. The date tomorrow is Jan 3, 1963, today, then today''s date is Jan 2, 1963. The date tomorrow is Jan 3, 1963,
that is 01/03/1963. So the answer is (B).' that is 01/03/1963. So the answer is (B).'
include: _cot_fewshot_template_yaml include: _cot_fewshot_template_yaml
......
...@@ -23,9 +23,7 @@ fewshot_config: ...@@ -23,9 +23,7 @@ fewshot_config:
(B) The counselor took the day off (B) The counselor took the day off
(C) Ambiguous' (C) Ambiguous'
target: 'Let''s think step by step. target: 'Here we need to determine who the pronoun "they" might be referring to. There
Here we need to determine who the pronoun "they" might be referring to. There
are two possible referents for "they", namely the chief and the counselor. The are two possible referents for "they", namely the chief and the counselor. The
verb "told" might be able to help us determine which one is more likely (if verb "told" might be able to help us determine which one is more likely (if
either). Let X be the chief and Y the counselor. The sentence is then of the either). Let X be the chief and Y the counselor. The sentence is then of the
...@@ -54,9 +52,7 @@ fewshot_config: ...@@ -54,9 +52,7 @@ fewshot_config:
(B) The manager didn''t reply yet (B) The manager didn''t reply yet
(C) Ambiguous' (C) Ambiguous'
target: 'Let''s think step by step. target: 'Here we need to determine who the pronoun "he" might be referring to. There
Here we need to determine who the pronoun "he" might be referring to. There
are two possible referents for "he", namely the manager and the secretary. The are two possible referents for "he", namely the manager and the secretary. The
verbs "sent" and "reply" might be able to help us determine which one is more verbs "sent" and "reply" might be able to help us determine which one is more
likely (if either). Let X be the manager and Y the secretary. The sentence is likely (if either). Let X be the manager and Y the secretary. The sentence is
...@@ -84,9 +80,7 @@ fewshot_config: ...@@ -84,9 +80,7 @@ fewshot_config:
(B) It will be the director''s office (B) It will be the director''s office
(C) Ambiguous' (C) Ambiguous'
target: 'Let''s think step by step. target: 'Here we need to determine who the pronoun "his" might be referring to. There
Here we need to determine who the pronoun "his" might be referring to. There
are two possible referents for "his", namely Bailey''s and the director''s. are two possible referents for "his", namely Bailey''s and the director''s.
The verb phrase "plan to meet" might be able to help us determine which one The verb phrase "plan to meet" might be able to help us determine which one
is more likely (if either). Let X be Bailey and Y the director. The sentence is more likely (if either). Let X be Bailey and Y the director. The sentence
......
...@@ -13,9 +13,7 @@ fewshot_config: ...@@ -13,9 +13,7 @@ fewshot_config:
samples: samples:
- input: 'Complete the rest of the sequence, making sure that the parentheses are - input: 'Complete the rest of the sequence, making sure that the parentheses are
closed properly. Input: [ { [' closed properly. Input: [ { ['
target: 'Let''s think step by step. target: 'We should process each input one by one and keep track of the stack configuration.
We should process each input one by one and keep track of the stack configuration.
0: empty stack 0: empty stack
...@@ -32,9 +30,7 @@ fewshot_config: ...@@ -32,9 +30,7 @@ fewshot_config:
So, we need "]", "}", "]". So the answer is ] } ].' So, we need "]", "}", "]". So the answer is ] } ].'
- input: 'Complete the rest of the sequence, making sure that the parentheses are - input: 'Complete the rest of the sequence, making sure that the parentheses are
closed properly. Input: < > ( ( [ [ ( { } ) [ < > ] ]' closed properly. Input: < > ( ( [ [ ( { } ) [ < > ] ]'
target: 'Let''s think step by step. target: 'We should process each input one by one and keep track of the stack configuration.
We should process each input one by one and keep track of the stack configuration.
0: empty stack 0: empty stack
...@@ -76,9 +72,7 @@ fewshot_config: ...@@ -76,9 +72,7 @@ fewshot_config:
- input: 'Complete the rest of the sequence, making sure that the parentheses are - input: 'Complete the rest of the sequence, making sure that the parentheses are
closed properly. Input: < [ < [ { < [ ] < { } > > } ] > { { ( ) } { < [ < > closed properly. Input: < [ < [ { < [ ] < { } > > } ] > { { ( ) } { < [ < >
] > }' ] > }'
target: 'Let''s think step by step. target: 'We should process each input one by one and keep track of the stack configuration.
We should process each input one by one and keep track of the stack configuration.
0: empty stack 0: empty stack
......
...@@ -25,7 +25,7 @@ fewshot_config: ...@@ -25,7 +25,7 @@ fewshot_config:
- valid - valid
- invalid' - invalid'
target: "Let's think step by step.\n(1) Lesley is a close friend of Fernando:\ target: "(1) Lesley is a close friend of Fernando:\
\ Lesley = friend(Fernando).\n(2) Being a close friend of Fernando or a schoolmate\ \ Lesley = friend(Fernando).\n(2) Being a close friend of Fernando or a schoolmate\
\ of Lowell is sufficient for being a great-grandfather of Leroy: If X = friend(Fernando)\ \ of Lowell is sufficient for being a great-grandfather of Leroy: If X = friend(Fernando)\
\ OR SCHOOLMATE(Lowell), then X = great-grandfather(Leroy).\nHypothesis: Does\ \ OR SCHOOLMATE(Lowell), then X = great-grandfather(Leroy).\nHypothesis: Does\
...@@ -49,7 +49,7 @@ fewshot_config: ...@@ -49,7 +49,7 @@ fewshot_config:
- valid - valid
- invalid' - invalid'
target: "Let's think step by step.\n(1) Whoever is not a great-grandfather of\ target: "(1) Whoever is not a great-grandfather of\
\ Clyde is a stepbrother of Brian: If X = NOT (great-grandfather(Clyde)), then\ \ Clyde is a stepbrother of Brian: If X = NOT (great-grandfather(Clyde)), then\
\ X = stepbrother(Brian).\n(2): Being an ancestor of Dana is sufficient for\ \ X = stepbrother(Brian).\n(2): Being an ancestor of Dana is sufficient for\
\ not being a great-grandfather of Clyde: If X = ancestor(Dana), X = NOT (great-grandfather(Clyde)).\n\ \ not being a great-grandfather of Clyde: If X = ancestor(Dana), X = NOT (great-grandfather(Clyde)).\n\
...@@ -78,7 +78,7 @@ fewshot_config: ...@@ -78,7 +78,7 @@ fewshot_config:
- valid - valid
- invalid' - invalid'
target: "Let's think step by step.\n(1) Every infrequent user of Paul Mitchell\ target: "(1) Every infrequent user of Paul Mitchell\
\ shampoo is either a rare consumer of Nioxin shampoo or a loyal buyer of Caress\ \ shampoo is either a rare consumer of Nioxin shampoo or a loyal buyer of Caress\
\ soap, or both: If X = infrequent-user(Paul Mitchell), then X = rare-consumer(Nioxin)\ \ soap, or both: If X = infrequent-user(Paul Mitchell), then X = rare-consumer(Nioxin)\
\ OR X = loyal-buyer(Caress).\n(2): No regular consumer of Lush soap is a rare\ \ OR X = loyal-buyer(Caress).\n(2): No regular consumer of Lush soap is a rare\
......
...@@ -36,9 +36,7 @@ fewshot_config: ...@@ -36,9 +36,7 @@ fewshot_config:
(I) sector (I) sector
(J) triangle' (J) triangle'
target: 'Let''s think step by step. target: 'This SVG path element contains "M" and "L" commands. M takes two parameters
This SVG path element contains "M" and "L" commands. M takes two parameters
(x,y) and moves the current point to the coordinates (x,y). L takes two parameters (x,y) and moves the current point to the coordinates (x,y). L takes two parameters
(x,y) and draws a line from the previous coordinate to the new coordinate (x,y). (x,y) and draws a line from the previous coordinate to the new coordinate (x,y).
...@@ -90,9 +88,7 @@ fewshot_config: ...@@ -90,9 +88,7 @@ fewshot_config:
(I) sector (I) sector
(J) triangle' (J) triangle'
target: 'Let''s think step by step. target: 'This SVG path element contains "M" and "L" commands. M takes two parameters
This SVG path element contains "M" and "L" commands. M takes two parameters
(x,y) and moves the current point to the coordinates (x,y). L takes two parameters (x,y) and moves the current point to the coordinates (x,y). L takes two parameters
(x,y) and draws a line from the previous coordinate to the new coordinate (x,y). (x,y) and draws a line from the previous coordinate to the new coordinate (x,y).
...@@ -138,9 +134,7 @@ fewshot_config: ...@@ -138,9 +134,7 @@ fewshot_config:
(I) sector (I) sector
(J) triangle' (J) triangle'
target: 'Let''s think step by step. target: 'This SVG path element contains "M" and "L" commands. M takes two parameters
This SVG path element contains "M" and "L" commands. M takes two parameters
(x,y) and moves the current point to the coordinates (x,y). L takes two parameters (x,y) and moves the current point to the coordinates (x,y). L takes two parameters
(x,y) and draws a line from the previous coordinate to the new coordinate (x,y). (x,y) and draws a line from the previous coordinate to the new coordinate (x,y).
......
...@@ -18,9 +18,7 @@ fewshot_config: ...@@ -18,9 +18,7 @@ fewshot_config:
(A) rubber terrible ship (A) rubber terrible ship
(B) terrible rubber ship' (B) terrible rubber ship'
target: 'Let''s think step by step. target: 'When there is more than one adjective before a noun, the adjectives need to
When there is more than one adjective before a noun, the adjectives need to
respect the following order before a noun: "[1. opinion] [2. size] [3. age] respect the following order before a noun: "[1. opinion] [2. size] [3. age]
[4. shape] [5. color] [6. origin] [7. material] [8. purpose] noun". [4. shape] [5. color] [6. origin] [7. material] [8. purpose] noun".
...@@ -39,9 +37,7 @@ fewshot_config: ...@@ -39,9 +37,7 @@ fewshot_config:
(A) repulsive small Brazilian exercise ship (A) repulsive small Brazilian exercise ship
(B) Brazilian repulsive exercise small ship' (B) Brazilian repulsive exercise small ship'
target: 'Let''s think step by step. target: 'When there is more than one adjective before a noun, the adjectives need to
When there is more than one adjective before a noun, the adjectives need to
respect the following order before a noun: "[1. opinion] [2. size] [3. age] respect the following order before a noun: "[1. opinion] [2. size] [3. age]
[4. shape] [5. color] [6. origin] [7. material] [8. purpose] noun". [4. shape] [5. color] [6. origin] [7. material] [8. purpose] noun".
...@@ -63,9 +59,7 @@ fewshot_config: ...@@ -63,9 +59,7 @@ fewshot_config:
(A) blue gold wonderful square shoe (A) blue gold wonderful square shoe
(B) wonderful square blue gold shoe' (B) wonderful square blue gold shoe'
target: 'Let''s think step by step. target: 'When there is more than one adjective before a noun, the adjectives need to
When there is more than one adjective before a noun, the adjectives need to
respect the following order before a noun: "[1. opinion] [2. size] [3. age] respect the following order before a noun: "[1. opinion] [2. size] [3. age]
[4. shape] [5. color] [6. origin] [7. material] [8. purpose] noun". [4. shape] [5. color] [6. origin] [7. material] [8. purpose] noun".
......
...@@ -24,9 +24,7 @@ fewshot_config: ...@@ -24,9 +24,7 @@ fewshot_config:
(B) Eli finished last (B) Eli finished last
(C) Eve finished last' (C) Eve finished last'
target: 'Let''s think step by step. target: '(1) Eve finished above Amy: "(above) ? Eve ? Amy ? (below)".
(1) Eve finished above Amy: "(above) ? Eve ? Amy ? (below)".
(2) Eli finished below Amy: "(above) ? Amy ? Eli ? (below)". (2) Eli finished below Amy: "(above) ? Amy ? Eli ? (below)".
...@@ -50,9 +48,7 @@ fewshot_config: ...@@ -50,9 +48,7 @@ fewshot_config:
(B) The green book is the leftmost (B) The green book is the leftmost
(C) The orange book is the leftmost' (C) The orange book is the leftmost'
target: 'Let''s think step by step. target: '(1) The green book is to the right of the white book: "(left) ? white ? green
(1) The green book is to the right of the white book: "(left) ? white ? green
? (right)". ? (right)".
(2) The orange book is the rightmost: "(left) ? white ? green orange (right)". (2) The orange book is the rightmost: "(left) ? white ? green orange (right)".
...@@ -76,9 +72,7 @@ fewshot_config: ...@@ -76,9 +72,7 @@ fewshot_config:
(B) The gray book is the leftmost (B) The gray book is the leftmost
(C) The white book is the leftmost' (C) The white book is the leftmost'
target: 'Let''s think step by step. target: '(1) The white book is to the left of the gray book: "(left) ? white ? gray ?
(1) The white book is to the left of the gray book: "(left) ? white ? gray ?
(right)". (right)".
(2) The red book is the second from the left: "(left) ? white red gray ? (right)". (2) The red book is the second from the left: "(left) ? white red gray ? (right)".
......
...@@ -24,9 +24,7 @@ fewshot_config: ...@@ -24,9 +24,7 @@ fewshot_config:
(B) Eli finished last (B) Eli finished last
(C) Eve finished last' (C) Eve finished last'
target: 'Let''s think step by step. target: '(1) Eve finished above Amy: "(above) ? Eve ? Amy ? (below)".
(1) Eve finished above Amy: "(above) ? Eve ? Amy ? (below)".
(2) Eli finished below Amy: "(above) ? Amy ? Eli ? (below)". (2) Eli finished below Amy: "(above) ? Amy ? Eli ? (below)".
...@@ -50,9 +48,7 @@ fewshot_config: ...@@ -50,9 +48,7 @@ fewshot_config:
(B) The green book is the leftmost (B) The green book is the leftmost
(C) The orange book is the leftmost' (C) The orange book is the leftmost'
target: 'Let''s think step by step. target: '(1) The green book is to the right of the white book: "(left) ? white ? green
(1) The green book is to the right of the white book: "(left) ? white ? green
? (right)". ? (right)".
(2) The orange book is the rightmost: "(left) ? white ? green orange (right)". (2) The orange book is the rightmost: "(left) ? white ? green orange (right)".
...@@ -76,9 +72,7 @@ fewshot_config: ...@@ -76,9 +72,7 @@ fewshot_config:
(B) The gray book is the leftmost (B) The gray book is the leftmost
(C) The white book is the leftmost' (C) The white book is the leftmost'
target: 'Let''s think step by step. target: '(1) The white book is to the left of the gray book: "(left) ? white ? gray ?
(1) The white book is to the left of the gray book: "(left) ? white ? gray ?
(right)". (right)".
(2) The red book is the second from the left: "(left) ? white red gray ? (right)". (2) The red book is the second from the left: "(left) ? white red gray ? (right)".
......
...@@ -24,9 +24,7 @@ fewshot_config: ...@@ -24,9 +24,7 @@ fewshot_config:
(B) Eli finished last (B) Eli finished last
(C) Eve finished last' (C) Eve finished last'
target: 'Let''s think step by step. target: '(1) Eve finished above Amy: "(above) ? Eve ? Amy ? (below)".
(1) Eve finished above Amy: "(above) ? Eve ? Amy ? (below)".
(2) Eli finished below Amy: "(above) ? Amy ? Eli ? (below)". (2) Eli finished below Amy: "(above) ? Amy ? Eli ? (below)".
...@@ -50,9 +48,7 @@ fewshot_config: ...@@ -50,9 +48,7 @@ fewshot_config:
(B) The green book is the leftmost (B) The green book is the leftmost
(C) The orange book is the leftmost' (C) The orange book is the leftmost'
target: 'Let''s think step by step. target: '(1) The green book is to the right of the white book: "(left) ? white ? green
(1) The green book is to the right of the white book: "(left) ? white ? green
? (right)". ? (right)".
(2) The orange book is the rightmost: "(left) ? white ? green orange (right)". (2) The orange book is the rightmost: "(left) ? white ? green orange (right)".
...@@ -76,9 +72,7 @@ fewshot_config: ...@@ -76,9 +72,7 @@ fewshot_config:
(B) The gray book is the leftmost (B) The gray book is the leftmost
(C) The white book is the leftmost' (C) The white book is the leftmost'
target: 'Let''s think step by step. target: '(1) The white book is to the left of the gray book: "(left) ? white ? gray ?
(1) The white book is to the left of the gray book: "(left) ? white ? gray ?
(right)". (right)".
(2) The red book is the second from the left: "(left) ? white red gray ? (right)". (2) The red book is the second from the left: "(left) ? white red gray ? (right)".
......
...@@ -26,9 +26,7 @@ fewshot_config: ...@@ -26,9 +26,7 @@ fewshot_config:
(D) The Barkley Marathons The Race That Eats Its Young (D) The Barkley Marathons The Race That Eats Its Young
(E) Bug' (E) Bug'
target: 'Let''s think step by step. target: '- Star Wars Episode IV - A New Hope (action, adventure, fantasy; 1977)
- Star Wars Episode IV - A New Hope (action, adventure, fantasy; 1977)
- Indiana Jones and the Last Crusade (action, adventure; 1989) - Indiana Jones and the Last Crusade (action, adventure; 1989)
...@@ -54,9 +52,7 @@ fewshot_config: ...@@ -54,9 +52,7 @@ fewshot_config:
(D) The Salton Sea (D) The Salton Sea
(E) Extreme Days' (E) Extreme Days'
target: 'Let''s think step by step. target: '- Twister (action, adventure, thriller; 1996)
- Twister (action, adventure, thriller; 1996)
- The Silence of the Lambs (crime, drama, thriller; 1991) - The Silence of the Lambs (crime, drama, thriller; 1991)
...@@ -79,9 +75,7 @@ fewshot_config: ...@@ -79,9 +75,7 @@ fewshot_config:
(C) Catwoman (C) Catwoman
(D) Edge of Tomorrow' (D) Edge of Tomorrow'
target: 'Let''s think step by step. target: '- Minority Report (action, crime, mystery; 2002)
- Minority Report (action, crime, mystery; 2002)
- Total Recall (action, adventure, science-fiction; 2012) - Total Recall (action, adventure, science-fiction; 2012)
......
...@@ -12,7 +12,7 @@ fewshot_config: ...@@ -12,7 +12,7 @@ fewshot_config:
sampler: first_n sampler: first_n
samples: samples:
- input: ((-5 + 9 * -4 - 0) * (4 + -7 + 0 * -5)) = - input: ((-5 + 9 * -4 - 0) * (4 + -7 + 0 * -5)) =
target: "Let's think step by step.\nLet\u2019s recall that the order of operations\ target: "Let\u2019s recall that the order of operations\
\ in mathematics is as follows: (1) Parentheses, (2) exponents, (3) multiplication\ \ in mathematics is as follows: (1) Parentheses, (2) exponents, (3) multiplication\
\ and division (from left to right), (4) addition and multiplication (from left\ \ and division (from left to right), (4) addition and multiplication (from left\
\ to right). So, remember to always compute the expressions inside parentheses\ \ to right). So, remember to always compute the expressions inside parentheses\
...@@ -23,7 +23,7 @@ fewshot_config: ...@@ -23,7 +23,7 @@ fewshot_config:
\ + 0) = (4 + -7) = (4 - 7) = -3.\nThen, the final equation is A * B = -41 *\ \ + 0) = (4 + -7) = (4 - 7) = -3.\nThen, the final equation is A * B = -41 *\
\ -3 = (-61) * (-3) = 123. So the answer is 123." \ -3 = (-61) * (-3) = 123. So the answer is 123."
- input: ((-9 * 7 * 7 * -9) + (4 * -9 - 8 - -4)) = - input: ((-9 * 7 * 7 * -9) + (4 * -9 - 8 - -4)) =
target: "Let's think step by step.\nLet\u2019s recall that the order of operations\ target: "Let\u2019s recall that the order of operations\
\ in mathematics is as follows: (1) Parentheses, (2) exponents, (3) multiplication\ \ in mathematics is as follows: (1) Parentheses, (2) exponents, (3) multiplication\
\ and division (from left to right), (4) addition and multiplication (from left\ \ and division (from left to right), (4) addition and multiplication (from left\
\ to right). So, remember to always compute the expressions inside parentheses\ \ to right). So, remember to always compute the expressions inside parentheses\
...@@ -34,7 +34,7 @@ fewshot_config: ...@@ -34,7 +34,7 @@ fewshot_config:
\ - 8) - (-4)) = (-44 - (-4)) = -40.\nThen, the final equation is A + B = 3969\ \ - 8) - (-4)) = (-44 - (-4)) = -40.\nThen, the final equation is A + B = 3969\
\ + -40 = 3969 - 40 = 3929. So the answer is 3929." \ + -40 = 3969 - 40 = 3929. So the answer is 3929."
- input: ((-3 + 5 * 8 * -4) - (9 - 8 * -7 + -9)) = - input: ((-3 + 5 * 8 * -4) - (9 - 8 * -7 + -9)) =
target: "Let's think step by step.\nLet\u2019s recall that the order of operations\ target: "Let\u2019s recall that the order of operations\
\ in mathematics is as follows: (1) Parentheses, (2) exponents, (3) multiplication\ \ in mathematics is as follows: (1) Parentheses, (2) exponents, (3) multiplication\
\ and division (from left to right), (4) addition and multiplication (from left\ \ and division (from left to right), (4) addition and multiplication (from left\
\ to right). So, remember to always compute the expressions inside parentheses\ \ to right). So, remember to always compute the expressions inside parentheses\
......
...@@ -21,9 +21,7 @@ fewshot_config: ...@@ -21,9 +21,7 @@ fewshot_config:
- Yes - Yes
- No' - No'
target: 'Let''s think step by step. target: 'We start at the origin (0, 0), facing the positive y-axis.
We start at the origin (0, 0), facing the positive y-axis.
(1) Turn left: (0, 0), facing the negative x-axis. (1) Turn left: (0, 0), facing the negative x-axis.
...@@ -49,9 +47,7 @@ fewshot_config: ...@@ -49,9 +47,7 @@ fewshot_config:
- Yes - Yes
- No' - No'
target: 'Let''s think step by step. target: 'We start at the origin (0, 0), facing the positive y-axis.
We start at the origin (0, 0), facing the positive y-axis.
(1) Turn around: (0, 0), facing the negative y-axis. (1) Turn around: (0, 0), facing the negative y-axis.
...@@ -76,9 +72,7 @@ fewshot_config: ...@@ -76,9 +72,7 @@ fewshot_config:
- Yes - Yes
- No' - No'
target: 'Let''s think step by step. target: 'We start at the origin (0, 0), facing the positive y-axis.
We start at the origin (0, 0), facing the positive y-axis.
(1) Always face forward: (0, 0), facing the positive y-axis. (1) Always face forward: (0, 0), facing the positive y-axis.
......
...@@ -14,9 +14,7 @@ fewshot_config: ...@@ -14,9 +14,7 @@ fewshot_config:
samples: samples:
- input: I have a blackberry, a clarinet, a nectarine, a plum, a strawberry, a banana, - input: I have a blackberry, a clarinet, a nectarine, a plum, a strawberry, a banana,
a flute, an orange, and a violin. How many fruits do I have? a flute, an orange, and a violin. How many fruits do I have?
target: 'Let''s think step by step. target: 'We first identify the fruits on the list and include their quantity in parentheses:
We first identify the fruits on the list and include their quantity in parentheses:
- blackberry (1) - blackberry (1)
...@@ -34,9 +32,7 @@ fewshot_config: ...@@ -34,9 +32,7 @@ fewshot_config:
answer is 6.' answer is 6.'
- input: I have an orange, a raspberry, two peaches, a blackberry, an apple, a grape, - input: I have an orange, a raspberry, two peaches, a blackberry, an apple, a grape,
a nectarine, and three plums. How many fruits do I have? a nectarine, and three plums. How many fruits do I have?
target: 'Let''s think step by step. target: 'We first identify the fruits on the list and include their quantity in parentheses:
We first identify the fruits on the list and include their quantity in parentheses:
- orange (1) - orange (1)
...@@ -58,9 +54,7 @@ fewshot_config: ...@@ -58,9 +54,7 @@ fewshot_config:
11. So the answer is 11.' 11. So the answer is 11.'
- input: I have a lettuce head, a head of broccoli, an onion, a stalk of celery, - input: I have a lettuce head, a head of broccoli, an onion, a stalk of celery,
two carrots, a garlic, and a yam. How many vegetables do I have? two carrots, a garlic, and a yam. How many vegetables do I have?
target: 'Let''s think step by step. target: 'We first identify the vegetables on the list and include their quantity in parentheses:
We first identify the vegetables on the list and include their quantity in parentheses:
- lettuce (1) - lettuce (1)
......
...@@ -32,9 +32,7 @@ fewshot_config: ...@@ -32,9 +32,7 @@ fewshot_config:
(D) 4 (D) 4
(E) 5' (E) 5'
target: 'Let''s think step by step. target: 'This question focuses on age. We know the following: Louis is 7 years old, Bernard
This question focuses on age. We know the following: Louis is 7 years old, Bernard
is 5 years old, Vincent is 9 years old, and Gwen is 8 years old. is 5 years old, Vincent is 9 years old, and Gwen is 8 years old.
Now, we add James to this table: James is 12 years old. Now, we add James to this table: James is 12 years old.
...@@ -59,9 +57,7 @@ fewshot_config: ...@@ -59,9 +57,7 @@ fewshot_config:
(D) Gwen (D) Gwen
(E) James' (E) James'
target: 'Let''s think step by step. target: 'This question focuses on age. We know the following: Louis is 7 years old, Bernard
This question focuses on age. We know the following: Louis is 7 years old, Bernard
is 5 years old, Vincent is 9 years old, and Gwen is 8 years old. is 5 years old, Vincent is 9 years old, and Gwen is 8 years old.
According to the table, Bernard (5) is the youngest amongst them. According to the table, Bernard (5) is the youngest amongst them.
...@@ -84,9 +80,7 @@ fewshot_config: ...@@ -84,9 +80,7 @@ fewshot_config:
(D) Gwen (D) Gwen
(E) James' (E) James'
target: 'Let''s think step by step. target: 'This question focuses on the name. We know the following: The names of the penguin
This question focuses on the name. We know the following: The names of the penguin
in the table are Louis, Bernard, Vincent, and Gwen. in the table are Louis, Bernard, Vincent, and Gwen.
When we sort their names alphabetically, we get Bernard, Gwen, Louis, Vincent. When we sort their names alphabetically, we get Bernard, Gwen, Louis, Vincent.
......
@@ -52,9 +52,7 @@ fewshot_config:
 (Q) purple
 (R) pink'
-target: 'Let''s think step by step.
-According to this question, the color of the stress ball is blue. So the answer
+target: 'According to this question, the color of the stress ball is blue. So the answer
 is (E).'
 - input: 'On the table, you see a bunch of objects arranged in a row: a purple paperclip,
 a pink stress ball, a brown keychain, a green scrunchiephone charger, a mauve
@@ -98,9 +96,7 @@ fewshot_config:
 (Q) purple
 (R) pink'
-target: 'Let''s think step by step.
-According to this question, the objects are arranged in a row, from left to
+target: 'According to this question, the objects are arranged in a row, from left to
 right, as follows: (1) a purple paperclip, (2) a pink stress ball, (3) a brown
 keychain, (4) a green scrunchiephone charger, (5) a mauve fidget spinner, (6)
 a burgundy pen.
@@ -129,9 +125,7 @@ fewshot_config:
 (F) five
 (G) six'
-target: 'Let''s think step by step.
-According to this question, the objects are arranged in a row, from left to
+target: 'According to this question, the objects are arranged in a row, from left to
 right, as follows: (1) a teal plate, (2) a burgundy keychain, (3) a yellow scrunchiephone
 charger, (4) an orange mug, (5) a pink notebook, (6) a grey cup.
......
@@ -24,9 +24,7 @@ fewshot_config:
 (C) whitesnuake
 (D) mwhitesnake'
-target: 'Let''s think step by step.
-The original name is "whitesnake". This is the name of an old English hard rock
+target: 'The original name is "whitesnake". This is the name of an old English hard rock
 band. It is a compound word, formed by the words "white" and "snake".
 (A) "whitesnape": It is formed by the combination of "white" and "snake"; therefore,
@@ -57,9 +55,7 @@ fewshot_config:
 (C) one of our dinosaurs is pissing
 (D) one of our dinosaur is missing'
-target: 'Let''s think step by step.
-The original name is "one of our dinosaurs is missing". This is the name of
+target: 'The original name is "one of our dinosaurs is missing". This is the name of
 an old British movie.
 (A) "ofne of our dinosaurs is missing": Here "one of" is changed to "ofne",
@@ -91,9 +87,7 @@ fewshot_config:
 (C) courting crows
 (D) coutnting crows'
-target: 'Let''s think step by step.
-The original name is "counting crows". This is the name of an American rock
+target: 'The original name is "counting crows". This is the name of an American rock
 band. Historically, the band name comes from the British nursery rhyme "One
 for Sorrow", which is about counting of magpies.
......
@@ -42,9 +42,7 @@ fewshot_config:
 (E) Dropped Content
 (F) Facts'
-target: 'Let''s think step by step.
-We solve this question by first translating the source sentence to English and
+target: 'We solve this question by first translating the source sentence to English and
 then by comparing our translation with the provided translation. According to
 Google Translate, the correct translation of the source sentence from German
 to English is "The list of monuments in Lenzen (Elbe) includes all the monuments
@@ -70,9 +68,7 @@ fewshot_config:
 \ am Lech.\nThe translation contains an error pertaining to\nOptions:\n(A) Modifiers\
 \ or Adjectives\n(B) Numerical Values\n(C) Negation or Antonyms\n(D) Named Entities\n\
 (E) Dropped Content\n(F) Facts"
-target: 'Let''s think step by step.
-We solve this question by first translating the source sentence to English and
+target: 'We solve this question by first translating the source sentence to English and
 then by comparing our translation with the provided translation. According to
 Google Translate, the correct translation of the source sentence from German
 to English is "The monuments of the Upper Bavarian district town of Landsberg
@@ -98,7 +94,7 @@ fewshot_config:
 \ Voivodeship of Poland.\nThe translation contains an error pertaining to\n\
 Options:\n(A) Modifiers or Adjectives\n(B) Numerical Values\n(C) Negation or\
 \ Antonyms\n(D) Named Entities\n(E) Dropped Content\n(F) Facts"
-target: "Let's think step by step.\nWe solve this question by first translating\
+target: "We solve this question by first translating\
 \ the source sentence to English and then by comparing our translation with\
 \ the provided translation. According to Google Translate, the correct translation\
 \ of the source sentence from German to English is \"\u0141eba is a small town\
......