Commit c2be7211, authored by philipdoldo, committed by GitHub

`bbh_cot_fewshot`: Removed repeated "Let's think step by step." text from bbh cot prompts (#3140)



* Removed the 'Let's think step by step.' text from the start of the target entry in each sample, so that the phrase no longer appears twice in the few-shot prompts and the behavior matches the original bbh repository. This applied to 26 of the 27 subtasks; the only exception is boolean_expressions.yaml. In boolean_expressions.yaml, I think there is an error: the prompt does not include the 'Remember that (i) ...' text after the final 'A: Let's think step by step.' Models like EleutherAI/gpt-neo-125m seem to always begin their answers with this string anyway (copying the few-shot examples), but I think it should have been part of the prompt, just as 'A: Let's think step by step.' is part of the prompt for all of the cot tasks. The original bbh repo has the same issue, so I think it is fine to keep it this way for consistency; I just wanted to point it out.

* feat: remove extra space from answers; add changelog

---------
Co-authored-by: Baber <baber@hey.com>
parent 51ede33c
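
To illustrate the repetition described in the commit message, here is a minimal Python sketch of how a few-shot prompt might be assembled. It assumes each example is rendered as `Q: <input>`, then the cue `A: Let's think step by step.`, then the target; `render_example` and `build_prompt` are hypothetical helpers for illustration, not the harness's actual API.

```python
PHRASE = "Let's think step by step."
COT_CUE = f"A: {PHRASE}"  # assumed cue appended by the prompt template


def render_example(question: str, target: str) -> str:
    """Render one solved few-shot example: question, cue, then the target."""
    return f"Q: {question}\n{COT_CUE}\n{target}"


def build_prompt(samples: list[dict], question: str) -> str:
    """Join the solved examples and end with the unanswered question plus cue."""
    shots = [render_example(s["input"], s["target"]) for s in samples]
    shots.append(f"Q: {question}\n{COT_CUE}")
    return "\n\n".join(shots)


question = 'Is the following sentence plausible? "Santi Cazorla scored a touchdown."'
reasoning = (
    "Santi Cazorla is a soccer player. Touchdown is part of American "
    "football and rugby. So the answer is no."
)

# Before the change: the target itself starts with the phrase.
old_samples = [{"input": question, "target": f"{PHRASE} {reasoning}"}]
# After the change: the target starts directly with the reasoning.
new_samples = [{"input": question, "target": reasoning}]

new_question = 'Is the following sentence plausible? "DeMar DeRozan was called for the goal tend."'
old_prompt = build_prompt(old_samples, new_question)
new_prompt = build_prompt(new_samples, new_question)

# Count the phrase: once per cue, plus once more for each target that repeats it.
print(old_prompt.count(PHRASE))
print(new_prompt.count(PHRASE))
```

With one solved example, the old prompt contains the phrase three times (example cue, target, final cue), while the new prompt contains it only twice, once per cue.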
......@@ -24,9 +24,7 @@ fewshot_config:
(A) Yes, because having interests and actively researching them is a huge waste
(B) Yes, because having interests and actively researching them is a huge deal'
-target: 'Let''s think step by step.
-If we look at (A), it says that having interests and actively researching them
+target: 'If we look at (A), it says that having interests and actively researching them
is a huge waste, implying that it is a useless effort. However, we know that
having interests and actively researching them is typically not a waste but
rather is beneficial to the individual. The presence of such a juxtaposition
......@@ -46,9 +44,7 @@ fewshot_config:
(B) No one is going to disagree with you on this. Ad hominem attacks really
help your case'
-target: 'Let''s think step by step.
-If we look at (A), it says that avoiding ad hominem attacks really help your
+target: 'If we look at (A), it says that avoiding ad hominem attacks really help your
case, implying that ad hominem attacks are adverse and injurious. Because ad
hominem attacks are adressed at a person rather than an idea, it is indeed true
that avoiding them is often useful and helpful; so, (A) is a neutral (valid
......@@ -70,9 +66,7 @@ fewshot_config:
(B) Consistency in the league''s punishments? What do you think this is supposed
to be, moral?'
-target: 'Let''s think step by step.
-If we look at (A), it likens the consistency in the league''s punishments with
+target: 'If we look at (A), it likens the consistency in the league''s punishments with
that in politics. Because politics or political affairs are often not considered
to be consistent or dependable, this sentence appears to be satirical.
......
......@@ -14,15 +14,15 @@ fewshot_config:
samples:
- input: Is the following sentence plausible? "Bam Adebayo scored a reverse layup
in the Western Conference Finals."
-target: Let's think step by step. Bam Adebayo is an American basketball player.
+target: Bam Adebayo is an American basketball player.
Scoring a reverse layup in the Western Conference Finals is part of the NBA
Finals. So the answer is yes.
- input: Is the following sentence plausible? "Santi Cazorla scored a touchdown."
-target: Let's think step by step. Santi Cazorla is a soccer player. Touchdown
+target: Santi Cazorla is a soccer player. Touchdown
is part of American football and rugby. So the answer is no.
- input: Is the following sentence plausible? "DeMar DeRozan was called for the
goal tend."
-target: Let's think step by step. DeMar DeRozan is an American basketball player.
+target: DeMar DeRozan is an American basketball player.
Goal tending is part of basketball. So the answer is yes.
include: _cot_fewshot_template_yaml
task: bbh_cot_fewshot_sports_understanding
......@@ -39,9 +39,7 @@ fewshot_config:
(C) 5pm to 6pm
(D) 2pm to 4pm'
-target: 'Let''s think step by step.
-Wake-up time: 1pm.
+target: 'Wake-up time: 1pm.
1pm-2pm: free.
......@@ -88,9 +86,7 @@ fewshot_config:
(C) 1pm to 2pm
(D) 3pm to 6pm'
-target: 'Let''s think step by step.
-Wake-up time: 7am.
+target: 'Wake-up time: 7am.
7am-1pm: free.
......@@ -137,9 +133,7 @@ fewshot_config:
(C) 5am to 6am
(D) 3pm to 4pm'
-target: 'Let''s think step by step.
-Wake-up time: 5am.
+target: 'Wake-up time: 5am.
5am-6am: getting a coffee at the cafe.
......
......@@ -27,9 +27,7 @@ fewshot_config:
(B) blue ball
(C) pink ball'
-target: 'Let''s think step by step.
-(0) At the start: Alice: yellow, Bob: blue, Claire: pink.
+target: '(0) At the start: Alice: yellow, Bob: blue, Claire: pink.
(1) Claire and Alice swap balls: Alice: pink, Bob: blue, Claire: yellow.
......@@ -53,9 +51,7 @@ fewshot_config:
(B) purple ball
(C) pink ball'
-target: 'Let''s think step by step.
-(0) At the start: Alice: white, Bob: purple, Claire: pink.
+target: '(0) At the start: Alice: white, Bob: purple, Claire: pink.
(1) Bob and Alice swap balls: Alice: purple, Bob: white, Claire: pink.
......@@ -79,9 +75,7 @@ fewshot_config:
(B) Rodrigo
(C) Patrick'
-target: 'Let''s think step by step.
-(0) At the start: Alice: Lola, Bob: Rodrigo, Claire: Patrick.
+target: '(0) At the start: Alice: Lola, Bob: Rodrigo, Claire: Patrick.
(1) Alice and Bob switch partners: Alice: Rodrigo, Bob: Lola, Claire: Patrick.
......
......@@ -27,9 +27,7 @@ fewshot_config:
(B) blue ball
(C) pink ball'
-target: 'Let''s think step by step.
-(0) At the start: Alice: yellow, Bob: blue, Claire: pink.
+target: '(0) At the start: Alice: yellow, Bob: blue, Claire: pink.
(1) Claire and Alice swap balls: Alice: pink, Bob: blue, Claire: yellow.
......@@ -53,9 +51,7 @@ fewshot_config:
(B) purple ball
(C) pink ball'
-target: 'Let''s think step by step.
-(0) At the start: Alice: white, Bob: purple, Claire: pink.
+target: '(0) At the start: Alice: white, Bob: purple, Claire: pink.
(1) Bob and Alice swap balls: Alice: purple, Bob: white, Claire: pink.
......@@ -79,9 +75,7 @@ fewshot_config:
(B) Rodrigo
(C) Patrick'
-target: 'Let''s think step by step.
-(0) At the start: Alice: Lola, Bob: Rodrigo, Claire: Patrick.
+target: '(0) At the start: Alice: Lola, Bob: Rodrigo, Claire: Patrick.
(1) Alice and Bob switch partners: Alice: Rodrigo, Bob: Lola, Claire: Patrick.
......
......@@ -27,9 +27,7 @@ fewshot_config:
(B) blue ball
(C) pink ball'
-target: 'Let''s think step by step.
-(0) At the start: Alice: yellow, Bob: blue, Claire: pink.
+target: '(0) At the start: Alice: yellow, Bob: blue, Claire: pink.
(1) Claire and Alice swap balls: Alice: pink, Bob: blue, Claire: yellow.
......@@ -53,9 +51,7 @@ fewshot_config:
(B) purple ball
(C) pink ball'
-target: 'Let''s think step by step.
-(0) At the start: Alice: white, Bob: purple, Claire: pink.
+target: '(0) At the start: Alice: white, Bob: purple, Claire: pink.
(1) Bob and Alice swap balls: Alice: purple, Bob: white, Claire: pink.
......@@ -79,9 +75,7 @@ fewshot_config:
(B) Rodrigo
(C) Patrick'
-target: 'Let''s think step by step.
-(0) At the start: Alice: Lola, Bob: Rodrigo, Claire: Patrick.
+target: '(0) At the start: Alice: Lola, Bob: Rodrigo, Claire: Patrick.
(1) Alice and Bob switch partners: Alice: Rodrigo, Bob: Lola, Claire: Patrick.
......
......@@ -14,9 +14,7 @@ fewshot_config:
- input: 'Question: Fidel tells the truth. Jerry says Fidel tells the truth. Vina
says Jerry tells the truth. Millicent says Vina lies. Raymond says Millicent
lies. Does Raymond tell the truth?'
-target: 'Let''s think step by step.
-(1) Fidel tells the truth. So, we know that Fidel tells the truth.
+target: '(1) Fidel tells the truth. So, we know that Fidel tells the truth.
(2) Jerry says Fidel tells the truth. Since we know from (1) that Fidel tells
the truth, if Jerry says that Fidel tells the truth, then Jerry tells the truth.
......@@ -35,9 +33,7 @@ fewshot_config:
- input: 'Question: Kristian lies. Millie says Kristian lies. Maybelle says Millie
tells the truth. Fidel says Maybelle lies. Leda says Fidel lies. Does Leda tell
the truth?'
-target: 'Let''s think step by step.
-(1) Kristian lies. So, we know that Kristian lies.
+target: '(1) Kristian lies. So, we know that Kristian lies.
(2) Millie says Kristian lies. Since we know from (1) that Kristian lies, if
Millie says Kristian lies, then Millie tells the truth.
......@@ -57,9 +53,7 @@ fewshot_config:
- input: 'Question: Kristian tells the truth. Michaela says Kristian lies. Raymond
says Michaela tells the truth. Osvaldo says Raymond tells the truth. Jamey says
Osvaldo tells the truth. Does Jamey tell the truth?'
-target: 'Let''s think step by step.
-(1) Kristian tells the truth. So, we know that Kristian tells the truth.
+target: '(1) Kristian tells the truth. So, we know that Kristian tells the truth.
(2) Michaela says Kristian lies. Since we know from (1) that Kristian tells
the truth, if Michaela says Kristian lies, then Michaela lies.
......
......@@ -12,9 +12,7 @@ fewshot_config:
sampler: first_n
samples:
- input: 'Sort the following words alphabetically: List: oven costume counterpart'
-target: 'Let''s think step by step.
-The first letter: "oven": "o" (15). "costume": "c" (3). "counterpart": "c" (3).
+target: 'The first letter: "oven": "o" (15). "costume": "c" (3). "counterpart": "c" (3).
We now have: (3) ["costume" ? "counterpart"] < (15) "oven". Now let''s sort
this subpart ["costume" ? "counterpart"] by looking at their second letters.
......@@ -27,9 +25,7 @@ fewshot_config:
< "oven". So the answer is costume counterpart oven.'
- input: 'Sort the following words alphabetically: List: hypochlorite ponderosa
phone credulity'
-target: 'Let''s think step by step.
-The first letter: "hypochlorite": "h" (8). "ponderosa": "p" (16). "phone": "p"
+target: 'The first letter: "hypochlorite": "h" (8). "ponderosa": "p" (16). "phone": "p"
(16). "credulity": "c" (3). We now have: (3) "credulity" < (8) "hypochlorite"
< (16) ["ponderosa" ? "phone"]. Now let''s sort this subpart ["ponderosa" ?
"phone"] by looking at their second letters.
......@@ -39,9 +35,7 @@ fewshot_config:
<"ponderosa"]. So the answer is credulity hypochlorite phone ponderosa.'
- input: 'Sort the following words alphabetically: List: newt arson parthia seismography
mugho aspect census'
-target: 'Let''s think step by step.
-The first letter: "newt": "n" (14). "arson": "a" (1). "parthia": "p" (16). "seismography":
+target: 'The first letter: "newt": "n" (14). "arson": "a" (1). "parthia": "p" (16). "seismography":
"s" (19). "mugho": "m" (13). "aspect": "a" (1). "census": "c" (3). We now have:
(1) ["arson" ? "aspect"] < (3) "census" < (13) "mugho" < (14) "newt" < (16)
"parthia" < (19) "seismography". Now let''s sort this subpart ["arson" ? "aspect"]
......
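
Every hunk in this diff follows one pattern: the leading phrase and any whitespace after it are dropped from the target, and the rest is left untouched. A minimal Python sketch of that transformation follows. `strip_cot_prefix` is a hypothetical helper written for illustration, not the script the author used, and the sample strings are shortened targets taken from the hunks.

```python
import re

# The leading phrase plus any whitespace (space or newline) that follows it.
_COT_PREFIX = re.compile(r"^Let's think step by step\.\s*")


def strip_cot_prefix(target: str) -> str:
    """Drop the leading phrase from a target; leave other targets unchanged."""
    return _COT_PREFIX.sub("", target, count=1)


old_targets = [
    # Phrase followed by a space (sports_understanding style).
    "Let's think step by step. Bam Adebayo is an American basketball player.",
    # Phrase followed by a newline (multi-line target style).
    "Let's think step by step.\nWake-up time: 1pm.",
    # Already clean: must pass through unchanged.
    "Wake-up time: 7am.",
]

new_targets = [strip_cot_prefix(t) for t in old_targets]
for t in new_targets:
    print(repr(t))
```

Because the pattern is anchored at the start of the string, applying the helper a second time changes nothing, so it is safe to run over targets that were already cleaned.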