gaoqiong/lm-evaluation-harness, commit bf26d979 (unverified)

Merge pull request #1029 from EleutherAI/bbh-fixup

[Refactor] BBH fixup

Authored Nov 28, 2023 by Lintang Sutawika; committed by GitHub on Nov 28, 2023.
Parents: e7afee52, 3b9640b8

Changes: 114. Showing 20 changed files with 44 additions and 40 deletions.

lm_eval/tasks/bbh/README.md (+4 -4)
lm_eval/tasks/bbh/_generate_configs.py (+1 -1)
lm_eval/tasks/bbh/cot_fewshot/_cot_fewshot_template_yaml (+5 -1)
lm_eval/tasks/bbh/cot_fewshot/boolean_expressions.yaml (+2 -2)
lm_eval/tasks/bbh/cot_fewshot/causal_judgement.yaml (+2 -2)
lm_eval/tasks/bbh/cot_fewshot/date_understanding.yaml (+2 -2)
lm_eval/tasks/bbh/cot_fewshot/disambiguation_qa.yaml (+2 -2)
lm_eval/tasks/bbh/cot_fewshot/dyck_languages.yaml (+2 -2)
lm_eval/tasks/bbh/cot_fewshot/formal_fallacies.yaml (+2 -2)
lm_eval/tasks/bbh/cot_fewshot/geometric_shapes.yaml (+2 -2)
lm_eval/tasks/bbh/cot_fewshot/hyperbaton.yaml (+2 -2)
lm_eval/tasks/bbh/cot_fewshot/logical_deduction_five_objects.yaml (+2 -2)
lm_eval/tasks/bbh/cot_fewshot/logical_deduction_seven_objects.yaml (+2 -2)
lm_eval/tasks/bbh/cot_fewshot/logical_deduction_three_objects.yaml (+2 -2)
lm_eval/tasks/bbh/cot_fewshot/movie_recommendation.yaml (+2 -2)
lm_eval/tasks/bbh/cot_fewshot/multistep_arithmetic_two.yaml (+2 -2)
lm_eval/tasks/bbh/cot_fewshot/navigate.yaml (+2 -2)
lm_eval/tasks/bbh/cot_fewshot/object_counting.yaml (+2 -2)
lm_eval/tasks/bbh/cot_fewshot/penguins_in_a_table.yaml (+2 -2)
lm_eval/tasks/bbh/cot_fewshot/reasoning_about_colored_objects.yaml (+2 -2)

lm_eval/tasks/bbh/README.md

@@ -25,10 +25,10 @@ Homepage: https://github.com/suzgunmirac/BIG-Bench-Hard
 #### Groups
-- `bbh_flan_zeroshot`
-- `bbh_flan_fewshot`
-- `bbh_flan_cot_fewshot`
-- `bbh_flan_cot_zeroshot`
+- `bbh_zeroshot`
+- `bbh_fewshot`
+- `bbh_cot_fewshot`
+- `bbh_cot_zeroshot`
 #### Tasks

lm_eval/tasks/bbh/_generate_configs.py

@@ -17,7 +17,7 @@ from lm_eval.logger import eval_logger
 def parse_args():
     parser = argparse.ArgumentParser()
     parser.add_argument("--base_yaml_path", required=True)
-    parser.add_argument("--save_prefix_path", default="flan_zeroshot")
+    parser.add_argument("--save_prefix_path", default="zeroshot")
     parser.add_argument("--cot", default=False)
     parser.add_argument("--fewshot", default=False)
     parser.add_argument("--task_prefix", default="")
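In `parse_args` above, `--cot` and `--fewshot` are declared with `default=False` but with no `type` or `action`. As a result, any value passed on the command line arrives as a string, so `--cot False` produces the truthy string `"False"`. Whether `_generate_configs.py` later tests these values for truthiness is outside this hunk. The sketch below copies the parser to show the behaviour. It also shows a `store_true` variant, which is an assumption about what a caller might prefer and not something this commit does. The helper names `build_parser` and `build_flag_parser` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Same arguments as the parse_args() hunk above (new default for --save_prefix_path).
    parser = argparse.ArgumentParser()
    parser.add_argument("--base_yaml_path", required=True)
    parser.add_argument("--save_prefix_path", default="zeroshot")
    parser.add_argument("--cot", default=False)
    parser.add_argument("--fewshot", default=False)
    parser.add_argument("--task_prefix", default="")
    return parser


def build_flag_parser() -> argparse.ArgumentParser:
    # Hypothetical alternative: value-less boolean switches.
    parser = argparse.ArgumentParser()
    parser.add_argument("--cot", action="store_true")
    parser.add_argument("--fewshot", action="store_true")
    return parser


defaults = build_parser().parse_args(["--base_yaml_path", "base.yaml"])
explicit = build_parser().parse_args(["--base_yaml_path", "base.yaml", "--cot", "False"])
print(defaults.save_prefix_path, defaults.cot)  # zeroshot False
print(repr(explicit.cot), bool(explicit.cot))   # 'False' True

flags = build_flag_parser().parse_args(["--cot"])
print(flags.cot, flags.fewshot)                 # True False
```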

lm_eval/tasks/bbh/flan_cot_fewshot/_flan_cot_fewshot_template_yaml → lm_eval/tasks/bbh/cot_fewshot/_cot_fewshot_template_yaml

-group: bbh_flan_cot_fewshot
+group:
+  - bbh
+  - bbh_cot_fewshot
 dataset_path: lukaemon/bbh
 output_type: generate_until
 test_split: test
@@ -12,6 +14,8 @@ metric_list:
 generation_kwargs:
   until:
     - "</s>"
+    - "Q"
+    - "\n\n"
   do_sample: false
   temperature: 0.0
 filter_list:
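With `output_type: generate_until`, the strings under `until` act as stop sequences. The stop strings in this template are `"</s>"`, `"Q"` and `"\n\n"`. Every exemplar in the task files below ends its reasoning with "So the answer is …". The sketch below illustrates how these two pieces might fit together: it cuts a completion at the earliest stop string, then keeps whatever follows the last "the answer is ". Both helpers, `truncate_at_stops` and `extract_answer`, are hypothetical. The harness's own stopping logic and the extraction rule in `filter_list` (collapsed in this view) may differ. Note that with plain substring matching, as used here, a bare `"Q"` stop also fires on any capital Q that appears inside the reasoning.

```python
from __future__ import annotations


def truncate_at_stops(text: str, stops: list[str]) -> str:
    """Return `text` cut at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def extract_answer(text: str) -> str | None:
    """Return what follows the last 'the answer is ', minus a trailing period."""
    marker = "the answer is "
    idx = text.rfind(marker)
    if idx == -1:
        return None
    return text[idx + len(marker):].strip().rstrip(".")


stops = ["</s>", "Q", "\n\n"]
completion = "A = not not True = True. So the answer is False.\n\nQ: not True is"
kept = truncate_at_stops(completion, stops)
print(repr(kept))            # 'A = not not True = True. So the answer is False.'
print(extract_answer(kept))  # False
```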

lm_eval/tasks/bbh/flan_cot_fewshot/boolean_expressions.yaml → lm_eval/tasks/bbh/cot_fewshot/boolean_expressions.yaml

 "dataset_name": "boolean_expressions"
 "description": "Evaluate the result of a random Boolean expression.\n\n"
 "doc_to_text": "not ( ( not not True ) ) is\nA: Let's think step by step.\nRemember that (i) expressions inside brackets are always evaluated first and that (ii) the order of operations from highest priority to lowest priority is \"not\", \"and\", \"or\", respectively.\nWe first simplify this expression \"Z\" as follows: \"Z = not ( ( not not True ) ) = not ( ( A ) )\" where \"A = not not True\".\nLet's evaluate A: A = not not True = not (not True) = not False = True.\nPlugging in A, we get: Z = not ( ( A ) ) = not ( ( True ) ) = not True = False. So the answer is False. True and False and not True and True is\nA: Let's think step by step.\nRemember that (i) expressions inside brackets are always evaluated first and that (ii) the order of operations from highest priority to lowest priority is \"not\", \"and\", \"or\", respectively.\nWe first simplify this expression \"Z\" as follows: \"Z = True and False and not True and True = A and B\" where \"A = True and False\" and \"B = not True and True\".\nLet's evaluate A: A = True and False = False.\nLet's evaluate B: B = not True and True = not (True and True) = not (True) = False.\nPlugging in A and B, we get: Z = A and B = False and False = False. So the answer is False. not not ( not ( False ) ) is\nA: Let's think step by step.\nRemember that (i) expressions inside brackets are always evaluated first and that (ii) the order of operations from highest priority to lowest priority is \"not\", \"and\", \"or\", respectively.\nWe first simplify this expression \"Z\" as follows: \"Z = not not ( not ( False ) ) = not not ( A )\" where \"A = not ( False )\".\nLet's evaluate A: A = not ( False ) = not False = True.\nPlugging in A, we get: Z = not not ( A ) = not not (True) = not not False = True. So the answer is True.Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_fewshot_template_yaml"
-"task": "bbh_flan_cot_fewshot_boolean_expressions"
+"include": "_cot_fewshot_template_yaml"
+"task": "bbh_cot_fewshot_boolean_expressions"
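The boolean_expressions exemplars state that `not` binds tighter than `and`, which in turn binds tighter than `or`. Python's boolean operators use the same precedence, so the three exemplar answers can be checked mechanically. The sketch below is a minimal version of that check. It assumes inputs shaped like the exemplars: space-separated tokens, ending in `is`. `evaluate` is a hypothetical helper. It calls `eval` only after every token has been checked against a small whitelist.

```python
ALLOWED = {"not", "and", "or", "True", "False", "(", ")"}


def evaluate(question: str) -> bool:
    """Evaluate a BBH-style question such as 'not ( True ) is'."""
    tokens = question.split()
    if tokens and tokens[-1] == "is":
        tokens = tokens[:-1]  # drop the trailing "is" of the question
    if not tokens or any(tok not in ALLOWED for tok in tokens):
        raise ValueError(f"unexpected input: {question!r}")
    # Python's precedence is not > and > or, matching the prompt's rule (ii).
    return eval(" ".join(tokens), {"__builtins__": {}}, {})


exemplars = {
    "not ( ( not not True ) ) is": False,
    "True and False and not True and True is": False,
    "not not ( not ( False ) ) is": True,
}
for question, expected in exemplars.items():
    print(f"{question!r} -> {evaluate(question)} (expected {expected})")
```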

lm_eval/tasks/bbh/flan_cot_fewshot/causal_judgement.yaml → lm_eval/tasks/bbh/cot_fewshot/causal_judgement.yaml

 "dataset_name": "causal_judgement"
 "description": "Answer questions about causal attribution.\n\n"
 "doc_to_text": "How would a typical person answer each of the following questions about causation?\nFrank T., had an ongoing dispute with his neighbor over a stretch of land and one day decided to shoot his neighbor in the body. Frank T. had no experience with guns, his hand slipped on the barrel of the gun, and the shot went wild. Nonetheless, the bullet bounced off a large boulder several feet away and hit the neighbor's body, causing significant injury. Did Frank T. intentionally shoot his neighbor in the body?\nOptions:\n- Yes\n- No\nA: Let's think step by step.\nHere in this question, we are told that \"Frank T. had no experience with guns, his hand slipped on the barrel of the gun, and the shot went wild.\" A typical person would assume that this passage suggests that Frank T. had no intention of shooting and injuring someone and that the bullet accidentally hit the neighbor's body; therefore, we conclude that Frank T. did not intentionally hit his neighbor. So the answer is No. How would a typical person answer each of the following questions about causation?\nSuzy and Billy are working on a project that is very important for our nation's security. The boss tells them both: \"Be sure that you are here at exactly 9 am. It is absolutely essential that you arrive at that time.\" Both Billy and Suzy arrive at 9 am. As it happens, there was a motion detector installed in the room where they arrived. The motion detector was set up to be triggered if at least one person appeared in the room at the same time. So the motion detector went off. Did Billy cause the motion detector to go off?\nOptions:\n- Yes\n- No\nA: Let's think step by step.\nHere in this question, we are told that the boss ordered them both to arrive at the meeting room at the same time and that the motion detector was set up to be triggered if at least one person appeared in the room at the same time.\" A typical person would assume that the person probably meant to say the detector was set up to be triggered if \"both persons\" appeared in the room at the same time, not at least one person, since otherwise the phrase \"at the same time\" would not make much sense in that sentence. Because the motion detector went off, a typical person would therefore come to the conclusion that both Suzy and Billy triggered the motion detector to go off; hence, Billy did indeed cause the motion detector to go off. So the answer is Yes. How would a typical person answer each of the following questions about causation?\nGeorge and his sister Lena reunite at their parents' house for Thanksgiving. Whereas George just got into medical school, Lena is unhappy in her marriage and recently lost her job. Over the course of the day, George and Lena get into a number of heated arguments. Later in the afternoon they play a game of darts. They split the first two games, and the third game is close until the end. Who will win comes down to George's last shot. If he hits a high point region, he wins; if he hits a low point region, Lena wins. George thinks of the difficult time Lena is having, and he really wants to let her win. He aims the dart at the low point region. He sets up his shot and the dart lands in the low point region. After his shot, Lena wins the game and is very happy. Did George hit the low point region intentionally?\nOptions:\n- Yes\n- No\nA: Let's think step by step.\nHere in this question, we are told that \"He aims the dart at the low point region.\" A typical person might therefore think George did intentionally hit the low point region, because he wanted to lift up the spirit of his sister Lena. So the answer is Yes.Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_fewshot_template_yaml"
-"task": "bbh_flan_cot_fewshot_causal_judgement"
+"include": "_cot_fewshot_template_yaml"
+"task": "bbh_cot_fewshot_causal_judgement"

lm_eval/tasks/bbh/flan_cot_fewshot/date_understanding.yaml → lm_eval/tasks/bbh/cot_fewshot/date_understanding.yaml

 "dataset_name": "date_understanding"
 "description": "Infer the date from context.\n\n"
 "doc_to_text": "Today is Christmas Eve of 1937. What is the date 10 days ago in MM/DD/YYYY?\nOptions:\n(A) 12/14/2026\n(B) 12/14/1950\n(C) 12/14/2007\n(D) 12/14/1937\n(E) 07/14/1938\n(F) 12/14/1988\nA: Let's think step by step.\nIf today is Christmas Eve of 1937, then today's date is December 24, 1937. 10 days before today is December 14, 1937, that is 12/14/1937. So the answer is (D). Tomorrow is 11/12/2019. What is the date one year ago from today in MM/DD/YYYY?\nOptions:\n(A) 09/04/2018\n(B) 11/11/2018\n(C) 08/25/2018\n(D) 11/02/2018\n(E) 11/04/2018\nA: Let's think step by step.\nIf tomorrow is 11/12/2019, then today is 11/11/2019. The date one year ago from today is 11/11/2018. So the answer is (B). Jane and John married on Jan 2, 1958. It is their 5-year anniversary today. What is the date tomorrow in MM/DD/YYYY?\nOptions:\n(A) 01/11/1961\n(B) 01/03/1963\n(C) 01/18/1961\n(D) 10/14/1960\n(E) 01/03/1982\n(F) 12/03/1960\nA: Let's think step by step.\nIf Jane and John married on Jan 2, 1958, then and if it is their 5-year anniversary today, then today's date is Jan 2, 1963. The date tomorrow is Jan 3, 1963, that is 01/03/1963. So the answer is (B).Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_fewshot_template_yaml"
-"task": "bbh_flan_cot_fewshot_date_understanding"
+"include": "_cot_fewshot_template_yaml"
+"task": "bbh_cot_fewshot_date_understanding"
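The date_understanding exemplars are plain calendar arithmetic, and Python's `datetime` module can reproduce them. The sketch below recomputes each exemplar's answer and formats it as MM/DD/YYYY. `mmddyyyy` is a hypothetical helper name. The year steps use `date.replace(year=...)`, which would raise `ValueError` for February 29 in a non-leap target year. None of these exemplars hits that case.

```python
from datetime import date, timedelta


def mmddyyyy(d: date) -> str:
    return d.strftime("%m/%d/%Y")


# Exemplar 1: Christmas Eve 1937, then 10 days earlier.
ten_days_ago = date(1937, 12, 24) - timedelta(days=10)

# Exemplar 2: tomorrow is 11/12/2019, so today is one day earlier; then one year back.
today = date(2019, 11, 12) - timedelta(days=1)
one_year_ago = today.replace(year=today.year - 1)

# Exemplar 3: married 1958-01-02, fifth anniversary today, then tomorrow.
anniversary = date(1958, 1, 2).replace(year=1958 + 5)
tomorrow = anniversary + timedelta(days=1)

print(mmddyyyy(ten_days_ago))  # 12/14/1937
print(mmddyyyy(one_year_ago))  # 11/11/2018
print(mmddyyyy(tomorrow))      # 01/03/1963
```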

lm_eval/tasks/bbh/flan_cot_fewshot/disambiguation_qa.yaml → lm_eval/tasks/bbh/cot_fewshot/disambiguation_qa.yaml

 "dataset_name": "disambiguation_qa"
 "description": "Clarify the meaning of sentences with ambiguous pronouns.\n\n"
 "doc_to_text": "In the following sentences, explain the antecedent of the pronoun (which thing the pronoun refers to), or state that it is ambiguous.\nSentence: The chief told the counselor that they took the day off.\nOptions:\n(A) The chief took the day off\n(B) The counselor took the day off\n(C) Ambiguous\nA: Let's think step by step.\nHere we need to determine who the pronoun \"they\" might be referring to. There are two possible referents for \"they\", namely the chief and the counselor. The verb \"told\" might be able to help us determine which one is more likely (if either). Let X be the chief and Y the counselor. The sentence is then of the form \"X told Y that (X or Y) did something.\"\nLet's consider Y first: \"X told Y that Y did something.\" This case does not make much sense, as Y would already have the information that Y did something, because it is information about themself.\nNow, consider X: \"X told Y that X did something.\" This makes sense, because X would be sharing some information about themself that Y might not have known before.\nBecause in this context, X is the chief and Y is the counselor, the answer should be the chief. So the answer is (A). In the following sentences, explain the antecedent of the pronoun (which thing the pronoun refers to), or state that it is ambiguous.\nSentence: The manager sent a message to the secretary, but he didn't reply yet.\nOptions:\n(A) The secretary didn't reply yet\n(B) The manager didn't reply yet\n(C) Ambiguous\nA: Let's think step by step.\nHere we need to determine who the pronoun \"he\" might be referring to. There are two possible referents for \"he\", namely the manager and the secretary. The verbs \"sent\" and \"reply\" might be able to help us determine which one is more likely (if either). Let X be the manager and Y the secretary. The sentence is then of the form \"X sent a message to Y, but (X or Y) didn't reply yet.\"\nLet's consider Y first: \"X sent a message to Y, but Y didn't reply yet.\" This case makes sense, because of the implicit causality of the sentence. Y was the receiver of the message, but Y didn't get back to X yet.\nNow, consider X: \"X sent a message to Y, but X didn't reply yet.\" This case doesn't make sense, because X was the initial sender of the message, so it is now Y's turn to write back to X.\nBecause in this context, X is the manager and Y is the secretary, the answer should be the secretary. So the answer is (A). In the following sentences, explain the antecedent of the pronoun (which thing the pronoun refers to), or state that it is ambiguous.\nSentence: Bailey will plan to meet the director at his office\nOptions:\n(A) It will be Bailey's office\n(B) It will be the director's office\n(C) Ambiguous\nA: Let's think step by step.\nHere we need to determine who the pronoun \"his\" might be referring to. There are two possible referents for \"his\", namely Bailey's and the director's. The verb phrase \"plan to meet\" might be able to help us determine which one is more likely (if either). Let X be Bailey and Y the director. The sentence is then of the form \"X will plan to meet Y at (X or Y)'s office.\"\nLet's consider Y first: \"X will plan to meet Y at Y's office.\" This case makes sense, because X might want to meet up with Y at Y's office.\nNow, consider X: \"X will plan to meet Y at X's office.\" This case also makes sense, because X might want to meet up with Y at X's own office.\nBecause both X and Y are possible at the same time, we conclude that the antecedent of the pronoun is ambiguous. So the answer is (C).Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_fewshot_template_yaml"
-"task": "bbh_flan_cot_fewshot_disambiguation_qa"
+"include": "_cot_fewshot_template_yaml"
+"task": "bbh_cot_fewshot_disambiguation_qa"
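The dyck_languages exemplars in the next file use a stack: push each opening bracket, pop on each matching closer, and finally emit closers for whatever remains, innermost first. The sketch below is a minimal version of that procedure. It assumes space-separated tokens drawn from the four bracket pairs that appear in the exemplars. `complete_dyck` is a hypothetical helper, not part of the harness.

```python
# Closing bracket -> the opening bracket it must match.
OPEN_FOR = {")": "(", "]": "[", "}": "{", ">": "<"}
# Opening bracket -> its closing bracket.
CLOSE_FOR = {opener: closer for closer, opener in OPEN_FOR.items()}


def complete_dyck(sequence: str) -> str:
    """Return the closers needed to balance `sequence`, space-separated."""
    stack = []
    for tok in sequence.split():
        if tok in CLOSE_FOR:
            stack.append(tok)  # opening bracket: push
        elif tok in OPEN_FOR:
            if not stack or stack[-1] != OPEN_FOR[tok]:
                raise ValueError(f"unbalanced at {tok!r} in {sequence!r}")
            stack.pop()  # matching closer: pop
        else:
            raise ValueError(f"unexpected token {tok!r}")
    # Pop what is left, most recent first, and emit each matching closer.
    return " ".join(CLOSE_FOR[opener] for opener in reversed(stack))


print(complete_dyck("[ { ["))                           # ] } ]
print(complete_dyck("< > ( ( [ [ ( { } ) [ < > ] ]"))  # ] ) )
```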

lm_eval/tasks/bbh/flan_cot_fewshot/dyck_languages.yaml → lm_eval/tasks/bbh/cot_fewshot/dyck_languages.yaml

 "dataset_name": "dyck_languages"
 "description": "Correctly close a Dyck-n word.\n\n"
 "doc_to_text": "Complete the rest of the sequence, making sure that the parentheses are closed properly. Input: [ { [\nA: Let's think step by step.\nWe should process each input one by one and keep track of the stack configuration.\n0: empty stack\n1: [ ; stack: [\n2: { ; stack: [ {\n3: [ ; stack: [ { [\nNow, we have reached the end. The final stack is \"[ { [\".\nWe will need to pop out \"[\", \"{\", \"[\" one by one in that order.\nSo, we need \"]\", \"}\", \"]\". So the answer is ] } ]. Complete the rest of the sequence, making sure that the parentheses are closed properly. Input: < > ( ( [ [ ( { } ) [ < > ] ]\nA: Let's think step by step.\nWe should process each input one by one and keep track of the stack configuration.\n0: empty stack\n1: < ; stack: <\n2: > ; stack: empty\n3: ( ; stack: (\n4: ( ; stack: ( (\n5: [ ; stack: ( ( [\n6: [ ; stack: ( ( [ [\n7: ( ; stack: ( ( [ [ (\n8: { ; stack: ( ( [ [ ( {\n9: } ; stack: ( ( [ [ (\n10: ) ; stack: ( ( [ [\n11: [ ; stack: ( ( [ [ [\n12: < ; stack: ( ( [ [ [ <\n13: > ; stack: ( ( [ [ [\n14: ] ; stack: ( ( [ [\n15: ] ; stack: ( ( [\nNow, we have reached the end. The final stack is \"( ( [\".\nWe will need to pop out \"[\", \"(\", \"(\" one by one in that order.\nSo, we need \"]\", \")\", \")\". So the answer is ] ) ). Complete the rest of the sequence, making sure that the parentheses are closed properly. Input: < [ < [ { < [ ] < { } > > } ] > { { ( ) } { < [ < > ] > }\nA: Let's think step by step.\nWe should process each input one by one and keep track of the stack configuration.\n0: empty stack\n1: < ; stack: <\n2: [ ; stack: < [\n3: < ; stack: < [ <\n4: [ ; stack: < [ < [\n5: { ; stack: < [ < [ {\n6: < ; stack: < [ < [ { <\n7: [ ; stack: < [ < [ { < [\n8: ] ; stack: < [ < [ { <\n9: < ; stack: < [ < [ { < <\n10: { ; stack: < [ < [ { < < {\n11: } ; stack: < [ < [ { < <\n12: > ; stack: < [ < [ { <\n13: > ; stack: < [ < [ {\n14: } ; stack: < [ < [\n15: ] ; stack: < [ <\n16: > ; stack: < [\n17: { ; stack: < [ {\n18: { ; stack: < [ { {\n19: ( ; stack: < [ { { (\n20: ) ; stack: < [ { {\n21: } ; stack: < [ {\n22: { ; stack: < [ { {\n23: < ; stack: < [ { { <\n24: [ ; stack: < [ { { < [\n25: < ; stack: < [ { { < [ <\n26: > ; stack: < [ { { < [\n27: ] ; stack: < [ { { <\n28: > ; stack: < [ { {\n29: } ; stack: < [ {\nNow, we have reached the end. The final stack is \"< [ {\".\nWe will need to pop out \"{\", \"[\", \"<\" one by one in that order.\nSo, we need \"}\", \"]\", \">\". So the answer is } ] >.Q: {{input}}\nA: Let's think step by step.\n"
\n
So,
we
need
\"
}
\"
,
\"
]
\"
,
\"
>
\"
.
So
the
answer
is
}
]
>.Q:
{{input}}
\n
A:
Let's
think
step
by
step.
\n
"
"include": "_flan_cot_fewshot_template_yaml"
"include": "_cot_fewshot_template_yaml"
"task": "bbh_flan_cot_fewshot_dyck_languages"
"task": "bbh_cot_fewshot_dyck_languages"
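The dyck_languages exemplars walk through a push/pop stack trace by hand. A minimal sketch of the same procedure follows; `complete_dyck` is a hypothetical helper written for illustration, not part of lm-eval-harness, and it assumes the input is a space-separated sequence drawn only from `( [ { <` and their closers.

```python
# Bracket pairs used in the BBH dyck_languages task.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def complete_dyck(sequence: str) -> str:
    """Return the closing brackets that balance a space-separated bracket sequence."""
    stack = []
    for token in sequence.split():
        if token in PAIRS:
            stack.append(token)  # opening bracket: push it
        elif stack and PAIRS[stack[-1]] == token:
            stack.pop()  # closing bracket matches the most recent opener: pop it
        else:
            raise ValueError(f"unbalanced or unknown token {token!r}")
    # Whatever is left must be closed in reverse (last-in, first-out) order.
    return " ".join(PAIRS[opener] for opener in reversed(stack))


if __name__ == "__main__":
    print(complete_dyck("[ { ["))
    print(complete_dyck("< [ < [ { < [ ] < { } > > } ] > { { ( ) } { < [ < > ] > }"))
```

The final stack after the loop is exactly the "final stack" line in each exemplar, and the reversed mapping reproduces the "pop out ... one by one in that order" step.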
lm_eval/tasks/bbh/flan_cot_fewshot/formal_fallacies.yaml → lm_eval/tasks/bbh/cot_fewshot/formal_fallacies.yaml
"dataset_name": "formal_fallacies"
"description": "Distinguish deductively valid arguments from formal fallacies.\n\n"
"doc_to_text": "\"It is not always easy to see who is related to whom -- and in which ways. The following argument pertains to this question: To begin with, Lesley is a close friend of Fernando. Moreover, being a close friend of Fernando or a schoolmate of Lowell is sufficient for being a great-grandfather of Leroy. It follows that Lesley is a great-grandfather of Leroy.\"\nIs the argument, given the explicitly stated premises, deductively valid or invalid?\nOptions:\n- valid\n- invalid\nA: Let's think step by step.\n(1) Lesley is a close friend of Fernando: Lesley = friend(Fernando).\n(2) Being a close friend of Fernando or a schoolmate of Lowell is sufficient for being a great-grandfather of Leroy: If X = friend(Fernando) OR SCHOOLMATE(Lowell), then X = great-grandfather(Leroy).\nHypothesis: Does it follow that Lesley is a great-grandfather of Leroy: Lesley = great-grandfather(Leroy)?\nLet’s see whether the Hypothesis can be deduced from the arguments (1) and (2) by logical reasoning?\nBy (1), we have Lesley = friend(Fernando). By (2), we have if Lesley = friend(Fernando), then Lesley = great-grandfather(Leroy).\nSo, it is true that Lesley is a great-grandfather of Leroy. So the answer is valid. \"It is not always easy to see who is related to whom -- and in which ways. The following argument pertains to this question: Whoever is not a great-grandfather of Clyde is a stepbrother of Brian. Being an ancestor of Dana is sufficient for not being a great-grandfather of Clyde. We may conclude: Everyone who is an ancestor of Dana is a stepbrother of Brian, too.\"\nIs the argument, given the explicitly stated premises, deductively valid or invalid?\nOptions:\n- valid\n- invalid\nA: Let's think step by step.\n(1) Whoever is not a great-grandfather of Clyde is a stepbrother of Brian: If X = NOT (great-grandfather(Clyde)), then X = stepbrother(Brian).\n(2): Being an ancestor of Dana is sufficient for not being a great-grandfather of Clyde: If X = ancestor(Dana), X = NOT (great-grandfather(Clyde)).\nHypothesis: Does it follow that everyone who is an ancestor of Dana is a stepbrother of Brian, too: If X = ancestor(Dana), then X = stepbrother(Brian)?\nLet’s see whether the Hypothesis can be deduced from the arguments (1) and (2) by logical reasoning?\nBy (2), we have if X = ancestor(Dana), X = NOT (great-grandfather(Clyde)).\nFurthermore, by (1), we have if X = NOT (great-grandfather(Clyde)), then X = stepbrother(Brian).\nBy the transitive relation rule in first-order logic, we then have: if X = ancestor(Dana), then X = stepbrother(Brian).\nSo, it is true that everyone who is an ancestor of Dana is a stepbrother of Brian. So the answer is valid. \"It is not always easy to grasp who is consuming which products. The following argument pertains to this question: Every infrequent user of Paul Mitchell shampoo is either a rare consumer of Nioxin shampoo or a loyal buyer of Caress soap, or both. No regular consumer of Lush soap is a rare consumer of Nioxin shampoo and, in the same time, a loyal buyer of Caress soap. It follows that whoever is an infrequent user of Paul Mitchell shampoo is not a regular consumer of Lush soap.\"\nIs the argument, given the explicitly stated premises, deductively valid or invalid?\nOptions:\n- valid\n- invalid\nA: Let's think step by step.\n(1) Every infrequent user of Paul Mitchell shampoo is either a rare consumer of Nioxin shampoo or a loyal buyer of Caress soap, or both: If X = infrequent-user(Paul Mitchell), then X = rare-consumer(Nioxin) OR X = loyal-buyer(Caress).\n(2): No regular consumer of Lush soap is a rare consumer of Nioxin shampoo and a loyal buyer of Caress soap at the same time. If X = regular-consumer(Lush), then X = NOT (rare-consumer(Nioxin) AND loyal-buyer(Caress)).\nHypothesis: Does it follow that whoever is an infrequent user of Paul Mitchell shampoo is not a regular consumer of Lush soap: If X = infrequent-user(Paul Mitchell), then X = NOT (regular-consumer(Lush))?\nLet’s see whether the Hypothesis can be deduced from the arguments (1) and (2) by logical reasoning?\nBy (1), we have if X = infrequent-user(Paul Mitchell), then X = rare-consumer(Nioxin) OR X = loyal-buyer(Caress). We need to consider both cases separately:\nThe case X = rare-consumer(Nioxin) does not appear in (2).\nThe case X = loyal-buyer(Caress) does not appear in (2), either.\nSo, from (1) and (2), we cannot necessarily deduce the Hypothesis. So the answer is invalid.Q: {{input}}\nA: Let's think step by step.\n"
"include": "_flan_cot_fewshot_template_yaml"
"include": "_cot_fewshot_template_yaml"
"task": "bbh_flan_cot_fewshot_formal_fallacies"
"task": "bbh_cot_fewshot_formal_fallacies"
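The formal_fallacies exemplars decide validity by chaining implications by hand. The sketch below checks the same arguments mechanically by searching for a counterexample. It assumes that every premise is either a universal statement about one arbitrary individual or a fact about a single named individual, and that all predicates are one-place (for example, "great-grandfather of Leroy" is treated as a single property). Under that assumption, a single individual's truth assignment is enough to refute an argument. The predicate names and the `is_valid` helper are hypothetical illustrations, not part of lm-eval-harness.

```python
from itertools import product
from typing import Callable, Dict, List

Assignment = Dict[str, bool]
Formula = Callable[[Assignment], bool]


def implies(a: bool, b: bool) -> bool:
    """Material implication a -> b."""
    return (not a) or b


def is_valid(predicates: List[str], premises: List[Formula], conclusion: Formula) -> bool:
    """True if no truth assignment for one individual makes every premise true
    and the conclusion false, i.e. no counterexample exists."""
    for values in product([False, True], repeat=len(predicates)):
        x = dict(zip(predicates, values))
        if all(premise(x) for premise in premises) and not conclusion(x):
            return False  # this assignment describes a counterexample individual
    return True


# Exemplar 1, with the individual fixed to Lesley.
lesley = is_valid(
    ["friend_fernando", "schoolmate_lowell", "ggf_leroy"],
    [
        lambda x: x["friend_fernando"],
        lambda x: implies(x["friend_fernando"] or x["schoolmate_lowell"], x["ggf_leroy"]),
    ],
    lambda x: x["ggf_leroy"],
)

# Exemplar 2: two universal premises chained transitively.
ancestry = is_valid(
    ["ggf_clyde", "stepbrother_brian", "ancestor_dana"],
    [
        lambda x: implies(not x["ggf_clyde"], x["stepbrother_brian"]),
        lambda x: implies(x["ancestor_dana"], not x["ggf_clyde"]),
    ],
    lambda x: implies(x["ancestor_dana"], x["stepbrother_brian"]),
)

# Exemplar 3: the disjunction in premise (1) leaves room for a counterexample.
shampoo = is_valid(
    ["infrequent_pm", "rare_nioxin", "loyal_caress", "regular_lush"],
    [
        lambda x: implies(x["infrequent_pm"], x["rare_nioxin"] or x["loyal_caress"]),
        lambda x: implies(x["regular_lush"], not (x["rare_nioxin"] and x["loyal_caress"])),
    ],
    lambda x: implies(x["infrequent_pm"], not x["regular_lush"]),
)

if __name__ == "__main__":
    print(lesley, ancestry, shampoo)
```

For the third argument, the search finds a person who uses Paul Mitchell, rarely buys Nioxin, is not a loyal Caress buyer, and regularly buys Lush. That person satisfies both premises and violates the conclusion, which agrees with the exemplar's "invalid".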
lm_eval/tasks/bbh/flan_cot_fewshot/geometric_shapes.yaml → lm_eval/tasks/bbh/cot_fewshot/geometric_shapes.yaml
"dataset_name": "geometric_shapes"
"description": "Name geometric shapes from their SVG paths.\n\n"
"doc_to_text": "This SVG path element <path d=\"M 31.00,73.00 L 32.00,59.00 L 44.00,50.00 L 49.00,41.00 L 64.00,37.00 L 71.00,55.00 L 64.00,76.00 L 52.00,61.00 L 31.00,73.00\"/> draws a\nOptions:\n(A) circle\n(B) heptagon\n(C) hexagon\n(D) kite\n(E) line\n(F) octagon\n(G) pentagon\n(H) rectangle\n(I) sector\n(J) triangle\nA: Let's think step by step.\nThis SVG path element contains \"M\" and \"L\" commands. M takes two parameters (x,y) and moves the current point to the coordinates (x,y). L takes two parameters (x,y) and draws a line from the previous coordinate to the new coordinate (x,y).\nThis path can be decomposed into 9 separate commands.\n(1) M 31.00,73.00: Move the current point to 31.00,73.00.\n(2) L 32.00,59.00: Create a line from 31.00,73.00 to 32.00,59.00.\n(3) L 44.00,50.00: Create a line from 32.00,59.00 to 44.00,50.00.\n(4) L 49.00,41.00: Create a line from 44.00,50.00 to 49.00,41.00.\n(5) L 64.00,37.00: Create a line from 49.00,41.00 to 64.00,37.00.\n(6) L 71.00,55.00: Create a line from 64.00,37.00 to 71.00,55.00.\n(7) L 64.00,76.00: Create a line from 71.00,55.00 to 64.00,76.00.\n(8) L 52.00,61.00: Create a line from 64.00,76.00 to 52.00,61.00.\n(9) L 31.00,73.00: Create a line from 52.00,61.00 to 31.00,73.00.\nThis SVG path starts at point 31.00,73.00, creates eight consecutive and touching lines, and then returns back its starting point, thereby creating an eight-sided shape. It does not have any curves or arches. \"octagon\" is the only eight-sided object on the list. So the answer is (F). This SVG path element <path d=\"M 14.19,26.04 L 51.43,39.21 L 58.44,36.69 L 56.63,30.17 L 48.53,26.66 L 14.19,26.04\"/> draws a\nOptions:\n(A) circle\n(B) heptagon\n(C) hexagon\n(D) kite\n(E) line\n(F) octagon\n(G) pentagon\n(H) rectangle\n(I) sector\n(J) triangle\nA: Let's think step by step.\nThis SVG path element contains \"M\" and \"L\" commands. M takes two parameters (x,y) and moves the current point to the coordinates (x,y). L takes two parameters (x,y) and draws a line from the previous coordinate to the new coordinate (x,y).\nThis path can be decomposed into 6 separate commands.\n(1) M 14.19,26.04: Move the current point to 14.19,26.04.\n(2) L 51.43,39.21: Create a line from 14.19,26.04 to 51.43,39.21.\n(3) L 58.44,36.69: Create a line from 51.43,39.21 to 58.44,36.69.\n(4) L 56.63,30.17: Create a line from 58.44,36.69 to 56.63,30.17.\n(5) L 48.53,26.66: Create a line from 56.63,30.17 to 48.53,26.66.\n(6) L 14.19,26.04: Create a line from 48.53,26.66 to 14.19,26.04.\nThis SVG path starts at point 14.19,26.04, creates five consecutive and touching lines, and then returns back its starting point, thereby creating a five-sided shape. It does not have any curves or arches. \"pentagon\" is the only five-sided polygon on the list. So the answer is (G). This SVG path element <path d=\"M 41.00,43.00 L 37.00,34.00 L 41.00,33.00 L 45.00,34.00 L 41.00,43.00\"/> draws a\nOptions:\n(A) circle\n(B) heptagon\n(C) hexagon\n(D) kite\n(E) line\n(F) octagon\n(G) pentagon\n(H) rectangle\n(I) sector\n(J) triangle\nA: Let's think step by step.\nThis SVG path element contains \"M\" and \"L\" commands. M takes two parameters (x,y) and moves the current point to the coordinates (x,y). L takes two parameters (x,y) and draws a line from the previous coordinate to the new coordinate (x,y).\nThis path can be decomposed into 5 separate commands.\n(1) M 41.00,43.00: Move the current point to 41.00,43.00.\n(2) L 37.00,34.00: Create a line from 41.00,43.00 to 37.00,34.00.\n(3) L 41.00,33.00: Create a line from 37.00,34.00 to 41.00,33.00.\n(4) L 45.00,34.00: Create a line from 41.00,33.00 to 45.00,34.00.\n(5) L 41.00,43.00: Create a line from 45.00,34.00 to 41.00,43.00.\nThis SVG path starts at point 41.00,43.00, creates four consecutive and touching lines, and then returns back its starting point, thereby creating a four-sided shape. \"kite\" and \"rectangle\" are the only two four-sided polygons on the list. So, we need to determine which one is the correct answer.\nA kite has two pairs of equal-length adjacent sides, whereas a rectangle has two pairs of equal-length alternate (opposite) sides. Now, let's check whether the two adjacent sides of this shape are equal.\nLength of side A: |A| = sqrt((41.00-37.00)^2 + (43.00-34.00)^2) = sqrt((4)^2 + (9)^2) = sqrt(16 + 81) = sqrt(97).\nLength of side B: |B| = sqrt((37.00-41.00)^2 + (34.00-33.00)^2)) = sqrt((4)^2 + (1)^2) = sqrt(16 + 1) = sqrt(17).\nLength of side C: |C| = sqrt((41.00-45.00)^2 + (33.00-34.00)^2)) = sqrt((-4)^2 + (-1)^2) = sqrt(16 + 1) = sqrt(17).\nLength of side D: |D| = sqrt((45.00-41.00)^2 + (34.00-43.00)^2)) = sqrt((4)^2 + (-9)^2) = sqrt(16 + 81) = sqrt(97).\nNote that |A| = |D| and |B| = |C|. Furthermore, A and D are adjacent and B and C are adjacent. Thus, this polygon has two pairs of equal-length adjacent sides and is \"kite\". So the answer is (D).Q: {{input}}\nA: Let's think step by step.\n"
"include": "_flan_cot_fewshot_template_yaml"
"include": "_cot_fewshot_template_yaml"
"task": "bbh_flan_cot_fewshot_geometric_shapes"
"task": "bbh_cot_fewshot_geometric_shapes"
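The geometric_shapes exemplars split a path into M/L commands, count the segments of the closed outline, and, for four-sided shapes, compare side lengths to tell a kite from a rectangle. A minimal sketch of that reasoning follows. It assumes paths that use only absolute `M x,y` and `L x,y` commands (no arcs or curves, so circles and sectors are out of scope). `parse_ml_path`, `side_lengths`, and `classify` are hypothetical helpers, not part of lm-eval-harness. Because a rhombus or square also has equal adjacent sides, this sketch reports it as a kite, and it uses equal diagonals in place of an explicit right-angle check for rectangles.

```python
import math
import re
from typing import List, Tuple

Point = Tuple[float, float]

# One absolute "M x,y" or "L x,y" command; other SVG path commands are not handled.
COMMAND = re.compile(r"([ML])\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)")


def parse_ml_path(d: str) -> List[Point]:
    """Turn an 'M ... L ... L ...' path string into its list of vertices."""
    commands = COMMAND.findall(d)
    if not commands or commands[0][0] != "M":
        raise ValueError("expected a path that starts with an M command")
    return [(float(x), float(y)) for _, x, y in commands]


def side_lengths(points: List[Point]) -> List[float]:
    """Euclidean length of each L segment; the path must end where it started."""
    if points[0] != points[-1]:
        raise ValueError("path is not closed")
    return [math.dist(p, q) for p, q in zip(points, points[1:])]


def classify(d: str) -> str:
    points = parse_ml_path(d)
    sides = side_lengths(points)
    if len(sides) == 4:
        a, b, c, e = sides  # consecutive sides: (a, b), (b, c), (c, e), (e, a) are adjacent
        if (math.isclose(a, b) and math.isclose(c, e)) or (
            math.isclose(b, c) and math.isclose(e, a)
        ):
            return "kite"  # two pairs of equal-length adjacent sides
        equal_diagonals = math.isclose(
            math.dist(points[0], points[2]), math.dist(points[1], points[3])
        )
        if math.isclose(a, c) and math.isclose(b, e) and equal_diagonals:
            return "rectangle"  # equal opposite sides plus equal diagonals
        return "other quadrilateral"
    names = {3: "triangle", 5: "pentagon", 6: "hexagon", 7: "heptagon", 8: "octagon"}
    return names.get(len(sides), f"{len(sides)}-sided polygon")


if __name__ == "__main__":
    print(classify("M 41.00,43.00 L 37.00,34.00 L 41.00,33.00 L 45.00,34.00 L 41.00,43.00"))
```

For the third exemplar, `side_lengths` gives sqrt(97), sqrt(17), sqrt(17), sqrt(97). These are the |A|, |B|, |C|, |D| values worked out by hand above, so the adjacent-pair test selects "kite".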
lm_eval/tasks/bbh/flan_cot_fewshot/hyperbaton.yaml → lm_eval/tasks/bbh/cot_fewshot/hyperbaton.yaml
"dataset_name": "hyperbaton"
"description": "Order adjectives correctly in English sentences.\n\n"
"doc_to_text": "Which sentence has the correct adjective order:\nOptions:\n(A) rubber terrible ship\n(B) terrible rubber ship\nA: Let's think step by step.\nWhen there is more than one adjective before a noun, the adjectives need to respect the following order before a noun: \"[1. opinion] [2. size] [3. age] [4. shape] [5. color] [6. origin] [7. material] [8. purpose] noun\".\nOption (A): \"rubber terrible ship\". (1) rubber\" falls into the material category. (2) \"terrible\" falls into the opinion category. Option (A) has the following adjective order: [7. material] [1. opinion] (or, in numeric terms, 7 1). Because 7 < 1 is not correct, (A) does not have the correct ordering.\nOption (B): \"terrible rubber ship\". Option (B) has the following adjective order: [1. opinion] [7. material] (or, in numeric terms, 1 7). Because 1 < 7 is correct, (B) has the correct ordering. So the answer is (B). Which sentence has the correct adjective order:\nOptions:\n(A) repulsive small Brazilian exercise ship\n(B) Brazilian repulsive exercise small ship\nA: Let's think step by step.\nWhen there is more than one adjective before a noun, the adjectives need to respect the following order before a noun: \"[1. opinion] [2. size] [3. age] [4. shape] [5. color] [6. origin] [7. material] [8. purpose] noun\".\nOption (A): \"repulsive small Brazilian exercise ship\". (1) \"repulsive\" falls into the opinion category. (2) \"small\" falls into the size category. (3) \"Brazilian\" falls into the origin category. (4) \"exercise\" falls into the purpose category. Option (A) has the following adjective order: [1. opinion] [2. size] [6. origin] [8. purpose] (or, in numeric terms, 1 2 6 8). Because 1 < 2 < 6 < 8 is correct, (A) has the correct ordering.\nOption (B): \"Brazilian repulsive exercise small ship\". Option (B) has the following adjective order: [6. origin] [1. opinion] [8. purpose] [2. size] (or, in numeric terms, 6 1 8 2). Because 6 < 1 < 8 < 2 is not correct, (B) does not have the correct ordering. So the answer is (A). Which sentence has the correct adjective order:\nOptions:\n(A) blue gold wonderful square shoe\n(B) wonderful square blue gold shoe\nA: Let's think step by step.\nWhen there is more than one adjective before a noun, the adjectives need to respect the following order before a noun: \"[1. opinion] [2. size] [3. age] [4. shape] [5. color] [6. origin] [7. material] [8. purpose] noun\".\nOption (A): \"blue gold wonderful square shoe\". (1) \"blue\" falls into the color category. (2) \"gold\" falls into the material category. (3) \"wonderful\" falls into the opinion category. (4) \"square\" falls into the shape category. The adjective order that Option (A) has is [5. color] [7. material] [1. opinion] [4. shape] (or, in numeric terms, 5 7 1 4). Because 5 < 7 < 1 < 4 is not correct, (A) does not have the correct ordering.\nOption (B): \"wonderful square blue gold shoe\". Option (B) has the following adjective order: [1. opinion] [4. shape] [5. color] [7. material] (or, in numeric terms, 1 4 5 7 ). Because 1 < 4 < 5 < 7 is correct, (B) has the correct ordering. So the answer is (B).Q: {{input}}\nA: Let's think step by step.\n"
"include": "_flan_cot_fewshot_template_yaml"
"include": "_cot_fewshot_template_yaml"
"task": "bbh_flan_cot_fewshot_hyperbaton"
"task": "bbh_cot_fewshot_hyperbaton"
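The hyperbaton exemplars map each adjective to one of eight ranked categories and accept an option only if the ranks strictly increase. A minimal sketch of that check follows. The `LEXICON` is a hypothetical, hand-written table that covers only the adjectives in the exemplars above; a real solver would need a much larger word-to-category resource. It also assumes the last word of the phrase is the noun.

```python
from typing import Dict, List

# Category order quoted in the exemplars: [1. opinion] ... [8. purpose] noun.
ORDER: List[str] = ["opinion", "size", "age", "shape", "color", "origin", "material", "purpose"]
RANK: Dict[str, int] = {category: i + 1 for i, category in enumerate(ORDER)}

# Hypothetical lexicon: only the adjectives used in the few-shot exemplars.
LEXICON: Dict[str, str] = {
    "terrible": "opinion", "repulsive": "opinion", "wonderful": "opinion",
    "small": "size",
    "square": "shape",
    "blue": "color",
    "Brazilian": "origin",
    "rubber": "material", "gold": "material",
    "exercise": "purpose",
}


def has_correct_order(phrase: str) -> bool:
    """True if the adjectives before the final noun have strictly increasing category ranks."""
    *adjectives, _noun = phrase.split()
    ranks = [RANK[LEXICON[adjective]] for adjective in adjectives]
    return all(earlier < later for earlier, later in zip(ranks, ranks[1:]))


if __name__ == "__main__":
    print(has_correct_order("terrible rubber ship"))
    print(has_correct_order("blue gold wonderful square shoe"))
```

The list of ranks is the "numeric terms" sequence from the exemplars (for example, 5 7 1 4), and the pairwise comparison is the "Because 5 < 7 < 1 < 4 is not correct" step.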
lm_eval/tasks/bbh/flan_cot_fewshot/logical_deduction_five_objects.yaml → lm_eval/tasks/bbh/cot_fewshot/logical_deduction_five_objects.yaml
"dataset_name": "logical_deduction_five_objects"
"description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
"doc_to_text": "The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. In a golf tournament, there were three golfers: Amy, Eli, and Eve. Eve finished above Amy. Eli finished below Amy.\nOptions:\n(A) Amy finished last\n(B) Eli finished last\n(C) Eve finished last\nA: Let's think step by step.\n(1) Eve finished above Amy: \"(above) ? Eve ? Amy ? (below)\".\n(2) Eli finished below Amy: \"(above) ? Amy ? Eli ? (below)\".\n(3) Combining (1) and (2) we get the following ordering: \"(above) Eve Amy Eli (below)\".\nAccording to this ordering, the person who finished last (the one at the bottom of this list) is Eli.\nEli finished last. So the answer is (B). The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a white book, a green book, and an orange book. The green book is to the right of the white book. The orange book is the rightmost.\nOptions:\n(A) The white book is the leftmost\n(B) The green book is the leftmost\n(C) The orange book is the leftmost\nA: Let's think step by step.\n(1) The green book is to the right of the white book: \"(left) ? white ? green ? (right)\".\n(2) The orange book is the rightmost: \"(left) ? white ? green orange (right)\".\n(3) Combining (1) and (2) we get the following ordering: \"(left) white green orange (right)\".\nAccording to this ordering, the leftmost book is the white book.\nThe white book is the leftmost. So the answer is (A). The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a red book, a gray book, and a white book. The white book is to the left of the gray book. The red book is the second from the left.\nOptions:\n(A) The red book is the leftmost\n(B) The gray book is the leftmost\n(C) The white book is the leftmost\nA: Let's think step by step.\n(1) The white book is to the left of the gray book: \"(left) ? white ? gray ? (right)\".\n(2) The red book is the second from the left: \"(left) ? white red gray ? (right)\".\n(3) Combining (1) and (2) we get the following ordering: \"(left) white red gray (right)\".\nAccording to this ordering, the leftmost book is the white book.\nThe white book is the leftmost. So the answer is (C).Q: {{input}}\nA: Let's think step by step.\n"
"
doc_to_text"
:
"
The
following
paragraphs
each
describe
a
set
of
three
objects
arranged
in
a
fixed
order.
The
statements
are
logically
consistent
within
each
paragraph.
In
a
golf
tournament,
there
were
three
golfers:
Amy,
Eli,
and
Eve.
Eve
finished
above
Amy.
Eli
finished
below
Amy.
\n
Options:
\n
(A)
Amy
finished
last
\n
(B)
Eli
finished
last
\n
(C)
Eve
finished
last
\n
A:
Let's
think
step
by
step.
\n
(1)
Eve
finished
above
Amy:
\"
(above)
?
Eve
?
Amy
?
(below)
\"
.
\n
(2)
Eli
finished
below
Amy:
\"
(above)
?
Amy
?
Eli
?
(below)
\"
.
\n
(3)
Combining
(1)
and
(2)
we
get
the
following
ordering:
\"
(above)
Eve
Amy
Eli
(below)
\"
.
\n
According
to
this
ordering,
the
person
who
finished
last
(the
one
at
the
bottom
of
this
list)
is
Eli.
\n
Eli
finished
last.
So
the
answer
is
(B).
The
following
paragraphs
each
describe
a
set
of
three
objects
arranged
in
a
fixed
order.
The
statements
are
logically
consistent
within
each
paragraph.
On
a
shelf,
there
are
three
books:
a
white
book,
a
green
book,
and
an
orange
book.
The
green
book
is
to
the
right
of
the
white
book.
The
orange
book
is
the
rightmost.
\n
Options:
\n
(A)
The
white
book
is
the
leftmost
\n
(B)
The
green
book
is
the
leftmost
\n
(C)
The
orange
book
is
the
leftmost
\n
A:
Let's
think
step
by
step.
\n
(1)
The
green
book
is
to
the
right
of
the
white
book:
\"
(left)
?
white
?
green
?
(right)
\"
.
\n
(2)
The
orange
book
is
the
rightmost:
\"
(left)
?
white
?
green
orange
(right)
\"
.
\n
(3)
Combining
(1)
and
(2)
we
get
the
following
ordering:
\"
(left)
white
green
orange
(right)
\"
.
\n
According
to
this
ordering,
the
leftmost
book
is
the
white
book.
\n
The
white
book
is
the
leftmost.
So
the
answer
is
(A).
The
following
paragraphs
each
describe
a
set
of
three
objects
arranged
in
a
fixed
order.
The
statements
are
logically
consistent
within
each
paragraph.
On
a
shelf,
there
are
three
books:
a
red
book,
a
gray
book,
and
a
white
book.
The
white
book
is
to
the
left
of
the
gray
book.
The
red
book
is
the
second
from
the
left.
\n
Options:
\n
(A)
The
red
book
is
the
leftmost
\n
(B)
The
gray
book
is
the
leftmost
\n
(C)
The
white
book
is
the
leftmost
\n
A:
Let's
think
step
by
step.
\n
(1)
The
white
book
is
to
the
left
of
the
gray
book:
\"
(left)
?
white
?
gray
?
(right)
\"
.
\n
(2)
The
red
book
is
the
second
from
the
left:
\"
(left)
?
white
red
gray
?
(right)
\"
.
\n
(3)
Combining
(1)
and
(2)
we
get
the
following
ordering:
\"
(left)
white
red
gray
(right)
\"
.
\n
According
to
this
ordering,
the
leftmost
book
is
the
white
book.
\n
The
white
book
is
the
leftmost.
So
the
answer
is
(C).Q:
{{input}}
\n
A:
Let's
think
step
by
step.
\n
"
"
include"
:
"
_
flan_
cot_fewshot_template_yaml"
"
include"
:
"
_cot_fewshot_template_yaml"
"
task"
:
"
bbh_
flan_
cot_fewshot_logical_deduction_five_objects"
"
task"
:
"
bbh_cot_fewshot_logical_deduction_five_objects"
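The three exemplars in this `doc_to_text` (identical across the logical_deduction configs in this commit) solve each puzzle by chaining pairwise constraints into one ordering. As an independent cross-check, not part of lm-evaluation-harness, a brute-force Python sketch can enumerate every permutation and keep those consistent with the stated constraints. The helpers `orderings` and `before` are hypothetical names chosen for illustration.

```python
from itertools import permutations


def orderings(items, constraints):
    """Return every ordering of `items` (index 0 = top/leftmost) satisfying all constraints."""
    return [p for p in permutations(items) if all(check(p) for check in constraints)]


def before(a, b):
    """Constraint: `a` comes earlier than `b` (above it, or to its left)."""
    return lambda p: p.index(a) < p.index(b)


# Exemplar 1: Eve finished above Amy; Eli finished below Amy.
golf = orderings(["Amy", "Eli", "Eve"], [before("Eve", "Amy"), before("Amy", "Eli")])

# Exemplar 2: the green book is right of the white book; the orange book is rightmost.
shelf_a = orderings(
    ["white", "green", "orange"],
    [before("white", "green"), lambda p: p.index("orange") == 2],
)

# Exemplar 3: the white book is left of the gray book; the red book is second from the left.
shelf_b = orderings(
    ["red", "gray", "white"],
    [before("white", "gray"), lambda p: p.index("red") == 1],
)

print(golf)     # [('Eve', 'Amy', 'Eli')] -> Eli finished last, answer (B)
print(shelf_a)  # [('white', 'green', 'orange')] -> white is leftmost, answer (A)
print(shelf_b)  # [('white', 'red', 'gray')] -> white is leftmost, answer (C)
```

Each exemplar has exactly one consistent ordering, matching the answers in the prompt. For seven objects the same search covers 5,040 permutations, which is still trivial; it is a checking aid, not a model of how the prompt asks the LM to reason.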
lm_eval/tasks/bbh/flan_cot_fewshot/logical_deduction_seven_objects.yaml → lm_eval/tasks/bbh/cot_fewshot/logical_deduction_seven_objects.yaml
"dataset_name": "logical_deduction_seven_objects"
"description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
"doc_to_text": "The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. In a golf tournament, there were three golfers: Amy, Eli, and Eve. Eve finished above Amy. Eli finished below Amy.\nOptions:\n(A) Amy finished last\n(B) Eli finished last\n(C) Eve finished last\nA: Let's think step by step.\n(1) Eve finished above Amy: \"(above) ? Eve ? Amy ? (below)\".\n(2) Eli finished below Amy: \"(above) ? Amy ? Eli ? (below)\".\n(3) Combining (1) and (2) we get the following ordering: \"(above) Eve Amy Eli (below)\".\nAccording to this ordering, the person who finished last (the one at the bottom of this list) is Eli.\nEli finished last. So the answer is (B). The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a white book, a green book, and an orange book. The green book is to the right of the white book. The orange book is the rightmost.\nOptions:\n(A) The white book is the leftmost\n(B) The green book is the leftmost\n(C) The orange book is the leftmost\nA: Let's think step by step.\n(1) The green book is to the right of the white book: \"(left) ? white ? green ? (right)\".\n(2) The orange book is the rightmost: \"(left) ? white ? green orange (right)\".\n(3) Combining (1) and (2) we get the following ordering: \"(left) white green orange (right)\".\nAccording to this ordering, the leftmost book is the white book.\nThe white book is the leftmost. So the answer is (A). The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a red book, a gray book, and a white book. The white book is to the left of the gray book. The red book is the second from the left.\nOptions:\n(A) The red book is the leftmost\n(B) The gray book is the leftmost\n(C) The white book is the leftmost\nA: Let's think step by step.\n(1) The white book is to the left of the gray book: \"(left) ? white ? gray ? (right)\".\n(2) The red book is the second from the left: \"(left) ? white red gray ? (right)\".\n(3) Combining (1) and (2) we get the following ordering: \"(left) white red gray (right)\".\nAccording to this ordering, the leftmost book is the white book.\nThe white book is the leftmost. So the answer is (C).Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_fewshot_template_yaml"
+"include": "_cot_fewshot_template_yaml"
-"task": "bbh_flan_cot_fewshot_logical_deduction_seven_objects"
+"task": "bbh_cot_fewshot_logical_deduction_seven_objects"
lm_eval/tasks/bbh/flan_cot_fewshot/logical_deduction_three_objects.yaml → lm_eval/tasks/bbh/cot_fewshot/logical_deduction_three_objects.yaml
"dataset_name": "logical_deduction_three_objects"
"description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
"doc_to_text": "The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. In a golf tournament, there were three golfers: Amy, Eli, and Eve. Eve finished above Amy. Eli finished below Amy.\nOptions:\n(A) Amy finished last\n(B) Eli finished last\n(C) Eve finished last\nA: Let's think step by step.\n(1) Eve finished above Amy: \"(above) ? Eve ? Amy ? (below)\".\n(2) Eli finished below Amy: \"(above) ? Amy ? Eli ? (below)\".\n(3) Combining (1) and (2) we get the following ordering: \"(above) Eve Amy Eli (below)\".\nAccording to this ordering, the person who finished last (the one at the bottom of this list) is Eli.\nEli finished last. So the answer is (B). The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a white book, a green book, and an orange book. The green book is to the right of the white book. The orange book is the rightmost.\nOptions:\n(A) The white book is the leftmost\n(B) The green book is the leftmost\n(C) The orange book is the leftmost\nA: Let's think step by step.\n(1) The green book is to the right of the white book: \"(left) ? white ? green ? (right)\".\n(2) The orange book is the rightmost: \"(left) ? white ? green orange (right)\".\n(3) Combining (1) and (2) we get the following ordering: \"(left) white green orange (right)\".\nAccording to this ordering, the leftmost book is the white book.\nThe white book is the leftmost. So the answer is (A). The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a red book, a gray book, and a white book. The white book is to the left of the gray book. The red book is the second from the left.\nOptions:\n(A) The red book is the leftmost\n(B) The gray book is the leftmost\n(C) The white book is the leftmost\nA: Let's think step by step.\n(1) The white book is to the left of the gray book: \"(left) ? white ? gray ? (right)\".\n(2) The red book is the second from the left: \"(left) ? white red gray ? (right)\".\n(3) Combining (1) and (2) we get the following ordering: \"(left) white red gray (right)\".\nAccording to this ordering, the leftmost book is the white book.\nThe white book is the leftmost. So the answer is (C).Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_fewshot_template_yaml"
+"include": "_cot_fewshot_template_yaml"
-"task": "bbh_flan_cot_fewshot_logical_deduction_three_objects"
+"task": "bbh_cot_fewshot_logical_deduction_three_objects"
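In every file shown here, `dataset_name`, `description`, and `doc_to_text` are unchanged; only `include` and `task` lose their `flan_` segment, and each file moves from `flan_cot_fewshot/` to `cot_fewshot/`. That mapping can be sketched in Python over a config already parsed into a dict. The helpers `renamed_task` and `renamed_config` are hypothetical illustrations, not how this commit produced its files.

```python
OLD_PREFIX = "bbh_flan_cot_fewshot_"
NEW_PREFIX = "bbh_cot_fewshot_"
INCLUDE_RENAMES = {"_flan_cot_fewshot_template_yaml": "_cot_fewshot_template_yaml"}


def renamed_task(name):
    """Drop the `flan_` segment from an old BBH CoT few-shot task name; leave other names alone."""
    if name.startswith(OLD_PREFIX):
        return NEW_PREFIX + name[len(OLD_PREFIX):]
    return name


def renamed_config(cfg):
    """Return a copy of a parsed task config with only `include` and `task` rewritten."""
    out = dict(cfg)
    out["include"] = INCLUDE_RENAMES.get(out["include"], out["include"])
    out["task"] = renamed_task(out["task"])
    return out


old = {
    "dataset_name": "logical_deduction_three_objects",
    "include": "_flan_cot_fewshot_template_yaml",
    "task": "bbh_flan_cot_fewshot_logical_deduction_three_objects",
}
new = renamed_config(old)
print(new["include"])  # _cot_fewshot_template_yaml
print(new["task"])     # bbh_cot_fewshot_logical_deduction_three_objects
```

Returning a copy rather than mutating the input keeps the old and new configs side by side, mirroring the two columns of the diff.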
lm_eval/tasks/bbh/flan_cot_fewshot/movie_recommendation.yaml → lm_eval/tasks/bbh/cot_fewshot/movie_recommendation.yaml
"dataset_name": "movie_recommendation"
"description": "Recommend movies similar to the given list of movies.\n\n"
"doc_to_text": "Find a movie similar to Star Wars Episode IV - A New Hope, Indiana Jones and the Last Crusade, Star Wars Episode V - The Empire Strikes Back, The Big Lebowski:\nOptions:\n(A) Tetsuo\n(B) the Ironman\n(C) The Princess Bride\n(D) The Barkley Marathons The Race That Eats Its Young\n(E) Bug\nA: Let's think step by step.\n- Star Wars Episode IV - A New Hope (action, adventure, fantasy; 1977)\n- Indiana Jones and the Last Crusade (action, adventure; 1989)\n- Star Wars Episode V - The Empire Strikes Back (action, adventure, fantasy; 1980)\n- The Big Lebowski (action, drama, comedy; 1998)\nThese are all famous classic American movies produced before 2000. Amongst all the options, the only movie similar to these ones seems to be The Princess Bride (1987). So the answer is (C). Find a movie similar to Twister, The Silence of the Lambs, Independence Day, Braveheart:\nOptions:\n(A) They Shoot Horses\n(B) Don't They\n(C) Forrest Gump\n(D) The Salton Sea\n(E) Extreme Days\nA: Let's think step by step.\n- Twister (action, adventure, thriller; 1996)\n- The Silence of the Lambs (crime, drama, thriller; 1991)\n- Independence Day (action, science-fiction, drama; 1996)\n- Braveheart (biography, drama, epic; 1995)\nThese are all famous Hollywood movies produced around the 1990s. Amongst all the options, the only movie similar to these ones seems to be Forrest Gump (comedy, drama, romance; 1994). So the answer is (C). Find a movie similar to Minority Report, Total Recall, Inside Out, Forrest Gump:\nOptions:\n(A) Phenomena\n(B) Lilting\n(C) Catwoman\n(D) Edge of Tomorrow\nA: Let's think step by step.\n- Minority Report (action, crime, mystery; 2002)\n- Total Recall (action, adventure, science-fiction; 2012)\n- Inside Out (animation, family, comedy; 2015)\n- Forrest Gump (comedy, drama, romance; 1994)\nThese are all famous movies produced in the past few decades.Amongst all the options, the only movie similar to these ones seems to be Edge of Tomorrow (action, adventure, crime, mystery; 2014), as it is also a science-fiction movie and features Tom Cruise. So the answer is (D).Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_fewshot_template_yaml"
+"include": "_cot_fewshot_template_yaml"
-"task": "bbh_flan_cot_fewshot_movie_recommendation"
+"task": "bbh_cot_fewshot_movie_recommendation"
lm_eval/tasks/bbh/flan_cot_fewshot/multistep_arithmetic_two.yaml → lm_eval/tasks/bbh/cot_fewshot/multistep_arithmetic_two.yaml
"dataset_name": "multistep_arithmetic_two"
"description": "Solve multi-step arithmetic problems.\n\n"
"doc_to_text": "((-5 + 9 * -4 - 0) * (4 + -7 + 0 * -5)) =\nA: Let's think step by step.\nLet’s recall that the order of operations in mathematics is as follows: (1) Parentheses, (2) exponents, (3) multiplication and division (from left to right), (4) addition and multiplication (from left to right). So, remember to always compute the expressions inside parentheses or brackets first.\nThis equation can be written as \"A * B\", where A = (-5 + 9 * -4 - 0) and B = (4 + -7 + 0 * -5).\nLet's calculate A = (-5 + 9 * -4 - 0) = (-5 + (9 * -4) - 0) = (-5 + (-36) - 0) = (-5 + -36 - 0) = -5 - 36 = -41.\nLet's calculate B = (4 + -7 + 0 * -5) = (4 + -7 + (0 * -5)) = (4 + -7 + 0) = (4 + -7) = (4 - 7) = -3.\nThen, the final equation is A * B = -41 * -3 = (-61) * (-3) = 123. So the answer is 123. ((-9 * 7 * 7 * -9) + (4 * -9 - 8 - -4)) =\nA: Let's think step by step.\nLet’s recall that the order of operations in mathematics is as follows: (1) Parentheses, (2) exponents, (3) multiplication and division (from left to right), (4) addition and multiplication (from left to right). So, remember to always compute the expressions inside parentheses or brackets first.\nThis equation can be written as \"A + B\", where A = (-9 * 7 * 7 * -9) and B = (4 * -9 - 8 - -4).\nLet's calculate A = (-9 * 7 * 7 * -9) = ((-9 * 7) * (7 * -9)) = ((-63) * (-63)) = 3969.\nLet's calculate B = (4 * -9 - 8 - (-4)) = ((4 * -9) - 8 - (-4)) = ((-36) - 8 - (-4)) = ((-36 - 8) - (-4)) = (-44 - (-4)) = -40.\nThen, the final equation is A + B = 3969 + -40 = 3969 - 40 = 3929. So the answer is 3929. ((-3 + 5 * 8 * -4) - (9 - 8 * -7 + -9)) =\nA: Let's think step by step.\nLet’s recall that the order of operations in mathematics is as follows: (1) Parentheses, (2) exponents, (3) multiplication and division (from left to right), (4) addition and multiplication (from left to right). So, remember to always compute the expressions inside parentheses or brackets first.\nThis equation can be written as \"A - B\", where A = (-3 + 5 * 8 * -4) and B = (9 - 8 * -7 + -9).\nLet's calculate A = (-3 + 5 * 8 * -4) = (-3 + (5 * 8) * -4) = (-3 + (40) * -4) = (-3 + (40 * -4)) = (-3 + -160) = -163.\nLet's calculate B = (9 - 8 * -7 + -9) = (9 - (8 * -7) + -9) = (9 - (-56) + -9) = ((9 - (-56)) + -9) = ((65) + -9)= (65 - 9) = 56.\nThen, the final equation is A - B = -163 - 56 = -219. So the answer is -219.Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_fewshot_template_yaml"
+"include": "_cot_fewshot_template_yaml"
-"task": "bbh_flan_cot_fewshot_multistep_arithmetic_two"
+"task": "bbh_cot_fewshot_multistep_arithmetic_two"
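The exemplar expressions use only integer `+`, `-`, `*` and unary minus, so Python's standard operator precedence (multiplication before addition and subtraction, left to right within a level) evaluates them the same way the worked answers do. This quick check splits each exemplar into the A and B sub-expressions named in the prompt and recomputes the final answers 123, 3929, and -219. It also shows that the intermediate `(-61) * (-3)` in the first exemplar is a slip for `(-41) * (-3)`, and that rule (4) in the recalled order of operations should read "addition and subtraction"; neither slip changes a final answer.

```python
# A and B sub-expressions of the three exemplars in multistep_arithmetic_two's doc_to_text.
a1, b1 = (-5 + 9 * -4 - 0), (4 + -7 + 0 * -5)   # answer is A * B
a2, b2 = (-9 * 7 * 7 * -9), (4 * -9 - 8 - -4)   # answer is A + B
a3, b3 = (-3 + 5 * 8 * -4), (9 - 8 * -7 + -9)   # answer is A - B

results = [a1 * b1, a2 + b2, a3 - b3]
print((a1, b1), (a2, b2), (a3, b3))  # (-41, -3) (3969, -40) (-163, 56)
print(results)                       # [123, 3929, -219]
```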
lm_eval/tasks/bbh/flan_cot_fewshot/navigate.yaml → lm_eval/tasks/bbh/cot_fewshot/navigate.yaml
"dataset_name": "navigate"
"description": "Given a series of navigation instructions, determine whether one would end up back at the starting point.\n\n"
"doc_to_text": "If you follow these instructions, do you return to the starting point? Turn left. Turn around. Turn left. Take 7 steps. Take 2 steps. Take 4 steps. Take 8 steps.\nOptions:\n- Yes\n- No\nA: Let's think step by step.\nWe start at the origin (0, 0), facing the positive y-axis.\n(1) Turn left: (0, 0), facing the negative x-axis.\n(2) Turn around: (0, 0), facing the positive x-axis.\n(3) Turn left: (0, 0), facing the positive y-axis.\n(4) Take 7 steps: (0, 7), facing the positive y-axis.\n(5) Take 2 steps: (0, 9), facing the positive y-axis.\n(6) Take 4 steps: (0, 13), facing the positive y-axis.\n(7) Take 8 steps: (0, 21), facing the positive y-axis.\nSince (0, 21) is not (0, 0), we are not where we started. So the answer is No. If you follow these instructions, do you return to the starting point? Turn around. Take 1 step. Take 6 steps. Turn around. Take 6 steps. Take 9 steps. Take 1 step.\nOptions:\n- Yes\n- No\nA: Let's think step by step.\nWe start at the origin (0, 0), facing the positive y-axis.\n(1) Turn around: (0, 0), facing the negative y-axis.\n(2) Take 1 step: (0, -1), facing the negative y-axis.\n(3) Take 6 steps: (0, -7), facing the negative y-axis.\n(4) Turn around: (0, -7), facing the positive y-axis.\n(5) Take 6 steps: (0, -1), facing the positive y-axis.\n(6) Take 9 steps: (0, 8), facing the positive y-axis.\n(7) Take 1 step: (0, 9), facing the positive y-axis.\nSince (0, 9) is not (0, 0), we are not where we started. So the answer is No. If you follow these instructions, do you return to the starting point? Always face forward. Take 2 steps right. Take 9 steps left. Take 7 steps right.\nOptions:\n- Yes\n- No\nA: Let's think step by step.\nWe start at the origin (0, 0), facing the positive y-axis.\n(1) Always face forward: (0, 0), facing the positive y-axis.\n(2) Take 2 steps right: (0, 2), facing the positive y-axis.\n(3) Take 9 steps left: (0, -7), facing the positive y-axis.\n(4) Take 7 steps right: (0, 7), facing the positive y-axis.\nSince (0, 0) is (0, 0), we are indeed where we started. So the answer is Yes.Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_fewshot_template_yaml"
+"include": "_cot_fewshot_template_yaml"
-"task": "bbh_flan_cot_fewshot_navigate"
+"task": "bbh_cot_fewshot_navigate"
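The navigate exemplars reduce to dead reckoning on a grid: keep a position and a heading, rotate the heading on turns, and add `n` times a direction vector on each move. A minimal Python sketch follows, assuming only the phrasings seen in these exemplars (`Turn left`, `Turn around`, `Always face forward`, `Take N step(s)` optionally followed by `forward`, `backward`, `left`, or `right`), plus `Turn right` as the natural counterpart. `returns_to_start` is a hypothetical helper, not part of the harness. Because the sketch places sideways steps on the x-axis, its trace for the third exemplar is (2, 0), (-7, 0), (0, 0), whereas the prompt writes (0, 2), (0, -7), (0, 7); both reach the answer Yes.

```python
import re

# Direction of travel for each optional suffix, relative to the heading (dx, dy).
MOVES = {
    "forward": lambda dx, dy: (dx, dy),
    "backward": lambda dx, dy: (-dx, -dy),
    "left": lambda dx, dy: (-dy, dx),
    "right": lambda dx, dy: (dy, -dx),
}
STEP = re.compile(r"Take (\d+) steps?(?: (forward|backward|left|right))?")


def returns_to_start(instructions):
    """Dead-reckon the instructions from (0, 0) facing +y; return (back_at_start, position)."""
    x, y = 0, 0
    dx, dy = 0, 1  # current heading as a unit vector
    for sentence in (s.strip() for s in instructions.split(".")):
        if not sentence:
            continue
        if sentence == "Turn left":
            dx, dy = -dy, dx
        elif sentence == "Turn right":
            dx, dy = dy, -dx
        elif sentence == "Turn around":
            dx, dy = -dx, -dy
        elif sentence == "Always face forward":
            pass  # heading stays fixed; sideways steps are handled by MOVES
        else:
            m = STEP.fullmatch(sentence)
            if m is None:
                raise ValueError(f"unrecognised instruction: {sentence!r}")
            n = int(m.group(1))
            mx, my = MOVES[m.group(2) or "forward"](dx, dy)
            x, y = x + n * mx, y + n * my
    return (x, y) == (0, 0), (x, y)


ex1 = "Turn left. Turn around. Turn left. Take 7 steps. Take 2 steps. Take 4 steps. Take 8 steps."
ex2 = "Turn around. Take 1 step. Take 6 steps. Turn around. Take 6 steps. Take 9 steps. Take 1 step."
ex3 = "Always face forward. Take 2 steps right. Take 9 steps left. Take 7 steps right."
for ex in (ex1, ex2, ex3):
    print(returns_to_start(ex))
# (False, (0, 21))
# (False, (0, 9))
# (True, (0, 0))
```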
lm_eval/tasks/bbh/flan_cot_fewshot/object_counting.yaml → lm_eval/tasks/bbh/cot_fewshot/object_counting.yaml
"dataset_name": "object_counting"
"dataset_name": "object_counting"
"description": "Questions that involve enumerating objects and asking the model to count them.\n\n"
"description": "Questions that involve enumerating objects and asking the model to count them.\n\n"
"doc_to_text": "I have a blackberry, a clarinet, a nectarine, a plum, a strawberry, a banana, a flute, an orange, and a violin. How many fruits do I have?\nA: Let's think step by step.\nWe first identify the fruits on the list and include their quantity in parentheses:\n- blackberry (1)\n- nectarine (1)\n- plum (1)\n- strawberry (1)\n- banana (1)\n- orange (1)\nNow, let's add the numbers in parentheses: 1 + 1 + 1 + 1 + 1 + 1 = 6. So the answer is 6. I have an orange, a raspberry, two peaches, a blackberry, an apple, a grape, a nectarine, and three plums. How many fruits do I have?\nA: Let's think step by step.\nWe first identify the fruits on the list and include their quantity in parentheses:\n- orange (1)\n- raspberry (1)\n- peaches (2)\n- blackberry (1)\n- apple (1)\n- grape (1)\n- nectarine (1)\n- plums (3)\nNow, let's add the numbers in parentheses: 1 + 1 + 2 + 1 + 1 + 1 + 1 + 3 = 11. So the answer is 11. I have a lettuce head, a head of broccoli, an onion, a stalk of celery, two carrots, a garlic, and a yam. How many vegetables do I have?\nA: Let's think step by step.\nWe first identify the vegetables on the list and include their quantity in parentheses:\n- lettuce (1)\n- broccoli (1)\n- onion (1)\n- celery (1)\n- carrots (2)\n- garlic (1)\n- yam (1)\nNow, let's add the numbers in parentheses: 1 + 1 + 1 + 1 + 2 + 1 + 1 = 8. So the answer is 8.Q: {{input}}\nA: Let's think step by step.\n"
"doc_to_text": "I have a blackberry, a clarinet, a nectarine, a plum, a strawberry, a banana, a flute, an orange, and a violin. How many fruits do I have?\nA: Let's think step by step.\nWe first identify the fruits on the list and include their quantity in parentheses:\n- blackberry (1)\n- nectarine (1)\n- plum (1)\n- strawberry (1)\n- banana (1)\n- orange (1)\nNow, let's add the numbers in parentheses: 1 + 1 + 1 + 1 + 1 + 1 = 6. So the answer is 6. I have an orange, a raspberry, two peaches, a blackberry, an apple, a grape, a nectarine, and three plums. How many fruits do I have?\nA: Let's think step by step.\nWe first identify the fruits on the list and include their quantity in parentheses:\n- orange (1)\n- raspberry (1)\n- peaches (2)\n- blackberry (1)\n- apple (1)\n- grape (1)\n- nectarine (1)\n- plums (3)\nNow, let's add the numbers in parentheses: 1 + 1 + 2 + 1 + 1 + 1 + 1 + 3 = 11. So the answer is 11. I have a lettuce head, a head of broccoli, an onion, a stalk of celery, two carrots, a garlic, and a yam. How many vegetables do I have?\nA: Let's think step by step.\nWe first identify the vegetables on the list and include their quantity in parentheses:\n- lettuce (1)\n- broccoli (1)\n- onion (1)\n- celery (1)\n- carrots (2)\n- garlic (1)\n- yam (1)\nNow, let's add the numbers in parentheses: 1 + 1 + 1 + 1 + 2 + 1 + 1 = 8. So the answer is 8.Q: {{input}}\nA: Let's think step by step.\n"
"include": "_flan_cot_fewshot_template_yaml"
"include": "_cot_fewshot_template_yaml"
"task": "bbh_flan_cot_fewshot_object_counting"
"task": "bbh_cot_fewshot_object_counting"
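The object_counting exemplars follow an identify-then-sum pattern. A minimal sketch of that pattern follows; the category label attached to each item is a hypothetical annotation added for illustration, since the prompts themselves carry no such labels.

```python
from typing import List, Tuple

# (item, quantity, category); the category labels are illustrative assumptions.
Item = Tuple[str, int, str]


def count_category(items: List[Item], category: str) -> int:
    """Keep only items in the requested category and add up their quantities."""
    return sum(qty for _, qty, cat in items if cat == category)


first = [
    ("blackberry", 1, "fruit"), ("clarinet", 1, "instrument"),
    ("nectarine", 1, "fruit"), ("plum", 1, "fruit"),
    ("strawberry", 1, "fruit"), ("banana", 1, "fruit"),
    ("flute", 1, "instrument"), ("orange", 1, "fruit"),
    ("violin", 1, "instrument"),
]
second = [
    ("orange", 1, "fruit"), ("raspberry", 1, "fruit"), ("peach", 2, "fruit"),
    ("blackberry", 1, "fruit"), ("apple", 1, "fruit"), ("grape", 1, "fruit"),
    ("nectarine", 1, "fruit"), ("plum", 3, "fruit"),
]
third = [
    ("lettuce", 1, "vegetable"), ("broccoli", 1, "vegetable"),
    ("onion", 1, "vegetable"), ("celery", 1, "vegetable"),
    ("carrot", 2, "vegetable"), ("garlic", 1, "vegetable"),
    ("yam", 1, "vegetable"),
]

print(count_category(first, "fruit"))      # 6
print(count_category(second, "fruit"))     # 11
print(count_category(third, "vegetable"))  # 8
```

Filtering before summing is what lets the first exemplar ignore the clarinet, flute, and violin.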
lm_eval/tasks/bbh/flan_cot_fewshot/penguins_in_a_table.yaml → lm_eval/tasks/bbh/cot_fewshot/penguins_in_a_table.yaml
"dataset_name": "penguins_in_a_table"
"dataset_name": "penguins_in_a_table"
"description": "Answer questions about a table of penguins and their attributes.\n\n"
"description": "Answer questions about a table of penguins and their attributes.\n\n"
"doc_to_text": "Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. We now add a penguin to the table:\nJames, 12, 90, 12\nHow many penguins are less than 8 years old?\nOptions:\n(A) 1\n(B) 2\n(C) 3\n(D) 4\n(E) 5\nA: Let's think step by step.\nThis question focuses on age. We know the following: Louis is 7 years old, Bernard is 5 years old, Vincent is 9 years old, and Gwen is 8 years old.\nNow, we add James to this table: James is 12 years old.\nThe penguins that are less than 8 years old are Louis and Bernard.\nThere are 2 penguins less than 8 years old. So the answer is (B). Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. Which is the youngest penguin?\nOptions:\n(A) Louis\n(B) Bernard\n(C) Vincent\n(D) Gwen\n(E) James\nA: Let's think step by step.\nThis question focuses on age. We know the following: Louis is 7 years old, Bernard is 5 years old, Vincent is 9 years old, and Gwen is 8 years old.\nAccording to the table, Bernard (5) is the youngest amongst them.\nThe youngest penguin is Bernard. So the answer is (B). Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. What is the name of the second penguin sorted by alphabetic order?\nOptions:\n(A) Louis\n(B) Bernard\n(C) Vincent\n(D) Gwen\n(E) James\nA: Let's think step by step.\nThis question focuses on the name. We know the following: The names of the penguin in the table are Louis, Bernard, Vincent, and Gwen.\nWhen we sort their names alphabetically, we get Bernard, Gwen, Louis, Vincent.\nThe name of the second penguin sorted by alphabetical order is Gwen.\nThe name of the second penguin sorted by alphabetic order is Gwen. So the answer is (D).Q: {{input}}\nA: Let's think step by step.\n"
"doc_to_text": "Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. We now add a penguin to the table:\nJames, 12, 90, 12\nHow many penguins are less than 8 years old?\nOptions:\n(A) 1\n(B) 2\n(C) 3\n(D) 4\n(E) 5\nA: Let's think step by step.\nThis question focuses on age. We know the following: Louis is 7 years old, Bernard is 5 years old, Vincent is 9 years old, and Gwen is 8 years old.\nNow, we add James to this table: James is 12 years old.\nThe penguins that are less than 8 years old are Louis and Bernard.\nThere are 2 penguins less than 8 years old. So the answer is (B). Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. Which is the youngest penguin?\nOptions:\n(A) Louis\n(B) Bernard\n(C) Vincent\n(D) Gwen\n(E) James\nA: Let's think step by step.\nThis question focuses on age. We know the following: Louis is 7 years old, Bernard is 5 years old, Vincent is 9 years old, and Gwen is 8 years old.\nAccording to the table, Bernard (5) is the youngest amongst them.\nThe youngest penguin is Bernard. So the answer is (B). Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. What is the name of the second penguin sorted by alphabetic order?\nOptions:\n(A) Louis\n(B) Bernard\n(C) Vincent\n(D) Gwen\n(E) James\nA: Let's think step by step.\nThis question focuses on the name. We know the following: The names of the penguin in the table are Louis, Bernard, Vincent, and Gwen.\nWhen we sort their names alphabetically, we get Bernard, Gwen, Louis, Vincent.\nThe name of the second penguin sorted by alphabetical order is Gwen.\nThe name of the second penguin sorted by alphabetic order is Gwen. So the answer is (D).Q: {{input}}\nA: Let's think step by step.\n"
"include": "_flan_cot_fewshot_template_yaml"
"include": "_cot_fewshot_template_yaml"
"task": "bbh_flan_cot_fewshot_penguins_in_a_table"
"task": "bbh_cot_fewshot_penguins_in_a_table"
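The three penguins_in_a_table exemplars reduce to a filter, a minimum, and a sort over the same four rows. A minimal sketch, assuming the flattened table is restored to one row per line; the `Penguin` type and `parse` helper are hypothetical names. Only the first exemplar adds James; the other two use the original four penguins.

```python
from typing import List, NamedTuple


class Penguin(NamedTuple):
    name: str
    age: int
    height_cm: int
    weight_kg: int


# Assumption: one row per line; the prompt text flattens these onto one line.
TABLE = """name, age, height (cm), weight (kg)
Louis, 7, 50, 11
Bernard, 5, 80, 13
Vincent, 9, 60, 11
Gwen, 8, 70, 15"""


def parse(table: str) -> List[Penguin]:
    """Skip the header line and convert each remaining row into a Penguin."""
    penguins = []
    for row in table.strip().splitlines()[1:]:
        name, age, height, weight = (field.strip() for field in row.split(","))
        penguins.append(Penguin(name, int(age), int(height), int(weight)))
    return penguins


base = parse(TABLE)
with_james = base + [Penguin("James", 12, 90, 12)]

under_8 = [p.name for p in with_james if p.age < 8]  # exemplar 1: filter
youngest = min(base, key=lambda p: p.age).name       # exemplar 2: minimum
second_alpha = sorted(p.name for p in base)[1]       # exemplar 3: sort

print(under_8)       # ['Louis', 'Bernard']
print(youngest)      # Bernard
print(second_alpha)  # Gwen
```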
lm_eval/tasks/bbh/flan_cot_fewshot/reasoning_about_colored_objects.yaml → lm_eval/tasks/bbh/cot_fewshot/reasoning_about_colored_objects.yaml
"dataset_name": "reasoning_about_colored_objects"
"dataset_name": "reasoning_about_colored_objects"
"description": "Answer extremely simple questions about the colors of objects on a surface.\n\n"
"description": "Answer extremely simple questions about the colors of objects on a surface.\n\n"
"doc_to_text": "On the nightstand, there is a red pencil, a purple mug, a burgundy keychain, a fuchsia teddy bear, a black plate, and a blue stress ball. What color is the stress ball?\nOptions:\n(A) red\n(B) orange\n(C) yellow\n(D) green\n(E) blue\n(F) brown\n(G) magenta\n(H) fuchsia\n(I) mauve\n(J) teal\n(K) turquoise\n(L) burgundy\n(M) silver\n(N) gold\n(O) black\n(P) grey\n(Q) purple\n(R) pink\nA: Let's think step by step.\nAccording to this question, the color of the stress ball is blue. So the answer is (E). On the table, you see a bunch of objects arranged in a row: a purple paperclip, a pink stress ball, a brown keychain, a green scrunchiephone charger, a mauve fidget spinner, and a burgundy pen. What is the color of the object directly to the right of the stress ball?\nOptions:\n(A) red\n(B) orange\n(C) yellow\n(D) green\n(E) blue\n(F) brown\n(G) magenta\n(H) fuchsia\n(I) mauve\n(J) teal\n(K) turquoise\n(L) burgundy\n(M) silver\n(N) gold\n(O) black\n(P) grey\n(Q) purple\n(R) pink\nA: Let's think step by step.\nAccording to this question, the objects are arranged in a row, from left to right, as follows: (1) a purple paperclip, (2) a pink stress ball, (3) a brown keychain, (4) a green scrunchiephone charger, (5) a mauve fidget spinner, (6) a burgundy pen.\nThe stress ball is the second object on the list, namely (2). The object that is to the right of the stress ball corresponds to (3), which is a brown keychain.\nThe color of the keychain is brown. So the answer is (F). On the nightstand, you see the following items arranged in a row: a teal plate, a burgundy keychain, a yellow scrunchiephone charger, an orange mug, a pink notebook, and a grey cup. How many non-orange items do you see to the left of the teal item?\nOptions:\n(A) zero\n(B) one\n(C) two\n(D) three\n(E) four\n(F) five\n(G) six\nA: Let's think step by step.\nAccording to this question, the objects are arranged in a row, from left to right, as follows: (1) a teal plate, (2) a burgundy keychain, (3) a yellow scrunchiephone charger, (4) an orange mug, (5) a pink notebook, (6) a grey cup.\nThe teal plate is the first item, namely (1). There is no item to the left of the teal item.\nThe number of non-orange items to the left of the teal item is zero. So the answer is (A).Q: {{input}}\nA: Let's think step by step.\n"
"doc_to_text": "On the nightstand, there is a red pencil, a purple mug, a burgundy keychain, a fuchsia teddy bear, a black plate, and a blue stress ball. What color is the stress ball?\nOptions:\n(A) red\n(B) orange\n(C) yellow\n(D) green\n(E) blue\n(F) brown\n(G) magenta\n(H) fuchsia\n(I) mauve\n(J) teal\n(K) turquoise\n(L) burgundy\n(M) silver\n(N) gold\n(O) black\n(P) grey\n(Q) purple\n(R) pink\nA: Let's think step by step.\nAccording to this question, the color of the stress ball is blue. So the answer is (E). On the table, you see a bunch of objects arranged in a row: a purple paperclip, a pink stress ball, a brown keychain, a green scrunchiephone charger, a mauve fidget spinner, and a burgundy pen. What is the color of the object directly to the right of the stress ball?\nOptions:\n(A) red\n(B) orange\n(C) yellow\n(D) green\n(E) blue\n(F) brown\n(G) magenta\n(H) fuchsia\n(I) mauve\n(J) teal\n(K) turquoise\n(L) burgundy\n(M) silver\n(N) gold\n(O) black\n(P) grey\n(Q) purple\n(R) pink\nA: Let's think step by step.\nAccording to this question, the objects are arranged in a row, from left to right, as follows: (1) a purple paperclip, (2) a pink stress ball, (3) a brown keychain, (4) a green scrunchiephone charger, (5) a mauve fidget spinner, (6) a burgundy pen.\nThe stress ball is the second object on the list, namely (2). The object that is to the right of the stress ball corresponds to (3), which is a brown keychain.\nThe color of the keychain is brown. So the answer is (F). On the nightstand, you see the following items arranged in a row: a teal plate, a burgundy keychain, a yellow scrunchiephone charger, an orange mug, a pink notebook, and a grey cup. How many non-orange items do you see to the left of the teal item?\nOptions:\n(A) zero\n(B) one\n(C) two\n(D) three\n(E) four\n(F) five\n(G) six\nA: Let's think step by step.\nAccording to this question, the objects are arranged in a row, from left to right, as follows: (1) a teal plate, (2) a burgundy keychain, (3) a yellow scrunchiephone charger, (4) an orange mug, (5) a pink notebook, (6) a grey cup.\nThe teal plate is the first item, namely (1). There is no item to the left of the teal item.\nThe number of non-orange items to the left of the teal item is zero. So the answer is (A).Q: {{input}}\nA: Let's think step by step.\n"
"include": "_flan_cot_fewshot_template_yaml"
"include": "_cot_fewshot_template_yaml"
"task": "bbh_flan_cot_fewshot_reasoning_about_colored_objects"
"task": "bbh_cot_fewshot_reasoning_about_colored_objects"
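The positional reasoning_about_colored_objects exemplars index into an ordered row of (color, item) pairs. A minimal sketch with hypothetical helper names, assuming each row is listed left to right exactly as in the prompts:

```python
from typing import List, Tuple

Obj = Tuple[str, str]  # (color, item), listed left to right


def index_of_item(row: List[Obj], item: str) -> int:
    """Position of the first object whose item name matches `item`."""
    return next(i for i, (_, name) in enumerate(row) if name == item)


def color_right_of(row: List[Obj], item: str) -> str:
    """Color of the object directly to the right of `item`."""
    return row[index_of_item(row, item) + 1][0]


def count_left_excluding(row: List[Obj], anchor_color: str, excluded: str) -> int:
    """Count objects left of the first `anchor_color` object whose color is not `excluded`."""
    anchor = next(i for i, (color, _) in enumerate(row) if color == anchor_color)
    return sum(1 for color, _ in row[:anchor] if color != excluded)


table_row = [
    ("purple", "paperclip"), ("pink", "stress ball"), ("brown", "keychain"),
    ("green", "scrunchiephone charger"), ("mauve", "fidget spinner"),
    ("burgundy", "pen"),
]
nightstand_row = [
    ("teal", "plate"), ("burgundy", "keychain"),
    ("yellow", "scrunchiephone charger"), ("orange", "mug"),
    ("pink", "notebook"), ("grey", "cup"),
]

print(color_right_of(table_row, "stress ball"))                # brown
print(count_left_excluding(nightstand_row, "teal", "orange"))  # 0
```

Because the teal plate sits at index 0, the slice `row[:0]` is empty and the count is zero, matching answer (A).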