gaoqiong / lm-evaluation-harness
Commit 33dcbd49, authored Dec 13, 2023 by momotori
fix: fix bug in the "doc_to_text" of BBH_cot_fewshot
Parent: aed90773
Showing 7 changed files with 7 additions and 7 deletions
lm_eval/tasks/bbh/cot_fewshot/sports_understanding.yaml +1 -1
lm_eval/tasks/bbh/cot_fewshot/temporal_sequences.yaml +1 -1
lm_eval/tasks/bbh/cot_fewshot/tracking_shuffled_objects_five_objects.yaml +1 -1
lm_eval/tasks/bbh/cot_fewshot/tracking_shuffled_objects_seven_objects.yaml +1 -1
lm_eval/tasks/bbh/cot_fewshot/tracking_shuffled_objects_three_objects.yaml +1 -1
lm_eval/tasks/bbh/cot_fewshot/web_of_lies.yaml +1 -1
lm_eval/tasks/bbh/cot_fewshot/word_sorting.yaml +1 -1
lm_eval/tasks/bbh/cot_fewshot/sports_understanding.yaml @ 33dcbd49
  "dataset_name": "sports_understanding"
  "description": "Determine whether an artificially constructed sentence relating to sports is plausible or not.\n\n"
- "doc_to_text": "Is the following sentence plausible? \"Bam Adebayo scored a reverse layup in the Western Conference Finals.\"\nA: Let's think step by step. Bam Adebayo is an American basketball player. Scoring a reverse layup in the Western Conference Finals is part of the NBA Finals. So the answer is yes. Is the following sentence plausible? \"Santi Cazorla scored a touchdown.\"\nA: Let's think step by step. Santi Cazorla is a soccer player. Touchdown is part of American football and rugby. So the answer is no. Is the following sentence plausible? \"DeMar DeRozan was called for the goal tend.\"\nA: Let's think step by step. DeMar DeRozan is an American basketball player. Goal tending is part of basketball. So the answer is yes.Q: {{input}}\nA: Let's think step by step.\n"
+ "doc_to_text": "Q: Is the following sentence plausible? \"Bam Adebayo scored a reverse layup in the Western Conference Finals.\"\nA: Let's think step by step. Bam Adebayo is an American basketball player. Scoring a reverse layup in the Western Conference Finals is part of the NBA Finals. So the answer is yes.\n\nQ: Is the following sentence plausible? \"Santi Cazorla scored a touchdown.\"\nA: Let's think step by step. Santi Cazorla is a soccer player. Touchdown is part of American football and rugby. So the answer is no.\n\nQ: Is the following sentence plausible? \"DeMar DeRozan was called for the goal tend.\"\nA: Let's think step by step. DeMar DeRozan is an American basketball player. Goal tending is part of basketball. So the answer is yes.\n\nQ: {{input}}\nA: Let's think step by step.\n"
  "include": "_cot_fewshot_template_yaml"
  "task": "bbh_cot_fewshot_sports_understanding"
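The shape of the corrected `doc_to_text` can be illustrated with a minimal Python sketch: every worked example starts with `Q: `, examples are separated by a blank line, and the final block holds the `{{input}}` placeholder. The names `build_doc_to_text` and `render` are hypothetical helpers, not part of lm-evaluation-harness, and the plain `str.replace` is only a stand-in for whatever template rendering the harness actually performs.

```python
# Hypothetical sketch: helper names are illustrative, not harness APIs.
COT = "A: Let's think step by step."

# One worked example taken from the diff above: (question, reasoning).
FEWSHOT = [
    (
        'Is the following sentence plausible? "Santi Cazorla scored a touchdown."',
        "Santi Cazorla is a soccer player. Touchdown is part of American "
        "football and rugby. So the answer is no.",
    ),
]


def build_doc_to_text(examples):
    """Prefix each example with 'Q: ' and join all blocks with a blank line."""
    blocks = [f"Q: {question}\n{COT} {reasoning}" for question, reasoning in examples]
    # The final block carries the placeholder for the document under evaluation.
    blocks.append("Q: {{input}}\n" + COT + "\n")
    return "\n\n".join(blocks)


def render(template, doc):
    """Stand-in for template rendering: substitute the {{input}} placeholder."""
    return template.replace("{{input}}", doc["input"])


template = build_doc_to_text(FEWSHOT)
prompt = render(
    template,
    {"input": 'Is the following sentence plausible? "DeMar DeRozan was called for the goal tend."'},
)
print(prompt)
```

With this layout, splitting the rendered prompt on blank lines yields one block per question, which is the property the old strings lacked.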
lm_eval/tasks/bbh/cot_fewshot/temporal_sequences.yaml @ 33dcbd49
  "dataset_name": "temporal_sequences"
  "description": "Task description: Answer questions about which times certain events could have occurred.\n\n"
- "doc_to_text": "Today, Emily went to the museum. Between what times could they have gone?\nWe know that:\nEmily woke up at 1pm.\nElizabeth saw Emily reading at the library from 2pm to 4pm.\nJessica saw Emily watching a movie at the theater from 4pm to 5pm.\nLeslie saw Emily waiting at the airport from 5pm to 6pm.\nWilliam saw Emily buying clothes at the mall from 6pm to 7pm.\nThe museum was closed after 7pm.\nBetween what times could Emily have gone to the museum?\nOptions:\n(A) 1pm to 2pm\n(B) 6pm to 7pm\n(C) 5pm to 6pm\n(D) 2pm to 4pm\nA: Let's think step by step.\nWake-up time: 1pm.\n1pm-2pm: free.\n2pm-4pm: reading at the library.\n4pm-5pm: watching a movie at the theater.\n5pm-6pm: waiting at the airport.\n6pm-7pm: buying clothes at the mall.\nThe museum closure time: 7pm.\nThe only time when Emily could have gone to the museum was 1pm to 2pm. So the answer is (A). Today, Elizabeth went to the amusement park. Between what times could they have gone?\nWe know that:\nElizabeth woke up at 7am.\nDavid saw Elizabeth fixing their computer at the electronic store from 1pm to 2pm.\nSarah saw Elizabeth playing tennis at the tennis court from 2pm to 3pm.\nSusan saw Elizabeth walking towards the Statue of Liberty from 3pm to 6pm.\nAndrew saw Elizabeth taking photos near the Eiffel Tower from 6pm to 9pm.\nEmily saw Elizabeth getting a coffee at the cafe from 9pm to 10pm.\nThe amusement park was closed after 10pm.\nBetween what times could Elizabeth have gone to the amusement park?\nOptions:\n(A) 7am to 1pm\n(B) 9pm to 10pm\n(C) 1pm to 2pm\n(D) 3pm to 6pm\nA: Let's think step by step.\nWake-up time: 7am.\n7am-1pm: free.\n1pm-2pm: fixing their computer at the electronic store.\n2pm-3pm: playing tennis at the tennis court.\n3pm-6pm: walking towards the Statue of Liberty.\n6pm-9pm: taking photos near the Eiffel Tower.\n9pm-10pm: getting a coffee at the cafe.\nThe amusement park closure time: 10pm.\nThe only time when Elizabeth could have gone to the amusement park was 7am to 1pm. So the answer is (A). Today, Tiffany went to the beach. Between what times could they have gone?\nWe know that:\nTiffany woke up at 5am.\nBetty saw Tiffany getting a coffee at the cafe from 5am to 6am.\nJessica saw Tiffany working at the office from 6am to 9am.\nJohn saw Tiffany stretching at a yoga studio from 9am to 12pm.\nSean saw Tiffany sitting on a rooftop from 12pm to 2pm.\nSarah saw Tiffany playing tennis at the tennis court from 2pm to 3pm.\nThe beach was closed after 4pm.\nBetween what times could Tiffany have gone to the beach?\nOptions:\n(A) 9am to 12pm\n(B) 12pm to 2pm\n(C) 5am to 6am\n(D) 3pm to 4pm\nA: Let's think step by step.\nWake-up time: 5am.\n5am-6am: getting a coffee at the cafe.\n6am-9am: working at the office.\n9am-12pm: stretching at a yoga studio.\n12pm-2pm: sitting on a rooftop.\n2pm-3pm: playing tennis at the tennis court.\n3pm-4pm: free.\nThe beach closure time: 4pm.\nThe only time when Tiffany could have gone to the beach was 3pm to 4pm. So the answer is (D).Q: {{input}}\nA: Let's think step by step.\n"
+ "doc_to_text": "Q: Today, Emily went to the museum. Between what times could they have gone?\nWe know that:\nEmily woke up at 1pm.\nElizabeth saw Emily reading at the library from 2pm to 4pm.\nJessica saw Emily watching a movie at the theater from 4pm to 5pm.\nLeslie saw Emily waiting at the airport from 5pm to 6pm.\nWilliam saw Emily buying clothes at the mall from 6pm to 7pm.\nThe museum was closed after 7pm.\nBetween what times could Emily have gone to the museum?\nOptions:\n(A) 1pm to 2pm\n(B) 6pm to 7pm\n(C) 5pm to 6pm\n(D) 2pm to 4pm\nA: Let's think step by step.\nWake-up time: 1pm.\n1pm-2pm: free.\n2pm-4pm: reading at the library.\n4pm-5pm: watching a movie at the theater.\n5pm-6pm: waiting at the airport.\n6pm-7pm: buying clothes at the mall.\nThe museum closure time: 7pm.\nThe only time when Emily could have gone to the museum was 1pm to 2pm. So the answer is (A).\n\nQ: Today, Elizabeth went to the amusement park. Between what times could they have gone?\nWe know that:\nElizabeth woke up at 7am.\nDavid saw Elizabeth fixing their computer at the electronic store from 1pm to 2pm.\nSarah saw Elizabeth playing tennis at the tennis court from 2pm to 3pm.\nSusan saw Elizabeth walking towards the Statue of Liberty from 3pm to 6pm.\nAndrew saw Elizabeth taking photos near the Eiffel Tower from 6pm to 9pm.\nEmily saw Elizabeth getting a coffee at the cafe from 9pm to 10pm.\nThe amusement park was closed after 10pm.\nBetween what times could Elizabeth have gone to the amusement park?\nOptions:\n(A) 7am to 1pm\n(B) 9pm to 10pm\n(C) 1pm to 2pm\n(D) 3pm to 6pm\nA: Let's think step by step.\nWake-up time: 7am.\n7am-1pm: free.\n1pm-2pm: fixing their computer at the electronic store.\n2pm-3pm: playing tennis at the tennis court.\n3pm-6pm: walking towards the Statue of Liberty.\n6pm-9pm: taking photos near the Eiffel Tower.\n9pm-10pm: getting a coffee at the cafe.\nThe amusement park closure time: 10pm.\nThe only time when Elizabeth could have gone to the amusement park was 7am to 1pm. So the answer is (A).\n\nQ: Today, Tiffany went to the beach. Between what times could they have gone?\nWe know that:\nTiffany woke up at 5am.\nBetty saw Tiffany getting a coffee at the cafe from 5am to 6am.\nJessica saw Tiffany working at the office from 6am to 9am.\nJohn saw Tiffany stretching at a yoga studio from 9am to 12pm.\nSean saw Tiffany sitting on a rooftop from 12pm to 2pm.\nSarah saw Tiffany playing tennis at the tennis court from 2pm to 3pm.\nThe beach was closed after 4pm.\nBetween what times could Tiffany have gone to the beach?\nOptions:\n(A) 9am to 12pm\n(B) 12pm to 2pm\n(C) 5am to 6am\n(D) 3pm to 4pm\nA: Let's think step by step.\nWake-up time: 5am.\n5am-6am: getting a coffee at the cafe.\n6am-9am: working at the office.\n9am-12pm: stretching at a yoga studio.\n12pm-2pm: sitting on a rooftop.\n2pm-3pm: playing tennis at the tennis court.\n3pm-4pm: free.\nThe beach closure time: 4pm.\nThe only time when Tiffany could have gone to the beach was 3pm to 4pm. So the answer is (D).\n\nQ: {{input}}\nA: Let's think step by step.\n"
  "include": "_cot_fewshot_template_yaml"
  "task": "bbh_cot_fewshot_temporal_sequences"
lm_eval/tasks/bbh/cot_fewshot/tracking_shuffled_objects_five_objects.yaml @ 33dcbd49
  "dataset_name": "tracking_shuffled_objects_five_objects"
  "description": "A task requiring determining the final positions of a set of objects given their initial positions and a description of a sequence of swaps.\n\n"
- "doc_to_text": "Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a yellow ball, Bob has a blue ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Claire and Alice swap balls. Then, Alice and Bob swap balls. Finally, Claire and Bob swap balls. At the end of the game, Bob has the\nOptions:\n(A) yellow ball\n(B) blue ball\n(C) pink ball\nA: Let's think step by step.\n(0) At the start: Alice: yellow, Bob: blue, Claire: pink.\n(1) Claire and Alice swap balls: Alice: pink, Bob: blue, Claire: yellow.\n(2) Alice and Bob swap balls: Alice: blue, Bob: pink, Claire: yellow.\n(3) Claire and Bob swap balls: Alice: blue, Bob: yellow, Claire: pink.\nAt the end of the game, Bob has the yellow ball. So the answer is (A). Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a white ball, Bob has a purple ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Bob and Alice swap balls. Then, Bob and Claire swap balls. Finally, Bob and Alice swap balls. At the end of the game, Alice has the\nOptions:\n(A) white ball\n(B) purple ball\n(C) pink ball\nA: Let's think step by step.\n(0) At the start: Alice: white, Bob: purple, Claire: pink.\n(1) Bob and Alice swap balls: Alice: purple, Bob: white, Claire: pink.\n(2) Bob and Claire swap balls: Alice: purple, Bob: pink, Claire: white.\n(3) Bob and Alice swap balls: Alice: pink, Bob: purple, Claire: white.\nAt the end of the game, Alice has the pink ball. So the answer is (C). Alice, Bob, and Claire are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Lola, Bob is dancing with Rodrigo, and Claire is dancing with Patrick.\nThroughout the song, the dancers often trade partners. First, Alice and Bob switch partners. Then, Claire and Bob switch partners. Finally, Bob and Alice switch partners. At the end of the dance, Alice is dancing with\nOptions:\n(A) Lola\n(B) Rodrigo\n(C) Patrick\nA: Let's think step by step.\n(0) At the start: Alice: Lola, Bob: Rodrigo, Claire: Patrick.\n(1) Alice and Bob switch partners: Alice: Rodrigo, Bob: Lola, Claire: Patrick.\n(2) Claire and Bob switch partners: Alice: Rodrigo, Bob: Patrick, Claire: Lola.\n(3) Bob and Alice switch partners: Alice: Patrick, Bob: Rodrigo, Claire: Lola.\nAt the end of the dance, Alice is dancing with Patrick. So the answer is (C).Q: {{input}}\nA: Let's think step by step.\n"
+ "doc_to_text": "Q: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a yellow ball, Bob has a blue ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Claire and Alice swap balls. Then, Alice and Bob swap balls. Finally, Claire and Bob swap balls. At the end of the game, Bob has the\nOptions:\n(A) yellow ball\n(B) blue ball\n(C) pink ball\nA: Let's think step by step.\n(0) At the start: Alice: yellow, Bob: blue, Claire: pink.\n(1) Claire and Alice swap balls: Alice: pink, Bob: blue, Claire: yellow.\n(2) Alice and Bob swap balls: Alice: blue, Bob: pink, Claire: yellow.\n(3) Claire and Bob swap balls: Alice: blue, Bob: yellow, Claire: pink.\nAt the end of the game, Bob has the yellow ball. So the answer is (A).\n\nQ: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a white ball, Bob has a purple ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Bob and Alice swap balls. Then, Bob and Claire swap balls. Finally, Bob and Alice swap balls. At the end of the game, Alice has the\nOptions:\n(A) white ball\n(B) purple ball\n(C) pink ball\nA: Let's think step by step.\n(0) At the start: Alice: white, Bob: purple, Claire: pink.\n(1) Bob and Alice swap balls: Alice: purple, Bob: white, Claire: pink.\n(2) Bob and Claire swap balls: Alice: purple, Bob: pink, Claire: white.\n(3) Bob and Alice swap balls: Alice: pink, Bob: purple, Claire: white.\nAt the end of the game, Alice has the pink ball. So the answer is (C).\n\nQ: Alice, Bob, and Claire are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Lola, Bob is dancing with Rodrigo, and Claire is dancing with Patrick.\nThroughout the song, the dancers often trade partners. First, Alice and Bob switch partners. Then, Claire and Bob switch partners. Finally, Bob and Alice switch partners. At the end of the dance, Alice is dancing with\nOptions:\n(A) Lola\n(B) Rodrigo\n(C) Patrick\nA: Let's think step by step.\n(0) At the start: Alice: Lola, Bob: Rodrigo, Claire: Patrick.\n(1) Alice and Bob switch partners: Alice: Rodrigo, Bob: Lola, Claire: Patrick.\n(2) Claire and Bob switch partners: Alice: Rodrigo, Bob: Patrick, Claire: Lola.\n(3) Bob and Alice switch partners: Alice: Patrick, Bob: Rodrigo, Claire: Lola.\nAt the end of the dance, Alice is dancing with Patrick. So the answer is (C).\n\nQ: {{input}}\nA: Let's think step by step.\n"
  "include": "_cot_fewshot_template_yaml"
  "task": "bbh_cot_fewshot_tracking_shuffled_objects_five_objects"
lm_eval/tasks/bbh/cot_fewshot/tracking_shuffled_objects_seven_objects.yaml @ 33dcbd49
  "dataset_name": "tracking_shuffled_objects_seven_objects"
  "description": "A task requiring determining the final positions of a set of objects given their initial positions and a description of a sequence of swaps.\n\n"
- "doc_to_text": "Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a yellow ball, Bob has a blue ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Claire and Alice swap balls. Then, Alice and Bob swap balls. Finally, Claire and Bob swap balls. At the end of the game, Bob has the\nOptions:\n(A) yellow ball\n(B) blue ball\n(C) pink ball\nA: Let's think step by step.\n(0) At the start: Alice: yellow, Bob: blue, Claire: pink.\n(1) Claire and Alice swap balls: Alice: pink, Bob: blue, Claire: yellow.\n(2) Alice and Bob swap balls: Alice: blue, Bob: pink, Claire: yellow.\n(3) Claire and Bob swap balls: Alice: blue, Bob: yellow, Claire: pink.\nAt the end of the game, Bob has the yellow ball. So the answer is (A). Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a white ball, Bob has a purple ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Bob and Alice swap balls. Then, Bob and Claire swap balls. Finally, Bob and Alice swap balls. At the end of the game, Alice has the\nOptions:\n(A) white ball\n(B) purple ball\n(C) pink ball\nA: Let's think step by step.\n(0) At the start: Alice: white, Bob: purple, Claire: pink.\n(1) Bob and Alice swap balls: Alice: purple, Bob: white, Claire: pink.\n(2) Bob and Claire swap balls: Alice: purple, Bob: pink, Claire: white.\n(3) Bob and Alice swap balls: Alice: pink, Bob: purple, Claire: white.\nAt the end of the game, Alice has the pink ball. So the answer is (C). Alice, Bob, and Claire are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Lola, Bob is dancing with Rodrigo, and Claire is dancing with Patrick.\nThroughout the song, the dancers often trade partners. First, Alice and Bob switch partners. Then, Claire and Bob switch partners. Finally, Bob and Alice switch partners. At the end of the dance, Alice is dancing with\nOptions:\n(A) Lola\n(B) Rodrigo\n(C) Patrick\nA: Let's think step by step.\n(0) At the start: Alice: Lola, Bob: Rodrigo, Claire: Patrick.\n(1) Alice and Bob switch partners: Alice: Rodrigo, Bob: Lola, Claire: Patrick.\n(2) Claire and Bob switch partners: Alice: Rodrigo, Bob: Patrick, Claire: Lola.\n(3) Bob and Alice switch partners: Alice: Patrick, Bob: Rodrigo, Claire: Lola.\nAt the end of the dance, Alice is dancing with Patrick. So the answer is (C).Q: {{input}}\nA: Let's think step by step.\n"
+ "doc_to_text": "Q: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a yellow ball, Bob has a blue ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Claire and Alice swap balls. Then, Alice and Bob swap balls. Finally, Claire and Bob swap balls. At the end of the game, Bob has the\nOptions:\n(A) yellow ball\n(B) blue ball\n(C) pink ball\nA: Let's think step by step.\n(0) At the start: Alice: yellow, Bob: blue, Claire: pink.\n(1) Claire and Alice swap balls: Alice: pink, Bob: blue, Claire: yellow.\n(2) Alice and Bob swap balls: Alice: blue, Bob: pink, Claire: yellow.\n(3) Claire and Bob swap balls: Alice: blue, Bob: yellow, Claire: pink.\nAt the end of the game, Bob has the yellow ball. So the answer is (A).\n\nQ: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a white ball, Bob has a purple ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Bob and Alice swap balls. Then, Bob and Claire swap balls. Finally, Bob and Alice swap balls. At the end of the game, Alice has the\nOptions:\n(A) white ball\n(B) purple ball\n(C) pink ball\nA: Let's think step by step.\n(0) At the start: Alice: white, Bob: purple, Claire: pink.\n(1) Bob and Alice swap balls: Alice: purple, Bob: white, Claire: pink.\n(2) Bob and Claire swap balls: Alice: purple, Bob: pink, Claire: white.\n(3) Bob and Alice swap balls: Alice: pink, Bob: purple, Claire: white.\nAt the end of the game, Alice has the pink ball. So the answer is (C).\n\nQ: Alice, Bob, and Claire are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Lola, Bob is dancing with Rodrigo, and Claire is dancing with Patrick.\nThroughout the song, the dancers often trade partners. First, Alice and Bob switch partners. Then, Claire and Bob switch partners. Finally, Bob and Alice switch partners. At the end of the dance, Alice is dancing with\nOptions:\n(A) Lola\n(B) Rodrigo\n(C) Patrick\nA: Let's think step by step.\n(0) At the start: Alice: Lola, Bob: Rodrigo, Claire: Patrick.\n(1) Alice and Bob switch partners: Alice: Rodrigo, Bob: Lola, Claire: Patrick.\n(2) Claire and Bob switch partners: Alice: Rodrigo, Bob: Patrick, Claire: Lola.\n(3) Bob and Alice switch partners: Alice: Patrick, Bob: Rodrigo, Claire: Lola.\nAt the end of the dance, Alice is dancing with Patrick. So the answer is (C).\n\nQ: {{input}}\nA: Let's think step by step.\n"
  "include": "_cot_fewshot_template_yaml"
  "task": "bbh_cot_fewshot_tracking_shuffled_objects_seven_objects"
lm_eval/tasks/bbh/cot_fewshot/tracking_shuffled_objects_three_objects.yaml @ 33dcbd49
  "dataset_name": "tracking_shuffled_objects_three_objects"
  "description": "A task requiring determining the final positions of a set of objects given their initial positions and a description of a sequence of swaps.\n\n"
- "doc_to_text": "Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a yellow ball, Bob has a blue ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Claire and Alice swap balls. Then, Alice and Bob swap balls. Finally, Claire and Bob swap balls. At the end of the game, Bob has the\nOptions:\n(A) yellow ball\n(B) blue ball\n(C) pink ball\nA: Let's think step by step.\n(0) At the start: Alice: yellow, Bob: blue, Claire: pink.\n(1) Claire and Alice swap balls: Alice: pink, Bob: blue, Claire: yellow.\n(2) Alice and Bob swap balls: Alice: blue, Bob: pink, Claire: yellow.\n(3) Claire and Bob swap balls: Alice: blue, Bob: yellow, Claire: pink.\nAt the end of the game, Bob has the yellow ball. So the answer is (A). Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a white ball, Bob has a purple ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Bob and Alice swap balls. Then, Bob and Claire swap balls. Finally, Bob and Alice swap balls. At the end of the game, Alice has the\nOptions:\n(A) white ball\n(B) purple ball\n(C) pink ball\nA: Let's think step by step.\n(0) At the start: Alice: white, Bob: purple, Claire: pink.\n(1) Bob and Alice swap balls: Alice: purple, Bob: white, Claire: pink.\n(2) Bob and Claire swap balls: Alice: purple, Bob: pink, Claire: white.\n(3) Bob and Alice swap balls: Alice: pink, Bob: purple, Claire: white.\nAt the end of the game, Alice has the pink ball. So the answer is (C). Alice, Bob, and Claire are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Lola, Bob is dancing with Rodrigo, and Claire is dancing with Patrick.\nThroughout the song, the dancers often trade partners. First, Alice and Bob switch partners. Then, Claire and Bob switch partners. Finally, Bob and Alice switch partners. At the end of the dance, Alice is dancing with\nOptions:\n(A) Lola\n(B) Rodrigo\n(C) Patrick\nA: Let's think step by step.\n(0) At the start: Alice: Lola, Bob: Rodrigo, Claire: Patrick.\n(1) Alice and Bob switch partners: Alice: Rodrigo, Bob: Lola, Claire: Patrick.\n(2) Claire and Bob switch partners: Alice: Rodrigo, Bob: Patrick, Claire: Lola.\n(3) Bob and Alice switch partners: Alice: Patrick, Bob: Rodrigo, Claire: Lola.\nAt the end of the dance, Alice is dancing with Patrick. So the answer is (C).Q: {{input}}\nA: Let's think step by step.\n"
+ "doc_to_text": "Q: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a yellow ball, Bob has a blue ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Claire and Alice swap balls. Then, Alice and Bob swap balls. Finally, Claire and Bob swap balls. At the end of the game, Bob has the\nOptions:\n(A) yellow ball\n(B) blue ball\n(C) pink ball\nA: Let's think step by step.\n(0) At the start: Alice: yellow, Bob: blue, Claire: pink.\n(1) Claire and Alice swap balls: Alice: pink, Bob: blue, Claire: yellow.\n(2) Alice and Bob swap balls: Alice: blue, Bob: pink, Claire: yellow.\n(3) Claire and Bob swap balls: Alice: blue, Bob: yellow, Claire: pink.\nAt the end of the game, Bob has the yellow ball. So the answer is (A).\n\nQ: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a white ball, Bob has a purple ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Bob and Alice swap balls. Then, Bob and Claire swap balls. Finally, Bob and Alice swap balls. At the end of the game, Alice has the\nOptions:\n(A) white ball\n(B) purple ball\n(C) pink ball\nA: Let's think step by step.\n(0) At the start: Alice: white, Bob: purple, Claire: pink.\n(1) Bob and Alice swap balls: Alice: purple, Bob: white, Claire: pink.\n(2) Bob and Claire swap balls: Alice: purple, Bob: pink, Claire: white.\n(3) Bob and Alice swap balls: Alice: pink, Bob: purple, Claire: white.\nAt the end of the game, Alice has the pink ball. So the answer is (C).\n\nQ: Alice, Bob, and Claire are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Lola, Bob is dancing with Rodrigo, and Claire is dancing with Patrick.\nThroughout the song, the dancers often trade partners. First, Alice and Bob switch partners. Then, Claire and Bob switch partners. Finally, Bob and Alice switch partners. At the end of the dance, Alice is dancing with\nOptions:\n(A) Lola\n(B) Rodrigo\n(C) Patrick\nA: Let's think step by step.\n(0) At the start: Alice: Lola, Bob: Rodrigo, Claire: Patrick.\n(1) Alice and Bob switch partners: Alice: Rodrigo, Bob: Lola, Claire: Patrick.\n(2) Claire and Bob switch partners: Alice: Rodrigo, Bob: Patrick, Claire: Lola.\n(3) Bob and Alice switch partners: Alice: Patrick, Bob: Rodrigo, Claire: Lola.\nAt the end of the dance, Alice is dancing with Patrick. So the answer is (C).\n\nQ: {{input}}\nA: Let's think step by step.\n"
  "include": "_cot_fewshot_template_yaml"
  "task": "bbh_cot_fewshot_tracking_shuffled_objects_three_objects"
lm_eval/tasks/bbh/cot_fewshot/web_of_lies.yaml
View file @
33dcbd49
"
dataset_name"
:
"
web_of_lies"
"
dataset_name"
:
"
web_of_lies"
"
description"
:
"
Evaluate
a
random
boolean
function
expressed
as
a
word
problem.
\n\n
"
"
description"
:
"
Evaluate
a
random
boolean
function
expressed
as
a
word
problem.
\n\n
"
"
doc_to_text"
:
"
Question:
Fidel
tells
the
truth.
Jerry
says
Fidel
tells
the
truth.
Vina
says
Jerry
tells
the
truth.
Millicent
says
Vina
lies.
Raymond
says
Millicent
lies.
Does
Raymond
tell
the
truth?
\n
A:
Let's
think
step
by
step.
\n
(1)
Fidel
tells
the
truth.
So,
we
know
that
Fidel
tells
the
truth.
\n
(2)
Jerry
says
Fidel
tells
the
truth.
Since
we
know
from
(1)
that
Fidel
tells
the
truth,
if
Jerry
says
that
Fidel
tells
the
truth,
then
Jerry
tells
the
truth.
\n
(3)
Vina
says
Jerry
tells
the
truth.
Since
we
know
from
(2)
that
Jerry
tells
the
truth,
if
Vina
says
Jerry
tells
the
truth,
then
Vine
tells
the
truth.
\n
(4)
Millicent
says
Vina
lies.
Since
we
know
from
(3)
that
Vina
tells
the
truth,
if
Millicent
says
Vina
lies,
then
Millicent
lies.
\n
(5)
Raymond
says
Millicent
lies.
Since
we
know
from
(4)
that
Millicent
lies,
if
Raymond
says
Millicent
lies,
then
Raymond
tells
the
truth.
\n
Now,
the
question
asks:
Does
Raymond
tell
the
truth?
We
know
from
(5)
that
Raymond
tells
the
truth.
So
the
answer
is
Yes.
Question:
Kristian
lies.
Millie
says
Kristian
lies.
Maybelle
says
Millie
tells
the
truth.
Fidel
says
Maybelle
lies.
Leda
says
Fidel
lies.
Does
Leda
tell
the
truth?
\n
A:
Let's
think
step
by
step.
\n
(1)
Kristian
lies.
So,
we
know
that
Kristian
lies.
\n
(2)
Millie
says
Kristian
lies.
Since
we
know
from
(1)
that
Kristian
lies,
if
Millie
says
Kristian
lies,
then
Millie
tells
the
truth.
\n
(3)
Maybelle
says
Millie
tells
the
truth.
Since
we
know
from
(2)
that
Millie
tells
the
truth,
if
Maybelle
says
Millie
tells
the
truth,
then
Maybelle
tells
the
truth.
\n
(4)
Fidel
says
Maybelle
lies.
Since
we
know
from
(3)
that
Maybelle
tells
the
truth,
if
Fidel
says
Maybelle
lies,
then
Fidel
lies.
\n
(5)
Leda
says
Fidel
lies.
Since
we
know
from
(4)
that
Fidel
lies,
if
Leda
says
Fidel
lies,
then
Leda
tells
the
truth.
\n
Now,
the
question
asks:
Does
Leda
tell
the
truth?
We
know
from
(5)
that
Leda
tells
the
truth.
So
the
answer
is
Yes.
Question:
Kristian
tells
the
truth.
Michaela
says
Kristian
lies.
Raymond
says
Michaela
tells
the
truth.
Osvaldo
says
Raymond
tells
the
truth.
Jamey
says
Osvaldo
tells
the
truth.
Does
Jamey
tell
the
truth?
\n
A:
Let's
think
step
by
step.
\n
(1)
Kristian
tells
the
truth.
So,
we
know
that
Kristian
tells
the
truth.
\n
(2)
Michaela
says
Kristian
lies.
Since
we
know
from
(1)
that
Kristian
tells
the
truth,
if
Michaela
says
Kristian
lies,
then
Michaela
lies.
\n
(3)
Raymond
says
Michaela
tells
the
truth.
Since
we
know
from
(2)
that
Michaela
lies,
if
Raymond
says
Michaela
tells
the
truth,
then
Raymond
lies.
\n
(4)
Osvaldo
says
Raymond
tells
the
truth.
Since
we
know
from
(3)
that
Raymond
lies,
if
Osvaldo
says
Raymond
tells
the
truth,
then
Osvaldo
lies.
\n
(5)
Jamey
says
Osvaldo
tells
the
truth.
Since
we
know
from
(4)
that
Osvaldo
lies,
if
Jamey
says
Osvaldo
tells
the
truth,
then
Jamey
lies.
\n
Now,
the
question
asks:
Does
Jamey
tell
the
truth?
We
know
from
(5)
that
Jamey
lies.
So
the
answer
is
No.Q:
{{input}}
\n
A:
Let's
think
step
by
step.
\n
"
"
doc_to_text"
:
"
Q:
Question:
Fidel
tells
the
truth.
Jerry
says
Fidel
tells
the
truth.
Vina
says
Jerry
tells
the
truth.
Millicent
says
Vina
lies.
Raymond
says
Millicent
lies.
Does
Raymond
tell
the
truth?
\n
A:
Let's
think
step
by
step.
\n
(1)
Fidel
tells
the
truth.
So,
we
know
that
Fidel
tells
the
truth.
\n
(2)
Jerry
says
Fidel
tells
the
truth.
Since
we
know
from
(1)
that
Fidel
tells
the
truth,
if
Jerry
says
that
Fidel
tells
the
truth,
then
Jerry
tells
the
truth.
\n
(3)
Vina
says
Jerry
tells
the
truth.
Since
we
know
from
(2)
that
Jerry
tells
the
truth,
if
Vina
says
Jerry
tells
the
truth,
then
Vina
tells
the
truth.
\n
(4)
Millicent
says
Vina
lies.
Since
we
know
from
(3)
that
Vina
tells
the
truth,
if
Millicent
says
Vina
lies,
then
Millicent
lies.
\n
(5)
Raymond
says
Millicent
lies.
Since
we
know
from
(4)
that
Millicent
lies,
if
Raymond
says
Millicent
lies,
then
Raymond
tells
the
truth.
\n
Now,
the
question
asks:
Does
Raymond
tell
the
truth?
We
know
from
(5)
that
Raymond
tells
the
truth.
So
the
answer
is
Yes.
\n\n
Q:
Question:
Kristian
lies.
Millie
says
Kristian
lies.
Maybelle
says
Millie
tells
the
truth.
Fidel
says
Maybelle
lies.
Leda
says
Fidel
lies.
Does
Leda
tell
the
truth?
\n
A:
Let's
think
step
by
step.
\n
(1)
Kristian
lies.
So,
we
know
that
Kristian
lies.
\n
(2)
Millie
says
Kristian
lies.
Since
we
know
from
(1)
that
Kristian
lies,
if
Millie
says
Kristian
lies,
then
Millie
tells
the
truth.
\n
(3)
Maybelle
says
Millie
tells
the
truth.
Since
we
know
from
(2)
that
Millie
tells
the
truth,
if
Maybelle
says
Millie
tells
the
truth,
then
Maybelle
tells
the
truth.
\n
(4)
Fidel
says
Maybelle
lies.
Since
we
know
from
(3)
that
Maybelle
tells
the
truth,
if
Fidel
says
Maybelle
lies,
then
Fidel
lies.
\n
(5)
Leda
says
Fidel
lies.
Since
we
know
from
(4)
that
Fidel
lies,
if
Leda
says
Fidel
lies,
then
Leda
tells
the
truth.
\n
Now,
the
question
asks:
Does
Leda
tell
the
truth?
We
know
from
(5)
that
Leda
tells
the
truth.
So
the
answer
is
Yes.
\n\n
Q:
Question:
Kristian
tells
the
truth.
Michaela
says
Kristian
lies.
Raymond
says
Michaela
tells
the
truth.
Osvaldo
says
Raymond
tells
the
truth.
Jamey
says
Osvaldo
tells
the
truth.
Does
Jamey
tell
the
truth?
\n
A:
Let's
think
step
by
step.
\n
(1)
Kristian
tells
the
truth.
So,
we
know
that
Kristian
tells
the
truth.
\n
(2)
Michaela
says
Kristian
lies.
Since
we
know
from
(1)
that
Kristian
tells
the
truth,
if
Michaela
says
Kristian
lies,
then
Michaela
lies.
\n
(3)
Raymond
says
Michaela
tells
the
truth.
Since
we
know
from
(2)
that
Michaela
lies,
if
Raymond
says
Michaela
tells
the
truth,
then
Raymond
lies.
\n
(4)
Osvaldo
says
Raymond
tells
the
truth.
Since
we
know
from
(3)
that
Raymond
lies,
if
Osvaldo
says
Raymond
tells
the
truth,
then
Osvaldo
lies.
\n
(5)
Jamey
says
Osvaldo
tells
the
truth.
Since
we
know
from
(4)
that
Osvaldo
lies,
if
Jamey
says
Osvaldo
tells
the
truth,
then
Jamey
lies.
\n
Now,
the
question
asks:
Does
Jamey
tell
the
truth?
We
know
from
(5)
that
Jamey
lies.
So
the
answer
is
No.
\n\n
Q:
{{input}}
\n
A:
Let's
think
step
by
step.
\n
"
"
include"
:
"
_cot_fewshot_template_yaml"
"
include"
:
"
_cot_fewshot_template_yaml"
"
task"
:
"
bbh_cot_fewshot_web_of_lies"
"
task"
:
"
bbh_cot_fewshot_web_of_lies"
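The few-shot answers in this prompt all follow the same chain: the first sentence fixes one person's truthfulness, and each later "X says Y ..." sentence is resolved from the previous step. A minimal Python sketch of that chain follows. It is not part of this commit or of lm-evaluation-harness; the function name `solve_web_of_lies` and the regular expressions are hypothetical, and it assumes questions are phrased exactly like the examples above.

```python
import re

# Hypothetical patterns matching the sentence shapes used in the examples above.
FACT = re.compile(r"^(\w+) (tells the truth|lies)$")
CLAIM = re.compile(r"^(\w+) says (\w+) (tells the truth|lies)$")
ASK = re.compile(r"^Does (\w+) tell the truth\?$")


def solve_web_of_lies(question: str) -> str:
    """Resolve a chain of statements and return "Yes" or "No"."""
    truthful = {}  # name -> True if that person tells the truth
    # Split after each "." or "?" so every sentence is handled separately.
    for sentence in re.split(r"(?<=[.?])\s+", question.strip()):
        ask = ASK.match(sentence)
        if ask:
            return "Yes" if truthful[ask.group(1)] else "No"
        body = sentence.rstrip(".")
        claim = CLAIM.match(body)
        fact = FACT.match(body)
        if claim:
            speaker, subject, verdict = claim.groups()
            # The speaker is truthful exactly when the claim matches reality.
            truthful[speaker] = truthful[subject] == (verdict == "tells the truth")
        elif fact:
            truthful[fact.group(1)] = fact.group(2) == "tells the truth"
    raise ValueError("question did not contain a 'Does X tell the truth?' sentence")


example = (
    "Fidel tells the truth. Jerry says Fidel tells the truth. "
    "Vina says Jerry tells the truth. Millicent says Vina lies. "
    "Raymond says Millicent lies. Does Raymond tell the truth?"
)
print(solve_web_of_lies(example))
```

Each statement depends only on the one before it, so a single left-to-right pass is enough for questions of this shape.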
lm_eval/tasks/bbh/cot_fewshot/word_sorting.yaml
"dataset_name": "word_sorting"
"description": "Sort a list of words.\n\n"
"
doc_to_text"
:
"
Sort
the
following
words
alphabetically:
List:
oven
costume
counterpart
\n
A:
Let's
think
step
by
step.
\n
The
first
letter:
\"
oven
\"
:
\"
o
\"
(15).
\"
costume
\"
:
\"
c
\"
(3).
\"
counterpart
\"
:
\"
c
\"
(3).
We
now
have:
(3)
[
\"
costume
\"
?
\"
counterpart
\"
]
<
(15)
\"
oven
\"
.
Now
let's
sort
this
subpart
[
\"
costume
\"
?
\"
counterpart
\"
]
by
looking
at
their
second
letters.
\n
The
second
letter:
\"
costume
\"
:
\"
o
\"
(15).
\"
counterpart
\"
:
\"
o
\"
(15).
We
now
have:
(15)
[
\"
costume
\"
?
\"
counterpart
\"
].
Now
let's
sort
this
subpart
[
\"
costume
\"
?
\"
counterpart
\"
]
by
looking
at
their
third
letters.
\n
The
third
letter:
\"
costume
\"
:
\"
s
\"
(19).
\"
counterpart
\"
:
\"
u
\"
(21).
We
now
have:
(19)
\"
costume
\"
<
(21)
\"
counterpart
\"
.
Hence,
we
have
[
\"
costume
\"
<
\"
counterpart
\"
]
<
\"
oven
\"
.
So
the
answer
is
costume
counterpart
oven.
Sort
the
following
words
alphabetically:
List:
hypochlorite
ponderosa
phone
credulity
\n
A:
Let's
think
step
by
step.
\n
The
first
letter:
\"
hypochlorite
\"
:
\"
h
\"
(8).
\"
ponderosa
\"
:
\"
p
\"
(16).
\"
phone
\"
:
\"
p
\"
(16).
\"
credulity
\"
:
\"
c
\"
(3).
We
now
have:
(3)
\"
credulity
\"
<
(8)
\"
hypochlorite
\"
<
(16)
[
\"
ponderosa
\"
?
\"
phone
\"
].
Now
let's
sort
this
subpart
[
\"
ponderosa
\"
?
\"
phone
\"
]
by
looking
at
their
second
letters.
\n
The
second
letter:
\"
ponderosa
\"
:
\"
o
\"
(15).
\"
phone
\"
:
\"
h
\"
(8).
We
now
have:
(8)
\"
phone
\"
<
(15)
\"
ponderosa
\"
.
Hence,
we
have
\"
credulity
\"
<
\"
hypochlorite
\"
<
[
\"
phone
\"
<
\"
ponderosa
\"
].
So
the
answer
is
credulity
hypochlorite
phone
ponderosa.
Sort
the
following
words
alphabetically:
List:
newt
arson
parthia
seismography
mugho
aspect
census
\n
A:
Let's
think
step
by
step.
\n
The
first
letter:
\"
newt
\"
:
\"
n
\"
(14).
\"
arson
\"
:
\"
a
\"
(1).
\"
parthia
\"
:
\"
p
\"
(16).
\"
seismography
\"
:
\"
s
\"
(19).
\"
mugho
\"
:
\"
m
\"
(13).
\"
aspect
\"
:
\"
a
\"
(1).
\"
census
\"
:
\"
c
\"
(3).
We
now
have:
(1)
[
\"
arson
\"
?
\"
aspect
\"
]
<
(3)
\"
census
\"
<
(13)
\"
mugho
\"
<
(14)
\"
newt
\"
<
(16)
\"
parthia
\"
<
(19)
\"
seismography
\"
.
Now
let's
sort
this
subpart
[
\"
arson
\"
?
\"
aspect
\"
]
by
looking
at
their
second
letters.
\n
The
second
letter:
\"
arson
\"
:
\"
r
\"
(18).
\"
aspect
\"
:
\"
s
\"
(19).
We
now
have:
(18)
\"
arson
\"
<
(19)
\"
aspect
\"
.
Hence,
we
have
[
\"
arson
\"
<
\"
aspect
\"
]
<
\"
census
\"
<
\"
mugho
\"
<
\"
newt
\"
<
\"
parthia
\"
<
\"
seismography
\"
.
So
the
answer
is
arson
aspect
census
mugho
newt
parthia
seismography.Q:
{{input}}
\n
A:
Let's
think
step
by
step.
\n
"
"
doc_to_text"
:
"
Q:
Sort
the
following
words
alphabetically:
List:
oven
costume
counterpart
\n
A:
Let's
think
step
by
step.
\n
The
first
letter:
\"
oven
\"
:
\"
o
\"
(15).
\"
costume
\"
:
\"
c
\"
(3).
\"
counterpart
\"
:
\"
c
\"
(3).
We
now
have:
(3)
[
\"
costume
\"
?
\"
counterpart
\"
]
<
(15)
\"
oven
\"
.
Now
let's
sort
this
subpart
[
\"
costume
\"
?
\"
counterpart
\"
]
by
looking
at
their
second
letters.
\n
The
second
letter:
\"
costume
\"
:
\"
o
\"
(15).
\"
counterpart
\"
:
\"
o
\"
(15).
We
now
have:
(15)
[
\"
costume
\"
?
\"
counterpart
\"
].
Now
let's
sort
this
subpart
[
\"
costume
\"
?
\"
counterpart
\"
]
by
looking
at
their
third
letters.
\n
The
third
letter:
\"
costume
\"
:
\"
s
\"
(19).
\"
counterpart
\"
:
\"
u
\"
(21).
We
now
have:
(19)
\"
costume
\"
<
(21)
\"
counterpart
\"
.
Hence,
we
have
[
\"
costume
\"
<
\"
counterpart
\"
]
<
\"
oven
\"
.
So
the
answer
is
costume
counterpart
oven.
\n\n
Q:
Sort
the
following
words
alphabetically:
List:
hypochlorite
ponderosa
phone
credulity
\n
A:
Let's
think
step
by
step.
\n
The
first
letter:
\"
hypochlorite
\"
:
\"
h
\"
(8).
\"
ponderosa
\"
:
\"
p
\"
(16).
\"
phone
\"
:
\"
p
\"
(16).
\"
credulity
\"
:
\"
c
\"
(3).
We
now
have:
(3)
\"
credulity
\"
<
(8)
\"
hypochlorite
\"
<
(16)
[
\"
ponderosa
\"
?
\"
phone
\"
].
Now
let's
sort
this
subpart
[
\"
ponderosa
\"
?
\"
phone
\"
]
by
looking
at
their
second
letters.
\n
The
second
letter:
\"
ponderosa
\"
:
\"
o
\"
(15).
\"
phone
\"
:
\"
h
\"
(8).
We
now
have:
(8)
\"
phone
\"
<
(15)
\"
ponderosa
\"
.
Hence,
we
have
\"
credulity
\"
<
\"
hypochlorite
\"
<
[
\"
phone
\"
<
\"
ponderosa
\"
].
So
the
answer
is
credulity
hypochlorite
phone
ponderosa.
\n\n
Q:
Sort
the
following
words
alphabetically:
List:
newt
arson
parthia
seismography
mugho
aspect
census
\n
A:
Let's
think
step
by
step.
\n
The
first
letter:
\"
newt
\"
:
\"
n
\"
(14).
\"
arson
\"
:
\"
a
\"
(1).
\"
parthia
\"
:
\"
p
\"
(16).
\"
seismography
\"
:
\"
s
\"
(19).
\"
mugho
\"
:
\"
m
\"
(13).
\"
aspect
\"
:
\"
a
\"
(1).
\"
census
\"
:
\"
c
\"
(3).
We
now
have:
(1)
[
\"
arson
\"
?
\"
aspect
\"
]
<
(3)
\"
census
\"
<
(13)
\"
mugho
\"
<
(14)
\"
newt
\"
<
(16)
\"
parthia
\"
<
(19)
\"
seismography
\"
.
Now
let's
sort
this
subpart
[
\"
arson
\"
?
\"
aspect
\"
]
by
looking
at
their
second
letters.
\n
The
second
letter:
\"
arson
\"
:
\"
r
\"
(18).
\"
aspect
\"
:
\"
s
\"
(19).
We
now
have:
(18)
\"
arson
\"
<
(19)
\"
aspect
\"
.
Hence,
we
have
[
\"
arson
\"
<
\"
aspect
\"
]
<
\"
census
\"
<
\"
mugho
\"
<
\"
newt
\"
<
\"
parthia
\"
<
\"
seismography
\"
.
So
the
answer
is
arson
aspect
census
mugho
newt
parthia
seismography.
\n\n
Q:
{{input}}
\n
A:
Let's
think
step
by
step.
\n
"
"
include"
:
"
_cot_fewshot_template_yaml"
"
include"
:
"
_cot_fewshot_template_yaml"
"
task"
:
"
bbh_cot_fewshot_word_sorting"
"
task"
:
"
bbh_cot_fewshot_word_sorting"
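The worked answers in this prompt sort by mapping each letter to its position in the alphabet (for example "c" to 3, "o" to 15) and comparing words one letter at a time, moving to the next letter only when there is a tie. A minimal Python sketch of that procedure follows. It is not part of this commit; the helper names `letter_position` and `sort_words` are hypothetical, and it assumes the words contain only the letters a to z.

```python
def letter_position(ch: str) -> int:
    """Return the 1-based alphabet position of a letter, e.g. "c" -> 3."""
    return ord(ch.lower()) - ord("a") + 1


def sort_words(words: list[str]) -> list[str]:
    """Sort words by comparing letter positions from left to right."""
    # Python compares lists element by element, so ties on the first letter
    # fall through to the second letter, then the third, and so on.
    return sorted(words, key=lambda word: [letter_position(c) for c in word])


print(" ".join(sort_words(["oven", "costume", "counterpart"])))
```

Comparing the per-letter position lists reproduces the "first letter, then second letter, then third letter" steps written out in the few-shot answers.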