gaoqiong / lm-evaluation-harness
Commit da211969 (Unverified)
Authored Jun 28, 2024 by Jess; committed by GitHub Jun 28, 2024
Merge branch 'EleutherAI:main' into main
Parents: 1b97e487, 801322e0
Changes: 654
Showing 20 changed files with 1015 additions and 97 deletions
lm_eval/tasks/bbh/cot_fewshot/tracking_shuffled_objects_three_objects.yaml  +94 -5
lm_eval/tasks/bbh/cot_fewshot/web_of_lies.yaml  +79 -5
lm_eval/tasks/bbh/cot_fewshot/word_sorting.yaml  +55 -5
lm_eval/tasks/bbh/fewshot/_fewshot_template_yaml  +1 -2
lm_eval/tasks/bbh/fewshot/boolean_expressions.yaml  +19 -5
lm_eval/tasks/bbh/fewshot/causal_judgement.yaml  +67 -5
lm_eval/tasks/bbh/fewshot/date_understanding.yaml  +60 -5
lm_eval/tasks/bbh/fewshot/disambiguation_qa.yaml  +53 -5
lm_eval/tasks/bbh/fewshot/dyck_languages.yaml  +23 -5
lm_eval/tasks/bbh/fewshot/formal_fallacies.yaml  +60 -5
lm_eval/tasks/bbh/fewshot/geometric_shapes.yaml  +89 -5
lm_eval/tasks/bbh/fewshot/hyperbaton.yaml  +37 -5
lm_eval/tasks/bbh/fewshot/logical_deduction_five_objects.yaml  +55 -5
lm_eval/tasks/bbh/fewshot/logical_deduction_seven_objects.yaml  +55 -5
lm_eval/tasks/bbh/fewshot/logical_deduction_three_objects.yaml  +55 -5
lm_eval/tasks/bbh/fewshot/movie_recommendation.yaml  +57 -5
lm_eval/tasks/bbh/fewshot/multistep_arithmetic_two.yaml  +19 -5
lm_eval/tasks/bbh/fewshot/navigate.yaml  +43 -5
lm_eval/tasks/bbh/fewshot/object_counting.yaml  +23 -5
lm_eval/tasks/bbh/fewshot/penguins_in_a_table.yaml  +71 -5
lm_eval/tasks/bbh/cot_fewshot/tracking_shuffled_objects_three_objects.yaml
View file @
da211969
"
dataset_name"
:
"
tracking_shuffled_objects_three_objects"
"
description"
:
"
A
task
requiring
determining
the
final
positions
of
a
set
of
objects
given
their
initial
positions
and
a
description
of
a
sequence
of
swaps.
\n\n
"
"
doc_to_text"
:
"
Q:
Alice,
Bob,
and
Claire
are
playing
a
game.
At
the
start
of
the
game,
they
are
each
holding
a
ball:
Alice
has
a
yellow
ball,
Bob
has
a
blue
ball,
and
Claire
has
a
pink
ball.
\n
As
the
game
progresses,
pairs
of
players
trade
balls.
First,
Claire
and
Alice
swap
balls.
Then,
Alice
and
Bob
swap
balls.
Finally,
Claire
and
Bob
swap
balls.
At
the
end
of
the
game,
Bob
has
the
\n
Options:
\n
(A)
yellow
ball
\n
(B)
blue
ball
\n
(C)
pink
ball
\n
A:
Let's
think
step
by
step.
\n
(0)
At
the
start:
Alice:
yellow,
Bob:
blue,
Claire:
pink.
\n
(1)
Claire
and
Alice
swap
balls:
Alice:
pink,
Bob:
blue,
Claire:
yellow.
\n
(2)
Alice
and
Bob
swap
balls:
Alice:
blue,
Bob:
pink,
Claire:
yellow.
\n
(3)
Claire
and
Bob
swap
balls:
Alice:
blue,
Bob:
yellow,
Claire:
pink.
\n
At
the
end
of
the
game,
Bob
has
the
yellow
ball.
So
the
answer
is
(A).
\n\n
Q:
Alice,
Bob,
and
Claire
are
playing
a
game.
At
the
start
of
the
game,
they
are
each
holding
a
ball:
Alice
has
a
white
ball,
Bob
has
a
purple
ball,
and
Claire
has
a
pink
ball.
\n
As
the
game
progresses,
pairs
of
players
trade
balls.
First,
Bob
and
Alice
swap
balls.
Then,
Bob
and
Claire
swap
balls.
Finally,
Bob
and
Alice
swap
balls.
At
the
end
of
the
game,
Alice
has
the
\n
Options:
\n
(A)
white
ball
\n
(B)
purple
ball
\n
(C)
pink
ball
\n
A:
Let's
think
step
by
step.
\n
(0)
At
the
start:
Alice:
white,
Bob:
purple,
Claire:
pink.
\n
(1)
Bob
and
Alice
swap
balls:
Alice:
purple,
Bob:
white,
Claire:
pink.
\n
(2)
Bob
and
Claire
swap
balls:
Alice:
purple,
Bob:
pink,
Claire:
white.
\n
(3)
Bob
and
Alice
swap
balls:
Alice:
pink,
Bob:
purple,
Claire:
white.
\n
At
the
end
of
the
game,
Alice
has
the
pink
ball.
So
the
answer
is
(C).
\n\n
Q:
Alice,
Bob,
and
Claire
are
dancers
at
a
square
dance.
At
the
start
of
a
song,
they
each
have
a
partner:
Alice
is
dancing
with
Lola,
Bob
is
dancing
with
Rodrigo,
and
Claire
is
dancing
with
Patrick.
\n
Throughout
the
song,
the
dancers
often
trade
partners.
First,
Alice
and
Bob
switch
partners.
Then,
Claire
and
Bob
switch
partners.
Finally,
Bob
and
Alice
switch
partners.
At
the
end
of
the
dance,
Alice
is
dancing
with
\n
Options:
\n
(A)
Lola
\n
(B)
Rodrigo
\n
(C)
Patrick
\n
A:
Let's
think
step
by
step.
\n
(0)
At
the
start:
Alice:
Lola,
Bob:
Rodrigo,
Claire:
Patrick.
\n
(1)
Alice
and
Bob
switch
partners:
Alice:
Rodrigo,
Bob:
Lola,
Claire:
Patrick.
\n
(2)
Claire
and
Bob
switch
partners:
Alice:
Rodrigo,
Bob:
Patrick,
Claire:
Lola.
\n
(3)
Bob
and
Alice
switch
partners:
Alice:
Patrick,
Bob:
Rodrigo,
Claire:
Lola.
\n
At
the
end
of
the
dance,
Alice
is
dancing
with
Patrick.
So
the
answer
is
(C).
\n\n
Q:
{{input}}
\n
A:
Let's
think
step
by
step.
\n
"
"
include"
:
"
_cot_fewshot_template_yaml"
"
task"
:
"
bbh_cot_fewshot_tracking_shuffled_objects_three_objects"
dataset_name
:
tracking_shuffled_objects_three_objects
description
:
'
A
task
requiring
determining
the
final
positions
of
a
set
of
objects
given
their
initial
positions
and
a
description
of
a
sequence
of
swaps.
'
doc_to_text
:
'
Q:
{{input}}
A:
Let'
'
s
think
step
by
step.
'
fewshot_config
:
sampler
:
first_n
samples
:
-
input
:
'
Alice,
Bob,
and
Claire
are
playing
a
game.
At
the
start
of
the
game,
they
are
each
holding
a
ball:
Alice
has
a
yellow
ball,
Bob
has
a
blue
ball,
and
Claire
has
a
pink
ball.
As
the
game
progresses,
pairs
of
players
trade
balls.
First,
Claire
and
Alice
swap
balls.
Then,
Alice
and
Bob
swap
balls.
Finally,
Claire
and
Bob
swap
balls.
At
the
end
of
the
game,
Bob
has
the
Options:
(A)
yellow
ball
(B)
blue
ball
(C)
pink
ball'
target
:
'
Let'
'
s
think
step
by
step.
(0)
At
the
start:
Alice:
yellow,
Bob:
blue,
Claire:
pink.
(1)
Claire
and
Alice
swap
balls:
Alice:
pink,
Bob:
blue,
Claire:
yellow.
(2)
Alice
and
Bob
swap
balls:
Alice:
blue,
Bob:
pink,
Claire:
yellow.
(3)
Claire
and
Bob
swap
balls:
Alice:
blue,
Bob:
yellow,
Claire:
pink.
At
the
end
of
the
game,
Bob
has
the
yellow
ball.
So
the
answer
is
(A).'
-
input
:
'
Alice,
Bob,
and
Claire
are
playing
a
game.
At
the
start
of
the
game,
they
are
each
holding
a
ball:
Alice
has
a
white
ball,
Bob
has
a
purple
ball,
and
Claire
has
a
pink
ball.
As
the
game
progresses,
pairs
of
players
trade
balls.
First,
Bob
and
Alice
swap
balls.
Then,
Bob
and
Claire
swap
balls.
Finally,
Bob
and
Alice
swap
balls.
At
the
end
of
the
game,
Alice
has
the
Options:
(A)
white
ball
(B)
purple
ball
(C)
pink
ball'
target
:
'
Let'
'
s
think
step
by
step.
(0)
At
the
start:
Alice:
white,
Bob:
purple,
Claire:
pink.
(1)
Bob
and
Alice
swap
balls:
Alice:
purple,
Bob:
white,
Claire:
pink.
(2)
Bob
and
Claire
swap
balls:
Alice:
purple,
Bob:
pink,
Claire:
white.
(3)
Bob
and
Alice
swap
balls:
Alice:
pink,
Bob:
purple,
Claire:
white.
At
the
end
of
the
game,
Alice
has
the
pink
ball.
So
the
answer
is
(C).'
-
input
:
'
Alice,
Bob,
and
Claire
are
dancers
at
a
square
dance.
At
the
start
of
a
song,
they
each
have
a
partner:
Alice
is
dancing
with
Lola,
Bob
is
dancing
with
Rodrigo,
and
Claire
is
dancing
with
Patrick.
Throughout
the
song,
the
dancers
often
trade
partners.
First,
Alice
and
Bob
switch
partners.
Then,
Claire
and
Bob
switch
partners.
Finally,
Bob
and
Alice
switch
partners.
At
the
end
of
the
dance,
Alice
is
dancing
with
Options:
(A)
Lola
(B)
Rodrigo
(C)
Patrick'
target
:
'
Let'
'
s
think
step
by
step.
(0)
At
the
start:
Alice:
Lola,
Bob:
Rodrigo,
Claire:
Patrick.
(1)
Alice
and
Bob
switch
partners:
Alice:
Rodrigo,
Bob:
Lola,
Claire:
Patrick.
(2)
Claire
and
Bob
switch
partners:
Alice:
Rodrigo,
Bob:
Patrick,
Claire:
Lola.
(3)
Bob
and
Alice
switch
partners:
Alice:
Patrick,
Bob:
Rodrigo,
Claire:
Lola.
At
the
end
of
the
dance,
Alice
is
dancing
with
Patrick.
So
the
answer
is
(C).'
include
:
_cot_fewshot_template_yaml
task
:
bbh_cot_fewshot_tracking_shuffled_objects_three_objects
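The step-by-step targets in this file can be checked mechanically. The following is a minimal sketch, not part of the commit: `track_swaps` is a hypothetical helper that applies each pairwise swap in order, mirroring steps (0) to (3) of the few-shot targets.

```python
def track_swaps(initial, swaps):
    """Apply pairwise swaps in order; return the final holder -> item mapping."""
    state = dict(initial)  # copy so the caller's mapping is not mutated
    for a, b in swaps:
        state[a], state[b] = state[b], state[a]
    return state


# First few-shot sample: which ball does Bob hold at the end?
balls = track_swaps(
    {"Alice": "yellow", "Bob": "blue", "Claire": "pink"},
    [("Claire", "Alice"), ("Alice", "Bob"), ("Claire", "Bob")],
)
print(balls["Bob"])  # yellow, i.e. option (A)
```

The same helper reproduces the other two samples when given their initial assignments and swap sequences.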
lm_eval/tasks/bbh/cot_fewshot/web_of_lies.yaml
View file @ da211969
-"dataset_name": "web_of_lies"
-"description": "Evaluate a random boolean function expressed as a word problem.\n\n"
-"doc_to_text": "Q: Question: Fidel tells the truth. Jerry says Fidel tells the truth. Vina says Jerry tells the truth. Millicent says Vina lies. Raymond says Millicent lies. Does Raymond tell the truth?\nA: Let's think step by step.\n(1) Fidel tells the truth. So, we know that Fidel tells the truth.\n(2) Jerry says Fidel tells the truth. Since we know from (1) that Fidel tells the truth, if Jerry says that Fidel tells the truth, then Jerry tells the truth.\n(3) Vina says Jerry tells the truth. Since we know from (2) that Jerry tells the truth, if Vina says Jerry tells the truth, then Vine tells the truth.\n(4) Millicent says Vina lies. Since we know from (3) that Vina tells the truth, if Millicent says Vina lies, then Millicent lies.\n(5) Raymond says Millicent lies. Since we know from (4) that Millicent lies, if Raymond says Millicent lies, then Raymond tells the truth.\nNow, the question asks: Does Raymond tell the truth? We know from (5) that Raymond tells the truth. So the answer is Yes.\n\nQ: Question: Kristian lies. Millie says Kristian lies. Maybelle says Millie tells the truth. Fidel says Maybelle lies. Leda says Fidel lies. Does Leda tell the truth?\nA: Let's think step by step.\n(1) Kristian lies. So, we know that Kristian lies.\n(2) Millie says Kristian lies. Since we know from (1) that Kristian lies, if Millie says Kristian lies, then Millie tells the truth.\n(3) Maybelle says Millie tells the truth. Since we know from (2) that Millie tells the truth, if Maybelle says Millie tells the truth, then Maybelle tells the truth.\n(4) Fidel says Maybelle lies. Since we know from (3) that Maybelle tells the truth, if Fidel says Maybelle lies, then Fidel lies.\n(5) Leda says Fidel lies. Since we know from (4) that Fidel lies, if Leda says Fidel lies, then Leda tells the truth.\nNow, the question asks: Does Leda tell the truth? We know from (5) that Leda tells the truth. So the answer is Yes.\n\nQ: Question: Kristian tells the truth. Michaela says Kristian lies. Raymond says Michaela tells the truth. Osvaldo says Raymond tells the truth. Jamey says Osvaldo tells the truth. Does Jamey tell the truth?\nA: Let's think step by step.\n(1) Kristian tells the truth. So, we know that Kristian tells the truth.\n(2) Michaela says Kristian lies. Since we know from (1) that Kristian tells the truth, if Michaela says Kristian lies, then Michaela lies.\n(3) Raymond says Michaela tells the truth. Since we know from (2) that Michaela lies, if Raymond says Michaela tells the truth, then Raymond lies.\n(4) Osvaldo says Raymond tells the truth. Since we know from (3) that Raymond lies, if Osvaldo says Raymond tells the truth, then Osvaldo lies.\n(5) Jamey says Osvaldo tells the truth. Since we know from (4) that Osvaldo lies, if Jamey says Osvaldo tells the truth, then Jamey lies.\nNow, the question asks: Does Jamey tell the truth? We know from (5) that Jamey lies. So the answer is No.\n\nQ: {{input}}\nA: Let's think step by step.\n"
-"include": "_cot_fewshot_template_yaml"
-"task": "bbh_cot_fewshot_web_of_lies"
+dataset_name: web_of_lies
+description: 'Evaluate a random boolean function expressed as a word problem.
+
+
+  '
+doc_to_text: 'Q: {{input}}
+
+  A: Let''s think step by step.
+
+  '
+fewshot_config:
+  sampler: first_n
+  samples:
+  - input: 'Question: Fidel tells the truth. Jerry says Fidel tells the truth. Vina says Jerry tells the truth. Millicent says Vina lies. Raymond says Millicent lies. Does Raymond tell the truth?'
+    target: 'Let''s think step by step.
+
+      (1) Fidel tells the truth. So, we know that Fidel tells the truth.
+
+      (2) Jerry says Fidel tells the truth. Since we know from (1) that Fidel tells the truth, if Jerry says that Fidel tells the truth, then Jerry tells the truth.
+
+      (3) Vina says Jerry tells the truth. Since we know from (2) that Jerry tells the truth, if Vina says Jerry tells the truth, then Vine tells the truth.
+
+      (4) Millicent says Vina lies. Since we know from (3) that Vina tells the truth, if Millicent says Vina lies, then Millicent lies.
+
+      (5) Raymond says Millicent lies. Since we know from (4) that Millicent lies, if Raymond says Millicent lies, then Raymond tells the truth.
+
+      Now, the question asks: Does Raymond tell the truth? We know from (5) that Raymond tells the truth. So the answer is Yes.'
+  - input: 'Question: Kristian lies. Millie says Kristian lies. Maybelle says Millie tells the truth. Fidel says Maybelle lies. Leda says Fidel lies. Does Leda tell the truth?'
+    target: 'Let''s think step by step.
+
+      (1) Kristian lies. So, we know that Kristian lies.
+
+      (2) Millie says Kristian lies. Since we know from (1) that Kristian lies, if Millie says Kristian lies, then Millie tells the truth.
+
+      (3) Maybelle says Millie tells the truth. Since we know from (2) that Millie tells the truth, if Maybelle says Millie tells the truth, then Maybelle tells the truth.
+
+      (4) Fidel says Maybelle lies. Since we know from (3) that Maybelle tells the truth, if Fidel says Maybelle lies, then Fidel lies.
+
+      (5) Leda says Fidel lies. Since we know from (4) that Fidel lies, if Leda says Fidel lies, then Leda tells the truth.
+
+      Now, the question asks: Does Leda tell the truth? We know from (5) that Leda tells the truth. So the answer is Yes.'
+  - input: 'Question: Kristian tells the truth. Michaela says Kristian lies. Raymond says Michaela tells the truth. Osvaldo says Raymond tells the truth. Jamey says Osvaldo tells the truth. Does Jamey tell the truth?'
+    target: 'Let''s think step by step.
+
+      (1) Kristian tells the truth. So, we know that Kristian tells the truth.
+
+      (2) Michaela says Kristian lies. Since we know from (1) that Kristian tells the truth, if Michaela says Kristian lies, then Michaela lies.
+
+      (3) Raymond says Michaela tells the truth. Since we know from (2) that Michaela lies, if Raymond says Michaela tells the truth, then Raymond lies.
+
+      (4) Osvaldo says Raymond tells the truth. Since we know from (3) that Raymond lies, if Osvaldo says Raymond tells the truth, then Osvaldo lies.
+
+      (5) Jamey says Osvaldo tells the truth. Since we know from (4) that Osvaldo lies, if Jamey says Osvaldo tells the truth, then Jamey lies.
+
+      Now, the question asks: Does Jamey tell the truth? We know from (5) that Jamey lies. So the answer is No.'
+include: _cot_fewshot_template_yaml
+task: bbh_cot_fewshot_web_of_lies
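The chain of reasoning in these targets reduces to one rule per speaker: a person is truthful exactly when their claim about the previous person matches that person's actual status. A minimal sketch under that reading follows; `resolve_chain` and its boolean encoding of the claims are illustrative assumptions, not code from the repository.

```python
def resolve_chain(first_is_truthful, claims):
    """Return whether the last speaker tells the truth.

    claims[i] is True if speaker i says the previous person tells the truth,
    and False if speaker i says the previous person lies.
    """
    truthful = first_is_truthful
    for says_truthful in claims:
        # A speaker is truthful exactly when their claim matches reality.
        truthful = (says_truthful == truthful)
    return truthful


# Sample 1: Fidel tells the truth; Jerry and Vina say "truth",
# Millicent and Raymond say "lies". Does Raymond tell the truth?
print(resolve_chain(True, [True, True, False, False]))  # True, i.e. "Yes"
```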
lm_eval/tasks/bbh/cot_fewshot/word_sorting.yaml
View file @ da211969
-"dataset_name": "word_sorting"
-"description": "Sort a list of words.\n\n"
-"doc_to_text": "Q: Sort the following words alphabetically: List: oven costume counterpart\nA: Let's think step by step.\nThe first letter: \"oven\": \"o\" (15). \"costume\": \"c\" (3). \"counterpart\": \"c\" (3). We now have: (3) [\"costume\" ? \"counterpart\"] < (15) \"oven\". Now let's sort this subpart [\"costume\" ? \"counterpart\"] by looking at their second letters.\nThe second letter: \"costume\": \"o\" (15). \"counterpart\": \"o\" (15). We now have: (15) [\"costume\" ? \"counterpart\"]. Now let's sort this subpart [\"costume\" ? \"counterpart\"] by looking at their third letters.\nThe third letter: \"costume\": \"s\" (19). \"counterpart\": \"u\" (21). We now have: (19) \"costume\" < (21) \"counterpart\". Hence, we have [\"costume\" < \"counterpart\"] < \"oven\". So the answer is costume counterpart oven.\n\nQ: Sort the following words alphabetically: List: hypochlorite ponderosa phone credulity\nA: Let's think step by step.\nThe first letter: \"hypochlorite\": \"h\" (8). \"ponderosa\": \"p\" (16). \"phone\": \"p\" (16). \"credulity\": \"c\" (3). We now have: (3) \"credulity\" < (8) \"hypochlorite\" < (16) [\"ponderosa\" ? \"phone\"]. Now let's sort this subpart [\"ponderosa\" ? \"phone\"] by looking at their second letters.\nThe second letter: \"ponderosa\": \"o\" (15). \"phone\": \"h\" (8). We now have: (8) \"phone\" < (15) \"ponderosa\". Hence, we have \"credulity\" < \"hypochlorite\" < [\"phone\" <\"ponderosa\"]. So the answer is credulity hypochlorite phone ponderosa.\n\nQ: Sort the following words alphabetically: List: newt arson parthia seismography mugho aspect census\nA: Let's think step by step.\nThe first letter: \"newt\": \"n\" (14). \"arson\": \"a\" (1). \"parthia\": \"p\" (16). \"seismography\": \"s\" (19). \"mugho\": \"m\" (13). \"aspect\": \"a\" (1). \"census\": \"c\" (3). We now have: (1) [\"arson\" ? \"aspect\"] < (3) \"census\" < (13) \"mugho\" < (14) \"newt\" < (16) \"parthia\" < (19) \"seismography\". Now let's sort this subpart [\"arson\" ? \"aspect\"] by looking at their second letters.\nThe second letter: \"arson\": \"r\" (18). \"aspect\": \"s\" (19). We now have: (18) \"arson\" < (19) \"aspect\". Hence, we have [\"arson\" < \"aspect\"] < \"census\" < \"mugho\" < \"newt\" < \"parthia\" < \"seismography\". So the answer is arson aspect census mugho newt parthia seismography.\n\nQ: {{input}}\nA: Let's think step by step.\n"
-"include": "_cot_fewshot_template_yaml"
-"task": "bbh_cot_fewshot_word_sorting"
+dataset_name: word_sorting
+description: 'Sort a list of words.
+
+
+  '
+doc_to_text: 'Q: {{input}}
+
+  A: Let''s think step by step.
+
+  '
+fewshot_config:
+  sampler: first_n
+  samples:
+  - input: 'Sort the following words alphabetically: List: oven costume counterpart'
+    target: 'Let''s think step by step.
+
+      The first letter: "oven": "o" (15). "costume": "c" (3). "counterpart": "c" (3). We now have: (3) ["costume" ? "counterpart"] < (15) "oven". Now let''s sort this subpart ["costume" ? "counterpart"] by looking at their second letters.
+
+      The second letter: "costume": "o" (15). "counterpart": "o" (15). We now have: (15) ["costume" ? "counterpart"]. Now let''s sort this subpart ["costume" ? "counterpart"] by looking at their third letters.
+
+      The third letter: "costume": "s" (19). "counterpart": "u" (21). We now have: (19) "costume" < (21) "counterpart". Hence, we have ["costume" < "counterpart"] < "oven". So the answer is costume counterpart oven.'
+  - input: 'Sort the following words alphabetically: List: hypochlorite ponderosa phone credulity'
+    target: 'Let''s think step by step.
+
+      The first letter: "hypochlorite": "h" (8). "ponderosa": "p" (16). "phone": "p" (16). "credulity": "c" (3). We now have: (3) "credulity" < (8) "hypochlorite" < (16) ["ponderosa" ? "phone"]. Now let''s sort this subpart ["ponderosa" ? "phone"] by looking at their second letters.
+
+      The second letter: "ponderosa": "o" (15). "phone": "h" (8). We now have: (8) "phone" < (15) "ponderosa". Hence, we have "credulity" < "hypochlorite" < ["phone" <"ponderosa"]. So the answer is credulity hypochlorite phone ponderosa.'
+  - input: 'Sort the following words alphabetically: List: newt arson parthia seismography mugho aspect census'
+    target: 'Let''s think step by step.
+
+      The first letter: "newt": "n" (14). "arson": "a" (1). "parthia": "p" (16). "seismography": "s" (19). "mugho": "m" (13). "aspect": "a" (1). "census": "c" (3). We now have: (1) ["arson" ? "aspect"] < (3) "census" < (13) "mugho" < (14) "newt" < (16) "parthia" < (19) "seismography". Now let''s sort this subpart ["arson" ? "aspect"] by looking at their second letters.
+
+      The second letter: "arson": "r" (18). "aspect": "s" (19). We now have: (18) "arson" < (19) "aspect". Hence, we have ["arson" < "aspect"] < "census" < "mugho" < "newt" < "parthia" < "seismography". So the answer is arson aspect census mugho newt parthia seismography.'
+include: _cot_fewshot_template_yaml
+task: bbh_cot_fewshot_word_sorting
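The numbers in parentheses in these targets are 1-based alphabet positions, and the letter-by-letter comparison is ordinary lexicographic ordering. A short sketch of both follows; `letter_index` and `sort_words` are hypothetical helper names and assume lowercase ASCII words.

```python
def letter_index(ch):
    """1-based position of a lowercase ASCII letter, e.g. 'o' -> 15."""
    return ord(ch) - ord("a") + 1


def sort_words(words):
    """Sort by successive letter positions (plain lexicographic order)."""
    return sorted(words, key=lambda w: [letter_index(c) for c in w])


print(letter_index("o"), letter_index("c"))          # 15 3
print(sort_words(["oven", "costume", "counterpart"]))
```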
lm_eval/tasks/bbh/fewshot/_fewshot_template_yaml
View file @ da211969
@@ -16,7 +16,6 @@ generation_kwargs:
     - "\n\n"
   do_sample: false
   temperature: 0.0
-num_fewshot: 0
+num_fewshot: 3
 metadata:
   version: 1.0
-  num_fewshot: 3 # will be printed in results table
lm_eval/tasks/bbh/fewshot/boolean_expressions.yaml
View file @ da211969
-"dataset_name": "boolean_expressions"
-"description": "Evaluate the result of a random Boolean expression.\n\n"
-"doc_to_text": "Q: not ( ( not not True ) ) is\nA: False\n\nQ: True and False and not True and True is\nA: False\n\nQ: not not ( not ( False ) ) is\nA: True\n\nQ: {{input}}\nA:"
-"include": "_fewshot_template_yaml"
-"task": "bbh_fewshot_boolean_expressions"
+dataset_name: boolean_expressions
+description: 'Evaluate the result of a random Boolean expression.
+
+
+  '
+doc_to_text: 'Q: {{input}}
+
+  A:'
+fewshot_config:
+  sampler: first_n
+  samples:
+  - input: not ( ( not not True ) ) is
+    target: 'False'
+  - input: True and False and not True and True is
+    target: 'False'
+  - input: not not ( not ( False ) ) is
+    target: 'True'
+include: _fewshot_template_yaml
+task: bbh_fewshot_boolean_expressions
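The change moves the three worked examples out of a hard-coded `doc_to_text` string and into `fewshot_config.samples`. The sketch below illustrates how such samples could be stitched back into an equivalent prompt. It is not the harness's implementation: `build_prompt` is a hypothetical helper, `{{input}}` is substituted with a plain string replace rather than a template engine, and the `" "` target delimiter and `"\n\n"` few-shot delimiter are assumptions chosen to match the old string.

```python
DOC_TO_TEXT = "Q: {{input}}\nA:"
SAMPLES = [
    {"input": "not ( ( not not True ) ) is", "target": "False"},
    {"input": "True and False and not True and True is", "target": "False"},
    {"input": "not not ( not ( False ) ) is", "target": "True"},
]


def render(doc_to_text, question):
    """Fill the {{input}} placeholder (simple stand-in for templating)."""
    return doc_to_text.replace("{{input}}", question)


def build_prompt(doc_to_text, samples, question,
                 target_delimiter=" ", fewshot_delimiter="\n\n"):
    """Join rendered few-shot examples, then append the unanswered question."""
    shots = [
        render(doc_to_text, s["input"]) + target_delimiter + s["target"]
        for s in samples
    ]
    return fewshot_delimiter.join(shots + [render(doc_to_text, question)])


print(build_prompt(DOC_TO_TEXT, SAMPLES, "True or False is"))
```

Under these assumptions the output matches the removed `doc_to_text` string with `{{input}}` filled in, which is the equivalence the refactor relies on.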
lm_eval/tasks/bbh/fewshot/causal_judgement.yaml
View file @ da211969
-"dataset_name": "causal_judgement"
-"description": "Answer questions about causal attribution.\n\n"
-"doc_to_text": "Q: How would a typical person answer each of the following questions about causation?\nFrank T., had an ongoing dispute with his neighbor over a stretch of land and one day decided to shoot his neighbor in the body. Frank T. had no experience with guns, his hand slipped on the barrel of the gun, and the shot went wild. Nonetheless, the bullet bounced off a large boulder several feet away and hit the neighbor's body, causing significant injury. Did Frank T. intentionally shoot his neighbor in the body?\nOptions:\n- Yes\n- No\nA: No\n\nQ: How would a typical person answer each of the following questions about causation?\nSuzy and Billy are working on a project that is very important for our nation's security. The boss tells them both: \"Be sure that you are here at exactly 9 am. It is absolutely essential that you arrive at that time.\" Both Billy and Suzy arrive at 9 am. As it happens, there was a motion detector installed in the room where they arrived. The motion detector was set up to be triggered if at least one person appeared in the room at the same time. So the motion detector went off. Did Billy cause the motion detector to go off?\nOptions:\n- Yes\n- No\nA: Yes\n\nQ: How would a typical person answer each of the following questions about causation?\nGeorge and his sister Lena reunite at their parents' house for Thanksgiving. Whereas George just got into medical school, Lena is unhappy in her marriage and recently lost her job. Over the course of the day, George and Lena get into a number of heated arguments. Later in the afternoon they play a game of darts. They split the first two games, and the third game is close until the end. Who will win comes down to George's last shot. If he hits a high point region, he wins; if he hits a low point region, Lena wins. George thinks of the difficult time Lena is having, and he really wants to let her win. He aims the dart at the low point region. He sets up his shot and the dart lands in the low point region. After his shot, Lena wins the game and is very happy. Did George hit the low point region intentionally?\nOptions:\n- Yes\n- No\nA: Yes\n\nQ: {{input}}\nA:"
-"include": "_fewshot_template_yaml"
-"task": "bbh_fewshot_causal_judgement"
+dataset_name: causal_judgement
+description: 'Answer questions about causal attribution.
+
+
+  '
+doc_to_text: 'Q: {{input}}
+
+  A:'
+fewshot_config:
+  sampler: first_n
+  samples:
+  - input: 'How would a typical person answer each of the following questions about causation?
+
+      Frank T., had an ongoing dispute with his neighbor over a stretch of land and one day decided to shoot his neighbor in the body. Frank T. had no experience with guns, his hand slipped on the barrel of the gun, and the shot went wild. Nonetheless, the bullet bounced off a large boulder several feet away and hit the neighbor''s body, causing significant injury. Did Frank T. intentionally shoot his neighbor in the body?
+
+      Options:
+
+      - Yes
+
+      - No'
+    target: 'No'
+  - input: 'How would a typical person answer each of the following questions about causation?
+
+      Suzy and Billy are working on a project that is very important for our nation''s security. The boss tells them both: "Be sure that you are here at exactly 9 am. It is absolutely essential that you arrive at that time." Both Billy and Suzy arrive at 9 am. As it happens, there was a motion detector installed in the room where they arrived. The motion detector was set up to be triggered if at least one person appeared in the room at the same time. So the motion detector went off. Did Billy cause the motion detector to go off?
+
+      Options:
+
+      - Yes
+
+      - No'
+    target: 'Yes'
+  - input: 'How would a typical person answer each of the following questions about causation?
+
+      George and his sister Lena reunite at their parents'' house for Thanksgiving. Whereas George just got into medical school, Lena is unhappy in her marriage and recently lost her job. Over the course of the day, George and Lena get into a number of heated arguments. Later in the afternoon they play a game of darts. They split the first two games, and the third game is close until the end. Who will win comes down to George''s last shot. If he hits a high point region, he wins; if he hits a low point region, Lena wins. George thinks of the difficult time Lena is having, and he really wants to let her win. He aims the dart at the low point region. He sets up his shot and the dart lands in the low point region. After his shot, Lena wins the game and is very happy. Did George hit the low point region intentionally?
+
+      Options:
+
+      - Yes
+
+      - No'
+    target: 'Yes'
+include: _fewshot_template_yaml
+task: bbh_fewshot_causal_judgement
lm_eval/tasks/bbh/fewshot/date_understanding.yaml
View file @ da211969
-"dataset_name": "date_understanding"
-"description": "Infer the date from context.\n\n"
-"doc_to_text": "Q: Today is Christmas Eve of 1937. What is the date 10 days ago in MM/DD/YYYY?\nOptions:\n(A) 12/14/2026\n(B) 12/14/1950\n(C) 12/14/2007\n(D) 12/14/1937\n(E) 07/14/1938\n(F) 12/14/1988\nA: (D)\n\nQ: Tomorrow is 11/12/2019. What is the date one year ago from today in MM/DD/YYYY?\nOptions:\n(A) 09/04/2018\n(B) 11/11/2018\n(C) 08/25/2018\n(D) 11/02/2018\n(E) 11/04/2018\nA: (B)\n\nQ: Jane and John married on Jan 2, 1958. It is their 5-year anniversary today. What is the date tomorrow in MM/DD/YYYY?\nOptions:\n(A) 01/11/1961\n(B) 01/03/1963\n(C) 01/18/1961\n(D) 10/14/1960\n(E) 01/03/1982\n(F) 12/03/1960\nA: (B)\n\nQ: {{input}}\nA:"
-"include": "_fewshot_template_yaml"
-"task": "bbh_fewshot_date_understanding"
+dataset_name: date_understanding
+description: 'Infer the date from context.
+
+
+  '
+doc_to_text: 'Q: {{input}}
+
+  A:'
+fewshot_config:
+  sampler: first_n
+  samples:
+  - input: 'Today is Christmas Eve of 1937. What is the date 10 days ago in MM/DD/YYYY?
+
+      Options:
+
+      (A) 12/14/2026
+
+      (B) 12/14/1950
+
+      (C) 12/14/2007
+
+      (D) 12/14/1937
+
+      (E) 07/14/1938
+
+      (F) 12/14/1988'
+    target: (D)
+  - input: 'Tomorrow is 11/12/2019. What is the date one year ago from today in MM/DD/YYYY?
+
+      Options:
+
+      (A) 09/04/2018
+
+      (B) 11/11/2018
+
+      (C) 08/25/2018
+
+      (D) 11/02/2018
+
+      (E) 11/04/2018'
+    target: (B)
+  - input: 'Jane and John married on Jan 2, 1958. It is their 5-year anniversary today. What is the date tomorrow in MM/DD/YYYY?
+
+      Options:
+
+      (A) 01/11/1961
+
+      (B) 01/03/1963
+
+      (C) 01/18/1961
+
+      (D) 10/14/1960
+
+      (E) 01/03/1982
+
+      (F) 12/03/1960'
+    target: (B)
+include: _fewshot_template_yaml
+task: bbh_fewshot_date_understanding
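The three few-shot targets above can be confirmed with standard-library date arithmetic. This is an illustrative check only; `fmt` is a hypothetical helper that prints dates in the MM/DD/YYYY form the options use.

```python
from datetime import date, timedelta


def fmt(d):
    """Format a date as MM/DD/YYYY, matching the answer options."""
    return d.strftime("%m/%d/%Y")


# Sample 1: ten days before Christmas Eve of 1937.
answer_1 = fmt(date(1937, 12, 24) - timedelta(days=10))

# Sample 2: tomorrow is 11/12/2019, so today is 11/11/2019; go back one year.
today = date(2019, 11, 12) - timedelta(days=1)
answer_2 = fmt(today.replace(year=today.year - 1))

# Sample 3: fifth anniversary of Jan 2, 1958 is Jan 2, 1963; take the next day.
anniversary = date(1958, 1, 2).replace(year=1958 + 5)
answer_3 = fmt(anniversary + timedelta(days=1))

print(answer_1, answer_2, answer_3)  # 12/14/1937 11/11/2018 01/03/1963
```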
lm_eval/tasks/bbh/fewshot/disambiguation_qa.yaml
View file @ da211969
-"dataset_name": "disambiguation_qa"
-"description": "Clarify the meaning of sentences with ambiguous pronouns.\n\n"
-"doc_to_text": "Q: In the following sentences, explain the antecedent of the pronoun (which thing the pronoun refers to), or state that it is ambiguous.\nSentence: The chief told the counselor that they took the day off.\nOptions:\n(A) The chief took the day off\n(B) The counselor took the day off\n(C) Ambiguous\nA: (A)\n\nQ: In the following sentences, explain the antecedent of the pronoun (which thing the pronoun refers to), or state that it is ambiguous.\nSentence: The manager sent a message to the secretary, but he didn't reply yet.\nOptions:\n(A) The secretary didn't reply yet\n(B) The manager didn't reply yet\n(C) Ambiguous\nA: (A)\n\nQ: In the following sentences, explain the antecedent of the pronoun (which thing the pronoun refers to), or state that it is ambiguous.\nSentence: Bailey will plan to meet the director at his office\nOptions:\n(A) It will be Bailey's office\n(B) It will be the director's office\n(C) Ambiguous\nA: (C)\n\nQ: {{input}}\nA:"
-"include": "_fewshot_template_yaml"
-"task": "bbh_fewshot_disambiguation_qa"
+dataset_name: disambiguation_qa
+description: 'Clarify the meaning of sentences with ambiguous pronouns.
+
+
+  '
+doc_to_text: 'Q: {{input}}
+
+  A:'
+fewshot_config:
+  sampler: first_n
+  samples:
+  - input: 'In the following sentences, explain the antecedent of the pronoun (which thing the pronoun refers to), or state that it is ambiguous.
+
+      Sentence: The chief told the counselor that they took the day off.
+
+      Options:
+
+      (A) The chief took the day off
+
+      (B) The counselor took the day off
+
+      (C) Ambiguous'
+    target: (A)
+  - input: 'In the following sentences, explain the antecedent of the pronoun (which thing the pronoun refers to), or state that it is ambiguous.
+
+      Sentence: The manager sent a message to the secretary, but he didn''t reply yet.
+
+      Options:
+
+      (A) The secretary didn''t reply yet
+
+      (B) The manager didn''t reply yet
+
+      (C) Ambiguous'
+    target: (A)
+  - input: 'In the following sentences, explain the antecedent of the pronoun (which thing the pronoun refers to), or state that it is ambiguous.
+
+      Sentence: Bailey will plan to meet the director at his office
+
+      Options:
+
+      (A) It will be Bailey''s office
+
+      (B) It will be the director''s office
+
+      (C) Ambiguous'
+    target: (C)
+include: _fewshot_template_yaml
+task: bbh_fewshot_disambiguation_qa
lm_eval/tasks/bbh/fewshot/dyck_languages.yaml
View file @
da211969
"
dataset_name"
:
"
dyck_languages"
"
description"
:
"
Correctly
close
a
Dyck-n
word.
\n\n
"
"
doc_to_text"
:
"
Q:
Complete
the
rest
of
the
sequence,
making
sure
that
the
parentheses
are
closed
properly.
Input:
[
{
[
\n
A:
]
}
]
\n\n
Q:
Complete
the
rest
of
the
sequence,
making
sure
that
the
parentheses
are
closed
properly.
Input:
<
>
(
(
[
[
(
{
}
)
[
<
>
]
]
\n
A:
]
)
)
\n\n
Q:
Complete
the
rest
of
the
sequence,
making
sure
that
the
parentheses
are
closed
properly.
Input:
<
[
<
[
{
<
[
]
<
{
}
>
>
}
]
>
{
{
(
)
}
{
<
[
<
>
]
>
}
\n
A:
}
]
>
\n\n
Q:
{{input}}
\n
A:"
"
include"
:
"
_fewshot_template_yaml"
"
task"
:
"
bbh_fewshot_dyck_languages"
dataset_name: dyck_languages
description: 'Correctly close a Dyck-n word.

  '
doc_to_text: 'Q: {{input}}

  A:'
fewshot_config:
  sampler: first_n
  samples:
  - input: 'Complete the rest of the sequence, making sure that the parentheses are closed properly. Input: [ { ['
    target: '] } ]'
  - input: 'Complete the rest of the sequence, making sure that the parentheses are closed properly. Input: < > ( ( [ [ ( { } ) [ < > ] ]'
    target: '] ) )'
  - input: 'Complete the rest of the sequence, making sure that the parentheses are closed properly. Input: < [ < [ { < [ ] < { } > > } ] > { { ( ) } { < [ < > ] > }'
    target: '} ] >'
include: _fewshot_template_yaml
task: bbh_fewshot_dyck_languages
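The three few-shot targets in this file can be checked mechanically. The following is a minimal sketch, not part of the harness: `complete_dyck` is a hypothetical helper that assumes the bracket tokens are space-separated, as they are in the samples above, and uses a stack to compute the closing sequence.

```python
# Hypothetical checker for the dyck_languages few-shot samples.
# Assumption: tokens are separated by single spaces, as in the YAML above.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def complete_dyck(prefix: str) -> str:
    """Return the closing brackets that balance a space-separated prefix."""
    stack = []
    for tok in prefix.split():
        if tok in PAIRS:
            # Opening bracket: remember it until its partner shows up.
            stack.append(tok)
        elif stack and PAIRS[stack[-1]] == tok:
            # Closing bracket that matches the most recent opener.
            stack.pop()
        else:
            raise ValueError(f"unbalanced token {tok!r}")
    # Close whatever is still open, innermost first.
    return " ".join(PAIRS[t] for t in reversed(stack))


if __name__ == "__main__":
    print(complete_dyck("[ { ["))
```

Running the helper on each `input` (the part after `Input:`) should reproduce the corresponding `target` string.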
lm_eval/tasks/bbh/fewshot/formal_fallacies.yaml
"dataset_name": "formal_fallacies"
"description": "Distinguish deductively valid arguments from formal fallacies.\n\n"
"doc_to_text": "Q: \"It is not always easy to see who is related to whom -- and in which ways. The following argument pertains to this question: To begin with, Lesley is a close friend of Fernando. Moreover, being a close friend of Fernando or a schoolmate of Lowell is sufficient for being a great-grandfather of Leroy. It follows that Lesley is a great-grandfather of Leroy.\"\nIs the argument, given the explicitly stated premises, deductively valid or invalid?\nOptions:\n- valid\n- invalid\nA: valid\n\nQ: \"It is not always easy to see who is related to whom -- and in which ways. The following argument pertains to this question: Whoever is not a great-grandfather of Clyde is a stepbrother of Brian. Being an ancestor of Dana is sufficient for not being a great-grandfather of Clyde. We may conclude: Everyone who is an ancestor of Dana is a stepbrother of Brian, too.\"\nIs the argument, given the explicitly stated premises, deductively valid or invalid?\nOptions:\n- valid\n- invalid\nA: valid\n\nQ: \"It is not always easy to grasp who is consuming which products. The following argument pertains to this question: Every infrequent user of Paul Mitchell shampoo is either a rare consumer of Nioxin shampoo or a loyal buyer of Caress soap, or both. No regular consumer of Lush soap is a rare consumer of Nioxin shampoo and, in the same time, a loyal buyer of Caress soap. It follows that whoever is an infrequent user of Paul Mitchell shampoo is not a regular consumer of Lush soap.\"\nIs the argument, given the explicitly stated premises, deductively valid or invalid?\nOptions:\n- valid\n- invalid\nA: invalid\n\nQ: {{input}}\nA:"
"include": "_fewshot_template_yaml"
"task": "bbh_fewshot_formal_fallacies"
dataset_name: formal_fallacies
description: 'Distinguish deductively valid arguments from formal fallacies.

  '
doc_to_text: 'Q: {{input}}

  A:'
fewshot_config:
  sampler: first_n
  samples:
  - input: '"It is not always easy to see who is related to whom -- and in which ways. The following argument pertains to this question: To begin with, Lesley is a close friend of Fernando. Moreover, being a close friend of Fernando or a schoolmate of Lowell is sufficient for being a great-grandfather of Leroy. It follows that Lesley is a great-grandfather of Leroy."

      Is the argument, given the explicitly stated premises, deductively valid or invalid?

      Options:

      - valid

      - invalid'
    target: valid
  - input: '"It is not always easy to see who is related to whom -- and in which ways. The following argument pertains to this question: Whoever is not a great-grandfather of Clyde is a stepbrother of Brian. Being an ancestor of Dana is sufficient for not being a great-grandfather of Clyde. We may conclude: Everyone who is an ancestor of Dana is a stepbrother of Brian, too."

      Is the argument, given the explicitly stated premises, deductively valid or invalid?

      Options:

      - valid

      - invalid'
    target: valid
  - input: '"It is not always easy to grasp who is consuming which products. The following argument pertains to this question: Every infrequent user of Paul Mitchell shampoo is either a rare consumer of Nioxin shampoo or a loyal buyer of Caress soap, or both. No regular consumer of Lush soap is a rare consumer of Nioxin shampoo and, in the same time, a loyal buyer of Caress soap. It follows that whoever is an infrequent user of Paul Mitchell shampoo is not a regular consumer of Lush soap."

      Is the argument, given the explicitly stated premises, deductively valid or invalid?

      Options:

      - valid

      - invalid'
    target: invalid
include: _fewshot_template_yaml
task: bbh_fewshot_formal_fallacies
lm_eval/tasks/bbh/fewshot/geometric_shapes.yaml
"dataset_name": "geometric_shapes"
"description": "Name geometric shapes from their SVG paths.\n\n"
"doc_to_text": "Q: This SVG path element <path d=\"M 31.00,73.00 L 32.00,59.00 L 44.00,50.00 L 49.00,41.00 L 64.00,37.00 L 71.00,55.00 L 64.00,76.00 L 52.00,61.00 L 31.00,73.00\"/> draws a\nOptions:\n(A) circle\n(B) heptagon\n(C) hexagon\n(D) kite\n(E) line\n(F) octagon\n(G) pentagon\n(H) rectangle\n(I) sector\n(J) triangle\nA: (F)\n\nQ: This SVG path element <path d=\"M 14.19,26.04 L 51.43,39.21 L 58.44,36.69 L 56.63,30.17 L 48.53,26.66 L 14.19,26.04\"/> draws a\nOptions:\n(A) circle\n(B) heptagon\n(C) hexagon\n(D) kite\n(E) line\n(F) octagon\n(G) pentagon\n(H) rectangle\n(I) sector\n(J) triangle\nA: (G)\n\nQ: This SVG path element <path d=\"M 41.00,43.00 L 37.00,34.00 L 41.00,33.00 L 45.00,34.00 L 41.00,43.00\"/> draws a\nOptions:\n(A) circle\n(B) heptagon\n(C) hexagon\n(D) kite\n(E) line\n(F) octagon\n(G) pentagon\n(H) rectangle\n(I) sector\n(J) triangle\nA: (D)\n\nQ: {{input}}\nA:"
"include": "_fewshot_template_yaml"
"task": "bbh_fewshot_geometric_shapes"
dataset_name: geometric_shapes
description: 'Name geometric shapes from their SVG paths.

  '
doc_to_text: 'Q: {{input}}

  A:'
fewshot_config:
  sampler: first_n
  samples:
  - input: 'This SVG path element <path d="M 31.00,73.00 L 32.00,59.00 L 44.00,50.00 L 49.00,41.00 L 64.00,37.00 L 71.00,55.00 L 64.00,76.00 L 52.00,61.00 L 31.00,73.00"/> draws a

      Options:

      (A) circle

      (B) heptagon

      (C) hexagon

      (D) kite

      (E) line

      (F) octagon

      (G) pentagon

      (H) rectangle

      (I) sector

      (J) triangle'
    target: (F)
  - input: 'This SVG path element <path d="M 14.19,26.04 L 51.43,39.21 L 58.44,36.69 L 56.63,30.17 L 48.53,26.66 L 14.19,26.04"/> draws a

      Options:

      (A) circle

      (B) heptagon

      (C) hexagon

      (D) kite

      (E) line

      (F) octagon

      (G) pentagon

      (H) rectangle

      (I) sector

      (J) triangle'
    target: (G)
  - input: 'This SVG path element <path d="M 41.00,43.00 L 37.00,34.00 L 41.00,33.00 L 45.00,34.00 L 41.00,43.00"/> draws a

      Options:

      (A) circle

      (B) heptagon

      (C) hexagon

      (D) kite

      (E) line

      (F) octagon

      (G) pentagon

      (H) rectangle

      (I) sector

      (J) triangle'
    target: (D)
include: _fewshot_template_yaml
task: bbh_fewshot_geometric_shapes
lm_eval/tasks/bbh/fewshot/hyperbaton.yaml
"dataset_name": "hyperbaton"
"description": "Order adjectives correctly in English sentences.\n\n"
"doc_to_text": "Q: Which sentence has the correct adjective order:\nOptions:\n(A) rubber terrible ship\n(B) terrible rubber ship\nA: (B)\n\nQ: Which sentence has the correct adjective order:\nOptions:\n(A) repulsive small Brazilian exercise ship\n(B) Brazilian repulsive exercise small ship\nA: (A)\n\nQ: Which sentence has the correct adjective order:\nOptions:\n(A) blue gold wonderful square shoe\n(B) wonderful square blue gold shoe\nA: (B)\n\nQ: {{input}}\nA:"
"include": "_fewshot_template_yaml"
"task": "bbh_fewshot_hyperbaton"
dataset_name: hyperbaton
description: 'Order adjectives correctly in English sentences.

  '
doc_to_text: 'Q: {{input}}

  A:'
fewshot_config:
  sampler: first_n
  samples:
  - input: 'Which sentence has the correct adjective order:

      Options:

      (A) rubber terrible ship

      (B) terrible rubber ship'
    target: (B)
  - input: 'Which sentence has the correct adjective order:

      Options:

      (A) repulsive small Brazilian exercise ship

      (B) Brazilian repulsive exercise small ship'
    target: (A)
  - input: 'Which sentence has the correct adjective order:

      Options:

      (A) blue gold wonderful square shoe

      (B) wonderful square blue gold shoe'
    target: (B)
include: _fewshot_template_yaml
task: bbh_fewshot_hyperbaton
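The diff moves the few-shot examples out of a hard-coded `doc_to_text` string and into `fewshot_config.samples`. The sketch below illustrates how the two forms relate. It is not the harness's implementation: `build_prompt` is a hypothetical helper, and the `Q:`/`A:` layout and blank-line separator are simply copied from the removed `doc_to_text` string.

```python
# Hypothetical illustration: rebuild the old hard-coded prompt from the new samples.
# Assumptions: "first_n" means "take the first n samples in order", and shots are
# joined with a blank line, as in the removed doc_to_text string.
SAMPLES = [
    {
        "input": "Which sentence has the correct adjective order:\nOptions:\n"
                 "(A) rubber terrible ship\n(B) terrible rubber ship",
        "target": "(B)",
    },
    {
        "input": "Which sentence has the correct adjective order:\nOptions:\n"
                 "(A) repulsive small Brazilian exercise ship\n"
                 "(B) Brazilian repulsive exercise small ship",
        "target": "(A)",
    },
    {
        "input": "Which sentence has the correct adjective order:\nOptions:\n"
                 "(A) blue gold wonderful square shoe\n"
                 "(B) wonderful square blue gold shoe",
        "target": "(B)",
    },
]


def build_prompt(samples, query, num_fewshot=3):
    """Format the first `num_fewshot` samples, then the unanswered query."""
    shots = [f"Q: {s['input']}\nA: {s['target']}" for s in samples[:num_fewshot]]
    shots.append(f"Q: {query}\nA:")
    return "\n\n".join(shots)


if __name__ == "__main__":
    print(build_prompt(SAMPLES, "{{input}}"))
```

With `query` set to the literal placeholder `{{input}}`, the result matches the old `doc_to_text` value. Passing a smaller `num_fewshot` shortens the prompt, which the hard-coded string could not do.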
lm_eval/tasks/bbh/fewshot/logical_deduction_five_objects.yaml
"dataset_name": "logical_deduction_five_objects"
"description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
"doc_to_text": "Q: The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. In a golf tournament, there were three golfers: Amy, Eli, and Eve. Eve finished above Amy. Eli finished below Amy.\nOptions:\n(A) Amy finished last\n(B) Eli finished last\n(C) Eve finished last\nA: (B)\n\nQ: The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a white book, a green book, and an orange book. The green book is to the right of the white book. The orange book is the rightmost.\nOptions:\n(A) The white book is the leftmost\n(B) The green book is the leftmost\n(C) The orange book is the leftmost\nA: (A)\n\nQ: The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a red book, a gray book, and a white book. The white book is to the left of the gray book. The red book is the second from the left.\nOptions:\n(A) The red book is the leftmost\n(B) The gray book is the leftmost\n(C) The white book is the leftmost\nA: (C)\n\nQ: {{input}}\nA:"
"include": "_fewshot_template_yaml"
"task": "bbh_fewshot_logical_deduction_five_objects"
dataset_name: logical_deduction_five_objects
description: 'A logical deduction task which requires deducing the order of a sequence of objects.

  '
doc_to_text: 'Q: {{input}}

  A:'
fewshot_config:
  sampler: first_n
  samples:
  - input: 'The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. In a golf tournament, there were three golfers: Amy, Eli, and Eve. Eve finished above Amy. Eli finished below Amy.

      Options:

      (A) Amy finished last

      (B) Eli finished last

      (C) Eve finished last'
    target: (B)
  - input: 'The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a white book, a green book, and an orange book. The green book is to the right of the white book. The orange book is the rightmost.

      Options:

      (A) The white book is the leftmost

      (B) The green book is the leftmost

      (C) The orange book is the leftmost'
    target: (A)
  - input: 'The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a red book, a gray book, and a white book. The white book is to the left of the gray book. The red book is the second from the left.

      Options:

      (A) The red book is the leftmost

      (B) The gray book is the leftmost

      (C) The white book is the leftmost'
    target: (C)
include: _fewshot_template_yaml
task: bbh_fewshot_logical_deduction_five_objects
lm_eval/tasks/bbh/fewshot/logical_deduction_seven_objects.yaml
"dataset_name": "logical_deduction_seven_objects"
"description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
"doc_to_text": "Q: The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. In a golf tournament, there were three golfers: Amy, Eli, and Eve. Eve finished above Amy. Eli finished below Amy.\nOptions:\n(A) Amy finished last\n(B) Eli finished last\n(C) Eve finished last\nA: (B)\n\nQ: The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a white book, a green book, and an orange book. The green book is to the right of the white book. The orange book is the rightmost.\nOptions:\n(A) The white book is the leftmost\n(B) The green book is the leftmost\n(C) The orange book is the leftmost\nA: (A)\n\nQ: The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a red book, a gray book, and a white book. The white book is to the left of the gray book. The red book is the second from the left.\nOptions:\n(A) The red book is the leftmost\n(B) The gray book is the leftmost\n(C) The white book is the leftmost\nA: (C)\n\nQ: {{input}}\nA:"
"include": "_fewshot_template_yaml"
"task": "bbh_fewshot_logical_deduction_seven_objects"
dataset_name: logical_deduction_seven_objects
description: 'A logical deduction task which requires deducing the order of a sequence of objects.

  '
doc_to_text: 'Q: {{input}}

  A:'
fewshot_config:
  sampler: first_n
  samples:
  - input: 'The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. In a golf tournament, there were three golfers: Amy, Eli, and Eve. Eve finished above Amy. Eli finished below Amy.

      Options:

      (A) Amy finished last

      (B) Eli finished last

      (C) Eve finished last'
    target: (B)
  - input: 'The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a white book, a green book, and an orange book. The green book is to the right of the white book. The orange book is the rightmost.

      Options:

      (A) The white book is the leftmost

      (B) The green book is the leftmost

      (C) The orange book is the leftmost'
    target: (A)
  - input: 'The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a red book, a gray book, and a white book. The white book is to the left of the gray book. The red book is the second from the left.

      Options:

      (A) The red book is the leftmost

      (B) The gray book is the leftmost

      (C) The white book is the leftmost'
    target: (C)
include: _fewshot_template_yaml
task: bbh_fewshot_logical_deduction_seven_objects
lm_eval/tasks/bbh/fewshot/logical_deduction_three_objects.yaml
"dataset_name": "logical_deduction_three_objects"
"description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
"doc_to_text": "Q: The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. In a golf tournament, there were three golfers: Amy, Eli, and Eve. Eve finished above Amy. Eli finished below Amy.\nOptions:\n(A) Amy finished last\n(B) Eli finished last\n(C) Eve finished last\nA: (B)\n\nQ: The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a white book, a green book, and an orange book. The green book is to the right of the white book. The orange book is the rightmost.\nOptions:\n(A) The white book is the leftmost\n(B) The green book is the leftmost\n(C) The orange book is the leftmost\nA: (A)\n\nQ: The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a red book, a gray book, and a white book. The white book is to the left of the gray book. The red book is the second from the left.\nOptions:\n(A) The red book is the leftmost\n(B) The gray book is the leftmost\n(C) The white book is the leftmost\nA: (C)\n\nQ: {{input}}\nA:"
"include": "_fewshot_template_yaml"
"task": "bbh_fewshot_logical_deduction_three_objects"
dataset_name: logical_deduction_three_objects
description: 'A logical deduction task which requires deducing the order of a sequence of objects.

  '
doc_to_text: 'Q: {{input}}

  A:'
fewshot_config:
  sampler: first_n
  samples:
  - input: 'The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. In a golf tournament, there were three golfers: Amy, Eli, and Eve. Eve finished above Amy. Eli finished below Amy.

      Options:

      (A) Amy finished last

      (B) Eli finished last

      (C) Eve finished last'
    target: (B)
  - input: 'The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a white book, a green book, and an orange book. The green book is to the right of the white book. The orange book is the rightmost.

      Options:

      (A) The white book is the leftmost

      (B) The green book is the leftmost

      (C) The orange book is the leftmost'
    target: (A)
  - input: 'The following paragraphs each describe a set of three objects arranged in a fixed order. The statements are logically consistent within each paragraph. On a shelf, there are three books: a red book, a gray book, and a white book. The white book is to the left of the gray book. The red book is the second from the left.

      Options:

      (A) The red book is the leftmost

      (B) The gray book is the leftmost

      (C) The white book is the leftmost'
    target: (C)
include: _fewshot_template_yaml
task: bbh_fewshot_logical_deduction_three_objects
lm_eval/tasks/bbh/fewshot/movie_recommendation.yaml
"dataset_name": "movie_recommendation"
"description": "Recommend movies similar to the given list of movies.\n\n"
"doc_to_text": "Q: Find a movie similar to Star Wars Episode IV - A New Hope, Indiana Jones and the Last Crusade, Star Wars Episode V - The Empire Strikes Back, The Big Lebowski:\nOptions:\n(A) Tetsuo\n(B) the Ironman\n(C) The Princess Bride\n(D) The Barkley Marathons The Race That Eats Its Young\n(E) Bug\nA: (C)\n\nQ: Find a movie similar to Twister, The Silence of the Lambs, Independence Day, Braveheart:\nOptions:\n(A) They Shoot Horses\n(B) Don't They\n(C) Forrest Gump\n(D) The Salton Sea\n(E) Extreme Days\nA: (C)\n\nQ: Find a movie similar to Minority Report, Total Recall, Inside Out, Forrest Gump:\nOptions:\n(A) Phenomena\n(B) Lilting\n(C) Catwoman\n(D) Edge of Tomorrow\nA: (D)\n\nQ: {{input}}\nA:"
"include": "_fewshot_template_yaml"
"task": "bbh_fewshot_movie_recommendation"
dataset_name: movie_recommendation
description: 'Recommend movies similar to the given list of movies.

  '
doc_to_text: 'Q: {{input}}

  A:'
fewshot_config:
  sampler: first_n
  samples:
  - input: 'Find a movie similar to Star Wars Episode IV - A New Hope, Indiana Jones and the Last Crusade, Star Wars Episode V - The Empire Strikes Back, The Big Lebowski:

      Options:

      (A) Tetsuo

      (B) the Ironman

      (C) The Princess Bride

      (D) The Barkley Marathons The Race That Eats Its Young

      (E) Bug'
    target: (C)
  - input: 'Find a movie similar to Twister, The Silence of the Lambs, Independence Day, Braveheart:

      Options:

      (A) They Shoot Horses

      (B) Don''t They

      (C) Forrest Gump

      (D) The Salton Sea

      (E) Extreme Days'
    target: (C)
  - input: 'Find a movie similar to Minority Report, Total Recall, Inside Out, Forrest Gump:

      Options:

      (A) Phenomena

      (B) Lilting

      (C) Catwoman

      (D) Edge of Tomorrow'
    target: (D)
include: _fewshot_template_yaml
task: bbh_fewshot_movie_recommendation
lm_eval/tasks/bbh/fewshot/multistep_arithmetic_two.yaml
"dataset_name": "multistep_arithmetic_two"
"description": "Solve multi-step arithmetic problems.\n\n"
"doc_to_text": "Q: ((-5 + 9 * -4 - 0) * (4 + -7 + 0 * -5)) =\nA: 123\n\nQ: ((-9 * 7 * 7 * -9) + (4 * -9 - 8 - -4)) =\nA: 3929\n\nQ: ((-3 + 5 * 8 * -4) - (9 - 8 * -7 + -9)) =\nA: -219\n\nQ: {{input}}\nA:"
"include": "_fewshot_template_yaml"
"task": "bbh_fewshot_multistep_arithmetic_two"
dataset_name: multistep_arithmetic_two
description: 'Solve multi-step arithmetic problems.

  '
doc_to_text: 'Q: {{input}}

  A:'
fewshot_config:
  sampler: first_n
  samples:
  - input: ((-5 + 9 * -4 - 0) * (4 + -7 + 0 * -5)) =
    target: '123'
  - input: ((-9 * 7 * 7 * -9) + (4 * -9 - 8 - -4)) =
    target: '3929'
  - input: ((-3 + 5 * 8 * -4) - (9 - 8 * -7 + -9)) =
    target: '-219'
include: _fewshot_template_yaml
task: bbh_fewshot_multistep_arithmetic_two
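The arithmetic targets above are easy to re-derive. This is a minimal sketch for reviewers, not harness code: `evaluate` is a hypothetical helper that parses the expression with Python's `ast` module and allows only integer literals, unary minus, and `+`, `-`, `*`, which covers the three samples shown.

```python
# Hypothetical checker for the multistep_arithmetic_two few-shot targets.
import ast
import operator

# Only the operators that appear in the samples are allowed.
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def evaluate(expr: str) -> int:
    """Evaluate an integer expression such as '((-5 + 9 * -4 - 0) * 3) ='."""
    # Drop the trailing '=' that the task appends to every input.
    cleaned = expr.strip().rstrip("=").strip()
    tree = ast.parse(cleaned, mode="eval")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression node: {type(node).__name__}")

    return walk(tree)


if __name__ == "__main__":
    print(evaluate("((-5 + 9 * -4 - 0) * (4 + -7 + 0 * -5)) ="))
```

Walking the syntax tree instead of calling `eval` keeps the check restricted to plain arithmetic, so a malformed sample raises an error rather than executing arbitrary code.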
lm_eval/tasks/bbh/fewshot/navigate.yaml
"dataset_name": "navigate"
"description": "Given a series of navigation instructions, determine whether one would end up back at the starting point.\n\n"
"doc_to_text": "Q: If you follow these instructions, do you return to the starting point? Turn left. Turn around. Turn left. Take 7 steps. Take 2 steps. Take 4 steps. Take 8 steps.\nOptions:\n- Yes\n- No\nA: No\n\nQ: If you follow these instructions, do you return to the starting point? Turn around. Take 1 step. Take 6 steps. Turn around. Take 6 steps. Take 9 steps. Take 1 step.\nOptions:\n- Yes\n- No\nA: No\n\nQ: If you follow these instructions, do you return to the starting point? Always face forward. Take 2 steps right. Take 9 steps left. Take 7 steps right.\nOptions:\n- Yes\n- No\nA: Yes\n\nQ: {{input}}\nA:"
"include": "_fewshot_template_yaml"
"task": "bbh_fewshot_navigate"
dataset_name: navigate
description: 'Given a series of navigation instructions, determine whether one would end up back at the starting point.

  '
doc_to_text: 'Q: {{input}}

  A:'
fewshot_config:
  sampler: first_n
  samples:
  - input: 'If you follow these instructions, do you return to the starting point? Turn left. Turn around. Turn left. Take 7 steps. Take 2 steps. Take 4 steps. Take 8 steps.

      Options:

      - Yes

      - No'
    target: 'No'
  - input: 'If you follow these instructions, do you return to the starting point? Turn around. Take 1 step. Take 6 steps. Turn around. Take 6 steps. Take 9 steps. Take 1 step.

      Options:

      - Yes

      - No'
    target: 'No'
  - input: 'If you follow these instructions, do you return to the starting point? Always face forward. Take 2 steps right. Take 9 steps left. Take 7 steps right.

      Options:

      - Yes

      - No'
    target: 'Yes'
include: _fewshot_template_yaml
task: bbh_fewshot_navigate
lm_eval/tasks/bbh/fewshot/object_counting.yaml
"dataset_name": "object_counting"
"description": "Questions that involve enumerating objects and asking the model to count them.\n\n"
"doc_to_text": "Q: I have a blackberry, a clarinet, a nectarine, a plum, a strawberry, a banana, a flute, an orange, and a violin. How many fruits do I have?\nA: 6\n\nQ: I have an orange, a raspberry, two peaches, a blackberry, an apple, a grape, a nectarine, and three plums. How many fruits do I have?\nA: 11\n\nQ: I have a lettuce head, a head of broccoli, an onion, a stalk of celery, two carrots, a garlic, and a yam. How many vegetables do I have?\nA: 8\n\nQ: {{input}}\nA:"
"include": "_fewshot_template_yaml"
"task": "bbh_fewshot_object_counting"
dataset_name: object_counting
description: 'Questions that involve enumerating objects and asking the model to count them.

  '
doc_to_text: 'Q: {{input}}

  A:'
fewshot_config:
  sampler: first_n
  samples:
  - input: I have a blackberry, a clarinet, a nectarine, a plum, a strawberry, a banana,
      a flute, an orange, and a violin. How many fruits do I have?
    target: '6'
  - input: I have an orange, a raspberry, two peaches, a blackberry, an apple, a grape,
      a nectarine, and three plums. How many fruits do I have?
    target: '11'
  - input: I have a lettuce head, a head of broccoli, an onion, a stalk of celery,
      two carrots, a garlic, and a yam. How many vegetables do I have?
    target: '8'
include: _fewshot_template_yaml
task: bbh_fewshot_object_counting
lm_eval/tasks/bbh/fewshot/penguins_in_a_table.yaml
"dataset_name": "penguins_in_a_table"
"description": "Answer questions about a table of penguins and their attributes.\n\n"
"doc_to_text": "Q: Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. We now add a penguin to the table:\nJames, 12, 90, 12\nHow many penguins are less than 8 years old?\nOptions:\n(A) 1\n(B) 2\n(C) 3\n(D) 4\n(E) 5\nA: (B)\n\nQ: Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. Which is the youngest penguin?\nOptions:\n(A) Louis\n(B) Bernard\n(C) Vincent\n(D) Gwen\n(E) James\nA: (B)\n\nQ: Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. What is the name of the second penguin sorted by alphabetic order?\nOptions:\n(A) Louis\n(B) Bernard\n(C) Vincent\n(D) Gwen\n(E) James\nA: (D)\n\nQ: {{input}}\nA:"
"include": "_fewshot_template_yaml"
"task": "bbh_fewshot_penguins_in_a_table"
dataset_name: penguins_in_a_table
description: 'Answer questions about a table of penguins and their attributes.

  '
doc_to_text: 'Q: {{input}}

  A:'
fewshot_config:
  sampler: first_n
  samples:
  - input: 'Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. We now add a penguin to the table:

      James, 12, 90, 12

      How many penguins are less than 8 years old?

      Options:

      (A) 1

      (B) 2

      (C) 3

      (D) 4

      (E) 5'
    target: (B)
  - input: 'Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. Which is the youngest penguin?

      Options:

      (A) Louis

      (B) Bernard

      (C) Vincent

      (D) Gwen

      (E) James'
    target: (B)
  - input: 'Here is a table where the first line is a header and each subsequent line is a penguin: name, age, height (cm), weight (kg) Louis, 7, 50, 11 Bernard, 5, 80, 13 Vincent, 9, 60, 11 Gwen, 8, 70, 15 For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. What is the name of the second penguin sorted by alphabetic order?

      Options:

      (A) Louis

      (B) Bernard

      (C) Vincent

      (D) Gwen

      (E) James'
    target: (D)
include: _fewshot_template_yaml
task: bbh_fewshot_penguins_in_a_table