gaoqiong / lm-evaluation-harness
Commit 835cc40e
authored Dec 06, 2023 by lintangsutawika
merged latest and added altworld files
parents 8da401e0 c9bbec6e
Changes 430
Showing 20 changed files with 43 additions and 39 deletions
lm_eval/tasks/bbh/fewshot/salient_translation_error_detection.yaml +2 -2
lm_eval/tasks/bbh/fewshot/snarks.yaml +2 -2
lm_eval/tasks/bbh/fewshot/sports_understanding.yaml +2 -2
lm_eval/tasks/bbh/fewshot/temporal_sequences.yaml +2 -2
lm_eval/tasks/bbh/fewshot/tracking_shuffled_objects_five_objects.yaml +2 -2
lm_eval/tasks/bbh/fewshot/tracking_shuffled_objects_seven_objects.yaml +2 -2
lm_eval/tasks/bbh/fewshot/tracking_shuffled_objects_three_objects.yaml +2 -2
lm_eval/tasks/bbh/fewshot/web_of_lies.yaml +2 -2
lm_eval/tasks/bbh/fewshot/word_sorting.yaml +2 -2
lm_eval/tasks/bbh/zeroshot/_zeroshot_template_yaml +5 -1
lm_eval/tasks/bbh/zeroshot/boolean_expressions.yaml +2 -2
lm_eval/tasks/bbh/zeroshot/causal_judgement.yaml +2 -2
lm_eval/tasks/bbh/zeroshot/date_understanding.yaml +2 -2
lm_eval/tasks/bbh/zeroshot/disambiguation_qa.yaml +2 -2
lm_eval/tasks/bbh/zeroshot/dyck_languages.yaml +2 -2
lm_eval/tasks/bbh/zeroshot/formal_fallacies.yaml +2 -2
lm_eval/tasks/bbh/zeroshot/geometric_shapes.yaml +2 -2
lm_eval/tasks/bbh/zeroshot/hyperbaton.yaml +2 -2
lm_eval/tasks/bbh/zeroshot/logical_deduction_five_objects.yaml +2 -2
lm_eval/tasks/bbh/zeroshot/logical_deduction_seven_objects.yaml +2 -2
lm_eval/tasks/bbh/flan_fewshot/salient_translation_error_detection.yaml → lm_eval/tasks/bbh/fewshot/salient_translation_error_detection.yaml
"dataset_name": "salient_translation_error_detection"
"description": "Detect the type of error in an English translation of a German source sentence.\n\n"
"doc_to_text": "Q: The following translations from German to English contain a particular error. That error will be one of the following types: Named Entities: An entity (names, places, locations, etc.) is changed to a different entity. Numerical Values: Numerical values (ordinals or cardinals), dates, and/or units are changed. Modifiers or Adjectives: The modifiers and adjectives pertaining to a noun are changed. Negation or Antonyms: Introduce or remove a negation or change comparatives to their antonyms. Facts: Trivial factual errors not pertaining to the above classes are introduced in the translations. Dropped Content: A significant clause in the translation is removed. Please identify that error. Source: In der Liste der Baudenkmale in Lenzen (Elbe) sind alle Baudenkmale der brandenburgischen Stadt Lenzen (Elbe) und ihrer Ortsteile aufgelistet.\nTranslation: In the list of architectural monuments in Lenzen all architectural monuments of the Brandenburg city of Lenzen and its districts are listed.\nThe translation contains an error pertaining to\nOptions:\n(A) Modifiers or Adjectives\n(B) Numerical Values\n(C) Negation or Antonyms\n(D) Named Entities\n(E) Dropped Content\n(F) Facts\nA: (D)\n\nQ: The following translations from German to English contain a particular error. That error will be one of the following types: Named Entities: An entity (names, places, locations, etc.) is changed to a different entity. Numerical Values: Numerical values (ordinals or cardinals), dates, and/or units are changed. Modifiers or Adjectives: The modifiers and adjectives pertaining to a noun are changed. Negation or Antonyms: Introduce or remove a negation or change comparatives to their antonyms. Facts: Trivial factual errors not pertaining to the above classes are introduced in the translations. Dropped Content: A significant clause in the translation is removed. Please identify that error. Source: Auf dieser Seite sind die Baudenkmäler der oberbayerischen Großen Kreisstadt Landsberg am Lech zusammengestellt.\nTranslation: On this page are compiled the architectural monuments of the town of Landsberg am Lech.\nThe translation contains an error pertaining to\nOptions:\n(A) Modifiers or Adjectives\n(B) Numerical Values\n(C) Negation or Antonyms\n(D) Named Entities\n(E) Dropped Content\n(F) Facts\nA: (E)\n\nQ: The following translations from German to English contain a particular error. That error will be one of the following types: Named Entities: An entity (names, places, locations, etc.) is changed to a different entity. Numerical Values: Numerical values (ordinals or cardinals), dates, and/or units are changed. Modifiers or Adjectives: The modifiers and adjectives pertaining to a noun are changed. Negation or Antonyms: Introduce or remove a negation or change comparatives to their antonyms. Facts: Trivial factual errors not pertaining to the above classes are introduced in the translations. Dropped Content: A significant clause in the translation is removed. Please identify that error. Source: Łeba ist eine Kleinstadt und ein Badeort im Powiat Lęborski der polnischen Woiwodschaft Pommern.\nTranslation: Eba is not a small town and seaside resort in the Powiat Léborski county of the Pomeranian Voivodeship of Poland.\nThe translation contains an error pertaining to\nOptions:\n(A) Modifiers or Adjectives\n(B) Numerical Values\n(C) Negation or Antonyms\n(D) Named Entities\n(E) Dropped Content\n(F) Facts\nA: (C)\n\nQ: {{input}}\nA:"
-"include": "_flan_fewshot_template_yaml"
-"task": "bbh_flan_fewshot_salient_translation_error_detection"
+"include": "_fewshot_template_yaml"
+"task": "bbh_fewshot_salient_translation_error_detection"
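The same rename pattern repeats in every file of this commit: the `flan_` segment is dropped from the directory, the `include` target, and the `task` name. A minimal Python sketch of that mapping follows; `strip_flan` is a hypothetical helper written for illustration and is not part of lm-evaluation-harness.

```python
def strip_flan(name: str) -> str:
    """Remove every 'flan_' segment from a task name, include target, or path.

    Hypothetical helper: it only mirrors the textual rename visible in this diff.
    """
    return name.replace("flan_", "")


# Old values taken from the removed lines of the diff.
old_task = "bbh_flan_fewshot_salient_translation_error_detection"
old_include = "_flan_fewshot_template_yaml"
old_path = "lm_eval/tasks/bbh/flan_zeroshot/_flan_zeroshot_template_yaml"

print(strip_flan(old_task))
print(strip_flan(old_include))
print(strip_flan(old_path))
```

Anything that still refers to a `bbh_flan_*` task name would presumably need the same substitution after this commit.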
lm_eval/tasks/bbh/flan_fewshot/snarks.yaml → lm_eval/tasks/bbh/fewshot/snarks.yaml
"dataset_name": "snarks"
"description": "Determine which of two sentences is sarcastic.\n\nAccording to Cambridge University Dictionary, sarcasm is \"the use of remarks that clearly mean the opposite of what they say, made in order to hurt someone's feelings or to criticize something in a humorous way.\" Sarcastic sentences often contain satirical or ironic utterances, hyperboles, ambivalent or witty remarks.\n\n"
"doc_to_text": "Q: Which statement is sarcastic?\nOptions:\n(A) Yes, because having interests and actively researching them is a huge waste\n(B) Yes, because having interests and actively researching them is a huge deal\nA: (A)\n\nQ: Which statement is sarcastic?\nOptions:\n(A) No one is going to disagree with you on this. Avoiding ad hominem attacks really help your case\n(B) No one is going to disagree with you on this. Ad hominem attacks really help your case\nA: (B)\n\nQ: Which statement is sarcastic?\nOptions:\n(A) Consistency in the league's punishments? What do you think this is supposed to be, politics?\n(B) Consistency in the league's punishments? What do you think this is supposed to be, moral?\nA: (A)\n\nQ: {{input}}\nA:"
-"include": "_flan_fewshot_template_yaml"
-"task": "bbh_flan_fewshot_snarks"
+"include": "_fewshot_template_yaml"
+"task": "bbh_fewshot_snarks"
lm_eval/tasks/bbh/flan_fewshot/sports_understanding.yaml → lm_eval/tasks/bbh/fewshot/sports_understanding.yaml
"dataset_name": "sports_understanding"
"description": "Determine whether an artificially constructed sentence relating to sports is plausible or not.\n\n"
"doc_to_text": "Q: Is the following sentence plausible? \"Bam Adebayo scored a reverse layup in the Western Conference Finals.\"\nA: yes\n\nQ: Is the following sentence plausible? \"Santi Cazorla scored a touchdown.\"\nA: no\n\nQ: Is the following sentence plausible? \"DeMar DeRozan was called for the goal tend.\"\nA: yes\n\nQ: {{input}}\nA:"
-"include": "_flan_fewshot_template_yaml"
-"task": "bbh_flan_fewshot_sports_understanding"
+"include": "_fewshot_template_yaml"
+"task": "bbh_fewshot_sports_understanding"
lm_eval/tasks/bbh/flan_fewshot/temporal_sequences.yaml → lm_eval/tasks/bbh/fewshot/temporal_sequences.yaml
"dataset_name": "temporal_sequences"
"description": "Task description: Answer questions about which times certain events could have occurred.\n\n"
"doc_to_text": "Q: Today, Emily went to the museum. Between what times could they have gone?\nWe know that:\nEmily woke up at 1pm.\nElizabeth saw Emily reading at the library from 2pm to 4pm.\nJessica saw Emily watching a movie at the theater from 4pm to 5pm.\nLeslie saw Emily waiting at the airport from 5pm to 6pm.\nWilliam saw Emily buying clothes at the mall from 6pm to 7pm.\nThe museum was closed after 7pm.\nBetween what times could Emily have gone to the museum?\nOptions:\n(A) 1pm to 2pm\n(B) 6pm to 7pm\n(C) 5pm to 6pm\n(D) 2pm to 4pm\nA: (A)\n\nQ: Today, Elizabeth went to the amusement park. Between what times could they have gone?\nWe know that:\nElizabeth woke up at 7am.\nDavid saw Elizabeth fixing their computer at the electronic store from 1pm to 2pm.\nSarah saw Elizabeth playing tennis at the tennis court from 2pm to 3pm.\nSusan saw Elizabeth walking towards the Statue of Liberty from 3pm to 6pm.\nAndrew saw Elizabeth taking photos near the Eiffel Tower from 6pm to 9pm.\nEmily saw Elizabeth getting a coffee at the cafe from 9pm to 10pm.\nThe amusement park was closed after 10pm.\nBetween what times could Elizabeth have gone to the amusement park?\nOptions:\n(A) 7am to 1pm\n(B) 9pm to 10pm\n(C) 1pm to 2pm\n(D) 3pm to 6pm\nA: (A)\n\nQ: Today, Tiffany went to the beach. Between what times could they have gone?\nWe know that:\nTiffany woke up at 5am.\nBetty saw Tiffany getting a coffee at the cafe from 5am to 6am.\nJessica saw Tiffany working at the office from 6am to 9am.\nJohn saw Tiffany stretching at a yoga studio from 9am to 12pm.\nSean saw Tiffany sitting on a rooftop from 12pm to 2pm.\nSarah saw Tiffany playing tennis at the tennis court from 2pm to 3pm.\nThe beach was closed after 4pm.\nBetween what times could Tiffany have gone to the beach?\nOptions:\n(A) 9am to 12pm\n(B) 12pm to 2pm\n(C) 5am to 6am\n(D) 3pm to 4pm\nA: (D)\n\nQ: {{input}}\nA:"
-"include": "_flan_fewshot_template_yaml"
-"task": "bbh_flan_fewshot_temporal_sequences"
+"include": "_fewshot_template_yaml"
+"task": "bbh_fewshot_temporal_sequences"
lm_eval/tasks/bbh/flan_fewshot/tracking_shuffled_objects_five_objects.yaml → lm_eval/tasks/bbh/fewshot/tracking_shuffled_objects_five_objects.yaml
"dataset_name": "tracking_shuffled_objects_five_objects"
"description": "A task requiring determining the final positions of a set of objects given their initial positions and a description of a sequence of swaps.\n\n"
"doc_to_text": "Q: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a yellow ball, Bob has a blue ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Claire and Alice swap balls. Then, Alice and Bob swap balls. Finally, Claire and Bob swap balls. At the end of the game, Bob has the\nOptions:\n(A) yellow ball\n(B) blue ball\n(C) pink ball\nA: (A)\n\nQ: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a white ball, Bob has a purple ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Bob and Alice swap balls. Then, Bob and Claire swap balls. Finally, Bob and Alice swap balls. At the end of the game, Alice has the\nOptions:\n(A) white ball\n(B) purple ball\n(C) pink ball\nA: (C)\n\nQ: Alice, Bob, and Claire are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Lola, Bob is dancing with Rodrigo, and Claire is dancing with Patrick.\nThroughout the song, the dancers often trade partners. First, Alice and Bob switch partners. Then, Claire and Bob switch partners. Finally, Bob and Alice switch partners. At the end of the dance, Alice is dancing with\nOptions:\n(A) Lola\n(B) Rodrigo\n(C) Patrick\nA: (C)\n\nQ: {{input}}\nA:"
-"include": "_flan_fewshot_template_yaml"
-"task": "bbh_flan_fewshot_tracking_shuffled_objects_five_objects"
+"include": "_fewshot_template_yaml"
+"task": "bbh_fewshot_tracking_shuffled_objects_five_objects"
lm_eval/tasks/bbh/flan_fewshot/tracking_shuffled_objects_seven_objects.yaml → lm_eval/tasks/bbh/fewshot/tracking_shuffled_objects_seven_objects.yaml
"dataset_name": "tracking_shuffled_objects_seven_objects"
"description": "A task requiring determining the final positions of a set of objects given their initial positions and a description of a sequence of swaps.\n\n"
"doc_to_text": "Q: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a yellow ball, Bob has a blue ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Claire and Alice swap balls. Then, Alice and Bob swap balls. Finally, Claire and Bob swap balls. At the end of the game, Bob has the\nOptions:\n(A) yellow ball\n(B) blue ball\n(C) pink ball\nA: (A)\n\nQ: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a white ball, Bob has a purple ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Bob and Alice swap balls. Then, Bob and Claire swap balls. Finally, Bob and Alice swap balls. At the end of the game, Alice has the\nOptions:\n(A) white ball\n(B) purple ball\n(C) pink ball\nA: (C)\n\nQ: Alice, Bob, and Claire are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Lola, Bob is dancing with Rodrigo, and Claire is dancing with Patrick.\nThroughout the song, the dancers often trade partners. First, Alice and Bob switch partners. Then, Claire and Bob switch partners. Finally, Bob and Alice switch partners. At the end of the dance, Alice is dancing with\nOptions:\n(A) Lola\n(B) Rodrigo\n(C) Patrick\nA: (C)\n\nQ: {{input}}\nA:"
-"include": "_flan_fewshot_template_yaml"
-"task": "bbh_flan_fewshot_tracking_shuffled_objects_seven_objects"
+"include": "_fewshot_template_yaml"
+"task": "bbh_fewshot_tracking_shuffled_objects_seven_objects"
lm_eval/tasks/bbh/flan_fewshot/tracking_shuffled_objects_three_objects.yaml → lm_eval/tasks/bbh/fewshot/tracking_shuffled_objects_three_objects.yaml
"dataset_name": "tracking_shuffled_objects_three_objects"
"description": "A task requiring determining the final positions of a set of objects given their initial positions and a description of a sequence of swaps.\n\n"
"doc_to_text": "Q: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a yellow ball, Bob has a blue ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Claire and Alice swap balls. Then, Alice and Bob swap balls. Finally, Claire and Bob swap balls. At the end of the game, Bob has the\nOptions:\n(A) yellow ball\n(B) blue ball\n(C) pink ball\nA: (A)\n\nQ: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a white ball, Bob has a purple ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Bob and Alice swap balls. Then, Bob and Claire swap balls. Finally, Bob and Alice swap balls. At the end of the game, Alice has the\nOptions:\n(A) white ball\n(B) purple ball\n(C) pink ball\nA: (C)\n\nQ: Alice, Bob, and Claire are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Lola, Bob is dancing with Rodrigo, and Claire is dancing with Patrick.\nThroughout the song, the dancers often trade partners. First, Alice and Bob switch partners. Then, Claire and Bob switch partners. Finally, Bob and Alice switch partners. At the end of the dance, Alice is dancing with\nOptions:\n(A) Lola\n(B) Rodrigo\n(C) Patrick\nA: (C)\n\nQ: {{input}}\nA:"
-"include": "_flan_fewshot_template_yaml"
-"task": "bbh_flan_fewshot_tracking_shuffled_objects_three_objects"
+"include": "_fewshot_template_yaml"
+"task": "bbh_fewshot_tracking_shuffled_objects_three_objects"
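The gold answers in the tracking_shuffled_objects few-shot examples can be sanity-checked by replaying the swaps. The sketch below is only an illustration: `simulate_swaps` is a hypothetical helper, and the three scenarios are transcribed by hand from the prompt text rather than parsed from the YAML.

```python
def simulate_swaps(holdings, swaps):
    """Apply pairwise swaps in order and return the final owner -> item mapping."""
    state = dict(holdings)  # copy so the caller's dict is left untouched
    for a, b in swaps:
        state[a], state[b] = state[b], state[a]
    return state


# Example 1: who ends up with which ball.
game_1 = simulate_swaps(
    {"Alice": "yellow", "Bob": "blue", "Claire": "pink"},
    [("Claire", "Alice"), ("Alice", "Bob"), ("Claire", "Bob")],
)

# Example 2.
game_2 = simulate_swaps(
    {"Alice": "white", "Bob": "purple", "Claire": "pink"},
    [("Bob", "Alice"), ("Bob", "Claire"), ("Bob", "Alice")],
)

# Example 3: dance partners instead of balls.
dance = simulate_swaps(
    {"Alice": "Lola", "Bob": "Rodrigo", "Claire": "Patrick"},
    [("Alice", "Bob"), ("Claire", "Bob"), ("Bob", "Alice")],
)

print(game_1["Bob"], game_2["Alice"], dance["Alice"])
```

The results agree with the answers (A), (C) and (C) given in the prompt.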
lm_eval/tasks/bbh/flan_fewshot/web_of_lies.yaml → lm_eval/tasks/bbh/fewshot/web_of_lies.yaml
"dataset_name": "web_of_lies"
"description": "Evaluate a random boolean function expressed as a word problem.\n\n"
"doc_to_text": "Q: Question: Fidel tells the truth. Jerry says Fidel tells the truth. Vina says Jerry tells the truth. Millicent says Vina lies. Raymond says Millicent lies. Does Raymond tell the truth?\nA: Yes\n\nQ: Question: Kristian lies. Millie says Kristian lies. Maybelle says Millie tells the truth. Fidel says Maybelle lies. Leda says Fidel lies. Does Leda tell the truth?\nA: Yes\n\nQ: Question: Kristian tells the truth. Michaela says Kristian lies. Raymond says Michaela tells the truth. Osvaldo says Raymond tells the truth. Jamey says Osvaldo tells the truth. Does Jamey tell the truth?\nA: No\n\nQ: {{input}}\nA:"
-"include": "_flan_fewshot_template_yaml"
-"task": "bbh_flan_fewshot_web_of_lies"
+"include": "_fewshot_template_yaml"
+"task": "bbh_fewshot_web_of_lies"
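Each web_of_lies question is a chain in which every speaker makes a claim about the previous person, so a speaker is truthful exactly when that claim matches the previous person's status. A minimal sketch under that reading follows; `last_speaker_truthful` is a hypothetical helper, and the chains are encoded by hand from the three few-shot examples.

```python
def last_speaker_truthful(first_is_truthful, claims):
    """Propagate truthfulness along a chain of claims.

    claims[i] is True when speaker i+1 says the previous person tells the
    truth, and False when they say the previous person lies.
    """
    truthful = first_is_truthful
    for says_previous_is_truthful in claims:
        # A speaker tells the truth iff their claim about the previous person is correct.
        truthful = (says_previous_is_truthful == truthful)
    return truthful


# Example 1: Fidel (truthful), then Jerry, Vina, Millicent, Raymond.
raymond = last_speaker_truthful(True, [True, True, False, False])
# Example 2: Kristian (liar), then Millie, Maybelle, Fidel, Leda.
leda = last_speaker_truthful(False, [False, True, False, False])
# Example 3: Kristian (truthful), then Michaela, Raymond, Osvaldo, Jamey.
jamey = last_speaker_truthful(True, [False, True, True, True])

print(raymond, leda, jamey)
```

The three results correspond to the answers Yes, Yes and No in the prompt.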
lm_eval/tasks/bbh/flan_fewshot/word_sorting.yaml → lm_eval/tasks/bbh/fewshot/word_sorting.yaml
"dataset_name": "word_sorting"
"description": "Sort a list of words.\n\n"
"doc_to_text": "Q: Sort the following words alphabetically: List: oven costume counterpart\nA: costume counterpart oven\n\nQ: Sort the following words alphabetically: List: hypochlorite ponderosa phone credulity\nA: credulity hypochlorite phone ponderosa\n\nQ: Sort the following words alphabetically: List: newt arson parthia seismography mugho aspect census\nA: arson aspect census mugho newt parthia seismography\n\nQ: {{input}}\nA:"
-"include": "_flan_fewshot_template_yaml"
-"task": "bbh_flan_fewshot_word_sorting"
+"include": "_fewshot_template_yaml"
+"task": "bbh_fewshot_word_sorting"
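The word_sorting few-shot answers can be reproduced with a plain lexicographic sort. This is a small sketch for checking the examples, with `sort_words` as a hypothetical helper; it assumes lowercase ASCII words, as in the three examples above.

```python
def sort_words(word_list: str) -> str:
    """Sort a space-separated list of words alphabetically and rejoin it."""
    return " ".join(sorted(word_list.split()))


print(sort_words("oven costume counterpart"))
print(sort_words("hypochlorite ponderosa phone credulity"))
print(sort_words("newt arson parthia seismography mugho aspect census"))
```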
lm_eval/tasks/bbh/flan_zeroshot/_flan_zeroshot_template_yaml → lm_eval/tasks/bbh/zeroshot/_zeroshot_template_yaml
-group: bbh_flan_zeroshot
+group: bbh_zeroshot
dataset_path: lukaemon/bbh
output_type: generate_until
test_split: test
...
@@ -12,5 +12,9 @@ metric_list:
generation_kwargs:
  until:
    - "</s>"
    - "Q:"
    - "\n\n"
  do_sample: false
  temperature: 0.0
metadata:
  - version: 0
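The `until` entries under `generation_kwargs` are stop sequences for `generate_until` tasks: the continuation is expected to end where the first of them appears. The sketch below illustrates that idea on a plain string. `truncate_at_stop` is a hypothetical helper written for this note, not the harness's own implementation, and the sample continuation is made up.

```python
def truncate_at_stop(text, until):
    """Cut text at the earliest occurrence of any stop sequence.

    Illustrative only: a real backend may stop during decoding instead of
    trimming afterwards.
    """
    cut = len(text)
    for stop in until:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Stop sequences as listed in the template hunk.
until = ["</s>", "Q:", "\n\n"]

# Made-up raw continuation that runs on into a new question.
raw = " (A)\n\nQ: Which statement is sarcastic?"
print(repr(truncate_at_stop(raw, until)))
```

With `"Q:"` and `"\n\n"` in the list, a model that starts writing a follow-up question is cut off before it, so only the answer text would reach the metric.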
lm_eval/tasks/bbh/flan_zeroshot/boolean_expressions.yaml → lm_eval/tasks/bbh/zeroshot/boolean_expressions.yaml
"dataset_name": "boolean_expressions"
"description": "Evaluate the result of a random Boolean expression.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
-"include": "_flan_zeroshot_template_yaml"
-"task": "bbh_flan_zeroshot_boolean_expressions"
+"include": "_zeroshot_template_yaml"
+"task": "bbh_zeroshot_boolean_expressions"
lm_eval/tasks/bbh/flan_zeroshot/causal_judgement.yaml → lm_eval/tasks/bbh/zeroshot/causal_judgement.yaml
"dataset_name": "causal_judgement"
"description": "Answer questions about causal attribution.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
-"include": "_flan_zeroshot_template_yaml"
-"task": "bbh_flan_zeroshot_causal_judgement"
+"include": "_zeroshot_template_yaml"
+"task": "bbh_zeroshot_causal_judgement"
lm_eval/tasks/bbh/flan_zeroshot/date_understanding.yaml → lm_eval/tasks/bbh/zeroshot/date_understanding.yaml
"dataset_name": "date_understanding"
"description": "Infer the date from context.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
-"include": "_flan_zeroshot_template_yaml"
-"task": "bbh_flan_zeroshot_date_understanding"
+"include": "_zeroshot_template_yaml"
+"task": "bbh_zeroshot_date_understanding"
lm_eval/tasks/bbh/flan_zeroshot/disambiguation_qa.yaml → lm_eval/tasks/bbh/zeroshot/disambiguation_qa.yaml
"dataset_name": "disambiguation_qa"
"description": "Clarify the meaning of sentences with ambiguous pronouns.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
-"include": "_flan_zeroshot_template_yaml"
-"task": "bbh_flan_zeroshot_disambiguation_qa"
+"include": "_zeroshot_template_yaml"
+"task": "bbh_zeroshot_disambiguation_qa"
lm_eval/tasks/bbh/flan_zeroshot/dyck_languages.yaml → lm_eval/tasks/bbh/zeroshot/dyck_languages.yaml
"dataset_name": "dyck_languages"
"description": "Correctly close a Dyck-n word.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
-"include": "_flan_zeroshot_template_yaml"
-"task": "bbh_flan_zeroshot_dyck_languages"
+"include": "_zeroshot_template_yaml"
+"task": "bbh_zeroshot_dyck_languages"
lm_eval/tasks/bbh/flan_zeroshot/formal_fallacies.yaml → lm_eval/tasks/bbh/zeroshot/formal_fallacies.yaml
"dataset_name": "formal_fallacies"
"description": "Distinguish deductively valid arguments from formal fallacies.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
-"include": "_flan_zeroshot_template_yaml"
-"task": "bbh_flan_zeroshot_formal_fallacies"
+"include": "_zeroshot_template_yaml"
+"task": "bbh_zeroshot_formal_fallacies"
lm_eval/tasks/bbh/flan_zeroshot/geometric_shapes.yaml → lm_eval/tasks/bbh/zeroshot/geometric_shapes.yaml
"dataset_name": "geometric_shapes"
"description": "Name geometric shapes from their SVG paths.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
-"include": "_flan_zeroshot_template_yaml"
-"task": "bbh_flan_zeroshot_geometric_shapes"
+"include": "_zeroshot_template_yaml"
+"task": "bbh_zeroshot_geometric_shapes"
lm_eval/tasks/bbh/flan_zeroshot/hyperbaton.yaml → lm_eval/tasks/bbh/zeroshot/hyperbaton.yaml
"dataset_name": "hyperbaton"
"description": "Order adjectives correctly in English sentences.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
-"include": "_flan_zeroshot_template_yaml"
-"task": "bbh_flan_zeroshot_hyperbaton"
+"include": "_zeroshot_template_yaml"
+"task": "bbh_zeroshot_hyperbaton"
lm_eval/tasks/bbh/flan_zeroshot/logical_deduction_five_objects.yaml → lm_eval/tasks/bbh/zeroshot/logical_deduction_five_objects.yaml
"dataset_name": "logical_deduction_five_objects"
"description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
-"include": "_flan_zeroshot_template_yaml"
-"task": "bbh_flan_zeroshot_logical_deduction_five_objects"
+"include": "_zeroshot_template_yaml"
+"task": "bbh_zeroshot_logical_deduction_five_objects"
lm_eval/tasks/bbh/flan_zeroshot/logical_deduction_seven_objects.yaml → lm_eval/tasks/bbh/zeroshot/logical_deduction_seven_objects.yaml
"dataset_name": "logical_deduction_seven_objects"
"description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
-"include": "_flan_zeroshot_template_yaml"
-"task": "bbh_flan_zeroshot_logical_deduction_seven_objects"
+"include": "_zeroshot_template_yaml"
+"task": "bbh_zeroshot_logical_deduction_seven_objects"