gaoqiong / lm-evaluation-harness / Commit bf26d979 (unverified)

Commit bf26d979, authored Nov 28, 2023 by Lintang Sutawika; committed by GitHub on Nov 28, 2023.

Merge pull request #1029 from EleutherAI/bbh-fixup

[Refactor] BBH fixup

Parents: e7afee52, 3b9640b8

Changes: 114

Showing 20 changed files with 41 additions and 39 deletions.

- lm_eval/tasks/bbh/cot_zeroshot/logical_deduction_seven_objects.yaml (+2 -2)
- lm_eval/tasks/bbh/cot_zeroshot/logical_deduction_three_objects.yaml (+2 -2)
- lm_eval/tasks/bbh/cot_zeroshot/movie_recommendation.yaml (+2 -2)
- lm_eval/tasks/bbh/cot_zeroshot/multistep_arithmetic_two.yaml (+2 -2)
- lm_eval/tasks/bbh/cot_zeroshot/navigate.yaml (+2 -2)
- lm_eval/tasks/bbh/cot_zeroshot/object_counting.yaml (+2 -2)
- lm_eval/tasks/bbh/cot_zeroshot/penguins_in_a_table.yaml (+2 -2)
- lm_eval/tasks/bbh/cot_zeroshot/reasoning_about_colored_objects.yaml (+2 -2)
- lm_eval/tasks/bbh/cot_zeroshot/ruin_names.yaml (+2 -2)
- lm_eval/tasks/bbh/cot_zeroshot/salient_translation_error_detection.yaml (+2 -2)
- lm_eval/tasks/bbh/cot_zeroshot/snarks.yaml (+2 -2)
- lm_eval/tasks/bbh/cot_zeroshot/sports_understanding.yaml (+2 -2)
- lm_eval/tasks/bbh/cot_zeroshot/temporal_sequences.yaml (+2 -2)
- lm_eval/tasks/bbh/cot_zeroshot/tracking_shuffled_objects_five_objects.yaml (+2 -2)
- lm_eval/tasks/bbh/cot_zeroshot/tracking_shuffled_objects_seven_objects.yaml (+2 -2)
- lm_eval/tasks/bbh/cot_zeroshot/tracking_shuffled_objects_three_objects.yaml (+2 -2)
- lm_eval/tasks/bbh/cot_zeroshot/web_of_lies.yaml (+2 -2)
- lm_eval/tasks/bbh/cot_zeroshot/word_sorting.yaml (+2 -2)
- lm_eval/tasks/bbh/fewshot/_fewshot_template_yaml (+3 -1)
- lm_eval/tasks/bbh/fewshot/boolean_expressions.yaml (+2 -2)

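This commit renames task names (for example `bbh_flan_cot_zeroshot_navigate` becomes `bbh_cot_zeroshot_navigate`), so scripts that still pass the old names will need updating. Below is a minimal migration sketch. `migrate_task_name` is a hypothetical helper. It assumes the files on the other pages of this commit, which are not shown here, follow the same pattern of dropping `flan_` after the `bbh_` prefix.

```python
OLD_PREFIX = "bbh_flan_"  # prefix used before this commit (assumed uniform)
NEW_PREFIX = "bbh_"       # prefix after dropping "flan_"


def migrate_task_name(name: str) -> str:
    """Return the post-refactor name for a pre-refactor BBH task or group name.

    Names that do not start with the old prefix are returned unchanged.
    """
    if name.startswith(OLD_PREFIX):
        return NEW_PREFIX + name[len(OLD_PREFIX):]
    return name


if __name__ == "__main__":
    for old in (
        "bbh_flan_cot_zeroshot_navigate",
        "bbh_flan_fewshot_boolean_expressions",
        "bbh_flan_fewshot",  # group name from the fewshot template
    ):
        print(f"{old} -> {migrate_task_name(old)}")
```

Names already in the new form pass through untouched, so running the helper over a mixed list is safe.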
lm_eval/tasks/bbh/flan_cot_zeroshot/logical_deduction_seven_objects.yaml → lm_eval/tasks/bbh/cot_zeroshot/logical_deduction_seven_objects.yaml
 "dataset_name": "logical_deduction_seven_objects"
 "description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_logical_deduction_seven_objects"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_logical_deduction_seven_objects"

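Each per-task file pulls its shared settings from a template through the `include` key, so renaming a template also means updating every `include` that names it. The sketch below models that layering with plain dictionaries. `resolve` and the `TEMPLATES` table are hypothetical. The template contents are assumed to mirror the fewshot template later in this diff, because the cot_zeroshot template itself is not shown on this page. The real harness reads YAML files from disk.

```python
from typing import Any, Dict

# Hypothetical in-memory stand-in for template files on disk; contents assumed.
TEMPLATES: Dict[str, Dict[str, Any]] = {
    "_cot_zeroshot_template_yaml": {
        "dataset_path": "lukaemon/bbh",
        "output_type": "generate_until",
        "test_split": "test",
    },
}


def resolve(config: Dict[str, Any]) -> Dict[str, Any]:
    """Layer a task config over the template named by its "include" key.

    Keys set in the task config override keys from the template.
    Raises KeyError if the included template does not exist.
    """
    config = dict(config)  # do not mutate the caller's dict
    template_name = config.pop("include", None)
    if template_name is None:
        return config
    merged = dict(TEMPLATES[template_name])
    merged.update(config)  # task-level keys win
    return merged


task = {
    "dataset_name": "logical_deduction_seven_objects",
    "include": "_cot_zeroshot_template_yaml",
    "task": "bbh_cot_zeroshot_logical_deduction_seven_objects",
}
print(resolve(task)["dataset_path"])
```

In this sketch, passing the old `include` value (`_flan_cot_zeroshot_template_yaml`) raises `KeyError`. That is the sketch's analogue of a stale reference to a renamed template.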
lm_eval/tasks/bbh/flan_cot_zeroshot/logical_deduction_three_objects.yaml → lm_eval/tasks/bbh/cot_zeroshot/logical_deduction_three_objects.yaml
 "dataset_name": "logical_deduction_three_objects"
 "description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_logical_deduction_three_objects"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_logical_deduction_three_objects"

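Every zero-shot chain-of-thought file in this commit uses the same `doc_to_text` template. The prompt therefore differs between tasks only in `description` and the input. The sketch below shows how the two fields might combine, assuming the description sits directly before the rendered question. The harness renders `{{input}}` with Jinja2; plain `str.replace` stands in for it here so the example needs only the standard library. `build_prompt` is hypothetical, and the sample question is invented.

```python
# Field values copied from logical_deduction_three_objects.yaml in this diff.
DESCRIPTION = (
    "A logical deduction task which requires deducing the order "
    "of a sequence of objects.\n\n"
)
DOC_TO_TEXT = "Q: {{input}}\nA: Let's think step by step.\n"


def build_prompt(doc: dict) -> str:
    """Hypothetical helper: prepend the description to the rendered question."""
    return DESCRIPTION + DOC_TO_TEXT.replace("{{input}}", doc["input"])


print(build_prompt({"input": "Which object is leftmost?"}))
```

The trailing "Let's think step by step." cue is what makes this the chain-of-thought variant. The model continues from the newline after it.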
lm_eval/tasks/bbh/flan_cot_zeroshot/movie_recommendation.yaml → lm_eval/tasks/bbh/cot_zeroshot/movie_recommendation.yaml
 "dataset_name": "movie_recommendation"
 "description": "Recommend movies similar to the given list of movies.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_movie_recommendation"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_movie_recommendation"

lm_eval/tasks/bbh/flan_cot_zeroshot/multistep_arithmetic_two.yaml → lm_eval/tasks/bbh/cot_zeroshot/multistep_arithmetic_two.yaml
 "dataset_name": "multistep_arithmetic_two"
 "description": "Solve multi-step arithmetic problems.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_multistep_arithmetic_two"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_multistep_arithmetic_two"

lm_eval/tasks/bbh/flan_cot_zeroshot/navigate.yaml → lm_eval/tasks/bbh/cot_zeroshot/navigate.yaml
 "dataset_name": "navigate"
 "description": "Given a series of navigation instructions, determine whether one would end up back at the starting point.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_navigate"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_navigate"

lm_eval/tasks/bbh/flan_cot_zeroshot/object_counting.yaml → lm_eval/tasks/bbh/cot_zeroshot/object_counting.yaml
 "dataset_name": "object_counting"
 "description": "Questions that involve enumerating objects and asking the model to count them.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_object_counting"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_object_counting"

lm_eval/tasks/bbh/flan_cot_zeroshot/penguins_in_a_table.yaml → lm_eval/tasks/bbh/cot_zeroshot/penguins_in_a_table.yaml
 "dataset_name": "penguins_in_a_table"
 "description": "Answer questions about a table of penguins and their attributes.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_penguins_in_a_table"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_penguins_in_a_table"

lm_eval/tasks/bbh/flan_cot_zeroshot/reasoning_about_colored_objects.yaml → lm_eval/tasks/bbh/cot_zeroshot/reasoning_about_colored_objects.yaml
 "dataset_name": "reasoning_about_colored_objects"
 "description": "Answer extremely simple questions about the colors of objects on a surface.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_reasoning_about_colored_objects"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_reasoning_about_colored_objects"

lm_eval/tasks/bbh/flan_cot_zeroshot/ruin_names.yaml → lm_eval/tasks/bbh/cot_zeroshot/ruin_names.yaml
 "dataset_name": "ruin_names"
 "description": "Select the humorous edit that 'ruins' the input movie or musical artist name.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_ruin_names"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_ruin_names"

lm_eval/tasks/bbh/flan_cot_zeroshot/salient_translation_error_detection.yaml → lm_eval/tasks/bbh/cot_zeroshot/salient_translation_error_detection.yaml
 "dataset_name": "salient_translation_error_detection"
 "description": "Detect the type of error in an English translation of a German source sentence.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_salient_translation_error_detection"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_salient_translation_error_detection"

lm_eval/tasks/bbh/flan_cot_zeroshot/snarks.yaml → lm_eval/tasks/bbh/cot_zeroshot/snarks.yaml
 "dataset_name": "snarks"
 "description": "Determine which of two sentences is sarcastic.\n\nAccording to Cambridge University Dictionary, sarcasm is \"the use of remarks that clearly mean the opposite of what they say, made in order to hurt someone's feelings or to criticize something in a humorous way.\" Sarcastic sentences often contain satirical or ironic utterances, hyperboles, ambivalent or witty remarks.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_snarks"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_snarks"

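The `snarks` description is the only string in this batch that embeds escaped double quotes (`\"`) alongside `\n` escapes. In a YAML double-quoted scalar both escapes decode the same way they do in a JSON string. That lets `json.loads` act as a standard-library stand-in for a YAML parser, showing the text the model actually receives. This is an illustration only; the harness itself reads these files with a YAML parser.

```python
import json

# The description value as written in snarks.yaml, outer double quotes included.
# A raw string keeps the backslash escapes exactly as they appear in the file.
RAW = r'''"Determine which of two sentences is sarcastic.\n\nAccording to Cambridge University Dictionary, sarcasm is \"the use of remarks that clearly mean the opposite of what they say, made in order to hurt someone's feelings or to criticize something in a humorous way.\" Sarcastic sentences often contain satirical or ironic utterances, hyperboles, ambivalent or witty remarks.\n\n"'''

description = json.loads(RAW)  # \n becomes a newline, \" becomes a plain quote
print(description)
```

The decoded text contains no backslashes and ends with two newlines. Those newlines separate the description from the `Q:` line.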
lm_eval/tasks/bbh/flan_cot_zeroshot/sports_understanding.yaml → lm_eval/tasks/bbh/cot_zeroshot/sports_understanding.yaml
 "dataset_name": "sports_understanding"
 "description": "Determine whether an artificially constructed sentence relating to sports is plausible or not.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_sports_understanding"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_sports_understanding"

lm_eval/tasks/bbh/flan_cot_zeroshot/temporal_sequences.yaml → lm_eval/tasks/bbh/cot_zeroshot/temporal_sequences.yaml
 "dataset_name": "temporal_sequences"
 "description": "Task description: Answer questions about which times certain events could have occurred.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_temporal_sequences"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_temporal_sequences"

lm_eval/tasks/bbh/flan_cot_zeroshot/tracking_shuffled_objects_five_objects.yaml → lm_eval/tasks/bbh/cot_zeroshot/tracking_shuffled_objects_five_objects.yaml
 "dataset_name": "tracking_shuffled_objects_five_objects"
 "description": "A task requiring determining the final positions of a set of objects given their initial positions and a description of a sequence of swaps.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_tracking_shuffled_objects_five_objects"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_tracking_shuffled_objects_five_objects"

lm_eval/tasks/bbh/flan_cot_zeroshot/tracking_shuffled_objects_seven_objects.yaml → lm_eval/tasks/bbh/cot_zeroshot/tracking_shuffled_objects_seven_objects.yaml
 "dataset_name": "tracking_shuffled_objects_seven_objects"
 "description": "A task requiring determining the final positions of a set of objects given their initial positions and a description of a sequence of swaps.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_tracking_shuffled_objects_seven_objects"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_tracking_shuffled_objects_seven_objects"

lm_eval/tasks/bbh/flan_cot_zeroshot/tracking_shuffled_objects_three_objects.yaml → lm_eval/tasks/bbh/cot_zeroshot/tracking_shuffled_objects_three_objects.yaml
 "dataset_name": "tracking_shuffled_objects_three_objects"
 "description": "A task requiring determining the final positions of a set of objects given their initial positions and a description of a sequence of swaps.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_tracking_shuffled_objects_three_objects"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_tracking_shuffled_objects_three_objects"

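The three `tracking_shuffled_objects_*` tasks share one description: recover final positions from initial positions and a sequence of swaps. The bookkeeping that a correct chain of thought has to carry out reduces to a few dictionary swaps, sketched below. `track_swaps` and the names and objects in the example are hypothetical. Nothing here is part of the harness or the dataset.

```python
from typing import Dict, List, Tuple


def track_swaps(initial: Dict[str, str], swaps: List[Tuple[str, str]]) -> Dict[str, str]:
    """Apply pairwise swaps in order to a holder -> object map and return the final map."""
    state = dict(initial)  # copy so the caller's dict is not mutated
    for a, b in swaps:
        state[a], state[b] = state[b], state[a]
    return state


start = {"Alice": "red ball", "Bob": "blue ball", "Claire": "green ball"}
trades = [("Alice", "Bob"), ("Bob", "Claire"), ("Alice", "Claire")]
print(track_swaps(start, trades))
# -> {'Alice': 'red ball', 'Bob': 'green ball', 'Claire': 'blue ball'}
```

The three-, five- and seven-object variants differ only in the number of holders and swaps, and the same loop covers all three.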
lm_eval/tasks/bbh/flan_cot_zeroshot/web_of_lies.yaml → lm_eval/tasks/bbh/cot_zeroshot/web_of_lies.yaml
 "dataset_name": "web_of_lies"
 "description": "Evaluate a random boolean function expressed as a word problem.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_web_of_lies"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_web_of_lies"

lm_eval/tasks/bbh/flan_cot_zeroshot/word_sorting.yaml → lm_eval/tasks/bbh/cot_zeroshot/word_sorting.yaml
 "dataset_name": "word_sorting"
 "description": "Sort a list of words.\n\n"
 "doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
-"include": "_flan_cot_zeroshot_template_yaml"
-"task": "bbh_flan_cot_zeroshot_word_sorting"
+"include": "_cot_zeroshot_template_yaml"
+"task": "bbh_cot_zeroshot_word_sorting"

lm_eval/tasks/bbh/flan_fewshot/_flan_fewshot_template_yaml → lm_eval/tasks/bbh/fewshot/_fewshot_template_yaml
-group: bbh_flan_fewshot
+group: bbh_fewshot
 dataset_path: lukaemon/bbh
 output_type: generate_until
 test_split: test
...
@@ -12,5 +12,7 @@ metric_list:
generation_kwargs:
  until:
    - "</s>"
    - "Q"
    - "\n\n"
  do_sample: false
  temperature: 0.0

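The template lists three `until` stop strings for `generate_until` tasks. The sketch below applies such stop strings to a finished completion, cutting at whichever occurs first. `truncate_at_stop` is hypothetical. The harness or the model backend may apply stop sequences differently, for example during decoding rather than afterwards.

```python
from typing import Sequence

# Stop strings from generation_kwargs.until in _fewshot_template_yaml.
UNTIL = ["</s>", "Q", "\n\n"]


def truncate_at_stop(text: str, until: Sequence[str]) -> str:
    """Return text up to (not including) the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in until:
        idx = text.find(stop)
        if idx != -1 and idx < cut:
            cut = idx
    return text[:cut]


print(repr(truncate_at_stop("False\n\nQ: not True is\nA:", UNTIL)))  # 'False'
```

Matching is by substring, so the bare `"Q"` entry also stops at any capital Q inside an answer. In this sketch, `truncate_at_stop("Quite likely True", UNTIL)` returns an empty string.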
lm_eval/tasks/bbh/flan_fewshot/boolean_expressions.yaml → lm_eval/tasks/bbh/fewshot/boolean_expressions.yaml
 "dataset_name": "boolean_expressions"
 "description": "Evaluate the result of a random Boolean expression.\n\n"
 "doc_to_text": "Q: not ( ( not not True ) ) is\nA: False\n\nQ: True and False and not True and True is\nA: False\n\nQ: not not ( not ( False ) ) is\nA: True\n\nQ: {{input}}\nA:"
-"include": "_flan_fewshot_template_yaml"
-"task": "bbh_flan_fewshot_boolean_expressions"
+"include": "_fewshot_template_yaml"
+"task": "bbh_fewshot_boolean_expressions"

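The three exemplars hard-coded in the `doc_to_text` of boolean_expressions.yaml are valid Python Boolean expressions, so their answers can be checked mechanically. The sketch below assumes only the literals `True`/`False`, the operators `not`/`and`/`or`, and parentheses appear, as in the exemplars shown. `evaluate` is hypothetical, and `eval` should only ever be used on trusted strings like these.

```python
# (expression, expected answer) pairs copied from the doc_to_text exemplars.
EXEMPLARS = [
    ("not ( ( not not True ) )", False),
    ("True and False and not True and True", False),
    ("not not ( not ( False ) )", True),
]


def evaluate(expr: str) -> bool:
    """Evaluate a trusted Boolean expression with builtins disabled."""
    return bool(eval(expr, {"__builtins__": {}}, {}))


for expr, expected in EXEMPLARS:
    assert evaluate(expr) == expected, expr
print("all exemplar answers check out")
```

`True` and `False` are keywords in Python 3, so they still resolve with builtins disabled.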