Project: gaoqiong/lm-evaluation-harness
Commit c06b0d6e, authored Sep 04, 2023 by lintangsutawika
Commit message: add flan_cot_zeroshot
Parent: 13940f1e
Changes: 28 files in total. This page shows 20 changed files, with 117 additions and 0 deletions:

lm_eval/tasks/bbh/flan_cot_zeroshot/_flan_cot_zeroshot_template_yaml (+22, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/boolean_expressions.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/causal_judgement.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/date_understanding.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/disambiguation_qa.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/dyck_languages.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/formal_fallacies.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/geometric_shapes.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/hyperbaton.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/logical_deduction_five_objects.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/logical_deduction_seven_objects.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/logical_deduction_three_objects.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/movie_recommendation.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/multistep_arithmetic_two.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/navigate.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/object_counting.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/penguins_in_a_table.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/reasoning_about_colored_objects.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/ruin_names.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/salient_translation_error_detection.yaml (+5, -0)
lm_eval/tasks/bbh/flan_cot_zeroshot/_flan_cot_zeroshot_template_yaml (new file)

group: bbh_flan_zeroshot
dataset_path: lukaemon/bbh
output_type: greedy_until
test_split: test
doc_to_target: "{{target}}"
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
generation_kwargs:
  until:
    - "</s>"
  do_sample: false
  temperature: 0.0
filter_list:
  - name: "get-answer"
    filter:
      - function: "regex"
        regex_pattern: "(?<=The answer is )(.*)(?=.)"
      - function: "take_first"
(no newline at end of file)
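The `filter_list` above post-processes each generation: a `regex` filter captures the text after "The answer is ", then `take_first` keeps the first match. The sketch below mimics that chain with plain `re` (it is not the harness's filter implementation). Note that the trailing lookahead `(?=.)` uses an unescaped dot, so it asserts only that some character follows the capture rather than a literal period:

```python
import re

# Regex from the template's filter_list. Because (?=.) matches ANY
# single character, the greedy (.*) backtracks just far enough to
# leave one character after the capture.
PATTERN = re.compile(r"(?<=The answer is )(.*)(?=.)")

def extract_answer(generation):
    """Mimic the 'regex' + 'take_first' filter chain: return the
    first capture, or None when the pattern does not match."""
    match = PATTERN.search(generation)
    return match.group(1) if match else None

print(extract_answer("Let's think step by step. The answer is True."))  # True
print(extract_answer("The answer is 42"))  # 4  (the unescaped dot eats the '2')
```

The second call shows why the pattern only behaves as intended when the generation ends with punctuation after the answer.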
lm_eval/tasks/bbh/flan_cot_zeroshot/boolean_expressions.yaml (new file)

"dataset_name": "boolean_expressions"
"description": "Evaluate the result of a random Boolean expression.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_boolean_expressions"
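The `doc_to_text` field is a prompt template in which `{{input}}` is filled from each document. The harness renders such templates with Jinja2; the plain-string substitution below is only an illustrative sketch, and the sample document is hypothetical:

```python
# doc_to_text from boolean_expressions.yaml.
DOC_TO_TEXT = "Q: {{input}}\nA: Let's think step by step.\n"

def render_prompt(doc):
    # Illustrative stand-in for template rendering: substitute the
    # document's "input" field for the {{input}} placeholder.
    return DOC_TO_TEXT.replace("{{input}}", doc["input"])

# Hypothetical document in the shape of a lukaemon/bbh example.
doc = {"input": "not ( True ) and ( True ) is"}
print(render_prompt(doc))
```

The rendered prompt ends with "Let's think step by step.", which is what makes these tasks zero-shot chain-of-thought.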
lm_eval/tasks/bbh/flan_cot_zeroshot/causal_judgement.yaml (new file)

"dataset_name": "causal_judgement"
"description": "Answer questions about causal attribution.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_causal_judgement"
lm_eval/tasks/bbh/flan_cot_zeroshot/date_understanding.yaml (new file)

"dataset_name": "date_understanding"
"description": "Infer the date from context.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_date_understanding"
lm_eval/tasks/bbh/flan_cot_zeroshot/disambiguation_qa.yaml (new file)

"dataset_name": "disambiguation_qa"
"description": "Clarify the meaning of sentences with ambiguous pronouns.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_disambiguation_qa"
lm_eval/tasks/bbh/flan_cot_zeroshot/dyck_languages.yaml (new file)

"dataset_name": "dyck_languages"
"description": "Correctly close a Dyck-n word.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_dyck_languages"
lm_eval/tasks/bbh/flan_cot_zeroshot/formal_fallacies.yaml (new file)

"dataset_name": "formal_fallacies"
"description": "Distinguish deductively valid arguments from formal fallacies.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_formal_fallacies"
lm_eval/tasks/bbh/flan_cot_zeroshot/geometric_shapes.yaml (new file)

"dataset_name": "geometric_shapes"
"description": "Name geometric shapes from their SVG paths.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_geometric_shapes"
lm_eval/tasks/bbh/flan_cot_zeroshot/hyperbaton.yaml (new file)

"dataset_name": "hyperbaton"
"description": "Order adjectives correctly in English sentences.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_hyperbaton"
lm_eval/tasks/bbh/flan_cot_zeroshot/logical_deduction_five_objects.yaml (new file)

"dataset_name": "logical_deduction_five_objects"
"description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_logical_deduction_five_objects"
lm_eval/tasks/bbh/flan_cot_zeroshot/logical_deduction_seven_objects.yaml (new file)

"dataset_name": "logical_deduction_seven_objects"
"description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_logical_deduction_seven_objects"
lm_eval/tasks/bbh/flan_cot_zeroshot/logical_deduction_three_objects.yaml (new file)

"dataset_name": "logical_deduction_three_objects"
"description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_logical_deduction_three_objects"
lm_eval/tasks/bbh/flan_cot_zeroshot/movie_recommendation.yaml (new file)

"dataset_name": "movie_recommendation"
"description": "Recommend movies similar to the given list of movies.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_movie_recommendation"
lm_eval/tasks/bbh/flan_cot_zeroshot/multistep_arithmetic_two.yaml (new file)

"dataset_name": "multistep_arithmetic_two"
"description": "Solve multi-step arithmetic problems.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_multistep_arithmetic_two"
lm_eval/tasks/bbh/flan_cot_zeroshot/navigate.yaml (new file)

"dataset_name": "navigate"
"description": "Given a series of navigation instructions, determine whether one would end up back at the starting point.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_navigate"
lm_eval/tasks/bbh/flan_cot_zeroshot/object_counting.yaml (new file)

"dataset_name": "object_counting"
"description": "Questions that involve enumerating objects and asking the model to count them.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_object_counting"
lm_eval/tasks/bbh/flan_cot_zeroshot/penguins_in_a_table.yaml (new file)

"dataset_name": "penguins_in_a_table"
"description": "Answer questions about a table of penguins and their attributes.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_penguins_in_a_table"
lm_eval/tasks/bbh/flan_cot_zeroshot/reasoning_about_colored_objects.yaml (new file)

"dataset_name": "reasoning_about_colored_objects"
"description": "Answer extremely simple questions about the colors of objects on a surface.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_reasoning_about_colored_objects"
lm_eval/tasks/bbh/flan_cot_zeroshot/ruin_names.yaml (new file)

"dataset_name": "ruin_names"
"description": "Select the humorous edit that 'ruins' the input movie or musical artist name.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_ruin_names"
lm_eval/tasks/bbh/flan_cot_zeroshot/salient_translation_error_detection.yaml (new file)

"dataset_name": "salient_translation_error_detection"
"description": "Detect the type of error in an English translation of a German source sentence.\n\n"
"doc_to_text": "Q: {{input}}\nA: Let's think step by step.\n"
"include": "_template_yaml"
"task": "bbh_flan_cot_zeroshot_salient_translation_error_detection"
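Each per-task file carries only its own `dataset_name`, `description`, `doc_to_text`, and `task`, and inherits the shared keys from the template via `include`. A minimal sketch of that override semantics as a dict merge (illustrative only; the harness's actual config loader parses and resolves the YAML itself):

```python
# Abridged shared defaults from _flan_cot_zeroshot_template_yaml.
template = {
    "group": "bbh_flan_zeroshot",
    "dataset_path": "lukaemon/bbh",
    "output_type": "greedy_until",
    "test_split": "test",
}

# Abridged keys from one per-task file (navigate.yaml).
task = {
    "dataset_name": "navigate",
    "task": "bbh_flan_cot_zeroshot_navigate",
}

# include semantics: the task file's keys are layered over the
# included template's defaults.
resolved = {**template, **task}
print(resolved["dataset_path"], resolved["task"])
```

This is why adding a new BBH subtask in this commit only costs five lines: everything metric- and generation-related lives once in the template.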