gaoqiong / lm-evaluation-harness · Commits

Commit 66421b57
authored Dec 08, 2023 by lintangsutawika
add prompt variation
parent 55eff889
Changes: 76 files total; showing 20 changed files with 97 additions and 0 deletions (+97 −0).
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_01/zeroshot/sports_understanding.yaml (+6 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_01/zeroshot/temporal_sequences.yaml (+4 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_01/zeroshot/tracking_shuffled_objects_five_objects.yaml (+4 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_01/zeroshot/tracking_shuffled_objects_seven_objects.yaml (+4 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_01/zeroshot/tracking_shuffled_objects_three_objects.yaml (+4 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_01/zeroshot/web_of_lies.yaml (+6 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/_zeroshot_template_yaml (+12 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/boolean_expressions.yaml (+6 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/causal_judgement.yaml (+4 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/date_understanding.yaml (+4 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/disambiguation_qa.yaml (+4 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/formal_fallacies.yaml (+6 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/geometric_shapes.yaml (+4 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/hyperbaton.yaml (+4 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/logical_deduction_five_objects.yaml (+4 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/logical_deduction_seven_objects.yaml (+4 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/logical_deduction_three_objects.yaml (+4 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/movie_recommendation.yaml (+5 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/navigate.yaml (+4 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/penguins_in_a_table.yaml (+4 −0)
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_01/zeroshot/sports_understanding.yaml (new file, mode 100644)

"dataset_name": "sports_understanding"
"description": "Determine whether an artificially constructed sentence relating to sports is plausible or not.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_01_zeroshot_sports_understanding"
"doc_to_target": target
"doc_to_choice": ["yes", "no"]
\ No newline at end of file
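The `doc_to_choice` list above, combined with the template's `output_type: multiple_choice`, means each answer string ("yes" / "no") is scored by model log-likelihood and the highest-scoring choice is the prediction. A minimal sketch of that selection step, with illustrative names rather than the harness's actual internals:

```python
# Hedged sketch: accuracy for one multiple-choice document.
# loglikelihoods holds log P(choice | prompt) for each entry of
# doc_to_choice; gold_index points at the correct choice.

def score_multiple_choice(loglikelihoods, gold_index):
    """Return 1.0 if the argmax choice matches the gold choice, else 0.0."""
    pred = max(range(len(loglikelihoods)), key=lambda i: loglikelihoods[i])
    return 1.0 if pred == gold_index else 0.0

# Example: "yes" (index 0) is gold and scores higher than "no".
acc = score_multiple_choice([-1.2, -2.5], gold_index=0)
```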
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_01/zeroshot/temporal_sequences.yaml (new file, mode 100644)

"dataset_name": "temporal_sequences"
"description": "Task description: Answer questions about which times certain events could have occurred.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_01_zeroshot_temporal_sequences"
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_01/zeroshot/tracking_shuffled_objects_five_objects.yaml (new file, mode 100644)

"dataset_name": "tracking_shuffled_objects_five_objects"
"description": "A task requiring determining the final positions of a set of objects given their initial positions and a description of a sequence of swaps.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_01_zeroshot_tracking_shuffled_objects_five_objects"
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_01/zeroshot/tracking_shuffled_objects_seven_objects.yaml (new file, mode 100644)

"dataset_name": "tracking_shuffled_objects_seven_objects"
"description": "A task requiring determining the final positions of a set of objects given their initial positions and a description of a sequence of swaps.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_01_zeroshot_tracking_shuffled_objects_seven_objects"
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_01/zeroshot/tracking_shuffled_objects_three_objects.yaml (new file, mode 100644)

"dataset_name": "tracking_shuffled_objects_three_objects"
"description": "A task requiring determining the final positions of a set of objects given their initial positions and a description of a sequence of swaps.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_01_zeroshot_tracking_shuffled_objects_three_objects"
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_01/zeroshot/web_of_lies.yaml (new file, mode 100644)

"dataset_name": "web_of_lies"
"description": "Evaluate a random boolean function expressed as a word problem.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_01_zeroshot_web_of_lies"
"doc_to_target": target
"doc_to_choice": ["Yes", "No"]
\ No newline at end of file
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/_zeroshot_template_yaml (new file, mode 100644)

group: bbh_alt_pv_02_zeroshot
dataset_path: lukaemon/bbh
output_type: multiple_choice
test_split: test
doc_to_text: !function ../../styles.styles_02
doc_to_target: !function ../../styles.doc_to_target
doc_to_choice: !function ../../styles.doc_to_choice
num_fewshot: 0
metric_list:
  - metric: acc
  - metric: acc_norm
  - metric: brier_score
\ No newline at end of file
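The per-task YAMLs reference this file via `include: _zeroshot_template_yaml`: the template supplies shared defaults, and task-local keys take precedence. A rough dict-merge sketch of that resolution (a hypothetical loader, not the harness's actual code):

```python
# Hedged sketch of how `include:` resolves for these configs.
# Template keys are defaults; keys in the task file override them.

template = {
    "group": "bbh_alt_pv_02_zeroshot",
    "dataset_path": "lukaemon/bbh",
    "output_type": "multiple_choice",
    "test_split": "test",
    "num_fewshot": 0,
}

task_yaml = {
    "dataset_name": "boolean_expressions",
    "include": "_zeroshot_template_yaml",
    "task": "bbh_alt_pv_02_zeroshot_boolean_expressions",
    "doc_to_choice": ["True", "False"],
}

def resolve_include(task_cfg, template_cfg):
    """Merge template defaults under task-local keys (task wins)."""
    merged = dict(template_cfg)
    merged.update({k: v for k, v in task_cfg.items() if k != "include"})
    return merged

config = resolve_include(task_yaml, template)
# config["dataset_path"] comes from the template;
# config["dataset_name"] comes from the task file.
```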
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/boolean_expressions.yaml (new file, mode 100644)

"dataset_name": "boolean_expressions"
"description": "Evaluate the result of a random Boolean expression.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_02_zeroshot_boolean_expressions"
"doc_to_target": target
"doc_to_choice": ["True", "False"]
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/causal_judgement.yaml (new file, mode 100644)

"dataset_name": "causal_judgement"
"description": "Answer questions about causal attribution.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_02_zeroshot_causal_judgement"
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/date_understanding.yaml (new file, mode 100644)

"dataset_name": "date_understanding"
"description": "Infer the date from context.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_02_zeroshot_date_understanding"
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/disambiguation_qa.yaml (new file, mode 100644)

"dataset_name": "disambiguation_qa"
"description": "Clarify the meaning of sentences with ambiguous pronouns.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_02_zeroshot_disambiguation_qa"
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/formal_fallacies.yaml (new file, mode 100644)

"dataset_name": "formal_fallacies"
"description": "Distinguish deductively valid arguments from formal fallacies.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_02_zeroshot_formal_fallacies"
"doc_to_target": target
"doc_to_choice": ["valid", "invalid"]
\ No newline at end of file
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/geometric_shapes.yaml (new file, mode 100644)

"dataset_name": "geometric_shapes"
"description": "Name geometric shapes from their SVG paths.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_02_zeroshot_geometric_shapes"
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/hyperbaton.yaml (new file, mode 100644)

"dataset_name": "hyperbaton"
"description": "Order adjectives correctly in English sentences.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_02_zeroshot_hyperbaton"
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/logical_deduction_five_objects.yaml (new file, mode 100644)

"dataset_name": "logical_deduction_five_objects"
"description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_02_zeroshot_logical_deduction_five_objects"
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/logical_deduction_seven_objects.yaml (new file, mode 100644)

"dataset_name": "logical_deduction_seven_objects"
"description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_02_zeroshot_logical_deduction_seven_objects"
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/logical_deduction_three_objects.yaml (new file, mode 100644)

"dataset_name": "logical_deduction_three_objects"
"description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_02_zeroshot_logical_deduction_three_objects"
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/movie_recommendation.yaml (new file, mode 100644)

"dataset_name": "movie_recommendation"
"description": "Recommend movies similar to the given list of movies.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_02_zeroshot_movie_recommendation"
"process_docs": !function ../utils.fix_movie_recommendation
\ No newline at end of file
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/navigate.yaml (new file, mode 100644)

"dataset_name": "navigate"
"description": "Given a series of navigation instructions, determine whether one would end up back at the starting point.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_02_zeroshot_navigate"
lm_eval/tasks/bbh/alternative_worlds/prompt_variation/style_02/zeroshot/penguins_in_a_table.yaml (new file, mode 100644)

"dataset_name": "penguins_in_a_table"
"description": "Answer questions about a table of penguins and their attributes.\n\n"
"include": "_zeroshot_template_yaml"
"task": "bbh_alt_pv_02_zeroshot_penguins_in_a_table"
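The template's `metric_list` includes `brier_score` alongside `acc` and `acc_norm`. A plausible reading of that metric for a multiple-choice task: squared error between the softmax over per-choice log-likelihoods and the one-hot gold label. The sketch below is illustrative only; the harness's exact normalization may differ.

```python
# Hedged sketch of a Brier score over doc_to_choice probabilities.
import math

def brier_score(loglikelihoods, gold_index):
    """Sum of squared differences between softmax probs and one-hot gold."""
    exps = [math.exp(ll) for ll in loglikelihoods]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sum(
        (p - (1.0 if i == gold_index else 0.0)) ** 2
        for i, p in enumerate(probs)
    )

# A confident correct prediction yields a score near 0;
# a confident wrong one approaches 2 for two choices.
score = brier_score([0.0, -5.0], gold_index=0)
```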