gaoqiong / lm-evaluation-harness

Commit 06d3406e ("update"), authored Sep 04, 2023 by lintangsutawika. Parent: f23ae748.

129 changed files in total; this page shows 20 of them, with 113 additions and 180 deletions (+113 −180).
lm_eval/benchmarks/flan/prompt_templates/flan_anli.yaml (+18 −18)
lm_eval/benchmarks/flan/prompt_templates/flan_bbh.yaml (+0 −29)
lm_eval/benchmarks/flan/yaml_templates/held_in_template_yaml (+1 −1)
lm_eval/benchmarks/flan_held_in.yaml (+10 −10)
lm_eval/benchmarks/flan_held_out.yaml (+2 −2)
lm_eval/benchmarks/t0_eval.yaml (+77 −68)
lm_eval/tasks/bbh/_generate_configs.py (+3 −0)
lm_eval/tasks/bbh/_template_yaml (+2 −4)
lm_eval/tasks/bbh/boolean_expressions.yaml (+0 −4)
lm_eval/tasks/bbh/causal_judgement.yaml (+0 −4)
lm_eval/tasks/bbh/date_understanding.yaml (+0 −4)
lm_eval/tasks/bbh/disambiguation_qa.yaml (+0 −4)
lm_eval/tasks/bbh/dyck_languages.yaml (+0 −4)
lm_eval/tasks/bbh/formal_fallacies.yaml (+0 −4)
lm_eval/tasks/bbh/geometric_shapes.yaml (+0 −4)
lm_eval/tasks/bbh/hyperbaton.yaml (+0 −4)
lm_eval/tasks/bbh/logical_deduction_five_objects.yaml (+0 −4)
lm_eval/tasks/bbh/logical_deduction_seven_objects.yaml (+0 −4)
lm_eval/tasks/bbh/logical_deduction_three_objects.yaml (+0 −4)
lm_eval/tasks/bbh/movie_recommendation.yaml (+0 −4)
lm_eval/benchmarks/flan/prompt_templates/flan_anli.yaml (+18 −18)

 # Flan Prompt Templates
 prompts:
   "template-0":
-    doc_to_text: "{{context}}\n\nChoose your answer: based on the paragraph above can we conclude that \"{{hypothesis}}\"?\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nI think the answer is"
+    doc_to_text: "{{context}}\n\nChoose your answer: based on the paragraph above can we conclude that \"{{hypothesis}}\"?\n\nOPTIONS:\n- Yes\n- It\'s impossible to say\n- No\nI think the answer is"
-    doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
+    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
   "template-1":
-    doc_to_text: "{{context}}\n\nBased on that paragraph can we conclude that this sentence is true?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
+    doc_to_text: "{{context}}\n\nBased on that paragraph can we conclude that this sentence is true?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It\'s impossible to say\n- No"
-    doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
+    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
   "template-2":
-    doc_to_text: "{{context}}\n\nCan we draw the following conclusion?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
+    doc_to_text: "{{context}}\n\nCan we draw the following conclusion?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It\'s impossible to say\n- No"
-    doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
+    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
   "template-3":
-    doc_to_text: "{{context}}\nDoes this next sentence follow, given the preceding text?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
+    doc_to_text: "{{context}}\nDoes this next sentence follow, given the preceding text?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It\'s impossible to say\n- No"
-    doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
+    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
   "template-4":
-    doc_to_text: "{{context}}\nCan we infer the following?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nThe answer is:"
+    doc_to_text: "{{context}}\nCan we infer the following?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It\'s impossible to say\n- No\nThe answer is:"
-    doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
+    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
   "template-5":
-    doc_to_text: "Read the following paragraph and determine if the hypothesis is true:\n\n{{context}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nHypothesis: {{hypothesis}}\n\n\n"
+    doc_to_text: "Read the following paragraph and determine if the hypothesis is true:\n\n{{context}}\n\nOPTIONS:\n- Yes\n- It\'s impossible to say\n- No\nHypothesis: {{hypothesis}}\n\n\n"
-    doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
+    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
   "template-6":
-    doc_to_text: "Read the text and determine if the sentence is true (see options at the end):\n\n{{context}}\n\nSentence: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
+    doc_to_text: "Read the text and determine if the sentence is true (see options at the end):\n\n{{context}}\n\nSentence: {{hypothesis}}\nOPTIONS:\n- Yes\n- It\'s impossible to say\n- No"
-    doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
+    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
   "template-7":
-    doc_to_text: "Can we draw the following hypothesis from the context (see options)?\n\nContext:\n\n{{context}}\n\nHypothesis: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
+    doc_to_text: "Can we draw the following hypothesis from the context (see options)?\n\nContext:\n\n{{context}}\n\nHypothesis: {{hypothesis}}\nOPTIONS:\n- Yes\n- It\'s impossible to say\n- No"
-    doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
+    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
   "template-8":
-    doc_to_text: "Choose from options: Determine if the sentence is true based on the text below:\n{{hypothesis}}\n\n{{context}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
+    doc_to_text: "Choose from options: Determine if the sentence is true based on the text below:\n{{hypothesis}}\n\n{{context}}\nOPTIONS:\n- Yes\n- It\'s impossible to say\n- No"
-    doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
+    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
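The `doc_to_target` expression above is a Jinja2 index into a fixed answer list. As a plain-Python sketch of what that rendering does (hypothetical helper, not harness code; the 0/1/2 label order for ANLI entailment/neutral/contradiction is an assumption read off the template itself):

```python
# Mirrors the Jinja2 expression:
#   {{['Yes', 'It\'s impossible to say', 'No'][label]}}
# Assumed ANLI label order: 0 = entailment, 1 = neutral, 2 = contradiction.
ANSWERS = ["Yes", "It's impossible to say", "No"]

def doc_to_target(doc):
    # Index the fixed answer list with the document's integer label,
    # exactly as the template's list-index expression does.
    return ANSWERS[doc["label"]]

print(doc_to_target({"label": 1}))  # It's impossible to say
```

The commit's quoting change matters because the old template nested unescaped double quotes inside a double-quoted YAML scalar; switching the inner strings to single quotes avoids that.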
lm_eval/benchmarks/flan/prompt_templates/flan_bbh.yaml (deleted, 100644 → 0, −29)

-# Flan Prompt Templates
-prompts:
-  "template-0":
-    doc_to_text: "{{context}}\n\nChoose your answer: based on the paragraph above can we conclude that \"{{hypothesis}}\"?\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nI think the answer is"
-    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
-  "template-1":
-    doc_to_text: "{{context}}\n\nBased on that paragraph can we conclude that this sentence is true?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
-    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
-  "template-2":
-    doc_to_text: "{{context}}\n\nCan we draw the following conclusion?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
-    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
-  "template-3":
-    doc_to_text: "{{context}}\nDoes this next sentence follow, given the preceding text?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
-    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
-  "template-4":
-    doc_to_text: "{{context}}\nCan we infer the following?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nThe answer is:"
-    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
-  "template-5":
-    doc_to_text: "Read the following paragraph and determine if the hypothesis is true:\n\n{{context}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nHypothesis: {{hypothesis}}\n\n\n"
-    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
-  "template-6":
-    doc_to_text: "Read the text and determine if the sentence is true (see options at the end):\n\n{{context}}\n\nSentence: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
-    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
-  "template-7":
-    doc_to_text: "Can we draw the following hypothesis from the context (see options)?\n\nContext:\n\n{{context}}\n\nHypothesis: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
-    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
-  "template-8":
-    doc_to_text: "Choose from options: Determine if the sentence is true based on the text below:\n{{hypothesis}}\n\n{{context}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
-    doc_to_target: "{{['Yes', 'It\'s impossible to say', 'No'][label]}}"
lm_eval/benchmarks/flan/yaml_templates/held_in_template_yaml (+1 −1)

@@ -8,6 +8,6 @@ metric_list:
     ignore_punctuation: true
 generation_kwargs:
   until:
-    - "\n\n"
+    - "</s>"
   do_sample: false
   temperature: 0.0
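The change above swaps the generation stop sequence from a blank line to the `</s>` end-of-sequence token. A minimal sketch of what an `until` list implies for post-processing, assuming the convention that a continuation is cut at the first occurrence of any stop string (hypothetical helper, not the harness's own code):

```python
def truncate_at_stops(text, stops=("</s>",)):
    # Cut the generated text at the earliest occurrence of any stop string.
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(truncate_at_stops("Yes</s> extra tokens"))  # Yes
```

With the old `"\n\n"` stop, any model that emitted a blank line mid-answer would have been truncated there; stopping on `</s>` only cuts at the model's own end-of-sequence marker.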
lm_eval/benchmarks/flan_held_in.yaml (+10 −10)

@@ -25,13 +25,13 @@ task:
     dataset_path: anli
     use_prompt: flan/prompt_templates/flan_anli.yaml:*
     validation_split: dev_r3
-  # - include: flan/yaml_templates/held_in_template_yaml
-  #   task: ai2_arc
-  #   dataset_path: ARC-Easy
-  #   use_prompt: local:*
-  #   validation_split: validation
-  # - include: flan/yaml_templates/held_in_template_yaml
-  #   task: ai2_arc
-  #   dataset_path: ARC-Challange
-  #   use_prompt: local:*
-  #   validation_split: validation
+  - include: flan/yaml_templates/held_in_template_yaml
+    task: ai2_arc
+    dataset_path: ARC-Easy
+    use_prompt: local:*
+    validation_split: validation
+  - include: flan/yaml_templates/held_in_template_yaml
+    task: ai2_arc
+    dataset_path: ARC-Challange
+    use_prompt: local:*
+    validation_split: validation
lm_eval/benchmarks/flan_held_out.yaml (+2 −2)

 group: flan_held_out
 task:
-  - bbh
-  - mmlu
+  - bbh_flan
+  - mmlu_flan
lm_eval/benchmarks/t0_eval.yaml (+77 −68)

@@ -6,6 +6,7 @@ task:
     use_prompt: promptsource:*
     training_split: train
     validation_split: validation
+    output_type: greedy_until
     metric_list:
       - metric: exact_match
         aggregation: mean
@@ -18,18 +19,6 @@ task:
     use_prompt: promptsource:*
     training_split: train
     validation_split: validation
-    metric_list:
-      - metric: exact_match
-        aggregation: mean
-        higher_is_better: true
-        ignore_case: true
-        ignore_punctuation: true
-  # Natural Language Inference
-  - dataset_path: super_glue
-    dataset_name: cb
-    use_prompt: promptsource:*
-    training_split: train
-    validation_split: validation
     output_type: greedy_until
     metric_list:
       - metric: exact_match
@@ -37,67 +26,86 @@ task:
         higher_is_better: true
         ignore_case: true
         ignore_punctuation: true
-  - dataset_path: super_glue
-    dataset_name: rte
-    use_prompt: promptsource:*
-    training_split: train
-    validation_split: validation
-    metric_list:
-      - metric: exact_match
-        aggregation: mean
-        higher_is_better: true
-        ignore_case: true
-        ignore_punctuation: true
-  - task: anli_r1
-    dataset_path: anli
-    use_prompt: promptsource:*
-    training_split: train_r1
-    validation_split: dev_r1
-    metric_list:
-      - metric: exact_match
-        aggregation: mean
-        higher_is_better: true
-        ignore_case: true
-        ignore_punctuation: true
-  - task: anli_r2
-    dataset_path: anli
-    use_prompt: promptsource:*
-    training_split: train_r2
-    validation_split: dev_r2
-    metric_list:
-      - metric: exact_match
-        aggregation: mean
-        higher_is_better: true
-        ignore_case: true
-        ignore_punctuation: true
-  - task: anli_r3
-    dataset_path: anli
-    use_prompt: promptsource:*
-    training_split: train_r3
-    validation_split: dev_r3
-    metric_list:
-      - metric: exact_match
-        aggregation: mean
-        higher_is_better: true
-        ignore_case: true
-        ignore_punctuation: true
-  # Sentence Completion
-  - dataset_path: super_glue
-    dataset_name: copa
-    use_prompt: promptsource:*
-    training_split: train
-    validation_split: validation
-    metric_list:
-      - metric: exact_match
-        aggregation: mean
-        higher_is_better: true
-        ignore_case: true
-        ignore_punctuation: true
+  # # Natural Language Inference
+  # - dataset_path: super_glue
+  #   dataset_name: cb
+  #   use_prompt: promptsource:*
+  #   training_split: train
+  #   validation_split: validation
+  #   output_type: greedy_until
+  #   metric_list:
+  #     - metric: exact_match
+  #       aggregation: mean
+  #       higher_is_better: true
+  #       ignore_case: true
+  #       ignore_punctuation: true
+  # - dataset_path: super_glue
+  #   dataset_name: rte
+  #   use_prompt: promptsource:*
+  #   training_split: train
+  #   validation_split: validation
+  #   output_type: greedy_until
+  #   metric_list:
+  #     - metric: exact_match
+  #       aggregation: mean
+  #       higher_is_better: true
+  #       ignore_case: true
+  #       ignore_punctuation: true
+  # - task: anli_r1
+  #   dataset_path: anli
+  #   use_prompt: promptsource:*
+  #   training_split: train_r1
+  #   validation_split: dev_r1
+  #   output_type: greedy_until
+  #   metric_list:
+  #     - metric: exact_match
+  #       aggregation: mean
+  #       higher_is_better: true
+  #       ignore_case: true
+  #       ignore_punctuation: true
+  # - task: anli_r2
+  #   dataset_path: anli
+  #   use_prompt: promptsource:*
+  #   training_split: train_r2
+  #   validation_split: dev_r2
+  #   output_type: greedy_until
+  #   metric_list:
+  #     - metric: exact_match
+  #       aggregation: mean
+  #       higher_is_better: true
+  #       ignore_case: true
+  #       ignore_punctuation: true
+  # - task: anli_r3
+  #   dataset_path: anli
+  #   use_prompt: promptsource:*
+  #   training_split: train_r3
+  #   validation_split: dev_r3
+  #   output_type: greedy_until
+  #   metric_list:
+  #     - metric: exact_match
+  #       aggregation: mean
+  #       higher_is_better: true
+  #       ignore_case: true
+  #       ignore_punctuation: true
+  # # Sentence Completion
+  # - dataset_path: super_glue
+  #   dataset_name: copa
+  #   use_prompt: promptsource:*
+  #   training_split: train
+  #   validation_split: validation
+  #   output_type: greedy_until
+  #   metric_list:
+  #     - metric: exact_match
+  #       aggregation: mean
+  #       higher_is_better: true
+  #       ignore_case: true
+  #       ignore_punctuation: true
   # Natural Language Inference
   - dataset_path: hellaswag
     use_prompt: promptsource:*
     training_split: train
     validation_split: validation
+    output_type: greedy_until
     metric_list:
       - metric: exact_match
         aggregation: mean
@@ -110,6 +118,7 @@ task:
     use_prompt: promptsource:*
     training_split: train
     validation_split: validation
+    output_type: greedy_until
     metric_list:
       - metric: exact_match
         aggregation: mean
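The metric options repeated throughout this file (`exact_match` with `ignore_case: true` and `ignore_punctuation: true`) describe a normalized string comparison. A minimal sketch of what those flags mean, as an illustration rather than the harness's actual metric implementation:

```python
import string

def exact_match(prediction, target, ignore_case=True, ignore_punctuation=True):
    # Normalize both strings per the YAML flags, then compare.
    if ignore_case:
        prediction, target = prediction.lower(), target.lower()
    if ignore_punctuation:
        strip = str.maketrans("", "", string.punctuation)
        prediction, target = prediction.translate(strip), target.translate(strip)
    # Per-example score; "aggregation: mean" averages these over the dataset.
    return float(prediction.strip() == target.strip())

print(exact_match("Yes.", "yes"))  # 1.0
```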
lm_eval/tasks/bbh/_generate_configs.py (+3 −0)

@@ -27,3 +27,6 @@ def main() -> None:
 if __name__ == "__main__":
     main()
+
+# https://raw.githubusercontent.com/suzgunmirac/BIG-Bench-Hard/main/cot-prompts/boolean_expressions.txt
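The per-subtask YAML files deleted below were emitted by `_generate_configs.py`. A hypothetical sketch of what such a generator produces (the helper name and subtask list here are illustrative, not the script's actual code):

```python
# Emit one small YAML per BBH subtask, each including the shared _template_yaml.
SUBTASKS = ["boolean_expressions", "causal_judgement", "date_understanding"]

def render_config(name):
    # Matches the four-line shape of the generated files shown in this commit.
    return (
        "# Generated by _generate_configs.py\n"
        f"dataset_name: {name}\n"
        "include: _template_yaml\n"
        f"task: bbh_{name}\n"
    )

for name in SUBTASKS:
    print(render_config(name))
```

Keeping shared fields in `_template_yaml` and generating only the per-task overrides is why each generated file is just four lines.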
lm_eval/tasks/bbh/_template_yaml (+2 −4)

@@ -2,16 +2,14 @@ group: bbh
 dataset_path: lukaemon/bbh
 output_type: greedy_until
 test_split: test
-doc_to_text: "{{input}}"
+doc_to_text: "Q: {{input}}\nA:"
 doc_to_target: "{{target}}"
 metric_list:
   - metric: exact_match
     aggregation: mean
     higher_is_better: true
-    ignore_case: true
-    ignore_punctuation: false
 generation_kwargs:
   until:
-    - "\n\n"
+    - "</s>"
   do_sample: false
   temperature: 0.0
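The template change above wraps the raw input in a Q/A scaffold. As a plain-Python sketch of the two updated fields, assuming documents expose `input` and `target` keys as the Jinja2 templates imply:

```python
def doc_to_text(doc):
    # New template: "Q: {{input}}\nA:" — prompt ends mid-line so the model
    # completes the answer after "A:".
    return f"Q: {doc['input']}\nA:"

def doc_to_target(doc):
    # Unchanged: "{{target}}".
    return doc["target"]

print(doc_to_text({"input": "not ( True ) is"}))
```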
lm_eval/tasks/bbh/boolean_expressions.yaml (deleted, 100644 → 0)

-# Generated by _generate_configs.py
-dataset_name: boolean_expressions
-include: _template_yaml
-task: bbh_boolean_expressions

lm_eval/tasks/bbh/causal_judgement.yaml (deleted, 100644 → 0)

-# Generated by _generate_configs.py
-dataset_name: causal_judgement
-include: _template_yaml
-task: bbh_causal_judgement

lm_eval/tasks/bbh/date_understanding.yaml (deleted, 100644 → 0)

-# Generated by _generate_configs.py
-dataset_name: date_understanding
-include: _template_yaml
-task: bbh_date_understanding

lm_eval/tasks/bbh/disambiguation_qa.yaml (deleted, 100644 → 0)

-# Generated by _generate_configs.py
-dataset_name: disambiguation_qa
-include: _template_yaml
-task: bbh_disambiguation_qa

lm_eval/tasks/bbh/dyck_languages.yaml (deleted, 100644 → 0)

-# Generated by _generate_configs.py
-dataset_name: dyck_languages
-include: _template_yaml
-task: bbh_dyck_languages

lm_eval/tasks/bbh/formal_fallacies.yaml (deleted, 100644 → 0)

-# Generated by _generate_configs.py
-dataset_name: formal_fallacies
-include: _template_yaml
-task: bbh_formal_fallacies

lm_eval/tasks/bbh/geometric_shapes.yaml (deleted, 100644 → 0)

-# Generated by _generate_configs.py
-dataset_name: geometric_shapes
-include: _template_yaml
-task: bbh_geometric_shapes

lm_eval/tasks/bbh/hyperbaton.yaml (deleted, 100644 → 0)

-# Generated by _generate_configs.py
-dataset_name: hyperbaton
-include: _template_yaml
-task: bbh_hyperbaton

lm_eval/tasks/bbh/logical_deduction_five_objects.yaml (deleted, 100644 → 0)

-# Generated by _generate_configs.py
-dataset_name: logical_deduction_five_objects
-include: _template_yaml
-task: bbh_logical_deduction_five_objects

lm_eval/tasks/bbh/logical_deduction_seven_objects.yaml (deleted, 100644 → 0)

-# Generated by _generate_configs.py
-dataset_name: logical_deduction_seven_objects
-include: _template_yaml
-task: bbh_logical_deduction_seven_objects

lm_eval/tasks/bbh/logical_deduction_three_objects.yaml (deleted, 100644 → 0)

-# Generated by _generate_configs.py
-dataset_name: logical_deduction_three_objects
-include: _template_yaml
-task: bbh_logical_deduction_three_objects

lm_eval/tasks/bbh/movie_recommendation.yaml (deleted, 100644 → 0)

-# Generated by _generate_configs.py
-dataset_name: movie_recommendation
-include: _template_yaml
-task: bbh_movie_recommendation