gaoqiong / lm-evaluation-harness · Commits

Unverified commit 0375b792, authored May 23, 2023 by Lintang Sutawika, committed by GitHub on May 23, 2023.

Merge pull request #520 from EleutherAI/update-config

Update config

Parents: c5ed8cdc, eb42b01b
Changes: 29 in total; this page shows 20 changed files with 106 additions and 25 deletions (+106 -25).

  lm_eval/prompts/__init__.py                                      +5   -2
  lm_eval/tasks/__init__.py                                        +1   -1
  lm_eval/tasks/pile/pile_enron.yaml                               +0   -0
  lm_eval/tasks/super_glue/boolq/promptsource-00.yaml              +4   -4
  lm_eval/tasks/super_glue/boolq/promptsource-01.yaml              +5   -0
  lm_eval/tasks/super_glue/boolq/promptsource-02.yaml              +5   -0
  lm_eval/tasks/super_glue/cb/can_we_infer.yaml                    +0   -7
  lm_eval/tasks/super_glue/cb/claim_true_false_inconclusive.yaml   +0   -7
  lm_eval/tasks/super_glue/cb/promptsource-00.yaml                 +4   -4
  lm_eval/tasks/super_glue/cb/promptsource-01.yaml                 +5   -0
  lm_eval/tasks/super_glue/cb/promptsource-02.yaml                 +5   -0
  lm_eval/tasks/super_glue/copa/promptsource-00.yaml               +14  -0
  lm_eval/tasks/super_glue/copa/promptsource-01.yaml               +5   -0
  lm_eval/tasks/super_glue/copa/promptsource-02.yaml               +5   -0
  lm_eval/tasks/super_glue/multirc/promptsource-00.yaml            +14  -0
  lm_eval/tasks/super_glue/multirc/promptsource-01.yaml            +5   -0
  lm_eval/tasks/super_glue/multirc/promptsource-02.yaml            +5   -0
  lm_eval/tasks/super_glue/record/promptsource-00.yaml             +14  -0
  lm_eval/tasks/super_glue/record/promptsource-01.yaml             +5   -0
  lm_eval/tasks/super_glue/record/promptsource-02.yaml             +5   -0
lm_eval/prompts/__init__.py

...
@@ -17,10 +17,13 @@ PROMPT_REGISTRY = {
 def get_prompt(prompt_id: str, dataset_name=None, subset_name=None):
     # unpack prompt name
     category_name, prompt_name = prompt_id.split(":")
-    eval_logger.info(f"Loading prompt from {category_name}")
+    if subset_name is None:
+        dataset_full_name = dataset_name
+    else:
+        dataset_full_name = f"{dataset_name}-{subset_name}"
+    eval_logger.info(f"Loading prompt from {category_name} for {dataset_full_name}")
     if category_name == "promptsource":
         try:
             # prompts = DatasetTemplates(dataset_name, dataset_path)
             if subset_name is None:
                 prompts = DatasetTemplates(dataset_name=dataset_name)
             else:
...
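For orientation, a hedged usage sketch of the updated get_prompt: the function and its signature come from the hunk above, and the prompt string mirrors the boolq config later in this commit, but the call itself is illustrative and not part of the diff.

from lm_eval.prompts import get_prompt

# "promptsource:GPT-3 Style" is split on ":" into category_name="promptsource"
# and prompt_name="GPT-3 Style"; with a subset given, the new code logs the
# dataset as "super_glue-boolq". The argument mapping used here (dataset_name
# = HF dataset, subset_name = HF config) is an assumption.
prompt = get_prompt(
    "promptsource:GPT-3 Style",
    dataset_name="super_glue",
    subset_name="boolq",
)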
lm_eval/tasks/__init__.py

...
@@ -55,7 +55,7 @@ def get_task(task_name, config):
        return TASK_REGISTRY[task_name](config=config)
    except KeyError:
        eval_logger.info("Available tasks:")
        eval_logger.info(TASK_REGISTRY)
        eval_logger.info(ALL_TASKS)
        raise KeyError(f"Missing task {task_name}")
...
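A minimal sketch of the failure path this hunk touches (illustrative; the task name and the empty config are made up):

from lm_eval.tasks import get_task

try:
    task = get_task("no_such_task", config={})
except KeyError as err:
    # On a registry miss, get_task logs "Available tasks:", TASK_REGISTRY,
    # and ALL_TASKS, then re-raises as KeyError("Missing task no_such_task").
    print(err)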
lm_eval/tasks/pile_enron.yaml → lm_eval/tasks/pile/pile_enron.yaml

File moved.
lm_eval/tasks/super_glue/wsc.fixed/template-00.yaml → lm_eval/tasks/super_glue/boolq/promptsource-00.yaml

 group:
-  - t0-eval
-task: "does the pronoun refer to "
+  - super-glue-promptsource
+task: "GPT-3 Style"
 dataset_path: super_glue
-dataset_name: wsc.fixed
+dataset_name: boolq
 training_split: train
 validation_split: validation
-use_prompt: "promptsource: does the pronoun refer to "
+use_prompt: "promptsource: GPT-3 Style"
 metric_list:
   - metric: exact_match
     aggregation: mean
...
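The use_prompt values in these configs name promptsource templates. A hedged sketch of the underlying lookup, assuming the promptsource library's DatasetTemplates API:

from promptsource.templates import DatasetTemplates

# Templates for the boolq subset of super_glue; "GPT-3 Style" is the
# template named by use_prompt in the config above.
templates = DatasetTemplates("super_glue", "boolq")
prompt = templates["GPT-3 Style"]

# prompt.apply(doc) renders one dataset example into an (input, target) pair.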
lm_eval/tasks/super_glue/boolq/promptsource-01.yaml (new file, mode 100644)

include: promptsource-00.yaml
group:
  - super-glue-promptsource
task: "based on the previous passage"
use_prompt: "promptsource:based on the previous passage"
lm_eval/tasks/super_glue/boolq/promptsource-02.yaml (new file, mode 100644)

include: promptsource-00.yaml
group:
  - super-glue-promptsource
task: "based on the following passage"
use_prompt: "promptsource:based on the following passage"
lm_eval/tasks/super_glue/cb/can_we_infer.yaml (deleted, mode 100644 → 0)

group:
  - super-glue-cb
include: based_on_previous_passage.yaml
task: can we infer
reference: Webson & Pavlick 2021
doc_to_text: "Suppose {{premise}} Can we infer that \"{{hypothesis}}\"? Yes, no, or maybe?"
doc_to_target: "{% set answer_choices = ['Yes', 'No', 'Maybe'] %}{{answer_choices[label]}}"
lm_eval/tasks/super_glue/cb/claim_true_false_inconclusive.yaml (deleted, mode 100644 → 0)

group:
  - super-glue-cb
include: based_on_previous_passage.yaml
task: claim true/false/inconclusive
reference: Sanh et al. 2021
doc_to_text: "{{premise}} Based on that information, is the claim: \"{{hypothesis}}\" \"true\", \"false\", or \"inconclusive\"?"
doc_to_target: "{% set answer_choices = ['True', 'False', 'Inconclusive'] %}{{answer_choices[label]}}"
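The deleted files inline their prompts as Jinja templates. A minimal rendering sketch for reference (the template string comes from the deleted file above; calling jinja2 directly is an illustration, not the harness's code path):

from jinja2 import Template

doc_to_target = (
    "{% set answer_choices = ['True', 'False', 'Inconclusive'] %}"
    "{{answer_choices[label]}}"
)

# label is the CB class index: 0 entailment, 1 contradiction, 2 neutral.
print(Template(doc_to_target).render(label=2))  # prints "Inconclusive"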
lm_eval/tasks/super_glue/wsc.fixed/template-01.yaml → lm_eval/tasks/super_glue/cb/promptsource-00.yaml

 group:
-  - t0-eval
-task: "by p they mean "
+  - super-glue-promptsource
+task: "GPT-3 style"
 dataset_path: super_glue
-dataset_name: wsc.fixed
+dataset_name: cb
 training_split: train
 validation_split: validation
-use_prompt: "promptsource: by p they mean "
+use_prompt: "promptsource: GPT-3 style"
 metric_list:
   - metric: exact_match
     aggregation: mean
...
lm_eval/tasks/super_glue/cb/promptsource-01.yaml (new file, mode 100644)

include: promptsource-00.yaml
group:
  - super-glue-promptsource
task: "MNLI crowdsource"
use_prompt: "promptsource:MNLI crowdsource"
lm_eval/tasks/super_glue/cb/promptsource-02.yaml (new file, mode 100644)

include: promptsource-00.yaml
group:
  - super-glue-promptsource
task: "based on the previous passage"
use_prompt: "promptsource:based on the previous passage"
lm_eval/tasks/super_glue/cb/based_on_previous_passage.yaml → lm_eval/tasks/super_glue/copa/promptsource-00.yaml

 group:
-  - super-glue-cb
-task: based on the previous passage
-reference: "Adapted from the BoolQ prompts in Schick & Schütze 2021."
+  - super-glue-promptsource
+task: "C1 or C2? premise, so/because…"
 dataset_path: super_glue
-dataset_name: cb
+dataset_name: copa
 training_split: train
 validation_split: validation
-doc_to_text: "{{premise}} Based on the previous passage, is it true that \"{{hypothesis}}\"? Yes, no, or maybe?"
-doc_to_target: "{% set answer_choices = ['Yes', 'No', 'Maybe'] %}{{answer_choices[label]}}"
+use_prompt: "promptsource:C1 or C2? premise, so/because…"
 metric_list:
   - metric: exact_match
     aggregation: mean
...
lm_eval/tasks/super_glue/copa/promptsource-01.yaml (new file, mode 100644)

include: promptsource-00.yaml
group:
  - super-glue-promptsource
task: "best_option"
use_prompt: "promptsource:best_option"
lm_eval/tasks/super_glue/copa/promptsource-02.yaml (new file, mode 100644)

include: promptsource-00.yaml
group:
  - super-glue-promptsource
task: "cause_effect"
use_prompt: "promptsource:cause_effect"
lm_eval/tasks/super_glue/multirc/promptsource-00.yaml (new file, mode 100644)

group:
  - super-glue-promptsource
task: "I was going to say…"
dataset_path: super_glue
dataset_name: multirc
training_split: train
validation_split: validation
use_prompt: "promptsource:I was going to say…"
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
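The metric options introduced here (ignore_case, ignore_punctuation) control normalization before the exact-match comparison. A hedged sketch of what that normalization typically amounts to (this helper is illustrative, not the harness's implementation):

import string

def exact_match(prediction, reference, ignore_case=True, ignore_punctuation=True):
    # Illustrative normalization: lowercase and strip punctuation when the
    # corresponding config flags are true, then compare exactly.
    if ignore_case:
        prediction, reference = prediction.lower(), reference.lower()
    if ignore_punctuation:
        strip = str.maketrans("", "", string.punctuation)
        prediction, reference = prediction.translate(strip), reference.translate(strip)
    return float(prediction == reference)

print(exact_match("Yes.", "yes"))  # 1.0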
lm_eval/tasks/super_glue/multirc/promptsource-01.yaml (new file, mode 100644)

include: promptsource-00.yaml
group:
  - super-glue-promptsource
task: "Would it be good to answer…"
use_prompt: "promptsource:Would it be good to answer…"
lm_eval/tasks/super_glue/multirc/promptsource-02.yaml (new file, mode 100644)

include: promptsource-00.yaml
group:
  - super-glue-promptsource
task: "confirm"
use_prompt: "promptsource:confirm"
lm_eval/tasks/super_glue/record/promptsource-00.yaml (new file, mode 100644)

group:
  - super-glue-promptsource
task: "Add sentence after (continuation choices)"
dataset_path: super_glue
dataset_name: record
training_split: train
validation_split: validation
use_prompt: "promptsource:Add sentence after (continuation choices)"
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
lm_eval/tasks/super_glue/record/promptsource-01.yaml (new file, mode 100644)

include: promptsource-00.yaml
group:
  - super-glue-promptsource
task: "Add sentence after after (continuation choices)"
use_prompt: "promptsource:Add sentence after after (continuation choices)"
lm_eval/tasks/super_glue/record/promptsource-02.yaml (new file, mode 100644)

include: promptsource-00.yaml
group:
  - super-glue-promptsource
task: "Can you figure out…"
use_prompt: "promptsource:Can you figure out…"