gaoqiong
lm-evaluation-harness
Commit f7b81bd4
authored Jan 17, 2024 by lintangsutawika
modifications for current evals on t5v2
parent 032e879b
Showing 13 changed files with 50 additions and 53 deletions
lm_eval/evaluator.py  +3 -1
lm_eval/tasks/bbh/cot_fewshot/_cot_fewshot_template_yaml  +1 -0
lm_eval/tasks/bbh/cot_zeroshot/_cot_zeroshot_template_yaml  +1 -0
lm_eval/tasks/bbh/fewshot/_fewshot_template_yaml  +1 -0
lm_eval/tasks/bbh/zeroshot/_zeroshot_template_yaml  +1 -0
lm_eval/tasks/benchmarks/flan/_flan_anli_yaml  +3 -3
lm_eval/tasks/benchmarks/flan/_flan_arc_yaml  +0 -2
lm_eval/tasks/benchmarks/flan/_flan_boolq_yaml  +0 -0
lm_eval/tasks/benchmarks/flan/_flan_rte_yaml  +0 -0
lm_eval/tasks/benchmarks/flan/flan_held_in.yaml  +35 -4
lm_eval/tasks/benchmarks/flan/flan_held_in_yaml  +0 -39
lm_eval/tasks/benchmarks/flan/flan_held_out.yaml  +4 -4
lm_eval/tasks/mmlu/flan_n_shot/generative/_mmlu_flan_generative_template_yaml  +1 -0
lm_eval/evaluator.py
@@ -399,7 +399,7 @@ def evaluate(
 if type(items[0]) == tuple:
     numitem = len(items[0])
-if isinstance(items[0], (str, list)):
+if isinstance(items[0], (str, list, tuple)):
     # handle the string case
     gathered_items = [None] * lm.accelerator.num_processes
     torch.distributed.all_gather_object(gathered_items, items)
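Reviewer note: the hunk above widens the type check so that tuple-valued items also reach the `all_gather_object` branch. A minimal sketch of that check in isolation follows. `needs_object_gather` and its `widen` flag are hypothetical names for illustration only, not part of lm_eval, and what happens to items that fail the check is not shown in this diff.

```python
def needs_object_gather(items, widen=True):
    """Return True when the first item passes the isinstance check.

    Hypothetical helper for illustration; the type tuples mirror the
    old and new sides of the hunk.
    """
    accepted = (str, list, tuple) if widen else (str, list)
    return isinstance(items[0], accepted)


# Example per-rank items where each entry is a tuple.
tuple_items = [("pred", "gold"), ("pred2", "gold2")]

old_result = needs_object_gather(tuple_items, widen=False)  # old check
new_result = needs_object_gather(tuple_items, widen=True)   # new check
print(old_result, new_result)
```

With the old type tuple the tuple-valued items fail the check; with the new one they pass.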
@@ -492,6 +492,8 @@ def evaluate(
 ]:
     stderr = "_stderr,".join(metric.split(","))
     stderr_score = results[task][stderr]
+    if isinstance(stderr_score, str):
+        stderr_score = 0
     var_score = stderr_score**2
     metric_score = results[task][metric]
...
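Reviewer note: the second evaluator.py hunk guards the squaring step against a standard error stored as a string. A minimal sketch of the guard in isolation, assuming the string is a placeholder such as "N/A" (the exact placeholder value is an assumption, the diff only shows the `isinstance(..., str)` test). `stderr_to_variance` is a hypothetical name, not a function in lm_eval.

```python
def stderr_to_variance(stderr_score):
    """Square a standard error, treating any string placeholder as zero.

    Hypothetical helper mirroring the two added lines plus the
    existing `stderr_score**2` line.
    """
    if isinstance(stderr_score, str):
        stderr_score = 0
    return stderr_score**2


print(stderr_to_variance(0.5))    # numeric stderr is squared as before
print(stderr_to_variance("N/A"))  # string placeholder contributes zero variance
```

Without the guard, squaring a string would raise a `TypeError`. Note that mapping a missing standard error to zero makes any variance aggregated from it look smaller than it may really be.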
lm_eval/tasks/bbh/cot_fewshot/_cot_fewshot_template_yaml
@@ -17,6 +17,7 @@ generation_kwargs:
     - "</s>"
     - "Q"
     - "\n\n"
+    - "<0x0A>"
   do_sample: false
   temperature: 0.0
 filter_list:
...
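Reviewer note: the added stop string `<0x0A>` has the shape of a byte-fallback token as written by SentencePiece-style tokenizers, where the two hex digits name one raw byte; 0x0A is the line feed. Whether the t5v2 tokenizer actually emits this literal text is an assumption the diff does not confirm. A small sketch of the mapping, with `byte_token_to_bytes` as a hypothetical helper:

```python
import re

# Matches strings of the form <0xNN> where NN is two hex digits.
BYTE_TOKEN = re.compile(r"^<0x([0-9A-Fa-f]{2})>$")


def byte_token_to_bytes(token):
    """Return the single byte named by a <0xNN> token, or None otherwise."""
    match = BYTE_TOKEN.match(token)
    if match is None:
        return None
    return bytes([int(match.group(1), 16)])


print(byte_token_to_bytes("<0x0A>") == b"\n")
print(byte_token_to_bytes("Q"))
```

Read this way, the new entry is a second spelling of a newline stop, alongside the existing `"\n\n"` entry.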
lm_eval/tasks/bbh/cot_zeroshot/_cot_zeroshot_template_yaml
@@ -14,6 +14,7 @@ generation_kwargs:
     - "</s>"
     - "Q"
     - "\n\n"
+    - "<0x0A>"
   do_sample: false
   temperature: 0.0
 filter_list:
...
lm_eval/tasks/bbh/fewshot/_fewshot_template_yaml
@@ -14,6 +14,7 @@ generation_kwargs:
     - "</s>"
     - "Q"
     - "\n\n"
+    - "<0x0A>"
   do_sample: false
   temperature: 0.0
 num_fewshot: 0
...
lm_eval/tasks/bbh/zeroshot/_zeroshot_template_yaml
@@ -14,6 +14,7 @@ generation_kwargs:
     - "</s>"
     - "Q:"
     - "\n\n"
+    - "<0x0A>"
   do_sample: false
   temperature: 0.0
 num_fewshot: 0
...
lm_eval/tasks/benchmarks/flan/flan_anli.yaml → lm_eval/tasks/benchmarks/flan/_flan_anli_yaml
 group: flan_anli
 task:
   - include: yaml_templates/held_in_template_yaml
-    task: anli_r1
+    task: r1
     dataset_path: anli
     use_prompt: prompt_templates/anli.yaml:*
     validation_split: dev_r1
   - include: yaml_templates/held_in_template_yaml
-    task: anli_r2
+    task: r2
     dataset_path: anli
     use_prompt: prompt_templates/anli.yaml:*
     validation_split: dev_r2
   - include: yaml_templates/held_in_template_yaml
-    task: anli_r3
+    task: r3
     dataset_path: anli
     use_prompt: prompt_templates/anli.yaml:*
     validation_split: dev_r3
lm_eval/tasks/benchmarks/flan/flan_arc.yaml → lm_eval/tasks/benchmarks/flan/_flan_arc_yaml
 group: flan_arc
 task:
   - include: yaml_templates/held_in_template_yaml
-    task: arc_easy
     dataset_path: ai2_arc
     dataset_name: ARC-Easy
     use_prompt: prompt_templates/arc.yaml:*
     validation_split: validation
   - include: yaml_templates/held_in_template_yaml
-    task: arc_challenge
     dataset_path: ai2_arc
     dataset_name: ARC-Challenge
     use_prompt: prompt_templates/arc.yaml:*
...
lm_eval/tasks/benchmarks/flan/flan_boolq.yaml → lm_eval/tasks/benchmarks/flan/_flan_boolq_yaml
File moved
lm_eval/tasks/benchmarks/flan/flan_rte.yaml → lm_eval/tasks/benchmarks/flan/_flan_rte_yaml
File moved
lm_eval/tasks/benchmarks/flan/flan_held_in.yaml
 group: flan_held_in
 task:
-  - flan_boolq
-  - flan_rte
-  - flan_anli
-  - flan_arc
+  - include: yaml_templates/held_in_template_yaml
+    task: r1
+    dataset_path: anli
+    use_prompt: prompt_templates/anli.yaml:*
+    validation_split: dev_r1
+  - include: yaml_templates/held_in_template_yaml
+    task: r2
+    dataset_path: anli
+    use_prompt: prompt_templates/anli.yaml:*
+    validation_split: dev_r2
+  - include: yaml_templates/held_in_template_yaml
+    task: r3
+    dataset_path: anli
+    use_prompt: prompt_templates/anli.yaml:*
+    validation_split: dev_r3
+  - include: yaml_templates/held_in_template_yaml
+    dataset_path: ai2_arc
+    dataset_name: ARC-Easy
+    use_prompt: prompt_templates/arc.yaml:*
+    validation_split: validation
+  - include: yaml_templates/held_in_template_yaml
+    dataset_path: ai2_arc
+    dataset_name: ARC-Challenge
+    use_prompt: prompt_templates/arc.yaml:*
+    validation_split: validation
+  - include: yaml_templates/held_in_template_yaml
+    dataset_path: super_glue
+    dataset_name: boolq
+    use_prompt: prompt_templates/boolq.yaml:*
+    validation_split: validation
+  - include: yaml_templates/held_in_template_yaml
+    dataset_path: super_glue
+    dataset_name: rte
+    use_prompt: prompt_templates/rte.yaml:*
+    validation_split: validation
lm_eval/tasks/benchmarks/flan/flan_held_in_yaml (deleted, 100644 → 0; view file @ 032e879b)
-group: flan_held_in
-task:
-  - include: flan/yaml_templates/held_in_template_yaml
-    dataset_path: super_glue
-    dataset_name: boolq
-    use_prompt: flan/prompt_templates/boolq.yaml:*
-    validation_split: validation
-  - include: flan/yaml_templates/held_in_template_yaml
-    dataset_path: super_glue
-    dataset_name: rte
-    use_prompt: flan/prompt_templates/rte.yaml:*
-    validation_split: validation
-  - include: flan/yaml_templates/held_in_template_yaml
-    task: anli_r1
-    dataset_path: anli
-    use_prompt: flan/prompt_templates/anli.yaml:*
-    validation_split: dev_r1
-  - include: flan/yaml_templates/held_in_template_yaml
-    task: anli_r2
-    dataset_path: anli
-    use_prompt: flan/prompt_templates/anli.yaml:*
-    validation_split: dev_r2
-  - include: flan/yaml_templates/held_in_template_yaml
-    task: anli_r3
-    dataset_path: anli
-    use_prompt: flan/prompt_templates/anli.yaml:*
-    validation_split: dev_r3
-  - include: flan/yaml_templates/held_in_template_yaml
-    task: arc_easy
-    dataset_path: ai2_arc
-    dataset_name: ARC-Easy
-    use_prompt: flan/prompt_templates/arc.yaml:*
-    validation_split: validation
-  - include: flan/yaml_templates/held_in_template_yaml
-    task: arc_challenge
-    dataset_path: ai2_arc
-    dataset_name: ARC-Challenge
-    use_prompt: flan/prompt_templates/arc.yaml:*
-    validation_split: validation
lm_eval/tasks/benchmarks/flan/flan_held_out.yaml
 group: flan_held_out
 task:
   # BBH
-  - bbh_flan_zeroshot
-  - bbh_flan_fewshot
-  - bbh_flan_cot_fewshot
-  - bbh_flan_cot_zeroshot
+  - bbh_zeroshot
+  - bbh_fewshot
+  - bbh_cot_fewshot
+  - bbh_cot_zeroshot
   # MMLU
   - mmlu
   - mmlu_flan_n_shot_generative
...
lm_eval/tasks/mmlu/flan_n_shot/generative/_mmlu_flan_generative_template_yaml
@@ -8,6 +8,7 @@ doc_to_target: "{{['(A)', '(B)', '(C)', '(D)'][answer]}}"
 generation_kwargs:
   until:
     - "</s>"
+    - "<0x0A>"
 metric_list:
   - metric: exact_match
     aggregation: mean
...
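Reviewer note: to show why an extra `until` entry matters for an `exact_match` metric, here is a generic sketch of cutting a generation at the earliest stop string. How lm_eval applies `until` internally is not shown in this diff; `truncate_at_stop` is a hypothetical helper and the sample generation text is made up.

```python
def truncate_at_stop(text, until):
    """Cut text at the earliest occurrence of any stop string in `until`."""
    cut = len(text)
    for stop in until:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


old_until = ["</s>"]
new_until = ["</s>", "<0x0A>"]
generation = "(B)<0x0A>Q: next question</s>"  # made-up model output

print(truncate_at_stop(generation, old_until))
print(truncate_at_stop(generation, new_until))
```

Under these assumptions only the second call yields a string equal to a target such as `(B)`; the first keeps the text after the literal `<0x0A>` and would fail an exact comparison.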