gaoqiong / lm-evaluation-harness

Commit 32a70d89, authored Jul 17, 2023 by lintangsutawika

    aggregate is shown in the table

Parent: 1d995b6d
Showing 1 changed file with 11 additions and 1 deletion.
lm_eval/evaluator.py (+11, -1) @ 32a70d89
@@ -364,6 +364,15 @@ def evaluate(
                task_score = task.aggregation()[metric](items)
                results[task_name][metric + "," + key] = task_score
                # if task_name not in benchmark_agg:
                #     benchmark[] = [task_score]
                # Need to put back in results
                # pythia | acc
                #        | perplexity
                #        | word_perplexity
                #        | byte_perplexity
                #        | bits_per_byte
                if metric not in aggregate:
                    aggregate[metric] = [task_score]
                else:
                    ...
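The effect of this hunk: each task's per-metric score is written both to the task's own row in results and into a shared aggregate dict keyed by metric name, so scores can later be averaged across tasks. A minimal sketch of that accumulation pattern, with hypothetical task names and scores (the values below are illustrative, not from the commit):

from typing import Dict, List, Tuple

# Hypothetical (task, metric) -> score pairs; in evaluator.py these come
# from task.aggregation()[metric](items).
task_scores: Dict[Tuple[str, str], float] = {
    ("lambada", "acc"): 0.61,
    ("piqa", "acc"): 0.74,
    ("piqa", "acc_norm"): 0.76,
}

aggregate: Dict[str, List[float]] = {}  # metric -> one score per task
for (task_name, metric), score in task_scores.items():
    # Same branching as the commit: start a new list on first sight,
    # then append.
    if metric not in aggregate:
        aggregate[metric] = [score]
    else:
        aggregate[metric].append(score)

print(aggregate)  # {'acc': [0.61, 0.74], 'acc_norm': [0.76]}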
@@ -383,7 +392,8 @@ def evaluate(
                results[task_name][metric + "_stderr" + "," + key] = stderr(items)

    for metric in aggregate.keys():
        aggregate[metric] = np.average(aggregate[metric])
        results["Aggregate"][metric] = np.average(aggregate[metric])
    versions["Aggregate"] = "N/A"

    results_dict = {
        "results": dict(results),
        ...
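The second hunk then collapses each metric's score list into an unweighted mean via np.average and surfaces it as a synthetic "Aggregate" row in the results table, with version "N/A" since the row belongs to no single task. Continuing the sketch above, under the same hypothetical numbers:

import numpy as np

aggregate = {"acc": [0.61, 0.74], "acc_norm": [0.76]}
results = {"Aggregate": {}}
versions = {}

for metric in aggregate.keys():
    # Unweighted mean across tasks; np.average without weights == np.mean.
    results["Aggregate"][metric] = np.average(aggregate[metric])
versions["Aggregate"] = "N/A"

print(results["Aggregate"])  # acc -> 0.675, acc_norm -> 0.76

Note that the average is taken per metric name, so accuracy scores are never mixed with, say, perplexities; the commented "pythia" table in the first hunk sketches exactly that per-metric layout.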