gaoqiong / lm-evaluation-harness
Commit 6998762a, authored Nov 09, 2023 by lintangsutawika
merged cont-metrics here
parent 2184b8de
Showing 12 changed files with 15 additions and 33 deletions (+15 -33)
lm_eval/api/metrics.py  +3 -3
lm_eval/evaluator.py  +2 -0
lm_eval/tasks/mmlu/alternative_worlds/full_continuation/style_01/_template_yaml  +1 -3
lm_eval/tasks/mmlu/alternative_worlds/full_continuation/style_02/_template_yaml  +1 -3
lm_eval/tasks/mmlu/alternative_worlds/full_continuation/style_03/_template_yaml  +1 -3
lm_eval/tasks/mmlu/alternative_worlds/full_continuation/style_04/_template_yaml  +1 -3
lm_eval/tasks/mmlu/alternative_worlds/full_continuation/style_05/_template_yaml  +1 -3
lm_eval/tasks/mmlu/alternative_worlds/letters_only/style_01/_template_yaml  +1 -3
lm_eval/tasks/mmlu/alternative_worlds/letters_only/style_02/_template_yaml  +1 -3
lm_eval/tasks/mmlu/alternative_worlds/letters_only/style_03/_template_yaml  +1 -3
lm_eval/tasks/mmlu/alternative_worlds/letters_only/style_04/_template_yaml  +1 -3
lm_eval/tasks/mmlu/alternative_worlds/letters_only/style_05/_template_yaml  +1 -3
lm_eval/api/metrics.py
@@ -109,9 +109,9 @@ def ter(items):
 @register_aggregation("brier_score")
 def brier_score(items):  # This is a passthrough function
     gold, predictions = list(zip(*items))
-    gold = list(gold)
-    gold_one_hot = np.eye(np.max(gold) + 1)[gold]
-    predictions = list(zip(*items))[1]
+    gold = np.array(gold)
+    predictions = np.array(predictions)
+    gold_one_hot = np.eye(len(predictions[0]))[gold]
     return np.mean(np.sum((predictions - gold_one_hot) ** 2, axis=1))
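The effect of the metrics.py change can be checked in isolation. The sketch below copies the pre- and post-commit bodies of brier_score from the hunk above into two standalone functions; the names brier_score_old and brier_score_new are hypothetical, the @register_aggregation decorator is dropped, and each item is assumed to be a (gold_index, probability_vector) pair, as the zip(*items) unpacking implies. Inferring the class count from np.max(gold) + 1 fails whenever the last answer choice is never the gold label in the batch, while taking it from len(predictions[0]) does not depend on which labels occur.

```python
import numpy as np


def brier_score_old(items):
    # Pre-commit logic: the number of classes is inferred from the largest gold label.
    gold, predictions = list(zip(*items))
    gold = list(gold)
    gold_one_hot = np.eye(np.max(gold) + 1)[gold]
    predictions = list(zip(*items))[1]
    return np.mean(np.sum((predictions - gold_one_hot) ** 2, axis=1))


def brier_score_new(items):
    # Post-commit logic: the number of classes is the length of each prediction vector.
    gold, predictions = list(zip(*items))
    gold = np.array(gold)
    predictions = np.array(predictions)
    gold_one_hot = np.eye(len(predictions[0]))[gold]
    # Mean over samples of the squared distance between prediction and one-hot target.
    return np.mean(np.sum((predictions - gold_one_hot) ** 2, axis=1))
```

With items = [(0, [1.0, 0.0, 0.0]), (1, [0.5, 0.5, 0.0])], brier_score_new returns 0.25, while brier_score_old raises a ValueError because a (2, 2) one-hot matrix cannot be broadcast against (2, 3) predictions. When the highest-index choice does appear as a gold label, both versions agree.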
lm_eval/evaluator.py
@@ -468,6 +468,8 @@ def evaluate(
             if stderr is not None:
                 results[task_name][metric + "_stderr" + "," + key] = stderr(items)
+            else:
+                results[task_name][metric + "_stderr" + "," + key] = 0
 
     if bool(results):
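The evaluator.py change makes the _stderr entry unconditional: when no stderr function is available for a metric, 0 is recorded instead of leaving the key out. A minimal sketch of that branch as a standalone function follows. The names record_stderr and sample_stderr, the task name, and the key value "none" are hypothetical illustrations, and sample_stderr (standard error of the mean) only stands in for whatever stderr function the harness actually supplies.

```python
import statistics
from typing import Callable, Dict, List, Optional


def record_stderr(
    results: Dict[str, Dict[str, float]],
    task_name: str,
    metric: str,
    key: str,
    items: List[float],
    stderr: Optional[Callable[[List[float]], float]],
) -> None:
    # Same key scheme as the hunk: "<metric>_stderr,<key>".
    name = metric + "_stderr" + "," + key
    if stderr is not None:
        results[task_name][name] = stderr(items)
    else:
        # Branch added by this commit: fall back to 0 rather than omitting the entry.
        results[task_name][name] = 0


def sample_stderr(xs: List[float]) -> float:
    # Hypothetical stand-in for a registered stderr function: standard error of the mean.
    return statistics.stdev(xs) / len(xs) ** 0.5
```

For example, recording "acc" with sample_stderr on [1.0, 0.0, 1.0, 1.0] stores 0.25 under "acc_stderr,none", and recording "brier_score" with stderr=None stores 0 under "brier_score_stderr,none".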
lm_eval/tasks/mmlu/alternative_worlds/full_continuation/style_01/_template_yaml
@@ -10,6 +10,4 @@ metric_list:
   - metric: acc
     aggregation: mean
     higher_is_better: true
-  - metric: acc_norm
-    aggregation: mean
-    higher_is_better: true
+  - metric: brier_score
lm_eval/tasks/mmlu/alternative_worlds/full_continuation/style_02/_template_yaml
@@ -10,6 +10,4 @@ metric_list:
   - metric: acc
     aggregation: mean
     higher_is_better: true
-  - metric: acc_norm
-    aggregation: mean
-    higher_is_better: true
+  - metric: brier_score
lm_eval/tasks/mmlu/alternative_worlds/full_continuation/style_03/_template_yaml
@@ -10,6 +10,4 @@ metric_list:
   - metric: acc
     aggregation: mean
     higher_is_better: true
-  - metric: acc_norm
-    aggregation: mean
-    higher_is_better: true
+  - metric: brier_score
lm_eval/tasks/mmlu/alternative_worlds/full_continuation/style_04/_template_yaml
@@ -10,6 +10,4 @@ metric_list:
   - metric: acc
     aggregation: mean
     higher_is_better: true
-  - metric: acc_norm
-    aggregation: mean
-    higher_is_better: true
+  - metric: brier_score
lm_eval/tasks/mmlu/alternative_worlds/full_continuation/style_05/_template_yaml
@@ -10,6 +10,4 @@ metric_list:
   - metric: acc
     aggregation: mean
     higher_is_better: true
-  - metric: acc_norm
-    aggregation: mean
-    higher_is_better: true
+  - metric: brier_score
lm_eval/tasks/mmlu/alternative_worlds/letters_only/style_01/_template_yaml
@@ -10,6 +10,4 @@ metric_list:
   - metric: acc
     aggregation: mean
     higher_is_better: true
-  - metric: acc_norm
-    aggregation: mean
-    higher_is_better: true
+  - metric: brier_score
lm_eval/tasks/mmlu/alternative_worlds/letters_only/style_02/_template_yaml
@@ -10,6 +10,4 @@ metric_list:
   - metric: acc
     aggregation: mean
     higher_is_better: true
-  - metric: acc_norm
-    aggregation: mean
-    higher_is_better: true
+  - metric: brier_score
lm_eval/tasks/mmlu/alternative_worlds/letters_only/style_03/_template_yaml
@@ -10,6 +10,4 @@ metric_list:
   - metric: acc
     aggregation: mean
     higher_is_better: true
-  - metric: acc_norm
-    aggregation: mean
-    higher_is_better: true
+  - metric: brier_score
lm_eval/tasks/mmlu/alternative_worlds/letters_only/style_04/_template_yaml
@@ -10,6 +10,4 @@ metric_list:
   - metric: acc
     aggregation: mean
     higher_is_better: true
-  - metric: acc_norm
-    aggregation: mean
-    higher_is_better: true
+  - metric: brier_score
lm_eval/tasks/mmlu/alternative_worlds/letters_only/style_05/_template_yaml
@@ -10,6 +10,4 @@ metric_list:
   - metric: acc
     aggregation: mean
     higher_is_better: true
-  - metric: acc_norm
-    aggregation: mean
-    higher_is_better: true
+  - metric: brier_score