gaoqiong / lm-evaluation-harness
Commit 2b7d8c2d, authored Nov 07, 2023 by lintangsutawika
add TED to BigBench and Brier score to MMLU
parent 5cc65a79
Showing 3 changed files with 17 additions and 1 deletion (+17 -1)
lm_eval/tasks/bigbench/aux_metric.py +10 -0
lm_eval/tasks/bigbench/generate_until_template_yaml +4 -1
lm_eval/tasks/mmlu/default/_default_template_yaml +3 -0
lm_eval/tasks/bigbench/aux_metric.py
0 → 100644
from textdistance import levenshtein
from transformers import AutoTokenizer

# Change this tokenizer to fit with the model you are using.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-2.8b")

def token_edit_distance(references, predictions, **kwargs):
    ref_tokens = tokenizer.encode(references[0])
    pred_tokens = tokenizer.encode(predictions[0])
    return levenshtein.distance(ref_tokens, pred_tokens)
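The metric added in this file tokenizes the first reference and the first prediction, then returns the Levenshtein distance between the two token sequences. For readers who want to try the idea without installing `textdistance` or downloading a tokenizer, the following is a minimal sketch, not the code from the commit: `toy_encode` is a hypothetical whitespace tokenizer standing in for `tokenizer.encode`, and `levenshtein_distance` is a hand-written stand-in for `textdistance.levenshtein.distance`.

```python
def levenshtein_distance(a, b):
    """Edit distance between two sequences; insert, delete, and substitute each cost 1."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def toy_encode(text):
    # Hypothetical stand-in for tokenizer.encode: split on whitespace.
    return text.split()


def toy_token_edit_distance(references, predictions, **kwargs):
    # Same shape as token_edit_distance: only the first items are compared.
    ref_tokens = toy_encode(references[0])
    pred_tokens = toy_encode(predictions[0])
    return levenshtein_distance(ref_tokens, pred_tokens)
```

With a real subword tokenizer the distances will differ from this whitespace version, since one word may map to several tokens. A lower value means the prediction is closer to the reference, which matches `higher_is_better: false` in the template.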
lm_eval/tasks/bigbench/generate_until_template_yaml
 group: bigbench
-dataset_path: bigbench # will switch to `hails/bigbench` when all tasks are pushed
+dataset_path: hails/bigbench # will switch to `hails/bigbench` when all tasks are pushed
 output_type: generate_until
 dataset_kwargs:
 # num_shots: 0 # TODO: num of shots for `bigbench` HF dataset should be controlled through this, not through the typical methods
...
@@ -14,3 +14,6 @@ metric_list:
     aggregation: mean
     higher_is_better: true
     ignore_punctuation: true
+  - metric: !function aux_metric.token_edit_distance # pip install textdistance
+    aggregation: mean
+    higher_is_better: false
\ No newline at end of file
lm_eval/tasks/mmlu/default/_default_template_yaml
...
@@ -15,3 +15,6 @@ metric_list:
   - metric: acc_norm
     aggregation: mean
     higher_is_better: true
+  - metric: brier_score
+    aggregation: mean
+    higher_is_better: false
\ No newline at end of file
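The diff only registers `brier_score` by name and does not show how the harness computes it. As a rough illustration, the sketch below uses the standard multi-class Brier score: the sum of squared differences between the predicted probabilities and a one-hot vector for the gold answer, averaged over examples. The function names `multiclass_brier` and `mean_brier` are hypothetical and not part of the harness.

```python
def multiclass_brier(probs, gold_index):
    """Squared error between a probability vector and the one-hot gold label."""
    return sum(
        (p - (1.0 if k == gold_index else 0.0)) ** 2
        for k, p in enumerate(probs)
    )


def mean_brier(examples):
    """Average the per-example score, mirroring `aggregation: mean`.

    `examples` is a list of (probs, gold_index) pairs.
    """
    scores = [multiclass_brier(probs, gold) for probs, gold in examples]
    return sum(scores) / len(scores)
```

A confident correct answer scores 0, a confident wrong answer scores 2, and a uniform guess over four choices scores 0.75, so lower is better, consistent with `higher_is_better: false`.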