gaoqiong / lm-evaluation-harness
Commit 6ac0fa62, authored Jan 07, 2025 by Baber
changed mmlu to cloze and added arc_challenge_mmlu
parent bb098f13
Showing 7 changed files with 37 additions and 2 deletions
lm_eval/tasks/arc/arc_challenge_mmlu.yaml  +22 -0
lm_eval/tasks/mmlu/default/_default_template_yaml  +5 -2
lm_eval/tasks/mmlu/default/_mmlu.yaml  +2 -0
lm_eval/tasks/mmlu/default/_mmlu_humanities.yaml  +2 -0
lm_eval/tasks/mmlu/default/_mmlu_other.yaml  +2 -0
lm_eval/tasks/mmlu/default/_mmlu_social_sciences.yaml  +2 -0
lm_eval/tasks/mmlu/default/_mmlu_stem.yaml  +2 -0
lm_eval/tasks/arc/arc_challenge_mmlu.yaml
0 → 100644
tag:
  - llama
task: arc_challenge_mmlu
dataset_path: allenai/ai2_arc
dataset_name: ARC-Challenge
output_type: multiple_choice
training_split: train
validation_split: validation
test_split: test
fewshot_split: train
doc_to_text: "Question: {{question.strip()}}\nA. {{choices.text[0]}}\nB. {{choices.text[1]}}\nC. {{choices.text[2]}}{% if choices.text|length > 3 %}\nD. {{choices.text[3]}}{% endif %}\nAnswer:"
doc_to_target: "{{ 'ABCD'[answerKey|int - 1] if answerKey|string in '1234' else answerKey }}"
doc_to_choice: "{{ choices.label|map('replace', '1', 'A')|map('replace', '2', 'B')|map('replace', '3', 'C')|map('replace', '4', 'D')|list if choices.label[0] in '1234' else choices.label }}"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
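The three templates in this file can be hard to read as Jinja one-liners. The following is a minimal plain-Python sketch of what they appear to compute; it is not the harness's own code. The function names and the sample document are hypothetical, and the document layout (`question`, `choices.text`, `choices.label`, `answerKey`) is taken from the field names used in the templates.

```python
# Plain-Python mirror of the Jinja templates in arc_challenge_mmlu.yaml.
# Function names and the sample doc are illustrative assumptions.

def render_prompt(doc):
    """Mirror doc_to_text: question, lettered options A-C (D if present), 'Answer:'."""
    texts = doc["choices"]["text"]
    lines = ["Question: " + doc["question"].strip()]
    # zip stops at four letters, so a fifth option would be dropped,
    # just as the template only ever renders A through D.
    lines += [f"{letter}. {text}" for letter, text in zip("ABCD", texts)]
    lines.append("Answer:")
    return "\n".join(lines)


def target_letter(doc):
    """Mirror doc_to_target: numeric keys '1'-'4' become 'A'-'D'."""
    key = str(doc["answerKey"])
    return "ABCD"[int(key) - 1] if key in "1234" else key


def choice_labels(doc):
    """Mirror doc_to_choice: relabel numeric choice labels as letters."""
    labels = doc["choices"]["label"]
    if labels[0] in "1234":
        table = str.maketrans("1234", "ABCD")
        return [label.translate(table) for label in labels]
    return labels


# Hypothetical document with numeric labels.
doc = {
    "question": " Which gas do plants absorb from the air? ",
    "choices": {
        "text": ["Oxygen", "Carbon dioxide", "Nitrogen", "Helium"],
        "label": ["1", "2", "3", "4"],
    },
    "answerKey": "2",
}

prompt = render_prompt(doc)
print(prompt)
print(target_letter(doc), choice_labels(doc))
```

Under this reading, documents with numeric and lettered answer keys end up scored against the same `A`-`D` label set.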
lm_eval/tasks/mmlu/default/_default_template_yaml
@@ -4,13 +4,16 @@ fewshot_split: dev
 fewshot_config:
   sampler: first_n
 output_type: multiple_choice
-doc_to_text: "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nAnswer:"
-doc_to_choice: ["A", "B", "C", "D"]
+doc_to_text: "Question: {{question.strip()}}\nAnswer:"
+doc_to_choice: {{choices}}
 doc_to_target: answer
 metric_list:
   - metric: acc
     aggregation: mean
     higher_is_better: true
+  - metric: acc_norm
+    aggregation: mean
+    higher_is_better: true
 metadata:
   version: 1.0
 dataset_kwargs:
...
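This hunk switches MMLU from letter-choice prompts to a cloze format: the prompt no longer lists the options, and the candidate continuations are the answer texts themselves. The sketch below illustrates why `acc_norm` is a natural companion to that change. It assumes that `acc` picks the choice with the highest raw log-likelihood and that `acc_norm` divides each log-likelihood by the choice's character length first; that is an assumption about the harness, and the function names, sample document, and log-likelihood values are all made up for illustration.

```python
# Sketch of cloze-style scoring. The log-likelihoods are invented numbers,
# and the character-length normalisation is an assumption about acc_norm.

def cloze_prompt(doc):
    """New doc_to_text: the question only, with no lettered options."""
    return "Question: " + doc["question"].strip() + "\nAnswer:"


def score(loglikelihoods, choices, gold):
    """Return acc and acc_norm for one document."""
    indices = range(len(choices))
    # acc: highest raw log-likelihood wins.
    pred = max(indices, key=lambda i: loglikelihoods[i])
    # acc_norm: highest log-likelihood per character wins, which reduces
    # the penalty on long answer strings.
    pred_norm = max(indices, key=lambda i: loglikelihoods[i] / len(choices[i]))
    return {"acc": float(pred == gold), "acc_norm": float(pred_norm == gold)}


doc = {
    "question": "What is the boiling point of water at sea level?",
    "choices": ["90 C", "100 degrees Celsius", "50 C", "0 C"],
    "answer": 1,  # index into choices, as doc_to_target: answer implies
}

# Hypothetical model scores: the long correct answer has the lowest raw score.
lls = [-4.0, -9.0, -6.0, -7.0]

print(cloze_prompt(doc))
result = score(lls, doc["choices"], doc["answer"])
print(result)
```

With answer texts of very different lengths as continuations, the raw and normalised metrics can disagree, which may be why the commit reports both.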
lm_eval/tasks/mmlu/default/_mmlu.yaml
@@ -7,5 +7,7 @@ task:
 aggregate_metric_list:
   - metric: acc
     weight_by_size: True
+  - metric: acc_norm
+    weight_by_size: True
 metadata:
   version: 2
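The group files add `acc_norm` to `aggregate_metric_list` with `weight_by_size: True`. As a rough illustration of what size weighting means, the sketch below compares a size-weighted mean of subtask scores with a plain mean. The `aggregate` helper and the numbers are hypothetical, and the harness's actual aggregation code may differ in detail.

```python
# Sketch: size-weighted versus unweighted aggregation of subtask scores.
# `aggregate` is a hypothetical helper, not a harness function.

def aggregate(scores, sizes, weight_by_size=True):
    """Combine per-subtask scores into one group score."""
    if not weight_by_size:
        # Every subtask counts equally, regardless of how many docs it has.
        sizes = [1] * len(scores)
    total = sum(sizes)
    return sum(score * size for score, size in zip(scores, sizes)) / total


# Two hypothetical subtasks: 100 docs at 0.5 and 300 docs at 0.75.
scores = [0.5, 0.75]
sizes = [100, 300]

weighted = aggregate(scores, sizes, weight_by_size=True)
unweighted = aggregate(scores, sizes, weight_by_size=False)
print(weighted, unweighted)
```

With weighting, the group score equals the accuracy over all documents pooled together, so large subtasks contribute proportionally more.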
lm_eval/tasks/mmlu/default/_mmlu_humanities.yaml
@@ -5,5 +5,7 @@ task:
 aggregate_metric_list:
   - metric: acc
     weight_by_size: True
+  - metric: acc_norm
+    weight_by_size: True
 metadata:
   version: 2
lm_eval/tasks/mmlu/default/_mmlu_other.yaml
@@ -5,5 +5,7 @@ task:
 aggregate_metric_list:
   - metric: acc
     weight_by_size: True
+  - metric: acc_norm
+    weight_by_size: True
 metadata:
   version: 2
lm_eval/tasks/mmlu/default/_mmlu_social_sciences.yaml
@@ -5,5 +5,7 @@ task:
 aggregate_metric_list:
   - metric: acc
     weight_by_size: True
+  - metric: acc_norm
+    weight_by_size: True
 metadata:
   version: 2
lm_eval/tasks/mmlu/default/_mmlu_stem.yaml
@@ -5,5 +5,7 @@ task:
 aggregate_metric_list:
   - metric: acc
     weight_by_size: True
+  - metric: acc_norm
+    weight_by_size: True
 metadata:
   version: 2