gaoqiong / lm-evaluation-harness
Commit 2507c434, authored Jul 29, 2025 by Baber
add comma bench
parent 4f8195f1
Showing 2 changed files with 55 additions and 0 deletions
lm_eval/tasks/benchmarks/comma.yaml +53 -0
lm_eval/tasks/super_glue/boolq/default.yaml +2 -0
lm_eval/tasks/benchmarks/comma.yaml (new file, mode 0 → 100644)
group: comma
task:
  - task: arc_challenge
    metric_list:
      - metric: acc_mutual_info
        aggregation: mean
        higher_is_better: true
  - task: arc_easy
    metric_list:
      - metric: acc_mutual_info
        aggregation: mean
        higher_is_better: true
  - boolq
  - task: hellaswag
    metric_list:
      - metric: acc_norm
        aggregation: mean
        higher_is_better: true
  - task: openbookqa
    metric_list:
      - metric: acc_mutual_info
        aggregation: mean
        higher_is_better: true
  - task: commonsense_qa
    doc_to_text: "Question: {{ question.strip() }}\nAnswer:"
    doc_to_target: '{{["A", "B", "C", "D", "E"].index(answerKey)}}'
    doc_to_choice: "{{ choices['text'] }}"
    metric_list:
      - metric: acc_mutual_info
        aggregation: mean
        higher_is_better: true
  - task: piqa
    doc_to_text: "Goal: {{goal}}\nAnswer:"
    metric_list:
      - metric: acc_norm
        aggregation: mean
        higher_is_better: true
  - task: social_iqa
    doc_to_text: "Question: {{context}} {{question}}\nAnswer:"
    metric_list:
      - metric: acc_norm
        aggregation: mean
        higher_is_better: true
aggregate_metric_list:
  - metric: acc
    aggregation: mean
    weight_by_size: false
  - metric: acc_norm
    aggregation: mean
    weight_by_size: false
  - metric: acc_mutual_info
    aggregation: mean
    weight_by_size: false
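
The three metric names used in comma.yaml differ only in how the winning choice is picked from the per-choice log-likelihoods. The following is a minimal sketch, assuming the usual harness definitions: acc takes the raw conditional log-likelihood, acc_norm divides it by the character length of the choice, and acc_mutual_info subtracts the log-likelihood of the choice scored without the question. The function names and all numbers are hypothetical and are not taken from the harness code.

```python
from typing import Sequence


def pick_acc(cond_ll: Sequence[float]) -> int:
    """acc: index of the choice with the highest conditional log-likelihood."""
    return max(range(len(cond_ll)), key=lambda i: cond_ll[i])


def pick_acc_norm(cond_ll: Sequence[float], choices: Sequence[str]) -> int:
    """acc_norm: divide each log-likelihood by the choice's character length first."""
    return max(range(len(cond_ll)), key=lambda i: cond_ll[i] / len(choices[i]))


def pick_acc_mutual_info(cond_ll: Sequence[float], uncond_ll: Sequence[float]) -> int:
    """acc_mutual_info: score each choice by conditional minus unconditional log-likelihood."""
    return max(range(len(cond_ll)), key=lambda i: cond_ll[i] - uncond_ll[i])


# Hypothetical values for a single three-choice document (assumption, not real model output).
choices = ["a cat", "an extremely large dog", "fish"]
cond_ll = [-4.0, -9.0, -5.0]      # log P(choice | question)
uncond_ll = [-3.0, -12.0, -6.0]   # log P(choice) without the question
gold = 1                          # index of the correct choice

for name, picked in [
    ("acc", pick_acc(cond_ll)),
    ("acc_norm", pick_acc_norm(cond_ll, choices)),
    ("acc_mutual_info", pick_acc_mutual_info(cond_ll, uncond_ll)),
]:
    # Each metric contributes 1.0 for this document if its pick matches gold.
    print(name, picked, float(picked == gold))
```

In this made-up example the long correct answer loses under acc but wins once length or prior probability is accounted for, which is one plausible reason a benchmark would assign different metrics to different subtasks.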
lm_eval/tasks/super_glue/boolq/default.yaml
@@ -13,5 +13,7 @@ should_decontaminate: true
doc_to_decontamination_query: passage
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  version: 2.0
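
With `weight_by_size: false`, each group-level number in `aggregate_metric_list` is a plain mean over subtasks rather than a mean weighted by document count. The sketch below assumes that a group metric is averaged only over the subtasks that actually report it, which would make the group `acc` depend on boolq alone, since every other subtask overrides its `metric_list`. The scores and the helper name `aggregate_unweighted` are hypothetical.

```python
from statistics import mean
from typing import Dict

# Hypothetical per-subtask scores; each subtask reports only the metric in its metric_list.
subtask_scores: Dict[str, Dict[str, float]] = {
    "arc_challenge": {"acc_mutual_info": 0.40},
    "arc_easy": {"acc_mutual_info": 0.60},
    "boolq": {"acc": 0.70},
    "hellaswag": {"acc_norm": 0.50},
    "openbookqa": {"acc_mutual_info": 0.30},
    "commonsense_qa": {"acc_mutual_info": 0.50},
    "piqa": {"acc_norm": 0.70},
    "social_iqa": {"acc_norm": 0.45},
}


def aggregate_unweighted(scores: Dict[str, Dict[str, float]], metric: str) -> float:
    """Unweighted mean of `metric` over the subtasks that report it."""
    values = [task[metric] for task in scores.values() if metric in task]
    return mean(values)


for metric in ("acc", "acc_norm", "acc_mutual_info"):
    print(metric, round(aggregate_unweighted(subtask_scores, metric), 4))
```

Under these assumptions a small subtask counts as much as a large one, so the group numbers are not the same as pooling all documents and scoring them together.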