gaoqiong / lm-evaluation-harness
Commit cda25fef (unverified)
Authored Jan 02, 2024 by Lintang Sutawika; committed by GitHub on Jan 02, 2024
Merge branch 'main' into standardize_metrics
Parents: dfb41835, 4d10ad56
Changes: 249
Showing 20 changed files with 17 additions and 41 deletions (+17 -41)
lm_eval/tasks/kmmlu/utils.py  +0 -20
lm_eval/tasks/lambada/lambada_openai.yaml  +1 -1
lm_eval/tasks/lambada/lambada_standard.yaml  +1 -1
lm_eval/tasks/lambada_cloze/lambada_openai_cloze.yaml  +1 -1
lm_eval/tasks/lambada_cloze/lambada_standard_cloze.yaml  +1 -1
lm_eval/tasks/lambada_multilingual/lambada_mt_en.yaml  +1 -1
lm_eval/tasks/logiqa/logiqa.yaml  +1 -1
lm_eval/tasks/logiqa2/logieval.yaml  +1 -1
lm_eval/tasks/logiqa2/logiqa2.yaml  +1 -1
lm_eval/tasks/mathqa/mathqa.yaml  +1 -1
lm_eval/tasks/mc_taco/default.yaml  +1 -1
lm_eval/tasks/mgsm/direct/direct_yaml  +1 -1
lm_eval/tasks/mgsm/en_cot/cot_yaml  +1 -1
lm_eval/tasks/mgsm/native_cot/cot_yaml  +1 -1
lm_eval/tasks/mgsm/utils.py  +0 -1
lm_eval/tasks/minerva_math/minerva_math_algebra.yaml  +1 -1
lm_eval/tasks/mmlu/_generate_configs.py  +0 -3
lm_eval/tasks/mmlu/default/_default_template_yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/_mmlu_flan_cot_fewshot_template_yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_zeroshot/_mmlu_flan_cot_zeroshot_template_yaml  +1 -1
lm_eval/tasks/kmmlu/utils.py (deleted, mode 100644 → 0, file as of dfb41835)

import datasets


def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:
    def _process_doc(doc):
        instruction = (
            # Korean: "Read the following and choose the correct answer."
            f"다음을 읽고 정답으로 알맞은 것을 고르시요.\n"
            f"### Question: {doc['question']}\n"
            f"### Options:\n"
            f"(1) {doc['option#1']}\n(2) {doc['option#2']}\n(3) {doc['option#3']}\n(4) {doc['option#4']}\n"
            # Korean: "The answer to the given question is"
            f"### Answer: 주어진 문제의 정답은"
        )
        out_doc = {
            "question": instruction,
            "choices": ["(1)", "(2)", "(3)", "(4)"],
            "gold": int(doc["answer"]) - 1,
        }
        return out_doc

    return dataset.map(_process_doc)
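To see what the removed helper produced, the sketch below applies the same per-document transformation to a plain dictionary, without the `datasets` dependency. The function name `build_kmmlu_doc` and the sample row are made up for illustration; only the field names and the prompt template come from the deleted file.

```python
def build_kmmlu_doc(doc: dict) -> dict:
    """Same logic as the removed _process_doc, applied to a plain dict."""
    instruction = (
        "다음을 읽고 정답으로 알맞은 것을 고르시요.\n"
        f"### Question: {doc['question']}\n"
        "### Options:\n"
        f"(1) {doc['option#1']}\n(2) {doc['option#2']}\n"
        f"(3) {doc['option#3']}\n(4) {doc['option#4']}\n"
        "### Answer: 주어진 문제의 정답은"
    )
    return {
        "question": instruction,
        "choices": ["(1)", "(2)", "(3)", "(4)"],
        # The 1-based answer is converted to a 0-based index into `choices`.
        "gold": int(doc["answer"]) - 1,
    }


# Hypothetical sample row, not taken from the real dataset.
sample = {
    "question": "1 + 1 = ?",
    "option#1": "1",
    "option#2": "2",
    "option#3": "3",
    "option#4": "4",
    "answer": "2",
}
out = build_kmmlu_doc(sample)
print(out["choices"][out["gold"]])
```

With `answer` set to `"2"`, `gold` becomes `1`, which selects the label `(2)` from `choices`.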
lm_eval/tasks/lambada/lambada_openai.yaml @ cda25fef
@@ -17,4 +17,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  - version: 1.0
+  version: 1.0
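The one-line change repeated across these task configs drops the leading `-`, so `metadata` stops being a one-element list and becomes a plain mapping. The sketch below writes out the two parsed shapes as Python literals (what a YAML loader would return for each form) and adds a hypothetical helper, `get_version`, that reads the version from either shape. The helper is an illustration only and is not part of lm-evaluation-harness.

```python
# Parsed form of the old config:  metadata: [ {version: 1.0} ]
old_cfg = {"metadata": [{"version": 1.0}]}
# Parsed form of the new config:  metadata: {version: 1.0}
new_cfg = {"metadata": {"version": 1.0}}


def get_version(cfg: dict):
    """Return metadata.version whether metadata is a list of dicts or a dict."""
    meta = cfg.get("metadata")
    if isinstance(meta, list):
        # Old style: merge the list items into a single mapping.
        merged = {}
        for item in meta:
            merged.update(item)
        meta = merged
    return (meta or {}).get("version")


print(get_version(old_cfg), get_version(new_cfg))
```

With the mapping form, a consumer can read `cfg["metadata"]["version"]` directly, with no list indexing.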
lm_eval/tasks/lambada/lambada_standard.yaml @ cda25fef
@@ -18,4 +18,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/lambada_cloze/lambada_openai_cloze.yaml @ cda25fef
@@ -17,4 +17,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/lambada_cloze/lambada_standard_cloze.yaml @ cda25fef
@@ -18,4 +18,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/lambada_multilingual/lambada_mt_en.yaml @ cda25fef
@@ -17,4 +17,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/logiqa/logiqa.yaml @ cda25fef
@@ -18,4 +18,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/logiqa2/logieval.yaml @ cda25fef
@@ -24,4 +24,4 @@ filter_list:
         regex_pattern: "^\\s*([A-D])"
       - function: "take_first"
 metadata:
-  - version: 0.0
+  version: 0.0
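The `regex_pattern` in the context lines above is `^\s*([A-D])` once the YAML double-quote escaping is removed: it captures a single option letter at the very start of a response, after optional whitespace. A minimal sketch of that extraction follows; the function name `extract_choice` and the `"[invalid]"` fallback string are assumptions for illustration, not values taken from this diff.

```python
import re

# "^\\s*([A-D])" in double-quoted YAML is the regex ^\s*([A-D]).
CHOICE_RE = re.compile(r"^\s*([A-D])")


def extract_choice(response: str, fallback: str = "[invalid]") -> str:
    """Return the leading option letter, or the fallback if there is none."""
    match = CHOICE_RE.search(response)
    return match.group(1) if match else fallback


print(extract_choice("  B. The second statement follows."))
print(extract_choice("I think the answer is B"))
```

Because the pattern is anchored with `^`, a letter that appears later in the response is not picked up, as the second call shows.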
lm_eval/tasks/logiqa2/logiqa2.yaml @ cda25fef
@@ -18,4 +18,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  - version: 0.0
+  version: 0.0
lm_eval/tasks/mathqa/mathqa.yaml @ cda25fef
@@ -19,4 +19,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/mc_taco/default.yaml @ cda25fef
@@ -12,4 +12,4 @@ metric_list:
   - metric: acc
   - metric: f1
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/mgsm/direct/direct_yaml @ cda25fef
@@ -26,4 +26,4 @@ metric_list:
     ignore_case: true
     ignore_punctuation: true
 metadata:
-  - version: 0.0
+  version: 0.0
lm_eval/tasks/mgsm/en_cot/cot_yaml @ cda25fef
@@ -28,4 +28,4 @@ filter_list:
         regex_pattern: "The answer is (\\-?[0-9\\.\\,]+)"
       - function: "take_first"
 metadata:
-  - version: 0.0
+  version: 0.0
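The two filter steps in the context lines above (a regex capture followed by `take_first`) can be approximated as below. This is a simplified sketch, not the harness's own filter code: `regex_then_take_first` and the `"[invalid]"` fallback are hypothetical, and only the pattern itself comes from the config.

```python
import re

# "The answer is (\\-?[0-9\\.\\,]+)" in double-quoted YAML is this regex.
ANSWER_RE = re.compile(r"The answer is (\-?[0-9\.\,]+)")


def regex_then_take_first(responses, fallback="[invalid]"):
    """Extract the numeric answer from each response, then keep the first one."""
    extracted = []
    for text in responses:
        match = ANSWER_RE.search(text)
        extracted.append(match.group(1) if match else fallback)
    return extracted[0]


print(regex_then_take_first(["3 + 4 = 7. The answer is 7."]))
print(regex_then_take_first(["The answer is -1,234.5 dollars"]))
```

Note that `.` and `,` are inside the character class, so a sentence-ending period directly after the number is captured along with it (the first call yields `7.`).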
lm_eval/tasks/mgsm/native_cot/cot_yaml @ cda25fef
@@ -28,4 +28,4 @@ filter_list:
         regex_pattern: "The answer is (\\-?[0-9\\.\\,]+)"
       - function: "take_first"
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/mgsm/utils.py @ cda25fef
@@ -94,7 +94,6 @@ LANGUAGES = {
 def add_regex_pattern(regex_pattern):
     if regex_pattern is None:
         return {}
     return {
lm_eval/tasks/minerva_math/minerva_math_algebra.yaml @ cda25fef
@@ -21,4 +21,4 @@ metric_list:
     higher_is_better: true
 num_fewshot: 0
 metadata:
-  - version: 0.0
+  version: 0.0
lm_eval/tasks/mmlu/_generate_configs.py @ cda25fef
@@ -7,7 +7,6 @@ import argparse
 from tqdm import tqdm
-from lm_eval import utils
 from lm_eval.logger import eval_logger
 SUBJECTS = {
@@ -82,7 +81,6 @@ def parse_args():
 if __name__ == "__main__":
     args = parse_args()
     # get filename of base_yaml so we can `"include": ` it in our "other" YAMLs.
@@ -98,7 +96,6 @@ if __name__ == "__main__":
     ALL_CATEGORIES = []
     for subject, category in tqdm(SUBJECTS.items()):
         if category not in ALL_CATEGORIES:
             ALL_CATEGORIES.append(category)
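The `ALL_CATEGORIES` loop in the last hunk collects each category once, in first-seen order. The sketch below runs the same loop on a small, hypothetical `SUBJECTS` mapping (the real dictionary is not shown in this diff) and leaves out the `tqdm` progress wrapper so it has no third-party dependency.

```python
# Hypothetical subset of a subject -> category mapping.
SUBJECTS = {
    "abstract_algebra": "stem",
    "anatomy": "stem",
    "philosophy": "humanities",
    "astronomy": "stem",
}

ALL_CATEGORIES = []
for subject, category in SUBJECTS.items():
    # Append each category only the first time it is seen.
    if category not in ALL_CATEGORIES:
        ALL_CATEGORIES.append(category)

# dict.fromkeys performs the same order-preserving de-duplication.
same_result = list(dict.fromkeys(SUBJECTS.values()))
print(ALL_CATEGORIES, same_result)
```

Both approaches keep insertion order; the explicit loop is quadratic in the number of categories, which is harmless for a short list like this one.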
lm_eval/tasks/mmlu/default/_default_template_yaml @ cda25fef
@@ -12,4 +12,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  - version: 0.0
+  version: 0.0
lm_eval/tasks/mmlu/flan_cot_fewshot/_mmlu_flan_cot_fewshot_template_yaml @ cda25fef
@@ -23,4 +23,4 @@ metric_list:
     ignore_case: true
     ignore_punctuation: true
 metadata:
-  - version: 0.0
+  version: 0.0
lm_eval/tasks/mmlu/flan_cot_zeroshot/_mmlu_flan_cot_zeroshot_template_yaml @ cda25fef
@@ -23,4 +23,4 @@ metric_list:
     ignore_case: true
     ignore_punctuation: true
 metadata:
-  - version: 0.0
+  version: 0.0