gaoqiong / lm-evaluation-harness
Commit 02e841ce, authored Mar 14, 2024 by lintangsutawika
Merge branch 'main' of https://github.com/EleutherAI/lm-evaluation-harness into t5v2-alt-plus
Parents: 90ad5db7, e74ec966
Changes: 154
Showing 20 changed files with 413 additions and 0 deletions (+413 -0)
lm_eval/tasks/agieval/gaokao-mathcloze.yaml  +25 -0
lm_eval/tasks/agieval/gaokao-mathqa.yaml  +6 -0
lm_eval/tasks/agieval/gaokao-physics.yaml  +6 -0
lm_eval/tasks/agieval/jec-qa-ca.yaml  +6 -0
lm_eval/tasks/agieval/jec-qa-kd.yaml  +6 -0
lm_eval/tasks/agieval/logiqa-en.yaml  +7 -0
lm_eval/tasks/agieval/logiqa-zh.yaml  +6 -0
lm_eval/tasks/agieval/lsat-ar.yaml  +7 -0
lm_eval/tasks/agieval/lsat-lr.yaml  +7 -0
lm_eval/tasks/agieval/lsat-rc.yaml  +7 -0
lm_eval/tasks/agieval/math.yaml  +25 -0
lm_eval/tasks/agieval/sat-en-without-passage.yaml  +7 -0
lm_eval/tasks/agieval/sat-en.yaml  +7 -0
lm_eval/tasks/agieval/sat-math.yaml  +7 -0
lm_eval/tasks/agieval/utils.py  +274 -0
lm_eval/tasks/arithmetic/arithmetic_1dc.yaml  +2 -0
lm_eval/tasks/arithmetic/arithmetic_2da.yaml  +2 -0
lm_eval/tasks/arithmetic/arithmetic_2dm.yaml  +2 -0
lm_eval/tasks/arithmetic/arithmetic_2ds.yaml  +2 -0
lm_eval/tasks/arithmetic/arithmetic_3da.yaml  +2 -0
lm_eval/tasks/agieval/gaokao-mathcloze.yaml
0 → 100644
group:
  - agieval
  - agieval_cn
task: agieval_gaokao_mathcloze
dataset_path: hails/agieval-gaokao-mathcloze
dataset_name: null
output_type: generate_until
training_split: null
validation_split: null
test_split: test
doc_to_text: "{{query}}"
doc_to_target: "{{answer}}"
process_results: !function utils.process_results
generation_kwargs:
  max_gen_toks: 32
  do_sample: False
  temperature: 0.0
  until:
    - "Q:"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
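Reviewer note: the `until` entry under `generation_kwargs` lists stop sequences for generation. The snippet below is a minimal sketch of what such a stop list conceptually does to a generated string. The helper name `truncate_at_stop` is hypothetical and is not part of the harness; the real stopping logic lives in the model backends and may differ in detail.

```python
from typing import Sequence


def truncate_at_stop(text: str, stops: Sequence[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence.

    Hypothetical helper for illustration only; it is not the harness's code.
    """
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            # Keep the earliest stop position seen so far.
            cut = min(cut, idx)
    return text[:cut]


# With until: ["Q:"], anything from the next "Q:" onward is discarded.
answer = truncate_at_stop("x = 3\n\nQ: What is 2 + 2?", ["Q:"])
```

Under this assumption, only the text before the next "Q:" would reach `utils.process_results`.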
lm_eval/tasks/agieval/gaokao-mathqa.yaml
0 → 100644
include: aqua-rat.yaml
group:
  - agieval
  - agieval_cn
task: agieval_gaokao_mathqa
dataset_path: hails/agieval-gaokao-mathqa
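Reviewer note: this file and most of the AGIEval configs that follow set only `group`, `task`, and `dataset_path`, and pull everything else from `aqua-rat.yaml` via `include`. The sketch below illustrates that override pattern with plain dictionaries, assuming a shallow merge in which the including file's keys win. The function `resolve_include` and the base values are hypothetical placeholders; the contents of `aqua-rat.yaml` are not shown in this diff.

```python
from typing import Any, Dict


def resolve_include(base: Dict[str, Any], child: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow-merge `child` over `base`, dropping the `include` key itself.

    Assumption: a shallow override. The harness's real loader may differ.
    """
    merged = dict(base)
    merged.update({k: v for k, v in child.items() if k != "include"})
    return merged


# Placeholder base config; NOT the real contents of aqua-rat.yaml.
base = {
    "task": "base_task",
    "dataset_path": "org/base-dataset",
    "output_type": "multiple_choice",
}

# Keys taken from gaokao-mathqa.yaml in this commit.
child = {
    "include": "aqua-rat.yaml",
    "group": ["agieval", "agieval_cn"],
    "task": "agieval_gaokao_mathqa",
    "dataset_path": "hails/agieval-gaokao-mathqa",
}

merged = resolve_include(base, child)
```

Under this assumption, `merged` keeps the base `output_type` while `task` and `dataset_path` come from the including file.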
lm_eval/tasks/agieval/gaokao-physics.yaml
0 → 100644
include: aqua-rat.yaml
group:
  - agieval
  - agieval_cn
task: agieval_gaokao_physics
dataset_path: hails/agieval-gaokao-physics
lm_eval/tasks/agieval/jec-qa-ca.yaml
0 → 100644
include: aqua-rat.yaml
group:
  - agieval
  - agieval_cn
task: agieval_jec_qa_ca
dataset_path: hails/agieval-jec-qa-ca
lm_eval/tasks/agieval/jec-qa-kd.yaml
0 → 100644
include: aqua-rat.yaml
group:
  - agieval
  - agieval_cn
task: agieval_jec_qa_kd
dataset_path: hails/agieval-jec-qa-kd
lm_eval/tasks/agieval/logiqa-en.yaml
0 → 100644
include: aqua-rat.yaml
group:
  - agieval
  - agieval_nous
  - agieval_en
task: agieval_logiqa_en
dataset_path: hails/agieval-logiqa-en
lm_eval/tasks/agieval/logiqa-zh.yaml
0 → 100644
include: aqua-rat.yaml
group:
  - agieval
  - agieval_cn
task: agieval_logiqa_zh
dataset_path: hails/agieval-logiqa-zh
lm_eval/tasks/agieval/lsat-ar.yaml
0 → 100644
include: aqua-rat.yaml
group:
  - agieval
  - agieval_nous
  - agieval_en
task: agieval_lsat_ar
dataset_path: hails/agieval-lsat-ar
lm_eval/tasks/agieval/lsat-lr.yaml
0 → 100644
include: aqua-rat.yaml
group:
  - agieval
  - agieval_nous
  - agieval_en
task: agieval_lsat_lr
dataset_path: hails/agieval-lsat-lr
lm_eval/tasks/agieval/lsat-rc.yaml
0 → 100644
include: aqua-rat.yaml
group:
  - agieval
  - agieval_nous
  - agieval_en
task: agieval_lsat_rc
dataset_path: hails/agieval-lsat-rc
lm_eval/tasks/agieval/math.yaml
0 → 100644
group:
  - agieval
  - agieval_en
task: agieval_math
dataset_path: hails/agieval-math
dataset_name: null
output_type: generate_until
training_split: null
validation_split: null
test_split: test
doc_to_text: "{{query}}"
doc_to_target: "{{answer}}"
process_results: !function utils.process_results
generation_kwargs:
  max_gen_toks: 32
  do_sample: False
  temperature: 0.0
  until:
    - "Q:"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/agieval/sat-en-without-passage.yaml
0 → 100644
include: aqua-rat.yaml
group:
  - agieval
  - agieval_nous
  - agieval_en
task: agieval_sat_en_without_passage
dataset_path: hails/agieval-sat-en-without-passage
lm_eval/tasks/agieval/sat-en.yaml
0 → 100644
include: aqua-rat.yaml
group:
  - agieval
  - agieval_nous
  - agieval_en
task: agieval_sat_en
dataset_path: hails/agieval-sat-en
lm_eval/tasks/agieval/sat-math.yaml
0 → 100644
include: aqua-rat.yaml
group:
  - agieval
  - agieval_nous
  - agieval_en
task: agieval_sat_math
dataset_path: hails/agieval-sat-math
lm_eval/tasks/agieval/utils.py
0 → 100644
# Answer parsing and normalization code, from
# https://github.com/ruixiangcui/AGIEval/blob/main/src/
# math_equivalence.py and post_process.py
import re
from typing import Dict, List

import numpy as np


def parse_math_answer(raw_string):
    def remove_boxed(s):
        # Strip the "\boxed{" prefix and the trailing "}"; None if s is not boxed.
        left = "\\boxed{"
        try:
            assert s[: len(left)] == left
            assert s[-1] == "}"
            answer = s[len(left) : -1]
            if "=" in answer:
                answer = answer.split("=")[-1].lstrip(" ")
            return answer
        except Exception:
            return None

    def last_boxed_only_string(string):
        # Return the last "\boxed{...}" (or "\fbox{...}") group, matching braces.
        idx = string.rfind("\\boxed")
        if idx < 0:
            idx = string.rfind("\\fbox")
            if idx < 0:
                return None
        i = idx
        right_brace_idx = None
        num_left_braces_open = 0
        while i < len(string):
            if string[i] == "{":
                num_left_braces_open += 1
            if string[i] == "}":
                num_left_braces_open -= 1
                if num_left_braces_open == 0:
                    right_brace_idx = i
                    break
            i += 1

        if right_brace_idx is None:
            retval = None
        else:
            retval = string[idx : right_brace_idx + 1]

        return retval

    def get_answer_with_dollar_sign(s):
        # Raw string avoids the invalid "\$" escape; the pattern is unchanged.
        first_pattern = r"\$(.*)\$"
        last_match = None
        matches = re.findall(first_pattern, s)
        if matches:
            last_match = matches[-1]
            if "=" in last_match:
                last_match = last_match.split("=")[-1].lstrip(" ")
        return last_match

    def get_answer_without_dollar_sign(s):
        last_match = None
        if "=" in s:
            last_match = s.split("=")[-1].lstrip(" ").rstrip(".")
            # Note: this checks for a literal backslash followed by "n".
            if "\\n" in last_match:
                last_match = last_match.split("\\n")[0]
        else:
            # Raw string avoids invalid escapes; the pattern is unchanged.
            pattern = r"(?:\$)?\d+(?:\.\d+)?(?![\w\d])"
            matches = re.findall(pattern, s)
            if matches:
                last_match = matches[-1]
        return last_match

    if "\\boxed" in raw_string:
        answer = remove_boxed(last_boxed_only_string(raw_string))
    else:
        answer = get_answer_with_dollar_sign(raw_string)
        if not answer:
            answer = get_answer_without_dollar_sign(raw_string)
    return answer

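Reviewer note: the brace-matching step in `last_boxed_only_string` is the part that lets nested LaTeX such as `\boxed{\frac{1}{2}}` survive extraction. The following is a simplified, self-contained sketch of that idea. The function `last_boxed_content` is a hypothetical name; unlike the code in this commit it ignores `\fbox`, does not split on `=`, and returns the contents rather than the whole `\boxed{...}` group.

```python
from typing import Optional


def last_boxed_content(text: str) -> Optional[str]:
    """Return the contents of the last "\\boxed{...}" group, or None.

    Simplified illustration; not the function added in this commit.
    """
    prefix = "\\boxed{"
    start = text.rfind(prefix)
    if start < 0:
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                # Everything between the opening brace and its matching close.
                return text[start + len(prefix) : i]
    return None  # braces never balanced


extracted = last_boxed_content("So the answer is \\boxed{\\frac{1}{2}}.")
```

Counting brace depth, rather than searching for the first `}`, is what keeps the inner `\frac{1}{2}` intact.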

# code from https://github.com/hendrycks/math/blob/main/modeling/math_equivalence.py
def _fix_fracs(string):
    substrs = string.split("\\frac")
    new_str = substrs[0]
    if len(substrs) > 1:
        substrs = substrs[1:]
        for substr in substrs:
            new_str += "\\frac"
            if substr[0] == "{":
                new_str += substr
            else:
                try:
                    assert len(substr) >= 2
                except Exception:
                    return string
                a = substr[0]
                b = substr[1]
                if b != "{":
                    if len(substr) > 2:
                        post_substr = substr[2:]
                        new_str += "{" + a + "}{" + b + "}" + post_substr
                    else:
                        new_str += "{" + a + "}{" + b + "}"
                else:
                    if len(substr) > 2:
                        post_substr = substr[2:]
                        new_str += "{" + a + "}" + b + post_substr
                    else:
                        new_str += "{" + a + "}" + b
    string = new_str
    return string


def _fix_a_slash_b(string):
    if len(string.split("/")) != 2:
        return string
    a = string.split("/")[0]
    b = string.split("/")[1]
    try:
        a = int(a)
        b = int(b)
        assert string == "{}/{}".format(a, b)
        new_string = "\\frac{" + str(a) + "}{" + str(b) + "}"
        return new_string
    except Exception:
        return string

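Reviewer note: `_fix_a_slash_b` only rewrites strings that are exactly two integers separated by one slash, and leaves everything else untouched. Below is a standalone sketch of the same rule for trying out edge cases. The name `slash_to_frac` is hypothetical, and it catches `ValueError` explicitly instead of using the broad `except Exception` in the commit.

```python
def slash_to_frac(s: str) -> str:
    """Rewrite "a/b" (plain integers only) as "\\frac{a}{b}"; else return s."""
    parts = s.split("/")
    if len(parts) != 2:
        return s
    try:
        a, b = int(parts[0]), int(parts[1])
    except ValueError:
        return s
    # Reject inputs such as "03/4" or " 3/4" that do not round-trip exactly.
    if s != f"{a}/{b}":
        return s
    return "\\frac{" + str(a) + "}{" + str(b) + "}"


converted = slash_to_frac("3/4")
unchanged = slash_to_frac("x/4")
```

The round-trip check means inputs with padding or leading zeros are deliberately left as they are.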

def _remove_right_units(string):
    # "\\text{ " only ever occurs (at least in the val set) when describing units
    if "\\text{ " in string:
        splits = string.split("\\text{ ")
        assert len(splits) == 2
        return splits[0]
    else:
        return string


def _fix_sqrt(string):
    if "\\sqrt" not in string:
        return string
    splits = string.split("\\sqrt")
    new_string = splits[0]
    for split in splits[1:]:
        if split[0] != "{":
            a = split[0]
            new_substr = "\\sqrt{" + a + "}" + split[1:]
        else:
            new_substr = "\\sqrt" + split
        new_string += new_substr
    return new_string


def _strip_string(string):
    # linebreaks
    string = string.replace("\n", "")
    # print(string)

    # remove inverse spaces
    string = string.replace("\\!", "")
    # print(string)

    # replace \\ with \
    string = string.replace("\\\\", "\\")
    # print(string)

    # replace tfrac and dfrac with frac
    string = string.replace("tfrac", "frac")
    string = string.replace("dfrac", "frac")
    # print(string)

    # remove \left and \right
    string = string.replace("\\left", "")
    string = string.replace("\\right", "")
    # print(string)

    # Remove circ (degrees)
    string = string.replace("^{\\circ}", "")
    string = string.replace("^\\circ", "")

    # remove dollar signs
    string = string.replace("\\$", "")

    # remove units (on the right)
    string = _remove_right_units(string)

    # remove percentage
    string = string.replace("\\%", "")
    string = string.replace("\%", "")

    # " 0." equivalent to " ." and "{0." equivalent to "{." Alternatively, add "0" if "." is the start of the string
    string = string.replace(" .", " 0.")
    string = string.replace("{.", "{0.")
    # if empty, return empty string
    if len(string) == 0:
        return string
    if string[0] == ".":
        string = "0" + string

    # to consider: get rid of e.g. "k = " or "q = " at beginning
    if len(string.split("=")) == 2:
        if len(string.split("=")[0]) <= 2:
            string = string.split("=")[1]

    # fix sqrt3 --> sqrt{3}
    string = _fix_sqrt(string)

    # remove spaces
    string = string.replace(" ", "")

    # \frac1b or \frac12 --> \frac{1}{b} and \frac{1}{2}, etc. Even works with \frac1{72} (but not \frac{72}1). Also does a/b --> \\frac{a}{b}
    string = _fix_fracs(string)

    # manually change 0.5 --> \frac{1}{2}
    if string == "0.5":
        string = "\\frac{1}{2}"

    # NOTE: X/Y changed to \frac{X}{Y} in dataset, but in simple cases fix in case the model output is X/Y
    string = _fix_a_slash_b(string)

    return string

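Reviewer note: `_strip_string` exists so that two answers that differ only in LaTeX styling compare as equal. The sketch below reproduces a small subset of its replacements in a standalone function, to show the effect on a few inputs. The name `normalize` is hypothetical, and the subset deliberately omits the unit, percent, `sqrt`, and `frac` handling from the commit.

```python
def normalize(ans: str) -> str:
    """Apply a few of the string replacements used for answer matching.

    Illustrative subset only; not equivalent to _strip_string.
    """
    ans = ans.replace("\n", "")
    ans = ans.replace("\\!", "")
    ans = ans.replace("tfrac", "frac").replace("dfrac", "frac")
    ans = ans.replace("\\left", "").replace("\\right", "")
    ans = ans.replace("^{\\circ}", "").replace("^\\circ", "")
    ans = ans.replace(" ", "")
    if ans.startswith("."):
        # ".5" and "0.5" should compare equal.
        ans = "0" + ans
    return ans


same = normalize("\\dfrac{1}{2}") == normalize("\\frac{1}{2}")
point = normalize("\\left( 3, 4 \\right)")
```

Because the comparison is purely textual, answers that are mathematically equal but written differently (for example `0.25` and `\frac{1}{4}`) would still not match under this subset.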

def is_equiv(str1, str2, verbose=False):
    if str1 is None and str2 is None:
        print("WARNING: Both None")
        return True
    if str1 is None or str2 is None:
        return False

    str1, str2 = parse_math_answer(str1), parse_math_answer(str2)

    try:
        ss1 = _strip_string(str1)
        ss2 = _strip_string(str2)
        if verbose:
            print(ss1, ss2)
        return ss1 == ss2
    except Exception:
        return str1 == str2


def process_results(doc: dict, results: List[str]) -> Dict[str, int]:
    candidate = results[0]

    gold = doc["answer"]

    if not gold:
        print(doc, candidate, gold)
    if is_equiv(candidate, gold):
        retval = 1
    else:
        retval = 0

    results = {
        "acc": retval,
    }
    return results


# use a custom process_results() function, because AGIEval can have multiple valid answers
def process_results_mcqa(doc, results):
    results = [result[0] for result in results]

    gold = doc["gold"]

    acc = 1.0 if int(np.argmax(results)) in gold else 0.0
    completion_len = np.array([float(len(i)) for i in doc["choices"]])
    acc_norm = 1.0 if int(np.argmax(results / completion_len)) in gold else 0.0

    return {
        "acc": acc,
        "acc_norm": acc_norm,
    }

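Reviewer note: `process_results_mcqa` reports both `acc` and `acc_norm`, where the latter divides each choice's score by the character length of that choice before taking the argmax. The sketch below shows how the two can disagree, using plain Python instead of NumPy. The function `score_mcqa` and the numbers are hypothetical; it assumes the scores are per-choice log-likelihoods, which matches the `result[0]` indexing above but is not stated in this diff.

```python
from typing import Dict, Sequence


def score_mcqa(
    loglikelihoods: Sequence[float],
    choices: Sequence[str],
    gold: Sequence[int],
) -> Dict[str, float]:
    """Compute acc and length-normalized acc for one multiple-choice item."""

    def argmax(xs: Sequence[float]) -> int:
        # Index of the first maximum.
        return max(range(len(xs)), key=lambda i: xs[i])

    raw_pick = argmax(loglikelihoods)
    norm_pick = argmax([ll / len(c) for ll, c in zip(loglikelihoods, choices)])
    return {
        "acc": 1.0 if raw_pick in gold else 0.0,
        "acc_norm": 1.0 if norm_pick in gold else 0.0,
    }


# Made-up scores: the short wrong option has the higher raw log-likelihood.
scores = score_mcqa(
    loglikelihoods=[-4.0, -6.0],
    choices=["yes", "a much longer option"],
    gold=[1],
)
```

Here the raw argmax picks the short option (index 0), while the per-character scores of about -1.33 and -0.3 favor the longer, correct option. Because `gold` is a list, any of several valid answer indices counts as correct.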
lm_eval/tasks/arithmetic/arithmetic_1dc.yaml
@@ -14,3 +14,5 @@ metric_list:
     higher_is_better: true
 metadata:
   version: 1.0
+dataset_kwargs:
+  trust_remote_code: true
lm_eval/tasks/arithmetic/arithmetic_2da.yaml
 include: arithmetic_1dc.yaml
 task: arithmetic_2da
 dataset_name: arithmetic_2da
+dataset_kwargs:
+  trust_remote_code: true
lm_eval/tasks/arithmetic/arithmetic_2dm.yaml
 include: arithmetic_1dc.yaml
 task: arithmetic_2dm
 dataset_name: arithmetic_2dm
+dataset_kwargs:
+  trust_remote_code: true
lm_eval/tasks/arithmetic/arithmetic_2ds.yaml
 include: arithmetic_1dc.yaml
 task: arithmetic_2ds
 dataset_name: arithmetic_2ds
+dataset_kwargs:
+  trust_remote_code: true
lm_eval/tasks/arithmetic/arithmetic_3da.yaml
 include: arithmetic_1dc.yaml
 task: arithmetic_3da
 dataset_name: arithmetic_3da
+dataset_kwargs:
+  trust_remote_code: true