gaoqiong / lm-evaluation-harness
Commit ac290ce8, authored Aug 19, 2023 by baberabb
get _RES from CI and change atol to 0.0001
Parent: a004fdc0
Showing 1 changed file with 43 additions and 42 deletions
tests/models/test_huggingface.py
@@ -20,46 +20,46 @@ class Test_HFLM:
     ROLLING: list[Instance] = rolling_task.instances
     MULTIPLE_CH_RES = [
-        (-41.905879974365234, False),
-        (-42.93785095214844, False),
-        (-33.9145393371582, False),
-        (-37.07110595703125, False),
-        (-22.954187393188477, False),
-        (-20.342954635620117, False),
-        (-14.816370010375977, False),
-        (-27.94381332397461, False),
-        (-15.806619644165039, False),
-        (-15.937178611755371, False),
-        (-13.052162170410156, False),
-        (-18.04889678955078, False),
-        (-13.346054077148438, False),
-        (-13.367782592773438, False),
-        (-12.128646850585938, False),
-        (-11.871688842773438, False),
-        (-47.10654067993164, False),
-        (-47.76068115234375, False),
-        (-36.44114303588867, False),
-        (-50.02851104736328, False),
-        (-16.719867706298828, False),
-        (-18.537654876708984, False),
-        (-26.469972610473633, False),
-        (-20.356552124023438, False),
-        (-17.75723648071289, False),
-        (-21.8068790435791, False),
-        (-33.19971466064453, False),
-        (-39.2862434387207, False),
-        (-14.762389183044434, False),
-        (-16.75531005859375, False),
-        (-11.486998558044434, False),
-        (-15.421247482299805, False),
-        (-13.157613754272461, False),
-        (-15.88864517211914, False),
-        (-15.287158012390137, False),
-        (-12.339122772216797, False),
-        (-44.59400177001953, False),
-        (-55.40974807739258, False),
-        (-52.697017669677734, False),
-        (-56.252601623535156, False),
+        -41.902435302734375,
+        -42.939308166503906,
+        -33.914180755615234,
+        -37.07139205932617,
+        -22.95258331298828,
+        -20.342208862304688,
+        -14.818366050720215,
+        -27.942853927612305,
+        -15.80704116821289,
+        -15.936427116394043,
+        -13.052018165588379,
+        -18.04828453063965,
+        -13.345029830932617,
+        -13.366025924682617,
+        -12.127134323120117,
+        -11.872495651245117,
+        -47.10598373413086,
+        -47.76410675048828,
+        -36.4406852722168,
+        -50.0289421081543,
+        -16.72093963623047,
+        -18.535587310791016,
+        -26.46993637084961,
+        -20.355995178222656,
+        -17.757919311523438,
+        -21.80595588684082,
+        -33.1990852355957,
+        -39.28636932373047,
+        -14.759679794311523,
+        -16.753942489624023,
+        -11.486852645874023,
+        -15.42177677154541,
+        -13.15798282623291,
+        -15.887393951416016,
+        -15.28614616394043,
+        -12.339089393615723,
+        -44.59441375732422,
+        -55.40888214111328,
+        -52.70050811767578,
+        -56.25089645385742,
     ]
     GREEDY_UNTIL_RES = [
         " The average of $2.50 each is $",
...
...
@@ -89,8 +89,9 @@ class Test_HFLM:
     def test_logliklihood(self) -> None:
         res = self.LM.loglikelihood(self.MULTIPLE_CH)
-        _RES, _res = [r[0] for r in self.MULTIPLE_CH_RES], [r[0] for r in res]
-        assert np.allclose(_res, _RES, atol=1e-2)
+        _RES, _res = self.MULTIPLE_CH_RES, [r[0] for r in res]
+        # change atol in case of consistent failure
+        assert np.allclose(_res, _RES, atol=1e-4)
         # check indices for Multiple Choice
         argmax_RES, argmax_res = np.argmax(
             np.array(_RES).reshape(-1, 4), axis=1
...
...
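A minimal sketch of why the reference values had to be regenerated along with the tolerance change. It uses only the first four old and new values from the diff above; the helper names `within_tolerance` and `choice_indices` are hypothetical and do not appear in the test file. Note that `np.allclose` also applies its default `rtol=1e-5`, so the effective bound is `atol + rtol * |expected|`, which for values near -40 is somewhat larger than `atol` alone.

```python
import numpy as np

# First four reference log-likelihoods taken from the diff above
# (old values were the first element of each tuple, new values are plain floats).
old_ref = [-41.905879974365234, -42.93785095214844, -33.9145393371582, -37.07110595703125]
new_ref = [-41.902435302734375, -42.939308166503906, -33.914180755615234, -37.07139205932617]


def within_tolerance(actual, expected, atol):
    """Hypothetical helper: same check as the test, with np.allclose's default rtol."""
    return bool(np.allclose(actual, expected, atol=atol))


def choice_indices(loglikelihoods, n_choices=4):
    """Hypothetical helper: index of the best-scoring choice per question.

    Assumes 4 choices per question, matching reshape(-1, 4) in the test.
    """
    return np.argmax(np.array(loglikelihoods).reshape(-1, n_choices), axis=1)


# The old and new references differ by a few thousandths, so they agree
# under the old tolerance but not under the tightened one.
loose = within_tolerance(new_ref, old_ref, atol=1e-2)
tight = within_tolerance(new_ref, old_ref, atol=1e-4)
print(loose, tight)

# The selected choice is the same for both sets of reference values.
print(choice_indices(old_ref), choice_indices(new_ref))
```

Under these assumptions the old references would no longer pass at `atol=1e-4`, which is consistent with the commit replacing them with values taken from CI.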