gaoqiong / lm-evaluation-harness

Commit 8c997e53, authored May 03, 2022 by jon-tow

Revert `tests/testdata` changes and address flake8 issues

Parent: d95a4333
Changes: 627 files in total. This page shows 20 changed files, with 20 additions and 20 deletions (+20 -20).
tests/testdata/headqa-v0-res.json (+1 -1)
tests/testdata/headqa_en-v0-loglikelihood (+1 -1)
tests/testdata/headqa_en-v0-res.json (+1 -1)
tests/testdata/headqa_es-v0-loglikelihood (+1 -1)
tests/testdata/headqa_es-v0-res.json (+1 -1)
tests/testdata/hellaswag-v0-loglikelihood (+1 -1)
tests/testdata/hellaswag-v0-res.json (+1 -1)
tests/testdata/hendrycksTest-abstract_algebra-v0-loglikelihood (+1 -1)
tests/testdata/hendrycksTest-abstract_algebra-v0-res.json (+1 -1)
tests/testdata/hendrycksTest-anatomy-v0-loglikelihood (+1 -1)
tests/testdata/hendrycksTest-anatomy-v0-res.json (+1 -1)
tests/testdata/hendrycksTest-astronomy-v0-loglikelihood (+1 -1)
tests/testdata/hendrycksTest-astronomy-v0-res.json (+1 -1)
tests/testdata/hendrycksTest-business_ethics-v0-loglikelihood (+1 -1)
tests/testdata/hendrycksTest-business_ethics-v0-res.json (+1 -1)
tests/testdata/hendrycksTest-clinical_knowledge-v0-loglikelihood (+1 -1)
tests/testdata/hendrycksTest-clinical_knowledge-v0-res.json (+1 -1)
tests/testdata/hendrycksTest-college_biology-v0-loglikelihood (+1 -1)
tests/testdata/hendrycksTest-college_biology-v0-res.json (+1 -1)
tests/testdata/hendrycksTest-college_chemistry-v0-loglikelihood (+1 -1)
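Every file above changes by one line, and the diffs below carry the `\ No newline at end of file` marker, so a quick way to audit the fixtures is to check their final byte. A minimal sketch; the helper `files_missing_final_newline` is hypothetical and not part of lm-evaluation-harness:

```python
from pathlib import Path


def files_missing_final_newline(directory):
    """Return sorted names of non-empty regular files whose last byte is not b"\\n"."""
    missing = []
    for path in sorted(Path(directory).iterdir()):
        # Skip subdirectories and other non-file entries.
        if not path.is_file():
            continue
        data = path.read_bytes()
        # An empty file has no last line, so it cannot lack a final newline.
        if data and not data.endswith(b"\n"):
            missing.append(path.name)
    return missing
```

Run from the repository root, `files_missing_final_newline("tests/testdata")` would list whichever fixtures in the current checkout end without a newline.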
tests/testdata/headqa-v0-res.json
-{"results": {"headqa": {"acc": 0.23559445660102116, "acc_norm": 0.25018234865062, "acc_norm_stderr": 0.008272783230806014, "acc_stderr": 0.008105688874297972}}, "versions": {"headqa": 0}}
+{"results": {"headqa": {"acc": 0.23559445660102116, "acc_norm": 0.25018234865062, "acc_norm_stderr": 0.008272783230806014, "acc_stderr": 0.008105688874297972}}, "versions": {"headqa": 0}}
\ No newline at end of file
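The removed and added lines render identically, so the only difference is the trailing newline. A minimal sketch showing that such a change cannot affect the parsed results; the JSON text is reconstructed from the rendered diff (exact separators are an assumption), and `same_json` is a hypothetical helper:

```python
import json

# Single line of tests/testdata/headqa-v0-res.json, reconstructed from the diff.
LINE = (
    '{"results": {"headqa": {"acc": 0.23559445660102116, "acc_norm": 0.25018234865062, '
    '"acc_norm_stderr": 0.008272783230806014, "acc_stderr": 0.008105688874297972}}, '
    '"versions": {"headqa": 0}}'
)


def same_json(a: str, b: str) -> bool:
    """Return True if both texts decode to equal JSON values.

    json.loads ignores surrounding whitespace, so a trailing newline
    (or its absence) does not change the decoded value.
    """
    return json.loads(a) == json.loads(b)


with_newline = LINE + "\n"
without_newline = LINE

print(with_newline == without_newline)           # False: raw contents differ by one byte
print(same_json(with_newline, without_newline))  # True: decoded results are identical
```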
tests/testdata/headqa_en-v0-loglikelihood
-09da45119b12a0144e3081f8fb790c2a22af7b9c3aac42f54423d348a711fbf5
+09da45119b12a0144e3081f8fb790c2a22af7b9c3aac42f54423d348a711fbf5
\ No newline at end of file
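Each `-loglikelihood` fixture holds a single 64-character lowercase hex string, which has the shape of a SHA-256 hex digest. A minimal sketch, assuming (this page does not say so) that such a fixture stores the SHA-256 of a canonical, key-sorted JSON encoding of some test object; `fingerprint` and `looks_like_sha256_hex` are hypothetical helpers, not the harness's API:

```python
import hashlib
import json

# Content of tests/testdata/headqa_en-v0-loglikelihood, copied from the diff.
STORED = "09da45119b12a0144e3081f8fb790c2a22af7b9c3aac42f54423d348a711fbf5"

HEX_DIGITS = set("0123456789abcdef")


def looks_like_sha256_hex(s: str) -> bool:
    """True if s has the shape of hashlib's SHA-256 hexdigest: 64 lowercase hex chars."""
    return len(s) == 64 and set(s) <= HEX_DIGITS


def fingerprint(obj) -> str:
    """Hypothetical: SHA-256 of a key-sorted JSON encoding, so dict order does not matter."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Sorting keys before hashing is what makes such a fingerprint stable across runs; a trailing newline in the fixture file would have to be stripped before comparing it with `fingerprint(...)`.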
tests/testdata/headqa_en-v0-res.json
-{"results": {"headqa_en": {"acc": 0.23559445660102116, "acc_norm": 0.2447118891320204, "acc_norm_stderr": 0.008211629406841468, "acc_stderr": 0.008105688874297972}}, "versions": {"headqa_en": 0}}
+{"results": {"headqa_en": {"acc": 0.23559445660102116, "acc_norm": 0.2447118891320204, "acc_norm_stderr": 0.008211629406841468, "acc_stderr": 0.008105688874297972}}, "versions": {"headqa_en": 0}}
\ No newline at end of file
tests/testdata/headqa_es-v0-loglikelihood
-767ca34d9714edd9fb030ddbcc35a64e5180d1e247b0cb557fbb22fdf971ad1f
+767ca34d9714edd9fb030ddbcc35a64e5180d1e247b0cb557fbb22fdf971ad1f
\ No newline at end of file
tests/testdata/headqa_es-v0-res.json
-{"results": {"headqa_es": {"acc": 0.23559445660102116, "acc_norm": 0.25018234865062, "acc_norm_stderr": 0.008272783230806014, "acc_stderr": 0.008105688874297972}}, "versions": {"headqa_es": 0}}
+{"results": {"headqa_es": {"acc": 0.23559445660102116, "acc_norm": 0.25018234865062, "acc_norm_stderr": 0.008272783230806014, "acc_stderr": 0.008105688874297972}}, "versions": {"headqa_es": 0}}
\ No newline at end of file
tests/testdata/hellaswag-v0-loglikelihood
-abb808c97d6529eda6c11067837a132c62d25cba0394d720f80cca6df9f7196e
+abb808c97d6529eda6c11067837a132c62d25cba0394d720f80cca6df9f7196e
\ No newline at end of file
tests/testdata/hellaswag-v0-res.json
-{"results": {"hellaswag": {"acc": 0.24965146385182235, "acc_norm": 0.24756024696275641, "acc_norm_stderr": 0.004307128573285236, "acc_stderr": 0.004319267432460666}}, "versions": {"hellaswag": 0}}
+{"results": {"hellaswag": {"acc": 0.24965146385182235, "acc_norm": 0.24756024696275641, "acc_norm_stderr": 0.004307128573285236, "acc_stderr": 0.004319267432460666}}, "versions": {"hellaswag": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-abstract_algebra-v0-loglikelihood
-e35d1eeb356ac1084d4e9773f028cb3c81ba1c6e5574d598ac4a78aa467cd797
+e35d1eeb356ac1084d4e9773f028cb3c81ba1c6e5574d598ac4a78aa467cd797
\ No newline at end of file
tests/testdata/hendrycksTest-abstract_algebra-v0-res.json
-{"results": {"hendrycksTest-abstract_algebra": {"acc": 0.32, "acc_norm": 0.34, "acc_norm_stderr": 0.04760952285695235, "acc_stderr": 0.04688261722621504}}, "versions": {"hendrycksTest-abstract_algebra": 0}}
+{"results": {"hendrycksTest-abstract_algebra": {"acc": 0.32, "acc_norm": 0.34, "acc_norm_stderr": 0.04760952285695235, "acc_stderr": 0.04688261722621504}}, "versions": {"hendrycksTest-abstract_algebra": 0}}
\ No newline at end of file
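The `*_stderr` fields appear consistent with the standard error of the mean of per-question 0/1 scores, computed with the sample (n - 1) variance. A minimal sketch, assuming that definition and assuming the abstract_algebra split has n = 100 questions (neither is stated in this commit); `mean_stderr_binary` is a hypothetical name:

```python
import math


def mean_stderr_binary(k: int, n: int) -> float:
    """Standard error of the mean of n scores in {0, 1}, k of which are 1.

    Uses the sample variance (divisor n - 1), then divides by sqrt(n).
    For 0/1 data, sum((x - p)^2) = n * p * (1 - p), so the result
    simplifies to sqrt(p * (1 - p) / (n - 1)).
    """
    if n < 2:
        raise ValueError("need at least two scores for a sample variance")
    p = k / n
    sample_var = n * p * (1 - p) / (n - 1)
    return math.sqrt(sample_var / n)


# Assumption: n = 100, so acc 0.32 means k = 32 and acc_norm 0.34 means k = 34.
print(mean_stderr_binary(32, 100))  # compare with acc_stderr 0.04688261722621504
print(mean_stderr_binary(34, 100))  # compare with acc_norm_stderr 0.04760952285695235
```

Because the fixtures store only `acc` and its standard error, a check like this can catch a stderr that was computed with the population (divisor n) variance instead.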
tests/testdata/hendrycksTest-anatomy-v0-loglikelihood
-bf05e04ed8cf61cf3aad294ed3f5a16137775ffdd20f1b129022ddffc1251768
+bf05e04ed8cf61cf3aad294ed3f5a16137775ffdd20f1b129022ddffc1251768
\ No newline at end of file
tests/testdata/hendrycksTest-anatomy-v0-res.json
-{"results": {"hendrycksTest-anatomy": {"acc": 0.2222222222222222, "acc_norm": 0.23703703703703705, "acc_norm_stderr": 0.03673731683969506, "acc_stderr": 0.0359144408419697}}, "versions": {"hendrycksTest-anatomy": 0}}
+{"results": {"hendrycksTest-anatomy": {"acc": 0.2222222222222222, "acc_norm": 0.23703703703703705, "acc_norm_stderr": 0.03673731683969506, "acc_stderr": 0.0359144408419697}}, "versions": {"hendrycksTest-anatomy": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-astronomy-v0-loglikelihood
-bed1e47127cc2893c6aef63b9a0909cca31aa351a703da2a166b01cae03c3311
+bed1e47127cc2893c6aef63b9a0909cca31aa351a703da2a166b01cae03c3311
\ No newline at end of file
tests/testdata/hendrycksTest-astronomy-v0-res.json
-{"results": {"hendrycksTest-astronomy": {"acc": 0.2565789473684211, "acc_norm": 0.29605263157894735, "acc_norm_stderr": 0.03715062154998904, "acc_stderr": 0.0355418036802569}}, "versions": {"hendrycksTest-astronomy": 0}}
+{"results": {"hendrycksTest-astronomy": {"acc": 0.2565789473684211, "acc_norm": 0.29605263157894735, "acc_norm_stderr": 0.03715062154998904, "acc_stderr": 0.0355418036802569}}, "versions": {"hendrycksTest-astronomy": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-business_ethics-v0-loglikelihood
-b3b27e9dbad587377d3c8cab1072782de883e245da93a563bd8b3099017b1fc0
+b3b27e9dbad587377d3c8cab1072782de883e245da93a563bd8b3099017b1fc0
\ No newline at end of file
tests/testdata/hendrycksTest-business_ethics-v0-res.json
-{"results": {"hendrycksTest-business_ethics": {"acc": 0.29, "acc_norm": 0.27, "acc_norm_stderr": 0.044619604333847394, "acc_stderr": 0.045604802157206845}}, "versions": {"hendrycksTest-business_ethics": 0}}
+{"results": {"hendrycksTest-business_ethics": {"acc": 0.29, "acc_norm": 0.27, "acc_norm_stderr": 0.044619604333847394, "acc_stderr": 0.045604802157206845}}, "versions": {"hendrycksTest-business_ethics": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-clinical_knowledge-v0-loglikelihood
-fbcb7ce507e0675d811e71e10a67c8d05a6605e29036f46776e04a6588cefbda
+fbcb7ce507e0675d811e71e10a67c8d05a6605e29036f46776e04a6588cefbda
\ No newline at end of file
tests/testdata/hendrycksTest-clinical_knowledge-v0-res.json
-{"results": {"hendrycksTest-clinical_knowledge": {"acc": 0.23773584905660378, "acc_norm": 0.27169811320754716, "acc_norm_stderr": 0.027377706624670713, "acc_stderr": 0.02619980880756191}}, "versions": {"hendrycksTest-clinical_knowledge": 0}}
+{"results": {"hendrycksTest-clinical_knowledge": {"acc": 0.23773584905660378, "acc_norm": 0.27169811320754716, "acc_norm_stderr": 0.027377706624670713, "acc_stderr": 0.02619980880756191}}, "versions": {"hendrycksTest-clinical_knowledge": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-college_biology-v0-loglikelihood
-c29e4e67ff91af29b9434884874414d1b1b32ccc32903c6b1639469b19907419
+c29e4e67ff91af29b9434884874414d1b1b32ccc32903c6b1639469b19907419
\ No newline at end of file
tests/testdata/hendrycksTest-college_biology-v0-res.json
-{"results": {"hendrycksTest-college_biology": {"acc": 0.24305555555555555, "acc_norm": 0.2361111111111111, "acc_norm_stderr": 0.03551446610810826, "acc_stderr": 0.03586879280080341}}, "versions": {"hendrycksTest-college_biology": 0}}
+{"results": {"hendrycksTest-college_biology": {"acc": 0.24305555555555555, "acc_norm": 0.2361111111111111, "acc_norm_stderr": 0.03551446610810826, "acc_stderr": 0.03586879280080341}}, "versions": {"hendrycksTest-college_biology": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-college_chemistry-v0-loglikelihood
-044752b21540db95118b8cbe7e75c4c9b8758e27df56543deaeadec7f749a28d
+044752b21540db95118b8cbe7e75c4c9b8758e27df56543deaeadec7f749a28d
\ No newline at end of file