gaoqiong / lm-evaluation-harness
Commit 8c997e53, authored May 03, 2022 by jon-tow
Revert `tests/testdata` changes and address flake8 issues
Parent: d95a4333
Changes: 627
Showing 20 changed files with 20 additions and 20 deletions
tests/testdata/hendrycksTest-logical_fallacies-v0-res.json +1 -1
tests/testdata/hendrycksTest-machine_learning-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-machine_learning-v0-res.json +1 -1
tests/testdata/hendrycksTest-management-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-management-v0-res.json +1 -1
tests/testdata/hendrycksTest-marketing-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-marketing-v0-res.json +1 -1
tests/testdata/hendrycksTest-medical_genetics-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-medical_genetics-v0-res.json +1 -1
tests/testdata/hendrycksTest-miscellaneous-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-miscellaneous-v0-res.json +1 -1
tests/testdata/hendrycksTest-moral_disputes-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-moral_disputes-v0-res.json +1 -1
tests/testdata/hendrycksTest-moral_scenarios-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-moral_scenarios-v0-res.json +1 -1
tests/testdata/hendrycksTest-nutrition-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-nutrition-v0-res.json +1 -1
tests/testdata/hendrycksTest-philosophy-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-philosophy-v0-res.json +1 -1
tests/testdata/hendrycksTest-prehistory-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-logical_fallacies-v0-res.json
-{"results": {"hendrycksTest-logical_fallacies": {"acc": 0.20245398773006135, "acc_norm": 0.2147239263803681, "acc_norm_stderr": 0.03226219377286774, "acc_stderr": 0.03157065078911902}}, "versions": {"hendrycksTest-logical_fallacies": 0}}
+{"results": {"hendrycksTest-logical_fallacies": {"acc": 0.20245398773006135, "acc_norm": 0.2147239263803681, "acc_norm_stderr": 0.03226219377286774, "acc_stderr": 0.03157065078911902}}, "versions": {"hendrycksTest-logical_fallacies": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-machine_learning-v0-loglikelihood
-7a7138821a66ef946e427b40344cf7f1a916a2926995a85ef731a3bee40cb7ce
+7a7138821a66ef946e427b40344cf7f1a916a2926995a85ef731a3bee40cb7ce
\ No newline at end of file
tests/testdata/hendrycksTest-machine_learning-v0-res.json
-{"results": {"hendrycksTest-machine_learning": {"acc": 0.23214285714285715, "acc_norm": 0.22321428571428573, "acc_norm_stderr": 0.039523019677025116, "acc_stderr": 0.04007341809755806}}, "versions": {"hendrycksTest-machine_learning": 0}}
+{"results": {"hendrycksTest-machine_learning": {"acc": 0.23214285714285715, "acc_norm": 0.22321428571428573, "acc_norm_stderr": 0.039523019677025116, "acc_stderr": 0.04007341809755806}}, "versions": {"hendrycksTest-machine_learning": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-management-v0-loglikelihood
-355489f4bd176ab84db5ef4c03d56ddeeeb1b0ad69827122b2d800e1cdc7e5f0
+355489f4bd176ab84db5ef4c03d56ddeeeb1b0ad69827122b2d800e1cdc7e5f0
\ No newline at end of file
tests/testdata/hendrycksTest-management-v0-res.json
-{"results": {"hendrycksTest-management": {"acc": 0.24271844660194175, "acc_norm": 0.2621359223300971, "acc_norm_stderr": 0.043546310772605956, "acc_stderr": 0.04245022486384495}}, "versions": {"hendrycksTest-management": 0}}
+{"results": {"hendrycksTest-management": {"acc": 0.24271844660194175, "acc_norm": 0.2621359223300971, "acc_norm_stderr": 0.043546310772605956, "acc_stderr": 0.04245022486384495}}, "versions": {"hendrycksTest-management": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-marketing-v0-loglikelihood
-b4fa0681fe54671a80509779d4338d744097a7206687f62977df7145dfa74a66
+b4fa0681fe54671a80509779d4338d744097a7206687f62977df7145dfa74a66
\ No newline at end of file
tests/testdata/hendrycksTest-marketing-v0-res.json
-{"results": {"hendrycksTest-marketing": {"acc": 0.2863247863247863, "acc_norm": 0.2905982905982906, "acc_norm_stderr": 0.029745048572674043, "acc_stderr": 0.029614323690456648}}, "versions": {"hendrycksTest-marketing": 0}}
+{"results": {"hendrycksTest-marketing": {"acc": 0.2863247863247863, "acc_norm": 0.2905982905982906, "acc_norm_stderr": 0.029745048572674043, "acc_stderr": 0.029614323690456648}}, "versions": {"hendrycksTest-marketing": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-medical_genetics-v0-loglikelihood
-db6141246889a19dd3f6b9109f314d49c1a70f7a98795858804378b095c4a2fe
+db6141246889a19dd3f6b9109f314d49c1a70f7a98795858804378b095c4a2fe
\ No newline at end of file
tests/testdata/hendrycksTest-medical_genetics-v0-res.json
-{"results": {"hendrycksTest-medical_genetics": {"acc": 0.27, "acc_norm": 0.29, "acc_norm_stderr": 0.04560480215720684, "acc_stderr": 0.0446196043338474}}, "versions": {"hendrycksTest-medical_genetics": 0}}
+{"results": {"hendrycksTest-medical_genetics": {"acc": 0.27, "acc_norm": 0.29, "acc_norm_stderr": 0.04560480215720684, "acc_stderr": 0.0446196043338474}}, "versions": {"hendrycksTest-medical_genetics": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-miscellaneous-v0-loglikelihood
-972dd88dbbaf09d14766e243cfc233425e7c01a26dbc61bdb9eeefa788822331
+972dd88dbbaf09d14766e243cfc233425e7c01a26dbc61bdb9eeefa788822331
\ No newline at end of file
tests/testdata/hendrycksTest-miscellaneous-v0-res.json
-{"results": {"hendrycksTest-miscellaneous": {"acc": 0.23499361430395913, "acc_norm": 0.2515964240102171, "acc_norm_stderr": 0.015517322365529622, "acc_stderr": 0.015162024152278445}}, "versions": {"hendrycksTest-miscellaneous": 0}}
+{"results": {"hendrycksTest-miscellaneous": {"acc": 0.23499361430395913, "acc_norm": 0.2515964240102171, "acc_norm_stderr": 0.015517322365529622, "acc_stderr": 0.015162024152278445}}, "versions": {"hendrycksTest-miscellaneous": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-moral_disputes-v0-loglikelihood
-d6ef028022c02b69d1516973e08bebaa14d8debcf2589a2bb124823178202d20
+d6ef028022c02b69d1516973e08bebaa14d8debcf2589a2bb124823178202d20
\ No newline at end of file
tests/testdata/hendrycksTest-moral_disputes-v0-res.json
-{"results": {"hendrycksTest-moral_disputes": {"acc": 0.24855491329479767, "acc_norm": 0.27167630057803466, "acc_norm_stderr": 0.023948512905468365, "acc_stderr": 0.023267528432100174}}, "versions": {"hendrycksTest-moral_disputes": 0}}
+{"results": {"hendrycksTest-moral_disputes": {"acc": 0.24855491329479767, "acc_norm": 0.27167630057803466, "acc_norm_stderr": 0.023948512905468365, "acc_stderr": 0.023267528432100174}}, "versions": {"hendrycksTest-moral_disputes": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-moral_scenarios-v0-loglikelihood
-a8e1882e77728b53c8b86312254d08320d8363fb606d746a8dd145b812f62cf5
+a8e1882e77728b53c8b86312254d08320d8363fb606d746a8dd145b812f62cf5
\ No newline at end of file
tests/testdata/hendrycksTest-moral_scenarios-v0-res.json
-{"results": {"hendrycksTest-moral_scenarios": {"acc": 0.2547486033519553, "acc_norm": 0.25251396648044694, "acc_norm_stderr": 0.014530330201468654, "acc_stderr": 0.014572650383409158}}, "versions": {"hendrycksTest-moral_scenarios": 0}}
+{"results": {"hendrycksTest-moral_scenarios": {"acc": 0.2547486033519553, "acc_norm": 0.25251396648044694, "acc_norm_stderr": 0.014530330201468654, "acc_stderr": 0.014572650383409158}}, "versions": {"hendrycksTest-moral_scenarios": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-nutrition-v0-loglikelihood
-19e49d218f55ed5ec4bd1a6cd3f3388c6f620b81484e7abe8b298e5481c3044d
+19e49d218f55ed5ec4bd1a6cd3f3388c6f620b81484e7abe8b298e5481c3044d
\ No newline at end of file
tests/testdata/hendrycksTest-nutrition-v0-res.json
-{"results": {"hendrycksTest-nutrition": {"acc": 0.24509803921568626, "acc_norm": 0.28104575163398693, "acc_norm_stderr": 0.025738854797818723, "acc_stderr": 0.02463004897982476}}, "versions": {"hendrycksTest-nutrition": 0}}
+{"results": {"hendrycksTest-nutrition": {"acc": 0.24509803921568626, "acc_norm": 0.28104575163398693, "acc_norm_stderr": 0.025738854797818723, "acc_stderr": 0.02463004897982476}}, "versions": {"hendrycksTest-nutrition": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-philosophy-v0-loglikelihood
-a419204da36c2b7a70fa8909a3a804260cc3283c7e07917534dfb76216c77f46
+a419204da36c2b7a70fa8909a3a804260cc3283c7e07917534dfb76216c77f46
\ No newline at end of file
tests/testdata/hendrycksTest-philosophy-v0-res.json
-{"results": {"hendrycksTest-philosophy": {"acc": 0.26366559485530544, "acc_norm": 0.2733118971061093, "acc_norm_stderr": 0.02531176597542612, "acc_stderr": 0.02502553850053234}}, "versions": {"hendrycksTest-philosophy": 0}}
+{"results": {"hendrycksTest-philosophy": {"acc": 0.26366559485530544, "acc_norm": 0.2733118971061093, "acc_norm_stderr": 0.02531176597542612, "acc_stderr": 0.02502553850053234}}, "versions": {"hendrycksTest-philosophy": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-prehistory-v0-loglikelihood
-6983c560a562749f4f702249a3a6ae51fa495acc0643a980bf2cf52c6c5d4b95
+6983c560a562749f4f702249a3a6ae51fa495acc0643a980bf2cf52c6c5d4b95
\ No newline at end of file