gaoqiong / lm-evaluation-harness
Commit 121b7096, authored May 02, 2022 by Fabrizio Milo
add pre-commit
parent 7a038118
Changes 732
Showing 20 changed files with 20 additions and 20 deletions
tests/testdata/hendrycksTest-miscellaneous-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-miscellaneous-v0-res.json +1 -1
tests/testdata/hendrycksTest-moral_disputes-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-moral_disputes-v0-res.json +1 -1
tests/testdata/hendrycksTest-moral_scenarios-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-moral_scenarios-v0-res.json +1 -1
tests/testdata/hendrycksTest-nutrition-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-nutrition-v0-res.json +1 -1
tests/testdata/hendrycksTest-philosophy-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-philosophy-v0-res.json +1 -1
tests/testdata/hendrycksTest-prehistory-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-prehistory-v0-res.json +1 -1
tests/testdata/hendrycksTest-professional_accounting-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-professional_accounting-v0-res.json +1 -1
tests/testdata/hendrycksTest-professional_law-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-professional_law-v0-res.json +1 -1
tests/testdata/hendrycksTest-professional_medicine-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-professional_medicine-v0-res.json +1 -1
tests/testdata/hendrycksTest-professional_psychology-v0-loglikelihood +1 -1
tests/testdata/hendrycksTest-professional_psychology-v0-res.json +1 -1
tests/testdata/hendrycksTest-miscellaneous-v0-loglikelihood
972dd88dbbaf09d14766e243cfc233425e7c01a26dbc61bdb9eeefa788822331
\ No newline at end of file
972dd88dbbaf09d14766e243cfc233425e7c01a26dbc61bdb9eeefa788822331
tests/testdata/hendrycksTest-miscellaneous-v0-res.json
{"results": {"hendrycksTest-miscellaneous": {"acc": 0.23499361430395913, "acc_norm": 0.2515964240102171, "acc_norm_stderr": 0.015517322365529622, "acc_stderr": 0.015162024152278445}}, "versions": {"hendrycksTest-miscellaneous": 0}}
\ No newline at end of file
{"results": {"hendrycksTest-miscellaneous": {"acc": 0.23499361430395913, "acc_norm": 0.2515964240102171, "acc_norm_stderr": 0.015517322365529622, "acc_stderr": 0.015162024152278445}}, "versions": {"hendrycksTest-miscellaneous": 0}}
tests/testdata/hendrycksTest-moral_disputes-v0-loglikelihood
d6ef028022c02b69d1516973e08bebaa14d8debcf2589a2bb124823178202d20
\ No newline at end of file
d6ef028022c02b69d1516973e08bebaa14d8debcf2589a2bb124823178202d20
tests/testdata/hendrycksTest-moral_disputes-v0-res.json
{"results": {"hendrycksTest-moral_disputes": {"acc": 0.24855491329479767, "acc_norm": 0.27167630057803466, "acc_norm_stderr": 0.023948512905468365, "acc_stderr": 0.023267528432100174}}, "versions": {"hendrycksTest-moral_disputes": 0}}
\ No newline at end of file
{"results": {"hendrycksTest-moral_disputes": {"acc": 0.24855491329479767, "acc_norm": 0.27167630057803466, "acc_norm_stderr": 0.023948512905468365, "acc_stderr": 0.023267528432100174}}, "versions": {"hendrycksTest-moral_disputes": 0}}
tests/testdata/hendrycksTest-moral_scenarios-v0-loglikelihood
a8e1882e77728b53c8b86312254d08320d8363fb606d746a8dd145b812f62cf5
\ No newline at end of file
a8e1882e77728b53c8b86312254d08320d8363fb606d746a8dd145b812f62cf5
tests/testdata/hendrycksTest-moral_scenarios-v0-res.json
{"results": {"hendrycksTest-moral_scenarios": {"acc": 0.2547486033519553, "acc_norm": 0.25251396648044694, "acc_norm_stderr": 0.014530330201468654, "acc_stderr": 0.014572650383409158}}, "versions": {"hendrycksTest-moral_scenarios": 0}}
\ No newline at end of file
{"results": {"hendrycksTest-moral_scenarios": {"acc": 0.2547486033519553, "acc_norm": 0.25251396648044694, "acc_norm_stderr": 0.014530330201468654, "acc_stderr": 0.014572650383409158}}, "versions": {"hendrycksTest-moral_scenarios": 0}}
tests/testdata/hendrycksTest-nutrition-v0-loglikelihood
19e49d218f55ed5ec4bd1a6cd3f3388c6f620b81484e7abe8b298e5481c3044d
\ No newline at end of file
19e49d218f55ed5ec4bd1a6cd3f3388c6f620b81484e7abe8b298e5481c3044d
tests/testdata/hendrycksTest-nutrition-v0-res.json
{"results": {"hendrycksTest-nutrition": {"acc": 0.24509803921568626, "acc_norm": 0.28104575163398693, "acc_norm_stderr": 0.025738854797818723, "acc_stderr": 0.02463004897982476}}, "versions": {"hendrycksTest-nutrition": 0}}
\ No newline at end of file
{"results": {"hendrycksTest-nutrition": {"acc": 0.24509803921568626, "acc_norm": 0.28104575163398693, "acc_norm_stderr": 0.025738854797818723, "acc_stderr": 0.02463004897982476}}, "versions": {"hendrycksTest-nutrition": 0}}
tests/testdata/hendrycksTest-philosophy-v0-loglikelihood
a419204da36c2b7a70fa8909a3a804260cc3283c7e07917534dfb76216c77f46
\ No newline at end of file
a419204da36c2b7a70fa8909a3a804260cc3283c7e07917534dfb76216c77f46
tests/testdata/hendrycksTest-philosophy-v0-res.json
{"results": {"hendrycksTest-philosophy": {"acc": 0.26366559485530544, "acc_norm": 0.2733118971061093, "acc_norm_stderr": 0.02531176597542612, "acc_stderr": 0.02502553850053234}}, "versions": {"hendrycksTest-philosophy": 0}}
\ No newline at end of file
{"results": {"hendrycksTest-philosophy": {"acc": 0.26366559485530544, "acc_norm": 0.2733118971061093, "acc_norm_stderr": 0.02531176597542612, "acc_stderr": 0.02502553850053234}}, "versions": {"hendrycksTest-philosophy": 0}}
tests/testdata/hendrycksTest-prehistory-v0-loglikelihood
6983c560a562749f4f702249a3a6ae51fa495acc0643a980bf2cf52c6c5d4b95
\ No newline at end of file
6983c560a562749f4f702249a3a6ae51fa495acc0643a980bf2cf52c6c5d4b95
tests/testdata/hendrycksTest-prehistory-v0-res.json
{"results": {"hendrycksTest-prehistory": {"acc": 0.2623456790123457, "acc_norm": 0.26851851851851855, "acc_norm_stderr": 0.024659685185967277, "acc_stderr": 0.02447722285613511}}, "versions": {"hendrycksTest-prehistory": 0}}
\ No newline at end of file
{"results": {"hendrycksTest-prehistory": {"acc": 0.2623456790123457, "acc_norm": 0.26851851851851855, "acc_norm_stderr": 0.024659685185967277, "acc_stderr": 0.02447722285613511}}, "versions": {"hendrycksTest-prehistory": 0}}
tests/testdata/hendrycksTest-professional_accounting-v0-loglikelihood
847418f7b22cd9b499e95fd73c40a2fbc40076895280cc2c560199c0c4c4f433
\ No newline at end of file
847418f7b22cd9b499e95fd73c40a2fbc40076895280cc2c560199c0c4c4f433
tests/testdata/hendrycksTest-professional_accounting-v0-res.json
{"results": {"hendrycksTest-professional_accounting": {"acc": 0.2553191489361702, "acc_norm": 0.26595744680851063, "acc_norm_stderr": 0.026358065698880582, "acc_stderr": 0.026011992930902006}}, "versions": {"hendrycksTest-professional_accounting": 0}}
\ No newline at end of file
{"results": {"hendrycksTest-professional_accounting": {"acc": 0.2553191489361702, "acc_norm": 0.26595744680851063, "acc_norm_stderr": 0.026358065698880582, "acc_stderr": 0.026011992930902006}}, "versions": {"hendrycksTest-professional_accounting": 0}}
tests/testdata/hendrycksTest-professional_law-v0-loglikelihood
c38c9d5d84eeb7a5f3c4a34d6e70d7e15847b3c38f26e4b119c982bb935e118f
\ No newline at end of file
c38c9d5d84eeb7a5f3c4a34d6e70d7e15847b3c38f26e4b119c982bb935e118f
tests/testdata/hendrycksTest-professional_law-v0-res.json
{"results": {"hendrycksTest-professional_law": {"acc": 0.2561929595827901, "acc_norm": 0.2470664928292047, "acc_norm_stderr": 0.011015752255279352, "acc_stderr": 0.011149173153110582}}, "versions": {"hendrycksTest-professional_law": 0}}
\ No newline at end of file
{"results": {"hendrycksTest-professional_law": {"acc": 0.2561929595827901, "acc_norm": 0.2470664928292047, "acc_norm_stderr": 0.011015752255279352, "acc_stderr": 0.011149173153110582}}, "versions": {"hendrycksTest-professional_law": 0}}
tests/testdata/hendrycksTest-professional_medicine-v0-loglikelihood
7a30599858398169cde61430c18efdd7fb4dcd09c34aa9baba70f0f8cf17a9f1
\ No newline at end of file
7a30599858398169cde61430c18efdd7fb4dcd09c34aa9baba70f0f8cf17a9f1
tests/testdata/hendrycksTest-professional_medicine-v0-res.json
{"results": {"hendrycksTest-professional_medicine": {"acc": 0.23161764705882354, "acc_norm": 0.2536764705882353, "acc_norm_stderr": 0.02643132987078953, "acc_stderr": 0.025626533803777562}}, "versions": {"hendrycksTest-professional_medicine": 0}}
\ No newline at end of file
{"results": {"hendrycksTest-professional_medicine": {"acc": 0.23161764705882354, "acc_norm": 0.2536764705882353, "acc_norm_stderr": 0.02643132987078953, "acc_stderr": 0.025626533803777562}}, "versions": {"hendrycksTest-professional_medicine": 0}}
tests/testdata/hendrycksTest-professional_psychology-v0-loglikelihood
92a5fad6e9ec700f84946faeccd399dda3569fb71837c9fb0c5c87f5ec29c43e
\ No newline at end of file
92a5fad6e9ec700f84946faeccd399dda3569fb71837c9fb0c5c87f5ec29c43e
tests/testdata/hendrycksTest-professional_psychology-v0-res.json
{"results": {"hendrycksTest-professional_psychology": {"acc": 0.27124183006535946, "acc_norm": 0.2826797385620915, "acc_norm_stderr": 0.01821726955205344, "acc_stderr": 0.01798661530403031}}, "versions": {"hendrycksTest-professional_psychology": 0}}
\ No newline at end of file
{"results": {"hendrycksTest-professional_psychology": {"acc": 0.27124183006535946, "acc_norm": 0.2826797385620915, "acc_norm_stderr": 0.01821726955205344, "acc_stderr": 0.01798661530403031}}, "versions": {"hendrycksTest-professional_psychology": 0}}
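Every file in this diff carries the same +1 -1 change: the old version is marked "\ No newline at end of file" and the new version has identical content, so the only difference appears to be a final newline. The pre-commit configuration itself is not shown on this page, so the following is only a minimal sketch of what such a fix could look like, assuming a plain "append a newline if one is missing" rule. The function name `ensure_trailing_newline` is hypothetical and is not taken from the repository.

```python
from pathlib import Path


def ensure_trailing_newline(path: Path) -> bool:
    """Append a final newline if the file is non-empty and lacks one.

    Hypothetical helper for illustration only; returns True when the
    file was modified and False when it was left untouched.
    """
    data = path.read_bytes()
    # Empty files and files that already end in "\n" need no change.
    if not data or data.endswith(b"\n"):
        return False
    path.write_bytes(data + b"\n")
    return True
```

Applied to a file such as tests/testdata/hendrycksTest-miscellaneous-v0-loglikelihood, a rule like this would leave the stored hash string unchanged and add one byte at the end, which is consistent with the one-line additions and deletions reported above.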