gaoqiong / lm-evaluation-harness / Commits / 121b7096

Commit 121b7096, authored May 02, 2022 by Fabrizio Milo

add pre-commit

Parent: 7a038118
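
Every file on this page changes by exactly one line with identical visible content, which is the signature of a whitespace-only clean-up: adopting pre-commit typically also means running its standard hooks (for example end-of-file-fixer) over the repository so that each file ends with exactly one trailing newline. The commit's .pre-commit-config.yaml is not part of this page, so the Python sketch below is an assumption about what such a hook does, not code taken from this commit:

    # Minimal sketch of an end-of-file fix-up in the spirit of pre-commit's
    # end-of-file-fixer hook (an assumption about this commit's config, not
    # code taken from it): ensure a non-empty file ends with one newline.
    from pathlib import Path

    def fix_end_of_file(path: Path) -> bool:
        """Return True if the file had to be rewritten."""
        original = path.read_bytes()
        if not original:
            return False  # leave empty files alone
        fixed = original.rstrip(b"\n") + b"\n"
        if fixed == original:
            return False
        path.write_bytes(fixed)
        return True

    if __name__ == "__main__":
        # Hypothetical usage against one of the fixtures changed below.
        changed = fix_end_of_file(
            Path("tests/testdata/hendrycksTest-human_aging-v0-loglikelihood")
        )
        print("rewritten" if changed else "already clean")
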
Changes: 732
Showing 20 changed files with 20 additions and 20 deletions (+20, -20) on this page:
tests/testdata/hendrycksTest-high_school_world_history-v0-loglikelihood (+1, -1)
tests/testdata/hendrycksTest-high_school_world_history-v0-res.json (+1, -1)
tests/testdata/hendrycksTest-human_aging-v0-loglikelihood (+1, -1)
tests/testdata/hendrycksTest-human_aging-v0-res.json (+1, -1)
tests/testdata/hendrycksTest-human_sexuality-v0-loglikelihood (+1, -1)
tests/testdata/hendrycksTest-human_sexuality-v0-res.json (+1, -1)
tests/testdata/hendrycksTest-international_law-v0-loglikelihood (+1, -1)
tests/testdata/hendrycksTest-international_law-v0-res.json (+1, -1)
tests/testdata/hendrycksTest-jurisprudence-v0-loglikelihood (+1, -1)
tests/testdata/hendrycksTest-jurisprudence-v0-res.json (+1, -1)
tests/testdata/hendrycksTest-logical_fallacies-v0-loglikelihood (+1, -1)
tests/testdata/hendrycksTest-logical_fallacies-v0-res.json (+1, -1)
tests/testdata/hendrycksTest-machine_learning-v0-loglikelihood (+1, -1)
tests/testdata/hendrycksTest-machine_learning-v0-res.json (+1, -1)
tests/testdata/hendrycksTest-management-v0-loglikelihood (+1, -1)
tests/testdata/hendrycksTest-management-v0-res.json (+1, -1)
tests/testdata/hendrycksTest-marketing-v0-loglikelihood (+1, -1)
tests/testdata/hendrycksTest-marketing-v0-res.json (+1, -1)
tests/testdata/hendrycksTest-medical_genetics-v0-loglikelihood (+1, -1)
tests/testdata/hendrycksTest-medical_genetics-v0-res.json (+1, -1)

tests/testdata/hendrycksTest-high_school_world_history-v0-loglikelihood
-1c8b994bd9a63ec874fc8d0e3a27077118b7adc472306b2fd6c55635a78b9d52
\ No newline at end of file
+1c8b994bd9a63ec874fc8d0e3a27077118b7adc472306b2fd6c55635a78b9d52

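Each *-loglikelihood fixture, like the one above, holds a single 64-character hex string, i.e. a SHA-256-sized digest, presumably used as a regression fingerprint for the task's loglikelihood requests. A minimal sketch of how such a fixture might be checked follows; the serialization format and helper names are illustrative assumptions rather than the harness's actual test code:

    # Hypothetical fixture check: recompute a SHA-256 fingerprint over some
    # serialized request data and compare it with the stored testdata digest.
    # The JSON serialization below is an assumption made for illustration.
    import hashlib
    import json
    from pathlib import Path

    def request_fingerprint(requests: list) -> str:
        """SHA-256 hex digest of a deterministically serialized request list."""
        payload = json.dumps(requests, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def matches_fixture(requests: list, fixture_path: str) -> bool:
        expected = Path(fixture_path).read_text().strip()
        return request_fingerprint(requests) == expected
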
tests/testdata/hendrycksTest-high_school_world_history-v0-res.json
-{"results": {"hendrycksTest-high_school_world_history": {"acc": 0.23628691983122363, "acc_norm": 0.24472573839662448, "acc_norm_stderr": 0.02798569938703642, "acc_stderr": 0.027652153144159263}}, "versions": {"hendrycksTest-high_school_world_history": 0}}
\ No newline at end of file
+{"results": {"hendrycksTest-high_school_world_history": {"acc": 0.23628691983122363, "acc_norm": 0.24472573839662448, "acc_norm_stderr": 0.02798569938703642, "acc_stderr": 0.027652153144159263}}, "versions": {"hendrycksTest-high_school_world_history": 0}}

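The *-res.json fixtures are single-line JSON documents with the structure shown above: a "results" mapping from task name to metrics (acc, acc_norm, and their standard errors) and a "versions" mapping from task name to task version. A short sketch for reading one of them back, using the file above:

    # Load one of the -res.json fixtures and print its metrics.
    import json

    path = "tests/testdata/hendrycksTest-high_school_world_history-v0-res.json"
    with open(path) as f:
        data = json.load(f)

    task = "hendrycksTest-high_school_world_history"
    metrics = data["results"][task]
    version = data["versions"][task]
    print(
        f"{task} (v{version}): "
        f"acc={metrics['acc']:.4f} +/- {metrics['acc_stderr']:.4f}, "
        f"acc_norm={metrics['acc_norm']:.4f} +/- {metrics['acc_norm_stderr']:.4f}"
    )
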
tests/testdata/hendrycksTest-human_aging-v0-loglikelihood
-0880b3a78f8d7b17ffc612031427b9085367cf65dabe2a68c4b64e3171d17e88
\ No newline at end of file
+0880b3a78f8d7b17ffc612031427b9085367cf65dabe2a68c4b64e3171d17e88

tests/testdata/hendrycksTest-human_aging-v0-res.json
-{"results": {"hendrycksTest-human_aging": {"acc": 0.21524663677130046, "acc_norm": 0.17937219730941703, "acc_norm_stderr": 0.025749819569192804, "acc_stderr": 0.02758406660220827}}, "versions": {"hendrycksTest-human_aging": 0}}
\ No newline at end of file
+{"results": {"hendrycksTest-human_aging": {"acc": 0.21524663677130046, "acc_norm": 0.17937219730941703, "acc_norm_stderr": 0.025749819569192804, "acc_stderr": 0.02758406660220827}}, "versions": {"hendrycksTest-human_aging": 0}}

tests/testdata/hendrycksTest-human_sexuality-v0-loglikelihood
-4b07922fa1d549b655c21440b13d869263ce7dd9771d8147c450f11c91d26c10
\ No newline at end of file
+4b07922fa1d549b655c21440b13d869263ce7dd9771d8147c450f11c91d26c10

tests/testdata/hendrycksTest-human_sexuality-v0-res.json
-{"results": {"hendrycksTest-human_sexuality": {"acc": 0.22137404580152673, "acc_norm": 0.22900763358778625, "acc_norm_stderr": 0.036853466317118506, "acc_stderr": 0.0364129708131373}}, "versions": {"hendrycksTest-human_sexuality": 0}}
\ No newline at end of file
+{"results": {"hendrycksTest-human_sexuality": {"acc": 0.22137404580152673, "acc_norm": 0.22900763358778625, "acc_norm_stderr": 0.036853466317118506, "acc_stderr": 0.0364129708131373}}, "versions": {"hendrycksTest-human_sexuality": 0}}

tests/testdata/hendrycksTest-international_law-v0-loglikelihood
-ea9b2cefd27959db564168f6ad1169a5eaa012fc5a5d5b8faf9e34d94e335dc1
\ No newline at end of file
+ea9b2cefd27959db564168f6ad1169a5eaa012fc5a5d5b8faf9e34d94e335dc1

tests/testdata/hendrycksTest-international_law-v0-res.json
-{"results": {"hendrycksTest-international_law": {"acc": 0.2396694214876033, "acc_norm": 0.3140495867768595, "acc_norm_stderr": 0.042369647530410164, "acc_stderr": 0.03896878985070417}}, "versions": {"hendrycksTest-international_law": 0}}
\ No newline at end of file
+{"results": {"hendrycksTest-international_law": {"acc": 0.2396694214876033, "acc_norm": 0.3140495867768595, "acc_norm_stderr": 0.042369647530410164, "acc_stderr": 0.03896878985070417}}, "versions": {"hendrycksTest-international_law": 0}}

tests/testdata/hendrycksTest-jurisprudence-v0-loglikelihood
-cac440189f1ec778e82f4975d88b74689553ecc5116aaa7f76587a50c1a610e0
\ No newline at end of file
+cac440189f1ec778e82f4975d88b74689553ecc5116aaa7f76587a50c1a610e0

tests/testdata/hendrycksTest-jurisprudence-v0-res.json
-{"results": {"hendrycksTest-jurisprudence": {"acc": 0.25, "acc_norm": 0.3148148148148148, "acc_norm_stderr": 0.04489931073591312, "acc_stderr": 0.04186091791394607}}, "versions": {"hendrycksTest-jurisprudence": 0}}
\ No newline at end of file
+{"results": {"hendrycksTest-jurisprudence": {"acc": 0.25, "acc_norm": 0.3148148148148148, "acc_norm_stderr": 0.04489931073591312, "acc_stderr": 0.04186091791394607}}, "versions": {"hendrycksTest-jurisprudence": 0}}

tests/testdata/hendrycksTest-logical_fallacies-v0-loglikelihood
-2e9449dd803f9e2334dc562d9f04031fd013ed36b883b44ab500533a5dbbface
\ No newline at end of file
+2e9449dd803f9e2334dc562d9f04031fd013ed36b883b44ab500533a5dbbface

tests/testdata/hendrycksTest-logical_fallacies-v0-res.json
-{"results": {"hendrycksTest-logical_fallacies": {"acc": 0.20245398773006135, "acc_norm": 0.2147239263803681, "acc_norm_stderr": 0.03226219377286774, "acc_stderr": 0.03157065078911902}}, "versions": {"hendrycksTest-logical_fallacies": 0}}
\ No newline at end of file
+{"results": {"hendrycksTest-logical_fallacies": {"acc": 0.20245398773006135, "acc_norm": 0.2147239263803681, "acc_norm_stderr": 0.03226219377286774, "acc_stderr": 0.03157065078911902}}, "versions": {"hendrycksTest-logical_fallacies": 0}}

tests/testdata/hendrycksTest-machine_learning-v0-loglikelihood
-7a7138821a66ef946e427b40344cf7f1a916a2926995a85ef731a3bee40cb7ce
\ No newline at end of file
+7a7138821a66ef946e427b40344cf7f1a916a2926995a85ef731a3bee40cb7ce

tests/testdata/hendrycksTest-machine_learning-v0-res.json
-{"results": {"hendrycksTest-machine_learning": {"acc": 0.23214285714285715, "acc_norm": 0.22321428571428573, "acc_norm_stderr": 0.039523019677025116, "acc_stderr": 0.04007341809755806}}, "versions": {"hendrycksTest-machine_learning": 0}}
\ No newline at end of file
+{"results": {"hendrycksTest-machine_learning": {"acc": 0.23214285714285715, "acc_norm": 0.22321428571428573, "acc_norm_stderr": 0.039523019677025116, "acc_stderr": 0.04007341809755806}}, "versions": {"hendrycksTest-machine_learning": 0}}

tests/testdata/hendrycksTest-management-v0-loglikelihood
-355489f4bd176ab84db5ef4c03d56ddeeeb1b0ad69827122b2d800e1cdc7e5f0
\ No newline at end of file
+355489f4bd176ab84db5ef4c03d56ddeeeb1b0ad69827122b2d800e1cdc7e5f0

tests/testdata/hendrycksTest-management-v0-res.json
-{"results": {"hendrycksTest-management": {"acc": 0.24271844660194175, "acc_norm": 0.2621359223300971, "acc_norm_stderr": 0.043546310772605956, "acc_stderr": 0.04245022486384495}}, "versions": {"hendrycksTest-management": 0}}
\ No newline at end of file
+{"results": {"hendrycksTest-management": {"acc": 0.24271844660194175, "acc_norm": 0.2621359223300971, "acc_norm_stderr": 0.043546310772605956, "acc_stderr": 0.04245022486384495}}, "versions": {"hendrycksTest-management": 0}}

tests/testdata/hendrycksTest-marketing-v0-loglikelihood
-b4fa0681fe54671a80509779d4338d744097a7206687f62977df7145dfa74a66
\ No newline at end of file
+b4fa0681fe54671a80509779d4338d744097a7206687f62977df7145dfa74a66

tests/testdata/hendrycksTest-marketing-v0-res.json
-{"results": {"hendrycksTest-marketing": {"acc": 0.2863247863247863, "acc_norm": 0.2905982905982906, "acc_norm_stderr": 0.029745048572674043, "acc_stderr": 0.029614323690456648}}, "versions": {"hendrycksTest-marketing": 0}}
\ No newline at end of file
+{"results": {"hendrycksTest-marketing": {"acc": 0.2863247863247863, "acc_norm": 0.2905982905982906, "acc_norm_stderr": 0.029745048572674043, "acc_stderr": 0.029614323690456648}}, "versions": {"hendrycksTest-marketing": 0}}

tests/testdata/hendrycksTest-medical_genetics-v0-loglikelihood
-db6141246889a19dd3f6b9109f314d49c1a70f7a98795858804378b095c4a2fe
\ No newline at end of file
+db6141246889a19dd3f6b9109f314d49c1a70f7a98795858804378b095c4a2fe

tests/testdata/hendrycksTest-medical_genetics-v0-res.json
-{"results": {"hendrycksTest-medical_genetics": {"acc": 0.27, "acc_norm": 0.29, "acc_norm_stderr": 0.04560480215720684, "acc_stderr": 0.0446196043338474}}, "versions": {"hendrycksTest-medical_genetics": 0}}
\ No newline at end of file
+{"results": {"hendrycksTest-medical_genetics": {"acc": 0.27, "acc_norm": 0.29, "acc_norm_stderr": 0.04560480215720684, "acc_stderr": 0.0446196043338474}}, "versions": {"hendrycksTest-medical_genetics": 0}}
