gaoqiong / lm-evaluation-harness

Commit 8c997e53, authored May 03, 2022 by jon-tow

Revert `tests/testdata` changes and address flake8 issues

Parent: d95a4333

Changes: 627

Showing 20 changed files with 20 additions and 20 deletions (+20 -20):
tests/testdata/hendrycksTest-formal_logic-v0-res.json  +1 -1
tests/testdata/hendrycksTest-global_facts-v0-loglikelihood  +1 -1
tests/testdata/hendrycksTest-global_facts-v0-res.json  +1 -1
tests/testdata/hendrycksTest-high_school_biology-v0-loglikelihood  +1 -1
tests/testdata/hendrycksTest-high_school_biology-v0-res.json  +1 -1
tests/testdata/hendrycksTest-high_school_chemistry-v0-loglikelihood  +1 -1
tests/testdata/hendrycksTest-high_school_chemistry-v0-res.json  +1 -1
tests/testdata/hendrycksTest-high_school_computer_science-v0-loglikelihood  +1 -1
tests/testdata/hendrycksTest-high_school_computer_science-v0-res.json  +1 -1
tests/testdata/hendrycksTest-high_school_european_history-v0-loglikelihood  +1 -1
tests/testdata/hendrycksTest-high_school_european_history-v0-res.json  +1 -1
tests/testdata/hendrycksTest-high_school_geography-v0-loglikelihood  +1 -1
tests/testdata/hendrycksTest-high_school_geography-v0-res.json  +1 -1
tests/testdata/hendrycksTest-high_school_government_and_politics-v0-loglikelihood  +1 -1
tests/testdata/hendrycksTest-high_school_government_and_politics-v0-res.json  +1 -1
tests/testdata/hendrycksTest-high_school_macroeconomics-v0-loglikelihood  +1 -1
tests/testdata/hendrycksTest-high_school_macroeconomics-v0-res.json  +1 -1
tests/testdata/hendrycksTest-high_school_mathematics-v0-loglikelihood  +1 -1
tests/testdata/hendrycksTest-high_school_mathematics-v0-res.json  +1 -1
tests/testdata/hendrycksTest-high_school_microeconomics-v0-loglikelihood  +1 -1

tests/testdata/hendrycksTest-formal_logic-v0-res.json
-{"results": {"hendrycksTest-formal_logic": {"acc": 0.25396825396825395, "acc_norm": 0.2698412698412698, "acc_norm_stderr": 0.03970158273235172, "acc_stderr": 0.03893259610604674}}, "versions": {"hendrycksTest-formal_logic": 0}}
+{"results": {"hendrycksTest-formal_logic": {"acc": 0.25396825396825395, "acc_norm": 0.2698412698412698, "acc_norm_stderr": 0.03970158273235172, "acc_stderr": 0.03893259610604674}}, "versions": {"hendrycksTest-formal_logic": 0}}
\ No newline at end of file
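
Both sides of the formal_logic hunk carry the same JSON. The `\ No newline at end of file` marker follows the added line, so the new file ends without a newline while the old one presumably ended with one. A minimal sketch using only the standard `json` module shows that the two variants decode to the same object while still differing as raw text. The names `OLD`, `NEW`, `same_json`, and `metric_keys` are illustrative, and the old side is assumed to differ by exactly one trailing `\n`.

```python
import json

# Result line for hendrycksTest-formal_logic, copied from the diff above.
NEW = (
    '{"results": {"hendrycksTest-formal_logic": {"acc": 0.25396825396825395, '
    '"acc_norm": 0.2698412698412698, "acc_norm_stderr": 0.03970158273235172, '
    '"acc_stderr": 0.03893259610604674}}, "versions": {"hendrycksTest-formal_logic": 0}}'
)
# Assumption: the pre-revert file was identical apart from one trailing newline.
OLD = NEW + "\n"


def same_json(a: str, b: str) -> bool:
    """True when both strings decode to equal JSON values."""
    return json.loads(a) == json.loads(b)


def metric_keys(text: str, task: str) -> list:
    """Metric names for one task, in the order they appear in the file."""
    return list(json.loads(text)["results"][task])


if __name__ == "__main__":
    print(same_json(OLD, NEW))  # json.loads ignores trailing whitespace
    print(OLD == NEW)           # the raw text still differs
    keys = metric_keys(NEW, "hendrycksTest-formal_logic")
    print(keys == sorted(keys))  # metric keys appear in sorted order
```

The sorted key order is consistent with the files having been written with sorted keys, though the writing code is not part of this diff.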

tests/testdata/hendrycksTest-global_facts-v0-loglikelihood
-9fdc85240b8170839278b1e883ee0868611d84dce202cb8aa037c841ec76d089
+9fdc85240b8170839278b1e883ee0868611d84dce202cb8aa037c841ec76d089
\ No newline at end of file
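
Each `-loglikelihood` file holds a single 64-character hex string, which is consistent with a SHA-256 digest. If the test suite compares the raw file contents against a freshly computed digest using exact string equality (an assumption; the comparing code is not part of this diff), then a trailing newline added by an editor or formatter is enough to make the comparison fail. The sketch below illustrates that failure mode with hypothetical helpers `digest`, `matches_stored`, and `matches_stored_lenient`; it is not the harness's own code.

```python
import hashlib
import json


def digest(obj) -> str:
    """Hypothetical helper: SHA-256 hex digest of a sorted-key JSON dump."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def matches_stored(stored_text: str, obj) -> bool:
    """Exact comparison: any extra byte, including a final newline, fails."""
    return stored_text == digest(obj)


def matches_stored_lenient(stored_text: str, obj) -> bool:
    """Variant that ignores surrounding whitespace in the stored file."""
    return stored_text.strip() == digest(obj)


if __name__ == "__main__":
    requests = [["context", " continuation"]]  # illustrative stand-in data
    on_disk = digest(requests)
    print(matches_stored(on_disk, requests))                 # exact file: passes
    print(matches_stored(on_disk + "\n", requests))          # newline added: fails
    print(matches_stored_lenient(on_disk + "\n", requests))  # stripped: passes
```

Stripping before comparing is one way to make such a check tolerate end-of-file fixers; restoring the files without the newline is the other.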

tests/testdata/hendrycksTest-global_facts-v0-res.json
-{"results": {"hendrycksTest-global_facts": {"acc": 0.23, "acc_norm": 0.23, "acc_norm_stderr": 0.04229525846816507, "acc_stderr": 0.04229525846816507}}, "versions": {"hendrycksTest-global_facts": 0}}
+{"results": {"hendrycksTest-global_facts": {"acc": 0.23, "acc_norm": 0.23, "acc_norm_stderr": 0.04229525846816507, "acc_stderr": 0.04229525846816507}}, "versions": {"hendrycksTest-global_facts": 0}}
\ No newline at end of file
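
The `*_stderr` fields are consistent with the standard error of the mean of 0/1 correctness scores computed with the sample (n - 1) standard deviation, which for a mean p reduces to sqrt(p(1 - p)/(n - 1)). The sketch below checks a few stored values. The sample sizes (n = 100 for global_facts, n = 126 for formal_logic) are assumptions about the evaluation split, chosen because they make the stored `acc` values exact fractions (23/100, 32/126, and 34/126).

```python
import math
import statistics


def binary_mean_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n binary scores with mean p,
    using the sample (n - 1) variance: sqrt(p * (1 - p) / (n - 1))."""
    if n < 2:
        raise ValueError("need at least two samples")
    return math.sqrt(p * (1.0 - p) / (n - 1))


def mean_stderr(scores) -> float:
    """Same quantity computed directly from the individual scores."""
    return statistics.stdev(scores) / math.sqrt(len(scores))


if __name__ == "__main__":
    # Assumed split sizes; stored values copied from the res.json diffs.
    checks = [
        ("global_facts acc", 0.23, 100, 0.04229525846816507),
        ("formal_logic acc", 32 / 126, 126, 0.03893259610604674),
        ("formal_logic acc_norm", 34 / 126, 126, 0.03970158273235172),
    ]
    for name, p, n, stored in checks:
        est = binary_mean_stderr(p, n)
        print(name, est, math.isclose(est, stored, rel_tol=1e-9))
```

Using n - 1 rather than n matters here: with n = 100 the population formula would give about 0.0421 for global_facts instead of the stored 0.0423.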

tests/testdata/hendrycksTest-high_school_biology-v0-loglikelihood
-d4dc051f37a49dc75c218741e87bc826fd44f31ee1309b55e0f33bd191c1bc78
+d4dc051f37a49dc75c218741e87bc826fd44f31ee1309b55e0f33bd191c1bc78
\ No newline at end of file

tests/testdata/hendrycksTest-high_school_biology-v0-res.json
-{"results": {"hendrycksTest-high_school_biology": {"acc": 0.23870967741935484, "acc_norm": 0.2709677419354839, "acc_norm_stderr": 0.025284416114900152, "acc_stderr": 0.024251071262208834}}, "versions": {"hendrycksTest-high_school_biology": 0}}
+{"results": {"hendrycksTest-high_school_biology": {"acc": 0.23870967741935484, "acc_norm": 0.2709677419354839, "acc_norm_stderr": 0.025284416114900152, "acc_stderr": 0.024251071262208834}}, "versions": {"hendrycksTest-high_school_biology": 0}}
\ No newline at end of file

tests/testdata/hendrycksTest-high_school_chemistry-v0-loglikelihood
-f4f338e45415c4b5ee7f1d249155bcd910c8401bd1436760a5ec61cb6bb211b6
+f4f338e45415c4b5ee7f1d249155bcd910c8401bd1436760a5ec61cb6bb211b6
\ No newline at end of file

tests/testdata/hendrycksTest-high_school_chemistry-v0-res.json
-{"results": {"hendrycksTest-high_school_chemistry": {"acc": 0.2857142857142857, "acc_norm": 0.2660098522167488, "acc_norm_stderr": 0.031089826002937523, "acc_stderr": 0.031785297106427496}}, "versions": {"hendrycksTest-high_school_chemistry": 0}}
+{"results": {"hendrycksTest-high_school_chemistry": {"acc": 0.2857142857142857, "acc_norm": 0.2660098522167488, "acc_norm_stderr": 0.031089826002937523, "acc_stderr": 0.031785297106427496}}, "versions": {"hendrycksTest-high_school_chemistry": 0}}
\ No newline at end of file

tests/testdata/hendrycksTest-high_school_computer_science-v0-loglikelihood
-870d5a6300c527077aaf6baa3e750e75fa840b41657cf82549f39b768b14862d
+870d5a6300c527077aaf6baa3e750e75fa840b41657cf82549f39b768b14862d
\ No newline at end of file

tests/testdata/hendrycksTest-high_school_computer_science-v0-res.json
-{"results": {"hendrycksTest-high_school_computer_science": {"acc": 0.2, "acc_norm": 0.22, "acc_norm_stderr": 0.04163331998932269, "acc_stderr": 0.04020151261036845}}, "versions": {"hendrycksTest-high_school_computer_science": 0}}
+{"results": {"hendrycksTest-high_school_computer_science": {"acc": 0.2, "acc_norm": 0.22, "acc_norm_stderr": 0.04163331998932269, "acc_stderr": 0.04020151261036845}}, "versions": {"hendrycksTest-high_school_computer_science": 0}}
\ No newline at end of file

tests/testdata/hendrycksTest-high_school_european_history-v0-loglikelihood
-d8070e113be9d420fef5578cb69c70df4ea5118f9b18553023fd9efd5ff0b7f4
+d8070e113be9d420fef5578cb69c70df4ea5118f9b18553023fd9efd5ff0b7f4
\ No newline at end of file

tests/testdata/hendrycksTest-high_school_european_history-v0-res.json
-{"results": {"hendrycksTest-high_school_european_history": {"acc": 0.23636363636363636, "acc_norm": 0.24242424242424243, "acc_norm_stderr": 0.03346409881055953, "acc_stderr": 0.033175059300091805}}, "versions": {"hendrycksTest-high_school_european_history": 0}}
+{"results": {"hendrycksTest-high_school_european_history": {"acc": 0.23636363636363636, "acc_norm": 0.24242424242424243, "acc_norm_stderr": 0.03346409881055953, "acc_stderr": 0.033175059300091805}}, "versions": {"hendrycksTest-high_school_european_history": 0}}
\ No newline at end of file

tests/testdata/hendrycksTest-high_school_geography-v0-loglikelihood
-add45970ea3865be7c7a31f788a835949f6937ac73f699b122ca56a3431e95f8
+add45970ea3865be7c7a31f788a835949f6937ac73f699b122ca56a3431e95f8
\ No newline at end of file

tests/testdata/hendrycksTest-high_school_geography-v0-res.json
-{"results": {"hendrycksTest-high_school_geography": {"acc": 0.2474747474747475, "acc_norm": 0.2777777777777778, "acc_norm_stderr": 0.03191178226713547, "acc_stderr": 0.03074630074212452}}, "versions": {"hendrycksTest-high_school_geography": 0}}
+{"results": {"hendrycksTest-high_school_geography": {"acc": 0.2474747474747475, "acc_norm": 0.2777777777777778, "acc_norm_stderr": 0.03191178226713547, "acc_stderr": 0.03074630074212452}}, "versions": {"hendrycksTest-high_school_geography": 0}}
\ No newline at end of file

tests/testdata/hendrycksTest-high_school_government_and_politics-v0-loglikelihood
-11f40d8f48ba5cd739e21d54c3c04d3761f81df5cb7ddd77df868d24ced44b49
+11f40d8f48ba5cd739e21d54c3c04d3761f81df5cb7ddd77df868d24ced44b49
\ No newline at end of file

tests/testdata/hendrycksTest-high_school_government_and_politics-v0-res.json
-{"results": {"hendrycksTest-high_school_government_and_politics": {"acc": 0.24352331606217617, "acc_norm": 0.23834196891191708, "acc_norm_stderr": 0.03074890536390988, "acc_stderr": 0.030975436386845436}}, "versions": {"hendrycksTest-high_school_government_and_politics": 0}}
+{"results": {"hendrycksTest-high_school_government_and_politics": {"acc": 0.24352331606217617, "acc_norm": 0.23834196891191708, "acc_norm_stderr": 0.03074890536390988, "acc_stderr": 0.030975436386845436}}, "versions": {"hendrycksTest-high_school_government_and_politics": 0}}
\ No newline at end of file

tests/testdata/hendrycksTest-high_school_macroeconomics-v0-loglikelihood
-ce4faae2fb6628caa48f6fc74cbc848880db49e6ff51079392778a2322bcefef
+ce4faae2fb6628caa48f6fc74cbc848880db49e6ff51079392778a2322bcefef
\ No newline at end of file

tests/testdata/hendrycksTest-high_school_macroeconomics-v0-res.json
-{"results": {"hendrycksTest-high_school_macroeconomics": {"acc": 0.2230769230769231, "acc_norm": 0.22564102564102564, "acc_norm_stderr": 0.021193632525148522, "acc_stderr": 0.021107730127244}}, "versions": {"hendrycksTest-high_school_macroeconomics": 0}}
+{"results": {"hendrycksTest-high_school_macroeconomics": {"acc": 0.2230769230769231, "acc_norm": 0.22564102564102564, "acc_norm_stderr": 0.021193632525148522, "acc_stderr": 0.021107730127244}}, "versions": {"hendrycksTest-high_school_macroeconomics": 0}}
\ No newline at end of file

tests/testdata/hendrycksTest-high_school_mathematics-v0-loglikelihood
-ab368d16fc4648ad27940f71abd266366663f51db612f732a0b9b0eea28de9f8
+ab368d16fc4648ad27940f71abd266366663f51db612f732a0b9b0eea28de9f8
\ No newline at end of file

tests/testdata/hendrycksTest-high_school_mathematics-v0-res.json
-{"results": {"hendrycksTest-high_school_mathematics": {"acc": 0.22592592592592592, "acc_norm": 0.24814814814814815, "acc_norm_stderr": 0.0263357394040558, "acc_stderr": 0.025497532639609553}}, "versions": {"hendrycksTest-high_school_mathematics": 0}}
+{"results": {"hendrycksTest-high_school_mathematics": {"acc": 0.22592592592592592, "acc_norm": 0.24814814814814815, "acc_norm_stderr": 0.0263357394040558, "acc_stderr": 0.025497532639609553}}, "versions": {"hendrycksTest-high_school_mathematics": 0}}
\ No newline at end of file

tests/testdata/hendrycksTest-high_school_microeconomics-v0-loglikelihood
-513b998585ebc1ebdefca6435b7c84fd73dc36fc80321a22503467f04efed23e
+513b998585ebc1ebdefca6435b7c84fd73dc36fc80321a22503467f04efed23e
\ No newline at end of file