gaoqiong / lm-evaluation-harness · Commits

Commit 8c997e53, authored May 03, 2022 by jon-tow
Parent: d95a4333

    Revert `tests/testdata` changes and address flake8 issues

Changes: 627 files in the full commit; this page shows 20 changed files with 20 additions and 20 deletions (+20, -20).
Changed files on this page (each +1, -1):

tests/testdata/hendrycksTest-high_school_microeconomics-v0-res.json
tests/testdata/hendrycksTest-high_school_physics-v0-loglikelihood
tests/testdata/hendrycksTest-high_school_physics-v0-res.json
tests/testdata/hendrycksTest-high_school_psychology-v0-loglikelihood
tests/testdata/hendrycksTest-high_school_psychology-v0-res.json
tests/testdata/hendrycksTest-high_school_statistics-v0-loglikelihood
tests/testdata/hendrycksTest-high_school_statistics-v0-res.json
tests/testdata/hendrycksTest-high_school_us_history-v0-loglikelihood
tests/testdata/hendrycksTest-high_school_us_history-v0-res.json
tests/testdata/hendrycksTest-high_school_world_history-v0-loglikelihood
tests/testdata/hendrycksTest-high_school_world_history-v0-res.json
tests/testdata/hendrycksTest-human_aging-v0-loglikelihood
tests/testdata/hendrycksTest-human_aging-v0-res.json
tests/testdata/hendrycksTest-human_sexuality-v0-loglikelihood
tests/testdata/hendrycksTest-human_sexuality-v0-res.json
tests/testdata/hendrycksTest-international_law-v0-loglikelihood
tests/testdata/hendrycksTest-international_law-v0-res.json
tests/testdata/hendrycksTest-jurisprudence-v0-loglikelihood
tests/testdata/hendrycksTest-jurisprudence-v0-res.json
tests/testdata/hendrycksTest-logical_fallacies-v0-loglikelihood

In every diff below, the removed and re-added lines carry identical text; the visible difference is that the new version ends without a trailing newline ("\ No newline at end of file").
tests/testdata/hendrycksTest-high_school_microeconomics-v0-res.json
-{"results": {"hendrycksTest-high_school_microeconomics": {"acc": 0.24369747899159663, "acc_norm": 0.22268907563025211, "acc_norm_stderr": 0.027025433498882378, "acc_stderr": 0.027886828078380558}}, "versions": {"hendrycksTest-high_school_microeconomics": 0}}
+{"results": {"hendrycksTest-high_school_microeconomics": {"acc": 0.24369747899159663, "acc_norm": 0.22268907563025211, "acc_norm_stderr": 0.027025433498882378, "acc_stderr": 0.027886828078380558}}, "versions": {"hendrycksTest-high_school_microeconomics": 0}}
\ No newline at end of file
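Each `*-res.json` fixture in this diff is a single-line JSON blob: a `results` map holding per-task `acc`/`acc_norm` metrics with their standard errors, and a `versions` map recording the task version. A minimal sketch of loading and printing such a fixture (the path is taken from this commit, but the code is illustrative, not the harness's own test suite):

```python
import json

# Fixture path from this commit; every *-res.json fixture has the same shape.
FIXTURE = "tests/testdata/hendrycksTest-high_school_microeconomics-v0-res.json"

with open(FIXTURE) as f:
    data = json.load(f)

# Each fixture pairs per-task metrics with the task version they were produced by.
for task, metrics in data["results"].items():
    version = data["versions"][task]
    print(
        f"{task} (v{version}): "
        f"acc={metrics['acc']:.4f}±{metrics['acc_stderr']:.4f}, "
        f"acc_norm={metrics['acc_norm']:.4f}±{metrics['acc_norm_stderr']:.4f}"
    )
```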
tests/testdata/hendrycksTest-high_school_physics-v0-loglikelihood
-dae59e82d3d4d8dec82239d9620b72cc47bb6efbe2f1c2f9b9d23e849c9c5e32
+dae59e82d3d4d8dec82239d9620b72cc47bb6efbe2f1c2f9b9d23e849c9c5e32
\ No newline at end of file
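Each `*-loglikelihood` fixture holds a single 64-character hex string, consistent with a SHA-256 digest pinning the requests a task generates. A minimal sketch of the idea, assuming the fixture stores a hex digest over some canonical serialization of the requests (both the serialization and the hashing scheme here are assumptions, not the harness's documented format):

```python
import hashlib

def requests_digest(requests: list[str]) -> str:
    # Hash a serialization of the task's requests. The exact serialization
    # lm-evaluation-harness uses is an assumption; this only illustrates
    # how a pinned digest detects changes in generated requests.
    h = hashlib.sha256()
    for req in requests:
        h.update(req.encode("utf-8"))
    return h.hexdigest()

# Pinned value from the fixture above; re-generating the task's requests
# and re-hashing should reproduce it if nothing changed.
expected = "dae59e82d3d4d8dec82239d9620b72cc47bb6efbe2f1c2f9b9d23e849c9c5e32"
print(requests_digest(["placeholder request"]) == expected)  # False for placeholder input
```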
tests/testdata/hendrycksTest-high_school_physics-v0-res.json
-{"results": {"hendrycksTest-high_school_physics": {"acc": 0.2582781456953642, "acc_norm": 0.271523178807947, "acc_norm_stderr": 0.03631329803969653, "acc_stderr": 0.035737053147634576}}, "versions": {"hendrycksTest-high_school_physics": 0}}
+{"results": {"hendrycksTest-high_school_physics": {"acc": 0.2582781456953642, "acc_norm": 0.271523178807947, "acc_norm_stderr": 0.03631329803969653, "acc_stderr": 0.035737053147634576}}, "versions": {"hendrycksTest-high_school_physics": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-high_school_psychology-v0-loglikelihood
-0e4c8d13806d3696167e40544d2d114c557c10c74bc61fcb9c51bbfced0266ef
+0e4c8d13806d3696167e40544d2d114c557c10c74bc61fcb9c51bbfced0266ef
\ No newline at end of file
tests/testdata/hendrycksTest-high_school_psychology-v0-res.json
-{"results": {"hendrycksTest-high_school_psychology": {"acc": 0.24587155963302754, "acc_norm": 0.23302752293577983, "acc_norm_stderr": 0.018125669180861493, "acc_stderr": 0.018461940968708436}}, "versions": {"hendrycksTest-high_school_psychology": 0}}
+{"results": {"hendrycksTest-high_school_psychology": {"acc": 0.24587155963302754, "acc_norm": 0.23302752293577983, "acc_norm_stderr": 0.018125669180861493, "acc_stderr": 0.018461940968708436}}, "versions": {"hendrycksTest-high_school_psychology": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-high_school_statistics-v0-loglikelihood
-33d1d6eaaa2c3a944bf49d3f220a4efc328d7c3b3465b7cec40ae36d8984b75f
+33d1d6eaaa2c3a944bf49d3f220a4efc328d7c3b3465b7cec40ae36d8984b75f
\ No newline at end of file
tests/testdata/hendrycksTest-high_school_statistics-v0-res.json
-{"results": {"hendrycksTest-high_school_statistics": {"acc": 0.2962962962962963, "acc_norm": 0.3055555555555556, "acc_norm_stderr": 0.03141554629402544, "acc_stderr": 0.03114144782353604}}, "versions": {"hendrycksTest-high_school_statistics": 0}}
+{"results": {"hendrycksTest-high_school_statistics": {"acc": 0.2962962962962963, "acc_norm": 0.3055555555555556, "acc_norm_stderr": 0.03141554629402544, "acc_stderr": 0.03114144782353604}}, "versions": {"hendrycksTest-high_school_statistics": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-high_school_us_history-v0-loglikelihood
-8c65c1a28330dd001d395ac11f1bb80c3b33f5935f503e74067aef6e9e1d9d9b
+8c65c1a28330dd001d395ac11f1bb80c3b33f5935f503e74067aef6e9e1d9d9b
\ No newline at end of file
tests/testdata/hendrycksTest-high_school_us_history-v0-res.json
-{"results": {"hendrycksTest-high_school_us_history": {"acc": 0.29901960784313725, "acc_norm": 0.28431372549019607, "acc_norm_stderr": 0.03166009679399814, "acc_stderr": 0.03213325717373618}}, "versions": {"hendrycksTest-high_school_us_history": 0}}
+{"results": {"hendrycksTest-high_school_us_history": {"acc": 0.29901960784313725, "acc_norm": 0.28431372549019607, "acc_norm_stderr": 0.03166009679399814, "acc_stderr": 0.03213325717373618}}, "versions": {"hendrycksTest-high_school_us_history": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-high_school_world_history-v0-loglikelihood
-1c8b994bd9a63ec874fc8d0e3a27077118b7adc472306b2fd6c55635a78b9d52
+1c8b994bd9a63ec874fc8d0e3a27077118b7adc472306b2fd6c55635a78b9d52
\ No newline at end of file
tests/testdata/hendrycksTest-high_school_world_history-v0-res.json
-{"results": {"hendrycksTest-high_school_world_history": {"acc": 0.23628691983122363, "acc_norm": 0.24472573839662448, "acc_norm_stderr": 0.02798569938703642, "acc_stderr": 0.027652153144159263}}, "versions": {"hendrycksTest-high_school_world_history": 0}}
+{"results": {"hendrycksTest-high_school_world_history": {"acc": 0.23628691983122363, "acc_norm": 0.24472573839662448, "acc_norm_stderr": 0.02798569938703642, "acc_stderr": 0.027652153144159263}}, "versions": {"hendrycksTest-high_school_world_history": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-human_aging-v0-loglikelihood
-0880b3a78f8d7b17ffc612031427b9085367cf65dabe2a68c4b64e3171d17e88
+0880b3a78f8d7b17ffc612031427b9085367cf65dabe2a68c4b64e3171d17e88
\ No newline at end of file
tests/testdata/hendrycksTest-human_aging-v0-res.json
-{"results": {"hendrycksTest-human_aging": {"acc": 0.21524663677130046, "acc_norm": 0.17937219730941703, "acc_norm_stderr": 0.025749819569192804, "acc_stderr": 0.02758406660220827}}, "versions": {"hendrycksTest-human_aging": 0}}
+{"results": {"hendrycksTest-human_aging": {"acc": 0.21524663677130046, "acc_norm": 0.17937219730941703, "acc_norm_stderr": 0.025749819569192804, "acc_stderr": 0.02758406660220827}}, "versions": {"hendrycksTest-human_aging": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-human_sexuality-v0-loglikelihood
-4b07922fa1d549b655c21440b13d869263ce7dd9771d8147c450f11c91d26c10
+4b07922fa1d549b655c21440b13d869263ce7dd9771d8147c450f11c91d26c10
\ No newline at end of file
tests/testdata/hendrycksTest-human_sexuality-v0-res.json
-{"results": {"hendrycksTest-human_sexuality": {"acc": 0.22137404580152673, "acc_norm": 0.22900763358778625, "acc_norm_stderr": 0.036853466317118506, "acc_stderr": 0.0364129708131373}}, "versions": {"hendrycksTest-human_sexuality": 0}}
+{"results": {"hendrycksTest-human_sexuality": {"acc": 0.22137404580152673, "acc_norm": 0.22900763358778625, "acc_norm_stderr": 0.036853466317118506, "acc_stderr": 0.0364129708131373}}, "versions": {"hendrycksTest-human_sexuality": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-international_law-v0-loglikelihood
-ea9b2cefd27959db564168f6ad1169a5eaa012fc5a5d5b8faf9e34d94e335dc1
+ea9b2cefd27959db564168f6ad1169a5eaa012fc5a5d5b8faf9e34d94e335dc1
\ No newline at end of file
tests/testdata/hendrycksTest-international_law-v0-res.json
-{"results": {"hendrycksTest-international_law": {"acc": 0.2396694214876033, "acc_norm": 0.3140495867768595, "acc_norm_stderr": 0.042369647530410164, "acc_stderr": 0.03896878985070417}}, "versions": {"hendrycksTest-international_law": 0}}
+{"results": {"hendrycksTest-international_law": {"acc": 0.2396694214876033, "acc_norm": 0.3140495867768595, "acc_norm_stderr": 0.042369647530410164, "acc_stderr": 0.03896878985070417}}, "versions": {"hendrycksTest-international_law": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-jurisprudence-v0-loglikelihood
-cac440189f1ec778e82f4975d88b74689553ecc5116aaa7f76587a50c1a610e0
+cac440189f1ec778e82f4975d88b74689553ecc5116aaa7f76587a50c1a610e0
\ No newline at end of file
tests/testdata/hendrycksTest-jurisprudence-v0-res.json
-{"results": {"hendrycksTest-jurisprudence": {"acc": 0.25, "acc_norm": 0.3148148148148148, "acc_norm_stderr": 0.04489931073591312, "acc_stderr": 0.04186091791394607}}, "versions": {"hendrycksTest-jurisprudence": 0}}
+{"results": {"hendrycksTest-jurisprudence": {"acc": 0.25, "acc_norm": 0.3148148148148148, "acc_norm_stderr": 0.04489931073591312, "acc_stderr": 0.04186091791394607}}, "versions": {"hendrycksTest-jurisprudence": 0}}
\ No newline at end of file
tests/testdata/hendrycksTest-logical_fallacies-v0-loglikelihood
-2e9449dd803f9e2334dc562d9f04031fd013ed36b883b44ab500533a5dbbface
+2e9449dd803f9e2334dc562d9f04031fd013ed36b883b44ab500533a5dbbface
\ No newline at end of file