Commit 4d147bdd authored by Jonathan Tow's avatar Jonathan Tow
Browse files

Merge branch 'master' of https://github.com/EleutherAI/lm-evaluation-harness into task-guide

parents 011cc891 dc937d4b
{"results": {"hendrycksTest-moral_scenarios": {"acc": 0.2547486033519553, "acc_norm": 0.25251396648044694, "acc_norm_stderr": 0.014530330201468654, "acc_stderr": 0.014572650383409158}}, "versions": {"hendrycksTest-moral_scenarios": 0}}
\ No newline at end of file
19e49d218f55ed5ec4bd1a6cd3f3388c6f620b81484e7abe8b298e5481c3044d
\ No newline at end of file
{"results": {"hendrycksTest-nutrition": {"acc": 0.24509803921568626, "acc_norm": 0.28104575163398693, "acc_norm_stderr": 0.025738854797818723, "acc_stderr": 0.02463004897982476}}, "versions": {"hendrycksTest-nutrition": 0}}
\ No newline at end of file
a419204da36c2b7a70fa8909a3a804260cc3283c7e07917534dfb76216c77f46
\ No newline at end of file
{"results": {"hendrycksTest-philosophy": {"acc": 0.26366559485530544, "acc_norm": 0.2733118971061093, "acc_norm_stderr": 0.02531176597542612, "acc_stderr": 0.02502553850053234}}, "versions": {"hendrycksTest-philosophy": 0}}
\ No newline at end of file
6983c560a562749f4f702249a3a6ae51fa495acc0643a980bf2cf52c6c5d4b95
\ No newline at end of file
{"results": {"hendrycksTest-prehistory": {"acc": 0.2623456790123457, "acc_norm": 0.26851851851851855, "acc_norm_stderr": 0.024659685185967277, "acc_stderr": 0.02447722285613511}}, "versions": {"hendrycksTest-prehistory": 0}}
\ No newline at end of file
847418f7b22cd9b499e95fd73c40a2fbc40076895280cc2c560199c0c4c4f433
\ No newline at end of file
{"results": {"hendrycksTest-professional_accounting": {"acc": 0.2553191489361702, "acc_norm": 0.26595744680851063, "acc_norm_stderr": 0.026358065698880582, "acc_stderr": 0.026011992930902006}}, "versions": {"hendrycksTest-professional_accounting": 0}}
\ No newline at end of file
c38c9d5d84eeb7a5f3c4a34d6e70d7e15847b3c38f26e4b119c982bb935e118f
\ No newline at end of file
{"results": {"hendrycksTest-professional_law": {"acc": 0.2561929595827901, "acc_norm": 0.2470664928292047, "acc_norm_stderr": 0.011015752255279352, "acc_stderr": 0.011149173153110582}}, "versions": {"hendrycksTest-professional_law": 0}}
\ No newline at end of file
7a30599858398169cde61430c18efdd7fb4dcd09c34aa9baba70f0f8cf17a9f1
\ No newline at end of file
{"results": {"hendrycksTest-professional_medicine": {"acc": 0.23161764705882354, "acc_norm": 0.2536764705882353, "acc_norm_stderr": 0.02643132987078953, "acc_stderr": 0.025626533803777562}}, "versions": {"hendrycksTest-professional_medicine": 0}}
\ No newline at end of file
92a5fad6e9ec700f84946faeccd399dda3569fb71837c9fb0c5c87f5ec29c43e
\ No newline at end of file
{"results": {"hendrycksTest-professional_psychology": {"acc": 0.27124183006535946, "acc_norm": 0.2826797385620915, "acc_norm_stderr": 0.01821726955205344, "acc_stderr": 0.01798661530403031}}, "versions": {"hendrycksTest-professional_psychology": 0}}
\ No newline at end of file
ab70f500cf24e876f6ae6bdc27525a1d6074fa9b6ea97770255d9fc2559b36ff
\ No newline at end of file
{"results": {"hendrycksTest-public_relations": {"acc": 0.3090909090909091, "acc_norm": 0.2636363636363636, "acc_norm_stderr": 0.04220224692971987, "acc_stderr": 0.044262946482000985}}, "versions": {"hendrycksTest-public_relations": 0}}
\ No newline at end of file
92dfffe2acf3278256486d3e1cf1edb5a739ad0a54c0f9c67695f7a411ed5f76
\ No newline at end of file
{"results": {"hendrycksTest-security_studies": {"acc": 0.2979591836734694, "acc_norm": 0.2693877551020408, "acc_norm_stderr": 0.02840125202902294, "acc_stderr": 0.029279567411065674}}, "versions": {"hendrycksTest-security_studies": 0}}
\ No newline at end of file
f99a3caece11169f2a5cc951001f92027104afd25d29b2a399883bd4bf118605
\ No newline at end of file
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment