gaoqiong / lm-evaluation-harness
Commit 121b7096
authored May 02, 2022 by Fabrizio Milo
add pre-commit
parent 7a038118
Changes 732
Showing 20 changed files with 20 additions and 20 deletions
tests/testdata/squad2-v0-greedy_until +1 -1
tests/testdata/squad2-v0-loglikelihood +1 -1
tests/testdata/squad2-v0-res.json +1 -1
tests/testdata/squad2-v1-greedy_until +1 -1
tests/testdata/squad2-v1-loglikelihood +1 -1
tests/testdata/squad2-v1-res.json +1 -1
tests/testdata/sst-v0-loglikelihood +1 -1
tests/testdata/sst-v0-res.json +1 -1
tests/testdata/swag-v0-loglikelihood +1 -1
tests/testdata/swag-v0-res.json +1 -1
tests/testdata/triviaqa-v0-loglikelihood +1 -1
tests/testdata/triviaqa-v0-res.json +1 -1
tests/testdata/truthfulqa_gen-v0-greedy_until +1 -1
tests/testdata/truthfulqa_gen-v0-res.json +1 -1
tests/testdata/truthfulqa_gen-v1-greedy_until +1 -1
tests/testdata/truthfulqa_gen-v1-res.json +1 -1
tests/testdata/truthfulqa_mc-v0-loglikelihood +1 -1
tests/testdata/truthfulqa_mc-v0-res.json +1 -1
tests/testdata/truthfulqa_mc-v1-loglikelihood +1 -1
tests/testdata/truthfulqa_mc-v1-res.json +1 -1
tests/testdata/squad2-v0-greedy_until
-b261e8885c11750ce6911bb11e8693de03d53758297c26fb14cfc1ef508862cb
\ No newline at end of file
+b261e8885c11750ce6911bb11e8693de03d53758297c26fb14cfc1ef508862cb
tests/testdata/squad2-v0-loglikelihood
-287e87cc6878debcc80d9b6df4e2d0a74ed29068e0e0a80906c8441843a17cee
\ No newline at end of file
+287e87cc6878debcc80d9b6df4e2d0a74ed29068e0e0a80906c8441843a17cee
tests/testdata/squad2-v0-res.json
-{"results": {"squad2": {"HasAns_exact": 0.0, "HasAns_f1": 0.0, "NoAns_exact": 0.0, "NoAns_f1": 0.0, "best_exact": 50.07159100480081, "best_f1": 50.07159100480081, "exact": 0.0, "f1": 0.0}}, "versions": {"squad2": 0}}
\ No newline at end of file
+{"results": {"squad2": {"HasAns_exact": 0.0, "HasAns_f1": 0.0, "NoAns_exact": 0.0, "NoAns_f1": 0.0, "best_exact": 50.07159100480081, "best_f1": 50.07159100480081, "exact": 0.0, "f1": 0.0}}, "versions": {"squad2": 0}}
tests/testdata/squad2-v1-greedy_until
-e17e3d85c1d5adaf2d6b4b752c4babc2e0b3a6e144e6de70cb3b2287e85109b8
\ No newline at end of file
+e17e3d85c1d5adaf2d6b4b752c4babc2e0b3a6e144e6de70cb3b2287e85109b8
tests/testdata/squad2-v1-loglikelihood
-f5da6173402b274dc89130755c222c6ca6b2a3bacaaa4e4ab07be9322b7bad65
\ No newline at end of file
+f5da6173402b274dc89130755c222c6ca6b2a3bacaaa4e4ab07be9322b7bad65
tests/testdata/squad2-v1-res.json
-{"results": {"squad2": {"HasAns_exact": 0.0, "HasAns_f1": 0.0, "NoAns_exact": 0.0, "NoAns_f1": 0.0, "best_exact": 50.07159100480081, "best_f1": 50.07159100480081, "exact": 0.0, "f1": 0.0}}, "versions": {"squad2": 1}}
\ No newline at end of file
+{"results": {"squad2": {"HasAns_exact": 0.0, "HasAns_f1": 0.0, "NoAns_exact": 0.0, "NoAns_f1": 0.0, "best_exact": 50.07159100480081, "best_f1": 50.07159100480081, "exact": 0.0, "f1": 0.0}}, "versions": {"squad2": 1}}
tests/testdata/sst-v0-loglikelihood
-d2ebe3a63517d1d481aa1513bebe124c57a0904554a1e95f566979cfe67b1a7f
\ No newline at end of file
+d2ebe3a63517d1d481aa1513bebe124c57a0904554a1e95f566979cfe67b1a7f
tests/testdata/sst-v0-res.json
-{"results": {"sst": {"acc": 0.5172018348623854, "acc_stderr": 0.016931824425903734}}, "versions": {"sst": 0}}
\ No newline at end of file
+{"results": {"sst": {"acc": 0.5172018348623854, "acc_stderr": 0.016931824425903734}}, "versions": {"sst": 0}}
tests/testdata/swag-v0-loglikelihood
-be4fcbad876124c4ba3c71970538a97fec0e36a9cc677c70b6c9243a7bcee0ec
\ No newline at end of file
+be4fcbad876124c4ba3c71970538a97fec0e36a9cc677c70b6c9243a7bcee0ec
tests/testdata/swag-v0-res.json
-{"results": {"swag": {"acc": 0.2482255323402979, "acc_norm": 0.24882535239428172, "acc_norm_stderr": 0.00305666959496067, "acc_stderr": 0.003054201832644171}}, "versions": {"swag": 0}}
\ No newline at end of file
+{"results": {"swag": {"acc": 0.2482255323402979, "acc_norm": 0.24882535239428172, "acc_norm_stderr": 0.00305666959496067, "acc_stderr": 0.003054201832644171}}, "versions": {"swag": 0}}
tests/testdata/triviaqa-v0-loglikelihood
-f8ec05b306b9f6187c0f8117cae441fb85a7a2e4670f4f9a1a3b632b1978421a
\ No newline at end of file
+f8ec05b306b9f6187c0f8117cae441fb85a7a2e4670f4f9a1a3b632b1978421a
tests/testdata/triviaqa-v0-res.json
-{"results": {"triviaqa": {"acc": 0.0, "acc_stderr": 0.0}}, "versions": {"triviaqa": 0}}
\ No newline at end of file
+{"results": {"triviaqa": {"acc": 0.0, "acc_stderr": 0.0}}, "versions": {"triviaqa": 0}}
tests/testdata/truthfulqa_gen-v0-greedy_until
-0d7c56e1aa71ffd8f94bde28f6e8dfdd35f7aaadffa0620bd2a27704253d6c14
\ No newline at end of file
+0d7c56e1aa71ffd8f94bde28f6e8dfdd35f7aaadffa0620bd2a27704253d6c14
tests/testdata/truthfulqa_gen-v0-res.json
-{"results": {"truthfulqa_gen": {"bleu_acc": 0.0, "bleu_acc_stderr": 0.0, "bleu_diff": 0.0, "bleu_diff_stderr": 0.0, "bleu_max": 0.0, "bleu_max_stderr": 0.0, "bleurt_acc": 0.8372093023255814, "bleurt_acc_stderr": 0.012923696051772253, "bleurt_diff": 0.13967358205134603, "bleurt_diff_stderr": 0.00532907098769571, "bleurt_max": -1.4402793981454072, "bleurt_max_stderr": 0.0021884846359458963, "rouge1_acc": 0.0, "rouge1_acc_stderr": 0.0, "rouge1_diff": 0.0, "rouge1_diff_stderr": 0.0, "rouge1_max": 0.0, "rouge1_max_stderr": 0.0, "rouge2_acc": 0.0, "rouge2_acc_stderr": 0.0, "rouge2_diff": 0.0, "rouge2_diff_stderr": 0.0, "rouge2_max": 0.0, "rouge2_max_stderr": 0.0, "rougeL_acc": 0.0, "rougeL_acc_stderr": 0.0, "rougeL_diff": 0.0, "rougeL_diff_stderr": 0.0, "rougeL_max": 0.0, "rougeL_max_stderr": 0.0}}, "versions": {"truthfulqa_gen": 0}}
\ No newline at end of file
+{"results": {"truthfulqa_gen": {"bleu_acc": 0.0, "bleu_acc_stderr": 0.0, "bleu_diff": 0.0, "bleu_diff_stderr": 0.0, "bleu_max": 0.0, "bleu_max_stderr": 0.0, "bleurt_acc": 0.8372093023255814, "bleurt_acc_stderr": 0.012923696051772253, "bleurt_diff": 0.13967358205134603, "bleurt_diff_stderr": 0.00532907098769571, "bleurt_max": -1.4402793981454072, "bleurt_max_stderr": 0.0021884846359458963, "rouge1_acc": 0.0, "rouge1_acc_stderr": 0.0, "rouge1_diff": 0.0, "rouge1_diff_stderr": 0.0, "rouge1_max": 0.0, "rouge1_max_stderr": 0.0, "rouge2_acc": 0.0, "rouge2_acc_stderr": 0.0, "rouge2_diff": 0.0, "rouge2_diff_stderr": 0.0, "rouge2_max": 0.0, "rouge2_max_stderr": 0.0, "rougeL_acc": 0.0, "rougeL_acc_stderr": 0.0, "rougeL_diff": 0.0, "rougeL_diff_stderr": 0.0, "rougeL_max": 0.0, "rougeL_max_stderr": 0.0}}, "versions": {"truthfulqa_gen": 0}}
tests/testdata/truthfulqa_gen-v1-greedy_until
-1a280973bbac2b7ac29dd64dddac474fb4749585f7de893483b4034814466c67
\ No newline at end of file
+1a280973bbac2b7ac29dd64dddac474fb4749585f7de893483b4034814466c67
tests/testdata/truthfulqa_gen-v1-res.json
-{"results": {"truthfulqa_gen": {"bleu_acc": 0.0, "bleu_acc_stderr": 0.0, "bleu_diff": 0.0, "bleu_diff_stderr": 0.0, "bleu_max": 0.0, "bleu_max_stderr": 0.0, "bleurt_acc": 0.835985312117503, "bleurt_acc_stderr": 0.012962704327492454, "bleurt_diff": 0.14077322143090107, "bleurt_diff_stderr": 0.005459888909582694, "bleurt_max": -1.4399358725752065, "bleurt_max_stderr": 0.0022126992369197133, "rouge1_acc": 0.0, "rouge1_acc_stderr": 0.0, "rouge1_diff": 0.0, "rouge1_diff_stderr": 0.0, "rouge1_max": 0.0, "rouge1_max_stderr": 0.0, "rouge2_acc": 0.0, "rouge2_acc_stderr": 0.0, "rouge2_diff": 0.0, "rouge2_diff_stderr": 0.0, "rouge2_max": 0.0, "rouge2_max_stderr": 0.0, "rougeL_acc": 0.0, "rougeL_acc_stderr": 0.0, "rougeL_diff": 0.0, "rougeL_diff_stderr": 0.0, "rougeL_max": 0.0, "rougeL_max_stderr": 0.0}}, "versions": {"truthfulqa_gen": 1}}
\ No newline at end of file
+{"results": {"truthfulqa_gen": {"bleu_acc": 0.0, "bleu_acc_stderr": 0.0, "bleu_diff": 0.0, "bleu_diff_stderr": 0.0, "bleu_max": 0.0, "bleu_max_stderr": 0.0, "bleurt_acc": 0.835985312117503, "bleurt_acc_stderr": 0.012962704327492454, "bleurt_diff": 0.14077322143090107, "bleurt_diff_stderr": 0.005459888909582694, "bleurt_max": -1.4399358725752065, "bleurt_max_stderr": 0.0022126992369197133, "rouge1_acc": 0.0, "rouge1_acc_stderr": 0.0, "rouge1_diff": 0.0, "rouge1_diff_stderr": 0.0, "rouge1_max": 0.0, "rouge1_max_stderr": 0.0, "rouge2_acc": 0.0, "rouge2_acc_stderr": 0.0, "rouge2_diff": 0.0, "rouge2_diff_stderr": 0.0, "rouge2_max": 0.0, "rouge2_max_stderr": 0.0, "rougeL_acc": 0.0, "rougeL_acc_stderr": 0.0, "rougeL_diff": 0.0, "rougeL_diff_stderr": 0.0, "rougeL_max": 0.0, "rougeL_max_stderr": 0.0}}, "versions": {"truthfulqa_gen": 1}}
tests/testdata/truthfulqa_mc-v0-loglikelihood
-226a6783976177dc9ceda5688623ff37023242eff30ddf270b886bf7b9b32228
\ No newline at end of file
+226a6783976177dc9ceda5688623ff37023242eff30ddf270b886bf7b9b32228
tests/testdata/truthfulqa_mc-v0-res.json
-{"results": {"truthfulqa_mc": {"mc1": 0.2141982864137087, "mc1_stderr": 0.01436214815569045, "mc2": 0.465436996173817, "mc2_stderr": 0.0048422530880316405}}, "versions": {"truthfulqa_mc": 0}}
\ No newline at end of file
+{"results": {"truthfulqa_mc": {"mc1": 0.2141982864137087, "mc1_stderr": 0.01436214815569045, "mc2": 0.465436996173817, "mc2_stderr": 0.0048422530880316405}}, "versions": {"truthfulqa_mc": 0}}
tests/testdata/truthfulqa_mc-v1-loglikelihood
-1e07020e9cf41d46ed65312eb39d2b8e6599673d4f0d6b67c0d0eba0efb493bb
\ No newline at end of file
+1e07020e9cf41d46ed65312eb39d2b8e6599673d4f0d6b67c0d0eba0efb493bb
tests/testdata/truthfulqa_mc-v1-res.json
-{"results": {"truthfulqa_mc": {"mc1": 0.23255813953488372, "mc1_stderr": 0.01478915753108052, "mc2": 0.4462325560722362, "mc2_stderr": 0.004986523944692003}}, "versions": {"truthfulqa_mc": 1}}
\ No newline at end of file
+{"results": {"truthfulqa_mc": {"mc1": 0.23255813953488372, "mc1_stderr": 0.01478915753108052, "mc2": 0.4462325560722362, "mc2_stderr": 0.004986523944692003}}, "versions": {"truthfulqa_mc": 1}}