gaoqiong / lm-evaluation-harness
Commit 52f75f0e, authored Nov 28, 2023 by lintangsutawika

Merge branch 'big-refactor' of https://github.com/EleutherAI/lm-evaluation-harness into versioning

Parents: 331d7c51, b072bb0d
Changes: 72
Showing 12 changed files with 21 additions and 25 deletions (+21 -25)
lm_eval/tasks/blimp/tough_vs_raising_1.yaml  +1 -1
lm_eval/tasks/blimp/tough_vs_raising_2.yaml  +1 -1
lm_eval/tasks/blimp/transitive.yaml  +1 -1
lm_eval/tasks/blimp/wh_island.yaml  +1 -1
lm_eval/tasks/blimp/wh_questions_object_gap.yaml  +1 -1
lm_eval/tasks/blimp/wh_questions_subject_gap.yaml  +1 -1
lm_eval/tasks/blimp/wh_questions_subject_gap_long_distance.yaml  +1 -1
lm_eval/tasks/blimp/wh_vs_that_no_gap.yaml  +1 -1
lm_eval/tasks/blimp/wh_vs_that_no_gap_long_distance.yaml  +1 -1
lm_eval/tasks/blimp/wh_vs_that_with_gap.yaml  +1 -1
lm_eval/tasks/blimp/wh_vs_that_with_gap_long_distance.yaml  +1 -1
lm_eval/utils.py  +10 -14
lm_eval/tasks/blimp/tough_vs_raising_1.yaml
 # Generated by utils.py
 dataset_name: tough_vs_raising_1
-include: template_yaml
+include: _template_yaml
 task: blimp_tough_vs_raising_1
lm_eval/tasks/blimp/tough_vs_raising_2.yaml
 # Generated by utils.py
 dataset_name: tough_vs_raising_2
-include: template_yaml
+include: _template_yaml
 task: blimp_tough_vs_raising_2
lm_eval/tasks/blimp/transitive.yaml
 # Generated by utils.py
 dataset_name: transitive
-include: template_yaml
+include: _template_yaml
 task: blimp_transitive
lm_eval/tasks/blimp/wh_island.yaml
 # Generated by utils.py
 dataset_name: wh_island
-include: template_yaml
+include: _template_yaml
 task: blimp_wh_island
lm_eval/tasks/blimp/wh_questions_object_gap.yaml
 # Generated by utils.py
 dataset_name: wh_questions_object_gap
-include: template_yaml
+include: _template_yaml
 task: blimp_wh_questions_object_gap
lm_eval/tasks/blimp/wh_questions_subject_gap.yaml
 # Generated by utils.py
 dataset_name: wh_questions_subject_gap
-include: template_yaml
+include: _template_yaml
 task: blimp_wh_questions_subject_gap
lm_eval/tasks/blimp/wh_questions_subject_gap_long_distance.yaml
 # Generated by utils.py
 dataset_name: wh_questions_subject_gap_long_distance
-include: template_yaml
+include: _template_yaml
 task: blimp_wh_questions_subject_gap_long_distance
lm_eval/tasks/blimp/wh_vs_that_no_gap.yaml
 # Generated by utils.py
 dataset_name: wh_vs_that_no_gap
-include: template_yaml
+include: _template_yaml
 task: blimp_wh_vs_that_no_gap
lm_eval/tasks/blimp/wh_vs_that_no_gap_long_distance.yaml
 # Generated by utils.py
 dataset_name: wh_vs_that_no_gap_long_distance
-include: template_yaml
+include: _template_yaml
 task: blimp_wh_vs_that_no_gap_long_distance
lm_eval/tasks/blimp/wh_vs_that_with_gap.yaml
 # Generated by utils.py
 dataset_name: wh_vs_that_with_gap
-include: template_yaml
+include: _template_yaml
 task: blimp_wh_vs_that_with_gap
lm_eval/tasks/blimp/wh_vs_that_with_gap_long_distance.yaml
 # Generated by utils.py
 dataset_name: wh_vs_that_with_gap_long_distance
-include: template_yaml
+include: _template_yaml
 task: blimp_wh_vs_that_with_gap_long_distance
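The BLiMP task files above differ only in the subtask name, and each carries a "# Generated by utils.py" header, so a generator along the following lines could plausibly produce them. This is a minimal sketch, not the repository's actual script: the function name `render_subtask_yaml`, the constant `TEMPLATE_NAME`, and the short subtask list are assumptions made for illustration.

```python
# Hypothetical sketch of a per-subtask YAML generator.
# This is NOT the repository's utils.py; names here are illustrative only.

TEMPLATE_NAME = "_template_yaml"  # shared template file each subtask includes


def render_subtask_yaml(subtask: str, template: str = TEMPLATE_NAME) -> str:
    """Return the YAML text for one BLiMP subtask config."""
    return (
        "# Generated by utils.py\n"
        f"dataset_name: {subtask}\n"
        f"include: {template}\n"
        f"task: blimp_{subtask}\n"
    )


# A few of the subtask names that appear in this commit.
subtasks = ["tough_vs_raising_1", "transitive", "wh_island"]

# Map each output file name to its rendered contents.
rendered = {f"{name}.yaml": render_subtask_yaml(name) for name in subtasks}

print(rendered["wh_island.yaml"])
```

Keeping the template name in one constant means a rename such as `template_yaml` to `_template_yaml` is a one-line change followed by regenerating every file, which matches the shape of the one-line edits in this commit.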
lm_eval/utils.py
...
@@ -339,31 +339,27 @@ def make_table(result_dict, column: str = "results"):
     elif column == "groups":
         column_name = "Groups"
-    md_writer = MarkdownTableWriter()
-    latex_writer = LatexTableWriter()
-    md_writer.headers = [
-        column_name,
-        "Version",
-        "Filter",
-        "Metric",
-        "Value",
-        "",
-        "Stderr",
-    ]
-    latex_writer.headers = [
+    all_headers = [
         column_name,
         "Version",
         "Filter",
+        "n-shot",
         "Metric",
         "Value",
         "",
         "Stderr",
     ]
+    md_writer = MarkdownTableWriter()
+    latex_writer = LatexTableWriter()
+    md_writer.headers = all_headers
+    latex_writer.headers = all_headers
     values = []
     for k, dic in result_dict[column].items():
         version = result_dict["versions"][k]
+        n = str(result_dict["n-shot"][k])
         if "alias" in dic:
             k = dic.pop("alias")
...
@@ -375,9 +371,9 @@ def make_table(result_dict, column: str = "results"):
             if m + "_stderr" + "," + f in dic:
                 se = dic[m + "_stderr" + "," + f]
-                values.append([k, version, f, m, "%.4f" % v, "±", "%.4f" % se])
+                values.append([k, version, f, n, m, "%.4f" % v, "±", "%.4f" % se])
             else:
-                values.append([k, version, f, m, "%.4f" % v, "", ""])
+                values.append([k, version, f, n, m, "%.4f" % v, "", ""])
             k = ""
             version = ""
     md_writer.value_matrix = values
...
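The `make_table` change adds an "n-shot" column and shares a single header list between both writers. The sketch below shows the row-building logic in isolation, without the table-writer objects, so the alignment between headers and rows is easy to check. It is a sketch under stated assumptions: the function name `build_rows` and the `column_name` default are hypothetical, the `partition(",")` split and the skipping of `_stderr` keys stand in for the lines elided from the diff, and the sample numbers are made up for illustration rather than taken from any evaluation run.

```python
# Standalone sketch of the row-building step; not the actual make_table.


def build_rows(result_dict, column="results", column_name="Tasks"):
    """Return (headers, rows) with an n-shot column after the filter."""
    headers = [column_name, "Version", "Filter", "n-shot",
               "Metric", "Value", "", "Stderr"]
    rows = []
    for name, metrics in result_dict[column].items():
        version = result_dict["versions"][name]
        n = str(result_dict["n-shot"][name])
        for key, value in metrics.items():
            # Assumption: keys look like "<metric>,<filter>".
            metric, _, filt = key.partition(",")
            if metric.endswith("_stderr"):
                continue  # stderr values are attached to their metric row
            se_key = metric + "_stderr" + "," + filt
            if se_key in metrics:
                rows.append([name, version, filt, n, metric,
                             "%.4f" % value, "±", "%.4f" % metrics[se_key]])
            else:
                rows.append([name, version, filt, n, metric,
                             "%.4f" % value, "", ""])
            # Blank the name and version so later rows of the same task
            # do not repeat them, as in the diff above.
            name = ""
            version = ""
    return headers, rows


# Made-up example data, for illustration only.
example = {
    "results": {"blimp_wh_island": {"acc,none": 0.8, "acc_stderr,none": 0.0126}},
    "versions": {"blimp_wh_island": 1.0},
    "n-shot": {"blimp_wh_island": 0},
}

headers, rows = build_rows(example)
for row in rows:
    print(row)
```

Because every row must have as many cells as there are headers, the new `n` value has to be inserted in both `values.append` branches, which is why the second hunk changes two lines.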