gaoqiong / lm-evaluation-harness
Commit c0fbf9e8, authored Feb 04, 2021 by Leo Gao
Add task table
Parent: 049dfa34

4 changed files with 70 additions and 5 deletions (+70 -5):
README.md +43 -0
lm_eval/tasks/lambada.py +5 -5
scripts/make_table.py +22 -0
scripts/write_out.py +0 -0
README.md
...
@@ -10,6 +10,49 @@ The goal of this project is to build a set of tools for evaluating LMs on typica
2. Removing task val/test data from LM training set
3. Adding task training data to LM training set
### Overview of Tasks
| Task Name |Train|Val|Test| Metrics |
|---------------|-----|---|----|--------------------|
|cola |✓ |✓ |✓ |mcc |
|mnli |✓ |✓ |✓ |acc |
|mnli_mismatched|✓ |✓ |✓ |acc |
|mrpc |✓ |✓ |✓ |acc, f1 |
|rte |✓ |✓ |✓ |acc |
|qnli |✓ |✓ |✓ |acc |
|qqp |✓ |✓ |✓ |acc, f1 |
|sst |✓ |✓ |✓ |acc |
|wnli |✓ |✓ |✓ |acc |
|boolq |✓ |✓ |✓ |acc |
|cb |✓ |✓ |✓ |acc, f1 |
|copa |✓ |✓ |✓ |acc |
|multirc |✓ |✓ |✓ |acc |
|wic |✓ |✓ |✓ |acc |
|wsc |✓ |✓ |✓ |acc |
|lambada | |✓ | |perplexity, accuracy|
|piqa |✓ |✓ | |acc |
|arc_easy |✓ |✓ |✓ |acc |
|arc_challenge |✓ |✓ |✓ |acc |
|hellaswag |✓ |✓ |✓ |acc |
|race |✓ |✓ |✓ |acc |
|webqs |✓ | |✓ |acc |
|wsc273 | | |✓ |acc |
|winogrande |✓ |✓ |✓ |acc |
|anli_r1 |✓ |✓ |✓ |acc |
|anli_r2 |✓ |✓ |✓ |acc |
|anli_r3 |✓ |✓ |✓ |acc |
|arithmetic_2da | |✓ | |acc |
|arithmetic_2ds | |✓ | |acc |
|arithmetic_3da | |✓ | |acc |
|arithmetic_3ds | |✓ | |acc |
|arithmetic_4da | |✓ | |acc |
|arithmetic_4ds | |✓ | |acc |
|arithmetic_5da | |✓ | |acc |
|arithmetic_5ds | |✓ | |acc |
|arithmetic_2dm | |✓ | |acc |
|arithmetic_1dc | |✓ | |acc |
## Usage
### Evaluate a task
...
lm_eval/tasks/lambada.py
...
@@ -18,22 +18,22 @@ class LAMBADA(Task):
         return False

     def has_validation_docs(self):
-        return False
+        return True

     def has_test_docs(self):
-        return True
+        return False

     def training_docs(self):
         pass

     def validation_docs(self):
-        pass
-
-    def test_docs(self):
         with open("data/lambada/lambada_test.jsonl") as fh:
             for line in fh:
                 yield json.loads(line)
+
+    def test_docs(self):
+        pass

     def doc_to_text(self, doc):
         return doc['text'].rsplit(' ', 1)[0]
...
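After this change, LAMBADA's `validation_docs` streams `data/lambada/lambada_test.jsonl` one JSON object per line, and `doc_to_text` keeps everything before the last space, so the model sees a passage minus its final word. The sketch below is an illustration, not the harness's own code: it reads JSONL the same way and splits each doc into a context and a final-word target. The helper names `iter_jsonl` and `split_context_target`, the in-memory sample passages, and treating the last word as the target are assumptions made here; the diff itself only shows the context half.

```python
import io
import json


def iter_jsonl(fh):
    """Yield one parsed JSON object per line, as validation_docs does above."""
    for line in fh:
        yield json.loads(line)


def split_context_target(doc):
    """Split a LAMBADA-style doc into (context, target).

    The context matches doc_to_text in the diff (text before the last space).
    Using the remaining final word as the target is an assumption for this sketch.
    """
    context, target = doc["text"].rsplit(" ", 1)
    return context, target


# Hypothetical in-memory stand-in for data/lambada/lambada_test.jsonl.
sample = io.StringIO(
    '{"text": "she opened the door and saw her brother"}\n'
    '{"text": "the cat sat on the mat"}\n'
)

for doc in iter_jsonl(sample):
    context, target = split_context_target(doc)
    print(repr(context), "->", repr(target))
```

Because `validation_docs` is a generator, the file is read lazily and each document is parsed only when the evaluator asks for it.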
scripts/make_table.py (new file, mode 100644)
from lm_eval import tasks
from pytablewriter import MarkdownTableWriter

writer = MarkdownTableWriter()
writer.headers = ["Task Name", "Train", "Val", "Test", "Metrics"]

values = []

def chk(tf):
    if tf:
        return '✓'
    else:
        return ' '

for tname, Task in tasks.TASK_REGISTRY.items():
    task = Task()
    values.append([tname, chk(task.has_training_docs()), chk(task.has_validation_docs()), chk(task.has_test_docs()), ', '.join(task.aggregation().keys())])

writer.value_matrix = values

print(writer.dumps())
\ No newline at end of file
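The script derives each README row from a task's `has_training_docs`, `has_validation_docs`, and `has_test_docs` flags plus the keys of its `aggregation()` dict. The following sketch mirrors that loop so it can run without `lm_eval` or `pytablewriter` installed. The stub classes (`StubTask`, `StubCola`, `StubLambada`), the stub `TASK_REGISTRY`, and the hand-rolled `to_markdown` renderer are hypothetical stand-ins; the real script uses `lm_eval.tasks.TASK_REGISTRY` and `MarkdownTableWriter`, which also pads columns.

```python
from statistics import mean
from typing import Callable, Dict, List, Type


class StubTask:
    """Hypothetical stand-in exposing only the methods make_table.py calls."""

    train = val = test = False
    metrics: Dict[str, Callable] = {}

    def has_training_docs(self) -> bool:
        return self.train

    def has_validation_docs(self) -> bool:
        return self.val

    def has_test_docs(self) -> bool:
        return self.test

    def aggregation(self) -> Dict[str, Callable]:
        # make_table.py only reads the keys (metric names), not the functions.
        return self.metrics


class StubCola(StubTask):
    train = val = test = True
    metrics = {"mcc": mean}


class StubLambada(StubTask):
    val = True  # matches the post-commit LAMBADA flags: validation only
    metrics = {"perplexity": mean, "accuracy": mean}


TASK_REGISTRY: Dict[str, Type[StubTask]] = {"cola": StubCola, "lambada": StubLambada}


def chk(tf: bool) -> str:
    return "✓" if tf else " "


def make_rows(registry: Dict[str, Type[StubTask]]) -> List[List[str]]:
    rows = []
    for tname, Task in registry.items():
        task = Task()
        rows.append([
            tname,
            chk(task.has_training_docs()),
            chk(task.has_validation_docs()),
            chk(task.has_test_docs()),
            ", ".join(task.aggregation().keys()),
        ])
    return rows


def to_markdown(headers: List[str], rows: List[List[str]]) -> str:
    # Minimal pipe table without column padding.
    lines = ["|" + "|".join(headers) + "|", "|" + "|".join("---" for _ in headers) + "|"]
    lines += ["|" + "|".join(row) + "|" for row in rows]
    return "\n".join(lines)


print(to_markdown(["Task Name", "Train", "Val", "Test", "Metrics"], make_rows(TASK_REGISTRY)))
```

Since the table is generated from the task classes themselves, flipping LAMBADA's split flags in this commit is what moves its ✓ from the Test column to the Val column in the README.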
write_out.py → scripts/write_out.py (file moved)