Commit c0fbf9e8 authored by Leo Gao's avatar Leo Gao
Browse files

Add task table

parent 049dfa34
......@@ -10,6 +10,49 @@ The goal of this project is to build a set of tools for evaluating LMs on typica
2. Removing task val/test data from LM training set
3. Adding task training data to LM training set
### Overview of Tasks
| Task Name |Train|Val|Test| Metrics |
|---------------|-----|---|----|--------------------|
|cola |✓ |✓ |✓ |mcc |
|mnli |✓ |✓ |✓ |acc |
|mnli_mismatched|✓ |✓ |✓ |acc |
|mrpc |✓ |✓ |✓ |acc, f1 |
|rte |✓ |✓ |✓ |acc |
|qnli |✓ |✓ |✓ |acc |
|qqp |✓ |✓ |✓ |acc, f1 |
|sst |✓ |✓ |✓ |acc |
|wnli |✓ |✓ |✓ |acc |
|boolq |✓ |✓ |✓ |acc |
|cb |✓ |✓ |✓ |acc, f1 |
|copa |✓ |✓ |✓ |acc |
|multirc |✓ |✓ |✓ |acc |
|wic |✓ |✓ |✓ |acc |
|wsc |✓ |✓ |✓ |acc |
|lambada | |✓ | |perplexity, accuracy|
|piqa |✓ |✓ | |acc |
|arc_easy |✓ |✓ |✓ |acc |
|arc_challenge |✓ |✓ |✓ |acc |
|hellaswag |✓ |✓ |✓ |acc |
|race |✓ |✓ |✓ |acc |
|webqs |✓ | |✓ |acc |
|wsc273 | | |✓ |acc |
|winogrande |✓ |✓ |✓ |acc |
|anli_r1 |✓ |✓ |✓ |acc |
|anli_r2 |✓ |✓ |✓ |acc |
|anli_r3 |✓ |✓ |✓ |acc |
|arithmetic_2da | |✓ | |acc |
|arithmetic_2ds | |✓ | |acc |
|arithmetic_3da | |✓ | |acc |
|arithmetic_3ds | |✓ | |acc |
|arithmetic_4da | |✓ | |acc |
|arithmetic_4ds | |✓ | |acc |
|arithmetic_5da | |✓ | |acc |
|arithmetic_5ds | |✓ | |acc |
|arithmetic_2dm | |✓ | |acc |
|arithmetic_1dc | |✓ | |acc |
## Usage
### Evaluate a task
......
......@@ -18,22 +18,22 @@ class LAMBADA(Task):
return False
def has_validation_docs(self):
return False
return True
def has_test_docs(self):
return True
return False
def training_docs(self):
pass
def validation_docs(self):
pass
def test_docs(self):
with open("data/lambada/lambada_test.jsonl") as fh:
for line in fh:
yield json.loads(line)
def test_docs(self):
pass
def doc_to_text(self, doc):
return doc['text'].rsplit(' ', 1)[0]
......
from lm_eval import tasks
from pytablewriter import MarkdownTableWriter
writer = MarkdownTableWriter()
writer.headers = ["Task Name", "Train", "Val", "Test", "Metrics"]
values = []
def chk(tf):
if tf:
return '✓'
else:
return ' '
for tname, Task in tasks.TASK_REGISTRY.items():
task = Task()
values.append([tname,chk(task.has_training_docs()),chk(task.has_validation_docs()),chk(task.has_test_docs()),', '.join(task.aggregation().keys())])
writer.value_matrix = values
print(writer.dumps())
\ No newline at end of file
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment