Commit 00cecd59 authored by Jonathan Tow's avatar Jonathan Tow

Add task source header and versioning guide

parent 24d50a70
@@ -23,7 +23,23 @@
cd lm_eval/tasks
touch <task-name>.py
```
Then open the file and create a multiline docstring on the first line containing the name of the paper associated with your task(s) on one line, the paper's URL on the next line, and its BibTeX entry after that. For example, for the QuAC dataset you'd write:
```python
"""
QuAC: Question Answering in Context
https://arxiv.org/abs/1808.07036
@article{choi2018quac,
title={Quac: Question answering in context},
author={Choi, Eunsol and He, He and Iyyer, Mohit and Yatskar, Mark and Yih, Wen-tau and Choi, Yejin and Liang, Percy and Zettlemoyer, Luke},
journal={arXiv preprint arXiv:1808.07036},
year={2018}
}
"""
```
Now let's walk through the actual implementation - from data handling to evaluation.
### Downloading the Data
@@ -86,7 +102,7 @@
There are 2 standard approaches we follow for downloading data:
If your task is multiple-choice, just inherit from the `MultipleChoiceTask` class we provide.
```python
from lm_eval.base import MultipleChoiceTask

# ...

class TaskName(..., MultipleChoiceTask):
```
Multiple-choice tasks require you to format your documents according to a standard.
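As a rough sketch of that standard, each processed document should expose the question text, the list of candidate answers, and the index of the correct one. The field names of the raw record below (`question`, `options`, `answer`) are hypothetical placeholders for whatever your dataset actually provides:

```python
def _process_doc(doc):
    # Hypothetical converter: maps a raw record into the multiple-choice
    # format, where "gold" is the index of the correct entry in "choices".
    return {
        "query": doc["question"],    # text shown to the model
        "choices": doc["options"],   # candidate answers
        "gold": doc["options"].index(doc["answer"]),  # index of the answer
    }

doc = _process_doc({
    "question": "What color is the sky?",
    "options": ["green", "blue", "red"],
    "answer": "blue",
})
print(doc["gold"])  # prints 1, the index of "blue"
```

Storing the gold answer as an index (rather than the answer string) lets the harness score each choice's log-likelihood and compare against the correct position directly.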
@@ -94,6 +110,15 @@
after this go <a href="#Registering-Your-Task">register your task</a>.
⚠️ __END TODO__ ⚠️
### Versioning
Tasks in the harness can always evolve. Metrics get updated, data sources change, etc. It’s important to mark each task with a version attribute so users can specify which version of the implementation was used to obtain their results. Add a `VERSION` attribute to your task class set to 0 (the first version of your task):
```python
class TaskName(...):
VERSION = 0
```
### Formatting your Few-Shot Examples
The harness is designed to facilitate task evaluations under the few-shot setting. Here we’ll format such examples. Override the following methods for your task class:
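As a sketch of what these overrides typically look like: `doc_to_text` renders the prompt for a document, and `doc_to_target` returns the gold continuation. The `question`/`answer` field names below are hypothetical; note the leading space in the target so prompt and target concatenate cleanly:

```python
class ExampleTask:
    # Minimal sketch of the two formatting hooks; the doc field names
    # ("question", "answer") are placeholders for your own schema.
    def doc_to_text(self, doc):
        # The prompt shown to the model for this document.
        return "Question: {}\nAnswer:".format(doc["question"])

    def doc_to_target(self, doc):
        # The gold continuation; the leading space makes
        # doc_to_text(doc) + doc_to_target(doc) read naturally.
        return " " + doc["answer"]

task = ExampleTask()
doc = {"question": "What is 2 + 2?", "answer": "4"}
print(task.doc_to_text(doc) + task.doc_to_target(doc))
```

Few-shot prompts are built by concatenating several such text/target pairs, so getting the whitespace right here directly affects evaluation quality.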
...