Commit acf76b50 authored by Jonathan Tow

Add `description_dict` docs and update `task-guide`

parent 1bc6cdb1
@@ -55,7 +55,7 @@ To evaluate mesh-transformer-jax models that are not available on HF, please inv
## Implementing new tasks
-To implement a new task in eval harness, see [this guide](https://github.com/EleutherAI/lm-evaluation-harness/blob/master/task-guide.md).
+To implement a new task in eval harness, see [this guide](./docs/task_guide.md).
## Cite as
@@ -298,7 +298,6 @@ To inspect what the LM inputs look like, you can run the following command:
```bash
python write_out.py \
--tasks all_tasks \
-    --provide_description \
--num_fewshot 5 \
--num_examples 10 \
--output_base_path /path/to/output/folder
```
# Description Guide
![fewshot-example](./img/fewshot_example_gpt3.png)
(Figure from [Brown et al., 2020](https://arxiv.org/pdf/2005.14165.pdf))
Task descriptions provide in-context instructions for your language model. If you'd like to prepend a natural language description to your few-shot examples and prompt, you can do so on a per-task basis via the `description_dict` arg of [`evaluator.evaluate`](../lm_eval/evaluator.py). This `description_dict` must adhere to the following key-value structure:
- **key**: the task name (`str`) as specified in the lm-eval-harness task registry (see the following section on task registry).
- **value**: the corresponding (`str`) description/prompt for the task identified by **key**.
```python
description_dict = {
"task_name_1": "description",
"task_name_2": "description",
...
}
```
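For instance, passing such a dictionary when evaluating programmatically might look like the sketch below. The arguments surrounding `description_dict` (`model`, `tasks`, `num_fewshot`) are assumptions about the usual `simple_evaluate` options; check [`evaluator.py`](../lm_eval/evaluator.py) for the exact signature.
```python
from lm_eval import evaluator

description_dict = {
    "copa": (
        "Given a premise and one alternative with a causal relation to the premise "
        "and another without, choose the more plausible alternative"
    ),
}

# Only `description_dict` is the point of this sketch; the other argument
# names mirror main.py's CLI flags and are assumptions, not a spec.
results = evaluator.simple_evaluate(
    model="gpt2",
    tasks=["copa"],
    num_fewshot=5,
    description_dict=description_dict,
)
```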
Note that a task's description will be separated from the few-shot examples and prompt that follow it by a new line, like so:
```python
"""
<description>
<examples>
<prompt>
"""
```
## Descriptions in File
One can also interface with the aforementioned [`evaluator.evaluate`](../lm_eval/evaluator.py) (or `evaluator.simple_evaluate`) method from a higher level by passing a JSON file path to the `description_dict_path` arg of the command-line interface (CLI) program, `main.py`. The JSON file should have the same structure as the `description_dict`. For example, a file at `/your/path/descriptions.json` might contain:
```json
{
"cycle_letters": "Please unscramble the letters into a word, and write that word:",
"copa": "Given a premise and one alternative with a causal relation to the premise and another without, choose the more plausible alternative"
}
```
which can then be supplied to the CLI as:
```bash
python main.py \
--tasks cycle_letters,copa \
--description_dict_path /your/path/descriptions.json \
...
```
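If you'd rather stay in Python than go through the CLI, here is a minimal sketch of loading the same file and reusing it as the `description_dict` described above (standard library only):
```python
import json

# Read the per-task descriptions from disk...
with open("/your/path/descriptions.json") as f:
    description_dict = json.load(f)

# ...and pass the resulting dict to evaluator.evaluate / evaluator.simple_evaluate
# via the `description_dict` arg, as in the previous section.
```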
@@ -142,41 +142,6 @@ def doc_to_target(self, doc):
Understand that the strings from `doc_to_text` and `doc_to_target` will be concatenated together to build up labeled examples in the k-shot setting where k > 0. Design with that in mind 👍.
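As a rough illustration of that concatenation (the document fields and wording below are invented for the example, not taken from a real task):
```python
doc = {"question": "What is the capital of France?", "answer": "Paris"}

def doc_to_text(doc):
    # Everything the model should condition on, ending where the answer begins.
    return f"Question: {doc['question']}\nAnswer:"

def doc_to_target(doc):
    # The gold continuation; the leading space lets text + target read naturally.
    return " " + doc["answer"]

# A single labeled few-shot example is just the two strings joined:
labeled_example = doc_to_text(doc) + doc_to_target(doc)
# -> "Question: What is the capital of France?\nAnswer: Paris"
```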
### Formatting Prompts
If you'd like to prepend your few-shot examples with a natural language description or provide a lone custom prompt for a zero-shot task, you can do so on a per-task basis via the `description_dict` arg of `evaluator.evaluate`, which is accessible from the `evaluator` module. This `description_dict` must adhere to the following key-value structure:
- **key**: the task name as specified in the lm-eval-harness task registry (see the following section on task registry).
- **value**: the corresponding description/prompt for the task identified by **key**.
E.g.
```python
description_dict = {
"task_name_1": "fewshot description",
"task_name_2": "fewshot description",
...
}
```
One can also interface with `evaluator.evaluate`/`evaluator.simple_evaluate` from a higher level by passing a JSON file path to the `description_dict_path` arg of the command-line interface (CLI) programs, `main.py` and `write_out.py`. The JSON file should be structured the same way as the aforementioned `description_dict`. For example, a file at `/your/path/descriptions.json` might contain:
```json
{
"cycle_letters": "Please unscramble the letters into a word, and write that word:",
"copa": "Given a premise and one alternative with a causal relation to the premise and another without, choose the more plausible alternative"
}
```
which can then be used, for example, in the `main.py` CLI as:
```bash
python main.py \
--tasks cycle_letters,copa \
--description_dict_path /your/path/descriptions.json \
...
```
### Registering Your Task
Now's a good time to register your task to expose it for usage. All you'll need to do is import your task module in `lm_eval/tasks/__init__.py` and provide an entry in the `TASK_REGISTRY` dictionary with the key as the name of your benchmark task (in the form it'll be referred to in the command line) and the value as the task class. See how it's done for other tasks in the [file](https://github.com/EleutherAI/lm-evaluation-harness/blob/master/lm_eval/tasks/__init__.py).
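For illustration, an entry might look like the sketch below, where `my_new_task` and `MyNewTask` are placeholder names for your module and task class:
```python
# lm_eval/tasks/__init__.py (sketch)
from . import my_new_task

TASK_REGISTRY = {
    # ... existing task entries ...
    "my_new_task": my_new_task.MyNewTask,
}
```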