Commit acf76b50 authored by Jonathan Tow

Add `description_dict` docs and update `task-guide`

parent 1bc6cdb1
@@ -55,7 +55,7 @@ To evaluate mesh-transformer-jax models that are not available on HF, please inv
## Implementing new tasks
-To implement a new task in eval harness, see [this guide](https://github.com/EleutherAI/lm-evaluation-harness/blob/master/task-guide.md).
+To implement a new task in eval harness, see [this guide](./docs/task_guide.md).
## Cite as
@@ -298,7 +298,6 @@ To inspect what the LM inputs look like, you can run the following command:
```bash
python write_out.py \
--tasks all_tasks \
-    --provide_description \
--num_fewshot 5 \
--num_examples 10 \
--output_base_path /path/to/output/folder
```
# Description Guide
![fewshot-example](./img/fewshot_example_gpt3.png)
(Figure from [Brown et al., 2020](https://arxiv.org/pdf/2005.14165.pdf))
Task descriptions provide in-context instructions for your language model. If you'd like to prepend a natural language description to your few-shot examples and prompt, you can do so on a per-task basis via the `description_dict` arg of [`evaluator.evaluate`](../lm_eval/evaluator.py). This `description_dict` must adhere to the following key-value structure:
- **key**: the task name (`str`) as specified in the lm-eval-harness task registry (see the following section on task registry).
- **value**: the corresponding (`str`) description/prompt for the task identified by **key**.
```python
description_dict = {
"task_name_1": "description",
"task_name_2": "description",
...
}
```
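For instance, passing such a dictionary when evaluating programmatically might look like the sketch below. The arguments surrounding `description_dict` (`model`, `tasks`, `num_fewshot`) are assumptions about the usual `simple_evaluate` options; check [`evaluator.py`](../lm_eval/evaluator.py) for the exact signature.
```python
from lm_eval import evaluator

description_dict = {
    "copa": (
        "Given a premise and one alternative with a causal relation to the premise "
        "and another without, choose the more plausible alternative"
    ),
}

# Only `description_dict` is the point of this sketch; the other argument
# names mirror main.py's CLI flags and are assumptions, not a spec.
results = evaluator.simple_evaluate(
    model="gpt2",
    tasks=["copa"],
    num_fewshot=5,
    description_dict=description_dict,
)
```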
Note that a task's description will be separated from the few-shot examples and prompt that follow it by a new line, like so:
```python
"""
<description>
<examples>
<prompt>
"""
```
## Descriptions in File
One can also interface with the aforementioned [`evaluator.evaluate`](../lm_eval/evaluator.py) (or `evaluator.simple_evaluate`) method from a higher level by passing a JSON file path to the `description_dict_path` arg of the command-line interface (CLI) program, `main.py`. The JSON file should have the same structure as the `description_dict`. For example, a file at `/your/path/descriptions.json` might contain:
```json
{
"cycle_letters": "Please unscramble the letters into a word, and write that word:",
"copa": "Given a premise and one alternative with a causal relation to the premise and another without, choose the more plausible alternative"
}
```
which can then be supplied to the CLI as:
```bash
python main.py \
--tasks cycle_letters,copa \
--description_dict_path /your/path/descriptions.json \
...
```
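If you'd rather stay in Python than go through the CLI, here is a minimal sketch of loading the same file and reusing it as the `description_dict` described above (standard library only):
```python
import json

# Read the per-task descriptions from disk...
with open("/your/path/descriptions.json") as f:
    description_dict = json.load(f)

# ...and pass the resulting dict to evaluator.evaluate / evaluator.simple_evaluate
# via the `description_dict` arg, as in the previous section.
```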
@@ -142,41 +142,6 @@ def doc_to_target(self, doc):
Understand that the strings from `doc_to_text` and `doc_to_target` will be concatenated together to build up labeled examples in the k-shot setting where k > 0. Design with that in mind 👍.
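As a rough illustration of that concatenation (the document fields and wording below are invented for the example, not taken from a real task):
```python
doc = {"question": "What is the capital of France?", "answer": "Paris"}

def doc_to_text(doc):
    # Everything the model should condition on, ending where the answer begins.
    return f"Question: {doc['question']}\nAnswer:"

def doc_to_target(doc):
    # The gold continuation; the leading space lets text + target read naturally.
    return " " + doc["answer"]

# A single labeled few-shot example is just the two strings joined:
labeled_example = doc_to_text(doc) + doc_to_target(doc)
# -> "Question: What is the capital of France?\nAnswer: Paris"
```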
### Formatting Prompts
If you'd like to prepend your few-shot examples with a natural language description or provide a lone custom prompt for a zero-shot task, you can do so on a per-task basis via the `description_dict` arg of `evaluator.evaluate`, which is accessible from the `evaluator` module. This `description_dict` must adhere to the following key-value structure:
- **key**: the task name as specified in the lm-eval-harness task registry (see the following section on task registry).
- **value**: the corresponding description/prompt for the task identified by **key**.
E.g.
```python
description_dict = {
"task_name_1": "fewshot description",
"task_name_2": "fewshot description",
...
}
```
One can also interface with `evaluator.evaluate`/`evaluator.simple_evaluate` from a higher level by passing a JSON file path to the `description_dict_path` arg of the command-line interface (CLI) programs, `main.py` and `write_out.py`. The JSON file should be structured the same way as the aforementioned `description_dict`. For example, a file at `/your/path/descriptions.json` might contain:
```json
{
"cycle_letters": "Please unscramble the letters into a word, and write that word:",
"copa": "Given a premise and one alternative with a causal relation to the premise and another without, choose the more plausible alternative"
}
```
which can then be used, for example, in the `main.py` CLI as:
```bash
python main.py \
--tasks cycle_letters,copa \
--description_dict_path /your/path/descriptions.json \
...
```
### Registering Your Task
Now's a good time to register your task to expose it for usage. All you'll need to do is import your task module in `lm_eval/tasks/__init__.py` and provide an entry in the `TASK_REGISTRY` dictionary with the key as the name of your benchmark task (in the form it'll be referred to in the command line) and the value as the task class. See how it's done for other tasks in the [file](https://github.com/EleutherAI/lm-evaluation-harness/blob/master/lm_eval/tasks/__init__.py).
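For illustration, an entry might look like the sketch below, where `my_new_task` and `MyNewTask` are placeholder names for your module and task class:
```python
# lm_eval/tasks/__init__.py (sketch)
from . import my_new_task

TASK_REGISTRY = {
    # ... existing task entries ...
    "my_new_task": my_new_task.MyNewTask,
}
```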