Commit 0b8358ec authored by Sypherd, committed by GitHub

Add real process_docs example (#2456)

parent c0745fec
@@ -86,20 +86,20 @@ Let's create a python file in the directory where we're writing our YAML file:
 ```bash
 touch lm_eval/tasks/<dataset_name>/utils.py
 ```
-Now, in `utils.py` we'll write a function to process each split of our dataset:
-TODO: Change the example to one that's in the tasks/
+Now, in `utils.py` we'll write a function to process each split of our dataset (the following example is drawn from [the `hellaswag` task](../lm_eval/tasks/hellaswag/utils.py)):
 ```python
-def process_docs(dataset: datasets.Dataset):
-    def _helper(doc):
-        # modifies the contents of a single
-        # document in our dataset.
-        doc["choices"] = [doc["choice1"], doc["choice2"], doc["wrong_answer"]]
-        doc["gold"] = doc["label"]
-        return doc
-
-    return dataset.map(_helper) # returns back a datasets.Dataset object
+def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:
+    def _process_doc(doc):
+        ctx = doc["ctx_a"] + " " + doc["ctx_b"].capitalize()
+        out_doc = {
+            "query": preprocess(doc["activity_label"] + ": " + ctx),
+            "choices": [preprocess(ending) for ending in doc["endings"]],
+            "gold": int(doc["label"]),
+        }
+        return out_doc
+    return dataset.map(_process_doc)
 ```
 Now, in our YAML config file we'll use the `!function` constructor, and tell the config where our imported Python function will come from. At runtime, before doing anything else we will preprocess our dataset according to this function!
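To see what the new per-document function produces, here is a minimal sketch that applies the same logic to a single plain dictionary, without loading a dataset. The `preprocess` helper below is a simplified, hypothetical stand-in (it only normalizes whitespace), not the helper from the `hellaswag` task, and the sample row is made up for illustration.

```python
import re


def preprocess(text: str) -> str:
    # Hypothetical stand-in for the task's own helper:
    # collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def process_doc(doc: dict) -> dict:
    # Same per-document logic as `_process_doc` in the new example.
    ctx = doc["ctx_a"] + " " + doc["ctx_b"].capitalize()
    return {
        "query": preprocess(doc["activity_label"] + ": " + ctx),
        "choices": [preprocess(ending) for ending in doc["endings"]],
        "gold": int(doc["label"]),
    }


# Made-up sample row that uses the field names from the example.
sample = {
    "activity_label": "Baking cookies",
    "ctx_a": "A woman mixes dough in a bowl.",
    "ctx_b": "she",
    "endings": ["  scoops it onto a tray. ", "throws the bowl  away."],
    "label": "0",
}

processed = process_doc(sample)
print(processed)
```

In the real task, `dataset.map(_process_doc)` would apply this transformation to every row of a split. Note that the new version returns a fresh dictionary rather than mutating `doc`, and converts the label to an `int`.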