Unverified commit 03e7df51 authored by Lintang Sutawika, committed by GitHub

Allow parameter edits for registered tasks when listed in a benchmark (#1273)

* benchmark yamls allow minor edits of already registered tasks

* add documentation

* removed print
parent 39e7b264
````diff
@@ -301,6 +301,23 @@ task:
 - hendrycksTest*
 ```
+It is also possible to list an existing task in your benchmark configuration with some adjustments. For example, a few tasks from mmlu are included in `multimedqa`. There, the `task_alias` and `group_alias` (see [here](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/new_task_guide.md#beautifying-table-display) for more details) are modified to suit the benchmark.
+```yaml
+group: multimedqa
+task:
+- pubmedqa
+- medmcqa
+- medqa_4options
+- task: mmlu_anatomy
+  task_alias: "anatomy (mmlu)"
+  group_alias: null
+- task: mmlu_clinical_knowledge
+  task_alias: "clinical_knowledge (mmlu)"
+  group_alias: null
+...
+```
 Alternatively, benchmarks can have tasks that are customized individually. They can be defined the same way a YAML task usually is.
 ```yaml
````
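The benchmark `task` list in the documentation example mixes plain task names with mappings that carry overrides. The sketch below assumes the YAML has already been loaded into Python objects and separates the two kinds of entry the way the `task_list` comprehension in `register_configurable_group` does. The non-string counterpart is assumed to be `config_list`, which the hunk iterates over but does not define on screen; `isinstance` stands in for the original `type(task) == str` check, and `split_task_entries` is a hypothetical helper, not part of lm-evaluation-harness.

```python
# Assumption: the benchmark YAML has already been parsed into Python objects.
# The data mirrors the multimedqa example (YAML null becomes None).
benchmark = {
    "group": "multimedqa",
    "task": [
        "pubmedqa",
        "medmcqa",
        "medqa_4options",
        {"task": "mmlu_anatomy", "task_alias": "anatomy (mmlu)", "group_alias": None},
        {
            "task": "mmlu_clinical_knowledge",
            "task_alias": "clinical_knowledge (mmlu)",
            "group_alias": None,
        },
    ],
}


def split_task_entries(all_task_list):
    """Hypothetical helper: separate plain task names from override entries."""
    # Plain strings name registered tasks that are used unchanged.
    task_list = [task for task in all_task_list if isinstance(task, str)]
    # Mappings either define a new task or adjust a registered one.
    config_list = [task for task in all_task_list if not isinstance(task, str)]
    return task_list, config_list


task_list, config_list = split_task_entries(benchmark["task"])
print(task_list)  # ['pubmedqa', 'medmcqa', 'medqa_4options']
print([c["task"] for c in config_list])  # ['mmlu_anatomy', 'mmlu_clinical_knowledge']
```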
```diff
@@ -61,11 +61,27 @@ def register_configurable_group(config: Dict[str, str], yaml_path: str = None) -
     task_list = [task for task in all_task_list if type(task) == str]
     for task_config in config_list:
+        base_config = {}
+        task_name_config = {}
+        if "task" in task_config:
+            task_name = task_config["task"]
+            if task_name in ALL_TASKS:
+                task_obj = get_task_dict(task_name)[task_name]
+                if type(task_obj) == tuple:
+                    _, task_obj = task_obj
+                if task_obj is not None:
+                    base_config = task_obj._config.to_dict()
+                    task_name_config["task"] = f"{group}_{task_name}"
         task_config = utils.load_yaml_config(yaml_path, task_config)
         var_configs = check_prompt_config(
             {
+                **base_config,
                 **task_config,
                 **{"group": group},
+                **task_name_config,
             },
             yaml_path=os.path.dirname(yaml_path),
         )
```
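The merge order introduced in `register_configurable_group` can be illustrated with plain dictionaries. In this sketch `merge_task_config` is a hypothetical stand-in for the loop body, and `registered_configs` replaces the `ALL_TASKS` / `get_task_dict` / `_config.to_dict()` lookup; YAML loading, the tuple unpacking, and `check_prompt_config` are left out, and the registry values are made up. Because later keys win in a `{**a, **b}` merge, the benchmark entry overrides the registered config, `group` is forced to the benchmark's group, and a registered task gets a group-prefixed name.

```python
def merge_task_config(group, task_config, registered_configs):
    """Hypothetical sketch of the config merge in register_configurable_group.

    registered_configs maps task names to full config dicts and stands in for
    the ALL_TASKS / get_task_dict(...)._config.to_dict() lookup.
    """
    base_config = {}
    task_name_config = {}
    task_name = task_config.get("task")
    if task_name in registered_configs:
        # Start from a copy of the registered task's full config ...
        base_config = dict(registered_configs[task_name])
        # ... and rename the edited copy with a group prefix.
        task_name_config["task"] = f"{group}_{task_name}"
    # Later keys win: the benchmark entry overrides the base config, the
    # benchmark's group overrides both, and the prefixed name comes last.
    return {**base_config, **task_config, "group": group, **task_name_config}


# Made-up registry entry, for illustration only.
registered = {
    "mmlu_anatomy": {
        "task": "mmlu_anatomy",
        "group": "mmlu_stem",
        "dataset_name": "anatomy",
        "task_alias": "anatomy",
    }
}

edited = merge_task_config(
    "multimedqa",
    {"task": "mmlu_anatomy", "task_alias": "anatomy (mmlu)", "group_alias": None},
    registered,
)
print(edited["task"], "|", edited["task_alias"], "|", edited["dataset_name"])
# multimedqa_mmlu_anatomy | anatomy (mmlu) | anatomy

# "my_custom_task" is a hypothetical unregistered task: nothing is inherited.
custom = merge_task_config("multimedqa", {"task": "my_custom_task"}, registered)
print(custom)
# {'task': 'my_custom_task', 'group': 'multimedqa'}
```

Presumably the prefix keeps the edited copy from replacing the original `mmlu_anatomy` registration, so the unmodified task remains available on its own.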
```diff
@@ -3,9 +3,21 @@ task:
 - pubmedqa
 - medmcqa
 - medqa_4options
-- mmlu_anatomy
-- mmlu_clinical_knowledge
-- mmlu_college_medicine
-- mmlu_medical_genetics
-- mmlu_professional_medicine
-- mmlu_college_biology
+- task: mmlu_anatomy
+  task_alias: "anatomy (mmlu)"
+  group_alias: null
+- task: mmlu_clinical_knowledge
+  task_alias: "clinical_knowledge (mmlu)"
+  group_alias: null
+- task: mmlu_college_medicine
+  task_alias: "college_medicine (mmlu)"
+  group_alias: null
+- task: mmlu_medical_genetics
+  task_alias: "medical_genetics (mmlu)"
+  group_alias: null
+- task: mmlu_professional_medicine
+  task_alias: "professional_medicine (mmlu)"
+  group_alias: null
+- task: mmlu_college_biology
+  task_alias: "college_biology (mmlu)"
+  group_alias: null
```
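The six MMLU entries in the updated `multimedqa` file follow one pattern: task `mmlu_<subject>`, alias `<subject> (mmlu)`, and a null `group_alias`. If a benchmark needed many such entries, they could be generated instead of written by hand. The helper below is a hypothetical illustration of that pattern, not something the harness provides; `None` is what YAML `null` loads as in Python.

```python
# Subject names taken from the updated multimedqa task list.
MMLU_SUBJECTS = [
    "anatomy",
    "clinical_knowledge",
    "college_medicine",
    "medical_genetics",
    "professional_medicine",
    "college_biology",
]


def mmlu_alias_entries(subjects):
    """Hypothetical helper: build benchmark entries that relabel MMLU subtasks."""
    return [
        {
            "task": f"mmlu_{subject}",          # registered task to adjust
            "task_alias": f"{subject} (mmlu)",  # display name within the benchmark
            "group_alias": None,                # YAML null; overrides the inherited value
        }
        for subject in subjects
    ]


benchmark = {
    "group": "multimedqa",
    "task": ["pubmedqa", "medmcqa", "medqa_4options", *mmlu_alias_entries(MMLU_SUBJECTS)],
}
print(len(benchmark["task"]))  # 9
print(benchmark["task"][3])
# {'task': 'mmlu_anatomy', 'task_alias': 'anatomy (mmlu)', 'group_alias': None}
```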