Unverified commit d714fc95, authored by Lintang Sutawika, committed by GitHub

Faster Task and Group Loading, Allow Recursive Groups (#1321)



* add trust_remote_code as default

* task for testing recursive

* changed source of ALL_TASKS

* tasks should only accept TaskObjects

* initialize_tasks returns list of tasks and groups

* remove trust_remote_code for now

* moved constructor process to inside load_yaml_config

* more comprehensive way to index tasks and groups

* pre-commit format

* add exit after error

* adjust how task objects are called

* no need to use get_task_dict

* load_task_or_group works but only for tasks

* pre-commit format

* half working for nested groups

* changed variable names

* allow groups and tasks to work

* temp save

* indexing and loading are part of a task_manager object

* adapted initialize_tasks

* iron out bugs

* fixed typo

* fixed typo

* simplified code

* further tidy up

* remove lines for testing

* removed test lines

* removed unused code

* remove unused import

* fixed bug

* removed comments

* group in a list of group can accept parameter changes like `num_fewshot`

* check if config is task update

* add GroupConfig object

* edit test yaml

* remove args

* testing returning to python task list

* add weight_by_size config

* describe weight_by_size in docs

* fix weight by size potential error

* can load individual custom python class task

* moved import_function into the config loading file

* remove print lines

* add squadv2 yaml

* temporary scroll implementation

* revert back to use load_yaml_config but with modes

* fix group being loaded with a None

* reformat

* can load unregistered tasks from a group

* update scrolls

* edit scrolls multiplechoice task

* adjust class initialization

* fix initialization

* changed how to identify group and python tasks, fix logger

* allow loading "include" that is nested in a group config

* reworked flan benchmark

* allow duplicate task in the same group to co-exist

* process group_alias

* removed group_alias

* allow parameters set in group_config to apply to all tasks in tasklist

* add function, but comment for now

* reworked processing dict-base config

* fixed how configs in group are processed

* update to allow root group to have its alias used

* remove unused classes

* remove unused classes

* revert some parts to original

* forgot to change one variable

* adapt the new process to use get_task_dict

* fix for singular group call

* fix variable names

* add TaskManager into the evaluator

* format

* changed how dict tasks are loaded

* add docs

* Update docs/new_task_guide.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update evaluator.py

* Update evaluator.py

* remove groupconfig for now

* changed _config to config

* update interface.md to explain TaskManager

* added property functions

* adjusted logger

* update write_out.py

* updated tests

* added documentation and some modifications

* added docstring documentation

* precommit format

* updated task loading for tests

* updates tests

* changed arg order for load_yaml_config

* update to handle scrolls and edit log message

* remove unused lines

* return a list of task classes and not a dict

* Update __init__.py

* Delete lm_eval/tasks/benchmarks/test.yaml

* Update task.py

* Update lm_eval/utils.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/utils.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update utils.py

* re-added old functions with new log message

* Update docs/new_task_guide.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update new_task_guide.md

* added info regarding `get_task_dict` and documentation

* add get_config for Task

* pre-commit formatting

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
parent 17191063
@@ -61,14 +61,25 @@ import lm_eval

my_model = initialize_my_model() # create your model (could be running finetuning with some custom modeling code)
...

# instantiate an LM subclass that takes your initialized model and can run
# - `Your_LM.loglikelihood()`
# - `Your_LM.loglikelihood_rolling()`
# - `Your_LM.generate_until()`
lm_obj = Your_LM(model=my_model, batch_size=16)

# indexes all tasks from the `lm_eval/tasks` subdirectory.
# Alternatively, you can set `TaskManager(include_path="path/to/my/custom/task/configs")`
# to include a set of tasks in a separate directory.
task_manager = lm_eval.tasks.TaskManager()

# Setting `task_manager` to the one above is optional and should generally be done
# if you want to include tasks from paths other than ones in `lm_eval/tasks`.
# `simple_evaluate` will instantiate its own task_manager if it is set to None here.
results = lm_eval.simple_evaluate( # call simple_evaluate
    model=lm_obj,
    tasks=["taskname1", "taskname2"],
    num_fewshot=0,
    task_manager=task_manager,
    ...
)
```
@@ -84,18 +95,49 @@ As a brief example usage of `evaluate()`:

```python
import lm_eval

# suppose you've defined a custom lm_eval.api.Task subclass in your own external codebase
from my_tasks import MyTask1
...

# create your model (could be running finetuning with some custom modeling code)
my_model = initialize_my_model()
...

# instantiate an LM subclass that takes your initialized model and can run
# - `Your_LM.loglikelihood()`
# - `Your_LM.loglikelihood_rolling()`
# - `Your_LM.generate_until()`
lm_obj = Your_LM(model=my_model, batch_size=16)

# The task_manager indexes tasks, including ones
# specified by the user through `include_path`.
task_manager = lm_eval.tasks.TaskManager(
    include_path="/path/to/custom/yaml"
)

# To get a task dict for `evaluate`
task_dict = lm_eval.tasks.get_task_dict(
    [
        "mmlu",            # A stock task
        "my_custom_task",  # A custom task
        {
            "task": ...,   # A dict that configures a task
            "doc_to_text": ...,
        },
        MyTask1,           # A task object from `lm_eval.api.task.Task`
    ],
    task_manager,  # A task manager that allows lm_eval to
                   # load the task during evaluation.
                   # If none is provided, `get_task_dict`
                   # will instantiate one itself, but this
                   # only includes the stock tasks, so users
                   # will need to set this if including
                   # custom paths is required.
)

def evaluate(
    lm=lm_obj,
    task_dict=task_dict,
    ...
):
```
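For reference, the dictionary returned by `evaluate()` / `simple_evaluate()` can be pretty-printed the same way the CLI does, via `make_table` from `lm_eval.utils`. A short sketch, assuming `results` is that returned dictionary:

```python
from lm_eval.utils import make_table

# `results` is the dict returned by lm_eval.evaluate(...) or lm_eval.simple_evaluate(...)
print(make_table(results))
if "groups" in results:
    print(make_table(results, "groups"))
```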
@@ -290,17 +290,80 @@ This will add your task to the `group1` and `group2` groups, enabling people to

If your task is not in the `lm_eval/tasks` folder, you'll need to tell the Eval Harness where to look for YAML files.

You can do this via the `--include_path` argument in `__main__.py`. This argument is used to initialize the `TaskManager` object, which you can also use in your own custom scripts.

```python
task_manager = TaskManager(args.verbosity, include_path=args.include_path)
```

Passing `--tasks /path/to/yaml/file` is also accepted.
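If you are driving the harness from Python rather than the CLI, the same effect can be achieved by constructing the `TaskManager` yourself and handing it to `simple_evaluate`. A minimal sketch; the model arguments and the task name `my_custom_task` are placeholders, not tasks shipped with the harness:

```python
import lm_eval
from lm_eval.tasks import TaskManager

# Index the built-in tasks plus any YAML configs found under the custom directory.
task_manager = TaskManager(include_path="/path/to/yaml/parent/folder")

results = lm_eval.simple_evaluate(
    model="hf",  # placeholder model backend and arguments
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["my_custom_task"],  # placeholder: a task defined in the included directory
    task_manager=task_manager,
)
```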
### Advanced Group Configs
You can make a more complete group config while also tailoring parameters for individual tasks.

For example, let's build a config for evaluating MMLU and a few natural language inference tasks. For MMLU, we can simply write the name of the benchmark as a subtask under `task`. You can configure parameters such as `num_fewshot`; if the task being configured is a group such as `mmlu` or `super_glue`, the parameters will be applied to all of its subtasks.
```yaml
group: nli_and_mmlu
task:
- group: nli_tasks
task:
- cb
- anli_r1
- rte
- task: mmlu
num_fewshot: 2
```
Note that a group config can itself be inserted as a task: to make the group of natural language inference tasks here, you write it just as you would a standalone group config, but place it inside the task list of the group being built.
### Duplicate Tasks in Group Configs
There may be cases where you want to evaluate how models perform across prompt variations. You can list an existing task (in the example below, `anli_r1`) multiple times with varying `doc_to_text` implementations. To differentiate the variations, use `task_alias`; LM-Eval will recognize that these are multiple variations of the same task and keep them distinct.
```yaml
group: flan_held_in
group_alias: Flan (Held-In)
task:
# ANLI R1
- group: anli_r1_flan
group_alias: ANLI R1
task:
- task: anli_r1
task_alias: prompt-0
include: _held_in_template_yaml
doc_to_text: "{{premise}}\n\nChoose your answer ..."
...
- task: anli_r1
task_alias: prompt-1
include: _held_in_template_yaml
doc_to_text: "{{premise}}\n\nBased on ..."
...
```
### Configuring python classes
There can be occasions when YAML-based tasks cannot accommodate how a task needs to be handled. LM-Eval supports manually implementing tasks, as was done before `0.4.x`. To register such a task, simply make a YAML with the name of the task in `task` and the class object in `class`, using the `!function` prefix.
```yaml
task: squadv2
class: !function task.SQuAD2
```
This also applies to building group configurations with subtasks that are python classes.
```yaml
group: scrolls
task:
- task: scrolls_qasper
class: !function task.Qasper
- task: scrolls_quality
class: !function task.QuALITY
- task: scrolls_narrativeqa
class: !function task.NarrativeQA
...
```
## Beautifying Table Display

To avoid conflicts, each task needs to be registered with a unique name. Because of this, slight variations of a task are still counted as unique tasks and need to be named uniquely. This can be done by appending a suffix that refers to the variation, as in MMLU, where the FLAN-template variants are differentiated from the default by the prefix `mmlu_flan_*`. Printing the full task names can easily clutter the results table at the end of the evaluation, especially when you have a long list of tasks or are using a benchmark that comprises many tasks. To make it more legible, you can use `task_alias` and `group_alias` to provide alternative task and group names to be printed.
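For instance, a group config along these lines (an illustrative sketch; the group name and aliases are made up, not configs shipped with the harness) would print the aliases in the results table instead of the registered identifiers:

```yaml
group: nli_and_mmlu            # hypothetical group name for illustration
group_alias: NLI and MMLU      # printed in place of "nli_and_mmlu"
task:
  - task: anli_r1
    task_alias: ANLI Round 1   # printed in place of "anli_r1"
  - task: mmlu
```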
......
...@@ -10,8 +10,7 @@ from typing import Union ...@@ -10,8 +10,7 @@ from typing import Union
import numpy as np import numpy as np
from lm_eval import evaluator, utils from lm_eval import evaluator, utils
from lm_eval.api.registry import ALL_TASKS from lm_eval.tasks import TaskManager, include_path, initialize_tasks
from lm_eval.tasks import include_path, initialize_tasks
from lm_eval.utils import make_table from lm_eval.utils import make_table
...@@ -169,6 +168,7 @@ def cli_evaluate(args: Union[argparse.Namespace, None] = None) -> None: ...@@ -169,6 +168,7 @@ def cli_evaluate(args: Union[argparse.Namespace, None] = None) -> None:
assert args.output_path, "Specify --output_path" assert args.output_path, "Specify --output_path"
initialize_tasks(args.verbosity) initialize_tasks(args.verbosity)
task_manager = TaskManager(args.verbosity, include_path=args.include_path)
if args.limit: if args.limit:
eval_logger.warning( eval_logger.warning(
...@@ -180,12 +180,12 @@ def cli_evaluate(args: Union[argparse.Namespace, None] = None) -> None: ...@@ -180,12 +180,12 @@ def cli_evaluate(args: Union[argparse.Namespace, None] = None) -> None:
include_path(args.include_path) include_path(args.include_path)
if args.tasks is None: if args.tasks is None:
task_names = ALL_TASKS eval_logger.error("Need to specify task to evaluate.")
sys.exit()
elif args.tasks == "list": elif args.tasks == "list":
eval_logger.info( eval_logger.info(
f"Available Tasks:\n - {(os.linesep + ' - ').join(sorted(ALL_TASKS))}" "Available Tasks:\n - {}".format("\n - ".join(task_manager.all_tasks()))
) )
sys.exit()
else: else:
if os.path.isdir(args.tasks): if os.path.isdir(args.tasks):
import glob import glob
...@@ -196,16 +196,14 @@ def cli_evaluate(args: Union[argparse.Namespace, None] = None) -> None: ...@@ -196,16 +196,14 @@ def cli_evaluate(args: Union[argparse.Namespace, None] = None) -> None:
config = utils.load_yaml_config(yaml_file) config = utils.load_yaml_config(yaml_file)
task_names.append(config) task_names.append(config)
else: else:
tasks_list = args.tasks.split(",") task_list = args.tasks.split(",")
task_names = utils.pattern_match(tasks_list, ALL_TASKS) task_names = task_manager.match_tasks(task_list)
for task in [task for task in tasks_list if task not in task_names]: for task in [task for task in task_list if task not in task_names]:
if os.path.isfile(task): if os.path.isfile(task):
config = utils.load_yaml_config(task) config = utils.load_yaml_config(task)
task_names.append(config) task_names.append(config)
task_missing = [ task_missing = [
task task for task in task_list if task not in task_names and "*" not in task
for task in tasks_list
if task not in task_names and "*" not in task
] # we don't want errors if a wildcard ("*") task name was used ] # we don't want errors if a wildcard ("*") task name was used
if task_missing: if task_missing:
...@@ -237,6 +235,7 @@ def cli_evaluate(args: Union[argparse.Namespace, None] = None) -> None: ...@@ -237,6 +235,7 @@ def cli_evaluate(args: Union[argparse.Namespace, None] = None) -> None:
output_path_file = path.joinpath("results.json") output_path_file = path.joinpath("results.json")
eval_logger.info(f"Selected Tasks: {task_names}") eval_logger.info(f"Selected Tasks: {task_names}")
eval_logger.info("Loading selected tasks...")
results = evaluator.simple_evaluate( results = evaluator.simple_evaluate(
model=args.model, model=args.model,
...@@ -253,6 +252,7 @@ def cli_evaluate(args: Union[argparse.Namespace, None] = None) -> None: ...@@ -253,6 +252,7 @@ def cli_evaluate(args: Union[argparse.Namespace, None] = None) -> None:
write_out=args.write_out, write_out=args.write_out,
log_samples=args.log_samples, log_samples=args.log_samples,
gen_kwargs=args.gen_kwargs, gen_kwargs=args.gen_kwargs,
task_manager=task_manager,
predict_only=args.predict_only, predict_only=args.predict_only,
) )
......
...@@ -85,7 +85,6 @@ class TaskConfig(dict): ...@@ -85,7 +85,6 @@ class TaskConfig(dict):
filter_list: Union[str, list] = None filter_list: Union[str, list] = None
should_decontaminate: bool = False should_decontaminate: bool = False
doc_to_decontamination_query: str = None doc_to_decontamination_query: str = None
metadata: dict = None # by default, not used in the code. allows for users to pass arbitrary info to tasks metadata: dict = None # by default, not used in the code. allows for users to pass arbitrary info to tasks
def __post_init__(self) -> None: def __post_init__(self) -> None:
...@@ -306,7 +305,7 @@ class Task(abc.ABC): ...@@ -306,7 +305,7 @@ class Task(abc.ABC):
return self.validation_docs() return self.validation_docs()
else: else:
eval_logger.warning( eval_logger.warning(
"has_training_docs and has_validation_docs are False" f"[Task: {self.config.task}] has_training_docs and has_validation_docs are False"
", using test_docs as fewshot_docs but this is not recommended." ", using test_docs as fewshot_docs but this is not recommended."
) )
return self.test_docs() return self.test_docs()
...@@ -437,6 +436,9 @@ class Task(abc.ABC): ...@@ -437,6 +436,9 @@ class Task(abc.ABC):
""" """
pass pass
def get_config(self, key: str) -> Any:
return getattr(self._config, key, None)
@classmethod @classmethod
def count_bytes(cls, doc): def count_bytes(cls, doc):
"""Used for byte-level perplexity metrics in rolling loglikelihood""" """Used for byte-level perplexity metrics in rolling loglikelihood"""
...@@ -622,7 +624,7 @@ class ConfigurableTask(Task): ...@@ -622,7 +624,7 @@ class ConfigurableTask(Task):
INV_AGG_REGISTRY = {v: k for k, v in AGGREGATION_REGISTRY.items()} INV_AGG_REGISTRY = {v: k for k, v in AGGREGATION_REGISTRY.items()}
metric_agg = get_metric_aggregation(metric_name) metric_agg = get_metric_aggregation(metric_name)
eval_logger.warning( eval_logger.warning(
f"[Task: {self._config.task}] metric {metric_name} is defined, but aggregation is not. " f"[Task: {self.config.task}] metric {metric_name} is defined, but aggregation is not. "
f"using default " f"using default "
f"aggregation={INV_AGG_REGISTRY[metric_agg]}" f"aggregation={INV_AGG_REGISTRY[metric_agg]}"
) )
...@@ -634,7 +636,7 @@ class ConfigurableTask(Task): ...@@ -634,7 +636,7 @@ class ConfigurableTask(Task):
] ]
else: else:
eval_logger.warning( eval_logger.warning(
f"[Task: {self._config.task}] metric {metric_name} is defined, but higher_is_better is not. " f"[Task: {self.config.task}] metric {metric_name} is defined, but higher_is_better is not. "
f"using default " f"using default "
f"higher_is_better={is_higher_better(metric_name)}" f"higher_is_better={is_higher_better(metric_name)}"
) )
......
...@@ -4,20 +4,24 @@ import collections ...@@ -4,20 +4,24 @@ import collections
import torch import torch
import logging
import numpy as np import numpy as np
import lm_eval.api import lm_eval.api
import lm_eval.tasks
import lm_eval.models import lm_eval.models
import lm_eval.api.metrics import lm_eval.api.metrics
import lm_eval.api.registry import lm_eval.api.registry
from lm_eval.tasks import (
get_task_dict,
TaskManager
)
from lm_eval.utils import ( from lm_eval.utils import (
positional_deprecated, positional_deprecated,
run_task_tests, run_task_tests,
get_git_commit_hash, get_git_commit_hash,
simple_parse_args_string, simple_parse_args_string,
eval_logger, eval_logger
) )
...@@ -38,6 +42,8 @@ def simple_evaluate( ...@@ -38,6 +42,8 @@ def simple_evaluate(
write_out: bool = False, write_out: bool = False,
log_samples: bool = True, log_samples: bool = True,
gen_kwargs: str = None, gen_kwargs: str = None,
task_manager: TaskManager = None,
verbosity: str = "INFO",
predict_only: bool = False, predict_only: bool = False,
): ):
"""Instantiate and evaluate a model on a list of tasks. """Instantiate and evaluate a model on a list of tasks.
...@@ -47,7 +53,7 @@ def simple_evaluate( ...@@ -47,7 +53,7 @@ def simple_evaluate(
:param model_args: Optional[str] :param model_args: Optional[str]
String arguments for each model class, see LM.create_from_arg_string. String arguments for each model class, see LM.create_from_arg_string.
Ignored if `model` argument is a LM object. Ignored if `model` argument is a LM object.
:param tasks: list[Union[str, Task]] :param tasks: list[Union[str, dict, Task]]
List of task names or Task objects. Task objects will be taken to have name task.EVAL_HARNESS_NAME if defined and type(task).__name__ otherwise. List of task names or Task objects. Task objects will be taken to have name task.EVAL_HARNESS_NAME if defined and type(task).__name__ otherwise.
:param num_fewshot: int :param num_fewshot: int
Number of examples in few-shot context Number of examples in few-shot context
...@@ -84,6 +90,8 @@ def simple_evaluate( ...@@ -84,6 +90,8 @@ def simple_evaluate(
1234 1234
) # TODO: this may affect training runs that are run with evaluation mid-run. ) # TODO: this may affect training runs that are run with evaluation mid-run.
eval_logger.setLevel(getattr(logging, f"{verbosity}"))
if tasks is None: if tasks is None:
tasks = [] tasks = []
assert ( assert (
...@@ -125,11 +133,18 @@ def simple_evaluate( ...@@ -125,11 +133,18 @@ def simple_evaluate(
+ ".db", + ".db",
) )
task_dict = lm_eval.tasks.get_task_dict(tasks) if task_manager is None:
task_manager = TaskManager(verbosity)
eval_logger.info(
"get_task_dict has been updated to accept an optional argument, `task_manager`"
"Read more here: https://github.com/EleutherAI/lm-evaluation-harness/blob/recursive-groups/docs/interface.md#external-library-usage"
)
task_dict = get_task_dict(tasks, task_manager)
for task_name in task_dict.keys(): for task_name in task_dict.keys():
task_obj = task_dict[task_name] task_obj = task_dict[task_name]
if isinstance(task_obj, tuple): if isinstance(task_obj, tuple):
group, task_obj = task_obj _, task_obj = task_obj
if task_obj is None: if task_obj is None:
continue continue
...@@ -169,6 +184,7 @@ def simple_evaluate( ...@@ -169,6 +184,7 @@ def simple_evaluate(
decontamination_ngrams_path=decontamination_ngrams_path, decontamination_ngrams_path=decontamination_ngrams_path,
write_out=write_out, write_out=write_out,
log_samples=log_samples, log_samples=log_samples,
verbosity=verbosity,
) )
if lm.rank == 0: if lm.rank == 0:
...@@ -211,6 +227,7 @@ def evaluate( ...@@ -211,6 +227,7 @@ def evaluate(
decontamination_ngrams_path=None, decontamination_ngrams_path=None,
write_out: bool = False, write_out: bool = False,
log_samples: bool = True, log_samples: bool = True,
verbosity: str = "INFO",
): ):
"""Instantiate and evaluate a model on a list of tasks. """Instantiate and evaluate a model on a list of tasks.
...@@ -230,6 +247,7 @@ def evaluate( ...@@ -230,6 +247,7 @@ def evaluate(
Dictionary of results Dictionary of results
""" """
eval_logger.setLevel(getattr(logging, f"{verbosity}"))
# decontaminate = decontamination_ngrams_path is not None # decontaminate = decontamination_ngrams_path is not None
for task_name, task in task_dict.items(): for task_name, task in task_dict.items():
...@@ -511,11 +529,6 @@ def evaluate( ...@@ -511,11 +529,6 @@ def evaluate(
metrics.pop("alias") metrics.pop("alias")
current_size = metrics.pop("samples") current_size = metrics.pop("samples")
# TODO: There should be a way for users
# to toggle between weighted and
# unweighted averaging
# For unweighted averaging, use:
# current_size = 1
all_stderr = [] all_stderr = []
for metric in [ for metric in [
......
import os import os
import yaml import abc
import collections
from functools import partial
from typing import List, Union, Dict from typing import List, Union, Dict
from lm_eval import utils from lm_eval import utils
from lm_eval import prompts from lm_eval.api.task import Task, ConfigurableTask
from lm_eval.api.task import TaskConfig, Task, ConfigurableTask
from lm_eval.api.registry import (
register_task,
register_group,
TASK_REGISTRY,
GROUP_REGISTRY,
ALL_TASKS,
)
import logging import logging
# import python tasks
from .squadv2.task import SQuAD2
from .scrolls.task import (
QuALITY,
NarrativeQA,
ContractNLI,
GovReport,
SummScreenFD,
QMSum,
)
eval_logger = utils.eval_logger
def register_configurable_task(config: Dict[str, str]) -> int:
SubClass = type(
config["task"] + "ConfigurableTask",
(ConfigurableTask,),
{"CONFIG": TaskConfig(**config)},
)
if "task" in config: class TaskManager:
task_name = "{}".format(config["task"]) """TaskManager indexes all tasks from the default `lm_eval/tasks/`
register_task(task_name)(SubClass) and an optional directory if provided.
if "group" in config: """
if config["group"] == config["task"]: def __init__(
raise ValueError("task and group name cannot be the same") self,
elif isinstance(config["group"], str): verbosity="INFO",
group_name = [config["group"]] include_path=None
else: ) -> None:
group_name = config["group"]
self.verbosity = verbosity
self.include_path = include_path
self.logger = utils.eval_logger
self.logger.setLevel(getattr(logging, f"{verbosity}"))
self._task_index = self.initialize_tasks(
include_path=include_path
)
self._all_tasks = sorted(list(self._task_index.keys()))
for group in group_name: self.task_group_map = collections.defaultdict(list)
register_group(group)(SubClass)
return 0 def initialize_tasks(self, include_path: str = None):
"""Creates an dictionary of tasks index.
:param include_path: str = None
An additional path to be searched for tasks
:return
Dictionary of task names as key and task metadata
"""
all_paths = [os.path.dirname(os.path.abspath(__file__)) + "/"]
if include_path is not None:
if isinstance(include_path, str):
include_path = [include_path]
all_paths.extend(include_path)
task_index = {}
for task_dir in all_paths:
tasks = self._get_task_and_group(task_dir)
task_index = {**tasks, **task_index}
def register_configurable_group(config: Dict[str, str], yaml_path: str = None) -> int: return task_index
group = config["group"]
all_task_list = config["task"] @property
config_list = [task for task in all_task_list if not isinstance(task, str)] def all_tasks(self):
task_list = [task for task in all_task_list if isinstance(task, str)] return self._all_tasks
for task_config in config_list: @property
def task_index(self):
base_config = {} return self._task_index
task_name_config = {}
if "task" in task_config: def match_tasks(self, task_list):
task_name = task_config["task"] return utils.pattern_match(
if task_name in ALL_TASKS: task_list, self.all_tasks
task_obj = TASK_REGISTRY[task_name]
if isinstance(task_obj, tuple):
_, task_obj = task_obj
if task_obj is not None:
base_config = task_obj.CONFIG.to_dict(keep_callable=True)
task_name_config["task"] = f"{group}_{task_name}"
task_config = utils.load_yaml_config(yaml_path, task_config)
var_configs = check_prompt_config(
{
**base_config,
**task_config,
**{"group": group},
**task_name_config,
},
yaml_path=os.path.dirname(yaml_path),
) )
for config in var_configs:
register_configurable_task(config) def _name_is_registered(self, name):
if name in self.all_tasks:
task_names = utils.pattern_match(task_list, ALL_TASKS) return True
for task in task_names: return False
if (task in TASK_REGISTRY) or (task in GROUP_REGISTRY):
if group in GROUP_REGISTRY: def _name_is_task(self, name):
GROUP_REGISTRY[group].append(task) if self._name_is_registered(name) and ("task" in self.task_index[name]["type"]):
return True
return False
def _name_is_group(self, name):
if self._name_is_registered(name) and (self.task_index[name]["type"] == "group"):
return True
return False
def _name_is_python_task(self, name):
if self._name_is_registered(name) and (self.task_index[name]["type"] == "python_task"):
return True
return False
def _config_is_task(self, config):
if ("task" in config) and isinstance(config["task"], str):
return True
return False
def _config_is_group(self, config):
if ("task" in config) and isinstance(config["task"], list):
return True
return False
def _config_is_python_task(self, config):
if "class" in config:
return True
return False
def _get_yaml_path(self, name):
assert name in self.task_index
return self.task_index[name]["yaml_path"]
def _get_config(self, name):
assert name in self.task_index
yaml_path = self._get_yaml_path(name)
if yaml_path == -1:
return {}
else:
return utils.load_yaml_config(yaml_path, mode="full")
def _get_tasklist(self, name):
assert self._name_is_task(name) == False
return self.task_index[name]["task"]
def _process_alias(self, config, group=None):
# If the group is not the same as the original
# group which the group alias was intended for,
# Set the group_alias to None instead.
if ("group_alias" in config) and ("group" in config) and group is not None:
if config["group"] != group:
config["group_alias"] = None
return config
def _load_individual_task_or_group(
self,
name_or_config: Union[str, dict] = None,
parent_name: str = None,
update_config: dict = None,
yaml_path: str = None,
) -> ConfigurableTask:
def load_task(config, task, group=None, yaml_path=None):
if "include" in config:
assert yaml_path is not None
config.update(
utils.load_yaml_config(
yaml_path,
yaml_config={"include": config.pop("include")},
mode="full",
)
)
if self._config_is_python_task(config):
task_object = config["class"]()
else:
config = self._process_alias(config, group=group)
task_object = ConfigurableTask(config=config)
if group is not None:
task_object = (group, task_object)
return {task: task_object}
if isinstance(name_or_config, str):
if update_config is not None:
# Process name_or_config as a dict instead
name_or_config = {"task": name_or_config, **update_config}
elif self._name_is_task(name_or_config):
task_config = self._get_config(name_or_config)
return load_task(task_config, task=name_or_config, group=parent_name)
else: else:
GROUP_REGISTRY[group] = [task] group_name = name_or_config
ALL_TASKS.add(group) subtask_list = self._get_tasklist(name_or_config)
if subtask_list == -1:
group_config = self._get_config(name_or_config)
subtask_list = group_config["task"]
# This checks if we're at the root.
if parent_name is None:
group_config = self._get_config(name_or_config)
if set(group_config.keys()) > set(["task", "group"]):
update_config = {
k:v for k,v in group_config.items() if k not in ["task", "group"]
}
yaml_path = self._get_yaml_path(group_name)
return 0 if (update_config is not None) and ("group_alias" in update_config):
group_name = update_config["group_alias"]
update_config.pop("group_alias")
if isinstance(name_or_config, dict):
def check_prompt_config( if update_config is not None:
config: Dict[str, str], yaml_path: str = None name_or_config={
) -> List[Dict[str, str]]: **name_or_config,
all_configs = [] **update_config,
if "use_prompt" in config:
prompt_list = prompts.load_prompt_list(
use_prompt=config["use_prompt"],
dataset_name=config["dataset_path"],
subset_name=config["dataset_name"] if "dataset_name" in config else None,
yaml_path=yaml_path,
)
for idx, prompt_variation in enumerate(prompt_list):
all_configs.append(
{
**config,
**{"use_prompt": prompt_variation},
**{
"task": "_".join(
[
config["task"]
if "task" in config
else get_task_name_from_config(config),
prompt_variation.split("/")[-1]
if ".yaml" in prompt_variation
else prompt_variation,
]
)
},
**{"output_type": "generate_until"},
} }
)
else:
all_configs.append(config)
return all_configs
def get_task_name_from_config(task_config: Dict[str, str]) -> str: if self._config_is_task(name_or_config):
if "dataset_name" in task_config: name = name_or_config["task"]
return "{dataset_path}_{dataset_name}".format(**task_config) # If the name is registered as a group
else: # if self._name_is_task(name) is False:
return "{dataset_path}".format(**task_config) if self._name_is_group(name):
group_name = name
update_config = {k:v for k,v in name_or_config.items() if k != "task"}
subtask_list = self._get_tasklist(name)
if subtask_list == -1:
subtask_list = self._get_config(name)["task"]
else:
if self._name_is_registered(name):
base_task_config = self._get_config(name)
# Check if this is a duplicate.
if parent_name is not None:
name_or_config["group"] = parent_name
num_duplicate = len(list(filter(lambda x: x.startswith(name), self.task_group_map[parent_name])))
if num_duplicate > 0:
name = f"{name}-{num_duplicate}"
self.task_group_map[parent_name].append(name)
task_config={
**base_task_config,
**name_or_config,
}
else:
task_config = name_or_config
return load_task(task_config, task=name, group=parent_name, yaml_path=yaml_path)
else:
group_name = name_or_config["group"]
subtask_list = name_or_config["task"]
# update_config = {k:v for k,v in name_or_config.items() if k != "task"}
if set(name_or_config.keys()) > set(["task", "group"]):
update_config = {
k:v for k,v in name_or_config.items() if k not in ["task", "group"]
}
all_subtasks = {}
if (parent_name is not None):
all_subtasks = {group_name: (parent_name, None)}
def include_task_folder(task_dir: str, register_task: bool = True) -> None: fn = partial(self._load_individual_task_or_group, parent_name=group_name, update_config=update_config, yaml_path=yaml_path)
""" all_subtasks = {**all_subtasks, **dict(collections.ChainMap(*map(fn, subtask_list)))}
Calling this function return all_subtasks
"""
# Track whether any tasks failed during loading
import_fail = False
for root, subdirs, file_list in os.walk(task_dir):
# if (subdirs == [] or subdirs == ["__pycache__"]) and (len(file_list) > 0):
for f in file_list:
if f.endswith(".yaml"):
yaml_path = os.path.join(root, f)
try:
config = utils.load_yaml_config(yaml_path)
if "task" not in config:
continue
all_configs = check_prompt_config(
config, yaml_path=os.path.dirname(yaml_path)
)
for config in all_configs:
if register_task:
if isinstance(config["task"], str):
register_configurable_task(config)
else:
if isinstance(config["task"], list):
register_configurable_group(config, yaml_path)
# Log this silently and show it only when
# the user defines the appropriate verbosity.
except (ImportError, ModuleNotFoundError) as e:
import_fail = True
eval_logger.debug(
f"{yaml_path}: {e}. Config will not be added to registry."
)
except Exception as error:
import traceback
eval_logger.warning(
"Unexpected error loading config in\n"
f" {yaml_path}\n"
" Config will not be added to registry\n"
f" Error: {error}\n"
f" Traceback: {traceback.format_exc()}"
)
if import_fail: def load_task_or_group(self, task_list: Union[str, list] = None) -> dict:
eval_logger.warning( """Loads a dictionary of task objects from a list
"Some tasks could not be loaded due to missing dependencies."
" Run with `--verbosity DEBUG` for full details."
)
return 0
:param task_list: Union[str, list] = None
Single string or list of string of task names to be loaded
def include_path(task_dir): :return
include_task_folder(task_dir) Dictionary of task objects
# Register Benchmarks after all tasks have been added """
include_task_folder(task_dir, register_task=False) if isinstance(task_list, str):
return 0 task_list = [task_list]
all_loaded_tasks = dict(
collections.ChainMap(
*map(
self._load_individual_task_or_group,
task_list
)
)
)
return all_loaded_tasks
def load_config(self, config: Dict):
return self._load_individual_task_or_group(config)
def _get_task_and_group(self, task_dir: str):
"""Creates an dictionary of tasks index with the following metadata,
- `type`, that can be either `task`, `python_task`, or `group`.
`task` refer to regular task configs, `python_task` are special
yaml files that only consists of `task` and `class` parameters.
`group` are group configs.
- `yaml_path`, path to the yaml file. If the entry is a `group` that
was configured through a task config, the yaml_path will be -1
and all subtasks will be listed in `task` (see below)
- `task`, reserved for entries with `type` as `group`. This will list
all subtasks. When a group config is created (as opposed to task
config having `group` parameter set), this will be set to -1 to
avoid recursive indexing. The whole list of subtasks will be loaded
at evaluation.
:param task_dir: str
A directory to check for tasks
:return
Dictionary of task names as key and task metadata
"""
tasks_and_groups = collections.defaultdict()
for root, _, file_list in os.walk(task_dir):
for f in file_list:
if f.endswith(".yaml"):
yaml_path = os.path.join(root, f)
config = utils.load_yaml_config(yaml_path, mode="simple")
if self._config_is_python_task(config):
# This is a python class config
tasks_and_groups[config["task"]] = {
"type": "python_task",
"yaml_path": yaml_path,
}
elif self._config_is_group(config):
# This is a group config
tasks_and_groups[config["group"]] = {
"type": "group",
"task": -1, # This signals that
# we don't need to know
# the task list for indexing
# as it can be loaded
# when called.
"yaml_path": yaml_path,
}
def initialize_tasks(verbosity="INFO"): # # Registered the level 1 tasks from a group config
eval_logger.setLevel(getattr(logging, f"{verbosity}")) # for config in config["task"]:
# if isinstance(config, dict) and self._config_is_task(config):
# task = config["task"]
# tasks_and_groups[task] = {
# "type": "task",
# "yaml_path": yaml_path,
# }
elif self._config_is_task(config):
# This is a task config
task = config["task"]
tasks_and_groups[task] = {
"type": "task",
"yaml_path": yaml_path,
}
task_dir = os.path.dirname(os.path.abspath(__file__)) + "/" if "group" in config:
include_path(task_dir) groups = config["group"]
if isinstance(config["group"], str):
groups = [groups]
for group in groups:
if group not in tasks_and_groups:
tasks_and_groups[group] = {
"type": "group",
"task": [task],
"yaml_path": -1,
}
else:
tasks_and_groups[group]["task"].append(task)
else:
self.logger.debug(f"File {f} in {root} could not be loaded")
return tasks_and_groups
def include_path(task_dir):
logger = utils.eval_logger
logger.setLevel(getattr(logging, "INFO"))
logger.info(
"To still use tasks loaded from args.include_path,"
"see an example of the new TaskManager API in https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/interface.md#external-library-usage"
)
return 0
def get_task(task_name, config): def initialize_tasks(verbosity="INFO"):
try: logger = utils.eval_logger
return TASK_REGISTRY[task_name](config=config) logger.setLevel(getattr(logging, f"{verbosity}"))
except KeyError: logger.info(
eval_logger.info("Available tasks:") "lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. "
eval_logger.info(list(TASK_REGISTRY) + list(GROUP_REGISTRY)) "It will be removed in v0.4.2 release. "
raise KeyError(f"Missing task {task_name}") "TaskManager will instead be used."
)
return 0
def get_task_name_from_config(task_config: Dict[str, str]) -> str:
if "task" in task_config:
return task_config["task"]
if "dataset_name" in task_config:
return "{dataset_path}_{dataset_name}".format(**task_config)
else:
return "{dataset_path}".format(**task_config)
def get_task_name_from_object(task_object): def get_task_name_from_object(task_object):
for name, class_ in TASK_REGISTRY.items(): if hasattr(task_object, "config"):
if class_ is task_object: return task_object._config["task"]
return name
# TODO: scrap this # TODO: scrap this
# this gives a mechanism for non-registered tasks to have a custom name anyways when reporting # this gives a mechanism for non-registered tasks to have a custom name anyways when reporting
...@@ -234,54 +382,40 @@ def get_task_name_from_object(task_object): ...@@ -234,54 +382,40 @@ def get_task_name_from_object(task_object):
else type(task_object).__name__ else type(task_object).__name__
) )
def get_task_dict(task_name_list: List[Union[str, Dict, Task]], task_manager: TaskManager = None):
"""Creates a dictionary of task objects from either a name of task, config, or prepared Task object.
# TODO: pass num_fewshot and other cmdline overrides in a better way :param task_name_list: List[Union[str, Dict, Task]]
def get_task_dict(task_name_list: List[Union[str, Dict, Task]], **kwargs): List of task names, task config dicts, or Task objects to be loaded
config = {**kwargs} :param task_manager: TaskManager = None
A TaskManager object that stores indexed tasks. If not set,
task_manager will load one. This should be set by the user
if there are additional paths that want to be included
via `include_path`
task_name_from_registry_dict = {} :return
Dictionary of task objects
"""
task_name_from_string_dict = {}
task_name_from_config_dict = {} task_name_from_config_dict = {}
task_name_from_object_dict = {} task_name_from_object_dict = {}
if not isinstance(task_name_list, list): if isinstance(task_name_list, str):
task_name_list = [task_name_list] task_name_list = [task_name_list]
for task_element in task_name_list: string_task_name_list = [task for task in task_name_list if isinstance(task, str)]
if isinstance(task_element, str): others_task_name_list = [task for task in task_name_list if not isinstance(task, str)]
if task_element in GROUP_REGISTRY: if len(string_task_name_list) > 0:
group_name = task_element if task_manager is None:
for task_name in GROUP_REGISTRY[task_element]: task_manager = TaskManager()
if task_name not in task_name_from_registry_dict:
task_obj = get_task_dict(task_name)
if task_name in task_obj.keys():
task_dict = {
task_name: (group_name, task_obj[task_name]),
}
else:
task_dict = {
task_name: (group_name, None),
**task_obj,
}
task_name_from_registry_dict = { task_name_from_string_dict = task_manager.load_task_or_group(string_task_name_list)
**task_name_from_registry_dict,
**task_dict,
}
else:
task_name = task_element
if task_name not in task_name_from_registry_dict:
task_name_from_registry_dict = {
**task_name_from_registry_dict,
task_name: get_task(task_name=task_element, config=config),
}
elif isinstance(task_element, dict): for task_element in others_task_name_list:
task_element.update(config) if isinstance(task_element, dict):
task_name_from_config_dict = { task_name_from_config_dict = {
**task_name_from_config_dict, **task_name_from_config_dict,
get_task_name_from_config(task_element): ConfigurableTask( **task_manager.load_config(config=task_element),
config=task_element
),
} }
elif isinstance(task_element, Task): elif isinstance(task_element, Task):
...@@ -290,11 +424,11 @@ def get_task_dict(task_name_list: List[Union[str, Dict, Task]], **kwargs): ...@@ -290,11 +424,11 @@ def get_task_dict(task_name_list: List[Union[str, Dict, Task]], **kwargs):
get_task_name_from_object(task_element): task_element, get_task_name_from_object(task_element): task_element,
} }
assert set(task_name_from_registry_dict.keys()).isdisjoint( assert set(task_name_from_string_dict.keys()).isdisjoint(
set(task_name_from_object_dict.keys()) set(task_name_from_object_dict.keys())
) )
return { return {
**task_name_from_registry_dict, **task_name_from_string_dict,
**task_name_from_config_dict, **task_name_from_config_dict,
**task_name_from_object_dict, **task_name_from_object_dict,
} }
output_type: generate_until output_type: generate_until
validation_split: validation test_split: null
doc_to_choice: null
metric_list: metric_list:
- metric: exact_match - metric: exact_match
aggregation: mean aggregation: mean
......
group: flan_anli
task:
- include: yaml_templates/held_in_template_yaml
task: anli_r1
dataset_path: anli
use_prompt: prompt_templates/anli.yaml:*
validation_split: dev_r1
- include: yaml_templates/held_in_template_yaml
task: anli_r2
dataset_path: anli
use_prompt: prompt_templates/anli.yaml:*
validation_split: dev_r2
- include: yaml_templates/held_in_template_yaml
task: anli_r3
dataset_path: anli
use_prompt: prompt_templates/anli.yaml:*
validation_split: dev_r3
group: flan_arc
task:
- include: yaml_templates/held_in_template_yaml
task: arc_easy
dataset_path: ai2_arc
dataset_name: ARC-Easy
use_prompt: prompt_templates/arc.yaml:*
validation_split: validation
- include: yaml_templates/held_in_template_yaml
task: arc_challenge
dataset_path: ai2_arc
dataset_name: ARC-Challenge
use_prompt: prompt_templates/arc.yaml:*
validation_split: validation
group: flan_boolq
task:
- include: yaml_templates/held_in_template_yaml
dataset_path: super_glue
dataset_name: boolq
use_prompt: prompt_templates/boolq.yaml:*
validation_split: validation
group: flan_cot
task:
- include: yaml_templates/cot_template_yaml
dataset_path: gsm8k
use_prompt: promptsource:*
validation_split: validation
- include: yaml_templates/cot_template_yaml
dataset_path: EleutherAI/asdiv
use_prompt: promptsource:*
validation_split: validation
group: flan_held_in group: flan_held_in
group_alias: Flan (Held-In)
task: task:
- flan_boolq # ANLI R1
- flan_rte - group: anli_r1_flan
- flan_anli group_alias: ANLI R1
- flan_arc task:
- task: anli_r1
task_alias: prompt-0
include: _held_in_template_yaml
doc_to_text: "{{premise}}\n\nChoose your answer: based on the paragraph above can we conclude that \"{{hypothesis}}\"?\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nI think the answer is"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r1
task_alias: prompt-1
include: _held_in_template_yaml
doc_to_text: "{{premise}}\n\nBased on that paragraph can we conclude that this sentence is true?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r1
task_alias: prompt-2
include: _held_in_template_yaml
doc_to_text: "{{premise}}\n\nCan we draw the following conclusion?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r1
task_alias: prompt-3
include: _held_in_template_yaml
doc_to_text: "{{premise}}\nDoes this next sentence follow, given the preceding text?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r1
task_alias: prompt-4
include: _held_in_template_yaml
doc_to_text: "{{premise}}\nCan we infer the following?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nThe answer is:"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r1
task_alias: prompt-5
include: _held_in_template_yaml
doc_to_text: "Read the following paragraph and determine if the hypothesis is true:\n\n{{premise}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nHypothesis: {{hypothesis}}\n\n\n"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r1
task_alias: prompt-6
include: _held_in_template_yaml
doc_to_text: "Read the text and determine if the sentence is true (see options at the end):\n\n{{premise}}\n\nSentence: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r1
task_alias: prompt-7
include: _held_in_template_yaml
doc_to_text: "Can we draw the following hypothesis from the context (see options)? \n\nContext:\n\n{{premise}}\n\nHypothesis: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r1
task_alias: prompt-8
include: _held_in_template_yaml
doc_to_text: "Choose from options: Determine if the sentence is true based on the text below:\n{{hypothesis}}\n\n{{premise}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
# ANLI R2
- group: anli_r2_flan
group_alias: ANLI R2
task:
- task: anli_r2
task_alias: prompt-0
include: _held_in_template_yaml
doc_to_text: "{{premise}}\n\nChoose your answer: based on the paragraph above can we conclude that \"{{hypothesis}}\"?\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nI think the answer is"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r2
task_alias: prompt-1
include: _held_in_template_yaml
doc_to_text: "{{premise}}\n\nBased on that paragraph can we conclude that this sentence is true?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r2
task_alias: prompt-2
include: _held_in_template_yaml
doc_to_text: "{{premise}}\n\nCan we draw the following conclusion?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r2
task_alias: prompt-3
include: _held_in_template_yaml
doc_to_text: "{{premise}}\nDoes this next sentence follow, given the preceding text?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r2
task_alias: prompt-4
include: _held_in_template_yaml
doc_to_text: "{{premise}}\nCan we infer the following?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nThe answer is:"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r2
task_alias: prompt-5
include: _held_in_template_yaml
doc_to_text: "Read the following paragraph and determine if the hypothesis is true:\n\n{{premise}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nHypothesis: {{hypothesis}}\n\n\n"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r2
task_alias: prompt-6
include: _held_in_template_yaml
doc_to_text: "Read the text and determine if the sentence is true (see options at the end):\n\n{{premise}}\n\nSentence: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r2
task_alias: prompt-7
include: _held_in_template_yaml
doc_to_text: "Can we draw the following hypothesis from the context (see options)? \n\nContext:\n\n{{premise}}\n\nHypothesis: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r2
task_alias: prompt-8
include: _held_in_template_yaml
doc_to_text: "Choose from options: Determine if the sentence is true based on the text below:\n{{hypothesis}}\n\n{{premise}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
# ANLI R3
- group: anli_r3_flan
group_alias: ANLI R3
task:
- task: anli_r3
task_alias: prompt-0
include: _held_in_template_yaml
doc_to_text: "{{premise}}\n\nChoose your answer: based on the paragraph above can we conclude that \"{{hypothesis}}\"?\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nI think the answer is"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r3
task_alias: prompt-1
include: _held_in_template_yaml
doc_to_text: "{{premise}}\n\nBased on that paragraph can we conclude that this sentence is true?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r3
task_alias: prompt-2
include: _held_in_template_yaml
doc_to_text: "{{premise}}\n\nCan we draw the following conclusion?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r3
task_alias: prompt-3
include: _held_in_template_yaml
doc_to_text: "{{premise}}\nDoes this next sentence follow, given the preceding text?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r3
task_alias: prompt-4
include: _held_in_template_yaml
doc_to_text: "{{premise}}\nCan we infer the following?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nThe answer is:"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r3
task_alias: prompt-5
include: _held_in_template_yaml
doc_to_text: "Read the following paragraph and determine if the hypothesis is true:\n\n{{premise}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nHypothesis: {{hypothesis}}\n\n\n"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r3
task_alias: prompt-6
include: _held_in_template_yaml
doc_to_text: "Read the text and determine if the sentence is true (see options at the end):\n\n{{premise}}\n\nSentence: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r3
task_alias: prompt-7
include: _held_in_template_yaml
doc_to_text: "Can we draw the following hypothesis from the context (see options)? \n\nContext:\n\n{{premise}}\n\nHypothesis: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
- task: anli_r3
task_alias: prompt-8
include: _held_in_template_yaml
doc_to_text: "Choose from options: Determine if the sentence is true based on the text below:\n{{hypothesis}}\n\n{{premise}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
# Arc Easy
- group: arc_easy_flan
group_alias: Arc Easy
task:
- task: arc_easy
task_alias: prompt-0
include: _held_in_template_yaml
doc_to_text: "{{question}}\n\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
- task: arc_easy
task_alias: prompt-1
include: _held_in_template_yaml
doc_to_text: "Question: {{question}}\nOPTIONS:\n- {{choices.text|join('\n- ')}}\nAnswer:"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
- task: arc_easy
task_alias: prompt-2
include: _held_in_template_yaml
doc_to_text: "Question: {{question}}\n\nWhat is the correct answer to the question from the following choices?\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
- task: arc_easy
task_alias: prompt-3
include: _held_in_template_yaml
doc_to_text: "Q: {{question}}\nWhat is the correct answer to this question?\nOPTIONS:\n- {{choices.text|join('\n- ')}}...A:"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
- task: arc_easy
task_alias: prompt-4
include: _held_in_template_yaml
doc_to_text: "Choose your answer?\n\n{{question}}\n\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
- task: arc_easy
task_alias: prompt-5
include: _held_in_template_yaml
doc_to_text: "Answer the question\n\n{{question}}\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
- task: arc_easy
task_alias: prompt-6
include: _held_in_template_yaml
doc_to_text: "{{question}}\n\nPick the answer from these options\n\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
# Arc Challenge
- group: arc_challenge_flan
group_alias: Arc Challenge
task:
- task: arc_challenge
task_alias: prompt-0
include: _held_in_template_yaml
doc_to_text: "{{question}}\n\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
- task: arc_challenge
task_alias: prompt-1
include: _held_in_template_yaml
doc_to_text: "Question: {{question}}\nOPTIONS:\n- {{choices.text|join('\n- ')}}\nAnswer:"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
- task: arc_challenge
task_alias: prompt-2
include: _held_in_template_yaml
doc_to_text: "Question: {{question}}\n\nWhat is the correct answer to the question from the following choices?\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
- task: arc_challenge
task_alias: prompt-3
include: _held_in_template_yaml
doc_to_text: "Q: {{question}}\nWhat is the correct answer to this question?\nOPTIONS:\n- {{choices.text|join('\n- ')}}...A:"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
- task: arc_challenge
task_alias: prompt-4
include: _held_in_template_yaml
doc_to_text: "Choose your answer?\n\n{{question}}\n\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
- task: arc_challenge
task_alias: prompt-5
include: _held_in_template_yaml
doc_to_text: "Answer the question\n\n{{question}}\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
- task: arc_challenge
task_alias: prompt-6
include: _held_in_template_yaml
doc_to_text: "{{question}}\n\nPick the answer from these options\n\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
# BoolQ
- group: boolq_flan
group_alias: BoolQ
task:
- task: boolq
task_alias: prompt-0
include: _held_in_template_yaml
doc_to_text: "{{passage}}\n\nCan we conclude that {{question}}?\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
- task: boolq
task_alias: prompt-1
include: _held_in_template_yaml
doc_to_text: "{{passage}}\n\nIs it true that {{question}}?\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
- task: boolq
task_alias: prompt-2
include: _held_in_template_yaml
doc_to_text: "{{passage}}\n\n{{question}}?\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
- task: boolq
task_alias: prompt-3
include: _held_in_template_yaml
doc_to_text: "Text: {{passage}}\n\nQuestion: {{question}}?\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
- task: boolq
task_alias: prompt-4
include: _held_in_template_yaml
doc_to_text: "{{passage}}\n\nWhat's the best answer to this question: {{question}}?\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
- task: boolq
task_alias: prompt-5
include: _held_in_template_yaml
doc_to_text: "{{passage}}\nBased on the above text what's the best answer to this question: {{question}}?\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
- task: boolq
task_alias: prompt-6
include: _held_in_template_yaml
doc_to_text: "{{passage}}\nAnswer this question making sure that the answer is supposed by the text: {{question}}?\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
- task: boolq
task_alias: prompt-7
include: _held_in_template_yaml
doc_to_text: "{{passage}}\n\nIs the following statement correct based on the text\n\n{{question}}\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
- task: boolq
task_alias: prompt-8
include: _held_in_template_yaml
doc_to_text: "{{passage}}\n\nIs this statement correct \"{{question}}\"?\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
- task: boolq
task_alias: prompt-9
include: _held_in_template_yaml
doc_to_text: "Is it true that {{question}} based on the following text?\n\n{{passage}}\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
# RTE
- group: rte_flan
group_alias: RTE
task:
- task: rte
task_alias: prompt-0
include: _held_in_template_yaml
doc_to_text: "{{sentence1}}\n\nQuestion with options: Based on the paragraph above can we conclude that \"{{sentence2}}\"?\n\nOPTIONS:\n- yes\n- no"
doc_to_target: "{{['yes', 'no'][label]}}"
- task: rte
task_alias: prompt-1
include: _held_in_template_yaml
doc_to_text: "{{sentence1}}\n\nBased on that paragraph can we conclude that the sentence below is true?\n{{sentence2}}\n\nOPTIONS:\n- yes\n- no"
doc_to_target: "{{['yes', 'no'][label]}}"
- task: rte
task_alias: prompt-2
include: _held_in_template_yaml
doc_to_text: "{{sentence1}}\n\nQ with options: Can we draw the following conclusion?\n{{sentence2}}\n\nOPTIONS:\n- yes\n- no"
doc_to_target: "{{['yes', 'no'][label]}}"
- task: rte
task_alias: prompt-3
include: _held_in_template_yaml
doc_to_text: "{{sentence1}}\nDoes this next sentence follow, given the preceding text?\n{{sentence2}}\n\nOPTIONS:\n- yes\n- no"
doc_to_target: "{{['yes', 'no'][label]}}"
- task: rte
task_alias: prompt-4
include: _held_in_template_yaml
doc_to_text: "{{sentence1}}\nOPTIONS:\n- yes\n- no\nQuestion: Can we infer the following?\n{{sentence2}}"
doc_to_target: "{{['yes', 'no'][label]}}"
- task: rte
task_alias: prompt-5
include: _held_in_template_yaml
doc_to_text: "Read the following paragraph and determine if the hypothesis is true. Select from options at the end:\n\n{{sentence1}}\n\nHypothesis: {{sentence2}}\nOPTIONS:\n- yes\n- no\nThe answer is"
doc_to_target: "{{['yes', 'no'][label]}}"
- task: rte
task_alias: prompt-6
include: _held_in_template_yaml
doc_to_text: "Read the text and determine if the sentence is true:\n\n{{sentence1}}\n\nSentence: {{sentence2}}\nOPTIONS:\n- yes\n- no\nA:"
doc_to_target: "{{['yes', 'no'][label]}}"
- task: rte
task_alias: prompt-7
include: _held_in_template_yaml
doc_to_text: "Question with options: can we draw the following hypothesis from the context? \n\nContext:\n\n{{sentence1}}\n\nHypothesis: {{sentence2}}\nOPTIONS:\n- yes\n- no\nA:"
doc_to_target: "{{['yes', 'no'][label]}}"
- task: rte
task_alias: prompt-8
include: _held_in_template_yaml
doc_to_text: "Determine if the sentence is true based on the text below. Choose from options.\n{{sentence2}}\n\n{{sentence1}}\nOPTIONS:\n- yes\n- no"
doc_to_target: "{{['yes', 'no'][label]}}"
group: flan_held_in
task:
- include: flan/yaml_templates/held_in_template_yaml
dataset_path: super_glue
dataset_name: boolq
use_prompt: flan/prompt_templates/boolq.yaml:*
validation_split: validation
- include: flan/yaml_templates/held_in_template_yaml
dataset_path: super_glue
dataset_name: rte
use_prompt: flan/prompt_templates/rte.yaml:*
validation_split: validation
- include: flan/yaml_templates/held_in_template_yaml
task: anli_r1
dataset_path: anli
use_prompt: flan/prompt_templates/anli.yaml:*
validation_split: dev_r1
- include: flan/yaml_templates/held_in_template_yaml
task: anli_r2
dataset_path: anli
use_prompt: flan/prompt_templates/anli.yaml:*
validation_split: dev_r2
- include: flan/yaml_templates/held_in_template_yaml
task: anli_r3
dataset_path: anli
use_prompt: flan/prompt_templates/anli.yaml:*
validation_split: dev_r3
- include: flan/yaml_templates/held_in_template_yaml
task: arc_easy
dataset_path: ai2_arc
dataset_name: ARC-Easy
use_prompt: flan/prompt_templates/arc.yaml:*
validation_split: validation
- include: flan/yaml_templates/held_in_template_yaml
task: arc_challenge
dataset_path: ai2_arc
dataset_name: ARC-Challenge
use_prompt: flan/prompt_templates/arc.yaml:*
validation_split: validation
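Each entry in `flan_held_in` `include`s the shared `held_in_template_yaml` base and layers its own `dataset_path`, `dataset_name`, `use_prompt`, and split on top. A rough sketch of that merge, with a hypothetical helper name and deliberately simplified semantics (the real loader also resolves nested groups, registered task names, and prompt expansion):

import yaml

def resolve_entry(entry):
    """Illustrative only: load the 'include'd base template, then let the
    per-entry keys override it."""
    config = {}
    if "include" in entry:
        with open(entry["include"]) as f:   # path relative to the task directory
            config.update(yaml.safe_load(f) or {})
    config.update({k: v for k, v in entry.items() if k != "include"})
    return config

# e.g. the boolq entry above resolves to the shared template fields plus
# dataset_path=super_glue, dataset_name=boolq, its prompts, and its split.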
group: flan_held_out
task:
  # BBH
  - bbh_zeroshot
  - bbh_fewshot
  - bbh_cot_fewshot
  - bbh_cot_zeroshot
  # MMLU
  - mmlu
  - mmlu_flan_n_shot_generative
  ...
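`flan_held_out` shows the other half of the recursive-group feature: its task list names other groups (`mmlu`, the `bbh_*` suites) rather than inline configs, so resolution has to recurse through the group index until it reaches leaf tasks. A hedged sketch of that walk (the function and the shape of the index are assumptions, not the harness's actual loader):

def resolve(entry, group_index):
    """Illustrative only: expand a group entry into {group_name: [subtasks]}.
    An entry may itself be a registered group (e.g. "mmlu" or "bbh_zeroshot"),
    so the expansion recurses until only leaf task configs remain."""
    if isinstance(entry, str) and entry in group_index:
        subgroup = group_index[entry]
        return {entry: [resolve(t, group_index) for t in subgroup["task"]]}
    return entry  # a leaf task name or an inline task config

# resolve("flan_held_out", group_index) would yield a nested dict mirroring
# the group -> subgroup -> task hierarchy written in the YAML above.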
group: flan_rte
task:
- include: yaml_templates/held_in_template_yaml
dataset_path: super_glue
dataset_name: rte
use_prompt: prompt_templates/rte.yaml:*
validation_split: validation
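`use_prompt: prompt_templates/rte.yaml:*` points at a prompts file like the ones listed below; the `:*` suffix selects every template in it, so this single entry fans out into one task variant per template, much like the hand-written `prompt-0` … `prompt-8` tasks earlier. A sketch of that expansion under the same caveat of simplified, assumed semantics (the helper name is hypothetical):

import yaml

def expand_use_prompt(use_prompt, base_config):
    """Illustrative only: 'prompts.yaml:*' selects every template in the
    file; 'prompts.yaml:template-3' would select just one."""
    path, selector = use_prompt.rsplit(":", 1)
    with open(path) as f:
        prompts = yaml.safe_load(f)["prompts"]
    selected = prompts if selector == "*" else {selector: prompts[selector]}
    # one merged config per template, each carrying its doc_to_text / doc_to_target
    return [{**base_config, **template} for template in selected.values()]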
# Flan Prompt Templates
prompts:
"template-0":
doc_to_text: "{{premise}}\n\nChoose your answer: based on the paragraph above can we conclude that \"{{hypothesis}}\"?\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nI think the answer is"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
"template-1":
doc_to_text: "{{premise}}\n\nBased on that paragraph can we conclude that this sentence is true?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
"template-2":
doc_to_text: "{{premise}}\n\nCan we draw the following conclusion?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
"template-3":
doc_to_text: "{{premise}}\nDoes this next sentence follow, given the preceding text?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
"template-4":
doc_to_text: "{{premise}}\nCan we infer the following?\n{{hypothesis}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nThe answer is:"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
"template-5":
doc_to_text: "Read the following paragraph and determine if the hypothesis is true:\n\n{{premise}}\n\nOPTIONS:\n- Yes\n- It's impossible to say\n- No\nHypothesis: {{hypothesis}}\n\n\n"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
"template-6":
doc_to_text: "Read the text and determine if the sentence is true (see options at the end):\n\n{{premise}}\n\nSentence: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
"template-7":
doc_to_text: "Can we draw the following hypothesis from the context (see options)? \n\nContext:\n\n{{premise}}\n\nHypothesis: {{hypothesis}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
"template-8":
doc_to_text: "Choose from options: Determine if the sentence is true based on the text below:\n{{hypothesis}}\n\n{{premise}}\nOPTIONS:\n- Yes\n- It's impossible to say\n- No"
doc_to_target: "{{[\"Yes\", \"It's impossible to say\", \"No\"][label]}}"
# Flan Prompt Templates
prompts:
"template-0":
doc_to_text: "{{question}}\n\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
"template-1":
doc_to_text: "Question: {{question}}\nOPTIONS:\n- {{choices.text|join('\n- ')}}\nAnswer:"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
"template-2":
doc_to_text: "Question: {{question}}\n\nWhat is the correct answer to the question from the following choices?\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
"template-3":
doc_to_text: "Q: {{question}}\nWhat is the correct answer to this question?\nOPTIONS:\n- {{choices.text|join('\n- ')}}...A:"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
"template-4":
doc_to_text: "Choose your answer?\n\n{{question}}\n\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
"template-5":
doc_to_text: "Answer the question\n\n{{question}}\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
"template-6":
doc_to_text: "{{question}}\n\nPick the answer from these options\n\nOPTIONS:\n- {{choices.text|join('\n- ')}}"
doc_to_target: "{{choices.text[choices.label.index(answerKey)]}}"
# Flan Prompt Templates
prompts:
"template-0":
doc_to_text: "{{passage}}\n\nCan we conclude that {{question}}?\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
"template-1":
doc_to_text: "{{passage}}\n\nIs it true that {{question}}?\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
"template-2":
doc_to_text: "{{passage}}\n\n{{question}}?\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
"template-3":
doc_to_text: "Text: {{passage}}\n\nQuestion: {{question}}?\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
"template-4":
doc_to_text: "{{passage}}\n\nWhat's the best answer to this question: {{question}}?\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
"template-5":
doc_to_text: "{{passage}}\nBased on the above text what's the best answer to this question: {{question}}?\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
"template-6":
doc_to_text: "{{passage}}\nAnswer this question making sure that the answer is supposed by the text: {{question}}?\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
"template-7":
doc_to_text: "{{passage}}\n\nIs the following statement correct based on the text\n\n{{question}}\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
"template-8":
# doc_to_text: "{{title}}\n\n{{passage}}\n\nIs this statement correct \"{{question}}\"?\n\nOPTIONS:\n- no\n- yes"
doc_to_text: "{{passage}}\n\nIs this statement correct \"{{question}}\"?\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
"template-9":
doc_to_text: "Is it true that {{question}} based on the following text?\n\n{{passage}}\n\nOPTIONS:\n- no\n- yes"
doc_to_target: "{{['no', 'yes'][label]}}"
# Flan Prompt Templates
prompts:
"template-0":
doc_to_text: "{{premise}}\n\nQuestion with options: Based on the paragraph above can we conclude that \"{{hypothesis}}\"?\n\nOPTIONS:\n- yes\n- no"
doc_to_target: "{{['yes', 'no'][label]}}"
"template-1":
doc_to_text: "{{premise}}\n\nBased on that paragraph can we conclude that the sentence below is true?\n{{hypothesis}}\n\nOPTIONS:\n- yes\n- no"
doc_to_target: "{{['yes', 'no'][label]}}"
"template-2":
doc_to_text: "{{premise}}\n\nQ with options: Can we draw the following conclusion?\n{{hypothesis}}\n\nOPTIONS:\n- yes\n- no"
doc_to_target: "{{['yes', 'no'][label]}}"
"template-3":
doc_to_text: "{{premise}}\nDoes this next sentence follow, given the preceding text?\n{{hypothesis}}\n\nOPTIONS:\n- yes\n- no"
doc_to_target: "{{['yes', 'no'][label]}}"
"template-4":
doc_to_text: "{{premise}}\nOPTIONS:\n- yes\n- no\nQuestion: Can we infer the following?\n{{hypothesis}}"
doc_to_target: "{{['yes', 'no'][label]}}"
"template-5":
doc_to_text: "Read the following paragraph and determine if the hypothesis is true. Select from options at the end:\n\n{{premise}}\n\nHypothesis: {{hypothesis}}\nOPTIONS:\n- yes\n- no\nThe answer is"
doc_to_target: "{{['yes', 'no'][label]}}"
"template-6":
doc_to_text: "Read the text and determine if the sentence is true:\n\n{{premise}}\n\nSentence: {{hypothesis}}\nOPTIONS:\n- yes\n- no\nA:"
doc_to_target: "{{['yes', 'no'][label]}}"
"template-7":
doc_to_text: "Question with options: can we draw the following hypothesis from the context? \n\nContext:\n\n{{premise}}\n\nHypothesis: {{hypothesis}}\nOPTIONS:\n- yes\n- no\nA:"
doc_to_target: "{{['yes', 'no'][label]}}"
"template-8":
doc_to_text: "Determine if the sentence is true based on the text below. Choose from options.\n{{hypothesis}}\n\n{{premise}}\nOPTIONS:\n- yes\n- no"
doc_to_target: "{{['yes', 'no'][label]}}"
group: flan-cot
output_type: generate_until
validation_split: validation
doc_to_target: "{{answer}}"
metric_list:
- metric: exact_match
aggregation: mean
higher_is_better: true
generation_kwargs:
until:
- "\n\n"
do_sample: false
temperature: 0.0
filter_list:
- name: "get-answer"
filter:
- function: "regex"
regex_pattern: "The answer is (\\-?[0-9\\.\\,]+)"
- function: "take_first"
metadata:
version: 1.0
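The `get-answer` filter chain post-processes each generation: the `regex` step captures the first `The answer is <number>` span and `take_first` keeps a single candidate per document. A standalone check of the same pattern with Python's `re` (only this reproduction is mine; the filter wiring is the harness's):

import re

# Same regex as regex_pattern above, with the YAML double-escaping removed.
pattern = re.compile(r"The answer is (-?[0-9\.\,]+)")

generation = "Twelve plus thirty is forty-two. The answer is 42"
matches = pattern.findall(generation)
answer = matches[0] if matches else None  # "take_first": keep the first match
print(answer)  # -> 42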