"vscode:/vscode.git/clone" did not exist on "51fc02ae735c6297906814090340034bb1d574da"
Commit 400c0199 authored by lintangsutawika

minor fixes to satisfy pre-commit

parent 2ee7121b
@@ -19,7 +19,7 @@ Tasks are configured via the `TaskConfig` object. Below, we describe all fields
 - **reference** (`str`, *optional*) —
 - **dataset_path** (`str`) — The name of the dataset as listed by HF in the datasets Hub.
 - **dataset_name** (`str`, *optional*, defaults to None) — The name of, what HF calls, a “data instance” or sub-task of the benchmark. If your task does not contain any data instances, just leave this to default to None. (If you're familiar with the HF `datasets.load_dataset` function, these are just the first 2 arguments to it.)
-- **dataset_kwargs** (`dict`, *optional*) — Auxillary arguments that `datasets.load_dataset` accepts. This can be used to specify arguments such as `data_files` or `data_dir` if you want to use local datafiles such as json or csv.
+- **dataset_kwargs** (`dict`, *optional*) — Auxiliary arguments that `datasets.load_dataset` accepts. This can be used to specify arguments such as `data_files` or `data_dir` if you want to use local datafiles such as json or csv.
 - **training_split** (`str`, *optional*) — Split in the dataset to use as the training split.
 - **validation_split** (`str`, *optional*) — Split in the dataset to use as the validation split.
 - **test_split** (`str`, *optional*) — Split in the dataset to use as the test split.
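Taken together, the dataset fields above map directly onto a `datasets.load_dataset(dataset_path, dataset_name, **dataset_kwargs)` call. As a rough sketch of how they might appear in a task YAML, assuming a hypothetical task name and local JSON files (none of these values come from this commit):

```
task: my_local_task              # hypothetical task name
dataset_path: json               # a Hub dataset id, or a local builder such as json/csv
dataset_name: null               # no sub-task / data instance needed here
dataset_kwargs:
  data_files:
    train: data/train.json       # hypothetical local files
    validation: data/valid.json
training_split: train
validation_split: validation
test_split: null
```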
@@ -169,7 +169,7 @@ You can find an example of how to use this feature at [gsm8k-cot-self-consistenc
 ## Passing Arguments to Metrics
-Metrics can be defined in the `metric_list` argument when building the YAML config. Multiple metrics can be listed along with any auxillary arguments. For example, setting the [`exact_match` metric](https://github.com/huggingface/evaluate/tree/main/metrics/exact_match), auxiliary arguments such as `ignore_case`, `ignore_punctuation`, `regexes_to_ignore` can be listed as well. They will be added to the metric function as `kwargs`. Some metrics have predefined values for `aggregation` and `higher_is_better` so listing the metric name only can be sufficient.
+Metrics can be defined in the `metric_list` argument when building the YAML config. Multiple metrics can be listed along with any auxiliary arguments. For example, setting the [`exact_match` metric](https://github.com/huggingface/evaluate/tree/main/metrics/exact_match), auxiliary arguments such as `ignore_case`, `ignore_punctuation`, `regexes_to_ignore` can be listed as well. They will be added to the metric function as `kwargs`. Some metrics have predefined values for `aggregation` and `higher_is_better` so listing the metric name only can be sufficient.
 ```
 metric_list:
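  # Illustrative sketch only (not part of this commit's hunk): a metric entry
  # with auxiliary kwargs as described in the prose above; the values are hypothetical.
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: false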
@@ -225,4 +225,3 @@ Generative tasks:
 Tasks using complex filtering:
 - GSM8k with CoT (+ with Self-Consistency): (`lm_eval/tasks/gsm8k/gsm8k-cot.yaml` ; `lm_eval/tasks/gsm8k/gsm8k-cot-self-consistency.yaml`)
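The GSM8k CoT configs referenced above are the in-tree examples of complex filtering. As a hedged sketch of the general shape such a filter configuration can take (the `filter_list` key, pipeline name, `regex` function, and pattern here are assumptions for illustration, not copied from those files; only `take_first` appears elsewhere in this commit):

```
filter_list:
  - name: strict-match             # hypothetical pipeline name
    filter:
      - function: regex            # assumed: extract the final numeric answer
        regex_pattern: "#### (\\-?[0-9\\.\\,]+)"
      - function: take_first       # keep the first extracted answer per document
```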
@@ -250,4 +250,3 @@ It is recommended to include a filled-out copy of this checklist in the README.m
 ## Submitting your task
 You're all set! Now push your work and make a pull request to the `big-refactor` branch! Thanks for the contribution :). If there are any questions, please leave a message in the `#lm-thunderdome` channel on the EAI discord!
@@ -29,7 +29,9 @@ def get_model(model_name):
     try:
         return MODEL_REGISTRY[model_name]
     except KeyError:
-        raise ValueError(f"Attempted to load model '{model_name}', but no model for this name found! Supported model names: {', '.join(MODEL_REGISTRY.keys())}")
+        raise ValueError(
+            f"Attempted to load model '{model_name}', but no model for this name found! Supported model names: {', '.join(MODEL_REGISTRY.keys())}"
+        )


 TASK_REGISTRY = {}
@@ -75,10 +77,7 @@ DEFAULT_METRIC_REGISTRY = {
         "acc",
     ],
     "loglikelihood_rolling": ["word_perplexity", "byte_perplexity", "bits_per_byte"],
-    "multiple_choice": [
-        "acc",
-        "acc_norm"
-    ],
+    "multiple_choice": ["acc", "acc_norm"],
     "greedy_until": ["exact_match"],
 }
@@ -136,7 +135,6 @@ searching in HF Evaluate library..."
 def register_aggregation(name):
     def decorate(fn):
         assert (
             name not in AGGREGATION_REGISTRY
...
@@ -98,7 +98,9 @@ class TaskConfig(dict):
             self.gold_alias = self.template_aliases + self.doc_to_target

         if self.generation_kwargs or self.output_type == "greedy_until":
-            assert self.output_type == "greedy_until", "passed `generation_kwargs`, but not using a generation request type!"
+            assert (
+                self.output_type == "greedy_until"
+            ), "passed `generation_kwargs`, but not using a generation request type!"
             # ensure that we greedily generate in absence of explicit arguments otherwise
             self.generation_kwargs = {"do_sample": False, "temperature": 0.0}
@@ -546,7 +548,7 @@ class ConfigurableTask(Task):
             }
             try:
                 self._metric_fn_list[metric_name] = METRIC_REGISTRY[metric_name]
-            except:
+            except Exception:
                 eval_logger.warning(
                     f"Metric {metric_name} not found, "
                     "Searching from https://huggingface.co/evaluate-metric"
@@ -606,9 +608,7 @@ class ConfigurableTask(Task):
                 filter_pipeline = build_filter_ensemble(filter_name, components)
                 self._filters.append(filter_pipeline)
         else:
-            self._filters = [
-                build_filter_ensemble("none", [["take_first", None]])
-            ]
+            self._filters = [build_filter_ensemble("none", [["take_first", None]])]

         if self._config.use_prompt is not None:
             eval_logger.info(f"loading prompt {self._config.use_prompt}")
...
@@ -150,7 +150,9 @@ def evaluate(
     # get lists of each type of request
     for task_name, task in task_dict.items():
         versions[task_name] = task.VERSION
-        configs[task_name] = dict(task.dump_config())  # TODO: don't access a private attribute here ; for non-YAML tasks handle this case
+        configs[task_name] = dict(
+            task.dump_config()
+        )  # TODO: don't access a private attribute here ; for non-YAML tasks handle this case

         # deterministically shuffle docs and chop off the first `limit` because sometimes docs are in some kind of order
         # task_docs = list(task_doc_func())
@@ -290,7 +292,11 @@ def evaluate(
                 if stderr is not None:
                     results[task_name][metric + "_stderr" + "," + key] = stderr(items)

-        return {"results": dict(results), "configs": dict(configs), "versions": dict(versions)}
+        return {
+            "results": dict(results),
+            "configs": dict(configs),
+            "versions": dict(versions),
+        }
     else:
         return None
@@ -63,7 +63,7 @@ def get_task(task_name, config):
         return TASK_REGISTRY[task_name](config=config)
     except KeyError:
         eval_logger.info("Available tasks:")
-        eval_logger.info(ALL_TASKS)
+        eval_logger.info(list(TASK_REGISTRY) + list(GROUP_REGISTRY))
         raise KeyError(f"Missing task {task_name}")
...
 include: pile_arxiv.yaml
 task: pile_pubmed-abstracts
 dataset_name: pile_pubmed-abstracts

 include: pile_arxiv.yaml
 task: pile_pubmed-central
 dataset_name: pile_pubmed-central

 include: pile_arxiv.yaml
 task: pile_stackexchange
 dataset_name: pile_stackexchange

 include: pile_arxiv.yaml
 task: pile_ubuntu-irc
 dataset_name: pile_ubuntu-irc

 include: pile_arxiv.yaml
 task: pile_uspto
 dataset_name: pile_uspto

 include: pile_arxiv.yaml
 task: pile_wikipedia
 dataset_name: pile_wikipedia