Commit 61520ad6 authored by Baber's avatar Baber
Browse files

add subcommands

parent f9d5d3e7
......@@ -8,71 +8,160 @@ A majority of users run the library by cloning it from Github, installing the pa
Equivalently, running the library can be done via the `lm-eval` entrypoint at the command line.
This mode supports a number of command-line arguments, the details of which can also be seen via running with `-h` or `--help`:
### Subcommand Structure
- `--model` : Selects which model type or provider is evaluated. Must be a string corresponding to the name of the model type/provider being used. See [the main README](https://github.com/EleutherAI/lm-evaluation-harness/tree/main#model-apis-and-inference-servers) for a full list of enabled model names and supported libraries or APIs.
The CLI now uses a subcommand structure for better organization:
- `--model_args` : Controls parameters passed to the model constructor. Accepts a string containing comma-separated keyword arguments to the model class of the format `"arg1=val1,arg2=val2,..."`, such as, for example `--model_args pretrained=EleutherAI/pythia-160m,dtype=float32`. For a full list of what keyword arguments, see the initialization of the `lm_eval.api.model.LM` subclass, e.g. [`HFLM`](https://github.com/EleutherAI/lm-evaluation-harness/blob/365fcda9b85bbb6e0572d91976b8daf409164500/lm_eval/models/huggingface.py#L66)
- `lm-eval run` - Execute evaluations (default behavior)
- `lm-eval list` - List available tasks, models, etc.
- `lm-eval validate` - Validate task configurations
- `--tasks` : Determines which tasks or task groups are evaluated. Accepts a comma-separated list of task names or task group names. Must be solely comprised of valid tasks/groups. A list of supported tasks can be viewed with `--tasks list`.
For backward compatibility, if no subcommand is specified, `run` is automatically inserted. So `lm-eval --model hf --tasks hellaswag` is equivalent to `lm-eval run --model hf --tasks hellaswag`.
- `--num_fewshot` : Sets the number of few-shot examples to place in context. Must be an integer.
### Run Command Arguments
- `--gen_kwargs` : takes an arg string in same format as `--model_args` and creates a dictionary of keyword arguments. These will be passed to the models for all called `generate_until` (free-form or greedy generation task) tasks, to set options such as the sampling temperature or `top_p` / `top_k`. For a list of what args are supported for each model type, reference the respective library's documentation (for example, the documentation for `transformers.AutoModelForCausalLM.generate()`.) These kwargs will be applied to all `generate_until` tasks called--we do not currently support unique gen_kwargs or batch_size values per task in a single run of the library. To control these on a per-task level, set them in that task's YAML file.
The `run` command supports a number of command-line arguments. Details can also be seen via running with `-h` or `--help`:
- `--batch_size` : Sets the batch size used for evaluation. Can be a positive integer or `"auto"` to automatically select the largest batch size that will fit in memory, speeding up evaluation. One can pass `--batch_size auto:N` to re-select the maximum batch size `N` times during evaluation. This can help accelerate evaluation further, since `lm-eval` sorts documents in descending order of context length.
#### Configuration
- `--max_batch_size` : Sets the maximum batch size to try to fit in memory, if `--batch_size auto` is passed.
- `--config` **[path: str]** : Set initial arguments from a YAML configuration file. Takes a path to a YAML file that contains argument values. This allows you to specify complex configurations in a file rather than on the command line. Further CLI arguments can override values from the configuration file.
- `--device` : Sets which device to place the model onto. Must be a string, for example, `"cuda", "cuda:0", "cpu", "mps"`. Defaults to "cuda", and can be ignored if running multi-GPU or running a non-local model type.
For the complete list of available configuration fields and their types, see [`EvaluatorConfig` in the source code](../lm_eval/config/evaluate_config.py).
- `--output_path` : A string of the form `dir/file.jsonl` or `dir/`. Provides a path where high-level results will be saved, either into the file named or into the directory named. If `--log_samples` is passed as well, then per-document outputs and metrics will be saved into the directory as well.
#### Model and Tasks
- `--log_samples` : If this flag is passed, then the model's outputs, and the text fed into the model, will be saved at per-document granularity. Must be used with `--output_path`.
- `--model` **[str, default: "hf"]** : Selects which model type or provider is evaluated. Must be a string corresponding to the name of the model type/provider being used. See [the main README](https://github.com/EleutherAI/lm-evaluation-harness/tree/main#model-apis-and-inference-servers) for a full list of enabled model names and supported libraries or APIs.
- `--limit` : Accepts an integer, or a float between 0.0 and 1.0 . If passed, will limit the number of documents to evaluate to the first X documents (if an integer) per task or first X% of documents per task. Useful for debugging, especially on costly API models.
- `--model_args` **[comma-sep str | json str → dict]** : Controls parameters passed to the model constructor. Can be provided as:
- Comma-separated string: `pretrained=EleutherAI/pythia-160m,dtype=float32`
- JSON string: `'{"pretrained": "EleutherAI/pythia-160m", "dtype": "float32"}'`
- `--use_cache` : Should be a path where a sqlite db file can be written to. Takes a string of format `/path/to/sqlite_cache_` in order to create a cache db at `/path/to/sqlite_cache_rank{i}.db` for each process (0-NUM_GPUS). This allows results of prior runs to be cached, so that there is no need to re-run results in order to re-score or re-run a given (model, task) pair again.
For a full list of supported arguments, see the initialization of the `lm_eval.api.model.LM` subclass, e.g. [`HFLM`](https://github.com/EleutherAI/lm-evaluation-harness/blob/365fcda9b85bbb6e0572d91976b8daf409164500/lm_eval/models/huggingface.py#L66)
- `--cache_requests` : Can be "true", "refresh", or "delete". "true" means that the cache should be used. "refresh" means that you wish to regenerate the cache, which you should run if you change your dataset configuration for a given task. "delete" will delete the cache. Cached files are stored under lm_eval/cache/.cache unless you specify a different path via the environment variable: `LM_HARNESS_CACHE_PATH`. e.g. `LM_HARNESS_CACHE_PATH=~/Documents/cache_for_lm_harness`.
- `--tasks` **[comma-sep str → list[str]]** : Determines which tasks or task groups are evaluated. Accepts a comma-separated list of task names or task group names. Must be solely comprised of valid tasks/groups. A list of supported tasks can be viewed with `lm-eval list tasks`.
- `--check_integrity` : If this flag is used, the library tests for each task selected are run to confirm task integrity.
#### Evaluation Settings
- `--write_out` : Used for diagnostic purposes to observe the format of task documents passed to a model. If this flag is used, then prints the prompt and gold target string for the first document of each task.
- `--num_fewshot` **[int]** : Sets the number of few-shot examples to place in context. Must be an integer.
- `--show_config` : If used, prints the full `lm_eval.api.task.TaskConfig` contents (non-default settings the task YAML file) for each task which was run, at the completion of an evaluation. Useful for when one is modifying a task's configuration YAML locally to transmit the exact configurations used for debugging or for reproducibility purposes.
- `--batch_size` **[int | "auto" | "auto:N", default: 1]** : Sets the batch size used for evaluation. Options:
- Integer: Fixed batch size (e.g., `8`)
- `"auto"`: Automatically select the largest batch size that fits in memory
- `"auto:N"`: Re-select maximum batch size N times during evaluation
- `--include_path` : Accepts a path to a folder. If passed, then all YAML files containing `lm-eval` compatible task configurations will be added to the task registry as available tasks. Used for when one is writing config files for their own task in a folder other than `lm_eval/tasks/`.
Auto mode is useful since `lm-eval` sorts documents in descending order of context length.
- `--system_instruction`: Specifies a system instruction string to prepend to the prompt.
- `--max_batch_size` **[int]** : Sets the maximum batch size to try when using `--batch_size auto`.
- `--apply_chat_template` : This flag specifies whether to apply a chat template to the prompt. It can be used in the following ways:
- `--apply_chat_template` : When used without an argument, applies the only available chat template to the prompt. For Hugging Face models, if no dedicated chat template exists, the default chat template will be applied.
- `--apply_chat_template template_name` : If the model has multiple chat templates, apply the specified template to the prompt.
- `--device` **[str]** : Sets which device to place the model onto. Examples: `"cuda"`, `"cuda:0"`, `"cpu"`, `"mps"`. Can be ignored if running multi-GPU or non-local model types.
For Hugging Face models, the default chat template can be found in the [`default_chat_template`](https://github.com/huggingface/transformers/blob/fc35907f95459d7a6c5281dfadd680b6f7b620e3/src/transformers/tokenization_utils_base.py#L1912) property of the Transformers Tokenizer.
- `--gen_kwargs` **[comma-sep str | json str → dict]** : Generation arguments for `generate_until` tasks. Same format as `--model_args`:
- Comma-separated: `temperature=0.8,top_p=0.95`
- JSON: `'{"temperature": 0.8, "top_p": 0.95}'`
- `--fewshot_as_multiturn` : If this flag is on, the Fewshot examples are treated as a multi-turn conversation. Questions are provided as user content and answers are provided as assistant responses. Requires `--num_fewshot` to be set to be greater than 0, and `--apply_chat_template` to be on.
See model documentation (e.g., `transformers.AutoModelForCausalLM.generate()`) for supported arguments. Applied to all generation tasks - use task YAML files for per-task control.
- `--predict_only`: Generates the model outputs without computing metrics. Use with `--log_samples` to retrieve decoded results.
#### Data and Output
- `--seed`: Set seed for python's random, numpy and torch. Accepts a comma-separated list of 3 values for python's random, numpy, and torch seeds, respectively, or a single integer to set the same seed for all three. The values are either an integer or 'None' to not set the seed. Default is `0,1234,1234` (for backward compatibility). E.g. `--seed 0,None,8` sets `random.seed(0)` and `torch.manual_seed(8)`. Here numpy's seed is not set since the second value is `None`. E.g, `--seed 42` sets all three seeds to 42.
- `--output_path` **[path: str]** : Output location for results. Format options:
- Directory: `results/` - saves as `results/<model_name>_<timestamp>.json`
- File: `results/output.jsonl` - saves to specific file
- `--wandb_args`: Tracks logging to Weights and Biases for evaluation runs and includes args passed to `wandb.init`, such as `project` and `job_type`. Full list [here](https://docs.wandb.ai/ref/python/init). e.g., ```--wandb_args project=test-project,name=test-run```. Also allows for the passing of the step to log things at (passed to `wandb.run.log`), e.g., `--wandb_args step=123`.
When used with `--log_samples`, per-document outputs are saved in the directory.
- `--hf_hub_log_args` : Logs evaluation results to Hugging Face Hub. Accepts a string with the arguments separated by commas. Available arguments:
- `hub_results_org` - organization name on Hugging Face Hub, e.g., `EleutherAI`. If not provided, the results will be pushed to the owner of the Hugging Face token,
- `hub_repo_name` - repository name on Hugging Face Hub (deprecated, `details_repo_name` and `results_repo_name` should be used instead), e.g., `lm-eval-results`,
- `details_repo_name` - repository name on Hugging Face Hub to store details, e.g., `lm-eval-results`,
- `results_repo_name` - repository name on Hugging Face Hub to store results, e.g., `lm-eval-results`,
- `push_results_to_hub` - whether to push results to Hugging Face Hub, can be `True` or `False`,
- `push_samples_to_hub` - whether to push samples results to Hugging Face Hub, can be `True` or `False`. Requires `--log_samples` to be set,
- `public_repo` - whether the repository is public, can be `True` or `False`,
- `leaderboard_url` - URL to the leaderboard, e.g., `https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard`.
- `point_of_contact` - Point of contact for the results dataset, e.g., `yourname@example.com`.
- `gated` - whether to gate the details dataset, can be `True` or `False`.
- `--log_samples` **[flag, default: False]** : Save model outputs and inputs at per-document granularity. Requires `--output_path`. Automatically enabled when using `--predict_only`.
- `--metadata`: JSON string to pass to TaskConfig. Used for some tasks which require additional metadata to be passed for processing. E.g., `--metadata '{"key": "value"}'`.
- `--limit` **[int | float]** : Limit evaluation examples per task. **WARNING: Only for testing!**
- Integer: First N documents (e.g., `100`)
- Float (0.0-1.0): Percentage of documents (e.g., `0.1` for 10%)
- `--samples` **[path | json str | dict → dict]** : Evaluate specific sample indices only. Input formats:
- JSON file path: `samples.json`
- JSON string: `'{"hellaswag": [0, 1, 2], "arc_easy": [10, 20]}'`
- Dictionary (programmatic use)
Format: `{"task_name": [indices], ...}`. Incompatible with `--limit`.
#### Caching and Performance
- `--use_cache` **[path: str]** : SQLite cache database path prefix. Creates per-process cache files:
- Single GPU: `/path/to/cache.db`
- Multi-GPU: `/path/to/cache_rank0.db`, `/path/to/cache_rank1.db`, etc.
Caches model outputs to avoid re-running the same (model, task) evaluations.
- `--cache_requests` **["true" | "refresh" | "delete"]** : Dataset request caching control:
- `"true"`: Use existing cache
- `"refresh"`: Regenerate cache (use after changing task configs)
- `"delete"`: Delete cache
Cache location: `lm_eval/cache/.cache` or `$LM_HARNESS_CACHE_PATH` if set.
- `--check_integrity` **[flag, default: False]** : Run task integrity tests to validate configurations.
#### Instruct Formatting
- `--system_instruction` **[str]** : Custom system instruction to prepend to prompts. Used with instruction-following models.
- `--apply_chat_template` **[bool | str, default: False]** : Apply chat template formatting. Usage:
- No argument: Apply default/only available template
- Template name: Apply specific template (e.g., `"chatml"`)
For HuggingFace models, uses the tokenizer's chat template. Default template defined in [`transformers` documentation](https://github.com/huggingface/transformers/blob/fc35907f95459d7a6c5281dfadd680b6f7b620e3/src/transformers/tokenization_utils_base.py#L1912).
- `--fewshot_as_multiturn` **[flag, default: False]** : Format few-shot examples as multi-turn conversation:
- Questions → User messages
- Answers → Assistant responses
Requires: `--num_fewshot > 0` and `--apply_chat_template` enabled.
#### Task Management
- `--include_path` **[path: str]** : Directory containing custom task YAML files. All `.yaml` files in this directory will be registered as available tasks. Use for custom tasks outside of `lm_eval/tasks/`.
#### Logging and Tracking
- `--verbosity` **[str]** : **DEPRECATED** - Use `LOGLEVEL` environment variable instead.
- `--write_out` **[flag, default: False]** : Print first document's prompt and target for each task. Useful for debugging prompt formatting.
- `--show_config` **[flag, default: False]** : Display full task configurations after evaluation. Shows all non-default settings from task YAML files.
- `--wandb_args` **[comma-sep str → dict]** : Weights & Biases integration. Arguments for `wandb.init()`:
- Example: `project=my-project,name=run-1,tags=test`
- Special: `step=123` sets logging step
- See [W&B docs](https://docs.wandb.ai/ref/python/init) for all options
- `--wandb_config_args` **[comma-sep str → dict]** : Additional W&B config arguments, same format as `--wandb_args`.
- `--hf_hub_log_args` **[comma-sep str → dict]** : Hugging Face Hub logging configuration. Format: `key1=value1,key2=value2`. Options:
- `hub_results_org`: Organization name (default: token owner)
- `details_repo_name`: Repository for detailed results
- `results_repo_name`: Repository for aggregated results
- `push_results_to_hub`: Enable pushing (`True`/`False`)
- `push_samples_to_hub`: Push samples (`True`/`False`, requires `--log_samples`)
- `public_repo`: Make repo public (`True`/`False`)
- `leaderboard_url`: Associated leaderboard URL
- `point_of_contact`: Contact email
- `gated`: Gate the dataset (`True`/`False`)
- ~~`hub_repo_name`~~: Deprecated, use `details_repo_name` and `results_repo_name`
#### Advanced Options
- `--predict_only` **[flag, default: False]** : Generate outputs without computing metrics. Automatically enables `--log_samples`. Use to get raw model outputs.
- `--seed` **[int | comma-sep str → list[int], default: [0,1234,1234,1234]]** : Set random seeds for reproducibility:
- Single integer: Same seed for all (e.g., `42`)
- Four values: `python,numpy,torch,fewshot` seeds (e.g., `0,1234,8,52`)
- Use `None` to skip setting a seed (e.g., `0,None,8,52`)
Default preserves backward compatibility.
- `--trust_remote_code` **[flag, default: False]** : Allow executing remote code from Hugging Face Hub. **Security Risk**: Required for some models with custom code.
- `--confirm_run_unsafe_code` **[flag, default: False]** : Acknowledge risks when running tasks that execute arbitrary Python code (e.g., code generation tasks).
- `--metadata` **[json str → dict]** : Additional metadata for specific tasks. Format: `'{"key": "value"}'`. Required by tasks like RULER that need extra configuration.
## External Library Usage
......
from typing import Union
import argparse
from lm_eval._cli.eval import Eval
from lm_eval._cli import CLIParser
def cli_evaluate(args: Union[argparse.Namespace, None] = None) -> None:
def cli_evaluate() -> None:
"""Main CLI entry point with subcommand and legacy support."""
parser = CLIParser()
if args is None:
# Parse from command line
parser.execute()
else:
# External call with pre-parsed args - use legacy mode
parser._handle_legacy_mode(args)
parser = Eval()
args = parser.parse_args()
parser.execute(args)
if __name__ == "__main__":
cli_evaluate()
\ No newline at end of file
cli_evaluate()
"""
CLI subcommands for the Language Model Evaluation Harness.
CLI subcommands to run from terminal.
"""
from lm_eval._cli.base import SubCommand
from lm_eval._cli.cache import CacheCommand
from lm_eval._cli.evaluate import EvaluateCommand
from lm_eval._cli.list import ListCommand
from lm_eval._cli.parser import CLIParser
from lm_eval._cli.validate import ValidateCommand
__all__ = [
"SubCommand",
"EvaluateCommand",
"ListCommand",
"ValidateCommand",
"CacheCommand",
"CLIParser",
]
\ No newline at end of file
import argparse
from lm_eval._cli.base import SubCommand
class CacheCommand(SubCommand):
"""Command for cache management."""
def __init__(self, subparsers: argparse._SubParsersAction, *args, **kwargs):
# Create and configure the parser
super().__init__(*args, **kwargs)
parser = subparsers.add_parser(
"cache",
help="Manage evaluation cache",
description="Manage evaluation cache files and directories.",
epilog="""
Examples:
lm-eval cache clear --cache_path ./cache.db # Clear cache file
lm-eval cache info --cache_path ./cache.db # Show cache info
lm-eval cache clear --cache_path ./cache_dir/ # Clear cache directory
""",
formatter_class=argparse.RawDescriptionHelpFormatter,
)
# Add command-specific arguments
self._add_args(parser)
# Set the function to execute for this subcommand
parser.set_defaults(func=self.execute)
def _add_args(self, parser: argparse.ArgumentParser) -> None:
parser.add_argument(
"action",
choices=["clear", "info"],
help="Action to perform: clear or info",
)
parser.add_argument(
"--cache_path",
type=str,
default=None,
help="Path to cache directory or file",
)
def execute(self, args: argparse.Namespace) -> None:
"""Execute the cache command."""
import os
if args.action == "clear":
if args.cache_path:
if os.path.exists(args.cache_path):
if os.path.isdir(args.cache_path):
import shutil
shutil.rmtree(args.cache_path)
else:
os.remove(args.cache_path)
print(f"✅ Cache cleared: {args.cache_path}")
else:
print(f"❌ Cache path not found: {args.cache_path}")
else:
print("❌ Please specify --cache_path")
elif args.action == "info":
if args.cache_path and os.path.exists(args.cache_path):
import os
size = os.path.getsize(args.cache_path)
print(f"Cache: {args.cache_path}")
print(f"Size: {size} bytes")
else:
print("❌ Cache path not found or not specified")
import argparse
import sys
import textwrap
from lm_eval._cli.listall import ListAll
from lm_eval._cli.run import Run
from lm_eval._cli.validate import Validate
class Eval:
"""Main CLI parser that manages all subcommands."""
def __init__(self):
self._parser = argparse.ArgumentParser(
prog="lm-eval",
description="Language Model Evaluation Harness",
epilog=textwrap.dedent("""
quick start:
# Basic evaluation
lm-eval run --model hf --model_args pretrained=gpt2 --tasks hellaswag
# List available tasks
lm-eval list tasks
# Validate task configurations
lm-eval validate --tasks hellaswag,arc_easy
legacy compatibility:
The harness maintains backward compatibility with the original interface.
If no command is specified, 'run' is automatically inserted:
lm-eval --model hf --tasks hellaswag # Equivalent to 'lm-eval run --model hf --tasks hellaswag'
For documentation, visit: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/interface.md
"""),
formatter_class=argparse.RawDescriptionHelpFormatter,
)
self._parser.set_defaults(func=lambda args: self._parser.print_help())
self._subparsers = self._parser.add_subparsers(
dest="command", help="Available commands", metavar="COMMAND"
)
Run.create(self._subparsers)
ListAll.create(self._subparsers)
Validate.create(self._subparsers)
def parse_args(self) -> argparse.Namespace:
"""Parse arguments using the main parser."""
if len(sys.argv) > 2 and sys.argv[1] not in self._subparsers.choices:
# Backward compatibility: arguments provided but no valid subcommand - insert 'run'
sys.argv.insert(1, "run")
elif len(sys.argv) == 2 and "run" in sys.argv:
# if only 'run' is specified, ensure it is treated as a subcommand
self._subparsers.choices["run"].print_help()
sys.exit(0)
return self._parser.parse_args()
def execute(self, args: argparse.Namespace) -> None:
"""Main execution method that handles subcommands and legacy support."""
args.func(args)
import argparse
import textwrap
from lm_eval._cli.base import SubCommand
from lm_eval._cli.subcommand import SubCommand
class ListCommand(SubCommand):
class ListAll(SubCommand):
"""Command for listing available tasks."""
def __init__(self, subparsers: argparse._SubParsersAction, *args, **kwargs):
# Create and configure the parser
super().__init__(*args, **kwargs)
parser = subparsers.add_parser(
self._parser = subparsers.add_parser(
"list",
help="List available tasks, groups, subtasks, or tags",
description="List available tasks, groups, subtasks, or tags from the evaluation harness.",
epilog="""
Examples:
lm-eval list tasks # List all available tasks
lm-eval list groups # List task groups only
lm-eval list subtasks # List subtasks only
lm-eval list tags # List available tags
""",
formatter_class=argparse.RawDescriptionHelpFormatter,
)
usage="lm-eval list [tasks|groups|subtasks|tags] [--include_path DIR]",
epilog=textwrap.dedent("""
examples:
# List all available tasks (includes groups, subtasks, and tags)
$ lm-eval list tasks
# List only task groups (like 'mmlu', 'glue', 'superglue')
$ lm-eval list groups
# List only individual subtasks (like 'mmlu_abstract_algebra')
$ lm-eval list subtasks
# Include external task definitions
$ lm-eval list tasks --include_path /path/to/external/tasks
# Add command-specific arguments
self._add_args(parser)
# List tasks from multiple external paths
$ lm-eval list tasks --include_path "/path/to/tasks1:/path/to/tasks2"
# Set the function to execute for this subcommand
parser.set_defaults(func=self.execute)
organization:
• Groups: Collections of tasks with aggregated metric across subtasks (e.g., 'mmlu')
• Subtasks: Individual evaluation tasks (e.g., 'mmlu_anatomy', 'hellaswag')
• Tags: Similar to groups but no aggregate metric (e.g., 'reasoning', 'knowledge', 'language')
• External Tasks: Custom tasks defined in external directories
evaluation usage:
After listing tasks, use them with the run command!
For more information tasks configs are defined in https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks
"""),
formatter_class=argparse.RawDescriptionHelpFormatter,
)
self._add_args()
self._parser.set_defaults(func=lambda arg: self._parser.print_help())
def _add_args(self, parser: argparse.ArgumentParser) -> None:
parser.add_argument(
def _add_args(self) -> None:
self._parser.add_argument(
"what",
choices=["tasks", "groups", "subtasks", "tags"],
nargs="?",
help="What to list: tasks (all), groups, subtasks, or tags",
)
parser.add_argument(
self._parser.add_argument(
"--include_path",
type=str,
default=None,
......@@ -57,3 +77,5 @@ Examples:
print(task_manager.list_all_tasks(list_groups=False, list_tags=False))
elif args.what == "tags":
print(task_manager.list_all_tasks(list_groups=False, list_subtasks=False))
elif args.what is None:
self._parser.print_help()
import argparse
import sys
from typing import Dict, Type
from lm_eval._cli.base import SubCommand
from lm_eval._cli.cache import CacheCommand
from lm_eval._cli.evaluate import EvaluateCommand
from lm_eval._cli.list import ListCommand
from lm_eval._cli.validate import ValidateCommand
def check_argument_types(parser: argparse.ArgumentParser):
"""
Check to make sure all CLI args are typed, raises error if not
"""
for action in parser._actions:
# Skip help, subcommands, and const actions
if action.dest in ["help", "command"] or action.const is not None:
continue
if action.type is None:
raise ValueError(f"Argument '{action.dest}' doesn't have a type specified.")
else:
continue
class CLIParser:
"""Main CLI parser class that manages all subcommands."""
def __init__(self):
self.parser = None
self.subparsers = None
self.legacy_parser = None
self.command_instances: Dict[str, SubCommand] = {}
def setup_parser(self) -> argparse.ArgumentParser:
"""Set up the main parser with subcommands."""
if self.parser is not None:
return self.parser
self.parser = argparse.ArgumentParser(
prog="lm-eval",
description="Language Model Evaluation Harness",
formatter_class=argparse.RawTextHelpFormatter,
)
# Create subparsers
self.subparsers = self.parser.add_subparsers(
dest="command", help="Available commands", metavar="COMMAND"
)
# Create and register all command instances
self.command_instances = {
"evaluate": EvaluateCommand.create(self.subparsers),
"list": ListCommand.create(self.subparsers),
"validate": ValidateCommand.create(self.subparsers),
"cache": CacheCommand.create(self.subparsers),
}
return self.parser
def setup_legacy_parser(self) -> argparse.ArgumentParser:
"""Set up legacy parser for backward compatibility."""
if self.legacy_parser is not None:
return self.legacy_parser
self.legacy_parser = argparse.ArgumentParser(
formatter_class=argparse.RawTextHelpFormatter
)
# For legacy mode, we just need to add the evaluate command's arguments
# without the subcommand structure. We'll create a temporary instance.
from lm_eval._cli.evaluate import EvaluateCommand as EvalCmd
# Create a minimal instance just to get the arguments
temp_cmd = object.__new__(EvalCmd)
temp_cmd._add_args(self.legacy_parser)
return self.legacy_parser
def parse_args(self, args=None) -> argparse.Namespace:
"""Parse arguments using the main parser."""
parser = self.setup_parser()
check_argument_types(parser)
return parser.parse_args(args)
def parse_legacy_args(self, args=None) -> argparse.Namespace:
"""Parse arguments using the legacy parser."""
parser = self.setup_legacy_parser()
check_argument_types(parser)
return parser.parse_args(args)
def should_use_subcommand_mode(self, argv=None) -> bool:
"""Determine if we should use subcommand mode based on arguments."""
if argv is None:
argv = sys.argv[1:]
# If no arguments, show main help
if len(argv) == 0:
return True
# Check if first argument is a known subcommand
# First ensure parser is set up to populate command_instances
if not self.command_instances:
self.setup_parser()
if len(argv) > 0 and argv[0] in self.command_instances:
return True
return False
def execute(self, argv=None) -> None:
"""Main execution method that handles both subcommand and legacy modes."""
if self.should_use_subcommand_mode(argv):
# Use subcommand mode
if argv is None and len(sys.argv) == 1:
# No arguments provided, show help
self.setup_parser().print_help()
sys.exit(1)
args = self.parse_args(argv)
args.func(args)
else:
# Use legacy mode for backward compatibility
args = self.parse_legacy_args(argv)
self._handle_legacy_mode(args)
def _handle_legacy_mode(self, args: argparse.Namespace) -> None:
"""Handle legacy CLI mode for backward compatibility."""
# Handle legacy task listing
if hasattr(args, "tasks") and args.tasks in [
"list",
"list_groups",
"list_subtasks",
"list_tags",
]:
from lm_eval.tasks import TaskManager
task_manager = TaskManager(include_path=getattr(args, "include_path", None))
if args.tasks == "list":
print(task_manager.list_all_tasks())
elif args.tasks == "list_groups":
print(task_manager.list_all_tasks(list_subtasks=False, list_tags=False))
elif args.tasks == "list_subtasks":
print(task_manager.list_all_tasks(list_groups=False, list_tags=False))
elif args.tasks == "list_tags":
print(
task_manager.list_all_tasks(list_groups=False, list_subtasks=False)
)
sys.exit(0)
# Handle legacy evaluation
# Use existing instance if available, otherwise create temporary one
if "evaluate" in self.command_instances:
evaluate_cmd = self.command_instances["evaluate"]
else:
# For legacy mode, we don't need the subparser registration
# Just execute with the existing args
from lm_eval._cli.evaluate import EvaluateCommand as EvalCmd
# Create a minimal instance just for execution
evaluate_cmd = object.__new__(EvalCmd)
evaluate_cmd.execute(args)
def add_command(self, name: str, command_class: Type[SubCommand]) -> None:
"""Add a new command to the parser (for extensibility)."""
# If parser is already set up, create and register the command instance
if self.subparsers is not None:
self.command_instances[name] = command_class.create(self.subparsers)
else:
# Store class for later instantiation
if not hasattr(self, "_pending_commands"):
self._pending_commands = {}
self._pending_commands[name] = command_class
import argparse
from abc import ABC, abstractmethod
class SubCommand(ABC):
"""Base class for all subcommands."""
def __init__(self, *args, **kwargs):
pass
@classmethod
def create(cls, subparsers: argparse._SubParsersAction):
"""Factory method to create and register a command instance."""
return cls(subparsers)
@abstractmethod
def _add_args(self) -> None:
"""Add arguments specific to this subcommand."""
pass
@abstractmethod
def execute(self, args: argparse.Namespace) -> None:
"""Execute the subcommand with the given arguments."""
pass
import argparse
import ast
import json
import logging
from abc import ABC, abstractmethod
from typing import Union
from typing import Any, Optional, Union
def try_parse_json(value: str) -> Union[str, dict, None]:
def try_parse_json(value: Union[str, dict, None]) -> Union[str, dict, None]:
"""Try to parse a string as JSON. If it fails, return the original string."""
if value is None:
return None
if isinstance(value, dict):
return value
try:
return json.loads(value)
except json.JSONDecodeError:
if "{" in value:
raise argparse.ArgumentTypeError(
raise ValueError(
f"Invalid JSON: {value}. Hint: Use double quotes for JSON strings."
)
return value
......@@ -20,15 +23,19 @@ def try_parse_json(value: str) -> Union[str, dict, None]:
def _int_or_none_list_arg_type(
min_len: int, max_len: int, defaults: str, value: str, split_char: str = ","
):
) -> list[Union[int, None]]:
"""Parses a string of integers or 'None' values separated by a specified character into a list.
Validates the number of items against specified minimum and maximum lengths and fills missing values with defaults."""
def parse_value(item):
"""Parses an individual item, converting it to an integer or `None`."""
item = item.strip().lower()
if item == "none":
return None
try:
return int(item)
except ValueError:
raise argparse.ArgumentTypeError(f"{item} is not an integer or None")
raise ValueError(f"{item} is not an integer or None")
items = [parse_value(v) for v in value.split(split_char)]
num_items = len(items)
......@@ -36,7 +43,7 @@ def _int_or_none_list_arg_type(
if num_items == 1:
items = items * max_len
elif num_items < min_len or num_items > max_len:
raise argparse.ArgumentTypeError(
raise ValueError(
f"Argument requires {max_len} integers or None, separated by '{split_char}'"
)
elif num_items != max_len:
......@@ -50,23 +57,60 @@ def _int_or_none_list_arg_type(
return items
class SubCommand(ABC):
"""Base class for all subcommands."""
def request_caching_arg_to_dict(cache_requests: Optional[str]) -> dict[str, bool]:
"""Convert a request caching argument to a dictionary."""
if cache_requests is None:
return {}
request_caching_args = {
"cache_requests": cache_requests in {"true", "refresh"},
"rewrite_requests_cache": cache_requests == "refresh",
"delete_requests_cache": cache_requests == "delete",
}
return request_caching_args
def check_argument_types(parser: argparse.ArgumentParser) -> None:
"""
Check to make sure all CLI args are typed, raises error if not
"""
for action in parser._actions:
# Skip help, subcommands, and const actions
if action.dest in ["help", "command"] or action.const is not None:
continue
if action.type is None:
raise ValueError(f"Argument '{action.dest}' doesn't have a type specified.")
else:
continue
def handle_cli_value_string(arg: str) -> Any:
if arg.lower() == "true":
return True
elif arg.lower() == "false":
return False
elif arg.isnumeric():
return int(arg)
try:
return float(arg)
except ValueError:
try:
return ast.literal_eval(arg)
except (ValueError, SyntaxError):
return arg
def __init__(self, *args, **kwargs):
pass
@classmethod
def create(cls, subparsers: argparse._SubParsersAction):
"""Factory method to create and register a command instance."""
return cls(subparsers)
def key_val_to_dict(args: str) -> dict:
"""Parse model arguments from a string into a dictionary."""
return (
{
k: handle_cli_value_string(v)
for k, v in (item.split("=") for item in args.split(","))
}
if args
else {}
)
@abstractmethod
def _add_args(self, parser: argparse.ArgumentParser) -> None:
"""Add arguments specific to this subcommand."""
pass
@abstractmethod
def execute(self, args: argparse.Namespace) -> None:
"""Execute the subcommand with the given arguments."""
pass
def merge_dicts(*dicts):
return {k: v for d in dicts for k, v in d.items()}
import argparse
import sys
import textwrap
from lm_eval._cli.base import SubCommand
from lm_eval._cli.subcommand import SubCommand
class ValidateCommand(SubCommand):
class Validate(SubCommand):
"""Command for validating tasks."""
def __init__(self, subparsers: argparse._SubParsersAction, *args, **kwargs):
# Create and configure the parser
# Create and configure the self._parser
super().__init__(*args, **kwargs)
parser = subparsers.add_parser(
self._parser = subparsers.add_parser(
"validate",
help="Validate task configurations",
description="Validate task configurations and check for errors.",
epilog="""
Examples:
lm-eval validate --tasks hellaswag # Validate single task
lm-eval validate --tasks arc_easy,arc_challenge # Validate multiple tasks
lm-eval validate --tasks mmlu --include_path ./custom_tasks
""",
formatter_class=argparse.RawDescriptionHelpFormatter,
)
usage="lm-eval validate --tasks <task1,task2> [--include_path DIR]",
epilog=textwrap.dedent("""
examples:
# Validate a single task
lm-eval validate --tasks hellaswag
# Validate multiple tasks
lm-eval validate --tasks arc_easy,arc_challenge,hellaswag
# Validate a task group
lm-eval validate --tasks mmlu
# Validate tasks with external definitions
lm-eval validate --tasks my_custom_task --include_path ./custom_tasks
# Validate tasks from multiple external paths
lm-eval validate --tasks custom_task1,custom_task2 --include_path "/path/to/tasks1:/path/to/tasks2"
# Add command-specific arguments
self._add_args(parser)
validation check:
The validate command performs several checks:
• Task existence: Verifies all specified tasks are available
• Configuration syntax: Checks YAML/JSON configuration files
• Dataset access: Validates dataset paths and configurations
• Required fields: Ensures all mandatory task parameters are present
• Metric definitions: Verifies metric functions and aggregation methods
• Filter pipelines: Validates filter chains and their parameters
• Template rendering: Tests prompt templates with sample data
# Set the function to execute for this subcommand
parser.set_defaults(func=self.execute)
task config files:
Tasks are defined using YAML configuration files with these key sections:
• task: Task name and metadata
• dataset_path: HuggingFace dataset identifier
• doc_to_text: Template for converting documents to prompts
• doc_to_target: Template for extracting target answers
• metric_list: List of evaluation metrics to compute
• output_type: Type of model output (loglikelihood, generate_until, etc.)
• filter_list: Post-processing filters for model outputs
common errors:
• Missing required fields in YAML configuration
• Invalid dataset paths or missing dataset splits
• Malformed Jinja2 templates in doc_to_text/doc_to_target
• Undefined metrics or aggregation functions
• Invalid filter names or parameters
• Circular dependencies in task inheritance
• Missing external task files when using --include_path
debugging tips:
• Use --include_path to test external task definitions
• Check task configuration files for syntax errors
• Verify dataset access and authentication if needed
• Use 'lm-eval list tasks' to see available tasks
For task configuration guide, see: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/task_guide.md
"""),
formatter_class=argparse.RawDescriptionHelpFormatter,
)
self._add_args()
self._parser.set_defaults(func=lambda arg: self._parser.print_help())
def _add_args(self, parser: argparse.ArgumentParser) -> None:
parser.add_argument(
def _add_args(self) -> None:
self._parser.add_argument(
"--tasks",
"-t",
required=True,
type=str,
metavar="task1,task2",
metavar="TASK1,TASK2",
help="Comma-separated list of task names to validate",
)
parser.add_argument(
self._parser.add_argument(
"--include_path",
type=str,
default=None,
......
from .evaluate_config import EvaluatorConfig
__all__ = [
"EvaluatorConfig",
]
import json
import logging
import warnings
from argparse import Namespace
from dataclasses import dataclass
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional, Union
from typing import TYPE_CHECKING, Any, Dict, Optional, Union
import yaml
from lm_eval.utils import simple_parse_args_string
if TYPE_CHECKING:
from lm_eval.tasks import TaskManager
DICT_KEYS = [
"wandb_args",
"wandb_config_args",
......@@ -20,65 +25,167 @@ DICT_KEYS = [
@dataclass
class EvaluationConfig:
"""
Simple config container for holding params.
"""
class EvaluatorConfig:
"""Configuration for language model evaluation runs.
This dataclass contains all parameters for configuring model evaluations via
`simple_evaluate()` or the CLI. It supports initialization from:
- CLI arguments (via `from_cli()`)
- YAML configuration files (via `from_config()`)
- Direct instantiation with keyword arguments
The configuration handles argument parsing, validation, and preprocessing
to ensure properly structured and validated.
Example:
# From CLI arguments
config = EvaluatorConfig.from_cli(args)
# From YAML file
config = EvaluatorConfig.from_config("eval_config.yaml")
# Direct instantiation
config = EvaluatorConfig(
model="hf",
model_args={"pretrained": "gpt2"},
tasks=["hellaswag", "arc_easy"],
num_fewshot=5
)
config: Optional[str] = None
model: Optional[str] = None
model_args: Optional[dict] = None
tasks: Optional[str] = None
num_fewshot: Optional[int] = None
batch_size: Optional[int] = None
max_batch_size: Optional[int] = None
device: Optional[str] = None
output_path: Optional[str] = None
limit: Optional[float] = None
samples: Optional[str] = None
use_cache: Optional[str] = None
cache_requests: Optional[str] = None
check_integrity: Optional[bool] = None
write_out: Optional[bool] = None
log_samples: Optional[bool] = None
predict_only: Optional[bool] = None
system_instruction: Optional[str] = None
apply_chat_template: Optional[Union[bool, str]] = None
fewshot_as_multiturn: Optional[bool] = None
show_config: Optional[bool] = None
include_path: Optional[str] = None
gen_kwargs: Optional[dict] = None
verbosity: Optional[str] = None
wandb_args: Optional[dict] = None
wandb_config_args: Optional[dict] = None
hf_hub_log_args: Optional[dict] = None
seed: Optional[list] = None
trust_remote_code: Optional[bool] = None
confirm_run_unsafe_code: Optional[bool] = None
metadata: Optional[dict] = None
request_caching_args: Optional[dict] = None
See individual field documentation for detailed parameter descriptions.
"""
@staticmethod
def _get_defaults() -> Dict[str, Any]:
"""Get default values for all configuration options."""
return {
"model": "hf",
"model_args": {},
"batch_size": 1,
"check_integrity": False,
"write_out": False,
"log_samples": False,
"predict_only": False,
"fewshot_as_multiturn": False,
"show_config": False,
"trust_remote_code": False,
"confirm_run_unsafe_code": False,
"metadata": {},
"wandb_args": {},
"wandb_config_args": {},
"hf_hub_log_args": {},
"seed": [0, 1234, 1234, 1234],
}
# Core evaluation parameters
config: Optional[str] = field(
default=None, metadata={"help": "Path to YAML config file"}
)
model: str = field(default="hf", metadata={"help": "Name of model e.g. 'hf'"})
model_args: dict = field(
default_factory=dict, metadata={"help": "Arguments for model initialization"}
)
tasks: Union[str, list[str]] = field(
default_factory=list,
metadata={"help": "Comma-separated list of task names to evaluate"},
)
# Few-shot and batching
num_fewshot: Optional[int] = field(
default=None, metadata={"help": "Number of examples in few-shot context"}
)
batch_size: int = field(default=1, metadata={"help": "Batch size for evaluation"})
max_batch_size: Optional[int] = field(
default=None, metadata={"help": "Maximum batch size for auto batching"}
)
# Device
device: Optional[str] = field(
default=None, metadata={"help": "Device to use (e.g. cuda, cuda:0, cpu)"}
)
# Data sampling and limiting
limit: Optional[float] = field(
default=None, metadata={"help": "Limit number of examples per task"}
)
samples: Union[str, dict, None] = field(
default=None,
metadata={"help": "dict, JSON string or path to JSON file with doc indices"},
)
# Caching
use_cache: Optional[str] = field(
default=None,
metadata={"help": "Path to sqlite db file for caching model outputs"},
)
cache_requests: dict = field(
default_factory=dict,
metadata={"help": "Cache dataset requests: true/refresh/delete"},
)
# Output and logging flags
check_integrity: bool = field(
default=False, metadata={"help": "Run test suite for tasks"}
)
write_out: bool = field(
default=False, metadata={"help": "Print prompts for first few documents"}
)
log_samples: bool = field(
default=False, metadata={"help": "Save model outputs and inputs"}
)
output_path: Optional[str] = field(
default=None, metadata={"help": "Dir path where result metrics will be saved"}
)
predict_only: bool = field(
default=False,
metadata={
"help": "Only save model outputs, don't evaluate metrics. Use with log_samples."
},
)
# Chat and instruction handling
system_instruction: Optional[str] = field(
default=None, metadata={"help": "Custom System instruction to add"}
)
apply_chat_template: Union[bool, str] = field(
default=False, metadata={"help": "Apply chat template to prompt"}
)
fewshot_as_multiturn: bool = field(
default=False,
metadata={
"help": "Use fewshot as multi-turn conversation. Requires apply_chat_template=True."
},
)
# Configuration display
show_config: bool = field(
default=False, metadata={"help": "Show full config at end of evaluation"}
)
# External tasks and generation
include_path: Optional[str] = field(
default=None, metadata={"help": "Additional dir path for external tasks"}
)
gen_kwargs: Optional[dict] = field(
default=None, metadata={"help": "Arguments for model generation"}
)
# Logging and verbosity
verbosity: Optional[str] = field(
default=None, metadata={"help": "Logging verbosity level"}
)
# External integrations
wandb_args: dict = field(
default_factory=dict, metadata={"help": "Arguments for wandb.init"}
)
wandb_config_args: dict = field(
default_factory=dict, metadata={"help": "Arguments for wandb.config.update"}
)
hf_hub_log_args: dict = field(
default_factory=dict, metadata={"help": "Arguments for HF Hub logging"}
)
# Reproducibility
seed: list = field(
default_factory=lambda: [0, 1234, 1234, 1234],
metadata={"help": "Seeds for random, numpy, torch, fewshot (random)"},
)
# Security and safety
trust_remote_code: bool = field(
default=False, metadata={"help": "Trust remote code for HF datasets"}
)
confirm_run_unsafe_code: bool = field(
default=False,
metadata={
"help": "Confirm understanding of unsafe code risks (for code tasks that executes arbitrary Python)"
},
)
# Internal metadata
metadata: dict = field(
default_factory=dict,
metadata={"help": "Additional metadata for tasks that require it"},
)
@staticmethod
def _parse_dict_args(config: Dict[str, Any]) -> Dict[str, Any]:
......@@ -89,24 +196,22 @@ class EvaluationConfig:
return config
@classmethod
def from_cli(cls, namespace: Namespace) -> "EvaluationConfig":
def from_cli(cls, namespace: Namespace) -> "EvaluatorConfig":
"""
Build an EvaluationConfig by merging with simple precedence:
CLI args > YAML config > built-in defaults
"""
# Start with built-in defaults
config = cls._get_defaults()
config = asdict(cls())
# Load and merge YAML config if provided
if hasattr(namespace, "config") and namespace.config:
config.update(cls._load_yaml_config(namespace.config))
# Override with CLI args (only non-None values, exclude non-config args)
# Override with CLI args (only truthy values, exclude non-config args)
excluded_args = {"config", "command", "func"} # argparse internal args
cli_args = {
k: v
for k, v in vars(namespace).items()
if v is not None and k not in excluded_args
k: v for k, v in vars(namespace).items() if v and k not in excluded_args
}
config.update(cli_args)
......@@ -119,10 +224,30 @@ class EvaluationConfig:
return instance
@classmethod
def from_config(cls, config_path: Union[str, Path]) -> "EvaluatorConfig":
"""
Build an EvaluationConfig from a YAML config file.
Merges with built-in defaults and validates.
"""
# Load YAML config
yaml_config = cls._load_yaml_config(config_path)
# Parse string arguments that should be dictionaries
yaml_config = cls._parse_dict_args(yaml_config)
# Create instance and validate
instance = cls(**yaml_config)
instance.validate_and_preprocess()
return instance
@staticmethod
def _load_yaml_config(config_path: str) -> Dict[str, Any]:
def _load_yaml_config(config_path: Union[str, Path]) -> Dict[str, Any]:
"""Load and validate YAML config file."""
config_file = Path(config_path)
config_file = (
Path(config_path) if not isinstance(config_path, Path) else config_path
)
if not config_file.is_file():
raise FileNotFoundError(f"Config file not found: {config_path}")
......@@ -143,13 +268,17 @@ class EvaluationConfig:
def validate_and_preprocess(self) -> None:
"""Validate configuration and preprocess fields after creation."""
self._validate_arguments()
self._process_samples()
self._setup_metadata()
self._process_arguments()
self._apply_trust_remote_code()
self._process_tasks()
def _validate_arguments(self) -> None:
"""Validate configuration arguments and cross-field constraints."""
if self.limit:
warnings.warn(
"--limit SHOULD ONLY BE USED FOR TESTING. "
"REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT."
)
# predict_only implies log_samples
if self.predict_only:
self.log_samples = True
......@@ -174,26 +303,47 @@ class EvaluationConfig:
if self.tasks is None:
raise ValueError("Need to specify task to evaluate.")
def _process_samples(self) -> None:
def _process_arguments(self) -> None:
"""Process samples argument - load from file if needed."""
if self.samples:
if (samples_path := Path(self.samples)).is_file():
self.samples = json.loads(samples_path.read_text())
else:
self.samples = json.loads(self.samples)
if isinstance(self.samples, dict):
self.samples = self.samples
elif isinstance(self.samples, str):
try:
self.samples = json.loads(self.samples)
except json.JSONDecodeError:
if (samples_path := Path(self.samples)).is_file():
self.samples = json.loads(samples_path.read_text())
# Set up metadata by merging model_args and metadata.
if self.model_args is None:
self.model_args = {}
if self.metadata is None:
self.metadata = {}
self.metadata = self.model_args | self.metadata
def _process_tasks(self, metadata: Union[dict, str]) -> List[str]:
def process_tasks(self, metadata: Optional[dict] = None) -> "TaskManager":
"""Process and validate tasks, return resolved task names."""
from lm_eval import utils
from lm_eval.tasks import TaskManager
# if metadata manually passed use that:
self.metadata = metadata if metadata else self.metadata
# Create task manager with metadata
task_manager = TaskManager(
include_path=self.include_path, metadata=self.metadata
include_path=self.include_path,
metadata=self.metadata if self.metadata else {},
)
# self.tasks is a comma-separated string of task names
task_list = self.tasks.split(",")
if isinstance((task_list := self.tasks), str):
task_list = self.tasks.split(",")
else:
assert isinstance(self.tasks, list), (
"`tasks` must be a comma delimited string of task names or list[str]."
)
task_names = task_manager.match_tasks(task_list)
# Check for any individual task files in the list
......@@ -214,18 +364,7 @@ class EvaluationConfig:
# Update tasks with resolved names
self.tasks = task_names
return task_names
def _setup_metadata(self) -> None:
"""Set up metadata by merging model_args and metadata."""
if self.model_args is None:
self.model_args = {}
if self.metadata is None:
self.metadata = {}
# Merge model_args and metadata
merged_metadata = self.model_args | self.metadata
self.metadata = merged_metadata
return task_manager
def _apply_trust_remote_code(self) -> None:
"""Apply trust_remote_code setting if enabled."""
......
......@@ -777,13 +777,3 @@ def evaluate(
else:
return None
def request_caching_arg_to_dict(cache_requests: str) -> dict:
request_caching_args = {
"cache_requests": cache_requests in {"true", "refresh"},
"rewrite_requests_cache": cache_requests == "refresh",
"delete_requests_cache": cache_requests == "delete",
}
return request_caching_args
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment