Commit be78dc7a authored by Baber

pre-commit

parent b7d3f0dd
@@ -18,24 +18,24 @@ class Eval:
quick start:
  # Basic evaluation
  lm-eval run --model hf --model_args pretrained=gpt2 --tasks hellaswag

  # List available tasks
  lm-eval list tasks

  # Validate task configurations
  lm-eval validate --tasks hellaswag,arc_easy

available commands:
  run       Run the harness on specified tasks
  list      List available tasks, groups, subtasks, or tags
  validate  Validate task configurations and check for errors

legacy compatibility:
  The harness maintains backward compatibility with the original interface.
  If no command is specified, 'run' is automatically inserted:

  lm-eval --model hf --tasks hellaswag  # Equivalent to 'lm-eval run --model hf --tasks hellaswag'

For documentation, visit: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/interface.md
"""),
formatter_class=argparse.RawDescriptionHelpFormatter,
...
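The legacy-compatibility behaviour described in this hunk (inserting 'run' when no subcommand is given) can be pictured with a short sketch. This is a hedged illustration rather than the harness's actual implementation; the subcommand set and the helper name are assumptions.

import sys

# Hypothetical shim: if the first CLI argument is not a known subcommand,
# assume the old single-command interface and insert "run" before parsing.
KNOWN_SUBCOMMANDS = {"run", "list", "validate"}  # assumed set


def normalize_argv(argv: list[str]) -> list[str]:
    """Return argv with 'run' inserted when no subcommand was given."""
    args = argv[1:]
    if args and args[0] not in KNOWN_SUBCOMMANDS and args[0] not in ("-h", "--help"):
        args = ["run", *args]
    return [argv[0], *args]


if __name__ == "__main__":
    # 'lm-eval --model hf --tasks hellaswag' becomes
    # 'lm-eval run --model hf --tasks hellaswag'
    print(normalize_argv(["lm-eval", "--model", "hf", "--tasks", "hellaswag"]))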
@@ -19,28 +19,28 @@ class List(SubCommand):
examples:
  # List all available tasks (includes groups, subtasks, and tags)
  $ lm-eval list tasks

  # List only task groups (like 'mmlu', 'glue', 'superglue')
  $ lm-eval list groups

  # List only individual subtasks (like 'mmlu_abstract_algebra')
  $ lm-eval list subtasks

  # Include external task definitions
  $ lm-eval list tasks --include_path /path/to/external/tasks

  # List tasks from multiple external paths
  $ lm-eval list tasks --include_path "/path/to/tasks1:/path/to/tasks2"

organization:
  • Groups: Collections of tasks with an aggregated metric across subtasks (e.g., 'mmlu')
  • Subtasks: Individual evaluation tasks (e.g., 'mmlu_anatomy', 'hellaswag')
  • Tags: Similar to groups but with no aggregate metric (e.g., 'reasoning', 'knowledge', 'language')
  • External Tasks: Custom tasks defined in external directories

evaluation usage:
  After listing tasks, use them with the run command!

For more information, task configs are defined in https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks
"""),
formatter_class=argparse.RawDescriptionHelpFormatter,
...
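The same listing is reachable from Python. A hedged sketch, assuming the TaskManager API in lm_eval.tasks; the include_path keyword mirrors --include_path, and the all_tasks attribute name may differ between versions, so verify against your installed release.

# Hedged sketch: list registered tasks programmatically.
from lm_eval.tasks import TaskManager

# include_path is optional; pass a directory (or colon-separated directories)
# containing external task definitions, mirroring --include_path.
task_manager = TaskManager(include_path="/path/to/external/tasks")

# all_tasks is assumed to hold every registered task, group, and tag name.
for name in sorted(task_manager.all_tasks):
    print(name)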
@@ -27,16 +27,16 @@ class Run(SubCommand):
examples:
  # Basic evaluation with HuggingFace model
  $ lm-eval run --model hf --model_args pretrained=gpt2 --tasks hellaswag

  # Evaluate on multiple tasks with few-shot examples
  $ lm-eval run --model vllm --model_args pretrained=EleutherAI/gpt-j-6B --tasks arc_easy,arc_challenge --num_fewshot 5

  # Evaluation with custom generation parameters
  $ lm-eval run --model hf --model_args pretrained=gpt2 --tasks lambada --gen_kwargs "temperature=0.8,top_p=0.95"

  # Use a configuration file
  $ lm-eval run --config my_config.yaml --tasks mmlu

For more information, see: https://github.com/EleutherAI/lm-evaluation-harness
"""),
formatter_class=argparse.RawDescriptionHelpFormatter,
...
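For completeness, the run subcommand has a programmatic counterpart. A minimal sketch, assuming lm_eval.simple_evaluate with these keyword names and result keys; check the installed version's signature before relying on it.

# Hedged sketch of the programmatic equivalent of `lm-eval run`.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                    # same value as --model
    model_args="pretrained=gpt2",  # same string as --model_args
    tasks=["hellaswag"],           # --tasks, as a list
    num_fewshot=0,                 # --num_fewshot
)

# "results" is assumed to map each task name to its metric values.
print(results["results"]["hellaswag"])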
@@ -20,19 +20,19 @@ class Validate(SubCommand):
examples:
  # Validate a single task
  lm-eval validate --tasks hellaswag

  # Validate multiple tasks
  lm-eval validate --tasks arc_easy,arc_challenge,hellaswag

  # Validate a task group
  lm-eval validate --tasks mmlu

  # Validate tasks with external definitions
  lm-eval validate --tasks my_custom_task --include_path ./custom_tasks

  # Validate tasks from multiple external paths
  lm-eval validate --tasks custom_task1,custom_task2 --include_path "/path/to/tasks1:/path/to/tasks2"

validation checks:
  The validate command performs several checks:
  • Task existence: Verifies all specified tasks are available
@@ -42,7 +42,7 @@ class Validate(SubCommand):
  • Metric definitions: Verifies metric functions and aggregation methods
  • Filter pipelines: Validates filter chains and their parameters
  • Template rendering: Tests prompt templates with sample data

task config files:
  Tasks are defined using YAML configuration files with these key sections:
  • task: Task name and metadata
@@ -52,7 +52,7 @@ class Validate(SubCommand):
  • metric_list: List of evaluation metrics to compute
  • output_type: Type of model output (loglikelihood, generate_until, etc.)
  • filter_list: Post-processing filters for model outputs

common errors:
  • Missing required fields in YAML configuration
  • Invalid dataset paths or missing dataset splits
@@ -61,13 +61,13 @@ class Validate(SubCommand):
  • Invalid filter names or parameters
  • Circular dependencies in task inheritance
  • Missing external task files when using --include_path

debugging tips:
  • Use --include_path to test external task definitions
  • Check task configuration files for syntax errors
  • Verify dataset access and authentication if needed
  • Use 'lm-eval list tasks' to see available tasks

For the task configuration guide, see: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/task_guide.md
"""),
formatter_class=argparse.RawDescriptionHelpFormatter,
...
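To make the YAML layout and the "missing required fields" class of errors concrete, here is a hedged, self-contained sketch. The example task name, dataset path, and the field check are assumptions for illustration; this is not the harness's actual validator.

# Hedged sketch: an example task YAML with the key sections listed above,
# plus a minimal field check (not the harness's real validation logic).
import yaml  # requires pyyaml

EXAMPLE_TASK_YAML = """
task: my_custom_task
dataset_path: someorg/some_dataset   # hypothetical Hub dataset
output_type: loglikelihood
doc_to_text: "Question: {{question}}\\nAnswer:"
doc_to_target: "{{answer}}"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
"""

REQUIRED_FIELDS = {"task", "dataset_path", "output_type", "doc_to_text", "doc_to_target", "metric_list"}


def check_config(text: str) -> list[str]:
    """Return a list of human-readable problems found in a task config."""
    config = yaml.safe_load(text)
    problems = [f"missing required field: {field}" for field in sorted(REQUIRED_FIELDS - config.keys())]
    for entry in config.get("metric_list", []):
        if "metric" not in entry:
            problems.append("metric_list entry without a 'metric' key")
    return problems


if __name__ == "__main__":
    print(check_config(EXAMPLE_TASK_YAML) or "no problems found")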