gaoqiong / lm-evaluation-harness
Commit be78dc7a, authored Jul 04, 2025 by Baber
pre-commit
Parent: b7d3f0dd
Changes: 4 changed files with 26 additions and 26 deletions
lm_eval/_cli/eval.py (+6 -6)
lm_eval/_cli/list.py (+7 -7)
lm_eval/_cli/run.py (+4 -4)
lm_eval/_cli/validate.py (+9 -9)
lm_eval/_cli/eval.py
@@ -18,24 +18,24 @@ class Eval:
quick start:
# Basic evaluation
lm-eval run --model hf --model_args pretrained=gpt2 --tasks hellaswag
# List available tasks
lm-eval list tasks
# Validate task configurations
lm-eval validate --tasks hellaswag,arc_easy
available commands:
run Run the harness on specified tasks
list List available tasks, groups, subtasks, or tags
validate Validate task configurations and check for errors
legacy compatibility:
The harness maintains backward compatibility with the original interface.
If no command is specified, 'run' is automatically inserted:
lm-eval --model hf --tasks hellaswag # Equivalent to 'lm-eval run --model hf --tasks hellaswag'
For documentation, visit: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/interface.md
"""
),
formatter_class=argparse.RawDescriptionHelpFormatter,
...
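The "legacy compatibility" note in the help text above says that `run` is inserted when no command is given. The following is a minimal sketch of how such a fallback could work with `argparse`, not the harness's actual code: `insert_default_command`, `build_parser`, and the reduced set of options are hypothetical names chosen for illustration, and the subcommand set is taken from the "available commands" list.

```python
import argparse

# Subcommands named in the help text above; the set itself is an assumption.
KNOWN_COMMANDS = {"run", "list", "validate"}


def insert_default_command(argv, default="run"):
    """Return a copy of argv with `default` prepended when no subcommand is given."""
    # Leave empty input, explicit subcommands, and top-level help requests untouched.
    if not argv or argv[0] in KNOWN_COMMANDS or argv[0] in ("-h", "--help"):
        return list(argv)
    return [default, *argv]


def build_parser():
    """Build a stripped-down parser with only a few illustrative options."""
    parser = argparse.ArgumentParser(prog="lm-eval")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run")
    run.add_argument("--model")
    run.add_argument("--tasks")
    sub.add_parser("list").add_argument("what")
    sub.add_parser("validate").add_argument("--tasks")
    return parser


legacy_argv = ["--model", "hf", "--tasks", "hellaswag"]
args = build_parser().parse_args(insert_default_command(legacy_argv))
print(args.command, args.model, args.tasks)  # prints: run hf hellaswag
```

Rewriting the argument list before parsing keeps the subparser definitions unchanged, which is one simple way to support both the old and the new invocation style.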
lm_eval/_cli/list.py
@@ -19,28 +19,28 @@ class List(SubCommand):
examples:
# List all available tasks (includes groups, subtasks, and tags)
$ lm-eval list tasks
# List only task groups (like 'mmlu', 'glue', 'superglue')
$ lm-eval list groups
# List only individual subtasks (like 'mmlu_abstract_algebra')
$ lm-eval list subtasks
# Include external task definitions
$ lm-eval list tasks --include_path /path/to/external/tasks
# List tasks from multiple external paths
$ lm-eval list tasks --include_path "/path/to/tasks1:/path/to/tasks2"
organization:
• Groups: Collections of tasks with aggregated metric across subtasks (e.g., 'mmlu')
• Subtasks: Individual evaluation tasks (e.g., 'mmlu_anatomy', 'hellaswag')
• Tags: Similar to groups but no aggregate metric (e.g., 'reasoning', 'knowledge', 'language')
• External Tasks: Custom tasks defined in external directories
evaluation usage:
After listing tasks, use them with the run command!
For more information tasks configs are defined in https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks
"""
),
formatter_class=argparse.RawDescriptionHelpFormatter,
...
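Each help text in this commit is paired with `formatter_class=argparse.RawDescriptionHelpFormatter`. The sketch below shows why that matters for hand-laid-out text like the examples above. It assumes the text is passed as `epilog` (the diff does not show which keyword is used), and `EPILOG` and `build` are illustrative names.

```python
import argparse
import textwrap

# A shortened, ASCII-only stand-in for the help text shown in the diff.
EPILOG = textwrap.dedent(
    """\
    examples:
      # List only task groups
      $ lm-eval list groups

    organization:
      - Groups: collections of tasks with an aggregated metric
      - Tags: similar to groups but no aggregate metric
    """
)


def build(formatter_class):
    """Create a parser that differs only in its help formatter."""
    return argparse.ArgumentParser(
        prog="lm-eval list",
        epilog=EPILOG,
        formatter_class=formatter_class,
    )


raw_help = build(argparse.RawDescriptionHelpFormatter).format_help()
default_help = build(argparse.HelpFormatter).format_help()

# The raw formatter keeps the epilog's line breaks and indentation;
# the default formatter collapses whitespace and re-wraps the text.
print(raw_help)
```

Comparing `raw_help` with `default_help` makes the difference visible: only the raw variant keeps each example command on its own indented line.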
lm_eval/_cli/run.py
@@ -27,16 +27,16 @@ class Run(SubCommand):
examples:
# Basic evaluation with HuggingFace model
$ lm-eval run --model hf --model_args pretrained=gpt2 --tasks hellaswag
# Evaluate on multiple tasks with few-shot examples
$ lm-eval run --model vllm --model_args pretrained=EleutherAI/gpt-j-6B --tasks arc_easy,arc_challenge --num_fewshot 5
# Evaluation with custom generation parameters
$ lm-eval run --model hf --model_args pretrained=gpt2 --tasks lambada --gen_kwargs "temperature=0.8,top_p=0.95"
# Use configuration file
$ lm-eval run --config my_config.yaml --tasks mmlu
For more information, see: https://github.com/EleutherAI/lm-evaluation-harness
"""
),
formatter_class=argparse.RawDescriptionHelpFormatter,
...
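The `--model_args` and `--gen_kwargs` examples above pass comma-separated `key=value` strings. As a rough illustration of how such a string can be turned into a dictionary, here is a small sketch. `parse_kv_string` is a hypothetical helper written for this note; the harness's own parsing may differ in naming, type conversion, and error handling.

```python
import ast


def parse_kv_string(text):
    """Parse 'key=value,key=value' into a dict, converting simple literals."""
    result = {}
    for item in filter(None, (part.strip() for part in text.split(","))):
        key, sep, raw = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        try:
            # Numbers, booleans, and None become Python values.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything else (for example a model name) stays a string.
            value = raw
        result[key.strip()] = value
    return result


print(parse_kv_string("temperature=0.8,top_p=0.95"))
print(parse_kv_string("pretrained=EleutherAI/gpt-j-6B"))
```

Using `ast.literal_eval` rather than `eval` means only plain literals are converted, so an argument string cannot execute arbitrary code.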
lm_eval/_cli/validate.py
@@ -20,19 +20,19 @@ class Validate(SubCommand):
examples:
# Validate a single task
lm-eval validate --tasks hellaswag
# Validate multiple tasks
lm-eval validate --tasks arc_easy,arc_challenge,hellaswag
# Validate a task group
lm-eval validate --tasks mmlu
# Validate tasks with external definitions
lm-eval validate --tasks my_custom_task --include_path ./custom_tasks
# Validate tasks from multiple external paths
lm-eval validate --tasks custom_task1,custom_task2 --include_path "/path/to/tasks1:/path/to/tasks2"
validation check:
The validate command performs several checks:
• Task existence: Verifies all specified tasks are available
@@ -42,7 +42,7 @@ class Validate(SubCommand):
• Metric definitions: Verifies metric functions and aggregation methods
• Filter pipelines: Validates filter chains and their parameters
• Template rendering: Tests prompt templates with sample data
task config files:
Tasks are defined using YAML configuration files with these key sections:
• task: Task name and metadata
@@ -52,7 +52,7 @@ class Validate(SubCommand):
• metric_list: List of evaluation metrics to compute
• output_type: Type of model output (loglikelihood, generate_until, etc.)
• filter_list: Post-processing filters for model outputs
common errors:
• Missing required fields in YAML configuration
• Invalid dataset paths or missing dataset splits
@@ -61,13 +61,13 @@ class Validate(SubCommand):
• Invalid filter names or parameters
• Circular dependencies in task inheritance
• Missing external task files when using --include_path
debugging tips:
• Use --include_path to test external task definitions
• Check task configuration files for syntax errors
• Verify dataset access and authentication if needed
• Use 'lm-eval list tasks' to see available tasks
For task configuration guide, see: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/task_guide.md
"""
),
formatter_class=argparse.RawDescriptionHelpFormatter,
...
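Three of the items in the validate help text above lend themselves to a short illustration: the colon-separated `--include_path` value, missing required fields, and circular task inheritance. The sketch below is not the validate command's implementation. All three helpers are hypothetical, the choice of `REQUIRED_FIELDS` is an assumption (the help text lists key sections but does not say which are mandatory), and inheritance is modeled as a plain child-to-parent mapping.

```python
# Assumed for illustration only; the source does not state which fields are required.
REQUIRED_FIELDS = ("task", "output_type", "metric_list")


def split_include_path(value, sep=":"):
    """Split a value such as '/path/to/tasks1:/path/to/tasks2' into directories."""
    return [path for path in value.split(sep) if path]


def missing_fields(config, required=REQUIRED_FIELDS):
    """Return the required keys that are absent from a task config dict."""
    return [name for name in required if name not in config]


def find_cycle(parents, start):
    """Follow child -> parent links from `start`; return the cycle path or None."""
    seen = []
    node = start
    while node is not None:
        if node in seen:
            # Report the loop from its first occurrence back to itself.
            return seen[seen.index(node):] + [node]
        seen.append(node)
        node = parents.get(node)
    return None


print(split_include_path("/path/to/tasks1:/path/to/tasks2"))
print(missing_fields({"task": "my_custom_task", "output_type": "generate_until"}))
print(find_cycle({"a": "b", "b": "c", "c": "a"}, "a"))
```

Returning lists of problems instead of raising on the first one lets a validator report every issue in a single pass.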