gaoqiong / lm-evaluation-harness, commit b58e5556
Authored Jul 27, 2025 by Baber
Parents: 6e1866f5, 4f8195f1

Merge branch 'main' into tasklist

# Conflicts:
#	pyproject.toml
Changes: 340 files in the merge; showing 20 changed files with 16 additions and 36 deletions (+16 -36).
lm_eval/tasks/llama3/instruct/mmlu_de/_continuation_template_yaml   +0 -2
lm_eval/tasks/llama3/instruct/mmlu_es/_continuation_template_yaml   +0 -2
lm_eval/tasks/llama3/instruct/mmlu_fr/_continuation_template_yaml   +0 -2
lm_eval/tasks/llama3/instruct/mmlu_hi/_continuation_template_yaml   +0 -2
lm_eval/tasks/llama3/instruct/mmlu_it/_continuation_template_yaml   +0 -2
lm_eval/tasks/llama3/instruct/mmlu_pro/_default_template_yaml       +0 -2
lm_eval/tasks/llama3/instruct/mmlu_pt/_continuation_template_yaml   +0 -2
lm_eval/tasks/llama3/instruct/mmlu_th/_continuation_template_yaml   +0 -2
lm_eval/tasks/logiqa/logiqa.yaml                                    +0 -2
lm_eval/tasks/logiqa2/logieval.yaml                                 +0 -2
lm_eval/tasks/meddialog/utils.py                                    +3 -1
lm_eval/tasks/mediqa_qa2019/mediqa_qa2019_perplexity.yaml           +0 -2
lm_eval/tasks/mediqa_qa2019/utils.py                                +3 -1
lm_eval/tasks/medtext/utils.py                                      +3 -1
lm_eval/tasks/meqsum/utils.py                                       +3 -1
lm_eval/tasks/mimic_repsum/utils.py                                 +3 -1
lm_eval/tasks/minerva_math/minerva_math_algebra.yaml                +1 -3
lm_eval/tasks/mlqa/mlqa_common_yaml                                 +0 -2
lm_eval/tasks/mmlu/continuation/_continuation_template_yaml         +0 -2
lm_eval/tasks/mmlu/default/_default_template_yaml                   +0 -2
lm_eval/tasks/llama3/instruct/mmlu_de/_continuation_template_yaml

@@ -28,5 +28,3 @@ filter_list:
       - function: take_first
 metadata:
   version: 1.0
-dataset_kwargs:
-  trust_remote_code: true
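Every `+0 -2` YAML hunk on this page deletes the same two lines. In Hugging Face `datasets`, `trust_remote_code=True` is the `load_dataset` flag that permits executing a dataset repository's custom loading script. A minimal sketch of how a task's `dataset_kwargs` stanza would be merged into the loader call; the helper function and the example `dataset_path` are illustrative, not the harness's actual code:

```python
# Sketch (assumption): mirrors how a task YAML's optional dataset_kwargs
# would be forwarded to datasets.load_dataset. Illustrative only.

def build_load_kwargs(task_config: dict) -> dict:
    """Merge a task's optional dataset_kwargs into the load_dataset arguments."""
    kwargs = {"path": task_config["dataset_path"]}
    kwargs.update(task_config.get("dataset_kwargs", {}))
    return kwargs

# Before this commit, the templates carried the now-removed stanza:
old_cfg = {
    "dataset_path": "hails/mmlu_no_train",  # illustrative dataset path
    "dataset_kwargs": {"trust_remote_code": True},
}
# After the commit, the flag is simply absent:
new_cfg = {"dataset_path": "hails/mmlu_no_train"}

assert build_load_kwargs(old_cfg)["trust_remote_code"] is True
assert "trust_remote_code" not in build_load_kwargs(new_cfg)
```

With the stanza gone, `load_dataset` falls back to its default behavior for remote code, so tasks whose datasets ship a loading script would prompt or fail rather than run it silently.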
lm_eval/tasks/llama3/instruct/mmlu_es/_continuation_template_yaml

@@ -28,5 +28,3 @@ filter_list:
       - function: take_first
 metadata:
   version: 1.0
-dataset_kwargs:
-  trust_remote_code: true
lm_eval/tasks/llama3/instruct/mmlu_fr/_continuation_template_yaml

@@ -28,5 +28,3 @@ filter_list:
       - function: take_first
 metadata:
   version: 1.0
-dataset_kwargs:
-  trust_remote_code: true
lm_eval/tasks/llama3/instruct/mmlu_hi/_continuation_template_yaml

@@ -28,5 +28,3 @@ filter_list:
       - function: take_first
 metadata:
   version: 1.0
-dataset_kwargs:
-  trust_remote_code: true
lm_eval/tasks/llama3/instruct/mmlu_it/_continuation_template_yaml

@@ -28,5 +28,3 @@ filter_list:
       - function: take_first
 metadata:
   version: 1.0
-dataset_kwargs:
-  trust_remote_code: true
lm_eval/tasks/llama3/instruct/mmlu_pro/_default_template_yaml

@@ -31,5 +31,3 @@ filter_list:
       - function: take_first
 metadata:
   version: 1.0
-dataset_kwargs:
-  trust_remote_code: true
lm_eval/tasks/llama3/instruct/mmlu_pt/_continuation_template_yaml

@@ -28,5 +28,3 @@ filter_list:
       - function: take_first
 metadata:
   version: 1.0
-dataset_kwargs:
-  trust_remote_code: true
lm_eval/tasks/llama3/instruct/mmlu_th/_continuation_template_yaml

@@ -28,5 +28,3 @@ filter_list:
       - function: take_first
 metadata:
   version: 1.0
-dataset_kwargs:
-  trust_remote_code: true
lm_eval/tasks/logiqa/logiqa.yaml

@@ -19,5 +19,3 @@ metric_list:
     higher_is_better: true
 metadata:
   version: 1.0
-dataset_kwargs:
-  trust_remote_code: true
lm_eval/tasks/logiqa2/logieval.yaml

@@ -25,5 +25,3 @@ filter_list:
       - function: "take_first"
 metadata:
   version: 0.0
-dataset_kwargs:
-  trust_remote_code: true
lm_eval/tasks/meddialog/utils.py

@@ -11,7 +11,9 @@ try:
 except (ModuleNotFoundError, ImportError):
     raise ModuleNotFoundError(
-        "Please install evaluation metrics via pip install evaluate and pip install bert-score",
+        "Please install evaluation metrics via pip install evaluate bert-score "
+        "rouge_score>=0.1.2 nltk absl-py "
+        "git+https://github.com/google-research/bleurt.git"
     )
 except Exception as e:
     raise RuntimeError(
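The five `utils.py` hunks all make the same change: the old single install-hint string is replaced by three adjacent string literals, which Python concatenates at compile time into one message listing every metrics dependency. A minimal sketch of the pattern; the guard function and its call are illustrative, only the concatenation behavior is the point:

```python
# Sketch (assumption): an optional-dependency guard in the style of the
# utils.py files above. The function name is illustrative.
def require_metrics_deps(available: bool) -> None:
    if not available:
        # Adjacent string literals are joined at compile time, so the three
        # added lines in the diff form one single-line error message.
        raise ModuleNotFoundError(
            "Please install evaluation metrics via pip install evaluate bert-score "
            "rouge_score>=0.1.2 nltk absl-py "
            "git+https://github.com/google-research/bleurt.git"
        )

try:
    require_metrics_deps(available=False)
except ModuleNotFoundError as e:
    msg = str(e)

# The fragments merge seamlessly because each (except the last) ends in a space.
assert "bert-score rouge_score>=0.1.2 nltk absl-py git+" in msg
```

Note the trailing comma on the removed line: the old call passed the string as one argument too, so the change is purely to the message text, not the exception signature.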
lm_eval/tasks/mediqa_qa2019/mediqa_qa2019_perplexity.yaml

@@ -23,5 +23,3 @@ metric_list:
     higher_is_better: false
 metadata:
   version: 1.0
-dataset_kwargs:
-  trust_remote_code: true
lm_eval/tasks/mediqa_qa2019/utils.py

@@ -11,7 +11,9 @@ try:
 except (ModuleNotFoundError, ImportError):
     raise ModuleNotFoundError(
-        "Please install evaluation metrics via pip install evaluate and pip install bert-score",
+        "Please install evaluation metrics via pip install evaluate bert-score "
+        "rouge_score>=0.1.2 nltk absl-py "
+        "git+https://github.com/google-research/bleurt.git"
     )
 except Exception as e:
     raise RuntimeError(
lm_eval/tasks/medtext/utils.py

@@ -11,7 +11,9 @@ try:
 except (ModuleNotFoundError, ImportError):
     raise ModuleNotFoundError(
-        "Please install evaluation metrics via pip install evaluate and pip install bert-score",
+        "Please install evaluation metrics via pip install evaluate bert-score "
+        "rouge_score>=0.1.2 nltk absl-py "
+        "git+https://github.com/google-research/bleurt.git"
     )
 except Exception as e:
     raise RuntimeError(
lm_eval/tasks/meqsum/utils.py

@@ -11,7 +11,9 @@ try:
 except (ModuleNotFoundError, ImportError):
     raise ModuleNotFoundError(
-        "Please install evaluation metrics via pip install evaluate and pip install bert-score",
+        "Please install evaluation metrics via pip install evaluate bert-score "
+        "rouge_score>=0.1.2 nltk absl-py "
+        "git+https://github.com/google-research/bleurt.git"
     )
 except Exception as e:
     raise RuntimeError(
lm_eval/tasks/mimic_repsum/utils.py

@@ -15,7 +15,9 @@ try:
 except (ModuleNotFoundError, ImportError):
     raise ModuleNotFoundError(
-        "Please install evaluation metrics via pip install evaluate and pip install bert-score",
+        "Please install evaluation metrics via pip install evaluate bert-score "
+        "rouge_score>=0.1.2 nltk absl-py radgraph"
+        "git+https://github.com/google-research/bleurt.git"
     )
 except Exception as e:
     raise RuntimeError(
lm_eval/tasks/minerva_math/minerva_math_algebra.yaml

@@ -7,7 +7,7 @@ dataset_name: algebra
 output_type: generate_until
 training_split: train
 test_split: test
 doc_to_text: !function utils.doc_to_text
-doc_to_text: !function utils.doc_to_text
+process_results: !function utils.process_results
 doc_to_target: "{{answer if few_shot is undefined else solution}}"
 generation_kwargs:

@@ -25,8 +25,6 @@ metric_list:
 num_fewshot: 4
 metadata:
   version: 2.0
-dataset_kwargs:
-  trust_remote_code: true
 fewshot_config:
   sampler: first_n
   samples: !function utils.list_fewshot_samples
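The Jinja expression in `doc_to_target` yields the short final answer for scored test docs but the full worked solution for few-shot examples, where `few_shot` is defined in the render context. A Python equivalent of that branch; the `answer`/`solution` field names come from the template above, while modeling the `few_shot` marker as a plain dict key is an illustrative simplification of Jinja's `is undefined` test:

```python
# Illustrative Python equivalent of the Jinja template
# "{{answer if few_shot is undefined else solution}}". In the harness the
# template is rendered by Jinja; this helper only mirrors its branch logic.
def doc_to_target(doc: dict) -> str:
    # Scored test docs carry no "few_shot" marker, so the short final answer
    # is the target; few-shot example docs use the full worked solution.
    return doc["answer"] if "few_shot" not in doc else doc["solution"]

test_doc = {"answer": "42", "solution": "Expand, collect terms... the answer is 42."}
fewshot_doc = {**test_doc, "few_shot": 1}

assert doc_to_target(test_doc) == "42"
assert doc_to_target(fewshot_doc).startswith("Expand")
```

This matters for few-shot prompting: the in-context examples show full chain-of-thought solutions, while the model is scored against the short answer.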
lm_eval/tasks/mlqa/mlqa_common_yaml

 dataset_path: facebook/mlqa
-dataset_kwargs:
-  trust_remote_code: true
 test_split: test
 validation_split: validation
 output_type: generate_until
lm_eval/tasks/mmlu/continuation/_continuation_template_yaml

@@ -9,5 +9,3 @@ doc_to_choice: "{{choices}}"
 doc_to_target: "{{answer}}"
 metadata:
   version: 1.0
-dataset_kwargs:
-  trust_remote_code: true
lm_eval/tasks/mmlu/default/_default_template_yaml

@@ -13,5 +13,3 @@ metric_list:
     higher_is_better: true
 metadata:
   version: 1.0
-dataset_kwargs:
-  trust_remote_code: true