gaoqiong / lm-evaluation-harness
Commit 470059f6, authored Nov 24, 2023 by lintangsutawika
merge conflict
Parents: b8d7d6c3, 9d030712
Showing 20 changed files with 265 additions and 7 deletions
lm_eval/tasks/logiqa2/logieval.yaml  +1 -1
lm_eval/tasks/mgsm/direct/direct_yaml  +1 -1
lm_eval/tasks/mgsm/en_cot/cot_yaml  +1 -1
lm_eval/tasks/mgsm/native_cot/cot_yaml  +1 -1
lm_eval/tasks/minerva_math/README.md  +1 -1
lm_eval/tasks/minerva_math/minerva_math_algebra.yaml  +1 -1
lm_eval/tasks/minerva_math/utils.py  +1 -1
lm_eval/tasks/mmlu/_generate_configs.py  +159 -0
lm_eval/tasks/mmlu/default/_default_template_yaml  +13 -0
lm_eval/tasks/mmlu/default/_mmlu.yaml  +6 -0
lm_eval/tasks/mmlu/default/mmlu_abstract_algebra.yaml  +8 -0
lm_eval/tasks/mmlu/default/mmlu_anatomy.yaml  +8 -0
lm_eval/tasks/mmlu/default/mmlu_astronomy.yaml  +8 -0
lm_eval/tasks/mmlu/default/mmlu_business_ethics.yaml  +8 -0
lm_eval/tasks/mmlu/default/mmlu_clinical_knowledge.yaml  +8 -0
lm_eval/tasks/mmlu/default/mmlu_college_biology.yaml  +8 -0
lm_eval/tasks/mmlu/default/mmlu_college_chemistry.yaml  +8 -0
lm_eval/tasks/mmlu/default/mmlu_college_computer_science.yaml  +8 -0
lm_eval/tasks/mmlu/default/mmlu_college_mathematics.yaml  +8 -0
lm_eval/tasks/mmlu/default/mmlu_college_medicine.yaml  +8 -0
Too many changes to show. To preserve performance, only 1000 of 1000+ files are displayed.
lm_eval/tasks/logiqa2/logieval.yaml
 task: logieval
 dataset_path: baber/logiqa2
 dataset_name: logieval
-output_type: greedy_until
+output_type: generate_until
 training_split: train
 test_split: test
 # Instructions + {content}
 ...
lm_eval/tasks/mgsm/direct/direct_yaml
@@ -4,7 +4,7 @@
 group: mgsm_direct
 dataset_path: juletxara/mgsm
 dataset_name: null # Overridden by language-specific config.
-output_type: greedy_until
+output_type: generate_until
 training_split: train
 test_split: test
 target_delimiter: ""
 ...
lm_eval/tasks/mgsm/en_cot/cot_yaml
@@ -4,7 +4,7 @@
 group: mgsm_cot_native
 dataset_path: juletxara/mgsm
 dataset_name: null # Overridden by language-specific config.
-output_type: greedy_until
+output_type: generate_until
 training_split: train
 test_split: test
 target_delimiter: ""
 ...
lm_eval/tasks/mgsm/native_cot/cot_yaml
@@ -4,7 +4,7 @@
 group: mgsm_cot_native
 dataset_path: juletxara/mgsm
 dataset_name: null # Overridden by language-specific config.
-output_type: greedy_until
+output_type: generate_until
 training_split: train
 test_split: test
 target_delimiter: ""
 ...
lm_eval/tasks/minerva_math/README.md
@@ -37,7 +37,7 @@ Eprint = {arXiv:2206.14858},
 #### Groups
 - `math_word_problems`
-- `greedy_until`
+- `generate_until`
 #### Tasks
 ...
lm_eval/tasks/minerva_math/minerva_math_algebra.yaml
@@ -4,7 +4,7 @@ task: minerva_math_algebra
 dataset_path: EleutherAI/hendrycks_math
 process_docs: !function utils.process_docs
 dataset_name: algebra
-output_type: greedy_until
+output_type: generate_until
 training_split: train
 test_split: test
 doc_to_text: !function utils.doc_to_text
 ...
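The same one-line rename (`greedy_until` to `generate_until`) recurs in each task config above. A minimal sketch of applying it to YAML text in bulk, assuming the value is unquoted and sits on its own `output_type:` line as in these diffs. `rename_output_type` is a hypothetical helper for illustration, not part of lm-evaluation-harness:

```python
import re

# Matches an `output_type: greedy_until` line, keeping its indentation and
# any trailing comment. Assumption: the value is unquoted, as in the diffs above.
_PATTERN = re.compile(
    r"^([ \t]*output_type:[ \t]*)greedy_until([ \t]*(?:#.*)?)$",
    re.MULTILINE,
)


def rename_output_type(yaml_text: str) -> str:
    """Return yaml_text with `greedy_until` replaced by `generate_until`."""
    return _PATTERN.sub(r"\1generate_until\2", yaml_text)


if __name__ == "__main__":
    before = "task: logieval\noutput_type: greedy_until\ntest_split: test\n"
    print(rename_output_type(before))
```

Working on the text rather than a parsed YAML document keeps comments and key order intact, which matters for files such as the mgsm templates that carry inline comments.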
lm_eval/tasks/minerva_math/utils.py
 import datasets
 import re
 import signal
-from lm_eval.logger import eval_logger
+from lm_eval.utils import eval_logger
 from typing import Optional, List, Dict
 try:
 ...
lm_eval/tasks/mmlu/_generate_configs.py
0 → 100644
"""
Take in a YAML, and output all "other" splits with this YAML
"""
import os
import yaml
import argparse

from tqdm import tqdm

from lm_eval import utils
from lm_eval.logger import eval_logger

SUBJECTS = {
    "abstract_algebra": "stem",
    "anatomy": "stem",
    "astronomy": "stem",
    "business_ethics": "other",
    "clinical_knowledge": "other",
    "college_biology": "stem",
    "college_chemistry": "stem",
    "college_computer_science": "stem",
    "college_mathematics": "stem",
    "college_medicine": "other",
    "college_physics": "stem",
    "computer_security": "stem",
    "conceptual_physics": "stem",
    "econometrics": "social_sciences",
    "electrical_engineering": "stem",
    "elementary_mathematics": "stem",
    "formal_logic": "humanities",
    "global_facts": "other",
    "high_school_biology": "stem",
    "high_school_chemistry": "stem",
    "high_school_computer_science": "stem",
    "high_school_european_history": "humanities",
    "high_school_geography": "social_sciences",
    "high_school_government_and_politics": "social_sciences",
    "high_school_macroeconomics": "social_sciences",
    "high_school_mathematics": "stem",
    "high_school_microeconomics": "social_sciences",
    "high_school_physics": "stem",
    "high_school_psychology": "social_sciences",
    "high_school_statistics": "stem",
    "high_school_us_history": "humanities",
    "high_school_world_history": "humanities",
    "human_aging": "other",
    "human_sexuality": "social_sciences",
    "international_law": "humanities",
    "jurisprudence": "humanities",
    "logical_fallacies": "humanities",
    "machine_learning": "stem",
    "management": "other",
    "marketing": "other",
    "medical_genetics": "other",
    "miscellaneous": "other",
    "moral_disputes": "humanities",
    "moral_scenarios": "humanities",
    "nutrition": "other",
    "philosophy": "humanities",
    "prehistory": "humanities",
    "professional_accounting": "other",
    "professional_law": "humanities",
    "professional_medicine": "other",
    "professional_psychology": "social_sciences",
    "public_relations": "social_sciences",
    "security_studies": "social_sciences",
    "sociology": "social_sciences",
    "us_foreign_policy": "social_sciences",
    "virology": "other",
    "world_religions": "humanities",
}

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--base_yaml_path", required=True)
    parser.add_argument("--save_prefix_path", default="mmlu")
    parser.add_argument("--cot_prompt_path", default=None)
    parser.add_argument("--task_prefix", default="")
    parser.add_argument("--group_prefix", default="")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()

    # get filename of base_yaml so we can `"include": ` it in our "other" YAMLs.
    base_yaml_name = os.path.split(args.base_yaml_path)[-1]
    with open(args.base_yaml_path) as f:
        base_yaml = yaml.full_load(f)

    if args.cot_prompt_path is not None:
        import json

        with open(args.cot_prompt_path) as f:
            cot_file = json.load(f)

    ALL_CATEGORIES = []
    for subject, category in tqdm(SUBJECTS.items()):
        if category not in ALL_CATEGORIES:
            ALL_CATEGORIES.append(category)

        if args.cot_prompt_path is not None:
            description = cot_file[subject]
        else:
            description = f"The following are multiple choice questions (with answers) about {' '.join(subject.split('_'))}.\n\n"

        yaml_dict = {
            "include": base_yaml_name,
            "group": f"mmlu_{args.task_prefix}_{category}"
            if args.task_prefix != ""
            else f"mmlu_{category}",
            "group_alias": category.replace("_", " "),
            "task": f"mmlu_{args.task_prefix}_{subject}"
            if args.task_prefix != ""
            else f"mmlu_{subject}",
            "task_alias": subject.replace("_", " "),
            "dataset_name": subject,
            "description": description,
        }

        file_save_path = args.save_prefix_path + f"_{subject}.yaml"
        eval_logger.info(f"Saving yaml for subset {subject} to {file_save_path}")
        with open(file_save_path, "w") as yaml_file:
            yaml.dump(
                yaml_dict,
                yaml_file,
                # width=float("inf"),
                allow_unicode=True,
                default_style='"',
            )

    if args.task_prefix != "":
        mmlu_subcategories = [
            f"mmlu_{args.task_prefix}_{category}" for category in ALL_CATEGORIES
        ]
    else:
        mmlu_subcategories = [f"mmlu_{category}" for category in ALL_CATEGORIES]

    if args.group_prefix != "":
        file_save_path = args.group_prefix + ".yaml"
    else:
        file_save_path = args.save_prefix_path + ".yaml"

    eval_logger.info(f"Saving benchmark config to {file_save_path}")
    with open(file_save_path, "w") as yaml_file:
        yaml.dump(
            {
                "group": f"mmlu_{args.task_prefix}"
                if args.task_prefix != ""
                else "mmlu",
                "task": mmlu_subcategories,
            },
            yaml_file,
            indent=4,
            default_flow_style=False,
        )
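
To make the naming scheme in the loop above easier to follow, here is a minimal sketch of the per-subject dictionary it builds, pulled out into a standalone function. `build_task_config` is a hypothetical name introduced for illustration; the script itself builds the dict inline and writes it with `yaml.dump`, which this sketch leaves out so that it runs with the standard library only:

```python
def build_task_config(subject, category, base_yaml_name, task_prefix=""):
    """Build the per-subject config dict following the script's naming scheme."""
    # With a task prefix, names look like mmlu_<prefix>_<name>; otherwise mmlu_<name>.
    prefix = f"mmlu_{task_prefix}_" if task_prefix != "" else "mmlu_"
    readable = " ".join(subject.split("_"))
    return {
        "include": base_yaml_name,
        "group": prefix + category,
        "group_alias": category.replace("_", " "),
        "task": prefix + subject,
        "task_alias": subject.replace("_", " "),
        "dataset_name": subject,
        "description": (
            "The following are multiple choice questions (with answers) "
            f"about {readable}.\n\n"
        ),
    }


if __name__ == "__main__":
    config = build_task_config("college_biology", "stem", "_default_template_yaml")
    for key, value in config.items():
        print(f"{key}: {value!r}")
```

Passing a non-empty `task_prefix` (for example a prompt-variant name) changes both the task and group names, so variants of the benchmark can coexist without name clashes.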
lm_eval/tasks/mmlu/default/_default_template_yaml
0 → 100644
dataset_path: hails/mmlu_no_train # a copy of `cais/mmlu` with no auxiliary_train split
test_split: test
fewshot_split: dev
fewshot_config:
  sampler: first_n
output_type: multiple_choice
doc_to_text: "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nAnswer:"
doc_to_choice: ["A", "B", "C", "D"]
doc_to_target: answer
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
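
As a rough illustration of what the `doc_to_text` template above produces, the sketch below renders one document with plain Python string formatting. It is an approximation, not the harness's actual template rendering: `render_prompt` and the example document are invented for illustration, and it assumes each document has a `question` string and exactly four `choices`.

```python
def render_prompt(doc):
    """Approximate the `doc_to_text` template with plain string formatting."""
    question = doc["question"].strip()
    # Label the four choices A-D, one per line, as the template does.
    options = "\n".join(
        f"{letter}. {choice}" for letter, choice in zip("ABCD", doc["choices"])
    )
    return f"{question}\n{options}\nAnswer:"


if __name__ == "__main__":
    # Invented example document, not taken from the dataset.
    example_doc = {
        "question": "  What is 2 + 2? ",
        "choices": ["3", "4", "5", "6"],
    }
    print(render_prompt(example_doc))
```

The prompt ends at `Answer:` with no trailing space, leaving the answer letter from `doc_to_choice` as the continuation to be scored.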
lm_eval/tasks/mmlu/default/_mmlu.yaml
0 → 100644
group: mmlu
task:
  - mmlu_stem
  - mmlu_other
  - mmlu_social_sciences
  - mmlu_humanities
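
The order of the four groups listed here matches the order in which categories first appear in the `SUBJECTS` mapping of `_generate_configs.py`. A minimal sketch of that first-seen collection, using a small subset of the mapping; `collect_groups` and `SUBJECTS_SAMPLE` are hypothetical names for illustration:

```python
# Subset of the SUBJECTS mapping from _generate_configs.py, in source order.
SUBJECTS_SAMPLE = {
    "abstract_algebra": "stem",
    "anatomy": "stem",
    "astronomy": "stem",
    "business_ethics": "other",
    "clinical_knowledge": "other",
    "econometrics": "social_sciences",
    "formal_logic": "humanities",
}


def collect_groups(subjects, task_prefix=""):
    """Return group names in the order their categories are first seen."""
    categories = []
    for category in subjects.values():
        if category not in categories:
            categories.append(category)
    if task_prefix != "":
        return [f"mmlu_{task_prefix}_{category}" for category in categories]
    return [f"mmlu_{category}" for category in categories]


if __name__ == "__main__":
    print(collect_groups(SUBJECTS_SAMPLE))
```

Because Python dictionaries preserve insertion order, the resulting list is deterministic and depends only on how the mapping is written.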
lm_eval/tasks/mmlu/default/mmlu_abstract_algebra.yaml
0 → 100644
"dataset_name": "abstract_algebra"
"description": "The following are multiple choice questions (with answers) about abstract\
  \ algebra.\n\n"
"group": "mmlu_stem"
"group_alias": "stem"
"include": "_default_template_yaml"
"task": "mmlu_abstract_algebra"
"task_alias": "abstract_algebra"
lm_eval/tasks/mmlu/default/mmlu_anatomy.yaml
0 → 100644
"dataset_name": "anatomy"
"description": "The following are multiple choice questions (with answers) about anatomy.\n\
  \n"
"group": "mmlu_stem"
"group_alias": "stem"
"include": "_default_template_yaml"
"task": "mmlu_anatomy"
"task_alias": "anatomy"
lm_eval/tasks/mmlu/default/mmlu_astronomy.yaml
0 → 100644
"dataset_name": "astronomy"
"description": "The following are multiple choice questions (with answers) about astronomy.\n\
  \n"
"group": "mmlu_stem"
"group_alias": "stem"
"include": "_default_template_yaml"
"task": "mmlu_astronomy"
"task_alias": "astronomy"
lm_eval/tasks/mmlu/default/mmlu_business_ethics.yaml
0 → 100644
"dataset_name": "business_ethics"
"description": "The following are multiple choice questions (with answers) about business\
  \ ethics.\n\n"
"group": "mmlu_other"
"group_alias": "other"
"include": "_default_template_yaml"
"task": "mmlu_business_ethics"
"task_alias": "business_ethics"
lm_eval/tasks/mmlu/default/mmlu_clinical_knowledge.yaml
0 → 100644
"dataset_name": "clinical_knowledge"
"description": "The following are multiple choice questions (with answers) about clinical\
  \ knowledge.\n\n"
"group": "mmlu_other"
"group_alias": "other"
"include": "_default_template_yaml"
"task": "mmlu_clinical_knowledge"
"task_alias": "clinical_knowledge"
lm_eval/tasks/mmlu/default/mmlu_college_biology.yaml
0 → 100644
"dataset_name": "college_biology"
"description": "The following are multiple choice questions (with answers) about college\
  \ biology.\n\n"
"group": "mmlu_stem"
"group_alias": "stem"
"include": "_default_template_yaml"
"task": "mmlu_college_biology"
"task_alias": "college_biology"
lm_eval/tasks/mmlu/default/mmlu_college_chemistry.yaml
0 → 100644
"dataset_name": "college_chemistry"
"description": "The following are multiple choice questions (with answers) about college\
  \ chemistry.\n\n"
"group": "mmlu_stem"
"group_alias": "stem"
"include": "_default_template_yaml"
"task": "mmlu_college_chemistry"
"task_alias": "college_chemistry"
lm_eval/tasks/mmlu/default/mmlu_college_computer_science.yaml
0 → 100644
"dataset_name": "college_computer_science"
"description": "The following are multiple choice questions (with answers) about college\
  \ computer science.\n\n"
"group": "mmlu_stem"
"group_alias": "stem"
"include": "_default_template_yaml"
"task": "mmlu_college_computer_science"
"task_alias": "college_computer_science"
lm_eval/tasks/mmlu/default/mmlu_college_mathematics.yaml
0 → 100644
"dataset_name": "college_mathematics"
"description": "The following are multiple choice questions (with answers) about college\
  \ mathematics.\n\n"
"group": "mmlu_stem"
"group_alias": "stem"
"include": "_default_template_yaml"
"task": "mmlu_college_mathematics"
"task_alias": "college_mathematics"
lm_eval/tasks/mmlu/default/mmlu_college_medicine.yaml
0 → 100644
"dataset_name": "college_medicine"
"description": "The following are multiple choice questions (with answers) about college\
  \ medicine.\n\n"
"group": "mmlu_other"
"group_alias": "other"
"include": "_default_template_yaml"
"task": "mmlu_college_medicine"
"task_alias": "college_medicine"