gaoqiong / lm-evaluation-harness

Commit 601be343, authored Jun 23, 2025 by Baber

Merge branch 'main' into feature/eval_from_config

Parents: d0884a96, 68c3a811
Showing 20 changed files with 122 additions and 253 deletions.
- lm_eval/tasks/afrimgsm/en_cot/afrimgsm_en_cot_twi.yaml (+0, -12)
- lm_eval/tasks/afrimgsm/en_cot/afrimgsm_en_cot_wol.yaml (+0, -12)
- lm_eval/tasks/afrimgsm/en_cot/afrimgsm_en_cot_xho.yaml (+0, -12)
- lm_eval/tasks/afrimgsm/en_cot/afrimgsm_en_cot_yor.yaml (+0, -12)
- lm_eval/tasks/afrimgsm/en_cot/afrimgsm_en_cot_zul.yaml (+0, -12)
- lm_eval/tasks/afrimgsm/en_cot/cot_yaml (+0, -37)
- lm_eval/tasks/afrimgsm/gen_utils.py (+122, -0)
- lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_amh.yaml (+0, -12)
- lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_eng.yaml (+0, -12)
- lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_ewe.yaml (+0, -12)
- lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_fra.yaml (+0, -12)
- lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_hau.yaml (+0, -12)
- lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_ibo.yaml (+0, -12)
- lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_kin.yaml (+0, -12)
- lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_lin.yaml (+0, -12)
- lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_lug.yaml (+0, -12)
- lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_orm.yaml (+0, -12)
- lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_sna.yaml (+0, -12)
- lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_sot.yaml (+0, -12)
- lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_swa.yaml (+0, -12)
Too many changes to show. To preserve performance, only 1000 of 1000+ files are displayed.
lm_eval/tasks/afrimgsm/en_cot/afrimgsm_en_cot_twi.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: twi
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nStep-by-Step Answer:"}}{% else %}{{"Question: "+question+"\nStep-by-Step Answer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: cot_yaml
task: afrimgsm_en_cot_twi
```
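The `doc_to_text` and `doc_to_target` templates above branch on whether a document carries a worked `answer`. The plain-Python sketch below restates that branching so it can be run without Jinja. It assumes each document is a dict with the `question`, `answer` and `answer_number` fields the templates reference; the sample documents are invented. The default `prefix_len=21` matches the slice in this file, and 21 is exactly the length of the English prefix "Step-by-Step Answer: ", which is presumably what the slice removes (an assumption; the yor config uses 16, so the offset depends on each language's data).

```python
def doc_to_text(doc: dict) -> str:
    # Answered docs (few-shot exemplars) are used as-is; unanswered test docs
    # get a "Question: " prefix. Both end with the step-by-step cue.
    cue = "\nStep-by-Step Answer:"
    if doc.get("answer") is not None:
        return doc["question"] + cue
    return "Question: " + doc["question"] + cue


def doc_to_target(doc: dict, prefix_len: int = 21) -> str:
    # Drop the first `prefix_len` characters of the worked answer,
    # otherwise fall back to the numeric answer rendered as a string.
    if doc.get("answer") is not None:
        return doc["answer"][prefix_len:]
    return str(doc["answer_number"])


# Invented documents, for illustration only.
fewshot_doc = {
    "question": "Question: Ama has 3 pens and buys 2 more. How many pens does she have?",
    "answer": "Step-by-Step Answer: Ama has 3 + 2 = 5 pens. The answer is 5.",
    "answer_number": 5,
}
test_doc = {
    "question": "Kofi has 4 mangoes and eats 1. How many are left?",
    "answer": None,
    "answer_number": 3,
}

print(doc_to_target(fewshot_doc))  # Ama has 3 + 2 = 5 pens. The answer is 5.
print(doc_to_text(test_doc))
print(doc_to_target(test_doc))  # 3
```

The same branching applies to every language file in this commit; only `dataset_name`, `task` and the slice offset change.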
lm_eval/tasks/afrimgsm/en_cot/afrimgsm_en_cot_wol.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: wol
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nStep-by-Step Answer:"}}{% else %}{{"Question: "+question+"\nStep-by-Step Answer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: cot_yaml
task: afrimgsm_en_cot_wol
```
lm_eval/tasks/afrimgsm/en_cot/afrimgsm_en_cot_xho.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: xho
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nStep-by-Step Answer:"}}{% else %}{{"Question: "+question+"\nStep-by-Step Answer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: cot_yaml
task: afrimgsm_en_cot_xho
```
lm_eval/tasks/afrimgsm/en_cot/afrimgsm_en_cot_yor.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: yor
doc_to_target: '{% if answer is not none %}{{answer[16:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nStep-by-Step Answer:"}}{% else %}{{"Question: "+question+"\nStep-by-Step Answer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: cot_yaml
task: afrimgsm_en_cot_yor
```
lm_eval/tasks/afrimgsm/en_cot/afrimgsm_en_cot_zul.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: zul
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nStep-by-Step Answer:"}}{% else %}{{"Question: "+question+"\nStep-by-Step Answer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: cot_yaml
task: afrimgsm_en_cot_zul
```
lm_eval/tasks/afrimgsm/en_cot/cot_yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# This file will be included in the generated language-specific task configs.
# It doesn't have a yaml file extension as it is not meant to be imported directly by the harness.
tag:
  - afrimgsm
  - afrimgsm_en_cot
dataset_path: masakhane/afrimgsm
dataset_name: null # Overridden by language-specific config.
output_type: generate_until
training_split: train
test_split: test
generation_kwargs:
  until:
    - "\n\n"
    - "\n"
  do_sample: false
  temperature: 0.0
target_delimiter: " "
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
filter_list:
  - name: "strict-match"
    filter:
      - function: "regex"
        regex_pattern: "The answer is (\\-?[0-9\\.\\,]+)"
      - function: "take_first"
  - filter:
      - function: regex
        group_select: -1
        regex_pattern: (-?[$0-9.,]{2,})|(-?[0-9]+)
      - function: take_first
    name: flexible-extract
metadata:
  version: 2.0
```
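The two `filter_list` entries decide what counts as the model's answer before `exact_match` is computed. Below is a simplified Python sketch of the regex step, modeled on (not copied from) the harness's regex filter: `group_select` picks which match to keep (0 for the first, -1 for the last), and for the two-alternative flexible pattern the first non-empty captured group is used. The `"[invalid]"` fallback string is an assumption, and `take_first` is left out because it only keeps the first of several generations per document.

```python
import re

# Patterns from filter_list, as raw strings (the YAML double-quoted
# "\\-" becomes the regex escape "\-").
STRICT = re.compile(r"The answer is (\-?[0-9\.\,]+)")
FLEXIBLE = re.compile(r"(-?[$0-9.,]{2,})|(-?[0-9]+)")
FALLBACK = "[invalid]"  # assumed placeholder when nothing matches


def regex_extract(pattern: re.Pattern, response: str, group_select: int = 0) -> str:
    """Pick one regex match from a model response.

    group_select chooses which match to keep. With several alternation
    groups, findall returns tuples, so the first non-empty group is kept.
    """
    matches = pattern.findall(response)
    if not matches:
        return FALLBACK
    match = matches[group_select]
    if isinstance(match, tuple):
        non_empty = [m for m in match if m]
        match = non_empty[0] if non_empty else FALLBACK
    return match.strip()


response = "3 apples plus 18 apples gives 21. The answer is 21"
print(regex_extract(STRICT, response))                     # 21
print(regex_extract(FLEXIBLE, response, group_select=-1))  # 21
print(regex_extract(STRICT, "I think it is 21"))           # [invalid]
```

The strict filter only scores responses that follow the "The answer is N" convention, while the flexible filter falls back to the last number-like token anywhere in the response.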
lm_eval/tasks/afrimgsm/gen_utils.py: new file (0 → 100644), file @ 601be343

```python
import argparse
import os

import yaml


class FunctionTag:
    # Simple value wrapper; not used by the generator below.
    def __init__(self, value):
        self.value = value


def prompt_func(mode, lang):
    # {{question}} sits in plain (non-f) string segments, so it reaches the YAML
    # unchanged as a Jinja placeholder; only {lang} in the f-string is interpolated.
    prompt_map = {
        "prompt_4": "Answer the given question with the step by step solution appropriate numerical value, ensuring that the response is "
        "clear and without any supplementary information.\n\nQuestion: {{question}}\nStep by step answer: ",
        "prompt_5": f"For mathematical questions provided in {lang} language. Supply the accurate step by step answer to the "
        "provided question.\n\nQuestion: {{question}}\nStep by step answer: ",
    }
    return prompt_map[mode]


def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
    """
    Generate a yaml file for each language.

    :param output_dir: The directory to output the files to.
    :param overwrite: Whether to overwrite files if they already exist.
    :param mode: Prompt variant ("prompt_1" to "prompt_5"); modes above 3 add a doc_to_text override.
    """
    err = []
    languages = {
        "eng": "English",
        "amh": "Amharic",
        "ibo": "Igbo",
        "fra": "French",
        "sna": "chiShona",
        "wol": "Wolof",
        "ewe": "Ewe",
        "lin": "Lingala",
        "lug": "Luganda",
        "xho": "isiXhosa",
        "kin": "Kinyarwanda",
        "twi": "Twi",
        "zul": "Zulu",
        "orm": "Oromo",
        "yor": "Yoruba",
        "hau": "Hausa",
        "sot": "Sesotho",
        "swa": "Swahili",
        "vai": "Vai",
    }

    for lang, lang_name in languages.items():
        try:
            file_name = f"afrimgsm_cot_{lang}.yaml"
            task_name = f"afrimgsm_cot_{lang}_{mode}"
            yaml_template = "afrimgsm_cot_yaml"
            # normpath drops a trailing slash so the last directory name is the one tested.
            if "translate" in os.path.basename(os.path.normpath(output_dir)):
                file_name = f"afrimgsm_cot_translate_{lang}.yaml"
                task_name = f"afrimgsm_cot_translate_{lang}_{mode}"
                yaml_template = "afrimgsm_cot_translate_yaml"

            yaml_details = {
                "include": yaml_template,
                "task": task_name,
                "dataset_name": lang,
            }
            # prompt_4 and prompt_5 override doc_to_text; prompt_1 to prompt_3 keep the template's.
            if int(mode.split("_")[-1]) > 3:
                yaml_details["doc_to_text"] = prompt_func(mode, lang_name)

            os.makedirs(f"{output_dir}/{mode}", exist_ok=True)
            with open(
                f"{output_dir}/{mode}/{file_name}",
                "w" if overwrite else "x",  # "x" raises FileExistsError for existing files
                encoding="utf8",
            ) as f:
                f.write("# Generated by utils.py\n")
                yaml.dump(yaml_details, f, allow_unicode=True)
        except FileExistsError:
            err.append(file_name)

    if len(err) > 0:
        raise FileExistsError(
            "Files were not created because they already exist (use --overwrite flag): "
            f"{', '.join(err)}"
        )


def main() -> None:
    """Parse CLI args and generate language-specific yaml files."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--overwrite",
        default=False,  # a store_true flag that defaults to True can never be switched off
        action="store_true",
        help="Overwrite files if they already exist",
    )
    parser.add_argument(
        "--output-dir",
        default="./translate_cot",
        help="Directory to write yaml files to",
    )
    parser.add_argument(
        "--mode",
        default="prompt_5",
        choices=["prompt_1", "prompt_2", "prompt_3", "prompt_4", "prompt_5"],
        help="Prompt number",
    )
    args = parser.parse_args()
    gen_lang_yamls(output_dir=args.output_dir, overwrite=args.overwrite, mode=args.mode)


if __name__ == "__main__":
    main()
```
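To see which files and task names the generator produces for a given `--output-dir` and `--mode`, the sketch below restates its naming rules without writing anything to disk. `plan_task` and `build_prompt_5` are hypothetical helpers written for illustration; they are not part of gen_utils.py. The second one also shows why `{{question}}` survives: it lives in a plain string segment, so only `{lang}` is interpolated.

```python
import os


def plan_task(output_dir: str, mode: str, lang: str) -> tuple:
    """Return (path, task_name, include) for one language, following gen_lang_yamls."""
    prefix = "afrimgsm_cot"
    # The "translate" check looks only at the last directory name.
    if "translate" in os.path.basename(os.path.normpath(output_dir)):
        prefix = "afrimgsm_cot_translate"
    path = f"{output_dir}/{mode}/{prefix}_{lang}.yaml"
    return path, f"{prefix}_{lang}_{mode}", f"{prefix}_yaml"


def build_prompt_5(lang_name: str) -> str:
    # Only the first segment is an f-string; the Jinja braces in the
    # second segment are left untouched.
    return (
        f"For mathematical questions provided in {lang_name} language. Supply the accurate step by step answer to the "
        "provided question.\n\nQuestion: {{question}}\nStep by step answer: "
    )


print(plan_task("./translate_cot", "prompt_5", "swa"))
print(plan_task("./en_cot", "prompt_4", "yor")[1])  # afrimgsm_cot_yor_prompt_4
print("{{question}}" in build_prompt_5("Swahili"))  # True
```

Note that the generated task names (for example `afrimgsm_cot_translate_swa_prompt_5`) differ from those in the deleted files (`afrimgsm_en_cot_twi`, `afrimgsm_translate_direct_amh`).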
lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_amh.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: amh
doc_to_target: '{% if answer is not none %}{{answer[15:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer:"}}{% else %}{{"Question: "+question+"\nAnswer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: translate_direct_yaml
task: afrimgsm_translate_direct_amh
```
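Every config in this commit passes the same `until` list in `generation_kwargs`. The sketch below shows the effect of stop sequences on the returned text: the generation is cut at the earliest occurrence of any stop string. It is a simplification, assuming plain substring matching; how each model backend in the harness enforces `until` (during decoding or afterwards) is not shown, and the sample generation is invented.

```python
def truncate_at_stop(text: str, until: list) -> str:
    """Cut a generation at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in until:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


UNTIL = ["Question:", "</s>", "<|im_end|>"]
generation = " 12 + 5 = 17. The answer is 17.\n\nQuestion: Kofi has"
print(repr(truncate_at_stop(generation, UNTIL)))  # ' 12 + 5 = 17. The answer is 17.\n\n'
```

Stopping on "Question:" keeps the model from continuing into an invented follow-up question, which would otherwise add extra numbers for the flexible extractor to pick up.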
lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_eng.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: eng
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer:"}}{% else %}{{"Question: "+question+"\nAnswer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: translate_direct_yaml
task: afrimgsm_translate_direct_eng
```
lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_ewe.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: ewe
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer:"}}{% else %}{{"Question: "+question+"\nAnswer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: translate_direct_yaml
task: afrimgsm_translate_direct_ewe
```
lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_fra.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: fra
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer:"}}{% else %}{{"Question: "+question+"\nAnswer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: translate_direct_yaml
task: afrimgsm_translate_direct_fra
```
lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_hau.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: hau
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer:"}}{% else %}{{"Question: "+question+"\nAnswer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: translate_direct_yaml
task: afrimgsm_translate_direct_hau
```
lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_ibo.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: ibo
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer:"}}{% else %}{{"Question: "+question+"\nAnswer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: translate_direct_yaml
task: afrimgsm_translate_direct_ibo
```
lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_kin.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: kin
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer:"}}{% else %}{{"Question: "+question+"\nAnswer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: translate_direct_yaml
task: afrimgsm_translate_direct_kin
```
lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_lin.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: lin
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer:"}}{% else %}{{"Question: "+question+"\nAnswer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: translate_direct_yaml
task: afrimgsm_translate_direct_lin
```
lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_lug.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: lug
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer:"}}{% else %}{{"Question: "+question+"\nAnswer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: translate_direct_yaml
task: afrimgsm_translate_direct_lug
```
lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_orm.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: orm
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer:"}}{% else %}{{"Question: "+question+"\nAnswer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: translate_direct_yaml
task: afrimgsm_translate_direct_orm
```
lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_sna.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: sna
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer:"}}{% else %}{{"Question: "+question+"\nAnswer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: translate_direct_yaml
task: afrimgsm_translate_direct_sna
```
lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_sot.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: sot
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer:"}}{% else %}{{"Question: "+question+"\nAnswer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: translate_direct_yaml
task: afrimgsm_translate_direct_sot
```
lm_eval/tasks/afrimgsm/translate/afrimgsm_translate_swa.yaml: deleted (100644 → 0), file @ d0884a96

```yaml
# Generated by utils.py
dataset_name: swa
doc_to_target: '{% if answer is not none %}{{answer[21:]}}{% else %}{{answer_number|string}}{% endif %}'
doc_to_text: '{% if answer is not none %}{{question+"\nAnswer:"}}{% else %}{{"Question: "+question+"\nAnswer:"}}{% endif %}'
generation_kwargs:
  do_sample: false
  until:
  - 'Question:'
  - </s>
  - <|im_end|>
include: translate_direct_yaml
task: afrimgsm_translate_direct_swa
```