gaoqiong / lm-evaluation-harness
Commit 601be343, authored Jun 23, 2025 by Baber
Merge branch 'main' into feature/eval_from_config
Parents: d0884a96, 68c3a811
Showing 20 changed files with 180 additions and 49 deletions
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_kin.yaml  +4 -0
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_lin.yaml  +4 -0
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_lug.yaml  +4 -0
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_orm.yaml  +4 -0
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_sna.yaml  +4 -0
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_sot.yaml  +4 -0
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_swa.yaml  +4 -0
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_twi.yaml  +4 -0
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_wol.yaml  +4 -0
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_xho.yaml  +4 -0
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_yor.yaml  +4 -0
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_zul.yaml  +4 -0
lm_eval/tasks/afrimmlu/direct/prompt_5/utils.py  +29 -0
lm_eval/tasks/afrimmlu/gen_utils.py  +103 -0
lm_eval/tasks/afrimmlu/translate/afrimmlu_common_translate_yaml  +0 -34
lm_eval/tasks/afrimmlu/translate/afrimmlu_translate_amh.yaml  +0 -3
lm_eval/tasks/afrimmlu/translate/afrimmlu_translate_eng.yaml  +0 -3
lm_eval/tasks/afrimmlu/translate/afrimmlu_translate_ewe.yaml  +0 -3
lm_eval/tasks/afrimmlu/translate/afrimmlu_translate_fra.yaml  +0 -3
lm_eval/tasks/afrimmlu/translate/afrimmlu_translate_hau.yaml  +0 -3
Too many changes to show. To preserve performance only 1000 of 1000+ files are displayed.
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_kin.yaml (new file, 0 → 100644)
# Generated by utils.py
dataset_name: kin
include: afrimmlu_direct
task: afrimmlu_direct_kin_prompt_5
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_lin.yaml (new file, 0 → 100644)
# Generated by utils.py
dataset_name: lin
include: afrimmlu_direct
task: afrimmlu_direct_lin_prompt_5
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_lug.yaml (new file, 0 → 100644)
# Generated by utils.py
dataset_name: lug
include: afrimmlu_direct
task: afrimmlu_direct_lug_prompt_5
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_orm.yaml (new file, 0 → 100644)
# Generated by utils.py
dataset_name: orm
include: afrimmlu_direct
task: afrimmlu_direct_orm_prompt_5
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_sna.yaml (new file, 0 → 100644)
# Generated by utils.py
dataset_name: sna
include: afrimmlu_direct
task: afrimmlu_direct_sna_prompt_5
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_sot.yaml (new file, 0 → 100644)
# Generated by utils.py
dataset_name: sot
include: afrimmlu_direct
task: afrimmlu_direct_sot_prompt_5
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_swa.yaml (new file, 0 → 100644)
# Generated by utils.py
dataset_name: swa
include: afrimmlu_direct
task: afrimmlu_direct_swa_prompt_5
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_twi.yaml (new file, 0 → 100644)
# Generated by utils.py
dataset_name: twi
include: afrimmlu_direct
task: afrimmlu_direct_twi_prompt_5
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_wol.yaml (new file, 0 → 100644)
# Generated by utils.py
dataset_name: wol
include: afrimmlu_direct
task: afrimmlu_direct_wol_prompt_5
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_xho.yaml (new file, 0 → 100644)
# Generated by utils.py
dataset_name: xho
include: afrimmlu_direct
task: afrimmlu_direct_xho_prompt_5
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_yor.yaml (new file, 0 → 100644)
# Generated by utils.py
dataset_name: yor
include: afrimmlu_direct
task: afrimmlu_direct_yor_prompt_5
lm_eval/tasks/afrimmlu/direct/prompt_5/afrimmlu_direct_zul.yaml (new file, 0 → 100644)
# Generated by utils.py
dataset_name: zul
include: afrimmlu_direct
task: afrimmlu_direct_zul_prompt_5
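The generated files shown above share one shape, and only the language code varies. A minimal sketch of that naming pattern, following the logic of gen_utils.py in this commit; the helper name `task_config` is hypothetical, and the sketch returns the fields as a dict instead of writing YAML:

```python
def task_config(lang: str, mode: str, translate: bool = False) -> tuple[str, dict]:
    """Return (file_name, yaml_fields) for one language; hypothetical helper."""
    kind = "translate" if translate else "direct"
    file_name = f"afrimmlu_{kind}_{lang}.yaml"
    fields = {
        "dataset_name": lang,
        "include": f"afrimmlu_{kind}",
        "task": f"afrimmlu_{kind}_{lang}_{mode}",
    }
    return file_name, fields


name, fields = task_config("kin", "prompt_5")
print(name)            # afrimmlu_direct_kin.yaml
print(fields["task"])  # afrimmlu_direct_kin_prompt_5
```

The task name carries the prompt variant as a suffix, while the file name does not; the variant is encoded by the prompt_5 directory instead.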
lm_eval/tasks/afrimmlu/direct/prompt_5/utils.py (new file, 0 → 100644)
from lm_eval.utils import weighted_f1_score


def doc_to_choice(doc):
    choices = eval(doc["choices"])
    return choices


def doc_to_text(doc):
    output = """Given your proficiency in {subject}, please answer the subsequent multiple-choice question with 'A', 'B', 'C', or 'D'.
Question: {question}
Choices:
A: {choice1}
B: {choice2}
C: {choice3}
D: {choice4}
Answer: """

    choices = eval(doc["choices"])
    text = output.format(
        subject=doc["subject"],
        question=doc["question"],
        choice1=choices[0],
        choice2=choices[1],
        choice3=choices[2],
        choice4=choices[3],
    )
    return text
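Both helpers in utils.py call `eval` on `doc["choices"]`, which suggests the dataset stores the four options as the string form of a Python list. A hedged alternative, assuming that field only ever holds a list literal: `ast.literal_eval` parses the same string without executing arbitrary code. The helper name `parse_choices` and the sample document are illustrative and not part of the commit.

```python
import ast


def parse_choices(doc: dict) -> list:
    """Parse the stringified choice list without executing code."""
    choices = ast.literal_eval(doc["choices"])
    if not isinstance(choices, list) or len(choices) != 4:
        raise ValueError(f"expected a list of 4 choices, got {choices!r}")
    return choices


# Made-up sample document, shaped like the fields the prompt template uses.
sample = {
    "subject": "geography",
    "question": "Which is the largest ocean?",
    "choices": "['Atlantic', 'Indian', 'Pacific', 'Arctic']",
}
print(parse_choices(sample)[2])  # Pacific
```

A string that is not a plain literal, such as a function call, makes `ast.literal_eval` raise `ValueError` rather than run it.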
lm_eval/tasks/afrimmlu/gen_utils.py (new file, 0 → 100644)
import argparse
import os

import yaml


class FunctionTag:
    def __init__(self, value):
        self.value = value


def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
    """
    Generate a yaml file for each language.

    :param output_dir: The directory to output the files to.
    :param overwrite: Whether to overwrite files if they already exist.
    """
    err = []
    languages = {
        "eng": "English",
        "amh": "Amharic",
        "ibo": "Igbo",
        "fra": "French",
        "sna": "chiShona",
        "wol": "Wolof",
        "ewe": "Ewe",
        "lin": "Lingala",
        "lug": "Luganda",
        "xho": "isiXhosa",
        "kin": "Kinyarwanda",
        "twi": "Twi",
        "zul": "Zulu",
        "orm": "Oromo",
        "yor": "Yoruba",
        "hau": "Hausa",
        "sot": "Sesotho",
        "swa": "Swahili",
    }

    for lang in languages.keys():
        try:
            file_name = f"afrimmlu_direct_{lang}.yaml"
            task_name = f"afrimmlu_direct_{lang}_{mode}"
            yaml_template = "afrimmlu_direct"
            if output_dir.split("/")[-1] == "translate":
                file_name = f"afrimmlu_translate_{lang}.yaml"
                task_name = f"afrimmlu_translate_{lang}_{mode}"
                yaml_template = "afrimmlu_translate"
            yaml_details = {
                "include": yaml_template,
                "task": task_name,
                "dataset_name": lang,
            }
            os.makedirs(f"{output_dir}/{mode}", exist_ok=True)
            with open(
                f"{output_dir}/{mode}/{file_name}",
                "w" if overwrite else "x",
                encoding="utf8",
            ) as f:
                f.write("# Generated by utils.py\n")
                yaml.dump(
                    yaml_details,
                    f,
                    allow_unicode=True,
                )
        except FileExistsError:
            err.append(file_name)

    if len(err) > 0:
        raise FileExistsError(
            "Files were not created because they already exist (use --overwrite flag):"
            f"{', '.join(err)}"
        )


def main() -> None:
    """Parse CLI args and generate language-specific yaml files."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--overwrite",
        default=True,
        action="store_true",
        help="Overwrite files if they already exist",
    )
    parser.add_argument(
        "--output-dir",
        default="./direct",
        help="Directory to write yaml files to",
    )
    parser.add_argument(
        "--mode",
        default="prompt_4",
        choices=["prompt_1", "prompt_2", "prompt_3", "prompt_4", "prompt_5"],
        help="Prompt number",
    )
    args = parser.parse_args()

    gen_lang_yamls(output_dir=args.output_dir, overwrite=args.overwrite, mode=args.mode)


if __name__ == "__main__":
    main()
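One detail of `main()` worth noting: `--overwrite` combines `action="store_true"` with `default=True`, so the parsed value is `True` whether or not the flag is passed, and the exclusive-create (`"x"`) branch in `gen_lang_yamls` is not reachable from the command line. The sketch below reproduces that behavior with the standard library and shows one possible alternative, `argparse.BooleanOptionalAction` (Python 3.9+); the alternative is an assumption of this sketch, not something the commit uses.

```python
import argparse

# Same flag definition as in gen_utils.py: the default and store_true both give True.
as_committed = argparse.ArgumentParser()
as_committed.add_argument("--overwrite", default=True, action="store_true")

# Hypothetical alternative: the same default, but --no-overwrite is also accepted.
alternative = argparse.ArgumentParser()
alternative.add_argument(
    "--overwrite", default=True, action=argparse.BooleanOptionalAction
)

print(as_committed.parse_args([]).overwrite)                 # True
print(as_committed.parse_args(["--overwrite"]).overwrite)    # True
print(alternative.parse_args(["--no-overwrite"]).overwrite)  # False
```

With the alternative, overwriting stays the default, and passing `--no-overwrite` would let existing files surface through the `FileExistsError` path.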
lm_eval/tasks/afrimmlu/translate/afrimmlu_common_translate_yaml (deleted, 100644 → 0)
tag:
  - afrimmlu_translate
task: null
dataset_path: masakhane/afrimmlu-translate-test
dataset_name: null
output_type: multiple_choice
test_split: test
doc_to_text: !function utils.doc_to_text
doc_to_target: "{{['A', 'B', 'C', 'D'].index(answer)}}"
doc_to_choice: !function utils.doc_to_choice
should_decontaminate: true
doc_to_decontamination_query: "Question: {{question}}\nAnswer:"
metric_list:
  - metric: f1
    aggregation: !function utils.weighted_f1_score
    # aggregation: mean
    average: weighted
    hf_evaluate: true
    higher_is_better: True
    ignore_case: true
    ignore_punctuation: true
    regexes_to_ignore:
      - ","
      - "\\$"
  - metric: acc
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
    regexes_to_ignore:
      - ","
      - "\\$"
metadata:
  version: 1.0
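The `doc_to_target` template in the deleted config maps the gold answer letter to a zero-based index into the choice list. In plain Python the same lookup reads roughly as follows; the function name `answer_index` is illustrative only.

```python
LETTERS = ["A", "B", "C", "D"]


def answer_index(answer: str) -> int:
    """Mirror of the template expression ['A', 'B', 'C', 'D'].index(answer)."""
    return LETTERS.index(answer)  # raises ValueError for anything outside A-D


print(answer_index("C"))  # 2
```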
lm_eval/tasks/afrimmlu/translate/afrimmlu_translate_amh.yaml (deleted, 100644 → 0)
dataset_name: amh
include: afrimmlu_common_translate_yaml
task: afrimmlu_translate_amh
lm_eval/tasks/afrimmlu/translate/afrimmlu_translate_eng.yaml (deleted, 100644 → 0)
dataset_name: eng
include: afrimmlu_common_translate_yaml
task: afrimmlu_translate_eng
lm_eval/tasks/afrimmlu/translate/afrimmlu_translate_ewe.yaml (deleted, 100644 → 0)
dataset_name: ewe
include: afrimmlu_common_translate_yaml
task: afrimmlu_translate_ewe
lm_eval/tasks/afrimmlu/translate/afrimmlu_translate_fra.yaml (deleted, 100644 → 0)
dataset_name: fra
include: afrimmlu_common_translate_yaml
task: afrimmlu_translate_fra
lm_eval/tasks/afrimmlu/translate/afrimmlu_translate_hau.yaml (deleted, 100644 → 0)
dataset_name: hau
include: afrimmlu_common_translate_yaml
task: afrimmlu_translate_hau