gaoqiong / lm-evaluation-harness
Commit 601be343 authored Jun 23, 2025 by Baber
Merge branch 'main' into feature/eval_from_config
Parents: d0884a96, 68c3a811
Changes: 1000
Showing 20 changed files with 333 additions and 0 deletions (+333 -0)
lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_orm.yaml +6 -0
lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_sna.yaml +6 -0
lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_sot.yaml +6 -0
lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_swa.yaml +6 -0
lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_twi.yaml +6 -0
lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_wol.yaml +6 -0
lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_xho.yaml +6 -0
lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_yaml +30 -0
lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_yor.yaml +6 -0
lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_zul.yaml +6 -0
lm_eval/tasks/afrixnli/direct/prompt_5/utils.py +6 -0
lm_eval/tasks/afrixnli/gen_utils.py +129 -0
lm_eval/tasks/afrixnli/translate/afrixnli_tt.yaml +9 -0
lm_eval/tasks/afrixnli/translate/prompt_1/afrixnli_translate_amh.yaml +15 -0
lm_eval/tasks/afrixnli/translate/prompt_1/afrixnli_translate_ewe.yaml +15 -0
lm_eval/tasks/afrixnli/translate/prompt_1/afrixnli_translate_fra.yaml +15 -0
lm_eval/tasks/afrixnli/translate/prompt_1/afrixnli_translate_hau.yaml +15 -0
lm_eval/tasks/afrixnli/translate/prompt_1/afrixnli_translate_ibo.yaml +15 -0
lm_eval/tasks/afrixnli/translate/prompt_1/afrixnli_translate_kin.yaml +15 -0
lm_eval/tasks/afrixnli/translate/prompt_1/afrixnli_translate_lin.yaml +15 -0
Too many changes to show. To preserve performance, only 1000 of 1000+ files are displayed.

lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_orm.yaml (new file, mode 0 → 100644)
# Generated by utils.py
dataset_name: orm
doc_to_text: "Based on the given statement, is the following claim 'true', 'false',\
  \ or 'inconclusive'. \nStatement: {{premise}} \nClaim: {{hypothesis}}"
include: afrixnli_yaml
task: afrixnli_orm_prompt_5

lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_sna.yaml (new file, mode 0 → 100644)
# Generated by utils.py
dataset_name: sna
doc_to_text: "Based on the given statement, is the following claim 'true', 'false',\
  \ or 'inconclusive'. \nStatement: {{premise}} \nClaim: {{hypothesis}}"
include: afrixnli_yaml
task: afrixnli_sna_prompt_5

lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_sot.yaml (new file, mode 0 → 100644)
# Generated by utils.py
dataset_name: sot
doc_to_text: "Based on the given statement, is the following claim 'true', 'false',\
  \ or 'inconclusive'. \nStatement: {{premise}} \nClaim: {{hypothesis}}"
include: afrixnli_yaml
task: afrixnli_sot_prompt_5

lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_swa.yaml (new file, mode 0 → 100644)
# Generated by utils.py
dataset_name: swa
doc_to_text: "Based on the given statement, is the following claim 'true', 'false',\
  \ or 'inconclusive'. \nStatement: {{premise}} \nClaim: {{hypothesis}}"
include: afrixnli_yaml
task: afrixnli_swa_prompt_5

lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_twi.yaml (new file, mode 0 → 100644)
# Generated by utils.py
dataset_name: twi
doc_to_text: "Based on the given statement, is the following claim 'true', 'false',\
  \ or 'inconclusive'. \nStatement: {{premise}} \nClaim: {{hypothesis}}"
include: afrixnli_yaml
task: afrixnli_twi_prompt_5

lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_wol.yaml (new file, mode 0 → 100644)
# Generated by utils.py
dataset_name: wol
doc_to_text: "Based on the given statement, is the following claim 'true', 'false',\
  \ or 'inconclusive'. \nStatement: {{premise}} \nClaim: {{hypothesis}}"
include: afrixnli_yaml
task: afrixnli_wol_prompt_5

lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_xho.yaml (new file, mode 0 → 100644)
# Generated by utils.py
dataset_name: xho
doc_to_text: "Based on the given statement, is the following claim 'true', 'false',\
  \ or 'inconclusive'. \nStatement: {{premise}} \nClaim: {{hypothesis}}"
include: afrixnli_yaml
task: afrixnli_xho_prompt_5

lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_yaml (new file, mode 0 → 100644)
tag:
  - afrixnli_tasks
  - afrixnli_tasks_prompt_5
dataset_path: masakhane/afrixnli
dataset_name: null
output_type: multiple_choice
validation_split: validation
test_split: test
fewshot_split: validation
doc_to_target: !function utils.doc_to_target
doc_to_choice:
  - "true"
  - "inconclusive"
  - "false"
should_decontaminate: true
doc_to_decontamination_query: premise
metric_list:
  - metric: f1
    aggregation: !function utils.weighted_f1_score
    average: weighted
    higher_is_better: True
    ignore_case: true
    ignore_punctuation: true
  - metric: acc
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
metadata:
  version: 1.0
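The `f1` entry above delegates aggregation to `utils.weighted_f1_score` with `average: weighted`. To make the arithmetic concrete, here is a minimal pure-Python sketch of a support-weighted F1: per-class F1 scores averaged with weights proportional to how often each class is the gold label. The helper name `weighted_f1` and the assumption that each item is a `(gold, prediction)` pair are ours; as far as we know, the harness's own helper delegates to scikit-learn's `f1_score(..., average="weighted")` rather than to code like this.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def weighted_f1(items: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Support-weighted F1 over (gold, prediction) pairs (hypothetical helper)."""
    pairs = list(items)
    golds = [g for g, _ in pairs]
    preds = [p for _, p in pairs]
    support = Counter(golds)  # number of gold examples per class
    total = len(golds)
    labels = set(golds) | set(preds)

    score = 0.0
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Each class contributes in proportion to its gold support;
        # classes that only appear as predictions have support 0.
        score += f1 * support[label] / total
    return score


# Indices into doc_to_choice: 0 = "true", 1 = "inconclusive", 2 = "false".
print(round(weighted_f1([(0, 0), (0, 1), (1, 1), (2, 2)]), 4))  # 0.75
```

Weighting by support means a frequent label that the model gets wrong drags the score down more than a rare one, which plain accuracy (the second metric) does not separate out by class.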

lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_yor.yaml (new file, mode 0 → 100644)
# Generated by utils.py
dataset_name: yor
doc_to_text: "Based on the given statement, is the following claim 'true', 'false',\
  \ or 'inconclusive'. \nStatement: {{premise}} \nClaim: {{hypothesis}}"
include: afrixnli_yaml
task: afrixnli_yor_prompt_5

lm_eval/tasks/afrixnli/direct/prompt_5/afrixnli_zul.yaml (new file, mode 0 → 100644)
# Generated by utils.py
dataset_name: zul
doc_to_text: "Based on the given statement, is the following claim 'true', 'false',\
  \ or 'inconclusive'. \nStatement: {{premise}} \nClaim: {{hypothesis}}"
include: afrixnli_yaml
task: afrixnli_zul_prompt_5

lm_eval/tasks/afrixnli/direct/prompt_5/utils.py (new file, mode 0 → 100644)
from lm_eval.utils import weighted_f1_score


def doc_to_target(doc):
    replacements = {0: "true", 1: "false", 2: "inconclusive"}
    return replacements[doc["label"]]
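`doc_to_target` returns a label string rather than an index, and its mapping order (`0: "true", 1: "false", 2: "inconclusive"`) differs from the `doc_to_choice` order in `afrixnli_yaml` (`"true"`, `"inconclusive"`, `"false"`). The sketch below assumes that, for `multiple_choice` tasks, the harness resolves a string target to its position in the choice list; under that assumption the two orderings do not need to match. `CHOICES` and `gold_index` are hypothetical names used only for illustration.

```python
# Choice order copied from doc_to_choice in afrixnli_yaml.
CHOICES = ["true", "inconclusive", "false"]


def doc_to_target(doc):
    # Same mapping as direct/prompt_5/utils.py.
    replacements = {0: "true", 1: "false", 2: "inconclusive"}
    return replacements[doc["label"]]


def gold_index(doc):
    """Hypothetical: resolve the string target to its index in CHOICES."""
    return CHOICES.index(doc_to_target(doc))


for label in (0, 1, 2):
    print(label, doc_to_target({"label": label}), gold_index({"label": label}))
# 0 true 0
# 1 false 2
# 2 inconclusive 1
```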

lm_eval/tasks/afrixnli/gen_utils.py (new file, mode 0 → 100644)
import argparse
import os

import yaml


class FunctionTag:
    def __init__(self, value):
        self.value = value


def prompt_func(mode, lang):
    prompt_map = {
        "prompt_1": "Please identify whether the premise entails or contradicts the hypothesis in the following premise "
        "and hypothesis. The answer should be exact entailment, contradiction, or neutral.\n\nPremise: {premise}\nHypothesis: {hypothesis}\n\n"
        "Is it entailment, contradiction, or neutral?",
        "prompt_3": f"Given the following premise and hypothesis in {lang}, identify if the premise entails, contradicts, "
        f"or is neutral towards the hypothesis. Please respond with exact 'entailment', 'contradiction', or 'neutral'. \n\n"
        "Premise: {{premise}} \nHypothesis: {{hypothesis}}",
        "prompt_4": f"You are an expert in Natural Language Inference (NLI) specializing in the {lang} language.\n"
        f"Analyze the premise and hypothesis given in {lang}, and determine the relationship between them.\n"
        f"Respond with one of the following options: 'entailment', 'contradiction', or 'neutral'. \n\n"
        "Premise: {{premise}} \nHypothesis: {{hypothesis}}",
        "prompt_5": "Based on the given statement, is the following claim 'true', 'false', or 'inconclusive'. \n"
        "Statement: {{premise}} \nClaim: {{hypothesis}}",
    }
    return prompt_map[mode]


def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
    """
    Generate a yaml file for each language.

    :param output_dir: The directory to output the files to.
    :param overwrite: Whether to overwrite files if they already exist.
    """
    err = []
    languages = {
        "eng": "English",
        "amh": "Amharic",
        "ibo": "Igbo",
        "fra": "French",
        "sna": "chiShona",
        "wol": "Wolof",
        "ewe": "Ewe",
        "lin": "Lingala",
        "lug": "Luganda",
        "xho": "isiXhosa",
        "kin": "Kinyarwanda",
        "twi": "Twi",
        "zul": "Zulu",
        "orm": "Oromo",
        "yor": "Yoruba",
        "hau": "Hausa",
        "sot": "Sesotho",
        "swa": "Swahili",
    }

    for lang in languages.keys():
        try:
            file_name = f"afrixnli_{lang}.yaml"
            task_name = f"afrixnli_{lang}_{mode}"
            yaml_template = "afrixnli_yaml"
            if output_dir.split("/")[-1] == "translate":
                file_name = f"afrixnli_translate_{lang}.yaml"
                task_name = f"afrixnli_translate_{lang}_{mode}"
                yaml_template = "afrixnli_translate_yaml"
            if int(mode.split("_")[-1]) == 1 or int(mode.split("_")[-1]) > 2:
                yaml_details = {
                    "include": yaml_template,
                    "task": task_name,
                    "dataset_name": lang,
                    "doc_to_text": prompt_func(mode, languages[lang]),
                }
            else:
                yaml_details = {
                    "include": yaml_template,
                    "task": task_name,
                    "dataset_name": lang,
                }
            os.makedirs(f"{output_dir}/{mode}", exist_ok=True)
            with open(
                f"{output_dir}/{mode}/{file_name}",
                "w" if overwrite else "x",
                encoding="utf8",
            ) as f:
                f.write("# Generated by utils.py\n")
                yaml.dump(
                    yaml_details,
                    f,
                    allow_unicode=True,
                )
        except FileExistsError:
            err.append(file_name)

    if len(err) > 0:
        raise FileExistsError(
            "Files were not created because they already exist (use --overwrite flag):"
            f" {', '.join(err)}"
        )


def main() -> None:
    """Parse CLI args and generate language-specific yaml files."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--overwrite",
        default=True,
        action="store_true",
        help="Overwrite files if they already exist",
    )
    parser.add_argument(
        "--output-dir",
        default="./translate",
        help="Directory to write yaml files to",
    )
    parser.add_argument(
        "--mode",
        default="prompt_5",
        choices=["prompt_1", "prompt_2", "prompt_3", "prompt_4", "prompt_5"],
        help="Prompt number",
    )
    args = parser.parse_args()

    gen_lang_yamls(output_dir=args.output_dir, overwrite=args.overwrite, mode=args.mode)


if __name__ == "__main__":
    main()
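The file name, task name, and base template that `gen_lang_yamls` picks depend only on the last component of `output_dir` and on the prompt number in `mode`, so that branching can be traced without touching the filesystem. The sketch below copies the logic into a hypothetical `plan_yaml` helper (not part of the commit) and also checks how `argparse` treats `--overwrite` as declared in `main()`.

```python
import argparse


def plan_yaml(output_dir: str, mode: str, lang: str):
    """Hypothetical helper mirroring the naming logic in gen_lang_yamls (no file I/O)."""
    file_name = f"afrixnli_{lang}.yaml"
    task_name = f"afrixnli_{lang}_{mode}"
    yaml_template = "afrixnli_yaml"
    # Only the last path component is compared, so "./translate/" (trailing slash) is not detected.
    if output_dir.split("/")[-1] == "translate":
        file_name = f"afrixnli_translate_{lang}.yaml"
        task_name = f"afrixnli_translate_{lang}_{mode}"
        yaml_template = "afrixnli_translate_yaml"
    prompt_number = int(mode.split("_")[-1])
    # prompt_2 is the only mode that gets no doc_to_text override.
    has_doc_to_text = prompt_number == 1 or prompt_number > 2
    path = f"{output_dir}/{mode}/{file_name}"
    return path, task_name, yaml_template, has_doc_to_text


print(plan_yaml("./translate", "prompt_1", "amh"))
# ('./translate/prompt_1/afrixnli_translate_amh.yaml', 'afrixnli_translate_amh_prompt_1', 'afrixnli_translate_yaml', True)
print(plan_yaml("./direct", "prompt_2", "yor"))
# ('./direct/prompt_2/afrixnli_yor.yaml', 'afrixnli_yor_prompt_2', 'afrixnli_yaml', False)

# With action="store_true" and default=True, --overwrite is True whether or not it is passed.
parser = argparse.ArgumentParser()
parser.add_argument("--overwrite", default=True, action="store_true")
print(parser.parse_args([]).overwrite, parser.parse_args(["--overwrite"]).overwrite)
# True True
```

Because `overwrite` is always `True` under these arguments, files are always opened with mode `"w"`, and the `FileExistsError` branch that collects names into `err` is not reached from the command line.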

lm_eval/tasks/afrixnli/translate/afrixnli_tt.yaml (new file, mode 0 → 100644)
group: afrixnli_tt-irokobench
task:
  - afrixnli_tt_tasks
aggregate_metric_list:
  - metric: acc
    aggregation: mean
    weight_by_size: true
metadata:
  version: 2
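`afrixnli_tt.yaml` aggregates `acc` over the subtasks tagged `afrixnli_tt_tasks` with `weight_by_size: true`. Assuming this means each subtask's accuracy is weighted by its number of evaluated documents (so the group score equals a pooled mean), the arithmetic looks like the sketch below. `size_weighted_mean` and the example numbers are illustrative only.

```python
from typing import Sequence


def size_weighted_mean(scores: Sequence[float], sizes: Sequence[int]) -> float:
    """Weight each subtask score by its document count (illustrative helper)."""
    if len(scores) != len(sizes) or not sizes:
        raise ValueError("scores and sizes must be non-empty and the same length")
    return sum(s * n for s, n in zip(scores, sizes)) / sum(sizes)


# Two hypothetical subtasks: 100 docs at 0.50 accuracy, 300 docs at 0.80.
print(size_weighted_mean([0.5, 0.8], [100, 300]))  # weighted: (50 + 240) / 400
print(sum([0.5, 0.8]) / 2)  # unweighted mean of the two subtasks, for comparison
```

With equal subtask sizes the two numbers coincide; they diverge only when languages contribute different numbers of examples.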

lm_eval/tasks/afrixnli/translate/prompt_1/afrixnli_translate_amh.yaml (new file, mode 0 → 100644)
# Generated by utils.py
dataset_name: amh
doc_to_text: 'Please identify whether the premise entails or contradicts the hypothesis
  in the following premise and hypothesis. The answer should be exact entailment,
  contradiction, or neutral.


  Premise: {premise}

  Hypothesis: {hypothesis}


  Is it entailment, contradiction, or neutral?'
include: afrixnli_translate_yaml
task: afrixnli_translate_amh_prompt_1

lm_eval/tasks/afrixnli/translate/prompt_1/afrixnli_translate_ewe.yaml (new file, mode 0 → 100644)
# Generated by utils.py
dataset_name: ewe
doc_to_text: 'Please identify whether the premise entails or contradicts the hypothesis
  in the following premise and hypothesis. The answer should be exact entailment,
  contradiction, or neutral.


  Premise: {premise}

  Hypothesis: {hypothesis}


  Is it entailment, contradiction, or neutral?'
include: afrixnli_translate_yaml
task: afrixnli_translate_ewe_prompt_1

lm_eval/tasks/afrixnli/translate/prompt_1/afrixnli_translate_fra.yaml (new file, mode 0 → 100644)
# Generated by utils.py
dataset_name: fra
doc_to_text: 'Please identify whether the premise entails or contradicts the hypothesis
  in the following premise and hypothesis. The answer should be exact entailment,
  contradiction, or neutral.


  Premise: {premise}

  Hypothesis: {hypothesis}


  Is it entailment, contradiction, or neutral?'
include: afrixnli_translate_yaml
task: afrixnli_translate_fra_prompt_1

lm_eval/tasks/afrixnli/translate/prompt_1/afrixnli_translate_hau.yaml (new file, mode 0 → 100644)
# Generated by utils.py
dataset_name: hau
doc_to_text: 'Please identify whether the premise entails or contradicts the hypothesis
  in the following premise and hypothesis. The answer should be exact entailment,
  contradiction, or neutral.


  Premise: {premise}

  Hypothesis: {hypothesis}


  Is it entailment, contradiction, or neutral?'
include: afrixnli_translate_yaml
task: afrixnli_translate_hau_prompt_1

lm_eval/tasks/afrixnli/translate/prompt_1/afrixnli_translate_ibo.yaml (new file, mode 0 → 100644)
# Generated by utils.py
dataset_name: ibo
doc_to_text: 'Please identify whether the premise entails or contradicts the hypothesis
  in the following premise and hypothesis. The answer should be exact entailment,
  contradiction, or neutral.


  Premise: {premise}

  Hypothesis: {hypothesis}


  Is it entailment, contradiction, or neutral?'
include: afrixnli_translate_yaml
task: afrixnli_translate_ibo_prompt_1

lm_eval/tasks/afrixnli/translate/prompt_1/afrixnli_translate_kin.yaml (new file, mode 0 → 100644)
# Generated by utils.py
dataset_name: kin
doc_to_text: 'Please identify whether the premise entails or contradicts the hypothesis
  in the following premise and hypothesis. The answer should be exact entailment,
  contradiction, or neutral.


  Premise: {premise}

  Hypothesis: {hypothesis}


  Is it entailment, contradiction, or neutral?'
include: afrixnli_translate_yaml
task: afrixnli_translate_kin_prompt_1

lm_eval/tasks/afrixnli/translate/prompt_1/afrixnli_translate_lin.yaml (new file, mode 0 → 100644)
# Generated by utils.py
dataset_name: lin
doc_to_text: 'Please identify whether the premise entails or contradicts the hypothesis
  in the following premise and hypothesis. The answer should be exact entailment,
  contradiction, or neutral.


  Premise: {premise}

  Hypothesis: {hypothesis}


  Is it entailment, contradiction, or neutral?'
include: afrixnli_translate_yaml
task: afrixnli_translate_lin_prompt_1
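The `prompt_1` templates contain `{premise}` and `{hypothesis}` with single braces, while the `prompt_5` templates use Jinja-style `{{premise}}` and `{{hypothesis}}`. In Jinja2 a single brace is ordinary text, so if `doc_to_text` strings are rendered as Jinja templates (our assumption about how the harness treats them), single-brace names would reach the model literally instead of being filled in. A regular expression is enough to flag such placeholders; `single_brace_fields` is a hypothetical helper, not part of the repository.

```python
import re

# Matches "{name}" that is not part of "{{name}}" (hypothetical check, not part of lm-eval).
SINGLE_BRACE = re.compile(r"(?<!\{)\{(\w+)\}(?!\})")


def single_brace_fields(template: str) -> list:
    """Return field names written as {name} rather than Jinja's {{name}}."""
    return SINGLE_BRACE.findall(template)


prompt_1 = (
    "Please identify whether the premise entails or contradicts the hypothesis in the "
    "following premise and hypothesis. The answer should be exact entailment, "
    "contradiction, or neutral.\n\nPremise: {premise}\nHypothesis: {hypothesis}\n\n"
    "Is it entailment, contradiction, or neutral?"
)
prompt_5 = (
    "Based on the given statement, is the following claim 'true', 'false', or "
    "'inconclusive'. \nStatement: {{premise}} \nClaim: {{hypothesis}}"
)

print(single_brace_fields(prompt_1))  # ['premise', 'hypothesis']
print(single_brace_fields(prompt_5))  # []
```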