Skip to content
GitLab
Menu
Projects
Groups
Snippets
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in / Register
Toggle navigation
Menu
Open sidebar
gaoqiong
lm-evaluation-harness
Commits
2b56339e
"model.properties" did not exist on "382d1b19bc1f19e13b9a141c5aed420f9d1897a6"
Commit
2b56339e
authored
Jan 17, 2025
by
Baber
Browse files
Merge branch 'main' into longcxt
parents
0b533339
703fbffd
Changes
316
Hide whitespace changes
Inline
Side-by-side
Showing
20 changed files
with
304 additions
and
0 deletions
+304
-0
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_univ_social-science_political-science.yaml
...iCE_ArabicMMLU_univ_social-science_political-science.yaml
+10
-0
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_univ_stem_computer-science.yaml
...LU/EGY/AraDiCE_ArabicMMLU_univ_stem_computer-science.yaml
+10
-0
lm_eval/tasks/aradice/ArabicMMLU/EGY/_default_template_yaml
lm_eval/tasks/aradice/ArabicMMLU/EGY/_default_template_yaml
+20
-0
lm_eval/tasks/aradice/ArabicMMLU/EGY/metrics.py
lm_eval/tasks/aradice/ArabicMMLU/EGY/metrics.py
+25
-0
lm_eval/tasks/aradice/ArabicMMLU/EGY/utils.py
lm_eval/tasks/aradice/ArabicMMLU/EGY/utils.py
+87
-0
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU.yaml
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU.yaml
+12
-0
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_humanities_history.yaml
...cMMLU/LEV/AraDiCE_ArabicMMLU_high_humanities_history.yaml
+10
-0
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_humanities_islamic-studies.yaml
...V/AraDiCE_ArabicMMLU_high_humanities_islamic-studies.yaml
+10
-0
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_humanities_philosophy.yaml
...LU/LEV/AraDiCE_ArabicMMLU_high_humanities_philosophy.yaml
+10
-0
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_language_arabic-language.yaml
...LEV/AraDiCE_ArabicMMLU_high_language_arabic-language.yaml
+10
-0
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_social-science_civics.yaml
...LU/LEV/AraDiCE_ArabicMMLU_high_social-science_civics.yaml
+10
-0
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_social-science_economics.yaml
...LEV/AraDiCE_ArabicMMLU_high_social-science_economics.yaml
+10
-0
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_social-science_geography.yaml
...LEV/AraDiCE_ArabicMMLU_high_social-science_geography.yaml
+10
-0
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_stem_biology.yaml
.../ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_stem_biology.yaml
+10
-0
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_stem_computer-science.yaml
...LU/LEV/AraDiCE_ArabicMMLU_high_stem_computer-science.yaml
+10
-0
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_stem_physics.yaml
.../ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_stem_physics.yaml
+10
-0
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_middle_humanities_history.yaml
...MLU/LEV/AraDiCE_ArabicMMLU_middle_humanities_history.yaml
+10
-0
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_middle_humanities_islamic-studies.yaml
...AraDiCE_ArabicMMLU_middle_humanities_islamic-studies.yaml
+10
-0
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_middle_language_arabic-language.yaml
...V/AraDiCE_ArabicMMLU_middle_language_arabic-language.yaml
+10
-0
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_middle_other_general-knowledge.yaml
...EV/AraDiCE_ArabicMMLU_middle_other_general-knowledge.yaml
+10
-0
No files found.
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_univ_social-science_political-science.yaml
0 → 100644
View file @
2b56339e
"
dataset_name"
:
"
univ_social-science_political-science"
"
description"
:
"
"
"
fewshot_split"
:
!!null
"
null"
"
include"
:
"
_default_template_yaml"
"
tag"
:
"
AraDiCE_ArabicMMLU_social-science_egy"
"
task"
:
"
AraDiCE_ArabicMMLU_univ_social-science_political-science_egy"
"
task_alias"
:
"
univ
social-science
political-science"
"
test_split"
:
"
test"
"
training_split"
:
!!null
"
null"
"
validation_split"
:
!!null
"
null"
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_univ_stem_computer-science.yaml
0 → 100644
View file @
2b56339e
"
dataset_name"
:
"
univ_stem_computer-science"
"
description"
:
"
"
"
fewshot_split"
:
!!null
"
null"
"
include"
:
"
_default_template_yaml"
"
tag"
:
"
AraDiCE_ArabicMMLU_stem_egy"
"
task"
:
"
AraDiCE_ArabicMMLU_univ_stem_computer-science_egy"
"
task_alias"
:
"
univ
stem
computer-science"
"
test_split"
:
"
test"
"
training_split"
:
!!null
"
null"
"
validation_split"
:
!!null
"
null"
lm_eval/tasks/aradice/ArabicMMLU/EGY/_default_template_yaml
0 → 100644
View file @
2b56339e
dataset_path: "QCRI/AraDICE-ArabicMMLU-egy"
fewshot_config:
sampler: default
output_type: multiple_choice
process_docs: !function utils.process_docs
doc_to_text: "{{prompt}}"
doc_to_choice: choices
doc_to_target: target
metric_list:
- metric: acc
aggregation: mean
higher_is_better: true
- metric: acc_norm
aggregation: mean
higher_is_better: true
- metric: f1
higher_is_better: true
aggregation: !function metrics.micro_f1_score
metadata:
version: 0.0
lm_eval/tasks/aradice/ArabicMMLU/EGY/metrics.py
0 → 100644
View file @
2b56339e
from
sklearn.metrics
import
f1_score
def
macro_f1_score
(
items
):
unzipped_list
=
list
(
zip
(
*
items
))
golds
=
unzipped_list
[
0
]
preds
=
unzipped_list
[
1
]
fscore
=
f1_score
(
golds
,
preds
,
average
=
"macro"
)
return
fscore
def
micro_f1_score
(
items
):
unzipped_list
=
list
(
zip
(
*
items
))
golds
=
unzipped_list
[
0
]
preds
=
unzipped_list
[
1
]
fscore
=
f1_score
(
golds
,
preds
,
average
=
"micro"
)
return
fscore
def
weighted_f1_score
(
items
):
unzipped_list
=
list
(
zip
(
*
items
))
golds
=
unzipped_list
[
0
]
preds
=
unzipped_list
[
1
]
fscore
=
f1_score
(
golds
,
preds
,
average
=
"weighted"
)
return
fscore
lm_eval/tasks/aradice/ArabicMMLU/EGY/utils.py
0 → 100644
View file @
2b56339e
level_ar
=
{
"Primary"
:
"للمرحلة الابتدائية"
,
"Middle"
:
"للمرحلة المتوسطة"
,
"High"
:
"للمرحلة الثانوية"
,
"Univ"
:
"للمرحلة الجامعية "
,
"Prof"
:
"للمحترفين"
,
}
country_ar
=
{
"UAE"
:
"في الإمارات"
,
"Egypt"
:
"في مصر"
,
"Lebanon"
:
"في لبنان"
,
"Jordan"
:
"في الأردن"
,
"Kuwait"
:
"في الكويت"
,
"KSA"
:
"في السعودية"
,
"Palestine"
:
"في فلسطين"
,
"Morocco"
:
"في المغرب"
,
}
subject_ar
=
{
"Islamic Studies"
:
"في الدراسات إسلامية"
,
"Driving Test"
:
"في اختبار القيادة"
,
"Natural Science"
:
"في العلوم الطبيعية"
,
"History"
:
"في مادة التاريخ"
,
"General Knowledge"
:
"في المعرفة العامة"
,
"Law"
:
"في القانون"
,
"Physics"
:
"في الفيزياء"
,
"Social Science"
:
"في العلوم الاجتماعية"
,
"Management"
:
"في الإدارة"
,
"Arabic Language"
:
"في اللغة العربية"
,
"Political Science"
:
" في العلوم السياسية"
,
"Philosophy"
:
"في الفلسفة"
,
"Accounting"
:
"في المحاسبة"
,
"Computer Science"
:
"في علوم الحاسوب"
,
"Geography"
:
"في الجغرافيا"
,
"Math"
:
"في الرياضيات"
,
"Biology"
:
"في علم الأحياء"
,
"Economics"
:
"في الاقتصاد"
,
"Arabic Language (General)"
:
"في اللغة العربية (عام)"
,
"Arabic Language (Grammar)"
:
"في اللغة العربية (النحو)"
,
"Civics"
:
"في التربية المدنية"
,
}
alpa_ar
=
[
"أ-"
,
"ب-"
,
"ج-"
,
"د-"
,
"و-"
]
alpa_en
=
[
"A-"
,
"B-"
,
"C-"
,
"D-"
,
"E-"
]
all_choices
=
[
"أ"
,
"ب"
,
"ج"
,
"د"
,
"و"
]
all_choices_en
=
[
"A"
,
"B"
,
"C"
,
"D"
,
"E"
]
def
process_docs
(
dataset
):
def
_helper
(
doc
):
# modifies the contents of a single
# document in our dataset.
PROMPT
=
"ده سؤال [MAIN_META_DATA]. اختار الإجابة الصحيحة!
\n\n
سؤال: [INPUT]
\n
[OPTION]"
PROMPT
=
f
"
{
PROMPT
}
\n\n
إجابة:"
alpa
=
alpa_ar
subject
=
subject_ar
[
doc
[
"Subject"
]]
level
=
" "
+
level_ar
[
doc
[
"Level"
]]
if
doc
[
"Level"
]
else
""
country
=
" "
+
country_ar
[
doc
[
"Country"
]]
if
doc
[
"Country"
]
else
""
main_meta_data
=
f
"
{
subject
}{
level
}{
country
}
"
question
=
(
f
"
{
doc
[
'context'
]
}
\n\n
{
doc
[
'question'
]
}
"
if
doc
[
"context"
]
else
doc
[
"question"
]
)
options
=
[]
for
i
,
opt
in
enumerate
([
"A"
,
"B"
,
"C"
,
"D"
,
"E"
]):
if
opt
not
in
doc
[
"options"
]
or
doc
[
"options"
][
opt
]
is
None
:
break
options
.
append
(
f
"
{
alpa
[
i
]
}
{
doc
[
'options'
][
opt
]
}
"
)
doc
[
"prompt"
]
=
(
PROMPT
.
replace
(
"[MAIN_META_DATA]"
,
main_meta_data
)
.
replace
(
"[INPUT]"
,
question
)
.
replace
(
"[OPTION]"
,
"
\n
"
.
join
(
options
))
)
doc
[
"choices"
]
=
all_choices
[:
len
(
options
)]
doc
[
"target"
]
=
[
"A"
,
"B"
,
"C"
,
"D"
,
"E"
].
index
(
doc
[
"Answer Key"
])
return
doc
return
dataset
.
map
(
_helper
)
# returns back a datasets.Dataset object
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU.yaml
0 → 100644
View file @
2b56339e
group
:
AraDiCE_ArabicMMLU_lev
task
:
-
AraDiCE_ArabicMMLU_humanities_lev
-
AraDiCE_ArabicMMLU_language_lev
-
AraDiCE_ArabicMMLU_social-science_lev
-
AraDiCE_ArabicMMLU_stem_lev
-
AraDiCE_ArabicMMLU_other_lev
aggregate_metric_list
:
-
metric
:
acc
weight_by_size
:
True
-
metric
:
acc_norm
weight_by_size
:
True
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_humanities_history.yaml
0 → 100644
View file @
2b56339e
"
dataset_name"
:
"
high_humanities_history"
"
description"
:
"
"
"
fewshot_split"
:
!!null
"
null"
"
include"
:
"
_default_template_yaml"
"
tag"
:
"
AraDiCE_ArabicMMLU_humanities_lev"
"
task"
:
"
AraDiCE_ArabicMMLU_high_humanities_history_lev"
"
task_alias"
:
"
high
humanities
history"
"
test_split"
:
"
test"
"
training_split"
:
!!null
"
null"
"
validation_split"
:
!!null
"
null"
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_humanities_islamic-studies.yaml
0 → 100644
View file @
2b56339e
"
dataset_name"
:
"
high_humanities_islamic-studies"
"
description"
:
"
"
"
fewshot_split"
:
!!null
"
null"
"
include"
:
"
_default_template_yaml"
"
tag"
:
"
AraDiCE_ArabicMMLU_humanities_lev"
"
task"
:
"
AraDiCE_ArabicMMLU_high_humanities_islamic-studies_lev"
"
task_alias"
:
"
high
humanities
islamic-studies"
"
test_split"
:
"
test"
"
training_split"
:
!!null
"
null"
"
validation_split"
:
!!null
"
null"
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_humanities_philosophy.yaml
0 → 100644
View file @
2b56339e
"
dataset_name"
:
"
high_humanities_philosophy"
"
description"
:
"
"
"
fewshot_split"
:
!!null
"
null"
"
include"
:
"
_default_template_yaml"
"
tag"
:
"
AraDiCE_ArabicMMLU_humanities_lev"
"
task"
:
"
AraDiCE_ArabicMMLU_high_humanities_philosophy_lev"
"
task_alias"
:
"
high
humanities
philosophy"
"
test_split"
:
"
test"
"
training_split"
:
!!null
"
null"
"
validation_split"
:
!!null
"
null"
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_language_arabic-language.yaml
0 → 100644
View file @
2b56339e
"
dataset_name"
:
"
high_language_arabic-language"
"
description"
:
"
"
"
fewshot_split"
:
!!null
"
null"
"
include"
:
"
_default_template_yaml"
"
tag"
:
"
AraDiCE_ArabicMMLU_language_lev"
"
task"
:
"
AraDiCE_ArabicMMLU_high_language_arabic-language_lev"
"
task_alias"
:
"
high
language
arabic-language"
"
test_split"
:
"
test"
"
training_split"
:
!!null
"
null"
"
validation_split"
:
!!null
"
null"
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_social-science_civics.yaml
0 → 100644
View file @
2b56339e
"
dataset_name"
:
"
high_social-science_civics"
"
description"
:
"
"
"
fewshot_split"
:
!!null
"
null"
"
include"
:
"
_default_template_yaml"
"
tag"
:
"
AraDiCE_ArabicMMLU_social-science_lev"
"
task"
:
"
AraDiCE_ArabicMMLU_high_social-science_civics_lev"
"
task_alias"
:
"
high
social-science
civics"
"
test_split"
:
"
test"
"
training_split"
:
!!null
"
null"
"
validation_split"
:
!!null
"
null"
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_social-science_economics.yaml
0 → 100644
View file @
2b56339e
"
dataset_name"
:
"
high_social-science_economics"
"
description"
:
"
"
"
fewshot_split"
:
!!null
"
null"
"
include"
:
"
_default_template_yaml"
"
tag"
:
"
AraDiCE_ArabicMMLU_social-science_lev"
"
task"
:
"
AraDiCE_ArabicMMLU_high_social-science_economics_lev"
"
task_alias"
:
"
high
social-science
economics"
"
test_split"
:
"
test"
"
training_split"
:
!!null
"
null"
"
validation_split"
:
!!null
"
null"
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_social-science_geography.yaml
0 → 100644
View file @
2b56339e
"
dataset_name"
:
"
high_social-science_geography"
"
description"
:
"
"
"
fewshot_split"
:
!!null
"
null"
"
include"
:
"
_default_template_yaml"
"
tag"
:
"
AraDiCE_ArabicMMLU_social-science_lev"
"
task"
:
"
AraDiCE_ArabicMMLU_high_social-science_geography_lev"
"
task_alias"
:
"
high
social-science
geography"
"
test_split"
:
"
test"
"
training_split"
:
!!null
"
null"
"
validation_split"
:
!!null
"
null"
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_stem_biology.yaml
0 → 100644
View file @
2b56339e
"
dataset_name"
:
"
high_stem_biology"
"
description"
:
"
"
"
fewshot_split"
:
!!null
"
null"
"
include"
:
"
_default_template_yaml"
"
tag"
:
"
AraDiCE_ArabicMMLU_stem_lev"
"
task"
:
"
AraDiCE_ArabicMMLU_high_stem_biology_lev"
"
task_alias"
:
"
high
stem
biology"
"
test_split"
:
"
test"
"
training_split"
:
!!null
"
null"
"
validation_split"
:
!!null
"
null"
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_stem_computer-science.yaml
0 → 100644
View file @
2b56339e
"
dataset_name"
:
"
high_stem_computer-science"
"
description"
:
"
"
"
fewshot_split"
:
!!null
"
null"
"
include"
:
"
_default_template_yaml"
"
tag"
:
"
AraDiCE_ArabicMMLU_stem_lev"
"
task"
:
"
AraDiCE_ArabicMMLU_high_stem_computer-science_lev"
"
task_alias"
:
"
high
stem
computer-science"
"
test_split"
:
"
test"
"
training_split"
:
!!null
"
null"
"
validation_split"
:
!!null
"
null"
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_high_stem_physics.yaml
0 → 100644
View file @
2b56339e
"
dataset_name"
:
"
high_stem_physics"
"
description"
:
"
"
"
fewshot_split"
:
!!null
"
null"
"
include"
:
"
_default_template_yaml"
"
tag"
:
"
AraDiCE_ArabicMMLU_stem_lev"
"
task"
:
"
AraDiCE_ArabicMMLU_high_stem_physics_lev"
"
task_alias"
:
"
high
stem
physics"
"
test_split"
:
"
test"
"
training_split"
:
!!null
"
null"
"
validation_split"
:
!!null
"
null"
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_middle_humanities_history.yaml
0 → 100644
View file @
2b56339e
"
dataset_name"
:
"
middle_humanities_history"
"
description"
:
"
"
"
fewshot_split"
:
!!null
"
null"
"
include"
:
"
_default_template_yaml"
"
tag"
:
"
AraDiCE_ArabicMMLU_humanities_lev"
"
task"
:
"
AraDiCE_ArabicMMLU_middle_humanities_history_lev"
"
task_alias"
:
"
middle
humanities
history"
"
test_split"
:
"
test"
"
training_split"
:
!!null
"
null"
"
validation_split"
:
!!null
"
null"
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_middle_humanities_islamic-studies.yaml
0 → 100644
View file @
2b56339e
"
dataset_name"
:
"
middle_humanities_islamic-studies"
"
description"
:
"
"
"
fewshot_split"
:
!!null
"
null"
"
include"
:
"
_default_template_yaml"
"
tag"
:
"
AraDiCE_ArabicMMLU_humanities_lev"
"
task"
:
"
AraDiCE_ArabicMMLU_middle_humanities_islamic-studies_lev"
"
task_alias"
:
"
middle
humanities
islamic-studies"
"
test_split"
:
"
test"
"
training_split"
:
!!null
"
null"
"
validation_split"
:
!!null
"
null"
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_middle_language_arabic-language.yaml
0 → 100644
View file @
2b56339e
"
dataset_name"
:
"
middle_language_arabic-language"
"
description"
:
"
"
"
fewshot_split"
:
!!null
"
null"
"
include"
:
"
_default_template_yaml"
"
tag"
:
"
AraDiCE_ArabicMMLU_language_lev"
"
task"
:
"
AraDiCE_ArabicMMLU_middle_language_arabic-language_lev"
"
task_alias"
:
"
middle
language
arabic-language"
"
test_split"
:
"
test"
"
training_split"
:
!!null
"
null"
"
validation_split"
:
!!null
"
null"
lm_eval/tasks/aradice/ArabicMMLU/LEV/AraDiCE_ArabicMMLU_middle_other_general-knowledge.yaml
0 → 100644
View file @
2b56339e
"
dataset_name"
:
"
middle_other_general-knowledge"
"
description"
:
"
"
"
fewshot_split"
:
!!null
"
null"
"
include"
:
"
_default_template_yaml"
"
tag"
:
"
AraDiCE_ArabicMMLU_other_lev"
"
task"
:
"
AraDiCE_ArabicMMLU_middle_other_general-knowledge_lev"
"
task_alias"
:
"
middle
other
general-knowledge"
"
test_split"
:
"
test"
"
training_split"
:
!!null
"
null"
"
validation_split"
:
!!null
"
null"
Prev
1
2
3
4
5
6
7
8
…
16
Next
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment