gaoqiong / lm-evaluation-harness
Commit 89b6bdb3, authored Feb 06, 2025 by Baber
Merge branch 'main' into ai2d
Parents: 59053d58, 144a1e58
Showing 20 changed files with 154 additions and 22 deletions (+154 / -22).
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_accounting.yaml  +0 -5
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_computer_science.yaml  +0 -5
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_economics.yaml  +0 -5
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_political_science.yaml  +0 -5
lm_eval/tasks/arabicmmlu/utils.py  +2 -2
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU.yaml  +12 -0
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_humanities_history.yaml  +10 -0
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_humanities_islamic-studies.yaml  +10 -0
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_humanities_philosophy.yaml  +10 -0
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_language_arabic-language.yaml  +10 -0
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_social-science_civics.yaml  +10 -0
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_social-science_economics.yaml  +10 -0
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_social-science_geography.yaml  +10 -0
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_stem_biology.yaml  +10 -0
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_stem_computer-science.yaml  +10 -0
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_stem_physics.yaml  +10 -0
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_middle_humanities_history.yaml  +10 -0
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_middle_humanities_islamic-studies.yaml  +10 -0
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_middle_language_arabic-language.yaml  +10 -0
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_middle_other_general-knowledge.yaml  +10 -0
Too many changes to show. To preserve performance, only 1000 of 1000+ files are displayed.
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_accounting.yaml (deleted; mode 100644 → 0; contents as of 59053d58)

"dataset_name": "Univ Accounting"
"tag": "arabicmmlu_social_science_tasks"
"include": "_default_arabicmmlu_template_yaml"
"task": "arabicmmlu_univ_accounting"
"task_alias": "Univ Accounting"
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_computer_science.yaml (deleted; mode 100644 → 0; contents as of 59053d58)

"dataset_name": "Univ Computer Science"
"tag": "arabicmmlu_stem_tasks"
"include": "_default_arabicmmlu_template_yaml"
"task": "arabicmmlu_univ_computer_science"
"task_alias": "Univ Computer Science"
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_economics.yaml (deleted; mode 100644 → 0; contents as of 59053d58)

"dataset_name": "Univ Economics"
"tag": "arabicmmlu_social_science_tasks"
"include": "_default_arabicmmlu_template_yaml"
"task": "arabicmmlu_univ_economics"
"task_alias": "Univ Economics"
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_political_science.yaml (deleted; mode 100644 → 0; contents as of 59053d58)

"dataset_name": "Univ Political Science"
"tag": "arabicmmlu_social_science_tasks"
"include": "_default_arabicmmlu_template_yaml"
"task": "arabicmmlu_univ_political_science"
"task_alias": "Univ Political Science"
lm_eval/tasks/arabicmmlu/utils.py (modified; at 89b6bdb3)

@@ -23,7 +23,7 @@ def doc_to_text(doc):
     question = (
         doc["Question"]
-        if doc["Context"] == ""
+        if not doc["Context"]
         else f"{doc['Context']}\n\n{doc['Question']}"
     )

@@ -41,4 +41,4 @@ def doc_to_text(doc):
 def doc_to_choice(doc):
-    return [alpa[i][0] for i in range(5) if doc[f"Option {i + 1}"]]
+    return [alpa[i][0] for i in range(5) if doc[f"Option {i + 1}"]]
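The `utils.py` hunks are easier to follow next to a small runnable sketch. Assuming GitLab's inline view lists the removed line before the added one, the first hunk swaps `doc["Context"] == ""` for `not doc["Context"]`, so a `None` context is now treated the same as an empty string. The helper names `build_question_old` and `build_question`, and the label list `alpa`, are hypothetical stand-ins: the real `alpa` is defined elsewhere in `utils.py` and is not part of this diff.

```python
# Hypothetical labels; the real `alpa` in arabicmmlu/utils.py is not shown here.
alpa = ["A.", "B.", "C.", "D.", "E."]


def build_question_old(doc: dict) -> str:
    # Old test: only the exact empty string counts as "no context".
    return (
        doc["Question"]
        if doc["Context"] == ""
        else f"{doc['Context']}\n\n{doc['Question']}"
    )


def build_question(doc: dict) -> str:
    # New test: any falsy context ("" or None) counts as "no context".
    return (
        doc["Question"]
        if not doc["Context"]
        else f"{doc['Context']}\n\n{doc['Question']}"
    )


def doc_to_choice(doc: dict) -> list:
    # One label per non-empty option, keeping only the label's first character.
    return [alpa[i][0] for i in range(5) if doc[f"Option {i + 1}"]]


doc = {
    "Question": "Q?",
    "Context": None,
    "Option 1": "x",
    "Option 2": "y",
    "Option 3": "",
    "Option 4": None,
    "Option 5": None,
}
print(repr(build_question_old(doc)))  # 'None\n\nQ?'
print(repr(build_question(doc)))      # 'Q?'
print(doc_to_choice(doc))             # ['A', 'B']
```

With a `None` context, the old equality test falls through to the f-string and prefixes the question with the literal text `None`; the truthiness test avoids that.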
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU.yaml (new file; mode 0 → 100644; at 89b6bdb3)

group: AraDiCE_ArabicMMLU_egy
task:
  - AraDiCE_ArabicMMLU_humanities_egy
  - AraDiCE_ArabicMMLU_language_egy
  - AraDiCE_ArabicMMLU_social-science_egy
  - AraDiCE_ArabicMMLU_stem_egy
  - AraDiCE_ArabicMMLU_other_egy
aggregate_metric_list:
  - metric: acc
    weight_by_size: True
  - metric: acc_norm
    weight_by_size: True
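The group file sets `weight_by_size: True` for both `acc` and `acc_norm`. As a hedged sketch of what size weighting means, assuming the harness combines subtask scores as a document-count-weighted mean when the flag is on and as a plain mean over subtasks otherwise, the arithmetic looks like this. `SubtaskResult` and `aggregate` are hypothetical names, and the numbers in the example are illustrative, not AraDiCE results.

```python
from dataclasses import dataclass


@dataclass
class SubtaskResult:
    name: str
    score: float  # e.g. acc or acc_norm for one subtask
    n_docs: int   # number of evaluated documents in that subtask


def aggregate(results: list, weight_by_size: bool = True) -> float:
    """Combine subtask scores into a single group score."""
    if not results:
        raise ValueError("no subtask results to aggregate")
    if weight_by_size:
        # Weighted by size: every document counts equally.
        total = sum(r.n_docs for r in results)
        if total == 0:
            raise ValueError("subtasks contain no documents")
        return sum(r.score * r.n_docs for r in results) / total
    # Unweighted: every subtask counts equally.
    return sum(r.score for r in results) / len(results)


# Illustrative numbers only.
results = [SubtaskResult("small", 0.75, 100), SubtaskResult("large", 0.25, 300)]
print(aggregate(results, weight_by_size=True))   # 0.375
print(aggregate(results, weight_by_size=False))  # 0.5
```

Weighting by size keeps one large subtask from being outvoted by several small ones; without it, each subtask contributes equally regardless of how many questions it holds.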
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_humanities_history.yaml (new file; mode 0 → 100644; at 89b6bdb3)

"dataset_name": "high_humanities_history"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_high_humanities_history_egy"
"task_alias": "high humanities history"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
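Each subtask file is short because it pulls shared settings in through `include: "_default_template_yaml"`. A minimal sketch of that layering, assuming the included template is applied first and the including file's own keys win on conflict. The template contents below are hypothetical, since `_default_template_yaml` is not part of this page, and `resolve_config` is an illustrative helper, not the harness's loader.

```python
def resolve_config(task_cfg: dict, templates: dict) -> dict:
    """Merge a task config over the template named by its `include` key."""
    cfg = dict(task_cfg)
    name = cfg.pop("include", None)
    if name is None:
        return cfg
    # Resolve the template itself first (it may include another template),
    # then overlay the task's keys so they take precedence.
    base = resolve_config(templates[name], templates)
    return {**base, **cfg}


# Hypothetical template contents; the real _default_template_yaml is not shown.
templates = {
    "_default_template_yaml": {
        "output_type": "multiple_choice",
        "test_split": "placeholder",
    }
}

task = {
    "dataset_name": "high_humanities_history",
    "include": "_default_template_yaml",
    "task": "AraDiCE_ArabicMMLU_high_humanities_history_egy",
    "test_split": "test",
}

resolved = resolve_config(task, templates)
print(resolved["test_split"])   # test (task value overrides the template)
print(resolved["output_type"])  # multiple_choice (inherited)
```

The sketch does not guard against include cycles; a real loader would also need to locate the included file on disk.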
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_humanities_islamic-studies.yaml (new file; mode 0 → 100644; at 89b6bdb3)

"dataset_name": "high_humanities_islamic-studies"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_high_humanities_islamic-studies_egy"
"task_alias": "high humanities islamic-studies"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
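Every new subtask file follows the same pattern: `dataset_name` has the form `<level>_<category>_<subject>`, `tag` is built from the category, `task` wraps `dataset_name` in an `AraDiCE_ArabicMMLU_` prefix and an `_egy` suffix, and `task_alias` replaces underscores with spaces. The quoted keys, alphabetical key order, and `!!null "null"` values are consistent with PyYAML's `yaml.dump(..., default_style='"')`, but the diff does not show how the files were produced. The function below is a hypothetical generator that reproduces the pattern as plain dicts; `make_subtask_config` and its `dialect` parameter are illustrative names.

```python
PREFIX = "AraDiCE_ArabicMMLU"


def make_subtask_config(dataset_name: str, dialect: str = "egy") -> dict:
    """Build one subtask config following the naming pattern in this diff."""
    # Components are joined by "_"; multi-word parts use "-" (e.g.
    # "islamic-studies"), so at most two splits are needed.
    _level, category, _subject = dataset_name.split("_", 2)
    return {
        "dataset_name": dataset_name,
        "description": "",
        "fewshot_split": None,
        "include": "_default_template_yaml",
        "tag": f"{PREFIX}_{category}_{dialect}",
        "task": f"{PREFIX}_{dataset_name}_{dialect}",
        "task_alias": dataset_name.replace("_", " "),
        "test_split": "test",
        "training_split": None,
        "validation_split": None,
    }


cfg = make_subtask_config("high_humanities_islamic-studies")
print(cfg["task"])        # AraDiCE_ArabicMMLU_high_humanities_islamic-studies_egy
print(cfg["tag"])         # AraDiCE_ArabicMMLU_humanities_egy
print(cfg["task_alias"])  # high humanities islamic-studies
```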
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_humanities_philosophy.yaml (new file; mode 0 → 100644; at 89b6bdb3)

"dataset_name": "high_humanities_philosophy"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_high_humanities_philosophy_egy"
"task_alias": "high humanities philosophy"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_language_arabic-language.yaml (new file; mode 0 → 100644; at 89b6bdb3)

"dataset_name": "high_language_arabic-language"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_language_egy"
"task": "AraDiCE_ArabicMMLU_high_language_arabic-language_egy"
"task_alias": "high language arabic-language"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_social-science_civics.yaml (new file; mode 0 → 100644; at 89b6bdb3)

"dataset_name": "high_social-science_civics"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_high_social-science_civics_egy"
"task_alias": "high social-science civics"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_social-science_economics.yaml (new file; mode 0 → 100644; at 89b6bdb3)

"dataset_name": "high_social-science_economics"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_high_social-science_economics_egy"
"task_alias": "high social-science economics"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_social-science_geography.yaml (new file; mode 0 → 100644; at 89b6bdb3)

"dataset_name": "high_social-science_geography"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_high_social-science_geography_egy"
"task_alias": "high social-science geography"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_stem_biology.yaml (new file; mode 0 → 100644; at 89b6bdb3)

"dataset_name": "high_stem_biology"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_high_stem_biology_egy"
"task_alias": "high stem biology"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_stem_computer-science.yaml (new file; mode 0 → 100644; at 89b6bdb3)

"dataset_name": "high_stem_computer-science"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_high_stem_computer-science_egy"
"task_alias": "high stem computer-science"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_high_stem_physics.yaml (new file; mode 0 → 100644; at 89b6bdb3)

"dataset_name": "high_stem_physics"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_high_stem_physics_egy"
"task_alias": "high stem physics"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_middle_humanities_history.yaml (new file; mode 0 → 100644; at 89b6bdb3)

"dataset_name": "middle_humanities_history"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_middle_humanities_history_egy"
"task_alias": "middle humanities history"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_middle_humanities_islamic-studies.yaml (new file; mode 0 → 100644; at 89b6bdb3)

"dataset_name": "middle_humanities_islamic-studies"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_middle_humanities_islamic-studies_egy"
"task_alias": "middle humanities islamic-studies"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_middle_language_arabic-language.yaml (new file; mode 0 → 100644; at 89b6bdb3)

"dataset_name": "middle_language_arabic-language"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_language_egy"
"task": "AraDiCE_ArabicMMLU_middle_language_arabic-language_egy"
"task_alias": "middle language arabic-language"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU_middle_other_general-knowledge.yaml (new file; mode 0 → 100644; at 89b6bdb3)

"dataset_name": "middle_other_general-knowledge"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_other_egy"
"task": "AraDiCE_ArabicMMLU_middle_other_general-knowledge_egy"
"task_alias": "middle other general-knowledge"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
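The group file AraDiCE_ArabicMMLU.yaml lists five tags, and each subtask file joins one of them through its `tag` key. A quick consistency check over the 14 EGY subtask files shown on this page confirms that every tag named in the group has at least one member. Later pages of the diff are not included, so this is not the full set of subtasks, and `members_by_tag` and `empty_tags` are hypothetical helpers rather than part of the harness.

```python
from collections import defaultdict

# Tags listed under `task:` in AraDiCE_ArabicMMLU.yaml.
GROUP_TAGS = [
    "AraDiCE_ArabicMMLU_humanities_egy",
    "AraDiCE_ArabicMMLU_language_egy",
    "AraDiCE_ArabicMMLU_social-science_egy",
    "AraDiCE_ArabicMMLU_stem_egy",
    "AraDiCE_ArabicMMLU_other_egy",
]

# dataset_name -> tag, copied from the subtask files shown on this page.
SUBTASK_TAGS = {
    "high_humanities_history": "AraDiCE_ArabicMMLU_humanities_egy",
    "high_humanities_islamic-studies": "AraDiCE_ArabicMMLU_humanities_egy",
    "high_humanities_philosophy": "AraDiCE_ArabicMMLU_humanities_egy",
    "high_language_arabic-language": "AraDiCE_ArabicMMLU_language_egy",
    "high_social-science_civics": "AraDiCE_ArabicMMLU_social-science_egy",
    "high_social-science_economics": "AraDiCE_ArabicMMLU_social-science_egy",
    "high_social-science_geography": "AraDiCE_ArabicMMLU_social-science_egy",
    "high_stem_biology": "AraDiCE_ArabicMMLU_stem_egy",
    "high_stem_computer-science": "AraDiCE_ArabicMMLU_stem_egy",
    "high_stem_physics": "AraDiCE_ArabicMMLU_stem_egy",
    "middle_humanities_history": "AraDiCE_ArabicMMLU_humanities_egy",
    "middle_humanities_islamic-studies": "AraDiCE_ArabicMMLU_humanities_egy",
    "middle_language_arabic-language": "AraDiCE_ArabicMMLU_language_egy",
    "middle_other_general-knowledge": "AraDiCE_ArabicMMLU_other_egy",
}


def members_by_tag(subtask_tags: dict) -> dict:
    """Invert dataset_name -> tag into tag -> sorted list of dataset_names."""
    grouped = defaultdict(list)
    for name, tag in subtask_tags.items():
        grouped[tag].append(name)
    return {tag: sorted(names) for tag, names in grouped.items()}


def empty_tags(group_tags: list, subtask_tags: dict) -> list:
    """Return group tags that no subtask declares."""
    grouped = members_by_tag(subtask_tags)
    return [tag for tag in group_tags if tag not in grouped]


print(empty_tags(GROUP_TAGS, SUBTASK_TAGS))  # []
for tag, names in members_by_tag(SUBTASK_TAGS).items():
    print(tag, len(names))
```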