gaoqiong / lm-evaluation-harness
Commit 89b6bdb3, authored Feb 06, 2025 by Baber
Merge branch 'main' into ai2d
Parents: 59053d58, 144a1e58
Showing 20 changed files with 104 additions and 53 deletions.
lm_eval/tasks/arabicmmlu/_arabicmmlu.yaml (+1 -1)
lm_eval/tasks/arabicmmlu/_arabicmmlu_humanities.yaml (+1 -1)
lm_eval/tasks/arabicmmlu/_arabicmmlu_language.yaml (+1 -1)
lm_eval/tasks/arabicmmlu/_arabicmmlu_other.yaml (+1 -1)
lm_eval/tasks/arabicmmlu/_arabicmmlu_social_science.yaml (+1 -1)
lm_eval/tasks/arabicmmlu/_arabicmmlu_stem.yaml (+1 -1)
lm_eval/tasks/arabicmmlu/_default_arabicmmlu_template_yaml (+2 -2)
lm_eval/tasks/arabicmmlu/_generate_configs.py (+42 -41)
lm_eval/tasks/arabicmmlu/arabicmmlu_accounting_university.yaml (+5 -0)
lm_eval/tasks/arabicmmlu/arabicmmlu_arabic_language_general.yaml (+2 -2)
lm_eval/tasks/arabicmmlu/arabicmmlu_arabic_language_grammar.yaml (+2 -2)
lm_eval/tasks/arabicmmlu/arabicmmlu_arabic_language_high_school.yaml (+5 -0)
lm_eval/tasks/arabicmmlu/arabicmmlu_arabic_language_middle_school.yaml (+5 -0)
lm_eval/tasks/arabicmmlu/arabicmmlu_arabic_language_primary_school.yaml (+5 -0)
lm_eval/tasks/arabicmmlu/arabicmmlu_biology_high_school.yaml (+5 -0)
lm_eval/tasks/arabicmmlu/arabicmmlu_civics_high_school.yaml (+5 -0)
lm_eval/tasks/arabicmmlu/arabicmmlu_civics_middle_school.yaml (+5 -0)
lm_eval/tasks/arabicmmlu/arabicmmlu_computer_science_high_school.yaml (+5 -0)
lm_eval/tasks/arabicmmlu/arabicmmlu_computer_science_middle_school.yaml (+5 -0)
lm_eval/tasks/arabicmmlu/arabicmmlu_computer_science_primary_school.yaml (+5 -0)
Too many changes to show. To preserve performance, only 1000 of 1000+ files are displayed.
lm_eval/tasks/arabicmmlu/_arabicmmlu.yaml
@@ -9,4 +9,4 @@ aggregate_metric_list:
   - metric: acc
     weight_by_size: True
 metadata:
-  version: 0
+  version: 1
lm_eval/tasks/arabicmmlu/_arabicmmlu_humanities.yaml
@@ -6,4 +6,4 @@ aggregate_metric_list:
   - metric: acc
     weight_by_size: True
 metadata:
-  version: 0
+  version: 1
lm_eval/tasks/arabicmmlu/_arabicmmlu_language.yaml
@@ -6,4 +6,4 @@ aggregate_metric_list:
   - metric: acc
     weight_by_size: True
 metadata:
-  version: 0
+  version: 1
lm_eval/tasks/arabicmmlu/_arabicmmlu_other.yaml
@@ -6,4 +6,4 @@ aggregate_metric_list:
   - metric: acc
     weight_by_size: True
 metadata:
-  version: 0
+  version: 1
lm_eval/tasks/arabicmmlu/_arabicmmlu_social_science.yaml
@@ -6,4 +6,4 @@ aggregate_metric_list:
   - metric: acc
     weight_by_size: True
 metadata:
-  version: 0
+  version: 1
lm_eval/tasks/arabicmmlu/_arabicmmlu_stem.yaml
@@ -6,4 +6,4 @@ aggregate_metric_list:
   - metric: acc
     weight_by_size: True
 metadata:
-  version: 0
+  version: 1
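The six group configs above aggregate `acc` with `weight_by_size: True`. A minimal sketch of what that setting means, assuming it selects a mean weighted by each subtask's number of evaluated documents rather than a plain mean over subtasks. The function name and the sample numbers are hypothetical; they are not lm-eval's API or real ArabicMMLU results.

```python
def aggregate_acc(results, weight_by_size=True):
    """Combine per-subtask accuracies into a single group score.

    `results` is a list of (accuracy, num_docs) pairs, one per subtask.
    """
    if not results:
        raise ValueError("need at least one subtask result")
    if weight_by_size:
        # Each subtask contributes in proportion to its document count.
        total_docs = sum(n for _, n in results)
        return sum(acc * n for acc, n in results) / total_docs
    # Unweighted: every subtask counts equally, regardless of size.
    return sum(acc for acc, _ in results) / len(results)


# Hypothetical numbers: a small subtask scoring 0.50 and a larger one scoring 0.75.
subtasks = [(0.50, 100), (0.75, 300)]
print(aggregate_acc(subtasks))                        # 0.6875
print(aggregate_acc(subtasks, weight_by_size=False))  # 0.625
```

With weighting, the larger subtask pulls the group score toward its own accuracy.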
lm_eval/tasks/arabicmmlu/_default_arabicmmlu_template_yaml
-dataset_path: yazeed7/ArabicMMLU
+dataset_path: MBZUAI/ArabicMMLU
 test_split: test
 fewshot_split: dev
 fewshot_config:
@@ -12,4 +12,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  version: 0.0
+  version: 1.0
lm_eval/tasks/arabicmmlu/_generate_configs.py
@@ -14,46 +14,46 @@ eval_logger = logging.getLogger("lm-eval")
 SUBJECTS = {
-    "Driving Test": "other",
-    "High Geography": "social_science",
-    "High History": "humanities",
     "Islamic Studies": "humanities",
-    "Univ Accounting": "social_science",
-    "Primary General Knowledge": "other",
-    "Univ Political Science": "social_science",
-    "Primary Math": "stem",
-    "Middle General Knowledge": "other",
-    "High Biology": "stem",
-    "Primary Natural Science": "stem",
-    "High Economics": "social_science",
-    "Middle Natural Science": "stem",
-    "Middle Geography": "social_science",
-    "Primary Social Science": "social_science",
-    "Middle Computer Science": "stem",
-    "Middle Islamic Studies": "humanities",
-    "Primary Computer Science": "stem",
-    "High Physics": "stem",
-    "Middle Social Science": "social_science",
-    "Middle Civics": "social_science",
-    "High Computer Science": "stem",
+    "Driving Test": "other",
+    "Natural Science (Middle School)": "stem",
+    "Natural Science (Primary School)": "stem",
+    "History (Primary School)": "humanities",
+    "History (Middle School)": "humanities",
+    "History (High School)": "humanities",
     "General Knowledge": "other",
-    "High Civics": "social_science",
-    "Prof Law": "humanities",
-    "High Islamic Studies": "humanities",
-    "Primary Arabic Language": "language",
-    "High Arabic Language": "language",
-    "Arabic Language (Grammar)": "language",
-    "Primary History": "humanities",
-    "Middle History": "humanities",
-    "Univ Economics": "social_science",
+    "General Knowledge (Primary School)": "other",
+    "General Knowledge (Middle School)": "other",
+    "Law (Professional)": "humanities",
+    "Physics (High School)": "stem",
+    "Social Science (Middle School)": "social_science",
+    "Social Science (Primary School)": "social_science",
+    "Management (University)": "other",
+    "Arabic Language (Primary School)": "language",
+    "Arabic Language (Middle School)": "language",
+    "Arabic Language (High School)": "language",
+    "Political Science (University)": "social_science",
+    "Philosophy (High School)": "humanities",
+    "Accounting (University)": "social_science",
+    "Computer Science (University)": "stem",
+    "Computer Science (Middle School)": "stem",
+    "Computer Science (Primary School)": "stem",
+    "Computer Science (High School)": "stem",
+    "Geography (Primary School)": "social_science",
+    "Geography (Middle School)": "social_science",
+    "Geography (High School)": "social_science",
+    "Math (Primary School)": "stem",
+    "Biology (High School)": "stem",
+    "Economics (University)": "social_science",
+    "Economics (Middle School)": "social_science",
+    "Economics (High School)": "social_science",
     "Arabic Language (General)": "language",
-    "Univ Computer Science": "stem",
-    "Primary Islamic Studies": "humanities",
-    "Primary Geography": "social_science",
-    "High Philosophy": "humanities",
-    "Middle Arabic Language": "language",
-    "Middle Economics": "social_science",
-    "Univ Management": "other",
+    "Arabic Language (Grammar)": "language",
+    "Islamic Studies (High School)": "humanities",
+    "Islamic Studies (Middle School)": "humanities",
+    "Islamic Studies (Primary School)": "humanities",
+    "Civics (Middle School)": "social_science",
+    "Civics (High School)": "social_science",
 }
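The new SUBJECTS keys follow one pattern: a leading level word ("High", "Middle", "Primary", "Univ", "Prof") becomes a parenthesised suffix, names without a level are unchanged, and every subject keeps its category. The helper below is hypothetical (it is not part of `_generate_configs.py`); it only reproduces that mapping so the renames are easy to check.

```python
# Level prefixes used by the old keys, mapped to the suffix used by the new keys.
LEVELS = {
    "Primary": "Primary School",
    "Middle": "Middle School",
    "High": "High School",
    "Univ": "University",
    "Prof": "Professional",
}


def new_subject_name(old_name: str) -> str:
    """Map an old SUBJECTS key (e.g. "High Physics") to its new form."""
    prefix, _, rest = old_name.partition(" ")
    if prefix in LEVELS and rest:
        return f"{rest} ({LEVELS[prefix]})"
    # Keys without a level prefix ("Driving Test", "General Knowledge", ...) are kept.
    return old_name


for old in ["High Physics", "Univ Management", "Prof Law", "Driving Test"]:
    print(f"{old!r} -> {new_subject_name(old)!r}")
```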
@@ -69,8 +69,9 @@ if __name__ == "__main__":
     # get filename of base_yaml so we can `"include": ` it in our "other" YAMLs.
     base_yaml_name = os.path.split(args.base_yaml_path)[-1]
-    with open(args.base_yaml_path, encoding="utf-8") as f:
-        base_yaml = yaml.full_load(f)
+    # with open(args.base_yaml_path, encoding="utf-8") as f:
+    # base_yaml = yaml.full_load(f)
     ALL_CATEGORIES = []
     for subject, category in tqdm(SUBJECTS.items()):
@@ -81,8 +82,8 @@ if __name__ == "__main__":
         yaml_dict = {
             "include": base_yaml_name,
-            "tag": f"arabicmmlu_{category}",
-            "task": f"arabicmmlu_{subject.lower().replace(' ', '_')}",
+            "tag": f"arabicmmlu_{category}_tasks",
+            "task": f"arabicmmlu_{subject.lower().replace(' ', '_').replace('(', '').replace(')', '')}",
             "task_alias": subject,
             "dataset_name": subject,
             # "description": description,
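Taken together, the two changed expressions append `_tasks` to each tag and strip parentheses from the task name, so "Arabic Language (General)" now yields `arabicmmlu_arabic_language_general` instead of `arabicmmlu_arabic_language_(general)`. The sketch below wraps the old and new expressions in hypothetical helpers `old_names` and `new_names` for side-by-side comparison; the script itself builds `yaml_dict` inline and also sets `include`, `task_alias` and `dataset_name`.

```python
def old_names(subject: str, category: str):
    """(tag, task) as built by the expressions removed in this commit."""
    tag = f"arabicmmlu_{category}"
    task = f"arabicmmlu_{subject.lower().replace(' ', '_')}"
    return tag, task


def new_names(subject: str, category: str):
    """(tag, task) as built by the expressions added in this commit."""
    tag = f"arabicmmlu_{category}_tasks"
    # Parentheses are dropped so task names contain only letters, digits and "_".
    task = f"arabicmmlu_{subject.lower().replace(' ', '_').replace('(', '').replace(')', '')}"
    return tag, task


for subject, category in [("Arabic Language (General)", "language"),
                          ("Computer Science (Middle School)", "stem")]:
    print(old_names(subject, category), "->", new_names(subject, category))
```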
lm_eval/tasks/arabicmmlu/arabicmmlu_middle_civics.yaml → lm_eval/tasks/arabicmmlu/arabicmmlu_accounting_university.yaml
-"dataset_name": "Middle Civics"
-"tag": "arabicmmlu_social_science_tasks"
+"dataset_name": "Accounting (University)"
 "include": "_default_arabicmmlu_template_yaml"
-"task": "arabicmmlu_middle_civics"
-"task_alias": "Middle Civics"
+"tag": "arabicmmlu_social_science_tasks"
+"task": "arabicmmlu_accounting_university"
+"task_alias": "Accounting (University)"
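Each per-subject file sets only five keys and pulls the rest (dataset path, splits, metrics, metadata) from `_default_arabicmmlu_template_yaml` via `include`. A minimal sketch of that resolution, assuming a shallow merge in which the subject file's keys override the template's. `TEMPLATES` is a hypothetical in-memory stand-in for the parsed template and lists only the keys visible in this diff; `resolve_include` is likewise illustrative, not lm-eval's loader.

```python
# Hypothetical stand-in for the parsed template; only keys shown in the diff.
TEMPLATES = {
    "_default_arabicmmlu_template_yaml": {
        "dataset_path": "MBZUAI/ArabicMMLU",
        "test_split": "test",
        "fewshot_split": "dev",
        "metadata": {"version": 1.0},
    }
}


def resolve_include(task_cfg: dict, templates: dict) -> dict:
    """Return task_cfg merged on top of the template named by its `include` key."""
    cfg = dict(task_cfg)              # copy so the caller's dict is untouched
    name = cfg.pop("include", None)
    if name is None:
        return cfg
    merged = dict(templates[name])    # start from the template's keys
    merged.update(cfg)                # per-subject keys win on conflict
    return merged


accounting = {
    "dataset_name": "Accounting (University)",
    "include": "_default_arabicmmlu_template_yaml",
    "tag": "arabicmmlu_social_science_tasks",
    "task": "arabicmmlu_accounting_university",
    "task_alias": "Accounting (University)",
}
resolved = resolve_include(accounting, TEMPLATES)
print(resolved["task"], resolved["dataset_path"], resolved["dataset_name"])
```

A nested `include` inside the template itself is not handled by this sketch.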
lm_eval/tasks/arabicmmlu/arabicmmlu_arabic_language_general.yaml
 "dataset_name": "Arabic Language (General)"
-"tag": "arabicmmlu_language_tasks"
 "include": "_default_arabicmmlu_template_yaml"
-"task": "arabicmmlu_arabic_language_(general)"
+"tag": "arabicmmlu_language_tasks"
+"task": "arabicmmlu_arabic_language_general"
 "task_alias": "Arabic Language (General)"
lm_eval/tasks/arabicmmlu/arabicmmlu_arabic_language_grammar.yaml
 "dataset_name": "Arabic Language (Grammar)"
-"tag": "arabicmmlu_language_tasks"
 "include": "_default_arabicmmlu_template_yaml"
-"task": "arabicmmlu_arabic_language_(grammar)"
+"tag": "arabicmmlu_language_tasks"
+"task": "arabicmmlu_arabic_language_grammar"
 "task_alias": "Arabic Language (Grammar)"
lm_eval/tasks/arabicmmlu/arabicmmlu_high_arabic_language.yaml → lm_eval/tasks/arabicmmlu/arabicmmlu_arabic_language_high_school.yaml
-"dataset_name": "High Arabic Language"
-"tag": "arabicmmlu_language_tasks"
+"dataset_name": "Arabic Language (High School)"
 "include": "_default_arabicmmlu_template_yaml"
-"task": "arabicmmlu_high_arabic_language"
-"task_alias": "High Arabic Language"
+"tag": "arabicmmlu_language_tasks"
+"task": "arabicmmlu_arabic_language_high_school"
+"task_alias": "Arabic Language (High School)"
lm_eval/tasks/arabicmmlu/arabicmmlu_middle_arabic_language.yaml → lm_eval/tasks/arabicmmlu/arabicmmlu_arabic_language_middle_school.yaml
-"dataset_name": "Middle Arabic Language"
-"tag": "arabicmmlu_language_tasks"
+"dataset_name": "Arabic Language (Middle School)"
 "include": "_default_arabicmmlu_template_yaml"
-"task": "arabicmmlu_middle_arabic_language"
-"task_alias": "Middle Arabic Language"
+"tag": "arabicmmlu_language_tasks"
+"task": "arabicmmlu_arabic_language_middle_school"
+"task_alias": "Arabic Language (Middle School)"
lm_eval/tasks/arabicmmlu/arabicmmlu_primary_arabic_language.yaml → lm_eval/tasks/arabicmmlu/arabicmmlu_arabic_language_primary_school.yaml
-"dataset_name": "Primary Arabic Language"
-"tag": "arabicmmlu_language_tasks"
+"dataset_name": "Arabic Language (Primary School)"
 "include": "_default_arabicmmlu_template_yaml"
-"task": "arabicmmlu_primary_arabic_language"
-"task_alias": "Primary Arabic Language"
+"tag": "arabicmmlu_language_tasks"
+"task": "arabicmmlu_arabic_language_primary_school"
+"task_alias": "Arabic Language (Primary School)"
lm_eval/tasks/arabicmmlu/arabicmmlu_high_physics.yaml → lm_eval/tasks/arabicmmlu/arabicmmlu_biology_high_school.yaml
-"dataset_name": "High Physics"
-"tag": "arabicmmlu_stem_tasks"
+"dataset_name": "Biology (High School)"
 "include": "_default_arabicmmlu_template_yaml"
-"task": "arabicmmlu_high_physics"
-"task_alias": "High Physics"
+"tag": "arabicmmlu_stem_tasks"
+"task": "arabicmmlu_biology_high_school"
+"task_alias": "Biology (High School)"
lm_eval/tasks/arabicmmlu/arabicmmlu_high_economics.yaml → lm_eval/tasks/arabicmmlu/arabicmmlu_civics_high_school.yaml
-"dataset_name": "High Economics"
-"tag": "arabicmmlu_social_science_tasks"
+"dataset_name": "Civics (High School)"
 "include": "_default_arabicmmlu_template_yaml"
-"task": "arabicmmlu_high_economics"
-"task_alias": "High Economics"
+"tag": "arabicmmlu_social_science_tasks"
+"task": "arabicmmlu_civics_high_school"
+"task_alias": "Civics (High School)"
lm_eval/tasks/arabicmmlu/arabicmmlu_high_geography.yaml → lm_eval/tasks/arabicmmlu/arabicmmlu_civics_middle_school.yaml
-"dataset_name": "High Geography"
-"tag": "arabicmmlu_social_science_tasks"
+"dataset_name": "Civics (Middle School)"
 "include": "_default_arabicmmlu_template_yaml"
-"task": "arabicmmlu_high_geography"
-"task_alias": "High Geography"
+"tag": "arabicmmlu_social_science_tasks"
+"task": "arabicmmlu_civics_middle_school"
+"task_alias": "Civics (Middle School)"
lm_eval/tasks/arabicmmlu/arabicmmlu_high_computer_science.yaml → lm_eval/tasks/arabicmmlu/arabicmmlu_computer_science_high_school.yaml
-"dataset_name": "High Computer Science"
-"tag": "arabicmmlu_stem_tasks"
+"dataset_name": "Computer Science (High School)"
 "include": "_default_arabicmmlu_template_yaml"
-"task": "arabicmmlu_high_computer_science"
-"task_alias": "High Computer Science"
+"tag": "arabicmmlu_stem_tasks"
+"task": "arabicmmlu_computer_science_high_school"
+"task_alias": "Computer Science (High School)"
lm_eval/tasks/arabicmmlu/arabicmmlu_computer_science_middle_school.yaml (new file, mode 100644)
+"dataset_name": "Computer Science (Middle School)"
+"include": "_default_arabicmmlu_template_yaml"
+"tag": "arabicmmlu_stem_tasks"
+"task": "arabicmmlu_computer_science_middle_school"
+"task_alias": "Computer Science (Middle School)"
lm_eval/tasks/arabicmmlu/arabicmmlu_computer_science_primary_school.yaml (new file, mode 100644)
+"dataset_name": "Computer Science (Primary School)"
+"include": "_default_arabicmmlu_template_yaml"
+"tag": "arabicmmlu_stem_tasks"
+"task": "arabicmmlu_computer_science_primary_school"
+"task_alias": "Computer Science (Primary School)"