gaoqiong / lm-evaluation-harness
Commit 4eecbabb, authored Sep 16, 2024 by Baber
Merge branch 'main' into prefill
Parents: dac8b534, fb963f0f
Changes: 465
Showing 20 changed files with 505 additions and 0 deletions (+505 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_geography_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_government_and_politics_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_macroeconomics_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_mathematics_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_microeconomics_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_physics_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_psychology_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_statistics_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_us_history_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_world_history_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_human_aging_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_human_sexuality_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_international_law_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_jurisprudence_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_light.yaml (+68 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_logical_fallacies_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_machine_learning_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_management_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_marketing_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_medical_genetics_light.yaml (+23 -0)
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_geography_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_high_school_geography_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: high_school_geography
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
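The templates above reference only three fields: `query`, `gold` and `choices`. That means `utils.process_docs` has to add them to every row. The helper itself is not part of this diff, so the following is a hypothetical stand-in that only shows the shape of its output. The raw column names (`question`, `A` to `D`, `answer`), the Arabic answer labels and the trailing answer cue are all assumptions. Also, the real hook receives a Hugging Face `datasets.Dataset` rather than the plain list used here.

```python
# Hypothetical sketch: the real utils.process_docs is not shown in this commit.
# Assumed raw fields: "question", "A".."D", and "answer" (a Latin letter).

LATIN = ["A", "B", "C", "D"]
ARABIC = ["أ", "ب", "ج", "د"]  # assumed Arabic answer labels


def process_doc(doc: dict) -> dict:
    """Map one raw row to the fields the task YAML templates reference."""
    options = [doc[key] for key in LATIN]
    lines = [doc["question"]]
    lines += [f"{label}. {text}" for label, text in zip(ARABIC, options)]
    lines.append("الإجابة:")  # assumed answer cue ("Answer:")
    return {
        "query": "\n".join(lines),            # rendered by doc_to_text "{{query}}"
        "choices": list(ARABIC),              # field named by doc_to_choice "choices"
        "gold": LATIN.index(doc["answer"]),   # index into choices, via "{{gold}}"
    }


def process_docs(docs: list) -> list:
    """Apply process_doc to every row (the real hook would use Dataset.map)."""
    return [process_doc(d) for d in docs]
```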
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_government_and_politics_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_high_school_government_and_politics_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: high_school_government_and_politics
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
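With `fewshot_split: dev` and `sampler: first_n`, few-shot examples are drawn from the dev split in stored order instead of at random. The following minimal sketch shows that selection and the prompt it produces. It assumes processed docs with `query`/`choices`/`gold` fields. The delimiters (a space before each answer, a blank line between examples) are assumed here and are not taken from this config. The helper names `first_n` and `build_prompt` are illustrative only.

```python
def first_n(fewshot_docs: list, k: int) -> list:
    """Take the first k docs of the few-shot split, in order, with no shuffling."""
    if k > len(fewshot_docs):
        raise ValueError(f"requested {k} shots but only {len(fewshot_docs)} dev docs")
    return fewshot_docs[:k]


def build_prompt(shots: list, target_doc: dict, delimiter: str = "\n\n") -> str:
    """Join solved examples and the unsolved target query (assumed delimiters)."""
    # Each shot contributes its query followed by its gold answer label.
    parts = [f"{s['query']} {s['choices'][s['gold']]}" for s in shots]
    parts.append(target_doc["query"])
    return delimiter.join(parts)
```

Because the order is fixed, every evaluated model sees the same examples, which keeps scores comparable across runs.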
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_macroeconomics_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_high_school_macroeconomics_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: high_school_macroeconomics
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
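Both entries in `metric_list` are computed from the same per-choice log-likelihoods. As we understand the harness, `acc` checks whether the highest-scoring choice is the gold one. `acc_norm` first divides each score by the choice's length in characters and then applies the same check. `aggregation: mean` averages the per-document 0/1 values. The sketch below shows this; the function names are hypothetical.

```python
def argmax(xs: list) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def score_doc(logliks: list, choices: list, gold: int) -> tuple:
    """Return (acc, acc_norm) for one multiple_choice doc.

    logliks[i] is the model's log-likelihood of choices[i] given the prompt.
    """
    acc = float(argmax(logliks) == gold)
    # Length-normalise so longer answers are not penalised for extra tokens.
    normed = [ll / len(c) for ll, c in zip(logliks, choices)]
    acc_norm = float(argmax(normed) == gold)
    return acc, acc_norm


def mean(values: list) -> float:
    """The 'mean' aggregation over per-document scores."""
    return sum(values) / len(values)
```

When every choice has the same length (single-letter labels, for instance), `acc_norm` reduces to `acc`.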
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_mathematics_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_high_school_mathematics_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: high_school_mathematics
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_microeconomics_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_high_school_microeconomics_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: high_school_microeconomics
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_physics_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_high_school_physics_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: high_school_physics
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_psychology_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_high_school_psychology_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: high_school_psychology
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_statistics_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_high_school_statistics_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: high_school_statistics
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_us_history_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_high_school_us_history_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: high_school_us_history
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_high_school_world_history_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_high_school_world_history_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: high_school_world_history
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_human_aging_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_human_aging_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: human_aging
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_human_sexuality_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_human_sexuality_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: human_sexuality
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_international_law_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_international_law_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: international_law
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_jurisprudence_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_jurisprudence_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: jurisprudence
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_light.yaml
new file, mode 100644

group: arabic_leaderboard_arabic_mmlu_light
task:
  - arabic_leaderboard_arabic_mmlu_abstract_algebra_light
  - arabic_leaderboard_arabic_mmlu_anatomy_light
  - arabic_leaderboard_arabic_mmlu_astronomy_light
  - arabic_leaderboard_arabic_mmlu_business_ethics_light
  - arabic_leaderboard_arabic_mmlu_clinical_knowledge_light
  - arabic_leaderboard_arabic_mmlu_college_biology_light
  - arabic_leaderboard_arabic_mmlu_college_chemistry_light
  - arabic_leaderboard_arabic_mmlu_college_computer_science_light
  - arabic_leaderboard_arabic_mmlu_college_mathematics_light
  - arabic_leaderboard_arabic_mmlu_college_medicine_light
  - arabic_leaderboard_arabic_mmlu_college_physics_light
  - arabic_leaderboard_arabic_mmlu_computer_security_light
  - arabic_leaderboard_arabic_mmlu_conceptual_physics_light
  - arabic_leaderboard_arabic_mmlu_econometrics_light
  - arabic_leaderboard_arabic_mmlu_electrical_engineering_light
  - arabic_leaderboard_arabic_mmlu_elementary_mathematics_light
  - arabic_leaderboard_arabic_mmlu_formal_logic_light
  - arabic_leaderboard_arabic_mmlu_global_facts_light
  - arabic_leaderboard_arabic_mmlu_high_school_biology_light
  - arabic_leaderboard_arabic_mmlu_high_school_chemistry_light
  - arabic_leaderboard_arabic_mmlu_high_school_computer_science_light
  - arabic_leaderboard_arabic_mmlu_high_school_european_history_light
  - arabic_leaderboard_arabic_mmlu_high_school_geography_light
  - arabic_leaderboard_arabic_mmlu_high_school_government_and_politics_light
  - arabic_leaderboard_arabic_mmlu_high_school_macroeconomics_light
  - arabic_leaderboard_arabic_mmlu_high_school_mathematics_light
  - arabic_leaderboard_arabic_mmlu_high_school_microeconomics_light
  - arabic_leaderboard_arabic_mmlu_high_school_physics_light
  - arabic_leaderboard_arabic_mmlu_high_school_psychology_light
  - arabic_leaderboard_arabic_mmlu_high_school_statistics_light
  - arabic_leaderboard_arabic_mmlu_high_school_us_history_light
  - arabic_leaderboard_arabic_mmlu_high_school_world_history_light
  - arabic_leaderboard_arabic_mmlu_human_aging_light
  - arabic_leaderboard_arabic_mmlu_human_sexuality_light
  - arabic_leaderboard_arabic_mmlu_international_law_light
  - arabic_leaderboard_arabic_mmlu_jurisprudence_light
  - arabic_leaderboard_arabic_mmlu_logical_fallacies_light
  - arabic_leaderboard_arabic_mmlu_machine_learning_light
  - arabic_leaderboard_arabic_mmlu_management_light
  - arabic_leaderboard_arabic_mmlu_marketing_light
  - arabic_leaderboard_arabic_mmlu_medical_genetics_light
  - arabic_leaderboard_arabic_mmlu_miscellaneous_light
  - arabic_leaderboard_arabic_mmlu_moral_disputes_light
  - arabic_leaderboard_arabic_mmlu_moral_scenarios_light
  - arabic_leaderboard_arabic_mmlu_nutrition_light
  - arabic_leaderboard_arabic_mmlu_philosophy_light
  - arabic_leaderboard_arabic_mmlu_prehistory_light
  - arabic_leaderboard_arabic_mmlu_professional_accounting_light
  - arabic_leaderboard_arabic_mmlu_professional_law_light
  - arabic_leaderboard_arabic_mmlu_professional_medicine_light
  - arabic_leaderboard_arabic_mmlu_professional_psychology_light
  - arabic_leaderboard_arabic_mmlu_public_relations_light
  - arabic_leaderboard_arabic_mmlu_security_studies_light
  - arabic_leaderboard_arabic_mmlu_sociology_light
  - arabic_leaderboard_arabic_mmlu_us_foreign_policy_light
  - arabic_leaderboard_arabic_mmlu_virology_light
  - arabic_leaderboard_arabic_mmlu_world_religions_light
aggregate_metric_list:
  - metric: acc
    aggregation: mean
    weight_by_size: true
  - metric: acc_norm
    aggregation: mean
    weight_by_size: true
metadata:
  version: 1.0
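With `weight_by_size: true`, the group score is a mean over subtasks weighted by the number of test documents in each subtask, so large subjects count for more than small ones. In symbols, $\bar{s} = \sum_i n_i s_i / \sum_i n_i$. The following minimal sketch computes both the weighted and the plain mean. The function name and its dictionary arguments are hypothetical.

```python
def aggregate(scores: dict, sizes: dict, weight_by_size: bool = True) -> float:
    """Combine per-subtask mean scores into one group score.

    scores maps subtask name -> mean metric value; sizes maps subtask name ->
    number of evaluated documents in that subtask.
    """
    names = list(scores)
    if weight_by_size:
        total = sum(sizes[n] for n in names)
        return sum(scores[n] * sizes[n] for n in names) / total
    # Unweighted: every subtask counts equally regardless of its size.
    return sum(scores[n] for n in names) / len(names)
```

For two subtasks scoring 0.5 on 30 documents and 1.0 on 10 documents, the size-weighted mean is 25 / 40 = 0.625. The plain mean would be 0.75.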
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_logical_fallacies_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_logical_fallacies_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: logical_fallacies
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_machine_learning_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_machine_learning_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: machine_learning
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_management_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_management_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: management
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_marketing_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_marketing_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: marketing
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_arabic_mmlu_light/arabic_leaderboard_arabic_mmlu_medical_genetics_light.yaml
new file, mode 100644

task: arabic_leaderboard_arabic_mmlu_medical_genetics_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: medical_genetics
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
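Every per-subject file in this diff contains the same 23 lines; only the subject name differs. If the files ever need regenerating, a small script could render them all from one template. This is an illustrative sketch, not the tooling the authors used. `SUBJECTS` is a hypothetical subset, and the caller chooses the output directory. The sketch uses `string.Template` so that the Jinja-style `{{query}}` placeholders pass through untouched.

```python
from pathlib import Path
from string import Template

# Hypothetical subset; the full list is the group file's task list.
SUBJECTS = ["high_school_geography", "marketing", "medical_genetics"]

TEMPLATE = Template("""\
task: arabic_leaderboard_arabic_mmlu_${subject}_light
dataset_path: arcee-globe/Arabic_MMLU-10percent
dataset_name: ${subject}
output_type: multiple_choice
training_split: null
validation_split: dev
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: dev
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
""")


def render(subject: str) -> str:
    """Return the YAML text for one subject."""
    return TEMPLATE.substitute(subject=subject)


def write_all(subjects: list, out_dir: Path) -> list:
    """Write one <task>.yaml per subject into out_dir and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for subject in subjects:
        path = out_dir / f"arabic_leaderboard_arabic_mmlu_{subject}_light.yaml"
        path.write_text(render(subject), encoding="utf-8")
        paths.append(path)
    return paths
```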