gaoqiong / lm-evaluation-harness
Commit 4eecbabb, authored Sep 16, 2024 by Baber
Merge branch 'main' into prefill
Parents: dac8b534, fb963f0f
Changes (465)
Showing 20 changed files with 460 additions and 0 deletions
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Food_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Funeral_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Geography_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_History_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Language_Origin_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Literature_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Math_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Medicine_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Music_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Ornament_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Philosophy_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Physics_and_Chemistry_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Wedding_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Bahrain_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Comoros_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Egypt_modern_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_InfluenceFromAncientEgypt_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_InfluenceFromByzantium_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_InfluenceFromChina_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_InfluenceFromGreece_light.yaml  +23 -0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Food_light.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Food_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: Arabic_Food
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
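To make the `metric_list` entries concrete, here is a minimal sketch of how `acc` and `acc_norm` are typically scored for a `multiple_choice` task and then aggregated with `mean`. It is not the harness's own code: the function names, the sample log-likelihoods, and the choice of character count as the length normaliser are all assumptions made for illustration.

```python
def score_multiple_choice(loglikelihoods, choices, gold):
    """Return (acc, acc_norm) for one document.

    loglikelihoods[i] is the model's log-likelihood of choices[i];
    gold is the index of the correct choice.
    """
    indices = range(len(choices))
    # acc: pick the choice with the highest raw log-likelihood.
    pred = max(indices, key=lambda i: loglikelihoods[i])
    # acc_norm: divide by the choice length first (assumed here to be the
    # character count), so longer answers are not penalised for length.
    pred_norm = max(indices, key=lambda i: loglikelihoods[i] / len(choices[i]))
    return float(pred == gold), float(pred_norm == gold)


def mean(values):
    """The `aggregation: mean` step: average per-document scores."""
    return sum(values) / len(values)


# Hypothetical documents, not taken from the ACVA dataset.
docs = [
    {"lls": [-2.0, -3.0], "choices": ["yes", "no"], "gold": 0},
    {"lls": [-4.0, -5.0], "choices": ["a", "a much longer answer"], "gold": 1},
]
scores = [score_multiple_choice(d["lls"], d["choices"], d["gold"]) for d in docs]
acc = mean([s[0] for s in scores])
acc_norm = mean([s[1] for s in scores])
```

In the second hypothetical document the raw score prefers the short wrong answer, while the length-normalised score prefers the long correct one, which is why the two metrics can disagree.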
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Funeral_light.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Funeral_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: Arabic_Funeral
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Geography_light.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Geography_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: Arabic_Geography
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_History_light.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_History_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: Arabic_History
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Language_Origin_light.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Language_Origin_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: Arabic_Language_Origin
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Literature_light.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Literature_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: Arabic_Literature
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Math_light.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Math_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: Arabic_Math
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Medicine_light.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Medicine_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: Arabic_Medicine
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Music_light.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Music_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: Arabic_Music
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Ornament_light.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Ornament_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: Arabic_Ornament
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Philosophy_light.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Philosophy_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: Arabic_Philosophy
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Physics_and_Chemistry_light.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Physics_and_Chemistry_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: Arabic_Physics_and_Chemistry
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Arabic_Wedding_light.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Wedding_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: Arabic_Wedding
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Bahrain_light.yaml
0 → 100644
task: arabic_leaderboard_acva_Bahrain_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: Bahrain
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Comoros_light.yaml
0 → 100644
task: arabic_leaderboard_acva_Comoros_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: Comoros
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_Egypt_modern_light.yaml
0 → 100644
task: arabic_leaderboard_acva_Egypt_modern_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: Egypt_modern
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_InfluenceFromAncientEgypt_light.yaml
0 → 100644
task: arabic_leaderboard_acva_InfluenceFromAncientEgypt_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: InfluenceFromAncientEgypt
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_InfluenceFromByzantium_light.yaml
0 → 100644
task: arabic_leaderboard_acva_InfluenceFromByzantium_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: InfluenceFromByzantium
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_InfluenceFromChina_light.yaml
0 → 100644
task: arabic_leaderboard_acva_InfluenceFromChina_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: InfluenceFromChina
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_light/arabic_leaderboard_avca_light/arabic_leaderboard_acva_InfluenceFromGreece_light.yaml
0 → 100644
task: arabic_leaderboard_acva_InfluenceFromGreece_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: InfluenceFromGreece
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
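The twenty configs on this page differ only in the `task` and `dataset_name` values. As an illustration of that pattern, here is a hedged sketch that renders the same 23-line body from a single template. The commit does not say how the files were produced, so `TEMPLATE`, `SUBSETS`, `render`, `filename`, and the `@SUBSET@` placeholder are hypothetical names, not part of the repository.

```python
# Hypothetical template: the shared body of the configs shown on this page,
# with @SUBSET@ marking the only part that varies between files.
TEMPLATE = """\
task: arabic_leaderboard_acva_@SUBSET@_light
dataset_path: arcee-globe/ACVA-10percent
dataset_name: @SUBSET@
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
"""

# The dataset_name values of the 20 files on this page of the diff.
SUBSETS = [
    "Arabic_Food", "Arabic_Funeral", "Arabic_Geography", "Arabic_History",
    "Arabic_Language_Origin", "Arabic_Literature", "Arabic_Math",
    "Arabic_Medicine", "Arabic_Music", "Arabic_Ornament", "Arabic_Philosophy",
    "Arabic_Physics_and_Chemistry", "Arabic_Wedding", "Bahrain", "Comoros",
    "Egypt_modern", "InfluenceFromAncientEgypt", "InfluenceFromByzantium",
    "InfluenceFromChina", "InfluenceFromGreece",
]


def render(subset):
    """Return the YAML text for one subset.

    Plain str.replace is used instead of str.format so that the Jinja
    braces in "{{query}}" and "{{gold}}" pass through untouched.
    """
    return TEMPLATE.replace("@SUBSET@", subset)


def filename(subset):
    """Return the file name used for a subset's config."""
    return f"arabic_leaderboard_acva_{subset}_light.yaml"
```

Calling `render(name)` for each entry in `SUBSETS` and writing the result to `filename(name)` would reproduce the files listed above; the sketch deliberately leaves the file writing out.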