gaoqiong / lm-evaluation-harness
Commit 4eecbabb, authored Sep 16, 2024 by Baber
Merge branch 'main' into prefill
Parents: dac8b534, fb963f0f
Changes: 465
Showing 20 changed files with 460 additions and 0 deletions (+460, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Arabic_Medicine.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Arabic_Music.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Arabic_Ornament.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Arabic_Philosophy.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Arabic_Physics_and_Chemistry.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Arabic_Wedding.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Bahrain.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Comoros.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Egypt_modern.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_InfluenceFromAncientEgypt.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_InfluenceFromByzantium.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_InfluenceFromChina.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_InfluenceFromGreece.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_InfluenceFromIslam.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_InfluenceFromPersia.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_InfluenceFromRome.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Iraq.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Islam_Education.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Islam_branches_and_schools.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Islamic_law_system.yaml (+23, -0)
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Arabic_Medicine.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Medicine
dataset_path: OALL/ACVA
dataset_name: Arabic_Medicine
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Arabic_Music.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Music
dataset_path: OALL/ACVA
dataset_name: Arabic_Music
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Arabic_Ornament.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Ornament
dataset_path: OALL/ACVA
dataset_name: Arabic_Ornament
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Arabic_Philosophy.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Philosophy
dataset_path: OALL/ACVA
dataset_name: Arabic_Philosophy
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Arabic_Physics_and_Chemistry.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Physics_and_Chemistry
dataset_path: OALL/ACVA
dataset_name: Arabic_Physics_and_Chemistry
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Arabic_Wedding.yaml
0 → 100644
task: arabic_leaderboard_acva_Arabic_Wedding
dataset_path: OALL/ACVA
dataset_name: Arabic_Wedding
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Bahrain.yaml
0 → 100644
task: arabic_leaderboard_acva_Bahrain
dataset_path: OALL/ACVA
dataset_name: Bahrain
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Comoros.yaml
0 → 100644
task: arabic_leaderboard_acva_Comoros
dataset_path: OALL/ACVA
dataset_name: Comoros
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Egypt_modern.yaml
0 → 100644
task: arabic_leaderboard_acva_Egypt_modern
dataset_path: OALL/ACVA
dataset_name: Egypt_modern
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_InfluenceFromAncientEgypt.yaml
0 → 100644
task: arabic_leaderboard_acva_InfluenceFromAncientEgypt
dataset_path: OALL/ACVA
dataset_name: InfluenceFromAncientEgypt
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_InfluenceFromByzantium.yaml
0 → 100644
task: arabic_leaderboard_acva_InfluenceFromByzantium
dataset_path: OALL/ACVA
dataset_name: InfluenceFromByzantium
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_InfluenceFromChina.yaml
0 → 100644
task: arabic_leaderboard_acva_InfluenceFromChina
dataset_path: OALL/ACVA
dataset_name: InfluenceFromChina
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_InfluenceFromGreece.yaml
0 → 100644
task: arabic_leaderboard_acva_InfluenceFromGreece
dataset_path: OALL/ACVA
dataset_name: InfluenceFromGreece
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_InfluenceFromIslam.yaml
0 → 100644
task: arabic_leaderboard_acva_InfluenceFromIslam
dataset_path: OALL/ACVA
dataset_name: InfluenceFromIslam
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_InfluenceFromPersia.yaml
0 → 100644
task: arabic_leaderboard_acva_InfluenceFromPersia
dataset_path: OALL/ACVA
dataset_name: InfluenceFromPersia
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_InfluenceFromRome.yaml
0 → 100644
task: arabic_leaderboard_acva_InfluenceFromRome
dataset_path: OALL/ACVA
dataset_name: InfluenceFromRome
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Iraq.yaml
0 → 100644
task: arabic_leaderboard_acva_Iraq
dataset_path: OALL/ACVA
dataset_name: Iraq
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Islam_Education.yaml
0 → 100644
task: arabic_leaderboard_acva_Islam_Education
dataset_path: OALL/ACVA
dataset_name: Islam_Education
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Islam_branches_and_schools.yaml
0 → 100644
task: arabic_leaderboard_acva_Islam_branches_and_schools
dataset_path: OALL/ACVA
dataset_name: Islam_branches_and_schools
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Islamic_law_system.yaml
0 → 100644
task: arabic_leaderboard_acva_Islamic_law_system
dataset_path: OALL/ACVA
dataset_name: Islamic_law_system
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{query}}"
doc_to_target: "{{gold}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
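
The 20 files shown on this page differ only in the ACVA subset name, which appears in `task`, `dataset_name`, and the file name. As a rough illustration of that pattern, the sketch below renders the same 23-line config from a template. It is not how the commit authors produced the files (the commit does not say); `TEMPLATE`, `render_config`, and `write_configs` are hypothetical names introduced here, and the template text is copied from the configs above.

```python
from pathlib import Path

# Template copied from the configs in this commit; only {subset} varies.
# Braces are doubled so that str.format emits the literal "{{query}}" and
# "{{gold}}" placeholders that the task config itself contains.
TEMPLATE = """\
task: arabic_leaderboard_acva_{subset}
dataset_path: OALL/ACVA
dataset_name: {subset}
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "{{{{query}}}}"
doc_to_target: "{{{{gold}}}}"
doc_to_choice: "choices"
fewshot_split: validation
fewshot_config:
  sampler: first_n
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
"""


def render_config(subset: str) -> str:
    """Return the YAML text for one ACVA subset (hypothetical helper)."""
    return TEMPLATE.format(subset=subset)


def write_configs(subsets, out_dir):
    """Write one YAML file per subset into out_dir and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for subset in subsets:
        path = out / f"arabic_leaderboard_acva_{subset}.yaml"
        path.write_text(render_config(subset), encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Print one rendered config for inspection.
    print(render_config("Arabic_Medicine"))
```

Calling `write_configs(["Iraq", "Bahrain"], "some_output_dir")` would write two files named like the ones in this commit; the output directory name is an assumption for illustration only.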