Skip to content
GitLab
Menu
Projects
Groups
Snippets
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in / Register
Toggle navigation
Menu
Open sidebar
gaoqiong
lm-evaluation-harness
Commits
4eecbabb
Commit
4eecbabb
authored
Sep 16, 2024
by
Baber
Browse files
Merge branch 'main' into prefill
parents
dac8b534
fb963f0f
Changes
465
Hide whitespace changes
Inline
Side-by-side
Showing
20 changed files
with
460 additions
and
0 deletions
+460
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Jordan.yaml
...abic_leaderboard_avca/arabic_leaderboard_acva_Jordan.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Kuwait.yaml
...abic_leaderboard_avca/arabic_leaderboard_acva_Kuwait.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Lebanon.yaml
...bic_leaderboard_avca/arabic_leaderboard_acva_Lebanon.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Libya.yaml
...rabic_leaderboard_avca/arabic_leaderboard_acva_Libya.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Mauritania.yaml
..._leaderboard_avca/arabic_leaderboard_acva_Mauritania.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Mesopotamia_civilization.yaml
...vca/arabic_leaderboard_acva_Mesopotamia_civilization.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Morocco.yaml
...bic_leaderboard_avca/arabic_leaderboard_acva_Morocco.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Oman.yaml
...arabic_leaderboard_avca/arabic_leaderboard_acva_Oman.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Palestine.yaml
...c_leaderboard_avca/arabic_leaderboard_acva_Palestine.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Qatar.yaml
...rabic_leaderboard_avca/arabic_leaderboard_acva_Qatar.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Saudi_Arabia.yaml
...eaderboard_avca/arabic_leaderboard_acva_Saudi_Arabia.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Somalia.yaml
...bic_leaderboard_avca/arabic_leaderboard_acva_Somalia.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Sudan.yaml
...rabic_leaderboard_avca/arabic_leaderboard_acva_Sudan.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Syria.yaml
...rabic_leaderboard_avca/arabic_leaderboard_acva_Syria.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Tunisia.yaml
...bic_leaderboard_avca/arabic_leaderboard_acva_Tunisia.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_United_Arab_Emirates.yaml
...rd_avca/arabic_leaderboard_acva_United_Arab_Emirates.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Yemen.yaml
...rabic_leaderboard_avca/arabic_leaderboard_acva_Yemen.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_communication.yaml
...aderboard_avca/arabic_leaderboard_acva_communication.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_computer_and_phone.yaml
...oard_avca/arabic_leaderboard_acva_computer_and_phone.yaml
+23
-0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_daily_life.yaml
..._leaderboard_avca/arabic_leaderboard_acva_daily_life.yaml
+23
-0
No files found.
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Jordan.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_Jordan
dataset_path
:
OALL/ACVA
dataset_name
:
Jordan
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Kuwait.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_Kuwait
dataset_path
:
OALL/ACVA
dataset_name
:
Kuwait
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Lebanon.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_Lebanon
dataset_path
:
OALL/ACVA
dataset_name
:
Lebanon
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Libya.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_Libya
dataset_path
:
OALL/ACVA
dataset_name
:
Libya
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Mauritania.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_Mauritania
dataset_path
:
OALL/ACVA
dataset_name
:
Mauritania
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Mesopotamia_civilization.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_Mesopotamia_civilization
dataset_path
:
OALL/ACVA
dataset_name
:
Mesopotamia_civilization
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Morocco.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_Morocco
dataset_path
:
OALL/ACVA
dataset_name
:
Morocco
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Oman.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_Oman
dataset_path
:
OALL/ACVA
dataset_name
:
Oman
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Palestine.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_Palestine
dataset_path
:
OALL/ACVA
dataset_name
:
Palestine
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Qatar.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_Qatar
dataset_path
:
OALL/ACVA
dataset_name
:
Qatar
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Saudi_Arabia.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_Saudi_Arabia
dataset_path
:
OALL/ACVA
dataset_name
:
Saudi_Arabia
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Somalia.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_Somalia
dataset_path
:
OALL/ACVA
dataset_name
:
Somalia
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Sudan.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_Sudan
dataset_path
:
OALL/ACVA
dataset_name
:
Sudan
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Syria.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_Syria
dataset_path
:
OALL/ACVA
dataset_name
:
Syria
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Tunisia.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_Tunisia
dataset_path
:
OALL/ACVA
dataset_name
:
Tunisia
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_United_Arab_Emirates.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_United_Arab_Emirates
dataset_path
:
OALL/ACVA
dataset_name
:
United_Arab_Emirates
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_Yemen.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_Yemen
dataset_path
:
OALL/ACVA
dataset_name
:
Yemen
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_communication.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_communication
dataset_path
:
OALL/ACVA
dataset_name
:
communication
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_computer_and_phone.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_computer_and_phone
dataset_path
:
OALL/ACVA
dataset_name
:
computer_and_phone
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
lm_eval/tasks/arabic_leaderboard_complete/arabic_leaderboard_avca/arabic_leaderboard_acva_daily_life.yaml
0 → 100644
View file @
4eecbabb
task
:
arabic_leaderboard_acva_daily_life
dataset_path
:
OALL/ACVA
dataset_name
:
daily_life
output_type
:
multiple_choice
training_split
:
null
validation_split
:
validation
test_split
:
test
process_docs
:
!function
utils.process_docs
doc_to_text
:
"
{{query}}"
doc_to_target
:
"
{{gold}}"
doc_to_choice
:
"
choices"
fewshot_split
:
validation
fewshot_config
:
sampler
:
first_n
metric_list
:
-
metric
:
acc
aggregation
:
mean
higher_is_better
:
true
-
metric
:
acc_norm
aggregation
:
mean
higher_is_better
:
true
metadata
:
version
:
1.0
Prev
1
…
5
6
7
8
9
10
11
12
13
…
24
Next
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment