gaoqiong / lm-evaluation-harness · Commit 26bc3eab (unverified)

Authored Oct 19, 2023 by Lintang Sutawika; committed by GitHub on Oct 19, 2023

Merge branch 'big-refactor' into model-written-eval

Parents: 0d701496, cf617ab1
Changes: 381 files changed in this commit. This page of the diff shows 20 changed files, with 64 additions and 14 deletions (+64 / -14).
Files shown on this page:

lm_eval/tasks/belebele/belebele_uzn_Latn.yaml  +3 -0
lm_eval/tasks/belebele/belebele_vie_Latn.yaml  +3 -0
lm_eval/tasks/belebele/belebele_war_Latn.yaml  +3 -0
lm_eval/tasks/belebele/belebele_wol_Latn.yaml  +3 -0
lm_eval/tasks/belebele/belebele_xho_Latn.yaml  +3 -0
lm_eval/tasks/belebele/belebele_yor_Latn.yaml  +3 -0
lm_eval/tasks/belebele/belebele_zho_Hans.yaml  +3 -0
lm_eval/tasks/belebele/belebele_zho_Hant.yaml  +3 -0
lm_eval/tasks/belebele/belebele_zsm_Latn.yaml  +3 -0
lm_eval/tasks/belebele/belebele_zul_Latn.yaml  +3 -0
lm_eval/tasks/benchmarks/flan/yaml_templates/cot_template_yaml  +1 -1
lm_eval/tasks/benchmarks/flan/yaml_templates/held_in_template_yaml  +1 -1
lm_eval/tasks/benchmarks/minerva_math.yaml  +0 -0
lm_eval/tasks/benchmarks/t0_eval.yaml  +10 -10
lm_eval/tasks/bigbench/generate_tasks.py  +2 -2
lm_eval/tasks/bigbench/generate_until/abstract_narrative_understanding.yaml  +4 -0
lm_eval/tasks/bigbench/generate_until/anachronisms.yaml  +4 -0
lm_eval/tasks/bigbench/generate_until/analogical_similarity.yaml  +4 -0
lm_eval/tasks/bigbench/generate_until/analytic_entailment.yaml  +4 -0
lm_eval/tasks/bigbench/generate_until/arithmetic.yaml  +4 -0
lm_eval/tasks/belebele/belebele_uzn_Latn.yaml (new file, mode 100644)

+"dataset_name": "uzn_Latn"
+"include": "_default_template_yaml"
+"task": "belebele_uzn_Latn"

lm_eval/tasks/belebele/belebele_vie_Latn.yaml (new file, mode 100644)

+"dataset_name": "vie_Latn"
+"include": "_default_template_yaml"
+"task": "belebele_vie_Latn"

lm_eval/tasks/belebele/belebele_war_Latn.yaml (new file, mode 100644)

+"dataset_name": "war_Latn"
+"include": "_default_template_yaml"
+"task": "belebele_war_Latn"

lm_eval/tasks/belebele/belebele_wol_Latn.yaml (new file, mode 100644)

+"dataset_name": "wol_Latn"
+"include": "_default_template_yaml"
+"task": "belebele_wol_Latn"

lm_eval/tasks/belebele/belebele_xho_Latn.yaml (new file, mode 100644)

+"dataset_name": "xho_Latn"
+"include": "_default_template_yaml"
+"task": "belebele_xho_Latn"

lm_eval/tasks/belebele/belebele_yor_Latn.yaml (new file, mode 100644)

+"dataset_name": "yor_Latn"
+"include": "_default_template_yaml"
+"task": "belebele_yor_Latn"

lm_eval/tasks/belebele/belebele_zho_Hans.yaml (new file, mode 100644)

+"dataset_name": "zho_Hans"
+"include": "_default_template_yaml"
+"task": "belebele_zho_Hans"

lm_eval/tasks/belebele/belebele_zho_Hant.yaml (new file, mode 100644)

+"dataset_name": "zho_Hant"
+"include": "_default_template_yaml"
+"task": "belebele_zho_Hant"

lm_eval/tasks/belebele/belebele_zsm_Latn.yaml (new file, mode 100644)

+"dataset_name": "zsm_Latn"
+"include": "_default_template_yaml"
+"task": "belebele_zsm_Latn"

lm_eval/tasks/belebele/belebele_zul_Latn.yaml (new file, mode 100644)

+"dataset_name": "zul_Latn"
+"include": "_default_template_yaml"
+"task": "belebele_zul_Latn"
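
Each of the ten belebele files above follows the same three-key pattern: a quoted dataset_name, an include of _default_template_yaml, and a belebele_<lang> task name. As a hedged illustration only (this script is not part of the commit; the output directory and the language list are assumptions taken from this page), stubs in that style could be generated like so:

from pathlib import Path

# Language codes visible on this page of the diff (an assumed subset, not the full belebele set).
LANGS = [
    "uzn_Latn", "vie_Latn", "war_Latn", "wol_Latn", "xho_Latn",
    "yor_Latn", "zho_Hans", "zho_Hant", "zsm_Latn", "zul_Latn",
]

def write_belebele_stubs(out_dir: str = "lm_eval/tasks/belebele") -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for lang in LANGS:
        # Mirror the quoted-key YAML style of the generated files shown above.
        (out / f"belebele_{lang}.yaml").write_text(
            f'"dataset_name": "{lang}"\n'
            f'"include": "_default_template_yaml"\n'
            f'"task": "belebele_{lang}"\n'
        )

if __name__ == "__main__":
    write_belebele_stubs()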
lm_eval/tasks/benchmarks/flan/yaml_templates/cot_template_yaml

 group: flan-cot
-output_type: greedy_until
+output_type: generate_until
 validation_split: validation
 doc_to_target: "{{answer}}"
 metric_list:
 ...
lm_eval/tasks/benchmarks/flan/yaml_templates/held_in_template_yaml

-output_type: greedy_until
+output_type: generate_until
 validation_split: validation
 metric_list:
   - metric: exact_match
 ...
lm_eval/benchmarks/minerva_math.yaml → lm_eval/tasks/benchmarks/minerva_math.yaml

File moved (contents unchanged).
lm_eval/tasks/benchmarks/t0_eval.yaml

@@ -6,7 +6,7 @@ task:
     use_prompt: promptsource:*
     training_split: train
     validation_split: validation
-    output_type: greedy_until
+    output_type: generate_until
     metric_list:
       - metric: exact_match
         aggregation: mean
@@ -19,7 +19,7 @@ task:
     use_prompt: promptsource:*
     training_split: train
     validation_split: validation
-    output_type: greedy_until
+    output_type: generate_until
     metric_list:
       - metric: exact_match
         aggregation: mean
@@ -32,7 +32,7 @@ task:
     use_prompt: promptsource:*
    training_split: train
     validation_split: validation
-    output_type: greedy_until
+    output_type: generate_until
     metric_list:
       - metric: exact_match
         aggregation: mean
@@ -44,7 +44,7 @@ task:
     use_prompt: promptsource:*
     training_split: train
     validation_split: validation
-    output_type: greedy_until
+    output_type: generate_until
     metric_list:
       - metric: exact_match
         aggregation: mean
@@ -56,7 +56,7 @@ task:
     use_prompt: promptsource:*
     training_split: train_r1
     validation_split: dev_r1
-    output_type: greedy_until
+    output_type: generate_until
     metric_list:
       - metric: exact_match
         aggregation: mean
@@ -68,7 +68,7 @@ task:
     use_prompt: promptsource:*
     training_split: train_r2
     validation_split: dev_r2
-    output_type: greedy_until
+    output_type: generate_until
     metric_list:
       - metric: exact_match
         aggregation: mean
@@ -80,7 +80,7 @@ task:
     use_prompt: promptsource:*
     training_split: train_r3
     validation_split: dev_r3
-    output_type: greedy_until
+    output_type: generate_until
     metric_list:
       - metric: exact_match
         aggregation: mean
@@ -93,7 +93,7 @@ task:
     use_prompt: promptsource:*
     training_split: train
     validation_split: validation
-    output_type: greedy_until
+    output_type: generate_until
     metric_list:
       - metric: exact_match
         aggregation: mean
@@ -105,7 +105,7 @@ task:
     use_prompt: promptsource:*
     training_split: train
     validation_split: validation
-    output_type: greedy_until
+    output_type: generate_until
     metric_list:
       - metric: exact_match
         aggregation: mean
@@ -118,7 +118,7 @@ task:
     use_prompt: promptsource:*
     training_split: train
     validation_split: validation
-    output_type: greedy_until
+    output_type: generate_until
     metric_list:
       - metric: exact_match
         aggregation: mean
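
All ten hunks above make the same mechanical substitution of output_type: greedy_until with output_type: generate_until. A quick way to confirm that no task config was missed (purely a hypothetical helper sketched here, not part of this commit) would be to scan the task tree for leftover occurrences:

import sys
from pathlib import Path

def find_leftovers(root: str = "lm_eval/tasks") -> list[str]:
    """Return paths of task configs that still reference the old greedy_until name."""
    hits = []
    for path in Path(root).rglob("*"):
        # Covers both *.yaml files and extensionless templates such as cot_template_yaml.
        if path.is_file() and path.name.endswith("yaml"):
            if "greedy_until" in path.read_text(encoding="utf-8"):
                hits.append(str(path))
    return hits

if __name__ == "__main__":
    leftovers = find_leftovers()
    for p in leftovers:
        print(f"still references greedy_until: {p}")
    sys.exit(1 if leftovers else 0)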
lm_eval/tasks/bigbench/generate_tasks.py

@@ -175,8 +175,8 @@ all_subtasks = [
 def main() -> None:
     for path, task_type in zip(
-        ["multiple_choice", "greedy_until"],
-        ["multiple_choice_template_yaml", "greedy_until_template_yaml"],
+        ["multiple_choice", "generate_until"],
+        ["multiple_choice_template_yaml", "generate_until_template_yaml"],
     ):
         os.makedirs(path, exist_ok=True)
         for task in all_subtasks:
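
Only the renamed strings are visible in this hunk; the body that actually writes the per-subtask configs is off-page. As a hedged reconstruction for context (the file-writing step below is an assumption inferred from the generated bigbench YAMLs further down, not the verbatim script), main() presumably emits one small config per subtask that includes the shared template:

import os

# Truncated example list; the real all_subtasks is defined earlier in generate_tasks.py.
all_subtasks = ["abstract_narrative_understanding", "anachronisms", "arithmetic"]

def main() -> None:
    for path, task_type in zip(
        ["multiple_choice", "generate_until"],
        ["multiple_choice_template_yaml", "generate_until_template_yaml"],
    ):
        os.makedirs(path, exist_ok=True)
        for task in all_subtasks:
            # Assumed write step: one YAML per subtask pointing at the shared template,
            # matching the shape of the generated files shown below.
            with open(os.path.join(path, f"{task}.yaml"), "w") as f:
                f.write("# Generated by utils.py\n")
                f.write(f"dataset_name: {task}_zero_shot\n")
                f.write(f"include: ../{task_type}\n")
                f.write(f"task: bigbench_{task}_{path}\n")

if __name__ == "__main__":
    main()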
lm_eval/tasks/bigbench/greedy_until/abstract_narrative_understanding.yaml → lm_eval/tasks/bigbench/generate_until/abstract_narrative_understanding.yaml

 # Generated by utils.py
 dataset_name: abstract_narrative_understanding_zero_shot
-include: ../greedy_until_template_yaml
-task: bigbench_abstract_narrative_understanding_greedy_until
+include: ../generate_until_template_yaml
+task: bigbench_abstract_narrative_understanding_generate_until

lm_eval/tasks/bigbench/greedy_until/anachronisms.yaml → lm_eval/tasks/bigbench/generate_until/anachronisms.yaml

 # Generated by utils.py
 dataset_name: anachronisms_zero_shot
-include: ../greedy_until_template_yaml
-task: bigbench_anachronisms_greedy_until
+include: ../generate_until_template_yaml
+task: bigbench_anachronisms_generate_until

lm_eval/tasks/bigbench/greedy_until/analogical_similarity.yaml → lm_eval/tasks/bigbench/generate_until/analogical_similarity.yaml

 # Generated by utils.py
 dataset_name: analogical_similarity_zero_shot
-include: ../greedy_until_template_yaml
-task: bigbench_analogical_similarity_greedy_until
+include: ../generate_until_template_yaml
+task: bigbench_analogical_similarity_generate_until

lm_eval/tasks/bigbench/greedy_until/analytic_entailment.yaml → lm_eval/tasks/bigbench/generate_until/analytic_entailment.yaml

 # Generated by utils.py
 dataset_name: analytic_entailment_zero_shot
-include: ../greedy_until_template_yaml
-task: bigbench_analytic_entailment_greedy_until
+include: ../generate_until_template_yaml
+task: bigbench_analytic_entailment_generate_until

lm_eval/tasks/bigbench/greedy_until/arithmetic.yaml → lm_eval/tasks/bigbench/generate_until/arithmetic.yaml

 # Generated by utils.py
 dataset_name: arithmetic_zero_shot
-include: ../greedy_until_template_yaml
-task: bigbench_arithmetic_greedy_until
+include: ../generate_until_template_yaml
+task: bigbench_arithmetic_generate_until