gaoqiong / lm-evaluation-harness
Commit 03be40e2, authored Sep 04, 2023 by lintangsutawika
udpate
Parent: e795efcf
Changes: 125
Showing 20 changed files with 45 additions and 135 deletions (+45 -135)
lm_eval/tasks/bbh/README.md (+4 -0)
lm_eval/tasks/bbh/flan_cot_fewshot/_flan_cot_fewshot_template_yaml (+2 -2)
lm_eval/tasks/bbh/flan_cot_zeroshot/_flan_cot_zeroshot_template_yaml (+2 -2)
lm_eval/tasks/bbh/flan_fewshot/_flan_fewshot_template_yaml (+2 -2)
lm_eval/tasks/bbh/flan_zeroshot/_flan_zeroshot_template_yaml (+2 -2)
lm_eval/tasks/mmlu/_generate_configs.py (+1 -2)
lm_eval/tasks/mmlu/default/hendrycks_test_original_default.yaml (+0 -21)
lm_eval/tasks/mmlu/flan_cot_fewshot/_mmlu_flan_cot_fewshot_template_yaml (+13 -14)
lm_eval/tasks/mmlu/flan_cot_zeroshot/_mmlu_flan_generative_template_yaml (+12 -13)
lm_eval/tasks/mmlu/flan_n_shot/_mmlu_flan_generative_template_yaml (+5 -11)
lm_eval/tasks/mmlu/flan_n_shot/_mmlu_flan_loglikelihood_template_yaml (+2 -3)
lm_eval/tasks/mmlu/flan_n_shot/mmlu_abstract_algebra.yaml (+0 -7)
lm_eval/tasks/mmlu/flan_n_shot/mmlu_anatomy.yaml (+0 -7)
lm_eval/tasks/mmlu/flan_n_shot/mmlu_astronomy.yaml (+0 -7)
lm_eval/tasks/mmlu/flan_n_shot/mmlu_business_ethics.yaml (+0 -7)
lm_eval/tasks/mmlu/flan_n_shot/mmlu_clinical_knowledge.yaml (+0 -7)
lm_eval/tasks/mmlu/flan_n_shot/mmlu_college_biology.yaml (+0 -7)
lm_eval/tasks/mmlu/flan_n_shot/mmlu_college_chemistry.yaml (+0 -7)
lm_eval/tasks/mmlu/flan_n_shot/mmlu_college_computer_science.yaml (+0 -7)
lm_eval/tasks/mmlu/flan_n_shot/mmlu_college_mathematics.yaml (+0 -7)
lm_eval/tasks/bbh/README.md
@@ -26,6 +26,10 @@ Homepage: https://github.com/suzgunmirac/BIG-Bench-Hard
 #### Groups
+- `bbh_flan_zeroshot`
+- `bbh_flan_fewshot`
+- `bbh_flan_cot_fewshot`
+- `bbh_flan_cot_zeroshot`
 #### Tasks
lm_eval/tasks/bbh/flan_cot_fewshot/_flan_cot_fewshot_template_yaml
@@ -7,8 +7,8 @@ metric_list:
   - metric: exact_match
     aggregation: mean
     higher_is_better: true
-    ignore_case: true
-    ignore_punctuation: true
+    # ignore_case: true
+    # ignore_punctuation: true
 generation_kwargs:
   until:
     - "</s>"
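For context on what the two commented-out options control, the sketch below shows one plausible way an exact-match metric could honor `ignore_case` and `ignore_punctuation`. It is an illustration only and not the harness's implementation; the function name `exact_match_sketch` and its signature are hypothetical.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def exact_match_sketch(prediction, reference, ignore_case=False, ignore_punctuation=False):
    """Hypothetical exact-match check; NOT the harness's actual metric code."""
    if ignore_case:
        prediction, reference = prediction.lower(), reference.lower()
    if ignore_punctuation:
        prediction = prediction.translate(_STRIP_PUNCT)
        reference = reference.translate(_STRIP_PUNCT)
    return prediction == reference


# With both options off, as in the template after this change,
# differences in case or punctuation count as misses.
print(exact_match_sketch("(a)", "(A)"))
print(exact_match_sketch("(a)", "(A)", ignore_case=True))
print(exact_match_sketch("a.", "(A)", ignore_case=True, ignore_punctuation=True))
```

Under this sketch the three calls print `False`, `True`, `True`: disabling both options makes the comparison strict about the exact answer string.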
lm_eval/tasks/bbh/flan_cot_zeroshot/_flan_cot_zeroshot_template_yaml
@@ -7,8 +7,8 @@ metric_list:
   - metric: exact_match
     aggregation: mean
     higher_is_better: true
-    ignore_case: true
-    ignore_punctuation: true
+    # ignore_case: true
+    # ignore_punctuation: true
 generation_kwargs:
   until:
     - "</s>"
lm_eval/tasks/bbh/flan_fewshot/_flan_fewshot_template_yaml
@@ -7,8 +7,8 @@ metric_list:
   - metric: exact_match
     aggregation: mean
     higher_is_better: true
-    ignore_case: true
-    ignore_punctuation: true
+    # ignore_case: true
+    # ignore_punctuation: true
 generation_kwargs:
   until:
     - "</s>"
lm_eval/tasks/bbh/flan_zeroshot/_flan_zeroshot_template_yaml
@@ -7,8 +7,8 @@ metric_list:
   - metric: exact_match
     aggregation: mean
     higher_is_better: true
-    ignore_case: true
-    ignore_punctuation: true
+    # ignore_case: true
+    # ignore_punctuation: true
 generation_kwargs:
   until:
     - "</s>"
lm_eval/tasks/mmlu/_generate_configs.py
@@ -92,7 +92,6 @@ if __name__ == "__main__":
     base_yaml_name = os.path.split(args.base_yaml_path)[-1]
     with open(args.base_yaml_path) as f:
         base_yaml = yaml.full_load(f)
-    print(base_yaml)
     if args.cot_prompt_path is not None:
         import json
@@ -115,4 +114,4 @@ if __name__ == "__main__":
         file_save_path = args.save_prefix_path + f"_{subject}.yaml"
         eval_logger.info(f"Saving yaml for subset {subject} to {file_save_path}")
         with open(file_save_path, "w") as yaml_file:
-            yaml.dump(yaml_dict, yaml_file, width=float("inf"))
+            yaml.dump(yaml_dict, yaml_file, width=float("inf"), allow_unicode=True, default_style='"')
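The loop around this hunk writes one small YAML file per MMLU subject. A minimal sketch of how such a per-subject entry could be assembled follows, using the field names visible in the per-subject files in this commit (`dataset_name`, `description`, `include`, `task`). The helper names `build_subject_config` and `subject_save_path` and the example prefix `"mmlu"` are hypothetical, and the sketch stops short of the `yaml.dump` call so that it runs without PyYAML.

```python
def build_subject_config(subject, base_yaml_name, task_prefix):
    """Hypothetical helper: assemble the dict for one subject's YAML file."""
    readable = " ".join(subject.split("_"))
    return {
        "include": base_yaml_name,
        "task": f"{task_prefix}_{subject}",
        "dataset_name": subject,
        "description": (
            "The following are multiple choice questions (with answers) "
            f"about {readable}.\n\n"
        ),
    }


def subject_save_path(save_prefix_path, subject):
    # Same naming scheme as the hunk above: "<prefix>_<subject>.yaml".
    return save_prefix_path + f"_{subject}.yaml"


config = build_subject_config(
    "abstract_algebra", "_mmlu_flan_generative_template_yaml", "mmlu_flan_n_shot"
)
print(subject_save_path("mmlu", "abstract_algebra"))
print(config["task"])
```

The description in this sketch ends in two newlines, as in the deleted default config. With PyYAML, `default_style='"'` requests double-quoted scalars, so a value like this would be written on one line with explicit `\n` escapes.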
lm_eval/tasks/mmlu/default/hendrycks_test_original_default.yaml
deleted (100644 → 0)
-group:
-  - mmlu
-  - mmlu_original
-  - multiple_choice
-task: mmlu_original_abstract_algebra
-dataset_path: cais/mmlu
-dataset_name: abstract_algebra
-output_type: multiple_choice
-validation_split: validation
-test_split: test
-description: "The following are multiple choice questions (with answers) about abstract algebra.\n\n"
-doc_to_text: "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nAnswer:"
-doc_to_choice: ["A", "B", "C", "D"]
-doc_to_target: "{{answer}}"
-metric_list:
-  - metric: acc
-    aggregation: mean
-    higher_is_better: true
-  - metric: acc_norm
-    aggregation: mean
-    higher_is_better: true
lm_eval/tasks/mmlu/flan_cot_fewshot/_mmlu_flan_cot_fewshot_template_yaml
@@ -2,24 +2,23 @@ group: mmlu_flan_cot_fewshot
 dataset_path: cais/mmlu
 validation_split: validation
 fewshot_split: dev
-doc_to_text: "\n\nQ: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA: Let's think step by step."
-fewshot_delimiter: ""
 output_type: greedy_until
+doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA: Let's think step by step."
 doc_to_target: "{{['(A)', '(B)', '(C)', '(D)'][answer]}}"
-metric_list:
-  - metric: exact_match
-    aggregation: mean
-    higher_is_better: true
-    ignore_case: true
-    ignore_punctuation: true
+filter_list:
+  - name: "get-answer"
+    filter:
+      - function: "regex"
+        regex_pattern: "(?<=The answer is )(.*)(?=.)"
+      - function: "take_first"
 generation_kwargs:
   until:
     - "</s>"
   do_sample: false
   temperature: 0.0
-filter_list:
-  - name: "get-answer"
-    filter:
-      - function: "regex"
-        regex_pattern: "(?<=The answer is )(.*)(.)"
-      - function: "take_first"
\ No newline at end of file
+metric_list:
+  - metric: exact_match
+    aggregation: mean
+    higher_is_better: true
+    ignore_case: true
+    ignore_punctuation: true
\ No newline at end of file
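The only change inside the `regex_pattern` itself is that the trailing `(.)` capture group becomes a `(?=.)` lookahead. The sketch below compares the two patterns with Python's `re` module on a made-up response string. How the harness's `regex` filter applies the pattern is not shown in this diff, so `re.findall` is used here purely as an illustration.

```python
import re

OLD_PATTERN = r"(?<=The answer is )(.*)(.)"
NEW_PATTERN = r"(?<=The answer is )(.*)(?=.)"

# Made-up model output; not taken from any dataset.
response = "Let's think step by step. 2 + 2 = 4. The answer is (B)."

# Old pattern: the final character is consumed by a second capture group,
# so findall yields a (group 1, group 2) tuple per match.
old_matches = re.findall(OLD_PATTERN, response)

# New pattern: the final character is only asserted by a lookahead,
# so there is a single group and findall yields plain strings.
new_matches = re.findall(NEW_PATTERN, response)

print(old_matches)
print(new_matches)
```

On this input the old pattern gives `[('(B)', '.')]` and the new one gives `['(B)']`, a plain string that can be compared directly against a target such as `(B)`.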
lm_eval/tasks/mmlu/flan_cot_zeroshot/_mmlu_flan_generative_template_yaml
@@ -2,24 +2,23 @@ group: mmlu_flan_cot_zeroshot
 dataset_path: cais/mmlu
 validation_split: validation
 fewshot_split: dev
-doc_to_text: "\n\nQ: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA: Let's think step by step."
 output_type: greedy_until
-fewshot_delimiter: ""
+doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA: Let's think step by step."
 doc_to_target: "{{['(A)', '(B)', '(C)', '(D)'][answer]}}"
+filter_list:
+  - name: "get-answer"
+    filter:
+      - function: "regex"
+        regex_pattern: "(?<=The answer is )(.*)(?=.)"
+      - function: "take_first"
+generation_kwargs:
+  until:
+    - "</s>"
+  do_sample: false
+  temperature: 0.0
 metric_list:
   - metric: exact_match
     aggregation: mean
     higher_is_better: true
     ignore_case: true
     ignore_punctuation: true
-generation_kwargs:
-  until:
-    - "</s>"
-  do_sample: false
-  temperature: 0.0
-filter_list:
-  - name: "get-answer"
-    filter:
-      - function: "regex"
-        regex_pattern: "(?<=The answer is )(.*)(.)"
-      - function: "take_first"
\ No newline at end of file
lm_eval/tasks/mmlu/flan_n_shot/_mmlu_flan_generative_template_yaml
@@ -2,19 +2,13 @@ group: mmlu_flan_n_shot_generative
 dataset_path: cais/mmlu
 test_split: test
 fewshot_split: dev
-# doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA: "
-doc_to_text: "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nAnswer:"
 output_type: greedy_until
-# doc_to_target: "{{['(A)', '(B)', '(C)', '(D)'][answer]}}"
-doc_to_target: "{{['A', 'B', 'C', 'D'][answer]}}"
+doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA: "
+doc_to_target: "{{['(A)', '(B)', '(C)', '(D)'][answer]}}"
+generation_kwargs:
+  until:
+    - "</s>"
 metric_list:
   - metric: exact_match
     aggregation: mean
     higher_is_better: true
-    # ignore_case: true
-    # ignore_punctuation: true
-generation_kwargs:
-  until:
-    - "</s>"
-  # do_sample: false
-  # temperature: 0.0
\ No newline at end of file
lm_eval/tasks/mmlu/flan_n_shot/_mmlu_flan_loglikelihood_template_yaml
 group: mmlu_flan_n_shot_loglikelihood
 dataset_path: cais/mmlu
-# validation_split: validation
 test_split: test
 fewshot_split: dev
 output_type: multiple_choice
-doc_to_text: "{{question.strip()}}\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nAnswer:"
-doc_to_choice: ["A", "B", "C", "D"]
+doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA: "
+doc_to_choice: ["(A)", "(B)", "(C)", "(D)"]
 doc_to_target: answer
 metric_list:
   - metric: acc
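To make the new prompt layout concrete, the sketch below reproduces the `Q: ... (A) ... (B) ... (C) ... (D) ... A: ` template in plain Python rather than Jinja. The functions `render_prompt` and `render_target` and the example document are made up for illustration. In the loglikelihood template the target is the raw `answer` index scored against `doc_to_choice`, while the generative template maps the index to a letter, which is what `render_target` imitates.

```python
LETTERS = ["(A)", "(B)", "(C)", "(D)"]


def render_prompt(doc):
    """Plain-Python stand-in for the new doc_to_text template."""
    question = doc["question"].strip()
    options = " ".join(
        f"{letter} {choice}" for letter, choice in zip(LETTERS, doc["choices"])
    )
    return f"Q: {question}\n{options}\nA: "


def render_target(doc):
    """Stand-in for the generative doc_to_target: index into the letters."""
    return LETTERS[doc["answer"]]


# Made-up example document; not taken from the dataset.
doc = {
    "question": " What is 2 + 2? ",
    "choices": ["3", "4", "5", "6"],
    "answer": 1,
}
print(render_prompt(doc))
print(render_target(doc))
```

For this document the prompt is `Q: What is 2 + 2?`, then `(A) 3 (B) 4 (C) 5 (D) 6`, then `A: ` on three lines, and the target is `(B)`.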
lm_eval/tasks/mmlu/flan_n_shot/mmlu_abstract_algebra.yaml
deleted (100644 → 0)
-dataset_name: abstract_algebra
-description: 'The following are multiple choice questions (with answers) about abstract algebra.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_n_shot_abstract_algebra
lm_eval/tasks/mmlu/flan_n_shot/mmlu_anatomy.yaml
deleted (100644 → 0)
-dataset_name: anatomy
-description: 'The following are multiple choice questions (with answers) about anatomy.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_n_shot_anatomy
lm_eval/tasks/mmlu/flan_n_shot/mmlu_astronomy.yaml
deleted (100644 → 0)
-dataset_name: astronomy
-description: 'The following are multiple choice questions (with answers) about astronomy.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_n_shot_astronomy
lm_eval/tasks/mmlu/flan_n_shot/mmlu_business_ethics.yaml
deleted (100644 → 0)
-dataset_name: business_ethics
-description: 'The following are multiple choice questions (with answers) about business ethics.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_n_shot_business_ethics
lm_eval/tasks/mmlu/flan_n_shot/mmlu_clinical_knowledge.yaml
deleted (100644 → 0)
-dataset_name: clinical_knowledge
-description: 'The following are multiple choice questions (with answers) about clinical knowledge.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_n_shot_clinical_knowledge
lm_eval/tasks/mmlu/flan_n_shot/mmlu_college_biology.yaml
deleted (100644 → 0)
-dataset_name: college_biology
-description: 'The following are multiple choice questions (with answers) about college biology.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_n_shot_college_biology
lm_eval/tasks/mmlu/flan_n_shot/mmlu_college_chemistry.yaml
deleted (100644 → 0)
-dataset_name: college_chemistry
-description: 'The following are multiple choice questions (with answers) about college chemistry.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_n_shot_college_chemistry
lm_eval/tasks/mmlu/flan_n_shot/mmlu_college_computer_science.yaml
deleted (100644 → 0)
-dataset_name: college_computer_science
-description: 'The following are multiple choice questions (with answers) about college computer science.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_n_shot_college_computer_science
lm_eval/tasks/mmlu/flan_n_shot/mmlu_college_mathematics.yaml
deleted (100644 → 0)
-dataset_name: college_mathematics
-description: 'The following are multiple choice questions (with answers) about college mathematics.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_n_shot_college_mathematics