gaoqiong / lm-evaluation-harness
Commit f23ae748, authored Sep 03, 2023 by lintangsutawika
add mmlu variants
parent 191458b8
Changes: 235
Showing 20 changed files with 219 additions and 0 deletions
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_world_religions.yaml  +53 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/_mmlu_flan_generative_template_yaml  +25 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_abstract_algebra.yaml  +8 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_anatomy.yaml  +7 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_astronomy.yaml  +7 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_business_ethics.yaml  +8 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_clinical_knowledge.yaml  +8 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_biology.yaml  +8 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_chemistry.yaml  +8 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_computer_science.yaml  +8 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_mathematics.yaml  +8 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_medicine.yaml  +8 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_physics.yaml  +8 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_computer_security.yaml  +8 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_conceptual_physics.yaml  +8 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_econometrics.yaml  +7 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_electrical_engineering.yaml  +8 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_elementary_mathematics.yaml  +8 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_formal_logic.yaml  +8 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_global_facts.yaml  +8 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_world_religions.yaml
0 → 100644
dataset_name: world_religions
description: 'The following are multiple choice questions (with answers) about world
  religions.


  Q: How can the Upanishads be characterized?

  (A) Ritual texts (B) Philosophical texts (C) Hymns (D) Origin stories

  A: Let''s think step by step. We refer to Wikipedia articles on world religions
  for help. The Upanishads are the most recent part of Vedas (the oldest scriptures
  in Hinduism) and supplied the basis of later Hindu philosophy. So they are philosophical
  texts. The answer is (B).


  Q: What is the Second Gem in Buddhism?

  (A) The Dharma (B) The Sangha (C) The Buddha (D) The Bodhisattva

  A: Let''s think step by step. We refer to Wikipedia articles on world religions
  for help. The Second Gem in Buddhism is The Dharma. The answer is (A).


  Q: Which Japanese government promoted a kind of national cult based on the emperor
  and his associations with kami?

  (A) Honen (B) Tanaka (C) Tokugawa (D) Meiji

  A: Let''s think step by step. We refer to Wikipedia articles on world religions
  for help. The promotion of a national cult based on the emperor and his associations
  with Kami happened during the reign of Emperor Meiji (1852-1912). The answer is
  (D).


  Q: In which dynasty was the "Mandate of Heaven" developed to legitimatize the new
  rulers?

  (A) Shang (B) Zhou (C) Han (D) Xia

  A: Let''s think step by step. We refer to Wikipedia articles on world religions
  for help. The "Mandate of Heaven" was developed as an ancient Chinese philosophical
  concept during the Zhou Dynasty (1046-256 BCE). The answer is (B).


  Q: What is the sign of the covenant for Jewish males?

  (A) The rainbow (B) Circumcision (C) A son (D) Bar mitzvah

  A: Let''s think step by step. We refer to Wikipedia articles on world religions
  for help. In Judaism, the most distinctive sign of the covenant is circumcision
  (brit milah). The answer is (B).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_world_religions
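The five worked examples in the description above share one layout: a `Q:` line, a single line of lettered options, and an `A:` line that opens with "Let's think step by step." and closes with "The answer is (X).". The sketch below shows how one such exemplar could be rendered from structured fields. The function name `format_exemplar` and the use of single newlines between the three parts are assumptions for illustration (the line breaks mirror the zero-shot `doc_to_text` in this commit); this is not code from the repository.

```python
from typing import Sequence

# Option labels in the order used by the exemplars in the description.
LETTERS = ("(A)", "(B)", "(C)", "(D)")


def format_exemplar(
    question: str, choices: Sequence[str], rationale: str, answer: int
) -> str:
    """Render one chain-of-thought exemplar in the Q / options / A layout.

    Hypothetical helper: `answer` is the 0-based index of the correct choice.
    """
    options = " ".join(f"{letter} {choice}" for letter, choice in zip(LETTERS, choices))
    return (
        f"Q: {question.strip()}\n"
        f"{options}\n"
        f"A: Let's think step by step. {rationale} The answer is {LETTERS[answer]}."
    )


exemplar = format_exemplar(
    "What is the Second Gem in Buddhism?",
    ["The Dharma", "The Sangha", "The Buddha", "The Bodhisattva"],
    "We refer to Wikipedia articles on world religions for help. "
    "The Second Gem in Buddhism is The Dharma.",
    0,
)
print(exemplar)
```

Keeping the closing sentence fixed matters because the answer is later recovered by matching the literal text "The answer is ".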
lm_eval/tasks/mmlu/flan_cot_zeroshot/_mmlu_flan_generative_template_yaml
0 → 100644
group: mmlu_flan_cot_zeroshot
dataset_path: cais/mmlu
validation_split: validation
fewshot_split: dev
doc_to_text: "\n\nQ: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA: Let's think step by step."
output_type: greedy_until
fewshot_delimiter: ""
doc_to_target: "{{['(A)', '(B)', '(C)', '(D)'][answer]}}"
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
generation_kwargs:
  until:
    - "</s>"
  do_sample: false
  temperature: 0.0
filter_list:
  - name: "get-answer"
    filter:
      - function: "regex"
        regex_pattern: "(?<=The answer is )(.*)(.)"
      - function: "take_first"
\ No newline at end of file
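To see what the `get-answer` filter's pattern captures, the following sketch applies `(?<=The answer is )(.*)(.)` with Python's `re` module and then compares against a `doc_to_target` string after lowercasing and stripping punctuation. How the harness itself applies the pattern (for example `search` versus `findall`, or which group it keeps) is not shown in this diff, so `extract_answer` and `normalize` are hypothetical stand-ins that assume the first capture group is used.

```python
import re
import string
from typing import Optional

# Pattern copied verbatim from the template's filter_list.
ANSWER_PATTERN = re.compile(r"(?<=The answer is )(.*)(.)")

# Translation table that deletes ASCII punctuation such as "(", ")" and ".".
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def extract_answer(response: str) -> Optional[str]:
    """Return the first capture group, or None when the phrase is absent."""
    match = ANSWER_PATTERN.search(response)
    return match.group(1).strip() if match else None


def normalize(text: str) -> str:
    """Rough stand-in for ignore_case plus ignore_punctuation."""
    return text.lower().translate(_STRIP_PUNCT).strip()


with_period = extract_answer("So they are philosophical texts. The answer is (B).")
without_period = extract_answer("So they are philosophical texts. The answer is (B)")

print(repr(with_period))     # first group when the sentence ends with a period
print(repr(without_period))  # first group when the final period is missing
print(normalize(with_period) == normalize("(B)"))
print(normalize(without_period) == normalize("(B)"))
```

Because the trailing `(.)` matches any character rather than a literal period, a response that omits the final period loses its closing parenthesis from the first group. Under the normalization assumed here the comparison still succeeds, since punctuation is removed from both sides before matching.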
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_abstract_algebra.yaml
0 → 100644
dataset_name: abstract_algebra
description: 'The following are multiple choice questions (with answers) about abstract
  algebra.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_abstract_algebra
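Each per-subject file carries only four keys and pulls the rest from the shared template through `include`. The sketch below illustrates one plausible reading of that mechanism: a shallow merge in which the task file's keys override the template's. The function `resolve_include` and the in-memory `TEMPLATES` mapping are hypothetical; the harness's actual resolution logic (file lookup, nested includes, merge depth) is not part of this diff and may differ.

```python
from typing import Any, Dict

# A subset of the shared template's keys, copied from the diff above.
TEMPLATES: Dict[str, Dict[str, Any]] = {
    "_mmlu_flan_generative_template_yaml": {
        "group": "mmlu_flan_cot_zeroshot",
        "dataset_path": "cais/mmlu",
        "validation_split": "validation",
        "fewshot_split": "dev",
        "output_type": "greedy_until",
    }
}


def resolve_include(
    task_cfg: Dict[str, Any], templates: Dict[str, Dict[str, Any]]
) -> Dict[str, Any]:
    """Shallow-merge a task config over the template it names (assumed semantics)."""
    cfg = dict(task_cfg)  # copy so the caller's dict is not mutated
    include_name = cfg.pop("include", None)
    merged = dict(templates[include_name]) if include_name else {}
    merged.update(cfg)  # task-specific keys win over template keys
    return merged


task_cfg = {
    "dataset_name": "abstract_algebra",
    "description": (
        "The following are multiple choice questions (with answers) "
        "about abstract algebra.\n\n"
    ),
    "include": "_mmlu_flan_generative_template_yaml",
    "task": "mmlu_flan_cot_zeroshot_abstract_algebra",
}

merged = resolve_include(task_cfg, TEMPLATES)
for key in sorted(merged):
    print(f"{key}: {merged[key]!r}")
```

Under this reading, adding a new subject only requires a new four-key file, while prompt format, metrics, and filters change in one place.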
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_anatomy.yaml
0 → 100644
dataset_name: anatomy
description: 'The following are multiple choice questions (with answers) about anatomy.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_anatomy
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_astronomy.yaml
0 → 100644
dataset_name: astronomy
description: 'The following are multiple choice questions (with answers) about astronomy.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_astronomy
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_business_ethics.yaml
0 → 100644
dataset_name: business_ethics
description: 'The following are multiple choice questions (with answers) about business
  ethics.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_business_ethics
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_clinical_knowledge.yaml
0 → 100644
dataset_name: clinical_knowledge
description: 'The following are multiple choice questions (with answers) about clinical
  knowledge.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_clinical_knowledge
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_biology.yaml
0 → 100644
dataset_name: college_biology
description: 'The following are multiple choice questions (with answers) about college
  biology.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_college_biology
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_chemistry.yaml
0 → 100644
dataset_name: college_chemistry
description: 'The following are multiple choice questions (with answers) about college
  chemistry.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_college_chemistry
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_computer_science.yaml
0 → 100644
dataset_name: college_computer_science
description: 'The following are multiple choice questions (with answers) about college
  computer science.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_college_computer_science
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_mathematics.yaml
0 → 100644
dataset_name: college_mathematics
description: 'The following are multiple choice questions (with answers) about college
  mathematics.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_college_mathematics
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_medicine.yaml
0 → 100644
dataset_name: college_medicine
description: 'The following are multiple choice questions (with answers) about college
  medicine.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_college_medicine
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_physics.yaml
0 → 100644
dataset_name: college_physics
description: 'The following are multiple choice questions (with answers) about college
  physics.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_college_physics
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_computer_security.yaml
0 → 100644
dataset_name: computer_security
description: 'The following are multiple choice questions (with answers) about computer
  security.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_computer_security
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_conceptual_physics.yaml
0 → 100644
dataset_name: conceptual_physics
description: 'The following are multiple choice questions (with answers) about conceptual
  physics.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_conceptual_physics
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_econometrics.yaml
0 → 100644
dataset_name: econometrics
description: 'The following are multiple choice questions (with answers) about econometrics.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_econometrics
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_electrical_engineering.yaml
0 → 100644
dataset_name: electrical_engineering
description: 'The following are multiple choice questions (with answers) about electrical
  engineering.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_electrical_engineering
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_elementary_mathematics.yaml
0 → 100644
dataset_name: elementary_mathematics
description: 'The following are multiple choice questions (with answers) about elementary
  mathematics.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_elementary_mathematics
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_formal_logic.yaml
0 → 100644
dataset_name: formal_logic
description: 'The following are multiple choice questions (with answers) about formal
  logic.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_formal_logic
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_global_facts.yaml
0 → 100644
dataset_name: global_facts
description: 'The following are multiple choice questions (with answers) about global
  facts.


  '
include: _mmlu_flan_generative_template_yaml
task: mmlu_flan_cot_zeroshot_global_facts
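The per-subject files above differ only in the subject name, which suggests they were produced mechanically. As an illustration only, the sketch below derives the four keys from a subject identifier and renders them as YAML text using the standard library. The names `build_config` and `render_yaml` are hypothetical, the trailing blank lines in the description are assumed from the diff layout, and the output uses a double-quoted description rather than the folded single-quoted form shown in the diff. It is not the script used for this commit.

```python
import json
from typing import Dict, List

GROUP_PREFIX = "mmlu_flan_cot_zeroshot"
TEMPLATE_NAME = "_mmlu_flan_generative_template_yaml"


def build_config(subject: str) -> Dict[str, str]:
    """Derive the four per-subject keys from an MMLU subject identifier."""
    readable = subject.replace("_", " ")
    return {
        "dataset_name": subject,
        # Trailing newlines are an assumption based on the closing quote
        # appearing on its own line in the committed files.
        "description": (
            "The following are multiple choice questions (with answers) "
            f"about {readable}.\n\n"
        ),
        "include": TEMPLATE_NAME,
        "task": f"{GROUP_PREFIX}_{subject}",
    }


def render_yaml(config: Dict[str, str]) -> str:
    """Render the flat config as YAML text without a third-party YAML library."""
    lines: List[str] = []
    for key, value in config.items():
        # json.dumps gives a double-quoted scalar with \n escapes, which is
        # used here only for the multi-line description.
        rendered = json.dumps(value) if key == "description" else value
        lines.append(f"{key}: {rendered}")
    return "\n".join(lines) + "\n"


for subject in ("anatomy", "college_computer_science"):
    print(render_yaml(build_config(subject)))
```

Generating the files from a subject list keeps task names and descriptions consistent across all subjects in the group.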