gaoqiong / lm-evaluation-harness
Commit e200c24e
Authored Jul 03, 2024 by lintangsutawika
update mmlu
Parent: 43765669
Changes: 342
Showing 20 changed files with 20 additions and 20 deletions
lm_eval/tasks/mmlu/default/mmlu_medical_genetics.yaml  +1 -1
lm_eval/tasks/mmlu/default/mmlu_miscellaneous.yaml  +1 -1
lm_eval/tasks/mmlu/default/mmlu_moral_disputes.yaml  +1 -1
lm_eval/tasks/mmlu/default/mmlu_moral_scenarios.yaml  +1 -1
lm_eval/tasks/mmlu/default/mmlu_nutrition.yaml  +1 -1
lm_eval/tasks/mmlu/default/mmlu_philosophy.yaml  +1 -1
lm_eval/tasks/mmlu/default/mmlu_prehistory.yaml  +1 -1
lm_eval/tasks/mmlu/default/mmlu_professional_accounting.yaml  +1 -1
lm_eval/tasks/mmlu/default/mmlu_professional_law.yaml  +1 -1
lm_eval/tasks/mmlu/default/mmlu_professional_medicine.yaml  +1 -1
lm_eval/tasks/mmlu/default/mmlu_professional_psychology.yaml  +1 -1
lm_eval/tasks/mmlu/default/mmlu_public_relations.yaml  +1 -1
lm_eval/tasks/mmlu/default/mmlu_security_studies.yaml  +1 -1
lm_eval/tasks/mmlu/default/mmlu_sociology.yaml  +1 -1
lm_eval/tasks/mmlu/default/mmlu_us_foreign_policy.yaml  +1 -1
lm_eval/tasks/mmlu/default/mmlu_virology.yaml  +1 -1
lm_eval/tasks/mmlu/default/mmlu_world_religions.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_abstract_algebra.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_anatomy.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_astronomy.yaml  +1 -1
lm_eval/tasks/mmlu/default/mmlu_medical_genetics.yaml
 "dataset_name": "medical_genetics"
 "description": "The following are multiple choice questions (with answers) about medical\
   \ genetics.\n\n"
-"group": "mmlu_other_tasks"
+"tag": "mmlu_other_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_medical_genetics"
 "task_alias": "medical_genetics"
lm_eval/tasks/mmlu/default/mmlu_miscellaneous.yaml
 "dataset_name": "miscellaneous"
 "description": "The following are multiple choice questions (with answers) about miscellaneous.\n\
   \n"
-"group": "mmlu_other_tasks"
+"tag": "mmlu_other_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_miscellaneous"
 "task_alias": "miscellaneous"
lm_eval/tasks/mmlu/default/mmlu_moral_disputes.yaml
 "dataset_name": "moral_disputes"
 "description": "The following are multiple choice questions (with answers) about moral\
   \ disputes.\n\n"
-"group": "mmlu_humanities_tasks"
+"tag": "mmlu_humanities_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_moral_disputes"
 "task_alias": "moral_disputes"
lm_eval/tasks/mmlu/default/mmlu_moral_scenarios.yaml
 "dataset_name": "moral_scenarios"
 "description": "The following are multiple choice questions (with answers) about moral\
   \ scenarios.\n\n"
-"group": "mmlu_humanities_tasks"
+"tag": "mmlu_humanities_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_moral_scenarios"
 "task_alias": "moral_scenarios"
lm_eval/tasks/mmlu/default/mmlu_nutrition.yaml
 "dataset_name": "nutrition"
 "description": "The following are multiple choice questions (with answers) about nutrition.\n\
   \n"
-"group": "mmlu_other_tasks"
+"tag": "mmlu_other_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_nutrition"
 "task_alias": "nutrition"
lm_eval/tasks/mmlu/default/mmlu_philosophy.yaml
 "dataset_name": "philosophy"
 "description": "The following are multiple choice questions (with answers) about philosophy.\n\
   \n"
-"group": "mmlu_humanities_tasks"
+"tag": "mmlu_humanities_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_philosophy"
 "task_alias": "philosophy"
lm_eval/tasks/mmlu/default/mmlu_prehistory.yaml
 "dataset_name": "prehistory"
 "description": "The following are multiple choice questions (with answers) about prehistory.\n\
   \n"
-"group": "mmlu_humanities_tasks"
+"tag": "mmlu_humanities_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_prehistory"
 "task_alias": "prehistory"
lm_eval/tasks/mmlu/default/mmlu_professional_accounting.yaml
 "dataset_name": "professional_accounting"
 "description": "The following are multiple choice questions (with answers) about professional\
   \ accounting.\n\n"
-"group": "mmlu_other_tasks"
+"tag": "mmlu_other_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_professional_accounting"
 "task_alias": "professional_accounting"
lm_eval/tasks/mmlu/default/mmlu_professional_law.yaml
 "dataset_name": "professional_law"
 "description": "The following are multiple choice questions (with answers) about professional\
   \ law.\n\n"
-"group": "mmlu_humanities_tasks"
+"tag": "mmlu_humanities_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_professional_law"
 "task_alias": "professional_law"
lm_eval/tasks/mmlu/default/mmlu_professional_medicine.yaml
 "dataset_name": "professional_medicine"
 "description": "The following are multiple choice questions (with answers) about professional\
   \ medicine.\n\n"
-"group": "mmlu_other_tasks"
+"tag": "mmlu_other_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_professional_medicine"
 "task_alias": "professional_medicine"
lm_eval/tasks/mmlu/default/mmlu_professional_psychology.yaml
 "dataset_name": "professional_psychology"
 "description": "The following are multiple choice questions (with answers) about professional\
   \ psychology.\n\n"
-"group": "mmlu_social_sciences_tasks"
+"tag": "mmlu_social_sciences_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_professional_psychology"
 "task_alias": "professional_psychology"
lm_eval/tasks/mmlu/default/mmlu_public_relations.yaml
 "dataset_name": "public_relations"
 "description": "The following are multiple choice questions (with answers) about public\
   \ relations.\n\n"
-"group": "mmlu_social_sciences_tasks"
+"tag": "mmlu_social_sciences_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_public_relations"
 "task_alias": "public_relations"
lm_eval/tasks/mmlu/default/mmlu_security_studies.yaml
 "dataset_name": "security_studies"
 "description": "The following are multiple choice questions (with answers) about security\
   \ studies.\n\n"
-"group": "mmlu_social_sciences_tasks"
+"tag": "mmlu_social_sciences_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_security_studies"
 "task_alias": "security_studies"
lm_eval/tasks/mmlu/default/mmlu_sociology.yaml
 "dataset_name": "sociology"
 "description": "The following are multiple choice questions (with answers) about sociology.\n\
   \n"
-"group": "mmlu_social_sciences_tasks"
+"tag": "mmlu_social_sciences_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_sociology"
 "task_alias": "sociology"
lm_eval/tasks/mmlu/default/mmlu_us_foreign_policy.yaml
 "dataset_name": "us_foreign_policy"
 "description": "The following are multiple choice questions (with answers) about us\
   \ foreign policy.\n\n"
-"group": "mmlu_social_sciences_tasks"
+"tag": "mmlu_social_sciences_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_us_foreign_policy"
 "task_alias": "us_foreign_policy"
lm_eval/tasks/mmlu/default/mmlu_virology.yaml
 "dataset_name": "virology"
 "description": "The following are multiple choice questions (with answers) about virology.\n\
   \n"
-"group": "mmlu_other_tasks"
+"tag": "mmlu_other_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_virology"
 "task_alias": "virology"
lm_eval/tasks/mmlu/default/mmlu_world_religions.yaml
 "dataset_name": "world_religions"
 "description": "The following are multiple choice questions (with answers) about world\
   \ religions.\n\n"
-"group": "mmlu_humanities_tasks"
+"tag": "mmlu_humanities_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_world_religions"
 "task_alias": "world_religions"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_abstract_algebra.yaml
@@ -54,6 +54,6 @@ fewshot_config:
       not have any roots. For c = 2 the polynomial x^2 + 2 has two roots at x = 1
       and x = 2. Hence Z_3[x]/(x^2 + c) is a field if and only if c = 1. The answer
       is (B).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_abstract_algebra
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_anatomy.yaml
@@ -70,6 +70,6 @@ fewshot_config:
       \ origin of the hyoid bone are the second and the third pharyngeal arches\u2014\
       this information is covered in the last option (D). Therefore, we conclude that\
       \ (D) must be the correct answer. The answer is (D).\n\n"
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_anatomy
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_astronomy.yaml
@@ -65,6 +65,6 @@ fewshot_config:
       because it explains that the surface is red due to the rusted materials on the
       surface and the red color comes from the rust. So the correct option is (A).
       The answer is (A).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_astronomy
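
Every file in this commit receives the same one-line change: the top-level `group` key becomes `tag`, with the value left untouched. The commit does not say how the edit was produced. The following is only a minimal sketch of how such a rename could be scripted, not the author's method. The helper names `rename_group_to_tag` and `rewrite_tree` are hypothetical. The sketch assumes the key sits at column 0 (as in the hunks above), handles both the quoted (`"group":`) and bare (`group:`) forms, and edits text directly so the rest of each file stays byte-identical.

```python
import re
from pathlib import Path

# Matches a `group` key at the very start of a line, either quoted
# ("group":) or bare (group:). Indented keys and longer names such as
# `group_alias` are deliberately not matched.
GROUP_KEY = re.compile(r'^(?P<q>"?)group(?P=q)(?P<sep>\s*:)', re.MULTILINE)


def rename_group_to_tag(text: str) -> str:
    """Return `text` with a column-0 `group` key renamed to `tag`."""
    return GROUP_KEY.sub(r"\g<q>tag\g<q>\g<sep>", text)


def rewrite_tree(root: Path) -> int:
    """Rewrite every .yaml file under `root` in place.

    Returns the number of files whose contents changed. Assumption: every
    matched file is a leaf task config; a file that is meant to keep its
    `group` key would have to be excluded by the caller.
    """
    changed = 0
    for path in sorted(root.rglob("*.yaml")):
        old = path.read_text(encoding="utf-8")
        new = rename_group_to_tag(old)
        if new != old:
            path.write_text(new, encoding="utf-8")
            changed += 1
    return changed
```

A call such as `rewrite_tree(Path("lm_eval/tasks/mmlu"))` would apply the rename to the directory touched here. Because the match is purely textual, reviewing the resulting diff before committing is still advisable.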