gaoqiong / lm-evaluation-harness
Commit 741a6a69
Authored Aug 20, 2024 by lintangsutawika

Merge branch 'main' of https://github.com/EleutherAI/lm-evaluation-harness into mela

Parents: 494a4515, b536f067

Showing 20 changed files with 51 additions and 36 deletions (+51 -36)

lm_eval/tasks/mmlu/default/mmlu_nutrition.yaml  +1 -2
lm_eval/tasks/mmlu/default/mmlu_philosophy.yaml  +1 -2
lm_eval/tasks/mmlu/default/mmlu_prehistory.yaml  +1 -2
lm_eval/tasks/mmlu/default/mmlu_professional_accounting.yaml  +1 -2
lm_eval/tasks/mmlu/default/mmlu_professional_law.yaml  +1 -2
lm_eval/tasks/mmlu/default/mmlu_professional_medicine.yaml  +1 -2
lm_eval/tasks/mmlu/default/mmlu_professional_psychology.yaml  +1 -2
lm_eval/tasks/mmlu/default/mmlu_public_relations.yaml  +1 -2
lm_eval/tasks/mmlu/default/mmlu_security_studies.yaml  +1 -2
lm_eval/tasks/mmlu/default/mmlu_sociology.yaml  +1 -2
lm_eval/tasks/mmlu/default/mmlu_us_foreign_policy.yaml  +1 -2
lm_eval/tasks/mmlu/default/mmlu_virology.yaml  +1 -2
lm_eval/tasks/mmlu/default/mmlu_world_religions.yaml  +1 -2
lm_eval/tasks/mmlu/flan_cot_fewshot/_mmlu.yaml  +30 -4
lm_eval/tasks/mmlu/flan_cot_fewshot/_mmlu_flan_cot_fewshot_template_yaml  +3 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_abstract_algebra.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_anatomy.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_astronomy.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_business_ethics.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_clinical_knowledge.yaml  +1 -1

Too many changes to show. To preserve performance only 1000 of 1000+ files are displayed.

lm_eval/tasks/mmlu/default/mmlu_nutrition.yaml
 "dataset_name": "nutrition"
 "description": "The following are multiple choice questions (with answers) about nutrition.\n\
   \n"
-"group": "mmlu_other"
-"group_alias": "other"
+"tag": "mmlu_other_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_nutrition"
 "task_alias": "nutrition"

lm_eval/tasks/mmlu/default/mmlu_philosophy.yaml
 "dataset_name": "philosophy"
 "description": "The following are multiple choice questions (with answers) about philosophy.\n\
   \n"
-"group": "mmlu_humanities"
-"group_alias": "humanities"
+"tag": "mmlu_humanities_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_philosophy"
 "task_alias": "philosophy"

lm_eval/tasks/mmlu/default/mmlu_prehistory.yaml
 "dataset_name": "prehistory"
 "description": "The following are multiple choice questions (with answers) about prehistory.\n\
   \n"
-"group": "mmlu_humanities"
-"group_alias": "humanities"
+"tag": "mmlu_humanities_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_prehistory"
 "task_alias": "prehistory"

lm_eval/tasks/mmlu/default/mmlu_professional_accounting.yaml
 "dataset_name": "professional_accounting"
 "description": "The following are multiple choice questions (with answers) about professional\
   \ accounting.\n\n"
-"group": "mmlu_other"
-"group_alias": "other"
+"tag": "mmlu_other_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_professional_accounting"
 "task_alias": "professional_accounting"

lm_eval/tasks/mmlu/default/mmlu_professional_law.yaml
 "dataset_name": "professional_law"
 "description": "The following are multiple choice questions (with answers) about professional\
   \ law.\n\n"
-"group": "mmlu_humanities"
-"group_alias": "humanities"
+"tag": "mmlu_humanities_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_professional_law"
 "task_alias": "professional_law"

lm_eval/tasks/mmlu/default/mmlu_professional_medicine.yaml
 "dataset_name": "professional_medicine"
 "description": "The following are multiple choice questions (with answers) about professional\
   \ medicine.\n\n"
-"group": "mmlu_other"
-"group_alias": "other"
+"tag": "mmlu_other_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_professional_medicine"
 "task_alias": "professional_medicine"

lm_eval/tasks/mmlu/default/mmlu_professional_psychology.yaml
 "dataset_name": "professional_psychology"
 "description": "The following are multiple choice questions (with answers) about professional\
   \ psychology.\n\n"
-"group": "mmlu_social_sciences"
-"group_alias": "social_sciences"
+"tag": "mmlu_social_sciences_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_professional_psychology"
 "task_alias": "professional_psychology"

lm_eval/tasks/mmlu/default/mmlu_public_relations.yaml
 "dataset_name": "public_relations"
 "description": "The following are multiple choice questions (with answers) about public\
   \ relations.\n\n"
-"group": "mmlu_social_sciences"
-"group_alias": "social_sciences"
+"tag": "mmlu_social_sciences_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_public_relations"
 "task_alias": "public_relations"

lm_eval/tasks/mmlu/default/mmlu_security_studies.yaml
 "dataset_name": "security_studies"
 "description": "The following are multiple choice questions (with answers) about security\
   \ studies.\n\n"
-"group": "mmlu_social_sciences"
-"group_alias": "social_sciences"
+"tag": "mmlu_social_sciences_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_security_studies"
 "task_alias": "security_studies"

lm_eval/tasks/mmlu/default/mmlu_sociology.yaml
 "dataset_name": "sociology"
 "description": "The following are multiple choice questions (with answers) about sociology.\n\
   \n"
-"group": "mmlu_social_sciences"
-"group_alias": "social_sciences"
+"tag": "mmlu_social_sciences_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_sociology"
 "task_alias": "sociology"

lm_eval/tasks/mmlu/default/mmlu_us_foreign_policy.yaml
 "dataset_name": "us_foreign_policy"
 "description": "The following are multiple choice questions (with answers) about us\
   \ foreign policy.\n\n"
-"group": "mmlu_social_sciences"
-"group_alias": "social_sciences"
+"tag": "mmlu_social_sciences_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_us_foreign_policy"
 "task_alias": "us_foreign_policy"

lm_eval/tasks/mmlu/default/mmlu_virology.yaml
 "dataset_name": "virology"
 "description": "The following are multiple choice questions (with answers) about virology.\n\
   \n"
-"group": "mmlu_other"
-"group_alias": "other"
+"tag": "mmlu_other_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_virology"
 "task_alias": "virology"

lm_eval/tasks/mmlu/default/mmlu_world_religions.yaml
 "dataset_name": "world_religions"
 "description": "The following are multiple choice questions (with answers) about world\
   \ religions.\n\n"
-"group": "mmlu_humanities"
-"group_alias": "humanities"
+"tag": "mmlu_humanities_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_world_religions"
 "task_alias": "world_religions"

lm_eval/tasks/mmlu/flan_cot_fewshot/_mmlu.yaml
 group: mmlu_flan_cot_fewshot
+group_alias: mmlu (flan style, fewshot cot)
 task:
-  - mmlu_flan_cot_fewshot_stem
-  - mmlu_flan_cot_fewshot_other
-  - mmlu_flan_cot_fewshot_social_sciences
-  - mmlu_flan_cot_fewshot_humanities
+  - group: stem
+    task:
+      - mmlu_flan_cot_fewshot_stem
+    aggregate_metric_list:
+      - metric: acc
+        weight_by_size: True
+  - group: other
+    task:
+      - mmlu_flan_cot_fewshot_other
+    aggregate_metric_list:
+      - metric: acc
+        weight_by_size: True
+  - group: social sciences
+    task:
+      - mmlu_flan_cot_fewshot_social_sciences
+    aggregate_metric_list:
+      - metric: acc
+        weight_by_size: True
+  - group: humanities
+    task:
+      - mmlu_flan_cot_fewshot_humanities
+    aggregate_metric_list:
+      - metric: acc
+        weight_by_size: True
+aggregate_metric_list:
+  - metric: acc
+    weight_by_size: True
+metadata:
+  version: 2
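
Note on `weight_by_size: True`: as I read the harness's group configuration, this makes the group score a size-weighted (micro) average of the subtask scores rather than a plain mean of per-subtask scores. A minimal sketch of the difference follows. The function name `aggregate_accuracy` and the sample numbers are hypothetical and are not taken from the harness or from this commit.

```python
def aggregate_accuracy(subtask_results, weight_by_size=True):
    """Combine per-subtask accuracies into one group score.

    subtask_results: list of (accuracy, num_samples) pairs (hypothetical shape).
    weight_by_size=True  -> weight each subtask by its sample count (micro average).
    weight_by_size=False -> every subtask counts equally (macro average).
    """
    if not subtask_results:
        raise ValueError("no subtask results to aggregate")
    if weight_by_size:
        total = sum(n for _, n in subtask_results)
        return sum(acc * n for acc, n in subtask_results) / total
    return sum(acc for acc, _ in subtask_results) / len(subtask_results)


# Made-up numbers: a small subtask at 50% and a larger one at 75%.
results = [(0.50, 100), (0.75, 300)]
micro = aggregate_accuracy(results, weight_by_size=True)   # (50 + 225) / 400 = 0.6875
macro = aggregate_accuracy(results, weight_by_size=False)  # (0.50 + 0.75) / 2 = 0.625
print(micro, macro)
```

With size weighting, the larger subtask pulls the group score toward its own accuracy, which is why the two averages differ here.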

lm_eval/tasks/mmlu/flan_cot_fewshot/_mmlu_flan_cot_fewshot_template_yaml
@@ -26,4 +26,6 @@ metric_list:
     ignore_case: true
     ignore_punctuation: true
 metadata:
-  version: 1.0
+  version: 2.0
+dataset_kwargs:
+  trust_remote_code: true
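
For readers unfamiliar with the `ignore_case` and `ignore_punctuation` options in the context lines above, here is a simplified illustration of what such flags typically do in an exact-match comparison. It is a sketch under my own assumptions, not the harness's actual metric code. The function `exact_match` and its whitespace trimming are hypothetical.

```python
import string


def exact_match(prediction, reference, ignore_case=False, ignore_punctuation=False):
    """Return True if the two strings match after optional normalization."""

    def normalize(text):
        if ignore_case:
            text = text.lower()
        if ignore_punctuation:
            # Drop ASCII punctuation such as the parentheses in "(B)".
            text = text.translate(str.maketrans("", "", string.punctuation))
        # Assumption of this sketch: surrounding whitespace is always trimmed.
        return text.strip()

    return normalize(prediction) == normalize(reference)


strict = exact_match("(B)", "b")
relaxed = exact_match("(B)", "b", ignore_case=True, ignore_punctuation=True)
print(strict, relaxed)  # False True
```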

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_abstract_algebra.yaml
@@ -54,6 +54,6 @@ fewshot_config:
       not have any roots. For c = 2 the polynomial x^2 + 2 has two roots at x = 1 and x = 2. Hence Z_3[x]/(x^2 + c) is a field if and only if c = 1. The answer is (B).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_abstract_algebra

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_anatomy.yaml
@@ -70,6 +70,6 @@ fewshot_config:
       \ origin of the hyoid bone are the second and the third pharyngeal arches\u2014\
       this information is covered in the last option (D). Therefore, we conclude that\
       \ (D) must be the correct answer. The answer is (D).\n\n"
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_anatomy

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_astronomy.yaml
@@ -65,6 +65,6 @@ fewshot_config:
       because it explains that the surface is red due to the rusted materials on the surface and the red color comes from the rust. So the correct option is (A). The answer is (A).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_astronomy

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_business_ethics.yaml
@@ -70,6 +70,6 @@ fewshot_config:
       \ moral arguments relating to: negative *externalities*, the *power* that corporations\
       \ possess and the *mutual independence* of business and society. The answer\
       \ is (D).\n\n"
-group: mmlu_flan_cot_fewshot_other
+tag: mmlu_flan_cot_fewshot_other
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_business_ethics

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_clinical_knowledge.yaml
@@ -43,6 +43,6 @@ fewshot_config:
     target: 'Let''s think step by step. We refer to Wikipedia articles on clinical knowledge for help. The energy for muscular contraction is provided by ATP (adenosine triphosphate), which is the powerhouse of the cell. The answer is (A).'
-group: mmlu_flan_cot_fewshot_other
+tag: mmlu_flan_cot_fewshot_other
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_clinical_knowledge