gaoqiong / lm-evaluation-harness
Commit c9bbec6e (unverified), authored Dec 04, 2023 by Hailey Schoelkopf
Committed by GitHub, Dec 04, 2023
Merge pull request #1060 from EleutherAI/fix-mmlu
[Refactor] Fix fewshot cot mmlu descriptions
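A minimal sketch of why the trailing `\n\n` added in this commit matters, assuming the harness appends the next question directly after the task description. The helper `build_prompt` and the sample strings are hypothetical illustrations, not the harness's actual API.

```python
def build_prompt(description: str, question: str) -> str:
    """Hypothetical helper: place the next question right after the description."""
    return description + question


# Placeholder endings of a few-shot description before and after the change.
old_description = "... The answer is (D)."
new_description = "... The answer is (D).\n\n"

# Placeholder question in the same Q/A layout as the few-shot examples.
question = "Q: <next question>\n(A) ... (B) ... (C) ... (D) ...\nA: Let's think step by step."

old_prompt = build_prompt(old_description, question)
new_prompt = build_prompt(new_description, question)

print(repr(old_prompt))
print(repr(new_prompt))
```

Under that assumption, the old description runs the last worked answer straight into the next `Q:`, while the new one separates them with a blank line, matching the separator already used between the few-shot examples.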
Parents: 7afae7b5, 57e017ff
Pipeline #2992 failed
Changes: 59 (showing 19 changed files with 19 additions and 19 deletions)
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_management.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_marketing.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_medical_genetics.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_miscellaneous.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_moral_disputes.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_moral_scenarios.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_nutrition.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_philosophy.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_prehistory.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_accounting.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_law.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_medicine.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_psychology.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_public_relations.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_security_studies.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_sociology.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_us_foreign_policy.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_virology.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_world_religions.yaml  +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_management.yaml
@@ -27,7 +27,7 @@
   \ (D) Initiating structure and considerations\nA: Let's think step by step. We refer\
   \ to Wikipedia articles on management for help. The Ohio State Leadership Studies\
   \ conducted in the 1940s identified initiating structure and consideration as the\
-  \ two main dimensions of leader behavior. The answer is (D)."
+  \ two main dimensions of leader behavior. The answer is (D).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_management"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_marketing.yaml
@@ -34,7 +34,7 @@
   \ and geographic variables.\n(A) Geodemographics (B) Product differentiation. (C)\
   \ ANSOFF matrix. (D) Brand management.\nA: Let's think step by step. We refer to\
   \ Wikipedia articles on marketing for help. Geodemographics is a natural outcome\
-  \ when combining demographic and geographic variables. The answer is (A)."
+  \ when combining demographic and geographic variables. The answer is (A).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_marketing"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_medical_genetics.yaml
@@ -31,7 +31,7 @@
   \ think step by step. We refer to Wikipedia articles on medical genetics for help.\
   \ A Southern blot is a method in molecular biology for detecting specific DNA sequences\
   \ in a sample. Large triplet repeat expansions are usually detected with this method.\
-  \ The answer is (C)."
+  \ The answer is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_medical_genetics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_miscellaneous.yaml
@@ -21,7 +21,7 @@
   \ the shortest NBA player in history to win an official slam dunk contest. The answer\
   \ is (A).\n\nQ: How many axles does a standard automobile have?\n(A) one (B) two\
   \ (C) four (D) eight\nA: Let's think step by step. We refer to Wikipedia for help.\
-  \ Most cars have two axles to rotate the wheels.. The answer is (B)."
+  \ Most cars have two axles to rotate the wheels.. The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_miscellaneous"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_moral_disputes.yaml
@@ -43,7 +43,7 @@
   \ on ethical philosophy in South Africa. Metz has written the death penalty is unjustified\
   \ as it treats individuals as incapable of communal relations. It is unclear that\
   \ capital punishment is to the benefit of, or a deterrent of harm to the community.\
-  \ The answer is (A)."
+  \ The answer is (A).\n\n"
 "group": "mmlu_flan_cot_fewshot_humanities"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_moral_disputes"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_moral_scenarios.yaml
@@ -38,7 +38,7 @@
   \ house.\n(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong,\
   \ Not wrong\nA: Let's think step by step. We refer to Wikipedia articles on moral\
   \ scenarios for help. Loving someone is not wrong. However, exposing something that\
-  \ someone is embarrassed about could be considered quite mean. The answer is (C)."
+  \ someone is embarrassed about could be considered quite mean. The answer is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_humanities"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_moral_scenarios"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_nutrition.yaml
@@ -42,7 +42,7 @@
   \ in disease risk between the two groups.\nA: Let's think step by step. We refer\
   \ to Wikipedia articles on nutrition for help. The risk ratio is not sufficiently\
   \ reduced that it could not be explained by random chance given the studies sample\
-  \ size. The answer is (C)."
+  \ size. The answer is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_nutrition"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_philosophy.yaml
@@ -24,7 +24,7 @@
   \ (D) none of the above.\nA: Let's think step by step. We refer to Wikipedia articles\
   \ on philosophy for help. Psychological egoism suggests that one behaves based on\
   \ what makes one feels good, hence it is a claim about human nature and how humans\
-  \ are capable of behaving. The answer is (C)."
+  \ are capable of behaving. The answer is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_humanities"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_philosophy"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_prehistory.yaml
@@ -36,7 +36,7 @@
   \ (C) frighten away enemies, in particular the Spaniards. (D) legitimize his kingship,\
   \ since his father was not royal.\nA: Let's think step by step. We refer to Wikipedia\
   \ articles on prehistory for help. Pacal built the temples as the funerary monument\
-  \ to legitimize his kingship. The answer is (D)."
+  \ to legitimize his kingship. The answer is (D).\n\n"
 "group": "mmlu_flan_cot_fewshot_humanities"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_prehistory"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_accounting.yaml
@@ -42,7 +42,7 @@
   \ by step. We refer to Wikipedia articles on accounting for help. Among the four\
   \ transactions, only Proceeds from long-term debt belongs to the financing activities\
   \ section of cashflow, hence the amount reported should be $100000. The answer is\
-  \ (D)."
+  \ (D).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_professional_accounting"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_law.yaml
@@ -100,7 +100,7 @@
   \ think step by step. We refer to Wikipedia articles on law for help. The Fourteenth\
   \ Amendment further supports the First Amendment by establishing a due process clause.\
   \ Hence the strongest argument should be the statute is overbroad and consequently\
-  \ invalid under the First and Fourteenth Amendments. The answer is (D)."
+  \ invalid under the First and Fourteenth Amendments. The answer is (D).\n\n"
 "group": "mmlu_flan_cot_fewshot_humanities"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_professional_law"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_medicine.yaml
@@ -64,7 +64,7 @@
   A: Let's think step by step. We refer to Wikipedia articles on medicine for help.\
   \ The symptoms and the adrenal mass suggested pheochromocytoma, and the blood pressure\
   \ indicates hypertension. Phenoxybenzamine is used to treat hypertension caused\
-  \ by pheochromocytoma. The answer is (D)."
+  \ by pheochromocytoma. The answer is (D).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_professional_medicine"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_psychology.yaml
@@ -42,7 +42,7 @@
   A: Let's think step by step. We refer to Wikipedia articles on psychology for help.\
   \ Based on the circumstances, you should tell your client about the pros and cons\
   \ of each program, but it would be inappropriate to receive the bonus, so you should\
-  \ not claim the $50 bonus. The answer is (D)."
+  \ not claim the $50 bonus. The answer is (D).\n\n"
 "group": "mmlu_flan_cot_fewshot_social_sciences"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_professional_psychology"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_public_relations.yaml
@@ -33,7 +33,7 @@
   \ think step by step. We refer to Wikipedia articles on public relations for help.\
   \ If a public relations media practitioner does not know the answer to a reporter's\
   \ question, they should say 'I don't know' and offer to provide the information\
-  \ later. The answer is (C)."
+  \ later. The answer is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_social_sciences"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_public_relations"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_security_studies.yaml
@@ -80,7 +80,7 @@
   \ conflict. It seeks to control by imposing compliance by removing any opportunity\
   \ for negotiation or concession.\nA: Let's think step by step. We refer to Wikipedia\
   \ articles on security studies for help. Coercive diplomacy uses the threat of force\
-  \ to induce the opponent to comply with demands. The answer is (B)."
+  \ to induce the opponent to comply with demands. The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_social_sciences"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_security_studies"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_sociology.yaml
@@ -37,7 +37,7 @@
   \ step. We refer to Wikipedia articles on sociology for help. The post-war welfare\
   \ state of 1948 aimed to provide free healthcare and education, full employment,\
   \ and universal welfare. But it did not aim to provide a minimum wage. The answer\
-  \ is (B)."
+  \ is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_social_sciences"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_sociology"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_us_foreign_policy.yaml
@@ -34,7 +34,7 @@
   \ Obama (D) It reduced global use of the US dollar\nA: Let's think step by step.\
   \ We refer to Wikipedia articles on us foreign policy for help. The 2008 financial\
   \ crisis damanged the international reputation of the American model of political\
-  \ economy and capitalism. The answer is (A)."
+  \ economy and capitalism. The answer is (A).\n\n"
 "group": "mmlu_flan_cot_fewshot_social_sciences"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_us_foreign_policy"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_virology.yaml
@@ -25,7 +25,7 @@
   \ a helper virus (C) Only replicate in dividing cells (D) Can integrate into host\
   \ chromosomes\nA: Let's think step by step. We refer to Wikipedia articles on virology\
   \ for help. Paroviruses are highly impactful because they do not have nucleic acid.\
-  \ The answer is (A)."
+  \ The answer is (A).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_virology"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_world_religions.yaml
@@ -21,7 +21,7 @@
   \ of the covenant for Jewish males?\n(A) The rainbow (B) Circumcision (C) A son\
   \ (D) Bar mitzvah\nA: Let's think step by step. We refer to Wikipedia articles on\
   \ world religions for help. In Judaism, the most distinctive sign of the covenant\
-  \ is circumcision (brit milah). The answer is (B)."
+  \ is circumcision (brit milah). The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_humanities"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_world_religions"