gaoqiong / lm-evaluation-harness
Commit ccca64f7 (Unverified)
Authored Nov 07, 2023 by Lintang Sutawika; committed by GitHub, Nov 07, 2023
Merge branch 'big-refactor' into cont-metrics
Parents: 0a39d055, b7a4ea06
Changes: 304
Showing 20 changed files with 263 additions and 404 deletions (+263 -404)
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_public_relations.yaml  +39 -65
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_security_studies.yaml  +5 -4
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_sociology.yaml  +43 -67
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_us_foreign_policy.yaml  +40 -66
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_virology.yaml  +31 -55
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_world_religions.yaml  +27 -53
lm_eval/tasks/mmlu/flan_cot_zeroshot/_mmlu.yaml  +6 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/_mmlu_flan_cot_zeroshot_template_yaml  +0 -0
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_abstract_algebra.yaml  +6 -8
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_anatomy.yaml  +6 -7
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_astronomy.yaml  +6 -7
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_business_ethics.yaml  +6 -8
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_clinical_knowledge.yaml  +6 -8
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_biology.yaml  +6 -8
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_chemistry.yaml  +6 -8
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_computer_science.yaml  +6 -8
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_mathematics.yaml  +6 -8
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_medicine.yaml  +6 -8
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_physics.yaml  +6 -8
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_computer_security.yaml  +6 -8
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_public_relations.yaml
View file @ ccca64f7
-dataset_name: public_relations
-description: 'The following are multiple choice questions (with answers) about public
-  relations.
-
-
-  Q: Earth Hour was a campaign launched by which organization?
-
-  (A) Greenpeace (B) The UN (C) Oxfam (D) World Wildlife Fund
-
-  A: Let''s think step by step. We refer to Wikipedia articles on public relations
-  for help. Earth Hour is a worldwide movement oragnized launched by the World Wildlife
-  Fund. The answer is (D).
-
-
-  Q: In issues management, what is the most proactive approach to addressing negative
-  or misleading information posted online about your organization?
-
-  (A) Buy domain names that could be used by opposition groups. (B) Post anonymous
-  comments on blogs to combat this information. (C) Prepare a news release that discredits
-  the inaccurate information. (D) Make policy changes to address complaints highlighted
-  on these sites.
-
-  A: Let''s think step by step. We refer to Wikipedia articles on public relations
-  for help. In issues management, the most proactive approach to addressing negative
-  or misleading information posted online is to make policy changes to address complaints
-  highlighted on those sites. The answer is (D).
-
-
-  Q: At which stage in the planning process would a situation analysis be carried
-  out?
-
-  (A) Defining the program (B) Planning the program (C) Taking action and implementing
-  ideas (D) Evaluation of the program
-
-  A: Let''s think step by step. We refer to Wikipedia articles on public relations
-  for help. Situation analyses are typically carried out during the planning process
-  stage of defining the program. The answer is (A).
-
-
-  Q: Which of these statements is true of the Vatican in 2010 at the time of the accusations
-  of child abuse cover-ups?
-
-  (A) There was a coordinated media response. (B) Consistent messages were communicated.
-  (C) Criticisms were taken as attacks on the Catholic Church. (D) The credibility
-  of the Vatican was upheld.
-
-  A: Let''s think step by step. We refer to Wikipedia articles on public relations
-  for help. In 2010 when there were accusations of child abuse cover-ups, the Vatican
-  took those criticisms as attacks on the Catholic Church. The answer is (C).
-
-
-  Q: What should a public relations media practitioner do if she does not know the
-  answer to a reporter''s question?
-
-  (A) Give the reporter other information she is certain is correct. (B) Say that
-  the information is ''off the record'' and will be disseminated later. (C) Say ''I
-  don''t know'' and promise to provide the information later. (D) Say ''no comment,''
-  rather than appear uninformed.
-
-  A: Let''s think step by step. We refer to Wikipedia articles on public relations
-  for help. If a public relations media practitioner does not know the answer to a
-  reporter''s question, they should say ''I don''t know'' and offer to provide the
-  information later. The answer is (C).'
-include: _mmlu_flan_cot_fewshot_template_yaml
-task: mmlu_flan_cot_fewshot_public_relations
+"dataset_name": "public_relations"
+"description": "The following are multiple choice questions (with answers) about public\
+  \ relations.\n\nQ: Earth Hour was a campaign launched by which organization?\n(A)\
+  \ Greenpeace (B) The UN (C) Oxfam (D) World Wildlife Fund\nA: Let's think step by\
+  \ step. We refer to Wikipedia articles on public relations for help. Earth Hour\
+  \ is a worldwide movement oragnized launched by the World Wildlife Fund. The answer\
+  \ is (D).\n\nQ: In issues management, what is the most proactive approach to addressing\
+  \ negative or misleading information posted online about your organization?\n(A)\
+  \ Buy domain names that could be used by opposition groups. (B) Post anonymous comments\
+  \ on blogs to combat this information. (C) Prepare a news release that discredits\
+  \ the inaccurate information. (D) Make policy changes to address complaints highlighted\
+  \ on these sites.\nA: Let's think step by step. We refer to Wikipedia articles on\
+  \ public relations for help. In issues management, the most proactive approach to\
+  \ addressing negative or misleading information posted online is to make policy\
+  \ changes to address complaints highlighted on those sites. The answer is (D).\n\
+  \nQ: At which stage in the planning process would a situation analysis be carried\
+  \ out?\n(A) Defining the program (B) Planning the program (C) Taking action and\
+  \ implementing ideas (D) Evaluation of the program\nA: Let's think step by step.\
+  \ We refer to Wikipedia articles on public relations for help. Situation analyses\
+  \ are typically carried out during the planning process stage of defining the program.\
+  \ The answer is (A).\n\nQ: Which of these statements is true of the Vatican in 2010\
+  \ at the time of the accusations of child abuse cover-ups?\n(A) There was a coordinated\
+  \ media response. (B) Consistent messages were communicated. (C) Criticisms were\
+  \ taken as attacks on the Catholic Church. (D) The credibility of the Vatican was\
+  \ upheld.\nA: Let's think step by step. We refer to Wikipedia articles on public\
+  \ relations for help. In 2010 when there were accusations of child abuse cover-ups,\
+  \ the Vatican took those criticisms as attacks on the Catholic Church. The answer\
+  \ is (C).\n\nQ: What should a public relations media practitioner do if she does\
+  \ not know the answer to a reporter's question?\n(A) Give the reporter other information\
+  \ she is certain is correct. (B) Say that the information is 'off the record' and\
+  \ will be disseminated later. (C) Say 'I don't know' and promise to provide the\
+  \ information later. (D) Say 'no comment,' rather than appear uninformed.\nA: Let's\
+  \ think step by step. We refer to Wikipedia articles on public relations for help.\
+  \ If a public relations media practitioner does not know the answer to a reporter's\
+  \ question, they should say 'I don't know' and offer to provide the information\
+  \ later. The answer is (C)."
+"group": "mmlu_flan_cot_fewshot_social_sciences"
+"include": "_mmlu_flan_cot_fewshot_template_yaml"
+"task": "mmlu_flan_cot_fewshot_public_relations"
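The removed and added `description` values in this file appear to encode the same text: the old one is a single-quoted YAML scalar (folded lines, blank lines for newlines, `''` for an apostrophe) and the new one is a double-quoted scalar with `\n` escapes. The sketch below illustrates that equivalence on a shortened, made-up excerpt. It is a simplified decoder written for this note, not code from lm-evaluation-harness or from any YAML library, and the name `unfold_single_quoted` is hypothetical.

```python
def unfold_single_quoted(body_lines):
    """Decode the inner lines of a YAML single-quoted scalar (simplified).

    Rules applied: a plain line break folds to one space, a run of N blank
    lines becomes N newlines, and a doubled quote ('') is one apostrophe.
    Leading and trailing blank lines are ignored in this sketch.
    """
    parts = []
    blank_run = 0
    for raw in body_lines:
        text = raw.strip()
        if not text:
            blank_run += 1
            continue
        if parts:
            parts.append("\n" * blank_run if blank_run else " ")
        parts.append(text.replace("''", "'"))
        blank_run = 0
    return "".join(parts)


# Shortened excerpt in the old (single-quoted, folded) layout.
old_style = [
    "A: Let''s think step by step. The answer",
    "is (D).",
    "",
    "",
    "Q: In issues management, what is the most proactive approach?",
    "",
    "(A) Buy domain names",
]

# The same excerpt as the new (double-quoted) layout would decode it.
new_style = (
    "A: Let's think step by step. The answer is (D).\n\n"
    "Q: In issues management, what is the most proactive approach?\n"
    "(A) Buy domain names"
)

decoded = unfold_single_quoted(old_style)
```

If the two layouts really are equivalent, the substantive change in each few-shot file would be the added `"group"` key rather than the prompt text.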
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_security_studies.yaml
View file @ ccca64f7
-dataset_name: security_studies
-description: "The following are multiple choice questions (with answers) about security\
+"dataset_name": "security_studies"
+"description": "The following are multiple choice questions (with answers) about security\
   \ studies.\n\nQ: What are the frameworks of analysis within which terrorism has\
   \ been considered (as of 2020)?\n(A) Competition between larger nations has resulted\
   \ in some countries actively supporting terrorist groups to undermine the strength\
...
@@ -81,5 +81,6 @@ description: "The following are multiple choice questions (with answers) about s
   \ for negotiation or concession.\nA: Let's think step by step. We refer to Wikipedia\
   \ articles on security studies for help. Coercive diplomacy uses the threat of force\
   \ to induce the opponent to comply with demands. The answer is (B)."
-include: _mmlu_flan_cot_fewshot_template_yaml
-task: mmlu_flan_cot_fewshot_security_studies
+"group": "mmlu_flan_cot_fewshot_social_sciences"
+"include": "_mmlu_flan_cot_fewshot_template_yaml"
+"task": "mmlu_flan_cot_fewshot_security_studies"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_sociology.yaml
View file @ ccca64f7
-dataset_name: sociology
-description: 'The following are multiple choice questions (with answers) about sociology.
-
-
-  Q: Which of the following is not a problem associated with official statistics on
-  strike action?
-
-  (A) most strikes go unnoticed by employers and the mass media (B) not all industrial
-  disputes will be reported by the employer (C) the definition of strikes excludes
-  those that involve fewer than ten workers or last less than one day (D) it is hard
-  to compare strikes that were measured in different ways
-
-  A: Let''s think step by step. We refer to Wikipedia articles on sociology for help.
-  Official statistics on strike action can be problematic because not all industrial
-  disputes will be reported by employers, the definition of strikes excludes those
-  that involves fewer than ten workers or last less than one day, and it is hard to
-  compare strikes that were measured in different ways. Thus, (A) is not a problem
-  associated with official statistics on strike action. The answer is (A).
-
-
-  Q: What does Berger (1963) describe as a metaphor for social reality?
-
-  (A) a fairground ride (B) a circus (C) a puppet theatre (D) a ballet
-
-  A: Let''s think step by step. We refer to Wikipedia articles on sociology for help.
-  Berger describes social reality using the metaphor of a puppet theatre. The answer
-  is (C).
-
-
-  Q: The term ''hegemony'' refers to:
-
-  (A) the tendency for the working class not to realize their own interests (B) a
-  dominant ideology that legitimates economic, political and cultural power (C) a
-  form of dual consciousness based on ideology and everyday experiences (D) a mode
-  of payment given for outstanding topiary
-
-  A: Let''s think step by step. We refer to Wikipedia articles on sociology for help.
-  Hegemony refers to a dominant ideology that legitimates economic, policital, and
-  cultural power. The answer is (B).
-
-
-  Q: The shift from ''civil religion'' to ''common religion'' means that:
-
-  (A) the increasing bureaucracy of the state has made religion only a marginal part
-  of our lives (B) despite the weakening of traditional authority, our everyday lives
-  and ''common sense'' remain shaped by religious beliefs and values (C) religious
-  participation in collective worship may have declined, but people still practise
-  their faiths in private (D) people are much more likely to discuss their religious
-  beliefs in public, informal settings
-
-  A: Let''s think step by step. We refer to Wikipedia articles on sociology for help.
-  The shift from civil religion to common religion means that despite the weakening
-  of traditional authority, our everyday lives and common sense remain shaped by religious
-  beliefs and values. The answer is (B).
-
-
-  Q: Which of the following did the post-war welfare state of 1948 not aim to provide:
-
-  (A) free health care and education for all (B) a minimum wage (C) full employment
-  (D) universal welfare
-
-  A: Let''s think step by step. We refer to Wikipedia articles on sociology for help.
-  The post-war welfare state of 1948 aimed to provide free healthcare and education,
-  full employment, and universal welfare. But it did not aim to provide a minimum
-  wage. The answer is (B).'
-include: _mmlu_flan_cot_fewshot_template_yaml
-task: mmlu_flan_cot_fewshot_sociology
+"dataset_name": "sociology"
+"description": "The following are multiple choice questions (with answers) about sociology.\n\
+  \nQ: Which of the following is not a problem associated with official statistics\
+  \ on strike action?\n(A) most strikes go unnoticed by employers and the mass media\
+  \ (B) not all industrial disputes will be reported by the employer (C) the definition\
+  \ of strikes excludes those that involve fewer than ten workers or last less than\
+  \ one day (D) it is hard to compare strikes that were measured in different ways\n\
+  A: Let's think step by step. We refer to Wikipedia articles on sociology for help.\
+  \ Official statistics on strike action can be problematic because not all industrial\
+  \ disputes will be reported by employers, the definition of strikes excludes those\
+  \ that involves fewer than ten workers or last less than one day, and it is hard\
+  \ to compare strikes that were measured in different ways. Thus, (A) is not a problem\
+  \ associated with official statistics on strike action. The answer is (A).\n\nQ:\
+  \ What does Berger (1963) describe as a metaphor for social reality?\n(A) a fairground\
+  \ ride (B) a circus (C) a puppet theatre (D) a ballet\nA: Let's think step by step.\
+  \ We refer to Wikipedia articles on sociology for help. Berger describes social\
+  \ reality using the metaphor of a puppet theatre. The answer is (C).\n\nQ: The term\
+  \ 'hegemony' refers to:\n(A) the tendency for the working class not to realize their\
+  \ own interests (B) a dominant ideology that legitimates economic, political and\
+  \ cultural power (C) a form of dual consciousness based on ideology and everyday\
+  \ experiences (D) a mode of payment given for outstanding topiary\nA: Let's think\
+  \ step by step. We refer to Wikipedia articles on sociology for help. Hegemony refers\
+  \ to a dominant ideology that legitimates economic, policital, and cultural power.\
+  \ The answer is (B).\n\nQ: The shift from 'civil religion' to 'common religion'\
+  \ means that:\n(A) the increasing bureaucracy of the state has made religion only\
+  \ a marginal part of our lives (B) despite the weakening of traditional authority,\
+  \ our everyday lives and 'common sense' remain shaped by religious beliefs and values\
+  \ (C) religious participation in collective worship may have declined, but people\
+  \ still practise their faiths in private (D) people are much more likely to discuss\
+  \ their religious beliefs in public, informal settings\nA: Let's think step by step.\
+  \ We refer to Wikipedia articles on sociology for help. The shift from civil religion\
+  \ to common religion means that despite the weakening of traditional authority,\
+  \ our everyday lives and common sense remain shaped by religious beliefs and values.\
+  \ The answer is (B).\n\nQ: Which of the following did the post-war welfare state\
+  \ of 1948 not aim to provide:\n(A) free health care and education for all (B) a\
+  \ minimum wage (C) full employment (D) universal welfare\nA: Let's think step by\
+  \ step. We refer to Wikipedia articles on sociology for help. The post-war welfare\
+  \ state of 1948 aimed to provide free healthcare and education, full employment,\
+  \ and universal welfare. But it did not aim to provide a minimum wage. The answer\
+  \ is (B)."
+"group": "mmlu_flan_cot_fewshot_social_sciences"
+"include": "_mmlu_flan_cot_fewshot_template_yaml"
+"task": "mmlu_flan_cot_fewshot_sociology"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_us_foreign_policy.yaml
View file @ ccca64f7
-dataset_name: us_foreign_policy
-description: 'The following are multiple choice questions (with answers) about us
-  foreign policy.
-
-
-  Q: How did Donald Trump attack globalization in the 2016 campaign?
-
-  (A) Globalization had made men like him too rich (B) Globalization only benefited
-  certain American states, such as New York (C) Liberal elites had encouraged globalization,
-  while ''ordinary Americans'' lost jobs because of it (D) Globalization encouraged
-  damaging trade wars
-
-  A: Let''s think step by step. We refer to Wikipedia articles on us foreign policy
-  for help. Trump attacked globalization because he believed ordinary Americans lost
-  jobs due to it, and so he wanted to blame liberals who had encouraged it. The answer
-  is (C).
-
-
-  Q: How did NSC-68 change U.S. strategy?
-
-  (A) It globalized containment. (B) It militarized containment. (C) It called for
-  the development of the hydrogen bomb. (D) All of the above
-
-  A: Let''s think step by step. We refer to Wikipedia articles on us foreign policy
-  for help. NSC-68 outlined a variety of courses of action, including globalization
-  of containment, militarization of contaiment, and the development of the hydrogen
-  bomb. The answer is (D).
-
-
-  Q: How do Defensive Realism and Offensive Realism differ in their explanation of
-  state behaviour?
-
-  (A) Defensive realists place greater emphasis on the role of international institutions
-  (B) Defensive realists place less emphasis on geographical factors (C) Offensive
-  realists give more priority to the national interest than Defensive realists. (D)
-  Defensive realists believe states are security maximizers, while Offensive realists
-  believe states to be power maximizers
-
-  A: Let''s think step by step. We refer to Wikipedia articles on us foreign policy
-  for help. While defensive realism advocates that states are security maximizers,
-  offensive realists think of states as power maximizers. The answer is (D).
-
-
-  Q: The realm of policy decisions concerned primarily with relations between the
-  United States and the rest of the world is known as
-
-  (A) terrorism policy. (B) economic policy. (C) foreign policy. (D) international
-  policy.
-
-  A: Let''s think step by step. We refer to Wikipedia articles on us foreign policy
-  for help. The topic of policy decisions concerns with relations between the US and
-  the rest of the world is known as foreign policy. The answer is (C).
-
-
-  Q: How did the 2008 financial crisis affect America''s international reputation?
-
-  (A) It damaged support for the US model of political economy and capitalism (B)
-  It created anger at the United States for exaggerating the crisis (C) It increased
-  support for American global leadership under President Obama (D) It reduced global
-  use of the US dollar
-
-  A: Let''s think step by step. We refer to Wikipedia articles on us foreign policy
-  for help. The 2008 financial crisis damanged the international reputation of the
-  American model of political economy and capitalism. The answer is (A).'
-include: _mmlu_flan_cot_fewshot_template_yaml
-task: mmlu_flan_cot_fewshot_us_foreign_policy
+"dataset_name": "us_foreign_policy"
+"description": "The following are multiple choice questions (with answers) about us\
+  \ foreign policy.\n\nQ: How did Donald Trump attack globalization in the 2016 campaign?\n\
+  (A) Globalization had made men like him too rich (B) Globalization only benefited\
+  \ certain American states, such as New York (C) Liberal elites had encouraged globalization,\
+  \ while 'ordinary Americans' lost jobs because of it (D) Globalization encouraged\
+  \ damaging trade wars\nA: Let's think step by step. We refer to Wikipedia articles\
+  \ on us foreign policy for help. Trump attacked globalization because he believed\
+  \ ordinary Americans lost jobs due to it, and so he wanted to blame liberals who\
+  \ had encouraged it. The answer is (C).\n\nQ: How did NSC-68 change U.S. strategy?\n\
+  (A) It globalized containment. (B) It militarized containment. (C) It called for\
+  \ the development of the hydrogen bomb. (D) All of the above\nA: Let's think step\
+  \ by step. We refer to Wikipedia articles on us foreign policy for help. NSC-68\
+  \ outlined a variety of courses of action, including globalization of containment,\
+  \ militarization of contaiment, and the development of the hydrogen bomb. The answer\
+  \ is (D).\n\nQ: How do Defensive Realism and Offensive Realism differ in their explanation\
+  \ of state behaviour?\n(A) Defensive realists place greater emphasis on the role\
+  \ of international institutions (B) Defensive realists place less emphasis on geographical\
+  \ factors (C) Offensive realists give more priority to the national interest than\
+  \ Defensive realists. (D) Defensive realists believe states are security maximizers,\
+  \ while Offensive realists believe states to be power maximizers\nA: Let's think\
+  \ step by step. We refer to Wikipedia articles on us foreign policy for help. While\
+  \ defensive realism advocates that states are security maximizers, offensive realists\
+  \ think of states as power maximizers. The answer is (D).\n\nQ: The realm of policy\
+  \ decisions concerned primarily with relations between the United States and the\
+  \ rest of the world is known as\n(A) terrorism policy. (B) economic policy. (C)\
+  \ foreign policy. (D) international policy.\nA: Let's think step by step. We refer\
+  \ to Wikipedia articles on us foreign policy for help. The topic of policy decisions\
+  \ concerns with relations between the US and the rest of the world is known as foreign\
+  \ policy. The answer is (C).\n\nQ: How did the 2008 financial crisis affect America's\
+  \ international reputation?\n(A) It damaged support for the US model of political\
+  \ economy and capitalism (B) It created anger at the United States for exaggerating\
+  \ the crisis (C) It increased support for American global leadership under President\
+  \ Obama (D) It reduced global use of the US dollar\nA: Let's think step by step.\
+  \ We refer to Wikipedia articles on us foreign policy for help. The 2008 financial\
+  \ crisis damanged the international reputation of the American model of political\
+  \ economy and capitalism. The answer is (A)."
+"group": "mmlu_flan_cot_fewshot_social_sciences"
+"include": "_mmlu_flan_cot_fewshot_template_yaml"
+"task": "mmlu_flan_cot_fewshot_us_foreign_policy"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_virology.yaml
View file @ ccca64f7
-dataset_name: virology
-description: 'The following are multiple choice questions (with answers) about virology.
-
-
-  Q: The median survival time to AIDS and death was established by following:
-
-  (A) Seroprevalent HIV-infected individuals (B) Seronegatives (C) Seroconverters
-  (D) High-risk seronegatives
-
-  A: Let''s think step by step. We refer to Wikipedia articles on virology for help.
-  The median survival time to AIDS and death was established as a result of the development
-  of seroconverters. The answer is (C).
-
-
-  Q: Which of the following is a morphological characteristic of the paramyxoviruses.
-
-  (A) Fragile viruses often visualised with RNA spewing from the inside (B) Elongate
-  viruses (C) Icosahedral viruses with envelope (D) Very large viruses
-
-  A: Let''s think step by step. We refer to Wikipedia articles on virology for help.
-  Paramyxoviruses are fragile viruses often visualised with RNA spewing from the inside.
-  The answer is (A).
-
-
-  Q: The most important goal of a behavioral intervention is:
-
-  (A) Change in behavior (B) Comprehensive coverage (C) Effective use of behavioral
-  theory (D) Sustained behavior change
-
-  A: Let''s think step by step. We refer to Wikipedia articles on virology for help.
-  The prim goal of a behavioral intervention is to cause sustained behavior change.
-  The answer is (D).
-
-
-  Q: A key factor facilitating the application of nested case-control studies from
-  the MACS was:
-
-  (A) Data collection (B) Establishment of a repository of biologic specimens (C)
-  Participant interest (D) Administration of the questionnaire by staff
-
-  A: Let''s think step by step. We refer to Wikipedia articles on virology for help.
-  The Multicenter AIDS Cohort Study''s use of nested case-control studies was facilitated
-  by the establishment of a repository of biologic specimens. The answer is (B).
-
-
-  Q: Why are parvoviruses a highly impactful parasite?
-
-  (A) Because they have no nucleic acid (B) They require a helper virus (C) Only replicate
-  in dividing cells (D) Can integrate into host chromosomes
-
-  A: Let''s think step by step. We refer to Wikipedia articles on virology for help.
-  Paroviruses are highly impactful because they do not have nucleic acid. The answer
-  is (A).'
-include: _mmlu_flan_cot_fewshot_template_yaml
-task: mmlu_flan_cot_fewshot_virology
+"dataset_name": "virology"
+"description": "The following are multiple choice questions (with answers) about virology.\n\
+  \nQ: The median survival time to AIDS and death was established by following:\n\
+  (A) Seroprevalent HIV-infected individuals (B) Seronegatives (C) Seroconverters\
+  \ (D) High-risk seronegatives\nA: Let's think step by step. We refer to Wikipedia\
+  \ articles on virology for help. The median survival time to AIDS and death was\
+  \ established as a result of the development of seroconverters. The answer is (C).\n\
+  \nQ: Which of the following is a morphological characteristic of the paramyxoviruses.\n\
+  (A) Fragile viruses often visualised with RNA spewing from the inside (B) Elongate\
+  \ viruses (C) Icosahedral viruses with envelope (D) Very large viruses\nA: Let's\
+  \ think step by step. We refer to Wikipedia articles on virology for help. Paramyxoviruses\
+  \ are fragile viruses often visualised with RNA spewing from the inside. The answer\
+  \ is (A).\n\nQ: The most important goal of a behavioral intervention is:\n(A) Change\
+  \ in behavior (B) Comprehensive coverage (C) Effective use of behavioral theory\
+  \ (D) Sustained behavior change\nA: Let's think step by step. We refer to Wikipedia\
+  \ articles on virology for help. The prim goal of a behavioral intervention is to\
+  \ cause sustained behavior change. The answer is (D).\n\nQ: A key factor facilitating\
+  \ the application of nested case-control studies from the MACS was:\n(A) Data collection\
+  \ (B) Establishment of a repository of biologic specimens (C) Participant interest\
+  \ (D) Administration of the questionnaire by staff\nA: Let's think step by step.\
+  \ We refer to Wikipedia articles on virology for help. The Multicenter AIDS Cohort\
+  \ Study's use of nested case-control studies was facilitated by the establishment\
+  \ of a repository of biologic specimens. The answer is (B).\n\nQ: Why are parvoviruses\
+  \ a highly impactful parasite?\n(A) Because they have no nucleic acid (B) They require\
+  \ a helper virus (C) Only replicate in dividing cells (D) Can integrate into host\
+  \ chromosomes\nA: Let's think step by step. We refer to Wikipedia articles on virology\
+  \ for help. Paroviruses are highly impactful because they do not have nucleic acid.\
+  \ The answer is (A)."
+"group": "mmlu_flan_cot_fewshot_other"
+"include": "_mmlu_flan_cot_fewshot_template_yaml"
+"task": "mmlu_flan_cot_fewshot_virology"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_world_religions.yaml
View file @ ccca64f7
-dataset_name: world_religions
-description: 'The following are multiple choice questions (with answers) about world
-  religions.
-
-
-  Q: How can the Upanishads be characterized?
-
-  (A) Ritual texts (B) Philosophical texts (C) Hymns (D) Origin stories
-
-  A: Let''s think step by step. We refer to Wikipedia articles on world religions
-  for help. The Upanishads are the most recent part of Vedas (the oldest scriptures
-  in Hinduism) and supplied the basis of later Hindu philosophy. So they are philosophical
-  texts. The answer is (B).
-
-
-  Q: What is the Second Gem in Buddhism?
-
-  (A) The Dharma (B) The Sangha (C) The Buddha (D) The Bodhisattva
-
-  A: Let''s think step by step. We refer to Wikipedia articles on world religions
-  for help. The Second Gem in Buddhism is The Dharma. The answer is (A).
-
-
-  Q: Which Japanese government promoted a kind of national cult based on the emperor
-  and his associations with kami?
-
-  (A) Honen (B) Tanaka (C) Tokugawa (D) Meiji
-
-  A: Let''s think step by step. We refer to Wikipedia articles on world religions
-  for help. The promotion of a national cult based on the emperor and his associations
-  with Kami happened during the reign of Emperor Meiji (1852-1912). The answer is
-  (D).
-
-
-  Q: In which dynasty was the "Mandate of Heaven" developed to legitimatize the new
-  rulers?
-
-  (A) Shang (B) Zhou (C) Han (D) Xia
-
-  A: Let''s think step by step. We refer to Wikipedia articles on world religions
-  for help. The "Mandate of Heaven" was developed as an ancient Chinese philosophical
-  concept during the Zhou Dynasty (1046-256 BCE). The answer is (B).
-
-
-  Q: What is the sign of the covenant for Jewish males?
-
-  (A) The rainbow (B) Circumcision (C) A son (D) Bar mitzvah
-
-  A: Let''s think step by step. We refer to Wikipedia articles on world religions
-  for help. In Judaism, the most distinctive sign of the covenant is circumcision
-  (brit milah). The answer is (B).'
-include: _mmlu_flan_cot_fewshot_template_yaml
-task: mmlu_flan_cot_fewshot_world_religions
+"dataset_name": "world_religions"
+"description": "The following are multiple choice questions (with answers) about world\
+  \ religions.\n\nQ: How can the Upanishads be characterized?\n(A) Ritual texts (B)\
+  \ Philosophical texts (C) Hymns (D) Origin stories\nA: Let's think step by step.\
+  \ We refer to Wikipedia articles on world religions for help. The Upanishads are\
+  \ the most recent part of Vedas (the oldest scriptures in Hinduism) and supplied\
+  \ the basis of later Hindu philosophy. So they are philosophical texts. The answer\
+  \ is (B).\n\nQ: What is the Second Gem in Buddhism?\n(A) The Dharma (B) The Sangha\
+  \ (C) The Buddha (D) The Bodhisattva\nA: Let's think step by step. We refer to Wikipedia\
+  \ articles on world religions for help. The Second Gem in Buddhism is The Dharma.\
+  \ The answer is (A).\n\nQ: Which Japanese government promoted a kind of national\
+  \ cult based on the emperor and his associations with kami?\n(A) Honen (B) Tanaka\
+  \ (C) Tokugawa (D) Meiji\nA: Let's think step by step. We refer to Wikipedia articles\
+  \ on world religions for help. The promotion of a national cult based on the emperor\
+  \ and his associations with Kami happened during the reign of Emperor Meiji (1852-1912).\
+  \ The answer is (D).\n\nQ: In which dynasty was the \"Mandate of Heaven\" developed\
+  \ to legitimatize the new rulers?\n(A) Shang (B) Zhou (C) Han (D) Xia\nA: Let's\
+  \ think step by step. We refer to Wikipedia articles on world religions for help.\
+  \ The \"Mandate of Heaven\" was developed as an ancient Chinese philosophical concept\
+  \ during the Zhou Dynasty (1046-256 BCE). The answer is (B).\n\nQ: What is the sign\
+  \ of the covenant for Jewish males?\n(A) The rainbow (B) Circumcision (C) A son\
+  \ (D) Bar mitzvah\nA: Let's think step by step. We refer to Wikipedia articles on\
+  \ world religions for help. In Judaism, the most distinctive sign of the covenant\
+  \ is circumcision (brit milah). The answer is (B)."
+"group": "mmlu_flan_cot_fewshot_humanities"
+"include": "_mmlu_flan_cot_fewshot_template_yaml"
+"task": "mmlu_flan_cot_fewshot_world_religions"
lm_eval/tasks/mmlu/flan_cot_zeroshot/_mmlu.yaml
0 → 100644
+group: mmlu_flan_cot_zeroshot
+task:
+  - mmlu_flan_cot_zeroshot_stem
+  - mmlu_flan_cot_zeroshot_other
+  - mmlu_flan_cot_zeroshot_social_sciences
+  - mmlu_flan_cot_zeroshot_humanities
lm_eval/tasks/mmlu/flan_cot_zeroshot/_mmlu_flan_generative_template_yaml → lm_eval/tasks/mmlu/flan_cot_zeroshot/_mmlu_flan_cot_zeroshot_template_yaml
File moved
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_abstract_algebra.yaml
-dataset_name: abstract_algebra
-description: 'The following are multiple choice questions (with answers) about abstract
-  algebra.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_cot_zeroshot_abstract_algebra
+"dataset_name": "abstract_algebra"
+"description": "The following are multiple choice questions (with answers) about abstract\
+  \ algebra.\n\n"
+"group": "mmlu_flan_cot_zeroshot_stem"
+"include": "_mmlu_flan_cot_zeroshot_template_yaml"
+"task": "mmlu_flan_cot_zeroshot_abstract_algebra"
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_anatomy.yaml
-dataset_name: anatomy
-description: 'The following are multiple choice questions (with answers) about anatomy.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_cot_zeroshot_anatomy
+"dataset_name": "anatomy"
+"description": "The following are multiple choice questions (with answers) about anatomy.\n\
+  \n"
+"group": "mmlu_flan_cot_zeroshot_stem"
+"include": "_mmlu_flan_cot_zeroshot_template_yaml"
+"task": "mmlu_flan_cot_zeroshot_anatomy"
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_astronomy.yaml
-dataset_name: astronomy
-description: 'The following are multiple choice questions (with answers) about astronomy.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_cot_zeroshot_astronomy
+"dataset_name": "astronomy"
+"description": "The following are multiple choice questions (with answers) about astronomy.\n\
+  \n"
+"group": "mmlu_flan_cot_zeroshot_stem"
+"include": "_mmlu_flan_cot_zeroshot_template_yaml"
+"task": "mmlu_flan_cot_zeroshot_astronomy"
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_business_ethics.yaml
-dataset_name: business_ethics
-description: 'The following are multiple choice questions (with answers) about business
-  ethics.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_cot_zeroshot_business_ethics
+"dataset_name": "business_ethics"
+"description": "The following are multiple choice questions (with answers) about business\
+  \ ethics.\n\n"
+"group": "mmlu_flan_cot_zeroshot_other"
+"include": "_mmlu_flan_cot_zeroshot_template_yaml"
+"task": "mmlu_flan_cot_zeroshot_business_ethics"
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_clinical_knowledge.yaml
-dataset_name: clinical_knowledge
-description: 'The following are multiple choice questions (with answers) about clinical
-  knowledge.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_cot_zeroshot_clinical_knowledge
+"dataset_name": "clinical_knowledge"
+"description": "The following are multiple choice questions (with answers) about clinical\
+  \ knowledge.\n\n"
+"group": "mmlu_flan_cot_zeroshot_other"
+"include": "_mmlu_flan_cot_zeroshot_template_yaml"
+"task": "mmlu_flan_cot_zeroshot_clinical_knowledge"
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_biology.yaml
-dataset_name: college_biology
-description: 'The following are multiple choice questions (with answers) about college
-  biology.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_cot_zeroshot_college_biology
+"dataset_name": "college_biology"
+"description": "The following are multiple choice questions (with answers) about college\
+  \ biology.\n\n"
+"group": "mmlu_flan_cot_zeroshot_stem"
+"include": "_mmlu_flan_cot_zeroshot_template_yaml"
+"task": "mmlu_flan_cot_zeroshot_college_biology"
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_chemistry.yaml
-dataset_name: college_chemistry
-description: 'The following are multiple choice questions (with answers) about college
-  chemistry.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_cot_zeroshot_college_chemistry
+"dataset_name": "college_chemistry"
+"description": "The following are multiple choice questions (with answers) about college\
+  \ chemistry.\n\n"
+"group": "mmlu_flan_cot_zeroshot_stem"
+"include": "_mmlu_flan_cot_zeroshot_template_yaml"
+"task": "mmlu_flan_cot_zeroshot_college_chemistry"
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_computer_science.yaml
-dataset_name: college_computer_science
-description: 'The following are multiple choice questions (with answers) about college
-  computer science.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_cot_zeroshot_college_computer_science
+"dataset_name": "college_computer_science"
+"description": "The following are multiple choice questions (with answers) about college\
+  \ computer science.\n\n"
+"group": "mmlu_flan_cot_zeroshot_stem"
+"include": "_mmlu_flan_cot_zeroshot_template_yaml"
+"task": "mmlu_flan_cot_zeroshot_college_computer_science"
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_mathematics.yaml
-dataset_name: college_mathematics
-description: 'The following are multiple choice questions (with answers) about college
-  mathematics.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_cot_zeroshot_college_mathematics
+"dataset_name": "college_mathematics"
+"description": "The following are multiple choice questions (with answers) about college\
+  \ mathematics.\n\n"
+"group": "mmlu_flan_cot_zeroshot_stem"
+"include": "_mmlu_flan_cot_zeroshot_template_yaml"
+"task": "mmlu_flan_cot_zeroshot_college_mathematics"
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_medicine.yaml
-dataset_name: college_medicine
-description: 'The following are multiple choice questions (with answers) about college
-  medicine.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_cot_zeroshot_college_medicine
+"dataset_name": "college_medicine"
+"description": "The following are multiple choice questions (with answers) about college\
+  \ medicine.\n\n"
+"group": "mmlu_flan_cot_zeroshot_other"
+"include": "_mmlu_flan_cot_zeroshot_template_yaml"
+"task": "mmlu_flan_cot_zeroshot_college_medicine"
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_physics.yaml
-dataset_name: college_physics
-description: 'The following are multiple choice questions (with answers) about college
-  physics.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_cot_zeroshot_college_physics
+"dataset_name": "college_physics"
+"description": "The following are multiple choice questions (with answers) about college\
+  \ physics.\n\n"
+"group": "mmlu_flan_cot_zeroshot_stem"
+"include": "_mmlu_flan_cot_zeroshot_template_yaml"
+"task": "mmlu_flan_cot_zeroshot_college_physics"
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_computer_security.yaml
-dataset_name: computer_security
-description: 'The following are multiple choice questions (with answers) about computer
-  security.
-
-
-  '
-include: _mmlu_flan_generative_template_yaml
-task: mmlu_flan_cot_zeroshot_computer_security
+"dataset_name": "computer_security"
+"description": "The following are multiple choice questions (with answers) about computer\
+  \ security.\n\n"
+"group": "mmlu_flan_cot_zeroshot_stem"
+"include": "_mmlu_flan_cot_zeroshot_template_yaml"
+"task": "mmlu_flan_cot_zeroshot_computer_security"