gaoqiong / lm-evaluation-harness
Commit 574e565a (Unverified)
Authored Nov 10, 2023 by Lintang Sutawika
Committed by GitHub, Nov 10, 2023
Merge branch 'big-refactor' into verbosity-rework
Parents: 73f3029c, b7a4ea06
Changes: 498
Showing 20 changed files with 867 additions and 1208 deletions (+867 -1208)
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_statistics.yaml  +58 -88
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_us_history.yaml  +126 -125
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_world_history.yaml  +74 -74
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_human_aging.yaml  +28 -48
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_human_sexuality.yaml  +34 -61
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_international_law.yaml  +54 -80
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_jurisprudence.yaml  +45 -69
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_logical_fallacies.yaml  +45 -71
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_machine_learning.yaml  +46 -45
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_management.yaml  +33 -54
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_marketing.yaml  +40 -66
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_medical_genetics.yaml  +37 -61
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_miscellaneous.yaml  +27 -54
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_moral_disputes.yaml  +49 -72
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_moral_scenarios.yaml  +44 -66
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_nutrition.yaml  +48 -72
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_philosophy.yaml  +25 -25
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_prehistory.yaml  +42 -67
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_accounting.yaml  +7 -6
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_law.yaml  +5 -4
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_statistics.yaml
-dataset_name: high_school_statistics
-description: 'The following are multiple choice questions (with answers) about high
-  school statistics.
-
-
-  Q: A new smartwatch is manufactured in one part of a factory, then secured for shipping
-  in another, independent part of the factory. The weight of the smartwatch has a
-  mean of 62 grams and a standard deviation of 1.0 grams. The weight of the packaging
-  (box, user''s guide, bubble wrap, etc.) has a mean of 456 grams and a standard deviation
-  of 6 grams. Together, the distribution of the weight of the smartwatch and its packaging
-  would have the following mean and standard deviation:
-
-  (A) Mean 518 grams; standard deviation 7.0 grams (B) Mean 518 grams; standard deviation
-  3.5 grams (C) Mean 518 grams; standard deviation 6.1 grams (D) Mean 394 grams; standard
-  deviation 6.1 grams
-
-  A: Let''s think step by step. Since the weight of the watch and the weight of the
-  packaging are independent random variables, the mean and variance of their sum is
-  equal to the sum of their individual means and variances. So the mean is 62 + 456
-  = 518 grams, and the variances is 1.0^2 + 6.0^2 = 37, leading to a standard deviation
-  of 6.1 grams. The answer is (C).
-
-
-  Q: After a frost warning was issued, the owner of a large orange grove asked his
-  workers to spray all his trees with water. The water was supposed to freeze and
-  form a protective covering of ice around the orange blossom. Nevertheless, the owner
-  suspected that some trees suffered considerable damage due to the frost. To estimate
-  the proportion of trees that suffered more than 50 percent damage due to the frost,
-  he took a random sample of 100 trees from his grove. What is the response variable
-  in this experiment?
-
-  (A) The proportion of trees that suffered more than 50 percent damage due to frost.
-  (B) The number of trees affected by the frost. (C) The number of trees sampled from
-  the grove. (D) For each sampled tree, whether it suffered more than 50 percent damage
-  or at most 50 percent damage.
-
-  A: Let''s think step by step. In this experiment, the response variable is what
-  is measured. For each tree, what is measured is whether or not it suffered more
-  than 50 percent damage due to the frost. The answer is (D).
-
-
-  Q: Suppose X and Y are random variables with E(X) = 37, var(X) = 5, E(Y) = 62, and
-  var(Y) = 12. What are the expected value and variance of the random variable X +
-  Y?
-
-  (A) E(X + Y) = 99, var(X + Y) = 8.5 (B) E(X + Y) = 99, var(X + Y) = 13 (C) E(X +
-  Y) = 99, var(X + Y) = 17 (D) There is insufficient information to answer this question.
-
-  A: Let''s think step by step. While means of sums of random variables add (regardless
-  of whether the variables are independent) in order to determine the variance of
-  a sum of random variables, we need to know not just their individual variances but
-  the covariance of the two variables, which is not given in this problem. The answer
-  is (D).
-
-
-  Q: Which of the following sets has the smallest standard deviation? Which has the
-  largest?
-
-  I: {1,2,3}
-
-  II: {-10,10}
-
-  III: {100}
-
-  (A) I, II (B) II, III (C) III, I (D) III, II
-
-  A: Let''s think step by step. The variance of distribution I is the expected squared
-  deviation from its mean (which is 2), so the variance is 2/3 . The variance of distribution
-  II is 10^2 (because both elements are 10 away from the mean of zero). The variance
-  of distribution III is 0, since it has a single entry. So distribution III has the
-  smallest standard deviation and distribution II has the largest. The answer is (D).
-
-
-  Q: Which of the following is a correct statement about correlation?
-
-  (A) If the slope of the regression line is exactly 1, then the correlation is exactly
-  1. (B) If the correlation is 0, then the slope of the regression line is undefined.
-  (C) Switching which variable is called x and which is called y changes the sign
-  of the correlation. (D) The correlation r is equal to the slope of the regression
-  line when z-scores for the y-variable are plotted against z-scores for the x-variable.
-
-  A: Let''s think step by step. Statement A is false because the slope of the regression
-  line being exactly 1 can occur even when the two variables are not perfectly correlated.
-  Statement B is false because uncorrelated variables regression lines can have slope
-  zero. Statement C is false because correlation is symmetric in the two random variables.
-  The answer is (D).'
-include: _mmlu_flan_cot_fewshot_template_yaml
-task: mmlu_flan_cot_fewshot_high_school_statistics
+"dataset_name": "high_school_statistics"
+"description": "The following are multiple choice questions (with answers) about high\
+  \ school statistics.\n\nQ: A new smartwatch is manufactured in one part of a factory,\
+  \ then secured for shipping in another, independent part of the factory. The weight\
+  \ of the smartwatch has a mean of 62 grams and a standard deviation of 1.0 grams.\
+  \ The weight of the packaging (box, user's guide, bubble wrap, etc.) has a mean\
+  \ of 456 grams and a standard deviation of 6 grams. Together, the distribution of\
+  \ the weight of the smartwatch and its packaging would have the following mean and\
+  \ standard deviation:\n(A) Mean 518 grams; standard deviation 7.0 grams (B) Mean\
+  \ 518 grams; standard deviation 3.5 grams (C) Mean 518 grams; standard deviation\
+  \ 6.1 grams (D) Mean 394 grams; standard deviation 6.1 grams\nA: Let's think step\
+  \ by step. Since the weight of the watch and the weight of the packaging are independent\
+  \ random variables, the mean and variance of their sum is equal to the sum of their\
+  \ individual means and variances. So the mean is 62 + 456 = 518 grams, and the variances\
+  \ is 1.0^2 + 6.0^2 = 37, leading to a standard deviation of 6.1 grams. The answer\
+  \ is (C).\n\nQ: After a frost warning was issued, the owner of a large orange grove\
+  \ asked his workers to spray all his trees with water. The water was supposed to\
+  \ freeze and form a protective covering of ice around the orange blossom. Nevertheless,\
+  \ the owner suspected that some trees suffered considerable damage due to the frost.\
+  \ To estimate the proportion of trees that suffered more than 50 percent damage\
+  \ due to the frost, he took a random sample of 100 trees from his grove. What is\
+  \ the response variable in this experiment?\n(A) The proportion of trees that suffered\
+  \ more than 50 percent damage due to frost. (B) The number of trees affected by\
+  \ the frost. (C) The number of trees sampled from the grove. (D) For each sampled\
+  \ tree, whether it suffered more than 50 percent damage or at most 50 percent damage.\n\
+  A: Let's think step by step. In this experiment, the response variable is what is\
+  \ measured. For each tree, what is measured is whether or not it suffered more than\
+  \ 50 percent damage due to the frost. The answer is (D).\n\nQ: Suppose X and Y are\
+  \ random variables with E(X) = 37, var(X) = 5, E(Y) = 62, and var(Y) = 12. What\
+  \ are the expected value and variance of the random variable X + Y?\n(A) E(X + Y)\
+  \ = 99, var(X + Y) = 8.5 (B) E(X + Y) = 99, var(X + Y) = 13 (C) E(X + Y) = 99, var(X\
+  \ + Y) = 17 (D) There is insufficient information to answer this question.\nA: Let's\
+  \ think step by step. While means of sums of random variables add (regardless of\
+  \ whether the variables are independent) in order to determine the variance of a\
+  \ sum of random variables, we need to know not just their individual variances but\
+  \ the covariance of the two variables, which is not given in this problem. The answer\
+  \ is (D).\n\nQ: Which of the following sets has the smallest standard deviation?\
+  \ Which has the largest?\nI: {1,2,3}\nII: {-10,10}\nIII: {100}\n(A) I, II (B) II,\
+  \ III (C) III, I (D) III, II\nA: Let's think step by step. The variance of distribution\
+  \ I is the expected squared deviation from its mean (which is 2), so the variance\
+  \ is 2/3 . The variance of distribution II is 10^2 (because both elements are 10\
+  \ away from the mean of zero). The variance of distribution III is 0, since it has\
+  \ a single entry. So distribution III has the smallest standard deviation and distribution\
+  \ II has the largest. The answer is (D).\n\nQ: Which of the following is a correct\
+  \ statement about correlation?\n(A) If the slope of the regression line is exactly\
+  \ 1, then the correlation is exactly 1. (B) If the correlation is 0, then the slope\
+  \ of the regression line is undefined. (C) Switching which variable is called x\
+  \ and which is called y changes the sign of the correlation. (D) The correlation\
+  \ r is equal to the slope of the regression line when z-scores for the y-variable\
+  \ are plotted against z-scores for the x-variable.\nA: Let's think step by step.\
+  \ Statement A is false because the slope of the regression line being exactly 1\
+  \ can occur even when the two variables are not perfectly correlated. Statement\
+  \ B is false because uncorrelated variables regression lines can have slope zero.\
+  \ Statement C is false because correlation is symmetric in the two random variables.\
+  \ The answer is (D)."
+"group": "mmlu_flan_cot_fewshot_stem"
+"include": "_mmlu_flan_cot_fewshot_template_yaml"
+"task": "mmlu_flan_cot_fewshot_high_school_statistics"
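The few-shot answers in the description above rest on two small calculations: the mean and standard deviation of a sum of independent weights, and the population variance of three small sets. The sketch below re-derives both with the Python standard library only. It is an illustrative check, not part of the repository; every variable name in it is an assumption made for this example.

```python
import math
from statistics import pvariance

# Smartwatch question: for independent weights, means add and variances add.
watch_mean, watch_sd = 62.0, 1.0
packaging_mean, packaging_sd = 456.0, 6.0
total_mean = watch_mean + packaging_mean
total_sd = math.sqrt(watch_sd**2 + packaging_sd**2)

# Standard-deviation question: population variance of each listed set.
sets = {"I": [1, 2, 3], "II": [-10, 10], "III": [100]}
variances = {name: pvariance(values) for name, values in sets.items()}
smallest = min(variances, key=variances.get)
largest = max(variances, key=variances.get)

print(f"mean={total_mean:.0f} g, sd={total_sd:.1f} g")
print(f"smallest={smallest}, largest={largest}")
```

Under these assumptions the script prints a mean of 518 g with a standard deviation of 6.1 g, and names set III as smallest and set II as largest, which agrees with the quoted answers (C) and (D).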
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_us_history.yaml
-dataset_name: high_school_us_history
-description: "The following are multiple choice questions (with answers) about high\
+"dataset_name": "high_school_us_history"
+"description": "The following are multiple choice questions (with answers) about high\
   \ school us history.\n\nQ: This question refers to the following information.\n\
   I come not to urge personal claims, nor to seek individual benefits; I appear as\
   \ the advocate of those who cannot plead their own cause; I come as the friend of\
...
@@ -8,126 +8,127 @@ description: "The following are multiple choice questions (with answers) about h
   \ jails penetrate not your Halls of Legislation. I am the Hope of the poor crazed\
   \ beings who pine in the cells, and stalls, and cages, and waste rooms of your poor-houses.\
   \ I am the Revelation of hundreds of wailing, suffering creatures, hidden in your\
-  \ private dwellings, and in pens and cabins\u2014shut out, cut off from all healing\
-  \ influences, from all mind-restoring cares.\u2026 Could their melancholy histories\
-  \ be spread before you as revealed to my grieved spirit during the last three months,\
-  \ how promptly, how earnestly would you search out the most approved means of relief;\
-  \ how trifling, how insignificant, by comparison, would appear the sacrifices you\
-  \ are asked to make; how would a few dimes and dollars, gathered from each citizen,\
-  \ diminish in value as a possession, compared with the certain benefits and vast\
-  \ good to be secured for the suffering insane...by the consecration and application\
-  \ of a sufficient fund to the construction of a suitable hospital.\u2026\n\u2014\
-  Dorothea Dix, Memorial Soliciting a State Hospital for the Protection and Cure of\
-  \ the Insane,\nSubmitted to the General Assembly of North Carolina, November 1848\n\
-  Dorothea Dix can best be compared to whom?\n(A) Abigail Adams (B) Clara Barton (C)\
-  \ Shirley Temple (D) Hillary Clinton\nA: Let's think step by step. We refer to Wikipedia\
-  \ articles on us history for help. Both Dorothea Dix and Clara barton are American\
-  \ nurses. The answer is (B).\n\nQ: This question refers to the following information.\n\
-  \"As our late Conduct at the Conestoga Manor and Lancaster have occasioned much\
-  \ Speculation & a great diversity of Sentiments in this and neighboring Governments;\
-  \ some vindicating & others condemning it; some charitably alleviating the Crime,\
-  \ & others maliciously painting it in the most odious & detestable Colours, we think\
-  \ it our duty to lay before the Publick, the whole Matter as it appeared, & still\
-  \ appears, to us. . . .\n\"If these things are not sufficient to prove an unjustifiable\
-  \ Attachment in the Quakers to the Indians Savages, a fixed Resolution to befriend\
-  \ them & an utter insensibility to human Distresses, let us consider a few more\
-  \ recent Facts. When we found the last Summer that we were likely to get no Assistance\
-  \ from the Government, some Volunteers went out at our own Expense, determined to\
-  \ drive our Enemies from our Borders; & when we came near to the great Island, we\
-  \ understood that a Number of their Warriors had gone out against our Frontiers.\
-  \ Upon this we returned and came up with them and fought with them at the Munfey\
-  \ Hill where we lost some of our Men & killed some of their Warriors & thereby saved\
-  \ our Frontiers from this Story in another Expedition. But no sooner had we destroyed\
-  \ their Provisions on the great Island, & ruined their trade with the good People\
-  \ at Bethlehem, but these very Indians, who were justly suspected of having murdered\
-  \ our Friends in Northampton County, were by the Influence of some Quakers taken\
-  \ under the Protection of the Government to screen them from the Resentments of\
-  \ the Friends and Relations of the Murdered, & to support them thro the Winter.\"\
-  \n\u2014\"Apology of the Paxton Boys\" (pamphlet), 1764 (Note: \"apology\" in this\
-  \ context should be read as an explanation, not an admission of guilt or regret.\n\
-  The sentiments expressed in the explanation above reflect which of the ongoing tensions\
-  \ during the colonial period of American history?\n(A) Tensions between British\
-  \ policies and the aspirations of North American colonists. (B) Tensions between\
-  \ American Indians allied with the French and those allied with the British. (C)\
-  \ Tensions between freed African Americans and white planters. (D) Tensions between\
-  \ backcountry settlers and elites within colonial America.\nA: Let's think step\
-  \ by step. We refer to Wikipedia articles on us history for help. After the French\
-  \ and Indian War, the Scotch-Irish settlers attacked American Indians. After the\
-  \ attacks on the Conestoga, about 250 Paxton Boys present their grievances to the\
-  \ Pennsylvania legislature. As mentioned in the information, the Paxton Boys cited\
-  \ resentiment at local elites. The answer is (D).\n\nQ: This question refers to\
-  \ the following information.\nOur leaders talk about stopping aggression from the\
-  \ north, but this was a struggle among groups of Vietnamese until we intervened.\
-  \ We seem bent upon saving the Vietnamese from Ho Chi Minh even if we have to kill\
-  \ them and demolish their country to do it. As the native people survey bombed-out\
-  \ villages, women and children burned by napalm, rice crops destroyed and cities\
-  \ overrun with our military personnel, they are doubtless saying secretly of the\
-  \ Vietcong guerillas and of the American forces, \"A plague on both your houses.\"\
-  \ \u2026 Stop the bombing, north and south, end search and destroy offensive sweeps,\
-  \ and confine our military action to holding operations on the ground. Bombing the\
-  \ north has failed to halt or seriously check the flow of troops to the south and\
-  \ may, in fact, have prompted a much greater war effort by Hanoi.\n\u2014Senator\
-  \ George McGovern, \"The Lessons of Vietnam,\" April 25, 1967\nWhich of the following\
-  \ opinions from the 1960s most directly reflects the perspective of George McGovern's\
-  \ speech?\n(A) Americans must maximize their technological edge in Vietnam. (B)\
-  \ American bombing in Vietnam is step by step leading to progress in the war. (C)\
-  \ American bombing in Vietnam is a failure. (D) America must not give in to defeatism\
-  \ about the war in Vietnam.\nA: Let's think step by step. We refer to Wikipedia\
-  \ articles on us history for help. \"Stop the bombing\" and \"Bombing the north\
-  \ has failed to halt or seriously check the flow of troops to the south\" indicate\
-  \ that the perspective of George McGovern's speech is that Amerian bombing in Vietnam\
-  \ is a failure. The answer is (C).\n\nQ: This question refers to the following information.\n\
-  \"In the new Code of Laws which I suppose it will be necessary for you to make I\
-  \ desire you would Remember the Ladies, and be more generous and favorable to them\
-  \ than your ancestors. Do not put such unlimited power into the hands of the Husbands.\
-  \ Remember all Men would be tyrants if they could. If particular care and attention\
-  \ is not paid to the Ladies we are determined to foment a Rebellion, and will not\
-  \ hold ourselves bound by any Laws in which we have no voice, or Representation.\"\
-  \nAbigail Adams, in a letter to John Adams, 1776\n\"Special legislation for woman\
-  \ has placed us in a most anomalous position. Women invested with the rights of\
-  \ citizens in one section\u2014voters, jurors, office-holders\u2014crossing an imaginary\
-  \ line, are subjects in the next. In some States, a married woman may hold property\
-  \ and transact business in her own name; in others, her earnings belong to her husband.\
-  \ In some States, a woman may testify against her husband, sue and be sued in the\
-  \ courts; in others, she has no redress in case of damage to person, property, or\
-  \ character. In case of divorce on account of adultery in the husband, the innocent\
-  \ wife is held to possess no right to children or property, unless by special decree\
-  \ of the court. But in no State of the Union has the wife the right to her own person,\
-  \ or to any part of the joint earnings of the co-partnership during the life of\
-  \ her husband. In some States women may enter the law schools and practice in the\
-  \ courts; in others they are forbidden. In some universities girls enjoy equal educational\
-  \ advantages with boys, while many of the proudest institutions in the land deny\
-  \ them admittance, though the sons of China, Japan and Africa are welcomed there.\
-  \ But the privileges already granted in the several States are by no means secure.\"\
-  \nSusan B. Anthony, \"Declaration of Rights for Women,\" July 4, 1876\nThe sentiments\
-  \ expressed in the second excerpt by Susan B. Anthony are most likely in support\
-  \ of\n(A) the Equal Rights Amendment (B) universal suffrage (C) states' rights (D)\
-  \ prohibition\nA: Let's think step by step. We refer to Wikipedia articles on us\
-  \ history for help. The above information mentioned that women are in an anomalous\
-  \ position in terms of legislation. Women's earnings do not belong to themselves,\
-  \ or they cannot testify against her husbands. Susan believes women should have\
-  \ equal legal rights as men. The answer is (B).\n\nQ: This question refers to the\
-  \ following information.\n\"Society in every state is a blessing, but government\
-  \ even in its best state is but a necessary evil; in its worst state an intolerable\
-  \ one; for when we suffer, or are exposed to the same miseries by a government,\
-  \ which we might expect in a country without government, our calamity is heightened\
-  \ by reflecting that we furnish the means by which we suffer. Government, like dress,\
-  \ is the badge of lost innocence; the palaces of kings are built on the ruins of\
-  \ the bowers of paradise. For were the impulses of conscience clear, uniform, and\
-  \ irresistibly obeyed, man would need no other lawgiver; but that not being the\
-  \ case, he finds it necessary to surrender up a part of his property to furnish\
-  \ means for the protection of the rest; and this he is induced to do by the same\
-  \ prudence which in every other case advises him out of two evils to choose the\
-  \ least. Wherefore, security being the true design and end of government, it unanswerably\
-  \ follows that whatever form thereof appears most likely to ensure it to us, with\
-  \ the least expense and greatest benefit, is preferable to all others.\"\nThomas\
-  \ Paine, Common Sense, 1776\nWhich of the following \"miseries\" alluded to above\
-  \ were most condemned by Anti-Federalists of the post-Revolutionary era?\n(A) Organized\
+  \ private dwellings, and in pens and cabins—shut out, cut off from all healing influences,\
+  \ from all mind-restoring cares.… Could their melancholy histories be spread before\
+  \ you as revealed to my grieved spirit during the last three months, how promptly,\
+  \ how earnestly would you search out the most approved means of relief; how trifling,\
+  \ how insignificant, by comparison, would appear the sacrifices you are asked to\
+  \ make; how would a few dimes and dollars, gathered from each citizen, diminish\
+  \ in value as a possession, compared with the certain benefits and vast good to\
+  \ be secured for the suffering insane...by the consecration and application of a\
+  \ sufficient fund to the construction of a suitable hospital.…\n—Dorothea Dix, Memorial\
+  \ Soliciting a State Hospital for the Protection and Cure of the Insane,\nSubmitted\
+  \ to the General Assembly of North Carolina, November 1848\nDorothea Dix can best\
+  \ be compared to whom?\n(A) Abigail Adams (B) Clara Barton (C) Shirley Temple (D)\
+  \ Hillary Clinton\nA: Let's think step by step. We refer to Wikipedia articles on\
+  \ us history for help. Both Dorothea Dix and Clara barton are American nurses. The\
+  \ answer is (B).\n\nQ: This question refers to the following information.\n\"As\
+  \ our late Conduct at the Conestoga Manor and Lancaster have occasioned much Speculation\
+  \ & a great diversity of Sentiments in this and neighboring Governments; some vindicating\
+  \ & others condemning it; some charitably alleviating the Crime, & others maliciously\
+  \ painting it in the most odious & detestable Colours, we think it our duty to lay\
+  \ before the Publick, the whole Matter as it appeared, & still appears, to us. .\
+  \ . .\n\"If these things are not sufficient to prove an unjustifiable Attachment\
+  \ in the Quakers to the Indians Savages, a fixed Resolution to befriend them & an\
+  \ utter insensibility to human Distresses, let us consider a few more recent Facts.\
+  \ When we found the last Summer that we were likely to get no Assistance from the\
+  \ Government, some Volunteers went out at our own Expense, determined to drive our\
+  \ Enemies from our Borders; & when we came near to the great Island, we understood\
+  \ that a Number of their Warriors had gone out against our Frontiers. Upon this\
+  \ we returned and came up with them and fought with them at the Munfey Hill where\
+  \ we lost some of our Men & killed some of their Warriors & thereby saved our Frontiers\
+  \ from this Story in another Expedition. But no sooner had we destroyed their Provisions\
+  \ on the great Island, & ruined their trade with the good People at Bethlehem, but\
+  \ these very Indians, who were justly suspected of having murdered our Friends in\
+  \ Northampton County, were by the Influence of some Quakers taken under the Protection\
+  \ of the Government to screen them from the Resentments of the Friends and Relations\
+  \ of the Murdered, & to support them thro the Winter.\"\n—\"Apology of the Paxton\
+  \ Boys\" (pamphlet), 1764 (Note: \"apology\" in this context should be read as an\
+  \ explanation, not an admission of guilt or regret.\nThe sentiments expressed in\
+  \ the explanation above reflect which of the ongoing tensions during the colonial\
+  \ period of American history?\n(A) Tensions between British policies and the aspirations\
+  \ of North American colonists. (B) Tensions between American Indians allied with\
+  \ the French and those allied with the British. (C) Tensions between freed African\
+  \ Americans and white planters. (D) Tensions between backcountry settlers and elites\
+  \ within colonial America.\nA: Let's think step by step. We refer to Wikipedia articles\
+  \ on us history for help. After the French and Indian War, the Scotch-Irish settlers\
+  \ attacked American Indians. After the attacks on the Conestoga, about 250 Paxton\
+  \ Boys present their grievances to the Pennsylvania legislature. As mentioned in\
+  \ the information, the Paxton Boys cited resentiment at local elites. The answer\
+  \ is (D).\n\nQ: This question refers to the following information.\nOur leaders\
+  \ talk about stopping aggression from the north, but this was a struggle among groups\
+  \ of Vietnamese until we intervened. We seem bent upon saving the Vietnamese from\
+  \ Ho Chi Minh even if we have to kill them and demolish their country to do it.\
+  \ As the native people survey bombed-out villages, women and children burned by\
+  \ napalm, rice crops destroyed and cities overrun with our military personnel, they\
+  \ are doubtless saying secretly of the Vietcong guerillas and of the American forces,\
+  \ \"A plague on both your houses.\" … Stop the bombing, north and south, end search\
+  \ and destroy offensive sweeps, and confine our military action to holding operations\
+  \ on the ground. Bombing the north has failed to halt or seriously check the flow\
+  \ of troops to the south and may, in fact, have prompted a much greater war effort\
+  \ by Hanoi.\n—Senator George McGovern, \"The Lessons of Vietnam,\" April 25, 1967\n\
+  Which of the following opinions from the 1960s most directly reflects the perspective\
+  \ of George McGovern's speech?\n(A) Americans must maximize their technological\
+  \ edge in Vietnam. (B) American bombing in Vietnam is step by step leading to progress\
+  \ in the war. (C) American bombing in Vietnam is a failure. (D) America must not\
+  \ give in to defeatism about the war in Vietnam.\nA: Let's think step by step. We\
+  \ refer to Wikipedia articles on us history for help. \"Stop the bombing\" and \"\
+  Bombing the north has failed to halt or seriously check the flow of troops to the\
+  \ south\" indicate that the perspective of George McGovern's speech is that Amerian\
+  \ bombing in Vietnam is a failure. The answer is (C).\n\nQ: This question refers\
+  \ to the following information.\n\"In the new Code of Laws which I suppose it will\
+  \ be necessary for you to make I desire you would Remember the Ladies, and be more\
+  \ generous and favorable to them than your ancestors. Do not put such unlimited\
+  \ power into the hands of the Husbands. Remember all Men would be tyrants if they\
+  \ could. If particular care and attention is not paid to the Ladies we are determined\
+  \ to foment a Rebellion, and will not hold ourselves bound by any Laws in which\
+  \ we have no voice, or Representation.\"\nAbigail Adams, in a letter to John Adams,\
+  \ 1776\n\"Special legislation for woman has placed us in a most anomalous position.\
+  \ Women invested with the rights of citizens in one section—voters, jurors, office-holders—crossing\
+  \ an imaginary line, are subjects in the next. In some States, a married woman may\
+  \ hold property and transact business in her own name; in others, her earnings belong\
+  \ to her husband. In some States, a woman may testify against her husband, sue and\
+  \ be sued in the courts; in others, she has no redress in case of damage to person,\
+  \ property, or character. In case of divorce on account of adultery in the husband,\
+  \ the innocent wife is held to possess no right to children or property, unless\
+  \ by special decree of the court. But in no State of the Union has the wife the\
+  \ right to her own person, or to any part of the joint earnings of the co-partnership\
+  \ during the life of her husband. In some States women may enter the law schools\
+  \ and practice in the courts; in others they are forbidden. In some universities\
+  \ girls enjoy equal educational advantages with boys, while many of the proudest\
+  \ institutions in the land deny them admittance, though the sons of China, Japan\
+  \ and Africa are welcomed there. But the privileges already granted in the several\
+  \ States are by no means secure.\"\nSusan B. Anthony, \"Declaration of Rights for\
+  \ Women,\" July 4, 1876\nThe sentiments expressed in the second excerpt by Susan\
+  \ B. Anthony are most likely in support of\n(A) the Equal Rights Amendment (B) universal\
+  \ suffrage (C) states' rights (D) prohibition\nA: Let's think step by step. We refer\
+  \ to Wikipedia articles on us history for help. The above information mentioned\
+  \ that women are in an anomalous position in terms of legislation. Women's earnings\
+  \ do not belong to themselves, or they cannot testify against her husbands. Susan\
+  \ believes women should have equal legal rights as men. The answer is (B).\n\nQ:\
+  \ This question refers to the following information.\n\"Society in every state is\
+  \ a blessing, but government even in its best state is but a necessary evil; in\
+  \ its worst state an intolerable one; for when we suffer, or are exposed to the\
+  \ same miseries by a government, which we might expect in a country without government,\
+  \ our calamity is heightened by reflecting that we furnish the means by which we\
+  \ suffer. Government, like dress, is the badge of lost innocence; the palaces of\
+  \ kings are built on the ruins of the bowers of paradise. For were the impulses\
+  \ of conscience clear, uniform, and irresistibly obeyed, man would need no other\
+  \ lawgiver; but that not being the case, he finds it necessary to surrender up a\
+  \ part of his property to furnish means for the protection of the rest; and this\
+  \ he is induced to do by the same prudence which in every other case advises him\
+  \ out of two evils to choose the least. Wherefore, security being the true design\
+  \ and end of government, it unanswerably follows that whatever form thereof appears\
+  \ most likely to ensure it to us, with the least expense and greatest benefit, is\
+  \ preferable to all others.\"\nThomas Paine, Common Sense, 1776\nWhich of the following\
+  \ \"miseries\" alluded to above were most condemned by Anti-Federalists of the post-Revolutionary\
+  \ era?\n(A) Organized response to Bacon's Rebellion (B) Federal response to Shays's\
\
response
to
Bacon's
Rebellion
(B)
Federal
response
to
Shays's
Rebellion
(C)
Federal
\
\
Rebellion
(C)
Federal
response
to
the
Whiskey
Rebellion
(D)
Federal
response
to
\
\
response
to
the
Whiskey
Rebellion
(D)
Federal
response
to
Pontiac's
Rebellion
\n\
\
Pontiac's
Rebellion
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
us
history
for
help.
\
\
on
us
history
for
help.
Anti-Federalists
do
not
believe
centralized
government
\
\
Anti-Federalists
do
not
believe
centralized
government
power,
and
suspect
Washington's
\
\
power,
and
suspect
Washington's
military
response
to
Whiskey
Rebellion.
Bacon's
\
\
military
response
to
Whiskey
Rebellion.
Bacon's
Rebellion
and
Pontiac's
Rebellion
\
\
Rebellion
and
Pontiac's
Rebellion
happen
before
the
Revolution
and
they
can
be
\
\
happen
before
the
Revolution
and
they
can
be
ruled
out.
The
answer
is
(C)."
\
ruled
out.
The
answer
is
(C)."
include: _mmlu_flan_cot_fewshot_template_yaml
"group": "mmlu_flan_cot_fewshot_humanities"
task: mmlu_flan_cot_fewshot_high_school_us_history
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_us_history"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_world_history.yaml
dataset_name: high_school_world_history
"dataset_name": "high_school_world_history"
description
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
"
description
"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
world
history.
\n\n
Q:
This
question
refers
to
the
following
information.
\n\
\
school
world
history.
\n\n
Q:
This
question
refers
to
the
following
information.
\n\
\"
At
least
one
of
the
[world's]
societies
would
have
to
somehow
enormously
increase
\
\"
At
least
one
of
the
[world's]
societies
would
have
to
somehow
enormously
increase
\
\
its
productivity
[in
order
to
achieve
global
hegemony].
That
quantum
jump
would
\
\
its
productivity
[in
order
to
achieve
global
hegemony].
That
quantum
jump
would
\
...
@@ -8,75 +8,75 @@ description: "The following are multiple choice questions (with answers) about h
\
be
accomplished
by
exploiting
the
ecosystems,
mineral
resources,
and
human
assets
\
\
be
accomplished
by
exploiting
the
ecosystems,
mineral
resources,
and
human
assets
\
\
of
whole
continents
outside
the
lands
of
the
society
making
the
jump.
Western
\
\
of
whole
continents
outside
the
lands
of
the
society
making
the
jump.
Western
\
\
Europe
did
just
that
by
means
of
its
brutality
and
guns
and,
more
important,
by
\
\
Europe
did
just
that
by
means
of
its
brutality
and
guns
and,
more
important,
by
\
\
geographical
and
ecological
luck.
\"\n
Copyright
\xA9
2015
Cambridge
University
\
\
geographical
and
ecological
luck.
\"\n
Copyright
©
2015
Cambridge
University
Press.
\n
\
\
Press.
\n
Alfred
Crosby,
historian,
Ecological
Imperialism,
2004
\n
The
\"
quantum
\
Alfred
Crosby,
historian,
Ecological
Imperialism,
2004
\n
The
\"
quantum
jump
\"
mentioned
\
\
jump
\"
mentioned
in
the
passage
most
directly
contributed
to
which
of
the
following
\
\
in
the
passage
most
directly
contributed
to
which
of
the
following
developments
\
\
developments
in
the
period
1450
\u2013
1750
C.E.?
\n
(A)
A
breakdown
in
trade
routes
\
\
in
the
period
1450
–
1750
C.E.?
\n
(A)
A
breakdown
in
trade
routes
through
the
collapse
\
\
through
the
collapse
of
the
established
state
structure
(B)
An
increase
in
the
\
\
of
the
established
state
structure
(B)
An
increase
in
the
population
of
the
world
\
\
population
of
the
world
through
more
plentiful
supplies
of
food
(C)
The
spread
\
\
through
more
plentiful
supplies
of
food
(C)
The
spread
of
Chinese
and
Indian
belief
\
\
of
Chinese
and
Indian
belief
systems
across
the
world
(D)
An
increase
in
social
\
\
systems
across
the
world
(D)
An
increase
in
social
unrest
\n
A:
Let's
think
step
\
\
unrest
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
world
history
\
\
by
step.
We
refer
to
Wikipedia
articles
on
world
history
for
help.
The
\"
quantum
\
\
for
help.
The
\"
quantum
jump
\"
mentioned
in
the
passage
refers
to
the
conquest
\
\
jump
\"
mentioned
in
the
passage
refers
to
the
conquest
of
the
New
World
and
the
\
\
of
the
New
World
and
the
Columbian
Exchange.
Choice
(A)
and
(C)
did
not
happen
\
\
Columbian
Exchange.
Choice
(A)
and
(C)
did
not
happen
in
history.
Choice
(C)
refers
\
\
in
history.
Choice
(C)
refers
to
the
human
assets.
The
answer
is
(B).
\n\n
Q:
This
\
\
to
the
human
assets.
The
answer
is
(B).
\n\n
Q:
This
question
refers
to
the
following
\
\
question
refers
to
the
following
information.
\n\"
The
struggle
against
neo-colonialism
\
\
information.
\n\"
The
struggle
against
neo-colonialism
is
not
aimed
at
excluding
\
\
is
not
aimed
at
excluding
the
capital
of
the
developed
world
from
operating
in
\
\
the
capital
of
the
developed
world
from
operating
in
less
developed
countries.
\
\
less
developed
countries.
It
is
aimed
at
preventing
the
financial
power
of
the
\
\
It
is
aimed
at
preventing
the
financial
power
of
the
developed
countries
being
\
\
developed
countries
being
used
in
such
a
way
as
to
impoverish
the
less
developed.
\n\
\
used
in
such
a
way
as
to
impoverish
the
less
developed.
\n
Non-alignment,
as
practiced
\
Non-alignment,
as
practiced
by
Ghana
and
many
other
countries,
is
based
on
co-operation
\
\
by
Ghana
and
many
other
countries,
is
based
on
co-operation
with
all
States
whether
\
\
with
all
States
whether
they
be
capitalist,
socialist
or
have
a
mixed
economy.
\
\
they
be
capitalist,
socialist
or
have
a
mixed
economy.
Such
a
policy,
therefore,
\
\
Such
a
policy,
therefore,
involves
foreign
investment
from
capitalist
countries,
\
\
involves
foreign
investment
from
capitalist
countries,
but
it
must
be
invested
\
\
but
it
must
be
invested
in
accordance
with
a
national
plan
drawn
up
by
the
government
\
\
in
accordance
with
a
national
plan
drawn
up
by
the
government
of
the
non-aligned
\
\
of
the
non-aligned
State
with
its
own
interests
in
mind.
The
issue
is
not
what
\
\
State
with
its
own
interests
in
mind.
The
issue
is
not
what
return
the
foreign
\
\
return
the
foreign
investor
receives
on
his
investments
\u2026
The
question
is
one
\
\
investor
receives
on
his
investments
…
The
question
is
one
of
power.
A
State
in
\
\
of
power.
A
State
in
the
grip
of
neo-colonialism
is
not
master
of
its
own
destiny.
\"\
\
the
grip
of
neo-colonialism
is
not
master
of
its
own
destiny.
\"\
n
Kwame
Nkrumah,
\
\
n
Kwame
Nkrumah,
Neo-Colonialism,
1965
\n
Which
of
the
following
provides
the
best
\
\
Neo-Colonialism,
1965
\n
Which
of
the
following
provides
the
best
context
for
Nkrumah's
\
\
context
for
Nkrumah's
writings?
\n
(A)
The
Industrial
Revolution
(B)
Decolonization
\
\
writings?
\n
(A)
The
Industrial
Revolution
(B)
Decolonization
(C)
Regional
Free
\
\
(C)
Regional
Free
Trade
Associations
(D)
Autarky
\n
A:
Let's
think
step
by
step.
\
\
Trade
Associations
(D)
Autarky
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
We
refer
to
Wikipedia
articles
on
world
history
for
help.
The
passage
expresses
\
\
articles
on
world
history
for
help.
The
passage
expresses
a
point
that
the
successful
\
\
a
point
that
the
successful
fight
against
neo-colonialism
were
in
danger
and
the
\
\
fight
against
neo-colonialism
were
in
danger
and
the
newly
independent
nations
\
\
newly
independent
nations
like
Ghana
may
be
re-colonized
via
financial
power
of
\
\
like
Ghana
may
be
re-colonized
via
financial
power
of
the
developed
countries.
\
\
the
developed
countries.
The
answer
is
(B).
\n\n
Q:
This
question
refers
to
the
\
\
The
answer
is
(B).
\n\n
Q:
This
question
refers
to
the
following
information.
\n\"
\
\
following
information.
\n\"
Indeed,
as
both
the
fatwas
of
distinguished
[scholars]
\
Indeed,
as
both
the
fatwas
of
distinguished
[scholars]
who
base
their
opinion
on
\
\
who
base
their
opinion
on
reason
and
tradition
alike
and
the
consensus
of
the
\
\
reason
and
tradition
alike
and
the
consensus
of
the
Sunni
community
agree
that
\
\
Sunni
community
agree
that
the
ancient
obligation
of
extirpation,
extermination,
\
\
the
ancient
obligation
of
extirpation,
extermination,
and
expulsion
of
evil
innovation
\
\
and
expulsion
of
evil
innovation
must
be
the
aim
of
our
exalted
aspiration,
for
\
\
must
be
the
aim
of
our
exalted
aspiration,
for
\"
Religious
zeal
is
a
victory
for
\
\
\"
Religious
zeal
is
a
victory
for
the
Faith
of
God
the
Beneficent
\"
;
then,
in
\
\
the
Faith
of
God
the
Beneficent
\"
;
then,
in
accordance
with
the
words
of
the
Prophet
\
\
accordance
with
the
words
of
the
Prophet
(Peace
upon
him!)
\"
Whosoever
introduces
\
\
(Peace
upon
him!)
\"
Whosoever
introduces
evil
innovation
into
our
order
must
be
\
\
evil
innovation
into
our
order
must
be
expelled
\"
and
\"
Whosoever
does
aught
against
\
\
expelled
\"
and
\"
Whosoever
does
aught
against
our
order
must
be
expelled,
\"
action
\
\
our
order
must
be
expelled,
\"
action
has
become
necessary
and
exigent
\u2026\"\n
\
\
has
become
necessary
and
exigent…
\"\n
Letter
from
Ottoman
Sultan
Selim
I
to
Safavid
\
Letter
from
Ottoman
Sultan
Selim
I
to
Safavid
Shah
Ismail
I,
1514
\n
The
letter
from
\
\
Shah
Ismail
I,
1514
\n
The
letter
from
Selim
I
is
most
clearly
an
example
of
which
\
\
Selim
I
is
most
clearly
an
example
of
which
of
the
following?
\n
(A)
The
maintenance
\
\
of
the
following?
\n
(A)
The
maintenance
of
military
supremacy
at
all
costs
(B)
\
\
of
military
supremacy
at
all
costs
(B)
Expanding
tensions
between
religious
sects
\
\
Expanding
tensions
between
religious
sects
(C)
Factors
that
brought
about
the
\
\
(C)
Factors
that
brought
about
the
collapse
of
the
Ottoman
Empire
(D)
Peacemaking
\
\
collapse
of
the
Ottoman
Empire
(D)
Peacemaking
efforts
among
the
Islamic
empires
\n
\
\
efforts
among
the
Islamic
empires
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
world
history
for
\
\
articles
on
world
history
for
help.
The
passage
is
an
example
of
expanding
tensions
\
\
help.
The
passage
is
an
example
of
expanding
tensions
between
Selim
and
Ismail.
\
\
between
Selim
and
Ismail.
In
the
passage
the
Selim
references
the
fatwa
and
the
\
\
In
the
passage
the
Selim
references
the
fatwa
and
the
consensus
of
the
Sunni
community
\
\
consensus
of
the
Sunni
community
to
against
whosoever
introduces
evil.
The
answer
\
\
to
against
whosoever
introduces
evil.
The
answer
is
(B).
\n\n
Q:
This
question
refers
\
\
is
(B).
\n\n
Q:
This
question
refers
to
the
following
information.
\n\"
The
real
grievance
\
\
to
the
following
information.
\n\"
The
real
grievance
of
the
worker
is
the
insecurity
\
\
of
the
worker
is
the
insecurity
of
his
existence;
he
is
not
sure
that
he
will
\
\
of
his
existence;
he
is
not
sure
that
he
will
always
have
work,
he
is
not
sure
\
\
always
have
work,
he
is
not
sure
that
he
will
always
be
healthy,
and
he
foresees
\
\
that
he
will
always
be
healthy,
and
he
foresees
that
he
will
one
day
be
old
and
\
\
that
he
will
one
day
be
old
and
unfit
to
work.
If
he
falls
into
poverty,
even
\
\
unfit
to
work.
If
he
falls
into
poverty,
even
if
only
through
a
prolonged
illness,
\
\
if
only
through
a
prolonged
illness,
he
is
then
completely
helpless,
exam_ins
\
\
he
is
then
completely
helpless,
exam_ins
to
his
own
devices,
and
society
does
\
\
to
his
own
devices,
and
society
does
not
currently
recognize
any
real
obligation
\
\
not
currently
recognize
any
real
obligation
towards
him
beyond
the
usual
help
\
\
towards
him
beyond
the
usual
help
for
the
poor,
even
if
he
has
been
working
all
\
\
for
the
poor,
even
if
he
has
been
working
all
the
time
ever
so
faithfully
and
\
\
the
time
ever
so
faithfully
and
diligently.
The
usual
help
for
the
poor,
however,
\
\
diligently.
The
usual
help
for
the
poor,
however,
leaves
a
lot
to
be
desired,
\
\
leaves
a
lot
to
be
desired,
especially
in
large
cities,
where
it
is
very
much
\
\
especially
in
large
cities,
where
it
is
very
much
worse
than
in
the
country.
\"
\
\
worse
than
in
the
country.
\"\
n
Otto
von
Bismarck,
1884
\n
Otto
von
Bismarck
likely
\
\n
Otto
von
Bismarck,
1884
\n
Otto
von
Bismarck
likely
made
this
speech
in
reaction
\
\
made
this
speech
in
reaction
to
which
of
the
following
issues?
\n
(A)
Social
acceptance
\
\
to
which
of
the
following
issues?
\n
(A)
Social
acceptance
of
child
labor
(B)
Declining
\
\
of
child
labor
(B)
Declining
life
expectancy
in
Germany
(C)
Criticisms
of
German
\
\
life
expectancy
in
Germany
(C)
Criticisms
of
German
trade
tariffs
(D)
Negative
\
\
trade
tariffs
(D)
Negative
effects
attributed
to
industrial
capitalism
\n
A:
Let's
\
\
effects
attributed
to
industrial
capitalism
\n
A:
Let's
think
step
by
step.
We
refer
\
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
world
history
for
help.
\
\
to
Wikipedia
articles
on
world
history
for
help.
The
passage
talks
about
the
grievance
\
\
The
passage
talks
about
the
grievance
of
the
work
under
the
industrial
capitalism.
\
\
of
the
work
under
the
industrial
capitalism.
The
answer
is
(D).
\n\n
Q:
This
question
\
\
The
answer
is
(D).
\n\n
Q:
This
question
refers
to
the
following
information.
\n
\
\
refers
to
the
following
information.
\n
He
contains
all
works
and
desires
and
all
\
He
contains
all
works
and
desires
and
all
perfumes
and
all
tastes.
He
enfolds
the
\
\
perfumes
and
all
tastes.
He
enfolds
the
whole
universe
and
in
silence
is
loving
\
\
whole
universe
and
in
silence
is
loving
to
all.
This
is
the
Spirit
that
is
in
\
\
to
all.
This
is
the
Spirit
that
is
in
my
heart,
this
is
Brahman.
To
him
I
shall
\
\
my
heart,
this
is
Brahman.
To
him
I
shall
come
when
I
go
beyond
this
life,
and
\
\
come
when
I
go
beyond
this
life,
and
to
him
will
come
he
who
has
faith
and
doubts
\
\
to
him
will
come
he
who
has
faith
and
doubts
not.
\n\u2014
The
Upanishads,
India,
\
\
not.
\n
—The
Upanishads,
India,
c.
1000
BCE
\n
To
which
religion
does
the
speaker
\
\
c.
1000
BCE
\n
To
which
religion
does
the
speaker
most
likely
belong?
\n
(A)
Hinduism
\
\
most
likely
belong?
\n
(A)
Hinduism
(B)
Buddhism
(C)
Shintoism
(D)
Zoroastrianism
\n
\
\
(B)
Buddhism
(C)
Shintoism
(D)
Zoroastrianism
\n
A:
Let's
think
step
by
step.
We
\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
world
history
for
\
\
refer
to
Wikipedia
articles
on
world
history
for
help.
Brahman
refers
to
the
ultimate
\
\
help.
Brahman
refers
to
the
ultimate
reality
of
all
things
in
the
Hindu
religion.
\
\
reality
of
all
things
in
the
Hindu
religion.
In
contrast,
Buddhism
does
not
have
\
\
In
contrast,
Buddhism
does
not
have
a
concept
of
supreme
God.
The
answer
is
(A)."
\
a
concept
of
supreme
God.
The
answer
is
(A).
"
"group": "mmlu_flan_cot_fewshot_humanities"
include: _mmlu_flan_cot_fewshot_template_yaml
"include": "_mmlu_flan_cot_fewshot_template_yaml"
task: mmlu_flan_cot_fewshot_high_school_world_history
"task": "mmlu_flan_cot_fewshot_high_school_world_history"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_human_aging.yaml
dataset_name: human_aging
"dataset_name": "human_aging"
description
:
'
The
following
are
multiple
choice
questions
(with
answers)
about
human
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
human
\
aging.
\
aging.
\n\n
Q:
All
other
things
being
equal,
which
of
the
following
persons
is
more
\
\
likely
to
show
osteoporosis?
\n
(A)
An
older
Hispanic
American
woman
(B)
An
older
\
\
African
American
woman
(C)
An
older
Asian
American
woman
(D)
An
older
Native
American
\
Q:
All
other
things
being
equal,
which
of
the
following
persons
is
more
likely
to
\
woman
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
aging
\
show
osteoporosis?
\
for
help.
Although
osteoporosis
can
occur
at
any
age,
the
risk
is
higher
for
older
\
\
people.
It
is
most
common
in
Asian
and
non-Hispanic
white
women.
The
answer
is
\
(A)
An
older
Hispanic
American
woman
(B)
An
older
African
American
woman
(C)
An
\
(C).
\n\n
Q:
The
finding
that
adults
tend
to
remember
events
from
their
adolescence
\
older
Asian
American
woman
(D)
An
older
Native
American
woman
\
better
than
from
other
periods
in
their
lives
is
referred
to
as
the
\n
(A)
Adolescence
\
\
advantage
(B)
Reminiscence
bump
(C)
Memorial
memorial
(D)
Quadratic
retrieval
\
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
aging
for
\
spike
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
aging
\
help.
Although
osteoporosis
can
occur
at
any
age,
the
risk
is
higher
for
older
people.
\
for
help.
Reminiscence
bump
is
a
phenomenon
that
older
adults
tend
to
recollect
\
It
is
most
common
in
Asian
and
non-Hispanic
white
women.
The
answer
is
(C).
\
events
during
their
young
ages.
People
usually
have
a
period
of
childhood
amnesia
\
\
from
birth
to
around
age
5,
and
a
reminiscence
bump
between
10
and
30.
The
answer
\
\
is
(B).
\n\n
Q:
Which
element
in
tobacco
smoke
is
responsible
for
cancers?
\n
(A)
\
Q:
The
finding
that
adults
tend
to
remember
events
from
their
adolescence
better
\
Nicotine
(B)
Tar
(C)
Carbon
monoxide
(D)
Smoke
particles
\n
A:
Let's
think
step
\
than
from
other
periods
in
their
lives
is
referred
to
as
the
\
by
step.
We
refer
to
Wikipedia
articles
on
human
aging
for
help.
The
benzene,
\
\
acrylamide
and
acrylonitrile
in
tar
interact
with
the
lungs
and
cause
DNA
mutations
\
(A)
Adolescence
advantage
(B)
Reminiscence
bump
(C)
Memorial
memorial
(D)
Quadratic
\
in
cells
of
the
lungs,
and
lead
to
cancer.
The
answer
is
(B).
\n\n
Q:
When
older
\
retrieval
spike
\
adults
move
to
a
new
state
after
retirement,
which
of
the
following
is
the
more
\
\
likely
destination?
\n
(A)
Texas
(B)
California
(C)
Hawaii
(D)
Vermont
\n
A:
Let's
\
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
aging
for
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
aging
for
help.
Texas
\
help.
Reminiscence
bump
is
a
phenomenon
that
older
adults
tend
to
recollect
events
\
does
not
have
state
tax,
and
has
low
cost
of
living
compared
with
the
other
three
\
during
their
young
ages.
People
usually
have
a
period
of
childhood
amnesia
from
\
options.
The
answer
is
(A)."
birth
to
around
age
5,
and
a
reminiscence
bump
between
10
and
30.
The
answer
is
"group": "mmlu_flan_cot_fewshot_other"
(B).
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_human_aging"
Q:
Which
element
in
tobacco
smoke
is
responsible
for
cancers?
(A)
Nicotine
(B)
Tar
(C)
Carbon
monoxide
(D)
Smoke
particles
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
aging
for
help.
The
benzene,
acrylamide
and
acrylonitrile
in
tar
interact
with
the
lungs
and
cause
DNA
mutations
in
cells
of
the
lungs,
and
lead
to
cancer.
The
answer
is
(B).
Q:
When
older
adults
move
to
a
new
state
after
retirement,
which
of
the
following
is
the
more
likely
destination?
(A)
Texas
(B)
California
(C)
Hawaii
(D)
Vermont
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
aging
for
help.
Texas
does
not
have
state
tax,
and
has
low
cost
of
living
compared
with
the
other
three
options.
The
answer
is
(A).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_human_aging
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_human_sexuality.yaml
dataset_name: human_sexuality
"dataset_name": "human_sexuality"
description
:
'
The
following
are
multiple
choice
questions
(with
answers)
about
human
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
human
\
sexuality.
\
sexuality.
\n\n
Q:
The
most
common
disorder
among
men
who
seek
sexual
therapy
is:
\n\
(A)
premature
ejaculation
(B)
inhibited
ejaculation
(C)
erectile
disorder
(D)
ejaculatory
\
\
disorder
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
\
Q:
The
most
common
disorder
among
men
who
seek
sexual
therapy
is:
\
sexuality
for
help.
The
most
common
type
of
arousal
disorder
for
men
is
erectile
\
\
dysfunction,
meaning
that
a
person
is
not
able
to
get
any
physical
satisfaction
\
(A)
premature
ejaculation
(B)
inhibited
ejaculation
(C)
erectile
disorder
(D)
ejaculatory
\
from
sexual
activity
although
he
may
be
interested
in
it.
The
answer
is
(C).
\n\
disorder
\n
Q:
A
woman
who
knows
she
has
active
herpes
and
untreated
syphilis
but
continues
\
\
to
have
sex
without
informing
her
partners
of
her
condition
has,
in
psychoanalytic
\
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
sexuality
\
terms:
\n
(A)
a
strong
ego
(B)
a
weak
superego
(C)
a
weak
id
(D)
a
strong
superego
\n\
for
help.
The
most
common
type
of
arousal
disorder
for
men
is
erectile
dysfunction,
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
sexuality
for
\
meaning
that
a
person
is
not
able
to
get
any
physical
satisfaction
from
sexual
activity
\
help.
A
person
with
weak
superego
tends
to
be
delinquent,
criminal
or
have
antisocial
\
although
he
may
be
interested
in
it.
The
answer
is
(C).
\
personality.
The
action
of
the
woman
who
knows
she
has
active
venereal
disease
\
\
but
still
have
sex
with
her
partners
indicate
she
may
has
antisocial
personality.
\
\
The
answer
is
(B).
\n\n
Q:
The
nature
of
homosexual
activities
that
occur
during
\
Q:
A
woman
who
knows
she
has
active
herpes
and
untreated
syphilis
but
continues
\
preadolescence
include
all
but
which
of
the
following?
\n
(A)
sexual
intercourse
\
to
have
sex
without
informing
her
partners
of
her
condition
has,
in
psychoanalytic
\
(B)
circle
jerks
(C)
exhibitionism
(D)
touching
each
other's
genitals
\n
A:
Let's
\
terms:
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
sexuality
for
help.
\
\
Sexual
intercourse
is
prohibited
during
preadolescence.
The
answer
is
(A).
\n\n\
(A)
a
strong
ego
(B)
a
weak
superego
(C)
a
weak
id
(D)
a
strong
superego
Q:
Women's
ability
to
have
multiple
orgasms
is
primarily
due
to:
\n
(A)
the
fact
that
\
\
they
do
not
have
a
refractory
period.
(B)
the
response
of
the
inner
layers
of
\
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
sexuality
\
the
vagina.
(C)
having
alternating
orgasms
in
different
locations.
(D)
the
G-Spot.
\n\
for
help.
A
person
with
weak
superego
tends
to
be
delinquent,
criminal
or
have
antisocial
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
sexuality
for
\
personality.
The
action
of
the
woman
who
knows
she
has
active
venereal
disease
but
\
help.
The
refractory
period
is
the
time
when
a
person
is
not
able
to
be
erect
\
still
have
sex
with
her
partners
indicate
she
may
has
antisocial
personality.
The
\
or
is
not
interested
in
sex.
The
answer
is
(A).
\n\n
Q:
Morning
sickness
is
typically
\
answer
is
(B).
\
a
problem:
\n
(A)
during
the
first
trimester
(B)
during
the
second
trimester
(C)
\
\
during
the
third
trimester
(D)
all
through
the
pregnancy
\n
A:
Let's
think
step
\
\
by
step.
We
refer
to
Wikipedia
articles
on
human
sexuality
for
help.
Morning
sickness
\
Q:
The
nature
of
homosexual
activities
that
occur
during
preadolescence
include
\
usually
begins
by
nine
weeks
after
conception,
corresponding
to
the
first
trimester.
\
all
but
which
of
the
following?
\
The
answer
is
(A)."
"group": "mmlu_flan_cot_fewshot_social_sciences"
(A)
sexual
intercourse
(B)
circle
jerks
(C)
exhibitionism
(D)
touching
each
other''s
"include": "_mmlu_flan_cot_fewshot_template_yaml"
genitals
"task": "mmlu_flan_cot_fewshot_human_sexuality"
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
sexuality
for
help.
Sexual
intercourse
is
prohibited
during
preadolescence.
The
answer
is
(A).
Q:
Women''s
ability
to
have
multiple
orgasms
is
primarily
due
to:
(A)
the
fact
that
they
do
not
have
a
refractory
period.
(B)
the
response
of
the
inner
layers
of
the
vagina.
(C)
having
alternating
orgasms
in
different
locations.
(D)
the
G-Spot.
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
sexuality
for
help.
The
refractory
period
is
the
time
when
a
person
is
not
able
to
be
erect
or
is
not
interested
in
sex.
The
answer
is
(A).
Q:
Morning
sickness
is
typically
a
problem:
(A)
during
the
first
trimester
(B)
during
the
second
trimester
(C)
during
the
third
trimester
(D)
all
through
the
pregnancy
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
sexuality
for
help.
Morning
sickness
usually
begins
by
nine
weeks
after
conception,
corresponding
to
the
first
trimester.
The
answer
is
(A).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_human_sexuality
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_international_law.yaml
dataset_name: international_law
"dataset_name": "international_law"
description
:
'
The
following
are
multiple
choice
questions
(with
answers)
about
international
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
international
\
law.
\
law.
\n\n
Q:
How
the
consent
to
be
bound
of
a
State
may
be
expressed?
\n
(A)
The
consent
\
\
of
a
State
to
be
bound
is
expressed
only
by
ratification
(B)
The
consent
of
a
\
\
state
to
be
bound
by
a
treaty
may
be
expressed
by
signature,
ratification,
acceptance,
\
Q:
How
the
consent
to
be
bound
of
a
State
may
be
expressed?
\
approval
or
accession
(C)
The
consent
of
a
State
to
be
bound
is
expressed
by
signature
\
\
(D)
The
consent
of
a
State
to
be
bound
is
expressed
by
whatever
means
they
choose
\n\
(A)
The
consent
of
a
State
to
be
bound
is
expressed
only
by
ratification
(B)
The
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
international
law
\
consent
of
a
state
to
be
bound
by
a
treaty
may
be
expressed
by
signature,
ratification,
\
for
help.
Article
11
of
Vienna
Convention
on
the
Law
of
Treaties
signed
in
1969
\
acceptance,
approval
or
accession
(C)
The
consent
of
a
State
to
be
bound
is
expressed
\
states
that
\"
the
consent
of
a
State
to
be
bound
by
a
treaty
may
be
expressed
\
by
signature
(D)
The
consent
of
a
State
to
be
bound
is
expressed
by
whatever
means
\
by
signature,
exchange
of
instruments
constituting
a
treaty,
ratification,
acceptance,
\
they
choose
\
approval
or
accession,
or
by
any
other
means
if
so
agreed.
\"
(B)
is
the
most
precise
\
\
and
accurate
answer.
The
answer
is
(B).
\n\n
Q:
What
is
the
judge
ad
hoc?
\n
(A)
If
\
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
international
law
\
a
party
to
a
contentious
case
before
the
ICJ
does
not
have
a
national
sitting
\
for
help.
Article
11
of
Vienna
Convention
on
the
Law
of
Treaties
signed
in
1969
\
as
judge,
it
is
entitled
to
nominate
someone
as
a
judge
solely
for
that
case,
\
states
that
"the
consent
of
a
State
to
be
bound
by
a
treaty
may
be
expressed
by
\
with
the
title
of
judge
ad
hoc
(B)
Judge
ad
hoc
is
the
member
of
the
bench
of
\
signature,
exchange
of
instruments
constituting
a
treaty,
ratification,
acceptance,
\
the
ICJ
with
a
casting
vote
(C)
Judge
ad
hoc
is
a
surrogate
judge,
in
case
a
judge
\
approval
or
accession,
or
by
any
other
means
if
so
agreed."
(B)
is
the
most
precise
\
is
disqualified
or
passes
away
(D)
Judge
ad
hoc
is
the
judge
that
each
party
will
\
and
accurate
answer.
The
answer
is
(B).
\
always
nominate
in
every
contentious
case
\n
A:
Let's
think
step
by
step.
We
refer
\
\
to
Wikipedia
articles
on
international
law
for
help.
As
\"
ad
hoc
\"
implies,
a
\
\
judge
ad
hoc
is
appointed
only
for
a
specific
case
or
period,
when
a
party
to
\
Q:
What
is
the
judge
ad
hoc?
\
a
contentious
case
before
the
International
Court
of
Justice
does
not
have
a
regular
\
\
national
sitting
as
judge.
The
answer
is
(A).
\n\n
Q:
When
'consent'
can
serve
as
\
(A)
If
a
party
to
a
contentious
case
before
the
ICJ
does
not
have
a
national
sitting
\
a
circumstance
precluding
the
wrongfulness
of
a
State
conduct?
\n
(A)
Consent
can
\
as
judge,
it
is
entitled
to
nominate
someone
as
a
judge
solely
for
that
case,
with
\
serve
as
a
circumstance
precluding
the
wrongfulness
whenever
it
is
given
(B)
Consent
\
the
title
of
judge
ad
hoc
(B)
Judge
ad
hoc
is
the
member
of
the
bench
of
the
ICJ
\
can
never
serve
as
a
circumstance
precluding
wrongfulness
(C)
Consent
can
serve
\
with
a
casting
vote
(C)
Judge
ad
hoc
is
a
surrogate
judge,
in
case
a
judge
is
disqualified
\
as
a
circumstance
precluding
wrongfulness,
provided
the
consent
is
valid
and
to
\
or
passes
away
(D)
Judge
ad
hoc
is
the
judge
that
each
party
will
always
nominate
\
the
extent
that
the
conduct
remains
within
the
limits
of
the
consent
given
(D)
\
in
every
contentious
case
\
Consent
can
always
serve
as
a
circumstance
precluding
wrongfulness,
no
matter
\
\
which
organ
of
the
State
gives
it
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
international
law
\
articles
on
international
law
for
help.
Valid
consent
can
serve
as
a
circumstance
\
for
help.
As
"ad
hoc"
implies,
a
judge
ad
hoc
is
appointed
only
for
a
specific
case
\
precluding
the
wrongfulness
of
a
State
conduct
if
the
conduct
remains
within
the
\
or
period,
when
a
party
to
a
contentious
case
before
the
International
Court
of
\
limits
of
that
consent,
according
to
Chapter
V
of
the
Responsibility
of
States
\
Justice
does
not
have
a
regular
national
sitting
as
judge.
The
answer
is
(A).
\
for
Internationally
Wrongful
Acts,
2001,
United
Nations.
The
answer
is
(C).
\n\n\
Q:
Would
a
reservation
to
the
definition
of
torture
in
the
ICCPR
be
acceptable
in
\
\
contemporary
practice?
\n
(A)
This
is
an
acceptable
reservation
if
the
reserving
\
Q:
When
''consent''
can
serve
as
a
circumstance
precluding
the
wrongfulness
of
a
\
country's
legislation
employs
a
different
definition
(B)
This
is
an
unacceptable
\
State
conduct?
\
reservation
because
it
contravenes
the
object
and
purpose
of
the
ICCPR
(C)
This
\
\
is
an
unacceptable
reservation
because
the
definition
of
torture
in
the
ICCPR
\
(A)
Consent
can
serve
as
a
circumstance
precluding
the
wrongfulness
whenever
it
\
is
consistent
with
customary
international
law
(D)
This
is
an
acceptable
reservation
\
is
given
(B)
Consent
can
never
serve
as
a
circumstance
precluding
wrongfulness
(C)
\
because
under
general
international
law
States
have
the
right
to
enter
reservations
\
Consent
can
serve
as
a
circumstance
precluding
wrongfulness,
provided
the
consent
\
to
treaties
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
international
\
is
valid
and
to
the
extent
that
the
conduct
remains
within
the
limits
of
the
consent
\
law
for
help.
For
it
contravenes
the
object
and
purpose
of
the
ICCPR,
this
is
\
given
(D)
Consent
can
always
serve
as
a
circumstance
precluding
wrongfulness,
no
\
an
unacceptable
reservation
in
contemporary
practice.
The
answer
is
(B).
\n\n
Q:
\
matter
which
organ
of
the
State
gives
it
\
What
types
of
force
does
Article
2(4)
of
the
UN
Charter
prohibit?
\n
(A)
Article
\
\
2(4)
encompasses
only
armed
force
(B)
Article
2(4)
encompasses
all
types
of
force,
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
international
law
\
including
sanctions
(C)
Article
2(4)
encompasses
all
interference
in
the
domestic
\
for
help.
Valid
consent
can
serve
as
a
circumstance
precluding
the
wrongfulness
\
affairs
of
States
(D)
Article
2(4)
encompasses
force
directed
only
against
a
State's
\
of
a
State
conduct
if
the
conduct
remains
within
the
limits
of
that
consent,
according
\
territorial
integrity
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
to
Chapter
V
of
the
Responsibility
of
States
for
Internationally
Wrongful
Acts,
\
on
international
law
for
help.
Article
2(4)
of
the
UN
Charter
prohibits
states
\
2001,
United
Nations.
The
answer
is
(C).
\
from
using
armed
forces
in
their
international
relations.
The
answer
is
(A)."
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
Q:
Would
a
reservation
to
the
definition
of
torture
in
the
ICCPR
be
acceptable
in
"task": "mmlu_flan_cot_fewshot_international_law"
contemporary
practice?
(A)
This
is
an
acceptable
reservation
if
the
reserving
country'
'
s
legislation
employs
a
different
definition
(B)
This
is
an
unacceptable
reservation
because
it
contravenes
the
object
and
purpose
of
the
ICCPR
(C)
This
is
an
unacceptable
reservation
because
the
definition
of
torture
in
the
ICCPR
is
consistent
with
customary
international
law
(D)
This
is
an
acceptable
reservation
because
under
general
international
law
States
have
the
right
to
enter
reservations
to
treaties
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
international
law
for
help.
For
it
contravenes
the
object
and
purpose
of
the
ICCPR,
this
is
an
unacceptable
reservation
in
contemporary
practice.
The
answer
is
(B).
Q:
What
types
of
force
does
Article
2(4)
of
the
UN
Charter
prohibit?
(A)
Article
2(4)
encompasses
only
armed
force
(B)
Article
2(4)
encompasses
all
types
of
force,
including
sanctions
(C)
Article
2(4)
encompasses
all
interference
in
the
domestic
affairs
of
States
(D)
Article
2(4)
encompasses
force
directed
only
against
a
State'
'
s
territorial
integrity
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
international
law
for
help.
Article
2(4)
of
the
UN
Charter
prohibits
states
from
using
armed
forces
in
their
international
relations.
The
answer
is
(A).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_international_law
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_jurisprudence.yaml
dataset_name: jurisprudence
"dataset_name": "jurisprudence"
description: 'The following are multiple choice questions (with answers) about jurisprudence.
"description": "The following are multiple choice questions (with answers) about jurisprudence.\n\
\n
Q:
Iverson
Jewelers
wrote
a
letter
to
Miller,
'We
have
received
an
exceptionally
\
\
fine
self
winding
Rolox
watch
which
we
will
sell
to
you
at
a
very
favorable
price.'
\n\
Q:
Iverson
Jewelers
wrote
a
letter
to
Miller,
'
'
We
have
received
an
exceptionally
(A)
The
letter
is
an
offer
to
sell
(B)
A
valid
offer
cannot
be
made
by
letter.
(C)
\
fine
self
winding
Rolox
watch
which
we
will
sell
to
you
at
a
very
favorable
price.'
'
\
The
letter
contains
a
valid
offer
which
will
terminate
within
a
reasonable
time.
\
\
(D)
The
letter
lacks
one
of
the
essential
elements
of
an
offer.
\n
A:
Let's
think
\
(A)
The
letter
is
an
offer
to
sell
(B)
A
valid
offer
cannot
be
made
by
letter.
(C)
\
step
by
step.
We
refer
to
Wikipedia
articles
on
jurisprudence
for
help.
An
offer
\
The
letter
contains
a
valid
offer
which
will
terminate
within
a
reasonable
time.
\
shows
the
intent
to
enter
into
a
mutually-beneficial
contract
with
specific
terms.
\
(D)
The
letter
lacks
one
of
the
essential
elements
of
an
offer.
\
An
offer
can
be
made
by
a
letter.
While
this
letter
indicates
the
willingness
\
\
to
sell,
the
lack
of
specific
terms,
such
as
transaction
price
and
offer
expiration
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
jurisprudence
for
\
date,
makes
it
an
incomplete
offer.
The
answer
is
(D).
\n\n
Q:
Functions
of
the
\
help.
An
offer
shows
the
intent
to
enter
into
a
mutually-beneficial
contract
with
\
law
include
all
but
which
of
the
following?
\n
(A)
maximizing
individual
freedom
\
specific
terms.
An
offer
can
be
made
by
a
letter.
While
this
letter
indicates
the
\
(B)
providing
a
basis
for
compromise
(C)
keeping
the
peace
(D)
promoting
the
principles
\
willingness
to
sell,
the
lack
of
specific
terms,
such
as
transaction
price
and
offer
\
of
the
free
enterprise
system
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
expiration
date,
makes
it
an
incomplete
offer.
The
answer
is
(D).
\
articles
on
jurisprudence
for
help.
Laws
are
fundamentally
about
helping
resolve
\
\
disputes
between
individuals,
and
therefore
essential
for
maximizing
individual
\
\
freedom,
providing
a
basis
for
compromise,
and
keeping
the
peace.
The
answer
is
\
Q:
Functions
of
the
law
include
all
but
which
of
the
following?
\
(D).
\n\n
Q:
The
________
School
of
jurisprudence
postulates
that
the
law
is
based
\
\
on
what
is
\"
correct.
\"\n
(A)
Natural
Law
(B)
Analytical
(C)
Historical
(D)
Sociological
\n\
(A)
maximizing
individual
freedom
(B)
providing
a
basis
for
compromise
(C)
keeping
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
jurisprudence
for
\
the
peace
(D)
promoting
the
principles
of
the
free
enterprise
system
\
help.
Natural
Law
School
of
jurisprudence
focuses
on
the
laws
of
nature,
and
states
\
\
that
the
law
should
be
based
on
ethics,
morals,
and
what
is
\"
correct
\"
.
Analytical
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
jurisprudence
for
\
deals
with
the
law
as
it
already
exists,
Historical
postulates
that
the
law
was
\
help.
Laws
are
fundamentally
about
helping
resolve
disputes
between
individuals,
\
found
and
not
made,
and
Sociological
studies
how
the
law
and
society
impact
each
\
and
therefore
essential
for
maximizing
individual
freedom,
providing
a
basis
for
\
other.
The
answer
is
(A).
\n\n
Q:
Which
word
best
summarizes
Weber's
explanation
\
compromise,
and
keeping
the
peace.
The
answer
is
(D).
\
of
the
development
of
formally
rational
law?
\n
(A)
Authority.
(B)
Charisma.
(C)
\
\
Co-operation.
(D)
Capitalism.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
jurisprudence
for
help.
Weber
explained
the
development
of
formal
\
Q:
The
________
School
of
jurisprudence
postulates
that
the
law
is
based
on
what
\
rationality
in
laws
as
how
the
modern
society
moved
from
tradition
to
rationality,
\
is
"correct."
\
where
people
decide
actions
based
less
on
how
they
were
culturally
done
and
more
\
\
on
expected
utilities.
How
rational
individuals
optimize
efficiency
of
accomplishing
\
(A)
Natural
Law
(B)
Analytical
(C)
Historical
(D)
Sociological
\
tasks
for
higher
rewards
is
a
core
principle
of
Capitalism.
The
answer
is
(D).
\n\
\n
Q:
Which
position
does
Rawls
claim
is
the
least
likely
to
be
adopted
by
the
POP
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
jurisprudence
for
\
(people
in
the
original
position)?
\n
(A)
The
POP
would
choose
equality
above
liberty.
\
help.
Natural
Law
School
of
jurisprudence
focuses
on
the
laws
of
nature,
and
states
\
(B)
The
POP
would
opt
for
the
'maximin'
strategy.
(C)
The
POP
would
opt
for
the
\
that
the
law
should
be
based
on
ethics,
morals,
and
what
is
"correct".
Analytical
\
'difference
principle'.
(D)
The
POP
would
reject
the
'system
of
natural
liberty.'
\n\
deals
with
the
law
as
it
already
exists,
Historical
postulates
that
the
law
was
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
jurisprudence
for
\
found
and
not
made,
and
Sociological
studies
how
the
law
and
society
impact
each
\
help.
The
POP
would
opt
for
the
'maximin'
strategy,
opt
for
the
'difference
principle',
\
other.
The
answer
is
(A).
\
and
reject
the
'system
of
natural
liberty',
but
the
POP
would
not
choose
equality
\
\
above
liberty,
since
the
POP
assume
both
equal
and
free
citizens.
The
answer
is
\
\
(A)."
Q:
Which
word
best
summarizes
Weber'
'
s
explanation
of
the
development
of
formally
"group": "mmlu_flan_cot_fewshot_humanities"
rational law?
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_jurisprudence"
(A)
Authority.
(B)
Charisma.
(C)
Co-operation.
(D)
Capitalism.
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
jurisprudence
for
help.
Weber
explained
the
development
of
formal
rationality
in
laws
as
how
the
modern
society
moved
from
tradition
to
rationality,
where
people
decide
actions
based
less
on
how
they
were
culturally
done
and
more
on
expected
utilities.
How
rational
individuals
optimize
efficiency
of
accomplishing
tasks
for
higher
rewards
is
a
core
principle
of
Capitalism.
The
answer
is
(D).
Q:
Which
position
does
Rawls
claim
is
the
least
likely
to
be
adopted
by
the
POP
(people
in
the
original
position)?
(A)
The
POP
would
choose
equality
above
liberty.
(B)
The
POP
would
opt
for
the
'
'
maximin'
'
strategy.
(C)
The
POP
would
opt
for
the
'
'
difference
principle'
'
.
(D)
The
POP
would
reject
the
'
'
system
of
natural
liberty.'
'
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
jurisprudence
for
help.
The
POP
would
opt
for
the
'
'
maximin'
'
strategy,
opt
for
the
'
'
difference
principle'
'
,
and
reject
the
'
'
system
of
natural
liberty'
'
,
but
the
POP
would
not
choose
equality
above
liberty,
since
the
POP
assume
both
equal
and
free
citizens.
The
answer
is
(A).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_jurisprudence
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_logical_fallacies.yaml
dataset_name: logical_fallacies
"dataset_name": "logical_fallacies"
description
:
'
The
following
are
multiple
choice
questions
(with
answers)
about
logical
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
logical
\
fallacies.
\
fallacies.
\n\n
Q:
When
an
arguer
causes
confusion
during
refutation
because
of
\
\
real
or
feigned
lack
of
an
ability
to
engage
in
refutation,
that
arguer
may
have
\
\
committed
the
fallacy
of
\n
(A)
poor
sportsmanship
(B)
appeal
to
compassion
(C)
\
Q:
When
an
arguer
causes
confusion
during
refutation
because
of
real
or
feigned
\
argument
against
the
person
(D)
ignorance
of
refutation
\n
A:
Let's
think
step
by
\
lack
of
an
ability
to
engage
in
refutation,
that
arguer
may
have
committed
the
fallacy
\
step.
We
refer
to
Wikipedia
articles
on
logical
fallacies
for
help.
Ignorance
\
of
\
of
refutation,
one
of
Aristotle's
original
list
of
logical
fallacies
in
his
Organon,
\
\
is
when
someone
causes
confusion
in
an
argument
through
real
or
feigned
inability
\
(A)
poor
sportsmanship
(B)
appeal
to
compassion
(C)
argument
against
the
person
\
to
engage
in
refutation,
in
order
to
win
the
argument.
The
answer
is
(D).
\n\n\
(D)
ignorance
of
refutation
Q:
The
complex
question
fallacy
consists
of
\n
(A)
arguing
something
is
inferior
just
\
\
because
it
doesn't
do
something
it
was
never
intended
to
do.
(B)
including
more
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
logical
fallacies
\
than
one
claim
in
the
proposition
and
treating
proof
for
one
claim
as
proof
for
\
for
help.
Ignorance
of
refutation,
one
of
Aristotle'
'
s
original
list
of
logical
\
all
the
claims.
(C)
drawing
a
conclusion
before
examining
the
evidence,
and
only
\
fallacies
in
his
Organon,
is
when
someone
causes
confusion
in
an
argument
through
\
considering
evidence
that
supports
that
conclusion.
(D)
asking
a
question
that
\
real
or
feigned
inability
to
engage
in
refutation,
in
order
to
win
the
argument.
\
includes
either
an
unproven
assumption
or
more
than
one
question,
thus
making
\
The
answer
is
(D).
\
a
straightforward
yes
or
no
answer
meaningless.
\n
A:
Let's
think
step
by
step.
\
\
We
refer
to
Wikipedia
articles
on
logical
fallacies
for
help.
The
complex
question
\
\
fallacy
is
when
someone
makes
a
single
yes
or
no
answer
to
a
question
meaningless,
\
Q:
The
complex
question
fallacy
consists
of
\
by
including
either
an
unproven
assumption
or
many
questions.
The
latter
is
also
\
\
known
as
the
many
questions
fallacy.
The
answer
is
(D).
\n\n
Q:
Arguing
that
what
\
(A)
arguing
something
is
inferior
just
because
it
doesn'
'
t
do
something
it
was
never
\
is
true
of
the
parts
must
be
true
of
the
whole
is
the
fallacy
of...
\n
(A)
Division
\
intended
to
do.
(B)
including
more
than
one
claim
in
the
proposition
and
treating
\
(B)
Composition
(C)
Appeal
to
the
person
(D)
Appeal
to
ignorance
\n
A:
Let's
think
\
proof
for
one
claim
as
proof
for
all
the
claims.
(C)
drawing
a
conclusion
before
\
step
by
step.
We
refer
to
Wikipedia
articles
on
logical
fallacies
for
help.
Fallacy
\
examining
the
evidence,
and
only
considering
evidence
that
supports
that
conclusion.
\
of
composition
occurs
when
someone
argues
what
is
true
of
the
parts
must
be
true
\
(D)
asking
a
question
that
includes
either
an
unproven
assumption
or
more
than
one
\
of
the
whole.
The
answer
is
(B).
\n\n
Q:
Which
of
the
following
is
true
of
a
valid
\
question,
thus
making
a
straightforward
yes
or
no
answer
meaningless.
\
categorical
syllogism?
\n
(A)
The
minor
premise
must
deny
the
antecedent
(B)
The
\
\
major
premise
must
affirm
the
consequent
(C)
The
middle
term
must
be
used
in
at
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
logical
fallacies
\
least
one
premise
in
a
universal
or
unqualified
sense
(D)
All
of
the
above
\n
A:
\
for
help.
The
complex
question
fallacy
is
when
someone
makes
a
single
yes
or
no
\
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
logical
fallacies
\
answer
to
a
question
meaningless,
by
including
either
an
unproven
assumption
or
\
for
help.
A
valid
categorical
syllogism
must
satisfy
several
conditions:
(1)
the
\
many
questions.
The
latter
is
also
known
as
the
many
questions
fallacy.
The
answer
\
syllogism
must
have
exactly
three
terms
(2)
every
term
of
the
syllogism
must
be
\
is
(D).
\
used
twice
exactly,
(3)
a
term
may
be
used
only
once
in
any
premise,
and
(4)
the
\
\
middle
term
must
be
used
in
at
least
one
premise
in
a
universal
or
unqualified
\
\
sense,
etc.
Only
(C)
is
true.
The
answer
is
(C).
\n\n
Q:
If
someone
attacks
the
\
Q:
Arguing
that
what
is
true
of
the
parts
must
be
true
of
the
whole
is
the
fallacy
\
character
of
an
opposing
arguer,
instead
of
responding
to
that
opponent's
arguments,
\
of...
\
the
first
person
has
probably
committed
which
of
the
following
fallacies?
\n
(A)
\
\
tu
quoque
(B)
horse
laugh
(C)
argument
against
the
person
(D)
ignoratio
elenchi
\n\
(A)
Division
(B)
Composition
(C)
Appeal
to
the
person
(D)
Appeal
to
ignorance
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
logical
fallacies
\
\
for
help.
The
argument
against
the
person
fallacy
occurs
when
someone
irrelevantly
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
logical
fallacies
\
attacks
the
character
of
an
opposing
arguer,
instead
of
addressing
that
opponent's
\
for
help.
Fallacy
of
composition
occurs
when
someone
argues
what
is
true
of
the
\
arguments.
The
answer
is
(C)."
parts
must
be
true
of
the
whole.
The
answer
is
(B).
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_logical_fallacies"
Q:
Which
of
the
following
is
true
of
a
valid
categorical
syllogism?
(A)
The
minor
premise
must
deny
the
antecedent
(B)
The
major
premise
must
affirm
the
consequent
(C)
The
middle
term
must
be
used
in
at
least
one
premise
in
a
universal
or
unqualified
sense
(D)
All
of
the
above
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
logical
fallacies
for
help.
A
valid
categorical
syllogism
must
satisfy
several
conditions:
(1)
the
syllogism
must
have
exactly
three
terms
(2)
every
term
of
the
syllogism
must
be
used
twice
exactly,
(3)
a
term
may
be
used
only
once
in
any
premise,
and
(4)
the
middle
term
must
be
used
in
at
least
one
premise
in
a
universal
or
unqualified
sense,
etc.
Only
(C)
is
true.
The
answer
is
(C).
Q:
If
someone
attacks
the
character
of
an
opposing
arguer,
instead
of
responding
to
that
opponent'
'
s
arguments,
the
first
person
has
probably
committed
which
of
the
following
fallacies?
(A)
tu
quoque
(B)
horse
laugh
(C)
argument
against
the
person
(D)
ignoratio
elenchi
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
logical
fallacies
for
help.
The
argument
against
the
person
fallacy
occurs
when
someone
irrelevantly
attacks
the
character
of
an
opposing
arguer,
instead
of
addressing
that
opponent'
'
s
arguments.
The
answer
is
(C).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_logical_fallacies
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_machine_learning.yaml
dataset_name: machine_learning
"dataset_name": "machine_learning"
description
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
machine
\
"
description
"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
machine
\
\
learning.
\n\n
Q:
Which
image
data
augmentation
is
most
common
for
natural
images?
\n\
\
learning.
\n\n
Q:
Which
image
data
augmentation
is
most
common
for
natural
images?
\n\
(A)
random
crop
and
horizontal
flip
(B)
random
crop
and
vertical
flip
(C)
posterization
\
(A)
random
crop
and
horizontal
flip
(B)
random
crop
and
vertical
flip
(C)
posterization
\
\
(D)
dithering
\n
A:
Let's
think
step
by
step.
Data
augmentation
is
used
to
increase
\
\
(D)
dithering
\n
A:
Let's
think
step
by
step.
Data
augmentation
is
used
to
increase
\
...
@@ -12,48 +12,49 @@ description: "The following are multiple choice questions (with answers) about m
\
learning
we
consider
a
binary
split
according
to
whether
the
attribute
is
above
\
\
learning
we
consider
a
binary
split
according
to
whether
the
attribute
is
above
\
\
or
below
some
threshold.
Pat
suggests
that
instead
we
should
just
have
a
multiway
\
\
or
below
some
threshold.
Pat
suggests
that
instead
we
should
just
have
a
multiway
\
\
split
with
one
branch
for
each
of
the
distinct
values
of
the
attribute.
From
the
\
\
split
with
one
branch
for
each
of
the
distinct
values
of
the
attribute.
From
the
\
\
list
below
choose
the
single
biggest
problem
with
Pat
\u2019
s
suggestion:
\n
(A)
\
\
list
below
choose
the
single
biggest
problem
with
Pat
’
s
suggestion:
\n
(A)
It
is
\
\
It
is
too
computationally
expensive.
(B)
It
would
probably
result
in
a
decision
\
\
too
computationally
expensive.
(B)
It
would
probably
result
in
a
decision
tree
\
\
tree
that
scores
badly
on
the
training
set
and
a
testset.
(C)
It
would
probably
\
\
that
scores
badly
on
the
training
set
and
a
testset.
(C)
It
would
probably
result
\
\
result
in
a
decision
tree
that
scores
well
on
the
training
set
but
badly
on
a
\
\
in
a
decision
tree
that
scores
well
on
the
training
set
but
badly
on
a
testset.
\
\
testset.
(D)
It
would
probably
result
in
a
decision
tree
that
scores
well
on
a
\
\
(D)
It
would
probably
result
in
a
decision
tree
that
scores
well
on
a
testset
\
\
testset
but
badly
on
a
training
set.
\n
A:
Let's
think
step
by
step.
Because
the
\
\
but
badly
on
a
training
set.
\n
A:
Let's
think
step
by
step.
Because
the
input
is
\
\
input
is
real
valued,
it
is
unlikely
that
the
same
values
appear
both
at
training
\
\
real
valued,
it
is
unlikely
that
the
same
values
appear
both
at
training
and
test
\
\
and
test
time.
This
means
that
while
such
a
decision
tree
could
yield
good
performance
\
\
time.
This
means
that
while
such
a
decision
tree
could
yield
good
performance
\
\
on
the
training
data,
when
evaluated
on
the
test
data
it
will
perform
badly
because
\
\
on
the
training
data,
when
evaluated
on
the
test
data
it
will
perform
badly
because
\
\
the
decision
tree
won
\u2019
t
know
what
to
do
with
numbers
that
did
not
appear
\
\
the
decision
tree
won’t
know
what
to
do
with
numbers
that
did
not
appear
in
the
\
\
in
the
training
data.
The
answer
is
(C).
\n\n
Q:
You
are
reviewing
papers
for
the
\
\
training
data.
The
answer
is
(C).
\n\n
Q:
You
are
reviewing
papers
for
the
World’s
\
\
World
\u2019
s
Fanciest
Machine
Learning
Conference,
and
you
see
submissions
with
\
\
Fanciest
Machine
Learning
Conference,
and
you
see
submissions
with
the
following
\
\
the
following
claims.
Which
ones
would
you
consider
accepting?
\n
(A)
My
method
\
\
claims.
Which
ones
would
you
consider
accepting?
\n
(A)
My
method
achieves
a
training
\
\
achieves
a
training
error
lower
than
all
previous
methods!
(B)
My
method
achieves
\
\
error
lower
than
all
previous
methods!
(B)
My
method
achieves
a
test
error
lower
\
\
a
test
error
lower
than
all
previous
methods!
(Footnote:
When
regularisation
parameter
\
\
than
all
previous
methods!
(Footnote:
When
regularisation
parameter
λ
is
chosen
\
\ \u03BB
is
chosen
so
as
to
minimise
test
error.)
(C)
My
method
achieves
a
test
\
\
so
as
to
minimise
test
error.)
(C)
My
method
achieves
a
test
error
lower
than
\
\
all
previous
methods!
(Footnote:
When
regularisation
parameter
λ
is
chosen
so
\
\
as
to
minimise
cross-validaton
error.)
(D)
My
method
achieves
a
cross-validation
\
\
error
lower
than
all
previous
methods!
(Footnote:
When
regularisation
parameter
\
\
error
lower
than
all
previous
methods!
(Footnote:
When
regularisation
parameter
\
\ \u03BB
is
chosen
so
as
to
minimise
cross-validaton
error.)
(D)
My
method
achieves
\
\
λ
is
chosen
so
as
to
minimise
cross-validaton
error.)
\n
A:
Let's
think
step
by
\
\
a
cross-validation
error
lower
than
all
previous
methods!
(Footnote:
When
regularisation
\
\
step.
In
machine
learning,
we
train
with
some
data
and
fixed
hyperparameters
and
\
\
parameter
\u03BB
is
chosen
so
as
to
minimise
cross-validaton
error.)
\n
A:
Let's
\
\
the
training
error
can
be
arbitrarily
low,
so
(A)
can’t
be
right.
Then,
one
compares
\
\
think
step
by
step.
In
machine
learning,
we
train
with
some
data
and
fixed
hyperparameters
\
\
different
hyperparameters
by
selecting
the
model
with
the
lowest
cross-validation
\
\
and
the
training
error
can
be
arbitrarily
low,
so
(A)
can
\u2019
t
be
right.
Then,
\
\
error,
this
means
that
(B)
and
(D)
are
not
the
right
procedure.
The
only
relevant
\
\
one
compares
different
hyperparameters
by
selecting
the
model
with
the
lowest
\
\
number
after
these
is
the
test
error
and
thus
(C)
is
the
right
answer.
The
answer
\
\
cross-validation
error,
this
means
that
(B)
and
(D)
are
not
the
right
procedure.
\
\
is
(C).
\n\n
Q:
A
6-sided
die
is
rolled
15
times
and
the
results
are:
side
1
comes
\
\
The
only
relevant
number
after
these
is
the
test
error
and
thus
(C)
is
the
right
\
\
up
0
times;
side
2:
1
time;
side
3:
2
times;
side
4:
3
times;
side
5:
4
times;
\
\
answer.
The
answer
is
(C).
\n\n
Q:
A
6-sided
die
is
rolled
15
times
and
the
results
\
\
side
6:
5
times.
Based
on
these
results,
what
is
the
probability
of
side
3
coming
\
\
are:
side
1
comes
up
0
times;
side
2:
1
time;
side
3:
2
times;
side
4:
3
times;
\
\
up
when
using
Add-1
Smoothing?
\n
(A)
2.0/15
(B)
1.0/7
(C)
3.0/16
(D)
1.0/5
\n
A:
\
\
side
5:
4
times;
side
6:
5
times.
Based
on
these
results,
what
is
the
probability
\
\
Let's
think
step
by
step.
Add-1
smoothing
adds
the
value
of
one
to
the
different
\
\
of
side
3
coming
up
when
using
Add-1
Smoothing?
\n
(A)
2.0/15
(B)
1.0/7
(C)
3.0/16
\
\
counts
and
then
normalizes
the
probabilities
accordingly.
The
counts
after
adding
\
\
(D)
1.0/5
\n
A:
Let's
think
step
by
step.
Add-1
smoothing
adds
the
value
of
one
\
\
one
will
be:
side
1
comes
up
1
time;
side
2:
2
times;
side
3:
3
times;
side
4:
\
\
to
the
different
counts
and
then
normalizes
the
probabilities
accordingly.
The
\
\
4
times;
side
5:
5
times;
side
6:
6
times.
The
number
of
sum
one
die
rolls
will
\
\
counts
after
adding
one
will
be:
side
1
comes
up
1
time;
side
2:
2
times;
side
\
\
be
21,
so
the
probability
of
drawing
a
three
is
3/21
=
1/7.
The
answer
is
(B).
\n\
\
3:
3
times;
side
4:
4
times;
side
5:
5
times;
side
6:
6
times.
The
number
of
sum
\
\n
Q:
To
achieve
an
0/1
loss
estimate
that
is
less
than
1
percent
of
the
true
0/1
\
\
one
die
rolls
will
be
21,
so
the
probability
of
drawing
a
three
is
3/21
=
1/7.
\
\
loss
(with
probability
95%),
according
to
Hoeffding's
inequality
the
IID
test
\
\
The
answer
is
(B).
\n\n
Q:
To
achieve
an
0/1
loss
estimate
that
is
less
than
1
percent
\
\
set
must
have
how
many
examples?
\n
(A)
around
10
examples
(B)
around
100
examples
\
\
of
the
true
0/1
loss
(with
probability
95%),
according
to
Hoeffding's
inequality
\
\
(C)
between
100
and
500
examples
(D)
more
than
1000
examples
\n
A:
Let's
think
step
\
\
the
IID
test
set
must
have
how
many
examples?
\n
(A)
around
10
examples
(B)
around
\
\
by
step.
By
the
Hoeffding’s
inequality,
we
expect
that
with
95%
probability
the
\
\
100
examples
(C)
between
100
and
500
examples
(D)
more
than
1000
examples
\n
A:
\
\
in-sample
and
out-of-sample
errors
differ
by
epsilon
when
we
have
N
samples
if
\
\
Let's
think
step
by
step.
By
the
Hoeffding
\u2019
s
inequality,
we
expect
that
with
\
\
2
exp(-2
epsilon^2
N)<0.05,
this
implies
that
N
>
-1/(2*epsilon**2)
log
(
0.05/2
\
\
95%
probability
the
in-sample
and
out-of-sample
errors
differ
by
epsilon
when
\
\
)=
log
(40)*5000.
Since
log(40)>1,
we
have
that
one
needs
more
than
1000
examples.
\
\
we
have
N
samples
if
2
exp(-2
epsilon^2
N)<0.05,
this
implies
that
N
>
-1/(2*epsilon**2)
\
\
The
answer
is
(D)."
\
log
(
0.05/2
)=
log
(40)*5000.
Since
log(40)>1,
we
have
that
one
needs
more
than
\
"group": "mmlu_flan_cot_fewshot_stem"
\
1000
examples.
The
answer
is
(D)."
"include": "_mmlu_flan_cot_fewshot_template_yaml"
include: _mmlu_flan_cot_fewshot_template_yaml
"task": "mmlu_flan_cot_fewshot_machine_learning"
task: mmlu_flan_cot_fewshot_machine_learning
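The two numeric rationales in the machine_learning few-shot prompt above (Add-1 smoothing on the die counts, and the Hoeffding sample-size bound) can be checked with a short sketch. This is not part of the commit; the function names `add_one_smoothing` and `hoeffding_sample_bound` are hypothetical, and `epsilon = 0.01` is taken from the "1 percent" in the question and the factor 5000 in the rationale.

```python
import math
from fractions import Fraction


def add_one_smoothing(counts):
    """Return Add-1 (Laplace) smoothed probabilities for a list of counts."""
    smoothed = [c + 1 for c in counts]      # add one to every count
    total = sum(smoothed)                   # normalizing constant
    return [Fraction(c, total) for c in smoothed]


def hoeffding_sample_bound(epsilon, delta):
    """Solve 2 * exp(-2 * epsilon**2 * N) = delta for N (real-valued)."""
    return math.log(2 / delta) / (2 * epsilon ** 2)


# Die rolled 15 times: sides 1..6 came up 0, 1, 2, 3, 4, 5 times.
die_counts = [0, 1, 2, 3, 4, 5]
probs = add_one_smoothing(die_counts)
print(probs[2])  # side 3: 3/21, which reduces to 1/7

# 95% confidence (delta = 0.05), deviation epsilon = 0.01.
bound = hoeffding_sample_bound(0.01, 0.05)
print(math.ceil(bound))  # smallest integer N above log(40) * 5000
```

Under these assumptions the smoothed probability for side 3 is 1/7, matching answer (B), and the Hoeffding bound is log(40) * 5000, well above 1000 examples, matching answer (D).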
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_management.yaml
dataset_name: management
"dataset_name": "management"
description: 'The following are multiple choice questions (with answers) about management.
"description": "The following are multiple choice questions (with answers) about management.\n\
\n
Q:
How
can
organisational
structures
that
are
characterised
by
democratic
and
\
\
inclusive
styles
of
management
be
described?
\n
(A)
Hierarchical
(B)
Bureaucratic
\
Q:
How
can
organisational
structures
that
are
characterised
by
democratic
and
inclusive
\
(C)
Flat
(D)
Functional
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
styles
of
management
be
described?
\
on
management
for
help.
Flat
organizational
structures
are
characterized
by
democratic
\
\
and
inclusive
styles
of
management,
and
have
few
(if
any)
levels
of
management
\
(A)
Hierarchical
(B)
Bureaucratic
(C)
Flat
(D)
Functional
\
between
the
workers
and
managers.
The
answer
is
(C).
\n\n
Q:
Hygiene
factors
are
\
\
associated
with
which
writer?
\n
(A)
Frederick
Hertzberg
(B)
D.C.
McClelland
(C)
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
management
for
help.
\
Abraham
Maslow
(D)
Douglas
McGregor
\n
A:
Let's
think
step
by
step.
We
refer
to
\
Flat
organizational
structures
are
characterized
by
democratic
and
inclusive
styles
\
Wikipedia
articles
on
management
for
help.
Hygiene
factors
include
compensation,
\
of
management,
and
have
few
(if
any)
levels
of
management
between
the
workers
and
\
company
policies,
supervision,
interpersonal
relations,
and
work
environments.
\
managers.
The
answer
is
(C).
\
Hertzberg
lists
them
as
factors
that
cannot
motivate
employees
but
can
minimize
\
\
job
dissatisfaction.
The
answer
is
(A).
\n\n
Q:
What
characteristic
is
not
a
key
\
\
feature
of
the
'open
systems'
model
of
management?
\n
(A)
Morale
(B)
Innovation
\
Q:
Hygiene
factors
are
associated
with
which
writer?
\
(C)
Growth
resource
(D)
Adaptation
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
management
for
help.
The
key
characteristics
of
an
open
system
in
\
(A)
Frederick
Hertzberg
(B)
D.C.
McClelland
(C)
Abraham
Maslow
(D)
Douglas
McGregor
\
management
include
innovation,
growth
resource,
and
adaption,
but
do
not
include
\
\
morale.
The
answer
is
(A).
\n\n
Q:
Which
element
of
the
cultural
web
forms
regalia?
\n\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
management
for
help.
(A)
Symbols
(B)
Rituals
and
routines
(C)
Power
structures
(D)
Control
systems
\n\
Hygiene
factors
include
compensation,
company
policies,
supervision,
interpersonal
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
management
for
help.
\
relations,
and
work
environments.
Hertzberg
lists
them
as
factors
that
cannot
motivate
\
The
cultural
web
is
a
tool
for
mapping
an
organization's
culture,
where
symbols
\
employees
but
can
minimize
job
dissatisfaction.
The
answer
is
(A).
\
form
the
regalia
that
visually
expresses
the
values
that
the
organization
holds
\
\
as
important.
The
answer
is
(A).
\n\n
Q:
What
are
the
two
main
dimensions
of
the
\
\
Ohio
Studies
into
leadership?
\n
(A)
Starting
position
and
end
position
(B)
Initial
\
Q:
What
characteristic
is
not
a
key
feature
of
the
'
'
open
systems'
'
model
of
management?
\
environment
and
changed
environment
(C)
Organisational
structure
and
conditioning
\
\
(D)
Initiating
structure
and
considerations
\n
A:
Let's
think
step
by
step.
We
refer
\
(A)
Morale
(B)
Innovation
(C)
Growth
resource
(D)
Adaptation
\
to
Wikipedia
articles
on
management
for
help.
The
Ohio
State
Leadership
Studies
\
\
conducted
in
the
1940s
identified
initiating
structure
and
consideration
as
the
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
management
for
help.
\
two
main
dimensions
of
leader
behavior.
The
answer
is
(D)."
The
key
characteristics
of
an
open
system
in
management
include
innovation,
growth
"group": "mmlu_flan_cot_fewshot_other"
resource,
and
adaption,
but
do
not
include
morale.
The
answer
is
(A).
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_management"
Q:
Which
element
of
the
cultural
web
forms
regalia?
(A)
Symbols
(B)
Rituals
and
routines
(C)
Power
structures
(D)
Control
systems
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
management
for
help.
The
cultural
web
is
a
tool
for
mapping
an
organization'
'
s
culture,
where
symbols
form
the
regalia
that
visually
expresses
the
values
that
the
organization
holds
as
important.
The
answer
is
(A).
Q:
What
are
the
two
main
dimensions
of
the
Ohio
Studies
into
leadership?
(A)
Starting
position
and
end
position
(B)
Initial
environment
and
changed
environment
(C)
Organisational
structure
and
conditioning
(D)
Initiating
structure
and
considerations
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
management
for
help.
The
Ohio
State
Leadership
Studies
conducted
in
the
1940s
identified
initiating
structure
and
consideration
as
the
two
main
dimensions
of
leader
behavior.
The
answer
is
(D).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_management
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_marketing.yaml
dataset_name: marketing
"dataset_name": "marketing"
description: 'The following are multiple choice questions (with answers) about marketing.
"description": "The following are multiple choice questions (with answers) about marketing.\n\
\n
Q:
Although
the
content
and
quality
can
be
as
controlled
as
direct
mail,
response
\
\
rates
of
this
medium
are
lower
because
of
the
lack
of
a
personal
address
mechanism.
\
Q:
Although
the
content
and
quality
can
be
as
controlled
as
direct
mail,
response
\
This
media
format
is
known
as:
\n
(A)
Care
lines.
(B)
Direct
mail.
(C)
Inserts.
\
rates
of
this
medium
are
lower
because
of
the
lack
of
a
personal
address
mechanism.
\
(D)
Door
to
door.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
This
media
format
is
known
as:
\
on
marketing
for
help.
Door
to
door
marketing
delivers
non-addressed
items
within
\
\
all
buildings
within
a
geographic
area.
While
it
can
control
the
content
and
quality
\
(A)
Care
lines.
(B)
Direct
mail.
(C)
Inserts.
(D)
Door
to
door.
\
as
well
as
direct
mail
marketing,
its
response
rate
is
lower
because
of
the
lack
\
\
of
a
personal
address
mechanism.
The
answer
is
(D).
\n\n
Q:
In
an
organization,
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
marketing
for
help.
\
the
group
of
people
tasked
with
buying
decisions
is
referred
to
as
the
_______________.
\n\
Door
to
door
marketing
delivers
non-addressed
items
within
all
buildings
within
(A)
Outsourcing
unit.
(B)
Procurement
centre.
(C)
Chief
executive
unit.
(D)
Decision-making
\
a
geographic
area.
While
it
can
control
the
content
and
quality
as
well
as
direct
\
unit.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
marketing
\
mail
marketing,
its
response
rate
is
lower
because
of
the
lack
of
a
personal
address
\
for
help.
In
an
organization,
the
group
of
the
people
tasked
with
buying
decision
\
mechanism.
The
answer
is
(D).
\
is
referred
to
as
the
decision-making
unit.
The
answer
is
(D).
\n\n
Q:
The
single
\
\
group
within
society
that
is
most
vulnerable
to
reference
group
influence
is:
\n\
(A)
The
older
consumer
who
feels
somewhat
left
out
of
things.
(B)
The
married
women,
\
Q:
In
an
organization,
the
group
of
people
tasked
with
buying
decisions
is
referred
\
many
of
whom
feel
a
need
for
stability
in
their
lives.
(C)
New
immigrants
who
\
to
as
the
_______________.
\
really
want
to
assimilate
into
their
new
culture.
(D)
Children,
who
base
most
\
\
of
their
buying
decisions
on
outside
influences.
\n
A:
Let's
think
step
by
step.
\
(A)
Outsourcing
unit.
(B)
Procurement
centre.
(C)
Chief
executive
unit.
(D)
Decision-making
\
We
refer
to
Wikipedia
articles
on
marketing
for
help.
Children,
who
mostly
based
\
unit.
\
their
buying
decisions
on
outside
influences,
are
the
single
group
within
society
\
\
that
is
more
vulnerable
to
reference
group
influence.
The
answer
is
(D).
\n\n
Q:
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
marketing
for
help.
\
Which
of
the
following
is
an
assumption
in
Maslow's
hierarchy
of
needs?
\n
(A)
Needs
\
In
an
organization,
the
group
of
the
people
tasked
with
buying
decision
is
referred
\
are
dependent
on
culture
and
also
on
social
class.
(B)
Lower-level
needs
must
\
to
as
the
decision-making
unit.
The
answer
is
(D).
\
be
at
least
partially
satisfied
before
higher
needs
can
affect
behaviour.
(C)
\
\
Needs
are
not
prioritized
or
arranged
in
any
particular
order.
(D)
Satisfied
needs
\
\
are
motivators,
and
new
needs
emerge
when
current
needs
remain
unmet.
\n
A:
Let's
\
Q:
The
single
group
within
society
that
is
most
vulnerable
to
reference
group
influence
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
marketing
for
help.
Maslow's
\
is:
\
hierarchy
of
needs,
from
the
bottom
upwards,
are
physiological
(food
and
clothing),
\
\
safety,
love
and
belonging
needs,
esteem,
and
self-actualization.
Lower-level
\
(A)
The
older
consumer
who
feels
somewhat
left
out
of
things.
(B)
The
married
women,
\
needs
must
be
at
least
partially
satisfied
before
higher
ones
can
affect
behavior.
\
many
of
whom
feel
a
need
for
stability
in
their
lives.
(C)
New
immigrants
who
really
\
The
answer
is
(B).
\n\n
Q:
_____________
is
a
natural
outcome
when
combining
demographic
\
want
to
assimilate
into
their
new
culture.
(D)
Children,
who
base
most
of
their
\
and
geographic
variables.
\n
(A)
Geodemographics
(B)
Product
differentiation.
(C)
\
buying
decisions
on
outside
influences.
\
ANSOFF
matrix.
(D)
Brand
management.
\n
A:
Let's
think
step
by
step.
We
refer
to
\
\
Wikipedia
articles
on
marketing
for
help.
Geodemographics
is
a
natural
outcome
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
marketing
for
help.
\
when
combining
demographic
and
geographic
variables.
The
answer
is
(A)."
Children,
who
mostly
based
their
buying
decisions
on
outside
influences,
are
the
"
group"
:
"
mmlu_flan_cot_fewshot_other"
single
group
within
society
that
is
more
vulnerable
to
reference
group
influence.
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
The
answer
is
(D).
"
task"
:
"
mmlu_flan_cot_fewshot_marketing"
Q: Which of the following is an assumption in Maslow''s hierarchy of needs?

(A) Needs are dependent on culture and also on social class. (B) Lower-level needs must be at least partially satisfied before higher needs can affect behaviour. (C) Needs are not prioritized or arranged in any particular order. (D) Satisfied needs are motivators, and new needs emerge when current needs remain unmet.

A: Let''s think step by step. We refer to Wikipedia articles on marketing for help. Maslow''s hierarchy of needs, from the bottom upwards, are physiological (food and clothing), safety, love and belonging needs, esteem, and self-actualization. Lower-level needs must be at least partially satisfied before higher ones can affect behavior. The answer is (B).

Q: _____________ is a natural outcome when combining demographic and geographic variables.

(A) Geodemographics (B) Product differentiation. (C) ANSOFF matrix. (D) Brand management.

A: Let''s think step by step. We refer to Wikipedia articles on marketing for help. Geodemographics is a natural outcome when combining demographic and geographic variables. The answer is (A).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_marketing
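The `description` strings in these task files all follow one layout: a header sentence, then blocks of question, lettered options, and a step-by-step answer, separated by blank lines. As an illustration only, the sketch below assembles a string in that layout. The `FewShotExample` type and `build_description` function are hypothetical names introduced here and are not part of lm-evaluation-harness.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FewShotExample:
    """Hypothetical container for one worked example (not a harness type)."""
    question: str
    options: List[str]  # option texts in order A, B, C, D
    rationale: str      # reasoning text, without the final answer sentence
    answer: str         # a single letter such as "A"


def build_description(subject: str, examples: List[FewShotExample]) -> str:
    """Join a header and few-shot blocks in the layout used by the YAML files."""
    header = (
        "The following are multiple choice questions (with answers) "
        f"about {subject}."
    )
    letters = "ABCD"
    blocks = []
    for ex in examples:
        # Options are written inline as "(A) ... (B) ..." on a single line.
        opts = " ".join(
            f"({letters[i]}) {text}" for i, text in enumerate(ex.options)
        )
        blocks.append(
            f"Q: {ex.question}\n{opts}\n"
            f"A: Let's think step by step. {ex.rationale} "
            f"The answer is ({ex.answer})."
        )
    # A blank line separates the header from the first block and each block
    # from the next, matching the "\n\n" escapes in the double-quoted form.
    return header + "\n\n" + "\n\n".join(blocks)


example = FewShotExample(
    question=(
        "_____________ is a natural outcome when combining demographic "
        "and geographic variables."
    ),
    options=[
        "Geodemographics",
        "Product differentiation.",
        "ANSOFF matrix.",
        "Brand management.",
    ],
    rationale=(
        "We refer to Wikipedia articles on marketing for help. "
        "Geodemographics is a natural outcome when combining demographic "
        "and geographic variables."
    ),
    answer="A",
)
description = build_description("marketing", [example])
```

Generating the string from structured data like this would keep the wording identical across subjects, though the files in this commit store the finished string directly.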
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_medical_genetics.yaml
dataset_name: medical_genetics
"dataset_name": "medical_genetics"
description: 'The following are multiple choice questions (with answers) about medical
"description": "The following are multiple choice questions (with answers) about medical
\
genetics.
\
genetics.
\n\n
Q:
The
stage
of
meiosis
in
which
chromosomes
pair
and
cross
over
\
\
is:
\n
(A)
prophase
I
(B)
metaphase
I
(C)
prophase
II
(D)
metaphase
II
\n
A:
Let's
\
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
medical
genetics
for
help.
\
Q:
The
stage
of
meiosis
in
which
chromosomes
pair
and
cross
over
is:
\
Prophase
I
is
the
stage
of
meiosis
where
homologous
chromosomes
pair
with
each
\
\
other
and
exchange
genetic
material.
The
answer
is
(A).
\n\n
Q:
DNA
ligase
is
\n\
(A)
prophase
I
(B)
metaphase
I
(C)
prophase
II
(D)
metaphase
II
(A)
an
enzyme
that
joins
fragments
in
normal
DNA
replication
(B)
an
enzyme
of
bacterial
\
\
origin
which
cuts
DNA
at
defined
base
sequences
(C)
an
enzyme
that
facilitates
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
medical
genetics
\
transcription
of
specific
genes
(D)
an
enzyme
which
limits
the
level
to
which
\
for
help.
Prophase
I
is
the
stage
of
meiosis
where
homologous
chromosomes
pair
with
\
a
particular
nutrient
reaches
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
each
other
and
exchange
genetic
material.
The
answer
is
(A).
\
articles
on
medical
genetics
for
help.
DNA
ligase
is
a
type
of
enzyme
(EC
6.5.1.1)
\
\
responsible
for
joining
DNA
strands
together
by
catalyzing
a
phosphodiester
bond.
\
\
The
answer
is
(A).
\n\n
Q:
Which
of
the
following
conditions
does
not
show
multifactorial
\
Q:
DNA
ligase
is
\
inheritance?
\n
(A)
Pyloric
stenosis
(B)
Schizophrenia
(C)
Spina
bifida
(neural
\
\
tube
defects)
(D)
Marfan
syndrome
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
(A)
an
enzyme
that
joins
fragments
in
normal
DNA
replication
(B)
an
enzyme
of
bacterial
\
articles
on
medical
genetics
for
help.
Multifactorial
inheritance
is
when
more
\
origin
which
cuts
DNA
at
defined
base
sequences
(C)
an
enzyme
that
facilitates
transcription
\
than
a
single
factor
is
responsible
for
causing
a
given
trait
or
health
problem.
\
of
specific
genes
(D)
an
enzyme
which
limits
the
level
to
which
a
particular
nutrient
\
Genes
cannot
be
the
only
factor.
Marfan
syndrome,
on
the
other
hand,
requires
\
reaches
\
only
one
abnormal
copy
of
the
of
the
Marfan
gene,
from
one
parent,
to
inherit
\
\
the
trait.
The
answer
is
(D).
\n\n
Q:
A
gene
showing
codominance
\n
(A)
has
both
alleles
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
medical
genetics
\
independently
expressed
in
the
heterozygote
(B)
has
one
allele
dominant
to
the
\
for
help.
DNA
ligase
is
a
type
of
enzyme
(EC
6.5.1.1)
responsible
for
joining
DNA
\
other
(C)
has
alleles
tightly
linked
on
the
same
chromosome
(D)
has
alleles
expressed
\
strands
together
by
catalyzing
a
phosphodiester
bond.
The
answer
is
(A).
\
at
the
same
time
in
development
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
medical
genetics
for
help.
Codominance,
as
it
relates
to
genetics,
\
\
refers
to
a
type
of
genetic
inheritance
where
the
phenotype
of
both
the
parents
\
Q:
Which
of
the
following
conditions
does
not
show
multifactorial
inheritance?
\
is
easily
observed
in
the
offspring.
A
heterozygote
is
an
individual
having
two
\
\
different
alleles
of
a
gene.
The
answer
is
(A).
\n\n
Q:
Large
triplet
repeat
expansions
\
(A)
Pyloric
stenosis
(B)
Schizophrenia
(C)
Spina
bifida
(neural
tube
defects)
(D)
\
can
be
detected
by:
\n
(A)
polymerase
chain
reaction.
(B)
single
strand
conformational
\
Marfan
syndrome
\
polymorphism
analysis.
(C)
Southern
blotting.
(D)
Western
blotting.
\n
A:
Let's
\
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
medical
genetics
for
help.
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
medical
genetics
\
A
Southern
blot
is
a
method
in
molecular
biology
for
detecting
specific
DNA
sequences
\
for
help.
Multifactorial
inheritance
is
when
more
than
a
single
factor
is
responsible
\
in
a
sample.
Large
triplet
repeat
expansions
are
usually
detected
with
this
method.
\
for
causing
a
given
trait
or
health
problem.
Genes
cannot
be
the
only
factor.
Marfan
\
The
answer
is
(C)."
syndrome,
on
the
other
hand,
requires
only
one
abnormal
copy
of
the
of
the
Marfan
"
group"
:
"
mmlu_flan_cot_fewshot_other"
gene,
from
one
parent,
to
inherit
the
trait.
The
answer
is
(D).
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_medical_genetics"
Q: A gene showing codominance

(A) has both alleles independently expressed in the heterozygote (B) has one allele dominant to the other (C) has alleles tightly linked on the same chromosome (D) has alleles expressed at the same time in development

A: Let''s think step by step. We refer to Wikipedia articles on medical genetics for help. Codominance, as it relates to genetics, refers to a type of genetic inheritance where the phenotype of both the parents is easily observed in the offspring. A heterozygote is an individual having two different alleles of a gene. The answer is (A).

Q: Large triplet repeat expansions can be detected by:

(A) polymerase chain reaction. (B) single strand conformational polymorphism analysis. (C) Southern blotting. (D) Western blotting.

A: Let''s think step by step. We refer to Wikipedia articles on medical genetics for help. A Southern blot is a method in molecular biology for detecting specific DNA sequences in a sample. Large triplet repeat expansions are usually detected with this method. The answer is (C).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_medical_genetics
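Every worked answer in these prompts closes with a sentence of the form "The answer is (X).", so a completion that imitates the format can be scored by pulling out that letter. The sketch below shows one way to do it with a regular expression. The pattern and the `extract_answer` helper are assumptions made for illustration and are not the filter the harness itself uses.

```python
import re
from typing import Optional

# Assumed pattern: the literal closing sentence with one option letter A-D.
ANSWER_RE = re.compile(r"The answer is \(([A-D])\)")


def extract_answer(completion: str) -> Optional[str]:
    """Return the first option letter announced in the completion, or None."""
    match = ANSWER_RE.search(completion)
    return match.group(1) if match else None


completion = (
    "A Southern blot is a method in molecular biology for detecting "
    "specific DNA sequences in a sample. Large triplet repeat expansions "
    "are usually detected with this method. The answer is (C)."
)
letter = extract_answer(completion)
```

Taking the first match rather than the last is a deliberate choice here: a model may keep generating further question and answer pairs after its own answer, and those later letters should not be scored.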
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_miscellaneous.yaml
dataset_name: miscellaneous
"dataset_name": "miscellaneous"
description: 'The following are multiple choice questions (with answers) about miscellaneous.
"description": "The following are multiple choice questions (with answers) about miscellaneous.
\n\
\n
Q:
Which
of
these
songs
was
a
Top
10
hit
for
the
rock
band
The
Police?
\n
(A)
'Radio
\
\
Ga-Ga'
(B)
'Ob-la-di
Ob-la-da'
(C)
'De
Do
Do
Do
De
Da
Da
Da'
(D)
'In-a-Gadda-Da-Vida'
\n\
Q:
Which
of
these
songs
was
a
Top
10
hit
for
the
rock
band
The
Police?
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
for
help.
Radio
Ga-Ga
is
by
Queen.
\
\
Ob-la-di
Ob-la-da
is
by
The
Beatles.
And
In-a-Gadda-Da-Vida
is
by
Iron
Butterfly.
\
(A)
'
'
Radio
Ga-Ga'
'
(B)
'
'
Ob-la-di
Ob-la-da'
'
(C)
'
'
De
Do
Do
Do
De
Da
Da
Da'
'
(D)
\
Leaving
'De
Do
Do
Do
De
Da
Da
Da'
as
the
only
song
by
The
Police,
and
also
a
Top
\
'
'
In-a-Gadda-Da-Vida'
'
\
10
hit.
The
answer
is
(C).
\n\n
Q:
What
place
is
named
in
the
title
of
the
1979
\
\
live
album
by
rock
legends
Cheap
Trick?
\n
(A)
Budapest
(B)
Budokan
(C)
Bhutan
(D)
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
for
help.
Radio
Ga-Ga
is
by
\
Britain
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
for
help.
Nippon
Budokan
\
Queen.
Ob-la-di
Ob-la-da
is
by
The
Beatles.
And
In-a-Gadda-Da-Vida
is
by
Iron
Butterfly.
\
is
an
indoor
arena
in
Tokyo,
Japan
renowned
for
hosting
rock
music
concerts
including
\
Leaving
'
'
De
Do
Do
Do
De
Da
Da
Da'
'
as
the
only
song
by
The
Police,
and
also
a
Top
\
Cheap
Trick
in
1978.
'Cheap
Trick
at
Budokan'
became
the
name
of
their
album.
\
10
hit.
The
answer
is
(C).
\
The
answer
is
(B).
\n\n
Q:
What
is
produced
during
photosynthesis?
\n
(A)
hydrogen
\
\
(B)
nylon
(C)
oxygen
(D)
light
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
for
help.
Photosynthesis
is
the
process
in
which
green
plants
use
the
green
pigment
\
Q:
What
place
is
named
in
the
title
of
the
1979
live
album
by
rock
legends
Cheap
\
chlorophyll
to
synthesize
foods
with
water
and
carbon
dioxide.
Oxygen
is
the
byproduct
\
Trick?
\
of
this
process.
The
answer
is
(C).
\n\n
Q:
Who
is
the
shortest
man
to
ever
win
\
\
an
NBA
slam
dunk
competition?
\n
(A)
Anthony
'Spud'
Webb
(B)
Michael
'Air'
Jordan
\
(A)
Budapest
(B)
Budokan
(C)
Bhutan
(D)
Britain
\
(C)
Tyrone
'Muggsy'
Bogues
(D)
Julius
'Dr
J'
Erving
\n
A:
Let's
think
step
by
step.
\
\
We
refer
to
Wikipedia
for
help.
In
1986,
Spud
Webb,
standing
only
5'7
\"
became
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
for
help.
Nippon
Budokan
is
\
the
shortest
NBA
player
in
history
to
win
an
official
slam
dunk
contest.
The
answer
\
an
indoor
arena
in
Tokyo,
Japan
renowned
for
hosting
rock
music
concerts
including
\
is
(A).
\n\n
Q:
How
many
axles
does
a
standard
automobile
have?
\n
(A)
one
(B)
two
\
Cheap
Trick
in
1978.
'
'
Cheap
Trick
at
Budokan'
'
became
the
name
of
their
album.
\
(C)
four
(D)
eight
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
for
help.
\
The
answer
is
(B).
\
Most
cars
have
two
axles
to
rotate
the
wheels..
The
answer
is
(B)."
"
group"
:
"
mmlu_flan_cot_fewshot_other"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
Q:
What
is
produced
during
photosynthesis?
"
task"
:
"
mmlu_flan_cot_fewshot_miscellaneous"
(A) hydrogen (B) nylon (C) oxygen (D) light

A: Let''s think step by step. We refer to Wikipedia for help. Photosynthesis is the process in which green plants use the green pigment chlorophyll to synthesize foods with water and carbon dioxide. Oxygen is the byproduct of this process. The answer is (C).

Q: Who is the shortest man to ever win an NBA slam dunk competition?

(A) Anthony ''Spud'' Webb (B) Michael ''Air'' Jordan (C) Tyrone ''Muggsy'' Bogues (D) Julius ''Dr J'' Erving

A: Let''s think step by step. We refer to Wikipedia for help. In 1986, Spud Webb, standing only 5''7" became the shortest NBA player in history to win an official slam dunk contest. The answer is (A).

Q: How many axles does a standard automobile have?

(A) one (B) two (C) four (D) eight

A: Let''s think step by step. We refer to Wikipedia for help. Most cars have two axles to rotate the wheels.. The answer is (B).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_miscellaneous
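The two sides of this diff quote the same text differently. The single-quoted side writes an apostrophe as `''`, while the double-quoted side writes it as a plain `'` and uses backslash escapes such as `\n` and `\"`. In a YAML single-quoted scalar, `''` is the only escape sequence, so the apostrophe handling can be sketched in one line. The helper name `unescape_single_quoted` is hypothetical, and the sketch deliberately ignores YAML line folding.

```python
def unescape_single_quoted(body: str) -> str:
    """Undo the '' escape of a YAML single-quoted scalar body.

    Only the quote escape is handled; line folding (single line breaks
    becoming spaces, blank lines becoming newlines) is out of scope here.
    """
    return body.replace("''", "'")


options = unescape_single_quoted("(A) Anthony ''Spud'' Webb")
height = unescape_single_quoted("standing only 5''7\" became")
```

For real files a YAML parser should do this work; the helper only illustrates why `Let''s` on one side of the diff and `Let's` on the other denote the same string.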
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_moral_disputes.yaml
dataset_name: moral_disputes
"dataset_name": "moral_disputes"
description: 'The following are multiple choice questions (with answers) about moral
"description": "The following are multiple choice questions (with answers) about moral
\
disputes.
\
disputes.
\n\n
Q:
Baron
admits
that
the
versions
of
the
ticking
bomb
hypothetical
\
\
she
discusses
are
\"
stunningly
stupid,
\"
but
she
claims
this
is
actually
evidence
\
\
of
\n
(A)
the
stupidity
of
most
traditional
philosophical
examples.
(B)
a
general
\
Q:
Baron
admits
that
the
versions
of
the
ticking
bomb
hypothetical
she
discusses
\
lack
of
intelligence
among
people
with
advanced
degrees.
(C)
the
wrongness
of
\
are
"stunningly
stupid,"
but
she
claims
this
is
actually
evidence
of
\
torture.
(D)
the
readiness
on
the
part
of
many
intelligent
people
to
see
torture
\
\
as
the
best
solution
to
deal
with
terrorism.
\n
A:
Let's
think
step
by
step.
We
\
(A)
the
stupidity
of
most
traditional
philosophical
examples.
(B)
a
general
lack
\
refer
to
Wikipedia
articles
on
moral
disputes
for
help.
The
ticking
bomb
hypothetical
\
of
intelligence
among
people
with
advanced
degrees.
(C)
the
wrongness
of
torture.
\
poses
a
problem
where
many
people
will
die
to
an
exploding
bomb,
if
the
hypothetical
\
(D)
the
readiness
on
the
part
of
many
intelligent
people
to
see
torture
as
the
best
\
terrorist
does
not
disclose
how
to
defuse
it.
Baron
sees
this
hypothetical
as
\
solution
to
deal
with
terrorism.
\
silly,
but
its
prevalence
does
suggest
intelligent
people,
particularly
utilitarians,
\
\
see
torture
as
justifiable
to
save
the
lives
in
this
scenario.
The
answer
is
(D).
\n\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
moral
disputes
for
\n
Q:
A
fertilized
ovum
is
also
known
as
\n
(A)
a
zygote.
(B)
an
embryo.
(C)
a
viability.
\
help.
The
ticking
bomb
hypothetical
poses
a
problem
where
many
people
will
die
to
\
(D)
a
blastocyst.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
an
exploding
bomb,
if
the
hypothetical
terrorist
does
not
disclose
how
to
defuse
\
on
moral
disputes
for
help.
Once
a
single
sperm
penetrates
the
layers
of
an
egg
\
it.
Baron
sees
this
hypothetical
as
silly,
but
its
prevalence
does
suggest
intelligent
\
to
form
a
new
cell,
that
cell
is
called
a
zygote.
The
answer
is
(A).
\n\n
Q:
Pence
\
people,
particularly
utilitarians,
see
torture
as
justifiable
to
save
the
lives
\
compares
six
different
cases
of
reproduction,
from
natural
twinning
to
SCNT.
What
\
in
this
scenario.
The
answer
is
(D).
\
conclusion
does
he
draw
from
this
comparison?
\n
(A)
SCNT
is
not
a
different
kind
\
\
of
reproduction
because
there
are
no
morally
relevant
differences
between
it
and
\
\
other
permissible
means
of
reproduction.
(B)
Because
there
is
a
low
risk
of
harm
\
Q:
A
fertilized
ovum
is
also
known
as
\
for
natural
twinning,
there
will
be
a
low
risk
of
harm
for
SCNT.
(C)
both
A
and
\
\
B
(D)
neither
A
nor
B
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
(A)
a
zygote.
(B)
an
embryo.
(C)
a
viability.
(D)
a
blastocyst.
\
on
moral
disputes
for
help.
SCNT,
known
as
'somatic
cell
nuclear
transfer',
is
\
\
a
technique
in
which
the
nucleus
of
a
somatic
(body)
cell
is
transferred
to
the
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
moral
disputes
for
\
cytoplasm
of
an
enucleated
egg.
This
is
used
in
farm
animal
cloning.
Pence
may
\
help.
Once
a
single
sperm
penetrates
the
layers
of
an
egg
to
form
a
new
cell,
that
\
compare
this
to
other
processes
of
reproduction,
and
see
little
moral
separation.
\
cell
is
called
a
zygote.
The
answer
is
(A).
\
The
answer
is
(A).
\n\n
Q:
Which
of
the
following
is
an
intrinsic
good,
according
\
\
to
Singer?
\n
(A)
being
in
the
upper
class
(B)
the
ability
to
learn
and
solve
problems
\
\
(C)
being
more
beautiful
than
most
of
the
population
(D)
being
shorter
than
average
\n\
Q:
Pence
compares
six
different
cases
of
reproduction,
from
natural
twinning
to
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
moral
disputes
for
\
SCNT.
What
conclusion
does
he
draw
from
this
comparison?
\
help.
Bioethicist
Peter
Singer
sees
intrinsic
value
as
innate
values
conferred
\
\
by
oneself,
for
oneself.
Innanimate
objects
can
be
beautiful,
short,
or
have
some
\
(A)
SCNT
is
not
a
different
kind
of
reproduction
because
there
are
no
morally
relevant
\
valuable
criteria,
but
capabilities
are
intrinsically
good.
The
answer
is
(B).
\n\
differences
between
it
and
other
permissible
means
of
reproduction.
(B)
Because
\n
Q:
According
to
Metz,
what
is
wrong
with
consequentialist
arguments
against
capital
\
there
is
a
low
risk
of
harm
for
natural
twinning,
there
will
be
a
low
risk
of
harm
\
punishment
based
on
African
values?
\n
(A)
It
is
unclear
as
of
yet
whether
or
not
\
for
SCNT.
(C)
both
A
and
B
(D)
neither
A
nor
B
\
capital
punishment
deters
harm
to
the
community.
(B)
It
is
unclear
as
of
yet
whether
\
\
or
not
capital
punishment
deters
harm
to
any
individuals.
(C)
Consequentialism
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
moral
disputes
for
\
is
not
supported
by
African
values.
(D)
Even
though
consequentialism
is
supported
\
help.
SCNT,
known
as
'
'
somatic
cell
nuclear
transfer'
'
,
is
a
technique
in
which
\
by
African
values,
no
consequentialist
arguments
framed
in
terms
of
African
values
\
the
nucleus
of
a
somatic
(body)
cell
is
transferred
to
the
cytoplasm
of
an
enucleated
\
have
been
offered.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
egg.
This
is
used
in
farm
animal
cloning.
Pence
may
compare
this
to
other
processes
\
on
moral
disputes
for
help.
Thaddeus
Metz
is
a
humanities
research
professor
focusing
\
of
reproduction,
and
see
little
moral
separation.
The
answer
is
(A).
\
on
ethical
philosophy
in
South
Africa.
Metz
has
written
the
death
penalty
is
unjustified
\
\
as
it
treats
individuals
as
incapable
of
communal
relations.
It
is
unclear
that
\
\
capital
punishment
is
to
the
benefit
of,
or
a
deterrent
of
harm
to
the
community.
\
Q:
Which
of
the
following
is
an
intrinsic
good,
according
to
Singer?
\
The
answer
is
(A)."
"
group"
:
"
mmlu_flan_cot_fewshot_humanities"
(A)
being
in
the
upper
class
(B)
the
ability
to
learn
and
solve
problems
(C)
being
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
more
beautiful
than
most
of
the
population
(D)
being
shorter
than
average
"
task"
:
"
mmlu_flan_cot_fewshot_moral_disputes"
A: Let''s think step by step. We refer to Wikipedia articles on moral disputes for help. Bioethicist Peter Singer sees intrinsic value as innate values conferred by oneself, for oneself. Innanimate objects can be beautiful, short, or have some valuable criteria, but capabilities are intrinsically good. The answer is (B).

Q: According to Metz, what is wrong with consequentialist arguments against capital punishment based on African values?

(A) It is unclear as of yet whether or not capital punishment deters harm to the community. (B) It is unclear as of yet whether or not capital punishment deters harm to any individuals. (C) Consequentialism is not supported by African values. (D) Even though consequentialism is supported by African values, no consequentialist arguments framed in terms of African values have been offered.

A: Let''s think step by step. We refer to Wikipedia articles on moral disputes for help. Thaddeus Metz is a humanities research professor focusing on ethical philosophy in South Africa. Metz has written the death penalty is unjustified as it treats individuals as incapable of communal relations. It is unclear that capital punishment is to the benefit of, or a deterrent of harm to the community. The answer is (A).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_moral_disputes
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_moral_scenarios.yaml
dataset_name: moral_scenarios
"dataset_name": "moral_scenarios"
description: 'The following are multiple choice questions (with answers) about moral
"description": "The following are multiple choice questions (with answers) about moral
\
scenarios.
\
scenarios.
\n\n
Q:
For
which
of
these
two
scenarios
does
the
main
character
(who
\
\
uses
I/me/my)
do
something
clearly
morally
wrong,
according
to
ordinary
moral
\
\
standards
in
the
US
as
of
2020?
Scenario
1
|
I
took
his
wallet
after
he
harassed
\
Q:
For
which
of
these
two
scenarios
does
the
main
character
(who
uses
I/me/my)
do
\
me
at
the
bar.
Scenario
2
|
I
made
love
to
my
newborn
baby
boy.
\n
(A)
Wrong,
Wrong
\
something
clearly
morally
wrong,
according
to
ordinary
moral
standards
in
the
US
\
(B)
Wrong,
Not
wrong
(C)
Not
wrong,
Wrong
(D)
Not
wrong,
Not
wrong
\n
A:
Let's
think
\
as
of
2020?
Scenario
1
|
I
took
his
wallet
after
he
harassed
me
at
the
bar.
Scenario
\
step
by
step.
We
refer
to
Wikipedia
articles
on
moral
scenarios
for
help.
Stealing
\
2
|
I
made
love
to
my
newborn
baby
boy.
\
from
someone
is
not
considered
an
ethical
response
to
harrassment.
And
having
\
\
sex
with
a
minor
is
egregiously
unethical.
The
answer
is
(A).
\n\n
Q:
For
which
\
(A)
Wrong,
Wrong
(B)
Wrong,
Not
wrong
(C)
Not
wrong,
Wrong
(D)
Not
wrong,
Not
wrong
\
of
these
two
scenarios
does
the
main
character
(who
uses
I/me/my)
do
something
\
\
clearly
morally
wrong,
according
to
ordinary
moral
standards
in
the
US
as
of
2020?
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
moral
scenarios
\
Scenario
1
|
I
asked
for
a
cigarette
from
the
guy
outside
the
store.
Scenario
\
for
help.
Stealing
from
someone
is
not
considered
an
ethical
response
to
harrassment.
\
2
|
When
the
security
guard
left
his
post,
I
pocketed
a
rare
coin.
\n
(A)
Wrong,
\
And
having
sex
with
a
minor
is
egregiously
unethical.
The
answer
is
(A).
\
Wrong
(B)
Wrong,
Not
wrong
(C)
Not
wrong,
Wrong
(D)
Not
wrong,
Not
wrong
\n
A:
Let's
\
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
moral
scenarios
for
help.
\
\
Asking
for
something
is
not
morally
wrong.
Stealing
is
morally
wrong.
The
answer
\
Q:
For
which
of
these
two
scenarios
does
the
main
character
(who
uses
I/me/my)
do
\
is
(C).
\n\n
Q:
For
which
of
these
two
scenarios
does
the
main
character
(who
uses
\
something
clearly
morally
wrong,
according
to
ordinary
moral
standards
in
the
US
\
I/me/my)
do
something
clearly
morally
wrong,
according
to
ordinary
moral
standards
\
as
of
2020?
Scenario
1
|
I
asked
for
a
cigarette
from
the
guy
outside
the
store.
\
in
the
US
as
of
2020?
Scenario
1
|
During
the
house's
demolition,
I
pushed
him
\
Scenario
2
|
When
the
security
guard
left
his
post,
I
pocketed
a
rare
coin.
\
into
a
wall.
Scenario
2
|
When
I
choose
a
babysitter
for
my
twin
nieces,
I
don't
\
\
care
about
her
racial
background.
\n
(A)
Wrong,
Wrong
(B)
Wrong,
Not
wrong
(C)
Not
\
(A)
Wrong,
Wrong
(B)
Wrong,
Not
wrong
(C)
Not
wrong,
Wrong
(D)
Not
wrong,
Not
wrong
\
wrong,
Wrong
(D)
Not
wrong,
Not
wrong
\n
A:
Let's
think
step
by
step.
We
refer
to
\
\
Wikipedia
articles
on
moral
scenarios
for
help.
Physical
violence
is
morally
wrong.
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
moral
scenarios
\
Being
open
to
hiring
people
from
different
racial
backgrounds
is
an
expected
moral
\
for
help.
Asking
for
something
is
not
morally
wrong.
Stealing
is
morally
wrong.
\
stance.
The
answer
is
(B).
\n\n
Q:
For
which
of
these
two
scenarios
does
the
main
\
The
answer
is
(C).
\
character
(who
uses
I/me/my)
do
something
clearly
morally
wrong,
according
to
\
\
ordinary
moral
standards
in
the
US
as
of
2020?
Scenario
1
|
I
bought
an
old
pick-up
\
\
truck
for
work.
Scenario
2
|
I
looked
up
at
the
gorgeous
waterfall
and
threw
a
\
Q:
For
which
of
these
two
scenarios
does
the
main
character
(who
uses
I/me/my)
do
\
rock
at
it.
\n
(A)
Wrong,
Wrong
(B)
Wrong,
Not
wrong
(C)
Not
wrong,
Wrong
(D)
Not
\
something
clearly
morally
wrong,
according
to
ordinary
moral
standards
in
the
US
\
wrong,
Not
wrong
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
as
of
2020?
Scenario
1
|
During
the
house'
'
s
demolition,
I
pushed
him
into
a
wall.
\
on
moral
scenarios
for
help.
Neither
of
these
are
wrong
as
no
one
is
wronged
by
\
Scenario
2
|
When
I
choose
a
babysitter
for
my
twin
nieces,
I
don'
'
t
care
about
\
buing
a
pick-up
truck
or
throwing
a
rock
into
a
stream.
The
answer
is
(D).
\n\n\
her
racial
background.
Q:
For
which
of
these
two
scenarios
does
the
main
character
(who
uses
I/me/my)
do
\
\
something
clearly
morally
wrong,
according
to
ordinary
moral
standards
in
the
\
(A)
Wrong,
Wrong
(B)
Wrong,
Not
wrong
(C)
Not
wrong,
Wrong
(D)
Not
wrong,
Not
wrong
\
US
as
of
2020?
Scenario
1
|
I
really
loved
my
newborn
baby
boy.
Scenario
2
|
The
\
\
new
girl
is
embarrassed
to
live
in
a
trailer
park,
so
I
invite
the
class
to
her
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
moral
scenarios
\
house.
\n
(A)
Wrong,
Wrong
(B)
Wrong,
Not
wrong
(C)
Not
wrong,
Wrong
(D)
Not
wrong,
\
for
help.
Physical
violence
is
morally
wrong.
Being
open
to
hiring
people
from
different
\
Not
wrong
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
moral
\
racial
backgrounds
is
an
expected
moral
stance.
The
answer
is
(B).
\
scenarios
for
help.
Loving
someone
is
not
wrong.
However,
exposing
something
that
\
\
someone
is
embarrassed
about
could
be
considered
quite
mean.
The
answer
is
(C)."
"
group"
:
"
mmlu_flan_cot_fewshot_humanities"
Q:
For
which
of
these
two
scenarios
does
the
main
character
(who
uses
I/me/my)
do
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
something
clearly
morally
wrong,
according
to
ordinary
moral
standards
in
the
US
"
task"
:
"
mmlu_flan_cot_fewshot_moral_scenarios"
as of 2020? Scenario 1 | I bought an old pick-up truck for work. Scenario 2 | I looked up at the gorgeous waterfall and threw a rock at it.

(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not wrong

A: Let''s think step by step. We refer to Wikipedia articles on moral scenarios for help. Neither of these are wrong as no one is wronged by buing a pick-up truck or throwing a rock into a stream. The answer is (D).

Q: For which of these two scenarios does the main character (who uses I/me/my) do something clearly morally wrong, according to ordinary moral standards in the US as of 2020? Scenario 1 | I really loved my newborn baby boy. Scenario 2 | The new girl is embarrassed to live in a trailer park, so I invite the class to her house.

(A) Wrong, Wrong (B) Wrong, Not wrong (C) Not wrong, Wrong (D) Not wrong, Not wrong

A: Let''s think step by step. We refer to Wikipedia articles on moral scenarios for help. Loving someone is not wrong. However, exposing something that someone is embarrassed about could be considered quite mean. The answer is (C).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_moral_scenarios
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_nutrition.yaml
dataset_name: nutrition
"dataset_name": "nutrition"
description: 'The following are multiple choice questions (with answers) about nutrition.
"description": "The following are multiple choice questions (with answers) about nutrition.
\n\
\n
Q:
What
is
the
first-line
drug
for
patients
with
type
2
diabetes
and
obesity,
\
\
as
of
2020?
\n
(A)
Acarbose
(B)
Metformin
(C)
Sulphonylureas
(D)
Insulin
\n
A:
Let's
\
Q:
What
is
the
first-line
drug
for
patients
with
type
2
diabetes
and
obesity,
as
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
nutrition
for
help.
Metformin
\
of
2020?
\
(Fortamet,
Glumetza,
or
others)
is
usually
the
first
medication
prescribed
for
\
\
type
2
diabetes,
as
well
as
obesity.
It
works
by
lowering
glucose
production
in
\
(A)
Acarbose
(B)
Metformin
(C)
Sulphonylureas
(D)
Insulin
\
the
liver
and
improving
the
body's
sensitivity
to
insulin.
The
answer
is
(B).
\n\
\n
Q:
Which
of
the
following
statements
is
correct
(according
to
knowledge
in
2020)?
\n\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
nutrition
for
help.
(A)
Consumers
with
phenylketonuria
must
avoid
the
consumption
of
the
sweetener
aspartame
\
Metformin
(Fortamet,
Glumetza,
or
others)
is
usually
the
first
medication
prescribed
\
(B)
Consumers
with
phenylketonuria
must
avoid
the
consumption
of
the
sweetener
\
for
type
2
diabetes,
as
well
as
obesity.
It
works
by
lowering
glucose
production
\
saccharin
(C)
Consumers
with
phenylketonuria
must
avoid
the
consumption
of
the
\
in
the
liver
and
improving
the
body'
'
s
sensitivity
to
insulin.
The
answer
is
(B).
\
sweetener
sucralose
(D)
Consumers
with
phenylketonuria
must
avoid
the
consumption
\
\
of
the
sweetener
acesulfame
K
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
nutrition
for
help.
People
with
phenylketonuria
(PKU)
cannot
break
\
Q:
Which
of
the
following
statements
is
correct
(according
to
knowledge
in
2020)?
\
down
the
amino
acid
phenylalanine.
As
it
builds
up
in
the
blood
and
brain
it
can
\
\
lead
to
brain
damage.
People
with
PKU
should
avoid
foods
that
are
converted
to
\
(A)
Consumers
with
phenylketonuria
must
avoid
the
consumption
of
the
sweetener
aspartame
\
phenylalanine
in
the
body,
such
as
aspartame.
The
answer
is
(A).
\n\n
Q:
Which
of
\
(B)
Consumers
with
phenylketonuria
must
avoid
the
consumption
of
the
sweetener
saccharin
\
the
following
statements
about
iodine
is
correct,
as
of
2020?
\n
(A)
50%
of
adults
\
(C)
Consumers
with
phenylketonuria
must
avoid
the
consumption
of
the
sweetener
sucralose
\
consume
iodine
at
levels
below
the
RNI
(B)
Dairy
products
are
a
poor
source
of
\
(D)
Consumers
with
phenylketonuria
must
avoid
the
consumption
of
the
sweetener
acesulfame
\
iodine
(C)
The
iodine
content
of
organic
milk
is
generally
lower
that
the
level
\
K
\
in
non-organic
milk
(D)
UK
dietary
reference
values
recommend
an
increase
in
iodine
\
\
intake
in
pregnancy
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
nutrition
for
help.
\
on
nutrition
for
help.
Organic
milk
usually
has
less
iodine
content
than
non-organic
\
People
with
phenylketonuria
(PKU)
cannot
break
down
the
amino
acid
phenylalanine.
\
milk.
The
answer
is
(C).
\n\n
Q:
Which
of
the
following
is
the
most
plausible
explanation
\
As
it
builds
up
in
the
blood
and
brain
it
can
lead
to
brain
damage.
People
with
\
for
the
protective
effect
of
dietary
fibre
against
cancer
of
the
colon,
as
of
\
PKU
should
avoid
foods
that
are
converted
to
phenylalanine
in
the
body,
such
as
\
2020?
\n
(A)
Propionic
acid,
formed
during
colonic
fibre
fermentation
inhibits
liver
\
aspartame.
The
answer
is
(A).
\
fatty
acid
synthesis
(B)
Butyric
acid,
formed
during
colonic
fibre
fermentation
\
\
stimulates
\"
silencing
\"
of
the
SLC5A8
tumour
suppressor
gene
(C)
None
of
these
\
\
options
are
correct
(D)
Butyric
acid,
formed
during
colonic
fibre
fermentation
\
Q:
Which
of
the
following
statements
about
iodine
is
correct,
as
of
2020?
\
stimulates
anti-oxidant
defences
in
the
colon
\n
A:
Let's
think
step
by
step.
We
\
\
refer
to
Wikipedia
articles
on
nutrition
for
help.
Dietary
fibre
is
inversely
\
(A)
50%
of
adults
consume
iodine
at
levels
below
the
RNI
(B)
Dairy
products
are
\
proportional
to
the
risk
of
colorectal
cancer.
This
is
presumed
because
butyric
\
a
poor
source
of
iodine
(C)
The
iodine
content
of
organic
milk
is
generally
lower
\
acid
(BA)
stimulates
antioxidants
which
help
protect
the
colon
from
cancerous
\
that
the
level
in
non-organic
milk
(D)
UK
dietary
reference
values
recommend
an
\
tumors.
The
answer
is
(D).
\n\n
Q:
In
a
cohort
study,
the
risk
ratio
of
developing
\
increase
in
iodine
intake
in
pregnancy
\
diabetes
was
0.86
when
comparing
consumers
of
tea
(the
exposed)
to
those
who
did
\
\
not
drink
tea
(the
unexposed).
Which
one
statement
is
correct
(according
to
knowledge
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
nutrition
for
help.
\
in
2020)?
\n
(A)
The
tea
drinkers
have
lower
risk
of
developing
diabetes.
(B)
The
\
Organic
milk
usually
has
less
iodine
content
than
non-organic
milk.
The
answer
is
\
tea
drinkers
have
higher
risk
of
developing
diabetes.
(C)
Based
on
the
information
\
(C).
\
given
we
cannot
tell
if
the
observed
difference
in
disease
risk
is
the
result
\
\
of
chance.
(D)
The
risk
ratio
is
close
to
the
value
one,
so
there
is
no
difference
\
\
in
disease
risk
between
the
two
groups.
\n
A:
Let's
think
step
by
step.
We
refer
\
Q:
Which
of
the
following
is
the
most
plausible
explanation
for
the
protective
effect
\
to
Wikipedia
articles
on
nutrition
for
help.
The
risk
ratio
is
not
sufficiently
\
of
dietary
fibre
against
cancer
of
the
colon,
as
of
2020?
\
reduced
that
it
could
not
be
explained
by
random
chance
given
the
studies
sample
\
\
size.
The
answer
is
(C)."
(A)
Propionic
acid,
formed
during
colonic
fibre
fermentation
inhibits
liver
fatty
"group": "mmlu_flan_cot_fewshot_other"
acid
synthesis
(B)
Butyric
acid,
formed
during
colonic
fibre
fermentation
stimulates
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"silencing"
of
the
SLC5A8
tumour
suppressor
gene
(C)
None
of
these
options
are
correct
"task": "mmlu_flan_cot_fewshot_nutrition"
(D)
Butyric
acid,
formed
during
colonic
fibre
fermentation
stimulates
anti-oxidant
defences
in
the
colon
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
nutrition
for
help.
Dietary
fibre
is
inversely
proportional
to
the
risk
of
colorectal
cancer.
This
is
presumed
because
butyric
acid
(BA)
stimulates
antioxidants
which
help
protect
the
colon
from
cancerous
tumors.
The
answer
is
(D).
Q:
In
a
cohort
study,
the
risk
ratio
of
developing
diabetes
was
0.86
when
comparing
consumers
of
tea
(the
exposed)
to
those
who
did
not
drink
tea
(the
unexposed).
Which
one
statement
is
correct
(according
to
knowledge
in
2020)?
(A)
The
tea
drinkers
have
lower
risk
of
developing
diabetes.
(B)
The
tea
drinkers
have
higher
risk
of
developing
diabetes.
(C)
Based
on
the
information
given
we
cannot
tell
if
the
observed
difference
in
disease
risk
is
the
result
of
chance.
(D)
The
risk
ratio
is
close
to
the
value
one,
so
there
is
no
difference
in
disease
risk
between
the
two
groups.
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
nutrition
for
help.
The
risk
ratio
is
not
sufficiently
reduced
that
it
could
not
be
explained
by
random
chance
given
the
studies
sample
size.
The
answer
is
(C).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_nutrition
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_philosophy.yaml
dataset_name: philosophy
"dataset_name": "philosophy"
description
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
philosophy.
\n\
"
description
"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
philosophy.
\n\
\n
Q:
The
study
of
reality
in
the
broadest
sense,
an
inquiry
into
the
elemental
nature
\
\n
Q:
The
study
of
reality
in
the
broadest
sense,
an
inquiry
into
the
elemental
nature
\
\
of
the
universe
and
the
things
in
it,
is
known
as
_____.
\n
(A)
metaphysics
(B)
\
\
of
the
universe
and
the
things
in
it,
is
known
as
_____.
\n
(A)
metaphysics
(B)
\
\
epistemology
(C)
quantum
physics
(D)
axiology
\n
A:
Let's
think
step
by
step.
We
\
\
epistemology
(C)
quantum
physics
(D)
axiology
\n
A:
Let's
think
step
by
step.
We
\
\
refer
to
Wikipedia
articles
on
philosophy
for
help.
Among
the
options,
only
metaphysics
\
\
refer
to
Wikipedia
articles
on
philosophy
for
help.
Among
the
options,
only
metaphysics
\
\
studies
the
nature
of
reality
and
existence.
The
answer
is
(A).
\n\n
Q:
According
\
\
studies
the
nature
of
reality
and
existence.
The
answer
is
(A).
\n\n
Q:
According
\
\
to
Moore
\u2019
s
\u201C
ideal
utilitarianism,
\u201D
the
right
action
is
the
one
\
\
to
Moore
’s
“
ideal
utilitarianism,
”
the
right
action
is
the
one
that
brings
about
\
\
that
brings
about
the
greatest
amount
of:
\n
(A)
pleasure.
(B)
happiness.
(C)
good.
\
\
the
greatest
amount
of:
\n
(A)
pleasure.
(B)
happiness.
(C)
good.
(D)
virtue.
\n
\
\
(D)
virtue.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
philosophy
\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
philosophy
for
help.
\
\
for
help.
Moore's
\"
ideal
utilitarianism
\"
states
that
one's
actions
should
maximize
\
\
Moore's
\"
ideal
utilitarianism
\"
states
that
one's
actions
should
maximize
intrinsic
\
\
intrinsic
goods.
The
answer
is
(C).
\n\n
Q:
Before
Tolstoy's
Christian
conversion,
\
\
goods.
The
answer
is
(C).
\n\n
Q:
Before
Tolstoy's
Christian
conversion,
what
was
\
\
what
was
his
perspective
on
the
meaning
of
life?
\n
(A)
optimist
(B)
satisfied
(C)
\
\
his
perspective
on
the
meaning
of
life?
\n
(A)
optimist
(B)
satisfied
(C)
nominally
\
\
nominally
religious
(D)
pessimist
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
religious
(D)
pessimist
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
\
articles
on
philosophy
for
help.
Before
his
conversion,
Tolstoy
feels
that
life
\
\
on
philosophy
for
help.
Before
his
conversion,
Tolstoy
feels
that
life
was
uncertain,
\
\
was
uncertain,
which
is
a
pessimist's
point
of
view.
The
answer
is
(D).
\n\n
Q:
\
\
which
is
a
pessimist's
point
of
view.
The
answer
is
(D).
\n\n
Q:
According
to
d'Holbach,
\
\
According
to
d'Holbach,
people
always
act
according
to
_____.
\n
(A)
free
choices
\
\
people
always
act
according
to
_____.
\n
(A)
free
choices
(B)
dictates
of
the
soul
\
\
(B)
dictates
of
the
soul
(C)
necessary
natural
laws
(D)
undetermined
will
\n
A:
\
\
(C)
necessary
natural
laws
(D)
undetermined
will
\n
A:
Let's
think
step
by
step.
\
\
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
philosophy
for
help.
\
\
We
refer
to
Wikipedia
articles
on
philosophy
for
help.
d'Holbach
believes
that
\
\
d'Holbach
believes
that
people
act
according
to
necessary
laws,
and
it
proves
\
\
people
act
according
to
necessary
laws,
and
it
proves
nothing
about
people's
free
\
\
nothing
about
people's
free
will.
The
answer
is
(C).
\n\n
Q:
Psychological
egoism
\
\
will.
The
answer
is
(C).
\n\n
Q:
Psychological
egoism
is:
\n
(A)
an
ethical
theory
\
\
is:
\n
(A)
an
ethical
theory
about
how
we
ought
to
behave.
(B)
a
generalization
\
\
about
how
we
ought
to
behave.
(B)
a
generalization
concerning
the
way
people
tend
\
\
concerning
the
way
people
tend
to
behave.
(C)
a
claim
about
human
nature
and
the
\
\
to
behave.
(C)
a
claim
about
human
nature
and
the
ways
people
are
capable
of
behaving.
\
\
ways
people
are
capable
of
behaving.
(D)
none
of
the
above.
\n
A:
Let's
think
step
\
\
(D)
none
of
the
above.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
\
by
step.
We
refer
to
Wikipedia
articles
on
philosophy
for
help.
Psychological
\
\
on
philosophy
for
help.
Psychological
egoism
suggests
that
one
behaves
based
on
\
\
egoism
suggests
that
one
behaves
based
on
what
makes
one
feels
good,
hence
it
\
\
what
makes
one
feels
good,
hence
it
is
a
claim
about
human
nature
and
how
humans
\
\
is
a
claim
about
human
nature
and
how
humans
are
capable
of
behaving.
The
answer
\
\
are
capable
of
behaving.
The
answer
is
(C)."
\
is
(C).
"
"group": "mmlu_flan_cot_fewshot_humanities"
include: _mmlu_flan_cot_fewshot_template_yaml
"include": "_mmlu_flan_cot_fewshot_template_yaml"
task: mmlu_flan_cot_fewshot_philosophy
"task": "mmlu_flan_cot_fewshot_philosophy"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_prehistory.yaml
dataset_name: prehistory
"dataset_name": "prehistory"
description
:
'
The
following
are
multiple
choice
questions
(with
answers)
about
prehistory.
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
prehistory.
\n\
\n
Q:
What
is
the
approximate
mean
cranial
capacity
of
Homo
erectus?
\n
(A)
under
650
\
\
cc
(B)
about
800
cc
(C)
just
under
1000
cc
(D)
1200
cc
\n
A:
Let's
think
step
by
\
Q:
What
is
the
approximate
mean
cranial
capacity
of
Homo
erectus?
\
step.
We
refer
to
Wikipedia
articles
on
prehistory
for
help.
The
average
cranium
\
\
capacity
of
Homo
erectus
is
less
than
1000
cubic
cm.
The
answer
is
(C).
\n\n
Q:
\
(A)
under
650
cc
(B)
about
800
cc
(C)
just
under
1000
cc
(D)
1200
cc
\
According
to
Timothy
Pauketat,
the
evidence
for
social
stratification
and
political
\
\
power
at
Cahokia
suggests:
\n
(A)
a
center
of
Mississippian
civilization
with
conditions
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
prehistory
for
help.
\
similar
to
the
rise
of
early
states.
(B)
the
limitations
of
authority
in
a
Native
\
The
average
cranium
capacity
of
Homo
erectus
is
less
than
1000
cubic
cm.
The
answer
\
American
society
of
egalitarian
foragers.
(C)
a
simple
chiefdom
or
perhaps
a
complex
\
is
(C).
\
chiefdom
had
evolved
by
A.D.
1500.
(D)
a
center
of
Mississippian
civilization
\
\
with
conditions
similar
to
societies
on
the
Northwest
Coast
of
North
America.
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
prehistory
for
help.
\
Q:
According
to
Timothy
Pauketat,
the
evidence
for
social
stratification
and
political
\
Timothy
Pauketat
is
known
for
his
research
on
Cahokia,
the
center
of
the
Mississippian
\
power
at
Cahokia
suggests:
\
culture,
where
he
found
similar
conditions
to
the
rise
of
early
states.
The
answer
\
\
is
(A).
\n\n
Q:
Recent
research
on
hominid
species
dating
from
the
Middle
Pliocene
\
(A)
a
center
of
Mississippian
civilization
with
conditions
similar
to
the
rise
of
\
indicates
there
was
(as
of
2020):
\n
(A)
a
great
amount
of
species
diversity,
or
\
early
states.
(B)
the
limitations
of
authority
in
a
Native
American
society
of
egalitarian
\
a
single
species
that
exhibited
a
lot
of
diversity.
(B)
very
little
species
diversity
\
foragers.
(C)
a
simple
chiefdom
or
perhaps
a
complex
chiefdom
had
evolved
by
A.D.
\
during
this
period
and
very
few
hominids.
(C)
decreased
species
diversity
due
\
1500.
(D)
a
center
of
Mississippian
civilization
with
conditions
similar
to
societies
\
to
a
prolonged
ice
age
followed
by
a
severe
drought.
(D)
decreased
species
diversity
\
on
the
Northwest
Coast
of
North
America.
\
but
increased
numbers
of
hammerstones
and
flakes,
indicating
stone
tool
manufacture.
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
prehistory
for
help.
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
prehistory
for
help.
\
Recent
research
has
recognized
multiple
hominid
species
from
the
Middle
Pliocene,
\
Timothy
Pauketat
is
known
for
his
research
on
Cahokia,
the
center
of
the
Mississippian
\
meaning
that
there
is
a
great
amount
of
species
diversity
or
diversity
in
a
single
\
culture,
where
he
found
similar
conditions
to
the
rise
of
early
states.
The
answer
\
species.
The
answer
is
(A).
\n\n
Q:
Researchers
now
believe
that
the
decline
of
\
is
(A).
\
the
Maya
was
caused
chiefly
by:
\n
(A)
a
cataclysm
of
some
kind,
such
as
an
earthquake,
\
\
volcano,
or
tsunami.
(B)
ecological
degradation
resulting
from
slash-and-burn
\
\
farming
techniques.
(C)
endless
wars
between
neighboring
Mayan
city-states.
(D)
\
Q:
Recent
research
on
hominid
species
dating
from
the
Middle
Pliocene
indicates
\
practices
of
interbreeding
that
led
to
a
steep
rise
in
congenital
disorders.
\n\
there
was
(as
of
2020):
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
prehistory
for
help.
\
\
Researchers
believe
that
the
Maya
collapse
was
mainly
caused
by
over-exploitation
\
(A)
a
great
amount
of
species
diversity,
or
a
single
species
that
exhibited
a
lot
\
of
natural
resources
like
the
slash-and-burn
farming
techniques.
The
answer
is
\
of
diversity.
(B)
very
little
species
diversity
during
this
period
and
very
few
\
(B).
\n\n
Q:
The
great
Mayan
king
Pacal
built
temples
in
the
city
of
Palenque
in
\
hominids.
(C)
decreased
species
diversity
due
to
a
prolonged
ice
age
followed
by
\
order
to:
\n
(A)
satisfy
the
powerful
Mayan
astronomer
priests.
(B)
display
his
\
a
severe
drought.
(D)
decreased
species
diversity
but
increased
numbers
of
hammerstones
\
generosity
to
the
common
people,
since
they
were
allowed
to
live
in
the
temples.
\
and
flakes,
indicating
stone
tool
manufacture.
\
(C)
frighten
away
enemies,
in
particular
the
Spaniards.
(D)
legitimize
his
kingship,
\
\
since
his
father
was
not
royal.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
prehistory
for
help.
\
articles
on
prehistory
for
help.
Pacal
built
the
temples
as
the
funerary
monument
\
Recent
research
has
recognized
multiple
hominid
species
from
the
Middle
Pliocene,
\
to
legitimize
his
kingship.
The
answer
is
(D)."
meaning
that
there
is
a
great
amount
of
species
diversity
or
diversity
in
a
single
"group": "mmlu_flan_cot_fewshot_humanities"
species.
The
answer
is
(A).
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_prehistory"
Q:
Researchers
now
believe
that
the
decline
of
the
Maya
was
caused
chiefly
by:
(A)
a
cataclysm
of
some
kind,
such
as
an
earthquake,
volcano,
or
tsunami.
(B)
ecological
degradation
resulting
from
slash-and-burn
farming
techniques.
(C)
endless
wars
between
neighboring
Mayan
city-states.
(D)
practices
of
interbreeding
that
led
to
a
steep
rise
in
congenital
disorders.
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
prehistory
for
help.
Researchers
believe
that
the
Maya
collapse
was
mainly
caused
by
over-exploitation
of
natural
resources
like
the
slash-and-burn
farming
techniques.
The
answer
is
(B).
Q:
The
great
Mayan
king
Pacal
built
temples
in
the
city
of
Palenque
in
order
to:
(A)
satisfy
the
powerful
Mayan
astronomer
priests.
(B)
display
his
generosity
to
the
common
people,
since
they
were
allowed
to
live
in
the
temples.
(C)
frighten
away
enemies,
in
particular
the
Spaniards.
(D)
legitimize
his
kingship,
since
his
father
was
not
royal.
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
prehistory
for
help.
Pacal
built
the
temples
as
the
funerary
monument
to
legitimize
his
kingship.
The
answer
is
(D).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_prehistory
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_accounting.yaml
dataset_name: professional_accounting
"dataset_name": "professional_accounting"
description
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
professional
\
"
description
"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
professional
\
\
accounting.
\n\n
Q:
An
auditor
traces
the
serial
numbers
on
equipment
to
a
nonissuer
\u2019
\
\
accounting.
\n\n
Q:
An
auditor
traces
the
serial
numbers
on
equipment
to
a
nonissuer
’s
\
s
subledger.
Which
of
the
following
management
assertions
is
supported
by
this
test?
\n\
\
subledger.
Which
of
the
following
management
assertions
is
supported
by
this
test?
\n\
(A)
Valuation
and
allocation
(B)
Completeness
(C)
Rights
and
obligations
(D)
Presentation
\
(A)
Valuation
and
allocation
(B)
Completeness
(C)
Rights
and
obligations
(D)
Presentation
\
\
and
disclosure
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
\
\
and
disclosure
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
\
\
accounting
for
help.
The
completeness
assertion
is
tested
by
tracing
supporting
\
\
accounting
for
help.
The
completeness
assertion
is
tested
by
tracing
supporting
\
...
@@ -43,5 +43,6 @@ description: "The following are multiple choice questions (with answers) about p
\
transactions,
only
Proceeds
from
long-term
debt
belongs
to
the
financing
activities
\
\
transactions,
only
Proceeds
from
long-term
debt
belongs
to
the
financing
activities
\
\
section
of
cashflow,
hence
the
amount
reported
should
be
$100000.
The
answer
is
\
\
section
of
cashflow,
hence
the
amount
reported
should
be
$100000.
The
answer
is
\
\
(D)."
\
(D)."
include: _mmlu_flan_cot_fewshot_template_yaml
"group": "mmlu_flan_cot_fewshot_other"
task: mmlu_flan_cot_fewshot_professional_accounting
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_professional_accounting"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_law.yaml
dataset_name: professional_law
"dataset_name": "professional_law"
description
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
professional
\
"
description
"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
professional
\
\
law.
\n\n
Q:
A
son
owed
a
creditor
$5,000.
The
son's
father
contacted
the
creditor
\
\
law.
\n\n
Q:
A
son
owed
a
creditor
$5,000.
The
son's
father
contacted
the
creditor
\
\
and
told
him
that
he
wanted
to
pay
the
son's
debt.
The
father
signed
a
document
\
\
and
told
him
that
he
wanted
to
pay
the
son's
debt.
The
father
signed
a
document
\
\
that
stated
the
father
would
pay
the
son's
debt
at
a
rate
of
$500
a
month
for
\
\
that
stated
the
father
would
pay
the
son's
debt
at
a
rate
of
$500
a
month
for
\
...
@@ -101,5 +101,6 @@ description: "The following are multiple choice questions (with answers) about p
\
Amendment
further
supports
the
First
Amendment
by
establishing
a
due
process
clause.
\
\
Amendment
further
supports
the
First
Amendment
by
establishing
a
due
process
clause.
\
\
Hence
the
strongest
argument
should
be
the
statute
is
overbroad
and
consequently
\
\
Hence
the
strongest
argument
should
be
the
statute
is
overbroad
and
consequently
\
\
invalid
under
the
First
and
Fourteenth
Amendments.
The
answer
is
(D)."
\
invalid
under
the
First
and
Fourteenth
Amendments.
The
answer
is
(D)."
include: _mmlu_flan_cot_fewshot_template_yaml
"group": "mmlu_flan_cot_fewshot_humanities"
task: mmlu_flan_cot_fewshot_professional_law
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_professional_law"