gaoqiong / lm-evaluation-harness
Commit 574e565a (Unverified)
Authored Nov 10, 2023 by Lintang Sutawika; committed by GitHub, Nov 10, 2023
Merge branch 'big-refactor' into verbosity-rework
Parents: 73f3029c, b7a4ea06
Changes: 498
Showing 20 changed files with 671 additions and 816 deletions
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_medicine.yaml  +43 -43
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_physics.yaml  +44 -70
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_computer_security.yaml  +32 -31
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_conceptual_physics.yaml  +33 -32
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_econometrics.yaml  +57 -57
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_electrical_engineering.yaml  +32 -32
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_elementary_mathematics.yaml  +19 -18
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_formal_logic.yaml  +51 -55
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_global_facts.yaml  +33 -32
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_biology.yaml  +47 -47
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_chemistry.yaml  +5 -4
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_computer_science.yaml  +8 -7
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_european_history.yaml  +23 -22
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_geography.yaml  +37 -63
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_government_and_politics.yaml  +43 -67
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_macroeconomics.yaml  +37 -64
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_mathematics.yaml  +6 -5
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_microeconomics.yaml  +40 -63
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_physics.yaml  +33 -32
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_psychology.yaml  +48 -72
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_medicine.yaml
dataset_name
:
college_medicine
description
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
college
\
"
dataset_name
"
:
"
college_medicine
"
"
description
"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
college
\
\
medicine.
\n\n
Q:
An
expected
side
effect
of
creatine
supplementation
is:
\n
(A)
muscle
\
\
weakness.
(B)
gain
in
body
mass.
(C)
muscle
cramps.
(D)
loss
of
electrolytes.
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
medicine
for
help.
\
...
...
@@ -9,44 +9,44 @@ description: "The following are multiple choice questions (with answers) about c
\
endurance
runners
have
a
high
proportion
of
Type
I
fibres
in
their
leg
muscles
\
\
(C)
Liver
glycogen
is
important
in
the
maintenance
of
the
blood
glucose
concentration
\
\
(D)
Insulin
promotes
glucose
uptake
by
all
tissues
in
the
body
\n
A:
Let's
think
\
\
step
by
step.
We
refer
to
Wikipedia
articles
on
medicine
for
help.
Let
\u2019
s
\
\
solve
this
step
by
step
and
go
over
each
choice:
\n
(A)
\u201C
Muscle
glycogen
is
\
\
broken
down
enzymatically
to
glucose-1-phosphate
\u201D
:
This
is
a
correct
statement.
\n\
(B)
\u201C
Elite
endurance
runners
have
a
high
proportion
of
Type
I
fibres
in
their
\
\
leg
muscles
\u201D
:
This
is
a
correct
statement.
\n
(C)
\u201C
Liver
glycogen
is
important
\
\
in
the
maintenance
of
the
blood
glucose
concentration
\u201D
:
This
is
a
correct
\
\
statement.
\n
(D)
\u201C
Insulin
promotes
glucose
uptake
by
all
tissues
in
the
body
\u201D
\
:
This
is
not
a
correct
statement,
because
insulin
promotes
glucose
uptake
by
the
\
\
liver,
adipose
tissue,
and
muscle,
but
not
all
tissues.
For
instance,
the
tissues
\
\
in
the
brain
and
red
blood
cells
are
not
affected
by
insulin.
The
answer
is
(D).
\n\
\
n
Q:
A
high
school
science
teacher
fills
a
1
liter
bottle
with
pure
nitrogen
and
\
\
seals
the
lid.
The
pressure
is
1.70
atm,
and
the
room
temperature
is
25
\xB0
C.
\
\
Which
two
variables
will
both
increase
the
pressure
of
the
system,
if
all
other
\
\
variables
are
held
constant?
\n
(A)
Increasing
temperature,
increasing
moles
of
\
\
gas
(B)
Increasing
temperature,
increasing
volume
(C)
Decreasing
volume,
decreasing
\
\
temperature
(D)
Decreasing
moles
of
gas,
increasing
volume
\n
A:
Let's
think
step
\
\
by
step.
We
refer
to
Wikipedia
articles
on
medicine
for
help.
The
relevant
equation
\
\
for
this
is
the
ideal
gas
law:
PV=nRT.
To
increase
the
pressure
of
the
system
\
\
(P),
then
either
n
(number
of
moles
of
the
gas)
or
T
(temperature)
have
to
increase.
\
\
The
answer
is
(A).
\n\n
Q:
In
a
genetic
test
of
a
newborn,
a
rare
genetic
disorder
\
\
is
found
that
has
X-linked
recessive
transmission.
Which
of
the
following
statements
\
\
is
likely
true
regarding
the
pedigree
of
this
disorder?
\n
(A)
All
descendants
on
\
\
the
maternal
side
will
have
the
disorder.
(B)
Females
will
be
approximately
twice
\
\
as
affected
as
males
in
this
family.
(C)
All
daughters
of
an
affected
male
will
\
\
be
affected.
(D)
There
will
be
equal
distribution
of
males
and
females
affected.
\n
\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
medicine
for
help.
\
\
Let
\u2019
s
solve
this
step
by
step.
Let's
recall
first
that
females
have
two
X
\
\
chromosomes,
while
males
have
one
X
and
one
Y
chromosome.
This
is
an
important
\
\
fact
we
need
to
know
before
answering
this
question.
\n
Because
a
male
can
only
\
\
pass
his
only
one
X
chromosome
to
a
daughter,
if
he
is
affected
by
this
rare
genetic
\
\
disorder,
then
we
know
for
sure
that
he
will
pass
this
rare
genetic
disorder
to
\
\
all
his
future-born
daughters.
Therefore,
\u201C
(C):
All
daughters
of
an
affected
\
\
male
will
be
affected
\u201D
is
a
correct
statement.
The
answer
is
(C).
\n\n
Q:
Glucose
\
\
is
transported
into
the
muscle
cell:
\n
(A)
via
protein
transporters
called
GLUT4.
\
\
(B)
only
in
the
presence
of
insulin.
(C)
via
hexokinase.
(D)
via
monocarbylic
\
\
acid
transporters.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
\
on
medicine
for
help.
Glucose
(also
known
as
the
blood
sugar)
is
the
main
sugar
\
\
found
in
the
human
body.
It
is
transported
into
the
muscle
cell
via
diffusion
\
\
through
protein
transporters
called
GLUT4.
The
answer
is
(A).
"
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_college_medicine
\
step
by
step.
We
refer
to
Wikipedia
articles
on
medicine
for
help.
Let’s
solve
\
\
this
step
by
step
and
go
over
each
choice:
\n
(A)
“
Muscle
glycogen
is
broken
down
\
\
enzymatically
to
glucose-1-phosphate
”
:
This
is
a
correct
statement.
\n
(B)
“Elite
\
\
endurance
runners
have
a
high
proportion
of
Type
I
fibres
in
their
leg
muscles”:
\
\
This
is
a
correct
statement.
\n
(C)
“
Liver
glycogen
is
important
in
the
maintenance
\
\
of
the
blood
glucose
concentration
”
:
This
is
a
correct
statement.
\n
(D)
“Insulin
\
\
promotes
glucose
uptake
by
all
tissues
in
the
body
”:
This
is
not
a
correct
statement,
\
\
because
insulin
promotes
glucose
uptake
by
the
liver,
adipose
tissue,
and
muscle,
\
\
but
not
all
tissues.
For
instance,
the
tissues
in
the
brain
and
red
blood
cells
\
\
are
not
affected
by
insulin.
The
answer
is
(D).
\n\
n
Q:
A
high
school
science
teacher
\
\
fills
a
1
liter
bottle
with
pure
nitrogen
and
seals
the
lid.
The
pressure
is
1.70
\
\
atm,
and
the
room
temperature
is
25°C.
Which
two
variables
will
both
increase
\
\
the
pressure
of
the
system,
if
all
other
variables
are
held
constant?
\n
(A)
Increasing
\
\
temperature,
increasing
moles
of
gas
(B)
Increasing
temperature,
increasing
volume
\
\
(C)
Decreasing
volume,
decreasing
temperature
(D)
Decreasing
moles
of
gas,
increasing
\
\
volume
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
medicine
\
\
for
help.
The
relevant
equation
for
this
is
the
ideal
gas
law:
PV=nRT.
To
increase
\
\
the
pressure
of
the
system
(P),
then
either
n
(number
of
moles
of
the
gas)
or
\
\
T
(temperature)
have
to
increase.
The
answer
is
(A).
\n\n
Q:
In
a
genetic
test
of
\
\
a
newborn,
a
rare
genetic
disorder
is
found
that
has
X-linked
recessive
transmission.
\
\
Which
of
the
following
statements
is
likely
true
regarding
the
pedigree
of
this
\
\
disorder?
\n
(A)
All
descendants
on
the
maternal
side
will
have
the
disorder.
(B)
\
\
Females
will
be
approximately
twice
as
affected
as
males
in
this
family.
(C)
All
\
\
daughters
of
an
affected
male
will
be
affected.
(D)
There
will
be
equal
distribution
\
\
of
males
and
females
affected.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
medicine
for
help.
Let’s
solve
this
step
by
step.
Let's
recall
first
\
\
that
females
have
two
X
chromosomes,
while
males
have
one
X
and
one
Y
chromosome.
\
\
This
is
an
important
fact
we
need
to
know
before
answering
this
question.
\n
Because
\
\
a
male
can
only
pass
his
only
one
X
chromosome
to
a
daughter,
if
he
is
affected
\
\
by
this
rare
genetic
disorder,
then
we
know
for
sure
that
he
will
pass
this
rare
\
\
genetic
disorder
to
all
his
future-born
daughters.
Therefore,
“(C):
All
daughters
\
\
of
an
affected
male
will
be
affected”
is
a
correct
statement.
The
answer
is
(C).
\n
\
\
n
Q:
Glucose
is
transported
into
the
muscle
cell:
\n
(A)
via
protein
transporters
\
\
called
GLUT4.
(B)
only
in
the
presence
of
insulin.
(C)
via
hexokinase.
(D)
via
\
\
monocarbylic
acid
transporters.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
medicine
for
help.
Glucose
(also
known
as
the
blood
sugar)
is
the
\
\
main
sugar
found
in
the
human
body.
It
is
transported
into
the
muscle
cell
via
\
\
diffusion
through
protein
transporters
called
GLUT4.
The
answer
is
(A)."
"
group"
:
"
mmlu_flan_cot_fewshot_other
"
"
include
"
:
"
_mmlu_flan_cot_fewshot_template_yaml
"
"
task
"
:
"
mmlu_flan_cot_fewshot_college_medicine
"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_physics.yaml
dataset_name
:
college_physics
description
:
'
The
following
are
multiple
choice
questions
(with
answers)
about
college
physics.
Q:
A
refracting
telescope
consists
of
two
converging
lenses
separated
by
100
cm.
The
eye-piece
lens
has
a
focal
length
of
20
cm.
The
angular
magnification
of
the
telescope
is
(A)
4
(B)
5
(C)
6
(D)
20
A:
Let'
'
s
think
step
by
step.
In
a
refracting
telescope,
if
both
lenses
are
converging,
the
focus
of
both
lenses
must
be
between
the
two
lenses,
and
thus
the
focal
lengths
of
the
two
lenses
must
add
up
to
their
separation.
Since
the
focal
length
of
one
lens
is
20
cm,
the
focal
length
of
the
other
must
be
80
cm.
The
magnification
is
the
ratio
of
these
two
focal
lengths,
or
4.
The
answer
is
(A).
Q:
The
muon
decays
with
a
characteristic
lifetime
of
about
10^-6
second
into
an
electron,
a
muon
neutrino,
and
an
electron
antineutrino.
The
muon
is
forbidden
from
decaying
into
an
electron
and
just
a
single
neutrino
by
the
law
of
conservation
of
(A)
charge
(B)
mass
(C)
energy
and
momentum
(D)
lepton
number
A:
Let'
'
s
think
step
by
step.
Lepton
number
must
be
conserved,
meaning
the
total
number
of
leptons
minus
the
number
of
antileptons.
If
a
muon
decays
into
an
electron
and
a
single
neutrino,
the
total
lepton
number
would
go
from
one
to
two,
violating
lepton
number
conservation.
The
answer
is
(D).
Q:
One
end
of
a
Nichrome
wire
of
length
2L
and
cross-sectional
area
A
is
attached
to
an
end
of
another
Nichrome
wire
of
length
L
and
cross-
sectional
area
2A.
If
the
free
end
of
the
longer
wire
is
at
an
electric
potential
of
8.0
volts,
and
the
free
end
of
the
shorter
wire
is
at
an
electric
potential
of
1.0
volt,
the
potential
at
the
junction
of
the
two
wires
is
most
nearly
equal
to
(A)
2.4
V
(B)
3.3
V
(C)
4.5
V
(D)
5.7
V
A:
Let'
'
s
think
step
by
step.
This
is
a
simple
voltage
divider
problem,
where
the
longer
wire
has
a
resistance
four
times
that
of
the
shorter
end.
So
the
voltage
divider
ratio
is
1
/
5,
meaning
that
the
potential
in
the
middle
is
1.0
V
+
(8.0
V
-
1.0
V)
*
1/5
=
2.4
V.
The
answer
is
(A).
Q:
A
refracting
telescope
consists
of
two
converging
lenses
separated
by
100
cm.
The
eye-piece
lens
has
a
focal
length
of
20
cm.
The
angular
magnification
of
the
telescope
is
(A)
4
(B)
5
(C)
6
(D)
20
A:
Let'
'
s
think
step
by
step.
In
a
refracting
telescope,
if
both
lenses
are
converging,
the
focus
of
both
lenses
must
be
between
the
two
lenses,
and
thus
the
focal
lengths
of
the
two
lenses
must
add
up
to
their
separation.
Since
the
focal
length
of
one
lens
is
20
cm,
the
focal
length
of
the
other
must
be
80
cm.
The
magnification
is
the
ratio
of
these
two
focal
lengths,
or
4.
The
answer
is
(A).
Q:
For
which
of
the
following
thermodynamic
processes
is
the
increase
in
the
internal
energy
of
an
ideal
gas
equal
to
the
heat
added
to
the
gas?
(A)
Constant
temperature
(B)
Constant
volume
(C)
Constant
pressure
(D)
Adiabatic
A:
Let'
'
s
think
step
by
step.
Heat
added
to
the
gas
can
go
into
the
gases
internal
energy
or
work
done
against
an
external
force.
However,
if
the
volume
of
the
gas
container
is
constant,
no
work
will
be
done
(since
work
is
pressure
times
change
in
volume).
So,
at
constant
volume,
all
of
the
heat
goes
into
the
internal
energy.
The
answer
is
(B).'
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_college_physics
"
dataset_name"
:
"
college_physics"
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
college
\
\
physics.
\n\n
Q:
A
refracting
telescope
consists
of
two
converging
lenses
separated
\
\
by
100
cm.
The
eye-piece
lens
has
a
focal
length
of
20
cm.
The
angular
magnification
\
\
of
the
telescope
is
\n
(A)
4
(B)
5
(C)
6
(D)
20
\n
A:
Let's
think
step
by
step.
In
\
\
a
refracting
telescope,
if
both
lenses
are
converging,
the
focus
of
both
lenses
\
\
must
be
between
the
two
lenses,
and
thus
the
focal
lengths
of
the
two
lenses
must
\
\
add
up
to
their
separation.
Since
the
focal
length
of
one
lens
is
20
cm,
the
focal
\
\
length
of
the
other
must
be
80
cm.
The
magnification
is
the
ratio
of
these
two
\
\
focal
lengths,
or
4.
The
answer
is
(A).
\n\n
Q:
The
muon
decays
with
a
characteristic
\
\
lifetime
of
about
10^-6
second
into
an
electron,
a
muon
neutrino,
and
an
electron
\
\
antineutrino.
The
muon
is
forbidden
from
decaying
into
an
electron
and
just
a
\
\
single
neutrino
by
the
law
of
conservation
of
\n
(A)
charge
(B)
mass
(C)
energy
\
\
and
momentum
(D)
lepton
number
\n
A:
Let's
think
step
by
step.
Lepton
number
must
\
\
be
conserved,
meaning
the
total
number
of
leptons
minus
the
number
of
antileptons.
\
\
If
a
muon
decays
into
an
electron
and
a
single
neutrino,
the
total
lepton
number
\
\
would
go
from
one
to
two,
violating
lepton
number
conservation.
The
answer
is
\
\
(D).
\n\n
Q:
One
end
of
a
Nichrome
wire
of
length
2L
and
cross-sectional
area
A
\
\
is
attached
to
an
end
of
another
Nichrome
wire
of
length
L
and
cross-
sectional
\
\
area
2A.
If
the
free
end
of
the
longer
wire
is
at
an
electric
potential
of
8.0
\
\
volts,
and
the
free
end
of
the
shorter
wire
is
at
an
electric
potential
of
1.0
\
\
volt,
the
potential
at
the
junction
of
the
two
wires
is
most
nearly
equal
to
\n\
(A)
2.4
V
(B)
3.3
V
(C)
4.5
V
(D)
5.7
V
\n
A:
Let's
think
step
by
step.
This
is
a
\
\
simple
voltage
divider
problem,
where
the
longer
wire
has
a
resistance
four
times
\
\
that
of
the
shorter
end.
So
the
voltage
divider
ratio
is
1
/
5,
meaning
that
the
\
\
potential
in
the
middle
is
1.0
V
+
(8.0
V
-
1.0
V)
*
1/5
=
2.4
V.
The
answer
is
\
\
(A).
\n\n
Q:
A
refracting
telescope
consists
of
two
converging
lenses
separated
\
\
by
100
cm.
The
eye-piece
lens
has
a
focal
length
of
20
cm.
The
angular
magnification
\
\
of
the
telescope
is
\n
(A)
4
(B)
5
(C)
6
(D)
20
\n
A:
Let's
think
step
by
step.
In
\
\
a
refracting
telescope,
if
both
lenses
are
converging,
the
focus
of
both
lenses
\
\
must
be
between
the
two
lenses,
and
thus
the
focal
lengths
of
the
two
lenses
must
\
\
add
up
to
their
separation.
Since
the
focal
length
of
one
lens
is
20
cm,
the
focal
\
\
length
of
the
other
must
be
80
cm.
The
magnification
is
the
ratio
of
these
two
\
\
focal
lengths,
or
4.
The
answer
is
(A).
\n\n
Q:
For
which
of
the
following
thermodynamic
\
\
processes
is
the
increase
in
the
internal
energy
of
an
ideal
gas
equal
to
the
\
\
heat
added
to
the
gas?
\n
(A)
Constant
temperature
(B)
Constant
volume
(C)
Constant
\
\
pressure
(D)
Adiabatic
\n
A:
Let's
think
step
by
step.
Heat
added
to
the
gas
can
\
\
go
into
the
gases
internal
energy
or
work
done
against
an
external
force.
However,
\
\
if
the
volume
of
the
gas
container
is
constant,
no
work
will
be
done
(since
work
\
\
is
pressure
times
change
in
volume).
So,
at
constant
volume,
all
of
the
heat
goes
\
\
into
the
internal
energy.
The
answer
is
(B)."
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_college_physics"
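The Nichrome-wire few-shot answer in the description above relies on a voltage-divider argument. The following is a minimal Python sketch of that arithmetic, not part of the commit. The helper names `wire_resistance` and `junction_potential` are hypothetical, and the resistivity is set to 1.0 on the assumption that it cancels out because both wires are the same material.

```python
# Hypothetical helpers for checking the voltage-divider reasoning.
# Units are arbitrary: only the ratio of the two resistances matters.

def wire_resistance(length: float, area: float, resistivity: float = 1.0) -> float:
    """Resistance of a uniform wire: R = rho * L / A."""
    return resistivity * length / area


def junction_potential(v_long_end: float, v_short_end: float,
                       r_long: float, r_short: float) -> float:
    """Potential at the junction of two resistors connected in series."""
    current = (v_long_end - v_short_end) / (r_long + r_short)
    # Starting from the short wire's free end, add the drop across the short wire.
    return v_short_end + current * r_short


r_long = wire_resistance(length=2.0, area=1.0)   # length 2L, area A
r_short = wire_resistance(length=1.0, area=2.0)  # length L, area 2A
ratio = r_long / r_short
v_junction = junction_potential(8.0, 1.0, r_long, r_short)
print(ratio, round(v_junction, 2))
```

Under these assumptions the resistance ratio is 4, so the short wire carries one fifth of the 7.0 V difference, which agrees with the 2.4 V stated in the few-shot answer.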
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_computer_security.yaml
dataset_name
:
computer_security
description
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
computer
\
"
dataset_name
"
:
"
computer_security
"
"
description
"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
computer
\
\
security.
\n\n
Q:
SHA-1
has
a
message
digest
of
\n
(A)
160
bits
(B)
512
bits
(C)
628
\
\
bits
(D)
820
bits
\n
A:
Let's
think
step
by
step.
Since
SHA-1
is
a
hash
function
\
\
which
takes
an
input
and
produces
a
160-bit
(20-byte)
hash
value,
its
message
\
\
digest
is
160
bits.
The
answer
is
(A).
\n\n
Q:
_____________
can
modify
data
on
\
\
your
system
\u2013
so
that
your
system
doesn
\u2019
t
run
correctly
or
you
can
no
\
\
longer
access
specific
data,
or
it
may
even
ask
for
ransom
in
order
to
give
your
\
\
access.
\n
(A)
IM
\u2013
Trojans
(B)
Backdoor
Trojans
(C)
Trojan-Downloader
(D)
\
\
Ransom
Trojan
\n
A:
Let's
think
step
by
step.
The
system
is
asking
for
trojans,
\
\
which
are
for
ransom,
which
means
ransom
trojan.
The
answer
is
(D).
\n\n
Q:
What
\
\
is
ethical
hacking?
\n
(A)
\"
Hacking
\"
ethics
so
they
justify
unintended
selfish
\
\
behavior
(B)
Hacking
systems
(e.g.,
during
penetration
testing)
to
expose
vulnerabilities
\
\
so
they
can
be
fixed,
rather
than
exploited
(C)
Hacking
into
systems
run
by
those
\
\
whose
ethics
you
disagree
with
(D)
A
slang
term
for
rapid
software
development,
\
\
e.g.,
as
part
of
hackathons
\n
A:
Let's
think
step
by
step.
Ethical
hacking
is
a
\
\
process
of
detecting
vulnerabilities
in
an
application,
system,
or
organization's
\
\
infrastructure
that
an
attacker
can
use
to
exploit
an
individual
or
organization.
\
\
They
use
this
process
to
prevent
cyberattacks
and
security
breaches
by
lawfully
\
\
hacking
into
the
systems
and
looking
for
weak
points.
The
answer
is
(B).
\n\n
Q:
\
\
The
____________
is
anything
which
your
search
engine
cannot
search.
\n
(A)
Haunted
\
\
web
(B)
World
Wide
Web
(C)
Surface
web
(D)
Deep
Web
\n
A:
Let's
think
step
by
step.
\
\
The
search
engine
searches
on
the
Surface
Web,
which
is
the
portion
of
the
world
\
\
wide
web
which
is
visible
so
(B,C)
are
wrong.
The
Haunted
Web
doesn
\u2019
t
correspond
\
\
to
an
internet
concept.
The
Deep
Web
is
the
part
of
the
World
Wide
Web
which
is
\
\
not
indexed.
The
answer
is
(D).
\n\n
Q:
Exploitation
of
the
Heartbleed
bug
permits
\n\
(A)
overwriting
cryptographic
keys
in
memory
(B)
a
kind
of
code
injection
(C)
a
\
\
read
outside
bounds
of
a
buffer
(D)
a
format
string
attack
\n
A:
Let's
think
step
\
\
by
step.
The
Heartbleed
Bug
is
a
serious
vulnerability
in
the
popular
OpenSSL
\
\
cryptographic
software
library.
Heartbleed
resulted
from
improper
input
validation
\
\
(due
to
a
missing
bounds
check)
in
the
implementation
of
the
TLS
heartbeat
extension.
\
\
The
vulnerability
was
classified
as
a
buffer
over-read,
a
situation
where
more
\
\
data
can
be
read
than
should
be
allowed.
The
answer
is
(C)."
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_computer_security
\
your
system
–
so
that
your
system
doesn’t
run
correctly
or
you
can
no
longer
access
\
\
specific
data,
or
it
may
even
ask
for
ransom
in
order
to
give
your
access.
\n
(A)
\
\
IM
–
Trojans
(B)
Backdoor
Trojans
(C)
Trojan-Downloader
(D)
Ransom
Trojan
\n
A:
\
\
Let's
think
step
by
step.
The
system
is
asking
for
trojans,
which
are
for
ransom,
\
\
which
means
ransom
trojan.
The
answer
is
(D).
\n\n
Q:
What
is
ethical
hacking?
\n\
(A)
\"
Hacking
\"
ethics
so
they
justify
unintended
selfish
behavior
(B)
Hacking
systems
\
\
(e.g.,
during
penetration
testing)
to
expose
vulnerabilities
so
they
can
be
fixed,
\
\
rather
than
exploited
(C)
Hacking
into
systems
run
by
those
whose
ethics
you
disagree
\
\
with
(D)
A
slang
term
for
rapid
software
development,
e.g.,
as
part
of
hackathons
\n\
A:
Let's
think
step
by
step.
Ethical
hacking
is
a
process
of
detecting
vulnerabilities
\
\
in
an
application,
system,
or
organization's
infrastructure
that
an
attacker
can
\
\
use
to
exploit
an
individual
or
organization.
They
use
this
process
to
prevent
\
\
cyberattacks
and
security
breaches
by
lawfully
hacking
into
the
systems
and
looking
\
\
for
weak
points.
The
answer
is
(B).
\n\n
Q:
The
____________
is
anything
which
your
\
\
search
engine
cannot
search.
\n
(A)
Haunted
web
(B)
World
Wide
Web
(C)
Surface
web
\
\
(D)
Deep
Web
\n
A:
Let's
think
step
by
step.
The
search
engine
searches
on
the
Surface
\
\
Web,
which
is
the
portion
of
the
world
wide
web
which
is
visible
so
(B,C)
are
\
\
wrong.
The
Haunted
Web
doesn’t
correspond
to
an
internet
concept.
The
Deep
Web
\
\
is
the
part
of
the
World
Wide
Web
which
is
not
indexed.
The
answer
is
(D).
\n\n\
Q:
Exploitation
of
the
Heartbleed
bug
permits
\n
(A)
overwriting
cryptographic
keys
\
\
in
memory
(B)
a
kind
of
code
injection
(C)
a
read
outside
bounds
of
a
buffer
(D)
\
\
a
format
string
attack
\n
A:
Let's
think
step
by
step.
The
Heartbleed
Bug
is
a
serious
\
\
vulnerability
in
the
popular
OpenSSL
cryptographic
software
library.
Heartbleed
\
\
resulted
from
improper
input
validation
(due
to
a
missing
bounds
check)
in
the
\
\
implementation
of
the
TLS
heartbeat
extension.
The
vulnerability
was
classified
\
\
as
a
buffer
over-read,
a
situation
where
more
data
can
be
read
than
should
be
\
\
allowed.
The
answer
is
(C)."
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_computer_security"
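The first few-shot answer in the description above states that SHA-1 produces a 160-bit (20-byte) digest. As a quick illustration, not part of the commit, this can be checked with Python's standard `hashlib` module; the input bytes are an arbitrary example.

```python
import hashlib

# Hash an arbitrary example message; the digest length does not depend on the input.
digest = hashlib.sha1(b"example message").digest()

digest_bytes = len(digest)
digest_bits = digest_bytes * 8
print(digest_bytes, digest_bits)
```

The same value is also exposed as `hashlib.sha1().digest_size`, which reports the digest length in bytes.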
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_conceptual_physics.yaml
dataset_name
:
conceptual_physics
description
:
"
\n
The
following
are
multiple
choice
questions
(with
answers)
about
conceptual
\
\
physics.
\n\n
Q:
Colors
in
a
soap
bubble
result
from
light
\n
(A)
converted
to
a
different
\
\
frequency
(B)
deflection
(C)
interference
(D)
polarization
\n
A:
Let's
think
step
\
\
by
step.
In
a
soap
bubble
film,
the
light
bounces
between
the
two
soap-air
interfaces
\
\
many
times,
interfering
with
itself
constructively
or
destructively
depending
\
\
on
the
width
of
the
film.
This
results
in
different
colors
being
visible.
The
\
\
answer
is
(C).
\n\n
Q:
Compared
with
the
mass
of
a
uranium
atom
undergoing
fission,
\
\
the
combined
masses
of
the
products
after
fission
are
\n
(A)
less
(B)
more
(C)
the
\
\
same
(D)
zero
\n
A:
Let's
think
step
by
step.
Fission
releases
energy,
which
comes
\
\
from
the
rest
mass
of
its
initial
nucleus.
Thus
the
mass
of
the
products
is
less
\
\
than
the
mass
of
the
reactant
uranium
nucleus.
The
answer
is
(A).
\n\n
Q:
Things
\
\
that
are
equivalent
according
to
the
equivalence
principle
are
\n
(A)
space
and
\
\
time.
(B)
a
traveling
twin
and
a
stay-at-home
twin.
(C)
gravity
and
acceleration.
\
\
(D)
mass
and
energy.
\n
A:
Let's
think
step
by
step.
Einstein
\u2019
s
famous
equivalence
\
\
principle
states
that
gravity
and
acceleration
are
equivalent.
The
answer
is
(C).
\n\
\n
Q:
Which
of
these
three
elements
has
the
most
mass
per
nucleon?
\n
(A)
Hydrogen
\
\
(B)
Iron
(C)
Uranium
(D)
Same
in
each
\n
A:
Let's
think
step
by
step.
Due
to
nuclear
\
\
binding
energy,
the
mass
of
an
atomic
nucleus
is
less
than
the
sum
of
individual
\
\
masses
of
the
free
constituent
protons
and
neutrons;
this
is
known
as
the
mass
\
\
defect.
Hydrogen
has
no
mass
defect
because
it
has
only
a
single
nucleon,
so
it
\
\
will
have
the
most
mass
per
nucleon.
The
answer
is
(A).
\n\n
Q:
A
model
airplane
\
\
flies
slower
when
flying
into
the
wind
and
faster
with
wind
at
its
back.
When
\
\
launched
at
right
angles
to
the
wind
a
cross
wind
its
groundspeed
compared
with
\
\
flying
in
still
air
is
\n
(A)
the
same
(B)
greater
(C)
less
(D)
either
greater
or
\
\
less
depending
on
wind
speed
\n
A:
Let's
think
step
by
step.
The
plane
\u2019
s
speed
\
\
in
the
direction
of
the
wind
is
greater
than
it
would
be
in
the
absence
of
wind,
\
\
and
its
direction
orthogonal
to
the
wind
is
the
same
as
it
would
be
in
the
absence
\
\
of
the
wind.
The
total
speed,
which
is
these
two
components
added
in
quadrature,
\
\
is
thus
greater
than
the
speed
in
still
air.
The
answer
is
(B)."
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_conceptual_physics
"
dataset_name"
:
"
conceptual_physics"
"
description"
:
"
\n
The
following
are
multiple
choice
questions
(with
answers)
about
\
\
conceptual
physics.
\n\n
Q:
Colors
in
a
soap
bubble
result
from
light
\n
(A)
converted
\
\
to
a
different
frequency
(B)
deflection
(C)
interference
(D)
polarization
\n
A:
\
\
Let's
think
step
by
step.
In
a
soap
bubble
film,
the
light
bounces
between
the
\
\
two
soap-air
interfaces
many
times,
interfering
with
itself
constructively
or
\
\
destructively
depending
on
the
width
of
the
film.
This
results
in
different
colors
\
\
being
visible.
The
answer
is
(C).
\n\n
Q:
Compared
with
the
mass
of
a
uranium
atom
\
\
undergoing
fission,
the
combined
masses
of
the
products
after
fission
are
\n
(A)
\
\
less
(B)
more
(C)
the
same
(D)
zero
\n
A:
Let's
think
step
by
step.
Fission
releases
\
\
energy,
which
comes
from
the
rest
mass
of
its
initial
nucleus.
Thus
the
mass
of
\
\
the
products
is
less
than
the
mass
of
the
reactant
uranium
nucleus.
The
answer
\
\
is
(A).
\n\n
Q:
Things
that
are
equivalent
according
to
the
equivalence
principle
\
\
are
\n
(A)
space
and
time.
(B)
a
traveling
twin
and
a
stay-at-home
twin.
(C)
gravity
\
\
and
acceleration.
(D)
mass
and
energy.
\n
A:
Let's
think
step
by
step.
Einstein’s
\
\
famous
equivalence
principle
states
that
gravity
and
acceleration
are
equivalent.
\
\
The
answer
is
(C).
\n\n
Q:
Which
of
these
three
elements
has
the
most
mass
per
nucleon?
\n\
(A)
Hydrogen
(B)
Iron
(C)
Uranium
(D)
Same
in
each
\n
A:
Let's
think
step
by
step.
\
\
Due
to
nuclear
binding
energy,
the
mass
of
an
atomic
nucleus
is
less
than
the
\
\
sum
of
individual
masses
of
the
free
constituent
protons
and
neutrons;
this
is
\
\
known
as
the
mass
defect.
Hydrogen
has
no
mass
defect
because
it
has
only
a
single
\
\
nucleon,
so
it
will
have
the
most
mass
per
nucleon.
The
answer
is
(A).
\n\n
Q:
A
\
\
model
airplane
flies
slower
when
flying
into
the
wind
and
faster
with
wind
at
\
\
its
back.
When
launched
at
right
angles
to
the
wind
a
cross
wind
its
groundspeed
\
\
compared
with
flying
in
still
air
is
\n
(A)
the
same
(B)
greater
(C)
less
(D)
either
\
\
greater
or
less
depending
on
wind
speed
\n
A:
Let's
think
step
by
step.
The
plane’s
\
\
speed
in
the
direction
of
the
wind
is
greater
than
it
would
be
in
the
absence
\
\
of
wind,
and
its
direction
orthogonal
to
the
wind
is
the
same
as
it
would
be
in
\
\
the
absence
of
the
wind.
The
total
speed,
which
is
these
two
components
added
\
\
in
quadrature,
is
thus
greater
than
the
speed
in
still
air.
The
answer
is
(B)."
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_conceptual_physics"
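The model-airplane few-shot answer in the description above adds two perpendicular velocity components "in quadrature". A minimal Python sketch of that step follows, not part of the commit. The function name `groundspeed_crosswind` is hypothetical, and the sketch assumes, as the few-shot explanation does, that the plane keeps its full airspeed at right angles to the wind and is simply carried sideways by it.

```python
import math


def groundspeed_crosswind(airspeed: float, wind_speed: float) -> float:
    """Groundspeed when the airspeed vector is perpendicular to the wind.

    The two components are orthogonal, so they combine as
    sqrt(airspeed**2 + wind_speed**2).
    """
    return math.hypot(airspeed, wind_speed)


still_air = groundspeed_crosswind(10.0, 0.0)
with_crosswind = groundspeed_crosswind(10.0, 2.0)
print(still_air, with_crosswind, with_crosswind > still_air)
```

For any nonzero wind speed the result exceeds the still-air speed, which is the reasoning behind answer (B).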
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_econometrics.yaml
dataset_name
:
econometrics
description
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
econometrics.
\n\
"
dataset_name
"
:
"
econometrics
"
"
description
"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
econometrics.
\n\
\n
Q:
Suppose
now
that
a
researcher
wishes
to
use
information
criteria
to
determine
\
\
the
optimal
lag
length
for
a
VAR.
500
observations
are
available
for
the
bi-variate
\
\
VAR,
and
the
values
of
the
determinant
of
the
variance-covariance
matrix
of
residuals
\
\
are
0.0336,
0.0169,
0.0084,
and
0.0062
for
1,
2,
3,
and
4
lags
respectively.
What
\
\
is
the
optimal
model
order
according
to
Akaike's
information
criterion?
\n
(A)
1
\
\
lag
(B)
2
lags
(C)
3
lags
(D)
4
lags
\n
A:
Let's
think
step
by
step.
We
refer
to
\
\
Wikipedia
articles
on
econometrics
for
help.
Let
\u2019
s
solve
this
problem
step
\
\
by
step.
First
of
all,
let
\u2019
s
recall
that
for
a
given
set
of
data,
Akaike's
\
\
information
criterion
(AIC)
allows
us
to
measure
how
well
a
statistical
model
\
\
fits
the
data;
it
is
an
estimator
of
prediction
error.
Here
in
this
problem
we
\
\
will
need
to
use
the
formula
ln(det(sigma_hat))
+
(2
*
k
/
T)
to
determine
the
\
\
values
of
Akaike
\u2019
s
criterion,
where
ln
denotes
the
natural
log
function,
\
\
det
the
determinant
function,
k
the
total
number
of
parameters
in
total
(across
\
\
both
equations),
and
T
the
number
of
observations
(which,
in
this
case,
is
equal
\
\
to
500).
For
1
lag,
the
number
of
parameters
in
total
is
equal
to
6;
for
2
lags,
\
\
it
is
10;
for
3
lags,
it
is
14;
and
for
4
lags,
it
is
18.
Now,
let
\u2019
s
calculate
\
\
the
values
of
the
criterion
for
each
lag:
\n
(A)
1
lag:
ln(0.0336)
+
(2
*
6
/
500)
\
\
=
ln(0.0336)
+
(12
/
500)
=
-3.369
\n
(B)
2
lags:
ln(0.0169)
+
(2
*
10
/
500)
=
\
\
ln(0.0169)
+
(20
/
500)
=
-4.040
\n
(C)
3
lags:
ln(0.0084)
+
(2
*
14
/
500)
=
ln(0.0084)
\
\
+
(28
/
500)
=-4.724
\n
(D)
4
lags:
ln(0.0062)
+
(2
*
18
/
500)
=
ln(0.0062)
+
(36
\
\
/
500)
=-5.011
\n
Because
the
optimal
model
order
according
to
AIC
minimizes
the
\
\
information
criterion,
the
answer
should
be
the
one
with
the
lowest
value.
In
\
\
this
case,
(D)
has
the
lowest
value.
The
answer
is
(C).
\n\n
Q:
Consider
the
following
\
\
AR(1)
model
with
the
disturbances
having
zero
mean
and
unit
variance
\n
yt
=
0.2
\
\
+
0.4
yt-1
+
ut
\n
The
(unconditional)
mean
of
y
will
be
given
by
\n
(A)
0.2
(B)
0.4
\
\
(C)
0.5
(D)
0.33
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
\
on
econometrics
for
help.
Let
\u2019
s
solve
this
problem
step
by
step.
If
we
have
\
\
a
an
AR(1)
model
with
the
disturbances
having
zero
mean
and
unit
variance,
then
\
\
the
unconditional
mean
of
y
is
equal
to
the
following:
\n
unconditional
mean
of
\
\
y
=
(the
intercept
term)
/
(1
-
autoregressive
coefficient)
\n
We
know
that
the
\
\
intercept
term
is
0.2
and
the
autoregressive
coefficient
is
0.4;
thus,
we
have:
\n\
unconditional
mean
of
y
=
(0.2)
/
(1
-
0.4)
=
(0.2)
/
(0.6)
=
2
/
6
=
1
/
3,
which
\
\
is
approximately
0.33.
That
means
that
the
answer
should
be
(D)
0.33.
The
answer
\
\
is
(D).
\n\n
Q:
What
would
be
the
consequences
for
the
OLS
estimator
if
heteroscedasticity
\
\
is
present
in
a
regression
model
but
ignored?
\n
(A)
It
will
be
biased
(B)
It
will
\
\
be
inconsistent
(C)
It
will
be
inefficient
(D)
All
of
(a),
(b)
and
(c)
will
be
\
\
true.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
econometrics
\
\
for
help.
Heteroscedasticity
refers
to
the
condition
where
the
variance
of
the
\
\
error
terms
is
not
constant
across
multiple
observations.
If
heteroscedasticity
\
\
is
present
in
a
regression
model,
then
the
coefficient
estimates
in
the
OLS
estimator
\
\
will
be
not
only
unbiased
and
consistent
but
also
inefficient.
Because
(A)
and
\
\
(B)
are
incorrect
choices
and
(C)
is
a
correct
choice,
(D)
cannot
be
the
right
\
\
answer.
Ultimately,
(C)
is
the
only
true
choice.
The
answer
is
(C).
\n\n
Q:
Suppose
\
\
that
a
test
statistic
has
associated
with
it
a
p-value
of
0.08.
Which
one
of
the
\
\
following
statements
is
true?
\n
(i)
If
the
size
of
the
test
were
exactly
8%,
we
\
\
would
be
indifferent
between
rejecting
and
not
rejecting
the
null
hypothesis
\n\
(ii)
The
null
would
be
rejected
if
a
10%
size
of
test
were
used
\n
(iii)
The
null
\
\
would
not
be
rejected
if
a
1%
size
of
test
were
used
\n
(iv)
The
null
would
be
rejected
\
\
if
a
5%
size
of
test
were
used.
\n
(A)
(ii)
and
(iv)
only
(B)
(i)
and
(iii)
only
\
\
(C)
(i),
(ii),
and
(iii)
only
(D)
(i),
(ii),
(iii),
and
(iv).
\n
A:
Let's
think
\
\
step
by
step.
We
refer
to
Wikipedia
articles
on
econometrics
for
help.
Let
\u2019
\
s
reason
about
each
of
the
options
.
\n
(i)
is
a
true
statement.
\n
(ii)
is
a
true
statement.
\n\
(iii)
is
a
true
statement.
\n
(iv)
is
not
a
true
statement.
Thus,
(i),
(ii),
and
(iii)
\
\
are
true.
The
answer
is
(C).
\n\n
Q:
For
a
stationary
autoregressive
process,
shocks
\
\
will
\n
(A)
Eventually
die
away
(B)
Persist
indefinitely
(C)
Grow
exponentially
\
\
(D)
Never
occur
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
\
\
econometrics
for
help.
This
is
a
formal
logic
problem
about
stationary
processes.
\
\
For
a
stationary
autoregressive
process,
shocks
will
eventually
die
away.
The
\
\
answer
is
(A).
"
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_econometrics
\
Wikipedia
articles
on
econometrics
for
help.
Let
’
s
solve
this
problem
step
by
\
\
step.
First
of
all,
let
’
s
recall
that
for
a
given
set
of
data,
Akaike's
information
\
\
criterion
(AIC)
allows
us
to
measure
how
well
a
statistical
model
fits
the
data;
\
\
it
is
an
estimator
of
prediction
error.
Here
in
this
problem
we
will
need
to
use
\
\
the
formula
ln(det(sigma_hat))
+
(2
*
k
/
T)
to
determine
the
values
of
Akaike’s
\
\
criterion,
where
ln
denotes
the
natural
log
function,
det
the
determinant
function,
\
\
k
the
total
number
of
parameters
in
total
(across
both
equations),
and
T
the
number
\
\
of
observations
(which,
in
this
case,
is
equal
to
500).
For
1
lag,
the
number
\
\
of
parameters
in
total
is
equal
to
6;
for
2
lags,
it
is
10;
for
3
lags,
it
is
\
\
14;
and
for
4
lags,
it
is
18.
Now,
let
’
s
calculate
the
values
of
the
criterion
\
\
for
each
lag:
\n
(A)
1
lag:
ln(0.0336)
+
(2
*
6
/
500)
=
ln(0.0336)
+
(12
/
500)
\
\
=
-3.369
\n
(B)
2
lags:
ln(0.0169)
+
(2
*
10
/
500)
=
ln(0.0169)
+
(20
/
500)
=
\
\
-4.040
\n
(C)
3
lags:
ln(0.0084)
+
(2
*
14
/
500)
=
ln(0.0084)
+
(28
/
500)
=-4.724
\n
\
(D)
4
lags:
ln(0.0062)
+
(2
*
18
/
500)
=
ln(0.0062)
+
(36
/
500)
=-5.011
\n
Because
\
\
the
optimal
model
order
according
to
AIC
minimizes
the
information
criterion,
\
\
the
answer
should
be
the
one
with
the
lowest
value.
In
this
case,
(D)
has
the
\
\
lowest
value.
The
answer
is
(C).
\n\n
Q:
Consider
the
following
AR(1)
model
with
\
\
the
disturbances
having
zero
mean
and
unit
variance
\n
yt
=
0.2
+
0.4
yt-1
+
ut
\n
\
The
(unconditional)
mean
of
y
will
be
given
by
\n
(A)
0.2
(B)
0.4
(C)
0.5
(D)
0.33
\n
\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
econometrics
for
\
\
help.
Let
’
s
solve
this
problem
step
by
step.
If
we
have
an
AR(1)
model
with
\
\
the
disturbances
having
zero
mean
and
unit
variance,
then
the
unconditional
mean
\
\
of
y
is
equal
to
the
following:
\n
unconditional
mean
of
y
=
(the
intercept
term)
\
\
/
(1
-
autoregressive
coefficient)
\n
We
know
that
the
intercept
term
is
0.2
and
\
\
the
autoregressive
coefficient
is
0.4;
thus,
we
have:
\n
unconditional
mean
of
y
\
\
=
(0.2)
/
(1
-
0.4)
=
(0.2)
/
(0.6)
=
2
/
6
=
1
/
3,
which
is
approximately
0.33.
\
\
That
means
that
the
answer
should
be
(D)
0.33.
The
answer
is
(D).
\n\n
Q:
What
would
\
\
be
the
consequences
for
the
OLS
estimator
if
heteroscedasticity
is
present
in
\
\
a
regression
model
but
ignored?
\n
(A)
It
will
be
biased
(B)
It
will
be
inconsistent
\
\
(C)
It
will
be
inefficient
(D)
All
of
(a),
(b)
and
(c)
will
be
true.
\n
A:
Let's
\
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
econometrics
for
help.
Heteroscedasticity
\
\
refers
to
the
condition
where
the
variance
of
the
error
terms
is
not
constant
\
\
across
multiple
observations.
If
heteroscedasticity
is
present
in
a
regression
\
\
model,
then
the
coefficient
estimates
in
the
OLS
estimator
will
be
not
only
unbiased
\
\
and
consistent
but
also
inefficient.
Because
(A)
and
(B)
are
incorrect
choices
\
\
and
(C)
is
a
correct
choice,
(D)
cannot
be
the
right
answer.
Ultimately,
(C)
is
\
\
the
only
true
choice.
The
answer
is
(C).
\n\n
Q:
Suppose
that
a
test
statistic
has
\
\
associated
with
it
a
p-value
of
0.08.
Which
one
of
the
following
statements
is
\
\
true?
\n
(i)
If
the
size
of
the
test
were
exactly
8%,
we
would
be
indifferent
between
\
\
rejecting
and
not
rejecting
the
null
hypothesis
\n
(ii)
The
null
would
be
rejected
\
\
if
a
10%
size
of
test
were
used
\n
(iii)
The
null
would
not
be
rejected
if
a
1%
\
\
size
of
test
were
used
\n
(iv)
The
null
would
be
rejected
if
a
5%
size
of
test
were
\
\
used.
\n
(A)
(ii)
and
(iv)
only
(B)
(i)
and
(iii)
only
(C)
(i),
(ii),
and
(iii)
\
\
only
(D)
(i),
(ii),
(iii),
and
(iv).
\n
A:
Let's
think
step
by
step.
We
refer
to
\
\
Wikipedia
articles
on
econometrics
for
help.
Let
’s
reason
about
each
of
the
options.
\n
\
(i)
is
a
true
statement.
\n
(ii)
is
a
true
statement.
\n
(iii)
is
a
true
statement.
\n\
(iv)
is
not
a
true
statement.
Thus,
(i),
(ii),
and
(iii)
are
true.
The
answer
is
\
\
(C).
\n\n
Q:
For
a
stationary
autoregressive
process,
shocks
will
\n
(A)
Eventually
\
\
die
away
(B)
Persist
indefinitely
(C)
Grow
exponentially
(D)
Never
occur
\n
A:
Let's
\
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
econometrics
for
help.
This
\
\
is
a
formal
logic
problem
about
stationary
processes.
For
a
stationary
autoregressive
\
\
process,
shocks
will
eventually
die
away.
The
answer
is
(A)."
"
group"
:
"
mmlu_flan_cot_fewshot_social_sciences
"
"
include
"
:
"
_mmlu_flan_cot_fewshot_template_yaml
"
"
task
"
:
"
mmlu_flan_cot_fewshot_econometrics
"
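The arithmetic in the econometrics prompt above can be checked with a short Python sketch. It is not part of the commit. The function names `var_aic` and `ar1_unconditional_mean` are hypothetical, and the determinants, parameter counts, and T = 500 are copied from the prompt text rather than from any dataset file.

```python
import math


def var_aic(det_sigma_hat: float, k: int, n_obs: int) -> float:
    """Criterion as quoted in the prompt: ln(det(sigma_hat)) + 2k/T."""
    return math.log(det_sigma_hat) + 2 * k / n_obs


def ar1_unconditional_mean(intercept: float, phi: float) -> float:
    """Unconditional mean of a stationary AR(1): intercept / (1 - phi)."""
    return intercept / (1 - phi)


# Values quoted in the prompt (hypothetical variable names).
N_OBS = 500
determinants = {1: 0.0336, 2: 0.0169, 3: 0.0084, 4: 0.0062}
param_counts = {1: 6, 2: 10, 3: 14, 4: 18}

aic_by_lag = {
    lag: var_aic(determinants[lag], param_counts[lag], N_OBS)
    for lag in determinants
}
# Lag with the smallest criterion value under this formula.
best_lag = min(aic_by_lag, key=aic_by_lag.get)

for lag, value in aic_by_lag.items():
    print(f"{lag} lag(s): {value:.3f}")
print("lowest criterion at lag", best_lag)
print("AR(1) mean:", round(ar1_unconditional_mean(0.2, 0.4), 2))
```

Under the quoted formula the four values round to the figures given in the prompt, and the smallest value is the one for 4 lags, matching the prompt's statement that (D) has the lowest value.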
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_electrical_engineering.yaml
dataset_name
:
electrical_engineering
description
:
"
\n
The
following
are
multiple
choice
questions
(with
answers)
about
electrical
\
\
engineering.
\n\n
Q:
A
point
pole
has
a
strength
of
4
\u03C0
*
10^-4
weber.
The
force
\
\
in
newtons
on
a
point
pole
of
4
\u03C0
*
1.5
*
10^-4
weber
placed
at
a
distance
\
"
dataset_name
"
:
"
electrical_engineering
"
"
description
"
:
"
\n
The
following
are
multiple
choice
questions
(with
answers)
about
\
\
electrical
engineering.
\n\n
Q:
A
point
pole
has
a
strength
of
4
π
*
10^-4
weber.
\
\
The
force
in
newtons
on
a
point
pole
of
4
π
*
1.5
*
10^-4
weber
placed
at
a
distance
\
\
of
10
cm
from
it
will
be
\n
(A)
15
N.
(B)
20
N.
(C)
7.5
N.
(D)
3.75
N.
\n
A:
Let's
\
\
think
step
by
step.
The
force
between
two
point
poles
is
given
by
m_1m_2/(mu_0
\
\
4
\\
pi
r^2),
in
analogy
to
Coulomb
\u2019
s
law.
Plugging
in
the
values
given
in
\
\
the
question,
we
calculate
that
the
force
is
approximately
15
N.
The
answer
is
\
\
(A).
\n\
n
Q:
The
coil
of
a
moving
coil
meter
has
100
turns,
is
40
mm
long
and
30
\
\
mm
wide.
The
control
torque
is
240*10-6
N-m
on
full
scale.
If
magnetic
flux
density
\
\
is
1Wb/m2
range
of
meter
is
\n
(A)
1
mA.
(B)
2
mA.
(C)
3
mA.
(D)
4
mA.
\n
A:
Let's
\
\
think
step
by
step.
The
torque
on
a
coil
in
a
uniform
magnetic
field
is
given
\
\
by
BANI,
where
B
is
the
magnetic
flux
density,
A
is
the
area
of
the
coil,
N
is
\
\
the
number
of
turns,
and
I
is
the
current.
So
we
have
that
I
=
(Torque)/(BAN),
\
\
or
240e-6/(1200e-6
*
100
*
1)
=
2e-3.
The
answer
is
(B).
\n\n
Q:
In
an
SR
latch
\
\
built
from
NOR
gates,
which
condition
is
not
allowed
\n
(A)
S=0,
R=0
(B)
S=0,
R=1
\
\
(C)
S=1,
R=0
(D)
S=1,
R=1
\n
A:
Let's
think
step
by
step.
An
SR
latch
is
a
set-reset
\
\
latch;
in
the
case
where
S=1
and
R=1,
the
circuit
has
no
stable
state;
instead
\
\
a
race
condition
will
be
produced
within
the
circuit,
so
the
device
will
be
in
\
\
an
undefined
state.
So
S=1,
R=1
is
an
illegal
input.
The
answer
is
(D).
\n\n
Q:
\
\
Two
long
parallel
conductors
carry
100
A.
If
the
conductors
are
separated
by
20
\
\
mm,
the
force
per
meter
of
length
of
each
conductor
will
be
\n
(A)
100
N.
(B)
0.1
\
\
N.
(C)
1
N.
(D)
0.01
N.
\n
A:
Let's
think
step
by
step.
The
magnetic
force-per-length
\
\
between
two
current-carrying
conductors
is
given
by
\\
mu_0
I_1
I_2
/
(2
\\
pi
r),
\
\
where
$r$
is
the
separation
distance
and
I_1
and
I_2
are
the
currents.
Plugging
\
\
in
100
A
for
I_1
and
I_2,
and
20
mm
for
r,
gives
0.1
N.
The
answer
is
(B).
\n\n\
Q:
In
a
2
pole
lap
winding
dc
machine
,
the
resistance
of
one
conductor
is
2
\u03A9
\
\
and
total
number
of
conductors
is
100.
Find
the
total
resistance
\n
(A)
200
\u03A9
\
\
(B)
100
\u03A9
(C)
50
\u03A9
(D)
10
\u03A9\n
A:
Let's
think
step
by
step.
In
lap
winding,
\
\
effectively
two
resistors
are
connected
in
parallel,
so
the
actual
resistance
\
\
of
each
pair
is
1
Ohm.
Since
we
have
50
pairs,
we
get
a
total
resistance
of
50
\
\
Ohms.
The
answer
is
(C).
"
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_electrical_engineering
\
4
\\
pi
r^2),
in
analogy
to
Coulomb
’
s
law.
Plugging
in
the
values
given
in
the
\
\
question,
we
calculate
that
the
force
is
approximately
15
N.
The
answer
is
(A).
\n
\
\n
Q:
The
coil
of
a
moving
coil
meter
has
100
turns,
is
40
mm
long
and
30
mm
wide.
\
\
The
control
torque
is
240*10-6
N-m
on
full
scale.
If
magnetic
flux
density
is
\
\
1Wb/m2
range
of
meter
is
\n
(A)
1
mA.
(B)
2
mA.
(C)
3
mA.
(D)
4
mA.
\n
A:
Let's
think
\
\
step
by
step.
The
torque
on
a
coil
in
a
uniform
magnetic
field
is
given
by
BANI,
\
\
where
B
is
the
magnetic
flux
density,
A
is
the
area
of
the
coil,
N
is
the
number
\
\
of
turns,
and
I
is
the
current.
So
we
have
that
I
=
(Torque)/(BAN),
or
240e-6/(1200e-6
\
\
*
100
*
1)
=
2e-3.
The
answer
is
(B).
\n\n
Q:
In
an
SR
latch
built
from
NOR
gates,
\
\
which
condition
is
not
allowed
\n
(A)
S=0,
R=0
(B)
S=0,
R=1
(C)
S=1,
R=0
(D)
S=1,
\
\
R=1
\n
A:
Let's
think
step
by
step.
An
SR
latch
is
a
set-reset
latch;
in
the
case
\
\
where
S=1
and
R=1,
the
circuit
has
no
stable
state;
instead
a
race
condition
will
\
\
be
produced
within
the
circuit,
so
the
device
will
be
in
an
undefined
state.
So
\
\
S=1,
R=1
is
an
illegal
input.
The
answer
is
(D).
\n\n
Q:
Two
long
parallel
conductors
\
\
carry
100
A.
If
the
conductors
are
separated
by
20
mm,
the
force
per
meter
of
\
\
length
of
each
conductor
will
be
\n
(A)
100
N.
(B)
0.1
N.
(C)
1
N.
(D)
0.01
N.
\n
\
A:
Let's
think
step
by
step.
The
magnetic
force-per-length
between
two
current-carrying
\
\
conductors
is
given
by
\\
mu_0
I_1
I_2
/
(2
\\
pi
r),
where
$r$
is
the
separation
\
\
distance
and
I_1
and
I_2
are
the
currents.
Plugging
in
100
A
for
I_1
and
I_2,
\
\
and
20
mm
for
r,
gives
0.1
N.
The
answer
is
(B).
\n\n
Q:
In
a
2
pole
lap
winding
\
\
dc
machine
,
the
resistance
of
one
conductor
is
2
Ω
and
total
number
of
conductors
\
\
is
100.
Find
the
total
resistance
\n
(A)
200
Ω
(B)
100Ω
(C)
50Ω
(D)
10Ω
\n
A:
Let's
\
\
think
step
by
step.
In
lap
winding,
effectively
two
resistors
are
connected
in
\
\
parallel,
so
the
actual
resistance
of
each
pair
is
1
Ohm.
Since
we
have
50
pairs,
\
\
we
get
a
total
resistance
of
50
Ohms.
The
answer
is
(C)."
"
group"
:
"
mmlu_flan_cot_fewshot_stem
"
"
include
"
:
"
_mmlu_flan_cot_fewshot_template_yaml
"
"
task
"
:
"
mmlu_flan_cot_fewshot_electrical_engineering
"
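The three numeric answers in the electrical engineering prompt above follow from the formulas it quotes. The sketch below is not part of the commit and only reproduces that arithmetic in SI units. The function names are hypothetical, and `MU_0` is taken as the classical value 4π × 10^-7 H/m, which is an assumption the prompt does not state.

```python
import math

# Permeability of free space in H/m (classical value, assumed here).
MU_0 = 4 * math.pi * 1e-7


def pole_force(m1: float, m2: float, r: float) -> float:
    """Force between two point poles: m1*m2 / (mu_0 * 4*pi * r^2)."""
    return m1 * m2 / (MU_0 * 4 * math.pi * r ** 2)


def force_per_length(i1: float, i2: float, r: float) -> float:
    """Force per metre between parallel conductors: mu_0*I1*I2 / (2*pi*r)."""
    return MU_0 * i1 * i2 / (2 * math.pi * r)


def full_scale_current(torque: float, b: float, area: float, turns: int) -> float:
    """Moving-coil meter: torque = B*A*N*I, so I = torque / (B*A*N)."""
    return torque / (b * area * turns)


# Point poles of 4*pi*1e-4 Wb and 4*pi*1.5e-4 Wb placed 10 cm apart.
f_poles = pole_force(4 * math.pi * 1e-4, 4 * math.pi * 1.5e-4, 0.10)
# Two conductors carrying 100 A each, separated by 20 mm.
f_wires = force_per_length(100.0, 100.0, 0.020)
# 100-turn coil, 40 mm x 30 mm, 240e-6 N-m control torque, B = 1 Wb/m^2.
i_full = full_scale_current(240e-6, 1.0, 40e-3 * 30e-3, 100)

print(f"pole force: {f_poles:.2f} N")
print(f"force per metre: {f_wires:.3f} N")
print(f"full-scale current: {i_full * 1e3:.2f} mA")
```

The results agree with the answers given in the prompt: about 15 N, 0.1 N per metre, and 2 mA.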
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_elementary_mathematics.yaml
dataset_name
:
elementary_mathematics
description
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
elementary
\
"
dataset_name
"
:
"
elementary_mathematics
"
"
description
"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
elementary
\
\
mathematics.
\n\n
Q:
Olivia
used
the
rule
\"
Add
11
\"
to
create
the
number
pattern
\
\
shown
below.
10,
21,
32,
43,
54.
Which
statement
about
the
number
pattern
is
true?
\n\
(A)
The
10th
number
in
the
pattern
will
be
an
even
number.
\n
(B)
The
number
pattern
\
...
...
@@ -22,19 +22,20 @@ description: "The following are multiple choice questions (with answers) about e
\
the
other
choices
are
incorrect.
The
answer
is
(A).
\n\n
Q:
A
store
sells
107
different
\
\
colors
of
paint.
They
have
25
cans
of
each
color
in
storage.
The
number
of
cans
\
\
of
paint
the
store
has
in
storage
can
be
found
using
the
expression
below.
107
\
\ \xD7
25.
How
many
cans
of
paint
does
the
store
have
in
storage?
\n
(A)
749
\n
(B)
\
\
2,675
\n
(C)
2,945
\n
(D)
4,250
\n
A:
Let's
think
step
by
step.
We
can
calculate
107
\
\
x
25
=
(100
x
25)
+
(7
x
25)
=
2500
+
175
=
2675.
The
answer
is
(B).
\n\n
Q:
A
total
\
\
of
30
players
will
play
basketball
at
a
park.
There
will
be
exactly
5
players
\
\
on
each
team.
Which
statement
correctly
explains
how
to
find
the
number
of
teams
\
\
needed?
\n
(A)
Add
5
to
30
to
find
35
teams.
\n
(B)
Divide
30
by
5
to
find
6
teams.
\n\
(C)
Multiply
30
and
5
to
find
150
teams.
\n
(D)
Subtract
5
from
30
to
find
25
teams.
\n\
A:
Let's
think
step
by
step.
We
want
to
find
the
number
of
teams.
We
know
that
there
\
\
are
5
players/team,
and
30
players.
Thus
to
get
the
number
of
teams
we
divide
\
\
players
by
players/team,
so
30
players
/
5
players/team
=
6
teams.
The
answer
\
\
is
(B).
\n\n
Q:
Which
expression
is
equivalent
to
5
x
9?
\n
(A)
(5
x
4)
x
(6
x
5)
\n\
(B)
(5
x
5)
+
(5
x
4)
\n
(C)
(5
x
5)
+
(5
x
9)
\n
(D)
(5
x
9)
x
(6
x
9)
\n
A:
Let's
think
\
\
step
by
step.
We
know
that
9
=
(5
+
4),
so
5
x
9
=
5
x
(5
+
4)
=
(5
x
5)
+
(5
\
\
x
4).
The
answer
is
(B)."
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_elementary_mathematics
\
×
25.
How
many
cans
of
paint
does
the
store
have
in
storage?
\n
(A)
749
\n
(B)
2,675
\n\
(C)
2,945
\n
(D)
4,250
\n
A:
Let's
think
step
by
step.
We
can
calculate
107
x
25
=
(100
\
\
x
25)
+
(7
x
25)
=
2500
+
175
=
2675.
The
answer
is
(B).
\n\n
Q:
A
total
of
30
players
\
\
will
play
basketball
at
a
park.
There
will
be
exactly
5
players
on
each
team.
\
\
Which
statement
correctly
explains
how
to
find
the
number
of
teams
needed?
\n
(A)
\
\
Add
5
to
30
to
find
35
teams.
\n
(B)
Divide
30
by
5
to
find
6
teams.
\n
(C)
Multiply
\
\
30
and
5
to
find
150
teams.
\n
(D)
Subtract
5
from
30
to
find
25
teams.
\n
A:
Let's
\
\
think
step
by
step.
We
want
to
find
the
number
of
teams.
We
know
that
there
are
\
\
5
players/team,
and
30
players.
Thus
to
get
the
number
of
teams
we
divide
players
\
\
by
players/team,
so
30
players
/
5
players/team
=
6
teams.
The
answer
is
(B).
\n\
\n
Q:
Which
expression
is
equivalent
to
5
x
9?
\n
(A)
(5
x
4)
x
(6
x
5)
\n
(B)
(5
x
5)
\
\
+
(5
x
4)
\n
(C)
(5
x
5)
+
(5
x
9)
\n
(D)
(5
x
9)
x
(6
x
9)
\n
A:
Let's
think
step
by
\
\
step.
We
know
that
9
=
(5
+
4),
so
5
x
9
=
5
x
(5
+
4)
=
(5
x
5)
+
(5
x
4).
The
\
\
answer
is
(B)."
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_elementary_mathematics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_formal_logic.yaml
dataset_name
:
formal_logic
description
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
formal
\
"
dataset_name
"
:
"
formal_logic
"
"
description
"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
formal
\
\
logic.
\n\n
Q:
Which
of
the
given
formulas
of
PL
is
the
best
symbolization
of
the
\
\
following
sentence?
\n
Turtles
live
long
lives
and
are
happy
creatures,
unless
they
\
\
are
injured.
\n
(A)
(L
\u2022
H)
\u2261
I
(B)
(L
\u2022
H)
\u2228
I
(C)
L
\u2022\
\
(H
\u2228
I)
(D)
L
\u2022
(H
\u2283
R).
\n
A:
Let's
think
step
by
step.
We
refer
\
\
to
Wikipedia
articles
on
formal
logic
for
help.
Let
\u2019
s
solve
this
step
by
\
\
step.
Let
\u201C
L
\u201D
denote
\u201C
living
long
\u201D
,
H
\u201C
being
happy
\u201D\
,
and
\u201C
I
\u201D
\u201C
being
injured
\u201D
.
Now,
consider
each
choice:
\n
(A)
means
\
\
(living
long
AND
being
happy)
is
equivalent
to
(being
injured).
\n
(B)
means
(living
\
\
long
AND
being
happy)
OR
(being
injured).
\n
(C)
means
(living
long)
AND
(being
\
\
happy
OR
being
injured).
\n
(D)
means
(living
long)
AND
(being
happy
implies
being
\
\
R),
but
what
R
denotes
is
not
clear.
\n
Obviously,
(B)
is
the
best
symbolization
\
\
of
the
original
sentence.
The
answer
is
(B).
\n\n
Q:
Select
the
best
translation
\
\
into
predicate
logic.George
borrows
Hector's
lawnmower.
(g:
George;
h:
Hector;
\
\
l:
Hector's
lawnmower;
Bxyz:
x
borrows
y
from
z).
\n
(A)
Blgh
(B)
Bhlg
(C)
Bglh
\
\
(D)
Bghl
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
formal
\
\
logic
for
help.
Let
\u2019
s
solve
this
step
by
step.
We
are
told
that
\u201C
Bxyz
\u201D\
\
means
\u201C
x
borrows
y
from
z
\u201D
.
We
can
rewrite
\u201C
George
borrows
Hector's
\
\
lawnmower
\u201D
as
\u201C
George
borrows
a
lawnmower
from
Hector
\u201D
,
which
can
\
\
then
be
translated
into
predicate
logic
as
\u201C
Bglh
\u201D
.
The
answer
\u201C\
Bglh
\u201D
appears
in
(C);
therefore,
(C)
must
be
the
correct
answer.
The
answer
\
\
is
(C).
\n\n
Q:
\n
Select
the
best
English
interpretation
of
the
given
arguments
\
\
in
predicate
logic.
\n
Dm
\n
(
\u2200
x)(Wx
\u2283
~Dx).
\n
(
\u2200
x)Wx
\u2228
Ag
\t
/
\
\
(
\u2203
x)Ax
\n
(A)
Marina
is
a
dancer.
Some
weaklings
are
not
dancers.
Either
everything
\
\
is
a
weakling
or
Georgia
plays
volleyball.
So
something
plays
volleyball.
(B)
\
\
Marina
is
a
dancer.
No
weakling
is
a
dancer.
Everything
is
either
a
weakling
or
\
\
plays
volleyball.
So
something
plays
volleyball.
(C)
Marina
is
a
dancer.
Some
\
\
weaklings
are
not
dancers.
Everything
is
either
a
weakling
or
plays
volleyball.
\
\
So
something
plays
volleyball.
(D)
Marina
is
a
dancer.
No
weakling
is
a
dancer.
\
\
Either
everything
is
a
weakling
or
Georgia
plays
volleyball.
So
something
plays
\
\
volleyball.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
formal
\
\
logic
for
help.
Let
\u2019
s
solve
this
step
by
step.
Let
\u201C
D
\u201D
denote
\u201C\
being
a
dancer
\u201D
,
\u201C
m
\u201D
denote
\u201C
Marina
\u201D
,
\u201C
g
\u201D
denote
\
\ \u201C
Georgia
\u201D
,
\u201C
W
\u201D
denote
\u201C
weakling
\u201D
,
\u201C
A
\u201D\
\
denote
\u201C
playing
volleyball
\u201D
.
Then,
we
have
the
following:
\n
1.
Dm
\u2192\
\
Marina
is
a
dancer.
\n
2.
(
\u2200
x)(Wx
\u2283
~Dx).
\u2192
For
all
x,
if
x
is
a
weakling,
\
\
then
x
is
not
a
dancer.
In
other
words,
no
weakling
is
a
dancer.
\n
3.
(
\u2200
x)Wx
\
\ \u2228
Ag
\t
/
(
\u2203
x)Ax
\u2192
For
all
x,
x
is
a
weakling
or
Georgia
plays
volleyball.
\
\
So
there
exists
an
x
that
plays
volleyball.
\n
Options
(A)
and
(C)
do
claim
that
\
\
some
weaklings
are
not
dancers,
but
the
second
argument
strongly
states
that
no
\
\
weakling
is
a
dancer.
Thus,
we
can
eliminate
them.
Option
(B)
omits
the
important
\
\
detail
about
Georgia
playing
volleyball.
Option
(D)
has
all
the
details
presented
\
\
in
the
arguments
and
is
the
best
English
interpretation
of
the
arguments.
The
\
\
answer
is
(D).
\n\n
Q:
Select
the
best
translation
into
predicate
logic:
No
people
\
\
drive
on
Mars.
\n
(A)
~Pd
(B)
(
\u2200
x)(Px
\u2228
~Dx)
(C)
(
\u2200
x)(Px
\u2283
~Dx)
\
\
(D)
~Dp
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
formal
\
\
logic
for
help.
Let
\u2019
s
solve
this
step
by
step.
Let
\u201C
P
\u201D
denote
\u201C\
being
on
Mars
\u201D
and
\u201C
D
\u201D
denote
\u201C
driving
on
Mars
\u201D
.
Then
let
\u2019\
s
consider
each
option:
\n
Option
(A):
~Pd
\u2192
d
is
not
on
Mars.
\n
Option
(B):
(
\u2200\
x)(Px
\u2228
~Dx)
\u2192
For
all
x,
x
is
on
Mars
or
x
does
not
drive
on
Mars.
\n
Option
\
\
(C):
(
\u2200
x)(Px
\u2283
~Dx)
\u2192
For
all
x,
x
is
on
Mars
implies
that
x
does
\
\
not
drive
on
Mars.
\n
Option
(D):
~Dp:
\u2192
p
does
not
drive
on
Mars.
\n
Of
all
these
\
\
options,
Option
(C)
appears
to
be
the
best
and
most
meaningful
interpretation
\
\
of
the
argument
\u201C
No
people
drive
on
Mars.
\u201D
The
answer
is
(C)."
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_formal_logic
\
are
injured.
\n
(A)
(L
•
H)
≡
I
(B)
(L
•
H)
∨
I
(C)
L
•
(H
∨
I)
(D)
L
•
(H
⊃
R).
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
formal
logic
for
\
\
help.
Let’s
solve
this
step
by
step.
Let
“L”
denote
“living
long”,
H
“being
happy”,
\
\
and
“I”
“being
injured”.
Now,
consider
each
choice:
\n
(A)
means
(living
long
AND
\
\
being
happy)
is
equivalent
to
(being
injured).
\n
(B)
means
(living
long
AND
being
\
\
happy)
OR
(being
injured).
\n
(C)
means
(living
long)
AND
(being
happy
OR
being
\
\
injured).
\n
(D)
means
(living
long)
AND
(being
happy
implies
being
R),
but
what
\
\
R
denotes
is
not
clear.
\n
Obviously,
(B)
is
the
best
symbolization
of
the
original
\
\
sentence.
The
answer
is
(B).
\n\n
Q:
Select
the
best
translation
into
predicate
\
\
logic.George
borrows
Hector's
lawnmower.
(g:
George;
h:
Hector;
l:
Hector's
lawnmower;
\
\
Bxyz:
x
borrows
y
from
z).
\n
(A)
Blgh
(B)
Bhlg
(C)
Bglh
(D)
Bghl
\n
A:
Let's
think
\
\
step
by
step.
We
refer
to
Wikipedia
articles
on
formal
logic
for
help.
Let’s
solve
\
\
this
step
by
step.
We
are
told
that
“Bxyz”
means
“x
borrows
y
from
z”.
We
can
\
\
rewrite
“George
borrows
Hector's
lawnmower”
as
“George
borrows
a
lawnmower
from
\
\
Hector”,
which
can
then
be
translated
into
predicate
logic
as
“Bglh”.
The
answer
\
\
“Bglh”
appears
in
(C);
therefore,
(C)
must
be
the
correct
answer.
The
answer
is
\
\
(C).
\n\n
Q:
\n
Select
the
best
English
interpretation
of
the
given
arguments
in
\
\
predicate
logic.
\n
Dm
\n
(∀x)(Wx
⊃
~Dx).
\n
(∀x)Wx
∨
Ag
\t
/
(∃x)Ax
\n
(A)
Marina
is
a
\
\
dancer.
Some
weaklings
are
not
dancers.
Either
everything
is
a
weakling
or
Georgia
\
\
plays
volleyball.
So
something
plays
volleyball.
(B)
Marina
is
a
dancer.
No
weakling
\
\
is
a
dancer.
Everything
is
either
a
weakling
or
plays
volleyball.
So
something
\
\
plays
volleyball.
(C)
Marina
is
a
dancer.
Some
weaklings
are
not
dancers.
Everything
\
\
is
either
a
weakling
or
plays
volleyball.
So
something
plays
volleyball.
(D)
Marina
\
\
is
a
dancer.
No
weakling
is
a
dancer.
Either
everything
is
a
weakling
or
Georgia
\
\
plays
volleyball.
So
something
plays
volleyball.
\n
A:
Let's
think
step
by
step.
\
\
We
refer
to
Wikipedia
articles
on
formal
logic
for
help.
Let’s
solve
this
step
\
\
by
step.
Let
“D”
denote
“being
a
dancer”,
“m”
denote
“Marina”,
“g”
denote
“Georgia”,
\
\
“W”
denote
“weakling”,
“A”
denote
“playing
volleyball”.
Then,
we
have
the
following:
\n\
1.
Dm
→
Marina
is
a
dancer.
\n
2.
(∀x)(Wx
⊃
~Dx).
→
For
all
x,
if
x
is
a
weakling,
then
\
\
x
is
not
a
dancer.
In
other
words,
no
weakling
is
a
dancer.
\n
3.
(∀x)Wx
∨
Ag
\t\
/
(∃x)Ax
→
For
all
x,
x
is
a
weakling
or
Georgia
plays
volleyball.
So
there
exists
\
\
an
x
that
plays
volleyball.
\n
Options
(A)
and
(C)
do
claim
that
some
weaklings
\
\
are
not
dancers,
but
the
second
argument
strongly
states
that
no
weakling
is
a
\
\
dancer.
Thus,
we
can
eliminate
them.
Option
(B)
omits
the
important
detail
about
\
\
Georgia
playing
volleyball.
Option
(D)
has
all
the
details
presented
in
the
arguments
\
\
and
is
the
best
English
interpretation
of
the
arguments.
The
answer
is
(D).
\n\n\
Q:
Select
the
best
translation
into
predicate
logic:
No
people
drive
on
Mars.
\n\
(A)
~Pd
(B)
(∀x)(Px
∨
~Dx)
(C)
(∀x)(Px
⊃
~Dx)
(D)
~Dp
\n
A:
Let's
think
step
by
step.
\
\
We
refer
to
Wikipedia
articles
on
formal
logic
for
help.
Let’s
solve
this
step
\
\
by
step.
Let
“P”
denote
“being
on
Mars”
and
“D”
denote
“driving
on
Mars”.
Then
\
\
let’s
consider
each
option:
\n
Option
(A):
~Pd
→
d
is
not
on
Mars.
\n
Option
(B):
\
\
(∀x)(Px
∨
~Dx)
→
For
all
x,
x
is
on
Mars
or
x
does
not
drive
on
Mars.
\n
Option
(C):
\
\
(∀x)(Px
⊃
~Dx)
→
For
all
x,
x
is
on
Mars
implies
that
x
does
not
drive
on
Mars.
\n\
Option
(D):
~Dp:
→
p
does
not
drive
on
Mars.
\n
Of
all
these
options,
Option
(C)
appears
\
\
to
be
the
best
and
most
meaningful
interpretation
of
the
argument
“No
people
drive
\
\
on
Mars.”
The
answer
is
(C)."
"
group"
:
"
mmlu_flan_cot_fewshot_humanities"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_formal_logic"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_global_facts.yaml
dataset_name
:
global_facts
description
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
global
\
\
facts.
\n\n
Q:
As
of
2017,
how
many
of
the
world
\u2019
s
1-year-old
children
today
\
\
have
been
vaccinated
against
some
disease?
*
\n
(A)
80%
(B)
60%
(C)
40%
(D)
20%
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
global
facts
for
\
\
help.
According
to
data
published
by
the
World
Health
Organization,
the
number
\
\
of
1-year-old
children
vaccinated
in
2017
exceeds
80%.
The
answer
is
(A).
\n\n\
Q:
As
of
2019,
about
what
percentage
of
Americans
agree
that
the
state
is
run
for
\
\
the
benefit
of
all
the
people?
\n
(A)
31%
(B)
46%
(C)
61%
(D)
76%
\n
A:
Let's
think
\
\
step
by
step.
We
refer
to
Wikipedia
articles
on
global
facts
for
help.
In
2019,
\
\
about
46%
of
Americans
agree
that
the
state
is
run
for
the
benefit
\
\
of
all
the
people.
The
answer
is
(B).
\n\n
Q:
As
of
2019,
about
what
percentage
\
\
of
Russians
say
it
is
very
important
to
have
free
media
in
our
country
without
\
\
government/state
censorship?
\n
(A)
38%
(B)
53%
(C)
68%
(D)
83%
\n
A:
Let's
think
\
\
step
by
step.
We
refer
to
Wikipedia
articles
on
global
facts
for
help.
As
of
2019,
\
\
about
38%
of
Russians
say
it
is
very
important
to
have
free
media
in
our
country.
\
\
The
answer
is
(A).
\n\n
Q:
As
of
2015,
since
1990
forests
have
____
in
Europe
and
\
\
have
____
in
Africa
and
the
Americas.
\n
(A)
increased,
increased
(B)
increased,
\
\
decreased
(C)
decreased,
increased
(D)
decreased,
decreased
\n
A:
Let's
think
step
\
\
by
step.
We
refer
to
Wikipedia
articles
on
global
facts
for
help.
As
of
2015,
\
\
since
1990
forests
have
increased
in
Europe
and
have
decreased
in
Africa
and
the
\
\
Americas.
The
answer
is
(B).
\n\n
Q:
Which
of
the
following
pairs
of
statements
\
\
are
both
true
(as
of
2019)?
\n
(A)
People
tend
to
be
optimistic
about
their
own
\
\
future
and
the
future
of
their
nation
or
the
world.
(B)
People
tend
to
be
optimistic
\
"
dataset_name"
:
"
global_facts"
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
global
\
\
facts.
\n\n
Q:
As
of
2017,
how
many
of
the
world’s
1-year-old
children
today
have
\
\
been
vaccinated
against
some
disease?
*
\n
(A)
80%
(B)
60%
(C)
40%
(D)
20%
\n
A:
Let's
\
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
global
facts
for
help.
According
\
\
to
data
published
by
the
World
Health
Organization,
the
number
of
1-year-old
\
\
children
vaccinated
in
2017
exceeds
80%.
The
answer
is
(A).
\n\n
Q:
As
of
2019,
\
\
about
what
percentage
of
Americans
agree
that
the
state
is
run
for
the
benefit
\
\
of
all
the
people?
\n
(A)
31%
(B)
46%
(C)
61%
(D)
76%
\n
A:
Let's
think
step
by
step.
\
\
We
refer
to
Wikipedia
articles
on
global
facts
for
help.
In
2019,
about
46%
\
\
of
Americans
agree
that
the
state
is
run
for
the
benefit
of
all
the
people.
The
\
\
answer
is
(B).
\n\n
Q:
As
of
2019,
about
what
percentage
of
Russians
say
it
is
very
\
\
important
to
have
free
media
in
our
country
without
government/state
censorship?
\n\
(A)
38%
(B)
53%
(C)
68%
(D)
83%
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
global
facts
for
help.
As
of
2019,
about
38%
of
Russians
say
it
is
\
\
very
important
to
have
free
media
in
our
country.
The
answer
is
(A).
\n\n
Q:
As
\
\
of
2015,
since
1990
forests
have
____
in
Europe
and
have
____
in
Africa
and
the
\
\
Americas.
\n
(A)
increased,
increased
(B)
increased,
decreased
(C)
decreased,
increased
\
\
(D)
decreased,
decreased
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
\
on
global
facts
for
help.
As
of
2015,
since
1990
forests
have
increased
in
Europe
\
\
and
have
decreased
in
Africa
and
the
Americas.
The
answer
is
(B).
\n\n
Q:
Which
\
\
of
the
following
pairs
of
statements
are
both
true
(as
of
2019)?
\n
(A)
People
tend
\
\
to
be
optimistic
about
their
own
future
and
the
future
of
their
nation
or
the
\
\
world.
(B)
People
tend
to
be
optimistic
about
their
own
future
but
pessimistic
\
\
about
the
future
of
their
nation
or
the
world.
(C)
People
tend
to
be
pessimistic
\
\
about
their
own
future
but
optimistic
about
the
future
of
their
nation
or
the
\
\
world.
(D)
People
tend
to
be
pessimistic
about
their
own
future
and
the
future
\
\
of
their
nation
or
the
world.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
global
facts
for
help.
As
of
2019,
most
people
tend
to
be
optimistic
\
\
about
their
own
future
but
pessimistic
about
the
future
of
their
nation
or
the
\
\
world.
(C)
People
tend
to
be
pessimistic
about
their
own
future
but
optimistic
\
\
about
the
future
of
their
nation
or
the
world.
(D)
People
tend
to
be
pessimistic
\
\
about
their
own
future
and
the
future
of
their
nation
or
the
world.
\n
A:
Let's
\
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
global
facts
for
help.
As
\
\
of
2019,
most
people
tend
to
be
optimistic
about
their
own
future
but
pessimistic
\
\
about
the
future
of
their
nation
or
the
world.
The
answer
is
(B)."
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_global_facts
\
world.
The
answer
is
(B)."
"
group"
:
"
mmlu_flan_cot_fewshot_other"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_global_facts"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_biology.yaml
dataset_name
:
high_school_biology
description
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
"
dataset_name
"
:
"
high_school_biology
"
"
description
"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
biology.
\n\n
Q:
In
animal
cells,
which
of
the
following
represents
the
most
\
\
likely
pathway
that
a
secretory
protein
takes
as
it
is
synthesized
in
a
cell?
\n\
(A)
Plasma
membrane
\u2013
Golgi
apparatus
\u2013
ribosome
\u2013
secretory
vesicle
\u2013\
rough
ER
(B)
Ribosome
\u2013
Golgi
apparatus
\u2013
rough
ER
\u2013
secretory
vesicle
\u2013\
plasma
membrane
(C)
Plasma
membrane
\u2013
Golgi
apparatus
\u2013
ribosome
\u2013
secretory
\
\
vesicle
\u2013
rough
ER
(D)
Ribosome
\u2013
rough
ER
\u2013
Golgi
apparatus
\u2013
secretory
\
\
vesicle
\u2013
plasma
membrane
\n
A:
Let's
think
step
by
step.
Protein
synthesis
starts
\
(A)
Plasma
membrane–Golgi
apparatus–ribosome–secretory
vesicle–rough
ER
(B)
Ribosome–Golgi
\
\
apparatus–rough
ER–secretory
vesicle–plasma
membrane
(C)
Plasma
membrane–Golgi
\
\
apparatus–ribosome–secretory
vesicle–rough
ER
(D)
Ribosome–rough
ER–Golgi
apparatus–secretory
\
\
vesicle–plasma
membrane
\n
A:
Let's
think
step
by
step.
Protein
synthesis
starts
\
\
at
the
ribosome,
so
we
can
eliminate
(A)
and
(C).
The
ribosome
is
often
in
the
\
\
endoplasmic
reticulum
and
moves
from
there
to
the
Golgi
apparatus,
where
it
is
\
\
modified
and
packaged
into
a
vesicle.
The
vesicle
then
floats
to
the
plasma
membrane
\
\
and
is
secreted.
The
answer
is
(D).
\n\n
Q:
A
mutation
in
a
bacterial
enzyme
changed
\
\
a
previously
polar
amino
acid
into
a
nonpolar
amino
acid.
This
amino
acid
was
\
\
located
at
a
site
distant
from
the
enzyme\u2019s
active
site.
How
might
this
mutation
\
\
alter
the
enzyme\u2019s
substrate
specificity?
\n
(A)
By
changing
the
enzyme
\u2019\
s
pH
optimum
(B)
By
changing
the
enzyme\u2019s
location
in
the
cell
(C)
By
changing
\
\
the
shape
of
the
protein
(D)
An
amino
acid
change
away
from
the
active
site
cannot
\
\
alter
the
enzyme\u2019s
substrate
specificity.
\n
A:
Let's
think
step
by
step.
A
\
\
change
in
an
amino
acid
leads
to
a
change
in
the
primary
structure
of
the
protein.
\
\
A
change
in
the
primary
structure
may
lead
to
a
change
in
the
secondary
and
the
\
\
tertiary
structure
of
the
protein.
A
change
in
the
tertiary
structure
means
a
\
\
change
in
the
shape
of
the
protein,
so
(C)
has
to
be
correct.
Since
the
change
\
\
does
not
affect
the
active
site
of
the
enzyme,
we
do
not
expect
the
activity
of
\
\
the
enzyme
to
be
affected.
The
answer
is
(C).
\n\n
Q:
Which
of
the
following
is
\
\
not
a
way
to
form
recombinant
DNA?
\n
(A)
Translation
(B)
Conjugation
(C)
Specialized
\
\
transduction
(D)
Transformation
\n
A:
Let's
think
step
by
step.
The
introduction
\
\
of
foreign
DNA
or
RNA
into
bacteria
or
eukaryotic
cells
is
a
common
technique
\
\
in
molecular
biology
and
scientific
research.
There
are
multiple
ways
foreign
\
\
DNA
can
be
introduced
into
cells
including
transformation,
transduction,
conjugation,
\
\
and
transfection.
In
contrast,
(A)
is
not
a
way
to
form
DNA:
during
translation
\
\
the
ribosomes
synthesize
proteins
from
RNA.
The
answer
is
(A).
\n\n
Q:
Homologous
\
\
structures
are
often
cited
as
evidence
for
the
process
of
natural
selection.
All
\
\
of
the
following
are
examples
of
homologous
structures
EXCEPT
\n
(A)
the
wings
of
\
\
a
bird
and
the
wings
of
a
bat
(B)
the
flippers
of
a
whale
and
the
arms
of
a
man
\
\
(C)
the
pectoral
fins
of
a
porpoise
and
the
flippers
of
a
seal
(D)
the
forelegs
\
\
of
an
insect
and
the
forelimbs
of
a
dog
\n
A:
Let's
think
step
by
step.
\u200B\u200B\
Homologous
structures
are
similar
physical
features
in
organisms
that
share
a
common
\
\
ancestor
\u200B\u200B
but
different
functions.
Comparisons
(B)
and
(C)
are
clearly
\
\
homologous
because
they
share
a
common
ancestor
and
the
structures
serve
different
\
\
purposes.
Bat
wings
and
bird
wings
are
also
homologous,
while
they
are
both
wings,
\
\
the
forelimbs
serve
different
purposes.
Insects
and
dogs
are
very
far
ancestors
\
\
since
one
is
vertebrate
while
the
other
is
invertebrate
and
the
forelimbs
serve
\
\
the
same
purpose,
so
they
are
not
homologous.
The
answer
is
(D).
\n\n
Q:
Which
of
\
\
the
following
is
not
known
to
be
involved
in
the
control
of
cell
division?
\n
(A)
\
\
Cyclins
(B)
Protein
kinases
(C)
Checkpoints
(D)
Fibroblast
cells
\n
A:
Let's
think
\
\
step
by
step.
Normal
cells
move
through
the
cell
cycle
in
a
regulated
way.
At
\
\
the
checkpoint
stage,
they
use
information
about
their
own
internal
state
and
\
\
cues
from
the
environment
around
them
to
decide
whether
to
proceed
with
cell
division.
\
\
Cues
like
these
act
by
changing
the
activity
of
core
cell
cycle
regulators
inside
\
\
the
cell.
The
most
common
regulators
are
cyclins
and
cyclin-dependent
kinases.
\
\
Fibroblast
cells
do
not
play
any
role
in
cell
division.
The
answer
is
(D)."
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_high_school_biology
\
located
at
a
site
distant
from
the
enzyme’s
active
site.
How
might
this
mutation
\
\
alter
the
enzyme’s
substrate
specificity?
\n
(A)
By
changing
the
enzyme’s
pH
optimum
\
\
(B)
By
changing
the
enzyme’s
location
in
the
cell
(C)
By
changing
the
shape
of
\
\
the
protein
(D)
An
amino
acid
change
away
from
the
active
site
cannot
alter
the
\
\
enzyme’s
substrate
specificity.
\n
A:
Let's
think
step
by
step.
A
change
in
an
amino
\
\
acid
leads
to
a
change
in
the
primary
structure
of
the
protein.
A
change
in
the
\
\
primary
structure
may
lead
to
a
change
in
the
secondary
and
the
tertiary
structure
\
\
of
the
protein.
A
change
in
the
tertiary
structure
means
a
change
in
the
shape
\
\
of
the
protein,
so
(C)
has
to
be
correct.
Since
the
change
does
not
affect
the
\
\
active
site
of
the
enzyme,
we
do
not
expect
the
activity
of
the
enzyme
to
be
affected.
\
\
The
answer
is
(C).
\n\n
Q:
Which
of
the
following
is
not
a
way
to
form
recombinant
\
\
DNA?
\n
(A)
Translation
(B)
Conjugation
(C)
Specialized
transduction
(D)
Transformation
\n\
A:
Let's
think
step
by
step.
The
introduction
of
foreign
DNA
or
RNA
into
bacteria
\
\
or
eukaryotic
cells
is
a
common
technique
in
molecular
biology
and
scientific
\
\
research.
There
are
multiple
ways
foreign
DNA
can
be
introduced
into
cells
including
\
\
transformation,
transduction,
conjugation,
and
transfection.
In
contrast,
(A)
\
\
is
not
a
way
to
form
DNA:
during
translation
the
ribosomes
synthesize
proteins
\
\
from
RNA.
The
answer
is
(A).
\n\n
Q:
Homologous
structures
are
often
cited
as
evidence
\
\
for
the
process
of
natural
selection.
All
of
the
following
are
examples
of
homologous
\
\
structures
EXCEPT
\n
(A)
the
wings
of
a
bird
and
the
wings
of
a
bat
(B)
the
flippers
\
\
of
a
whale
and
the
arms
of
a
man
(C)
the
pectoral
fins
of
a
porpoise
and
the
flippers
\
\
of
a
seal
(D)
the
forelegs
of
an
insect
and
the
forelimbs
of
a
dog
\n
A:
Let's
think
\
\
step
by
step.
Homologous
structures
are
similar
physical
features
in
organisms
\
\
that
share
a
common
ancestor
but
different
functions.
Comparisons
(B)
and
(C)
\
\
are
clearly
homologous
because
they
share
a
common
ancestor
and
the
structures
\
\
serve
different
purposes.
Bat
wings
and
bird
wings
are
also
homologous,
while
\
\
they
are
both
wings,
the
forelimbs
serve
different
purposes.
Insects
and
dogs
\
\
are
very
far
ancestors
since
one
is
vertebrate
while
the
other
is
invertebrate
\
\
and
the
forelimbs
serve
the
same
purpose,
so
they
are
not
homologous.
The
answer
\
\
is
(D).
\n\n
Q:
Which
of
the
following
is
not
known
to
be
involved
in
the
control
\
\
of
cell
division?
\n
(A)
Cyclins
(B)
Protein
kinases
(C)
Checkpoints
(D)
Fibroblast
\
\
cells
\n
A:
Let's
think
step
by
step.
Normal
cells
move
through
the
cell
cycle
in
\
\
a
regulated
way.
At
the
checkpoint
stage,
they
use
information
about
their
own
\
\
internal
state
and
cues
from
the
environment
around
them
to
decide
whether
to
\
\
proceed
with
cell
division.
Cues
like
these
act
by
changing
the
activity
of
core
\
\
cell
cycle
regulators
inside
the
cell.
The
most
common
regulators
are
cyclins
\
\
and
cyclin-dependent
kinases.
Fibroblast
cells
do
not
play
any
role
in
cell
division.
\
\
The
answer
is
(D)."
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_biology"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_chemistry.yaml
dataset_name
:
high_school_chemistry
description
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
"
dataset_name
"
:
"
high_school_chemistry
"
"
description
"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
chemistry.
\n\n
Q:
Which
of
the
following
is
considered
an
acid
anhydride?
\n\
(A)
HCl
(B)
H2SO3
(C)
SO2
(D)
Al(NO3)3
\n
A:
Let's
think
step
by
step.
An
acid
anhydride
\
\
is
a
compound
that
is
derived
by
removing
water
from
an
acid.
The
chemical
formula
\
...
...
@@ -45,5 +45,6 @@ description: "The following are multiple choice questions (with answers) about h
\
the
acetate
ion.
The
added
strong
acid,
Nitric
acid,
will
react
with
the
conjugate
\
\
base.
Therefore
the
maximum
amount
of
acid
that
can
be
added
will
be
equal
to
\
\
the
amount
of
acetate
ion,
or
2
moles.
The
answer
is
(C)."
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_high_school_chemistry
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_chemistry"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_computer_science.yaml
dataset_name
:
high_school_computer_science
description
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
"
dataset_name
"
:
"
high_school_computer_science
"
"
description
"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
computer
science.
\n\n
Q:
Which
of
the
following
is
an
example
of
the
use
\
\
of
a
device
on
the
Internet
of
Things
(IoT)
?
\n
(A)
A
car
alerts
a
driver
that
\
\
it
is
about
to
hit
an
object.
(B)
A
hiker
uses
a
G
P
S
watch
to
keep
track
of
\
...
...
@@ -26,9 +26,9 @@ description: "The following are multiple choice questions (with answers) about h
\
launched
from
any
web
sites
visited
or
files
downloaded.
\n
A:
Let's
think
step
\
\
by
step.
Choice
A
is
incorrect
as
it
only
describes
network
traffic,
which
an
\
\
anonymous
browser
does
not
change.
Choice
B
is
correct
as
it
correctly
describes
\
\
how
an
anonymous
browser
will
prevent
saving
data
on
the
user\u2019s
computer
\
\
after
the
session
is
ended.
Choice
C
is
incorrect
because
an
anonymous
browser
\
\
will
not
prevent
logging
in
to
email
or
social
media
accounts.
Choice
D
is
incorrect
\
\
how
an
anonymous
browser
will
prevent
saving
data
on
the
user’s
computer
after
\
\
the
session
is
ended.
Choice
C
is
incorrect
because
an
anonymous
browser
will
\
\
not
prevent
logging
in
to
email
or
social
media
accounts.
Choice
D
is
incorrect
\
\
because
an
anonymous
browser
in
itself
performs
no
virus
protection.
The
answer
\
\
is
(B).
\n\n
Q:
In
the
program
below,
the
initial
value
of
X
is
5
and
the
initial
\
\
value
of
Y
is
10.
\n
IF
(X
<
0){
\n
DISPLAY
(
\"
Foxtrot
\"
)
\n
}
ELSE
{
\n
IF
(X
>
Y){
\n\
...
...
@@ -66,5 +66,6 @@ description: "The following are multiple choice questions (with answers) about h
\
its
value
is
greater
than
100,
regardless
of
the
elements
in
the
list.
Choice
\
\
D
is
incorrect
because
its
step
3
does
not
increment
the
value
of
position,
so
\
\
it
will
repeat
forever.
The
answer
is
(B)."
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_high_school_computer_science
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_computer_science"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_european_history.yaml
dataset_name
:
high_school_european_history
description
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
"
dataset_name
"
:
"
high_school_european_history
"
"
description
"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
european
history.
\n\n
Q:
This
question
refers
to
the
following
information.
\n\
Albeit
the
king's
Majesty
justly
and
rightfully
is
and
ought
to
be
the
supreme
head
\
\
of
the
Church
of
England,
and
so
is
recognized
by
the
clergy
of
this
realm
in
\
...
...
@@ -34,7 +34,7 @@ description: "The following are multiple choice questions (with answers) about h
\
the
corruption
in
the
Church
of
England.
The
answer
is
(D).
\n\n
Q:
This
question
\
\
refers
to
the
following
information.
\n
Read
the
following
excerpt.
\n
The
revolutionary
\
\
seed
had
penetrated
into
every
country
and
spread
more
or
less.
It
was
greatly
\
\
developed
under
the
r\xE9gime
of
the
military
despotism
of
Bonaparte.
His
conquests
\
\
developed
under
the
régime
of
the
military
despotism
of
Bonaparte.
His
conquests
\
\
displaced
a
number
of
laws,
institutions,
and
customs;
broke
through
bonds
sacred
\
\
among
all
nations,
strong
enough
to
resist
time
itself;
which
is
more
than
can
\
\
be
said
of
certain
benefits
conferred
by
these
innovators.
\n
The
monarchs
will
\
...
...
@@ -55,9 +55,9 @@ description: "The following are multiple choice questions (with answers) about h
Let
them
maintain
religious
principles
in
all
their
purity,
and
not
allow
the
faith
\
\
to
be
attacked
and
morality
interpreted
according
to
the
social
contract
or
the
\
\
visions
of
foolish
sectarians.
\n
Let
them
suppress
Secret
Societies;
that
gangrene
\
\
of
society.
\n
\u2014
Klemens
von
Metternich,
Political
Confession
of
Faith,
1820
\n\
Which
of
the
following
was
the
greatest
cause
of
the
fears
expressed
by
Metternich
\
\
in
the
document
above?
\n
(A)
The
ideas
of
personal
liberty
and
nationalism
conceived
\
\
of
society.
\n
—
Klemens
von
Metternich,
Political
Confession
of
Faith,
1820
\n
Which
\
\
of
the
following
was
the
greatest
cause
of
the
fears
expressed
by
Metternich
in
\
\
the
document
above?
\n
(A)
The
ideas
of
personal
liberty
and
nationalism
conceived
\
\
during
the
Enlightenment
resulted
in
radical
revolutions
that
could
spread
throughout
\
\
Europe.
(B)
The
conquest
of
Europe
by
Napoleon
led
to
the
creation
of
new
factions
\
\
and
shifted
the
European
balance
of
power.
(C)
The
power
of
monarchs
had
grown
\
...
...
@@ -110,15 +110,15 @@ description: "The following are multiple choice questions (with answers) about h
\
were
all
turning
to
the
politicians;
the
famous
Nihilists
who
made
Europe
tremble-sons
\
\
of
village
priests,
of
the
lower
middle
class,
of
tradesmen-could
not
rise
above
\
\
the
idea
of
national
liberation,
and
seemed
to
believe
that
the
world
would
be
\
\
delivered-when
they
had
killed
their
despot&
\u2026
\n\"
Foolery!
They'll
never
get
\
\
out
of
it
with
their
foolery.
\"\n
Then,
lowering
his
voice
still
more,
in
a
few
\
\
bitter
words
he
described
his
old
dream
of
fraternity.
He
had
renounced
his
rank
\
\
and
his
fortune;
he
had
gone
among
workmen,
only
in
the
hope
of
seeing
at
last
\
\
the
foundation
of
a
new
society
of
labour
in
common.
All
the
sous
in
his
pockets
\
\
had
long
gone
to
the
urchins
of
the
settlement;
he
had
been
as
tender
as
a
brother
\
\
delivered-when
they
had
killed
their
despot&
…
\n\"
Foolery!
They'll
never
get
out
\
\
of
it
with
their
foolery.
\"\n
Then,
lowering
his
voice
still
more,
in
a
few
bitter
\
\
words
he
described
his
old
dream
of
fraternity.
He
had
renounced
his
rank
and
\
\
his
fortune;
he
had
gone
among
workmen,
only
in
the
hope
of
seeing
at
last
the
\
\
foundation
of
a
new
society
of
labour
in
common.
All
the
sous
in
his
pockets
had
\
\
long
gone
to
the
urchins
of
the
settlement;
he
had
been
as
tender
as
a
brother
\
\
with
the
colliers,
smiling
at
their
suspicion,
winning
them
over
by
his
quiet
\
\
workmanlike
ways
and
his
dislike
of
chattering.
But
decidedly
the
fusion
had
not
\
\
taken
place.
\n
His
voice
changed,
his
eyes
grew
bright,
he
fixed
them
on
\xE9tienne,
\
\
taken
place.
\n
His
voice
changed,
his
eyes
grew
bright,
he
fixed
them
on
étienne,
\
\
directly
addressing
him:
\n\"
Now,
do
you
understand
that?
These
hatworkers
at
Marseilles
\
\
who
have
won
the
great
lottery
prize
of
a
hundred
thousand
francs
have
gone
off
\
\
at
once
and
invested
it,
declaring
that
they
are
going
to
live
without
doing
anything!
\
...
...
@@ -127,7 +127,7 @@ description: "The following are multiple choice questions (with answers) about h
\
out
as
much
as
you
like
against
the
rich,
you
haven't
got
courage
enough
to
give
\
\
back
to
the
poor
the
money
that
luck
brings
you.
You
will
never
be
worthy
of
happiness
\
\
as
long
as
you
own
anything,
and
your
hatred
of
the
bourgeois
proceeds
solely
\
\
from
an
angry
desire
to
be
bourgeois
yourselves
in
their
place.
\"\n
\xE9mile
Zola,
\
\
from
an
angry
desire
to
be
bourgeois
yourselves
in
their
place.
\"\n
émile
Zola,
\
\
French
writer,
Germinal,
1885
\n
The
passage
displays
the
direct
concern
for
the
\
\
welfare
of
the
working
classes
that
was
typically
a
part
of
which
movement?
\n\
(A)
Capitalist
(B)
Scientific
(C)
Communist
(D)
Existentialist
\n
A:
Let's
think
step
\
...
...
@@ -156,13 +156,14 @@ description: "The following are multiple choice questions (with answers) about h
\
whether
Jewish,
Christian
or
Turkish,
appear
to
me
no
other
than
human
inventions,
\
\
set
up
to
terrify
and
enslave
mankind,
and
monopolize
power
and
profit.
\n
I
do
\
\
not
mean
by
this
declaration
to
condemn
those
who
believe
otherwise;
they
have
\
\
the
same
right
to
their
belief
as
I
have
to
mine.
\n
\u2014
Thomas
Paine,
The
Age
\
\
of
Reason,
1794\u20131795
\n
Which
of
the
following
Enlightenment
philosophes
designed
\
\
a
system
of
checks
and
balances
for
government
to
avoid
abuses
of
power?
\n
(A)
\
\
Jean
Jacques
Rousseau
(B)
Baron
Montesquieu
(C)
Mary
Wollstonecraft
(D)
Adam
Smith
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
european
history
\
\
for
help.
Baron
Montesquieu
was
an
18th
century
French
philosopher
who
wrote
extensively
\
\
the
same
right
to
their
belief
as
I
have
to
mine.
\n
—
Thomas
Paine,
The
Age
of
Reason,
\
\
1794–1795
\n
Which
of
the
following
Enlightenment
philosophes
designed
a
system
\
\
of
checks
and
balances
for
government
to
avoid
abuses
of
power?
\n
(A)
Jean
Jacques
\
\
Rousseau
(B)
Baron
Montesquieu
(C)
Mary
Wollstonecraft
(D)
Adam
Smith
\n
A:
Let's
\
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
european
history
for
help.
\
\
Baron
Montesquieu
was
an
18th
century
French
philosopher
who
wrote
extensively
\
\
against
the
monopolization
of
power
and
advocated
for
a
system
of
checks
and
balances
\
\
in
government
to
prevent
the
rise
of
despotism.
The
answer
is
(B)."
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_high_school_european_history
"
group"
:
"
mmlu_flan_cot_fewshot_humanities"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_european_history"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_geography.yaml
dataset_name
:
high_school_geography
description
:
'
The
following
are
multiple
choice
questions
(with
answers)
about
high
school
geography.
Q:
Which
one
of
the
following
items
is
an
example
of
nonmaterial
culture?
(A)
Dove
soap
(B)
Dove
candy
bar
(C)
Dove
symbol
(D)
A
dove
(bird).
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
geography
for
help.
Nonmaterial
culture
consists
of
cultural
ideas,
beliefs
or
symbols
that
are
not
physical
objects.
The
answer
is
(C).
Q:
During
the
third
stage
of
the
demographic
transition
model,
which
of
the
following
is
true?
(A)
Birth
rates
increase
and
population
growth
rate
is
less
rapid.
(B)
Birth
rates
decline
and
population
growth
rate
is
less
rapid.
(C)
Birth
rates
increase
and
population
growth
rate
increases.
(D)
Birth
rates
decrease
and
population
growth
rate
increases.
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
geography
for
help.
The
demographic
transition
model
models
the
five
different
stages
of
population
growth
as
a
country
goes
through
economic
development,
where
the
third
stage
refers
to
a
period
of
declining
birth
rates
and
lower
population
growth.
The
answer
is
(B).
Q:
The
practice
of
hiring
a
foreign
third-party
service
provider
to
run
an
operation
is
called
(A)
outsourcing.
(B)
offshoring.
(C)
maquiladoras.
(D)
locational
interdependence.
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
geography
for
help.
"Offshoring"
literally
means
to
move
or
base
some
of
the
activities
or
processes
of
a
company
to
a
foreign
country.
The
answer
is
(B).
Q:
Which
of
the
following
statements
is
NOT
accurate
regarding
the
services
provided
by
local
governments
in
the
United
States?
(A)
Duplication
of
efforts
occurs
often.
(B)
Social
problems
of
the
central
city
spill
over
into
the
surrounding
residential
suburbs.
(C)
Inefficiency
in
providing
services
occurs
often.
(D)
One
neighborhood''s
efforts
to
reduce
pollution
are
always
supported
by
neighboring
communities.
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
geography
for
help.
There
may
be
economic,
social
or
political
reasons
for
two
neighboring
communities
and
their
local
governments
not
agreeing
to
pollution
reduction
efforts
initiated
by
one
of
them.
The
answer
is
(D).
Q:
The
rate
of
natural
increase
of
a
population
is
found
by
subtracting
the
(A)
crude
death
rate
from
the
crude
birth
rate.
(B)
crude
birth
rate
from
the
crude
death
rate.
(C)
doubling
time
from
the
crude
birth
rate.
(D)
fertility
rate
from
the
crude
death
rate.
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
geography
for
help.
The
difference
between
number
of
births
and
deaths
gives
the
population
increase
at
any
given
time.
The
answer
is
(A).'
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_high_school_geography
"
dataset_name"
:
"
high_school_geography"
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
geography.
\n\n
Q:
Which
one
of
the
following
items
is
an
example
of
nonmaterial
\
\
culture?
\n
(A)
Dove
soap
(B)
Dove
candy
bar
(C)
Dove
symbol
(D)
A
dove
(bird).
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
geography
for
help.
\
\
Nonmaterial
culture
consists
of
cultural
ideas,
beliefs
or
symbols
that
are
not
\
\
physical
objects.
The
answer
is
(C).
\n\n
Q:
During
the
third
stage
of
the
demographic
\
\
transition
model,
which
of
the
following
is
true?
\n
(A)
Birth
rates
increase
and
\
\
population
growth
rate
is
less
rapid.
(B)
Birth
rates
decline
and
population
growth
\
\
rate
is
less
rapid.
(C)
Birth
rates
increase
and
population
growth
rate
increases.
\
\
(D)
Birth
rates
decrease
and
population
growth
rate
increases.
\n
A:
Let's
think
\
\
step
by
step.
We
refer
to
Wikipedia
articles
on
geography
for
help.
The
demographic
\
\
transition
model
models
the
five
different
stages
of
population
growth
as
a
country
\
\
goes
through
economic
development,
where
the
third
stage
refers
to
a
period
of
\
\
declining
birth
rates
and
lower
population
growth.
The
answer
is
(B).
\n\n
Q:
The
\
\
practice
of
hiring
a
foreign
third-party
service
provider
to
run
an
operation
\
\
is
called
\n
(A)
outsourcing.
(B)
offshoring.
(C)
maquiladoras.
(D)
locational
interdependence.
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
geography
for
help.
\
\ \"
Offshoring
\"
literally
means
to
move
or
base
some
of
the
activities
or
processes
\
\
of
a
company
to
a
foreign
country.
The
answer
is
(B).
\n\n
Q:
Which
of
the
following
\
\
statements
is
NOT
accurate
regarding
the
services
provided
by
local
governments
\
\
in
the
United
States?
\n
(A)
Duplication
of
efforts
occurs
often.
(B)
Social
problems
\
\
of
the
central
city
spill
over
into
the
surrounding
residential
suburbs.
(C)
Inefficiency
\
\
in
providing
services
occurs
often.
(D)
One
neighborhood's
efforts
to
reduce
pollution
\
\
are
always
supported
by
neighboring
communities.
\n
A:
Let's
think
step
by
step.
\
\
We
refer
to
Wikipedia
articles
on
geography
for
help.
There
may
be
economic,
social
\
\
or
political
reasons
for
two
neighboring
communities
and
their
local
governments
\
\
not
agreeing
to
pollution
reduction
efforts
initiated
by
one
of
them.
The
answer
\
\
is
(D).
\n\n
Q:
The
rate
of
natural
increase
of
a
population
is
found
by
subtracting
\
\
the
\n
(A)
crude
death
rate
from
the
crude
birth
rate.
(B)
crude
birth
rate
from
\
\
the
crude
death
rate.
(C)
doubling
time
from
the
crude
birth
rate.
(D)
fertility
\
\
rate
from
the
crude
death
rate.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
geography
for
help.
The
difference
between
number
of
births
and
deaths
\
\
gives
the
population
increase
at
any
given
time.
The
answer
is
(A)."
"
group"
:
"
mmlu_flan_cot_fewshot_social_sciences"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_geography"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_government_and_politics.yaml
dataset_name
:
high_school_government_and_politics
description
:
'
The
following
are
multiple
choice
questions
(with
answers)
about
high
school
government
and
politics.
Q:
Which
of
the
following
best
states
an
argument
made
by
James
Madison
in
The
Federalist
number
10?
(A)
Honest
politicians
can
prevent
factions
from
developing.
(B)
Factions
are
more
likely
to
occur
in
large
republics
than
in
small
ones.
(C)
The
negative
effects
of
factionalism
can
be
reduced
by
a
republican
government.
(D)
Free
elections
are
the
people''s
best
defense
against
factionalism.
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
government
and
politics
for
help.
In
the
Federalist
number
10,
James
Madison
advocated
for
a
representative
republican
form
of
government
to
guard
against
factionalism.
The
answer
is
(C).
Q:
The
term
"budget
deficit"
refers
to
the
(A)
annual
increase
in
federal
spending
on
the
military
(B)
amount
of
interest
on
the
national
debt
(C)
difference
between
the
initial
budget
proposals
made
by
the
president
and
Congress
(D)
amount
the
government
spends
in
excess
of
its
revenues
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
government
and
politics
for
help.
When
the
government
spends
more
than
it
earns,
their
difference
is
the
budget
deficit.
The
answer
is
(D).
Q:
Which
of
the
following
statements
about
cabinet
departments
is
FALSE?
(A)
They
are
established
by
the
legislative
branch.
(B)
Their
members
often
don''t
have
much
influence
over
presidential
decisions.
(C)
They
cannot
all
be
run
by
leaders
who
belong
to
the
same
political
party
the
president
does.
(D)
Not
every
federal
agency
is
a
cabinet
department.
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
government
and
politics
for
help.
There
is
no
law
stipulating
that
some
cabinet
department
leaders
have
to
belong
to
a
political
party
different
from
that
of
the
president.
The
answer
is
(C).
Q:
Which
of
the
following
cases
established
the
precedent
that
a
defendant
must
be
informed
of
the
right
to
remain
silent,
the
right
to
a
lawyer,
and
protection
from
self-incrimination?
(A)
Weeks
v.
United
States
(B)
Betts
v.
Brady
(C)
Mapp
v.
Ohio
(D)
Miranda
v.
Arizona
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
government
and
politics
for
help.
In
the
landmark
Miranda
v.
Arizona
in
1966,
the
US
Supreme
Court,
based
on
the
Fifth
and
Sixth
Amendment
of
the
US
Constitution,
guaranteed
a
defendant''s
right
to
an
attorney
and
protection
from
self-incrimination.
The
answer
is
(D).
Q:
Uncertainty
over
the
limits
to
presidential
power
is
caused
primarily
by
the
fact
that
(A)
the
constitutional
definition
of
those
powers
is
broad
and
unspecific
(B)
most
people
agree
that
the
Constitution
places
too
many
limits
on
presidential
power
(C)
the
Supreme
Court
consistently
refuses
to
rule
on
cases
concerning
presidential
powers
(D)
constitutional
amendments
have
greatly
increased
presidential
powers
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
government
and
politics
for
help.
The
US
Constitution
is
not
very
specific
about
the
powers
of
the
president,
leading
to
uncertainty
over
its
limits.
The
answer
is
(A).'
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_high_school_government_and_politics
"
dataset_name"
:
"
high_school_government_and_politics"
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
government
and
politics.
\n\n
Q:
Which
of
the
following
best
states
an
argument
\
\
made
by
James
Madison
in
The
Federalist
number
10?
\n
(A)
Honest
politicians
can
\
\
prevent
factions
from
developing.
(B)
Factions
are
more
likely
to
occur
in
large
\
\
republics
than
in
small
ones.
(C)
The
negative
effects
of
factionalism
can
be
\
\
reduced
by
a
republican
government.
(D)
Free
elections
are
the
people's
best
defense
\
\
against
factionalism.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
\
on
government
and
politics
for
help.
In
the
Federalist
number
10,
James
Madison
\
\
advocated
for
a
representative
republican
form
of
government
to
guard
against
\
\
factionalism.
The
answer
is
(C).
\n\n
Q:
The
term
\"
budget
deficit
\"
refers
to
the
\n\
(A)
annual
increase
in
federal
spending
on
the
military
(B)
amount
of
interest
on
\
\
the
national
debt
(C)
difference
between
the
initial
budget
proposals
made
by
\
\
the
president
and
Congress
(D)
amount
the
government
spends
in
excess
of
its
revenues
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
government
and
politics
\
\
for
help.
When
the
government
spends
more
than
it
earns,
their
difference
is
the
\
\
budget
deficit.
The
answer
is
(D).
\n\n
Q:
Which
of
the
following
statements
about
\
\
cabinet
departments
is
FALSE?
\n
(A)
They
are
established
by
the
legislative
branch.
\
\
(B)
Their
members
often
don't
have
much
influence
over
presidential
decisions.
\
\
(C)
They
cannot
all
be
run
by
leaders
who
belong
to
the
same
political
party
the
\
\
president
does.
(D)
Not
every
federal
agency
is
a
cabinet
department.
\n
A:
Let's
\
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
government
and
politics
\
\
for
help.
There
is
no
law
stipulating
that
some
cabinet
department
leaders
have
\
\
to
belong
to
a
political
party
different
from
that
of
the
president.
The
answer
\
\
is
(C).
\n\n
Q:
Which
of
the
following
cases
established
the
precedent
that
a
defendant
\
\
must
be
informed
of
the
right
to
remain
silent,
the
right
to
a
lawyer,
and
protection
\
\
from
self-incrimination?
\n
(A)
Weeks
v.
United
States
(B)
Betts
v.
Brady
(C)
Mapp
\
\
v.
Ohio
(D)
Miranda
v.
Arizona
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
government
and
politics
for
help.
In
the
landmark
Miranda
v.
Arizona
\
\
in
1966,
the
US
Supreme
Court,
based
on
the
Fifth
and
Sixth
Amendment
of
the
US
\
\
Constitution,
guaranteed
a
defendant's
right
to
an
attorney
and
protection
from
\
\
self-incrimination.
The
answer
is
(D).
\n\n
Q:
Uncertainty
over
the
limits
to
presidential
\
\
power
is
caused
primarily
by
the
fact
that
\n
(A)
the
constitutional
definition
\
\
of
those
powers
is
broad
and
unspecific
(B)
most
people
agree
that
the
Constitution
\
\
places
too
many
limits
on
presidential
power
(C)
the
Supreme
Court
consistently
\
\
refuses
to
rule
on
cases
concerning
presidential
powers
(D)
constitutional
amendments
\
\
have
greatly
increased
presidential
powers
\n
A:
Let's
think
step
by
step.
We
refer
\
\
to
Wikipedia
articles
on
government
and
politics
for
help.
The
US
Constitution
\
\
is
not
very
specific
about
the
powers
of
the
president,
leading
to
uncertainty
\
\
over
its
limits.
The
answer
is
(A)."
"
group"
:
"
mmlu_flan_cot_fewshot_social_sciences"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_government_and_politics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_macroeconomics.yaml
dataset_name
:
high_school_macroeconomics
description
:
'
The
following
are
multiple
choice
questions
(with
answers)
about
high
school
macroeconomics.
Q:
Which
of
the
following
policies
best
describes
supply-side
fiscal
policy?
(A)
An
increase
in
the
money
supply
(B)
Increased
government
spending
(C)
Lower
taxes
on
research
and
development
of
new
technology
(D)
Higher
taxes
on
household
income
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
macroeconomics
for
help.
Supply-side
fiscal
policy
stimulates
the
economy
by
encouraging
more
production
of
goods
and
services
through
reduction
in
taxes
and
deregulation.
The
answer
is
(C).
Q:
The
short-run
Phillips
curve
indicates
a
(A)
direct
relation
between
unemployment
and
inflation
(B)
direct
relation
between
price
and
quantity
demanded
(C)
inverse
relation
between
price
and
quantity
demanded
(D)
inverse
relation
between
unemployment
and
inflation
A:
Let''s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
macroeconomics
for
help.
The
short-run
Phillips
curve
shows
that
whenever
unemployment
decreases
below
a
natural
level,
the
inflation
starts
increasing,
and
vice-versa.
The
answer
is
(D).
Q:
Holding
all
else
equal
which
of
the
following
monetary
policies
would
be
used
to
boost
U.S.
exports?
(A)
Increasing
the
discount
rate
(B)
Increasing
the
reserve
ratio
(C)
Buying
government
securities
(D)
Lowering
tariffs
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
macroeconomics
for
help.
Buying
government
securities
leads
to
reduction
in
demand
for
US
dollars
from
foreign
buyers,
thereby
making
it
cheaper
and
hence
making
US
exports
more
attractive.
The
answer
is
(C).
Q:
A
federal
deficit
occurs
when
(A)
exports
exceed
imports.
(B)
imports
exceed
exports.
(C)
federal
tax
collections
exceed
spending.
(D)
federal
spending
exceeds
federal
tax
revenues.
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
macroeconomics
for
help.
A
federal
deficit
occurs
when
federal
spending
exceeds
federal
income
which
is
primarily
from
tax
revenues.
The
answer
is
(D).
Q:
Which
of
the
following
is
not
included
in
the
U.S.
GDP?
(A)
The
U.S.
military
opens
a
new
base
in
a
foreign
country
with
1000
U.S.
personnel.
(B)
Japanese
consumers
buy
thousands
of
CDs
produced
in
the
United
States.
(C)
An
American
pop
singer
performs
a
sold-out
concert
in
Paris.
(D)
A
French
theatrical
production
tours
dozens
of
American
cities.
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
macroeconomics
for
help.
The
economic
transactions
related
to
the
performance
of
the
American
pop-singer
in
Paris
happens
entirely
outside
the
U.S.
and
hence
is
not
included
in
the
GDP
numbers.
The
answer
is
(C).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_high_school_macroeconomics
"dataset_name": "high_school_macroeconomics"
"description":
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
macroeconomics.
\n\n
Q:
Which
of
the
following
policies
best
describes
supply-side
\
\
fiscal
policy?
\n
(A)
An
increase
in
the
money
supply
(B)
Increased
government
spending
\
\
(C)
Lower
taxes
on
research
and
development
of
new
technology
(D)
Higher
taxes
\
\
on
household
income
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
\
on
macroeconomics
for
help.
Supply-side
fiscal
policy
stimulates
the
economy
by
\
\
encouraging
more
production
of
goods
and
services
through
reduction
in
taxes
and
\
\
deregulation.
The
answer
is
(C).
\n\n
Q:
The
short-run
Phillips
curve
indicates
\
\
a
\n
(A)
direct
relation
between
unemployment
and
inflation
(B)
direct
relation
\
\
between
price
and
quantity
demanded
(C)
inverse
relation
between
price
and
quantity
\
\
demanded
(D)
inverse
relation
between
unemployment
and
inflation
\n
A:
Let's
think
\
\
step
by
step.
We
refer
to
Wikipedia
articles
on
macroeconomics
for
help.
The
short-run
\
\
Phillips
curve
shows
that
whenever
unemployment
decreases
below
a
natural
level,
\
\
the
inflation
starts
increasing,
and
vice-versa.
The
answer
is
(D).
\n\n
Q:
Holding
\
\
all
else
equal
which
of
the
following
monetary
policies
would
be
used
to
boost
\
\
U.S.
exports?
\n
(A)
Increasing
the
discount
rate
(B)
Increasing
the
reserve
ratio
\
\
(C)
Buying
government
securities
(D)
Lowering
tariffs
\n
A:
Let's
think
step
by
\
\
step.
We
refer
to
Wikipedia
articles
on
macroeconomics
for
help.
Buying
government
\
\
securities
leads
to
reduction
in
demand
for
US
dollars
from
foreign
buyers,
thereby
\
\
making
it
cheaper
and
hence
making
US
exports
more
attractive.
The
answer
is
(C).
\n\
\n
Q:
A
federal
deficit
occurs
when
\n
(A)
exports
exceed
imports.
(B)
imports
exceed
\
\
exports.
(C)
federal
tax
collections
exceed
spending.
(D)
federal
spending
exceeds
\
\
federal
tax
revenues.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
\
on
macroeconomics
for
help.
A
federal
deficit
occurs
when
federal
spending
exceeds
\
\
federal
income
which
is
primarily
from
tax
revenues.
The
answer
is
(D).
\n\n
Q:
\
\
Which
of
the
following
is
not
included
in
the
U.S.
GDP?
\n
(A)
The
U.S.
military
\
\
opens
a
new
base
in
a
foreign
country
with
1000
U.S.
personnel.
(B)
Japanese
consumers
\
\
buy
thousands
of
CDs
produced
in
the
United
States.
(C)
An
American
pop
singer
\
\
performs
a
sold-out
concert
in
Paris.
(D)
A
French
theatrical
production
tours
\
\
dozens
of
American
cities.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
macroeconomics
for
help.
The
economic
transactions
related
to
the
\
\
performance
of
the
American
pop-singer
in
Paris
happens
entirely
outside
the
U.S.
\
\
and
hence
is
not
included
in
the
GDP
numbers.
The
answer
is
(C)."
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_macroeconomics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_mathematics.yaml
dataset_name: high_school_mathematics
description:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
"dataset_name": "high_school_mathematics"
"description":
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
mathematics.
\n\n
Q:
Simplify
and
write
the
result
with
a
rational
denominator:
\
\
$$
\\
sqrt{
\\
sqrt[3]{
\\
sqrt{
\\
frac{1}{729}}}}$$
\n
(A)
\\
frac{3
\\
sqrt{3}}{3}
(B)
\\\
frac{1}{3}
(C)
\\
sqrt{3}
(D)
\\
frac{
\\
sqrt{3}}{3}
\n
A:
Let's
think
step
by
step.
\
...
...
@@ -13,7 +13,7 @@ description: "The following are multiple choice questions (with answers) about h
\
of
$9600/300=32=2^5$.
Since
at
this
interest
rate
it
takes
six
years
for
it
to
\
\
double,
it
will
take
$5*6=30$
years
to
grow
to
$
\\
$9600$.
The
answer
is
(C).
\n\
\n
Q:
Ten
students
take
a
biology
test
and
receive
the
following
scores:
45,
55,
\
\
50,
70,
65,
80,
40,
90,
70,
85.
What
is
the
mean
of
the
students
\u2019
test
scores?
\n\
\
50,
70,
65,
80,
40,
90,
70,
85.
What
is
the
mean
of
the
students
’
test
scores?
\n\
(A)
55
(B)
60
(C)
62
(D)
65
\n
A:
Let's
think
step
by
step.
There
are
10
students
\
\
and
the
sum
of
their
scores
is
$45
+
55
+
50
+
70
+
65
+
80
+
40
+
90
+
70
+
85
\
\
=
650$,
the
mean
is
$650/10=65$.
The
answer
is
(D).
\n\n
Q:
The
variable
$x$
varies
\
...
...
@@ -32,5 +32,6 @@ description: "The following are multiple choice questions (with answers) about h
\
dance.)
\n
(A)
3
(B)
15
(C)
6
(D)
5
\n
A:
Let's
think
step
by
step.
The
least
common
\
\
multiple
of
2,
3
and
5
is
30,
so
during
a
7
minute
dance,
all
the
three
lights
\
\
will
come
on
at
the
same
time
$2*7+1=15$
times.
The
answer
is
(B)."
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_high_school_mathematics
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_mathematics"
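The worked answers visible in the high_school_mathematics prompt above can be re-checked numerically. The following is a minimal sketch, not part of the commit: it only recomputes the two pieces of arithmetic quoted in the diff (the mean of the ten test scores, and the number of six-year doublings needed to grow by a factor of 9600/300). The variable names are illustrative assumptions.

```python
import math

# Test scores quoted in the few-shot example in the diff above.
scores = [45, 55, 50, 70, 65, 80, 40, 90, 70, 85]
total = sum(scores)
mean = total / len(scores)

# Growth factor quoted in the compound-interest example: 9600 / 300.
growth_factor = 9600 / 300
doublings = math.log2(growth_factor)

# The prompt states that one doubling takes six years.
years = doublings * 6

print(f"sum={total}, mean={mean}")
print(f"growth factor={growth_factor}, doublings={doublings}, years={years}")
```

Both results agree with the answers given in the prompt text, (D) 65 for the mean and 30 years for the growth question.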
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_microeconomics.yaml
dataset_name: high_school_microeconomics
description:
'
The
following
are
multiple
choice
questions
(with
answers)
about
high
school
microeconomics.
Q:
Which
of
the
following
is
necessarily
a
characteristic
of
oligopoly?
(A)
Free
entry
into
and
exit
from
the
market
(B)
A
few
large
producers
(C)
One
producer
of
a
good
with
no
close
substitutes
(D)
A
homogenous
product
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
microeconomics
for
help.
An
oligopoly
is
when
a
market
is
dominated
by
just
one
or
a
few
number
of
sellers
or
producers.
To
get
oligopoly,
the
market
should
have
high
barriers
to
new
entry,
and
the
product
has
differentiation.
The
answer
is
(B).
Q:
If
the
government
subsidizes
producers
in
a
perfectly
competitive
market,
then
(A)
the
demand
for
the
product
will
increase
(B)
the
demand
for
the
product
will
decrease
(C)
the
consumer
surplus
will
increase
(D)
the
consumer
surplus
will
decrease
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
microeconomics
for
help.
(A)
and
(B)
are
wrong
because
the
demand
curve
does
not
change
at
all.
If
the
government
subsidizes
producers,
the
supply
will
increase,
and
thus
the
consumer
surplus
also
increases.
The
answer
is
(C).
Q:
Which
of
the
following
is
true
of
a
price
floor?
(A)
The
price
floor
shifts
the
demand
curve
to
the
left.
(B)
An
effective
floor
creates
a
shortage
of
the
good.
(C)
The
price
floor
shifts
the
supply
curve
of
the
good
to
the
right.
(D)
To
be
an
effective
floor,
it
must
be
set
above
the
equilibrium
price.
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
microeconomics
for
help.
Price
floor
does
not
shift
the
demand
or
shift
curve.
An
effective
price
floor
should
be
set
above
the
equilibrium
price,
otherwise
the
market
bears
and
the
floor
does
not
have
effective
effect.
The
answer
is
(D).
Q:
The
concentration
ratio
for
a
monopoly
is
(A)
0
(B)
5
(C)
10
(D)
100
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
microeconomics
for
help.
The
concentration
ratio
is
calculated
as
the
sum
of
market
share
of
a
specific
number
of
largest
companies.
Monopoly
means
one
company
or
entity
controls
the
entire
market,
therefore,
the
concentration
ratio
is
100
percent.
The
answer
is
(D).
Q:
In
a
competitive
labor
market
for
housepainters,
which
of
the
following
would
increase
the
demand
for
housepainters?
(A)
An
effective
minimum
wage
imposed
on
this
labor
market.
(B)
An
increase
in
the
price
of
gallons
of
paint.
(C)
An
increase
in
the
construction
of
new
houses.
(D)
An
increase
in
the
price
of
mechanical
painters
so
long
as
the
output
effect
exceeds
the
substitution
effect.
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
microeconomics
for
help.
An
increase
in
the
construction
of
new
houses
means
an
increase
demand
of
in-house
painting,
thus
increases
the
demand
for
housepainters.
The
answer
is
(C).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_high_school_microeconomics
"dataset_name": "high_school_microeconomics"
"description":
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
microeconomics.
\n\n
Q:
Which
of
the
following
is
necessarily
a
characteristic
\
\
of
oligopoly?
\n
(A)
Free
entry
into
and
exit
from
the
market
(B)
A
few
large
producers
\
\
(C)
One
producer
of
a
good
with
no
close
substitutes
(D)
A
homogenous
product
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
microeconomics
for
\
\
help.
An
oligopoly
is
when
a
market
is
dominated
by
just
one
or
a
few
number
of
\
\
sellers
or
producers.
To
get
oligopoly,
the
market
should
have
high
barriers
to
\
\
new
entry,
and
the
product
has
differentiation.
The
answer
is
(B).
\n\n
Q:
If
the
\
\
government
subsidizes
producers
in
a
perfectly
competitive
market,
then
\n
(A)
the
\
\
demand
for
the
product
will
increase
(B)
the
demand
for
the
product
will
decrease
\
\
(C)
the
consumer
surplus
will
increase
(D)
the
consumer
surplus
will
decrease
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
microeconomics
for
\
\
help.
(A)
and
(B)
are
wrong
because
the
demand
curve
does
not
change
at
all.
If
\
\
the
government
subsidizes
producers,
the
supply
will
increase,
and
thus
the
consumer
\
\
surplus
also
increases.
The
answer
is
(C).
\n\n
Q:
Which
of
the
following
is
true
\
\
of
a
price
floor?
\n
(A)
The
price
floor
shifts
the
demand
curve
to
the
left.
(B)
\
\
An
effective
floor
creates
a
shortage
of
the
good.
(C)
The
price
floor
shifts
\
\
the
supply
curve
of
the
good
to
the
right.
(D)
To
be
an
effective
floor,
it
must
\
\
be
set
above
the
equilibrium
price.
\n
A:
Let's
think
step
by
step.
We
refer
to
\
\
Wikipedia
articles
on
microeconomics
for
help.
Price
floor
does
not
shift
the
\
\
demand
or
shift
curve.
An
effective
price
floor
should
be
set
above
the
equilibrium
\
\
price,
otherwise
the
market
bears
and
the
floor
does
not
have
effective
effect.
\
\
The
answer
is
(D).
\n\n
Q:
The
concentration
ratio
for
a
monopoly
is
\n
(A)
0
(B)
\
\
5
(C)
10
(D)
100
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
\
on
microeconomics
for
help.
The
concentration
ratio
is
calculated
as
the
sum
of
\
\
market
share
of
a
specific
number
of
largest
companies.
Monopoly
means
one
company
\
\
or
entity
controls
the
entire
market,
therefore,
the
concentration
ratio
is
100
\
\
percent.
The
answer
is
(D).
\n\n
Q:
In
a
competitive
labor
market
for
housepainters,
\
\
which
of
the
following
would
increase
the
demand
for
housepainters?
\n
(A)
An
effective
\
\
minimum
wage
imposed
on
this
labor
market.
(B)
An
increase
in
the
price
of
gallons
\
\
of
paint.
(C)
An
increase
in
the
construction
of
new
houses.
(D)
An
increase
in
\
\
the
price
of
mechanical
painters
so
long
as
the
output
effect
exceeds
the
substitution
\
\
effect.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
microeconomics
\
\
for
help.
An
increase
in
the
construction
of
new
houses
means
an
increase
demand
\
\
of
in-house
painting,
thus
increases
the
demand
for
housepainters.
The
answer
\
\
is
(C)."
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_microeconomics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_physics.yaml
dataset_name: high_school_physics
description:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
"dataset_name": "high_school_physics"
"description":
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
physics.
\n\n
Q:
A
microwave
oven
is
connected
to
an
outlet,
120
V,
and
draws
\
\
a
current
of
2
amps.
At
what
rate
is
energy
being
used
by
the
microwave
oven?
\n\
(A)
10
W
(B)
30
W
(C)
60
W
(D)
240
W
\n
A:
Let's
think
step
by
step.
Rate
of
energy
\
\
usage
is
known
as
power;
in
an
dissipative
electrical
circuit,
power
is
given
\
\
by
voltage
times
current.
So
in
our
case,
the
power
is
120
V
times
2
amps,
or
\
\
240
W.
The
answer
is
(D).
\n\n
Q:
A
point
charge,
Q
=
+1
mC,
is
fixed
at
the
origin.
\
\
How
much
work
is
required
to
move
a
charge,
Q
=
+8
\xB5
C,
from
the
point
(0,
4
\
\
meters)
to
the
point
(3
meters,
0)?
\n
(A)
3.5
J
(B)
6.0
J
(C)
22.5
J
(D)
40
J
\n\
A:
Let's
think
step
by
step.
To
calculate
the
work
required
to
move
a
charge
from
\
\
one
location
to
another
in
a
fixed
electric
field,
it
is
enough
to
calculate
the
\
\
potential
difference
between
the
two
locations.
Here,
the
potential
only
depends
\
\
on
the
distance
between
the
charges;
it
\u2019
s
$k
q_1
q_2
/
r$,
where
$k$
is
Coulomb
\u2019\
s
constant.
Plugging
in
values
$q_1
=
$
1
mC,
$q_2
=
8
\\
mu$
C,
gives
the
answer
\
\
as
5.992
J,
which
rounds
to
6
J.
The
answer
is
(B).
\n\n
Q:
Which
of
the
following
\
\
conditions
will
ensure
that
angular
momentum
is
conserved?
I.
Conservation
of
\
\
linear
momentum
II.
Zero
net
external
force
III.
Zero
net
external
torque
\n
(A)
\
\
I
and
II
only
(B)
I
and
III
only
(C)
II
and
III
only
(D)
III
only
\n
A:
Let's
think
\
\
step
by
step.
Torque
is
defined
as
the
change
in
angular
momentum;
if
there
is
\
\
zero
external
torque,
angular
momentum
is
conserved.
The
answer
is
(D).
\n\n
Q:
\
\
A
photocell
of
work
function
\u03D5
=
2eV
is
connected
to
a
resistor
in
series.
\
\
Light
of
frequency
f
=
1
\xD7
10^15
Hz
hits
a
metal
plate
of
the
photocell.
If
\
\
the
power
of
the
light
is
P
=
100
W,
what
is
the
current
through
the
resistor?
\n\
(A)
2:00
AM
(B)
6:00
AM
(C)
12:00
AM
(D)
24
A
\n
A:
Let's
think
step
by
step.
The
\
\
only
answer
above
which
has
units
of
current
is
D,
24
A.
The
answer
is
(D).
\n\n\
Q:
A
pipe
full
of
air
is
closed
at
one
end.
A
standing
wave
is
produced
in
the
pipe,
\
\
causing
the
pipe
to
sound
a
note.
Which
of
the
following
is
a
correct
statement
\
\
about
the
wave
\u2019
s
properties
at
the
closed
end
of
the
pipe?
\n
(A)
The
pressure
\
\
is
at
a
node,
but
the
particle
displacement
is
at
an
antinode.
(B)
The
pressure
\
\
is
at
an
antinode,
but
the
particle
displacement
is
at
a
node.
(C)
The
pressure
\
\
and
the
particle
displacement
are
both
at
nodes.
(D)
The
pressure
and
the
particle
\
\
displacement
are
both
at
antinodes.
\n
A:
Let's
think
step
by
step.
At
the
closed
\
\
end
of
the
pipe,
the
particles
cannot
have
any
net
displacement
because
the
pipe
\
\
closure
stops
them.
So
the
particle
displacement
is
at
a
node.
This
closure
also
\
\
causes
the
pressure
to
be
maximal,
i.e.
an
antinode.
The
answer
is
(B)."
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_high_school_physics
\
How
much
work
is
required
to
move
a
charge,
Q
=
+8
µC,
from
the
point
(0,
4
meters)
\
\
to
the
point
(3
meters,
0)?
\n
(A)
3.5
J
(B)
6.0
J
(C)
22.5
J
(D)
40
J
\n
A:
Let's
\
\
think
step
by
step.
To
calculate
the
work
required
to
move
a
charge
from
one
location
\
\
to
another
in
a
fixed
electric
field,
it
is
enough
to
calculate
the
potential
\
\
difference
between
the
two
locations.
Here,
the
potential
only
depends
on
the
\
\
distance
between
the
charges;
it’s
$k
q_1
q_2
/
r$,
where
$k$
is
Coulomb’s
constant.
\
\
Plugging
in
values
$q_1
=
$
1
mC,
$q_2
=
8
\\
mu$
C,
gives
the
answer
as
5.992
\
\
J,
which
rounds
to
6
J.
The
answer
is
(B).
\n\n
Q:
Which
of
the
following
conditions
\
\
will
ensure
that
angular
momentum
is
conserved?
I.
Conservation
of
linear
momentum
\
\
II.
Zero
net
external
force
III.
Zero
net
external
torque
\n
(A)
I
and
II
only
(B)
\
\
I
and
III
only
(C)
II
and
III
only
(D)
III
only
\n
A:
Let's
think
step
by
step.
\
\
Torque
is
defined
as
the
change
in
angular
momentum;
if
there
is
zero
external
\
\
torque,
angular
momentum
is
conserved.
The
answer
is
(D).
\n\n
Q:
A
photocell
of
\
\
work
function
ϕ
=
2eV
is
connected
to
a
resistor
in
series.
Light
of
frequency
\
\
f
=
1
×
10^15
Hz
hits
a
metal
plate
of
the
photocell.
If
the
power
of
the
light
\
\
is
P
=
100
W,
what
is
the
current
through
the
resistor?
\n
(A)
2:00
AM
(B)
6:00
\
\
AM
(C)
12:00
AM
(D)
24
A
\n
A:
Let's
think
step
by
step.
The
only
answer
above
which
\
\
has
units
of
current
is
D,
24
A.
The
answer
is
(D).
\n\n
Q:
A
pipe
full
of
air
is
\
\
closed
at
one
end.
A
standing
wave
is
produced
in
the
pipe,
causing
the
pipe
to
\
\
sound
a
note.
Which
of
the
following
is
a
correct
statement
about
the
wave’s
properties
\
\
at
the
closed
end
of
the
pipe?
\n
(A)
The
pressure
is
at
a
node,
but
the
particle
\
\
displacement
is
at
an
antinode.
(B)
The
pressure
is
at
an
antinode,
but
the
particle
\
\
displacement
is
at
a
node.
(C)
The
pressure
and
the
particle
displacement
are
\
\
both
at
nodes.
(D)
The
pressure
and
the
particle
displacement
are
both
at
antinodes.
\n\
A:
Let's
think
step
by
step.
At
the
closed
end
of
the
pipe,
the
particles
cannot
\
\
have
any
net
displacement
because
the
pipe
closure
stops
them.
So
the
particle
\
\
displacement
is
at
a
node.
This
closure
also
causes
the
pressure
to
be
maximal,
\
\
i.e.
an
antinode.
The
answer
is
(B)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_physics"
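The first two worked answers in the high_school_physics prompt above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions rather than anything from the commit: the value of Coulomb's constant `K` is a standard rounded figure that the prompt does not state, and the function and variable names are hypothetical.

```python
# Coulomb's constant in N*m^2/C^2. Assumed value; the prompt only names it "k".
K = 8.988e9

def potential_energy(q1: float, q2: float, r: float) -> float:
    """Electrostatic potential energy k*q1*q2/r of two point charges."""
    return K * q1 * q2 / r

# Microwave example: power is voltage times current.
power_watts = 120 * 2

# Point-charge example: 1 mC fixed at the origin, 8 uC moved from
# (0, 4 m) to (3 m, 0), i.e. from a distance of 4 m to a distance of 3 m.
q_fixed = 1e-3
q_moved = 8e-6
r_start = 4.0
r_end = 3.0

# Work required equals the change in potential energy.
work_joules = (potential_energy(q_fixed, q_moved, r_end)
               - potential_energy(q_fixed, q_moved, r_start))

print(f"power = {power_watts} W")
print(f"work  = {work_joules:.3f} J")
```

With this value of `K` the work comes out at roughly 5.99 J, consistent with the 6 J answer (B) in the prompt, and the power matches answer (D), 240 W.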
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_psychology.yaml
dataset_name: high_school_psychology
description:
'
The
following
are
multiple
choice
questions
(with
answers)
about
high
school
psychology.
Q:
Pascale
is
interested
in
the
processing
strategies
children
use
to
learn
new
information.
Pascale
would
best
be
classified
as
what
type
of
psychologist?
(A)
sociocultural
(B)
clinical
(C)
cognitive
(D)
behaviorist
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
psychology
for
help.
Sociocultural
psychologist
focuses
on
the
effect
of
societal
factors
on
people.
Clinical
psychologist
focuses
on
people
with
mental
issues.
Cognitive
psychologist
focuses
on
how
people
think
and
learn,
including
the
processing
strategies.
Behaviorist
focuses
more
on
the
environment
and
experience
effect
on
people.
The
answer
is
(C).
Q:
According
to
Caplan'
'
s
model
of
consultee-centered
case
consultation,
the
consultant
is
primarily
interested
in
(A)
identifying
the
causes
and
solutions
of
the
client'
'
s
presenting
problems
(B)
identifying
and
eliminating
the
causes
of
the
consultee'
'
s
difficulties
in
handling
a
problem
(C)
establishing
a
hierarchy
of
authority
to
enable
effective
decision
making
(D)
presenting
a
single,
well-defined
and
unambiguous
course
of
action
for
the
consultant
to
overcome
skills
deficits
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
psychology
for
help.
Caplan
defines
two
type
of
consultation.
Client-centered
case
consultation
aims
to
handle
client'
'
s
problems,
while
consultee-centered
case
consultation
aims
to
identify
the
reason
of
client'
'
s
difficulty
to
solve
problems.
The
answer
is
(B).
Q:
According
to
the
Individuals
with
Disabilities
Education
Improvement
Act,
which
of
the
following
must
an
educational
agency
do
before
it
changes
the
educational
placement
of
a
student
with
a
disability?
(A)
Give
the
child
a
trial
period
in
the
new
environment
(B)
Notify
the
parents
in
writing
(C)
Obtain
school
board
approval
(D)
Obtain
parental
consent
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
psychology
for
help.
When
the
decision
to
change
the
educational
placement
of
a
student
with
a
disability
is
made,
the
educational
agency
must
notify
the
parents
in
writing
on
that
date.
The
answer
is
(B).
Q:
While
swimming
in
the
ocean,
Ivan
is
frightened
by
a
dark
shadow
in
the
water
even
before
he
has
the
chance
to
identify
what
the
shadow
is.
The
synaptic
connections
taking
place
during
this
incident
of
fright
are
best
described
by
which
of
the
following?
(A)
Messages
are
sent
from
the
thalamus
directly
to
the
amygdala.
(B)
Messages
are
sent
from
the
thalamus
to
the
"what"
and
"where"
pathways.
(C)
Messages
are
sent
from
the
parasympathetic
nervous
system
to
the
cerebral
cortex.
(D)
Messages
are
sent
from
the
frontal
lobes
to
the
pituitary
gland.
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
psychology
for
help.
Our
neural
system
has
a
mechanism
that
can
respond
immediate
emotional
signal
before
going
to
the
thought
center.
In
the
Ivan'
'
s
case,
messages
travel
directly
from
thalamus
to
amygdala.
The
answer
is
(A).
Q:
Ani
believes
that
her
attitudes
and
behavior
play
a
central
role
in
what
happens
to
her.
Such
a
belief
is
likely
to
be
associated
with
(A)
a
strong
superego.
(B)
low
self-esteem.
(C)
low
self-efficacy.
(D)
an
internal
locus
of
control.
A:
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
psychology
for
help.
People
with
an
external
locus
of
control
believes
fate
and
luck
play
an
important
role
in
their
lives,
while
people
with
an
internal
locus
of
control
believes
they
control
their
lives.
The
answer
is
(D).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_high_school_psychology
"dataset_name": "high_school_psychology"
"description":
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
psychology.
\n\n
Q:
Pascale
is
interested
in
the
processing
strategies
children
\
\
use
to
learn
new
information.
Pascale
would
best
be
classified
as
what
type
of
\
\
psychologist?
\n
(A)
sociocultural
(B)
clinical
(C)
cognitive
(D)
behaviorist
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
psychology
for
help.
\
\
Sociocultural
psychologist
focuses
on
the
effect
of
societal
factors
on
people.
\
\
Clinical
psychologist
focuses
on
people
with
mental
issues.
Cognitive
psychologist
\
\
focuses
on
how
people
think
and
learn,
including
the
processing
strategies.
Behaviorist
\
\
focuses
more
on
the
environment
and
experience
effect
on
people.
The
answer
is
\
\
(C).
\n\n
Q:
According
to
Caplan's
model
of
consultee-centered
case
consultation,
\
\
the
consultant
is
primarily
interested
in
\n
(A)
identifying
the
causes
and
solutions
\
\
of
the
client's
presenting
problems
(B)
identifying
and
eliminating
the
causes
\
\
of
the
consultee's
difficulties
in
handling
a
problem
(C)
establishing
a
hierarchy
\
\
of
authority
to
enable
effective
decision
making
(D)
presenting
a
single,
well-defined
\
\
and
unambiguous
course
of
action
for
the
consultant
to
overcome
skills
deficits
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
psychology
for
help.
\
\
Caplan
defines
two
type
of
consultation.
Client-centered
case
consultation
aims
\
\
to
handle
client's
problems,
while
consultee-centered
case
consultation
aims
to
\
\
identify
the
reason
of
client's
difficulty
to
solve
problems.
The
answer
is
(B).
\n\
\n
Q:
According
to
the
Individuals
with
Disabilities
Education
Improvement
Act,
which
\
\
of
the
following
must
an
educational
agency
do
before
it
changes
the
educational
\
\
placement
of
a
student
with
a
disability?
\n
(A)
Give
the
child
a
trial
period
in
\
\
the
new
environment
(B)
Notify
the
parents
in
writing
(C)
Obtain
school
board
\
\
approval
(D)
Obtain
parental
consent
\n
A:
Let's
think
step
by
step.
We
refer
to
\
\
Wikipedia
articles
on
psychology
for
help.
When
the
decision
to
change
the
educational
\
\
placement
of
a
student
with
a
disability
is
made,
the
educational
agency
must
\
\
notify
the
parents
in
writing
on
that
date.
The
answer
is
(B).
\n\n
Q:
While
swimming
\
\
in
the
ocean,
Ivan
is
frightened
by
a
dark
shadow
in
the
water
even
before
he
\
\
has
the
chance
to
identify
what
the
shadow
is.
The
synaptic
connections
taking
\
\
place
during
this
incident
of
fright
are
best
described
by
which
of
the
following?
\n\
(A)
Messages
are
sent
from
the
thalamus
directly
to
the
amygdala.
(B)
Messages
are
\
\
sent
from
the
thalamus
to
the
\"
what
\"
and
\"
where
\"
pathways.
(C)
Messages
are
\
\
sent
from
the
parasympathetic
nervous
system
to
the
cerebral
cortex.
(D)
Messages
\
\
are
sent
from
the
frontal
lobes
to
the
pituitary
gland.
\n
A:
Let's
think
step
by
\
\
step.
We
refer
to
Wikipedia
articles
on
psychology
for
help.
Our
neural
system
\
\
has
a
mechanism
that
can
respond
immediate
emotional
signal
before
going
to
the
\
\
thought
center.
In
the
Ivan's
case,
messages
travel
directly
from
thalamus
to
\
\
amygdala.
The
answer
is
(A).
\n\n
Q:
Ani
believes
that
her
attitudes
and
behavior
\
\
play
a
central
role
in
what
happens
to
her.
Such
a
belief
is
likely
to
be
associated
\
\
with
\n
(A)
a
strong
superego.
(B)
low
self-esteem.
(C)
low
self-efficacy.
(D)
an
\
\
internal
locus
of
control.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
psychology
for
help.
People
with
an
external
locus
of
control
believes
\
\
fate
and
luck
play
an
important
role
in
their
lives,
while
people
with
an
internal
\
\
locus
of
control
believes
they
control
their
lives.
The
answer
is
(D)."
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_psychology"