gaoqiong
lm-evaluation-harness
Commit da211969 (Unverified)
Authored Jun 28, 2024 by Jess; committed by GitHub Jun 28, 2024
Merge branch 'EleutherAI:main' into main
Parents: 1b97e487, 801322e0
Changes
654
Showing 20 changed files with 1422 additions and 996 deletions
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_anatomy.yaml  +75 -57
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_astronomy.yaml  +70 -55
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_business_ethics.yaml  +75 -56
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_clinical_knowledge.yaml  +48 -35
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_biology.yaml  +75 -61
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_chemistry.yaml  +49 -38
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_computer_science.yaml  +180 -79
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_mathematics.yaml  +73 -50
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_medicine.yaml  +68 -52
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_physics.yaml  +61 -44
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_computer_security.yaml  +50 -36
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_conceptual_physics.yaml  +49 -33
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_econometrics.yaml  +87 -63
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_electrical_engineering.yaml  +47 -34
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_elementary_mathematics.yaml  +77 -41
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_formal_logic.yaml  +70 -53
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_global_facts.yaml  +49 -34
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_biology.yaml  +69 -54
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_chemistry.yaml  +66 -50
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_computer_science.yaml  +84 -71
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_anatomy.yaml @ da211969
-"dataset_name": "anatomy"
-"description": "The following are multiple choice questions (with answers) about anatomy.\n\
-  \nQ: Which of the following is the body cavity that contains the pituitary gland?\n\
-  (A) Abdominal (B) Cranial (C) Pleural (D) Spinal\nA: Let's think step by step. We\
-  \ refer to Wikipedia articles on anatomy for help. Let’s solve this problem step\
-  \ by step. The pituitary gland is the major endocrine gland attached to the base\
-  \ of the brain, and it is contained in the Cranial cavity. The answer is (B).\n\n\
-  Q: Which of these branches of the trigeminal nerve contain somatic motor processes?\n\
-  (A) The supraorbital nerve (B) The infraorbital nerve (C) The mental nerve (D) None\
-  \ of the above\nA: Let's think step by step. We refer to Wikipedia articles on anatomy\
-  \ for help. Let’s solve this problem step by step.\nWe know the following: (A)\
-  \ The supraorbital nerve (also known as the frontal nerve) is the largest branch\
-  \ of the ophthalmic nerve and branch of ophthalmic division of the trigeminal nerve.\
-  \ (B) The infraorbital nerve is a branch of the maxillary division of the trigeminal\
-  \ nerve. (C) The mental nerve is a branch of the mandibular division of the trigeminal\
-  \ nerve. Because all these nerves are purely sensory nerves and do not contain any\
-  \ somatic motor processes. Therefore, the answer should be none of the above, which\
-  \ is (D). The answer is (D).\n\nQ: In Angle's Class II Div 2 occlusion there is\n\
-  (A) excess overbite of the upper lateral incisors. (B) negative overjet of the upper\
-  \ central incisors. (C) excess overjet of the upper lateral incisors. (D) excess\
-  \ overjet of the upper central incisors.\nA: Let's think step by step. We refer\
-  \ to Wikipedia articles on anatomy for help. Let’s solve this problem step by step.\
-  \ This is a question related to anatomy and orthodontics. Excess overjet is associated\
-  \ with Class II occlusions; therefore, we can safely eliminate (B) from the list,\
-  \ as negative overjet is often associated with Class III occlusions. Now, we need\
-  \ to determine the location of the excess overjet, and that would be the upper (maxillary)\
-  \ lateral incisors. Only (C) has the correct information. The answer is (C).\n\n\
-  Q: The pleura\n(A) have no sensory innervation. (B) are separated by a 2 mm space.\
-  \ (C) extend into the neck. (D) are composed of respiratory epithelium.\nA: Let's\
-  \ think step by step. We refer to Wikipedia articles on anatomy for help. Let’s\
-  \ solve this problem step by step. First, recall that the pleura refers to the thin\
-  \ layer of tissue that covers the lungs and lines the interior wall of the chest\
-  \ cavity. Now, let’s look at each option:\nOption (A): “The pleura have no sensory\
-  \ innervation.” This information is not correct. The pleura do have a sensory innervation.\n\
-  Option (B): “The pleura are separated by a 2 mm space.” This information is not\
-  \ correct. There is a very thin “potential” space between the layers of the pleura;\
-  \ however, it is typically filled with serous pleural fluid.\nOption (C): “The\
-  \ pleura extend into the neck.” This information is actuakky true. The cervical\
-  \ pleura, also known as the dome of the pleuradome of the pleura, lines the extendsiton\
-  \ of the pleural cavity into the neck.\nOption (D): “The pleura are composed of\
-  \ respiratory epithelium.” This information is not correct. The pleaura are composed\
-  \ of connective tissue (CT).\nBecause (A), (B), and (D) are all incorrect, (D) is\
-  \ the only correct answer. The answer is (C).\n\nQ: What is the embryological origin\
-  \ of the hyoid bone?\n(A) The first pharyngeal arch (B) The first and second pharyngeal\
-  \ arches (C) The second pharyngeal arch (D) The second and third pharyngeal arches\n\
-  A: Let's think step by step. We refer to Wikipedia articles on anatomy for help.\
-  \ Let’s solve this problem step by step. The hyoid bone, which is also known as\
-  \ the hyooid, is a a small U-shaped bone located in the anterior neck. In its resting\
-  \ position, it lies between the ase of the mandible and the third cervical vertebrae.\
-  \ We know that the second and the third pharyngeal arches give rise to the horns\
-  \ of the hyoid bone; therefore, the embryological origin of the hyoid bone are the\
-  \ second and the third pharyngeal arches—this information is covered in the last\
-  \ option (D). Therefore, we conclude that (D) must be the correct answer. The answer\
-  \ is (D).\n\n"
-"group": "mmlu_flan_cot_fewshot_stem"
-"include": "_mmlu_flan_cot_fewshot_template_yaml"
-"task": "mmlu_flan_cot_fewshot_anatomy"
+dataset_name: anatomy
+description: The following are multiple choice questions (with answers) about anatomy.
+fewshot_config:
+  sampler: first_n
+  samples:
+  - question: 'Which of the following is the body cavity that contains the pituitary
+      gland?
+
+      (A) Abdominal (B) Cranial (C) Pleural (D) Spinal'
+    target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
+      \ help. Let\u2019s solve this problem step by step. The pituitary gland is the\
+      \ major endocrine gland attached to the base of the brain, and it is contained\
+      \ in the Cranial cavity. The answer is (B)."
+  - question: 'Which of these branches of the trigeminal nerve contain somatic motor
+      processes?
+
+      (A) The supraorbital nerve (B) The infraorbital nerve (C) The mental nerve (D)
+      None of the above'
+    target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
+      \ help. Let\u2019s solve this problem step by step.\nWe know the following:\
+      \ (A) The supraorbital nerve (also known as the frontal nerve) is the largest\
+      \ branch of the ophthalmic nerve and branch of ophthalmic division of the trigeminal\
+      \ nerve. (B) The infraorbital nerve is a branch of the maxillary division of\
+      \ the trigeminal nerve. (C) The mental nerve is a branch of the mandibular division\
+      \ of the trigeminal nerve. Because all these nerves are purely sensory nerves\
+      \ and do not contain any somatic motor processes. Therefore, the answer should\
+      \ be none of the above, which is (D). The answer is (D)."
+  - question: 'In Angle''s Class II Div 2 occlusion there is
+
+      (A) excess overbite of the upper lateral incisors. (B) negative overjet of the
+      upper central incisors. (C) excess overjet of the upper lateral incisors. (D)
+      excess overjet of the upper central incisors.'
+    target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
+      \ help. Let\u2019s solve this problem step by step. This is a question related\
+      \ to anatomy and orthodontics. Excess overjet is associated with Class II occlusions;\
+      \ therefore, we can safely eliminate (B) from the list, as negative overjet\
+      \ is often associated with Class III occlusions. Now, we need to determine the\
+      \ location of the excess overjet, and that would be the upper (maxillary) lateral\
+      \ incisors. Only (C) has the correct information. The answer is (C)."
+  - question: 'The pleura
+
+      (A) have no sensory innervation. (B) are separated by a 2 mm space. (C) extend
+      into the neck. (D) are composed of respiratory epithelium.'
+    target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
+      \ help. Let\u2019s solve this problem step by step. First, recall that the pleura\
+      \ refers to the thin layer of tissue that covers the lungs and lines the interior\
+      \ wall of the chest cavity. Now, let\u2019s look at each option:\nOption (A):\
+      \ \u201CThe pleura have no sensory innervation.\u201D This information is not\
+      \ correct. The pleura do have a sensory innervation.\nOption (B): \u201CThe\
+      \ pleura are separated by a 2 mm space.\u201D This information is not correct.\
+      \ There is a very thin \u201Cpotential\u201D space between the layers of the\
+      \ pleura; however, it is typically filled with serous pleural fluid.\nOption\
+      \ (C): \u201CThe pleura extend into the neck.\u201D This information is actuakky\
+      \ true. The cervical pleura, also known as the dome of the pleuradome of the\
+      \ pleura, lines the extendsiton of the pleural cavity into the neck.\nOption\
+      \ (D): \u201CThe pleura are composed of respiratory epithelium.\u201D This information\
+      \ is not correct. The pleaura are composed of connective tissue (CT).\nBecause\
+      \ (A), (B), and (D) are all incorrect, (D) is the only correct answer. The answer\
+      \ is (C)."
+  - question: 'What is the embryological origin of the hyoid bone?
+
+      (A) The first pharyngeal arch (B) The first and second pharyngeal arches (C)
+      The second pharyngeal arch (D) The second and third pharyngeal arches'
+    target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
+      \ help. Let\u2019s solve this problem step by step. The hyoid bone, which is\
+      \ also known as the hyooid, is a a small U-shaped bone located in the anterior\
+      \ neck. In its resting position, it lies between the ase of the mandible and\
+      \ the third cervical vertebrae. We know that the second and the third pharyngeal\
+      \ arches give rise to the horns of the hyoid bone; therefore, the embryological\
+      \ origin of the hyoid bone are the second and the third pharyngeal arches\u2014\
+      this information is covered in the last option (D). Therefore, we conclude that\
+      \ (D) must be the correct answer. The answer is (D).\n\n"
+group: mmlu_flan_cot_fewshot_stem
+include: _mmlu_flan_cot_fewshot_template_yaml
+task: mmlu_flan_cot_fewshot_anatomy
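The new layout moves the worked examples out of the `description` string and into `fewshot_config.samples`, with `sampler: first_n`. As a rough illustration of what that setting suggests, the sketch below takes the first n samples in file order and joins them into a prompt. The helper names `first_n` and `build_prompt` and the `Q:`/`A:` layout (borrowed from the removed description string) are assumptions for illustration, not the harness's actual implementation.

```python
from typing import Dict, List


def first_n(samples: List[Dict[str, str]], n: int) -> List[Dict[str, str]]:
    """Hypothetical helper: keep the first n samples, in file order."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return samples[:n]


def build_prompt(description: str, samples: List[Dict[str, str]],
                 question: str, n: int) -> str:
    """Assumed layout: description, n worked examples, then the open question."""
    shots = [f"Q: {s['question']}\nA: {s['target']}" for s in first_n(samples, n)]
    shots.append(f"Q: {question}\nA:")
    return description.strip() + "\n\n" + "\n\n".join(shots)


# Two samples from the anatomy file, with the reasoning text shortened to "[...]".
samples = [
    {"question": "Which of the following is the body cavity that contains the "
                 "pituitary gland?\n(A) Abdominal (B) Cranial (C) Pleural (D) Spinal",
     "target": "Let's think step by step. [...] The answer is (B)."},
    {"question": "The pleura\n(A) have no sensory innervation. [...]",
     "target": "Let's think step by step. [...] The answer is (C)."},
]
description = ("The following are multiple choice questions (with answers) "
               "about anatomy.")
new_question = ("What is the embryological origin of the hyoid bone?\n"
                "(A) The first pharyngeal arch [...]")
prompt = build_prompt(description, samples, new_question, n=1)
print(prompt)
```

With `n=1` only the pituitary example is included; the pleura sample is skipped because it comes second in the file.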
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_astronomy.yaml @ da211969
-"dataset_name": "astronomy"
-"description": "The following are multiple choice questions (with answers) about astronomy.\n\
-  \nQ: Where do most short-period comets come from and how do we know?\n(A) The Kuiper\
-  \ belt; short period comets tend to be in the plane of the solar system just like\
-  \ the Kuiper belt. (B) The Kuiper belt; short period comets tend to come from random\
-  \ directions indicating a spherical distribution of comets called the Kuiper belt.\
-  \ (C) The asteroid belt; short period comets have orbital periods similar to asteroids\
-  \ like Vesta and are found in the plane of the solar system just like the asteroid\
-  \ belt. (D) The Oort cloud; short period comets tend to be in the plane of the solar\
-  \ system just like the Oort cloud.\nA: Let's think step by step. Most short-period\
-  \ comets come from the Kuiper belt, and we know because short period coments tend\
-  \ to be in the plane of the solar system, just like the Kuiper belt is. The answer\
-  \ is (A).\n\nQ: You are pushing a truck along a road. Would it be easier to accelerate\
-  \ this truck on Mars? Why? (Assume there is no friction)\n(A) It would be harder\
-  \ since the truck is heavier on Mars. (B) It would be easier since the truck is\
-  \ lighter on Mars. (C) It would be harder since the truck is lighter on Mars. (D)\
-  \ It would be the same no matter where you are.\nA: Let's think step by step. If\
-  \ we assume that there is no friction, the force needed to accelerate the truck\
-  \ is by Newton’s second law only dependent on the mass of the truck. Hence (A),\
-  \ (B) and (C) are incorrect since it doesn’t matter that it’s on Mars, and (D) is\
-  \ the correct answer. The answer is (D).\n\nQ: Say the pupil of your eye has a diameter\
-  \ of 5 mm and you have a telescope with an aperture of 50 cm. How much more light\
-  \ can the telescope gather than your eye?\n(A) 10000 times more (B) 100 times more\
-  \ (C) 1000 times more (D) 10 times more\nA: Let's think step by step. The amount\
-  \ of light is proportional to the aperture area $A = \\pi D^2/4$ for a lens with\
-  \ diameter $D$, so the relative amounts of light between the eye with diameter 5mm\
-  \ and the telescope with diameter 50mm is $(50 cm)^2/(5mm)^2 = 10000$. The answer\
-  \ is (A).\n\nQ: Why isn't there a planet where the asteroid belt is located?\n(A)\
-  \ A planet once formed here but it was broken apart by a catastrophic collision.\
-  \ (B) There was not enough material in this part of the solar nebula to form a planet.\
-  \ (C) There was too much rocky material to form a terrestrial planet but not enough\
-  \ gaseous material to form a jovian planet. (D) Resonance with Jupiter prevented\
-  \ material from collecting together to form a planet.\nA: Let's think step by step.\
-  \ The asteroid belt is a stellar disc consisting of a large number of asteroids\
-  \ between Mars and Jupiter's orbits. The asteroids in this belt are affected by\
-  \ the gravitational pull from both other asteroids and nearby planets. Due to the\
-  \ strong gravitational force of Jupiter there are resonances that give rise to low\
-  \ density regions of asteroids known as the Kirkwood gap. So (B) and (C) are not\
-  \ correct since it’s not a lack of material that prevents a planet from being formed,\
-  \ and (A) is incorrect because the Kirkwood gap would have prevented a planet from\
-  \ forming in the first place, and (D) is the correct option. The answer is (D).\n\
-  \nQ: Why is Mars red?\n(A) Because the surface is covered with heavily oxidized\
-  \ (\"rusted\") minerals. (B) Because the atmosphere scatters more light at bluer\
-  \ wavelengths transmitting mostly red light. (C) Because Mars is covered with ancient\
-  \ lava flows which are red in color. (D) Because flowing water on Mars's surface\
-  \ altered the surface minerals several billion years ago.\nA: Let's think step by\
-  \ step. Option (B) is not correct because if the red color was caused by the scattering\
-  \ off the atmosphere, then the earth with a much thicker atmosphere would also look\
-  \ red. Options (C) and (D) are not specific enough about why the color of the surface\
-  \ would be red, while (A) is correct because it explains that the surface is red\
-  \ due to the rusted materials on the surface and the red color comes from the rust.\
-  \ So the correct option is (A). The answer is (A).\n\n"
-"group": "mmlu_flan_cot_fewshot_stem"
-"include": "_mmlu_flan_cot_fewshot_template_yaml"
-"task": "mmlu_flan_cot_fewshot_astronomy"
+dataset_name: astronomy
+description: The following are multiple choice questions (with answers) about astronomy.
+fewshot_config:
+  sampler: first_n
+  samples:
+  - question: 'Where do most short-period comets come from and how do we know?
+
+      (A) The Kuiper belt; short period comets tend to be in the plane of the solar
+      system just like the Kuiper belt. (B) The Kuiper belt; short period comets tend
+      to come from random directions indicating a spherical distribution of comets
+      called the Kuiper belt. (C) The asteroid belt; short period comets have orbital
+      periods similar to asteroids like Vesta and are found in the plane of the solar
+      system just like the asteroid belt. (D) The Oort cloud; short period comets
+      tend to be in the plane of the solar system just like the Oort cloud.'
+    target: Let's think step by step. Most short-period comets come from the Kuiper
+      belt, and we know because short period coments tend to be in the plane of the
+      solar system, just like the Kuiper belt is. The answer is (A).
+  - question: 'You are pushing a truck along a road. Would it be easier to accelerate
+      this truck on Mars? Why? (Assume there is no friction)
+
+      (A) It would be harder since the truck is heavier on Mars. (B) It would be easier
+      since the truck is lighter on Mars. (C) It would be harder since the truck is
+      lighter on Mars. (D) It would be the same no matter where you are.'
+    target: "Let's think step by step. If we assume that there is no friction, the\
+      \ force needed to accelerate the truck is by Newton\u2019s second law only dependent\
+      \ on the mass of the truck. Hence (A), (B) and (C) are incorrect since it doesn\u2019\
+      t matter that it\u2019s on Mars, and (D) is the correct answer. The answer is\
+      \ (D)."
+  - question: 'Say the pupil of your eye has a diameter of 5 mm and you have a telescope
+      with an aperture of 50 cm. How much more light can the telescope gather than
+      your eye?
+
+      (A) 10000 times more (B) 100 times more (C) 1000 times more (D) 10 times more'
+    target: Let's think step by step. The amount of light is proportional to the aperture
+      area $A = \pi D^2/4$ for a lens with diameter $D$, so the relative amounts of
+      light between the eye with diameter 5mm and the telescope with diameter 50mm
+      is $(50 cm)^2/(5mm)^2 = 10000$. The answer is (A).
+  - question: 'Why isn''t there a planet where the asteroid belt is located?
+
+      (A) A planet once formed here but it was broken apart by a catastrophic collision.
+      (B) There was not enough material in this part of the solar nebula to form a
+      planet. (C) There was too much rocky material to form a terrestrial planet but
+      not enough gaseous material to form a jovian planet. (D) Resonance with Jupiter
+      prevented material from collecting together to form a planet.'
+    target: "Let's think step by step. The asteroid belt is a stellar disc consisting\
+      \ of a large number of asteroids between Mars and Jupiter's orbits. The asteroids\
+      \ in this belt are affected by the gravitational pull from both other asteroids\
+      \ and nearby planets. Due to the strong gravitational force of Jupiter there\
+      \ are resonances that give rise to low density regions of asteroids known as\
+      \ the Kirkwood gap. So (B) and (C) are not correct since it\u2019s not a lack\
+      \ of material that prevents a planet from being formed, and (A) is incorrect\
+      \ because the Kirkwood gap would have prevented a planet from forming in the\
+      \ first place, and (D) is the correct option. The answer is (D)."
+  - question: 'Why is Mars red?
+
+      (A) Because the surface is covered with heavily oxidized ("rusted") minerals.
+      (B) Because the atmosphere scatters more light at bluer wavelengths transmitting
+      mostly red light. (C) Because Mars is covered with ancient lava flows which are
+      red in color. (D) Because flowing water on Mars''s surface altered the surface
+      minerals several billion years ago.'
+    target: 'Let''s think step by step. Option (B) is not correct because if the red
+      color was caused by the scattering off the atmosphere, then the earth with a
+      much thicker atmosphere would also look red. Options (C) and (D) are not specific
+      enough about why the color of the surface would be red, while (A) is correct
+      because it explains that the surface is red due to the rusted materials on the
+      surface and the red color comes from the rust. So the correct option is (A).
+      The answer is (A).'
+group: mmlu_flan_cot_fewshot_stem
+include: _mmlu_flan_cot_fewshot_template_yaml
+task: mmlu_flan_cot_fewshot_astronomy
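The telescope sample's arithmetic can be checked directly. The question gives a 5 mm pupil and a 50 cm aperture, and the target's formula uses 50 cm, although its prose says "50mm". A minimal sketch, assuming light gathered scales with aperture area as the target states (the function name is illustrative only):

```python
def light_gathering_ratio(d_telescope_mm: float, d_eye_mm: float) -> float:
    """Ratio of aperture areas; the pi/4 factor in A = pi * D^2 / 4 cancels."""
    return (d_telescope_mm / d_eye_mm) ** 2


# 50 cm = 500 mm, as in the question and in the target's formula.
ratio_50cm = light_gathering_ratio(500.0, 5.0)

# Reading the aperture as 50 mm, as the target's prose does, gives a different value.
ratio_50mm = light_gathering_ratio(50.0, 5.0)

print(ratio_50cm, ratio_50mm)
```

The 50 cm reading gives a ratio of 10000, matching option (A); the 50 mm reading would give 100, which is option (B).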
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_business_ethics.yaml @ da211969
-"dataset_name": "business_ethics"
-"description": "The following are multiple choice questions (with answers) about business\
-  \ ethics.\n\nQ: In contrast to _______, _______ aim to reward favourable behaviour\
-  \ by companies. The success of such campaigns have been heightened through the use\
-  \ of ___________, which allow campaigns to facilitate the company in achieving _________\
-  \ .\n(A) Buycotts, Boycotts, Blockchain technology, Charitable donations (B) Buycotts,\
-  \ Boycotts, Digital technology, Increased Sales (C) Boycotts, Buyalls, Blockchain\
-  \ technology, Charitable donations (D) Boycotts, Buycotts, Digital technology, Increased\
-  \ Sales\nA: Let's think step by step. We refer to Wikipedia articles on business\
-  \ ethics for help. The sentence that best uses the possible options above is “In\
-  \ contrast to *boycotts*, *buycotts* aim to reward favourable behavior by companies.\
-  \ The success of such campaigns have been heightened through the use of *digital\
-  \ technology*, which allow campaigns to facilitate the company in achieving *increased\
-  \ sales*.” The answer is (D).\n\nQ: _______ is the direct attempt to formally or\
-  \ informally manage ethical issues or problems, through specific policies, practices\
-  \ and programmes.\n(A) Corporate social responsibility (B) Business ethics management\
-  \ (C) Sustainability (D) Environmental management\nA: Let's think step by step.\
-  \ We refer to Wikipedia articles on business ethics for help. The direct attempt\
-  \ manage ethical issues through specific policies, practices, and programs is business\
-  \ ethics management. The answer is (B).\n\nQ: Three contrasting tactics that CSO's\
-  \ can engage in to meet their aims are ________ which typically involves research\
-  \ and communication, ________, which may involve physically attacking a company's\
-  \ operations or ________, often involving some form of _______.\n(A) Non-violent\
-  \ direct action, Violent direct action, Indirect action, Boycott (B) Indirect action,\
-  \ Instrumental action, Non-violent direct action, Information campaign (C) Indirect\
-  \ action, Violent direct action, Non-violent direct-action Boycott (D) Non-violent\
-  \ direct action, Instrumental action, Indirect action, Information campaign\nA:\
-  \ Let's think step by step. We refer to Wikipedia articles on business ethics for\
-  \ help. The sentence that best uses the possible options above is “Three contrasting\
-  \ tactics that CSO's can engage in to meet their aims are *indirect action*, which\
-  \ typically involves research and communication, *violent direct action*, which\
-  \ may involve physically attacking a company's operations or *non-violent direct\
-  \ action*, often involving some form of *boycott*.” The answer is (C).\n\nQ: To\
-  \ ensure the independence of the non-executive board members, there are a number\
-  \ of steps which can be taken, which include non-executives being drawn from _______\
-  \ the company, being appointed for a _________ time period as well as being appointed\
-  \ _________.\n(A) Outside, Limited, Independently (B) Inside, Limited, Intermittently\
-  \ (C) Outside, Unlimited, Intermittently (D) Inside, Unlimited, Independently\n\
-  A: Let's think step by step. We refer to Wikipedia articles on business ethics for\
-  \ help. The sentence that best uses the possible options above is “To ensure the\
-  \ independence of the non-executive board members, there are a number of steps which\
-  \ can be taken, which include non-executives being draw from *outside* the company,\
-  \ being appointed for a *limited* time period as well as being imported *independently*.\
-  \ The answer is (A).\n\nQ: Beyond the business case for engaging in CSR there are\
-  \ a number of moral arguments relating to: negative _______, the _______that corporations\
-  \ possess and the ________ of business and society.\n(A) Externalities, Power, Independence\
-  \ (B) Publicity, Insubstantial resources, Mutual dependence (C) Publicity, Power,\
-  \ Independence (D) Externalities, Power, Mutual dependence\nA: Let's think step\
-  \ by step. We refer to Wikipedia articles on business ethics for help. The sentence\
-  \ that best uses the possible options above is “Beyond the business case for engaging\
-  \ the CSR there are a number of moral arguments relating to: negative *externalities*,\
-  \ the *power* that corporations possess and the *mutual independence* of business\
-  \ and society. The answer is (D).\n\n"
-"group": "mmlu_flan_cot_fewshot_other"
-"include": "_mmlu_flan_cot_fewshot_template_yaml"
-"task": "mmlu_flan_cot_fewshot_business_ethics"
+dataset_name: business_ethics
+description: The following are multiple choice questions (with answers) about business
+  ethics.
+fewshot_config:
+  sampler: first_n
+  samples:
+  - question: 'In contrast to _______, _______ aim to reward favourable behaviour by
+      companies. The success of such campaigns have been heightened through the use
+      of ___________, which allow campaigns to facilitate the company in achieving
+      _________ .
+
+      (A) Buycotts, Boycotts, Blockchain technology, Charitable donations (B) Buycotts,
+      Boycotts, Digital technology, Increased Sales (C) Boycotts, Buyalls, Blockchain
+      technology, Charitable donations (D) Boycotts, Buycotts, Digital technology,
+      Increased Sales'
+    target: "Let's think step by step. We refer to Wikipedia articles on business\
+      \ ethics for help. The sentence that best uses the possible options above is\
+      \ \u201CIn contrast to *boycotts*, *buycotts* aim to reward favourable behavior\
+      \ by companies. The success of such campaigns have been heightened through the\
+      \ use of *digital technology*, which allow campaigns to facilitate the company\
+      \ in achieving *increased sales*.\u201D The answer is (D)."
+  - question: '_______ is the direct attempt to formally or informally manage ethical
+      issues or problems, through specific policies, practices and programmes.
+
+      (A) Corporate social responsibility (B) Business ethics management (C) Sustainability
+      (D) Environmental management'
+    target: Let's think step by step. We refer to Wikipedia articles on business ethics
+      for help. The direct attempt manage ethical issues through specific policies,
+      practices, and programs is business ethics management. The answer is (B).
+  - question: 'Three contrasting tactics that CSO''s can engage in to meet their aims
+      are ________ which typically involves research and communication, ________,
+      which may involve physically attacking a company''s operations or ________,
+      often involving some form of _______.
+
+      (A) Non-violent direct action, Violent direct action, Indirect action, Boycott
+      (B) Indirect action, Instrumental action, Non-violent direct action, Information
+      campaign (C) Indirect action, Violent direct action, Non-violent direct-action
+      Boycott (D) Non-violent direct action, Instrumental action, Indirect action,
+      Information campaign'
+    target: "Let's think step by step. We refer to Wikipedia articles on business\
+      \ ethics for help. The sentence that best uses the possible options above is\
+      \ \u201CThree contrasting tactics that CSO's can engage in to meet their aims\
+      \ are *indirect action*, which typically involves research and communication,\
+      \ *violent direct action*, which may involve physically attacking a company's\
+      \ operations or *non-violent direct action*, often involving some form of *boycott*.\u201D\
+      \ The answer is (C)."
+  - question: 'To ensure the independence of the non-executive board members, there are
+      a number of steps which can be taken, which include non-executives being drawn
+      from _______ the company, being appointed for a _________ time period as well
+      as being appointed _________.
+
+      (A) Outside, Limited, Independently (B) Inside, Limited, Intermittently (C)
+      Outside, Unlimited, Intermittently (D) Inside, Unlimited, Independently'
+    target: "Let's think step by step. We refer to Wikipedia articles on business\
+      \ ethics for help. The sentence that best uses the possible options above is\
+      \ \u201CTo ensure the independence of the non-executive board members, there\
+      \ are a number of steps which can be taken, which include non-executives being\
+      \ draw from *outside* the company, being appointed for a *limited* time period\
+      \ as well as being imported *independently*. The answer is (A)."
+  - question: 'Beyond the business case for engaging in CSR there are a number of
+      moral arguments relating to: negative _______, the _______that corporations
+      possess and the ________ of business and society.
+
+      (A) Externalities, Power, Independence (B) Publicity, Insubstantial resources,
+      Mutual dependence (C) Publicity, Power, Independence (D) Externalities, Power,
+      Mutual dependence'
+    target: "Let's think step by step. We refer to Wikipedia articles on business\
+      \ ethics for help. The sentence that best uses the possible options above is\
+      \ \u201CBeyond the business case for engaging the CSR there are a number of\
+      \ moral arguments relating to: negative *externalities*, the *power* that corporations\
+      \ possess and the *mutual independence* of business and society. The answer\
+      \ is (D).\n\n"
+group: mmlu_flan_cot_fewshot_other
+include: _mmlu_flan_cot_fewshot_template_yaml
+task: mmlu_flan_cot_fewshot_business_ethics
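Every `target` in these samples closes with the fixed phrase `The answer is (X).`, so a scorer can plausibly recover the chosen letter with a pattern match. The sketch below is one way to do that; the pattern and the `extract_answer` helper are assumptions for illustration and are not taken from the template file this task includes.

```python
import re
from typing import Optional

# Assumed pattern: the closing phrase used by every target in these files.
ANSWER_RE = re.compile(r"The answer is \(([A-D])\)")


def extract_answer(text: str) -> Optional[str]:
    """Return the letter from the last 'The answer is (X)' in text, if any."""
    matches = ANSWER_RE.findall(text)
    return matches[-1] if matches else None


# Closing sentences of two targets from the files above.
ethics_target = ("practices, and programs is business ethics management. "
                 "The answer is (B).")
pleura_target = ("Because (A), (B), and (D) are all incorrect, (D) is the only "
                 "correct answer. The answer is (C).")

print(extract_answer(ethics_target), extract_answer(pleura_target))
```

Taking the last match, and matching only the full phrase, keeps the result tied to the final verdict even when earlier sentences mention other option letters, as in the pleura sample.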
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_clinical_knowledge.yaml @ da211969
-"dataset_name": "clinical_knowledge"
+dataset_name: clinical_knowledge
-"description": "The following are multiple choice questions (with answers) about clinical\
+description: The following are multiple choice questions (with answers) about clinical
-  \ knowledge.\n\nQ: Glycolysis is the name given to the pathway involving the conversion\
+  knowledge.
-  \ of:\n(A) glycogen to glucose-1-phosphate. (B) glycogen or glucose to fructose.\
+fewshot_config:
-  \ (C) glycogen or glucose to pyruvate or lactate. (D) glycogen or glucose to pyruvate\
+  sampler: first_n
-  \ or acetyl CoA.\nA: Let's think step by step. We refer to Wikipedia articles on\
+  samples:
-  \ clinical knowledge for help. Glycolysis is the name given to the pathway involving\
+  - question: 'Glycolysis is the name given to the pathway involving the conversion of:
-  \ conversion of glycogen or glucose to pyruvate or lactate. The answer is (C).\n\
+
-  \nQ: What is the difference between a male and a female catheter?\n(A) Male and\
+      (A) glycogen to glucose-1-phosphate. (B) glycogen or glucose to fructose. (C)
-  \ female catheters are different colours. (B) Male catheters are longer than female\
+      glycogen or glucose to pyruvate or lactate. (D) glycogen or glucose to pyruvate
-  \ catheters. (C) Male catheters are bigger than female catheters. (D) Female catheters\
+      or acetyl CoA.'
-  \ are longer than male catheters.\nA: Let's think step by step. We refer to Wikipedia\
+    target: Let's think step by step. We refer to Wikipedia articles on clinical knowledge
-  \ articles on clinical knowledge for help. The difference between a male and female\
+      for help. Glycolysis is the name given to the pathway involving conversion of
-  \ catheter is that male catheters tend to be longer than female catheters. The answer\
+      glycogen or glucose to pyruvate or lactate. The answer is (C).
-  \ is (B).\n\nQ: How many attempts should you make to cannulate a patient before\
+  - question: 'What is the difference between a male and a female catheter?
-  \ passing the job on to a senior colleague, according to the medical knowledge of\
+
-  \ 2020?\n(A) 4 (B) 3 (C) 2 (D) 1\nA: Let's think step by step. We refer to Wikipedia\
+      (A) Male and female catheters are different colours. (B) Male catheters are
-  \ articles on clinical knowledge for help. According to the medical protocol as\
+      longer than female catheters. (C) Male catheters are bigger than female catheters.
-  \ of 2020, you should make two attempts to cannulate a patient before passing the\
+      (D) Female catheters are longer than male catheters.'
-  \ job on to a more-senior practitioner. The answer is (C).\n\nQ: In the assessment\
+    target: Let's think step by step. We refer to Wikipedia articles on clinical knowledge
-  \ of the hand function which
of
the
following
is
true?
\n
(A)
Abduction
of
the
thumb
\
for help. The difference between a male and female catheter is that male catheters
\
is
supplied
by
spinal
root
T2
(B)
Opposition
of
the
thumb
by
opponens
pollicis
\
tend to be longer than female catheters. The answer is (B).
\
is
supplied
by
spinal
root
T1
(C)
Finger
adduction
is
supplied
by
the
median
nerve
\
-
question
:
'
How
many
attempts
should
you
make
to
cannulate
a
patient
before
passing
\
(D)
Finger
abduction
is
mediated
by
the
palmar
interossei
\n
A:
Let's
think
step
\
the
job
on
to
a
senior
colleague,
according
to
the
medical
knowledge
of
2020?
\
by
step.
We
refer
to
Wikipedia
articles
on
clinical
knowledge
for
help.
Of
all
\
\
the
options,
it
is
only
true
that
the
opposition
of
the
thumb
by
opponens
pollicis
\
(A)
4
(B)
3
(C)
2
(D)
1'
\
is
supplied
by
spinal
root
T1.
The
answer
is
(B).
\n\n
Q:
The
energy
for
all
forms
\
target
:
Let's think step by step. We refer to Wikipedia articles on clinical knowledge
\
of
muscle
contraction
is
provided
by:
\n
(A)
ATP.
(B)
ADP.
(C)
phosphocreatine.
\
for help. According to the medical protocol as of 2020, you should make two
\
(D)
oxidative
phosphorylation.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
attempts to cannulate a patient before passing the job on to a more-senior practitioner.
\
articles
on
clinical
knowledge
for
help.
The
energy
for
muscular
contraction
is
\
The answer is (C).
\
provided
by
ATP
(adenosine
triphosphate),
which
is
the
powerhouse
of
the
cell.
\
-
question
:
'
In
the
assessment
of
the
hand
function
which
of
the
following
is
true?
\
The
answer
is
(A).
\n\n
"
"
group"
:
"
mmlu_flan_cot_fewshot_other"
(A)
Abduction
of
the
thumb
is
supplied
by
spinal
root
T2
(B)
Opposition
of
the
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
thumb
by
opponens
pollicis
is
supplied
by
spinal
root
T1
(C)
Finger
adduction
"
task"
:
"
mmlu_flan_cot_fewshot_clinical_knowledge"
is
supplied
by
the
median
nerve
(D)
Finger
abduction
is
mediated
by
the
palmar
interossei'
target
:
Let's think step by step. We refer to Wikipedia articles on clinical knowledge
for help. Of all the options, it is only true that the opposition of the thumb
by opponens pollicis is supplied by spinal root T1. The answer is (B).
-
question
:
'
The
energy
for
all
forms
of
muscle
contraction
is
provided
by:
(A)
ATP.
(B)
ADP.
(C)
phosphocreatine.
(D)
oxidative
phosphorylation.'
target
:
'
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
clinical
knowledge
for
help.
The
energy
for
muscular
contraction
is
provided
by
ATP
(adenosine
triphosphate),
which
is
the
powerhouse
of
the
cell.
The
answer
is
(A).'
group
:
mmlu_flan_cot_fewshot_other
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_clinical_knowledge
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_biology.yaml
"
dataset_name"
:
"
college_biology"
dataset_name
:
college_biology
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
college
\
description
:
The following are multiple choice questions (with answers) about college
\
biology.
\n\n
Q:
Which
of
the
following
represents
an
accurate
statement
concerning
\
biology.
\
arthropods?
\n
(A)
They
possess
an
exoskeleton
composed
primarily
of
peptidoglycan.
\
fewshot_config
:
\
(B)
They
possess
an
open
circulatory
system
with
a
dorsal
heart.
(C)
They
are
\
sampler
:
first_n
\
members
of
a
biologically
unsuccessful
phylum
incapable
of
exploiting
diverse
\
samples
:
\
habitats
and
nutrition
sources.
(D)
They
lack
paired,
jointed
appendages.
\n
A:
\
-
question
:
'
Which
of
the
following
represents
an
accurate
statement
concerning
arthropods?
\
Let's
think
step
by
step.
Peptidoglycan
is
known
to
comprise
the
plasma
membrane
\
\
of
most
bacteria,
rather
than
the
exoskeleton
of
arthropods,
which
is
made
of
\
(A)
They
possess
an
exoskeleton
composed
primarily
of
peptidoglycan.
(B)
They
\
chitin,
which
rules
out
(A).
The
answer
(C)
is
false
because
arthropods
are
a
\
possess
an
open
circulatory
system
with
a
dorsal
heart.
(C)
They
are
members
\
highly
successful
phylum.
Likewise,
arthropods
have
paired,
jointed
appendages,
\
of
a
biologically
unsuccessful
phylum
incapable
of
exploiting
diverse
habitats
\
which
rules
out
(D).
The
only
remaining
option
is
(B),
as
arthropods
have
an
open
\
and
nutrition
sources.
(D)
They
lack
paired,
jointed
appendages.'
\
circulatory
system
with
a
dorsal
tubular
heart.
The
answer
is
(B).
\n\n
Q:
In
a
\
target
:
Let's think step by step. Peptidoglycan is known to comprise the plasma
\
given
population,
1
out
of
every
400
people
has
a
cancer
caused
by
a
completely
\
membrane of most bacteria, rather than the exoskeleton of arthropods, which
\
recessive
allele,
b.
Assuming
the
population
is
in
Hardy-Weinberg
equilibrium,
\
is made of chitin, which rules out (A). The answer (C) is false because arthropods
\
which
of
the
following
is
the
expected
proportion
of
individuals
who
carry
the
\
are a highly successful phylum. Likewise, arthropods have paired, jointed appendages,
\
b
allele
but
are
not
expected
to
develop
the
cancer?
\n
(A)
1/400
(B)
19/400
(C)
\
which rules out (D). The only remaining option is (B), as arthropods have an
\
20/400
(D)
38/400
\n
A:
Let's
think
step
by
step.
According
to
the
Hardy
Weinberg
\
open circulatory system with a dorsal tubular heart. The answer is (B).
\
Law,
$p^2
+
2
p
q
+
q^2
=
1$,
and
$p
+
q
=
1$
where
$p$
is
the
frequency
of
the
\
-
question
:
'
In
a
given
population,
1
out
of
every
400
people
has
a
cancer
caused
by
\
dominant
allele,
$q$
is
the
frequency
of
the
recessive
allele,
and
$p^2$,
$q^2$,
\
a
completely
recessive
allele,
b.
Assuming
the
population
is
in
Hardy-Weinberg
\
and
$2pq$
are
the
frequencies
of
dominant
homozygous,
recessive
homozygous,
and
\
equilibrium,
which
of
the
following
is
the
expected
proportion
of
individuals
\
heterozygous
individuals,
respectively.
The
frequency
of
the
recessive
allele
\
who
carry
the
b
allele
but
are
not
expected
to
develop
the
cancer?
\
(q)
is
$
\\
sqrt{
\\
frac{1}{400}}
=
0.05$.
We
have
$p
=
1
-
q
=
0.95$.
The
frequency
\
\
of
heterozygous
individuals
is
$2pq
=
2
\\
cdot
0.05
\\
cdot
0.95
=
0.095$.
The
\
(A)
1/400
(B)
19/400
(C)
20/400
(D)
38/400'
\
number
of
heterozygous
individuals
is
equal
to
the
frequency
of
heterozygous
individuals
\
target
:
"
Let's
think
step
by
step.
According
to
the
Hardy
Weinberg
Law,
$p^2
+
\
\
times
the
size
of
the
population,
or
$0.095
*
400
=
38$.
So
we
end
up
with
38/400.
\
\
2
p
q
+
q^2
=
1$,
and
$p
+
q
=
1$
where
$p$
is
the
frequency
of
the
dominant
\
\
The
answer
is
(D).
\n\n
Q:
According
to
the
pressure-flow
model
of
movement
of
phloem
\
\
allele,
$q$
is
the
frequency
of
the
recessive
allele,
and
$p^2$,
$q^2$,
and
\
\
contents,
photosynthate
movement
from
source
to
sink
is
driven
by
\n
(A)
an
ATP-dependent
\
\
$2pq$
are
the
frequencies
of
dominant
homozygous,
recessive
homozygous,
and
\
\
pressure-flow
pump
(B)
a
water-pressure
potential
gradient
(C)
transpiration
(D)
\
\
heterozygous
individuals,
respectively.
The
frequency
of
the
recessive
\
\
apoplastic
diffusion
\n
A:
Let's
think
step
by
step.
It
is
a
gradient
in
water
pressure
\
\
allele
(q)
is
$
\\
sqrt{
\\
frac{1}{400}}
=
0.05$.
We
have
$p
=
1
-
q
=
0.95$.
\
\
that
induces
the
movement
of
phloem
content,
which
refers
to
answer
(B).
The
mechanism
\
\
The
frequency
of
heterozygous
individuals
is
$2pq
=
2
\\
cdot
0.05
\\
cdot
0.95
\
\
of
movement
does
not
rely
on
metabolism,
which
rules
out
(A).
Transpiration
refers
\
\
=
0.095$.
The
number
of
heterozygous
individuals
is
equal
to
the
frequency
\
\
to
the
exhalation
of
water
vapor
through
plant
stomata,
and
is
also
not
related,
\
\
of
heterozygous
individuals
times
the
size
of
the
population,
or
$0.095
*
\
\
which
rules
out
(C).
While
the
apoplastic
pathway
is
one
of
two
main
pathways
\
\
400
=
38$.
So
we
end
up
with
38/400.
The
answer
is
(D)."
\
for
water
transport
in
plants,
it
is
not
central
to
the
pressure
flow
model,
which
\
-
question
:
'
According
to
the
pressure-flow
model
of
movement
of
phloem
contents,
photosynthate
\
rules
out
(D).
The
answer
is
(B).
\n\n
Q:
Which
of
the
following
contain
DNA
sequences
\
movement
from
source
to
sink
is
driven
by
\
required
for
the
segregation
of
chromosomes
in
mitosis
and
meiosis?
\n
(A)
Telomeres
\
\
(B)
Centromeres
(C)
Nucleosomes
(D)
Spliceosomes
\n
A:
Let's
think
step
by
step.
\
(A)
an
ATP-dependent
pressure-flow
pump
(B)
a
water-pressure
potential
gradient
\
The
genetic
material
in
Telomeres
is
not
used,
which
rules
out
(A).
Nucleosomes
\
(C)
transpiration
(D)
apoplastic
diffusion'
\
are
the
repeating
subunit
that
comprises
chromatin
packed
in
a
cell
nucleus,
and
\
target
:
Let's think step by step. It is a gradient in water pressure that induces
\
do
not
specifically
refer
to
DNA
sequences
necessary
for
segregating
chromosomes
\
the movement of phloem content, which refers to answer (B). The mechanism of
\
in
cell
division,
which
rules
out
(C).
A
spliceosome
is
a
large
ribonucleoprotein
\
movement does not rely on metabolism, which rules out (A). Transpiration refers
\
that
removes
introns
from
transcribed
pre-mRNA
rather
than
governing
chromosome
\
to the exhalation of water vapor through plant stomata, and is also not related,
\
segregation.
Centromeres
are
directly
responsible
for
segregating
chromosomes
\
which rules out (C). While the apoplastic pathway is one of two main pathways
\
in
cell
division.
The
answer
is
(B).
\n\n
Q:
The
presence
of
homologous
structures
\
for water transport in plants, it is not central to the pressure flow model,
\
in
two
different
organisms,
such
as
the
humerus
in
the
front
limb
of
a
human
and
\
which rules out (D). The answer is (B).
\
a
bird,
indicates
that
\n
(A)
the
human
and
bird
are
polyphyletic
species
(B)
a
\
-
question
:
'
Which
of
the
following
contain
DNA
sequences
required
for
the
segregation
\
human's
and
bird's
evolution
is
convergent
(C)
the
human
and
bird
belong
to
a
\
of
chromosomes
in
mitosis
and
meiosis?
\
clade
(D)
the
human
and
bird
developed
by
analogy
\n
A:
Let's
think
step
by
step.
\
\
Polyphyletic
species
are
organisms
that
are
grouped
due
to
having
similar
characteristics
\
(A)
Telomeres
(B)
Centromeres
(C)
Nucleosomes
(D)
Spliceosomes'
\
but
which
do
not
have
a
common
ancestor.
This
is
not
the
case
for
humans
and
birds,
\
target
:
Let's think step by step. The genetic material in Telomeres is not used,
\
which
rules
out
(A).
Convergent
evolution
refers
to
the
independent
development
\
which rules out (A). Nucleosomes are the repeating subunit that comprises chromatin
\
of
similar
features
in
different
species
at
different
periods,
which
is
also
not
\
packed in a cell nucleus, and do not specifically refer to DNA sequences necessary
\
the
case
for
humans
and
birds,
which
rules
out
(B).
Analogy
refers
to
the
superficial
\
for segregating chromosomes in cell division, which rules out (C). A spliceosome
\
resemblance
of
structures
that
have
different
origins,
which
is
not
the
case
for
\
is a large ribonucleoprotein that removes introns from transcribed pre-mRNA
\
the
human
and
bird
forearms,
which
rules
out
(D).
Humans
and
birds
do
belong
to
\
rather than governing chromosome segregation. Centromeres are directly responsible
\
the
same
clade
-
a
group
of
organisms
composed
of
a
common
ancestor.
The
answer
\
for segregating chromosomes in cell division. The answer is (B).
\
is
(C).
\n\n
"
-
question
:
'
The
presence
of
homologous
structures
in
two
different
organisms,
such
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
as
the
humerus
in
the
front
limb
of
a
human
and
a
bird,
indicates
that
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_college_biology"
(A)
the
human
and
bird
are
polyphyletic
species
(B)
a
human'
'
s
and
bird'
'
s
evolution
is
convergent
(C)
the
human
and
bird
belong
to
a
clade
(D)
the
human
and
bird
developed
by
analogy'
target
:
'
Let'
'
s
think
step
by
step.
Polyphyletic
species
are
organisms
that
are
grouped
due
to
having
similar
characteristics
but
which
do
not
have
a
common
ancestor.
This
is
not
the
case
for
humans
and
birds,
which
rules
out
(A).
Convergent
evolution
refers
to
the
independent
development
of
similar
features
in
different
species
at
different
periods,
which
is
also
not
the
case
for
humans
and
birds,
which
rules
out
(B).
Analogy
refers
to
the
superficial
resemblance
of
structures
that
have
different
origins,
which
is
not
the
case
for
the
human
and
bird
forearms,
which
rules
out
(D).
Humans
and
birds
do
belong
to
the
same
clade
-
a
group
of
organisms
composed
of
a
common
ancestor.
The
answer
is
(C).'
group
:
mmlu_flan_cot_fewshot_stem
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_college_biology
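The Hardy-Weinberg arithmetic in the sample above can be checked with a minimal Python sketch. The helper name `carrier_frequency` is hypothetical (it is not part of the harness), and the sketch assumes the affected frequency is a perfect-square fraction so that q can be computed exactly.

```python
from fractions import Fraction
from math import isqrt


def carrier_frequency(affected: Fraction) -> Fraction:
    """Return the heterozygote frequency 2pq, given q^2 = affected."""
    num, den = affected.numerator, affected.denominator
    # Exact square root; only valid for perfect-square fractions (assumption).
    q = Fraction(isqrt(num), isqrt(den))
    if q * q != affected:
        raise ValueError("affected frequency must be a perfect-square fraction")
    p = 1 - q  # p + q = 1
    return 2 * p * q


carriers = carrier_frequency(Fraction(1, 400))
print(carriers)        # 19/200, i.e. 38/400
print(carriers * 400)  # 38
```

Using `Fraction` rather than floats keeps the result comparable to the answer options, which are stated as fractions of 400.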
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_chemistry.yaml
"
dataset_name"
:
"
college_chemistry"
dataset_name
:
college_chemistry
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
college
\
description
:
The following are multiple choice questions (with answers) about college
\
chemistry.
\n\n
Q:
3
Cl−(aq)
+
4
CrO_4^2−(aq)
+
23
H+(aq)
→
3
HClO2(aq)
+
4
Cr3+(aq)
\
chemistry.
\
+
10
H2O(l).
In
the
reaction
shown
above,
Cl−(aq)
behaves
as
\n
(A)
an
acid
(B)
\
fewshot_config
:
\
a
base
(C)
a
catalyst
(D)
a
reducing
agent
\n
A:
Let's
think
step
by
step.
A
molecule
\
sampler
:
first_n
\
that
behaves
as
a
base
accepts
an
H+
ion
(or
proton)
from
another
molecule,
whereas
\
samples
:
\
a
molecule
that
behaves
as
an
acid
donates
an
H+
ion
(or
proton)
to
another
molecule.
\
-
question
:
"
3
Cl
\u2212
(aq)
+
4
CrO_4^2
\u2212
(aq)
+
23
H+(aq)
\u2192
3
HClO2(aq)
+
\
\
Neither
of
these
is
the
case
for
Cl
in
this
reaction,
which
rules
out
(A)
and
\
\
4
Cr3+(aq)
+
10
H2O(l).
In
the
reaction
shown
above,
Cl
\u2212
(aq)
behaves
\
\
(B).
A
catalyst
is
a
substance
that
only
accelerates
a
reaction
without
itself
\
\
as
\n
(A)
an
acid
(B)
a
base
(C)
a
catalyst
(D)
a
reducing
agent"
\
undergoing
chemical
change,
which
is
not
the
case
here.
This
rules
out
(C).
Instead,
\
target
:
Let's think step by step. A molecule that behaves as a base accepts an
\
the
$Cl^{-}$
molecules
carry
a
negative
charge,
which
they
donate
in
the
reaction
\
H+ ion (or proton) from another molecule, whereas a molecule that behaves as
\
to
form
3
HClO2.
This
is
the
behavior
of
a
reducing
agent,
or
(D).
The
answer
\
an acid donates an H+ ion (or proton) to another molecule. Neither of these
\
is
(D).
\n\n
Q:
Which
of
the
following
statements
about
the
lanthanide
elements
\
is the case for Cl in this reaction, which rules out (A) and (B). A catalyst
\
is
NOT
true?
\n
(A)
The
most
common
oxidation
state
for
the
lanthanide
elements
\
is a substance that only accelerates a reaction without itself undergoing chemical
\
is
+3.
(B)
Lanthanide
complexes
often
have
high
coordination
numbers
(>
6).
(C)
\
change, which is not the case here. This rules out (C). Instead, the $Cl^{-}$
\
All
of
the
lanthanide
elements
react
with
aqueous
acid
to
liberate
hydrogen.
(D)
\
molecules carry a negative charge, which they donate in the reaction to form
\
The
atomic
radii
of
the
lanthanide
elements
increase
across
the
period
from
La
\
3 HClO2. This is the behavior of a reducing agent, or (D). The answer is (D).
\
to
Lu.
\n
A:
Let's
think
step
by
step.
The
atomic
radii
of
the
lanthanide
elements
\
-
question
:
'
Which
of
the
following
statements
about
the
lanthanide
elements
is
NOT
\
in
fact
decrease
across
the
period
from
La
to
Lu.
Options
(A),
(B),
and
(C)
are
\
true?
\
all
true.
This
means
that
only
(D)
is
NOT
true.
The
answer
is
(D).
\n\n
Q:
Which
\
\
of
the
following
lists
the
hydrides
of
group-14
elements
in
order
of
thermal
stability,
\
(A)
The
most
common
oxidation
state
for
the
lanthanide
elements
is
+3.
(B)
Lanthanide
\
from
lowest
to
highest?
\n
(A)
PbH4
<
SnH4
<
GeH4
<
SiH4
<
CH4
(B)
PbH4
<
SnH4
<
\
complexes
often
have
high
coordination
numbers
(>
6).
(C)
All
of
the
lanthanide
\
CH4
<
GeH4
<
SiH4
(C)
CH4
<
SiH4
<
GeH4
<
SnH4
<
PbH4
(D)
CH4
<
PbH4
<
GeH4
<
\
elements
react
with
aqueous
acid
to
liberate
hydrogen.
(D)
The
atomic
radii
\
SnH4
<
SiH4
\n
A:
Let's
think
step
by
step.
The
thermal
stability
of
group-14
hydrides
\
of
the
lanthanide
elements
increase
across
the
period
from
La
to
Lu.'
\
decreases
as
we
move
from
the
top
of
group
14
to
the
bottom.
The
order
of
elements
\
target
:
Let's think step by step. The atomic radii of the lanthanide elements
\
in
the
group
from
top
to
bottom
is
C,
Si,
Ge,
Sn,
Pb.
Therefore
in
order
of
increasing
\
in fact decrease across the period from La to Lu. Options (A), (B), and (C)
\
thermal
stability
we
have
PbH4,
SnH4,
GeH4,
SiH4,
and
CH4,
or
answer
(A).
The
\
are all true. This means that only (D) is NOT true. The answer is (D).
\
answer
is
(A).
\n\n
Q:
Predict
the
number
of
lines
in
the
EPR
spectrum
of
a
solution
\
-
question
:
'
Which
of
the
following
lists
the
hydrides
of
group-14
elements
in
order
\
of
13C-labelled
methyl
radical
(13CH3•),
assuming
the
lines
do
not
overlap.
\n\
of
thermal
stability,
from
lowest
to
highest?
(A)
4
(B)
3
(C)
6
(D)
24
(E)
8
\n
A:
Let's
think
step
by
step.
The
electron
paramagnetic
\
\
resonance
spectrum
will
be
split
by
two
forms
of
interactions.
The
first
is
the
\
(A)
PbH4
<
SnH4
<
GeH4
<
SiH4
<
CH4
(B)
PbH4
<
SnH4
<
CH4
<
GeH4
<
SiH4
(C)
\
hyperfine
interaction
with
the
13C
(nuclear
spin
$I
=
\\
frac{1}{2}$)
which
will
\
CH4
<
SiH4
<
GeH4
<
SnH4
<
PbH4
(D)
CH4
<
PbH4
<
GeH4
<
SnH4
<
SiH4'
\
split
the
spectrum
into
2
lines.
This
will
be
further
split
into
4
lines
by
the
\
target
:
Let's think step by step. The thermal stability of group-14 hydrides decreases
\
interaction
with
three
equivalent
1H
nuclei.
The
total
number
of
lines
is
therefore
\
as we move from the top of group 14 to the bottom. The order of elements in
\
$2
\\
cdot
4
=
8$.
The
answer
is
(E).
\n\n
"
the group from top to bottom is C, Si, Ge, Sn, Pb. Therefore in order of increasing
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
thermal stability we have PbH4, SnH4, GeH4, SiH4, and CH4, or answer (A). The
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
answer is (A).
"
task"
:
"
mmlu_flan_cot_fewshot_college_chemistry"
-
question
:
"
Predict
the
number
of
lines
in
the
EPR
spectrum
of
a
solution
of
13C-labelled
\
\
methyl
radical
(13CH3
\u2022
),
assuming
the
lines
do
not
overlap.
\n
(A)
4
(B)
\
\
3
(C)
6
(D)
24
(E)
8"
target
:
"
Let's
think
step
by
step.
The
electron
paramagnetic
resonance
spectrum
\
\
will
be
split
by
two
forms
of
interactions.
The
first
is
the
hyperfine
interaction
\
\
with
the
13C
(nuclear
spin
$I
=
\\
frac{1}{2}$)
which
will
split
the
spectrum
\
\
into
2
lines.
This
will
be
further
split
into
4
lines
by
the
interaction
with
\
\
three
equivalent
1H
nuclei.
The
total
number
of
lines
is
therefore
$2
\\
cdot
\
\
4
=
8$.
The
answer
is
(E).
\n\n
"
group
:
mmlu_flan_cot_fewshot_stem
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_college_chemistry
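The EPR line count in the last sample above follows the usual first-order rule that each set of n equivalent nuclei with spin I splits every line into 2nI + 1 lines. A minimal Python sketch of that rule, assuming non-overlapping lines as the question states; `epr_line_count` is a hypothetical helper name:

```python
from fractions import Fraction


def epr_line_count(nuclei):
    """Multiply (2 * n * I + 1) over each set of n equivalent nuclei with spin I."""
    total = 1
    for n, spin in nuclei:
        total *= int(2 * n * Fraction(spin) + 1)
    return total


# 13CH3 radical: one 13C (I = 1/2) and three equivalent 1H (I = 1/2).
print(epr_line_count([(1, Fraction(1, 2)), (3, Fraction(1, 2))]))  # 8
```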
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_computer_science.yaml
"
dataset_name"
:
"
college_computer_science"
dataset_name
:
college_computer_science
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
college
\
description
:
The following are multiple choice questions (with answers) about college
\
computer
science.
\n\n
Q:
Which
of
the
following
regular
expressions
is
equivalent
\
computer science.
\
to
(describes
the
same
set
of
strings
as)
(a*
+
b)*(c
+
d)?
\n
(A)
a*(c
+
d)+
b(c
\
fewshot_config
:
\
+
d)
\n
(B)
a*(c
+
d)*
+
b(c
+
d)*
\n
(C)
a*(c
+
d)+
b*(c
+
d)
\n
(D)
(a
+
b)*c
+(a
\
sampler
:
first_n
\
+
b)*d
\n
A:
Let's
think
step
by
step.
We
know
that:
\n
1.
(X*
+
Y)*
=
(X
+
Y)*
\n\
samples
:
2.
X(Y
+
Z)?
=
XY
+
XZ
\n
Using
equation
1
we
can
rewrite
(a*
+
b)*(c
+
d)?
as:
\n\
-
question
:
'
Which
of
the
following
regular
expressions
is
equivalent
to
(describes
3.
(a
+
b)*(c
+
d)?
\n
Using
equation
2
we
can
rewrite
equation
3
as:
\n
(a
+
b)*c
+
\
the
same
set
of
strings
as)
(a*
+
b)*(c
+
d)?
\
(a
+
b)*d
The
answer
is
(D).
\n\n
Q:
The
Singleton
design
pattern
is
used
to
guarantee
\
\
that
only
a
single
instance
of
a
class
may
be
instantiated.
Which
of
the
following
\
(A)
a*(c
+
d)+
b(c
+
d)
\
is
(are)
true
of
this
design
pattern?
\n
I.
The
Singleton
class
has
a
static
factory
\
\
method
to
provide
its
instance.
\n
II.
The
Singleton
class
can
be
a
subclass
of
\
(B)
a*(c
+
d)*
+
b(c
+
d)*
\
another
class.
\n
III.
The
Singleton
class
has
a
private
constructor.
\n
(A)
I
only
\n\
(B)
II
only
\n
(C)
III
only
\n
(D)
I,
II,
and
III
\n
A:
Let's
think
step
by
step.
Statement
\
(C)
a*(c
+
d)+
b*(c
+
d)
\
I
is
a
correct
statement
about
a
Singleton,
because
a
Singleton
restricts
instantiation
\
\
to
a
single,
static
method.
Statement
II
is
also
correct,
because
there
is
no
\
(D)
(a
+
b)*c
+(a
+
b)*d'
\
inherent
restriction
regarding
the
inheritance
of
a
Singleton.
Statement
III
is
\
target
:
'
Let'
'
s
think
step
by
step.
We
know
that:
\
also
correct,
because
a
Singleton
must
be
instantiated
only
once,
so
its
constructor
\
\
is
made
private
to
prevent
any
construction
except
via
its
static
factory
method.
\n\
1.
(X*
+
Y)*
=
(X
+
Y)*
Given
these
facts,
statements
I,
II,
and
III
are
all
correct.
The
answer
is
(D).
\n\
\n
Q:
A
certain
pipelined
RISC
machine
has
8
general-purpose
registers
R0,
R1,
.
\
2.
X(Y
+
Z)?
=
XY
+
XZ
\
.
.
,
R7
and
supports
the
following
operations:
\n
ADD
Rs1,
Rs2,
Rd
(Add
Rs1
to
\
\
Rs2
and
put
the
sum
in
Rd)
\n
MUL
Rs1,
Rs2,
Rd
(Multiply
Rs1
by
Rs2
and
put
the
\
Using
equation
1
we
can
rewrite
(a*
+
b)*(c
+
d)?
as:
\
product
in
Rd)
\n
An
operation
normally
takes
one
cycle;
however,
an
operation
takes
\
\
two
cycles
if
it
produces
a
result
required
by
the
immediately
following
operation
\
3.
(a
+
b)*(c
+
d)?
\
in
an
operation
sequence.
\n
Consider
the
expression
AB
+
ABC
+
BC,
where
variables
\
\
A,
B,
C
are
located
in
registers
R0,
R1,
R2.
If
the
contents
of
these
three
registers
\
Using
equation
2
we
can
rewrite
equation
3
as:
\
must
not
be
modified,
what
is
the
minimum
number
of
clock
cycles
required
for
\
\
an
operation
sequence
that
computes
the
value
of
AB
+
ABC
+
BC?
\n
(A)
5
(B)
6
(C)
\
(a
+
b)*c
+
(a
+
b)*d
The
answer
is
(D).'
\
7
(D)
8
\n
A:
Let's
think
step
by
step.
First,
we
are
given
that
A
is
in
R0,
B
is
\
-
question
:
'
The
Singleton
design
pattern
is
used
to
guarantee
that
only
a
single
instance
\
in
R1,
and
C
is
in
R2.
\n
Next,
we
can
see
that
we
must
compute
three
multiplies
\
of
a
class
may
be
instantiated.
Which
of
the
following
is
(are)
true
of
this
\
(AB,
BC,
and
ABC)
and
two
adds
(AB
+
ABC,
(AB
+
ABC)
+
BC)
to
compute
our
final
\
design
pattern?
\
answer,
resulting
in
a
minimum
of
five
clock
cycles.
\n
Next,
we
can
see
that
there
\
\
is
no
way
to
avoid
at
least
one
pipeline
stall
when
computing
our
final
answer,
\
I.
The
Singleton
class
has
a
static
factory
method
to
provide
its
instance.
\
because
to
compute
our
final
sum
we
must
wait
at
least
one
cycle
for
the
results
\
\
from
the
previous
stage
to
be
ready.
Thus,
our
minimum
number
of
cycles
must
be
\
II.
The
Singleton
class
can
be
a
subclass
of
another
class.
\
6.
\n
We
can
verify
that
we
can
create
a
solution
that
requires
only
six
cycles
\
\
as
follows:
\n
compute
AB:
MUL
R0,
R1,
R3
\n
compute
BC:
MUL
R1,
R2,
R4
\n
compute
ABC:
\
III.
The
Singleton
class
has
a
private
constructor.
\
MUL
R3,
R4,
R5
\n
compute
AB
+
BC:
ADD
R3,
R4,
R6
\n
STALL
\n
compute
AB
+
ABC
+
BC:
\
\
ADD
R5,
R6,
R7
\n
So
there
are
6
cycles.
The
answer
is
(B).
\n\n
Q:
A
compiler
generates
\
(A)
I
only
\
code
for
the
following
assignment
statement.
\n
G
:=
(A
+
B)
*
C
-
(D
+
E)
*
F
\n\
The
target
machine
has
a
single
accumulator
and
a
single-address
instruction
set
\
(B)
II
only
\
consisting
of
instructions
load,
store,
add,
subtract,
and
multiply.
For
the
arithmetic
\
\
operations,
the
left
operand
is
taken
from
the
accumulator
and
the
result
appears
\
(C)
III
only
\
in
the
accumulator.
The
smallest
possible
number
of
instructions
in
the
resulting
\
\
code
is
\n
(A)
5
(B)
6
(C)
7
(D)
9
\n
A:
Let's
think
step
by
step.
We
can
compute
\
(D)
I,
II,
and
III'
\
the
final
answer
with
the
following
sequence
of
operations:
\n
1.
LOAD
D
(accumulator
\
target
:
'
Let'
'
s
think
step
by
step.
Statement
I
is
a
correct
statement
about
a
\
=
D)
\n
2.
ADD
E
(accumulator
=
D+E)
\n
3.
MUL
F
(accumulator
=
(D+E)*F)
\n
4.
STORE
\
Singleton,
because
a
Singleton
restricts
instantiation
to
a
single,
static
method.
\
X
(X
=
(D+E)*F)
\n
5.
LOAD
A
(accumulator
=
A)
\n
6.
ADD
B
(accumulator
=
A+B)
\n\
Statement
II
is
also
correct,
because
there
is
no
inherent
restriction
regarding
7.
MUL
C
(accumulator
=
(A+B)*C)
\n
8.
SUB
X
(accumulator
=
(A+B)*C
-
(D+E)*F)
\n\
the
inheritance
of
a
Singleton.
Statement
III
is
also
correct,
because
a
Singleton
9.
STORE
G
(G
=
(A+B)*C
-
(D+E)*F)
\n
This
sequence
takes
9
instructions.
The
answer
\
must
be
instantiated
only
once,
so
its
constructor
is
made
private
to
prevent
\
is
(D).
\n\n
Q:
Consider
a
computer
design
in
which
multiple
processors,
each
with
\
any
construction
except
via
its
static
factory
method.
\
a
private
cache
memory,
share
global
memory
using
a
single
bus.
This
bus
is
the
\
\
critical
system
resource.
Each
processor
can
execute
one
instruction
every
500
\
Given
these
facts,
statements
I,
II,
and
III
are
all
correct.
The
answer
is
\
nanoseconds
as
long
as
memory
references
are
satisfied
by
its
local
cache.
When
\
(D).'
\
a
cache
miss
occurs,
the
processor
is
delayed
for
an
additional
2,000
nanoseconds.
\
-
question
:
'
A
certain
pipelined
RISC
machine
has
8
general-purpose
registers
R0,
R1,
\
During
half
of
this
additional
delay,
the
bus
is
dedicated
to
serving
the
cache
\
.
.
.
,
R7
and
supports
the
following
operations:
\
miss.
During
the
other
half,
the
processor
cannot
continue,
but
the
bus
is
free
\
\
to
service
requests
from
other
processors.
On
average,
each
instruction
requires
\
ADD
Rs1,
Rs2,
Rd
(Add
Rs1
to
Rs2
and
put
the
sum
in
Rd)
\
2
memory
references.
On
average,
cache
misses
occur
on
1
percent
of
references.
\
\
What
proportion
of
the
capacity
of
the
bus
would
a
single
processor
consume,
ignoring
\
MUL
Rs1,
Rs2,
Rd
(Multiply
Rs1
by
Rs2
and
put
the
product
in
Rd)
\
delays
due
to
competition
from
other
processors?
\n
(A)
1/50
(B)
1/27
(C)
1/25
(D)
\
\
2/27
\n
A:
Let's
think
step
by
step.
We
know
that
each
instruction
requires
two
\
An
operation
normally
takes
one
cycle;
however,
an
operation
takes
two
cycles
\
memory
references
per
instruction,
and
that
there
is
an
average
cache
miss
rate
\
if
it
produces
a
result
required
by
the
immediately
following
operation
in
an
\
of
one
percent.
\n
Thus
a
given
processor
has:
\n
(1
cache
miss
/
100
references)
\
operation
sequence.
\
*
(2
references
/
instruction)
=
\n
(2
cache
misses
/
100
instructions),
so:
\n
misses_per_instruction
\
\
=
1
cache
miss
/
50
instructions.
\n
Next,
we
know
that
each
instruction
requires
\
Consider
the
expression
AB
+
ABC
+
BC,
where
variables
A,
B,
C
are
located
in
\
500
nanoseconds
when
there
is
no
cache
miss,
and
500
+
2000
=
2500
nanoseconds
\
registers
R0,
R1,
R2.
If
the
contents
of
these
three
registers
must
not
be
modified,
\
when
there
is
a
cache
miss.
Thus:
\n
50
instructions
/
((49
*
500)
+
(1
*
2500))
nanoseconds,
\
what
is
the
minimum
number
of
clock
cycles
required
for
an
operation
sequence
\
so:
\n
instructions_per_ns
=
50
instructions
/
27000
nanoseconds.
\n
Now,
we
know
\
that
computes
the
value
of
AB
+
ABC
+
BC?
\
that
each
cache
miss
locks
the
bus
for
half
of
the
2000
nanosecond
cache
miss
\
\
delay,
or
1000
nanoseconds,
so:
\n
lock_ns_per_miss
=
1000
nanoseconds
/
cache
miss.
\n\
(A)
5
(B)
6
(C)
7
(D)
8'
Thus
we
can
see
that
on
average
a
single
processor
will
lock
the
bus
for:
\n
lock_ns_per_miss
\
target
:
'
Let'
'
s
think
step
by
step.
First,
we
are
given
that
A
is
in
R0,
B
is
\
*
misses_per_instruction
*
instructions_per_ns
=
\n
(1000
nanoseconds
/
cache
miss)
\
in
R1,
and
C
is
in
R2.
\
*
(1
cache
miss
/
50
instructions)
*
(50
instructions
/
27000
nanoseconds)
=
1000
\
\
*
(1/50)
*
(50/27000)
=
1000/27000
=
1/27.
The
answer
is
(B).
\n\n
"
Next,
we
can
see
that
we
must
compute
three
multiplies
(AB,
BC,
and
ABC)
and
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
two
adds
(AB
+
ABC,
(AB
+
ABC)
+
BC)
to
compute
our
final
answer,
resulting
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
in
a
minimum
of
five
clock
cycles.
"
task"
:
"
mmlu_flan_cot_fewshot_college_computer_science"
Next,
we
can
see
that
there
is
no
way
to
avoid
at
least
one
pipeline
stall
when
computing
our
final
answer,
because
to
compute
our
final
sum
we
must
wait
at
least
one
cycle
for
the
results
from
the
previous
stage
to
be
ready.
Thus,
our
minimum
number
of
cycles
must
be
6.
We
can
verify
that
we
can
create
a
solution
that
requires
only
six
cycles
as
follows:
compute
AB:
MUL
R0,
R1,
R3
compute
BC:
MUL
R1,
R2,
R4
compute
ABC:
MUL
R3,
R4,
R5
compute
AB
+
BC:
ADD
R3,
R4,
R6
STALL
compute
AB
+
ABC
+
BC:
ADD
R5,
R6,
R7
So
there
are
6
cycles.
The
answer
is
(B).'
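The cycle count above can be checked with a minimal Python sketch, assuming the rule as stated in the question: an operation costs two cycles when the immediately following operation reads its result, and one cycle otherwise. The `count_cycles` helper and the tuple encoding are illustrative and not part of the harness. Under a literal reading of that rule, `MUL R3, R4, R5` placed directly after `MUL R1, R2, R4` would itself cost an extra cycle, so the sketch computes ABC as AB * C (`MUL R3, R2, R5`), which is a deviation from the listed sequence.

```python
def count_cycles(program):
    """program: list of (op, src1, src2, dest) tuples in issue order."""
    cycles = 0
    for i, (_, _, _, dest) in enumerate(program):
        nxt = program[i + 1] if i + 1 < len(program) else None
        # Two cycles if the very next operation reads this result.
        needs_result = nxt is not None and dest in (nxt[1], nxt[2])
        cycles += 2 if needs_result else 1
    return cycles


program = [
    ("MUL", "R0", "R1", "R3"),  # AB
    ("MUL", "R1", "R2", "R4"),  # BC
    ("MUL", "R3", "R2", "R5"),  # ABC, computed as AB * C (assumption)
    ("ADD", "R3", "R4", "R6"),  # AB + BC
    ("ADD", "R5", "R6", "R7"),  # AB + ABC + BC, reads R6 from the previous op
]
print(count_cycles(program))  # 6
```

The only two-cycle operation in this ordering is the first ADD, whose result is consumed by the final ADD.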
  - question: 'A compiler generates code for the following assignment statement.
      G := (A + B) * C - (D + E) * F
      The target machine has a single accumulator and a single-address instruction
      set consisting of instructions load, store, add, subtract, and multiply. For
      the arithmetic operations, the left operand is taken from the accumulator and
      the result appears in the accumulator. The smallest possible number of instructions
      in the resulting code is
      (A) 5 (B) 6 (C) 7 (D) 9'
    target: 'Let''s think step by step. We can compute the final answer with the following
      sequence of operations:
      1. LOAD D (accumulator = D)
      2. ADD E (accumulator = D+E)
      3. MUL F (accumulator = (D+E)*F)
      4. STORE X (X = (D+E)*F)
      5. LOAD A (accumulator = A)
      6. ADD B (accumulator = A+B)
      7. MUL C (accumulator = (A+B)*C)
      8. SUB X (accumulator = (A+B)*C - (D+E)*F)
      9. STORE G (G = (A+B)*C - (D+E)*F)
      This sequence takes 9 instructions. The answer is (D).'
  - question: 'Consider a computer design in which multiple processors, each with
      a private cache memory, share global memory using a single bus. This bus is
      the critical system resource. Each processor can execute one instruction every
      500 nanoseconds as long as memory references are satisfied by its local cache.
      When a cache miss occurs, the processor is delayed for an additional 2,000 nanoseconds.
      During half of this additional delay, the bus is dedicated to serving the cache
      miss. During the other half, the processor cannot continue, but the bus is free
      to service requests from other processors. On average, each instruction requires
      2 memory references. On average, cache misses occur on 1 percent of references.
      What proportion of the capacity of the bus would a single processor consume,
      ignoring delays due to competition from other processors?
      (A) 1/50 (B) 1/27 (C) 1/25 (D) 2/27'
    target: 'Let''s think step by step. We know that each instruction requires two
      memory references per instruction, and that there is an average cache miss rate
      of one percent.
      Thus a given processor has:
      (1 cache miss / 100 references) * (2 references / instruction) =
      (2 cache misses / 100 instructions), so:
      misses_per_instruction = 1 cache miss / 50 instructions.
      Next, we know that each instruction requires 500 nanoseconds when there is no
      cache miss, and 500 + 2000 = 2500 nanoseconds when there is a cache miss. Thus:
      50 instructions / (49 * 500) + (1 * 2500) nanoseconds, so:
      instructions_per_ns = 50 instructions / 27000 nanoseconds.
      Now, we know that each cache miss locks the bus for half of the 2000 nanosecond
      cache miss delay, or 1000 nanoseconds, so:
      lock_ns_per_miss = 1000 nanoseconds / cache miss.
      Thus we can see that on average a single processor will lock the bus for:
      lock_ns_per_miss * misses_per_instruction * instructions_per_ns =
      (1000 nanoseconds / cache miss) * (1 cache miss / 50 instructions) * (50 instructions
      / 27000 nanoseconds) = 1000 * (1/50) * (50/27000) = 1000/27000 = 1/27. The answer
      is (B).'
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_college_computer_science
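The bus-utilization sample in this file reaches 1/27 by chaining three rates. As a cross-check, the same arithmetic can be sketched in Python with exact fractions. The function name `bus_utilization` and its parameter names are illustrative assumptions and do not come from the harness.

```python
from fractions import Fraction


def bus_utilization(instr_ns, miss_delay_ns, refs_per_instr, miss_rate):
    """Fraction of bus capacity one processor consumes (hypothetical helper).

    Follows the few-shot sample: half of each miss delay occupies the bus.
    """
    # Expected cache misses per instruction: 2 refs * 1/100 = 1/50.
    misses_per_instr = Fraction(refs_per_instr) * Fraction(miss_rate)
    # Average wall-clock time per instruction, including miss stalls.
    avg_ns_per_instr = Fraction(instr_ns) + misses_per_instr * miss_delay_ns
    # Bus time per instruction: only half of the miss delay holds the bus.
    bus_ns_per_instr = misses_per_instr * Fraction(miss_delay_ns) / 2
    return bus_ns_per_instr / avg_ns_per_instr


utilization = bus_utilization(
    instr_ns=500,
    miss_delay_ns=2000,
    refs_per_instr=2,
    miss_rate=Fraction(1, 100),
)
print(utilization)  # 1/27
```

Using `Fraction` rather than floats keeps the result directly comparable with the answer choices, which are all given as exact ratios.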
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_mathematics.yaml
"
dataset_name"
:
"
college_mathematics"
dataset_name
:
college_mathematics
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
college
\
description
:
The following are multiple choice questions (with answers) about college
\
mathematics.
\n\n
Q:
Let
V
be
the
set
of
all
real
polynomials
p(x).
Let
transformations
\
mathematics.
\
T,
S
be
defined
on
V
by
T:p(x)
->
xp(x)
and
S:p(x)
->
p'(x)
=
d/dx
p(x),
and
interpret
\
fewshot_config
:
\
(ST)(p(x))
as
S(T(p(x))).
Which
of
the
following
is
true?
\n
(A)
ST
=
0
(B)
ST
=
\
sampler
:
first_n
\
T
(C)
ST
=
TS
(D)
ST
-
TS
is
the
identity
map
of
V
onto
itself.
\n
A:
Let's
think
\
samples
:
\
step
by
step.
For
a
given
polynomial
$p$
we
have
\n\\
[ST(p)
=
(xp(x))’
=
p(x)
+
\
-
question
:
'
Let
V
be
the
set
of
all
real
polynomials
p(x).
Let
transformations
T,
\
xp’(x)
\\
]
\n
and
\n\\
[TS(p)
=
xp’(x).
\\
]
\n
Hence
\\
[ST(p)
-
TS(p)
=
p(x)
+
xp’(x)
\
S
be
defined
on
V
by
T:p(x)
->
xp(x)
and
S:p(x)
->
p'
'
(x)
=
d/dx
p(x),
and
interpret
\
-
xp’(x).
\\
]
The
answer
is
(D).
\n\n
Q:
Suppose
that
f(1
+
x)
=
f(x)
for
all
real
\
(ST)(p(x))
as
S(T(p(x))).
Which
of
the
following
is
true?
\
x.
If
f
is
a
polynomial
and
f(5)
=
11,
then
f(15/2)
\n
(A)
-11
(B)
0
(C)
11
(D)
\
\
33/2
\n
A:
Let's
think
step
by
step.
The
only
polynomial
so
that
$f(1
+
x)
=
f(x)$
\
(A)
ST
=
0
(B)
ST
=
T
(C)
ST
=
TS
(D)
ST
-
TS
is
the
identity
map
of
V
onto
\
is
a
constant
polynomial.
Hence
$f(5)
=
11
=
f(15/2)$.
The
answer
is
(C).
\n\n\
itself.'
Q:
Let
A
be
a
real
2x2
matrix.
Which
of
the
following
statements
must
be
true?
\n\
target
:
"
Let's
think
step
by
step.
For
a
given
polynomial
$p$
we
have
\n\\
[ST(p)
\
I.
All
of
the
entries
of
A^2
are
nonnegative.
\n
II.
The
determinant
of
A^2
is
nonnegative.
\n\
\
=
(xp(x))
\u2019
=
p(x)
+
xp
\u2019
(x)
\\
]
\n
and
\n\\
[TS(p)
=
xp
\u2019
(x).
\\
]
\n\
III.
If
A
has
two
distinct
eigenvalues,
then
A^2
has
two
distinct
eigenvalues.
\n\
Hence
\\
[ST(p)
-
TS(p)
=
p(x)
+
xp
\u2019
(x)
-
xp
\u2019
(x).
\\
]
The
answer
is
\
(A)
I
only
(B)
II
only
(C)
III
only
(D)
II
and
III
only
\n
A:
Let's
think
step
by
\
\
(D)."
\
step.
We
have
\\
[
det(A^2)
=
(det(A))^2
\\
geq
0,
\\
]
hence
II
holds.
\n
III
is
false:
\
-
question
:
'
Suppose
that
f(1
+
x)
=
f(x)
for
all
real
x.
If
f
is
a
polynomial
and
\
as
a
counterexample
take
a
diagonal
matrix
with
-1
and
1
on
the
diagonal.
Then
\
f(5)
=
11,
then
f(15/2)
\
$A^2$
is
the
identity
matrix.
The
answer
is
(B).
\n\n
Q:
Let
A
be
the
set
of
all
\
\
ordered
pairs
of
integers
(m,
n)
such
that
7m
+
12n
=
22.
What
is
the
greatest
\
(A)
-11
(B)
0
(C)
11
(D)
33/2'
\
negative
number
in
the
set
B
=
{m
+
n
:
(m,
n)
\\
in
A}?
\n
(A)
-5
(B)
-4
(C)
-3
\
target
:
Let's think step by step. The only polynomial so that $f(1 + x) = f(x)$
\
(D)
-2
\n
A:
Let's
think
step
by
step.
We
have
12n
=
22
-
7m
and
one
of
the
solutions
\
is a constant polynomial. Hence $f(5) = 11 = f(15/2)$. The answer is (C).
\
is
$m
=
-2$,
$n
=
3$.
Then
$m
+
n
=
1$,
hence
we
need
to
look
for
smaller
$m$
\
-
question
:
'
Let
A
be
a
real
2x2
matrix.
Which
of
the
following
statements
must
be
\
in
order
to
make
$m
+
n$
negative.
The
next
solution
is
$m
=
-14$
and
$n
=
10$.
\
true?
\
For
smaller
$m$
we
have
$m
+
n$
smaller
than
$-4$.
The
answer
is
(B).
\n\n
Q:
A
\
\
tank
initially
contains
a
salt
solution
of
3
grams
of
salt
dissolved
in
100
liters
\
I.
All
of
the
entries
of
A^2
are
nonnegative.
\
of
water.
A
salt
solution
containing
0.02
grams
of
salt
per
liter
of
water
is
\
\
sprayed
into
the
tank
at
a
rate
of
4
liters
per
minute.
The
sprayed
solution
is
\
II.
The
determinant
of
A^2
is
nonnegative.
\
continually
mixed
with
the
salt
solution
in
the
tank,
and
the
mixture
flows
out
\
\
of
the
tank
at
a
rate
of
4
liters
per
minute.
If
the
mixing
is
instantaneous,
\
III.
If
A
has
two
distinct
eigenvalues,
then
A^2
has
two
distinct
eigenvalues.
\
how
many
grams
of
salt
are
in
the
tank
after
100
minutes
have
elapsed?
\n
(A)
2
\
\
(B)
2
-
e^-2
(C)
2
+
e^-2
(D)
2
+
e^-4
\n
A:
Let's
think
step
by
step.
For
all
$t
\
(A)
I
only
(B)
II
only
(C)
III
only
(D)
II
and
III
only'
\ \\
in
\\
mathbb{R}$,
let
$s(t)$
denote
the
number
grams
of
salt
in
the
tank
at
the
\
target
:
'
Let'
'
s
think
step
by
step.
We
have
\[
det(A^2)
=
(det(A))^2
\geq
0,\]
\
$t$
minute
mark.
Then
$s(0)
=
3$.
\n
We
use
$s$
and
$s(t)$
interchangeably.
We
also
\
hence
II
holds.
\
use
$s^{
\\
prime}$
and
$s^{
\\
prime}(t)$
interchangeably.
The
solution
sprayed
into
\
\
the
tank
adds
$(0.02)
4=2
/
25$
grams
of
salt
per
minute.
There
are
always
100
\
III
is
false:
as
a
counterexample
take
a
diagonal
matrix
with
-1
and
1
on
the
\
liters
of
liquid
in
the
tank,
containing
$s$
grams
of
salt.
So
the
density
of
\
diagonal.
Then
$A^2$
is
the
identity
matrix.
The
answer
is
(B).'
\
salt
in
the
tank
is
$s
/
100$
grams
per
liter.
The
flow
of
water
out
of
the
tank
\
-
question
:
'
Let
A
be
the
set
of
all
ordered
pairs
of
integers
(m,
n)
such
that
7m
\
therefore
subtracts
$4(s
/
100)=s
/
25$
grams
of
salt
per
minute.
Then,
for
all
\
+
12n
=
22.
What
is
the
greatest
negative
number
in
the
set
B
=
{m
+
n
:
(m,
\
$t
\\
in
\\
mathbb{R}$,
we
have
$s^{
\\
prime}(t)=(2
/
25)-(s
/
25)=(2-s)
/
25$,
and
\
n)
\in
A}?
\
so
$[s(t)=2]
\\
Rightarrow
\\
left[s^{
\\
prime}(t)=0
\r
ight]$.
For
all
$t
\\
in
\\
mathbb{R}$,
\n\
$$
\n\f
rac{d}{d
t}[
\\
ln
(s-2)]=
\f
rac{s^{
\\
prime}}{s-2}=
\f
rac{-1}{25}=
\f
rac{d}{d
t}
\\\
(A)
-5
(B)
-4
(C)
-3
(D)
-2'
left[-
\f
rac{t}{25}
\r
ight]
.
\n
$$
\n
Choose
$C
\\
in
\\
mathbb{R}$
such
that,
for
all
\
target
:
Let's think step by step. We have 12n = 22 - 7m and one of the solutions
\
$t
\\
in
\\
mathbb{R},
\\
ln
((s(t)-2))=-[t
/
25]+C$.
Let
$K:=e^{C}$.
Then,
for
all
\
is $m = -2$, $n = 3$. Then $m + n = 1$, hence we need to look for smaller $m$
\
$t
\\
in
\\
mathbb{R}$,
we
have
$(s(t))-2=K
e^{-t
/
25}$,
and
so
$s(t)=2+K
e^{-t
\
in order to make $m + n$ negative. The next solution is $m = -14$ and $n = 10$.
\
/
25}$.
Then
$3=s(0)=2+K
e^{0}=2+K$,
so
$K=1$.
Then
$s(100)=2+K
e^{-100
/
25}=2+1
\
For smaller $m$ we have $m + n$ smaller than $-4$. The answer is (B).
\ \\
cdot
e^{-4}=2+e^{-4}$.
The
answer
is
(D).
\n\n
"
-
question
:
'
A
tank
initially
contains
a
salt
solution
of
3
grams
of
salt
dissolved
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
in
100
liters
of
water.
A
salt
solution
containing
0.02
grams
of
salt
per
liter
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
of
water
is
sprayed
into
the
tank
at
a
rate
of
4
liters
per
minute.
The
sprayed
"
task"
:
"
mmlu_flan_cot_fewshot_college_mathematics"
solution
is
continually
mixed
with
the
salt
solution
in
the
tank,
and
the
mixture
flows
out
of
the
tank
at
a
rate
of
4
liters
per
minute.
If
the
mixing
is
instantaneous,
how
many
grams
of
salt
are
in
the
tank
after
100
minutes
have
elapsed?
(A)
2
(B)
2
-
e^-2
(C)
2
+
e^-2
(D)
2
+
e^-4'
target
:
"
Let's
think
step
by
step.
For
all
$t
\\
in
\\
mathbb{R}$,
let
$s(t)$
denote
\
\
the
number
grams
of
salt
in
the
tank
at
the
$t$
minute
mark.
Then
$s(0)
=
\
\
3$.
\n
We
use
$s$
and
$s(t)$
interchangeably.
We
also
use
$s^{
\\
prime}$
and
\
\
$s^{
\\
prime}(t)$
interchangeably.
The
solution
sprayed
into
the
tank
adds
\
\
$(0.02)
4=2
/
25$
grams
of
salt
per
minute.
There
are
always
100
liters
of
\
\
liquid
in
the
tank,
containing
$s$
grams
of
salt.
So
the
density
of
salt
in
\
\
the
tank
is
$s
/
100$
grams
per
liter.
The
flow
of
water
out
of
the
tank
therefore
\
\
subtracts
$4(s
/
100)=s
/
25$
grams
of
salt
per
minute.
Then,
for
all
$t
\\\
in
\\
mathbb{R}$,
we
have
$s^{
\\
prime}(t)=(2
/
25)-(s
/
25)=(2-s)
/
25$,
and
\
\
so
$[s(t)=2]
\\
Rightarrow
\\
left[s^{
\\
prime}(t)=0
\r
ight]$.
For
all
$t
\\
in
\
\ \\
mathbb{R}$,
\n
$$
\n\f
rac{d}{d
t}[
\\
ln
(s-2)]=
\f
rac{s^{
\\
prime}}{s-2}=
\f
rac{-1}{25}=
\f\
rac{d}{d
t}
\\
left[-
\f
rac{t}{25}
\r
ight]
.
\n
$$
\n
Choose
$C
\\
in
\\
mathbb{R}$
such
\
\
that,
for
all
$t
\\
in
\\
mathbb{R},
\\
ln
((s(t)-2))=-[t
/
25]+C$.
Let
$K:=e^{C}$.
\
\
Then,
for
all
$t
\\
in
\\
mathbb{R}$,
we
have
$(s(t))-2=K
e^{-t
/
25}$,
and
\
\
so
$s(t)=2+K
e^{-t
/
25}$.
Then
$3=s(0)=2+K
e^{0}=2+K$,
so
$K=1$.
Then
$s(100)=2+K
\
\
e^{-100
/
25}=2+1
\\
cdot
e^{-4}=2+e^{-4}$.
The
answer
is
(D).
\n\n
"
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_college_mathematics
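The salt-tank sample in this file solves s'(t) = (2 - s)/25 with s(0) = 3 and reports s(100) = 2 + e^-4. A minimal sketch to check that result numerically follows. The function names and the Euler step count are assumptions made for illustration only.

```python
import math


def salt_closed_form(t, s0=3.0, inflow_conc=0.02, rate=4.0, volume=100.0):
    """Closed-form solution s(t) = s_eq + (s0 - s_eq) * exp(-k t)."""
    s_eq = inflow_conc * volume  # equilibrium salt mass: 2 grams
    k = rate / volume            # turnover rate: 1/25 per minute
    return s_eq + (s0 - s_eq) * math.exp(-k * t)


def salt_euler(t_end, steps=100_000, s0=3.0, inflow_conc=0.02,
               rate=4.0, volume=100.0):
    """Forward-Euler integration of s' = inflow - outflow (rough check only)."""
    s = s0
    dt = t_end / steps
    for _ in range(steps):
        inflow = inflow_conc * rate   # grams of salt entering per minute
        outflow = rate * s / volume   # grams of salt leaving per minute
        s += dt * (inflow - outflow)
    return s


exact = salt_closed_form(100.0)
approx = salt_euler(100.0)
print(exact, approx)
```

Both values should agree with 2 + e^-4 to within the Euler discretization error, which supports answer (D).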
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_medicine.yaml
"
dataset_name"
:
"
college_medicine"
dataset_name
:
college_medicine
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
college
\
description
:
The following are multiple choice questions (with answers) about college
\
medicine.
\n\n
Q:
An
expected
side
effect
of
creatine
supplementation
is:
\n
(A)
muscle
\
medicine.
\
weakness.
(B)
gain
in
body
mass.
(C)
muscle
cramps.
(D)
loss
of
electrolytes.
\n\
fewshot_config
:
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
medicine
for
help.
\
sampler
:
first_n
\
Creatine
supplementation
is
a
dietary
supplement
that
results
in
body
mass
gain.
\
samples
:
\
The
answer
is
(B).
\n\n
Q:
Which
of
the
following
is
not
a
true
statement?
\n
(A)
\
-
question
:
'
An
expected
side
effect
of
creatine
supplementation
is:
\
Muscle
glycogen
is
broken
down
enzymatically
to
glucose-1-phosphate
(B)
Elite
\
\
endurance
runners
have
a
high
proportion
of
Type
I
fibres
in
their
leg
muscles
\
(A)
muscle
weakness.
(B)
gain
in
body
mass.
(C)
muscle
cramps.
(D)
loss
of
electrolytes.'
\
(C)
Liver
glycogen
is
important
in
the
maintenance
of
the
blood
glucose
concentration
\
target
:
Let's think step by step. We refer to Wikipedia articles on medicine for
\
(D)
Insulin
promotes
glucose
uptake
by
all
tissues
in
the
body
\n
A:
Let's
think
\
help. Creatine supplementation is a dietary supplement that results in body
\
step
by
step.
We
refer
to
Wikipedia
articles
on
medicine
for
help.
Let’s
solve
\
mass gain. The answer is (B).
\
this
step
by
step
and
go
over
each
choice:
\n
(A)
“Muscle
glycogen
is
broken
down
\
-
question
:
'
Which
of
the
following
is
not
a
true
statement?
\
enzymatically
to
glucose-1-phosphate”:
This
is
a
correct
statement.
\n
(B)
“Elite
\
\
endurance
runners
have
a
high
proportion
of
Type
I
fibres
in
their
leg
muscles”:
\
(A)
Muscle
glycogen
is
broken
down
enzymatically
to
glucose-1-phosphate
(B)
\
This
is
a
correct
statement.
\n
(C)
“Liver
glycogen
is
important
in
the
maintenance
\
Elite
endurance
runners
have
a
high
proportion
of
Type
I
fibres
in
their
leg
\
of
the
blood
glucose
concentration”:
This
is
a
correct
statement.
\n
(D)
“Insulin
\
muscles
(C)
Liver
glycogen
is
important
in
the
maintenance
of
the
blood
glucose
\
promotes
glucose
uptake
by
all
tissues
in
the
body”:
This
is
not
a
correct
statement,
\
concentration
(D)
Insulin
promotes
glucose
uptake
by
all
tissues
in
the
body'
\
because
insulin
promotes
glucose
uptake
by
the
liver,
adipose
tissue,
and
muscle,
\
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
medicine
\
\
but
not
all
tissues.
For
instance,
the
tissues
in
the
brain
and
red
blood
cells
\
\
for
help.
Let
\u2019
s
solve
this
step
by
step
and
go
over
each
choice:
\n
(A)
\
\
are
not
affected
by
insulin.
The
answer
is
(D).
\n\n
Q:
A
high
school
science
teacher
\
\ \u201C
Muscle
glycogen
is
broken
down
enzymatically
to
glucose-1-phosphate
\u201D\
\
fills
a
1
liter
bottle
with
pure
nitrogen
and
seals
the
lid.
The
pressure
is
1.70
\
:
This
is
a
correct
statement.
\n
(B)
\u201C
Elite
endurance
runners
have
a
high
\
\
atm,
and
the
room
temperature
is
25°C.
Which
two
variables
will
both
increase
\
\
proportion
of
Type
I
fibres
in
their
leg
muscles
\u201D
:
This
is
a
correct
\
\
the
pressure
of
the
system,
if
all
other
variables
are
held
constant?
\n
(A)
Increasing
\
\
statement.
\n
(C)
\u201C
Liver
glycogen
is
important
in
the
maintenance
of
the
\
\
temperature,
increasing
moles
of
gas
(B)
Increasing
temperature,
increasing
volume
\
\
blood
glucose
concentration
\u201D
:
This
is
a
correct
statement.
\n
(D)
\u201C\
\
(C)
Decreasing
volume,
decreasing
temperature
(D)
Decreasing
moles
of
gas,
increasing
\
Insulin
promotes
glucose
uptake
by
all
tissues
in
the
body
\u201D
:
This
is
not
\
\
volume
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
medicine
\
\
a
correct
statement,
because
insulin
promotes
glucose
uptake
by
the
liver,
\
\
for
help.
The
relevant
equation
for
this
is
the
ideal
gas
law:
PV=nRT.
To
increase
\
\
adipose
tissue,
and
muscle,
but
not
all
tissues.
For
instance,
the
tissues
\
\
the
pressure
of
the
system
(P),
then
either
n
(number
of
moles
of
the
gas)
or
\
\
in
the
brain
and
red
blood
cells
are
not
affected
by
insulin.
The
answer
is
\
\
T
(temperature)
have
to
increase.
The
answer
is
(A).
\n\n
Q:
In
a
genetic
test
of
\
\
(D)."
\
a
newborn,
a
rare
genetic
disorder
is
found
that
has
X-linked
recessive
transmission.
\
-
question
:
"
A
high
school
science
teacher
fills
a
1
liter
bottle
with
pure
nitrogen
\
\
Which
of
the
following
statements
is
likely
true
regarding
the
pedigree
of
this
\
\
and
seals
the
lid.
The
pressure
is
1.70
atm,
and
the
room
temperature
is
25
\xB0\
\
disorder?
\n
(A)
All
descendants
on
the
maternal
side
will
have
the
disorder.
(B)
\
C.
Which
two
variables
will
both
increase
the
pressure
of
the
system,
if
all
\
\
Females
will
be
approximately
twice
as
affected
as
males
in
this
family.
(C)
All
\
\
other
variables
are
held
constant?
\n
(A)
Increasing
temperature,
increasing
\
\
daughters
of
an
affected
male
will
be
affected.
(D)
There
will
be
equal
distribution
\
\
moles
of
gas
(B)
Increasing
temperature,
increasing
volume
(C)
Decreasing
\
\
of
males
and
females
affected.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
volume,
decreasing
temperature
(D)
Decreasing
moles
of
gas,
increasing
volume"
\
articles
on
medicine
for
help.
Let’s
solve
this
step
by
step.
Let's
recall
first
\
target
:
'
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
medicine
\
that
females
have
two
X
chromosomes,
while
males
have
one
X
and
one
Y
chromosome.
\
for
help.
The
relevant
equation
for
this
is
the
ideal
gas
law:
PV=nRT.
To
increase
\
This
is
an
important
fact
we
need
to
know
before
answering
this
question.
\n
Because
\
the
pressure
of
the
system
(P),
then
either
n
(number
of
moles
of
the
gas)
or
\
a
male
can
only
pass
his
only
one
X
chromosome
to
a
daughter,
if
he
is
affected
\
T
(temperature)
have
to
increase.
The
answer
is
(A).'
\
by
this
rare
genetic
disorder,
then
we
know
for
sure
that
he
will
pass
this
rare
\
-
question
:
'
In
a
genetic
test
of
a
newborn,
a
rare
genetic
disorder
is
found
that
\
genetic
disorder
to
all
his
future-born
daughters.
Therefore,
“(C):
All
daughters
\
has
X-linked
recessive
transmission.
Which
of
the
following
statements
is
likely
\
of
an
affected
male
will
be
affected”
is
a
correct
statement.
The
answer
is
(C).
\n\
true
regarding
the
pedigree
of
this
disorder?
\n
Q:
Glucose
is
transported
into
the
muscle
cell:
\n
(A)
via
protein
transporters
\
\
called
GLUT4.
(B)
only
in
the
presence
of
insulin.
(C)
via
hexokinase.
(D)
via
\
(A)
All
descendants
on
the
maternal
side
will
have
the
disorder.
(B)
Females
\
monocarbylic
acid
transporters.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
will
be
approximately
twice
as
affected
as
males
in
this
family.
(C)
All
daughters
\
articles
on
medicine
for
help.
Glucose
(also
known
as
the
blood
sugar)
is
the
\
of
an
affected
male
will
be
affected.
(D)
There
will
be
equal
distribution
of
\
main
sugar
found
in
the
human
body.
It
is
transported
into
the
muscle
cell
via
\
males
and
females
affected.'
\
diffusion
through
protein
transporters
called
GLUT4.
The
answer
is
(A).
\n\n
"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
medicine
\
"
group"
:
"
mmlu_flan_cot_fewshot_other"
\
for
help.
Let
\u2019
s
solve
this
step
by
step.
Let's
recall
first
that
females
\
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
\
have
two
X
chromosomes,
while
males
have
one
X
and
one
Y
chromosome.
This
\
"
task"
:
"
mmlu_flan_cot_fewshot_college_medicine"
\
is
an
important
fact
we
need
to
know
before
answering
this
question.
\n
Because
\
\
a
male
can
only
pass
his
only
one
X
chromosome
to
a
daughter,
if
he
is
affected
\
\
by
this
rare
genetic
disorder,
then
we
know
for
sure
that
he
will
pass
this
\
\
rare
genetic
disorder
to
all
his
future-born
daughters.
Therefore,
\u201C\
(C):
All
daughters
of
an
affected
male
will
be
affected
\u201D
is
a
correct
statement.
\
\
The
answer
is
(C)."
  - question: 'Glucose is transported into the muscle cell:
      (A) via protein transporters called GLUT4. (B) only in the presence of insulin.
      (C) via hexokinase. (D) via monocarbylic acid transporters.'
    target: 'Let''s think step by step. We refer to Wikipedia articles on medicine
      for help. Glucose (also known as the blood sugar) is the main sugar found in
      the human body. It is transported into the muscle cell via diffusion through
      protein transporters called GLUT4. The answer is (A).'
group: mmlu_flan_cot_fewshot_other
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_college_medicine
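One sample in this file relies on the ideal gas law, PV = nRT, to argue that raising temperature and raising the amount of gas both increase pressure. The sketch below illustrates that reasoning. The helper `pressure_atm` and the 10 percent increments are assumptions chosen for illustration; only the 1 liter, 1.70 atm, and 25 °C figures come from the sample.

```python
R = 0.082057  # gas constant in L*atm/(mol*K)


def pressure_atm(n_mol, temp_k, volume_l):
    """Ideal gas law solved for pressure: P = nRT / V."""
    return n_mol * R * temp_k / volume_l


# Starting state from the sample: 1 liter at 1.70 atm and 25 degrees Celsius.
temp_k = 25 + 273.15
volume_l = 1.0
n_mol = 1.70 * volume_l / (R * temp_k)

baseline = pressure_atm(n_mol, temp_k, volume_l)
hotter = pressure_atm(n_mol, temp_k * 1.1, volume_l)      # raise T only
more_gas = pressure_atm(n_mol * 1.1, temp_k, volume_l)    # raise n only
bigger_vol = pressure_atm(n_mol, temp_k, volume_l * 1.1)  # raise V only

print(baseline, hotter, more_gas, bigger_vol)
```

Raising T or n alone pushes the pressure above the baseline, while raising V alone lowers it, which matches the sample's choice of (A).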
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_physics.yaml
"
dataset_name"
:
"
college_physics"
dataset_name
:
college_physics
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
college
\
description
:
The following are multiple choice questions (with answers) about college
\
physics.
\n\n
Q:
A
refracting
telescope
consists
of
two
converging
lenses
separated
\
physics.
\
by
100
cm.
The
eye-piece
lens
has
a
focal
length
of
20
cm.
The
angular
magnification
\
fewshot_config
:
\
of
the
telescope
is
\n
(A)
4
(B)
5
(C)
6
(D)
20
\n
A:
Let's
think
step
by
step.
In
\
sampler
:
first_n
\
a
refracting
telescope,
if
both
lenses
are
converging,
the
focus
of
both
lenses
\
samples
:
\
must
be
between
the
two
lenses,
and
thus
the
focal
lengths
of
the
two
lenses
must
\
-
question
:
'
A
refracting
telescope
consists
of
two
converging
lenses
separated
by
\
add
up
to
their
separation.
Since
the
focal
length
of
one
lens
is
20
cm,
the
focal
\
100
cm.
The
eye-piece
lens
has
a
focal
length
of
20
cm.
The
angular
magnification
\
length
of
the
other
must
be
80
cm.
The
magnification
is
the
ratio
of
these
two
\
of
the
telescope
is
\
focal
lengths,
or
4.
The
answer
is
(A).
\n\n
Q:
The
muon
decays
with
a
characteristic
\
\
lifetime
of
about
10^-6
second
into
an
electron,
a
muon
neutrino,
and
an
electron
\
(A)
4
(B)
5
(C)
6
(D)
20'
\
antineutrino.
The
muon
is
forbidden
from
decaying
into
an
electron
and
just
a
\
target
:
Let's think step by step. In a refracting telescope, if both lenses are
\
single
neutrino
by
the
law
of
conservation
of
\n
(A)
charge
(B)
mass
(C)
energy
\
converging, the focus of both lenses must be between the two lenses, and thus
\
and
momentum
(D)
lepton
number
\n
A:
Let's
think
step
by
step.
Lepton
number
must
\
the focal lengths of the two lenses must add up to their separation. Since the
\
be
conserved,
meaning
the
total
number
of
leptons
minus
the
number
of
antileptons.
\
focal length of one lens is 20 cm, the focal length of the other must be
80
\
If
a
muon
decays
into
an
electron
and
a
single
neutrino,
the
total
lepton
number
\
cm. The magnification is the ratio of these two focal lengths, or 4. The answer
\
would
go
from
one
to
two,
violating
lepton
number
conservation.
The
answer
is
\
is (A).
\
(D).
\n\n
Q:
One
end
of
a
Nichrome
wire
of
length
2L
and
cross-sectional
area
A
\
-
question
:
'
The
muon
decays
with
a
characteristic
lifetime
of
about
10^-6
second
into
\
is
attached
to
an
end
of
another
Nichrome
wire
of
length
L
and
cross-
sectional
\
an
electron,
a
muon
neutrino,
and
an
electron
antineutrino.
The
muon
is
forbidden
\
area
2A.
If
the
free
end
of
the
longer
wire
is
at
an
electric
potential
of
8.0
\
from
decaying
into
an
electron
and
just
a
single
neutrino
by
the
law
of
conservation
\
volts,
and
the
free
end
of
the
shorter
wire
is
at
an
electric
potential
of
1.0
\
of
\
volt,
the
potential
at
the
junction
of
the
two
wires
is
most
nearly
equal
to
\n\
(A)
2.4
V
(B)
3.3
V
(C)
4.5
V
(D)
5.7
V
\n
A:
Let's
think
step
by
step.
This
is
a
\
(A)
charge
(B)
mass
(C)
energy
and
momentum
(D)
lepton
number'
\
simple
voltage
divider
problem,
where
the
longer
wire
has
a
resistance
four
times
\
target
:
Let's think step by step. Lepton number must be conserved, meaning the
\
that
of
the
shorter
end.
So
the
voltage
divider
ratio
is
1
/
5,
meaning
that
the
\
total number of leptons minus the number of antileptons. If a muon decays into
\
potential
in
the
middle
is
1.0
V
+
(8.0
V
-
1.0
V)
*
1/5
=
2.4
V.
The
answer
is
\
an electron and a single neutrino, the total lepton number would go from one
\
(A).
\n\n
Q:
A
refracting
telescope
consists
of
two
converging
lenses
separated
\
to two, violating lepton number conservation. The answer is (D).
\
by
100
cm.
The
eye-piece
lens
has
a
focal
length
of
20
cm.
The
angular
magnification
\
-
question
:
'
One
end
of
a
Nichrome
wire
of
length
2L
and
cross-sectional
area
A
is
\
of
the
telescope
is
\n
(A)
4
(B)
5
(C)
6
(D)
20
\n
A:
Let's
think
step
by
step.
In
\
attached
to
an
end
of
another
Nichrome
wire
of
length
L
and
cross-
sectional
\
a
refracting
telescope,
if
both
lenses
are
converging,
the
focus
of
both
lenses
\
area
2A.
If
the
free
end
of
the
longer
wire
is
at
an
electric
potential
of
8.0
\
must
be
between
the
two
lenses,
and
thus
the
focal
lengths
of
the
two
lenses
must
\
volts,
and
the
free
end
of
the
shorter
wire
is
at
an
electric
potential
of
1.0
\
add
up
to
their
separation.
Since
the
focal
length
of
one
lens
is
20
cm,
the
focal
\
volt,
the
potential
at
the
junction
of
the
two
wires
is
most
nearly
equal
to
\
length
of
the
other
must
be
80
cm.
The
magnification
is
the
ratio
of
these
two
\
\
focal
lengths,
or
4.
The
answer
is
(A).
\n\n
Q:
For
which
of
the
following
thermodynamic
\
(A)
2.4
V
(B)
3.3
V
(C)
4.5
V
(D)
5.7
V'
\
processes
is
the
increase
in
the
internal
energy
of
an
ideal
gas
equal
to
the
\
target
:
Let's think step by step. This is a simple voltage divider problem, where
\
heat
added
to
the
gas?
\n
(A)
Constant
temperature
(B)
Constant
volume
(C)
Constant
\
the longer wire has a resistance four times that of the shorter end. So the
\
pressure
(D)
Adiabatic
\n
A:
Let's
think
step
by
step.
Heat
added
to
the
gas
can
\
voltage divider ratio is 1 / 5, meaning that the potential in the middle is
\
go
into
the
gases
internal
energy
or
work
done
against
an
external
force.
However,
\
1.0 V + (8.0 V - 1.0 V) * 1/5 = 2.4 V. The answer is (A).
\
if
the
volume
of
the
gas
container
is
constant,
no
work
will
be
done
(since
work
\
-
question
:
'
A
refracting
telescope
consists
of
two
converging
lenses
separated
by
\
is
pressure
times
change
in
volume).
So,
at
constant
volume,
all
of
the
heat
goes
\
100
cm.
The
eye-piece
lens
has
a
focal
length
of
20
cm.
The
angular
magnification
\
into
the
internal
energy.
The
answer
is
(B).
\n\n
"
of
the
telescope
is
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
(A)
4
(B)
5
(C)
6
(D)
20'
"
task"
:
"
mmlu_flan_cot_fewshot_college_physics"
    target: Let's think step by step. In a refracting telescope, if both lenses are
      converging, the focus of both lenses must be between the two lenses, and thus
      the focal lengths of the two lenses must add up to their separation. Since the
      focal length of one lens is 20 cm, the focal length of the other must be 80
      cm. The magnification is the ratio of these two focal lengths, or 4. The answer
      is (A).
  - question: 'For which of the following thermodynamic processes is the increase
      in the internal energy of an ideal gas equal to the heat added to the gas?
      (A) Constant temperature (B) Constant volume (C) Constant pressure (D) Adiabatic'
    target: 'Let''s think step by step. Heat added to the gas can go into the gases
      internal energy or work done against an external force. However, if the volume
      of the gas container is constant, no work will be done (since work is pressure
      times change in volume). So, at constant volume, all of the heat goes into the
      internal energy. The answer is (B).'
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_college_physics
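The Nichrome-wire sample in this file treats the two wires as a voltage divider. A minimal sketch of that calculation follows, assuming resistance proportional to length divided by cross-sectional area, with the resistivity normalized to 1. The function names are hypothetical.

```python
from fractions import Fraction


def resistance(length, area, resistivity=1):
    """R = rho * L / A, in arbitrary units (same material for both wires)."""
    return Fraction(resistivity) * Fraction(length) / Fraction(area)


def junction_potential(v_long_end, v_short_end, r_long, r_short):
    """Potential at the junction of two resistors in series.

    The same current flows through both wires, so the total potential
    drop splits in proportion to each wire's resistance.
    """
    return v_short_end + (v_long_end - v_short_end) * r_short / (r_long + r_short)


r_long = resistance(length=2, area=1)   # wire of length 2L, area A
r_short = resistance(length=1, area=2)  # wire of length L, area 2A
v_junction = junction_potential(Fraction(8), Fraction(1), r_long, r_short)

print(r_long / r_short)   # 4
print(float(v_junction))  # 2.4
```

The resistance ratio of 4 gives the divider ratio of 1/5 quoted in the sample, and the junction sits at 2.4 V, consistent with answer (A).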
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_computer_security.yaml
"
dataset_name"
:
"
computer_security"
dataset_name
:
computer_security
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
computer
\
description
:
The following are multiple choice questions (with answers) about computer
\
security.
\n\n
Q:
SHA-1
has
a
message
digest
of
\n
(A)
160
bits
(B)
512
bits
(C)
628
\
security.
\
bits
(D)
820
bits
\n
A:
Let's
think
step
by
step.
Since
SHA-1
is
a
hash
function
\
fewshot_config
:
\
which
takes
an
input
and
produces
a
160-bit
(20-byte)
hash
value,
its
message
\
sampler
:
first_n
\
digest
is
160
bits.
The
answer
is
(A).
\n\n
Q:
_____________
can
modify
data
on
\
samples
:
\
your
system
–
so
that
your
system
doesn’t
run
correctly
or
you
can
no
longer
access
\
-
question
:
'
SHA-1
has
a
message
digest
of
\
specific
data,
or
it
may
even
ask
for
ransom
in
order
to
give
your
access.
\n
(A)
\
\
IM
–
Trojans
(B)
Backdoor
Trojans
(C)
Trojan-Downloader
(D)
Ransom
Trojan
\n
A:
\
(A)
160
bits
(B)
512
bits
(C)
628
bits
(D)
820
bits'
\
Let's
think
step
by
step.
The
system
is
asking
for
trojans,
which
are
for
ransom,
\
target
:
Let's think step by step. Since SHA-1 is a hash function which takes an
\
which
means
ransom
trojan.
The
answer
is
(D).
\n\n
Q:
What
is
ethical
hacking?
\n\
question and produces a 160-bit (20-byte) hash value, its message digest is
160
(A)
\"
Hacking
\"
ethics
so
they
justify
unintended
selfish
behavior
(B)
Hacking
systems
\
bits. The answer is (A).
\
(e.g.,
during
penetration
testing)
to
expose
vulnerabilities
so
they
can
be
fixed,
\
-
question
:
"
_____________
can
modify
data
on
your
system
\u2013
so
that
your
system
\
\
rather
than
exploited
(C)
Hacking
into
systems
run
by
those
whose
ethics
you
disagree
\
\
doesn
\u2019
t
run
correctly
or
you
can
no
longer
access
specific
data,
or
it
\
\
with
(D)
A
slang
term
for
rapid
software
development,
e.g.,
as
part
of
hackathons
\n\
\
may
even
ask
for
ransom
in
order
to
give
your
access.
\n
(A)
IM
\u2013
Trojans
\
A:
Let's
think
step
by
step.
Ethical
hacking
is
a
process
of
detecting
vulnerabilities
\
\
(B)
Backdoor
Trojans
(C)
Trojan-Downloader
(D)
Ransom
Trojan"
\
in
an
application,
system,
or
organization's
infrastructure
that
an
attacker
can
\
target
:
Let's think step by step. The system is asking for trojans, which are
\
use
to
exploit
an
individual
or
organization.
They
use
this
process
to
prevent
\
for ransom, which means ransom trojan. The answer is (D).
\
cyberattacks
and
security
breaches
by
lawfully
hacking
into
the
systems
and
looking
\
-
question
:
'
What
is
ethical
hacking?
\
for
weak
points.
The
answer
is
(B).
\n\n
Q:
The
____________
is
anything
which
your
\
\
search
engine
cannot
search.
\n
(A)
Haunted
web
(B)
World
Wide
Web
(C)
Surface
web
\
(A)
"Hacking"
ethics
so
they
justify
unintended
selfish
behavior
(B)
Hacking
\
(D)
Deep
Web
\n
A:
Let's
think
step
by
step.
The
search
engine
searches
on
the
Surface
\
systems
(e.g.,
during
penetration
testing)
to
expose
vulnerabilities
so
they
\
Web,
which
is
the
portion
of
the
world
wide
web
which
is
visible
so
(B,C)
are
\
can
be
fixed,
rather
than
exploited
(C)
Hacking
into
systems
run
by
those
whose
\
wrong.
The
Haunted
Web
doesn’t
correspond
to
an
internet
concept.
The
Deep
Web
\
ethics
you
disagree
with
(D)
A
slang
term
for
rapid
software
development,
e.g.,
\
is
the
part
of
the
World
Wide
Web
which
is
not
indexed.
The
answer
is
(D).
\n\n\
as
part
of
hackathons'
Q:
Exploitation
of
the
Heartbleed
bug
permits
\n
(A)
overwriting
cryptographic
keys
\
target
:
Let's think step by step. Ethical hacking is a process of detecting vulnerabilities
\
in
memory
(B)
a
kind
of
code
injection
(C)
a
read
outside
bounds
of
a
buffer
(D)
\
in an application, system, or organization's infrastructure that an attacker
\
a
format
string
attack
\n
A:
Let's
think
step
by
step.
The
Heartbleed
Bug
is
a
serious
\
can use to exploit an individual or organization. They use this process to prevent
\
vulnerability
in
the
popular
OpenSSL
cryptographic
software
library.
Heartbleed
\
cyberattacks and security breaches by lawfully hacking into the systems and
\
resulted
from
improper
input
validation
(due
to
a
missing
bounds
check)
in
the
\
looking for weak points. The answer is (B).
\
implementation
of
the
TLS
heartbeat
extension.
The
vulnerability
was
classified
\
-
question
:
'
The
____________
is
anything
which
your
search
engine
cannot
search.
\
as
a
buffer
over-read,
a
situation
where
more
data
can
be
read
than
should
be
\
\
allowed.
The
answer
is
(C).
\n\n
"
(A)
Haunted
web
(B)
World
Wide
Web
(C)
Surface
web
(D)
Deep
Web'
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
target
:
"
Let's
think
step
by
step.
The
search
engine
searches
on
the
Surface
Web,
\
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
\
which
is
the
portion
of
the
world
wide
web
which
is
visible
so
(B,C)
are
wrong.
\
"
task"
:
"
mmlu_flan_cot_fewshot_computer_security"
\
The
Haunted
Web
doesn
\u2019
t
correspond
to
an
internet
concept.
The
Deep
Web
\
\
is
the
part
of
the
World
Wide
Web
which
is
not
indexed.
The
answer
is
(D)."
  - question: 'Exploitation of the Heartbleed bug permits
      (A) overwriting cryptographic keys in memory (B) a kind of code injection (C)
      a read outside bounds of a buffer (D) a format string attack'
    target: 'Let''s think step by step. The Heartbleed Bug is a serious vulnerability
      in the popular OpenSSL cryptographic software library. Heartbleed resulted from
      improper question validation (due to a missing bounds check) in the implementation
      of the TLS heartbeat extension. The vulnerability was classified as a buffer
      over-read, a situation where more data can be read than should be allowed. The
      answer is (C).'
group
:
mmlu_flan_cot_fewshot_stem
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_computer_security
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_conceptual_physics.yaml
"
dataset_name"
:
"
conceptual_physics"
dataset_name
:
conceptual_physics
"
description"
:
"
\n
The
following
are
multiple
choice
questions
(with
answers)
about
\
description
:
'
\
conceptual
physics.
\n\n
Q:
Colors
in
a
soap
bubble
result
from
light
\n
(A)
converted
\
\
to
a
different
frequency
(B)
deflection
(C)
interference
(D)
polarization
\n
A:
\
The
following
are
multiple
choice
questions
(with
answers)
about
conceptual
physics.'
\
Let's
think
step
by
step.
In
a
soap
bubble
film,
the
light
bounces
between
the
\
fewshot_config
:
\
two
soap-air
interfaces
many
times,
interfering
with
itself
constructively
or
\
sampler
:
first_n
\
destructively
depending
on
the
width
of
the
film.
This
results
in
different
colors
\
samples
:
\
being
visible.
The
answer
is
(C).
\n\n
Q:
Compared
with
the
mass
of
a
uranium
atom
\
-
question
:
'
Colors
in
a
soap
bubble
result
from
light
\
undergoing
fission,
the
combined
masses
of
the
products
after
fission
are
\n
(A)
\
\
less
(B)
more
(C)
the
same
(D)
zero
\n
A:
Let's
think
step
by
step.
Fission
releases
\
(A)
converted
to
a
different
frequency
(B)
deflection
(C)
interference
(D)
polarization'
\
energy,
which
comes
from
the
rest
mass
of
its
initial
nucleus.
Thus
the
mass
of
\
target
:
Let's think step by step. In a soap bubble film, the light bounces between
\
the
products
is
less
than
the
mass
of
the
reactant
uranium
nucleus.
The
answer
\
the two soap-air interfaces many times, interfering with itself constructively
\
is
(A).
\n\n
Q:
Things
that
are
equivalent
according
to
the
equivalence
principle
\
or destructively depending on the width of the film. This results in different
\
are
\n
(A)
space
and
time.
(B)
a
traveling
twin
and
a
stay-at-home
twin.
(C)
gravity
\
colors being visible. The answer is (C).
\
and
acceleration.
(D)
mass
and
energy.
\n
A:
Let's
think
step
by
step.
Einstein’s
\
-
question
:
'
Compared
with
the
mass
of
a
uranium
atom
undergoing
fission,
the
combined
\
famous
equivalence
principle
states
that
gravity
and
acceleration
are
equivalent.
\
masses
of
the
products
after
fission
are
\
The
answer
is
(C).
\n\n
Q:
Which
of
these
three
elements
has
the
most
mass
per
nucleon?
\n\
(A)
Hydrogen
(B)
Iron
(C)
Uranium
(D)
Same
in
each
\n
A:
Let's
think
step
by
step.
\
(A)
less
(B)
more
(C)
the
same
(D)
zero'
\
Due
to
nuclear
binding
energy,
the
mass
of
an
atomic
nucleus
is
less
than
the
\
target
:
Let's think step by step. Fission releases energy, which comes from the
\
sum
of
individual
masses
of
the
free
constituent
protons
and
neutrons;
this
is
\
rest mass of its initial nucleus. Thus the mass of the products is less than
\
known
as
the
mass
defect.
Hydrogen
has
no
mass
defect
because
it
has
only
a
single
\
the mass of the reactant uranium nucleus. The answer is (A).
\
nucleon,
so
it
will
have
the
most
mass
per
nucleon.
The
answer
is
(A).
\n\n
Q:
A
\
-
question
:
'
Things
that
are
equivalent
according
to
the
equivalence
principle
are
\
model
airplane
flies
slower
when
flying
into
the
wind
and
faster
with
wind
at
\
\
its
back.
When
launched
at
right
angles
to
the
wind
a
cross
wind
its
groundspeed
\
(A)
space
and
time.
(B)
a
traveling
twin
and
a
stay-at-home
twin.
(C)
gravity
\
compared
with
flying
in
still
air
is
\n
(A)
the
same
(B)
greater
(C)
less
(D)
either
\
and
acceleration.
(D)
mass
and
energy.'
\
greater
or
less
depending
on
wind
speed
\n
A:
Let's
think
step
by
step.
The
plane’s
\
target
:
"
Let's
think
step
by
step.
Einstein
\u2019
s
famous
equivalence
principle
\
\
speed
in
the
direction
of
the
wind
is
greater
than
it
would
be
in
the
absence
\
\
states
that
gravity
and
acceleration
are
equivalent.
The
answer
is
(C)."
\
of
wind,
and
its
direction
orthogonal
to
the
wind
is
the
same
as
it
would
be
in
\
-
question
:
'
Which
of
these
three
elements
has
the
most
mass
per
nucleon?
\
the
absence
of
the
wind.
The
total
speed,
which
is
these
two
components
added
\
\
in
quadrature,
is
thus
greater
than
the
speed
in
still
air.
The
answer
is
(B).
\n\n
"
(A)
Hydrogen
(B)
Iron
(C)
Uranium
(D)
Same
in
each'
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
target
:
Let's think step by step. Due to nuclear binding energy, the mass of an
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
atomic nucleus is less than the sum of individual masses of the free constituent
"
task"
:
"
mmlu_flan_cot_fewshot_conceptual_physics"
protons and neutrons; this is known as the mass defect. Hydrogen has no mass
defect because it has only a single nucleon, so it will have the most mass per
nucleon. The answer is (A).
  - question: 'A model airplane flies slower when flying into the wind and faster
      with wind at its back. When launched at right angles to the wind a cross wind
      its groundspeed compared with flying in still air is (A) the same (B) greater
      (C) less (D) either greater or less depending on wind speed'
    target: "Let's think step by step. The plane\u2019s speed in the direction of\
      \ the wind is greater than it would be in the absence of wind, and its direction\
      \ orthogonal to the wind is the same as it would be in the absence of the wind.\
      \ The total speed, which is these two components added in quadrature, is thus\
      \ greater than the speed in still air. The answer is (B).\n\n"
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_conceptual_physics
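The crosswind sample above argues that the cross-wind and along-wind velocity components add in quadrature. A minimal Python sketch of that arithmetic follows; the function name and the example speeds are illustrative assumptions and are not part of the task file.

```python
import math


def crosswind_groundspeed(airspeed: float, wind_speed: float) -> float:
    """Groundspeed when the plane is launched at right angles to the wind.

    The component orthogonal to the wind equals the still-air speed, the
    component along the wind equals the wind speed, and the two are
    perpendicular, so they combine as the hypotenuse of a right triangle.
    """
    return math.hypot(airspeed, wind_speed)


# Hypothetical numbers: 3 m/s in still air, 4 m/s crosswind.
still_air_speed = 3.0
groundspeed = crosswind_groundspeed(still_air_speed, 4.0)
is_greater = groundspeed > still_air_speed
```

For any nonzero wind speed the result exceeds the still-air speed, which is the reasoning behind answer (B).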
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_econometrics.yaml
"
dataset_name"
:
"
econometrics"
dataset_name
:
econometrics
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
econometrics.
\n\
description
:
The following are multiple choice questions (with answers) about econometrics.
\n
Q:
Suppose
now
that
a
researcher
wishes
to
use
information
criteria
to
determine
\
fewshot_config
:
\
the
optimal
lag
length
for
a
VAR.
500
observations
are
available
for
the
bi-variate
\
sampler
:
first_n
\
VAR,
and
the
values
of
the
determinant
of
the
variance-covariance
matrix
of
residuals
\
samples
:
\
are
0.0336,
0.0169,
0.0084,
and
0.0062
for
1,
2,
3,
and
4
lags
respectively.
What
\
-
question
:
'
Suppose
now
that
a
researcher
wishes
to
use
information
criteria
to
determine
\
is
the
optimal
model
order
according
to
Akaike's
information
criterion?
\n
(A)
1
\
the
optimal
lag
length
for
a
VAR.
500
observations
are
available
for
the
bi-variate
\
lag
(B)
2
lags
(C)
3
lags
(D)
4
lags
\n
A:
Let's
think
step
by
step.
We
refer
to
\
VAR,
and
the
values
of
the
determinant
of
the
variance-covariance
matrix
of
\
Wikipedia
articles
on
econometrics
for
help.
Let’s
solve
this
problem
step
by
\
residuals
are
0.0336,
0.0169,
0.0084,
and
0.0062
for
1,
2,
3,
and
4
lags
respectively.
\
step.
First
of
all,
let’s
recall
that
for
a
given
set
of
data,
Akaike's
information
\
What
is
the
optimal
model
order
according
to
Akaike'
'
s
information
criterion?
\
criterion
(AIC)
allows
us
to
measure
how
well
a
statistical
model
fits
the
data;
\
\
it
is
an
estimator
of
prediction
error.
Here
in
this
problem
we
will
need
to
use
\
(A)
1
lag
(B)
2
lags
(C)
3
lags
(D)
4
lags'
\
the
formula
ln(det(sigma_hat))
+
(2
*
k
/
T)
to
determine
the
values
of
Akaike’s
\
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
econometrics
\
\
criterion,
where
ln
denotes
the
natural
log
function,
det
the
determinant
function,
\
\
for
help.
Let
\u2019
s
solve
this
problem
step
by
step.
First
of
all,
let
\u2019\
\
k
the
total
number
of
parameters
in
total
(across
both
equations),
and
T
the
number
\
s
recall
that
for
a
given
set
of
data,
Akaike's
information
criterion
(AIC)
\
\
of
observations
(which,
in
this
case,
is
equal
to
500).
For
1
lag,
the
number
\
\
allows
us
to
measure
how
well
a
statistical
model
fits
the
data;
it
is
an
\
\
of
parameters
in
total
is
equal
to
6;
for
2
lags,
it
is
10;
for
3
lags,
it
is
\
\
estimator
of
prediction
error.
Here
in
this
problem
we
will
need
to
use
the
\
\
14;
and
for
4
lags,
it
is
18.
Now,
let’s
calculate
the
values
of
the
criterion
\
\
formula
ln(det(sigma_hat))
+
(2
*
k
/
T)
to
determine
the
values
of
Akaike
\u2019\
\
for
each
lag:
\n
(A)
1
lag:
ln(0.0336)
+
(2
*
6
/
500)
=
ln(0.0336)
+
(12
/
500)
\
s
criterion,
where
ln
denotes
the
natural
log
function,
det
the
determinant
\
\
=
-3.369
\n
(B)
2
lags:
ln(0.0169)
+
(2
*
10
/
500)
=
ln(0.0169)
+
(20
/
500)
=
\
\
function,
k
the
total
number
of
parameters
in
total
(across
both
equations),
\
\
-4.040
\n
(C)
3
lags:
ln(0.0084)
+
(2
*
14
/
500)
=
ln(0.0084)
+
(28
/
500)
=-4.724
\n\
\
and
T
the
number
of
observations
(which,
in
this
case,
is
equal
to
500).
For
\
(D)
4
lags:
ln(0.0062)
+
(2
*
18
/
500)
=
ln(0.0062)
+
(36
/
500)
=-5.011
\n
Because
\
\
1
lag,
the
number
of
parameters
in
total
is
equal
to
6;
for
2
lags,
it
is
\
\
the
optimal
model
order
according
to
AIC
minimizes
the
information
criterion,
\
\
10;
for
3
lags,
it
is
14;
and
for
4
lags,
it
is
18.
Now,
let
\u2019
s
calculate
\
\
the
answer
should
be
the
one
with
the
lowest
value.
In
this
case,
(D)
has
the
\
\
the
values
of
the
criterion
for
each
lag:
\n
(A)
1
lag:
ln(0.0336)
+
(2
*
6
\
\
lowest
value.
The
answer
is
(C).
\n\n
Q:
Consider
the
following
AR(1)
model
with
\
\
/
500)
=
ln(0.0336)
+
(12
/
500)
=
-3.369
\n
(B)
2
lags:
ln(0.0169)
+
(2
*
10
\
\
the
disturbances
having
zero
mean
and
unit
variance
\n
yt
=
0.2
+
0.4
yt-1
+
ut
\n\
\
/
500)
=
ln(0.0169)
+
(20
/
500)
=
-4.040
\n
(C)
3
lags:
ln(0.0084)
+
(2
*
14
\
The
(unconditional)
mean
of
y
will
be
given
by
\n
(A)
0.2
(B)
0.4
(C)
0.5
(D)
0.33
\n\
\
/
500)
=
ln(0.0084)
+
(28
/
500)
=-4.724
\n
(D)
4
lags:
ln(0.0062)
+
(2
*
18
\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
econometrics
for
\
\
/
500)
=
ln(0.0062)
+
(36
/
500)
=-5.011
\n
Because
the
optimal
model
order
\
\
help.
Let’s
solve
this
problem
step
by
step.
If
we
have
a
an
AR(1)
model
with
\
\
according
to
AIC
minimizes
the
information
criterion,
the
answer
should
be
\
\
the
disturbances
having
zero
mean
and
unit
variance,
then
the
unconditional
mean
\
\
the
one
with
the
lowest
value.
In
this
case,
(D)
has
the
lowest
value.
The
\
\
of
y
is
equal
to
the
following:
\n
unconditional
mean
of
y
=
(the
intercept
term)
\
\
answer
is
(C)."
\
/
(1
-
autoregressive
coefficient)
\n
We
know
that
the
intercept
term
is
0.2
and
\
-
question
:
'
Consider
the
following
AR(1)
model
with
the
disturbances
having
zero
mean
\
the
autoregressive
coefficient
is
0.4;
thus,
we
have:
\n
unconditional
mean
of
y
\
and
unit
variance
\
=
(0.2)
/
(1
-
0.4)
=
(0.2)
/
(0.6)
=
2
/
6
=
1
/
3,
which
is
approximately
0.33.
\
\
That
means
that
the
answer
should
be
(D)
0.33.
The
answer
is
(D).
\n\n
Q:
What
would
\
yt
=
0.2
+
0.4
yt-1
+
ut
\
be
then
consequences
for
the
OLS
estimator
if
heteroscedasticity
is
present
in
\
\
a
regression
model
but
ignored?
\n
(A)
It
will
be
biased
(B)
It
will
be
inconsistent
\
The
(unconditional)
mean
of
y
will
be
given
by
\
(C)
It
will
be
inefficient
(D)
All
of
(a),
(b)
and
(c)
will
be
true.
\n
A:
Let's
\
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
econometrics
for
help.
Heteroscedasticity
\
(A)
0.2
(B)
0.4
(C)
0.5
(D)
0.33'
\
refers
to
the
condition
where
the
variance
of
the
error
terms
is
not
constant
\
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
econometrics
\
\
across
multiple
observations.
If
heteroscedasticity
is
present
in
a
regression
\
\
for
help.
Let
\u2019
s
solve
this
problem
step
by
step.
If
we
have
a
an
AR(1)
\
\
model,
then
the
coefficient
estimates
in
the
OLS
estimator
will
be
not
only
unbiased
\
\
model
with
the
disturbances
having
zero
mean
and
unit
variance,
then
the
unconditional
\
\
and
consistent
but
also
inefficient.
Because
(A)
and
(B)
are
incorrect
choices
\
\
mean
of
y
is
equal
to
the
following:
\n
unconditional
mean
of
y
=
(the
intercept
\
\
and
(C)
is
a
correct
choice,
(D)
cannot
be
the
right
answer.
Ultimately,
(C)
is
\
\
term)
/
(1
-
autoregressive
coefficient)
\n
We
know
that
the
intercept
term
\
\
the
only
true
choice.
The
answer
is
(C).
\n\n
Q:
Suppose
that
a
test
statistic
has
\
\
is
0.2
and
the
autoregressive
coefficient
is
0.4;
thus,
we
have:
\n
unconditional
\
\
associated
with
it
a
p-value
of
0.08.
Which
one
of
the
following
statements
is
\
\
mean
of
y
=
(0.2)
/
(1
-
0.4)
=
(0.2)
/
(0.6)
=
2
/
6
=
1
/
3,
which
is
approximately
\
\
true?
\n
(i)
If
the
size
of
the
test
were
exactly
8%,
we
would
be
indifferent
between
\
\
0.33.
That
means
that
the
answer
should
be
(D)
0.33.
The
answer
is
(D)."
\
rejecting
and
not
rejecting
the
null
hypothesis
\n
(ii)
The
null
would
be
rejected
\
-
question
:
'
What
would
be
then
consequences
for
the
OLS
estimator
if
heteroscedasticity
\
if
a
10%
size
of
test
were
used
\n
(iii)
The
null
would
not
be
rejected
if
a
1%
\
is
present
in
a
regression
model
but
ignored?
\
size
of
test
were
used
\n
(iv)
The
null
would
be
rejected
if
a
5%
size
of
test
were
\
\
used.
\n
(A)
(ii)
and
(iv)
only
(B)
(i)
and
(iii)
only
(C)
(i),
(ii),
and
(iii)
\
(A)
It
will
be
biased
(B)
It
will
be
inconsistent
(C)
It
will
be
inefficient
\
only
(D)
(i),
(ii),
(iii),
and
(iv).
\n
A:
Let's
think
step
by
step.
We
refer
to
\
(D)
All
of
(a),
(b)
and
(c)
will
be
true.'
\
Wikipedia
articles
on
econometrics
for
help.
Let’s
reason
about
each
of
the
options.
\n\
target
:
Let's think step by step. We refer to Wikipedia articles on econometrics
(i)
is
a
true
statement.
\n
(ii)
is
a
true
statement.
\n
(iii)
is
a
true
statement.
\n\
for help. Heteroscedasticity refers to the condition where the variance of the
(iv)
is
not
a
true
statement.
Thus,
(i),
(ii),
and
(iii)
are
true.
The
answer
is
\
error terms is not constant across multiple observations. If heteroscedasticity
\
(C).
\n\n
Q:
For
a
stationary
autoregressive
process,
shocks
will
\n
(A)
Eventually
\
is present in a regression model, then the coefficient estimates in the OLS
\
die
away
(B)
Persist
indefinitely
(C)
Grow
exponentially
(D)
Never
occur
\n
A:
Let's
\
estimator will be not only unbiased and consistent but also inefficient. Because
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
econometrics
for
help.
This
\
(A) and (B) are incorrect choices and (C) is a correct choice, (D) cannot be
\
is
a
formal
logic
problem
about
stationally
process.
For
a
stationary
autoregressive
\
the right answer. Ultimately, (C) is the only true choice. The answer is (C).
\
process,
shocks
will
eventually
die
away.
The
answer
is
(A).
\n\n
"
-
question
:
'
Suppose
that
a
test
statistic
has
associated
with
it
a
p-value
of
0.08.
"
group"
:
"
mmlu_flan_cot_fewshot_social_sciences"
Which
one
of
the
following
statements
is
true?
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_econometrics"
(i)
If
the
size
of
the
test
were
exactly
8%,
we
would
be
indifferent
between
rejecting
and
not
rejecting
the
null
hypothesis
(ii)
The
null
would
be
rejected
if
a
10%
size
of
test
were
used
(iii)
The
null
would
not
be
rejected
if
a
1%
size
of
test
were
used
(iv)
The
null
would
be
rejected
if
a
5%
size
of
test
were
used.
(A)
(ii)
and
(iv)
only
(B)
(i)
and
(iii)
only
(C)
(i),
(ii),
and
(iii)
only
(D)
(i),
(ii),
(iii),
and
(iv).'
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
econometrics
\
\
for
help.
Let
\u2019
s
reason
about
each
of
the
options.
\n
(i)
is
a
true
statement.
\n\
(ii)
is
a
true
statement.
\n
(iii)
is
a
true
statement.
\n
(iv)
is
not
a
true
statement.
\
\
Thus,
(i),
(ii),
and
(iii)
are
true.
The
answer
is
(C)."
  - question: 'For a stationary autoregressive process, shocks will (A) Eventually
      die away (B) Persist indefinitely (C) Grow exponentially (D) Never occur'
    target: 'Let''s think step by step. We refer to Wikipedia articles on econometrics
      for help. This is a formal logic problem about stationally process. For a stationary
      autoregressive process, shocks will eventually die away. The answer is (A).'
group: mmlu_flan_cot_fewshot_social_sciences
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_econometrics
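The econometrics samples above work through two formulas: the information criterion ln(det(sigma_hat)) + (2 * k / T) for each VAR lag length, and the AR(1) unconditional mean, intercept / (1 - autoregressive coefficient). The Python sketch below reproduces that arithmetic with the numbers quoted in the samples; the function and variable names are assumptions made for illustration.

```python
import math


def aic(det_sigma: float, k: int, t: int) -> float:
    """Criterion as written in the sample: ln(det(sigma_hat)) + 2k/T."""
    return math.log(det_sigma) + 2 * k / t


# Values quoted in the sample for the bi-variate VAR (T = 500 observations).
DETERMINANTS = {1: 0.0336, 2: 0.0169, 3: 0.0084, 4: 0.0062}
TOTAL_PARAMS = {1: 6, 2: 10, 3: 14, 4: 18}
T = 500

aic_by_lag = {
    lag: aic(DETERMINANTS[lag], TOTAL_PARAMS[lag], T) for lag in DETERMINANTS
}
# The preferred lag length is the one that minimises the criterion.
lowest_aic_lag = min(aic_by_lag, key=aic_by_lag.get)


def ar1_unconditional_mean(intercept: float, phi: float) -> float:
    """Unconditional mean of y_t = intercept + phi * y_{t-1} + u_t."""
    if abs(phi) >= 1:
        raise ValueError("the process is not stationary for |phi| >= 1")
    return intercept / (1 - phi)


mean_y = ar1_unconditional_mean(0.2, 0.4)
```

With these inputs the criterion is smallest at four lags, consistent with the sample's statement that (D) has the lowest value, and the AR(1) mean evaluates to 1/3.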
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_electrical_engineering.yaml
"
dataset_name"
:
"
electrical_engineering"
dataset_name
:
electrical_engineering
"
description"
:
"
\n
The
following
are
multiple
choice
questions
(with
answers)
about
\
description
:
'
\
electrical
engineering.
\n\n
Q:
A
point
pole
has
a
strength
of
4π
*
10^-4
weber.
\
\
The
force
in
newtons
on
a
point
pole
of
4π
*
1.5
*
10^-4
weber
placed
at
a
distance
\
The
following
are
multiple
choice
questions
(with
answers)
about
electrical
engineering.'
\
of
10
cm
from
it
will
be
\n
(A)
15
N.
(B)
20
N.
(C)
7.5
N.
(D)
3.75
N.
\n
A:
Let's
\
fewshot_config
:
\
think
step
by
step.
The
force
between
two
point
poles
is
given
by
m_1m_2/(mu_0
\
sampler
:
first_n
\
4
\\
pi
r^2),
in
analogy
to
Coulomb’s
law.
Plugging
in
the
values
given
in
the
\
samples
:
\
question,
we
calculate
that
the
force
is
approximately
15
N.
The
answer
is
(A).
\n\
-
question
:
"
A
point
pole
has
a
strength
of
4
\u03C0
*
10^-4
weber.
The
force
in
newtons
\
\n
Q:
The
coil
of
a
moving
coil
meter
has
100
turns,
is
40
mm
long
and
30
mm
wide.
\
\
on
a
point
pole
of
4
\u03C0
*
1.5
*
10^-4
weber
placed
at
a
distance
of
10
\
\
The
control
torque
is
240*10-6
N-m
on
full
scale.
If
magnetic
flux
density
is
\
\
cm
from
it
will
be
\n
(A)
15
N.
(B)
20
N.
(C)
7.5
N.
(D)
3.75
N."
\
1Wb/m2
range
of
meter
is
\n
(A)
1
mA.
(B)
2
mA.
(C)
3
mA.
(D)
4
mA.
\n
A:
Let's
think
\
target
:
"
Let's
think
step
by
step.
The
force
between
two
point
poles
is
given
\
\
step
by
step.
The
torque
on
a
coil
in
a
uniform
magnetic
field
is
given
by
BANI,
\
\
by
m_1m_2/(mu_0
4
\\
pi
r^2),
in
analogy
to
Coulomb
\u2019
s
law.
Plugging
in
\
\
where
B
is
the
magnetic
flux
density,
A
is
the
area
of
the
coil,
N
is
the
number
\
\
the
values
given
in
the
question,
we
calculate
that
the
force
is
approximately
\
\
of
turns,
and
I
is
the
current.
So
we
have
that
I
=
(Torque)/(BAN),
or
240e-6/(1200e-6
\
\
15
N.
The
answer
is
(A)."
\
*
100
*
1)
=
2e-3.
The
answer
is
(B).
\n\n
Q:
In
an
SR
latch
built
from
NOR
gates,
\
-
question
:
'
The
coil
of
a
moving
coil
meter
has
100
turns,
is
40
mm
long
and
30
mm
\
which
condition
is
not
allowed
\n
(A)
S=0,
R=0
(B)
S=0,
R=1
(C)
S=1,
R=0
(D)
S=1,
\
wide.
The
control
torque
is
240*10-6
N-m
on
full
scale.
If
magnetic
flux
density
\
R=1
\n
A:
Let's
think
step
by
step.
An
SR
latch
is
a
set-reset
latch;
in
the
case
\
is
1Wb/m2
range
of
meter
is
\
where
S=1
and
R=1,
the
circuit
has
no
stable
state;
instead
a
race
condition
will
\
\
be
produced
within
the
circuit,
so
the
device
will
be
in
an
undefined
state.
So
\
(A)
1
mA.
(B)
2
mA.
(C)
3
mA.
(D)
4
mA.'
\
S=1,
R=1
is
an
illegal
input.
The
answer
is
(D).
\n\n
Q:
Two
long
parallel
conductors
\
target
:
Let's think step by step. The torque on a coil in a uniform magnetic field
\
carry
100
A.
If
the
conductors
are
separated
by
20
mm,
the
force
per
meter
of
\
is given by BANI, where B is the magnetic flux density, A is the area of the
\
length
of
each
conductor
will
be
\n
(A)
100
N.
(B)
0.1
N.
(C)
1
N.
(D)
0.01
N.
\n\
coil, N is the number of turns, and I is the current. So we have that I = (Torque)/(BAN),
A:
Let's
think
step
by
step.
The
magnetic
force-per-length
between
two
current-carrying
\
or 240e-6/(1200e-6 * 100 * 1) = 2e-3. The answer is (B).
\
conductors
is
given
by
\\
mu_0
I_1
I_2
/
(2
\\
pi
r),
where
$r$
is
the
separation
\
-
question
:
'
In
an
SR
latch
built
from
NOR
gates,
which
condition
is
not
allowed
\
distance
and
I_1
and
I_2
are
the
currents.
Plugging
in
100
A
for
I_1
and
I_2,
\
\
and
20
mm
for
r,
gives
0.1
N.
The
answer
is
(B).
\n\n
Q:
In
a
2
pole
lap
winding
\
(A)
S=0,
R=0
(B)
S=0,
R=1
(C)
S=1,
R=0
(D)
S=1,
R=1'
\
dc
machine
,
the
resistance
of
one
conductor
is
2Ω
and
total
number
of
conductors
\
target
:
Let's think step by step. An SR latch is a set-reset latch; in the case
\
is
100.
Find
the
total
resistance
\n
(A)
200Ω
(B)
100Ω
(C)
50Ω
(D)
10Ω
\n
A:
Let's
\
where S=1 and R=1, the circuit has no stable state; instead a race condition
\
think
step
by
step.
In
lap
winding,
effectively
two
resistors
are
connected
in
\
will be produced within the circuit, so the device will be in an undefined state.
\
parallel,
so
the
actual
resistance
of
each
pair
is
1
Ohm.
Since
we
have
50
pairs,
\
So S=1, R=1 is an illegal input. The answer is (D).
\
we
get
a
total
resistance
of
50
Ohms.
The
answer
is
(C).
\n\n
"
-
question
:
'
Two
long
parallel
conductors
carry
100
A.
If
the
conductors
are
separated
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
by
20
mm,
the
force
per
meter
of
length
of
each
conductor
will
be
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_electrical_engineering"
(A)
100
N.
(B)
0.1
N.
(C)
1
N.
(D)
0.01
N.'
target
:
Let's think step by step. The magnetic force-per-length between two current-carrying
conductors is given by \mu_0 I_1 I_2 / (2 \pi r), where $r$ is the separation
distance and I_1 and I_2 are the currents. Plugging in 100 A for I_1 and I_2,
and 20 mm for r, gives 0.1 N. The answer is (B).
  - question: "In a 2 pole lap winding dc machine , the resistance of one conductor is\
      \ 2\u03A9 and total number of conductors is 100. Find the total resistance\n\
      (A) 200\u03A9 (B) 100\u03A9 (C) 50\u03A9 (D) 10\u03A9"
    target: 'Let''s think step by step. In lap winding, effectively two resistors are
      connected in parallel, so the actual resistance of each pair is 1 Ohm. Since
      we have 50 pairs, we get a total resistance of 50 Ohms. The answer is (C).'
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_electrical_engineering
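The electrical engineering samples above rely on three closed-form expressions: the point-pole force m_1 m_2 / (mu_0 4 pi r^2), the coil torque relation I = Torque / (BAN), and the force per length mu_0 I_1 I_2 / (2 pi r) between parallel conductors. The Python sketch below plugs in the values from the samples; the function names are assumptions, and SI units are assumed throughout.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # permeability of free space, H/m


def pole_force(m1: float, m2: float, r: float) -> float:
    """Force between two point poles, in analogy to Coulomb's law."""
    return m1 * m2 / (MU_0 * 4 * math.pi * r**2)


def full_scale_current(torque: float, b: float, area: float, turns: int) -> float:
    """Current from torque = B * A * N * I, solved for I."""
    return torque / (b * area * turns)


def force_per_metre(i1: float, i2: float, r: float) -> float:
    """Force per unit length between two long parallel conductors."""
    return MU_0 * i1 * i2 / (2 * math.pi * r)


# Point poles of 4*pi*1e-4 Wb and 4*pi*1.5e-4 Wb, 10 cm apart.
force_n = pole_force(4 * math.pi * 1e-4, 4 * math.pi * 1.5e-4, 0.10)

# 100-turn coil, 40 mm x 30 mm, 240e-6 N-m control torque, B = 1 Wb/m^2.
current_a = full_scale_current(240e-6, 1.0, 40e-3 * 30e-3, 100)

# Two conductors carrying 100 A each, 20 mm apart.
force_per_m = force_per_metre(100.0, 100.0, 20e-3)

# Lap winding as described in the sample: 2-ohm conductors paired in
# parallel (1 ohm per pair), with 50 pairs in series.
pair_resistance = (2.0 * 2.0) / (2.0 + 2.0)
total_resistance = 50 * pair_resistance
```

These evaluate to roughly 15 N, 2 mA, 0.1 N per metre and 50 ohms, matching the answers given in the samples.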
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_elementary_mathematics.yaml
"
dataset_name"
:
"
elementary_mathematics"
dataset_name
:
elementary_mathematics
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
elementary
\
description
:
The following are multiple choice questions (with answers) about elementary
\
mathematics.
\n\n
Q:
Olivia
used
the
rule
\"
Add
11
\"
to
create
the
number
pattern
\
mathematics.
\
shown
below.
10,
21,
32,
43,
54.
Which
statement
about
the
number
pattern
is
true?
\n\
fewshot_config
:
(A)
The
10th
number
in
the
pattern
will
be
an
even
number.
\n
(B)
The
number
pattern
\
sampler
:
first_n
\
will
never
have
two
even
numbers
next
to
each
other.
\n
(C)
The
next
two
numbers
\
samples
:
\
in
the
pattern
will
be
an
even
number
then
an
odd
number.
\n
(D)
If
the
number
pattern
\
-
question
:
'
Olivia
used
the
rule
"Add
11"
to
create
the
number
pattern
shown
below.
\
started
with
an
odd
number
then
the
pattern
would
have
only
odd
numbers
in
it.
\n\
10,
21,
32,
43,
54.
Which
statement
about
the
number
pattern
is
true?
A:
Let's
think
step
by
step.
Choice
A
is
incorrect
because
every
even-numbered
term
\
\
in
the
pattern
is
odd,
and
10
is
an
even
number.
Choice
B
is
correct,
because
\
(A)
The
10th
number
in
the
pattern
will
be
an
even
number.
\
adding
an
odd
number
(in
this
case
11)
to
an
odd
number
produces
an
even
number,
\
\
and
adding
an
odd
number
to
an
even
number
produces
an
odd
number.
Thus
the
terms
\
(B)
The
number
pattern
will
never
have
two
even
numbers
next
to
each
other.
\
in
the
pattern
will
alternate
between
odd
and
even,
so
there
will
never
be
two
\
\
even
numbers
next
to
each
other.
Choice
C
is
incorrect
because
the
last
term
in
\
(C)
The
next
two
numbers
in
the
pattern
will
be
an
even
number
then
an
odd
number.
\
the
example
is
even
(54),
and
we
know
that
the
terms
will
alternate
between
even
\
\
and
odd.
Choice
D
is
incorrect
because
the
terms
in
the
pattern
will
alternate
\
(D)
If
the
number
pattern
started
with
an
odd
number
then
the
pattern
would
\
between
odd
and
even,
regardless
of
the
value
of
the
first
term.
The
answer
is
\
have
only
odd
numbers
in
it.'
\
(B).
\n\n
Q:
The
population
of
the
city
where
Michelle
was
born
is
145,826.
What
\
target
:
Let's think step by step. Choice A is incorrect because every even-numbered
\
is
the
value
of
the
5
in
the
number
145,826?
\n
(A)
5
thousands
\n
(B)
5
hundreds
\n\
term in the pattern is odd, and 10 is an even number. Choice B is correct, because
(C)
5
tens
\n
(D)
5
ones
\n
A:
Let's
think
step
by
step.
Choice
A
is
correct,
because
\
adding an odd number (in this case 11) to an odd number produces an even number,
\
there
are
three
digits
following
the
5,
so
\n
the
5
is
in
the
thousands
place.
Thus
\
and adding an odd number to an even number produces an odd number. Thus the
\
the
other
choices
are
incorrect.
The
answer
is
(A).
\n\n
Q:
A
store
sells
107
different
\
terms in the pattern will alternate between odd and even, so there will never
\
colors
of
paint.
They
have
25
cans
of
each
color
in
storage.
The
number
of
cans
\
be two even numbers next to each other. Choice C is incorrect because the last
\
of
paint
the
store
has
in
storage
can
be
found
using
the
expression
below.
107
\
term in the example is even (54), and we know that the terms will alternate
\
×
25.
How
many
cans
of
paint
does
the
store
have
in
storage?
\n
(A)
749
\n
(B)
2,675
\n\
between even and odd. Choice D is incorrect because the terms in the pattern
(C)
2,945
\n
(D)
4,250
\n
A:
Let's
think
step
by
step.
We
can
calculate
107
x
25
=
(100
\
will alternate between odd and even, regardless of the value of the first term.
\
x
25)
+
(7
x
25)
=
2500
+
175
=
2675.
The
answer
is
(B).
\n\n
Q:
A
total
of
30
players
\
The answer is (B).
\
will
play
basketball
at
a
park.
There
will
be
exactly
5
players
on
each
team.
\
-
question
:
'
The
population
of
the
city
where
Michelle
was
born
is
145,826.
What
is
\
Which
statement
correctly
explains
how
to
find
the
number
of
teams
needed?
\n
(A)
\
the
value
of
the
5
in
the
number
145,826?
\
Add
5
to
30
to
find
35
teams.
\n
(B)
Divide
30
by
5
to
find
6
teams.
\n
(C)
Multiply
\
\
30
and
5
to
find
150
teams.
\n
(D)
Subtract
5
from
30
to
find
25
teams.
\n
A:
Let's
\
(A)
5
thousands
\
think
step
by
step.
We
want
to
find
the
number
of
teams.
We
know
that
there
are
\
\
5
players/team,
and
30
players.
Thus
to
get
the
number
of
teams
we
divide
players
\
(B)
5
hundreds
\
by
players/team,
so
30
players
/
5
players/team
=
6
teams.
The
answer
is
(B).
\n\
\n
Q:
Which
expression
is
equivalent
to
5
x
9?
\n
(A)
(5
x
4)
x
(6
x
5)
\n
(B)
(5
x
5)
\
(C)
5
tens
\
+
(5
x
4)
\n
(C)
(5
x
5)
+
(5
x
9)
\n
(D)
(5
x
9)
x
(6
x
9)
\n
A:
Let's
think
step
by
\
\
step.
We
know
that
9
=
(5
+
4),
so
5
x
9
=
5
x
(5
+
4)
=
(5
x
5)
+
(5
x
4).
The
\
(D)
5
ones'
\
answer
is
(B).
\n\n
"
target
:
'
Let'
'
s
think
step
by
step.
Choice
A
is
correct,
because
there
are
three
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
digits
following
the
5,
so
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_elementary_mathematics"
the
5
is
in
the
thousands
place.
Thus
the
other
choices
are
incorrect.
The
answer
is
(A).'
  - question: "A store sells 107 different colors of paint. They have 25 cans of each\
      \ color in storage. The number of cans of paint the store has in storage can\
      \ be found using the expression below. 107 \xD7 25. How many cans of paint does\
      \ the store have in storage?\n(A) 749\n(B) 2,675\n(C) 2,945\n(D) 4,250"
    target: Let's think step by step. We can calculate 107 x 25 = (100 x 25) + (7
      x 25) = 2500 + 175 = 2675. The answer is (B).
  - question: 'A total of 30 players will play basketball at a park. There will be
      exactly 5 players on each team. Which statement correctly explains how to find
      the number of teams needed? (A) Add 5 to 30 to find 35 teams. (B) Divide 30
      by 5 to find 6 teams. (C) Multiply 30 and 5 to find 150 teams. (D) Subtract
      5 from 30 to find 25 teams.'
    target: Let's think step by step. We want to find the number of teams. We know
      that there are 5 players/team, and 30 players. Thus to get the number of teams
      we divide players by players/team, so 30 players / 5 players/team = 6 teams.
      The answer is (B).
  - question: 'Which expression is equivalent to 5 x 9? (A) (5 x 4) x (6 x 5) (B)
      (5 x 5) + (5 x 4) (C) (5 x 5) + (5 x 9) (D) (5 x 9) x (6 x 9)'
    target: 'Let''s think step by step. We know that 9 = (5 + 4), so 5 x 9 = 5 x (5
      + 4) = (5 x 5) + (5 x 4). The answer is (B).'
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_elementary_mathematics
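The elementary mathematics samples above can be checked mechanically. The short Python sketch below replays the "Add 11" pattern, the distributive split of 107 x 25, the team division, and the 5 x 9 identity; all variable names are assumptions introduced for illustration.

```python
# "Add 11" pattern from the sample, extended to ten terms: 10, 21, 32, ...
pattern = [10 + 11 * i for i in range(10)]

# Adding an odd number flips parity, so no two neighbours are both even.
no_adjacent_evens = all(
    not (a % 2 == 0 and b % 2 == 0) for a, b in zip(pattern, pattern[1:])
)
tenth_term = pattern[9]

# 107 x 25 split with the distributive property, as in the sample target.
cans_of_paint = (100 * 25) + (7 * 25)

# 30 players at exactly 5 players per team.
teams = 30 // 5

# 5 x 9 rewritten as 5 x (5 + 4).
distributive_holds = 5 * 9 == (5 * 5) + (5 * 4)
```

The tenth term is odd, which is why choice A of the pattern question is rejected, and the remaining values agree with the answers stated in the targets.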
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_formal_logic.yaml
"
dataset_name"
:
"
formal_logic"
dataset_name
:
formal_logic
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
formal
\
description
:
The following are multiple choice questions (with answers) about formal
\
logic.
\n\n
Q:
Which
of
the
given
formulas
of
PL
is
the
best
symbolization
of
the
\
logic.
\
following
sentence?
\n
Turtles
live
long
lives
and
are
happy
creatures,
unless
they
\
fewshot_config
:
\
are
injured.
\n
(A)
(L
•
H)
≡
I
(B)
(L
•
H)
∨
I
(C)
L
•
(H
∨
I)
(D)
L
•
(H
⊃
R).
\n\
sampler
:
first_n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
formal
logic
for
\
samples
:
\
help.
Let’s
solve
this
step
by
step.
Let
“L”
denote
“living
long”,
H
“being
happy”,
\
-
question
:
"
Which
of
the
given
formulas
of
PL
is
the
best
symbolization
of
the
following
\
\
and
“I”
“being
injured”.
Now,
consider
each
choice:
\n
(A)
means
(living
long
AND
\
\
sentence?
\n
Turtles
live
long
lives
and
are
happy
creatures,
unless
they
are
\
\
being
happy)
is
equivalent
to
(being
injured).
\n
(B)
means
(living
long
AND
being
\
\
injured.
\n
(A)
(L
\u2022
H)
\u2261
I
(B)
(L
\u2022
H)
\u2228
I
(C)
L
\u2022\
\
happy)
OR
(being
injured).
\n
(C)
means
(living
long)
AND
(being
happy
OR
being
\
\
(H
\u2228
I)
(D)
L
\u2022
(H
\u2283
R)."
\
injured).
\n
(D)
means
(living
long)
AND
(being
happy
implies
being
R),
but
what
\
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
formal
logic
\
\
R
denotes
is
not
clear.
\n
Obviously,
(B)
is
the
best
symbolization
of
the
original
\
\
for
help.
Let
\u2019
s
solve
this
step
by
step.
Let
\u201C
L
\u201D
denote
\u201C\
\
sentence.
The
answer
is
(B).
\n\n
Q:
Select
the
best
translation
into
predicate
\
living
long
\u201D
,
H
\u201C
being
happy
\u201D
,
and
\u201C
I
\u201D
\u201C
being
\
\
logic.George
borrows
Hector's
lawnmower.
(g:
George;
h:
Hector;
l:
Hector's
lawnmower;
\
\
injured
\u201D
.
Now,
consider
each
choice:
\n
(A)
means
(living
long
AND
being
\
\
Bxyx:
x
borrows
y
from
z).
\n
(A)
Blgh
(B)
Bhlg
(C)
Bglh
(D)
Bghl
\n
A:
Let's
think
\
\
happy)
is
equivalent
to
(being
injured).
\n
(B)
means
(living
long
AND
being
\
\
step
by
step.
We
refer
to
Wikipedia
articles
on
formal
logic
for
help.
Let’s
solve
\
\
happy)
OR
(being
injured).
\n
(C)
means
(living
long)
AND
(being
happy
OR
being
\
\
this
step
by
step.
We
are
told
that
“Bxyx”
means
“x
borrows
y
from
z”.
We
can
\
\
injured).
\n
(D)
means
(living
long)
AND
(being
happy
implies
being
R),
but
\
\
rewrite
“George
borrows
Hector's
lawnmower”
as
“George
borrows
a
lawnmower
from
\
\
what
R
denotes
is
not
clear.
\n
Obviously,
(B)
is
the
best
symbolization
of
\
\
Hector”,
which
can
then
be
translated
into
predicate
logic
as
“Bglh”.
The
answer
\
\
the
original
sentence.
The
answer
is
(B)."
\
“Bglh”
appears
in
(C);
therefore,
(C)
must
be
the
correct
answer.
The
answer
is
\
-
question
:
'
Select
the
best
translation
into
predicate
logic.George
borrows
Hector'
'
s
\
(C).
\n\n
Q:
\n
Select
the
best
English
interpretation
of
the
given
arguments
in
\
lawnmower.
(g:
George;
h:
Hector;
l:
Hector'
'
s
lawnmower;
Bxyx:
x
borrows
y
\
predicate
logic.
\n
Dm
\n
(∀x)(Wx
⊃
~Dx).
\n
(∀x)Wx
∨
Ag
\t
/
(∃x)Ax
\n
(A)
Marina
is
a
\
from
z).
\
dancer.
Some
weaklings
are
not
dancers.
Either
everything
is
a
weakling
or
Georgia
\
\
plays
volleyball.
So
something
plays
volleyball.
(B)
Marina
is
a
dancer.
No
weakling
\
(A)
Blgh
(B)
Bhlg
(C)
Bglh
(D)
Bghl'
\
is
a
dancer.
Everything
is
either
a
weakling
or
plays
volleyball.
So
something
\
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
formal
logic
\
\
plays
volleyball.
(C)
Marina
is
a
dancer.
Some
weaklings
are
not
dancers.
Everything
\
\
for
help.
Let
\u2019
s
solve
this
step
by
step.
We
are
told
that
\u201C
Bxyz
\u201D\
\
is
either
a
weakling
or
plays
volleyball.
So
something
plays
volleyball.
(D)
Marina
\
\
means
\u201C
x
borrows
y
from
z
\u201D
.
We
can
rewrite
\u201C
George
borrows
\
\
is
a
dancer.
No
weakling
is
a
dancer.
Either
everything
is
a
weakling
or
Georgia
\
\
Hector's
lawnmower
\u201D
as
\u201C
George
borrows
a
lawnmower
from
Hector
\u201D\
\
plays
volleyball.
So
something
plays
volleyball.
\n
A:
Let's
think
step
by
step.
\
,
which
can
then
be
translated
into
predicate
logic
as
\u201C
Bglh
\u201D
.
The
\
\
We
refer
to
Wikipedia
articles
on
formal
logic
for
help.
Let’s
solve
this
step
\
\
answer
\u201C
Bglh
\u201D
appears
in
(C);
therefore,
(C)
must
be
the
correct
\
\
by
step.
Let
“D”
denote
“being
a
dancer”,
“m”
denote
“Marina”,
“g”
denote
“Georgia”,
\
\
answer.
The
answer
is
(C)."
\
“W”
denote
“weakling”,
“A”
denote
“playing
volleyball”.
Then,
we
have
the
following:
\n\
-
question
:
"
\n
Select
the
best
English
interpretation
of
the
given
arguments
in
predicate
\
1.
Dm
→
Marina
is
a
dancer.
\n
2.
(∀x)(Wx
⊃
~Dx).
→
For
all
x,
if
x
is
a
weakling,
then
\
\
logic.
\n
Dm
\n
(
\u2200
x)(Wx
\u2283
~Dx).
\n
(
\u2200
x)Wx
\u2228
Ag
\t
/
(
\u2203
x)Ax
\n\
\
x
is
not
a
dancer.
In
other
words,
no
weakling
is
a
dancer.
\n
3.
(∀x)Wx
∨
Ag
\t\
(A)
Marina
is
a
dancer.
Some
weaklings
are
not
dancers.
Either
everything
is
\
/
(∃x)Ax
→
For
all
x,
x
is
a
weakling
or
Georgia
plays
volleyball.
So
there
exists
\
\
a
weakling
or
Georgia
plays
volleyball.
So
something
plays
volleyball.
(B)
\
\
an
x
that
plays
volleyball.
\n
Options
(A)
and
(C)
do
claim
that
some
weaklings
\
\
Marina
is
a
dancer.
No
weakling
is
a
dancer.
Everything
is
either
a
weakling
\
\
are
not
dancers,
but
the
second
argument
strongly
states
that
no
weakling
is
a
\
\
or
plays
volleyball.
So
something
plays
volleyball.
(C)
Marina
is
a
dancer.
\
\
dancer.
Thus,
we
can
eliminate
them.
Option
(B)
omits
the
important
detail
about
\
\
Some
weaklings
are
not
dancers.
Everything
is
either
a
weakling
or
plays
volleyball.
\
\
Georgia
playing
volleyball.
Option
(D)
has
all
the
details
presented
in
the
arguments
\
\
So
something
plays
volleyball.
(D)
Marina
is
a
dancer.
No
weakling
is
a
dancer.
\
\
and
is
the
best
English
interpretation
of
the
arguments.
The
answer
is
(D).
\n\n\
\
Either
everything
is
a
weakling
or
Georgia
plays
volleyball.
So
something
\
Q:
Select
the
best
translation
into
predicate
logic:
No
people
drive
on
Mars.
\n\
\
plays
volleyball."
(A)
~Pd
(B)
(∀x)(Px
∨
~Dx)
(C)
(∀x)(Px
⊃
~Dx)
(D)
~Dp
\n
A:
Let's
think
step
by
step.
\
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
formal
logic
\
\
We
refer
to
Wikipedia
articles
on
formal
logic
for
help.
Let’s
solve
this
step
\
\
for
help.
Let
\u2019
s
solve
this
step
by
step.
Let
\u201C
D
\u201D
denote
\u201C\
\
by
step.
Let
“P”
denote
“being
on
Mars”
and
“D”
denote
“driving
on
Mars”.
Then
\
being
a
dancer
\u201D
,
\u201C
m
\u201D
denote
\u201C
Marina
\u201D
,
\u201C
g
\u201D\
\
let’s
consider
each
option:
\n
Option
(A):
~Pd
→
d
is
not
on
Mars.
\n
Option
(B):
\
\
denote
\u201C
Georgia
\u201D
,
\u201C
W
\u201D
denote
\u201C
weakling
\u201D
,
\u201C\
\
(∀x)(Px
∨
~Dx)
→
For
all
x,
x
is
on
Mars
or
x
does
not
drive
on
Mars.
\n
Option
(C):
\
A
\u201D
denote
\u201C
playing
volleyball
\u201D
.
Then,
we
have
the
following:
\n\
\
(∀x)(Px
⊃
~Dx)
→
For
all
x,
x
is
on
Mars
implies
that
x
does
not
drive
on
Mars.
\n\
1.
Dm
\u2192
Marina
is
a
dancer.
\n
2.
(
\u2200
x)(Wx
\u2283
~Dx).
\u2192
For
all
\
Option
(D):
~Dp:
→
p
does
not
drive
on
Mars.
\n
Of
all
these
options,
Option
(C)
appears
\
\
x,
if
x
is
a
weakling,
then
x
is
not
a
dancer.
In
other
words,
no
weakling
\
\
to
be
the
best
and
most
meaningful
interpretation
of
the
argument
“No
people
drive
\
\
is
a
dancer.
\n
3.
(
\u2200
x)Wx
\u2228
Ag
\t
/
(
\u2203
x)Ax
\u2192
For
all
x,
x
\
\
on
Mars.”
The
answer
is
(C).
\n\n
"
\
is
a
weakling
or
Georgia
plays
volleyball.
So
there
exists
an
x
that
plays
\
"
group"
:
"
mmlu_flan_cot_fewshot_humanities"
\
volleyball.
\n
Options
(A)
and
(C)
do
claim
that
some
weaklings
are
not
dancers,
\
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
\
but
the
second
argument
strongly
states
that
no
weakling
is
a
dancer.
Thus,
\
"
task"
:
"
mmlu_flan_cot_fewshot_formal_logic"
\
we
can
eliminate
them.
Option
(B)
omits
the
important
detail
about
Georgia
\
\
playing
volleyball.
Option
(D)
has
all
the
details
presented
in
the
arguments
\
\
and
is
the
best
English
interpretation
of
the
arguments.
The
answer
is
(D)."
-
question
:
"
Select
the
best
translation
into
predicate
logic:
No
people
drive
on
Mars.
\n\
(A)
~Pd
(B)
(
\u2200
x)(Px
\u2228
~Dx)
(C)
(
\u2200
x)(Px
\u2283
~Dx)
(D)
~Dp"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
formal
logic
\
\
for
help.
Let
\u2019
s
solve
this
step
by
step.
Let
\u201C
P
\u201D
denote
\u201C\
being
on
Mars
\u201D
and
\u201C
D
\u201D
denote
\u201C
driving
on
Mars
\u201D
.
Then
\
\
let
\u2019
s
consider
each
option:
\n
Option
(A):
~Pd
\u2192
d
is
not
on
Mars.
\n\
Option
(B):
(
\u2200
x)(Px
\u2228
~Dx)
\u2192
For
all
x,
x
is
on
Mars
or
x
does
\
\
not
drive
on
Mars.
\n
Option
(C):
(
\u2200
x)(Px
\u2283
~Dx)
\u2192
For
all
x,
\
\
x
is
on
Mars
implies
that
x
does
not
drive
on
Mars.
\n
Option
(D):
~Dp:
\u2192\
\
p
does
not
drive
on
Mars.
\n
Of
all
these
options,
Option
(C)
appears
to
be
the
\
\
best
and
most
meaningful
interpretation
of
the
argument
\u201C
No
people
drive
\
\
on
Mars.
\u201D
The
answer
is
(C).
\n\n
"
group
:
mmlu_flan_cot_fewshot_humanities
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_formal_logic
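The chain-of-thought target above reads the argument `Dm`, `(∀x)(Wx ⊃ ~Dx)`, `(∀x)Wx ∨ Ag / (∃x)Ax` into English. As a rough sanity check that is not part of the task file, the sketch below brute-forces every interpretation over small finite domains and looks for one where all premises hold but the conclusion fails. The function name `find_counterexample` and the domain-size limit are illustrative assumptions; finding no counterexample on small domains is supporting evidence, not a proof of validity.

```python
from itertools import product


def find_counterexample(max_size=3):
    """Search small finite models for true premises and a false conclusion."""
    for n in range(1, max_size + 1):
        domain = range(n)
        # Each predicate extension is a tuple of booleans indexed by element.
        extensions = list(product([False, True], repeat=n))
        for D, W, A in product(extensions, repeat=3):
            # m and g are constants naming some element of the domain.
            for m, g in product(domain, repeat=2):
                p1 = D[m]                                            # Dm
                p2 = all((not W[x]) or (not D[x]) for x in domain)   # (∀x)(Wx ⊃ ~Dx)
                p3 = all(W[x] for x in domain) or A[g]               # (∀x)Wx ∨ Ag
                conclusion = any(A[x] for x in domain)               # (∃x)Ax
                if p1 and p2 and p3 and not conclusion:
                    return {"size": n, "D": D, "W": W, "A": A, "m": m, "g": g}
    return None


result = find_counterexample()
print(result)
```

The search comes back empty because the first two premises force `m` to be a non-weakling, so the disjunction can only hold through `Ag`, which already gives something that plays volleyball.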
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_global_facts.yaml
"
dataset_name"
:
"
global_facts"
dataset_name
:
global_facts
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
global
\
description
:
The following are multiple choice questions (with answers) about global
\
facts.
\n\n
Q:
As
of
2017,
how
many
of
the
world’s
1-year-old
children
today
have
\
facts.
\
been
vaccinated
against
some
disease?
*
\n
(A)
80%
(B)
60%
(C)
40%
(D)
20%
\n
A:
Let's
\
fewshot_config
:
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
global
facts
for
help.
According
\
sampler
:
first_n
\
to
data
published
by
the
World
Health
Organization,
the
number
of
1-year-old
\
samples
:
\
children
vaccinated
in
2017
exceeds
80%.
The
answer
is
(A).
\n\n
Q:
As
of
2019,
\
-
question
:
"
As
of
2017,
how
many
of
the
world
\u2019
s
1-year-old
children
today
have
\
\
about
what
percentage
of
Americans
agree
that
the
state
is
run
for
the
benefit
\
\
been
vaccinated
against
some
disease?
*
\n
(A)
80%
(B)
60%
(C)
40%
(D)
20%"
\
of
all
the
people?
\n
(A)
31%
(B)
46%
(C)
61%
(D)
76%
\n
A:
Let's
think
step
by
step.
\
target
:
Let's think step by step. We refer to Wikipedia articles on global facts
\
We
refer
to
Wikipedia
articles
on
global
facts
for
help.
In
2019,
about
46%
\
for help. According to data published by the World Health Organization, the
\
of
Americans
agree
that
the
state
is
run
for
the
benefit
of
all
the
people.
The
\
number of 1-year-old children vaccinated in 2017 exceeds 80%. The answer is
\
answer
is
(B).
\n\n
Q:
As
of
2019,
about
what
percentage
of
Russians
say
it
is
very
\
(A).
\
important
to
have
free
media
in
our
country
without
government/state
censorship?
\n\
-
question
:
'
As
of
2019,
about
what
percentage
of
Americans
agree
that
the
state
is
(A)
38%
(B)
53%
(C)
68%
(D)
83%
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
run
for
the
benefit
of
all
the
people?
\
articles
on
global
facts
for
help.
As
of
2019,
about
38%
of
Russians
say
it
is
\
\
very
important
to
have
free
media
in
our
country.
The
answer
is
(A).
\n\n
Q:
As
\
(A)
31%
(B)
46%
(C)
61%
(D)
76%'
\
of
2015,
since
1990
forests
have
____
in
Europe
and
have
____
in
Africa
and
the
\
target
:
Let's think step by step. We refer to Wikipedia articles on global facts
\
Americas.
\n
(A)
increased,
increased
(B)
increased,
decreased
(C)
decreased,
increased
\
for help. In 2019, about 46% of Americans agree that the state is
\
(D)
decreased,
decreased
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
run for the benefit of all the people. The answer is (B).
\
on
global
facts
for
help.
As
of
2015,
since
1990
forests
have
increased
in
Europe
\
-
question
:
'
As
of
2019,
about
what
percentage
of
Russians
say
it
is
very
important
\
and
have
decreased
in
Africa
and
the
Americas.
The
answer
is
(B).
\n\n
Q:
Which
\
to
have
free
media
in
our
country
without
government/state
censorship?
\
of
the
following
pairs
of
statements
are
both
true
(as
of
2019)?
\n
(A)
People
tend
\
\
to
be
optimistic
about
their
own
future
and
the
future
of
their
nation
or
the
\
(A)
38%
(B)
53%
(C)
68%
(D)
83%'
\
world.
(B)
People
tend
to
be
optimistic
about
their
own
future
but
pessimistic
\
target
:
Let's think step by step. We refer to Wikipedia articles on global facts
\
about
the
future
of
their
nation
or
the
world.
(C)
People
tend
to
be
pessimistic
\
for help. As of 2019, about 38% of Russians say it is very important to have
\
about
their
own
future
but
optimistic
about
the
future
of
their
nation
or
the
\
free media in our country. The answer is (A).
\
world.
(D)
People
tend
to
be
pessimistic
about
their
own
future
and
the
future
\
-
question
:
'
As
of
2015,
since
1990
forests
have
____
in
Europe
and
have
____
in
Africa
\
of
their
nation
or
the
world.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
and
the
Americas.
\
articles
on
global
facts
for
help.
As
of
2019,
most
people
tend
to
be
optimistic
\
\
about
their
own
future
but
pessimistic
about
the
future
of
their
nation
or
the
\
(A)
increased,
increased
(B)
increased,
decreased
(C)
decreased,
increased
(D)
\
world.
The
answer
is
(B).
\n\n
"
decreased,
decreased'
"
group"
:
"
mmlu_flan_cot_fewshot_other"
target
:
Let's think step by step. We refer to Wikipedia articles on global facts
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
for help. As of 2015, since 1990 forests have increased in Europe and have decreased
"
task"
:
"
mmlu_flan_cot_fewshot_global_facts"
in Africa and the Americas. The answer is (B).
-
question
:
'
Which
of
the
following
pairs
of
statements
are
both
true
(as
of
2019)?
(A)
People
tend
to
be
optimistic
about
their
own
future
and
the
future
of
their
nation
or
the
world.
(B)
People
tend
to
be
optimistic
about
their
own
future
but
pessimistic
about
the
future
of
their
nation
or
the
world.
(C)
People
tend
to
be
pessimistic
about
their
own
future
but
optimistic
about
the
future
of
their
nation
or
the
world.
(D)
People
tend
to
be
pessimistic
about
their
own
future
and
the
future
of
their
nation
or
the
world.'
target
:
'
Let'
'
s
think
step
by
step.
We
refer
to
Wikipedia
articles
on
global
facts
for
help.
As
of
2019,
most
people
tend
to
be
optimistic
about
their
own
future
but
pessimistic
about
the
future
of
their
nation
or
the
world.
The
answer
is
(B).'
group
:
mmlu_flan_cot_fewshot_other
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_global_facts
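The diff above moves the few-shot examples out of one long `description` string and into `fewshot_config.samples` entries with `question` and `target` fields. The sketch below shows one way such entries could be joined back into the `Q: ... A: ...` layout that the old description used. The helper `render_fewshot` and its separators are assumptions made for illustration; the harness's real prompt template lives in the included `_mmlu_flan_cot_fewshot_template_yaml`, which is not shown here.

```python
def render_fewshot(description, samples):
    """Join a description and question/target samples into one prompt prefix."""
    parts = [description.strip()]
    for sample in samples:
        # Mirror the "Q: <question>\nA: <target>" layout of the old description.
        parts.append(f"Q: {sample['question']}\nA: {sample['target']}")
    return "\n\n".join(parts) + "\n\n"


# One sample, paraphrasing an entry from the global_facts file above.
samples = [
    {
        "question": "As of 2019, about what percentage of Americans agree that "
                    "the state is run for the benefit of all the people?\n"
                    "(A) 31% (B) 46% (C) 61% (D) 76%",
        "target": "Let's think step by step. We refer to Wikipedia articles on "
                  "global facts for help. In 2019, about 46% of Americans agree "
                  "that the state is run for the benefit of all the people. "
                  "The answer is (B).",
    },
]

prompt = render_fewshot(
    "The following are multiple choice questions (with answers) about global facts.",
    samples,
)
print(prompt)
```

Keeping each example as a separate mapping makes individual samples easier to review in a diff than a single escaped string.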
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_biology.yaml
"
dataset_name"
:
"
high_school_biology"
dataset_name
:
high_school_biology
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
description
:
The following are multiple choice questions (with answers) about high
\
school
biology.
\n\n
Q:
In
animal
cells,
which
of
the
following
represents
the
most
\
school biology.
\
likely
pathway
that
a
secretory
protein
takes
as
it
is
synthesized
in
a
cell?
\n\
fewshot_config
:
(A)
Plasma
membrane–Golgi
apparatus–ribosome–secretory
vesicle–rough
ER
(B)
Ribosome–Golgi
\
sampler
:
first_n
\
apparatus–rough
ER–secretory
vesicle–plasma
membrane
(C)
Plasma
membrane–Golgi
\
samples
:
\
apparatus–ribosome–secretory
vesicle–rough
ER
(D)
Ribosome–rough
ER–Golgi
apparatus–secretory
\
-
question
:
"
In
animal
cells,
which
of
the
following
represents
the
most
likely
pathway
\
\
vesicle–plasma
membrane
\n
A:
Let's
think
step
by
step.
Protein
synthesis
starts
\
\
that
a
secretory
protein
takes
as
it
is
synthesized
in
a
cell?
\n
(A)
Plasma
\
\
at
the
ribosome,
so
we
can
eliminate
(A)
and
(C).
The
ribosome
is
often
in
the
\
\
membrane
\u2013
Golgi
apparatus
\u2013
ribosome
\u2013
secretory
vesicle
\u2013
rough
\
\
endoplasmic
reticulum
and
moves
from
there
to
the
Golgi
apparatus,
where
it
is
\
\
ER
(B)
Ribosome
\u2013
Golgi
apparatus
\u2013
rough
ER
\u2013
secretory
vesicle
\u2013\
\
modified
and
packaged
into
a
vesicle.
The
vesicle
then
floats
to
the
plasma
membrane
\
plasma
membrane
(C)
Plasma
membrane
\u2013
Golgi
apparatus
\u2013
ribosome
\u2013\
\
and
is
secreted.
The
answer
is
(D).
\n\n
Q:
A
mutation
in
a
bacterial
enzyme
changed
\
secretory
vesicle
\u2013
rough
ER
(D)
Ribosome
\u2013
rough
ER
\u2013
Golgi
apparatus
\u2013\
\
a
previously
polar
amino
acid
into
a
nonpolar
amino
acid.
This
amino
acid
was
\
secretory
vesicle
\u2013
plasma
membrane"
\
located
at
a
site
distant
from
the
enzyme’s
active
site.
How
might
this
mutation
\
target
:
Let's think step by step. Protein synthesis starts at the ribosome, so
\
alter
the
enzyme’s
substrate
specificity?
\n
(A)
By
changing
the
enzyme’s
pH
optimum
\
we can eliminate (A) and (C). The ribosome is often in the endoplasmic reticulum
\
(B)
By
changing
the
enzyme’s
location
in
the
cell
(C)
By
changing
the
shape
of
\
and moves from there to the Golgi apparatus, where it is modified and packaged
\
the
protein
(D)
An
amino
acid
change
away
from
the
active
site
cannot
alter
the
\
into a vesicle. The vesicle then floats to the plasma membrane and is secreted.
\
enzyme’s
substrate
specificity.
\n
A:
Let's
think
step
by
step.
A
change
in
an
amino
\
The answer is (D).
\
acid
leads
to
a
change
in
the
primary
structure
of
the
protein.
A
change
in
the
\
-
question
:
"
A
mutation
in
a
bacterial
enzyme
changed
a
previously
polar
amino
acid
\
\
primary
structure
may
lead
to
a
change
in
the
secondary
and
the
tertiary
structure
\
\
into
a
nonpolar
amino
acid.
This
amino
acid
was
located
at
a
site
distant
\
\
of
the
protein.
A
change
in
the
tertiary
structure
means
a
change
in
the
shape
\
\
from
the
enzyme
\u2019
s
active
site.
How
might
this
mutation
alter
the
enzyme
\u2019\
\
of
the
protein,
so
(C)
has
to
be
correct.
Since
the
change
does
not
affect
the
\
s
substrate
specificity?
\n
(A)
By
changing
the
enzyme
\u2019
s
pH
optimum
(B)
By
\
\
active
site
of
the
enzyme,
we
do
not
expect
the
activity
of
the
enzyme
to
be
affected.
\
\
changing
the
enzyme
\u2019
s
location
in
the
cell
(C)
By
changing
the
shape
\
\
The
answer
is
(C).
\n\n
Q:
Which
of
the
following
is
not
a
way
to
form
recombinant
\
\
of
the
protein
(D)
An
amino
acid
change
away
from
the
active
site
cannot
alter
\
\
DNA?
\n
(A)
Translation
(B)
Conjugation
(C)
Specialized
transduction
(D)
Transformation
\n\
\
the
enzyme
\u2019
s
substrate
specificity."
A:
Let's
think
step
by
step.
The
introduction
of
foreign
DNA
or
RNA
into
bacteria
\
target
:
Let's think step by step. A change in an amino acid leads to a change
\
or
eukaryotic
cells
is
a
common
technique
in
molecular
biology
and
scientific
\
in the primary structure of the protein. A change in the primary structure may
\
research.
There
are
multiple
ways
foreign
DNA
can
be
introduced
into
cells
including
\
lead to a change in the secondary and the tertiary structure of the protein.
\
transformation,
transduction,
conjugation,
and
transfection.
In
contrast,
(A)
\
A change in the tertiary structure means a change in the shape of the protein,
\
is
not
a
way
to
form
DNA:
during
translation
the
ribosomes
synthesize
proteins
\
so (C) has to be correct. Since the change does not affect the active site of
\
from
RNA.
The
answer
is
(A).
\n\n
Q:
Homologous
structures
are
often
cited
as
evidence
\
the enzyme, we do not expect the activity of the enzyme to be affected. The
\
for
the
process
of
natural
selection.
All
of
the
following
are
examples
of
homologous
\
answer is (C).
\
structures
EXCEPT
\n
(A)
the
wings
of
a
bird
and
the
wings
of
a
bat
(B)
the
flippers
\
-
question
:
'
Which
of
the
following
is
not
a
way
to
form
recombinant
DNA?
\
of
a
whale
and
the
arms
of
a
man
(C)
the
pectoral
fins
of
a
porpoise
and
the
flippers
\
\
of
a
seal
(D)
the
forelegs
of
an
insect
and
the
forelimbs
of
a
dog
\n
A:
Let's
think
\
(A)
Translation
(B)
Conjugation
(C)
Specialized
transduction
(D)
Transformation'
\
step
by
step.
Homologous
structures
are
similar
physical
features
in
organisms
\
target
:
'
Let'
'
s
think
step
by
step.
The
introduction
of
foreign
DNA
or
RNA
into
\
that
share
a
common
ancestor
but
different
functions.
Comparisons
(B)
and
(C)
\
bacteria
or
eukaryotic
cells
is
a
common
technique
in
molecular
biology
and
\
are
clearly
homologous
because
they
share
a
common
ancestor
and
the
structures
\
scientific
research.
There
are
multiple
ways
foreign
DNA
can
be
introduced
into
\
serve
different
purposes.
Bat
wings
and
bird
wings
are
also
homologous,
while
\
cells
including
transformation,
transduction,
conjugation,
and
transfection.
\
they
are
both
wings,
the
forelimbs
serve
different
purposes.
Insects
and
dogs
\
In
contrast,
(A)
is
not
a
way
to
form
DNA:
during
translation
the
ribosomes
\
are
very
far
ancestors
since
one
is
vertebrate
while
the
other
is
invertebrate
\
synthesize
proteins
from
RNA.
The
answer
is
(A).'
\
and
the
forelimbs
serve
the
same
purpose,
so
they
are
not
homologous.
The
answer
\
-
question
:
'
Homologous
structures
are
often
cited
as
evidence
for
the
process
of
natural
\
is
(D).
\n\n
Q:
Which
of
the
following
is
not
known
to
be
involved
in
the
control
\
selection.
All
of
the
following
are
examples
of
homologous
structures
EXCEPT
\
of
cell
division?
\n
(A)
Cyclins
(B)
Protein
kinases
(C)
Checkpoints
(D)
Fibroblast
\
\
cells
\n
A:
Let's
think
step
by
step.
Normal
cells
move
through
the
cell
cycle
in
\
(A)
the
wings
of
a
bird
and
the
wings
of
a
bat
(B)
the
flippers
of
a
whale
and
\
a
regulated
way.
At
the
checkpoint
stage,
they
use
information
about
their
own
\
the
arms
of
a
man
(C)
the
pectoral
fins
of
a
porpoise
and
the
flippers
of
a
\
internal
state
and
cues
from
the
environment
around
them
to
decide
whether
to
\
seal
(D)
the
forelegs
of
an
insect
and
the
forelimbs
of
a
dog'
\
proceed
with
cell
division.
Cues
like
these
act
by
changing
the
activity
of
core
\
target
:
"
Let's
think
step
by
step.
\u200B\u200B
Homologous
structures
are
similar
\
\
cell
cycle
regulators
inside
the
cell.
The
most
common
regulators
are
cyclins
\
\
physical
features
in
organisms
that
share
a
common
ancestor
\u200B\u200B
but
\
\
and
cyclin-dependent
kinases.
Fibroblast
cells
do
not
play
any
role
in
cell
division.
\
\
different
functions.
Comparisons
(B)
and
(C)
are
clearly
homologous
because
\
\
The
answer
is
(D).
\n\n
"
\
they
share
a
common
ancestor
and
the
structures
serve
different
purposes.
\
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
\
Bat
wings
and
bird
wings
are
also
homologous,
while
they
are
both
wings,
the
\
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
\
forelimbs
serve
different
purposes.
Insects
and
dogs
are
very
far
ancestors
\
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_biology"
\
since
one
is
vertebrate
while
the
other
is
invertebrate
and
the
forelimbs
\
\
serve
the
same
purpose,
so
they
are
not
homologous.
The
answer
is
(D)."
-
question
:
'
Which
of
the
following
is
not
known
to
be
involved
in
the
control
of
cell
division?
(A)
Cyclins
(B)
Protein
kinases
(C)
Checkpoints
(D)
Fibroblast
cells'
target
:
'
Let'
'
s
think
step
by
step.
Normal
cells
move
through
the
cell
cycle
in
a
regulated
way.
At
the
checkpoint
stage,
they
use
information
about
their
own
internal
state
and
cues
from
the
environment
around
them
to
decide
whether
to
proceed
with
cell
division.
Cues
like
these
act
by
changing
the
activity
of
core
cell
cycle
regulators
inside
the
cell.
The
most
common
regulators
are
cyclins
and
cyclin-dependent
kinases.
Fibroblast
cells
do
not
play
any
role
in
cell
division.
The
answer
is
(D).'
group
:
mmlu_flan_cot_fewshot_stem
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_high_school_biology
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_chemistry.yaml
"
dataset_name"
:
"
high_school_chemistry"
dataset_name
:
high_school_chemistry
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
description
:
The following are multiple choice questions (with answers) about high
\
school
chemistry.
\n\n
Q:
Which
of
the
following
is
considered
an
acid
anhydride?
\n\
school chemistry.
(A)
HCl
(B)
H2SO3
(C)
SO2
(D)
Al(NO3)3
\n
A:
Let's
think
step
by
step.
An
acid
anhydride
\
fewshot_config
:
\
is
a
compound
that
is
derived
by
removing
water
from
an
acid.
The
chemical
formula
\
sampler
:
first_n
\
for
water
is
H2O,
which
means
that
we
need
to
determine
which
of
these
options,
\
samples
:
\
when
combined
with
H2O,
forms
an
acid.
SO2,
or
Sulfur
dioxide,
when
combined
with
\
-
question
:
'
Which
of
the
following
is
considered
an
acid
anhydride?
\
H2O,
makes
H2SO3,
or
sulfurous
acid.
The
answer
is
(C).
\n\n
Q:
Which
of
the
following
\
\
is
expected
to
be
a
polar
molecule?
\n
(A)
PCl4F
(B)
BF3
(C)
CO2
(D)
Si(CH3)4
\n\
(A)
HCl
(B)
H2SO3
(C)
SO2
(D)
Al(NO3)3'
A:
Let's
think
step
by
step.
A
polar
molecule
is
one
that
has
a
slightly
positive
\
target
:
Let's think step by step. An acid anhydride is a compound that is derived
\
charge
on
one
end
of
the
molecule
and
a
slightly
negative
charge
on
the
other
\
by removing water from an acid. The chemical formula for water is H2O, which
\
end.
Boron
trifluoride
(BF3)
has
Boron
as
the
center
atom
and
three
fluorine
atoms
\
means that we need to determine which of these options, when combined with H2O,
\
attached
to
it;
it
is
trigonal
planar
and
symmetric,
so
it
is
nonpolar.
Carbon
\
forms an acid. SO2, or Sulfur dioxide, when combined with H2O, makes H2SO3,
\
Dioxide
(CO2)
has
Carbon
as
the
central
atom
with
double
bonds
to
two
Oxygen
atoms
\
or sulfurous acid. The answer is (C).
\
-
this
is
also
symmetrical
and
therefore
nonpolar.
The
same
is
the
case
for
tetramethyl
\
-
question
:
'
Which
of
the
following
is
expected
to
be
a
polar
molecule?
\
silane
(Si(CH3)4),
which
is
a
Silicon
atom
surrounded
by
four
methyl
groups.
The
\
\
structure
of
PCl4F
is
that
Phosphorus
is
the
central
atom,
attached
to
four
chlorines
\
(A)
PCl4F
(B)
BF3
(C)
CO2
(D)
Si(CH3)4'
\
and
one
fluorine
atom.
This
is
asymmetrical,
and
therefore
has
a
net
dipole
and
\
target
:
Let's think step by step. A polar molecule is one that has a slightly
\
is
expected
to
be
a
polar
molecule.
The
answer
is
(A).
\n\n
Q:
From
the
solubility
\
positive charge on one end of the molecule and a slightly negative charge on
\
rules,
which
of
the
following
is
true?
\n
(A)
All
chlorides,
bromides,
and
iodides
\
the other end. Boron trifluoride (BF3) has Boron as the center atom and three
\
are
soluble
(B)
All
sulfates
are
soluble
(C)
All
hydroxides
are
soluble
(D)
All
\
fluorine atoms attached to it; it is trigonal planar and symmetric, so it is
\
ammonium-containing
compounds
are
soluble
\n
A:
Let's
think
step
by
step.
The
chlorides,
\
nonpolar. Carbon Dioxide (CO2) has Carbon as the central atom with double bonds
\
bromides,
and
iodides
of
lead,
silver,
and
mercury
are
not
soluble
in
water.
This
\
to two Oxygen atoms - this is also symmetrical and therefore nonpolar. The same
\
rules
out
(A).
The
sulfates
of
lead,
barium,
and
calcium
are
not
soluble
in
water,
\
is the case for tetramethylsilane (Si(CH3)4), which is a Silicon atom surrounded
\
which
rules
out
(B).
The
hydroxides
of
any
metal
besides
sodium,
potassium,
ammonium,
\
by four methyl groups. The structure of PCl4F is that Phosphorus is the central
\
calcium,
and
barium
are
insoluble.
This
rules
out
(C).
Typically
ammonium
ions
\
atom, attached to four chlorines and one fluorine atom. This is asymmetrical,
\
indicate
a
soluble
ionic
substance.
The
answer
is
(D).
\n\n
Q:
A
new
compound
is
\
and therefore has a net dipole and is expected to be a polar molecule. The answer
\
synthesized
and
found
to
be
a
monoprotic
acid
with
a
molar
mass
of
248
g/mol.
\
is (A).
\
When
0.0050
mol
of
this
acid
are
dissolved
in
0.500
L
of
water,
the
pH
is
measured
\
-
question
:
'
From
the
solubility
rules,
which
of
the
following
is
true?
\
as
3.89.
What
is
the
pKa
of
this
acid?
\n
(A)
3.89
(B)
7.78
(C)
5.78
(D)
2.33
\n\
A:
Let's
think
step
by
step.
Recall
that
$[A^{-}]
=
[H^{+}]$.
Here,
this
is
equal
to
\
(A)
All
chlorides,
bromides,
and
iodides
are
soluble
(B)
All
sulfates
are
soluble
\
$10^{-3.89}$.
Then
we
have
$K_{a}
=
\\frac{[H^{+}][A^{-}]}{[HA]}
=
\\frac{10^{-3.89}
\
(C)
All
hydroxides
are
soluble
(D)
All
ammonium-containing
compounds
are
soluble'
\ \\
cdot
10^{-3.89}}{10^{-2}}$.
The
resulting
exponent
is
$-3.89
+
(-3.89)
-
(-2)
\
target
:
Let's think step by step. The chlorides, bromides, and iodides of lead,
\
=
-5.78$,
therefore
$K_a
=
10^{-5.78}$.
The
$pK_a$
is
the
negative
log
of
$K_a$,
\
silver, and mercury are not soluble in water. This rules out (A). The sulfates
\
which
is
equal
to
$5.78$.
The
answer
is
(C).
\n\n
Q:
A
solution
contains
2.00
mole
\
of lead, barium, and calcium are not soluble in water, which rules out (B).
\
of
acetic
acid,
CH3COOH,
and
1.00
mole
of
calcium
acetate,
Ca(CH3COO)2.
The
solution
\
The hydroxides of any metal besides sodium, potassium, ammonium, calcium, and
\
is
able
to
resist
the
addition
of
a
small
amount
of
strong
acid
or
strong
base
\
barium are insoluble. This rules out (C). Typically ammonium ions indicate a
\
with
only
minor
changes
in
the
pH
of
the
solution.
Larger
quantities
of
strong
\
soluble ionic substance. The answer is (D).
\
acid
or
strong
base
can
cause
a
significant
change
in
pH.
How
many
moles
of
nitric
\
-
question
:
'
A
new
compound
is
synthesized
and
found
to
be
a
monoprotic
acid
with
a
\
acid,
HNO3,
may
be
added
before
the
pH
begins
to
change
significantly?
\n
(A)
0.500
\
molar
mass
of
248
g/mol.
When
0.0050
mol
of
this
acid
are
dissolved
in
0.500
\
mole
(B)
1.00
mole
(C)
2.00
mole
(D)
3.00
mole
\n
A:
Let's
think
step
by
step.
We
\
L
of
water,
the
pH
is
measured
as
3.89.
What
is
the
pKa
of
this
acid?
\
would
like
to
compute
the
buffer
capacity
of
this
solution.
First
we
write
the
\
\
equation
for
the
ionization
of
the
weak
acid,
in
this
case
of
acetic
acid.
$CH_{3}COOH
\
(A)
3.89
(B)
7.78
(C)
5.78
(D)
2.33'
\
(aq)
+
H_{2}O
\\rightarrow
H_{3}O^{+}
+
CH_{3}COO^{-}$.
The
conjugate
base
is
therefore
\
target
:
"
Let's
think
step
by
step.
Recall
that
$[A^{-}]
=
[H^{+}]$.
Here,
this
is
\
\
the
acetate
ion.
The
added
strong
acid,
Nitric
acid,
will
react
with
the
conjugate
\
\
equal
to
$10^{-3.89}$.
Then
we
have
$K_{a}
=
\\frac{[H^{+}][A^{-}]}{[HA]}
\
\
base.
Therefore
the
maximum
amount
of
acid
that
can
be
added
will
be
equal
to
\
\
=
\\frac{10^{-3.89}
\\
cdot
10^{-3.89}}{10^{-2}}$.
The
resulting
exponent
is
\
\
the
amount
of
acetate
ion,
or
2
moles.
The
answer
is
(C).
\n\n
"
\
$-3.89
+
(-3.89)
-
(-2)
=
-5.78$,
therefore
$K_a
=
10^{-5.78}$.
The
$pK_a$
\
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
\
is
the
negative
log
of
$K_a$,
which
is
equal
to
$5.78$.
The
answer
is
(C)."
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
-
question
:
'
A
solution
contains
2.00
mole
of
acetic
acid,
CH3COOH,
and
1.00
mole
of
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_chemistry"
calcium
acetate,
Ca(CH3COO)2.
The
solution
is
able
to
resist
the
addition
of
a
small
amount
of
strong
acid
or
strong
base
with
only
minor
changes
in
the
pH
of
the
solution.
Larger
quantities
of
strong
acid
or
strong
base
can
cause
a
significant
change
in
pH.
How
many
moles
of
nitric
acid,
HNO3,
may
be
added
before
the
pH
begins
to
change
significantly?
(A)
0.500
mole
(B)
1.00
mole
(C)
2.00
mole
(D)
3.00
mole'
target
:
"
Let's
think
step
by
step.
We
would
like
to
compute
the
buffer
capacity
\
\
of
this
solution.
First
we
write
the
equation
for
the
ionization
of
the
weak
\
\
acid,
in
this
case
of
acetic
acid.
$CH_{3}COOH
(aq)
+
H_{2}O
\\rightarrow
H_{3}O^{+}
\
\
+
CH_{3}COO^{-}$.
The
conjugate
base
is
therefore
the
acetate
ion.
The
added
\
\
strong
acid,
Nitric
acid,
will
react
with
the
conjugate
base.
Therefore
the
\
\
maximum
amount
of
acid
that
can
be
added
will
be
equal
to
the
amount
of
acetate
\
\
ion,
or
2
moles.
The
answer
is
(C).
\n\n
"
group
:
mmlu_flan_cot_fewshot_stem
include
:
_mmlu_flan_cot_fewshot_template_yaml
task
:
mmlu_flan_cot_fewshot_high_school_chemistry
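The two quantitative targets in this file (the pKa of the monoprotic acid and the nitric acid capacity of the acetate buffer) can be checked numerically. The sketch below is an illustrative calculation, not part of the task file; the variable names are assumptions, and it follows the target's simplification of treating [HA] as the initial 0.010 M.

```python
import math

# pKa example: 0.0050 mol of a monoprotic acid HA in 0.500 L, measured pH 3.89.
moles_acid = 0.0050
volume_l = 0.500
ph = 3.89

h_conc = 10 ** (-ph)               # [H+], and [A-] = [H+] from the dissociation
ha_conc = moles_acid / volume_l    # approximates [HA] as the initial 0.010 M
ka = h_conc * h_conc / ha_conc     # Ka = [H+][A-] / [HA]
pka = -math.log10(ka)              # exponent: -3.89 + (-3.89) - (-2) = -5.78
print(round(pka, 2))

# Buffer example: each mole of Ca(CH3COO)2 supplies two moles of acetate,
# and added HNO3 is consumed mole-for-mole by the acetate ion.
moles_calcium_acetate = 1.00
moles_acetate = 2 * moles_calcium_acetate
print(moles_acetate)
```

Both results match the keyed answers: a pKa of about 5.78 for option (C) in the first question and 2.00 mole of HNO3 for option (C) in the second.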
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_computer_science.yaml
"
dataset_name"
:
"
high_school_computer_science"
dataset_name
:
high_school_computer_science
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
description
:
The following are multiple choice questions (with answers) about high
\
school
computer
science.
\n\n
Q:
Which
of
the
following
is
an
example
of
the
use
\
school computer science.
\
of
a
device
on
the
Internet
of
Things
(IoT)
?
\n
(A)
A
car
alerts
a
driver
that
\
fewshot_config
:
\
it
is
about
to
hit
an
object.
(B)
A
hiker
uses
a
GPS
watch
to
keep
track
of
\
sampler
:
first_n
\
her
position.
(C)
A
refrigerator
orders
milk
from
an
online
delivery
service
when
\
samples
:
\
the
milk
in
the
refrigerator
is
almost
gone.
(D)
A
runner
uses
a
watch
with
optical
\
-
question
:
'
Which
of
the
following
is
an
example
of
the
use
of
a
device
on
the
Internet
\
sensors
to
monitor
his
heart
rate.
\n
A:
Let's
think
step
by
step.
The
term
Internet
\
of
Things
(IoT)
?
\
of
Things
(IoT)
refers
to
common
devices
which
are
connected
to
the
internet,
\
\
enabling
new
functionality.
Choice
A
is
incorrect
because
it
does
not
describe
\
(A)
A
car
alerts
a
driver
that
it
is
about
to
hit
an
object.
(B)
A
hiker
uses
\
an
internet
connected
device.
In
choice
B,
the
watch
is
only
described
as
having
\
a
GPS
watch
to
keep
track
of
her
position.
(C)
A
refrigerator
orders
milk
\
GPS
functionality
but
no
internet
connectivity.
Choice
C
describes
a
common
device
\
from
an
online
delivery
service
when
the
milk
in
the
refrigerator
is
almost
\
(a
refrigerator)
which
has
internet
connectivity
enabling
new
functionality
(online
\
gone.
(D)
A
runner
uses
a
watch
with
optical
sensors
to
monitor
his
heart
rate.'
\
ordering).
Choice
D
does
not
mention
internet
connectivity
for
the
watch,
only
\
target
:
Let's think step by step. The term Internet of Things (IoT) refers to
\
optical
sensors.
The
answer
is
(C).
\n\n
Q:
Many
Web
browsers
allow
users
to
open
\
common devices which are connected to the internet, enabling new functionality.
\
anonymous
windows.
During
a
browsing
session
in
an
anonymous
window,
the
browser
\
Choice A is incorrect because it does not describe an internet connected device.
\
does
not
record
a
browsing
history
or
a
list
of
downloaded
files.
When
the
anonymous
\
In choice B, the watch is only described as having GPS functionality but no
\
window
is
exited,
cookies
created
during
the
session
are
deleted.
Which
of
the
\
internet connectivity. Choice C describes a common device (a refrigerator) which
\
following
statements
about
browsing
sessions
in
an
anonymous
window
is
true?
\n\
has internet connectivity enabling new functionality (online ordering). Choice
(A)
The
activities
of
a
user
browsing
in
an
anonymous
window
will
not
be
visible
\
D does not mention internet connectivity for the watch, only optical sensors.
\
to
people
who
monitor
the
user's
network,
such
as
the
system
administrator.
(B)
\
The answer is (C).
\
Items
placed
in
a
Web
store's
shopping
cart
for
future
purchase
during
the
anonymous
\
- question: 'Many Web browsers allow users to open anonymous windows. During a browsing
\ browsing session will not be saved on the user's computer. (C) A user will not\
session in an anonymous window, the browser does not record a browsing history
\ be able to log in to e-mail or social media accounts during the anonymous browsing\
or a list of downloaded files. When the anonymous window is exited, cookies
\ session. (D) A user browsing in an anonymous window will be protected from viruses\
created during the session are deleted. Which of the following statements about
\ launched from any web sites visited or files downloaded.\nA: Let's think step\
browsing sessions in an anonymous window is true?
\ by step. Choice A is incorrect as it only describes network traffic, which an\
\ anonymous browser does not change. Choice B is correct as it correctly describes\
(A) The activities of a user browsing in an anonymous window will not be visible
\ how an anonymous browser will prevent saving data on the user’s computer after\
to people who monitor the user''s network, such as the system administrator.
\ the session is ended. Choice C is incorrect because an anonymous browser will\
(B) Items placed in a Web store''s shopping cart for future purchase during
\ not prevent logging in to email or social media accounts. Choice D is incorrect\
the anonymous browsing session will not be saved on the user''s computer. (C)
\ because an anonymous browser in itself performs no virus protection. The answer\
A user will not be able to log in to e-mail or social media accounts during
\ is (B).\n\nQ: In the program below, the initial value of X is 5 and the initial\
the anonymous browsing session. (D) A user browsing in an anonymous window will
\ value of Y is 10.\nIF (X < 0){\n DISPLAY (\"Foxtrot\")\n} ELSE {\n IF (X > Y){\n\
be protected from viruses launched from any web sites visited or files downloaded.'
\ DISPLAY (\"Hotel\")\n } ELSE {\n IF (Y > 0){\n DISPLAY (\"November\")\n }\
target: "Let's think step by step. Choice A is incorrect as it only describes\
\ ELSE {\n DISPLAY (\"Yankee\")\n }\n }\n}\nWhat is displayed as a result of\
\ network traffic, which an anonymous browser does not change. Choice B is correct\
\ running the program?\n(A) Foxtrot (B) Hotel (C) November (D) Yankee\nA: Let's\
\ as it correctly describes how an anonymous browser will prevent saving data\
\ think step by step. Because X has the value 5, the first conditional IF (X < 0)\
\ on the user\u2019s computer after the session is ended. Choice C is incorrect\
\ is false, so we move to the first ELSE clause. Because X is 5 and Y is 10, the\
\ because an anonymous browser will not prevent logging in to email or social\
\ second conditional IF (X > Y) is false, so we move to the following ELSE clause.\
\ media accounts. Choice D is incorrect because an anonymous browser in itself\
\ Since Y is 10, the conditional IF (Y > 0) is true, so the command DISPLAY (\"\
\ performs no virus protection. The answer is (B)."
November\") is executed. The answer is (C).\n\nQ: What is the output of \"abc\"\
- question: "In the program below, the initial value of X is 5 and the initial value\
[::-1] in Python 3?\n(A) Error (B) abc (C) cba (D) c\nA: Let's think step by step.\
\ of Y is 10.\nIF (X < 0){\n DISPLAY (\"Foxtrot\")\n} ELSE {\n IF (X > Y){\n\
\ We know that the slicing operator [::-1] takes all of the elements in the string\
\ DISPLAY (\"Hotel\")\n } ELSE {\n IF (Y > 0){\n DISPLAY (\"November\")\n\
\ in reverse order, so we reverse the order of the string \"abc\", resulting in\
\ } ELSE {\n DISPLAY (\"Yankee\")\n }\n }\n}\nWhat is displayed as a result\
\ \"cba\". The answer is (C).\n\nQ: A list of numbers has n elements, indexed from\
\ of running the program?\n(A) Foxtrot (B) Hotel (C) November (D) Yankee"
\ 1 to n. The following algorithm is intended to display the number of elements\
target: Let's think step by step. Because X has the value 5, the first conditional
\ in the list that have a value greater than 100. The algorithm uses the variables\
IF (X < 0) is false, so we move to the first ELSE clause. Because X is 5 and
\ count and position. Steps 3 and 4 are missing.\n Step 1: Set count to 0 and position\
Y is 10, the second conditional IF (X > Y) is false, so we move to the following
\ to 1.\n Step 2: If the value of the element at index position is greater than\
ELSE clause. Since Y is 10, the conditional IF (Y > 0) is true, so the command
\ 100, increase the value of count by 1.\n Step 3: (missing step)\n Step 4: (missing\
DISPLAY ("November") is executed. The answer is (C).
\ step)\n Step 5: Display the value of count.\nWhich of the following could be used\
- question: 'What is the output of "abc"[::-1] in Python 3?
\ to replace steps 3 and 4 so that the algorithm works as intended?\n(A) Step 3:\
\ Increase the value of position by 1.\n Step 4: Repeat steps 2 and 3 until the\
(A) Error (B) abc (C) cba (D) c'
\ value of count is greater than 100.\n(B) Step 3: Increase the value of position\
target: Let's think step by step. We know that the slicing operator [::-1] takes
\ by 1.\n Step 4: Repeat steps 2 and 3 until the value of position is greater than\
all of the elements in the string in reverse order, so we reverse the order
\ n.\n(C) Step 3: Repeat step 2 until the value of count is greater than 100.\n\
of the string "abc", resulting in "cba". The answer is (C).
\ Step 4: Increase the value of position by 1.\n(D) Step 3: Repeat step 2 until\
- question: "A list of numbers has n elements, indexed from 1 to n. The following algorithm\
\ the value of position is greater than n.\n Step 4: Increase the value of count\
\ is intended to display the number of elements in the list that have a value\
\ by 1.\nA: Let's think step by step. Choice A is incorrect, because its Step 4\
\ greater than 100. The algorithm uses the variables count and position. Steps\
\ has an incorrect termination condition, stopping when count is greater than 100.\
\ 3 and 4 are missing.\n Step 1: Set count to 0 and position to 1.\n Step 2:\
\ We need to stop after inspecting all elements in the list. Choice B is correct\
\ If the value of the element at index position is greater than 100, increase\
\ because it correctly increments both count and position, and correctly repeats\
\ the value of count by 1.\n Step 3: (missing step)\n Step 4: (missing step)\n\
\ these steps and terminates when all elements in the list have been inspected.\
\ Step 5: Display the value of count.\nWhich of the following could be used\
\ Choice C is incorrect because it incorrectly increments the variable count until\
\ to replace steps 3 and 4 so that the algorithm works as intended?\n(A) Step\
\ its value is greater than 100, regardless of the elements in the list. Choice\
\ 3: Increase the value of position by 1.\n Step 4: Repeat steps 2 and 3 until\
\ D is incorrect because its step 3 does not increment the value of position, so\
\ the value of count is greater than 100.\n(B) Step 3: Increase the value of\
\ it will repeat forever. The answer is (B).\n\n"
\ position by 1.\n Step 4: Repeat steps 2 and 3 until the value of position\
"group": "mmlu_flan_cot_fewshot_stem"
\ is greater than n.\n(C) Step 3: Repeat step 2 until the value of count is\
"include": "_mmlu_flan_cot_fewshot_template_yaml"
\ greater than 100.\n Step 4: Increase the value of position by 1.\n(D) Step\
"task": "mmlu_flan_cot_fewshot_high_school_computer_science"
\ 3: Repeat step 2 until the value of position is greater than n.\n Step 4:\
\ Increase the value of count by 1."
target: 'Let''s think step by step. Choice A is incorrect, because its Step 4
has an incorrect termination condition, stopping when count is greater than 100.
We need to stop after inspecting all elements in the list. Choice B is correct
because it correctly increments both count and position, and correctly repeats
these steps and terminates when all elements in the list have been inspected.
Choice C is incorrect because it incorrectly increments the variable count until
its value is greater than 100, regardless of the elements in the list. Choice
D is incorrect because its step 3 does not increment the value of position, so
it will repeat forever. The answer is (B).'
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_high_school_computer_science