gaoqiong / lm-evaluation-harness
Commit 772d6f6f, authored Jun 14, 2024 by Yu Shi Jie
added cot fewshot for mmlu-pro
parent fbeaa2c1
Showing 20 changed files with 600 additions and 4 deletions
lm_eval/tasks/mmlu_pro/default/mmlu_pro_math.yaml  +4 -4
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/_cot_prompts.json  +1 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/_mmlu_pro.yaml  +6 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/_mmlu_pro_flan_cot_fewshot_template_yaml  +31 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_anatomy.yaml  +75 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_astronomy.yaml  +70 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_biology.yaml  +23 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_business.yaml  +23 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_chemistry.yaml  +23 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_computer_science.yaml  +63 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_economics.yaml  +36 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_engineering.yaml  +23 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_health.yaml  +23 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_history.yaml  +45 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_law.yaml  +23 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_math.yaml  +33 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_other.yaml  +23 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_philosophy.yaml  +29 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_physics.yaml  +23 -0
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_psychology.yaml  +23 -0
lm_eval/tasks/mmlu_pro/default/mmlu_pro_math.yaml
-"dataset_name": "chemistry"
-"description": "The following are multiple choice questions (with answers) about chemistry.\n\
+"dataset_name": "math"
+"description": "The following are multiple choice questions (with answers) about math.\n\
   \n"
 "group": "mmlu_pro_stem"
 "group_alias": "stem"
 "include": "_default_template_yaml"
-"task": "mmlu_pro_chemistry"
-"task_alias": "chemistry"
+"task": "mmlu_pro_math"
+"task_alias": "math"
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/_cot_prompts.json
0 → 100644
This diff is collapsed.
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/_mmlu_pro.yaml
0 → 100644
group: mmlu_pro_flan_cot_fewshot
task:
  - mmlu_pro_flan_cot_fewshot_stem
  - mmlu_pro_flan_cot_fewshot_other
  - mmlu_pro_flan_cot_fewshot_social_sciences
  - mmlu_pro_flan_cot_fewshot_humanities
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/_mmlu_pro_flan_cot_fewshot_template_yaml
0 → 100644
dataset_path: sjyuxyz/MMLU-Pro-with-subset
validation_split: validation
test_split: test
fewshot_split: validation
fewshot_config:
  sampler: first_n
output_type: generate_until
doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}} (E) {{choices[4]}} (F) {{choices[5]}} (G) {{choices[6]}} (H) {{choices[7]}} (I) {{choices[8]}} (J) {{choices[9]}}\nA: Let's think step by step."
doc_to_target: "{{['(A)', '(B)', '(C)', '(D)', '(E)', '(F)', '(G)', '(H)', '(I)', '(J)'][answer]}}"
filter_list:
  - name: "get-answer"
    filter:
      - function: "regex"
        regex_pattern: "(?<=answer is )([A-Ja-j])(?=.)"
      - function: "take_first"
generation_kwargs:
  until:
    - "</s>"
  do_sample: false
  temperature: 0.0
num_fewshot: 0
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
metadata:
  version: 1.0
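The `regex_pattern` in the `get-answer` filter above can be tried out in isolation. The following is a minimal sketch using only Python's standard `re` module, not the harness's own filter code; the helper name `extract_answer` and the two sample generations are hypothetical. It shows what the pattern captures when the letter is bare and when it is wrapped in parentheses, which is the form the few-shot targets in this commit use ("The answer is (B).").

```python
import re
from typing import Optional

# Pattern copied verbatim from the template's filter_list.
ANSWER_PATTERN = re.compile(r"(?<=answer is )([A-Ja-j])(?=.)")


def extract_answer(generation: str) -> Optional[str]:
    """Return the first captured answer letter, or None if nothing matches.

    Hypothetical helper for illustration; the harness applies the pattern
    through its own regex filter.
    """
    match = ANSWER_PATTERN.search(generation)
    return match.group(1) if match else None


# Hypothetical model outputs, one per answer style.
bare = extract_answer("Let's think step by step. The answer is B.")
parenthesised = extract_answer("Let's think step by step. The answer is (B).")

print("bare letter:", bare)
print("parenthesised letter:", parenthesised)
```

Because the character class sits directly after the lookbehind, a letter preceded by an opening parenthesis is not captured by this pattern on its own.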
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_anatomy.yaml
0 → 100644
dataset_name: anatomy
description: The following are multiple choice questions (with answers) about anatomy.
fewshot_config:
  sampler: first_n
  samples:
  - question: 'Which of the following is the body cavity that contains the pituitary
      gland? (A) Abdominal (B) Cranial (C) Pleural (D) Spinal'
    target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
      \ help. Let\u2019s solve this problem step by step. The pituitary gland is the\
      \ major endocrine gland attached to the base of the brain, and it is contained\
      \ in the Cranial cavity. The answer is (B)."
  - question: 'Which of these branches of the trigeminal nerve contain somatic motor
      processes? (A) The supraorbital nerve (B) The infraorbital nerve (C) The mental
      nerve (D) None of the above'
    target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
      \ help. Let\u2019s solve this problem step by step.\nWe know the following:\
      \ (A) The supraorbital nerve (also known as the frontal nerve) is the largest\
      \ branch of the ophthalmic nerve and a branch of the ophthalmic division of the\
      \ trigeminal nerve. (B) The infraorbital nerve is a branch of the maxillary\
      \ division of the trigeminal nerve. (C) The mental nerve is a branch of the\
      \ mandibular division of the trigeminal nerve. All these nerves are purely\
      \ sensory nerves and do not contain any somatic motor processes. Therefore,\
      \ the answer should be none of the above, which is (D). The answer is (D)."
  - question: 'In Angle''s Class II Div 2 occlusion there is (A) excess overbite of
      the upper lateral incisors. (B) negative overjet of the upper central incisors.
      (C) excess overjet of the upper lateral incisors. (D) excess overjet of the
      upper central incisors.'
    target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
      \ help. Let\u2019s solve this problem step by step. This is a question related\
      \ to anatomy and orthodontics. Excess overjet is associated with Class II occlusions;\
      \ therefore, we can safely eliminate (B) from the list, as negative overjet\
      \ is often associated with Class III occlusions. Now, we need to determine the\
      \ location of the excess overjet, and that would be the upper (maxillary) lateral\
      \ incisors. Only (C) has the correct information. The answer is (C)."
  - question: 'The pleura (A) have no sensory innervation. (B) are separated by a
      2 mm space. (C) extend into the neck. (D) are composed of respiratory epithelium.'
    target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
      \ help. Let\u2019s solve this problem step by step. First, recall that the pleura\
      \ refers to the thin layer of tissue that covers the lungs and lines the interior\
      \ wall of the chest cavity. Now, let\u2019s look at each option:\nOption (A):\
      \ \u201CThe pleura have no sensory innervation.\u201D This information is not\
      \ correct. The pleura do have a sensory innervation.\nOption (B): \u201CThe\
      \ pleura are separated by a 2 mm space.\u201D This information is not correct.\
      \ There is a very thin \u201Cpotential\u201D space between the layers of the\
      \ pleura; however, it is typically filled with serous pleural fluid.\nOption\
      \ (C): \u201CThe pleura extend into the neck.\u201D This information is actually\
      \ true. The cervical pleura, also known as the dome of the pleura, lines the\
      \ extension of the pleural cavity into the neck.\nOption (D): \u201CThe pleura\
      \ are composed of respiratory epithelium.\u201D This information is not correct.\
      \ The pleura are composed of connective tissue (CT).\nBecause (A), (B), and\
      \ (D) are all incorrect, (C) is the only correct answer. The answer is (C)."
  - question: 'What is the embryological origin of the hyoid bone? (A) The first pharyngeal
      arch (B) The first and second pharyngeal arches (C) The second pharyngeal arch
      (D) The second and third pharyngeal arches'
    target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
      \ help. Let\u2019s solve this problem step by step. The hyoid bone, which is\
      \ also known as the hyoid, is a small U-shaped bone located in the anterior\
      \ neck. In its resting position, it lies between the base of the mandible and\
      \ the third cervical vertebrae. We know that the second and the third pharyngeal\
      \ arches give rise to the horns of the hyoid bone; therefore, the embryological\
      \ origin of the hyoid bone are the second and the third pharyngeal arches\u2014\
      this information is covered in the last option (D). Therefore, we conclude that\
      \ (D) must be the correct answer. The answer is (D).\n\n"
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_anatomy
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_astronomy.yaml
0 → 100644
dataset_name: astronomy
description: The following are multiple choice questions (with answers) about astronomy.
fewshot_config:
  sampler: first_n
  samples:
  - question: 'Where do most short-period comets come from and how do we know? (A)
      The Kuiper belt; short period comets tend to be in the plane of the solar system
      just like the Kuiper belt. (B) The Kuiper belt; short period comets tend to
      come from random directions indicating a spherical distribution of comets called
      the Kuiper belt. (C) The asteroid belt; short period comets have orbital periods
      similar to asteroids like Vesta and are found in the plane of the solar system
      just like the asteroid belt. (D) The Oort cloud; short period comets tend to
      be in the plane of the solar system just like the Oort cloud.'
    target: Let's think step by step. Most short-period comets come from the Kuiper
      belt, and we know because short period comets tend to be in the plane of the
      solar system, just like the Kuiper belt is. The answer is (A).
  - question: 'You are pushing a truck along a road. Would it be easier to accelerate
      this truck on Mars? Why? (Assume there is no friction) (A) It would be harder
      since the truck is heavier on Mars. (B) It would be easier since the truck is
      lighter on Mars. (C) It would be harder since the truck is lighter on Mars.
      (D) It would be the same no matter where you are.'
    target: "Let's think step by step. If we assume that there is no friction, the\
      \ force needed to accelerate the truck is by Newton\u2019s second law only dependent\
      \ on the mass of the truck. Hence (A), (B) and (C) are incorrect since it doesn\u2019\
      t matter that it\u2019s on Mars, and (D) is the correct answer. The answer is\
      \ (D)."
  - question: 'Say the pupil of your eye has a diameter of 5 mm and you have a telescope
      with an aperture of 50 cm. How much more light can the telescope gather than
      your eye? (A) 10000 times more (B) 100 times more (C) 1000 times more (D) 10
      times more'
    target: Let's think step by step. The amount of light is proportional to the aperture
      area $A = \pi D^2/4$ for a lens with diameter $D$, so the relative amounts of
      light between the eye with diameter 5 mm and the telescope with diameter 50 cm
      is $(50 cm)^2/(5 mm)^2 = 10000$. The answer is (A).
  - question: 'Why isn''t there a planet where the asteroid belt is located? (A) A
      planet once formed here but it was broken apart by a catastrophic collision.
      (B) There was not enough material in this part of the solar nebula to form a
      planet. (C) There was too much rocky material to form a terrestrial planet but
      not enough gaseous material to form a jovian planet. (D) Resonance with Jupiter
      prevented material from collecting together to form a planet.'
    target: "Let's think step by step. The asteroid belt is a stellar disc consisting\
      \ of a large number of asteroids between Mars and Jupiter's orbits. The asteroids\
      \ in this belt are affected by the gravitational pull from both other asteroids\
      \ and nearby planets. Due to the strong gravitational force of Jupiter there\
      \ are resonances that give rise to low density regions of asteroids known as\
      \ the Kirkwood gap. So (B) and (C) are not correct since it\u2019s not a lack\
      \ of material that prevents a planet from being formed, and (A) is incorrect\
      \ because the Kirkwood gap would have prevented a planet from forming in the\
      \ first place, and (D) is the correct option. The answer is (D)."
  - question: 'Why is Mars red? (A) Because the surface is covered with heavily oxidized
      ("rusted") minerals. (B) Because the atmosphere scatters more light at bluer
      wavelengths transmitting mostly red light. (C) Because Mars is covered with
      ancient lava flows which are red in color. (D) Because flowing water on Mars''s
      surface altered the surface minerals several billion years ago.'
    target: 'Let''s think step by step. Option (B) is not correct because if the red
      color was caused by the scattering off the atmosphere, then the earth with a
      much thicker atmosphere would also look red. Options (C) and (D) are not specific
      enough about why the color of the surface would be red, while (A) is correct
      because it explains that the surface is red due to the rusted materials on the
      surface and the red color comes from the rust. So the correct option is (A).
      The answer is (A).'
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_astronomy
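The light-gathering sample in the astronomy file above relies on aperture area scaling with the square of the diameter. As a quick check of that arithmetic, here is a small sketch; the function name `light_gathering_ratio` is hypothetical, and both diameters are converted to millimetres (50 cm = 500 mm) before taking the ratio.

```python
def light_gathering_ratio(aperture_mm: float, pupil_mm: float) -> float:
    """Ratio of collecting areas for two circular apertures.

    Area is pi * D^2 / 4, so the pi / 4 factors cancel and only the
    squared diameter ratio remains.
    """
    return (aperture_mm / pupil_mm) ** 2


# Telescope aperture of 50 cm (500 mm) versus a 5 mm pupil.
ratio = light_gathering_ratio(500.0, 5.0)
print(ratio)
```

This agrees with option (A), 10000 times more, in the sample target.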
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_biology.yaml
0 → 100644
dataset_name: biology
description: The following are multiple choice questions (with answers) about biology.
fewshot_config:
  sampler: first_n
  samples:
  - question: "Which of the following represents an accurate statement concerning arthropods? (A) They possess an exoskeleton composed primarily of peptidoglycan. (B) They possess an open circulatory system with a dorsal heart. (C) They are members of a biologically unsuccessful phylum incapable of exploiting diverse habitats and nutrition sources. (D) They lack paired, jointed appendages. (E) N/A (F) N/A (G) N/A (H) N/A (I) N/A (J) N/A"
    target: "Let's think step by step. Peptidoglycan is known to comprise the cell wall of most bacteria, rather than the exoskeleton of arthropods, which is made of chitin, which rules out (A). The answer (C) is false because arthropods are a highly successful phylum. Likewise, arthropods have paired, jointed appendages, which rules out (D). The only remaining option is (B), as arthropods have an open circulatory system with a dorsal tubular heart. The answer is (B)."
  - question: "In a given population, 1 out of every 400 people has a cancer caused by a completely recessive allele, b. Assuming the population is in Hardy-Weinberg equilibrium, which of the following is the expected proportion of individuals who carry the b allele but are not expected to develop the cancer? (A) 19/400 (B) 1/400 (C) 40/400 (D) 38/400 (E) 2/400 (F) 1/200 (G) 20/400 (H) 50/400 (I) N/A (J) N/A"
    target: "Let's think step by step. According to the Hardy Weinberg Law, $p^2 + 2 p q + q^2 = 1$, and $p + q = 1$ where $p$ is the frequency of the dominant allele, $q$ is the frequency of the recessive allele, and $p^2$, $q^2$, and $2pq$ are the frequencies of dominant homozygous, recessive homozygous, and heterozygous individuals, respectively. The frequency of the recessive allele (q) is $\\sqrt{\\frac{1}{400}} = 0.05$. We have $p = 1 - q = 0.95$. The frequency of heterozygous individuals is $2pq = 2 \\cdot 0.05 \\cdot 0.95 = 0.095$. The number of heterozygous individuals is equal to the frequency of heterozygous individuals times the size of the population, or $0.095 * 400 = 38$. So we end up with 38/400. The answer is (D)."
  - question: "A mutation in a bacterial enzyme changed a previously polar amino acid into a nonpolar amino acid. This amino acid was located at a site distant from the enzyme’s active site. How might this mutation alter the enzyme’s substrate specificity? (A) By changing the enzyme’s pH optimum (B) By changing the enzyme's molecular weight (C) An amino acid change away from the active site increases the enzyme's substrate specificity. (D) By changing the shape of the protein (E) By changing the enzyme's temperature optimum (F) By altering the enzyme's ability to be denatured (G) By changing the enzyme’s location in the cell (H) By changing the enzyme's color (I) An amino acid change away from the active site cannot alter the enzyme’s substrate specificity. (J) By altering the enzyme's rate of reaction"
    target: "Let's think step by step. A change in an amino acid leads to a change in the primary structure of the protein. A change in the primary structure may lead to a change in the secondary and the tertiary structure of the protein. A change in the tertiary structure means a change in the shape of the protein, so (D) has to be correct. Since the change does not affect the active site of the enzyme, we do not expect the activity of the enzyme to be affected. The answer is (D)."
  - question: "Which of the following is not a way to form recombinant DNA? (A) Translation (B) Conjugation (C) Specialized transduction (D) Transformation (E) N/A (F) N/A (G) N/A (H) N/A (I) N/A (J) N/A"
    target: "Let's think step by step. The introduction of foreign DNA or RNA into bacteria or eukaryotic cells is a common technique in molecular biology and scientific research. There are multiple ways foreign DNA can be introduced into cells including transformation, transduction, conjugation, and transfection. In contrast, (A) is not a way to form DNA: during translation the ribosomes synthesize proteins from RNA. The answer is (A)."
  - question: "Which of the following is not known to be involved in the control of cell division? (A) Microtubules (B) Checkpoints (C) DNA polymerase (D) Centrosomes (E) Cyclins (F) Mitochondria (G) Protein kinases (H) Fibroblast cells (I) N/A (J) N/A"
    target: "Let's think step by step. Normal cells move through the cell cycle in a regulated way. At the checkpoint stage, they use information about their own internal state and cues from the environment around them to decide whether to proceed with cell division. Cues like these act by changing the activity of core cell cycle regulators inside the cell. The most common regulators are cyclins and cyclin-dependent kinases. Fibroblast cells do not play any role in cell division. The answer is (H)."
group: mmlu_pro_flan_cot_fewshot_stem
include: _mmlu_pro_flan_cot_fewshot_template_yaml
task: mmlu_pro_flan_cot_fewshot_biology
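The Hardy-Weinberg sample in the biology file above walks through a short calculation: q from the affected fraction, p from q, then the carrier frequency 2pq. The steps can be sketched in Python as follows; the variable names are illustrative and not part of the commit.

```python
import math

POPULATION = 400  # the sample states 1 affected person per 400

# Affected individuals are homozygous recessive, so q^2 = 1/400.
q = math.sqrt(1 / POPULATION)

# Allele frequencies sum to one.
p = 1 - q

# Heterozygous carriers have frequency 2pq.
carrier_frequency = 2 * p * q

# Express the carrier frequency as a count out of 400.
carriers_per_400 = carrier_frequency * POPULATION

print(f"q = {q:.2f}, p = {p:.2f}, 2pq = {carrier_frequency:.3f}")
print(f"carriers per {POPULATION}: {carriers_per_400:.0f}")
```

The result matches the 38/400 given in the target, option (D).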
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_business.yaml
0 → 100644
dataset_name: business
description: The following are multiple choice questions (with answers) about business.
fewshot_config:
  sampler: first_n
  samples:
  - question: "In contrast to _______, _______ aim to reward favourable behaviour by companies. The success of such campaigns have been heightened through the use of ___________, which allow campaigns to facilitate the company in achieving _________ . (A) Boycotts, Buyalls, Blockchain technology, Increased Sales (B) Buycotts, Boycotts, Digital technology, Decreased Sales (C) Boycotts, Buycotts, Digital technology, Decreased Sales (D) Buycotts, Boycotts, Blockchain technology, Charitable donations (E) Boycotts, Buyalls, Blockchain technology, Charitable donations (F) Boycotts, Buycotts, Digital technology, Increased Sales (G) Buycotts, Boycotts, Digital technology, Increased Sales (H) Boycotts, Buycotts, Physical technology, Increased Sales (I) Buycotts, Buyalls, Blockchain technology, Charitable donations (J) Boycotts, Buycotts, Blockchain technology, Decreased Sales"
    target: "Let's think step by step. We refer to Wikipedia articles on business ethics for help. The sentence that best uses the possible options above is _In contrast to boycotts, buycotts aim to reward favourable behavior by companies. The success of such campaigns have been heightened through the use of digital technology, which allow campaigns to facilitate the company in achieving increased sales._ The answer is (F)."
  - question: "_______ is the direct attempt to formally or informally manage ethical issues or problems, through specific policies, practices and programmes. (A) Operational management (B) Corporate governance (C) Environmental management (D) Business ethics management (E) Sustainability (F) Stakeholder management (G) Social marketing (H) Human resource management (I) N/A (J) N/A"
    target: "Let's think step by step. We refer to Wikipedia articles on business ethics for help. The direct attempt to manage ethical issues through specific policies, practices, and programs is business ethics management. The answer is (D)."
  - question: "How can organisational structures that are characterised by democratic and inclusive styles of management be described? (A) Flat (B) Bureaucratic (C) Autocratic (D) Hierarchical (E) Functional (F) Decentralized (G) Matrix (H) Network (I) Divisional (J) Centralized"
    target: "Let's think step by step. We refer to Wikipedia articles on management for help. Flat organizational structures are characterized by democratic and inclusive styles of management, and have few (if any) levels of management between the workers and managers. The answer is (A)."
  - question: "Although the content and quality can be as controlled as direct mail, response rates of this medium are lower because of the lack of a personal address mechanism. This media format is known as: (A) Online banners. (B) Television advertising. (C) Email marketing. (D) Care lines. (E) Direct mail. (F) Inserts. (G) Door to door. (H) Radio advertising. (I) Billboards. (J) Social media advertising."
    target: "Let's think step by step. We refer to Wikipedia articles on marketing for help. Door to door marketing delivers non-addressed items within all buildings within a geographic area. While it can control the content and quality as well as direct mail marketing, its response rate is lower because of the lack of a personal address mechanism. The answer is (G)."
  - question: "In an organization, the group of people tasked with buying decisions is referred to as the _______________. (A) Procurement centre. (B) Chief executive unit. (C) Resources allocation group. (D) Marketing department. (E) Purchasing department. (F) Supply chain management team. (G) Outsourcing unit. (H) Decision-making unit. (I) Operations unit. (J) Financial management team."
    target: "Let's think step by step. We refer to Wikipedia articles on marketing for help. In an organization, the group of people tasked with buying decisions is referred to as the decision-making unit. The answer is (H)."
group: mmlu_pro_flan_cot_fewshot_other
include: _mmlu_pro_flan_cot_fewshot_template_yaml
task: mmlu_pro_flan_cot_fewshot_business
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_chemistry.yaml
0 → 100644
dataset_name: chemistry
description: The following are multiple choice questions (with answers) about chemistry.
fewshot_config:
  sampler: first_n
  samples:
  - question: "Predict the number of lines in the EPR spectrum of a solution of 13C-labelled methyl radical (13CH3•), assuming the lines do not overlap. (A) 10 (B) 8 (C) 4 (D) 20 (E) 12 (F) 3 (G) 16 (H) 5 (I) 24 (J) 6"
    target: "Let's think step by step. The electron paramagnetic resonance spectrum will be split by two forms of interactions. The first is the hyperfine interaction with the 13C (nuclear spin $I = \\frac{1}{2}$) which will split the spectrum into 2 lines. This will be further split into 4 lines by the interaction with three equivalent 1H nuclei. The total number of lines is therefore $2 \\cdot 4 = 8$. The answer is (B)."
  - question: "Which of the following lists the hydrides of group-14 elements in order of thermal stability, from lowest to highest? (A) GeH4 < SnH4 < PbH4 < SiH4 < CH4 (B) SiH4 < GeH4 < SnH4 < PbH4 < CH4 (C) PbH4 < CH4 < SnH4 < GeH4 < SiH4 (D) PbH4 < SnH4 < CH4 < GeH4 < SiH4 (E) SnH4 < GeH4 < SiH4 < PbH4 < CH4 (F) CH4 < GeH4 < SnH4 < PbH4 < SiH4 (G) SiH4 < SnH4 < PbH4 < GeH4 < CH4 (H) CH4 < SiH4 < GeH4 < SnH4 < PbH4 (I) CH4 < PbH4 < GeH4 < SnH4 < SiH4 (J) PbH4 < SnH4 < GeH4 < SiH4 < CH4"
    target: "Let's think step by step. The thermal stability of group-14 hydrides decreases as we move from the top of group 14 to the bottom. The order of elements in the group from top to bottom is C, Si, Ge, Sn, Pb. Therefore in order of increasing thermal stability we have PbH4, SnH4, GeH4, SiH4, and CH4, or answer (J). The answer is (J)."
  - question: "Which of the following is considered an acid anhydride? (A) H2SO3 (B) NaCl (C) CH4 (D) HNO3 (E) SO2 (F) Al(NO3)3 (G) CaCO3 (H) H2O (I) HCl (J) N/A"
    target: "Let's think step by step. An acid anhydride is a compound that is derived by removing water from an acid. The chemical formula for water is H2O, which means that we need to determine which of these options, when combined with H2O, forms an acid. SO2, or sulfur dioxide, when combined with H2O, makes H2SO3, or sulfurous acid. The answer is (E)."
  - question: "A new compound is synthesized and found to be a monoprotic acid with a molar mass of 248 g/mol. When 0.0050 mol of this acid are dissolved in 0.500 L of water, the pH is measured as 3.89. What is the pKa of this acid? (A) 5.78 (B) 4.78 (C) 4.56 (D) 6.89 (E) 7.78 (F) 3.89 (G) 1.23 (H) 2.89 (I) 2.33 (J) 5.33"
    target: "Let's think step by step. Recall that $[A^{-}] = [H^{+}]$. Here, this is equal to $10^{-3.89}$. Then we have $K_{a} = \\frac{[H^{+}][A^{-}]}{[HA]} = \\frac{10^{-3.89} \\cdot 10^{-3.89}}{10^{-2}}$. The resulting exponent is $-3.89 + (-3.89) - (-2) = -5.78$, therefore $K_a = 10^{-5.78}$. The $pK_a$ is the negative log of $K_a$, which is equal to $5.78$. The answer is (A)."
  - question: "A solution contains 2.00 mole of acetic acid, CH3COOH, and 1.00 mole of calcium acetate, Ca(CH3COO)2. The solution is able to resist the addition of a small amount of strong acid or strong base with only minor changes in the pH of the solution. Larger quantities of strong acid or strong base can cause a significant change in pH. How many moles of nitric acid, HNO3, may be added before the pH begins to change significantly? (A) 0.250 mole (B) 0.500 mole (C) 3.00 mole (D) 1.00 mole (E) 3.50 mole (F) 1.50 mole (G) 2.50 mole (H) 4.00 mole (I) 0.750 mole (J) 2.00 mole"
    target: "Let's think step by step. We would like to compute the buffer capacity of this solution. First we write the equation for the ionization of the weak acid, in this case of acetic acid. $CH_{3}COOH (aq) + H_{2}O \\rightarrow H_{3}O^{+} + CH_{3}COO^{-}$. The conjugate base is therefore the acetate ion. The added strong acid, nitric acid, will react with the conjugate base. Therefore the maximum amount of acid that can be added will be equal to the amount of acetate ion, or 2 moles. The answer is (J)."
group: mmlu_pro_flan_cot_fewshot_stem
include: _mmlu_pro_flan_cot_fewshot_template_yaml
task: mmlu_pro_flan_cot_fewshot_chemistry
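The pKa sample in the chemistry file above can be verified numerically. The sketch below follows the same simplification as the target, treating [HA] as the initial concentration (0.0050 mol in 0.500 L) and taking [A⁻] equal to [H⁺]; the variable names are illustrative only.

```python
import math

PH = 3.89
MOLES_ACID = 0.0050
VOLUME_L = 0.500

# [H+] from the measured pH; for a monoprotic acid [A-] is taken equal to [H+].
h_conc = 10 ** (-PH)
a_conc = h_conc

# Same approximation as the sample target: [HA] is the initial concentration.
ha_conc = MOLES_ACID / VOLUME_L

ka = h_conc * a_conc / ha_conc
pka = -math.log10(ka)

print(f"[HA] = {ha_conc:.2e} M, Ka = {ka:.3e}, pKa = {pka:.2f}")
```

The exponent of Ka is -3.89 + (-3.89) - (-2) = -5.78, so the pKa comes out at 5.78, option (A).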
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_computer_science.yaml
0 → 100644
dataset_name: computer_science
description: The following are multiple choice questions (with answers) about computer science.
fewshot_config:
  sampler: first_n
  samples:
  - question: "A certain pipelined RISC machine has 8 general-purpose registers R0, R1, . . . , R7 and supports the following operations: ADD Rs1, Rs2, Rd (Add Rs1 to Rs2 and put the sum in Rd) MUL Rs1, Rs2, Rd (Multiply Rs1 by Rs2 and put the product in Rd) An operation normally takes one cycle; however, an operation takes two cycles if it produces a result required by the immediately following operation in an operation sequence. Consider the expression AB + ABC + BC, where variables A, B, C are located in registers R0, R1, R2. If the contents of these three registers must not be modified, what is the minimum number of clock cycles required for an operation sequence that computes the value of AB + ABC + BC? (A) 10 (B) 6 (C) 9 (D) 5 (E) 12 (F) 4 (G) 7 (H) 8 (I) N/A (J) N/A"
    target: "Let's think step by step. First, we are given that A is in R0, B is in R1, and C is in R2. Next, we can see that we must compute three multiplies (AB, BC, and ABC) and two adds (AB + ABC, (AB + ABC) + BC) to compute our final answer, resulting in a minimum of five clock cycles. Next, we can see that there is no way to avoid at least one pipeline stall when computing our final answer, because to compute our final sum we must wait at least one cycle for the results from the previous stage to be ready. Thus, our minimum number of cycles must be 6. We can verify that we can create a solution that requires only six cycles as follows: compute AB: MUL R0, R1, R3 compute BC: MUL R1, R2, R4 compute ABC: MUL R3, R4, R5 compute AB + BC: ADD R3, R4, R6 STALL compute AB + ABC + BC: ADD R5, R6, R7 So there are 6 cycles. The answer is (B)."
  - question: "A compiler generates code for the following assignment statement. G := (A + B) * C - (D + E) * F The target machine has a single accumulator and a single-address instruction set consisting of instructions load, store, add, subtract, and multiply. For the arithmetic operations, the left operand is taken from the accumulator and the result appears in the accumulator. The smallest possible number of instructions in the resulting code is (A) 7 (B) 5 (C) 12 (D) 4 (E) 11 (F) 6 (G) 8 (H) 10 (I) 9 (J) 3"
    target: "Let's think step by step. We can compute the final answer with the following sequence of operations: 1. LOAD D (accumulator = D) 2. ADD E (accumulator = D+E) 3. MUL F (accumulator = (D+E)*F) 4. STORE X (X = (D+E)*F) 5. LOAD A (accumulator = A) 6. ADD B (accumulator = A+B) 7. MUL C (accumulator = (A+B)*C) 8. SUB X (accumulator = (A+B)*C - (D+E)*F) 9. STORE G (G = (A+B)*C - (D+E)*F) This sequence takes 9 instructions. The answer is (I)."
  - question: "SHA-1 has a message digest of (A) 628 bits (B) 780 bits (C) 160 bits (D) 256 bits (E) 2048 bits (F) 820 bits (G) 128 bits (H) 512 bits (I) 1024 bits (J) 64 bits"
    target: "Let's think step by step. Since SHA-1 is a hash function which takes an input and produces a 160-bit (20-byte) hash value, its message digest is 160 bits. The answer is (C)."
  - question: "_____________ can modify data on your system – so that your system doesn’t run correctly or you can no longer access specific data, or it may even ask for ransom in order to give your access. (A) Email Trojans (B) Backdoor Trojans (C) Rootkit Trojans (D) Banking Trojans (E) Trojan-Downloader (F) Spyware Trojans (G) Network Trojans (H) Worm Trojans (I) Ransom Trojan (J) IM – Trojans"
    target: "Let's think step by step. The system is asking for trojans, which are for ransom, which means ransom trojan. The answer is (I)."
  - question: "In the program below, the initial value of X is 5 and the initial value of Y is 10. IF (X < 0){ DISPLAY (\"Foxtrot\") } ELSE { IF (X > Y){ DISPLAY (\"Hotel\") } ELSE { IF (Y > 0){ DISPLAY (\"November\") } ELSE { DISPLAY (\"Yankee\") } } } What is displayed as a result of running the program? (A) November (B) Foxtrot (C) Zulu (D) Alpha (E) Charlie (F) Bravo (G) Yankee (H) Echo (I) Hotel (J) Delta"
    target: "Let's think step by step. Because X has the value 5, the first conditional IF (X < 0) is false, so we move to the first ELSE clause. Because X is 5 and Y is 10, the second conditional IF (X > Y) is false, so we move to the following ELSE clause. Since Y is 10, the conditional IF (Y > 0) is true, so the command DISPLAY (\"November\") is executed. The answer is (A)."
group: mmlu_pro_flan_cot_fewshot_stem
include: _mmlu_pro_flan_cot_fewshot_template_yaml
task: mmlu_pro_flan_cot_fewshot_computer_science
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_economics.yaml
0 → 100644
dataset_name
:
economics
description
:
The following are multiple choice questions (with answers) about economics.
fewshot_config
:
sampler
:
first_n
samples
:
-
question
:
"
For
a
stationary
autoregressive
process,
shocks
will
(A)
Never
occur
(B)
Persist
indefinitely
(C)
Cause
a
permanent
change
in
the
trend
(D)
Grow
exponentially
(E)
Only
affect
the
next
period
(F)
Eventually
die
away
(G)
Cause
the
process
to
become
non-stationary
(H)
Immediately
disappear
(I)
Lead
to
a
cyclical
pattern
(J)
Be
balanced
out
by
subsequent
shocks"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
econometrics
for
help.
This is a question about stationary processes.
For
a
stationary
autoregressive
process,
shocks
will
eventually
die
away.
The
answer
is
(F)."
-
question
:
"
Consider
the
following
AR(1)
model
with
the
disturbances
having
zero
mean
and
unit
variance
yt
=
0.2
+
0.4
yt-1
+
ut
The
(unconditional)
mean
of
y
will
be
given
by
(A)
0.45
(B)
0.2
(C)
0.4
(D)
0.1
(E)
0.3
(F)
0.55
(G)
0.25
(H)
0.33
(I)
0.6
(J)
0.5"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
econometrics
for
help.
Let’s
solve
this
problem
step
by
step.
If
we
have
an
AR(1)
model
with
the
disturbances
having
zero
mean
and
unit
variance,
then
the
unconditional
mean
of
y
is
equal
to
the
following:
unconditional
mean
of
y
=
(the
intercept
term)
/
(1
-
autoregressive
coefficient)
We
know
that
the
intercept
term
is
0.2
and
the
autoregressive
coefficient
is
0.4;
thus,
we
have:
unconditional
mean
of
y
=
(0.2)
/
(1
-
0.4)
=
(0.2)
/
(0.6)
=
2
/
6
=
1
/
3,
which
is
approximately
0.33.
That
means
that
the
answer
should
be
(H)
0.33.
The
answer
is
(H)."
-
question
:
"
Suppose
that
a
test
statistic
has
associated
with
it
a
p-value
of
0.08.
Which
one
of
the
following
statements
is
true?
(i)
If
the
size
of
the
test
were
exactly
8%,
we
would
be
indifferent
between
rejecting
and
not
rejecting
the
null
hypothesis
(ii)
The
null
would
be
rejected
if
a
10%
size
of
test
were
used
(iii)
The
null
would
not
be
rejected
if
a
1%
size
of
test
were
used
(iv)
The
null
would
be
rejected
if
a
5%
size
of
test
were
used.
(A)
(iii)
and
(iv)
only
(B)
(i)
and
(iv)
only
(C)
(i),
(ii),
and
(iii)
only
(D)
(i)
and
(ii)
only
(E)
(i)
and
(iii)
only
(F)
(i),
(ii),
(iii),
and
(iv)
(G)
(ii),
(iii),
and
(iv)
only
(H)
(ii)
and
(iii)
only
(I)
(ii)
and
(iv)
only
(J)
(i),
(iii),
and
(iv)
only"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
econometrics
for
help.
Let’s
reason
about
each
of
the
options.
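(i) is a true statement: with a test size of exactly 8%, the p-value equals the significance level, so we are indifferent between rejecting and not rejecting. (ii) is a true statement: 0.08 < 0.10, so the null is rejected at the 10% level. (iii) is a true statement: 0.08 > 0.01, so the null is not rejected at the 1% level. (iv) is not a true statement: 0.08 > 0.05, so the null is not rejected at the 5% level.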
Thus,
(i),
(ii),
and
(iii)
are
true.
The
answer
is
(C)."
-
question
:
"
Which
of
the
following
policies
best
describes
supply-side
fiscal
policy?
(A)
Higher
taxes
on
household
income
(B)
Increased
government
spending
(C)
Increased
taxes
on
corporate
profits
(D)
Increased
import
tariffs
(E)
Decreased
interest
rates
(F)
Lower
taxes
on
consumer
goods
(G)
Lower
taxes
on
research
and
development
of
new
technology
(H)
Reduced
government
spending
(I)
Higher
taxes
on
capital
gains
(J)
An
increase
in
the
money
supply"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
macroeconomics
for
help.
Supply-side
fiscal
policy
stimulates
the
economy
by
encouraging
more
production
of
goods
and
services
through
reduction
in
taxes
and
deregulation.
The
answer
is
(G)."
-
question
:
"
The
concentration
ratio
for
a
monopoly
is
(A)
50
(B)
5
(C)
10
(D)
90
(E)
15
(F)
100
(G)
0
(H)
25
(I)
75
(J)
N/A"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
microeconomics
for
help.
The
concentration
ratio
is
calculated
as
the
sum
of
market
share
of
a
specific
number
of
largest
companies.
Monopoly
means
one
company
or
entity
controls
the
entire
market,
therefore,
the
concentration
ratio
is
100
percent.
The
answer
is
(F)."
group
:
mmlu_pro_flan_cot_fewshot_social_sciences
include
:
_mmlu_pro_flan_cot_fewshot_template_yaml
task
:
mmlu_pro_flan_cot_fewshot_economics
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_engineering.yaml
0 → 100644
dataset_name
:
engineering
description
:
The following are multiple choice questions (with answers) about engineering.
fewshot_config
:
sampler
:
first_n
samples
:
-
question
:
"
A
point
pole
has
a
strength
of
4π
*
10^-4
weber.
The
force
in
newtons
on
a
point
pole
of
4π
*
1.5
*
10^-4
weber
placed
at
a
distance
of
10
cm
from
it
will
be
(A)
25
N.
(B)
12.5
N.
(C)
15
N.
(D)
10
N.
(E)
7.5
N.
(F)
20
N.
(G)
5
N.
(H)
3.75
N.
(I)
N/A
(J)
N/A"
target
:
"
Let's
think
step
by
step.
The
force
between
two
point
poles
is
given
by
\\( \\frac{m_1 m_2}{\\mu_0 4 \\pi r^2} \\),
in
analogy
to
Coulomb’s
law.
Plugging
in
the
values
given
in
the
question,
we
calculate
that
the
force
is
approximately
15
N.
The
answer
is
(C)."
-
question
:
"
The
coil
of
a
moving
coil
meter
has
100
turns,
is
40
mm
long
and
30
mm
wide.
The
control
torque
is
\\( 240 \\times 10^{-6} \\)
N-m
on
full
scale.
If
magnetic
flux
density
is
1
Wb/m²
range
of
meter
is
(A)
2
mA.
(B)
5
mA.
(C)
1.5
mA.
(D)
0.5
mA.
(E)
6
mA.
(F)
4
mA.
(G)
3
mA.
(H)
1
mA.
(I)
2.5
mA.
(J)
3.5
mA."
target
:
"
Let's
think
step
by
step.
The
torque
on
a
coil
in
a
uniform
magnetic
field
is
given
by
\\( BANI \\), where \\( B \\) is the magnetic flux density, \\( A \\) is the area of the coil, \\( N \\) is the number of turns, and \\( I \\) is the current. So we have that \\( I = \\frac{\\text{Torque}}{BAN} \\), or \\( \\frac{240 \\times 10^{-6}}{1200 \\times 10^{-6} \\times 100 \\times 1} = 2 \\times 10^{-3} \\).
The
answer
is
(A)."
-
question
:
"
In
an
SR
latch
built
from
NOR
gates,
which
condition
is
not
allowed
(A)
S=0,
R=2
(B)
S=2,
R=2
(C)
S=1,
R=1
(D)
S=1,
R=-1
(E)
S=1,
R=2
(F)
S=0,
R=0
(G)
S=2,
R=0
(H)
S=1,
R=0
(I)
S=2,
R=1
(J)
S=0,
R=1"
target
:
"
Let's
think
step
by
step.
An
SR
latch
is
a
set-reset
latch;
in
the
case
where
\\( S=1 \\) and \\( R=1 \\),
the
circuit
has
no
stable
state;
instead
a
race
condition
will
be
produced
within
the
circuit,
so
the
device
will
be
in
an
undefined
state.
So
\\( S=1 \\), \\( R=1 \\)
is
an
illegal
input.
The
answer
is
(C)."
-
question
:
"
Two
long
parallel
conductors
carry
100
A.
If
the
conductors
are
separated
by
20
mm,
the
force
per
meter
of
length
of
each
conductor
will
be
(A)
100
N.
(B)
0.01
N.
(C)
10
N.
(D)
0.5
N.
(E)
0.001
N.
(F)
5
N.
(G)
1
N.
(H)
50
N.
(I)
0.1
N.
(J)
0.0001
N."
target
:
"
Let's
think
step
by
step.
The
magnetic
force-per-length
between
two
current-carrying
conductors
is
given
by
\\( \\frac{\\mu_0 I_1 I_2}{2 \\pi r} \\), where \\( r \\) is the separation distance and \\( I_1 \\) and \\( I_2 \\)
are
the
currents.
Plugging
in
100
A
for
\\( I_1 \\) and \\( I_2 \\), and 20 mm for \\( r \\),
gives
0.1
N.
The
answer
is
(I)."
-
question
:
"
In
a
2
pole
lap
winding
dc
machine,
the
resistance
of
one
conductor
is
2Ω
and
the
total
number
of
conductors
is
100.
Find
the
total
resistance
(A)
50Ω
(B)
1Ω
(C)
25Ω
(D)
200Ω
(E)
10Ω
(F)
100Ω
(G)
500Ω
(H)
150Ω
(I)
75Ω
(J)
20Ω"
target
:
"
Let's
think
step
by
step.
In
lap
winding,
effectively
two
resistors
are
connected
in
parallel,
so
the
actual
resistance
of
each
pair
is
1
Ohm.
Since
we
have
50
pairs,
we
get
a
total
resistance
of
50
Ohms.
The
answer
is
(A)."
group
:
mmlu_pro_flan_cot_fewshot_stem
include
:
_mmlu_pro_flan_cot_fewshot_template_yaml
task
:
mmlu_pro_flan_cot_fewshot_engineering
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_health.yaml
0 → 100644
dataset_name
:
health
description
:
The following are multiple choice questions (with answers) about health.
fewshot_config
:
sampler
:
first_n
samples
:
-
question
:
"
Which
of
the
following
is
the
body
cavity
that
contains
the
pituitary
gland?
(A)
Ventral
(B)
Dorsal
(C)
Buccal
(D)
Thoracic
(E)
Pericardial
(F)
Abdominal
(G)
Spinal
(H)
Pelvic
(I)
Pleural
(J)
Cranial"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
anatomy
for
help.
Let’s
solve
this
problem
step
by
step.
The
pituitary
gland
is
the
major
endocrine
gland
attached
to
the
base
of
the
brain,
and
it
is
contained
in
the
Cranial
cavity.
The
answer
is
(J)."
-
question
:
"
What
is
the
embryological
origin
of
the
hyoid
bone?
(A)
The
third
and
fourth
pharyngeal
arches
(B)
The
fourth
pharyngeal
arch
(C)
The
third
pharyngeal
arch
(D)
The
second
pharyngeal
arch
(E)
The
second,
third
and
fourth
pharyngeal
arches
(F)
The
first
pharyngeal
arch
(G)
The
second
and
third
pharyngeal
arches
(H)
The
first
and
third
pharyngeal
arches
(I)
The
first,
second
and
third
pharyngeal
arches
(J)
The
first
and
second
pharyngeal
arches"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
anatomy
for
help.
Let’s
solve
this
problem
step
by
step.
The
hyoid
bone
is
a
small
U-shaped
bone
located
in
the
anterior
neck.
In
its
resting
position,
it
lies
between
the
base
of
the
mandible
and
the
third
cervical
vertebra.
We
know
that
the
second
and
the
third
pharyngeal
arches
give
rise
to
the
horns
of
the
hyoid
bone;
therefore,
the
embryological
origin
of
the
hyoid
bone
is
the
second
and
the
third
pharyngeal
arches—this
information
is
covered
in
option
(G).
Therefore,
we
conclude
that
(G)
must
be
the
correct
answer.
The
answer
is
(G)."
-
question
:
"
What
is
the
difference
between
a
male
and
a
female
catheter?
(A)
Female
catheters
are
used
more
frequently
than
male
catheters.
(B)
Male
catheters
are
bigger
than
female
catheters.
(C)
Male
catheters
are
more
flexible
than
female
catheters.
(D)
Male
catheters
are
made
from
a
different
material
than
female
catheters.
(E)
Female
catheters
are
longer
than
male
catheters.
(F)
Male
catheters
are
longer
than
female
catheters.
(G)
Female
catheters
are
bigger
than
male
catheters.
(H)
Female
catheters
have
a
curved
shape
while
male
catheters
are
straight.
(I)
Male
and
female
catheters
are
different
colours.
(J)
Male
catheters
have
a
smaller
diameter
than
female
catheters."
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
clinical
knowledge
for
help.
The
difference
between
a
male
and
female
catheter
is
that
male
catheters
tend
to
be
longer
than
female
catheters.
The
answer
is
(F)."
-
question
:
"
How
many
attempts
should
you
make
to
cannulate
a
patient
before
passing
the
job
on
to
a
senior
colleague,
according
to
the
medical
knowledge
of
2020?
(A)
1
(B)
Unlimited
attempts
(C)
5
(D)
0,
always
pass
on
to
a
senior
colleague
(E)
7
(F)
2
(G)
4
(H)
6
(I)
3
(J)
8"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
clinical
knowledge
for
help.
According
to
the
medical
protocol
as
of
2020,
you
should
make
two
attempts
to
cannulate
a
patient
before
passing
the
job
on
to
a
more-senior
practitioner.
The
answer
is
(F)."
-
question
:
"
Why
are
parvoviruses
a
highly
impactful
parasite?
(A)
They
are
able
to
alter
the
host's
DNA
(B)
Because
they
have
no
nucleic
acid
(C)
They
can
survive
in
extreme
temperatures
(D)
Only
replicate
in
dividing
cells
(E)
They
can
infect
multiple
species
(F)
They
don't
require
a
host
to
survive
(G)
Can
integrate
into
host
chromosomes
(H)
N/A
(I)
N/A
(J)
N/A"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
virology
for
help.
Parvoviruses
are
highly
impactful
because
they
do
not
have
nucleic
acid.
The
answer
is
(B)."
group
:
mmlu_pro_flan_cot_fewshot_other
include
:
_mmlu_pro_flan_cot_fewshot_template_yaml
task
:
mmlu_pro_flan_cot_fewshot_health
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_history.yaml
0 → 100644
dataset_name
:
history
description
:
The following are multiple choice questions (with answers) about history.
fewshot_config
:
sampler
:
first_n
samples
:
-
question
:
"
This
question
refers
to
the
following
information.
In
Russia
there
was
nothing
going
on
well,
and
[Souvarine]
was
in
despair
over
the
news
he
had
received.
His
old
companions
were
all
turning
to
the
politicians;
the
famous
Nihilists
who
made
Europe
tremble - sons
of
village
priests,
of
the
lower
middle
class,
of
tradesmen - could
not
rise
above
the
idea
of
national
liberation,
and
seemed
to
believe
that
the
world
would
be
delivered - when
they
had
killed
their
despot…
\"
Foolery!
They'll
never
get
out
of
it
with
their
foolery.
\"
Then,
lowering
his
voice
still
more,
in
a
few
bitter
words
he
described
his
old
dream
of
fraternity.
He
had
renounced
his
rank
and
his
fortune;
he
had
gone
among
workmen,
only
in
the
hope
of
seeing
at
last
the
foundation
of
a
new
society
of
labour
in
common.
All
the
sous
in
his
pockets
had
long
gone
to
the
urchins
of
the
settlement;
he
had
been
as
tender
as
a
brother
with
the
colliers,
smiling
at
their
suspicion,
winning
them
over
by
his
quiet
workmanlike
ways
and
his
dislike
of
chattering.
But
decidedly
the
fusion
had
not
taken
place.
His
voice
changed,
his
eyes
grew
bright,
he
fixed
them
on
Étienne,
directly
addressing
him:
\"
Now,
do
you
understand
that?
These
hatworkers
at
Marseilles
who
have
won
the
great
lottery
prize
of
a
hundred
thousand
francs
have
gone
off
at
once
and
invested
it,
declaring
that
they
are
going
to
live
without
doing
anything!
Yes,
that
is
your
idea,
all
of
you
French
workmen;
you
want
to
unearth
a
treasure
in
order
to
devour
it
alone
afterwards
in
some
lazy,
selfish
corner.
You
may
cry
out
as
much
as
you
like
against
the
rich,
you
haven't
got
courage
enough
to
give
back
to
the
poor
the
money
that
luck
brings
you.
You
will
never
be
worthy
of
happiness
as
long
as
you
own
anything,
and
your
hatred
of
the
bourgeois
proceeds
solely
from
an
angry
desire
to
be
bourgeois
yourselves
in
their
place.
\"
Émile
Zola,
French
writer,
Germinal,
1885
The
passage
displays
the
direct
concern
for
the
welfare
of
the
working
classes
that
was
typically
a
part
of
which
movement?
(A)
Communist
(B)
Anarchist
(C)
Feminist
(D)
Fascist
(E)
Imperialist
(F)
Nationalist
(G)
Enlightenment
(H)
Existentialist
(I)
N/A
(J)
N/A"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
European
history
for
help.
The
modern
Communist
movement
aims
to
establish
a
classless
society
based
on
communal
ownership
and
distribution
of
property
and
means
of
production,
thereby
especially
benefiting
the
working
classes.
The
answer
is
(A)."
-
question
:
"
This
question
refers
to
the
following
information.
The
following
excerpt
is
from
a
pamphlet.
You
will
do
me
the
justice
to
remember,
that
I
have
always
strenuously
supported
the
Right
of
every
man
to
his
own
opinion,
however
different
that
opinion
might
be
to
mine.
He
who
denies
to
another
this
right,
makes
a
slave
of
himself
to
his
present
opinion,
because
he
precludes
himself
the
right
of
changing
it.
The
most
formidable
weapon
against
errors
of
every
kind
is
Reason.
I
have
never
used
any
other,
and
I
trust
I
never
shall.
The
circumstance
that
has
now
taken
place
in
France
of
the
total
abolition
of
the
whole
national
order
of
priesthood,
and
of
everything
appertaining
to
compulsive
systems
of
religion,
and
compulsive
articles
of
faith,
has
not
only
precipitated
my
intention,
but
rendered
a
work
of
this
kind
exceedingly
necessary,
lest
in
the
general
wreck
of
superstition,
of
false
systems
of
government,
and
false
theology,
we
lose
sight
of
morality,
of
humanity,
and
of
the
theology
that
is
true.
I
believe
in
one
God,
and
no
more;
and
I
hope
for
happiness
beyond
this
life.
I
believe
in
the
equality
of
man;
and
I
believe
that
religious
duties
consist
in
doing
justice,
loving
mercy,
and
endeavoring
to
make
our
fellow-creatures
happy.
I
do
not
believe
in
the
creed
professed
by
the
Jewish
church,
by
the
Roman
church,
by
the
Greek
church,
by
the
Turkish
church,
by
the
Protestant
church,
nor
by
any
church
that
I
know
of.
My
own
mind
is
my
own
church.
All
national
institutions
of
churches,
whether
Jewish,
Christian
or
Turkish,
appear
to
me
no
other
than
human
inventions,
set
up
to
terrify
and
enslave
mankind,
and
monopolize
power
and
profit.
I
do
not
mean
by
this
declaration
to
condemn
those
who
believe
otherwise;
they
have
the
same
right
to
their
belief
as
I
have
to
mine.
—Thomas
Paine,
The
Age
of
Reason,
1794–1795
Which
of
the
following
Enlightenment
philosophes
designed
a
system
of
checks
and
balances
for
government
to
avoid
abuses
of
power?
(A)
Thomas
Hobbes
(B)
Jean
Jacques
Rousseau
(C)
Baron
Montesquieu
(D)
Voltaire
(E)
Denis
Diderot
(F)
Immanuel
Kant
(G)
David
Hume
(H)
Mary
Wollstonecraft
(I)
Adam
Smith
(J)
John
Locke"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
European
history
for
help.
Baron
Montesquieu
was
an
18th-century
French
philosopher
who
wrote
extensively
against
the
monopolization
of
power
and
advocated
for
a
system
of
checks
and
balances
in
government
to
prevent
the
rise
of
despotism.
The
answer
is
(C)."
-
question
:
"
This
question
refers
to
the
following
information.
\"
The
struggle
against
neo-colonialism
is
not
aimed
at
excluding
the
capital
of
the
developed
world
from
operating
in
less
developed
countries.
It
is
aimed
at
preventing
the
financial
power
of
the
developed
countries
being
used
in
such
a
way
as
to
impoverish
the
less
developed.
Non-alignment,
as
practiced
by
Ghana
and
many
other
countries,
is
based
on
co-operation
with
all
States
whether
they
be
capitalist,
socialist
or
have
a
mixed
economy.
Such
a
policy,
therefore,
involves
foreign
investment
from
capitalist
countries,
but
it
must
be
invested
in
accordance
with
a
national
plan
drawn
up
by
the
government
of
the
non-aligned
State
with
its
own
interests
in
mind.
The
issue
is
not
what
return
the
foreign
investor
receives
on
his
investments…The
question
is
one
of
power.
A
State
in
the
grip
of
neo-colonialism
is
not
master
of
its
own
destiny.
\"
Kwame
Nkrumah,
Neo-Colonialism,
1965
Which
of
the
following
provides
the
best
context
for
Nkrumah's
writings?
(A)
The
spread
of
communism
(B)
The
Industrial
Revolution
(C)
The
French
Revolution
(D)
The
Cold
War
(E)
Decolonization
(F)
World
War
II
(G)
Autarky
(H)
The
rise
of
multinational
corporations
(I)
The
American
Civil
Rights
Movement
(J)
Regional
Free
Trade
Associations"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
world
history
for
help.
The passage argues that newly independent nations like Ghana risk being re-colonized through the financial power of the developed countries, which places Nkrumah's writings in the context of decolonization.
The
answer
is
(E)."
-
question
:
"
What
is
the
approximate
mean
cranial
capacity
of
Homo
erectus?
(A)
1200
cc
(B)
under
650
cc
(C)
1700
cc
(D)
1350
cc
(E)
just
under
1000
cc
(F)
1500
cc
(G)
under
500
cc
(H)
about
800
cc
(I)
just
over
1100
cc
(J)
about
900
cc"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
prehistory
for
help.
The
average
cranial
capacity
of
Homo
erectus
is
less
than
1000
cubic
cm.
The
answer
is
(E)."
-
question
:
"
Recent
research
on
hominid
species
dating
from
the
Middle
Pliocene
indicates
there
was
(as
of
2020):
(A)
multiple
hominid
species
but
with
limited
diversity.
(B)
a
single
species
with
no
diversity.
(C)
decreased
species
diversity
but
increased
numbers
of
hammerstones
and
flakes,
indicating
stone
tool
manufacture.
(D)
a
single
dominant
species
that
outcompeted
all
others,
leading
to
decreased
diversity.
(E)
increased
species
diversity
due
to
a
prolonged
ice
age
followed
by
a
severe
drought.
(F)
decreased
species
diversity
due
to
a
prolonged
ice
age
followed
by
a
severe
drought.
(G)
a
great
amount
of
species
diversity,
or
a
single
species
that
exhibited
a
lot
of
diversity.
(H)
increased
species
diversity
but
with
decreased
population
numbers
due
to
harsh
climate
conditions.
(I)
increased
species
diversity
but
decreased
numbers
of
hammerstones
and
flakes,
indicating
less
stone
tool
manufacture.
(J)
very
little
species
diversity
during
this
period
and
very
few
hominids."
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
prehistory
for
help.
Recent
research
has
recognized
multiple
hominid
species
from
the
Middle
Pliocene,
meaning
that
there
is
a
great
amount
of
species
diversity
or
diversity
in
a
single
species.
The
answer
is
(G)."
group
:
mmlu_pro_flan_cot_fewshot_humanities
include
:
_mmlu_pro_flan_cot_fewshot_template_yaml
task
:
mmlu_pro_flan_cot_fewshot_history
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_law.yaml
0 → 100644
dataset_name
:
law
description
:
The following are multiple choice questions (with answers) about law.
fewshot_config
:
sampler
:
first_n
samples
:
-
question
:
"
What
is
the
judge
ad
hoc?
(A)
Judge
ad
hoc
is
the
president
of
the
ICJ
(B)
Judge
ad
hoc
is
a
temporary
judge
appointed
for
a
specific
period
of
time
(C)
Judge
ad
hoc
is
the
judge
that
each
party
will
always
nominate
in
every
contentious
case
(D)
Judge
ad
hoc
is
the
member
of
the
bench
of
the
ICJ
with
a
casting
vote
(E)
Judge
ad
hoc
is
a
judge
who
is
nominated
by
the
parties
involved
in
a
contentious
case,
irrespective
of
their
nationality
(F)
Judge
ad
hoc
is
a
judge
who
decides
on
the
admissibility
of
cases
before
the
ICJ
(G)
Judge
ad
hoc
is
a
judge
appointed
by
the
Security
Council
of
the
United
Nations
(H)
Judge
ad
hoc
is
a
surrogate
judge,
in
case
a
judge
is
disqualified
or
passes
away
(I)
If
a
party
to
a
contentious
case
before
the
ICJ
does
not
have
a
national
sitting
as
judge,
it
is
entitled
to
nominate
someone
as
a
judge
solely
for
that
case,
with
the
title
of
judge
ad
hoc
(J)
N/A"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
international
law
for
help.
As
'ad
hoc'
implies,
a
judge
ad
hoc
is
appointed
only
for
a
specific
case
or
period,
when
a
party
to
a
contentious
case
before
the
International
Court
of
Justice
does
not
have
a
regular
national
sitting
as
judge.
The
answer
is
(I)."
-
question
:
"
Functions
of
the
law
include
all
but
which
of
the
following?
(A)
defining
the
limits
of
government
power
(B)
regulating
the
use
of
public
spaces
(C)
keeping
the
peace
(D)
maximizing
individual
freedom
(E)
maintaining
order
and
stability
(F)
preventing
environmental
degradation
(G)
providing
a
basis
for
compromise
(H)
promoting
social
justice
(I)
promoting
the
principles
of
the
free
enterprise
system
(J)
encouraging
economic
growth"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
jurisprudence
for
help.
Laws are fundamentally about helping resolve disputes between individuals, and are therefore essential for maximizing individual freedom, providing a basis for compromise, and keeping the peace. Promoting the principles of the free enterprise system is not one of these functions.
The
answer
is
(I)."
-
question
:
"
The
________
School
of
jurisprudence
postulates
that
the
law
is
based
on
what
is
'correct.'
(A)
Legal
Pragmatism
(B)
Legal
Formalism
(C)
Comparative
(D)
Analytical
(E)
Sociological
(F)
Historical
(G)
Critical
Legal
Studies
(H)
Realist
(I)
Positivist
(J)
Natural
Law"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
jurisprudence
for
help.
Natural
Law
School
of
jurisprudence
focuses
on
the
laws
of
nature,
and
states
that
the
law
should
be
based
on
ethics,
morals,
and
what
is
'correct'.
Analytical
deals
with
the
law
as
it
already
exists,
Historical
postulates
that
the
law
was
found
and
not
made,
and
Sociological
studies
how
the
law
and
society
impact
each
other.
The
answer
is
(J)."
-
question
:
"
Which
word
best
summarizes
Weber's
explanation
of
the
development
of
formally
rational
law?
(A)
Socialism.
(B)
Legitimacy.
(C)
Authority.
(D)
Democracy.
(E)
Bureaucracy.
(F)
Conflict.
(G)
Capitalism.
(H)
Charisma.
(I)
Co-operation.
(J)
Tradition."
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
jurisprudence
for
help.
Weber
explained
the
development
of
formal
rationality
in
laws
as
how
the
modern
society
moved
from
tradition
to
rationality,
where
people
decide
actions
based
less
on
how
they
were
culturally
done
and
more
on
expected
utilities.
How
rational
individuals
optimize
efficiency
of
accomplishing
tasks
for
higher
rewards
is
a
core
principle
of
Capitalism.
The
answer
is
(G)."
-
question
:
"
A
state
has
recently
enacted
a
statute
prohibiting
the
disposal
of
any
nuclear
wastes
within
the
state.
This
law
does
not
contravene
or
conflict
with
any
federal
statutes.
A
man
operates
a
company
in
the
state
that
is
engaged
in
the
disposal
of
nuclear
wastes.
Subsequent
to
the
passage
of
the
state
statute,
the
man,
not
yet
aware
of
the
new
law,
entered
into
contracts
with
many
out-of-state
firms
to
dispose
of
their
nuclear
wastes
in
the
state.
On
account
of
this
new
law,
however,
the
man
will
be
unable
to
perform
these
contracts.
Assume
that
the
man
has
standing
to
challenge
this
state
law.
Which
of
the
following
presents
his
strongest
constitutional
grounds
to
challenge
the
state
law
prohibiting
the
disposal
of
nuclear
wastes
within
the
state?
(A)
The
second
amendment
-
the
right
to
bear
arms.
(B)
The
due
process
clause
of
the
Fourteenth
Amendment.
(C)
The
tenth
amendment
-
powers
not
delegated
to
the
United
States
by
the
Constitution.
(D)
The
first
amendment
-
freedom
of
speech.
(E)
The
privileges
and
immunities
clause
of
Article
IV,
Section
2.
(F)
The
commerce
clause.
(G)
The
sixth
amendment
-
right
to
a
fair
trial.
(H)
The
eighth
amendment
-
prohibition
of
cruel
and
unusual
punishment.
(I)
The
equal
protection
clause
of
the
Fourteenth
Amendment.
(J)
N/A"
target
:
"
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
law
for
help.
The
commerce
clause
states
that
Congress
shall
have
the
power
to
regulate
commerce
with
foreign
Nations,
and
among
the
several
States,
and
with
the
Indian
Tribes.
The
statute
affects
inter-state
commerce
which
puts
it
into
question.
Hence
the
man's
strongest
argument
should
be
the
commerce
clause.
The
answer
is
(F)."
group
:
mmlu_pro_flan_cot_fewshot_humanities
include
:
_mmlu_pro_flan_cot_fewshot_template_yaml
task
:
mmlu_pro_flan_cot_fewshot_law
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_math.yaml
0 → 100644
dataset_name
:
math
description
:
The following are multiple choice questions (with answers) about math.
fewshot_config
:
sampler
:
first_n
samples
:
-
question
:
"
The
symmetric
group
$S_n$
has
$n!$
elements,
hence
it
is
not
true
that
$S_{10}$
has
10
elements.
Find
the
characteristic
of
the
ring
2Z.
(A)
0
(B)
30
(C)
3
(D)
10
(E)
12
(F)
50
(G)
2
(H)
100
(I)
20
(J)
5"
target
:
"
Let's
think
step
by
step.
The characteristic of a ring R is
$n$
if
the
statement
$ka
=
0$
for
all
$a
\\
in
2Z$
implies
that
$k$
is
a
multiple
of
$n$.
Assume
that
$ka
=
0$
for
all
$a
\\
in
2Z$
for
some
$k$.
In
particular
$2k
=
0$.
Hence
$k=0$
and
$n=0$.
The
answer
is
(A)."
-
question
:
"
Let
V
be
the
set
of
all
real
polynomials
p(x).
Let
transformations
T,
S
be
defined
on
V
by
T:p(x)
->
xp(x)
and
S:p(x)
->
p'(x)
=
d/dx
p(x),
and
interpret
(ST)(p(x))
as
S(T(p(x))).
Which
of
the
following
is
true?
(A)
ST
+
TS
is
the
identity
map
of
V
onto
itself.
(B)
TS
=
0
(C)
ST
=
1
(D)
ST
-
TS
=
0
(E)
ST
=
T
(F)
ST
=
0
(G)
ST
=
TS
(H)
ST
-
TS
is
the
identity
map
of
V
onto
itself.
(I)
TS
=
T
(J)
ST
=
S"
target
:
"
Let's
think
step
by
step.
For
a
given
polynomial
$p$
we
have
\\
[ST(p)
=
(xp(x))’
=
p(x)
+
xp’(x)
\\
]
and
\\
[TS(p)
=
xp’(x).
\\
]
Hence \\[ST(p) - TS(p) = p(x) + xp’(x) - xp’(x) = p(x).\\]
The
answer
is
(H)."
-
question
:
"
Let
A
be
the
set
of
all
ordered
pairs
of
integers
(m,
n)
such
that
7m
+
12n
=
22.
What
is
the
greatest
negative
number
in
the
set
B
=
{m
+
n
:
(m,
n)
\\
in
A}?
(A)
-5
(B)
0
(C)
-3
(D)
-7
(E)
-4
(F)
-6
(G)
-1
(H)
-2
(I)
-9
(J)
N/A"
target
:
"
Let's
think
step
by
step.
We
have
12n
=
22
-
7m
and
one
of
the
solutions
is
$m
=
-2$,
$n
=
3$.
Then
$m
+
n
=
1$,
hence
we
need
to
look
for
smaller
$m$
in
order
to
make
$m
+
n$
negative.
The next solution is $m = -14$ and $n = 10$, giving $m + n = -4$.
For
smaller
$m$
we
have
$m
+
n$
smaller
than
$-4$.
The
answer
is
(E)."
-
question
:
"
A
tank
initially
contains
a
salt
solution
of
3
grams
of
salt
dissolved
in
100
liters
of
water.
A
salt
solution
containing
0.02
grams
of
salt
per
liter
of
water
is
sprayed
into
the
tank
at
a
rate
of
4
liters
per
minute.
The
sprayed
solution
is
continually
mixed
with
the
salt
solution
in
the
tank,
and
the
mixture
flows
out
of
the
tank
at
a
rate
of
4
liters
per
minute.
If
the
mixing
is
instantaneous,
how
many
grams
of
salt
are
in
the
tank
after
100
minutes
have
elapsed?
(A)
3
+
e^-2
(B)
2
-
e^-4
(C)
2
-
e^-2
(D)
3
+
e^-4
(E)
2
+
e^-3
(F)
2
-
e^-3
(G)
3
-
e^-2
(H)
2
+
e^-2
(I)
2
+
e^-4
(J)
2"
target
:
"
Let's
think
step
by
step.
For
all
$t
\\
in
\\
mathbb{R}$,
let
$s(t)$
denote
the number of grams
of
salt
in
the
tank
at
the
$t$
minute
mark.
Then
$s(0)
=
3$.
We
use
$s$
and
$s(t)$
interchangeably.
We
also
use
$s^{
\\
prime}$
and
$s^{
\\
prime}(t)$
interchangeably.
The
solution
sprayed
into
the
tank
adds
$(0.02)
4=2
/
25$
grams
of
salt
per
minute.
There
are
always
100
liters
of
liquid
in
the
tank,
containing
$s$
grams
of
salt.
So
the
density
of
salt
in
the
tank
is
$s
/
100$
grams
per
liter.
The
flow
of
water
out
of
the
tank
therefore
subtracts
$4(s
/
100)=s
/
25$
grams
of
salt
per
minute.
Then,
for
all
$t
\\
in
\\
mathbb{R}$,
we
have
$s^{
\\
prime}(t)=(2
/
25)-(s
/
25)=(2-s)
/
25$,
and
so
$[s(t)=2] \\Rightarrow \\left[s^{\\prime}(t)=0\\right]$.
For
all
$t
\\
in
\\
mathbb{R}$,
$$ \\frac{d}{dt}[\\ln (s-2)]=\\frac{s^{\\prime}}{s-2}=\\frac{-1}{25}=\\frac{d}{dt}\\left[-\\frac{t}{25}\\right] . $$
Choose
$C
\\
in
\\
mathbb{R}$
such
that,
for
all
$t
\\
in
\\
mathbb{R},
\\
ln
((s(t)-2))=-[t
/
25]+C$.
Let
$K:=e^{C}$.
Then,
for
all
$t
\\
in
\\
mathbb{R}$,
we
have
$(s(t))-2=K
e^{-t
/
25}$,
and
so
$s(t)=2+K
e^{-t
/
25}$.
Then
$3=s(0)=2+K
e^{0}=2+K$,
so
$K=1$.
Then
$s(100)=2+K
e^{-100
/
25}=2+1
\\
cdot
e^{-4}=2+e^{-4}$.
The
answer
is
(I)."
-
question
:
"
A
total
of
30
players
will
play
basketball
at
a
park.
There
will
be
exactly
5
players
on
each
team.
Which
statement
correctly
explains
how
to
find
the
number
of
teams
needed?
(A)
Multiply
5
by
5
to
find
25
teams.
(B)
Divide
30
by
5
to
find
6
teams.
(C)
Add
5
to
30
to
find
35
teams.
(D)
Subtract
30
from
5
to
find
-25
teams.
(E)
Divide
5
by
30
to
find
0.1667
teams.
(F)
Add
5
to
30
then
divide
by
2
to
find
17.5
teams.
(G)
N/A
(H)
N/A
(I)
N/A
(J)
N/A"
target
:
"
Let's
think
step
by
step.
We
want
to
find
the
number
of
teams.
We
know
that
there
are
5
players/team,
and
30
players.
Thus
to
get
the
number
of
teams
we
divide
players
by
players/team,
so
30
players
/
5
players/team
=
6
teams.
The
answer
is
(B)."
group
:
mmlu_pro_flan_cot_fewshot_stem
include
:
_mmlu_pro_flan_cot_fewshot_template_yaml
task
:
mmlu_pro_flan_cot_fewshot_math
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_other.yaml
0 → 100644
dataset_name
:
other
description
:
The following are multiple choice questions (with answers) about other.
fewshot_config
:
sampler
:
first_n
samples
:
  - question: "As of 2017, how many of the world’s 1-year-old children today have been vaccinated against some disease? (A) 30% (B) 60% (C) 10% (D) 90% (E) 80% (F) 40% (G) 100% (H) 50% (I) N/A (J) N/A"
    target: "Let's think step by step. We refer to Wikipedia articles on global facts for help. According to data published by the World Health Organization, the number of 1-year-old children vaccinated in 2017 exceeds 80%. The answer is (E)."
  - question: "Which one of the following items is an example of nonmaterial culture? (A) A dove feather (B) Dove symbol (C) Dove body lotion (D) Dove deodorant (E) Dove soap (F) Dove candy bar (G) Dove conditioner (H) A dove (bird). (I) Dove chocolate (J) Dove shampoo"
    target: "Let's think step by step. We refer to Wikipedia articles on geography for help. Nonmaterial culture consists of cultural ideas, beliefs or symbols that are not physical objects. The answer is (B)."
  - question: "Which of the following cases established the precedent that a defendant must be informed of the right to remain silent, the right to a lawyer, and protection from self-incrimination? (A) Brown v. Board of Education (B) Miranda v. Arizona (C) Roe v. Wade (D) Betts v. Brady (E) Plessy v. Ferguson (F) Dred Scott v. Sandford (G) Weeks v. United States (H) Gideon v. Wainwright (I) Marbury v. Madison (J) Mapp v. Ohio"
    target: "Let's think step by step. We refer to Wikipedia articles on government and politics for help. In the landmark Miranda v. Arizona decision in 1966, the US Supreme Court, based on the Fifth and Sixth Amendments of the US Constitution, guaranteed a defendant's right to an attorney and protection from self-incrimination. The answer is (B)."
  - question: "A woman who knows she has active herpes and untreated syphilis but continues to have sex without informing her partners of her condition has, in psychoanalytic terms: (A) a weak conscious mind (B) a strong conscious mind (C) a weak id (D) a weak ego (E) a weak unconscious mind (F) a strong id (G) a strong ego (H) a strong superego (I) a strong preconscious mind (J) a weak superego"
    target: "Let's think step by step. We refer to Wikipedia articles on human sexuality for help. A person with a weak superego tends to be delinquent, criminal, or have an antisocial personality. The action of the woman who knows she has an active venereal disease but still has sex with her partners indicates she may have an antisocial personality. The answer is (J)."
  - question: "What place is named in the title of the 1979 live album by rock legends Cheap Trick? (A) Brooklyn (B) Beijing (C) Budapest (D) Boston (E) Bhutan (F) Barcelona (G) Britain (H) Brisbane (I) Bruges (J) Budokan"
    target: "Let's think step by step. We refer to Wikipedia for help. Nippon Budokan is an indoor arena in Tokyo, Japan renowned for hosting rock music concerts including Cheap Trick in 1978. 'Cheap Trick at Budokan' became the name of their album. The answer is (J)."
group: mmlu_pro_flan_cot_fewshot_other
include: _mmlu_pro_flan_cot_fewshot_template_yaml
task: mmlu_pro_flan_cot_fewshot_other
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_philosophy.yaml
0 → 100644
dataset_name: philosophy
description: The following are multiple choice questions (with answers) about philosophy.
fewshot_config:
  sampler: first_n
  samples:
  - question: "Which of the given formulas of PL is the best symbolization of the following sentence? Turtles live long lives and are happy creatures, unless they are injured. (A) (L • H) ≡ I (B) (L • H) ∨ I (C) L • (H ∨ I) (D) L • (H ⊃ R). (E) N/A (F) N/A (G) N/A (H) N/A (I) N/A (J) N/A"
    target: "Let's think step by step. We refer to Wikipedia articles on formal logic for help. Let’s solve this step by step. Let “L” denote “living long”, H “being happy”, and “I” “being injured”. Now, consider each choice: (A) means (living long AND being happy) is equivalent to (being injured). (B) means (living long AND being happy) OR (being injured). (C) means (living long) AND (being happy OR being injured). (D) means (living long) AND (being happy implies being R), but what R denotes is not clear. Obviously, (B) is the best symbolization of the original sentence. The answer is (B)."
  - question: "Select the best translation into predicate logic. George borrows Hector's lawnmower. (g: George; h: Hector; l: Hector's lawnmower; Bxyz: x borrows y from z). (A) Bhgh (B) Bggh (C) Bhlh (D) Bghl (E) Bhlg (F) Blhg (G) Bllg (H) Blgh (I) Bhhg (J) Bglh"
    target: "Let's think step by step. We refer to Wikipedia articles on formal logic for help. Let’s solve this step by step. We are told that “Bxyz” means “x borrows y from z”. We can rewrite “George borrows Hector's lawnmower” as “George borrows a lawnmower from Hector”, which can then be translated into predicate logic as “Bglh”. The answer “Bglh” appears in (J); therefore, (J) must be the correct answer. The answer is (J)."
  - question: "The study of reality in the broadest sense, an inquiry into the elemental nature of the universe and the things in it, is known as _____. (A) quantum physics (B) metaphysics (C) ontology (D) phenomenology (E) cosmology (F) teleology (G) epistemology (H) axiology (I) N/A (J) N/A"
    target: "Let's think step by step. We refer to Wikipedia articles on philosophy for help. Among the options, only metaphysics studies the nature of reality and existence. The answer is (B)."
  - question: "According to Moore’s “ideal utilitarianism,” the right action is the one that brings about the greatest amount of: (A) wealth. (B) virtue. (C) fairness. (D) pleasure. (E) peace. (F) justice. (G) happiness. (H) power. (I) good. (J) knowledge."
    target: "Let's think step by step. We refer to Wikipedia articles on philosophy for help. Moore's 'ideal utilitarianism' states that one's actions should maximize intrinsic goods. The answer is (I)."
  - question: "What is the sign of the covenant for Jewish males? (A) Fasting on Yom Kippur (B) Lighting Shabbat candles (C) The rainbow (D) Circumcision (E) The Torah (F) Bar mitzvah (G) Keeping kosher (H) Wearing a kippah (I) A son (J) The Star of David"
    target: "Let's think step by step. We refer to Wikipedia articles on world religions for help. In Judaism, the most distinctive sign of the covenant is circumcision (brit milah). The answer is (D)."
group: mmlu_pro_flan_cot_fewshot_humanities
include: _mmlu_pro_flan_cot_fewshot_template_yaml
task: mmlu_pro_flan_cot_fewshot_philosophy
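Each file in this commit pairs a `description` with a `fewshot_config` whose `sampler` is `first_n`. As a rough illustration of what that implies, the sketch below takes the first `n` samples in file order and joins them with the description and a new question. The function name `build_fewshot_prompt` and the "Q: ... A: ..." layout are hypothetical; the actual prompt layout is defined by `_mmlu_pro_flan_cot_fewshot_template_yaml` and the harness, which are not shown in this section.

```python
def build_fewshot_prompt(description, samples, n, new_question):
    """Join a description, the first n samples, and a new question.

    Hypothetical helper: the "Q:/A:" layout is an assumption made for
    illustration, not the layout used by the harness template.
    """
    parts = [description.strip()]
    # "first_n" behaviour as assumed here: keep file order, take the first n.
    for sample in samples[:n]:
        question = sample["question"].strip()
        target = sample["target"].strip()
        parts.append(f"Q: {question}\nA: {target}")
    # The new question ends with the chain-of-thought cue used by every target.
    parts.append(f"Q: {new_question.strip()}\nA: Let's think step by step.")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo_samples = [
        {
            "question": "What is the sign of the covenant for Jewish males? "
                        "(A) Fasting on Yom Kippur ... (D) Circumcision ...",
            "target": "Let's think step by step. ... The answer is (D).",
        },
    ]
    print(
        build_fewshot_prompt(
            "The following are multiple choice questions (with answers) "
            "about philosophy.",
            demo_samples,
            n=1,
            new_question="(a new question would go here)",
        )
    )
```

Because the samples are taken in file order, reordering the entries under `samples:` would change which examples a model sees when fewer than all five are requested.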
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_physics.yaml
0 → 100644
dataset_name: physics
description: The following are multiple choice questions (with answers) about physics.
fewshot_config:
  sampler: first_n
  samples:
  - question: "Say the pupil of your eye has a diameter of 5 mm and you have a telescope with an aperture of 50 cm. How much more light can the telescope gather than your eye? (A) 1000 times more (B) 50 times more (C) 5000 times more (D) 500 times more (E) 10000 times more (F) 20000 times more (G) 2000 times more (H) 100 times more (I) 10 times more (J) N/A"
    target: "Let's think step by step. The amount of light is proportional to the aperture area $A = \\pi D^2/4$ for a lens with diameter $D$, so the ratio of the light gathered by the telescope with diameter 50 cm to that gathered by the eye with diameter 5 mm is $(50 cm)^2/(5 mm)^2 = 10000$. The answer is (E)."
  - question: "Where do most short-period comets come from and how do we know? (A) The Kuiper belt; short period comets tend to be in the plane of the solar system just like the Kuiper belt. (B) The asteroid belt; short period comets tend to come from random directions indicating a spherical distribution of comets called the asteroid belt. (C) The asteroid belt; short period comets tend to be in the plane of the solar system just like the asteroid belt. (D) The Oort cloud; short period comets have orbital periods similar to asteroids like Vesta and are found in the plane of the solar system just like the Oort cloud. (E) The Oort Cloud; short period comets tend to come from random directions indicating a spherical distribution of comets called the Oort Cloud. (F) The Oort cloud; short period comets tend to be in the plane of the solar system just like the Oort cloud. (G) The asteroid belt; short period comets have orbital periods similar to asteroids like Vesta and are found in the plane of the solar system just like the asteroid belt. (H) N/A (I) N/A (J) N/A"
    target: "Let's think step by step. Most short-period comets come from the Kuiper belt, and we know because short period comets tend to be in the plane of the solar system, just like the Kuiper belt is. The answer is (A)."
  - question: "A refracting telescope consists of two converging lenses separated by 100 cm. The eye-piece lens has a focal length of 20 cm. The angular magnification of the telescope is (A) 10 (B) 40 (C) 6 (D) 25 (E) 15 (F) 50 (G) 30 (H) 4 (I) 5 (J) 20"
    target: "Let's think step by step. In a refracting telescope, if both lenses are converging, the focus of both lenses must be between the two lenses, and thus the focal lengths of the two lenses must add up to their separation. Since the focal length of one lens is 20 cm, the focal length of the other must be 80 cm. The magnification is the ratio of these two focal lengths, or 4. The answer is (H)."
  - question: "Colors in a soap bubble result from light (A) dispersion (B) deflection (C) refraction (D) reflection (E) interference (F) converted to a different frequency (G) polarization (H) absorption (I) diffraction (J) transmission"
    target: "Let's think step by step. In a soap bubble film, the light bounces between the two soap-air interfaces many times, interfering with itself constructively or destructively depending on the width of the film. This results in different colors being visible. The answer is (E)."
  - question: "A microwave oven is connected to an outlet, 120 V, and draws a current of 2 amps. At what rate is energy being used by the microwave oven? (A) 240 W (B) 120 W (C) 10 W (D) 480 W (E) 360 W (F) 200 W (G) 30 W (H) 150 W (I) 60 W (J) 300 W"
    target: "Let's think step by step. Rate of energy usage is known as power; in a dissipative electrical circuit, power is given by voltage times current. So in our case, the power is 120 V times 2 amps, or 240 W. The answer is (A)."
group: mmlu_pro_flan_cot_fewshot_stem
include: _mmlu_pro_flan_cot_fewshot_template_yaml
task: mmlu_pro_flan_cot_fewshot_physics
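Three of the physics targets above rest on short calculations: the light-gathering ratio scales with the square of the aperture diameter, the telescope magnification is the ratio of the two focal lengths, and electrical power is voltage times current. The sketch below reproduces those three numbers. The function names are hypothetical and exist only for this check.

```python
def light_gathering_ratio(d_telescope_m, d_eye_m):
    """Ratio of aperture areas; the pi/4 factors cancel, leaving (D1/D2)^2."""
    return (d_telescope_m / d_eye_m) ** 2


def refractor_magnification(separation_cm, eyepiece_focal_cm):
    """Angular magnification of a two-converging-lens telescope.

    Follows the sample's reasoning: the focal lengths add up to the lens
    separation, and the magnification is objective / eyepiece focal length.
    """
    objective_focal_cm = separation_cm - eyepiece_focal_cm
    return objective_focal_cm / eyepiece_focal_cm


def electrical_power_watts(volts, amps):
    """Power drawn by a device: P = V * I."""
    return volts * amps


if __name__ == "__main__":
    # 50 cm aperture versus a 5 mm pupil, both converted to metres.
    print(light_gathering_ratio(0.50, 0.005))
    # Lenses 100 cm apart, 20 cm eyepiece.
    print(refractor_magnification(100, 20))
    # 120 V outlet, 2 A current.
    print(electrical_power_watts(120, 2))
```

The unit conversion in the first call matters: dividing 50 by 5 without converting centimetres and millimetres to a common unit would give a ratio of 100 instead of 10000.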
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_pro_psychology.yaml
0 → 100644
dataset_name: psychology
description: The following are multiple choice questions (with answers) about psychology.
fewshot_config:
  sampler: first_n
  samples:
  - question: "Pascale is interested in the processing strategies children use to learn new information. Pascale would best be classified as what type of psychologist? (A) social (B) school (C) sociocultural (D) forensic (E) behaviorist (F) health (G) clinical (H) cognitive (I) psychoanalytic (J) developmental"
    target: "Let's think step by step. We refer to Wikipedia articles on psychology for help. Sociocultural psychologists focus on the effect of societal factors on people. Clinical psychologists focus on people with mental issues. Cognitive psychologists focus on how people think and learn, including the processing strategies. Behaviorists focus more on the effect of environment and experience on people. The answer is (H)."
  - question: "According to Caplan's model of consultee-centered case consultation, the consultant is primarily interested in (A) identifying the causes and solutions of the client's presenting problems (B) establishing a hierarchy of authority to enable effective decision making (C) ensuring the consultee adheres strictly to a predetermined action plan (D) proposing multiple alternative solutions for the consultee to choose from (E) identifying the strengths and weaknesses of the consultee's current approach (F) presenting a single, well-defined and unambiguous course of action for the consultant to overcome skills deficits (G) developing a comprehensive treatment plan for the client (H) identifying and eliminating the causes of the consultee's difficulties in handling a problem (I) focusing on the consultant's personal growth and development (J) focusing on the relationship between the client and the consultee"
    target: "Let's think step by step. We refer to Wikipedia articles on psychology for help. Caplan defines two types of consultation. Client-centered case consultation aims to handle the client's problems, while consultee-centered case consultation aims to identify the reason for the consultee's difficulty in solving problems. The answer is (H)."
  - question: "According to the Individuals with Disabilities Education Improvement Act, which of the following must an educational agency do before it changes the educational placement of a student with a disability? (A) Notify the parents in writing (B) Obtain the child's consent (C) Obtain a court order (D) Conduct a new evaluation of the child's disability (E) Discuss with the child's psychologist (F) Give the child a trial period in the new environment (G) Obtain parental consent (H) Notify the local education authority (I) Arrange a meeting with all teachers and administrators (J) Obtain school board approval"
    target: "Let's think step by step. We refer to Wikipedia articles on psychology for help. When the decision to change the educational placement of a student with a disability is made, the educational agency must notify the parents in writing on that date. The answer is (A)."
  - question: "Ani believes that her attitudes and behavior play a central role in what happens to her. Such a belief is likely to be associated with (A) low self-esteem. (B) a strong id. (C) a high level of anxiety. (D) a strong superego. (E) high levels of self-consciousness. (F) an external locus of control. (G) an inferiority complex. (H) a low level of self-awareness. (I) low self-efficacy. (J) an internal locus of control."
    target: "Let's think step by step. We refer to Wikipedia articles on psychology for help. People with an external locus of control believe fate and luck play an important role in their lives, while people with an internal locus of control believe they control their lives. The answer is (J)."
  - question: "In terms of Hofstede’s (1980) five cultural dimensions, the United States scores at the top of the scale on: (A) individualism and long-term orientation. (B) individualism and power distance. (C) uncertainty avoidance. (D) long-term orientation. (E) individualism. (F) individualism and masculinity. (G) long-term orientation and uncertainty avoidance. (H) power distance. (I) power distance and masculinity. (J) N/A"
    target: "Let's think step by step. We refer to Wikipedia articles on psychology for help. The US scores highest on individualism among the five cultural dimensions. The answer is (E)."
group: mmlu_pro_flan_cot_fewshot_social_sciences
include: _mmlu_pro_flan_cot_fewshot_template_yaml
task: mmlu_pro_flan_cot_fewshot_psychology
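Every `target` in these files closes with the fixed phrase "The answer is (X)." where X is one of the ten option letters A to J. A scorer can rely on that phrase to pull the chosen letter out of a chain-of-thought response. The following is a minimal sketch of such an extraction; the pattern and the helper name `extract_answer` are assumptions for illustration and may differ from whatever filter the template file actually configures.

```python
import re

# Matches the closing phrase used by every target, e.g. "The answer is (H)."
ANSWER_PATTERN = re.compile(r"[Tt]he answer is \(([A-J])\)")


def extract_answer(text):
    """Return the last option letter stated in the text, or None if absent.

    Taking the last match guards against a response that mentions the
    phrase more than once while reasoning.
    """
    matches = ANSWER_PATTERN.findall(text)
    return matches[-1] if matches else None


if __name__ == "__main__":
    target = (
        "Let's think step by step. We refer to Wikipedia articles on "
        "psychology for help. The US scores highest on individualism among "
        "the five cultural dimensions. The answer is (E)."
    )
    print(extract_answer(target))
    print(extract_answer("No final answer is stated here."))
```

Restricting the character class to A through J mirrors the ten-option format of these samples, so a stray "(K)" or a lowercase letter is treated as no answer instead of a wrong one.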