gaoqiong / lm-evaluation-harness
Commit e1ae8a2f, authored Nov 26, 2023 by Herbie Bradley
Merge remote-tracking branch 'origin/big-refactor' into calibration
Parents: 50e99bd7, 30936bc7
Showing 20 changed files with 1123 additions and 0 deletions
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_electrical_engineering.yaml  +34 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_elementary_mathematics.yaml  +41 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_formal_logic.yaml  +53 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_global_facts.yaml  +34 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_biology.yaml  +54 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_chemistry.yaml  +50 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_computer_science.yaml  +71 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_european_history.yaml  +169 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_geography.yaml  +37 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_government_and_politics.yaml  +43 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_macroeconomics.yaml  +37 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_mathematics.yaml  +37 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_microeconomics.yaml  +40 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_physics.yaml  +39 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_psychology.yaml  +48 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_statistics.yaml  +58 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_us_history.yaml  +134 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_world_history.yaml  +82 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_human_aging.yaml  +28 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_human_sexuality.yaml  +34 -0
Too many changes to show. To preserve performance only 1000 of 1000+ files are displayed.
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_electrical_engineering.yaml
0 → 100644
"dataset_name": "electrical_engineering"
"description":
"
\n
The
following
are
multiple
choice
questions
(with
answers)
about
\
\
electrical
engineering.
\n\n
Q:
A
point
pole
has
a
strength
of
4π
*
10^-4
weber.
\
\
The
force
in
newtons
on
a
point
pole
of
4π
*
1.5
*
10^-4
weber
placed
at
a
distance
\
\
of
10
cm
from
it
will
be
\n
(A)
15
N.
(B)
20
N.
(C)
7.5
N.
(D)
3.75
N.
\n
A:
Let's
\
\
think
step
by
step.
The
force
between
two
point
poles
is
given
by
m_1m_2/(mu_0
\
\
4
\\
pi
r^2),
in
analogy
to
Coulomb’s
law.
Plugging
in
the
values
given
in
the
\
\
question,
we
calculate
that
the
force
is
approximately
15
N.
The
answer
is
(A).
\n\
\n
Q:
The
coil
of
a
moving
coil
meter
has
100
turns,
is
40
mm
long
and
30
mm
wide.
\
\
The
control
torque
is
240*10^-6
N-m
on
full
scale.
If
magnetic
flux
density
is
\
\
1 Wb/m^2,
range
of
meter
is
\n
(A)
1
mA.
(B)
2
mA.
(C)
3
mA.
(D)
4
mA.
\n
A:
Let's
think
\
\
step
by
step.
The
torque
on
a
coil
in
a
uniform
magnetic
field
is
given
by
BANI,
\
\
where
B
is
the
magnetic
flux
density,
A
is
the
area
of
the
coil,
N
is
the
number
\
\
of
turns,
and
I
is
the
current.
So
we
have
that
I
=
(Torque)/(BAN),
or
240e-6/(1200e-6
\
\
*
100
*
1)
=
2e-3.
The
answer
is
(B).
\n\n
Q:
In
an
SR
latch
built
from
NOR
gates,
\
\
which
condition
is
not
allowed
\n
(A)
S=0,
R=0
(B)
S=0,
R=1
(C)
S=1,
R=0
(D)
S=1,
\
\
R=1
\n
A:
Let's
think
step
by
step.
An
SR
latch
is
a
set-reset
latch;
in
the
case
\
\
where
S=1
and
R=1,
the
circuit
has
no
stable
state;
instead
a
race
condition
will
\
\
be
produced
within
the
circuit,
so
the
device
will
be
in
an
undefined
state.
So
\
\
S=1,
R=1
is
an
illegal
input.
The
answer
is
(D).
\n\n
Q:
Two
long
parallel
conductors
\
\
carry
100
A.
If
the
conductors
are
separated
by
20
mm,
the
force
per
meter
of
\
\
length
of
each
conductor
will
be
\n
(A)
100
N.
(B)
0.1
N.
(C)
1
N.
(D)
0.01
N.
\n\
A:
Let's
think
step
by
step.
The
magnetic
force-per-length
between
two
current-carrying
\
\
conductors
is
given
by
\\
mu_0
I_1
I_2
/
(2
\\
pi
r),
where
$r$
is
the
separation
\
\
distance
and
I_1
and
I_2
are
the
currents.
Plugging
in
100
A
for
I_1
and
I_2,
\
\
and
20
mm
for
r,
gives
0.1
N.
The
answer
is
(B).
\n\n
Q:
In
a
2
pole
lap
winding
\
\
dc
machine
,
the
resistance
of
one
conductor
is
2Ω
and
total
number
of
conductors
\
\
is
100.
Find
the
total
resistance
\n
(A)
200Ω
(B)
100Ω
(C)
50Ω
(D)
10Ω
\n
A:
Let's
\
\
think
step
by
step.
In
lap
winding,
effectively
two
resistors
are
connected
in
\
\
parallel,
so
the
actual
resistance
of
each
pair
is
1
Ohm.
Since
we
have
50
pairs,
\
\
we
get
a
total
resistance
of
50
Ohms.
The
answer
is
(C)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_electrical_engineering"
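The arithmetic in the electrical engineering few-shot answers above can be sanity-checked numerically. The following Python sketch is not part of the commit. The function names are hypothetical, `MU_0` is assumed to be the classical value 4π × 10^-7 H/m, and the lap-winding helper assumes a simplex lap winding (as many parallel paths as poles), which is a different route to 50 Ω than the pairing argument used in the prompt.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # assumed vacuum permeability, H/m


def pole_force(m1, m2, r):
    """Force in newtons between two point poles: m1*m2 / (mu_0 * 4*pi * r^2)."""
    return m1 * m2 / (MU_0 * 4 * math.pi * r ** 2)


def full_scale_current(torque, flux_density, area, turns):
    """Solve torque = B * A * N * I for the current I, in amperes."""
    return torque / (flux_density * area * turns)


def force_per_metre(i1, i2, r):
    """Force per unit length between parallel conductors: mu_0*I1*I2 / (2*pi*r)."""
    return MU_0 * i1 * i2 / (2 * math.pi * r)


def lap_winding_resistance(r_conductor, n_conductors, poles):
    """Total resistance, assuming a simplex lap winding (parallel paths = poles)."""
    per_path = r_conductor * n_conductors / poles  # conductors in series per path
    return per_path / poles                        # identical paths in parallel


if __name__ == "__main__":
    # Values taken from the four numeric questions in the prompt.
    print(pole_force(4 * math.pi * 1e-4, 4 * math.pi * 1.5e-4, 0.10))
    print(full_scale_current(240e-6, 1.0, 40e-3 * 30e-3, 100))
    print(force_per_metre(100, 100, 20e-3))
    print(lap_winding_resistance(2, 100, 2))
```

Up to floating-point rounding, the four printed values should agree with the answers given in the prompt (15 N, 2 mA, 0.1 N and 50 Ω).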
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_elementary_mathematics.yaml
0 → 100644
"dataset_name": "elementary_mathematics"
"description":
"
The
following
are
multiple
choice
questions
(with
answers)
about
elementary
\
\
mathematics.
\n\n
Q:
Olivia
used
the
rule
\"
Add
11
\"
to
create
the
number
pattern
\
\
shown
below.
10,
21,
32,
43,
54.
Which
statement
about
the
number
pattern
is
true?
\n\
(A)
The
10th
number
in
the
pattern
will
be
an
even
number.
\n
(B)
The
number
pattern
\
\
will
never
have
two
even
numbers
next
to
each
other.
\n
(C)
The
next
two
numbers
\
\
in
the
pattern
will
be
an
even
number
then
an
odd
number.
\n
(D)
If
the
number
pattern
\
\
started
with
an
odd
number
then
the
pattern
would
have
only
odd
numbers
in
it.
\n\
A:
Let's
think
step
by
step.
Choice
A
is
incorrect
because
every
even-numbered
term
\
\
in
the
pattern
is
odd,
and
10
is
an
even
number.
Choice
B
is
correct,
because
\
\
adding
an
odd
number
(in
this
case
11)
to
an
odd
number
produces
an
even
number,
\
\
and
adding
an
odd
number
to
an
even
number
produces
an
odd
number.
Thus
the
terms
\
\
in
the
pattern
will
alternate
between
odd
and
even,
so
there
will
never
be
two
\
\
even
numbers
next
to
each
other.
Choice
C
is
incorrect
because
the
last
term
in
\
\
the
example
is
even
(54),
and
we
know
that
the
terms
will
alternate
between
even
\
\
and
odd.
Choice
D
is
incorrect
because
the
terms
in
the
pattern
will
alternate
\
\
between
odd
and
even,
regardless
of
the
value
of
the
first
term.
The
answer
is
\
\
(B).
\n\n
Q:
The
population
of
the
city
where
Michelle
was
born
is
145,826.
What
\
\
is
the
value
of
the
5
in
the
number
145,826?
\n
(A)
5
thousands
\n
(B)
5
hundreds
\n\
(C)
5
tens
\n
(D)
5
ones
\n
A:
Let's
think
step
by
step.
Choice
A
is
correct,
because
\
\
there
are
three
digits
following
the
5,
so
the
5
is
in
the
thousands
place.
Thus
\
\
the
other
choices
are
incorrect.
The
answer
is
(A).
\n\n
Q:
A
store
sells
107
different
\
\
colors
of
paint.
They
have
25
cans
of
each
color
in
storage.
The
number
of
cans
\
\
of
paint
the
store
has
in
storage
can
be
found
using
the
expression
below.
107
\
\
×
25.
How
many
cans
of
paint
does
the
store
have
in
storage?
\n
(A)
749
\n
(B)
2,675
\n\
(C)
2,945
\n
(D)
4,250
\n
A:
Let's
think
step
by
step.
We
can
calculate
107
x
25
=
(100
\
\
x
25)
+
(7
x
25)
=
2500
+
175
=
2675.
The
answer
is
(B).
\n\n
Q:
A
total
of
30
players
\
\
will
play
basketball
at
a
park.
There
will
be
exactly
5
players
on
each
team.
\
\
Which
statement
correctly
explains
how
to
find
the
number
of
teams
needed?
\n
(A)
\
\
Add
5
to
30
to
find
35
teams.
\n
(B)
Divide
30
by
5
to
find
6
teams.
\n
(C)
Multiply
\
\
30
and
5
to
find
150
teams.
\n
(D)
Subtract
5
from
30
to
find
25
teams.
\n
A:
Let's
\
\
think
step
by
step.
We
want
to
find
the
number
of
teams.
We
know
that
there
are
\
\
5
players/team,
and
30
players.
Thus
to
get
the
number
of
teams
we
divide
players
\
\
by
players/team,
so
30
players
/
5
players/team
=
6
teams.
The
answer
is
(B).
\n\
\n
Q:
Which
expression
is
equivalent
to
5
x
9?
\n
(A)
(5
x
4)
x
(6
x
5)
\n
(B)
(5
x
5)
\
\
+
(5
x
4)
\n
(C)
(5
x
5)
+
(5
x
9)
\n
(D)
(5
x
9)
x
(6
x
9)
\n
A:
Let's
think
step
by
\
\
step.
We
know
that
9
=
(5
+
4),
so
5
x
9
=
5
x
(5
+
4)
=
(5
x
5)
+
(5
x
4).
The
\
\
answer
is
(B)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_elementary_mathematics"
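The reasoning in the elementary mathematics answers above is easy to verify mechanically. The sketch below is illustrative only and not part of the commit; the helper names `add_pattern`, `has_adjacent_evens` and `digit_place_value` are hypothetical.

```python
def add_pattern(start, step, count):
    """First `count` terms of the pattern that starts at `start` and adds `step`."""
    return [start + step * k for k in range(count)]


def has_adjacent_evens(terms):
    """True if any two neighbouring terms are both even."""
    return any(a % 2 == 0 and b % 2 == 0 for a, b in zip(terms, terms[1:]))


def digit_place_value(number, index_from_right):
    """Value contributed by the digit at a 0-based position counted from the right."""
    digit = (number // 10 ** index_from_right) % 10
    return digit * 10 ** index_from_right


if __name__ == "__main__":
    terms = add_pattern(10, 11, 10)
    print(terms[:5])                      # the five terms listed in the question
    print(terms[9] % 2 == 0)              # is the 10th term even? (choice A)
    print(has_adjacent_evens(terms))      # choice B claims this never happens
    print(digit_place_value(145826, 3))   # value of the 5 in 145,826
    print(107 * 25 == 100 * 25 + 7 * 25)  # distributive split used in the answer
    print(30 // 5)                        # number of teams
    print(5 * 9 == 5 * 5 + 5 * 4)         # choice B of the last question
```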
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_formal_logic.yaml
0 → 100644
"dataset_name": "formal_logic"
"description":
"
The
following
are
multiple
choice
questions
(with
answers)
about
formal
\
\
logic.
\n\n
Q:
Which
of
the
given
formulas
of
PL
is
the
best
symbolization
of
the
\
\
following
sentence?
\n
Turtles
live
long
lives
and
are
happy
creatures,
unless
they
\
\
are
injured.
\n
(A)
(L
•
H)
≡
I
(B)
(L
•
H)
∨
I
(C)
L
•
(H
∨
I)
(D)
L
•
(H
⊃
R).
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
formal
logic
for
\
\
help.
Let’s
solve
this
step
by
step.
Let
“L”
denote
“living
long”,
H
“being
happy”,
\
\
and
“I”
“being
injured”.
Now,
consider
each
choice:
\n
(A)
means
(living
long
AND
\
\
being
happy)
is
equivalent
to
(being
injured).
\n
(B)
means
(living
long
AND
being
\
\
happy)
OR
(being
injured).
\n
(C)
means
(living
long)
AND
(being
happy
OR
being
\
\
injured).
\n
(D)
means
(living
long)
AND
(being
happy
implies
being
R),
but
what
\
\
R
denotes
is
not
clear.
\n
Obviously,
(B)
is
the
best
symbolization
of
the
original
\
\
sentence.
The
answer
is
(B).
\n\n
Q:
Select
the
best
translation
into
predicate
\
\
logic.
George
borrows
Hector's
lawnmower.
(g:
George;
h:
Hector;
l:
Hector's
lawnmower;
\
\
Bxyz:
x
borrows
y
from
z).
\n
(A)
Blgh
(B)
Bhlg
(C)
Bglh
(D)
Bghl
\n
A:
Let's
think
\
\
step
by
step.
We
refer
to
Wikipedia
articles
on
formal
logic
for
help.
Let’s
solve
\
\
this
step
by
step.
We
are
told
that
“Bxyz”
means
“x
borrows
y
from
z”.
We
can
\
\
rewrite
“George
borrows
Hector's
lawnmower”
as
“George
borrows
a
lawnmower
from
\
\
Hector”,
which
can
then
be
translated
into
predicate
logic
as
“Bglh”.
The
answer
\
\
“Bglh”
appears
in
(C);
therefore,
(C)
must
be
the
correct
answer.
The
answer
is
\
\
(C).
\n\n
Q:
\n
Select
the
best
English
interpretation
of
the
given
arguments
in
\
\
predicate
logic.
\n
Dm
\n
(∀x)(Wx
⊃
~Dx).
\n
(∀x)Wx
∨
Ag
\t
/
(∃x)Ax
\n
(A)
Marina
is
a
\
\
dancer.
Some
weaklings
are
not
dancers.
Either
everything
is
a
weakling
or
Georgia
\
\
plays
volleyball.
So
something
plays
volleyball.
(B)
Marina
is
a
dancer.
No
weakling
\
\
is
a
dancer.
Everything
is
either
a
weakling
or
plays
volleyball.
So
something
\
\
plays
volleyball.
(C)
Marina
is
a
dancer.
Some
weaklings
are
not
dancers.
Everything
\
\
is
either
a
weakling
or
plays
volleyball.
So
something
plays
volleyball.
(D)
Marina
\
\
is
a
dancer.
No
weakling
is
a
dancer.
Either
everything
is
a
weakling
or
Georgia
\
\
plays
volleyball.
So
something
plays
volleyball.
\n
A:
Let's
think
step
by
step.
\
\
We
refer
to
Wikipedia
articles
on
formal
logic
for
help.
Let’s
solve
this
step
\
\
by
step.
Let
“D”
denote
“being
a
dancer”,
“m”
denote
“Marina”,
“g”
denote
“Georgia”,
\
\
“W”
denote
“weakling”,
“A”
denote
“playing
volleyball”.
Then,
we
have
the
following:
\n\
1.
Dm
→
Marina
is
a
dancer.
\n
2.
(∀x)(Wx
⊃
~Dx).
→
For
all
x,
if
x
is
a
weakling,
then
\
\
x
is
not
a
dancer.
In
other
words,
no
weakling
is
a
dancer.
\n
3.
(∀x)Wx
∨
Ag
\t\
/
(∃x)Ax
→
For
all
x,
x
is
a
weakling
or
Georgia
plays
volleyball.
So
there
exists
\
\
an
x
that
plays
volleyball.
\n
Options
(A)
and
(C)
do
claim
that
some
weaklings
\
\
are
not
dancers,
but
the
second
argument
strongly
states
that
no
weakling
is
a
\
\
dancer.
Thus,
we
can
eliminate
them.
Option
(B)
omits
the
important
detail
about
\
\
Georgia
playing
volleyball.
Option
(D)
has
all
the
details
presented
in
the
arguments
\
\
and
is
the
best
English
interpretation
of
the
arguments.
The
answer
is
(D).
\n\n\
Q:
Select
the
best
translation
into
predicate
logic:
No
people
drive
on
Mars.
\n\
(A)
~Pd
(B)
(∀x)(Px
∨
~Dx)
(C)
(∀x)(Px
⊃
~Dx)
(D)
~Dp
\n
A:
Let's
think
step
by
step.
\
\
We
refer
to
Wikipedia
articles
on
formal
logic
for
help.
Let’s
solve
this
step
\
\
by
step.
Let
“P”
denote
“being
on
Mars”
and
“D”
denote
“driving
on
Mars”.
Then
\
\
let’s
consider
each
option:
\n
Option
(A):
~Pd
→
d
is
not
on
Mars.
\n
Option
(B):
\
\
(∀x)(Px
∨
~Dx)
→
For
all
x,
x
is
on
Mars
or
x
does
not
drive
on
Mars.
\n
Option
(C):
\
\
(∀x)(Px
⊃
~Dx)
→
For
all
x,
x
is
on
Mars
implies
that
x
does
not
drive
on
Mars.
\n\
Option
(D):
~Dp
→
p
does
not
drive
on
Mars.
\n
Of
all
these
options,
Option
(C)
appears
\
\
to
be
the
best
and
most
meaningful
interpretation
of
the
argument
“No
people
drive
\
\
on
Mars.”
The
answer
is
(C)."
"group": "mmlu_flan_cot_fewshot_humanities"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_formal_logic"
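For the last question in the formal logic prompt above ("No people drive on Mars"), options (B) and (C) can be compared by brute force over small finite models. This is a sketch under stated assumptions, not part of the commit: it reads `P` as "is a person" and `D` as "drives on Mars" (the prompt's own gloss of `P` differs), all function names are hypothetical, and checking domains of size 3 illustrates the difference without proving it for every domain.

```python
from itertools import product


def models(domain_size):
    """Yield every assignment of the unary predicates P and D over the domain."""
    for p in product([False, True], repeat=domain_size):
        for d in product([False, True], repeat=domain_size):
            yield p, d


def no_people_drive(p, d):
    """Intended meaning: there is no x with both P(x) and D(x)."""
    return not any(px and dx for px, dx in zip(p, d))


def option_b(p, d):
    """(forall x)(Px or not Dx)."""
    return all(px or not dx for px, dx in zip(p, d))


def option_c(p, d):
    """(forall x)(Px implies not Dx), written as (not Px) or (not Dx)."""
    return all((not px) or (not dx) for px, dx in zip(p, d))


def equivalent(f, g, domain_size=3):
    """True if f and g agree on every model of the given size."""
    return all(f(p, d) == g(p, d) for p, d in models(domain_size))


if __name__ == "__main__":
    print(equivalent(option_c, no_people_drive))
    print(equivalent(option_b, no_people_drive))
```

Under this reading, option (C) agrees with the intended meaning on every model checked. Option (B) does not: a one-element model in which the individual is a person who drives on Mars satisfies (B) but not the sentence.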
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_global_facts.yaml
0 → 100644
"dataset_name": "global_facts"
"description":
"
The
following
are
multiple
choice
questions
(with
answers)
about
global
\
\
facts.
\n\n
Q:
As
of
2017,
how
many
of
the
world’s
1-year-old
children
today
have
\
\
been
vaccinated
against
some
disease?
*
\n
(A)
80%
(B)
60%
(C)
40%
(D)
20%
\n
A:
Let's
\
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
global
facts
for
help.
According
\
\
to
data
published
by
the
World
Health
Organization,
the
number
of
1-year-old
\
\
children
vaccinated
in
2017
exceeds
80%.
The
answer
is
(A).
\n\n
Q:
As
of
2019,
\
\
about
what
percentage
of
Americans
agree
that
the
state
is
run
for
the
benefit
\
\
of
all
the
people?
\n
(A)
31%
(B)
46%
(C)
61%
(D)
76%
\n
A:
Let's
think
step
by
step.
\
\
We
refer
to
Wikipedia
articles
on
global
facts
for
help.
In
2019,
about
46%
\
\
of
Americans
agree
that
the
state
is
run
for
the
benefit
of
all
the
people.
The
\
\
answer
is
(B).
\n\n
Q:
As
of
2019,
about
what
percentage
of
Russians
say
it
is
very
\
\
important
to
have
free
media
in
our
country
without
government/state
censorship?
\n\
(A)
38%
(B)
53%
(C)
68%
(D)
83%
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
global
facts
for
help.
As
of
2019,
about
38%
of
Russians
say
it
is
\
\
very
important
to
have
free
media
in
our
country.
The
answer
is
(A).
\n\n
Q:
As
\
\
of
2015,
since
1990
forests
have
____
in
Europe
and
have
____
in
Africa
and
the
\
\
Americas.
\n
(A)
increased,
increased
(B)
increased,
decreased
(C)
decreased,
increased
\
\
(D)
decreased,
decreased
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
\
on
global
facts
for
help.
As
of
2015,
since
1990
forests
have
increased
in
Europe
\
\
and
have
decreased
in
Africa
and
the
Americas.
The
answer
is
(B).
\n\n
Q:
Which
\
\
of
the
following
pairs
of
statements
are
both
true
(as
of
2019)?
\n
(A)
People
tend
\
\
to
be
optimistic
about
their
own
future
and
the
future
of
their
nation
or
the
\
\
world.
(B)
People
tend
to
be
optimistic
about
their
own
future
but
pessimistic
\
\
about
the
future
of
their
nation
or
the
world.
(C)
People
tend
to
be
pessimistic
\
\
about
their
own
future
but
optimistic
about
the
future
of
their
nation
or
the
\
\
world.
(D)
People
tend
to
be
pessimistic
about
their
own
future
and
the
future
\
\
of
their
nation
or
the
world.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
global
facts
for
help.
As
of
2019,
most
people
tend
to
be
optimistic
\
\
about
their
own
future
but
pessimistic
about
the
future
of
their
nation
or
the
\
\
world.
The
answer
is
(B)."
"group": "mmlu_flan_cot_fewshot_other"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_global_facts"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_biology.yaml
0 → 100644
"dataset_name": "high_school_biology"
"description":
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
biology.
\n\n
Q:
In
animal
cells,
which
of
the
following
represents
the
most
\
\
likely
pathway
that
a
secretory
protein
takes
as
it
is
synthesized
in
a
cell?
\n\
(A)
Plasma
membrane–Golgi
apparatus–ribosome–secretory
vesicle–rough
ER
(B)
Ribosome–Golgi
\
\
apparatus–rough
ER–secretory
vesicle–plasma
membrane
(C)
Plasma
membrane–Golgi
\
\
apparatus–ribosome–secretory
vesicle–rough
ER
(D)
Ribosome–rough
ER–Golgi
apparatus–secretory
\
\
vesicle–plasma
membrane
\n
A:
Let's
think
step
by
step.
Protein
synthesis
starts
\
\
at
the
ribosome,
so
we
can
eliminate
(A)
and
(C).
The
ribosome
is
often
in
the
\
\
endoplasmic
reticulum
and
moves
from
there
to
the
Golgi
apparatus,
where
it
is
\
\
modified
and
packaged
into
a
vesicle.
The
vesicle
then
floats
to
the
plasma
membrane
\
\
and
is
secreted.
The
answer
is
(D).
\n\n
Q:
A
mutation
in
a
bacterial
enzyme
changed
\
\
a
previously
polar
amino
acid
into
a
nonpolar
amino
acid.
This
amino
acid
was
\
\
located
at
a
site
distant
from
the
enzyme’s
active
site.
How
might
this
mutation
\
\
alter
the
enzyme’s
substrate
specificity?
\n
(A)
By
changing
the
enzyme’s
pH
optimum
\
\
(B)
By
changing
the
enzyme’s
location
in
the
cell
(C)
By
changing
the
shape
of
\
\
the
protein
(D)
An
amino
acid
change
away
from
the
active
site
cannot
alter
the
\
\
enzyme’s
substrate
specificity.
\n
A:
Let's
think
step
by
step.
A
change
in
an
amino
\
\
acid
leads
to
a
change
in
the
primary
structure
of
the
protein.
A
change
in
the
\
\
primary
structure
may
lead
to
a
change
in
the
secondary
and
the
tertiary
structure
\
\
of
the
protein.
A
change
in
the
tertiary
structure
means
a
change
in
the
shape
\
\
of
the
protein,
so
(C)
has
to
be
correct.
Since
the
change
does
not
affect
the
\
\
active
site
of
the
enzyme,
we
do
not
expect
the
activity
of
the
enzyme
to
be
affected.
\
\
The
answer
is
(C).
\n\n
Q:
Which
of
the
following
is
not
a
way
to
form
recombinant
\
\
DNA?
\n
(A)
Translation
(B)
Conjugation
(C)
Specialized
transduction
(D)
Transformation
\n\
A:
Let's
think
step
by
step.
The
introduction
of
foreign
DNA
or
RNA
into
bacteria
\
\
or
eukaryotic
cells
is
a
common
technique
in
molecular
biology
and
scientific
\
\
research.
There
are
multiple
ways
foreign
DNA
can
be
introduced
into
cells
including
\
\
transformation,
transduction,
conjugation,
and
transfection.
In
contrast,
(A)
\
\
is
not
a
way
to
form
DNA:
during
translation
the
ribosomes
synthesize
proteins
\
\
from
RNA.
The
answer
is
(A).
\n\n
Q:
Homologous
structures
are
often
cited
as
evidence
\
\
for
the
process
of
natural
selection.
All
of
the
following
are
examples
of
homologous
\
\
structures
EXCEPT
\n
(A)
the
wings
of
a
bird
and
the
wings
of
a
bat
(B)
the
flippers
\
\
of
a
whale
and
the
arms
of
a
man
(C)
the
pectoral
fins
of
a
porpoise
and
the
flippers
\
\
of
a
seal
(D)
the
forelegs
of
an
insect
and
the
forelimbs
of
a
dog
\n
A:
Let's
think
\
\
step
by
step.
Homologous
structures
are
similar
physical
features
in
organisms
\
\
that
share
a
common
ancestor
but
different
functions.
Comparisons
(B)
and
(C)
\
\
are
clearly
homologous
because
they
share
a
common
ancestor
and
the
structures
\
\
serve
different
purposes.
Bat
wings
and
bird
wings
are
also
homologous,
while
\
\
they
are
both
wings,
the
forelimbs
serve
different
purposes.
Insects
and
dogs
\
\
are
very
distant
relatives
since
one
is
vertebrate
while
the
other
is
invertebrate
\
\
and
the
forelimbs
serve
the
same
purpose,
so
they
are
not
homologous.
The
answer
\
\
is
(D).
\n\n
Q:
Which
of
the
following
is
not
known
to
be
involved
in
the
control
\
\
of
cell
division?
\n
(A)
Cyclins
(B)
Protein
kinases
(C)
Checkpoints
(D)
Fibroblast
\
\
cells
\n
A:
Let's
think
step
by
step.
Normal
cells
move
through
the
cell
cycle
in
\
\
a
regulated
way.
At
the
checkpoint
stage,
they
use
information
about
their
own
\
\
internal
state
and
cues
from
the
environment
around
them
to
decide
whether
to
\
\
proceed
with
cell
division.
Cues
like
these
act
by
changing
the
activity
of
core
\
\
cell
cycle
regulators
inside
the
cell.
The
most
common
regulators
are
cyclins
\
\
and
cyclin-dependent
kinases.
Fibroblast
cells
do
not
play
any
role
in
cell
division.
\
\
The
answer
is
(D)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_biology"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_chemistry.yaml
0 → 100644
"dataset_name": "high_school_chemistry"
"description":
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
chemistry.
\n\n
Q:
Which
of
the
following
is
considered
an
acid
anhydride?
\n\
(A)
HCl
(B)
H2SO3
(C)
SO2
(D)
Al(NO3)3
\n
A:
Let's
think
step
by
step.
An
acid
anhydride
\
\
is
a
compound
that
is
derived
by
removing
water
from
an
acid.
The
chemical
formula
\
\
for
water
is
H2O,
which
means
that
we
need
to
determine
which
of
these
options,
\
\
when
combined
with
H2O,
forms
an
acid.
SO2,
or
Sulfur
dioxide,
when
combined
with
\
\
H2O,
makes
H2SO3,
or
sulfurous
acid.
The
answer
is
(C).
\n\n
Q:
Which
of
the
following
\
\
is
expected
to
be
a
polar
molecule?
\n
(A)
PCl4F
(B)
BF3
(C)
CO2
(D)
Si(CH3)4
\n\
A:
Let's
think
step
by
step.
A
polar
molecule
is
one
that
has
a
slightly
positive
\
\
charge
on
one
end
of
the
molecule
and
a
slightly
negative
charge
on
the
other
\
\
end.
Boron
trifluoride
(BF3)
has
Boron
as
the
center
atom
and
three
fluorine
atoms
\
\
attached
to
it;
it
is
trigonal
planar
and
symmetric,
so
it
is
nonpolar.
Carbon
\
\
Dioxide
(CO2)
has
Carbon
as
the
central
atom
with
double
bonds
to
two
Oxygen
atoms
\
\
-
this
is
also
symmetrical
and
therefore
nonpolar.
The
same
is
the
case
for
tetramethyl
\
\
silane
(Si(CH3)4),
which
is
a
Silicon
atom
surrounded
by
four
methyl
groups.
The
\
\
structure
of
PCl4F
is
that
Phosphorus
is
the
central
atom,
attached
to
four
chlorines
\
\
and
one
fluorine
atom.
This
is
asymmetrical,
and
therefore
has
a
net
dipole
and
\
\
is
expected
to
be
a
polar
molecule.
The
answer
is
(A).
\n\n
Q:
From
the
solubility
\
\
rules,
which
of
the
following
is
true?
\n
(A)
All
chlorides,
bromides,
and
iodides
\
\
are
soluble
(B)
All
sulfates
are
soluble
(C)
All
hydroxides
are
soluble
(D)
All
\
\
ammonium-containing
compounds
are
soluble
\n
A:
Let's
think
step
by
step.
The
chlorides,
\
\
bromides,
and
iodides
of
lead,
silver,
and
mercury
are
not
soluble
in
water.
This
\
\
rules
out
(A).
The
sulfates
of
lead,
barium,
and
calcium
are
not
soluble
in
water,
\
\
which
rules
out
(B).
The
hydroxides
of
any
metal
besides
sodium,
potassium,
ammonium,
\
\
calcium,
and
barium
are
insoluble.
This
rules
out
(C).
Typically
ammonium
ions
\
\
indicate
a
soluble
ionic
substance.
The
answer
is
(D).
\n\n
Q:
A
new
compound
is
\
\
synthesized
and
found
to
be
a
monoprotic
acid
with
a
molar
mass
of
248
g/mol.
\
\
When
0.0050
mol
of
this
acid
are
dissolved
in
0.500
L
of
water,
the
pH
is
measured
\
\
as
3.89.
What
is
the
pKa
of
this
acid?
\n
(A)
3.89
(B)
7.78
(C)
5.78
(D)
2.33
\n\
A:
Let's
think
step
by
step.
Recall
that
$[A^{-}] = [H^{+}]$. Here, this is equal to $10^{-3.89}$.
Then we have $K_{a} = \\frac{[H^{+}][A^{-}]}{[HA]} = \\frac{10^{-3.89} \\cdot 10^{-3.89}}{10^{-2}}$.
The resulting exponent is $-3.89 + (-3.89) - (-2) = -5.78$, therefore $K_a = 10^{-5.78}$.
The
$pK_a$
is
the
negative
log
of
$K_a$,
\
\
which
is
equal
to
$5.78$.
The
answer
is
(C).
\n\n
Q:
A
solution
contains
2.00
mole
\
\
of
acetic
acid,
CH3COOH,
and
1.00
mole
of
calcium
acetate,
Ca(CH3COO)2.
The
solution
\
\
is
able
to
resist
the
addition
of
a
small
amount
of
strong
acid
or
strong
base
\
\
with
only
minor
changes
in
the
pH
of
the
solution.
Larger
quantities
of
strong
\
\
acid
or
strong
base
can
cause
a
significant
change
in
pH.
How
many
moles
of
nitric
\
\
acid,
HNO3,
may
be
added
before
the
pH
begins
to
change
significantly?
\n
(A)
0.500
\
\
mole
(B)
1.00
mole
(C)
2.00
mole
(D)
3.00
mole
\n
A:
Let's
think
step
by
step.
We
\
\
would
like
to
compute
the
buffer
capacity
of
this
solution.
First
we
write
the
\
\
equation
for
the
ionization
of
the
weak
acid,
in
this
case
of
acetic
acid.
$CH_{3}COOH (aq) + H_{2}O \\rightarrow H_{3}O^{+} + CH_{3}COO^{-}$.
The
conjugate
base
is
therefore
\
\
the
acetate
ion.
The
added
strong
acid,
Nitric
acid,
will
react
with
the
conjugate
\
\
base.
Therefore
the
maximum
amount
of
acid
that
can
be
added
will
be
equal
to
\
\
the
amount
of
acetate
ion,
or
2
moles.
The
answer
is
(C)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_chemistry"
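The two quantitative chemistry answers above (the pKa of the monoprotic acid and the amount of strong acid the acetate buffer can absorb) reduce to short calculations. The Python sketch below is illustrative and not part of the commit. The function names are hypothetical, and the pKa helper makes the same simplifying assumption as the prompt: the undissociated acid concentration is taken to be the full analytical concentration.

```python
import math


def pka_from_ph(ph, moles_acid, volume_litres):
    """pKa of a weak monoprotic acid from a measured pH.

    Assumes [A-] = [H+] and [HA] approximately equal to moles_acid / volume.
    """
    h = 10 ** (-ph)                    # [H+] = [A-]
    ha = moles_acid / volume_litres    # [HA], dissociation neglected
    ka = h * h / ha
    return -math.log10(ka)


def max_strong_acid_moles(moles_salt, anions_per_formula_unit):
    """Moles of strong acid the conjugate base can neutralise."""
    return moles_salt * anions_per_formula_unit


if __name__ == "__main__":
    print(round(pka_from_ph(3.89, 0.0050, 0.500), 2))
    # Ca(CH3COO)2 releases two acetate ions per formula unit.
    print(max_strong_acid_moles(1.00, 2))
```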
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_computer_science.yaml
0 → 100644
"dataset_name": "high_school_computer_science"
"description":
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
computer
science.
\n\n
Q:
Which
of
the
following
is
an
example
of
the
use
\
\
of
a
device
on
the
Internet
of
Things
(IoT)
?
\n
(A)
A
car
alerts
a
driver
that
\
\
it
is
about
to
hit
an
object.
(B)
A
hiker
uses
a
GPS
watch
to
keep
track
of
\
\
her
position.
(C)
A
refrigerator
orders
milk
from
an
online
delivery
service
when
\
\
the
milk
in
the
refrigerator
is
almost
gone.
(D)
A
runner
uses
a
watch
with
optical
\
\
sensors
to
monitor
his
heart
rate.
\n
A:
Let's
think
step
by
step.
The
term
Internet
\
\
of
Things
(IoT)
refers
to
common
devices
which
are
connected
to
the
internet,
\
\
enabling
new
functionality.
Choice
A
is
incorrect
because
it
does
not
describe
\
\
an
internet
connected
device.
In
choice
B,
the
watch
is
only
described
as
having
\
\
GPS
functionality
but
no
internet
connectivity.
Choice
C
describes
a
common
device
\
\
(a
refrigerator)
which
has
internet
connectivity
enabling
new
functionality
(online
\
\
ordering).
Choice
D
does
not
mention
internet
connectivity
for
the
watch,
only
\
\
optical
sensors.
The
answer
is
(C).
\n\n
Q:
Many
Web
browsers
allow
users
to
open
\
\
anonymous
windows.
During
a
browsing
session
in
an
anonymous
window,
the
browser
\
\
does
not
record
a
browsing
history
or
a
list
of
downloaded
files.
When
the
anonymous
\
\
window
is
exited,
cookies
created
during
the
session
are
deleted.
Which
of
the
\
\
following
statements
about
browsing
sessions
in
an
anonymous
window
is
true?
\n\
(A)
The
activities
of
a
user
browsing
in
an
anonymous
window
will
not
be
visible
\
\
to
people
who
monitor
the
user's
network,
such
as
the
system
administrator.
(B)
\
\
Items
placed
in
a
Web
store's
shopping
cart
for
future
purchase
during
the
anonymous
\
\
browsing
session
will
not
be
saved
on
the
user's
computer.
(C)
A
user
will
not
\
\
be
able
to
log
in
to
e-mail
or
social
media
accounts
during
the
anonymous
browsing
\
\
session.
(D)
A
user
browsing
in
an
anonymous
window
will
be
protected
from
viruses
\
\
launched
from
any
web
sites
visited
or
files
downloaded.
\n
A:
Let's
think
step
\
\
by
step.
Choice
A
is
incorrect
as
it
only
describes
network
traffic,
which
an
\
\
anonymous
browser
does
not
change.
Choice
B
is
correct
as
it
correctly
describes
\
\
how
an
anonymous
browser
will
prevent
saving
data
on
the
user’s
computer
after
\
\
the
session
is
ended.
Choice
C
is
incorrect
because
an
anonymous
browser
will
\
\
not
prevent
logging
in
to
email
or
social
media
accounts.
Choice
D
is
incorrect
\
\
because
an
anonymous
browser
in
itself
performs
no
virus
protection.
The
answer
\
\
is
(B).
\n\n
Q:
In
the
program
below,
the
initial
value
of
X
is
5
and
the
initial
\
\
value
of
Y
is
10.
\n
IF
(X
<
0){
\n
DISPLAY
(
\"
Foxtrot
\"
)
\n
}
ELSE
{
\n
IF
(X
>
Y){
\n\
\
DISPLAY
(
\"
Hotel
\"
)
\n
}
ELSE
{
\n
IF
(Y
>
0){
\n
DISPLAY
(
\"
November
\"
)
\n
}
\
\
ELSE
{
\n
DISPLAY
(
\"
Yankee
\"
)
\n
}
\n
}
\n
}
\n
What
is
displayed
as
a
result
of
\
\
running
the
program?
\n
(A)
Foxtrot
(B)
Hotel
(C)
November
(D)
Yankee
\n
A:
Let's
\
\
think
step
by
step.
Because
X
has
the
value
5,
the
first
conditional
IF
(X
<
0)
\
\
is
false,
so
we
move
to
the
first
ELSE
clause.
Because
X
is
5
and
Y
is
10,
the
\
\
second
conditional
IF
(X
>
Y)
is
false,
so
we
move
to
the
following
ELSE
clause.
\
\
Since
Y
is
10,
the
conditional
IF
(Y
>
0)
is
true,
so
the
command
DISPLAY
(
\"\
November
\"
)
is
executed.
The
answer
is
(C).
\n\n
Q:
What
is
the
output
of
\"
abc
\"\
[::-1]
in
Python
3?
\n
(A)
Error
(B)
abc
(C)
cba
(D)
c
\n
A:
Let's
think
step
by
step.
\
\
We
know
that
the
slicing
operator
[::-1]
takes
all
of
the
elements
in
the
string
\
\
in
reverse
order,
so
we
reverse
the
order
of
the
string
\"
abc
\"
,
resulting
in
\
\ \"
cba
\"
.
The
answer
is
(C).
\n\n
Q:
A
list
of
numbers
has
n
elements,
indexed
from
\
\
1
to
n.
The
following
algorithm
is
intended
to
display
the
number
of
elements
\
\
in
the
list
that
have
a
value
greater
than
100.
The
algorithm
uses
the
variables
\
\
count
and
position.
Steps
3
and
4
are
missing.
\n
Step
1:
Set
count
to
0
and
position
\
\
to
1.
\n
Step
2:
If
the
value
of
the
element
at
index
position
is
greater
than
\
\
100,
increase
the
value
of
count
by
1.
\n
Step
3:
(missing
step)
\n
Step
4:
(missing
\
\
step)
\n
Step
5:
Display
the
value
of
count.
\n
Which
of
the
following
could
be
used
\
\
to
replace
steps
3
and
4
so
that
the
algorithm
works
as
intended?
\n
(A)
Step
3:
\
\
Increase
the
value
of
position
by
1.
\n
Step
4:
Repeat
steps
2
and
3
until
the
\
\
value
of
count
is
greater
than
100.
\n
(B)
Step
3:
Increase
the
value
of
position
\
\
by
1.
\n
Step
4:
Repeat
steps
2
and
3
until
the
value
of
position
is
greater
than
\
\
n.
\n
(C)
Step
3:
Repeat
step
2
until
the
value
of
count
is
greater
than
100.
\n\
\
Step
4:
Increase
the
value
of
position
by
1.
\n
(D)
Step
3:
Repeat
step
2
until
\
\
the
value
of
position
is
greater
than
n.
\n
Step
4:
Increase
the
value
of
count
\
\
by
1.
\n
A:
Let's
think
step
by
step.
Choice
A
is
incorrect,
because
its
Step
4
\
\
has
an
incorrect
termination
condition,
stopping
when
count
is
greater
than
100.
\
\
We
need
to
stop
after
inspecting
all
elements
in
the
list.
Choice
B
is
correct
\
\
because
it
correctly
increments
both
count
and
position,
and
correctly
repeats
\
\
these
steps
and
terminates
when
all
elements
in
the
list
have
been
inspected.
\
\
Choice
C
is
incorrect
because
it
incorrectly
increments
the
variable
count
until
\
\
its
value
is
greater
than
100,
regardless
of
the
elements
in
the
list.
Choice
\
\
D
is
incorrect
because
its
step
3
does
not
increment
the
value
of
position,
so
\
\
it
will
repeat
forever.
The
answer
is
(B)."
"group": "mmlu_flan_cot_fewshot_stem"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_high_school_computer_science"
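The three programming questions in the computer science prompt above can be reproduced directly. The sketch below is not part of the commit and the function names are hypothetical. It transcribes the nested IF/ELSE pseudocode into Python, and implements the counting algorithm with choice (B) filled in for steps 3 and 4. Like the pseudocode, the counting function assumes the list has at least one element.

```python
def display_branch(x, y):
    """Python transcription of the nested IF/ELSE pseudocode."""
    if x < 0:
        return "Foxtrot"
    else:
        if x > y:
            return "Hotel"
        else:
            if y > 0:
                return "November"
            else:
                return "Yankee"


def count_greater_than_100(values):
    """Counting algorithm with choice (B) used for the missing steps."""
    n = len(values)
    count, position = 0, 1                # step 1 (positions are 1-based)
    while True:
        if values[position - 1] > 100:    # step 2
            count += 1
        position += 1                     # step 3
        if position > n:                  # step 4: repeat until position > n
            break
    return count                          # step 5


if __name__ == "__main__":
    print(display_branch(5, 10))
    print("abc"[::-1])
    print(count_greater_than_100([50, 150, 101, 100]))
```

With choice (A) or (C) the loop would instead stop on `count > 100`, which depends on the data rather than on having visited every element. That is why the prompt rejects them.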
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_european_history.yaml
0 → 100644
"dataset_name": "high_school_european_history"
"description":
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
European
history.
\n\n
Q:
This
question
refers
to
the
following
information.
\n\
Albeit
the
king's
Majesty
justly
and
rightfully
is
and
ought
to
be
the
supreme
head
\
\
of
the
Church
of
England,
and
so
is
recognized
by
the
clergy
of
this
realm
in
\
\
their
convocations,
yet
nevertheless,
for
corroboration
and
confirmation
thereof,
\
\
and
for
increase
of
virtue
in
Christ's
religion
within
this
realm
of
England,
\
\
and
to
repress
and
extirpate
all
errors,
heresies,
and
other
enormities
and
abuses
\
\
heretofore
used
in
the
same,
be
it
enacted,
by
authority
of
this
present
Parliament,
\
\
that
the
king,
our
sovereign
lord,
his
heirs
and
successors,
kings
of
this
realm,
\
\
shall
be
taken,
accepted,
and
reputed
the
only
supreme
head
in
earth
of
the
Church
\
\
of
England,
called
Anglicana
Ecclesia;
and
shall
have
and
enjoy,
annexed
and
united
\
\
to
the
imperial
crown
of
this
realm,
as
well
the
title
and
style
thereof,
as
all
\
\
honors,
dignities,
preeminences,
jurisdictions,
privileges,
authorities,
immunities,
\
\
profits,
and
commodities
to
the
said
dignity
of
the
supreme
head
of
the
same
Church
\
\
belonging
and
appertaining;
and
that
our
said
sovereign
lord,
his
heirs
and
successors,
\
\
kings
of
this
realm,
shall
have
full
power
and
authority
from
time
to
time
to
\
\
visit,
repress,
redress,
record,
order,
correct,
restrain,
and
amend
all
such
\
\
errors,
heresies,
abuses,
offenses,
contempts,
and
enormities,
whatsoever
they
\
\
be,
which
by
any
manner
of
spiritual
authority
or
jurisdiction
ought
or
may
lawfully
\
\
be
reformed,
repressed,
ordered,
redressed,
corrected,
restrained,
or
amended,
\
\
most
to
the
pleasure
of
Almighty
God,
the
increase
of
virtue
in
Christ's
religion,
\
\
and
for
the
conservation
of
the
peace,
unity,
and
tranquility
of
this
realm;
any
\
\
usage,
foreign
land,
foreign
authority,
prescription,
or
any
other
thing
or
things
\
\
to
the
contrary
hereof
notwithstanding.
\n
English
Parliament,
Act
of
Supremacy,
\
\
1534
\n
From
the
passage,
one
may
infer
that
the
English
Parliament
wished
to
argue
\
\
that
the
Act
of
Supremacy
would
\n
(A)
give
the
English
king
a
new
position
of
authority
\
\
(B)
give
the
position
of
head
of
the
Church
of
England
to
Henry
VIII
alone
and
\
\
exclude
his
heirs
(C)
establish
Calvinism
as
the
one
true
theology
in
England
\
\
(D)
end
various
forms
of
corruption
plaguing
the
Church
in
England
\n
A:
Let's
think
\
\
step
by
step.
We
refer
to
Wikipedia
articles
on
european
history
for
help.
The
\
\
Act
of
Supremacy
states
that
it
grants
authority
to
the
king
\"
to
repress
and
\
\
extirpate
all
errors,
heresies,
and
other
enormities
and
abuses
\"
,
referring
to
\
\
the
corruption
in
the
Church
of
England.
The
answer
is
(D).
\n\n
Q:
This
question
\
\
refers
to
the
following
information.
\n
Read
the
following
excerpt.
\n
The
revolutionary
\
\
seed
had
penetrated
into
every
country
and
spread
more
or
less.
It
was
greatly
\
\
developed
under
the
régime
of
the
military
despotism
of
Bonaparte.
His
conquests
\
\
displaced
a
number
of
laws,
institutions,
and
customs;
broke
through
bonds
sacred
\
\
among
all
nations,
strong
enough
to
resist
time
itself;
which
is
more
than
can
\
\
be
said
of
certain
benefits
conferred
by
these
innovators.
\n
The
monarchs
will
\
\
fulfil
the
duties
imposed
upon
them
by
Him
who,
by
entrusting
them
with
power,
\
\
has
charged
them
to
watch
over
the
maintenance
of
justice,
and
the
rights
of
all,
\
\
to
avoid
the
paths
of
error,
and
tread
firmly
in
the
way
of
truth.
Placed
beyond
\
\
the
passions
which
agitate
society,
it
is
in
days
of
trial
chiefly
that
they
are
\
\
called
upon
to
despoil
realities
of
their
false
appearances,
and
to
show
themselves
\
\
as
they
are,
fathers
invested
with
the
authority
belonging
by
right
to
the
heads
\
\
of
families,
to
prove
that,
in
days
of
mourning,
they
know
how
to
be
just,
wise,
\
\
and
therefore
strong,
and
that
they
will
not
abandon
the
people
whom
they
ought
\
\
to
govern
to
be
the
sport
of
factions,
to
error
and
its
consequences,
which
must
\
\
involve
the
loss
of
society.
\n
Union
between
the
monarchs
is
the
basis
of
the
policy
\
\
which
must
now
be
followed
to
save
society
from
total
ruin.
.
.
.
\n
Let
them
not
\
\
confound
concessions
made
to
parties
with
the
good
they
ought
to
do
for
their
\
\
people,
in
modifying,
according
to
their
recognized
needs,
such
branches
of
the
\
\
administration
as
require
it.
\n
Let
them
be
just,
but
strong;
beneficent,
but
strict.
\n\
Let
them
maintain
religious
principles
in
all
their
purity,
and
not
allow
the
faith
\
\
to
be
attacked
and
morality
interpreted
according
to
the
social
contract
or
the
\
\
visions
of
foolish
sectarians.
\n
Let
them
suppress
Secret
Societies;
that
gangrene
\
\
of
society.
\n
—Klemens
von
Metternich,
Political
Confession
of
Faith,
1820
\n
Which
\
\
of
the
following
was
the
greatest
cause
of
the
fears
expressed
by
Metternich
in
\
\
the
document
above?
\n
(A)
The
ideas
of
personal
liberty
and
nationalism
conceived
\
\
during
the
Enlightenment
resulted
in
radical
revolutions
that
could
spread
throughout
\
\
Europe.
(B)
The
conquest
of
Europe
by
Napoleon
led
to
the
creation
of
new
factions
\
\
and
shifted
the
European
balance
of
power.
(C)
The
power
of
monarchs
had
grown
\
\
to
the
point
where
it
needed
to
be
checked
by
other
powers
within
each
nation
\
\
or
domination
of
civilians
would
occur.
(D)
The
rising
and
falling
economic
cycle
\
\
of
the
newly
emerging
capitalist
economy
could
lead
to
civilian
unrest
that
must
\
\
be
suppressed.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
\
\
european
history
for
help.
The
fears
of
revolution
in
early
19th
century
Europe
\
\
expressed
by
Klemens
von
Metternich,
a
conservative
Austrian
statesman,
were
a
\
\
direct
result
of
the
age
of
Enlightenment,
a
period
of
European
history
where
\
\
the
absolute
power
of
the
monarchy
was
challenged
with
ideas
of
individual
liberty
\
\
and
nationalism,
leading
to
the
French
revolution
and
its
effects
all
over
Europe.
\
\
The
answer
is
(A).
\n\n
Q:
This
question
refers
to
the
following
information.
\n\
The
excerpts
below
are
from
the
Navigation
Acts
of
1651.
\n
[A]fter
the
first
day
\
\
of
December,
one
thousand
six
hundred
fifty
and
one,
and
from
thence
forwards,
\
\
no
goods
or
commodities
whatsoever
of
the
growth,
production
or
manufacture
of
\
\
Asia,
Africa
or
America,
or
of
any
part
thereof;
or
of
any
islands
belonging
to
\
\
them,
or
which
are
described
or
laid
down
in
the
usual
maps
or
cards
of
those
\
\
places,
as
well
of
the
English
plantations
as
others,
shall
be
imported
or
brought
\
\
into
this
Commonwealth
of
England,
or
into
Ireland,
or
any
other
lands,
islands,
\
\
plantations,
or
territories
to
this
Commonwealth
belonging,
or
in
their
possession,
\
\
in
any
other
ship
or
ships,
vessel
or
vessels
whatsoever,
but
only
in
such
as
\
\
do
truly
and
without
fraud
belong
only
to
the
people
of
this
Commonwealth,
or
\
\
the
plantations
thereof,
as
the
proprietors
or
right
owners
thereof;
and
whereof
\
\
the
master
and
mariners
are
also
of
the
people
of
this
Commonwealth,
under
the
\
\
penalty
of
the
forfeiture
and
loss
of
all
the
goods
that
shall
be
imported
contrary
\
\
to
this
act,
,
,
,
\n
[N]o
goods
or
commodities
of
the
growth,
production,
or
manufacture
\
\
of
Europe,
or
of
any
part
thereof,
shall
after
the
first
day
of
December,
one
\
\
thousand
six
hundred
fifty
and
one,
be
imported
or
brought
into
this
Commonwealth
\
\
of
England,
or
any
other
lands
or
territories
to
this
Commonwealth
belonging,
\
\
or
in
their
possession,
in
any
ship
or
ships,
vessel
or
vessels
whatsoever,
but
\
\
in
such
as
do
truly
and
without
fraud
belong
only
to
the
people
of
this
Commonwealth,
\
\
and
in
no
other,
except
only
such
foreign
ships
and
vessels
as
do
truly
and
properly
\
\
belong
to
the
people
of
that
country
or
place,
of
which
the
said
goods
are
the
\
\
growth,
production
or
manufacture.
\n
Which
of
the
following
best
describes
the
\
\
outcome
of
the
Navigation
Acts
of
1651?
\n
(A)
They
served
as
a
catalyst
for
the
\
\
growth
of
English
shipping
and
overseas
trade,
but
did
little
to
limit
the
prospects
\
\
of
the
Dutch
in
the
seventeenth
century.
(B)
They
brought
about
almost
immediate
\
\
hardships
for
the
Dutch
economy
as
their
dominance
of
overseas
trade
quickly
ended.
\
\
(C)
They
were
rescinded
during
the
restoration
of
the
Stuarts
as
they
sought
normal
\
\
diplomatic
relations
with
the
Dutch
so
not
as
to
need
Parliament's
financial
support
\
\
for
war.
(D)
They
led
to
nearly
a
century
of
recurrent
war
between
England
and
\
\
the
Netherlands,
which
would
not
end
until
after
American
independence.
\n
A:
Let's
\
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
european
history
for
help.
\
\
The
Navigation
Acts
of
1651
helped
English
shipping
by
restricting
the
ability
\
\
of
ships
from
other
European
countries,
especially
the
Dutch,
to
transport
goods
\
\
from
colonies
in
Asia
and
Africa
into
England.
The
answer
is
(A).
\n\n
Q:
This
question
\
\
refers
to
the
following
information.
\n
In
Russia
there
was
nothing
going
on
well,
\
\
and
[Souvarine]
was
in
despair
over
the
news
he
had
received.
His
old
companions
\
\
were
all
turning
to
the
politicians;
the
famous
Nihilists
who
made
Europe
tremble-sons
\
\
of
village
priests,
of
the
lower
middle
class,
of
tradesmen-could
not
rise
above
\
\
the
idea
of
national
liberation,
and
seemed
to
believe
that
the
world
would
be
\
\
delivered-when
they
had
killed
their
despot…
\n\"
Foolery!
They'll
never
get
out
\
\
of
it
with
their
foolery.
\"\n
Then,
lowering
his
voice
still
more,
in
a
few
bitter
\
\
words
he
described
his
old
dream
of
fraternity.
He
had
renounced
his
rank
and
\
\
his
fortune;
he
had
gone
among
workmen,
only
in
the
hope
of
seeing
at
last
the
\
\
foundation
of
a
new
society
of
labour
in
common.
All
the
sous
in
his
pockets
had
\
\
long
gone
to
the
urchins
of
the
settlement;
he
had
been
as
tender
as
a
brother
\
\
with
the
colliers,
smiling
at
their
suspicion,
winning
them
over
by
his
quiet
\
\
workmanlike
ways
and
his
dislike
of
chattering.
But
decidedly
the
fusion
had
not
\
\
taken
place.
\n
His
voice
changed,
his
eyes
grew
bright,
he
fixed
them
on
Étienne,
\
\
directly
addressing
him:
\n\"
Now,
do
you
understand
that?
These
hatworkers
at
Marseilles
\
\
who
have
won
the
great
lottery
prize
of
a
hundred
thousand
francs
have
gone
off
\
\
at
once
and
invested
it,
declaring
that
they
are
going
to
live
without
doing
anything!
\
\
Yes,
that
is
your
idea,
all
of
you
French
workmen;
you
want
to
unearth
a
treasure
\
\
in
order
to
devour
it
alone
afterwards
in
some
lazy,
selfish
corner.
You
may
cry
\
\
out
as
much
as
you
like
against
the
rich,
you
haven't
got
courage
enough
to
give
\
\
back
to
the
poor
the
money
that
luck
brings
you.
You
will
never
be
worthy
of
happiness
\
\
as
long
as
you
own
anything,
and
your
hatred
of
the
bourgeois
proceeds
solely
\
\
from
an
angry
desire
to
be
bourgeois
yourselves
in
their
place.
\"\n
Émile
Zola,
\
\
French
writer,
Germinal,
1885
\n
The
passage
displays
the
direct
concern
for
the
\
\
welfare
of
the
working
classes
that
was
typically
a
part
of
which
movement?
\n\
(A)
Capitalist
(B)
Scientific
(C)
Communist
(D)
Existentialist
\n
A:
Let's
think
step
\
\
by
step.
We
refer
to
Wikipedia
articles
on
european
history
for
help.
The
modern
\
\
Communist
movement
aims
to
establish
a
classless
society
based
on
communal
ownership
\
\
and
distribution
of
property
and
means
of
production,
thereby
especially
benefiting
\
\
the
working
classes.
The
answer
is
(C).
\n\n
Q:
This
question
refers
to
the
following
\
\
information.
\n
The
following
excerpt
is
from
a
pamphlet.
\n
You
will
do
me
the
justice
\
\
to
remember,
that
I
have
always
strenuously
supported
the
Right
of
every
man
to
\
\
his
own
opinion,
however
different
that
opinion
might
be
to
mine.
He
who
denies
\
\
to
another
this
right,
makes
a
slave
of
himself
to
his
present
opinion,
because
\
\
he
precludes
himself
the
right
of
changing
it.
\n
The
most
formidable
weapon
against
\
\
errors
of
every
kind
is
Reason.
I
have
never
used
any
other,
and
I
trust
I
never
\
\
shall.
\n
The
circumstance
that
has
now
taken
place
in
France
of
the
total
abolition
\
\
of
the
whole
national
order
of
priesthood,
and
of
everything
appertaining
to
compulsive
\
\
systems
of
religion,
and
compulsive
articles
of
faith,
has
not
only
precipitated
\
\
my
intention,
but
rendered
a
work
of
this
kind
exceedingly
necessary,
lest
in
\
\
the
general
wreck
of
superstition,
of
false
systems
of
government,
and
false
theology,
\
\
we
lose
sight
of
morality,
of
humanity,
and
of
the
theology
that
is
true.
\n
I
believe
\
\
in
one
God,
and
no
more;
and
I
hope
for
happiness
beyond
this
life.
\n
I
believe
\
\
in
the
equality
of
man;
and
I
believe
that
religious
duties
consist
in
doing
justice,
\
\
loving
mercy,
and
endeavoring
to
make
our
fellow-creatures
happy.
\n
I
do
not
believe
\
\
in
the
creed
professed
by
the
Jewish
church,
by
the
Roman
church,
by
the
Greek
\
\
church,
by
the
Turkish
church,
by
the
Protestant
church,
nor
by
any
church
that
\
\
I
know
of.
My
own
mind
is
my
own
church.
\n
All
national
institutions
of
churches,
\
\
whether
Jewish,
Christian
or
Turkish,
appear
to
me
no
other
than
human
inventions,
\
\
set
up
to
terrify
and
enslave
mankind,
and
monopolize
power
and
profit.
\n
I
do
\
\
not
mean
by
this
declaration
to
condemn
those
who
believe
otherwise;
they
have
\
\
the
same
right
to
their
belief
as
I
have
to
mine.
\n
—Thomas
Paine,
The
Age
of
Reason,
\
\
1794–1795
\n
Which
of
the
following
Enlightenment
philosophes
designed
a
system
\
\
of
checks
and
balances
for
government
to
avoid
abuses
of
power?
\n
(A)
Jean
Jacques
\
\
Rousseau
(B)
Baron
Montesquieu
(C)
Mary
Wollstonecraft
(D)
Adam
Smith
\n
A:
Let's
\
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
european
history
for
help.
\
\
Baron
Montesquieu
was
an
18th
century
French
philosopher
who
wrote
extensively
\
\
against
the
monopolization
of
power
and
advocated
for
a
system
of
checks
and
balances
\
\
in
government
to
prevent
the
rise
of
despotism.
The
answer
is
(B)."
"
group"
:
"
mmlu_flan_cot_fewshot_humanities"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_european_history"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_geography.yaml
0 → 100644
"
dataset_name"
:
"
high_school_geography"
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
geography.
\n\n
Q:
Which
one
of
the
following
items
is
an
example
of
nonmaterial
\
\
culture?
\n
(A)
Dove
soap
(B)
Dove
candy
bar
(C)
Dove
symbol
(D)
A
dove
(bird).
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
geography
for
help.
\
\
Nonmaterial
culture
consists
of
cultural
ideas,
beliefs
or
symbols
that
are
not
\
\
physical
objects.
The
answer
is
(C).
\n\n
Q:
During
the
third
stage
of
the
demographic
\
\
transition
model,
which
of
the
following
is
true?
\n
(A)
Birth
rates
increase
and
\
\
population
growth
rate
is
less
rapid.
(B)
Birth
rates
decline
and
population
growth
\
\
rate
is
less
rapid.
(C)
Birth
rates
increase
and
population
growth
rate
increases.
\
\
(D)
Birth
rates
decrease
and
population
growth
rate
increases.
\n
A:
Let's
think
\
\
step
by
step.
We
refer
to
Wikipedia
articles
on
geography
for
help.
The
demographic
\
\
transition
model
models
the
five
different
stages
of
population
growth
as
a
country
\
\
goes
through
economic
development,
where
the
third
stage
refers
to
a
period
of
\
\
declining
birth
rates
and
lower
population
growth.
The
answer
is
(B).
\n\n
Q:
The
\
\
practice
of
hiring
a
foreign
third-party
service
provider
to
run
an
operation
\
\
is
called
\n
(A)
outsourcing.
(B)
offshoring.
(C)
maquiladoras.
(D)
locational
interdependence.
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
geography
for
help.
\
\ \"
Offshoring
\"
literally
means
to
move
or
base
some
of
the
activities
or
processes
\
\
of
a
company
to
a
foreign
country.
The
answer
is
(B).
\n\n
Q:
Which
of
the
following
\
\
statements
is
NOT
accurate
regarding
the
services
provided
by
local
governments
\
\
in
the
United
States?
\n
(A)
Duplication
of
efforts
occurs
often.
(B)
Social
problems
\
\
of
the
central
city
spill
over
into
the
surrounding
residential
suburbs.
(C)
Inefficiency
\
\
in
providing
services
occurs
often.
(D)
One
neighborhood's
efforts
to
reduce
pollution
\
\
are
always
supported
by
neighboring
communities.
\n
A:
Let's
think
step
by
step.
\
\
We
refer
to
Wikipedia
articles
on
geography
for
help.
There
may
be
economic,
social
\
\
or
political
reasons
for
two
neighboring
communities
and
their
local
governments
\
\
not
agreeing
to
pollution
reduction
efforts
initiated
by
one
of
them.
The
answer
\
\
is
(D).
\n\n
Q:
The
rate
of
natural
increase
of
a
population
is
found
by
subtracting
\
\
the
\n
(A)
crude
death
rate
from
the
crude
birth
rate.
(B)
crude
birth
rate
from
\
\
the
crude
death
rate.
(C)
doubling
time
from
the
crude
birth
rate.
(D)
fertility
\
\
rate
from
the
crude
death
rate.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
geography
for
help.
The
difference
between
number
of
births
and
deaths
\
\
gives
the
population
increase
at
any
given
time.
The
answer
is
(A)."
"
group"
:
"
mmlu_flan_cot_fewshot_social_sciences"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_geography"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_government_and_politics.yaml
0 → 100644
"
dataset_name"
:
"
high_school_government_and_politics"
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
government
and
politics.
\n\n
Q:
Which
of
the
following
best
states
an
argument
\
\
made
by
James
Madison
in
The
Federalist
number
10?
\n
(A)
Honest
politicians
can
\
\
prevent
factions
from
developing.
(B)
Factions
are
more
likely
to
occur
in
large
\
\
republics
than
in
small
ones.
(C)
The
negative
effects
of
factionalism
can
be
\
\
reduced
by
a
republican
government.
(D)
Free
elections
are
the
people's
best
defense
\
\
against
factionalism.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
\
on
government
and
politics
for
help.
In
the
Federalist
number
10,
James
Madison
\
\
advocated
for
a
representative
republican
form
of
government
to
guard
against
\
\
factionalism.
The
answer
is
(C).
\n\n
Q:
The
term
\"
budget
deficit
\"
refers
to
the
\n\
(A)
annual
increase
in
federal
spending
on
the
military
(B)
amount
of
interest
on
\
\
the
national
debt
(C)
difference
between
the
initial
budget
proposals
made
by
\
\
the
president
and
Congress
(D)
amount
the
government
spends
in
excess
of
its
revenues
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
government
and
politics
\
\
for
help.
When
the
government
spends
more
than
it
earns,
their
difference
is
the
\
\
budget
deficit.
The
answer
is
(D).
\n\n
Q:
Which
of
the
following
statements
about
\
\
cabinet
departments
is
FALSE?
\n
(A)
They
are
established
by
the
legislative
branch.
\
\
(B)
Their
members
often
don't
have
much
influence
over
presidential
decisions.
\
\
(C)
They
cannot
all
be
run
by
leaders
who
belong
to
the
same
political
party
the
\
\
president
does.
(D)
Not
every
federal
agency
is
a
cabinet
department.
\n
A:
Let's
\
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
government
and
politics
\
\
for
help.
There
is
no
law
stipulating
that
some
cabinet
department
leaders
have
\
\
to
belong
to
a
political
party
different
from
that
of
the
president.
The
answer
\
\
is
(C).
\n\n
Q:
Which
of
the
following
cases
established
the
precedent
that
a
defendant
\
\
must
be
informed
of
the
right
to
remain
silent,
the
right
to
a
lawyer,
and
protection
\
\
from
self-incrimination?
\n
(A)
Weeks
v.
United
States
(B)
Betts
v.
Brady
(C)
Mapp
\
\
v.
Ohio
(D)
Miranda
v.
Arizona
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
government
and
politics
for
help.
In
the
landmark
Miranda
v.
Arizona
\
\
in
1966,
the
US
Supreme
Court,
based
on
the
Fifth
and
Sixth
Amendments
of
the
US
\
\
Constitution,
guaranteed
a
defendant's
right
to
an
attorney
and
protection
from
\
\
self-incrimination.
The
answer
is
(D).
\n\n
Q:
Uncertainty
over
the
limits
to
presidential
\
\
power
is
caused
primarily
by
the
fact
that
\n
(A)
the
constitutional
definition
\
\
of
those
powers
is
broad
and
unspecific
(B)
most
people
agree
that
the
Constitution
\
\
places
too
many
limits
on
presidential
power
(C)
the
Supreme
Court
consistently
\
\
refuses
to
rule
on
cases
concerning
presidential
powers
(D)
constitutional
amendments
\
\
have
greatly
increased
presidential
powers
\n
A:
Let's
think
step
by
step.
We
refer
\
\
to
Wikipedia
articles
on
government
and
politics
for
help.
The
US
Constitution
\
\
is
not
very
specific
about
the
powers
of
the
president,
leading
to
uncertainty
\
\
over
its
limits.
The
answer
is
(A)."
"
group"
:
"
mmlu_flan_cot_fewshot_social_sciences"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_government_and_politics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_macroeconomics.yaml
0 → 100644
"
dataset_name"
:
"
high_school_macroeconomics"
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
macroeconomics.
\n\n
Q:
Which
of
the
following
policies
best
describes
supply-side
\
\
fiscal
policy?
\n
(A)
An
increase
in
the
money
supply
(B)
Increased
government
spending
\
\
(C)
Lower
taxes
on
research
and
development
of
new
technology
(D)
Higher
taxes
\
\
on
household
income
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
\
on
macroeconomics
for
help.
Supply-side
fiscal
policy
stimulates
the
economy
by
\
\
encouraging
more
production
of
goods
and
services
through
reduction
in
taxes
and
\
\
deregulation.
The
answer
is
(C).
\n\n
Q:
The
short-run
Phillips
curve
indicates
\
\
a
\n
(A)
direct
relation
between
unemployment
and
inflation
(B)
direct
relation
\
\
between
price
and
quantity
demanded
(C)
inverse
relation
between
price
and
quantity
\
\
demanded
(D)
inverse
relation
between
unemployment
and
inflation
\n
A:
Let's
think
\
\
step
by
step.
We
refer
to
Wikipedia
articles
on
macroeconomics
for
help.
The
short-run
\
\
Phillips
curve
shows
that
whenever
unemployment
decreases
below
a
natural
level,
\
\
the
inflation
starts
increasing,
and
vice-versa.
The
answer
is
(D).
\n\n
Q:
Holding
\
\
all
else
equal
which
of
the
following
monetary
policies
would
be
used
to
boost
\
\
U.S.
exports?
\n
(A)
Increasing
the
discount
rate
(B)
Increasing
the
reserve
ratio
\
\
(C)
Buying
government
securities
(D)
Lowering
tariffs
\n
A:
Let's
think
step
by
\
\
step.
We
refer
to
Wikipedia
articles
on
macroeconomics
for
help.
Buying
government
\
\
securities
leads
to
reduction
in
demand
for
US
dollars
from
foreign
buyers,
thereby
\
\
making
it
cheaper
and
hence
making
US
exports
more
attractive.
The
answer
is
(C).
\n\
\n
Q:
A
federal
deficit
occurs
when
\n
(A)
exports
exceed
imports.
(B)
imports
exceed
\
\
exports.
(C)
federal
tax
collections
exceed
spending.
(D)
federal
spending
exceeds
\
\
federal
tax
revenues.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
\
on
macroeconomics
for
help.
A
federal
deficit
occurs
when
federal
spending
exceeds
\
\
federal
income
which
is
primarily
from
tax
revenues.
The
answer
is
(D).
\n\n
Q:
\
\
Which
of
the
following
is
not
included
in
the
U.S.
GDP?
\n
(A)
The
U.S.
military
\
\
opens
a
new
base
in
a
foreign
country
with
1000
U.S.
personnel.
(B)
Japanese
consumers
\
\
buy
thousands
of
CDs
produced
in
the
United
States.
(C)
An
American
pop
singer
\
\
performs
a
sold-out
concert
in
Paris.
(D)
A
French
theatrical
production
tours
\
\
dozens
of
American
cities.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
macroeconomics
for
help.
The
economic
transactions
related
to
the
\
\
performance
of
the
American
pop-singer
in
Paris
happen
entirely
outside
the
U.S.
\
\
and
hence
are
not
included
in
the
GDP
numbers.
The
answer
is
(C)."
"
group"
:
"
mmlu_flan_cot_fewshot_social_sciences"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_macroeconomics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_mathematics.yaml
0 → 100644
"
dataset_name"
:
"
high_school_mathematics"
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
mathematics.
\n\n
Q:
Simplify
and
write
the
result
with
a
rational
denominator:
\
\
$$
\\
sqrt{
\\
sqrt[3]{
\\
sqrt{
\\
frac{1}{729}}}}$$
\n
(A)
\\
frac{3
\\
sqrt{3}}{3}
(B)
\\\
frac{1}{3}
(C)
\\
sqrt{3}
(D)
\\
frac{
\\
sqrt{3}}{3}
\n
A:
Let's
think
step
by
step.
\
\
Factoring
$729=3^6$
and
combining
the
roots
$
\\
frac{1}{2}
\\
frac{1}{3}
\\
frac{1}{2}=
\\\
frac{1}{12}$,
we
get
that
$
\\
sqrt{
\\
sqrt[3]{
\\
sqrt{
\\
frac{1}{729}}}}=
\\
left(
\\
frac{1}{3^6}
\\\
right)^{
\\
frac{1}{12}}=
\\
frac{1}{3^{
\\
frac{1}{2}}}=
\\
frac{
\\
sqrt{3}}{3}$
The
answer
\
\
is
(D).
\n\n
Q:
Five
thousand
dollars
compounded
annually
at
an
$x
\\
%$
interest
\
\
rate
takes
six
years
to
double.
At
the
same
interest
rate,
how
many
years
will
\
\
it
take
$
\\
$300$
to
grow
to
$
\\
$9600$?
\n
(A)
12
(B)
1
(C)
30
(D)
5
\n
A:
Let's
think
\
\
step
by
step.
To
go
from
$
\\
$300$
to
$
\\
$9600$,
the
value
must
go
up
by
a
factor
\
\
of
$9600/300=32=2^5$.
Since
at
this
interest
rate
it
takes
six
years
for
it
to
\
\
double,
it
will
take
$5*6=30$
years
to
grow
to
$
\\
$9600$.
The
answer
is
(C).
\n\
\n
Q:
Ten
students
take
a
biology
test
and
receive
the
following
scores:
45,
55,
\
\
50,
70,
65,
80,
40,
90,
70,
85.
What
is
the
mean
of
the
students’
test
scores?
\n\
(A)
55
(B)
60
(C)
62
(D)
65
\n
A:
Let's
think
step
by
step.
There
are
10
students
\
\
and
the
sum
of
their
scores
is
$45
+
55
+
50
+
70
+
65
+
80
+
40
+
90
+
70
+
85
\
\
=
650$,
the
mean
is
$650/10=65$.
The
answer
is
(D).
\n\n
Q:
The
variable
$x$
varies
\
\
directly
as
the
square
of
$y$,
and
$y$
varies
directly
as
the
cube
of
$z$.
If
\
\
$x$
equals
$-16$
when
$z$
equals
2,
what
is
the
value
of
$x$
when
$z$
equals
$
\\\
frac{1}{2}$?
\n
(A)
-1
(B)
16
(C)
-
\\
frac{1}{256}
(D)
\\
frac{1}{16}
\n
A:
Let's
think
\
\
step
by
step.
We
know
that
$x
\\
propto
y^2$
and
$y
\\
propto
z^3$,
so
$x
=
k
z^6$
\
\
for
some
constant
$k$.
Plugging
in
for
$x=-16$
and
$z=2$,
the
constant
value
is
\
\
$k=
\\
frac{x}{z^6}=
\\
frac{-16}{64}=-
\\
frac{1}{4}$.
So,
when
$z=
\\
frac{1}{2}$,
the
\
\
value
of
$x$
is
$x=kz^6=-
\\
frac{1}{4}
\\
frac{1}{2^6}=-
\\
frac{1}{256}$.
The
answer
\
\
is
(C).
\n\n
Q:
Joe
was
in
charge
of
lights
for
a
dance.
The
red
light
blinks
every
\
\
two
seconds,
the
yellow
light
every
three
seconds,
and
the
blue
light
every
five
\
\
seconds.
If
we
include
the
very
beginning
and
very
end
of
the
dance,
how
many
\
\
times
during
a
seven
minute
dance
will
all
the
lights
come
on
at
the
same
time?
\
\
(Assume
that
all
three
lights
blink
simultaneously
at
the
very
beginning
of
the
\
\
dance.)
\n
(A)
3
(B)
15
(C)
6
(D)
5
\n
A:
Let's
think
step
by
step.
The
least
common
\
\
multiple
of
2,
3
and
5
is
30,
so
during
a
7
minute
dance,
all
the
three
lights
\
\
will
come
on
at
the
same
time
$2*7+1=15$
times.
The
answer
is
(B)."
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_mathematics"
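Note on the worked answers above: the arithmetic in the five few-shot examples can be checked with a short Python sketch. The variable names are illustrative and are not part of the task configuration. `math.lcm` with several arguments assumes Python 3.9 or later.

```python
from fractions import Fraction
from math import isclose, lcm, sqrt

# Q1: sqrt(cbrt(sqrt(1/729))) has combined exponent 1/2 * 1/3 * 1/2 = 1/12.
exponent = Fraction(1, 2) * Fraction(1, 3) * Fraction(1, 2)
nested_root = (1 / 729) ** float(exponent)
rationalized = sqrt(3) / 3  # choice (D), with a rational denominator

# Q2: 9600 / 300 = 32 = 2^5, so five doublings at six years each.
factor = 9600 // 300
doublings = factor.bit_length() - 1  # exact log2 for a power of two
years = 6 * doublings

# Q3: mean of the ten test scores.
scores = [45, 55, 50, 70, 65, 80, 40, 90, 70, 85]
mean_score = sum(scores) / len(scores)

# Q4: x = k * z^6, with k fixed by x = -16 at z = 2.
k = Fraction(-16, 2**6)
x_at_half = k * Fraction(1, 2) ** 6

# Q5: the lights coincide every lcm(2, 3, 5) seconds, counting t = 0 as well.
period = lcm(2, 3, 5)
simultaneous_blinks = (7 * 60) // period + 1

print(exponent, years, mean_score, x_at_half, simultaneous_blinks)
```

Exact fractions are used where the answer is rational, so the comparisons with choices (C) in Q2 and Q4 do not depend on floating-point rounding.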
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_microeconomics.yaml
0 → 100644
"
dataset_name"
:
"
high_school_microeconomics"
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
microeconomics.
\n\n
Q:
Which
of
the
following
is
necessarily
a
characteristic
\
\
of
oligopoly?
\n
(A)
Free
entry
into
and
exit
from
the
market
(B)
A
few
large
producers
\
\
(C)
One
producer
of
a
good
with
no
close
substitutes
(D)
A
homogenous
product
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
microeconomics
for
\
\
help.
An
oligopoly
is
when
a
market
is
dominated
by
just
one
or
a
few
\
\
sellers
or
producers.
To
get
oligopoly,
the
market
should
have
high
barriers
to
\
\
new
entry,
and
the
product
has
differentiation.
The
answer
is
(B).
\n\n
Q:
If
the
\
\
government
subsidizes
producers
in
a
perfectly
competitive
market,
then
\n
(A)
the
\
\
demand
for
the
product
will
increase
(B)
the
demand
for
the
product
will
decrease
\
\
(C)
the
consumer
surplus
will
increase
(D)
the
consumer
surplus
will
decrease
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
microeconomics
for
\
\
help.
(A)
and
(B)
are
wrong
because
the
demand
curve
does
not
change
at
all.
If
\
\
the
government
subsidizes
producers,
the
supply
will
increase,
and
thus
the
consumer
\
\
surplus
also
increases.
The
answer
is
(C).
\n\n
Q:
Which
of
the
following
is
true
\
\
of
a
price
floor?
\n
(A)
The
price
floor
shifts
the
demand
curve
to
the
left.
(B)
\
\
An
effective
floor
creates
a
shortage
of
the
good.
(C)
The
price
floor
shifts
\
\
the
supply
curve
of
the
good
to
the
right.
(D)
To
be
an
effective
floor,
it
must
\
\
be
set
above
the
equilibrium
price.
\n
A:
Let's
think
step
by
step.
We
refer
to
\
\
Wikipedia
articles
on
microeconomics
for
help.
Price
floor
does
not
shift
the
\
\
demand
or
supply
curve.
An
effective
price
floor
should
be
set
above
the
equilibrium
\
\
price,
otherwise
the
market
clears
at
the
equilibrium
price
and
the
floor
has
no
effect.
\
\
The
answer
is
(D).
\n\n
Q:
The
concentration
ratio
for
a
monopoly
is
\n
(A)
0
(B)
\
\
5
(C)
10
(D)
100
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
\
on
microeconomics
for
help.
The
concentration
ratio
is
calculated
as
the
sum
of
\
\
market
share
of
a
specific
number
of
largest
companies.
Monopoly
means
one
company
\
\
or
entity
controls
the
entire
market,
therefore,
the
concentration
ratio
is
100
\
\
percent.
The
answer
is
(D).
\n\n
Q:
In
a
competitive
labor
market
for
housepainters,
\
\
which
of
the
following
would
increase
the
demand
for
housepainters?
\n
(A)
An
effective
\
\
minimum
wage
imposed
on
this
labor
market.
(B)
An
increase
in
the
price
of
gallons
\
\
of
paint.
(C)
An
increase
in
the
construction
of
new
houses.
(D)
An
increase
in
\
\
the
price
of
mechanical
painters
so
long
as
the
output
effect
exceeds
the
substitution
\
\
effect.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
microeconomics
\
\
for
help.
An
increase
in
the
construction
of
new
houses
means
an
increased
demand
\
\
for
house
painting,
thus
increases
the
demand
for
housepainters.
The
answer
\
\
is
(C)."
"
group"
:
"
mmlu_flan_cot_fewshot_social_sciences"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_microeconomics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_physics.yaml
0 → 100644
"
dataset_name"
:
"
high_school_physics"
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
physics.
\n\n
Q:
A
microwave
oven
is
connected
to
an
outlet,
120
V,
and
draws
\
\
a
current
of
2
amps.
At
what
rate
is
energy
being
used
by
the
microwave
oven?
\n\
(A)
10
W
(B)
30
W
(C)
60
W
(D)
240
W
\n
A:
Let's
think
step
by
step.
Rate
of
energy
\
\
usage
is
known
as
power;
in
a
dissipative
electrical
circuit,
power
is
given
\
\
by
voltage
times
current.
So
in
our
case,
the
power
is
120
V
times
2
amps,
or
\
\
240
W.
The
answer
is
(D).
\n\n
Q:
A
point
charge,
Q
=
+1
mC,
is
fixed
at
the
origin.
\
\
How
much
work
is
required
to
move
a
charge,
Q
=
+8
µC,
from
the
point
(0,
4
meters)
\
\
to
the
point
(3
meters,
0)?
\n
(A)
3.5
J
(B)
6.0
J
(C)
22.5
J
(D)
40
J
\n
A:
Let's
\
\
think
step
by
step.
To
calculate
the
work
required
to
move
a
charge
from
one
location
\
\
to
another
in
a
fixed
electric
field,
it
is
enough
to
calculate
the
potential
\
\
difference
between
the
two
locations.
Here,
the
potential
only
depends
on
the
\
\
distance
between
the
charges;
it’s
$k
q_1
q_2
/
r$,
where
$k$
is
Coulomb’s
constant.
\
\
Plugging
in
values
$q_1
=
$
1
mC,
$q_2
=
8
\\
mu$
C,
gives
the
answer
as
5.992
\
\
J,
which
rounds
to
6
J.
The
answer
is
(B).
\n\n
Q:
Which
of
the
following
conditions
\
\
will
ensure
that
angular
momentum
is
conserved?
I.
Conservation
of
linear
momentum
\
\
II.
Zero
net
external
force
III.
Zero
net
external
torque
\n
(A)
I
and
II
only
(B)
\
\
I
and
III
only
(C)
II
and
III
only
(D)
III
only
\n
A:
Let's
think
step
by
step.
\
\
Torque
is
defined
as
the
rate
of
change
of
angular
momentum;
if
there
is
zero
external
\
\
torque,
angular
momentum
is
conserved.
The
answer
is
(D).
\n\n
Q:
A
photocell
of
\
\
work
function
ϕ
=
2eV
is
connected
to
a
resistor
in
series.
Light
of
frequency
\
\
f
=
1
×
10^15
Hz
hits
a
metal
plate
of
the
photocell.
If
the
power
of
the
light
\
\
is
P
=
100
W,
what
is
the
current
through
the
resistor?
\n
(A)
2:00
AM
(B)
6:00
\
\
AM
(C)
12:00
AM
(D)
24
A
\n
A:
Let's
think
step
by
step.
The
only
answer
above
which
\
\
has
units
of
current
is
D,
24
A.
The
answer
is
(D).
\n\n
Q:
A
pipe
full
of
air
is
\
\
closed
at
one
end.
A
standing
wave
is
produced
in
the
pipe,
causing
the
pipe
to
\
\
sound
a
note.
Which
of
the
following
is
a
correct
statement
about
the
wave’s
properties
\
\
at
the
closed
end
of
the
pipe?
\n
(A)
The
pressure
is
at
a
node,
but
the
particle
\
\
displacement
is
at
an
antinode.
(B)
The
pressure
is
at
an
antinode,
but
the
particle
\
\
displacement
is
at
a
node.
(C)
The
pressure
and
the
particle
displacement
are
\
\
both
at
nodes.
(D)
The
pressure
and
the
particle
displacement
are
both
at
antinodes.
\n\
A:
Let's
think
step
by
step.
At
the
closed
end
of
the
pipe,
the
particles
cannot
\
\
have
any
net
displacement
because
the
pipe
closure
stops
them.
So
the
particle
\
\
displacement
is
at
a
node.
This
closure
also
causes
the
pressure
to
be
maximal,
\
\
i.e.
an
antinode.
The
answer
is
(B)."
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_physics"
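The arithmetic in the first two high_school_physics few-shot answers above can be checked with a short sketch. It is only an illustration, not part of the commit: the function names are hypothetical, and the rounded Coulomb constant of 8.99e9 N·m²/C² is an assumption (the prompt does not state the value it used).

```python
import math

# Assumption: rounded Coulomb constant in N*m^2/C^2; the prompt does not give one.
COULOMB_K = 8.99e9


def electrical_power(voltage, current):
    """Power dissipated in a circuit: P = V * I."""
    return voltage * current


def potential_energy(q1, q2, r):
    """Potential energy of two point charges a distance r apart: k*q1*q2/r."""
    return COULOMB_K * q1 * q2 / r


def work_to_move(q_fixed, q_moved, start, end):
    """Work needed to move q_moved from start to end (x, y in meters),
    with q_fixed held at the origin. Equals the change in potential energy."""
    r_start = math.hypot(*start)
    r_end = math.hypot(*end)
    return potential_energy(q_fixed, q_moved, r_end) - potential_energy(
        q_fixed, q_moved, r_start
    )


power = electrical_power(120.0, 2.0)                      # microwave question
work = work_to_move(1e-3, 8e-6, (0.0, 4.0), (3.0, 0.0))   # point-charge question

print(f"power = {power:.0f} W")
print(f"work  = {work:.2f} J")
```

With these inputs the power comes out to 240 W and the work to roughly 6 J, matching options (D) and (B) in the prompt.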
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_psychology.yaml
0 → 100644
"
dataset_name"
:
"
high_school_psychology"
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
psychology.
\n\n
Q:
Pascale
is
interested
in
the
processing
strategies
children
\
\
use
to
learn
new
information.
Pascale
would
best
be
classified
as
what
type
of
\
\
psychologist?
\n
(A)
sociocultural
(B)
clinical
(C)
cognitive
(D)
behaviorist
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
psychology
for
help.
\
\
Sociocultural
psychologist
focuses
on
the
effect
of
societal
factors
on
people.
\
\
Clinical
psychologist
focuses
on
people
with
mental
issues.
Cognitive
psychologist
\
\
focuses
on
how
people
think
and
learn,
including
the
processing
strategies.
Behaviorist
\
\
focuses
more
on
the
environment
and
experience
effect
on
people.
The
answer
is
\
\
(C).
\n\n
Q:
According
to
Caplan's
model
of
consultee-centered
case
consultation,
\
\
the
consultant
is
primarily
interested
in
\n
(A)
identifying
the
causes
and
solutions
\
\
of
the
client's
presenting
problems
(B)
identifying
and
eliminating
the
causes
\
\
of
the
consultee's
difficulties
in
handling
a
problem
(C)
establishing
a
hierarchy
\
\
of
authority
to
enable
effective
decision
making
(D)
presenting
a
single,
well-defined
\
\
and
unambiguous
course
of
action
for
the
consultant
to
overcome
skills
deficits
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
psychology
for
help.
\
\
Caplan
defines
two
types
of
consultation.
Client-centered
case
consultation
aims
\
\
to
handle
client's
problems,
while
consultee-centered
case
consultation
aims
to
\
\
identify
the
reason
of
client's
difficulty
to
solve
problems.
The
answer
is
(B).
\n\
\n
Q:
According
to
the
Individuals
with
Disabilities
Education
Improvement
Act,
which
\
\
of
the
following
must
an
educational
agency
do
before
it
changes
the
educational
\
\
placement
of
a
student
with
a
disability?
\n
(A)
Give
the
child
a
trial
period
in
\
\
the
new
environment
(B)
Notify
the
parents
in
writing
(C)
Obtain
school
board
\
\
approval
(D)
Obtain
parental
consent
\n
A:
Let's
think
step
by
step.
We
refer
to
\
\
Wikipedia
articles
on
psychology
for
help.
When
the
decision
to
change
the
educational
\
\
placement
of
a
student
with
a
disability
is
made,
the
educational
agency
must
\
\
notify
the
parents
in
writing
on
that
date.
The
answer
is
(B).
\n\n
Q:
While
swimming
\
\
in
the
ocean,
Ivan
is
frightened
by
a
dark
shadow
in
the
water
even
before
he
\
\
has
the
chance
to
identify
what
the
shadow
is.
The
synaptic
connections
taking
\
\
place
during
this
incident
of
fright
are
best
described
by
which
of
the
following?
\n\
(A)
Messages
are
sent
from
the
thalamus
directly
to
the
amygdala.
(B)
Messages
are
\
\
sent
from
the
thalamus
to
the
\"
what
\"
and
\"
where
\"
pathways.
(C)
Messages
are
\
\
sent
from
the
parasympathetic
nervous
system
to
the
cerebral
cortex.
(D)
Messages
\
\
are
sent
from
the
frontal
lobes
to
the
pituitary
gland.
\n
A:
Let's
think
step
by
\
\
step.
We
refer
to
Wikipedia
articles
on
psychology
for
help.
Our
neural
system
\
\
has
a
mechanism
that
can
respond
to
an
immediate
emotional
signal
before
going
to
the
\
\
thought
center.
In
Ivan's
case,
messages
travel
directly
from
thalamus
to
\
\
amygdala.
The
answer
is
(A).
\n\n
Q:
Ani
believes
that
her
attitudes
and
behavior
\
\
play
a
central
role
in
what
happens
to
her.
Such
a
belief
is
likely
to
be
associated
\
\
with
\n
(A)
a
strong
superego.
(B)
low
self-esteem.
(C)
low
self-efficacy.
(D)
an
\
\
internal
locus
of
control.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
psychology
for
help.
People
with
an
external
locus
of
control
believe
\
\
fate
and
luck
play
an
important
role
in
their
lives,
while
people
with
an
internal
\
\
locus
of
control
believe
they
control
their
lives.
The
answer
is
(D)."
"
group"
:
"
mmlu_flan_cot_fewshot_social_sciences"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_psychology"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_statistics.yaml
0 → 100644
"
dataset_name"
:
"
high_school_statistics"
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
statistics.
\n\n
Q:
A
new
smartwatch
is
manufactured
in
one
part
of
a
factory,
\
\
then
secured
for
shipping
in
another,
independent
part
of
the
factory.
The
weight
\
\
of
the
smartwatch
has
a
mean
of
62
grams
and
a
standard
deviation
of
1.0
grams.
\
\
The
weight
of
the
packaging
(box,
user's
guide,
bubble
wrap,
etc.)
has
a
mean
\
\
of
456
grams
and
a
standard
deviation
of
6
grams.
Together,
the
distribution
of
\
\
the
weight
of
the
smartwatch
and
its
packaging
would
have
the
following
mean
and
\
\
standard
deviation:
\n
(A)
Mean
518
grams;
standard
deviation
7.0
grams
(B)
Mean
\
\
518
grams;
standard
deviation
3.5
grams
(C)
Mean
518
grams;
standard
deviation
\
\
6.1
grams
(D)
Mean
394
grams;
standard
deviation
6.1
grams
\n
A:
Let's
think
step
\
\
by
step.
Since
the
weight
of
the
watch
and
the
weight
of
the
packaging
are
independent
\
\
random
variables,
the
mean
and
variance
of
their
sum
is
equal
to
the
sum
of
their
\
\
individual
means
and
variances.
So
the
mean
is
62
+
456
=
518
grams,
and
the
variance
\
\
is
1.0^2
+
6.0^2
=
37,
leading
to
a
standard
deviation
of
6.1
grams.
The
answer
\
\
is
(C).
\n\n
Q:
After
a
frost
warning
was
issued,
the
owner
of
a
large
orange
grove
\
\
asked
his
workers
to
spray
all
his
trees
with
water.
The
water
was
supposed
to
\
\
freeze
and
form
a
protective
covering
of
ice
around
the
orange
blossom.
Nevertheless,
\
\
the
owner
suspected
that
some
trees
suffered
considerable
damage
due
to
the
frost.
\
\
To
estimate
the
proportion
of
trees
that
suffered
more
than
50
percent
damage
\
\
due
to
the
frost,
he
took
a
random
sample
of
100
trees
from
his
grove.
What
is
\
\
the
response
variable
in
this
experiment?
\n
(A)
The
proportion
of
trees
that
suffered
\
\
more
than
50
percent
damage
due
to
frost.
(B)
The
number
of
trees
affected
by
\
\
the
frost.
(C)
The
number
of
trees
sampled
from
the
grove.
(D)
For
each
sampled
\
\
tree,
whether
it
suffered
more
than
50
percent
damage
or
at
most
50
percent
damage.
\n\
A:
Let's
think
step
by
step.
In
this
experiment,
the
response
variable
is
what
is
\
\
measured.
For
each
tree,
what
is
measured
is
whether
or
not
it
suffered
more
than
\
\
50
percent
damage
due
to
the
frost.
The
answer
is
(D).
\n\n
Q:
Suppose
X
and
Y
are
\
\
random
variables
with
E(X)
=
37,
var(X)
=
5,
E(Y)
=
62,
and
var(Y)
=
12.
What
\
\
are
the
expected
value
and
variance
of
the
random
variable
X
+
Y?
\n
(A)
E(X
+
Y)
\
\
=
99,
var(X
+
Y)
=
8.5
(B)
E(X
+
Y)
=
99,
var(X
+
Y)
=
13
(C)
E(X
+
Y)
=
99,
var(X
\
\
+
Y)
=
17
(D)
There
is
insufficient
information
to
answer
this
question.
\n
A:
Let's
\
\
think
step
by
step.
While
means
of
sums
of
random
variables
add
(regardless
of
\
\
whether
the
variables
are
independent)
in
order
to
determine
the
variance
of
a
\
\
sum
of
random
variables,
we
need
to
know
not
just
their
individual
variances
but
\
\
the
covariance
of
the
two
variables,
which
is
not
given
in
this
problem.
The
answer
\
\
is
(D).
\n\n
Q:
Which
of
the
following
sets
has
the
smallest
standard
deviation?
\
\
Which
has
the
largest?
\n
I:
{1,2,3}
\n
II:
{-10,10}
\n
III:
{100}
\n
(A)
I,
II
(B)
II,
\
\
III
(C)
III,
I
(D)
III,
II
\n
A:
Let's
think
step
by
step.
The
variance
of
distribution
\
\
I
is
the
expected
squared
deviation
from
its
mean
(which
is
2),
so
the
variance
\
\
is
2/3
.
The
variance
of
distribution
II
is
10^2
(because
both
elements
are
10
\
\
away
from
the
mean
of
zero).
The
variance
of
distribution
III
is
0,
since
it
has
\
\
a
single
entry.
So
distribution
III
has
the
smallest
standard
deviation
and
distribution
\
\
II
has
the
largest.
The
answer
is
(D).
\n\n
Q:
Which
of
the
following
is
a
correct
\
\
statement
about
correlation?
\n
(A)
If
the
slope
of
the
regression
line
is
exactly
\
\
1,
then
the
correlation
is
exactly
1.
(B)
If
the
correlation
is
0,
then
the
slope
\
\
of
the
regression
line
is
undefined.
(C)
Switching
which
variable
is
called
x
\
\
and
which
is
called
y
changes
the
sign
of
the
correlation.
(D)
The
correlation
\
\
r
is
equal
to
the
slope
of
the
regression
line
when
z-scores
for
the
y-variable
\
\
are
plotted
against
z-scores
for
the
x-variable.
\n
A:
Let's
think
step
by
step.
\
\
Statement
A
is
false
because
the
slope
of
the
regression
line
being
exactly
1
\
\
can
occur
even
when
the
two
variables
are
not
perfectly
correlated.
Statement
\
\
B
is
false
because
uncorrelated
variables
regression
lines
can
have
slope
zero.
\
\
Statement
C
is
false
because
correlation
is
symmetric
in
the
two
random
variables.
\
\
The
answer
is
(D)."
"
group"
:
"
mmlu_flan_cot_fewshot_stem"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_statistics"
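The variance arithmetic in the high_school_statistics few-shot answers above can be reproduced with a small sketch. It is an illustration only, not part of the commit; `sum_of_independent` is a hypothetical helper, and the sets in the standard-deviation question are treated as whole populations, as the prompt's reasoning does.

```python
import math
import statistics


def sum_of_independent(mean_a, sd_a, mean_b, sd_b):
    """Mean and standard deviation of A + B for independent A and B.

    Means always add; variances add only because independence makes the
    covariance zero. Without that, the covariance term would be needed.
    """
    mean = mean_a + mean_b
    variance = sd_a**2 + sd_b**2
    return mean, math.sqrt(variance)


# Smartwatch (62 g, sd 1.0 g) plus packaging (456 g, sd 6 g).
mean, sd = sum_of_independent(62, 1.0, 456, 6.0)
print(f"mean = {mean} g, sd = {sd:.1f} g")

# Population variances of the three sets in the standard-deviation question.
var_i = statistics.pvariance([1, 2, 3])
var_ii = statistics.pvariance([-10, 10])
var_iii = statistics.pvariance([100])
print(var_i, var_ii, var_iii)
```

The first result agrees with option (C) of the smartwatch question (518 g, about 6.1 g), and the three variances order the sets as III smallest and II largest, agreeing with option (D).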
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_us_history.yaml
0 → 100644
"
dataset_name"
:
"
high_school_us_history"
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
us
history.
\n\n
Q:
This
question
refers
to
the
following
information.
\n\
I
come
not
to
urge
personal
claims,
nor
to
seek
individual
benefits;
I
appear
as
\
\
the
advocate
of
those
who
cannot
plead
their
own
cause;
I
come
as
the
friend
of
\
\
those
who
are
deserted,
oppressed,
and
desolate.
In
the
Providence
of
God,
I
am
\
\
the
voice
of
the
maniac
whose
piercing
cries
from
the
dreary
dungeons
of
your
\
\
jails
penetrate
not
your
Halls
of
Legislation.
I
am
the
Hope
of
the
poor
crazed
\
\
beings
who
pine
in
the
cells,
and
stalls,
and
cages,
and
waste
rooms
of
your
poor-houses.
\
\
I
am
the
Revelation
of
hundreds
of
wailing,
suffering
creatures,
hidden
in
your
\
\
private
dwellings,
and
in
pens
and
cabins—shut
out,
cut
off
from
all
healing
influences,
\
\
from
all
mind-restoring
cares.…
Could
their
melancholy
histories
be
spread
before
\
\
you
as
revealed
to
my
grieved
spirit
during
the
last
three
months,
how
promptly,
\
\
how
earnestly
would
you
search
out
the
most
approved
means
of
relief;
how
trifling,
\
\
how
insignificant,
by
comparison,
would
appear
the
sacrifices
you
are
asked
to
\
\
make;
how
would
a
few
dimes
and
dollars,
gathered
from
each
citizen,
diminish
\
\
in
value
as
a
possession,
compared
with
the
certain
benefits
and
vast
good
to
\
\
be
secured
for
the
suffering
insane...by
the
consecration
and
application
of
a
\
\
sufficient
fund
to
the
construction
of
a
suitable
hospital.…
\n
—Dorothea
Dix,
Memorial
\
\
Soliciting
a
State
Hospital
for
the
Protection
and
Cure
of
the
Insane,
\n
Submitted
\
\
to
the
General
Assembly
of
North
Carolina,
November
1848
\n
Dorothea
Dix
can
best
\
\
be
compared
to
whom?
\n
(A)
Abigail
Adams
(B)
Clara
Barton
(C)
Shirley
Temple
(D)
\
\
Hillary
Clinton
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
\
\
us
history
for
help.
Both
Dorothea
Dix
and
Clara
Barton
are
American
nurses.
The
\
\
answer
is
(B).
\n\n
Q:
This
question
refers
to
the
following
information.
\n\"
As
\
\
our
late
Conduct
at
the
Conestoga
Manor
and
Lancaster
have
occasioned
much
Speculation
\
\
&
a
great
diversity
of
Sentiments
in
this
and
neighboring
Governments;
some
vindicating
\
\
&
others
condemning
it;
some
charitably
alleviating
the
Crime,
&
others
maliciously
\
\
painting
it
in
the
most
odious
&
detestable
Colours,
we
think
it
our
duty
to
lay
\
\
before
the
Publick,
the
whole
Matter
as
it
appeared,
&
still
appears,
to
us.
.
\
\
.
.
\n\"
If
these
things
are
not
sufficient
to
prove
an
unjustifiable
Attachment
\
\
in
the
Quakers
to
the
Indians
Savages,
a
fixed
Resolution
to
befriend
them
&
an
\
\
utter
insensibility
to
human
Distresses,
let
us
consider
a
few
more
recent
Facts.
\
\
When
we
found
the
last
Summer
that
we
were
likely
to
get
no
Assistance
from
the
\
\
Government,
some
Volunteers
went
out
at
our
own
Expense,
determined
to
drive
our
\
\
Enemies
from
our
Borders;
&
when
we
came
near
to
the
great
Island,
we
understood
\
\
that
a
Number
of
their
Warriors
had
gone
out
against
our
Frontiers.
Upon
this
\
\
we
returned
and
came
up
with
them
and
fought
with
them
at
the
Munfey
Hill
where
\
\
we
lost
some
of
our
Men
&
killed
some
of
their
Warriors
&
thereby
saved
our
Frontiers
\
\
from
this
Story
in
another
Expedition.
But
no
sooner
had
we
destroyed
their
Provisions
\
\
on
the
great
Island,
&
ruined
their
trade
with
the
good
People
at
Bethlehem,
but
\
\
these
very
Indians,
who
were
justly
suspected
of
having
murdered
our
Friends
in
\
\
Northampton
County,
were
by
the
Influence
of
some
Quakers
taken
under
the
Protection
\
\
of
the
Government
to
screen
them
from
the
Resentments
of
the
Friends
and
Relations
\
\
of
the
Murdered,
&
to
support
them
thro
the
Winter.
\"\n
—
\"
Apology
of
the
Paxton
\
\
Boys
\"
(pamphlet),
1764
(Note:
\"
apology
\"
in
this
context
should
be
read
as
an
\
\
explanation,
not
an
admission
of
guilt
or
regret.
\n
The
sentiments
expressed
in
\
\
the
explanation
above
reflect
which
of
the
ongoing
tensions
during
the
colonial
\
\
period
of
American
history?
\n
(A)
Tensions
between
British
policies
and
the
aspirations
\
\
of
North
American
colonists.
(B)
Tensions
between
American
Indians
allied
with
\
\
the
French
and
those
allied
with
the
British.
(C)
Tensions
between
freed
African
\
\
Americans
and
white
planters.
(D)
Tensions
between
backcountry
settlers
and
elites
\
\
within
colonial
America.
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
\
on
us
history
for
help.
After
the
French
and
Indian
War,
the
Scotch-Irish
settlers
\
\
attacked
American
Indians.
After
the
attacks
on
the
Conestoga,
about
250
Paxton
\
\
Boys
present
their
grievances
to
the
Pennsylvania
legislature.
As
mentioned
in
\
\
the
information,
the
Paxton
Boys
cited
resentment
at
local
elites.
The
answer
\
\
is
(D).
\n\n
Q:
This
question
refers
to
the
following
information.
\n
Our
leaders
\
\
talk
about
stopping
aggression
from
the
north,
but
this
was
a
struggle
among
groups
\
\
of
Vietnamese
until
we
intervened.
We
seem
bent
upon
saving
the
Vietnamese
from
\
\
Ho
Chi
Minh
even
if
we
have
to
kill
them
and
demolish
their
country
to
do
it.
\
\
As
the
native
people
survey
bombed-out
villages,
women
and
children
burned
by
\
\
napalm,
rice
crops
destroyed
and
cities
overrun
with
our
military
personnel,
they
\
\
are
doubtless
saying
secretly
of
the
Vietcong
guerillas
and
of
the
American
forces,
\
\ \"
A
plague
on
both
your
houses.
\"
…
Stop
the
bombing,
north
and
south,
end
search
\
\
and
destroy
offensive
sweeps,
and
confine
our
military
action
to
holding
operations
\
\
on
the
ground.
Bombing
the
north
has
failed
to
halt
or
seriously
check
the
flow
\
\
of
troops
to
the
south
and
may,
in
fact,
have
prompted
a
much
greater
war
effort
\
\
by
Hanoi.
\n
—Senator
George
McGovern,
\"
The
Lessons
of
Vietnam,
\"
April
25,
1967
\n\
Which
of
the
following
opinions
from
the
1960s
most
directly
reflects
the
perspective
\
\
of
George
McGovern's
speech?
\n
(A)
Americans
must
maximize
their
technological
\
\
edge
in
Vietnam.
(B)
American
bombing
in
Vietnam
is
step
by
step
leading
to
progress
\
\
in
the
war.
(C)
American
bombing
in
Vietnam
is
a
failure.
(D)
America
must
not
\
\
give
in
to
defeatism
about
the
war
in
Vietnam.
\n
A:
Let's
think
step
by
step.
We
\
\
refer
to
Wikipedia
articles
on
us
history
for
help.
\"
Stop
the
bombing
\"
and
\"\
Bombing
the
north
has
failed
to
halt
or
seriously
check
the
flow
of
troops
to
the
\
\
south
\"
indicate
that
the
perspective
of
George
McGovern's
speech
is
that
American
\
\
bombing
in
Vietnam
is
a
failure.
The
answer
is
(C).
\n\n
Q:
This
question
refers
\
\
to
the
following
information.
\n\"
In
the
new
Code
of
Laws
which
I
suppose
it
will
\
\
be
necessary
for
you
to
make
I
desire
you
would
Remember
the
Ladies,
and
be
more
\
\
generous
and
favorable
to
them
than
your
ancestors.
Do
not
put
such
unlimited
\
\
power
into
the
hands
of
the
Husbands.
Remember
all
Men
would
be
tyrants
if
they
\
\
could.
If
particular
care
and
attention
is
not
paid
to
the
Ladies
we
are
determined
\
\
to
foment
a
Rebellion,
and
will
not
hold
ourselves
bound
by
any
Laws
in
which
\
\
we
have
no
voice,
or
Representation.
\"\n
Abigail
Adams,
in
a
letter
to
John
Adams,
\
\
1776
\n\"
Special
legislation
for
woman
has
placed
us
in
a
most
anomalous
position.
\
\
Women
invested
with
the
rights
of
citizens
in
one
section—voters,
jurors,
office-holders—crossing
\
\
an
imaginary
line,
are
subjects
in
the
next.
In
some
States,
a
married
woman
may
\
\
hold
property
and
transact
business
in
her
own
name;
in
others,
her
earnings
belong
\
\
to
her
husband.
In
some
States,
a
woman
may
testify
against
her
husband,
sue
and
\
\
be
sued
in
the
courts;
in
others,
she
has
no
redress
in
case
of
damage
to
person,
\
\
property,
or
character.
In
case
of
divorce
on
account
of
adultery
in
the
husband,
\
\
the
innocent
wife
is
held
to
possess
no
right
to
children
or
property,
unless
\
\
by
special
decree
of
the
court.
But
in
no
State
of
the
Union
has
the
wife
the
\
\
right
to
her
own
person,
or
to
any
part
of
the
joint
earnings
of
the
co-partnership
\
\
during
the
life
of
her
husband.
In
some
States
women
may
enter
the
law
schools
\
\
and
practice
in
the
courts;
in
others
they
are
forbidden.
In
some
universities
\
\
girls
enjoy
equal
educational
advantages
with
boys,
while
many
of
the
proudest
\
\
institutions
in
the
land
deny
them
admittance,
though
the
sons
of
China,
Japan
\
\
and
Africa
are
welcomed
there.
But
the
privileges
already
granted
in
the
several
\
\
States
are
by
no
means
secure.
\"\n
Susan
B.
Anthony,
\"
Declaration
of
Rights
for
\
\
Women,
\"
July
4,
1876
\n
The
sentiments
expressed
in
the
second
excerpt
by
Susan
\
\
B.
Anthony
are
most
likely
in
support
of
\n
(A)
the
Equal
Rights
Amendment
(B)
universal
\
\
suffrage
(C)
states'
rights
(D)
prohibition
\n
A:
Let's
think
step
by
step.
We
refer
\
\
to
Wikipedia
articles
on
us
history
for
help.
The
above
information
mentioned
\
\
that
women
are
in
an
anomalous
position
in
terms
of
legislation.
Women's
earnings
\
\
do
not
belong
to
themselves,
or
they
cannot
testify
against
her
husbands.
Susan
\
\
believes
women
should
have
equal
legal
rights
as
men.
The
answer
is
(B).
\n\n
Q:
\
\
This
question
refers
to
the
following
information.
\n\"
Society
in
every
state
is
\
\
a
blessing,
but
government
even
in
its
best
state
is
but
a
necessary
evil;
in
\
\
its
worst
state
an
intolerable
one;
for
when
we
suffer,
or
are
exposed
to
the
\
\
same
miseries
by
a
government,
which
we
might
expect
in
a
country
without
government,
\
\
our
calamity
is
heightened
by
reflecting
that
we
furnish
the
means
by
which
we
\
\
suffer.
Government,
like
dress,
is
the
badge
of
lost
innocence;
the
palaces
of
\
\
kings
are
built
on
the
ruins
of
the
bowers
of
paradise.
For
were
the
impulses
\
\
of
conscience
clear,
uniform,
and
irresistibly
obeyed,
man
would
need
no
other
\
\
lawgiver;
but
that
not
being
the
case,
he
finds
it
necessary
to
surrender
up
a
\
\
part
of
his
property
to
furnish
means
for
the
protection
of
the
rest;
and
this
\
\
he
is
induced
to
do
by
the
same
prudence
which
in
every
other
case
advises
him
\
\
out
of
two
evils
to
choose
the
least.
Wherefore,
security
being
the
true
design
\
\
and
end
of
government,
it
unanswerably
follows
that
whatever
form
thereof
appears
\
\
most
likely
to
ensure
it
to
us,
with
the
least
expense
and
greatest
benefit,
is
\
\
preferable
to
all
others.
\"\n
Thomas
Paine,
Common
Sense,
1776
\n
Which
of
the
following
\
\ \"
miseries
\"
alluded
to
above
were
most
condemned
by
Anti-Federalists
of
the
post-Revolutionary
\
\
era?
\n
(A)
Organized
response
to
Bacon's
Rebellion
(B)
Federal
response
to
Shays's
\
\
Rebellion
(C)
Federal
response
to
the
Whiskey
Rebellion
(D)
Federal
response
to
\
\
Pontiac's
Rebellion
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
\
\
on
us
history
for
help.
Anti-Federalists
do
not
believe
in
centralized
government
\
\
power,
and
suspect
Washington's
military
response
to
Whiskey
Rebellion.
Bacon's
\
\
Rebellion
and
Pontiac's
Rebellion
happen
before
the
Revolution
and
they
can
be
\
\
ruled
out.
The
answer
is
(C)."
"
group"
:
"
mmlu_flan_cot_fewshot_humanities"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_us_history"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_world_history.yaml
0 → 100644
"
dataset_name"
:
"
high_school_world_history"
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
high
\
\
school
world
history.
\n\n
Q:
This
question
refers
to
the
following
information.
\n\
\"
At
least
one
of
the
[world's]
societies
would
have
to
somehow
enormously
increase
\
\
its
productivity
[in
order
to
achieve
global
hegemony].
That
quantum
jump
would
\
\
have
to
be
made
before
the
various
scientific,
technological,
agricultural,
and
\
\
industrial
revolutions
on
which
our
post-quantum-leap
world
rests.
It
could
only
\
\
be
accomplished
by
exploiting
the
ecosystems,
mineral
resources,
and
human
assets
\
\
of
whole
continents
outside
the
lands
of
the
society
making
the
jump.
Western
\
\
Europe
did
just
that
by
means
of
its
brutality
and
guns
and,
more
important,
by
\
\
geographical
and
ecological
luck.
\"\n
Copyright
©
2015
Cambridge
University
Press.
\n\
Alfred
Crosby,
historian,
Ecological
Imperialism,
2004
\n
The
\"
quantum
jump
\"
mentioned
\
\
in
the
passage
most
directly
contributed
to
which
of
the
following
developments
\
\
in
the
period
1450–1750
C.E.?
\n
(A)
A
breakdown
in
trade
routes
through
the
collapse
\
\
of
the
established
state
structure
(B)
An
increase
in
the
population
of
the
world
\
\
through
more
plentiful
supplies
of
food
(C)
The
spread
of
Chinese
and
Indian
belief
\
\
systems
across
the
world
(D)
An
increase
in
social
unrest
\n
A:
Let's
think
step
\
\
by
step.
We
refer
to
Wikipedia
articles
on
world
history
for
help.
The
\"
quantum
\
\
jump
\"
mentioned
in
the
passage
refers
to
the
conquest
of
the
New
World
and
the
\
\
Columbian
Exchange.
Choice
(A)
and
(C)
did
not
happen
in
history.
Choice
(C)
refers
\
\
to
the
human
assets.
The
answer
is
(B).
\n\n
Q:
This
question
refers
to
the
following
\
\
information.
\n\"
The
struggle
against
neo-colonialism
is
not
aimed
at
excluding
\
\
the
capital
of
the
developed
world
from
operating
in
less
developed
countries.
\
\
It
is
aimed
at
preventing
the
financial
power
of
the
developed
countries
being
\
\
used
in
such
a
way
as
to
impoverish
the
less
developed.
\n
Non-alignment,
as
practiced
\
\
by
Ghana
and
many
other
countries,
is
based
on
co-operation
with
all
States
whether
\
\
they
be
capitalist,
socialist
or
have
a
mixed
economy.
Such
a
policy,
therefore,
\
\
involves
foreign
investment
from
capitalist
countries,
but
it
must
be
invested
\
\
in
accordance
with
a
national
plan
drawn
up
by
the
government
of
the
non-aligned
\
\
State
with
its
own
interests
in
mind.
The
issue
is
not
what
return
the
foreign
\
\
investor
receives
on
his
investments…The
question
is
one
of
power.
A
State
in
\
\
the
grip
of
neo-colonialism
is
not
master
of
its
own
destiny.
\"\n
Kwame
Nkrumah,
\
\
Neo-Colonialism,
1965
\n
Which
of
the
following
provides
the
best
context
for
Nkrumah's
\
\
writings?
\n
(A)
The
Industrial
Revolution
(B)
Decolonization
(C)
Regional
Free
\
\
Trade
Associations
(D)
Autarky
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
\
\
articles
on
world
history
for
help.
The
passage
expresses
a
point
that
the
successful
\
\
fight
against
neo-colonialism
was
in
danger
and
the
newly
independent
nations
\
\
like
Ghana
may
be
re-colonized
via
financial
power
of
the
developed
countries.
\
\
The
answer
is
(B).
\n\n
Q:
This
question
refers
to
the
following
information.
\n\"\
Indeed,
as
both
the
fatwas
of
distinguished
[scholars]
who
base
their
opinion
on
\
\
reason
and
tradition
alike
and
the
consensus
of
the
Sunni
community
agree
that
\
\
the
ancient
obligation
of
extirpation,
extermination,
and
expulsion
of
evil
innovation
\
\
must
be
the
aim
of
our
exalted
aspiration,
for
\"
Religious
zeal
is
a
victory
for
\
\
the
Faith
of
God
the
Beneficent
\"
;
then,
in
accordance
with
the
words
of
the
Prophet
\
\
(Peace
upon
him!)
\"
Whosoever
introduces
evil
innovation
into
our
order
must
be
\
\
expelled
\"
and
\"
Whosoever
does
aught
against
our
order
must
be
expelled,
\"
action
\
\
has
become
necessary
and
exigent…
\"\n
Letter
from
Ottoman
Sultan
Selim
I
to
Safavid
\
\
Shah
Ismail
I,
1514
\n
The
letter
from
Selim
I
is
most
clearly
an
example
of
which
\
\
of
the
following?
\n
(A)
The
maintenance
of
military
supremacy
at
all
costs
(B)
\
\
Expanding
tensions
between
religious
sects
(C)
Factors
that
brought
about
the
\
\
collapse
of
the
Ottoman
Empire
(D)
Peacemaking
efforts
among
the
Islamic
empires
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
world
history
for
\
\
help.
The
passage
is
an
example
of
expanding
tensions
between
Selim
and
Ismail.
\
\
In
the
passage
Selim
references
the
fatwa
and
the
consensus
of
the
Sunni
community
\
\
to
oppose
whosoever
introduces
evil.
The
answer
is
(B).
\n\n
Q:
This
question
refers
\
\
to
the
following
information.
\n\"
The
real
grievance
of
the
worker
is
the
insecurity
\
\
of
his
existence;
he
is
not
sure
that
he
will
always
have
work,
he
is
not
sure
\
\
that
he
will
always
be
healthy,
and
he
foresees
that
he
will
one
day
be
old
and
\
\
unfit
to
work.
If
he
falls
into
poverty,
even
if
only
through
a
prolonged
illness,
\
\
he
is
then
completely
helpless,
left
to
his
own
devices,
and
society
does
\
\
not
currently
recognize
any
real
obligation
towards
him
beyond
the
usual
help
\
\
for
the
poor,
even
if
he
has
been
working
all
the
time
ever
so
faithfully
and
\
\
diligently.
The
usual
help
for
the
poor,
however,
leaves
a
lot
to
be
desired,
\
\
especially
in
large
cities,
where
it
is
very
much
worse
than
in
the
country.
\"\
\n
Otto
von
Bismarck,
1884
\n
Otto
von
Bismarck
likely
made
this
speech
in
reaction
\
\
to
which
of
the
following
issues?
\n
(A)
Social
acceptance
of
child
labor
(B)
Declining
\
\
life
expectancy
in
Germany
(C)
Criticisms
of
German
trade
tariffs
(D)
Negative
\
\
effects
attributed
to
industrial
capitalism
\n
A:
Let's
think
step
by
step.
We
refer
\
\
to
Wikipedia
articles
on
world
history
for
help.
The
passage
talks
about
the
grievance
\
\
of
the
worker
under
industrial
capitalism.
The
answer
is
(D).
\n\n
Q:
This
question
\
\
refers
to
the
following
information.
\n
He
contains
all
works
and
desires
and
all
\
\
perfumes
and
all
tastes.
He
enfolds
the
whole
universe
and
in
silence
is
loving
\
\
to
all.
This
is
the
Spirit
that
is
in
my
heart,
this
is
Brahman.
To
him
I
shall
\
\
come
when
I
go
beyond
this
life,
and
to
him
will
come
he
who
has
faith
and
doubts
\
\
not.
\n
—The
Upanishads,
India,
c.
1000
BCE
\n
To
which
religion
does
the
speaker
\
\
most
likely
belong?
\n
(A)
Hinduism
(B)
Buddhism
(C)
Shintoism
(D)
Zoroastrianism
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
world
history
for
\
\
help.
Brahman
refers
to
the
ultimate
reality
of
all
things
in
the
Hindu
religion.
\
\
In
contrast,
Buddhism
does
not
have
a
concept
of
supreme
God.
The
answer
is
(A)."
"
group"
:
"
mmlu_flan_cot_fewshot_humanities"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_high_school_world_history"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_human_aging.yaml
0 → 100644
"
dataset_name"
:
"
human_aging"
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
human
\
\
aging.
\n\n
Q:
All
other
things
being
equal,
which
of
the
following
persons
is
more
\
\
likely
to
show
osteoporosis?
\n
(A)
An
older
Hispanic
American
woman
(B)
An
older
\
\
African
American
woman
(C)
An
older
Asian
American
woman
(D)
An
older
Native
American
\
\
woman
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
aging
\
\
for
help.
Although
osteoporosis
can
occur
at
any
age,
the
risk
is
higher
for
older
\
\
people.
It
is
most
common
in
Asian
and
non-Hispanic
white
women.
The
answer
is
\
\
(C).
\n\n
Q:
The
finding
that
adults
tend
to
remember
events
from
their
adolescence
\
\
better
than
from
other
periods
in
their
lives
is
referred
to
as
the
\n
(A)
Adolescence
\
\
advantage
(B)
Reminiscence
bump
(C)
Memorial
memorial
(D)
Quadratic
retrieval
\
\
spike
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
aging
\
\
for
help.
Reminiscence
bump
is
a
phenomenon
that
older
adults
tend
to
recollect
\
\
events
during
their
young
ages.
People
usually
have
a
period
of
childhood
amnesia
\
\
from
birth
to
around
age
5,
and
a
reminiscence
bump
between
10
and
30.
The
answer
\
\
is
(B).
\n\n
Q:
Which
element
in
tobacco
smoke
is
responsible
for
cancers?
\n
(A)
\
\
Nicotine
(B)
Tar
(C)
Carbon
monoxide
(D)
Smoke
particles
\n
A:
Let's
think
step
\
\
by
step.
We
refer
to
Wikipedia
articles
on
human
aging
for
help.
The
benzene,
\
\
acrylamide
and
acrylonitrile
in
tar
interact
with
the
lungs
and
cause
DNA
mutations
\
\
in
cells
of
the
lungs,
and
lead
to
cancer.
The
answer
is
(B).
\n\n
Q:
When
older
\
\
adults
move
to
a
new
state
after
retirement,
which
of
the
following
is
the
more
\
\
likely
destination?
\n
(A)
Texas
(B)
California
(C)
Hawaii
(D)
Vermont
\n
A:
Let's
\
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
aging
for
help.
Texas
\
\
does
not
have
state
tax,
and
has
low
cost
of
living
compared
with
the
other
three
\
\
options.
The
answer
is
(A)."
"
group"
:
"
mmlu_flan_cot_fewshot_other"
"
include"
:
"
_mmlu_flan_cot_fewshot_template_yaml"
"
task"
:
"
mmlu_flan_cot_fewshot_human_aging"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_human_sexuality.yaml
0 → 100644
"
dataset_name"
:
"
human_sexuality"
"
description"
:
"
The
following
are
multiple
choice
questions
(with
answers)
about
human
\
\
sexuality.
\n\n
Q:
The
most
common
disorder
among
men
who
seek
sexual
therapy
is:
\n\
(A)
premature
ejaculation
(B)
inhibited
ejaculation
(C)
erectile
disorder
(D)
ejaculatory
\
\
disorder
\n
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
\
\
sexuality
for
help.
The
most
common
type
of
arousal
disorder
for
men
is
erectile
\
\
dysfunction,
meaning
that
a
person
is
not
able
to
get
any
physical
satisfaction
\
\
from
sexual
activity
although
he
may
be
interested
in
it.
The
answer
is
(C).
\n\
\n
Q:
A
woman
who
knows
she
has
active
herpes
and
untreated
syphilis
but
continues
\
\
to
have
sex
without
informing
her
partners
of
her
condition
has,
in
psychoanalytic
\
\
terms:
\n
(A)
a
strong
ego
(B)
a
weak
superego
(C)
a
weak
id
(D)
a
strong
superego
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
sexuality
for
\
\
help.
A
person
with
weak
superego
tends
to
be
delinquent,
criminal
or
have
antisocial
\
\
personality.
The
action
of
the
woman
who
knows
she
has
active
venereal
disease
\
\
but
still
have
sex
with
her
partners
indicate
she
may
has
antisocial
personality.
\
\
The
answer
is
(B).
\n\n
Q:
The
nature
of
homosexual
activities
that
occur
during
\
\
preadolescence
include
all
but
which
of
the
following?
\n
(A)
sexual
intercourse
\
\
(B)
circle
jerks
(C)
exhibitionism
(D)
touching
each
other's
genitals
\n
A:
Let's
\
\
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
sexuality
for
help.
\
\
Sexual
intercourse
is
prohibited
during
preadolescence.
The
answer
is
(A).
\n\n\
Q:
Women's
ability
to
have
multiple
orgasms
is
primarily
due
to:
\n
(A)
the
fact
that
\
\
they
do
not
have
a
refractory
period.
(B)
the
response
of
the
inner
layers
of
\
\
the
vagina.
(C)
having
alternating
orgasms
in
different
locations.
(D)
the
G-Spot.
\n\
A:
Let's
think
step
by
step.
We
refer
to
Wikipedia
articles
on
human
sexuality
for
\
\
help.
The
refractory
period
is
the
time
when
a
person
is
not
able
to
be
erect
\
\
or
is
not
interested
in
sex.
The
answer
is
(A).
\n\n
Q:
Morning
sickness
is
typically
\
\
a
problem:
\n
(A)
during
the
first
trimester
(B)
during
the
second
trimester
(C)
\
\
during
the
third
trimester
(D)
all
through
the
pregnancy
\n
A:
Let's
think
step
\
\
by
step.
We
refer
to
Wikipedia
articles
on
human
sexuality
for
help.
Morning
sickness
\
\
usually
begins
by
nine
weeks
after
conception,
corresponding
to
the
first
trimester.
\
\
The
answer
is
(A)."
"group": "mmlu_flan_cot_fewshot_social_sciences"
"include": "_mmlu_flan_cot_fewshot_template_yaml"
"task": "mmlu_flan_cot_fewshot_human_sexuality"
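Every few-shot exemplar in the descriptions above ends its reasoning with "The answer is (X).", so a completion that follows the same pattern can be scored by pulling out the final letter. The snippet below is a minimal sketch of that idea, not the harness's own filter: `extract_choice` and `ANSWER_RE` are hypothetical names introduced here for illustration, and the sketch assumes the model keeps the exemplars' phrasing.

```python
import re
from typing import Optional

# Hypothetical helper for illustration; not part of lm-evaluation-harness.
# Matches the closing phrase used by the few-shot exemplars: "answer is (B)".
ANSWER_RE = re.compile(r"answer\s+is\s+\(([A-D])\)", re.IGNORECASE)


def extract_choice(completion: str) -> Optional[str]:
    """Return the last '(X)' choice letter stated in a chain-of-thought completion."""
    matches = ANSWER_RE.findall(completion)
    # Take the last match so that earlier mentions inside the reasoning are ignored.
    return matches[-1].upper() if matches else None


if __name__ == "__main__":
    sample = (
        "Let's think step by step. Morning sickness usually begins by nine "
        "weeks after conception. The answer is (A)."
    )
    print(extract_choice(sample))
```

Taking the last match rather than the first is a design choice: a step-by-step completion may mention other options before settling on its final answer.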