gaoqiong / lm-evaluation-harness

Commit 0348ed97, authored Oct 09, 2023 by lintangsutawika

merged latest update

Parents: 451a1873, 6769119f

Showing 20 changed files with 618 additions and 0 deletions.
lm_eval/tasks/mmlu/default/mmlu_prehistory.yaml  +4 -0
lm_eval/tasks/mmlu/default/mmlu_professional_accounting.yaml  +4 -0
lm_eval/tasks/mmlu/default/mmlu_professional_law.yaml  +4 -0
lm_eval/tasks/mmlu/default/mmlu_professional_medicine.yaml  +4 -0
lm_eval/tasks/mmlu/default/mmlu_professional_psychology.yaml  +4 -0
lm_eval/tasks/mmlu/default/mmlu_public_relations.yaml  +4 -0
lm_eval/tasks/mmlu/default/mmlu_security_studies.yaml  +4 -0
lm_eval/tasks/mmlu/default/mmlu_sociology.yaml  +4 -0
lm_eval/tasks/mmlu/default/mmlu_us_foreign_policy.yaml  +4 -0
lm_eval/tasks/mmlu/default/mmlu_virology.yaml  +4 -0
lm_eval/tasks/mmlu/default/mmlu_world_religions.yaml  +4 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/_mmlu_flan_cot_fewshot_template_yaml  +24 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_abstract_algebra.yaml  +40 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_anatomy.yaml  +57 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_astronomy.yaml  +54 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_business_ethics.yaml  +55 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_clinical_knowledge.yaml  +58 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_biology.yaml  +60 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_chemistry.yaml  +37 -0
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_computer_science.yaml  +189 -0
lm_eval/tasks/mmlu/default/mmlu_prehistory.yaml (new file, mode 100644)

```yaml
"dataset_name": "prehistory"
"description": "The following are multiple choice questions (with answers) about prehistory.\n\n"
"include": "_default_template_yaml"
"task": "mmlu_prehistory"
```
lm_eval/tasks/mmlu/default/mmlu_professional_accounting.yaml (new file, mode 100644)

```yaml
"dataset_name": "professional_accounting"
"description": "The following are multiple choice questions (with answers) about professional accounting.\n\n"
"include": "_default_template_yaml"
"task": "mmlu_professional_accounting"
```
lm_eval/tasks/mmlu/default/mmlu_professional_law.yaml (new file, mode 100644)

```yaml
"dataset_name": "professional_law"
"description": "The following are multiple choice questions (with answers) about professional law.\n\n"
"include": "_default_template_yaml"
"task": "mmlu_professional_law"
```
lm_eval/tasks/mmlu/default/mmlu_professional_medicine.yaml (new file, mode 100644)

```yaml
"dataset_name": "professional_medicine"
"description": "The following are multiple choice questions (with answers) about professional medicine.\n\n"
"include": "_default_template_yaml"
"task": "mmlu_professional_medicine"
```
lm_eval/tasks/mmlu/default/mmlu_professional_psychology.yaml (new file, mode 100644)

```yaml
"dataset_name": "professional_psychology"
"description": "The following are multiple choice questions (with answers) about professional psychology.\n\n"
"include": "_default_template_yaml"
"task": "mmlu_professional_psychology"
```
lm_eval/tasks/mmlu/default/mmlu_public_relations.yaml (new file, mode 100644)

```yaml
"dataset_name": "public_relations"
"description": "The following are multiple choice questions (with answers) about public relations.\n\n"
"include": "_default_template_yaml"
"task": "mmlu_public_relations"
```
lm_eval/tasks/mmlu/default/mmlu_security_studies.yaml (new file, mode 100644)

```yaml
"dataset_name": "security_studies"
"description": "The following are multiple choice questions (with answers) about security studies.\n\n"
"include": "_default_template_yaml"
"task": "mmlu_security_studies"
```
lm_eval/tasks/mmlu/default/mmlu_sociology.yaml (new file, mode 100644)

```yaml
"dataset_name": "sociology"
"description": "The following are multiple choice questions (with answers) about sociology.\n\n"
"include": "_default_template_yaml"
"task": "mmlu_sociology"
```
lm_eval/tasks/mmlu/default/mmlu_us_foreign_policy.yaml (new file, mode 100644)

```yaml
"dataset_name": "us_foreign_policy"
"description": "The following are multiple choice questions (with answers) about us foreign policy.\n\n"
"include": "_default_template_yaml"
"task": "mmlu_us_foreign_policy"
```
lm_eval/tasks/mmlu/default/mmlu_virology.yaml (new file, mode 100644)

```yaml
"dataset_name": "virology"
"description": "The following are multiple choice questions (with answers) about virology.\n\n"
"include": "_default_template_yaml"
"task": "mmlu_virology"
```
lm_eval/tasks/mmlu/default/mmlu_world_religions.yaml (new file, mode 100644)

```yaml
"dataset_name": "world_religions"
"description": "The following are multiple choice questions (with answers) about world religions.\n\n"
"include": "_default_template_yaml"
"task": "mmlu_world_religions"
```
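The eleven default subject files differ only in the subject name. The sketch below is illustrative and assumes the files follow one pattern. It rebuilds the four keys for a given subject, then resolves `include` by overlaying those keys on a base template. The helper names `subject_config` and `resolve_include` are hypothetical, as is the placeholder template dictionary. The shallow "child keys override the template" rule is also an assumption; this is not lm-evaluation-harness's own config loader.

```python
# Hypothetical sketch: rebuild the per-subject keys seen in lm_eval/tasks/mmlu/default/
# and resolve `include` with an ASSUMED shallow-merge rule (not the harness's loader).


def subject_config(subject: str) -> dict:
    """Return the four keys that each default MMLU subject file sets."""
    readable = subject.replace("_", " ")  # e.g. "world_religions" -> "world religions"
    return {
        "dataset_name": subject,
        "description": (
            "The following are multiple choice questions (with answers) "
            f"about {readable}.\n\n"
        ),
        "include": "_default_template_yaml",
        "task": f"mmlu_{subject}",
    }


def resolve_include(config: dict, templates: dict) -> dict:
    """Copy the included template, then let the subject's own keys override it."""
    merged = dict(templates[config["include"]])
    merged.update({k: v for k, v in config.items() if k != "include"})
    return merged


# Placeholder standing in for _default_template_yaml, whose contents this diff does not show.
templates = {"_default_template_yaml": {"placeholder_key": "inherited", "task": "overridden"}}

cfg = resolve_include(subject_config("world_religions"), templates)
print(cfg["task"])             # mmlu_world_religions
print(cfg["placeholder_key"])  # inherited
```

Because the merge is shallow, a subject file that set a nested key such as `metric_list` would replace the template's whole list rather than extend it. That is one reason to keep per-subject files down to scalar overrides like these.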
lm_eval/tasks/mmlu/flan_cot_fewshot/_mmlu_flan_cot_fewshot_template_yaml (new file, mode 100644)

```yaml
group: mmlu_flan_cot_fewshot
dataset_path: cais/mmlu
validation_split: validation
fewshot_split: dev
output_type: greedy_until
doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}}\nA: Let's think step by step."
doc_to_target: "{{['(A)', '(B)', '(C)', '(D)'][answer]}}"
filter_list:
  - name: "get-answer"
    filter:
      - function: "regex"
        regex_pattern: "(?<=The answer is )(.*)(?=.)"
      - function: "take_first"
generation_kwargs:
  until:
    - "</s>"
  do_sample: false
  temperature: 0.0
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
```
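The template combines three steps: prompt construction, answer extraction from the generated chain of thought, and scoring. The sketch below approximates that flow in plain Python under stated assumptions:

- The Jinja2 `doc_to_text` and `doc_to_target` expressions are rewritten as Python functions.
- The `regex_pattern` string is reused verbatim.
- The `[invalid]` fallback for generations with no match is an assumed placeholder.
- `normalize` only approximates what `exact_match` does with `ignore_case` and `ignore_punctuation`.

It is not the harness's own filter or metric code.

```python
import re
import string


def doc_to_text(doc: dict) -> str:
    """Plain-Python version of the template's Jinja2 doc_to_text."""
    c = doc["choices"]
    return (
        f"Q: {doc['question'].strip()}\n"
        f"(A) {c[0]} (B) {c[1]} (C) {c[2]} (D) {c[3]}\n"
        "A: Let's think step by step."
    )


def doc_to_target(doc: dict) -> str:
    """Map the integer answer index to the lettered target, e.g. 1 -> "(B)"."""
    return ["(A)", "(B)", "(C)", "(D)"][doc["answer"]]


# Same pattern as regex_pattern: text after "The answer is ", minus the line's last character.
ANSWER_RE = re.compile(r"(?<=The answer is )(.*)(?=.)")


def extract_answer(generation: str, fallback: str = "[invalid]") -> str:
    """Approximate the regex + take_first filters; `fallback` is an assumed placeholder."""
    matches = ANSWER_RE.findall(generation)
    return matches[0] if matches else fallback


def normalize(text: str) -> str:
    """Approximate ignore_case=True plus ignore_punctuation=True."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def exact_match(prediction: str, reference: str) -> float:
    return float(normalize(prediction) == normalize(reference))


doc = {"question": " What is 2 + 2? ", "choices": ["3", "4", "5", "22"], "answer": 1}
print(doc_to_text(doc))
prediction = extract_answer(" 2 + 2 = 4. The answer is (B).")
print(prediction, exact_match(prediction, doc_to_target(doc)))  # (B) 1.0
```

The capture group is greedy, and `.` does not match a newline. So the capture runs from "The answer is " to the second-to-last character of that line. Any extra text after the letter on the same line therefore ends up in the prediction and will usually fail the exact match.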
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_abstract_algebra.yaml (new file, mode 100644)

```yaml
dataset_name: abstract_algebra
description: "The following are multiple choice questions (with answers) about abstract\
  \ algebra.\n\nQ: Statement 1 | Every element of a group generates a cyclic subgroup\
  \ of the group. Statement 2 | The symmetric group S_10 has 10 elements.\n(A) True,\
  \ True (B) False, False (C) True, False (D) False, True\nA: Let's think step by\
  \ step. A cyclic group is a group that is generated by a single element. Hence a\
  \ subgroup generated by a single element of a group is cyclic and Statement 1 is\
  \ True. The symmetric group $S_n$ has $n!$ elements, hence it is not true that\
  \ $S_{10}$ has 10 elements, and Statement 2 is False. The answer is (C).\n\nQ:\
  \ Find the characteristic of the ring 2Z.\n(A) 0 (B) 3 (C) 12 (D) 30\nA: Let's\
  \ think step by step. The characteristic of a ring R is $n$ if the statement\
  \ $ka = 0$ for all $a \\in R$ implies that $k$ is a multiple of $n$. Assume that\
  \ $ka = 0$ for all $a \\in 2Z$ for some $k$. In particular $2k = 0$. Hence $k=0$\
  \ and $n=0$. The answer is (A).\n\nQ: Statement 1 | Every function from a finite\
  \ set onto itself must be one to one. Statement 2 | Every subgroup of an abelian\
  \ group is abelian.\n(A) True, True (B) False, False (C) True, False (D) False,\
  \ True\nA: Let's think step by step. Statement 1 is true. Let $S$ be a finite set.\
  \ If $f:S \\rightarrow S$ is an onto function, then $|S| = |f(S)|$. If $f$ were\
  \ not one to one, then for the finite domain $S$ the image would have fewer than\
  \ $|S|$ elements, a contradiction.\nStatement 2 is true. Let $G$ be an abelian\
  \ group and $H$ be a subgroup of $G$. We need to show that $H$ is abelian. Let\
  \ $a,b \\in H$. Then $a,b \\in G$, and since $G$ is abelian, $ab=ba$. Therefore\
  \ $H$ is abelian. The answer is (A).\n\nQ: Statement 1 | If aH is an element of\
  \ a factor group, then |aH| divides |a|. Statement 2 | If H and K are subgroups\
  \ of G then HK is a subgroup of G.\n(A) True, True (B) False, False (C) True, False\
  \ (D) False, True\nA: Let's think step by step. Statement 2 is false. Let $H$ be\
  \ the subgroup of $S_3$ generated by the cycle $(1,2)$ and $K$ the subgroup of\
  \ $S_3$ generated by the cycle $(1,3)$. Both $H$ and $K$ have two elements, the\
  \ generator and the identity. However, $HK$ contains the cycles (1,2), (1,3) and\
  \ (2,3,1), but the inverse of (2,3,1), which is (2,1,3), does not belong to HK,\
  \ hence HK is not a subgroup. The answer is (B).\n\nQ: Find all c in Z_3 such that\
  \ Z_3[x]/(x^2 + c) is a field.\n(A) 0 (B) 1 (C) 2 (D) 3\nA: Let's think step by\
  \ step. Z_3[x]/(x^2 + c) is a field if and only if x^2 + c does not have roots\
  \ in Z_3. That is, x^2 + c != 0 for every x in Z_3. If c = 0, then x^2 + c = x^2\
  \ has root 0. If c = 1, then x^2 + c = x^2 + 1 = 0 + 1 for x = 0, 1 + 1 = 2 for\
  \ x = 1 and 1 + 1 = 2 for x = 2, hence x^2 + 1 does not have any roots. For c = 2\
  \ the polynomial x^2 + 2 has two roots at x = 1 and x = 2. Hence Z_3[x]/(x^2 + c)\
  \ is a field if and only if c = 1. The answer is (B)."
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_abstract_algebra
```
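Two of the abstract algebra exemplars rest on finite facts that a short brute-force check can confirm. This sketch uses hypothetical helper names.

- It tests every c in Z_3 for a root of x^2 + c. A degree-2 polynomial over a field is irreducible exactly when it has no root, so "no root" means the quotient is a field.
- It builds HK for H = <(1,2)> and K = <(1,3)> in S_3. Permutations are written as 0-based tuples.

```python
# Hypothetical helpers checking two abstract-algebra exemplars by brute force.


def quotient_is_field(c: int, p: int = 3) -> bool:
    """Z_p[x]/(x^2 + c) is a field iff x^2 + c is irreducible over Z_p;
    for a degree-2 polynomial that means it has no root in Z_p."""
    return all((x * x + c) % p != 0 for x in range(p))


def compose(f: tuple, g: tuple) -> tuple:
    """Permutation composition (f o g)(i) = f(g(i)) on {0, 1, 2}."""
    return tuple(f[g[i]] for i in range(len(g)))


def is_closed(s: set) -> bool:
    """A finite non-empty subset of a group is a subgroup iff it is closed."""
    return all(compose(a, b) in s for a in s for b in s)


identity = (0, 1, 2)
t12 = (1, 0, 2)  # the transposition (1,2), written 0-based
t13 = (2, 1, 0)  # the transposition (1,3), written 0-based
H = {identity, t12}
K = {identity, t13}
HK = {compose(h, k) for h in H for k in K}

print([c for c in range(3) if quotient_is_field(c)])  # [1]
print(len(HK), is_closed(HK))  # 4 False
```

The size check alone settles the second exemplar: |HK| = 4 does not divide |S_3| = 6, so by Lagrange's theorem HK cannot be a subgroup. The closure test reaches the same conclusion directly.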
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_anatomy.yaml (new file, mode 100644)

```yaml
dataset_name: anatomy
description: "The following are multiple choice questions (with answers) about anatomy.\n\nQ:\
  \ Which of the following is the body cavity that contains the pituitary gland?\n(A)\
  \ Abdominal (B) Cranial (C) Pleural (D) Spinal\nA: Let's think step by step. We\
  \ refer to Wikipedia articles on anatomy for help. Let\u2019s solve this problem\
  \ step by step. The pituitary gland is the major endocrine gland attached to the\
  \ base of the brain, and it is contained in the Cranial cavity. The answer is\
  \ (B).\n\nQ: Which of these branches of the trigeminal nerve contain somatic motor\
  \ processes?\n(A) The supraorbital nerve (B) The infraorbital nerve (C) The mental\
  \ nerve (D) None of the above\nA: Let's think step by step. We refer to Wikipedia\
  \ articles on anatomy for help. Let\u2019s solve this problem step by step.\nWe\
  \ know the following: (A) The supraorbital nerve is a branch of the frontal nerve,\
  \ the largest branch of the ophthalmic division of the trigeminal nerve. (B) The\
  \ infraorbital nerve is a branch of the maxillary division of the trigeminal nerve.\
  \ (C) The mental nerve is a branch of the mandibular division of the trigeminal\
  \ nerve. All of these nerves are purely sensory and do not contain any somatic\
  \ motor processes. Therefore, the answer should be none of the above, which is\
  \ (D). The answer is (D).\n\nQ: In Angle's Class II Div 2 occlusion there is\n(A)\
  \ excess overbite of the upper lateral incisors. (B) negative overjet of the upper\
  \ central incisors. (C) excess overjet of the upper lateral incisors. (D) excess\
  \ overjet of the upper central incisors.\nA: Let's think step by step. We refer\
  \ to Wikipedia articles on anatomy for help. Let\u2019s solve this problem step\
  \ by step. This is a question related to anatomy and orthodontics. Excess overjet\
  \ is associated with Class II occlusions; therefore, we can safely eliminate (B)\
  \ from the list, as negative overjet is often associated with Class III occlusions.\
  \ Now, we need to determine the location of the excess overjet, and that would be\
  \ the upper (maxillary) lateral incisors. Only (C) has the correct information.\
  \ The answer is (C).\n\nQ: The pleura\n(A) have no sensory innervation. (B) are\
  \ separated by a 2 mm space. (C) extend into the neck. (D) are composed of respiratory\
  \ epithelium.\nA: Let's think step by step. We refer to Wikipedia articles on anatomy\
  \ for help. Let\u2019s solve this problem step by step. First, recall that the pleura\
  \ refers to the thin layer of tissue that covers the lungs and lines the interior\
  \ wall of the chest cavity. Now, let\u2019s look at each option:\nOption (A): \u201C\
  The pleura have no sensory innervation.\u201D This information is not correct. The\
  \ pleura do have a sensory innervation.\nOption (B): \u201CThe pleura are separated\
  \ by a 2 mm space.\u201D This information is not correct. There is a very thin \u201C\
  potential\u201D space between the layers of the pleura; however, it is typically\
  \ filled with serous pleural fluid.\nOption (C): \u201CThe pleura extend into the\
  \ neck.\u201D This information is actually true. The cervical pleura, also known\
  \ as the dome of the pleura, lines the extension of the pleural cavity into the\
  \ neck.\nOption (D): \u201CThe pleura are composed of respiratory epithelium.\u201D\
  \ This information is not correct. The pleura are composed of mesothelium and\
  \ underlying connective tissue.\nBecause (A), (B), and (D) are all incorrect, (C)\
  \ is the only correct answer. The answer is (C).\n\nQ: What is the embryological\
  \ origin of the hyoid bone?\n(A) The first pharyngeal arch (B) The first and second\
  \ pharyngeal arches (C) The second pharyngeal arch (D) The second and third pharyngeal\
  \ arches\nA: Let's think step by step. We refer to Wikipedia articles on anatomy\
  \ for help. Let\u2019s solve this problem step by step. The hyoid bone is a small\
  \ U-shaped bone located in the anterior neck. In its resting position, it lies\
  \ between the base of the mandible and the third cervical vertebra. We know that\
  \ the second and the third pharyngeal arches give rise to the horns of the hyoid\
  \ bone; therefore, the embryological origin of the hyoid bone is the second and\
  \ the third pharyngeal arches, which is covered in the last option (D). Therefore,\
  \ we conclude that (D) must be the correct answer. The answer is (D)."
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_anatomy
```
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_astronomy.yaml (new file, mode 100644)

```yaml
dataset_name: astronomy
description: "The following are multiple choice questions (with answers) about astronomy.\n\nQ:\
  \ Where do most short-period comets come from and how do we know?\n(A) The Kuiper\
  \ belt; short period comets tend to be in the plane of the solar system just like\
  \ the Kuiper belt. (B) The Kuiper belt; short period comets tend to come from random\
  \ directions indicating a spherical distribution of comets called the Kuiper belt.\
  \ (C) The asteroid belt; short period comets have orbital periods similar to asteroids\
  \ like Vesta and are found in the plane of the solar system just like the asteroid\
  \ belt. (D) The Oort cloud; short period comets tend to be in the plane of the solar\
  \ system just like the Oort cloud.\nA: Let's think step by step. Most short-period\
  \ comets come from the Kuiper belt, and we know because short period comets tend\
  \ to be in the plane of the solar system, just like the Kuiper belt is. The answer\
  \ is (A).\n\nQ: You are pushing a truck along a road. Would it be easier to accelerate\
  \ this truck on Mars? Why? (Assume there is no friction)\n(A) It would be harder\
  \ since the truck is heavier on Mars. (B) It would be easier since the truck is\
  \ lighter on Mars. (C) It would be harder since the truck is lighter on Mars. (D)\
  \ It would be the same no matter where you are.\nA: Let's think step by step. If\
  \ we assume that there is no friction, then by Newton\u2019s second law the force\
  \ needed to accelerate the truck depends only on the mass of the truck. Hence (A),\
  \ (B) and (C) are incorrect since it doesn\u2019t matter that it\u2019s on Mars,\
  \ and (D) is the correct answer. The answer is (D).\n\nQ: Say the pupil of your\
  \ eye has a diameter of 5 mm and you have a telescope with an aperture of 50 cm.\
  \ How much more light can the telescope gather than your eye?\n(A) 10000 times\
  \ more (B) 100 times more (C) 1000 times more (D) 10 times more\nA: Let's think\
  \ step by step. The amount of light is proportional to the aperture area $A = \\pi\
  \ D^2/4$ for a lens with diameter $D$, so the relative amounts of light between\
  \ the eye with diameter 5 mm and the telescope with diameter 50 cm is $(50 cm)^2/(5\
  \ mm)^2 = (500 mm)^2/(5 mm)^2 = 10000$. The answer is (A).\n\nQ: Why isn't there\
  \ a planet where the asteroid belt is located?\n(A) A planet once formed here but\
  \ it was broken apart by a catastrophic collision. (B) There was not enough material\
  \ in this part of the solar nebula to form a planet. (C) There was too much rocky\
  \ material to form a terrestrial planet but not enough gaseous material to form\
  \ a jovian planet. (D) Resonance with Jupiter prevented material from collecting\
  \ together to form a planet.\nA: Let's think step by step. The asteroid belt is\
  \ a circumstellar disc consisting of a large number of asteroids between Mars and\
  \ Jupiter's orbits. The asteroids in this belt are affected by the gravitational\
  \ pull from both other asteroids and nearby planets. Due to the strong gravitational\
  \ force of Jupiter there are resonances that give rise to low density regions of\
  \ asteroids known as the Kirkwood gaps. So (B) and (C) are not correct since it\u2019\
  s not a lack of material that prevents a planet from being formed, and (A) is incorrect\
  \ because the resonances would have prevented a planet from forming in the first\
  \ place, and (D) is the correct option. The answer is (D).\n\nQ: Why is Mars red?\n\
  (A) Because the surface is covered with heavily oxidized (\"rusted\") minerals.\
  \ (B) Because the atmosphere scatters more light at bluer wavelengths transmitting\
  \ mostly red light. (C) Because Mars is covered with ancient lava flows which are\
  \ red in color. (D) Because flowing water on Mars's surface altered the surface\
  \ minerals several billion years ago.\nA: Let's think step by step. Option (B) is\
  \ not correct because if the red color were caused by scattering off the atmosphere,\
  \ then the Earth, with a much thicker atmosphere, would also look red. Options\
  \ (C) and (D) are not specific enough about why the color of the surface would\
  \ be red, while (A) is correct because it explains that the surface is red due\
  \ to the rusted materials on the surface and the red color comes from the rust.\
  \ So the correct option is (A). The answer is (A)."
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_astronomy
```
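The telescope exemplar is a units exercise, and mixing centimetres with millimetres is the easy mistake. This small sketch computes the light-gathering ratio as a ratio of circular aperture areas, A = pi * D^2 / 4. Both diameters are converted to millimetres first. The function name is hypothetical.

```python
import math


def light_gathering_ratio(d_large_mm: float, d_small_mm: float) -> float:
    """Ratio of collecting areas A = pi * D**2 / 4; the pi/4 factors cancel."""
    def area(d: float) -> float:
        return math.pi * d ** 2 / 4

    return area(d_large_mm) / area(d_small_mm)


# 50 cm telescope aperture vs. 5 mm pupil, both expressed in millimetres.
ratio = light_gathering_ratio(500.0, 5.0)
print(round(ratio))  # 10000
```

Since the pi/4 factors cancel, the result is simply (D_large / D_small)^2 = (500 / 5)^2 = 100^2 = 10000.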
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_business_ethics.yaml (new file, mode 100644)

```yaml
dataset_name: business_ethics
description: "The following are multiple choice questions (with answers) about business\
  \ ethics.\n\nQ: In contrast to _______, _______ aim to reward favourable behaviour\
  \ by companies. The success of such campaigns have been heightened through the use\
  \ of ___________, which allow campaigns to facilitate the company in achieving _________\
  \ .\n(A) Buycotts, Boycotts, Blockchain technology, Charitable donations (B) Buycotts,\
  \ Boycotts, Digital technology, Increased Sales (C) Boycotts, Buyalls, Blockchain\
  \ technology, Charitable donations (D) Boycotts, Buycotts, Digital technology, Increased\
  \ Sales\nA: Let's think step by step. We refer to Wikipedia articles on business\
  \ ethics for help. The sentence that best uses the possible options above is \u201C\
  In contrast to *boycotts*, *buycotts* aim to reward favourable behavior by companies.\
  \ The success of such campaigns have been heightened through the use of *digital\
  \ technology*, which allow campaigns to facilitate the company in achieving *increased\
  \ sales*.\u201D The answer is (D).\n\nQ: _______ is the direct attempt to formally\
  \ or informally manage ethical issues or problems, through specific policies, practices\
  \ and programmes.\n(A) Corporate social responsibility (B) Business ethics management\
  \ (C) Sustainability (D) Environmental management\nA: Let's think step by step.\
  \ We refer to Wikipedia articles on business ethics for help. The direct attempt\
  \ to manage ethical issues through specific policies, practices, and programs is\
  \ business ethics management. The answer is (B).\n\nQ: Three contrasting tactics\
  \ that CSO's can engage in to meet their aims are ________ which typically involves\
  \ research and communication, ________, which may involve physically attacking a\
  \ company's operations or ________, often involving some form of _______.\n(A) Non-violent\
  \ direct action, Violent direct action, Indirect action, Boycott (B) Indirect action,\
  \ Instrumental action, Non-violent direct action, Information campaign (C) Indirect\
  \ action, Violent direct action, Non-violent direct-action Boycott (D) Non-violent\
  \ direct action, Instrumental action, Indirect action, Information campaign\nA:\
  \ Let's think step by step. We refer to Wikipedia articles on business ethics for\
  \ help. The sentence that best uses the possible options above is \u201CThree contrasting\
  \ tactics that CSO's can engage in to meet their aims are *indirect action*, which\
  \ typically involves research and communication, *violent direct action*, which\
  \ may involve physically attacking a company's operations or *non-violent direct\
  \ action*, often involving some form of *boycott*.\u201D The answer is (C).\n\nQ:\
  \ To ensure the independence of the non-executive board members, there are a number\
  \ of steps which can be taken, which include non-executives being drawn from _______\
  \ the company, being appointed for a _________ time period as well as being appointed\
  \ _________.\n(A) Outside, Limited, Independently (B) Inside, Limited, Intermittently\
  \ (C) Outside, Unlimited, Intermittently (D) Inside, Unlimited, Independently\nA:\
  \ Let's think step by step. We refer to Wikipedia articles on business ethics for\
  \ help. The sentence that best uses the possible options above is \u201CTo ensure\
  \ the independence of the non-executive board members, there are a number of steps\
  \ which can be taken, which include non-executives being drawn from *outside* the\
  \ company, being appointed for a *limited* time period as well as being appointed\
  \ *independently*.\u201D The answer is (A).\n\nQ: Beyond the business case for engaging\
  \ in CSR there are a number of moral arguments relating to: negative _______, the\
  \ _______that corporations possess and the ________ of business and society.\n(A)\
  \ Externalities, Power, Independence (B) Publicity, Insubstantial resources, Mutual\
  \ dependence (C) Publicity, Power, Independence (D) Externalities, Power, Mutual\
  \ dependence\nA: Let's think step by step. We refer to Wikipedia articles on business\
  \ ethics for help. The sentence that best uses the possible options above is \u201C\
  Beyond the business case for engaging in CSR there are a number of moral arguments\
  \ relating to: negative *externalities*, the *power* that corporations possess and\
  \ the *mutual dependence* of business and society.\u201D The answer is (D)."
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_business_ethics
```
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_clinical_knowledge.yaml (new file, mode 100644)

```yaml
dataset_name: clinical_knowledge
description: 'The following are multiple choice questions (with answers) about clinical
  knowledge.


  Q: Glycolysis is the name given to the pathway involving the conversion of:

  (A) glycogen to glucose-1-phosphate. (B) glycogen or glucose to fructose. (C) glycogen
  or glucose to pyruvate or lactate. (D) glycogen or glucose to pyruvate or acetyl
  CoA.

  A: Let''s think step by step. We refer to Wikipedia articles on clinical knowledge
  for help. Glycolysis is the name given to the pathway involving conversion of glycogen
  or glucose to pyruvate or lactate. The answer is (C).


  Q: What is the difference between a male and a female catheter?

  (A) Male and female catheters are different colours. (B) Male catheters are longer
  than female catheters. (C) Male catheters are bigger than female catheters. (D)
  Female catheters are longer than male catheters.

  A: Let''s think step by step. We refer to Wikipedia articles on clinical knowledge
  for help. The difference between a male and female catheter is that male catheters
  tend to be longer than female catheters. The answer is (B).


  Q: How many attempts should you make to cannulate a patient before passing the job
  on to a senior colleague, according to the medical knowledge of 2020?

  (A) 4 (B) 3 (C) 2 (D) 1

  A: Let''s think step by step. We refer to Wikipedia articles on clinical knowledge
  for help. According to the medical protocol as of 2020, you should make two attempts
  to cannulate a patient before passing the job on to a more-senior practitioner.
  The answer is (C).


  Q: In the assessment of the hand function which of the following is true?

  (A) Abduction of the thumb is supplied by spinal root T2 (B) Opposition of the
  thumb by opponens pollicis is supplied by spinal root T1 (C) Finger adduction is
  supplied by the median nerve (D) Finger abduction is mediated by the palmar interossei

  A: Let''s think step by step. We refer to Wikipedia articles on clinical knowledge
  for help. Of all the options, it is only true that the opposition of the thumb by
  opponens pollicis is supplied by spinal root T1. The answer is (B).


  Q: The energy for all forms of muscle contraction is provided by:

  (A) ATP. (B) ADP. (C) phosphocreatine. (D) oxidative phosphorylation.

  A: Let''s think step by step. We refer to Wikipedia articles on clinical knowledge
  for help. The energy for muscular contraction is provided by ATP (adenosine triphosphate),
  which is the cell''s main energy currency. The answer is (A).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_clinical_knowledge
```
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_biology.yaml (new file, mode 100644)

```yaml
dataset_name: college_biology
description: "The following are multiple choice questions (with answers) about college\
  \ biology.\n\nQ: Which of the following represents an accurate statement concerning\
  \ arthropods?\n(A) They possess an exoskeleton composed primarily of peptidoglycan.\
  \ (B) They possess an open circulatory system with a dorsal heart. (C) They are\
  \ members of a biologically unsuccessful phylum incapable of exploiting diverse\
  \ habitats and nutrition sources. (D) They lack paired, jointed appendages.\nA:\
  \ Let's think step by step. Peptidoglycan is known to comprise the cell wall of\
  \ most bacteria, rather than the exoskeleton of arthropods, which is made of chitin,\
  \ which rules out (A). Option (C) is false because arthropods are a highly successful\
  \ phylum. Likewise, arthropods have paired, jointed appendages, which rules out\
  \ (D). The only remaining option is (B), as arthropods have an open circulatory\
  \ system with a dorsal tubular heart. The answer is (B).\n\nQ: In a given population,\
  \ 1 out of every 400 people has a cancer caused by a completely recessive allele,\
  \ b. Assuming the population is in Hardy-Weinberg equilibrium, which of the following\
  \ is the expected proportion of individuals who carry the b allele but are not expected\
  \ to develop the cancer?\n(A) 1/400 (B) 19/400 (C) 20/400 (D) 38/400\nA: Let's\
  \ think step by step. According to the Hardy-Weinberg law, $p^2 + 2 p q + q^2 = 1$,\
  \ and $p + q = 1$, where $p$ is the frequency of the dominant allele, $q$ is the\
  \ frequency of the recessive allele, and $p^2$, $q^2$, and $2pq$ are the frequencies\
  \ of dominant homozygous, recessive homozygous, and heterozygous individuals, respectively.\
  \ The frequency of the recessive allele (q) is $\\sqrt{\\frac{1}{400}} = 0.05$.\
  \ We have $p = 1 - q = 0.95$. The frequency of heterozygous individuals is $2pq\
  \ = 2 \\cdot 0.05 \\cdot 0.95 = 0.095$. The number of heterozygous individuals\
  \ is equal to the frequency of heterozygous individuals times the size of the population,\
  \ or $0.095 * 400 = 38$. So we end up with 38/400. The answer is (D).\n\nQ: According\
  \ to the pressure-flow model of movement of phloem contents, photosynthate movement\
  \ from source to sink is driven by\n(A) an ATP-dependent pressure-flow pump (B)\
  \ a water-pressure potential gradient (C) transpiration (D) apoplastic diffusion\n\
  A: Let's think step by step. It is a gradient in water pressure that induces the\
  \ movement of phloem content, which refers to answer (B). The mechanism of movement\
  \ does not rely on metabolism, which rules out (A). Transpiration refers to the\
  \ exhalation of water vapor through plant stomata, and is also not related, which\
  \ rules out (C). While the apoplastic pathway is one of two main pathways for water\
  \ transport in plants, it is not central to the pressure flow model, which rules\
  \ out (D). The answer is (B).\n\nQ: Which of the following contain DNA sequences\
  \ required for the segregation of chromosomes in mitosis and meiosis?\n(A) Telomeres\
  \ (B) Centromeres (C) Nucleosomes (D) Spliceosomes\nA: Let's think step by step.\
  \ Telomeres cap the ends of chromosomes and are not required for segregation, which\
  \ rules out (A). Nucleosomes are the repeating subunit that comprises chromatin\
  \ packed in a cell nucleus, and do not specifically refer to DNA sequences necessary\
  \ for segregating chromosomes in cell division, which rules out (C). A spliceosome\
  \ is a large ribonucleoprotein complex that removes introns from transcribed pre-mRNA\
  \ rather than governing chromosome segregation, which rules out (D). Centromeres\
  \ are directly responsible for segregating chromosomes in cell division. The answer\
  \ is (B).\n\nQ: The presence of homologous structures in two different organisms,\
  \ such as the humerus in the front limb of a human and a bird, indicates that\n\
  (A) the human and bird are polyphyletic species (B) a human's and bird's evolution\
  \ is convergent (C) the human and bird belong to a clade (D) the human and bird\
  \ developed by analogy\nA: Let's think step by step. Polyphyletic species are organisms\
  \ that are grouped due to having similar characteristics but which do not have a\
  \ common ancestor. This is not the case for humans and birds, which rules out (A).\
  \ Convergent evolution refers to the independent development of similar features\
  \ in different species at different periods, which is also not the case for humans\
  \ and birds, which rules out (B). Analogy refers to the superficial resemblance\
  \ of structures that have different origins, which is not the case for the human\
  \ and bird forearms, which rules out (D). Humans and birds do belong to the same\
  \ clade, a group of organisms consisting of a common ancestor and all of its descendants.\
  \ The answer is (C)."
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_college_biology
```
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_chemistry.yaml
0 → 100644
dataset_name: college_chemistry
description: "The following are multiple choice questions (with answers) about college\
  \ chemistry.\n\nQ: 3 Cl\u2212(aq) + 4 CrO_4^2\u2212(aq) + 23 H+(aq) \u2192 3 HClO2(aq)\
  \ + 4 Cr3+(aq) + 10 H2O(l). In the reaction shown above, Cl\u2212(aq) behaves as\n\
  (A) an acid (B) a base (C) a catalyst (D) a reducing agent\nA: Let's think step\
  \ by step. A molecule that behaves as a base accepts an H+ ion (or proton) from\
  \ another molecule, whereas a molecule that behaves as an acid donates an H+ ion\
  \ (or proton) to another molecule. Neither of these is the case for Cl in this reaction,\
  \ which rules out (A) and (B). A catalyst is a substance that only accelerates a\
  \ reaction without itself undergoing chemical change, which is not the case here.\
  \ This rules out (C). Instead, the $Cl^{-}$ ions donate electrons in the reaction,\
  \ being oxidized from -1 to +3 as they form 3 HClO2. This is the behavior of a reducing\
  \ agent, or (D). The answer is (D).\n\nQ: Which of the following statements about\
  \ the lanthanide elements is NOT true?\n(A) The most common oxidation state for\
  \ the lanthanide elements is +3. (B) Lanthanide complexes often have high coordination\
  \ numbers (> 6). (C) All of the lanthanide elements react with aqueous acid to liberate\
  \ hydrogen. (D) The atomic radii of the lanthanide elements increase across the\
  \ period from La to Lu.\nA: Let's think step by step. The atomic radii of the lanthanide\
  \ elements in fact decrease across the period from La to Lu. Options (A), (B), and\
  \ (C) are all true. This means that only (D) is NOT true. The answer is (D).\n\n\
  Q: Which of the following lists the hydrides of group-14 elements in order of thermal\
  \ stability, from lowest to highest?\n(A) PbH4 < SnH4 < GeH4 < SiH4 < CH4 (B) PbH4\
  \ < SnH4 < CH4 < GeH4 < SiH4 (C) CH4 < SiH4 < GeH4 < SnH4 < PbH4 (D) CH4 < PbH4\
  \ < GeH4 < SnH4 < SiH4\nA: Let's think step by step. The thermal stability of group-14\
  \ hydrides decreases as we move from the top of group 14 to the bottom. The order\
  \ of elements in the group from top to bottom is C, Si, Ge, Sn, Pb. Therefore, in\
  \ order of increasing thermal stability we have PbH4, SnH4, GeH4, SiH4, and CH4,\
  \ or answer (A). The answer is (A).\n\nQ: Predict the number of lines in the EPR\
  \ spectrum of a solution of 13C-labelled methyl radical (13CH3\u2022), assuming\
  \ the lines do not overlap.\n(A) 4 (B) 3 (C) 6 (D) 24 (E) 8\nA: Let's think step\
  \ by step. The electron paramagnetic resonance spectrum will be split by two forms\
  \ of interactions. The first is the hyperfine interaction with the 13C (nuclear\
  \ spin $I = \\frac{1}{2}$), which will split the spectrum into 2 lines. This will\
  \ be further split into 4 lines by the interaction with three equivalent 1H nuclei.\
  \ The total number of lines is therefore $2 \\cdot 4 = 8$. The answer is (E)."
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_college_chemistry
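The 2 · 4 = 8 count in the EPR exemplar above follows the first-order rule that n magnetically equivalent nuclei of spin I split each line into 2nI + 1 components, with independent groups multiplying. A minimal sketch under that assumption (resolved couplings, no overlapping lines); `epr_line_count` is a hypothetical helper, not part of lm-eval-harness:

```python
from fractions import Fraction
from typing import Iterable, Tuple


def epr_line_count(groups: Iterable[Tuple[int, Fraction]]) -> int:
    """First-order EPR line count, assuming every coupling is resolved and no lines overlap.

    Each group is (n, I): n magnetically equivalent nuclei of spin I. Such a group
    splits every existing line into 2*n*I + 1 lines, and the groups multiply.
    """
    lines = 1
    for n, spin in groups:
        lines *= int(2 * n * Fraction(spin) + 1)
    return lines


half = Fraction(1, 2)
# 13CH3 radical: one 13C nucleus (I = 1/2), then three equivalent 1H nuclei (I = 1/2).
print(epr_line_count([(1, half), (3, half)]))  # 8  (= 2 * 4)
print(epr_line_count([(3, half)]))             # 4  (12CH3: 12C has I = 0)
```

Using exact fractions keeps half-integer spins from introducing floating-point rounding before the final integer conversion.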
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_computer_science.yaml
0 → 100644
dataset_name: college_computer_science
description: 'The following are multiple choice questions (with answers) about college computer science.
  Q: Which of the following regular expressions is equivalent to (describes the same set of strings as) (a* + b)*(c + d)?
  (A) a*(c + d)+ b(c + d)
  (B) a*(c + d)* + b(c + d)*
  (C) a*(c + d)+ b*(c + d)
  (D) (a + b)*c +(a + b)*d
  A: Let''s think step by step. We know that:
  1. (X* + Y)* = (X + Y)*
  2. X(Y + Z) = XY + XZ
  Using equation 1 we can rewrite (a* + b)*(c + d) as:
  3. (a + b)*(c + d)
  Using equation 2 we can rewrite equation 3 as:
  (a + b)*c + (a + b)*d
  The answer is (D).
  Q: The Singleton design pattern is used to guarantee that only a single instance of a class may be instantiated. Which of the following is (are) true of this design pattern?
  I. The Singleton class has a static factory method to provide its instance.
  II. The Singleton class can be a subclass of another class.
  III. The Singleton class has a private constructor.
  (A) I only
  (B) II only
  (C) III only
  (D) I, II, and III
  A: Let''s think step by step. Statement I is a correct statement about a Singleton, because a Singleton provides its single instance through a static factory method. Statement II is also correct, because there is no inherent restriction regarding the inheritance of a Singleton. Statement III is also correct, because a Singleton must be instantiated only once, so its constructor is made private to prevent any construction except via its static factory method. Given these facts, statements I, II, and III are all correct. The answer is (D).
  Q: A certain pipelined RISC machine has 8 general-purpose registers R0, R1, . . . , R7 and supports the following operations:
  ADD Rs1, Rs2, Rd (Add Rs1 to Rs2 and put the sum in Rd)
  MUL Rs1, Rs2, Rd (Multiply Rs1 by Rs2 and put the product in Rd)
  An operation normally takes one cycle; however, an operation takes two cycles if it produces a result required by the immediately following operation in an operation sequence.
  Consider the expression AB + ABC + BC, where variables A, B, C are located in registers R0, R1, R2. If the contents of these three registers must not be modified, what is the minimum number of clock cycles required for an operation sequence that computes the value of AB + ABC + BC?
  (A) 5
  (B) 6
  (C) 7
  (D) 8
  A: Let''s think step by step. First, we are given that A is in R0, B is in R1, and C is in R2.
  Next, we can see that we must compute three multiplies (AB, BC, and ABC) and two adds (AB + BC and (AB + BC) + ABC) to compute our final answer, resulting in a minimum of five clock cycles.
  Next, we can see that there is no way to avoid at least one pipeline stall when computing our final answer, because to compute our final sum we must wait at least one cycle for the results from the previous stage to be ready. Thus, our minimum number of cycles must be 6.
  We can verify that we can create a solution that requires only six cycles as follows:
  compute AB: MUL R0, R1, R3
  compute BC: MUL R1, R2, R4
  compute ABC: MUL R3, R2, R5
  compute AB + BC: ADD R3, R4, R6
  STALL
  compute AB + ABC + BC: ADD R5, R6, R7
  So there are 6 cycles. The answer is (B).
  Q: A compiler generates code for the following assignment statement.
  G := (A + B) * C - (D + E) * F
  The target machine has a single accumulator and a single-address instruction set consisting of instructions load, store, add, subtract, and multiply. For the arithmetic operations, the left operand is taken from the accumulator and the result appears in the accumulator. The smallest possible number of instructions in the resulting code is
  (A) 5
  (B) 6
  (C) 7
  (D) 9
  A: Let''s think step by step. We can compute the final answer with the following sequence of operations:
  1. LOAD D (accumulator = D)
  2. ADD E (accumulator = D+E)
  3. MUL F (accumulator = (D+E)*F)
  4. STORE X (X = (D+E)*F)
  5. LOAD A (accumulator = A)
  6. ADD B (accumulator = A+B)
  7. MUL C (accumulator = (A+B)*C)
  8. SUB X (accumulator = (A+B)*C - (D+E)*F)
  9. STORE G (G = (A+B)*C - (D+E)*F)
  This sequence takes 9 instructions. The answer is (D).
  Q: Consider a computer design in which multiple processors, each with a private cache memory, share global memory using a single bus. This bus is the critical system resource. Each processor can execute one instruction every 500 nanoseconds as long as memory references are satisfied by its local cache. When a cache miss occurs, the processor is delayed for an additional 2,000 nanoseconds. During half of this additional delay, the bus is dedicated to serving the cache miss. During the other half, the processor cannot continue, but the bus is free to service requests from other processors. On average, each instruction requires 2 memory references. On average, cache misses occur on 1 percent of references. What proportion of the capacity of the bus would a single processor consume, ignoring delays due to competition from other processors?
  (A) 1/50
  (B) 1/27
  (C) 1/25
  (D) 2/27
  A: Let''s think step by step. We know that each instruction requires two memory references, and that there is an average cache miss rate of one percent.
  Thus a given processor has:
  (1 cache miss / 100 references) * (2 references / instruction) = (2 cache misses / 100 instructions), so:
  misses_per_instruction = 1 cache miss / 50 instructions.
  Next, we know that each instruction requires 500 nanoseconds when there is no cache miss, and 500 + 2000 = 2500 nanoseconds when there is a cache miss. Thus:
  50 instructions / ((49 * 500) + (1 * 2500)) nanoseconds, so:
  instructions_per_ns = 50 instructions / 27000 nanoseconds.
  Now, we know that each cache miss locks the bus for half of the 2000 nanosecond cache miss delay, or 1000 nanoseconds, so:
  lock_ns_per_miss = 1000 nanoseconds / cache miss.
  Thus we can see that on average a single processor will lock the bus for:
  lock_ns_per_miss * misses_per_instruction * instructions_per_ns = (1000 nanoseconds / cache miss) * (1 cache miss / 50 instructions) * (50 instructions / 27000 nanoseconds) = 1000 * (1/50) * (50/27000) = 1000/27000 = 1/27.
  The answer is (B).'
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_college_computer_science
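The six-cycle argument in the pipelined RISC exemplar can be checked mechanically. A minimal sketch, assuming only the cost model stated in that question (one cycle per operation, two when the immediately following operation reads its result); `count_cycles` and `run` are hypothetical helpers, not part of lm-eval-harness:

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: (opcode, source1, source2, destination), e.g. ("MUL", "R0", "R1", "R3").
Instr = Tuple[str, str, str, str]


def count_cycles(program: List[Instr]) -> int:
    """One cycle per operation, or two if the immediately following
    operation reads this operation's destination register."""
    cycles = 0
    for i, (_, _, _, dest) in enumerate(program):
        if i + 1 < len(program) and dest in program[i + 1][1:3]:
            cycles += 2  # result needed right away: pipeline stall
        else:
            cycles += 1
    return cycles


def run(program: List[Instr], regs: Dict[str, int]) -> Dict[str, int]:
    """Execute ADD/MUL on a copy of the register file and return it."""
    regs = dict(regs)
    for op, s1, s2, dest in program:
        regs[dest] = regs[s1] + regs[s2] if op == "ADD" else regs[s1] * regs[s2]
    return regs


schedule: List[Instr] = [
    ("MUL", "R0", "R1", "R3"),  # R3 = AB
    ("MUL", "R1", "R2", "R4"),  # R4 = BC
    ("MUL", "R3", "R2", "R5"),  # R5 = AB * C = ABC
    ("ADD", "R3", "R4", "R6"),  # R6 = AB + BC
    ("ADD", "R5", "R6", "R7"),  # R7 = ABC + (AB + BC); reads R6, so the previous ADD stalls
]

A, B, C = 2, 3, 5
final = run(schedule, {"R0": A, "R1": B, "R2": C})
print(count_cycles(schedule))               # 6
print(final["R7"] == A*B + A*B*C + B*C)     # True
```

Forming ABC as AB * C matters here: MUL R3, R4, R5 would compute AB * BC = AB²C instead, and because it reads R4 right after R4 is written, it would add a second stall and bring the total to 7 cycles.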