gaoqiong / lm-evaluation-harness
Commit b32b3793, authored Dec 04, 2023 by lintangsutawika
add \n\n to end of description
parent 7afae7b5
Changes 58
Showing 20 changed files with 20 additions and 20 deletions (+20 -20)
lm_eval/tasks/mmlu/flan_cot_fewshot/_cot_prompts.json +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_abstract_algebra.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_anatomy.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_astronomy.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_business_ethics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_clinical_knowledge.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_biology.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_chemistry.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_computer_science.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_mathematics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_medicine.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_physics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_computer_security.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_conceptual_physics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_econometrics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_electrical_engineering.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_elementary_mathematics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_formal_logic.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_global_facts.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_biology.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/_cot_prompts.json
View file @ b32b3793
This source diff could not be displayed because it is too large. You can view the blob instead.
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_abstract_algebra.yaml
View file @ b32b3793
@@ -35,7 +35,7 @@
   \ then x^2 + c = x^2 + 1 = 0 + 1 for x = 0, 1 + 1 = 2 for x = 1 and 1 + 1 = 2 for\
   \ x = 2, hence x^2 + 1 does not have any roots. For c = 2 the polynomial x^2 + 2\
   \ has two roots at x = 1 and x = 2. Hence Z_3[x]/(x^2 + c) is a field if and only\
-  \ if c = 1. The answer is (B)."
+  \ if c = 1. The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_abstract_algebra"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_anatomy.yaml
View file @ b32b3793
@@ -51,7 +51,7 @@
   \ of the hyoid bone; therefore, the embryological origin of the hyoid bone are the\
   \ second and the third pharyngeal arches—this information is covered in the last\
   \ option (D). Therefore, we conclude that (D) must be the correct answer. The answer\
-  \ is (D)."
+  \ is (D).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_anatomy"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_astronomy.yaml
View file @ b32b3793
@@ -49,7 +49,7 @@
   \ red. Options (C) and (D) are not specific enough about why the color of the surface\
   \ would be red, while (A) is correct because it explains that the surface is red\
   \ due to the rusted materials on the surface and the red color comes from the rust.\
-  \ So the correct option is (A). The answer is (A)."
+  \ So the correct option is (A). The answer is (A).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_astronomy"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_business_ethics.yaml
View file @ b32b3793
@@ -50,7 +50,7 @@
   \ that best uses the possible options above is “Beyond the business case for engaging\
   \ the CSR there are a number of moral arguments relating to: negative *externalities*,\
   \ the *power* that corporations possess and the *mutual independence* of business\
-  \ and society. The answer is (D)."
+  \ and society. The answer is (D).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_business_ethics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_clinical_knowledge.yaml
View file @ b32b3793
@@ -29,7 +29,7 @@
   \ (D) oxidative phosphorylation.\nA: Let's think step by step. We refer to Wikipedia\
   \ articles on clinical knowledge for help. The energy for muscular contraction is\
   \ provided by ATP (adenosine triphosphate), which is the powerhouse of the cell.\
-  \ The answer is (A)."
+  \ The answer is (A).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_clinical_knowledge"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_biology.yaml
View file @ b32b3793
@@ -55,7 +55,7 @@
   \ resemblance of structures that have different origins, which is not the case for\
   \ the human and bird forearms, which rules out (D). Humans and birds do belong to\
   \ the same clade - a group of organisms composed of a common ancestor. The answer\
-  \ is (C)."
+  \ is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_biology"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_chemistry.yaml
View file @ b32b3793
@@ -32,7 +32,7 @@
   \ hyperfine interaction with the 13C (nuclear spin $I = \nrac{1}{2}$) which will\
   \ split the spectrum into 2 lines. This will be further split into 4 lines by the\
   \ interaction with three equivalent 1H nuclei. The total number of lines is therefore\
-  \ $2 \\cdot 4 = 8$. The answer is (E)."
+  \ $2 \\cdot 4 = 8$. The answer is (E).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_chemistry"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_computer_science.yaml
View file @ b32b3793
@@ -73,7 +73,7 @@
   Thus we can see that on average a single processor will lock the bus for:\nlock_ns_per_miss\
   \ * misses_per_instruction * instructions_per_ns =\n(1000 nanoseconds / cache miss)\
   \ * (1 cache miss / 50 instructions) * (50 instructions / 27000 nanoseconds) = 1000\
-  \ * (1/50) * (50/27000) = 1000/27000 = 1/27. The answer is (B)."
+  \ * (1/50) * (50/27000) = 1000/27000 = 1/27. The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_computer_science"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_mathematics.yaml
View file @ b32b3793
@@ -44,7 +44,7 @@
   \ $t \\in \\mathbb{R}, \\ln ((s(t)-2))=-[t / 25]+C$. Let $K:=e^{C}$. Then, for all\
   \ $t \\in \\mathbb{R}$, we have $(s(t))-2=K e^{-t / 25}$, and so $s(t)=2+K e^{-t\
   \ / 25}$. Then $3=s(0)=2+K e^{0}=2+K$, so $K=1$. Then $s(100)=2+K e^{-100 / 25}=2+1\
-  \ \\cdot e^{-4}=2+e^{-4}$. The answer is (D)."
+  \ \\cdot e^{-4}=2+e^{-4}$. The answer is (D).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_mathematics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_medicine.yaml
View file @ b32b3793
@@ -46,7 +46,7 @@
   \ monocarbylic acid transporters.\nA: Let's think step by step. We refer to Wikipedia\
   \ articles on medicine for help. Glucose (also known as the blood sugar) is the\
   \ main sugar found in the human body. It is transported into the muscle cell via\
-  \ diffusion through protein transporters called GLUT4. The answer is (A)."
+  \ diffusion through protein transporters called GLUT4. The answer is (A).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_medicine"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_physics.yaml
View file @ b32b3793
@@ -38,7 +38,7 @@
   \ go into the gases internal energy or work done against an external force. However,\
   \ if the volume of the gas container is constant, no work will be done (since work\
   \ is pressure times change in volume). So, at constant volume, all of the heat goes\
-  \ into the internal energy. The answer is (B)."
+  \ into the internal energy. The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_physics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_computer_security.yaml
View file @ b32b3793
@@ -30,7 +30,7 @@
   \ resulted from improper input validation (due to a missing bounds check) in the\
   \ implementation of the TLS heartbeat extension. The vulnerability was classified\
   \ as a buffer over-read, a situation where more data can be read than should be\
-  \ allowed. The answer is (C)."
+  \ allowed. The answer is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_computer_security"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_conceptual_physics.yaml
View file @ b32b3793
@@ -27,7 +27,7 @@
   \ speed in the direction of the wind is greater than it would be in the absence\
   \ of wind, and its direction orthogonal to the wind is the same as it would be in\
   \ the absence of the wind. The total speed, which is these two components added\
-  \ in quadrature, is thus greater than the speed in still air. The answer is (B)."
+  \ in quadrature, is thus greater than the speed in still air. The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_conceptual_physics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_econometrics.yaml
View file @ b32b3793
@@ -57,7 +57,7 @@
   \ die away (B) Persist indefinitely (C) Grow exponentially (D) Never occur\nA: Let's\
   \ think step by step. We refer to Wikipedia articles on econometrics for help. This\
   \ is a formal logic problem about stationally process. For a stationary autoregressive\
-  \ process, shocks will eventually die away. The answer is (A)."
+  \ process, shocks will eventually die away. The answer is (A).\n\n"
 "group": "mmlu_flan_cot_fewshot_social_sciences"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_econometrics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_electrical_engineering.yaml
View file @ b32b3793
@@ -28,7 +28,7 @@
   \ is 100. Find the total resistance\n(A) 200Ω (B) 100Ω (C) 50Ω (D) 10Ω\nA: Let's\
   \ think step by step. In lap winding, effectively two resistors are connected in\
   \ parallel, so the actual resistance of each pair is 1 Ohm. Since we have 50 pairs,\
-  \ we get a total resistance of 50 Ohms. The answer is (C)."
+  \ we get a total resistance of 50 Ohms. The answer is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_electrical_engineering"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_elementary_mathematics.yaml
View file @ b32b3793
@@ -35,7 +35,7 @@
   \nQ: Which expression is equivalent to 5 x 9?\n(A) (5 x 4) x (6 x 5)\n(B) (5 x 5)\
   \ + (5 x 4)\n(C) (5 x 5) + (5 x 9)\n(D) (5 x 9) x (6 x 9)\nA: Let's think step by\
   \ step. We know that 9 = (5 + 4), so 5 x 9 = 5 x (5 + 4) = (5 x 5) + (5 x 4). The\
-  \ answer is (B)."
+  \ answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_elementary_mathematics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_formal_logic.yaml
View file @ b32b3793
@@ -47,7 +47,7 @@
   \ (∀x)(Px ⊃ ~Dx) → For all x, x is on Mars implies that x do not drive on Mars.\n\
   Option (D): ~Dp: → p do not drive on Mars.\nOf all these options, Option (C) appears\
   \ to be the best and most meaningful interpretation of the argument “No people drive\
-  \ on Mars.” The answer is (C)."
+  \ on Mars.” The answer is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_humanities"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_formal_logic"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_global_facts.yaml
View file @ b32b3793
@@ -28,7 +28,7 @@
   \ of their nation or the world.\nA: Let's think step by step. We refer to Wikipedia\
   \ articles on global facts for help. As of 2019, most people tend to be optimistic\
   \ about their own future but pessimistic about the future of their nation or the\
-  \ world. The answer is (B)."
+  \ world. The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_global_facts"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_biology.yaml
View file @ b32b3793
@@ -48,7 +48,7 @@
   \ proceed with cell division. Cues like these act by changing the activity of core\
   \ cell cycle regulators inside the cell. The most common regulators are cyclins\
   \ and cyclin-dependent kinases. Fibroblast cells do not play any role in cell division.\
-  \ The answer is (D)."
+  \ The answer is (D).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_high_school_biology"
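The commit message says only "add \n\n to end of description". A minimal sketch of why the trailing blank line can matter is given below. It assumes the description string is concatenated directly with the text that follows it; the `build_prompt` helper and the sample strings are hypothetical and are not taken from the harness code.

```python
def build_prompt(description: str, question: str) -> str:
    """Hypothetical helper: join description and question with no separator."""
    return description + question


# Tail of a description before and after this commit (illustrative strings).
OLD_TAIL = "The answer is (B)."
NEW_TAIL = "The answer is (B).\n\n"
QUESTION = "Q: Which expression is equivalent to 5 x 9?"

old_prompt = build_prompt(OLD_TAIL, QUESTION)
new_prompt = build_prompt(NEW_TAIL, QUESTION)

# Without the trailing "\n\n" the last few-shot answer runs into the next question.
print(repr(old_prompt))
# With it, the two are separated by a blank line.
print(repr(new_prompt))
```

Under that assumption, the old description leaves the final few-shot answer fused to the next question, while the new one keeps them as separate blocks.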