gaoqiong
lm-evaluation-harness
Commit 835cc40e authored Dec 06, 2023 by lintangsutawika
merged latest and added altworld files
parents 8da401e0 c9bbec6e
Changes 430
Showing 20 changed files with 20 additions and 20 deletions (+20 -20)
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_abstract_algebra.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_anatomy.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_astronomy.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_business_ethics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_clinical_knowledge.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_biology.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_chemistry.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_computer_science.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_mathematics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_medicine.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_physics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_computer_security.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_conceptual_physics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_econometrics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_electrical_engineering.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_elementary_mathematics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_formal_logic.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_global_facts.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_biology.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_chemistry.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_abstract_algebra.yaml
@@ -35,7 +35,7 @@
 \ then x^2 + c = x^2 + 1 = 0 + 1 for x = 0, 1 + 1 = 2 for x = 1 and 1 + 1 = 2 for\
 \ x = 2, hence x^2 + 1 does not have any roots. For c = 2 the polynomial x^2 + 2\
 \ has two roots at x = 1 and x = 2. Hence Z_3[x]/(x^2 + c) is a field if and only\
-\ if c = 1. The answer is (B)."
+\ if c = 1. The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_abstract_algebra"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_anatomy.yaml
@@ -51,7 +51,7 @@
 \ of the hyoid bone; therefore, the embryological origin of the hyoid bone are the\
 \ second and the third pharyngeal arches—this information is covered in the last\
 \ option (D). Therefore, we conclude that (D) must be the correct answer. The answer\
-\ is (D)."
+\ is (D).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_anatomy"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_astronomy.yaml
@@ -49,7 +49,7 @@
 \ red. Options (C) and (D) are not specific enough about why the color of the surface\
 \ would be red, while (A) is correct because it explains that the surface is red\
 \ due to the rusted materials on the surface and the red color comes from the rust.\
-\ So the correct option is (A). The answer is (A)."
+\ So the correct option is (A). The answer is (A).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_astronomy"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_business_ethics.yaml
@@ -50,7 +50,7 @@
 \ that best uses the possible options above is “Beyond the business case for engaging\
 \ the CSR there are a number of moral arguments relating to: negative *externalities*,\
 \ the *power* that corporations possess and the *mutual independence* of business\
-\ and society. The answer is (D)."
+\ and society. The answer is (D).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_business_ethics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_clinical_knowledge.yaml
@@ -29,7 +29,7 @@
 \ (D) oxidative phosphorylation.\nA: Let's think step by step. We refer to Wikipedia\
 \ articles on clinical knowledge for help. The energy for muscular contraction is\
 \ provided by ATP (adenosine triphosphate), which is the powerhouse of the cell.\
-\ The answer is (A)."
+\ The answer is (A).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_clinical_knowledge"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_biology.yaml
@@ -55,7 +55,7 @@
 \ resemblance of structures that have different origins, which is not the case for\
 \ the human and bird forearms, which rules out (D). Humans and birds do belong to\
 \ the same clade - a group of organisms composed of a common ancestor. The answer\
-\ is (C)."
+\ is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_biology"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_chemistry.yaml
@@ -32,7 +32,7 @@
 \ hyperfine interaction with the 13C (nuclear spin $I = \nrac{1}{2}$) which will\
 \ split the spectrum into 2 lines. This will be further split into 4 lines by the\
 \ interaction with three equivalent 1H nuclei. The total number of lines is therefore\
-\ $2 \\cdot 4 = 8$. The answer is (E)."
+\ $2 \\cdot 4 = 8$. The answer is (E).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_chemistry"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_computer_science.yaml
@@ -73,7 +73,7 @@
 Thus we can see that on average a single processor will lock the bus for:\nlock_ns_per_miss\
 \ * misses_per_instruction * instructions_per_ns =\n(1000 nanoseconds / cache miss)\
 \ * (1 cache miss / 50 instructions) * (50 instructions / 27000 nanoseconds) = 1000\
-\ * (1/50) * (50/27000) = 1000/27000 = 1/27. The answer is (B)."
+\ * (1/50) * (50/27000) = 1000/27000 = 1/27. The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_computer_science"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_mathematics.yaml
@@ -44,7 +44,7 @@
 \ $t \\in \\mathbb{R}, \\ln ((s(t)-2))=-[t / 25]+C$. Let $K:=e^{C}$. Then, for all\
 \ $t \\in \\mathbb{R}$, we have $(s(t))-2=K e^{-t / 25}$, and so $s(t)=2+K e^{-t\
 \ / 25}$. Then $3=s(0)=2+K e^{0}=2+K$, so $K=1$. Then $s(100)=2+K e^{-100 / 25}=2+1\
-\ \\cdot e^{-4}=2+e^{-4}$. The answer is (D)."
+\ \\cdot e^{-4}=2+e^{-4}$. The answer is (D).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_mathematics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_medicine.yaml
@@ -46,7 +46,7 @@
 \ monocarbylic acid transporters.\nA: Let's think step by step. We refer to Wikipedia\
 \ articles on medicine for help. Glucose (also known as the blood sugar) is the\
 \ main sugar found in the human body. It is transported into the muscle cell via\
-\ diffusion through protein transporters called GLUT4. The answer is (A)."
+\ diffusion through protein transporters called GLUT4. The answer is (A).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_medicine"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_physics.yaml
@@ -38,7 +38,7 @@
 \ go into the gases internal energy or work done against an external force. However,\
 \ if the volume of the gas container is constant, no work will be done (since work\
 \ is pressure times change in volume). So, at constant volume, all of the heat goes\
-\ into the internal energy. The answer is (B)."
+\ into the internal energy. The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_college_physics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_computer_security.yaml
@@ -30,7 +30,7 @@
 \ resulted from improper input validation (due to a missing bounds check) in the\
 \ implementation of the TLS heartbeat extension. The vulnerability was classified\
 \ as a buffer over-read, a situation where more data can be read than should be\
-\ allowed. The answer is (C)."
+\ allowed. The answer is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_computer_security"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_conceptual_physics.yaml
@@ -27,7 +27,7 @@
 \ speed in the direction of the wind is greater than it would be in the absence\
 \ of wind, and its direction orthogonal to the wind is the same as it would be in\
 \ the absence of the wind. The total speed, which is these two components added\
-\ in quadrature, is thus greater than the speed in still air. The answer is (B)."
+\ in quadrature, is thus greater than the speed in still air. The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_conceptual_physics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_econometrics.yaml
@@ -57,7 +57,7 @@
 \ die away (B) Persist indefinitely (C) Grow exponentially (D) Never occur\nA: Let's\
 \ think step by step. We refer to Wikipedia articles on econometrics for help. This\
 \ is a formal logic problem about stationally process. For a stationary autoregressive\
-\ process, shocks will eventually die away. The answer is (A)."
+\ process, shocks will eventually die away. The answer is (A).\n\n"
 "group": "mmlu_flan_cot_fewshot_social_sciences"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_econometrics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_electrical_engineering.yaml
@@ -28,7 +28,7 @@
 \ is 100. Find the total resistance\n(A) 200Ω (B) 100Ω (C) 50Ω (D) 10Ω\nA: Let's\
 \ think step by step. In lap winding, effectively two resistors are connected in\
 \ parallel, so the actual resistance of each pair is 1 Ohm. Since we have 50 pairs,\
-\ we get a total resistance of 50 Ohms. The answer is (C)."
+\ we get a total resistance of 50 Ohms. The answer is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_electrical_engineering"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_elementary_mathematics.yaml
@@ -35,7 +35,7 @@
 \nQ: Which expression is equivalent to 5 x 9?\n(A) (5 x 4) x (6 x 5)\n(B) (5 x 5)\
 \ + (5 x 4)\n(C) (5 x 5) + (5 x 9)\n(D) (5 x 9) x (6 x 9)\nA: Let's think step by\
 \ step. We know that 9 = (5 + 4), so 5 x 9 = 5 x (5 + 4) = (5 x 5) + (5 x 4). The\
-\ answer is (B)."
+\ answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_elementary_mathematics"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_formal_logic.yaml
@@ -47,7 +47,7 @@
 \ (∀x)(Px ⊃ ~Dx) → For all x, x is on Mars implies that x do not drive on Mars.\n\
 Option (D): ~Dp: → p do not drive on Mars.\nOf all these options, Option (C) appears\
 \ to be the best and most meaningful interpretation of the argument “No people drive\
-\ on Mars.” The answer is (C)."
+\ on Mars.” The answer is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_humanities"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_formal_logic"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_global_facts.yaml
@@ -28,7 +28,7 @@
 \ of their nation or the world.\nA: Let's think step by step. We refer to Wikipedia\
 \ articles on global facts for help. As of 2019, most people tend to be optimistic\
 \ about their own future but pessimistic about the future of their nation or the\
-\ world. The answer is (B)."
+\ world. The answer is (B).\n\n"
 "group": "mmlu_flan_cot_fewshot_other"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_global_facts"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_biology.yaml
@@ -48,7 +48,7 @@
 \ proceed with cell division. Cues like these act by changing the activity of core\
 \ cell cycle regulators inside the cell. The most common regulators are cyclins\
 \ and cyclin-dependent kinases. Fibroblast cells do not play any role in cell division.\
-\ The answer is (D)."
+\ The answer is (D).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_high_school_biology"
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_chemistry.yaml
@@ -44,7 +44,7 @@
 \ (aq) + H_{2}O \nightarrow H_{3}O^{+} + CH3COO^{-}$. The conjugate base is therefore\
 \ the acetate ion. The added strong acid, Nitric acid, will react with the conjugate\
 \ base. Therefore the maximum amount of acid that can be added will be equal to\
-\ the amount of acetate ion, or 2 moles. The answer is (C)."
+\ the amount of acetate ion, or 2 moles. The answer is (C).\n\n"
 "group": "mmlu_flan_cot_fewshot_stem"
 "include": "_mmlu_flan_cot_fewshot_template_yaml"
 "task": "mmlu_flan_cot_fewshot_high_school_chemistry"
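Every hunk on this page makes the same one-line change: the quoted string that ends with "The answer is (X)." gains a trailing \n\n. The sketch below illustrates the likely effect, assuming (the diff does not show this) that the harness appends the next question directly after this string with no separator of its own. OLD_TAIL and NEW_TAIL are copied from the abstract_algebra hunk; NEXT_QUESTION and join_prompt are hypothetical names used only for illustration.

```python
# Sketch only: OLD_TAIL and NEW_TAIL copy the changed string endings from the
# abstract_algebra hunk. NEXT_QUESTION and join_prompt are hypothetical and
# are not taken from the harness.
OLD_TAIL = "if c = 1. The answer is (B)."
NEW_TAIL = "if c = 1. The answer is (B).\n\n"
NEXT_QUESTION = "Q: (hypothetical next question)\nA: Let's think step by step."


def join_prompt(fewshot_text: str, question: str) -> str:
    """Concatenate the few-shot text and the new question with no extra separator."""
    return fewshot_text + question


before = join_prompt(OLD_TAIL, NEXT_QUESTION)
after = join_prompt(NEW_TAIL, NEXT_QUESTION)

# repr() makes the escape sequences visible in the printed output.
print(repr(before))
print(repr(after))
```

Under that assumption, the old string runs straight into the next "Q:", while the new one leaves a blank line between the last worked example and the question.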