gaoqiong/lm-evaluation-harness: commit e200c24e
Authored Jul 03, 2024 by lintangsutawika
update mmlu
Parent: 43765669
Changes: 342
Showing 20 changed files with 20 additions and 20 deletions
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_business_ethics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_clinical_knowledge.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_biology.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_chemistry.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_computer_science.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_mathematics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_medicine.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_physics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_computer_security.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_conceptual_physics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_econometrics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_electrical_engineering.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_elementary_mathematics.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_formal_logic.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_global_facts.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_biology.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_chemistry.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_computer_science.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_european_history.yaml +1 -1
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_geography.yaml +1 -1

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_business_ethics.yaml
@@ -70,6 +70,6 @@ fewshot_config:
 \ moral arguments relating to: negative *externalities*, the *power* that corporations\
 \ possess and the *mutual independence* of business and society. The answer\
 \ is (D).\n\n"
-group: mmlu_flan_cot_fewshot_other
+tag: mmlu_flan_cot_fewshot_other
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_business_ethics

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_clinical_knowledge.yaml
@@ -43,6 +43,6 @@ fewshot_config:
 target: 'Let''s think step by step. We refer to Wikipedia articles on clinical
 knowledge for help. The energy for muscular contraction is provided by ATP (adenosine
 triphosphate), which is the powerhouse of the cell. The answer is (A).'
-group: mmlu_flan_cot_fewshot_other
+tag: mmlu_flan_cot_fewshot_other
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_clinical_knowledge

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_biology.yaml
@@ -70,6 +70,6 @@ fewshot_config:
 that have different origins, which is not the case for the human and bird forearms,
 which rules out (D). Humans and birds do belong to the same clade - a group
 of organisms composed of a common ancestor. The answer is (C).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_college_biology

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_chemistry.yaml
@@ -44,6 +44,6 @@ fewshot_config:
 \ into 2 lines. This will be further split into 4 lines by the interaction with\
 \ three equivalent 1H nuclei. The total number of lines is therefore $2 \\cdot\
 \ 4 = 8$. The answer is (E).\n\n"
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_college_chemistry

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_computer_science.yaml
@@ -175,6 +175,6 @@ fewshot_config:
 (1000 nanoseconds / cache miss) * (1 cache miss / 50 instructions) * (50 instructions
 / 27000 nanoseconds) = 1000 * (1/50) * (50/27000) = 1000/27000 = 1/27. The answer
 is (B).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_college_computer_science

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_mathematics.yaml
@@ -68,6 +68,6 @@ fewshot_config:
 \ Then, for all $t \\in \\mathbb{R}$, we have $(s(t))-2=K e^{-t / 25}$, and\
 \ so $s(t)=2+K e^{-t / 25}$. Then $3=s(0)=2+K e^{0}=2+K$, so $K=1$. Then $s(100)=2+K\
 \ e^{-100 / 25}=2+1 \\cdot e^{-4}=2+e^{-4}$. The answer is (D).\n\n"
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_college_mathematics

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_medicine.yaml
@@ -63,6 +63,6 @@ fewshot_config:
 for help. Glucose (also known as the blood sugar) is the main sugar found in
 the human body. It is transported into the muscle cell via diffusion through
 protein transporters called GLUT4. The answer is (A).'
-group: mmlu_flan_cot_fewshot_other
+tag: mmlu_flan_cot_fewshot_other
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_college_medicine

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_college_physics.yaml
@@ -56,6 +56,6 @@ fewshot_config:
 of the gas container is constant, no work will be done (since work is pressure
 times change in volume). So, at constant volume, all of the heat goes into the
 internal energy. The answer is (B).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_college_physics

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_computer_security.yaml
@@ -45,6 +45,6 @@ fewshot_config:
 of the TLS heartbeat extension. The vulnerability was classified as a buffer
 over-read, a situation where more data can be read than should be allowed. The
 answer is (C).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_computer_security

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_conceptual_physics.yaml
@@ -44,6 +44,6 @@ fewshot_config:
 \ orthogonal to the wind is the same as it would be in the absence of the wind.\
 \ The total speed, which is these two components added in quadrature, is thus\
 \ greater than the speed in still air. The answer is (B).\n\n"
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_conceptual_physics

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_econometrics.yaml
@@ -82,6 +82,6 @@ fewshot_config:
 target: 'Let''s think step by step. We refer to Wikipedia articles on econometrics
 for help. This is a formal logic problem about stationally process. For a stationary
 autoregressive process, shocks will eventually die away. The answer is (A).'
-group: mmlu_flan_cot_fewshot_social_sciences
+tag: mmlu_flan_cot_fewshot_social_sciences
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_econometrics

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_electrical_engineering.yaml
@@ -42,6 +42,6 @@ fewshot_config:
 target: 'Let''s think step by step. In lap winding, effectively two resistors
 are connected in parallel, so the actual resistance of each pair is 1 Ohm. Since
 we have 50 pairs, we get a total resistance of 50 Ohms. The answer is (C).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_electrical_engineering

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_elementary_mathematics.yaml
@@ -72,6 +72,6 @@ fewshot_config:
 (D) (5 x 9) x (6 x 9)'
 target: 'Let''s think step by step. We know that 9 = (5 + 4), so 5 x 9 = 5 x (5
 + 4) = (5 x 5) + (5 x 4). The answer is (B).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_elementary_mathematics

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_formal_logic.yaml
@@ -65,6 +65,6 @@ fewshot_config:
 \ p do not drive on Mars.\nOf all these options, Option (C) appears to be the\
 \ best and most meaningful interpretation of the argument \u201CNo people drive\
 \ on Mars.\u201D The answer is (C).\n\n"
-group: mmlu_flan_cot_fewshot_humanities
+tag: mmlu_flan_cot_fewshot_humanities
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_formal_logic

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_global_facts.yaml
@@ -44,6 +44,6 @@ fewshot_config:
 for help. As of 2019, most people tend to be optimistic about their own future
 but pessimistic about the future of their nation or the world. The answer is
 (B).'
-group: mmlu_flan_cot_fewshot_other
+tag: mmlu_flan_cot_fewshot_other
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_global_facts

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_biology.yaml
@@ -64,6 +64,6 @@ fewshot_config:
 core cell cycle regulators inside the cell. The most common regulators are cyclins
 and cyclin-dependent kinases. Fibroblast cells do not play any role in cell
 division. The answer is (D).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_high_school_biology

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_chemistry.yaml
@@ -61,6 +61,6 @@ fewshot_config:
 \ strong acid, Nitric acid, will react with the conjugate base. Therefore the\
 \ maximum amount of acid that can be added will be equal to the amount of acetate\
 \ ion, or 2 moles. The answer is (C).\n\n"
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_high_school_chemistry

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_computer_science.yaml
@@ -79,6 +79,6 @@ fewshot_config:
 its value is greater than 100, regardless of the elements in the list. Choice
 D is incorrect because its step 3 does not increment the value of position,
 so it will repeat forever. The answer is (B).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_high_school_computer_science

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_european_history.yaml
@@ -194,6 +194,6 @@ fewshot_config:
 wrote extensively against the monoplization of power and advocated for a system
 of checks and balances in government to prevent the rise of despotism. The answer
 is (B).'
-group: mmlu_flan_cot_fewshot_humanities
+tag: mmlu_flan_cot_fewshot_humanities
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_high_school_european_history

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_high_school_geography.yaml
@@ -48,6 +48,6 @@ fewshot_config:
 target: 'Let''s think step by step. We refer to Wikipedia articles on geography
 for help. The difference between number of births and deaths gives the population
 increase at any given time. The answer is (A).'
-group: mmlu_flan_cot_fewshot_social_sciences
+tag: mmlu_flan_cot_fewshot_social_sciences
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_high_school_geography
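
Every file in this page of the diff receives the same one-line edit. The top-level `group:` key becomes `tag:` and keeps its value. The `include:` and `task:` lines are unchanged. Below is a minimal Python sketch of that textual rewrite, for illustration only. The helper names `group_to_tag` and `rewrite_dir` are hypothetical, and the sketch assumes the key always starts at column 0, as it does in every hunk shown. Nothing here suggests the commit was actually produced this way.

```python
import re
from pathlib import Path

# Hypothetical pattern: matches "group:" only at the very start of a line
# (column 0) and only when followed by whitespace, so indented keys and
# names such as "groups:" are not touched.
_GROUP_LINE = re.compile(r"^group:(?=\s)", re.MULTILINE)


def group_to_tag(text: str) -> str:
    """Return `text` with each column-0 `group:` key renamed to `tag:`."""
    return _GROUP_LINE.sub("tag:", text)


def rewrite_dir(directory: str) -> list[str]:
    """Apply group_to_tag to every *.yaml file in `directory` (assumed layout).

    Returns the names of the files whose contents actually changed.
    """
    changed = []
    for path in sorted(Path(directory).glob("*.yaml")):
        old = path.read_text(encoding="utf-8")
        new = group_to_tag(old)
        if new != old:
            path.write_text(new, encoding="utf-8")
            changed.append(path.name)
    return changed


if __name__ == "__main__":
    # Header of mmlu_business_ethics.yaml before this commit, taken from the diff.
    before = (
        "group: mmlu_flan_cot_fewshot_other\n"
        "include: _mmlu_flan_cot_fewshot_template_yaml\n"
        "task: mmlu_flan_cot_fewshot_business_ethics\n"
    )
    print(group_to_tag(before))
```

The sketch uses a line-anchored regex rather than a YAML load and dump because the diff changes exactly one line per file. Round-tripping the files through a YAML library would likely re-wrap the long few-shot `target` strings and produce a much noisier diff.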