gaoqiong / lm-evaluation-harness
Commit e200c24e, authored Jul 03, 2024 by lintangsutawika
update mmlu
parent 43765669
Changes: 342
Showing 20 changed files with 20 additions and 20 deletions
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_moral_scenarios.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_nutrition.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_philosophy.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_prehistory.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_accounting.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_law.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_medicine.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_psychology.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_public_relations.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_security_studies.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_sociology.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_us_foreign_policy.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_virology.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_world_religions.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_abstract_algebra.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_anatomy.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_astronomy.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_business_ethics.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_clinical_knowledge.yaml (+1 -1)
lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_biology.yaml (+1 -1)

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_moral_scenarios.yaml
@@ -57,6 +57,6 @@ fewshot_config:
     target: 'Let''s think step by step. We refer to Wikipedia articles on moral scenarios
       for help. Loving someone is not wrong. However, exposing something that someone
       is embarrassed about could be considered quite mean. The answer is (C).'
-group: mmlu_flan_cot_fewshot_humanities
+tag: mmlu_flan_cot_fewshot_humanities
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_moral_scenarios

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_nutrition.yaml
@@ -58,6 +58,6 @@ fewshot_config:
     target: 'Let''s think step by step. We refer to Wikipedia articles on nutrition
       for help. The risk ratio is not sufficiently reduced that it could not be explained
       by random chance given the studies sample size. The answer is (C).'
-group: mmlu_flan_cot_fewshot_other
+tag: mmlu_flan_cot_fewshot_other
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_nutrition

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_philosophy.yaml
@@ -39,6 +39,6 @@ fewshot_config:
       for help. Psychological egoism suggests that one behaves based on what makes
       one feels good, hence it is a claim about human nature and how humans are capable
       of behaving. The answer is (C).'
-group: mmlu_flan_cot_fewshot_humanities
+tag: mmlu_flan_cot_fewshot_humanities
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_philosophy

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_prehistory.yaml
@@ -54,6 +54,6 @@ fewshot_config:
     target: 'Let''s think step by step. We refer to Wikipedia articles on prehistory
       for help. Pacal built the temples as the funerary monument to legitimize his
       kingship. The answer is (D).'
-group: mmlu_flan_cot_fewshot_humanities
+tag: mmlu_flan_cot_fewshot_humanities
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_prehistory

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_accounting.yaml
@@ -58,6 +58,6 @@ fewshot_config:
       for help. Among the four transactions, only Proceeds from long-term debt belongs
       to the financing activities section of cashflow, hence the amount reported should
       be $100000. The answer is (D).'
-group: mmlu_flan_cot_fewshot_other
+tag: mmlu_flan_cot_fewshot_other
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_professional_accounting

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_law.yaml
@@ -117,6 +117,6 @@ fewshot_config:
       a due process clause. Hence the strongest argument should be the statute is
       overbroad and consequently invalid under the First and Fourteenth Amendments.
       The answer is (D).'
-group: mmlu_flan_cot_fewshot_humanities
+tag: mmlu_flan_cot_fewshot_humanities
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_professional_law

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_medicine.yaml
@@ -77,6 +77,6 @@ fewshot_config:
       for help. The symptoms and the adrenal mass suggested pheochromocytoma, and
       the blood pressure indicates hypertension. Phenoxybenzamine is used to treat
       hypertension caused by pheochromocytoma. The answer is (D).'
-group: mmlu_flan_cot_fewshot_other
+tag: mmlu_flan_cot_fewshot_other
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_professional_medicine

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_professional_psychology.yaml
@@ -57,6 +57,6 @@ fewshot_config:
       for help. Based on the circumstances, you should tell your client about the
       pros and cons of each program, but it would be inappropriate to receive the
       bonus, so you should not claim the $50 bonus. The answer is (D).'
-group: mmlu_flan_cot_fewshot_social_sciences
+tag: mmlu_flan_cot_fewshot_social_sciences
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_professional_psychology

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_public_relations.yaml
@@ -50,6 +50,6 @@ fewshot_config:
       for help. If a public relations media practitioner does not know the answer
       to a reporter''s question, they should say ''I don''t know'' and offer to provide
       the information later. The answer is (C).'
-group: mmlu_flan_cot_fewshot_social_sciences
+tag: mmlu_flan_cot_fewshot_social_sciences
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_public_relations

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_security_studies.yaml
@@ -99,6 +99,6 @@ fewshot_config:
     target: 'Let''s think step by step. We refer to Wikipedia articles on security
       studies for help. Coercive diplomacy uses the threat of force to induce the
       opponent to comply with demands. The answer is (B).'
-group: mmlu_flan_cot_fewshot_social_sciences
+tag: mmlu_flan_cot_fewshot_social_sciences
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_security_studies

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_sociology.yaml
@@ -53,6 +53,6 @@ fewshot_config:
       for help. The post-war welfare state of 1948 aimed to provide free healthcare
       and education, full employment, and universal welfare. But it did not aim to
       provide a minimum wage. The answer is (B).'
-group: mmlu_flan_cot_fewshot_social_sciences
+tag: mmlu_flan_cot_fewshot_social_sciences
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_sociology

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_us_foreign_policy.yaml
@@ -51,6 +51,6 @@ fewshot_config:
     target: 'Let''s think step by step. We refer to Wikipedia articles on us foreign
       policy for help. The 2008 financial crisis damanged the international reputation
       of the American model of political economy and capitalism. The answer is (A).'
-group: mmlu_flan_cot_fewshot_social_sciences
+tag: mmlu_flan_cot_fewshot_social_sciences
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_us_foreign_policy

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_virology.yaml
@@ -40,6 +40,6 @@ fewshot_config:
     target: 'Let''s think step by step. We refer to Wikipedia articles on virology
       for help. Paroviruses are highly impactful because they do not have nucleic
       acid. The answer is (A).'
-group: mmlu_flan_cot_fewshot_other
+tag: mmlu_flan_cot_fewshot_other
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_virology

lm_eval/tasks/mmlu/flan_cot_fewshot/mmlu_world_religions.yaml
@@ -37,6 +37,6 @@ fewshot_config:
     target: 'Let''s think step by step. We refer to Wikipedia articles on world religions
       for help. In Judaism, the most distinctive sign of the covenant is circumcision
       (brit milah). The answer is (B).'
-group: mmlu_flan_cot_fewshot_humanities
+tag: mmlu_flan_cot_fewshot_humanities
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_world_religions

lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_abstract_algebra.yaml
 "dataset_name": "abstract_algebra"
 "description": "The following are multiple choice questions (with answers) about abstract\
   \ algebra.\n\n"
-"group": "mmlu_flan_cot_zeroshot_stem"
+"tag": "mmlu_flan_cot_zeroshot_stem"
 "include": "_mmlu_flan_cot_zeroshot_template_yaml"
 "task": "mmlu_flan_cot_zeroshot_abstract_algebra"

lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_anatomy.yaml
 "dataset_name": "anatomy"
 "description": "The following are multiple choice questions (with answers) about anatomy.\n\
   \n"
-"group": "mmlu_flan_cot_zeroshot_stem"
+"tag": "mmlu_flan_cot_zeroshot_stem"
 "include": "_mmlu_flan_cot_zeroshot_template_yaml"
 "task": "mmlu_flan_cot_zeroshot_anatomy"

lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_astronomy.yaml
 "dataset_name": "astronomy"
 "description": "The following are multiple choice questions (with answers) about astronomy.\n\
   \n"
-"group": "mmlu_flan_cot_zeroshot_stem"
+"tag": "mmlu_flan_cot_zeroshot_stem"
 "include": "_mmlu_flan_cot_zeroshot_template_yaml"
 "task": "mmlu_flan_cot_zeroshot_astronomy"

lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_business_ethics.yaml
 "dataset_name": "business_ethics"
 "description": "The following are multiple choice questions (with answers) about business\
   \ ethics.\n\n"
-"group": "mmlu_flan_cot_zeroshot_other"
+"tag": "mmlu_flan_cot_zeroshot_other"
 "include": "_mmlu_flan_cot_zeroshot_template_yaml"
 "task": "mmlu_flan_cot_zeroshot_business_ethics"

lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_clinical_knowledge.yaml
 "dataset_name": "clinical_knowledge"
 "description": "The following are multiple choice questions (with answers) about clinical\
   \ knowledge.\n\n"
-"group": "mmlu_flan_cot_zeroshot_other"
+"tag": "mmlu_flan_cot_zeroshot_other"
 "include": "_mmlu_flan_cot_zeroshot_template_yaml"
 "task": "mmlu_flan_cot_zeroshot_clinical_knowledge"

lm_eval/tasks/mmlu/flan_cot_zeroshot/mmlu_college_biology.yaml
 "dataset_name": "college_biology"
 "description": "The following are multiple choice questions (with answers) about college\
   \ biology.\n\n"
-"group": "mmlu_flan_cot_zeroshot_stem"
+"tag": "mmlu_flan_cot_zeroshot_stem"
 "include": "_mmlu_flan_cot_zeroshot_template_yaml"
 "task": "mmlu_flan_cot_zeroshot_college_biology"
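
Every file in this commit receives the same one-line change: the top-level `group` key is renamed to `tag` and its value is left alone. The commit does not say how the edit was produced, so the following is only a minimal sketch of one way such a rename could be applied mechanically. The function name `rename_group_to_tag` and the regular expression are assumptions made for illustration, not code from the repository. It rewrites a `group` key only when it starts at column 0, in either the unquoted form (fewshot files) or the double-quoted form (zeroshot files).

```python
import re

# Hypothetical helper, not part of lm-evaluation-harness.
# Matches a `group` key at the start of a line, optionally wrapped in
# double quotes, followed by a colon. Indented (nested) keys are not matched.
_GROUP_KEY = re.compile(r'^(?P<q>"?)group(?P=q):', flags=re.MULTILINE)


def rename_group_to_tag(yaml_text: str) -> str:
    """Rename a top-level `group` key to `tag`, keeping the quoting style."""
    return _GROUP_KEY.sub(
        lambda m: f'{m.group("q")}tag{m.group("q")}:',
        yaml_text,
    )
```

Working on the raw text rather than loading and re-dumping the YAML keeps the rest of each file byte-for-byte unchanged, which is consistent with the one-addition, one-deletion diffs shown here.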