gaoqiong / lm-evaluation-harness
Commit bfbda3b3, authored Jun 26, 2024 by lintangsutawika
removed
Parent: 067f681a
Changes: 67
Showing 20 changed files with 0 additions and 213 deletions (+0 -213)
lm_eval/tasks/mmlu_pro/continuation/_continuation_template_yaml  +0 -10
lm_eval/tasks/mmlu_pro/continuation/_mmlu_pro.yaml  +0 -6
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_biology.yaml  +0 -6
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_business.yaml  +0 -6
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_chemistry.yaml  +0 -6
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_computer_science.yaml  +0 -6
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_economics.yaml  +0 -6
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_engineering.yaml  +0 -6
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_health.yaml  +0 -6
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_history.yaml  +0 -6
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_law.yaml  +0 -6
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_math.yaml  +0 -6
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_other.yaml  +0 -6
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_philosophy.yaml  +0 -6
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_physics.yaml  +0 -6
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_psychology.yaml  +0 -6
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/_cot_prompts.json  +0 -1
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/_mmlu_pro.yaml  +0 -6
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/_mmlu_pro_flan_cot_fewshot_template_yaml  +0 -31
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_anatomy.yaml  +0 -75
lm_eval/tasks/mmlu_pro/continuation/_continuation_template_yaml
deleted 100644 → 0
dataset_path: sjyuxyz/MMLU-Pro-with-subset
test_split: test
fewshot_split: dev
fewshot_config:
  sampler: first_n
doc_to_text: "Question: {{question.strip()}}\nAnswer:"
doc_to_choice: "{{choices}}"
doc_to_target: "{{answer}}"
metadata:
  version: 0.0
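As a rough illustration of what the continuation template above describes, the sketch below emulates its three Jinja fields in plain Python. The field names `question`, `choices`, and `answer` come from the template; the sample record and the helper `render_continuation_doc` are hypothetical, and treating `answer` as an integer index into `choices` is an assumption that the diff itself does not confirm.

```python
def render_continuation_doc(doc):
    """Emulate the template's doc_to_text / doc_to_choice / doc_to_target fields."""
    # doc_to_text: "Question: {{question.strip()}}\nAnswer:"
    text = "Question: " + doc["question"].strip() + "\nAnswer:"
    # doc_to_choice: "{{choices}}" -> the candidate continuations
    choices = list(doc["choices"])
    # doc_to_target: "{{answer}}" -> assumed here to be an index into choices
    target = int(doc["answer"])
    return text, choices, target


# Hypothetical record, for illustration only (not taken from the dataset).
sample = {
    "question": "  Which organelle produces most of a cell's ATP? ",
    "choices": ["Ribosome", "Mitochondrion", "Golgi apparatus"],
    "answer": 1,
}
text, choices, target = render_continuation_doc(sample)
print(text)
print(choices[target])
```

Under these assumptions, each choice would be scored as a continuation of the rendered text, and the choice at the target index is the one counted as correct.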
lm_eval/tasks/mmlu_pro/continuation/_mmlu_pro.yaml
deleted 100644 → 0
group: mmlu_pro_continuation
task:
  - mmlu_pro_continuation_stem
  - mmlu_pro_continuation_other
  - mmlu_pro_continuation_social_sciences
  - mmlu_pro_continuation_humanities
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_biology.yaml
deleted 100644 → 0
"dataset_name": "biology"
"description": "The following are questions (with answers) about biology.\n\
  \n"
"group": "mmlu_continuation_stem"
"include": "_continuation_template_yaml"
"task": "mmlu_continuation_biology"
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_business.yaml
deleted 100644 → 0
"dataset_name": "business"
"description": "The following are questions (with answers) about business.\n\
  \n"
"group": "mmlu_continuation_other"
"include": "_continuation_template_yaml"
"task": "mmlu_continuation_business"
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_chemistry.yaml
deleted 100644 → 0
"dataset_name": "math"
"description": "The following are questions (with answers) about math.\n\
  \n"
"group": "mmlu_continuation_stem"
"include": "_continuation_template_yaml"
"task": "mmlu_continuation_chemistry"
\ No newline at end of file
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_computer_science.yaml
deleted 100644 → 0
"dataset_name": "computer_science"
"description": "The following are questions (with answers) about computer_science.\n\
  \n"
"group": "mmlu_continuation_stem"
"include": "_continuation_template_yaml"
"task": "mmlu_continuation_computer_science"
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_economics.yaml
deleted 100644 → 0
"dataset_name": "economics"
"description": "The following are questions (with answers) about economics.\n\
  \n"
"group": "mmlu_continuation_social_sciences"
"include": "_continuation_template_yaml"
"task": "mmlu_continuation_economics"
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_engineering.yaml
deleted 100644 → 0
"dataset_name": "engineering"
"description": "The following are questions (with answers) about engineering.\n\
  \n"
"group": "mmlu_continuation_stem"
"include": "_continuation_template_yaml"
"task": "mmlu_continuation_engineering"
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_health.yaml
deleted 100644 → 0
"dataset_name": "health"
"description": "The following are questions (with answers) about health.\n\
  \n"
"group": "mmlu_continuation_other"
"include": "_continuation_template_yaml"
"task": "mmlu_continuation_health"
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_history.yaml
deleted 100644 → 0
"dataset_name": "history"
"description": "The following are questions (with answers) about history.\n\
  \n"
"group": "mmlu_continuation_humanities"
"include": "_continuation_template_yaml"
"task": "mmlu_continuation_history"
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_law.yaml
deleted 100644 → 0
"dataset_name": "law"
"description": "The following are questions (with answers) about law.\n\
  \n"
"group": "mmlu_continuation_humanities"
"include": "_continuation_template_yaml"
"task": "mmlu_continuation_law"
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_math.yaml
deleted 100644 → 0
"dataset_name": "math"
"description": "The following are questions (with answers) about math.\n\
  \n"
"group": "mmlu_continuation_stem"
"include": "_continuation_template_yaml"
"task": "mmlu_continuation_math"
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_other.yaml
deleted 100644 → 0
"dataset_name": "other"
"description": "The following are questions (with answers) about other.\n\
  \n"
"group": "mmlu_continuation_other"
"include": "_continuation_template_yaml"
"task": "mmlu_continuation_other"
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_philosophy.yaml
deleted 100644 → 0
"dataset_name": "philosophy"
"description": "The following are questions (with answers) about philosophy.\n\
  \n"
"group": "mmlu_continuation_humanities"
"include": "_continuation_template_yaml"
"task": "mmlu_continuation_philosophy"
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_physics.yaml
deleted 100644 → 0
"dataset_name": "physics"
"description": "The following are questions (with answers) about physics.\n\
  \n"
"group": "mmlu_continuation_stem"
"include": "_continuation_template_yaml"
"task": "mmlu_continuation_physics"
lm_eval/tasks/mmlu_pro/continuation/mmlu_pro_psychology.yaml
deleted 100644 → 0
"dataset_name": "psychology"
"description": "The following are questions (with answers) about psychology.\n\
  \n"
"group": "mmlu_continuation_social_sciences"
"include": "_continuation_template_yaml"
"task": "mmlu_continuation_psychology"
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/_cot_prompts.json
deleted 100644 → 0
This diff is collapsed.
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/_mmlu_pro.yaml
deleted 100644 → 0
group: mmlu_pro_flan_cot_fewshot
task:
  - mmlu_pro_flan_cot_fewshot_stem
  - mmlu_pro_flan_cot_fewshot_other
  - mmlu_pro_flan_cot_fewshot_social_sciences
  - mmlu_pro_flan_cot_fewshot_humanities
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/_mmlu_pro_flan_cot_fewshot_template_yaml
deleted 100644 → 0
dataset_path: sjyuxyz/MMLU-Pro-with-subset
validation_split: validation
test_split: test
fewshot_split: dev
fewshot_config:
  sampler: first_n
output_type: generate_until
doc_to_text: "Q: {{question.strip()}}\n(A) {{choices[0]}} (B) {{choices[1]}} (C) {{choices[2]}} (D) {{choices[3]}} (E) {{choices[4]}} (F) {{choices[5]}} (G) {{choices[6]}} (H) {{choices[7]}} (I) {{choices[8]}} (J) {{choices[9]}}\nA: Let's think step by step."
doc_to_target: "{{['(A)', '(B)', '(C)', '(D)', '(E)', '(F)', '(G)', '(H)', '(I)', '(J)'][answer]}}"
filter_list:
  - name: "get-answer"
    filter:
      - function: "regex"
        regex_pattern: "(?<=answer is )([A-Ja-j])(?=.)"
      - function: "take_first"
generation_kwargs:
  until:
    - "</s>"
  do_sample: false
  temperature: 0.0
num_fewshot: 0
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
metadata:
  version: 1.0
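To make the `get-answer` filter and `doc_to_target` in the template above more concrete, here is a minimal sketch using only Python's standard `re` module. The pattern and the label list are copied from the template; the helper names `extract_answer` and `target_label` are hypothetical, and whether the harness's own `regex` and `take_first` filters behave exactly like this is an assumption.

```python
import re

# Pattern and label list copied verbatim from the template.
ANSWER_PATTERN = re.compile(r"(?<=answer is )([A-Ja-j])(?=.)")
LABELS = ["(A)", "(B)", "(C)", "(D)", "(E)", "(F)", "(G)", "(H)", "(I)", "(J)"]


def extract_answer(generation):
    """Return the first captured letter, or None (approximates regex + take_first)."""
    match = ANSWER_PATTERN.search(generation)
    return match.group(1) if match else None


def target_label(answer_index):
    """Emulate doc_to_target: map an integer answer to its '(X)' label."""
    return LABELS[answer_index]


print(extract_answer("So the answer is B."))    # B
print(extract_answer("So the answer is (B)."))  # None
print(target_label(1))                          # (B)
```

With Python's `re`, the pattern captures a bare letter that directly follows "answer is " but finds nothing when the letter is wrapped in parentheses, because the character after the lookbehind is then `(` rather than a letter.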
lm_eval/tasks/mmlu_pro/flan_cot_fewshot/mmlu_anatomy.yaml
deleted 100644 → 0
dataset_name: anatomy
description: The following are multiple choice questions (with answers) about anatomy.
fewshot_config:
  sampler: first_n
  samples:
  - question: 'Which of the following is the body cavity that contains the pituitary gland? (A) Abdominal (B) Cranial (C) Pleural (D) Spinal'
    target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
      \ help. Let\u2019s solve this problem step by step. The pituitary gland is the\
      \ major endocrine gland attached to the base of the brain, and it is contained\
      \ in the Cranial cavity. The answer is (B)."
  - question: 'Which of these branches of the trigeminal nerve contain somatic motor processes? (A) The supraorbital nerve (B) The infraorbital nerve (C) The mental nerve (D) None of the above'
    target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
      \ help. Let\u2019s solve this problem step by step.\nWe know the following:\
      \ (A) The supraorbital nerve (also known as the frontal nerve) is the largest\
      \ branch of the ophthalmic nerve and branch of ophthalmic division of the trigeminal\
      \ nerve. (B) The infraorbital nerve is a branch of the maxillary division of\
      \ the trigeminal nerve. (C) The mental nerve is a branch of the mandibular division\
      \ of the trigeminal nerve. Because all these nerves are purely sensory nerves\
      \ and do not contain any somatic motor processes. Therefore, the answer should\
      \ be none of the above, which is (D). The answer is (D)."
  - question: 'In Angle''s Class II Div 2 occlusion there is (A) excess overbite of the upper lateral incisors. (B) negative overjet of the upper central incisors. (C) excess overjet of the upper lateral incisors. (D) excess overjet of the upper central incisors.'
    target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
      \ help. Let\u2019s solve this problem step by step. This is a question related\
      \ to anatomy and orthodontics. Excess overjet is associated with Class II occlusions;\
      \ therefore, we can safely eliminate (B) from the list, as negative overjet\
      \ is often associated with Class III occlusions. Now, we need to determine the\
      \ location of the excess overjet, and that would be the upper (maxillary) lateral\
      \ incisors. Only (C) has the correct information. The answer is (C)."
  - question: 'The pleura (A) have no sensory innervation. (B) are separated by a 2 mm space. (C) extend into the neck. (D) are composed of respiratory epithelium.'
    target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
      \ help. Let\u2019s solve this problem step by step. First, recall that the pleura\
      \ refers to the thin layer of tissue that covers the lungs and lines the interior\
      \ wall of the chest cavity. Now, let\u2019s look at each option:\nOption (A):\
      \ \u201CThe pleura have no sensory innervation.\u201D This information is not\
      \ correct. The pleura do have a sensory innervation.\nOption (B): \u201CThe\
      \ pleura are separated by a 2 mm space.\u201D This information is not correct.\
      \ There is a very thin \u201Cpotential\u201D space between the layers of the\
      \ pleura; however, it is typically filled with serous pleural fluid.\nOption\
      \ (C): \u201CThe pleura extend into the neck.\u201D This information is actuakky\
      \ true. The cervical pleura, also known as the dome of the pleuradome of the\
      \ pleura, lines the extendsiton of the pleural cavity into the neck.\nOption\
      \ (D): \u201CThe pleura are composed of respiratory epithelium.\u201D This information\
      \ is not correct. The pleaura are composed of connective tissue (CT).\nBecause\
      \ (A), (B), and (D) are all incorrect, (D) is the only correct answer. The answer\
      \ is (C)."
  - question: 'What is the embryological origin of the hyoid bone? (A) The first pharyngeal arch (B) The first and second pharyngeal arches (C) The second pharyngeal arch (D) The second and third pharyngeal arches'
    target: "Let's think step by step. We refer to Wikipedia articles on anatomy for\
      \ help. Let\u2019s solve this problem step by step. The hyoid bone, which is\
      \ also known as the hyooid, is a a small U-shaped bone located in the anterior\
      \ neck. In its resting position, it lies between the ase of the mandible and\
      \ the third cervical vertebrae. We know that the second and the third pharyngeal\
      \ arches give rise to the horns of the hyoid bone; therefore, the embryological\
      \ origin of the hyoid bone are the second and the third pharyngeal arches\u2014\
      this information is covered in the last option (D). Therefore, we conclude that\
      \ (D) must be the correct answer. The answer is (D).\n\n"
group: mmlu_flan_cot_fewshot_stem
include: _mmlu_flan_cot_fewshot_template_yaml
task: mmlu_flan_cot_fewshot_anatomy
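The `fewshot_config` in the file above pairs a `first_n` sampler with a fixed list of worked examples. The sketch below shows one plausible way such a prompt could be assembled, taking the sampler name at face value (use the first n samples in file order). The function `build_fewshot_prompt`, the `Q:`/`A:` layout of the worked examples, and the blank-line separator are assumptions for illustration; the exact prompt the harness builds is not shown in this diff.

```python
def build_fewshot_prompt(description, samples, num_fewshot, question):
    """Assemble description + the first n worked examples + the new question."""
    # "first_n": keep the examples in file order and take the first n.
    shots = samples[:num_fewshot]
    parts = [description]
    for shot in shots:
        # Each sample carries a question and a chain-of-thought target.
        parts.append("Q: " + shot["question"] + "\nA: " + shot["target"])
    # The final question ends with the cue used in the template's doc_to_text.
    parts.append("Q: " + question + "\nA: Let's think step by step.")
    return "\n\n".join(parts)


# Hypothetical stand-ins for the samples listed in the YAML file.
demo_samples = [
    {"question": "q1 (A) x (B) y", "target": "Let's think step by step. The answer is (B)."},
    {"question": "q2 (A) x (B) y", "target": "Let's think step by step. The answer is (A)."},
]
prompt = build_fewshot_prompt("Questions about anatomy.", demo_samples, 1, "q3 (A) x (B) y")
print(prompt)
```

With `num_fewshot` set to 1, only the first sample is included, so the second sample never appears in the prompt.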