silencealiang / Pai-Megatron-Patch
Commit 3041681f
authored Mar 19, 2025 by silencealiang
init
parent 291fc518
Pipeline #2557 canceled with stages
Showing 20 changed files with 286 additions and 0 deletions
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/fewshot/tracking_shuffled_objects_five_objects.yaml +5 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/fewshot/tracking_shuffled_objects_seven_objects.yaml +5 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/fewshot/tracking_shuffled_objects_three_objects.yaml +5 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/fewshot/web_of_lies.yaml +5 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/fewshot/word_sorting.yaml +5 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/_zeroshot_template_yaml +27 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/boolean_expressions.yaml +16 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/causal_judgement.yaml +16 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/date_understanding.yaml +18 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/disambiguation_qa.yaml +18 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/dyck_languages.yaml +15 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/formal_fallacies.yaml +16 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/geometric_shapes.yaml +18 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/hyperbaton.yaml +18 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/logical_deduction_five_objects.yaml +17 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/logical_deduction_seven_objects.yaml +17 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/logical_deduction_three_objects.yaml +17 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/movie_recommendation.yaml +17 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/multistep_arithmetic_two.yaml +16 -0
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/navigate.yaml +15 -0
Too many changes to show. To preserve performance, only 1000 of 1000+ files are displayed.
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/fewshot/tracking_shuffled_objects_five_objects.yaml
0 → 100644
"dataset_name": "tracking_shuffled_objects_five_objects"
"description": "A task requiring determining the final positions of a set of objects given their initial positions and a description of a sequence of swaps.\n\n"
"doc_to_text": "Q: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a yellow ball, Bob has a blue ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Claire and Alice swap balls. Then, Alice and Bob swap balls. Finally, Claire and Bob swap balls. At the end of the game, Bob has the\nOptions:\n(A) yellow ball\n(B) blue ball\n(C) pink ball\nA: (A)\n\nQ: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a white ball, Bob has a purple ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Bob and Alice swap balls. Then, Bob and Claire swap balls. Finally, Bob and Alice swap balls. At the end of the game, Alice has the\nOptions:\n(A) white ball\n(B) purple ball\n(C) pink ball\nA: (C)\n\nQ: Alice, Bob, and Claire are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Lola, Bob is dancing with Rodrigo, and Claire is dancing with Patrick.\nThroughout the song, the dancers often trade partners. First, Alice and Bob switch partners. Then, Claire and Bob switch partners. Finally, Bob and Alice switch partners. At the end of the dance, Alice is dancing with\nOptions:\n(A) Lola\n(B) Rodrigo\n(C) Patrick\nA: (C)\n\nQ: {{input}}\nA:"
"include": "_fewshot_template_yaml"
"task": "bbh_fewshot_tracking_shuffled_objects_five_objects"
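The three few-shot answers embedded in `doc_to_text` above can be checked by replaying the swaps. The following is a minimal sketch for that purpose only; `final_holdings` is a hypothetical helper and is not part of the evaluation harness.

```python
def final_holdings(initial, swaps):
    """Replay pairwise swaps and return the final owner -> item mapping."""
    state = dict(initial)  # copy so the caller's dict is not mutated
    for a, b in swaps:
        state[a], state[b] = state[b], state[a]
    return state


# First few-shot example: which ball does Bob hold at the end?
example_1 = final_holdings(
    {"Alice": "yellow", "Bob": "blue", "Claire": "pink"},
    [("Claire", "Alice"), ("Alice", "Bob"), ("Claire", "Bob")],
)

# Second few-shot example: which ball does Alice hold at the end?
example_2 = final_holdings(
    {"Alice": "white", "Bob": "purple", "Claire": "pink"},
    [("Bob", "Alice"), ("Bob", "Claire"), ("Bob", "Alice")],
)

# Third few-shot example: who is Alice dancing with at the end?
example_3 = final_holdings(
    {"Alice": "Lola", "Bob": "Rodrigo", "Claire": "Patrick"},
    [("Alice", "Bob"), ("Claire", "Bob"), ("Bob", "Alice")],
)
```

The results agree with the answers in the prompt: option (A) yellow ball, option (C) pink ball, and option (C) Patrick.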
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/fewshot/tracking_shuffled_objects_seven_objects.yaml
0 → 100644
"dataset_name": "tracking_shuffled_objects_seven_objects"
"description": "A task requiring determining the final positions of a set of objects given their initial positions and a description of a sequence of swaps.\n\n"
"doc_to_text": "Q: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a yellow ball, Bob has a blue ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Claire and Alice swap balls. Then, Alice and Bob swap balls. Finally, Claire and Bob swap balls. At the end of the game, Bob has the\nOptions:\n(A) yellow ball\n(B) blue ball\n(C) pink ball\nA: (A)\n\nQ: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a white ball, Bob has a purple ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Bob and Alice swap balls. Then, Bob and Claire swap balls. Finally, Bob and Alice swap balls. At the end of the game, Alice has the\nOptions:\n(A) white ball\n(B) purple ball\n(C) pink ball\nA: (C)\n\nQ: Alice, Bob, and Claire are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Lola, Bob is dancing with Rodrigo, and Claire is dancing with Patrick.\nThroughout the song, the dancers often trade partners. First, Alice and Bob switch partners. Then, Claire and Bob switch partners. Finally, Bob and Alice switch partners. At the end of the dance, Alice is dancing with\nOptions:\n(A) Lola\n(B) Rodrigo\n(C) Patrick\nA: (C)\n\nQ: {{input}}\nA:"
"include": "_fewshot_template_yaml"
"task": "bbh_fewshot_tracking_shuffled_objects_seven_objects"
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/fewshot/tracking_shuffled_objects_three_objects.yaml
0 → 100644
"dataset_name": "tracking_shuffled_objects_three_objects"
"description": "A task requiring determining the final positions of a set of objects given their initial positions and a description of a sequence of swaps.\n\n"
"doc_to_text": "Q: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a yellow ball, Bob has a blue ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Claire and Alice swap balls. Then, Alice and Bob swap balls. Finally, Claire and Bob swap balls. At the end of the game, Bob has the\nOptions:\n(A) yellow ball\n(B) blue ball\n(C) pink ball\nA: (A)\n\nQ: Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a white ball, Bob has a purple ball, and Claire has a pink ball.\nAs the game progresses, pairs of players trade balls. First, Bob and Alice swap balls. Then, Bob and Claire swap balls. Finally, Bob and Alice swap balls. At the end of the game, Alice has the\nOptions:\n(A) white ball\n(B) purple ball\n(C) pink ball\nA: (C)\n\nQ: Alice, Bob, and Claire are dancers at a square dance. At the start of a song, they each have a partner: Alice is dancing with Lola, Bob is dancing with Rodrigo, and Claire is dancing with Patrick.\nThroughout the song, the dancers often trade partners. First, Alice and Bob switch partners. Then, Claire and Bob switch partners. Finally, Bob and Alice switch partners. At the end of the dance, Alice is dancing with\nOptions:\n(A) Lola\n(B) Rodrigo\n(C) Patrick\nA: (C)\n\nQ: {{input}}\nA:"
"include": "_fewshot_template_yaml"
"task": "bbh_fewshot_tracking_shuffled_objects_three_objects"
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/fewshot/web_of_lies.yaml
0 → 100644
"dataset_name": "web_of_lies"
"description": "Evaluate a random boolean function expressed as a word problem.\n\n"
"doc_to_text": "Q: Question: Fidel tells the truth. Jerry says Fidel tells the truth. Vina says Jerry tells the truth. Millicent says Vina lies. Raymond says Millicent lies. Does Raymond tell the truth?\nA: Yes\n\nQ: Question: Kristian lies. Millie says Kristian lies. Maybelle says Millie tells the truth. Fidel says Maybelle lies. Leda says Fidel lies. Does Leda tell the truth?\nA: Yes\n\nQ: Question: Kristian tells the truth. Michaela says Kristian lies. Raymond says Michaela tells the truth. Osvaldo says Raymond tells the truth. Jamey says Osvaldo tells the truth. Does Jamey tell the truth?\nA: No\n\nQ: {{input}}\nA:"
"include": "_fewshot_template_yaml"
"task": "bbh_fewshot_web_of_lies"
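Each few-shot example above is a chain in which every speaker comments on the previous one, so the answer can be derived by propagating truthfulness along the chain. A minimal sketch follows; `last_speaker_is_truthful` and its boolean encoding of the claims are assumptions made for illustration, not part of the harness.

```python
def last_speaker_is_truthful(first_is_truthful, claims):
    """Propagate truthfulness along a chain of statements.

    claims[i] is True if speaker i+1 says the previous speaker tells the
    truth, and False if they say the previous speaker lies.
    """
    truthful = first_is_truthful
    for says_previous_is_truthful in claims:
        # A speaker is truthful exactly when their claim about the
        # previous speaker matches that speaker's actual status.
        truthful = (says_previous_is_truthful == truthful)
    return truthful


# Fidel (truthful), Jerry: truth, Vina: truth, Millicent: lies, Raymond: lies
raymond = last_speaker_is_truthful(True, [True, True, False, False])
# Kristian (liar), Millie: lies, Maybelle: truth, Fidel: lies, Leda: lies
leda = last_speaker_is_truthful(False, [False, True, False, False])
# Kristian (truthful), Michaela: lies, Raymond: truth, Osvaldo: truth, Jamey: truth
jamey = last_speaker_is_truthful(True, [False, True, True, True])
```

These evaluate to Yes, Yes, and No respectively, matching the answers given in the prompt.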
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/fewshot/word_sorting.yaml
0 → 100644
"dataset_name": "word_sorting"
"description": "Sort a list of words.\n\n"
"doc_to_text": "Q: Sort the following words alphabetically: List: oven costume counterpart\nA: costume counterpart oven\n\nQ: Sort the following words alphabetically: List: hypochlorite ponderosa phone credulity\nA: credulity hypochlorite phone ponderosa\n\nQ: Sort the following words alphabetically: List: newt arson parthia seismography mugho aspect census\nA: arson aspect census mugho newt parthia seismography\n\nQ: {{input}}\nA:"
"include": "_fewshot_template_yaml"
"task": "bbh_fewshot_word_sorting"
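The expected answers in the prompt above are plain lexicographic sorts of lowercase words joined by single spaces. A minimal sketch that reproduces them, assuming a hypothetical helper named `sort_words`:

```python
def sort_words(word_list):
    """Sort a space-separated list of words alphabetically."""
    return " ".join(sorted(word_list.split()))


answer_1 = sort_words("oven costume counterpart")
answer_2 = sort_words("hypochlorite ponderosa phone credulity")
answer_3 = sort_words("newt arson parthia seismography mugho aspect census")
```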
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/_zeroshot_template_yaml
0 → 100644
group: bbh_zeroshot
dataset_path: lukaemon/bbh
output_type: generate_until
test_split: test
doc_to_target: "{{target}}"
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    # ignore_punctuation: true
    regexes_to_ignore:
      - "\\.$"
      - ","
      - "\n"
      - "\\\\"
      - '"'
generation_kwargs:
  until:
    - "</s>"
    - "Q:"
    - "<|im_end|>"
  do_sample: false
  temperature: 0.0
num_fewshot: 0
metadata:
  version: 2.0
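To illustrate what the template above configures, here is a rough sketch of how the `until` stop sequences and the `exact_match` options (`ignore_case`, `regexes_to_ignore`) might act on a generation. The function names are hypothetical, and the harness's real implementation may apply these steps in a different order or with different details.

```python
import re

# Values copied from the template, after YAML unescaping.
STOP_SEQUENCES = ["</s>", "Q:", "<|im_end|>"]
REGEXES_TO_IGNORE = [r"\.$", ",", "\n", r"\\", '"']


def truncate_at_stop(generation):
    """Cut the generation at the earliest stop sequence, if any occurs."""
    cut = len(generation)
    for stop in STOP_SEQUENCES:
        index = generation.find(stop)
        if index != -1:
            cut = min(cut, index)
    return generation[:cut]


def normalize(text, ignore_case=True):
    """Strip the ignored patterns, then optionally lowercase (assumed order)."""
    for pattern in REGEXES_TO_IGNORE:
        text = re.sub(pattern, "", text)
    return text.lower() if ignore_case else text


def exact_match(prediction, target):
    """Return 1.0 when the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(target))


truncated = truncate_at_stop(" True\n\nQ: not False")
score = exact_match('"Yes",', "yes")
```

Note that the sketch does not strip surrounding whitespace, so a leading space in the generation would still cause a mismatch unless a filter removes it first.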
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/boolean_expressions.yaml
0 → 100644
"dataset_name": "boolean_expressions"
"description": "Evaluate the result of a random Boolean expression.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
"include": "_zeroshot_template_yaml"
"task": "bbh_zeroshot_boolean_expressions"
filter_list:
  - name: "strict-match"
    filter:
      - function: "take_first"
  - name: "flexible-extract"
    filter:
      - function: "regex"
        group_select: 0
        regex_pattern: "\\b(True|False)\\b"
      - function: "take_first"
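The difference between the two filters above is that `strict-match` scores the raw first response, while `flexible-extract` first pulls an answer token out of free-form text. The sketch below approximates the extraction step with the standard `re` module; `flexible_extract` and the `"[invalid]"` fallback value are assumptions for illustration and may not match the harness's own regex filter exactly.

```python
import re

# The YAML string "\\b(True|False)\\b" unescapes to this regex.
PATTERN = re.compile(r"\b(True|False)\b")


def flexible_extract(response, fallback="[invalid]"):
    """Return the first True/False token in the response, else a fallback."""
    match = PATTERN.search(response)
    return match.group(0) if match else fallback


first = flexible_extract("The expression evaluates to False, not True.")
missing = flexible_extract("I am not sure.")
lowercase = flexible_extract("true")
```

The pattern itself is case-sensitive, so a lowercase `true` is not extracted here; only the later `exact_match` comparison ignores case.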
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/causal_judgement.yaml
0 → 100644
"dataset_name": "causal_judgement"
"description": "Answer questions about causal attribution.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
"include": "_zeroshot_template_yaml"
"task": "bbh_zeroshot_causal_judgement"
filter_list:
  - name: "strict-match"
    filter:
      - function: "take_first"
  - name: "flexible-extract"
    filter:
      - function: "regex"
        group_select: 0
        regex_pattern: "\\b(Yes|No|yes|no)\\b"
      - function: "take_first"
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/date_understanding.yaml
0 → 100644
"dataset_name": "date_understanding"
"description": "Infer the date from context.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
"include": "_zeroshot_template_yaml"
"task": "bbh_zeroshot_date_understanding"
filter_list:
  - name: "strict-match"
    filter:
      - function: "take_first"
  - name: "flexible-extract"
    filter:
      - function: !function utils.MultiChoiceRegexFilter
        group_select: 0
        ignore_case: true
        ignore_punctuation: true
        regex_pattern: "(\\([A-Z]\\))"
      - function: "take_first"
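The `regex_pattern` above targets a parenthesised option letter such as `(B)`. The sketch below shows only that primary pattern using the standard `re` module. The body of `utils.MultiChoiceRegexFilter` is not shown in this diff, so whatever it does with `ignore_case` and `ignore_punctuation` is not modelled here; `extract_choice` and its fallback value are hypothetical.

```python
import re

# The YAML string "(\\([A-Z]\\))" unescapes to this regex.
CHOICE_PATTERN = re.compile(r"(\([A-Z]\))")


def extract_choice(response, fallback="[invalid]"):
    """Return the first parenthesised option letter, else a fallback."""
    match = CHOICE_PATTERN.search(response)
    return match.group(1) if match else fallback


picked = extract_choice("I think the answer is (B) 12/25/1937.")
bare_letter = extract_choice("B")
```

A bare letter without parentheses is not matched by this pattern alone.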
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/disambiguation_qa.yaml
0 → 100644
"dataset_name": "disambiguation_qa"
"description": "Clarify the meaning of sentences with ambiguous pronouns.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
"include": "_zeroshot_template_yaml"
"task": "bbh_zeroshot_disambiguation_qa"
filter_list:
  - name: "strict-match"
    filter:
      - function: "take_first"
  - name: "flexible-extract"
    filter:
      - function: !function utils.MultiChoiceRegexFilter
        group_select: 0
        ignore_case: true
        ignore_punctuation: true
        regex_pattern: "(\\([A-Z]\\))"
      - function: "take_first"
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/dyck_languages.yaml
0 → 100644
"dataset_name": "dyck_languages"
"description": "Correctly close a Dyck-n word.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
"include": "_zeroshot_template_yaml"
"task": "bbh_zeroshot_dyck_languages"
filter_list:
  - name: "strict-match"
    filter:
      - function: "take_first"
  - name: "flexible-extract"
    filter:
      - function: "regex"
        group_select: 0
        regex_pattern: "(?<= )([\" \\[\\(<{}>\\)\\]]+)|([\" \\[\\(<{}>\\)\\]]+)"
      - function: "take_first"
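For readers unfamiliar with the task described above, closing a Dyck-n word means emitting the closing brackets that balance an unfinished bracket sequence. A stack-based sketch follows; `close_dyck` is a hypothetical helper, and the space-separated output format is an assumption about how targets are written.

```python
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = set(PAIRS.values())


def close_dyck(prefix):
    """Return the closing brackets needed to balance the given prefix."""
    stack = []
    for ch in prefix:
        if ch in PAIRS:
            stack.append(ch)
        elif ch in CLOSERS:
            # A closer must match the most recent unclosed opener.
            if not stack or PAIRS[stack[-1]] != ch:
                raise ValueError(f"unbalanced closer {ch!r} in prefix")
            stack.pop()
        # Any other character (such as a space) is ignored.
    # Close the remaining openers from innermost to outermost.
    return " ".join(PAIRS[ch] for ch in reversed(stack))


completion = close_dyck("< [ ( )")
already_closed = close_dyck("( )")
```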
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/formal_fallacies.yaml
0 → 100644
"dataset_name": "formal_fallacies"
"description": "Distinguish deductively valid arguments from formal fallacies.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
"include": "_zeroshot_template_yaml"
"task": "bbh_zeroshot_formal_fallacies"
filter_list:
  - name: "strict-match"
    filter:
      - function: "take_first"
  - name: "flexible-extract"
    filter:
      - function: "regex"
        group_select: 0
        regex_pattern: "\\b(valid|invalid)\\b"
      - function: "take_first"
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/geometric_shapes.yaml
0 → 100644
"dataset_name": "geometric_shapes"
"description": "Name geometric shapes from their SVG paths.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
"include": "_zeroshot_template_yaml"
"task": "bbh_zeroshot_geometric_shapes"
filter_list:
  - name: "strict-match"
    filter:
      - function: "take_first"
  - name: "flexible-extract"
    filter:
      - function: !function utils.MultiChoiceRegexFilter
        group_select: 0
        ignore_case: true
        ignore_punctuation: true
        regex_pattern: "(\\([A-Z]\\))"
      - function: "take_first"
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/hyperbaton.yaml
0 → 100644
"dataset_name": "hyperbaton"
"description": "Order adjectives correctly in English sentences.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
"include": "_zeroshot_template_yaml"
"task": "bbh_zeroshot_hyperbaton"
filter_list:
  - name: "strict-match"
    filter:
      - function: "take_first"
  - name: "flexible-extract"
    filter:
      - function: !function utils.MultiChoiceRegexFilter
        group_select: 0
        ignore_case: true
        ignore_punctuation: true
        regex_pattern: "(\\([A-Z]\\))"
      - function: "take_first"
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/logical_deduction_five_objects.yaml
0 → 100644
"dataset_name": "logical_deduction_five_objects"
"description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
"include": "_zeroshot_template_yaml"
"task": "bbh_zeroshot_logical_deduction_five_objects"
filter_list:
  - name: "strict-match"
    filter:
      - function: "take_first"
  - name: "flexible-extract"
    filter:
      - function: !function utils.MultiChoiceRegexFilter
        group_select: 0
        ignore_case: true
        ignore_punctuation: true
        regex_pattern: "(\\([A-Z]\\))"
      - function: "take_first"
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/logical_deduction_seven_objects.yaml
0 → 100644
"dataset_name": "logical_deduction_seven_objects"
"description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
"include": "_zeroshot_template_yaml"
"task": "bbh_zeroshot_logical_deduction_seven_objects"
filter_list:
  - name: "strict-match"
    filter:
      - function: "take_first"
  - name: "flexible-extract"
    filter:
      - function: !function utils.MultiChoiceRegexFilter
        group_select: 0
        ignore_case: true
        ignore_punctuation: true
        regex_pattern: "(\\([A-Z]\\))"
      - function: "take_first"
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/logical_deduction_three_objects.yaml
0 → 100644
"dataset_name": "logical_deduction_three_objects"
"description": "A logical deduction task which requires deducing the order of a sequence of objects.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
"include": "_zeroshot_template_yaml"
"task": "bbh_zeroshot_logical_deduction_three_objects"
filter_list:
  - name: "strict-match"
    filter:
      - function: "take_first"
  - name: "flexible-extract"
    filter:
      - function: !function utils.MultiChoiceRegexFilter
        group_select: 0
        ignore_case: true
        ignore_punctuation: true
        regex_pattern: "(\\([A-Z]\\))"
      - function: "take_first"
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/movie_recommendation.yaml
0 → 100644
"dataset_name": "movie_recommendation"
"description": "Recommend movies similar to the given list of movies.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
"include": "_zeroshot_template_yaml"
"task": "bbh_zeroshot_movie_recommendation"
filter_list:
  - name: "strict-match"
    filter:
      - function: "take_first"
  - name: "flexible-extract"
    filter:
      - function: !function utils.MultiChoiceRegexFilter
        group_select: 0
        ignore_case: true
        ignore_punctuation: true
        regex_pattern: "(\\([A-Z]\\))"
      - function: "take_first"
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/multistep_arithmetic_two.yaml
0 → 100644
"dataset_name": "multistep_arithmetic_two"
"description": "Solve multi-step arithmetic problems.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
"include": "_zeroshot_template_yaml"
"task": "bbh_zeroshot_multistep_arithmetic_two"
filter_list:
  - name: "strict-match"
    filter:
      - function: "take_first"
  - name: "flexible-extract"
    filter:
      - function: !function utils.NumberParseRegexFilter
        group_select: 0
        regex_pattern: "([-0-9]+)"
      - function: "take_first"
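The `regex_pattern` above matches runs of digits and hyphens. The sketch below exercises only that pattern with the standard `re` module. `utils.NumberParseRegexFilter` is not shown in this diff, so any extra parsing it performs is not modelled; `first_number` and its fallback value are hypothetical.

```python
import re

NUMBER_PATTERN = re.compile(r"([-0-9]+)")


def first_number(response, fallback="[invalid]"):
    """Return the first run of digits/hyphens in the response, else a fallback."""
    match = NUMBER_PATTERN.search(response)
    return match.group(1) if match else fallback


answer = first_number("The result is -42.")
joined = first_number("3-4")
```

Because the character class admits a hyphen at any position, an expression such as `3-4` is captured as a single token rather than as two numbers, which is worth keeping in mind when reading flexible-extract scores for this task.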
LM-Evaluation-Harness-240310/lm_eval/tasks/bbh/zeroshot/navigate.yaml
0 → 100644
"dataset_name": "navigate"
"description": "Given a series of navigation instructions, determine whether one would end up back at the starting point.\n\n"
"doc_to_text": "Q: {{input}}\nA:"
"include": "_zeroshot_template_yaml"
"task": "bbh_zeroshot_navigate"
filter_list:
  - name: "strict-match"
    filter:
      - function: "take_first"
  - name: "flexible-extract"
    filter:
      - function: "regex"
        group_select: 0
        regex_pattern: "\\b(Yes|No|yes|no)\\b"
      - function: "take_first"