gaoqiong / lm-evaluation-harness
Commit f44f2c5e, authored Dec 10, 2024 by Baber
add mgsm
parent 0b994433
Showing 2 changed files with 60 additions and 0 deletions:
  lm_eval/tasks/llama3/instruct/mgsm_chat.yaml  +45 -0
  lm_eval/tasks/llama3/instruct/utils.py  +15 -0
lm_eval/tasks/llama3/instruct/mgsm_chat.yaml (new file, 0 → 100644)
tag: llama3
task: mgsm_chat
dataset_path: meta-llama/Llama-3.2-3B-Instruct-evals
dataset_name: Llama-3.2-3B-Instruct-evals__mgsm__details
output_type: generate_until
test_split: latest
doc_to_text: "{{ input_final_prompts |first |replace('<|eot_id|><|start_header_id|>assistant<|end_header_id|>', '') |replace('<|start_header_id|>', '') |replace('<|end_header_id|>', '') |replace('<|eot_id|>', '') |replace('^user', '') |trim }}"
doc_to_target: "input_correct_responses"
process_results: !function utils.process_results_mgsm
generation_kwargs:
  until: []
  do_sample: false
  temperature: 0.0
  max_gen_toks: 2048
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
filter_list:
  - name: "strict-match"
    filter:
      - function: "regex"
        regex_pattern: "Answer: (\\-?[0-9\\.\\,]+)"
      - function: "take_first"
  - name: "flexible-extract"
    filter:
      - function: regex
        group_select: -1
        regex_pattern: "Answer: (-?[$0-9.,]{2,})|(-?[0-9]+)"
      - function: take_first
      - function: remove_whitespace
      - function: take_first
metadata:
  version: 0.0
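
To make the two entries in `filter_list` easier to review, here is a minimal standard-library sketch of how the `strict-match` and `flexible-extract` patterns behave on a sample generation. It is an approximation, not the harness's filter code: the helper `extract`, the `"[invalid]"` fallback value, and the sample responses are assumptions made for illustration, and the patterns are written as they would read after YAML unescaping.

```python
import re

# Patterns from filter_list, after YAML unescaping of the double-quoted strings.
STRICT = re.compile(r"Answer: (\-?[0-9\.\,]+)")
FLEXIBLE = re.compile(r"Answer: (-?[$0-9.,]{2,})|(-?[0-9]+)")


def extract(pattern, text, group_select=0, fallback="[invalid]"):
    """Hypothetical helper: pick one findall() match, like a regex filter step."""
    matches = pattern.findall(text)
    if not matches:
        return fallback
    match = matches[group_select]
    if isinstance(match, tuple):
        # With several capture groups, findall() returns tuples; keep the
        # first group that actually captured something.
        non_empty = [group for group in match if group]
        match = non_empty[0] if non_empty else fallback
    return match.strip()


response = "She has 3 bags of 6 apples, so 3 * 6 = 18. Answer: 18"
strict = extract(STRICT, response)                      # first "Answer: <number>"
flexible = extract(FLEXIBLE, response, group_select=-1)  # last number-like match

no_marker = "The total is 42 apples"
strict_missing = extract(STRICT, no_marker)
flexible_missing = extract(FLEXIBLE, no_marker, group_select=-1)

# The strict character class includes '.' and ',', so trailing punctuation
# directly after the number is captured as well.
strict_trailing = extract(STRICT, "Answer: 1,234.")
```

Note that in the flexible pattern the alternation applies to the whole expression, so its second branch matches any integer in the text, not only one preceded by `Answer:`; selecting the last match (`group_select: -1`) is what steers it toward the final answer.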
lm_eval/tasks/llama3/instruct/utils.py (new file, 0 → 100644)
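from typing import List

from lm_eval.api.metrics import exact_match_fn


def process_results_mgsm(doc, prediction):
    # `prediction` is expected to be a one-element list; repeating it
    # len(gold) times pairs it with every accepted reference answer.
    gold: List = doc["input_correct_responses"]
    return {
        "exact_match": int(
            exact_match_fn(
                predictions=prediction * len(gold), references=gold, ignore_case=True
            )["exact_match"]
            # Score 1 if the prediction matches at least one reference.
            > 0
        )
    }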
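
The scoring rule in `process_results_mgsm` amounts to "credit the prediction if it matches any accepted answer". The sketch below reproduces that logic without importing `lm_eval`, so it can be run in isolation. `exact_match_mean` is a hypothetical stand-in for `exact_match_fn` (assumed here to return the mean of element-wise matches under the `"exact_match"` key), and `score_any_match` and the sample `doc` are likewise illustrative names and data, not part of the commit.

```python
from typing import Dict, List


def exact_match_mean(
    predictions: List[str], references: List[str], ignore_case: bool = False
) -> Dict[str, float]:
    """Hypothetical stand-in for exact_match_fn: mean of pairwise equality."""
    if ignore_case:
        predictions = [p.lower() for p in predictions]
        references = [r.lower() for r in references]
    hits = [p == r for p, r in zip(predictions, references)]
    return {"exact_match": sum(hits) / len(hits)}


def score_any_match(doc: dict, prediction: List[str]) -> Dict[str, int]:
    """Mirror of process_results_mgsm using the stand-in metric."""
    gold = doc["input_correct_responses"]
    # Repeat the single prediction so it lines up with every reference.
    result = exact_match_mean(
        predictions=prediction * len(gold), references=gold, ignore_case=True
    )
    # Any non-zero mean means at least one reference matched.
    return {"exact_match": int(result["exact_match"] > 0)}


doc = {"input_correct_responses": ["18", "18.0"]}
hit = score_any_match(doc, ["18"])   # matches the first reference
miss = score_any_match(doc, ["17"])  # matches neither reference
```

Thresholding the mean at zero is what turns a per-reference comparison into an any-match score; a document with an empty reference list would need separate handling, since the mean is undefined there.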