gaoqiong / lm-evaluation-harness
Commit 2a6acc88 (Unverified)
Authored Jun 29, 2024 by Hailey Schoelkopf
Committed by GitHub on Jun 29, 2024

fail gracefully upon tokenizer logging failure (#2038)

parent cc2d3463

Showing 1 changed file with 23 additions and 9 deletions

lm_eval/loggers/utils.py

@@ -114,14 +114,28 @@ def add_env_info(storage: Dict[str, Any]):
 def add_tokenizer_info(storage: Dict[str, Any], lm):
     if getattr(lm, "tokenizer", False):
-        tokenizer_info = {
-            "tokenizer_pad_token": [lm.tokenizer.pad_token, lm.tokenizer.pad_token_id],
-            "tokenizer_eos_token": [lm.tokenizer.eos_token, lm.tokenizer.eos_token_id],
-            "tokenizer_bos_token": [lm.tokenizer.bos_token, lm.tokenizer.bos_token_id],
-            "eot_token_id": getattr(lm, "eot_token_id", None),
-            "max_length": getattr(lm, "max_length", None),
-        }
-        storage.update(tokenizer_info)
-    # seems gguf and textsynth do not have tokenizer
+        try:
+            tokenizer_info = {
+                "tokenizer_pad_token": [
+                    lm.tokenizer.pad_token,
+                    lm.tokenizer.pad_token_id,
+                ],
+                "tokenizer_eos_token": [
+                    lm.tokenizer.eos_token,
+                    lm.tokenizer.eos_token_id,
+                ],
+                "tokenizer_bos_token": [
+                    lm.tokenizer.bos_token,
+                    lm.tokenizer.bos_token_id,
+                ],
+                "eot_token_id": getattr(lm, "eot_token_id", None),
+                "max_length": getattr(lm, "max_length", None),
+            }
+            storage.update(tokenizer_info)
+        except Exception as err:
+            logger.debug(
+                f"Logging detailed tokenizer info failed with {err}, skipping..."
+            )
+        # seems gguf and textsynth do not have tokenizer
     else:
         logger.debug(
...
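
The effect of the new try/except can be checked in isolation. The sketch below is not part of the commit: it is a trimmed copy of add_tokenizer_info (fewer fields, and an illustrative message in the else branch), exercised with hypothetical stand-in classes PartialTokenizer, FullTokenizer and FakeLM that do not exist in lm_eval. Assuming a tokenizer object that lacks pad_token_id, it shows that the attribute error is swallowed and storage is left untouched, because the dictionary is built before storage.update is ever called.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


def add_tokenizer_info(storage: Dict[str, Any], lm) -> None:
    """Trimmed copy of the patched function; only two fields are kept."""
    if getattr(lm, "tokenizer", False):
        try:
            tokenizer_info = {
                "tokenizer_pad_token": [
                    lm.tokenizer.pad_token,
                    lm.tokenizer.pad_token_id,
                ],
                "max_length": getattr(lm, "max_length", None),
            }
            storage.update(tokenizer_info)
        except Exception as err:
            logger.debug(
                f"Logging detailed tokenizer info failed with {err}, skipping..."
            )
    else:
        # Illustrative message; the real text is elided in the diff.
        logger.debug("LM has no tokenizer attribute")


class PartialTokenizer:
    """Hypothetical tokenizer that is missing pad_token_id."""

    pad_token = "<pad>"


class FullTokenizer:
    """Hypothetical tokenizer exposing both pad attributes."""

    pad_token = "<pad>"
    pad_token_id = 0


class FakeLM:
    """Hypothetical stand-in for a model object passed to the logger."""

    max_length = 2048

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer


# Failure path: AttributeError is caught, nothing is written.
broken_storage: Dict[str, Any] = {}
add_tokenizer_info(broken_storage, FakeLM(PartialTokenizer()))

# Success path: both fields are recorded.
ok_storage: Dict[str, Any] = {}
add_tokenizer_info(ok_storage, FakeLM(FullTokenizer()))
```

Since the failure is reported at debug level, it only becomes visible when the logger is configured for DEBUG output; at the default level the skip is silent.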