vllm · Commit ec3b5ce9 (unverified)
Authored Oct 13, 2023 by Antoni Baum, committed by GitHub on Oct 13, 2023

Improve detokenization performance (#1338)
Parent: 6368e777

Showing 1 changed file with 4 additions and 3 deletions.
vllm/transformers_utils/tokenizer.py (+4, -3)
...
@@ -81,10 +81,11 @@ def _convert_tokens_to_string_with_added_encoders(
     # even when the loop body is very simple.
     sub_texts = []
     current_sub_text = []
+    all_special_tokens = set(tokenizer.all_special_tokens)
     for token in output_tokens:
-        if skip_special_tokens and token in tokenizer.all_special_tokens:
+        if skip_special_tokens and token in all_special_tokens:
             continue
-        if token in tokenizer.added_tokens_encoder:
+        if token in tokenizer.get_added_vocab():
             if current_sub_text:
                 sub_text = tokenizer.convert_tokens_to_string(current_sub_text)
                 sub_texts.append(sub_text)
...
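The first hunk builds `all_special_tokens` as a set once, outside the loop, instead of testing membership against the `tokenizer.all_special_tokens` list on every output token. The `get_added_vocab()` call replaces direct access to the `added_tokens_encoder` attribute; unlike the attribute, it is part of the public API on both slow and fast Hugging Face tokenizers. A minimal stand-alone sketch of the set-vs-list effect (not from the commit; the token strings and counts are made up for illustration):

# Micro-benchmark sketch: list membership is O(n) per lookup, set membership
# is O(1) on average, and detokenization does one lookup per output token.
import timeit

special_list = [f"<extra_{i}>" for i in range(100)]  # stand-in special tokens
special_set = set(special_list)
tokens = ["hello"] * 10_000  # ordinary tokens, so every lookup misses

t_list = timeit.timeit(lambda: [t in special_list for t in tokens], number=10)
t_set = timeit.timeit(lambda: [t in special_set for t in tokens], number=10)
print(f"list: {t_list:.3f}s  set: {t_set:.3f}s")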
@@ -129,7 +130,7 @@ def detokenize_incrementally(
     # The prefix text is necessary only to defeat cleanup algorithms in
     # the decode which decide to add a space or not depending on the
     # surrounding ids.
-    if not getattr(tokenizer, "added_tokens_encoder", {}):
+    if tokenizer.is_fast or not tokenizer.get_added_vocab():
         prefix_text = tokenizer.convert_tokens_to_string(
             output_tokens[prefix_offset:read_offset])
         new_text = tokenizer.convert_tokens_to_string(
...
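The second hunk widens the fast path in `detokenize_incrementally`: any fast (Rust-backed) tokenizer, or any tokenizer with no added vocabulary, decodes with a single `convert_tokens_to_string` call instead of falling back to the per-token helper above. A hedged sketch of how the old and new conditions can diverge (not from the commit; requires `transformers` and network access to fetch the `gpt2` tokenizer, and the added token is hypothetical):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # a fast tokenizer by default
tok.add_tokens(["<my_added_token>"])         # hypothetical added token

old_fast_path = not getattr(tok, "added_tokens_encoder", {})
new_fast_path = tok.is_fast or not tok.get_added_vocab()

# On transformers versions where fast tokenizers expose a populated
# `added_tokens_encoder`, the old check would route this fast tokenizer onto
# the slow per-token path; the new check keeps it on the fast path.
print(f"old: {old_fast_path}, new: {new_fast_path}")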