Commit 90ce374d (unverified), authored Apr 14, 2023 by Ruiyang Sun, committed by GitHub Apr 13, 2023

fix(llama): fix LlamaTokenizer (#22746)

Bug in LlamaTokenizer when #22742

Parent: d85bf954
Changes: 1 changed file with 8 additions and 5 deletions

src/transformers/models/llama/tokenization_llama.py (+8, -5)
src/transformers/models/llama/tokenization_llama.py @ 90ce374d

@@ -246,9 +246,12 @@ class LlamaTokenizer(PreTrainedTokenizer):
         Returns:
             `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
         """
-        sep = [self.sep_token_id]
-        cls = [self.cls_token_id]
+        bos_token_id = [self.bos_token_id] if self.add_bos_token else []
+        eos_token_id = [self.eos_token_id] if self.add_eos_token else []

-        if token_ids_1 is None:
-            return len(cls + token_ids_0 + sep) * [0]
-        return len(cls + token_ids_0 + sep) * [0] + len(token_ids_1 + sep) * [1]
+        output = [0] * len(bos_token_id + token_ids_0 + eos_token_id)
+
+        if token_ids_1 is not None:
+            output += [1] * len(bos_token_id + token_ids_1 + eos_token_id)
+
+        return output
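For context, here is a minimal standalone sketch of the patched behaviour: token type IDs are now sized to the optional BOS/EOS tokens that the tokenizer adds around each sequence, instead of the SEP/CLS tokens that LlamaTokenizer does not define. The function name, the assumed BOS/EOS IDs (1 and 2), and the example inputs below are illustrative assumptions, not code from the commit.

# Standalone sketch of the patched logic (illustrative only, not the
# actual transformers code).
from typing import List, Optional


def create_token_type_ids(
    token_ids_0: List[int],
    token_ids_1: Optional[List[int]] = None,
    add_bos_token: bool = True,
    add_eos_token: bool = False,
    bos_token_id: int = 1,  # assumed LLaMA default IDs, for illustration
    eos_token_id: int = 2,
) -> List[int]:
    bos = [bos_token_id] if add_bos_token else []
    eos = [eos_token_id] if add_eos_token else []

    # Sequence 0 and its surrounding special tokens get segment ID 0.
    output = [0] * len(bos + token_ids_0 + eos)

    # Sequence 1, if present, gets segment ID 1 with the same BOS/EOS framing.
    if token_ids_1 is not None:
        output += [1] * len(bos + token_ids_1 + eos)

    return output


# With BOS and EOS enabled, a 3-token and a 2-token sequence yield
# (1 + 3 + 1) zeros followed by (1 + 2 + 1) ones, so the token type IDs
# line up with the special-token-augmented input IDs.
print(create_token_type_ids([10, 11, 12], [20, 21], add_eos_token=True))
# -> [0, 0, 0, 0, 0, 1, 1, 1, 1]

In the library itself this logic appears to live in the tokenizer's `create_token_type_ids_from_sequences` method (the one shown in the diff above), gated by the `add_bos_token` and `add_eos_token` flags.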