chenpangpang / transformers
Commit 588e6caa
Authored Aug 23, 2021 by NielsRogge; committed by GitHub, Aug 23, 2021

Overwrite get_clean_sequence as this was causing a bottleneck (#13183)

Parent: 14373821
Showing 1 changed file with 6 additions and 0 deletions.

tests/test_tokenization_luke.py (+6, -0)
...
@@ -15,6 +15,7 @@
 import unittest
+from typing import Tuple
 
 from transformers import AddedToken, LukeTokenizer
 from transformers.testing_utils import require_torch, slow
...
@@ -81,6 +82,11 @@ class Luke(TokenizerTesterMixin, unittest.TestCase):
         assert encoded_sentence == encoded_text_from_decode
         assert encoded_pair == encoded_pair_from_decode
 
+    def get_clean_sequence(self, tokenizer, max_length=20) -> Tuple[str, list]:
+        txt = "Beyonce lives in Los Angeles"
+        ids = tokenizer.encode(txt, add_special_tokens=False)
+        return txt, ids
+
     def test_space_encoding(self):
         tokenizer = self.get_tokenizer()
...
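
To illustrate why overriding the helper can remove a bottleneck, here is a minimal, self-contained sketch. The mixin's default get_clean_sequence is not shown in this diff, so SlowMixin below is a hypothetical stand-in that assumes the default does one decode per vocabulary entry; ToyTokenizer and FastTest are likewise invented for illustration and are not part of transformers. Only the body of the overriding method mirrors the commit.

```python
from typing import List, Tuple


class ToyTokenizer:
    """Hypothetical whitespace tokenizer; stands in for a real tokenizer."""

    def __init__(self, vocab: List[str]):
        self.vocab = {tok: i for i, tok in enumerate(vocab)}
        self.inv_vocab = {i: tok for tok, i in self.vocab.items()}
        self.decode_calls = 0  # counts decode() calls to make the cost visible

    def encode(self, text: str, add_special_tokens: bool = False) -> List[int]:
        return [self.vocab[tok] for tok in text.split()]

    def decode(self, ids: List[int]) -> str:
        self.decode_calls += 1
        return " ".join(self.inv_vocab[i] for i in ids)


class SlowMixin:
    """Hypothetical mixin: the default helper scans the whole vocabulary."""

    def get_clean_sequence(self, tokenizer, max_length=20) -> Tuple[str, list]:
        # Assumed behaviour: one decode() per vocabulary entry,
        # so the cost grows with the vocabulary size.
        toks = [(i, tokenizer.decode([i])) for i in range(len(tokenizer.vocab))]
        ids = [i for i, _ in toks][:max_length]
        return tokenizer.decode(ids), ids


class FastTest(SlowMixin):
    """Overrides the helper with a fixed sentence, as the commit does."""

    def get_clean_sequence(self, tokenizer, max_length=20) -> Tuple[str, list]:
        txt = "Beyonce lives in Los Angeles"
        ids = tokenizer.encode(txt, add_special_tokens=False)
        return txt, ids


# Toy vocabulary: the five words of the sentence plus 1000 filler tokens.
vocab = ["Beyonce", "lives", "in", "Los", "Angeles"] + [f"tok{i}" for i in range(1000)]
tok = ToyTokenizer(vocab)

txt, ids = FastTest().get_clean_sequence(tok)
fast_calls = tok.decode_calls  # the override never calls decode()

SlowMixin().get_clean_sequence(tok)
slow_calls = tok.decode_calls  # one decode per vocab entry, plus one final decode
```

In this toy setup the override costs a single encode call regardless of vocabulary size, while the stand-in default scales with the number of vocabulary entries. The trade-off is that the overriding test no longer derives its input from the tokenizer's own vocabulary.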