chenpangpang / transformers · Commits · 3ec8171b

Unverified commit 3ec8171b
Authored Mar 09, 2023 by Ceyda Cinarel; committed by GitHub, Mar 08, 2023
Bug fix: token classification pipeline while passing offset_mapping (#22034)
Fixes slow tokenizers when offset_mapping is passed explicitly to the pipeline.
parent 1cbac686

Changes: 1 changed file with 3 additions and 1 deletion (+3, -1)
src/transformers/pipelines/token_classification.py (+3, -1)
@@ -304,7 +304,9 @@ class TokenClassificationPipeline(Pipeline):
                         start_ind = start_ind.item()
                         end_ind = end_ind.item()
                 word_ref = sentence[start_ind:end_ind]
-                if getattr(self.tokenizer._tokenizer.model, "continuing_subword_prefix", None):
+                if getattr(self.tokenizer, "_tokenizer", None) and getattr(
+                    self.tokenizer._tokenizer.model, "continuing_subword_prefix", None
+                ):
                     # This is a BPE, word aware tokenizer, there is a correct way
                     # to fuse tokens
                     is_subword = len(word) != len(word_ref)
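For context, a minimal sketch of why the original one-line check could fail: only fast tokenizers wrap a Rust backend exposed as `_tokenizer`, so probing `self.tokenizer._tokenizer.model` with a slow (pure-Python) tokenizer raised an AttributeError as soon as a caller supplied offset_mapping. The guarded condition above checks for the backend first. The checkpoint name below is an arbitrary illustrative choice, not something taken from this commit.

```python
from transformers import AutoTokenizer

# Fast tokenizers keep their Rust backend in `_tokenizer`; slow tokenizers
# have no such attribute, which is what the added guard accounts for.
fast_tok = AutoTokenizer.from_pretrained("bert-base-cased")                  # fast by default
slow_tok = AutoTokenizer.from_pretrained("bert-base-cased", use_fast=False)  # pure-Python

for name, tok in [("fast", fast_tok), ("slow", slow_tok)]:
    backend = getattr(tok, "_tokenizer", None)
    # Mirrors the patched condition: confirm the backend exists before touching its model.
    word_aware = backend is not None and getattr(
        backend.model, "continuing_subword_prefix", None
    )
    print(name, "| has backend:", backend is not None, "| word-aware check:", bool(word_aware))
```

With the slow tokenizer the condition short-circuits to False and the pipeline falls back to its heuristic sub-word detection instead of crashing.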