Bug fix: token classification pipeline while passing offset_mapping (#22034)

fix slow tokenizers with passing offset_mapping

Commit 3ec8171b, authored Mar 09, 2023 by Ceyda Cinarel; committed by GitHub Mar 08, 2023

Showing with 3 additions and 1 deletion

src/transformers/pipelines/token_classification.py +3 -1

--- a/src/transformers/pipelines/token_classification.py
+++ b/src/transformers/pipelines/token_classification.py
@@ -304,7 +304,9 @@ class TokenClassificationPipeline(Pipeline):
                        start_ind = start_ind.item()
                        end_ind = end_ind.item()
                word_ref = sentence[start_ind:end_ind]
-                if getattr(self.tokenizer._tokenizer.model, "continuing_subword_prefix", None):
+                if getattr(self.tokenizer, "_tokenizer", None) and getattr(
+                    self.tokenizer._tokenizer.model, "continuing_subword_prefix", None
+                ):
                    # This is a BPE, word aware tokenizer, there is a correct way
                    # to fuse tokens
                    is_subword = len(word) != len(word_ref)
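
The guard added above can be illustrated outside the pipeline. The following is a minimal sketch, not the pipeline's code: it assumes that a slow tokenizer is an object with no `_tokenizer` attribute, and it uses `SimpleNamespace` stand-ins instead of real tokenizer classes. The helper names (`has_continuing_subword_prefix`, `unguarded_check`) and the `"##"` prefix value are hypothetical and chosen only for illustration.

```python
from types import SimpleNamespace


def has_continuing_subword_prefix(tokenizer) -> bool:
    """Apply the same guarded condition as the patched line to any object."""
    # Check for the `_tokenizer` backend first; only then look at its model.
    return bool(
        getattr(tokenizer, "_tokenizer", None)
        and getattr(tokenizer._tokenizer.model, "continuing_subword_prefix", None)
    )


def unguarded_check(tokenizer):
    """The pre-patch condition: assumes `_tokenizer` always exists."""
    return getattr(tokenizer._tokenizer.model, "continuing_subword_prefix", None)


# Hypothetical stand-ins (assumptions for illustration, not real classes).
fast_with_prefix = SimpleNamespace(
    _tokenizer=SimpleNamespace(model=SimpleNamespace(continuing_subword_prefix="##"))
)
fast_no_prefix = SimpleNamespace(_tokenizer=SimpleNamespace(model=SimpleNamespace()))
slow_tokenizer = SimpleNamespace()  # no `_tokenizer` attribute at all

if __name__ == "__main__":
    for name, tok in [
        ("fast_with_prefix", fast_with_prefix),
        ("fast_no_prefix", fast_no_prefix),
        ("slow_tokenizer", slow_tokenizer),
    ]:
        print(name, has_continuing_subword_prefix(tok))

    try:
        unguarded_check(slow_tokenizer)
    except AttributeError as err:
        # The old condition fails before the inner getattr default can apply.
        print("unguarded check failed:", err)
```

The inner `getattr(..., None)` default in the old condition only covers a missing `continuing_subword_prefix`; it does not help when `_tokenizer` itself is absent, which is why the patch adds the outer `getattr` on the tokenizer.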