Warning on `add_special_tokens` (#2966)

Warning on `add_special_tokens` when passed to `encode`, `encode_plus` and `batch_encode_plus`

Unverified commit 8194df8e, authored Feb 24, 2020 by Lysandre Debut and committed by GitHub on Feb 24, 2020.

Showing with 7 additions and 0 deletions

src/transformers/tokenization_utils.py +7 -0

--- a/src/transformers/tokenization_utils.py
+++ b/src/transformers/tokenization_utils.py
@@ -1704,6 +1704,13 @@ class PreTrainedTokenizerFast(PreTrainedTokenizer):
        return_offsets_mapping=False,
        **kwargs
    ):
+        if not add_special_tokens:
+            logger.warning(
+                "Fast tokenizers add special tokens by default. To remove special tokens, please specify "
+                "`add_special_tokens=False` during the initialisation rather than when calling `encode`, "
+                "`encode_plus` or `batch_encode_plus`."
+            )
        # Needed if we have to return a tensor
        pad_to_max_length = pad_to_max_length or (return_tensors is not None)
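The check added in this commit can be illustrated with a small, self-contained sketch. `ToyFastTokenizer` is a hypothetical stand-in, not the transformers class. It assumes, as the warning text implies, that special-token handling is fixed when the tokenizer is constructed, so a call-time `add_special_tokens=False` is reported through `logging` rather than honoured. Whitespace splitting and the `[CLS]`/`[SEP]` markers are placeholders for real tokenization.

```python
import logging

logger = logging.getLogger(__name__)


class ToyFastTokenizer:
    """Hypothetical stand-in for a fast tokenizer whose special-token
    behaviour is chosen once, at construction time."""

    def __init__(self, add_special_tokens=True):
        # The post-processing choice is made here and never changes afterwards.
        self.add_special_tokens = add_special_tokens

    def batch_encode_plus(self, batch_text, add_special_tokens=True):
        if not add_special_tokens:
            # The call-time flag is ignored, so point the caller at the constructor.
            logger.warning(
                "Fast tokenizers add special tokens by default. To remove special tokens, please specify "
                "`add_special_tokens=False` during the initialisation rather than when calling `encode`, "
                "`encode_plus` or `batch_encode_plus`."
            )
        encoded = []
        for text in batch_text:
            # Placeholder tokenization: split on whitespace.
            tokens = text.split()
            if self.add_special_tokens:
                # Only the constructor setting decides whether markers are added.
                tokens = ["[CLS]"] + tokens + ["[SEP]"]
            encoded.append(tokens)
        return encoded


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    tok = ToyFastTokenizer()
    # Logs the warning, yet still returns [['[CLS]', 'hello', 'world', '[SEP]']].
    print(tok.batch_encode_plus(["hello world"], add_special_tokens=False))
    # Constructing with add_special_tokens=False is what actually removes the markers.
    print(ToyFastTokenizer(add_special_tokens=False).batch_encode_plus(["hello world"]))
```

Adjacent string literals in Python are concatenated with no separator, so each fragment of a multi-line message needs its own trailing space to avoid run-together words such as "specify`add_special_tokens=False`".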