improve `add_tokens` docstring (#18687)

* improve add_tokens documentation
* format

Unverified Commit 43869808 authored Aug 23, 2022 by SaulLu Committed by GitHub Aug 23, 2022

Showing with 5 additions and 3 deletions

src/transformers/tokenization_utils_base.py +5 -3

--- a/src/transformers/tokenization_utils_base.py
+++ b/src/transformers/tokenization_utils_base.py
@@ -915,10 +915,12 @@ class SpecialTokensMixin:
    ) -> int:
        """
        Add a list of new tokens to the tokenizer class. If the new tokens are not in the vocabulary, they are added to
-        it with indices starting from length of the current vocabulary.
+        it with indices starting from length of the current vocabulary and will be isolated before the tokenization
+        algorithm is applied. Added tokens and tokens from the vocabulary of the tokenization algorithm are therefore
+        not treated in the same way.

-        Note,None When adding new tokens to the vocabulary, you should make sure to also resize the token embedding
-        matrix of the model so that its embedding matrix matches the tokenizer.
+        Note, when adding new tokens to the vocabulary, you should make sure to also resize the token embedding matrix
+        of the model so that its embedding matrix matches the tokenizer.

        In order to do that, please use the [`~PreTrainedModel.resize_token_embeddings`] method.
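
A toy sketch of the behavior the new docstring describes, written in plain Python so it runs without the library. `ToyTokenizer`, its character-level `_base_tokenize`, and the regex-based splitting are all assumptions made for illustration; this is not how `transformers` implements `add_tokens`. It only shows the two documented points: new tokens get indices starting from the current vocabulary length, and added tokens are isolated before the base algorithm runs.

```python
import re


class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer; NOT the transformers implementation."""

    def __init__(self, vocab):
        self.vocab = dict(vocab)  # base vocabulary: token -> id
        self.added = {}           # added tokens: token -> id

    def __len__(self):
        # Total size = base vocabulary + added tokens.
        return len(self.vocab) + len(self.added)

    def add_tokens(self, new_tokens):
        """Add unknown tokens and return how many were actually added."""
        num_added = 0
        for token in new_tokens:
            if token not in self.vocab and token not in self.added:
                # New ids start from the current vocabulary length.
                self.added[token] = len(self)
                num_added += 1
        return num_added

    def _base_tokenize(self, text):
        # Assumed base algorithm: one token per non-whitespace character.
        return [ch for ch in text if not ch.isspace()]

    def tokenize(self, text):
        if not self.added:
            return self._base_tokenize(text)
        # Isolate added tokens first (longest first), then run the base
        # algorithm only on the text between them.
        alternatives = sorted(self.added, key=len, reverse=True)
        pattern = "(" + "|".join(re.escape(t) for t in alternatives) + ")"
        tokens = []
        for piece in re.split(pattern, text):
            if piece in self.added:
                tokens.append(piece)
            else:
                tokens.extend(self._base_tokenize(piece))
        return tokens


tok = ToyTokenizer({"a": 0, "b": 1, "c": 2})
before = tok.tokenize("ab<new>c")            # "<new>" is split by the base algorithm
num_added = tok.add_tokens(["<new>", "a"])   # "a" is already in the vocabulary
after = tok.tokenize("ab<new>c")             # "<new>" is now kept as one token
```

With the real library, the matching follow-up after `tokenizer.add_tokens(...)` is the `resize_token_embeddings` call the docstring points to, typically passed `len(tokenizer)` so the embedding matrix gains one row per added token.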