Unverified Commit c0d97cee authored by Lysandre Debut, committed by GitHub

Adds a note to resize the token embedding matrix when adding special tokens (#11120)

* Adds a note to resize the token embedding matrix when adding special tokens

* Remove superfluous space
parent 02f7c2fe
@@ -825,7 +825,13 @@ class SpecialTokensMixin:
         special tokens are NOT in the vocabulary, they are added to it (indexed starting from the last index of the
         current vocabulary).
 
-        Using : obj:`add_special_tokens` will ensure your special tokens can be used in several ways:
+        .. Note::
+            When adding new tokens to the vocabulary, you should make sure to also resize the token embedding matrix
+            of the model so that its embedding matrix matches the tokenizer.
+
+            In order to do that, please use the :meth:`~transformers.PreTrainedModel.resize_token_embeddings` method.
+
+        Using :obj:`add_special_tokens` will ensure your special tokens can be used in several ways:
 
         - Special tokens are carefully handled by the tokenizer (they are never split).
         - You can easily refer to special tokens using tokenizer class attributes like :obj:`tokenizer.cls_token`. This
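For context, a minimal sketch of the workflow the new note describes, using the public add_special_tokens and resize_token_embeddings APIs; the checkpoint name and the token strings below are illustrative and not part of the commit:

    from transformers import AutoModelForMaskedLM, AutoTokenizer

    # Illustrative checkpoint; the same pattern applies to any model with a token embedding matrix.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

    # New special tokens are appended after the last index of the current vocabulary.
    num_added = tokenizer.add_special_tokens({"additional_special_tokens": ["<ent>", "<rel>"]})

    # Resize the model's token embedding matrix so it matches the enlarged tokenizer vocabulary.
    if num_added > 0:
        model.resize_token_embeddings(len(tokenizer))

Without the resize, ids of the newly added tokens fall outside the model's embedding matrix and trigger an index error at lookup time, which is what the added docstring note warns about.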