Add note to resize token embeddings matrix when adding new tokens to voc (#10331)

Commit e73a3e18 (unverified), authored Feb 22, 2021 by Lysandre Debut, committed by GitHub Feb 22, 2021

Showing 1 changed file with 6 additions and 0 deletions

src/transformers/tokenization_utils_base.py +6 -0
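
The docstring note added by this commit asks callers to resize the model's token embedding matrix after adding tokens to the tokenizer. A minimal sketch of that workflow follows. It is not part of the commit: the tiny vocabulary, the small `BertConfig` values, and the example token strings are assumptions chosen only so the snippet runs without downloading a checkpoint, and it assumes `transformers` and PyTorch are installed.

```python
import os
import tempfile

from transformers import BertConfig, BertModel, BertTokenizer

# Hypothetical throwaway vocabulary, so no pretrained files are needed.
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]

with tempfile.TemporaryDirectory() as tmp_dir:
    vocab_path = os.path.join(tmp_dir, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")
    # The vocabulary is read into memory when the tokenizer is constructed.
    tokenizer = BertTokenizer(vocab_path)

# Deliberately small, randomly initialised model (illustrative sizes only).
config = BertConfig(
    vocab_size=len(tokenizer),
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertModel(config)

# Add tokens that are not yet in the vocabulary; the return value is the
# number of tokens actually added.
num_added = tokenizer.add_tokens(["new_tok1", "my_new-tok2"])

# Resize the embedding matrix so it has one row per tokenizer entry.
model.resize_token_embeddings(len(tokenizer))

embedding_rows = model.get_input_embeddings().weight.shape[0]
print(num_added, len(tokenizer), embedding_rows)
```

Passing `len(tokenizer)` rather than a hand-computed size keeps the embedding matrix in step with the tokenizer, including any added tokens. Without the resize, the ids assigned to the new tokens would fall outside the embedding matrix.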

--- a/src/transformers/tokenization_utils_base.py
+++ b/src/transformers/tokenization_utils_base.py
@@ -971,6 +971,12 @@ class SpecialTokensMixin:
        Add a list of new tokens to the tokenizer class. If the new tokens are not in the vocabulary, they are added to
        it with indices starting from length of the current vocabulary.

+        .. Note::
+            When adding new tokens to the vocabulary, you should make sure to also resize the token embedding matrix of
+            the model so that its embedding matrix matches the tokenizer.
+
+            In order to do that, please use the :meth:`~transformers.PreTrainedModel.resize_token_embeddings` method.
+
        Args:
            new_tokens (:obj:`str`, :obj:`tokenizers.AddedToken` or a list of `str` or :obj:`tokenizers.AddedToken`):
                Tokens are only added if they are not already in the vocabulary. :obj:`tokenizers.AddedToken` wraps a