fix `word_to_tokens` docstring format (#20450)

* fix docstring
* fix 2
* add details

Commit 3c39c07f (unverified) authored Nov 25, 2022 by SaulLu, committed by GitHub Nov 25, 2022.

Showing with 4 additions and 2 deletions

src/transformers/tokenization_utils_base.py +4 -2

--- a/src/transformers/tokenization_utils_base.py
+++ b/src/transformers/tokenization_utils_base.py
@@ -475,8 +475,10 @@ class BatchEncoding(UserDict):
                or 1) the provided word index belongs to.

        Returns:
-            Optional [`~tokenization_utils_base.TokenSpan`] Span of tokens in the encoded sequence. Returns `None` if
-            no tokens correspond to the word.
+            ([`~tokenization_utils_base.TokenSpan`], *optional*): Span of tokens in the encoded sequence. Returns
+            `None` if no tokens correspond to the word. This can happen especially when the token is a special token
+            that has been used to format the tokenization. For example when we add a class token at the very beginning
+            of the tokenization.
        """

        if not self._encodings:
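The updated docstring says `word_to_tokens` returns a `TokenSpan` or `None` when no tokens correspond to the word. For fast tokenizers the method delegates to the underlying `tokenizers` encoding, so the pure-Python sketch below is not the library's implementation. It only illustrates that contract, assuming a single sequence (no `sequence_index`), tokens of one word being contiguous, and special tokens such as `[CLS]`/`[SEP]` carrying a word id of `None`. The helper `word_to_tokens_sketch` and the sample word ids are hypothetical; the local `TokenSpan` merely mirrors the `(start, end)` shape of `~tokenization_utils_base.TokenSpan`.

```python
from typing import List, NamedTuple, Optional


class TokenSpan(NamedTuple):
    """Local stand-in for a half-open [start, end) range of token indices."""
    start: int
    end: int


def word_to_tokens_sketch(word_ids: List[Optional[int]], word_index: int) -> Optional[TokenSpan]:
    """Hypothetical helper: map a word index to the span of tokens it produced.

    `word_ids` holds one entry per token; special tokens have `None`, so they
    never belong to any word. Returns `None` when no token maps to `word_index`.
    Assumes a word's tokens are contiguous (single-sequence input only).
    """
    # Collect every token position whose word id matches the requested word.
    positions = [i for i, w in enumerate(word_ids) if w == word_index]
    if not positions:
        # No token corresponds to this word, mirroring the documented `None` case.
        return None
    # End is exclusive, matching a half-open span.
    return TokenSpan(positions[0], positions[-1] + 1)


# Hypothetical word ids for "[CLS] hello world ##s [SEP]".
word_ids = [None, 0, 1, 1, None]
print(word_to_tokens_sketch(word_ids, 1))  # TokenSpan(start=2, end=4)
print(word_to_tokens_sketch(word_ids, 5))  # None
```

With a real fast tokenizer, callers would apply the same guard to the library method, checking `encoding.word_to_tokens(i) is None` before unpacking the span.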