Example of pad_to_multiple_of for padding and truncation guide & docstring update (#22278)

* added an example of pad_to_multiple_of
* make style
* addressed feedback

Commit 7bd86505, authored Mar 20, 2023 by Maria Khalusova, committed by GitHub Mar 20, 2023

Showing 2 changed files with 4 additions and 2 deletions

docs/source/en/pad_truncation.mdx +1 -0

src/transformers/tokenization_utils_base.py +3 -2

--- a/docs/source/en/pad_truncation.mdx
+++ b/docs/source/en/pad_truncation.mdx
@@ -50,6 +50,7 @@ The following table summarizes the recommended way to setup padding and truncati
 |                                      |                                   | `tokenizer(batch_sentences, padding='longest')`                                        |
 |                                      | padding to max model input length | `tokenizer(batch_sentences, padding='max_length')`                                     |
 |                                      | padding to specific length        | `tokenizer(batch_sentences, padding='max_length', max_length=42)`                      |
+|                                      | padding to a multiple of a value  | `tokenizer(batch_sentences, padding=True, pad_to_multiple_of=8)`                       |
 | truncation to max model input length | no padding                        | `tokenizer(batch_sentences, truncation=True)` or                                       |
 |                                      |                                   | `tokenizer(batch_sentences, truncation=STRATEGY)`                                      |
 |                                      | padding to max sequence in batch  | `tokenizer(batch_sentences, padding=True, truncation=True)` or                         |

--- a/src/transformers/tokenization_utils_base.py
+++ b/src/transformers/tokenization_utils_base.py
@@ -1342,8 +1342,9 @@ ENCODE_KWARGS_DOCSTRING = r"""
                tokenizer assumes the input is already split into words (for instance, by splitting it on whitespace)
                which it will tokenize. This is useful for NER or token classification.
            pad_to_multiple_of (`int`, *optional*):
-                If set will pad the sequence to a multiple of the provided value. This is especially useful to enable
-                the use of Tensor Cores on NVIDIA hardware with compute capability `>= 7.5` (Volta).
+                If set will pad the sequence to a multiple of the provided value. Requires `padding` to be activated.
+                This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability
+                `>= 7.5` (Volta).
            return_tensors (`str` or [`~utils.TensorType`], *optional*):
                If set, will return tensors instead of list of python integers. Acceptable values are:
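
To illustrate what the new table row and the updated docstring describe, here is a minimal plain-Python sketch of the rounding that `pad_to_multiple_of` implies when padding is active. It is not the library's implementation: `padded_length` and `pad_batch` are hypothetical helpers, the token IDs are arbitrary, and `pad_id=0` is an assumption.

```python
from typing import List


def padded_length(longest: int, multiple: int) -> int:
    """Round `longest` up to the nearest multiple of `multiple` (hypothetical helper)."""
    if multiple <= 0:
        raise ValueError("multiple must be a positive integer")
    # Ceiling division without floats: -(-a // b) == ceil(a / b) for positive b.
    return -(-longest // multiple) * multiple


def pad_batch(batch: List[List[int]], multiple: int, pad_id: int = 0) -> List[List[int]]:
    """Right-pad every sequence to the batch's longest length, rounded up to `multiple`."""
    target = padded_length(max(len(seq) for seq in batch), multiple)
    return [seq + [pad_id] * (target - len(seq)) for seq in batch]


# Arbitrary token IDs for illustration; the longest sequence has 9 tokens.
batch = [
    [101, 7592, 102],
    [101, 7592, 2088, 999, 2023, 2003, 1037, 3231, 102],
]
padded = pad_batch(batch, multiple=8)
print([len(seq) for seq in padded])  # [16, 16]
```

With a longest sequence of 9 tokens and a multiple of 8, every sequence is padded to 16 rather than 9. This mirrors the effect one would expect from `tokenizer(batch_sentences, padding=True, pad_to_multiple_of=8)`. Without padding enabled there is no target length to round, which is why the docstring now notes that `padding` must be activated.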