Accumulate tokens into batches in `PreTrainedTokenizerBase.add_tokens()` (#17119)
* Accumulate tokens into batches in PreTrainedTokenizerBase.add_tokens()

  For tokenizers with a small number of special tokens or special tokens with consecutive token IDs, this reduces the time complexity of creating the trie from quadratic to linear (see also #16936).

* Extend explanation of batching added tokens
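The batching idea can be illustrated with a minimal, standalone sketch. It is not the code from this commit: the helper name `batch_added_tokens` and its arguments are hypothetical, and it assumes the added tokens arrive already sorted by token ID. Consecutive tokens that share the same special/non-special status are grouped, so each group needs only one `add_tokens()` call (and one trie rebuild) instead of one call per token.

```python
from typing import Iterable, List, Set, Tuple


def batch_added_tokens(
    tokens_sorted_by_id: Iterable[str],
    special_tokens: Set[str],
) -> List[Tuple[List[str], bool]]:
    """Group consecutive tokens by whether they are special.

    Hypothetical helper for illustration only. Returns a list of
    (tokens, is_special) batches in the original order.
    """
    batches: List[Tuple[List[str], bool]] = []
    current: List[str] = []
    last_special = None  # special-ness of the batch being accumulated

    for token in tokens_sorted_by_id:
        is_special = token in special_tokens
        if last_special is None or is_special == last_special:
            # Same kind as the previous token: extend the current batch.
            current.append(token)
        else:
            # Kind changed: close the current batch and start a new one.
            batches.append((current, last_special))
            current = [token]
        last_special = is_special

    # Flush the final batch, if any tokens were seen.
    if current:
        batches.append((current, last_special))
    return batches


batches = batch_added_tokens(
    ["<a>", "<b>", "foo", "bar", "<c>"],
    special_tokens={"<a>", "<b>", "<c>"},
)
for tokens, is_special in batches:
    print(tokens, is_special)
```

Under these assumptions, each resulting batch would be passed to a single `tokenizer.add_tokens(tokens, special_tokens=is_special)` call. When special tokens are few or have consecutive IDs, the number of batches stays small, which is where the saving over per-token calls comes from.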