never_split on slow tokenizers should not split (#4723)
* Ensure tokens in never_split are not split when the basic tokenizer runs before wordpiece.
* never_split is only used for membership tests, so use a set(), which is 10x faster for this operation.
* Use union to combine the two sets.
* Update the docstring for the never_split parameter.
* Avoid set.union() if never_split is None.
* Add comments.
* Correct the docstring format.
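The behaviour described in this message can be illustrated with a minimal sketch. It is not the code from this commit: the class name `BasicTokenizerSketch` and the regex-based punctuation splitting are assumptions made for illustration. It only shows the three ideas named above: keeping never_split as a set, taking a union with per-call tokens only when some are passed, and skipping lowercasing and splitting for protected tokens.

```python
import re


class BasicTokenizerSketch:
    """Hypothetical, simplified stand-in for a basic (pre-wordpiece) tokenizer."""

    def __init__(self, do_lower_case=True, never_split=None):
        self.do_lower_case = do_lower_case
        # Stored as a set: the collection is only used for membership tests.
        self.never_split = set(never_split) if never_split is not None else set()

    def tokenize(self, text, never_split=None):
        # Build the union only when extra tokens are passed for this call.
        if never_split:
            protected = self.never_split.union(never_split)
        else:
            protected = self.never_split

        output = []
        for token in text.split():
            if token in protected:
                # Protected tokens are kept verbatim: no lowercasing, no splitting.
                output.append(token)
                continue
            if self.do_lower_case:
                token = token.lower()
            # Assumed simplification: split into word runs and single punctuation marks.
            output.extend(re.findall(r"\w+|[^\w\s]", token))
        return output


tokenizer = BasicTokenizerSketch(never_split=["[UNK]"])
print(tokenizer.tokenize("Hello [UNK] world!"))
```

In this sketch a protected token is only recognised when it is separated from its neighbours by whitespace, which matches how special tokens such as `[UNK]` are usually written.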