"git@developer.sourcefind.cn:chenpangpang/ComfyUI.git" did not exist on "815fefc48a1ae71ca0ad86b32eb56fa3705643e0"
never_split on slow tokenizers should not split (#4723)
* Ensure tokens in never_split are not split when the basic tokenizer runs before WordPiece.
* never_split is only used for membership tests, so store it as a set(), which is ~10x faster for this operation.
* Use union to concatenate two sets.
* Updated the docstring for the never_split parameter.
* Avoid set.union() if never_split is None.
* Added comments.
* Corrected docstring format.
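Below is a minimal sketch of the behavior these bullets describe. The class and the punctuation-splitting regex are illustrative simplifications, not the transformers implementation; only the set-based membership check and the conditional set.union() mirror the change.

```python
import re

class BasicTokenizerSketch:
    """Illustrative sketch of never_split handling; not the real BasicTokenizer."""

    def __init__(self, never_split=None):
        # A set makes the per-token membership test O(1); the commit
        # reports this as roughly 10x faster than the old list lookup.
        self.never_split = set(never_split) if never_split else set()

    def tokenize(self, text, never_split=None):
        # Merge per-call tokens into the instance-level set via union,
        # skipping set.union() entirely when nothing extra was passed.
        never_split = (
            self.never_split.union(set(never_split))
            if never_split
            else self.never_split
        )
        tokens = []
        for token in text.split():
            if token in never_split:
                # Protected tokens pass through untouched.
                tokens.append(token)
            else:
                # Everything else gets (simplified) punctuation splitting.
                tokens.extend(t for t in re.split(r"([^\w]+)", token) if t)
        return tokens

tok = BasicTokenizerSketch(never_split=["[CUSTOM]"])
print(tok.tokenize("hello [CUSTOM] world!"))
# ['hello', '[CUSTOM]', 'world', '!']
```

Without the membership check, "[CUSTOM]" would be split on its brackets into "[", "CUSTOM", "]" before WordPiece ever sees it, which is the bug the fix prevents.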