Refactor to remove generate and fix some bad tokenization
In particular, the following assumptions are FALSE in general:

- tokenize(context + continuation) = tokenize(context) + tokenize(continuation)
- len(tokenize(context + continuation)) = len(tokenize(context)) + len(tokenize(continuation))
- tokenize(context + continuation)[:len(tokenize(context))] = tokenize(context)

So we need to tip-toe around the problem by being careful with how we do it. In particular, using Fast is not just for performance; while the behaviour of GPT2Tokenizer differs across Transformers 2 and 3, GPT2TokenizerFast's doesn't.
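A toy greedy longest-match tokenizer is enough to see all three assumptions fail; the vocabulary and `tokenize` helper below are hypothetical stand-ins for a real BPE tokenizer, not the harness's actual code:

```python
# Hypothetical toy vocabulary; real BPE merges behave analogously.
VOCAB = {"hel", "lo", "he", "l", "o"}

def tokenize(text):
    """Greedy longest-match segmentation over VOCAB (a BPE stand-in)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {text[i]!r}")
    return tokens

context, continuation = "he", "llo"

# Tokenizing the concatenation merges across the boundary:
print(tokenize(context + continuation))            # ['hel', 'lo']
print(tokenize(context) + tokenize(continuation))  # ['he', 'l', 'lo']
```

Because `"hel"` swallows the boundary between context and continuation, the concatenated tokenization is shorter than the sum of the parts, and its prefix no longer equals `tokenize(context)` — exactly the three broken assumptions above.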