"vscode:/vscode.git/clone" did not exist on "039d8d65fc19ac74a8c7917233eb2828c46c0fa7"
-
Joe Davison authored
* Preserve spaces in GPT-2 tokenizers Preserves spaces after special tokens in GPT-2 and inhereted (RoBERTa) tokenizers, enabling correct BPE encoding. Automatically inserts a space in front of first token in encode function when adding special tokens. * Add tokenization preprocessing method * Add framework argument to pipeline factory Also fixes pipeline test issue. Each test input now treated as a distinct sequence.
f1e8a51f