Add preprocessing step for transfo-xl tokenization to avoid tokenizing words followed by punctuation to <unk> (#2987)

* add preprocessing to add space before punctuation for transfo_xl
* improve warning messages
* make style
* compile regex at instantiation of tokenizer object
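The idea behind the change can be illustrated with a minimal sketch. This is not the transfo_xl tokenizer code from the commit: the names `PunctuationSpacer` and `lookup`, the default punctuation set, and the toy vocabulary are all hypothetical. It assumes a vocabulary that splits on whitespace, so a word with punctuation glued to it ("Hello,") is looked up as a single unknown token. Inserting a space before the punctuation lets both pieces be found. The regex is compiled once in `__init__`, mirroring the "compile regex at instantiation" item above.

```python
import re


class PunctuationSpacer:
    """Hypothetical helper: inserts a space before punctuation that directly
    follows a non-space character, so "word," becomes "word ,"."""

    def __init__(self, punctuation: str = ",.!?;:"):
        # Assumed punctuation set. Compile the pattern once when the object
        # is created instead of on every call.
        self.pattern = re.compile(r"(?<=\S)([" + re.escape(punctuation) + r"])")

    def __call__(self, text: str) -> str:
        # Prefix each matched punctuation character with a single space.
        return self.pattern.sub(r" \1", text)


def lookup(tokens, vocab):
    # Map each token to its id, falling back to the id of "<unk>".
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in tokens]


if __name__ == "__main__":
    spacer = PunctuationSpacer()
    vocab = {"<unk>": 0, "Hello": 1, ",": 2, "world": 3, "!": 4}
    print(lookup("Hello, world!".split(), vocab))           # [0, 0]
    print(lookup(spacer("Hello, world!").split(), vocab))   # [1, 2, 3, 4]
```

Without the preprocessing step, both "Hello," and "world!" miss the vocabulary and map to `<unk>`. With it, every piece is found. Punctuation that is already separated by whitespace is left unchanged, because the lookbehind requires a non-space character directly before it.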