Transformer-XL: Improved tokenization with sacremoses (#6322)
* Improved tokenization with sacremoses
* The TransfoXLTokenizer is now using sacremoses for tokenization
* Added tokenization of comma-separated and floating point numbers.
* Removed prepare_for_tokenization() from tokenization_transfo_xl.py because punctuation is handled by sacremoses
* Added corresponding tests
* Removed test comparing TransfoXLTokenizer and TransfoXLTokenizerFast
* Added deprecation warning to TransfoXLTokenizerFast
* isort change

Co-authored-by: Teven <teven.lescao@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
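
The number handling mentioned in the commit message can be illustrated with a minimal, standard-library-only sketch. It is not the code from this commit: the function name `split_numbers` is hypothetical, plain whitespace splitting stands in for sacremoses, and the ` @,@ ` / ` @.@ ` separators are an assumption based on the WikiText-103 convention that Transformer-XL vocabularies are commonly built on.

```python
import re
from typing import List

# Assumption: a comma or dot that sits between two digits is a number
# separator and is rewritten as " @,@ " or " @.@ " (WikiText-103 style).
NUMBER_SEPARATOR = re.compile(r"(?<=\d)[,.](?=\d)")


def split_numbers(tokens: List[str]) -> List[str]:
    """Split comma-separated and floating point numbers into sub-tokens.

    Hypothetical helper for illustration only.
    """
    result: List[str] = []
    for token in tokens:
        # Surround each in-number separator with spaces and '@' markers,
        # then split on whitespace to obtain the individual pieces.
        marked = NUMBER_SEPARATOR.sub(r" @\g<0>@ ", token)
        result.extend(marked.split())
    return result


# Whitespace splitting is used here in place of a real Moses tokenizer.
words = "The tower is 1,250.5 m tall".split()
print(split_numbers(words))
```

A trailing sentence period such as the one in "tall." is left untouched by this pattern because it is not followed by a digit; in the commit, punctuation of that kind is described as being handled by sacremoses.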