Tokenization behaves the same as the original XLM preprocessing for most languages, except zh, ja and th; change the API to allow specifying the language in `tokenize`
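For context, the API change lets callers pass the input language to `tokenize` so the Moses-based preprocessing matches the language-specific rules of the original XLM pipeline. Below is a minimal sketch of how that call might look; the `XLMTokenizer` import path and the `xlm-mlm-enfr-1024` checkpoint name are assumptions that depend on the installed package version (`pytorch_transformers` vs. `transformers`), not something stated in this diff.

```python
# Hedged sketch, assuming a transformers-style XLMTokenizer that forwards
# the `lang` keyword from tokenize() down to its Moses preprocessing step.
from transformers import XLMTokenizer

tokenizer = XLMTokenizer.from_pretrained("xlm-mlm-enfr-1024")

# English input: Moses punctuation normalization + tokenization, then BPE.
en_tokens = tokenizer.tokenize("Hello, world!", lang="en")

# French input: the same pipeline, but with French-specific Moses rules.
fr_tokens = tokenizer.tokenize("Bonjour le monde !", lang="fr")

print(en_tokens)
print(fr_tokens)
```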
@@ -10,3 +10,5 @@ requests
 regex
 # For XLNet
 sentencepiece
+# For XLM
+sacremoses
\ No newline at end of file
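The `sacremoses` dependency added above provides the Moses punctuation normalization and word tokenization that the XLM preprocessing uses for most languages (zh, ja and th are handled by dedicated tokenizers instead). A minimal sketch of that underlying library in isolation follows; the tokenizer in this repo wraps these calls internally, so this only illustrates the raw `sacremoses` API, not the repo's exact wrapper.

```python
# Hedged sketch of the sacremoses calls that Moses-style preprocessing relies on.
from sacremoses import MosesPunctNormalizer, MosesTokenizer

lang = "fr"
normalizer = MosesPunctNormalizer(lang=lang)  # language-aware punctuation normalization
tokenizer = MosesTokenizer(lang=lang)         # language-aware word tokenization

text = "Bonjour le monde !"
normalized = normalizer.normalize(text)
tokens = tokenizer.tokenize(normalized, return_str=False, escape=False)
print(tokens)
```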