Tokenization behaves the same as the original XLM preprocessing for most languages, except zh, ja, and th; the API is changed to allow specifying the language in `tokenize`.
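The commit's idea of a language-aware `tokenize` can be sketched as follows. This is a hypothetical toy illustration, not the actual pytorch-transformers implementation: whitespace-delimited languages get Moses-style punctuation splitting, while zh, ja, and th are routed to a separate path (a real implementation would call dedicated segmenters such as jieba, KyTea, or PyThaiNLP).

```python
import re

def tokenize(text: str, lang: str = "en") -> list:
    """Toy language-aware tokenizer sketch (illustrative only)."""
    if lang in ("zh", "ja", "th"):
        # zh/ja/th lack whitespace word boundaries; here we fall back to
        # character-level splitting instead of a real segmenter.
        return [ch for ch in text if not ch.isspace()]
    # Moses-like behavior for other languages: split words and punctuation.
    return re.findall(r"\w+|[^\w\s]", text, re.UNICODE)

print(tokenize("Hello, world!"))        # ['Hello', ',', 'world', '!']
print(tokenize("你好世界", lang="zh"))   # ['你', '好', '世', '界']
```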
@@ -9,4 +9,6 @@ requests
 # For OpenAI GPT
 regex
 # For XLNet
-sentencepiece
\ No newline at end of file
+sentencepiece
+# For XLM
+sacremoses
\ No newline at end of file