"git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "ce02550d50f2074d54d58b37cbc9845d9a159818"
Commit 5eab3cf6 authored by Aymeric Augustin

Fix W605 flake8 warning (x5).

parent 7dce8dc7
@@ -22,8 +22,8 @@
     --model_name openai-gpt \
     --do_train \
     --do_eval \
-    --train_dataset $ROC_STORIES_DIR/cloze_test_val__spring2016\ -\ cloze_test_ALL_val.csv \
-    --eval_dataset $ROC_STORIES_DIR/cloze_test_test__spring2016\ -\ cloze_test_ALL_test.csv \
+    --train_dataset "$ROC_STORIES_DIR/cloze_test_val__spring2016 - cloze_test_ALL_val.csv" \
+    --eval_dataset "$ROC_STORIES_DIR/cloze_test_test__spring2016 - cloze_test_ALL_test.csv" \
     --output_dir ../log \
     --train_batch_size 16 \
 """
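The two spellings in this hunk are equivalent shell words; a quick sketch with Python's `shlex` (using a literal `dir/` stand-in for the expanded `$ROC_STORIES_DIR`, since variable expansion is out of scope) confirms they split identically:

```python
import shlex

# Both spellings of the --train_dataset argument from the hunk above.
# "dir" stands in for the expanded $ROC_STORIES_DIR (an assumption here);
# shlex.split applies POSIX backslash and quote rules like a shell would.
escaped = shlex.split(
    r"--train_dataset dir/cloze_test_val__spring2016\ -\ cloze_test_ALL_val.csv"
)
quoted = shlex.split(
    '--train_dataset "dir/cloze_test_val__spring2016 - cloze_test_ALL_val.csv"'
)

# Either way the shell sees exactly two words: the flag and one filename.
assert escaped == quoted == [
    "--train_dataset",
    "dir/cloze_test_val__spring2016 - cloze_test_ALL_val.csv",
]
```

The quoting matters here because this command line lives inside a Python docstring, where `\ ` is also an undefined string escape, which is what flake8's W605 flags.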
@@ -725,10 +725,10 @@ class XLMTokenizer(PreTrainedTokenizer):
         make && make install
         pip install kytea
         ```
-    - [jieba](https://github.com/fxsjy/jieba): Chinese tokenizer *
+    - [jieba](https://github.com/fxsjy/jieba): Chinese tokenizer (*)
         - Install with `pip install jieba`
-    \* The original XLM used [Stanford Segmenter](https://nlp.stanford.edu/software/stanford-segmenter-2018-10-16.zip).
+    (*) The original XLM used [Stanford Segmenter](https://nlp.stanford.edu/software/stanford-segmenter-2018-10-16.zip).
     However, the wrapper (`nltk.tokenize.stanford_segmenter`) is slow due to JVM overhead, and it will be deprecated.
     Jieba is a lot faster and pip-installable. Note there is some mismatch with the Stanford Segmenter. It should be fine
     if you fine-tune the model with Chinese supervisionself. If you want the same exact behaviour, use the original XLM
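W605 itself is about Python string literals: an undefined escape sequence such as `\*` or `\ ` inside a normal (non-raw) string makes CPython emit a warning at compile time, which flake8 reports as W605. A minimal sketch (the snippet string is illustrative, not the repo's actual docstring):

```python
import warnings

# Compile a source line whose string literal contains "\*", an escape
# sequence Python does not define. CPython emits a DeprecationWarning
# (SyntaxWarning on newer versions) for it, which flake8 surfaces as W605.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compile(r'doc = "tokenizer \*"', "<snippet>", "exec")

assert any(
    issubclass(w.category, (DeprecationWarning, SyntaxWarning)) for w in caught
)

# Fixes: spell the text without a backslash, as this commit does with "(*)",
# or use a raw string so the backslash stays literal.
fixed = "tokenizer (*)"
raw_alternative = r"tokenizer \*"
```

Rewording the docstring to `(*)` is the simpler fix here, since the asterisk was plain footnote text rather than a regex.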