chenpangpang / transformers
Commit 5eab3cf6
Authored Dec 21, 2019 by Aymeric Augustin

Fix W605 flake8 warning (x5).

Parent: 7dce8dc7

Showing 2 changed files with 4 additions and 4 deletions:
examples/contrib/run_openai_gpt.py (+2 -2)
transformers/tokenization_xlm.py (+2 -2)
examples/contrib/run_openai_gpt.py
@@ -22,8 +22,8 @@
     --model_name openai-gpt \
     --do_train \
     --do_eval \
-    --train_dataset $ROC_STORIES_DIR/cloze_test_val__spring2016\ -\ cloze_test_ALL_val.csv \
-    --eval_dataset $ROC_STORIES_DIR/cloze_test_test__spring2016\ -\ cloze_test_ALL_test.csv \
+    --train_dataset "$ROC_STORIES_DIR/cloze_test_val__spring2016 - cloze_test_ALL_val.csv" \
+    --eval_dataset "$ROC_STORIES_DIR/cloze_test_test__spring2016 - cloze_test_ALL_test.csv" \
     --output_dir ../log \
     --train_batch_size 16 \
 """
transformers/tokenization_xlm.py
@@ -725,10 +725,10 @@ class XLMTokenizer(PreTrainedTokenizer):
     make && make install
     pip install kytea
     ```
-- [jieba](https://github.com/fxsjy/jieba): Chinese tokenizer *
+- [jieba](https://github.com/fxsjy/jieba): Chinese tokenizer (*)
     - Install with `pip install jieba`
 
-\* The original XLM used [Stanford Segmenter](https://nlp.stanford.edu/software/stanford-segmenter-2018-10-16.zip).
+(*) The original XLM used [Stanford Segmenter](https://nlp.stanford.edu/software/stanford-segmenter-2018-10-16.zip).
 However, the wrapper (`nltk.tokenize.stanford_segmenter`) is slow due to JVM overhead, and it will be deprecated.
 Jieba is a lot faster and pip-installable. Note there is some mismatch with the Stanford Segmenter. It should be fine
 if you fine-tune the model with Chinese supervisionself. If you want the same exact behaviour, use the original XLM
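For context, W605 is the pycodestyle/flake8 code for "invalid escape sequence". Both changed spots sit inside ordinary (non-raw) Python docstrings, where `\ ` and `\*` are not recognised escapes. The sketch below is a minimal illustration and not part of the commit: the helper name `escape_warnings` and the sample strings are hypothetical. It compiles small snippets and collects the warning CPython itself emits for such sequences (a `DeprecationWarning` on Python 3.6 to 3.11, a `SyntaxWarning` from 3.12).

```python
import warnings


def escape_warnings(source: str) -> list:
    """Compile `source` and return the text of any escape-sequence warnings.

    Hypothetical helper for illustration only; flake8 reports the same
    problem statically as W605.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        compile(source, "<example>", "exec")
    return [str(w.message) for w in caught if "escape sequence" in str(w.message)]


# Raw strings keep the backslashes literal, so the compiled snippets
# contain the same sequences the diff removes.
before_space = r'usage = "cloze_test_val__spring2016\ -\ cloze_test_ALL_val.csv"'
before_star = r'note = "\* The original XLM used Stanford Segmenter."'

# The replacements used in the commit: quote the path, and write (*) instead of \*.
after_space = r'''usage = '"cloze_test_val__spring2016 - cloze_test_ALL_val.csv"' '''
after_star = r'note = "(*) The original XLM used Stanford Segmenter."'

for label, src in [
    ("before (space)", before_space),
    ("before (star)", before_star),
    ("after (space)", after_space),
    ("after (star)", after_star),
]:
    print(label, "->", len(escape_warnings(src)), "warning(s)")
```

Rewriting the text so that no backslash is needed, as this commit does, keeps the docstring readable as shell and Markdown. Other common ways to silence W605 are doubling the backslash or using a raw string, but those change how the text looks in the source.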