chenpangpang / transformers
Commit 6a43dc9d
Authored Dec 05, 2019 by Masatoshi Suzuki
Committed by Julien Chaumond, Dec 11, 2019

Support Python 2

Parent: a09da4ee
Changes: 1 changed file with 7 additions and 1 deletion

transformers/tokenization_bert_japanese.py (+7, -1) @ 6a43dc9d
@@ -19,6 +19,7 @@ from __future__ import absolute_import, division, print_function, unicode_litera
 import collections
 import logging
 import os
+import six
 import unicodedata
 from io import open
@@ -186,8 +187,13 @@ class MecabTokenizer(object):
         never_split = self.never_split + (never_split if never_split is not None else [])
         tokens = []

+        if six.PY2:
+            mecab_output = self.mecab.parse(text.encode('utf-8')).decode('utf-8')
+        else:
+            mecab_output = self.mecab.parse(text)
+
         cursor = 0
-        for line in self.mecab.parse(text).split('\n'):
+        for line in mecab_output.split('\n'):
             if line == 'EOS':
                 break
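
The change above can be tried in isolation with the minimal sketch below. It assumes, as the diff implies, that the parser takes and returns UTF-8 byte strings on Python 2 and text strings on Python 3. `fake_parse`, `parse_to_unicode`, and `split_lines` are hypothetical names used only for illustration; `fake_parse` is a stand-in for `self.mecab.parse` and is not the MeCab API. `sys.version_info` replaces `six.PY2` so the snippet has no third-party dependency.

```python
# -*- coding: utf-8 -*-
import sys

# six.PY2 is defined as this same check; used directly to avoid the dependency.
PY2 = sys.version_info[0] == 2


def fake_parse(text):
    """Hypothetical stand-in for self.mecab.parse (NOT the MeCab API).

    Emits one whitespace-separated token per line, followed by 'EOS'.
    """
    return "\n".join(text.split()) + "\nEOS\n"


def parse_to_unicode(parse, text):
    """Call a parse function and return its output as unicode text."""
    if PY2:
        # Assumption taken from the diff: on Python 2 the parser works on
        # byte strings, so encode on the way in and decode on the way out.
        return parse(text.encode("utf-8")).decode("utf-8")
    return parse(text)


def split_lines(mecab_output):
    """Collect output lines up to the 'EOS' marker, as the patched loop does."""
    lines = []
    for line in mecab_output.split("\n"):
        if line == "EOS":
            break
        lines.append(line)
    return lines


if __name__ == "__main__":
    output = parse_to_unicode(fake_parse, u"吾輩 は 猫")
    print(split_lines(output))
```

Computing `mecab_output` once before the loop keeps the version check out of the loop body, so the rest of the tokenizer only ever handles unicode text.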