chenpangpang / transformers · Commits

Commit a0d38645
Authored Dec 17, 2019 by Julien Chaumond
Parent: ea636440

Fix outdated tokenizer doc
Showing 3 changed files with 5 additions and 5 deletions
templates/adding_a_new_model/tokenization_xxx.py   +1 -1
transformers/tokenization_bert.py                  +2 -2
transformers/tokenization_distilbert.py            +2 -2
templates/adding_a_new_model/tokenization_xxx.py

@@ -85,7 +85,7 @@ class XxxTokenizer(PreTrainedTokenizer):
     Args:
         vocab_file: Path to a one-wordpiece-per-line vocabulary file
-        do_lower_case: Whether to lower case the input. Only has an effect when do_wordpiece_only=False
+        do_lower_case: Whether to lower case the input. Only has an effect when do_basic_tokenize=True
     """
     vocab_files_names = VOCAB_FILES_NAMES
transformers/tokenization_bert.py

@@ -113,12 +113,12 @@ class BertTokenizer(PreTrainedTokenizer):
     Args:
         vocab_file: Path to a one-wordpiece-per-line vocabulary file
-        do_lower_case: Whether to lower case the input. Only has an effect when do_wordpiece_only=False
+        do_lower_case: Whether to lower case the input. Only has an effect when do_basic_tokenize=True
         do_basic_tokenize: Whether to do basic tokenization before wordpiece.
         max_len: An artificial maximum length to truncate tokenized sequences to; Effective maximum length is always the
             minimum of this value (if specified) and the underlying BERT model's sequence length.
         never_split: List of tokens which will never be split during tokenization. Only has an effect when
-            do_wordpiece_only=False
+            do_basic_tokenize=True
     """
     vocab_files_names = VOCAB_FILES_NAMES
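A minimal sketch (not part of the commit) of the behavior the corrected docstring describes: lowercasing happens inside BasicTokenizer, so do_lower_case only takes effect when do_basic_tokenize=True. The checkpoint name and expected outputs below are illustrative assumptions, not from the diff.

# Sketch assuming the transformers 2.x API that this commit targets.
from transformers import BertTokenizer

# do_basic_tokenize defaults to True, so do_lower_case is honored.
lower = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)
print(lower.tokenize("Hello WORLD"))  # expected: ['hello', 'world']

# With do_basic_tokenize=False the text skips BasicTokenizer entirely:
# do_lower_case has no effect and WordPiece sees the raw cased string,
# so out-of-vocabulary cased words likely come back as [UNK].
raw = BertTokenizer.from_pretrained(
    "bert-base-uncased", do_lower_case=True, do_basic_tokenize=False
)
print(raw.tokenize("Hello WORLD"))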
transformers/tokenization_distilbert.py

@@ -53,12 +53,12 @@ class DistilBertTokenizer(BertTokenizer):
     Args:
         vocab_file: Path to a one-wordpiece-per-line vocabulary file
-        do_lower_case: Whether to lower case the input. Only has an effect when do_wordpiece_only=False
+        do_lower_case: Whether to lower case the input. Only has an effect when do_basic_tokenize=True
         do_basic_tokenize: Whether to do basic tokenization before wordpiece.
         max_len: An artificial maximum length to truncate tokenized sequences to; Effective maximum length is always the
             minimum of this value (if specified) and the underlying BERT model's sequence length.
         never_split: List of tokens which will never be split during tokenization. Only has an effect when
-            do_wordpiece_only=False
+            do_basic_tokenize=True
     """
     vocab_files_names = VOCAB_FILES_NAMES
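The same dependency holds for never_split, which is also applied by BasicTokenizer. A hypothetical illustration (the [MARKER] token and checkpoint name are assumptions for the sketch, and basic_tokenizer access reflects the 2.x-era internals):

# Sketch assuming the transformers 2.x API; [MARKER] is a made-up token.
from transformers import DistilBertTokenizer

tok = DistilBertTokenizer.from_pretrained(
    "distilbert-base-uncased",
    never_split=["[MARKER]"],  # hypothetical custom token to protect
)
# BasicTokenizer leaves protected tokens unsplit and un-lowercased;
# expected: ['keep', '[MARKER]', 'intact']
print(tok.basic_tokenizer.tokenize("keep [MARKER] intact"))

With do_basic_tokenize=False there is no basic tokenization pass at all, which is presumably why the commit replaces the stale "do_wordpiece_only=False" wording with "do_basic_tokenize=True".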