chenpangpang / transformers

Commit b03b2a65
Authored Apr 26, 2021 by Sylvain Gugger

Style

parent ce11318e
Showing 3 changed files with 9 additions and 9 deletions (+9 / -9):

    src/transformers/models/tapas/tokenization_tapas.py    +3 -3
    src/transformers/tokenization_utils.py                  +3 -3
    src/transformers/tokenization_utils_base.py             +3 -3
src/transformers/models/tapas/tokenization_tapas.py

@@ -172,9 +172,9 @@ TAPAS_ENCODE_PLUS_ADDITIONAL_KWARGS_DOCSTRING = r"""
    ...
            length is required by one of the truncation/padding parameters. If the model has no specific maximum
            input length (like XLNet) truncation/padding to a maximum length will be deactivated.
        is_split_into_words (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether or not the input is already pre-tokenized (e.g., split into words). If set to :obj:`True`, the
            tokenizer assumes the input is already split into words (for instance, by splitting it on whitespace)
            which it will tokenize. This is useful for NER or token classification.
        pad_to_multiple_of (:obj:`int`, `optional`):
            If set will pad the sequence to a multiple of the provided value. This is especially useful to enable
            the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.5 (Volta).
    ...
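The `is_split_into_words` and `pad_to_multiple_of` keyword arguments documented in the hunk above are accepted by the tokenizer's `__call__`. A minimal usage sketch (not part of this commit; the `bert-base-uncased` checkpoint and the sample word lists are assumptions):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Inputs already split into words, e.g. for NER / token classification.
batch = [
    ["Hugging", "Face", "is", "based", "in", "New", "York"],
    ["Transformers", "has", "many", "tokenizers"],
]

encoding = tokenizer(
    batch,
    is_split_into_words=True,  # treat each inner list as pre-tokenized words
    padding=True,              # pad to the longest sequence in the batch ...
    pad_to_multiple_of=8,      # ... then round that length up to a multiple of 8
    return_tensors="pt",
)
print(encoding["input_ids"].shape)  # the sequence dimension is a multiple of 8
```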
src/transformers/tokenization_utils.py

@@ -643,9 +643,9 @@ class PreTrainedTokenizer(PreTrainedTokenizerBase):
    ...
        text (:obj:`str`):
            The text to prepare.
        is_split_into_words (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether or not the input is already pre-tokenized (e.g., split into words). If set to :obj:`True`, the
            tokenizer assumes the input is already split into words (for instance, by splitting it on whitespace)
            which it will tokenize. This is useful for NER or token classification.
        kwargs:
            Keyword arguments to use for the tokenization.
    ...
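The `text` / `is_split_into_words` / `kwargs` docstring above appears to belong to the slow tokenizer's string-preparation hook (`prepare_for_tokenization`). A rough sketch of how a custom subclass might override it; `MyWordTokenizer`, its lowercasing step, and the whitespace `_tokenize` are illustrative assumptions, not part of this commit:

```python
from typing import Any, Dict, List, Tuple

from transformers import PreTrainedTokenizer


class MyWordTokenizer(PreTrainedTokenizer):
    def prepare_for_tokenization(
        self, text: str, is_split_into_words: bool = False, **kwargs
    ) -> Tuple[str, Dict[str, Any]]:
        # String-level preprocessing applied before tokenization; kwargs that
        # are not consumed here are returned for the caller to inspect.
        if kwargs.pop("do_lower_case", False):
            text = text.lower()
        return (text, kwargs)

    def _tokenize(self, text: str, **kwargs) -> List[str]:
        # Trivial whitespace tokenization, purely for illustration.
        return text.split()
```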
src/transformers/tokenization_utils_base.py

@@ -1286,9 +1286,9 @@ ENCODE_KWARGS_DOCSTRING = r"""
    ...
            returned to provide some overlap between truncated and overflowing sequences. The value of this
            argument defines the number of overlapping tokens.
        is_split_into_words (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether or not the input is already pre-tokenized (e.g., split into words). If set to :obj:`True`, the
            tokenizer assumes the input is already split into words (for instance, by splitting it on whitespace)
            which it will tokenize. This is useful for NER or token classification.
        pad_to_multiple_of (:obj:`int`, `optional`):
            If set will pad the sequence to a multiple of the provided value. This is especially useful to enable
            the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.5 (Volta).
    ...
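The overlap described at the top of this hunk is controlled by the `stride` argument together with `return_overflowing_tokens`. A hedged sketch of the behaviour, assuming a fast `bert-base-uncased` tokenizer and an arbitrary `max_length`:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "a fairly long sentence that does not fit into a tiny maximum length"
encoding = tokenizer(
    text,
    max_length=8,
    truncation=True,
    stride=2,                        # 2 tokens of overlap between chunks
    return_overflowing_tokens=True,  # keep the truncated part as extra chunks
)

# With a fast tokenizer, each entry of input_ids is one chunk; consecutive
# chunks share `stride` (non-special) tokens.
for ids in encoding["input_ids"]:
    print(tokenizer.decode(ids))
```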