chenpangpang/transformers
Commit 10dfa126 (unverified)
Authored Apr 27, 2022 by Yang Ming; committed by GitHub on Apr 26, 2022
Parent: aaee4038

documentation: some minor clean up (#16850)
Showing 3 changed files with 4 additions and 5 deletions (+4 -5):

  docs/source/en/main_classes/tokenizer.mdx                      +1 -3
  src/transformers/models/deberta_v2/tokenization_deberta_v2.py  +2 -2
  utils/documentation_tests.txt                                  +1 -0
docs/source/en/main_classes/tokenizer.mdx

@@ -18,9 +18,7 @@ Rust library [🤗 Tokenizers](https://github.com/huggingface/tokenizers). The "
 1. a significant speed-up in particular when doing batched tokenization and
 2. additional methods to map between the original string (character and words) and the token space (e.g. getting the
-   index of the token comprising a given character or the span of characters corresponding to a given token). Currently
-   no "Fast" implementation is available for the SentencePiece-based tokenizers (for T5, ALBERT, CamemBERT, XLM-RoBERTa
-   and XLNet models).
+   index of the token comprising a given character or the span of characters corresponding to a given token).

 The base classes [`PreTrainedTokenizer`] and [`PreTrainedTokenizerFast`]
 implement the common methods for encoding string inputs in model inputs (see below) and instantiating/saving python and
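The "additional methods" in item 2 are what the fast tokenizers add on top of the Python ones. A minimal sketch of the character/token mapping in practice (the checkpoint name is only an example, not part of this commit):

# Sketch: mapping between character and token space with a fast tokenizer.
# "bert-base-uncased" is an illustrative checkpoint; any model with a fast
# tokenizer works the same way.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)
text = "Tokenizers map characters to tokens."
encoding = tokenizer(text, return_offsets_mapping=True)

# Index of the token comprising the character at position 0 ("T").
print(encoding.char_to_token(0))

# (start, end) character span for every token, special tokens included.
print(encoding["offset_mapping"])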
src/transformers/models/deberta_v2/tokenization_deberta_v2.py

@@ -60,11 +60,11 @@ class DebertaV2Tokenizer(PreTrainedTokenizer):
             contains the vocabulary necessary to instantiate a tokenizer.
         do_lower_case (`bool`, *optional*, defaults to `False`):
             Whether or not to lowercase the input when tokenizing.
-        bos_token (`string`, *optional*, defaults to "[CLS]"):
+        bos_token (`string`, *optional*, defaults to `"[CLS]"`):
             The beginning of sequence token that was used during pre-training. Can be used a sequence classifier token.
             When building a sequence using special tokens, this is not the token that is used for the beginning of
             sequence. The token used is the `cls_token`.
-        eos_token (`string`, *optional*, defaults to "[SEP]"):
+        eos_token (`string`, *optional*, defaults to `"[SEP]"`):
             The end of sequence token. When building a sequence using special tokens, this is not the token that is
             used for the end of sequence. The token used is the `sep_token`.
         unk_token (`str`, *optional*, defaults to `"[UNK]"`):
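As the docstring being edited here notes, `bos_token` is not what actually starts an encoded sequence; `cls_token` is. A quick illustration (the checkpoint name is an assumption for the example; sentencepiece must be installed):

# Sketch: the encoded sequence starts with cls_token, not bos_token.
# "microsoft/deberta-v3-base" is an example checkpoint, not part of this commit.
from transformers import DebertaV2Tokenizer

tokenizer = DebertaV2Tokenizer.from_pretrained("microsoft/deberta-v3-base")
ids = tokenizer("hello world")["input_ids"]
# Expect something like ['[CLS]', '▁hello', '▁world', '[SEP]'].
print(tokenizer.convert_ids_to_tokens(ids))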
utils/documentation_tests.txt

@@ -59,3 +59,4 @@ src/transformers/models/wav2vec2/modeling_wav2vec2.py
 src/transformers/models/wav2vec2/tokenization_wav2vec2.py
 src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py
 src/transformers/models/wavlm/modeling_wavlm.py
+src/transformers/models/ctrl/modeling_ctrl.py
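utils/documentation_tests.txt lists the files whose docstring examples are exercised by the doctest job; this commit adds modeling_ctrl.py to that list. Conceptually, checking a module's docstring examples boils down to something like the sketch below; the repository's real harness preprocesses docstrings first, so this is an approximation rather than the exact command it runs:

# Sketch: exercise a module's docstring examples with the standard-library
# doctest runner (an approximation of what the doctest job does per file).
import doctest

from transformers.models.ctrl import modeling_ctrl

results = doctest.testmod(modeling_ctrl, verbose=False)
print(f"{results.attempted} examples attempted, {results.failed} failed")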