Skip to content
GitLab
Menu
Projects
Groups
Snippets
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
Menu
Open sidebar
chenpangpang
transformers
Commits
10dfa126
Unverified
Commit
10dfa126
authored
Apr 27, 2022
by
Yang Ming
Committed by
GitHub
Apr 26, 2022
Browse files
documentation: some minor clean up (#16850)
parent
aaee4038
Changes
3
Show whitespace changes
Inline
Side-by-side
Showing
3 changed files
with
4 additions
and
5 deletions
+4
-5
docs/source/en/main_classes/tokenizer.mdx
docs/source/en/main_classes/tokenizer.mdx
+1
-3
src/transformers/models/deberta_v2/tokenization_deberta_v2.py
...transformers/models/deberta_v2/tokenization_deberta_v2.py
+2
-2
utils/documentation_tests.txt
utils/documentation_tests.txt
+1
-0
No files found.
docs/source/en/main_classes/tokenizer.mdx
View file @
10dfa126
...
...
@@ -18,9 +18,7 @@ Rust library [🤗 Tokenizers](https://github.com/huggingface/tokenizers). The "
1. a significant speed-up in particular when doing batched tokenization and
2. additional methods to map between the original string (character and words) and the token space (e.g. getting the
index of the token comprising a given character or the span of characters corresponding to a given token). Currently
no "Fast" implementation is available for the SentencePiece-based tokenizers (for T5, ALBERT, CamemBERT, XLM-RoBERTa
and XLNet models).
index of the token comprising a given character or the span of characters corresponding to a given token).
The base classes [`PreTrainedTokenizer`] and [`PreTrainedTokenizerFast`]
implement the common methods for encoding string inputs in model inputs (see below) and instantiating/saving python and
...
...
src/transformers/models/deberta_v2/tokenization_deberta_v2.py
View file @
10dfa126
...
...
@@ -60,11 +60,11 @@ class DebertaV2Tokenizer(PreTrainedTokenizer):
contains the vocabulary necessary to instantiate a tokenizer.
do_lower_case (`bool`, *optional*, defaults to `False`):
Whether or not to lowercase the input when tokenizing.
bos_token (`string`, *optional*, defaults to "[CLS]"):
bos_token (`string`, *optional*, defaults to
`
"[CLS]"
`
):
The beginning of sequence token that was used during pre-training. Can be used a sequence classifier token.
When building a sequence using special tokens, this is not the token that is used for the beginning of
sequence. The token used is the `cls_token`.
eos_token (`string`, *optional*, defaults to "[SEP]"):
eos_token (`string`, *optional*, defaults to
`
"[SEP]"
`
):
The end of sequence token. When building a sequence using special tokens, this is not the token that is
used for the end of sequence. The token used is the `sep_token`.
unk_token (`str`, *optional*, defaults to `"[UNK]"`):
...
...
utils/documentation_tests.txt
View file @
10dfa126
...
...
@@ -59,3 +59,4 @@ src/transformers/models/wav2vec2/modeling_wav2vec2.py
src/transformers/models/wav2vec2/tokenization_wav2vec2.py
src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py
src/transformers/models/wavlm/modeling_wavlm.py
src/transformers/models/ctrl/modeling_ctrl.py
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment