"git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "aa79aa4e7d4e743f8bb705f57e99c95179e9eeba"
Unverified commit 10dfa126, authored by Yang Ming, committed by GitHub

documentation: some minor clean up (#16850)

parent aaee4038
```diff
@@ -18,9 +18,7 @@ Rust library [🤗 Tokenizers](https://github.com/huggingface/tokenizers). The
 1. a significant speed-up in particular when doing batched tokenization and
 2. additional methods to map between the original string (character and words) and the token space (e.g. getting the
-   index of the token comprising a given character or the span of characters corresponding to a given token). Currently
-   no "Fast" implementation is available for the SentencePiece-based tokenizers (for T5, ALBERT, CamemBERT, XLM-RoBERTa
-   and XLNet models).
+   index of the token comprising a given character or the span of characters corresponding to a given token).
 
 The base classes [`PreTrainedTokenizer`] and [`PreTrainedTokenizerFast`]
 implement the common methods for encoding string inputs in model inputs (see below) and instantiating/saving python and
```
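The alignment methods mentioned in this hunk are only available on the "Fast" tokenizers. A minimal sketch of what they look like in practice, assuming the `bert-base-uncased` checkpoint (which loads a `PreTrainedTokenizerFast` by default):

```python
# Sketch of the string <-> token space mapping that fast tokenizers provide.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoding = tokenizer("Hello world", return_offsets_mapping=True)

# Index of the token comprising the character at position 6 ("w").
token_index = encoding.char_to_token(6)
print(token_index)

# Span of characters in the original string corresponding to that token.
print(encoding.token_to_chars(token_index))

# (start, end) character offsets for every token, including special tokens.
print(encoding["offset_mapping"])
```

Note that `return_offsets_mapping=True` raises an error on a slow (Python-only) tokenizer, which is exactly why the docs single these methods out as fast-tokenizer features.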
```diff
@@ -60,11 +60,11 @@ class DebertaV2Tokenizer(PreTrainedTokenizer):
         contains the vocabulary necessary to instantiate a tokenizer.
     do_lower_case (`bool`, *optional*, defaults to `False`):
         Whether or not to lowercase the input when tokenizing.
-    bos_token (`string`, *optional*, defaults to "[CLS]"):
+    bos_token (`string`, *optional*, defaults to `"[CLS]"`):
         The beginning of sequence token that was used during pre-training. Can be used a sequence classifier token.
         When building a sequence using special tokens, this is not the token that is used for the beginning of
         sequence. The token used is the `cls_token`.
-    eos_token (`string`, *optional*, defaults to "[SEP]"):
+    eos_token (`string`, *optional*, defaults to `"[SEP]"`):
         The end of sequence token. When building a sequence using special tokens, this is not the token that is
         used for the end of sequence. The token used is the `sep_token`.
     unk_token (`str`, *optional*, defaults to `"[UNK]"`):
```
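The docstring's point that sequences are built with `cls_token`/`sep_token` rather than `bos_token`/`eos_token` can be seen directly; a small sketch, with an illustrative checkpoint name:

```python
# DeBERTa-v2 wraps inputs in cls_token/sep_token when building model inputs.
from transformers import DebertaV2Tokenizer

tokenizer = DebertaV2Tokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
input_ids = tokenizer("Hello world")["input_ids"]
print(tokenizer.convert_ids_to_tokens(input_ids))
# e.g. ['[CLS]', '▁Hello', '▁world', '[SEP]']
```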
```diff
@@ -59,3 +59,4 @@ src/transformers/models/wav2vec2/modeling_wav2vec2.py
 src/transformers/models/wav2vec2/tokenization_wav2vec2.py
 src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py
 src/transformers/models/wavlm/modeling_wavlm.py
+src/transformers/models/ctrl/modeling_ctrl.py
```