Unverified Commit dcff504e authored by Juyoung Kim, committed by GitHub

fixed docstring typos (#18739)



* fixed docstring typos

* Added missing colon
Co-authored-by: 김주영 <juyoung@zezedu.com>
parent e49c71fc
@@ -77,17 +77,17 @@ class BartConfig(PretrainedConfig):
             just in case (e.g., 512 or 1024 or 2048).
         init_std (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
-        encoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        encoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
-        decoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        decoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the decoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
         scale_embedding (`bool`, *optional*, defaults to `False`):
             Scale embeddings by diving by sqrt(d_model).
         use_cache (`bool`, *optional*, defaults to `True`):
             Whether or not the model should return the last key/values attentions (not used by all models).
-        num_labels: (`int`, *optional*, defaults to 3):
+        num_labels (`int`, *optional*, defaults to 3):
             The number of labels to use in [`BartForSequenceClassification`].
         forced_eos_token_id (`int`, *optional*, defaults to 2):
             The id of the token to force as the last generated token when `max_length` is reached. Usually set to
...
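For context on the entries being renamed above: each one maps directly onto a `BartConfig` keyword argument. A minimal sketch (the values are illustrative, not recommended settings):

```python
from transformers import BartConfig

config = BartConfig(
    encoder_layerdrop=0.1,   # probability of skipping an encoder layer
    decoder_layerdrop=0.1,   # probability of skipping a decoder layer
    scale_embedding=True,    # scale embeddings by sqrt(d_model)
    use_cache=True,          # return past key/values for fast decoding
    num_labels=3,            # used by BartForSequenceClassification
    forced_eos_token_id=2,   # force EOS as the last generated token
)
print(config.encoder_layerdrop)  # 0.1
```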
@@ -383,7 +383,7 @@ class BasicTokenizer(object):
             This should likely be deactivated for Japanese (see this
             [issue](https://github.com/huggingface/transformers/issues/328)).
-        strip_accents: (`bool`, *optional*):
+        strip_accents (`bool`, *optional*):
             Whether or not to strip all accents. If this option is not specified, then it will be determined by the
             value for `lowercase` (as in the original BERT).
     """
...
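A short sketch of the `strip_accents` behavior this docstring describes: left unspecified it follows `do_lower_case` (as in the original BERT), and setting it explicitly overrides that default. The import path below assumes the current module layout in transformers:

```python
from transformers.models.bert.tokenization_bert import BasicTokenizer

keep_accents = BasicTokenizer(do_lower_case=True, strip_accents=False)
drop_accents = BasicTokenizer(do_lower_case=True, strip_accents=True)
print(keep_accents.tokenize("Héllo"))  # ['héllo'] - lowercased, accents kept
print(drop_accents.tokenize("Héllo"))  # ['hello'] - lowercased, accents stripped
```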
@@ -85,10 +85,10 @@ class BigBirdPegasusConfig(PretrainedConfig):
             just in case (e.g., 1024 or 2048 or 4096).
         init_std (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
-        encoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        encoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
-        decoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        decoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the decoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
         use_cache (`bool`, *optional*, defaults to `True`):
...
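Since these hunks keep pointing at the same LayerDrop paper (Fan et al., 2019, arXiv:1909.11556), here is a minimal sketch of the mechanism the `encoder_layerdrop`/`decoder_layerdrop` options control; this is the general idea, not the library's exact implementation:

```python
import torch

def encode(hidden_states, layers, layerdrop=0.1, training=True):
    # During training, each layer is skipped independently with
    # probability `layerdrop`; at inference all layers run.
    for layer in layers:
        if training and torch.rand(()).item() < layerdrop:
            continue  # drop this layer entirely for this forward pass
        hidden_states = layer(hidden_states)
    return hidden_states

layers = torch.nn.ModuleList(torch.nn.Linear(16, 16) for _ in range(6))
out = encode(torch.randn(2, 16), layers, layerdrop=0.1)
```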
@@ -78,10 +78,10 @@ class BlenderbotConfig(PretrainedConfig):
             just in case (e.g., 512 or 1024 or 2048).
         init_std (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
-        encoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        encoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
-        decoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        decoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the decoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
         scale_embedding (`bool`, *optional*, defaults to `False`):
...
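On `scale_embedding` itself: as far as the modeling code goes, the embedding output is multiplied by `sqrt(d_model)` when the flag is set (the docstring's "diving by" reads as a leftover typo this commit did not touch). A small illustration with an arbitrary vocabulary size and `d_model`:

```python
import math
import torch

d_model = 512  # illustrative value
embed_tokens = torch.nn.Embedding(8008, d_model)
embed_scale = math.sqrt(d_model)  # would be 1.0 with scale_embedding=False
hidden = embed_tokens(torch.tensor([[42, 7, 99]])) * embed_scale
print(hidden.shape)  # torch.Size([1, 3, 512])
```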
@@ -78,10 +78,10 @@ class BlenderbotSmallConfig(PretrainedConfig):
             just in case (e.g., 512 or 1024 or 2048).
         init_std (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
-        encoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        encoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
-        decoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        decoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the decoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
         scale_embedding (`bool`, *optional*, defaults to `False`):
...
@@ -74,10 +74,10 @@ class DetrConfig(PretrainedConfig):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
         init_xavier_std (`float`, *optional*, defaults to 1):
             The scaling factor used for the Xavier initialization gain in the HM Attention map module.
-        encoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        encoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
-        decoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        decoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the decoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
         auxiliary_loss (`bool`, *optional*, defaults to `False`):
...
@@ -404,7 +404,7 @@ DPR_ENCODERS_INPUTS_DOCSTRING = r"""
 DPR_READER_INPUTS_DOCSTRING = r"""
     Args:
-        input_ids: (`Tuple[torch.LongTensor]` of shapes `(n_passages, sequence_length)`):
+        input_ids (`Tuple[torch.LongTensor]` of shapes `(n_passages, sequence_length)`):
             Indices of input sequence tokens in the vocabulary. It has to be a sequence triplet with 1) the question
             and 2) the passages titles and 3) the passages texts To match pretraining, DPR `input_ids` sequence should
             be formatted with [CLS] and [SEP] with the format:
...
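The triplet format this docstring describes is what `DPRReaderTokenizer` assembles ([CLS] question [SEP] title [SEP] text per passage). A sketch using the public reader checkpoint, with made-up passage content:

```python
from transformers import DPRReaderTokenizer

tokenizer = DPRReaderTokenizer.from_pretrained("facebook/dpr-reader-single-nq-base")
encoded = tokenizer(
    questions="What does LayerDrop do?",
    titles=["Reducing Transformer Depth"],
    texts=["LayerDrop randomly skips entire layers during training."],
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # (n_passages, sequence_length)
```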
@@ -493,7 +493,7 @@ TF_DPR_ENCODERS_INPUTS_DOCSTRING = r"""
 TF_DPR_READER_INPUTS_DOCSTRING = r"""
     Args:
-        input_ids: (`Numpy array` or `tf.Tensor` of shapes `(n_passages, sequence_length)`):
+        input_ids (`Numpy array` or `tf.Tensor` of shapes `(n_passages, sequence_length)`):
             Indices of input sequence tokens in the vocabulary. It has to be a sequence triplet with 1) the question
             and 2) the passages titles and 3) the passages texts To match pretraining, DPR `input_ids` sequence should
             be formatted with [CLS] and [SEP] with the format:
...
@@ -136,7 +136,7 @@ ENCODER_DECODER_INPUTS_DOCSTRING = r"""
             more detail.
         return_dict (`bool`, *optional*):
             If set to `True`, the model will return a [`~utils.Seq2SeqLMOutput`] instead of a plain tuple.
-        kwargs: (*optional*) Remaining dictionary of keyword arguments. Keyword arguments come in two flavors:
+        kwargs (*optional*): Remaining dictionary of keyword arguments. Keyword arguments come in two flavors:
             - Without a prefix which will be input as `**encoder_kwargs` for the encoder forward function.
             - With a *decoder_* prefix which will be input as `**decoder_kwargs` for the decoder forward function.
...
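The routing rule stated in this docstring (and its TF twin below) fits in a few lines. This is a sketch of the rule itself, not the library's exact code:

```python
def split_encoder_decoder_kwargs(**kwargs):
    # Un-prefixed kwargs go to the encoder; `decoder_`-prefixed kwargs
    # go to the decoder with the prefix stripped.
    encoder_kwargs = {k: v for k, v in kwargs.items() if not k.startswith("decoder_")}
    decoder_kwargs = {k[len("decoder_"):]: v for k, v in kwargs.items() if k.startswith("decoder_")}
    return encoder_kwargs, decoder_kwargs

enc, dec = split_encoder_decoder_kwargs(output_attentions=True, decoder_output_attentions=False)
print(enc, dec)  # {'output_attentions': True} {'output_attentions': False}
```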
@@ -147,7 +147,7 @@ ENCODER_DECODER_INPUTS_DOCSTRING = r"""
         training (`bool`, *optional*, defaults to `False`):
             Whether or not to use the model in training mode (some modules like dropout modules have different
             behaviors between training and evaluation).
-        kwargs: (*optional*) Remaining dictionary of keyword arguments. Keyword arguments come in two flavors:
+        kwargs (*optional*): Remaining dictionary of keyword arguments. Keyword arguments come in two flavors:
             - Without a prefix which will be input as `**encoder_kwargs` for the encoder forward function.
             - With a *decoder_* prefix which will be input as `**decoder_kwargs`` for the decoder forward function.
...
@@ -95,9 +95,9 @@ class FSMTConfig(PretrainedConfig):
             End of stream token id.
         decoder_start_token_id (`int`, *optional*):
             This model starts decoding with `eos_token_id`
-        encoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        encoder_layerdrop (`float`, *optional*, defaults to 0.0):
             Google "layerdrop arxiv", as its not explainable in one line.
-        decoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        decoder_layerdrop (`float`, *optional*, defaults to 0.0):
             Google "layerdrop arxiv", as its not explainable in one line.
         is_encoder_decoder (`bool`, *optional*, defaults to `True`):
             Whether this is an encoder/decoder model.
...
@@ -1362,7 +1362,7 @@ class BasicTokenizer(object):
             This should likely be deactivated for Japanese (see this
             [issue](https://github.com/huggingface/transformers/issues/328)).
-        strip_accents: (`bool`, *optional*):
+        strip_accents (`bool`, *optional*):
             Whether or not to strip all accents. If this option is not specified, then it will be determined by the
             value for `lowercase` (as in the original BERT).
     """
...
@@ -108,7 +108,7 @@ class LayoutLMv2TokenizerFast(PreTrainedTokenizerFast):
         tokenize_chinese_chars (`bool`, *optional*, defaults to `True`):
             Whether or not to tokenize Chinese characters. This should likely be deactivated for Japanese (see [this
             issue](https://github.com/huggingface/transformers/issues/328)).
-        strip_accents: (`bool`, *optional*):
+        strip_accents (`bool`, *optional*):
             Whether or not to strip all accents. If this option is not specified, then it will be determined by the
             value for `lowercase` (as in the original LayoutLMv2).
     """
...
@@ -74,10 +74,10 @@ class LEDConfig(PretrainedConfig):
             The maximum sequence length that the decoder might ever be used with.
         init_std (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
-        encoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        encoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
-        decoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        decoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the decoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
         use_cache (`bool`, *optional*, defaults to `True`):
...
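As a side note on `use_cache`, which these config hunks keep mentioning: the flag lets generation reuse past key/value states instead of recomputing attention over the whole prefix. A toy illustration of that caching pattern, with arbitrary shapes:

```python
import torch

def append_to_cache(new_key, new_value, cache=None):
    # Each decoding step contributes one new position; cached keys and
    # values from earlier steps are concatenated, not recomputed.
    if cache is None:
        return new_key, new_value
    keys, values = cache
    return torch.cat([keys, new_key], dim=1), torch.cat([values, new_value], dim=1)

cache = None
for _ in range(3):
    cache = append_to_cache(torch.randn(1, 1, 64), torch.randn(1, 1, 64), cache)
print(cache[0].shape)  # torch.Size([1, 3, 64])
```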
@@ -76,10 +76,10 @@ class M2M100Config(PretrainedConfig):
             just in case (e.g., 512 or 1024 or 2048).
         init_std (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
-        encoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        encoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
-        decoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        decoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the decoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
         use_cache (`bool`, *optional*, defaults to `True`):
...
@@ -76,10 +76,10 @@ class MarianConfig(PretrainedConfig):
             just in case (e.g., 512 or 1024 or 2048).
         init_std (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
-        encoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        encoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
-        decoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        decoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the decoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
         scale_embedding (`bool`, *optional*, defaults to `False`):
...
@@ -76,10 +76,10 @@ class MBartConfig(PretrainedConfig):
             just in case (e.g., 512 or 1024 or 2048).
         init_std (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
-        encoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        encoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
-        decoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        decoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the decoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
         scale_embedding (`bool`, *optional*, defaults to `False`):
...
@@ -340,7 +340,7 @@ class BasicTokenizer(object):
             This should likely be deactivated for Japanese (see this
             [issue](https://github.com/huggingface/transformers/issues/328)).
-        strip_accents: (`bool`, *optional*):
+        strip_accents (`bool`, *optional*):
             Whether or not to strip all accents. If this option is not specified, then it will be determined by the
             value for `lowercase` (as in the original BERT).
     """
...
@@ -71,10 +71,10 @@ class PegasusConfig(PretrainedConfig):
             just in case (e.g., 512 or 1024 or 2048).
         init_std (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
-        encoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        encoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
-        decoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        decoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the decoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
         scale_embedding (`bool`, *optional*, defaults to `False`):
...
@@ -74,10 +74,10 @@ class PLBartConfig(PretrainedConfig):
             just in case (e.g., 512 or 1024 or 2048).
         init_std (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
-        encoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        encoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
-        decoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        decoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the decoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
             for more details.
         scale_embedding (`bool`, *optional*, defaults to `True`):
...