Unverified Commit 721ee783 authored by Klaus Hipp, committed by GitHub

[Docs] Fix spelling and grammar mistakes (#28825)

* Fix typos and grammar mistakes in docs and examples

* Fix typos in docstrings and comments

* Fix spelling of `tokenizer` in model tests

* Remove erroneous spaces in decorators

* Remove extra spaces in Markdown link texts
parent 2418c64a
@@ -202,7 +202,7 @@ class ClapAudioConfig(PretrainedConfig):
Whether or not to enable patch fusion. This is the main contribution of the authors, and should give the
best results.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the encoder.
+ The dropout probability for all fully connected layers in the encoder.
fusion_type (`[type]`, *optional*):
Fusion type used for the patch fusion.
patch_embed_input_channels (`int`, *optional*, defaults to 1):
@@ -61,7 +61,7 @@ class ConvBertConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (`int`, *optional*, defaults to 512):
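Most of the docstring hunks in this commit touch the same pattern: `hidden_dropout_prob` and `attention_probs_dropout_prob` are ordinary keyword arguments on the configuration classes and are stored as attributes that the model reads when it is built. A minimal sketch, assuming a recent `transformers` release:

```python
from transformers import ConvBertConfig

# Dropout hyperparameters are passed as keyword arguments and stored on the config.
config = ConvBertConfig(hidden_dropout_prob=0.2, attention_probs_dropout_prob=0.2)
print(config.hidden_dropout_prob)           # 0.2
print(config.attention_probs_dropout_prob)  # 0.2
```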
@@ -51,7 +51,7 @@ class CpmAntConfig(PretrainedConfig):
num_hidden_layers (`int`, *optional*, defaults to 48):
Number of layers of the Transformer encoder.
dropout_p (`float`, *optional*, defaults to 0.0):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder.
+ The dropout probability for all fully connected layers in the embeddings, encoder.
position_bias_num_buckets (`int`, *optional*, defaults to 512):
The number of position_bias buckets.
position_bias_max_distance (`int`, *optional*, defaults to 2048):
@@ -59,7 +59,7 @@ class DeiTConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02):
@@ -64,7 +64,7 @@ class MCTCTConfig(PretrainedConfig):
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
hidden_dropout_prob (`float`, *optional*, defaults to 0.3):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.3):
The dropout ratio for the attention probabilities.
pad_token_id (`int`, *optional*, defaults to 1):
@@ -146,7 +146,7 @@ class FlaxEmbeddings(nn.Module):
position_embeds = self.position_embeddings(position_ids.astype("i4"))
else:
position_embeds = self.pos_encoding[:, :seq_length, :]
- # explictly cast the positions here, since self.embed_positions are not registered as parameters
+ # explicitly cast the positions here, since self.embed_positions are not registered as parameters
position_embeds = position_embeds.astype(inputs_embeds.dtype)
# Sum all embeddings
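For context on the comment being fixed here (and in the Marian and Pegasus hunks below): the position table is a plain array rather than a registered Flax parameter, so it is not converted when the module runs in half precision and has to be cast to the embedding dtype by hand before the addition. A standalone sketch of that pattern, with illustrative shapes and names rather than the actual module:

```python
import jax.numpy as jnp

# A fixed position table kept as a plain array (not a parameter) stays float32.
pos_encoding = jnp.zeros((1, 128, 64), dtype=jnp.float32)
# Token embeddings may be in half precision, e.g. bfloat16.
inputs_embeds = jnp.ones((1, 16, 64), dtype=jnp.bfloat16)

position_embeds = pos_encoding[:, : inputs_embeds.shape[1], :]
# Explicitly cast so the addition does not silently promote back to float32.
position_embeds = position_embeds.astype(inputs_embeds.dtype)
hidden_states = inputs_embeds + position_embeds
```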
@@ -54,7 +54,7 @@ class DPTConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02):
@@ -1915,7 +1915,7 @@ class EsmFoldingTrunk(nn.Module):
# This parameter means the axial attention will be computed
# in a chunked manner. This should make the memory used more or less O(L) instead of O(L^2).
# It's equivalent to running a for loop over chunks of the dimension we're iterative over,
- # where the chunk_size is the size of the chunks, so 128 would mean to parse 128-lengthed chunks.
+ # where the chunk_size is the size of the chunks, so 128 would mean to parse 128-length chunks.
self.chunk_size = chunk_size
def forward(self, seq_feats, pair_feats, true_aa, residx, mask, no_recycles):
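The comment in this hunk describes chunked evaluation: instead of running the axial attention over the full length at once, the computation is applied to fixed-size slices of one dimension, which keeps peak memory roughly linear in the sequence length. A generic sketch of the idea, using an illustrative helper rather than the ESMFold implementation:

```python
import torch

def apply_chunked(fn, x, chunk_size, dim=1):
    """Apply `fn` to `x` in slices of `chunk_size` along `dim` and reassemble."""
    if chunk_size is None:
        return fn(x)
    pieces = [fn(piece) for piece in torch.split(x, chunk_size, dim=dim)]
    return torch.cat(pieces, dim=dim)

x = torch.randn(2, 512, 64)
out = apply_chunked(lambda t: t * 2, x, chunk_size=128)  # processed as four 128-length chunks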
@@ -53,7 +53,7 @@ class FlavaImageConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02):
@@ -188,7 +188,7 @@ class FlavaTextConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02):
@@ -302,7 +302,7 @@ class FlavaMultimodalConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02):
@@ -52,7 +52,7 @@ class FNetConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
max_position_embeddings (`int`, *optional*, defaults to 512):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
@@ -70,7 +70,7 @@ class GPTNeoConfig(PretrainedConfig):
resid_dropout (`float`, *optional*, defaults to 0.0):
Residual dropout used in the attention pattern.
embed_dropout (`float`, *optional*, defaults to 0.0):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
classifier_dropout (`float`, *optional*, defaults to 0.1):
@@ -177,7 +177,7 @@ class GroupViTVisionConfig(PretrainedConfig):
layer_norm_eps (`float`, *optional*, defaults to 1e-5):
The epsilon used by the layer normalization layers.
dropout (`float`, *optional*, defaults to 0.0):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02):
@@ -63,7 +63,7 @@ class HubertConfig(PretrainedConfig):
attention_dropout(`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
final_dropout (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for the final projection layer of [`Wav2Vec2ForCTC`].
+ The dropout probability for the final projection layer of [`Wav2Vec2ForCTC`].
layerdrop (`float`, *optional*, defaults to 0.1):
The LayerDrop probability. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more
details.
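The `layerdrop` argument documented just below the fixed line refers to LayerDrop (Fan et al., 2019, the paper linked in the docstring): during training, each Transformer layer is skipped with that probability. A minimal sketch of the mechanism with made-up module names, not Hubert's actual encoder:

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    def __init__(self, num_layers=4, dim=32, layerdrop=0.1):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])
        self.layerdrop = layerdrop

    def forward(self, hidden_states):
        for layer in self.layers:
            # During training, skip the whole layer with probability `layerdrop`.
            if self.training and torch.rand(()) < self.layerdrop:
                continue
            hidden_states = layer(hidden_states)
        return hidden_states
```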
@@ -57,7 +57,7 @@ class LayoutLMv2Config(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (`int`, *optional*, defaults to 512):
@@ -63,7 +63,7 @@ class LayoutLMv3Config(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (`int`, *optional*, defaults to 512):
@@ -711,7 +711,7 @@ class FlaxMarianEncoder(nn.Module):
inputs_embeds = self.embed_tokens(input_ids) * self.embed_scale
positions = jnp.take(self.embed_positions, position_ids, axis=0)
- # explictly cast the positions here, since self.embed_positions are not registered as parameters
+ # explicitly cast the positions here, since self.embed_positions are not registered as parameters
positions = positions.astype(inputs_embeds.dtype)
hidden_states = inputs_embeds + positions
@@ -771,7 +771,7 @@ class FlaxMarianDecoder(nn.Module):
# embed positions
positions = jnp.take(self.embed_positions, position_ids, axis=0)
- # explictly cast the positions here, since self.embed_positions are not registered as parameters
+ # explicitly cast the positions here, since self.embed_positions are not registered as parameters
positions = positions.astype(inputs_embeds.dtype)
hidden_states = inputs_embeds + positions
@@ -77,7 +77,7 @@ class MobileViTConfig(PretrainedConfig):
output_stride (`int`, *optional*, defaults to 32):
The ratio of the spatial resolution of the output to the resolution of the input image.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the Transformer encoder.
+ The dropout probability for all fully connected layers in the Transformer encoder.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
classifier_dropout_prob (`float`, *optional*, defaults to 0.1):
@@ -52,7 +52,7 @@ class MraConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (`int`, *optional*, defaults to 512):
@@ -52,7 +52,7 @@ class NystromformerConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (`int`, *optional*, defaults to 512):
@@ -707,7 +707,7 @@ class FlaxPegasusEncoder(nn.Module):
# embed positions
embed_pos = jnp.take(self.embed_positions, position_ids, axis=0)
- # explictly cast the positions here, since self.embed_positions are not registered as parameters
+ # explicitly cast the positions here, since self.embed_positions are not registered as parameters
embed_pos = embed_pos.astype(inputs_embeds.dtype)
hidden_states = inputs_embeds + embed_pos
@@ -778,7 +778,7 @@ class FlaxPegasusDecoder(nn.Module):
# embed positions
positions = jnp.take(self.embed_positions, position_ids, axis=0)
- # explictly cast the positions here, since self.embed_positions are not registered as parameters
+ # explicitly cast the positions here, since self.embed_positions are not registered as parameters
positions = positions.astype(inputs_embeds.dtype)
hidden_states = inputs_embeds + positions