"test/srt/vscode:/vscode.git/clone" did not exist on "0ac61146947ad5bb202ce08a81431eb0daf43aef"
Unverified Commit 721ee783 authored by Klaus Hipp, committed by GitHub

[Docs] Fix spelling and grammar mistakes (#28825)

* Fix typos and grammar mistakes in docs and examples

* Fix typos in docstrings and comments

* Fix spelling of `tokenizer` in model tests

* Remove erroneous spaces in decorators

* Remove extra spaces in Markdown link texts
parent 2418c64a
@@ -202,7 +202,7 @@ class ClapAudioConfig(PretrainedConfig):
Whether or not to enable patch fusion. This is the main contribution of the authors, and should give the
best results.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the encoder.
+ The dropout probability for all fully connected layers in the encoder.
fusion_type (`[type]`, *optional*):
Fusion type used for the patch fusion.
patch_embed_input_channels (`int`, *optional*, defaults to 1):
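For context, `hidden_dropout_prob` and the similar dropout parameters corrected below are ordinary keyword arguments on their config classes, so the docstrings describe values that can be set directly. A minimal sketch, assuming a recent `transformers` release that exports `ClapAudioConfig`:

```python
from transformers import ClapAudioConfig

# Override the documented default (0.1) for the fully connected layers of the audio encoder.
config = ClapAudioConfig(hidden_dropout_prob=0.2)
print(config.hidden_dropout_prob)  # 0.2
```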
......
@@ -61,7 +61,7 @@ class ConvBertConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (`int`, *optional*, defaults to 512):
......
@@ -51,7 +51,7 @@ class CpmAntConfig(PretrainedConfig):
num_hidden_layers (`int`, *optional*, defaults to 48):
Number of layers of the Transformer encoder.
dropout_p (`float`, *optional*, defaults to 0.0):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder.
+ The dropout probability for all fully connected layers in the embeddings, encoder.
position_bias_num_buckets (`int`, *optional*, defaults to 512):
The number of position_bias buckets.
position_bias_max_distance (`int`, *optional*, defaults to 2048):
......
@@ -59,7 +59,7 @@ class DeiTConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02):
......
@@ -64,7 +64,7 @@ class MCTCTConfig(PretrainedConfig):
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
hidden_dropout_prob (`float`, *optional*, defaults to 0.3):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.3):
The dropout ratio for the attention probabilities.
pad_token_id (`int`, *optional*, defaults to 1):
......
@@ -146,7 +146,7 @@ class FlaxEmbeddings(nn.Module):
position_embeds = self.position_embeddings(position_ids.astype("i4"))
else:
position_embeds = self.pos_encoding[:, :seq_length, :]
- # explictly cast the positions here, since self.embed_positions are not registered as parameters
+ # explicitly cast the positions here, since self.embed_positions are not registered as parameters
position_embeds = position_embeds.astype(inputs_embeds.dtype)
# Sum all embeddings
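The comment being corrected here (and in the Marian and Pegasus hunks further down) points at a Flax detail: arrays built in `setup()` without `self.param(...)` stay plain constants, so they keep their own dtype and must be cast by hand before being added to the embeddings. A minimal self-contained sketch of that pattern, using a toy module rather than the actual transformers code:

```python
import jax.numpy as jnp
import flax.linen as nn


class AddFixedPositions(nn.Module):
    """Toy module whose position table is built in setup(), not via self.param()."""

    max_len: int = 512
    dim: int = 64

    def setup(self):
        # A plain float32 constant: it is not a learnable parameter, so dtype policies never touch it.
        table = jnp.linspace(-1.0, 1.0, self.max_len * self.dim)
        self.pos_encoding = table.reshape(self.max_len, self.dim)

    def __call__(self, inputs_embeds):
        seq_length = inputs_embeds.shape[1]
        positions = self.pos_encoding[:seq_length, :]
        # Explicitly cast to the embedding dtype (e.g. bfloat16); without this,
        # the addition below would promote the result back to float32.
        positions = positions.astype(inputs_embeds.dtype)
        return inputs_embeds + positions


x = jnp.ones((2, 16, 64), dtype=jnp.bfloat16)
y = AddFixedPositions().apply({}, x)  # no learnable params, so an empty variable dict suffices
print(y.dtype)  # bfloat16
```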
......
@@ -54,7 +54,7 @@ class DPTConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02):
......
@@ -1915,7 +1915,7 @@ class EsmFoldingTrunk(nn.Module):
# This parameter means the axial attention will be computed
# in a chunked manner. This should make the memory used more or less O(L) instead of O(L^2).
# It's equivalent to running a for loop over chunks of the dimension we're iterative over,
- # where the chunk_size is the size of the chunks, so 128 would mean to parse 128-lengthed chunks.
+ # where the chunk_size is the size of the chunks, so 128 would mean to parse 128-length chunks.
self.chunk_size = chunk_size
def forward(self, seq_feats, pair_feats, true_aa, residx, mask, no_recycles):
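The chunking described in these comments is a general recompute-in-slices trick: run the computation over fixed-size pieces of one axis so peak activation memory tracks the chunk size rather than the full sequence length. A minimal PyTorch sketch of the idea, using a hypothetical `chunked_apply` helper rather than the actual EsmFold implementation:

```python
import torch


def chunked_apply(fn, x, chunk_size, dim=1):
    """Apply `fn` to `x` in slices of `chunk_size` along `dim` and re-concatenate.

    Equivalent to `fn(x)` when `fn` acts independently per position along `dim`,
    but peak activation memory scales with `chunk_size` instead of x.shape[dim].
    """
    if chunk_size is None:
        return fn(x)
    outputs = [fn(chunk) for chunk in torch.split(x, chunk_size, dim=dim)]
    return torch.cat(outputs, dim=dim)


# Usage: with chunk_size=128, each call only ever sees 128-length slices of the sequence axis.
x = torch.randn(2, 1024, 64)
y = chunked_apply(lambda t: torch.softmax(t, dim=-1), x, chunk_size=128, dim=1)
assert y.shape == x.shape
```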
......
@@ -53,7 +53,7 @@ class FlavaImageConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02):
@@ -188,7 +188,7 @@ class FlavaTextConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02):
@@ -302,7 +302,7 @@ class FlavaMultimodalConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02):
......
@@ -52,7 +52,7 @@ class FNetConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
max_position_embeddings (`int`, *optional*, defaults to 512):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
......
@@ -70,7 +70,7 @@ class GPTNeoConfig(PretrainedConfig):
resid_dropout (`float`, *optional*, defaults to 0.0):
Residual dropout used in the attention pattern.
embed_dropout (`float`, *optional*, defaults to 0.0):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
classifier_dropout (`float`, *optional*, defaults to 0.1):
......
@@ -177,7 +177,7 @@ class GroupViTVisionConfig(PretrainedConfig):
layer_norm_eps (`float`, *optional*, defaults to 1e-5):
The epsilon used by the layer normalization layers.
dropout (`float`, *optional*, defaults to 0.0):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02):
......
@@ -63,7 +63,7 @@ class HubertConfig(PretrainedConfig):
attention_dropout(`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
final_dropout (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for the final projection layer of [`Wav2Vec2ForCTC`].
+ The dropout probability for the final projection layer of [`Wav2Vec2ForCTC`].
layerdrop (`float`, *optional*, defaults to 0.1):
The LayerDrop probability. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more
details.
......
@@ -57,7 +57,7 @@ class LayoutLMv2Config(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (`int`, *optional*, defaults to 512):
......
@@ -63,7 +63,7 @@ class LayoutLMv3Config(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (`int`, *optional*, defaults to 512):
......
@@ -711,7 +711,7 @@ class FlaxMarianEncoder(nn.Module):
inputs_embeds = self.embed_tokens(input_ids) * self.embed_scale
positions = jnp.take(self.embed_positions, position_ids, axis=0)
- # explictly cast the positions here, since self.embed_positions are not registered as parameters
+ # explicitly cast the positions here, since self.embed_positions are not registered as parameters
positions = positions.astype(inputs_embeds.dtype)
hidden_states = inputs_embeds + positions
@@ -771,7 +771,7 @@ class FlaxMarianDecoder(nn.Module):
# embed positions
positions = jnp.take(self.embed_positions, position_ids, axis=0)
- # explictly cast the positions here, since self.embed_positions are not registered as parameters
+ # explicitly cast the positions here, since self.embed_positions are not registered as parameters
positions = positions.astype(inputs_embeds.dtype)
hidden_states = inputs_embeds + positions
......
@@ -77,7 +77,7 @@ class MobileViTConfig(PretrainedConfig):
output_stride (`int`, *optional*, defaults to 32):
The ratio of the spatial resolution of the output to the resolution of the input image.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the Transformer encoder.
+ The dropout probability for all fully connected layers in the Transformer encoder.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
classifier_dropout_prob (`float`, *optional*, defaults to 0.1):
......
@@ -52,7 +52,7 @@ class MraConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (`int`, *optional*, defaults to 512):
......
@@ -52,7 +52,7 @@ class NystromformerConfig(PretrainedConfig):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
- The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (`int`, *optional*, defaults to 512):
......
@@ -707,7 +707,7 @@ class FlaxPegasusEncoder(nn.Module):
# embed positions
embed_pos = jnp.take(self.embed_positions, position_ids, axis=0)
- # explictly cast the positions here, since self.embed_positions are not registered as parameters
+ # explicitly cast the positions here, since self.embed_positions are not registered as parameters
embed_pos = embed_pos.astype(inputs_embeds.dtype)
hidden_states = inputs_embeds + embed_pos
@@ -778,7 +778,7 @@ class FlaxPegasusDecoder(nn.Module):
# embed positions
positions = jnp.take(self.embed_positions, position_ids, axis=0)
- # explictly cast the positions here, since self.embed_positions are not registered as parameters
+ # explicitly cast the positions here, since self.embed_positions are not registered as parameters
positions = positions.astype(inputs_embeds.dtype)
hidden_states = inputs_embeds + positions
......