Unverified commit e68ec18c, authored by Joao Gante and committed by GitHub

Docs: formatting nits (#32247)



* doc formatting nits

* ignore non-autodocs

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/esm/modeling_esm.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/esm/modeling_esm.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
parent 2fbbcf50
@@ -77,7 +77,7 @@ Then use `notebook_login` to sign-in to the Hub, and follow the link [here](http
To ensure your model can be used by someone working with a different framework, we recommend you convert and upload your model with both PyTorch and TensorFlow checkpoints. While users are still able to load your model from a different framework if you skip this step, it will be slower because 🤗 Transformers will need to convert the checkpoint on-the-fly.
Converting a checkpoint for another framework is easy. Make sure you have PyTorch and TensorFlow installed (see [here](installation) for installation instructions), and then find the specific model for your task in the other framework.
<frameworkcontent>
<pt>
...
@@ -147,7 +147,7 @@ def get_original_command(max_width=80, full_python_path=False):
Return the original command line string that can be replayed nicely and wrapped for 80 char width.
Args:
- max_width (`int`, `optional`, defaults to 80):
+ max_width (`int`, *optional*, defaults to 80):
The width to wrap for.
full_python_path (`bool`, `optional`, defaults to `False`):
Whether to replicate the full path or just the last segment (i.e. `python`).
...
@@ -113,7 +113,7 @@ class Problem:
The inputs that will be fed to the tools. For this testing environment, only strings are accepted as
values. Pass along a dictionary when you want to specify the values of each inputs, or just the list of
inputs expected (the value used will be `<<input_name>>` in this case).
- answer (`str` or `list[str`]):
+ answer (`str` or `list[str]`):
The theoretical answer (or list of possible valid answers) to the problem, as code.
"""
...
@@ -663,7 +663,7 @@ def spectrogram_batch(
Specifies log scaling strategy; options are None, "log", "log10", "dB".
reference (`float`, *optional*, defaults to 1.0):
Reference value for dB conversion in log_mel.
- min_value (`float`, °optional*, defaults to 1e-10):
+ min_value (`float`, *optional*, defaults to 1e-10):
Minimum floor value for log scale conversions.
db_range (`float`, *optional*):
Dynamic range for dB scale spectrograms.
...
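The interplay of `reference`, `min_value`, and `db_range` in this docstring can be sketched with a small pure-Python helper. This is an illustrative stand-in, not the library's actual `spectrogram_batch` implementation, which operates on arrays; the function name here is hypothetical:

```python
import math

def power_to_db(values, reference=1.0, min_value=1e-10, db_range=None):
    """Convert power values to decibels, flooring at min_value.

    min_value keeps log10 away from zero; db_range (if given) clamps
    quiet values relative to the loudest one.
    """
    db = [10.0 * math.log10(max(v, min_value) / reference) for v in values]
    if db_range is not None:
        peak = max(db)
        db = [max(d, peak - db_range) for d in db]
    return db
```

For example, a power of 0.0 is floored to `min_value=1e-10`, giving -100 dB, and with `db_range=40` nothing falls more than 40 dB below the peak.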
@@ -542,7 +542,7 @@ class QuantoQuantizedCache(QuantizedCache):
Quantized Cache class that uses `quanto` as a backend to perform quantization. Current implementation supports `int2` and `int4` dtypes only.
Parameters:
- cache_config (`QuantizedCacheConfig`,):
+ cache_config (`QuantizedCacheConfig`):
A configuration containing all the arguments to be used by the quantizer, including axis, qtype and group size.
"""
...
@@ -583,7 +583,7 @@ class HQQQuantizedCache(QuantizedCache):
Quantized Cache class that uses `HQQ` as a backend to perform quantization. Current implementation supports `int2`, `int4`, `int8` dtypes.
Parameters:
- cache_config (`QuantizedCacheConfig`,):
+ cache_config (`QuantizedCacheConfig`):
A configuration containing all the arguments to be used by the quantizer, including axis, qtype and group size.
"""
...
@@ -794,7 +794,7 @@ class StaticCache(Cache):
Static Cache class to be used with `torch.compile(model)` and `torch.export()`.
Parameters:
- config (`PretrainedConfig):
+ config (`PretrainedConfig`):
The configuration file defining the shape-related attributes required to initialize the static cache.
max_batch_size (`int`):
The maximum batch size with which the model will be used.
...
@@ -924,7 +924,7 @@ class SlidingWindowCache(StaticCache):
We overwrite the cache using these, then we always write at cache_position (clamped to `sliding_window`)
Parameters:
- config (`PretrainedConfig):
+ config (`PretrainedConfig`):
The configuration file defining the shape-related attributes required to initialize the static cache.
max_batch_size (`int`):
The maximum batch size with which the model will be used.
...
@@ -225,7 +225,7 @@ def get_resize_output_image_size(
Args:
input_image (`np.ndarray`):
The image to resize.
- size (`int` or `Tuple[int, int]` or List[int] or Tuple[int]):
+ size (`int` or `Tuple[int, int]` or List[int] or `Tuple[int]`):
The size to use for resizing the image. If `size` is a sequence like (h, w), output size will be matched to
this.
...
@@ -1389,7 +1389,7 @@ class NeptuneCallback(TrainerCallback):
You can find and copy the name in Neptune from the project settings -> Properties. If None (default), the
value of the `NEPTUNE_PROJECT` environment variable is used.
name (`str`, *optional*): Custom name for the run.
- base_namespace (`str`, optional, defaults to "finetuning"): In the Neptune run, the root namespace
+ base_namespace (`str`, *optional*, defaults to "finetuning"): In the Neptune run, the root namespace
that will contain all of the metadata logged by the callback.
log_parameters (`bool`, *optional*, defaults to `True`):
If True, logs all Trainer arguments and model parameters provided by the Trainer.
...
@@ -266,7 +266,7 @@ class AttentionMaskConverter:
# or `torch.onnx.dynamo_export`, we must pass an example input, and `is_causal` behavior is hard-coded. If a user exports a model with q_len > 1, the exported model will hard-code `is_causal=True` which is in general wrong (see https://github.com/pytorch/pytorch/issues/108108).
# Thus, we only set `ignore_causal_mask = True` if the model is set to training.
#
- # Besides, jit.trace can not handle the `q_len > 1` condition for `is_causal` (`TypeError: scaled_dot_product_attention(): argument 'is_causal' must be bool, not Tensor`).
+ # Besides, jit.trace can not handle the `q_len > 1` condition for `is_causal` ("TypeError: scaled_dot_product_attention(): argument 'is_causal' must be bool, not Tensor").
if (
(is_training or not is_tracing)
and (query_length == 1 or key_value_length == query_length)
...
@@ -39,7 +39,7 @@ def _get_unpad_data(attention_mask: torch.Tensor) -> Tuple[torch.Tensor, torch.T
Boolean or int tensor of shape (batch_size, sequence_length), 1 means valid and 0 means not valid.
Return:
- indices (`torch.Tensor):
+ indices (`torch.Tensor`):
The indices of non-masked tokens from the flattened input sequence.
cu_seqlens (`torch.Tensor`):
The cumulative sequence lengths, used to index into ragged (unpadded) tensors. `cu_seqlens` shape is (batch_size + 1,).
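The `indices`/`cu_seqlens` semantics this docstring describes can be illustrated with a small pure-Python sketch. The real helper works on `torch` tensors; the function below is a hypothetical stand-in that only mirrors the shapes and meanings:

```python
from itertools import accumulate

def get_unpad_data(attention_mask):
    """attention_mask: list of rows of 0/1. Returns (indices, cu_seqlens, max_seqlen)."""
    flat = [v for row in attention_mask for v in row]
    # Positions of valid (non-padding) tokens in the flattened (batch * seq_len) sequence.
    indices = [i for i, v in enumerate(flat) if v == 1]
    # Per-row valid-token counts, then cumulative lengths with a leading 0:
    # shape (batch_size + 1,), so row b spans indices[cu_seqlens[b]:cu_seqlens[b + 1]].
    seqlens = [sum(row) for row in attention_mask]
    cu_seqlens = [0] + list(accumulate(seqlens))
    return indices, cu_seqlens, max(seqlens)
```

For a mask `[[1, 1, 0], [1, 1, 1]]` this yields `indices = [0, 1, 3, 4, 5]` and `cu_seqlens = [0, 2, 5]`: the padding slot at flat position 2 is dropped, and `cu_seqlens` delimits each sequence inside the ragged buffer.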
...
@@ -83,7 +83,7 @@ def _upad_input(
Target length.
Return:
- query_layer (`torch.Tensor):
+ query_layer (`torch.Tensor`):
Query state without padding. Shape: (total_target_length, num_heads, head_dim).
key_layer (`torch.Tensor`):
Key state with padding. Shape: (total_source_length, num_key_value_heads, head_dim).
...
@@ -149,7 +149,7 @@ def prepare_fa2_from_position_ids(query, key, value, position_ids):
Boolean or int tensor of shape (batch_size, sequence_length), 1 means valid and 0 means not valid.
Return:
- query (`torch.Tensor):
+ query (`torch.Tensor`):
Query state without padding. Shape: (total_target_length, num_heads, head_dim).
key (`torch.Tensor`):
Key state with padding. Shape: (total_source_length, num_key_value_heads, head_dim).
...
@@ -1444,7 +1444,7 @@ class TFPreTrainedModel(keras.Model, TFModelUtilsMixin, TFGenerationMixin, PushT
Args:
dataset (`Any`):
A [~`datasets.Dataset`] to be wrapped as a `tf.data.Dataset`.
- batch_size (`int`, defaults to 8):
+ batch_size (`int`, *optional*, defaults to 8):
The size of batches to return.
shuffle (`bool`, defaults to `True`):
Whether to return samples from the dataset in random order. Usually `True` for training datasets and
...
@@ -3442,7 +3442,7 @@ class TFSequenceSummary(keras.layers.Layer):
- **summary_first_dropout** (`float`) -- Optional dropout probability before the projection and activation.
- **summary_last_dropout** (`float`)-- Optional dropout probability after the projection and activation.
- initializer_range (`float`, defaults to 0.02): The standard deviation to use to initialize the weights.
+ initializer_range (`float`, *optional*, defaults to 0.02): The standard deviation to use to initialize the weights.
kwargs (`Dict[str, Any]`, *optional*):
Additional keyword arguments passed along to the `__init__` of `keras.layers.Layer`.
"""
...
@@ -105,10 +105,10 @@ class AutoformerConfig(PretrainedConfig):
label_length (`int`, *optional*, defaults to 10):
Start token length of the Autoformer decoder, which is used for direct multi-step prediction (i.e.
non-autoregressive generation).
- moving_average (`int`, defaults to 25):
+ moving_average (`int`, *optional*, defaults to 25):
The window size of the moving average. In practice, it's the kernel size in AvgPool1d of the Decomposition
Layer.
- autocorrelation_factor (`int`, defaults to 3):
+ autocorrelation_factor (`int`, *optional*, defaults to 3):
"Attention" (i.e. AutoCorrelation mechanism) factor which is used to find top k autocorrelations delays.
It's recommended in the paper to set it to a number between 1 and 5.
...
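The `moving_average` window described above drives Autoformer's series decomposition: a centered moving average extracts the trend, and the residual is the seasonal part. A minimal pure-Python sketch, assuming odd window sizes and edge replication (the decomposition layer pads by repeating the first/last values; the real code uses `AvgPool1d`):

```python
def moving_average(series, window):
    """Centered moving average with edge replication; window must be odd."""
    pad = (window - 1) // 2
    # Replicate the first/last values so the output length matches the input.
    padded = [series[0]] * pad + list(series) + [series[-1]] * pad
    return [sum(padded[i:i + window]) / window for i in range(len(series))]

def decompose(series, window):
    """Split a series into (seasonal, trend) parts, Autoformer-style."""
    trend = moving_average(series, window)
    seasonal = [x - t for x, t in zip(series, trend)]
    return seasonal, trend
```

On the interior of a linear ramp the trend reproduces the input exactly, so the seasonal residual there is zero; only the replicated edges deviate.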
@@ -1219,7 +1219,7 @@ class BertForPreTraining(BertPreTrainedModel):
- 0 indicates sequence B is a continuation of sequence A,
- 1 indicates sequence B is a random sequence.
- kwargs (`Dict[str, any]`, optional, defaults to *{}*):
+ kwargs (`Dict[str, any]`, *optional*, defaults to `{}`):
Used to hide legacy arguments that have been deprecated.
Returns:
...
@@ -1291,7 +1291,7 @@ class TFBertForPreTraining(TFBertPreTrainedModel, TFBertPreTrainingLoss):
- 0 indicates sequence B is a continuation of sequence A,
- 1 indicates sequence B is a random sequence.
- kwargs (`Dict[str, any]`, optional, defaults to *{}*):
+ kwargs (`Dict[str, any]`, *optional*, defaults to `{}`):
Used to hide legacy arguments that have been deprecated.
Return:
...
@@ -2290,7 +2290,7 @@ class BigBirdForPreTraining(BigBirdPreTrainedModel):
- 0 indicates sequence B is a continuation of sequence A,
- 1 indicates sequence B is a random sequence.
- kwargs (`Dict[str, any]`, optional, defaults to *{}*):
+ kwargs (`Dict[str, any]`, *optional*, defaults to `{}`):
Used to hide legacy arguments that have been deprecated.
Returns:
...
@@ -57,7 +57,7 @@ def build_alibi_tensor(attention_mask: torch.Tensor, num_heads: int, dtype: torc
Returns tensor shaped (batch_size * num_heads, 1, max_seq_len)
attention_mask (`torch.Tensor`):
Token-wise attention mask, this should be of shape (batch_size, max_seq_len).
- num_heads (`int`, *required*):
+ num_heads (`int`):
number of heads
dtype (`torch.dtype`, *optional*, default=`torch.bfloat16`):
dtype of the output tensor
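The per-head slopes that `build_alibi_tensor` bakes into its output follow the ALiBi paper: for a power-of-two head count `n`, the slopes form a geometric sequence starting at `2**(-8/n)`. A hedged sketch covering only that case (Bloom's real helper also handles non-power-of-two head counts, which this omits):

```python
def alibi_slopes(num_heads):
    """Per-head ALiBi slopes, for a power-of-two number of heads only."""
    assert num_heads & (num_heads - 1) == 0, "sketch covers power-of-two head counts"
    start = 2.0 ** (-8.0 / num_heads)  # slope of the first head
    # Each subsequent head halves the effective penalty rate geometrically.
    return [start ** (i + 1) for i in range(num_heads)]
```

With 8 heads this gives slopes 1/2, 1/4, ..., 1/256; each slope scales the linear distance penalty added to that head's attention scores.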
...
@@ -94,13 +94,13 @@ def dropout_add(x: torch.Tensor, residual: torch.Tensor, prob: float, training:
Dropout add function
Args:
- x (`torch.tensor`, *required*):
+ x (`torch.tensor`):
input tensor
- residual (`torch.tensor`, *required*):
+ residual (`torch.tensor`):
residual tensor
- prob (`float`, *required*):
+ prob (`float`):
dropout probability
- training (`bool`, *required*):
+ training (`bool`):
training mode
"""
out = F.dropout(x, p=prob, training=training)
...
@@ -114,7 +114,7 @@ def bloom_gelu_forward(x: torch.Tensor) -> torch.Tensor:
make the model jitable.
Args:
- x (`torch.tensor`, *required*):
+ x (`torch.tensor`):
input hidden states
"""
return x * 0.5 * (1.0 + torch.tanh(0.79788456 * x * (1 + 0.044715 * x * x)))
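The constant `0.79788456` in the return statement above is `sqrt(2 / pi)`: this is the standard tanh approximation of GELU. A scalar sketch comparing it against the exact form `x * Φ(x)` (the `torch` version applies the same formula elementwise):

```python
import math

def gelu_tanh(x):
    """Tanh approximation of GELU; 0.79788456 ≈ sqrt(2 / pi)."""
    return x * 0.5 * (1.0 + math.tanh(0.79788456 * x * (1.0 + 0.044715 * x * x)))

def gelu_exact(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

The approximation tracks the exact value to within roughly 1e-3 over typical activation ranges, which is why it is safe to hard-code for jit-friendliness.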
...
@@ -126,9 +126,9 @@ def bloom_gelu_back(g: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
0.3989423 * x * torch.exp(-0.5 * x * x)
Args:
- g (`torch.tensor`, *required*):
+ g (`torch.tensor`):
gradient output tensor
- x (`torch.tensor`, *required*):
+ x (`torch.tensor`):
input tensor
"""
x = x[0] # x is a tuple of 1 element, needs to unpack it first
...
@@ -210,7 +210,7 @@ class BloomAttention(nn.Module):
without making any copies, results share same memory storage as `fused_qkv`
Args:
- fused_qkv (`torch.tensor`, *required*): [batch_size, seq_length, num_heads * 3 * head_dim]
+ fused_qkv (`torch.tensor`): [batch_size, seq_length, num_heads * 3 * head_dim]
Returns:
query: [batch_size, num_heads, seq_length, head_dim]
...
@@ -229,7 +229,7 @@ class BloomAttention(nn.Module):
Merge heads together over the last dimension
Args:
- x (`torch.tensor`, *required*): [batch_size * num_heads, seq_length, head_dim]
+ x (`torch.tensor`): [batch_size * num_heads, seq_length, head_dim]
Returns:
torch.tensor: [batch_size, seq_length, num_heads * head_dim]
...
@@ -247,7 +247,7 @@ class BridgeTowerImageProcessor(BaseImageProcessor):
Image to resize.
size (`Dict[str, int]`):
Controls the size of the output image. Should be of the form `{"shortest_edge": int}`.
- size_divisor (`int`, defaults to 32):
+ size_divisor (`int`, *optional*, defaults to 32):
The image is resized to a size that is a multiple of this value.
resample (`PILImageResampling` filter, *optional*, defaults to `PILImageResampling.BICUBIC`):
Resampling filter to use when resiizing the image.
...
@@ -972,7 +972,7 @@ class CamembertForMaskedLM(CamembertPreTrainedModel):
Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ...,
config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked), the
loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`
- kwargs (`Dict[str, any]`, optional, defaults to *{}*):
+ kwargs (`Dict[str, any]`, *optional*, defaults to `{}`):
Used to hide legacy arguments that have been deprecated.
"""
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
...
@@ -173,7 +173,7 @@ class ClvpFeatureExtractor(SequenceFeatureExtractor):
- `'tf'`: Return TensorFlow `tf.constant` objects.
- `'pt'`: Return PyTorch `torch.Tensor` objects.
- `'np'`: Return Numpy `np.ndarray` objects.
- padding_value (`float`, defaults to 0.0):
+ padding_value (`float`, *optional*, defaults to 0.0):
The value that is used to fill the padding values / vectors.
max_length (`int`, *optional*):
The maximum input length of the inputs.
...
@@ -41,9 +41,9 @@ class ConvNextConfig(BackboneConfigMixin, PretrainedConfig):
Args:
num_channels (`int`, *optional*, defaults to 3):
The number of input channels.
- patch_size (`int`, optional, defaults to 4):
+ patch_size (`int`, *optional*, defaults to 4):
Patch size to use in the patch embedding layer.
- num_stages (`int`, optional, defaults to 4):
+ num_stages (`int`, *optional*, defaults to 4):
The number of stages in the model.
hidden_sizes (`List[int]`, *optional*, defaults to [96, 192, 384, 768]):
Dimensionality (hidden size) at each stage.
...
@@ -35,9 +35,9 @@ class ConvNextV2Config(BackboneConfigMixin, PretrainedConfig):
Args:
num_channels (`int`, *optional*, defaults to 3):
The number of input channels.
- patch_size (`int`, optional, defaults to 4):
+ patch_size (`int`, *optional*, defaults to 4):
Patch size to use in the patch embedding layer.
- num_stages (`int`, optional, defaults to 4):
+ num_stages (`int`, *optional*, defaults to 4):
The number of stages in the model.
hidden_sizes (`List[int]`, *optional*, defaults to `[96, 192, 384, 768]`):
Dimensionality (hidden size) at each stage.
...