Commit 03af4c42 authored by Sylvain Gugger, committed by GitHub

Docstring check (#26052)



* Fix number of minimal calls to the Hub with peft integration

* Alternate design

* And this way?

* Revert

* Nits to fix

* Add util

* Print when changes are made

* Add list to ignore

* Add more rules

* Manual fixes

* deal with kwargs

* deal with enum defaults

* avoid many digits for floats

* Manual fixes

* Fix regex

* Fix regex

* Auto fix

* Style

* Apply script

* Add ignored list

* Add check that templates are filled

* Adding to CI checks

* Add back semi-fix

* Ignore more objects

* More auto-fixes

* Ignore missing objects

* Remove temp semi-fix

* Fixes

* Update src/transformers/models/pvt/configuration_pvt.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update utils/check_docstrings.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Deal with float defaults

* Fix small defaults

* Address review comment

* Treat

* Post-rebase cleanup

* Address review comment

* Update src/transformers/models/deprecated/mctct/configuration_mctct.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comment

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
parent 122b2657
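
Several hunks in this diff change only how a float default is spelled (`1e-6` becomes `1e-06`, `1e-2` becomes `0.01`). Those spellings are what Python's `repr` produces for the same values, which suggests the documented defaults are derived from the real signature. The sketch below illustrates that idea only. The helpers `default_phrase` and `documented_defaults` and the stand-in function `example_config` are hypothetical and are not taken from `utils/check_docstrings.py`.

```python
import inspect


def default_phrase(value):
    """Hypothetical helper: render a default the way the docstrings in this diff spell it."""
    if value is None:
        # A `None` default is shown as plain "*optional*" with no "defaults to" part.
        return ""
    if isinstance(value, str):
        return f'defaults to `"{value}"`'
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # Numbers are left bare; repr() is what turns 1e-6 into 1e-06 and 1e-2 into 0.01.
        return f"defaults to {value!r}"
    # Booleans, lists, tuples and other objects are wrapped in backticks.
    return f"defaults to `{value!r}`"


def example_config(dropout_p=0.0, eps=1e-6, layer_scale_init_value=1e-2,
                   use_cache=True, hidden_act="gelu", conv_kernel=(7,), conv_channels=None):
    """Stand-in for a config `__init__`; not a real transformers class."""


def documented_defaults(func):
    """Map each parameter that has a default to its docstring phrase."""
    params = inspect.signature(func).parameters
    return {
        name: default_phrase(param.default)
        for name, param in params.items()
        if param.default is not inspect.Parameter.empty
    }


for name, phrase in documented_defaults(example_config).items():
    print(f"{name}: {phrase or '(no default shown)'}")
```

Reading defaults from the signature rather than typing them by hand is one way to keep the two from drifting apart, as happened with the dropout values corrected in the hunks that follow.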
...@@ -50,15 +50,17 @@ class CpmAntConfig(PretrainedConfig): ...@@ -50,15 +50,17 @@ class CpmAntConfig(PretrainedConfig):
Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder. Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
num_hidden_layers (`int`, *optional*, defaults to 48): num_hidden_layers (`int`, *optional*, defaults to 48):
Number of layers of the Transformer encoder. Number of layers of the Transformer encoder.
dropout_p (`float`, *optional*, defaults to 0.1): dropout_p (`float`, *optional*, defaults to 0.0):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder. The dropout probabilitiy for all fully connected layers in the embeddings, encoder.
position_bias_num_buckets (`int`, *optional*, defaults to 512): position_bias_num_buckets (`int`, *optional*, defaults to 512):
The number of position_bias buckets. The number of position_bias buckets.
position_bias_max_distance (`int`, *optional*, defaults to 2048): position_bias_max_distance (`int`, *optional*, defaults to 2048):
The maximum sequence length that this model might ever be used with. Typically set this to something large The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048). just in case (e.g., 512 or 1024 or 2048).
eps (`float`, *optional*, defaults to 1e-6): eps (`float`, *optional*, defaults to 1e-06):
The epsilon used by the layer normalization layers. The epsilon used by the layer normalization layers.
init_std (`float`, *optional*, defaults to 1.0):
Initialize parameters with std = init_std.
prompt_types (`int`, *optional*, defaults to 32): prompt_types (`int`, *optional*, defaults to 32):
The type of prompt. The type of prompt.
prompt_length (`int`, *optional*, defaults to 32): prompt_length (`int`, *optional*, defaults to 32):
...@@ -67,8 +69,6 @@ class CpmAntConfig(PretrainedConfig): ...@@ -67,8 +69,6 @@ class CpmAntConfig(PretrainedConfig):
The type of segment. The type of segment.
use_cache (`bool`, *optional*, defaults to `True`): use_cache (`bool`, *optional*, defaults to `True`):
Whether to use cache. Whether to use cache.
init_std (`float`, *optional*, defaults to 1.0):
Initialize parameters with std = init_std.
Example: Example:
......
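
The `CpmAntConfig` hunk above moves the `init_std` entry from the end of the argument list to just after `eps`, which appears to bring the documented order in line with the signature. The sketch below shows one way such an ordering check could work. The regex, the helper names and the two toy functions `before` and `after` are assumptions made for illustration, not the code added by this commit.

```python
import inspect
import re

# Assumed docstring shape: an argument line looks like "name (`type`, ...):".
ARG_LINE = re.compile(r"^\s*(\w+) \(")


def documented_args(docstring):
    """Return argument names in the order they are documented."""
    names = []
    for line in docstring.splitlines():
        match = ARG_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


def order_matches_signature(func):
    """True if the documented arguments appear in signature order."""
    sig_names = [n for n in inspect.signature(func).parameters if n not in ("self", "kwargs")]
    doc_names = documented_args(func.__doc__ or "")
    # Only compare the signature names that are actually documented.
    expected = [n for n in sig_names if n in doc_names]
    return doc_names == expected


def before(eps=1e-6, init_std=1.0, prompt_types=32, use_cache=True):
    """
    Args:
        eps (`float`, *optional*, defaults to 1e-06):
            The epsilon used by the layer normalization layers.
        prompt_types (`int`, *optional*, defaults to 32):
            The type of prompt.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether to use cache.
        init_std (`float`, *optional*, defaults to 1.0):
            Initialize parameters with std = init_std.
    """


def after(eps=1e-6, init_std=1.0, prompt_types=32, use_cache=True):
    """
    Args:
        eps (`float`, *optional*, defaults to 1e-06):
            The epsilon used by the layer normalization layers.
        init_std (`float`, *optional*, defaults to 1.0):
            Initialize parameters with std = init_std.
        prompt_types (`int`, *optional*, defaults to 32):
            The type of prompt.
        use_cache (`bool`, *optional*, defaults to `True`):
            Whether to use cache.
    """


print(order_matches_signature(before), order_matches_signature(after))
```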
...@@ -54,7 +54,7 @@ class CTRLConfig(PretrainedConfig): ...@@ -54,7 +54,7 @@ class CTRLConfig(PretrainedConfig):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler. The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
embd_pdrop (`int`, *optional*, defaults to 0.1): embd_pdrop (`int`, *optional*, defaults to 0.1):
The dropout ratio for the embeddings. The dropout ratio for the embeddings.
layer_norm_epsilon (`float`, *optional*, defaults to 1e-6): layer_norm_epsilon (`float`, *optional*, defaults to 1e-06):
The epsilon to use in the layer normalization layers The epsilon to use in the layer normalization layers
initializer_range (`float`, *optional*, defaults to 0.02): initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices. The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
......
...@@ -99,9 +99,9 @@ class DebertaTokenizerFast(PreTrainedTokenizerFast): ...@@ -99,9 +99,9 @@ class DebertaTokenizerFast(PreTrainedTokenizerFast):
refer to this superclass for more information regarding those methods. refer to this superclass for more information regarding those methods.
Args: Args:
vocab_file (`str`): vocab_file (`str`, *optional*):
Path to the vocabulary file. Path to the vocabulary file.
merges_file (`str`): merges_file (`str`, *optional*):
Path to the merges file. Path to the merges file.
tokenizer_file (`str`, *optional*): tokenizer_file (`str`, *optional*):
The path to a tokenizer file to use instead of the vocab file. The path to a tokenizer file to use instead of the vocab file.
......
...@@ -58,23 +58,23 @@ class DeiTConfig(PretrainedConfig): ...@@ -58,23 +58,23 @@ class DeiTConfig(PretrainedConfig):
hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`): hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`, The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported. `"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1): hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler. The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1): attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities. The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02): initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices. The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12): layer_norm_eps (`float`, *optional*, defaults to 1e-12):
The epsilon used by the layer normalization layers. The epsilon used by the layer normalization layers.
image_size (`int`, *optional*, defaults to `224`): image_size (`int`, *optional*, defaults to 224):
The size (resolution) of each image. The size (resolution) of each image.
patch_size (`int`, *optional*, defaults to `16`): patch_size (`int`, *optional*, defaults to 16):
The size (resolution) of each patch. The size (resolution) of each patch.
num_channels (`int`, *optional*, defaults to `3`): num_channels (`int`, *optional*, defaults to 3):
The number of input channels. The number of input channels.
qkv_bias (`bool`, *optional*, defaults to `True`): qkv_bias (`bool`, *optional*, defaults to `True`):
Whether to add a bias to the queries, keys and values. Whether to add a bias to the queries, keys and values.
encoder_stride (`int`, `optional`, defaults to 16): encoder_stride (`int`, *optional*, defaults to 16):
Factor to increase the spatial resolution by in the decoder head for masked image modeling. Factor to increase the spatial resolution by in the decoder head for masked image modeling.
Example: Example:
......
...@@ -52,19 +52,19 @@ class DeiTImageProcessor(BaseImageProcessor): ...@@ -52,19 +52,19 @@ class DeiTImageProcessor(BaseImageProcessor):
`do_resize` in `preprocess`. `do_resize` in `preprocess`.
size (`Dict[str, int]` *optional*, defaults to `{"height": 256, "width": 256}`): size (`Dict[str, int]` *optional*, defaults to `{"height": 256, "width": 256}`):
Size of the image after `resize`. Can be overridden by `size` in `preprocess`. Size of the image after `resize`. Can be overridden by `size` in `preprocess`.
resample (`PILImageResampling` filter, *optional*, defaults to `PILImageResampling.BICUBIC`): resample (`PILImageResampling` filter, *optional*, defaults to `Resampling.BICUBIC`):
Resampling filter to use if resizing the image. Can be overridden by `resample` in `preprocess`. Resampling filter to use if resizing the image. Can be overridden by `resample` in `preprocess`.
do_center_crop (`bool`, *optional*, defaults to `True`): do_center_crop (`bool`, *optional*, defaults to `True`):
Whether to center crop the image. If the input size is smaller than `crop_size` along any edge, the image Whether to center crop the image. If the input size is smaller than `crop_size` along any edge, the image
is padded with 0's and then center cropped. Can be overridden by `do_center_crop` in `preprocess`. is padded with 0's and then center cropped. Can be overridden by `do_center_crop` in `preprocess`.
crop_size (`Dict[str, int]`, *optional*, defaults to `{"height": 224, "width": 224}`): crop_size (`Dict[str, int]`, *optional*, defaults to `{"height": 224, "width": 224}`):
Desired output size when applying center-cropping. Can be overridden by `crop_size` in `preprocess`. Desired output size when applying center-cropping. Can be overridden by `crop_size` in `preprocess`.
do_rescale (`bool`, *optional*, defaults to `True`):
Whether to rescale the image by the specified scale `rescale_factor`. Can be overridden by the `do_rescale`
parameter in the `preprocess` method.
rescale_factor (`int` or `float`, *optional*, defaults to `1/255`): rescale_factor (`int` or `float`, *optional*, defaults to `1/255`):
Scale factor to use if rescaling the image. Can be overridden by the `rescale_factor` parameter in the Scale factor to use if rescaling the image. Can be overridden by the `rescale_factor` parameter in the
`preprocess` method. `preprocess` method.
do_rescale (`bool`, *optional*, defaults to `True`):
Whether to rescale the image by the specified scale `rescale_factor`. Can be overridden by the `do_rescale`
parameter in the `preprocess` method.
do_normalize (`bool`, *optional*, defaults to `True`): do_normalize (`bool`, *optional*, defaults to `True`):
Whether to normalize the image. Can be overridden by the `do_normalize` parameter in the `preprocess` Whether to normalize the image. Can be overridden by the `do_normalize` parameter in the `preprocess`
method. method.
......
...@@ -53,7 +53,7 @@ class MCTCTConfig(PretrainedConfig): ...@@ -53,7 +53,7 @@ class MCTCTConfig(PretrainedConfig):
Dimensions of each attention head for each attention layer in the Transformer encoder. Dimensions of each attention head for each attention layer in the Transformer encoder.
max_position_embeddings (`int`, *optional*, defaults to 920): max_position_embeddings (`int`, *optional*, defaults to 920):
The maximum sequence length that this model might ever be used with (after log-mel spectrogram extraction). The maximum sequence length that this model might ever be used with (after log-mel spectrogram extraction).
layer_norm_eps (`float`, *optional*, defaults to 1e-5): layer_norm_eps (`float`, *optional*, defaults to 1e-05):
The epsilon used by the layer normalization layers. The epsilon used by the layer normalization layers.
layerdrop (`float`, *optional*, defaults to 0.3): layerdrop (`float`, *optional*, defaults to 0.3):
The probability of dropping an encoder layer during training. The default 0.3 value is used in the original The probability of dropping an encoder layer during training. The default 0.3 value is used in the original
...@@ -63,9 +63,9 @@ class MCTCTConfig(PretrainedConfig): ...@@ -63,9 +63,9 @@ class MCTCTConfig(PretrainedConfig):
`"relu"`, `"selu"` and `"gelu_new"` are supported. `"relu"`, `"selu"` and `"gelu_new"` are supported.
initializer_range (`float`, *optional*, defaults to 0.02): initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices. The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1): hidden_dropout_prob (`float`, *optional*, defaults to 0.3):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler. The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1): attention_probs_dropout_prob (`float`, *optional*, defaults to 0.3):
The dropout ratio for the attention probabilities. The dropout ratio for the attention probabilities.
pad_token_id (`int`, *optional*, defaults to 1): pad_token_id (`int`, *optional*, defaults to 1):
The tokenizer index of the pad token. The tokenizer index of the pad token.
...@@ -80,17 +80,17 @@ class MCTCTConfig(PretrainedConfig): ...@@ -80,17 +80,17 @@ class MCTCTConfig(PretrainedConfig):
The probability of randomly dropping the `Conv1dSubsampler` layer during training. The probability of randomly dropping the `Conv1dSubsampler` layer during training.
num_conv_layers (`int`, *optional*, defaults to 1): num_conv_layers (`int`, *optional*, defaults to 1):
Number of convolution layers before applying transformer encoder layers. Number of convolution layers before applying transformer encoder layers.
conv_kernel (`List[int]`, *optional*, defaults to `[7]`): conv_kernel (`Sequence[int]`, *optional*, defaults to `(7,)`):
The kernel size of the 1D convolution applied before transformer layers. `len(conv_kernel)` must be equal The kernel size of the 1D convolution applied before transformer layers. `len(conv_kernel)` must be equal
to `num_conv_layers`. to `num_conv_layers`.
conv_stride (`List[int]`, *optional*, defaults to `[3]`): conv_stride (`Sequence[int]`, *optional*, defaults to `(3,)`):
The stride length of the 1D convolution applied before transformer layers. `len(conv_stride)` must be equal The stride length of the 1D convolution applied before transformer layers. `len(conv_stride)` must be equal
to `num_conv_layers`. to `num_conv_layers`.
input_feat_per_channel (`int`, *optional*, defaults to 80): input_feat_per_channel (`int`, *optional*, defaults to 80):
Feature dimensions of the channels of the input to the Conv1D layer. Feature dimensions of the channels of the input to the Conv1D layer.
input_channels (`int`, *optional*, defaults to 1): input_channels (`int`, *optional*, defaults to 1):
Number of input channels of the input to the Conv1D layer. Number of input channels of the input to the Conv1D layer.
conv_channels (`List[int]`, *optional*, defaults to None): conv_channels (`List[int]`, *optional*):
Channel sizes of intermediate Conv1D layers. Channel sizes of intermediate Conv1D layers.
ctc_loss_reduction (`str`, *optional*, defaults to `"sum"`): ctc_loss_reduction (`str`, *optional*, defaults to `"sum"`):
Specifies the reduction to apply to the output of `torch.nn.CTCLoss`. Only relevant when training an Specifies the reduction to apply to the output of `torch.nn.CTCLoss`. Only relevant when training an
......
...@@ -57,9 +57,9 @@ class VanConfig(PretrainedConfig): ...@@ -57,9 +57,9 @@ class VanConfig(PretrainedConfig):
`"selu"` and `"gelu_new"` are supported. `"selu"` and `"gelu_new"` are supported.
initializer_range (`float`, *optional*, defaults to 0.02): initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices. The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12): layer_norm_eps (`float`, *optional*, defaults to 1e-06):
The epsilon used by the layer normalization layers. The epsilon used by the layer normalization layers.
layer_scale_init_value (`float`, *optional*, defaults to 1e-2): layer_scale_init_value (`float`, *optional*, defaults to 0.01):
The initial value for layer scaling. The initial value for layer scaling.
drop_path_rate (`float`, *optional*, defaults to 0.0): drop_path_rate (`float`, *optional*, defaults to 0.0):
The dropout probability for stochastic depth. The dropout probability for stochastic depth.
......
...@@ -44,9 +44,9 @@ class DinatConfig(BackboneConfigMixin, PretrainedConfig): ...@@ -44,9 +44,9 @@ class DinatConfig(BackboneConfigMixin, PretrainedConfig):
The number of input channels. The number of input channels.
embed_dim (`int`, *optional*, defaults to 64): embed_dim (`int`, *optional*, defaults to 64):
Dimensionality of patch embedding. Dimensionality of patch embedding.
depths (`List[int]`, *optional*, defaults to `[2, 2, 6, 2]`): depths (`List[int]`, *optional*, defaults to `[3, 4, 6, 5]`):
Number of layers in each level of the encoder. Number of layers in each level of the encoder.
num_heads (`List[int]`, *optional*, defaults to `[3, 6, 12, 24]`): num_heads (`List[int]`, *optional*, defaults to `[2, 4, 8, 16]`):
Number of attention heads in each layer of the Transformer encoder. Number of attention heads in each layer of the Transformer encoder.
kernel_size (`int`, *optional*, defaults to 7): kernel_size (`int`, *optional*, defaults to 7):
Neighborhood Attention kernel size. Neighborhood Attention kernel size.
...@@ -67,7 +67,7 @@ class DinatConfig(BackboneConfigMixin, PretrainedConfig): ...@@ -67,7 +67,7 @@ class DinatConfig(BackboneConfigMixin, PretrainedConfig):
`"selu"` and `"gelu_new"` are supported. `"selu"` and `"gelu_new"` are supported.
initializer_range (`float`, *optional*, defaults to 0.02): initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices. The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12): layer_norm_eps (`float`, *optional*, defaults to 1e-05):
The epsilon used by the layer normalization layers. The epsilon used by the layer normalization layers.
layer_scale_init_value (`float`, *optional*, defaults to 0.0): layer_scale_init_value (`float`, *optional*, defaults to 0.0):
The initial value for the layer scale. Disabled if <=0. The initial value for the layer scale. Disabled if <=0.
......
...@@ -60,7 +60,7 @@ class Dinov2Config(BackboneConfigMixin, PretrainedConfig): ...@@ -60,7 +60,7 @@ class Dinov2Config(BackboneConfigMixin, PretrainedConfig):
The dropout ratio for the attention probabilities. The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02): initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices. The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-6): layer_norm_eps (`float`, *optional*, defaults to 1e-06):
The epsilon used by the layer normalization layers. The epsilon used by the layer normalization layers.
image_size (`int`, *optional*, defaults to 224): image_size (`int`, *optional*, defaults to 224):
The size (resolution) of each image. The size (resolution) of each image.
......
...@@ -45,15 +45,15 @@ class DonutSwinConfig(PretrainedConfig): ...@@ -45,15 +45,15 @@ class DonutSwinConfig(PretrainedConfig):
The number of input channels. The number of input channels.
embed_dim (`int`, *optional*, defaults to 96): embed_dim (`int`, *optional*, defaults to 96):
Dimensionality of patch embedding. Dimensionality of patch embedding.
depths (`list(int)`, *optional*, defaults to [2, 2, 6, 2]): depths (`list(int)`, *optional*, defaults to `[2, 2, 6, 2]`):
Depth of each layer in the Transformer encoder. Depth of each layer in the Transformer encoder.
num_heads (`list(int)`, *optional*, defaults to [3, 6, 12, 24]): num_heads (`list(int)`, *optional*, defaults to `[3, 6, 12, 24]`):
Number of attention heads in each layer of the Transformer encoder. Number of attention heads in each layer of the Transformer encoder.
window_size (`int`, *optional*, defaults to 7): window_size (`int`, *optional*, defaults to 7):
Size of windows. Size of windows.
mlp_ratio (`float`, *optional*, defaults to 4.0): mlp_ratio (`float`, *optional*, defaults to 4.0):
Ratio of MLP hidden dimensionality to embedding dimensionality. Ratio of MLP hidden dimensionality to embedding dimensionality.
qkv_bias (`bool`, *optional*, defaults to True): qkv_bias (`bool`, *optional*, defaults to `True`):
Whether or not a learnable bias should be added to the queries, keys and values. Whether or not a learnable bias should be added to the queries, keys and values.
hidden_dropout_prob (`float`, *optional*, defaults to 0.0): hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout probability for all fully connected layers in the embeddings and encoder. The dropout probability for all fully connected layers in the embeddings and encoder.
...@@ -64,11 +64,11 @@ class DonutSwinConfig(PretrainedConfig): ...@@ -64,11 +64,11 @@ class DonutSwinConfig(PretrainedConfig):
hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`): hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder. If string, `"gelu"`, `"relu"`, The non-linear activation function (function or string) in the encoder. If string, `"gelu"`, `"relu"`,
`"selu"` and `"gelu_new"` are supported. `"selu"` and `"gelu_new"` are supported.
use_absolute_embeddings (`bool`, *optional*, defaults to False): use_absolute_embeddings (`bool`, *optional*, defaults to `False`):
Whether or not to add absolute position embeddings to the patch embeddings. Whether or not to add absolute position embeddings to the patch embeddings.
initializer_range (`float`, *optional*, defaults to 0.02): initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices. The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12): layer_norm_eps (`float`, *optional*, defaults to 1e-05):
The epsilon used by the layer normalization layers. The epsilon used by the layer normalization layers.
Example: Example:
......
...@@ -32,9 +32,9 @@ class DonutProcessor(ProcessorMixin): ...@@ -32,9 +32,9 @@ class DonutProcessor(ProcessorMixin):
[`~DonutProcessor.decode`] for more information. [`~DonutProcessor.decode`] for more information.
Args: Args:
image_processor ([`DonutImageProcessor`]): image_processor ([`DonutImageProcessor`], *optional*):
An instance of [`DonutImageProcessor`]. The image processor is a required input. An instance of [`DonutImageProcessor`]. The image processor is a required input.
tokenizer ([`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`]): tokenizer ([`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`], *optional*):
An instance of [`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`]. The tokenizer is a required input. An instance of [`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`]. The tokenizer is a required input.
""" """
attributes = ["image_processor", "tokenizer"] attributes = ["image_processor", "tokenizer"]
......
...@@ -52,9 +52,9 @@ class DPTConfig(PretrainedConfig): ...@@ -52,9 +52,9 @@ class DPTConfig(PretrainedConfig):
hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`): hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`, The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported. `"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1): hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler. The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1): attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities. The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02): initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices. The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
...@@ -66,6 +66,8 @@ class DPTConfig(PretrainedConfig): ...@@ -66,6 +66,8 @@ class DPTConfig(PretrainedConfig):
The size (resolution) of each patch. The size (resolution) of each patch.
num_channels (`int`, *optional*, defaults to 3): num_channels (`int`, *optional*, defaults to 3):
The number of input channels. The number of input channels.
is_hybrid (`bool`, *optional*, defaults to `False`):
Whether to use a hybrid backbone. Useful in the context of loading DPT-Hybrid models.
qkv_bias (`bool`, *optional*, defaults to `True`): qkv_bias (`bool`, *optional*, defaults to `True`):
Whether to add a bias to the queries, keys and values. Whether to add a bias to the queries, keys and values.
backbone_out_indices (`List[int]`, *optional*, defaults to `[2, 5, 8, 11]`): backbone_out_indices (`List[int]`, *optional*, defaults to `[2, 5, 8, 11]`):
...@@ -79,11 +81,9 @@ class DPTConfig(PretrainedConfig): ...@@ -79,11 +81,9 @@ class DPTConfig(PretrainedConfig):
- "project" passes information to the other tokens by concatenating the readout to all other tokens before - "project" passes information to the other tokens by concatenating the readout to all other tokens before
projecting the projecting the
representation to the original feature dimension D using a linear layer followed by a GELU non-linearity. representation to the original feature dimension D using a linear layer followed by a GELU non-linearity.
is_hybrid (`bool`, *optional*, defaults to `False`):
Whether to use a hybrid backbone. Useful in the context of loading DPT-Hybrid models.
reassemble_factors (`List[int]`, *optional*, defaults to `[4, 2, 1, 0.5]`): reassemble_factors (`List[int]`, *optional*, defaults to `[4, 2, 1, 0.5]`):
The up/downsampling factors of the reassemble layers. The up/downsampling factors of the reassemble layers.
neck_hidden_sizes (`List[str]`, *optional*, defaults to [96, 192, 384, 768]): neck_hidden_sizes (`List[str]`, *optional*, defaults to `[96, 192, 384, 768]`):
The hidden sizes to project to for the feature maps of the backbone. The hidden sizes to project to for the feature maps of the backbone.
fusion_hidden_size (`int`, *optional*, defaults to 256): fusion_hidden_size (`int`, *optional*, defaults to 256):
The number of channels before fusion. The number of channels before fusion.
......
...@@ -100,14 +100,14 @@ class DPTImageProcessor(BaseImageProcessor): ...@@ -100,14 +100,14 @@ class DPTImageProcessor(BaseImageProcessor):
Whether to resize the image's (height, width) dimensions. Can be overidden by `do_resize` in `preprocess`. Whether to resize the image's (height, width) dimensions. Can be overidden by `do_resize` in `preprocess`.
size (`Dict[str, int]` *optional*, defaults to `{"height": 384, "width": 384}`): size (`Dict[str, int]` *optional*, defaults to `{"height": 384, "width": 384}`):
Size of the image after resizing. Can be overidden by `size` in `preprocess`. Size of the image after resizing. Can be overidden by `size` in `preprocess`.
resample (`PILImageResampling`, *optional*, defaults to `Resampling.BILINEAR`):
Defines the resampling filter to use if resizing the image. Can be overidden by `resample` in `preprocess`.
keep_aspect_ratio (`bool`, *optional*, defaults to `False`): keep_aspect_ratio (`bool`, *optional*, defaults to `False`):
If `True`, the image is resized to the largest possible size such that the aspect ratio is preserved. Can If `True`, the image is resized to the largest possible size such that the aspect ratio is preserved. Can
be overidden by `keep_aspect_ratio` in `preprocess`. be overidden by `keep_aspect_ratio` in `preprocess`.
ensure_multiple_of (`int`, *optional*, defaults to 1): ensure_multiple_of (`int`, *optional*, defaults to 1):
If `do_resize` is `True`, the image is resized to a size that is a multiple of this value. Can be overidden If `do_resize` is `True`, the image is resized to a size that is a multiple of this value. Can be overidden
by `ensure_multiple_of` in `preprocess`. by `ensure_multiple_of` in `preprocess`.
resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BILINEAR`):
Defines the resampling filter to use if resizing the image. Can be overidden by `resample` in `preprocess`.
do_rescale (`bool`, *optional*, defaults to `True`): do_rescale (`bool`, *optional*, defaults to `True`):
Whether to rescale the image by the specified scale `rescale_factor`. Can be overidden by `do_rescale` in Whether to rescale the image by the specified scale `rescale_factor`. Can be overidden by `do_rescale` in
`preprocess`. `preprocess`.
......
...@@ -52,22 +52,22 @@ class EfficientNetImageProcessor(BaseImageProcessor): ...@@ -52,22 +52,22 @@ class EfficientNetImageProcessor(BaseImageProcessor):
`do_resize` in `preprocess`. `do_resize` in `preprocess`.
size (`Dict[str, int]` *optional*, defaults to `{"height": 346, "width": 346}`): size (`Dict[str, int]` *optional*, defaults to `{"height": 346, "width": 346}`):
Size of the image after `resize`. Can be overridden by `size` in `preprocess`. Size of the image after `resize`. Can be overridden by `size` in `preprocess`.
resample (`PILImageResampling` filter, *optional*, defaults to `PILImageResampling.NEAREST`): resample (`PILImageResampling` filter, *optional*, defaults to 0):
Resampling filter to use if resizing the image. Can be overridden by `resample` in `preprocess`. Resampling filter to use if resizing the image. Can be overridden by `resample` in `preprocess`.
do_center_crop (`bool`, *optional*, defaults to `False`): do_center_crop (`bool`, *optional*, defaults to `False`):
Whether to center crop the image. If the input size is smaller than `crop_size` along any edge, the image Whether to center crop the image. If the input size is smaller than `crop_size` along any edge, the image
is padded with 0's and then center cropped. Can be overridden by `do_center_crop` in `preprocess`. is padded with 0's and then center cropped. Can be overridden by `do_center_crop` in `preprocess`.
crop_size (`Dict[str, int]`, *optional*, defaults to `{"height": 289, "width": 289}`): crop_size (`Dict[str, int]`, *optional*, defaults to `{"height": 289, "width": 289}`):
Desired output size when applying center-cropping. Can be overridden by `crop_size` in `preprocess`. Desired output size when applying center-cropping. Can be overridden by `crop_size` in `preprocess`.
do_rescale (`bool`, *optional*, defaults to `True`):
Whether to rescale the image by the specified scale `rescale_factor`. Can be overridden by the `do_rescale`
parameter in the `preprocess` method.
rescale_factor (`int` or `float`, *optional*, defaults to `1/255`): rescale_factor (`int` or `float`, *optional*, defaults to `1/255`):
Scale factor to use if rescaling the image. Can be overridden by the `rescale_factor` parameter in the Scale factor to use if rescaling the image. Can be overridden by the `rescale_factor` parameter in the
`preprocess` method. `preprocess` method.
rescale_offset (`bool`, *optional*, defaults to `False`): rescale_offset (`bool`, *optional*, defaults to `False`):
Whether to rescale the image between [-scale_range, scale_range] instead of [0, scale_range]. Can be Whether to rescale the image between [-scale_range, scale_range] instead of [0, scale_range]. Can be
overridden by the `rescale_factor` parameter in the `preprocess` method. overridden by the `rescale_factor` parameter in the `preprocess` method.
do_rescale (`bool`, *optional*, defaults to `True`):
Whether to rescale the image by the specified scale `rescale_factor`. Can be overridden by the `do_rescale`
parameter in the `preprocess` method.
do_normalize (`bool`, *optional*, defaults to `True`): do_normalize (`bool`, *optional*, defaults to `True`):
Whether to normalize the image. Can be overridden by the `do_normalize` parameter in the `preprocess` Whether to normalize the image. Can be overridden by the `do_normalize` parameter in the `preprocess`
method. method.
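Note on the `resample` change above: the new docstring gives the default as `0` rather than `PILImageResampling.NEAREST`. This is what one would expect if the enum default is rendered through its integer value. The sketch below illustrates the idea with a stand-in enum. `Resampling` and `render_enum_default` are hypothetical names invented for this illustration and are not taken from `utils/check_docstrings.py` or from PIL.

```python
from enum import IntEnum


class Resampling(IntEnum):
    """Stand-in for an integer-valued resampling enum (hypothetical)."""

    NEAREST = 0
    BILINEAR = 2


def render_enum_default(default: IntEnum) -> str:
    """Render an enum default by its underlying value (assumed behavior)."""
    return str(default.value)


# The symbolic name and the rendered default refer to the same setting.
print(f"defaults to {render_enum_default(Resampling.NEAREST)}")
```

The numeric form is what a signature inspection would see, at the cost of being less readable than the enum member name.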
......
...@@ -46,13 +46,13 @@ class FalconConfig(PretrainedConfig): ...@@ -46,13 +46,13 @@ class FalconConfig(PretrainedConfig):
Number of hidden layers in the Transformer decoder. Number of hidden layers in the Transformer decoder.
num_attention_heads (`int`, *optional*, defaults to 71): num_attention_heads (`int`, *optional*, defaults to 71):
Number of attention heads for each attention layer in the Transformer encoder. Number of attention heads for each attention layer in the Transformer encoder.
layer_norm_epsilon (`float`, *optional*, defaults to 1e-05):
The epsilon used by the layer normalization layers.
initializer_range (`float`, *optional*, defaults to 0.02): initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices. The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
use_cache (`bool`, *optional*, defaults to `True`): use_cache (`bool`, *optional*, defaults to `True`):
Whether the model should return the last key/values attentions (not used by all models). Only relevant if Whether the model should return the last key/values attentions (not used by all models). Only relevant if
`config.is_decoder=True`. `config.is_decoder=True`.
layer_norm_epsilon (`float`, *optional*, defaults to 1e-5):
The epsilon used by the layer normalization layers.
hidden_dropout (`float`, *optional*, defaults to 0.0): hidden_dropout (`float`, *optional*, defaults to 0.0):
The dropout probability for MLP layers. The dropout probability for MLP layers.
attention_dropout (`float`, *optional*, defaults to 0.0): attention_dropout (`float`, *optional*, defaults to 0.0):
......
...@@ -207,7 +207,7 @@ class FlaubertTokenizer(PreTrainedTokenizer): ...@@ -207,7 +207,7 @@ class FlaubertTokenizer(PreTrainedTokenizer):
mask_token (`str`, *optional*, defaults to `"<special1>"`): mask_token (`str`, *optional*, defaults to `"<special1>"`):
The token used for masking values. This is the token used when training this model with masked language The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict. modeling. This is the token which the model will try to predict.
additional_special_tokens (`List[str]`, *optional*, defaults to `["<special0>","<special1>","<special2>","<special3>","<special4>","<special5>","<special6>","<special7>","<special8>","<special9>"]`): additional_special_tokens (`List[str]`, *optional*, defaults to `['<special0>', '<special1>', '<special2>', '<special3>', '<special4>', '<special5>', '<special6>', '<special7>', '<special8>', '<special9>']`):
List of additional special tokens. List of additional special tokens.
lang2id (`Dict[str, int]`, *optional*): lang2id (`Dict[str, int]`, *optional*):
Dictionary mapping languages string identifiers to their IDs. Dictionary mapping languages string identifiers to their IDs.
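Note on the `additional_special_tokens` change above: the new default is written with single quotes and a space after each comma, which matches how Python's built-in `repr` prints a list of strings. Whether the docstring check actually uses `repr` is an assumption here. The snippet only shows that the new text agrees with it.

```python
# Rebuild the ten special tokens listed in the docstring.
tokens = [f"<special{i}>" for i in range(10)]

# repr() of a list of str uses single quotes and ", " separators.
rendered = repr(tokens)
print(f"defaults to `{rendered}`")
```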
......
...@@ -52,9 +52,9 @@ class FlavaImageConfig(PretrainedConfig): ...@@ -52,9 +52,9 @@ class FlavaImageConfig(PretrainedConfig):
hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`): hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`, The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported. `"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1): hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler. The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1): attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities. The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02): initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices. The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
...@@ -291,7 +291,7 @@ class FlavaMultimodalConfig(PretrainedConfig): ...@@ -291,7 +291,7 @@ class FlavaMultimodalConfig(PretrainedConfig):
Args: Args:
hidden_size (`int`, *optional*, defaults to 768): hidden_size (`int`, *optional*, defaults to 768):
Dimensionality of the encoder layers and the pooler layer. Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (`int`, *optional*, defaults to 12): num_hidden_layers (`int`, *optional*, defaults to 6):
Number of hidden layers in the Transformer encoder. Number of hidden layers in the Transformer encoder.
num_attention_heads (`int`, *optional*, defaults to 12): num_attention_heads (`int`, *optional*, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder. Number of attention heads for each attention layer in the Transformer encoder.
...@@ -300,9 +300,9 @@ class FlavaMultimodalConfig(PretrainedConfig): ...@@ -300,9 +300,9 @@ class FlavaMultimodalConfig(PretrainedConfig):
hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`): hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`, The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported. `"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1): hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler. The dropout probabilitiy for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1): attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities. The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02): initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices. The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
......
...@@ -33,8 +33,8 @@ class FlavaProcessor(ProcessorMixin): ...@@ -33,8 +33,8 @@ class FlavaProcessor(ProcessorMixin):
[`~FlavaProcessor.__call__`] and [`~FlavaProcessor.decode`] for more information. [`~FlavaProcessor.__call__`] and [`~FlavaProcessor.decode`] for more information.
Args: Args:
image_processor ([`FlavaImageProcessor`]): The image processor is a required input. image_processor ([`FlavaImageProcessor`], *optional*): The image processor is a required input.
tokenizer ([`BertTokenizerFast`]): The tokenizer is a required input. tokenizer ([`BertTokenizerFast`], *optional*): The tokenizer is a required input.
""" """
attributes = ["image_processor", "tokenizer"] attributes = ["image_processor", "tokenizer"]
image_processor_class = "FlavaImageProcessor" image_processor_class = "FlavaImageProcessor"
......
...@@ -67,7 +67,7 @@ class FocalNetConfig(BackboneConfigMixin, PretrainedConfig): ...@@ -67,7 +67,7 @@ class FocalNetConfig(BackboneConfigMixin, PretrainedConfig):
Stochastic depth rate. Stochastic depth rate.
use_layerscale (`bool`, *optional*, defaults to `False`): use_layerscale (`bool`, *optional*, defaults to `False`):
Whether to use layer scale in the encoder. Whether to use layer scale in the encoder.
layerscale_value (`float`, *optional*, defaults to 1e-4): layerscale_value (`float`, *optional*, defaults to 0.0001):
The initial value of the layer scale. The initial value of the layer scale.
use_post_layernorm (`bool`, *optional*, defaults to `False`): use_post_layernorm (`bool`, *optional*, defaults to `False`):
Whether to use post layer normalization in the encoder. Whether to use post layer normalization in the encoder.
...@@ -77,9 +77,9 @@ class FocalNetConfig(BackboneConfigMixin, PretrainedConfig): ...@@ -77,9 +77,9 @@ class FocalNetConfig(BackboneConfigMixin, PretrainedConfig):
Whether to normalize the modulator. Whether to normalize the modulator.
initializer_range (`float`, *optional*, defaults to 0.02): initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices. The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-5): layer_norm_eps (`float`, *optional*, defaults to 1e-05):
The epsilon used by the layer normalization layers. The epsilon used by the layer normalization layers.
encoder_stride (`int`, `optional`, defaults to 32): encoder_stride (`int`, *optional*, defaults to 32):
Factor to increase the spatial resolution by in the decoder head for masked image modeling. Factor to increase the spatial resolution by in the decoder head for masked image modeling.
out_features (`List[str]`, *optional*): out_features (`List[str]`, *optional*):
If used as backbone, list of features to output. Can be any of `"stem"`, `"stage1"`, `"stage2"`, etc. If used as backbone, list of features to output. Can be any of `"stem"`, `"stage1"`, `"stage2"`, etc.
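Note on the float defaults above: `1e-4` became `0.0001` while `1e-5` became `1e-05`. Both spellings match what Python's `repr` produces for those values, since `repr` switches to scientific notation only below `1e-4`. The helper below is a hypothetical illustration of that behavior. It is not the logic in `utils/check_docstrings.py`, which may treat floats differently.

```python
def render_float_default(value: float) -> str:
    """Return the text Python's repr() gives for a float default."""
    return repr(value)


# Compare the hand-written literals with their repr() form.
for literal in (1e-4, 1e-5, 0.02, 0.0):
    print(f"defaults to {render_float_default(literal)}")
```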
......
...@@ -146,13 +146,13 @@ class FSMTTokenizer(PreTrainedTokenizer): ...@@ -146,13 +146,13 @@ class FSMTTokenizer(PreTrainedTokenizer):
this superclass for more information regarding those methods. this superclass for more information regarding those methods.
Args: Args:
langs (`List[str]`): langs (`List[str]`, *optional*):
A list of two languages to translate from and to, for instance `["en", "ru"]`. A list of two languages to translate from and to, for instance `["en", "ru"]`.
src_vocab_file (`str`): src_vocab_file (`str`, *optional*):
File containing the vocabulary for the source language. File containing the vocabulary for the source language.
tgt_vocab_file (`st`): tgt_vocab_file (`st`, *optional*):
File containing the vocabulary for the target language. File containing the vocabulary for the target language.
merges_file (`str`): merges_file (`str`, *optional*):
File containing the merges. File containing the merges.
do_lower_case (`bool`, *optional*, defaults to `False`): do_lower_case (`bool`, *optional*, defaults to `False`):
Whether or not to lowercase the input when tokenizing. Whether or not to lowercase the input when tokenizing.
......