"tests/models/vscode:/vscode.git/clone" did not exist on "38bff8c84f81a7aec58ac1f0d6463de2584f4a52"
Unverified Commit 03af4c42 authored by Sylvain Gugger, committed by GitHub

Docstring check (#26052)



* Fix number of minimal calls to the Hub with peft integration

* Alternate design

* And this way?

* Revert

* Nits to fix

* Add util

* Print when changes are made

* Add list to ignore

* Add more rules

* Manual fixes

* deal with kwargs

* deal with enum defaults

* avoid many digits for floats

* Manual fixes

* Fix regex

* Fix regex

* Auto fix

* Style

* Apply script

* Add ignored list

* Add check that templates are filled

* Adding to CI checks

* Add back semi-fix

* Ignore more objects

* More auto-fixes

* Ignore missing objects

* Remove temp semi-fix

* Fixes

* Update src/transformers/models/pvt/configuration_pvt.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update utils/check_docstrings.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Deal with float defaults

* Fix small defaults

* Address review comment

* Treat

* Post-rebase cleanup

* Address review comment

* Update src/transformers/models/deprecated/mctct/configuration_mctct.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comment

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
parent 122b2657
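The heart of the change is a comparison between the defaults declared in a `__init__` signature and the defaults claimed in the docstring's `Args:` block. A minimal sketch of that idea follows; the function name, regex, and comparison rule are illustrative only, and the real `utils/check_docstrings.py` applies many more normalization rules and keeps an ignore list:

```python
import inspect
import re

# Matches docstring argument headers such as:
#     layer_norm_eps (`float`, *optional*, defaults to 1e-05):
_ARG_LINE = re.compile(r"^(\w+) \(.*?(?:defaults to (`?[^`)]+`?))?\):")


def find_default_mismatches(obj):
    """Return (name, signature_default, documented_default) for arguments whose defaults disagree."""
    documented = {}
    for line in (inspect.getdoc(obj) or "").splitlines():
        match = _ARG_LINE.match(line.strip())
        if match and match.group(2) is not None:
            documented[match.group(1)] = match.group(2).strip("`")

    mismatches = []
    for name, param in inspect.signature(obj).parameters.items():
        if param.default is inspect.Parameter.empty or name not in documented:
            continue
        # Naive string comparison; the real script normalizes floats, enums, strings, etc.
        if documented[name] != str(param.default):
            mismatches.append((name, param.default, documented[name]))
    return mismatches
```

Run against a class like `MptConfig`, a checker along these lines would flag entries such as `resid_pdrop` documented as 0.1 while the signature default is 0.0, which is exactly the kind of correction visible in the hunks below.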
@@ -358,20 +358,17 @@ class MaskFormerImageProcessor(BaseImageProcessor):
sequence like `(width, height)`, output size will be matched to this. If size is an int, smaller edge of
the image will be matched to this number. i.e, if `height > width`, then image will be rescaled to `(size *
height / width, size)`.
max_size (`int`, *optional*, defaults to 1333):
The largest size an image dimension can have (otherwise it's capped). Only has an effect if `do_resize` is
set to `True`.
resample (`int`, *optional*, defaults to `PIL.Image.Resampling.BILINEAR`):
size_divisor (`int`, *optional*, defaults to 32):
Some backbones need images divisible by a certain number. If not passed, it defaults to the value used in
Swin Transformer.
resample (`int`, *optional*, defaults to `Resampling.BILINEAR`):
An optional resampling filter. This can be one of `PIL.Image.Resampling.NEAREST`,
`PIL.Image.Resampling.BOX`, `PIL.Image.Resampling.BILINEAR`, `PIL.Image.Resampling.HAMMING`,
`PIL.Image.Resampling.BICUBIC` or `PIL.Image.Resampling.LANCZOS`. Only has an effect if `do_resize` is set
to `True`.
size_divisor (`int`, *optional*, defaults to 32):
Some backbones need images divisible by a certain number. If not passed, it defaults to the value used in
Swin Transformer.
do_rescale (`bool`, *optional*, defaults to `True`):
Whether to rescale the input to a certain `scale`.
rescale_factor (`float`, *optional*, defaults to 1/ 255):
rescale_factor (`float`, *optional*, defaults to `1/ 255`):
Rescale the input by the given factor. Only has an effect if `do_rescale` is set to `True`.
do_normalize (`bool`, *optional*, defaults to `True`):
Whether or not to normalize the input with mean and standard deviation.
@@ -62,7 +62,7 @@ class MgpstrConfig(PretrainedConfig):
Whether to add a bias to the queries, keys and values.
distilled (`bool`, *optional*, defaults to `False`):
Model includes a distillation token and head as in DeiT models.
layer_norm_eps (`float`, *optional*, defaults to 1e-5):
layer_norm_eps (`float`, *optional*, defaults to 1e-05):
The epsilon used by the layer normalization layers.
drop_rate (`float`, *optional*, defaults to 0.0):
The dropout probability for all fully connected layers in the embeddings, encoder.
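Many hunks only rewrite how a default is spelled: `1e-5` becomes `1e-05`, enum defaults are written with the enum's own class name (`Resampling.BILINEAR` rather than `PILImageResampling.BILINEAR`), and floats are not printed with a long tail of digits. That behaviour falls out naturally if the fixer builds the documented string from the live default value. A hedged sketch, with formatting choices inferred from this diff rather than taken from the actual script:

```python
import enum


def stringify_default(value):
    """Render a signature default the way these docstrings spell it (illustrative rules only)."""
    if isinstance(value, enum.Enum):
        # PILImageResampling.BILINEAR (an alias of PIL.Image.Resampling) renders as
        # `Resampling.BILINEAR`, matching the image processor hunks.
        return f"`{type(value).__name__}.{value.name}`"
    if isinstance(value, bool) or value is None:
        return f"`{value}`"
    if isinstance(value, float):
        # str() already turns 1e-5 into "1e-05"; rounding only guards against defaults
        # such as 1 / 255 printing with sixteen decimal digits.
        return str(value) if abs(value) < 1e-6 else str(round(value, 6))
    if isinstance(value, str):
        return f'`"{value}"`'
    if isinstance(value, (list, tuple, dict)):
        return f"`{value}`"
    return str(value)
```

With int defaults left unquoted and everything else wrapped in backticks, the output lines up with the conventions this diff converges on.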
@@ -44,9 +44,9 @@ class MgpstrProcessor(ProcessorMixin):
[`~MgpstrProcessor.__call__`] and [`~MgpstrProcessor.batch_decode`] for more information.
Args:
image_processor (`ViTImageProcessor`):
image_processor (`ViTImageProcessor`, *optional*):
An instance of `ViTImageProcessor`. The image processor is a required input.
tokenizer ([`MgpstrTokenizer`]):
tokenizer ([`MgpstrTokenizer`], *optional*):
The tokenizer is a required input.
"""
attributes = ["image_processor", "char_tokenizer"]
@@ -52,7 +52,7 @@ class MgpstrTokenizer(PreTrainedTokenizer):
The beginning of sequence token.
eos_token (`str`, *optional*, defaults to `"[s]"`):
The end of sequence token.
pad_token (`str` or `tokenizers.AddedToken`, *optional*, , defaults to `"[GO]"`):
pad_token (`str` or `tokenizers.AddedToken`, *optional*, defaults to `"[GO]"`):
A special token used to make arrays of tokens the same size for batching purpose. Will then be ignored by
attention mechanisms or loss computation.
"""
@@ -55,7 +55,7 @@ class MobileNetV1Config(PretrainedConfig):
All layers will have at least this many channels.
hidden_act (`str` or `function`, *optional*, defaults to `"relu6"`):
The non-linear activation function (function or string) in the Transformer encoder and convolution layers.
tf_padding (`bool`, `optional`, defaults to `True`):
tf_padding (`bool`, *optional*, defaults to `True`):
Whether to use TensorFlow padding rules on the convolution layers.
classifier_dropout_prob (`float`, *optional*, defaults to 0.999):
The dropout ratio for attached classifiers.
@@ -64,16 +64,16 @@ class MobileNetV2Config(PretrainedConfig):
the input dimensions by a factor of 32. If `output_stride` is 8 or 16, the model uses dilated convolutions
on the depthwise layers instead of regular convolutions, so that the feature maps never become more than 8x
or 16x smaller than the input image.
first_layer_is_expansion (`bool`, `optional`, defaults to `True`):
first_layer_is_expansion (`bool`, *optional*, defaults to `True`):
True if the very first convolution layer is also the expansion layer for the first expansion block.
finegrained_output (`bool`, `optional`, defaults to `True`):
finegrained_output (`bool`, *optional*, defaults to `True`):
If true, the number of output channels in the final convolution layer will stay large (1280) even if
`depth_multiplier` is less than 1.
hidden_act (`str` or `function`, *optional*, defaults to `"relu6"`):
The non-linear activation function (function or string) in the Transformer encoder and convolution layers.
tf_padding (`bool`, `optional`, defaults to `True`):
tf_padding (`bool`, *optional*, defaults to `True`):
Whether to use TensorFlow padding rules on the convolution layers.
classifier_dropout_prob (`float`, *optional*, defaults to 0.999):
classifier_dropout_prob (`float`, *optional*, defaults to 0.8):
The dropout ratio for attached classifiers.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
@@ -105,7 +105,7 @@ class MobileNetV2Config(PretrainedConfig):
depth_multiplier=1.0,
depth_divisible_by=8,
min_depth=8,
expand_ratio=6,
expand_ratio=6.0,
output_stride=32,
first_layer_is_expansion=True,
finegrained_output=True,
@@ -74,7 +74,7 @@ class MobileViTConfig(PretrainedConfig):
The non-linear activation function (function or string) in the Transformer encoder and convolution layers.
conv_kernel_size (`int`, *optional*, defaults to 3):
The size of the convolutional kernel in the MobileViT layer.
output_stride (`int`, `optional`, defaults to 32):
output_stride (`int`, *optional*, defaults to 32):
The ratio of the spatial resolution of the output to the resolution of the input image.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout probabilitiy for all fully connected layers in the Transformer encoder.
@@ -84,11 +84,11 @@ class MobileViTConfig(PretrainedConfig):
The dropout ratio for attached classifiers.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-5):
layer_norm_eps (`float`, *optional*, defaults to 1e-05):
The epsilon used by the layer normalization layers.
qkv_bias (`bool`, *optional*, defaults to `True`):
Whether to add a bias to the queries, keys and values.
aspp_out_channels (`int`, `optional`, defaults to 256):
aspp_out_channels (`int`, *optional*, defaults to 256):
Number of output channels used in the ASPP layer for semantic segmentation.
atrous_rates (`List[int]`, *optional*, defaults to `[6, 12, 18]`):
Dilation (atrous) factors used in the ASPP layer for semantic segmentation.
@@ -59,7 +59,7 @@ class MobileViTImageProcessor(BaseImageProcessor):
size (`Dict[str, int]` *optional*, defaults to `{"shortest_edge": 224}`):
Controls the size of the output image after resizing. Can be overridden by the `size` parameter in the
`preprocess` method.
resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BILINEAR`):
resample (`PILImageResampling`, *optional*, defaults to `Resampling.BILINEAR`):
Defines the resampling filter to use if resizing the image. Can be overridden by the `resample` parameter
in the `preprocess` method.
do_rescale (`bool`, *optional*, defaults to `True`):
@@ -54,15 +54,15 @@ class MobileViTV2Config(PretrainedConfig):
The non-linear activation function (function or string) in the Transformer encoder and convolution layers.
conv_kernel_size (`int`, *optional*, defaults to 3):
The size of the convolutional kernel in the MobileViTV2 layer.
output_stride (`int`, `optional`, defaults to 32):
output_stride (`int`, *optional*, defaults to 32):
The ratio of the spatial resolution of the output to the resolution of the input image.
classifier_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for attached classifiers.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-5):
layer_norm_eps (`float`, *optional*, defaults to 1e-05):
The epsilon used by the layer normalization layers.
aspp_out_channels (`int`, `optional`, defaults to 512):
aspp_out_channels (`int`, *optional*, defaults to 512):
Number of output channels used in the ASPP layer for semantic segmentation.
atrous_rates (`List[int]`, *optional*, defaults to `[6, 12, 18]`):
Dilation (atrous) factors used in the ASPP layer for semantic segmentation.
@@ -74,13 +74,13 @@ class MobileViTV2Config(PretrainedConfig):
The number of attention blocks in each MobileViTV2Layer
base_attn_unit_dims (`List[int]`, *optional*, defaults to `[128, 192, 256]`):
The base multiplier for dimensions of attention blocks in each MobileViTV2Layer
width_multiplier (`float`, *optional*, defaults to 1.0)
width_multiplier (`float`, *optional*, defaults to 1.0):
The width multiplier for MobileViTV2.
ffn_multiplier (`int`, *optional*, defaults to 2)
ffn_multiplier (`int`, *optional*, defaults to 2):
The FFN multiplier for MobileViTV2.
attn_dropout (`float`, *optional*, defaults to 0.0)
attn_dropout (`float`, *optional*, defaults to 0.0):
The dropout in the attention layer.
ffn_dropout (`float`, *optional*, defaults to 0.0)
ffn_dropout (`float`, *optional*, defaults to 0.0):
The dropout between FFN layers.
Example:
@@ -145,17 +145,17 @@ class MptConfig(PretrainedConfig):
the `inputs_ids` passed when calling [`MptModel`]. Check [this
discussion](https://huggingface.co/bigscience/mpt/discussions/120#633d28389addb8530b406c2a) on how the
`vocab_size` has been defined.
resid_pdrop (`float`, *optional*, defaults to 0.1):
resid_pdrop (`float`, *optional*, defaults to 0.0):
The dropout probability applied to the attention output before combining with residual.
layer_norm_epsilon (`float`, *optional*, defaults to 1e-5):
layer_norm_epsilon (`float`, *optional*, defaults to 1e-05):
The epsilon to use in the layer normalization layers.
emb_pdrop (`float`, *optional*, defaults to 0.1):
emb_pdrop (`float`, *optional*, defaults to 0.0):
The dropout probability for the embedding layer.
learned_pos_emb (`bool`, *optional*, defaults to `False`):
learned_pos_emb (`bool`, *optional*, defaults to `True`):
Whether to use learned positional embeddings.
attn_config (`dict`, *optional*):
A dictionary used to configure the model's attention module.
init_device (`str`, *optional*):
init_device (`str`, *optional*, defaults to `"cpu"`):
The device to use for parameter initialization. Defined for backward compatibility
logit_scale (`float`, *optional*):
If not None, scale the logits by this value.
@@ -169,7 +169,7 @@ class MptConfig(PretrainedConfig):
norm_type (`str`, *optional*, defaults to `"low_precision_layernorm"`):
Type of layer norm to use. All MPT models uses the same layer norm implementation. Defined for backward
compatibility.
use_cache (`bool`, *optional*, defaults to `True`):
use_cache (`bool`, *optional*, defaults to `False`):
Whether or not the model should return the last key/values attentions (not used by all models).
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
@@ -44,9 +44,9 @@ class NatConfig(BackboneConfigMixin, PretrainedConfig):
The number of input channels.
embed_dim (`int`, *optional*, defaults to 64):
Dimensionality of patch embedding.
depths (`List[int]`, *optional*, defaults to `[2, 2, 6, 2]`):
depths (`List[int]`, *optional*, defaults to `[3, 4, 6, 5]`):
Number of layers in each level of the encoder.
num_heads (`List[int]`, *optional*, defaults to `[3, 6, 12, 24]`):
num_heads (`List[int]`, *optional*, defaults to `[2, 4, 8, 16]`):
Number of attention heads in each layer of the Transformer encoder.
kernel_size (`int`, *optional*, defaults to 7):
Neighborhood Attention kernel size.
@@ -65,7 +65,7 @@ class NatConfig(BackboneConfigMixin, PretrainedConfig):
`"selu"` and `"gelu_new"` are supported.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
layer_norm_eps (`float`, *optional*, defaults to 1e-05):
The epsilon used by the layer normalization layers.
layer_scale_init_value (`float`, *optional*, defaults to 0.0):
The initial value for the layer scale. Disabled if <=0.
@@ -66,7 +66,7 @@ class NougatImageProcessor(BaseImageProcessor):
`do_resize` in the `preprocess` method.
size (`Dict[str, int]` *optional*, defaults to `{"height": 896, "width": 672}`):
Size of the image after resizing. Can be overridden by `size` in the `preprocess` method.
resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BILINEAR`):
resample (`PILImageResampling`, *optional*, defaults to `Resampling.BILINEAR`):
Resampling filter to use if resizing the image. Can be overridden by `resample` in the `preprocess` method.
do_thumbnail (`bool`, *optional*, defaults to `True`):
Whether to resize the image using thumbnail method.
@@ -383,10 +383,10 @@ class NougatTokenizerFast(PreTrainedTokenizerFast):
methods for postprocessing the generated text.
Args:
vocab_file (`str`):
vocab_file (`str`, *optional*):
[SentencePiece](https://github.com/google/sentencepiece) file (generally has a .model extension) that
contains the vocabulary necessary to instantiate a tokenizer.
tokenizer_file (`str`):
tokenizer_file (`str`, *optional*):
[tokenizers](https://github.com/huggingface/tokenizers) file (generally has a .json extension) that
contains everything needed to load the tokenizer.
@@ -394,16 +394,16 @@ class NougatTokenizerFast(PreTrainedTokenizerFast):
Wether to cleanup spaces after decoding, cleanup consists in removing potential artifacts like extra
spaces.
unk_token (`str`, *optional*, defaults to `"<unk>"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead.
bos_token (`str`, *optional*, defaults to `"<s>"`):
The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token.
eos_token (`str`, *optional*, defaults to `"</s>"`):
The end of sequence token.
unk_token (`str`, *optional*, defaults to `"<unk>"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead.
pad_token (`str`, *optional*, defaults to `"<pad>"`):
The token used for padding, for example when batching sequences of different lengths.
"""
@@ -42,87 +42,87 @@ class OneFormerConfig(PretrainedConfig):
documentation from [`PretrainedConfig`] for more information.
Args:
backbone_config (`PretrainedConfig`, *optional*, defaults to `SwinConfig`)
backbone_config (`PretrainedConfig`, *optional*, defaults to `SwinConfig`):
The configuration of the backbone model.
ignore_value (`int`, *optional*, defaults to 255)
ignore_value (`int`, *optional*, defaults to 255):
Values to be ignored in GT label while calculating loss.
num_queries (`int`, *optional*, defaults to 150)
num_queries (`int`, *optional*, defaults to 150):
Number of object queries.
no_object_weight (`float`, *optional*, defaults to 0.1)
no_object_weight (`float`, *optional*, defaults to 0.1):
Weight for no-object class predictions.
class_weight (`float`, *optional*, defaults to 2.0)
class_weight (`float`, *optional*, defaults to 2.0):
Weight for Classification CE loss.
mask_weight (`float`, *optional*, defaults to 5.0)
mask_weight (`float`, *optional*, defaults to 5.0):
Weight for binary CE loss.
dice_weight (`float`, *optional*, defaults to 5.0)
dice_weight (`float`, *optional*, defaults to 5.0):
Weight for dice loss.
contrastive_weight (`float`, *optional*, defaults to 0.5)
contrastive_weight (`float`, *optional*, defaults to 0.5):
Weight for contrastive loss.
contrastive_temperature (`float`, *optional*, defaults to 0.07)
contrastive_temperature (`float`, *optional*, defaults to 0.07):
Initial value for scaling the contrastive logits.
train_num_points (`int`, *optional*, defaults to 12544)
train_num_points (`int`, *optional*, defaults to 12544):
Number of points to sample while calculating losses on mask predictions.
oversample_ratio (`float`, *optional*, defaults to 3.0)
oversample_ratio (`float`, *optional*, defaults to 3.0):
Ratio to decide how many points to oversample.
importance_sample_ratio (`float`, *optional*, defaults to 0.75)
importance_sample_ratio (`float`, *optional*, defaults to 0.75):
Ratio of points that are sampled via importance sampling.
init_std (`float`, *optional*, defaults to 0.02)
init_std (`float`, *optional*, defaults to 0.02):
Standard deviation for normal intialization.
init_xavier_std (`float`, *optional*, defaults to 0.02)
init_xavier_std (`float`, *optional*, defaults to 1.0):
Standard deviation for xavier uniform initialization.
layer_norm_eps (`float`, *optional*, defaults to 1e-05)
layer_norm_eps (`float`, *optional*, defaults to 1e-05):
Epsilon for layer normalization.
is_training (`bool`, *optional*, defaults to False)
is_training (`bool`, *optional*, defaults to `False`):
Whether to run in training or inference mode.
use_auxiliary_loss (`bool`, *optional*, defaults to True)
use_auxiliary_loss (`bool`, *optional*, defaults to `True`):
Whether to calculate loss using intermediate predictions from transformer decoder.
output_auxiliary_logits (`bool`, *optional*, defaults to True)
output_auxiliary_logits (`bool`, *optional*, defaults to `True`):
Whether to return intermediate predictions from transformer decoder.
strides (`list`, *optional*, defaults to [4, 8, 16, 32])
strides (`list`, *optional*, defaults to `[4, 8, 16, 32]`):
List containing the strides for feature maps in the encoder.
task_seq_len (`int`, *optional*, defaults to 77)
task_seq_len (`int`, *optional*, defaults to 77):
Sequence length for tokenizing text list input.
text_encoder_width (`int`, *optional*, defaults to 256)
text_encoder_width (`int`, *optional*, defaults to 256):
Hidden size for text encoder.
text_encoder_context_length (`int`, *optional*, defaults to 77):
Input sequence length for text encoder.
text_encoder_num_layers (`int`, *optional*, defaults to 6)
text_encoder_num_layers (`int`, *optional*, defaults to 6):
Number of layers for transformer in text encoder.
text_encoder_vocab_size (`int`, *optional*, defaults to 49408)
text_encoder_vocab_size (`int`, *optional*, defaults to 49408):
Vocabulary size for tokenizer.
text_encoder_proj_layers (`int`, *optional*, defaults to 2)
text_encoder_proj_layers (`int`, *optional*, defaults to 2):
Number of layers in MLP for project text queries.
text_encoder_n_ctx (`int`, *optional*, defaults to 16)
text_encoder_n_ctx (`int`, *optional*, defaults to 16):
Number of learnable text context queries.
conv_dim (`int`, *optional*, defaults to 256)
conv_dim (`int`, *optional*, defaults to 256):
Feature map dimension to map outputs from the backbone.
mask_dim (`int`, *optional*, defaults to 256)
mask_dim (`int`, *optional*, defaults to 256):
Dimension for feature maps in pixel decoder.
hidden_dim (`int`, *optional*, defaults to 256)
hidden_dim (`int`, *optional*, defaults to 256):
Dimension for hidden states in transformer decoder.
encoder_feedforward_dim (`int`, *optional*, defaults to 1024)
encoder_feedforward_dim (`int`, *optional*, defaults to 1024):
Dimension for FFN layer in pixel decoder.
norm (`str`, *optional*, defaults to `GN`)
norm (`str`, *optional*, defaults to `"GN"`):
Type of normalization.
encoder_layers (`int`, *optional*, defaults to 6)
encoder_layers (`int`, *optional*, defaults to 6):
Number of layers in pixel decoder.
decoder_layers (`int`, *optional*, defaults to 10)
decoder_layers (`int`, *optional*, defaults to 10):
Number of layers in transformer decoder.
use_task_norm (`bool`, *optional*, defaults to `True`)
use_task_norm (`bool`, *optional*, defaults to `True`):
Whether to normalize the task token.
num_attention_heads (`int`, *optional*, defaults to 8)
num_attention_heads (`int`, *optional*, defaults to 8):
Number of attention heads in transformer layers in the pixel and transformer decoders.
dropout (`float`, *optional*, defaults to 0.1)
dropout (`float`, *optional*, defaults to 0.1):
Dropout probability for pixel and transformer decoders.
dim_feedforward (`int`, *optional*, defaults to 2048)
dim_feedforward (`int`, *optional*, defaults to 2048):
Dimension for FFN layer in transformer decoder.
pre_norm (`bool`, *optional*, defaults to `False`)
pre_norm (`bool`, *optional*, defaults to `False`):
Whether to normalize hidden states before attention layers in transformer decoder.
enforce_input_proj (`bool`, *optional*, defaults to `False`)
enforce_input_proj (`bool`, *optional*, defaults to `False`):
Whether to project hidden states in transformer decoder.
query_dec_layers (`int`, *optional*, defaults to 2)
query_dec_layers (`int`, *optional*, defaults to 2):
Number of layers in query transformer.
common_stride (`int`, *optional*, defaults to 4)
common_stride (`int`, *optional*, defaults to 4):
Common stride used for features in pixel decoder.
Examples:
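The OneFormerConfig hunk above is almost entirely about the shape of the argument header line itself: the trailing colon that attaches the indented description to the argument, the `*optional*` marker instead of `` `optional` ``, and stray double commas. Those are the kind of rewrites an auto-fixer can apply line by line; a hedged sketch of such per-line rules, with the function name and regexes being illustrative rather than the actual implementation:

```python
import re


def fix_arg_header(line):
    """Normalize a single docstring argument header line (illustrative auto-fix rules)."""
    # `optional` -> *optional*, as in the MobileNet/MobileViT hunks
    line = line.replace("`optional`", "*optional*")
    # Collapse accidental double commas such as "*optional*, ,"
    line = re.sub(r",\s*,", ",", line)
    # Headers like "width_multiplier (`float`, *optional*, defaults to 1.0)"
    # must end with "):" so the description below belongs to the argument.
    if re.match(r"^\s*\w+ \(.*\)$", line.rstrip()):
        line = line.rstrip() + ":"
    return line
```

String defaults such as `GN` would additionally be backtick-quoted as `"GN"` by a value formatter like the one sketched earlier.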
@@ -361,17 +361,14 @@ class OneFormerImageProcessor(BaseImageProcessor):
sequence like `(width, height)`, output size will be matched to this. If size is an int, smaller edge of
the image will be matched to this number. i.e, if `height > width`, then image will be rescaled to `(size *
height / width, size)`.
max_size (`int`, *optional*, defaults to 1333):
The largest size an image dimension can have (otherwise it's capped). Only has an effect if `do_resize` is
set to `True`.
resample (`int`, *optional*, defaults to `PIL.Image.Resampling.BILINEAR`):
resample (`int`, *optional*, defaults to `Resampling.BILINEAR`):
An optional resampling filter. This can be one of `PIL.Image.Resampling.NEAREST`,
`PIL.Image.Resampling.BOX`, `PIL.Image.Resampling.BILINEAR`, `PIL.Image.Resampling.HAMMING`,
`PIL.Image.Resampling.BICUBIC` or `PIL.Image.Resampling.LANCZOS`. Only has an effect if `do_resize` is set
to `True`.
do_rescale (`bool`, *optional*, defaults to `True`):
Whether to rescale the input to a certain `scale`.
rescale_factor (`float`, *optional*, defaults to 1/ 255):
rescale_factor (`float`, *optional*, defaults to `1/ 255`):
Rescale the input by the given factor. Only has an effect if `do_rescale` is set to `True`.
do_normalize (`bool`, *optional*, defaults to `True`):
Whether or not to normalize the input with mean and standard deviation.
@@ -387,9 +384,9 @@ class OneFormerImageProcessor(BaseImageProcessor):
Whether or not to decrement all label values of segmentation maps by 1. Usually used for datasets where 0
is used for background, and background itself is not included in all classes of a dataset (e.g. ADE20k).
The background label will be replaced by `ignore_index`.
repo_path (`str`, defaults to `shi-labs/oneformer_demo`):
repo_path (`str`, defaults to `shi-labs/oneformer_demo`, *optional*, defaults to `"shi-labs/oneformer_demo"`):
Dataset repository on huggingface hub containing the JSON file with class information for the dataset.
class_info_file (`str`):
class_info_file (`str`, *optional*):
JSON file containing class information for the dataset. It is stored inside on the `repo_path` dataset
repository.
num_text (`int`, *optional*):
@@ -56,7 +56,7 @@ class OpenAIGPTConfig(PretrainedConfig):
The dropout ratio for the embeddings.
attn_pdrop (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention.
layer_norm_epsilon (`float`, *optional*, defaults to 1e-5):
layer_norm_epsilon (`float`, *optional*, defaults to 1e-05):
The epsilon to use in the layer normalization layers
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
@@ -91,8 +91,6 @@ class OpenAIGPTConfig(PretrainedConfig):
[`OpenAIGPTDoubleHeadsModel`].
The dropout ratio to be used after the projection and activation.
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models).
Examples:
@@ -171,13 +171,13 @@ class OwlViTVisionConfig(PretrainedConfig):
hidden_act (`str` or `function`, *optional*, defaults to `"quick_gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` ``"quick_gelu"` are supported.
layer_norm_eps (`float`, *optional*, defaults to 1e-5):
layer_norm_eps (`float`, *optional*, defaults to 1e-05):
The epsilon used by the layer normalization layers.
attention_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
initializer_factor (`float``, *optional*, defaults to 1):
initializer_factor (`float``, *optional*, defaults to 1.0):
A factor for initializing all weight matrices (should be kept to 1, used internally for initialization
testing).
@@ -102,7 +102,7 @@ class OwlViTImageProcessor(BaseImageProcessor):
The size to use for resizing the image. Only has an effect if `do_resize` is set to `True`. If `size` is a
sequence like (h, w), output size will be matched to this. If `size` is an int, then image will be resized
to (size, size).
resample (`int`, *optional*, defaults to `PIL.Image.Resampling.BICUBIC`):
resample (`int`, *optional*, defaults to `Resampling.BICUBIC`):
An optional resampling filter. This can be one of `PIL.Image.Resampling.NEAREST`,
`PIL.Image.Resampling.BOX`, `PIL.Image.Resampling.BILINEAR`, `PIL.Image.Resampling.HAMMING`,
`PIL.Image.Resampling.BICUBIC` or `PIL.Image.Resampling.LANCZOS`. Only has an effect if `do_resize` is set
@@ -33,9 +33,9 @@ class OwlViTProcessor(ProcessorMixin):
[`~OwlViTProcessor.__call__`] and [`~OwlViTProcessor.decode`] for more information.
Args:
image_processor ([`OwlViTImageProcessor`]):
image_processor ([`OwlViTImageProcessor`], *optional*):
The image processor is a required input.
tokenizer ([`CLIPTokenizer`, `CLIPTokenizerFast`]):
tokenizer ([`CLIPTokenizer`, `CLIPTokenizerFast`], *optional*):
The tokenizer is a required input.
"""
attributes = ["image_processor", "tokenizer"]
@@ -65,7 +65,7 @@ class PerceiverConfig(PretrainedConfig):
v_channels (`int`, *optional*):
Dimension to project the values before applying attention in the cross-attention and self-attention layers
of the encoder. Will default to preserving the dimension of the queries if not specified.
cross_attention_shape_for_attention (`str`, *optional*, defaults to `'kv'`):
cross_attention_shape_for_attention (`str`, *optional*, defaults to `"kv"`):
Dimension to use when downsampling the queries and keys in the cross-attention layer of the encoder.
self_attention_widening_factor (`int`, *optional*, defaults to 1):
Dimension of the feed-forward layer in the cross-attention layer of the Transformer encoder.
@@ -89,7 +89,7 @@ class PerceiverConfig(PretrainedConfig):
this to something large just in case (e.g., 512 or 1024 or 2048).
image_size (`int`, *optional*, defaults to 56):
Size of the images after preprocessing, for [`PerceiverForImageClassificationLearned`].
train_size (`List[int]`, *optional*, defaults to [368, 496]):
train_size (`List[int]`, *optional*, defaults to `[368, 496]`):
Training size of the images for the optical flow model.
num_frames (`int`, *optional*, defaults to 16):
Number of video frames used for the multimodal autoencoding model.
@@ -97,11 +97,11 @@ class PerceiverConfig(PretrainedConfig):
Number of audio samples per frame for the multimodal autoencoding model.
samples_per_patch (`int`, *optional*, defaults to 16):
Number of audio samples per patch when preprocessing the audio for the multimodal autoencoding model.
output_num_channels (`int`, *optional*, defaults to 512):
Number of output channels for each modalitiy decoder.
output_shape (`List[int]`, *optional*, defaults to `[1, 16, 224, 224]`):
Shape of the output (batch_size, num_frames, height, width) for the video decoder queries of the multimodal
autoencoding model. This excludes the channel dimension.
output_num_channels (`int`, *optional*, defaults to 512):
Number of output channels for each modalitiy decoder.
Example:
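Finally, the commit message mentions a check that auto-generated templates are actually filled in, wired into CI. Conceptually that is just a scan for leftover placeholders before the job passes; the placeholder strings below are assumptions for illustration, not necessarily what `check_docstrings.py` emits:

```python
# Hypothetical placeholder markers; the real script may use different strings.
PLACEHOLDERS = ("<fill_type>", "<fill_docstring>")


def check_templates_filled(docstring, qualified_name):
    """Fail (as a CI-style check would) if auto-generated docstring entries were never filled in."""
    leftovers = [marker for marker in PLACEHOLDERS if marker in (docstring or "")]
    if leftovers:
        raise ValueError(
            f"{qualified_name} still contains unfilled docstring templates: {leftovers}"
        )
```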