Unverified Commit 03af4c42 authored by Sylvain Gugger, committed by GitHub

Docstring check (#26052)



* Fix number of minimal calls to the Hub with peft integration

* Alternate design

* And this way?

* Revert

* Nits to fix

* Add util

* Print when changes are made

* Add list to ignore

* Add more rules

* Manual fixes

* deal with kwargs

* deal with enum defaults

* avoid many digits for floats

* Manual fixes

* Fix regex

* Fix regex

* Auto fix

* Style

* Apply script

* Add ignored list

* Add check that templates are filled

* Adding to CI checks

* Add back semi-fix

* Ignore more objects

* More auto-fixes

* Ignore missing objects

* Remove temp semi-fix

* Fixes

* Update src/transformers/models/pvt/configuration_pvt.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update utils/check_docstrings.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Deal with float defaults

* Fix small defaults

* Address review comment

* Treat

* Post-rebase cleanup

* Address review comment

* Update src/transformers/models/deprecated/mctct/configuration_mctct.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comment

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
parent 122b2657
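The hunks below are the docstring corrections enforced by the new `utils/check_docstrings.py` utility added in this PR. As a rough, illustrative sketch of the idea the check enforces (this is not the actual implementation of the script), the defaults documented in a class docstring have to agree with the defaults in that class's `__init__` signature:

```python
# Minimal sketch of the invariant the docstring check enforces (illustrative only,
# not the real utils/check_docstrings.py logic): documented defaults must match
# the defaults found in the __init__ signature.
import inspect

from transformers import OpenAIGPTConfig

# Collect the real defaults from the signature.
signature_defaults = {
    name: param.default
    for name, param in inspect.signature(OpenAIGPTConfig.__init__).parameters.items()
    if param.default is not inspect.Parameter.empty
}

# The docstring used to say `1e-5`; the fix below normalizes it to `1e-05`, which is repr(1e-5).
print(signature_defaults["layer_norm_epsilon"])  # 1e-05
```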
@@ -358,20 +358,17 @@ class MaskFormerImageProcessor(BaseImageProcessor):
sequence like `(width, height)`, output size will be matched to this. If size is an int, smaller edge of
the image will be matched to this number. i.e, if `height > width`, then image will be rescaled to `(size *
height / width, size)`.
- max_size (`int`, *optional*, defaults to 1333):
- The largest size an image dimension can have (otherwise it's capped). Only has an effect if `do_resize` is
- set to `True`.
+ size_divisor (`int`, *optional*, defaults to 32):
+ Some backbones need images divisible by a certain number. If not passed, it defaults to the value used in
+ Swin Transformer.
- resample (`int`, *optional*, defaults to `PIL.Image.Resampling.BILINEAR`):
+ resample (`int`, *optional*, defaults to `Resampling.BILINEAR`):
An optional resampling filter. This can be one of `PIL.Image.Resampling.NEAREST`,
`PIL.Image.Resampling.BOX`, `PIL.Image.Resampling.BILINEAR`, `PIL.Image.Resampling.HAMMING`,
`PIL.Image.Resampling.BICUBIC` or `PIL.Image.Resampling.LANCZOS`. Only has an effect if `do_resize` is set
to `True`.
- size_divisor (`int`, *optional*, defaults to 32):
- Some backbones need images divisible by a certain number. If not passed, it defaults to the value used in
- Swin Transformer.
do_rescale (`bool`, *optional*, defaults to `True`):
Whether to rescale the input to a certain `scale`.
- rescale_factor (`float`, *optional*, defaults to 1/ 255):
+ rescale_factor (`float`, *optional*, defaults to `1/ 255`):
Rescale the input by the given factor. Only has an effect if `do_rescale` is set to `True`.
do_normalize (`bool`, *optional*, defaults to `True`):
Whether or not to normalize the input with mean and standard deviation.
...
@@ -62,7 +62,7 @@ class MgpstrConfig(PretrainedConfig):
Whether to add a bias to the queries, keys and values.
distilled (`bool`, *optional*, defaults to `False`):
Model includes a distillation token and head as in DeiT models.
- layer_norm_eps (`float`, *optional*, defaults to 1e-5):
+ layer_norm_eps (`float`, *optional*, defaults to 1e-05):
The epsilon used by the layer normalization layers.
drop_rate (`float`, *optional*, defaults to 0.0):
The dropout probability for all fully connected layers in the embeddings, encoder.
...
@@ -44,9 +44,9 @@ class MgpstrProcessor(ProcessorMixin):
[`~MgpstrProcessor.__call__`] and [`~MgpstrProcessor.batch_decode`] for more information.
Args:
- image_processor (`ViTImageProcessor`):
+ image_processor (`ViTImageProcessor`, *optional*):
An instance of `ViTImageProcessor`. The image processor is a required input.
- tokenizer ([`MgpstrTokenizer`]):
+ tokenizer ([`MgpstrTokenizer`], *optional*):
The tokenizer is a required input.
"""
attributes = ["image_processor", "char_tokenizer"]
...
@@ -52,7 +52,7 @@ class MgpstrTokenizer(PreTrainedTokenizer):
The beginning of sequence token.
eos_token (`str`, *optional*, defaults to `"[s]"`):
The end of sequence token.
- pad_token (`str` or `tokenizers.AddedToken`, *optional*, , defaults to `"[GO]"`):
+ pad_token (`str` or `tokenizers.AddedToken`, *optional*, defaults to `"[GO]"`):
A special token used to make arrays of tokens the same size for batching purpose. Will then be ignored by
attention mechanisms or loss computation.
"""
...
@@ -55,7 +55,7 @@ class MobileNetV1Config(PretrainedConfig):
All layers will have at least this many channels.
hidden_act (`str` or `function`, *optional*, defaults to `"relu6"`):
The non-linear activation function (function or string) in the Transformer encoder and convolution layers.
- tf_padding (`bool`, `optional`, defaults to `True`):
+ tf_padding (`bool`, *optional*, defaults to `True`):
Whether to use TensorFlow padding rules on the convolution layers.
classifier_dropout_prob (`float`, *optional*, defaults to 0.999):
The dropout ratio for attached classifiers.
...
@@ -64,16 +64,16 @@ class MobileNetV2Config(PretrainedConfig):
the input dimensions by a factor of 32. If `output_stride` is 8 or 16, the model uses dilated convolutions
on the depthwise layers instead of regular convolutions, so that the feature maps never become more than 8x
or 16x smaller than the input image.
- first_layer_is_expansion (`bool`, `optional`, defaults to `True`):
+ first_layer_is_expansion (`bool`, *optional*, defaults to `True`):
True if the very first convolution layer is also the expansion layer for the first expansion block.
- finegrained_output (`bool`, `optional`, defaults to `True`):
+ finegrained_output (`bool`, *optional*, defaults to `True`):
If true, the number of output channels in the final convolution layer will stay large (1280) even if
`depth_multiplier` is less than 1.
hidden_act (`str` or `function`, *optional*, defaults to `"relu6"`):
The non-linear activation function (function or string) in the Transformer encoder and convolution layers.
- tf_padding (`bool`, `optional`, defaults to `True`):
+ tf_padding (`bool`, *optional*, defaults to `True`):
Whether to use TensorFlow padding rules on the convolution layers.
- classifier_dropout_prob (`float`, *optional*, defaults to 0.999):
+ classifier_dropout_prob (`float`, *optional*, defaults to 0.8):
The dropout ratio for attached classifiers.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
@@ -105,7 +105,7 @@ class MobileNetV2Config(PretrainedConfig):
depth_multiplier=1.0,
depth_divisible_by=8,
min_depth=8,
- expand_ratio=6,
+ expand_ratio=6.0,
output_stride=32,
first_layer_is_expansion=True,
finegrained_output=True,
...
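A quick way to confirm the corrected MobileNetV2 defaults shown in the hunk above; the printed values come from the signature in this diff (a small usage sketch, not part of the change):

```python
from transformers import MobileNetV2Config

config = MobileNetV2Config()
print(config.classifier_dropout_prob)  # 0.8, not 0.999 as the old docstring claimed
print(config.expand_ratio)             # 6.0, matching the signature fix above
print(config.tf_padding)               # True
```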
@@ -74,7 +74,7 @@ class MobileViTConfig(PretrainedConfig):
The non-linear activation function (function or string) in the Transformer encoder and convolution layers.
conv_kernel_size (`int`, *optional*, defaults to 3):
The size of the convolutional kernel in the MobileViT layer.
- output_stride (`int`, `optional`, defaults to 32):
+ output_stride (`int`, *optional*, defaults to 32):
The ratio of the spatial resolution of the output to the resolution of the input image.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout probabilitiy for all fully connected layers in the Transformer encoder.
@@ -84,11 +84,11 @@ class MobileViTConfig(PretrainedConfig):
The dropout ratio for attached classifiers.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
- layer_norm_eps (`float`, *optional*, defaults to 1e-5):
+ layer_norm_eps (`float`, *optional*, defaults to 1e-05):
The epsilon used by the layer normalization layers.
qkv_bias (`bool`, *optional*, defaults to `True`):
Whether to add a bias to the queries, keys and values.
- aspp_out_channels (`int`, `optional`, defaults to 256):
+ aspp_out_channels (`int`, *optional*, defaults to 256):
Number of output channels used in the ASPP layer for semantic segmentation.
atrous_rates (`List[int]`, *optional*, defaults to `[6, 12, 18]`):
Dilation (atrous) factors used in the ASPP layer for semantic segmentation.
...
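A short usage sketch for the MobileViT arguments touched above; only the printed defaults are taken from the diff, the override values in the last line are arbitrary examples:

```python
from transformers import MobileViTConfig

config = MobileViTConfig()
print(config.output_stride)      # 32
print(config.aspp_out_channels)  # 256
print(config.layer_norm_eps)     # 1e-05

# Arguments can be overridden at construction time, e.g. for a custom segmentation head.
custom = MobileViTConfig(aspp_out_channels=512, atrous_rates=[12, 24, 36])
```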
@@ -59,7 +59,7 @@ class MobileViTImageProcessor(BaseImageProcessor):
size (`Dict[str, int]` *optional*, defaults to `{"shortest_edge": 224}`):
Controls the size of the output image after resizing. Can be overridden by the `size` parameter in the
`preprocess` method.
- resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BILINEAR`):
+ resample (`PILImageResampling`, *optional*, defaults to `Resampling.BILINEAR`):
Defines the resampling filter to use if resizing the image. Can be overridden by the `resample` parameter
in the `preprocess` method.
do_rescale (`bool`, *optional*, defaults to `True`):
...
@@ -54,15 +54,15 @@ class MobileViTV2Config(PretrainedConfig):
The non-linear activation function (function or string) in the Transformer encoder and convolution layers.
conv_kernel_size (`int`, *optional*, defaults to 3):
The size of the convolutional kernel in the MobileViTV2 layer.
- output_stride (`int`, `optional`, defaults to 32):
+ output_stride (`int`, *optional*, defaults to 32):
The ratio of the spatial resolution of the output to the resolution of the input image.
classifier_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for attached classifiers.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
- layer_norm_eps (`float`, *optional*, defaults to 1e-5):
+ layer_norm_eps (`float`, *optional*, defaults to 1e-05):
The epsilon used by the layer normalization layers.
- aspp_out_channels (`int`, `optional`, defaults to 512):
+ aspp_out_channels (`int`, *optional*, defaults to 512):
Number of output channels used in the ASPP layer for semantic segmentation.
atrous_rates (`List[int]`, *optional*, defaults to `[6, 12, 18]`):
Dilation (atrous) factors used in the ASPP layer for semantic segmentation.
@@ -74,13 +74,13 @@ class MobileViTV2Config(PretrainedConfig):
The number of attention blocks in each MobileViTV2Layer
base_attn_unit_dims (`List[int]`, *optional*, defaults to `[128, 192, 256]`):
The base multiplier for dimensions of attention blocks in each MobileViTV2Layer
- width_multiplier (`float`, *optional*, defaults to 1.0)
+ width_multiplier (`float`, *optional*, defaults to 1.0):
The width multiplier for MobileViTV2.
- ffn_multiplier (`int`, *optional*, defaults to 2)
+ ffn_multiplier (`int`, *optional*, defaults to 2):
The FFN multiplier for MobileViTV2.
- attn_dropout (`float`, *optional*, defaults to 0.0)
+ attn_dropout (`float`, *optional*, defaults to 0.0):
The dropout in the attention layer.
- ffn_dropout (`float`, *optional*, defaults to 0.0)
+ ffn_dropout (`float`, *optional*, defaults to 0.0):
The dropout between FFN layers.
Example:
...
@@ -145,17 +145,17 @@ class MptConfig(PretrainedConfig):
the `inputs_ids` passed when calling [`MptModel`]. Check [this
discussion](https://huggingface.co/bigscience/mpt/discussions/120#633d28389addb8530b406c2a) on how the
`vocab_size` has been defined.
- resid_pdrop (`float`, *optional*, defaults to 0.1):
+ resid_pdrop (`float`, *optional*, defaults to 0.0):
The dropout probability applied to the attention output before combining with residual.
- layer_norm_epsilon (`float`, *optional*, defaults to 1e-5):
+ layer_norm_epsilon (`float`, *optional*, defaults to 1e-05):
The epsilon to use in the layer normalization layers.
- emb_pdrop (`float`, *optional*, defaults to 0.1):
+ emb_pdrop (`float`, *optional*, defaults to 0.0):
The dropout probability for the embedding layer.
- learned_pos_emb (`bool`, *optional*, defaults to `False`):
+ learned_pos_emb (`bool`, *optional*, defaults to `True`):
Whether to use learned positional embeddings.
attn_config (`dict`, *optional*):
A dictionary used to configure the model's attention module.
- init_device (`str`, *optional*):
+ init_device (`str`, *optional*, defaults to `"cpu"`):
The device to use for parameter initialization. Defined for backward compatibility
logit_scale (`float`, *optional*):
If not None, scale the logits by this value.
@@ -169,7 +169,7 @@ class MptConfig(PretrainedConfig):
norm_type (`str`, *optional*, defaults to `"low_precision_layernorm"`):
Type of layer norm to use. All MPT models uses the same layer norm implementation. Defined for backward
compatibility.
- use_cache (`bool`, *optional*, defaults to `True`):
+ use_cache (`bool`, *optional*, defaults to `False`):
Whether or not the model should return the last key/values attentions (not used by all models).
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
...
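The MPT hunks above bring the documented defaults in line with the signature; a small sketch verifying them, with the attribute names exactly as documented above:

```python
from transformers import MptConfig

config = MptConfig()
print(config.resid_pdrop)      # 0.0
print(config.emb_pdrop)        # 0.0
print(config.learned_pos_emb)  # True
print(config.use_cache)        # False

# As with any PretrainedConfig, the defaults can still be overridden explicitly.
config = MptConfig(use_cache=True, resid_pdrop=0.1)
```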
@@ -44,9 +44,9 @@ class NatConfig(BackboneConfigMixin, PretrainedConfig):
The number of input channels.
embed_dim (`int`, *optional*, defaults to 64):
Dimensionality of patch embedding.
- depths (`List[int]`, *optional*, defaults to `[2, 2, 6, 2]`):
+ depths (`List[int]`, *optional*, defaults to `[3, 4, 6, 5]`):
Number of layers in each level of the encoder.
- num_heads (`List[int]`, *optional*, defaults to `[3, 6, 12, 24]`):
+ num_heads (`List[int]`, *optional*, defaults to `[2, 4, 8, 16]`):
Number of attention heads in each layer of the Transformer encoder.
kernel_size (`int`, *optional*, defaults to 7):
Neighborhood Attention kernel size.
@@ -65,7 +65,7 @@ class NatConfig(BackboneConfigMixin, PretrainedConfig):
`"selu"` and `"gelu_new"` are supported.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
- layer_norm_eps (`float`, *optional*, defaults to 1e-12):
+ layer_norm_eps (`float`, *optional*, defaults to 1e-05):
The epsilon used by the layer normalization layers.
layer_scale_init_value (`float`, *optional*, defaults to 0.0):
The initial value for the layer scale. Disabled if <=0.
...
@@ -66,7 +66,7 @@ class NougatImageProcessor(BaseImageProcessor):
`do_resize` in the `preprocess` method.
size (`Dict[str, int]` *optional*, defaults to `{"height": 896, "width": 672}`):
Size of the image after resizing. Can be overridden by `size` in the `preprocess` method.
- resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BILINEAR`):
+ resample (`PILImageResampling`, *optional*, defaults to `Resampling.BILINEAR`):
Resampling filter to use if resizing the image. Can be overridden by `resample` in the `preprocess` method.
do_thumbnail (`bool`, *optional*, defaults to `True`):
Whether to resize the image using thumbnail method.
...
@@ -383,10 +383,10 @@ class NougatTokenizerFast(PreTrainedTokenizerFast):
methods for postprocessing the generated text.
Args:
- vocab_file (`str`):
+ vocab_file (`str`, *optional*):
[SentencePiece](https://github.com/google/sentencepiece) file (generally has a .model extension) that
contains the vocabulary necessary to instantiate a tokenizer.
- tokenizer_file (`str`):
+ tokenizer_file (`str`, *optional*):
[tokenizers](https://github.com/huggingface/tokenizers) file (generally has a .json extension) that
contains everything needed to load the tokenizer.
@@ -394,16 +394,16 @@ class NougatTokenizerFast(PreTrainedTokenizerFast):
Wether to cleanup spaces after decoding, cleanup consists in removing potential artifacts like extra
spaces.
+ unk_token (`str`, *optional*, defaults to `"<unk>"`):
+ The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
+ token instead.
bos_token (`str`, *optional*, defaults to `"<s>"`):
The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token.
eos_token (`str`, *optional*, defaults to `"</s>"`):
The end of sequence token.
- unk_token (`str`, *optional*, defaults to `"<unk>"`):
- The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
- token instead.
pad_token (`str`, *optional*, defaults to `"<pad>"`):
The token used for padding, for example when batching sequences of different lengths.
"""
...
@@ -42,87 +42,87 @@ class OneFormerConfig(PretrainedConfig):
documentation from [`PretrainedConfig`] for more information.
Args:
- backbone_config (`PretrainedConfig`, *optional*, defaults to `SwinConfig`)
+ backbone_config (`PretrainedConfig`, *optional*, defaults to `SwinConfig`):
The configuration of the backbone model.
- ignore_value (`int`, *optional*, defaults to 255)
+ ignore_value (`int`, *optional*, defaults to 255):
Values to be ignored in GT label while calculating loss.
- num_queries (`int`, *optional*, defaults to 150)
+ num_queries (`int`, *optional*, defaults to 150):
Number of object queries.
- no_object_weight (`float`, *optional*, defaults to 0.1)
+ no_object_weight (`float`, *optional*, defaults to 0.1):
Weight for no-object class predictions.
- class_weight (`float`, *optional*, defaults to 2.0)
+ class_weight (`float`, *optional*, defaults to 2.0):
Weight for Classification CE loss.
- mask_weight (`float`, *optional*, defaults to 5.0)
+ mask_weight (`float`, *optional*, defaults to 5.0):
Weight for binary CE loss.
- dice_weight (`float`, *optional*, defaults to 5.0)
+ dice_weight (`float`, *optional*, defaults to 5.0):
Weight for dice loss.
- contrastive_weight (`float`, *optional*, defaults to 0.5)
+ contrastive_weight (`float`, *optional*, defaults to 0.5):
Weight for contrastive loss.
- contrastive_temperature (`float`, *optional*, defaults to 0.07)
+ contrastive_temperature (`float`, *optional*, defaults to 0.07):
Initial value for scaling the contrastive logits.
- train_num_points (`int`, *optional*, defaults to 12544)
+ train_num_points (`int`, *optional*, defaults to 12544):
Number of points to sample while calculating losses on mask predictions.
- oversample_ratio (`float`, *optional*, defaults to 3.0)
+ oversample_ratio (`float`, *optional*, defaults to 3.0):
Ratio to decide how many points to oversample.
- importance_sample_ratio (`float`, *optional*, defaults to 0.75)
+ importance_sample_ratio (`float`, *optional*, defaults to 0.75):
Ratio of points that are sampled via importance sampling.
- init_std (`float`, *optional*, defaults to 0.02)
+ init_std (`float`, *optional*, defaults to 0.02):
Standard deviation for normal intialization.
- init_xavier_std (`float`, *optional*, defaults to 0.02)
+ init_xavier_std (`float`, *optional*, defaults to 1.0):
Standard deviation for xavier uniform initialization.
- layer_norm_eps (`float`, *optional*, defaults to 1e-05)
+ layer_norm_eps (`float`, *optional*, defaults to 1e-05):
Epsilon for layer normalization.
- is_training (`bool`, *optional*, defaults to False)
+ is_training (`bool`, *optional*, defaults to `False`):
Whether to run in training or inference mode.
- use_auxiliary_loss (`bool`, *optional*, defaults to True)
+ use_auxiliary_loss (`bool`, *optional*, defaults to `True`):
Whether to calculate loss using intermediate predictions from transformer decoder.
- output_auxiliary_logits (`bool`, *optional*, defaults to True)
+ output_auxiliary_logits (`bool`, *optional*, defaults to `True`):
Whether to return intermediate predictions from transformer decoder.
- strides (`list`, *optional*, defaults to [4, 8, 16, 32])
+ strides (`list`, *optional*, defaults to `[4, 8, 16, 32]`):
List containing the strides for feature maps in the encoder.
- task_seq_len (`int`, *optional*, defaults to 77)
+ task_seq_len (`int`, *optional*, defaults to 77):
Sequence length for tokenizing text list input.
- text_encoder_width (`int`, *optional*, defaults to 256)
+ text_encoder_width (`int`, *optional*, defaults to 256):
Hidden size for text encoder.
text_encoder_context_length (`int`, *optional*, defaults to 77):
Input sequence length for text encoder.
- text_encoder_num_layers (`int`, *optional*, defaults to 6)
+ text_encoder_num_layers (`int`, *optional*, defaults to 6):
Number of layers for transformer in text encoder.
- text_encoder_vocab_size (`int`, *optional*, defaults to 49408)
+ text_encoder_vocab_size (`int`, *optional*, defaults to 49408):
Vocabulary size for tokenizer.
- text_encoder_proj_layers (`int`, *optional*, defaults to 2)
+ text_encoder_proj_layers (`int`, *optional*, defaults to 2):
Number of layers in MLP for project text queries.
- text_encoder_n_ctx (`int`, *optional*, defaults to 16)
+ text_encoder_n_ctx (`int`, *optional*, defaults to 16):
Number of learnable text context queries.
- conv_dim (`int`, *optional*, defaults to 256)
+ conv_dim (`int`, *optional*, defaults to 256):
Feature map dimension to map outputs from the backbone.
- mask_dim (`int`, *optional*, defaults to 256)
+ mask_dim (`int`, *optional*, defaults to 256):
Dimension for feature maps in pixel decoder.
- hidden_dim (`int`, *optional*, defaults to 256)
+ hidden_dim (`int`, *optional*, defaults to 256):
Dimension for hidden states in transformer decoder.
- encoder_feedforward_dim (`int`, *optional*, defaults to 1024)
+ encoder_feedforward_dim (`int`, *optional*, defaults to 1024):
Dimension for FFN layer in pixel decoder.
- norm (`str`, *optional*, defaults to `GN`)
+ norm (`str`, *optional*, defaults to `"GN"`):
Type of normalization.
- encoder_layers (`int`, *optional*, defaults to 6)
+ encoder_layers (`int`, *optional*, defaults to 6):
Number of layers in pixel decoder.
- decoder_layers (`int`, *optional*, defaults to 10)
+ decoder_layers (`int`, *optional*, defaults to 10):
Number of layers in transformer decoder.
- use_task_norm (`bool`, *optional*, defaults to `True`)
+ use_task_norm (`bool`, *optional*, defaults to `True`):
Whether to normalize the task token.
- num_attention_heads (`int`, *optional*, defaults to 8)
+ num_attention_heads (`int`, *optional*, defaults to 8):
Number of attention heads in transformer layers in the pixel and transformer decoders.
- dropout (`float`, *optional*, defaults to 0.1)
+ dropout (`float`, *optional*, defaults to 0.1):
Dropout probability for pixel and transformer decoders.
- dim_feedforward (`int`, *optional*, defaults to 2048)
+ dim_feedforward (`int`, *optional*, defaults to 2048):
Dimension for FFN layer in transformer decoder.
- pre_norm (`bool`, *optional*, defaults to `False`)
+ pre_norm (`bool`, *optional*, defaults to `False`):
Whether to normalize hidden states before attention layers in transformer decoder.
- enforce_input_proj (`bool`, *optional*, defaults to `False`)
+ enforce_input_proj (`bool`, *optional*, defaults to `False`):
Whether to project hidden states in transformer decoder.
- query_dec_layers (`int`, *optional*, defaults to 2)
+ query_dec_layers (`int`, *optional*, defaults to 2):
Number of layers in query transformer.
- common_stride (`int`, *optional*, defaults to 4)
+ common_stride (`int`, *optional*, defaults to 4):
Common stride used for features in pixel decoder.
Examples:
...
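A usage sketch for a few of the OneFormer arguments documented above; the printed defaults are the ones stated in the corrected docstring, and the backbone note is hedged on that docstring:

```python
from transformers import OneFormerConfig

config = OneFormerConfig()
print(config.num_queries)       # 150
print(config.no_object_weight)  # 0.1
print(config.init_xavier_std)   # 1.0 (the old docstring incorrectly said 0.02)

# Per the docstring, backbone_config defaults to a Swin configuration and can be
# replaced by another backbone config when constructing the object.
print(type(config.backbone_config).__name__)
```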
@@ -361,17 +361,14 @@ class OneFormerImageProcessor(BaseImageProcessor):
sequence like `(width, height)`, output size will be matched to this. If size is an int, smaller edge of
the image will be matched to this number. i.e, if `height > width`, then image will be rescaled to `(size *
height / width, size)`.
- max_size (`int`, *optional*, defaults to 1333):
- The largest size an image dimension can have (otherwise it's capped). Only has an effect if `do_resize` is
- set to `True`.
- resample (`int`, *optional*, defaults to `PIL.Image.Resampling.BILINEAR`):
+ resample (`int`, *optional*, defaults to `Resampling.BILINEAR`):
An optional resampling filter. This can be one of `PIL.Image.Resampling.NEAREST`,
`PIL.Image.Resampling.BOX`, `PIL.Image.Resampling.BILINEAR`, `PIL.Image.Resampling.HAMMING`,
`PIL.Image.Resampling.BICUBIC` or `PIL.Image.Resampling.LANCZOS`. Only has an effect if `do_resize` is set
to `True`.
do_rescale (`bool`, *optional*, defaults to `True`):
Whether to rescale the input to a certain `scale`.
- rescale_factor (`float`, *optional*, defaults to 1/ 255):
+ rescale_factor (`float`, *optional*, defaults to `1/ 255`):
Rescale the input by the given factor. Only has an effect if `do_rescale` is set to `True`.
do_normalize (`bool`, *optional*, defaults to `True`):
Whether or not to normalize the input with mean and standard deviation.
@@ -387,9 +384,9 @@ class OneFormerImageProcessor(BaseImageProcessor):
Whether or not to decrement all label values of segmentation maps by 1. Usually used for datasets where 0
is used for background, and background itself is not included in all classes of a dataset (e.g. ADE20k).
The background label will be replaced by `ignore_index`.
- repo_path (`str`, defaults to `shi-labs/oneformer_demo`):
+ repo_path (`str`, defaults to `shi-labs/oneformer_demo`, *optional*, defaults to `"shi-labs/oneformer_demo"`):
Dataset repository on huggingface hub containing the JSON file with class information for the dataset.
- class_info_file (`str`):
+ class_info_file (`str`, *optional*):
JSON file containing class information for the dataset. It is stored inside on the `repo_path` dataset
repository.
num_text (`int`, *optional*):
...
@@ -56,7 +56,7 @@ class OpenAIGPTConfig(PretrainedConfig):
The dropout ratio for the embeddings.
attn_pdrop (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention.
- layer_norm_epsilon (`float`, *optional*, defaults to 1e-5):
+ layer_norm_epsilon (`float`, *optional*, defaults to 1e-05):
The epsilon to use in the layer normalization layers
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
@@ -91,8 +91,6 @@ class OpenAIGPTConfig(PretrainedConfig):
[`OpenAIGPTDoubleHeadsModel`].
The dropout ratio to be used after the projection and activation.
- use_cache (`bool`, *optional*, defaults to `True`):
- Whether or not the model should return the last key/values attentions (not used by all models).
Examples:
...
@@ -171,13 +171,13 @@ class OwlViTVisionConfig(PretrainedConfig):
hidden_act (`str` or `function`, *optional*, defaults to `"quick_gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` ``"quick_gelu"` are supported.
- layer_norm_eps (`float`, *optional*, defaults to 1e-5):
+ layer_norm_eps (`float`, *optional*, defaults to 1e-05):
The epsilon used by the layer normalization layers.
attention_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
- initializer_factor (`float``, *optional*, defaults to 1):
+ initializer_factor (`float``, *optional*, defaults to 1.0):
A factor for initializing all weight matrices (should be kept to 1, used internally for initialization
testing).
...
@@ -102,7 +102,7 @@ class OwlViTImageProcessor(BaseImageProcessor):
The size to use for resizing the image. Only has an effect if `do_resize` is set to `True`. If `size` is a
sequence like (h, w), output size will be matched to this. If `size` is an int, then image will be resized
to (size, size).
- resample (`int`, *optional*, defaults to `PIL.Image.Resampling.BICUBIC`):
+ resample (`int`, *optional*, defaults to `Resampling.BICUBIC`):
An optional resampling filter. This can be one of `PIL.Image.Resampling.NEAREST`,
`PIL.Image.Resampling.BOX`, `PIL.Image.Resampling.BILINEAR`, `PIL.Image.Resampling.HAMMING`,
`PIL.Image.Resampling.BICUBIC` or `PIL.Image.Resampling.LANCZOS`. Only has an effect if `do_resize` is set
...
@@ -33,9 +33,9 @@ class OwlViTProcessor(ProcessorMixin):
[`~OwlViTProcessor.__call__`] and [`~OwlViTProcessor.decode`] for more information.
Args:
- image_processor ([`OwlViTImageProcessor`]):
+ image_processor ([`OwlViTImageProcessor`], *optional*):
The image processor is a required input.
- tokenizer ([`CLIPTokenizer`, `CLIPTokenizerFast`]):
+ tokenizer ([`CLIPTokenizer`, `CLIPTokenizerFast`], *optional*):
The tokenizer is a required input.
"""
attributes = ["image_processor", "tokenizer"]
...
@@ -65,7 +65,7 @@ class PerceiverConfig(PretrainedConfig):
v_channels (`int`, *optional*):
Dimension to project the values before applying attention in the cross-attention and self-attention layers
of the encoder. Will default to preserving the dimension of the queries if not specified.
- cross_attention_shape_for_attention (`str`, *optional*, defaults to `'kv'`):
+ cross_attention_shape_for_attention (`str`, *optional*, defaults to `"kv"`):
Dimension to use when downsampling the queries and keys in the cross-attention layer of the encoder.
self_attention_widening_factor (`int`, *optional*, defaults to 1):
Dimension of the feed-forward layer in the cross-attention layer of the Transformer encoder.
@@ -89,7 +89,7 @@ class PerceiverConfig(PretrainedConfig):
this to something large just in case (e.g., 512 or 1024 or 2048).
image_size (`int`, *optional*, defaults to 56):
Size of the images after preprocessing, for [`PerceiverForImageClassificationLearned`].
- train_size (`List[int]`, *optional*, defaults to [368, 496]):
+ train_size (`List[int]`, *optional*, defaults to `[368, 496]`):
Training size of the images for the optical flow model.
num_frames (`int`, *optional*, defaults to 16):
Number of video frames used for the multimodal autoencoding model.
@@ -97,11 +97,11 @@ class PerceiverConfig(PretrainedConfig):
Number of audio samples per frame for the multimodal autoencoding model.
samples_per_patch (`int`, *optional*, defaults to 16):
Number of audio samples per patch when preprocessing the audio for the multimodal autoencoding model.
- output_num_channels (`int`, *optional*, defaults to 512):
- Number of output channels for each modalitiy decoder.
output_shape (`List[int]`, *optional*, defaults to `[1, 16, 224, 224]`):
Shape of the output (batch_size, num_frames, height, width) for the video decoder queries of the multimodal
autoencoding model. This excludes the channel dimension.
+ output_num_channels (`int`, *optional*, defaults to 512):
+ Number of output channels for each modalitiy decoder.
Example:
...
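A usage sketch for the Perceiver multimodal-autoencoding arguments documented above; the printed values are the defaults stated in the corrected docstring, and the attribute names are assumed to match those documented argument names:

```python
from transformers import PerceiverConfig

config = PerceiverConfig()
print(config.train_size)           # [368, 496]
print(config.output_shape)         # [1, 16, 224, 224]
print(config.output_num_channels)  # 512
print(config.samples_per_patch)    # 16
```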