Unverified Commit 03af4c42 authored by Sylvain Gugger, committed by GitHub

Docstring check (#26052)



* Fix number of minimal calls to the Hub with peft integration

* Alternate design

* And this way?

* Revert

* Nits to fix

* Add util

* Print when changes are made

* Add list to ignore

* Add more rules

* Manual fixes

* deal with kwargs

* deal with enum defaults

* avoid many digits for floats

* Manual fixes

* Fix regex

* Fix regex

* Auto fix

* Style

* Apply script

* Add ignored list

* Add check that templates are filled

* Adding to CI checks

* Add back semi-fix

* Ignore more objects

* More auto-fixes

* Ignore missing objects

* Remove temp semi-fix

* Fixes

* Update src/transformers/models/pvt/configuration_pvt.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update utils/check_docstrings.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Deal with float defaults

* Fix small defaults

* Address review comment

* Treat

* Post-rebase cleanup

* Address review comment

* Update src/transformers/models/deprecated/mctct/configuration_mctct.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comment

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
parent 122b2657
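Judging from the file changes below, the new docstring check compares each documented argument against the real signature: the `defaults to ...` value, the `*optional*` marker, and the ordering of the arguments, with an auto-fix mode that rewrites the docstrings. The snippet that follows is only an illustrative sketch of that idea, not the `utils/check_docstrings.py` added by this PR; `find_docstring_issues`, `_ARG_LINE`, and `example` are made-up names for the illustration, and the real formatting rules (string quoting, float rounding, enum handling) are more involved.

```python
# Illustrative sketch only -- NOT the actual utils/check_docstrings.py from this PR.
# Idea: read each documented argument line of the form
# "name (`type`, *optional*, defaults to X):" and compare X against the default
# found in the function signature.
import inspect
import re

# Matches lines such as: vocab_size (`int`, *optional*, defaults to 8192):
_ARG_LINE = re.compile(
    r"^\s*(?P<name>\w+) \(.*?(?P<optional>\*optional\*)?(?:, defaults to (?P<default>.*?))?\):"
)


def find_docstring_issues(obj) -> list[str]:
    """Return human-readable mismatches between `obj`'s signature and its docstring."""
    issues = []
    sig = inspect.signature(obj)
    doc = inspect.getdoc(obj) or ""
    for line in doc.splitlines():
        match = _ARG_LINE.match(line)
        if match is None or match["name"] not in sig.parameters:
            continue
        param = sig.parameters[match["name"]]
        if param.default is inspect.Parameter.empty:
            continue  # required argument, nothing to compare
        # Simplified normalization; the real script has richer rules.
        actual = str(param.default)
        if match["default"] != actual:
            issues.append(f"{match['name']}: documented {match['default']!r}, signature has {actual!r}")
        if match["optional"] is None:
            issues.append(f"{match['name']}: has a default but is not marked *optional*")
    return issues


def example(num_groups: int = 32):
    """Toy function.

    Args:
        num_groups (`int`, *optional*, defaults to `32`):
            Number of groups.
    """


if __name__ == "__main__":
    # Flags the backticked `32` vs. the plain 32 in the signature, the same kind
    # of fix applied to BitConfig.num_groups in the diff below.
    print(find_docstring_issues(example))
```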
@@ -41,7 +41,7 @@ class BeitConfig(PretrainedConfig):
 [microsoft/beit-base-patch16-224-pt22k](https://huggingface.co/microsoft/beit-base-patch16-224-pt22k) architecture.
 Args:
-vocab_size (`int`, *optional*, defaults to 8092):
+vocab_size (`int`, *optional*, defaults to 8192):
 Vocabulary size of the BEiT model. Defines the number of different image tokens that can be used during
 pre-training.
 hidden_size (`int`, *optional*, defaults to 768):
...
@@ -57,7 +57,7 @@ class BeitImageProcessor(BaseImageProcessor):
 size (`Dict[str, int]` *optional*, defaults to `{"height": 256, "width": 256}`):
 Size of the output image after resizing. Can be overridden by the `size` parameter in the `preprocess`
 method.
-resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BICUBIC`):
+resample (`PILImageResampling`, *optional*, defaults to `Resampling.BICUBIC`):
 Resampling filter to use if resizing the image. Can be overridden by the `resample` parameter in the
 `preprocess` method.
 do_center_crop (`bool`, *optional*, defaults to `True`):
@@ -67,12 +67,12 @@ class BeitImageProcessor(BaseImageProcessor):
 crop_size (`Dict[str, int]`, *optional*, defaults to `{"height": 224, "width": 224}`):
 Desired output size when applying center-cropping. Only has an effect if `do_center_crop` is set to `True`.
 Can be overridden by the `crop_size` parameter in the `preprocess` method.
-do_rescale (`bool`, *optional*, defaults to `True`):
-Whether to rescale the image by the specified scale `rescale_factor`. Can be overridden by the `do_rescale`
-parameter in the `preprocess` method.
 rescale_factor (`int` or `float`, *optional*, defaults to `1/255`):
 Scale factor to use if rescaling the image. Can be overridden by the `rescale_factor` parameter in the
 `preprocess` method.
+do_rescale (`bool`, *optional*, defaults to `True`):
+Whether to rescale the image by the specified scale `rescale_factor`. Can be overridden by the `do_rescale`
+parameter in the `preprocess` method.
 do_normalize (`bool`, *optional*, defaults to `True`):
 Whether to normalize the image. Can be overridden by the `do_normalize` parameter in the `preprocess`
 method.
...
@@ -77,7 +77,7 @@ class BertweetTokenizer(PreTrainedTokenizer):
 Path to the vocabulary file.
 merges_file (`str`):
 Path to the merges file.
-normalization (`bool`, *optional*, defaults to `False`)
+normalization (`bool`, *optional*, defaults to `False`):
 Whether or not to apply a normalization preprocess.
 bos_token (`str`, *optional*, defaults to `"<s>"`):
 The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token.
...
@@ -60,25 +60,25 @@ class BigBirdTokenizer(PreTrainedTokenizer):
 vocab_file (`str`):
 [SentencePiece](https://github.com/google/sentencepiece) file (generally has a *.spm* extension) that
 contains the vocabulary necessary to instantiate a tokenizer.
-eos_token (`str`, *optional*, defaults to `"</s>"`):
-The end of sequence token.
-bos_token (`str`, *optional*, defaults to `"<s>"`):
-The begin of sequence token.
 unk_token (`str`, *optional*, defaults to `"<unk>"`):
 The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
 token instead.
+bos_token (`str`, *optional*, defaults to `"<s>"`):
+The begin of sequence token.
+eos_token (`str`, *optional*, defaults to `"</s>"`):
+The end of sequence token.
 pad_token (`str`, *optional*, defaults to `"<pad>"`):
 The token used for padding, for example when batching sequences of different lengths.
 sep_token (`str`, *optional*, defaults to `"[SEP]"`):
 The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
 sequence classification or for a text and a question for question answering. It is also used as the last
 token of a sequence built with special tokens.
-cls_token (`str`, *optional*, defaults to `"[CLS]"`):
-The classifier token which is used when doing sequence classification (classification of the whole sequence
-instead of per-token classification). It is the first token of the sequence when built with special tokens.
 mask_token (`str`, *optional*, defaults to `"[MASK]"`):
 The token used for masking values. This is the token used when training this model with masked language
 modeling. This is the token which the model will try to predict.
+cls_token (`str`, *optional*, defaults to `"[CLS]"`):
+The classifier token which is used when doing sequence classification (classification of the whole sequence
+instead of per-token classification). It is the first token of the sequence when built with special tokens.
 sp_model_kwargs (`dict`, *optional*):
 Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for
 SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things,
...
@@ -72,12 +72,13 @@ class BioGptConfig(PretrainedConfig):
 Please refer to the paper about LayerDrop: https://arxiv.org/abs/1909.11556 for further details
 activation_dropout (`float`, *optional*, defaults to 0.0):
 The dropout ratio for activations inside the fully connected layer.
-pad_token_id (`int`, *optional*, defaults to 1)
+pad_token_id (`int`, *optional*, defaults to 1):
 Padding token id.
-bos_token_id (`int`, *optional*, defaults to 0)
+bos_token_id (`int`, *optional*, defaults to 0):
 Beginning of stream token id.
-eos_token_id (`int`, *optional*, defaults to 2)
+eos_token_id (`int`, *optional*, defaults to 2):
 End of stream token id.
 Example:
 ```python
...
@@ -52,7 +52,7 @@ class BitConfig(BackboneConfigMixin, PretrainedConfig):
 are supported.
 global_padding (`str`, *optional*):
 Padding strategy to use for the convolutional layers. Can be either `"valid"`, `"same"`, or `None`.
-num_groups (`int`, *optional*, defaults to `32`):
+num_groups (`int`, *optional*, defaults to 32):
 Number of groups used for the `BitGroupNormActivation` layers.
 drop_path_rate (`float`, *optional*, defaults to 0.0):
 The drop path rate for the stochastic depth.
...
@@ -85,9 +85,9 @@ class BlenderbotSmallTokenizer(PreTrainedTokenizer):
 unk_token (`str`, *optional*, defaults to `"__unk__"`):
 The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
 token instead.
-pad_token (`str`, *optional*, defaults to `"__pad__"`):
+pad_token (`str`, *optional*, defaults to `"__null__"`):
 The token used for padding, for example when batching sequences of different lengths.
-**kwargs
+kwargs (*optional*):
 Additional keyword arguments passed along to [`PreTrainedTokenizer`]
 """
...
@@ -295,7 +295,7 @@ class BlipConfig(PretrainedConfig):
 Dimentionality of text and vision projection layers.
 logit_scale_init_value (`float`, *optional*, defaults to 2.6592):
 The inital value of the *logit_scale* paramter. Default is used as per the original BLIP implementation.
-image_text_hidden_size (`int`, *optional*, defaults to 768):
+image_text_hidden_size (`int`, *optional*, defaults to 256):
 Dimentionality of the hidden state of the image-text fusion layer.
 kwargs (*optional*):
 Dictionary of keyword arguments.
...
@@ -53,7 +53,7 @@ class BlipImageProcessor(BaseImageProcessor):
 size (`dict`, *optional*, defaults to `{"height": 384, "width": 384}`):
 Size of the output image after resizing. Can be overridden by the `size` parameter in the `preprocess`
 method.
-resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BICUBIC`):
+resample (`PILImageResampling`, *optional*, defaults to `Resampling.BICUBIC`):
 Resampling filter to use if resizing the image. Only has an effect if `do_resize` is set to `True`. Can be
 overridden by the `resample` parameter in the `preprocess` method.
 do_rescale (`bool`, *optional*, defaults to `True`):
...
@@ -128,14 +128,14 @@ class BridgeTowerImageProcessor(BaseImageProcessor):
 do_resize (`bool`, *optional*, defaults to `True`):
 Whether to resize the image's (height, width) dimensions to the specified `size`. Can be overridden by the
 `do_resize` parameter in the `preprocess` method.
-size (`Dict[str, int]` *optional*, defaults to `288`):
+size (`Dict[str, int]` *optional*, defaults to 288):
 Resize the shorter side of the input to `size["shortest_edge"]`. The longer side will be limited to under
 `int((1333 / 800) * size["shortest_edge"])` while preserving the aspect ratio. Only has an effect if
 `do_resize` is set to `True`. Can be overridden by the `size` parameter in the `preprocess` method.
 size_divisor (`int`, *optional*, defaults to 32):
 The size by which to make sure both the height and width can be divided. Only has an effect if `do_resize`
 is set to `True`. Can be overridden by the `size_divisor` parameter in the `preprocess` method.
-resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BICUBIC`):
+resample (`PILImageResampling`, *optional*, defaults to `Resampling.BICUBIC`):
 Resampling filter to use if resizing the image. Only has an effect if `do_resize` is set to `True`. Can be
 overridden by the `resample` parameter in the `preprocess` method.
 do_rescale (`bool`, *optional*, defaults to `True`):
...
@@ -31,7 +31,7 @@ class BrosProcessor(ProcessorMixin):
 [`~BrosProcessor.__call__`] and [`~BrosProcessor.decode`] for more information.
 Args:
-tokenizer (`BertTokenizerFast`):
+tokenizer (`BertTokenizerFast`, *optional*):
 An instance of ['BertTokenizerFast`]. The tokenizer is a required input.
 """
 attributes = ["tokenizer"]
...
@@ -48,7 +48,7 @@ class ByT5Tokenizer(PreTrainedTokenizer):
 token instead.
 pad_token (`str`, *optional*, defaults to `"<pad>"`):
 The token used for padding, for example when batching sequences of different lengths.
-extra_ids (`int`, *optional*, defaults to 100):
+extra_ids (`int`, *optional*, defaults to 125):
 Add a number of extra ids added to the end of the vocabulary for use as sentinels. These tokens are
 accessible as "<extra_id_{%d}>" where "{%d}" is a number between 0 and extra_ids-1. Extra tokens are
 indexed from the end of the vocabulary up to beginning ("<extra_id_0>" is the last token in the vocabulary
...
@@ -89,7 +89,7 @@ class CamembertTokenizer(PreTrainedTokenizer):
 mask_token (`str`, *optional*, defaults to `"<mask>"`):
 The token used for masking values. This is the token used when training this model with masked language
 modeling. This is the token which the model will try to predict.
-additional_special_tokens (`List[str]`, *optional*, defaults to `["<s>NOTUSED", "</s>NOTUSED"]`):
+additional_special_tokens (`List[str]`, *optional*, defaults to `['<s>NOTUSED', '</s>NOTUSED']`):
 Additional special tokens used by the tokenizer.
 sp_model_kwargs (`dict`, *optional*):
 Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for
...
@@ -31,9 +31,9 @@ class ChineseCLIPProcessor(ProcessorMixin):
 See the [`~ChineseCLIPProcessor.__call__`] and [`~ChineseCLIPProcessor.decode`] for more information.
 Args:
-image_processor ([`ChineseCLIPImageProcessor`]):
+image_processor ([`ChineseCLIPImageProcessor`], *optional*):
 The image processor is a required input.
-tokenizer ([`BertTokenizerFast`]):
+tokenizer ([`BertTokenizerFast`], *optional*):
 The tokenizer is a required input.
 """
 attributes = ["image_processor", "tokenizer"]
...
@@ -227,7 +227,7 @@ class ClapAudioConfig(PretrainedConfig):
 projection_hidden_act (`str`, *optional*, defaults to `"relu"`):
 The non-linear activation function (function or string) in the projection layer. If string, `"gelu"`,
 `"relu"`, `"silu"` and `"gelu_new"` are supported.
-layer_norm_eps (`[type]`, *optional*, defaults to `1e-5`):
+layer_norm_eps (`[type]`, *optional*, defaults to 1e-05):
 The epsilon used by the layer normalization layers.
 initializer_factor (`float`, *optional*, defaults to 1.0):
 A factor for initializing all weight matrices (should be kept to 1, used internally for initialization
@@ -345,10 +345,10 @@ class ClapConfig(PretrainedConfig):
 Dictionary of configuration options used to initialize [`ClapTextConfig`].
 audio_config (`dict`, *optional*):
 Dictionary of configuration options used to initialize [`ClapAudioConfig`].
+logit_scale_init_value (`float`, *optional*, defaults to 14.29):
+The inital value of the *logit_scale* paramter. Default is used as per the original CLAP implementation.
 projection_dim (`int`, *optional*, defaults to 512):
 Dimentionality of text and audio projection layers.
-logit_scale_init_value (`float`, *optional*, defaults to 2.6592):
-The inital value of the *logit_scale* paramter. Default is used as per the original CLAP implementation.
 projection_hidden_act (`str`, *optional*, defaults to `"relu"`):
 Activation function for the projection layers.
 initializer_factor (`float`, *optional*, defaults to 1.0):
...
@@ -41,32 +41,32 @@ class ClapFeatureExtractor(SequenceFeatureExtractor):
 Fourier Transform* (STFT) which should match pytorch's `torch.stft` equivalent.
 Args:
-feature_size (`int`, defaults to 64):
+feature_size (`int`, *optional*, defaults to 64):
 The feature dimension of the extracted Mel spectrograms. This corresponds to the number of mel filters
 (`n_mels`).
-sampling_rate (`int`, defaults to 48_000):
+sampling_rate (`int`, *optional*, defaults to 48000):
 The sampling rate at which the audio files should be digitalized expressed in hertz (Hz). This only serves
 to warn users if the audio fed to the feature extractor does not have the same sampling rate.
-hop_length (`int`, defaults to 480):
+hop_length (`int`,*optional*, defaults to 480):
 Length of the overlaping windows for the STFT used to obtain the Mel Spectrogram. The audio will be split
 in smaller `frames` with a step of `hop_length` between each frame.
-max_length_s (`int`, defaults to 10):
+max_length_s (`int`, *optional*, defaults to 10):
 The maximum input length of the model in seconds. This is used to pad the audio.
-fft_window_size (`int`, defaults to 1024):
+fft_window_size (`int`, *optional*, defaults to 1024):
 Size of the window (in samples) on which the Fourier transform is applied. This controls the frequency
 resolution of the spectrogram. 400 means that the fourrier transform is computed on windows of 400 samples.
 padding_value (`float`, *optional*, defaults to 0.0):
 Padding value used to pad the audio. Should correspond to silences.
 return_attention_mask (`bool`, *optional*, defaults to `False`):
 Whether or not the model should return the attention masks coresponding to the input.
-frequency_min (`float`, *optional*, default to 0):
+frequency_min (`float`, *optional*, defaults to 0):
 The lowest frequency of interest. The STFT will not be computed for values below this.
-frequency_max (`float`, *optional*, default to 14_000):
+frequency_max (`float`, *optional*, defaults to 14000):
 The highest frequency of interest. The STFT will not be computed for values above this.
 top_db (`float`, *optional*):
 The highest decibel value used to convert the mel spectrogram to the log scale. For more details see the
 `audio_utils.power_to_db` function
-truncation (`str`, *optional*, default to `"fusions"`):
+truncation (`str`, *optional*, defaults to `"fusion"`):
 Truncation pattern for long audio inputs. Two patterns are available:
 - `fusion` will use `_random_mel_fusion`, which stacks 3 random crops from the mel spectrogram and a
 downsampled version of the entire mel spectrogram.
...
@@ -30,9 +30,9 @@ class CLIPProcessor(ProcessorMixin):
 [`~CLIPProcessor.__call__`] and [`~CLIPProcessor.decode`] for more information.
 Args:
-image_processor ([`CLIPImageProcessor`]):
+image_processor ([`CLIPImageProcessor`], *optional*):
 The image processor is a required input.
-tokenizer ([`CLIPTokenizerFast`]):
+tokenizer ([`CLIPTokenizerFast`], *optional*):
 The tokenizer is a required input.
 """
 attributes = ["image_processor", "tokenizer"]
...
@@ -255,7 +255,7 @@ class CLIPSegConfig(PretrainedConfig):
 Dimensionality of text and vision projection layers.
 logit_scale_init_value (`float`, *optional*, defaults to 2.6592):
 The inital value of the *logit_scale* paramter. Default is used as per the original CLIPSeg implementation.
-extract_layers (`List[int]`, *optional*, defaults to [3, 6, 9]):
+extract_layers (`List[int]`, *optional*, defaults to `[3, 6, 9]`):
 Layers to extract when forwarding the query image through the frozen visual backbone of CLIP.
 reduce_dim (`int`, *optional*, defaults to 64):
 Dimensionality to reduce the CLIP vision embedding.
...
@@ -30,9 +30,9 @@ class CLIPSegProcessor(ProcessorMixin):
 [`~CLIPSegProcessor.__call__`] and [`~CLIPSegProcessor.decode`] for more information.
 Args:
-image_processor ([`ViTImageProcessor`]):
+image_processor ([`ViTImageProcessor`], *optional*):
 The image processor is a required input.
-tokenizer ([`CLIPTokenizerFast`]):
+tokenizer ([`CLIPTokenizerFast`], *optional*):
 The tokenizer is a required input.
 """
 attributes = ["image_processor", "tokenizer"]
...
@@ -64,7 +64,7 @@ class ConvNextImageProcessor(BaseImageProcessor):
 crop_pct (`float` *optional*, defaults to 224 / 256):
 Percentage of the image to crop. Only has an effect if `do_resize` is `True` and size < 384. Can be
 overriden by `crop_pct` in the `preprocess` method.
-resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BILINEAR`):
+resample (`PILImageResampling`, *optional*, defaults to `Resampling.BILINEAR`):
 Resampling filter to use if resizing the image. Can be overriden by `resample` in the `preprocess` method.
 do_rescale (`bool`, *optional*, defaults to `True`):
 Whether to rescale the image by the specified scale `rescale_factor`. Can be overriden by `do_rescale` in
...
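A pattern that runs through the hunks above is how documented defaults get normalized: backticks are dropped around plain numbers, underscore separators disappear (48_000 becomes 48000), small floats are written as 1e-05, and long floats are shortened, per the "avoid many digits for floats" commit. As a hedged illustration (not the script's actual formatting rules), these are the forms Python's own `str()`, `repr()`, and `round()` produce, which is presumably what the auto-fix leans on:

```python
# Each printed value matches the normalized form seen in the diff above.
print(str(1e-5))           # 1e-05  (ClapAudioConfig layer_norm_eps)
print(str(48_000))         # 48000  (ClapFeatureExtractor sampling_rate)
print(str(32))             # 32     (BitConfig num_groups, backticks dropped)
print(repr([3, 6, 9]))     # [3, 6, 9]  (CLIPSegConfig extract_layers)
print(round(1 / 0.07, 2))  # 14.29  (ClapConfig logit_scale_init_value, digits trimmed)
```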