Unverified Commit 03af4c42 authored by Sylvain Gugger, committed by GitHub

Docstring check (#26052)



* Fix number of minimal calls to the Hub with peft integration

* Alternate design

* And this way?

* Revert

* Nits to fix

* Add util

* Print when changes are made

* Add list to ignore

* Add more rules

* Manual fixes

* deal with kwargs

* deal with enum defaults

* avoid many digits for floats

* Manual fixes

* Fix regex

* Fix regex

* Auto fix

* Style

* Apply script

* Add ignored list

* Add check that templates are filled

* Adding to CI checks

* Add back semi-fix

* Ignore more objects

* More auto-fixes

* Ignore missing objects

* Remove temp semi-fix

* Fixes

* Update src/transformers/models/pvt/configuration_pvt.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update utils/check_docstrings.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Deal with float defaults

* Fix small defaults

* Address review comment

* Treat

* Post-rebase cleanup

* Address review comment

* Update src/transformers/models/deprecated/mctct/configuration_mctct.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comment

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
parent 122b2657
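The substance of this commit is the new checker wired into CI via `utils/check_docstrings.py`: for every documented argument it compares the default stated after "defaults to" against the actual default in the signature, and rewrites the docstring when the two disagree (printing when changes are made, with an ignore list for objects that cannot be fixed automatically). As a hedged illustration of the core idea only — the helper name and regex below are hypothetical simplifications, not the script's actual code:

```python
import inspect
import re

# Matches docstring lines such as:  arg_name (`int`, *optional*, defaults to 224):
_ARG_RE = re.compile(r"^\s*(\w+) \(.*defaults to ([^)]*)\):", re.MULTILINE)

def check_arg_defaults(obj):
    """Yield (name, documented, actual) triples whose defaults disagree.

    Simplified sketch: the real check also handles string defaults (which need
    repr-style quoting), enum defaults, kwargs, and missing `*optional*` markers.
    """
    docstring = inspect.getdoc(obj) or ""
    sig = inspect.signature(obj)
    for name, documented in _ARG_RE.findall(docstring):
        param = sig.parameters.get(name)
        if param is None or param.default is inspect.Parameter.empty:
            continue
        actual = str(param.default)  # stringified, e.g. str(1e-9) == "1e-09"
        if documented.strip("` ") != actual:
            yield name, documented, actual
```

Every hunk below is this check (plus the manual fixes from the commit message) applied across the library.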
@@ -81,7 +81,7 @@ class FunnelConfig(PretrainedConfig):
        The standard deviation of the *normal initializer* for initializing the embedding matrix and the weight of
        linear layers. Will default to 1 for the embedding matrix and the value given by Xavier initialization for
        linear layers.
-   layer_norm_eps (`float`, *optional*, defaults to 1e-9):
+   layer_norm_eps (`float`, *optional*, defaults to 1e-09):
        The epsilon used by the layer normalization layers.
    pooling_type (`str`, *optional*, defaults to `"mean"`):
        Possible values are `"mean"` or `"max"`. The way pooling is performed at the beginning of each block.
@@ -90,10 +90,10 @@ class FunnelConfig(PretrainedConfig):
        is faster on TPU.
    separate_cls (`bool`, *optional*, defaults to `True`):
        Whether or not to separate the cls token when applying pooling.
-   truncate_seq (`bool`, *optional*, defaults to `False`):
+   truncate_seq (`bool`, *optional*, defaults to `True`):
        When using `separate_cls`, whether or not to truncate the last token when pooling, to avoid getting a
        sequence length that is not a multiple of 2.
-   pool_q_only (`bool`, *optional*, defaults to `False`):
+   pool_q_only (`bool`, *optional*, defaults to `True`):
        Whether or not to apply the pooling only to the query or to query, key and values for the attention layers.
    """
    model_type = "funnel"
...
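Most of the exponent rewrites in these hunks (1e-9 → 1e-09, 1e-6 → 1e-06, 1e-5 → 1e-05) are not judgment calls: they fall out of Python's default float-to-string conversion, which a checker that stringifies signature defaults naturally produces. The "avoid many digits for floats" commit suggests long decimal expansions are additionally rounded, though that exact rule lives in the script. A few facts verifiable in any Python interpreter:

```python
# Python pads float exponents to two digits, which is why the regenerated
# docstrings read 1e-09 rather than 1e-9.
assert str(1e-9) == "1e-09"
assert str(1e-6) == "1e-06"
assert str(1e-5) == "1e-05"
assert str(1e-12) == "1e-12"  # already two digits, left untouched in later hunks
assert str(0.02) == "0.02"    # short decimals round-trip unchanged
```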
@@ -120,9 +120,9 @@ class FunnelTokenizer(PreTrainedTokenizer):
    mask_token (`str`, *optional*, defaults to `"<mask>"`):
        The token used for masking values. This is the token used when training this model with masked language
        modeling. This is the token which the model will try to predict.
-   bos_token (`str`, `optional`, defaults to `"<s>"`):
+   bos_token (`str`, *optional*, defaults to `"<s>"`):
        The beginning of sentence token.
-   eos_token (`str`, `optional`, defaults to `"</s>"`):
+   eos_token (`str`, *optional*, defaults to `"</s>"`):
        The end of sentence token.
    tokenize_chinese_chars (`bool`, *optional*, defaults to `True`):
        Whether or not to tokenize Chinese characters.
...
@@ -51,7 +51,7 @@ class GLPNConfig(PretrainedConfig):
        Patch size before each encoder block.
    strides (`List[int]`, *optional*, defaults to `[4, 2, 2, 2]`):
        Stride before each encoder block.
-   num_attention_heads (`List[int]`, *optional*, defaults to `[1, 2, 4, 8]`):
+   num_attention_heads (`List[int]`, *optional*, defaults to `[1, 2, 5, 8]`):
        Number of attention heads for each attention layer in each block of the Transformer encoder.
    mlp_ratios (`List[int]`, *optional*, defaults to `[4, 4, 4, 4]`):
        Ratio of the size of the hidden layer compared to the size of the input layer of the Mix FFNs in the
@@ -67,9 +67,9 @@ class GLPNConfig(PretrainedConfig):
        The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
    drop_path_rate (`float`, *optional*, defaults to 0.1):
        The dropout probability for stochastic depth, used in the blocks of the Transformer encoder.
-   layer_norm_eps (`float`, *optional*, defaults to 1e-6):
+   layer_norm_eps (`float`, *optional*, defaults to 1e-06):
        The epsilon used by the layer normalization layers.
-   decoder_hidden_size (`int`, *optional*, defaults to 32):
+   decoder_hidden_size (`int`, *optional*, defaults to 64):
        The dimension of the decoder.
    max_depth (`int`, *optional*, defaults to 10):
        The maximum depth of the decoder.
...
@@ -48,7 +48,7 @@ class GLPNImageProcessor(BaseImageProcessor):
    size_divisor (`int`, *optional*, defaults to 32):
        When `do_resize` is `True`, images are resized so their height and width are rounded down to the closest
        multiple of `size_divisor`. Can be overridden by `size_divisor` in `preprocess`.
-   resample (`PIL.Image` resampling filter, *optional*, defaults to `PILImageResampling.BILINEAR`):
+   resample (`PIL.Image` resampling filter, *optional*, defaults to `Resampling.BILINEAR`):
        Resampling filter to use if resizing the image. Can be overridden by `resample` in `preprocess`.
    do_rescale (`bool`, *optional*, defaults to `True`):
        Whether or not to apply the scaling factor (to make pixel values floats between 0. and 1.). Can be
...
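The `resample` defaults in this hunk and several below change from `PILImageResampling.BILINEAR` to `Resampling.BILINEAR`. A plausible reading of the "deal with enum defaults" commit message: the checker stringifies the default value, and an enum member prints under its defining class name (PIL's `Resampling`), not under the `PILImageResampling` alias that transformers re-exports. Illustrated with a plain stdlib `Enum` stand-in (the member values here are assumptions):

```python
from enum import Enum

# Stand-in for PIL.Image.Resampling; transformers aliases it as PILImageResampling.
class Resampling(Enum):
    NEAREST = 0
    BILINEAR = 2
    BICUBIC = 3

PILImageResampling = Resampling  # an alias does not change the member's str()

# str() renders the defining class name, so a default written as
# PILImageResampling.BILINEAR is documented as "Resampling.BILINEAR".
assert str(PILImageResampling.BILINEAR) == "Resampling.BILINEAR"
```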
@@ -54,7 +54,7 @@ class GPTNeoConfig(PretrainedConfig):
        Dimensionality of the encoder layers and the pooler layer.
    num_layers (`int`, *optional*, defaults to 24):
        Number of hidden layers in the Transformer encoder.
-   attention_types (`List`, *optional*, defaults to `[[["global", "local"], 12]]`):
+   attention_types (`List`, *optional*, defaults to `[[['global', 'local'], 12]]`):
        The type of attention for each layer in a `List` of the following format `[[["attention_type"],
        num_layerss]]` e.g. for a 24 layer model `[[["global"], 24]]` or `[[["global", "local"], 12]]` Choose the
        value of `attention_type` from `["global", "local"]`
@@ -76,7 +76,7 @@ class GPTNeoConfig(PretrainedConfig):
    classifier_dropout (`float`, *optional*, defaults to 0.1):
        Argument used when doing token classification, used in the model [`GPTNeoForTokenClassification`]. The
        dropout ratio for the hidden layer.
-   layer_norm_epsilon (`float`, *optional*, defaults to 1e-5):
+   layer_norm_epsilon (`float`, *optional*, defaults to 1e-05):
        The epsilon used by the layer normalization layers.
    initializer_range (`float`, *optional*, defaults to 0.02):
        The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
...
@@ -64,17 +64,17 @@ class GPTSw3Tokenizer(PreTrainedTokenizer):
        Whether or not to strip the text when tokenizing (removing excess spaces before and after the string).
    keep_accents (`bool`, *optional*, defaults to `False`):
        Whether or not to keep accents when tokenizing.
-   bos_token (`str`, *optional*):
-       The beginning of sequence token that can be used for downstream task, was not seen during pretraining. If
-       not provided, will default to '<s>' or '<|endoftext|>', depending on model size.
-   eos_token (`str`, *optional*):
-       The end of sequence token seen during pretraining. If not provided, will default to '<|endoftext|>'
-   unk_token (`str`, *optional*):
-       The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
-       token instead. If not provided, will default to '<unk>'.
    pad_token (`str`, *optional*):
        The token used for padding, for example when batching sequences of different lengths. If not provided, will
        default to '<pad>' or '<unk>' depending on model size.
+   unk_token (`str`, *optional*):
+       The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
+       token instead. If not provided, will default to '<unk>'.
+   eos_token (`str`, *optional*):
+       The end of sequence token seen during pretraining. If not provided, will default to '<|endoftext|>'
+   bos_token (`str`, *optional*):
+       The beginning of sequence token that can be used for downstream task, was not seen during pretraining. If
+       not provided, will default to '<s>' or '<|endoftext|>', depending on model size.
    sp_model_kwargs (`dict`, *optional*):
        Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for
        SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things,
...
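The GPTSw3Tokenizer hunk above changes no wording at all; it only reorders `bos_token`/`eos_token`/`unk_token`/`pad_token` so the docstring lists arguments in the same order as `__init__`, which the new check apparently enforces. A hypothetical helper showing the reordering rule via `inspect.signature`:

```python
import inspect

def signature_order(fn, documented):
    """Sort documented argument names into the order of fn's signature,
    pushing names absent from the signature to the end (hypothetical helper)."""
    params = [name for name in inspect.signature(fn).parameters if name != "self"]
    rank = {name: i for i, name in enumerate(params)}
    return sorted(documented, key=lambda name: rank.get(name, len(rank)))

# e.g. for a signature ordered (pad_token, unk_token, eos_token, bos_token, ...):
# signature_order(tok_init, ["bos_token", "eos_token", "unk_token", "pad_token"])
# -> ["pad_token", "unk_token", "eos_token", "bos_token"], matching the hunk.
```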
@@ -139,7 +139,7 @@ class GPTSanJapaneseTokenizer(PreTrainedTokenizer):
        The token used for unknown charactor
    pad_token (`str`, *optional*, defaults to `"<|separator|>"`):
        The token used for padding
-   bos_token (`str`, *optional*, defaults to `"<|startoftext|>""`):
+   bos_token (`str`, *optional*, defaults to `"<|startoftext|>"`):
        The beginning of sequence token.
    eos_token (`str`, *optional*, defaults to `"<|endoftext|>"`):
        The end of sequence token.
...
@@ -53,10 +53,8 @@ class IdeficsImageProcessor(BaseImageProcessor):
    Constructs a Idefics image processor.
    Args:
-   image_size (`int`, *optional*, defaults to `224`):
+   image_size (`int`, *optional*, defaults to 224):
        Resize to image size
-   image_num_channels (`int`, *optional*, defaults to `3`):
-       Number of image channels.
    image_mean (`float` or `List[float]`, *optional*, defaults to `IDEFICS_STANDARD_MEAN`):
        Mean to use if normalizing the image. This is a float or list of floats the length of the number of
        channels in the image. Can be overridden by the `image_mean` parameter in the `preprocess` method. Can be
@@ -65,6 +63,8 @@ class IdeficsImageProcessor(BaseImageProcessor):
        Standard deviation to use if normalizing the image. This is a float or list of floats the length of the
        number of channels in the image. Can be overridden by the `image_std` parameter in the `preprocess` method.
        Can be overridden by the `image_std` parameter in the `preprocess` method.
+   image_num_channels (`int`, *optional*, defaults to 3):
+       Number of image channels.
    """
    model_input_names = ["pixel_values"]
...
@@ -70,7 +70,7 @@ class ImageGPTImageProcessor(BaseImageProcessor):
        `do_resize` in `preprocess`.
    size (`Dict[str, int]` *optional*, defaults to `{"height": 256, "width": 256}`):
        Size of the image after resizing. Can be overridden by `size` in `preprocess`.
-   resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BICUBIC`):
+   resample (`PILImageResampling`, *optional*, defaults to `Resampling.BILINEAR`):
        Resampling filter to use if resizing the image. Can be overridden by `resample` in `preprocess`.
    do_normalize (`bool`, *optional*, defaults to `True`):
        Whether to normalize the image pixel value to between [-1, 1]. Can be overridden by `do_normalize` in
...
@@ -57,7 +57,7 @@ class InstructBlipVisionConfig(PretrainedConfig):
        The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
        `"relu"`, `"selu"` and `"gelu_new"` ``"gelu"` are supported. to 1e-5): The epsilon used by the layer
        normalization layers.
-   layer_norm_eps (`float`, *optional*, defaults to 1e-6):
+   layer_norm_eps (`float`, *optional*, defaults to 1e-06):
        The epsilon used by the layer normalization layers.
    attention_dropout (`float`, *optional*, defaults to 0.0):
        The dropout ratio for the attention probabilities.
...
@@ -83,8 +83,6 @@ class LayoutLMConfig(PretrainedConfig):
    use_cache (`bool`, *optional*, defaults to `True`):
        Whether or not the model should return the last key/values attentions (not used by all models). Only
        relevant if `config.is_decoder=True`.
-   classifier_dropout (`float`, *optional*):
-       The dropout ratio for the classification head.
    max_2d_position_embeddings (`int`, *optional*, defaults to 1024):
        The maximum value that the 2D position embedding might ever used. Typically set this to something large
        just in case (e.g., 1024).
...
@@ -100,7 +100,7 @@ class LayoutLMv2ImageProcessor(BaseImageProcessor):
        overridden by `do_resize` in `preprocess`.
    size (`Dict[str, int]` *optional*, defaults to `{"height": 224, "width": 224}`):
        Size of the image after resizing. Can be overridden by `size` in `preprocess`.
-   resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BILINEAR`):
+   resample (`PILImageResampling`, *optional*, defaults to `Resampling.BILINEAR`):
        Resampling filter to use if resizing the image. Can be overridden by the `resample` parameter in the
        `preprocess` method.
    apply_ocr (`bool`, *optional*, defaults to `True`):
@@ -109,7 +109,7 @@ class LayoutLMv2ImageProcessor(BaseImageProcessor):
    ocr_lang (`str`, *optional*):
        The language, specified by its ISO code, to be used by the Tesseract OCR engine. By default, English is
        used. Can be overridden by `ocr_lang` in `preprocess`.
-   tesseract_config (`str`, *optional*):
+   tesseract_config (`str`, *optional*, defaults to `""`):
        Any additional custom configuration flags that are forwarded to the `config` parameter when calling
        Tesseract. For example: '--psm 6'. Can be overridden by `tesseract_config` in `preprocess`.
    """
...
@@ -38,9 +38,9 @@ class LayoutLMv2Processor(ProcessorMixin):
    into token-level `labels` for token classification tasks (such as FUNSD, CORD).
    Args:
-   image_processor (`LayoutLMv2ImageProcessor`):
+   image_processor (`LayoutLMv2ImageProcessor`, *optional*):
        An instance of [`LayoutLMv2ImageProcessor`]. The image processor is a required input.
-   tokenizer (`LayoutLMv2Tokenizer` or `LayoutLMv2TokenizerFast`):
+   tokenizer (`LayoutLMv2Tokenizer` or `LayoutLMv2TokenizerFast`, *optional*):
        An instance of [`LayoutLMv2Tokenizer`] or [`LayoutLMv2TokenizerFast`]. The tokenizer is a required input.
    """
    attributes = ["image_processor", "tokenizer"]
...
@@ -38,9 +38,9 @@ class LayoutLMv3Processor(ProcessorMixin):
    into token-level `labels` for token classification tasks (such as FUNSD, CORD).
    Args:
-   image_processor (`LayoutLMv3ImageProcessor`):
+   image_processor (`LayoutLMv3ImageProcessor`, *optional*):
        An instance of [`LayoutLMv3ImageProcessor`]. The image processor is a required input.
-   tokenizer (`LayoutLMv3Tokenizer` or `LayoutLMv3TokenizerFast`):
+   tokenizer (`LayoutLMv3Tokenizer` or `LayoutLMv3TokenizerFast`, *optional*):
        An instance of [`LayoutLMv3Tokenizer`] or [`LayoutLMv3TokenizerFast`]. The tokenizer is a required input.
    """
    attributes = ["image_processor", "tokenizer"]
...
@@ -253,7 +253,7 @@ class LayoutLMv3Tokenizer(PreTrainedTokenizer):
    mask_token (`str`, *optional*, defaults to `"<mask>"`):
        The token used for masking values. This is the token used when training this model with masked language
        modeling. This is the token which the model will try to predict.
-   add_prefix_space (`bool`, *optional*, defaults to `False`):
+   add_prefix_space (`bool`, *optional*, defaults to `True`):
        Whether or not to add an initial space to the input. This allows to treat the leading word just as any
        other word. (RoBERTa tokenizer detect beginning of words by the preceding space).
    cls_token_box (`List[int]`, *optional*, defaults to `[0, 0, 0, 0]`):
...
@@ -37,9 +37,9 @@ class LayoutXLMProcessor(ProcessorMixin):
    into token-level `labels` for token classification tasks (such as FUNSD, CORD).
    Args:
-   image_processor (`LayoutLMv2ImageProcessor`):
+   image_processor (`LayoutLMv2ImageProcessor`, *optional*):
        An instance of [`LayoutLMv2ImageProcessor`]. The image processor is a required input.
-   tokenizer (`LayoutXLMTokenizer` or `LayoutXLMTokenizerFast`):
+   tokenizer (`LayoutXLMTokenizer` or `LayoutXLMTokenizerFast`, *optional*):
        An instance of [`LayoutXLMTokenizer`] or [`LayoutXLMTokenizerFast`]. The tokenizer is a required input.
    """
...
@@ -203,8 +203,6 @@ class LayoutXLMTokenizer(PreTrainedTokenizer):
        CrossEntropyLoss.
    only_label_first_subword (`bool`, *optional*, defaults to `True`):
        Whether or not to only label the first subword, in case word labels are provided.
-   additional_special_tokens (`List[str]`, *optional*, defaults to `["<s>NOTUSED", "</s>NOTUSED"]`):
-       Additional special tokens used by the tokenizer.
    sp_model_kwargs (`dict`, *optional*):
        Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for
        SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things,
...
@@ -56,7 +56,7 @@ class LevitImageProcessor(BaseImageProcessor):
        edge value `c` is rescaled to `int(c * (256/224))`. The smaller edge of the image will be matched to this
        value i.e, if height > width, then image will be rescaled to `(size["shortest_egde"] * height / width,
        size["shortest_egde"])`. Can be overridden by the `size` parameter in the `preprocess` method.
-   resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BICUBIC`):
+   resample (`PILImageResampling`, *optional*, defaults to `Resampling.BICUBIC`):
        Resampling filter to use if resizing the image. Can be overridden by the `resample` parameter in the
        `preprocess` method.
    do_center_crop (`bool`, *optional*, defaults to `True`):
@@ -74,10 +74,10 @@ class LevitImageProcessor(BaseImageProcessor):
    do_normalize (`bool`, *optional*, defaults to `True`):
        Controls whether to normalize the image. Can be overridden by the `do_normalize` parameter in the
        `preprocess` method.
-   image_mean (`List[int]`, defaults to `[0.229, 0.224, 0.225]`):
+   image_mean (`List[int]`, *optional*, defaults to `[0.485, 0.456, 0.406]`):
        Mean to use if normalizing the image. This is a float or list of floats the length of the number of
        channels in the image. Can be overridden by the `image_mean` parameter in the `preprocess` method.
-   image_std (`List[int]`, defaults to `[0.485, 0.456, 0.406]`):
+   image_std (`List[int]`, *optional*, defaults to `[0.229, 0.224, 0.225]`):
        Standard deviation to use if normalizing the image. This is a float or list of floats the length of the
        number of channels in the image. Can be overridden by the `image_std` parameter in the `preprocess` method.
    """
...
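The Levit hunk swaps the documented `image_mean` and `image_std` back to the standard ImageNet statistics; the old docstring had the two lists crossed. The distinction matters because the mean is subtracted while the std divides, so swapped values document a different transform. A minimal sketch of channel-wise normalization with the corrected constants:

```python
import numpy as np

# Standard ImageNet statistics: the mean is subtracted, the std divides.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalize(pixels: np.ndarray) -> np.ndarray:
    """Normalize an (H, W, 3) image already rescaled to [0, 1]."""
    return (pixels - IMAGENET_MEAN) / IMAGENET_STD
```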
@@ -43,14 +43,18 @@ class LxmertConfig(PretrainedConfig):
        `inputs_ids` passed when calling [`LxmertModel`] or [`TFLxmertModel`].
    hidden_size (`int`, *optional*, defaults to 768):
        Dimensionality of the encoder layers and the pooler layer.
-   r_layers (`int`, *optional*, defaults to 5):
-       Number of hidden layers in the Transformer visual encoder.
-   l_layers (`int`, *optional*, defaults to 9):
-       Number of hidden layers in the Transformer language encoder.
-   x_layers (`int`, *optional*, defaults to 5):
-       Number of hidden layers in the Transformer cross modality encoder.
-   num_attention_heads (`int`, *optional*, defaults to 5):
+   num_attention_heads (`int`, *optional*, defaults to 12):
        Number of attention heads for each attention layer in the Transformer encoder.
+   num_qa_labels (`int`, *optional*, defaults to 9500):
+       This represents the total number of different question answering (QA) labels there are. If using more than
+       one dataset with QA, the user will need to account for the total number of labels that all of the datasets
+       have in total.
+   num_object_labels (`int`, *optional*, defaults to 1600):
+       This represents the total number of semantically unique objects that lxmert will be able to classify a
+       pooled-object feature as belonging too.
+   num_attr_labels (`int`, *optional*, defaults to 400):
+       This represents the total number of semantically unique attributes that lxmert will be able to classify a
+       pooled-object feature as possessing.
    intermediate_size (`int`, *optional*, defaults to 3072):
        Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
    hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
@@ -69,25 +73,21 @@ class LxmertConfig(PretrainedConfig):
        The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
    layer_norm_eps (`float`, *optional*, defaults to 1e-12):
        The epsilon used by the layer normalization layers.
+   l_layers (`int`, *optional*, defaults to 9):
+       Number of hidden layers in the Transformer language encoder.
+   x_layers (`int`, *optional*, defaults to 5):
+       Number of hidden layers in the Transformer cross modality encoder.
+   r_layers (`int`, *optional*, defaults to 5):
+       Number of hidden layers in the Transformer visual encoder.
    visual_feat_dim (`int`, *optional*, defaults to 2048):
        This represents the last dimension of the pooled-object features used as input for the model, representing
        the size of each object feature itself.
    visual_pos_dim (`int`, *optional*, defaults to 4):
        This represents the number of spacial features that are mixed into the visual features. The default is set
        to 4 because most commonly this will represent the location of a bounding box. i.e., (x, y, width, height)
-   visual_loss_normalizer (`float`, *optional*, defaults to 1/15):
+   visual_loss_normalizer (`float`, *optional*, defaults to 6.67):
        This represents the scaling factor in which each visual loss is multiplied by if during pretraining, one
        decided to train with multiple vision-based loss objectives.
-   num_qa_labels (`int`, *optional*, defaults to 9500):
-       This represents the total number of different question answering (QA) labels there are. If using more than
-       one dataset with QA, the user will need to account for the total number of labels that all of the datasets
-       have in total.
-   num_object_labels (`int`, *optional*, defaults to 1600):
-       This represents the total number of semantically unique objects that lxmert will be able to classify a
-       pooled-object feature as belonging too.
-   num_attr_labels (`int`, *optional*, defaults to 400):
-       This represents the total number of semantically unique attributes that lxmert will be able to classify a
-       pooled-object feature as possessing.
    task_matched (`bool`, *optional*, defaults to `True`):
        This task is used for sentence-image matching. If the sentence correctly describes the image the label will
        be 1. If the sentence does not correctly describe the image, the label will be 0.
@@ -104,12 +104,6 @@ class LxmertConfig(PretrainedConfig):
        Whether or not to calculate the attribute-prediction loss objective
    visual_feat_loss (`bool`, *optional*, defaults to `True`):
        Whether or not to calculate the feature-regression loss objective
-   output_attentions (`bool`, *optional*, defaults to `False`):
-       Whether or not the model should return the attentions from the vision, language, and cross-modality layers
-       should be returned.
-   output_hidden_states (`bool`, *optional*, defaults to `False`):
-       Whether or not the model should return the hidden states from the vision, language, and cross-modality
-       layers should be returned.
    """
    model_type = "lxmert"
...
@@ -356,20 +356,17 @@ class Mask2FormerImageProcessor(BaseImageProcessor):
        sequence like `(width, height)`, output size will be matched to this. If size is an int, smaller edge of
        the image will be matched to this number. i.e, if `height > width`, then image will be rescaled to `(size *
        height / width, size)`.
-   max_size (`int`, *optional*, defaults to 1333):
-       The largest size an image dimension can have (otherwise it's capped). Only has an effect if `do_resize` is
-       set to `True`.
+   size_divisor (`int`, *optional*, defaults to 32):
+       Some backbones need images divisible by a certain number. If not passed, it defaults to the value used in
+       Swin Transformer.
-   resample (`int`, *optional*, defaults to `PIL.Image.Resampling.BILINEAR`):
+   resample (`int`, *optional*, defaults to `Resampling.BILINEAR`):
        An optional resampling filter. This can be one of `PIL.Image.Resampling.NEAREST`,
        `PIL.Image.Resampling.BOX`, `PIL.Image.Resampling.BILINEAR`, `PIL.Image.Resampling.HAMMING`,
        `PIL.Image.Resampling.BICUBIC` or `PIL.Image.Resampling.LANCZOS`. Only has an effect if `do_resize` is set
        to `True`.
-   size_divisor (`int`, *optional*, defaults to 32):
-       Some backbones need images divisible by a certain number. If not passed, it defaults to the value used in
-       Swin Transformer.
    do_rescale (`bool`, *optional*, defaults to `True`):
        Whether to rescale the input to a certain `scale`.
-   rescale_factor (`float`, *optional*, defaults to 1/ 255):
+   rescale_factor (`float`, *optional*, defaults to `1/ 255`):
        Rescale the input by the given factor. Only has an effect if `do_rescale` is set to `True`.
    do_normalize (`bool`, *optional*, defaults to `True`):
        Whether or not to normalize the input with mean and standard deviation.
...