"test/ut/git@developer.sourcefind.cn:OpenDAS/nni.git" did not exist on "0efabe96f00306cd8b01c53697409338056dc00d"
Unverified Commit 03af4c42 authored by Sylvain Gugger, committed by GitHub

Docstring check (#26052)



* Fix number of minimal calls to the Hub with peft integration

* Alternate design

* And this way?

* Revert

* Nits to fix

* Add util

* Print when changes are made

* Add list to ignore

* Add more rules

* Manual fixes

* deal with kwargs

* deal with enum defaults

* avoid many digits for floats

* Manual fixes

* Fix regex

* Fix regex

* Auto fix

* Style

* Apply script

* Add ignored list

* Add check that templates are filled

* Adding to CI checks

* Add back semi-fix

* Ignore more objects

* More auto-fixes

* Ignore missing objects

* Remove temp semi-fix

* Fixes

* Update src/transformers/models/pvt/configuration_pvt.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update utils/check_docstrings.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Deal with float defaults

* Fix small defaults

* Address review comment

* Treat

* Post-rebase cleanup

* Address review comment

* Update src/transformers/models/deprecated/mctct/configuration_mctct.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comment

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
parent 122b2657
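
For readers who want a feel for what the new check does: the actual `utils/check_docstrings.py` is not part of this excerpt, so the snippet below is only a rough, hypothetical sketch of the signature-versus-docstring comparison the commit describes (the helper name and regex are invented for illustration).

```python
# Hypothetical sketch only -- NOT the real utils/check_docstrings.py.
# Idea enforced by the commit: every parameter in an object's signature should be
# documented under `Args:` in the "name (`type`, *optional*, defaults to X):" style.
import inspect
import re


def undocumented_args(obj):
    """Return the signature parameters of `obj` that never appear in its docstring."""
    doc = inspect.getdoc(obj) or ""
    # Docstring entries start with the argument name followed by " (",
    # e.g. "return_tensors (`str`, *optional*, defaults to `"pt"`):"
    documented = set(re.findall(r"^\s*(\w+) \(", doc, flags=re.MULTILINE))
    return [
        name
        for name in inspect.signature(obj).parameters
        if name not in {"self", "args", "kwargs"} and name not in documented
    ]
```

Running something like `undocumented_args(BarkProcessor.__init__)` against the classes touched below would surface the kind of mismatches this diff fixes by hand or via `--fix_and_overwrite`.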
@@ -209,6 +209,7 @@ jobs:
     - run: make deps_table_check_updated
     - run: python utils/update_metadata.py --check-only
     - run: python utils/check_task_guides.py
+    - run: python utils/check_docstrings.py

 workflows:
   version: 2
@@ -43,6 +43,7 @@ repo-consistency:
    python utils/check_doctest_list.py
    python utils/update_metadata.py --check-only
    python utils/check_task_guides.py
+   python utils/check_docstrings.py

 # this target runs checks on all files

@@ -82,6 +83,7 @@ fix-copies:
    python utils/check_dummies.py --fix_and_overwrite
    python utils/check_doctest_list.py --fix_and_overwrite
    python utils/check_task_guides.py --fix_and_overwrite
+   python utils/check_docstrings.py --fix_and_overwrite

 # Run tests for the library
@@ -124,6 +124,7 @@ This checks that:
 - The translations of the READMEs and the index of the doc have the same model list as the main README (performed by `utils/check_copies.py`)
 - The auto-generated tables in the documentation are up to date (performed by `utils/check_table.py`)
 - The library has all objects available even if not all optional dependencies are installed (performed by `utils/check_dummies.py`)
+- All docstrings properly document the arguments in the signature of the object (performed by `utils/check_docstrings.py`)

 Should this check fail, the first two items require manual fixing, the last four can be fixed automatically for you by running the command
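
For context, the wiring added above means the new check runs alongside the other repo-consistency utilities; locally it can be invoked directly. The commands below are taken verbatim from the Makefile hunk above, with comments added here:

```
python utils/check_docstrings.py                      # report mismatched or incomplete docstrings
python utils/check_docstrings.py --fix_and_overwrite  # apply the automatic fixes (also run by `make fix-copies`)
```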
@@ -47,6 +47,7 @@ _re_configuration_file = re.compile(r"config\.(.*)\.json")

 class PretrainedConfig(PushToHubMixin):
+    # no-format
     r"""
     Base class for all configuration classes. Handles a few parameters common to all models' configurations as well as
     methods for loading/downloading/saving configurations.
@@ -90,7 +90,7 @@ class DefaultDataCollator(DataCollatorMixin):
     helpful if you need to set a return_tensors value at initialization.

     Args:
-        return_tensors (`str`):
+        return_tensors (`str`, *optional*, defaults to `"pt"`):
             The type of Tensor to return. Allowable values are "np", "pt" and "tf".
     """

@@ -235,7 +235,7 @@ class DataCollatorWithPadding:
             This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >=
             7.5 (Volta).
-        return_tensors (`str`):
+        return_tensors (`str`, *optional*, defaults to `"pt"`):
             The type of Tensor to return. Allowable values are "np", "pt" and "tf".
     """

@@ -288,7 +288,7 @@ class DataCollatorForTokenClassification(DataCollatorMixin):
             7.5 (Volta).
         label_pad_token_id (`int`, *optional*, defaults to -100):
             The id to use when padding the labels (-100 will be automatically ignore by PyTorch loss functions).
-        return_tensors (`str`):
+        return_tensors (`str`, *optional*, defaults to `"pt"`):
             The type of Tensor to return. Allowable values are "np", "pt" and "tf".
     """

@@ -521,7 +521,7 @@ class DataCollatorForSeq2Seq:
     Args:
         tokenizer ([`PreTrainedTokenizer`] or [`PreTrainedTokenizerFast`]):
             The tokenizer used for encoding the data.
-        model ([`PreTrainedModel`]):
+        model ([`PreTrainedModel`], *optional*):
             The model that is being trained. If set and has the *prepare_decoder_input_ids_from_labels*, use it to
             prepare the *decoder_input_ids*

@@ -544,7 +544,7 @@ class DataCollatorForSeq2Seq:
             7.5 (Volta).
         label_pad_token_id (`int`, *optional*, defaults to -100):
             The id to use when padding the labels (-100 will be automatically ignored by PyTorch loss functions).
-        return_tensors (`str`):
+        return_tensors (`str`, *optional*, defaults to `"pt"`):
             The type of Tensor to return. Allowable values are "np", "pt" and "tf".
     """
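
The pattern repeated across these docstring hunks is the layout the new check expects. As a purely illustrative example (the function and its arguments below are invented, not taken from the diff):

```python
def pad_batch(features, max_length=None, return_tensors="pt"):
    """
    Hypothetical function showing the docstring layout enforced by `utils/check_docstrings.py`:
    every signature argument appears under `Args:`, optional arguments carry *optional*, and
    literal defaults are spelled out next to the type.

    Args:
        features (`List[Dict[str, Any]]`):
            The examples to collate into a batch.
        max_length (`int`, *optional*):
            Maximum length of the returned sequences; no padding/truncation target if unset.
        return_tensors (`str`, *optional*, defaults to `"pt"`):
            The type of Tensor to return. Allowable values are "np", "pt" and "tf".
    """
    ...
```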
@@ -65,7 +65,7 @@ class BatchFeature(UserDict):
     This class is derived from a python dictionary and can be used as a dictionary.

     Args:
-        data (`dict`):
+        data (`dict`, *optional*):
             Dictionary of lists/arrays/tensors returned by the __call__/pad methods ('input_values', 'attention_mask',
             etc.).
         tensor_type (`Union[None, str, TensorType]`, *optional*):
@@ -263,8 +263,9 @@ class DisjunctiveConstraint(Constraint):
     A special [`Constraint`] that is fulfilled by fulfilling just one of several constraints.

     Args:
-        nested_token_ids (`List[List[int]]`): a list of words, where each word is a list of ids. This constraint
-            is fulfilled by generating just one from the list of words.
+        nested_token_ids (`List[List[int]]`):
+            A list of words, where each word is a list of ids. This constraint is fulfilled by generating just one from
+            the list of words.
     """

     def __init__(self, nested_token_ids: List[List[int]]):
@@ -152,7 +152,7 @@ class BeamSearchScorer(BeamScorer):
         num_beam_hyps_to_keep (`int`, *optional*, defaults to 1):
             The number of beam hypotheses that shall be returned upon calling
             [`~transformer.BeamSearchScorer.finalize`].
-        num_beam_groups (`int`):
+        num_beam_groups (`int`, *optional*, defaults to 1):
             Number of groups to divide `num_beams` into in order to ensure diversity among different groups of beams.
             See [this paper](https://arxiv.org/pdf/1610.02424.pdf) for more details.
         max_length (`int`, *optional*):

@@ -437,7 +437,7 @@ class ConstrainedBeamSearchScorer(BeamScorer):
         num_beam_hyps_to_keep (`int`, *optional*, defaults to 1):
             The number of beam hypotheses that shall be returned upon calling
             [`~transformer.BeamSearchScorer.finalize`].
-        num_beam_groups (`int`):
+        num_beam_groups (`int`, *optional*, defaults to 1):
             Number of groups to divide `num_beams` into in order to ensure diversity among different groups of beams.
             See [this paper](https://arxiv.org/pdf/1610.02424.pdf) for more details.
         max_length (`int`, *optional*):
@@ -38,6 +38,7 @@ METADATA_FIELDS = ("_from_model_config", "_commit_hash", "_original_object_hash"

 class GenerationConfig(PushToHubMixin):
+    # no-format
     r"""
     Class that holds a configuration for a generation task. A `generate` call supports the following generation methods
     for text-decoder, text-to-text, speech-to-text, and vision-to-text models:
@@ -120,7 +120,7 @@ class FlaxTopPLogitsWarper(FlaxLogitsWarper):
         top_p (`float`):
             If set to < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or
             higher are kept for generation.
-        filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+        filter_value (`float`, *optional*, defaults to -inf):
             All filtered values will be set to this float value.
         min_tokens_to_keep (`int`, *optional*, defaults to 1):
             Minimum number of tokens that cannot be filtered.

@@ -163,7 +163,7 @@ class FlaxTopKLogitsWarper(FlaxLogitsWarper):
     Args:
         top_k (`int`):
             The number of highest probability vocabulary tokens to keep for top-k-filtering.
-        filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+        filter_value (`float`, *optional*, defaults to -inf):
             All filtered values will be set to this float value.
         min_tokens_to_keep (`int`, *optional*, defaults to 1):
             Minimum number of tokens that cannot be filtered.
@@ -357,7 +357,7 @@ class TopPLogitsWarper(LogitsWarper):
         top_p (`float`):
             If set to < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or
             higher are kept for generation.
-        filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+        filter_value (`float`, *optional*, defaults to -inf):
             All filtered values will be set to this float value.
         min_tokens_to_keep (`int`, *optional*, defaults to 1):
             Minimum number of tokens that cannot be filtered.

@@ -419,7 +419,7 @@ class TopKLogitsWarper(LogitsWarper):
     Args:
         top_k (`int`):
             The number of highest probability vocabulary tokens to keep for top-k-filtering.
-        filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+        filter_value (`float`, *optional*, defaults to -inf):
             All filtered values will be set to this float value.
         min_tokens_to_keep (`int`, *optional*, defaults to 1):
             Minimum number of tokens that cannot be filtered.

@@ -447,9 +447,9 @@ class TypicalLogitsWarper(LogitsWarper):
     Generation](https://arxiv.org/abs/2202.00666) for more information.

     Args:
-        mass (`float`):
+        mass (`float`, *optional*, defaults to 0.9):
             Value of typical_p between 0 and 1 inclusive, defaults to 0.9.
-        filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+        filter_value (`float`, *optional*, defaults to -inf):
             All filtered values will be set to this float value.
         min_tokens_to_keep (`int`, *optional*, defaults to 1):
             Minimum number of tokens that cannot be filtered.

@@ -499,7 +499,7 @@ class EpsilonLogitsWarper(LogitsWarper):
     Args:
         epsilon (`float`):
             If set to > 0, only the most tokens with probabilities `epsilon` or higher are kept for generation.
-        filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+        filter_value (`float`, *optional*, defaults to -inf):
             All filtered values will be set to this float value.
         min_tokens_to_keep (`int`, *optional*, defaults to 1):
             Minimum number of tokens that cannot be filtered.

@@ -572,7 +572,7 @@ class EtaLogitsWarper(LogitsWarper):
         epsilon (`float`):
             A float value in the range (0, 1). Hyperparameter used to calculate the dynamic cutoff value, `eta`. The
             suggested values from the paper ranges from 3e-4 to 4e-3 depending on the size of the model.
-        filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+        filter_value (`float`, *optional*, defaults to -inf):
             All values that are found to be below the dynamic cutoff value, `eta`, are set to this float value. This
             parameter is useful when logits need to be modified for very low probability tokens that should be excluded
             from generation entirely.

@@ -1600,18 +1600,15 @@ class UnbatchedClassifierFreeGuidanceLogitsProcessor(LogitsProcessor):
             Higher guidance scale encourages the model to generate samples that are more closely linked to the input
             prompt, usually at the expense of poorer quality. A value smaller than 1 has the opposite effect, while
             making the negative prompt provided with negative_prompt_ids (if any) act as a positive prompt.
+        model (`PreTrainedModel`):
+            The model computing the unconditional scores. Supposedly the same as the one computing the conditional
+            scores. Both models must use the same tokenizer.
         unconditional_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
             Indices of input sequence tokens in the vocabulary for the unconditional branch. If unset, will default to
             the last token of the prompt.
-        unconditional_attention_mask (`torch.LongTensor` of shape `(batch_size, sequence_length)`, **optional**):
+        unconditional_attention_mask (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
             Attention mask for unconditional_ids.
-        model (`PreTrainedModel`):
-            The model computing the unconditional scores. Supposedly the same as the one computing the conditional
-            scores. Both models must use the same tokenizer.
-        smooth_factor (`float`, **optional**):
-            The interpolation weight for CFG Rescale. 1 means no rescaling, 0 reduces to the conditional scores without
-            CFG. Turn it lower if the output degenerates.
-        use_cache (`bool`, **optional**):
+        use_cache (`bool`, *optional*, defaults to `True`):
             Whether to cache key/values during the negative prompt forward pass.
@@ -49,7 +49,7 @@ class MaxLengthCriteria(StoppingCriteria):
     Args:
         max_length (`int`):
             The maximum length that the output sequence can have in number of tokens.
-        max_position_embeddings (`int`, `optional`):
+        max_position_embeddings (`int`, *optional*):
             The maximum model length, as defined by the model's `config.max_position_embeddings` attribute.
     """
@@ -122,7 +122,7 @@ class TFTopKLogitsWarper(TFLogitsWarper):
     Args:
         top_k (`int`):
             The number of highest probability vocabulary tokens to keep for top-k-filtering.
-        filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+        filter_value (`float`, *optional*, defaults to -inf):
             All filtered values will be set to this float value.
         min_tokens_to_keep (`int`, *optional*, defaults to 1):
             Minimum number of tokens that cannot be filtered.

@@ -151,7 +151,7 @@ class TFTopPLogitsWarper(TFLogitsWarper):
         top_p (`float`):
             If set to < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or
             higher are kept for generation.
-        filter_value (`float`, *optional*, defaults to `-float("Inf")`):
+        filter_value (`float`, *optional*, defaults to -inf):
             All filtered values will be set to this float value.
         min_tokens_to_keep (`int`, *optional*, defaults to 1):
             Minimum number of tokens that cannot be filtered.
@@ -71,6 +71,8 @@ class AlignTextConfig(PretrainedConfig):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
         layer_norm_eps (`float`, *optional*, defaults to 1e-12):
             The epsilon used by the layer normalization layers.
+        pad_token_id (`int`, *optional*, defaults to 0):
+            Padding token id.
         position_embedding_type (`str`, *optional*, defaults to `"absolute"`):
             Type of position embedding. Choose one of `"absolute"`, `"relative_key"`, `"relative_key_query"`. For
             positional embeddings use `"absolute"`. For more information on `"relative_key"`, please refer to

@@ -80,8 +82,6 @@ class AlignTextConfig(PretrainedConfig):
         use_cache (`bool`, *optional*, defaults to `True`):
             Whether or not the model should return the last key/values attentions (not used by all models). Only
             relevant if `config.is_decoder=True`.
-        pad_token_id (`int`, *optional*, defaults to 0)
-            Padding token id.

     Example:
@@ -259,7 +259,7 @@ class AltCLIPConfig(PretrainedConfig):
             Dictionary of configuration options used to initialize [`AltCLIPTextConfig`].
         vision_config (`dict`, *optional*):
             Dictionary of configuration options used to initialize [`AltCLIPVisionConfig`].
-        projection_dim (`int`, *optional*, defaults to 512):
+        projection_dim (`int`, *optional*, defaults to 768):
             Dimentionality of text and vision projection layers.
         logit_scale_init_value (`float`, *optional*, defaults to 2.6592):
             The inital value of the *logit_scale* paramter. Default is used as per the original CLIP implementation.
@@ -30,9 +30,9 @@ class AltCLIPProcessor(ProcessorMixin):
     the [`~AltCLIPProcessor.__call__`] and [`~AltCLIPProcessor.decode`] for more information.

     Args:
-        image_processor ([`CLIPImageProcessor`]):
+        image_processor ([`CLIPImageProcessor`], *optional*):
             The image processor is a required input.
-        tokenizer ([`XLMRobertaTokenizerFast`]):
+        tokenizer ([`XLMRobertaTokenizerFast`], *optional*):
             The tokenizer is a required input.
     """

     attributes = ["image_processor", "tokenizer"]
@@ -51,15 +51,15 @@ class ASTConfig(PretrainedConfig):
         hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
             The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
             `"relu"`, `"selu"` and `"gelu_new"` are supported.
-        hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
+        hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
             The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
-        attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
+        attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
             The dropout ratio for the attention probabilities.
         initializer_range (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
         layer_norm_eps (`float`, *optional*, defaults to 1e-12):
             The epsilon used by the layer normalization layers.
-        patch_size (`int`, *optional*, defaults to `16`):
+        patch_size (`int`, *optional*, defaults to 16):
             The size (resolution) of each patch.
         qkv_bias (`bool`, *optional*, defaults to `True`):
             Whether to add a bias to the queries, keys and values.
@@ -38,7 +38,7 @@ class BarkProcessor(ProcessorMixin):
     Args:
         tokenizer ([`PreTrainedTokenizer`]):
             An instance of [`PreTrainedTokenizer`].
-        speaker_embeddings (`Dict[Dict[str]]`, *optional*, defaults to `None`):
+        speaker_embeddings (`Dict[Dict[str]]`, *optional*):
             Optional nested speaker embeddings dictionary. The first level contains voice preset names (e.g
             `"en_speaker_4"`). The second level contains `"semantic_prompt"`, `"coarse_prompt"` and `"fine_prompt"`
             embeddings. The values correspond to the path of the corresponding `np.ndarray`. See
@@ -97,8 +97,6 @@ class BarthezTokenizer(PreTrainedTokenizer):
         mask_token (`str`, *optional*, defaults to `"<mask>"`):
             The token used for masking values. This is the token used when training this model with masked language
             modeling. This is the token which the model will try to predict.
-        additional_special_tokens (`List[str]`, *optional*, defaults to `["<s>NOTUSED", "</s>NOTUSED"]`):
-            Additional special tokens used by the tokenizer.
         sp_model_kwargs (`dict`, *optional*):
             Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for
             SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things,
@@ -92,8 +92,6 @@ class BartphoTokenizer(PreTrainedTokenizer):
         mask_token (`str`, *optional*, defaults to `"<mask>"`):
             The token used for masking values. This is the token used when training this model with masked language
             modeling. This is the token which the model will try to predict.
-        additional_special_tokens (`List[str]`, *optional*, defaults to `["<s>NOTUSED", "</s>NOTUSED"]`):
-            Additional special tokens used by the tokenizer.
         sp_model_kwargs (`dict`, *optional*):
             Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for
             SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things,