Unverified commit 03af4c42 authored by Sylvain Gugger, committed by GitHub

Docstring check (#26052)



* Fix number of minimal calls to the Hub with peft integration

* Alternate design

* And this way?

* Revert

* Nits to fix

* Add util

* Print when changes are made

* Add list to ignore

* Add more rules

* Manual fixes

* deal with kwargs

* deal with enum defaults

* avoid many digits for floats

* Manual fixes

* Fix regex

* Fix regex

* Auto fix

* Style

* Apply script

* Add ignored list

* Add check that templates are filled

* Adding to CI checks

* Add back semi-fix

* Ignore more objects

* More auto-fixes

* Ignore missing objects

* Remove temp semi-fix

* Fixes

* Update src/transformers/models/pvt/configuration_pvt.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update utils/check_docstrings.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Deal with float defaults

* Fix small defaults

* Address review comment

* Treat

* Post-rebase cleanup

* Address review comment

* Update src/transformers/models/deprecated/mctct/configuration_mctct.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comment

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
parent 122b2657
@@ -209,6 +209,7 @@ jobs:
- run: make deps_table_check_updated
- run: python utils/update_metadata.py --check-only
- run: python utils/check_task_guides.py
- run: python utils/check_docstrings.py
workflows:
version: 2
@@ -43,6 +43,7 @@ repo-consistency:
python utils/check_doctest_list.py
python utils/update_metadata.py --check-only
python utils/check_task_guides.py
python utils/check_docstrings.py
# this target runs checks on all files
@@ -82,6 +83,7 @@ fix-copies:
python utils/check_dummies.py --fix_and_overwrite
python utils/check_doctest_list.py --fix_and_overwrite
python utils/check_task_guides.py --fix_and_overwrite
python utils/check_docstrings.py --fix_and_overwrite
# Run tests for the library
@@ -124,6 +124,7 @@ This checks that:
- The translations of the READMEs and the index of the doc have the same model list as the main README (performed by `utils/check_copies.py`)
- The auto-generated tables in the documentation are up to date (performed by `utils/check_table.py`)
- The library has all objects available even if not all optional dependencies are installed (performed by `utils/check_dummies.py`)
- All docstrings properly document the arguments in the signature of the object (performed by `utils/check_docstrings.py`)
Should this check fail, the first two items require manual fixing; the last four can be fixed automatically for you by running the command
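To make the new check concrete, here is a minimal sketch of the idea behind a docstring/signature consistency check. It is an illustration only, not the actual logic of `utils/check_docstrings.py`; the helper `find_undocumented_args` and the `example` function are invented for the demonstration.

```python
import inspect
import re


def find_undocumented_args(obj):
    """Return signature parameters of `obj` that never appear in its docstring."""
    doc = inspect.getdoc(obj) or ""
    # Parameters documented in the `name (type):` style at the start of a line.
    documented = set(re.findall(r"^\s*(\w+) \(", doc, flags=re.MULTILINE))
    params = [
        name
        for name, param in inspect.signature(obj).parameters.items()
        if name not in ("self", "args", "kwargs")
    ]
    return [name for name in params if name not in documented]


def example(a, b=1, c=None):
    """Toy function with an incomplete docstring.

    Args:
        a (`int`):
            First input.
        b (`int`, *optional*, defaults to 1):
            Second input.
    """


print(find_undocumented_args(example))  # ['c'] -- the kind of mismatch the check flags
```

The real script goes further: it also verifies that documented defaults match the signature defaults and, as the Makefile hunk above shows, can rewrite docstrings in place when run with `--fix_and_overwrite`.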
@@ -47,6 +47,7 @@ _re_configuration_file = re.compile(r"config\.(.*)\.json")
class PretrainedConfig(PushToHubMixin):
# no-format
r"""
Base class for all configuration classes. Handles a few parameters common to all models' configurations as well as
methods for loading/downloading/saving configurations.
@@ -90,7 +90,7 @@ class DefaultDataCollator(DataCollatorMixin):
helpful if you need to set a return_tensors value at initialization.
Args:
return_tensors (`str`):
return_tensors (`str`, *optional*, defaults to `"pt"`):
The type of Tensor to return. Allowable values are "np", "pt" and "tf".
"""
@@ -235,7 +235,7 @@ class DataCollatorWithPadding:
This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >=
7.5 (Volta).
return_tensors (`str`):
return_tensors (`str`, *optional*, defaults to `"pt"`):
The type of Tensor to return. Allowable values are "np", "pt" and "tf".
"""
@@ -288,7 +288,7 @@ class DataCollatorForTokenClassification(DataCollatorMixin):
7.5 (Volta).
label_pad_token_id (`int`, *optional*, defaults to -100):
The id to use when padding the labels (-100 will be automatically ignored by PyTorch loss functions).
return_tensors (`str`):
return_tensors (`str`, *optional*, defaults to `"pt"`):
The type of Tensor to return. Allowable values are "np", "pt" and "tf".
"""
@@ -521,7 +521,7 @@ class DataCollatorForSeq2Seq:
Args:
tokenizer ([`PreTrainedTokenizer`] or [`PreTrainedTokenizerFast`]):
The tokenizer used for encoding the data.
model ([`PreTrainedModel`]):
model ([`PreTrainedModel`], *optional*):
The model that is being trained. If set and it has the *prepare_decoder_input_ids_from_labels* method, use it to
prepare the *decoder_input_ids*
@@ -544,7 +544,7 @@ class DataCollatorForSeq2Seq:
7.5 (Volta).
label_pad_token_id (`int`, *optional*, defaults to -100):
The id to use when padding the labels (-100 will be automatically ignored by PyTorch loss functions).
return_tensors (`str`):
return_tensors (`str`, *optional*, defaults to `"pt"`):
The type of Tensor to return. Allowable values are "np", "pt" and "tf".
"""
@@ -65,7 +65,7 @@ class BatchFeature(UserDict):
This class is derived from a python dictionary and can be used as a dictionary.
Args:
data (`dict`):
data (`dict`, *optional*):
Dictionary of lists/arrays/tensors returned by the __call__/pad methods ('input_values', 'attention_mask',
etc.).
tensor_type (`Union[None, str, TensorType]`, *optional*):
@@ -263,8 +263,9 @@ class DisjunctiveConstraint(Constraint):
A special [`Constraint`] that is fulfilled by fulfilling just one of several constraints.
Args:
nested_token_ids (`List[List[int]]`): a list of words, where each word is a list of ids. This constraint
is fulfilled by generating just one from the list of words.
nested_token_ids (`List[List[int]]`):
A list of words, where each word is a list of ids. This constraint is fulfilled by generating just one from
the list of words.
"""
def __init__(self, nested_token_ids: List[List[int]]):
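For context on how this constraint is meant to be used, a sketch of constrained generation with `DisjunctiveConstraint` follows. It assumes the usual `generate(constraints=...)` entry point and uses placeholder checkpoint names and words; treat it as an outline rather than a verified recipe.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, DisjunctiveConstraint

tokenizer = AutoTokenizer.from_pretrained("t5-small")  # placeholder checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# The constraint is satisfied as soon as ONE of these word sequences appears in the output.
words = ["quick", "rapid"]
nested_token_ids = [tokenizer(word, add_special_tokens=False).input_ids for word in words]

inputs = tokenizer("translate English to German: The fast fox jumps.", return_tensors="pt")
output = model.generate(
    **inputs,
    constraints=[DisjunctiveConstraint(nested_token_ids)],
    num_beams=4,  # constrained decoding runs on top of beam search
    max_new_tokens=20,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```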
@@ -152,7 +152,7 @@ class BeamSearchScorer(BeamScorer):
num_beam_hyps_to_keep (`int`, *optional*, defaults to 1):
The number of beam hypotheses that shall be returned upon calling
[`~transformers.BeamSearchScorer.finalize`].
num_beam_groups (`int`):
num_beam_groups (`int`, *optional*, defaults to 1):
Number of groups to divide `num_beams` into in order to ensure diversity among different groups of beams.
See [this paper](https://arxiv.org/pdf/1610.02424.pdf) for more details.
max_length (`int`, *optional*):
@@ -437,7 +437,7 @@ class ConstrainedBeamSearchScorer(BeamScorer):
num_beam_hyps_to_keep (`int`, *optional*, defaults to 1):
The number of beam hypotheses that shall be returned upon calling
[`~transformers.BeamSearchScorer.finalize`].
num_beam_groups (`int`):
num_beam_groups (`int`, *optional*, defaults to 1):
Number of groups to divide `num_beams` into in order to ensure diversity among different groups of beams.
See [this paper](https://arxiv.org/pdf/1610.02424.pdf) for more details.
max_length (`int`, *optional*):
@@ -38,6 +38,7 @@ METADATA_FIELDS = ("_from_model_config", "_commit_hash", "_original_object_hash"
class GenerationConfig(PushToHubMixin):
# no-format
r"""
Class that holds a configuration for a generation task. A `generate` call supports the following generation methods
for text-decoder, text-to-text, speech-to-text, and vision-to-text models:
@@ -120,7 +120,7 @@ class FlaxTopPLogitsWarper(FlaxLogitsWarper):
top_p (`float`):
If set to < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or
higher are kept for generation.
filter_value (`float`, *optional*, defaults to `-float("Inf")`):
filter_value (`float`, *optional*, defaults to -inf):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
Minimum number of tokens that cannot be filtered.
@@ -163,7 +163,7 @@ class FlaxTopKLogitsWarper(FlaxLogitsWarper):
Args:
top_k (`int`):
The number of highest probability vocabulary tokens to keep for top-k-filtering.
filter_value (`float`, *optional*, defaults to `-float("Inf")`):
filter_value (`float`, *optional*, defaults to -inf):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
Minimum number of tokens that cannot be filtered.
@@ -357,7 +357,7 @@ class TopPLogitsWarper(LogitsWarper):
top_p (`float`):
If set to < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or
higher are kept for generation.
filter_value (`float`, *optional*, defaults to `-float("Inf")`):
filter_value (`float`, *optional*, defaults to -inf):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
Minimum number of tokens that cannot be filtered.
@@ -419,7 +419,7 @@ class TopKLogitsWarper(LogitsWarper):
Args:
top_k (`int`):
The number of highest probability vocabulary tokens to keep for top-k-filtering.
filter_value (`float`, *optional*, defaults to `-float("Inf")`):
filter_value (`float`, *optional*, defaults to -inf):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
Minimum number of tokens that cannot be filtered.
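The `filter_value` documented throughout these warpers defaults to negative infinity; schematically, top-k warping keeps the `top_k` highest logits and replaces every other one with that value, so the filtered tokens get zero probability after the softmax. A standalone sketch (not the library implementation):

```python
import math


def top_k_warp(logits, top_k=2, filter_value=-math.inf):
    """Keep the `top_k` largest logits and set every other one to `filter_value`."""
    threshold = sorted(logits, reverse=True)[top_k - 1]
    return [score if score >= threshold else filter_value for score in logits]


print(top_k_warp([1.0, 3.5, 0.2, 2.7]))  # [-inf, 3.5, -inf, 2.7]
```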
@@ -447,9 +447,9 @@ class TypicalLogitsWarper(LogitsWarper):
Generation](https://arxiv.org/abs/2202.00666) for more information.
Args:
mass (`float`):
mass (`float`, *optional*, defaults to 0.9):
Value of typical_p between 0 and 1 inclusive, defaults to 0.9.
filter_value (`float`, *optional*, defaults to `-float("Inf")`):
filter_value (`float`, *optional*, defaults to -inf):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
Minimum number of tokens that cannot be filtered.
@@ -499,7 +499,7 @@ class EpsilonLogitsWarper(LogitsWarper):
Args:
epsilon (`float`):
If set to > 0, only tokens with probabilities `epsilon` or higher are kept for generation.
filter_value (`float`, *optional*, defaults to `-float("Inf")`):
filter_value (`float`, *optional*, defaults to -inf):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
Minimum number of tokens that cannot be filtered.
@@ -572,7 +572,7 @@ class EtaLogitsWarper(LogitsWarper):
epsilon (`float`):
A float value in the range (0, 1). Hyperparameter used to calculate the dynamic cutoff value, `eta`. The
suggested values from the paper range from 3e-4 to 4e-3 depending on the size of the model.
filter_value (`float`, *optional*, defaults to `-float("Inf")`):
filter_value (`float`, *optional*, defaults to -inf):
All values that are found to be below the dynamic cutoff value, `eta`, are set to this float value. This
parameter is useful when logits need to be modified for very low probability tokens that should be excluded
from generation entirely.
@@ -1600,18 +1600,15 @@ class UnbatchedClassifierFreeGuidanceLogitsProcessor(LogitsProcessor):
Higher guidance scale encourages the model to generate samples that are more closely linked to the input
prompt, usually at the expense of poorer quality. A value smaller than 1 has the opposite effect, while
making the negative prompt provided with negative_prompt_ids (if any) act as a positive prompt.
model (`PreTrainedModel`):
The model computing the unconditional scores. Supposedly the same as the one computing the conditional
scores. Both models must use the same tokenizer.
unconditional_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
Indices of input sequence tokens in the vocabulary for the unconditional branch. If unset, will default to
the last token of the prompt.
unconditional_attention_mask (`torch.LongTensor` of shape `(batch_size, sequence_length)`, **optional**):
unconditional_attention_mask (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
Attention mask for unconditional_ids.
model (`PreTrainedModel`):
The model computing the unconditional scores. Supposedly the same as the one computing the conditional
scores. Both models must use the same tokenizer.
smooth_factor (`float`, **optional**):
The interpolation weight for CFG Rescale. 1 means no rescaling, 0 reduces to the conditional scores without
CFG. Turn it lower if the output degenerates.
use_cache (`bool`, **optional**):
use_cache (`bool`, *optional*, defaults to `True`):
Whether to cache key/values during the negative prompt forward pass.
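As background for the reordered arguments, classifier-free guidance combines the conditional and unconditional scores roughly as in the toy schematic below; the actual processor additionally normalizes the scores before interpolating, so this is an outline of the formula rather than its exact implementation.

```python
import torch

guidance_scale = 1.5
cond_logits = torch.tensor([2.0, 0.5, -1.0])    # scores from the prompted (conditional) pass
uncond_logits = torch.tensor([1.0, 1.0, -0.5])  # scores from the unconditional / negative-prompt pass

# Guided scores move away from the unconditional distribution, toward the conditional one.
guided = uncond_logits + guidance_scale * (cond_logits - uncond_logits)
print(guided)  # tensor([ 2.5000,  0.2500, -1.2500])
```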
@@ -49,7 +49,7 @@ class MaxLengthCriteria(StoppingCriteria):
Args:
max_length (`int`):
The maximum length that the output sequence can have in number of tokens.
max_position_embeddings (`int`, `optional`):
max_position_embeddings (`int`, *optional*):
The maximum model length, as defined by the model's `config.max_position_embeddings` attribute.
"""
@@ -122,7 +122,7 @@ class TFTopKLogitsWarper(TFLogitsWarper):
Args:
top_k (`int`):
The number of highest probability vocabulary tokens to keep for top-k-filtering.
filter_value (`float`, *optional*, defaults to `-float("Inf")`):
filter_value (`float`, *optional*, defaults to -inf):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
Minimum number of tokens that cannot be filtered.
@@ -151,7 +151,7 @@ class TFTopPLogitsWarper(TFLogitsWarper):
top_p (`float`):
If set to < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or
higher are kept for generation.
filter_value (`float`, *optional*, defaults to `-float("Inf")`):
filter_value (`float`, *optional*, defaults to -inf):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
Minimum number of tokens that cannot be filtered.
@@ -71,6 +71,8 @@ class AlignTextConfig(PretrainedConfig):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
The epsilon used by the layer normalization layers.
pad_token_id (`int`, *optional*, defaults to 0):
Padding token id.
position_embedding_type (`str`, *optional*, defaults to `"absolute"`):
Type of position embedding. Choose one of `"absolute"`, `"relative_key"`, `"relative_key_query"`. For
positional embeddings use `"absolute"`. For more information on `"relative_key"`, please refer to
@@ -80,8 +82,6 @@ class AlignTextConfig(PretrainedConfig):
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models). Only
relevant if `config.is_decoder=True`.
pad_token_id (`int`, *optional*, defaults to 0)
Padding token id.
Example:
@@ -259,7 +259,7 @@ class AltCLIPConfig(PretrainedConfig):
Dictionary of configuration options used to initialize [`AltCLIPTextConfig`].
vision_config (`dict`, *optional*):
Dictionary of configuration options used to initialize [`AltCLIPVisionConfig`].
projection_dim (`int`, *optional*, defaults to 512):
projection_dim (`int`, *optional*, defaults to 768):
Dimensionality of text and vision projection layers.
logit_scale_init_value (`float`, *optional*, defaults to 2.6592):
The initial value of the *logit_scale* parameter. Default is used as per the original CLIP implementation.
@@ -30,9 +30,9 @@ class AltCLIPProcessor(ProcessorMixin):
the [`~AltCLIPProcessor.__call__`] and [`~AltCLIPProcessor.decode`] for more information.
Args:
image_processor ([`CLIPImageProcessor`]):
image_processor ([`CLIPImageProcessor`], *optional*):
The image processor is a required input.
tokenizer ([`XLMRobertaTokenizerFast`]):
tokenizer ([`XLMRobertaTokenizerFast`], *optional*):
The tokenizer is a required input.
"""
attributes = ["image_processor", "tokenizer"]
@@ -51,15 +51,15 @@ class ASTConfig(PretrainedConfig):
hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
`"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
The epsilon used by the layer normalization layers.
patch_size (`int`, *optional*, defaults to `16`):
patch_size (`int`, *optional*, defaults to 16):
The size (resolution) of each patch.
qkv_bias (`bool`, *optional*, defaults to `True`):
Whether to add a bias to the queries, keys and values.
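The corrected defaults above can be cross-checked against the configuration class itself, which is essentially what the new script automates. Assuming `ASTConfig` is importable from `transformers` as usual:

```python
from transformers import ASTConfig

config = ASTConfig()
# The documented defaults should now match the signature defaults, which is what
# utils/check_docstrings.py verifies.
print(config.hidden_dropout_prob, config.attention_probs_dropout_prob, config.patch_size)
# expected: 0.0 0.0 16
```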
@@ -38,7 +38,7 @@ class BarkProcessor(ProcessorMixin):
Args:
tokenizer ([`PreTrainedTokenizer`]):
An instance of [`PreTrainedTokenizer`].
speaker_embeddings (`Dict[Dict[str]]`, *optional*, defaults to `None`):
speaker_embeddings (`Dict[Dict[str]]`, *optional*):
Optional nested speaker embeddings dictionary. The first level contains voice preset names (e.g.
`"en_speaker_4"`). The second level contains `"semantic_prompt"`, `"coarse_prompt"` and `"fine_prompt"`
embeddings. The values correspond to the path of the corresponding `np.ndarray`. See
@@ -97,8 +97,6 @@ class BarthezTokenizer(PreTrainedTokenizer):
mask_token (`str`, *optional*, defaults to `"<mask>"`):
The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict.
additional_special_tokens (`List[str]`, *optional*, defaults to `["<s>NOTUSED", "</s>NOTUSED"]`):
Additional special tokens used by the tokenizer.
sp_model_kwargs (`dict`, *optional*):
Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for
SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things,
@@ -92,8 +92,6 @@ class BartphoTokenizer(PreTrainedTokenizer):
mask_token (`str`, *optional*, defaults to `"<mask>"`):
The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict.
additional_special_tokens (`List[str]`, *optional*, defaults to `["<s>NOTUSED", "</s>NOTUSED"]`):
Additional special tokens used by the tokenizer.
sp_model_kwargs (`dict`, *optional*):
Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for
SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things,