Unverified commit 003a0cf8, authored by zspo, committed by GitHub

Fix some docs what layerdrop does (#23691)



* Fix some docs what layerdrop does

* Update src/transformers/models/data2vec/configuration_data2vec_audio.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix more docs

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
parent 357f281b
@@ -70,10 +70,10 @@ class PegasusXConfig(PretrainedConfig):
             just in case (e.g., 512 or 1024 or 2048).
         init_std (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
-        encoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        encoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the encoder. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556)
             for more details.
-        decoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+        decoder_layerdrop (`float`, *optional*, defaults to 0.0):
             The LayerDrop probability for the decoder. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556)
             for more details.
         use_cache (`bool`, *optional*, defaults to `True`):

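The `encoder_layerdrop`/`decoder_layerdrop` docstrings above describe LayerDrop (Fan et al., 2019). As a minimal, dependency-free sketch — not the transformers implementation, and with purely illustrative names — each layer is simply skipped with probability `layerdrop` during training, so the model trains as an implicit ensemble of shallower sub-networks:

```python
import random

def forward_with_layerdrop(x, layers, layerdrop=0.0, training=True):
    """Apply `layers` to `x`, randomly dropping whole layers while training."""
    for layer in layers:
        if training and random.random() < layerdrop:
            continue  # skip this layer entirely: its "output" is its input
        x = layer(x)
    return x

# Toy "layers": each adds 1 to its input, so the result counts executed layers.
layers = [lambda v: v + 1 for _ in range(4)]

random.seed(0)
full = forward_with_layerdrop(0, layers, layerdrop=0.0)     # all 4 layers run
dropped = forward_with_layerdrop(0, layers, layerdrop=0.5)  # some layers skipped
```

At inference time (`training=False`) every layer runs, which is why a `layerdrop` of 0.0 is a safe default for fine-tuned checkpoints.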
@@ -1430,7 +1430,7 @@ class RagTokenForGeneration(RagPreTrainedModel):
             priority: 1) from the `generation_config.json` model file, if it exists; 2) from the model
             configuration. Please note that unspecified parameters will inherit [`~generation.GenerationConfig`]'s
             default values, whose documentation should be checked to parameterize generation.
-        prefix_allowed_tokens_fn: (`Callable[[int, torch.Tensor], List[int]]`, *optional*):
+        prefix_allowed_tokens_fn (`Callable[[int, torch.Tensor], List[int]]`, *optional*):
             If provided, this function constrains the beam search to allowed tokens only at each step. If not
             provided, no constraint is applied. This function takes 2 arguments `inputs_ids` and the batch ID
             `batch_id`. It has to return a list with the allowed tokens for the next generation step conditioned on

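A hypothetical `prefix_allowed_tokens_fn` might look like the sketch below. Plain lists stand in for `torch.Tensor` so the example stays dependency-free; the vocabulary and token ids are made up. At each step the function receives the batch index and the tokens generated so far, and returns the ids the next step may choose from — here, forcing an assumed end-of-sequence token once the prefix reaches three tokens:

```python
VOCAB = list(range(10))  # illustrative 10-token vocabulary
EOS_ID = 2               # illustrative end-of-sequence token id

def prefix_allowed_tokens_fn(batch_id, input_ids):
    """Return the token ids allowed at the next step for this beam."""
    if len(input_ids) >= 3:
        return [EOS_ID]  # constrain the beam: only EOS may follow
    return VOCAB         # otherwise the full vocabulary is allowed
```

Internally, `generate` sets the logits of every token not in the returned list to `-inf` before sampling or beam search, so disallowed tokens can never be chosen.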
@@ -573,10 +573,10 @@ class RagRetriever:
         Retrieves documents for specified `question_hidden_states`.

         Args:
-            question_input_ids: (`List[List[int]]`) batch of input ids
+            question_input_ids (`List[List[int]]`): batch of input ids
             question_hidden_states (`np.ndarray` of shape `(batch_size, vector_size)`):
                 A batch of query vectors to retrieve with.
-            prefix: (`str`, *optional*):
+            prefix (`str`, *optional*):
                 The prefix used by the generator's tokenizer.
             n_docs (`int`, *optional*):
                 The number of docs retrieved per query.

@@ -726,7 +726,7 @@ class RealmReaderOutput(ModelOutput):
             The index of the retrieved span candidates in which the predicted answer is most likely.
         start_pos (`torch.IntTensor` of shape `()`):
             Predicted answer starting position in *RealmReader*'s inputs.
-        end_pos: (`torch.IntTensor` of shape `()`):
+        end_pos (`torch.IntTensor` of shape `()`):
             Predicted answer ending position in *RealmReader*'s inputs.
         hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`):
             Tuple of `torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer) of

@@ -63,6 +63,9 @@ class SEWConfig(PretrainedConfig):
             The dropout ratio for the attention probabilities.
         final_dropout (`float`, *optional*, defaults to 0.1):
             The dropout probability for the final projection layer of [`SEWForCTC`].
+        layerdrop (`float`, *optional*, defaults to 0.1):
+            The LayerDrop probability. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556) for more
+            details.
         initializer_range (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
         layer_norm_eps (`float`, *optional*, defaults to 1e-12):

@@ -65,6 +65,9 @@ class UniSpeechConfig(PretrainedConfig):
             The dropout ratio for the attention probabilities.
         final_dropout (`float`, *optional*, defaults to 0.1):
             The dropout probability for the final projection layer of [`UniSpeechForCTC`].
+        layerdrop (`float`, *optional*, defaults to 0.1):
+            The LayerDrop probability. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556) for more
+            details.
         initializer_range (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
         layer_norm_eps (`float`, *optional*, defaults to 1e-12):

@@ -66,6 +66,9 @@ class UniSpeechSatConfig(PretrainedConfig):
             The dropout ratio for the attention probabilities.
         final_dropout (`float`, *optional*, defaults to 0.1):
             The dropout probability for the final projection layer of [`UniSpeechSatForCTC`].
+        layerdrop (`float`, *optional*, defaults to 0.1):
+            The LayerDrop probability. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556) for more
+            details.
         initializer_range (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
         layer_norm_eps (`float`, *optional*, defaults to 1e-12):

@@ -63,6 +63,9 @@ class Wav2Vec2Config(PretrainedConfig):
             The dropout ratio for the attention probabilities.
         final_dropout (`float`, *optional*, defaults to 0.1):
             The dropout probability for the final projection layer of [`Wav2Vec2ForCTC`].
+        layerdrop (`float`, *optional*, defaults to 0.1):
+            The LayerDrop probability. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556) for more
+            details.
         initializer_range (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
         layer_norm_eps (`float`, *optional*, defaults to 1e-12):

@@ -65,6 +65,9 @@ class Wav2Vec2ConformerConfig(PretrainedConfig):
             The dropout ratio for the attention probabilities.
         final_dropout (`float`, *optional*, defaults to 0.1):
             The dropout probability for the final projection layer of [`Wav2Vec2ConformerForCTC`].
+        layerdrop (`float`, *optional*, defaults to 0.1):
+            The LayerDrop probability. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556) for more
+            details.
         initializer_range (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
         layer_norm_eps (`float`, *optional*, defaults to 1e-12):

@@ -62,6 +62,9 @@ class WavLMConfig(PretrainedConfig):
             The dropout ratio for the attention probabilities.
         final_dropout (`float`, *optional*, defaults to 0.1):
             The dropout probability for the final projection layer of [`WavLMForCTC`].
+        layerdrop (`float`, *optional*, defaults to 0.1):
+            The LayerDrop probability. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556) for more
+            details.
         initializer_range (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
         layer_norm_eps (`float`, *optional*, defaults to 1e-12):

@@ -117,9 +117,9 @@ def create_optimizer(
         The beta2 to use in Adam.
     adam_epsilon (`float`, *optional*, defaults to 1e-8):
         The epsilon to use in Adam.
-    adam_clipnorm: (`float`, *optional*, defaults to `None`):
+    adam_clipnorm (`float`, *optional*, defaults to `None`):
         If not `None`, clip the gradient norm for each weight tensor to this value.
-    adam_global_clipnorm: (`float`, *optional*, defaults to `None`)
+    adam_global_clipnorm (`float`, *optional*, defaults to `None`):
         If not `None`, clip gradient norm to this value. When using this argument, the norm is computed over all
         weight tensors, as if they were concatenated into a single vector.
     weight_decay_rate (`float`, *optional*, defaults to 0):

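The distinction between the two clipping arguments above can be sketched with plain lists of floats standing in for weight-gradient tensors (a toy illustration, not the optimizer's actual code): per-tensor clipping rescales each tensor independently, while global clipping computes one norm over everything and applies a single scale factor.

```python
import math

def l2_norm(grad):
    return math.sqrt(sum(g * g for g in grad))

def clip_per_tensor(grads, clipnorm):
    """Like adam_clipnorm: rescale each tensor whose own norm exceeds clipnorm."""
    clipped = []
    for grad in grads:
        n = l2_norm(grad)
        if n > clipnorm:
            grad = [g * clipnorm / n for g in grad]
        clipped.append(grad)
    return clipped

def clip_global(grads, clipnorm):
    """Like adam_global_clipnorm: one norm over all tensors, as if concatenated."""
    global_norm = math.sqrt(sum(l2_norm(g) ** 2 for g in grads))
    if global_norm <= clipnorm:
        return grads
    scale = clipnorm / global_norm
    return [[g * scale for g in grad] for grad in grads]

grads = [[3.0, 4.0], [0.1]]               # per-tensor norms: 5.0 and 0.1
per_tensor = clip_per_tensor(grads, 1.0)  # only the first tensor is rescaled
global_clip = clip_global(grads, 1.0)     # both tensors share one scale factor
```

Global clipping preserves the direction of the full gradient vector; per-tensor clipping can change it, since large and small tensors are rescaled by different factors.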
@@ -119,7 +119,7 @@ def ffmpeg_microphone_live(
         The length of the striding to be used. Stride is used to provide context to a model on the (left, right) of
         an audio sample but without using that part to actually make the prediction. Setting this does not change
         the length of the chunk.
-    format_for_conversion: (`str`, defalts to `f32le`)
+    format_for_conversion (`str`, defaults to `f32le`):
         The name of the format of the audio samples to be returned by ffmpeg. The standard is `f32le`; `s16le`
         could also be used.
     Return:

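The two sample formats named above can be decoded with the stdlib `struct` module: `f32le` is little-endian 32-bit float, `s16le` is little-endian signed 16-bit integer (rescaled to [-1, 1) here for comparison). This is purely illustrative of the byte layouts, not of what `ffmpeg_microphone_live` does internally.

```python
import struct

def decode_f32le(raw: bytes):
    """Decode little-endian 32-bit float samples (4 bytes each)."""
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

def decode_s16le(raw: bytes):
    """Decode little-endian signed 16-bit samples (2 bytes each) to floats."""
    ints = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [i / 32768.0 for i in ints]

samples = [0.0, 0.5, -0.5]
f32 = struct.pack("<3f", *samples)          # 12 bytes, 4 per sample
s16 = struct.pack("<3h", 0, 16384, -16384)  # 6 bytes, 2 per sample
```

`f32le` costs twice the bandwidth of `s16le` but needs no rescaling before being fed to a model expecting float waveforms.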
@@ -514,7 +514,7 @@ class PipelineDataFormat:
         Creates an instance of the right subclass of [`~pipelines.PipelineDataFormat`] depending on `format`.

         Args:
-            format: (`str`):
+            format (`str`):
                 The format of the desired pipeline. Acceptable values are `"json"`, `"csv"` or `"pipe"`.
             output_path (`str`, *optional*):
                 Where to save the outgoing data.

@@ -2093,7 +2093,7 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
             If `True`, will save the tokenizer in legacy format. If the "slow" tokenizer doesn't exist, a value
             error is raised.
-        filename_prefix: (`str`, *optional*):
+        filename_prefix (`str`, *optional*):
             A prefix to add to the names of the files saved by the tokenizer.
         push_to_hub (`bool`, *optional*, defaults to `False`):
             Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the

@@ -66,7 +66,7 @@ class TFTrainingArguments(TrainingArguments):
         The batch size per GPU/TPU core/CPU for training.
     per_device_eval_batch_size (`int`, *optional*, defaults to 8):
         The batch size per GPU/TPU core/CPU for evaluation.
-    gradient_accumulation_steps: (`int`, *optional*, defaults to 1):
+    gradient_accumulation_steps (`int`, *optional*, defaults to 1):
         Number of update steps to accumulate the gradients for, before performing a backward/update pass.

     <Tip warning={true}>

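Gradient accumulation, as described above, can be sketched in a few lines: gradients from several micro-batches are summed, and the optimizer step runs only every `gradient_accumulation_steps` batches, simulating a larger effective batch size. The names and the scalar "model" below are illustrative, not the Trainer's internals.

```python
def train(batches, gradient_accumulation_steps=1, lr=0.1):
    """Toy loop: `batches` is a list of per-micro-batch gradients (floats)."""
    weight = 0.0
    accumulated = 0.0
    updates = 0
    for step, grad in enumerate(batches, start=1):
        accumulated += grad  # "backward pass": add this micro-batch's gradient
        if step % gradient_accumulation_steps == 0:
            # One optimizer step on the averaged accumulated gradient.
            weight -= lr * accumulated / gradient_accumulation_steps
            accumulated = 0.0
            updates += 1
    return weight, updates

# 4 micro-batches, accumulated in pairs -> only 2 optimizer updates.
w, n = train([1.0, 3.0, 2.0, 2.0], gradient_accumulation_steps=2)
```

This is why, when logging/saving/evaluating by steps, one training *step* corresponds to `gradient_accumulation_steps` forward/backward passes, not one.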
@@ -107,10 +107,10 @@ class {{cookiecutter.camelcase_modelname}}Config(PretrainedConfig):
         just in case (e.g., 512 or 1024 or 2048).
     init_std (`float`, *optional*, defaults to 0.02):
         The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
-    encoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+    encoder_layerdrop (`float`, *optional*, defaults to 0.0):
         The LayerDrop probability for the encoder. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556)
         for more details.
-    decoder_layerdrop: (`float`, *optional*, defaults to 0.0):
+    decoder_layerdrop (`float`, *optional*, defaults to 0.0):
         The LayerDrop probability for the decoder. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556)
         for more details.
     use_cache (`bool`, *optional*, defaults to `True`):