Unverified Commit 003a0cf8 authored by zspo's avatar zspo Committed by GitHub
Browse files

Fix some docs what layerdrop does (#23691)



* Fix some docs what layerdrop does

* Update src/transformers/models/data2vec/configuration_data2vec_audio.py
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix more docs

---------
Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
parent 357f281b
......@@ -70,10 +70,10 @@ class PegasusXConfig(PretrainedConfig):
just in case (e.g., 512 or 1024 or 2048).
init_std (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
encoder_layerdrop: (`float`, *optional*, defaults to 0.0):
encoder_layerdrop (`float`, *optional*, defaults to 0.0):
The LayerDrop probability for the encoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
for more details.
decoder_layerdrop: (`float`, *optional*, defaults to 0.0):
decoder_layerdrop (`float`, *optional*, defaults to 0.0):
The LayerDrop probability for the decoder. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556)
for more details.
use_cache (`bool`, *optional*, defaults to `True`):
......
......@@ -1430,7 +1430,7 @@ class RagTokenForGeneration(RagPreTrainedModel):
priority: 1) from the `generation_config.json` model file, if it exists; 2) from the model
configuration. Please note that unspecified parameters will inherit [`~generation.GenerationConfig`]'s
default values, whose documentation should be checked to parameterize generation.
prefix_allowed_tokens_fn: (`Callable[[int, torch.Tensor], List[int]]`, *optional*):
prefix_allowed_tokens_fn (`Callable[[int, torch.Tensor], List[int]]`, *optional*):
If provided, this function constraints the beam search to allowed tokens only at each step. If not
provided no constraint is applied. This function takes 2 arguments `inputs_ids` and the batch ID
`batch_id`. It has to return a list with the allowed tokens for the next generation step conditioned on
......
......@@ -573,10 +573,10 @@ class RagRetriever:
Retrieves documents for specified `question_hidden_states`.
Args:
question_input_ids: (`List[List[int]]`) batch of input ids
question_input_ids (`List[List[int]]`) batch of input ids
question_hidden_states (`np.ndarray` of shape `(batch_size, vector_size)`:
A batch of query vectors to retrieve with.
prefix: (`str`, *optional*):
prefix (`str`, *optional*):
The prefix used by the generator's tokenizer.
n_docs (`int`, *optional*):
The number of docs retrieved per query.
......
......@@ -726,7 +726,7 @@ class RealmReaderOutput(ModelOutput):
The index of the retrieved span candidates in which the predicted answer is most likely.
start_pos (`torch.IntTensor` of shape `()`):
Predicted answer starting position in *RealmReader*'s inputs.
end_pos: (`torch.IntTensor` of shape `()`):
end_pos (`torch.IntTensor` of shape `()`):
Predicted answer ending position in *RealmReader*'s inputs.
hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`):
Tuple of `torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer) of
......
......@@ -63,6 +63,9 @@ class SEWConfig(PretrainedConfig):
The dropout ratio for the attention probabilities.
final_dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for the final projection layer of [`SEWForCTC`].
layerdrop (`float`, *optional*, defaults to 0.1):
The LayerDrop probability. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more
details.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
......
......@@ -65,6 +65,9 @@ class UniSpeechConfig(PretrainedConfig):
The dropout ratio for the attention probabilities.
final_dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for the final projection layer of [`UniSpeechForCTC`].
layerdrop (`float`, *optional*, defaults to 0.1):
The LayerDrop probability. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more
details.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
......
......@@ -66,6 +66,9 @@ class UniSpeechSatConfig(PretrainedConfig):
The dropout ratio for the attention probabilities.
final_dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for the final projection layer of [`UniSpeechSatForCTC`].
layerdrop (`float`, *optional*, defaults to 0.1):
The LayerDrop probability. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more
details.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
......
......@@ -63,6 +63,9 @@ class Wav2Vec2Config(PretrainedConfig):
The dropout ratio for the attention probabilities.
final_dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for the final projection layer of [`Wav2Vec2ForCTC`].
layerdrop (`float`, *optional*, defaults to 0.1):
The LayerDrop probability. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more
details.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
......
......@@ -65,6 +65,9 @@ class Wav2Vec2ConformerConfig(PretrainedConfig):
The dropout ratio for the attention probabilities.
final_dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for the final projection layer of [`Wav2Vec2ConformerForCTC`].
layerdrop (`float`, *optional*, defaults to 0.1):
The LayerDrop probability. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more
details.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
......
......@@ -62,6 +62,9 @@ class WavLMConfig(PretrainedConfig):
The dropout ratio for the attention probabilities.
final_dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for the final projection layer of [`WavLMForCTC`].
layerdrop (`float`, *optional*, defaults to 0.1):
The LayerDrop probability. See the [LayerDrop paper](see https://arxiv.org/abs/1909.11556) for more
details.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
......
......@@ -117,9 +117,9 @@ def create_optimizer(
The beta2 to use in Adam.
adam_epsilon (`float`, *optional*, defaults to 1e-8):
The epsilon to use in Adam.
adam_clipnorm: (`float`, *optional*, defaults to `None`):
adam_clipnorm (`float`, *optional*, defaults to `None`):
If not `None`, clip the gradient norm for each weight tensor to this value.
adam_global_clipnorm: (`float`, *optional*, defaults to `None`)
adam_global_clipnorm (`float`, *optional*, defaults to `None`)
If not `None`, clip gradient norm to this value. When using this argument, the norm is computed over all
weight tensors, as if they were concatenated into a single vector.
weight_decay_rate (`float`, *optional*, defaults to 0):
......
......@@ -119,7 +119,7 @@ def ffmpeg_microphone_live(
The length of the striding to be used. Stride is used to provide context to a model on the (left, right) of
an audio sample but without using that part to actually make the prediction. Setting this does not change
the length of the chunk.
format_for_conversion: (`str`, defalts to `f32le`)
format_for_conversion (`str`, defalts to `f32le`)
The name of the format of the audio samples to be returned by ffmpeg. The standard is `f32le`, `s16le`
could also be used.
Return:
......
......@@ -514,7 +514,7 @@ class PipelineDataFormat:
Creates an instance of the right subclass of [`~pipelines.PipelineDataFormat`] depending on `format`.
Args:
format: (`str`):
format (`str`):
The format of the desired pipeline. Acceptable values are `"json"`, `"csv"` or `"pipe"`.
output_path (`str`, *optional*):
Where to save the outgoing data.
......
......@@ -2093,7 +2093,7 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
If `True`, will save the tokenizer in legacy format. If the "slow" tokenizer doesn't exits, a value
error is raised.
filename_prefix: (`str`, *optional*):
filename_prefix (`str`, *optional*):
A prefix to add to the names of the files saved by the tokenizer.
push_to_hub (`bool`, *optional*, defaults to `False`):
Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the
......
......@@ -66,7 +66,7 @@ class TFTrainingArguments(TrainingArguments):
The batch size per GPU/TPU core/CPU for training.
per_device_eval_batch_size (`int`, *optional*, defaults to 8):
The batch size per GPU/TPU core/CPU for evaluation.
gradient_accumulation_steps: (`int`, *optional*, defaults to 1):
gradient_accumulation_steps (`int`, *optional*, defaults to 1):
Number of updates steps to accumulate the gradients for, before performing a backward/update pass.
<Tip warning={true}>
......
......@@ -107,10 +107,10 @@ class {{cookiecutter.camelcase_modelname}}Config(PretrainedConfig):
just in case (e.g., 512 or 1024 or 2048).
init_std (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
encoder_layerdrop: (`float`, *optional*, defaults to 0.0):
encoder_layerdrop (`float`, *optional*, defaults to 0.0):
The LayerDrop probability for the encoder. See the [LayerDrop paper](see
https://arxiv.org/abs/1909.11556) for more details.
decoder_layerdrop: (`float`, *optional*, defaults to 0.0):
decoder_layerdrop (`float`, *optional*, defaults to 0.0):
The LayerDrop probability for the decoder. See the [LayerDrop paper](see
https://arxiv.org/abs/1909.11556) for more details.
use_cache (`bool`, *optional*, defaults to `True`):
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment