Unverified Commit 9763f829 authored by Arthur's avatar Arthur Committed by GitHub

Fix whisper and speech to text doc (#20595)

* Fix whisper and speech to text doc
# What does this PR do?
Previously, the documentation was badly indented for both models and indicated that
> If `decoder_input_ids` and `decoder_inputs_embeds` are both unset, `decoder_inputs_embeds` takes the value of `inputs_embeds`.

which is only valid for the forward pass of `ForConditionalGeneration`, not for the model alone.

* other fixes
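The distinction the commit message draws can be sketched with toy classes (hypothetical names, not the actual transformers implementation): only the `ForConditionalGeneration`-style head falls back from decoder inputs to `inputs_embeds`, while the bare model requires one of the decoder inputs explicitly.

```python
# Toy sketch of the documented behavior; names are illustrative only.

class ToyModel:
    """Stands in for the bare encoder-decoder model: no fallback."""

    def forward(self, decoder_input_ids=None, decoder_inputs_embeds=None):
        if decoder_input_ids is None and decoder_inputs_embeds is None:
            raise ValueError(
                "You have to specify either decoder_input_ids or decoder_inputs_embeds"
            )
        return decoder_inputs_embeds if decoder_inputs_embeds is not None else decoder_input_ids


class ToyForConditionalGeneration:
    """Stands in for the generation head: falls back to inputs_embeds."""

    def __init__(self):
        self.model = ToyModel()

    def forward(self, inputs_embeds=None, decoder_input_ids=None, decoder_inputs_embeds=None):
        # The fallback removed from the base-model docstrings lives here:
        # if both decoder inputs are unset, reuse inputs_embeds.
        if decoder_input_ids is None and decoder_inputs_embeds is None:
            decoder_inputs_embeds = inputs_embeds
        return self.model.forward(
            decoder_input_ids=decoder_input_ids,
            decoder_inputs_embeds=decoder_inputs_embeds,
        )
```

Calling the head with only `inputs_embeds` succeeds, while the bare model raises, which is exactly why the sentence had to move out of the base-model docstrings.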
parent 4430b912
@@ -663,15 +663,12 @@ SPEECH_TO_TEXT_INPUTS_DOCSTRING = r"""
             If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
             don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-            `decoder_input_ids` of shape `(batch_size, sequence_length)`. decoder_inputs_embeds (`torch.FloatTensor` of
-            shape `(batch_size, target_sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-            `decoder_input_ids` you can choose to directly pass an embedded representation. If `past_key_values` is
-            used, optionally only the last `decoder_inputs_embeds` have to be input (see `past_key_values`). This is
-            useful if you want more control over how to convert `decoder_input_ids` indices into associated vectors
-            than the model's internal embedding lookup matrix.
-            If `decoder_input_ids` and `decoder_inputs_embeds` are both unset, `decoder_inputs_embeds` takes the value
-            of `inputs_embeds`.
+            `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+        decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
+            Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
+            representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
+            input (see `past_key_values`). This is useful if you want more control over how to convert
+            `decoder_input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
         use_cache (`bool`, *optional*):
             If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
             `past_key_values`).
...
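The `decoder_inputs_embeds` escape hatch the docstring describes can be illustrated with a toy embedding step (hypothetical names, not the transformers API): caller-supplied vectors bypass the model's embedding-matrix lookup entirely.

```python
# Stand-in for the model's internal embedding lookup matrix (toy values).
EMBED = {7: [0.1, 0.2], 42: [0.3, 0.4]}

def embed_decoder_inputs(decoder_input_ids=None, decoder_inputs_embeds=None):
    """Mimics the documented precedence: explicit embeddings win over ids."""
    if decoder_inputs_embeds is not None:
        # Caller controls the vectors directly; no lookup is performed.
        return decoder_inputs_embeds
    # Default path: convert ids into vectors via the embedding matrix.
    return [EMBED[i] for i in decoder_input_ids]
```

This is the "more control over how to convert `decoder_input_ids` indices into associated vectors" case: any vectors, even ones outside the embedding matrix, can be fed to the decoder.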
@@ -673,8 +673,9 @@ SPEECH_TO_TEXT_INPUTS_DOCSTRING = r"""
             [What are decoder input IDs?](../glossary#decoder-input-ids)
-            Bart uses the `eos_token_id` as the starting token for `decoder_input_ids` generation. If `past_key_values`
-            is used, optionally only the last `decoder_input_ids` have to be input (see `past_key_values`).
+            SpeechToText uses the `eos_token_id` as the starting token for `decoder_input_ids` generation. If
+            `past_key_values` is used, optionally only the last `decoder_input_ids` have to be input (see
+            `past_key_values`).
             For translation and summarization training, `decoder_input_ids` should be provided. If no
             `decoder_input_ids` is provided, the model will create this tensor by shifting the `input_ids` to the right
@@ -707,6 +708,14 @@ SPEECH_TO_TEXT_INPUTS_DOCSTRING = r"""
             If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
             don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
             `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+        decoder_inputs_embeds (`tf.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
+            Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
+            representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
+            input (see `past_key_values`). This is useful if you want more control over how to convert
+            `decoder_input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
+        use_cache (`bool`, *optional*):
+            If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
+            `past_key_values`).
         output_attentions (`bool`, *optional*):
             Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
             tensors for more detail. This argument can be used only in eager mode, in graph mode the value in the
...
@@ -565,15 +565,12 @@ WHISPER_INPUTS_DOCSTRING = r"""
             If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
             don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-            `decoder_input_ids` of shape `(batch_size, sequence_length)`. decoder_inputs_embeds (`tf.Tensor` of shape
-            `(batch_size, target_sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-            `decoder_input_ids` you can choose to directly pass an embedded representation. If `past_key_values` is
-            used, optionally only the last `decoder_inputs_embeds` have to be input (see `past_key_values`). This is
-            useful if you want more control over how to convert `decoder_input_ids` indices into associated vectors
-            than the model's internal embedding lookup matrix.
-            If `decoder_input_ids` and `decoder_inputs_embeds` are both unset, `decoder_inputs_embeds` takes the value
-            of `inputs_embeds`.
+            `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+        decoder_inputs_embeds (`tf.Tensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
+            Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
+            representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
+            input (see `past_key_values`). This is useful if you want more control over how to convert
+            `decoder_input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
         use_cache (`bool`, *optional*):
             If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
             `past_key_values`).
...
@@ -546,15 +546,12 @@ WHISPER_INPUTS_DOCSTRING = r"""
             If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
             don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-            `decoder_input_ids` of shape `(batch_size, sequence_length)`. decoder_inputs_embeds (`torch.FloatTensor` of
-            shape `(batch_size, target_sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-            `decoder_input_ids` you can choose to directly pass an embedded representation. If `past_key_values` is
-            used, optionally only the last `decoder_inputs_embeds` have to be input (see `past_key_values`). This is
-            useful if you want more control over how to convert `decoder_input_ids` indices into associated vectors
-            than the model's internal embedding lookup matrix.
-            If `decoder_input_ids` and `decoder_inputs_embeds` are both unset, `decoder_inputs_embeds` takes the value
-            of `inputs_embeds`.
+            `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+        decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
+            Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
+            representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
+            input (see `past_key_values`). This is useful if you want more control over how to convert
+            `decoder_input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
         use_cache (`bool`, *optional*):
             If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
             `past_key_values`).
...
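The `past_key_values` pattern these docstrings keep referring to (feed only the last decoder token once a cache exists) can be sketched with a toy decoder whose "state" is just the list of processed ids; the function name is hypothetical, not the transformers API.

```python
def run_decoder(tokens, past=None, use_cache=True):
    """Toy decoder step: returns (full state, cache for the next call)."""
    past = list(past) if past is not None else []
    state = past + list(tokens)          # process only the newly supplied tokens
    cache = state if use_cache else None  # returned like past_key_values
    return state, cache

# Feeding the whole prefix at once...
full_state, _ = run_decoder([5, 6, 7], use_cache=False)

# ...is equivalent to caching a prefix and then passing only the last token.
_, cache = run_decoder([5, 6])
inc_state, _ = run_decoder([7], past=cache)
```

The equivalence of the two paths is why the docstrings say the user "can optionally input only the last `decoder_input_ids`" when `past_key_values` are supplied.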