Unverified commit 7e93ce40, authored by Joao Gante, committed by GitHub

Fix `input_embeds` docstring in encoder-decoder architectures (#28168)

parent 4f7806ef
@@ -1900,11 +1900,11 @@ class SeamlessM4TDecoder(SeamlessM4TPreTrainedModel):
     If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
     that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-    all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-    shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-    `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-    control over how to convert `input_ids` indices into associated vectors than the model's internal
-    embedding lookup matrix.
+    all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+    Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+    This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+    than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
     Whether or not to return the attentions tensors of all attention layers. See `attentions` under
     returned tensors for more detail.
...
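The docstring being re-wrapped above says that `inputs_embeds` lets the caller bypass the model's internal embedding lookup. A minimal sketch of that idea with a toy embedding matrix (plain PyTorch, not the actual SeamlessM4T model; all sizes below are made up for illustration):

```python
import torch
import torch.nn as nn

# A decoder normally turns `input_ids` into vectors via its internal embedding
# lookup matrix; the caller can instead compute those vectors and pass them as
# `inputs_embeds` of shape (batch_size, sequence_length, hidden_size).
vocab_size, hidden_size = 32, 8
embed_tokens = nn.Embedding(vocab_size, hidden_size)  # the "internal" lookup matrix

input_ids = torch.tensor([[4, 7, 2]])    # (batch_size=1, sequence_length=3)
inputs_embeds = embed_tokens(input_ids)  # (1, 3, hidden_size)

# Passing `inputs_embeds` skips the model's own lookup, which is useful when the
# caller wants to modify or replace the embeddings before the decoder sees them.
assert inputs_embeds.shape == (1, 3, hidden_size)
assert torch.equal(inputs_embeds, embed_tokens.weight[input_ids])
```

In the real models, calling `forward(inputs_embeds=...)` instead of `forward(input_ids=...)` follows exactly this shape contract.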
@@ -2022,11 +2022,11 @@ class SeamlessM4Tv2Decoder(SeamlessM4Tv2PreTrainedModel):
     If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
     that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-    all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-    shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-    `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-    control over how to convert `input_ids` indices into associated vectors than the model's internal
-    embedding lookup matrix.
+    all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+    Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+    This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+    than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
     Whether or not to return the attentions tensors of all attention layers. See `attentions` under
     returned tensors for more detail.
...
@@ -933,11 +933,11 @@ class Speech2TextDecoder(Speech2TextPreTrainedModel):
     If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
     that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-    all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-    shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-    `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-    control over how to convert `input_ids` indices into associated vectors than the model's internal
-    embedding lookup matrix.
+    all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+    Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+    This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+    than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
     Whether or not to return the attentions tensors of all attention layers. See `attentions` under
     returned tensors for more detail.
...
@@ -1064,11 +1064,11 @@ class TFSpeech2TextDecoder(tf.keras.layers.Layer):
     If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
     that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-    all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
-    `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
-    you can choose to directly pass an embedded representation. This is useful if you want more control
-    over how to convert `input_ids` indices into associated vectors than the model's internal embedding
-    lookup matrix.
+    all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+    Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+    This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+    than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
     Whether or not to return the attentions tensors of all attention layers. See `attentions` under
     returned tensors for more detail.
...
@@ -530,11 +530,11 @@ class Speech2Text2Decoder(Speech2Text2PreTrainedModel):
     If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
     that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-    all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-    shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-    `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-    control over how to convert `input_ids` indices into associated vectors than the model's internal
-    embedding lookup matrix.
+    all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+    Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+    This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+    than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
     Whether or not to return the attentions tensors of all attention layers. See `attentions` under
     returned tensors for more detail.
...
@@ -1569,11 +1569,11 @@ class SpeechT5Decoder(SpeechT5PreTrainedModel):
     If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
     that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-    all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-    shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-    `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-    control over how to convert `input_ids` indices into associated vectors than the model's internal
-    embedding lookup matrix.
+    all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+    Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+    This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+    than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
     Whether or not to return the attentions tensors of all attention layers. See `attentions` under
     returned tensors for more detail.
...
@@ -549,11 +549,11 @@ class TrOCRDecoder(TrOCRPreTrainedModel):
     If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
     that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-    all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-    shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-    `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-    control over how to convert `input_ids` indices into associated vectors than the model's internal
-    embedding lookup matrix.
+    all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+    Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+    This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+    than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
     Whether or not to return the attentions tensors of all attention layers. See `attentions` under
     returned tensors for more detail.
...
@@ -958,11 +958,11 @@ class TFWhisperDecoder(tf.keras.layers.Layer):
     If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
     that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-    all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
-    `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
-    you can choose to directly pass an embedded representation. This is useful if you want more control
-    over how to convert `input_ids` indices into associated vectors than the model's internal embedding
-    lookup matrix.
+    all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+    Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+    This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+    than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
     Whether or not to return the attentions tensors of all attention layers. See `attentions` under
     returned tensors for more detail.
...
@@ -113,10 +113,11 @@ XGLM_INPUTS_DOCSTRING = r"""
     If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
     don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-    `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
-    `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
-    can choose to directly pass an embedded representation. This is useful if you want more control over how to
-    convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
+    `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+    Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+    This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+    than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
     Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
     tensors for more detail.
...
@@ -2083,8 +2083,11 @@ class {{cookiecutter.camelcase_modelname}}PreTrainedModel(PreTrainedModel):
     If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids`
     (those that don't have their past key value states given to this model) of shape `(batch_size, 1)`
-    instead of all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices into associated
-    vectors than the model's internal embedding lookup matrix.
+    instead of all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+    Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+    This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+    than the model's internal embedding lookup matrix.
 decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
     Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
     representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds`
@@ -2407,8 +2410,11 @@ class {{cookiecutter.camelcase_modelname}}Decoder({{cookiecutter.camelcase_model
     If `past_key_values` are used, the user can optionally input only the last
     `decoder_input_ids` (those that don't have their past key value states given to this model) of
     shape `(batch_size, 1)` instead of all `decoder_input_ids` of shape `(batch_size,
-    sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices
-    into associated vectors than the model's internal embedding lookup matrix.
+    sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+    Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+    This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+    than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
     Whether or not to return the attentions tensors of all attention layers. See `attentions` under
     returned tensors for more detail.
...
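Each of the docstrings touched by this commit also notes that, when `past_key_values` are used, only the last `decoder_input_ids` of shape `(batch_size, 1)` need to be passed. A toy sketch of why that works (plain tensors, not the transformers cache API; all sizes are made up for illustration):

```python
import torch

# Keys/values for tokens already processed are cached in `past_key_values`, so a
# decoding step only needs projections for the single new token; attention then
# concatenates the cached and new entries along the sequence dimension.
batch_size, num_cached, hidden_size = 2, 3, 4
past_keys = torch.randn(batch_size, num_cached, hidden_size)  # cached from earlier steps
new_keys = torch.randn(batch_size, 1, hidden_size)            # from the last token only

keys = torch.cat([past_keys, new_keys], dim=1)  # attention still sees all positions
assert keys.shape == (batch_size, num_cached + 1, hidden_size)
```

This is why incremental generation can feed a `(batch_size, 1)` slice each step instead of re-encoding the full `(batch_size, sequence_length)` prefix.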