Unverified commit 7e93ce40, authored by Joao Gante, committed by GitHub

Fix `input_embeds` docstring in encoder-decoder architectures (#28168)

parent 4f7806ef
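The docstrings fixed below all describe the same contract: `inputs_embeds` lets the caller pass precomputed embeddings instead of token ids, bypassing the model's internal embedding lookup matrix. A minimal sketch of that contract, with a toy NumPy lookup matrix and a hypothetical `forward` stand-in (not the transformers API itself):

```python
import numpy as np

# Toy stand-in for a model's embedding lookup matrix (hypothetical sizes).
VOCAB_SIZE, HIDDEN_SIZE = 10, 4
rng = np.random.default_rng(0)
EMBEDDING_MATRIX = rng.standard_normal((VOCAB_SIZE, HIDDEN_SIZE))


def forward(input_ids=None, inputs_embeds=None):
    """Mirror the documented contract: accept exactly one of the two inputs."""
    if (input_ids is None) == (inputs_embeds is None):
        raise ValueError("Pass exactly one of input_ids or inputs_embeds")
    if inputs_embeds is None:
        # Internal lookup: (batch, seq_len) ids -> (batch, seq_len, hidden) vectors.
        inputs_embeds = EMBEDDING_MATRIX[input_ids]
    return inputs_embeds  # stand-in for the rest of the encoder


ids = np.array([[1, 2, 3]])
# Passing ids, or the equivalent precomputed embeddings, yields the same result.
via_ids = forward(input_ids=ids)
via_embeds = forward(inputs_embeds=EMBEDDING_MATRIX[ids])
assert np.allclose(via_ids, via_embeds)
```

In transformers itself, the precomputed embeddings would typically come from `model.get_input_embeddings()(input_ids)`; the escape hatch exists so callers can modify those vectors before the model sees them.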
@@ -1011,10 +1011,11 @@ BART_INPUTS_DOCSTRING = r"""
             If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
             don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-            `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
-            `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
-            can choose to directly pass an embedded representation. This is useful if you want more control over how to
-            convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
+            `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+        inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+            Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+            This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+            than the model's internal embedding lookup matrix.
         decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
             Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
             representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
@@ -1331,11 +1332,11 @@ class BartDecoder(BartPreTrainedModel):
                 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
                 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-                all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-                shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-                `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-                control over how to convert `input_ids` indices into associated vectors than the model's internal
-                embedding lookup matrix.
+                all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+            inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+                Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+                This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+                than the model's internal embedding lookup matrix.
             output_attentions (`bool`, *optional*):
                 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                 returned tensors for more detail.
@@ -715,6 +715,10 @@ BART_INPUTS_DOCSTRING = r"""
             If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
             don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
             `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+        inputs_embeds (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+            Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+            This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+            than the model's internal embedding lookup matrix.
         use_cache (`bool`, *optional*, defaults to `True`):
             If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
             `past_key_values`). Set to `False` during training, `True` during generation
@@ -990,11 +994,11 @@ class TFBartDecoder(tf.keras.layers.Layer):
                 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
                 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-                all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
-                `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
-                you can choose to directly pass an embedded representation. This is useful if you want more control
-                over how to convert `input_ids` indices into associated vectors than the model's internal embedding
-                lookup matrix.
+                all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+            inputs_embeds (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+                Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+                This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+                than the model's internal embedding lookup matrix.
             output_attentions (`bool`, *optional*):
                 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                 returned tensors for more detail.
@@ -1678,10 +1678,11 @@ BIGBIRD_PEGASUS_INPUTS_DOCSTRING = r"""
             If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
             don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-            `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
-            `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
-            can choose to directly pass an embedded representation. This is useful if you want more control over how to
-            convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
+            `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+        inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+            Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+            This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+            than the model's internal embedding lookup matrix.
         decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
             Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
             representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
@@ -2136,11 +2137,11 @@ class BigBirdPegasusDecoder(BigBirdPegasusPreTrainedModel):
                 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
                 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-                all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-                shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-                `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-                control over how to convert `input_ids` indices into associated vectors than the model's internal
-                embedding lookup matrix.
+                all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+            inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+                Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+                This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+                than the model's internal embedding lookup matrix.
             output_attentions (`bool`, *optional*):
                 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                 returned tensors for more detail.
@@ -396,10 +396,11 @@ BIOGPT_INPUTS_DOCSTRING = r"""
             If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
             don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-            `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
-            `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
-            can choose to directly pass an embedded representation. This is useful if you want more control over how to
-            convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
+            `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+        inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+            Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+            This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+            than the model's internal embedding lookup matrix.
         use_cache (`bool`, *optional*):
             If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
             `past_key_values`).
@@ -589,10 +589,11 @@ BLENDERBOT_INPUTS_DOCSTRING = r"""
             If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
             don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-            `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
-            `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
-            can choose to directly pass an embedded representation. This is useful if you want more control over how to
-            convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
+            `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+        inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+            Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+            This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+            than the model's internal embedding lookup matrix.
         decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
             Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
             representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
@@ -892,11 +893,11 @@ class BlenderbotDecoder(BlenderbotPreTrainedModel):
                 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
                 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-                all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-                shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-                `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-                control over how to convert `input_ids` indices into associated vectors than the model's internal
-                embedding lookup matrix.
+                all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+            inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+                Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+                This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+                than the model's internal embedding lookup matrix.
             output_attentions (`bool`, *optional*):
                 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                 returned tensors for more detail.
@@ -949,11 +949,11 @@ class TFBlenderbotDecoder(tf.keras.layers.Layer):
                 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
                 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-                all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
-                `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
-                you can choose to directly pass an embedded representation. This is useful if you want more control
-                over how to convert `input_ids` indices into associated vectors than the model's internal embedding
-                lookup matrix.
+                all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+            inputs_embeds (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+                Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+                This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+                than the model's internal embedding lookup matrix.
             output_attentions (`bool`, *optional*):
                 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                 returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value
@@ -589,10 +589,11 @@ BLENDERBOT_SMALL_INPUTS_DOCSTRING = r"""
             If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
             don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-            `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
-            `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
-            can choose to directly pass an embedded representation. This is useful if you want more control over how to
-            convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
+            `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+        inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+            Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+            This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+            than the model's internal embedding lookup matrix.
         decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
             Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
             representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
@@ -889,11 +890,11 @@ class BlenderbotSmallDecoder(BlenderbotSmallPreTrainedModel):
                 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
                 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-                all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-                shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-                `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-                control over how to convert `input_ids` indices into associated vectors than the model's internal
-                embedding lookup matrix.
+                all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+            inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+                Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+                This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+                than the model's internal embedding lookup matrix.
             output_attentions (`bool`, *optional*):
                 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                 returned tensors for more detail.
@@ -957,11 +957,11 @@ class TFBlenderbotSmallDecoder(tf.keras.layers.Layer):
                 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
                 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-                all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
-                `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
-                you can choose to directly pass an embedded representation. This is useful if you want more control
-                over how to convert `input_ids` indices into associated vectors than the model's internal embedding
-                lookup matrix.
+                all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+            inputs_embeds (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+                Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+                This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+                than the model's internal embedding lookup matrix.
             output_attentions (`bool`, *optional*):
                 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                 returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value
@@ -2021,11 +2021,11 @@ class LEDDecoder(LEDPreTrainedModel):
                 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
                 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-                all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-                shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-                `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-                control over how to convert `input_ids` indices into associated vectors than the model's internal
-                embedding lookup matrix.
+                all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+            inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+                Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+                This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+                than the model's internal embedding lookup matrix.
             output_attentions (`bool`, *optional*):
                 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                 returned tensors for more detail.
@@ -630,10 +630,11 @@ M2M_100_INPUTS_DOCSTRING = r"""
             If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
             don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-            `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
-            `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
-            can choose to directly pass an embedded representation. This is useful if you want more control over how to
-            convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
+            `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+        inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+            Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+            This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+            than the model's internal embedding lookup matrix.
         decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
             Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
             representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
@@ -931,11 +932,11 @@ class M2M100Decoder(M2M100PreTrainedModel):
                 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
                 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-                all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-                shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-                `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-                control over how to convert `input_ids` indices into associated vectors than the model's internal
-                embedding lookup matrix.
+                all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+            inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+                Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+                This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+                than the model's internal embedding lookup matrix.
             output_attentions (`bool`, *optional*):
                 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                 returned tensors for more detail.
@@ -688,6 +688,10 @@ MARIAN_INPUTS_DOCSTRING = r"""
             If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
             don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
             `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+        inputs_embeds (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+            Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+            This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+            than the model's internal embedding lookup matrix.
         use_cache (`bool`, *optional*, defaults to `True`):
             If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
             `past_key_values`). Set to `False` during training, `True` during generation
@@ -975,11 +979,11 @@ class TFMarianDecoder(tf.keras.layers.Layer):
                 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
                 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-                all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
-                `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
-                you can choose to directly pass an embedded representation. This is useful if you want more control
-                over how to convert `input_ids` indices into associated vectors than the model's internal embedding
-                lookup matrix.
+                all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+            inputs_embeds (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+                Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+                This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+                than the model's internal embedding lookup matrix.
             output_attentions (`bool`, *optional*):
                 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
                 returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value
......
@@ -876,10 +876,11 @@ MBART_INPUTS_DOCSTRING = r"""
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
 don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-`decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
-`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
-can choose to directly pass an embedded representation. This is useful if you want more control over how to
-convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
+`decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+than the model's internal embedding lookup matrix.
 decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
 Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
 representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
@@ -1191,11 +1192,11 @@ class MBartDecoder(MBartPreTrainedModel):
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail.
...
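The behavior these hunks document — passing an embedded representation via `inputs_embeds` (or `decoder_inputs_embeds`) instead of token ids — can be sketched as below. This is an illustrative example, not part of the PR: it uses a tiny, randomly initialized `MBartConfig` (all sizes chosen arbitrarily) so it runs without downloading weights.

```python
# Sketch: bypass the model's internal embedding lookup by doing it ourselves
# and passing `inputs_embeds` / `decoder_inputs_embeds` to the forward pass.
import torch
from transformers import MBartConfig, MBartModel

# Tiny random config; the sizes here are illustrative, not a real checkpoint.
config = MBartConfig(
    vocab_size=64, d_model=16,
    encoder_layers=1, decoder_layers=1,
    encoder_attention_heads=2, decoder_attention_heads=2,
    encoder_ffn_dim=32, decoder_ffn_dim=32,
)
model = MBartModel(config).eval()

input_ids = torch.tensor([[0, 5, 6, 2]])
# Perform the embedding lookup manually (e.g. to modify the vectors first)...
inputs_embeds = model.get_input_embeddings()(input_ids)
# ...then feed the embeddings directly; encoder and decoder share the
# embedding matrix in MBart, so the same tensor works for both sides.
outputs = model(inputs_embeds=inputs_embeds, decoder_inputs_embeds=inputs_embeds)
print(tuple(outputs.last_hidden_state.shape))  # (batch_size, sequence_length, hidden_size)
```

Any preprocessing of the vectors (noise injection, prompt tuning, gradient-based attacks) can happen between the lookup and the forward call, which is the "more control" the docstring refers to.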
@@ -636,6 +636,10 @@ MBART_INPUTS_DOCSTRING = r"""
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
 don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
 `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+than the model's internal embedding lookup matrix.
 use_cache (`bool`, *optional*, defaults to `True`):
 If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
 `past_key_values`). Set to `False` during training, `True` during generation
@@ -981,11 +985,11 @@ class TFMBartDecoder(tf.keras.layers.Layer):
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
-`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
-you can choose to directly pass an embedded representation. This is useful if you want more control
-over how to convert `input_ids` indices into associated vectors than the model's internal embedding
-lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value
...
@@ -539,10 +539,11 @@ MUSICGEN_INPUTS_DOCSTRING = r"""
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
 don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-`decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
-`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
-can choose to directly pass an embedded representation. This is useful if you want more control over how to
-convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
+`decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+than the model's internal embedding lookup matrix.
 decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
 Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
 representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
@@ -626,10 +627,11 @@ MUSICGEN_DECODER_INPUTS_DOCSTRING = r"""
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
 don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-`decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
-`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
-can choose to directly pass an embedded representation. This is useful if you want more control over how to
-convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
+`decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
 tensors for more detail.
...
@@ -629,10 +629,11 @@ MVP_INPUTS_DOCSTRING = r"""
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
 don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-`decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
-`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
-can choose to directly pass an embedded representation. This is useful if you want more control over how to
-convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
+`decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+than the model's internal embedding lookup matrix.
 decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
 Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
 representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
@@ -1067,11 +1068,11 @@ class MvpDecoder(MvpPreTrainedModel):
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail.
...
@@ -943,10 +943,11 @@ NLLB_MOE_INPUTS_DOCSTRING = r"""
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
 don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-`decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
-`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
-can choose to directly pass an embedded representation. This is useful if you want more control over how to
-convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
+`decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+than the model's internal embedding lookup matrix.
 decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
 Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
 representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
@@ -1271,11 +1272,11 @@ class NllbMoeDecoder(NllbMoePreTrainedModel):
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail.
...
@@ -584,10 +584,11 @@ PEGASUS_INPUTS_DOCSTRING = r"""
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
 don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-`decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
-`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
-can choose to directly pass an embedded representation. This is useful if you want more control over how to
-convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
+`decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+than the model's internal embedding lookup matrix.
 decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
 Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
 representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
@@ -946,11 +947,11 @@ class PegasusDecoder(PegasusPreTrainedModel):
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail.
...
@@ -688,6 +688,10 @@ PEGASUS_INPUTS_DOCSTRING = r"""
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
 don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
 `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+than the model's internal embedding lookup matrix.
 use_cache (`bool`, *optional*, defaults to `True`):
 If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
 `past_key_values`). Set to `False` during training, `True` during generation output_attentions (`bool`,
@@ -985,11 +989,11 @@ class TFPegasusDecoder(tf.keras.layers.Layer):
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`tf.Tensor` of shape
-`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids`
-you can choose to directly pass an embedded representation. This is useful if you want more control
-over how to convert `input_ids` indices into associated vectors than the model's internal embedding
-lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`tf.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail. This argument can be used only in eager mode, in graph mode the value
...
@@ -840,10 +840,11 @@ PEGASUS_X_INPUTS_DOCSTRING = r"""
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
 don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
-`decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of shape
-`(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing `input_ids` you
-can choose to directly pass an embedded representation. This is useful if you want more control over how to
-convert `input_ids` indices into associated vectors than the model's internal embedding lookup matrix.
+`decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+than the model's internal embedding lookup matrix.
 decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*):
 Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded
 representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be
...
@@ -938,11 +938,11 @@ class PLBartDecoder(PLBartPreTrainedModel):
 If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those
 that don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of
-all `decoder_input_ids` of shape `(batch_size, sequence_length)`. inputs_embeds (`torch.FloatTensor` of
-shape `(batch_size, sequence_length, hidden_size)`, *optional*): Optionally, instead of passing
-`input_ids` you can choose to directly pass an embedded representation. This is useful if you want more
-control over how to convert `input_ids` indices into associated vectors than the model's internal
-embedding lookup matrix.
+all `decoder_input_ids` of shape `(batch_size, sequence_length)`.
+inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
+Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
+This is useful if you want more control over how to convert `input_ids` indices into associated vectors
+than the model's internal embedding lookup matrix.
 output_attentions (`bool`, *optional*):
 Whether or not to return the attentions tensors of all attention layers. See `attentions` under
 returned tensors for more detail.
...
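The `past_key_values` shortcut that every hunk above mentions — once a cache is passed, only the last `decoder_input_ids` of shape `(batch_size, 1)` is needed — can be sketched as follows. Again an illustrative example outside the PR, using a tiny randomly initialized `BartConfig` (arbitrary sizes) so no checkpoint download is required.

```python
# Sketch: incremental decoding with past_key_values. After the first forward
# pass returns a cache, subsequent steps feed only the single newest decoder
# token instead of the whole prefix.
import torch
from transformers import BartConfig, BartForConditionalGeneration

# Tiny random config; sizes are illustrative, not a real checkpoint.
config = BartConfig(
    vocab_size=64, d_model=16,
    encoder_layers=1, decoder_layers=1,
    encoder_attention_heads=2, decoder_attention_heads=2,
    encoder_ffn_dim=32, decoder_ffn_dim=32,
)
model = BartForConditionalGeneration(config).eval()

input_ids = torch.tensor([[0, 5, 6, 2]])
decoder_input_ids = torch.tensor([[2, 7, 8]])

with torch.no_grad():
    # Step 1: full decoder prefix; use_cache=True returns past_key_values.
    first = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids,
                  use_cache=True)
    # Step 2: only the newest token, shape (batch_size, 1), plus the cache --
    # the cached positions don't need to be re-fed.
    next_step = model(input_ids=input_ids,
                      decoder_input_ids=torch.tensor([[9]]),
                      past_key_values=first.past_key_values,
                      use_cache=True)

print(tuple(next_step.logits.shape))  # logits only for the single new position
```

This is why the docstrings pair `use_cache=True` with generation: the decoder reprocesses one position per step rather than the whole target sequence.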