"...composable_kernel.git" did not exist on "e521823c00e4c1b29214297ad0a31d2780c50d86"
Unverified Commit 7fd902d3 authored by Younes Belkada's avatar Younes Belkada Committed by GitHub
Browse files

[`BLIP`] fix docstring for `BlipTextxxx` (#21224)

* fix `blip` docstring

* fix typo

* fix another typo
parent d54d7598
......@@ -679,22 +679,19 @@ class BlipTextModel(BlipTextPreTrainedModel):
is_decoder=False,
):
r"""
encoder_hidden_states (:
obj:*torch.FloatTensor* of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Sequence of
hidden-states at the output of the last layer of the encoder. Used in the cross-attention if the model is
configured as a decoder.
encoder_attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*):
encoder_hidden_states (`torch.FloatTensor`, *optional*):
Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if
the model is configured as a decoder.
encoder_attention_mask (`torch.FloatTensor`, *optional*):
Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in
the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:
- 1 for tokens that are **not masked**,
- 0 for tokens that are **masked**.
past_key_values (:
obj:*tuple(tuple(torch.FloatTensor))* of length `config.n_layers` with each tuple having 4 tensors of shape
`(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`): Contains precomputed key and value
hidden states of the attention blocks. Can be used to speed up decoding. If `past_key_values` are used, the
user can optionally input only the last `decoder_input_ids` (those that don't have their past key value
states given to this model) of shape `(batch_size, 1)` instead of all `decoder_input_ids` of shape
`(batch_size, sequence_length)`.
past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*):
Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
`decoder_input_ids` of shape `(batch_size, sequence_length)`.
use_cache (`bool`, *optional*):
If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
`past_key_values`).
......@@ -841,32 +838,26 @@ class BlipTextLMHeadModel(BlipTextPreTrainedModel):
reduction="mean",
):
r"""
encoder_hidden_states (:
obj:*torch.FloatTensor* of shape `(batch_size, sequence_length, hidden_size)`, *optional*): Sequence of
encoder_hidden_states (`torch.FloatTensor`, *optional*): Sequence of
hidden-states at the output of the last layer of the encoder. Used in the cross-attention if the model is
configured as a decoder.
encoder_attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*):
encoder_attention_mask (`torch.FloatTensor`, *optional*):
Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in
the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:
- 1 for tokens that are **not masked**,
- 0 for tokens that are **masked**.
labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
labels (`torch.LongTensor`, *optional*):
Labels for computing the left-to-right language modeling loss (next word prediction). Indices should be in
`[-100, 0, ..., config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are
ignored (masked), the loss is only computed for the tokens with labels n `[0, ..., config.vocab_size]`
past_key_values (:
obj:*tuple(tuple(torch.FloatTensor))* of length `config.n_layers` with each tuple having 4 tensors of shape
`(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`): Contains precomputed key and value
hidden states of the attention blocks. Can be used to speed up decoding. If `past_key_values` are used, the
user can optionally input only the last `decoder_input_ids` (those that don't have their past key value
states given to this model) of shape `(batch_size, 1)` instead of all `decoder_input_ids` of shape
`(batch_size, sequence_length)`.
past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*):
Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
`decoder_input_ids` of shape `(batch_size, sequence_length)`.
use_cache (`bool`, *optional*):
If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
`past_key_values`).
Returns:
Example:
"""
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
if labels is not None:
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment