Unverified Commit 232c898f authored by MS Kim (tony9402), committed by GitHub

Fix annotations (#24582)

* fix annotations
parent c817bc44
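
The diff below is mechanical: every docstring that described the layer input as `(seq_len, batch, embed_dim)` now reads `(batch, seq_len, embed_dim)`, matching the batch-first tensors these layers actually receive. As a quick sanity check of the corrected convention (a minimal sketch, not part of this commit; the tiny BartConfig values are arbitrary):

import torch
from transformers import BartConfig, BartModel

config = BartConfig(
    vocab_size=128,
    d_model=16,                 # "embed_dim" in the docstrings
    encoder_layers=1,
    decoder_layers=1,
    encoder_attention_heads=2,
    decoder_attention_heads=2,
    encoder_ffn_dim=32,
    decoder_ffn_dim=32,
)
model = BartModel(config)       # randomly initialized, nothing downloaded

batch, seq_len = 2, 5
input_ids = torch.randint(0, config.vocab_size, (batch, seq_len))
outputs = model(input_ids)

# The batch dimension comes first, as the fixed annotations state.
assert outputs.last_hidden_state.shape == (batch, seq_len, config.d_model)
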
@@ -735,7 +735,7 @@ class AutoformerEncoderLayer(nn.Module):
     ) -> Tuple[torch.FloatTensor, Optional[torch.FloatTensor]]:
         """
         Args:
-            hidden_states (`torch.FloatTensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`torch.FloatTensor`): attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             layer_head_mask (`torch.FloatTensor`): mask for attention heads in a given layer of size
...
@@ -321,7 +321,7 @@ class TFBartEncoderLayer(tf.keras.layers.Layer):
     ) -> tf.Tensor:
         """
         Args:
-            hidden_states (`tf.Tensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`tf.Tensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`tf.Tensor`): attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size
@@ -394,11 +394,11 @@ class TFBartDecoderLayer(tf.keras.layers.Layer):
     ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]:
         """
         Args:
-            hidden_states (`tf.Tensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`tf.Tensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`tf.Tensor`): attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             encoder_hidden_states (`tf.Tensor`):
-                cross attention input to the layer of shape `(seq_len, batch, embed_dim)`
+                cross attention input to the layer of shape `(batch, seq_len, embed_dim)`
             encoder_attention_mask (`tf.Tensor`): encoder attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size
...
@@ -1388,7 +1388,7 @@ class BigBirdPegasusEncoderLayer(nn.Module):
     ):
         """
         Args:
-            hidden_states (`torch.FloatTensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`torch.FloatTensor`): attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             output_attentions (`bool`, *optional*):
...
@@ -317,7 +317,7 @@ class TFBlenderbotEncoderLayer(tf.keras.layers.Layer):
     ):
         """
         Args:
-            hidden_states (`tf.Tensor`): input to the layer of shape *(seq_len, batch, embed_dim)*
+            hidden_states (`tf.Tensor`): input to the layer of shape *(batch, seq_len, embed_dim)*
             attention_mask (`tf.Tensor`): attention mask of size
                 *(batch, 1, tgt_len, src_len)* where padding elements are indicated by very large negative values.
             layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size
@@ -391,11 +391,11 @@ class TFBlenderbotDecoderLayer(tf.keras.layers.Layer):
     ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]:
         """
         Args:
-            hidden_states (`tf.Tensor`): input to the layer of shape *(seq_len, batch, embed_dim)*
+            hidden_states (`tf.Tensor`): input to the layer of shape *(batch, seq_len, embed_dim)*
             attention_mask (`tf.Tensor`): attention mask of size
                 *(batch, 1, tgt_len, src_len)* where padding elements are indicated by very large negative values.
             encoder_hidden_states (`tf.Tensor`):
-                cross attention input to the layer of shape *(seq_len, batch, embed_dim)*
+                cross attention input to the layer of shape *(batch, seq_len, embed_dim)*
             encoder_attention_mask (`tf.Tensor`): encoder attention mask of size
                 *(batch, 1, tgt_len, src_len)* where padding elements are indicated by very large negative values.
             layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size
...
@@ -317,7 +317,7 @@ class TFBlenderbotSmallEncoderLayer(tf.keras.layers.Layer):
     ) -> tf.Tensor:
         """
         Args:
-            hidden_states (`tf.Tensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`tf.Tensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`tf.Tensor`): attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size
@@ -391,11 +391,11 @@ class TFBlenderbotSmallDecoderLayer(tf.keras.layers.Layer):
     ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]:
         """
         Args:
-            hidden_states (`tf.Tensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`tf.Tensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`tf.Tensor`): attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             encoder_hidden_states (`tf.Tensor`):
-                cross attention input to the layer of shape `(seq_len, batch, embed_dim)`
+                cross attention input to the layer of shape `(batch, seq_len, embed_dim)`
             encoder_attention_mask (`tf.Tensor`): encoder attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size
...
@@ -795,7 +795,7 @@ class ConditionalDetrEncoderLayer(nn.Module):
     ):
         """
         Args:
-            hidden_states (`torch.FloatTensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`torch.FloatTensor`): attention mask of size
                 `(batch, 1, target_len, source_len)` where padding elements are indicated by very large negative
                 values.
...
@@ -851,7 +851,7 @@ class DetaDecoderLayer(nn.Module):
         """
         Args:
             hidden_states (`torch.FloatTensor`):
-                Input to the layer of shape `(seq_len, batch, embed_dim)`.
+                Input to the layer of shape `(batch, seq_len, embed_dim)`.
             position_embeddings (`torch.FloatTensor`, *optional*):
                 Position embeddings that are added to the queries and keys in the self-attention layer.
             reference_points (`torch.FloatTensor`, *optional*):
@@ -861,7 +861,7 @@ class DetaDecoderLayer(nn.Module):
             level_start_index (`torch.LongTensor`, *optional*):
                 Level start index.
             encoder_hidden_states (`torch.FloatTensor`):
-                cross attention input to the layer of shape `(seq_len, batch, embed_dim)`
+                cross attention input to the layer of shape `(batch, seq_len, embed_dim)`
             encoder_attention_mask (`torch.FloatTensor`): encoder attention mask of size
                 `(batch, 1, target_len, source_len)` where padding elements are indicated by very large negative
                 values.
...
@@ -642,7 +642,7 @@ class DetrEncoderLayer(nn.Module):
     ):
         """
         Args:
-            hidden_states (`torch.FloatTensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`torch.FloatTensor`): attention mask of size
                 `(batch, 1, target_len, source_len)` where padding elements are indicated by very large negative
                 values.
@@ -723,7 +723,7 @@ class DetrDecoderLayer(nn.Module):
     ):
         """
         Args:
-            hidden_states (`torch.FloatTensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`torch.FloatTensor`): attention mask of size
                 `(batch, 1, target_len, source_len)` where padding elements are indicated by very large negative
                 values.
@@ -734,7 +734,7 @@ class DetrDecoderLayer(nn.Module):
                 position embeddings that are added to the queries and keys
                 in the self-attention layer.
             encoder_hidden_states (`torch.FloatTensor`):
-                cross attention input to the layer of shape `(seq_len, batch, embed_dim)`
+                cross attention input to the layer of shape `(batch, seq_len, embed_dim)`
             encoder_attention_mask (`torch.FloatTensor`): encoder attention mask of size
                 `(batch, 1, target_len, source_len)` where padding elements are indicated by very large negative
                 values.
...
@@ -738,7 +738,7 @@ class InformerEncoderLayer(nn.Module):
     ) -> Tuple[torch.FloatTensor, Optional[torch.FloatTensor]]:
         """
         Args:
-            hidden_states (`torch.FloatTensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`torch.FloatTensor`): attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             layer_head_mask (`torch.FloatTensor`): mask for attention heads in a given layer of size
...
@@ -962,7 +962,7 @@ class LEDEncoderLayer(nn.Module):
     ):
         """
         Args:
-            hidden_states (`torch.FloatTensor`): input to the layer of shape *(seq_len, batch, embed_dim)*
+            hidden_states (`torch.FloatTensor`): input to the layer of shape *(batch, seq_len, embed_dim)*
             attention_mask (`torch.FloatTensor`): attention mask of size
                 *(batch, 1, tgt_len, src_len)* where padding elements are indicated by very large negative values.
             layer_head_mask (`torch.FloatTensor`): mask for attention heads in a given layer of size
@@ -1040,11 +1040,11 @@ class LEDDecoderLayer(nn.Module):
     ):
         """
         Args:
-            hidden_states (`torch.FloatTensor`): input to the layer of shape *(seq_len, batch, embed_dim)*
+            hidden_states (`torch.FloatTensor`): input to the layer of shape *(batch, seq_len, embed_dim)*
             attention_mask (`torch.FloatTensor`): attention mask of size
                 *(batch, 1, tgt_len, src_len)* where padding elements are indicated by very large negative values.
             encoder_hidden_states (`torch.FloatTensor`):
-                cross attention input to the layer of shape *(seq_len, batch, embed_dim)*
+                cross attention input to the layer of shape *(batch, seq_len, embed_dim)*
             encoder_attention_mask (`torch.FloatTensor`): encoder attention mask of size
                 *(batch, 1, tgt_len, src_len)* where padding elements are indicated by very large negative values.
             layer_head_mask (`torch.FloatTensor`): mask for attention heads in a given layer of size
...
@@ -1181,7 +1181,7 @@ class TFLEDEncoderLayer(tf.keras.layers.Layer):
     ):
         """
         Args:
-            hidden_states (`tf.Tensor`): input to the layer of shape *(seq_len, batch, embed_dim)*
+            hidden_states (`tf.Tensor`): input to the layer of shape *(batch, seq_len, embed_dim)*
             attention_mask (`tf.Tensor`): attention mask of size
                 *(batch, 1, tgt_len, src_len)* where padding elements are indicated by very large negative values.
             layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size
@@ -1256,11 +1256,11 @@ class TFLEDDecoderLayer(tf.keras.layers.Layer):
     ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]:
         """
         Args:
-            hidden_states (`tf.Tensor`): input to the layer of shape *(seq_len, batch, embed_dim)*
+            hidden_states (`tf.Tensor`): input to the layer of shape *(batch, seq_len, embed_dim)*
             attention_mask (`tf.Tensor`): attention mask of size
                 *(batch, 1, tgt_len, src_len)* where padding elements are indicated by very large negative values.
             encoder_hidden_states (`tf.Tensor`):
-                cross attention input to the layer of shape *(seq_len, batch, embed_dim)*
+                cross attention input to the layer of shape *(batch, seq_len, embed_dim)*
             encoder_attention_mask (`tf.Tensor`): encoder attention mask of size
                 *(batch, 1, tgt_len, src_len)* where padding elements are indicated by very large negative values.
             layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size
...
@@ -354,7 +354,7 @@ class TFMarianEncoderLayer(tf.keras.layers.Layer):
     ) -> tf.Tensor:
         """
         Args:
-            hidden_states (`tf.Tensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`tf.Tensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`tf.Tensor`): attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size
@@ -428,11 +428,11 @@ class TFMarianDecoderLayer(tf.keras.layers.Layer):
     ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]:
         """
         Args:
-            hidden_states (`tf.Tensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`tf.Tensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`tf.Tensor`): attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             encoder_hidden_states (`tf.Tensor`):
-                cross attention input to the layer of shape `(seq_len, batch, embed_dim)`
+                cross attention input to the layer of shape `(batch, seq_len, embed_dim)`
             encoder_attention_mask (`tf.Tensor`): encoder attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size
...
@@ -571,7 +571,7 @@ class DetrDecoderLayer(nn.Module):
     ):
         """
         Args:
-            hidden_states (`torch.FloatTensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`torch.FloatTensor`): attention mask of size
                 `(batch, 1, target_len, source_len)` where padding elements are indicated by very large negative
                 values.
@@ -582,7 +582,7 @@ class DetrDecoderLayer(nn.Module):
                 position embeddings that are added to the queries and keys
                 in the self-attention layer.
             encoder_hidden_states (`torch.FloatTensor`):
-                cross attention input to the layer of shape `(seq_len, batch, embed_dim)`
+                cross attention input to the layer of shape `(batch, seq_len, embed_dim)`
             encoder_attention_mask (`torch.FloatTensor`): encoder attention mask of size
                 `(batch, 1, target_len, source_len)` where padding elements are indicated by very large negative
                 values.
...
@@ -322,7 +322,7 @@ class TFMBartEncoderLayer(tf.keras.layers.Layer):
     ):
         """
         Args:
-            hidden_states (`tf.Tensor`): input to the layer of shape *(seq_len, batch, embed_dim)*
+            hidden_states (`tf.Tensor`): input to the layer of shape *(batch, seq_len, embed_dim)*
             attention_mask (`tf.Tensor`): attention mask of size
                 *(batch, 1, tgt_len, src_len)* where padding elements are indicated by very large negative values.
             layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size
@@ -395,11 +395,11 @@ class TFMBartDecoderLayer(tf.keras.layers.Layer):
     ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]:
         """
         Args:
-            hidden_states (`tf.Tensor`): input to the layer of shape *(seq_len, batch, embed_dim)*
+            hidden_states (`tf.Tensor`): input to the layer of shape *(batch, seq_len, embed_dim)*
             attention_mask (`tf.Tensor`): attention mask of size
                 *(batch, 1, tgt_len, src_len)* where padding elements are indicated by very large negative values.
             encoder_hidden_states (`tf.Tensor`):
-                cross attention input to the layer of shape *(seq_len, batch, embed_dim)*
+                cross attention input to the layer of shape *(batch, seq_len, embed_dim)*
             encoder_attention_mask (`tf.Tensor`): encoder attention mask of size
                 *(batch, 1, tgt_len, src_len)* where padding elements are indicated by very large negative values.
             layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size
...
@@ -327,7 +327,7 @@ class MvpEncoderLayer(nn.Module):
     ) -> Tuple[torch.FloatTensor, Optional[torch.FloatTensor]]:
         """
         Args:
-            hidden_states (`torch.FloatTensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`torch.FloatTensor`): attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             layer_head_mask (`torch.FloatTensor`): mask for attention heads in a given layer of size
...
@@ -672,7 +672,7 @@ class NllbMoeEncoderLayer(nn.Module):
         """
         Args:
             hidden_states (`torch.FloatTensor`):
-                input to the layer of shape `(seq_len, batch, embed_dim)`
+                input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`torch.FloatTensor`):
                 attention mask of size `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very
                 large negative values.
...
@@ -303,7 +303,7 @@ class TFOPTDecoderLayer(tf.keras.layers.Layer):
     ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]:
         """
         Args:
-            hidden_states (`tf.Tensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`tf.Tensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`tf.Tensor`, *optional*): attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             layer_head_mask (`tf.Tensor`, *optional*): mask for attention heads in a given layer of size
...
@@ -356,7 +356,7 @@ class TFPegasusEncoderLayer(tf.keras.layers.Layer):
     ):
         """
         Args:
-            hidden_states (`tf.Tensor`): input to the layer of shape *(seq_len, batch, embed_dim)*
+            hidden_states (`tf.Tensor`): input to the layer of shape *(batch, seq_len, embed_dim)*
             attention_mask (`tf.Tensor`): attention mask of size
                 *(batch, 1, tgt_len, src_len)* where padding elements are indicated by very large negative values.
             layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size
@@ -430,11 +430,11 @@ class TFPegasusDecoderLayer(tf.keras.layers.Layer):
     ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]:
         """
         Args:
-            hidden_states (`tf.Tensor`): input to the layer of shape *(seq_len, batch, embed_dim)*
+            hidden_states (`tf.Tensor`): input to the layer of shape *(batch, seq_len, embed_dim)*
             attention_mask (`tf.Tensor`): attention mask of size
                 *(batch, 1, tgt_len, src_len)* where padding elements are indicated by very large negative values.
             encoder_hidden_states (`tf.Tensor`):
-                cross attention input to the layer of shape *(seq_len, batch, embed_dim)*
+                cross attention input to the layer of shape *(batch, seq_len, embed_dim)*
             encoder_attention_mask (`tf.Tensor`): encoder attention mask of size
                 *(batch, 1, tgt_len, src_len)* where padding elements are indicated by very large negative values.
             layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size
...
@@ -400,7 +400,7 @@ class TFSpeech2TextEncoderLayer(tf.keras.layers.Layer):
     ):
         """
         Args:
-            hidden_states (`tf.Tensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`tf.Tensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`tf.Tensor`): attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size
@@ -477,11 +477,11 @@ class TFSpeech2TextDecoderLayer(tf.keras.layers.Layer):
     ) -> Tuple[tf.Tensor, tf.Tensor, Tuple[Tuple[tf.Tensor]]]:
         """
         Args:
-            hidden_states (`tf.Tensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`tf.Tensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`tf.Tensor`): attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             encoder_hidden_states (`tf.Tensor`):
-                cross attention input to the layer of shape `(seq_len, batch, embed_dim)`
+                cross attention input to the layer of shape `(batch, seq_len, embed_dim)`
             encoder_attention_mask (`tf.Tensor`): encoder attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             layer_head_mask (`tf.Tensor`): mask for attention heads in a given layer of size
...
@@ -345,11 +345,11 @@ class Speech2Text2DecoderLayer(nn.Module):
     ):
         """
         Args:
-            hidden_states (`torch.FloatTensor`): input to the layer of shape `(seq_len, batch, embed_dim)`
+            hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
             attention_mask (`torch.FloatTensor`): attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             encoder_hidden_states (`torch.FloatTensor`):
-                cross attention input to the layer of shape `(seq_len, batch, embed_dim)`
+                cross attention input to the layer of shape `(batch, seq_len, embed_dim)`
             encoder_attention_mask (`torch.FloatTensor`): encoder attention mask of size
                 `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
             layer_head_mask (`torch.FloatTensor`): mask for attention heads in a given layer of size
...
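
For reference, the `(batch, 1, tgt_len, src_len)` attention-mask shape that recurs in these docstrings comes from expanding a 2-D padding mask and writing a very large negative value at padded positions, so those positions receive near-zero weight after the softmax. A minimal sketch of that convention (this helper is illustrative and assumed, not the library's internal implementation):

import torch

def expand_padding_mask(mask: torch.Tensor, tgt_len: int, dtype=torch.float32) -> torch.Tensor:
    # mask: `(batch, src_len)` with 1 = attend, 0 = padding.
    batch, src_len = mask.shape
    expanded = mask[:, None, None, :].expand(batch, 1, tgt_len, src_len).to(dtype)
    # 0.0 where attended, a very large negative value where padded;
    # the result is added to the raw attention scores.
    return (1.0 - expanded) * torch.finfo(dtype).min

mask = torch.tensor([[1, 1, 1, 0]])                # one padded position
print(expand_padding_mask(mask, tgt_len=2).shape)  # torch.Size([1, 1, 2, 4])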