@@ -169,7 +169,7 @@ class TFFunnelAttentionStructure:
...
 For the factorized attention, it returns the matrices (phi, pi, psi, omega) used in the paper, appendix A.2.2,
 final formula.
-For the relative shif attention, it returns all possible vectors R used in the paper, appendix A.2.1, final
+For the relative shift attention, it returns all possible vectors R used in the paper, appendix A.2.1, final
 formula.
 Paper link: https://arxiv.org/abs/2006.03236
...
@@ -1009,7 +1009,7 @@ class TFFunnelForPreTrainingOutput(ModelOutput):
...
 Args:
 logits (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length)`):
 Prediction scores of the head (scores for each token before SoftMax).
-hidden_states (:obj:`tuple(tf.ensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
+hidden_states (:obj:`tuple(tf.Tensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
 Tuple of :obj:`tf.Tensor` (one for the output of the embeddings + one for the output of each layer) of
 Mask to nullify selected heads of the attention modules in the encoder. Mask values selected in ``[0, 1]``:
 - 1 indicates the head is **not masked**,
-- 0 indicates the heas is **masked**.
+- 0 indicates the head is **masked**.
 decoder_head_mask (:obj:`tf.Tensor` of shape :obj:`(decoder_layers, decoder_attention_heads)`, `optional`):
 Mask to nullify selected heads of the attention modules in the decoder. Mask values selected in ``[0, 1]``:
...
@@ -1667,7 +1667,7 @@ class TFLEDEncoder(tf.keras.layers.Layer):
...
 Mask to nullify selected heads of the attention modules. Mask values selected in ``[0, 1]``:
 - 1 indicates the head is **not masked**,
-- 0 indicates the heas is **masked**.
+- 0 indicates the head is **masked**.
 inputs_embeds (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
 Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded
...
@@ -1926,14 +1926,14 @@ class TFLEDDecoder(tf.keras.layers.Layer):
...
 Mask to nullify selected heads of the attention modules. Mask values selected in ``[0, 1]``:
 - 1 indicates the head is **not masked**,
-- 0 indicates the heas is **masked**.
+- 0 indicates the head is **masked**.
 encoder_head_mask (:obj:`tf.Tensor` of shape :obj:`(encoder_layers, encoder_attention_heads)`, `optional`):
 Mask to nullify selected heads of the attention modules in encoder to avoid performing cross-attention
 on hidden heads. Mask values selected in ``[0, 1]``:
 - 1 indicates the head is **not masked**,
-- 0 indicates the heas is **masked**.
+- 0 indicates the head is **masked**.
 past_key_values (:obj:`Tuple[Tuple[tf.Tensor]]` of length :obj:`config.n_layers` with each tuple having 2 tuples each of which has 2 tensors of shape :obj:`(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`):
 Contains precomputed key and value hidden-states of the attention blocks. Can be used to speed up