Commit 46a12d4e authored by Scott Zhu, committed by A. Unique TensorFlower
Browse files

Internal change

PiperOrigin-RevId: 450124488
parent 97e6a524
......@@ -44,15 +44,5 @@ class SelfAttentionMask(tf.keras.layers.Layer):
tf.reshape(to_mask, [batch_size, 1, to_seq_length]),
dtype=inputs.dtype)
# We don't assume that `from_tensor` is a mask (although it could be). We
# don't actually care if we attend *from* padding tokens (only *to* padding)
# tokens so we create a tensor of all ones.
#
# `broadcast_ones` = [batch_size, from_seq_length, 1]
broadcast_ones = tf.ones(
shape=[batch_size, from_seq_length, 1], dtype=inputs.dtype)
# Here we broadcast along two dimensions to create the mask.
mask = broadcast_ones * to_mask
return mask
return tf.broadcast_to(to_mask,
[batch_size, from_seq_length, to_seq_length])
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or sign in to comment