Commit 7b129e6b authored by Hongkun Yu, committed by A. Unique TensorFlower

Internal change

PiperOrigin-RevId: 477574174
parent 89bd20c5
...
@@ -335,7 +335,9 @@ class TransformerScaffold(tf.keras.layers.Layer):
                                             training=training)
          layer_output += source_attention_output
        else:
-         # if not norm_first, assume that the feedforwad does apply layer norm
+         # Attention: if not norm_first, assume that the feedforward does
+         # apply layer norm. The feedforward also applies the residual
+         # connection. Please read `GatedFeedforward` as a concrete example.
          layer_output = self._feedforward_block(attention_output,
                                                 training=training)
...
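For context, the comment describes the contract TransformerScaffold places on a custom feedforward block when norm_first is False: the block itself must apply both the residual connection and the layer norm, as GatedFeedforward does. Below is a minimal sketch of a block satisfying that contract. The ResidualNormFeedforward class is hypothetical, not part of the Model Garden; GatedFeedforward is the real concrete example the comment refers to.

import tensorflow as tf

class ResidualNormFeedforward(tf.keras.layers.Layer):
  """Hypothetical feedforward block for use with norm_first=False.

  Applies the residual connection and layer norm internally, matching the
  contract described in the diff comment above.
  """

  def __init__(self, hidden_size, intermediate_size, dropout_rate=0.1,
               **kwargs):
    super().__init__(**kwargs)
    self._intermediate = tf.keras.layers.Dense(
        intermediate_size, activation="gelu")
    self._output = tf.keras.layers.Dense(hidden_size)
    self._dropout = tf.keras.layers.Dropout(dropout_rate)
    self._layer_norm = tf.keras.layers.LayerNormalization(epsilon=1e-12)

  def call(self, inputs, training=None):
    x = self._intermediate(inputs)
    x = self._output(x)
    x = self._dropout(x, training=training)
    # Residual connection and layer norm happen here, inside the block,
    # since the scaffold will not add them when norm_first is False.
    return self._layer_norm(x + inputs)

Such a block would be plugged into the scaffold through its feedforward hook (the feedforward_cls argument, assuming the current Model Garden signature).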