Commit d4bb3055 authored by A. Unique TensorFlower
Browse files

Update nlp.modeling.layers.ReZeroTransformer to be consistent with nlp.modeling.layers.Transformer

PiperOrigin-RevId: 315584374
parent 465354df
...@@ -143,8 +143,14 @@ class ReZeroTransformer(tf.keras.layers.Layer): ...@@ -143,8 +143,14 @@ class ReZeroTransformer(tf.keras.layers.Layer):
kernel_constraint=self._kernel_constraint, kernel_constraint=self._kernel_constraint,
bias_constraint=self._bias_constraint, bias_constraint=self._bias_constraint,
name="intermediate") name="intermediate")
policy = tf.keras.mixed_precision.experimental.global_policy()
if policy.name == "mixed_bfloat16":
# bfloat16 causes BERT with the LAMB optimizer to not converge
# as well, so we use float32.
# TODO(b/154538392): Investigate this.
policy = tf.float32
self._intermediate_activation_layer = tf.keras.layers.Activation( self._intermediate_activation_layer = tf.keras.layers.Activation(
self._intermediate_activation) self._intermediate_activation, dtype=policy)
self._output_dense = dense_einsum.DenseEinsum( self._output_dense = dense_einsum.DenseEinsum(
output_shape=hidden_size, output_shape=hidden_size,
kernel_initializer=self._kernel_initializer, kernel_initializer=self._kernel_initializer,
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or sign in to comment