Commit 76f1113e authored by Hongkun Yu, committed by A. Unique TensorFlower

Fix AdamWeightDecay: self.gradient_clip_norm should be used.

PiperOrigin-RevId: 363069319
parent 3dcc078a
@@ -171,7 +171,8 @@ class AdamWeightDecay(tf.keras.optimizers.Adam):
       # and passed the allreduced grads_and_vars. For now, the
       # clip_by_global_norm will be moved to before the explicit allreduce to
       # keep the math the same as TF 1 and pre TF 2.2 implementation.
-      (grads, _) = tf.clip_by_global_norm(grads, clip_norm=1.0)
+      (grads, _) = tf.clip_by_global_norm(
+          grads, clip_norm=self.gradient_clip_norm)
     return super(AdamWeightDecay, self).apply_gradients(
         zip(grads, tvars),
         name=name,
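For context on the fix: before this change, `apply_gradients` always clipped gradients to a global norm of 1.0, ignoring the `gradient_clip_norm` value configured on the optimizer. A minimal, self-contained sketch of the difference follows; the toy gradient values and the clip norm of 5.0 are illustrative, not taken from the commit.

```python
import tensorflow as tf

# Illustrative value; in AdamWeightDecay this comes from the constructor's
# `gradient_clip_norm` argument and is stored on the optimizer instance.
gradient_clip_norm = 5.0

# Toy gradients whose global norm is sqrt(3^2 + 4^2 + 12^2) = 13.0.
grads = [tf.constant([3.0, 4.0]), tf.constant([12.0])]

# Before the fix: gradients were always rescaled to a global norm of 1.0.
clipped_old, _ = tf.clip_by_global_norm(grads, clip_norm=1.0)

# After the fix: gradients are rescaled to the user-configured norm instead.
clipped_new, _ = tf.clip_by_global_norm(grads, clip_norm=gradient_clip_norm)

print([g.numpy() for g in clipped_old])  # scaled so the global norm is 1.0
print([g.numpy() for g in clipped_new])  # scaled so the global norm is 5.0
```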