Commit 306e3e14 authored by A. Unique TensorFlower

Do not scale loss manually for BERT classifier when compile/fit() API is used.

PiperOrigin-RevId: 275142626
parent 06412123
@@ -132,10 +132,17 @@ def run_bert_classifier(strategy,
             classifier_model.optimizer)
     return classifier_model, core_model
 
-  loss_fn = get_loss_fn(
-      num_classes,
-      loss_factor=1.0 /
-      strategy.num_replicas_in_sync if FLAGS.scale_loss else 1.0)
+  # During distributed training, the loss used for gradient computation is
+  # summed over all replicas. When the Keras compile/fit() API is used,
+  # fit() internally normalizes the loss by dividing it by the number of
+  # replicas used for computation. However, when a custom training loop is
+  # used, this is not done automatically and should be done manually by
+  # the end user.
+  loss_multiplier = 1.0
+  if FLAGS.scale_loss and not use_keras_compile_fit:
+    loss_multiplier = 1.0 / strategy.num_replicas_in_sync
+
+  loss_fn = get_loss_fn(num_classes, loss_factor=loss_multiplier)
 
   # Defines evaluation metrics function, which will create metrics in the
   # correct device and strategy scope.
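For context, here is a minimal sketch of the custom-training-loop case the new comment describes, where the 1/num_replicas_in_sync factor must still be applied by hand. It uses tf.distribute.MirroredStrategy with a stand-in dense model; the model, optimizer, and dataset below are illustrative assumptions, not code from this repository or the BERT classifier itself.

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
GLOBAL_BATCH_SIZE = 32
NUM_CLASSES = 2

with strategy.scope():
  # Stand-in model and optimizer; the real change applies to the BERT
  # classifier built elsewhere in this file.
  model = tf.keras.Sequential([tf.keras.layers.Dense(NUM_CLASSES)])
  optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
  # Reduction.NONE is required inside strategy.run(); reduction across
  # the batch and replicas is handled explicitly below.
  loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
      from_logits=True, reduction=tf.keras.losses.Reduction.NONE)


@tf.function
def train_step(dist_inputs):

  def step_fn(inputs):
    features, labels = inputs
    with tf.GradientTape() as tape:
      logits = model(features, training=True)
      per_example_loss = loss_object(labels, logits)
      # Mean loss over this replica's local batch.
      loss = tf.reduce_mean(per_example_loss)
      # Custom-loop path: gradients are summed across replicas, so the
      # per-replica loss is scaled down by the replica count here. This
      # is exactly the factor the commit now skips when fit() is used,
      # because fit() applies an equivalent normalization internally.
      loss = loss / strategy.num_replicas_in_sync
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

  per_replica_losses = strategy.run(step_fn, args=(dist_inputs,))
  return strategy.reduce(
      tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None)


# Illustrative usage with synthetic data.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal([64, 8]),
     tf.random.uniform([64], maxval=NUM_CLASSES, dtype=tf.int32))
).batch(GLOBAL_BATCH_SIZE)
for batch in strategy.experimental_distribute_dataset(dataset):
  train_step(batch)
```

With compile/fit() on the same strategy, the explicit division by strategy.num_replicas_in_sync would be dropped, which is what the `use_keras_compile_fit` check in this change does: it keeps loss_multiplier at 1.0 on the fit() path so the loss is not normalized twice.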