Unverified Commit 24107c2c authored by Jin Young (Daniel) Sohn, committed by GitHub

Fix TPU Convergence bug introduced by PR#6151 (#6488)

Currently, with the introduced bug, we take two optimizer steps per batch: a global one, where `xm.optimizer_step` injects a cross-replica sum (CRS) across all cores in training, and a second, local one without it. This has been hurting training accuracy (for example, XLNet GLUE fine-tuning on MNLI no longer converges).
parent 895ed8f4
@@ -572,7 +572,7 @@ class Trainer:
                     if is_torch_tpu_available():
                         xm.optimizer_step(self.optimizer)
-                    if self.args.fp16 and _use_native_amp:
+                    elif self.args.fp16 and _use_native_amp:
                         self.scaler.step(self.optimizer)
                         self.scaler.update()
                     else:
                         self.optimizer.step()
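For reference, below is a minimal, hedged sketch of the corrected control flow: exactly one optimizer backend fires per batch, instead of the TPU branch and the native-AMP branch both running. The `step_optimizer` helper, the `trainer` object with `args.fp16`, `scaler`, and `optimizer` attributes, and the `_use_native_amp` stub are illustrative stand-ins for names in the actual Trainer, not the real implementation.

```python
import torch

try:
    # torch_xla is only present on TPU hosts; treat its absence as "no TPU".
    import torch_xla.core.xla_model as xm
    _tpu_available = True
except ImportError:
    _tpu_available = False

# Hypothetical stand-in for the Trainer's module-level flag: native AMP
# requires torch >= 1.6, which ships torch.cuda.amp.GradScaler.
_use_native_amp = hasattr(torch.cuda, "amp") and hasattr(torch.cuda.amp, "GradScaler")


def is_torch_tpu_available() -> bool:
    return _tpu_available


def step_optimizer(trainer) -> None:
    """Run exactly one optimizer step per batch.

    Before the fix, the TPU branch and the AMP branch were two independent
    `if` statements, so on TPU with fp16 both could fire and the optimizer
    stepped twice per batch.
    """
    if is_torch_tpu_available():
        # xm.optimizer_step steps the optimizer and injects a cross-replica
        # sum (CRS) so gradients are reduced across all TPU cores.
        xm.optimizer_step(trainer.optimizer)
    elif trainer.args.fp16 and _use_native_amp:
        # Native AMP path: step and update through the GradScaler instead.
        trainer.scaler.step(trainer.optimizer)
        trainer.scaler.update()
    else:
        # Plain single-device / fp32 path.
        trainer.optimizer.step()
```

With the `elif`, the fp16/GradScaler path is only reached when the TPU path was not taken, which restores a single optimizer step (and a single gradient reduction) per batch.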