Commit 956f44f1 authored by Shubham Krishna, committed by GitHub

Fix TPU checkpointing inside Trainer (#29657)

Manually call sync step
parent c9e3c0b4
...
@@ -3012,6 +3012,7 @@ class Trainer:
         logger.info(f"Saving model checkpoint to {output_dir}")
         model = self.model
+        xm.mark_step()
         model.to("cpu")
         if xm.is_master_ordinal():
...
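The ordering this change introduces can be sketched in isolation. The snippet below is a minimal illustration, not the Trainer's actual code: `save_on_tpu` and `save_fn` are hypothetical names, and `xm` is passed in as a parameter (in real use it would presumably be `torch_xla.core.xla_model`) so the sketch can run without a TPU. It only shows that the sync step is called before the model is moved to the CPU.

```python
def save_on_tpu(model, xm, save_fn):
    """Sketch of the call order in the diff above.

    `save_fn` is a hypothetical callback standing in for the elided
    checkpoint-writing code; it is not part of the original change.
    """
    # Manually trigger the sync step so pending lazy XLA work is
    # dispatched before the model leaves the device.
    xm.mark_step()
    # Move the model to host memory, as the existing code already did.
    model.to("cpu")
    # Guard the save on the master ordinal, mirroring the diff.
    if xm.is_master_ordinal():
        save_fn(model)
```

With torch_xla installed, this could be called as `save_on_tpu(model, xm, my_save)` after `import torch_xla.core.xla_model as xm`, where `my_save` is whatever function writes the checkpoint.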