Commit ac51e59e authored by xkszltl, committed by GitHub

Do not use mtime for checkpoint rotation. (#28862)

Resolve https://github.com/huggingface/transformers/issues/26961
parent 06901162
@@ -2465,7 +2465,9 @@ class Trainer:
         # Maybe delete some older checkpoints.
         if self.args.should_save:
-            self._rotate_checkpoints(use_mtime=True, output_dir=run_dir)
+            # Solely rely on numerical checkpoint id for rotation.
+            # mtime is not reliable especially on some fuse fs in cloud environments.
+            self._rotate_checkpoints(use_mtime=False, output_dir=run_dir)
         self.args.distributed_state.wait_for_everyone()
...
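
The idea behind the change, ordering checkpoints by the step number in the directory name rather than by filesystem mtime, can be illustrated with a minimal sketch. This is not the Trainer's actual `_rotate_checkpoints` code: the helper names `checkpoints_by_step` and `stale_checkpoints`, the `keep_last` parameter, and the assumed `checkpoint-<step>` directory naming are all hypothetical and used only for illustration.

```python
import re
from pathlib import Path

# Assumed naming scheme for illustration: "checkpoint-<step>".
CHECKPOINT_PATTERN = re.compile(r"checkpoint-(\d+)")


def checkpoints_by_step(run_dir):
    """Return checkpoint directories under run_dir, lowest step first.

    Hypothetical helper: ordering uses only the numeric id parsed from
    the directory name, so unreliable mtimes cannot affect it.
    """
    found = []
    for path in Path(run_dir).iterdir():
        match = CHECKPOINT_PATTERN.fullmatch(path.name)
        if path.is_dir() and match:
            found.append((int(match.group(1)), path))
    found.sort(key=lambda item: item[0])
    return [path for _, path in found]


def stale_checkpoints(run_dir, keep_last):
    """Return the checkpoints that fall outside the newest keep_last.

    Assumption for this sketch: keep_last <= 0 means "no limit",
    so nothing is reported as stale.
    """
    if keep_last <= 0:
        return []
    ordered = checkpoints_by_step(run_dir)
    return ordered[: max(0, len(ordered) - keep_last)]
```

Because the sort key is the parsed integer, a directory whose mtime was reset by a copy, a sync tool, or a FUSE-backed filesystem still keeps its position in the rotation order.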