Commit 7fd2d686, authored by Hang, committed by GitHub

only main process should call _save on deepspeed zero3 (#25959)

Only the main process should call _save when using DeepSpeed ZeRO-3.
parent 95b37495
@@ -2850,7 +2850,8 @@ class Trainer:
                 " stage3_gather_16bit_weights_on_model_save=false. Saving the full checkpoint instead, use"
                 " zero_to_fp32.py to recover weights"
             )
-            self._save(output_dir, state_dict={})
+            if self.args.should_save:
+                self._save(output_dir, state_dict={})
             # remove the dummy state_dict
             remove_dummy_checkpoint(self.args.should_save, output_dir, [WEIGHTS_NAME, SAFE_WEIGHTS_NAME])
             self.model_wrapped.save_checkpoint(output_dir)
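
The guard in this change can be illustrated with a minimal, single-process sketch. It is not the Trainer code: `FakeArgs`, `save_dummy_checkpoint`, and the file name `dummy_weights.bin` are hypothetical names introduced here, and the four "ranks" are simulated in a loop rather than launched under DeepSpeed. The sketch only shows the idea that every process reaches the save path, but only the one with `should_save` set writes the placeholder file.

```python
import os
import tempfile


class FakeArgs:
    """Hypothetical stand-in for the training arguments; only `should_save` is modelled."""

    def __init__(self, should_save):
        self.should_save = should_save


def save_dummy_checkpoint(args, output_dir, filename="dummy_weights.bin"):
    """Write a placeholder weights file only on the process allowed to save.

    Returns True if this process wrote the file, False if it skipped the write.
    """
    if not args.should_save:
        return False
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, filename), "wb") as f:
        f.write(b"")  # empty placeholder, analogous in spirit to state_dict={}
    return True


with tempfile.TemporaryDirectory() as tmp:
    # Simulate four ranks sharing one output directory; only rank 0 may save.
    results = [
        save_dummy_checkpoint(FakeArgs(should_save=(rank == 0)), tmp)
        for rank in range(4)
    ]
    written = os.listdir(tmp)
```

In the diff itself, `self.model_wrapped.save_checkpoint(output_dir)` stays outside the new `if` and is still called on every process; only the `_save` call is gated.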