Unverified commit 7fd2d686 authored by Hang, committed by GitHub

Only the main process should call _save with DeepSpeed ZeRO-3 (#25959)

Only the main process should call _save when DeepSpeed ZeRO-3 is enabled.
parent 95b37495
@@ -2850,6 +2850,7 @@ class Trainer:
                     " stage3_gather_16bit_weights_on_model_save=false. Saving the full checkpoint instead, use"
                     " zero_to_fp32.py to recover weights"
                 )
-                self._save(output_dir, state_dict={})
+                if self.args.should_save:
+                    self._save(output_dir, state_dict={})
                 # remove the dummy state_dict
                 remove_dummy_checkpoint(self.args.should_save, output_dir, [WEIGHTS_NAME, SAFE_WEIGHTS_NAME])
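To make the effect of the new guard concrete, here is a minimal, self-contained sketch of the patched except-branch. HypotheticalArgs, save_dummy, zero3_fallback_save and simulate are hypothetical stand-ins (not transformers APIs) for TrainingArguments, Trainer._save and the DeepSpeed fallback in save_model. remove_dummy_checkpoint mirrors the call in the diff, but its body here is an assumption written for the sketch. should_save is simplified to "global rank 0", and the simulated ranks run one after another in a single process instead of concurrently.

```python
import os
import tempfile
from dataclasses import dataclass

# Filenames mirroring transformers' WEIGHTS_NAME / SAFE_WEIGHTS_NAME constants.
WEIGHTS_NAME = "pytorch_model.bin"
SAFE_WEIGHTS_NAME = "model.safetensors"


@dataclass
class HypotheticalArgs:
    """Hypothetical stand-in for TrainingArguments; only should_save is modeled."""
    process_index: int

    @property
    def should_save(self) -> bool:
        # Simplification: only global rank 0 saves (ignores save_on_each_node).
        return self.process_index == 0


def save_dummy(output_dir: str) -> None:
    """Hypothetical stand-in for Trainer._save(output_dir, state_dict={})."""
    with open(os.path.join(output_dir, WEIGHTS_NAME), "wb") as f:
        f.write(b"")  # empty placeholder in place of real weights


def remove_dummy_checkpoint(is_main_process: bool, output_dir: str, filenames: list) -> None:
    """Assumed behavior: delete the placeholder files, but only on the main process."""
    if is_main_process:
        for name in filenames:
            path = os.path.join(output_dir, name)
            if os.path.isfile(path):
                os.remove(path)


def zero3_fallback_save(args: HypotheticalArgs, output_dir: str, writers: list, guard: bool = True) -> None:
    """Sketch of the except-branch; guard=True is the patched version, guard=False the old one."""
    if args.should_save or not guard:
        save_dummy(output_dir)
        writers.append(args.process_index)  # record which rank wrote a placeholder
    remove_dummy_checkpoint(args.should_save, output_dir, [WEIGHTS_NAME, SAFE_WEIGHTS_NAME])


def simulate(num_ranks: int, guard: bool = True):
    """Run the branch once per simulated rank, sequentially, against one shared directory."""
    writers = []
    with tempfile.TemporaryDirectory() as out:
        for rank in range(num_ranks):
            zero3_fallback_save(HypotheticalArgs(process_index=rank), out, writers, guard)
        leftover = sorted(os.listdir(out))
    return writers, leftover
```

With the guard, simulate(4) returns ([0], []): only rank 0 writes the placeholder, and the same rank then removes it. Without it, simulate(4, guard=False) returns ([0, 1, 2, 3], ['pytorch_model.bin']) in this sequential run: every rank writes to the shared directory, but only the main process cleans up, so a placeholder written by a later rank survives. Real ranks run concurrently, so the actual interleaving is not deterministic; the sketch only shows why restricting the write to the process that also does the cleanup avoids this class of problem.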