Commit c1138273, authored by Mihai Balint, committed by GitHub

Fix duplicate call to save_checkpoint when using deepspeed (#14946)

* Fix duplicate call to save_checkpoint when using deepspeed / stage3_gather_fp16_weights_on_model_save

* Revert "Fix duplicate call to save_checkpoint when using deepspeed / stage3_gather_fp16_weights_on_model_save"

This reverts commit 6a3dec0397723a8417351dc38fdebf14ab17756c.

* Delete correct duplicate invocation of deepspeed save_checkpoint
parent 03885a3f
@@ -1999,9 +1999,6 @@ class Trainer:
     # This must be called on all ranks
     self.deepspeed.save_fp16_model(output_dir, WEIGHTS_NAME)
-    # save a deepspeed checkpoint as well (this is very fast)
-    self.deepspeed.save_checkpoint(output_dir)
 elif self.args.should_save:
     self._save(output_dir)
...
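The effect of the removal can be illustrated with a toy model of the save path. This is only a sketch under stated assumptions, not the actual Trainer code: `FakeEngine`, `ToyTrainer`, `count_checkpoint_saves`, and the `"weights.bin"` filename are hypothetical names invented for illustration, and the sketch assumes that the routine calling `save_model` also saves the DeepSpeed checkpoint itself, which the hunk above does not show.

```python
class FakeEngine:
    """Hypothetical stand-in for a DeepSpeed engine; it only records calls."""

    def __init__(self):
        self.calls = []

    def save_fp16_model(self, output_dir, filename):
        self.calls.append(("save_fp16_model", output_dir, filename))

    def save_checkpoint(self, output_dir):
        self.calls.append(("save_checkpoint", output_dir))


class ToyTrainer:
    """Toy model of the save path (an assumption, not the real Trainer)."""

    def __init__(self, engine, duplicate_call):
        self.deepspeed = engine
        # True mimics the code before this commit, False the code after it.
        self.duplicate_call = duplicate_call

    def save_model(self, output_dir):
        self.deepspeed.save_fp16_model(output_dir, "weights.bin")
        if self.duplicate_call:
            # The two lines deleted by this commit correspond to this call.
            self.deepspeed.save_checkpoint(output_dir)

    def save_checkpoint(self, output_dir):
        # Assumed caller: saves the model, then the DeepSpeed checkpoint.
        self.save_model(output_dir)
        self.deepspeed.save_checkpoint(output_dir)


def count_checkpoint_saves(duplicate_call):
    """Return how many times the engine's save_checkpoint ran for one save."""
    engine = FakeEngine()
    ToyTrainer(engine, duplicate_call).save_checkpoint("out")
    return sum(1 for call in engine.calls if call[0] == "save_checkpoint")
```

Under these assumptions, `count_checkpoint_saves(True)` counts two engine checkpoint writes per save and `count_checkpoint_saves(False)` counts one, which is the behavior the commit title describes.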