Commit 3fce52cf authored by Anthony Chen, committed by Facebook GitHub Bot

delete loaded ckpt after use to save memory

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/574

Currently, the d2go runner doesn't delete the checkpoint after loading it. This is fine when running with `resume=True`, because all the model/optimizer/EMA state in the checkpoint is loaded into the corresponding training components. With `resume=False`, however, only the model state is loaded, and the optimizer/EMA state stays in memory until the end of training. This can cause OOM if the checkpoint is large.

This diff deletes the loaded checkpoint after use to save memory and avoid potential OOM issues.

Reviewed By: tglik

Differential Revision: D46674618

fbshipit-source-id: 2b70a8e46c7f2a309f83cc4deefe5d7a14783734
parent a879c1b4
......@@ -568,6 +568,7 @@ class Detectron2GoRunner(D2GoDataAPIMixIn, BaseRunner):
if resume and checkpointer.has_checkpoint()
else -1
)
+del checkpoint
# The checkpoint stores the training iteration that just finished, thus we start
# at the next iteration (or iter zero if there's no checkpoint).
start_iter += 1
......
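The effect of the added `del checkpoint` line can be illustrated with a minimal, self-contained sketch. It does not use d2go or PyTorch: `Blob`, `load_checkpoint`, and `start_training` are hypothetical stand-ins, and the dictionary keys other than `"iteration"` are assumptions about what a checkpoint might contain. A weak reference is used only to observe whether the unused optimizer state is still alive.

```python
import gc
import weakref


class Blob:
    """Hypothetical stand-in for a large block of state (e.g. many tensors)."""

    def __init__(self, name):
        self.name = name


def load_checkpoint():
    # Hypothetical loader: returns model, optimizer, and EMA state in one dict.
    return {
        "model": Blob("model"),
        "optimizer": Blob("optimizer"),
        "ema_state": Blob("ema_state"),
        "iteration": 99,
    }


def start_training(resume):
    checkpoint = load_checkpoint()

    # Weak reference: lets us check later whether the optimizer state was freed.
    optimizer_ref = weakref.ref(checkpoint["optimizer"])

    # The model state is always consumed, so we keep a strong reference to it.
    model_state = checkpoint["model"]
    start_iter = checkpoint.get("iteration", -1) if resume else -1

    # Drop the only reference to the dict so the unused entries can be released.
    del checkpoint
    gc.collect()  # not required on CPython, but harmless on other interpreters

    # The checkpoint stores the iteration that just finished; start at the next.
    start_iter += 1
    return model_state, start_iter, optimizer_ref


model_state, start_iter, optimizer_ref = start_training(resume=False)
print(model_state.name, start_iter, optimizer_ref() is None)
```

Without the `del`, the `checkpoint` name would keep the whole dictionary alive for as long as the enclosing function runs, which in a training loop means until training ends.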