Move EMA to after backward.

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/494

Currently, the EMA computation runs in the after-step hook. That puts it on the critical path, where no other work is available to overlap with it, so it increases the training iteration time.

This diff moves the EMA computation to after the backward pass but before the optimizer step. Most of the EMA computation's CPU time can then be hidden, because at that point the CPU is waiting for the GPU to finish the backward pass anyway. The change may hide the EMA CPU time completely: it reduces the EMA time from 20ms to 4ms, where the remaining 4ms is GPU time.

However, because the update now runs before the optimizer step, the EMA is computed from the previous iteration's model state. Since training runs for many epochs, a one-iteration difference may not be significant.

Reviewed By: tglik

Differential Revision: D43527552

fbshipit-source-id: 1faa9d910b20cae0fc77da541bc0ad176bce18a8
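The one-iteration lag described in the summary can be illustrated with a toy loop. This is a minimal sketch, not d2go code: `ema_update`, `train`, the scalar "weight", the constant gradient, and the decay and learning-rate values are all hypothetical stand-ins chosen to keep the arithmetic exact.

```python
def ema_update(ema, weight, decay):
    """Textbook EMA rule: ema <- decay * ema + (1 - decay) * weight."""
    return decay * ema + (1.0 - decay) * weight


def train(num_iters, after_backward, decay=0.5, lr=1.0):
    """Toy loop over a single scalar weight (hypothetical, not d2go's trainer).

    Returns the EMA value recorded at the end of every iteration.
    """
    weight, ema = 0.0, 0.0
    history = []
    for _ in range(num_iters):
        grad = -1.0  # stand-in for the gradient produced by backward()
        if after_backward:
            # EMA runs before the optimizer step, so it sees the weight
            # left behind by the previous iteration.
            ema = ema_update(ema, weight, decay)
        weight -= lr * grad  # optimizer step
        if not after_backward:
            # EMA runs after the optimizer step, so it sees the fresh weight.
            ema = ema_update(ema, weight, decay)
        history.append(ema)
    return history


if __name__ == "__main__":
    print("after step:    ", train(3, after_backward=False))
    print("after backward:", train(3, after_backward=True))
```

In this toy setup, where the weight and the EMA both start at zero, the after-backward sequence is the after-step sequence shifted by exactly one iteration. Real training will not line up this neatly, but the sketch shows why the difference is bounded by a single update.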

Commit a7dc757c authored Mar 05, 2023 by Fei Sun Committed by Facebook GitHub Bot Mar 05, 2023

Showing with 13 additions and 0 deletions

d2go/modeling/ema.py +13 -0

--- a/d2go/modeling/ema.py
+++ b/d2go/modeling/ema.py
@@ -185,6 +185,8 @@ def add_model_ema_configs(_C):
    _C.MODEL_EMA.USE_EMA_WEIGHTS_FOR_EVAL_ONLY = False
    # Whether to use LERP to compute EMA
    _C.MODEL_EMA.USE_LERP = False
+    # Whether to put EMA to the backward pass
+    _C.MODEL_EMA.AFTER_BACKWARD = False


 def _remove_ddp(model):
@@ -266,6 +268,7 @@ class EMAHook(HookBase):
        self.model = model
        self.ema = self.model.ema_state
        self.device = cfg.MODEL_EMA.DEVICE or cfg.MODEL.DEVICE
+        self.is_after_backward = cfg.MODEL_EMA.AFTER_BACKWARD
        self.ema_updater = EMAUpdater(
            self.model.ema_state,
            decay=cfg.MODEL_EMA.DECAY,
@@ -285,7 +288,17 @@ class EMAHook(HookBase):
    def before_step(self):
        pass

+    def after_backward(self):
+        if not self.is_after_backward:
+            return
+        self._update()
+
    def after_step(self):
+        if self.is_after_backward:
+            return
+        self._update()
+
+    def _update(self):
        if not self.model.train:
            return
        self.ema_updater.update(self.model)
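The effect of the new flag on ordering can be sketched with a toy hook and a toy iteration. Everything below is a hypothetical stand-in (`ToyEMAHook`, `run_iteration`, the event strings); it assumes the trainer calls `after_backward()` between the backward pass and the optimizer step, and `after_step()` once the step is done. The toy uses a plain boolean `training` flag; in PyTorch, `nn.Module.training` is the boolean attribute, whereas `nn.Module.train` is a method.

```python
class ToyEMAHook:
    """Hypothetical stand-in for EMAHook that records where the update ran."""

    def __init__(self, is_after_backward, events, training=True):
        self.is_after_backward = is_after_backward
        self.events = events      # shared log of what happened, in order
        self.training = training  # plain boolean, assumed for this sketch

    def after_backward(self):
        # Only active when the flag selects the after-backward placement.
        if not self.is_after_backward:
            return
        self._update()

    def after_step(self):
        # Only active when the flag keeps the original after-step placement.
        if self.is_after_backward:
            return
        self._update()

    def _update(self):
        if not self.training:
            return
        self.events.append("ema_update")


def run_iteration(is_after_backward, training=True):
    """Simulate one training iteration and return the ordered event log."""
    events = []
    hook = ToyEMAHook(is_after_backward, events, training)
    events.append("backward")
    hook.after_backward()
    events.append("optimizer_step")
    hook.after_step()
    return events


if __name__ == "__main__":
    print(run_iteration(is_after_backward=False))
    print(run_iteration(is_after_backward=True))
```

Exactly one of the two hook methods performs the update in any iteration, so enabling the flag moves the EMA update without running it twice.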