Commit b5e5b0ad authored by Francisc Bungiu, committed by Facebook GitHub Bot

Use parallel version of AdamW optimizer

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/448

Tracing d2go runners that use the AdamW optimizer showed many small operators being executed in the optimizer code. These can be fused by using the foreach version of the optimizer.

QPS gain is ~4.5%.
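
For illustration, a minimal sketch of the same flag applied directly to torch.optim.AdamW (not d2go-specific; the model and hyperparameter values here are placeholders, not d2go defaults):

import torch

# Placeholder model; in d2go the parameter groups come from
# get_optimizer_param_groups(model, cfg).
model = torch.nn.Linear(16, 4)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    foreach=True,  # multi-tensor path: batches the many small per-parameter ops
)

# Standard training step; the update in optimizer.step() now runs
# batched foreach ops instead of one small op per parameter tensor.
loss = model(torch.randn(8, 16)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()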

Reviewed By: miqueljubert

Differential Revision: D42004110

fbshipit-source-id: 807e0a297bb0b4272f67cc4348389294145a20eb
parent 02723f24
@@ -276,7 +276,11 @@ def adamw(cfg, model: torch.nn.Module) -> torch.optim.Optimizer:
     params = get_optimizer_param_groups(model, cfg)
     return maybe_add_gradient_clipping(cfg, torch.optim.AdamW)(
-        params=params, lr=cfg.SOLVER.BASE_LR, betas=cfg.SOLVER.BETAS, eps=cfg.SOLVER.EPS
+        params=params,
+        lr=cfg.SOLVER.BASE_LR,
+        betas=cfg.SOLVER.BETAS,
+        eps=cfg.SOLVER.EPS,
+        foreach=True,
     )