Commit b5e5b0ad authored by Francisc Bungiu, committed by Facebook GitHub Bot

Use parallel version of AdamW optimizer

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/448

Tracing d2go runners that use the AdamW optimizer showed many small operators being executed in the optimizer code. These can be fused by using the foreach version of the optimizer.

QPS gain is ~4.5%.
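
For illustration, a minimal sketch of the same flag applied directly to torch.optim.AdamW (not d2go-specific; the model and hyperparameter values here are placeholders, not d2go defaults):

import torch

# Placeholder model; in d2go the parameter groups come from
# get_optimizer_param_groups(model, cfg).
model = torch.nn.Linear(16, 4)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    foreach=True,  # multi-tensor path: batches the many small per-parameter ops
)

# Standard training step; the update in optimizer.step() now runs
# batched foreach ops instead of one small op per parameter tensor.
loss = model(torch.randn(8, 16)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()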

Reviewed By: miqueljubert

Differential Revision: D42004110

fbshipit-source-id: 807e0a297bb0b4272f67cc4348389294145a20eb
parent 02723f24
@@ -276,7 +276,11 @@ def adamw(cfg, model: torch.nn.Module) -> torch.optim.Optimizer:
     params = get_optimizer_param_groups(model, cfg)
     return maybe_add_gradient_clipping(cfg, torch.optim.AdamW)(
-        params=params, lr=cfg.SOLVER.BASE_LR, betas=cfg.SOLVER.BETAS, eps=cfg.SOLVER.EPS
+        params=params,
+        lr=cfg.SOLVER.BASE_LR,
+        betas=cfg.SOLVER.BETAS,
+        eps=cfg.SOLVER.EPS,
+        foreach=True,
     )