    Reduce number of parameter groups to make optimizer more efficient · 737d099b
    Valentin Andrei authored
    Summary:
    `torch.optim._multi_tensor` provides faster Optimizer implementations because it uses the foreach APIs, which batch elementwise updates across many parameter tensors into fewer kernel launches. We can enable it by changing `OPTIMIZER: "ADAMW"` to `OPTIMIZER: "ADAMW_MT"` in the config file.
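
    As a minimal sketch (assuming PyTorch of this era, where `torch.optim._multi_tensor` is still a private module), the config switch roughly corresponds to constructing the multi-tensor variant directly; the model and hyperparameters below are placeholders, not values from this diff:

    ```python
    import torch
    from torch.optim._multi_tensor import AdamW  # foreach-based AdamW variant

    # Placeholder model and hyperparameters; real values come from the config.
    model = torch.nn.Linear(128, 64)
    optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

    # Standard training step; only the optimizer class changed.
    loss = model(torch.randn(4, 128)).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    ```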
    
    To benefit from the speedup, we need to reduce the number of parameter groups, as suggested in this post: https://fb.workplace.com/groups/1405155842844877/permalink/4971600462867046/
    
    The current implementation creates one parameter group per parameter, which is not optimal: the foreach kernels operate within a group, so single-parameter groups leave nothing to batch. The proposed change instead groups parameters by their (learning rate, weight decay) combination.
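
    A hedged sketch of the grouping idea (the helper `get_lr_wd` is hypothetical and stands in for whatever per-parameter policy the config encodes, e.g. no weight decay on biases or norm layers):

    ```python
    from collections import defaultdict

    def build_param_groups(named_params, get_lr_wd):
        # Bucket parameters that share the same (lr, weight_decay) pair so the
        # optimizer sees a handful of groups instead of one group per parameter.
        buckets = defaultdict(list)
        for name, param in named_params:
            if not param.requires_grad:
                continue
            lr, wd = get_lr_wd(name, param)  # hypothetical per-parameter policy
            buckets[(lr, wd)].append(param)
        return [
            {"params": params, "lr": lr, "weight_decay": wd}
            for (lr, wd), params in buckets.items()
        ]
    ```

    With a typical policy (weights at the base weight decay, biases at zero), a large model collapses from thousands of single-parameter groups to just a few, which is what lets the foreach kernels batch effectively.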
    
    Reviewed By: zhanghang1989
    
    Differential Revision: D30272112
    
    fbshipit-source-id: d8d24298a59b52c2fc2930f7d614a0c6380a432f