    Reduce number of parameter groups to make optimizer more efficient · 737d099b
    Valentin Andrei authored
    Summary:
    `torch.optim._multi_tensor` provides faster Optimizer implementations because it uses the foreach APIs, which batch elementwise updates across many parameter tensors into fewer kernel launches. We can enable it by changing `OPTIMIZER: "ADAMW"` to `OPTIMIZER: "ADAMW_MT"` in the config file.
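
    As a minimal sketch (assuming PyTorch of this era, where `torch.optim._multi_tensor` is still a private module), the config switch roughly corresponds to constructing the multi-tensor variant directly; the model and hyperparameters below are placeholders, not values from this diff:

    ```python
    import torch
    from torch.optim._multi_tensor import AdamW  # foreach-based AdamW variant

    # Placeholder model and hyperparameters; real values come from the config.
    model = torch.nn.Linear(128, 64)
    optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

    # Standard training step; only the optimizer class changed.
    loss = model(torch.randn(4, 128)).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    ```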
    
    To benefit from the speedup, we need to reduce the number of parameter groups, as suggested in this post: https://fb.workplace.com/groups/1405155842844877/permalink/4971600462867046/
    
    The current implementation creates one parameter group per parameter, which is not optimal: the foreach kernels operate within a group, so single-parameter groups leave nothing to batch. The proposed change instead groups parameters by their (learning rate, weight decay) combination.
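
    A hedged sketch of the grouping idea (the helper `get_lr_wd` is hypothetical and stands in for whatever per-parameter policy the config encodes, e.g. no weight decay on biases or norm layers):

    ```python
    from collections import defaultdict

    def build_param_groups(named_params, get_lr_wd):
        # Bucket parameters that share the same (lr, weight_decay) pair so the
        # optimizer sees a handful of groups instead of one group per parameter.
        buckets = defaultdict(list)
        for name, param in named_params:
            if not param.requires_grad:
                continue
            lr, wd = get_lr_wd(name, param)  # hypothetical per-parameter policy
            buckets[(lr, wd)].append(param)
        return [
            {"params": params, "lr": lr, "weight_decay": wd}
            for (lr, wd), params in buckets.items()
        ]
    ```

    With a typical policy (weights at the base weight decay, biases at zero), a large model collapses from thousands of single-parameter groups to just a few, which is what lets the foreach kernels batch effectively.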
    
    Reviewed By: zhanghang1989
    
    Differential Revision: D30272112
    
    fbshipit-source-id: d8d24298a59b52c2fc2930f7d614a0c6380a432f