    different learning rate for different parts · fe5bdb2f
    Jeremy Reizenstein authored
    Summary:
    Adds the ability to have different learning rates for different parts of the model. The trainable parts of Implicitron have a new member
    
           param_groups: dictionary where keys are names of individual parameters
                or a module's members, and values are the parameter group to which
                the parameter/member will be assigned. The "self" key denotes the
                parameter group at the module level. Possible keys, including the
                "self" key, do not have to be defined. By default, all parameters
                are put into the "default" parameter group and have the learning
                rate defined in the optimizer; this can be overridden at the:
                    - module level with the "self" key: all the parameters and
                        child modules' parameters will be put into that parameter
                        group
                    - member level, which is the same as if the `param_groups` of
                        that member had key="self" and a value equal to that
                        parameter group. This is useful if members do not have
                        `param_groups`, for example torch.nn.Linear.
                    - parameter level: the parameter with the same name as the key
                        will be put into that parameter group.
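
    A minimal sketch of how a module might declare `param_groups` under this
    convention; the module and the group names below ("slow", "fast") are
    hypothetical and not part of the actual Implicitron API:

        import torch

        class MyModel(torch.nn.Module):
            def __init__(self):
                super().__init__()
                self.encoder = torch.nn.Linear(3, 8)
                self.decoder = torch.nn.Linear(8, 3)
                self.scale = torch.nn.Parameter(torch.ones(1))
                # Member level: everything under `encoder` goes to the "slow"
                # group, as if encoder itself had param_groups={"self": "slow"}.
                # Parameter level: the `scale` parameter goes to "fast".
                # `decoder` is not mentioned, so it stays in "default".
                self.param_groups = {"encoder": "slow", "scale": "fast"}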
    
    In the optimizer factory, parameters and their learning rates are then gathered recursively, as in the sketch below.
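
    A rough sketch of such recursive gathering (an illustration of the idea,
    not the actual optimizer_factory.py implementation; the learning rates in
    the usage comment are made-up values):

        from collections import defaultdict
        from typing import Dict, List

        import torch

        def gather_param_groups(
            module: torch.nn.Module, default_group: str = "default"
        ) -> Dict[str, List[torch.nn.Parameter]]:
            groups: Dict[str, List[torch.nn.Parameter]] = defaultdict(list)

            def recurse(mod: torch.nn.Module, inherited: str) -> None:
                overrides = getattr(mod, "param_groups", {}) or {}
                # A "self" key overrides the group inherited from the parent.
                own_group = overrides.get("self", inherited)
                # Direct parameters: a key matching the parameter name wins.
                for name, param in mod.named_parameters(recurse=False):
                    groups[overrides.get(name, own_group)].append(param)
                # Child modules: a key matching the member name sets the group
                # that the child inherits.
                for name, child in mod.named_children():
                    recurse(child, overrides.get(name, own_group))

            recurse(module, default_group)
            return groups

        # Usage: map group names to learning rates and build the optimizer.
        # lrs = {"default": 1e-3, "slow": 1e-4, "fast": 1e-2}
        # optimizer = torch.optim.Adam(
        #     [{"params": ps, "lr": lrs[g]}
        #      for g, ps in gather_param_groups(model).items()]
        # )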
    
    Reviewed By: shapovalov
    
    Differential Revision: D40145802
    
    fbshipit-source-id: 631c02b8d79ee1c0eb4c31e6e42dbd3d2882078a
optimizer_factory.py