KD algorithm
Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/420

Adds knowledge distillation as a generic algorithm that can be used by various projects. In eval mode, the algorithm simply returns the result of the student model. In training mode, it feeds the input into both the student and the teacher model. The user provides a list of `LayerLossMetadata` specifying the layers of interest and the losses to run on their outputs. The algorithm uses dynamic mixin to record the outputs of the relevant layers and computes the losses after both models have run.

We provide student and teacher preprocessing as a placeholder until we support a more generic dataloader that can supply different inputs to the student and teacher (e.g., for now, if you want to give the teacher a larger input, the dataloader should return the large input and the student preprocessing can downsample it).

We add the following functions to the user-customizable distillation helper:
* get_teacher => return a teacher that can be used directly by the KD algorithm
* get_layer_losses => return a list of `LayerLossMetadata` specifying the layers and losses
* get_preprocess_student_input => manipulate the output of the dataloader before passing it to the student
* get_preprocess_teacher_input => manipulate the output of the dataloader before passing it to the teacher
* get_combine_losses => since we may want to weight the student and distillation losses, return a function that can manipulate the loss_dict

Reviewed By: chihyaoma

Differential Revision: D40326412

fbshipit-source-id: 2fb0e818a7d5b120d62fb7aba314ff96cc7e10c5
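For context, here is a minimal sketch of how the pieces described above could fit together, written in plain PyTorch. This is not the actual d2go implementation: the `ExampleDistillationHelper` class, the fields on the illustrative `LayerLossMetadata` dataclass, the `distillation_losses` function, and the use of forward hooks in place of the dynamic mixin are all assumptions made for the example.

```python
# Hypothetical sketch of the helper interface described above, using plain
# PyTorch. The class and field names here (ExampleDistillationHelper, the
# LayerLossMetadata fields, distillation_losses) are illustrative and may not
# match the actual d2go API; forward hooks stand in for the dynamic mixin.
from dataclasses import dataclass
from typing import Callable, Dict, List

import torch
import torch.nn as nn


@dataclass
class LayerLossMetadata:
    # Illustrative shape only: names of the student/teacher layers whose
    # outputs are compared, the loss run on those outputs, and a weight.
    student_layer: str
    teacher_layer: str
    loss: nn.Module
    weight: float = 1.0


class ExampleDistillationHelper:
    """Sketch of the user-customizable distillation helper."""

    def get_teacher(self) -> nn.Module:
        # Return a frozen teacher that the KD algorithm can use directly.
        teacher = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        teacher.eval()
        for p in teacher.parameters():
            p.requires_grad = False
        return teacher

    def get_layer_losses(self) -> List[LayerLossMetadata]:
        # Pair student/teacher layers with the loss run on their outputs.
        return [LayerLossMetadata("0", "0", nn.MSELoss(), weight=0.5)]

    def get_preprocess_student_input(self) -> Callable:
        # Identity here; this is where a larger dataloader output could be
        # downsampled for the student, as described in the summary.
        return lambda x: x

    def get_preprocess_teacher_input(self) -> Callable:
        return lambda x: x

    def get_combine_losses(self) -> Callable:
        # Reweight distillation losses relative to the student losses.
        def combine(loss_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
            return {k: 0.5 * v if k.startswith("kd_") else v for k, v in loss_dict.items()}

        return combine


def distillation_losses(student: nn.Module, teacher: nn.Module,
                        helper: ExampleDistillationHelper, batch: torch.Tensor):
    """Rough stand-in for the training path: run both models, record the
    relevant layer outputs, and compute the per-layer distillation losses."""
    metas = helper.get_layer_losses()
    recorded = {"student": {}, "teacher": {}}
    hooks = []

    def record(store, name):
        def hook(_module, _inputs, output):
            store[name] = output
        return hook

    for m in metas:
        hooks.append(dict(student.named_modules())[m.student_layer]
                     .register_forward_hook(record(recorded["student"], m.student_layer)))
        hooks.append(dict(teacher.named_modules())[m.teacher_layer]
                     .register_forward_hook(record(recorded["teacher"], m.teacher_layer)))

    student(helper.get_preprocess_student_input()(batch))
    with torch.no_grad():
        teacher(helper.get_preprocess_teacher_input()(batch))

    loss_dict = {
        f"kd_{m.student_layer}": m.weight * m.loss(
            recorded["student"][m.student_layer], recorded["teacher"][m.teacher_layer]
        )
        for m in metas
    }
    for h in hooks:
        h.remove()
    return helper.get_combine_losses()(loss_dict)


if __name__ == "__main__":
    helper = ExampleDistillationHelper()
    student = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
    print(distillation_losses(student, helper.get_teacher(), helper, torch.randn(2, 3, 32, 32)))
```

In the actual algorithm the layer outputs are recorded via dynamic mixin and the student's own losses would also appear in the loss_dict; the sketch only shows how the values returned by the helper are consumed during a training step.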