Move EMA weights to current device before training
Summary: Currently we move the EMA weights to the expected device right after loading them from the checkpoint. However, by the time the on_load_checkpoint hook is called, the current GPU device has not yet been assigned. This can leave the EMA weights on cuda:0 while the model is on cuda:1. This diff moves the EMA weights to the device in `on_pretrain_routine_end` instead.

Reviewed By: zhanghang1989

Differential Revision: D28429843

fbshipit-source-id: d864fb3687eb6958872300c5ec0af7ce90591f83
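The ordering issue can be sketched as follows. This is a hypothetical, simplified model of the hook sequence, not the actual diff: devices are plain strings, `EMACallback` and `Trainer` are stand-in names, and the hook names merely follow PyTorch Lightning's callback API. A real implementation would call `tensor.to(device)` on the checkpoint tensors.

```python
# Hypothetical sketch: why moving EMA weights in on_load_checkpoint is too
# early. Devices are modeled as strings; weights map name -> device.

class EMACallback:
    """Simplified EMA callback; hook names mirror PyTorch Lightning's API."""

    def __init__(self):
        self.ema_weights = {}

    def on_load_checkpoint(self, trainer, checkpoint):
        # Too early: trainer.device may still be the default (cuda:0)
        # even though this process will later train on another GPU.
        self.ema_weights = {k: trainer.device for k in checkpoint["ema"]}

    def on_pretrain_routine_end(self, trainer):
        # Correct place: by now the per-process device has been assigned,
        # so the EMA weights end up on the same device as the model.
        self.ema_weights = {k: trainer.device for k in self.ema_weights}

class Trainer:
    """Stand-in trainer holding only the device attribute."""

    def __init__(self):
        self.device = "cuda:0"  # default before device assignment

cb = EMACallback()
trainer = Trainer()
cb.on_load_checkpoint(trainer, {"ema": {"w": None}})
early = dict(cb.ema_weights)   # weights pinned to cuda:0 -- possibly wrong GPU
trainer.device = "cuda:1"      # device assigned later by the strategy
cb.on_pretrain_routine_end(trainer)
late = dict(cb.ema_weights)    # weights now match the model's device
print(early, late)
```

The fix keeps `on_load_checkpoint` free of device logic and defers the move until the device is known, which is exactly the ordering guarantee `on_pretrain_routine_end` provides.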