Take a dummy train step under OOM to keep multiprocessing in sync
Summary: This is not a guaranteed fix, since processes can still get out of sync if the OOM happens after an all_gather/all_reduce has already run. It should still make multiprocessing training more robust in practice, since OOMs usually seem to occur early enough in the step.
Reviewed By: myleott
Differential Revision: D13086018
fbshipit-source-id: feb1b01c2eb8818797cfdabc0faac8056ba1b4ee
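
The idea in the summary can be illustrated with a minimal sketch. This is not the code from this commit: `train_step`, `forward_backward`, `all_reduce`, and `dummy_batch` are hypothetical names, and gradients are modeled as plain lists of floats. It assumes the OOM surfaces as a `RuntimeError` whose message contains "out of memory", and that it is raised before any collective call in the step.

```python
def train_step(forward_backward, all_reduce, batch, dummy_batch):
    """Run one training step; on OOM, substitute a dummy step.

    forward_backward(batch) -> list of float gradients (hypothetical callable).
    all_reduce(grads) -> reduced gradients (hypothetical stand-in for a collective).
    Returns (reduced_grads, oom_count).
    """
    ooms = 0
    try:
        grads = forward_backward(batch)
    except RuntimeError as e:
        # Only handle OOM; any other error is re-raised unchanged.
        if "out of memory" not in str(e):
            raise
        ooms = 1
        # Assumption: the OOM happened before any collective was issued.
        # Run a small dummy batch and zero its gradients so that this worker
        # contributes nothing but still takes part in the step.
        grads = [0.0 * g for g in forward_backward(dummy_batch)]
    # Every worker reaches the collective exactly once per step, which is
    # what keeps the processes in sync.
    return all_reduce(grads), ooms
```

If the OOM were raised after the collective had already started on other workers, this fallback would not help, which matches the caveat in the summary.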