Add reply files to d2go training processes
Summary:
This diff contains a minimal set of changes to support returning reply files to MAST. There are three parts:

1. A try..except in the main function catches all "catchable" Python exceptions. Exceptions from C++ code and segfaults are not handled here.
2. Each exception is then written to a per-process JSON reply file.
3. At the end, all per-process files are stat-ed and the earliest file is copied to a location specified by MAST.

# Limitations

1. This only works when local processes are launched using multiprocessing (which is the default).
2. If an error happens in C++ code, it will likely not be caught in Python, and the reply file might not have the correct logs.

Differential Revision: D43097683

fbshipit-source-id: 0eaf4f19f6199a9c77f2ce4c7d2bbc2a2078be99
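
For illustration only, the three parts described in the summary could look roughly like the sketch below. It is not the code from this diff: the helper names `run_with_reply_file` and `copy_earliest_reply`, the `reply_<rank>.json` file naming, and the JSON fields are all hypothetical, and it assumes "earliest" means the smallest modification time reported by `stat`.

```python
import json
import os
import shutil
import time
import traceback
from pathlib import Path


def run_with_reply_file(main_fn, reply_dir, rank):
    """Run main_fn; on a Python exception, write a per-process JSON file and re-raise.

    Hypothetical helper. Crashes in C++ code or segfaults never reach this handler.
    """
    try:
        return main_fn()
    except Exception as exc:
        reply_dir = Path(reply_dir)
        reply_dir.mkdir(parents=True, exist_ok=True)
        payload = {
            "rank": rank,
            "pid": os.getpid(),
            "timestamp": time.time(),
            "error_type": type(exc).__name__,
            "message": str(exc),
            "traceback": traceback.format_exc(),
        }
        # One file per process, so concurrent failures do not overwrite each other.
        (reply_dir / f"reply_{rank}.json").write_text(json.dumps(payload))
        raise


def copy_earliest_reply(reply_dir, destination):
    """Copy the reply file with the earliest mtime to destination.

    Returns the chosen source path, or None if no reply files exist.
    """
    files = list(Path(reply_dir).glob("reply_*.json"))
    if not files:
        return None
    # stat each file and keep the one written first (assumed to be the root cause).
    earliest = min(files, key=lambda p: p.stat().st_mtime)
    shutil.copyfile(earliest, destination)
    return earliest
```

In this sketch each worker would wrap its entry point in `run_with_reply_file`, and the launcher would call `copy_earliest_reply` once all workers have exited, passing the destination path provided by the scheduler.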