Let the distributed training launch script report an error when any trainer or kvserver fails. (#4437)
* Collect error reports
* update
* fix
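
The behavior described in the title can be illustrated with a minimal sketch. This is not the code from this commit: the helper names `run_and_collect` and `launch`, and the `(name, argv)` command format, are hypothetical and chosen only for illustration. It assumes each trainer or kvserver is started as a local subprocess and that a non-zero exit code means failure.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

# Hypothetical command format: (label, argv), e.g. ("trainer-0", ["python", "train.py"]).
Command = Tuple[str, List[str]]


def run_and_collect(commands: Sequence[Command]) -> List[Tuple[str, int]]:
    """Start every command, wait for all of them, and return the failures.

    Each failure is a (label, exit_code) pair with a non-zero exit code.
    """
    # Start all processes first so they run concurrently.
    procs = [(name, subprocess.Popen(argv)) for name, argv in commands]
    failures = []
    for name, proc in procs:
        code = proc.wait()
        if code != 0:
            failures.append((name, code))
    return failures


def launch(commands: Sequence[Command]) -> int:
    """Return 1 if any process failed, else 0, printing one line per failure."""
    failures = run_and_collect(commands)
    for name, code in failures:
        print(f"{name} exited with code {code}", file=sys.stderr)
    return 1 if failures else 0
```

A launcher built this way would typically end with `sys.exit(launch(commands))`, so that the failure of any single process surfaces as a non-zero exit status of the launch script itself.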
Co-authored-by: root <root@ip-10-0-80-128.ec2.internal>