Enhance timeout cleanup to avoid possible hanging (#405)
__Major Revisions__
* Skip postprocess (mainly torch.dist.barrier and destroy) when an exception occurs (e.g., timeout or GPU crash) to avoid hanging subprocesses.
* Add cleanup to kill sb exec processes when the Ansible run fails for a certain benchmark.

__Minor Revisions__
* Reduce the extra Ansible timeout from 300s to 60s.
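
The two major revisions can be illustrated with a minimal sketch. This is not the code from the commit: `run_benchmark` and `run_with_cleanup` are hypothetical helpers, and `postprocess` stands in for whatever collective teardown (such as a barrier followed by process-group destruction) the real benchmark performs. The sketch only shows the general pattern of skipping teardown after a failure and killing a child process that exceeds its timeout.

```python
import subprocess
import sys


def run_benchmark(body, postprocess):
    """Run body(); call postprocess() only if body() succeeded.

    Both callables are hypothetical stand-ins, not SuperBench APIs.
    """
    try:
        result = body()
    except Exception as exc:  # e.g. a timeout or a crashed GPU
        # Skip postprocess on failure: a collective call such as a
        # barrier could block forever if peer processes are gone.
        return {'ok': False, 'error': repr(exc)}
    postprocess()
    return {'ok': True, 'result': result}


def run_with_cleanup(cmd, timeout):
    """Run cmd and return its exit code, or None if it timed out.

    On timeout the child is killed and reaped so it cannot linger.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the killed child
        return None
```

Returning an error record instead of re-raising keeps the caller in control of reporting, while the teardown step never runs against a possibly broken process group.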