Yifan Xiong authored
Enhance timeout cleanup to avoid possible hanging.

__Major Revisions__
* Skip postprocess (mainly `torch.distributed.barrier` and destroy) when an exception occurs (e.g., a timeout or a GPU crash) so that subprocesses do not hang.
* Add cleanup to kill `sb exec` processes when the Ansible run fails for a given benchmark.

__Minor Revisions__
* Reduce the extra Ansible timeout from 300s to 60s.
8afaa376
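The idea behind the major revisions can be illustrated with a minimal sketch. It assumes a benchmark body that may raise (for example on a timeout or GPU crash), a collective `postprocess` that every rank must reach (such as a barrier followed by destroying the process group), and a local, non-collective `cleanup`. The function and parameter names are hypothetical and this is not the project's actual code.

```python
from typing import Callable


def run_with_guarded_teardown(
    run: Callable[[], None],
    postprocess: Callable[[], None],
    cleanup: Callable[[], None],
) -> bool:
    """Run one benchmark; do the collective teardown only if it finished cleanly.

    Hypothetical helper for illustration only.
    run         -- benchmark body; may raise on timeout or GPU crash
    postprocess -- collective teardown (e.g. barrier, then destroy the group);
                   it blocks unless every participant reaches it
    cleanup     -- local cleanup, e.g. killing leftover worker processes
    """
    try:
        run()
    except Exception:
        # A failed participant may never reach the barrier, so calling the
        # collective postprocess here could hang the survivors. Skip it and
        # only do local cleanup. (The error is swallowed in this sketch.)
        cleanup()
        return False
    postprocess()
    return True
```

On success the collective teardown runs as usual. On failure only the local cleanup runs, trading a tidy shutdown for the guarantee that nothing waits on a peer that will never arrive.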