- 23 May, 2023 1 commit
-
-
Yifan Xiong authored
Add signal handler in runner to gracefully exit when receiving SIGINT (<kbd>Ctrl</kbd>+<kbd>C</kbd>) or SIGTERM during benchmark execution.
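
A minimal sketch of this kind of handler, using Python's standard `signal` module. The `Runner` class, its `stop_requested` flag, and `install_handlers` are hypothetical names chosen for illustration, not the actual runner code.

```python
import signal


class Runner:
    """Hypothetical runner that stops cleanly on SIGINT or SIGTERM."""

    def __init__(self):
        self.stop_requested = False
        self.completed = []

    def install_handlers(self):
        # Register the same handler for Ctrl+C and for termination requests.
        # signal.signal() must be called from the main thread.
        signal.signal(signal.SIGINT, self._handle_signal)
        signal.signal(signal.SIGTERM, self._handle_signal)

    def _handle_signal(self, signum, frame):
        # Only record the request; the run loop decides when to stop.
        self.stop_requested = True

    def run(self, benchmarks):
        for name in benchmarks:
            if self.stop_requested:
                break  # skip the remaining benchmarks and return cleanly
            self.completed.append(name)  # placeholder for real benchmark work
        return self.completed
```

Setting a flag in the handler, rather than exiting inside it, lets the loop finish its current step and clean up before returning.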
-
- 01 Nov, 2022 1 commit
-
-
Yifan Xiong authored
Add a non-zero return code for the `sb deploy` and `sb run` commands when there are Ansible failures in the control plane. The return code is set to the number of failures. For failures caused by benchmarks, the return code is still set per benchmark in the results JSON file.
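
As a rough illustration of "return code equals failure count", assuming each control-plane step reports an integer status where 0 means success. The function name and the input shape are assumptions, not the project's actual interface.

```python
def control_plane_return_code(step_statuses):
    """Count failed control-plane steps (hypothetical helper).

    `step_statuses` is an iterable of integer statuses, 0 meaning success.
    The result is 0 when every step succeeded, otherwise the failure count.
    """
    return sum(1 for status in step_statuses if status != 0)
```

A caller could pass the result to `sys.exit()` so that scripts wrapping the command can detect control-plane failures.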
-
- 30 Dec, 2021 1 commit
-
-
Yifan Xiong authored
__Description__

Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__

* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
-
- 08 Dec, 2021 1 commit
-
-
Yifan Xiong authored
Fix issues for distributed runs:

* fix config for memory bandwidth benchmarks
* add throttling for high concurrency docker pull
* update rsync path and exclude directories
* handle exceptions when creating summary
* tune for logging
-
- 02 Sep, 2021 1 commit
-
-
Yifan Xiong authored
__Description__ Fix an inventory bug in ansible_runner that occurs when a host list with multiple hosts is provided. This should be handled by the ansible_runner library itself; as a workaround, pass the inventory through the `--inventory` command-line argument.
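
The workaround can be sketched as building the `--inventory` argument by hand from the host list. The helper name `inventory_cmdline` is hypothetical; the only Ansible behavior relied on is that an inventory value ending in a comma is read as an inline host list rather than a file path.

```python
def inventory_cmdline(host_list):
    """Build an `--inventory` command-line argument from host names.

    Hypothetical helper for illustration; blank entries are ignored.
    """
    hosts = [host.strip() for host in host_list if host.strip()]
    # The trailing comma makes Ansible treat the value as an inline
    # host list, which matters most when there is only one host.
    return "--inventory " + ",".join(hosts) + ","
```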
-
- 19 Aug, 2021 1 commit
-
-
Yifan Xiong authored
Support MPI mode in the runner:

* concatenate the mpirun command
* support MCA and env config
* prepare the hostfile and update the Ansible host pattern

Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
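
A minimal sketch of how such an mpirun command might be assembled from MCA and environment settings. The function name and parameters are assumptions made for illustration; the flags used (`-np`, `-hostfile`, `-mca`, `-x`) are standard Open MPI `mpirun` options, but the actual runner may build its command differently.

```python
import shlex


def build_mpirun_command(command, hostfile, proc_num, mca=None, env=None):
    """Assemble an mpirun command line (hypothetical helper).

    mca: dict of MCA parameters, emitted as `-mca key value`.
    env: dict of environment variables, exported with `-x KEY=VALUE`.
    """
    parts = ["mpirun", "-np", str(proc_num), "-hostfile", hostfile]
    for key, value in (mca or {}).items():
        parts += ["-mca", key, str(value)]
    for key, value in (env or {}).items():
        parts += ["-x", f"{key}={value}"]
    # Append the wrapped command, split into shell-style tokens.
    parts += shlex.split(command)
    # shlex.join quotes any token that needs it (Python 3.8+).
    return shlex.join(parts)
```

For example, `build_mpirun_command("sb exec", "hostfile", 2, mca={"pml": "ob1"})` yields a single string that can be handed to a shell.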
-
- 01 Jul, 2021 1 commit
-
-
Yifan Xiong authored
Support `--host-list` for the deploy and run commands. Before this change, an inventory file was needed to use `sb deploy/run`. Now, `--host-list localhost` or `-l localhost` is sufficient for a quick try.
-
- 23 May, 2021 1 commit
-
-
Yifan Xiong authored
Implement ansible client and runner:

* add ansible client
* add deploy and check_env playbooks
-