- 16 Dec, 2021 1 commit
Yifan Xiong authored
__Description__

Fix issues for Ansible and benchmarks:

* Clean up the Ansible runner private data directory to avoid running out of disk space when the number of nodes is large.
* Support both absolute and relative paths when fetching results.
* Use a deterministic image in the Ansible test to avoid image updates.
* Update the logging format.
* Delete torch models and inputs after export.
-
- 08 Dec, 2021 1 commit
Yifan Xiong authored
Fix issues for distributed runs:

* Fix config for memory bandwidth benchmarks.
* Add throttling for high-concurrency docker pull.
* Update rsync path and exclude directories.
* Handle exceptions when creating the summary.
* Tune logging.
-
- 02 Jul, 2021 1 commit
Yifan Xiong authored
Fetch benchmark results from all nodes; results are rsynced after each benchmark.

The results directory structure on the control node is as follows:

```
outputs/
└── datetime
    ├── nodes
    │   └── node-0
    │       ├── benchmarks
    │       │   ├── benchmark-0
    │       │   │   ├── rank-0
    │       │   │   │   └── results.json
    │       └── sb-exec.log
    ├── sb-run.log
    └── sb.config.yaml
```
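As an illustration of how this layout might be consumed on the control node, here is a minimal sketch that gathers every `results.json` under one run directory. The function name `collect_results` is hypothetical and not part of the project; the sketch also assumes that each `results.json` holds valid JSON and that the tree matches the structure shown above.

```python
import json
from pathlib import Path


def collect_results(run_dir):
    """Return {(node, benchmark, rank): parsed results.json} for one run directory.

    Hypothetical helper: it assumes the layout
    <run_dir>/nodes/<node>/benchmarks/<benchmark>/<rank>/results.json.
    """
    collected = {}
    pattern = "nodes/*/benchmarks/*/*/results.json"
    for path in sorted(Path(run_dir).glob(pattern)):
        rank = path.parent.name          # e.g. rank-0
        benchmark = path.parents[1].name  # e.g. benchmark-0
        node = path.parents[3].name       # e.g. node-0
        with path.open() as f:
            collected[(node, benchmark, rank)] = json.load(f)
    return collected
```

Calling `collect_results("outputs/<datetime>")` with the path of an actual run directory would return one entry per node, benchmark, and rank, which can then be aggregated as needed.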
-