1. 16 Dec, 2021 1 commit
    • Bug - Fix issues for Ansible and benchmarks (#267) · a15f773b
      Yifan Xiong authored
      __Description__
      
      Fix issues for Ansible and benchmarks:
      * Clean up the Ansible runner private data dir to avoid running out of disk space when the number of nodes is large.
      * Support both absolute and relative paths when fetching results.
      * Use a deterministic image in Ansible test to avoid image update.
      * Update logging format.
      * Delete torch models and inputs after export.
  2. 08 Dec, 2021 1 commit
    • Bug - Fix issues for distributed runs (#258) · 213ab14b
      Yifan Xiong authored
      Fix issues for distributed runs:
      * fix config for memory bandwidth benchmarks
      * add throttling for high concurrency docker pull
      * update rsync path and exclude directories
      * handle exceptions when creating summary
      * tune logging
  3. 02 Jul, 2021 1 commit
    • Runner - Fetch benchmarks results on all nodes (#116) · fb7d4a73
      Yifan Xiong authored
      Fetch benchmark results from all nodes; results are rsynced after each benchmark.
      The results directory structure on the control node is as follows:
      
      ```
      outputs/
      └── datetime
          ├── nodes
          │   └── node-0
          │       ├── benchmarks
          │       │   └── benchmark-0
          │       │       └── rank-0
          │       │           └── results.json
          │       └── sb-exec.log
          ├── sb-run.log
          └── sb.config.yaml
      ```
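As a rough illustration of how the layout above could be consumed, the sketch below walks one run directory (the `datetime` level) and indexes every `results.json` by node, benchmark, and rank. The function name `collect_results` is hypothetical, and the sketch assumes each `results.json` contains valid JSON; neither detail comes from the commit itself.

```python
import json
from pathlib import Path


def collect_results(run_dir):
    """Index results.json files under run_dir by (node, benchmark, rank).

    Hypothetical helper: assumes the layout
    nodes/<node>/benchmarks/<benchmark>/<rank>/results.json
    and that every results.json holds valid JSON.
    """
    results = {}
    pattern = "nodes/*/benchmarks/*/*/results.json"
    for path in sorted(Path(run_dir).glob(pattern)):
        # Relative parts: nodes, <node>, benchmarks, <benchmark>, <rank>, results.json
        _, node, _, benchmark, rank, _ = path.relative_to(run_dir).parts
        with path.open() as f:
            results[(node, benchmark, rank)] = json.load(f)
    return results
```

Calling `collect_results("outputs/<datetime>")` on the control node would return a dictionary keyed by tuples such as `("node-0", "benchmark-0", "rank-0")`; nodes or ranks that produced no `results.json` are simply absent from it.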