1. 13 Aug, 2022 1 commit
  2. 26 Jul, 2022 1 commit
    • Support topo-aware IB performance validation (#373) · ef4d6574
      Jie Zhang authored
      
      
      * Support topo-aware IB performance validation
      
      Add a new pattern, `topo-aware`, so the user can run IB performance
      tests based on the VMs' topology information. The user can then
      validate IB performance across VM pairs at different distances
      as a quick test instead of running a full pair-wise test.
      
      To run with the topo-aware pattern, the user must specify three
      required and two optional parameters in the YAML config file:
      --pattern	topo-aware
      --ibstat	path to ibstat output
      --ibnetdiscover	path to ibnetdiscover output
      --min_dist	minimum distance of VM pairs (optional, default 2)
      --max_dist	maximum distance of VM pairs (optional, default 6)
      
      The newly added topo_aware module then parses the topology
      information, builds a graph, and generates the VM pairs at
      the specified distance (number of hops).

      The specified IB test then runs across the generated VM pairs.
      Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
      
      * Add description about topology aware ib traffic tests
      Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
      
      * Add unit test to verify generated topology aware config file
      
      This commit adds a unit test to verify that the generated
      topology-aware config file is correct. Four new data files are added
      so that the test can invoke the gen_topo_aware_config function to
      generate a config file and compare it with the expected config file.
      Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
      
      * Fix lint issue on Azure pipeline
      Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
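The pair generation described in the topo-aware commit message can be pictured as a breadth-first search over the topology graph. The following is only a minimal sketch, not the topo_aware module itself: the adjacency dictionary, the node names, and the helpers `hop_distances` and `pairs_at_distance` are hypothetical, whereas the real module builds its graph from ibstat and ibnetdiscover output. The default bounds mirror the `min_dist` and `max_dist` defaults listed in the commit message.

```python
from collections import deque


def hop_distances(graph, source):
    """Return hop counts from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def pairs_at_distance(graph, vms, min_dist=2, max_dist=6):
    """Group VM pairs by hop distance, keeping min_dist <= distance <= max_dist."""
    pairs = {}
    vms = sorted(vms)
    for i, src in enumerate(vms):
        dist = hop_distances(graph, src)
        for dst in vms[i + 1:]:
            d = dist.get(dst)
            if d is not None and min_dist <= d <= max_dist:
                pairs.setdefault(d, []).append((src, dst))
    return pairs


# Hypothetical two-level topology: VMs attach to leaf switches,
# and both leaf switches attach to one spine switch.
graph = {
    'vm0': ['leaf0'], 'vm1': ['leaf0'],
    'vm2': ['leaf1'], 'vm3': ['leaf1'],
    'leaf0': ['vm0', 'vm1', 'spine0'],
    'leaf1': ['vm2', 'vm3', 'spine0'],
    'spine0': ['leaf0', 'leaf1'],
}
vms = ['vm0', 'vm1', 'vm2', 'vm3']
result = pairs_at_distance(graph, vms)
```

In this toy graph, VMs under the same leaf switch are 2 hops apart and VMs under different leaf switches are 4 hops apart, so `result` has one group of pairs per distance.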
  3. 05 Jul, 2022 1 commit
  4. 24 Jan, 2022 1 commit
  5. 15 Nov, 2021 1 commit
    • Benchmarks: Add Feature - Extend the device manager utility to support more functions. (#239) · cc70f9c1
      guoshzhao authored
      **Description**
      Rename the `nvidia_helper` utility to the `device_manager` module and add support for more functions:
      ```
      device_manager.get_device_count()
      device_manager.get_device_utilization(idx)
      device_manager.get_device_temperature(idx)
      device_manager.get_device_power_limit(idx)
      device_manager.get_device_memory(idx)
      device_manager.get_device_row_remapped_info(idx)
      device_manager.get_device_ecc_error(idx)
      ```
  6. 31 Aug, 2021 1 commit
  7. 13 Jul, 2021 1 commit
  8. 09 Jul, 2021 1 commit
  9. 02 Jul, 2021 1 commit
    • Runner - Fetch benchmarks results on all nodes (#116) · fb7d4a73
      Yifan Xiong authored
      Fetch benchmark results from all nodes; the results are rsynced after each benchmark.
      The results directory structure on the control node is as follows:
      
      ```
      outputs/
      └── datetime
          ├── nodes
          │   └── node-0
          │       ├── benchmarks
          │       │   ├── benchmark-0
          │       │   │   ├── rank-0
          │       │   │   │   └── results.json
          │       └── sb-exec.log
          ├── sb-run.log
          └── sb.config.yaml
      ```
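Given the directory layout shown in this commit message, the per-rank result files can be gathered on the control node with a glob over the run directory. This is a rough sketch under assumptions: the helper `collect_results` is hypothetical, and the JSON payload written in the demo is made up, since the commit message does not describe the contents of `results.json`.

```python
import json
import tempfile
from pathlib import Path


def collect_results(run_dir):
    """Map (node, benchmark, rank) to the parsed contents of each results.json."""
    results = {}
    pattern = 'nodes/*/benchmarks/*/rank-*/results.json'
    for path in sorted(Path(run_dir).glob(pattern)):
        # path ends with: nodes/<node>/benchmarks/<benchmark>/<rank>/results.json
        rank, benchmark, node = path.parts[-2], path.parts[-3], path.parts[-5]
        with path.open() as f:
            results[(node, benchmark, rank)] = json.load(f)
    return results


# Demo: recreate a one-node, one-benchmark, one-rank layout in a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    rank_dir = Path(tmp) / 'nodes' / 'node-0' / 'benchmarks' / 'benchmark-0' / 'rank-0'
    rank_dir.mkdir(parents=True)
    # Placeholder payload, not the real results schema.
    (rank_dir / 'results.json').write_text(json.dumps({'example_metric': 1.0}))
    collected = collect_results(tmp)
```

Keying the dictionary by node, benchmark, and rank keeps the result identifiable after it is lifted out of the directory tree.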
  10. 01 Jul, 2021 1 commit
  11. 23 Jun, 2021 1 commit
    • Bug bash - Fix bugs in multi GPU benchmarks (#98) · c0c43b8f
      Yifan Xiong authored
      * Add `sb deploy` command content.
      * Fix inline if-expression syntax in playbook.
      * Fix quote escape issue in bash command.
      * Add custom env in config.
      * Update default config for multi GPU benchmarks.
      * Update MANIFEST.in to include jinja2 template.
      * Require jinja2 minimum version.
      * Fix occasional duplicate output in Ansible runner.
      * Fix mixed color from Ansible and Python colorlog.
      * Update according to comments.
      * Change superbench.env from list to dict in config file.
  12. 16 Jun, 2021 1 commit
    • Bug bash - Fix bugs and refine log in single GPU benchmarks (#97) · ddbc51a1
      Yifan Xiong authored
      Fix bugs and refine log in single GPU benchmarks:
      
      * Fix none framework issue
      * Fix empty parameter bug
      * Remove missed mobilenet_v3 models
      * Change benchmark registration log to debug level
      * Add pid in logging
      * Add missing benchmarks in default config
      * Fix deprecated logging warn
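Two of the logging items in this commit, adding the pid and fixing the deprecated warn call, can be illustrated with the standard `logging` module. This is a sketch only: the format string, the logger name, and the messages are assumptions, not the project's actual configuration.

```python
import io
import logging
import os

# Capture output in memory so the example has no side effects.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
# %(process)d inserts the id of the process that emitted the record.
handler.setFormatter(logging.Formatter('[%(process)d][%(levelname)s] %(message)s'))

logger = logging.getLogger('sb_pid_example')  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(handler)

logger.debug('benchmark registered')
# Logger.warning replaces the deprecated Logger.warn alias.
logger.warning('framework not set')

output = stream.getvalue()
```

Including the pid helps tell log lines apart when several benchmark processes write to the same stream.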
  13. 01 Jun, 2021 1 commit
  14. 18 May, 2021 1 commit
  15. 11 May, 2021 1 commit
  16. 29 Mar, 2021 1 commit
    • Update logger (#28) · 0e2b2b08
      Yifan Xiong authored
      Update logger class.
      * add file handler along with stream handler
      * add colored formatter
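The combination of a file handler, a stream handler, and a colored formatter might look roughly like the sketch below. It uses only the standard library and plain ANSI escape codes. The names `ColoredFormatter` and `build_logger`, the color choices, and the format string are assumptions for illustration and are not taken from the project's logger class.

```python
import logging
import os
import sys
import tempfile

# ANSI color codes per level (an arbitrary choice for this sketch).
COLORS = {
    'DEBUG': '\033[36m',
    'INFO': '\033[32m',
    'WARNING': '\033[33m',
    'ERROR': '\033[31m',
    'CRITICAL': '\033[35m',
}
RESET = '\033[0m'


class ColoredFormatter(logging.Formatter):
    """Wrap the formatted message in the color assigned to its level."""

    def format(self, record):
        message = super().format(record)
        color = COLORS.get(record.levelname, '')
        return f'{color}{message}{RESET}' if color else message


def build_logger(name, log_file):
    """Create a logger that writes colored output to stderr and plain text to a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    fmt = '[%(asctime)s][%(levelname)s] %(message)s'

    stream_handler = logging.StreamHandler(sys.stderr)
    stream_handler.setFormatter(ColoredFormatter(fmt))
    logger.addHandler(stream_handler)

    file_handler = logging.FileHandler(log_file)
    file_handler.setFormatter(logging.Formatter(fmt))  # no colors in the file
    logger.addHandler(file_handler)
    return logger


# Demo: log one message, then read it back from the log file.
log_path = os.path.join(tempfile.mkdtemp(), 'example.log')
logger = build_logger('sb_logger_example', log_path)
logger.info('logger ready')
with open(log_path) as f:
    file_text = f.read()
```

Giving the file handler a plain formatter keeps escape sequences out of the log file, while the terminal still gets colored output.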
  17. 26 Mar, 2021 1 commit
  18. 12 Mar, 2021 1 commit
  19. 04 Mar, 2021 1 commit
  20. 24 Feb, 2021 1 commit