1. 29 Nov, 2022 1 commit
    • Yang Wang's avatar
      Runner - support 'pattern' in 'mpi' mode to run tasks in parallel (#430) · e4eeda0a
      Yang Wang authored
      * add mpi-parallels mode
      
      * update according to comments
      
      * fix and update doc
      
      * update
      
      * merge into 'mpi' mode
      
      * udpate according to comments
      
      * fix testcases
      
      * fix ansible
      
      * regard pattern as field
      
      * udpate
      
      * fix flake8 version
      
      * add flake8 range
      
      * remove map-by from host config
      
      * udpate comments
      e4eeda0a
  2. 26 Jul, 2022 1 commit
    • Jie Zhang's avatar
      Support topo-aware IB performance validation (#373) · ef4d6574
      Jie Zhang authored
      
      
      * Support topo-aware IB performance validation
      
      Add a new pattern `topo-aware`, so the user can run IB performance
      test based on VM's topology information. This way, the user can
      validate the IB performance across VM pairs with different distance
      as a quick test instead of pair-wise test.
      
      To run with topo-aware pattern, user needs to specify three required
      (and two optional) parameters in YAML config file:
      --pattern	topo-aware
      --ibstat	path to ibstat output
      --ibnetdiscover	path to ibnetdiscover output
      --min_dist	minimum distance of VM pairs (optional, default 2)
      --max_dist	maximum distance of VM pairs (optional, default 6)
      
      The newly added topo_aware module then parses the topology
      information, builds a graph, and generates the VM pairs with
      the specified distance (# hops).
      
      The specified IB test will then be running across these
      generated VM pairs.
      Signed-off-by: default avatarJie Zhang <jessezhang1010@gmail.com>
      
      * Add description about topology aware ib traffic tests
      Signed-off-by: default avatarJie Zhang <jessezhang1010@gmail.com>
      
      * Add unit test to verify generated topology aware config file
      
      This commit adds unit test to verify the generated topology aware
      config file is correct. To do so, four new data files are added in
      order to invoke gen_topo_aware_config function to generate topology
      aware config file, then compares it with the expected config file.
      Signed-off-by: default avatarJie Zhang <jessezhang1010@gmail.com>
      
      * Fix lint issue on Azure pipeline
      Signed-off-by: default avatarJie Zhang <jessezhang1010@gmail.com>
      ef4d6574
  3. 05 Jul, 2022 1 commit
  4. 15 Nov, 2021 1 commit
    • guoshzhao's avatar
      Benchmarks: Add Feature - Extend the device manager utility to support more functions. (#239) · cc70f9c1
      guoshzhao authored
      **Description**
      Rename `nvidia_helper` utility as `device_manager` module and support more functions:
      ```
      device_manager.get_device_count()
      device_manager.get_device_utilization(idx)
      device_manager.get_device_temperature(idx)
      device_manager.get_device_power_limit(idx)
      device_manager.get_device_memory(idx)
      device_manager.get_device_row_remapped_info(idx)
      device_manager.get_device_ecc_error(idx)
      ```
      cc70f9c1
  5. 31 Aug, 2021 1 commit
  6. 13 Jul, 2021 1 commit
  7. 02 Jul, 2021 1 commit
    • Yifan Xiong's avatar
      Runner - Fetch benchmarks results on all nodes (#116) · fb7d4a73
      Yifan Xiong authored
      Fetch benchmarks results on all nodes, will rsync after each benchmark.
      The results directory structure on control node is as follows:
      
      ```
      outputs/
      └── datetime
          ├── nodes
          │   └── node-0
          │       ├── benchmarks
          │       │   ├── benchmark-0
          │       │   │   ├── rank-0
          │       │   │   │   └── results.json
          │       └── sb-exec.log
          ├── sb-run.log
          └── sb.config.yaml
      ```
      fb7d4a73
  8. 01 Jul, 2021 1 commit
  9. 01 Jun, 2021 1 commit
  10. 11 May, 2021 1 commit
  11. 29 Mar, 2021 1 commit
    • Yifan Xiong's avatar
      Update logger (#28) · 0e2b2b08
      Yifan Xiong authored
      Update logger class.
      * add file handler along with stream handler
      * add colored formatter
      0e2b2b08
  12. 26 Mar, 2021 1 commit
  13. 12 Mar, 2021 1 commit
  14. 24 Feb, 2021 1 commit