1. 22 Aug, 2022 1 commit
  2. 17 Aug, 2022 1 commit
    • Yifan Xiong's avatar
      Update Python setup for require packages (#387) · 626ac0a4
      Yifan Xiong authored
      __Description__
      
      Update Python setup for require packages.
      
      __Major Revisions__
      * downgrade requests version to be compatible with python 3.6, add corresponding pipeline for 3.6
      * add extra entry in extras_require for nested packages
      * update `pip install` contents accordingly
      626ac0a4
  3. 26 Jul, 2022 1 commit
    • Jie Zhang's avatar
      Support topo-aware IB performance validation (#373) · ef4d6574
      Jie Zhang authored
      
      
      * Support topo-aware IB performance validation
      
      Add a new pattern `topo-aware`, so the user can run IB performance
      test based on VM's topology information. This way, the user can
      validate the IB performance across VM pairs with different distance
      as a quick test instead of pair-wise test.
      
      To run with topo-aware pattern, user needs to specify three required
      (and two optional) parameters in YAML config file:
      --pattern	topo-aware
      --ibstat	path to ibstat output
      --ibnetdiscover	path to ibnetdiscover output
      --min_dist	minimum distance of VM pairs (optional, default 2)
      --max_dist	maximum distance of VM pairs (optional, default 6)
      
      The newly added topo_aware module then parses the topology
      information, builds a graph, and generates the VM pairs with
      the specified distance (# hops).
      
      The specified IB test will then be running across these
      generated VM pairs.
      Signed-off-by: default avatarJie Zhang <jessezhang1010@gmail.com>
      
      * Add description about topology aware ib traffic tests
      Signed-off-by: default avatarJie Zhang <jessezhang1010@gmail.com>
      
      * Add unit test to verify generated topology aware config file
      
      This commit adds unit test to verify the generated topology aware
      config file is correct. To do so, four new data files are added in
      order to invoke gen_topo_aware_config function to generate topology
      aware config file, then compares it with the expected config file.
      Signed-off-by: default avatarJie Zhang <jessezhang1010@gmail.com>
      
      * Fix lint issue on Azure pipeline
      Signed-off-by: default avatarJie Zhang <jessezhang1010@gmail.com>
      ef4d6574
  4. 08 Jul, 2022 1 commit
    • Yifan Xiong's avatar
      Support node_num=1 in mpi mode (#372) · e00a8180
      Yifan Xiong authored
      Support `node_num: 1` in mpi mode, so that we can run mpi benchmarks in
      both 1 node and all nodes in one config by changing `node_num`.
      Update docs and add test case accordingly.
      e00a8180
  5. 29 Jun, 2022 1 commit
    • Yifan Xiong's avatar
      Fix issues in ib loopback benchmark (#369) · 620192a2
      Yifan Xiong authored
      Fix several issues in ib loopback benchmark:
      * use `--report_gbits` and divide by 8 to get GB/s, previous results are
        MiB/s / 1000
      * use the ib_write_bw binary built in third_party instead of system path
      * update the metrics name so that different hca indices have same metric
      620192a2
  6. 24 Jun, 2022 1 commit
    • Yifan Xiong's avatar
      Support multiple IB/GPU in ib validation (#363) · bfaa1c83
      Yifan Xiong authored
      **Description**
      
      Support multiple IB/GPU devices run simultaneously in ib validation benchmark.
      
      **Major Revisions**
      - Revise ib_validation_performance.cc so that multiple processes per node could be used to launch multiple perftest commands simultaneously. For each node pair in the config, number of processes per node will run in parallel.
      - Revise ib_validation_performance.py to correct file paths and adjust parameters to specify different NICs/GPUs/NUMA nodes.
      - Fix env issues in Dockerfile for end-to-end test.
      - Update ib-traffic configuration examples in config files.
      - Update unit tests and docs accordingly.
      
      Closes #326.
      bfaa1c83
  7. 14 Jun, 2022 1 commit
    • Yifan Xiong's avatar
      Support `sb run` on host directly without Docker (#358) · a4937e95
      Yifan Xiong authored
      **Description**
      
      Support `sb run` on host directly without Docker
      
      **Major Revisions**
      - Add `--no-docker` argument for `sb run`.
      - Run on host directly if `--no-docker` if specified.
      - Update docs and tests correspondingly.
      a4937e95
  8. 29 Apr, 2022 1 commit
    • Yifan Xiong's avatar
      Release - SuperBench v0.5.0 (#350) · 6681c720
      Yifan Xiong authored
      
      
      **Description**
      
      Cherry-pick  bug fixes from v0.5.0 to main.
      
      **Major Revisions**
      
      * Bug - Force to fix ort version as '1.10.0' (#343)
      * Bug - Support no matching rules and unify the output name in result_summary (#345)
      * Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344)
      * Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342)
      * Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347)
      * Docs - Upgrade version and release note (#348)
      Co-authored-by: default avatarYuting Jiang <v-yutjiang@microsoft.com>
      6681c720
  9. 20 Apr, 2022 1 commit
  10. 15 Apr, 2022 1 commit
  11. 11 Apr, 2022 1 commit
  12. 08 Apr, 2022 2 commits
  13. 01 Apr, 2022 1 commit
  14. 16 Mar, 2022 1 commit
    • rafsalas19's avatar
      Benchmarks: Add Feature - Add GPU-Burn as microbenchmark (#324) · ff51a3ce
      rafsalas19 authored
      **Description**
      Modifications adding GPU-Burn to SuperBench.
      - added third party submodule
      - modified Makefile to make gpu-burn binary
      - added/modified microbenchmarks to add gpu-burn python scripts
      - modified default and azure_ndv4 configs to add gpu-burn
      ff51a3ce
  15. 20 Feb, 2022 1 commit
  16. 09 Feb, 2022 1 commit
  17. 30 Jan, 2022 1 commit
  18. 29 Jan, 2022 1 commit
  19. 28 Jan, 2022 1 commit
    • guoshzhao's avatar
      Benchmarks: Add Feature - Add timeout feature for each benchmark. (#288) · d877ca23
      guoshzhao authored
      **Description**
      Add timeout feature for each benchmark.
      
      **Major Revision**
      - Add `timeout` config for each benchmark. In current config files, only set the timeout for kernel-launch as example. Other benchmarks can be set in the future.
      - Set the timeout config for `ansible_runner.run()`. Runner will get the return code 254:
         [ansible.py:80][WARNING] Run failed, return code 254.
      - Using `timeout` command to terminate the client process.
      d877ca23
  20. 25 Jan, 2022 1 commit
  21. 21 Jan, 2022 1 commit
  22. 19 Jan, 2022 1 commit
  23. 18 Jan, 2022 1 commit
    • Yifan Xiong's avatar
      CLI - Add command sb benchmark [list,list-parameters] (#279) · f7ffc545
      Yifan Xiong authored
      __Description__
      
      Add command `sb benchmark list` and `sb benchmark list-parameters` to support listing all optional parameters for benchmarks.
      
      <details>
      <summary>Examples</summary>
      <pre>
      $ sb benchmark list -n [a-z]+-bw -o table
      Result
      --------
      mem-bw
      nccl-bw
      rccl-bw
      </pre>
      <pre>
      $ sb benchmark list-parameters -n mem-bw
      === mem-bw ===
      optional arguments:
        --bin_dir str         Specify the directory of the benchmark binary.
        --duration int        The elapsed time of benchmark in seconds.
        --mem_type str [str ...]
                              Memory types to benchmark. E.g. htod dtoh dtod.
        --memory str          Memory argument for bandwidthtest. E.g. pinned unpinned.
        --run_count int       The run count of benchmark.
        --shmoo_mode          Enable shmoo mode for bandwidthtest.
      default values:
      {'bin_dir': None,
       'duration': 0,
       'mem_type': ['htod', 'dtoh'],
       'memory': 'pinned',
       'run_count': 1}
      </pre>
      </details>
      
      __Major Revisions__
      * Add `sb benchmark list` to list benchmarks matching given name.
      * Add `sb benchmark list-parameters` to list parameters for benchmarks which match given name.
      
      __Minor Revisions__
      * Sort format help text for argparse.
      f7ffc545
  24. 30 Dec, 2021 1 commit
    • Yifan Xiong's avatar
      Release - SuperBench v0.4.0 (#278) · ff563b66
      Yifan Xiong authored
      
      
      __Description__
      
      Cherry-pick  bug fixes from v0.4.0 to main.
      
      __Major Revisions__
      
      * Bug - Fix issues for Ansible and benchmarks (#267)
      * Tests - Refine test cases for microbenchmark (#268)
      * Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
      * Benchmarks: Fix Bug - Fix fio build issue (#272)
      * Docs - Unify metric and add doc for cublas and cudnn functions (#271)
      * Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
      * Bug - Fix bug of detecting if gpu_index is none (#275)
      * Bug - Fix bugs in data diagnosis (#273)
      * Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
      * Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
      * Docs - Upgrade version and release note (#277)
      Co-authored-by: default avatarYuting Jiang <v-yutjiang@microsoft.com>
      ff563b66
  25. 14 Dec, 2021 1 commit
  26. 13 Dec, 2021 4 commits
  27. 10 Dec, 2021 2 commits
  28. 09 Dec, 2021 1 commit
  29. 06 Dec, 2021 1 commit
  30. 30 Nov, 2021 1 commit
  31. 26 Nov, 2021 1 commit
  32. 12 Nov, 2021 1 commit
  33. 10 Nov, 2021 1 commit
  34. 09 Nov, 2021 1 commit
  35. 27 Oct, 2021 1 commit