1. 14 Jun, 2022 1 commit
    • Support `sb run` on host directly without Docker (#358) · a4937e95
      Yifan Xiong authored
      **Description**
      
      Support `sb run` on host directly without Docker
      
      **Major Revisions**
      - Add `--no-docker` argument for `sb run`.
      - Run on host directly if `--no-docker` is specified.
      - Update docs and tests correspondingly.
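A flag like this is commonly wired up as a boolean switch. Below is a minimal, hypothetical sketch using argparse; SuperBench's actual CLI is built with its own command framework, so the names and wiring here are illustrative only.

```python
import argparse

# Hypothetical sketch of a `--no-docker` style switch; the real `sb run`
# command is implemented with SuperBench's own CLI framework.
parser = argparse.ArgumentParser(prog='sb run')
parser.add_argument(
    '--no-docker', action='store_true', default=False,
    help='Run on the host directly instead of inside a Docker container.'
)

# argparse maps `--no-docker` to the attribute `no_docker`.
args = parser.parse_args(['--no-docker'])
print(args.no_docker)  # → True
```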
  2. 01 Jun, 2022 1 commit
    • Analyzer - Fix bugs in data diagnosis (#355) · 54da021b
      user4543 authored
      **Description**
      Fix bugs in data diagnosis.
      
      **Major Revision**
      - add support to get the baseline of a metric that uses custom benchmark naming with ':', like 'nccl-bw:default/allreduce_8_bw:0'
      - save raw data of all metrics, rather than only the metrics defined in diagnosis_rules.yaml, when output_all is True
      - fix a bug of using the wrong column index when applying formatting (red color and percentile) in the Excel output
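The baseline-lookup fix hinges on parsing metric names correctly when the benchmark part itself contains ':'. A small sketch of that parsing, assuming the '/' separates benchmark from metric and a trailing ':<rank>' marks the rank (helper names are hypothetical; the analyzer's real logic may differ):

```python
import re

def benchmark_name(metric):
    """Extract the benchmark part of a metric name for baseline lookup.

    Splitting on the first '/' keeps a ':' annotation inside the benchmark
    name intact, e.g. 'nccl-bw:default/allreduce_8_bw:0' -> 'nccl-bw:default'.
    """
    return metric.split('/')[0]

def strip_rank(metric):
    """Drop a trailing ':<rank>' suffix from the metric name, if present."""
    return re.sub(r':\d+$', '', metric)

print(benchmark_name('nccl-bw:default/allreduce_8_bw:0'))  # nccl-bw:default
print(strip_rank('nccl-bw:default/allreduce_8_bw:0'))      # nccl-bw:default/allreduce_8_bw
```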
  3. 29 Apr, 2022 1 commit
    • Release - SuperBench v0.5.0 (#350) · 6681c720
      Yifan Xiong authored
      
      
      **Description**
      
      Cherry-pick bug fixes from v0.5.0 to main.
      
      **Major Revisions**
      
      * Bug - Force to fix ort version as '1.10.0' (#343)
      * Bug - Support no matching rules and unify the output name in result_summary (#345)
      * Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344)
      * Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342)
      * Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347)
      * Docs - Upgrade version and release note (#348)
      Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
  4. 11 Apr, 2022 2 commits
  5. 10 Apr, 2022 1 commit
  6. 08 Apr, 2022 1 commit
  7. 01 Apr, 2022 1 commit
  8. 24 Mar, 2022 1 commit
  9. 22 Mar, 2022 1 commit
  10. 21 Mar, 2022 1 commit
  11. 17 Mar, 2022 1 commit
  12. 16 Mar, 2022 1 commit
    • Benchmarks: Add Feature - Add GPU-Burn as microbenchmark (#324) · ff51a3ce
      rafsalas19 authored
      **Description**
      Modifications to add GPU-Burn to SuperBench:
      - added the third-party submodule
      - modified the Makefile to build the gpu-burn binary
      - added/modified microbenchmarks to add the gpu-burn Python scripts
      - modified the default and azure_ndv4 configs to add gpu-burn
  13. 15 Mar, 2022 2 commits
  14. 09 Mar, 2022 1 commit
  15. 07 Mar, 2022 1 commit
  16. 06 Mar, 2022 1 commit
  17. 24 Feb, 2022 1 commit
  18. 22 Feb, 2022 1 commit
  19. 20 Feb, 2022 2 commits
  20. 15 Feb, 2022 1 commit
  21. 10 Feb, 2022 1 commit
  22. 09 Feb, 2022 1 commit
  23. 08 Feb, 2022 2 commits
  24. 07 Feb, 2022 1 commit
    • Benchmarks: Revise Code - Reduce result variance in gpu_copy benchmark (#298) · 85389055
      Ziyue Yang authored
      **Description**
      This commit does the following to reduce result variance in the gpu_copy benchmark:
      1) Add a warmup phase to avoid timing instability caused by first-time CUDA kernel-launch overhead;
      2) Use CUDA events for timing instead of CPU timestamps;
      3) Make data checking an option, since it is not recommended in performance tests;
      4) Enlarge the message size in the performance benchmark.
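The warmup-then-measure pattern behind points 1 and 2 can be sketched in a few lines. This is a CPU-side analogue using `time.perf_counter` for illustration only; the actual benchmark is CUDA C++ and times on-GPU with CUDA events.

```python
import time

def benchmark(op, warmup=20, iters=100):
    """Time `op`, excluding one-time startup cost via a warmup phase.

    The untimed warmup loop plays the role of absorbing first-call
    overhead (analogous to first-time CUDA kernel-launch cost);
    only the steady-state iterations are averaged.
    """
    for _ in range(warmup):          # warmup: run but do not time
        op()
    start = time.perf_counter()
    for _ in range(iters):           # steady state: timed
        op()
    return (time.perf_counter() - start) / iters
```

Larger per-iteration work (point 4) further shrinks the relative impact of any residual timing noise.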
  25. 30 Jan, 2022 1 commit
  26. 29 Jan, 2022 3 commits
  27. 28 Jan, 2022 2 commits
    • Benchmarks: Add Feature - Sync the E2E training results among all workers for each step. (#287) · d03d110f
      guoshzhao authored
      **Description**
      Sync the E2E training results among all workers for each step.
      
      **Major Revision**
      - Sync the E2E training results among all workers (allreduce with max op).
      - Avoid appending ':0' to the metric name when only one rank produces output.
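The two revisions reduce to small pieces of logic: an elementwise max across per-rank results, and a naming rule that only adds a rank suffix when more than one rank reports. A hypothetical sketch (the real code runs the reduction via the distributed backend's allreduce):

```python
def sync_results(per_rank):
    """Allreduce-max style sync: each metric takes the max across ranks.

    `per_rank` maps rank -> {metric: value}; illustrative stand-in for
    the collective allreduce the real benchmark performs.
    """
    merged = {}
    for results in per_rank.values():
        for metric, value in results.items():
            merged[metric] = max(merged.get(metric, value), value)
    return merged

def metric_name(base, rank, num_ranks_with_output):
    """Append ':<rank>' only when more than one rank produced output,
    so a single-rank run reports plain `base` with no spurious ':0'."""
    if num_ranks_with_output <= 1:
        return base
    return f'{base}:{rank}'
```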
    • Benchmarks: Add Feature - Add timeout feature for each benchmark. (#288) · d877ca23
      guoshzhao authored
      **Description**
      Add timeout feature for each benchmark.
      
      **Major Revision**
      - Add a `timeout` config for each benchmark. In the current config files, only kernel-launch sets a timeout, as an example; other benchmarks can be configured in the future.
      - Set the timeout config for `ansible_runner.run()`. The runner will get return code 254:
         [ansible.py:80][WARNING] Run failed, return code 254.
      - Use the `timeout` command to terminate the client process.
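Wrapping a client process with coreutils `timeout` amounts to prefixing its command line. A hypothetical sketch of that wrapping (the runner's actual command construction may differ):

```python
def wrap_with_timeout(command, timeout_seconds):
    """Prefix a benchmark command with coreutils `timeout` so the
    client process is killed after the configured limit.

    `command` is the argv list of the client process; the returned
    list can be passed to e.g. subprocess.run().
    """
    return ['timeout', str(timeout_seconds)] + list(command)

print(wrap_with_timeout(['sb', 'exec'], 120))
# ['timeout', '120', 'sb', 'exec']
```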
  28. 27 Jan, 2022 1 commit
  29. 25 Jan, 2022 1 commit
  30. 24 Jan, 2022 2 commits
  31. 21 Jan, 2022 1 commit
  32. 19 Jan, 2022 1 commit