1. 20 Jan, 2023 2 commits
  2. 19 Jan, 2023 1 commit
  3. 18 Jan, 2023 1 commit
  4. 15 Jan, 2023 1 commit
  5. 13 Jan, 2023 1 commit
    • Benchmarks - Fix missing include in FP8 benchmark (#460) · ca0dccac
      Russell J. Hewett authored
      **Description**
      
      I couldn't build the fp8 benchmark.
      
      ```
      [ 25%] Building CXX object CMakeFiles/cublaslt_utils.dir/cublaslt_utils.cc.o
      In file included from <scrubbed>/superbench/benchmarks/micro_benchmarks/cublaslt_fp8_gemm/cublaslt_utils.cc:4:
      <scrubbed>/superbench/benchmarks/micro_benchmarks/cublaslt_fp8_gemm/cublaslt_utils.h: In function ‘void checkCublasStatus(cublasStatus_t)’:
      <scrubbed>/superbench/benchmarks/micro_benchmarks/cublaslt_fp8_gemm/cublaslt_utils.h:15:20: error: ‘logic_error’ is not a member of ‘std’
         15 |         throw std::logic_error("cuBLAS API failed");
            |                    ^~~~~~~~~~~
      <scrubbed>/superbench/benchmarks/micro_benchmarks/cublaslt_fp8_gemm/cublaslt_utils.cc: In member function ‘size_t cublasLtGemm::GetAlgorithm(int, size_t)’:
      <scrubbed>/superbench/benchmarks/micro_benchmarks/cublaslt_fp8_gemm/cublaslt_utils.cc:103:20: error: ‘runtime_error’ is not a member of ‘std’
        103 |         throw std::runtime_error("Unable to find any suitable algorithms");
            |                    ^~~~~~~~~~~~~
      make[2]: *** [CMakeFiles/cublaslt_utils.dir/build.make:76: CMakeFiles/cublaslt_utils.dir/cublaslt_utils.cc.o] Error 1
      make[1]: *** [CMakeFiles/Makefile2:85: CMakeFiles/cublaslt_utils.dir/all] Error 2
      make: *** [Makefile:136: all] Error 2
      ```
      
      Adding `#include <stdexcept>` fixed this.
  6. 04 Jan, 2023 3 commits
  7. 03 Jan, 2023 6 commits
  8. 30 Dec, 2022 2 commits
  9. 29 Dec, 2022 2 commits
  10. 28 Dec, 2022 1 commit
  11. 27 Dec, 2022 1 commit
  12. 14 Dec, 2022 2 commits
  13. 29 Nov, 2022 1 commit
    • Runner - support 'pattern' in 'mpi' mode to run tasks in parallel (#430) · e4eeda0a
      Yang Wang authored
      * add mpi-parallels mode
      
      * update according to comments
      
      * fix and update doc
      
      * update
      
      * merge into 'mpi' mode
      
      * update according to comments
      
      * fix testcases
      
      * fix ansible
      
      * regard pattern as field
      
      * update
      
      * fix flake8 version
      
      * add flake8 range
      
      * remove map-by from host config
      
      * update comments
  14. 18 Nov, 2022 1 commit
  15. 17 Nov, 2022 1 commit
  16. 01 Nov, 2022 1 commit
    • CLI - Add non-zero return code for `sb [deploy,run]` (#425) · 1b86503d
      Yifan Xiong authored
      Add a non-zero return code for the `sb deploy` and `sb run` commands when
      there are Ansible failures in the control plane.
      The return code is set to the number of failures.
      
      For failures caused by benchmarks, the return code is still set per benchmark
      in the results JSON file.
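
A minimal sketch of the "return code equals failure count" idea described above. The function name and the list of per-task return codes are hypothetical and are not taken from the SuperBench source; the actual CLI may collect Ansible results differently.

```python
def control_plane_return_code(task_return_codes):
    """Count the control-plane tasks that failed.

    `task_return_codes` is a hypothetical list with one Ansible return
    code per task; 0 means that task succeeded.
    """
    return sum(1 for rc in task_return_codes if rc != 0)


# Made-up example: four tasks, two of which failed.
# A CLI entry point would pass this value to sys.exit().
print(control_plane_return_code([0, 2, 0, 1]))
```

With this convention a caller can test for success with a plain `== 0` check. Note that on POSIX systems only the low 8 bits of an exit status reach the parent process, so very large counts would need to be capped.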
  17. 31 Oct, 2022 1 commit
  18. 27 Oct, 2022 1 commit
  19. 20 Oct, 2022 1 commit
  20. 18 Oct, 2022 2 commits
  21. 06 Sep, 2022 1 commit
    • Release - SuperBench v0.6.0 (#409) · 63e9b2d1
      Yifan Xiong authored
      
      
      **Description**
      
      Cherry-pick bug fixes from v0.6.0 to main.
      
      **Major Revisions**
      
      * Enable latency test in ib traffic validation distributed benchmark (#396)
      * Enhance parameter parsing to allow spaces in value (#397)
      * Update apt packages in dockerfile (#398)
      * Upgrade colorlog for NO_COLOR support (#404)
      * Analyzer - Update error handling to support exit code of sb result diagnosis (#403)
      * Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399)
      * Enhance timeout cleanup to avoid possible hanging (#405)
      * Auto generate ibstat file by pssh (#402)
      * Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406)
      * Docs - Upgrade version and release note (#407)
      * Docs - Fix issues in document (#408)
      
      Co-authored-by: Yang Wang <yangwang1@microsoft.com>
      Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
  22. 23 Aug, 2022 1 commit
    • Analyzer - Add support to store values of metrics in data diagnosis (#392) · 733860d7
      Yuting Jiang authored
      **Description**
      Add support to store values of metrics in data diagnosis.
      
      Take the following rules as an example:
      ```
          nccl_store_rule:
            categories: NCCL_DIS
            store: True
            metrics:
              - nccl-bw:allreduce-run0/allreduce_1073741824_busbw
              - nccl-bw:allreduce-run1/allreduce_1073741824_busbw
              - nccl-bw:allreduce-run2/allreduce_1073741824_busbw
              - nccl-bw:allreduce-run3/allreduce_1073741824_busbw
              - nccl-bw:allreduce-run4/allreduce_1073741824_busbw
          nccl_rule:
            function: multi_rules
            criteria: 'lambda label:True if min(label["nccl_store_rule"].values())/max(label["nccl_store_rule"].values())<0.95 else False'
            categories: NCCL_DIS
      ```
      **nccl_store_rule** stores the values of its metrics in a dict under `label["nccl_store_rule"]`, and **nccl_rule** can then read those values through `label["nccl_store_rule"].values()` in its criteria.
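
To make the criteria concrete, here is a small sketch that applies the `nccl_rule` lambda to a stored dict. It assumes the criteria string is turned into a callable with `eval()`, which may not match how the analyzer does it; the helper names and the bandwidth numbers are made up for illustration.

```python
# The criteria string from nccl_rule above, copied verbatim.
CRITERIA = (
    'lambda label:True if min(label["nccl_store_rule"].values())'
    '/max(label["nccl_store_rule"].values())<0.95 else False'
)

METRIC = "nccl-bw:allreduce-run{}/allreduce_1073741824_busbw"


def make_label(values):
    """Build the dict a store rule would produce (hypothetical helper)."""
    return {"nccl_store_rule": {METRIC.format(i): v for i, v in enumerate(values)}}


def is_violated(criteria, label):
    """Evaluate a criteria string against stored metric values.

    Assumption: eval() is used to build the lambda; the real analyzer
    may construct it differently.
    """
    return bool(eval(criteria)(label))


# Made-up bus bandwidth numbers, not measured results.
healthy = make_label([190.0, 188.0, 189.0, 191.0, 187.0])
degraded = make_label([190.0, 188.0, 175.0, 189.0, 191.0])

print(is_violated(CRITERIA, healthy))   # min/max = 187/191, about 0.979
print(is_violated(CRITERIA, degraded))  # min/max = 175/191, about 0.916
```

The rule fires only for the second case, where the slowest of the five runs falls more than 5% below the fastest.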
  23. 22 Aug, 2022 1 commit
  24. 17 Aug, 2022 1 commit
    • Update Python setup for require packages (#387) · 626ac0a4
      Yifan Xiong authored
      __Description__
      
      Update the Python setup for required packages.
      
      __Major Revisions__
      * downgrade the `requests` version to be compatible with Python 3.6, and add a corresponding pipeline for 3.6
      * add an extra entry in `extras_require` for nested packages
      * update the `pip install` contents accordingly
  25. 16 Aug, 2022 1 commit
  26. 13 Aug, 2022 1 commit
  27. 09 Aug, 2022 1 commit
  28. 08 Aug, 2022 1 commit