1. 17 Nov, 2022 1 commit
  2. 31 Oct, 2022 1 commit
  3. 18 Oct, 2022 1 commit
  4. 06 Sep, 2022 1 commit
    • Release - SuperBench v0.6.0 (#409) · 63e9b2d1
      Yifan Xiong authored
      
      
      **Description**
      
      Cherry-pick bug fixes from v0.6.0 to main.
      
      **Major Revisions**
      
      * Enable latency test in ib traffic validation distributed benchmark (#396)
      * Enhance parameter parsing to allow spaces in value (#397)
      * Update apt packages in dockerfile (#398)
      * Upgrade colorlog for NO_COLOR support (#404)
      * Analyzer - Update error handling to support exit code of sb result diagnosis (#403)
      * Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399)
      * Enhance timeout cleanup to avoid possible hanging (#405)
      * Auto generate ibstat file by pssh (#402)
      * Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406)
      * Docs - Upgrade version and release note (#407)
      * Docs - Fix issues in document (#408)
      Co-authored-by: Yang Wang <yangwang1@microsoft.com>
      Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
  5. 17 Aug, 2022 1 commit
    • Update Python setup for require packages (#387) · 626ac0a4
      Yifan Xiong authored
      __Description__
      
      Update the Python setup for required packages.
      
      __Major Revisions__
      * downgrade requests version to be compatible with python 3.6, add corresponding pipeline for 3.6
      * add extra entry in extras_require for nested packages
      * update `pip install` contents accordingly
  6. 13 Aug, 2022 1 commit
  7. 26 Jul, 2022 1 commit
    • Support topo-aware IB performance validation (#373) · ef4d6574
      Jie Zhang authored
      
      
      * Support topo-aware IB performance validation
      
      Add a new pattern `topo-aware`, so the user can run IB performance
      tests based on the VMs' topology information. This way, the user can
      validate the IB performance across VM pairs at different distances
      as a quick test instead of a full pair-wise test.
      
      To run with the topo-aware pattern, the user needs to specify three
      required (and two optional) parameters in the YAML config file:
      --pattern	topo-aware
      --ibstat	path to ibstat output
      --ibnetdiscover	path to ibnetdiscover output
      --min_dist	minimum distance of VM pairs (optional, default 2)
      --max_dist	maximum distance of VM pairs (optional, default 6)
      
      The newly added topo_aware module then parses the topology
      information, builds a graph, and generates the VM pairs with
      the specified distance (# hops).
      
      The specified IB test then runs across these
      generated VM pairs.
      Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
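The pair generation described above (parse the topology, build a graph, select VM pairs by hop count) can be illustrated with a minimal sketch. This is not the `topo_aware` module itself: the function names, the adjacency-dict graph format, and the toy topology are assumptions made for illustration, and the real module may select or order pairs differently. Only the default distances of 2 and 6 come from the commit message.

```python
from collections import deque
from itertools import combinations


def hop_distances(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def pairs_by_distance(graph, vms, min_dist=2, max_dist=6):
    """Group VM pairs by hop distance, keeping distances in [min_dist, max_dist]."""
    dist_from = {vm: hop_distances(graph, vm) for vm in vms}
    pairs = {}
    for a, b in combinations(sorted(vms), 2):
        d = dist_from[a].get(b)  # None if b is unreachable from a
        if d is not None and min_dist <= d <= max_dist:
            pairs.setdefault(d, []).append((a, b))
    return pairs


# Hypothetical topology: two leaf switches connected through one spine switch.
toy_graph = {
    'vm0': ['leaf0'], 'vm1': ['leaf0'],
    'vm2': ['leaf1'], 'vm3': ['leaf1'],
    'leaf0': ['vm0', 'vm1', 'spine'],
    'leaf1': ['vm2', 'vm3', 'spine'],
    'spine': ['leaf0', 'leaf1'],
}
toy_pairs = pairs_by_distance(toy_graph, ['vm0', 'vm1', 'vm2', 'vm3'])
```

In this toy topology, VMs under the same leaf switch are 2 hops apart and VMs under different leaf switches are 4 hops apart, so `toy_pairs` has one group per distance.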
      
      * Add description about topology aware ib traffic tests
      Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
      
      * Add unit test to verify generated topology aware config file
      
      This commit adds a unit test to verify that the generated topology
      aware config file is correct. To do so, four new data files are added;
      the test invokes the gen_topo_aware_config function to generate a
      topology aware config file and compares it with the expected config file.
      Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
      
      * Fix lint issue on Azure pipeline
      Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
  8. 13 Jul, 2022 1 commit
    • Add dependencies (#374) · 16b6385d
      Yifan Xiong authored
      Add dependencies
      
      * include ndv4-topo.xml in cuda docker images
      * require requests version to avoid RequestsDependencyWarning
  9. 05 Jul, 2022 1 commit
  10. 14 Jun, 2022 1 commit
    • Support `sb run` on host directly without Docker (#358) · a4937e95
      Yifan Xiong authored
      **Description**
      
      Support `sb run` on host directly without Docker
      
      **Major Revisions**
      - Add `--no-docker` argument for `sb run`.
      - Run on host directly if `--no-docker` is specified.
      - Update docs and tests correspondingly.
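As a rough illustration of how a boolean `--no-docker` flag can switch between running a command on the host and running it inside a container, here is a minimal sketch. It does not reproduce the actual `sb` CLI code: the parser, the `wrap_command` helper, and the container name `sb-workspace` are assumptions used only for this example.

```python
import argparse
import shlex


def build_parser():
    """Hypothetical parser with a single boolean flag (not the real sb CLI)."""
    parser = argparse.ArgumentParser(prog='sb-run-sketch')
    parser.add_argument(
        '--no-docker',
        action='store_true',
        help='run on host directly instead of inside a container',
    )
    return parser


def wrap_command(command, no_docker, container='sb-workspace'):
    """Return the command unchanged for host runs, else wrap it in docker exec.

    The container name is an assumption for illustration.
    """
    if no_docker:
        return command
    return 'docker exec {} bash -c {}'.format(container, shlex.quote(command))


args = build_parser().parse_args(['--no-docker'])
host_command = wrap_command('sb exec', args.no_docker)
```

Using `store_true` keeps the default behavior (run in Docker) unchanged for existing invocations, and `shlex.quote` protects the wrapped command from being split by the outer shell.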
  11. 29 Apr, 2022 1 commit
    • Release - SuperBench v0.5.0 (#350) · 6681c720
      Yifan Xiong authored
      
      
      **Description**
      
      Cherry-pick bug fixes from v0.5.0 to main.
      
      **Major Revisions**
      
      * Bug - Force to fix ort version as '1.10.0' (#343)
      * Bug - Support no matching rules and unify the output name in result_summary (#345)
      * Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344)
      * Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342)
      * Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347)
      * Docs - Upgrade version and release note (#348)
      Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
  12. 15 Mar, 2022 1 commit
    • Analyzer - Add md and html output format for DataDiagnosis (#325) · b3c95f18
      user4543 authored
      **Description**
      Add md and html output format for DataDiagnosis.
      
      **Major Revision**
      - add md and html support in file_handler
      - add interface in DataDiagnosis for md and HTML output
      
      **Minor Revision**
      - move excel and json output interface into DataDiagnosis
  13. 19 Jan, 2022 1 commit
  14. 30 Dec, 2021 1 commit
    • Release - SuperBench v0.4.0 (#278) · ff563b66
      Yifan Xiong authored
      
      
      __Description__
      
      Cherry-pick bug fixes from v0.4.0 to main.
      
      __Major Revisions__
      
      * Bug - Fix issues for Ansible and benchmarks (#267)
      * Tests - Refine test cases for microbenchmark (#268)
      * Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
      * Benchmarks: Fix Bug - Fix fio build issue (#272)
      * Docs - Unify metric and add doc for cublas and cudnn functions (#271)
      * Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
      * Bug - Fix bug of detecting if gpu_index is none (#275)
      * Bug - Fix bugs in data diagnosis (#273)
      * Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
      * Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
      * Docs - Upgrade version and release note (#277)
      Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
  15. 10 Dec, 2021 2 commits
  16. 08 Dec, 2021 1 commit
  17. 12 Oct, 2021 1 commit
  18. 06 Sep, 2021 1 commit
  19. 31 Aug, 2021 1 commit
  20. 20 Aug, 2021 1 commit
    • Runner: Add Feature - Generate summarized output files. (#157) · 7595d794
      guoshzhao authored
      **Description**
      Generate the summarized output files from all nodes. For each metric, apply the reduce operation specified by its `reduce_op`.
      
      **Major Revision**
      - Generate the summarized json file per node:
      For microbenchmark, the format is `{benchmark_name}/[{run_count}/]{metric_name}[:rank]`
      For modelbenchmark, the format is `{benchmark_name}/{sub_benchmark_name}/[{run_count}/]{metric_name}`
      `[]` means optional.
      ```
      {
        "kernel-launch/overhead_event:0": 0.00583,
        "kernel-launch/overhead_event:1": 0.00545,
        "kernel-launch/overhead_event:2": 0.00581,
        "kernel-launch/overhead_event:3": 0.00572,
        "kernel-launch/overhead_event:4": 0.00559,
        "kernel-launch/overhead_event:5": 0.00591,
        "kernel-launch/overhead_event:6": 0.00562,
        "kernel-launch/overhead_event:7": 0.00586,
        "resnet_models/pytorch-resnet50/steptime-train-float32": 544.0827468410134,
        "resnet_models/pytorch-resnet50/throughput-train-float32": 353.7607016465773,
        "resnet_models/pytorch-resnet50/steptime-train-float16": 425.40482617914677,
        "resnet_models/pytorch-resnet50/throughput-train-float16": 454.0142363793973,
        "pytorch-sharding-matmul/0/allreduce": 10.561786651611328,
        "pytorch-sharding-matmul/1/allreduce": 10.561786651611328,
        "pytorch-sharding-matmul/0/allgather": 10.088025093078613,
        "pytorch-sharding-matmul/1/allgather": 10.088025093078613
      }
      ```
      - Generate the summarized jsonl file for all nodes, each line is the result from one node in json format.
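The naming formats and the per-node JSON / all-node JSONL outputs above can be sketched as follows. This is an illustration rather than the runner's code: the helper names, the set of supported reduce operations, and the rule that a missing `reduce_op` keeps one entry per rank are assumptions inferred from the sample output.

```python
import json

# Assumed reduce operations; the real runner may support a different set.
REDUCE_OPS = {
    'min': min,
    'max': max,
    'sum': sum,
    'avg': lambda values: sum(values) / len(values),
}


def metric_key(benchmark, metric, run_count=None, rank=None, sub_benchmark=None):
    """Build `{benchmark}/[{sub_benchmark}/][{run_count}/]{metric}[:rank]`."""
    parts = [benchmark]
    if sub_benchmark is not None:
        parts.append(sub_benchmark)
    if run_count is not None:
        parts.append(str(run_count))
    parts.append(metric)
    key = '/'.join(parts)
    if rank is not None:
        key += ':{}'.format(rank)
    return key


def summarize(benchmark, metric, values_by_rank, reduce_op=None):
    """Reduce per-rank values to one entry, or keep one entry per rank."""
    if reduce_op is None:
        return {
            metric_key(benchmark, metric, rank=rank): value
            for rank, value in sorted(values_by_rank.items())
        }
    reducer = REDUCE_OPS[reduce_op]
    return {metric_key(benchmark, metric): reducer(list(values_by_rank.values()))}


def to_jsonl(node_summaries):
    """Serialize one JSON object per node, one node per line."""
    return '\n'.join(json.dumps(summary) for summary in node_summaries)


node_summary = summarize('kernel-launch', 'overhead_event', {0: 0.00583, 1: 0.00545})
all_nodes = to_jsonl([node_summary, node_summary])
```

Here `node_summary` holds the keys `kernel-launch/overhead_event:0` and `kernel-launch/overhead_event:1`, matching the per-rank entries in the sample above, and `all_nodes` holds one line per node.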
  21. 23 Jun, 2021 1 commit
    • Bug bash - Fix bugs in multi GPU benchmarks (#98) · c0c43b8f
      Yifan Xiong authored
      * Add `sb deploy` command content.
      * Fix inline if-expression syntax in playbook.
      * Fix quote escape issue in bash command.
      * Add custom env in config.
      * Update default config for multi GPU benchmarks.
      * Update MANIFEST.in to include jinja2 template.
      * Require jinja2 minimum version.
      * Fix occasional duplicate output in Ansible runner.
      * Fix mixed color from Ansible and Python colorlog.
      * Update according to comments.
      * Change superbench.env from list to dict in config file.
  22. 16 Jun, 2021 1 commit
    • Bug bash - Fix bugs and refine log in single GPU benchmarks (#97) · ddbc51a1
      Yifan Xiong authored
      Fix bugs and refine log in single GPU benchmarks:
      
      * Fix none framework issue
      * Fix empty parameter bug
      * Remove missed mobilenet_v3 models
      * Change benchmark registration log to debug level
      * Add pid in logging
      * Add missing benchmarks in default config
      * Fix deprecated logging warn
  23. 02 Jun, 2021 1 commit
  24. 01 Jun, 2021 1 commit
  25. 23 May, 2021 1 commit
  26. 18 May, 2021 1 commit
  27. 12 Apr, 2021 2 commits
  28. 29 Mar, 2021 1 commit
    • Update logger (#28) · 0e2b2b08
      Yifan Xiong authored
      Update logger class.
      * add file handler along with stream handler
      * add colored formatter
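A logger that writes to both the console and a file, with color only on the console, might look like the sketch below. It uses only the standard library with ANSI escape codes; the project's own logger class is not shown here (later commits in this history mention `colorlog`), so the class name, function name, and format string are assumptions.

```python
import logging
import sys


class ColoredFormatter(logging.Formatter):
    """Wrap each formatted message in an ANSI color chosen by log level."""

    COLORS = {
        logging.DEBUG: '\033[36m',     # cyan
        logging.INFO: '\033[32m',      # green
        logging.WARNING: '\033[33m',   # yellow
        logging.ERROR: '\033[31m',     # red
        logging.CRITICAL: '\033[35m',  # magenta
    }
    RESET = '\033[0m'

    def format(self, record):
        message = super().format(record)
        color = self.COLORS.get(record.levelno)
        if color is None:
            return message
        return '{}{}{}'.format(color, message, self.RESET)


def build_logger(name, log_file, level=logging.INFO):
    """Create a logger with a colored stream handler and a plain file handler."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate output if called twice
    fmt = '[%(asctime)s %(levelname)s] %(message)s'

    stream_handler = logging.StreamHandler(sys.stdout)
    stream_handler.setFormatter(ColoredFormatter(fmt))
    logger.addHandler(stream_handler)

    file_handler = logging.FileHandler(log_file)
    file_handler.setFormatter(logging.Formatter(fmt))  # no color codes in files
    logger.addHandler(file_handler)
    return logger
```

Keeping the color codes out of the file handler means the log file stays readable in editors and tools that do not interpret ANSI escapes.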
  29. 26 Mar, 2021 1 commit
  30. 12 Mar, 2021 1 commit
  31. 11 Mar, 2021 1 commit
  32. 04 Feb, 2021 1 commit
  33. 01 Feb, 2021 2 commits
  34. 28 Jan, 2021 1 commit
    • Setup: Init - Initialize setup.py and basic configs (#4) · 5be32481
      Yifan Xiong authored
      Initialize setup.py and basic configurations for this project.
      
      Major revisions:
      
      - initialize setup.py for Python package
      - add gitignore and dockerignore
      - add editorconfig for editors
      - configure yapf for auto formatting
      - configure mypy for type hint
      - configure flake8 for lint, including quotes and docstrings
      - add pre-commit check for `git commit`
      - add spelling check in GitHub Actions
      - format existing files according to configured rules
      
      Example usage:
      
          # install dependencies
          $ python3 -m pip install -e .[dev,test]
          $ pre-commit install
      
          # format code automatically
          $ python3 setup.py format
      
          # lint code
          $ python3 setup.py lint
      
          # test code
          $ python3 setup.py test