Commits · 6681c72043bef5cd5a9c4dd17f44358488b8b0ab · tsoc / superbenchmark

29 Apr, 2022 1 commit

Release - SuperBench v0.5.0 (#350) · 6681c720

Yifan Xiong authored Apr 29, 2022



**Description**

Cherry-pick bug fixes from v0.5.0 to main.

**Major Revisions**

* Bug - Force to fix ort version as '1.10.0' (#343)
* Bug - Support no matching rules and unify the output name in result_summary (#345)
* Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344)
* Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342)
* Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347)
* Docs - Upgrade version and release note (#348)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>

6681c720

20 Apr, 2022 1 commit
- Docs - Update links using relative file paths with extensions (#346) · 712eafc3
  user4543 authored Apr 21, 2022
```
**Description**
Update links of referencing other docs using relative file paths with extensions.
```
  712eafc3
15 Apr, 2022 1 commit
- Docs - Update link to cli.md (#341) · cb266911
  Jared Bowden authored Apr 15, 2022
```
**Description**
Fixes relative link in documentation: point to `../cli.md`.
```
  cb266911
11 Apr, 2022 1 commit
- CLI - Integrate output all nodes diagnosis results (#339) · 8dc19ca4
  user4543 authored Apr 11, 2022
```
**Description**
Integrate the output of diagnosis results for all nodes.
```
  8dc19ca4
08 Apr, 2022 2 commits

Docs - Add usage for result summary (#337) · 56c9a711
user4543 authored Apr 09, 2022
```
**Description**
Add usage for result summary.
```
56c9a711

CLI - Integrate result summary and update output format of data diagnosis (#335) · f15da60b

user4543 authored Apr 08, 2022

**Description**
Integrate result summary and update output format of data diagnosis.

**Major Revision**
- Integrate result summary.
- Add md and html formats for data diagnosis.

f15da60b

01 Apr, 2022 1 commit

Benchmarks: Add Feature - Provide option to save raw data into file. (#333) · 6d895da8

guoshzhao authored Apr 01, 2022

**Description**
Use the `log_raw_data` config to control whether raw data is logged to a file. The default value is `no`. It can be set to `yes` for particular benchmarks, such as the NCCL/RCCL tests, to save their raw data to a file.

6d895da8

16 Mar, 2022 1 commit

Benchmarks: Add Feature - Add GPU-Burn as microbenchmark (#324) · ff51a3ce

rafsalas19 authored Mar 16, 2022

**Description**
Modifications adding GPU-Burn to SuperBench.
- added third party submodule
- modified Makefile to make gpu-burn binary
- added/modified microbenchmarks to add gpu-burn python scripts
- modified default and azure_ndv4 configs to add gpu-burn

ff51a3ce

20 Feb, 2022 1 commit

Analyzer: Add Feature - Add multi-rules feature for data diagnosis (#289) · 97ed12f9

user4543 authored Feb 20, 2022

**Description**
Add multi-rules feature for data diagnosis to support multiple rules' combined check.

**Major Revision**
- revise rule design to support multiple rules combination check
- update related codes and tests

97ed12f9

09 Feb, 2022 1 commit

Benchmarks: Revise Code - Eliminate NUMA binding for device-to-device tests in gpu_copy (#302) · 6cdf7595

Ziyue Yang authored Feb 09, 2022

**Description**
This commit removes NUMA binding for device-to-device tests because NUMA doesn't affect their performance, and revises benchmark metrics accordingly.

6cdf7595

30 Jan, 2022 1 commit
- Bug - Fix typo in document (#297) · 28195be6
  Yuting Jiang authored Jan 30, 2022
```
Fix typo in document.
```
  28195be6
29 Jan, 2022 1 commit
- Config - Support customized env for all modes (#295) · 3524975c
  Yifan Xiong authored Jan 29, 2022
```
Support customized env for all modes in configuration.
```
  3524975c
28 Jan, 2022 1 commit

Benchmarks: Add Feature - Add timeout feature for each benchmark. (#288) · d877ca23

guoshzhao authored Jan 28, 2022

**Description**
Add timeout feature for each benchmark.

**Major Revision**
- Add a `timeout` config for each benchmark. The current config files set a timeout only for kernel-launch, as an example; timeouts for other benchmarks can be set in the future.
- Set the timeout config for `ansible_runner.run()`. On timeout, the runner gets return code 254:
   [ansible.py:80][WARNING] Run failed, return code 254.
- Use the `timeout` command to terminate the client process.
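
The last point can be sketched roughly as follows. This is a minimal illustration, not SuperBench's actual code: the helper name `wrap_with_timeout` and the sample command are hypothetical, and the only thing assumed about the environment is the coreutils `timeout` command, which takes a duration followed by the command to run.

```python
def wrap_with_timeout(command, timeout_seconds):
    """Prefix a shell command with coreutils `timeout` (hypothetical helper).

    A missing or non-positive timeout leaves the command unchanged,
    so benchmarks without a `timeout` config run without a limit.
    """
    if timeout_seconds is None or timeout_seconds <= 0:
        return command
    return 'timeout {} {}'.format(int(timeout_seconds), command)


# Synthetic example command; a real benchmark command would go here.
print(wrap_with_timeout('sleep 300', 60))
```

Building the wrapped command as a plain string keeps the sketch independent of how the command is eventually launched.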

d877ca23

25 Jan, 2022 1 commit

Config - Update benchmark naming to support annotations (#284) · 7d7cd3dc

Yifan Xiong authored Jan 25, 2022

__Description__

Update benchmark naming to support annotations.

__Major Revisions__
- Update name for `create_benchmark_context` in executor.
- Backward compatibility for model benchmarks using "_models" suffix.
- Update documents.

7d7cd3dc

21 Jan, 2022 1 commit

Benchmarks: Add Feature - Add bidirectional test support in gpu_copy benchmark (#285) · 74421ffe

Ziyue Yang authored Jan 21, 2022

**Description**
This commit adds bidirectional tests in gpu_copy benchmark for both device-host transfer and device-device transfer, and revises related tests.

74421ffe

19 Jan, 2022 1 commit
- Benchmarks: Add Feature - Add percentile metrics for ort and pytorch inference benchmarks (#283) · fd2bc9e0
  guoshzhao authored Jan 19, 2022
```
**Description**
Add 50th, 90th, 95th, 99th, and 99.9th percentile latency metrics for ORT and PyTorch inference benchmarks.
```
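
As a rough illustration of the percentile metrics named in this commit, the sketch below computes them from a list of per-step latencies. It is not the benchmark's implementation: the function names are hypothetical, the samples are synthetic, and linear interpolation between ranks is an assumed choice (the commit does not say which percentile definition is used).

```python
import math


def percentile(sorted_values, pct):
    """Return the pct-th percentile using linear interpolation between ranks."""
    if not sorted_values:
        raise ValueError('no latency samples')
    rank = (len(sorted_values) - 1) * pct / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    low_value = sorted_values[lower]
    return low_value + (sorted_values[upper] - low_value) * fraction


def latency_percentiles(latencies_ms, pcts=(50, 90, 95, 99, 99.9)):
    """Map each requested percentile to its latency (hypothetical helper)."""
    ordered = sorted(latencies_ms)
    return {pct: percentile(ordered, pct) for pct in pcts}


samples = [float(i) for i in range(1, 101)]  # synthetic latencies: 1.0 .. 100.0 ms
result = latency_percentiles(samples)
for pct, value in result.items():
    print('{}th percentile: {:.3f} ms'.format(pct, value))
```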
  fd2bc9e0
18 Jan, 2022 1 commit

CLI - Add command sb benchmark [list,list-parameters] (#279) · f7ffc545

Yifan Xiong authored Jan 18, 2022

__Description__

Add command `sb benchmark list` and `sb benchmark list-parameters` to support listing all optional parameters for benchmarks.

<details>
<summary>Examples</summary>
<pre>
$ sb benchmark list -n [a-z]+-bw -o table
Result
--------
mem-bw
nccl-bw
rccl-bw
</pre>
<pre>
$ sb benchmark list-parameters -n mem-bw
=== mem-bw ===
optional arguments:
  --bin_dir str         Specify the directory of the benchmark binary.
  --duration int        The elapsed time of benchmark in seconds.
  --mem_type str [str ...]
                        Memory types to benchmark. E.g. htod dtoh dtod.
  --memory str          Memory argument for bandwidthtest. E.g. pinned unpinned.
  --run_count int       The run count of benchmark.
  --shmoo_mode          Enable shmoo mode for bandwidthtest.
default values:
{'bin_dir': None,
 'duration': 0,
 'mem_type': ['htod', 'dtoh'],
 'memory': 'pinned',
 'run_count': 1}
</pre>
</details>

__Major Revisions__
* Add `sb benchmark list` to list benchmarks matching given name.
* Add `sb benchmark list-parameters` to list parameters for benchmarks which match given name.

__Minor Revisions__
* Sort format help text for argparse.
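
The name filtering shown in the examples above can be sketched as a regular-expression match over registered benchmark names. This is an assumption-laden sketch rather than the CLI's code: `list_benchmarks` here is a hypothetical standalone function, the name list is synthetic, and whether the real command anchors the pattern to the whole name is not stated in the commit (full-match is assumed).

```python
import re


def list_benchmarks(names, pattern=None):
    """Return the sorted names that fully match `pattern`; all names if None."""
    if pattern is None:
        return sorted(names)
    regex = re.compile(pattern)
    return sorted(name for name in names if regex.fullmatch(name))


# Synthetic registry used only for this illustration.
registered = ['kernel-launch', 'mem-bw', 'nccl-bw', 'rccl-bw', 'gpu-copy-bw']
print(list_benchmarks(registered, r'[a-z]+-bw'))
```

With full-match semantics, `[a-z]+-bw` selects only names made of a single lowercase word followed by `-bw`, which agrees with the three names in the example output.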

f7ffc545

30 Dec, 2021 1 commit

Release - SuperBench v0.4.0 (#278) · ff563b66

Yifan Xiong authored Dec 30, 2021



__Description__

Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__

* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>

ff563b66

14 Dec, 2021 1 commit
- Docs - Add usage for data diagnosis (#266) · 682ed06a
  Yuting Jiang authored Dec 14, 2021
```
**Description**
Add usage for data diagnosis.
```
  682ed06a
13 Dec, 2021 4 commits
- Docs - Update docs for monitor. (#265) · 2e10fb0d
  guoshzhao authored Dec 13, 2021
```
**Description**
Update docs for monitor.
```
  2e10fb0d
- Benchmarks - Add transformers for TensorRT inference (#254) · cb8a3cfb
  Yifan Xiong authored Dec 13, 2021
```
Add transformers for TensorRT inference.
```
  cb8a3cfb
- Docs - Add benchmark metrics for cpu-memory-bw-latency (#264) · 10012a0a
  Ziyue Yang authored Dec 13, 2021
```
**Description**
Add benchmark metrics for cpu-memory-bw-latency.
```
  10012a0a
- Docs - Add a small note for using release container version (#262) · c403b1ca
  yangpanMS authored Dec 12, 2021
```
**Description**
Minor doc change to highlight that the sb CLI version is independent of the sb container version.
```
  c403b1ca
10 Dec, 2021 2 commits

Benchmarks: Add Benchmark - Add ONNXRuntime inference benchmark based on ORT python API (#245) · 4d85630a

guoshzhao authored Dec 10, 2021

**Description**
Add ONNXRuntime inference benchmark based on ORT python API.

**Major Revision**
- Add `ORTInferenceBenchmark` class to export a PyTorch model to an ONNX model and run inference.
- Add tests and example for `ort-inference` benchmark
- Update the introduction docs.

4d85630a

Monitor: Integration - Integrate monitor into Superbench (#259) · 6e357fb9

guoshzhao authored Dec 10, 2021

**Description**
Integrate monitor into Superbench.

**Major Revision**
- Initialize, start and stop monitor in SB executor.
- Parse the monitor data in SB runner and merge into benchmark results.
- Specify ReduceType for monitor metrics, such as MAX, MIN and LAST.
- Add monitor configs into config file.
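
The reduce step for monitor metrics might look like the sketch below. It is only an illustration under stated assumptions: the commit names `ReduceType` with MAX, MIN and LAST as examples, while the enum values, the `reduce_metric` helper, and the sample readings are hypothetical.

```python
from enum import Enum


class ReduceType(Enum):
    """Illustrative reduce types; only MAX, MIN and LAST are named in the commit."""
    MAX = 'max'
    MIN = 'min'
    LAST = 'last'


def reduce_metric(samples, reduce_type):
    """Collapse a time series of monitor samples into one value."""
    if not samples:
        raise ValueError('no samples to reduce')
    if reduce_type is ReduceType.MAX:
        return max(samples)
    if reduce_type is ReduceType.MIN:
        return min(samples)
    if reduce_type is ReduceType.LAST:
        return samples[-1]
    raise ValueError('unsupported reduce type: {}'.format(reduce_type))


gpu_temps = [61, 74, 79, 70]  # synthetic monitor readings
print(reduce_metric(gpu_temps, ReduceType.MAX))
```

Choosing the reduce type per metric lets a peak-sensitive reading be reported by its maximum while a cumulative counter is reported by its last value.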

6e357fb9

09 Dec, 2021 1 commit
- Benchmarks: Unify metric names of benchmarks (#252) · 9f56b219
  Yuting Jiang authored Dec 09, 2021
```
**Description**
Unify metric names of benchmarks.
```
  9f56b219
06 Dec, 2021 1 commit

Docs - Add doc for data diagnosis (#249) · 655f238d

Yuting Jiang authored Dec 06, 2021

**Description**
Add doc for data diagnosis, including input, output and baseline file schema.

655f238d

30 Nov, 2021 1 commit
- Docs: Update ib validation microbenchmark metrics (#246) · 978e88ef
  Yuting Jiang authored Nov 30, 2021
```
**Description**
Update ib validation microbenchmark metrics.
```
  978e88ef
26 Nov, 2021 1 commit
- Docs: Update gpu-copy benchmark metrics (#241) · d89fcd4f
  Ziyue Yang authored Nov 26, 2021
```
**Description**

Update gpu-copy benchmark metrics.
```
  d89fcd4f
12 Nov, 2021 1 commit

Benchmarks - Add TensorRT inference benchmark (#236) · 8a00c8a0

Yifan Xiong authored Nov 12, 2021

__Description__

Add TensorRT inference benchmark for torchvision models.

__Major Revision__
- Measure TensorRT inference performance.

8a00c8a0

10 Nov, 2021 1 commit
- Docs: Update docs to add network benchmarks for tcp and gpcnet (#238) · b913e1f6
  Yuting Jiang authored Nov 10, 2021
```
**Description**

Update docs to add network benchmarks for tcp and gpcnet.
```
  b913e1f6
09 Nov, 2021 1 commit
- Docs: Update docs to add ORT AMD benchmarks based on docker (#237) · f15fdf72
  guoshzhao authored Nov 09, 2021
```
Update docs to add ORT AMD benchmarks based on docker.
```
  f15fdf72
27 Oct, 2021 1 commit
- Docs - Add introduction and metrics in benchmarks docs (#233) · 976803f8
  Yifan Xiong authored Oct 27, 2021
```
Add introduction and metrics for micro-benchmarks and model-benchmarks document.
```
  976803f8
21 Oct, 2021 1 commit
- revise the term onnx to onnxruntime. (#232) · 455ad1f8
  guoshzhao authored Oct 21, 2021
```
**Description**
Revise all occurrences of the term `onnx` to `onnxruntime`.
```
  455ad1f8
12 Oct, 2021 1 commit

Docs - Refine document structure (#225) · 3d0fde12

Yifan Xiong authored Oct 12, 2021

__Major Revisions__

* Refine document structure for user tutorial.

__Minor Revisions__

* Add AMD part in installation.
* Change default config file to latest link.

3d0fde12

26 Sep, 2021 1 commit

Release - SuperBench v0.3.0 (#212) · dfbd70b1

Yifan Xiong authored Sep 26, 2021



**Description**

Cherry-pick bug fixes from v0.3.0 to main.

**Major Revisions**
* Docs - Upgrade version and release note (#209)
* Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
* Benchmarks: Update - Update benchmarks in configuration file (#208)
* CI/CD - Update GitHub Action VM (#211)
* Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
* CI/CD - Fix bug in build image for push event (#205)
* Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
* Tool: Fix bug - Fix function naming issue in system info (#200)
* CI/CD - Push images in GitHub Action (#202)
* Bug - Fix torch.distributed command for single node (#201)
* CLI - Integrate system info for node (#199)
* Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
* CI/CD - Add ROCm image build in GitHub Actions (#194)
* Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
* Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
* Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
* Bug - Revise 'docker run' in sb deploy (#195)
* Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190)

Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>

dfbd70b1

02 Sep, 2021 1 commit
- Docs: Add system config info for result collection (#168) · ab71bbb4
  TobeyQin authored Sep 02, 2021
```
**Description**
Add system config info for result collection
```
  ab71bbb4
30 Aug, 2021 2 commits

Docs: Revision - Revise results contributing rule (#174) · de481cb0

TobeyQin authored Aug 30, 2021

**Description**
Revise results contributing rule.

- Change the results uploading path to the [superbench-results](https://github.com/microsoft/superbench-results) repo.
- Add description of how to get system info by command.

Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>

de481cb0

Docs: Add document for SuperBench YAML config (#158) · 0b74b2aa
Yifan Xiong authored Aug 30, 2021
```
**Description**
Add document for SuperBench YAML config file.
```
0b74b2aa

16 Aug, 2021 1 commit

Docs - Add config and docs for development experience (#155) · 96fc4d09

Yifan Xiong authored Aug 16, 2021

Add config and docs for development experience.

__Major Revision__
- Add settings and extensions config for VSCode.
- Add devcontainer config for Codespaces.
- Update document accordingly.

96fc4d09