Commits · 8a00c8a03b76d5c736902b899bc350c50171f30a · tsoc / superbenchmark

12 Nov, 2021 1 commit

Benchmarks - Add TensorRT inference benchmark (#236) · 8a00c8a0

Yifan Xiong authored Nov 12, 2021

__Description__

Add TensorRT inference benchmark for torchvision models.

__Major Revision__
- Measure TensorRT inference performance.

09 Nov, 2021 1 commit

Benchmarks: Add Benchmark - Add ib traffic validation distributed benchmark (#215) · 54919424

Yuting Jiang authored Nov 10, 2021

**Description**
Add ib traffic validation distributed benchmark.

**Major Revision**
- Add ib traffic validation distributed benchmark, example and test

30 Oct, 2021 1 commit

Benchmarks: Add Feature - Add CPU-initiated copy and dtod support to gpu-sm-copy benchmark (#230) · 008e0fe1

Ziyue Yang authored Oct 30, 2021

**Description**
This commit does the following:
1) Adds CPU-initiated copy benchmark;
2) Adds dtod benchmark;
3) Supports scanning NUMA nodes and GPUs inside the benchmark program;
4) Changes the name of gpu-sm-copy to gpu-copy.

27 Oct, 2021 1 commit
- Benchmarks: Add Benchmark - Add onnx model benchmarks based on docker image. (#227) · e98a6812
  guoshzhao authored Oct 27, 2021
```
Add RocmOnnxModelBenchmark class to run benchmarks packaged in superbench/benchmark:rocm4.3.1-onnxruntime1.9.0
```
22 Oct, 2021 2 commits

Benchmarks: Add Benchmark - Add gpcnet microbenchmark (#229) · 6003f2c2

Yuting Jiang authored Oct 22, 2021

**Description**
Add gpcnet microbenchmark

**Major Revision**
- Add 2 microbenchmarks for gpcnet: gpc-network-test and gpc-network-load-test
- Add related test and example files

Benchmarks: Add Feature - Support AMD and CUDA platform for DockerBenchmark. (#226) · f841c8f4
guoshzhao authored Oct 22, 2021
```
**Description**
Add CudaDockerBenchmark and RocmDockerBenchmark to support the AMD and CUDA platforms for DockerBenchmark.
```
21 Oct, 2021 1 commit
- revise the term onnx to onnxruntime. (#232) · 455ad1f8
  guoshzhao authored Oct 21, 2021
```
**Description**
Rename all occurrences of the term `onnx` to `onnxruntime`.
```
12 Oct, 2021 1 commit

Benchmarks: Add Benchmark - Add tcp connectivity validation microbenchmark (#217) · 49cc8f9a

Yuting Jiang authored Oct 13, 2021

**Description**
Add a TCP connectivity validation microbenchmark, which validates TCP connectivity between the current node and several nodes in the hostfile.

**Major Revision**
- Add tcp connectivity validation microbenchmark and related test, example
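A minimal sketch of the kind of check described above, assuming the validation is a plain TCP handshake against one port on each host from a hostfile with one hostname per line. The function names, hostfile format, and timeout are hypothetical and not taken from the SuperBench implementation.

```python
import socket


def read_hostfile(path):
    """Hypothetical hostfile reader: one hostname per line, '#' starts a comment."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip() and not line.startswith('#')]


def check_tcp_connectivity(hosts, port, timeout=1.0):
    """Return {host: True/False} depending on whether a TCP handshake to `port` succeeds."""
    results = {}
    for host in hosts:
        try:
            # create_connection resolves the host and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[host] = True
        except OSError:
            # Covers refused connections, timeouts and name-resolution failures.
            results[host] = False
    return results
```

Catching `OSError` is deliberate: refused connections, timeouts, and DNS failures all derive from it, so one unreachable host never aborts the scan of the rest.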

27 Sep, 2021 1 commit
- Benchmarks: Add Feature - Add option to use fp32 instead of tf32 (#213) · f9442456
  guoshzhao authored Sep 28, 2021
```
**Description**
Add option `force_fp32` to use FP32 instead of TF32; it only takes effect on Ampere or newer GPUs.
```
26 Sep, 2021 1 commit

Release - SuperBench v0.3.0 (#212) · dfbd70b1

Yifan Xiong authored Sep 26, 2021



**Description**

Cherry-pick bug fixes from v0.3.0 to main.

**Major Revisions**
* Docs - Upgrade version and release note (#209)
* Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
* Benchmarks: Update - Update benchmarks in configuration file (#208)
* CI/CD - Update GitHub Action VM (#211)
* Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
* CI/CD - Fix bug in build image for push event (#205)
* Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
* Tool: Fix bug - Fix function naming issue in system info  (#200)
* CI/CD - Push images in GitHub Action (#202)
* Bug - Fix torch.distributed command for single node (#201)
* CLI - Integrate system info for node (#199)
* Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
* CI/CD - Add ROCm image build in GitHub Actions (#194)
* Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
* Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
* Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
* Bug - Revise 'docker run' in sb deploy (#195)
* Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190)
Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>

03 Sep, 2021 1 commit

Benchmarks: Code Revision - Revise arguments of nccl/rccl to support mpi mode... · 60762518

Yuting Jiang authored Sep 03, 2021

Benchmarks: Code Revision - Revise arguments of nccl/rccl to support mpi mode and rename metric (#189)

**Description**
Revise the arguments of nccl/rccl to support MPI mode (MPI mode could not run nccl/rccl because multiple operators ran in sequence without a barrier) and rename metrics.

**Major Revision**
- Revise the argument `operators` to accept a single operator

**Minor Revision**
- Rename metrics to remove the benchmark name
- Change the default value of argument `ngpus` to 1
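A sketch of what the revised command-line interface could look like with `argparse`, assuming one collective operation per run and a default of one GPU per process. The flag name `--operation` and the list of choices are hypothetical; only the single-operator restriction and the `ngpus` default of 1 come from the commit message.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description='nccl/rccl bandwidth benchmark (hypothetical sketch)')
    # Exactly one operation per run, so MPI mode never executes several
    # operations back-to-back without a barrier between them.
    parser.add_argument(
        '--operation',
        type=str,
        default='allreduce',
        choices=['allreduce', 'allgather', 'broadcast', 'reduce', 'reducescatter', 'alltoall'],
        help='Collective operation to benchmark (hypothetical flag name).',
    )
    # Default of 1 GPU per process, as stated in the commit message.
    parser.add_argument('--ngpus', type=int, default=1, help='Number of GPUs per process.')
    return parser
```

Running several operations then means launching the benchmark several times, which keeps each MPI launch self-contained.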

02 Sep, 2021 1 commit

Runner - Fix inventory issue in ansible_runner (#185) · e2453e1c

Yifan Xiong authored Sep 02, 2021

__Description__

Fix inventory bug in ansible_runner when the host list contains multiple hosts.

This ought to be handled by the ansible_runner library; work around it by passing the `--inventory` argument on the command line.
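A sketch of how the workaround argument could be built, assuming the host list is passed inline rather than through an inventory file. Ansible accepts a comma-separated host list for `--inventory`, and a trailing comma keeps a single host from being parsed as a file path. The helper name is hypothetical.

```python
def inventory_cmdline(hosts):
    """Build an Ansible `--inventory` argument from an explicit host list (hypothetical helper)."""
    if not hosts:
        raise ValueError('host list must not be empty')
    # The trailing comma tells Ansible this is an inline host list, not a path.
    return '--inventory ' + ','.join(hosts) + ','
```

For example, `inventory_cmdline(['node-0', 'node-1'])` yields `--inventory node-0,node-1,`, which would then be handed to ansible_runner as extra command-line arguments.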

01 Sep, 2021 1 commit

Benchmarks: Code Revision - revise the DockerBenchmark base class (#179) · 37d5dfd5

guoshzhao authored Sep 01, 2021

**Description**
Revise the DockerBenchmark base class to support image pull, image rm, etc.

**Major Revision**
- image pull in _preprocess()
- image clean in _postprocess()
- execute customized commands in _benchmark()
- add unit tests
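The three hooks listed above can be sketched as follows. This is a hypothetical stand-in, not the SuperBench `DockerBenchmark` class; the command runner is injected so the hook order can be unit-tested without a Docker daemon, which mirrors the "add unit tests" item.

```python
import subprocess


class DockerBenchmarkSketch:
    """Hypothetical sketch: pull an image, run custom commands in it, then remove it."""

    def __init__(self, image, commands, runner=subprocess.run):
        self.image = image
        self.commands = commands
        # Injected runner so tests can record commands instead of calling docker.
        self._run = runner

    def _preprocess(self):
        # Image pull before the benchmark.
        self._run(['docker', 'pull', self.image], check=True)

    def _benchmark(self):
        # Execute each customized command in a fresh, auto-removed container.
        for cmd in self.commands:
            self._run(['docker', 'run', '--rm', self.image, 'sh', '-c', cmd], check=True)

    def _postprocess(self):
        # Image clean-up after the benchmark.
        self._run(['docker', 'image', 'rm', self.image], check=True)

    def run(self):
        self._preprocess()
        try:
            self._benchmark()
        finally:
            # Remove the image even if a command failed.
            self._postprocess()
```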

31 Aug, 2021 1 commit

Benchmarks: Code Revision - Revise metric name generation and default config... · 024a870b

Ziyue Yang authored Aug 31, 2021

Benchmarks: Code Revision - Revise metric name generation and default config for disk performance benchmark (#175)

**Description**
This commit revises disk performance benchmark, including:
1) Add missing benchmark name in default config;
2) Avoid using reserved character ':' in metric name.
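One plausible reason ':' is reserved is the `[:rank]` suffix used in the summarized output format (see the 20 Aug entry). The sketch below, with hypothetical helper names and `_` as an assumed replacement character, shows the ambiguity: without sanitizing, a rank-less name such as `seq_read:bw` would be misparsed as having the rank `bw`.

```python
def sanitize_metric_name(name, replacement='_'):
    """Replace ':' because it is reserved for the rank suffix (hypothetical helper)."""
    return name.replace(':', replacement)


def metric_key(name, rank=None):
    """Append an optional ':rank' suffix to a sanitized metric name."""
    base = sanitize_metric_name(name)
    return base if rank is None else f'{base}:{rank}'


def split_rank(key):
    """Split 'name:rank' back into (name, rank); keys without ':' get rank None."""
    base, sep, rank = key.rpartition(':')
    return (base, int(rank)) if sep else (key, None)
```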

30 Aug, 2021 3 commits

Benchmarks: Add Benchmark - Add GPU SM copy benchmark (#169) · b97197f0
Ziyue Yang authored Aug 30, 2021
```
**Description**
This commit adds gpu_sm_copy benchmark and related tests.
```
Benchmarks: Add Benchmark - Add gemm flops microbenchmark for amd (#152) · f3d53c3d

Yuting Jiang authored Aug 30, 2021

**Description**
Add gemm flops microbenchmark for amd.

**Major Revision**
- Add gemm flops microbenchmark for amd.
- Add related example and test file.

Benchmarks: Code Revision - Extract base class for gemm flops microbenchmark (#165) · b0df66f7

Yuting Jiang authored Aug 30, 2021

**Description**
Extract base class for gemm flops microbenchmark.

**Major Revision**
- extract base class for gemm flops microbenchmark and add related test.
- revise gemm_flops_performance for cuda.

27 Aug, 2021 2 commits

Benchmarks: Code Revision - Rename kernel_launch_overhead metrics (#171) · 35114bae

guoshzhao authored Aug 28, 2021

**Description**
Rename `kernel_launch_overhead_event` to `event_overhead`, `kernel_launch_overhead_wall` to `wall_overhead`.

Benchmarks: Add Benchmark - Add memory bus bandwidth performance microbenchmark for amd (#153) · 666e3a94

Yuting Jiang authored Aug 27, 2021

**Description**
Add memory bus bandwidth performance microbenchmark for amd.

**Major Revision**
- Add memory bus bandwidth performance microbenchmark for amd.
- Add related example and test file.

25 Aug, 2021 1 commit

Benchmarks: Code Revision - Extract base class for memory bandwidth microbenchmark (#159) · e5e84a2e

Yuting Jiang authored Aug 26, 2021

**Description**
Extract base class for memory bandwidth microbenchmark.

**Major Revision**
- revise and optimize cuda_memory_bandwidth_performance
- extract base class for memory bandwidth microbenchmark
- add test for base class

23 Aug, 2021 1 commit
- Benchmarks: Code Revision - fix typo in test of nccl microbenchmark. (#163) · 0583862d
  Yuting Jiang authored Aug 23, 2021
```
**Description**
Fix typo in test_nccl_bw_performance.py.

**Major Revision**
- Fix typo in test_nccl_bw_performance.py.
```
22 Aug, 2021 1 commit
- Benchmarks: Revise Benchmark - Add readwrite I/O pattern (#161) · 6774d7b7
  Ziyue Yang authored Aug 22, 2021
```
**Description**
This commit adds readwrite I/O pattern for FIO benchmark. Read/write ratio is fixed at 4:1.
```
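A 4:1 read/write ratio corresponds to 80% reads. A sketch of fio arguments for such a mixed job, assuming fio's `readwrite` pattern with `rwmixread` setting the read percentage. The job name, target file, block size, and runtime are hypothetical placeholders, not the benchmark's actual settings.

```python
def fio_readwrite_args(filename, block_size='4k', runtime_s=60):
    """Build fio arguments for a mixed read/write job with a 4:1 read/write ratio."""
    read_pct = 4 * 100 // (4 + 1)  # 4:1 ratio -> 80% reads, 20% writes
    return [
        'fio',
        '--name=readwrite',          # hypothetical job name
        f'--filename={filename}',
        '--rw=readwrite',            # mixed sequential reads and writes
        f'--rwmixread={read_pct}',   # share of the mix that is reads
        f'--bs={block_size}',
        '--time_based',
        f'--runtime={runtime_s}',
        '--direct=1',                # bypass the page cache
    ]
```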
20 Aug, 2021 1 commit

Runner: Add Feature - Generate summarized output files. (#157) · 7595d794

guoshzhao authored Aug 20, 2021

**Description**
Generate the summarized output files from all nodes. For each metric, apply the reduce operation specified by its `reduce_op`.

**Major Revision**
- Generate the summarized JSON file per node:
  - For microbenchmarks, the format is `{benchmark_name}/[{run_count}/]{metric_name}[:rank]`
  - For model benchmarks, the format is `{benchmark_name}/{sub_benchmark_name}/[{run_count}/]{metric_name}`
  - `[]` means optional.
```
{
  "kernel-launch/overhead_event:0": 0.00583,
  "kernel-launch/overhead_event:1": 0.00545,
  "kernel-launch/overhead_event:2": 0.00581,
  "kernel-launch/overhead_event:3": 0.00572,
  "kernel-launch/overhead_event:4": 0.00559,
  "kernel-launch/overhead_event:5": 0.00591,
  "kernel-launch/overhead_event:6": 0.00562,
  "kernel-launch/overhead_event:7": 0.00586,
  "resnet_models/pytorch-resnet50/steptime-train-float32": 544.0827468410134,
  "resnet_models/pytorch-resnet50/throughput-train-float32": 353.7607016465773,
  "resnet_models/pytorch-resnet50/steptime-train-float16": 425.40482617914677,
  "resnet_models/pytorch-resnet50/throughput-train-float16": 454.0142363793973,
  "pytorch-sharding-matmul/0/allreduce": 10.561786651611328,
  "pytorch-sharding-matmul/1/allreduce": 10.561786651611328,
  "pytorch-sharding-matmul/0/allgather": 10.088025093078613,
  "pytorch-sharding-matmul/1/allgather": 10.088025093078613
}
```
- Generate the summarized JSONL file for all nodes; each line is the result from one node in JSON format.
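The two key formats and the JSONL output can be sketched as below. The function names are hypothetical; the key layout follows the formats listed above and reproduces keys from the sample file.

```python
import json


def summary_key(benchmark, metric, sub_benchmark=None, run_count=None, rank=None):
    """Build a summary key following the documented naming formats.

    Microbenchmarks:  {benchmark_name}/[{run_count}/]{metric_name}[:rank]
    Model benchmarks: {benchmark_name}/{sub_benchmark_name}/[{run_count}/]{metric_name}
    """
    if rank is not None and sub_benchmark is not None:
        raise ValueError('the rank suffix only applies to microbenchmarks')
    parts = [benchmark]
    if sub_benchmark is not None:
        parts.append(sub_benchmark)
    if run_count is not None:
        parts.append(str(run_count))
    parts.append(metric)
    key = '/'.join(parts)
    if rank is not None:
        key += f':{rank}'
    return key


def to_jsonl(per_node_summaries):
    """Serialize one JSON object per node, one node per line (JSON Lines)."""
    return ''.join(json.dumps(summary) + '\n' for summary in per_node_summaries)
```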

19 Aug, 2021 1 commit

Runner - Support mpi mode (#146) · 98b6c0e3

Yifan Xiong authored Aug 19, 2021



Support mpi mode in runner:
* concatenate mpirun command
* support mca and env config
* prepare hostfile and update Ansible host pattern
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
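A sketch of concatenating an Open MPI `mpirun` command with MCA parameters and environment variables. It uses the standard `-np`, `--hostfile`, `-mca` and `-x` options; the function name, its parameters, and the example values are hypothetical and not taken from the runner.

```python
import shlex


def build_mpirun_command(command, hostfile, nprocs, mca=None, env=None):
    """Concatenate an mpirun command line with MCA parameters and exported env vars."""
    parts = ['mpirun', '-np', str(nprocs), '--hostfile', hostfile]
    for key, value in (mca or {}).items():
        parts += ['-mca', key, str(value)]   # Open MPI MCA parameter
    for key, value in (env or {}).items():
        parts += ['-x', f'{key}={value}']   # export the variable to all ranks
    # mpirun's own arguments are quoted; `command` is appended as-is so the
    # caller can pass a full shell command line.
    return ' '.join(shlex.quote(p) for p in parts) + ' ' + command
```

For instance, an `mca` value such as `^openib` is quoted because `^` is not shell-safe.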

16 Aug, 2021 1 commit
- Benchmarks: Code Revision - change 'reduce' to 'reduce_op' (#156) · 7293e783
  guoshzhao authored Aug 16, 2021
```
**Description**
Change the field name `reduce` to `reduce_op`.
```
06 Aug, 2021 2 commits
- Benchmarks: Add Feature - Set reduce type for current benchmarks' metrics. (#149) · acf365a8
  guoshzhao authored Aug 06, 2021
```
**Description**
Set reduce type for current benchmarks' metrics, including model benchmarks and ShardingMatmul.
```
- Benchmarks: Code Revision - Calculate average value by using statistics module. (#148) · bc1a61b9
  guoshzhao authored Aug 06, 2021
```
**Description**
Replace `sum(results) / len(results)` with `statistics.mean(results)`
```
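A small illustration of the two expressions, using made-up latencies. Both compute the arithmetic mean, but `statistics.mean` sums the values exactly before dividing, so the results may differ in the last floating-point bits. It also raises `StatisticsError` on empty input instead of `ZeroDivisionError`.

```python
import math
import statistics

# Hypothetical per-step latencies in milliseconds.
results = [10.5, 10.1, 9.8, 10.0]

manual_avg = sum(results) / len(results)
stats_avg = statistics.mean(results)

# Same value up to floating-point rounding.
assert math.isclose(manual_avg, stats_avg)

# Empty input raises a descriptive StatisticsError (a ValueError subclass).
try:
    statistics.mean([])
except statistics.StatisticsError as err:
    print('empty results:', err)
```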
05 Aug, 2021 1 commit

Benchmarks: Add Feature - Add reduce function support for output summary. (#147) · e41b1f62

guoshzhao authored Aug 05, 2021

**Description**
Add reduce function support for output summary.

**Major Revision**
- Add reducer class to maintain all reduce functions.
- Save reduce type of each metric into `BenchmarkResult`
- Fix UT.
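A reducer class that "maintains all reduce functions" can be sketched as a registry keyed by reduce type. The names `ReduceType`, `Reducer`, and `summarize` below are hypothetical stand-ins, not the SuperBench classes. Metrics without a reduce type keep their raw per-rank values.

```python
import statistics
from enum import Enum


class ReduceType(Enum):
    AVG = 'avg'
    MIN = 'min'
    MAX = 'max'
    SUM = 'sum'


class Reducer:
    """Registry mapping a reduce type to the function that combines metric values."""
    functions = {}

    @classmethod
    def add_reduce_func(cls, reduce_type):
        def decorator(func):
            cls.functions[reduce_type] = func
            return func
        return decorator

    @classmethod
    def get_reduce_func(cls, reduce_type):
        return cls.functions.get(reduce_type)


Reducer.add_reduce_func(ReduceType.AVG)(statistics.mean)
Reducer.add_reduce_func(ReduceType.MIN)(min)
Reducer.add_reduce_func(ReduceType.MAX)(max)
Reducer.add_reduce_func(ReduceType.SUM)(sum)


def summarize(metrics, reduce_ops):
    """Reduce each metric's values with its reduce type; metrics without one stay raw."""
    summary = {}
    for name, values in metrics.items():
        reduce_type = reduce_ops.get(name)
        func = Reducer.get_reduce_func(reduce_type) if reduce_type is not None else None
        summary[name] = func(values) if func else values
    return summary
```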

26 Jul, 2021 1 commit

Benchmarks: Add Benchmark - Add NCCL performance benchmark (#113) · e083a598

Yuting Jiang authored Jul 26, 2021

**Description**
Add NCCL performance microbenchmark.

**Major Revision**
- Add microbenchmark, example, test, config for NCCL

23 Jul, 2021 2 commits

Benchmarks: Add Benchmark - Add IB Loopback performance benchmark. (#112) · b0c5addc

Yuting Jiang authored Jul 24, 2021

**Description**
Add RDMA Loopback performance microbenchmark.

**Major Revision**
- Add microbenchmark, example, test, config for RDMA Loopback

Benchmarks: Add Benchmark - Add disk performance benchmark (#132) · db297fb4

Ziyue Yang authored Jul 23, 2021

**Description**
Add disk performance microbenchmark.

**Major Revision**
- Add microbenchmark, example, test, config for disk performance.

**Minor Revision**
- Fix bugs in executor unit test related to default enabled tests.

20 Jul, 2021 1 commit

Benchmarks: Fix bug - fix bug in test_executor.py to test default enabled tests only (#133) · 477fbb0a

Ziyue Yang authored Jul 20, 2021

**Description**
Fix bug in tests/executor/test_executor.py.

**Major Revision**
- Test default enabled benchmarks only instead of all benchmarks.

13 Jul, 2021 2 commits

Benchmarks: Add Benchmark - Add memory bandwidth benchmark for cuda. (#114) · f9550bd6

Yuting Jiang authored Jul 13, 2021

Add microbenchmark, example, test, and config for CUDA memory performance; add cuda-samples (tagged with the CUDA version) as a git submodule and update the related makefile.

Utils: Code Revision - Update network common utils (#118) · 71c1617b

Yuting Jiang authored Jul 13, 2021


Update network common utils: add `get_ib_devices` and move `get_free_port` from test utils to network common utils.
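A common way to write helpers like these, as a sketch only; the actual utilities may differ. `get_free_port` binds to port 0 so the OS picks an unused port. Another process could still claim that port before it is reused. `get_ib_devices` here assumes InfiniBand devices are listed under `/sys/class/infiniband`.

```python
import socket
from pathlib import Path


def get_free_port():
    """Return a TCP port that was free at the time of the call."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(('', 0))  # port 0 lets the OS choose an unused ephemeral port
        return s.getsockname()[1]


def get_ib_devices(sysfs_dir='/sys/class/infiniband'):
    """List InfiniBand device names exposed under sysfs (empty list if none)."""
    root = Path(sysfs_dir)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir())
```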

09 Jul, 2021 1 commit

Bug bash - Merge fix from release/0.2 to main (#124) · 9c984c7e

guoshzhao authored Jul 09, 2021



* Bug Fix - Fix race condition issue for multi ranks (#117)

Fix race condition when multiple ranks rotate the same directory.

* Update pipeline for release branch (#122)

* Bug Fix - Fix bug when converting bool config to store_true argument. (#120)
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
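The pitfall behind converting a bool config to a `store_true` argument is that such flags take no value. A sketch of the conversion, assuming config parameters map directly to `--key` flags; the helper and example keys are hypothetical. The `bool` check must come before any numeric handling because `bool` is a subclass of `int` in Python.

```python
def config_to_argv(parameters):
    """Convert a parameter dict into argv, treating booleans as store_true flags."""
    argv = []
    for key, value in parameters.items():
        flag = '--' + key
        if isinstance(value, bool):
            # store_true flags take no value: emit the bare flag only when True.
            if value:
                argv.append(flag)
        elif isinstance(value, (list, tuple)):
            argv.append(flag)
            argv.extend(str(v) for v in value)
        else:
            argv.extend([flag, str(value)])
    return argv
```

Emitting `--pin_memory True` instead would make argparse reject `True` as an unrecognized extra argument.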

08 Jul, 2021 1 commit

Runner & Executor - Support AMD GPU (#119) · 7458f83a

Yifan Xiong authored Jul 09, 2021

Support both NVIDIA and AMD GPU and check GPU vendor during deployment and execution.

* Add GPU environment check in sb deploy.
* Check GPU vendor in executor.
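One way to check the GPU vendor is to look for driver device nodes. The NVIDIA driver creates `/dev/nvidiactl`, and ROCm exposes `/dev/kfd`. The sketch below is an assumption about the approach, not the executor's actual logic. The device root is a parameter so it can be tested without GPUs.

```python
from pathlib import Path


def detect_gpu_vendor(dev_root='/dev'):
    """Guess the GPU vendor from device nodes; returns 'nvidia', 'amd', or None."""
    dev = Path(dev_root)
    if (dev / 'nvidiactl').exists():   # NVIDIA driver control node
        return 'nvidia'
    if (dev / 'kfd').exists():         # ROCm kernel fusion driver node
        return 'amd'
    return None
```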

02 Jul, 2021 1 commit

Runner - Fetch benchmarks results on all nodes (#116) · fb7d4a73

Yifan Xiong authored Jul 02, 2021

Fetch benchmark results from all nodes; results are rsynced after each benchmark.
The results directory structure on control node is as follows:

```
outputs/
└── datetime
    ├── nodes
    │   └── node-0
    │       ├── benchmarks
    │       │   ├── benchmark-0
    │       │   │   ├── rank-0
    │       │   │   │   └── results.json
    │       └── sb-exec.log
    ├── sb-run.log
    └── sb.config.yaml
```
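The layout above can be expressed as a path helper plus an rsync pull, sketched here under assumptions: the datetime string format, the remote output directory, and both helper names are hypothetical. Only the directory structure comes from the tree.

```python
from pathlib import Path


def rank_result_path(output_root, run_datetime, node, benchmark, rank):
    """Where one rank's results.json lands on the control node."""
    return (Path(output_root) / run_datetime / 'nodes' / node / 'benchmarks'
            / benchmark / f'rank-{rank}' / 'results.json')


def rsync_fetch_command(node, remote_output_dir, local_node_dir):
    """Pull a node's output directory into its local nodes/<node> folder."""
    # A trailing slash on the source copies the directory's contents, not the directory.
    src = remote_output_dir.rstrip('/') + '/'
    return ['rsync', '-a', f'{node}:{src}', str(local_node_dir)]
```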

01 Jul, 2021 1 commit
- CLI - Support custom output directory (#110) · 7b0b0e9a
  Yifan Xiong authored Jul 01, 2021
```
* Support custom output directory.
* Update document.
```
29 Jun, 2021 1 commit
- Benchmarks: Fix Bug - Fix gemm kernel bug for nvidia v100. (#105) · 8ffaddfa
  guoshzhao authored Jun 29, 2021
```
* fix bug for nvidia v100
* hard code the supported dict for different arch.
```
28 Jun, 2021 1 commit
- Benchmarks: Code Revision - Replace torch.optim.AdamW with transformers.AdamW. (#106) · 9c748527
  guoshzhao authored Jun 28, 2021
```
* replace torch.optim.AdamW with transformers.AdamW.
```