Commits · 5d448eedbf29db6eec01927f0598872eb1fcab90 · tsoc / superbenchmark

25 Jul, 2022 1 commit

Fix unexpected base conversion when the result value is negative (#377) · 5d448eed

Yang Wang authored Jul 25, 2022

Fix an unexpected result value (`-0.125`) issue in ib traffic benchmark when encountering `-1` in raw output
* Check if the value is valid before the base conversion
* Add a test case to cover this situation
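
The validity check described above might look like the following minimal sketch. It assumes the raw value is reported in Gb/s and converted to GB/s by dividing by 8 (which is consistent with `-1` turning into `-0.125`), and that a negative reading marks an invalid measurement. The helper name `to_gbytes_per_sec` is hypothetical and not taken from the SuperBench source; how the benchmark actually reports an invalid value is not shown here.

```python
def to_gbytes_per_sec(raw_gbits):
    """Convert a raw Gb/s reading to GB/s, leaving invalid readings untouched.

    Hypothetical helper: assumes a negative reading (e.g. -1) marks an invalid result.
    """
    value = float(raw_gbits)
    if value < 0:
        # Invalid marker: return it unchanged instead of turning -1 into -0.125.
        return value
    return value / 8


print(to_gbytes_per_sec('-1'))
print(to_gbytes_per_sec('200'))
```

Checking before converting keeps the invalid marker recognizable downstream, whereas converting first produces a small negative number that looks like a real measurement.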

5d448eed

20 Jul, 2022 1 commit

Fix port conflict in ib loopback (#375) · 352ae0c9

Yifan Xiong authored Jul 20, 2022

Fix a potential port conflict caused by a race condition between time of check
and time of use, by keeping the port bound throughout.

Modify the function to resolve flake8 C901 while keeping the logic the same.
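
The general idea of holding a port instead of checking it and releasing it can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: `reserved_port` and `can_bind` are hypothetical names, and the sketch only shows that a port stays unavailable to others while the socket that bound it remains open.

```python
import contextlib
import socket


@contextlib.contextmanager
def reserved_port(host='127.0.0.1'):
    """Bind an OS-assigned TCP port and keep it bound for the duration of the block."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, 0))  # port 0 lets the OS choose a free port
        yield sock.getsockname()[1]
    finally:
        sock.close()


def can_bind(port, host='127.0.0.1'):
    """Return True if a new socket can bind the given port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
            return True
        except OSError:
            return False


with reserved_port() as port:
    # While the port is held, a second bind attempt is rejected.
    print(port, can_bind(port))
```

A check-then-release approach leaves a window in which another process can take the port; keeping the socket bound closes that window for as long as the socket is held.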

352ae0c9

09 Jul, 2022 1 commit

Fix issues in ib validation benchmark (#370) · b2875179

Yifan Xiong authored Jul 09, 2022

Fix several issues in ib validation benchmark:
* continue running when a timeout occurs in the middle, instead of aborting the whole MPI process
* make the timeout parameter configurable, with a default of 120 seconds
* avoid mixing stdio and iostream when printing to stdout
* set the default message size to 8M, which saturates IB in most cases
* fix the hostfile path issue so that the file can be found automatically in different cases

b2875179

29 Jun, 2022 1 commit

Fix issues in ib loopback benchmark (#369) · 620192a2

Yifan Xiong authored Jun 30, 2022

Fix several issues in ib loopback benchmark:
* use `--report_gbits` and divide by 8 to get GB/s; previous results were
  MiB/s / 1000
* use the ib_write_bw binary built in third_party instead of the one on the system path
* update the metric names so that different HCA indices share the same metric

620192a2

24 Jun, 2022 1 commit

Support multiple IB/GPU in ib validation (#363) · bfaa1c83

Yifan Xiong authored Jun 24, 2022

**Description**

Support running multiple IB/GPU devices simultaneously in the ib validation benchmark.

**Major Revisions**
- Revise ib_validation_performance.cc so that multiple processes per node can be used to launch multiple perftest commands simultaneously. For each node pair in the config, as many commands as there are processes per node run in parallel.
- Revise ib_validation_performance.py to correct file paths and adjust parameters to specify different NICs/GPUs/NUMA nodes.
- Fix env issues in Dockerfile for end-to-end test.
- Update ib-traffic configuration examples in config files.
- Update unit tests and docs accordingly.

Closes #326.

bfaa1c83

15 Jun, 2022 1 commit

Fix cmake and build issues (#360) · 60a3c743

Yifan Xiong authored Jun 15, 2022

**Description**

Fix cmake and build issues.

**Major Revision**

* Remove unnecessary boost build
* Remove user-agent for mlc
* Remove -j for third party to build each project in sequence
* Fix ansible collections installation path

60a3c743

01 Apr, 2022 1 commit

Benchmarks: Add Feature - Provide option to save raw data into file. (#333) · 6d895da8

guoshzhao authored Apr 01, 2022

**Description**
Use the config option `log_raw_data` to control whether the raw data is logged to a file. The default value is `no`. It can be set to `yes` for particular benchmarks, such as the NCCL/RCCL tests, to save their raw data to a file.

6d895da8

16 Mar, 2022 1 commit

Benchmarks: Add Feature - Add GPU-Burn as microbenchmark (#324) · ff51a3ce

rafsalas19 authored Mar 16, 2022

**Description**
Modifications adding GPU-Burn to SuperBench.
- added third party submodule
- modified Makefile to make gpu-burn binary
- added/modified microbenchmarks to add gpu-burn python scripts
- modified default and azure_ndv4 configs to add gpu-burn

ff51a3ce

24 Feb, 2022 1 commit
- Bug Fix - Fix P2P detection in gpu_copy (#317) · 01304706
  Ziyue Yang authored Feb 25, 2022
```
**Description**
Fix an invalid reference to the P2P detection result in gpu_copy.
```
  01304706
22 Feb, 2022 1 commit

Bug - Fix empty HIP_ARCHITECTURES issue in cmake>=3.21.0 (#315) · e0c49142

user4543 authored Feb 22, 2022

**Description**
Fix the issue that HIP_ARCHITECTURES is empty with cmake>=3.21.0.
Refer to https://github.com/ROCm-Developer-Tools/HIP/pull/2364

e0c49142

09 Feb, 2022 1 commit

Benchmarks: Revise Code - Eliminate NUMA binding for device-to-device tests in gpu_copy (#302) · 6cdf7595

Ziyue Yang authored Feb 09, 2022

**Description**
This commit removes NUMA binding for device-to-device tests because NUMA doesn't affect their performance, and revises the benchmark metrics accordingly.

6cdf7595

08 Feb, 2022 1 commit
- Benchmarks: Revise Code - Make data checking in gpu_copy optional (#301) · 682b2c12
  Ziyue Yang authored Feb 08, 2022
```
This commit makes data checking in gpu_copy optional, because it takes too long when the message size is large.
```
  682b2c12
07 Feb, 2022 1 commit

Benchmarks: Revise Code - Reduce result variance in gpu_copy benchmark (#298) · 85389055

Ziyue Yang authored Feb 07, 2022

**Description**
This commit does the following to reduce result variance in the gpu_copy benchmark:
1) Add a warmup phase to avoid timing instability caused by first-time CUDA kernel launch overhead;
2) Use CUDA events for timing instead of CPU timestamps;
3) Make data checking optional, since enabling it is not recommended in performance tests;
4) Enlarge the message size in the performance benchmark.

85389055

29 Jan, 2022 2 commits
- Benchmarks - Support T4 and A10 in GEMM benchmark (#294) · 3419447c
  Yifan Xiong authored Jan 29, 2022
```
Support T4 and A10 in GEMM benchmark.
```
  3419447c
- Benchmarks: Fix Bug - Fix GPU scan logic in gpu_copy (#296) · f3d05006
  Ziyue Yang authored Jan 29, 2022
```
Fix a bug in the GPU scan logic for bidirectional tests.
```
  f3d05006
24 Jan, 2022 1 commit

Bug: Fix insecure code issue of integer overflow in cublas function (#290) · 380ce400

Yuting Jiang authored Jan 24, 2022

**Description**
Fix the insecure-code issue in which a multiplication result is converted to a larger type only after the multiplication has been done.

**Major Revision**
- Use a cast to ensure that the multiplication is done in `long long`, which avoids the overflow.
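
The overflow that the cast prevents can be illustrated outside C++. The sketch below uses Python's `ctypes` to mimic storing a product in 32-bit and 64-bit signed integers. The function names and the dimensions are hypothetical, and the 32-bit wrap-around shown is only what commonly happens in practice, since signed overflow is undefined behavior in C and C++.

```python
import ctypes


def mul_as_int32(a, b):
    """Store the product in a 32-bit signed integer, which wraps on overflow."""
    return ctypes.c_int32(a * b).value


def mul_as_int64(a, b):
    """Store the product in a 64-bit signed integer (the long long case)."""
    return ctypes.c_int64(a * b).value


m, n = 50000, 50000  # illustrative dimensions, not taken from the commit
print(mul_as_int32(m, n))  # -1794967296: the product does not fit in 32 bits
print(mul_as_int64(m, n))  # 2500000000
```

In C++ the equivalent fix is to widen one operand before multiplying, so that the multiplication itself is carried out in the wider type rather than widening an already overflowed result.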

380ce400

21 Jan, 2022 1 commit

Benchmarks: Add Feature - Add bidirectional test support in gpu_copy benchmark (#285) · 74421ffe

Ziyue Yang authored Jan 21, 2022

**Description**
This commit adds bidirectional tests in gpu_copy benchmark for both device-host transfer and device-device transfer, and revises related tests.

74421ffe

19 Jan, 2022 1 commit
- Benchmarks: Add Feature - Add percentile metrics for ort and pytorch inference benchmarks (#283) · fd2bc9e0
  guoshzhao authored Jan 19, 2022
```
**Description**
Add 50th, 90th, 95th, 99th, and 99.9th percentile latency metrics for ORT and PyTorch inference benchmarks.
```
  fd2bc9e0
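
As a rough illustration of the percentile metrics in #283, the sketch below computes them from a list of per-step latencies with the nearest-rank method. The function names are hypothetical, and the benchmarks may well use a different percentile definition (for example one with interpolation), so treat this as an assumption rather than the actual implementation.

```python
import math


def percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError('values must not be empty')
    ordered = sorted(values)
    # Nearest rank: the smallest rank whose share of the data is at least p percent.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_percentiles(latencies, percentiles=(50, 90, 95, 99, 99.9)):
    """Map each requested percentile to its latency value."""
    return {p: percentile(latencies, p) for p in percentiles}


print(latency_percentiles(list(range(1, 101))))
```

Tail percentiles such as the 99th and 99.9th are only meaningful with enough samples; with 100 samples the 99.9th percentile is simply the maximum.
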
30 Dec, 2021 1 commit

Release - SuperBench v0.4.0 (#278) · ff563b66

Yifan Xiong authored Dec 30, 2021



__Description__

Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__

* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)
Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>

ff563b66

13 Dec, 2021 3 commits
- Benchmarks - Add transformers for TensorRT inference (#254) · cb8a3cfb
  Yifan Xiong authored Dec 13, 2021
```
Add transformers for TensorRT inference.
```
  cb8a3cfb
- Docs - Add benchmark metrics for cpu-memory-bw-latency (#264) · 10012a0a
  Ziyue Yang authored Dec 13, 2021
```
**Description**
Add benchmark metrics for cpu-memory-bw-latency.
```
  10012a0a
- Benchmarks: Add Benchmark - Add mlc benchmark to superbench (#216) · b590409e
  Hossein Pourreza authored Dec 12, 2021
```
**Description**
Add mlc memory bandwidth and latency micro benchmark to Superbench.

**Major Revision**
- Add mlc benchmark with test and example files
```
  b590409e
10 Dec, 2021 1 commit

Benchmarks: Add Benchmark - Add ONNXRuntime inference benchmark based on ORT python API (#245) · 4d85630a

guoshzhao authored Dec 10, 2021

**Description**
Add ONNXRuntime inference benchmark based on ORT python API.

**Major Revision**
- Add `ORTInferenceBenchmark` class to export pytorch model to onnx model and do inference
- Add tests and example for `ort-inference` benchmark
- Update the introduction docs.

4d85630a

09 Dec, 2021 1 commit
- Benchmarks: Unify metric names of benchmarks (#252) · 9f56b219
  Yuting Jiang authored Dec 09, 2021
```
**Description**
Unify metric names of benchmarks.
```
  9f56b219
15 Nov, 2021 1 commit

Benchmarks: Add Feature - Extend the device manager utility to support more functions. (#239) · cc70f9c1

guoshzhao authored Nov 15, 2021

**Description**
Rename `nvidia_helper` utility as `device_manager` module and support more functions:
```
device_manager.get_device_count()
device_manager.get_device_utilization(idx)
device_manager.get_device_temperature(idx)
device_manager.get_device_power_limit(idx)
device_manager.get_device_memory(idx)
device_manager.get_device_row_remapped_info(idx)
device_manager.get_device_ecc_error(idx)
```

cc70f9c1

12 Nov, 2021 1 commit

Benchmarks - Add TensorRT inference benchmark (#236) · 8a00c8a0

Yifan Xiong authored Nov 12, 2021

__Description__

Add TensorRT inference benchmark for torchvision models.

__Major Revision__
- Measure TensorRT inference performance.

8a00c8a0

09 Nov, 2021 1 commit

Benchmarks: Add Benchmark - Add ib traffic validation distributed benchmark (#215) · 54919424

Yuting Jiang authored Nov 10, 2021

**Description**
Add ib traffic validation distributed benchmark.

**Major Revision**
- Add ib traffic validation distributed benchmark, example and test

54919424

30 Oct, 2021 1 commit

Benchmarks: Add Feature - Add CPU-initiated copy and dtod support to gpu-sm-copy benchmark (#230) · 008e0fe1

Ziyue Yang authored Oct 30, 2021

**Description**
This commit does the following:
1) Adds CPU-initiated copy benchmark;
2) Adds dtod benchmark;
3) Supports scanning NUMA nodes and GPUs inside the benchmark program;
4) Changes the name of gpu-sm-copy to gpu-copy.

008e0fe1

22 Oct, 2021 1 commit

Benchmarks: Add Benchmark - Add gpcnet microbenchmark (#229) · 6003f2c2

Yuting Jiang authored Oct 22, 2021

**Description**
Add gpcnet microbenchmark

**Major Revision**
- add 2 microbenchmarks for gpcnet: gpc-network-test and gpc-network-load-test
- add related test and example file

6003f2c2

21 Oct, 2021 1 commit

Benchmarks: Add Benchmark - Add ib validation tool source code (#191) · 2664850a

Yuting Jiang authored Oct 21, 2021

**Description**
Add IB validation tool source code. The IB validation tool flexibly validates IB traffic of different patterns across multiple nodes.

**Major Revision**
- Add ib validation tool source code
- Add cmake file to build the source code

2664850a

12 Oct, 2021 1 commit

Benchmarks: Add Benchmark - Add tcp connectivity validation microbenchmark (#217) · 49cc8f9a

Yuting Jiang authored Oct 13, 2021

**Description**
Add a tcp connectivity validation microbenchmark, which validates TCP connectivity between the current node and several nodes in the hostfile.

**Major Revision**
- Add tcp connectivity validation microbenchmark and related test, example
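
A connectivity check of this kind can be sketched with the standard library alone. The example below is a minimal illustration, not the benchmark's implementation: `tcp_reachable` and `check_hosts` are hypothetical names, and the real microbenchmark may rely on a different tool and report additional metrics.

```python
import socket


def tcp_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_hosts(hosts, port, timeout=1.0):
    """Check each host and report which ones accepted a connection."""
    return {host: tcp_reachable(host, port, timeout) for host in hosts}
```

For example, `check_hosts(['node0', 'node1'], 22)` would return a mapping from each host name to `True` or `False`; the host names and port here are placeholders.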

49cc8f9a

26 Sep, 2021 1 commit

Release - SuperBench v0.3.0 (#212) · dfbd70b1

Yifan Xiong authored Sep 26, 2021



**Description**

Cherry-pick bug fixes from v0.3.0 to main.

**Major Revisions**
* Docs - Upgrade version and release note (#209)
* Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
* Benchmarks: Update - Update benchmarks in configuration file (#208)
* CI/CD - Update GitHub Action VM (#211)
* Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
* CI/CD - Fix bug in build image for push event (#205)
* Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
* Tool: Fix bug - Fix function naming issue in system info  (#200)
* CI/CD - Push images in GitHub Action (#202)
* Bug - Fix torch.distributed command for single node (#201)
* CLI - Integrate system info for node (#199)
* Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
* CI/CD - Add ROCm image build in GitHub Actions (#194)
* Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
* Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
* Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
* Bug - Revise 'docker run' in sb deploy (#195)
* Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190)
Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>

dfbd70b1

03 Sep, 2021 1 commit

Benchmarks: Code Revision - Revise arguments of nccl/rccl to support mpi mode... · 60762518

Yuting Jiang authored Sep 03, 2021

Benchmarks: Code Revision - Revise arguments of nccl/rccl to support mpi mode and rename metric (#189)

**Description**
Revise the arguments of nccl/rccl to support mpi mode (mpi could not run in nccl/rccl because multiple operators ran in sequence without a barrier) and rename the metric.

**Major Revision**
- revise argument operators to be a single one

**Minor Revision**
- rename metric to remove benchmark name info
- change argument ngpus default value to be 1

60762518

02 Sep, 2021 1 commit
- Benchmarks: Fix bug - Fix missing key error in disk performance benchmark (#188) · b79e2845
  Ziyue Yang authored Sep 02, 2021
```
**Description**
This commit fixes the missing-key error for 'percentile' when parsing the FIO result.
```
  b79e2845
31 Aug, 2021 2 commits

Benchmarks: Code Revision - Revise metric name generation and default config... · 024a870b

Ziyue Yang authored Aug 31, 2021

Benchmarks: Code Revision - Revise metric name generation and default config for disk performance benchmark (#175)

**Description**
This commit revises disk performance benchmark, including:
1) Add missing benchmark name in default config;
2) Avoid using reserved character ':' in metric name.

024a870b

Benchmarks: Code Revision - Revise subprocess invoke (#178) · 8cd264fd
guoshzhao authored Aug 31, 2021
```
**Description**
Package the frequently used subprocess invocation into a function.
```
8cd264fd
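
A wrapper of the kind described in #178 might look like the sketch below. The name `run_command` and its exact behavior are assumptions made for illustration; the function added by the commit may differ in name, arguments, and error handling.

```python
import subprocess


def run_command(command):
    """Run a shell command and return the CompletedProcess with captured text output.

    Hypothetical helper: stderr is merged into stdout, and a non-zero exit code
    is returned to the caller rather than raised as an exception.
    """
    return subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        shell=True,
        check=False,
        universal_newlines=True,
    )


result = run_command('echo hello')
print(result.returncode, result.stdout.strip())
```

Centralizing the call keeps the capture and error-handling options identical across benchmarks, so callers only inspect `returncode` and `stdout`.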

30 Aug, 2021 4 commits

Benchmarks: Add Benchmark - Add GPU SM copy benchmark (#169) · b97197f0
Ziyue Yang authored Aug 30, 2021
```
**Description**
This commit adds gpu_sm_copy benchmark and related tests.
```
b97197f0

Benchmarks: Fix Bug - Remove ib device port info in command to fix bug of ib loopback (#173) · 95c9fc95

Yuting Jiang authored Aug 30, 2021

**Description**
Remove IB device port info in command to fix bug of IB loopback.

**Major Revision**
- Remove IB device port info in command to fix bug of IB loopback

95c9fc95

Benchmarks: Add Benchmark - Add gemm flops microbenchmark for amd (#152) · f3d53c3d

Yuting Jiang authored Aug 30, 2021

**Description**
Add gemm flops microbenchmark for amd.

**Major Revision**
- Add gemm flops microbenchmark for amd.
- Add related example and test file.

f3d53c3d

Benchmarks: Code Revision - Extract base class for gemm flops microbenchmark (#165) · b0df66f7

Yuting Jiang authored Aug 30, 2021

**Description**
Extract base class for gemm flops microbenchmark.

**Major Revision**
- extract base class for gemm flops microbenchmark and add related test.
- revise gemm_flops_performance for cuda.

b0df66f7