Commits · e75c105924108bdc8f191ed7387de091f258e058 · tsoc / superbenchmark

25 Aug, 2022 2 commits

Enhance parameter parsing to allow spaces in value (#397) · e75c1059
Yifan Xiong authored Aug 25, 2022
```
Enhance parameter parsing to support string like `"--arg1 value --arg2 'a long string with spaces'"`.
```
e75c1059

Enable latency test in ib traffic validation distributed benchmark (#396) · dda692da

Yang Wang authored Aug 25, 2022

Enable ib latency test in ib traffic validation distributed benchmark

As Perftest supports CUDA only in BW tests (Refer [perftest source code](https://github.com/linux-rdma/perftest/blob/23f7f8a56892bd9e00b6a4e8bd4bbfdbe122af47/src/perftest_parameters.c#L1652))
Remove `--use_cuda` option from cmd prefix if the test command is bw related

dda692da

04 Aug, 2022 1 commit

Gracefully exit when timeout (#383) · 9b8df883

Yifan Xiong authored Aug 04, 2022

* Gracefully exit when timeout, add corresponding log and return code.
* Set minimum timeout to 1 minute and enlarge Ansible timeout.

9b8df883

26 Jul, 2022 1 commit

Support topo-aware IB performance validation (#373) · ef4d6574

Jie Zhang authored Jul 26, 2022



* Support topo-aware IB performance validation

Add a new pattern `topo-aware`, so the user can run IB performance
test based on VM's topology information. This way, the user can
validate the IB performance across VM pairs with different distance
as a quick test instead of pair-wise test.

To run with topo-aware pattern, user needs to specify three required
(and two optional) parameters in YAML config file:
--pattern	topo-aware
--ibstat	path to ibstat output
--ibnetdiscover	path to ibnetdiscover output
--min_dist	minimum distance of VM pairs (optional, default 2)
--max_dist	maximum distance of VM pairs (optional, default 6)

The newly added topo_aware module then parses the topology
information, builds a graph, and generates the VM pairs with
the specified distance (# hops).

The specified IB test will then be running across these
generated VM pairs.
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Add description about topology aware ib traffic tests
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Add unit test to verify generated topology aware config file

This commit adds unit test to verify the generated topology aware
config file is correct. To do so, four new data files are added in
order to invoke gen_topo_aware_config function to generate topology
aware config file, then compares it with the expected config file.
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Fix lint issue on Azure pipeline
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

ef4d6574

25 Jul, 2022 1 commit

Fix unexpected base conversion when the result value is negative (#377) · 5d448eed

Yang Wang authored Jul 25, 2022

Fix an unexpected result value (`-0.125`) issue in ib traffic benchmark when encountering `-1` in raw output
* Check if the value is valid before the base conversion
* Add a test case to cover this situation

5d448eed

20 Jul, 2022 1 commit

Fix port conflict in ib loopback (#375) · 352ae0c9

Yifan Xiong authored Jul 20, 2022

Fix potential port conflict due to race condition between time-to-check
to time-to-use, by binding the port all through.

Modify the function to resolve flake8 C901 while keeping the logic same.

352ae0c9

09 Jul, 2022 1 commit

Fix issues in ib validation benchmark (#370) · b2875179

Yifan Xiong authored Jul 09, 2022

Fix several issues in ib validation benchmark:
* continue running when timeout in the middle, instead of aborting whole mpi process
* make timeout parameter configurable, set default to 120 seconds
* avoid mixture of stdio and iostream when print to stdout
* set default message size to 8M which will saturate ib in most cases
* fix hostfile path issue so that it can be auto found in different cases

b2875179

29 Jun, 2022 1 commit

Fix issues in ib loopback benchmark (#369) · 620192a2

Yifan Xiong authored Jun 30, 2022

Fix several issues in ib loopback benchmark:
* use `--report_gbits` and divide by 8 to get GB/s, previous results are
  MiB/s / 1000
* use the ib_write_bw binary built in third_party instead of system path
* update the metrics name so that different hca indices have same metric

620192a2

24 Jun, 2022 1 commit

Support multiple IB/GPU in ib validation (#363) · bfaa1c83

Yifan Xiong authored Jun 24, 2022

**Description**

Support multiple IB/GPU devices run simultaneously in ib validation benchmark.

**Major Revisions**
- Revise ib_validation_performance.cc so that multiple processes per node could be used to launch multiple perftest commands simultaneously. For each node pair in the config, number of processes per node will run in parallel.
- Revise ib_validation_performance.py to correct file paths and adjust parameters to specify different NICs/GPUs/NUMA nodes.
- Fix env issues in Dockerfile for end-to-end test.
- Update ib-traffic configuration examples in config files.
- Update unit tests and docs accordingly.

Closes #326.

bfaa1c83

29 Apr, 2022 1 commit

Release - SuperBench v0.5.0 (#350) · 6681c720

Yifan Xiong authored Apr 29, 2022



**Description**

Cherry-pick  bug fixes from v0.5.0 to main.

**Major Revisions**

* Bug - Force to fix ort version as '1.10.0' (#343)
* Bug - Support no matching rules and unify the output name in result_summary (#345)
* Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344)
* Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342)
* Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347)
* Docs - Upgrade version and release note (#348)
Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>

6681c720

11 Apr, 2022 1 commit

Benchmarks: Add Benchmark - Add FAMBench based on docker benchmark (#338) · 80dcc8aa

guoshzhao authored Apr 11, 2022

**Description**
Integrate FAMBench into superbench based on docker implementation:
https://github.com/facebookresearch/FAMBench

The script to run all benchmarks is:
https://github.com/facebookresearch/FAMBench/blob/main/benchmarks/run_all.sh

80dcc8aa

01 Apr, 2022 1 commit

Benchmarks: Add Feature - Provide option to save raw data into file. (#333) · 6d895da8

guoshzhao authored Apr 01, 2022

**Description**
Use config `log_raw_data` to control whether log the raw data into file or not. The default value is `no`. We can set it as `yes` for some particular benchmarks to save the raw data into file, such as NCCL/RCCL test.

6d895da8

16 Mar, 2022 1 commit

Benchmarks: Add Feature - Add GPU-Burn as microbenchmark (#324) · ff51a3ce

rafsalas19 authored Mar 16, 2022

**Description**
Modifications adding GPU-Burn to SuperBench.
- added third party submodule
- modified Makefile to make gpu-burn binary
- added/modified microbenchmarks to add gpu-burn python scripts
- modified default and azure_ndv4 configs to add gpu-burn

ff51a3ce

08 Feb, 2022 1 commit
- Benchmarks: Revise Code - Make data checking in gpu_copy optional (#301) · 682b2c12
  Ziyue Yang authored Feb 08, 2022
```
This commit makes data checking in gpu_copy optional, because it will take too long time if message size is large.
```
  682b2c12
07 Feb, 2022 1 commit

Benchmarks: Revise Code - Reduce result variance in gpu_copy benchmark (#298) · 85389055

Ziyue Yang authored Feb 07, 2022

**Description**
This commit does the following to optimize result variance in gpu_copy benchmark:
1) Add warmup phase for gpu_copy benchmark to avoid timing instability caused by first-time CUDA kernel launch overhead;
2) Use CUDA events for timing instead of CPU timestamps;
3) Make data checking an option that is not preferred to be enabled in performance test;
4) Enlarge message size in performance benchmark.

85389055

29 Jan, 2022 1 commit
- Benchmarks - Support T4 and A10 in GEMM benchmark (#294) · 3419447c
  Yifan Xiong authored Jan 29, 2022
```
Support T4 and A10 in GEMM benchmark.
```
  3419447c
28 Jan, 2022 1 commit

Benchmarks: Add Feature - Sync the E2E training results among all workers for each step. (#287) · d03d110f

guoshzhao authored Jan 28, 2022

**Description**
Please write a brief description and link the related issue if have.

**Major Revision**
- Sync (do allreduce max) the E2E training results among all workers.
- Avoid using ':0' in metric name if there has only one rank having output.

d03d110f

21 Jan, 2022 1 commit

Benchmarks: Add Feature - Add bidirectional test support in gpu_copy benchmark (#285) · 74421ffe

Ziyue Yang authored Jan 21, 2022

**Description**
This commit adds bidirectional tests in gpu_copy benchmark for both device-host transfer and device-device transfer, and revises related tests.

74421ffe

19 Jan, 2022 1 commit
- Benchmarks: Add Feature - Add percentile metrics for ort and pytorch inference benchmarks (#283) · fd2bc9e0
  guoshzhao authored Jan 19, 2022
```
**Description**
Add 50th, 90th, 95th, 99th, 99.9th latency metrics for ORT and pytorch inference benchmarks.
```
  fd2bc9e0
18 Jan, 2022 1 commit

CLI - Add command sb benchmark [list,list-parameters] (#279) · f7ffc545

Yifan Xiong authored Jan 18, 2022

__Description__

Add command `sb benchmark list` and `sb benchmark list-parameters` to support listing all optional parameters for benchmarks.

<details>
<summary>Examples</summary>
<pre>
$ sb benchmark list -n [a-z]+-bw -o table
Result
--------
mem-bw
nccl-bw
rccl-bw
</pre>
<pre>
$ sb benchmark list-parameters -n mem-bw
=== mem-bw ===
optional arguments:
  --bin_dir str         Specify the directory of the benchmark binary.
  --duration int        The elapsed time of benchmark in seconds.
  --mem_type str [str ...]
                        Memory types to benchmark. E.g. htod dtoh dtod.
  --memory str          Memory argument for bandwidthtest. E.g. pinned unpinned.
  --run_count int       The run count of benchmark.
  --shmoo_mode          Enable shmoo mode for bandwidthtest.
default values:
{'bin_dir': None,
 'duration': 0,
 'mem_type': ['htod', 'dtoh'],
 'memory': 'pinned',
 'run_count': 1}
</pre>
</details>

__Major Revisions__
* Add `sb benchmark list` to list benchmarks matching given name.
* Add `sb benchmark list-parameters` to list parameters for benchmarks which match given name.

__Minor Revisions__
* Sort format help text for argparse.

f7ffc545

30 Dec, 2021 1 commit

Release - SuperBench v0.4.0 (#278) · ff563b66

Yifan Xiong authored Dec 30, 2021



__Description__

Cherry-pick  bug fixes from v0.4.0 to main.

__Major Revisions__

* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)
Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>

ff563b66

13 Dec, 2021 4 commits
- Benchmarks - Add transformers for TensorRT inference (#254) · cb8a3cfb
  Yifan Xiong authored Dec 13, 2021
```
Add transformers for TensorRT inference.
```
  cb8a3cfb
- Docs - Add benchmark metrics for cpu-memory-bw-latency (#264) · 10012a0a
  Ziyue Yang authored Dec 13, 2021
```
**Description**
Add benchmark metrics for cpu-memory-bw-latency.
```
  10012a0a
- Benchmarks: Fix Comment - Correct benchmark name in test_gpu_copy_bw_performance.py #263 · b6781968
  Ziyue Yang authored Dec 13, 2021
```
**Description**
Benchmarks: Fix Comment - Correct benchmark name in test_gpu_copy_bw_performance.py.
```
  b6781968
- Benchmarks: Add Benchmark - Add mlc benchmark to superbench (#216) · b590409e
  Hossein Pourreza authored Dec 12, 2021
```
**Description**
Add mlc memory bandwidth and latency micro benchmark to Superbench.

**Major Revision**
- Add mlc benchmark with test and example files
```
  b590409e
10 Dec, 2021 2 commits

Benchmarks: Add Benchmark - Add ONNXRuntime inference benchmark based on ORT python API (#245) · 4d85630a

guoshzhao authored Dec 10, 2021

**Description**
Add ONNXRuntime inference benchmark based on ORT python API.

**Major Revision**
- Add `ORTInferenceBenchmark` class to export pytorch model to onnx model and do inference
- Add tests and example for `ort-inference` benchmark
- Update the introduction docs.

4d85630a

Benchmarks: Fix Bug - Set reduce_op type for metirc return_code (#261) · afea9913
guoshzhao authored Dec 10, 2021
```
**Description**
Set the `reduce_op` type for metirc `return_code` as `None`.
```
afea9913

09 Dec, 2021 1 commit
- Benchmarks: Unify metric names of benchmarks (#252) · 9f56b219
  Yuting Jiang authored Dec 09, 2021
```
**Description**
Unify metric names of benchmarks.
```
  9f56b219
07 Dec, 2021 1 commit
- Benchmarks: Add Feature - Add return_code metric into result (#256) · 44f0270e
  guoshzhao authored Dec 07, 2021
```
**Description**
Add return_code metric into result and revise unit tests.
```
  44f0270e
02 Dec, 2021 1 commit

Benchmarks: Add Feature - Add 'ignore_invalid' option when register benchmarks. (#247) · 371fd61c

guoshzhao authored Dec 02, 2021

**Description**
If `ignore_invalid` is True, and 'required' arguments are not set when register the benchmark, the arguments should be provided by user in config and skip the arguments checking.

371fd61c

15 Nov, 2021 1 commit

Benchmarks: Add Feature - Extend the device manager utility to support more functions. (#239) · cc70f9c1

guoshzhao authored Nov 15, 2021

**Description**
Rename `nvidia_helper` utility as `device_manager` module and support more functions:
```
device_manager.get_device_count()
device_manager.get_device_utilization(idx)
device_manager.get_device_temperature(idx)
device_manager.get_device_power_limit(idx)
device_manager.get_device_memory(idx)
device_manager.get_device_row_remapped_info(idx)
device_manager.get_device_ecc_error(idx)
```

cc70f9c1

12 Nov, 2021 1 commit

Benchmarks - Add TensorRT inference benchmark (#236) · 8a00c8a0

Yifan Xiong authored Nov 12, 2021

__Description__

Add TensorRT inference benchmark for torchvision models.

__Major Revision__
- Measure TensorRT inference performance.

8a00c8a0

09 Nov, 2021 1 commit

Benchmarks: Add Benchmark - Add ib traffic validation distributed benchmark (#215) · 54919424

Yuting Jiang authored Nov 10, 2021

**Description**
Add ib traffic validation distributed benchmark.

**Major Revision**
- Add ib traffic validation distributed benchmark, example and test

54919424

30 Oct, 2021 1 commit

Benchmarks: Add Feature - Add CPU-initiated copy and dtod support to gpu-sm-copy benchmark (#230) · 008e0fe1

Ziyue Yang authored Oct 30, 2021

**Description**
This commit does the following:
1) Adds CPU-initiated copy benchmark;
2) Adds dtod benchmark;
3) Support scanning NUMA nodes and GPUs inside the benchmark program;
4) Change the name of gpu-sm-copy to gpu-copy.

008e0fe1

27 Oct, 2021 1 commit
- Benchmarks: Add Benchmark - Add onnx model benchmarks based on docker image. (#227) · e98a6812
  guoshzhao authored Oct 27, 2021
```
Add RocmOnnxModelBenchmark class to run benchmarks packaged in superbench/benchmark:rocm4.3.1-onnxruntime1.9.0
```
  e98a6812
22 Oct, 2021 2 commits

Benchmarks: Add Benchmark - Add gpcnet microbenchmark (#229) · 6003f2c2

Yuting Jiang authored Oct 22, 2021

**Description**
Add gpcnet microbenchmark

**Major Revision**
- add 2 microbenmark for gpcnet, gpc-network-test, gpc-network-load-test
- add related test and example file

6003f2c2

Benchmarks: Add Feature - Support AMD and CUDA platform for DockerBenchmark. (#226) · f841c8f4
guoshzhao authored Oct 22, 2021
```
Description
Add CudaDockerBenchmark and RocmDockerBenchmark to support amd and cuda platform for DockerBenchmark.
```
f841c8f4

21 Oct, 2021 1 commit
- revise the term onnx to onnxruntime. (#232) · 455ad1f8
  guoshzhao authored Oct 21, 2021
```
**Description**
Revise the all the term `onnx` to `onnxruntime`.
```
  455ad1f8
12 Oct, 2021 1 commit

Benchmarks: Add Benchmark - Add tcp connectivity validation microbenchmark (#217) · 49cc8f9a

Yuting Jiang authored Oct 13, 2021

**Description**
Add tcp connectivity validation microbenchmark which is to validate TCP connectivity between current node and several nodes in the hostfile.

**Major Revision**
- Add tcp connectivity validation microbenchmark and related test, example

49cc8f9a

27 Sep, 2021 1 commit
- Benchmarks: Add Feature - Add option to use fp32 instead of tf32 (#213) · f9442456
  guoshzhao authored Sep 28, 2021
```
**Description**
Add option `force_fp32` to use fp32 instead of tf32, only takes effect on Ampere or newer GPUs.
```
  f9442456