Commits · 6e50f0228f31eab876f5b17f45687f8743c7af5e · tsoc / superbenchmark

22 Dec, 2023 1 commit
- Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops (#604) · 6e50f022
  Yuting Jiang authored Dec 22, 2023
```
**Description**
Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops
```
  6e50f022
16 Dec, 2023 1 commit

Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version (#596) · b0cc8e17

Ziyue Yang authored Dec 16, 2023



**Description**
Make metrics of dist-inference-cpp aligned with PyTorch version.

---------
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

b0cc8e17

15 Dec, 2023 1 commit
- Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests (#595) · 2c039b57
  Ziyue Yang authored Dec 15, 2023
```
**Description**
Add data type option for NCCL and RCCL tests.
```
  2c039b57
13 Dec, 2023 1 commit
- Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark (#591) · 27374ad5
  Ziyue Yang authored Dec 14, 2023
```
**Description**
Add in-place metrics for NCCL/RCCL benchmark for latency measurement.
```
  27374ad5
10 Dec, 2023 1 commit
- Benchmarks: Microbenchmark - Add distributed inference benchmark cpp implementation (#586) · 719a427f
  Ziyue Yang authored Dec 11, 2023
```
**Description**
Add distributed inference benchmark cpp implementation.
```
  719a427f
08 Dec, 2023 1 commit

Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588) · 4fa60be7

Ziyue Yang authored Dec 08, 2023

**Description**
Add one-to-all, all-to-one, and all-to-all support to
gpu_copy_bw_performance, and fix a performance bug in gpu_copy.

4fa60be7

07 Dec, 2023 1 commit
- Benchmarks: Add benchmark: Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark (#582) · dd5a6329
  Yuting Jiang authored Dec 07, 2023
```
**Description**
Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark
```
  dd5a6329
05 Dec, 2023 1 commit
- Benchmarks: Micro benchmark - Add graph mode in NCCL/RCCL benchmarks for latency metrics (#583) · 254ea7fe
  Ziyue Yang authored Dec 05, 2023
```
**Description**
Revise NCCL/RCCL benchmarks to use graph mode and add latency metrics.
```
  254ea7fe
04 Dec, 2023 1 commit

Benchmarks: micro benchmark - Support cpu-gpu and gpu-cpu in ib-validation (#581) · 9ae8c670

Yuting Jiang authored Dec 04, 2023

**Description**
Benchmarks: micro benchmark - Support cpu-gpu and gpu-cpu in
ib-validation

**Major Revision**
- Support cpu-gpu and gpu-cpu in ib-validation


**Minor Revision**
- Support multiple message sizes, multiple directions, and multiple ib commands in
ib-validation

9ae8c670

22 Nov, 2023 2 commits
- Benchmarks: Micro benchmark - add initialization options for rocm gemm flops (#578) · 2235e084
  Yuting Jiang authored Nov 22, 2023
```
**Description**
add initialization options for rocm gemm flops.
```
  2235e084
- Benchmarks: Micro benchmark - Add hipBLASLt function benchmark (#576) · 79089b65
  Yuting Jiang authored Nov 22, 2023
```
**Description**
Add hipblaslt function benchmark and rebase cublaslt function benchmark.
```
  79089b65
20 Nov, 2023 1 commit
- Benchmarks: micro benchmarks - add int8 support for cublaslt function (#574) · f53d941a
  Yuting Jiang authored Nov 20, 2023
```
**Description**
add int8 support for cublaslt function.
```
  f53d941a
14 Nov, 2023 1 commit

Bug Fix - remove cp ptx file command in gpu burn test (#567) · c7800bb8

Yuting Jiang authored Nov 14, 2023

**Description**
Remove the `cp` of the ptx file in the gpu burn test, since the command
already runs inside the `self.args.bin_dir` directory.


https://github.com/microsoft/superbenchmark/blob/d246bab430adeb461072918a551b2e2b68c9bce5/superbench/benchmarks/micro_benchmarks/micro_base.py#L183

c7800bb8

06 Jul, 2023 1 commit
- Benchmarks: micro benchmarks - add python code for DirectXGPUEncodingLatency (#548) · e8ac0b1e
  Yuting Jiang authored Jul 06, 2023
```
**Description**
add python code for DirectXGPUEncodingLatency.
```
  e8ac0b1e
05 Jul, 2023 3 commits
- Benchmarks: micro benchmarks - add python code for DirectXGPUCopy (#546) · c8c079c2
  Yuting Jiang authored Jul 06, 2023
```
**Description**
add python code for DirectXGPUCopy.
```
  c8c079c2
- Benchmarks: micro benchmarks - add python code for DirectXGPUMemBw (#547) · af4cfd5b
  Yuting Jiang authored Jul 05, 2023
```
**Description**
add python code for DirectXGPUMemBw.
```
  af4cfd5b
- Benchmarks: micro benchmarks - add python code for DirectXGPUCoreFlops (#542) · f1d608ae
  Yuting Jiang authored Jul 05, 2023
```
**Description**
add python code for DirectX core flops and init DirectX test pipeline.

**Major Revision**
- add python code for DirectX core flops 
- init DirectX test pipeline


**Minor Revision**
- add test for DirectX core flops
```
  f1d608ae
30 Jun, 2023 2 commits

Benchmarks: microbenchmark - add auto selecting algorithm support for cudnn functions (#540) · 97f7b1df

Yuting Jiang authored Jun 30, 2023

**Description**
add auto selecting algorithm support for cudnn functions.

**Major Revision**
- add auto selecting algorithm support for cudnn functions in source
code
- add 'auto_algo' option in benchmark
- add related test

97f7b1df

Benchmarks - Update result parsing in tensorrt inference (#541) · 7184bdd1
Yifan Xiong authored Jun 30, 2023
```
* Update result parsing for newer tensorrt versions
* Update arguments when loading torchvision models
```
7184bdd1

28 Apr, 2023 1 commit

ModelBenchmarks - Fix early stop logic due to num_steps. (#522) · f38a9829

guoshzhao authored Apr 28, 2023

**Description**
Model benchmarks can stop on either the `num_steps` or the `duration` config;
each takes effect only when it is set to a value greater than 0.
If both are greater than 0, the benchmark stops at whichever condition is
reached first.
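
The stop rule described above can be sketched as follows. This is only an illustration, not the code from the PR: the function name `should_stop` and its parameters other than `num_steps` and `duration` are hypothetical.

```python
import time


def should_stop(curr_step, start_time, num_steps=0, duration=0, now=None):
    """Return True once any enabled stop condition has been reached.

    A condition is enabled only when its config value is greater than 0.
    `now` is injectable so the logic can be checked without sleeping.
    """
    now = time.monotonic() if now is None else now
    # Step-based stop: only active when num_steps > 0.
    if num_steps > 0 and curr_step >= num_steps:
        return True
    # Time-based stop: only active when duration (seconds) > 0.
    if duration > 0 and (now - start_time) >= duration:
        return True
    return False
```

With `num_steps=10` and `duration=3`, a run that is at step 5 after 3.5 seconds stops on the duration, while the same run after 2 seconds keeps going.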

f38a9829

14 Apr, 2023 1 commit

Release - SuperBench v0.8.0 (#517) · 51761b3a

Yifan Xiong authored Apr 14, 2023



**Description**

Cherry-pick bug fixes from v0.8.0 to main.

**Major Revisions**

* Monitor - Fix the cgroup version checking logic (#502)
* Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
* Fix wrong torch usage in communication wrapper for Distributed
Inference Benchmark (#505)
* Analyzer: Fix bug in python3.8 due to pandas api change (#504)
* Bug - Fix bug to get metric from cmd when error happens (#506)
* Monitor - Collect realtime GPU power when benchmarking (#507)
* Add num_workers argument in model benchmark (#511)
* Remove unreachable condition when write host list (#512)
* Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
* Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
* Docs - Upgrade version and release note (#508)
Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

51761b3a

24 Mar, 2023 1 commit

Benchmarks - Add distributed inference benchmark (#493) · 8daef211

Ziyue Yang authored Mar 24, 2023



**Description**
This PR adds a micro-benchmark of distributed model inference workloads.

**Major Revision**
- Add a new micro-benchmark dist-inference.
- Add corresponding example and unit tests.
- Update configuration files to include this new micro-benchmark.
- Update micro-benchmark README.

---------
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>

8daef211

22 Mar, 2023 1 commit
- Benchmark - Support batch/shape range in cublaslt gemm (#494) · dbeba805
  Yifan Xiong authored Mar 22, 2023
```
Support batch and shape range with multiplication factors in cublaslt
gemm benchmark.
```
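
A range with a multiplication factor could be expanded as in the sketch below. The `start:stop:factor` string syntax and the function name `expand_range` are assumptions made for illustration; the commit message does not state the actual option format.

```python
def expand_range(spec):
    """Expand 'start:stop:factor' into start, start*factor, ... up to stop.

    A plain integer such as '4096' expands to a single value.
    The 'start:stop:factor' syntax is hypothetical.
    """
    parts = [int(p) for p in spec.split(":")]
    if len(parts) == 1:
        return parts
    if len(parts) != 3:
        raise ValueError(f"expected 'value' or 'start:stop:factor', got {spec!r}")
    start, stop, factor = parts
    if start <= 0 or factor <= 1:
        raise ValueError("start must be positive and factor greater than 1")
    values = []
    while start <= stop:
        values.append(start)
        start *= factor
    return values
```

Under these assumptions, `expand_range("16:128:2")` yields `[16, 32, 64, 128]`.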
  dbeba805
21 Mar, 2023 1 commit

Adding HPL benchmark (#482) · 655bd0aa

rafsalas19 authored Mar 21, 2023



**Description**

- Adding HPL benchmark

---------
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>

655bd0aa

13 Feb, 2023 1 commit

Adding Stream Benchmark (#473) · 32896ca4

rafsalas19 authored Feb 13, 2023



**Description**

- Added stream benchmark
- Added stream unit test
- Added stream example
- Modified docker files to build stream

---------
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
Co-authored-by: Yifan Xiong <xiongyf@yandex.com>

32896ca4

04 Jan, 2023 2 commits

Benchmarks - Support topo-aware, pair-wise, and K-batch pattern in nccl-bw benchmark (#454) · ccccd988
Yang Wang authored Jan 04, 2023
```
Support traffic patterns across different devices in NCCL/RCCL test
* change the metrics format if a pattern is specified
```
ccccd988

Benchmarks - Support FP8 in BERT models (#446) · 5197cdf5

Yifan Xiong authored Jan 04, 2023

Support FP8 in PyTorch BERT models:

* add fp8 hybrid/e4m3/e5m2 in precision arguments
* build BERT encoders with `te.TransformerLayer` to replace
`transformers.BertModel`
* wrap forward steps with fp8 autocast

5197cdf5

03 Jan, 2023 2 commits
- Benchmarks - Integrate cublaslt micro-benchmark (#455) · 616e7a5a
  Yifan Xiong authored Jan 03, 2023
```
Integrate cublaslt-gemm micro-benchmark #451.
```
  616e7a5a
- Benchmarks: Micro benchmarks - Add correctness check in cublas-function benchmark (#452) · 75573f59
  Yuting Jiang authored Jan 03, 2023
```
**Description**
 Add correctness check in cublas-function benchmark.

**Major Revision**
- add python code of correctness check in cublas-function benchmark and test
```
  75573f59
30 Dec, 2022 1 commit

Executor - Add stdout logging util module and enable real-time logging flushing in executor (#445) · 9dfefce3

Yuting Jiang authored Dec 30, 2022

**Description**
Add stdout logging util module and enable real-time logging flushing in executor

**Major Revision**
- Add stdout logging util module to redirect stdout into file log
- enable stdout logging in executor to write benchmark output into both stdout and file `sb-bench.log`
- enable real-time log flushing in run_command of microbenchmarks through config `log_flushing`

**Minor Revision**
- add log_n_step args to enable regular step time log in model benchmarks 
- update related docs
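
Writing benchmark output to both stdout and a log file, flushed on every write, can be sketched with a small tee object. This is a generic illustration rather than the util module added in the PR; `Tee` and `tee_stdout` are hypothetical names.

```python
import contextlib
import sys


class Tee:
    """File-like object that forwards every write to several streams."""

    def __init__(self, *streams):
        self._streams = streams

    def write(self, data):
        for stream in self._streams:
            stream.write(data)
            stream.flush()  # flush immediately so the log is readable in real time
        return len(data)

    def flush(self):
        for stream in self._streams:
            stream.flush()


@contextlib.contextmanager
def tee_stdout(log_path):
    """Within the block, send stdout to both the terminal and log_path."""
    with open(log_path, "a") as log_file:
        with contextlib.redirect_stdout(Tee(sys.stdout, log_file)):
            yield
```

Anything printed inside `with tee_stdout("sb-bench.log"):` appears on the terminal and is appended to the file.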

9dfefce3

14 Dec, 2022 1 commit
- Benchmark: Revision - Add wait time option to resolve mem-bw unstable issue (#438) · 6583ba2e
  Yuting Jiang authored Dec 14, 2022
```
**Description**
Add wait time option to resolve mem-bw unstable issue.
```
  6583ba2e
18 Oct, 2022 1 commit

Benchmarks - Add support to allow list of custom config string in cudnn-functions and cublas-functions (#414) · 3367c4f6

Yuting Jiang authored Oct 18, 2022

**Description**
Add support to allow list of custom config string in cudnn-functions and cublas-functions.

3367c4f6

06 Sep, 2022 1 commit

Release - SuperBench v0.6.0 (#409) · 63e9b2d1

Yifan Xiong authored Sep 06, 2022



**Description**

Cherry-pick bug fixes from v0.6.0 to main.

**Major Revisions**

* Enable latency test in ib traffic validation distributed benchmark (#396)
* Enhance parameter parsing to allow spaces in value (#397)
* Update apt packages in dockerfile (#398)
* Upgrade colorlog for NO_COLOR support (#404)
* Analyzer - Update error handling to support exit code of sb result diagnosis (#403)
* Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399)
* Enhance timeout cleanup to avoid possible hanging (#405)
* Auto generate ibstat file by pssh (#402)
* Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406)
* Docs - Upgrade version and release note (#407)
* Docs - Fix issues in document (#408)
Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

63e9b2d1

04 Aug, 2022 1 commit

Gracefully exit when timeout (#383) · 9b8df883

Yifan Xiong authored Aug 04, 2022

* Gracefully exit when timeout, add corresponding log and return code.
* Set minimum timeout to 1 minute and enlarge Ansible timeout.

9b8df883

26 Jul, 2022 1 commit

Support topo-aware IB performance validation (#373) · ef4d6574

Jie Zhang authored Jul 26, 2022



* Support topo-aware IB performance validation

Add a new pattern `topo-aware`, so the user can run the IB performance
test based on the VMs' topology information. This way, the user can
validate the IB performance across VM pairs at different distances
as a quick test instead of a pair-wise test.

To run with topo-aware pattern, user needs to specify three required
(and two optional) parameters in YAML config file:
--pattern	topo-aware
--ibstat	path to ibstat output
--ibnetdiscover	path to ibnetdiscover output
--min_dist	minimum distance of VM pairs (optional, default 2)
--max_dist	maximum distance of VM pairs (optional, default 6)

The newly added topo_aware module then parses the topology
information, builds a graph, and generates the VM pairs with
the specified distance (# hops).

The specified IB test then runs across these
generated VM pairs.
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Add description about topology aware ib traffic tests
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Add unit test to verify generated topology aware config file

This commit adds a unit test to verify that the generated topology aware
config file is correct. To do so, four new data files are added in
order to invoke the gen_topo_aware_config function to generate a topology
aware config file, which is then compared with the expected config file.
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Fix lint issue on Azure pipeline
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
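
The idea of building a graph from the topology and picking VM pairs by hop distance can be sketched as below. This is not the `topo_aware` module itself: the adjacency-dict input, the function names, and the choice to return every qualifying pair are assumptions; only the `min_dist`/`max_dist` defaults of 2 and 6 come from the description above.

```python
from collections import deque
from itertools import combinations


def hop_distances(adjacency, source):
    """Breadth-first search: number of hops from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def pairs_by_distance(adjacency, hosts, min_dist=2, max_dist=6):
    """Group host pairs by hop distance, keeping min_dist <= distance <= max_dist."""
    dist_from = {host: hop_distances(adjacency, host) for host in hosts}
    result = {}
    for a, b in combinations(sorted(hosts), 2):
        d = dist_from[a].get(b)  # None when b is unreachable from a
        if d is not None and min_dist <= d <= max_dist:
            result.setdefault(d, []).append((a, b))
    return result


# Hypothetical topology: vm0 and vm1 share a switch, vm2 sits behind another.
EXAMPLE_TOPOLOGY = {
    "vm0": ["sw0"], "vm1": ["sw0"], "vm2": ["sw1"],
    "sw0": ["vm0", "vm1", "spine"],
    "sw1": ["vm2", "spine"],
    "spine": ["sw0", "sw1"],
}
```

For this example topology, `pairs_by_distance(EXAMPLE_TOPOLOGY, ["vm0", "vm1", "vm2"])` groups `vm0`/`vm1` at distance 2 and the two pairs involving `vm2` at distance 4.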

ef4d6574

25 Jul, 2022 1 commit

Fix unexpected base conversion when the result value is negative (#377) · 5d448eed

Yang Wang authored Jul 25, 2022

Fix an unexpected result value (`-0.125`) issue in ib traffic benchmark when encountering `-1` in raw output
* Check if the value is valid before the base conversion
* Add a test case to cover this situation
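
The `-0.125` in the message is what `-1` becomes when divided by 8, so the sketch below assumes the conversion is a division by 8 and validates the raw value first. The function name, the factor, and the decision to pass a negative value through unchanged are assumptions, not the code from the PR.

```python
def convert_result(raw_value, divisor=8):
    """Convert a raw benchmark value, leaving invalid (negative) values untouched.

    A negative raw value such as -1 is treated as an error marker, so it is
    returned as-is instead of being scaled into a misleading -0.125.
    """
    value = float(raw_value)
    if value < 0:
        return value  # invalid marker: skip the conversion
    return value / divisor
```

With this check, a raw `-1` stays `-1.0`, while a valid `80` still converts to `10.0`.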

5d448eed

20 Jul, 2022 1 commit

Fix port conflict in ib loopback (#375) · 352ae0c9

Yifan Xiong authored Jul 20, 2022

Fix a potential port conflict caused by the race condition between
time-of-check and time-of-use, by keeping the port bound throughout.

Modify the function to resolve flake8 C901 while keeping the logic the same.
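
The general idea of closing a check-then-use window by holding the bound socket can be sketched as follows. This is a generic illustration of the technique, not the code from the PR; `reserve_port` is a hypothetical helper.

```python
import socket


def reserve_port(host="127.0.0.1"):
    """Bind to a free TCP port and return the still-open socket with its port.

    Because the socket stays bound, no other process can grab the port
    between the moment it is chosen and the moment it is used. The caller
    closes the socket when the port is no longer needed.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS to pick a free port
    return sock, sock.getsockname()[1]
```

By contrast, a helper that binds, reads the port number, and closes the socket before returning leaves a gap in which another process may take the same port.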

352ae0c9

09 Jul, 2022 1 commit

Fix issues in ib validation benchmark (#370) · b2875179

Yifan Xiong authored Jul 09, 2022

Fix several issues in ib validation benchmark:
* continue running when a timeout occurs in the middle, instead of aborting the whole mpi process
* make timeout parameter configurable, set default to 120 seconds
* avoid mixture of stdio and iostream when print to stdout
* set default message size to 8M which will saturate ib in most cases
* fix hostfile path issue so that it can be auto found in different cases

b2875179

29 Jun, 2022 1 commit

Fix issues in ib loopback benchmark (#369) · 620192a2

Yifan Xiong authored Jun 30, 2022

Fix several issues in ib loopback benchmark:
* use `--report_gbits` and divide by 8 to get GB/s, previous results are
  MiB/s / 1000
* use the ib_write_bw binary built in third_party instead of system path
* update the metrics name so that different hca indices have same metric
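
The unit change in the first bullet can be checked with a little arithmetic. The sketch below compares dividing a Gb/s figure by 8 with the earlier MiB/s / 1000 calculation; the function names and the 200 Gb/s sample value are illustrative only.

```python
def gbits_to_gbytes(gbits_per_sec):
    """Gb/s reported with --report_gbits, divided by 8, gives GB/s."""
    return gbits_per_sec / 8


def mib_per_sec_over_1000(bytes_per_sec):
    """Earlier calculation: MiB/s divided by 1000 (not a true GB/s figure)."""
    return bytes_per_sec / 2**20 / 1000


# Sample link running at 200 Gb/s, i.e. 25e9 bytes per second.
new_value = gbits_to_gbytes(200)
old_value = mib_per_sec_over_1000(25e9)
```

For the same traffic, the new calculation gives 25 GB/s while the earlier one gives about 23.84, because 1 MiB is 1,048,576 bytes rather than 1,000,000.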

620192a2

24 Jun, 2022 1 commit

Support multiple IB/GPU in ib validation (#363) · bfaa1c83

Yifan Xiong authored Jun 24, 2022

**Description**

Support running multiple IB/GPU devices simultaneously in ib validation benchmark.

**Major Revisions**
- Revise ib_validation_performance.cc so that multiple processes per node could be used to launch multiple perftest commands simultaneously. For each node pair in the config, number of processes per node will run in parallel.
- Revise ib_validation_performance.py to correct file paths and adjust parameters to specify different NICs/GPUs/NUMA nodes.
- Fix env issues in Dockerfile for end-to-end test.
- Update ib-traffic configuration examples in config files.
- Update unit tests and docs accordingly.

Closes #326.

bfaa1c83