Commits · af4cfd5bbfe989b212d5311656be0cbe7cd5ae35 · tsoc / superbenchmark

05 Jul, 2023 3 commits
- Benchmarks: micro benchmarks - add python code for DirectXGPUMemBw (#547) · af4cfd5b
  Yuting Jiang authored Jul 05, 2023
```
**Description**
Add Python code for DirectXGPUMemBw.
```
  af4cfd5b
- Benchmarks: micro benchmarks - add python code for DirectXGPUCoreFlops (#542) · f1d608ae
  Yuting Jiang authored Jul 05, 2023
```
**Description**
add python code for DirectX core flops and init DirectX test pipeline.

**Major Revision**
- add python code for DirectX core flops 
- init DirectX test pipeline


**Minor Revision**
- add test for DirectX core flops
```
  f1d608ae
- CI/CD - Support DirectX test pipeline (#545) · 3704a432
  Yuting Jiang authored Jul 05, 2023
```
**Description**
Support DirectX test pipeline.
```
  3704a432
30 Jun, 2023 2 commits

Benchmarks: microbenchmark - add auto selecting algorithm support for cudnn functions (#540) · 97f7b1df

Yuting Jiang authored Jun 30, 2023

**Description**
add auto selecting algorithm support for cudnn functions.

**Major Revision**
- add auto selecting algorithm support for cudnn functions in source
code
- add 'auto_algo' option in benchmark
- add related test

97f7b1df

Benchmarks - Update result parsing in tensorrt inference (#541) · 7184bdd1
Yifan Xiong authored Jun 30, 2023
```
* Update result parsing for newer tensorrt versions
* Update arguments when load torchvision models
```
7184bdd1

29 Jun, 2023 3 commits
- Benchmarks: Add benchmark - Add source code of DirectxGPUCopy microbenchmark (#486) · f2599137
  Yuting Jiang authored Jun 29, 2023
```
**Description**
Add source code of DirectxGPUCopy microbenchmark.
```
  f2599137
- Benchmarks: Add benchmark - Add source code of DirectxGPUMemBw microbenchmark (#487) · af4d18de
  Yuting Jiang authored Jun 29, 2023
```
**Description**
Add source code of DirectxGPUMemBw microbenchmark.

---------
Co-authored-by: v-junlinlv <v-junlinlv@microsoft.com>
```
  af4d18de
- Benchmarks: Add benchmark - Add source code of DirectXGPUCoreFLOPs microbenchmark (#488) · 3a6622f7
  Yuting Jiang authored Jun 29, 2023
```
**Description**
Add source code of DirectXGPUCoreFLOPs microbenchmark.

---------
Co-authored-by: v-junlinlv <v-junlinlv@microsoft.com>
```
  3a6622f7
28 Jun, 2023 1 commit

Dockerfile - Add SuperBench Windows Dockerfile (#534) · 44ef5314

Yuting Jiang authored Jun 28, 2023



**Description**
Add dockerfile for win10 and building script for directx_benchmarks.

**Major Revision**
- Add docker file for win10 and required scripts to install the
dependency
- Add building script to build all directx vs benchmarks
- Add call of building script in Makefile

---------
Co-authored-by: yukirora <yuting.jiang@microsoft.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>

44ef5314

21 Jun, 2023 1 commit

Benchmarks - Add support for DirectX GPU platform (#536) · bbb0e243

Yuting Jiang authored Jun 21, 2023

**Description**
Add support for DirectX GPU platform.

**Major Revision**
- Add DirectX platform for benchmark registry
- Add gpu_vendor identification for AMD and NVIDIA with the Windows driver

bbb0e243

16 Jun, 2023 1 commit
- Benchmarks - Update outdated references (#539) · e909ddd0
  guoshzhao authored Jun 16, 2023
```
**Description**
Update outdated reference links that return 404.
```
  e909ddd0
28 Apr, 2023 1 commit

ModelBenchmarks - Fix early stop logic due to num_steps. (#522) · f38a9829

guoshzhao authored Apr 28, 2023

**Description**
Model benchmarks can stop on either the `num_steps` or the `duration`
config; each takes effect only when its value is greater than 0.
If both are greater than 0, the benchmark stops at whichever condition
is reached first.
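
As a rough illustration of the rule above, the stop check could look like the following sketch. The function name `should_stop` and its parameters are hypothetical and not taken from the SuperBench source; only the semantics (a value of 0 disables a condition, and the first enabled condition reached stops the run) come from this commit message.

```python
import time


def should_stop(step, start_time, num_steps=0, duration=0, now=None):
    """Return True once any enabled stop condition has been reached.

    A condition is enabled only when its value is greater than 0.
    `now` is injectable so the check can be tested without sleeping.
    """
    now = time.monotonic() if now is None else now
    hit_steps = num_steps > 0 and step >= num_steps
    hit_duration = duration > 0 and (now - start_time) >= duration
    return hit_steps or hit_duration
```

With both values left at 0, this sketch never requests a stop, so a caller would need at least one of them set.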

f38a9829

24 Apr, 2023 1 commit

Benchmarks - Revise step time collection in distributed inference benchmark (#524) · 4cb431ca

Ziyue Yang authored Apr 24, 2023

**Description**
This commit revises the distributed inference benchmark to report a unified
step time result by taking the maximum step time across GPUs.
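
A minimal sketch of the aggregation described above, assuming each GPU reports a list of per-step times of equal length and that the maximum is taken step by step. `unify_step_times` is an illustrative name; the actual benchmark may aggregate across ranks with a collective operation rather than plain Python lists.

```python
def unify_step_times(per_gpu_step_times):
    """Combine per-GPU step times into one list using the per-step maximum.

    per_gpu_step_times: one list of step times per GPU, all the same length.
    The slowest GPU determines the reported time of each step.
    """
    return [max(step) for step in zip(*per_gpu_step_times)]


gpu0 = [1.0, 2.5, 3.0]
gpu1 = [1.2, 2.0, 3.1]
unified = unify_step_times([gpu0, gpu1])  # [1.2, 2.5, 3.1]
```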

4cb431ca

14 Apr, 2023 1 commit

Release - SuperBench v0.8.0 (#517) · 51761b3a

Yifan Xiong authored Apr 14, 2023



**Description**

Cherry-pick bug fixes from v0.8.0 to main.

**Major Revisions**

* Monitor - Fix the cgroup version checking logic (#502)
* Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
* Fix wrong torch usage in communication wrapper for Distributed
Inference Benchmark (#505)
* Analyzer: Fix bug in python3.8 due to pandas api change (#504)
* Bug - Fix bug to get metric from cmd when error happens (#506)
* Monitor - Collect realtime GPU power when benchmarking (#507)
* Add num_workers argument in model benchmark (#511)
* Remove unreachable condition when write host list (#512)
* Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
* Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
* Docs - Upgrade version and release note (#508)
Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

51761b3a

28 Mar, 2023 1 commit

Benchmark - Update TE FP8 model conversion (#499) · 97c9a41f

Yifan Xiong authored Mar 28, 2023

__Description__

Update TE FP8 model conversion.

__Major Revisions__
* Add 16-byte alignment comment.
* Fix TE layer parameters type.

97c9a41f

25 Mar, 2023 1 commit

Benchmarks - Support TE FP8 in BERT/GPT2 models (#496) · c88c9709

Yifan Xiong authored Mar 25, 2023

Support Transformer Engine FP8 in existing PyTorch BERT/GPT2 models by
converting linear/layernorm to TE layers.

c88c9709

24 Mar, 2023 1 commit

Benchmarks - Add distributed inference benchmark (#493) · 8daef211

Ziyue Yang authored Mar 24, 2023



**Description**
This PR adds a micro-benchmark of distributed model inference workloads.

**Major Revision**
- Add a new micro-benchmark dist-inference.
- Add corresponding example and unit tests.
- Update configuration files to include this new micro-benchmark.
- Update micro-benchmark README.

---------
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>

8daef211

22 Mar, 2023 1 commit
- Benchmark - Support batch/shape range in cublaslt gemm (#494) · dbeba805
  Yifan Xiong authored Mar 22, 2023
```
Support batch and shape range with multiplication factors in cublaslt
gemm benchmark.
```
  dbeba805
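
One way to read "range with multiplication factors" is a `start:stop:factor` spec that expands into a geometric sequence. The sketch below assumes that syntax and the helper names `expand_range` and `expand_shape`; neither is confirmed by this commit message.

```python
import itertools


def expand_range(spec):
    """Expand 'start:stop:factor' geometrically; a plain integer passes through."""
    parts = [int(p) for p in spec.split(":")]
    if len(parts) == 1:
        return parts
    if len(parts) != 3:
        raise ValueError(f"expected 'value' or 'start:stop:factor', got {spec!r}")
    start, stop, factor = parts
    if start < 1 or factor < 2:
        raise ValueError("start must be >= 1 and factor must be >= 2")
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value *= factor
    return values


def expand_shape(shape_spec):
    """Expand 'm,n,k', where each dimension may be a range, into all combinations."""
    dims = [expand_range(dim) for dim in shape_spec.split(",")]
    return list(itertools.product(*dims))
```

For instance, `expand_range("16:128:2")` yields `[16, 32, 64, 128]`, and a shape spec with ranges in several dimensions expands to their Cartesian product.
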
21 Mar, 2023 2 commits

Adding HPL benchmark (#482) · 655bd0aa

rafsalas19 authored Mar 21, 2023



**Description**

- Adding HPL benchmark

---------
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>

655bd0aa

Benchmark - Fix torch.dist init issue with multiple models (#495) · 644b5395

Yifan Xiong authored Mar 21, 2023

Fix potential barrier timeout in init_process_group due to race
condition of using the same port. Change to different ports when running
multiple models sequentially in one process.
For example, running vgg11/13/16/19 uses ports 29501~29504,
respectively.
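
The port assignment can be pictured with the sketch below. The base port 29500 and the `base + 1 + index` rule are assumptions chosen to reproduce the 29501~29504 example; `port_for_model` is a hypothetical helper, not a SuperBench function.

```python
BASE_PORT = 29500  # assumed base; chosen so the first model gets 29501


def port_for_model(index, base_port=BASE_PORT):
    """Return a distinct rendezvous port for the index-th model (0-based)."""
    return base_port + 1 + index


models = ["vgg11", "vgg13", "vgg16", "vgg19"]
ports = {name: port_for_model(i) for i, name in enumerate(models)}
# ports == {"vgg11": 29501, "vgg13": 29502, "vgg16": 29503, "vgg19": 29504}
```

Giving each sequential run its own port avoids a new process group racing with the previous one for the same address.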

644b5395

20 Mar, 2023 2 commits

Benchmarks: Support error tolerance in micro-benchmark for CuDNN function (#490) · 5a88db16

Yuting Jiang authored Mar 20, 2023

**Description**
Support error tolerance in micro-benchmark for CuDNN function


**Major Revision**
- revise micro_base to keep running the remaining commands when
one command fails in the microbenchmark
- set error tolerance to true in cudnn functions
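
The behaviour described above can be sketched as follows. `run_commands` and its `tolerant` flag are illustrative names, not the micro_base API; the sketch only shows the idea of continuing past a failed command instead of aborting the whole benchmark.

```python
import subprocess


def run_commands(commands, tolerant=True):
    """Run each command in order and collect (command, return code, stdout).

    With tolerant=True a failing command is recorded and the remaining
    commands still run; with tolerant=False the loop stops at the first failure.
    """
    results = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((cmd, proc.returncode, proc.stdout))
        if proc.returncode != 0 and not tolerant:
            break
    return results
```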

5a88db16

Benchmarks - Support tensor core precisions in cublaslt gemm (#492) · b808135c
Yifan Xiong authored Mar 20, 2023
```
Support FP64/TF32/FP16/BF16 in cublaslt (batch) GEMM.
```
b808135c

27 Feb, 2023 1 commit

Benchmarks: Revision - Support flexible warmup and non-random data... · eba298f5

Yuting Jiang authored Feb 28, 2023

Benchmarks: Revision - Support flexible warmup and non-random data initialization in cublas-benchmark  (#479)

**Description**
Revise cublas-benchmark to support flexible warmup and to fill input data
with a fixed number in perf tests, improving running efficiency.

**Major Revision**
- remove num_in_steps for warmup to support more flexible warmup setting
for users
- Add support to generate input with fixed number for perf test

eba298f5

13 Feb, 2023 1 commit

Adding Stream Benchmark (#473) · 32896ca4

rafsalas19 authored Feb 13, 2023



**Description**

- Added stream benchmark
- Added stream unit test
- Added stream example
- Modified docker files to build stream

---------
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
Co-authored-by: Yifan Xiong <xiongyf@yandex.com>

32896ca4

28 Jan, 2023 1 commit

Release - SuperBench v0.7.0 (#468) · b07fda15

Yifan Xiong authored Jan 28, 2023



**Description**

Cherry-pick bug fixes from v0.7.0 to main.

**Major Revisions**

* Benchmarks - Fix missing include in FP8 benchmark (#460)
* Fix bug in TE BERT model (#461)
* Doc - Update benchmark doc (#465)
* Bug: Fix bug for incorrect datatype judgement in cublas-function
source code (#464)
* Support `sb deploy` without pulling image (#466)
* Docs - Upgrade version and release note (#467)
Co-authored-by: Russell J. Hewett <russell.j.hewett@gmail.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

b07fda15

17 Jan, 2023 1 commit
- Bug: Fix bug for incorrect datatype judgement in cublas-function source code (#462) · f380bc5e
  Yuting Jiang authored Jan 17, 2023
```
**Description**
Fix bug for incorrect datatype judgement in cublas-function source code.
```
  f380bc5e
04 Jan, 2023 2 commits

Benchmarks - Support topo-aware, pair-wise, and K-batch pattern in nccl-bw benchmark (#454) · ccccd988
Yang Wang authored Jan 04, 2023
```
Support traffic patterns across different devices in the NCCL/RCCL test
* Change the metrics format if a pattern is specified
```
ccccd988

Benchmarks - Support FP8 in BERT models (#446) · 5197cdf5

Yifan Xiong authored Jan 04, 2023

Support FP8 in PyTorch BERT models:

* add fp8 hybrid/e4m3/e5m2 in precision arguments
* build BERT encoders with `te.TransformerLayer` to replace
`transformers.BertModel`
* wrap forward steps with fp8 autocast

5197cdf5

03 Jan, 2023 5 commits

Support GEMM benchmark on Hopper GPUs (#456) · fc661f7d
Yifan Xiong authored Jan 03, 2023
```
Support GEMM benchmark on Hopper GPUs.
```
fc661f7d
Benchmarks - Integrate cublaslt micro-benchmark (#455) · 616e7a5a
Yifan Xiong authored Jan 03, 2023
```
Integrate cublaslt-gemm micro-benchmark #451.
```
616e7a5a

Benchmarks: Micro benchmarks - Add correctness check in cublas-function benchmark (#452) · 75573f59

Yuting Jiang authored Jan 03, 2023

**Description**
 Add correctness check in cublas-function benchmark.

**Major Revision**
- add python code of correctness check in cublas-function benchmark and test

75573f59

Benchmarks - Add cuBLASLt FP16 and FP8 GEMM micro-benchmark (#451) · 0591da5f
Yifan Xiong authored Jan 03, 2023
```
Add micro-benchmark for cublaslt fp8 gemm.
```
0591da5f

Benchmarks: Micro benchmarks - add source code of correctness check for cublas functions (#450) · 678b1251

Yuting Jiang authored Jan 03, 2023

**Description**
Add C source code of correctness check for cublas functions.

**Major Revision**
- add correctness check for all supported cublas functions
- add --correctness option into binary

**Minor Revision**
- fix a bug and templatize fill_data and prepare_tensor so the output matrix is correctly memory-aligned for each datatype

678b1251

30 Dec, 2022 2 commits

Executor - Add stdout logging util module and enable real-time logging flushing in executor (#445) · 9dfefce3

Yuting Jiang authored Dec 30, 2022

**Description**
Add stdout logging util module and enable real-time logging flushing in executor

**Major Revision**
- Add stdout logging util module to redirect stdout into file log
- enable stdout logging in executor to write benchmark output into both stdout and file `sb-bench.log`
- enable real-time log flushing in run_command of microbenchmarks through config `log_flushing`

**Minor Revision**
- add log_n_step args to enable regular step time log in model benchmarks 
- update related docs
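
As an illustration of writing benchmark output to both stdout and a log file, a small tee-style helper could look like the sketch below. The `Tee` class and `log_stdout_to_file` are hypothetical names and do not reflect the actual util module added in this commit; the log path would be something like `sb-bench.log`.

```python
import contextlib
import sys


class Tee:
    """File-like object that forwards every write to several streams."""

    def __init__(self, *streams):
        self._streams = streams

    def write(self, text):
        for stream in self._streams:
            stream.write(text)
            stream.flush()  # flush right away so the log is readable in real time
        return len(text)

    def flush(self):
        for stream in self._streams:
            stream.flush()


@contextlib.contextmanager
def log_stdout_to_file(path):
    """Mirror everything printed inside the block to stdout and to `path`."""
    with open(path, "a", encoding="utf-8") as log_file:
        with contextlib.redirect_stdout(Tee(sys.stdout, log_file)):
            yield
```

Anything printed inside `with log_stdout_to_file("sb-bench.log"):` then appears on the console and is appended to the file, and `sys.stdout` is restored on exit.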

9dfefce3

Benchmarks - Support `pair-wise` pattern in IB validation benchmark (#453) · f2634d86
Yang Wang authored Dec 30, 2022
```
**Description**
* Reuse `gen_pair_wise_config` in micro-benchmark
```
f2634d86

14 Dec, 2022 1 commit
- Benchmark: Revision - Add wait time option to resolve mem-bw unstable issue (#438) · 6583ba2e
  Yuting Jiang authored Dec 14, 2022
```
**Description**
Add wait time option to resolve mem-bw unstable issue.
```
  6583ba2e
18 Oct, 2022 1 commit

Benchmarks - Add support to allow list of custom config string in... · 3367c4f6

Yuting Jiang authored Oct 18, 2022

Benchmarks - Add support to allow list of custom config string in cudnn-functions and cublas-functions (#414)

**Description**
Add support to allow list of custom config string in cudnn-functions and cublas-functions.

3367c4f6

06 Sep, 2022 1 commit

Release - SuperBench v0.6.0 (#409) · 63e9b2d1

Yifan Xiong authored Sep 06, 2022



**Description**

Cherry-pick bug fixes from v0.6.0 to main.

**Major Revisions**

* Enable latency test in ib traffic validation distributed benchmark (#396)
* Enhance parameter parsing to allow spaces in value (#397)
* Update apt packages in dockerfile (#398)
* Upgrade colorlog for NO_COLOR support (#404)
* Analyzer - Update error handling to support exit code of sb result diagnosis (#403)
* Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399)
* Enhance timeout cleanup to avoid possible hanging (#405)
* Auto generate ibstat file by pssh (#402)
* Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406)
* Docs - Upgrade version and release note (#407)
* Docs - Fix issues in document (#408)
Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

63e9b2d1

04 Aug, 2022 1 commit

Gracefully exit when timeout (#383) · 9b8df883

Yifan Xiong authored Aug 04, 2022

* Gracefully exit when timeout, add corresponding log and return code.
* Set minimum timeout to 1 minute and enlarge Ansible timeout.
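
A minimal sketch of the two points above, assuming a subprocess-based runner. The helper names and the return code 124 (borrowed from the GNU `timeout` convention) are assumptions for illustration; the commit message does not state which return code SuperBench uses.

```python
import subprocess
import sys

MIN_TIMEOUT_SECONDS = 60   # minimum timeout of 1 minute, as stated in the commit
TIMEOUT_RETURN_CODE = 124  # assumed value, following the GNU `timeout` convention


def effective_timeout(requested_seconds, minimum=MIN_TIMEOUT_SECONDS):
    """Clamp the requested timeout so it is never below the minimum."""
    return max(requested_seconds, minimum)


def run_with_timeout(cmd, requested_seconds, minimum=MIN_TIMEOUT_SECONDS):
    """Run cmd; on timeout, log a message and return a dedicated return code."""
    timeout = effective_timeout(requested_seconds, minimum)
    try:
        completed = subprocess.run(cmd, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child process at this point.
        print(f"command timed out after {timeout}s", file=sys.stderr)
        return TIMEOUT_RETURN_CODE
    return completed.returncode
```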

9b8df883

26 Jul, 2022 1 commit

Support topo-aware IB performance validation (#373) · ef4d6574

Jie Zhang authored Jul 26, 2022



* Support topo-aware IB performance validation

Add a new pattern `topo-aware`, so the user can run the IB performance
test based on the VMs' topology information. This way, the user can
validate IB performance across VM pairs at different distances
as a quick test instead of a full pair-wise test.

To run with topo-aware pattern, user needs to specify three required
(and two optional) parameters in YAML config file:
--pattern	topo-aware
--ibstat	path to ibstat output
--ibnetdiscover	path to ibnetdiscover output
--min_dist	minimum distance of VM pairs (optional, default 2)
--max_dist	maximum distance of VM pairs (optional, default 6)

The newly added topo_aware module then parses the topology
information, builds a graph, and generates the VM pairs with
the specified distance (# hops).

The specified IB test then runs across these
generated VM pairs.
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
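
The pair generation described above (build a graph, then pick VM pairs by hop count) can be sketched with a breadth-first search. This is a simplified illustration under assumed names (`hop_distances`, `pairs_by_distance`) and a hand-written adjacency dict; the real topo_aware module parses ibstat and ibnetdiscover output and may select pairs differently.

```python
from collections import deque
from itertools import combinations


def hop_distances(graph, source):
    """Breadth-first search returning the hop count from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def pairs_by_distance(graph, hosts, min_dist=2, max_dist=6):
    """Group host pairs by hop distance, keeping only min_dist <= distance <= max_dist."""
    dists = {host: hop_distances(graph, host) for host in hosts}
    result = {}
    for a, b in combinations(hosts, 2):
        d = dists[a].get(b)  # None if b is unreachable from a
        if d is not None and min_dist <= d <= max_dist:
            result.setdefault(d, []).append((a, b))
    return result


# Toy topology: vm0 and vm1 share switch sw0; vm2 sits behind sw1; both join at spine.
topology = {
    "vm0": ["sw0"], "vm1": ["sw0"], "vm2": ["sw1"],
    "sw0": ["vm0", "vm1", "spine"], "sw1": ["vm2", "spine"],
    "spine": ["sw0", "sw1"],
}
pairs = pairs_by_distance(topology, ["vm0", "vm1", "vm2"])
# pairs == {2: [("vm0", "vm1")], 4: [("vm0", "vm2"), ("vm1", "vm2")]}
```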

* Add description about topology aware ib traffic tests
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Add unit test to verify generated topology aware config file

This commit adds a unit test to verify that the generated topology-aware
config file is correct. Four new data files are added so that the test can
invoke the gen_topo_aware_config function to generate a topology-aware
config file and compare it with the expected config file.
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Fix lint issue on Azure pipeline
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

ef4d6574