Commits · f53d941a22fc0746e98ef3560a6799422be8fa47 · tsoc / superbenchmark


20 Nov, 2023 1 commit
- Benchmarks: micro benchmarks - add int8 support for cublaslt function (#574) · f53d941a
  Yuting Jiang authored Nov 20, 2023
```
**Description**
add int8 support for cublaslt function.
```
  f53d941a
14 Nov, 2023 1 commit

Bug Fix - remove cp ptx file command in gpu burn test (#567) · c7800bb8

Yuting Jiang authored Nov 14, 2023

**Description**
Remove the `cp` command for the ptx file in the GPU burn test, since the
command already runs inside the `self.args.bin_dir` directory.


https://github.com/microsoft/superbenchmark/blob/d246bab430adeb461072918a551b2e2b68c9bce5/superbench/benchmarks/micro_benchmarks/micro_base.py#L183

c7800bb8

22 Aug, 2023 1 commit
- Benchmarks: micro benchmark - source code for evaluating NVDEC decoding performance (#560) · 27a10811
  Yuting Jiang authored Aug 22, 2023
```
**Description**
source code for evaluating NVDEC decoding performance.

---------
Co-authored-by: yukirora <yuting.jiang@microsoft.com>
```
  27a10811
18 Aug, 2023 1 commit
- Benchmarks: micro benchmarks - add source code for DirectXRenderPerf (#549) · 6c0205ce
  Yuting Jiang authored Aug 18, 2023
```
**Description**
add source code for DirectXRenderPerf.

---------
Co-authored-by: yukirora <yuting.jiang@microsoft.com>
```
  6c0205ce
08 Aug, 2023 1 commit

Benchmarks: model benchmarks - change torch.distributed.launch to torchrun (#556) · 67f2aa72

pnunna93 authored Aug 08, 2023

This PR makes the following changes:
- Change torch.distributed.launch to torchrun. torch.distributed.launch
is deprecated in recent PyTorch releases, which recommend moving to torchrun -
https://pytorch.org/docs/stable/elastic/run.html


- Change the AMD GPU detection logic. The previous logic emitted a
warning when a container had only renderD devices in /dev/dri; this change
resolves those warnings.
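
As a rough illustration of the first change, the sketch below builds a launch command in both styles. The helper names, the `train.py` script, and the argument values are assumptions made for this example and are not taken from the PR; only the `torch.distributed.launch` module and the `torchrun` flags shown are part of PyTorch.

```python
import shlex


def legacy_launch_cmd(script, nproc_per_node, script_args=()):
    """Build the deprecated `python3 -m torch.distributed.launch` command."""
    return [
        'python3', '-m', 'torch.distributed.launch', '--use_env',
        f'--nproc_per_node={nproc_per_node}', script, *script_args,
    ]


def torchrun_cmd(script, nproc_per_node, nnodes=1, node_rank=0,
                 master_addr='localhost', master_port=29500, script_args=()):
    """Build a roughly equivalent `torchrun` command (hypothetical helper)."""
    return [
        'torchrun',
        f'--nnodes={nnodes}',
        f'--nproc_per_node={nproc_per_node}',
        f'--node_rank={node_rank}',
        f'--master_addr={master_addr}',
        f'--master_port={master_port}',
        script, *script_args,
    ]


if __name__ == '__main__':
    # Print the new-style command for a single node with 8 processes.
    print(shlex.join(torchrun_cmd('train.py', 8, script_args=['--model', 'bert'])))
```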

---------
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

67f2aa72

27 Jul, 2023 1 commit

Release - SuperBench v0.9.0 (#558) · e1df877b

Yuting Jiang authored Jul 27, 2023

**Description**
Cherry-pick bug fixes from v0.9.0 to main.

**Major Revision**
- CI/CD: pipeline - clean more disk space to fix rocm building image
pipeline (#555)
- Benchmarks: bug fix - use absolute path for input file in
DirectXEncodingLatency (#554)
- CI/CD - add push win docker image on release branch in pipeline (#552)
- Docs - Upgrade version and release note (#557)

e1df877b

06 Jul, 2023 1 commit
- Benchmarks: micro benchmarks - add python code for DirectXGPUEncodingLatency (#548) · e8ac0b1e
  Yuting Jiang authored Jul 06, 2023
```
**Description**
add python code for DirectXGPUEncodingLatency.
```
  e8ac0b1e
05 Jul, 2023 4 commits
- Benchmarks: micro benchmarks - add python code for DirectXGPUCopy (#546) · c8c079c2
  Yuting Jiang authored Jul 06, 2023
```
**Description**
add python code for DirectXGPUCopy.
```
  c8c079c2
- Benchmarks: micro benchmarks - add python code for DirectXGPUMemBw (#547) · af4cfd5b
  Yuting Jiang authored Jul 05, 2023
```
**Description**
add python code for DirectXGPUMemBw.
```
  af4cfd5b
- Benchmarks: micro benchmarks - add python code for DirectXGPUCoreFlops (#542) · f1d608ae
  Yuting Jiang authored Jul 05, 2023
```
**Description**
add python code for DirectX core flops and init DirectX test pipeline.

**Major Revision**
- add python code for DirectX core flops 
- init DirectX test pipeline


**Minor Revision**
- add test for DirectX core flops
```
  f1d608ae
- CI/CD - Support DirectX test pipeline (#545) · 3704a432
  Yuting Jiang authored Jul 05, 2023
```
**Description**
Support DirectX test pipeline.
```
  3704a432
30 Jun, 2023 2 commits

Benchmarks: microbenchmark - add auto selecting algorithm support for cudnn functions (#540) · 97f7b1df

Yuting Jiang authored Jun 30, 2023

**Description**
add auto selecting algorithm support for cudnn functions.

**Major Revision**
- add auto selecting algorithm support for cudnn functions in source
code
- add 'auto_algo' option in benchmark
- add related test

97f7b1df

Benchmarks - Update result parsing in tensorrt inference (#541) · 7184bdd1
Yifan Xiong authored Jun 30, 2023
```
* Update result parsing for newer tensorrt versions
* Update arguments when load torchvision models
```
7184bdd1

29 Jun, 2023 4 commits

Benchmarks: Add benchmark - Add source code of DirectxGPUCopy microbenchmark (#486) · f2599137
Yuting Jiang authored Jun 29, 2023
```
**Description**
Add source code of DirectxGPUCopy microbenchmark.
```
f2599137

Benchmarks: Add benchmark - Add source code of DirectxGPUMemBw microbenchmark (#487) · af4d18de

Yuting Jiang authored Jun 29, 2023



**Description**
Add source code of DirectxGPUMemBw microbenchmark.

---------
Co-authored-by: v-junlinlv <v-junlinlv@microsoft.com>

af4d18de

Tools - Add runner for sys info and update docs (#532) · ed027e4c

Yuting Jiang authored Jun 29, 2023

**Description**
Add runner for sys info to automatically collect on multiple nodes and
update related docs.

**Major Revision**
- add runner for sys info which will check docker status and run `sb
node info` on all nodes' docker and fetch results from all nodes

**Minor Revision**
- update cli and system-info doc
- update `sb node info` to save output into output-dir/sys-info.json

ed027e4c

Benchmarks: Add benchmark - Add source code of DirectXGPUCoreFLOPs microbenchmark (#488) · 3a6622f7

Yuting Jiang authored Jun 29, 2023



**Description**
Add source code of DirectXGPUCoreFLOPs microbenchmark.

---------
Co-authored-by: v-junlinlv <v-junlinlv@microsoft.com>

3a6622f7

28 Jun, 2023 1 commit

Dockerfile - Add SuperBench Windows Dockerfile (#534) · 44ef5314

Yuting Jiang authored Jun 28, 2023



**Description**
Add dockerfile for win10 and building script for directx_benchmarks.

**Major Revision**
- Add docker file for win10 and required scripts to install the
dependency
- Add building script to build all directx vs benchmarks
- Add call of building script in Makefile

---------
Co-authored-by: yukirora <yuting.jiang@microsoft.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>

44ef5314

21 Jun, 2023 1 commit

Benchmarks - Add support for DirectX GPU platform (#536) · bbb0e243

Yuting Jiang authored Jun 21, 2023

**Description**
Add support for DirectX GPU platform.

**Major Revision**
- Add DirectX platform for benchmark registry
- Add gpu_vendor identification for AMD and NVIDIA with the Windows driver

bbb0e243

16 Jun, 2023 1 commit
- Benchmarks - Update outdated references (#539) · e909ddd0
  guoshzhao authored Jun 16, 2023
```
**Description**
Update outdated reference links that return 404.
```
  e909ddd0
23 May, 2023 1 commit

Runner - Add signal handler in runner (#530) · a1cd3c94

Yifan Xiong authored May 23, 2023

Add signal handler in runner to gracefully exit when receiving SIGINT
(<kbd>Ctrl</kbd>+<kbd>C</kbd>) or SIGTERM during benchmark execution.
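
A minimal sketch of this kind of handler, using only Python's standard `signal` module. The `GracefulExit` exception and `run_with_signal_handler` wrapper are hypothetical names for illustration and do not reflect the runner's actual implementation.

```python
import signal


class GracefulExit(Exception):
    """Raised in the main thread when SIGINT or SIGTERM arrives."""


def _raise_graceful_exit(signum, frame):
    # Turn the signal into an exception so normal cleanup paths run.
    raise GracefulExit(signal.Signals(signum).name)


def run_with_signal_handler(task):
    """Run task(); return its result, or None if interrupted by a signal."""
    previous = {}
    for sig in (signal.SIGINT, signal.SIGTERM):
        previous[sig] = signal.signal(sig, _raise_graceful_exit)
    try:
        return task()
    except GracefulExit as exc:
        print(f'Received {exc}, stopping benchmarks and cleaning up.')
        return None
    finally:
        # Restore whatever handlers were installed before.
        for sig, handler in previous.items():
            signal.signal(sig, handler)
```

Note that `signal.signal` can only be called from the main thread, so a wrapper like this has to be installed before any worker threads take over.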

a1cd3c94

28 Apr, 2023 1 commit

ModelBenchmarks - Fix early stop logic due to num_steps. (#522) · f38a9829

guoshzhao authored Apr 28, 2023

**Description**
Model benchmarks can stop early based on the `num_steps` or `duration`
config; each takes effect only when its value is greater than 0.
If both are greater than 0, whichever condition is reached first stops
the benchmark.
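
The stop condition described here can be sketched as follows. The `should_stop` function and its `now` parameter are hypothetical and only illustrate the rule; they are not the code changed by this commit.

```python
import time


def should_stop(step, start_time, num_steps=0, duration=0, now=None):
    """Return True once any enabled limit has been reached.

    A limit is enabled only when its value is greater than 0; when both
    are enabled, the first one reached ends the run.
    """
    now = time.time() if now is None else now
    if num_steps > 0 and step >= num_steps:
        return True
    if duration > 0 and now - start_time >= duration:
        return True
    return False
```

With both limits at 0, the function never returns True, so a caller would need another way to end the loop.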

f38a9829

24 Apr, 2023 1 commit

Benchmarks - Revise step time collection in distributed inference benchmark (#524) · 4cb431ca

Ziyue Yang authored Apr 24, 2023

**Description**
This commit revises distributed inference benchmark to give a unified
step time result by taking maximum step times of different GPUs.
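
As an illustration of the idea, the sketch below reduces per-GPU step times to one series by taking the maximum at each step. It works on plain lists and uses a hypothetical `unify_step_times` name; in a real distributed run the values would first have to be gathered across ranks, which is not shown.

```python
def unify_step_times(per_gpu_step_times):
    """Return one step-time series: the slowest GPU's time at each step.

    per_gpu_step_times holds one list of step times per GPU, all of the
    same length.
    """
    return [max(times) for times in zip(*per_gpu_step_times)]
```

Taking the maximum reflects that a distributed step is only finished when the slowest GPU has finished it.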

4cb431ca

14 Apr, 2023 1 commit

Release - SuperBench v0.8.0 (#517) · 51761b3a

Yifan Xiong authored Apr 14, 2023



**Description**

Cherry-pick bug fixes from v0.8.0 to main.

**Major Revisions**

* Monitor - Fix the cgroup version checking logic (#502)
* Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
* Fix wrong torch usage in communication wrapper for Distributed
Inference Benchmark (#505)
* Analyzer: Fix bug in python3.8 due to pandas api change (#504)
* Bug - Fix bug to get metric from cmd when error happens (#506)
* Monitor - Collect realtime GPU power when benchmarking (#507)
* Add num_workers argument in model benchmark (#511)
* Remove unreachable condition when write host list (#512)
* Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
* Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
* Docs - Upgrade version and release note (#508)
Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

51761b3a

28 Mar, 2023 1 commit

Benchmark - Update TE FP8 model conversion (#499) · 97c9a41f

Yifan Xiong authored Mar 28, 2023

__Description__

Update TE FP8 model conversion.

__Major Revisions__
* Add 16-byte alignment comment.
* Fix TE layer parameters type.

97c9a41f

25 Mar, 2023 1 commit

Benchmarks - Support TE FP8 in BERT/GPT2 models (#496) · c88c9709

Yifan Xiong authored Mar 25, 2023

Support Transformer Engine FP8 in existing PyTorch BERT/GPT2 models by
converting linear/layernorm to TE layers.

c88c9709

24 Mar, 2023 1 commit

Benchmarks - Add distributed inference benchmark (#493) · 8daef211

Ziyue Yang authored Mar 24, 2023



**Description**
This PR adds a micro-benchmark of distributed model inference workloads.

**Major Revision**
- Add a new micro-benchmark dist-inference.
- Add corresponding example and unit tests.
- Update configuration files to include this new micro-benchmark.
- Update micro-benchmark README.

---------
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>

8daef211

22 Mar, 2023 2 commits

Monitor - Support cgroup V2 when read system metrics. (#491) · a9b45a07

guoshzhao authored Mar 22, 2023

**Description**
Ubuntu 22.04 uses cgroup V2, which changes the file structure.
Modify the monitor to support both cgroup V1 and V2.
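
One common way to tell the two versions apart is to check for the `cgroup.controllers` file, which exists only at the root of a cgroup V2 unified hierarchy. The sketch below uses that check; the function names and the choice of memory files are assumptions for illustration, not the monitor's actual code.

```python
from pathlib import Path


def detect_cgroup_version(cgroup_root='/sys/fs/cgroup'):
    """Return 2 if a unified (V2) hierarchy is found at cgroup_root, else 1."""
    return 2 if (Path(cgroup_root) / 'cgroup.controllers').is_file() else 1


def memory_usage_file(cgroup_root='/sys/fs/cgroup'):
    """Return the path of the memory usage file for the detected version."""
    root = Path(cgroup_root)
    if detect_cgroup_version(cgroup_root) == 2:
        # V2 exposes a single flat file per cgroup.
        return root / 'memory.current'
    # V1 keeps one directory per controller.
    return root / 'memory' / 'memory.usage_in_bytes'
```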

a9b45a07

Benchmark - Support batch/shape range in cublaslt gemm (#494) · dbeba805
Yifan Xiong authored Mar 22, 2023
```
Support batch and shape range with multiplication factors in cublaslt
gemm benchmark.
```
dbeba805
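
To illustrate what a range with a multiplication factor means, the sketch below expands a `start:stop:factor` string into concrete values. The string syntax, the default factor of 2, and the `expand_range` name are assumptions for this example; the commit message does not specify the option format.

```python
def expand_range(spec):
    """Expand 'start[:stop[:factor]]' into a list of integers.

    Values begin at start and are multiplied by factor until they
    exceed stop. A bare 'start' yields a single value.
    """
    parts = [int(p) for p in spec.split(':')]
    start = parts[0]
    stop = parts[1] if len(parts) > 1 else start
    factor = parts[2] if len(parts) > 2 else 2  # assumed default
    if start <= 0 or factor <= 1:
        raise ValueError('start must be positive and factor greater than 1')
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value *= factor
    return values
```

Under these assumptions, `expand_range('16:128:2')` yields 16, 32, 64, and 128.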

21 Mar, 2023 2 commits

Adding HPL benchmark (#482) · 655bd0aa

rafsalas19 authored Mar 21, 2023



**Description**

- Adding HPL benchmark

---------
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>

655bd0aa

Benchmark - Fix torch.dist init issue with multiple models (#495) · 644b5395

Yifan Xiong authored Mar 21, 2023

Fix a potential barrier timeout in init_process_group caused by a race
condition when reusing the same port. Use a different port for each model
when running multiple models sequentially in one process.
For example, running vgg11/13/16/19 uses ports 29501~29504,
respectively.
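
The port assignment in that example can be sketched as below. The base port of 29500 and the `assign_ports` helper are assumptions chosen so that the result matches the ports quoted above; the actual scheme in the commit may differ.

```python
BASE_PORT = 29500  # assumed base; the first model gets BASE_PORT + 1


def assign_ports(model_names, base_port=BASE_PORT):
    """Give each sequentially run model its own rendezvous port."""
    return {
        name: base_port + index
        for index, name in enumerate(model_names, start=1)
    }
```

Each model then initializes its process group on its own port, so a socket still held by the previous model cannot collide with the next one.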

644b5395

20 Mar, 2023 2 commits

Benchmarks: Support error tolerance in micro-benchmark for CuDNN function (#490) · 5a88db16

Yuting Jiang authored Mar 20, 2023

**Description**
Support error tolerance in micro-benchmark for CuDNN function


**Major Revision**
- revise micro_base to keep running the remaining commands when
one command fails in the microbenchmark
- set error tolerance to true in cudnn functions
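
A minimal sketch of error-tolerant command execution, assuming a hypothetical `run_commands` helper with a `tolerant` flag. It is not the micro_base implementation, only an illustration of continuing past a failed command.

```python
import subprocess


def run_commands(commands, tolerant=False):
    """Run commands in order and return (all_ok, results).

    results holds one (ok, stdout) pair per executed command. Without
    tolerance, execution stops at the first failure; with tolerance,
    the remaining commands still run.
    """
    results = []
    all_ok = True
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        ok = proc.returncode == 0
        results.append((ok, proc.stdout))
        if not ok:
            all_ok = False
            if not tolerant:
                break
    return all_ok, results
```

The overall status still reports the failure, so a caller can keep the partial results without mistaking the run for a clean one.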

5a88db16

Benchmarks - Support tensor core precisions in cublaslt gemm (#492) · b808135c
Yifan Xiong authored Mar 20, 2023
```
Support FP64/TF32/FP16/BF16 in cublaslt (batch) GEMM.
```
b808135c

27 Feb, 2023 1 commit

Benchmarks: Revision - Support flexible warmup and non-random data initialization in cublas-benchmark (#479) · eba298f5

Yuting Jiang authored Feb 28, 2023

**Description**
Revise cublas-benchmark to support flexible warmup and to fill input data
with a fixed number in perf tests, improving running efficiency.

**Major Revision**
- remove num_in_steps for warmup to support more flexible warmup setting
for users
- Add support to generate input with fixed number for perf test

eba298f5

13 Feb, 2023 2 commits

Adding Stream Benchmark (#473) · 32896ca4

rafsalas19 authored Feb 13, 2023



**Description**

- Added stream benchmark
- Added stream unit test
- Added stream example
- Modified docker files to build stream

---------
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
Co-authored-by: Yifan Xiong <xiongyf@yandex.com>

32896ca4

Executor - Support SuperBench Executor running on Windows (#475) · 62a29134

Yuting Jiang authored Feb 13, 2023

**Description**
Support SuperBench Executor running on Windows.

**Major Revision**
- Lazy import ansible related module

62a29134

28 Jan, 2023 1 commit

Release - SuperBench v0.7.0 (#468) · b07fda15

Yifan Xiong authored Jan 28, 2023



**Description**

Cherry-pick bug fixes from v0.7.0 to main.

**Major Revisions**

* Benchmarks - Fix missing include in FP8 benchmark (#460)
* Fix bug in TE BERT model (#461)
* Doc - Update benchmark doc (#465)
* Bug: Fix bug for incorrect datatype judgement in cublas-function
source code (#464)
* Support `sb deploy` without pulling image (#466)
* Docs - Upgrade version and release note (#467)
Co-authored-by: Russell J. Hewett <russell.j.hewett@gmail.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

b07fda15

17 Jan, 2023 1 commit
- Bug: Fix bug for incorrect datatype judgement in cublas-function source code (#462) · f380bc5e
  Yuting Jiang authored Jan 17, 2023
```
**Description**
Fix bug for incorrect datatype judgement in cublas-function source code.
```
  f380bc5e
04 Jan, 2023 2 commits

Benchmarks - Support topo-aware, pair-wise, and K-batch pattern in nccl-bw benchmark (#454) · ccccd988
Yang Wang authored Jan 04, 2023
```
Support traffic patterns across different devices in NCCL/RCCL test
* change the metrics format if a pattern is specified
```
ccccd988

Runner - Generate host groups file in mpi mode (#458) · 8e748d56

Yang Wang authored Jan 04, 2023

**Major Revision**

- Add a pattern option to generate the mpi_pattern.txt file if a path
is specified.
- In mpi pattern mode, serial_index and parallel_index are added to each
benchmark as environment variables.

**Minor Revision**
- Fix typo

8e748d56