Commits · 694ae2a7c687a0fb12dc09a4216a04347ceb6d1d · tsoc / superbenchmark

14 Apr, 2023 1 commit

Docs - Upgrade version and release note (#508) · 694ae2a7

Yifan Xiong authored Apr 14, 2023

__Description__

Upgrade version and release note.

__Major Revision__

- Upgrade package versions
- Add release note for v0.8.0

694ae2a7

12 Apr, 2023 2 commits
- Remove unreachable condition when write host list (#512) · 5a2adddc
  Yifan Xiong authored Apr 12, 2023
```
Remove unreachable condition when write host list in mpi mode.
```
  5a2adddc
- Add num_workers argument in model benchmark (#511) · 4c5417f7
  Yifan Xiong authored Apr 12, 2023
```
Change num_workers to configurable in model benchmark data loader.
```
  4c5417f7
07 Apr, 2023 1 commit
- Monitor - Collect realtime GPU power when benchmarking. (#507) · 10380709
  guoshzhao authored Apr 07, 2023
```
**Description**
Collect realtime GPU power when benchmarking.
```
  10380709
06 Apr, 2023 4 commits

Bug - Fix bug to get metric from cmd when error happens (#506) · 9f18dea3
Yuting Jiang authored Apr 06, 2023
```
**Description**
Fix bug to get metric from cmd when error happens (cudnn-function/_time:4)
```
9f18dea3

Analyzer: Fix bug in python3.8 due to pandas api change (#504) · 14a4a44b

Yuting Jiang authored Apr 06, 2023

**Description**
Analyzer: Fix bug in python3.8 due to pandas api change.

**Major Revision**
- force check numeric only in dataframe for analysis
- dataframe.append -> pd.concat
- pd.ExcelWriter.save() -> pd.ExcelWriter.close()
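
The first two revisions can be illustrated with a minimal sketch. The sample data and column names below are hypothetical and not taken from the analyzer code. It only shows that `pd.concat` is the replacement for the removed `DataFrame.append`, and that `numeric_only=True` limits an aggregation to numeric columns:

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
baseline = pd.DataFrame({"node": ["n0", "n1"], "bandwidth": [10.0, 12.0]})
new_row = pd.DataFrame({"node": ["n2"], "bandwidth": [14.0]})

# DataFrame.append is gone in recent pandas; pd.concat does the same job.
combined = pd.concat([baseline, new_row], ignore_index=True)

# numeric_only=True restricts the aggregation to numeric columns,
# so the string column "node" is skipped instead of causing an error.
means = combined.mean(numeric_only=True)
```

For the third revision, using `pd.ExcelWriter` as a context manager (`with pd.ExcelWriter(path) as writer:`) closes the file on exit, which avoids calling either `save()` or `close()` by hand.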

14a4a44b

Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505) · b97ddcf7
Ziyue Yang authored Apr 06, 2023
```
**Description**
This commit fixes wrong `torch.empty_like` usage and missing dtype and
device argument in communication wrappers.
```
b97ddcf7
Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503) · 9d250cdd
Yifan Xiong authored Apr 06, 2023
```
Fix matrix size overflow issue when cast from int to size_t implicitly.
```
9d250cdd

03 Apr, 2023 1 commit

Monitor - Fix the cgroup version checking logic. (#502) · 26373edb

guoshzhao authored Apr 03, 2023

**Description**
`grep cgroup /proc/filesystems` does not work on NDv4: its cgroup
version is v1, but the command reports v2. Check for file existence
instead to determine the cgroup version.
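
A minimal sketch of a file-existence check is shown below. The commit message does not say which file the monitor tests, so the use of `cgroup.controllers` (which exists at the root of a cgroup v2 unified hierarchy) and the function name `detect_cgroup_version` are assumptions for illustration:

```python
from pathlib import Path


def detect_cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    """Return 2 if `root` looks like a cgroup v2 hierarchy, otherwise 1."""
    # Assumption: the presence of cgroup.controllers marks a v2 hierarchy.
    # The actual file checked by the monitor is not stated in the commit.
    return 2 if (Path(root) / "cgroup.controllers").is_file() else 1
```

Taking the root directory as a parameter keeps the check testable against a temporary directory instead of the live `/sys/fs/cgroup`.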

26373edb

28 Mar, 2023 1 commit

Benchmark - Update TE FP8 model conversion (#499) · 97c9a41f

Yifan Xiong authored Mar 28, 2023

__Description__

Update TE FP8 model conversion.

__Major Revisions__
* Add 16-byte alignment comment.
* Fix TE layer parameters type.

97c9a41f

25 Mar, 2023 1 commit

Benchmarks - Support TE FP8 in BERT/GPT2 models (#496) · c88c9709

Yifan Xiong authored Mar 25, 2023

Support Transformer Engine FP8 in existing PyTorch BERT/GPT2 models by
converting linear/layernorm to TE layers.

c88c9709

24 Mar, 2023 1 commit

Benchmarks - Add distributed inference benchmark (#493) · 8daef211

Ziyue Yang authored Mar 24, 2023



**Description**
This PR adds a micro-benchmark of distributed model inference workloads.

**Major Revision**
- Add a new micro-benchmark dist-inference.
- Add corresponding example and unit tests.
- Update configuration files to include this new micro-benchmark.
- Update micro-benchmark README.

---------
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>

8daef211

22 Mar, 2023 2 commits

Monitor - Support cgroup V2 when read system metrics. (#491) · a9b45a07

guoshzhao authored Mar 22, 2023

**Description**
Ubuntu 22.04 uses cgroup v2, which changes the file structure.
Modify the monitor to support both cgroup v1 and v2.

a9b45a07

Benchmark - Support batch/shape range in cublaslt gemm (#494) · dbeba805
Yifan Xiong authored Mar 22, 2023
```
Support batch and shape range with multiplication factors in cublaslt
gemm benchmark.
```
dbeba805

21 Mar, 2023 2 commits

Adding HPL benchmark (#482) · 655bd0aa

rafsalas19 authored Mar 21, 2023



**Description**

- Adding HPL benchmark

---------
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>

655bd0aa

Benchmark - Fix torch.dist init issue with multiple models (#495) · 644b5395

Yifan Xiong authored Mar 21, 2023

Fix a potential barrier timeout in init_process_group caused by a race
condition when reusing the same port. Use a different port for each
model when running multiple models sequentially in one process.
For example, running vgg11/13/16/19 uses ports 29501 to 29504
respectively.
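
The port assignment in the example can be sketched as follows. The base value 29500 and the helper name `port_for_model` are assumptions chosen to reproduce the ports quoted above. They are not taken from the benchmark code:

```python
BASE_PORT = 29500  # assumption: chosen so that the first model gets 29501


def port_for_model(index: int, base_port: int = BASE_PORT) -> int:
    """Return a distinct port for the index-th model (0-based)."""
    return base_port + 1 + index


models = ["vgg11", "vgg13", "vgg16", "vgg19"]
# Each model run in the same process gets its own port.
ports = {name: port_for_model(i) for i, name in enumerate(models)}
```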

644b5395

20 Mar, 2023 2 commits

Benchmarks: Support error tolerance in micro-benchmark for CuDNN function (#490) · 5a88db16

Yuting Jiang authored Mar 20, 2023

**Description**
Support error tolerance in micro-benchmark for CuDNN function


**Major Revision**
- revise micro_base to keep running the remaining commands when
one command fails in the micro-benchmark
- set error tolerance to true in cudnn functions

5a88db16

Benchmarks - Support tensor core precisions in cublaslt gemm (#492) · b808135c
Yifan Xiong authored Mar 20, 2023
```
Support FP64/TF32/FP16/BF16 in cublaslt (batch) GEMM.
```
b808135c

27 Feb, 2023 1 commit

Benchmarks: Revision - Support flexible warmup and non-random data initialization in cublas-benchmark (#479) · eba298f5

Yuting Jiang authored Feb 28, 2023

**Description**
Revise cublas-benchmark to support flexible warmup and to fill data
with a fixed number in perf tests, improving running efficiency.

**Major Revision**
- remove num_in_steps for warmup to support more flexible warmup setting
for users
- Add support to generate input with fixed number for perf test

eba298f5

13 Feb, 2023 2 commits

Adding Stream Benchmark (#473) · 32896ca4

rafsalas19 authored Feb 13, 2023



**Description**

- Added stream benchmark
- Added stream unit test
- Added stream example
- Modified docker files to build stream

---------
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
Co-authored-by: Yifan Xiong <xiongyf@yandex.com>

32896ca4

Executor - Support SuperBench Executor running on Windows (#475) · 62a29134

Yuting Jiang authored Feb 13, 2023

**Description**
Support SuperBench Executor running on Windows.

**Major Revision**
- Lazy import ansible related module

62a29134

28 Jan, 2023 1 commit

Release - SuperBench v0.7.0 (#468) · b07fda15

Yifan Xiong authored Jan 28, 2023



**Description**

Cherry-pick bug fixes from v0.7.0 to main.

**Major Revisions**

* Benchmarks - Fix missing include in FP8 benchmark (#460)
* Fix bug in TE BERT model (#461)
* Doc - Update benchmark doc (#465)
* Bug: Fix bug for incorrect datatype judgement in cublas-function
source code (#464)
* Support `sb deploy` without pulling image (#466)
* Docs - Upgrade version and release note (#467)
Co-authored-by: Russell J. Hewett <russell.j.hewett@gmail.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

b07fda15

17 Jan, 2023 1 commit
- Bug: Fix bug for incorrect datatype judgement in cublas-function source code (#462) · f380bc5e
  Yuting Jiang authored Jan 17, 2023
```
**Description**
Fix bug for incorrect datatype judgement in cublas-function source code.
```
  f380bc5e
04 Jan, 2023 3 commits

Benchmarks - Support topo-aware, pair-wise, and K-batch pattern in nccl-bw benchmark (#454) · ccccd988
Yang Wang authored Jan 04, 2023
```
Support traffic patterns across different devices in NCCL/RCCL test
* change the metrics format if a pattern is specified
```
ccccd988

Runner - Generate host groups file in mpi mode (#458) · 8e748d56

Yang Wang authored Jan 04, 2023

**Major Revision**

- Add a pattern option to generate the mpi_pattern.txt file if a path
is specified.
- In mpi pattern, serial_index and parallel_index are added to each
benchmark as environment variables.

**Minor Revision**
- Fix typo

8e748d56

Benchmarks - Support FP8 in BERT models (#446) · 5197cdf5

Yifan Xiong authored Jan 04, 2023

Support FP8 in PyTorch BERT models:

* add fp8 hybrid/e4m3/e5m2 in precision arguments
* build BERT encoders with `te.TransformerLayer` to replace
`transformers.BertModel`
* wrap forward steps with fp8 autocast

5197cdf5

03 Jan, 2023 6 commits
- Runner: Support `topo-aware` and `k-batch` pattern in 'mpi' mode (#437) · 65e433c0
  Yang Wang authored Jan 03, 2023
```
**Description**
Support the following patterns in `mpi` mode:
* `k-batch`
* `topo-aware`
```
  65e433c0
- Support GEMM benchmark on Hopper GPUs (#456) · fc661f7d
  Yifan Xiong authored Jan 03, 2023
```
Support GEMM benchmark on Hopper GPUs.
```
  fc661f7d
- Benchmarks - Integrate cublaslt micro-benchmark (#455) · 616e7a5a
  Yifan Xiong authored Jan 03, 2023
```
Integrate cublaslt-gemm micro-benchmark #451.
```
  616e7a5a
- Benchmarks: Micro benchmarks - Add correctness check in cublas-function benchmark (#452) · 75573f59
  Yuting Jiang authored Jan 03, 2023
```
**Description**
 Add correctness check in cublas-function benchmark.

**Major Revision**
- add python code of correctness check in cublas-function benchmark and test
```
  75573f59
- Benchmarks - Add cuBLASLt FP16 and FP8 GEMM micro-benchmark (#451) · 0591da5f
  Yifan Xiong authored Jan 03, 2023
```
Add micro-benchmark for cublaslt fp8 gemm.
```
  0591da5f
- Benchmarks: Micro benchmarks - add source code of correctness check for cublas functions (#450) · 678b1251
  Yuting Jiang authored Jan 03, 2023
```
**Description**
Add C source code of correctness check for cublas functions.

**Major Revision**
- add correctness check for all supported cublas functions
- add --correctness option into binary

**Minor Revision**
- fix bug and templatize fill_data and prepare_tensor to get a correctly memory-aligned output matrix for different datatypes
```
  678b1251
30 Dec, 2022 2 commits

Executor - Add stdout logging util module and enable real-time logging flushing in executor (#445) · 9dfefce3

Yuting Jiang authored Dec 30, 2022

**Description**
Add stdout logging util module and enable real-time logging flushing in executor

**Major Revision**
- Add stdout logging util module to redirect stdout into file log
- enable stdout logging in executor to write benchmark output into both stdout and file `sb-bench.log`
- enable real-time log flushing in run_command of microbenchmarks through config `log_flushing`

**Minor Revision**
- add log_n_step args to enable regular step time log in model benchmarks
- update related docs

9dfefce3

Benchmarks - Support `pair-wise` pattern in IB validation benchmark (#453) · f2634d86
Yang Wang authored Dec 30, 2022
```
**Description**
* Reuse `gen_pair_wise_config` in micro-benchmark
```
f2634d86

29 Dec, 2022 1 commit
- Runner - Support `pair-wise` pattern in `mpi` mode (#447) · 7838b6b1
  Yang Wang authored Dec 29, 2022
```
* Extract pair-wise pattern from ib_validation
```
  7838b6b1
14 Dec, 2022 1 commit
- Benchmark: Revision - Add wait time option to resolve mem-bw unstable issue (#438) · 6583ba2e
  Yuting Jiang authored Dec 14, 2022
```
**Description**
Add wait time option to resolve mem-bw unstable issue.
```
  6583ba2e
29 Nov, 2022 1 commit

Runner - support 'pattern' in 'mpi' mode to run tasks in parallel (#430) · e4eeda0a

Yang Wang authored Nov 29, 2022

* add mpi-parallels mode

* update according to comments

* fix and update doc

* update

* merge into 'mpi' mode

* update according to comments

* fix testcases

* fix ansible

* regard pattern as field

* update

* fix flake8 version

* add flake8 range

* remove map-by from host config

* update comments

e4eeda0a

01 Nov, 2022 1 commit

CLI - Add non-zero return code for `sb [deploy,run]` (#425) · 1b86503d

Yifan Xiong authored Nov 01, 2022

Add non-zero return code for `sb deploy` and `sb run` commands when
there are Ansible failures in the control plane.
The return code is set to the number of failures.

For failures caused by benchmarks, the return code is still set per
benchmark in the results json file.
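
As a rough sketch of the rule above, the return code can be derived by counting failures. The per-host dictionary and the function name `return_code_from_failures` are hypothetical. How the CLI actually collects Ansible failures is not described in the commit:

```python
def return_code_from_failures(host_ok: dict) -> int:
    """Count failed entries; 0 means the control plane had no failures."""
    # Assumption: one boolean per host, True when Ansible succeeded there.
    return sum(1 for ok in host_ok.values() if not ok)


# Hypothetical outcome of a deployment across three hosts.
code = return_code_from_failures({"host-a": True, "host-b": False, "host-c": False})
```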

1b86503d

31 Oct, 2022 1 commit

CLI - Update version to include revision hash and date (#427) · d7bb8303

Yifan Xiong authored Oct 31, 2022

Update version to include revision hash and date in "{last tag}+g{git
hash}.d{date}" format, for example:
* exact tag: 0.6.0
* commit after tag: 0.6.0+gcbb1b34
* commit after tag with local changes: 0.6.0+gcbb1b34.d20221028
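
The three cases can be reproduced with a small formatter. This is a sketch of the string format only. The function name `format_version` and its parameters are hypothetical and do not reflect how the package computes its version:

```python
from typing import Optional


def format_version(
    last_tag: str,
    git_hash: Optional[str] = None,
    date: Optional[str] = None,
) -> str:
    """Build a version string in "{last tag}+g{git hash}.d{date}" form."""
    version = last_tag
    if git_hash:
        # Commits after the tag carry the short revision hash.
        version += f"+g{git_hash}"
        if date:
            # Local changes on top of that commit add the date.
            version += f".d{date}"
    return version
```

For instance, `format_version("0.6.0", "cbb1b34", "20221028")` yields the third form listed above.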

d7bb8303

18 Oct, 2022 1 commit

Benchmarks - Add support to allow list of custom config string in cudnn-functions and cublas-functions (#414) · 3367c4f6

Yuting Jiang authored Oct 18, 2022

**Description**
Add support to allow list of custom config string in cudnn-functions and cublas-functions.

3367c4f6