Commits · b808135c27237ebff290e3ee80887b8d857a86b7 · tsoc / superbenchmark

20 Mar, 2023 1 commit
- Benchmarks - Support tensor core precisions in cublaslt gemm (#492) · b808135c
  Yifan Xiong authored Mar 20, 2023
```
Support FP64/TF32/FP16/BF16 in cublaslt (batch) GEMM.
```
  b808135c
27 Feb, 2023 1 commit

Benchmarks: Revision - Support flexible warmup and non-random data... · eba298f5

Yuting Jiang authored Feb 28, 2023

Benchmarks: Revision - Support flexible warmup and non-random data initialization in cublas-benchmark (#479)

**Description**
Revise cublas-benchmark to support flexible warmup and to fill input data with a
fixed number for perf tests, which improves running efficiency.

**Major Revision**
- remove num_in_steps for warmup to support more flexible warmup setting
for users
- Add support to generate input with fixed number for perf test

eba298f5

13 Feb, 2023 2 commits

Adding Stream Benchmark (#473) · 32896ca4

rafsalas19 authored Feb 13, 2023



**Description**

- Added stream benchmark
- Added stream unit test
- Added stream example
- Modified docker files to build stream

---------
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
Co-authored-by: Yifan Xiong <xiongyf@yandex.com>

32896ca4

Executor - Support SuperBench Executor running on Windows (#475) · 62a29134

Yuting Jiang authored Feb 13, 2023

**Description**
Support SuperBench Executor running on Windows.

**Major Revision**
- Lazy import ansible related module
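
The lazy-import idea can be illustrated with a minimal sketch. The `LazyModule` helper below is hypothetical and not taken from SuperBench; it only shows one common way to defer an import until first use, so that merely loading a package does not require a dependency that may be unavailable on a given platform. Here `json` stands in for the Ansible-related modules.

```python
import importlib


class LazyModule:
    """Hypothetical proxy that imports a module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Called only for attributes not found on the proxy itself,
        # so _name and _module never reach this method.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# Nothing is imported here; the real import happens on the first call below.
lazy_json = LazyModule('json')
print(lazy_json.dumps({'a': 1}))
```

A function-level `import` inside the code path that needs the module achieves the same effect with less machinery.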

62a29134

28 Jan, 2023 1 commit

Release - SuperBench v0.7.0 (#468) · b07fda15

Yifan Xiong authored Jan 28, 2023



**Description**

Cherry-pick bug fixes from v0.7.0 to main.

**Major Revisions**

* Benchmarks - Fix missing include in FP8 benchmark (#460)
* Fix bug in TE BERT model (#461)
* Doc - Update benchmark doc (#465)
* Bug: Fix bug for incorrect datatype judgement in cublas-function
source code (#464)
* Support `sb deploy` without pulling image (#466)
* Docs - Upgrade version and release note (#467)
Co-authored-by: Russell J. Hewett <russell.j.hewett@gmail.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

b07fda15

17 Jan, 2023 1 commit
- Bug: Fix bug for incorrect datatype judgement in cublas-function source code (#462) · f380bc5e
  Yuting Jiang authored Jan 17, 2023
```
**Description**
Fix bug for incorrect datatype judgement in cublas-function source code.
```
  f380bc5e
04 Jan, 2023 3 commits

Benchmarks - Support topo-aware, pair-wise, and K-batch pattern in nccl-bw benchmark (#454) · ccccd988
Yang Wang authored Jan 04, 2023
```
Support traffic patterns across different devices in NCCL/RCCL tests.
* Change the metrics format if a pattern is specified
```
ccccd988

Runner - Generate host groups file in mpi mode (#458) · 8e748d56

Yang Wang authored Jan 04, 2023

**Major Revision**

- Add an option for pattern to generate the mpi_pattern.txt file if
a path is specified.
- In mpi pattern, serial_index and parallel_index are added to each
benchmark as environment variables.

**Minor Revision**
- Fix typo

8e748d56

Benchmarks - Support FP8 in BERT models (#446) · 5197cdf5

Yifan Xiong authored Jan 04, 2023

Support FP8 in PyTorch BERT models:

* add fp8 hybrid/e4m3/e5m2 in precision arguments
* build BERT encoders with `te.TransformerLayer` to replace
`transformers.BertModel`
* wrap forward steps with fp8 autocast

5197cdf5

03 Jan, 2023 6 commits
- Runner: Support `topo-aware` and `k-batch` pattern in 'mpi' mode (#437) · 65e433c0
  Yang Wang authored Jan 03, 2023
```
**Description**
Support the following patterns in `mpi` mode:
* `k-batch`
* `topo-aware`
```
  65e433c0
- Support GEMM benchmark on Hopper GPUs (#456) · fc661f7d
  Yifan Xiong authored Jan 03, 2023
```
Support GEMM benchmark on Hopper GPUs.
```
  fc661f7d
- Benchmarks - Integrate cublaslt micro-benchmark (#455) · 616e7a5a
  Yifan Xiong authored Jan 03, 2023
```
Integrate cublaslt-gemm micro-benchmark #451.
```
  616e7a5a
- Benchmarks: Micro benchmarks - Add correctness check in cublas-function benchmark (#452) · 75573f59
  Yuting Jiang authored Jan 03, 2023
```
**Description**
 Add correctness check in cublas-function benchmark.

**Major Revision**
- add python code of correctness check in cublas-function benchmark and test
```
  75573f59
- Benchmarks - Add cuBLASLt FP16 and FP8 GEMM micro-benchmark (#451) · 0591da5f
  Yifan Xiong authored Jan 03, 2023
```
Add micro-benchmark for cublaslt fp8 gemm.
```
  0591da5f
- Benchmarks: Micro benchmarks - add source code of correctness check for cublas functions (#450) · 678b1251
  Yuting Jiang authored Jan 03, 2023
```
**Description**
Add c source code of correctness check for cublas functions.

**Major Revision**
- add correctness check for all supported cublas functions
- add --correctness option into binary

**Minor Revision**
- fix bug and templatize fill_data and prepare_tensor so that the output matrix has the right memory alignment for each datatype
```
  678b1251
30 Dec, 2022 2 commits

Executor - Add stdout logging util module and enable real-time logging flushing in executor (#445) · 9dfefce3

Yuting Jiang authored Dec 30, 2022

**Description**
Add stdout logging util module and enable real-time logging flushing in executor

**Major Revision**
- Add stdout logging util module to redirect stdout into file log
- enable stdout logging in executor to write benchmark output into both stdout and file `sb-bench.log`
- enable real-time log flushing in run_command of microbenchmarks through config `log_flushing`
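
As a rough illustration of writing output to both stdout and a log file with immediate flushing, here is a minimal sketch. The `StdoutTee` class is hypothetical and is not the module added in this commit; only the file name `sb-bench.log` comes from the description above.

```python
import sys


class StdoutTee:
    """Hypothetical helper: mirror everything written to stdout into a file."""

    def __init__(self, log_path):
        self._log_path = log_path
        self._original = None
        self._file = None

    def write(self, text):
        self._original.write(text)
        self._file.write(text)
        self.flush()  # flush on every write so logs appear in real time
        return len(text)

    def flush(self):
        self._original.flush()
        self._file.flush()

    def __enter__(self):
        self._original = sys.stdout
        self._file = open(self._log_path, 'a')
        sys.stdout = self
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always restore the real stdout, even if the body raised.
        sys.stdout = self._original
        self._file.close()
        return False
```

Wrapping a benchmark run in `with StdoutTee('sb-bench.log'):` would send every `print` to the terminal and to the file.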

**Minor Revision**
- add log_n_step args to enable regular step time log in model benchmarks 
- update related docs

9dfefce3

Benchmarks - Support `pair-wise` pattern in IB validation benchmark (#453) · f2634d86
Yang Wang authored Dec 30, 2022
```
**Description**
* Reuse `gen_pair_wise_config` in micro-benchmark
```
f2634d86

29 Dec, 2022 1 commit
- Runner - Support `pair-wise` pattern in `mpi` mode (#447) · 7838b6b1
  Yang Wang authored Dec 29, 2022
```
* Extract pair-wise pattern from ib_validation
```
  7838b6b1
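
A pair-wise pattern can be sketched as a round-robin schedule, assuming it means that every pair of nodes is tested once and that the pairs within a round are disjoint so they can run in parallel. The function name `pair_wise_rounds` is hypothetical; this is the classic circle method, not necessarily what the extracted SuperBench code does.

```python
def pair_wise_rounds(n):
    """Return rounds of disjoint node-index pairs covering every pair once."""
    nodes = list(range(n))
    if n % 2 == 1:
        nodes.append(None)  # placeholder: one node sits out each round
    size = len(nodes)
    rounds = []
    for _ in range(size - 1):
        pairs = []
        for i in range(size // 2):
            a, b = nodes[i], nodes[size - 1 - i]
            if a is not None and b is not None:
                pairs.append((a, b))
        rounds.append(pairs)
        # Keep the first node fixed and rotate the others by one position.
        nodes = [nodes[0]] + [nodes[-1]] + nodes[1:-1]
    return rounds


for round_pairs in pair_wise_rounds(4):
    print(round_pairs)
```

With 4 nodes this yields 3 rounds of 2 pairs each, which together cover all 6 possible pairs.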
14 Dec, 2022 1 commit
- Benchmark: Revision - Add wait time option to resolve mem-bw unstable issue (#438) · 6583ba2e
  Yuting Jiang authored Dec 14, 2022
```
**Description**
Add wait time option to resolve mem-bw unstable issue.
```
  6583ba2e
29 Nov, 2022 1 commit

Runner - support 'pattern' in 'mpi' mode to run tasks in parallel (#430) · e4eeda0a

Yang Wang authored Nov 29, 2022

* add mpi-parallels mode

* update according to comments

* fix and update doc

* update

* merge into 'mpi' mode

* update according to comments

* fix testcases

* fix ansible

* regard pattern as field

* update

* fix flake8 version

* add flake8 range

* remove map-by from host config

* update comments

e4eeda0a

01 Nov, 2022 1 commit

CLI - Add non-zero return code for `sb [deploy,run]` (#425) · 1b86503d

Yifan Xiong authored Nov 01, 2022

Add non-zero return code for `sb deploy` and `sb run` command when
there are Ansible failures in the control plane.
The return code is set to the number of failures.

For failures caused by benchmarks, return code is still set per benchmark
in results json file.

1b86503d

31 Oct, 2022 1 commit

CLI - Update version to include revision hash and date (#427) · d7bb8303

Yifan Xiong authored Oct 31, 2022

Update version to include revision hash and date in "{last tag}+g{git
hash}.d{date}" format, for example:
* exact tag: 0.6.0
* commit after tag: 0.6.0+gcbb1b34
* commit after tag with local changes: 0.6.0+gcbb1b34.d20221028
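
The three cases above can be reproduced with a small formatting sketch. The function `format_version` and its parameters are hypothetical; how the real build derives the tag, hash, and date is not shown in this commit message.

```python
def format_version(last_tag, git_hash=None, date=None):
    """Build a version string in the "{last tag}+g{git hash}.d{date}" format."""
    version = last_tag
    if git_hash:
        version += '+g' + git_hash
        if date:
            # The date suffix marks a working tree with local changes.
            version += '.d' + date
    return version


print(format_version('0.6.0'))
print(format_version('0.6.0', 'cbb1b34'))
print(format_version('0.6.0', 'cbb1b34', '20221028'))
```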

d7bb8303

18 Oct, 2022 1 commit

Benchmarks - Add support to allow list of custom config string in... · 3367c4f6

Yuting Jiang authored Oct 18, 2022

Benchmarks - Add support to allow list of custom config string in cudnn-functions and cublas-functions (#414)

**Description**
Add support to allow list of custom config string in cudnn-functions and cublas-functions.

3367c4f6

06 Sep, 2022 1 commit

Release - SuperBench v0.6.0 (#409) · 63e9b2d1

Yifan Xiong authored Sep 06, 2022



**Description**

Cherry-pick bug fixes from v0.6.0 to main.

**Major Revisions**

* Enable latency test in ib traffic validation distributed benchmark (#396)
* Enhance parameter parsing to allow spaces in value (#397)
* Update apt packages in dockerfile (#398)
* Upgrade colorlog for NO_COLOR support (#404)
* Analyzer - Update error handling to support exit code of sb result diagnosis (#403)
* Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399)
* Enhance timeout cleanup to avoid possible hanging (#405)
* Auto generate ibstat file by pssh (#402)
* Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406)
* Docs - Upgrade version and release note (#407)
* Docs - Fix issues in document (#408)
Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

63e9b2d1

23 Aug, 2022 1 commit

Analyzer - Add support to store values of metrics in data diagnosis (#392) · 733860d7

Yuting Jiang authored Aug 23, 2022

**Description**
Add support to store values of metrics in data diagnosis.

Take the following rules as example: 
```
    nccl_store_rule:
      categories: NCCL_DIS
      store: True
      metrics:
        - nccl-bw:allreduce-run0/allreduce_1073741824_busbw
        - nccl-bw:allreduce-run1/allreduce_1073741824_busbw
        - nccl-bw:allreduce-run2/allreduce_1073741824_busbw
        - nccl-bw:allreduce-run3/allreduce_1073741824_busbw
        - nccl-bw:allreduce-run4/allreduce_1073741824_busbw
    nccl_rule:
      function: multi_rules
      criteria: 'lambda label:True if min(label["nccl_store_rule"].values())/max(label["nccl_store_rule"].values())<0.95 else False'
      categories: NCCL_DIS
```
**nccl_store_rule** will store the values of the metrics in a dict and save them into `label["nccl_store_rule"]`, and then **nccl_rule** can use the values of the metrics through `label["nccl_store_rule"].values()` in its criteria.
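
To see what the criteria in `nccl_rule` computes, the sketch below evaluates the same lambda string against a hand-built `label` dict. It assumes the criteria string is evaluated as a Python expression; the helper `evaluate_rule`, the shortened metric keys, and the bandwidth values are made up for illustration.

```python
# Criteria string copied from the nccl_rule example above.
CRITERIA = ('lambda label:True if min(label["nccl_store_rule"].values())'
            '/max(label["nccl_store_rule"].values())<0.95 else False')


def evaluate_rule(criteria, label):
    """Hypothetical helper: apply a criteria lambda string to stored values."""
    return eval(criteria)(label)


# Made-up bus bandwidth values for five runs; not real measurements.
healthy = {'nccl_store_rule': {'run0': 230.0, 'run1': 231.0, 'run2': 229.0,
                               'run3': 232.0, 'run4': 228.0}}
degraded = {'nccl_store_rule': {'run0': 230.0, 'run1': 231.0, 'run2': 229.0,
                                'run3': 232.0, 'run4': 210.0}}

print(evaluate_rule(CRITERIA, healthy))   # min/max is about 0.98
print(evaluate_rule(CRITERIA, degraded))  # min/max is about 0.91
```

The rule fires only when the slowest run falls more than 5% below the fastest one.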

733860d7

22 Aug, 2022 1 commit

Analyzer - Add support for both jsonl and json format in data diagnosis (#388) · 10a79c4e

Yuting Jiang authored Aug 22, 2022

**Description**
Add support for both jsonl and json format in data diagnosis.

**Major Revision**
- Add support for both jsonl and json format in data diagnosis
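
Accepting both formats mostly comes down to how the text is split before parsing. A minimal sketch, with a hypothetical `load_results` function that is not the analyzer's actual reader:

```python
import json


def load_results(text, fmt):
    """Parse results from 'json' (one document) or 'jsonl' (one object per line)."""
    if fmt == 'json':
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    if fmt == 'jsonl':
        # Skip blank lines so a trailing newline does not break parsing.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError('unsupported format: ' + fmt)


print(load_results('[{"node": "a"}, {"node": "b"}]', 'json'))
print(load_results('{"node": "a"}\n{"node": "b"}\n', 'jsonl'))
```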


**Minor Revision**
- change related doc
- add jsonl support in cli

10a79c4e

13 Aug, 2022 1 commit

Auto generate ibstat file for topo aware traffic pattern (#381) · faeee0a7

Yang Wang authored Aug 13, 2022

An enhancement for topo-aware IB performance validation #373.
This PR will auto-generate a required ibstat file `ib_traffic_topo_aware_ibstat.txt` which is used as input to build a graph.

faeee0a7

09 Aug, 2022 1 commit

Analyzer: Rename fields in json of data diagnosis to be more readable (#382) · b5c7c85d

Yuting Jiang authored Aug 09, 2022

**Description**
Rename field in data diagnosis to be more readable.

**Major Revision**
- rename fields according to diagnosis/metric format

**Minor Revision**
- change type of diagnosis/issue_num to be int

b5c7c85d

08 Aug, 2022 1 commit
- Runner - Fix minimum timeout (#385) · 9c29c931
  Yifan Xiong authored Aug 08, 2022
```
Fix minimum timeout: use 60s if config is shorter.
```
  9c29c931
04 Aug, 2022 1 commit

Gracefully exit when timeout (#383) · 9b8df883

Yifan Xiong authored Aug 04, 2022

* Gracefully exit when timeout, add corresponding log and return code.
* Set minimum timeout to 1 minute and enlarge Ansible timeout.

9b8df883

01 Aug, 2022 1 commit

Analyzer - Add failure check feature in data diagnosis (#378) · ec16d425

Yuting Jiang authored Aug 01, 2022

**Description**
Add failure check feature in data diagnosis.

**Major Revision**
- Add a failure check rule op: if a metric_regex is not matched by any metric in the result, label it as failedtest
- Split performance issue and failedtest in categories
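
The failure check can be pictured as follows: a sketch under the assumption that each `metric_regex` must match at least one metric name in full. The function `find_unmatched_patterns` and the sample metric names are hypothetical.

```python
import re


def find_unmatched_patterns(metric_regexes, result_metrics):
    """Return the patterns that no metric name in the result matches."""
    return [
        pattern for pattern in metric_regexes
        if not any(re.fullmatch(pattern, name) for name in result_metrics)
    ]


result_metrics = ['nccl-bw:allreduce/allreduce_1073741824_busbw']
patterns = [r'nccl-bw:allreduce/.*_busbw', r'mem-bw/h2d_bw']

missing = find_unmatched_patterns(patterns, result_metrics)
# A non-empty list would mean the expected benchmark produced no output.
print('failedtest' if missing else 'ok', missing)
```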


**Minor Revision**
- replace DataFrame.append() with pd.concat since append() will be removed in later version of pandas

ec16d425

26 Jul, 2022 1 commit

Support topo-aware IB performance validation (#373) · ef4d6574

Jie Zhang authored Jul 26, 2022



* Support topo-aware IB performance validation

Add a new pattern `topo-aware`, so the user can run IB performance
test based on VM's topology information. This way, the user can
validate the IB performance across VM pairs with different distance
as a quick test instead of pair-wise test.

To run with the topo-aware pattern, the user needs to specify three required
(and two optional) parameters in the YAML config file:
- `--pattern`: `topo-aware`
- `--ibstat`: path to ibstat output
- `--ibnetdiscover`: path to ibnetdiscover output
- `--min_dist`: minimum distance of VM pairs (optional, default 2)
- `--max_dist`: maximum distance of VM pairs (optional, default 6)

The newly added topo_aware module then parses the topology
information, builds a graph, and generates the VM pairs with
the specified distance (# hops).

The specified IB test will then run across these
generated VM pairs.
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Add description about topology aware ib traffic tests
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Add unit test to verify generated topology aware config file

This commit adds unit test to verify the generated topology aware
config file is correct. To do so, four new data files are added in
order to invoke gen_topo_aware_config function to generate topology
aware config file, then compares it with the expected config file.
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Fix lint issue on Azure pipeline
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
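
The pair generation described for the `topo-aware` pattern (parse topology, build a graph, select VM pairs by hop count) can be sketched with a breadth-first search. This is only an illustration under assumptions: the function `pairs_at_distance`, the adjacency-dict input, and the toy topology are hypothetical, and the real `topo_aware` module may select or order pairs differently.

```python
from collections import deque


def pairs_at_distance(adjacency, hosts, min_dist, max_dist):
    """Group host pairs by hop count, keeping min_dist <= hops <= max_dist."""
    result = {}
    for src in sorted(hosts):
        # Breadth-first search gives the hop count from src to every node.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for dst in sorted(hosts):
            if src < dst and dst in dist and min_dist <= dist[dst] <= max_dist:
                result.setdefault(dist[dst], []).append((src, dst))
    return result


# Toy topology: vm0 and vm1 share switch sw0; vm2 is behind sw1 via a spine.
topology = {
    'vm0': ['sw0'], 'vm1': ['sw0'], 'vm2': ['sw1'],
    'sw0': ['vm0', 'vm1', 'spine'], 'sw1': ['vm2', 'spine'],
    'spine': ['sw0', 'sw1'],
}
print(pairs_at_distance(topology, ['vm0', 'vm1', 'vm2'], 2, 6))
```

In this toy graph, VMs under the same switch are 2 hops apart and VMs under different switches are 4 hops apart.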

ef4d6574

25 Jul, 2022 1 commit

Fix unexpected base conversion when the result value is negative (#377) · 5d448eed

Yang Wang authored Jul 25, 2022

Fix an unexpected result value (`-0.125`) issue in ib traffic benchmark when encountering `-1` in raw output
* Check if the value is valid before the base conversion
* Add a test case to cover this situation
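
The `-0.125` value is consistent with a `-1` sentinel being divided by 8 during unit conversion. Assuming that is the conversion involved, a validity check before converting could look like the sketch below; the function name `convert_result` is hypothetical.

```python
def convert_result(raw_value):
    """Convert Gbit/s to GB/s, passing negative sentinel values through as-is."""
    if raw_value < 0:
        # A negative value signals a failed measurement, not a bandwidth.
        return raw_value
    return raw_value / 8


print(convert_result(-1))
print(convert_result(200))
```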

5d448eed

20 Jul, 2022 1 commit

Fix port conflict in ib loopback (#375) · 352ae0c9

Yifan Xiong authored Jul 20, 2022

Fix a potential port conflict caused by a race condition between time-of-check
and time-of-use, by keeping the port bound throughout.
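
As a generic illustration of the idea (not the benchmark's actual code), a port can be reserved by binding a socket and keeping it open instead of probing for a free port and releasing it before use. The helper `reserve_port` is hypothetical.

```python
import socket


def reserve_port(host='127.0.0.1'):
    """Bind to an OS-assigned free port and return the open socket and port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 lets the OS pick an unused port
    return sock, sock.getsockname()[1]


sock, port = reserve_port()
# While sock stays open, other plain bind() calls on this port fail,
# so there is no gap between checking the port and using it.
print(port)
```

The caller is responsible for closing `sock` once the port is no longer needed.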

Modify the function to resolve flake8 C901 while keeping the logic same.

352ae0c9

09 Jul, 2022 1 commit

Fix issues in ib validation benchmark (#370) · b2875179

Yifan Xiong authored Jul 09, 2022

Fix several issues in ib validation benchmark:
* continue running when timeout in the middle, instead of aborting whole mpi process
* make timeout parameter configurable, set default to 120 seconds
* avoid mixture of stdio and iostream when print to stdout
* set default message size to 8M which will saturate ib in most cases
* fix hostfile path issue so that it can be auto found in different cases

b2875179

08 Jul, 2022 1 commit

Support node_num=1 in mpi mode (#372) · e00a8180

Yifan Xiong authored Jul 08, 2022

Support `node_num: 1` in mpi mode, so that we can run mpi benchmarks in
both 1 node and all nodes in one config by changing `node_num`.
Update docs and add test case accordingly.

e00a8180

05 Jul, 2022 1 commit
- CLI - Support SKU auto detect if running on Azure VM (#365) · a94ead34
  Yifan Xiong authored Jul 05, 2022
```
Support SKU auto-detection and use the corresponding benchmark config when running on an Azure VM.
```
  a94ead34
29 Jun, 2022 2 commits

Fix issues in ib loopback benchmark (#369) · 620192a2

Yifan Xiong authored Jun 30, 2022

Fix several issues in ib loopback benchmark:
* use `--report_gbits` and divide by 8 to get GB/s, previous results are
  MiB/s / 1000
* use the ib_write_bw binary built in third_party instead of system path
* update the metrics name so that different hca indices have same metric

620192a2

Deployment - Refine error message when GPU is not detected (#368) · 8ef7163a

Yifan Xiong authored Jun 30, 2022

Refine error message when GPU is not detected.

Possible solutions if hardware exists and drivers are already installed:
* nvidia gpus:
  ```sh
  /sbin/modprobe nvidia-uvm
  D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`
  mknod -m 666 /dev/nvidia-uvm c $D 0
  ```

* amd gpus
  ```sh
  modprobe amdgpu
  ```

8ef7163a

24 Jun, 2022 1 commit

Support multiple IB/GPU in ib validation (#363) · bfaa1c83

Yifan Xiong authored Jun 24, 2022

**Description**

Support multiple IB/GPU devices run simultaneously in ib validation benchmark.

**Major Revisions**
- Revise ib_validation_performance.cc so that multiple processes per node could be used to launch multiple perftest commands simultaneously. For each node pair in the config, number of processes per node will run in parallel.
- Revise ib_validation_performance.py to correct file paths and adjust parameters to specify different NICs/GPUs/NUMA nodes.
- Fix env issues in Dockerfile for end-to-end test.
- Update ib-traffic configuration examples in config files.
- Update unit tests and docs accordingly.

Closes #326.

bfaa1c83