Commits · 0b8d1fd4c3f301dbdc8cf2a6d67d7f7cd9f79c76 · tsoc / superbenchmark

20 Jun, 2025 1 commit

Benchmark - Add Grace CPU support for CPU Stream (#719) · 0b8d1fd4

WenqingLan1 authored Jun 19, 2025



**Description**
Added support for Grace CPU neo2 architecture in CPU Stream. Now CPU
Stream supports dual socket benchmarking.

Example config for this arch support:
```yaml
    cpu-stream:numa0:
      timeout: *default_timeout
      modes:
      - name: local
        parallel: no
      parameters:
        cpu_arch: neo2
        numa_mem_nodes: 0
        cores: 0 1 2 3 4 5 6 7 8
    cpu-stream:numa1:
      timeout: *default_timeout
      modes:
      - name: local
        parallel: no
      parameters:
        cpu_arch: neo2
        numa_mem_nodes: 1
        cores: 64 65 66 67 68 69 70 71 72
    cpu-stream:numa-spread:
      timeout: *default_timeout
      modes:
      - name: local
        parallel: no
      parameters:
        cpu_arch: neo2
        numa_mem_nodes: 0 1
        cores: 0 1 2 3 4 5 6 7 8 64 65 66 67 68 69 70 71 72
```

---------
Co-authored-by: dpower4 <dilipreddi@gmail.com>

0b8d1fd4

18 Jun, 2025 1 commit

Benchmarks - Add GPU Stream Micro Benchmark (#697) · 4eddd50a

WenqingLan1 authored Jun 18, 2025

Added GPU Stream benchmark - measures the GPU memory bandwidth and
efficiency for double datatype through various memory operations
including copy, scale, add, and triad.
- added documentation for `gpu-stream` detailing its introduction,
metrics, and descriptions.
- added unit tests for `gpu-stream`. Example output is in
`superbenchmark/tests/data/gpu_stream.log`.

4eddd50a

05 Feb, 2025 1 commit

Bugfix - nvbandwidth benchmark need to handle N/A value (#675) · 45d06647

Hongtao Zhang authored Feb 05, 2025



**Description**

1. Fixed the bug that nvbandwidth benchmark need to handle 'N/A' values
in nvbandwidth cmd output.
2. Replaced the input format of test cases with a list.
3. Add nvbandwidth configuration example in default config files.

---------
Co-authored-by: hongtaozhang <hongtaozhang@microsoft.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>

45d06647

28 Nov, 2024 1 commit

Benchmarks - Add LLaMA-2 Models (#668) · 249e21c1

pdr authored Nov 27, 2024

Added llama benchmark - training and inference in accordance with the
existing pytorch models implementation like gpt2, lstm etc.

- added llama fp8 unit test for better code coverage, to reduce memory
required
- updated transformers version >= 4.28.0 for LLamaConfig
- set tokenizers version <= 0.20.3 to avoid 0.20.4 version
[issues](https://github.com/huggingface/tokenizers/issues/1691

) with
py3.8
- added llama2 to tensorrt
- llama2 tests not added to test_tensorrt_inference_performance.py due
to large memory requirement for worker gpu. tests validated separately
on gh200

---------
Co-authored-by: dpatlolla <dpatlolla@microsoft.com>

249e21c1

22 Nov, 2024 1 commit

Benchmarks: micro benchmarks - add nvbandwidth benchmark (#669) · 7cef624e

Hongtao Zhang authored Nov 21, 2024



**Description**

Add nvbandwidth benchmark.

---------
Co-authored-by: hongtaozhang <hongtaozhang@microsoft.com>

7cef624e

08 Dec, 2023 1 commit

Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support... · 4fa60be7

Ziyue Yang authored Dec 08, 2023

Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588)

**Description**
Add one-to-all, all-to-one, all-to-all support to
gpu_copy_bw_performance, and fix performance bug in gpu_copy

4fa60be7

24 Mar, 2023 1 commit

Benchmarks - Add distributed inference benchmark (#493) · 8daef211

Ziyue Yang authored Mar 24, 2023



**Description**
This PR adds a micro-benchmark of distributed model inference workloads.

**Major Revision**
- Add a new micro-benchmark dist-inference.
- Add corresponding example and unit tests.
- Update configuration files to include this new micro-benchmark.
- Update micro-benchmark README.

---------
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>

8daef211

21 Mar, 2023 1 commit

Adding HPL benchmark (#482) · 655bd0aa

rafsalas19 authored Mar 21, 2023



**Description**

- Adding HPL benchmark

---------
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>

655bd0aa

13 Feb, 2023 1 commit

Adding Stream Benchmark (#473) · 32896ca4

rafsalas19 authored Feb 13, 2023



**Description**

- Added stream benchmark
- Added stream unit test
- Added stream example
- Modified docker files to build stream

---------
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
Co-authored-by: Yifan Xiong <xiongyf@yandex.com>

32896ca4

11 Apr, 2022 1 commit

Benchmarks: Add Benchmark - Add FAMBench based on docker benchmark (#338) · 80dcc8aa

guoshzhao authored Apr 11, 2022

**Description**
Integrate FAMBench into superbench based on docker implementation:
https://github.com/facebookresearch/FAMBench

The script to run all benchmarks is:
https://github.com/facebookresearch/FAMBench/blob/main/benchmarks/run_all.sh

80dcc8aa

16 Mar, 2022 1 commit

Benchmarks: Add Feature - Add GPU-Burn as microbenchmark (#324) · ff51a3ce

rafsalas19 authored Mar 16, 2022

**Description**
Modifications adding GPU-Burn to SuperBench.
- added third party submodule
- modified Makefile to make gpu-burn binary
- added/modified microbenchmarks to add gpu-burn python scripts
- modified default and azure_ndv4 configs to add gpu-burn

ff51a3ce

08 Feb, 2022 1 commit
- Benchmarks: Revise Code - Make data checking in gpu_copy optional (#301) · 682b2c12
  Ziyue Yang authored Feb 08, 2022
```
This commit makes data checking in gpu_copy optional, because it will take too long time if message size is large.
```
  682b2c12
21 Jan, 2022 1 commit

Benchmarks: Add Feature - Add bidirectional test support in gpu_copy benchmark (#285) · 74421ffe

Ziyue Yang authored Jan 21, 2022

**Description**
This commit adds bidirectional tests in gpu_copy benchmark for both device-host transfer and device-device transfer, and revises related tests.

74421ffe

13 Dec, 2021 1 commit

Benchmarks: Add Benchmark - Add mlc benchmark to superbench (#216) · b590409e

Hossein Pourreza authored Dec 12, 2021

**Description**
Add mlc memory bandwidth and latency micro benchmark to Superbench.

**Major Revision**
- Add mlc benchmark with test and example files

b590409e

10 Dec, 2021 1 commit

Benchmarks: Add Benchmark - Add ONNXRuntime inference benchmark based on ORT python API (#245) · 4d85630a

guoshzhao authored Dec 10, 2021

**Description**
Add ONNXRuntime inference benchmark based on ORT python API.

**Major Revision**
- Add `ORTInferenceBenchmark` class to export pytorch model to onnx model and do inference
- Add tests and example for `ort-inference` benchmark
- Update the introduction docs.

4d85630a

25 Nov, 2021 1 commit
- Benchmarks: Fix Typo - Fix typo in description of kernel launch overhead example (#244) · 6d85b03a
  Kaiyu Xie authored Nov 25, 2021
```
**Description**
Fix typo in description of kernel_launch_overhead.py
```
  6d85b03a
12 Nov, 2021 1 commit

Benchmarks - Add TensorRT inference benchmark (#236) · 8a00c8a0

Yifan Xiong authored Nov 12, 2021

__Description__

Add TensorRT inference benchmark for torchvision models.

__Major Revision__
- Measure TensorRT inference performance.

8a00c8a0

09 Nov, 2021 1 commit

Benchmarks: Add Benchmark - Add ib traffic validation distributed benchmark (#215) · 54919424

Yuting Jiang authored Nov 10, 2021

**Description**
Add ib traffic validation distributed benchmark.

**Major Revision**
- Add ib traffic validation distributed benchmark, example and test

54919424

30 Oct, 2021 1 commit

Benchmarks: Add Feature - Add CPU-initiated copy and dtod support to gpu-sm-copy benchmark (#230) · 008e0fe1

Ziyue Yang authored Oct 30, 2021

**Description**
This commit does the following:
1) Adds CPU-initiated copy benchmark;
2) Adds dtod benchmark;
3) Support scanning NUMA nodes and GPUs inside the benchmark program;
4) Change the name of gpu-sm-copy to gpu-copy.

008e0fe1

27 Oct, 2021 1 commit
- Benchmarks: Add Benchmark - Add onnx model benchmarks based on docker image. (#227) · e98a6812
  guoshzhao authored Oct 27, 2021
```
Add RocmOnnxModelBenchmark class to run benchmarks packaged in superbench/benchmark:rocm4.3.1-onnxruntime1.9.0
```
  e98a6812
22 Oct, 2021 1 commit

Benchmarks: Add Benchmark - Add gpcnet microbenchmark (#229) · 6003f2c2

Yuting Jiang authored Oct 22, 2021

**Description**
Add gpcnet microbenchmark

**Major Revision**
- add 2 microbenmark for gpcnet, gpc-network-test, gpc-network-load-test
- add related test and example file

6003f2c2

12 Oct, 2021 1 commit

Benchmarks: Add Benchmark - Add tcp connectivity validation microbenchmark (#217) · 49cc8f9a

Yuting Jiang authored Oct 13, 2021

**Description**
Add tcp connectivity validation microbenchmark which is to validate TCP connectivity between current node and several nodes in the hostfile.

**Major Revision**
- Add tcp connectivity validation microbenchmark and related test, example

49cc8f9a

30 Aug, 2021 2 commits
- Benchmarks: Add Benchmark - Add GPU SM copy benchmark (#169) · b97197f0
  Ziyue Yang authored Aug 30, 2021
```
**Description**
This commit adds gpu_sm_copy benchmark and related tests.
```
  b97197f0
- Benchmarks: Add Benchmark - Add gemm flops microbenchmark for amd (#152) · f3d53c3d
  Yuting Jiang authored Aug 30, 2021
```
**Description**
Add gemm flops microbenchmark for amd.

**Major Revision**
- Add gemm flops microbenchmark for amd.
- Add related example and test file.
```
  f3d53c3d
27 Aug, 2021 1 commit

Benchmarks: Add Benchmark - Add memory bus bandwidth performance microbenchmark for amd (#153) · 666e3a94

Yuting Jiang authored Aug 27, 2021

**Description**
Add memory bus bandwidth performance microbenchmark for amd.

**Major Revision**
- Add memory bus bandwidth performance microbenchmark for amd.
- Add related example and test file.

666e3a94

30 Jul, 2021 1 commit
- Benchmarks: Add Benchmark - Revise and add rccl microbenchmark for rocm (#143) · 157b4e2d
  Yuting Jiang authored Jul 30, 2021
```
**Description**
Add rccl bandwidth microbenchmark for rocm.

**Major Revision**
- Register rccl-bw benchmark.
```
  157b4e2d
26 Jul, 2021 1 commit

Benchmarks: Add Benchmark - Add NCCL performance benchmark (#113) · e083a598

Yuting Jiang authored Jul 26, 2021

**Description**
Add NCCL performance microbenchmark.

**Major Revision**
- Add microbenchmark, example, test, config for NCCL

e083a598

23 Jul, 2021 2 commits

Benchmarks: Add Benchmark - Add IB Loopback performance benchmark. (#112) · b0c5addc

Yuting Jiang authored Jul 24, 2021

**Description**
Add RDMA Loopback performance microbenchmark.

**Major Revision**
- Add microbenchmark, example, test, config for RDMA Loopback

b0c5addc

Benchmarks: Add Benchmark - Add disk performance benchmark (#132) · db297fb4

Ziyue Yang authored Jul 23, 2021

**Description**
Add disk performance microbenchmark.

**Major Revision**
- Add microbenchmark, example, test, config for disk performance.

**Minor Revision**
- Fix bugs in executor unit test related to default enabled tests.

db297fb4

13 Jul, 2021 1 commit

Benchmarks: Add Benchmark - Add memory bandwidth benchmark for cuda. (#114) · f9550bd6

Yuting Jiang authored Jul 13, 2021

Add microbenchmark, example, test, config for cuda memory performance and Add cuda-samples(tag with cuda version) as git submodule and update related makefile

f9550bd6

02 Jun, 2021 1 commit
- Benchmarks: Add Benchmark - Add FLOPs performance benchmark for cuda. (#87) · 6c6f5269
  guoshzhao authored Jun 02, 2021
```
* add cuda flops performance benchmark.
```
  6c6f5269
01 Jun, 2021 1 commit
- Benchmarks: Add benchmark - add micro benchmark for cudnn test (#89) · 83235433
  Yuting Jiang authored Jun 01, 2021
```
* add python related cudnn microbenchmark
```
  83235433
31 May, 2021 1 commit

Benchmarks: Add benchmark - add micro benchmark for cublas test (#80) · 18398fba

Yuting Jiang authored May 31, 2021



* add benchmark for cublas test

* format

* revise error handling and test

* add interface to read json file, revise json file path and include .json in packaging

* add random_seed in arguments

* revise preprocess of cublas benchmark

* fix lint error and note error in source code

* update according comments

* revise input arguments from json file to custom str and convert json file to built-in dict list

* restore package config

* fit lint issue

* update platform and comments

* rename files to match source code dir and fix comments error
Co-authored-by: root <root@sb-validation-000001.51z1chmys5fuzfqyo4niepozre.bx.internal.cloudapp.net>

18398fba

19 May, 2021 2 commits
- Benchmarks: Add Benchmark - Add kernel launch overhead benchmark. (#74) · e977bbc1
  guoshzhao authored May 19, 2021
```
* add kernel launch overhead benchmark.
```
  e977bbc1
- expose interface of pin memory and modify cnn configuration (#75) · b7d0ee32
  Yuting Jiang authored May 19, 2021
  
  b7d0ee32
26 Apr, 2021 1 commit
- Benchmarks: Code Revision - Revise the settings of CNN example models. (#65) · 65292ae5
  guoshzhao authored Apr 26, 2021
```
* revise example settings of cnn models.
```
  65292ae5
20 Apr, 2021 2 commits
- Benchmarks: Add Benchmark - Add LSTM model benchmarks. (#60) · 2a7ab691
  guoshzhao authored Apr 20, 2021
```
* Benchmarks: Add Benchmark - Add LSTM model benchmarks.
```
  2a7ab691
- Benchmarks: Add Benchmark - Add CNN model benchmarks. (#59) · 902ea211
  guoshzhao authored Apr 20, 2021
```
* Benchmarks: Add Benchmark - Add CNN model benchmarks.
```
  902ea211
16 Apr, 2021 2 commits
- Benchmarks: Code Revision - Fix some issue for BERT benchmark. (#58) · ce3ed24a
  guoshzhao authored Apr 16, 2021
```
Benchmarks: Code Revision - Fix some issue for BERT benchmark. (#58)
```
  ce3ed24a
- Benchmarks: Add Benchmark - Add GPT2 model benchmark. (#57) · af567cf6
  guoshzhao authored Apr 16, 2021
```
* Benchmarks: Add Benchmark - Add GPT2 model benchmark.
```
  af567cf6