Commits · 4fa10f4d00632bb68657160d38bebafe3946aa46 · tsoc / superbenchmark

23 Apr, 2026 1 commit

Benchmarks: Add gpu-hpl and gpu-hpl-mxp micro benchmarks (#15) · 4fa10f4d

one authored Apr 23, 2026

Add gpu-hpl and gpu-hpl-mxp micro benchmarks backed by rocHPL and rocHPL-MxP.

Implemented a shared GPU HPL base that:
- Generates per-workload HPL dat files and parses the corresponding output files.
- Supports common HPL inputs such as process grid, matrix size, block size, broadcast topology, warmup, iterations, and reduce operator.
- Adds rocHPL-specific tuning parameters for gpu-hpl.
- Formats metric keys from input-derived workload attributes.
- Reports `flops`, `time`, and `tests_pass` metrics with warmup-aware aggregation.
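The warmup-aware aggregation and input-derived metric keys described above can be sketched as follows. This is a minimal illustration, not the gpu-hpl implementation. Several details are assumptions: the key layout (`<metric>_n<N>_nb<NB>_p<P>_q<Q>`), the helper names, the use of a plain mean over measured iterations, and setting `tests_pass` to 1 only when every measured iteration passed.

```python
from statistics import mean


def format_metric_key(metric, workload):
    """Build a metric key from input-derived workload attributes.

    Hypothetical key layout: '<metric>_n<N>_nb<NB>_p<P>_q<Q>'.
    """
    return '{}_n{}_nb{}_p{}_q{}'.format(
        metric, workload['n'], workload['nb'], workload['p'], workload['q'])


def aggregate_runs(runs, warmup):
    """Drop the first `warmup` runs, then aggregate the measured ones.

    Each run is a dict with 'flops', 'time' and 'passed' keys.
    """
    measured = runs[warmup:]
    if not measured:
        raise ValueError('no measured iterations after warmup')
    return {
        'flops': mean(r['flops'] for r in measured),
        'time': mean(r['time'] for r in measured),
        # 1 only if every measured iteration passed the residual check.
        'tests_pass': int(all(r['passed'] for r in measured)),
    }


def build_metrics(runs, warmup, workload):
    """Aggregate the runs and attach workload-derived keys to each metric."""
    summary = aggregate_runs(runs, warmup)
    return {format_metric_key(name, workload): value for name, value in summary.items()}
```

Keying metrics by workload attributes lets several gpu-hpl configurations report into one result file without their metric names colliding.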

Add benchmark registrations, parser tests, sample output fixtures, documentation, and recommended configurations for gpu-hpl and gpu-hpl-mxp.

Update rocHPL and rocHPL-MxP third-party integration with build patches, install targets, and SuperBench run helper scripts.

Also update gpu-hpcg metric naming to use flops instead of gflops, remove standalone domain/verification-style metrics from the documented metric surface, and refresh Hygon HPCG documentation/config references accordingly.

4fa10f4d

20 Apr, 2026 1 commit

Update mem-bw to use BandwidthTest (#5) · 800b962a

one authored Apr 20, 2026

* Update mem-bw to use BandwidthTest

* Update config and format code

800b962a

19 Mar, 2026 1 commit

Migrate gpu-stream to BabelStream v5.0 · d4051602

one authored Mar 19, 2026

d4051602

23 Oct, 2025 1 commit

Benchmarks: Micro benchmark - add ncu profile support in cublaslt-gemm (#740) · f6e65a98

Yuting Jiang authored Oct 23, 2025

**Description**
This PR adds NCU (NVIDIA Nsight Compute) profiling support to the
cublaslt-gemm micro benchmark, enabling detailed kernel analysis
including DRAM throughput, compute throughput, and launch arguments.

**Major Revision**
- Add `--enable_ncu_profiling` and `--profiling_metrics` options for NCU profiling
- Modify command execution to use NCU when profiling is enabled
- Update result parsing to handle both standard and NCU-profiled output formats
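Below is a minimal sketch of how a benchmark command might be wrapped with Nsight Compute when profiling is enabled. `ncu` and its `--metrics` option, which takes a comma-separated list of metric names, are part of the Nsight Compute CLI. The `build_command` helper, the binary name, and its arguments are hypothetical and do not reflect the actual cublaslt-gemm code.

```python
import shlex


def build_command(bin_path, bench_args, enable_ncu_profiling=False, profiling_metrics=None):
    """Return the argv list for one benchmark run.

    Hypothetical helper: when profiling is enabled, the benchmark binary is
    launched under Nsight Compute (`ncu`), optionally limited to the given
    metric names, which ncu expects as one comma-separated --metrics value.
    """
    cmd = [bin_path] + list(bench_args)
    if not enable_ncu_profiling:
        return cmd
    ncu = ['ncu']
    if profiling_metrics:
        ncu += ['--metrics', ','.join(profiling_metrics)]
    return ncu + cmd


if __name__ == '__main__':
    # 'cublaslt_gemm' and '--placeholder-arg' are placeholders, not real flags.
    cmd = build_command(
        'cublaslt_gemm', ['--placeholder-arg'],
        enable_ncu_profiling=True,
        profiling_metrics=['dram__throughput.avg.pct_of_peak_sustained_elapsed',
                           'sm__throughput.avg.pct_of_peak_sustained_elapsed'])
    print(shlex.join(cmd))
```

Prepending the profiler, instead of changing the benchmark binary, leaves the unprofiled path identical to the original command. The result parser then only has to recognize which output format it received.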

f6e65a98

20 Jun, 2025 1 commit

Benchmark - Add Grace CPU support for CPU Stream (#719) · 0b8d1fd4

WenqingLan1 authored Jun 19, 2025



**Description**
Added support for the Grace CPU (`neo2` architecture) in CPU Stream. CPU
Stream now supports dual-socket benchmarking.

Example config for this architecture:
```yaml
    cpu-stream:numa0:
      timeout: *default_timeout
      modes:
      - name: local
        parallel: no
      parameters:
        cpu_arch: neo2
        numa_mem_nodes: 0
        cores: 0 1 2 3 4 5 6 7 8
    cpu-stream:numa1:
      timeout: *default_timeout
      modes:
      - name: local
        parallel: no
      parameters:
        cpu_arch: neo2
        numa_mem_nodes: 1
        cores: 64 65 66 67 68 69 70 71 72
    cpu-stream:numa-spread:
      timeout: *default_timeout
      modes:
      - name: local
        parallel: no
      parameters:
        cpu_arch: neo2
        numa_mem_nodes: 0 1
        cores: 0 1 2 3 4 5 6 7 8 64 65 66 67 68 69 70 71 72
```
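The three entries in the config follow a simple rule. There is one entry per NUMA node with that node's cores, plus a spread entry over all nodes and the union of their cores. The hypothetical helper below generates that structure as Python dicts for any node-to-cores mapping. It omits `timeout`, which the YAML takes from the `*default_timeout` alias. It represents `parallel: no` as `False`, which is how YAML 1.1 loaders such as PyYAML read it.

```python
def cpu_stream_entries(cpu_arch, numa_cores):
    """Build cpu-stream entries: one per NUMA node plus a NUMA-spread entry.

    `numa_cores` maps a NUMA node id to the list of core ids on that node.
    """
    def entry(nodes, cores):
        return {
            'modes': [{'name': 'local', 'parallel': False}],
            'parameters': {
                'cpu_arch': cpu_arch,
                'numa_mem_nodes': ' '.join(str(n) for n in nodes),
                'cores': ' '.join(str(c) for c in cores),
            },
        }

    entries = {}
    all_nodes = sorted(numa_cores)
    for node in all_nodes:
        entries['cpu-stream:numa{}'.format(node)] = entry([node], numa_cores[node])
    # The spread entry binds memory on every node and uses all listed cores.
    all_cores = [core for node in all_nodes for core in numa_cores[node]]
    entries['cpu-stream:numa-spread'] = entry(all_nodes, all_cores)
    return entries
```

Called as `cpu_stream_entries('neo2', {0: list(range(9)), 1: list(range(64, 73))})`, it reproduces the parameters of the three entries above.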

---------
Co-authored-by: dpower4 <dilipreddi@gmail.com>

0b8d1fd4

18 Jun, 2025 1 commit

Benchmarks - Add GPU Stream Micro Benchmark (#697) · 4eddd50a

WenqingLan1 authored Jun 18, 2025

Added the GPU Stream benchmark, which measures GPU memory bandwidth and
efficiency for the double data type through the copy, scale, add, and
triad memory operations.
- Added documentation for `gpu-stream` covering its introduction, metrics, and descriptions.
- Added unit tests for `gpu-stream`. Example output is in `superbenchmark/tests/data/gpu_stream.log`.
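To make the four operations concrete, here is a pure-Python reference sketch of the classic STREAM kernels. It also shows the usual byte accounting for 8-byte doubles: copy and scale move two arrays per element, while add and triad move three. The sketch runs on the CPU and only illustrates the semantics and the bandwidth arithmetic. The `gpu-stream` benchmark runs its kernels on the GPU, and the helper names here are hypothetical.

```python
# Bytes moved per array element by each STREAM kernel (8-byte doubles):
# copy and scale read one array and write one; add and triad read two, write one.
BYTES_PER_ELEMENT = {'copy': 16, 'scale': 16, 'add': 24, 'triad': 24}


def run_kernel(name, a, b, c, scalar):
    """Apply one STREAM kernel in place to equal-length lists a, b, c."""
    n = len(a)
    if name == 'copy':
        for i in range(n):
            c[i] = a[i]
    elif name == 'scale':
        for i in range(n):
            b[i] = scalar * c[i]
    elif name == 'add':
        for i in range(n):
            c[i] = a[i] + b[i]
    elif name == 'triad':
        for i in range(n):
            a[i] = b[i] + scalar * c[i]
    else:
        raise ValueError('unknown kernel: {}'.format(name))


def bandwidth_gbps(name, n, seconds):
    """Effective bandwidth in GB/s (1 GB = 1e9 bytes) for one timed kernel run."""
    return BYTES_PER_ELEMENT[name] * n / seconds / 1e9
```

For example, a triad over 10**9 doubles that takes 3 seconds moves 24e9 bytes, which gives 8 GB/s.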

4eddd50a

22 Nov, 2024 1 commit

Benchmarks: micro benchmarks - add nvbandwidth benchmark (#669) · 7cef624e

Hongtao Zhang authored Nov 21, 2024



**Description**

Add nvbandwidth benchmark.

---------
Co-authored-by: hongtaozhang <hongtaozhang@microsoft.com>

7cef624e

08 Jan, 2024 1 commit

Release - SuperBench v0.10.0 (#607) · 2c88db90

Yifan Xiong authored Jan 07, 2024

**Description**

Cherry-pick bug fixes from v0.10.0 to main.

**Major Revisions**

* Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590
* Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591
* Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592
* Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595
* Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596
* CI/CD - Add ndv5 topo file #597
* Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593
* Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599
* Dockerfile - Bug fix for rocm docker build and deploy #598
* Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603
* Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604
* Monitor - U...

2c88db90

10 Dec, 2023 1 commit

Benchmarks: Microbenchmark - Add distributed inference benchmark cpp implementation (#586) · 719a427f

Ziyue Yang authored Dec 11, 2023

**Description**
Add distributed inference benchmark cpp implementation.

719a427f

08 Dec, 2023 1 commit

Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support... · 4fa60be7

Ziyue Yang authored Dec 08, 2023

Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588)

**Description**
Add one-to-all, all-to-one, and all-to-all support to
gpu_copy_bw_performance, and fix a performance bug in gpu_copy.
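One reasonable reading of the three new patterns is shown below as the set of (source, destination) GPU pairs each one exercises. The `copy_pairs` helper, the pattern strings, and the default root GPU are illustrative assumptions. They are not the option names used by gpu_copy_bw_performance.

```python
def copy_pairs(pattern, num_gpus, root=0):
    """Return the (src, dst) GPU pairs exercised by one copy pattern.

    'one_to_all' copies from `root` to every other GPU, 'all_to_one' copies
    from every other GPU to `root`, and 'all_to_all' copies between every
    ordered pair of distinct GPUs.
    """
    others = [g for g in range(num_gpus) if g != root]
    if pattern == 'one_to_all':
        return [(root, g) for g in others]
    if pattern == 'all_to_one':
        return [(g, root) for g in others]
    if pattern == 'all_to_all':
        return [(s, d) for s in range(num_gpus) for d in range(num_gpus) if s != d]
    raise ValueError('unknown pattern: {}'.format(pattern))
```

On an 8-GPU node, all-to-all yields 8 * 7 = 56 ordered pairs, whereas one-to-all and all-to-one each yield 7.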

4fa60be7

07 Dec, 2023 1 commit

Benchmarks: Add benchmark: Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark (#582) · dd5a6329

Yuting Jiang authored Dec 07, 2023

**Description**
Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark

dd5a6329

30 Jun, 2023 1 commit

Benchmarks - Update result parsing in tensorrt inference (#541) · 7184bdd1

Yifan Xiong authored Jun 30, 2023

* Update result parsing for newer tensorrt versions
* Update arguments when loading torchvision models

7184bdd1

21 Mar, 2023 1 commit

Adding HPL benchmark (#482) · 655bd0aa

rafsalas19 authored Mar 21, 2023



**Description**

- Adding HPL benchmark

---------
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>

655bd0aa

13 Feb, 2023 1 commit

Adding Stream Benchmark (#473) · 32896ca4

rafsalas19 authored Feb 13, 2023



**Description**

- Added stream benchmark
- Added stream unit test
- Added stream example
- Modified docker files to build stream

---------
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
Co-authored-by: Yifan Xiong <xiongyf@yandex.com>

32896ca4

04 Jan, 2023 1 commit

Runner - Generate host groups file in mpi mode (#458) · 8e748d56

Yang Wang authored Jan 04, 2023

**Major Revision**

- Add a pattern option to generate an `mpi_pattern.txt` file when a
path is specified.
- In mpi pattern mode, `serial_index` and `parallel_index` are added to
each benchmark as environment variables.
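Below is a minimal sketch of how an mpi pattern could expose each host's position to the benchmark it runs. Every host receives the index of its serial step and of its parallel group as environment variables. The same structure can also be written out one line per group. The variable names `serial_index` and `parallel_index` come from the description above. The helpers and the `mpi_pattern.txt` line format shown here are hypothetical.

```python
def pattern_env(steps):
    """Yield (host, env) for each host in an mpi pattern.

    `steps` is a list of serial steps; each step is a list of host groups
    that run in parallel within that step.
    """
    for serial_index, groups in enumerate(steps):
        for parallel_index, hosts in enumerate(groups):
            for host in hosts:
                yield host, {
                    'serial_index': str(serial_index),
                    'parallel_index': str(parallel_index),
                }


def pattern_lines(steps):
    """Render the pattern one group per line (hypothetical mpi_pattern.txt format)."""
    return [
        'serial {} parallel {}: {}'.format(s, p, ','.join(hosts))
        for s, groups in enumerate(steps)
        for p, hosts in enumerate(groups)
    ]
```

With these two indices, a benchmark (or a later analysis step) can tell which round and which group produced a given result without parsing the host list again.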

**Minor Revision**
- Fix typo

8e748d56

03 Jan, 2023 1 commit

Runner: Support `topo-aware` and `k-batch` pattern in 'mpi' mode (#437) · 65e433c0

Yang Wang authored Jan 03, 2023

**Description**
Support the following patterns in `mpi` mode:
* `k-batch`
* `topo-aware`

65e433c0
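The `k-batch` pattern can be pictured as splitting the host list into groups of k hosts that run the benchmark in parallel. The sketch below assumes consecutive grouping and assumes that leftover hosts which cannot fill a whole group are skipped. Both are assumptions about the runner's behavior, and `k_batch` is a hypothetical helper.

```python
def k_batch(hosts, k):
    """Split hosts into consecutive groups of k hosts each.

    Assumption: trailing hosts that cannot fill a whole group are left out.
    """
    if k <= 0:
        raise ValueError('k must be positive')
    usable = len(hosts) - len(hosts) % k
    return [hosts[i:i + k] for i in range(0, usable, k)]
```

For five hosts and k = 2, this yields two groups of two and leaves the fifth host idle.
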
06 Sep, 2022 1 commit

Release - SuperBench v0.6.0 (#409) · 63e9b2d1

Yifan Xiong authored Sep 06, 2022



**Description**

Cherry-pick bug fixes from v0.6.0 to main.

**Major Revisions**

* Enable latency test in ib traffic validation distributed benchmark (#396)
* Enhance parameter parsing to allow spaces in value (#397)
* Update apt packages in dockerfile (#398)
* Upgrade colorlog for NO_COLOR support (#404)
* Analyzer - Update error handling to support exit code of sb result diagnosis (#403)
* Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399)
* Enhance timeout cleanup to avoid possible hanging (#405)
* Auto generate ibstat file by pssh (#402)
* Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406)
* Docs - Upgrade version and release note (#407)
* Docs - Fix issues in document (#408)
Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

63e9b2d1

22 Aug, 2022 1 commit

Analyzer - Add support for both jsonl and json format in data diagnosis (#388) · 10a79c4e

Yuting Jiang authored Aug 22, 2022

**Description**
Add support for both jsonl and json format in data diagnosis.

**Major Revision**
- Add support for both jsonl and json format in data diagnosis


**Minor Revision**
- change related doc
- add jsonl support in cli
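Supporting both input formats mostly comes down to how the file is read. The sketch below assumes that a `.jsonl` file holds one JSON object per line and that a `.json` file holds either a single object or a list. The `load_results` name and the extension-based dispatch are illustrative; they are not the data diagnosis API.

```python
import json


def load_results(path):
    """Load benchmark results from a .json or .jsonl file as a list of dicts."""
    with open(path, encoding='utf-8') as f:
        if path.endswith('.jsonl'):
            # One JSON object per non-empty line.
            return [json.loads(line) for line in f if line.strip()]
        data = json.load(f)
    # A .json file may hold a single object or a list of objects.
    return data if isinstance(data, list) else [data]
```

Normalizing both formats to a list of dicts lets the rest of the diagnosis pipeline ignore which format the user supplied.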

10a79c4e

09 Aug, 2022 1 commit

Analyzer: Rename fields in json of data diagnosis to be more readable (#382) · b5c7c85d

Yuting Jiang authored Aug 09, 2022

**Description**
Rename field in data diagnosis to be more readable.

**Major Revision**
- rename fields according to diagnosis/metric format

**Minor Revision**
- change type of diagnosis/issue_num to be int

b5c7c85d

01 Aug, 2022 1 commit

Analyzer - Add failure check feature in data diagnosis (#378) · ec16d425

Yuting Jiang authored Aug 01, 2022

**Description**
Add failure check feature in data diagnosis.

**Major Revision**
- Add a failure check rule op: if any metric_regex is not matched by any metric in the result, label the result as failedtest
- Split performance issues and failedtest into separate categories


**Minor Revision**
- Replace DataFrame.append() with pd.concat, since append() will be removed in a later version of pandas
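The failure check rule reduces to a simple test. For each `metric_regex` in the rule, check whether any metric name in the result matches it. If at least one regex matches nothing, the result is labeled `failedtest`. Below is a minimal sketch, assuming `re.match` semantics (anchored at the start of the name) and hypothetical helper names.

```python
import re


def failure_check(metric_regexes, result_metrics):
    """Return the regexes that match none of the metric names in a result."""
    missing = []
    for pattern in metric_regexes:
        regex = re.compile(pattern)
        if not any(regex.match(name) for name in result_metrics):
            missing.append(pattern)
    return missing


def label(metric_regexes, result_metrics):
    """Return 'failedtest' if any rule regex went unmatched, else None."""
    return 'failedtest' if failure_check(metric_regexes, result_metrics) else None
```

Returning the unmatched regexes, and not just a boolean, makes it easy to report which expected metrics were missing.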

ec16d425

26 Jul, 2022 1 commit

Support topo-aware IB performance validation (#373) · ef4d6574

Jie Zhang authored Jul 26, 2022



* Support topo-aware IB performance validation

Add a new pattern, `topo-aware`, so that the user can run IB performance
tests based on the VMs' topology information. This way, the user can
validate IB performance across VM pairs at different distances as a
quick test instead of a full pair-wise test.

To run with the topo-aware pattern, the user needs to specify three
required (and two optional) parameters in the YAML config file:
- `--pattern`: topo-aware
- `--ibstat`: path to ibstat output
- `--ibnetdiscover`: path to ibnetdiscover output
- `--min_dist`: minimum distance of VM pairs (optional, default 2)
- `--max_dist`: maximum distance of VM pairs (optional, default 6)

The newly added topo_aware module then parses the topology
information, builds a graph, and generates the VM pairs with
the specified distance (number of hops).

The specified IB test then runs across these generated VM pairs.
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Add description about topology aware ib traffic tests
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Add unit test to verify generated topology aware config file

This commit adds a unit test to verify that the generated topology-aware
config file is correct. To do so, four new data files are added in
order to invoke the gen_topo_aware_config function to generate a
topology-aware config file, which is then compared with the expected config file.
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>

* Fix lint issue on Azure pipeline
Signed-off-by: Jie Zhang <jessezhang1010@gmail.com>
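The graph step of the topo-aware pattern can be sketched as a breadth-first search over an adjacency list of VMs and switches. The search keeps VM pairs whose hop distance falls within `[min_dist, max_dist]`, with defaults 2 and 6 matching the parameters described above. Parsing the ibstat and ibnetdiscover output is skipped here, and the graph is assumed to be already built. The function names and the grouping of pairs by distance are illustrative; they are not taken from the topo_aware module.

```python
from collections import deque


def hop_distances(graph, source):
    """Breadth-first search: number of hops from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def vm_pairs_by_distance(graph, vms, min_dist=2, max_dist=6):
    """Group unordered VM pairs by hop distance, keeping min_dist <= d <= max_dist."""
    pairs = {}
    for i, a in enumerate(vms):
        dist = hop_distances(graph, a)  # one BFS per source VM
        for b in vms[i + 1:]:
            d = dist.get(b)
            if d is not None and min_dist <= d <= max_dist:
                pairs.setdefault(d, []).append((a, b))
    return pairs
```

In a two-level fabric, two VMs under the same leaf switch are 2 hops apart (VM, leaf, VM), and VMs under different leaves are 4 hops apart through a spine. The minimum distance of 2 therefore corresponds to neighbors sharing a switch.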

ef4d6574

01 Jun, 2022 1 commit

Analyzer - Fix bugs in data diagnosis (#355) · 54da021b

user4543 authored Jun 01, 2022

**Description**
Fix bugs in data diagnosis.

**Major Revision**
- Add support for getting the baseline of metrics that use custom benchmark naming with ':', like 'nccl-bw:default/allreduce_8_bw:0'
- Save raw data of all metrics, rather than only the metrics defined in diagnosis_rules.yaml, when output_all is True
- Fix a bug where the wrong column index was used when applying formatting (red color and percentile) in the Excel output

54da021b

10 Apr, 2022 1 commit

Analyzer: Add Feature - Output results of all nodes in data diagnosis (#336) · 55b0f9d2

user4543 authored Apr 10, 2022

**Description**
Output results of all nodes in data diagnosis.

55b0f9d2

24 Mar, 2022 1 commit

Analyzer: Add feature - Add result summary in excel,md,html format (#320) · 84fed1ce

user4543 authored Mar 24, 2022

**Description**
Add result summary in Excel, Markdown, and HTML formats.

**Major Revision**
- Add ResultSummary class to support result summary in Excel, Markdown, and HTML formats.
- Abstract a RuleBase class for commonly used functions in DataDiagnosis and ResultSummary.

84fed1ce

16 Mar, 2022 1 commit

Benchmarks: Add Feature - Add GPU-Burn as microbenchmark (#324) · ff51a3ce

rafsalas19 authored Mar 16, 2022

**Description**
Modifications adding GPU-Burn to SuperBench.
- added third party submodule
- modified Makefile to make gpu-burn binary
- added/modified microbenchmarks to add gpu-burn python scripts
- modified default and azure_ndv4 configs to add gpu-burn

ff51a3ce

15 Mar, 2022 1 commit

Analyzer - Add md and html output format for DataDiagnosis (#325) · b3c95f18

user4543 authored Mar 15, 2022

**Description**
Add md and html output format for DataDiagnosis.

**Major Revision**
- add md and html support in file_handler
- add interface in DataDiagnosis for md and HTML output

**Minor Revision**
- move excel and json output interface into DataDiagnosis

b3c95f18

09 Feb, 2022 1 commit

Benchmarks: Revise Code - Eliminate NUMA binding for device-to-device tests in gpu_copy (#302) · 6cdf7595

Ziyue Yang authored Feb 09, 2022

**Description**
This commit removes NUMA binding for device-to-device tests because NUMA doesn't affect their performance, and revises benchmark metrics accordingly.

6cdf7595

21 Jan, 2022 1 commit

Benchmarks: Add Feature - Add bidirectional test support in gpu_copy benchmark (#285) · 74421ffe

Ziyue Yang authored Jan 21, 2022

**Description**
This commit adds bidirectional tests in gpu_copy benchmark for both device-host transfer and device-device transfer, and revises related tests.

74421ffe

30 Dec, 2021 1 commit

Release - SuperBench v0.4.0 (#278) · ff563b66

Yifan Xiong authored Dec 30, 2021



__Description__

Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__

* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)
Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>

ff563b66

10 Dec, 2021 1 commit

Monitor: Integration - Integrate monitor into Superbench (#259) · 6e357fb9

guoshzhao authored Dec 10, 2021

**Description**
Integrate monitor into Superbench.

**Major Revision**
- Initialize, start and stop monitor in SB executor.
- Parse the monitor data in SB runner and merge into benchmark results.
- Specify ReduceType for monitor metrics, such as MAX, MIN and LAST.
- Add monitor configs into config file.
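Below is a minimal sketch of what a per-metric reduce type does to the samples a monitor collects during a benchmark run. The `ReduceType` enum here is a local stand-in with only the three members named above, and the metric names in the usage are hypothetical. SuperBench's own definitions may differ.

```python
from enum import Enum


class ReduceType(Enum):
    """Local stand-in: how a series of monitor samples collapses to one value."""
    MAX = 'max'
    MIN = 'min'
    LAST = 'last'


def reduce_samples(samples, reduce_type):
    """Collapse one metric's samples according to its reduce type."""
    if not samples:
        raise ValueError('no monitor samples to reduce')
    if reduce_type is ReduceType.MAX:
        return max(samples)
    if reduce_type is ReduceType.MIN:
        return min(samples)
    return samples[-1]  # ReduceType.LAST: most recent sample


def reduce_monitor(series, reduce_types):
    """Collapse {metric: [samples]} into {metric: value} using each metric's reduce type."""
    return {name: reduce_samples(samples, reduce_types[name]) for name, samples in series.items()}
```

For example, a peak temperature is naturally reduced with MAX, while a counter that only grows is best read with LAST.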

6e357fb9

12 Nov, 2021 1 commit

Benchmarks - Add TensorRT inference benchmark (#236) · 8a00c8a0

Yifan Xiong authored Nov 12, 2021

__Description__

Add TensorRT inference benchmark for torchvision models.

__Major Revision__
- Measure TensorRT inference performance.

8a00c8a0