Commits · 0fdfe4c3116cf6f8e1d3f150200ecba3856e0d2b · tsoc / superbenchmark

17 Mar, 2026 1 commit
- Add a dtk dockerfile · 0fdfe4c3
  one authored Mar 17, 2026
  
  0fdfe4c3
01 Oct, 2025 1 commit

Dockerfile - add cuda13.0.dockerfile (#739) · 60189dd6

WenqingLan1 authored Oct 01, 2025



Add support for cuda13.0.
Add cuda13.0.dockerfile.
Add cuda13.0 image building task to github pipeline.
Update GPU STREAM to work with cuda13.0.
Fix data type conversion perf bug in GPU stream.
Update nvbandwidth submodule to v0.8.
Update perftest submodule to 4bee61f80d9e268fc97eaf40be00409e91d3a19e
(recent master).

---------
Co-authored-by: Ubuntu <dilipreddi@gmail.com>
Co-authored-by: guoshzhao <guzhao@microsoft.com>

60189dd6

30 Sep, 2025 1 commit

Benchmarks: Micro benchmark - Add simultaneously all-to-host / host-to-all... · 93e9d262

Yuting Jiang authored Sep 30, 2025

Benchmarks: Micro benchmark - Add simultaneously all-to-host / host-to-all bandwidth testcases to nvbandwidth (#736)

**Description**
Add simultaneous all-to-host / host-to-all bandwidth test cases to
nvbandwidth.

**Major Revision**
- nvbandwidth.patch: add simultaneous all-to-host / host-to-all
bandwidth test cases to nvbandwidth
- upgrade nvbandwidth submodule to v0.8
- apply the patch in the makefile build

93e9d262

26 Jun, 2025 1 commit

Benchmarks - Add deepseek megatron-lm benchmark (#713) · deef9a3d

Yuting Jiang authored Jun 27, 2025



**Description**
Add deepseek megatron-lm benchmark.

---------
Co-authored-by: yukirora <yuting.jiang@microsoft.com>
Co-authored-by: Hongtao Zhang <garyworkzht@gmail.com>
Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>

deef9a3d

25 Jun, 2025 1 commit

Dockerfile - Add cuda12.9 docker image (#716) · a56356d8

guoshzhao authored Jun 25, 2025



**Description**
Add cuda 12.9 dockerfile and build in pipeline.

---------
Co-authored-by: Guoshuai Zhao <microsoft@microsoft.com>
Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
Co-authored-by: Hongtao Zhang <garyworkzht@gmail.com>

a56356d8

14 Jun, 2025 1 commit

microbenchmark - CPU Stream Benchmark Revise (#712) · 991c0051

Hongtao Zhang authored Jun 14, 2025



In the current implementation, the CPU‑stream benchmark code renames the
binary before the microbench base class can verify its existence,
causing the default‐binary check to fail.

This PR adds a “default” binary—built with the standard compile
parameters—so that the base class can always find and validate it. Once
the default binary is in place, the CPU‑stream code will rename it as
needed and re‑check its presence before running the benchmark.

The PR also enables CPU stream in the default settings.
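
The check ordering described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual SuperBench code: the class names `MicroBenchmarkBase` and `CpuStream`, the methods `check_binary` and `prepare`, and the file names `stream`, `stream_zen3`, and `neoverse` are all hypothetical, and "renaming" is modelled here as switching which file name the benchmark points at.

```python
import tempfile
from pathlib import Path


class MicroBenchmarkBase:
    """Hypothetical base class that validates the configured binary."""

    def __init__(self, bin_dir: Path, bin_name: str):
        self.bin_dir = bin_dir
        self.bin_name = bin_name

    def check_binary(self) -> bool:
        # True only if the currently configured binary file exists.
        return (self.bin_dir / self.bin_name).is_file()


class CpuStream(MicroBenchmarkBase):
    """Hypothetical subclass that switches to a variant binary after the base check."""

    def prepare(self, variant: str) -> bool:
        # Step 1: the default binary must pass the base-class check first.
        if not self.check_binary():
            return False
        # Step 2: only then point at the variant binary and re-check it.
        self.bin_name = f"{self.bin_name}_{variant}"
        return self.check_binary()


def demo():
    """Create a default and one variant binary, then try two variants."""
    with tempfile.TemporaryDirectory() as tmp:
        bin_dir = Path(tmp)
        (bin_dir / "stream").touch()        # default build (assumed name)
        (bin_dir / "stream_zen3").touch()   # variant build (assumed name)
        found = CpuStream(bin_dir, "stream").prepare("zen3")
        missing = CpuStream(bin_dir, "stream").prepare("neoverse")
        return found, missing
```

Calling `demo()` exercises both paths: the `zen3` variant exists and passes the second check, while the `neoverse` variant was never created and fails it even though the default binary is present.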

---------
Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>

991c0051

21 Mar, 2025 1 commit

Dockerfile - Support cuda12.8 for Blackwell arch (#682) · 294f1f20

pdr authored Mar 20, 2025



**Description**
Updated docker for cuda 12.8
Use the latest cutlass release 3.8 with ARCH 100 (Blackwell) support
Add the latest nccl-test release with ARCH 100 (Blackwell) support
Updated msccl to support build for sm_100
No breaking changes; backward compatibility tested with cuda 12.4

---------
Co-authored-by: Hongtao Zhang <garyworkzht@gmail.com>

294f1f20

21 Nov, 2024 1 commit

Benchmarks: micro benchmarks - add nvbandwidth build (#665) · c8c52eb2

Hongtao Zhang authored Nov 21, 2024



**Description**
Add nvbandwidth build to repo

---------
Co-authored-by: hongtaozhang <hongtaozhang@microsoft.com>

c8c52eb2

06 Nov, 2024 1 commit

Dockerfile - Add support for arm64 build (#660) · 47949127

pdr authored Nov 06, 2024

Add support for arm64 build:

- Updated dockerfile for arm64 build
- extend cpu stream compilation for neoverse 
- handle onnxruntime-gpu installation
- third party builds filtering based on arch
- disable cuda decode perf build for non x86

47949127

28 Jul, 2024 1 commit
- CI/CD - Fix MSCCL build error in CUDA12.4 docker build pipeline (#633) · 2101e933
  Yuting Jiang authored Jul 29, 2024
```
**Description**
Fix MSCCL build error in CUDA12.4 docker build pipeline due to OOM
issue.
```
  2101e933
26 Jul, 2024 1 commit
- Benchmarks: Micro benchmarks - add support for NVIDIA L4/L40/L40s GPUs in gemm-flops (#634) · e304cf15
  Yuting Jiang authored Jul 26, 2024
```
**Description**
Add support for GPU ARCH 8.9 (NVIDIA L4/L40/L40s GPUs) in gemm-flops.
```
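
As a rough illustration of what targeting "GPU ARCH 8.9" means at build time, the sketch below turns a compute capability string into an `nvcc` `-gencode` flag. The helper name `gencode_flag` and the `GPU_ARCH` table are hypothetical, and whether the gemm-flops build passes flags in exactly this form is an assumption; only the 8.9 value for L4/L40/L40s comes from the commit message.

```python
# Compute capability per GPU model, as stated in the commit message above.
GPU_ARCH = {"L4": "8.9", "L40": "8.9", "L40s": "8.9"}


def gencode_flag(compute_capability: str) -> str:
    """Build an nvcc -gencode flag from a 'major.minor' compute capability.

    Hypothetical helper for illustration; not part of SuperBench.
    """
    major, minor = compute_capability.split(".")
    arch = f"{int(major)}{int(minor)}"  # "8.9" becomes "89"
    return f"-gencode arch=compute_{arch},code=sm_{arch}"


if __name__ == "__main__":
    for model, capability in GPU_ARCH.items():
        print(model, gencode_flag(capability))
```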
  e304cf15
08 Jan, 2024 1 commit

Release - SuperBench v0.10.0 (#607) · 2c88db90

Yifan Xiong authored Jan 07, 2024



**Description**

Cherry-pick bug fixes from v0.10.0 to main.

**Major Revisions**

* Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590
* Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591
* Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592
* Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595
* Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596
* CI/CD - Add ndv5 topo file #597
* Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593
* Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599
* Dockerfile - Bug fix for rocm docker build and deploy #598
* Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603
* Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604
* Monitor - Upgrade pyrsmi to amdsmi python library. #601
* Benchmarks: Micro benchmarks - add fp8 and initialization for hipblaslt benchmark #605
* Dockerfile - Add rocm6.0 dockerfile #602
* Bug Fix - Bug fix for latest megatron-lm benchmark #600
* Docs - Upgrade version and release note #606
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
Co-authored-by: guoshzhao <guzhao@microsoft.com>

2c88db90

09 Dec, 2023 1 commit

Dockerfile - Upgrade to rocm5.7 dockerfile (#587) · 1f5031bd

Yuting Jiang authored Dec 10, 2023



**Description**
upgrade to rocm5.7 dockerfile.

---------
Co-authored-by: yukirora <yuting.jiang@microsoft.com>

1f5031bd

07 Dec, 2023 2 commits
- Benchmarks: Add MSCCL Support for Nvidia GPU (#584) · 6ef3a011
  Ziyue Yang authored Dec 07, 2023
```
**Description**
Add MSCCL support for Nvidia GPU
```
  6ef3a011
- Benchmarks: Add benchmark: Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark (#582) · dd5a6329
  Yuting Jiang authored Dec 07, 2023
```
**Description**
Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark
```
  dd5a6329
22 Nov, 2023 1 commit
- Benchmarks: Micro benchmark - Add hipBLASLt function benchmark (#576) · 79089b65
  Yuting Jiang authored Nov 22, 2023
```
**Description**
Add hipBLASLt function benchmark and rebase cuBLASLt function benchmark.
```
  79089b65
27 Jul, 2023 1 commit

Release - SuperBench v0.9.0 (#558) · e1df877b

Yuting Jiang authored Jul 27, 2023

**Description**
Cherry-pick bug fixes from v0.9.0 to main.

**Major Revision**
- CI/CD: pipeline - clean more disk space to fix rocm building image
pipeline (#555)
- Benchmarks: bug fix - use absolute path for input file in
DirectXEncodingLatency (#554)
- CI/CD - add push win docker image on release branch in pipeline (#552)
- Docs - Upgrade version and release note (#557)

e1df877b

03 Jul, 2023 1 commit
- Benchmarks: Build Pipeline - add AMF in third party and build AMF encoding latency test (#543) · 86547217
  Yuting Jiang authored Jul 03, 2023
```
**Description**
add AMF in third party and build AMF encoding latency test.
```
  86547217
21 Mar, 2023 1 commit

Adding HPL benchmark (#482) · 655bd0aa

rafsalas19 authored Mar 21, 2023



**Description**

- Adding HPL benchmark

---------
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>

655bd0aa

24 Feb, 2023 1 commit

Benchmarks: Build Pipeline - Add support for cpu-only perftest in makefile (#480) · 02923660

Yuting Jiang authored Feb 24, 2023



**Description**
Add support for installing cpu-only perftest in makefile.
Co-authored-by: Yuting Jiang <yuting.jiang@microsoft.com>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>

02923660

13 Feb, 2023 1 commit

Adding Stream Benchmark (#473) · 32896ca4

rafsalas19 authored Feb 13, 2023



**Description**

- Added stream benchmark
- Added stream unit test
- Added stream example
- Modified docker files to build stream

---------
Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
Co-authored-by: Yifan Xiong <xiongyf@yandex.com>

32896ca4

29 Dec, 2022 1 commit

Dockerfile - Add CUDA11.8 Docker image for Nvidia arch90 GPUs (#449) · a3c65b2a

Yifan Xiong authored Dec 29, 2022

Add Docker image for arch90 NVIDIA GPUs:

* add CUDA11.8 Dockerfile
* update archs in Makefile and benchmarks accordingly
* update image build pipeline

a3c65b2a

06 Jul, 2022 1 commit

Update dependencies and Dockerfile (#371) · 9f03d568

Yifan Xiong authored Jul 06, 2022

Update dependencies and Dockerfile:
* upgrade nccl-tests and rccl-tests to current latest version to match
  NCCL/RCCL versions
* unify image tag names on DockerHub
* remove verbose output in Dockerfile and minor fix some flags

9f03d568

19 Jun, 2022 1 commit

Update ROCm Dockerfile (#361) · 483bf782

Yifan Xiong authored Jun 19, 2022

**Description**

Update ROCm Dockerfile.

**Major Revisions**
- Add dockerfile for ROCm 5.1.3
- Merge 5.1.x and 5.0.x dockerfile
- Remove 4.2 and 4.0 legacy
- Update build pipeline accordingly

483bf782

15 Jun, 2022 1 commit

Fix cmake and build issues (#360) · 60a3c743

Yifan Xiong authored Jun 15, 2022

**Description**

Fix cmake and build issues.

**Major Revision**

* Remove unnecessary boost build
* Remove user-agent for mlc
* Remove -j for third party to build each project in sequence
* Fix ansible collections installation path

60a3c743

16 Mar, 2022 1 commit

Benchmarks: Add Feature - Add GPU-Burn as microbenchmark (#324) · ff51a3ce

rafsalas19 authored Mar 16, 2022

**Description**
Modifications adding GPU-Burn to SuperBench.
- added third party submodule
- modified Makefile to make gpu-burn binary
- added/modified microbenchmarks to add gpu-burn python scripts
- modified default and azure_ndv4 configs to add gpu-burn

ff51a3ce

24 Feb, 2022 1 commit
- Benchmarks: Build Pipeline - Make gpcnet only for cuda (#316) · 4f5027db
  user4543 authored Feb 24, 2022
```
**Description**
Make gpcnet only for cuda.
```
  4f5027db
09 Feb, 2022 1 commit
- Benchmarks: Build Pipeline - Update rccl-tests submodule to fix divide by zero error (#306) · 4abda6f5
  user4543 authored Feb 09, 2022
```
**Description**
Update rccl-tests submodule to fix divide by zero error.
```
  4abda6f5
29 Jan, 2022 1 commit
- Benchmarks - Support T4 and A10 in GEMM benchmark (#294) · 3419447c
  Yifan Xiong authored Jan 29, 2022
```
Support T4 and A10 in GEMM benchmark.
```
  3419447c
30 Dec, 2021 1 commit

Release - SuperBench v0.4.0 (#278) · ff563b66

Yifan Xiong authored Dec 30, 2021



__Description__

Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__

* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)
Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>

ff563b66

01 Dec, 2021 1 commit
- Benchmarks: Build Pipeline - Upgrade FIO benchmark tool (#251) · b0e759f5
  Ziyue Yang authored Dec 01, 2021
```
**Description**
Upgrade FIO benchmark tool from 3.27 to 3.28.
```
  b0e759f5
21 Oct, 2021 1 commit

Benchmarks: Build Pipeline - Add gpcnet as git submodule and building logic (#228) · b592a7c7

Yuting Jiang authored Oct 21, 2021

**Description**
Add gpcnet as git submodule and building logic.

**Major Revision**
- add gpcnet as a submodule
- add build logic in third_party/Makefile

b592a7c7

26 Sep, 2021 1 commit

Release - SuperBench v0.3.0 (#212) · dfbd70b1

Yifan Xiong authored Sep 26, 2021



**Description**

Cherry-pick bug fixes from v0.3.0 to main.

**Major Revisions**
* Docs - Upgrade version and release note (#209)
* Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
* Benchmarks: Update - Update benchmarks in configuration file (#208)
* CI/CD - Update GitHub Action VM (#211)
* Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
* CI/CD - Fix bug in build image for push event (#205)
* Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
* Tool: Fix bug - Fix function naming issue in system info (#200)
* CI/CD - Push images in GitHub Action (#202)
* Bug - Fix torch.distributed command for single node (#201)
* CLI - Integrate system info for node (#199)
* Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
* CI/CD - Add ROCm image build in GitHub Actions (#194)
* Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
* Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
* Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
* Bug - Revise 'docker run' in sb deploy (#195)
* Bug - Fix Bug: fix wrong param name (operations to operation) in rccl-bw of hpe config (#190)
Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>

dfbd70b1

31 Aug, 2021 1 commit

Benchmarks: Build Pipeline - Support rocblas building in... · b90b47f3

Yuting Jiang authored Sep 01, 2021

Benchmarks: Build Pipeline - Support rocblas building in rocm4.0_ubuntu18.04_py3.6_pytorch_1.7.0 docker (#172)

**Description**
Revise rocblas building logic in third_party/makefile to support rocblas building in rocm4.0_ubuntu18.04_py3.6_pytorch_1.7.0 docker.

**Major Revision**
- add extra build logic, including an environment setting for the pthread limit and a restricted build command, to reduce resource usage

**Minor Revision**
- make rocm_version configurable

b90b47f3

20 Aug, 2021 1 commit

Benchmarks: Build Pipeline - Add build logic of hipBusBandwidth in third_party (#151) · a1e5c90d

Yuting Jiang authored Aug 20, 2021

**Description**
Add build logic of hipBusBandwidth in third_party.

**Major Revision**
- Add build logic of hipBusBandwidth in third_party

a1e5c90d

02 Aug, 2021 1 commit

Benchmarks: Build Pipeline - Add rocBLAS building logic in third_party (#144) · 86c390a9

Yuting Jiang authored Aug 02, 2021

**Description**
Add rocBLAS building logic in third_party.

**Major Revision**
- Add rocm_rocblas target in third_party/Makefile.
- Add rocblas building logic

86c390a9

29 Jul, 2021 2 commits

Benchmarks: Build Pipeline - add rccl-tests as a submodule with building logic (#139) · a532eee4

Yuting Jiang authored Jul 30, 2021

**Description**
Support rocm in third_party/makefile and add rccl-tests as a submodule with building logic.

**Major Revision**
- Support rocm in third_party/makefile
- Add rccl-tests as a submodule 
- Add build logic in third_party/Makefile for rccl-tests

a532eee4

Benchmarks: Build Pipeline - Support rocm in third_party/makefile (#140) · c88ce056

Yuting Jiang authored Jul 29, 2021

**Description**
Support rocm in third_party/makefile.

**Major Revision**
- Split rocm and cuda target in makefile
- Add target in dockerfile

c88ce056

19 Jul, 2021 1 commit

Benchmarks: Build Pipeline - Add FIO benchmark tool (#127) · 4bbd7f51

Ziyue Yang authored Jul 19, 2021

**Description**
Add FIO benchmark tool into third-party dependency.

**Major Revision**
- Add FIO submodule into third-party directory and modify Makefile to enable it.

4bbd7f51

16 Jul, 2021 1 commit
- Benchmarks: Build Pipeline - Add perftest as a submodule and add build logic (#129) · 419dea26
  Yuting Jiang authored Jul 16, 2021
```
Add perftest as a submodule and add build logic
```
  419dea26