Commits · b98642442e2b6d86290a0c7c2c1822e8fcaf09b3 · tsoc / superbenchmark

08 Oct, 2025 1 commit
- CI/CD - Fix image merge in GitHub Action. (#749) · b9864244
  Yifan Xiong authored Oct 07, 2025
```
Fix image merge for release event in GitHub Action.
```
  b9864244
01 Oct, 2025 1 commit

Dockerfile - add cuda13.0.dockerfile (#739) · 60189dd6

WenqingLan1 authored Oct 01, 2025



Add support for cuda13.0.
Add cuda13.0.dockerfile.
Add cuda13.0 image building task to github pipeline.
Update GPU STREAM to work with cuda13.0.
Fix data type conversion perf bug in GPU stream.
Update nvbandwidth submodule to be v0.8.
Update perftest submodule to be 4bee61f80d9e268fc97eaf40be00409e91d3a19e
(recent master).

---------
Co-authored-by: Ubuntu <dilipreddi@gmail.com>
Co-authored-by: guoshzhao <guzhao@microsoft.com>

60189dd6

12 Aug, 2025 1 commit

Release - SuperBench v0.12.0 (#729) · 0b4311cd

Hongtao Zhang authored Aug 12, 2025



**Description**

Cherry-pick bug fixes from v0.12.0 to main.

**Major Revisions**

* #725
* #727
* #728
Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
Co-authored-by: Yifan Xiong <yixio@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>

---------
Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>

0b4311cd

25 Jun, 2025 1 commit

Dockerfile - Add cuda12.9 docker image (#716) · a56356d8

guoshzhao authored Jun 25, 2025



**Description**
Add the CUDA 12.9 Dockerfile and build it in the pipeline.

---------
Co-authored-by: Guoshuai Zhao <microsoft@microsoft.com>
Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
Co-authored-by: Hongtao Zhang <garyworkzht@gmail.com>

a56356d8

30 Apr, 2025 1 commit

CI/CD - Update OS of runner to the latest. (#702) · 330c68aa

Hongtao Zhang authored Apr 30, 2025



- Upgrade the OS of the GitHub runner used by lint to the latest.
- Add a symbolic link for clang-format to version 14.
- Update the importlib_metadata version, since the one inside
nvcr.io/nvidia/pytorch:20.12-py3 is too old and failed the 11.1 build.

---------
Co-authored-by: hongtaozhang <hongtaozhang@microsoft.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>

330c68aa

09 Apr, 2025 1 commit
- CI/CD - Merge multi-arch image (#696) · b13ef28f
  Yifan Xiong authored Apr 08, 2025
```
Merge multi-arch image in build pipeline.
```
  b13ef28f
21 Mar, 2025 1 commit

Dockerfile - Support cuda12.8 for Blackwell arch (#682) · 294f1f20

pdr authored Mar 20, 2025



**Description**
Update the Dockerfile for CUDA 12.8.
Use the latest CUTLASS release, 3.8, with arch 100 (Blackwell) support.
Add the latest nccl-tests release with arch 100 (Blackwell) support.
Update MSCCL to support building for sm_100.
No breaking changes; backward compatibility was tested with CUDA 12.4.

---------
Co-authored-by: Hongtao Zhang <garyworkzht@gmail.com>

294f1f20

12 Mar, 2025 1 commit

CI/CD - Update label in the ROCm image build (#693) · 48cd8a3c

Hongtao Zhang authored Mar 12, 2025



This is caused by the matrix strategy's default "fail-fast" setting. In
GitHub Actions, when a job runs with a matrix, the individual
configurations run in parallel. By default, if one matrix job (for
example, the one labeled "rocm6_2_rocm6_2_x_superbe") fails, the
remaining parallel jobs are canceled automatically.

In our current image build pipeline, the arm64 build job is always
canceled when the ROCm build job fails. As a temporary solution, use a
non-existent label in the job config to prevent the ROCm build job from
being scheduled.
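
The cancellation behavior described above can be illustrated with a small Python model. This is only a sketch of the fail-fast semantics, not GitHub's implementation: the `MatrixJob` class, the `run_matrix` function, the job names, and the finish times are all hypothetical, and the model assumes every job starts at the same moment.

```python
from dataclasses import dataclass


@dataclass
class MatrixJob:
    name: str          # hypothetical matrix configuration label
    finish_time: int   # when the job would finish if left alone
    succeeds: bool     # whether it would succeed if it ran to completion


def run_matrix(jobs, fail_fast=True):
    """Return {job name: "success" | "failure" | "cancelled"}.

    Simplified model: all jobs start together. With fail_fast enabled,
    the first failure cancels every job that has not finished yet.
    """
    failures = [j.finish_time for j in jobs if not j.succeeds]
    first_failure = min(failures) if failures else None
    results = {}
    for job in jobs:
        still_running = first_failure is not None and job.finish_time > first_failure
        if fail_fast and still_running:
            results[job.name] = "cancelled"
        else:
            results[job.name] = "success" if job.succeeds else "failure"
    return results


jobs = [
    MatrixJob("rocm", finish_time=5, succeeds=False),
    MatrixJob("cuda-amd64", finish_time=3, succeeds=True),
    MatrixJob("cuda-arm64", finish_time=60, succeeds=True),
]
print(run_matrix(jobs, fail_fast=True))
print(run_matrix(jobs, fail_fast=False))
```

In this model the long-running arm64 job is cancelled as soon as the ROCm job fails. The commit takes a different route from disabling fail-fast: it keeps the ROCm job from being scheduled at all, so the failure never happens.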

---------
Co-authored-by: hongtaozhang <hongtaozhang@microsoft.com>

48cd8a3c

07 Mar, 2025 1 commit
- CI/CD - Add image build on arm64 arch (#690) · 300df46b
  Yifan Xiong authored Mar 07, 2025
```
Add image build on arm64 arch.
```
  300df46b
06 Nov, 2024 1 commit

Dockerfile - Add support for arm64 build (#660) · 47949127

pdr authored Nov 06, 2024

Add support for arm64 build:

- Update the Dockerfile for the arm64 build
- Extend CPU stream compilation for Neoverse
- Handle onnxruntime-gpu installation
- Filter third-party builds based on arch
- Disable the CUDA decode perf build for non-x86

47949127

02 Nov, 2024 1 commit

CI/CD - Update Image Build Pipeline (#659) · 61770b89

Yifan Xiong authored Nov 01, 2024

**Description**

Update image build.

**Major Revision**

* Remove ROCm 6.0 image due to outdated packages
* Remove build tag for ROCm
* Preserve build cache for 30 days

61770b89

10 Oct, 2024 1 commit

Release - SuperBench v0.11.0 (#654) · 949f9cb4

Yuting Jiang authored Oct 10, 2024



**Description**
Cherry pick bug fixes from v0.11.0 to main

**Major Revision**
* #645 
* #648 
* #646 
* #647 
* #651 
* #652 
* #650

---------
Co-authored-by: hongtaozhang <hongtaozhang@microsoft.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>

949f9cb4

28 Jul, 2024 1 commit
- CI/CD - Fix MSCCL build error in CUDA12.4 docker build pipeline (#633) · 2101e933
  Yuting Jiang authored Jul 29, 2024
```
**Description**
Fix MSCCL build error in CUDA12.4 docker build pipeline due to OOM
issue.
```
  2101e933
22 Apr, 2024 1 commit

Dockerfile - Add CUDA 12.4 dockerfile (#619) · 7435f10a

Yuting Jiang authored Apr 22, 2024

**Description**
Add CUDA 12.4 dockerfile.

**Major Revision**
- upgrade nvidia docker to 23.04


**Minor Revision**
- upgrade hpcx to 2.18

7435f10a

08 Jan, 2024 1 commit

Release - SuperBench v0.10.0 (#607) · 2c88db90

Yifan Xiong authored Jan 07, 2024

**Description**

Cherry-pick bug fixes from v0.10.0 to main.

**Major Revisions**

* Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590
* Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591
* Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592
* Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595
* Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596
* CI/CD - Add ndv5 topo file #597
* Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593
* Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599
* Dockerfile - Bug fix for rocm docker build and deploy #598
* Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603
* Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604
* Monitor - U...

2c88db90

09 Dec, 2023 1 commit

Dockerfile - Upgrade to rocm5.7 dockerfile (#587) · 1f5031bd

Yuting Jiang authored Dec 10, 2023



**Description**
upgrade to rocm5.7 dockerfile.

---------
Co-authored-by: yukirora <yuting.jiang@microsoft.com>

1f5031bd

07 Dec, 2023 1 commit
- Benchmarks: Add MSCCL Support for Nvidia GPU (#584) · 6ef3a011
  Ziyue Yang authored Dec 07, 2023
```
**Description**
Add MSCCL support for Nvidia GPU
```
  6ef3a011
22 Nov, 2023 1 commit

Dockerfile - Upgrade Docker image to CUDA 12.2 (#577) · 1ad1c21c

Yifan Xiong authored Nov 22, 2023

Upgrade Docker image to CUDA 12.2 for H100:
* upgrade base image to 23.10
* fix onnxruntime version in python3.10
* fix compilation errors

1ad1c21c

22 Aug, 2023 1 commit
- Benchmarks: micro benchmark - source code for evaluating NVDEC decoding performance (#560) · 27a10811
  Yuting Jiang authored Aug 22, 2023
```
**Description**
source code for evaluating NVDEC decoding performance.

---------
Co-authored-by: yukirora <yuting.jiang@microsoft.com>
```
  27a10811
18 Aug, 2023 1 commit
- Benchmarks: micro benchmarks - add source code for DirectXRenderPerf (#549) · 6c0205ce
  Yuting Jiang authored Aug 18, 2023
```
**Description**
add source code for DirectXRenderPerf.

---------
Co-authored-by: yukirora <yuting.jiang@microsoft.com>
```
  6c0205ce
27 Jul, 2023 1 commit

Release - SuperBench v0.9.0 (#558) · e1df877b

Yuting Jiang authored Jul 27, 2023

**Description**
Cherry-pick bug fixes from v0.9.0 to main.

**Major Revision**
- CI/CD: pipeline - clean more disk space to fix rocm building image
pipeline (#555)
- Benchmarks: bug fix - use absolute path for input file in
DirectXEncodingLatency (#554)
- CI/CD - add push win docker image on release branch in pipeline (#552)
- Docs - Upgrade version and release note (#557)

e1df877b

05 Jul, 2023 3 commits
- Benchmarks: micro benchmarks - add python code for DirectXGPUMemBw (#547) · af4cfd5b
  Yuting Jiang authored Jul 05, 2023
```
**Description**
add python code for DirectXGPUMemBw.
```
  af4cfd5b
- Benchmarks: micro benchmarks - add python code for DirectXGPUCoreFlops (#542) · f1d608ae
  Yuting Jiang authored Jul 05, 2023
```
**Description**
add python code for DirectX core flops and init DirectX test pipeline.

**Major Revision**
- add python code for DirectX core flops 
- init DirectX test pipeline


**Minor Revision**
- add test for DirectX core flops
```
  f1d608ae
- CI/CD - Support DirectX test pipeline (#545) · 3704a432
  Yuting Jiang authored Jul 05, 2023
```
**Description**
Support DirectX test pipeline.
```
  3704a432
28 Jun, 2023 1 commit

Dockerfile - Add SuperBench Windows Dockerfile (#534) · 44ef5314

Yuting Jiang authored Jun 28, 2023



**Description**
Add dockerfile for win10 and building script for directx_benchmarks.

**Major Revision**
- Add docker file for win10 and required scripts to install the
dependency
- Add building script to build all directx vs benchmarks
- Add call of building script in Makefile

---------
Co-authored-by: yukirora <yuting.jiang@microsoft.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>

44ef5314

14 Apr, 2023 1 commit

Release - SuperBench v0.8.0 (#517) · 51761b3a

Yifan Xiong authored Apr 14, 2023



**Description**

Cherry-pick bug fixes from v0.8.0 to main.

**Major Revisions**

* Monitor - Fix the cgroup version checking logic (#502)
* Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
* Fix wrong torch usage in communication wrapper for Distributed
Inference Benchmark (#505)
* Analyzer: Fix bug in python3.8 due to pandas api change (#504)
* Bug - Fix bug to get metric from cmd when error happens (#506)
* Monitor - Collect realtime GPU power when benchmarking (#507)
* Add num_workers argument in model benchmark (#511)
* Remove unreachable condition when write host list (#512)
* Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
* Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
* Docs - Upgrade version and release note (#508)
Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

51761b3a

23 Feb, 2023 1 commit
- CI/CD - Free disk space in GitHub Action VHD (#481) · bbb86c4a
  Yifan Xiong authored Feb 23, 2023
```
Free more disk space in GitHub Action VHD.
```
  bbb86c4a
29 Dec, 2022 1 commit

Dockerfile - Add CUDA11.8 Docker image for Nvidia arch90 GPUs (#449) · a3c65b2a

Yifan Xiong authored Dec 29, 2022

Add Docker image for arch90 NVIDIA GPUs:

* add CUDA11.8 Dockerfile
* update archs in Makefile and benchmarks accordingly
* update image build pipeline

a3c65b2a

18 Oct, 2022 1 commit

Benchmarks - Add support to allow list of custom config string in cudnn-functions and cublas-functions (#414) · 3367c4f6

Yuting Jiang authored Oct 18, 2022

**Description**
Add support to allow list of custom config string in cudnn-functions and cublas-functions.

3367c4f6

06 Jul, 2022 1 commit

Update dependencies and Dockerfile (#371) · 9f03d568

Yifan Xiong authored Jul 06, 2022

Update dependencies and Dockerfile:
* upgrade nccl-tests and rccl-tests to current latest version to match
  NCCL/RCCL versions
* unify image tag names on DockerHub
* remove verbose output in Dockerfile and make minor fixes to some flags

9f03d568

19 Jun, 2022 1 commit

Update ROCm Dockerfile (#361) · 483bf782

Yifan Xiong authored Jun 19, 2022

**Description**

Update ROCm Dockerfile.

**Major Revisions**
- Add dockerfile for ROCm 5.1.3
- Merge 5.1.x and 5.0.x dockerfile
- Remove legacy 4.2 and 4.0 dockerfiles
- Update build pipeline accordingly

483bf782

25 May, 2022 1 commit
- Dockerfile - Add dockerfile for rocm5.1.1 (#353) · 81a4146b
  user4543 authored May 25, 2022
```
**Description**
Add dockerfile for rocm5.1.1.
```
  81a4146b
28 Feb, 2022 1 commit
- Dockerfile - Add dockerfile for rocm5.0.1 (#319) · 425b9ff8
  user4543 authored Feb 28, 2022
```
**Description**
Add dockerfile for rocm5.0.1.
```
  425b9ff8
25 Feb, 2022 1 commit
- Dockerfile - Add rocm5.0 dockerfile (#307) · a4950a70
  user4543 authored Feb 26, 2022
```
**Description**
Add rocm5.0 dockerfile.
```
  a4950a70
08 Feb, 2022 1 commit

Benchmarks: Add Feature - Add GDR-only nccl-tests for Nvidia machines (#299) · 433785fd

Ziyue Yang authored Feb 08, 2022

This commit adds GDR-only nccl-tests for Nvidia machines. Also bump NCCL to v2.10.3-1 to achieve peak performance in this test.

433785fd

11 Oct, 2021 1 commit

CI/CD - Add code security scanning (#206) · 849b6cac

Yifan Xiong authored Oct 11, 2021

Add code security scanning.

__Major Revisions__
* enable dependabot auto updates
* scan code with CodeQL

849b6cac

26 Sep, 2021 1 commit

Release - SuperBench v0.3.0 (#212) · dfbd70b1

Yifan Xiong authored Sep 26, 2021



**Description**

Cherry-pick bug fixes from v0.3.0 to main.

**Major Revisions**
* Docs - Upgrade version and release note (#209)
* Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
* Benchmarks: Update - Update benchmarks in configuration file (#208)
* CI/CD - Update GitHub Action VM (#211)
* Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
* CI/CD - Fix bug in build image for push event (#205)
* Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
* Tool: Fix bug - Fix function naming issue in system info  (#200)
* CI/CD - Push images in GitHub Action (#202)
* Bug - Fix torch.distributed command for single node (#201)
* CLI - Integrate system info for node (#199)
* Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
* CI/CD - Add ROCm image build in GitHub Actions (#194)
* Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
* Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
* Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
* Bug - Revise 'docker run' in sb deploy (#195)
* Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190)
Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>

dfbd70b1

09 Jul, 2021 1 commit

Bug bash - Merge fix from release/0.2 to main (#124) · 9c984c7e

guoshzhao authored Jul 09, 2021



* Bug Fix - Fix race condition issue for multi ranks (#117)

Fix a race condition when multiple ranks rotate the same directory.

* Update pipeline for release branch (#122)

* Bug Fix - Fix bug when converting bool config to store_true argument. (#120)
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
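
The log does not describe the bool-to-`store_true` bug itself, so the following is only a sketch of one common way to handle that conversion, using Python's standard `argparse`. The `config_to_args` helper and the `--no_gpu` and `--batch_size` options are hypothetical names chosen for illustration, not SuperBench's actual code.

```python
import argparse


def config_to_args(params):
    """Turn a parameter dict into a list of command-line tokens."""
    args = []
    for key, value in params.items():
        flag = "--" + key
        if isinstance(value, bool):
            # A store_true flag takes no value: emit it only when True.
            if value:
                args.append(flag)
        else:
            args.extend([flag, str(value)])
    return args


parser = argparse.ArgumentParser()
parser.add_argument("--no_gpu", action="store_true")       # hypothetical flag
parser.add_argument("--batch_size", type=int, default=1)   # hypothetical option

enabled = parser.parse_args(config_to_args({"no_gpu": True, "batch_size": 32}))
disabled = parser.parse_args(config_to_args({"no_gpu": False, "batch_size": 32}))
print(enabled.no_gpu, disabled.no_gpu, enabled.batch_size)
```

Because a `store_true` option accepts no value, rendering a boolean as `--no_gpu False` would leave `False` as an unrecognized argument. Omitting the flag entirely is how a false value is expressed.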

9c984c7e

25 Jun, 2021 1 commit
- Website - Initialize SuperBench website (#102) · e7b6af35
  Yifan Xiong authored Jun 25, 2021
```
* Initialize SuperBench website.
* Add GitHub Actions to build and publish automatically.
```
  e7b6af35
16 Jun, 2021 1 commit

Dockerfile - Update CUDA 11.1.1 Dockerfile (#96) · 25ec3a7c

Yifan Xiong authored Jun 16, 2021

Update packages and add build cache for CUDA 11.1.1 Dockerfile:

* Remove duplicate cmake and ompi, which are already in base image
* Add hpcx and sharp lib
* Add cache for gitmodules build
* Sort apt-get packages

25ec3a7c