Commits · d804dbb6a42638c54cb38e9cf2beadff1e0f9053 · tsoc / superbenchmark

08 Oct, 2025 1 commit

Enhancement: Add nsys and pytorch profiler debug trace support (#744) · d804dbb6

Hongtao Zhang authored Oct 08, 2025



To improve benchmark debugging, the following debug methods were added:

PyTorch profiler in model benchmarks

- SB_ENABLE_PYTORCH_PROFILER: switch to enable/disable
- SB_TORCH_PROFILER_TRACE_DIR: log path

These two runtime variables need to be configured in the SB config file.

nsys in the SB runner

- SB_ENABLE_NSYS: switch to enable/disable
- SB_NSYS_TRACE_DIR: log path

These two runtime variables need to be configured in the runner's ENV.

---------
Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
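
As an illustration of how an enable switch and a trace directory such as SB_ENABLE_NSYS and SB_NSYS_TRACE_DIR could drive a runner, here is a minimal sketch. It is not the code from this commit: the accepted truthy values, the helper names `env_flag` and `wrap_with_nsys`, and the `sb_trace` output name are all assumptions made for the example. Only the `nsys profile -o` invocation is the standard Nsight Systems CLI form.

```python
import os
import posixpath
import shlex

# Assumption: which strings count as "enabled" is not stated in the commit.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, env=None):
    """Return True when the environment variable holds a truthy value."""
    env = os.environ if env is None else env
    return env.get(name, "").strip().lower() in TRUTHY


def wrap_with_nsys(command, env=None):
    """Prefix a shell command with `nsys profile` when SB_ENABLE_NSYS is set."""
    env = os.environ if env is None else env
    if not env_flag("SB_ENABLE_NSYS", env):
        return command
    trace_dir = env.get("SB_NSYS_TRACE_DIR", ".")
    # "sb_trace" is a hypothetical report name chosen for this sketch.
    output = posixpath.join(trace_dir, "sb_trace")
    return f"nsys profile -o {shlex.quote(output)} {command}"
```

Passing the environment as a dictionary keeps the helper easy to test. A real runner would more likely read `os.environ` directly.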

d804dbb6

25 Jun, 2025 1 commit

Dockerfile - Add cuda12.9 docker image (#716) · a56356d8

guoshzhao authored Jun 25, 2025



**Description**
Add a CUDA 12.9 Dockerfile and build it in the pipeline.

---------
Co-authored-by: Guoshuai Zhao <microsoft@microsoft.com>
Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
Co-authored-by: Hongtao Zhang <garyworkzht@gmail.com>

a56356d8

14 Apr, 2023 1 commit

Release - SuperBench v0.8.0 (#517) · 51761b3a

Yifan Xiong authored Apr 14, 2023



**Description**

Cherry-pick bug fixes from v0.8.0 to main.

**Major Revisions**

* Monitor - Fix the cgroup version checking logic (#502)
* Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
* Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505)
* Analyzer: Fix bug in python3.8 due to pandas api change (#504)
* Bug - Fix bug to get metric from cmd when error happens (#506)
* Monitor - Collect realtime GPU power when benchmarking (#507)
* Add num_workers argument in model benchmark (#511)
* Remove unreachable condition when write host list (#512)
* Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
* Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
* Docs - Upgrade version and release note (#508)
Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

51761b3a

28 Mar, 2023 1 commit

Benchmark - Update TE FP8 model conversion (#499) · 97c9a41f

Yifan Xiong authored Mar 28, 2023

__Description__

Update TE FP8 model conversion.

__Major Revisions__
* Add 16-byte alignment comment.
* Fix TE layer parameters type.

97c9a41f

25 Mar, 2023 1 commit

Benchmarks - Support TE FP8 in BERT/GPT2 models (#496) · c88c9709

Yifan Xiong authored Mar 25, 2023

Support Transformer Engine FP8 in existing PyTorch BERT/GPT2 models by
converting linear/layernorm to TE layers.

c88c9709

21 Mar, 2023 1 commit

Benchmark - Fix torch.dist init issue with multiple models (#495) · 644b5395

Yifan Xiong authored Mar 21, 2023

Fix a potential barrier timeout in init_process_group caused by a race
condition when the same port is reused. Use a different port for each
model when running multiple models sequentially in one process.
For example, running vgg11/13/16/19 uses ports 29501 to 29504,
respectively.
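
The per-model port assignment described above can be sketched as a simple counter. This is an illustrative sketch rather than the code from the commit: the `PortAllocator` class is hypothetical, and the base port 29500 is an assumption chosen so that the result matches the ports quoted in the message.

```python
import itertools

# Assumption: 29500 is used as the base so that the first model gets 29501.
BASE_PORT = 29500


class PortAllocator:
    """Hand out a fresh port for every model run in the same process."""

    def __init__(self, base_port=BASE_PORT):
        self._counter = itertools.count(base_port + 1)

    def next_port(self):
        """Return the next unused port number."""
        return next(self._counter)


allocator = PortAllocator()
ports = {model: allocator.next_port() for model in ["vgg11", "vgg13", "vgg16", "vgg19"]}
print(ports)
```

Because each run gets its own port, a rendezvous left over from the previous model cannot collide with the next one.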

644b5395

29 Apr, 2022 1 commit

Release - SuperBench v0.5.0 (#350) · 6681c720

Yifan Xiong authored Apr 29, 2022



**Description**

Cherry-pick bug fixes from v0.5.0 to main.

**Major Revisions**

* Bug - Force to fix ort version as '1.10.0' (#343)
* Bug - Support no matching rules and unify the output name in result_summary (#345)
* Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344)
* Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342)
* Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347)
* Docs - Upgrade version and release note (#348)
Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>

6681c720

10 Feb, 2022 1 commit

Benchmarks: Revise Code - Add support for pytorch>=1.9.0 of init_process_group (#305) · e31b8c9e

user4543 authored Feb 10, 2022

**Description**
Add init_process_group support for pytorch>=1.9.0.

**Major Revision**
- Use PrefixStore(TCPStore) to call init_process_group manually for each model run
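
A minimal sketch of initializing a process group from a prefixed TCP store follows. It is an assumption-laden illustration, not the code from this commit: the helper names, the prefix format, the `gloo` backend, and the default host and port are all chosen for the example. `TCPStore`, `PrefixStore`, and the `store=` argument of `init_process_group` are existing `torch.distributed` APIs.

```python
from datetime import timedelta


def store_prefix(model_name, run_index):
    """Build a key prefix that is unique per model run (hypothetical format)."""
    return f"{model_name}-run{run_index}/"


def init_group_for_run(model_name, run_index, rank, world_size,
                       host="127.0.0.1", port=29500):
    """Create PrefixStore(TCPStore) and pass it to init_process_group."""
    # Imported lazily so store_prefix() stays usable without torch installed.
    import torch.distributed as dist

    tcp_store = dist.TCPStore(
        host_name=host,
        port=port,
        world_size=world_size,
        is_master=(rank == 0),  # rank 0 hosts the store, other ranks connect
        timeout=timedelta(seconds=300),
    )
    # The prefix keeps keys from different model runs from colliding.
    store = dist.PrefixStore(store_prefix(model_name, run_index), tcp_store)
    dist.init_process_group(backend="gloo", store=store,
                            rank=rank, world_size=world_size)
    return store
```

A caller would pair each `init_group_for_run(...)` with `torch.distributed.destroy_process_group()` once the model run finishes.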

e31b8c9e

28 Jan, 2022 1 commit

Benchmarks: Add Feature - Sync the E2E training results among all workers for each step. (#287) · d03d110f

guoshzhao authored Jan 28, 2022

**Description**
Please write a brief description and link the related issue, if any.

**Major Revision**
- Sync (allreduce with max) the E2E training results among all workers.
- Avoid using ':0' in the metric name if only one rank has output.
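
The two revisions above can be illustrated with a small pure-Python simulation. This sketch does not use torch.distributed and is not the code from the commit: it only mimics what an all-reduce with the max operator produces, and the function names and the metric name used below are hypothetical.

```python
def allreduce_max(per_rank_values):
    """Simulate all-reduce(max): every rank receives the element-wise maximum."""
    lengths = {len(values) for values in per_rank_values}
    if len(lengths) != 1:
        raise ValueError("all ranks must report the same number of steps")
    reduced = [max(step) for step in zip(*per_rank_values)]
    # Each rank ends up holding its own copy of the reduced result.
    return [list(reduced) for _ in per_rank_values]


def metric_name(base, rank, ranks_with_output):
    """Append ':<rank>' only when more than one rank reports the metric."""
    return base if ranks_with_output == 1 else f"{base}:{rank}"


# Step times in milliseconds reported by two ranks over three steps.
step_times = [[10.0, 12.5, 11.0], [11.5, 12.0, 13.0]]
synced = allreduce_max(step_times)
print(synced[0])
```

Taking the maximum means each step's reported time reflects the slowest worker, which is what bounds synchronous training throughput.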

d03d110f

28 Sep, 2021 1 commit
- Benchmarks: Fix bug - Fix bug when set force_fp32 option. (#214) · 1a86583b
  guoshzhao authored Sep 28, 2021
```
**Description**
Fix a typo when setting the force_fp32 option.
```
  1a86583b
27 Sep, 2021 1 commit
- Benchmarks: Add Feature - Add option to use fp32 instead of tf32 (#213) · f9442456
  guoshzhao authored Sep 28, 2021
```
**Description**
Add option `force_fp32` to use fp32 instead of tf32; it only takes effect on Ampere or newer GPUs.
```
  f9442456
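
For context, PyTorch exposes TF32 through two backend flags, so an option like `force_fp32` could plausibly map onto them as sketched below. This is an assumption about the mechanism, not the code from the commit, and the helper name `set_force_fp32` is hypothetical. The two `allow_tf32` flags are existing PyTorch settings.

```python
def set_force_fp32(force_fp32):
    """Disable TF32 for matmul and cuDNN when force_fp32 is True."""
    # Imported lazily so this module can be loaded without torch installed.
    import torch

    allow_tf32 = not force_fp32
    # On GPUs older than Ampere these flags have no effect.
    torch.backends.cuda.matmul.allow_tf32 = allow_tf32
    torch.backends.cudnn.allow_tf32 = allow_tf32
    return allow_tf32
```

Calling `set_force_fp32(True)` before building the model keeps matrix multiplications and convolutions in full fp32 precision.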
26 Sep, 2021 1 commit

Release - SuperBench v0.3.0 (#212) · dfbd70b1

Yifan Xiong authored Sep 26, 2021



**Description**

Cherry-pick bug fixes from v0.3.0 to main.

**Major Revisions**
* Docs - Upgrade version and release note (#209)
* Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
* Benchmarks: Update - Update benchmarks in configuration file (#208)
* CI/CD - Update GitHub Action VM (#211)
* Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
* CI/CD - Fix bug in build image for push event (#205)
* Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
* Tool: Fix bug - Fix function naming issue in system info  (#200)
* CI/CD - Push images in GitHub Action (#202)
* Bug - Fix torch.distributed command for single node (#201)
* CLI - Integrate system info for node (#199)
* Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
* CI/CD - Add ROCm image build in GitHub Actions (#194)
* Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
* Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
* Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
* Bug - Revise 'docker run' in sb deploy (#195)
* Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190)
Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>

dfbd70b1

29 Jul, 2021 1 commit

Release - SuperBench v0.2.1 (#142) · 69b2c631

Yifan Xiong authored Jul 29, 2021

__Description__
Cherry-pick bug fixes from v0.2.1 to main.

__Major Revisions__
* Fix bug where VGG models failed on A100 GPUs with batch_size=128.
* Fix Ansible connection issue when running on localhost.
* Update version in packages and docs.

69b2c631

28 Jun, 2021 1 commit
- Benchmarks: Code Revision - Replace torch.optim.AdamW with transformers.AdamW. (#106) · 9c748527
  guoshzhao authored Jun 28, 2021
```
* replace torch.optim.AdamW with transformers.AdamW.
```
  9c748527
07 Jun, 2021 1 commit
- Benchmarks: Fix Bug - Fix OOM issue when run pytorch models sequentially. (#93) · 03b41be1
  guoshzhao authored Jun 07, 2021
```
* Clean up the cache.
```
  03b41be1
19 May, 2021 1 commit
- expose interface of pin memory and modify cnn configuration (#75) · b7d0ee32
  Yuting Jiang authored May 19, 2021
  
  b7d0ee32
12 Apr, 2021 1 commit
- add _post_process() implementation in pytorch_base.py to clean up distributed resource. (#45) · 1f726091
  guoshzhao authored Apr 12, 2021
```
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
```
  1f726091
15 Mar, 2021 1 commit
- add more checks for PytorchBase module (#19) · 80f434cb
  guoshzhao authored Mar 15, 2021
```
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
```
  80f434cb
09 Mar, 2021 2 commits
- Benchmarks: Add Feature - Add flag to disable GPU. (#15) · 52848d2f
  guoshzhao authored Mar 10, 2021
```
* add flag to disable GPU.

* fix spelling

* fix unittest.

* address comments.
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
```
  52848d2f
- rename _cal_params_size as _cal_params_count. (#16) · 83a4e93f
  guoshzhao authored Mar 09, 2021
```
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
```
  83a4e93f
08 Mar, 2021 1 commit

Benchmarks: Add Feature - Add pytorch base class (#11) · 088aa19a

guoshzhao authored Mar 08, 2021



* add pytorch base class

* address comments
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>

088aa19a