Commits · e1d791d202a55c79e4de63fb900d4a34a5f14b02 · tsoc / superbenchmark

17 Apr, 2026 1 commit
- Add --container-name for custom docker container name · e1d791d2
  one authored Apr 17, 2026
  
02 Apr, 2026 1 commit
- Use env file in docker instead of /tmp · c1bc12ce
  one authored Apr 02, 2026
  
01 Apr, 2026 1 commit
- Refactor environment variable handling in runner.py · a10c3e15
  one authored Apr 01, 2026
  
27 Mar, 2026 1 commit
- MicroBenchmark: rocHPCG · e4c2bd4c
  one authored Mar 27, 2026
  
08 Oct, 2025 1 commit

Enhancement: Add nsys and pytorch profiler debug trace support (#744) · d804dbb6

Hongtao Zhang authored Oct 08, 2025



To improve benchmark debugging, the following debug methods were added:

pytorch profiler in model benchmark

- SB_ENABLE_PYTORCH_PROFILER: switch to enable/disable
- SB_TORCH_PROFILER_TRACE_DIR: log path
These 2 runtime variables need to be configured in SB config file.

nsys in SB runner

- SB_ENABLE_NSYS: switch to enable/disable 
- SB_NSYS_TRACE_DIR: log path
These 2 runtime variables need to be configured in runner's ENV
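
A minimal sketch of how such switches could be read, assuming they arrive as plain environment variables. The accepted truthy values, the fallback directory, and the helper name `profiler_settings` are illustrative assumptions, not taken from the commit:

```python
import os


def profiler_settings(env=None):
    """Return (enabled, trace_dir) for the PyTorch profiler switches."""
    env = os.environ if env is None else env
    # Assumption: "1", "true" and "yes" (case-insensitive) mean enabled.
    flag = env.get("SB_ENABLE_PYTORCH_PROFILER", "0").strip().lower()
    enabled = flag in {"1", "true", "yes"}
    # Assumption: fall back to a local directory when no path is configured.
    trace_dir = env.get("SB_TORCH_PROFILER_TRACE_DIR", "./profiler_traces")
    return enabled, trace_dir
```

The same pattern would apply to `SB_ENABLE_NSYS` and `SB_NSYS_TRACE_DIR` on the runner side.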

---------
Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>

20 Aug, 2024 1 commit
- Bug: Executor - Fix executor for Benchmark Execution Without Explicit Framework Field (#636) · 96cc4d93
  Yang Wang authored Aug 21, 2024
```
**Description**
Fix executor for Benchmark Execution Without Explicit Framework Field
```
13 Aug, 2024 1 commit
- Bug Fix - Update Docker Exec Command for Persistent HPCX Environment (#635) · 46a57929
  Yang Wang authored Aug 14, 2024
```
Add 10-hpcx.sh to /etc/profile.d
Update the Docker exec command to ensure a persistent HPCX environment.
```
23 Jul, 2024 1 commit

Update omegaconf version to 2.3.0 (#631) · 9a3ce39d

Yang Wang authored Jul 24, 2024

Update `omegaconf` version to
[2.3.0](https://pypi.org/project/omegaconf/2.3.0/) as omegaconf 2.0.6
has a non-standard dependency specifier PyYAML>=5.1.*. pip 24.1 will
enforce this behaviour change.
Discussion can be found at https://github.com/pypa/pip/issues/12063.
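
The rule behind this change can be illustrated with a small check. Under PEP 440, a trailing `.*` is only permitted with the `==` and `!=` operators, which is why `PyYAML>=5.1.*` is non-standard. The helper name below is hypothetical, and it checks only this one rule, not full version syntax:

```python
import re

# Operator followed by a version string, e.g. ">=5.1.*".
_SPEC = re.compile(r"^\s*(===|==|!=|~=|<=|>=|<|>)\s*(\S+)\s*$")


def wildcard_usage_is_valid(specifier):
    """Return False when a trailing '.*' is combined with a disallowed operator."""
    match = _SPEC.match(specifier)
    if match is None:
        raise ValueError(f"not a single version specifier: {specifier!r}")
    operator, version = match.groups()
    if not version.endswith(".*"):
        return True
    # PEP 440 allows prefix matching only for == and !=.
    return operator in ("==", "!=")
```

With this check, `">=5.1.*"` is rejected while `"==5.1.*"` and `">=5.1"` pass.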

08 Aug, 2023 1 commit

Benchmarks: model benchmarks - change torch.distributed.launch to torchrun (#556) · 67f2aa72

pnunna93 authored Aug 08, 2023

This PR makes the following changes:
- Change torch.distributed.launch to torchrun. torch.distributed.launch
is deprecated in recent PyTorch releases, and moving to torchrun is
recommended: https://pytorch.org/docs/stable/elastic/run.html


- Change the AMD GPU detection logic. The previous logic throws
warnings when containers have only renderD in /dev/dri; this change
resolves those warnings.
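
As a rough sketch of the first change, a launcher command could be assembled as below. The helper name and its defaults are hypothetical; the flags are standard `torchrun` options:

```python
def torchrun_command(script, nproc_per_node, nnodes=1, node_rank=0,
                     master_addr="127.0.0.1", master_port=29500):
    """Build a torchrun invocation as an argument list (illustrative only)."""
    return [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={nproc_per_node}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        script,
    ]
```

The resulting list can be passed to `subprocess.run` or joined into a shell string.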

---------
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

29 Jun, 2023 1 commit

Tools - Add runner for sys info and update docs (#532) · ed027e4c

Yuting Jiang authored Jun 29, 2023

**Description**
Add runner for sys info to automatically collect on multiple nodes and
update related docs.

**Major Revision**
- add runner for sys info which will check docker status and run `sb
node info` on all nodes' docker and fetch results from all nodes

**Minor Revision**
- update cli and system-info doc
- update sb node info to save output into output-dir/sys-info.json

23 May, 2023 1 commit

Runner - Add signal handler in runner (#530) · a1cd3c94

Yifan Xiong authored May 23, 2023

Add signal handler in runner to gracefully exit when receiving SIGINT
(<kbd>Ctrl</kbd>+<kbd>C</kbd>) or SIGTERM during benchmark execution.
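
A minimal sketch of this kind of handler, using only the standard `signal` module. The class name `GracefulExit` and the flag-based design are assumptions for illustration; the actual runner may clean up differently:

```python
import signal


class GracefulExit:
    """Record SIGINT/SIGTERM so a main loop can stop cleanly."""

    def __init__(self):
        self.received = None

    @property
    def should_stop(self):
        return self.received is not None

    def install(self):
        # signal.signal must be called from the main thread.
        for sig in (signal.SIGINT, signal.SIGTERM):
            signal.signal(sig, self._handle)

    def _handle(self, signum, frame):
        # Only record the signal here; cleanup happens in the main loop.
        self.received = signal.Signals(signum).name
```

A runner loop would call `install()` once and check `should_stop` between benchmarks.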

14 Apr, 2023 1 commit

Release - SuperBench v0.8.0 (#517) · 51761b3a

Yifan Xiong authored Apr 14, 2023



**Description**

Cherry-pick bug fixes from v0.8.0 to main.

**Major Revisions**

* Monitor - Fix the cgroup version checking logic (#502)
* Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
* Fix wrong torch usage in communication wrapper for Distributed
Inference Benchmark (#505)
* Analyzer: Fix bug in python3.8 due to pandas api change (#504)
* Bug - Fix bug to get metric from cmd when error happens (#506)
* Monitor - Collect realtime GPU power when benchmarking (#507)
* Add num_workers argument in model benchmark (#511)
* Remove unreachable condition when write host list (#512)
* Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
* Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
* Docs - Upgrade version and release note (#508)
Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

13 Feb, 2023 1 commit

Executor - Support SuperBench Executor running on Windows (#475) · 62a29134

Yuting Jiang authored Feb 13, 2023

**Description**
Support SuperBench Executor running on Windows.

**Major Revision**
- Lazy import ansible related module

28 Jan, 2023 1 commit

Release - SuperBench v0.7.0 (#468) · b07fda15

Yifan Xiong authored Jan 28, 2023



**Description**

Cherry-pick bug fixes from v0.7.0 to main.

**Major Revisions**

* Benchmarks - Fix missing include in FP8 benchmark (#460)
* Fix bug in TE BERT model (#461)
* Doc - Update benchmark doc (#465)
* Bug: Fix bug for incorrect datatype judgement in cublas-function
source code (#464)
* Support `sb deploy` without pulling image (#466)
* Docs - Upgrade version and release note (#467)
Co-authored-by: Russell J. Hewett <russell.j.hewett@gmail.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

04 Jan, 2023 1 commit

Runner - Generate host groups file in mpi mode (#458) · 8e748d56

Yang Wang authored Jan 04, 2023

**Major Revision**

- Add an option for pattern to generate the mpi_pattern.txt file if
the path is specified.
- In mpi pattern, serial_index and parallel_index will be added to each
benchmark as environment variables.

**Minor Revision**
- Fix typo

03 Jan, 2023 1 commit
- Runner: Support `topo-aware` and `k-batch` pattern in 'mpi' mode (#437) · 65e433c0
  Yang Wang authored Jan 03, 2023
```
**Description**
Support the following patterns in `mpi` mode:
* `k-batch`
* `topo-aware`
```
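
A hedged sketch of what a `k-batch` grouping could look like: hosts are split into consecutive groups of `k` that run in parallel. The function name and the choice to drop leftover hosts are assumptions, not confirmed by the commit:

```python
def k_batch(hosts, k):
    """Split hosts into consecutive batches of exactly k hosts each."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Assumption: hosts that cannot fill a complete batch are left out.
    return [hosts[i:i + k] for i in range(0, len(hosts) - k + 1, k)]
```

For five hosts and `k = 2`, this yields two batches and leaves the fifth host unused.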
29 Dec, 2022 1 commit
- Runner - Support `pair-wise` pattern in `mpi` mode (#447) · 7838b6b1
  Yang Wang authored Dec 29, 2022
```
* Extract pair-wise pattern from ib_validation
```
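
One common way to produce pair-wise rounds is round-robin scheduling (the circle method), sketched below. Whether the runner uses exactly this schedule is an assumption, and the function name is hypothetical:

```python
def pair_wise_rounds(hosts):
    """Return rounds of disjoint host pairs covering every pair exactly once."""
    nodes = list(hosts)
    if len(nodes) % 2:
        nodes.append(None)  # placeholder: its partner sits out that round
    n = len(nodes)
    rounds = []
    for _ in range(n - 1):
        pairs = [(nodes[i], nodes[n - 1 - i]) for i in range(n // 2)]
        rounds.append([p for p in pairs if None not in p])
        # Keep the first node fixed and rotate the rest by one position.
        nodes = [nodes[0], nodes[-1]] + nodes[1:-1]
    return rounds
```

Each round contains only disjoint pairs, so all pairs in a round can run in parallel.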
29 Nov, 2022 1 commit

Runner - support 'pattern' in 'mpi' mode to run tasks in parallel (#430) · e4eeda0a

Yang Wang authored Nov 29, 2022

* add mpi-parallels mode

* update according to comments

* fix and update doc

* update

* merge into 'mpi' mode

* update according to comments

* fix testcases

* fix ansible

* regard pattern as field

* update

* fix flake8 version

* add flake8 range

* remove map-by from host config

* update comments

01 Nov, 2022 1 commit

CLI - Add non-zero return code for `sb [deploy,run]` (#425) · 1b86503d

Yifan Xiong authored Nov 01, 2022

Add a non-zero return code for the `sb deploy` and `sb run` commands when
there are Ansible failures in the control plane.
The return code is set to the number of failures.

For failures caused by benchmarks, the return code is still set per benchmark
in the results JSON file.
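
The control-plane rule can be sketched as follows, assuming each Ansible step reports an integer return code where 0 means success. The function name and input shape are hypothetical:

```python
def control_plane_return_code(step_return_codes):
    """Count failed control-plane steps; the count becomes the exit code."""
    return sum(1 for rc in step_return_codes if rc != 0)
```

A CLI entry point could then call `sys.exit(control_plane_return_code(codes))`.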

06 Sep, 2022 1 commit

Release - SuperBench v0.6.0 (#409) · 63e9b2d1

Yifan Xiong authored Sep 06, 2022



**Description**

Cherry-pick bug fixes from v0.6.0 to main.

**Major Revisions**

* Enable latency test in ib traffic validation distributed benchmark (#396)
* Enhance parameter parsing to allow spaces in value (#397)
* Update apt packages in dockerfile (#398)
* Upgrade colorlog for NO_COLOR support (#404)
* Analyzer - Update error handling to support exit code of sb result diagnosis (#403)
* Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399)
* Enhance timeout cleanup to avoid possible hanging (#405)
* Auto generate ibstat file by pssh (#402)
* Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406)
* Docs - Upgrade version and release note (#407)
* Docs - Fix issues in document (#408)
Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

08 Aug, 2022 1 commit
- Runner - Fix minimum timeout (#385) · 9c29c931
  Yifan Xiong authored Aug 08, 2022
```
Fix minimum timeout: use 60s if config is shorter.
```
04 Aug, 2022 1 commit

Gracefully exit when timeout (#383) · 9b8df883

Yifan Xiong authored Aug 04, 2022

* Gracefully exit when timeout, add corresponding log and return code.
* Set minimum timeout to 1 minute and enlarge Ansible timeout.
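
The minimum-timeout rule amounts to a simple clamp, sketched here with hypothetical names:

```python
MIN_TIMEOUT_SECONDS = 60  # 1 minute, as stated in the commit message


def effective_timeout(configured_seconds):
    """Return the configured timeout, raised to the minimum if it is shorter."""
    return max(int(configured_seconds), MIN_TIMEOUT_SECONDS)
```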

09 Jul, 2022 1 commit

Fix issues in ib validation benchmark (#370) · b2875179

Yifan Xiong authored Jul 09, 2022

Fix several issues in ib validation benchmark:
* continue running when a timeout occurs in the middle, instead of aborting the whole mpi process
* make the timeout parameter configurable, with a default of 120 seconds
* avoid mixing stdio and iostream when printing to stdout
* set the default message size to 8M, which saturates ib in most cases
* fix hostfile path issue so that it can be found automatically in different cases

08 Jul, 2022 1 commit

Support node_num=1 in mpi mode (#372) · e00a8180

Yifan Xiong authored Jul 08, 2022

Support `node_num: 1` in mpi mode, so that we can run mpi benchmarks on
both 1 node and all nodes in one config by changing `node_num`.
Update docs and add test case accordingly.

24 Jun, 2022 1 commit

Support multiple IB/GPU in ib validation (#363) · bfaa1c83

Yifan Xiong authored Jun 24, 2022

**Description**

Support running multiple IB/GPU devices simultaneously in the ib validation benchmark.

**Major Revisions**
- Revise ib_validation_performance.cc so that multiple processes per node can be used to launch multiple perftest commands simultaneously. For each node pair in the config, the configured number of processes per node run in parallel.
- Revise ib_validation_performance.py to correct file paths and adjust parameters to specify different NICs/GPUs/NUMA nodes.
- Fix env issues in Dockerfile for end-to-end test.
- Update ib-traffic configuration examples in config files.
- Update unit tests and docs accordingly.

Closes #326.

19 Jun, 2022 1 commit
- Runner - Fix sudo issue when running without Docker (#362) · 0f7b057a
  Yifan Xiong authored Jun 19, 2022
```
Fix sudo issue when running without Docker, where the user account
could be arbitrary.
```
14 Jun, 2022 1 commit

Support `sb run` on host directly without Docker (#358) · a4937e95

Yifan Xiong authored Jun 14, 2022

**Description**

Support `sb run` on host directly without Docker

**Major Revisions**
- Add `--no-docker` argument for `sb run`.
- Run on host directly if `--no-docker` is specified.
- Update docs and tests correspondingly.

09 Mar, 2022 1 commit

Bug - Fix env path to absolute path (#327) · f755c0b6

Yifan Xiong authored Mar 09, 2022

Fix env file path to absolute path in `docker exec`, in case ssh and local connections are mixed or different users are used.

15 Feb, 2022 1 commit
- Bug - Fix env file path (#310) · 1f48268b
  Yifan Xiong authored Feb 15, 2022
```
Fix env file path for `docker run`.
```
29 Jan, 2022 1 commit
- Config - Support customized env for all modes (#295) · 3524975c
  Yifan Xiong authored Jan 29, 2022
```
Support customized env for all modes in configuration.
```
28 Jan, 2022 2 commits

Benchmarks: Add Feature - Sync the E2E training results among all workers for each step. (#287) · d03d110f

guoshzhao authored Jan 28, 2022

**Description**
Sync the E2E training results among all workers for each step.

**Major Revision**
- Sync (allreduce with max) the E2E training results among all workers.
- Avoid using ':0' in the metric name if only one rank has output.
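
The naming rule in the second item can be sketched in plain Python. The function name and the dict-of-ranks input are assumptions for illustration:

```python
def metric_names(name, values_by_rank):
    """Map rank-indexed values to metric keys, omitting the rank suffix for a single rank."""
    if len(values_by_rank) == 1:
        # Only one rank produced output: use the bare metric name.
        (value,) = values_by_rank.values()
        return {name: value}
    return {f"{name}:{rank}": value for rank, value in sorted(values_by_rank.items())}
```

The sync itself is a max-reduction across workers; in a single process it behaves like `max(step_times)` applied per step.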

Benchmarks: Add Feature - Add timeout feature for each benchmark. (#288) · d877ca23

guoshzhao authored Jan 28, 2022

**Description**
Add timeout feature for each benchmark.

**Major Revision**
- Add `timeout` config for each benchmark. The current config files set the timeout only for kernel-launch, as an example; timeouts for other benchmarks can be set in the future.
- Set the timeout config for `ansible_runner.run()`. On timeout, the runner gets return code 254:
   [ansible.py:80][WARNING] Run failed, return code 254.
- Use the `timeout` command to terminate the client process.
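
As an illustration of the last item, a client command can be wrapped with the coreutils `timeout` utility, which exits with status 124 when the time limit is hit. The helper name is hypothetical, and the exact command line used by the runner is not shown in the commit:

```python
def with_timeout(command, seconds):
    """Prefix a shell command with coreutils `timeout` (illustrative only)."""
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    return f"timeout {int(seconds)} {command}"
```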

25 Jan, 2022 1 commit

Config - Update benchmark naming to support annotations (#284) · 7d7cd3dc

Yifan Xiong authored Jan 25, 2022

__Description__

Update benchmark naming to support annotations.

__Major Revisions__
- Update name for `create_benchmark_context` in executor.
- Backward compatibility for model benchmarks using "_models" suffix.
- Update documents.

30 Dec, 2021 1 commit

Release - SuperBench v0.4.0 (#278) · ff563b66

Yifan Xiong authored Dec 30, 2021



__Description__

Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__

* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)
Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>

10 Dec, 2021 1 commit

Monitor: Integration - Integrate monitor into Superbench (#259) · 6e357fb9

guoshzhao authored Dec 10, 2021

**Description**
Integrate monitor into Superbench.

**Major Revision**
- Initialize, start and stop monitor in SB executor.
- Parse the monitor data in SB runner and merge into benchmark results.
- Specify ReduceType for monitor metrics, such as MAX, MIN and LAST.
- Add monitor configs into config file.
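
A small sketch of how a reduce type could be applied to a series of monitor samples. The enum below is a local stand-in covering only the three types named above, not the project's own class:

```python
from enum import Enum


class ReduceType(Enum):
    """Local stand-in for the reduce types mentioned in the commit."""
    MAX = "max"
    MIN = "min"
    LAST = "last"


def reduce_samples(samples, reduce_type):
    """Collapse a list of monitor samples into one value."""
    if not samples:
        raise ValueError("no samples to reduce")
    if reduce_type is ReduceType.MAX:
        return max(samples)
    if reduce_type is ReduceType.MIN:
        return min(samples)
    return samples[-1]  # ReduceType.LAST: keep the most recent sample
```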

08 Dec, 2021 1 commit

Bug - Fix issues for distributed runs (#258) · 213ab14b

Yifan Xiong authored Dec 08, 2021

Fix issues for distributed runs:
* fix config for memory bandwidth benchmarks
* add throttling for high concurrency docker pull
* update rsync path and exclude directories
* handle exceptions when creating summary
* tune for logging

26 Sep, 2021 1 commit

Release - SuperBench v0.3.0 (#212) · dfbd70b1

Yifan Xiong authored Sep 26, 2021



**Description**

Cherry-pick bug fixes from v0.3.0 to main.

**Major Revisions**
* Docs - Upgrade version and release note (#209)
* Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
* Benchmarks: Update - Update benchmarks in configuration file (#208)
* CI/CD - Update GitHub Action VM (#211)
* Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
* CI/CD - Fix bug in build image for push event (#205)
* Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
* Tool: Fix bug - Fix function naming issue in system info  (#200)
* CI/CD - Push images in GitHub Action (#202)
* Bug - Fix torch.distributed command for single node (#201)
* CLI - Integrate system info for node (#199)
* Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
* CI/CD - Add ROCm image build in GitHub Actions (#194)
* Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
* Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
* Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
* Bug - Revise 'docker run' in sb deploy (#195)
* Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190)
Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>

20 Aug, 2021 1 commit

Runner: Add Feature - Generate summarized output files. (#157) · 7595d794

guoshzhao authored Aug 20, 2021

**Description**
Generate the summarized output files from all nodes. For each metric, the reduce operation is applied according to its `reduce_op`.

**Major Revision**
- Generate the summarized json file per node:
For microbenchmark, the format is `{benchmark_name}/[{run_count}/]{metric_name}[:rank]`
For modelbenchmark, the format is `{benchmark_name}/{sub_benchmark_name}/[{run_count}/]{metric_name}`
`[]` means optional.
```
{
  "kernel-launch/overhead_event:0": 0.00583,
  "kernel-launch/overhead_event:1": 0.00545,
  "kernel-launch/overhead_event:2": 0.00581,
  "kernel-launch/overhead_event:3": 0.00572,
  "kernel-launch/overhead_event:4": 0.00559,
  "kernel-launch/overhead_event:5": 0.00591,
  "kernel-launch/overhead_event:6": 0.00562,
  "kernel-launch/overhead_event:7": 0.00586,
  "resnet_models/pytorch-resnet50/steptime-train-float32": 544.0827468410134,
  "resnet_models/pytorch-resnet50/throughput-train-float32": 353.7607016465773,
  "resnet_models/pytorch-resnet50/steptime-train-float16": 425.40482617914677,
  "resnet_models/pytorch-resnet50/throughput-train-float16": 454.0142363793973,
  "pytorch-sharding-matmul/0/allreduce": 10.561786651611328,
  "pytorch-sharding-matmul/1/allreduce": 10.561786651611328,
  "pytorch-sharding-matmul/0/allgather": 10.088025093078613,
  "pytorch-sharding-matmul/1/allgather": 10.088025093078613
}
```
- Generate the summarized jsonl file for all nodes, each line is the result from one node in json format.
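
The two key formats above can be reproduced with a short sketch. The function names are hypothetical; the formats and sample keys come from the commit message:

```python
def micro_key(benchmark_name, metric_name, run_count=None, rank=None):
    """Build `{benchmark_name}/[{run_count}/]{metric_name}[:rank]`."""
    parts = [benchmark_name]
    if run_count is not None:
        parts.append(str(run_count))
    parts.append(metric_name)
    key = "/".join(parts)
    if rank is not None:
        key += f":{rank}"
    return key


def model_key(benchmark_name, sub_benchmark_name, metric_name, run_count=None):
    """Build `{benchmark_name}/{sub_benchmark_name}/[{run_count}/]{metric_name}`."""
    parts = [benchmark_name, sub_benchmark_name]
    if run_count is not None:
        parts.append(str(run_count))
    parts.append(metric_name)
    return "/".join(parts)
```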

19 Aug, 2021 1 commit

Runner - Support mpi mode (#146) · 98b6c0e3

Yifan Xiong authored Aug 19, 2021



Support mpi mode in runner:
* concatenate mpirun command
* support mca and env config
* prepare hostfile and update Ansible host pattern
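
A minimal sketch of assembling such a command, assuming Open MPI's `--hostfile`, `-mca` and `-x` options. The helper name and argument shapes are hypothetical, not the runner's actual code:

```python
def mpirun_command(hostfile, benchmark_cmd, mca=None, env=None):
    """Concatenate an mpirun invocation from mca and env settings."""
    cmd = ["mpirun", "--hostfile", hostfile]
    for key, value in (mca or {}).items():
        cmd += ["-mca", key, str(value)]  # MCA parameter: -mca <key> <value>
    for key, value in (env or {}).items():
        cmd += ["-x", f"{key}={value}"]  # export an env var to all ranks
    cmd += list(benchmark_cmd)
    return cmd
```

For example, `mpirun_command("hostfile", ["sb", "exec"], mca={"pml": "ob1"}, env={"NCCL_DEBUG": "INFO"})` returns an argument list suitable for `subprocess.run`.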
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>

08 Jul, 2021 1 commit

Runner & Executor - Support AMD GPU (#119) · 7458f83a

Yifan Xiong authored Jul 09, 2021

Support both NVIDIA and AMD GPU and check GPU vendor during deployment and execution.

* Add GPU environment check in sb deploy.
* Check GPU vendor in executor.
