Commits · 47d4a79d5868a7173fc580a55e16c8486a6ce32f · tsoc / superbenchmark

18 Apr, 2026 1 commit

Benchmark: Model benchmark - deterministic training support (#731) (#2) · 47d4a79d

one authored Apr 18, 2026



Adds an opt-in deterministic training mode to SuperBench's PyTorch model
benchmarks. When enabled with --enable-determinism, PyTorch deterministic
algorithms are enforced, and per-step numerical fingerprints (loss,
activation means) are recorded as metrics. These can be compared across
runs using the existing sb result diagnosis pipeline to verify bit-exact
reproducibility, which is useful for hardware validation and platform
comparison.
 
Flags added - 

--enable-determinism
--check-frequency: number of steps between recordings of the
fingerprint metrics
--deterministic-seed
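
A minimal sketch of how the three flags might interact, using a toy pure-Python loop as a stand-in for the PyTorch training step. The function `run_training` and its parameters are hypothetical names chosen for illustration and are not taken from this commit; the real benchmark applies its settings and logging in pytorch_base.py.

```python
import random


def run_training(seed, num_steps, check_frequency):
    """Toy training loop that returns {step: loss fingerprint}."""
    rng = random.Random(seed)            # all randomness derives from the fixed seed
    weight = rng.uniform(-1.0, 1.0)
    fingerprints = {}
    for step in range(1, num_steps + 1):
        x = rng.uniform(-1.0, 1.0)
        target = 3.0 * x
        pred = weight * x
        loss = (pred - target) ** 2
        weight -= 0.1 * 2.0 * (pred - target) * x   # one gradient step
        if step % check_frequency == 0:
            # float.hex() preserves every bit, so equality means bit-exact
            fingerprints[step] = loss.hex()
    return fingerprints


# Two runs with the same seed and parameters (assumed analogue of two
# benchmark runs with --enable-determinism and the same --deterministic-seed).
run_a = run_training(seed=42, num_steps=20, check_frequency=5)
run_b = run_training(seed=42, num_steps=20, check_frequency=5)
```

In this sketch, `run_a == run_b` holds because every source of randomness is derived from the seed, and fingerprints exist only for steps 5, 10, 15, and 20.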

Changes - 

Updated pytorch_base.py to handle deterministic settings and logging.
Added a new example script: pytorch_deterministic_example.py
Added a test file, test_pytorch_determinism_all.py, to verify that the
feature works as expected.

Usage - 

Step 1: Run 1 - Run with --enable-determinism; the necessary metrics
are recorded in the results-summary.jsonl file.
Step 2: Generate the baseline file from the Run 1 results using
sb result generate-baseline.
Step 3: Run 2 - Run with --enable-determinism again, on a different
machine or on the same machine; the metrics are recorded in that run's
results-summary.jsonl file.
Step 4: Run diagnosis on the results generated from the two runs using
the sb result diagnosis command.

Note - 
1. Make sure all parameters are identical between the two runs.
2. Running the diagnosis command requires the rules.yaml file.
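
The comparison in Step 4 is performed by sb result diagnosis; the sketch below only illustrates the idea of an exact, metric-by-metric check. It assumes each line of a results file is a flat JSON object mapping metric names to values, which may not match the real results-summary.jsonl layout, and the metric names and helper functions are hypothetical.

```python
import json

_MISSING = object()   # sentinel for a metric that is absent from one run


def load_metrics(jsonl_text):
    """Parse JSON Lines text into one flat {metric_name: value} dict."""
    metrics = {}
    for line in jsonl_text.splitlines():
        if line.strip():
            metrics.update(json.loads(line))
    return metrics


def find_mismatches(baseline, candidate):
    """Return the metric names that are missing or not exactly equal."""
    names = sorted(set(baseline) | set(candidate))
    return [
        name for name in names
        if baseline.get(name, _MISSING) != candidate.get(name, _MISSING)
    ]


# Hypothetical fingerprints from two runs; only step 10 differs.
run1 = load_metrics('{"loss_step_5": 0.25}\n{"loss_step_10": 0.125}\n')
run2 = load_metrics('{"loss_step_5": 0.25}\n{"loss_step_10": 0.126}\n')
mismatches = find_mismatches(run1, run2)
```

An empty `mismatches` list would mean the two runs agree on every recorded fingerprint.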

---------
Co-authored-by: Aishwarya Tonpe <aishwarya.tonpe25@gmail.com>
Co-authored-by: Ubuntu <rdadmin@HPCPLTNODE0.n3kgq4m0lhoednrx3hxtad2nha.cdmx.internal.cloudapp.net>

47d4a79d

29 Sep, 2025 1 commit

Benchmark: Model benchmark - add option to exclude data copy time in model benchmarks (#734) · 76066b6d

Yuting Jiang authored Sep 29, 2025

**Description**
Add an option to exclude data copy time in model benchmarks.

**Major Revision**
- Add an option, --no_copy
- Move the timer start to after the data copy finishes
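
The effect of moving the timer start can be sketched with a simulated clock. This assumes the flag selects whether the measured step time includes the data copy; `FakeClock` and `timed_step` are hypothetical helpers, not the benchmark's code.

```python
class FakeClock:
    """Deterministic clock so the example does not depend on real timing."""

    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


def timed_step(clock, copy_seconds, compute_seconds, no_copy):
    """Return the measured time of one simulated training step."""
    start = clock.now                 # default: timer covers copy + compute
    clock.advance(copy_seconds)       # simulated host-to-device data copy
    if no_copy:
        start = clock.now             # timer restarts after the copy finishes
    clock.advance(compute_seconds)    # simulated forward/backward pass
    return clock.now - start


with_copy = timed_step(FakeClock(), 0.5, 2.0, no_copy=False)
without_copy = timed_step(FakeClock(), 0.5, 2.0, no_copy=True)
```

Here `with_copy` covers both phases, while `without_copy` covers only the compute phase.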

76066b6d

08 Jan, 2024 1 commit

Release - SuperBench v0.10.0 (#607) · 2c88db90

Yifan Xiong authored Jan 07, 2024

**Description**

Cherry-pick bug fixes from v0.10.0 to main.

**Major Revisions**

* Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590
* Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591
* Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592
* Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595
* Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596
* CI/CD - Add ndv5 topo file #597
* Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593
* Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599
* Dockerfile - Bug fix for rocm docker build and deploy #598
* Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603
* Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604
* Monitor - U...

2c88db90

07 Dec, 2023 1 commit
- Benchmarks: Add benchmark: Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark (#582) · dd5a6329
  Yuting Jiang authored Dec 07, 2023
```
**Description**
Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark
```
  dd5a6329
28 Apr, 2023 1 commit

ModelBenchmarks - Fix early stop logic due to num_steps. (#522) · f38a9829

guoshzhao authored Apr 28, 2023

**Description**
A model benchmark can stop on either the `num_steps` or the `duration`
config; each takes effect only when it is set to a value greater than 0.
If both are set greater than 0, the benchmark stops at whichever
condition is reached first.
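
The stop rule described above can be sketched as follows. The function name and signature are assumptions for illustration and are not copied from the benchmark source.

```python
def is_finished(curr_step, elapsed_seconds, num_steps=0, duration=0):
    """Return True once any enabled stop condition has been reached.

    A limit is enabled only when its value is greater than 0.
    """
    step_limit_hit = num_steps > 0 and curr_step >= num_steps
    time_limit_hit = duration > 0 and elapsed_seconds >= duration
    return step_limit_hit or time_limit_hit
```

With both limits enabled, the `or` makes the earliest condition win; with both left at 0, this sketch never reports completion.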

f38a9829

14 Apr, 2023 1 commit

Release - SuperBench v0.8.0 (#517) · 51761b3a

Yifan Xiong authored Apr 14, 2023



**Description**

Cherry-pick bug fixes from v0.8.0 to main.

**Major Revisions**

* Monitor - Fix the cgroup version checking logic (#502)
* Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
* Fix wrong torch usage in communication wrapper for Distributed
Inference Benchmark (#505)
* Analyzer: Fix bug in python3.8 due to pandas api change (#504)
* Bug - Fix bug to get metric from cmd when error happens (#506)
* Monitor - Collect realtime GPU power when benchmarking (#507)
* Add num_workers argument in model benchmark (#511)
* Remove unreachable condition when write host list (#512)
* Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
* Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
* Docs - Upgrade version and release note (#508)
Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

51761b3a

30 Dec, 2022 1 commit

Executor - Add stdout logging util module and enable real-time logging flushing in executor (#445) · 9dfefce3

Yuting Jiang authored Dec 30, 2022

**Description**
Add stdout logging util module and enable real-time logging flushing in executor

**Major Revision**
- Add stdout logging util module to redirect stdout into file log
- enable stdout logging in executor to write benchmark output into both stdout and file `sb-bench.log`
- enable real-time log flushing in run_command of microbenchmarks through config `log_flushing`

**Minor Revision**
- Add the log_n_step argument to enable periodic step time logging in model benchmarks
- Update related docs
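
Writing output to both stdout and a log file can be sketched with a small tee object. The `Tee` class is a hypothetical illustration rather than the module added in this commit, and in-memory buffers stand in for the console and for `sb-bench.log`.

```python
import contextlib
import io


class Tee:
    """File-like object that forwards every write to several streams."""

    def __init__(self, *streams):
        self._streams = streams

    def write(self, text):
        for stream in self._streams:
            stream.write(text)
            stream.flush()        # flush immediately for real-time logs
        return len(text)

    def flush(self):
        for stream in self._streams:
            stream.flush()


console = io.StringIO()    # stands in for the real stdout
log_file = io.StringIO()   # stands in for the sb-bench.log file

with contextlib.redirect_stdout(Tee(console, log_file)):
    print("step 1: 12.5 ms")
```

After the `with` block, both buffers hold the same line, which mirrors the goal of seeing benchmark output on screen and in the log at the same time.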

9dfefce3

29 Apr, 2022 1 commit

Release - SuperBench v0.5.0 (#350) · 6681c720

Yifan Xiong authored Apr 29, 2022



**Description**

Cherry-pick bug fixes from v0.5.0 to main.

**Major Revisions**

* Bug - Force to fix ort version as '1.10.0' (#343)
* Bug - Support no matching rules and unify the output name in result_summary (#345)
* Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344)
* Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342)
* Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347)
* Docs - Upgrade version and release note (#348)
Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>

6681c720

01 Apr, 2022 1 commit

Benchmarks: Add Feature - Provide option to save raw data into file. (#333) · 6d895da8

guoshzhao authored Apr 01, 2022

**Description**
Use the config `log_raw_data` to control whether the raw data is logged to a file. The default value is `no`. It can be set to `yes` for particular benchmarks, such as the NCCL/RCCL tests, to save their raw data to a file.

6d895da8

28 Jan, 2022 1 commit

Benchmarks: Add Feature - Sync the E2E training results among all workers for each step. (#287) · d03d110f

guoshzhao authored Jan 28, 2022

**Description**
Sync the E2E training results among all workers for each step.

**Major Revision**
- Sync (do allreduce max) the E2E training results among all workers.
- Avoid using ':0' in the metric name if only one rank produces output.
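
The two revisions can be sketched in plain Python, with a list of per-worker results standing in for a real distributed allreduce. The helper names and the `steptime` metric name are hypothetical.

```python
def allreduce_max(per_worker_results):
    """Element-wise max across workers for each training step.

    per_worker_results[w][s] is the result of worker w at step s; after a
    real allreduce every worker would hold the returned list.
    """
    return [max(step_values) for step_values in zip(*per_worker_results)]


def metric_name(base, rank, ranks_with_output):
    """Append ':<rank>' only when more than one rank reports the metric."""
    if ranks_with_output == 1:
        return base
    return f"{base}:{rank}"


# Two workers, two steps: the slowest worker defines each step's result.
synced = allreduce_max([[10.0, 12.0], [11.0, 11.5]])
```

Taking the max reflects that a synchronized training step is only as fast as its slowest worker.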

d03d110f

19 Jan, 2022 1 commit
- Benchmarks: Add Feature - Add percentile metrics for ort and pytorch inference benchmarks (#283) · fd2bc9e0
  guoshzhao authored Jan 19, 2022
```
**Description**
Add 50th, 90th, 95th, 99th, and 99.9th percentile latency metrics for ORT and PyTorch inference benchmarks.
```
  fd2bc9e0
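
A minimal sketch of computing these latency percentiles with the nearest-rank method. The commit does not state which method the benchmarks use, so this choice and the `percentile` helper are assumptions for illustration.

```python
import math
from fractions import Fraction


def percentile(values, p):
    """Nearest-rank percentile; p is a string such as "99.9".

    Fraction keeps the rank computation exact, so "99.9" does not suffer
    from binary floating-point rounding.
    """
    ordered = sorted(values)
    rank = math.ceil(Fraction(p) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies: 1 ms, 2 ms, ..., 1000 ms.
latencies_ms = list(range(1, 1001))
tail = {p: percentile(latencies_ms, p) for p in ("50", "90", "95", "99", "99.9")}
```

Tail percentiles such as the 99.9th need enough samples to be meaningful; with 1000 samples the 99.9th is the second-largest value.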
09 Dec, 2021 1 commit
- Benchmarks: Unify metric names of benchmarks (#252) · 9f56b219
  Yuting Jiang authored Dec 09, 2021
```
**Description**
Unify metric names of benchmarks.
```
  9f56b219
27 Sep, 2021 1 commit
- Benchmarks: Add Feature - Add option to use fp32 instead of tf32 (#213) · f9442456
  guoshzhao authored Sep 28, 2021
```
**Description**
Add option `force_fp32` to use fp32 instead of tf32; it only takes effect on Ampere or newer GPUs.
```
  f9442456
06 Aug, 2021 2 commits
- Benchmarks: Add Feature - Set reduce type for current benchmarks' metrics. (#149) · acf365a8
  guoshzhao authored Aug 06, 2021
```
**Description**
Set reduce type for current benchmarks' metrics, including model benchmarks and ShardingMatmul.
```
  acf365a8
- Benchmarks: Code Revision - Calculate average value by using statistics module. (#148) · bc1a61b9
  guoshzhao authored Aug 06, 2021
```
**Description**
Replace `sum(results) / len(results)` with `statistics.mean(results)`
```
  bc1a61b9
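
A short illustration of the replacement described in the commit above. The sample values are made up; one observable difference is the error raised for empty input.

```python
import statistics

results = [1.0, 2.0, 3.0]

manual_mean = sum(results) / len(results)   # the expression being replaced
module_mean = statistics.mean(results)      # the replacement

# Empty input: the manual form raises ZeroDivisionError, while
# statistics.mean raises statistics.StatisticsError.
try:
    statistics.mean([])
    empty_error = None
except statistics.StatisticsError as err:
    empty_error = type(err).__name__
```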
28 Jun, 2021 1 commit
- Benchmarks: Add Configuration - Add validation config file for azure NDv4. (#103) · f22bb3f2
  guoshzhao authored Jun 28, 2021
```
* add config file for ndv4.
```
  f22bb3f2
21 Jun, 2021 1 commit
- Benchmarks: Add Feature - Add DistributedImpl and DistributedBackend arguments... · 216c5b5c
  guoshzhao authored Jun 21, 2021
```
Benchmarks: Add Feature - Add DistributedImpl and DistributedBackend arguments for micro benchmark. (#100)
```
  216c5b5c
04 Jun, 2021 1 commit
- Benchmarks: Fix Bug - Fix return code overwrite issue (#94) · 2d9be807
  guoshzhao authored Jun 04, 2021
```
* fix return code reset issue
```
  2d9be807
19 May, 2021 1 commit
- expose interface of pin memory and modify cnn configuration (#75) · b7d0ee32
  Yuting Jiang authored May 19, 2021
  
  b7d0ee32
26 Apr, 2021 1 commit
- Benchmarks: Fix Bug - Increase default sample count for benchmarking. (#64) · a7184da3
  guoshzhao authored Apr 26, 2021
  
  a7184da3
08 Apr, 2021 1 commit
- Benchmarks: Code Revision - Revise result process interface and add result checking (#32) · 2871a68b
  guoshzhao authored Apr 08, 2021
```
* revise result process interface

* add more comments
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
```
  2871a68b
22 Mar, 2021 1 commit

Benchmarks: Add Feature - Add benchmark finish check according to... · 5dfcc6be

guoshzhao authored Mar 22, 2021


Benchmarks: Add Feature - Add benchmark finish check according to num_warmup/num_steps and duration in ModelBenchmark class. (#25)

* add is_finished function

* reuse current time.
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>

5dfcc6be

18 Mar, 2021 1 commit

Benchmarks: Add Feature - Add sample_count argument for ModelBenchmark. (#22) · c00dc670

guoshzhao authored Mar 18, 2021



* add sample_count argument.

* handle more conditions.

* address comments.
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>

c00dc670

09 Mar, 2021 2 commits
- Benchmarks: Add Feature - Add flag to disable GPU. (#15) · 52848d2f
  guoshzhao authored Mar 10, 2021
```
* add flag to disable GPU.

* fix spelling

* fix unittest.

* address comments.
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
```
  52848d2f
- rename _cal_params_size as _cal_params_count. (#16) · 83a4e93f
  guoshzhao authored Mar 09, 2021
```
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
```
  83a4e93f
08 Mar, 2021 1 commit

Benchmarks: Add Feature - Add optimizer definition in Model Base (#13) · 52b52c2c

guoshzhao authored Mar 08, 2021



* add optimizer definition and function to create torch optimizer.

* move optimizer enum into model_base module.
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>

52b52c2c

04 Mar, 2021 1 commit
- add more checks for model base (#12) · 9388f8f5
  guoshzhao authored Mar 04, 2021
```
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
```
  9388f8f5
24 Feb, 2021 1 commit
- Benchmarks: Initialization - Add base class, registry, and result (#1) · 4c87a3e4
  guoshzhao authored Feb 24, 2021
```
* benchmarks init.
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
```
  4c87a3e4