Commits · ad8e01439c2e00f8a5405ea85b29fa1af2b21aa5 · tsoc / superbenchmark

30 Jun, 2025 1 commit

Benchmarks: Add Mixture of Experts Model (#679) · 44e35cda

pdr authored Jun 30, 2025



Added MoE model using MixtralConfig. 

1. Added 8x7b and 8x22b variants 
2. Requires high VRAM because all experts are loaded into memory. Thus,
training is disabled due to the memory constraint on the test worker.
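
A rough parameter count shows why memory is the bottleneck. With top-2 routing each token uses only two experts, but all eight must still be resident. The sketch assumes the publicly documented Mixtral-8x7B shapes, which are not read from this commit. It reuses `transformers.MixtralConfig` attribute names as plain dataclass fields so it runs without `transformers`, and it counts weights only.

```python
from dataclasses import dataclass


@dataclass
class MoEShape:
    """Shape settings for a Mixtral-style model.

    Field names mirror transformers.MixtralConfig attributes; the defaults are
    the publicly documented Mixtral-8x7B values (an assumption for this sketch).
    """

    vocab_size: int = 32000
    hidden_size: int = 4096
    intermediate_size: int = 14336
    num_hidden_layers: int = 32
    num_attention_heads: int = 32
    num_key_value_heads: int = 8
    num_local_experts: int = 8
    num_experts_per_tok: int = 2


def count_params(cfg: MoEShape, active_only: bool = False) -> int:
    """Approximate weight count; active_only counts just the experts used per token."""
    h = cfg.hidden_size
    head_dim = h // cfg.num_attention_heads
    kv_dim = cfg.num_key_value_heads * head_dim
    # q/o projections are h x h; k/v are h x kv_dim (grouped-query attention).
    attention = 2 * h * h + 2 * h * kv_dim
    # Each expert is a gated MLP with three h x intermediate_size matrices.
    expert = 3 * h * cfg.intermediate_size
    experts_counted = cfg.num_experts_per_tok if active_only else cfg.num_local_experts
    router = h * cfg.num_local_experts  # the gate scores every expert for each token
    norms = 2 * h  # two RMSNorm weight vectors per layer
    per_layer = attention + experts_counted * expert + router + norms
    embeddings = 2 * cfg.vocab_size * h  # untied input embedding and LM head
    final_norm = h
    return cfg.num_hidden_layers * per_layer + embeddings + final_norm


if __name__ == "__main__":
    total = count_params(MoEShape())
    active = count_params(MoEShape(), active_only=True)
    print(f"total: {total / 1e9:.1f}B params, ~{total * 2 / 1e9:.0f} GB in bf16")
    print(f"active per token: {active / 1e9:.1f}B params")
```

With these defaults the sketch reports about 46.7B total parameters (roughly 93 GB of bf16 weights) but only about 12.9B active per token. Training would additionally need gradients and optimizer state on top of the resident weights.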

---------
Co-authored-by: Hongtao Zhang <garyworkzht@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>

44e35cda

26 Jun, 2025 1 commit

Benchmarks - Add deepseek megatron-lm benchmark (#713) · deef9a3d

Yuting Jiang authored Jun 27, 2025



**Description**
Add deepseek megatron-lm benchmark.

---------
Co-authored-by: yukirora <yuting.jiang@microsoft.com>
Co-authored-by: Hongtao Zhang <garyworkzht@gmail.com>
Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>

deef9a3d

25 Jun, 2025 1 commit

Dockerfile - Add cuda12.9 docker image (#716) · a56356d8

guoshzhao authored Jun 25, 2025



**Description**
Add a CUDA 12.9 Dockerfile and build it in the pipeline.

---------
Co-authored-by: Guoshuai Zhao <microsoft@microsoft.com>
Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
Co-authored-by: Hongtao Zhang <garyworkzht@gmail.com>

a56356d8

28 Nov, 2024 1 commit

Benchmarks - Add LLaMA-2 Models (#668) · 249e21c1

pdr authored Nov 27, 2024

Added llama benchmark (training and inference), consistent with the
existing PyTorch model implementations such as gpt2, lstm, etc.

- added llama fp8 unit test for better code coverage and to reduce the
memory required
- updated transformers version to >= 4.28.0 for `LlamaConfig`
- set tokenizers version <= 0.20.3 to avoid 0.20.4 version
[issues](https://github.com/huggingface/tokenizers/issues/1691) with
py3.8
- added llama2 to tensorrt
- llama2 tests not added to test_tensorrt_inference_performance.py due
to the large memory requirement on the worker GPU; tests were validated
separately on GH200

---------
Co-authored-by: dpatlolla <dpatlolla@microsoft.com>

249e21c1

27 Nov, 2024 1 commit

CI/CD - Upgrade dependency versions in pipeline (#671) · 96f5ccea

Yifan Xiong authored Nov 26, 2024



Upgrade dependency versions in Azure pipeline:

* Remove Python 3.6 and add Python 3.10 for cpu-unit-test
* Upgrade CUDA from 11.1 to 12.4 for cuda-unit-test
* Update labels accordingly

---------
Co-authored-by: Dilip Patlolla <dilipreddi@gmail.com>

96f5ccea

08 Jan, 2024 1 commit

Release - SuperBench v0.10.0 (#607) · 2c88db90

Yifan Xiong authored Jan 07, 2024



**Description**

Cherry-pick bug fixes from v0.10.0 to main.

**Major Revisions**

* Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590
* Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591
* Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592
* Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595
* Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596
* CI/CD - Add ndv5 topo file #597
* Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593
* Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599
* Dockerfile - Bug fix for rocm docker build and deploy #598
* Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603
* Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604
* Monitor - Upgrade pyrsmi to amdsmi python library. #601
* Benchmarks: Micro benchmarks - add fp8 and initialization for hipblaslt benchmark #605
* Dockerfile - Add rocm6.0 dockerfile #602
* Bug Fix - Bug fix for latest megatron-lm benchmark #600
* Docs - Upgrade version and release note #606
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
Co-authored-by: guoshzhao <guzhao@microsoft.com>

2c88db90

07 Dec, 2023 1 commit

Benchmarks: Add benchmark: Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark (#582) · dd5a6329

Yuting Jiang authored Dec 07, 2023

**Description**
Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark

dd5a6329

28 Apr, 2023 1 commit

ModelBenchmarks - Fix early stop logic due to num_steps. (#522) · f38a9829

guoshzhao authored Apr 28, 2023

**Description**
Model benchmarks can stop based on the `num_steps` or `duration` config; each
takes effect only when its value is greater than 0.
If both are greater than 0, the benchmark stops at whichever condition is
reached first.
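
The stop rule reads naturally as a predicate checked once per step. A minimal sketch with hypothetical names (`should_stop`, `run_steps`), assuming `num_steps` counts completed steps and `duration` is in seconds:

```python
import time
from typing import Callable, Optional


def should_stop(step: int, start_time: float, num_steps: int, duration: float,
                now: Optional[float] = None) -> bool:
    """Return True once any enabled limit is reached.

    A limit is enabled only when its value is greater than 0. When both are
    enabled, whichever is reached first stops the run.
    """
    now = time.time() if now is None else now
    if num_steps > 0 and step >= num_steps:
        return True
    if duration > 0 and now - start_time >= duration:
        return True
    return False


def run_steps(num_steps: int, duration: float, step_fn: Callable[[], None]) -> int:
    """Run step_fn until a limit is hit; return the number of completed steps."""
    if num_steps <= 0 and duration <= 0:
        # Neither limit is enabled, so the loop below would never end.
        raise ValueError("set num_steps or duration to a value greater than 0")
    start = time.time()
    step = 0
    while not should_stop(step, start, num_steps, duration):
        step_fn()
        step += 1
    return step
```

Passing `now` explicitly keeps the predicate deterministic for testing; in a real loop it defaults to the wall clock.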

f38a9829

14 Apr, 2023 1 commit

Release - SuperBench v0.8.0 (#517) · 51761b3a

Yifan Xiong authored Apr 14, 2023



**Description**

Cherry-pick bug fixes from v0.8.0 to main.

**Major Revisions**

* Monitor - Fix the cgroup version checking logic (#502)
* Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
* Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505)
* Analyzer: Fix bug in python3.8 due to pandas api change (#504)
* Bug - Fix bug to get metric from cmd when error happens (#506)
* Monitor - Collect realtime GPU power when benchmarking (#507)
* Add num_workers argument in model benchmark (#511)
* Remove unreachable condition when write host list (#512)
* Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
* Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
* Docs - Upgrade version and release note (#508)
Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

51761b3a

04 Jan, 2023 1 commit

Benchmarks - Support FP8 in BERT models (#446) · 5197cdf5

Yifan Xiong authored Jan 04, 2023

Support FP8 in PyTorch BERT models:

* add fp8 hybrid/e4m3/e5m2 in precision arguments
* build BERT encoders with `te.TransformerLayer` to replace `transformers.BertModel`
* wrap forward steps with fp8 autocast
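
For context on the three precision options: E4M3 keeps one more mantissa bit, while E5M2 trades it for a wider exponent range. The "hybrid" recipe, as defined by NVIDIA Transformer Engine's `recipe.Format.HYBRID`, uses E4M3 in the forward pass and E5M2 for gradients in the backward pass. The sketch below computes each format's largest finite value from its bit layout. The precision strings in `FP8_FORMATS` are hypothetical, not the exact argument values SuperBench accepts.

```python
from typing import Dict, Tuple

# Hypothetical precision names -> (forward format, backward format).
FP8_FORMATS: Dict[str, Tuple[str, str]] = {
    "fp8_hybrid": ("e4m3", "e5m2"),
    "fp8_e4m3": ("e4m3", "e4m3"),
    "fp8_e5m2": ("e5m2", "e5m2"),
}


def max_finite(exp_bits: int, man_bits: int, reserves_inf: bool) -> float:
    """Largest finite value of a small float format with one sign bit."""
    bias = 2 ** (exp_bits - 1) - 1
    if reserves_inf:
        # IEEE-style (E5M2): the all-ones exponent is reserved for inf/NaN.
        max_exp = (2 ** exp_bits - 2) - bias
        max_man = 2 ** man_bits - 1
    else:
        # E4M3 as used for FP8 training: no inf; only all-ones exponent+mantissa is NaN.
        max_exp = (2 ** exp_bits - 1) - bias
        max_man = 2 ** man_bits - 2
    return (1 + max_man / 2 ** man_bits) * 2.0 ** max_exp


# (exponent bits, mantissa bits, reserves inf) for each format.
FORMAT_BITS = {"e4m3": (4, 3, False), "e5m2": (5, 2, True)}

for precision, (fwd, bwd) in FP8_FORMATS.items():
    print(f"{precision}: forward {fwd} (max {max_finite(*FORMAT_BITS[fwd]):g}), "
          f"backward {bwd} (max {max_finite(*FORMAT_BITS[bwd]):g})")
```

E4M3 tops out at 448 and E5M2 at 57344. The wider E5M2 range is the usual reason given for using it on gradients under the hybrid recipe.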

5197cdf5

30 Dec, 2022 1 commit

Executor - Add stdout logging util module and enable real-time logging flushing in executor (#445) · 9dfefce3

Yuting Jiang authored Dec 30, 2022

**Description**
Add stdout logging util module and enable real-time logging flushing in executor

**Major Revision**
- Add stdout logging util module to redirect stdout into a file log
- Enable stdout logging in executor to write benchmark output into both stdout and file `sb-bench.log`
- Enable real-time log flushing in run_command of microbenchmarks through config `log_flushing`

**Minor Revision**
- Add `log_n_step` args to enable regular step time logging in model benchmarks
- Update related docs
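
The stdout logging change can be pictured as a tee. Each write goes to the console and to `sb-bench.log` and is flushed immediately, so the file stays current while a benchmark is still running. This is a minimal sketch with hypothetical names (`TeeWriter`, `tee_stdout`), not the SuperBench util module.

```python
import contextlib
import sys
from typing import Iterator, TextIO


class TeeWriter:
    """File-like object that duplicates writes to a stream and a log file."""

    def __init__(self, stream: TextIO, log_file: TextIO, flush_each_write: bool = True):
        self.stream = stream
        self.log_file = log_file
        self.flush_each_write = flush_each_write

    def write(self, text: str) -> int:
        self.stream.write(text)
        self.log_file.write(text)
        if self.flush_each_write:
            # Real-time flushing: the log file reflects output immediately.
            self.flush()
        return len(text)

    def flush(self) -> None:
        self.stream.flush()
        self.log_file.flush()


@contextlib.contextmanager
def tee_stdout(log_path: str, flush_each_write: bool = True) -> Iterator[TeeWriter]:
    """Within the block, everything printed also lands in log_path."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        tee = TeeWriter(sys.stdout, log_file, flush_each_write)
        with contextlib.redirect_stdout(tee):
            yield tee
```

Only Python-level writes to `sys.stdout` are captured this way. A subprocess writing straight to file descriptor 1 bypasses it, so command output would have to be read and forwarded line by line.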

9dfefce3

29 Apr, 2022 1 commit

Release - SuperBench v0.5.0 (#350) · 6681c720

Yifan Xiong authored Apr 29, 2022



**Description**

Cherry-pick bug fixes from v0.5.0 to main.

**Major Revisions**

* Bug - Force to fix ort version as '1.10.0' (#343)
* Bug - Support no matching rules and unify the output name in result_summary (#345)
* Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344)
* Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342)
* Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347)
* Docs - Upgrade version and release note (#348)
Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>

6681c720

01 Apr, 2022 1 commit

Benchmarks: Add Feature - Provide option to save raw data into file. (#333) · 6d895da8

guoshzhao authored Apr 01, 2022

**Description**
Use config `log_raw_data` to control whether to log the raw data into a file. The default value is `no`. It can be set to `yes` for particular benchmarks, such as NCCL/RCCL tests, to save their raw data into a file.

6d895da8

28 Jan, 2022 1 commit

Benchmarks: Add Feature - Sync the E2E training results among all workers for each step. (#287) · d03d110f

guoshzhao authored Jan 28, 2022

**Major Revision**
- Sync (allreduce with max) the E2E training results among all workers.
- Avoid using ':0' in the metric name if only one rank has output.
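
Syncing with an all-reduce MAX means every rank ends up with, for each step, the largest value any worker reported. For step times, that is the slowest worker, which bounds a synchronous training step. The pure-Python sketch below emulates that on lists instead of calling `torch.distributed.all_reduce(tensor, op=ReduceOp.MAX)`. It assumes the results are per-step times and uses hypothetical helper names. The naming helper shows the ':0' suffix rule.

```python
from typing import List


def sync_step_times(per_rank: List[List[float]]) -> List[float]:
    """Emulate all-reduce MAX: step i takes the largest value across ranks."""
    if len({len(r) for r in per_rank}) != 1:
        raise ValueError("all ranks must report the same number of steps")
    return [max(values) for values in zip(*per_rank)]


def metric_names(base: str, ranks_with_output: List[int]) -> List[str]:
    """Add a ':<rank>' suffix only when more than one rank produced output."""
    if len(ranks_with_output) == 1:
        return [base]
    return [f"{base}:{rank}" for rank in ranks_with_output]
```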

d03d110f

19 Jan, 2022 1 commit

Benchmarks: Add Feature - Add percentile metrics for ort and pytorch inference benchmarks (#283) · fd2bc9e0

guoshzhao authored Jan 19, 2022

**Description**
Add 50th, 90th, 95th, 99th, 99.9th latency metrics for ORT and pytorch inference benchmarks.

fd2bc9e0

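
One common way to derive such percentile metrics from raw per-iteration latencies is linear interpolation between the two nearest ranks. This is the default method of `numpy.percentile`. Whether the benchmarks use exactly this rule is an assumption, and the helpers here are hypothetical.

```python
import math
from typing import Dict, Sequence

PERCENTILES = (50, 90, 95, 99, 99.9)


def percentile(values: Sequence[float], p: float) -> float:
    """Linear-interpolation percentile (numpy.percentile's default rule)."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lo, hi = math.floor(rank), math.ceil(rank)
    # Interpolate between the two neighbouring samples.
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def latency_percentiles(latencies_ms: Sequence[float]) -> Dict[float, float]:
    return {p: percentile(latencies_ms, p) for p in PERCENTILES}
```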
18 Jan, 2022 1 commit

CLI - Add command sb benchmark [list,list-parameters] (#279) · f7ffc545

Yifan Xiong authored Jan 18, 2022

__Description__

Add commands `sb benchmark list` and `sb benchmark list-parameters` to support listing benchmarks and all their optional parameters.

<details>
<summary>Examples</summary>
<pre>
$ sb benchmark list -n [a-z]+-bw -o table
Result
--------
mem-bw
nccl-bw
rccl-bw
</pre>
<pre>
$ sb benchmark list-parameters -n mem-bw
=== mem-bw ===
optional arguments:
  --bin_dir str         Specify the directory of the benchmark binary.
  --duration int        The elapsed time of benchmark in seconds.
  --mem_type str [str ...]
                        Memory types to benchmark. E.g. htod dtoh dtod.
  --memory str          Memory argument for bandwidthtest. E.g. pinned unpinned.
  --run_count int       The run count of benchmark.
  --shmoo_mode          Enable shmoo mode for bandwidthtest.
default values:
{'bin_dir': None,
 'duration': 0,
 'mem_type': ['htod', 'dtoh'],
 'memory': 'pinned',
 'run_count': 1}
</pre>
</details>

__Major Revisions__
* Add `sb benchmark list` to list benchmarks matching given name.
* Add `sb benchmark list-parameters` to list parameters for benchmarks which match given name.
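
The `-n` argument in the examples is a regular expression over benchmark names. A minimal sketch of that filtering, assuming full-match semantics (so `[a-z]+-bw` does not match `cpu-memory-bw-latency`). The real CLI's matching rule may differ, and `filter_benchmarks` is a hypothetical name.

```python
import re
from typing import Iterable, List


def filter_benchmarks(names: Iterable[str], pattern: str) -> List[str]:
    """Return benchmark names that fully match the regex, sorted."""
    regex = re.compile(pattern)
    return sorted(name for name in names if regex.fullmatch(name))
```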

__Minor Revisions__
* Sort format help text for argparse.

f7ffc545

10 Dec, 2021 1 commit

Benchmarks: Fix Bug - Set reduce_op type for metric return_code (#261) · afea9913

guoshzhao authored Dec 10, 2021

**Description**
Set the `reduce_op` type for metric `return_code` to `None`.

afea9913

09 Dec, 2021 1 commit

Benchmarks: Unify metric names of benchmarks (#252) · 9f56b219

Yuting Jiang authored Dec 09, 2021

**Description**
Unify metric names of benchmarks.

9f56b219

07 Dec, 2021 1 commit

Benchmarks: Add Feature - Add return_code metric into result (#256) · 44f0270e

guoshzhao authored Dec 07, 2021

**Description**
Add return_code metric into result and revise unit tests.

44f0270e

27 Sep, 2021 1 commit

Benchmarks: Add Feature - Add option to use fp32 instead of tf32 (#213) · f9442456

guoshzhao authored Sep 28, 2021

**Description**
Add option `force_fp32` to use fp32 instead of tf32, only takes effect on Ampere or newer GPUs.

f9442456

16 Aug, 2021 1 commit

Benchmarks: Code Revision - change 'reduce' to 'reduce_op' (#156) · 7293e783

guoshzhao authored Aug 16, 2021

**Description**
Change the field name `reduce` to `reduce_op`.

7293e783

06 Aug, 2021 2 commits

Benchmarks: Add Feature - Set reduce type for current benchmarks' metrics. (#149) · acf365a8

guoshzhao authored Aug 06, 2021

**Description**
Set reduce type for current benchmarks' metrics, including model benchmarks and ShardingMatmul.

acf365a8

Benchmarks: Code Revision - Calculate average value by using statistics module. (#148) · bc1a61b9

guoshzhao authored Aug 06, 2021

**Description**
Replace `sum(results) / len(results)` with `statistics.mean(results)`

bc1a61b9

05 Aug, 2021 1 commit

Benchmarks: Add Feature - Add reduce function support for output summary. (#147) · e41b1f62

guoshzhao authored Aug 05, 2021

**Description**
Add reduce function support for output summary.

**Major Revision**
- Add reducer class to maintain all reduce functions.
- Save reduce type of each metric into `BenchmarkResult`
- Fix UT.
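
Together with the per-metric reduce types (#149), `statistics.mean` (#148) and the `reduce_op` field name (#156), the reducer idea works as follows. Each metric carries a reduce type, and the values collected from all ranks are folded with the matching function. The sketch below is illustrative. `ReduceType`, `Reducer` and `summarize` are hypothetical here, and treating `reduce_op=None` as "keep per-rank values" is an assumption.

```python
import statistics
from enum import Enum
from typing import Callable, Dict, List, Optional, Sequence


class ReduceType(Enum):
    AVG = "avg"
    MAX = "max"
    MIN = "min"
    SUM = "sum"


class Reducer:
    """Maps each reduce type to a function folding per-rank values into one."""

    functions: Dict[ReduceType, Callable[[Sequence[float]], float]] = {
        ReduceType.AVG: statistics.mean,  # instead of sum(values) / len(values)
        ReduceType.MAX: max,
        ReduceType.MIN: min,
        ReduceType.SUM: sum,
    }

    @classmethod
    def reduce(cls, reduce_op: Optional[ReduceType], values: Sequence[float]) -> List[float]:
        # In this sketch, reduce_op=None means "do not reduce": keep every rank's value.
        if reduce_op is None:
            return list(values)
        return [cls.functions[reduce_op](values)]


def summarize(results: Dict[str, List[float]],
              reduce_ops: Dict[str, Optional[ReduceType]]) -> Dict[str, List[float]]:
    """Apply each metric's reduce_op to the values gathered from all ranks."""
    return {name: Reducer.reduce(reduce_ops.get(name), values)
            for name, values in results.items()}
```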

e41b1f62

28 Jun, 2021 1 commit

Benchmarks: Code Revision - Replace torch.optim.AdamW with transformers.AdamW. (#106) · 9c748527

guoshzhao authored Jun 28, 2021

* replace torch.optim.AdamW with transformers.AdamW.

9c748527

07 Jun, 2021 1 commit

Benchmarks: Fix Bug - Fix OOM issue when run pytorch models sequentially. (#93) · 03b41be1

guoshzhao authored Jun 07, 2021

* Clean up the cache.

03b41be1

04 Jun, 2021 1 commit

Benchmarks: Fix Bug - Fix return code overwrite issue (#94) · 2d9be807

guoshzhao authored Jun 04, 2021

* fix return code reset issue

2d9be807

19 May, 2021 1 commit

expose interface of pin memory and modify cnn configuration (#75) · b7d0ee32

Yuting Jiang authored May 19, 2021

b7d0ee32

20 Apr, 2021 2 commits

Benchmarks: Add Benchmark - Add LSTM model benchmarks. (#60) · 2a7ab691

guoshzhao authored Apr 20, 2021

* Benchmarks: Add Benchmark - Add LSTM model benchmarks.

2a7ab691

Benchmarks: Add Benchmark - Add CNN model benchmarks. (#59) · 902ea211

guoshzhao authored Apr 20, 2021

* Benchmarks: Add Benchmark - Add CNN model benchmarks.

902ea211

16 Apr, 2021 2 commits

Benchmarks: Code Revision - Fix some issue for BERT benchmark. (#58) · ce3ed24a

guoshzhao authored Apr 16, 2021

Benchmarks: Code Revision - Fix some issue for BERT benchmark. (#58)

ce3ed24a

Benchmarks: Add Benchmark - Add GPT2 model benchmark. (#57) · af567cf6

guoshzhao authored Apr 16, 2021

* Benchmarks: Add Benchmark - Add GPT2 model benchmark.

af567cf6

12 Apr, 2021 2 commits

unify arguments format by using whitespace. (#50) · 4664019a

guoshzhao authored Apr 12, 2021

Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>

4664019a

Skip tests and remove useless tests (#42) · 7c0534cc

Yifan Xiong authored Apr 12, 2021

* skip unnecessary tests according to env var
* remove useless tests

7c0534cc

08 Apr, 2021 1 commit

Benchmarks: Code Revision - Revise BenchmarkRegistry interfaces for... · 923ce277

guoshzhao authored Apr 08, 2021


Benchmarks: Code Revision - Revise BenchmarkRegistry interfaces for integration with executor. (#33)

* revise BenchmarkRegistry interfaces.
* address comments
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>

923ce277

26 Mar, 2021 1 commit

Benchmarks: Add Benchmark - Add Pytorch BERT benchmarks, including bert-base... · 0972b223

guoshzhao authored Mar 26, 2021


Benchmarks: Add Benchmark - Add Pytorch BERT benchmarks, including bert-base and bert-large. (#20)

* add pytorch bert benchmarks.

* revise code

* fix issue

* revise code.
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>

0972b223

22 Mar, 2021 1 commit

Benchmarks: Code Revision - Move benchmarks auto-registration from registry.py to __init__.py (#24) · 8d24d03d

guoshzhao authored Mar 22, 2021

* move benchmarks registration from `registry.py` to `__init__.py`

* revise `__init__`.
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>

8d24d03d

18 Mar, 2021 2 commits

Benchmarks: Add Feature - Add sample_count argument for ModelBenchmark. (#22) · c00dc670

guoshzhao authored Mar 18, 2021

* add sample_count argument.

* handle more conditions.

* address comments.
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>

c00dc670

Benchmarks: Code Revision - Support benchmark re-registration, keep the latest one. (#23) · 31b6f085

guoshzhao authored Mar 18, 2021

* support benchmark re-registration.

* address comments
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>

31b6f085

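
Re-registration that keeps the latest entry amounts to a name-to-class mapping in which a second registration overwrites the first. A toy sketch follows. `ToyRegistry` and `register` are hypothetical and are not SuperBench's `BenchmarkRegistry` API.

```python
from typing import Dict, Type


class ToyRegistry:
    """Illustrative name -> class registry where the latest registration wins."""

    _benchmarks: Dict[str, Type] = {}

    @classmethod
    def register(cls, name: str, benchmark_cls: Type) -> None:
        # Plain dict assignment: registering an existing name replaces the old entry.
        cls._benchmarks[name] = benchmark_cls

    @classmethod
    def get(cls, name: str) -> Type:
        return cls._benchmarks[name]
```

Overwriting silently, rather than raising on duplicates, lets a newer implementation replace an older one without unregistering it first.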
17 Mar, 2021 1 commit

Benchmarks: Add Test - Add tests for PytorchBase module. (#18) · 5b9b5cc8

guoshzhao authored Mar 17, 2021

* add pytorch base tests.

* add more test cases.
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>

5b9b5cc8