Commits · 51761b3af172b4fc54ce0a3abc302e203d2bf44a · tsoc / superbenchmark

14 Apr, 2023 1 commit

Release - SuperBench v0.8.0 (#517) · 51761b3a

Yifan Xiong authored Apr 14, 2023



**Description**

Cherry-pick bug fixes from v0.8.0 to main.

**Major Revisions**

* Monitor - Fix the cgroup version checking logic (#502)
* Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
* Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505)
* Analyzer: Fix bug in python3.8 due to pandas api change (#504)
* Bug - Fix bug to get metric from cmd when error happens (#506)
* Monitor - Collect realtime GPU power when benchmarking (#507)
* Add num_workers argument in model benchmark (#511)
* Remove unreachable condition when write host list (#512)
* Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
* Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
* Docs - Upgrade version and release note (#508)

Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>


30 Dec, 2022 1 commit

Executor - Add stdout logging util module and enable real-time logging flushing in executor (#445) · 9dfefce3

Yuting Jiang authored Dec 30, 2022

**Description**
Add stdout logging util module and enable real-time logging flushing in executor

**Major Revision**
- Add stdout logging util module to redirect stdout into a log file
- Enable stdout logging in executor to write benchmark output to both stdout and the file `sb-bench.log`
- Enable real-time log flushing in `run_command` of micro-benchmarks through the config `log_flushing`

**Minor Revision**
- Add `log_n_step` argument to enable periodic step-time logging in model benchmarks
- Update related docs

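The stdout redirection described in this commit can be pictured as a small "tee" writer. This is a minimal sketch and not the SuperBench implementation: the names `StdoutTee` and `run_with_tee` and the flush-on-every-write behaviour are assumptions made for illustration. Only the log file name `sb-bench.log` comes from the commit message.

```python
import sys


class StdoutTee:
    """Hypothetical writer that mirrors text to the console and a log file."""

    def __init__(self, console, log_file):
        self._console = console
        self._log_file = log_file

    def write(self, text):
        self._console.write(text)
        self._log_file.write(text)
        # Flush both targets right away so output shows up in real time.
        self.flush()
        return len(text)

    def flush(self):
        self._console.flush()
        self._log_file.flush()


def run_with_tee(log_path, func):
    """Run func() while everything printed to stdout is also appended to log_path."""
    original = sys.stdout
    with open(log_path, 'a', encoding='utf-8') as log_file:
        sys.stdout = StdoutTee(original, log_file)
        try:
            return func()
        finally:
            # Always restore the real stdout, even if func() raises.
            sys.stdout = original
```

For example, `run_with_tee('sb-bench.log', lambda: print('step 1 done'))` prints the line to the console and appends it to the log file.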

29 Apr, 2022 1 commit

Release - SuperBench v0.5.0 (#350) · 6681c720

Yifan Xiong authored Apr 29, 2022



**Description**

Cherry-pick bug fixes from v0.5.0 to main.

**Major Revisions**

* Bug - Force to fix ort version as '1.10.0' (#343)
* Bug - Support no matching rules and unify the output name in result_summary (#345)
* Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344)
* Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342)
* Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347)
* Docs - Upgrade version and release note (#348)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>


01 Apr, 2022 1 commit

Benchmarks: Add Feature - Provide option to save raw data into file. (#333) · 6d895da8

guoshzhao authored Apr 01, 2022

**Description**
Use the config `log_raw_data` to control whether raw data is logged to a file. The default value is `no`. It can be set to `yes` for particular benchmarks, such as the NCCL/RCCL tests, to save their raw data to a file.

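The effect of such a switch can be sketched as follows. This is an illustration under stated assumptions, not SuperBench code: the function `save_raw_output`, the boolean form of the option, and the file name `raw_data.txt` are all hypothetical.

```python
import os


def save_raw_output(raw_output, output_dir, log_raw_data=False):
    """Write raw benchmark output to a file only when log_raw_data is set.

    Returns the path of the written file, or None when nothing was written.
    """
    if not log_raw_data:
        return None
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, 'raw_data.txt')  # hypothetical file name
    with open(path, 'a', encoding='utf-8') as f:
        f.write(raw_output + '\n')
    return path
```

Keeping the default off means only benchmarks that opt in pay the cost of writing large raw outputs to disk.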

28 Jan, 2022 1 commit

Benchmarks: Add Feature - Sync the E2E training results among all workers for each step. (#287) · d03d110f

guoshzhao authored Jan 28, 2022

**Major Revision**
- Sync (allreduce max) the E2E training results among all workers.
- Avoid using ':0' in the metric name if only one rank has output.

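The two revisions above can be illustrated with a local simulation. This is a sketch only: in a real distributed job the reduction would be a collective call (for instance `torch.distributed.all_reduce` with `ReduceOp.MAX`), and the helper names `allreduce_max` and `metric_name` as well as the exact naming rule are assumptions, not the SuperBench code.

```python
def allreduce_max(per_rank_values):
    """Simulate an allreduce with max: take the element-wise maximum over ranks.

    per_rank_values holds one list of per-step results for each rank.
    After the reduction every rank would hold the returned list.
    """
    return [max(step_values) for step_values in zip(*per_rank_values)]


def metric_name(base, rank, num_output_ranks):
    """Append ':<rank>' only when more than one rank reports output."""
    if num_output_ranks == 1:
        return base
    return f'{base}:{rank}'
```

Taking the maximum means the slowest worker determines the reported step time, which is what limits throughput in synchronous training.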

19 Jan, 2022 1 commit
- Benchmarks: Add Feature - Add percentile metrics for ort and pytorch inference benchmarks (#283) · fd2bc9e0
  guoshzhao authored Jan 19, 2022
```
**Description**
Add 50th, 90th, 95th, 99th, 99.9th latency metrics for ORT and pytorch inference benchmarks.
```
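
As an illustration of the percentile metrics named in this commit, the sketch below computes them from a list of latency samples. The commit does not state which percentile definition is used, so the nearest-rank method here is an assumption, and the names `percentile` and `latency_percentiles` are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError('samples must not be empty')
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so very small p still maps to the first sample.
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_percentiles(latencies, percentiles=(50, 90, 95, 99, 99.9)):
    """Return a dict such as {'p50': ..., 'p99.9': ...} for the given samples."""
    return {f'p{p}': percentile(latencies, p) for p in percentiles}
```
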
09 Dec, 2021 1 commit
- Benchmarks: Unify metric names of benchmarks (#252) · 9f56b219
  Yuting Jiang authored Dec 09, 2021
```
**Description**
Unify metric names of benchmarks.
```
27 Sep, 2021 1 commit
- Benchmarks: Add Feature - Add option to use fp32 instead of tf32 (#213) · f9442456
  guoshzhao authored Sep 28, 2021
```
**Description**
Add option `force_fp32` to use fp32 instead of tf32, only takes effect on Ampere or newer GPUs.
```
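
One plausible way an option like `force_fp32` maps onto PyTorch is through the TF32 switches `torch.backends.cuda.matmul.allow_tf32` and `torch.backends.cudnn.allow_tf32`. The sketch below assumes that mapping, which the commit message does not confirm. The helper names `tf32_settings` and `apply_tf32_settings` are hypothetical.

```python
def tf32_settings(force_fp32):
    """Return the TF32 switches implied by a force_fp32 option."""
    allow_tf32 = not force_fp32
    return {'matmul_allow_tf32': allow_tf32, 'cudnn_allow_tf32': allow_tf32}


def apply_tf32_settings(force_fp32):
    """Apply the switches to PyTorch; requires torch to be installed."""
    import torch  # imported lazily so tf32_settings works without torch

    settings = tf32_settings(force_fp32)
    # With TF32 disallowed, float32 matmuls and convolutions run in full fp32.
    torch.backends.cuda.matmul.allow_tf32 = settings['matmul_allow_tf32']
    torch.backends.cudnn.allow_tf32 = settings['cudnn_allow_tf32']
```
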
06 Aug, 2021 2 commits
- Benchmarks: Add Feature - Set reduce type for current benchmarks' metrics. (#149) · acf365a8
  guoshzhao authored Aug 06, 2021
```
**Description**
Set reduce type for current benchmarks' metrics, including model benchmarks and ShardingMatmul.
```
- Benchmarks: Code Revision - Calculate average value by using statistics module. (#148) · bc1a61b9
  guoshzhao authored Aug 06, 2021
```
**Description**
Replace `sum(results) / len(results)` with `statistics.mean(results)`
```
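
A short comparison of the two expressions from this commit. The wrapper `average` is a hypothetical name used only for this example. One behavioural difference worth knowing: on an empty list the manual division raises `ZeroDivisionError`, while `statistics.mean` raises `statistics.StatisticsError`.

```python
import statistics


def average(results):
    """Average via the statistics module instead of sum(results) / len(results)."""
    return statistics.mean(results)


results = [1.5, 2.5, 5.0]
manual_mean = sum(results) / len(results)  # the expression being replaced
library_mean = average(results)            # the replacement
```
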
28 Jun, 2021 1 commit
- Benchmarks: Add Configuration - Add validation config file for azure NDv4. (#103) · f22bb3f2
  guoshzhao authored Jun 28, 2021
```
* add config file for ndv4.
```
21 Jun, 2021 1 commit
- Benchmarks: Add Feature - Add DistributedImpl and DistributedBackend arguments... · 216c5b5c
  guoshzhao authored Jun 21, 2021
```
Benchmarks: Add Feature - Add DistributedImpl and DistributedBackend arguments for micro benchmark. (#100)
```
04 Jun, 2021 1 commit
- Benchmarks: Fix Bug - Fix return code overwrite issue (#94) · 2d9be807
  guoshzhao authored Jun 04, 2021
```
* fix return code reset issue
```
19 May, 2021 1 commit
- expose interface of pin memory and modify cnn configuration (#75) · b7d0ee32
  Yuting Jiang authored May 19, 2021
  
26 Apr, 2021 1 commit
- Benchmarks: Fix Bug - Increase default sample count for benchmarking. (#64) · a7184da3
  guoshzhao authored Apr 26, 2021
  
08 Apr, 2021 1 commit
- Benchmarks: Code Revision - Revise result process interface and add result checking (#32) · 2871a68b
  guoshzhao authored Apr 08, 2021
```
* revise result process interface

* add more comments
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
```
22 Mar, 2021 1 commit

Benchmarks: Add Feature - Add benchmark finish check according to... · 5dfcc6be

guoshzhao authored Mar 22, 2021


Benchmarks: Add Feature - Add benchmark finish check according to num_warmup/num_steps and duration in ModelBenchmark class. (#25)

* add is_finished function

* reuse current time.

Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>

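A finish check driven by `num_warmup`/`num_steps` and `duration` might look like the sketch below. The exact conditions are an assumption based on the commit title, not a copy of the `ModelBenchmark` code, and the standalone function signature is hypothetical. The optional `curr_time` parameter lets the caller reuse a timestamp it has already taken instead of reading the clock again.

```python
import time


def is_finished(curr_step, start_time, num_warmup, num_steps, duration, curr_time=None):
    """Return True when the benchmark should stop.

    Stops when the time budget is used up (if duration > 0), or when the
    warmup and measured steps have all been run (if a step budget is set).
    """
    if curr_time is None:
        curr_time = time.time()
    total_steps = num_warmup + num_steps
    if duration > 0 and curr_time - start_time >= duration:
        return True
    if total_steps > 0 and curr_step >= total_steps:
        return True
    return False
```

With both budgets set, whichever limit is reached first ends the run.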

18 Mar, 2021 1 commit

Benchmarks: Add Feature - Add sample_count argument for ModelBenchmark. (#22) · c00dc670

guoshzhao authored Mar 18, 2021



* add sample_count argument.

* handle more conditions.

* address comments.

Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>


09 Mar, 2021 2 commits
- Benchmarks: Add Feature - Add flag to disable GPU. (#15) · 52848d2f
  guoshzhao authored Mar 10, 2021
```
* add flag to disable GPU.

* fix spelling

* fix unittest.

* address comments.
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
```
- rename _cal_params_size as _cal_params_count. (#16) · 83a4e93f
  guoshzhao authored Mar 09, 2021
```
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
```
08 Mar, 2021 1 commit

Benchmarks: Add Feature - Add optimizer definition in Model Base (#13) · 52b52c2c

guoshzhao authored Mar 08, 2021



* add optimizer definition and function to create torch optimizer.

* move optimizer enum into model_base module.

Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>


04 Mar, 2021 1 commit
- add more checks for model base (#12) · 9388f8f5
  guoshzhao authored Mar 04, 2021
```
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
```
24 Feb, 2021 1 commit
- Benchmarks: Initialization - Add base class, registry, and result (#1) · 4c87a3e4
  guoshzhao authored Feb 24, 2021
```
* benchmarks init.
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
```