- 01 Apr, 2022 1 commit
guoshzhao authored
**Description** Use the config `log_raw_data` to control whether to log the raw data into a file. The default value is `no`. It can be set to `yes` for particular benchmarks, such as the NCCL/RCCL tests, to save the raw data into a file.
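As a hedged sketch of how this could be used: the exact nesting and benchmark name below are assumptions for illustration, not taken from this changelog, but a SuperBench-style YAML config enabling raw-data logging for an NCCL test might look like:

```
# Hypothetical config fragment; benchmark name and schema are assumptions.
superbench:
  benchmarks:
    nccl-bw:
      parameters:
        log_raw_data: yes   # default is no; write the raw output to a file
```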
- 15 Mar, 2022 1 commit
user4543 authored
**Description** Fix the bug in writing results to files for MPI mode.
- 25 Jan, 2022 1 commit
Yifan Xiong authored
__Description__ Update benchmark naming to support annotations.

__Major Revisions__
- Update the name for `create_benchmark_context` in the executor.
- Keep backward compatibility for model benchmarks using the `_models` suffix.
- Update documents.
- 10 Dec, 2021 1 commit
guoshzhao authored
**Description** Integrate the monitor into SuperBench.

**Major Revision**
- Initialize, start, and stop the monitor in the SB executor.
- Parse the monitor data in the SB runner and merge it into the benchmark results.
- Specify the ReduceType for monitor metrics, such as MAX, MIN, and LAST.
- Add monitor configs into the config file.
- 08 Dec, 2021 1 commit
Yifan Xiong authored
Fix issues for distributed runs:
* Fix config for memory bandwidth benchmarks.
* Add throttling for high-concurrency docker pull.
* Update rsync path and exclude directories.
* Handle exceptions when creating the summary.
* Tune logging.
- 20 Aug, 2021 1 commit
guoshzhao authored
**Description** Generate the summarized output files from all nodes. For each metric, do the reduce operation according to the `reduce_op`.

**Major Revision**
- Generate the summarized json file per node. For micro-benchmarks, the format is `{benchmark_name}/[{run_count}/]{metric_name}[:rank]`; for model benchmarks, the format is `{benchmark_name}/{sub_benchmark_name}/[{run_count}/]{metric_name}`, where `[]` means optional.
  ```
  {
    "kernel-launch/overhead_event:0": 0.00583,
    "kernel-launch/overhead_event:1": 0.00545,
    "kernel-launch/overhead_event:2": 0.00581,
    "kernel-launch/overhead_event:3": 0.00572,
    "kernel-launch/overhead_event:4": 0.00559,
    "kernel-launch/overhead_event:5": 0.00591,
    "kernel-launch/overhead_event:6": 0.00562,
    "kernel-launch/overhead_event:7": 0.00586,
    "resnet_models/pytorch-resnet50/steptime-train-float32": 544.0827468410134,
    "resnet_models/pytorch-resnet50/throughput-train-float32": 353.7607016465773,
    "resnet_models/pytorch-resnet50/steptime-train-float16": 425.40482617914677,
    "resnet_models/pytorch-resnet50/throughput-train-float16": 454.0142363793973,
    "pytorch-sharding-matmul/0/allreduce": 10.561786651611328,
    "pytorch-sharding-matmul/1/allreduce": 10.561786651611328,
    "pytorch-sharding-matmul/0/allgather": 10.088025093078613,
    "pytorch-sharding-matmul/1/allgather": 10.088025093078613
  }
  ```
- Generate the summarized jsonl file for all nodes, where each line is the result from one node in json format.
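To illustrate the per-node summarization described above, here is a minimal, hypothetical sketch (not the actual SuperBench code; the function name is invented) that groups `:rank`-suffixed metrics and applies a reduce operation:

```python
from collections import defaultdict

def reduce_ranked_metrics(results, reduce_op=max):
    """Collapse per-rank metrics such as 'kernel-launch/overhead_event:0'
    into one value per metric name using reduce_op (hypothetical helper)."""
    grouped = defaultdict(list)
    summary = {}
    for name, value in results.items():
        leaf = name.rsplit('/', 1)[-1]
        if ':' in leaf:
            # Strip the ':rank' suffix and collect values across ranks.
            base, _, _rank = name.rpartition(':')
            grouped[base].append(value)
        else:
            # Metrics without a rank suffix pass through unchanged.
            summary[name] = value
    for base, values in grouped.items():
        summary[base] = reduce_op(values)
    return summary
```

Swapping `reduce_op` for `min` or a mean function would model the other reduce types mentioned in these commits.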
- 09 Jul, 2021 1 commit
guoshzhao authored
* Bug Fix - Fix race condition issue for multiple ranks (#117): fix the race condition when multiple ranks rotate the same directory.
* Update pipeline for release branch (#122)
* Bug Fix - Fix bug when converting a bool config to a store_true argument (#120)

Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
- 08 Jul, 2021 1 commit
Yifan Xiong authored
Support both NVIDIA and AMD GPUs and check the GPU vendor during deployment and execution.
* Add GPU environment check in `sb deploy`.
* Check GPU vendor in the executor.
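A minimal sketch of one way such a vendor check could work; this is an assumption for illustration, not the actual `sb deploy` implementation, and the function name is invented:

```python
import shutil

def detect_gpu_vendor():
    """Guess the GPU vendor by probing for each vendor's management
    CLI on PATH (hypothetical helper, not the SuperBench code)."""
    if shutil.which('nvidia-smi'):
        return 'nvidia'
    if shutil.which('rocm-smi'):
        return 'amd'
    return None  # no supported GPU tooling found
```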
- 02 Jul, 2021 1 commit
Yifan Xiong authored
Fetch benchmark results on all nodes; rsync runs after each benchmark. The results directory structure on the control node is as follows:
```
outputs/
└── datetime
    ├── nodes
    │   └── node-0
    │       ├── benchmarks
    │       │   ├── benchmark-0
    │       │   │   ├── rank-0
    │       │   │   │   └── results.json
    │       └── sb-exec.log
    ├── sb-run.log
    └── sb.config.yaml
```
- 01 Jul, 2021 1 commit
Yifan Xiong authored
* Support custom output directory.
* Update document.
- 28 Jun, 2021 1 commit
guoshzhao authored
- 16 Jun, 2021 1 commit
Yifan Xiong authored
Fix bugs and refine logging in single-GPU benchmarks:
* Fix none framework issue.
* Fix empty parameter bug.
* Remove missed mobilenet_v3 models.
* Change benchmark registration log to debug level.
* Add pid in logging.
* Add missing benchmarks in default config.
* Fix deprecated logging warn.
- 31 May, 2021 1 commit
Yifan Xiong authored
* Save benchmark results to json file.
- 18 May, 2021 1 commit
Yifan Xiong authored
* Use absolute path of the input file.
* Parse registry URI from image.
* Merge common parts of argument processing.
- 13 Apr, 2021 1 commit
Yifan Xiong authored
* Fix missing package in dockerfile.
* Update benchmark list and parameters.
* Catch runtime errors.
* Refine logging info.
- 12 Apr, 2021 1 commit
Yifan Xiong authored
* CLI integration with Executor and Runner
- 09 Apr, 2021 1 commit
Yifan Xiong authored
Add the superbench executor class:
* Add executor class.
* Update default config to execute benchmarks.
* Add micro benchmarks and model benchmarks.