Commits · 1362732c79a58de5e9d89bc7fe5bdd78e607597e · tsoc / superbenchmark

24 Jul, 2024 1 commit
- Docs - Add BibTeX in README and repo (#632) · 1362732c
  Yifan Xiong authored Jul 23, 2024
```
Add BibTeX for citation in README and repo.
```
  1362732c
23 Jul, 2024 1 commit

Update omegaconf version to 2.3.0 (#631) · 9a3ce39d

Yang Wang authored Jul 24, 2024

Update `omegaconf` version to
[2.3.0](https://pypi.org/project/omegaconf/2.3.0/) as omegaconf 2.0.6
has a non-standard dependency specifier PyYAML>=5.1.*. pip 24.1 will
enforce this behaviour change.
Discussion can be found at https://github.com/pypa/pip/issues/12063.
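
The specifier issue can be illustrated with a small check. Under PEP 440, the `.*` wildcard suffix is valid only with the `==` and `!=` operators, which is why `PyYAML>=5.1.*` is non-standard. The helper below is a simplified, hypothetical sketch (the function name is invented here, and it is not pip's actual parser):

```python
import re

# Simplified, hypothetical check: PEP 440 allows the ".*" wildcard suffix
# only with the "==" and "!=" operators. This is not pip's real parser.
_CLAUSE = re.compile(r"^\s*(~=|==|!=|<=|>=|<|>)\s*(\S+)\s*$")


def has_invalid_wildcard(specifier: str) -> bool:
    """Return True if any comma-separated clause uses ".*" with an operator other than == or !=."""
    for clause in specifier.split(","):
        match = _CLAUSE.match(clause)
        if match is None:
            continue  # Skip clauses this simplified pattern cannot parse.
        operator, version = match.groups()
        if version.endswith(".*") and operator not in ("==", "!="):
            return True
    return False


print(has_invalid_wildcard(">=5.1.*"))  # True
print(has_invalid_wildcard(">=5.1"))    # False
```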

9a3ce39d

22 Apr, 2024 1 commit

Dockerfile - Add CUDA 12.4 dockerfile (#619) · 7435f10a

Yuting Jiang authored Apr 22, 2024

**Description**
Add CUDA 12.4 dockerfile.

**Major Revision**
- upgrade nvidia docker to 23.04


**Minor Revision**
- upgrade hpcx to 2.18

7435f10a

18 Apr, 2024 1 commit
- Dockerfile - Upgrade mlc to v3.11 (#620) · dc3846cb
  Yuting Jiang authored Apr 18, 2024
```
**Description**
Upgrade mlc to v3.11.
```
  dc3846cb
02 Apr, 2024 1 commit
- Benchmarks: Revise Code - Add hipblasLt tuning to dist-inference cpp implementation (#616) · cc89ee59
  Ziyue Yang authored Apr 02, 2024
```
**Description**
Adds hipblasLt tuning to dist-inference cpp implementation.
```
  cc89ee59
21 Mar, 2024 1 commit

Bug Fix - Bug fix for cuda 12.2 dockerfile LD_LIBRARY_PATH issue (#614) · eeaa9b1a

Yang Wang authored Mar 22, 2024

**Description**
The CUDA 12.2 image reports an undefined symbol error due to an incomplete
LD_LIBRARY_PATH:


![image](https://github.com/microsoft/superbenchmark/assets/25875482/1a7c48c7-cb6b-4e3a-abbe-dde23007a96b)

### How to reproduce:
1. Deploy sb with cuda12.2 image
```
sb deploy -f local.ini -i superbench/superbench:v0.10.0-cuda12.2
```
2. Enter the container
```
sudo docker exec -it sb-workspace bash
```
3. Execute `mpirun`:
```
root@sb-container:~# mpirun
mpirun: symbol lookup error: mpirun: undefined symbol: opal_libevent2022_event_base_loop
```
### How to fix
* Append hpcx_load to /etc/bash.bashrc so that LD_LIBRARY_PATH is updated in every new shell
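
The symptom comes from a library directory missing from `LD_LIBRARY_PATH`. As a rough diagnostic sketch, the helper below reports which required directories are absent from a given value. The function name and the directory names are hypothetical placeholders, not the actual HPC-X paths in the image:

```python
import os
from typing import Iterable, List


def missing_library_dirs(ld_library_path: str, required_dirs: Iterable[str]) -> List[str]:
    """Return the required directories that do not appear in an LD_LIBRARY_PATH value."""
    # Entries are colon-separated; normalize so trailing slashes do not matter.
    present = {os.path.normpath(p) for p in ld_library_path.split(":") if p}
    return [d for d in required_dirs if os.path.normpath(d) not in present]


# Hypothetical paths, for illustration only.
current = "/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu"
print(missing_library_dirs(current, ["/opt/example-hpcx/ompi/lib", "/usr/local/cuda/lib64/"]))
```

Inside the container, passing `os.environ.get("LD_LIBRARY_PATH", "")` as the first argument would check the live environment.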

---------

eeaa9b1a

08 Jan, 2024 1 commit

Release - SuperBench v0.10.0 (#607) · 2c88db90

Yifan Xiong authored Jan 07, 2024



**Description**

Cherry-pick bug fixes from v0.10.0 to main.

**Major Revisions**

* Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590
* Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591
* Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592
* Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595
* Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596
* CI/CD - Add ndv5 topo file #597
* Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593
* Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599
* Dockerfile - Bug fix for rocm docker build and deploy #598
* Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603
* Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604
* Monitor - Upgrade pyrsmi to amdsmi python library. #601
* Benchmarks: Micro benchmarks - add fp8 and initialization for hipblaslt benchmark #605
* Dockerfile - Add rocm6.0 dockerfile #602
* Bug Fix - Bug fix for latest megatron-lm benchmark #600
* Docs - Upgrade version and release note #606
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
Co-authored-by: guoshzhao <guzhao@microsoft.com>

2c88db90

11 Dec, 2023 1 commit
- Benchmark: Revision - Fix -O2 option passing in gpu_copy ROCm build (#589) · 2c2096ed
  Ziyue Yang authored Dec 11, 2023
```
**Description**
`add_compile_options` does not work for the ROCm build; set
`CMAKE_CXX_FLAGS` instead.
```
  2c2096ed
10 Dec, 2023 1 commit
- Benchmarks: Microbenchmark - Add distributed inference benchmark cpp implementation (#586) · 719a427f
  Ziyue Yang authored Dec 11, 2023
```
**Description**
Add distributed inference benchmark cpp implementation.
```
  719a427f
09 Dec, 2023 1 commit

Dockerfile - Upgrade to rocm5.7 dockerfile (#587) · 1f5031bd

Yuting Jiang authored Dec 10, 2023



**Description**
upgrade to rocm5.7 dockerfile.

---------
Co-authored-by: yukirora <yuting.jiang@microsoft.com>

1f5031bd

08 Dec, 2023 1 commit

Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support... · 4fa60be7

Ziyue Yang authored Dec 08, 2023

Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588)

**Description**
Add one-to-all, all-to-one, all-to-all support to
gpu_copy_bw_performance, and fix performance bug in gpu_copy
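
The three patterns differ only in which (source, destination) GPU pairs take part in the copy. The sketch below enumerates those pairs; the function name, the `root` parameter, and the pattern strings are illustrative assumptions, and the real benchmark's options and semantics may differ:

```python
from typing import List, Tuple


def copy_pairs(pattern: str, num_gpus: int, root: int = 0) -> List[Tuple[int, int]]:
    """Enumerate (source, destination) GPU index pairs for a copy pattern."""
    others = [g for g in range(num_gpus) if g != root]
    if pattern == "one-to-all":
        # A single source GPU copies to every other GPU.
        return [(root, dst) for dst in others]
    if pattern == "all-to-one":
        # Every other GPU copies to a single destination GPU.
        return [(src, root) for src in others]
    if pattern == "all-to-all":
        # Every GPU copies to every other GPU.
        return [(s, d) for s in range(num_gpus) for d in range(num_gpus) if s != d]
    raise ValueError(f"unknown pattern: {pattern}")


print(copy_pairs("one-to-all", 4))       # [(0, 1), (0, 2), (0, 3)]
print(len(copy_pairs("all-to-all", 4)))  # 12
```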

4fa60be7

07 Dec, 2023 2 commits
- Benchmarks: Add MSCCL Support for Nvidia GPU (#584) · 6ef3a011
  Ziyue Yang authored Dec 07, 2023
```
**Description**
Add MSCCL support for Nvidia GPU
```
  6ef3a011
- Benchmarks: Add benchmark: Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark (#582) · dd5a6329
  Yuting Jiang authored Dec 07, 2023
```
**Description**
Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark
```
  dd5a6329
05 Dec, 2023 1 commit
- Benchmarks: Micro benchmark - Add graph mode in NCCL/RCCL benchmarks for latency metrics (#583) · 254ea7fe
  Ziyue Yang authored Dec 05, 2023
```
**Description**
Revise NCCL/RCCL benchmarks to support graph mode and add latency metrics.
```
  254ea7fe
04 Dec, 2023 1 commit

Benchmarks: micro benchmark - Support cpu-gpu and gpu-cpu in ib-validation (#581) · 9ae8c670

Yuting Jiang authored Dec 04, 2023

**Description**
Benchmarks: micro benchmark - Support cpu-gpu and gpu-cpu in
ib-validation

**Major Revision**
- Support cpu-gpu and gpu-cpu in ib-validation


**Minor Revision**
- support multiple message sizes, directions, and ib commands in
ib-validation
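
Supporting several message sizes, directions, and commands at once amounts to running every combination of them. A minimal sketch of that expansion follows; the function name, dictionary keys, and example values are assumptions for illustration, not the benchmark's actual parameters:

```python
from itertools import product
from typing import Dict, List


def expand_runs(commands: List[str], msg_sizes: List[int], directions: List[str]) -> List[Dict[str, object]]:
    """Expand every combination of command, message size, and direction into one run description."""
    return [
        {"command": c, "msg_size": s, "direction": d}
        for c, s, d in product(commands, msg_sizes, directions)
    ]


# Example values are illustrative only.
runs = expand_runs(["ib_write_bw", "ib_read_bw"], [8192, 8388608], ["cpu-gpu", "gpu-cpu"])
print(len(runs))  # 8
```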

9ae8c670

27 Nov, 2023 1 commit

Monitor - Add support for AMD GPU. (#580) · 028819b3

guoshzhao authored Nov 27, 2023

**Description**
Add AMD support in monitor.

**Major Revision**
- Add the pyrsmi library to collect metrics.
- Currently collects device_utilization, device_power, device_used_memory
and device_total_memory.
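
As a rough sketch of how the four metrics might be held and combined, the record below stores one sample and derives a memory usage ratio. The class name, the units, and the sample values are assumptions made for illustration; no pyrsmi calls are shown:

```python
from dataclasses import dataclass


@dataclass
class DeviceSample:
    """One monitoring sample holding the four metrics named above (units are assumed)."""
    device_utilization: float  # percent, assumed
    device_power: float        # watts, assumed
    device_used_memory: int    # bytes, assumed
    device_total_memory: int   # bytes, assumed

    def memory_used_ratio(self) -> float:
        """Fraction of device memory in use; 0.0 if the total is unknown."""
        if self.device_total_memory <= 0:
            return 0.0
        return self.device_used_memory / self.device_total_memory


# Made-up values, for illustration only.
sample = DeviceSample(device_utilization=87.0, device_power=412.5,
                      device_used_memory=48 * 2**30, device_total_memory=64 * 2**30)
print(sample.memory_used_ratio())  # 0.75
```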

028819b3

22 Nov, 2023 4 commits

Dockerfile - Upgrade Docker image to CUDA 12.2 (#577) · 1ad1c21c

Yifan Xiong authored Nov 22, 2023

Upgrade Docker image to CUDA 12.2 for H100:
* upgrade base image to 23.10
* fix onnxruntime version in python3.10
* fix compilation errors

1ad1c21c

- Benchmarks: Micro benchmark - add initialization options for rocm gemm flops (#578) · 2235e084
  Yuting Jiang authored Nov 22, 2023
```
**Description**
add initialization options for rocm gemm flops.
```
  2235e084
- Benchmarks: Micro benchmark - Add hipBLASLt function benchmark (#576) · 79089b65
  Yuting Jiang authored Nov 22, 2023
```
**Description**
hipblaslt function benchmark and rebase cublaslt function benchmark.
```
  79089b65

Analyzer - Generate baseline given results from multiple nodes. (#575) · 9f4880cb

guoshzhao authored Nov 22, 2023



**Description**
Generate baseline given results from multiple nodes. 

**Major Revision**
- Add sub command `sb result generate-baseline`
- Add UT and docs
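
The commit message does not state how the baseline is computed. Purely as an illustration of the idea, the sketch below aggregates per-node results by taking the median of each metric; the median rule, the function name, and the metric names are all assumptions, not the behavior of `sb result generate-baseline`:

```python
from statistics import median
from typing import Dict, List


def baseline_from_nodes(results: List[Dict[str, float]]) -> Dict[str, float]:
    """Aggregate per-node metric dictionaries into one baseline using the median (assumed rule)."""
    metrics = sorted({name for node in results for name in node})
    return {
        # Nodes that did not report a metric are skipped for that metric.
        name: median(node[name] for node in results if name in node)
        for name in metrics
    }


# Made-up metric names and values, for illustration only.
nodes = [
    {"metric_a": 100.0, "metric_b": 9.0},
    {"metric_a": 104.0, "metric_b": 11.0},
    {"metric_a": 96.0},
]
print(baseline_from_nodes(nodes))  # {'metric_a': 100.0, 'metric_b': 10.0}
```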

---------
Co-authored-by: 454314380 <454314380@qq.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

9f4880cb

20 Nov, 2023 1 commit
- Benchmarks: micro benchmarks - add int8 support for cublaslt function (#574) · f53d941a
  Yuting Jiang authored Nov 20, 2023
```
**Description**
add int8 support for cublaslt function.
```
  f53d941a
14 Nov, 2023 1 commit

Bug Fix - remove cp ptx file command in gpu burn test (#567) · c7800bb8

Yuting Jiang authored Nov 14, 2023

**Description**
Remove the cp of the ptx file in the gpu burn test, since the command
already runs inside the self.args.bin_dir directory.


https://github.com/microsoft/superbenchmark/blob/d246bab430adeb461072918a551b2e2b68c9bce5/superbench/benchmarks/micro_benchmarks/micro_base.py#L183

c7800bb8

07 Nov, 2023 1 commit

Bump @babel/traverse from 7.14.5 to 7.23.2 in /website (#566) · ce3737f9

dependabot[bot] authored Nov 07, 2023

Bumps [@babel/traverse](https://github.com/babel/babel/tree/HEAD/packages/babel-traverse) from 7.14.5 to 7.23.2.
- [Release notes](https://github.com/babel/babel/releases)
- [Changelog](https://github.com/babel/babel/blob/main/CHANGELOG.md)
- [Commits](https://github.com/babel/babel/commits/v7.23.2/packages/babel-traverse)

---
updated-dependencies:
- dependency-name: "@babel/traverse"
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>

ce3737f9

05 Nov, 2023 1 commit

Bump postcss from 8.3.5 to 8.4.31 in /website (#564) · 07477c3b

dependabot[bot] authored Nov 05, 2023

Bumps [postcss](https://github.com/postcss/postcss) from 8.3.5 to 8.4.31.
- [Release notes](https://github.com/postcss/postcss/releases)
- [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md)
- [Commits](postcss/postcss@8.3.5...8.4.31)

---
updated-dependencies:
- dependency-name: postcss
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>

07477c3b

23 Oct, 2023 1 commit

Dockerfile - update mlc version into 3.10 for cuda and rocm dockerfiles (#562) · d246bab4

Yuting Jiang authored Oct 23, 2023



**Description**
Update mlc version to 3.10 for cuda and rocm dockerfiles to be
consistent with the cuda12 dockerfile.
Co-authored-by: yukirora <yuting.jiang@microsoft.com>

d246bab4

22 Aug, 2023 1 commit
- Benchmarks: micro benchmark - source code for evaluating NVDEC decoding performance (#560) · 27a10811
  Yuting Jiang authored Aug 22, 2023
```
**Description**
source code for evaluating NVDEC decoding performance.

---------
Co-authored-by: yukirora <yuting.jiang@microsoft.com>
```
  27a10811
18 Aug, 2023 1 commit
- Benchmarks: micro benchmarks - add source code for DirectXRenderPerf (#549) · 6c0205ce
  Yuting Jiang authored Aug 18, 2023
```
**Description**
add source code for DirectXRenderPerf.

---------
Co-authored-by: yukirora <yuting.jiang@microsoft.com>
```
  6c0205ce
08 Aug, 2023 1 commit

Benchmarks: model benchmarks - change torch.distributed.launch to torchrun (#556) · 67f2aa72

pnunna93 authored Aug 08, 2023

This PR has the following changes:
- Change torch.distributed.launch to torchrun. torch.distributed.launch
is deprecated in the latest PyTorch, which recommends moving to torchrun -
https://pytorch.org/docs/stable/elastic/run.html

- Change the AMD GPU detection logic. The previous logic throws warnings
when containers have only renderD in /dev/dri; this change resolves
those warnings.
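
One practical difference between the two launchers is how a worker learns its local rank: torchrun exports it in the `LOCAL_RANK` environment variable, while the legacy launcher passed a `--local_rank` argument by default. The helper below is a minimal sketch of handling both; its name is hypothetical and it is not taken from the SuperBench code:

```python
import argparse
import os
from typing import Mapping, Sequence


def resolve_local_rank(argv: Sequence[str], env: Mapping[str, str] = os.environ) -> int:
    """Prefer LOCAL_RANK from the environment (torchrun); fall back to --local_rank (legacy launcher)."""
    if "LOCAL_RANK" in env:
        return int(env["LOCAL_RANK"])
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--local_rank", "--local-rank", type=int, default=0)
    # Ignore any other arguments the training script may define.
    args, _ = parser.parse_known_args(list(argv))
    return args.local_rank


print(resolve_local_rank([], {"LOCAL_RANK": "3"}))  # 3
print(resolve_local_rank(["--local_rank=2"], {}))   # 2
```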

---------
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

67f2aa72

27 Jul, 2023 1 commit

Release - SuperBench v0.9.0 (#558) · e1df877b

Yuting Jiang authored Jul 27, 2023

**Description**
Cherry-pick bug fixes from v0.9.0 to main.

**Major Revision**
- CI/CD: pipeline - clean more disk space to fix rocm building image
pipeline (#555)
- Benchmarks: bug fix - use absolute path for input file in
DirectXEncodingLatency (#554)
- CI/CD - add push win docker image on release branch in pipeline (#552)
- Docs - Upgrade version and release note (#557)

e1df877b

24 Jul, 2023 1 commit

Bump semver from 5.7.1 to 5.7.2 in /website (#550) · 466b477e

dependabot[bot] authored Jul 24, 2023

Bumps [semver](https://github.com/npm/node-semver) from 5.7.1 to 5.7.2.
- [Release notes](https://github.com/npm/node-semver/releases)
- [Changelog](https://github.com/npm/node-semver/blob/v5.7.2/CHANGELOG.md)
- [Commits](npm/node-semver@v5.7.1...v5.7.2)

---
updated-dependencies:
- dependency-name: semver
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>

466b477e

06 Jul, 2023 1 commit
- Benchmarks: micro benchmarks - add python code for DirectXGPUEncodingLatency (#548) · e8ac0b1e
  Yuting Jiang authored Jul 06, 2023
```
**Description**
add python code for DirectXGPUEncodingLatency.
```
  e8ac0b1e
05 Jul, 2023 4 commits
- Benchmarks: micro benchmarks - add python code for DirectXGPUCopy (#546) · c8c079c2
  Yuting Jiang authored Jul 06, 2023
```
**Description**
add python code for DirectXGPUCopy.
```
  c8c079c2
- Benchmarks: micro benchmarks - add python code for DirecXGPUMemBw (#547) · af4cfd5b
  Yuting Jiang authored Jul 05, 2023
```
**Description**
add python code for DirecXGPUMemBw.
```
  af4cfd5b
- Benchmarks: micro benchmarks - add python code for DirectXGPUCoreFlops (#542) · f1d608ae
  Yuting Jiang authored Jul 05, 2023
```
**Description**
add python code for DirectX core flops and init DirectX test pipeline.

**Major Revision**
- add python code for DirectX core flops 
- init DirectX test pipeline


**Minor Revision**
- add test for DirectX core flops
```
  f1d608ae
- CI/CD - Support DirectX test pipeline (#545) · 3704a432
  Yuting Jiang authored Jul 05, 2023
```
**Description**
Support DirectX test pipeline.
```
  3704a432
03 Jul, 2023 1 commit
- Benchmarks: Build Pipeline - add AMF in third party and build AMF encoding latency test (#543) · 86547217
  Yuting Jiang authored Jul 03, 2023
```
**Description**
add AMF in third party and build AMF encoding latency test.
```
  86547217
30 Jun, 2023 3 commits

Benchmarks: microbenchmark - add auto selecting algorithm support for cudnn functions (#540) · 97f7b1df

Yuting Jiang authored Jun 30, 2023

**Description**
add auto selecting algorithm support for cudnn functions.

**Major Revision**
- add auto selecting algorithm support for cudnn functions in source
code
- add 'auto_algo' option in benchmark
- add related test

97f7b1df

Doc - Update outdate references in micro-benchmarks.md (#544) · c7d0beaf

Lei Qu authored Jun 30, 2023

Modify link for Nvidia bandwidth test tool

**Description**
The previous link returns a 404.

**Minor Revision**
Update the link to
https://github.com/NVIDIA/cuda-samples/tree/master/Samples/1_Utilities/bandwidthTest

c7d0beaf

- Benchmarks - Update result parsing in tensorrt inference (#541) · 7184bdd1
  Yifan Xiong authored Jun 30, 2023
```
* Update result parsing for newer tensorrt versions
* Update arguments when loading torchvision models
```
  7184bdd1

29 Jun, 2023 1 commit
- Benchmarks: Add benchmark - Add source code of DirectxGPUCopy microbenchmark (#486) · f2599137
  Yuting Jiang authored Jun 29, 2023
```
**Description**
Add source code of DirectxGPUCopy microbenchmark.
```
  f2599137