- 08 Jan, 2024 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.10.0 to main.
**Major Revisions**
* Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590
* Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591
* Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592
* Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595
* Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596
* CI/CD - Add ndv5 topo file #597
* Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593
* Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599
* Dockerfile - Bug fix for rocm docker build and deploy #598
* Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603
* Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604
* Monitor - U...
-
- 11 Dec, 2023 1 commit
-
-
Ziyue Yang authored
**Description** `add_compile_options` does not work for the ROCm build; set `CMAKE_CXX_FLAGS` instead.
-
- 10 Dec, 2023 1 commit
-
-
Ziyue Yang authored
**Description** Add distributed inference benchmark cpp implementation.
-
- 09 Dec, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Upgrade the Dockerfile to ROCm 5.7. --------- Co-authored-by: yukirora <yuting.jiang@microsoft.com>
-
- 08 Dec, 2023 1 commit
-
-
Ziyue Yang authored
Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588) **Description** Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance, and fix performance bug in gpu_copy
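The three copy patterns named in this commit differ only in which (source, destination) GPU pairs are exercised. The sketch below enumerates those pairs; it is an illustration only, not the benchmark's implementation, and the function name `copy_pairs`, the pattern strings, and the `root` parameter are hypothetical.

```python
from typing import List, Tuple


def copy_pairs(pattern: str, num_gpus: int, root: int = 0) -> List[Tuple[int, int]]:
    """Return the (src, dst) GPU index pairs exercised by a copy pattern.

    Hypothetical helper for illustration; not part of gpu_copy_bw_performance.
    """
    if not 0 <= root < num_gpus:
        raise ValueError("root must be a valid GPU index")
    others = [g for g in range(num_gpus) if g != root]
    if pattern == "one_to_all":
        # One GPU is the source of every copy.
        return [(root, dst) for dst in others]
    if pattern == "all_to_one":
        # One GPU is the destination of every copy.
        return [(src, root) for src in others]
    if pattern == "all_to_all":
        # Every ordered pair of distinct GPUs.
        return [(s, d) for s in range(num_gpus) for d in range(num_gpus) if s != d]
    raise ValueError(f"unknown pattern: {pattern}")
```

With 4 GPUs, one-to-all and all-to-one each cover 3 pairs, while all-to-all covers 12.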
-
- 07 Dec, 2023 2 commits
-
-
Ziyue Yang authored
**Description** Add MSCCL support for Nvidia GPU
-
Yuting Jiang authored
**Description** Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark
-
- 05 Dec, 2023 1 commit
-
-
Ziyue Yang authored
**Description** Revise NCCL/RCCL benchmarks to use graph mode and add latency metrics.
-
- 04 Dec, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Benchmarks: micro benchmark - Support cpu-gpu and gpu-cpu in ib-validation **Major Revision** - Support cpu-gpu and gpu-cpu in ib-validation **Minor Revision** - Support multiple message sizes, multiple directions, and multiple IB commands in ib-validation
-
- 27 Nov, 2023 1 commit
-
-
guoshzhao authored
**Description** Add AMD support in monitor. **Major Revision** - Add library pyrsmi to collect metrics. - Currently collects device_utilization, device_power, device_used_memory and device_total_memory.
-
- 22 Nov, 2023 4 commits
-
-
Yifan Xiong authored
Upgrade Docker image to CUDA 12.2 for H100: * upgrade base image to 23.10 * fix onnxruntime version in python3.10 * fix compilation errors
-
Yuting Jiang authored
**Description** Add initialization options for ROCm GEMM FLOPs.
-
Yuting Jiang authored
**Description** Add hipblaslt function benchmark and rebase cublaslt function benchmark.
-
guoshzhao authored
**Description** Generate baseline given results from multiple nodes. **Major Revision** - Add sub command `sb result generate-baseline` - Add UT and docs --------- Co-authored-by: 454314380 <454314380@qq.com> Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
-
- 20 Nov, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Add int8 support for cublaslt function.
-
- 14 Nov, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Remove the `cp` of the ptx file in the GPU burn test, since the command already runs inside the `self.args.bin_dir` directory. https://github.com/microsoft/superbenchmark/blob/d246bab430adeb461072918a551b2e2b68c9bce5/superbench/benchmarks/micro_benchmarks/micro_base.py#L183
-
- 07 Nov, 2023 1 commit
-
-
dependabot[bot] authored
Bumps [@babel/traverse](https://github.com/babel/babel/tree/HEAD/packages/babel-traverse) from 7.14.5 to 7.23.2. - [Release notes](https://github.com/babel/babel/releases) - [Changelog](https://github.com/babel/babel/blob/main/CHANGELOG.md) - [Commits](https://github.com/babel/babel/commits/v7.23.2/packages/babel-traverse) --- updated-dependencies: - dependency-name: "@babel/traverse" dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
-
- 05 Nov, 2023 1 commit
-
-
dependabot[bot] authored
Bumps [postcss](https://github.com/postcss/postcss) from 8.3.5 to 8.4.31. - [Release notes](https://github.com/postcss/postcss/releases) - [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md) - [Commits](postcss/postcss@8.3.5...8.4.31) --- updated-dependencies: - dependency-name: postcss dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
-
- 23 Oct, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Update mlc version to 3.10 in the CUDA and ROCm Dockerfiles to be consistent with the CUDA 12 Dockerfile. Co-authored-by: yukirora <yuting.jiang@microsoft.com>
-
- 22 Aug, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Add source code for evaluating NVDEC decoding performance. --------- Co-authored-by: yukirora <yuting.jiang@microsoft.com>
-
- 18 Aug, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Add source code for DirectXRenderPerf. --------- Co-authored-by: yukirora <yuting.jiang@microsoft.com>
-
- 08 Aug, 2023 1 commit
-
-
pnunna93 authored
This PR has the following changes: - Change torch.distributed.launch to torchrun. torch.distributed.launch is deprecated in the latest PyTorch, which recommends moving to torchrun - https://pytorch.org/docs/stable/elastic/run.html - Change the AMD GPU detection logic. The previous logic throws warnings when containers have only renderD in /dev/dri; this change resolves those warnings --------- Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
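One practical difference behind the launcher change: torchrun exports the local rank through the `LOCAL_RANK` environment variable, while torch.distributed.launch by default passes it as a `--local_rank` (or `--local-rank`) command-line argument. A minimal sketch of a script that tolerates both is shown below; the helper name `resolve_local_rank` is hypothetical and is not taken from this PR.

```python
import argparse
import os
from typing import List, Optional


def resolve_local_rank(argv: Optional[List[str]] = None) -> int:
    """Return the local rank from LOCAL_RANK if set, else from --local_rank.

    Hypothetical helper for illustration; defaults to 0 when neither is given.
    """
    parser = argparse.ArgumentParser(add_help=False)
    # Accept both spellings used by different torch.distributed.launch versions.
    parser.add_argument("--local_rank", "--local-rank", type=int, default=0)
    # parse_known_args ignores the script's other arguments.
    args, _unknown = parser.parse_known_args(argv)
    env_rank = os.environ.get("LOCAL_RANK")
    return int(env_rank) if env_rank is not None else args.local_rank
```

Preferring the environment variable keeps the script working under torchrun without requiring any launcher-specific flags.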
-
- 27 Jul, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Cherry-pick bug fixes from v0.9.0 to main. **Major Revision** - CI/CD: pipeline - clean more disk space to fix rocm building image pipeline (#555) - Benchmarks: bug fix - use absolute path for input file in DirectXEncodingLatency (#554) - CI/CD - add push win docker image on release branch in pipeline (#552) - Docs - Upgrade version and release note (#557)
-
- 24 Jul, 2023 1 commit
-
-
dependabot[bot] authored
Bumps [semver](https://github.com/npm/node-semver) from 5.7.1 to 5.7.2. - [Release notes](https://github.com/npm/node-semver/releases) - [Changelog](https://github.com/npm/node-semver/blob/v5.7.2/CHANGELOG.md) - [Commits](npm/node-semver@v5.7.1...v5.7.2) --- updated-dependencies: - dependency-name: semver dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
-
- 06 Jul, 2023 1 commit
-
-
Yuting Jiang authored
**Description** add python code for DirectXGPUEncodingLatency.
-
- 05 Jul, 2023 4 commits
-
-
Yuting Jiang authored
**Description** add python code for DirectXGPUCopy.
-
Yuting Jiang authored
**Description** add python code for DirectXGPUMemBw.
-
Yuting Jiang authored
**Description** add python code for DirectX core flops and init DirectX test pipeline. **Major Revision** - add python code for DirectX core flops - init DirectX test pipeline **Minor Revision** - add test for DirectX core flops
-
Yuting Jiang authored
**Description** Support DirectX test pipeline.
-
- 03 Jul, 2023 1 commit
-
-
Yuting Jiang authored
**Description** add AMF in third party and build AMF encoding latency test.
-
- 30 Jun, 2023 3 commits
-
-
Yuting Jiang authored
**Description** add auto selecting algorithm support for cudnn functions. **Major Revision** - add auto selecting algorithm support for cudnn functions in source code - add 'auto_algo' option in benchmark - add related test
-
Lei Qu authored
Modify link for Nvidia bandwidth test tool **Description** The previous link returns 404. **Minor Revision** Update the link to https://github.com/NVIDIA/cuda-samples/tree/master/Samples/1_Utilities/bandwidthTest
-
Yifan Xiong authored
* Update result parsing for newer tensorrt versions * Update arguments when loading torchvision models
-
- 29 Jun, 2023 4 commits
-
-
Yuting Jiang authored
**Description** Add source code of DirectxGPUCopy microbenchmark.
-
Yuting Jiang authored
**Description** Add source code of DirectxGPUMemBw microbenchmark. --------- Co-authored-by: v-junlinlv <v-junlinlv@microsoft.com>
-
Yuting Jiang authored
**Description** Add a runner for sys info to automatically collect system information on multiple nodes, and update related docs. **Major Revision** - Add a runner for sys info that checks the Docker status, runs `sb node info` in the Docker container on every node, and fetches the results from all nodes **Minor Revision** - Update the CLI and system-info docs - Update `sb node info` to save its output to output-dir/sys-info.json
-
Yuting Jiang authored
**Description** Add source code of DirectXGPUCoreFLOPs microbenchmark. --------- Co-authored-by: v-junlinlv <v-junlinlv@microsoft.com>
-
- 28 Jun, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Add dockerfile for win10 and building script for directx_benchmarks. **Major Revision** - Add docker file for win10 and required scripts to install the dependency - Add building script to build all directx vs benchmarks - Add call of building script in Makefile --------- Co-authored-by: yukirora <yuting.jiang@microsoft.com> Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
-
- 21 Jun, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Add support for DirectX GPU platform. **Major Revision** - Add DirectX platform to the benchmark registry - Add gpu_vendor identification for AMD and NVIDIA GPUs with the Windows driver
-
- 16 Jun, 2023 1 commit
-
-
guoshzhao authored
**Description** Update outdated reference links that return 404.
-