- 02 Apr, 2026 2 commits
- 01 Apr, 2026 1 commit
-
-
one authored
-
- 31 Mar, 2026 1 commit
-
-
one authored
-
- 20 Mar, 2026 1 commit
-
-
one authored
-
- 19 Mar, 2026 2 commits
-
-
one authored
  - Added Platform.DTK in the microbenchmark framework.
  - Introduced new DTK hipblaslt benchmark class and corresponding tests.
  - Updated Dockerfile to include hipblaslt-bench and its permissions.
  - Registered DTK benchmarks in the benchmark registry for various performance tests.
  - Enhanced GPU detection logic to recognize HYGON GPUs.

This update improves the benchmarking capabilities for DTK, ensuring compatibility and performance testing across platforms.
-
one authored
  - Update rocm_commom.cmake for CMake>=3.24
  - Prevent isolation build
  - Add BabelStream as a submodule
  - Update dockerignore
-
- 17 Mar, 2026 1 commit
-
-
one authored
-
- 11 Mar, 2026 1 commit
-
-
Hongtao Zhang authored
## Summary
  - Upgrade Intel Memory Latency Checker from v3.11 to v3.12 in rocm5.0.x.dockerfile
  - Aligns with other dockerfiles that already use v3.12

Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
-
- 28 Jan, 2026 1 commit
-
-
Hongtao Zhang authored
**Description**
  - When building the CUDA 11.1.1 image, pip (Python 3.8) cannot find a pre-built wheel for the latest wandb release (v0.23.1), so it attempts to build wandb from source. The build fails because the image does not have Go installed, which is required to build wandb from source.

**Solution**
  - For the CUDA 11.1.1 build, install the build tools (e.g., Go, Rust, and Cargo) needed to build wandb.

---------

Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
-
- 06 Nov, 2025 1 commit
-
-
WenqingLan1 authored
Updated mlc wget link in dockerfiles.

---------

Co-authored-by: guoshzhao <guzhao@microsoft.com>
-
- 01 Oct, 2025 1 commit
-
-
WenqingLan1 authored
  - Add support for cuda13.0.
  - Add cuda13.0.dockerfile.
  - Add cuda13.0 image building task to github pipeline.
  - Update GPU STREAM to work with cuda13.0.
  - Fix data type conversion perf bug in GPU STREAM.
  - Update nvbandwidth submodule to v0.8.
  - Update perftest submodule to 4bee61f80d9e268fc97eaf40be00409e91d3a19e (recent master).

---------

Co-authored-by: Ubuntu <dilipreddi@gmail.com>
Co-authored-by: guoshzhao <guzhao@microsoft.com>
-
- 25 Jun, 2025 1 commit
-
-
guoshzhao authored
**Description**
Add cuda 12.9 dockerfile and build in pipeline.

---------

Co-authored-by: Guoshuai Zhao <microsoft@microsoft.com>
Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
Co-authored-by: Hongtao Zhang <garyworkzht@gmail.com>
-
- 30 Apr, 2025 1 commit
-
-
Hongtao Zhang authored
  - Upgrade the OS of the github runner used by lint to the latest version.
  - Add symbolic link for clang-format to version 14.
  - Update importlib_metadata version, since the one inside nvcr.io/nvidia/pytorch:20.12-py3 is too old and failed the 11.1 build.

---------

Co-authored-by: hongtaozhang <hongtaozhang@microsoft.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
-
- 21 Mar, 2025 1 commit
-
-
pdr authored
**Description**
  - Updated docker for 12.8.
  - Use the latest cutlass release, 3.8, with ARCH 100 (Blackwell) support.
  - Add the latest nccl-test release with ARCH 100 (Blackwell) support.
  - Updated msccl to support building for sm_100.
  - No breaking changes, so backward compatible; tested with cuda 12.4.

---------

Co-authored-by: Hongtao Zhang <garyworkzht@gmail.com>
-
- 21 Nov, 2024 1 commit
-
-
Hongtao Zhang authored
**Description**
Add nvbandwidth build to repo.

---------

Co-authored-by: hongtaozhang <hongtaozhang@microsoft.com>
-
- 06 Nov, 2024 1 commit
-
-
pdr authored
Add support for arm64 build:
  - Updated dockerfile for arm64 build
  - extend cpu stream compilation for neoverse
  - handle onnxruntime-gpu installation
  - third party builds filtering based on arch
  - disable cuda decode perf build for non x86
-
- 10 Oct, 2024 1 commit
-
-
Yuting Jiang authored
**Description**
Cherry pick bug fixes from v0.11.0 to main.

**Major Revision**
  * #645
  * #648
  * #646
  * #647
  * #651
  * #652
  * #650

---------

Co-authored-by: hongtaozhang <hongtaozhang@microsoft.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
-
- 13 Aug, 2024 1 commit
-
-
Yang Wang authored
Add 10-hpcx.sh to /etc/profile.d. Update the Docker exec command to ensure a persistent HPCX environment.
-
- 28 Jul, 2024 1 commit
-
-
Yuting Jiang authored
**Description** Fix MSCCL build error in CUDA12.4 docker build pipeline due to OOM issue.
-
- 22 Apr, 2024 1 commit
-
-
Yuting Jiang authored
**Description**
Add CUDA 12.4 dockerfile.

**Major Revision**
  - upgrade nvidia docker to 23.04

**Minor Revision**
  - upgrade hpcx to 2.18
-
- 18 Apr, 2024 1 commit
-
-
Yuting Jiang authored
**Description** Upgrade mlc to v3.11.
-
- 21 Mar, 2024 1 commit
-
-
Yang Wang authored
**Description**
The CUDA 12.2 image reports an undefined symbol error due to an incomplete LD_LIBRARY_PATH.

### How to reproduce:
1. Deploy sb with the cuda12.2 image:
   ```
   sb deploy -f local.ini -i superbench/superbench:v0.10.0-cuda12.2
   ```
2. Enter the container:
   ```
   sudo docker exec -it sb-workspace bash
   ```
3. Execute `mpirun`:
   ```
   root@sb-container:~# mpirun
   mpirun: symbol lookup error: mpirun: undefined symbol: opal_libevent2022_event_base_loop
   ```

### Fix
  * Append hpcx_load to /etc/bash.bashrc so that LD_LIBRARY_PATH is updated each time a bash shell starts.

---------
-
- 08 Jan, 2024 1 commit
-
-
Yifan Xiong authored
**Description**
Cherry-pick bug fixes from v0.10.0 to main.

**Major Revisions**
  * Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590
  * Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591
  * Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592
  * Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595
  * Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596
  * CI/CD - Add ndv5 topo file #597
  * Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593
  * Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599
  * Dockerfile - Bug fix for rocm docker build and deploy #598
  * Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603
  * Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604
  * Monitor - U...
-
- 09 Dec, 2023 1 commit
-
-
Yuting Jiang authored
**Description**
Upgrade to rocm5.7 dockerfile.

---------

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
-
- 07 Dec, 2023 2 commits
-
-
Ziyue Yang authored
**Description** Add MSCCL support for Nvidia GPU
-
Yuting Jiang authored
**Description** Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark
-
- 22 Nov, 2023 3 commits
-
-
Yifan Xiong authored
Upgrade Docker image to CUDA 12.2 for H100:
  * upgrade base image to 23.10
  * fix onnxruntime version in python3.10
  * fix compilation errors
-
Yuting Jiang authored
**Description**
Add hipblaslt function benchmark and rebase cublaslt function benchmark.
-
guoshzhao authored
**Description**
Generate baseline given results from multiple nodes.

**Major Revision**
  - Add sub command `sb result generate-baseline`
  - Add UT and docs

---------

Co-authored-by: 454314380 <454314380@qq.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
-
- 23 Oct, 2023 1 commit
-
-
Yuting Jiang authored
**Description**
Update mlc version to 3.10 in the cuda and rocm dockerfiles, for consistency with the cuda12 dockerfile.

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
-
- 22 Aug, 2023 1 commit
-
-
Yuting Jiang authored
**Description**
Source code for evaluating NVDEC decoding performance.

---------

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
-
- 06 Jul, 2023 1 commit
-
-
Yuting Jiang authored
**Description** add python code for DirectXGPUEncodingLatency.
-
- 03 Jul, 2023 1 commit
-
-
Yuting Jiang authored
**Description** add AMF in third party and build AMF encoding latency test.
-
- 28 Jun, 2023 1 commit
-
-
Yuting Jiang authored
**Description**
Add dockerfile for win10 and building script for directx_benchmarks.

**Major Revision**
  - Add docker file for win10 and required scripts to install the dependency
  - Add building script to build all directx vs benchmarks
  - Add call of building script in Makefile

---------

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
-
- 14 Apr, 2023 1 commit
-
-
Yifan Xiong authored
**Description**
Cherry-pick bug fixes from v0.8.0 to main.

**Major Revisions**
  * Monitor - Fix the cgroup version checking logic (#502)
  * Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
  * Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505)
  * Analyzer: Fix bug in python3.8 due to pandas api change (#504)
  * Bug - Fix bug to get metric from cmd when error happens (#506)
  * Monitor - Collect realtime GPU power when benchmarking (#507)
  * Add num_workers argument in model benchmark (#511)
  * Remove unreachable condition when write host list (#512)
  * Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
  * Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
  * Docs - Upgrade version and release note (#508)

Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
-
- 21 Mar, 2023 1 commit
-
-
rafsalas19 authored
**Description**
  - Adding HPL benchmark

---------

Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
-
- 06 Mar, 2023 1 commit
-
-
Yifan Xiong authored
Pin setuptools version to [v65.7.0](https://setuptools.pypa.io/en/latest/history.html#v65-7-0) to avoid breaking changes since v66.0.0.
-
- 13 Feb, 2023 1 commit
-
-
rafsalas19 authored
**Description**
  - Added stream benchmark
  - Added stream unit test
  - Added stream example
  - Modified docker files to build stream

---------

Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
Co-authored-by: Yifan Xiong <xiongyf@yandex.com>
-
- 07 Feb, 2023 1 commit
-