- 19 Mar, 2026 1 commit
-
-
one authored
-
- 25 Jun, 2025 1 commit
-
-
guoshzhao authored
**Description** Add cuda 12.9 dockerfile and build in pipeline. --------- Co-authored-by:
Guoshuai Zhao <microsoft@microsoft.com> Co-authored-by:
Hongtao Zhang <hongtaozhang@microsoft.com> Co-authored-by:
Hongtao Zhang <garyworkzht@gmail.com>
-
- 18 Jun, 2025 1 commit
-
-
WenqingLan1 authored
Added GPU Stream benchmark - measures the GPU memory bandwidth and efficiency for double datatype through various memory operations including copy, scale, add, and triad. - added documentation for `gpu-stream` detailing its introduction, metrics, and descriptions. - added unit tests for `gpu-stream`. Example output is in `superbenchmark/tests/data/gpu_stream.log`.
-
- 22 Nov, 2024 1 commit
-
-
Hongtao Zhang authored
**Description** Add nvbandwidth benchmark. --------- Co-authored-by:hongtaozhang <hongtaozhang@microsoft.com>
-
- 08 Jan, 2024 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.10.0 to main. **Major Revisions** * Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590 * Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591 * Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592 * Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595 * Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596 * CI/CD - Add ndv5 topo file #597 * Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593 * Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599 * Dockerfile - Bug fix for rocm docker build and deploy #598 * Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603 * Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604 * Monitor - U...
-
- 10 Dec, 2023 1 commit
-
-
Ziyue Yang authored
**Description** Add distributed inference benchmark cpp implementation.
-
- 08 Dec, 2023 1 commit
-
-
Ziyue Yang authored
Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588) **Description** Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance, and fix performance bug in gpu_copy
-
- 04 Dec, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Benchmarks: micro benchmark - Support cpu-gpu and gpu-cpu in ib-validation **Major Revision** - Support cpu-gpu and gpu-cpu in ib-validation **Minor Revision** - support multi msg size, multi direction, multi ib commands in ib-validation
-
- 30 Jun, 2023 1 commit
-
-
Lei Qu authored
Modify link for Nvidia bandwidth test tool **Description** previous link is 404 **Minor Revision** update the link value to https://github.com/NVIDIA/cuda-samples/tree/master/Samples/1_Utilities/bandwidthTest
-
- 14 Apr, 2023 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.8.0 to main. **Major Revisions** * Monitor - Fix the cgroup version checking logic (#502) * Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503) * Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505) * Analyzer: Fix bug in python3.8 due to pandas api change (#504) * Bug - Fix bug to get metric from cmd when error happens (#506) * Monitor - Collect realtime GPU power when benchmarking (#507) * Add num_workers argument in model benchmark (#511) * Remove unreachable condition when write host list (#512) * Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513) * Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515) * Docs - Upgrade version and release note (#508) Co-authored-by:
guoshzhao <guzhao@microsoft.com> Co-authored-by:
Ziyue Yang <ziyyang@microsoft.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 24 Mar, 2023 1 commit
-
-
Ziyue Yang authored
**Description** This PR adds a micro-benchmark of distributed model inference workloads. **Major Revision** - Add a new micro-benchmark dist-inference. - Add corresponding example and unit tests. - Update configuration files to include this new micro-benchmark. - Update micro-benchmark README. --------- Co-authored-by:Peng Cheng <chengpeng5555@outlook.com>
-
- 22 Mar, 2023 1 commit
-
-
Yifan Xiong authored
Support batch and shape range with multiplication factors in cublaslt gemm benchmark.
-
- 21 Mar, 2023 1 commit
-
-
rafsalas19 authored
**Description** - Adding HPL benchmark --------- Co-authored-by:
Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net> Co-authored-by:
Peng Cheng <chengpeng5555@outlook.com>
-
- 13 Feb, 2023 1 commit
-
-
rafsalas19 authored
**Description** - Added stream benchmark - Added stream unit test - Added stream example - Modified docker files to build stream --------- Co-authored-by:
Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net> Co-authored-by:
Peng Cheng <chengpeng5555@outlook.com> Co-authored-by:
Yifan Xiong <xiongyf@yandex.com>
-
- 28 Jan, 2023 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.7.0 to main. **Major Revisions** * Benchmarks - Fix missing include in FP8 benchmark (#460) * Fix bug in TE BERT model (#461) * Doc - Update benchmark doc (#465) * Bug: Fix bug for incorrect datatype judgement in cublas-function source code (#464) * Support `sb deploy` without pulling image (#466) * Docs - Upgrade version and release note (#467) Co-authored-by:
Russell J. Hewett <russell.j.hewett@gmail.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 04 Jan, 2023 1 commit
-
-
Yang Wang authored
Support traffic patterns under the different devices in NCCL/RCCL test * change the metrics format if specified the pattern
-
- 03 Jan, 2023 1 commit
-
-
Yifan Xiong authored
Integrate cublaslt-gemm micro-benchmark #451.
-
- 26 Jul, 2022 1 commit
-
-
Jie Zhang authored
* Support topo-aware IB performance validation Add a new pattern `topo-aware`, so the user can run IB performance test based on VM's topology information. This way, the user can validate the IB performance across VM pairs with different distance as a quick test instead of pair-wise test. To run with topo-aware pattern, user needs to specify three required (and two optional) parameters in YAML config file: --pattern topo-aware --ibstat path to ibstat output --ibnetdiscover path to ibnetdiscover output --min_dist minimum distance of VM pairs (optional, default 2) --max_dist maximum distance of VM pairs (optional, default 6) The newly added topo_aware module then parses the topology information, builds a graph, and generates the VM pairs with the specified distance (# hops). The specified IB test will then be running across these generated VM pairs. Signed-off-by:
Jie Zhang <jessezhang1010@gmail.com> * Add description about topology aware ib traffic tests Signed-off-by:
Jie Zhang <jessezhang1010@gmail.com> * Add unit test to verify generated topology aware config file This commit adds unit test to verify the generated topology aware config file is correct. To do so, four new data files are added in order to invoke gen_topo_aware_config function to generate topology aware config file, then compares it with the expected config file. Signed-off-by:
Jie Zhang <jessezhang1010@gmail.com> * Fix lint issue on Azure pipeline Signed-off-by:
Jie Zhang <jessezhang1010@gmail.com>
-
- 29 Jun, 2022 1 commit
-
-
Yifan Xiong authored
Fix several issues in ib loopback benchmark: * use `--report_gbits` and divide by 8 to get GB/s, previous results are MiB/s / 1000 * use the ib_write_bw binary built in third_party instead of system path * update the metrics name so that different hca indices have same metric
-
- 24 Jun, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Support multiple IB/GPU devices run simultaneously in ib validation benchmark. **Major Revisions** - Revise ib_validation_performance.cc so that multiple processes per node could be used to launch multiple perftest commands simultaneously. For each node pair in the config, number of processes per node will run in parallel. - Revise ib_validation_performance.py to correct file paths and adjust parameters to specify different NICs/GPUs/NUMA nodes. - Fix env issues in Dockerfile for end-to-end test. - Update ib-traffic configuration examples in config files. - Update unit tests and docs accordingly. Closes #326.
-
- 16 Mar, 2022 1 commit
-
-
rafsalas19 authored
**Description** Modifications adding GPU-Burn to SuperBench. - added third party submodule - modified Makefile to make gpu-burn binary - added/modified microbenchmarks to add gpu-burn python scripts - modified default and azure_ndv4 configs to add gpu-burn
-
- 09 Feb, 2022 1 commit
-
-
Ziyue Yang authored
**Description** This commit remove NUMA binding for device-to-device tests because NUMA doesn't affect performance, and revise benchmark metrics accordingly.
-
- 21 Jan, 2022 1 commit
-
-
Ziyue Yang authored
**Description** This commit adds bidirectional tests in gpu_copy benchmark for both device-host transfer and device-device transfer, and revises related tests.
-
- 19 Jan, 2022 1 commit
-
-
guoshzhao authored
**Description** Add 50th, 90th, 95th, 99th, 99.9th latency metrics for ORT and pytorch inference benchmarks.
-
- 30 Dec, 2021 1 commit
-
-
Yifan Xiong authored
__Description__ Cherry-pick bug fixes from v0.4.0 to main. __Major Revisions__ * Bug - Fix issues for Ansible and benchmarks (#267) * Tests - Refine test cases for microbenchmark (#268) * Bug - Build openmpi with ucx support in rocm dockerfiles (#269) * Benchmarks: Fix Bug - Fix fio build issue (#272) * Docs - Unify metric and add doc for cublas and cudnn functions (#271) * Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274) * Bug - Fix bug of detecting if gpu_index is none (#275) * Bug - Fix bugs in data diagnosis (#273) * Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270) * Benchmarks: Configuration - Update inference and network benchmarks in configs (#276) * Docs - Upgrade version and release note (#277) Co-authored-by:Yuting Jiang <v-yutjiang@microsoft.com>
-
- 13 Dec, 2021 2 commits
-
-
Yifan Xiong authored
Add transformers for TensorRT inference.
-
Ziyue Yang authored
**Description** Add benchmark metrics for cpu-memory-bw-latency.
-
- 10 Dec, 2021 1 commit
-
-
guoshzhao authored
**Description** Add ONNXRuntime inference benchmark based on ORT python API. **Major Revision** - Add `ORTInferenceBenchmark` class to export pytorch model to onnx model and do inference - Add tests and example for `ort-inference` benchmark - Update the introduction docs.
-
- 09 Dec, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Unify metric names of benchmarks.
-
- 30 Nov, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Update ib validtion mirobenchmark metrics.
-
- 26 Nov, 2021 1 commit
-
-
Ziyue Yang authored
**Description** Update gpu-copy benchmark metrics.
-
- 12 Nov, 2021 1 commit
-
-
Yifan Xiong authored
__Description__ Add TensorRT inference benchmark for torchvision models. __Major Revision__ - Measure TensorRT inference performance.
-
- 10 Nov, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Update docs to add network benchmarks for tcp and gpcnet.
-
- 27 Oct, 2021 1 commit
-
-
Yifan Xiong authored
Add introduction and metrics for micro-benchmarks and model-benchmarks document.
-
- 12 Oct, 2021 1 commit
-
-
Yifan Xiong authored
__Major Revisions__ * Refine document structure for user tutorial. __Minor Revisions__ * Add AMD part in installation. * Change default config file to latest link.
-
- 30 Jun, 2021 1 commit
-
-
TobeyQin authored
* Add introduction and release documents. * Fix some typos in documents.
-
- 25 Jun, 2021 1 commit
-
-
Yifan Xiong authored
Update SuperBench documents.
-