- 20 Jun, 2025 1 commit
-
-
WenqingLan1 authored
**Description** Added support for Grace CPU neo2 architecture in CPU Stream. Now CPU Stream supports dual socket benchmarking. Example config for this arch support:

```yaml
cpu-stream:numa0:
  timeout: *default_timeout
  modes:
    - name: local
      parallel: no
  parameters:
    cpu_arch: neo2
    numa_mem_nodes: 0
    cores: 0 1 2 3 4 5 6 7 8
cpu-stream:numa1:
  timeout: *default_timeout
  modes:
    - name: local
      parallel: no
  parameters:
    cpu_arch: neo2
    numa_mem_nodes: 1
    cores: 64 65 66 67 68 69 70 71 72
cpu-stream:numa-spread:
  timeout: *default_timeout
  modes:
    - name: local
      parallel: no
  parameters:
    cpu_arch: neo2
    numa_mem_nodes: 0 1
    cores: 0 1 2 3 4 5 6 7 8 64 65 66 67 68 69 70 71 72
```

---------

Co-authored-by: dpower4 <dilipreddi@gmail.com>
-
- 18 Jun, 2025 1 commit
-
-
WenqingLan1 authored
Added GPU Stream benchmark, which measures the GPU memory bandwidth and efficiency for the double datatype through various memory operations including copy, scale, add, and triad.
- added documentation for `gpu-stream` detailing its introduction, metrics, and descriptions.
- added unit tests for `gpu-stream`. Example output is in `superbenchmark/tests/data/gpu_stream.log`.
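The four memory operations named above are the classic STREAM kernels. The following is a minimal CPU-side sketch in pure Python, not the `gpu-stream` implementation: the function name `stream_kernels`, the array size, and the scalar value are all assumptions made for illustration. It shows how each kernel touches two or three arrays and how a bandwidth figure is derived from bytes moved divided by elapsed time.

```python
import time


def stream_kernels(n=100_000, scalar=3.0):
    """Run copy, scale, add and triad on Python lists (hypothetical helper).

    Returns (bandwidth_gb_per_s_by_kernel, (a, b, c)).
    """
    a = [1.0] * n
    b = [2.0] * n
    c = [0.0] * n
    bytes_per_elem = 8  # size of a double
    results = {}

    def timed(name, arrays_touched, fn):
        start = time.perf_counter()
        out = fn()
        # Guard against a zero reading on coarse timers.
        elapsed = max(time.perf_counter() - start, 1e-9)
        results[name] = arrays_touched * n * bytes_per_elem / elapsed / 1e9
        return out

    c = timed("copy", 2, lambda: a[:])                                      # c = a
    b = timed("scale", 2, lambda: [scalar * x for x in c])                  # b = s * c
    c = timed("add", 3, lambda: [x + y for x, y in zip(a, b)])              # c = a + b
    a = timed("triad", 3, lambda: [x + scalar * y for x, y in zip(b, c)])   # a = b + s * c
    return results, (a, b, c)


if __name__ == "__main__":
    bandwidth, _ = stream_kernels()
    for kernel, gbps in bandwidth.items():
        print(f"{kernel}: {gbps:.3f} GB/s")
```

Copy and scale read one array and write one, so they move two arrays' worth of bytes; add and triad read two arrays and write one, so they move three. The numbers printed by this sketch reflect Python interpreter overhead and say nothing about GPU memory bandwidth.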
-
- 05 Feb, 2025 1 commit
-
-
Hongtao Zhang authored
**Description**
1. Fixed a bug where the nvbandwidth benchmark did not handle 'N/A' values in the nvbandwidth command output.
2. Replaced the input format of test cases with a list.
3. Added an nvbandwidth configuration example to the default config files.

---------

Co-authored-by: hongtaozhang <hongtaozhang@microsoft.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
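To illustrate the 'N/A' handling mentioned in item 1, here is a minimal sketch. It is not the code from this commit. It assumes the output contains whitespace-separated matrix rows made of a row index followed by bandwidth values or the literal `N/A`, and the helper name `parse_matrix_row` is hypothetical.

```python
def parse_matrix_row(line):
    """Parse one row such as '0  N/A  276.07' into (row_index, values).

    'N/A' cells become None instead of raising ValueError in float().
    The row layout is an assumption made for this sketch.
    """
    tokens = line.split()
    row_index = int(tokens[0])
    values = [None if token == "N/A" else float(token) for token in tokens[1:]]
    return row_index, values


if __name__ == "__main__":
    print(parse_matrix_row(" 0       N/A    276.07"))
```

Mapping 'N/A' to `None` keeps the column positions intact, so a caller can still tell which pair of devices each value belongs to.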
-
- 28 Nov, 2024 1 commit
-
-
pdr authored
Added llama benchmark: training and inference, in accordance with the existing pytorch model implementations such as gpt2 and lstm.
- added llama fp8 unit test for better code coverage, to reduce memory required
- updated transformers version >= 4.28.0 for LlamaConfig
- set tokenizers version <= 0.20.3 to avoid 0.20.4 version [issues](https://github.com/huggingface/tokenizers/issues/1691) with py3.8
- added llama2 to tensorrt
- llama2 tests not added to test_tensorrt_inference_performance.py due to large memory requirement for worker gpu. tests validated separately on gh200

---------

Co-authored-by: dpatlolla <dpatlolla@microsoft.com>
-
- 22 Nov, 2024 1 commit
-
-
Hongtao Zhang authored
**Description** Add nvbandwidth benchmark. --------- Co-authored-by:hongtaozhang <hongtaozhang@microsoft.com>
-
- 08 Dec, 2023 1 commit
-
-
Ziyue Yang authored
Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588) **Description** Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance, and fix performance bug in gpu_copy
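The three copy patterns can be pictured as sets of (source, destination) GPU pairs. The sketch below is not taken from gpu_copy_bw_performance: the function `copy_pairs`, its `root` parameter, and the choice to skip self-copies are assumptions made for illustration.

```python
def copy_pairs(pattern, num_gpus, root=0):
    """Return the (src, dst) GPU index pairs for a copy pattern (hypothetical helper)."""
    gpus = range(num_gpus)
    if pattern == "one-to-all":
        # A single source GPU copies to every other GPU.
        return [(root, dst) for dst in gpus if dst != root]
    if pattern == "all-to-one":
        # Every other GPU copies to a single destination GPU.
        return [(src, root) for src in gpus if src != root]
    if pattern == "all-to-all":
        # Every GPU copies to every other GPU.
        return [(src, dst) for src in gpus for dst in gpus if src != dst]
    raise ValueError(f"unknown pattern: {pattern}")


if __name__ == "__main__":
    for name in ("one-to-all", "all-to-one", "all-to-all"):
        print(name, copy_pairs(name, 4))
```

With N GPUs the first two patterns produce N - 1 pairs each and all-to-all produces N * (N - 1) pairs.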
-
- 24 Mar, 2023 1 commit
-
-
Ziyue Yang authored
**Description** This PR adds a micro-benchmark of distributed model inference workloads. **Major Revision** - Add a new micro-benchmark dist-inference. - Add corresponding example and unit tests. - Update configuration files to include this new micro-benchmark. - Update micro-benchmark README. --------- Co-authored-by:Peng Cheng <chengpeng5555@outlook.com>
-
- 21 Mar, 2023 1 commit
-
-
rafsalas19 authored
**Description**
- Adding HPL benchmark

---------

Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
-
- 13 Feb, 2023 1 commit
-
-
rafsalas19 authored
**Description**
- Added stream benchmark
- Added stream unit test
- Added stream example
- Modified docker files to build stream

---------

Co-authored-by: Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net>
Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
Co-authored-by: Yifan Xiong <xiongyf@yandex.com>
-
- 11 Apr, 2022 1 commit
-
-
guoshzhao authored
**Description** Integrate FAMBench into superbench based on docker implementation: https://github.com/facebookresearch/FAMBench The script to run all benchmarks is: https://github.com/facebookresearch/FAMBench/blob/main/benchmarks/run_all.sh
-
- 16 Mar, 2022 1 commit
-
-
rafsalas19 authored
**Description** Modifications adding GPU-Burn to SuperBench. - added third party submodule - modified Makefile to make gpu-burn binary - added/modified microbenchmarks to add gpu-burn python scripts - modified default and azure_ndv4 configs to add gpu-burn
-
- 08 Feb, 2022 1 commit
-
-
Ziyue Yang authored
This commit makes data checking in gpu_copy optional, because it takes too long when the message size is large.
-
- 21 Jan, 2022 1 commit
-
-
Ziyue Yang authored
**Description** This commit adds bidirectional tests in gpu_copy benchmark for both device-host transfer and device-device transfer, and revises related tests.
-
- 13 Dec, 2021 1 commit
-
-
Hossein Pourreza authored
**Description** Add mlc memory bandwidth and latency micro benchmark to Superbench. **Major Revision** - Add mlc benchmark with test and example files
-
- 10 Dec, 2021 1 commit
-
-
guoshzhao authored
**Description** Add ONNXRuntime inference benchmark based on ORT python API. **Major Revision** - Add `ORTInferenceBenchmark` class to export pytorch model to onnx model and do inference - Add tests and example for `ort-inference` benchmark - Update the introduction docs.
-
- 25 Nov, 2021 1 commit
-
-
Kaiyu Xie authored
**Description** Fix typo in description of kernel_launch_overhead.py
-
- 12 Nov, 2021 1 commit
-
-
Yifan Xiong authored
__Description__ Add TensorRT inference benchmark for torchvision models. __Major Revision__ - Measure TensorRT inference performance.
-
- 09 Nov, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add ib traffic validation distributed benchmark. **Major Revision** - Add ib traffic validation distributed benchmark, example and test
-
- 30 Oct, 2021 1 commit
-
-
Ziyue Yang authored
**Description** This commit does the following: 1) Adds CPU-initiated copy benchmark; 2) Adds dtod benchmark; 3) Supports scanning NUMA nodes and GPUs inside the benchmark program; 4) Changes the name of gpu-sm-copy to gpu-copy.
-
- 27 Oct, 2021 1 commit
-
-
guoshzhao authored
Add RocmOnnxModelBenchmark class to run benchmarks packaged in superbench/benchmark:rocm4.3.1-onnxruntime1.9.0
-
- 22 Oct, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add gpcnet microbenchmark **Major Revision** - add 2 microbenchmarks for gpcnet: gpc-network-test and gpc-network-load-test - add related test and example files
-
- 12 Oct, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add tcp connectivity validation microbenchmark which is to validate TCP connectivity between current node and several nodes in the hostfile. **Major Revision** - Add tcp connectivity validation microbenchmark and related test, example
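As a rough picture of what validating TCP connectivity against a list of hosts involves, here is a minimal sketch that uses only the Python standard library. It is not the benchmark's implementation: the helper names `can_connect` and `validate_hosts`, the port, and the timeout are assumptions made for illustration.

```python
import socket


def can_connect(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def validate_hosts(hosts, port, timeout=1.0):
    """Check each host (for example, the entries of a hostfile) and map it to a result."""
    return {host: can_connect(host, port, timeout) for host in hosts}


if __name__ == "__main__":
    print(validate_hosts(["127.0.0.1"], port=22))
```

A real validation run would typically repeat the attempt several times per host and record latency as well as success, which this sketch leaves out.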
-
- 30 Aug, 2021 2 commits
-
-
Ziyue Yang authored
**Description** This commit adds gpu_sm_copy benchmark and related tests.
-
Yuting Jiang authored
**Description** Add gemm flops microbenchmark for amd. **Major Revision** - Add gemm flops microbenchmark for amd. - Add related example and test file.
-
- 27 Aug, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add memory bus bandwidth performance microbenchmark for amd. **Major Revision** - Add memory bus bandwidth performance microbenchmark for amd. - Add related example and test file.
-
- 30 Jul, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add rccl bandwidth microbenchmark for rocm. **Major Revision** - Register rccl-bw benchmark.
-
- 26 Jul, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add NCCL performance microbenchmark. **Major Revision** - Add microbenchmark, example, test, config for NCCL
-
- 23 Jul, 2021 2 commits
-
-
Yuting Jiang authored
**Description** Add RDMA Loopback performance microbenchmark. **Major Revision** - Add microbenchmark, example, test, config for RDMA Loopback
-
Ziyue Yang authored
**Description** Add disk performance microbenchmark. **Major Revision** - Add microbenchmark, example, test, config for disk performance. **Minor Revision** - Fix bugs in executor unit test related to default enabled tests.
-
- 13 Jul, 2021 1 commit
-
-
Yuting Jiang authored
Add microbenchmark, example, test, and config for cuda memory performance. Add cuda-samples (tagged with cuda version) as a git submodule and update the related makefile.
-
- 02 Jun, 2021 1 commit
-
-
guoshzhao authored
* add cuda flops performance benchmark.
-
- 01 Jun, 2021 1 commit
-
-
Yuting Jiang authored
* add python related cudnn microbenchmark
-
- 31 May, 2021 1 commit
-
-
Yuting Jiang authored
* add benchmark for cublas test
* format
* revise error handling and test
* add interface to read json file, revise json file path and include .json in packaging
* add random_seed in arguments
* revise preprocess of cublas benchmark
* fix lint error and note error in source code
* update according to comments
* revise input arguments from json file to custom str and convert json file to built-in dict list
* restore package config
* fix lint issue
* update platform and comments
* rename files to match source code dir and fix comments error

Co-authored-by: root <root@sb-validation-000001.51z1chmys5fuzfqyo4niepozre.bx.internal.cloudapp.net>
-
- 19 May, 2021 2 commits
-
-
guoshzhao authored
* add kernel launch overhead benchmark.
-
Yuting Jiang authored
-
- 26 Apr, 2021 1 commit
-
-
guoshzhao authored
* revise example settings of cnn models.
-
- 20 Apr, 2021 2 commits
- 16 Apr, 2021 2 commits