- 11 Apr, 2022 1 commit
-
-
guoshzhao authored
**Description** Integrate FAMBench (https://github.com/facebookresearch/FAMBench) into SuperBench, based on its Docker implementation. The script that runs all benchmarks is: https://github.com/facebookresearch/FAMBench/blob/main/benchmarks/run_all.sh
-
- 01 Apr, 2022 1 commit
-
-
guoshzhao authored
**Description** Use the config option `log_raw_data` to control whether raw data is logged to a file. The default value is `no`. It can be set to `yes` for particular benchmarks, such as the NCCL/RCCL tests, to save their raw data to a file.
-
- 16 Mar, 2022 1 commit
-
-
rafsalas19 authored
**Description** Modifications adding GPU-Burn to SuperBench. - added third party submodule - modified Makefile to make gpu-burn binary - added/modified microbenchmarks to add gpu-burn python scripts - modified default and azure_ndv4 configs to add gpu-burn
-
- 08 Feb, 2022 1 commit
-
-
Ziyue Yang authored
This commit makes data checking in gpu_copy optional, because it takes too long when the message size is large.
-
- 07 Feb, 2022 1 commit
-
-
Ziyue Yang authored
**Description** This commit does the following to reduce result variance in the gpu_copy benchmark: 1) Add a warmup phase to avoid timing instability caused by first-time CUDA kernel launch overhead; 2) Use CUDA events for timing instead of CPU timestamps; 3) Make data checking optional, since enabling it is not recommended in performance tests; 4) Enlarge the message size in the performance benchmark.
-
- 29 Jan, 2022 1 commit
-
-
Yifan Xiong authored
Support T4 and A10 in GEMM benchmark.
-
- 28 Jan, 2022 1 commit
-
-
guoshzhao authored
**Major Revision** - Sync (do allreduce max) the E2E training results among all workers. - Avoid using ':0' in the metric name when only one rank has output.
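The two revisions above can be pictured with a minimal sketch. The helper names `reduce_max` and `metric_names` are hypothetical and not taken from SuperBench, and the reduction is only simulated on a dictionary here; the real change presumably uses a distributed allreduce.

```python
def reduce_max(per_rank_values):
    """Simulate an allreduce with the max operator over per-rank values."""
    return max(per_rank_values.values())


def metric_names(metric, ranks_with_output):
    """Name metrics per rank, dropping the rank suffix when only one rank reports."""
    if len(ranks_with_output) == 1:
        # A single reporting rank keeps the plain name, without ':0'.
        return [metric]
    return [f"{metric}:{rank}" for rank in sorted(ranks_with_output)]
```

For example, `metric_names("return_code", [0])` yields the plain name, while two reporting ranks yield one suffixed name per rank.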
-
- 21 Jan, 2022 1 commit
-
-
Ziyue Yang authored
**Description** This commit adds bidirectional tests in gpu_copy benchmark for both device-host transfer and device-device transfer, and revises related tests.
-
- 19 Jan, 2022 1 commit
-
-
guoshzhao authored
**Description** Add 50th, 90th, 95th, 99th, 99.9th latency metrics for ORT and pytorch inference benchmarks.
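As an illustration of what these metrics summarize, the following sketch computes the listed percentiles from a list of per-step latencies using linear interpolation. The function names and the `p50`-style keys are assumptions made for this example; the benchmark's actual metric names and interpolation method are not stated in the commit.

```python
import math


def percentile(samples, pct):
    """Return the pct-th percentile of samples using linear interpolation."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Fractional rank of the requested percentile in the sorted list.
    rank = (len(ordered) - 1) * pct / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    if lower == upper:
        return ordered[lower]
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def latency_percentiles(latencies_ms):
    """Summarize latencies at the percentiles named in the commit."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 90, 95, 99, 99.9)}
```

Tail percentiles such as the 99.9th only become meaningful with enough samples; with a short run they collapse toward the maximum.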
-
- 18 Jan, 2022 1 commit
-
-
Yifan Xiong authored
__Description__

Add command `sb benchmark list` and `sb benchmark list-parameters` to support listing all optional parameters for benchmarks.

<details>
<summary>Examples</summary>
<pre>
$ sb benchmark list -n [a-z]+-bw -o table
Result
--------
mem-bw
nccl-bw
rccl-bw
</pre>
<pre>
$ sb benchmark list-parameters -n mem-bw
=== mem-bw ===
optional arguments:
  --bin_dir str         Specify the directory of the benchmark binary.
  --duration int        The elapsed time of benchmark in seconds.
  --mem_type str [str ...]
                        Memory types to benchmark. E.g. htod dtoh dtod.
  --memory str          Memory argument for bandwidthtest. E.g. pinned unpinned.
  --run_count int       The run count of benchmark.
  --shmoo_mode          Enable shmoo mode for bandwidthtest.
default values:
{'bin_dir': None, 'duration': 0, 'mem_type': ['htod', 'dtoh'], 'memory': 'pinned', 'run_count': 1}
</pre>
</details>

__Major Revisions__
* Add `sb benchmark list` to list benchmarks matching given name.
* Add `sb benchmark list-parameters` to list parameters for benchmarks which match given name.

__Minor Revisions__
* Sort format help text for argparse.
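The name filter in the first example behaves like a regular-expression match over registered benchmark names. The sketch below assumes a full match and uses a hypothetical, hard-coded name list; whether the CLI uses a full match, a prefix match, or a search is not stated in the commit.

```python
import re

# Hypothetical registry for illustration; a real installation has more entries.
BENCHMARK_NAMES = ["gemm-flops", "mem-bw", "nccl-bw", "rccl-bw", "kernel-launch"]


def list_benchmarks(pattern, names=BENCHMARK_NAMES):
    """Return the sorted benchmark names that fully match the given regex."""
    regex = re.compile(pattern)
    return sorted(name for name in names if regex.fullmatch(name))
```

With the pattern `[a-z]+-bw` from the example, this selects the three bandwidth benchmarks shown in the table output.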
-
- 30 Dec, 2021 1 commit
-
-
Yifan Xiong authored
__Description__ Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__
* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
-
- 13 Dec, 2021 4 commits
-
-
Yifan Xiong authored
Add transformers for TensorRT inference.
-
Ziyue Yang authored
**Description** Add benchmark metrics for cpu-memory-bw-latency.
-
Ziyue Yang authored
**Description** Benchmarks: Fix Comment - Correct benchmark name in test_gpu_copy_bw_performance.py.
-
Hossein Pourreza authored
**Description** Add mlc memory bandwidth and latency microbenchmark to SuperBench. **Major Revision** - Add mlc benchmark with test and example files
-
- 10 Dec, 2021 2 commits
-
-
guoshzhao authored
**Description** Add ONNXRuntime inference benchmark based on ORT python API. **Major Revision** - Add `ORTInferenceBenchmark` class to export pytorch model to onnx model and do inference - Add tests and example for `ort-inference` benchmark - Update the introduction docs.
-
guoshzhao authored
**Description** Set the `reduce_op` type for metric `return_code` as `None`.
-
- 09 Dec, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Unify metric names of benchmarks.
-
- 07 Dec, 2021 1 commit
-
-
guoshzhao authored
**Description** Add return_code metric into result and revise unit tests.
-
- 02 Dec, 2021 1 commit
-
-
guoshzhao authored
**Description** If `ignore_invalid` is True and 'required' arguments are not set when registering the benchmark, skip the argument check; the arguments should then be provided by the user in the config.
-
- 15 Nov, 2021 1 commit
-
-
guoshzhao authored
**Description** Rename `nvidia_helper` utility as `device_manager` module and support more functions:
```
device_manager.get_device_count()
device_manager.get_device_utilization(idx)
device_manager.get_device_temperature(idx)
device_manager.get_device_power_limit(idx)
device_manager.get_device_memory(idx)
device_manager.get_device_row_remapped_info(idx)
device_manager.get_device_ecc_error(idx)
```
-
- 12 Nov, 2021 1 commit
-
-
Yifan Xiong authored
__Description__ Add TensorRT inference benchmark for torchvision models. __Major Revision__ - Measure TensorRT inference performance.
-
- 09 Nov, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add ib traffic validation distributed benchmark. **Major Revision** - Add ib traffic validation distributed benchmark, example and test
-
- 30 Oct, 2021 1 commit
-
-
Ziyue Yang authored
**Description** This commit does the following: 1) Adds CPU-initiated copy benchmark; 2) Adds dtod benchmark; 3) Supports scanning NUMA nodes and GPUs inside the benchmark program; 4) Changes the name of gpu-sm-copy to gpu-copy.
-
- 27 Oct, 2021 1 commit
-
-
guoshzhao authored
Add RocmOnnxModelBenchmark class to run benchmarks packaged in superbench/benchmark:rocm4.3.1-onnxruntime1.9.0
-
- 22 Oct, 2021 2 commits
-
-
Yuting Jiang authored
**Description** Add gpcnet microbenchmark. **Major Revision** - add 2 microbenchmarks for gpcnet: gpc-network-test and gpc-network-load-test - add related test and example file
-
guoshzhao authored
**Description** Add CudaDockerBenchmark and RocmDockerBenchmark to support AMD and CUDA platforms for DockerBenchmark.
-
- 21 Oct, 2021 1 commit
-
-
guoshzhao authored
**Description** Rename all occurrences of the term `onnx` to `onnxruntime`.
-
- 12 Oct, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add tcp connectivity validation microbenchmark which is to validate TCP connectivity between current node and several nodes in the hostfile. **Major Revision** - Add tcp connectivity validation microbenchmark and related test, example
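The idea of validating TCP connectivity to a set of nodes can be sketched with the standard library as follows. This is an illustration under assumptions, not the benchmark's implementation: the function names, the single shared port, and the timeout default are hypothetical, and the host list is passed in directly rather than parsed from a hostfile.

```python
import socket


def tcp_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def validate_hosts(hosts, port, timeout=1.0):
    """Map each host name to its reachability result."""
    return {host: tcp_reachable(host, port, timeout) for host in hosts}
```

A caller would pass the node names from its hostfile and treat any `False` entry as a connectivity failure for that node.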
-
- 27 Sep, 2021 1 commit
-
-
guoshzhao authored
**Description** Add option `force_fp32` to use fp32 instead of tf32; it only takes effect on Ampere or newer GPUs.
-
- 03 Sep, 2021 1 commit
-
-
Yuting Jiang authored
Benchmarks: Code Revision - Revise arguments of nccl/rccl to support mpi mode and rename metric (#189) **Description** Revise arguments of nccl/rccl to support mpi mode (mpi mode could not run in nccl/rccl because multiple operators ran in sequence without a barrier) and rename metrics. **Major Revision** - revise argument operators to be a single one **Minor Revision** - rename metric to remove benchmark name info - change argument ngpus default value to be 1
-
- 01 Sep, 2021 1 commit
-
-
guoshzhao authored
**Description** Revise the DockerBenchmark base to support image pull, image rm etc. **Major Revision** - image pull in _preprocess() - image clean in _postprocess() - execute customized commands in _benchmark() - add unit tests
-
- 31 Aug, 2021 1 commit
-
-
Ziyue Yang authored
Benchmarks: Code Revision - Revise metric name generation and default config for disk performance benchmark (#175) **Description** This commit revises disk performance benchmark, including: 1) Add missing benchmark name in default config; 2) Avoid using reserved character ':' in metric name.
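One way to keep a reserved character out of generated metric names is to replace it in every name component before joining. The sketch below is hypothetical: `make_metric_name`, the underscore separator, and the example components are assumptions for illustration, and the commit does not say which replacement the disk benchmark actually uses.

```python
RESERVED_CHAR = ":"


def make_metric_name(*parts, separator="_"):
    """Join name components, replacing the reserved ':' inside each component."""
    cleaned = [str(part).replace(RESERVED_CHAR, separator) for part in parts]
    return separator.join(cleaned)
```

For instance, `make_metric_name("disk0", "rand:read", "iops")` produces a name with no ':' in it.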
-
- 30 Aug, 2021 3 commits
-
-
Ziyue Yang authored
**Description** This commit adds gpu_sm_copy benchmark and related tests.
-
Yuting Jiang authored
**Description** Add gemm flops microbenchmark for amd. **Major Revision** - Add gemm flops microbenchmark for amd. - Add related example and test file.
-
Yuting Jiang authored
**Description** Extract base class for gemm flops microbenchmark. **Major Revision** - extract base class for gemm flops microbenchmark and add related test. - revise gemm_flops_performance for cuda.
-
- 27 Aug, 2021 2 commits
-
-
guoshzhao authored
**Description** Rename `kernel_launch_overhead_event` to `event_overhead`, `kernel_launch_overhead_wall` to `wall_overhead`.
-
Yuting Jiang authored
**Description** Add memory bus bandwidth performance microbenchmark for amd. **Major Revision** - Add memory bus bandwidth performance microbenchmark for amd. - Add related example and test file.
-
- 25 Aug, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Extract base class for memory bandwidth microbenchmark. **Major Revision** - revise and optimize cuda_memory_bandwidth_performance - extract base class for memory bandwidth microbenchmark - add test for base class
-
- 23 Aug, 2021 1 commit
-
-
Yuting Jiang authored
**Description** fix typo in test_nccl_bw_performance.py. **Major Revision** - fix typo in test_nccl_bw_performance.py.
-