- 18 Nov, 2021 1 commit
-
-
guoshzhao authored
**Description** Add the initial version of Monitor. **Major Revision** - Add `Monitor` class to launch a background process for monitoring. - Add `MonitorRecord` class to save the data from a single capture.
-
- 15 Nov, 2021 1 commit
-
-
guoshzhao authored
**Description** Rename `nvidia_helper` utility as `device_manager` module and support more functions:

```
device_manager.get_device_count()
device_manager.get_device_utilization(idx)
device_manager.get_device_temperature(idx)
device_manager.get_device_power_limit(idx)
device_manager.get_device_memory(idx)
device_manager.get_device_row_remapped_info(idx)
device_manager.get_device_ecc_error(idx)
```
-
- 12 Nov, 2021 1 commit
-
-
Yifan Xiong authored
__Description__ Add TensorRT inference benchmark for torchvision models. __Major Revision__ - Measure TensorRT inference performance.
-
- 09 Nov, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add ib traffic validation distributed benchmark. **Major Revision** - Add ib traffic validation distributed benchmark, example and test
-
- 30 Oct, 2021 1 commit
-
-
Ziyue Yang authored
**Description** This commit does the following: 1) Adds CPU-initiated copy benchmark; 2) Adds dtod benchmark; 3) Supports scanning NUMA nodes and GPUs inside the benchmark program; 4) Changes the name of gpu-sm-copy to gpu-copy.
-
- 27 Oct, 2021 1 commit
-
-
guoshzhao authored
Add RocmOnnxModelBenchmark class to run benchmarks packaged in superbench/benchmark:rocm4.3.1-onnxruntime1.9.0
-
- 22 Oct, 2021 2 commits
-
-
Yuting Jiang authored
**Description** Add gpcnet microbenchmark. **Major Revision** - add 2 microbenchmarks for gpcnet: gpc-network-test and gpc-network-load-test - add related test and example file
-
guoshzhao authored
**Description** Add CudaDockerBenchmark and RocmDockerBenchmark to support AMD and CUDA platforms for DockerBenchmark.
-
- 21 Oct, 2021 2 commits
-
-
guoshzhao authored
**Description** Revise all occurrences of the term `onnx` to `onnxruntime`.
-
Yuting Jiang authored
**Description** Add IB validation tool source code. The IB validation tool flexibly validates IB traffic of different patterns across multiple nodes. **Major Revision** - Add ib validation tool source code - Add cmake file to build the source code
-
- 12 Oct, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add tcp connectivity validation microbenchmark, which validates TCP connectivity between the current node and several nodes in the hostfile. **Major Revision** - Add tcp connectivity validation microbenchmark and related test, example
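
The kind of check described above can be sketched with the Python standard library. This is a minimal illustration only: the helper names `check_tcp` and `validate_hosts` are hypothetical, and the commit does not show how the microbenchmark actually probes the hosts or which port it uses.

```python
import socket


def check_tcp(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration; not the benchmark's real code.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def validate_hosts(hosts, port, timeout=1.0):
    """Check every host (for example, the lines of a hostfile) on one port."""
    return {host: check_tcp(host, port, timeout) for host in hosts}
```

A caller would typically read the host names from the hostfile, pass them to `validate_hosts`, and report the hosts whose value is `False`.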
-
- 28 Sep, 2021 1 commit
-
-
guoshzhao authored
**Description** Fix typo when setting the force_fp32 option.
-
- 27 Sep, 2021 1 commit
-
-
guoshzhao authored
**Description** Add option `force_fp32` to use fp32 instead of tf32; this only takes effect on Ampere or newer GPUs.
-
- 26 Sep, 2021 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.3.0 to main.

**Major Revisions**
* Docs - Upgrade version and release note (#209)
* Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
* Benchmarks: Update - Update benchmarks in configuration file (#208)
* CI/CD - Update GitHub Action VM (#211)
* Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
* CI/CD - Fix bug in build image for push event (#205)
* Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
* Tool: Fix bug - Fix function naming issue in system info (#200)
* CI/CD - Push images in GitHub Action (#202)
* Bug - Fix torch.distributed command for single node (#201)
* CLI - Integrate system info for node (#199)
* Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
* CI/CD - Add ROCm image build in GitHub Actions (#194)
* Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
* Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
* Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
* Bug - Revise 'docker run' in sb deploy (#195)
* Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190)

Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
-
- 06 Sep, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add script to generate system config info. **Major Revision** - Add script to generate system config info into the dict in superbench/tools.
-
- 03 Sep, 2021 1 commit
-
-
Yuting Jiang authored
Benchmarks: Code Revision - Revise arguments of nccl/rccl to support mpi mode and rename metric (#189) **Description** Revise arguments of nccl/rccl to support mpi mode (mpi mode cannot run nccl/rccl when multiple operators run in sequence without a barrier) and rename metric. **Major Revision** - revise argument operators to be a single one **Minor Revision** - rename metric to remove benchmark name info - change argument ngpus default value to be 1
-
- 02 Sep, 2021 3 commits
-
-
Ziyue Yang authored
**Description** This commit fixes the error of the missing key 'percentile' when parsing the FIO result.
-
Yuting Jiang authored
Benchmarks: Add Configuration - Add microbenchmark in the validation config file for HPE (AMD MI100) (#176) **Description** Add microbenchmark in the validation config file for AMD MI100. **Major Revision** - add rccl-bw, mem-bw, ib-loopback, gemm-flops, kernel-launch config for mi100
-
Yifan Xiong authored
__Description__ Fix inventory bug in ansible_runner when the host list is provided with multiple hosts. This ought to be handled by the ansible_runner lib; work around it by using the `--inventory` arg on the command line.
-
- 01 Sep, 2021 2 commits
-
-
guoshzhao authored
**Description** Revise the DockerBenchmark base to support image pull, image rm etc. **Major Revision** - image pull in _preprocess() - image clean in _postprocess() - execute customized commands in _benchmark() - add unit tests
-
guoshzhao authored
**Description** Setup docker environment in docker container. **Major Revision** - Install docker client for cuda and rocm images. - Mount /var/run/docker.sock from host
-
- 31 Aug, 2021 2 commits
-
-
Ziyue Yang authored
Benchmarks: Code Revision - Revise metric name generation and default config for disk performance benchmark (#175) **Description** This commit revises disk performance benchmark, including: 1) Add missing benchmark name in default config; 2) Avoid using reserved character ':' in metric name.
-
guoshzhao authored
**Description** Package the frequently used subprocess invocation into a function.
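
A wrapper of this kind might look like the sketch below. The name `run_command` and its parameters are assumptions made for illustration; the commit message does not show the function that was actually added.

```python
import shlex
import subprocess


def run_command(command, timeout=None):
    """Run a command line and return the CompletedProcess with text output.

    Hypothetical helper for illustration; not the project's real signature.
    stderr is merged into stdout so callers read a single stream.
    """
    return subprocess.run(
        shlex.split(command),    # split the string without invoking a shell
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,    # decode output as text
        timeout=timeout,
        check=False,    # let the caller inspect returncode
    )
```

Callers can then write `output = run_command('nvidia-smi -L')` and check `output.returncode` and `output.stdout` instead of repeating the `subprocess.run` arguments at each call site.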
-
- 30 Aug, 2021 4 commits
-
-
Ziyue Yang authored
**Description** This commit adds gpu_sm_copy benchmark and related tests.
-
Yuting Jiang authored
**Description** Remove IB device port info in command to fix bug of IB loopback. **Major Revision** - Remove IB device port info in command to fix bug of IB loopback
-
Yuting Jiang authored
**Description** Add gemm flops microbenchmark for amd. **Major Revision** - Add gemm flops microbenchmark for amd. - Add related example and test file.
-
Yuting Jiang authored
**Description** Extract base class for gemm flops microbenchmark. **Major Revision** - extract base class for gemm flops microbenchmark and add related test. - revise gemm_flops_performance for cuda.
-
- 27 Aug, 2021 4 commits
-
-
guoshzhao authored
**Description** Rename `kernel_launch_overhead_event` to `event_overhead`, `kernel_launch_overhead_wall` to `wall_overhead`.
-
Yuting Jiang authored
**Description** Add memory bus bandwidth performance microbenchmark for amd. **Major Revision** - Add memory bus bandwidth performance microbenchmark for amd. - Add related example and test file.
-
Ziyue Yang authored
**Description** This commit adds the benchmark program for GPU-initiated data transfer benchmark.
-
Yuting Jiang authored
Benchmarks: Fix Bug - fix bug of microbenchmark building cublas and cudnn for amd in build pipeline (#166) **Description** Fix bug of microbenchmark building cublas and cudnn for amd. **Major Revision** - remove cuda LANGUAGES in project() - check CUDAToolkit quiet and then build if found
-
- 26 Aug, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Rename computation_communication_overlap microbenchmark metric. **Major Revision** - remove rank info in metric. - simplify and rename metric.
-
- 25 Aug, 2021 1 commit
-
-
Yuting Jiang authored
**Description** extract base class for memory bandwidth microbenchmark. **Major Revision** - revise and optimize cuda_memory_bandwidth_performance - extract base class for memory bandwidth microbenchmark - add test for base class
-
- 22 Aug, 2021 1 commit
-
-
Ziyue Yang authored
**Description** This commit adds readwrite I/O pattern for FIO benchmark. Read/write ratio is fixed at 4:1.
-
- 20 Aug, 2021 1 commit
-
-
guoshzhao authored
**Description** Generate the summarized output files from all nodes. For each metric, do the reduce operation according to the `reduce_op`.

**Major Revision**

- Generate the summarized json file per node:

  For microbenchmark, the format is `{benchmark_name}/[{run_count}/]{metric_name}[:rank]`

  For modelbenchmark, the format is `{benchmark_name}/{sub_benchmark_name}/[{run_count}/]{metric_name}`

  `[]` means optional.

```
{
  "kernel-launch/overhead_event:0": 0.00583,
  "kernel-launch/overhead_event:1": 0.00545,
  "kernel-launch/overhead_event:2": 0.00581,
  "kernel-launch/overhead_event:3": 0.00572,
  "kernel-launch/overhead_event:4": 0.00559,
  "kernel-launch/overhead_event:5": 0.00591,
  "kernel-launch/overhead_event:6": 0.00562,
  "kernel-launch/overhead_event:7": 0.00586,
  "resnet_models/pytorch-resnet50/steptime-train-float32": 544.0827468410134,
  "resnet_models/pytorch-resnet50/throughput-train-float32": 353.7607016465773,
  "resnet_models/pytorch-resnet50/steptime-train-float16": 425.40482617914677,
  "resnet_models/pytorch-resnet50/throughput-train-float16": 454.0142363793973,
  "pytorch-sharding-matmul/0/allreduce": 10.561786651611328,
  "pytorch-sharding-matmul/1/allreduce": 10.561786651611328,
  "pytorch-sharding-matmul/0/allgather": 10.088025093078613,
  "pytorch-sharding-matmul/1/allgather": 10.088025093078613
}
```

- Generate the summarized jsonl file for all nodes, each line is the result from one node in json format.
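
The two key formats described in this commit can be sketched as small helpers. The function names and parameters here are hypothetical and only mirror the format strings quoted in the message; they are not taken from the project's code.

```python
def micro_metric_name(benchmark_name, metric_name, run_count=None, rank=None):
    """Build `{benchmark_name}/[{run_count}/]{metric_name}[:rank]`."""
    parts = [benchmark_name]
    if run_count is not None:
        parts.append(str(run_count))    # optional run index segment
    parts.append(metric_name)
    name = '/'.join(parts)
    if rank is not None:
        name += ':{}'.format(rank)    # optional rank suffix
    return name


def model_metric_name(benchmark_name, sub_benchmark_name, metric_name, run_count=None):
    """Build `{benchmark_name}/{sub_benchmark_name}/[{run_count}/]{metric_name}`."""
    parts = [benchmark_name, sub_benchmark_name]
    if run_count is not None:
        parts.append(str(run_count))
    parts.append(metric_name)
    return '/'.join(parts)
```

With these helpers, `micro_metric_name('kernel-launch', 'overhead_event', rank=0)` yields the same key shape as the first entry in the sample output of this commit.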
-
- 19 Aug, 2021 1 commit
-
-
Yifan Xiong authored
Support mpi mode in runner:
* concatenate mpirun command
* support mca and env config
* prepare hostfile and update Ansible host pattern

Co-authored-by: Peng Cheng <chengpeng5555@outlook.com>
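
Assembling such a command from mca and env settings could look roughly like this. The function `build_mpirun_command` and its parameters are hypothetical; `--hostfile`, `-np`, `--mca` and `-x` are standard Open MPI `mpirun` options, but the exact command the runner emits is not shown in the commit.

```python
import shlex


def build_mpirun_command(hostfile, proc_num, program, mca=None, env=None):
    """Concatenate an Open MPI mpirun command line (illustrative sketch only).

    mca: dict of MCA parameters, each emitted as `--mca key value`.
    env: dict of environment variables, each exported as `-x KEY=VALUE`.
    program: list with the program and its arguments.
    """
    args = ['mpirun', '--hostfile', hostfile, '-np', str(proc_num)]
    for key, value in (mca or {}).items():
        args += ['--mca', key, str(value)]
    for key, value in (env or {}).items():
        args += ['-x', '{}={}'.format(key, value)]
    args += list(program)
    # Quote each argument so the result is safe to pass to a shell.
    return ' '.join(shlex.quote(arg) for arg in args)
```

Building the argument list first and quoting at the end keeps the mca and env handling independent of how the final string is executed.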
-
- 16 Aug, 2021 1 commit
-
-
guoshzhao authored
**Description** Change the field name `reduce` to `reduce_op`.
-
- 06 Aug, 2021 2 commits
- 05 Aug, 2021 1 commit
-
-
guoshzhao authored
**Description** Add reduce function support for output summary. **Major Revision** - Add reducer class to maintain all reduce functions. - Save reduce type of each metric into `BenchmarkResult` - Fix UT.
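
One way to picture a reducer class that maintains all reduce functions is a small name-to-function registry. This is a sketch under assumptions: the class layout, the method names, and the set of reduce types (`min`, `max`, `sum`, `avg`) are illustrative and are not confirmed by the commit message.

```python
from statistics import mean


class Reducer:
    """Registry mapping a reduce type name to a function over a list of values."""

    functions = {}

    @classmethod
    def add_reduce_func(cls, name):
        """Return a decorator that registers a function under the given name."""
        def decorator(func):
            cls.functions[name] = func
            return func
        return decorator

    @classmethod
    def get_reduce_func(cls, name):
        """Look up a registered function, failing clearly on unknown names."""
        if name not in cls.functions:
            raise ValueError('unknown reduce type: {}'.format(name))
        return cls.functions[name]


# Register a few common reductions (assumed names, for illustration).
for _name, _func in (('min', min), ('max', max), ('sum', sum), ('avg', mean)):
    Reducer.add_reduce_func(_name)(_func)


def reduce_metric(values, reduce_type):
    """Reduce the values collected for one metric using its stored reduce type."""
    return Reducer.get_reduce_func(reduce_type)(values)
```

Storing only the reduce type name alongside each metric keeps the result serializable, while the registry resolves the name to a function when the summary is generated.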
-