- 03 Sep, 2021 1 commit
-
-
Yuting Jiang authored
Benchmarks: Code Revision - Revise arguments of nccl/rccl to support mpi mode and rename metric (#189) **Description** Revise the arguments of nccl/rccl to support mpi mode (mpi mode could not run in nccl/rccl because multiple operators ran in sequence without a barrier) and rename the metric. **Major Revision** - Revise the `operators` argument to accept a single operator. **Minor Revision** - Rename the metric to remove benchmark name info. - Change the default value of the `ngpus` argument to 1.
-
- 02 Sep, 2021 1 commit
-
-
Ziyue Yang authored
**Description** This commit fixes the missing-key error for 'percentile' when parsing the FIO result.
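The kind of defensive lookup this fix implies can be sketched as follows. This is only an illustration, assuming a latency section shaped like fio's JSON output in which the `percentile` key may be absent; the function name `extract_percentiles` and the sample dictionaries are hypothetical and not taken from the repository.

```python
def extract_percentiles(latency_stats):
    """Return {percentile: value}, or an empty dict when 'percentile' is missing."""
    # dict.get avoids the KeyError raised by latency_stats['percentile'].
    percentiles = latency_stats.get('percentile', {})
    return {float(key): value for key, value in percentiles.items()}


# Hypothetical sample data: one section with percentiles, one without.
with_key = {'min': 10, 'max': 90, 'percentile': {'50.000000': 40, '99.000000': 85}}
without_key = {'min': 0, 'max': 0}

print(extract_percentiles(with_key))
print(extract_percentiles(without_key))
```

Returning an empty mapping lets the caller skip the percentile metrics for that section instead of failing the whole parse.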
-
- 01 Sep, 2021 1 commit
-
-
guoshzhao authored
**Description** Revise the DockerBenchmark base to support image pull, image rm etc. **Major Revision** - image pull in _preprocess() - image clean in _postprocess() - execute customized commands in _benchmark() - add unit tests
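The three-stage lifecycle listed above can be sketched roughly as below. Only the method names `_preprocess()`, `_benchmark()`, and `_postprocess()` come from the commit message; the class name `DockerBenchmarkSketch`, its constructor, the injected `runner` callable, and the image name are assumptions made for illustration. `docker pull` and `docker rmi` are standard Docker CLI commands.

```python
class DockerBenchmarkSketch:
    """Hypothetical stand-in for a Docker-based benchmark base class."""

    def __init__(self, image_uri, commands, runner):
        self._image_uri = image_uri
        self._commands = list(commands)
        # runner takes a command string and returns its exit code.
        self._runner = runner

    def _preprocess(self):
        # Pull the image before running anything.
        return self._runner('docker pull {}'.format(self._image_uri)) == 0

    def _benchmark(self):
        # Execute the customized commands in order, stopping at the first failure.
        return all(self._runner(command) == 0 for command in self._commands)

    def _postprocess(self):
        # Remove the image to clean up the node.
        return self._runner('docker rmi {}'.format(self._image_uri)) == 0

    def run(self):
        ok = self._preprocess() and self._benchmark()
        # Cleanup always runs, even when an earlier stage failed.
        return self._postprocess() and ok


recorded = []


def fake_runner(command):
    """Record the command instead of executing it, and report success."""
    recorded.append(command)
    return 0


bench = DockerBenchmarkSketch(
    'example/image:tag',
    ['docker run --rm example/image:tag echo ok'],
    fake_runner,
)
succeeded = bench.run()
print(succeeded, recorded)
```

Injecting the runner keeps the sketch testable without a Docker daemon, which is also one plausible way to write the unit tests the commit mentions.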
-
- 31 Aug, 2021 2 commits
-
-
Ziyue Yang authored
Benchmarks: Code Revision - Revise metric name generation and default config for disk performance benchmark (#175) **Description** This commit revises disk performance benchmark, including: 1) Add missing benchmark name in default config; 2) Avoid using reserved character ':' in metric name.
-
guoshzhao authored
**Description** Wrap the frequently used subprocess invocation into a function.
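A helper of this kind might look like the sketch below. The name `run_command` and its parameters are assumptions for illustration, not the function added by the commit; only the standard library `subprocess.run` call is relied on.

```python
import subprocess


def run_command(command, timeout=None):
    """Run a shell command and return the CompletedProcess with merged output."""
    return subprocess.run(
        command,
        shell=True,                   # the command is given as a single string
        stdout=subprocess.PIPE,       # capture stdout
        stderr=subprocess.STDOUT,     # merge stderr into stdout
        universal_newlines=True,      # decode output as text
        timeout=timeout,
        check=False,                  # callers inspect returncode themselves
    )


result = run_command('echo hello')
print(result.returncode, result.stdout.strip())
```

Centralizing the call means every benchmark captures output and handles exit codes the same way.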
-
- 30 Aug, 2021 4 commits
-
-
Ziyue Yang authored
**Description** This commit adds gpu_sm_copy benchmark and related tests.
-
Yuting Jiang authored
**Description** Remove IB device port info from the command to fix the IB loopback bug. **Major Revision** - Remove IB device port info from the command to fix the IB loopback bug.
-
Yuting Jiang authored
**Description** Add gemm flops microbenchmark for amd. **Major Revision** - Add gemm flops microbenchmark for amd. - Add related example and test file.
-
Yuting Jiang authored
**Description** Extract base class for gemm flops microbenchmark. **Major Revision** - extract base class for gemm flops microbenchmark and add related test. - revise gemm_flops_performance for cuda.
-
- 27 Aug, 2021 4 commits
-
-
guoshzhao authored
**Description** Rename `kernel_launch_overhead_event` to `event_overhead`, `kernel_launch_overhead_wall` to `wall_overhead`.
-
Yuting Jiang authored
**Description** Add memory bus bandwidth performance microbenchmark for amd. **Major Revision** - Add memory bus bandwidth performance microbenchmark for amd. - Add related example and test file.
-
Ziyue Yang authored
**Description** This commit adds the benchmark program for GPU-initiated data transfer benchmark.
-
Yuting Jiang authored
Benchmarks: Fix Bug - fix bug of microbenchmark building cublas and cudnn for amd in build pipeline (#166) **Description** Fix the bug of the microbenchmark build attempting to build cublas and cudnn for amd. **Major Revision** - Remove cuda LANGUAGES in project(). - Check for CUDAToolkit quietly and build only if it is found.
-
- 26 Aug, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Rename the computation_communication_overlap microbenchmark metric. **Major Revision** - Remove rank info from the metric. - Simplify and rename the metric.
-
- 25 Aug, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Extract base class for memory bandwidth microbenchmark. **Major Revision** - Revise and optimize cuda_memory_bandwidth_performance. - Extract base class for memory bandwidth microbenchmark. - Add test for base class.
-
- 22 Aug, 2021 1 commit
-
-
Ziyue Yang authored
**Description** This commit adds readwrite I/O pattern for FIO benchmark. Read/write ratio is fixed at 4:1.
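For illustration, a 4:1 read/write ratio corresponds to 80% reads in fio's mixed-workload setting. The sketch below builds such a command line using the standard fio options `--rw` and `--rwmixread`; the function name `build_fio_command`, the job name, and the file path are hypothetical, and the flags the benchmark actually passes may differ.

```python
def build_fio_command(filename, read_parts=4, write_parts=1, block_size='4k'):
    """Build a fio command for a sequential mixed read/write workload."""
    # Convert the read:write ratio into the percentage of reads fio expects.
    read_percent = 100 * read_parts // (read_parts + write_parts)
    return [
        'fio',
        '--name=readwrite_example',
        '--filename={}'.format(filename),
        '--rw=rw',                               # sequential mixed reads and writes
        '--rwmixread={}'.format(read_percent),   # share of the mix that is reads
        '--bs={}'.format(block_size),
        '--output-format=json',
    ]


command = build_fio_command('/tmp/fio_example_file')
print(' '.join(command))
```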
-
- 20 Aug, 2021 1 commit
-
-
guoshzhao authored
**Description** Generate the summarized output files from all nodes. For each metric, do the reduce operation according to the `reduce_op`.
**Major Revision**
- Generate the summarized json file per node:
  For microbenchmark, the format is `{benchmark_name}/[{run_count}/]{metric_name}[:rank]`
  For modelbenchmark, the format is `{benchmark_name}/{sub_benchmark_name}/[{run_count}/]{metric_name}`
  `[]` means optional.
```
{
  "kernel-launch/overhead_event:0": 0.00583,
  "kernel-launch/overhead_event:1": 0.00545,
  "kernel-launch/overhead_event:2": 0.00581,
  "kernel-launch/overhead_event:3": 0.00572,
  "kernel-launch/overhead_event:4": 0.00559,
  "kernel-launch/overhead_event:5": 0.00591,
  "kernel-launch/overhead_event:6": 0.00562,
  "kernel-launch/overhead_event:7": 0.00586,
  "resnet_models/pytorch-resnet50/steptime-train-float32": 544.0827468410134,
  "resnet_models/pytorch-resnet50/throughput-train-float32": 353.7607016465773,
  "resnet_models/pytorch-resnet50/steptime-train-float16": 425.40482617914677,
  "resnet_models/pytorch-resnet50/throughput-train-float16": 454.0142363793973,
  "pytorch-sharding-matmul/0/allreduce": 10.561786651611328,
  "pytorch-sharding-matmul/1/allreduce": 10.561786651611328,
  "pytorch-sharding-matmul/0/allgather": 10.088025093078613,
  "pytorch-sharding-matmul/1/allgather": 10.088025093078613
}
```
- Generate the summarized jsonl file for all nodes, each line is the result from one node in json format.
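The two naming formats and the jsonl layout described in this commit can be sketched as follows. The helper names `micro_metric_name`, `model_metric_name`, and `to_jsonl` are hypothetical; only the format strings and the sample metric names come from the commit message.

```python
import json


def micro_metric_name(benchmark_name, metric_name, run_count=None, rank=None):
    """Build {benchmark_name}/[{run_count}/]{metric_name}[:rank]."""
    parts = [benchmark_name]
    if run_count is not None:
        parts.append(str(run_count))
    parts.append(metric_name)
    name = '/'.join(parts)
    if rank is not None:
        name += ':{}'.format(rank)
    return name


def model_metric_name(benchmark_name, sub_benchmark_name, metric_name, run_count=None):
    """Build {benchmark_name}/{sub_benchmark_name}/[{run_count}/]{metric_name}."""
    parts = [benchmark_name, sub_benchmark_name]
    if run_count is not None:
        parts.append(str(run_count))
    parts.append(metric_name)
    return '/'.join(parts)


def to_jsonl(node_summaries):
    """Serialize one summary dict per node, one JSON object per line."""
    return '\n'.join(json.dumps(summary) for summary in node_summaries)


print(micro_metric_name('kernel-launch', 'overhead_event', rank=0))
print(micro_metric_name('pytorch-sharding-matmul', 'allreduce', run_count=1))
print(model_metric_name('resnet_models', 'pytorch-resnet50', 'steptime-train-float32'))
print(to_jsonl([{'kernel-launch/overhead_event:0': 0.00583},
                {'kernel-launch/overhead_event:0': 0.00545}]))
```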
-
- 16 Aug, 2021 1 commit
-
-
guoshzhao authored
**Description** Change the field name `reduce` to `reduce_op`.
-
- 06 Aug, 2021 2 commits
- 05 Aug, 2021 1 commit
-
-
guoshzhao authored
**Description** Add reduce function support for output summary. **Major Revision** - Add reducer class to maintain all reduce functions. - Save reduce type of each metric into `BenchmarkResult` - Fix UT.
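A registry that maintains reduce functions by name could be sketched as below. The class layout, the operation names (`min`, `max`, `avg`, `sum`), and the `reduce_metric` helper are assumptions for illustration and are not taken from the repository.

```python
import statistics


class Reducer:
    """Hypothetical registry mapping a reduce type name to a function."""

    _functions = {}

    @classmethod
    def register(cls, name, func):
        cls._functions[name] = func

    @classmethod
    def get(cls, name):
        # Returns None for an unknown reduce type.
        return cls._functions.get(name)


Reducer.register('min', min)
Reducer.register('max', max)
Reducer.register('sum', sum)
Reducer.register('avg', statistics.mean)


def reduce_metric(values, reduce_type):
    """Collapse the per-rank values of one metric into a single number."""
    func = Reducer.get(reduce_type)
    if func is None:
        raise ValueError('unknown reduce type: {}'.format(reduce_type))
    return func(values)


per_rank = [1.0, 3.0, 2.0]
print(reduce_metric(per_rank, 'max'))
print(reduce_metric(per_rank, 'avg'))
```

Storing the reduce type next to each metric lets the summary step pick the right function without knowing anything about the benchmark that produced it.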
-
- 30 Jul, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add rccl bandwidth microbenchmark for rocm. **Major Revision** - Register rccl-bw benchmark.
-
- 29 Jul, 2021 1 commit
-
-
Yifan Xiong authored
__Description__ Cherry-pick bug fixes from v0.2.1 to main. __Major Revisions__ * Fix VGG models failing on A100 GPU with batch_size=128. * Fix Ansible connection issue when running on localhost. * Update version in packages and docs.
-
- 27 Jul, 2021 2 commits
-
-
Yuting Jiang authored
**Description** Add the source code of rocm kernel launch overhead benchmark. **Major Revision** - Revise cmake build logic to support both cuda and rocm
-
Yuting Jiang authored
**Description** Support rocm cmake build. **Major Revision** - Add some envs in rocm_common.cmake to support rocm cmake build.
-
- 26 Jul, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add NCCL performance microbenchmark. **Major Revision** - Add microbenchmark, example, test, config for NCCL
-
- 23 Jul, 2021 2 commits
-
-
Yuting Jiang authored
**Description** Add RDMA Loopback performance microbenchmark. **Major Revision** - Add microbenchmark, example, test, config for RDMA Loopback
-
Ziyue Yang authored
**Description** Add disk performance microbenchmark. **Major Revision** - Add microbenchmark, example, test, config for disk performance. **Minor Revision** - Fix bugs in executor unit test related to default enabled tests.
-
- 13 Jul, 2021 1 commit
-
-
Yuting Jiang authored
Add microbenchmark, example, test, and config for cuda memory performance; add cuda-samples (tagged with the cuda version) as a git submodule and update the related makefile.
-
- 30 Jun, 2021 1 commit
-
-
guoshzhao authored
-
- 29 Jun, 2021 1 commit
-
-
guoshzhao authored
* fix bug for nvidia v100 * hard code the supported dict for different arch.
-
- 28 Jun, 2021 2 commits
- 21 Jun, 2021 1 commit
-
-
guoshzhao authored
Benchmarks: Add Feature - Add DistributedImpl and DistributedBackend arguments for micro benchmark. (#100)
-
- 20 Jun, 2021 1 commit
-
-
Yuting Jiang authored
Rename the binary name and result metric of the cublas and cudnn microbenchmarks.
-
- 16 Jun, 2021 1 commit
-
-
Yifan Xiong authored
Fix bugs and refine log in single GPU benchmarks: * Fix none framework issue * Fix empty parameter bug * Remove missed mobilenet_v3 models * Change benchmark registration log to debug level * Add pid in logging * Add missing benchmarks in default config * Fix deprecated logging warn
-
- 07 Jun, 2021 1 commit
-
-
guoshzhao authored
* Clean up the cache.
-
- 04 Jun, 2021 1 commit
-
-
guoshzhao authored
* fix return code reset issue
-
- 02 Jun, 2021 2 commits