- 14 Apr, 2023 1 commit
-
-
Yifan Xiong authored
**Description**
Cherry-pick bug fixes from v0.8.0 to main.

**Major Revisions**
* Monitor - Fix the cgroup version checking logic (#502)
* Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
* Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505)
* Analyzer: Fix bug in python3.8 due to pandas api change (#504)
* Bug - Fix bug to get metric from cmd when error happens (#506)
* Monitor - Collect realtime GPU power when benchmarking (#507)
* Add num_workers argument in model benchmark (#511)
* Remove unreachable condition when write host list (#512)
* Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
* Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
* Docs - Upgrade version and release note (#508)

Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
-
- 28 Jan, 2023 1 commit
-
-
Yifan Xiong authored
**Description**
Cherry-pick bug fixes from v0.7.0 to main.

**Major Revisions**
* Benchmarks - Fix missing include in FP8 benchmark (#460)
* Fix bug in TE BERT model (#461)
* Doc - Update benchmark doc (#465)
* Bug: Fix bug for incorrect datatype judgement in cublas-function source code (#464)
* Support `sb deploy` without pulling image (#466)
* Docs - Upgrade version and release note (#467)

Co-authored-by: Russell J. Hewett <russell.j.hewett@gmail.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
-
- 06 Sep, 2022 1 commit
-
-
Yifan Xiong authored
**Description**
Cherry-pick bug fixes from v0.6.0 to main.

**Major Revisions**
* Enable latency test in ib traffic validation distributed benchmark (#396)
* Enhance parameter parsing to allow spaces in value (#397)
* Update apt packages in dockerfile (#398)
* Upgrade colorlog for NO_COLOR support (#404)
* Analyzer - Update error handling to support exit code of sb result diagnosis (#403)
* Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399)
* Enhance timeout cleanup to avoid possible hanging (#405)
* Auto generate ibstat file by pssh (#402)
* Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406)
* Docs - Upgrade version and release note (#407)
* Docs - Fix issues in document (#408)

Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
-
- 24 Jun, 2022 1 commit
-
-
Yifan Xiong authored
**Description**
Support running multiple IB/GPU devices simultaneously in the ib validation benchmark.

**Major Revisions**
- Revise ib_validation_performance.cc so that multiple processes per node can launch multiple perftest commands simultaneously. For each node pair in the config, the configured number of processes per node runs in parallel.
- Revise ib_validation_performance.py to correct file paths and adjust parameters to specify different NICs/GPUs/NUMA nodes.
- Fix env issues in Dockerfile for end-to-end test.
- Update ib-traffic configuration examples in config files.
- Update unit tests and docs accordingly.

Closes #326.
-
- 29 Apr, 2022 1 commit
-
-
Yifan Xiong authored
**Description**
Cherry-pick bug fixes from v0.5.0 to main.

**Major Revisions**
* Bug - Force to fix ort version as '1.10.0' (#343)
* Bug - Support no matching rules and unify the output name in result_summary (#345)
* Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344)
* Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342)
* Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347)
* Docs - Upgrade version and release note (#348)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
-
- 16 Mar, 2022 1 commit
-
-
rafsalas19 authored
**Description**
Modifications adding GPU-Burn to SuperBench.
- added third party submodule
- modified Makefile to make gpu-burn binary
- added/modified microbenchmarks to add gpu-burn python scripts
- modified default and azure_ndv4 configs to add gpu-burn
-
- 08 Feb, 2022 2 commits
-
-
Ziyue Yang authored
This commit adds GDR-only nccl-tests for NVIDIA machines. It also bumps NCCL to v2.10.3-1 to achieve peak performance in this test.
-
Ziyue Yang authored
This commit makes data checking in gpu_copy optional, because it takes too long when the message size is large.
-
- 27 Jan, 2022 1 commit
-
-
Yuting Jiang authored
**Description** Disable disk-benchmark in ndmv4.yaml and change batch size to 1 in default.yaml
-
- 30 Dec, 2021 1 commit
-
-
Yifan Xiong authored
__Description__
Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__
* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
-
- 13 Dec, 2021 1 commit
-
-
Hossein Pourreza authored
**Description**
Add mlc memory bandwidth and latency micro benchmark to SuperBench.

**Major Revision**
- Add mlc benchmark with test and example files
-
- 10 Dec, 2021 2 commits
-
-
guoshzhao authored
**Description**
Add ONNXRuntime inference benchmark based on ORT python API.

**Major Revision**
- Add `ORTInferenceBenchmark` class to export pytorch model to onnx model and do inference
- Add tests and example for `ort-inference` benchmark
- Update the introduction docs.
-
guoshzhao authored
**Description**
Integrate monitor into SuperBench.

**Major Revision**
- Initialize, start and stop monitor in SB executor.
- Parse the monitor data in SB runner and merge into benchmark results.
- Specify ReduceType for monitor metrics, such as MAX, MIN and LAST.
- Add monitor configs into config file.
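
As a rough illustration of what a reduce type means for monitor metrics, the sketch below collapses a series of samples into one value. It is not taken from the SuperBench source: the enum values and the `reduce_samples` helper are assumptions made for this example, and only the names MAX, MIN and LAST come from the commit message.

```python
from enum import Enum


class ReduceType(Enum):
    """Illustrative reduce types, named after those in the commit message."""
    MAX = 'max'
    MIN = 'min'
    LAST = 'last'


def reduce_samples(samples, reduce_type):
    """Collapse a list of monitor samples into a single value.

    `reduce_samples` is a hypothetical helper, not a SuperBench API.
    Returns None when there are no samples to reduce.
    """
    if not samples:
        return None
    if reduce_type is ReduceType.MAX:
        return max(samples)
    if reduce_type is ReduceType.MIN:
        return min(samples)
    if reduce_type is ReduceType.LAST:
        # LAST keeps the most recent sample, e.g. for cumulative counters.
        return samples[-1]
    raise ValueError('unsupported reduce type: {}'.format(reduce_type))
```

For example, a temperature series might be reduced with MAX to report the peak, while a cumulative error counter might be reduced with LAST.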
-
- 08 Dec, 2021 1 commit
-
-
Yifan Xiong authored
Fix issues for distributed runs:
* fix config for memory bandwidth benchmarks
* add throttling for high concurrency docker pull
* update rsync path and exclude directories
* handle exceptions when creating summary
* tune for logging
-
- 02 Dec, 2021 1 commit
-
-
Yifan Xiong authored
**Description** Replace `-c` argument with `-N` for `numactl` since the old `-c`/`--cpubind` argument is deprecated.
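
A minimal sketch of the resulting invocation, assuming the benchmark command is assembled as an argument list. The `numactl_command` helper is hypothetical and not part of SuperBench. Only the `-N` (`--cpunodebind`) flag comes from the commit message.

```python
import shlex


def numactl_command(numa_node, command):
    """Prefix a command so it runs on the CPUs of one NUMA node.

    Hypothetical helper for illustration. Uses `-N` (--cpunodebind)
    instead of the deprecated `-c` (--cpubind).
    """
    return ['numactl', '-N', str(numa_node)] + shlex.split(command)
```

Calling `numactl_command(0, 'echo hello')` yields an argument list that can be passed to `subprocess.run` on a host where `numactl` is installed.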
-
- 30 Oct, 2021 1 commit
-
-
Ziyue Yang authored
**Description**
This commit does the following:
1) Adds CPU-initiated copy benchmark;
2) Adds dtod benchmark;
3) Supports scanning NUMA nodes and GPUs inside the benchmark program;
4) Changes the name of gpu-sm-copy to gpu-copy.
-
- 26 Sep, 2021 1 commit
-
-
Yifan Xiong authored
**Description**
Cherry-pick bug fixes from v0.3.0 to main.

**Major Revisions**
* Docs - Upgrade version and release note (#209)
* Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
* Benchmarks: Update - Update benchmarks in configuration file (#208)
* CI/CD - Update GitHub Action VM (#211)
* Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
* CI/CD - Fix bug in build image for push event (#205)
* Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
* Tool: Fix bug - Fix function naming issue in system info (#200)
* CI/CD - Push images in GitHub Action (#202)
* Bug - Fix torch.distributed command for single node (#201)
* CLI - Integrate system info for node (#199)
* Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
* CI/CD - Add ROCm image build in GitHub Actions (#194)
* Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
* Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
* Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
* Bug - Revise 'docker run' in sb deploy (#195)
* Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190)

Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
-
- 31 Aug, 2021 1 commit
-
-
Ziyue Yang authored
Benchmarks: Code Revision - Revise metric name generation and default config for disk performance benchmark (#175)

**Description**
This commit revises disk performance benchmark, including:
1) Add missing benchmark name in default config;
2) Avoid using reserved character ':' in metric name.
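
One way to keep the reserved character out of a generated metric name is sketched below. The commit message does not say how the name is built, so the `sanitize_metric_name` helper and the choice of `_` as the replacement are assumptions for illustration only.

```python
RESERVED_CHAR = ':'


def sanitize_metric_name(name, replacement='_'):
    """Return `name` with the reserved ':' character replaced.

    Hypothetical helper; the replacement character is an assumption,
    not the one used by the actual benchmark.
    """
    return name.replace(RESERVED_CHAR, replacement)
```

With this sketch, a name such as `nvme0n1:read_iops` becomes `nvme0n1_read_iops`.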
-
- 30 Aug, 2021 1 commit
-
-
Ziyue Yang authored
**Description** This commit adds gpu_sm_copy benchmark and related tests.
-
- 26 Jul, 2021 1 commit
-
-
Yuting Jiang authored
**Description**
Add NCCL performance microbenchmark.

**Major Revision**
- Add microbenchmark, example, test, config for NCCL
-
- 23 Jul, 2021 2 commits
-
-
Yuting Jiang authored
**Description**
Add RDMA Loopback performance microbenchmark.

**Major Revision**
- Add microbenchmark, example, test, config for RDMA Loopback
-
Ziyue Yang authored
**Description**
Add disk performance microbenchmark.

**Major Revision**
- Add microbenchmark, example, test, config for disk performance.

**Minor Revision**
- Fix bugs in executor unit test related to default enabled tests.
-
- 13 Jul, 2021 1 commit
-
-
Yuting Jiang authored
Add microbenchmark, example, test, and config for CUDA memory performance. Add cuda-samples (tagged with the CUDA version) as a git submodule and update the related Makefile.
-
- 02 Jul, 2021 1 commit
-
-
Yifan Xiong authored
Update README and version for v0.2 release.
-
- 23 Jun, 2021 1 commit
-
-
Yifan Xiong authored
* Add `sb deploy` command content.
* Fix inline if-expression syntax in playbook.
* Fix quote escape issue in bash command.
* Add custom env in config.
* Update default config for multi GPU benchmarks.
* Update MANIFEST.in to include jinja2 template.
* Require jinja2 minimum version.
* Fix occasional duplicate output in Ansible runner.
* Fix mixed color from Ansible and Python colorlog.
* Update according to comments.
* Change superbench.env from list to dict in config file.
-
- 16 Jun, 2021 1 commit
-
-
Yifan Xiong authored
Fix bugs and refine log in single GPU benchmarks:
* Fix none framework issue
* Fix empty parameter bug
* Remove missed mobilenet_v3 models
* Change benchmark registration log to debug level
* Add pid in logging
* Add missing benchmarks in default config
* Fix deprecated logging warn
-
- 02 Jun, 2021 1 commit
-
-
Yifan Xiong authored
* Support local mode in runner.
-
- 28 May, 2021 1 commit
-
-
Yifan Xiong authored
* Support `torch.distributed` mode in runner.
* Support given `proc_num` and `node_num` in `torch.distributed` mode.
-
- 13 Apr, 2021 1 commit
-
-
Yifan Xiong authored
* fix missing package in dockerfile
* update benchmark list and parameters
* catch runtime errors
* refine logging info
-
- 09 Apr, 2021 1 commit
-
-
Yifan Xiong authored
Add superbench executor class
* add executor class
* update default config to exec benchmarks
* add micro benchmarks and model benchmarks
-
- 12 Mar, 2021 1 commit
-
-
Yifan Xiong authored
- Add CLI commands
  * sb version
  * sb deploy
  * sb exec
  * sb run
- Add interface with executor and runner
- Add cli test cases
-