- 10 Dec, 2021 2 commits
-
-
guoshzhao authored
**Description** Set the `reduce_op` type for metric `return_code` to `None`.
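A minimal sketch of what a `None` reduce operation could mean for a metric such as `return_code`, assuming it leaves per-rank values unreduced. `REDUCERS` and `reduce_metric` are hypothetical names for illustration, not the project's API.

```python
import statistics

# Hypothetical mapping from reduce-op names to aggregation functions.
REDUCERS = {'min': min, 'max': max, 'mean': statistics.mean}


def reduce_metric(values, reduce_op):
    """Aggregate per-rank values, or return them unchanged when reduce_op is None."""
    if reduce_op is None:
        # Assumption: no reduction is applied, so every rank keeps its own value.
        return list(values)
    return REDUCERS[reduce_op](values)
```

Averaging return codes would hide which rank failed, which is one plausible reason to skip reduction for this metric.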
-
Yuting Jiang authored
**Description** Add cli to integrate data diagnosis module.
-
- 09 Dec, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Unify metric names of benchmarks.
-
- 08 Dec, 2021 2 commits
-
-
Yuting Jiang authored
**Description** Add data diagnosis module.
**Major Revision**
- Add DataDiagnosis class to support rule-based data diagnosis for the result summary jsonl file of multiple nodes
- Add RuleOp class to define rule operators
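Rule-based diagnosis over per-node results can be sketched as below. This is an illustration only: the rule schema (`metric`, `op`, `threshold`) and the names `RULE_OPS` and `diagnose` are assumptions, not the DataDiagnosis or RuleOp interfaces.

```python
import operator

# Hypothetical rule operators keyed by symbol.
RULE_OPS = {'>': operator.gt, '<': operator.lt}


def diagnose(node_results, rules):
    """Return a dict mapping each node to the metrics that violate a rule."""
    issues = {}
    for node, metrics in node_results.items():
        failed = [
            rule['metric']
            for rule in rules
            if rule['metric'] in metrics
            and RULE_OPS[rule['op']](metrics[rule['metric']], rule['threshold'])
        ]
        if failed:
            issues[node] = failed
    return issues
```

For example, with a rule flagging `bw` below 80.0, a node reporting 60.0 is listed and a node reporting 100.0 is not.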
-
Yifan Xiong authored
Fix issues for distributed runs:
* fix config for memory bandwidth benchmarks
* add throttling for high concurrency docker pull
* update rsync path and exclude directories
* handle exceptions when creating summary
* tune for logging
-
- 07 Dec, 2021 1 commit
-
-
guoshzhao authored
**Description** Add return_code metric into result and revise unit tests.
-
- 06 Dec, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add doc for data diagnosis, including input, output and baseline file schema.
-
- 03 Dec, 2021 1 commit
-
-
Yifan Xiong authored
Add config file for Azure NDm A100 v4 SKU.
-
- 02 Dec, 2021 3 commits
-
-
guoshzhao authored
**Description** Add gpt-small into config files.
-
guoshzhao authored
**Description** If `ignore_invalid` is True and the 'required' arguments are not set when registering the benchmark, the arguments should be provided by the user in the config, and argument checking is skipped.
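The behaviour described here can be sketched roughly as follows. `check_registered_args` is a hypothetical helper written for illustration and does not reflect the actual registration code.

```python
def check_registered_args(parameters, required, ignore_invalid):
    """Return required argument names missing at registration time.

    Raises ValueError when arguments are missing and ignore_invalid is False.
    """
    missing = [name for name in required if name not in parameters]
    if missing and not ignore_invalid:
        raise ValueError('missing required arguments: ' + ', '.join(missing))
    # Assumption: with ignore_invalid=True, the missing arguments are expected
    # to come from the user's config later, so the check is skipped here.
    return missing
```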
-
Yifan Xiong authored
**Description** Replace `-c` argument with `-N` for `numactl` since the old `-c`/`--cpubind` argument is deprecated.
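As a small illustration of the flag change, a command prefix using `-N` might be built like this. `build_numactl_command` is a hypothetical helper, not code from the repository.

```python
def build_numactl_command(node: int, command: str) -> str:
    """Prefix a command with numactl, binding it to the CPUs of one NUMA node."""
    # -N (--cpunodebind) is used in place of the deprecated -c (--cpubind).
    return 'numactl -N {} {}'.format(node, command)
```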
-
- 01 Dec, 2021 1 commit
-
-
Ziyue Yang authored
**Description** Upgrade FIO benchmark tool from 3.27 to 3.28.
-
- 30 Nov, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Update IB validation microbenchmark metrics.
-
- 29 Nov, 2021 1 commit
-
-
dependabot[bot] authored
Bumps [algoliasearch-helper](https://github.com/algolia/algoliasearch-helper-js) from 3.5.5 to 3.6.2.
- [Release notes](https://github.com/algolia/algoliasearch-helper-js/releases)
- [Changelog](https://github.com/algolia/algoliasearch-helper-js/blob/develop/CHANGELOG)
- [Commits](https://github.com/algolia/algoliasearch-helper-js/compare/3.5.5...3.6.2)
---
updated-dependencies:
- dependency-name: algoliasearch-helper
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
- 26 Nov, 2021 1 commit
-
-
Ziyue Yang authored
**Description** Update gpu-copy benchmark metrics.
-
- 25 Nov, 2021 1 commit
-
-
Kaiyu Xie authored
**Description** Fix typo in description of kernel_launch_overhead.py
-
- 18 Nov, 2021 1 commit
-
-
guoshzhao authored
**Description** Add the initial version of Monitor.
**Major Revision**
- Add `Monitor` class to launch a background process for monitoring.
- Add `MonitorRecord` class to save the data captured at one point in time.
-
- 15 Nov, 2021 1 commit
-
-
guoshzhao authored
**Description** Rename `nvidia_helper` utility as `device_manager` module and support more functions:
```
device_manager.get_device_count()
device_manager.get_device_utilization(idx)
device_manager.get_device_temperature(idx)
device_manager.get_device_power_limit(idx)
device_manager.get_device_memory(idx)
device_manager.get_device_row_remapped_info(idx)
device_manager.get_device_ecc_error(idx)
```
-
- 12 Nov, 2021 1 commit
-
-
Yifan Xiong authored
__Description__ Add TensorRT inference benchmark for torchvision models.
__Major Revision__
- Measure TensorRT inference performance.
-
- 10 Nov, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Update docs to add network benchmarks for tcp and gpcnet.
-
- 09 Nov, 2021 2 commits
-
-
Yuting Jiang authored
**Description** Add IB traffic validation distributed benchmark.
**Major Revision**
- Add IB traffic validation distributed benchmark, example and test
-
guoshzhao authored
Update docs to add ORT AMD benchmarks based on docker.
-
- 30 Oct, 2021 1 commit
-
-
Ziyue Yang authored
**Description** This commit does the following:
1) Adds CPU-initiated copy benchmark;
2) Adds dtod benchmark;
3) Supports scanning NUMA nodes and GPUs inside the benchmark program;
4) Changes the name of gpu-sm-copy to gpu-copy.
-
- 29 Oct, 2021 1 commit
-
-
Ziyue Yang authored
**Description** This commit fixes the URL of ROCm GPG file.
-
- 27 Oct, 2021 2 commits
-
-
Yifan Xiong authored
Add introduction and metrics for micro-benchmarks and model-benchmarks document.
-
guoshzhao authored
Add RocmOnnxModelBenchmark class to run benchmarks packaged in superbench/benchmark:rocm4.3.1-onnxruntime1.9.0
-
- 22 Oct, 2021 2 commits
-
-
Yuting Jiang authored
**Description** Add gpcnet microbenchmark.
**Major Revision**
- add 2 microbenchmarks for gpcnet: gpc-network-test and gpc-network-load-test
- add related test and example file
-
guoshzhao authored
**Description** Add CudaDockerBenchmark and RocmDockerBenchmark to support the CUDA and AMD platforms for DockerBenchmark.
-
- 21 Oct, 2021 4 commits
-
-
Yuting Jiang authored
**Description** Add gpcnet as git submodule and building logic.
**Major Revision**
- add gpcnet as a submodule
- add build logic in third_party/Makefile
-
guoshzhao authored
**Description** Revise all occurrences of the term `onnx` to `onnxruntime`.
-
Yuting Jiang authored
**Description** Add IB validation tool source code. The IB validation tool flexibly validates IB traffic of different patterns across multiple nodes.
**Major Revision**
- Add IB validation tool source code
- Add cmake file to build the source code
-
Yifan Xiong authored
Upgrade to latest agent image in pipeline, fix "Ubuntu16 image does not exist" issue.
-
- 12 Oct, 2021 3 commits
-
-
Yuting Jiang authored
**Description** Add a TCP connectivity validation microbenchmark, which validates TCP connectivity between the current node and the nodes listed in the hostfile.
**Major Revision**
- Add TCP connectivity validation microbenchmark and related test and example
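The core of such a connectivity check can be sketched with the standard library alone. `can_connect` is a hypothetical helper for illustration and is not the microbenchmark's implementation, which may measure more than reachability.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Calling this once per hostfile entry gives a simple reachable/unreachable result for each node.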
-
Yifan Xiong authored
__Major Revisions__
* Refine document structure for user tutorial.

__Minor Revisions__
* Add AMD part in installation.
* Change default config file to latest link.
-
Yifan Xiong authored
Disable dependabot version update, allow security update only. Reference: https://docs.github.com/en/code-security/supply-chain-security/keeping-your-dependencies-updated-automatically/configuration-options-for-dependency-updates#open-pull-requests-limit.
-
- 11 Oct, 2021 2 commits
-
-
Yifan Xiong authored
Add code security scanning.
__Major Revisions__
* enable dependabot auto updates
* scan code with CodeQL
-
dependabot[bot] authored
Bumps [axios](https://github.com/axios/axios) from 0.21.1 to 0.21.4.
- [Release notes](https://github.com/axios/axios/releases)
- [Changelog](https://github.com/axios/axios/blob/master/CHANGELOG.md)
- [Commits](https://github.com/axios/axios/compare/v0.21.1...v0.21.4)
---
updated-dependencies:
- dependency-name: axios
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
- 28 Sep, 2021 1 commit
-
-
guoshzhao authored
**Description** Fix typo when setting the force_fp32 option.
-
- 27 Sep, 2021 1 commit
-
-
guoshzhao authored
**Description** Add option `force_fp32` to use fp32 instead of tf32; it only takes effect on Ampere or newer GPUs.
-
- 26 Sep, 2021 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.3.0 to main.
**Major Revisions**
* Docs - Upgrade version and release note (#209)
* Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
* Benchmarks: Update - Update benchmarks in configuration file (#208)
* CI/CD - Update GitHub Action VM (#211)
* Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
* CI/CD - Fix bug in build image for push event (#205)
* Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
* Tool: Fix bug - Fix function naming issue in system info (#200)
* CI/CD - Push images in GitHub Action (#202)
* Bug - Fix torch.distributed command for single node (#201)
* CLI - Integrate system info for node (#199)
* Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
* CI/CD - Add ROCm image build in GitHub Actions (#194)
* Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
* Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
* Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
* Bug - Revise 'docker run' in sb deploy (#195)
* Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190)

Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
-