- 28 Dec, 2021 1 commit
-
-
Yifan Xiong authored
__Description__ Upgrade version and release note. __Major Revision__ - Upgrade package versions - Add release note for v0.4.0
-
- 27 Dec, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Update inference and network benchmarks in configs.
-
- 24 Dec, 2021 3 commits
-
-
Yuting Jiang authored
**Description** Launch mpi on the sorted first host in the hostfile.
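As a rough illustration of this change (a sketch, not the project's actual code), selecting the launch host might look like the following. The helper name `pick_launch_host` and the assumption that the hostfile lists one host per line, optionally followed by fields such as `slots=8`, are hypothetical.

```python
def pick_launch_host(hostfile_text):
    """Return the first host in sorted order (hypothetical helper).

    Assumes one host per line; blank lines and '#' comments are ignored,
    and only the first whitespace-separated field is treated as the host.
    """
    hosts = [
        line.split()[0]
        for line in hostfile_text.splitlines()
        if line.strip() and not line.lstrip().startswith('#')
    ]
    if not hosts:
        raise ValueError('hostfile contains no hosts')
    # Sorting makes the choice independent of the order in the file.
    return sorted(hosts)[0]
```

Sorting before picking means every node that reads the same hostfile agrees on the launch host, whatever order the entries appear in.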
-
Yuting Jiang authored
**Description** Fix bugs in data diagnosis. **Major Revision** - fix package import issue of file_handler - deal with monitor metrics - fix typo in output_path
-
Yuting Jiang authored
**Description** Fix bug in detecting whether gpu_index is None.
-
- 23 Dec, 2021 3 commits
-
-
Yuting Jiang authored
**Description** Add 'monitor/' prefix to monitor metrics in result summary.
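A minimal sketch of what such prefixing could look like, assuming the monitor metrics arrive as a flat name-to-value mapping. The function name and the metric name in the usage note are illustrative, not taken from the codebase.

```python
def prefix_monitor_metrics(metrics, prefix='monitor/'):
    """Return a copy of `metrics` with `prefix` added to each name.

    Names that already carry the prefix are left unchanged, so the
    function is safe to apply more than once.
    """
    return {
        (name if name.startswith(prefix) else prefix + name): value
        for name, value in metrics.items()
    }
```

For example, a hypothetical `gpu_temperature` entry would be reported as `monitor/gpu_temperature`, which keeps monitor data distinguishable from benchmark metrics in the summary.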
-
Yuting Jiang authored
**Description** Unify metric and add doc for cuBLAS and cuDNN functions.
-
Yuting Jiang authored
**Description** Fix fio build issue (Illegal instruction). Refer to https://github.com/axboe/fio/issues/970
-
- 22 Dec, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Build openmpi with ucx support in rocm dockerfiles.
-
- 16 Dec, 2021 2 commits
-
-
Yifan Xiong authored
__Description__ Refine test cases for microbenchmark: * Refine test fixture, add BenchmarkTestCase class. * Refine test data. * Resolve no numa issue for test_ib_loopback_util case.
-
Yifan Xiong authored
__Description__ Fix issues for Ansible and benchmarks: * Clean up Ansible runner private data dir to avoid running out of disk space when the number of nodes is large. * Support both absolute and relative paths when fetching results. * Use a deterministic image in Ansible test to avoid image updates. * Update logging format. * Delete torch models and inputs after export.
-
- 14 Dec, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add usage for data diagnosis.
-
- 13 Dec, 2021 6 commits
-
-
guoshzhao authored
**Description** Update docs for monitor.
-
Yifan Xiong authored
Add transformers for TensorRT inference.
-
Ziyue Yang authored
**Description** Add benchmark metrics for cpu-memory-bw-latency.
-
Ziyue Yang authored
**Description** Benchmarks: Fix Comment - Correct benchmark name in test_gpu_copy_bw_performance.py.
-
Hossein Pourreza authored
**Description** Add mlc memory bandwidth and latency micro benchmark to Superbench. **Major Revision** - Add mlc benchmark with test and example files
-
yangpanMS authored
**Description** Minor doc change to highlight sb CLI version is independent of the sb container version.
-
- 10 Dec, 2021 5 commits
-
-
guoshzhao authored
**Description** Add ONNXRuntime inference benchmark based on ORT python API. **Major Revision** - Add `ORTInferenceBenchmark` class to export pytorch model to onnx model and do inference - Add tests and example for `ort-inference` benchmark - Update the introduction docs.
-
Yuting Jiang authored
**Description** Add basic analysis features. **Major Revision** - Add statistics and correlations of the raw data - Add numeric outlier detection (inter_quartile_range) - Add boxplot for selected metric
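For readers unfamiliar with interquartile-range outlier detection, here is a generic sketch of the technique using only the Python standard library. It is not the module's implementation; the function name and the conventional multiplier `k=1.5` are assumptions.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Uses statistics.quantiles (Python 3.8+), which needs at least
    two data points.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]
```

Because the quartiles are barely affected by a few extreme values, this rule is more robust than thresholds based on the mean and standard deviation.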
-
guoshzhao authored
**Description** Integrate monitor into Superbench. **Major Revision** - Initialize, start and stop monitor in SB executor. - Parse the monitor data in SB runner and merge into benchmark results. - Specify ReduceType for monitor metrics, such as MAX, MIN and LAST. - Add monitor configs into config file.
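To illustrate how a per-metric reduce type could collapse a series of monitor samples into one value, here is a hedged sketch. The `ReduceType` enum below is a stand-in that mirrors only the three names mentioned in the commit message, and `reduce_samples` is a hypothetical helper, not the project's API.

```python
from enum import Enum


class ReduceType(Enum):
    """Stand-in enum with the three reduce types named above."""
    MAX = 'max'
    MIN = 'min'
    LAST = 'last'


def reduce_samples(samples, reduce_type):
    """Collapse a list of samples (oldest first) into a single value."""
    if not samples:
        raise ValueError('no samples to reduce')
    if reduce_type is ReduceType.MAX:
        return max(samples)
    if reduce_type is ReduceType.MIN:
        return min(samples)
    if reduce_type is ReduceType.LAST:
        return samples[-1]
    raise ValueError('unsupported reduce type: {}'.format(reduce_type))
```

One plausible use is taking the maximum for a metric such as temperature and the last sample for a cumulative counter, though which type suits which metric is a configuration decision.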
-
guoshzhao authored
**Description** Set the `reduce_op` type for metric `return_code` as `None`.
-
Yuting Jiang authored
**Description** Add cli to integrate data diagnosis module.
-
- 09 Dec, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Unify metric names of benchmarks.
-
- 08 Dec, 2021 2 commits
-
-
Yuting Jiang authored
**Description** Add data diagnosis module. **Major Revision** - Add DataDiagnosis class to support rule-based data diagnosis for result summary jsonl file of multi nodes - Add RuleOp class to define rule operators
-
Yifan Xiong authored
Fix issues for distributed runs: * fix config for memory bandwidth benchmarks * add throttling for high concurrency docker pull * update rsync path and exclude directories * handle exceptions when creating summary * tune for logging
-
- 07 Dec, 2021 1 commit
-
-
guoshzhao authored
**Description** Add return_code metric into result and revise unit tests.
-
- 06 Dec, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add doc for data diagnosis, including input, output and baseline file schema.
-
- 03 Dec, 2021 1 commit
-
-
Yifan Xiong authored
Add config file for Azure NDm A100 v4 SKU.
-
- 02 Dec, 2021 3 commits
-
-
guoshzhao authored
**Description** Add gpt-small into config files.
-
guoshzhao authored
**Description** If `ignore_invalid` is True and 'required' arguments are not set when registering the benchmark, the arguments should be provided by the user in the config, and argument checking is skipped.
-
Yifan Xiong authored
**Description** Replace `-c` argument with `-N` for `numactl` since the old `-c`/`--cpubind` argument is deprecated.
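A small sketch of building a NUMA-bound command line with the newer flag. The helper name `numactl_command` is hypothetical; `-N` is the short form of numactl's `--cpunodebind` option.

```python
import shlex


def numactl_command(node, command):
    """Return an argv list that runs `command` on the CPUs of NUMA `node`.

    Uses `-N` (--cpunodebind) rather than the deprecated `-c` (--cpubind).
    """
    return ['numactl', '-N', str(node)] + shlex.split(command)
```

The resulting list can be passed directly to `subprocess.run` without going through a shell.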
-
- 01 Dec, 2021 1 commit
-
-
Ziyue Yang authored
**Description** Upgrade FIO benchmark tool from 3.27 to 3.28.
-
- 30 Nov, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Update IB validation microbenchmark metrics.
-
- 29 Nov, 2021 1 commit
-
-
dependabot[bot] authored
Bumps [algoliasearch-helper](https://github.com/algolia/algoliasearch-helper-js) from 3.5.5 to 3.6.2. - [Release notes](https://github.com/algolia/algoliasearch-helper-js/releases) - [Changelog](https://github.com/algolia/algoliasearch-helper-js/blob/develop/CHANGELOG) - [Commits](https://github.com/algolia/algoliasearch-helper-js/compare/3.5.5...3.6.2) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
- 26 Nov, 2021 1 commit
-
-
Ziyue Yang authored
**Description** Update gpu-copy benchmark metrics.
-
- 25 Nov, 2021 1 commit
-
-
Kaiyu Xie authored
**Description** Fix typo in description of kernel_launch_overhead.py
-
- 18 Nov, 2021 1 commit
-
-
guoshzhao authored
**Description** Add the initial version of Monitor. **Major Revision** - Add `Monitor` class to launch a background process for monitoring. - Add `MonitorRecord` class to save the data from a single capture.
-
- 15 Nov, 2021 1 commit
-
-
guoshzhao authored
**Description** Rename `nvidia_helper` utility as `device_manager` module and support more functions:
```
device_manager.get_device_count()
device_manager.get_device_utilization(idx)
device_manager.get_device_temperature(idx)
device_manager.get_device_power_limit(idx)
device_manager.get_device_memory(idx)
device_manager.get_device_row_remapped_info(idx)
device_manager.get_device_ecc_error(idx)
```
-
- 12 Nov, 2021 1 commit
-
-
Yifan Xiong authored
__Description__ Add TensorRT inference benchmark for torchvision models. __Major Revision__ - Measure TensorRT inference performance.
-