- 13 Aug, 2022 1 commit
-
-
Yang Wang authored
An enhancement for topo-aware IB performance validation (#373). This PR auto-generates the required ibstat file `ib_traffic_topo_aware_ibstat.txt`, which is used as input to build a graph.
-
- 06 Jul, 2022 1 commit
-
-
Yifan Xiong authored
Update dependencies and Dockerfile:
* Upgrade nccl-tests and rccl-tests to the latest versions to match the NCCL/RCCL versions
* Unify image tag names on DockerHub
* Remove verbose output in Dockerfile and fix some flags
-
- 24 Jun, 2022 2 commits
-
-
Yifan Xiong authored
Fix incorrect ulimit nofile config in Dockerfile. The Dockerfile runs commands with sh rather than bash by default, and the `echo` in sh does not accept any options, so the literal `-e` was written into /etc/security/limits.conf.
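A small check along the following lines could catch this kind of stray `-e` prefix. It is only a sketch: the helper name `find_malformed_limits` and the sample values are hypothetical and not taken from the repository, and it assumes the usual four-field `<domain> <type> <item> <value>` layout of limits.conf entries.

```python
def find_malformed_limits(text):
    """Return (line_number, line) pairs that do not look like
    '<domain> <type> <item> <value>' limits.conf entries."""
    bad = []
    for number, raw in enumerate(text.splitlines(), start=1):
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # The type field must be 'soft', 'hard', or '-' (both).
        if len(fields) != 4 or fields[1] not in {"soft", "hard", "-"}:
            bad.append((number, raw))
    return bad


# Hypothetical file contents: what sh's echo could produce vs. the intended result.
broken = "-e * soft nofile 1048576\n* hard nofile 1048576\n"
fixed = "* soft nofile 1048576\n* hard nofile 1048576\n"

if __name__ == "__main__":
    print(find_malformed_limits(broken))
    print(find_malformed_limits(fixed))
```

Only the first line of the `broken` sample is flagged, because the extra `-e` token gives it five fields instead of four.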
-
Yifan Xiong authored
**Description** Support running multiple IB/GPU devices simultaneously in the ib validation benchmark.

**Major Revisions**
- Revise ib_validation_performance.cc so that multiple processes per node can launch multiple perftest commands simultaneously. For each node pair in the config, the per-node processes run in parallel.
- Revise ib_validation_performance.py to correct file paths and adjust parameters to specify different NICs/GPUs/NUMA nodes.
- Fix env issues in Dockerfile for end-to-end test.
- Update ib-traffic configuration examples in config files.
- Update unit tests and docs accordingly.

Closes #326.
-
- 19 Jun, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Update ROCm Dockerfile.

**Major Revisions**
- Add dockerfile for ROCm 5.1.3
- Merge 5.1.x and 5.0.x dockerfile
- Remove 4.2 and 4.0 legacy
- Update build pipeline accordingly
-
- 15 Jun, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Fix cmake and build issues.

**Major Revision**
* Remove unnecessary boost build
* Remove user-agent for mlc
* Remove -j for third party to build each project in sequence
* Fix ansible collections installation path
-
- 31 May, 2022 1 commit
-
-
user4543 authored
**Description** Add support for running the `sb` command inside the Docker image by installing a missing dependency.
-
- 28 Feb, 2022 1 commit
-
-
user4543 authored
**Description** Add dockerfile for rocm5.0.1.
-
- 25 Feb, 2022 1 commit
-
-
user4543 authored
**Description** Add rocm5.0 dockerfile.
-
- 30 Dec, 2021 1 commit
-
-
Yifan Xiong authored
__Description__ Cherry-pick bug fixes from v0.4.0 to main.

__Major Revisions__
* Bug - Fix issues for Ansible and benchmarks (#267)
* Tests - Refine test cases for microbenchmark (#268)
* Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
* Benchmarks: Fix Bug - Fix fio build issue (#272)
* Docs - Unify metric and add doc for cublas and cudnn functions (#271)
* Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
* Bug - Fix bug of detecting if gpu_index is none (#275)
* Bug - Fix bugs in data diagnosis (#273)
* Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
* Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
* Docs - Upgrade version and release note (#277)

Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
-
- 13 Dec, 2021 1 commit
-
-
Hossein Pourreza authored
**Description** Add mlc memory bandwidth and latency micro benchmark to Superbench. **Major Revision** - Add mlc benchmark with test and example files
-
- 10 Dec, 2021 1 commit
-
-
guoshzhao authored
**Description** Add ONNXRuntime inference benchmark based on the ORT Python API.

**Major Revision**
- Add `ORTInferenceBenchmark` class to export a PyTorch model to an ONNX model and run inference
- Add tests and example for `ort-inference` benchmark
- Update the introduction docs
-
- 30 Oct, 2021 1 commit
-
-
Ziyue Yang authored
**Description** This commit does the following: 1) Adds a CPU-initiated copy benchmark; 2) Adds a dtod benchmark; 3) Supports scanning NUMA nodes and GPUs inside the benchmark program; 4) Changes the name of gpu-sm-copy to gpu-copy.
-
- 29 Oct, 2021 1 commit
-
-
Ziyue Yang authored
**Description** This commit fixes the URL of ROCm GPG file.
-
- 21 Oct, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add gpcnet as a git submodule and add its build logic.

**Major Revision**
- Add gpcnet as a submodule
- Add build logic in third_party/Makefile
-
- 02 Sep, 2021 1 commit
-
-
Yifan Xiong authored
__Description__ Resolve the "too many open files" issue when running NCCL/RCCL on multiple nodes using Docker images by increasing the nofile limit in limits.conf.
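To confirm that a raised nofile limit is actually in effect inside a container, a check like the following could be run. It is a minimal sketch using Python's standard `resource` module (Unix only); the helper name `nofile_limit_ok` and the threshold values are assumptions for illustration, not values from this commit.

```python
import resource


def nofile_limit_ok(required, limits=None):
    """Return True if both soft and hard open-file limits are at least `required`.

    `limits` may be passed as a (soft, hard) tuple for testing; by default
    the limits of the current process are queried.
    """
    if limits is None:
        limits = resource.getrlimit(resource.RLIMIT_NOFILE)
    soft, hard = limits

    def meets(value):
        # RLIM_INFINITY means the limit is unbounded.
        return value == resource.RLIM_INFINITY or value >= required

    return meets(soft) and meets(hard)


if __name__ == "__main__":
    # 65536 is an arbitrary example threshold, not the value used in the image.
    print(resource.getrlimit(resource.RLIMIT_NOFILE))
    print(nofile_limit_ok(65536))
```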
-
- 01 Sep, 2021 2 commits
- 31 Aug, 2021 1 commit
-
-
guoshzhao authored
**Description** Add dockerfiles `rocm4.0-pytorch1.7.0.dockerfile` and `rocm4.2-pytorch1.7.0.dockerfile` for the `rocm` platform.
-