- 30 Oct, 2021 1 commit
-
-
Ziyue Yang authored
**Description** This commit does the following: 1) Adds a CPU-initiated copy benchmark; 2) Adds a dtod (device-to-device) benchmark; 3) Supports scanning NUMA nodes and GPUs inside the benchmark program; 4) Renames gpu-sm-copy to gpu-copy.
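The commit message does not show how the benchmark program discovers NUMA nodes. As a rough illustration of the idea only, the following minimal sketch lists NUMA node IDs on Linux by reading sysfs. The path `/sys/devices/system/node` and the helper names `numa_node_ids` and `scan_numa_nodes` are assumptions made for this example and are not taken from the commit.

```python
import os
import re

# Linux sysfs location for NUMA nodes; an assumption for this sketch,
# not something stated in the commit.
NODE_DIR = "/sys/devices/system/node"


def numa_node_ids(entries):
    """Extract sorted NUMA node IDs from directory entry names such as 'node0'."""
    ids = []
    for name in entries:
        match = re.fullmatch(r"node(\d+)", name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


def scan_numa_nodes(node_dir=NODE_DIR):
    """List NUMA node IDs on this machine, or an empty list if sysfs is unavailable."""
    try:
        return numa_node_ids(os.listdir(node_dir))
    except OSError:
        return []


if __name__ == "__main__":
    print(scan_numa_nodes())
```

Separating the name filtering from the directory listing keeps the parsing logic testable on machines that have no NUMA sysfs entries.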
-
- 29 Oct, 2021 1 commit
-
-
Ziyue Yang authored
**Description** This commit fixes the URL of the ROCm GPG file.
-
- 21 Oct, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Add gpcnet as a git submodule and add its build logic.
**Major Revision**
- Add gpcnet as a submodule
- Add build logic in third_party/Makefile
-
- 26 Sep, 2021 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.3.0 to main.
**Major Revisions**
* Docs - Upgrade version and release note (#209)
* Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
* Benchmarks: Update - Update benchmarks in configuration file (#208)
* CI/CD - Update GitHub Action VM (#211)
* Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
* CI/CD - Fix bug in build image for push event (#205)
* Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
* Tool: Fix bug - Fix function naming issue in system info (#200)
* CI/CD - Push images in GitHub Action (#202)
* Bug - Fix torch.distributed command for single node (#201)
* CLI - Integrate system info for node (#199)
* Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
* CI/CD - Add ROCm image build in GitHub Actions (#194)
* Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
* Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
* Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
* Bug - Revise 'docker run' in sb deploy (#195)
* Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190)

Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
-
- 02 Sep, 2021 1 commit
-
-
Yifan Xiong authored
**Description** Resolve the "too many open files" issue when running NCCL/RCCL on multiple nodes using Docker images by increasing the nofile limit in limits.conf.
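One way to check that a raised limit is actually in effect is to compare the nofile rules in a limits.conf-style file with the limit the running process sees. The sketch below is a minimal illustration, not part of the commit: `parse_nofile_entries` and `current_nofile_limit` are hypothetical helper names, and the sample values are illustrative rather than the numbers the commit sets.

```python
import resource


def parse_nofile_entries(text):
    """Return (domain, type, value) tuples for nofile rules in limits.conf-style text."""
    entries = []
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # limits.conf rules have four fields: <domain> <type> <item> <value>
        if len(fields) == 4 and fields[2] == "nofile":
            entries.append((fields[0], fields[1], fields[3]))
    return entries


def current_nofile_limit():
    """Return the (soft, hard) open-file limit of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # Illustrative values only; not the values used in the commit.
    sample = """
    * soft nofile 1048576
    * hard nofile 1048576
    root soft nproc 4096
    """
    print(parse_nofile_entries(sample))
    print(current_nofile_limit())
```

Running the script inside the container shows whether the process limit matches the configured rules; a mismatch usually means the limits file was not applied to that session.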
-
- 01 Sep, 2021 2 commits
- 31 Aug, 2021 1 commit
-
-
guoshzhao authored
**Description** Add Dockerfiles `rocm4.0-pytorch1.7.0.dockerfile` and `rocm4.2-pytorch1.7.0.dockerfile` for the `rocm` platform.
-
- 29 Jul, 2021 1 commit
-
-
Yuting Jiang authored
**Description** Support rocm in third_party/makefile.
**Major Revision**
- Split rocm and cuda target in makefile
- Add target in dockerfile
-
- 16 Jul, 2021 2 commits
-
-
Yuting Jiang authored
Add perftest as a submodule and add build logic
-
Yuting Jiang authored
Benchmarks: Build Pipeline - Add nccl-tests as a submodule and add build logic.
-
- 16 Jun, 2021 1 commit
-
-
Yifan Xiong authored
Update packages and add build cache for CUDA 11.1.1 Dockerfile:
* Remove duplicate cmake and ompi, which are already in base image
* Add hpcx and sharp lib
* Add cache for gitmodules build
* Sort apt-get packages
-
- 01 Jun, 2021 2 commits
- 18 May, 2021 1 commit
-
-
guoshzhao authored
* call build script in Makefile.
* add cppbuild command for testing and docker env.
-
- 17 May, 2021 1 commit
-
-
Yifan Xiong authored
* add GitHub Action to build and push image
* update Dockerfile to copy from context
-
- 14 Apr, 2021 1 commit
-
-
Yifan Xiong authored
* Rename dev branch to main and set it as default.
-
- 13 Apr, 2021 1 commit
-
-
Yifan Xiong authored
* fix missing package in dockerfile
* update benchmark list and parameters
* catch runtime errors
* refine logging info
-
- 12 Apr, 2021 1 commit
-
-
Yifan Xiong authored
* add cuda11.1.1 dockerfile
-