- 28 Jul, 2024 1 commit
-
-
Yuting Jiang authored
**Description** Fix MSCCL build error in CUDA12.4 docker build pipeline due to OOM issue.
-
- 22 Apr, 2024 1 commit
-
-
Yuting Jiang authored
**Description** Add CUDA 12.4 dockerfile. **Major Revision** - upgrade nvidia docker into 23.04 **Minor Revision** - upgrade hpcx into 2.18
-
- 18 Apr, 2024 1 commit
-
-
Yuting Jiang authored
**Description** Upgrade mlc to v3.11.
-
- 21 Mar, 2024 1 commit
-
-
Yang Wang authored
**Description** Cuda 12.2 image will report undfined symbol error due to incomplete LD_LIBRARY_PATH:  ### How to reproduce: 1. Deploy sb with cuda12.2 image ``` sb deploy -f local.ini -i superbench/superbench:v0.10.0-cuda12.2 ``` 2. Enter to the container ``` sudo docker exec -it sb-workspace bash ``` 3. Execute `mpirun`: ``` root@sb-container:~# mpirun mpirun: symbol lookup error: mpirun: undefined symbol: opal_libevent2022_event_base_loop ``` ### Fix to fix * Append hpcx_load into /etc/bash.bashrc for updaing env LD_LIBRARY_PATH in each time ---------
-
- 08 Jan, 2024 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.10.0 to main. **Major Revisions** * Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590 * Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591 * Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592 * Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595 * Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596 * CI/CD - Add ndv5 topo file #597 * Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593 * Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599 * Dockerfile - Bug fix for rocm docker build and deploy #598 * Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603 * Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604 * Monitor - Upgrade pyrsmi to amdsmi python library. #601 * Benchmarks: Micro benchmarks - add fp8 and initialization for hipblaslt benchmark #605 * Dockerfile - Add rocm6.0 dockerfile #602 * Bug Fix - Bug fix for latest megatron-lm benchmark #600 * Docs - Upgrade version and release note #606 Co-authored-by:
Ziyue Yang <ziyyang@microsoft.com> Co-authored-by:
Yang Wang <yangwang1@microsoft.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com> Co-authored-by:
guoshzhao <guzhao@microsoft.com>
-
- 09 Dec, 2023 1 commit
-
-
Yuting Jiang authored
**Description** upgrade to rocm5.7 dockerfile. --------- Co-authored-by:yukirora <yuting.jiang@microsoft.com>
-
- 07 Dec, 2023 2 commits
-
-
Ziyue Yang authored
**Description** Add MSCCL support for Nvidia GPU
-
Yuting Jiang authored
**Description** Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark
-
- 22 Nov, 2023 3 commits
-
-
Yifan Xiong authored
Upgrade Docker image to CUDA 12.2 for H100: * upgrade base image to 23.10 * fix onnxruntime version in python3.10 * fix compilation errors
-
Yuting Jiang authored
**Description** hipblaslt function benchmark and rebase cublaslt function benchmark.
-
guoshzhao authored
**Description** Generate baseline given results from multiple nodes. **Major Revision** - Add sub command `sb result generate-baseline` - Add UT and docs --------- Co-authored-by:
454314380 <454314380@qq.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 23 Oct, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Update mlc version into 3.10 for cuda and rocm dockerfiles to be consistent with cuda12 dockerfile Co-authored-by:yukirora <yuting.jiang@microsoft.com>
-
- 22 Aug, 2023 1 commit
-
-
Yuting Jiang authored
**Description** source code for evaluating NVDEC decoding performance. --------- Co-authored-by:yukirora <yuting.jiang@microsoft.com>
-
- 06 Jul, 2023 1 commit
-
-
Yuting Jiang authored
**Description** add python code for DirectXGPUEncodingLatency.
-
- 03 Jul, 2023 1 commit
-
-
Yuting Jiang authored
**Description** add AMF in third party and build AMF encoding latency test.
-
- 28 Jun, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Add dockerfile for win10 and building script for directx_benchmarks. **Major Revision** - Add docker file for win10 and required scripts to install the dependency - Add building script to build all directx vs benchmarks - Add call of building script in Makefile --------- Co-authored-by:
yukirora <yuting.jiang@microsoft.com> Co-authored-by:
Yifan Xiong <yifan.xiong@microsoft.com>
-
- 14 Apr, 2023 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.8.0 to main. **Major Revisions** * Monitor - Fix the cgroup version checking logic (#502) * Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503) * Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505) * Analyzer: Fix bug in python3.8 due to pandas api change (#504) * Bug - Fix bug to get metric from cmd when error happens (#506) * Monitor - Collect realtime GPU power when benchmarking (#507) * Add num_workers argument in model benchmark (#511) * Remove unreachable condition when write host list (#512) * Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513) * Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515) * Docs - Upgrade version and release note (#508) Co-authored-by:
guoshzhao <guzhao@microsoft.com> Co-authored-by:
Ziyue Yang <ziyyang@microsoft.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 21 Mar, 2023 1 commit
-
-
rafsalas19 authored
**Description** - Adding HPL benchmark --------- Co-authored-by:
Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net> Co-authored-by:
Peng Cheng <chengpeng5555@outlook.com>
-
- 06 Mar, 2023 1 commit
-
-
Yifan Xiong authored
Pin setuptools version to [v65.7.0](https://setuptools.pypa.io/en/latest/history.html#v65-7-0) to avoid breaking changes since v66.0.0.
-
- 13 Feb, 2023 1 commit
-
-
rafsalas19 authored
**Description** - Added stream benchmark - Added stream unit test - Added stream example - Modified docker files to build stream --------- Co-authored-by:
Ubuntu <azureuser@sbtestvm.jzlku1oskncengjiado35wf1hd.ax.internal.cloudapp.net> Co-authored-by:
Peng Cheng <chengpeng5555@outlook.com> Co-authored-by:
Yifan Xiong <xiongyf@yandex.com>
-
- 07 Feb, 2023 1 commit
-
- 29 Dec, 2022 1 commit
-
-
Yifan Xiong authored
Add Docker image for arch90 NVIDIA GPUs: * add CUDA11.8 Dockerfile * update archs in Makefile and benchmarks accordingly * update image build pipeline
-
- 31 Oct, 2022 1 commit
-
-
Yifan Xiong authored
Update version to include revision hash and date in "{last tag}+g{git hash}.d{date}" format, here're the examples: * exact tag: 0.6.0 * commit after tag: 0.6.0+gcbb1b34 * commit after tag with local changes: 0.6.0+gcbb1b34.d20221028
-
- 06 Sep, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.6.0 to main. **Major Revisions** * Enable latency test in ib traffic validation distributed benchmark (#396) * Enhance parameter parsing to allow spaces in value (#397) * Update apt packages in dockerfile (#398) * Upgrade colorlog for NO_COLOR support (#404) * Analyzer - Update error handling to support exit code of sb result diagnosis (#403) * Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399) * Enhance timeout cleanup to avoid possible hanging (#405) * Auto generate ibstat file by pssh (#402) * Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406) * Docs - Upgrade version and release note (#407) * Docs - Fix issues in document (#408) Co-authored-by:
Yang Wang <yangwang1@microsoft.com> Co-authored-by:
Yuting Jiang <yutingjiang@microsoft.com>
-
- 17 Aug, 2022 1 commit
-
-
Yifan Xiong authored
__Description__ Update Python setup for require packages. __Major Revisions__ * downgrade requests version to be compatible with python 3.6, add corresponding pipeline for 3.6 * add extra entry in extras_require for nested packages * update `pip install` contents accordingly
-
- 13 Aug, 2022 1 commit
-
-
Yang Wang authored
An enhancement for topo-aware IB performance validation #373. This PR will auto-generate a required ibstate file `ib_traffic_topo_aware_ibstat.txt` which is used as input to build a graph.
-
- 13 Jul, 2022 1 commit
-
-
Yifan Xiong authored
Add dependencies * include ndv4-topo.xml in cuda docker images * require requests version to avoid RequestsDependencyWarning
-
- 06 Jul, 2022 1 commit
-
-
Yifan Xiong authored
Update dependencies and Dockerfile: * upgrade nccl-tests and rccl-tests to current latest version to match NCCL/RCCL versions * unify image tag names on DockerHub * remove verbose output in Dockerfile and minor fix some flags
-
- 24 Jun, 2022 2 commits
-
-
Yifan Xiong authored
Fix incorrect ulimit nofile config in Dockerfile. Instead of bash, sh is used by default where `echo` does not accept any parameters and `-e` is written into /etc/security/limits.conf.
-
Yifan Xiong authored
**Description** Support multiple IB/GPU devices run simultaneously in ib validation benchmark. **Major Revisions** - Revise ib_validation_performance.cc so that multiple processes per node could be used to launch multiple perftest commands simultaneously. For each node pair in the config, number of processes per node will run in parallel. - Revise ib_validation_performance.py to correct file paths and adjust parameters to specify different NICs/GPUs/NUMA nodes. - Fix env issues in Dockerfile for end-to-end test. - Update ib-traffic configuration examples in config files. - Update unit tests and docs accordingly. Closes #326.
-
- 19 Jun, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Update ROCm Dockerfile. **Major Revisions** - Add dockerfile for ROCm 5.1.3 - Merge 5.1.x and 5.0.x dockerfile - Remove 4.2 and 4.0 legacy - Update build pipeline accordingly
-
- 15 Jun, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Fix cmake and build issues. **Major Revision** * Remove unnecessary boost build * Remove user-agent for mlc * Remove -j for third party to build each project in sequence * Fix ansible collections installation path
-
- 31 May, 2022 1 commit
-
-
user4543 authored
**Description** Add support to run sb command inside docker image - install missing dependency.
-
- 27 May, 2022 1 commit
-
-
user4543 authored
**Description** Update rccl version and fix issue in rocm5.1.1 dockerfile.
-
- 25 May, 2022 1 commit
-
-
user4543 authored
**Description** Add dockerfile for rocm5.1.1.
-
- 28 Feb, 2022 1 commit
-
-
user4543 authored
**Description** Add dockerfile for rocm5.0.1.
-
- 25 Feb, 2022 1 commit
-
-
user4543 authored
**Description** Add rocm5.0 dockerfile.
-
- 08 Feb, 2022 1 commit
-
-
Ziyue Yang authored
This commit adds GDR-only nccl-tests for Nvidia machines. Also bump NCCL to v2.10.3-1 to achieve peak performance in this test.
-
- 30 Dec, 2021 1 commit
-
-
Yifan Xiong authored
__Description__ Cherry-pick bug fixes from v0.4.0 to main. __Major Revisions__ * Bug - Fix issues for Ansible and benchmarks (#267) * Tests - Refine test cases for microbenchmark (#268) * Bug - Build openmpi with ucx support in rocm dockerfiles (#269) * Benchmarks: Fix Bug - Fix fio build issue (#272) * Docs - Unify metric and add doc for cublas and cudnn functions (#271) * Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274) * Bug - Fix bug of detecting if gpu_index is none (#275) * Bug - Fix bugs in data diagnosis (#273) * Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270) * Benchmarks: Configuration - Update inference and network benchmarks in configs (#276) * Docs - Upgrade version and release note (#277) Co-authored-by:Yuting Jiang <v-yutjiang@microsoft.com>
-
- 13 Dec, 2021 1 commit
-
-
Hossein Pourreza authored
**Description** Add mlc memory bandwidth and latency micro benchmark to Superbench. **Major Revision** - Add mlc benchmark with test and example files
-