- 06 Aug, 2025 1 commit
-
-
Hongtao Zhang authored
**Description** Merge ARM64 and AMD64 images into a single multi-architecture Docker manifest under one artifact namespace.

Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
-
- 25 Jun, 2025 1 commit
-
-
guoshzhao authored
**Description** Add CUDA 12.9 Dockerfile and build in pipeline.

---------

Co-authored-by: Guoshuai Zhao <microsoft@microsoft.com>
Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
Co-authored-by: Hongtao Zhang <garyworkzht@gmail.com>
-
- 05 Jun, 2025 1 commit
-
-
Yifan Xiong authored
Update CODEOWNERS.
-
- 30 Apr, 2025 1 commit
-
-
Hongtao Zhang authored
- Upgrade the OS of the GitHub runner used by lint to the latest version.
- Add a symbolic link for clang-format to version 14.
- Update the importlib_metadata version, since the one inside nvcr.io/nvidia/pytorch:20.12-py3 is too old and failed the 11.1 build.

---------

Co-authored-by: hongtaozhang <hongtaozhang@microsoft.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
-
- 09 Apr, 2025 1 commit
-
-
Yifan Xiong authored
Merge multi-arch image in build pipeline.
-
- 21 Mar, 2025 1 commit
-
-
pdr authored
**Description** Update the Docker image for CUDA 12.8:
- Use the latest CUTLASS release (3.8) with arch 100 (Blackwell) support
- Add the latest nccl-tests release with arch 100 (Blackwell) support
- Update MSCCL to support builds for sm_100

No breaking changes, so this is backward compatible; tested with CUDA 12.4.

---------

Co-authored-by: Hongtao Zhang <garyworkzht@gmail.com>
-
- 12 Mar, 2025 1 commit
-
-
Hongtao Zhang authored
Work around the matrix strategy’s default "fail-fast" setting. In GitHub Actions, the individual configurations of a matrix job run in parallel. By default, if one matrix job fails (for example, the one labeled "rocm6_2_rocm6_2_x_superbe"), the remaining parallel jobs are canceled automatically. In our current image build pipeline, the arm64 build job is always canceled because the ROCm build job fails. As a temporary solution, use a non-existent label in the job config to prevent the ROCm build job from being scheduled.

---------

Co-authored-by: hongtaozhang <hongtaozhang@microsoft.com>
-
- 07 Mar, 2025 1 commit
-
-
Yifan Xiong authored
Add image build on arm64 arch.
-
- 21 Nov, 2024 1 commit
-
-
Yifan Xiong authored
Update CODEOWNERS for docs.
-
- 06 Nov, 2024 1 commit
-
-
pdr authored
Add support for arm64 build:
- Update Dockerfile for arm64 build
- Extend CPU stream compilation for Neoverse
- Handle onnxruntime-gpu installation
- Filter third-party builds based on arch
- Disable CUDA decode perf build for non-x86
-
- 02 Nov, 2024 1 commit
-
-
Yifan Xiong authored
**Description** Update image build.

**Major Revision**
* Remove ROCm 6.0 image due to outdated packages
* Remove build tag for ROCm
* Preserve build cache for 30 days
-
- 10 Oct, 2024 1 commit
-
-
Yuting Jiang authored
**Description** Cherry-pick bug fixes from v0.11.0 to main.

**Major Revision**
* #645
* #648
* #646
* #647
* #651
* #652
* #650

---------

Co-authored-by: hongtaozhang <hongtaozhang@microsoft.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
-
- 28 Jul, 2024 1 commit
-
-
Yuting Jiang authored
**Description** Fix MSCCL build error in CUDA12.4 docker build pipeline due to OOM issue.
-
- 22 Apr, 2024 1 commit
-
-
Yuting Jiang authored
**Description** Add CUDA 12.4 dockerfile.

**Major Revision**
- Upgrade nvidia docker to 23.04

**Minor Revision**
- Upgrade hpcx to 2.18
-
- 08 Jan, 2024 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.10.0 to main.

**Major Revisions**
* Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590
* Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591
* Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592
* Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595
* Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596
* CI/CD - Add ndv5 topo file #597
* Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593
* Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599
* Dockerfile - Bug fix for rocm docker build and deploy #598
* Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603
* Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604
* Monitor - Upgrade pyrsmi to amdsmi python library. #601
* Benchmarks: Micro benchmarks - add fp8 and initialization for hipblaslt benchmark #605
* Dockerfile - Add rocm6.0 dockerfile #602
* Bug Fix - Bug fix for latest megatron-lm benchmark #600
* Docs - Upgrade version and release note #606

Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
Co-authored-by: guoshzhao <guzhao@microsoft.com>
-
- 09 Dec, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Upgrade to rocm5.7 dockerfile.

---------

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
-
- 07 Dec, 2023 1 commit
-
-
Ziyue Yang authored
**Description** Add MSCCL support for Nvidia GPU
-
- 22 Nov, 2023 1 commit
-
-
Yifan Xiong authored
Upgrade Docker image to CUDA 12.2 for H100:
* upgrade base image to 23.10
* fix onnxruntime version in python3.10
* fix compilation errors
-
- 22 Aug, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Source code for evaluating NVDEC decoding performance.

---------

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
-
- 18 Aug, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Add source code for DirectXRenderPerf.

---------

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
-
- 27 Jul, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Cherry-pick bug fixes from v0.9.0 to main.

**Major Revision**
- CI/CD: pipeline - clean more disk space to fix rocm building image pipeline (#555)
- Benchmarks: bug fix - use absolute path for input file in DirectXEncodingLatency (#554)
- CI/CD - add push win docker image on release branch in pipeline (#552)
- Docs - Upgrade version and release note (#557)
-
- 05 Jul, 2023 3 commits
-
-
Yuting Jiang authored
**Description** Add Python code for DirectXGPUMemBw.
-
Yuting Jiang authored
**Description** Add Python code for DirectX core flops and init DirectX test pipeline.

**Major Revision**
- Add Python code for DirectX core flops
- Init DirectX test pipeline

**Minor Revision**
- Add test for DirectX core flops
-
Yuting Jiang authored
**Description** Support DirectX test pipeline.
-
- 28 Jun, 2023 1 commit
-
-
Yuting Jiang authored
**Description** Add dockerfile for win10 and build script for directx_benchmarks.

**Major Revision**
- Add dockerfile for win10 and the scripts required to install the dependencies
- Add build script to build all DirectX VS benchmarks
- Call the build script from the Makefile

---------

Co-authored-by: yukirora <yuting.jiang@microsoft.com>
Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
-
- 14 Apr, 2023 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.8.0 to main.

**Major Revisions**
* Monitor - Fix the cgroup version checking logic (#502)
* Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
* Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505)
* Analyzer: Fix bug in python3.8 due to pandas api change (#504)
* Bug - Fix bug to get metric from cmd when error happens (#506)
* Monitor - Collect realtime GPU power when benchmarking (#507)
* Add num_workers argument in model benchmark (#511)
* Remove unreachable condition when write host list (#512)
* Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
* Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
* Docs - Upgrade version and release note (#508)

Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
-
- 23 Feb, 2023 1 commit
-
-
Yifan Xiong authored
Free more disk space in GitHub Action VHD.
-
- 29 Dec, 2022 1 commit
-
-
Yifan Xiong authored
Add Docker image for arch90 NVIDIA GPUs:
* add CUDA11.8 Dockerfile
* update archs in Makefile and benchmarks accordingly
* update image build pipeline
-
- 18 Oct, 2022 1 commit
-
-
Yuting Jiang authored
Benchmarks - Add support for a list of custom config strings in cudnn-functions and cublas-functions (#414)

**Description** Add support for a list of custom config strings in cudnn-functions and cublas-functions.
-
- 06 Jul, 2022 1 commit
-
-
Yifan Xiong authored
Update dependencies and Dockerfile:
* upgrade nccl-tests and rccl-tests to the latest versions to match the NCCL/RCCL versions
* unify image tag names on DockerHub
* remove verbose output in Dockerfile and fix some flags
-
- 19 Jun, 2022 1 commit
-
-
Yifan Xiong authored
**Description** Update ROCm Dockerfile.

**Major Revisions**
- Add dockerfile for ROCm 5.1.3
- Merge 5.1.x and 5.0.x dockerfile
- Remove 4.2 and 4.0 legacy
- Update build pipeline accordingly
-
- 25 May, 2022 1 commit
-
-
user4543 authored
**Description** Add dockerfile for rocm5.1.1.
-
- 28 Feb, 2022 1 commit
-
-
user4543 authored
**Description** Add dockerfile for rocm5.0.1.
-
- 25 Feb, 2022 1 commit
-
-
user4543 authored
**Description** Add rocm5.0 dockerfile.
-
- 08 Feb, 2022 1 commit
-
-
Ziyue Yang authored
This commit adds GDR-only nccl-tests for Nvidia machines. Also bump NCCL to v2.10.3-1 to achieve peak performance in this test.
-
- 12 Oct, 2021 1 commit
-
-
Yifan Xiong authored
Disable dependabot version update, allow security update only. Reference: https://docs.github.com/en/code-security/supply-chain-security/keeping-your-dependencies-updated-automatically/configuration-options-for-dependency-updates#open-pull-requests-limit.
-
- 11 Oct, 2021 1 commit
-
-
Yifan Xiong authored
Add code security scanning.

__Major Revisions__
* enable dependabot auto updates
* scan code with CodeQL
-
- 26 Sep, 2021 1 commit
-
-
Yifan Xiong authored
**Description** Cherry-pick bug fixes from v0.3.0 to main.

**Major Revisions**
* Docs - Upgrade version and release note (#209)
* Benchmarks: Build Pipeline - Update rccl-test git submodule to dc1ad48 (#210)
* Benchmarks: Update - Update benchmarks in configuration file (#208)
* CI/CD - Update GitHub Action VM (#211)
* Benchmarks: Fix Bug - Fix wrong parameters for gpu-sm-copy-bw in configuration examples (#203)
* CI/CD - Fix bug in build image for push event (#205)
* Benchmark: Fix Bug - fix error message of communication-computation-overlap (#204)
* Tool: Fix bug - Fix function naming issue in system info (#200)
* CI/CD - Push images in GitHub Action (#202)
* Bug - Fix torch.distributed command for single node (#201)
* CLI - Integrate system info for node (#199)
* Benchmarks: Code Revision - Revise CMake files for microbenchmarks. (#196)
* CI/CD - Add ROCm image build in GitHub Actions (#194)
* Bug: Fix bug - fix bug of hipBusBandwidth build (#193)
* Benchmarks: Build Pipeline - Restore rocblas build logic (#197)
* Bug: Fix Bug - Add barrier before 'destroy_process_group' in model benchmarks (#198)
* Bug - Revise 'docker run' in sb deploy (#195)
* Bug - Fix Bug : fix bug of error param operations to operation in rccl-bw of hpe config (#190)

Co-authored-by: Yuting Jiang <v-yujiang@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
-
- 09 Jul, 2021 1 commit
-
-
guoshzhao authored
* Bug Fix - Fix race condition issue for multi ranks (#117)
  Fix the race condition that occurs when multiple ranks rotate the same directory.
* Update pipeline for release branch (#122)
* Bug Fix - Fix bug when converting bool config to store_true argument. (#120)

Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
-
- 25 Jun, 2021 1 commit
-
-
Yifan Xiong authored
* Initialize SuperBench website.
* Add GitHub Actions to automatically build and publish.
-