  1. 23 Apr, 2026 1 commit
    • Benchmarks: Add gpu-hpl and gpu-hpl-mxp micro benchmarks (#15) · 4fa10f4d
      one authored
      Add gpu-hpl and gpu-hpl-mxp micro benchmarks backed by rocHPL and rocHPL-MxP.
      
      Implement a shared GPU HPL base that:
      - Generates per-workload HPL dat files and parses the corresponding output files.
      - Supports common HPL inputs such as process grid, matrix size, block size, broadcast topology, warmup, iterations, and reduce operator.
      - Adds rocHPL-specific tuning parameters for gpu-hpl.
      - Formats metric keys from input-derived workload attributes.
      - Reports `flops`, `time`, and `tests_pass` metrics with warmup-aware aggregation.
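
The warmup-aware aggregation mentioned above might look roughly like the sketch below. It assumes that the first `warmup` iterations are discarded before averaging, and that `tests_pass` succeeds only if every measured iteration passed. The function name `aggregate_after_warmup` and the sample values are hypothetical and are not taken from the SuperBench source.

```python
from statistics import mean


def aggregate_after_warmup(samples, warmup):
    """Average the samples that remain after dropping the first `warmup` runs.

    Hypothetical helper: the real benchmark may aggregate differently.
    """
    if warmup < 0:
        raise ValueError("warmup must be non-negative")
    measured = samples[warmup:]
    if not measured:
        raise ValueError("no samples left after discarding warmup runs")
    return mean(measured)


# Made-up per-iteration results: one warmup run followed by three measured runs.
warmup = 1
flops = [80.0, 100.0, 110.0, 120.0]
tests_pass = [1, 1, 1, 1]

result = {
    # Mean of the measured runs only; the warmup run is ignored.
    "flops": aggregate_after_warmup(flops, warmup),
    # Assumption: the workload passes only if every measured run passed.
    "tests_pass": min(tests_pass[warmup:]),
}
print(result)
```

Dropping the warmup runs before averaging keeps one-time startup costs out of the reported figure, which is the usual motivation for a warmup setting.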
      
      Add benchmark registrations, parser tests, sample output fixtures, documentation, and recommended configurations for gpu-hpl and gpu-hpl-mxp.
      
      Update rocHPL and rocHPL-MxP third-party integration with build patches, install targets, and SuperBench run helper scripts.
      
      Also update gpu-hpcg metric naming to use flops instead of gflops, remove standalone domain/verification-style metrics from the documented metric surface, and refresh Hygon HPCG documentation/config references accordingly.
  2. 20 Apr, 2026 1 commit
  3. 01 Apr, 2026 1 commit
  4. 27 Mar, 2026 1 commit
  5. 19 Mar, 2026 1 commit
  6. 17 Mar, 2026 1 commit
  7. 04 Feb, 2026 1 commit
  8. 01 Oct, 2025 1 commit
  9. 30 Sep, 2025 1 commit
    • Benchmarks: Micro benchmark - Add simultaneous all-to-host / host-to-all... · 93e9d262
      Yuting Jiang authored
      Benchmarks: Micro benchmark - Add simultaneous all-to-host / host-to-all bandwidth testcases to nvbandwidth (#736)
      
      **Description**
      Add simultaneous all-to-host / host-to-all bandwidth testcases to
      nvbandwidth.
      
      **Major Revision**
      - nvbandwidth.patch: Add simultaneous all-to-host / host-to-all
      bandwidth testcases to nvbandwidth
      - Upgrade the nvbandwidth submodule to v0.8
      - Add the patch to the Makefile build
  10. 26 Jun, 2025 1 commit
  11. 25 Jun, 2025 1 commit
  12. 14 Jun, 2025 1 commit
    • microbenchmark - CPU Stream Benchmark Revise (#712) · 991c0051
      Hongtao Zhang authored
      
      
      In the current implementation, the CPU‑stream benchmark code renames the
      binary before the microbench base class can verify its existence,
      causing the default‐binary check to fail.
      
      This PR adds a “default” binary—built with the standard compile
      parameters—so that the base class can always find and validate it. Once
      the default binary is in place, the CPU‑stream code will rename it as
      needed and re‑check its presence before running the benchmark.
      
      This PR also enables CPU stream in the default settings.
      
      ---------
      Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
  13. 21 Mar, 2025 1 commit
  14. 21 Nov, 2024 1 commit
  15. 06 Nov, 2024 1 commit
    • Dockerfile - Add support for arm64 build (#660) · 47949127
      pdr authored
      Add support for arm64 build:
      
      - Updated dockerfile for arm64 build
      - extend cpu stream compilation for neoverse 
      - handle onnxruntime-gpu installation
      - third party builds filtering based on arch
      - disable cuda decode perf build for non x86
  16. 28 Jul, 2024 1 commit
  17. 26 Jul, 2024 1 commit
  18. 08 Jan, 2024 1 commit
    • Release - SuperBench v0.10.0 (#607) · 2c88db90
      Yifan Xiong authored
      
      
      **Description**
      
      Cherry-pick bug fixes from v0.10.0 to main.
      
      **Major Revisions**
      
      * Benchmarks: Microbenchmark - Support different hipblasLt data types in dist_inference #590
      * Benchmarks: Microbenchmark - Support in-place for NCCL/RCCL benchmark #591
      * Bug Fix - Fix NUMA Domains Swap Issue in NDv4 Topology File #592
      * Benchmarks: Microbenchmark - Add data type option for NCCL and RCCL tests #595
      * Benchmarks: Bug Fix - Make metrics of dist-inference-cpp aligned with PyTorch version #596
      * CI/CD - Add ndv5 topo file #597
      * Benchmarks: Microbenchmark - Improve AMD GPU P2P performance with fine-grained GPU memory #593
      * Benchmarks: Build Pipeline - fix nccl and nccl test version to 2.18.3 to resolve hang issue in cuda12.2 docker #599
      * Dockerfile - Bug fix for rocm docker build and deploy #598
      * Benchmarks: Microbenchmark - Adapt to hipblasLt data type changes #603
      * Benchmarks: Micro benchmarks - Update hipblaslt metric unit to tflops #604
      * Monitor - Upgrade pyrsmi to amdsmi python library. #601
      * Benchmarks: Micro benchmarks - add fp8 and initialization for hipblaslt benchmark #605
      * Dockerfile - Add rocm6.0 dockerfile #602
      * Bug Fix - Bug fix for latest megatron-lm benchmark #600
      * Docs - Upgrade version and release note #606
      Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
      Co-authored-by: Yang Wang <yangwang1@microsoft.com>
      Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
      Co-authored-by: guoshzhao <guzhao@microsoft.com>
  19. 09 Dec, 2023 1 commit
  20. 07 Dec, 2023 2 commits
  21. 22 Nov, 2023 1 commit
  22. 22 Aug, 2023 1 commit
  23. 27 Jul, 2023 1 commit
    • Release - SuperBench v0.9.0 (#558) · e1df877b
      Yuting Jiang authored
      **Description**
      Cherry-pick bug fixes from v0.9.0 to main.
      
      **Major Revision**
      - CI/CD: pipeline - clean more disk space to fix rocm building image
      pipeline (#555)
      - Benchmarks: bug fix - use absolute path for input file in
      DirectXEncodingLatency (#554)
      - CI/CD - add push win docker image on release branch in pipeline (#552)
      - Docs - Upgrade version and release note (#557)
  24. 03 Jul, 2023 1 commit
  25. 21 Mar, 2023 1 commit
  26. 24 Feb, 2023 1 commit
  27. 13 Feb, 2023 1 commit
  28. 29 Dec, 2022 1 commit
  29. 16 Aug, 2022 1 commit
  30. 06 Jul, 2022 1 commit
    • Update dependencies and Dockerfile (#371) · 9f03d568
      Yifan Xiong authored
      Update dependencies and Dockerfile:
      * upgrade nccl-tests and rccl-tests to the latest versions to match
        the NCCL/RCCL versions
      * unify image tag names on DockerHub
      * remove verbose output in Dockerfile and fix some flags
  31. 19 Jun, 2022 1 commit
    • Update ROCm Dockerfile (#361) · 483bf782
      Yifan Xiong authored
      **Description**
      
      Update ROCm Dockerfile.
      
      **Major Revisions**
      - Add dockerfile for ROCm 5.1.3
      - Merge 5.1.x and 5.0.x dockerfile
      - Remove 4.2 and 4.0 legacy
      - Update build pipeline accordingly
  32. 15 Jun, 2022 1 commit
    • Fix cmake and build issues (#360) · 60a3c743
      Yifan Xiong authored
      **Description**
      
      Fix cmake and build issues.
      
      **Major Revision**
      
      * Remove unnecessary boost build
      * Remove user-agent for mlc
      * Remove -j for third party to build each project in sequence
      * Fix ansible collections installation path
  33. 16 Mar, 2022 1 commit
    • Benchmarks: Add Feature - Add GPU-Burn as microbenchmark (#324) · ff51a3ce
      rafsalas19 authored
      **Description**
      Modifications adding GPU-Burn to SuperBench.
      - added third party submodule
      - modified Makefile to make gpu-burn binary
      - added/modified microbenchmarks to add gpu-burn python scripts
      - modified default and azure_ndv4 configs to add gpu-burn
  34. 24 Feb, 2022 1 commit
  35. 09 Feb, 2022 1 commit
  36. 29 Jan, 2022 1 commit
  37. 30 Dec, 2021 1 commit
    • Release - SuperBench v0.4.0 (#278) · ff563b66
      Yifan Xiong authored
      
      
      __Description__
      
      Cherry-pick bug fixes from v0.4.0 to main.
      
      __Major Revisions__
      
      * Bug - Fix issues for Ansible and benchmarks (#267)
      * Tests - Refine test cases for microbenchmark (#268)
      * Bug - Build openmpi with ucx support in rocm dockerfiles (#269)
      * Benchmarks: Fix Bug - Fix fio build issue (#272)
      * Docs - Unify metric and add doc for cublas and cudnn functions (#271)
      * Monitor: Revision - Add 'monitor/' prefix to monitor metrics in result summary (#274)
      * Bug - Fix bug of detecting if gpu_index is none (#275)
      * Bug - Fix bugs in data diagnosis (#273)
      * Bug - Fix issue that the root mpi rank may not be the first in the hostfile (#270)
      * Benchmarks: Configuration - Update inference and network benchmarks in configs (#276)
      * Docs - Upgrade version and release note (#277)
      Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
  38. 01 Dec, 2021 1 commit
  39. 21 Oct, 2021 1 commit