1. 21 Apr, 2026 1 commit
    • Bugfix - gpu_stream: remove ROCm build support, require CUDA with NVML (#789) · 3c95714f
      Hongtao Zhang authored
      
      
      Summary
      
      The gpu_stream benchmark has NVIDIA-specific dependencies that prevent
      it from compiling on ROCm 6.3+. This change makes it CUDA-only,
      gracefully skipping the build with a warning in non-NVIDIA environments.
      
      Problem
      
      The gpu_stream benchmark fails to compile on ROCm 6.3+ due to multiple
      NVIDIA-specific dependencies:
      
      1. nvml.h — NVIDIA Management Library header, used for querying actual
      memory clock rates. No HIP equivalent. Referenced in gpu_stream.cu and
      gpu_stream_utils.hpp.
      2. cuda.h in headers — Three .hpp files (gpu_stream.hpp,
      gpu_stream_kernels.hpp, gpu_stream_utils.hpp) directly include <cuda.h>
      and <cuda_runtime.h>. These headers are not processed by hipify-perl
      (only .cu source files are), so they fail to resolve on ROCm.
      3. Deprecated hipDeviceProp_t struct fields — The code accesses
      memoryBusWidth, memoryClockRate, and ECCEnabled from the device
      properties struct. These fields were removed from hipDeviceProp_t in
      ROCm 6.3, causing compilation errors after hipification.
      
      The existing ROCm path was marked as incomplete (# TODO: test for ROC)
      and was never fully functional on recent ROCm versions.
      
      Changes
      
      - Removed the non-functional ROCm/HIP build path from
      gpu_stream/CMakeLists.txt
      - When CUDA is not found, prints a warning and returns gracefully
      instead of attempting a broken hipify build or raising FATAL_ERROR
      - No changes to the NVIDIA/CUDA build path — it continues to work as
      before
      
      Impact

      - NVIDIA builds: No change — gpu_stream builds and installs normally
      - ROCm builds: gpu_stream is skipped with a warning message. Previously
      it would fail the entire make cppbuild step, blocking the Docker image
      build
      - Other benchmarks: Unaffected — build.sh continues to the next
      benchmark after gpu_stream returns
      Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
2. 18 Jun, 2025 1 commit
    • Benchmarks - Add GPU Stream Micro Benchmark (#697) · 4eddd50a
      WenqingLan1 authored
      Added the GPU Stream benchmark, which measures GPU memory bandwidth and
      efficiency for the double data type through memory operations
      including copy, scale, add, and triad.
      - Added documentation for `gpu-stream` covering its introduction,
      metrics, and descriptions.
      - Added unit tests for `gpu-stream`. Example output is in
      `superbenchmark/tests/data/gpu_stream.log`.