    Bugfix - gpu_stream: remove ROCm build support, require CUDA with NVML (#789) · 3c95714f
    Hongtao Zhang authored
    
    
    Summary
    
    The gpu_stream benchmark has NVIDIA-specific dependencies that prevent
    it from compiling on ROCm 6.3+. This change makes it CUDA-only,
    gracefully skipping the build with a warning on non-NVIDIA environments.
    
    Problem
    
    The gpu_stream benchmark fails to compile on ROCm 6.3+ due to multiple
    NVIDIA-specific dependencies:
    
    1. nvml.h: NVIDIA Management Library header, used for querying actual
    memory clock rates. It has no HIP equivalent. Referenced in
    gpu_stream.cu and gpu_stream_utils.hpp.
    2. cuda.h in headers: Three .hpp files (gpu_stream.hpp,
    gpu_stream_kernels.hpp, gpu_stream_utils.hpp) directly include <cuda.h>
    and <cuda_runtime.h>. hipify-perl processes only .cu source files, not
    these headers, so the includes fail to resolve on ROCm.
    3. Removed hipDeviceProp_t struct fields: The code accesses
    memoryBusWidth, memoryClockRate, and ECCEnabled from the device
    properties struct. These fields were removed from hipDeviceProp_t in
    ROCm 6.3, causing compilation errors after hipification.
    
    The existing ROCm path was marked as incomplete (# TODO: test for ROC)
    and was never fully functional on recent ROCm versions.
    
    Changes
    
    - Removed the non-functional ROCm/HIP build path from
    gpu_stream/CMakeLists.txt
    - When CUDA is not found, prints a warning and returns gracefully
    instead of attempting a broken hipify build or raising FATAL_ERROR
    - No changes to the NVIDIA/CUDA build path — it continues to work as
    before
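    
    A minimal sketch of the guard described above, not the actual contents
    of gpu_stream/CMakeLists.txt. It assumes detection via CMake's
    CheckLanguage module and linking NVML through the CUDA::nvml target
    from FindCUDAToolkit. The source list, warning text, and install
    destination are illustrative.

    ```cmake
    cmake_minimum_required(VERSION 3.18)
    project(gpu_stream LANGUAGES CXX)

    # Probe for a CUDA compiler without failing the configure step.
    include(CheckLanguage)
    check_language(CUDA)

    if(NOT CMAKE_CUDA_COMPILER)
        # Non-NVIDIA environment (for example, a ROCm-only image):
        # warn and stop processing this file so no targets are defined.
        message(WARNING "CUDA not found, skipping gpu_stream build.")
        return()
    endif()

    enable_language(CUDA)

    # CUDAToolkit provides the CUDA::nvml imported target (CMake 3.17+).
    find_package(CUDAToolkit REQUIRED)

    # Illustrative source list; the real target may include more files.
    add_executable(gpu_stream gpu_stream.cu)
    target_link_libraries(gpu_stream PRIVATE CUDA::nvml)
    install(TARGETS gpu_stream RUNTIME DESTINATION bin)
    ```

    Because return() ends processing before any target is defined, the
    configure step still succeeds and a subsequent build of this directory
    has nothing to compile.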
    
    Impact
    
    - NVIDIA builds: No change. gpu_stream builds and installs normally.
    - ROCm builds: gpu_stream is skipped with a warning message. Previously
    it would fail the entire make cppbuild step, blocking the Docker image
    build.
    - Other benchmarks: Unaffected. build.sh continues to the next
    benchmark after gpu_stream returns.
    
    Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
test_gpu_stream.py 4.88 KB