tests/benchmarks/micro_benchmarks/test_gpu_stream.py · 3c95714f9483f449b0d01b82c23fe890c397284f · tsoc / superbenchmark

Bugfix - gpu_stream: remove ROCm build support, require CUDA with NVML (#789) · 3c95714f

Hongtao Zhang authored Apr 21, 2026



Summary

The gpu_stream benchmark has NVIDIA-specific dependencies that prevent
it from compiling on ROCm 6.3+. This change makes it CUDA-only: in
non-NVIDIA environments the build is skipped with a warning instead of
failing.

Problem

The gpu_stream benchmark fails to compile on ROCm 6.3+ due to multiple
NVIDIA-specific dependencies:

1. nvml.h: the NVIDIA Management Library header, used for querying
actual memory clock rates. It has no HIP equivalent. Referenced in
gpu_stream.cu and gpu_stream_utils.hpp.
2. cuda.h in headers: three .hpp files (gpu_stream.hpp,
gpu_stream_kernels.hpp, gpu_stream_utils.hpp) directly include <cuda.h>
and <cuda_runtime.h>. These .hpp files are not processed by hipify-perl
(only .cu source files are), so the includes fail to resolve on ROCm.
3. Removed hipDeviceProp_t struct fields: the code accesses
memoryBusWidth, memoryClockRate, and ECCEnabled from the device
properties struct. These fields were removed from hipDeviceProp_t in
ROCm 6.3, causing compilation errors after hipification.

The existing ROCm path was marked as incomplete (# TODO: test for ROC)
and was never fully functional on recent ROCm versions.

Changes

- Removed the non-functional ROCm/HIP build path from
gpu_stream/CMakeLists.txt
- When CUDA is not found, prints a warning and returns gracefully
instead of attempting a broken hipify build or raising FATAL_ERROR
- No changes to the NVIDIA/CUDA build path — it continues to work as
before
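
For illustration, the guard described above might look roughly like the
following in gpu_stream/CMakeLists.txt. This is a minimal sketch, not
the actual file contents: the use of find_package(CUDAToolkit), the
warning text, the minimum CMake version, and the target and install
layout are all assumptions.

```cmake
cmake_minimum_required(VERSION 3.18)
project(gpu_stream LANGUAGES CXX)

# Assumption: CUDA is detected via CMake's FindCUDAToolkit module.
find_package(CUDAToolkit QUIET)

if(NOT CUDAToolkit_FOUND)
    # Skip instead of failing, so the caller can move on to the next benchmark.
    # (Hypothetical message text.)
    message(WARNING "CUDA not found, skipping gpu_stream build.")
    return()
endif()

# CUDA path: build the benchmark as before (source list is illustrative).
enable_language(CUDA)
add_executable(gpu_stream gpu_stream.cu)

# NVML is needed for querying memory clock rates, hence the hard CUDA requirement.
target_link_libraries(gpu_stream PRIVATE CUDA::cudart CUDA::nvml)
install(TARGETS gpu_stream RUNTIME DESTINATION bin)
```

Because return() ends processing of the current CMakeLists.txt without
raising an error, configuration still succeeds with no gpu_stream
target defined, which is what would let the surrounding build continue.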

Impact

- NVIDIA builds: No change. gpu_stream builds and installs normally.
- ROCm builds: gpu_stream is skipped with a warning message. Previously
it would fail the entire make cppbuild step, blocking the Docker image
build.
- Other benchmarks: Unaffected. build.sh continues to the next
benchmark after gpu_stream returns.

Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>


test_gpu_stream.py 4.88 KB
