- 21 Apr, 2026 1 commit
-
-
Hongtao Zhang authored
Summary The gpu_stream benchmark has NVIDIA-specific dependencies that prevent it from compiling on ROCm 6.3+. This change makes it CUDA-only, gracefully skipping the build with a warning on non-NVIDIA environments. Problem The gpu_stream benchmark fails to compile on ROCm 6.3+ due to multiple NVIDIA-specific dependencies: 1. nvml.h — NVIDIA Management Library header, used for querying actual memory clock rates. No HIP equivalent. Referenced in gpu_stream.cu and gpu_stream_utils.hpp. 2. cuda.h in headers — Three .hpp files (gpu_stream.hpp, gpu_stream_kernels.hpp, gpu_stream_utils.hpp) directly include <cuda.h> and <cuda_runtime.h>. These headers are not processed by hipify-perl (only .cu source files are), so they fail to resolve on ROCm. 3. Deprecated hipDeviceProp_t struct fields — The code accesses memoryBusWidth, memoryClockRate, and ECCEnabled from the device properties struct. These fields were removed from hipDeviceProp_t in ROCm 6.3, causing compilation errors after hipification. The existing ROCm path was marked as incomplete (# TODO: test for ROC) and was never fully functional on recent ROCm versions. Changes - Removed the non-functional ROCm/HIP build path from gpu_stream/CMakeLists.txt - When CUDA is not found, prints a warning and returns gracefully instead of attempting a broken hipify build or raising FATAL_ERROR - No changes to the NVIDIA/CUDA build path — it continues to work as before Impact - NVIDIA builds: No change — gpu_stream builds and installs normally - ROCm builds: gpu_stream is skipped with a warning message. Previously it would fail the entire make cppbuild step, blocking the Docker image build - Other benchmarks: Unaffected — build.sh continues to the next benchmark after gpu_stream returns Co-authored-by:Hongtao Zhang <hongtaozhang@microsoft.com>
-
- 18 Jun, 2025 1 commit
-
-
WenqingLan1 authored
Added GPU Stream benchmark - measures the GPU memory bandwidth and efficiency for double datatype through various memory operations including copy, scale, add, and triad. - added documentation for `gpu-stream` detailing its introduction, metrics, and descriptions. - added unit tests for `gpu-stream`. Example output is in `superbenchmark/tests/data/gpu_stream.log`.
-