1. 05 Sep, 2024 2 commits
    • Use VkFFT in HipFFT3D, remove hipFFT and the builtin FFT · f717ed89
      Anton Gorenko authored
      * VkFFT-based 3D FFT;
      * Caching of compiled VkFFT kernels;
      * Extend FFT tests with more sizes.
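The kernel caching mentioned in this commit can be illustrated with a small memoization sketch. This is not the VkFFT or OpenMM code: `CompiledPlan`, `PlanCache`, and the cache key (grid sizes plus a real-to-complex flag) are hypothetical names chosen for illustration, and the "compile" step is a placeholder string.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <tuple>

// Hypothetical stand-in for a compiled FFT plan; not a VkFFT type.
struct CompiledPlan {
    std::string binary;  // placeholder for compiled kernel code
};

// Memoizes plans so each distinct configuration is compiled only once.
class PlanCache {
public:
    // Assumed key: x, y, z grid sizes and a real-to-complex flag.
    using Key = std::tuple<int, int, int, bool>;

    // Returns the cached plan for this configuration, compiling it on first use.
    std::shared_ptr<CompiledPlan> get(int x, int y, int z, bool realToComplex) {
        assert(x > 0 && y > 0 && z > 0);
        const Key key{x, y, z, realToComplex};
        auto found = plans.find(key);
        if (found != plans.end())
            return found->second;
        auto plan = std::make_shared<CompiledPlan>(compile(x, y, z, realToComplex));
        plans.emplace(key, plan);
        return plan;
    }

    // Number of times the expensive compile step actually ran.
    int compileCount() const { return compiles; }

private:
    // Placeholder for the expensive runtime compilation.
    CompiledPlan compile(int x, int y, int z, bool realToComplex) {
        ++compiles;
        return CompiledPlan{std::to_string(x) + "x" + std::to_string(y) + "x" +
                            std::to_string(z) + (realToComplex ? ":r2c" : ":c2c")};
    }

    std::map<Key, std::shared_ptr<CompiledPlan>> plans;
    int compiles = 0;
};
```

Requesting the same grid size twice returns the same plan object and compiles only once; a different size triggers a second compilation.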
    • Always use hipRTC, support Windows · b9c45d45
      Anton Gorenko authored
      * Unload all loaded modules in HipContext's destructor:
        HIP modules keep file descriptors open, but OpenMM never unloaded
        modules, leaking these file descriptors. This can crash
        some scripts, such as test-openmm-platforms from openmmtools.
      * ROCm 6.0 defines operator* for complex types (which are typedefs for
        float2 and double2); these conflict with the operators defined for
        vectors. This is fixed in newer ROCm versions.
      * Revert HIP_DYNAMIC_SHARED back to extern __shared__ (the macro is
        in the headers).
      * Lower the reported speed of the HIP platform if there are no HIP
        devices in the system.
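The module-unloading fix follows the usual RAII pattern: an owner records every module it loads and releases all of them in its destructor. The sketch below is host-only C++ and is not the HipContext code. `ModuleRegistry` and `ModuleHandle` are hypothetical names, and the injected callback stands in for a real call such as `hipModuleUnload`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a module handle (hipModule_t in real HIP code).
using ModuleHandle = int;

// Tracks loaded modules and unloads every one of them on destruction.
class ModuleRegistry {
public:
    // unloadFn stands in for a call such as hipModuleUnload.
    explicit ModuleRegistry(std::function<void(ModuleHandle)> unloadFn)
        : unloadFn(std::move(unloadFn)) {
        assert(this->unloadFn);
    }

    // Non-copyable, so each module is unloaded exactly once.
    ModuleRegistry(const ModuleRegistry&) = delete;
    ModuleRegistry& operator=(const ModuleRegistry&) = delete;

    ~ModuleRegistry() {
        // Unload in reverse order of loading.
        for (auto it = modules.rbegin(); it != modules.rend(); ++it)
            unloadFn(*it);
    }

    // Records a freshly loaded module so the destructor can release it.
    void track(ModuleHandle module) { modules.push_back(module); }

    std::size_t size() const { return modules.size(); }

private:
    std::function<void(ModuleHandle)> unloadFn;
    std::vector<ModuleHandle> modules;
};
```

Because the release happens in the destructor, every tracked handle is unloaded when the owner goes out of scope, with no explicit cleanup call to forget.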
  2. 01 Sep, 2024 3 commits
    • Optimize sorting kernels and tune block sizes · 7279c539
      Anton Gorenko authored
      * Compile kernels with max block size of 256 threads:
        The default hipcc behavior since ROCm 4.2 is to compile kernels
        with 1024 threads unless __launch_bounds__ is specified. This
        significantly increases register pressure especially in heavy kernels
        (double precision, for example), requiring register spilling;
      * Optimize computeRange by using multiple blocks for reduction;
      * Use blocks of 1024 threads for computeBucketPositions: it is executed
        as a single work group, so a larger block size is faster;
      * Sort up to lenghtNextPow2 instead of blockDim.x (faster for short
        buckets);
      * Optimize sortShortList2;
      * Optimize sortBuckets with bit instructions;
      * Decrease bucket size for non-uniform sorting: too many buckets may
        have sizes too large to sort in shared memory;
      * Add more sizes in tests.
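To illustrate why sorting up to the next power of two helps short buckets, here is a CPU sketch that assumes a bitonic sorting network. It is not the kernel code from this commit: `nextPow2` and `bitonicSortPadded` are hypothetical helpers, and on a GPU the innermost loop would be spread across the threads of a block. The number of compare-exchange passes depends on log2 of the padded length, so padding a 7-element bucket to 8 slots needs far fewer passes than padding it to a full block size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Smallest power of two that is >= n (requires n >= 1).
std::size_t nextPow2(std::size_t n) {
    assert(n >= 1);
    std::size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Sorts data ascending with a bitonic network over nextPow2(data.size())
// slots; the unused slots are filled with the maximum int as a sentinel.
void bitonicSortPadded(std::vector<int>& data) {
    if (data.size() < 2)
        return;
    const std::size_t n = data.size();
    const std::size_t padded = nextPow2(n);
    std::vector<int> work(data);
    work.resize(padded, std::numeric_limits<int>::max());

    for (std::size_t k = 2; k <= padded; k <<= 1) {        // size of the bitonic sequences
        for (std::size_t j = k / 2; j > 0; j >>= 1) {      // compare distance
            for (std::size_t i = 0; i < padded; ++i) {     // one GPU thread per i
                const std::size_t partner = i ^ j;
                if (partner > i) {
                    const bool ascending = (i & k) == 0;
                    if ((work[i] > work[partner]) == ascending)
                        std::swap(work[i], work[partner]);
                }
            }
        }
    }
    // Sentinels end up at the back, so the first n slots hold the sorted data.
    std::copy(work.begin(), work.begin() + static_cast<std::ptrdiff_t>(n), data.begin());
}
```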
    • Clean up CMake scripts for HIP platform · aca24d5f
      Anton Gorenko authored
      * Remove per-target settings of link libraries, include and link dirs,
        and compile flags; instead, let CMake handle them by linking the
        main library to hip::host hiprtc::hiprtc hip::hipfft;
      * Fix: custom command without ADD_CUSTOM_TARGET and ADD_DEPENDENCIES is
        executed for both static and shared targets;
      * Remove IF(APPLE) parts.
    • Add hipification of CUDA platform · 89d2ff0e
      Anton Gorenko authored
      Port changes in CUDA backend to HIP
      
      Fix a warning about arithmetic operations on void* in HipArray::uploadSubArray
      
      Fix "Error Initializing context ROCm 5.3.0"
      (https://github.com/StreamHPC/openmm-hip/issues/3):
      hipDeviceSetCacheConfig returns hipErrorNotSupported on ROCm 5.3
      Co-authored-by: Nick Curtis <nicholas.curtis@amd.com>
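The `void*` warning fixed in this commit arises because standard C++ does not define arithmetic on `void*`; compilers that accept it as an extension warn about it. A common remedy is to cast to a byte pointer before applying the offset. The sketch below shows that idea only: `copyIntoSubArray` is a hypothetical helper, not the actual HipArray::uploadSubArray implementation, and it copies between host buffers rather than to a device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies `count` elements from src into dest, starting
// at element index `offset`, where each element is `elementSize` bytes.
void copyIntoSubArray(void* dest, const void* src, std::size_t offset,
                      std::size_t count, std::size_t elementSize) {
    assert(dest != nullptr && src != nullptr);
    // Cast to char* first so the byte offset is well-defined pointer arithmetic.
    char* destBytes = static_cast<char*>(dest) + offset * elementSize;
    std::memcpy(destBytes, src, count * elementSize);
}
```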