1. 01 Sep, 2024 2 commits
    • Anton Gorenko's avatar
      Optimize sorting kernels and tune block sizes · 7279c539
      Anton Gorenko authored
      * Compile kernels with max block size of 256 threads:
        The default hipcc behavior since ROCm 4.2 is to compile kernels
        with 1024 threads unless __launch_bounds__ is specified. This
        significantly increases register pressure especially in heavy kernels
        (double precision, for example), requiring register spilling;
      * Optimize computeRange by using multiple blocks for reduction;
      * Use blocks of 1024 threads for computeBucketPositions - it is executed
        as a single work group so larger block size is faster;
      * Sort up-to lenghtNextPow2 instead of blockDim.x (faster for short
        buckets);
      * Optimize sortShortList2;
      * Optimize sortBuckets with bit instructions;
      * Decrease bucket size for non-uniform sorting: too many buckets may
        have sizes too large to sort in shared memory;
      * Add more sizes in tests.
      7279c539
    • Anton Gorenko's avatar
      Add hipification of CUDA platform · 89d2ff0e
      Anton Gorenko authored
      Port changes in CUDA backend to HIP
      
      Fix a warning about arithmetic operations on void* in HipArray::uploadSubArray
      
      Fix "Error Initializing context ROCm 5.3.0"
      
          https://github.com/StreamHPC/openmm-hip/issues/3
      
      
          hipDeviceSetCacheConfig returns hipErrorNotSupported on 5.3
      Co-authored-by: default avatarNick Curtis <nicholas.curtis@amd.com>
      89d2ff0e
  2. 27 Dec, 2021 1 commit
  3. 08 Oct, 2020 1 commit
  4. 16 May, 2018 1 commit
  5. 03 May, 2018 1 commit
  6. 12 Feb, 2018 1 commit
  7. 12 Sep, 2016 1 commit
  8. 14 Nov, 2014 1 commit
  9. 22 Mar, 2013 1 commit
  10. 28 Sep, 2012 1 commit
  11. 16 Jun, 2012 1 commit
  12. 05 Jun, 2012 1 commit