1. 14 Dec, 2025 1 commit
    • Support ROCm 7 (#5162) · 07b738c5
      Anton Gorenko authored
      * Remove std::enable_if: warpRotateLeft is always used with TILE_SIZE
      
      * Do not use built-in warpSize in constexpr contexts
      
      Starting from ROCm 7, warpSize is no longer constexpr.
      findInteractingBlocks.hip uses it for sizes of __shared__ arrays.
      
      * Check if hipHostMallocNumaUser is allowed before using it
  2. 23 Sep, 2025 1 commit
  3. 09 Jul, 2025 1 commit
  4. 05 May, 2025 1 commit
    • Common implementation of NonbondedForce (#4922) · 2443dcee
      Peter Eastman authored
      * Use common API for kernels
      
      * More code uses common interface
      
      * Bug fixes
      
      * Unified interface for sorting
      
      * Simplified interface for FFT
      
      * Use common event API for synchronization
      
      * Minor changes to make code more consistent between platforms
      
      * Common implementation of NonbondedForce
      
      * Bug fixes
      
      * Flag to enable list of single pairs
      
      * CUDA and OpenCL use common implementation of NonbondedForce
      
      * Fixed compilation error
      
      * HIP uses common implementation of NonbondedForce
  5. 28 Apr, 2025 1 commit
  6. 25 Apr, 2025 1 commit
  7. 10 Mar, 2025 1 commit
  8. 01 Nov, 2024 1 commit
  9. 10 Sep, 2024 1 commit
    • Merged parallel code (#4649) · b28d2e66
      Peter Eastman authored
      * Unified lots of parallel computation code between platforms
      
      * Unified test code between platforms
      
      * Eliminated duplicated timing code
  10. 05 Sep, 2024 8 commits
  11. 01 Sep, 2024 2 commits
    • Optimize sorting kernels and tune block sizes · 7279c539
      Anton Gorenko authored
      * Compile kernels with a maximum block size of 256 threads:
        since ROCm 4.2, hipcc by default compiles kernels for blocks of up
        to 1024 threads unless __launch_bounds__ is specified. This
        significantly increases register pressure, especially in heavy kernels
        (double precision, for example), which requires register spilling;
      * Optimize computeRange by using multiple blocks for reduction;
      * Use blocks of 1024 threads for computeBucketPositions: it is executed
        as a single work group, so a larger block size is faster;
      * Sort up to lenghtNextPow2 instead of blockDim.x (faster for short
        buckets);
      * Optimize sortShortList2;
      * Optimize sortBuckets with bit instructions;
      * Decrease bucket size for non-uniform sorting: too many buckets may
        have sizes too large to sort in shared memory;
      * Add more sizes in tests.
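The "sort up to lenghtNextPow2 instead of blockDim.x" item above can be illustrated with a minimal host-side sketch. It is not the HIP kernel from this commit: the function names `nextPow2` and `bitonicSortPadded` are hypothetical, the choice of a bitonic network is an assumption made for illustration, and the loop over `i` stands in for the threads of a block. The point it shows is that the sorting network only needs to span the smallest power of two that covers the bucket, so a short bucket costs far fewer compare-exchange steps than a network sized to the full block.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Smallest power of two that is >= n (returns 1 for n == 0 or n == 1).
std::size_t nextPow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Hypothetical host-side illustration: bitonic sort over nextPow2(length)
// slots instead of a fixed, larger block size. Padding slots hold the
// maximum int so they end up at the tail and are dropped on copy-back.
void bitonicSortPadded(std::vector<int>& data) {
    const std::size_t length = data.size();
    const std::size_t padded = nextPow2(length);
    std::vector<int> buf(data);
    buf.resize(padded, std::numeric_limits<int>::max());

    // k is the size of the bitonic sequences being merged,
    // j is the compare-exchange distance within a merge step.
    for (std::size_t k = 2; k <= padded; k *= 2) {
        for (std::size_t j = k / 2; j > 0; j /= 2) {
            // On a GPU each i would be one thread of the block.
            for (std::size_t i = 0; i < padded; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {
                    const bool ascending = (i & k) == 0;
                    if ((buf[i] > buf[partner]) == ascending)
                        std::swap(buf[i], buf[partner]);
                }
            }
        }
    }
    std::copy(buf.begin(), buf.begin() + length, data.begin());
}
```

For a bucket of 5 elements the network runs over 8 slots; with a fixed size of 256 it would run over all 256 slots regardless of how short the bucket is.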
    • Add hipification of CUDA platform · 89d2ff0e
      Anton Gorenko authored
      Port changes in CUDA backend to HIP
      
      Fix a warning about arithmetic operations on void* in HipArray::uploadSubArray
      
      Fix "Error Initializing context ROCm 5.3.0"
      
          https://github.com/StreamHPC/openmm-hip/issues/3

          hipDeviceSetCacheConfig returns hipErrorNotSupported on 5.3
      Co-authored-by: Nick Curtis <nicholas.curtis@amd.com>
  12. 12 Dec, 2023 1 commit
  13. 20 Jul, 2023 1 commit
  14. 13 Apr, 2022 1 commit
  15. 04 Oct, 2021 1 commit
  16. 22 May, 2021 1 commit
    • Converted AMOEBA to common platform (#3120) · 8e8923a7
      Peter Eastman authored
      * Began converting AMOEBA to common platform
      
      * Beginning of OpenCL platform for AMOEBA
      
      * Converted AmoebaVdwForce to common platform
      
      * Cleaned up reference AMOEBA tests
      
      * Began converting AmoebaMultipoleForce to common platform
      
      * Continue converting AmoebaMultipoleForce to common platform
      
      * Bug fixes
      
      * Bug fix
      
      * Continue converting AmoebaMultipoleForce to common platform
      
      * Converting AmoebaMultipoleForce and AmoebaGeneralizedKirkwoodForce to common platform
      
      * Converting AmoebaMultipoleForce and AmoebaGeneralizedKirkwoodForce to common platform
      
      * Creating OpenCL version of AmoebaMultipoleForce and AmoebaGeneralizedKirkwoodForce
      
      * Creating OpenCL version of AmoebaMultipoleForce and AmoebaGeneralizedKirkwoodForce
      
      * Creating OpenCL version of AmoebaMultipoleForce and AmoebaGeneralizedKirkwoodForce
      
      * Converted arrays from real3 to real
      
      * Bug fix to OpenCL AmoebaGeneralizedKirkwoodForce
      
      * Fixes for AMD GPUs
      
      * Began converting HippoNonbondedForce to common platform
      
      * Continuing to convert HippoNonbondedForce to common platform
      
      * Continuing to convert HippoNonbondedForce to common platform
      
      * Working on unifying PME kernels
      
      * Fixed error on devices without 64 bit atomics
      
      * Unified PME kernels
      
      * Converted HippoNonbondedForce to common platform
      
      * Creating OpenCL implementation of HippoNonbondedForce
      
      * Continuing OpenCL implementation of HippoNonbondedForce
      
      * Mostly finished OpenCL implementation of HippoNonbondedForce
      
      * Eliminated three component vector types in host code
      
      * Fix errors on CPU OpenCL
      
      * Skip double precision tests for AMOEBA on OpenCL
      
      * Bug fixes
      
      * Bug fixes
      
      * Fixed compilation error
  17. 19 Mar, 2021 1 commit
  18. 22 Feb, 2021 1 commit
  19. 11 Feb, 2021 1 commit
  20. 08 Jan, 2020 1 commit
    • Common compute framework to unify CUDA and OpenCL code (#2488) · edbc8407
      peastman authored
      * Began creating common compute framework to unify code between CUDA and OpenCL
      
      * Began OpenCL implementation of common compute framework
      
      * Common implementation of CMMotionRemover
      
      * CUDA implementation of common compute interface
      
      * Converted HarmonicBondForce to common compute API
      
      * Converted standard bonded forces to common compute API
      
      * Converted ExpressionUtilities to common compute API
      
      * Created ComputeParameterSet
      
      * Converted custom bonded forces to common compute API
      
      * Converted CustomCentroidBondForce to common compute API
      
      * Converted CustomManyParticleForce to common compute API
      
      * Moved lots of duplicate code from CudaContext and OpenCLContext to ComputeContext
      
      * Converted GayBerneForce to common compute API
      
      * Removed obsolete kernels
      
      * Converted verlet integrators to common compute API
      
      * Converted Langevin and Brownian integrators to common compute API
      
      * Converted CustomIntegrator to common compute API
      
      * Converted CustomNonbondedForce to common compute API
      
      * Removed uses of a deprecated API
      
      * Fixed failing test cases
      
      * Converted GBSAOBCForce to common compute API
      
      * Began converting CustomGBForce to common compute API
      
      * Finished converting CustomGBForce to common compute API
      
      * Merged duplicated code in CudaIntegrationUtilities and OpenCLIntegrationUtilities
      
      * Converted RMSDForce and AndersenThermostat to common compute API
      
      * Converted CustomHbondForce to common compute API
      
      * Merged scripts for encoding kernel sources
      
      * Converted Drude plugin to common compute API
      
      * Fixed errors in CMake scripts
      
      * Attempt at fixing errors on Windows
      
      * Added discussion of common compute API to developer guide
      
      * Added Windows export macro for common classes
      
      * Fixed error in CMMotionRemover
      
      * Updated Travis to newer Ubuntu version
      
      * Fixed errors on CPU OpenCL
      
      * Fixed Windows linking errors
      
      * Added missing pragma for 32 bit atomics
      
      * Replaced long long with mm_long
      
      * More fixes to Windows linking
      
      * Bug fix
  21. 14 Jun, 2019 1 commit
  22. 29 May, 2018 1 commit
  23. 12 Feb, 2018 1 commit
  24. 02 Feb, 2018 1 commit
  25. 19 Jun, 2017 1 commit
  26. 16 Jun, 2017 1 commit
  27. 15 Feb, 2017 1 commit
  28. 05 Aug, 2016 1 commit
  29. 27 Jul, 2016 1 commit
  30. 25 May, 2016 1 commit
  31. 25 Oct, 2015 1 commit
  32. 15 May, 2015 1 commit