1. 17 Apr, 2026 1 commit
  2. 09 Feb, 2026 1 commit
    • Peter Eastman's avatar
      API for querying devices (#5192) · add95438
      Peter Eastman authored
      * API for querying devices
      
      * CUDA and HIP implementations of getDevices()
      
      * Fix test failures
      
      * Fix test failures
      
      * CUDA returns correct devices even if no context has been created
      
      * Return a single device for Reference and CPU
      
      * Fix CI failure
      add95438
  3. 14 Dec, 2025 1 commit
    • Anton Gorenko's avatar
      Support ROCm 7 (#5162) · 07b738c5
      Anton Gorenko authored
      * Remove std::enable_if, warpRotateLeft is always used with TILE_SIZE
      
      * Do not use built-in warpSize in constexpr contexts
      
      Starting from ROCm 7 warpSize is no longer constexpr.
      findInteractingBlocks.hip uses it for sizes of __shared__ arrays.
      
      * Check if hipHostMallocNumaUser is allowed before using it
      07b738c5
  4. 23 Sep, 2025 1 commit
  5. 12 Sep, 2025 1 commit
    • Evan Pretti's avatar
      Add constant potential method (#4870) · f55abcaa
      Evan Pretti authored
      
      
      * Initial implementation of C++ API
      
      * Add kernel interface and information for API generation
      
      * API updates for updating electrode parameters
      
      * Add serialization proxy for ConstantPotentialForce
      
      * Update file headers
      
      * Add CG error tolerance and fix units on getCharges() return value
      
      * Initial implementation of matrix solver
      
      * Fixes and conjugate gradient solver
      
      * Try to fix Linux and Windows builds
      
      * Make sure charge constraint target is on total charge
      
      * Restore handling of exceptions like NonbondedForce since they won't involve electrode atoms
      
      * Ameliorate numerical instability in constrained conjugate gradient
      
      * Fix uninitialized pointers, memory leak, and style
      
      * Set CG tolerance units in Python API
      
      * Test ConstantPotentialForce serialization
      
      * Read/write ExceptionsUsePeriodicBoundaryConditions as bool
      
      * Improve constrained conjugate gradient robustness to roundoff error accumulation
      
      * Recompute matrix if electrode atoms move due to setPositions()
      
      * Tolerance is now in gradient (potential) units again
      
      * Add neutralizing background correction
      
      * Add Python API tests
      
      * Fixes for CG and nonbonded exceptions
      
      * Add initial tests checking against existing NonbondedForce behavior
      
      * Expand test suite and fix some implementation issues
      
      * Add additional tests using larger reference system
      
      * Add Gaussian test
      
      * Finish test against reference computation
      
      * CPU platform implementation
      
      * Fixes for compilation on some platforms
      
      * Fixes for constant potential with AVX/AVX2
      
      * Test linking CPU PME library to constant potential test directly
      
      * Older SWIG versions don't support Python set to C++ set conversion
      
      * Add user guide entry
      
      * Increase speed of reference test
      
      * Conditional building constant potential CPU test is unreliable
      
      * Debugging
      
      * Miscellaneous fixes and improvements for CI
      
      * Cache charges so solver will not run if system and coordinates have not changed
      
      * Preconditioner flag, stability, and automatic detection improvements
      
      * Add GPU platform-specific constant potential kernel classes
      
      * PME and device-host I/O changes to support constant potential
      
      * Initial common constant potential implementation
      
      * Constant potential fixes:
      
      * Fix preconditioner PME position/charge save/restore logic
      
      * Fix reduction synchronization in constant potential solver kernels
      
      * Add double-float accumulation for conjugate gradient solver when
        double unsupported by hardware
      
      * Improve conditioning of a test system, and make sure particles are in or
      out of cutoff for consistency and ease of comparing between platforms
      
      * Reorder guess charges for CG when atom reordering changes positions
      
      * Remove PME queue for now
      
      * Trying to debug optimized direct space derivative kernel
      
      * Remove extraneous debugging lines
      
      * Style updates; just make CPU preconditioner double precision
      
      * Debugging updated optimized direct derivatives kernel for all but OpenCL CPU
      
      * OpenCL CPU implementation of direct space derivatives, and cleanup
      
      * Try to make test even shorter to not time out on CI
      
      * Temporary - Debugging
      
      * Debugging
      
      * Debugging
      
      * Debugging
      
      * Debugging
      
      * Remove debugging code and fix reduction synchronization
      
      * Fix other reductions
      
      * Debugging - are tests hanging or just slow on CI?
      
      * Debugging
      
      * Debugging
      
      * Fix macro for case when double precision is available on hardware
      
      * Remove changes for debugging again
      
      * Try to improve matrix solver cache locality by uploading transpose
      
      * Fixes for atom ordering and periodic images
      
      * Can't rely on reorder listener for cell offset updates
      
      * Test reducing number of contexts and timing for CI
      
      * Debugging
      
      * Remove timing code and revert debugging changes
      
      * Matrix solver and plasma term optimizations
      
      * Reduce CG solver kernel calls and downloads
      
      * Don't read back convergence flag from global memory
      
      * Update PME due to refactoring in master branch
      
      * Faster matrix solver (1st step)
      
      * Faster matrix solver for CUDA
      
      * Faster matrix solver compatibility with non-CUDA platforms
      
      * Matrix solver fixes
      
      * Use warp shuffle reductions when possible
      
      * Attempt to work around intermittent compiler crash in Intel CPU OpenCL
      
      * Optimize CG solver kernel 1
      
      * Rework CG solver so some kernels can use more than 1 block
      
      * Don't run out of shared memory
      
      * Asynchronously download convergence flag while clearing buffers
      
      ---------
      Co-authored-by: default avatarEvan Pretti <pretti@sh03-17n15.int>
      f55abcaa
  6. 09 Jul, 2025 1 commit
  7. 05 May, 2025 1 commit
    • Peter Eastman's avatar
      Common implementation of NonbondedForce (#4922) · 2443dcee
      Peter Eastman authored
      * Use common API for kernels
      
      * More code uses common interface
      
      * Bug fixes
      
      * Unified interface for sorting
      
      * Simplified interface for FFT
      
      * Use common event API for synchronization
      
      * Minor changes to make code more consistent between platforms
      
      * Common implementation of NonbondedForce
      
      * Bug fixes
      
      * Flag to enable list of single pairs
      
      * CUDA and OpenCL use common implementation of NonbondedForce
      
      * Fixed compilation error
      
      * HIP uses common implementation of NonbondedForce
      2443dcee
  8. 28 Apr, 2025 1 commit
  9. 25 Apr, 2025 1 commit
  10. 23 Apr, 2025 1 commit
  11. 14 Apr, 2025 1 commit
    • Peter Eastman's avatar
      DPDIntegrator (#4799) · de180e4e
      Peter Eastman authored
      * Created DPDIntegrator class
      
      * Reference implementation of DPDIntegrator
      
      * Build neighbor list for DPDIntegrator
      
      * Minor fixes
      
      * Documentation for DPDIntegrator
      
      * Python API for DPDIntegrator
      
      * Preliminary OpenCL implementation of DPDIntegrator
      
      * Enable USE_PERIODIC
      
      * Use updated positions in DPD thermostat
      
      * Working on neighbor list for OpenCL DPDIntegrator
      
      * ReorderListener for particle types
      
      * Serialization for DPDIntegrator
      
      * CUDA implementation of DPDIntegrator
      
      * HIP implementation of DPDIntegrator
      
      * Fixed compile error in Python wrapper
      
      * Fixed compile error in wrappers
      
      * Fixed uninitialized memory in reference neighbor list
      
      * Added DPDIntegrator to C++ API docs
      
      * Fixed incorrect launch size
      
      * Fixed nan in DPD random number generator
      
      * Minor optimizations
      
      * Improved load balancing
      
      * Fixed an indexing error
      
      * Neighbor list uses the maximum cutoff of any force
      
      * Fixed HIP compilation error
      
      * Fixed access to invalid memory
      
      * Added test case for diffusion coefficient
      
      * Try to debug segfaults on CI
      
      * Debugging
      
      * Debugging
      
      * Debugging
      
      * Debugging
      
      * Debugging
      
      * Debugging
      
      * Possible fix
      
      * Debugging
      
      * Debugging
      
      * Debugging
      
      * Use correct block size on CPU OpenCL
      
      * Workaround for bug in Intel's OpenCL for CPUs
      
      * Removed an unnecessary define
      
      * Removed debugging code
      
      * Include Dart
      
      * More Intel workarounds
      
      * Workaround for error in NVIDIA OpenCL
      de180e4e
  12. 10 Mar, 2025 1 commit
  13. 22 Nov, 2024 1 commit
  14. 01 Nov, 2024 1 commit
  15. 23 Sep, 2024 1 commit
    • Anton Gorenko's avatar
      Optimize PME spread charge kernel (#4633) · 8ea42950
      Anton Gorenko authored
      * PME_ORDER threads process one atom;
      * PME_ORDER threads access consecutive addresses;
      * No need to permute z indices with zindexTable;
      * finishSpreadCharge is needed only with fixed point charge spreading;
      8ea42950
  16. 10 Sep, 2024 1 commit
    • Peter Eastman's avatar
      Merged parallel code (#4649) · b28d2e66
      Peter Eastman authored
      * Unified lots of parallel computation code between platforms
      
      * Unified test code between platforms
      
      * Eliminated duplicated timing code
      b28d2e66
  17. 06 Sep, 2024 1 commit
    • Peter Eastman's avatar
      Optimize updateParametersInContext() (#4610) · 78902bed
      Peter Eastman authored
      * Optimize CustomNonbondedForce.updateParametersInContext()
      
      * Optimized uploading changed values to GPU
      
      * Optimized updateParametersInContext() for lots of bonded forces
      
      * Optimized updateParametersInContext() for CustomExternalForce
      
      * Optimized updateParametersInContext() for NonbondedForce
      
      * Code changes for HIP platform
      78902bed
  18. 05 Sep, 2024 10 commits
  19. 01 Sep, 2024 2 commits
    • Anton Gorenko's avatar
      Optimize sorting kernels and tune block sizes · 7279c539
      Anton Gorenko authored
      * Compile kernels with max block size of 256 threads:
        The default hipcc behavior since ROCm 4.2 is to compile kernels
        with 1024 threads unless __launch_bounds__ is specified. This
        significantly increases register pressure especially in heavy kernels
        (double precision, for example), requiring register spilling;
      * Optimize computeRange by using multiple blocks for reduction;
      * Use blocks of 1024 threads for computeBucketPositions - it is executed
        as a single work group so larger block size is faster;
      * Sort up-to lenghtNextPow2 instead of blockDim.x (faster for short
        buckets);
      * Optimize sortShortList2;
      * Optimize sortBuckets with bit instructions;
      * Decrease bucket size for non-uniform sorting: too many buckets may
        have sizes too large to sort in shared memory;
      * Add more sizes in tests.
      7279c539
    • Anton Gorenko's avatar
      Add hipification of CUDA platform · 89d2ff0e
      Anton Gorenko authored
      Port changes in CUDA backend to HIP
      
      Fix a warning about arithmetic operations on void* in HipArray::uploadSubArray
      
      Fix "Error Initializing context ROCm 5.3.0"
      
          https://github.com/StreamHPC/openmm-hip/issues/3
      
      
          hipDeviceSetCacheConfig returns hipErrorNotSupported on 5.3
      Co-authored-by: default avatarNick Curtis <nicholas.curtis@amd.com>
      89d2ff0e