Commits · 1ce5d91d9dedfdc273066fafa1a618bf05c25b85 · tsoc / openmm

07 Jun, 2025 2 commits

Use fixed point charge spreading on RDNA4 (#4960) · 1ce5d91d

Anton Gorenko authored Jun 08, 2025

* Use fixed point spread charge on RDNA4 as it is faster

Even though RDNA4 (gfx12) has global_atomic_add_f32, micro-benchmarks and OpenMM benchmarks show
that it is very slow compared to global_atomic_add_u64.
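
The idea behind fixed-point accumulation can be illustrated with a minimal sketch, written in Python rather than HIP. The scale factor of 2^32 and the helper names are assumptions of this sketch and are not taken from the commit. The sketch only shows that contributions converted to integers can be summed with 64-bit integer adds, and that the result does not depend on the order of the additions.

```python
import random

# Assumed scale: 32 fractional bits. Hypothetical, chosen for illustration.
SCALE = 1 << 32
MASK64 = (1 << 64) - 1


def to_fixed(x):
    """Convert a float to a 64-bit two's-complement fixed-point value."""
    return int(x * SCALE) & MASK64


def from_fixed(bits):
    """Convert an accumulated 64-bit value back to a float."""
    if bits >= 1 << 63:      # reinterpret the bit pattern as signed
        bits -= 1 << 64
    return bits / SCALE


def spread_fixed(contributions):
    """Accumulate contributions the way an unsigned 64-bit add would."""
    acc = 0
    for c in contributions:
        acc = (acc + to_fixed(c)) & MASK64   # wraps like u64 addition
    return from_fixed(acc)


random.seed(0)
charges = [random.uniform(-1.0, 1.0) for _ in range(1000)]
shuffled = charges[:]
random.shuffle(shuffled)

total_a = spread_fixed(charges)
total_b = spread_fixed(shuffled)
```

Because integer addition is associative, `total_a` and `total_b` are bit-identical, whereas a floating-point sum can differ slightly between orderings.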

* Add a workaround for fixed point gridSpreadCharge on RDNA4

Workaround for rare cases in which a few values of pmeGrid are very large and
incorrect. The cause is unknown. It is also unknown why this workaround, or other
unrelated changes such as adding a printf, helps.

1ce5d91d

Fix computeNonbonded hang on the HIP platform (#4959) · a4b43a04

Anton Gorenko authored Jun 08, 2025

* Add a workaround for infinite loop in computeNonbonded (HIP)

computeNonbonded hangs in some tests (without neighbor list).
Reproducible on ROCm 6.4 and 6.4.1 (maybe on older versions too) on various architectures (both CDNA and RDNA).
Affected tests: TestHipATMForce, TestHipMonteCarloBarostat, TestHipNonbondedForce, TestHipVirtualSites.

Disassembly shows that the compiler splits branches of `if (skipBase+tgx < NUM_TILES_WITH_EXCLUSIONS)` and does
`SHFL(skipTiles, TILE_SIZE-1) < pos` checks in them separately, even though `__builtin_amdgcn_ds_bpermute`
is a convergent function. Apparently in this case not all lanes participate in each call.
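
The failure mode described above can be modelled with a small Python simulation of one wavefront. This is a sketch under stated assumptions, not the actual kernel: the `shfl` helper, the 8-lane tile, and the per-lane values are hypothetical, and reading from an inactive lane is modelled as returning 0.

```python
TILE_SIZE = 8  # hypothetical small wavefront for illustration


def shfl(values, src_lane, active):
    """Model of a cross-lane read: each active lane reads values[src_lane].

    Assumption of this sketch: reading from an inactive lane yields 0,
    and inactive lanes receive no result (None).
    """
    src = values[src_lane] if active[src_lane] else 0
    return [src if a else None for a in active]


skip_tiles = [10, 11, 12, 13, 14, 15, 16, 17]   # made-up per-lane values
cond = [lane < 5 for lane in range(TILE_SIZE)]  # stands in for the `if`

# Convergent form: every lane takes part in a single call.
all_active = [True] * TILE_SIZE
convergent = shfl(skip_tiles, TILE_SIZE - 1, all_active)

# Split form: the call is duplicated into each branch, so each copy
# runs with only the lanes that took that branch.
then_part = shfl(skip_tiles, TILE_SIZE - 1, cond)
else_part = shfl(skip_tiles, TILE_SIZE - 1, [not c for c in cond])
split = [t if c else e for c, t, e in zip(cond, then_part, else_part)]
```

In this model the lanes that take the first branch read 0 instead of the value held by the last lane, so a comparison such as `SHFL(skipTiles, TILE_SIZE-1) < pos` can evaluate differently across lanes.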

* Simplify includeTile check using ballot

a4b43a04

02 Jun, 2025 1 commit
- Fixed exception when computing pressure (#4954) · 2b8ad703
  Peter Eastman authored Jun 02, 2025
  
  2b8ad703
25 May, 2025 1 commit
- Optimize computing kinetic energy (#4946) · 88f32f2d
  Peter Eastman authored May 25, 2025
  
  88f32f2d
24 May, 2025 1 commit
- set box vectors of the inner contexts before atom reordering (#4851) · f19c9f59
  Emilio Gallicchio authored May 23, 2025
```
* set box vectors of the inner contexts before atom reordering

* test for changing box vectors
```
  f19c9f59
23 May, 2025 1 commit
- Optimized setPositions() and setVelocities() (#4945) · 559da024
  Peter Eastman authored May 23, 2025
```
* Optimized setPositions() and setVelocities()

* Fix test failures
```
  559da024
20 May, 2025 1 commit
- Fix GPU memory leak in context arrays (#4940) · a5156da5
  Pier Fiedorowicz authored May 19, 2025
```
* Fix GPU memory leak

* Undo CUDA change
```
  a5156da5
05 May, 2025 1 commit

Common implementation of NonbondedForce (#4922) · 2443dcee

Peter Eastman authored May 05, 2025

* Use common API for kernels

* More code uses common interface

* Bug fixes

* Unified interface for sorting

* Simplified interface for FFT

* Use common event API for synchronization

* Minor changes to make code more consistent between platforms

* Common implementation of NonbondedForce

* Bug fixes

* Flag to enable list of single pairs

* CUDA and OpenCL use common implementation of NonbondedForce

* Fixed compilation error

* HIP uses common implementation of NonbondedForce

2443dcee

02 May, 2025 1 commit
- Fixed a bug in computing pressure (#4924) · dfb8d755
  Peter Eastman authored May 02, 2025
  
  dfb8d755
28 Apr, 2025 2 commits

Unified interface for queues (#4913) · dd320bcf

Peter Eastman authored Apr 28, 2025

* Unified interface for queues

* Simplified stream handling in CudaFFT3D

* HIP implementation of ComputeQueue

dd320bcf

Added computeCurrentPressure() to MonteCarloBarostat (#4881) · bce0c133

Peter Eastman authored Apr 28, 2025

* Added computeCurrentPressure() to MonteCarloBarostat

* Use instantaneous temperature to compute pressure

* Added computeCurrentPressure() to MonteCarloAnisotropicBarostat

* Added computeCurrentPressure() to MonteCarloMembraneBarostat

* Fixed compilation error

* Fixed error in typemap

* Added documentation on computing pressure

* Fixed CUDA compilation errors

* Made test case more robust

* Made a test case more robust

* Added computeCurrentPressure() to MonteCarloFlexibleBarostat

* Fixed compilation error

* More documentation on computing pressure

bce0c133

25 Apr, 2025 1 commit

Uniform interface for FFTs (#4911) · 01e99e77

Peter Eastman authored Apr 25, 2025

* Unified interface for FFTs

* AMOEBA uses unified interface for FFTs

* HIP implementation of common FFT interface

01e99e77

23 Apr, 2025 1 commit

Add correction for self energy of neutralizing plasma (#4907) · a3909c8e

Peter Eastman authored Apr 23, 2025

* Add correction for self energy of neutralizing plasma

* Fixed compilation errors

* Update total charge in copyParametersToContext()

* Bug fixes

* Fixed compilation errors in HIP

* Bug fix
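
For reference, the textbook Ewald expression for the energy of a uniform neutralizing background is E = -pi Q^2 / (2 alpha^2 V) in Gaussian units, where Q is the total charge, alpha the Ewald splitting parameter, and V the box volume. The sketch below evaluates it. The function name and the approximate Coulomb constant are assumptions of this sketch, and the implementation in the commit may differ in form.

```python
import math

# Coulomb constant 1/(4*pi*eps0) in kJ*nm/(mol*e^2); approximate value,
# assumed here for illustration.
ONE_4PI_EPS0 = 138.935


def neutralizing_plasma_energy(total_charge, alpha, volume):
    """Energy (kJ/mol) of the uniform background that neutralizes a
    periodic system with net charge total_charge (in e), Ewald splitting
    parameter alpha (1/nm), and box volume (nm^3)."""
    return -ONE_4PI_EPS0 * math.pi * total_charge**2 / (2.0 * alpha**2 * volume)


energy_neutral = neutralizing_plasma_energy(0.0, 3.0, 27.0)
energy_charged = neutralizing_plasma_energy(1.0, 3.0, 27.0)
```

The term vanishes for a neutral system and grows with the square of the net charge, which is why the total charge has to be kept up to date when parameters change.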

a3909c8e

14 Apr, 2025 1 commit

DPDIntegrator (#4799) · de180e4e

Peter Eastman authored Apr 14, 2025

* Created DPDIntegrator class

* Reference implementation of DPDIntegrator

* Build neighbor list for DPDIntegrator

* Minor fixes

* Documentation for DPDIntegrator

* Python API for DPDIntegrator

* Preliminary OpenCL implementation of DPDIntegrator

* Enable USE_PERIODIC

* Use updated positions in DPD thermostat

* Working on neighbor list for OpenCL DPDIntegrator

* ReorderListener for particle types

* Serialization for DPDIntegrator

* CUDA implementation of DPDIntegrator

* HIP implementation of DPDIntegrator

* Fixed compile error in Python wrapper

* Fixed compile error in wrappers

* Fixed uninitialized memory in reference neighbor list

* Added DPDIntegrator to C++ API docs

* Fixed incorrect launch size

* Fixed nan in DPD random number generator

* Minor optimizations

* Improved load balancing

* Fixed an indexing error

* Neighbor list uses the maximum cutoff of any force

* Fixed HIP compilation error

* Fixed access to invalid memory

* Added test case for diffusion coefficient

* Try to debug segfaults on CI

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Possible fix

* Debugging

* Debugging

* Debugging

* Use correct block size on CPU OpenCL

* Workaround for bug in Intel's OpenCL for CPUs

* Removed an unnecessary define

* Removed debugging code

* Include Dart

* More Intel workarounds

* Workaround for error in NVIDIA OpenCL

de180e4e

21 Mar, 2025 1 commit
- Reference and CPU allow more changes to nonbonded exceptions (#4858) · d2318e3b
  Peter Eastman authored Mar 21, 2025
  
  d2318e3b
14 Mar, 2025 1 commit

Split CommonKernels into multiple files (#4847) · 377e3249

Peter Eastman authored Mar 14, 2025

* Began splitting CommonKernels into multiple files

* Moved two more kernels into separate files

* Moved two integrators into separate files

* Fix compilation error on Windows

377e3249

11 Mar, 2025 1 commit

ATMForce: ignore invalid energy term when the energy expression does not depend on it (#4834) · dde4228b

Emilio Gallicchio authored Mar 11, 2025



* reset overflowed state energies at the alchemical endpoints

* address formatting, complete clash test

* Fixed indentation

---------
Co-authored-by: Peter Eastman <peter.eastman@gmail.com>

dde4228b

10 Mar, 2025 1 commit

Replace pthreads with C++ threads (#4833) · 68c97c5b

Peter Eastman authored Mar 10, 2025

* Replace pthreads with C++ threads

* Try to fix CI errors

* Try including -pthread linker option

68c97c5b

05 Mar, 2025 1 commit
- Fixed exception from re-enabling peer-to-peer access (#4825) · bda67c67
  Peter Eastman authored Mar 04, 2025
  
  bda67c67
04 Mar, 2025 1 commit
- Fixed error in checking for circular virtual site definitions (#4822) · 862e6ac7
  Peter Eastman authored Mar 04, 2025
  
  862e6ac7
13 Jan, 2025 1 commit
- Allow more changes to nonbonded exceptions (#4766) · a3628b48
  Peter Eastman authored Jan 13, 2025
  
  a3628b48
16 Dec, 2024 1 commit
- Prevent extra force evaluations with NoseHooverIntegrator (#4753) · 87810125
  Peter Eastman authored Dec 16, 2024
  
  87810125
27 Nov, 2024 1 commit

CPU platform checkpoints random number generator (#4740) · f67ae730

Peter Eastman authored Nov 27, 2024

* CPU platform checkpoints random number generator

* Fix Windows compilation error

* Another Windows compilation error

f67ae730

26 Nov, 2024 1 commit

Use Intel OpenCL for CI (#4366) · 9fe1bae6

Peter Eastman authored Nov 26, 2024

* Use Intel OpenCL for CI

* Set environment variables

* Try to get CI to run

* Debugging

* Debugging

* Fixes for Intel OpenCL

9fe1bae6

22 Nov, 2024 1 commit

updateParametersInContext() can modify parameter offsets (#4732) · ecbe32b0

Peter Eastman authored Nov 22, 2024

* updateParametersInContext() can modify parameter offsets

* Reordering respects parameter offsets

* Implemented for CUDA and HIP

ecbe32b0

11 Nov, 2024 1 commit
- Reduced memory use while identifying molecule groups (#4713) · 78c15368
  Peter Eastman authored Nov 11, 2024
```
* Reduced memory use while identifying molecule groups

* Further reduce memory use
```
  78c15368
01 Nov, 2024 1 commit
- Prevent HipContext.h from including vkFFT.h (#4714) · 53770948
  Peter Eastman authored Nov 01, 2024
  
  53770948
09 Oct, 2024 1 commit
- Prevent linking to nvrtc-builtins (#4688) · ffb3082f
  Peter Eastman authored Oct 09, 2024
  
  ffb3082f
23 Sep, 2024 1 commit

Optimize PME spread charge kernel (#4633) · 8ea42950

Anton Gorenko authored Sep 24, 2024

* PME_ORDER threads process one atom;
* PME_ORDER threads access consecutive addresses;
* No need to permute z indices with zindexTable;
* finishSpreadCharge is needed only with fixed point charge spreading;
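
The first two points can be sketched as an index mapping, shown here in Python rather than kernel code. The spline order of 5, the grid size, and the `work_item` helper are assumptions of this sketch. It assumes that PME_ORDER consecutive threads share one atom and that each thread handles one of the atom's PME_ORDER consecutive z grid points.

```python
PME_ORDER = 5          # spline order; assumed value for illustration
GRID_Z = 16            # hypothetical grid size along z


def work_item(thread_index, atom_grid_z):
    """Map a flat thread index to (atom, z offset, z grid index).

    Assumes PME_ORDER consecutive threads share one atom and each takes
    one of the PME_ORDER consecutive z points, wrapping periodically.
    """
    atom = thread_index // PME_ORDER
    iz = thread_index % PME_ORDER
    z = (atom_grid_z[atom] + iz) % GRID_Z
    return atom, iz, z


atom_grid_z = [3, 14]  # made-up starting z index for two atoms
mapping = [work_item(t, atom_grid_z) for t in range(2 * PME_ORDER)]
```

If z is the fastest-varying grid dimension, neighbouring threads in this mapping touch neighbouring addresses, except where the index wraps around the periodic boundary.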

8ea42950

10 Sep, 2024 3 commits
- Merged tests for different platforms (#4652) · 462a9be3
  Peter Eastman authored Sep 10, 2024
  
  462a9be3
- Fixed numeric overflow (#4650) · e2f659ce
  Peter Eastman authored Sep 10, 2024
  
  e2f659ce
- Merged parallel code (#4649) · b28d2e66
  Peter Eastman authored Sep 10, 2024
```
* Unified lots of parallel computation code between platforms

* Unified test code between platforms

* Eliminated duplicated timing code
```
  b28d2e66
06 Sep, 2024 1 commit

Optimize updateParametersInContext() (#4610) · 78902bed

Peter Eastman authored Sep 06, 2024

* Optimize CustomNonbondedForce.updateParametersInContext()

* Optimized uploading changed values to GPU

* Optimized updateParametersInContext() for lots of bonded forces

* Optimized updateParametersInContext() for CustomExternalForce

* Optimized updateParametersInContext() for NonbondedForce

* Code changes for HIP platform

78902bed

05 Sep, 2024 7 commits

Added test case · da106c00

peastman authored Sep 05, 2024

da106c00

Prevent deadlock using CustomCPPForceImpl with multiple GPUs · 91d56f74

Peter Eastman authored Sep 05, 2024

91d56f74

Port changes from the main repository · f7240731

Anton Gorenko authored Aug 25, 2024

CustomCPPForceImpl for writing forces in C++

    https://github.com/openmm/openmm/commit/9a0db72
    https://github.com/openmm/openmm/pull/4231

Virtual sites can depend on other virtual sites

    https://github.com/openmm/openmm/commit/71f4b3f

Use LF-Middle for LangevinIntegrator and VariableLangevinIntegrator

    https://github.com/openmm/openmm/commit/86988b9

Merged more code into common platform

    https://github.com/openmm/openmm/commit/5739788

    * Common implementation of BondedUtilities
    * Common implementation of UpdateStateDataKernel

Fixed periodic box changing from rectangular to triclinic

    https://github.com/openmm/openmm/commit/75d4f29

f7240731

Port changes from the main repository (mostly related to large blocks) · 7d7490ea

Anton Gorenko authored Aug 25, 2024

Skip neighbor list for very small systems

    https://github.com/openmm/openmm/pull/4070

Store bounding box sizes in half precision

    https://github.com/openmm/openmm/commit/2ae50f9

Use large blocks to optimize building the neighbor list

    https://github.com/openmm/openmm/commit/3955033

Improved sorting of blocks when building neighbor list

    https://github.com/openmm/openmm/commit/796ffaa

Fixed bug in large blocks optimization with triclinic boxes

    https://github.com/openmm/openmm/commit/4c10732

Optimize sorting of non-uniformly distributed data

    https://github.com/openmm/openmm/commit/71d9bb1

Co-authored-by: bdenhollander <44237618+bdenhollander@users.noreply.github.com>

7d7490ea

Add ATMForce · f965707b

Anton Gorenko authored Aug 25, 2024


Co-authored-by: Emilio Gallicchio <emilio.gallicchio@gmail.com>

f965707b

Improve latencies, handling of streams and events, multi-GPU support · 70771a51

Anton Gorenko authored Aug 25, 2024

Use a small kernel for copying interactionCounts to host memory

    hipMemcpy's CopyDeviceToHost operation has higher latency.

Do not set stream and event blocking/spin related flags

    Let the runtime choose the best option because overriding does not
    improve performance in most cases.

Remove NULL streams and use nonblocking streams explicitly

Make HipContext::pushAsCurrent/popAsCurrent thread-safe as they can be
called simultaneously from different threads via ContextSelector.

Allow peer access to be enabled more than once (if there are multiple
simulations one after another, like in benchmark.py).

Create peerCopyStream on a corresponding device

Use two-speed load balancing for multi GPU runs

    The first 100 steps do coarse balancing; the next 100 do fine tuning.
    Also ignore the slowest device (usually 0) if its fraction has
    reached 0 (i.e. no work can be transferred to other devices) and
    balance the other devices.
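
One way to picture the two-speed scheme is the Python sketch below. It is not the code from the commit: the step sizes, the `rebalance` function, and the rule of moving work from the slowest to the fastest device are all assumptions made for illustration. Only the 100-step phases and the handling of a slowest device with zero work come from the description above.

```python
COARSE_STEP = 0.02   # hypothetical step sizes, not from the commit
FINE_STEP = 0.002


def rebalance(fractions, times, step_index):
    """Return new work fractions after one balancing step.

    fractions: share of the work assigned to each device (sums to 1).
    times: measured time each device took on the previous step.
    """
    if step_index >= 200:
        return list(fractions)                 # balancing has finished
    delta = COARSE_STEP if step_index < 100 else FINE_STEP
    devices = list(range(len(fractions)))
    slowest = max(devices, key=lambda d: times[d])
    if fractions[slowest] <= 0.0:
        devices.remove(slowest)                # it has no work left to give
    if len(devices) < 2:
        return list(fractions)
    slowest = max(devices, key=lambda d: times[d])
    fastest = min(devices, key=lambda d: times[d])
    moved = min(delta, fractions[slowest])
    result = list(fractions)
    result[slowest] -= moved
    result[fastest] += moved
    return result


coarse = rebalance([0.5, 0.5], [2.0, 1.0], 0)
fine = rebalance([0.0, 0.5, 0.5], [3.0, 2.0, 1.0], 150)
done = rebalance([0.5, 0.5], [2.0, 1.0], 200)
```

In the `fine` call, device 0 is the slowest but already has no work, so the sketch balances devices 1 and 2 against each other instead.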

Do not download interactionCounts in parallel nonbonded tasks

    This is not required because updateNeighborListSize has already been
    called and the valid flag has changed.

Initialize tilesAfterReorder properly

    It may contain a garbage value, and if that value is large,
    updateNeighborListSize does not force atom reordering after 25 steps
    in extreme cases.

70771a51

Port changes from the main repository · ecc2d258

Anton Gorenko authored Aug 25, 2024

Use cuCtxPushCurrent() and cuCtxPopCurrent() for selecting CUDA context

    https://github.com/openmm/openmm/pull/3258

Fixed uninitialized memory access

    https://github.com/openmm/openmm/issues/3392
    https://github.com/openmm/openmm/pull/3399

Fixed potential invalid memory access

    https://github.com/openmm/openmm/pull/3428

Improved temperature reporting for Drude particles

    https://github.com/openmm/openmm/pull/3486
    https://github.com/openmm/openmm/commit/a5e42f5

Fixed race condition with multiple GPUs

    https://github.com/openmm/openmm/commit/6fb1c8a41edff980862750bc086f6a204eb50941

Use blocking sync when creating events

    https://github.com/openmm/openmm/commit/fe21d5ee4f14673a4ea38b7244991772a64ceec2

Very minor optimizations

    https://github.com/openmm/openmm/commit/109f6b2535da4e0c0dd88007d6ca06b4add2ce81

Use PocketFFT

    https://github.com/openmm/openmm/commit/1dac981a63300a2a53a7925f570995914f7163ed

Improved logic for deciding when to reorder atoms

    https://github.com/openmm/openmm/commit/48664a1f1a4490a4dabc277757545ac070e7b898

Ensure valid atom order after loading a checkpoint

    https://github.com/openmm/openmm/commit/a056d5a3754e193105409afa12c9f0c9a2d972a2

Improve performance running on multiple GPUs

    https://github.com/openmm/openmm/commit/0c82c2647de98da5c6dab7bf7a7b8b19705aadc0

Fixed errors when running on multiple GPUs

    https://github.com/openmm/openmm/commit/ed9df876d43c037c08d4762721e73e5caae086d9

Optimized reducing energy

    https://github.com/openmm/openmm/commit/2975f44

ecc2d258