Commits · a0acfbc961f00562bb49a8f3a541012b5796c9ad · tsoc / openmm

05 Sep, 2024 3 commits

Anton Gorenko authored Aug 25, 2024

* Compile with -munsafe-fp-atomics to enable fast hardware f32 atomic
  add on global memory on pre-MI100 GPUs;
* Use fixed point charge spreading on other GPUs, otherwise float atomic
  add will be compiled as a slow CAS loop;
* Tune block sizes, use executeKernelFlat;
* Tune launch bounds of PME grid-related kernels: force the compiler to
  use all registers by limiting max waves per EU to 1.

a0acfbc9
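The first two bullets replace floating-point atomic adds with integer ones on GPUs that lack hardware f32 atomic add. Below is a minimal Python sketch of the fixed-point idea. It assumes a scale factor of 2**32 (32 fractional bits), and the function names are hypothetical, not the HIP kernel's API. Each value is truncated to a 64-bit integer, the integers are summed with wrap-around like repeated unsigned 64-bit atomic adds, and the total is converted back to a real number.

```python
# Hypothetical sketch of 64-bit fixed-point accumulation (scale factor assumed).
FIXED_POINT_SCALE = 2**32          # assumed scale: 32 fractional bits
MASK64 = (1 << 64) - 1             # emulate 64-bit unsigned wrap-around

def real_to_fixed_point(x: float) -> int:
    """Scale and truncate toward zero, like a C float-to-int64 cast."""
    return int(x * FIXED_POINT_SCALE)

def accumulate_fixed_point(values) -> float:
    """Sum values the way repeated unsigned 64-bit atomic adds would."""
    acc = 0
    for v in values:
        # Two's-complement bits of the signed value, added modulo 2**64.
        acc = (acc + (real_to_fixed_point(v) & MASK64)) & MASK64
    if acc >= 1 << 63:             # reinterpret the bits as a signed int64
        acc -= 1 << 64
    return acc / FIXED_POINT_SCALE
```

Because integer addition is associative, the result does not depend on the order in which threads add their contributions, which floating-point atomics cannot guarantee.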

Optimize findInteractingBlocks · a96534c1

Anton Gorenko authored Aug 25, 2024

Optimize findBlocksWithInteractions

* Replace volatile shared mem accesses with shuffles;
* Add NUM_TILES_IN_BATCH for processing block1 by multiple warps
  (for small systems);
* Cherry-pick missing changes from .cu;
* Tune MAX_BITS_FOR_PAIRS depending on device and the system size;
* Store single pairs immediately (if there are any); this avoids storing
  flags to shared memory and filtering buffer and flagsBuffer after
  saving single pairs;
* Use fma explicitly and sign bit for better device code;
* Use CDNA's MFMA with single/mixed precision;
* On CDNA, the coarse-grained stage processes warpSize blocks for
  one block1; the fine-grained stage checks atoms of two block2 against
  atoms of the same block1; singlePairs and interactingAtoms are also
  stored by full warps, not half-warps;

Optimize findBlockBounds

* Use shuffles;
* Use executeKernelFlat;
* Process 2 tiles per 64-thread warp on CDNA;
* Use more uniformly distributed keys when sorting blocks;

Use compareInt2LargeSIMD when tile size < SIMD width

Fix exclusion tiles sorting on AMD CDNA (64 threads per wave)

    The nonbonded kernel uses USE_NEIGHBOR_LIST (useNeighborList)
    so host code also must check it instead of useCutoff.

    See also https://github.com/openmm/openmm/issues/3462

a96534c1

Optimize computeNonbonded · 67f5644d

Anton Gorenko authored Aug 25, 2024

* All AMD GPUs support shuffle, double precision and 64-bit int atomics;
* Remove unused code: !ENABLE_SHUFFLE code paths in nonbonded.hip;
* Use intrinsics in single precision;
* Use realToFixedPoint (faster float32-to-int64);
* Remove shared atomIndices, use shuffles;
* Check early whether atoms are within the cutoff: sometimes all lanes
  in a warp can skip the computation, and single pairs can also skip
  useless atomics that add zero values;
* Remove volatile skipTiles access, use shuffles;
* Distribute work for warps in a strided order;
* Skip warps that may still be busy in the first loop;
* Unify conditions for excluded atoms with `includeInteraction`;
* Move multiprocessors to HipContext;
* Increase number of warps for computeNonbonded;
* Disable packed math for >=MI200 (it affects performance of some
  kernels like computeGKForces of amoebagk);
* Remove defaultOptimizationOptions and createModule's optimizationFlags
  as they are never used;
* Support -save-temps.

67f5644d

01 Sep, 2024 4 commits

Optimize sorting kernels and tune block sizes · 7279c539

Anton Gorenko authored Aug 25, 2024

* Compile kernels with max block size of 256 threads:
  The default hipcc behavior since ROCm 4.2 is to compile kernels
  with 1024 threads unless __launch_bounds__ is specified. This
  significantly increases register pressure especially in heavy kernels
  (double precision, for example), requiring register spilling;
* Optimize computeRange by using multiple blocks for reduction;
* Use blocks of 1024 threads for computeBucketPositions - it is executed
  as a single work group so larger block size is faster;
* Sort up to lenghtNextPow2 instead of blockDim.x (faster for short
  buckets);
* Optimize sortShortList2;
* Optimize sortBuckets with bit instructions;
* Decrease bucket size for non-uniform sorting: too many buckets may
  have sizes too large to sort in shared memory;
* Add more sizes in tests.

7279c539
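The bucket-sorting change pads each bucket only to the next power of two above its length instead of to the full block size. Below is a minimal Python sketch of that idea. It assumes a bitonic sorting network, which needs a power-of-two input size; the network actually used by sortBuckets and the helper names here are assumptions for illustration.

```python
import math

def next_pow2(n: int) -> int:
    """Smallest power of two >= n (returns 1 for n <= 1)."""
    if n <= 1:
        return 1
    return 1 << (n - 1).bit_length()

def bitonic_sort_padded(values):
    """Bitonic sort over next_pow2(len(values)) slots instead of a fixed block size."""
    n = len(values)
    size = next_pow2(n)
    # Pad with +inf so the padding ends up after every real value.
    data = list(values) + [math.inf] * (size - n)
    k = 2
    while k <= size:
        j = k // 2
        while j > 0:
            for i in range(size):      # on a GPU, one thread per i
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (ascending and data[i] > data[partner]) or \
                       (not ascending and data[i] < data[partner]):
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data[:n]
```

For a 5-element bucket the network runs over 8 slots rather than, say, 256, which is where the savings for short buckets come from.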

Clean up CMake scripts for HIP platform · aca24d5f

Anton Gorenko authored Aug 25, 2024

* Remove setting of link libraries, include and link dirs and compile
  flags for each target; instead, let CMake handle them by linking the
  main library to hip::host, hiprtc::hiprtc and hip::hipfft;
* Fix: custom command without ADD_CUSTOM_TARGET and ADD_DEPENDENCIES is
  executed for both static and shared targets;
* Remove IF(APPLE) parts.

aca24d5f

Add hipification of Amoeba, Drude, RPMD plugins · 6c0f3fbd

Anton Gorenko authored Aug 25, 2024

Fix SegFault in HipCalcHippoNonbondedForceKernel

    HipSort was created using a reference to a temporary. Adding a
    `HipContext& cu` field to HipCalcHippoNonbondedForceKernel fixes the issue.

6c0f3fbd

Add hipification of CUDA platform · 89d2ff0e

Anton Gorenko authored Aug 25, 2024

Port changes in CUDA backend to HIP

Fix a warning about arithmetic operations on void* in HipArray::uploadSubArray

Fix "Error Initializing context ROCm 5.3.0"

    https://github.com/StreamHPC/openmm-hip/issues/3

    hipDeviceSetCacheConfig returns hipErrorNotSupported on 5.3

Co-authored-by: Nick Curtis <nicholas.curtis@amd.com>

89d2ff0e

23 Aug, 2024 2 commits
- Update reference for citing OpenMM (#4628) · 8defca2d
  Peter Eastman authored Aug 23, 2024
  
  8defca2d
- Support numpy 2 (#4622) · 1cd905b4
  Peter Eastman authored Aug 23, 2024
```
* Support numpy 2

* Debugging

* Removed debugging code
```
  1cd905b4
19 Aug, 2024 1 commit
- Fixed periodic box changing from rectangular to triclinic (#4618) · 75d4f299
  Peter Eastman authored Aug 19, 2024
  
  75d4f299
06 Aug, 2024 1 commit
- Don't require importlib_metadata (#4612) · 68fd0f67
  Peter Eastman authored Aug 06, 2024
```
* Don't require importlib_metadata

* Handle older versions of importlib
```
  68fd0f67
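Dropping the importlib_metadata backport means relying on the standard library's importlib.metadata, whose entry_points() API changed across Python versions. Below is a minimal sketch of one way to handle both variants. Whether OpenMM queries entry points this way is an assumption, and the helper name and group names are hypothetical.

```python
import importlib.metadata

def entry_points_for_group(group: str) -> list:
    """Return the installed entry points in `group`, on old and new Pythons."""
    eps = importlib.metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the result supports select(group=...)
        return list(eps.select(group=group))
    # Python 3.8/3.9: a plain dict mapping group name -> tuple of EntryPoint
    return list(eps.get(group, ()))
```

Feature-testing with hasattr instead of comparing sys.version_info keeps the code working on any interpreter that provides the newer interface.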
25 Jul, 2024 1 commit

Fix import of netcdf_file for compatibility with scipy 1.14 (#4602) · 79202805

Timothy Palpant authored Jul 25, 2024
* Fix import of netcdf_file for scipy 1.14

* Fix indentation

---------
Co-authored-by: Timothy Palpant <tim@atommapper.com>

79202805

19 Jul, 2024 1 commit

Run Mac tests on ARM and Intel (#4597) · ab7ad049

Peter Eastman authored Jul 19, 2024

* Run Mac tests on ARM and Intel

* Added missing environment file

* Removed obsolete code for M1 runner

* Removed obsolete code for M1 runner

ab7ad049

17 Jul, 2024 1 commit

Debug CI failures (#4588) · e30e5b69

Peter Eastman authored Jul 17, 2024

* Debug CI failures

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Removed build that was failing

* Fixed URL that had changed

e30e5b69

09 Jul, 2024 1 commit
- addHydrogens() allows specifying exactly what hydrogens to add (#4585) · 9e4b6ba5
  Peter Eastman authored Jul 09, 2024
```
* addHydrogens() allows specifying exactly what hydrogens to add

* Prevent CI from using numpy 2.0
```
  9e4b6ba5
13 May, 2024 1 commit
- GromacsTopFile supports virtual_sites3 function 4 (#4536) · 51a112a3
  Peter Eastman authored May 13, 2024
  
  51a112a3
03 May, 2024 1 commit
- Allow multiple registrations of the same atom type if definitions identical (#4531) · fd44d285
  Matt Thompson authored May 03, 2024
```
* Allow multiple registrations of the same atom type if definitions identical

* Different short-circuiting logic
```
  fd44d285
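The rule in this commit, that a duplicate atom type registration is accepted only when it matches the existing definition, can be sketched as below. The class, its method names and the definition format are hypothetical and do not reflect how ForceField stores atom types internally.

```python
class AtomTypeRegistry:
    """Hypothetical registry: re-registering an identical definition is allowed."""

    def __init__(self):
        self._types = {}

    def register(self, name: str, definition: dict) -> None:
        if name in self._types:
            if self._types[name] == definition:
                return  # identical duplicate: nothing to do
            raise ValueError(f"Conflicting definitions for atom type {name!r}")
        self._types[name] = definition

    def __len__(self):
        return len(self._types)
```

This lets two force field files that both define the same type load together, while a real conflict still fails loudly.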
29 Apr, 2024 3 commits
- Fixed errors with residueTemplates arguments to Modeller methods (#4528) · 5d407c94
  Peter Eastman authored Apr 29, 2024
  
  5d407c94
- DrudeForce supports periodic boundary conditions (#4523) · 8ac9cc44
  Peter Eastman authored Apr 29, 2024
```
* DrudeForce supports periodic boundary conditions

* Fixed uninitialized memory
```
  8ac9cc44
- Update residueTemplates dictionary when solvating a system (#4525) · 3cb6f8d7
  FloLangenfeld authored Apr 29, 2024
```
Co-authored-by: FloLangenfeld <florent.langenfeld@peptinov.fr>
```
  3cb6f8d7
10 Apr, 2024 1 commit
- added type checking for Simulation.step() (#4506) · 2d4372d7
  Marc Schuh authored Apr 10, 2024
```
* added type checking for Simulation.step()

* changed how to check if step is an integer number
```
  2d4372d7
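One way to check that a step count is an integer without rejecting NumPy integer scalars is to test against numbers.Integral. Below is a minimal sketch. The helper name is hypothetical, and whether Simulation.step() uses this exact check is an assumption.

```python
import numbers

def validate_steps(steps) -> int:
    """Hypothetical check: accept any integral type, reject floats and strings."""
    if not isinstance(steps, numbers.Integral):
        raise TypeError(f"steps must be an integer, not {type(steps).__name__}")
    return int(steps)
```

NumPy registers its integer scalar types (for example numpy.int64) as numbers.Integral, so they pass, while a float such as 1e6 is rejected even when it has no fractional part.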
09 Apr, 2024 1 commit
- Fixed error when using both CustomNonbondedForce and LennardJonesForce (#4508) · 967d89c8
  Peter Eastman authored Apr 09, 2024
  
  967d89c8
06 Apr, 2024 2 commits
- ATMForce reorders inner contexts for better performance (#4495) · ea4b6872
  Peter Eastman authored Apr 06, 2024
```
* ATMForce reorders inner contexts for better performance

* Fixed obsolete comments
```
  ea4b6872
- reinitialize() remembers whether positions have been set (#4501) · 6ba168cd
  Peter Eastman authored Apr 05, 2024
  
  6ba168cd
05 Apr, 2024 1 commit
- Do not build versioned libraries (#4498) · daab6862
  Peter Eastman authored Apr 05, 2024
  
  daab6862
28 Mar, 2024 2 commits
- Check BUILD_TESTING before building tests (#4487) · b32720cc
  Peter Eastman authored Mar 28, 2024
  
  b32720cc
- Avoid overflow in large XTC files (#4485) · 6cf08d25
  Raul authored Mar 28, 2024
```
* Avoid overflow in large XTC files

* Also cast box indices to size_t
```
  6cf08d25
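The fix casts indices to size_t because offsets into large files can exceed the range of a 32-bit int. Below is a short illustration using ctypes, which truncates out-of-range values without checking, to mimic what a 32-bit index computation typically produces (in C, signed overflow is undefined behavior, and wrap-around is the common result). The frame and atom counts are hypothetical, and which exact expressions overflowed in the XTC code is not stated here.

```python
import ctypes

def offset_as_int32(frame: int, natoms: int) -> int:
    """Coordinate offset computed in a 32-bit signed int (wraps on overflow)."""
    return ctypes.c_int32(frame * natoms * 3).value

def offset_as_uint64(frame: int, natoms: int) -> int:
    """Same offset in 64 bits, like size_t on typical 64-bit platforms."""
    return ctypes.c_uint64(frame * natoms * 3).value
```

With 1,000,000 atoms, frame 1000 needs an offset of 3,000,000,000 coordinates, which no longer fits in 32 bits and wraps to a negative number.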
21 Mar, 2024 1 commit
- Created DebuggingReporter class (#4482) · f82876f0
  Peter Eastman authored Mar 21, 2024
```
* Created DebuggingReporter class

* Fixed description
```
  f82876f0
18 Mar, 2024 1 commit
- Update references to "openmmforcefields" project (#4480) · 3a2b49f2
  Matt Thompson authored Mar 18, 2024
```
* Update references to "openmmforcefields" project

* Org name, too
```
  3a2b49f2
09 Mar, 2024 1 commit

Revised code from installing CUDA on CI (#4470) · b0eb7713

Peter Eastman authored Mar 08, 2024

* Revised code from installing CUDA on CI

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Debugging

b0eb7713

08 Mar, 2024 2 commits

Add LoongArch architecture LSX support. (#4467) · af229eb4

ZangRuochen authored Mar 09, 2024

LoongArch is a new architecture, already supported by linux-6.1, gcc-12.
Signed-off-by: Zang Ruochen <zangruochen@loongson.cn>
Co-authored-by: Zang Ruochen <zangruochen@loongson.cn>

af229eb4

vectorize_portable.h works on gcc (#4466) · faee23a2

Peter Eastman authored Mar 07, 2024

faee23a2

05 Mar, 2024 2 commits
- fix minor issue in error message with non-existent self.lines (#4464) · 122bbe40
  Stefan Doerr authored Mar 05, 2024
  
  122bbe40
- remove possibility of defining (32-bit) __ARM__ instead of __ARM64__ (#4462) · 669c7fc7
  Miguel Dias Costa authored Mar 05, 2024
  
  669c7fc7
04 Mar, 2024 1 commit
- Fixed memory leak (#4461) · 66616acc
  Peter Eastman authored Mar 04, 2024
  
  66616acc
24 Feb, 2024 1 commit
- Minor optimization to validating exclusions (#4453) · 6c6bc628
  Peter Eastman authored Feb 24, 2024
```
* Minor optimization to validating exclusions

* Optimizations to findMoleculeGroups()
```
  6c6bc628
23 Feb, 2024 1 commit
- Improved performance of CustomHbondForce on large systems (#4451) · 4e742529
  Peter Eastman authored Feb 23, 2024
```
* Improved performance of CustomHbondForce on large systems

* Fixed CUDA compilation errors
```
  4e742529
17 Feb, 2024 1 commit

Use LF-Middle for LangevinIntegrator and VariableLangevinIntegrator (#4440) · 86988b90

Peter Eastman authored Feb 16, 2024

* Made LangevinIntegrator identical to LangevinMiddleIntegrator

* Removed unused code

* VariableLangevinIntegrator uses LFMiddle

86988b90

13 Feb, 2024 1 commit

API improvements (#4437) · e62bdf6a

Peter Eastman authored Feb 13, 2024

* Can use getPlatform() instead of getPlatformByName()

* More concise arguments for getState()

e62bdf6a

02 Feb, 2024 1 commit

Virtual sites can depend on other virtual sites (#4348) · 71f4b3fc

Peter Eastman authored Feb 02, 2024

* Reference platform supports nested virtual sites

* Common platform supports nested virtual sites

* Fixed force distribution from nested virtual sites

* Fixed test failures

71f4b3fc
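Supporting virtual sites that depend on other virtual sites means positions must be computed in dependency order. Below is a standalone Python sketch of that ordering, not OpenMM's implementation. It assumes only two-particle weighted-average sites, and the function name and data layout are hypothetical.

```python
def compute_virtual_positions(positions, virtual_sites):
    """positions: particle -> (x, y, z) for real particles.
    virtual_sites: site -> (p1, p2, w1, w2), a weighted average of p1 and p2,
    where p1 and p2 may themselves be virtual sites."""
    resolved = dict(positions)
    visiting = set()

    def resolve(p):
        if p in resolved:
            return resolved[p]
        if p in visiting:
            raise ValueError(f"circular virtual site dependency at particle {p}")
        visiting.add(p)
        p1, p2, w1, w2 = virtual_sites[p]
        a, b = resolve(p1), resolve(p2)  # compute dependencies first
        resolved[p] = tuple(w1 * x1 + w2 * x2 for x1, x2 in zip(a, b))
        visiting.discard(p)
        return resolved[p]

    for site in virtual_sites:
        resolve(site)
    return resolved
```

Distributing forces has to walk the same dependency order in reverse: a site that depends on another virtual site must pass its force to that site before the latter spreads its accumulated force onto real particles.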