Commit f55abcaa authored by Evan Pretti, committed by GitHub

Add constant potential method (#4870)



* Initial implementation of C++ API

* Add kernel interface and information for API generation

* API updates for updating electrode parameters

* Add serialization proxy for ConstantPotentialForce

* Update file headers

* Add CG error tolerance and fix units on getCharges() return value

* Initial implementation of matrix solver

* Fixes and conjugate gradient solver

* Try to fix Linux and Windows builds

* Make sure charge constraint target is on total charge

* Restore handling of exceptions like NonbondedForce since they won't involve electrode atoms

* Ameliorate numerical instability in constrained conjugate gradient

* Fix uninitialized pointers, memory leak, and style

* Set CG tolerance units in Python API

* Test ConstantPotentialForce serialization

* Read/write ExceptionsUsePeriodicBoundaryConditions as bool

* Improve constrained conjugate gradient robustness to roundoff error accumulation

* Recompute matrix if electrode atoms move due to setPositions()

* Tolerance is now in gradient (potential) units again

* Add neutralizing background correction

* Add Python API tests

* Fixes for CG and nonbonded exceptions

* Add initial tests checking against existing NonbondedForce behavior

* Expand test suite and fix some implementation issues

* Add additional tests using larger reference system

* Add Gaussian test

* Finish test against reference computation

* CPU platform implementation

* Fixes for compilation on some platforms

* Fixes for constant potential with AVX/AVX2

* Test linking CPU PME library to constant potential test directly

* Older SWIG versions don't support Python set to C++ set conversion

* Add user guide entry

* Increase speed of reference test

* Conditional building constant potential CPU test is unreliable

* Debugging

* Miscellaneous fixes and improvements for CI

* Cache charges so solver will not run if system and coordinates have not changed

* Preconditioner flag, stability, and automatic detection improvements

* Add GPU platform-specific constant potential kernel classes

* PME and device-host I/O changes to support constant potential

* Initial common constant potential implementation

* Constant potential fixes:

* Fix preconditioner PME position/charge save/restore logic

* Fix reduction synchronization in constant potential solver kernels

* Add double-float accumulation for conjugate gradient solver when
  double unsupported by hardware

* Improve conditioning of a test system, and make sure particles are in or
  out of cutoff for consistency and ease of comparing between platforms

* Reorder guess charges for CG when atom reordering changes positions

* Remove PME queue for now

* Trying to debug optimized direct space derivative kernel

* Remove extraneous debugging lines

* Style updates; just make CPU preconditioner double precision

* Debugging updated optimized direct derivatives kernel for all but OpenCL CPU

* OpenCL CPU implementation of direct space derivatives, and cleanup

* Try to make test even shorter to not time out on CI

* Temporary - Debugging

* Debugging

* Debugging

* Debugging

* Debugging

* Remove debugging code and fix reduction synchronization

* Fix other reductions

* Debugging - are tests hanging or just slow on CI?

* Debugging

* Debugging

* Fix macro for case when double precision is available on hardware

* Remove changes for debugging again

* Try to improve matrix solver cache locality by uploading transpose

* Fixes for atom ordering and periodic images

* Can't rely on reorder listener for cell offset updates

* Test reducing number of contexts and timing for CI

* Debugging

* Remove timing code and revert debugging changes

* Matrix solver and plasma term optimizations

* Reduce CG solver kernel calls and downloads

* Don't read back convergence flag from global memory

* Update PME due to refactoring in master branch

* Faster matrix solver (1st step)

* Faster matrix solver for CUDA

* Faster matrix solver compatibility with non-CUDA platforms

* Matrix solver fixes

* Use warp shuffle reductions when possible

* Attempt to work around intermittent compiler crash in Intel CPU OpenCL

* Optimize CG solver kernel 1

* Rework CG solver so some kernels can use more than 1 block

* Don't run out of shared memory

* Asynchronously download convergence flag while clearing buffers

---------
Co-authored-by: Evan Pretti <pretti@sh03-17n15.int>
parent 0ad62341
// The approximation for erfc is from Abramowitz and Stegun (1964) p. 299. They cite the following as
// the original source: C. Hastings, Jr., Approximations for Digital Computers (1955). It has a maximum
// error of 1.5e-7.
if (!isExcluded && r2 < CUTOFF_SQUARED) {
const real prefactor = ONE_4PI_EPS0 * CHARGE1 * CHARGE2 * invR;
const real alphaR = EWALD_ALPHA * r;
const real expAlphaRSqr = EXP(-alphaR * alphaR);
#ifdef USE_DOUBLE_PRECISION
const real erfcAlphaR = erfc(alphaR);
#else
const real tAlpha = RECIP(1.0f+0.3275911f*alphaR);
const real erfcAlphaR = (0.254829592f+(-0.284496736f+(1.421413741f+(-1.453152027f+1.061405429f*tAlpha)*tAlpha)*tAlpha)*tAlpha)*tAlpha*expAlphaRSqr;
#endif
real tempForceScale = erfcAlphaR + TWO_OVER_SQRT_PI * alphaR * expAlphaRSqr;
real tempEnergyScale = erfcAlphaR;
if (SYSELEC1 != -1 || SYSELEC2 != -1) {
const real4 params1 = PARAMS[SYSELEC1 + 1];
const real4 params2 = PARAMS[SYSELEC2 + 1];
const real etaR = r / SQRT(params1.y * params1.y + params2.y * params2.y);
const real expEtaRSqr = EXP(-etaR * etaR);
#ifdef USE_DOUBLE_PRECISION
const real erfcEtaR = erfc(etaR);
#else
const real tEta = RECIP(1.0f+0.3275911f*etaR);
const real erfcEtaR = (0.254829592f+(-0.284496736f+(1.421413741f+(-1.453152027f+1.061405429f*tEta)*tEta)*tEta)*tEta)*tEta*expEtaRSqr;
#endif
tempForceScale -= erfcEtaR + TWO_OVER_SQRT_PI * etaR * expEtaRSqr;
tempEnergyScale -= erfcEtaR;
}
tempEnergy += prefactor * tempEnergyScale;
dEdR += prefactor * tempForceScale * invR * invR;
}
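The single-precision branch above replaces `erfc` with the polynomial quoted in the comment. As a rough standalone check of the stated 1.5e-7 bound, the sketch below evaluates the same coefficients in plain C++. It works in double precision to isolate the approximation error from float rounding, whereas the kernel uses `real`. The names `erfcApprox` and `maxErfcError` are hypothetical and do not appear in the commit.

```cpp
#include <cassert>
#include <cmath>

// Polynomial approximation to erfc(x) for x >= 0, using the same
// coefficients as the kernel fragment above (evaluated in double here).
inline double erfcApprox(double x) {
    assert(x >= 0.0);
    const double t = 1.0 / (1.0 + 0.3275911 * x);
    const double poly = (0.254829592 + (-0.284496736 + (1.421413741
                      + (-1.453152027 + 1.061405429 * t) * t) * t) * t) * t;
    return poly * std::exp(-x * x);
}

// Largest absolute deviation from std::erfc on a uniform grid over [0, xMax].
inline double maxErfcError(double xMax, int samples) {
    assert(samples > 0);
    double worst = 0.0;
    for (int i = 0; i <= samples; i++) {
        const double x = xMax * i / samples;
        const double err = std::fabs(erfcApprox(x) - std::erfc(x));
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

In single precision, rounding in the polynomial evaluation adds to this error, which may be why the kernel calls `erfc` directly when `USE_DOUBLE_PRECISION` is defined.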
const real exceptionScale = PARAMS[index];
real3 delta = make_real3(pos2.x - pos1.x, pos2.y - pos1.y, pos2.z - pos1.z);
#if APPLY_PERIODIC
APPLY_PERIODIC_TO_DELTA(delta)
#endif
const real r2 = delta.x * delta.x + delta.y * delta.y + delta.z * delta.z;
const real invR = RSQRT(r2);
const real tempEnergy = exceptionScale * invR;
const real tempForce = tempEnergy * invR * invR;
energy += tempEnergy;
delta *= tempForce;
real3 force1 = -delta;
real3 force2 = delta;
const real exclusionScale = PARAMS[index];
real3 delta = make_real3(pos2.x - pos1.x, pos2.y - pos1.y, pos2.z - pos1.z);
#if APPLY_PERIODIC
APPLY_PERIODIC_TO_DELTA(delta)
#endif
const real r2 = delta.x * delta.x + delta.y * delta.y + delta.z * delta.z;
const real r = SQRT(r2);
const real invR = RECIP(r);
const real alphaR = EWALD_ALPHA * r;
real tempForce = 0.0f;
if (alphaR > 1e-6f) {
const real erfAlphaR = ERF(alphaR);
const real prefactor = exclusionScale * invR;
tempForce = prefactor * (erfAlphaR - TWO_OVER_SQRT_PI * alphaR * EXP(-alphaR * alphaR)) * invR * invR;
energy -= prefactor * erfAlphaR;
}
else {
energy -= TWO_OVER_SQRT_PI * EWALD_ALPHA * exclusionScale;
}
delta *= tempForce;
real3 force1 = delta;
real3 force2 = -delta;
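The exclusion fragment above switches to a closed-form value when `alphaR` is tiny, because `erf(alpha*r)/r` becomes 0/0 as `r` goes to zero. Since erf(x) = (2/sqrt(pi)) (x - x^3/3 + ...), the ratio tends to 2*alpha/sqrt(pi). A minimal host-side sketch of the same branch, with the hypothetical name `exclusionEnergy` and plain doubles in place of the kernel macros:

```cpp
#include <cassert>
#include <cmath>

// Energy contribution of one excluded pair, mirroring the two branches of
// the fragment above: -scale * erf(alpha * r) / r, or its r -> 0 limit.
inline double exclusionEnergy(double scale, double alpha, double r) {
    assert(alpha > 0.0 && r >= 0.0);
    const double twoOverSqrtPi = 2.0 / std::sqrt(std::acos(-1.0));
    const double alphaR = alpha * r;
    if (alphaR > 1e-6)
        return -scale * std::erf(alphaR) / r;
    // Limit of erf(alpha * r) / r as r -> 0 is 2 * alpha / sqrt(pi).
    return -twoOverSqrtPi * alpha * scale;
}
```

The force in the small-`alphaR` branch stays at zero, consistent with the leading force term vanishing linearly in `r`.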
#define WARP_SIZE 32
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
#define WARP_SHUFFLE(local, index) __shfl_sync(0xffffffff, local, index)
#define WARP_SHUFFLE_DOWN(local, offset) __shfl_down_sync(0xffffffff, local, offset)
#elif defined(USE_HIP)
#define WARP_SHUFFLE(local, index) __shfl(local, index)
#define WARP_SHUFFLE_DOWN(local, offset) __shfl_down(local, offset)
#endif
#ifdef WARP_SHUFFLE_DOWN
#define TEMP_SIZE WARP_SIZE
#else
#define TEMP_SIZE THREAD_BLOCK_SIZE
#endif
DEVICE real reduceValue(real value, LOCAL_ARG volatile real* temp) {
const int thread = LOCAL_ID;
SYNC_THREADS;
#ifdef WARP_SHUFFLE_DOWN
const int warpCount = LOCAL_SIZE / WARP_SIZE;
const int warp = thread / WARP_SIZE;
const int lane = thread % WARP_SIZE;
for (int step = WARP_SIZE / 2; step > 0; step >>= 1) {
value += WARP_SHUFFLE_DOWN(value, step);
}
if (!lane) {
temp[warp] = value;
}
SYNC_THREADS;
if (!warp) {
value = lane < warpCount ? temp[lane] : 0;
for (int step = WARP_SIZE / 2; step > 0; step >>= 1) {
value += WARP_SHUFFLE_DOWN(value, step);
}
if (!lane) {
temp[0] = value;
}
}
SYNC_THREADS;
#else
temp[thread] = value;
SYNC_THREADS;
for (int step = 1; step < WARP_SIZE / 2; step <<= 1) {
if(thread + step < LOCAL_SIZE && thread % (2 * step) == 0) {
temp[thread] += temp[thread + step];
}
SYNC_WARPS;
}
for (int step = WARP_SIZE / 2; step < LOCAL_SIZE; step <<= 1) {
if(thread + step < LOCAL_SIZE && thread % (2 * step) == 0) {
temp[thread] += temp[thread + step];
}
SYNC_THREADS;
}
#endif
return temp[0];
}
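The non-shuffle path of `reduceValue` is a pairwise tree reduction in local memory: at each step, slot `t` with `t % (2 * step) == 0` absorbs slot `t + step`, and a barrier separates the steps. A serial C++ emulation of that access pattern is sketched below. It merges the kernel's two loops (warp-level and block-level synchronization) into one, since a serial loop needs no barriers. `treeReduce` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial emulation of the local-memory tree reduction. The vector plays the
// role of the temp buffer, with one slot per thread in the block.
inline double treeReduce(std::vector<double> temp) {
    assert(!temp.empty());
    const std::size_t n = temp.size();
    for (std::size_t step = 1; step < n; step <<= 1) {
        // Slots at multiples of 2 * step absorb the partner step slots away.
        for (std::size_t t = 0; t + step < n; t += 2 * step)
            temp[t] += temp[t + step];
        // On a GPU, a barrier would be required here before the next step.
    }
    return temp[0];
}
```

The bounds check `t + step < n` is what lets the pattern handle block sizes that are not powers of two.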
KERNEL void checkSavedElectrodePositions(GLOBAL real4* RESTRICT posq, GLOBAL real4* RESTRICT electrodePosData, GLOBAL int* RESTRICT elecToSys, GLOBAL int* RESTRICT result) {
for (int ii = GLOBAL_ID; ii < NUM_ELECTRODE_PARTICLES; ii += GLOBAL_SIZE) {
real4 posqPosition = posq[elecToSys[ii]];
real4 savedPosition = electrodePosData[ii];
if (posqPosition.x != savedPosition.x || posqPosition.y != savedPosition.y || posqPosition.z != savedPosition.z) {
*result = 1;
break;
}
}
}
KERNEL void saveElectrodePositions(GLOBAL real4* RESTRICT posq, GLOBAL real4* RESTRICT electrodePosData, GLOBAL int* RESTRICT elecToSys) {
for (int ii = GLOBAL_ID; ii < NUM_ELECTRODE_PARTICLES; ii += GLOBAL_SIZE) {
electrodePosData[ii] = posq[elecToSys[ii]];
}
}
KERNEL void solve(GLOBAL real* RESTRICT electrodeCharges, GLOBAL real* RESTRICT chargeDerivatives, GLOBAL real* RESTRICT capacitance
#ifdef USE_CHARGE_CONSTRAINT
, GLOBAL real* RESTRICT constraintVector, real chargeTarget
#endif
) {
// This kernel expects to be executed in a single thread block.
#if CHUNK_SIZE > 1
LOCAL volatile real chunkCharges[CHUNK_SIZE];
#endif
for (int ii = LOCAL_ID; ii < NUM_ELECTRODE_PARTICLES; ii += LOCAL_SIZE) {
electrodeCharges[ii] = -chargeDerivatives[ii];
}
SYNC_THREADS;
// Cholesky solve step 1 (outer loop over chunks of rows).
for (int jj = 0; jj < PADDED_PROBLEM_SIZE; jj += CHUNK_SIZE) {
if (LOCAL_ID < CHUNK_SIZE) {
#if CHUNK_SIZE > 1
#ifdef WARP_SHUFFLE
real threadCharge = electrodeCharges[jj + LOCAL_ID];
for (int k = 0; k < CHUNK_SIZE - 1; k++) {
const real chargeShuffled = WARP_SHUFFLE(threadCharge, k);
if (LOCAL_ID > k) {
threadCharge -= chargeShuffled * capacitance[(mm_long) (jj + k) * PADDED_PROBLEM_SIZE + (jj + LOCAL_ID)];
}
}
SYNC_WARPS;
electrodeCharges[jj + LOCAL_ID] = chunkCharges[LOCAL_ID] = threadCharge;
#else
chunkCharges[LOCAL_ID] = electrodeCharges[jj + LOCAL_ID];
for (int k = 0; k < CHUNK_SIZE - 1; k++) {
SYNC_WARPS;
if (LOCAL_ID > k) {
chunkCharges[LOCAL_ID] -= chunkCharges[k] * capacitance[(mm_long) (jj + k) * PADDED_PROBLEM_SIZE + (jj + LOCAL_ID)];
}
}
SYNC_WARPS;
electrodeCharges[jj + LOCAL_ID] = chunkCharges[LOCAL_ID];
#endif
#endif
}
SYNC_THREADS;
for (int ii = jj + CHUNK_SIZE + LOCAL_ID; ii < NUM_ELECTRODE_PARTICLES; ii += LOCAL_SIZE) {
#if CHUNK_SIZE > 1
real chargeOffset = 0;
for (int k = 0; k < CHUNK_SIZE; k++) {
chargeOffset += chunkCharges[k] * capacitance[(mm_long) (jj + k) * PADDED_PROBLEM_SIZE + ii];
}
electrodeCharges[ii] -= chargeOffset;
#else
electrodeCharges[ii] -= electrodeCharges[jj] * capacitance[(mm_long) jj * PADDED_PROBLEM_SIZE + ii];
#endif
}
SYNC_THREADS;
}
for (int ii = LOCAL_ID; ii < NUM_ELECTRODE_PARTICLES; ii += LOCAL_SIZE) {
electrodeCharges[ii] *= capacitance[(mm_long) ii * PADDED_PROBLEM_SIZE + ii];
}
SYNC_THREADS;
// Cholesky solve step 2 (outer loop over chunks of columns).
for (int jj = PADDED_PROBLEM_SIZE - CHUNK_SIZE; jj >= 0; jj -= CHUNK_SIZE) {
if (LOCAL_ID < CHUNK_SIZE) {
#if CHUNK_SIZE > 1
#ifdef WARP_SHUFFLE
real threadCharge = electrodeCharges[jj + LOCAL_ID];
for (int k = CHUNK_SIZE - 1; k >= 0; k--) {
const real chargeShuffled = WARP_SHUFFLE(threadCharge, k);
if (LOCAL_ID < k) {
threadCharge -= chargeShuffled * capacitance[(mm_long) (jj + k) * PADDED_PROBLEM_SIZE + (jj + LOCAL_ID)];
}
}
SYNC_WARPS;
electrodeCharges[jj + LOCAL_ID] = chunkCharges[LOCAL_ID] = threadCharge;
#else
chunkCharges[LOCAL_ID] = electrodeCharges[jj + LOCAL_ID];
for (int k = CHUNK_SIZE - 1; k >= 0; k--) {
SYNC_WARPS;
if (LOCAL_ID < k) {
chunkCharges[LOCAL_ID] -= chunkCharges[k] * capacitance[(mm_long) (jj + k) * PADDED_PROBLEM_SIZE + (jj + LOCAL_ID)];
}
}
SYNC_WARPS;
electrodeCharges[jj + LOCAL_ID] = chunkCharges[LOCAL_ID];
#endif
#endif
}
SYNC_THREADS;
for (int ii = LOCAL_ID; ii < jj; ii += LOCAL_SIZE) {
#if CHUNK_SIZE > 1
real chargeOffset = 0;
for (int k = 0; k < CHUNK_SIZE; k++) {
chargeOffset += chunkCharges[k] * capacitance[(mm_long) (jj + k) * PADDED_PROBLEM_SIZE + ii];
}
electrodeCharges[ii] -= chargeOffset;
#else
electrodeCharges[ii] -= electrodeCharges[jj] * capacitance[(mm_long) jj * PADDED_PROBLEM_SIZE + ii];
#endif
}
SYNC_THREADS;
}
for (int ii = LOCAL_ID; ii < NUM_ELECTRODE_PARTICLES; ii += LOCAL_SIZE) {
electrodeCharges[ii] *= capacitance[(mm_long) ii * PADDED_PROBLEM_SIZE + ii];
}
SYNC_THREADS;
#ifdef USE_CHARGE_CONSTRAINT
LOCAL volatile real temp[TEMP_SIZE];
real chargeOffset = 0;
for (int ii = LOCAL_ID; ii < NUM_ELECTRODE_PARTICLES; ii += LOCAL_SIZE) {
chargeOffset -= electrodeCharges[ii];
}
chargeOffset = chargeTarget + reduceValue(chargeOffset, temp);
for (int ii = LOCAL_ID; ii < NUM_ELECTRODE_PARTICLES; ii += LOCAL_SIZE) {
electrodeCharges[ii] += chargeOffset * constraintVector[ii];
}
#endif
}
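For readers tracing the `solve` kernel, the following serial sketch mirrors its `CHUNK_SIZE == 1` path: a division-free forward substitution, a diagonal scaling, a division-free backward substitution, a second diagonal scaling, and the optional total-charge correction. The packed layout is an assumption inferred from the kernel arithmetic, not something the diff states: `packed[j*n + i]` is taken to hold `L(i,j)/L(j,j)` for `i > j`, `L(j,i)/L(j,j)` for `i < j`, and `1/L(j,j)` on the diagonal, where `L` is a Cholesky factor. `packCholesky`, `solvePacked`, and `applyChargeConstraint` are hypothetical names, and padding is ignored.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: factor a dense SPD matrix a (row-major, n x n) as
// L * L^T and store it in the packed layout assumed by this sketch.
std::vector<double> packCholesky(const std::vector<double>& a, std::size_t n) {
    assert(a.size() == n * n);
    std::vector<double> l(n * n, 0.0);
    for (std::size_t j = 0; j < n; j++) {
        double diag = a[j * n + j];
        for (std::size_t k = 0; k < j; k++)
            diag -= l[j * n + k] * l[j * n + k];
        assert(diag > 0.0);
        l[j * n + j] = std::sqrt(diag);
        for (std::size_t i = j + 1; i < n; i++) {
            double sum = a[i * n + j];
            for (std::size_t k = 0; k < j; k++)
                sum -= l[i * n + k] * l[j * n + k];
            l[i * n + j] = sum / l[j * n + j];
        }
    }
    std::vector<double> packed(n * n);
    for (std::size_t j = 0; j < n; j++) {
        const double ljj = l[j * n + j];
        for (std::size_t i = 0; i < n; i++) {
            if (i == j)
                packed[j * n + i] = 1.0 / ljj;
            else
                packed[j * n + i] = (i > j ? l[i * n + j] : l[j * n + i]) / ljj;
        }
    }
    return packed;
}

// Serial mirror of the CHUNK_SIZE == 1 path: solves A q = -derivatives.
std::vector<double> solvePacked(const std::vector<double>& packed,
                                const std::vector<double>& derivatives) {
    const std::size_t n = derivatives.size();
    assert(packed.size() == n * n);
    std::vector<double> q(n);
    for (std::size_t i = 0; i < n; i++)
        q[i] = -derivatives[i];
    for (std::size_t j = 0; j < n; j++)            // step 1: forward sweep
        for (std::size_t i = j + 1; i < n; i++)
            q[i] -= q[j] * packed[j * n + i];
    for (std::size_t i = 0; i < n; i++)
        q[i] *= packed[i * n + i];
    for (std::size_t j = n; j-- > 0;)              // step 2: backward sweep
        for (std::size_t i = 0; i < j; i++)
            q[i] -= q[j] * packed[j * n + i];
    for (std::size_t i = 0; i < n; i++)
        q[i] *= packed[i * n + i];
    return q;
}

// Shift q along constraintVector so the total charge equals chargeTarget.
// This is exact only if the entries of constraintVector sum to one.
void applyChargeConstraint(std::vector<double>& q,
                           const std::vector<double>& constraintVector,
                           double chargeTarget) {
    assert(q.size() == constraintVector.size());
    double offset = chargeTarget;
    for (double value : q)
        offset -= value;
    for (std::size_t i = 0; i < q.size(); i++)
        q[i] += offset * constraintVector[i];
}
```

Storing reciprocals and pre-divided multipliers keeps divisions out of the inner loops, which is one plausible reason for a layout like this. The chunked paths in the kernel apply the same recurrence to `CHUNK_SIZE` rows at a time.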
KERNEL void checkSavedPositions(GLOBAL real4* RESTRICT posq, GLOBAL real4* RESTRICT savedPositions, GLOBAL int* RESTRICT result) {
for (int i = GLOBAL_ID; i < NUM_PARTICLES; i += GLOBAL_SIZE) {
real4 posqPosition = posq[i];
real4 savedPosition = savedPositions[i];
if (posqPosition.x != savedPosition.x || posqPosition.y != savedPosition.y || posqPosition.z != savedPosition.z) {
*result = 1;
break;
}
}
}
@@ -179,6 +179,8 @@ KERNEL void reciprocalConvolution(GLOBAL real2* RESTRICT pmeGrid, GLOBAL const r
real eterm = recipScaleFactor*EXP(-RECIP_EXP_FACTOR*m2)/denom;
if (kx != 0 || ky != 0 || kz != 0) {
pmeGrid[index] = make_real2(grid.x*eterm, grid.y*eterm);
} else {
pmeGrid[index] = make_real2(0);
}
#endif
}
@@ -351,6 +353,77 @@ KERNEL void gridInterpolateForce(GLOBAL const real4* RESTRICT posq, GLOBAL mm_ul
}
}
KERNEL void gridInterpolateChargeDerivatives(GLOBAL const real4* RESTRICT posq, GLOBAL mm_ulong* RESTRICT derivatives, GLOBAL const real* RESTRICT pmeGrid,
real4 periodicBoxSize, real4 invPeriodicBoxSize, real4 periodicBoxVecX, real4 periodicBoxVecY, real4 periodicBoxVecZ,
real4 recipBoxVecX, real4 recipBoxVecY, real4 recipBoxVecZ, GLOBAL const int* RESTRICT atomIndices
) {
real3 data[PME_ORDER];
const real scale = RECIP((real) (PME_ORDER-1));
for (int i = GLOBAL_ID; i < NUM_INDICES; i += GLOBAL_SIZE) {
int atom = atomIndices[i];
real derivative = 0;
real4 pos = posq[atom];
APPLY_PERIODIC_TO_POS(pos)
real3 t = make_real3(pos.x*recipBoxVecX.x+pos.y*recipBoxVecY.x+pos.z*recipBoxVecZ.x,
pos.y*recipBoxVecY.y+pos.z*recipBoxVecZ.y,
pos.z*recipBoxVecZ.z);
t.x = (t.x-floor(t.x))*GRID_SIZE_X;
t.y = (t.y-floor(t.y))*GRID_SIZE_Y;
t.z = (t.z-floor(t.z))*GRID_SIZE_Z;
int3 gridIndex = make_int3(((int) t.x) % GRID_SIZE_X,
((int) t.y) % GRID_SIZE_Y,
((int) t.z) % GRID_SIZE_Z);
// Since we need the full set of thetas, it's faster to compute them here than load them
// from global memory.
real3 dr = make_real3(t.x-(int) t.x, t.y-(int) t.y, t.z-(int) t.z);
data[PME_ORDER-1] = make_real3(0);
data[1] = dr;
data[0] = make_real3(1)-dr;
for (int j = 3; j < PME_ORDER; j++) {
real div = RECIP((real) (j-1));
data[j-1] = div*dr*data[j-2];
for (int k = 1; k < (j-1); k++)
data[j-k-1] = div*((dr+make_real3(k))*data[j-k-2] + (make_real3(j-k)-dr)*data[j-k-1]);
data[0] = div*(make_real3(1)-dr)*data[0];
}
data[PME_ORDER-1] = scale*dr*data[PME_ORDER-2];
for (int j = 1; j < (PME_ORDER-1); j++)
data[PME_ORDER-j-1] = scale*((dr+make_real3(j))*data[PME_ORDER-j-2] + (make_real3(PME_ORDER-j)-dr)*data[PME_ORDER-j-1]);
data[0] = scale*(make_real3(1)-dr)*data[0];
// Compute the charge derivative on this atom.
for (int ix = 0; ix < PME_ORDER; ix++) {
int xbase = gridIndex.x+ix;
xbase -= (xbase >= GRID_SIZE_X ? GRID_SIZE_X : 0);
xbase = xbase*GRID_SIZE_Y*GRID_SIZE_Z;
real dx = data[ix].x;
for (int iy = 0; iy < PME_ORDER; iy++) {
int ybase = gridIndex.y+iy;
ybase -= (ybase >= GRID_SIZE_Y ? GRID_SIZE_Y : 0);
ybase = xbase + ybase*GRID_SIZE_Z;
real dy = data[iy].y;
for (int iz = 0; iz < PME_ORDER; iz++) {
int zindex = gridIndex.z+iz;
zindex -= (zindex >= GRID_SIZE_Z ? GRID_SIZE_Z : 0);
derivative += dx*dy*data[iz].z*pmeGrid[ybase + zindex];
}
}
}
derivative *= EPSILON_FACTOR;
#ifdef USE_PME_STREAM
ATOMIC_ADD(&derivatives[i], (mm_ulong) realToFixedPoint(derivative));
#else
derivatives[i] += (mm_ulong) realToFixedPoint(derivative);
#endif
}
}
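`gridInterpolateChargeDerivatives` rebuilds the B-spline weights ("thetas") for each atom with a recursion over spline order before summing grid values. The one-dimensional sketch below copies that recursion for a single fractional offset `dr`, so it can be checked in isolation. `bsplineWeights` is a hypothetical name, and the sketch assumes `order >= 3`, as the kernel's indexing appears to.

```cpp
#include <cassert>
#include <vector>

// B-spline weights for one axis, following the same recursion as the kernel:
// start from the linear weights and raise the order one step at a time.
inline std::vector<double> bsplineWeights(int order, double dr) {
    assert(order >= 3 && dr >= 0.0 && dr < 1.0);
    std::vector<double> data(order, 0.0);
    data[1] = dr;
    data[0] = 1.0 - dr;
    for (int j = 3; j < order; j++) {
        const double div = 1.0 / (j - 1);
        data[j - 1] = div * dr * data[j - 2];
        for (int k = 1; k < j - 1; k++)
            data[j - k - 1] = div * ((dr + k) * data[j - k - 2] + (j - k - dr) * data[j - k - 1]);
        data[0] = div * (1.0 - dr) * data[0];
    }
    // Final step up to the requested order.
    const double scale = 1.0 / (order - 1);
    data[order - 1] = scale * dr * data[order - 2];
    for (int j = 1; j < order - 1; j++)
        data[order - j - 1] = scale * ((dr + j) * data[order - j - 2] + (order - j - dr) * data[order - j - 1]);
    data[0] = scale * (1.0 - dr) * data[0];
    return data;
}
```

The weights sum to one for any `dr`, so the interpolated derivative is a weighted average of nearby grid values before the `EPSILON_FACTOR` scaling.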
KERNEL void addForces(GLOBAL const real4* RESTRICT forces, GLOBAL mm_long* RESTRICT forceBuffers) {
for (int atom = GLOBAL_ID; atom < NUM_ATOMS; atom += GLOBAL_SIZE) {
real4 f = forces[atom];
......
@@ -9,7 +9,7 @@
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* *
* Portions copyright (c) 2013-2024 Stanford University and the Authors. *
* Portions copyright (c) 2013-2025 Stanford University and the Authors. *
* Authors: Peter Eastman *
* Contributors: *
* *
@@ -33,6 +33,7 @@
* -------------------------------------------------------------------------- */
#include "CpuBondForce.h"
#include "CpuConstantPotentialForce.h"
#include "CpuCustomGBForce.h"
#include "CpuCustomManyParticleForce.h"
#include "CpuCustomNonbondedForce.h"
@@ -322,6 +323,83 @@ private:
CpuBondForce bondForce;
};
/**
* This kernel is invoked by ConstantPotentialForce to calculate the forces acting on the system.
*/
class CpuCalcConstantPotentialForceKernel : public CalcConstantPotentialForceKernel {
public:
CpuCalcConstantPotentialForceKernel(std::string name, const Platform& platform, CpuPlatform::PlatformData& data);
~CpuCalcConstantPotentialForceKernel();
/**
* Initialize the kernel.
*
* @param system the System this kernel will be applied to
* @param force the ConstantPotentialForce this kernel will be used for
*/
void initialize(const System& system, const ConstantPotentialForce& force);
/**
* Execute the kernel to calculate the forces and/or energy.
*
* @param context the context in which to execute this kernel
* @param includeForces true if forces should be calculated
* @param includeEnergy true if the energy should be calculated
* @return the potential energy due to the force
*/
double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
/**
* Copy changed parameters over to a context.
*
* @param context the context to copy parameters to
* @param force the ConstantPotentialForce to copy the parameters from
* @param firstParticle the index of the first particle whose parameters might have changed
* @param lastParticle the index of the last particle whose parameters might have changed
* @param firstException the index of the first exception whose parameters might have changed
* @param lastException the index of the last exception whose parameters might have changed
* @param firstElectrode the index of the first electrode whose parameters might have changed
* @param lastElectrode the index of the last electrode whose parameters might have changed
*/
void copyParametersToContext(ContextImpl& context, const ConstantPotentialForce& force, int firstParticle, int lastParticle, int firstException, int lastException, int firstElectrode, int lastElectrode);
/**
* Get the parameters being used for PME.
*
* @param alpha the separation parameter
* @param nx the number of grid points along the X axis
* @param ny the number of grid points along the Y axis
* @param nz the number of grid points along the Z axis
*/
void getPMEParameters(double& alpha, int& nx, int& ny, int& nz) const;
/**
* Get the charges on all particles.
*
* @param context the context to get the charges from
* @param[out] charges a vector to populate with particle charges
*/
void getCharges(ContextImpl& context, std::vector<double>& charges);
private:
void checkBoxSize(const Vec3* boxVectors);
void ensurePmeInitialized(ContextImpl& context);
private:
CpuPlatform::PlatformData& data;
int numParticles, num14, numElectrodeParticles, chargePosqIndex;
std::vector<double> setCharges;
std::vector<float> charges;
std::vector<std::vector<double> > bonded14ParamArray;
std::vector<std::vector<int> > bonded14IndexArray;
std::map<int, int> nb14Index;
std::vector<std::set<int> > exclusions;
std::vector<int> sysToElec, elecToSys, sysElec, elecElec;
std::vector<std::array<double, 3> > electrodeParams;
double nonbondedCutoff, ewaldAlpha, cgErrorTol, chargeTarget;
int gridSize[3];
bool exceptionsArePeriodic, hasInitializedPme, useChargeConstraint;
Vec3 externalField;
CpuConstantPotentialForce* constantPotential;
CpuConstantPotentialSolver* solver;
CpuBondForce bondForce;
Kernel pmeKernel;
};
/**
* This kernel is invoked by CustomNonbondedForce to calculate the forces acting on the system.
*/
......
@@ -8,10 +8,14 @@ ENDFOREACH(file)
IF(MSVC)
SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuNonbondedForceAvx.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} /arch:AVX /D__AVX__")
SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuNonbondedForceAvx2.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} /arch:AVX2 /D__AVX2__")
SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuConstantPotentialForceAvx.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} /arch:AVX /D__AVX__")
SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuConstantPotentialForceAvx2.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} /arch:AVX2 /D__AVX2__")
SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuCustomNonbondedForceAvx.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} /arch:AVX /D__AVX__")
ELSEIF(X86)
SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuNonbondedForceAvx.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} -mavx")
SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuNonbondedForceAvx2.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} -mavx2 -mfma")
SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuConstantPotentialForceAvx.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} -mavx")
SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuConstantPotentialForceAvx2.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} -mavx2 -mfma")
SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuCustomNonbondedForceAvx.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} -mavx")
ENDIF()
......
/* -------------------------------------------------------------------------- *
* OpenMM *
* -------------------------------------------------------------------------- *
* This is part of the OpenMM molecular simulation toolkit originating from *
* Simbios, the NIH National Center for Physics-Based Simulation of *
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* *
* Portions copyright (c) 2025 Stanford University and the Authors. *
* Authors: Evan Pretti *
* Contributors: *
* *
* Permission is hereby granted, free of charge, to any person obtaining a *
* copy of this software and associated documentation files (the "Software"), *
* to deal in the Software without restriction, including without limitation *
* the rights to use, copy, modify, merge, publish, distribute, sublicense, *
* and/or sell copies of the Software, and to permit persons to whom the *
* Software is furnished to do so, subject to the following conditions: *
* *
* The above copyright notice and this permission notice shall be included in *
* all copies or substantial portions of the Software. *
* *
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR *
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, *
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL *
* THE AUTHORS, CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, *
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR *
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE *
* USE OR OTHER DEALINGS IN THE SOFTWARE. *
* -------------------------------------------------------------------------- */
#include "CpuConstantPotentialForceFvec.h"
#include "CpuNeighborList.h"
#include "openmm/OpenMMException.h"
#ifdef __AVX__
#include "openmm/internal/vectorizeAvx.h"
OpenMM::CpuConstantPotentialForce* createCpuConstantPotentialForceAvx() {
return new OpenMM::CpuConstantPotentialForceFvec<fvec8, ivec8>();
}
#else
OpenMM::CpuConstantPotentialForce* createCpuConstantPotentialForceAvx() {
throw OpenMM::OpenMMException("Internal error: OpenMM was compiled without AVX support");
}
#endif
/* -------------------------------------------------------------------------- *
* OpenMM *
* -------------------------------------------------------------------------- *
* This is part of the OpenMM molecular simulation toolkit originating from *
* Simbios, the NIH National Center for Physics-Based Simulation of *
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* *
* Portions copyright (c) 2025 Stanford University and the Authors. *
* Authors: Evan Pretti *
* Contributors: *
* *
* Permission is hereby granted, free of charge, to any person obtaining a *
* copy of this software and associated documentation files (the "Software"), *
* to deal in the Software without restriction, including without limitation *
* the rights to use, copy, modify, merge, publish, distribute, sublicense, *
* and/or sell copies of the Software, and to permit persons to whom the *
* Software is furnished to do so, subject to the following conditions: *
* *
* The above copyright notice and this permission notice shall be included in *
* all copies or substantial portions of the Software. *
* *
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR *
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, *
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL *
* THE AUTHORS, CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, *
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR *
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE *
* USE OR OTHER DEALINGS IN THE SOFTWARE. *
* -------------------------------------------------------------------------- */
#include "CpuConstantPotentialForceFvec.h"
#include "CpuNeighborList.h"
#include "openmm/OpenMMException.h"
#ifdef __AVX2__
#include "openmm/internal/vectorizeAvx2.h"
OpenMM::CpuConstantPotentialForce* createCpuConstantPotentialForceAvx2() {
return new OpenMM::CpuConstantPotentialForceFvec<fvecAvx2, ivec8>();
}
#else
OpenMM::CpuConstantPotentialForce* createCpuConstantPotentialForceAvx2() {
throw OpenMM::OpenMMException("Internal error: OpenMM was compiled without AVX2 support");
}
#endif
/* -------------------------------------------------------------------------- *
* OpenMM *
* -------------------------------------------------------------------------- *
* This is part of the OpenMM molecular simulation toolkit originating from *
* Simbios, the NIH National Center for Physics-Based Simulation of *
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* *
* Portions copyright (c) 2025 Stanford University and the Authors. *
* Authors: Evan Pretti *
* Contributors: *
* *
* Permission is hereby granted, free of charge, to any person obtaining a *
* copy of this software and associated documentation files (the "Software"), *
* to deal in the Software without restriction, including without limitation *
* the rights to use, copy, modify, merge, publish, distribute, sublicense, *
* and/or sell copies of the Software, and to permit persons to whom the *
* Software is furnished to do so, subject to the following conditions: *
* *
* The above copyright notice and this permission notice shall be included in *
* all copies or substantial portions of the Software. *
* *
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR *
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, *
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL *
* THE AUTHORS, CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, *
* DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR *
* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE *
* USE OR OTHER DEALINGS IN THE SOFTWARE. *
* -------------------------------------------------------------------------- */
#include "CpuConstantPotentialForceFvec.h"
#include "CpuNeighborList.h"
#include "openmm/internal/hardware.h"
using namespace OpenMM;
CpuConstantPotentialForce* createCpuConstantPotentialForceVec4();
CpuConstantPotentialForce* createCpuConstantPotentialForceAvx();
CpuConstantPotentialForce* createCpuConstantPotentialForceAvx2();
CpuConstantPotentialForce* createCpuConstantPotentialForceVec() {
if (isAvx2Supported())
return createCpuConstantPotentialForceAvx2();
else if (isAvxSupported())
return createCpuConstantPotentialForceAvx();
else
return createCpuConstantPotentialForceVec4();
}
@@ -6,7 +6,7 @@
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* *
* Portions copyright (c) 2013-2024 Stanford University and the Authors. *
* Portions copyright (c) 2013-2025 Stanford University and the Authors. *
* Authors: Peter Eastman *
* Contributors: *
* *
@@ -52,6 +52,8 @@ KernelImpl* CpuKernelFactory::createKernelImpl(std::string name, const Platform&
return new CpuCalcRBTorsionForceKernel(name, platform, data);
if (name == CalcNonbondedForceKernel::Name())
return new CpuCalcNonbondedForceKernel(name, platform, data);
if (name == CalcConstantPotentialForceKernel::Name())
return new CpuCalcConstantPotentialForceKernel(name, platform, data);
if (name == CalcCustomNonbondedForceKernel::Name())
return new CpuCalcCustomNonbondedForceKernel(name, platform, data);
if (name == CalcCustomManyParticleForceKernel::Name())
......