Add constant potential method (#4870)

* Initial implementation of C++ API
* Add kernel interface and information for API generation
* API updates for updating electrode parameters
* Add serialization proxy for ConstantPotentialForce
* Update file headers
* Add CG error tolerance and fix units on getCharges() return value
* Initial implementation of matrix solver
* Fixes and conjugate gradient solver
* Try to fix Linux and Windows builds
* Make sure charge constraint target is on total charge
* Restore handling of exceptions like NonbondedForce since they won't involve electrode atoms
* Ameliorate numerical instability in constrained conjugate gradient
* Fix uninitialized pointers, memory leak, and style
* Set CG tolerance units in Python API
* Test ConstantPotentialForce serialization
* Read/write ExceptionsUsePeriodicBoundaryConditions as bool
* Improve constrained conjugate gradient robustness to roundoff error accumulation
* Recompute matrix if electrode atoms move due to setPositions()
* Tolerance is now in gradient (potential) units again
* Add neutralizing background correction
* Add Python API tests
* Fixes for CG and nonbonded exceptions
* Add initial tests checking against existing NonbondedForce behavior
* Expand test suite and fix some implementation issues
* Add additional tests using larger reference system
* Add Gaussian test
* Finish test against reference computation
* CPU platform implementation
* Fixes for compilation on some platforms
* Fixes for constant potential with AVX/AVX2
* Test linking CPU PME library to constant potential test directly
* Older SWIG versions don't support Python set to C++ set conversion
* Add user guide entry
* Increase speed of reference test
* Conditional building constant potential CPU test is unreliable
* Debugging
* Miscellaneous fixes and improvements for CI
* Cache charges so solver will not run if system and coordinates have not changed
* Preconditioner flag, stability, and automatic detection improvements
* Add GPU platform-specific constant potential kernel classes
* PME and device-host I/O changes to support constant potential
* Initial common constant potential implementation
* Constant potential fixes:
* Fix preconditioner PME position/charge save/restore logic
* Fix reduction synchronization in constant potential solver kernels
* Add double-float accumulation for conjugate gradient solver when double unsupported by hardware
* Improve conditioning of a test system, and make sure particles are in or out of cutoff for consistency and ease of comparing between platforms
* Reorder guess charges for CG when atom reordering changes positions
* Remove PME queue for now
* Trying to debug optimized direct space derivative kernel
* Remove extraneous debugging lines
* Style updates; just make CPU preconditioner double precision
* Debugging updated optimized direct derivatives kernel for all but OpenCL CPU
* OpenCL CPU implementation of direct space derivatives, and cleanup
* Try to make test even shorter to not time out on CI
* Temporary - Debugging
* Debugging
* Debugging
* Debugging
* Debugging
* Debugging
* Remove debugging code and fix reduction synchronization
* Fix other reductions
* Debugging - are tests hanging or just slow on CI?
* Debugging
* Debugging
* Fix macro for case when double precision is available on hardware
* Remove changes for debugging again
* Try to improve matrix solver cache locality by uploading transpose
* Fixes for atom ordering and periodic images
* Can't rely on reorder listener for cell offset updates
* Test reducing number of contexts and timing for CI
* Debugging
* Remove timing code and revert debugging changes
* Matrix solver and plasma term optimizations
* Reduce CG solver kernel calls and downloads
* Don't read back convergence flag from global memory
* Update PME due to refactoring in master branch
* Faster matrix solver (1st step)
* Faster matrix solver for CUDA
* Faster matrix solver compatibility with non-CUDA platforms
* Matrix solver fixes
* Use warp shuffle reductions when possible
* Attempt to work around intermittent compiler crash in Intel CPU OpenCL
* Optimize CG solver kernel 1
* Rework CG solver so some kernels can use more than 1 block
* Don't run out of shared memory
* Asynchronously download convergence flag while clearing buffers

---------

Co-authored-by: Evan Pretti <pretti@sh03-17n15.int>

f55abcaa · Evan Pretti · GitHub · 0ad62341 · f55abcaa · f55abcaa
Unverified Commit f55abcaa authored Sep 12, 2025 by Evan Pretti Committed by GitHub Sep 12, 2025
20 changed files
--- a/platforms/common/src/kernels/constantPotential.cc
+++ b/platforms/common/src/kernels/constantPotential.cc
--- a/platforms/common/src/kernels/constantPotentialCGSolver.cc
+++ b/platforms/common/src/kernels/constantPotentialCGSolver.cc
--- a/platforms/common/src/kernels/constantPotentialCoulombEnergyForces.cc
+++ b/platforms/common/src/kernels/constantPotentialCoulombEnergyForces.cc
+// The approximation for erfc is from Abramowitz and Stegun (1964) p. 299.  They cite the following as
+// the original source: C. Hastings, Jr., Approximations for Digital Computers (1955).  It has a maximum
+// error of 1.5e-7.
+
+if (!isExcluded && r2 < CUTOFF_SQUARED) {
+    const real prefactor = ONE_4PI_EPS0 * CHARGE1 * CHARGE2 * invR;
+
+    const real alphaR = EWALD_ALPHA * r;
+    const real expAlphaRSqr = EXP(-alphaR * alphaR);
+#ifdef USE_DOUBLE_PRECISION
+    const real erfcAlphaR = erfc(alphaR);
+#else
+    const real tAlpha = RECIP(1.0f+0.3275911f*alphaR);
+    const real erfcAlphaR = (0.254829592f+(-0.284496736f+(1.421413741f+(-1.453152027f+1.061405429f*tAlpha)*tAlpha)*tAlpha)*tAlpha)*tAlpha*expAlphaRSqr;
+#endif
+
+    real tempForceScale = erfcAlphaR + TWO_OVER_SQRT_PI * alphaR * expAlphaRSqr;
+    real tempEnergyScale = erfcAlphaR;
+
+    if (SYSELEC1 != -1 || SYSELEC2 != -1) {
+        const real4 params1 = PARAMS[SYSELEC1 + 1];
+        const real4 params2 = PARAMS[SYSELEC2 + 1];
+
+        const real etaR = r / SQRT(params1.y * params1.y + params2.y * params2.y);
+        const real expEtaRSqr = EXP(-etaR * etaR);
+#ifdef USE_DOUBLE_PRECISION
+        const real erfcEtaR = erfc(etaR);
+#else
+        const real tEta = RECIP(1.0f+0.3275911f*etaR);
+        const real erfcEtaR = (0.254829592f+(-0.284496736f+(1.421413741f+(-1.453152027f+1.061405429f*tEta)*tEta)*tEta)*tEta)*tEta*expEtaRSqr;
+#endif
+
+        tempForceScale -= erfcEtaR + TWO_OVER_SQRT_PI * etaR * expEtaRSqr;
+        tempEnergyScale -= erfcEtaR;
+    }
+
+    tempEnergy += prefactor * tempEnergyScale;
+    dEdR += prefactor * tempForceScale * invR * invR;
+}
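For checking the math in constantPotentialCoulombEnergyForces.cc, here is a scalar C++ sketch of the same pair term. It assumes (the diff does not say so) that PARAMS[...].y is a Gaussian charge width, 0 for non-electrode particles, and that dEdR holds -(1/r) dE/dr. Under that assumption, subtracting erfc(eta r)/r with eta = 1/sqrt(w1^2 + w2^2) turns the point-charge Ewald term into erf(eta r)/r - erf(alpha r)/r, the direct-space interaction between two Gaussians. The names erfcApprox, PairResult and ewaldPair are hypothetical.

```cpp
#include <cmath>

// Abramowitz & Stegun 7.1.26 approximation of erfc(x) for x >= 0; the same
// polynomial as the single-precision branch of the kernel (max error 1.5e-7).
double erfcApprox(double x) {
    const double t = 1.0 / (1.0 + 0.3275911 * x);
    const double poly = (0.254829592 + (-0.284496736 + (1.421413741 +
                        (-1.453152027 + 1.061405429 * t) * t) * t) * t) * t;
    return poly * std::exp(-x * x);
}

struct PairResult {
    double energy;  // pair energy
    double dEdR;    // -(1/r) dE/dr; force on particle 2 is dEdR * (pos2 - pos1)
};

// Direct-space Ewald pair term for charges q1, q2 at distance r. w1 and w2 are
// assumed Gaussian widths (0 means a point charge); coulombConst plays the role
// of ONE_4PI_EPS0 in whatever unit system the caller uses.
PairResult ewaldPair(double q1, double q2, double r, double alpha,
                     double w1, double w2, double coulombConst) {
    const double twoOverSqrtPi = 2.0 / std::sqrt(std::acos(-1.0));
    const double invR = 1.0 / r;
    const double prefactor = coulombConst * q1 * q2 * invR;

    const double alphaR = alpha * r;
    double forceScale = std::erfc(alphaR) + twoOverSqrtPi * alphaR * std::exp(-alphaR * alphaR);
    double energyScale = std::erfc(alphaR);

    if (w1 > 0.0 || w2 > 0.0) {
        // Screening correction applied when either particle carries a Gaussian charge.
        const double etaR = r / std::sqrt(w1 * w1 + w2 * w2);
        forceScale -= std::erfc(etaR) + twoOverSqrtPi * etaR * std::exp(-etaR * etaR);
        energyScale -= std::erfc(etaR);
    }
    return {prefactor * energyScale, prefactor * forceScale * invR * invR};
}
```

A central finite difference of energy with respect to r, divided by -r, should reproduce dEdR; that is a quick way to check the force expression against the energy.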
--- a/platforms/common/src/kernels/constantPotentialExceptions.cc
+++ b/platforms/common/src/kernels/constantPotentialExceptions.cc
+const real exceptionScale = PARAMS[index];
+real3 delta = make_real3(pos2.x - pos1.x, pos2.y - pos1.y, pos2.z - pos1.z);
+#if APPLY_PERIODIC
+    APPLY_PERIODIC_TO_DELTA(delta)
+#endif
+
+const real r2 = delta.x * delta.x + delta.y * delta.y + delta.z * delta.z;
+const real invR = RSQRT(r2);
+
+const real tempEnergy = exceptionScale * invR;
+const real tempForce = tempEnergy * invR * invR;
+
+energy += tempEnergy;
+delta *= tempForce;
+real3 force1 = -delta;
+real3 force2 = delta;
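The exception kernel is a plain scaled Coulomb term. This scalar sketch (hypothetical exceptionPair helper) spells out the sign convention: with delta = pos2 - pos1, E = scale / r and the force on particle 1 is -delta * scale / r^3. It assumes PARAMS[index] already holds the Coulomb constant times the exception's charge product.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

struct ExceptionResult {
    double energy;
    Vec3 force1, force2;
};

// E = scale / r; forces follow from F = -dE/dpos for each particle.
ExceptionResult exceptionPair(double scale, const Vec3& pos1, const Vec3& pos2) {
    const Vec3 delta = {pos2[0] - pos1[0], pos2[1] - pos1[1], pos2[2] - pos1[2]};
    const double r2 = delta[0] * delta[0] + delta[1] * delta[1] + delta[2] * delta[2];
    const double invR = 1.0 / std::sqrt(r2);
    const double energy = scale * invR;
    const double forceScale = energy * invR * invR;  // scale / r^3
    ExceptionResult out{energy, {}, {}};
    for (int k = 0; k < 3; k++) {
        out.force1[k] = -forceScale * delta[k];  // repulsive for scale > 0
        out.force2[k] = forceScale * delta[k];
    }
    return out;
}
```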
--- a/platforms/common/src/kernels/constantPotentialExclusions.cc
+++ b/platforms/common/src/kernels/constantPotentialExclusions.cc
+const real exclusionScale = PARAMS[index];
+real3 delta = make_real3(pos2.x - pos1.x, pos2.y - pos1.y, pos2.z - pos1.z);
+#if APPLY_PERIODIC
+    APPLY_PERIODIC_TO_DELTA(delta)
+#endif
+
+const real r2 = delta.x * delta.x + delta.y * delta.y + delta.z * delta.z;
+const real r = SQRT(r2);
+const real invR = RECIP(r);
+
+const real alphaR = EWALD_ALPHA * r;
+real tempForce = 0.0f;
+if (alphaR > 1e-6f) {
+    const real erfAlphaR = ERF(alphaR);
+    const real prefactor = exclusionScale * invR;
+    tempForce = prefactor * (erfAlphaR - TWO_OVER_SQRT_PI * alphaR * EXP(-alphaR * alphaR)) * invR * invR;
+
+    energy -= prefactor * erfAlphaR;
+}
+else {
+    energy -= TWO_OVER_SQRT_PI * EWALD_ALPHA * exclusionScale;
+}
+delta *= tempForce;
+real3 force1 = delta;
+real3 force2 = -delta;
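The exclusion kernel subtracts the reciprocal-space interaction that PME includes for every pair, E = -scale * erf(alpha r) / r. Because erf(x)/x tends to 2/sqrt(pi) as x goes to 0, the kernel switches to the constant -2 alpha scale / sqrt(pi) below alpha r = 1e-6. A scalar sketch with hypothetical helper names; exclusionForceScale returns (1/r) dE/dr, which corresponds to tempForce in the kernel, so the force on particle 1 is delta times that value with delta = pos2 - pos1.

```cpp
#include <cmath>

// Exclusion correction energy, with the r -> 0 limit used by the kernel.
double exclusionEnergy(double scale, double alpha, double r) {
    const double twoOverSqrtPi = 2.0 / std::sqrt(std::acos(-1.0));
    const double alphaR = alpha * r;
    if (alphaR > 1e-6)
        return -scale * std::erf(alphaR) / r;
    return -twoOverSqrtPi * alpha * scale;
}

// (1/r) dE/dr for the same term; zero in the r -> 0 branch, as in the kernel.
double exclusionForceScale(double scale, double alpha, double r) {
    const double twoOverSqrtPi = 2.0 / std::sqrt(std::acos(-1.0));
    const double alphaR = alpha * r;
    if (alphaR <= 1e-6)
        return 0.0;
    const double invR = 1.0 / r;
    const double prefactor = scale * invR;
    return prefactor * (std::erf(alphaR) - twoOverSqrtPi * alphaR * std::exp(-alphaR * alphaR)) * invR * invR;
}
```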
--- a/platforms/common/src/kernels/constantPotentialMatrixSolver.cc
+++ b/platforms/common/src/kernels/constantPotentialMatrixSolver.cc
+#define WARP_SIZE 32
+
+#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
+    #define WARP_SHUFFLE(local, index) __shfl_sync(0xffffffff, local, index)
+    #define WARP_SHUFFLE_DOWN(local, offset) __shfl_down_sync(0xffffffff, local, offset)
+#elif defined(USE_HIP)
+    #define WARP_SHUFFLE(local, index) __shfl(local, index)
+    #define WARP_SHUFFLE_DOWN(local, offset) __shfl_down(local, offset)
+#endif
+
+#ifdef WARP_SHUFFLE_DOWN
+    #define TEMP_SIZE WARP_SIZE
+#else
+    #define TEMP_SIZE THREAD_BLOCK_SIZE
+#endif
+
+DEVICE real reduceValue(real value, LOCAL_ARG volatile real* temp) {
+    const int thread = LOCAL_ID;
+    SYNC_THREADS;
+#ifdef WARP_SHUFFLE_DOWN
+    const int warpCount = LOCAL_SIZE / WARP_SIZE;
+    const int warp = thread / WARP_SIZE;
+    const int lane = thread % WARP_SIZE;
+    for (int step = WARP_SIZE / 2; step > 0; step >>= 1) {
+        value += WARP_SHUFFLE_DOWN(value, step);
+    }
+    if (!lane) {
+        temp[warp] = value;
+    }
+    SYNC_THREADS;
+    if (!warp) {
+        value = lane < warpCount ? temp[lane] : 0;
+        for (int step = WARP_SIZE / 2; step > 0; step >>= 1) {
+            value += WARP_SHUFFLE_DOWN(value, step);
+        }
+        if (!lane) {
+            temp[0] = value;
+        }
+    }
+    SYNC_THREADS;
+#else
+    temp[thread] = value;
+    SYNC_THREADS;
+    for (int step = 1; step < WARP_SIZE / 2; step <<= 1) {
+        if(thread + step < LOCAL_SIZE && thread % (2 * step) == 0) {
+            temp[thread] += temp[thread + step];
+        }
+        SYNC_WARPS;
+    }
+    for (int step = WARP_SIZE / 2; step < LOCAL_SIZE; step <<= 1) {
+        if(thread + step < LOCAL_SIZE && thread % (2 * step) == 0) {
+            temp[thread] += temp[thread + step];
+        }
+        SYNC_THREADS;
+    }
+#endif
+    return temp[0];
+}
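When warp shuffles are unavailable, reduceValue() falls back to a shared-memory tree reduction: at each step, thread t with t % (2*step) == 0 adds temp[t + step] into temp[t]. The sketch below (hypothetical treeReduce helper) emulates that pattern sequentially on the host. A thread only reads a slot that no thread writes in the same step, so running the "threads" one after another gives the same result as running them in parallel with a barrier between steps. The kernel splits the step loop at WARP_SIZE / 2 only to use the cheaper SYNC_WARPS barrier for small steps; the combined sequence of steps is the same as here.

```cpp
#include <vector>

// Sequential emulation of the fallback tree reduction; works for any size,
// including sizes that are not powers of two.
double treeReduce(std::vector<double> temp) {
    const int localSize = static_cast<int>(temp.size());
    if (localSize == 0)
        return 0.0;
    for (int step = 1; step < localSize; step <<= 1) {
        for (int thread = 0; thread < localSize; thread++) {
            if (thread + step < localSize && thread % (2 * step) == 0)
                temp[thread] += temp[thread + step];
        }
        // On the GPU, a barrier (SYNC_WARPS or SYNC_THREADS) separates steps.
    }
    return temp[0];
}
```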
+
+KERNEL void checkSavedElectrodePositions(GLOBAL real4* RESTRICT posq, GLOBAL real4* RESTRICT electrodePosData, GLOBAL int* RESTRICT elecToSys, GLOBAL int* RESTRICT result) {
+    for (int ii = GLOBAL_ID; ii < NUM_ELECTRODE_PARTICLES; ii += GLOBAL_SIZE) {
+        real4 posqPosition = posq[elecToSys[ii]];
+        real4 savedPosition = electrodePosData[ii];
+        if (posqPosition.x != savedPosition.x || posqPosition.y != savedPosition.y || posqPosition.z != savedPosition.z) {
+            *result = 1;
+            break;
+        }
+    }
+}
+
+KERNEL void saveElectrodePositions(GLOBAL real4* RESTRICT posq, GLOBAL real4* RESTRICT electrodePosData, GLOBAL int* RESTRICT elecToSys) {
+    for (int ii = GLOBAL_ID; ii < NUM_ELECTRODE_PARTICLES; ii += GLOBAL_SIZE) {
+        electrodePosData[ii] = posq[elecToSys[ii]];
+    }
+}
+
+KERNEL void solve(GLOBAL real* RESTRICT electrodeCharges, GLOBAL real* RESTRICT chargeDerivatives, GLOBAL real* RESTRICT capacitance
+#ifdef USE_CHARGE_CONSTRAINT
+    , GLOBAL real* RESTRICT constraintVector, real chargeTarget
+#endif
+) {
+    // This kernel expects to be executed in a single thread block.
+
+#if CHUNK_SIZE > 1
+    LOCAL volatile real chunkCharges[CHUNK_SIZE];
+#endif
+
+    for (int ii = LOCAL_ID; ii < NUM_ELECTRODE_PARTICLES; ii += LOCAL_SIZE) {
+        electrodeCharges[ii] = -chargeDerivatives[ii];
+    }
+    SYNC_THREADS;
+
+    // Cholesky solve step 1 (outer loop over chunks of rows).
+
+    for (int jj = 0; jj < PADDED_PROBLEM_SIZE; jj += CHUNK_SIZE) {
+        if (LOCAL_ID < CHUNK_SIZE) {
+#if CHUNK_SIZE > 1
+    #ifdef WARP_SHUFFLE
+            real threadCharge = electrodeCharges[jj + LOCAL_ID];
+            for (int k = 0; k < CHUNK_SIZE - 1; k++) {
+                const real chargeShuffled = WARP_SHUFFLE(threadCharge, k);
+                if (LOCAL_ID > k) {
+                    threadCharge -= chargeShuffled * capacitance[(mm_long) (jj + k) * PADDED_PROBLEM_SIZE + (jj + LOCAL_ID)];
+                }
+            }
+            SYNC_WARPS;
+            electrodeCharges[jj + LOCAL_ID] = chunkCharges[LOCAL_ID] = threadCharge;
+    #else
+            chunkCharges[LOCAL_ID] = electrodeCharges[jj + LOCAL_ID];
+            for (int k = 0; k < CHUNK_SIZE - 1; k++) {
+                SYNC_WARPS;
+                if (LOCAL_ID > k) {
+                    chunkCharges[LOCAL_ID] -= chunkCharges[k] * capacitance[(mm_long) (jj + k) * PADDED_PROBLEM_SIZE + (jj + LOCAL_ID)];
+                }
+            }
+            SYNC_WARPS;
+            electrodeCharges[jj + LOCAL_ID] = chunkCharges[LOCAL_ID];
+    #endif
+#endif
+        }
+        SYNC_THREADS;
+        for (int ii = jj + CHUNK_SIZE + LOCAL_ID; ii < NUM_ELECTRODE_PARTICLES; ii += LOCAL_SIZE) {
+#if CHUNK_SIZE > 1
+            real chargeOffset = 0;
+            for (int k = 0; k < CHUNK_SIZE; k++) {
+                chargeOffset += chunkCharges[k] * capacitance[(mm_long) (jj + k) * PADDED_PROBLEM_SIZE + ii];
+            }
+            electrodeCharges[ii] -= chargeOffset;
+#else
+            electrodeCharges[ii] -= electrodeCharges[jj] * capacitance[(mm_long) jj * PADDED_PROBLEM_SIZE + ii];
+#endif
+        }
+        SYNC_THREADS;
+    }
+
+    for (int ii = LOCAL_ID; ii < NUM_ELECTRODE_PARTICLES; ii += LOCAL_SIZE) {
+        electrodeCharges[ii] *= capacitance[(mm_long) ii * PADDED_PROBLEM_SIZE + ii];
+    }
+    SYNC_THREADS;
+
+    // Cholesky solve step 2 (outer loop over chunks of columns).
+
+    for (int jj = PADDED_PROBLEM_SIZE - CHUNK_SIZE; jj >= 0; jj -= CHUNK_SIZE) {
+        if (LOCAL_ID < CHUNK_SIZE) {
+#if CHUNK_SIZE > 1
+    #ifdef WARP_SHUFFLE
+            real threadCharge = electrodeCharges[jj + LOCAL_ID];
+            for (int k = CHUNK_SIZE - 1; k >= 0; k--) {
+                const real chargeShuffled = WARP_SHUFFLE(threadCharge, k);
+                if (LOCAL_ID < k) {
+                    threadCharge -= chargeShuffled * capacitance[(mm_long) (jj + k) * PADDED_PROBLEM_SIZE + (jj + LOCAL_ID)];
+                }
+            }
+            SYNC_WARPS;
+            electrodeCharges[jj + LOCAL_ID] = chunkCharges[LOCAL_ID] = threadCharge;
+    #else
+            chunkCharges[LOCAL_ID] = electrodeCharges[jj + LOCAL_ID];
+            for (int k = CHUNK_SIZE - 1; k >= 0; k--) {
+                SYNC_WARPS;
+                if (LOCAL_ID < k) {
+                    chunkCharges[LOCAL_ID] -= chunkCharges[k] * capacitance[(mm_long) (jj + k) * PADDED_PROBLEM_SIZE + (jj + LOCAL_ID)];
+                }
+            }
+            SYNC_WARPS;
+            electrodeCharges[jj + LOCAL_ID] = chunkCharges[LOCAL_ID];
+    #endif
+#endif
+        }
+        SYNC_THREADS;
+        for (int ii = LOCAL_ID; ii < jj; ii += LOCAL_SIZE) {
+#if CHUNK_SIZE > 1
+            real chargeOffset = 0;
+            for (int k = 0; k < CHUNK_SIZE; k++) {
+                chargeOffset += chunkCharges[k] * capacitance[(mm_long) (jj + k) * PADDED_PROBLEM_SIZE + ii];
+            }
+            electrodeCharges[ii] -= chargeOffset;
+#else
+            electrodeCharges[ii] -= electrodeCharges[jj] * capacitance[(mm_long) jj * PADDED_PROBLEM_SIZE + ii];
+#endif
+        }
+        SYNC_THREADS;
+    }
+
+    for (int ii = LOCAL_ID; ii < NUM_ELECTRODE_PARTICLES; ii += LOCAL_SIZE) {
+        electrodeCharges[ii] *= capacitance[(mm_long) ii * PADDED_PROBLEM_SIZE + ii];
+    }
+    SYNC_THREADS;
+
+#ifdef USE_CHARGE_CONSTRAINT
+    LOCAL volatile real temp[TEMP_SIZE];
+
+    real chargeOffset = 0;
+    for (int ii = LOCAL_ID; ii < NUM_ELECTRODE_PARTICLES; ii += LOCAL_SIZE) {
+        chargeOffset -= electrodeCharges[ii];
+    }
+    chargeOffset = chargeTarget + reduceValue(chargeOffset, temp);
+    for (int ii = LOCAL_ID; ii < NUM_ELECTRODE_PARTICLES; ii += LOCAL_SIZE) {
+        electrodeCharges[ii] += chargeOffset * constraintVector[ii];
+    }
+#endif
+}
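The solve() kernel performs a forward and a backward triangular sweep over a precomputed factorization stored in capacitance, then optionally shifts the charges along constraintVector so that their sum matches chargeTarget. The C++ sketch below shows the same idea with a textbook dense Cholesky factorization A = L L^T. It does not reproduce the kernel's storage layout (which appears to be transposed, with the diagonal stored so it can be multiplied rather than divided). It also assumes chargeDerivatives plays the role of b in minimizing 0.5 q^T A q + b^T q, and that constraintVector holds v = A^-1 1 / (1^T A^-1 1), which is what a Lagrange multiplier for the constraint sum(q) = target gives. All names are hypothetical.

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<double>;  // dense n x n, row-major

// Textbook Cholesky factorization A = L L^T; returns L in the lower triangle.
Matrix cholesky(const Matrix& a, int n) {
    Matrix l(n * n, 0.0);
    for (int j = 0; j < n; j++) {
        double d = a[j * n + j];
        for (int k = 0; k < j; k++)
            d -= l[j * n + k] * l[j * n + k];
        if (d <= 0.0)
            throw std::runtime_error("matrix is not positive definite");
        l[j * n + j] = std::sqrt(d);
        for (int i = j + 1; i < n; i++) {
            double s = a[i * n + j];
            for (int k = 0; k < j; k++)
                s -= l[i * n + k] * l[j * n + k];
            l[i * n + j] = s / l[j * n + j];
        }
    }
    return l;
}

// Solve L L^T x = rhs: forward substitution with L, then backward with L^T.
std::vector<double> choleskySolve(const Matrix& l, int n, std::vector<double> x) {
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < i; k++)
            x[i] -= l[i * n + k] * x[k];
        x[i] /= l[i * n + i];
    }
    for (int i = n - 1; i >= 0; i--) {
        for (int k = i + 1; k < n; k++)
            x[i] -= l[k * n + i] * x[k];
        x[i] /= l[i * n + i];
    }
    return x;
}

// Minimize 0.5 q^T A q + b^T q, optionally subject to sum(q) = chargeTarget.
std::vector<double> solveCharges(const Matrix& a, int n, const std::vector<double>& b,
                                 bool useConstraint, double chargeTarget) {
    const Matrix l = cholesky(a, n);
    std::vector<double> rhs(n);
    for (int i = 0; i < n; i++)
        rhs[i] = -b[i];
    std::vector<double> q = choleskySolve(l, n, rhs);
    if (useConstraint) {
        // v = A^-1 1, normalized so that its entries sum to 1.
        std::vector<double> v = choleskySolve(l, n, std::vector<double>(n, 1.0));
        double vSum = 0.0;
        for (double x : v) vSum += x;
        for (double& x : v) x /= vSum;
        double qSum = 0.0;
        for (double x : q) qSum += x;
        const double offset = chargeTarget - qSum;
        for (int i = 0; i < n; i++)
            q[i] += offset * v[i];
    }
    return q;
}
```

At the constrained solution, A q + b has equal components (the Lagrange multiplier times a vector of ones), which is an easy property to check.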
--- a/platforms/common/src/kernels/constantPotentialSolver.cc
+++ b/platforms/common/src/kernels/constantPotentialSolver.cc
+KERNEL void checkSavedPositions(GLOBAL real4* RESTRICT posq, GLOBAL real4* RESTRICT savedPositions, GLOBAL int* RESTRICT result) {
+    for (int i = GLOBAL_ID; i < NUM_PARTICLES; i += GLOBAL_SIZE) {
+        real4 posqPosition = posq[i];
+        real4 savedPosition = savedPositions[i];
+        if (posqPosition.x != savedPosition.x || posqPosition.y != savedPosition.y || posqPosition.z != savedPosition.z) {
+            *result = 1;
+            break;
+        }
+    }
+}
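checkSavedPositions() and checkSavedElectrodePositions() implement the caching described in the commit message: charges are re-solved only if positions changed, and the matrix is recomputed only if electrode atoms moved. Comparison is by exact equality, with no tolerance. A minimal host-side sketch of that pattern, using a hypothetical PositionCache class:

```cpp
#include <array>
#include <vector>

// Remembers the last positions seen and reports whether they changed.
class PositionCache {
public:
    using Vec3 = std::array<double, 3>;

    // Returns true (and saves the new positions) if any coordinate differs
    // from the saved copy, or if nothing has been saved yet.
    bool updateIfChanged(const std::vector<Vec3>& positions) {
        if (valid && positions == saved)
            return false;
        saved = positions;
        valid = true;
        return true;
    }

private:
    std::vector<Vec3> saved;
    bool valid = false;
};
```

A caller would run the expensive solve only when updateIfChanged() returns true. Exact comparison is appropriate here because any change in stored coordinates, however small, changes the solution.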
--- a/platforms/common/src/kernels/pme.cc
+++ b/platforms/common/src/kernels/pme.cc
@@ -179,6 +179,8 @@ KERNEL void reciprocalConvolution(GLOBAL real2* RESTRICT pmeGrid, GLOBAL const r
        real eterm = recipScaleFactor*EXP(-RECIP_EXP_FACTOR*m2)/denom;
        if (kx != 0 || ky != 0 || kz != 0) {
            pmeGrid[index] = make_real2(grid.x*eterm, grid.y*eterm);
+        } else {
+            pmeGrid[index] = make_real2(0);
        }
 #endif
    }
@@ -351,6 +353,77 @@ KERNEL void gridInterpolateForce(GLOBAL const real4* RESTRICT posq, GLOBAL mm_ul
    }
 }

+KERNEL void gridInterpolateChargeDerivatives(GLOBAL const real4* RESTRICT posq, GLOBAL mm_ulong* RESTRICT derivatives, GLOBAL const real* RESTRICT pmeGrid,
+        real4 periodicBoxSize, real4 invPeriodicBoxSize, real4 periodicBoxVecX, real4 periodicBoxVecY, real4 periodicBoxVecZ,
+        real4 recipBoxVecX, real4 recipBoxVecY, real4 recipBoxVecZ, GLOBAL const int* RESTRICT atomIndices
+        ) {
+    real3 data[PME_ORDER];
+    const real scale = RECIP((real) (PME_ORDER-1));
+
+    for (int i = GLOBAL_ID; i < NUM_INDICES; i += GLOBAL_SIZE) {
+        int atom = atomIndices[i];
+        real derivative = 0;
+        real4 pos = posq[atom];
+        APPLY_PERIODIC_TO_POS(pos)
+        real3 t = make_real3(pos.x*recipBoxVecX.x+pos.y*recipBoxVecY.x+pos.z*recipBoxVecZ.x,
+                             pos.y*recipBoxVecY.y+pos.z*recipBoxVecZ.y,
+                             pos.z*recipBoxVecZ.z);
+        t.x = (t.x-floor(t.x))*GRID_SIZE_X;
+        t.y = (t.y-floor(t.y))*GRID_SIZE_Y;
+        t.z = (t.z-floor(t.z))*GRID_SIZE_Z;
+        int3 gridIndex = make_int3(((int) t.x) % GRID_SIZE_X,
+                                   ((int) t.y) % GRID_SIZE_Y,
+                                   ((int) t.z) % GRID_SIZE_Z);
+
+        // Since we need the full set of thetas, it's faster to compute them here than load them
+        // from global memory.
+
+        real3 dr = make_real3(t.x-(int) t.x, t.y-(int) t.y, t.z-(int) t.z);
+        data[PME_ORDER-1] = make_real3(0);
+        data[1] = dr;
+        data[0] = make_real3(1)-dr;
+        for (int j = 3; j < PME_ORDER; j++) {
+            real div = RECIP((real) (j-1));
+            data[j-1] = div*dr*data[j-2];
+            for (int k = 1; k < (j-1); k++)
+                data[j-k-1] = div*((dr+make_real3(k))*data[j-k-2] + (make_real3(j-k)-dr)*data[j-k-1]);
+            data[0] = div*(make_real3(1)-dr)*data[0];
+        }
+        data[PME_ORDER-1] = scale*dr*data[PME_ORDER-2];
+        for (int j = 1; j < (PME_ORDER-1); j++)
+            data[PME_ORDER-j-1] = scale*((dr+make_real3(j))*data[PME_ORDER-j-2] + (make_real3(PME_ORDER-j)-dr)*data[PME_ORDER-j-1]);
+        data[0] = scale*(make_real3(1)-dr)*data[0];
+
+        // Compute the charge derivative on this atom.
+
+        for (int ix = 0; ix < PME_ORDER; ix++) {
+            int xbase = gridIndex.x+ix;
+            xbase -= (xbase >= GRID_SIZE_X ? GRID_SIZE_X : 0);
+            xbase = xbase*GRID_SIZE_Y*GRID_SIZE_Z;
+            real dx = data[ix].x;
+
+            for (int iy = 0; iy < PME_ORDER; iy++) {
+                int ybase = gridIndex.y+iy;
+                ybase -= (ybase >= GRID_SIZE_Y ? GRID_SIZE_Y : 0);
+                ybase = xbase + ybase*GRID_SIZE_Z;
+                real dy = data[iy].y;
+
+                for (int iz = 0; iz < PME_ORDER; iz++) {
+                    int zindex = gridIndex.z+iz;
+                    zindex -= (zindex >= GRID_SIZE_Z ? GRID_SIZE_Z : 0);
+                    derivative += dx*dy*data[iz].z*pmeGrid[ybase + zindex];
+                }
+            }
+        }
+        derivative *= EPSILON_FACTOR;
+#ifdef USE_PME_STREAM
+        ATOMIC_ADD(&derivatives[i], (mm_ulong) realToFixedPoint(derivative));
+#else
+        derivatives[i] += (mm_ulong) realToFixedPoint(derivative);
+#endif
+    }
+}
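gridInterpolateChargeDerivatives() reuses the PME B-spline weights: it interpolates the convolved grid at each listed atom. The result appears to be the derivative of the reciprocal-space energy with respect to that atom's charge (the reciprocal-space potential). The one-dimensional C++ port below (hypothetical bsplineWeights helper) follows the kernel's recurrence for the weights applied to grid points gridIndex + 0 through gridIndex + PME_ORDER - 1. As B-spline weights, they should be non-negative and sum to 1 for any fractional offset dr in [0, 1).

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

// Port of the per-dimension weight recurrence in the kernel (order >= 3).
std::vector<double> bsplineWeights(double dr, int order) {
    if (order < 3)
        throw std::invalid_argument("order must be at least 3");
    std::vector<double> data(order, 0.0);
    const double scale = 1.0 / (order - 1);
    data[1] = dr;
    data[0] = 1.0 - dr;
    for (int j = 3; j < order; j++) {
        const double div = 1.0 / (j - 1);
        data[j - 1] = div * dr * data[j - 2];
        for (int k = 1; k < j - 1; k++)
            data[j - k - 1] = div * ((dr + k) * data[j - k - 2] + (j - k - dr) * data[j - k - 1]);
        data[0] = div * (1.0 - dr) * data[0];
    }
    // Final pass raises the spline to the requested order.
    data[order - 1] = scale * dr * data[order - 2];
    for (int j = 1; j < order - 1; j++)
        data[order - j - 1] = scale * ((dr + j) * data[order - j - 2] + (order - j - dr) * data[order - j - 1]);
    data[0] = scale * (1.0 - dr) * data[0];
    return data;
}
```

For order 3 this gives the quadratic weights (1 - dr)^2 / 2, (1 + 2 dr - 2 dr^2) / 2, and dr^2 / 2. The 3D weight for a grid point is the product of the x, y, and z weights, which is what the triple loop in the kernel accumulates.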
+
 KERNEL void addForces(GLOBAL const real4* RESTRICT forces, GLOBAL mm_long* RESTRICT forceBuffers) {
    for (int atom = GLOBAL_ID; atom < NUM_ATOMS; atom += GLOBAL_SIZE) {
        real4 f = forces[atom];

--- a/platforms/cpu/include/CpuConstantPotentialForce.h
+++ b/platforms/cpu/include/CpuConstantPotentialForce.h
--- a/platforms/cpu/include/CpuConstantPotentialForceFvec.h
+++ b/platforms/cpu/include/CpuConstantPotentialForceFvec.h
--- a/platforms/cpu/include/CpuKernels.h
+++ b/platforms/cpu/include/CpuKernels.h
@@ -9,7 +9,7 @@
 * Biological Structures at Stanford, funded under the NIH Roadmap for        *
 * Medical Research, grant U54 GM072970. See https://simtk.org.               *
 *                                                                            *
- * Portions copyright (c) 2013-2024 Stanford University and the Authors.      *
+ * Portions copyright (c) 2013-2025 Stanford University and the Authors.      *
 * Authors: Peter Eastman                                                     *
 * Contributors:                                                              *
 *                                                                            *
@@ -33,6 +33,7 @@
 * -------------------------------------------------------------------------- */

 #include "CpuBondForce.h"
+#include "CpuConstantPotentialForce.h"
 #include "CpuCustomGBForce.h"
 #include "CpuCustomManyParticleForce.h"
 #include "CpuCustomNonbondedForce.h"
@@ -322,6 +323,83 @@ private:
    CpuBondForce bondForce;
 };

+/**
+ * This kernel is invoked by ConstantPotentialForce to calculate the forces acting on the system.
+ */
+class CpuCalcConstantPotentialForceKernel : public CalcConstantPotentialForceKernel {
+public:
+    CpuCalcConstantPotentialForceKernel(std::string name, const Platform& platform, CpuPlatform::PlatformData& data);
+    ~CpuCalcConstantPotentialForceKernel();
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the ConstantPotentialForce this kernel will be used for
+     */
+    void initialize(const System& system, const ConstantPotentialForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context        the context to copy parameters to
+     * @param force          the ConstantPotentialForce to copy the parameters from
+     * @param firstParticle  the index of the first particle whose parameters might have changed
+     * @param lastParticle   the index of the last particle whose parameters might have changed
+     * @param firstException the index of the first exception whose parameters might have changed
+     * @param lastException  the index of the last exception whose parameters might have changed
+     * @param firstElectrode the index of the first electrode whose parameters might have changed
+     * @param lastElectrode  the index of the last electrode whose parameters might have changed
+     */
+    void copyParametersToContext(ContextImpl& context, const ConstantPotentialForce& force, int firstParticle, int lastParticle, int firstException, int lastException, int firstElectrode, int lastElectrode);
+    /**
+     * Get the parameters being used for PME.
+     *
+     * @param alpha   the separation parameter
+     * @param nx      the number of grid points along the X axis
+     * @param ny      the number of grid points along the Y axis
+     * @param nz      the number of grid points along the Z axis
+     */
+    void getPMEParameters(double& alpha, int& nx, int& ny, int& nz) const;
+    /**
+     * Get the charges on all particles.
+     *
+     * @param context       the context whose particle charges should be retrieved
+     * @param[out] charges  a vector to populate with particle charges
+     */
+    void getCharges(ContextImpl& context, std::vector<double>& charges);
+private:
+    void checkBoxSize(const Vec3* boxVectors);
+    void ensurePmeInitialized(ContextImpl& context);
+private:
+    CpuPlatform::PlatformData& data;
+    int numParticles, num14, numElectrodeParticles, chargePosqIndex;
+    std::vector<double> setCharges;
+    std::vector<float> charges;
+    std::vector<std::vector<double> > bonded14ParamArray;
+    std::vector<std::vector<int> > bonded14IndexArray;
+    std::map<int, int> nb14Index;
+    std::vector<std::set<int> > exclusions;
+    std::vector<int> sysToElec, elecToSys, sysElec, elecElec;
+    std::vector<std::array<double, 3> > electrodeParams;
+    double nonbondedCutoff, ewaldAlpha, cgErrorTol, chargeTarget;
+    int gridSize[3];
+    bool exceptionsArePeriodic, hasInitializedPme, useChargeConstraint;
+    Vec3 externalField;
+    CpuConstantPotentialForce* constantPotential;
+    CpuConstantPotentialSolver* solver;
+    CpuBondForce bondForce;
+    Kernel pmeKernel;
+};
+
+
 /**
 * This kernel is invoked by CustomNonbondedForce to calculate the forces acting on the system.
 */

--- a/platforms/cpu/sharedTarget/CMakeLists.txt
+++ b/platforms/cpu/sharedTarget/CMakeLists.txt
@@ -8,10 +8,14 @@ ENDFOREACH(file)
 IF(MSVC)
    SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuNonbondedForceAvx.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} /arch:AVX /D__AVX__")
    SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuNonbondedForceAvx2.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} /arch:AVX2 /D__AVX2__")
+    SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuConstantPotentialForceAvx.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} /arch:AVX /D__AVX__")
+    SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuConstantPotentialForceAvx2.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} /arch:AVX2 /D__AVX2__")
    SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuCustomNonbondedForceAvx.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} /arch:AVX /D__AVX__")
 ELSEIF(X86)
    SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuNonbondedForceAvx.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} -mavx")
    SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuNonbondedForceAvx2.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} -mavx2 -mfma")
+    SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuConstantPotentialForceAvx.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} -mavx")
+    SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuConstantPotentialForceAvx2.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} -mavx2 -mfma")
    SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/platforms/cpu/src/CpuCustomNonbondedForceAvx.cpp PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} -mavx")
 ENDIF()


--- a/platforms/cpu/src/CpuConstantPotentialForce.cpp
+++ b/platforms/cpu/src/CpuConstantPotentialForce.cpp
--- a/platforms/cpu/src/CpuConstantPotentialForceAvx.cpp
+++ b/platforms/cpu/src/CpuConstantPotentialForceAvx.cpp
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2025 Stanford University and the Authors.           *
+ * Authors: Evan Pretti                                                       *
+ * Contributors:                                                              *
+ *                                                                            *
+ * Permission is hereby granted, free of charge, to any person obtaining a    *
+ * copy of this software and associated documentation files (the "Software"), *
+ * to deal in the Software without restriction, including without limitation  *
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,   *
+ * and/or sell copies of the Software, and to permit persons to whom the      *
+ * Software is furnished to do so, subject to the following conditions:       *
+ *                                                                            *
+ * The above copyright notice and this permission notice shall be included in *
+ * all copies or substantial portions of the Software.                        *
+ *                                                                            *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR *
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,   *
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL    *
+ * THE AUTHORS, CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,    *
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR      *
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE  *
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.                                     *
+ * -------------------------------------------------------------------------- */
+
+#include "CpuConstantPotentialForceFvec.h"
+#include "CpuNeighborList.h"
+#include "openmm/OpenMMException.h"
+
+#ifdef __AVX__
+#include "openmm/internal/vectorizeAvx.h"
+
+OpenMM::CpuConstantPotentialForce* createCpuConstantPotentialForceAvx() {
+    return new OpenMM::CpuConstantPotentialForceFvec<fvec8, ivec8>();
+}
+
+#else
+
+OpenMM::CpuConstantPotentialForce* createCpuConstantPotentialForceAvx() {
+    throw OpenMM::OpenMMException("Internal error: OpenMM was compiled without AVX support");
+}
+
+#endif
--- a/platforms/cpu/src/CpuConstantPotentialForceAvx2.cpp
+++ b/platforms/cpu/src/CpuConstantPotentialForceAvx2.cpp
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2025 Stanford University and the Authors.           *
+ * Authors: Evan Pretti                                                       *
+ * Contributors:                                                              *
+ *                                                                            *
+ * Permission is hereby granted, free of charge, to any person obtaining a    *
+ * copy of this software and associated documentation files (the "Software"), *
+ * to deal in the Software without restriction, including without limitation  *
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,   *
+ * and/or sell copies of the Software, and to permit persons to whom the      *
+ * Software is furnished to do so, subject to the following conditions:       *
+ *                                                                            *
+ * The above copyright notice and this permission notice shall be included in *
+ * all copies or substantial portions of the Software.                        *
+ *                                                                            *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR *
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,   *
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL    *
+ * THE AUTHORS, CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,    *
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR      *
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE  *
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.                                     *
+ * -------------------------------------------------------------------------- */
+
+#include "CpuConstantPotentialForceFvec.h"
+#include "CpuNeighborList.h"
+#include "openmm/OpenMMException.h"
+
+#ifdef __AVX2__
+#include "openmm/internal/vectorizeAvx2.h"
+
+OpenMM::CpuConstantPotentialForce* createCpuConstantPotentialForceAvx2() {
+    return new OpenMM::CpuConstantPotentialForceFvec<fvecAvx2, ivec8>();
+}
+
+#else
+
+OpenMM::CpuConstantPotentialForce* createCpuConstantPotentialForceAvx2() {
+    throw OpenMM::OpenMMException("Internal error: OpenMM was compiled without AVX2 support");
+}
+
+#endif
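The AVX and AVX2 files above rely on the per-file compile flags added to CMakeLists.txt. Only those translation units are built with `__AVX__` / `__AVX2__` defined, so each one either returns a vectorized implementation or compiles to a stub that throws. Either way the factory symbol exists, so the dispatcher always links. The sketch below shows this compile-time guard in isolation. `ForceImpl`, `ForceImplAvx2` and `createForceImplAvx2` are hypothetical stand-ins, not OpenMM classes.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an abstract base such as CpuConstantPotentialForce.
struct ForceImpl {
    virtual ~ForceImpl() = default;
    virtual std::string name() const = 0;
};

#ifdef __AVX2__
// Compiled only when this translation unit is built with AVX2 enabled
// (for example -mavx2 on GCC/Clang, or /arch:AVX2 on MSVC).
struct ForceImplAvx2 : ForceImpl {
    std::string name() const override { return "avx2"; }
};

ForceImpl* createForceImplAvx2() {
    return new ForceImplAvx2();
}
#else
// The symbol still exists so callers always link, but this path should never
// be reached: a runtime CPU-feature check is expected to guard the call.
ForceImpl* createForceImplAvx2() {
    throw std::runtime_error("Internal error: compiled without AVX2 support");
}
#endif
```

Returning a raw pointer mirrors the factories in the diff. The caller takes ownership, here via `std::unique_ptr`.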
--- a/platforms/cpu/src/CpuConstantPotentialForceFvec.cpp
+++ b/platforms/cpu/src/CpuConstantPotentialForceFvec.cpp
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2025 Stanford University and the Authors.           *
+ * Authors: Evan Pretti                                                       *
+ * Contributors:                                                              *
+ *                                                                            *
+ * Permission is hereby granted, free of charge, to any person obtaining a    *
+ * copy of this software and associated documentation files (the "Software"), *
+ * to deal in the Software without restriction, including without limitation  *
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,   *
+ * and/or sell copies of the Software, and to permit persons to whom the      *
+ * Software is furnished to do so, subject to the following conditions:       *
+ *                                                                            *
+ * The above copyright notice and this permission notice shall be included in *
+ * all copies or substantial portions of the Software.                        *
+ *                                                                            *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR *
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,   *
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL    *
+ * THE AUTHORS, CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,    *
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR      *
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE  *
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.                                     *
+ * -------------------------------------------------------------------------- */
+
+#include "CpuConstantPotentialForceFvec.h"
+#include "CpuNeighborList.h"
+#include "openmm/internal/hardware.h"
+
+using namespace OpenMM;
+
+CpuConstantPotentialForce* createCpuConstantPotentialForceVec4();
+CpuConstantPotentialForce* createCpuConstantPotentialForceAvx();
+CpuConstantPotentialForce* createCpuConstantPotentialForceAvx2();
+
+CpuConstantPotentialForce* createCpuConstantPotentialForceVec() {
+    if (isAvx2Supported())
+        return createCpuConstantPotentialForceAvx2();
+    else if (isAvxSupported())
+        return createCpuConstantPotentialForceAvx();
+    else
+        return createCpuConstantPotentialForceVec4();
+}
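`createCpuConstantPotentialForceVec()` picks an implementation at run time. It uses the widest instruction set the current CPU supports and falls back in order AVX2, AVX, Vec4. The check order matters: any CPU with AVX2 also reports AVX, so testing AVX first would never select AVX2. The sketch below isolates that selection logic. To keep it portable and testable, it takes the detected features as a plain struct instead of querying the hardware. `CpuFeatures`, `VectorBackend`, `selectBackend` and `backendName` are hypothetical names. In the diff, this role is played by `isAvx2Supported()` and `isAvxSupported()` from `openmm/internal/hardware.h`.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of what the running CPU supports.
struct CpuFeatures {
    bool avx2 = false;
    bool avx = false;
};

enum class VectorBackend { Vec4, Avx, Avx2 };

// Choose the most capable backend first, then fall back to narrower ones.
VectorBackend selectBackend(const CpuFeatures& cpu) {
    if (cpu.avx2)
        return VectorBackend::Avx2;
    if (cpu.avx)
        return VectorBackend::Avx;
    return VectorBackend::Vec4;
}

std::string backendName(VectorBackend backend) {
    switch (backend) {
        case VectorBackend::Avx2: return "avx2";
        case VectorBackend::Avx:  return "avx";
        default:                  return "vec4";
    }
}
```

Keeping detection separate from selection means the selection rule can be unit-tested on any machine. Only the feature query itself depends on the hardware.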
--- a/platforms/cpu/src/CpuConstantPotentialForceVec4.cpp
+++ b/platforms/cpu/src/CpuConstantPotentialForceVec4.cpp
--- a/platforms/cpu/src/CpuKernelFactory.cpp
+++ b/platforms/cpu/src/CpuKernelFactory.cpp
@@ -6,7 +6,7 @@
 * Biological Structures at Stanford, funded under the NIH Roadmap for        *
 * Medical Research, grant U54 GM072970. See https://simtk.org.               *
 *                                                                            *
- * Portions copyright (c) 2013-2024 Stanford University and the Authors.      *
+ * Portions copyright (c) 2013-2025 Stanford University and the Authors.      *
 * Authors: Peter Eastman                                                     *
 * Contributors:                                                              *
 *                                                                            *
@@ -52,6 +52,8 @@ KernelImpl* CpuKernelFactory::createKernelImpl(std::string name, const Platform&
        return new CpuCalcRBTorsionForceKernel(name, platform, data);
    if (name == CalcNonbondedForceKernel::Name())
        return new CpuCalcNonbondedForceKernel(name, platform, data);
+    if (name == CalcConstantPotentialForceKernel::Name())
+        return new CpuCalcConstantPotentialForceKernel(name, platform, data);
    if (name == CalcCustomNonbondedForceKernel::Name())
        return new CpuCalcCustomNonbondedForceKernel(name, platform, data);
    if (name == CalcCustomManyParticleForceKernel::Name())

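`CpuKernelFactory::createKernelImpl()` maps a kernel name to a concrete implementation. The diff registers the new kernel by adding one more `if (name == CalcConstantPotentialForceKernel::Name())` branch to that chain. For comparison, here is a sketch of the same name-to-implementation mapping written as a lookup table. This is an alternative design, not how OpenMM's factory is written. `Kernel`, `KernelFactory` and the kernel classes are hypothetical. The name strings used in the test are placeholders; in the real code, each kernel's `Name()` supplies them.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical kernel interface, standing in for OpenMM's KernelImpl.
struct Kernel {
    virtual ~Kernel() = default;
    virtual std::string describe() const = 0;
};

struct NonbondedKernel : Kernel {
    std::string describe() const override { return "nonbonded"; }
};

struct ConstantPotentialKernel : Kernel {
    std::string describe() const override { return "constant potential"; }
};

// Maps a kernel name to a function that constructs the matching implementation.
class KernelFactory {
public:
    using Creator = std::function<std::unique_ptr<Kernel>()>;

    void registerKernel(const std::string& name, Creator create) {
        creators[name] = std::move(create);
    }

    std::unique_ptr<Kernel> create(const std::string& name) const {
        auto it = creators.find(name);
        if (it == creators.end())
            throw std::invalid_argument("unknown kernel name: " + name);
        return it->second();
    }

private:
    std::unordered_map<std::string, Creator> creators;
};
```

A table makes registration a single call per kernel, at the cost of an extra indirection. The if-chain in the diff keeps everything in one function and needs no registration step.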
--- a/platforms/cpu/src/CpuKernels.cpp
+++ b/platforms/cpu/src/CpuKernels.cpp
--- a/platforms/cpu/src/CpuNonbondedForceAvx2.cpp
+++ b/platforms/cpu/src/CpuNonbondedForceAvx2.cpp