Amortize PME atom-grid re-sort for large systems (#5305)

Large systems (>15000 atoms) re-sorted the PME atom grid on every step, bypassing the step-counter amortization used for smaller systems. On current GPUs the per-step sort is mostly wasted work, so large systems now re-sort every 2 steps (stepsToSort = 1). Smaller systems are unchanged and still re-sort every 4 steps (stepsToSort = 3). LJPME still forces a re-sort on every step regardless of system size. The sort only changes charge-spread memory locality; results are identical up to floating-point summation order.
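
To make the resulting cadence concrete, here is a minimal standalone sketch of the counter logic after this change. It is not OpenMM code: `sortSteps` is a hypothetical helper, and starting `stepsToSort` at 0 (so the first step always sorts) is an assumption made for illustration.

```cpp
#include <vector>

// Hypothetical stand-in for the post-change counter logic; not part of OpenMM.
// Returns the 0-based step indices on which the PME atom grid would be re-sorted.
std::vector<int> sortSteps(int numAtoms, bool doLJPME, int numSteps) {
    std::vector<int> sorted;
    int stepsToSort = 0;  // assumed initial value, so the first step sorts
    for (int step = 0; step < numSteps; ++step) {
        if (stepsToSort <= 0 || doLJPME) {
            sorted.push_back(step);  // the grid-index kernel and sort would run here
            stepsToSort = (numAtoms > 15000) ? 1 : 3;
        }
        else
            stepsToSort--;
    }
    return sorted;
}
```

Under these assumptions, over 8 steps a 20000-atom system sorts on steps 0, 2, 4, 6, a 10000-atom system sorts on steps 0 and 4, and any system with LJPME enabled sorts on every step.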

Commit 06d8d513, authored May 29, 2026 by Mateus R; committed by GitHub May 29, 2026

Showing 1 changed file with 2 additions and 2 deletions

platforms/common/src/CommonCalcNonbondedForce.cpp +2 -2

--- a/platforms/common/src/CommonCalcNonbondedForce.cpp
+++ b/platforms/common/src/CommonCalcNonbondedForce.cpp
@@ -962,7 +962,7 @@ double CommonCalcNonbondedForceKernel::execute(ContextImpl& context, bool includ
        // Execute the reciprocal space kernels.

        if (hasCoulomb) {
-            if (stepsToSort <= 0 || doLJPME || cc.getNumAtoms() > 15000) {
+            if (stepsToSort <= 0 || doLJPME) {
                setPeriodicBoxArgs(cc, pmeGridIndexKernel, 2);
                if (cc.getUseDoublePrecision()) {
                    pmeGridIndexKernel->setArg(7, recipBoxVectors[0]);
@@ -976,7 +976,7 @@ double CommonCalcNonbondedForceKernel::execute(ContextImpl& context, bool includ
                }
                pmeGridIndexKernel->execute(cc.getNumAtoms());
                sort->sort(pmeAtomGridIndex);
-                stepsToSort = 3;
+                stepsToSort = (cc.getNumAtoms() > 15000) ? 1 : 3;
            }
            else
                stepsToSort--;