Commit 06d8d513 authored by Mateus R, committed by GitHub

Amortize PME atom-grid re-sort for large systems (#5305)

Large systems (>15000 atoms) re-sorted the PME atom grid every step,
skipping the step-counter amortization used for smaller systems. On
current GPUs the per-step sort is mostly wasted work, so re-sort every
2 steps instead. Smaller systems are unchanged. The sort only changes
charge-spread memory locality; results are identical up to floating-point
summation order.
parent 2ffa7cd3
@@ -962,7 +962,7 @@ double CommonCalcNonbondedForceKernel::execute(ContextImpl& context, bool includ
// Execute the reciprocal space kernels.
if (hasCoulomb) {
-if (stepsToSort <= 0 || doLJPME || cc.getNumAtoms() > 15000) {
+if (stepsToSort <= 0 || doLJPME) {
setPeriodicBoxArgs(cc, pmeGridIndexKernel, 2);
if (cc.getUseDoublePrecision()) {
pmeGridIndexKernel->setArg(7, recipBoxVectors[0]);
@@ -976,7 +976,7 @@ double CommonCalcNonbondedForceKernel::execute(ContextImpl& context, bool includ
}
pmeGridIndexKernel->execute(cc.getNumAtoms());
sort->sort(pmeAtomGridIndex);
-stepsToSort = 3;
+stepsToSort = (cc.getNumAtoms() > 15000) ? 1 : 3;
}
else
stepsToSort--;
......
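
The re-sort cadence implied by the new counter logic can be checked with a small standalone sketch. It is not OpenMM code: `sortSteps` is a hypothetical helper, the GPU kernels and the sort itself are omitted, and the initial value `stepsToSort = 0` is an assumption made so that the first step sorts.

```cpp
#include <vector>

// Hypothetical stand-in for the counter logic in the patched branch.
// Returns the indices of the steps on which the PME atom grid would be re-sorted.
std::vector<int> sortSteps(int numAtoms, bool doLJPME, int totalSteps) {
    std::vector<int> steps;
    int stepsToSort = 0; // assumption: the first step always sorts
    for (int step = 0; step < totalSteps; step++) {
        if (stepsToSort <= 0 || doLJPME) {
            steps.push_back(step); // the grid-index kernel and sort would run here
            // Same reset rule as the patch: 1 for large systems, 3 otherwise.
            stepsToSort = (numAtoms > 15000) ? 1 : 3;
        }
        else
            stepsToSort--;
    }
    return steps;
}
```

Under these assumptions, `sortSteps(20000, false, 8)` returns steps 0, 2, 4, 6, which is the "every 2 steps" cadence described in the commit message. `sortSteps(10000, false, 8)` returns steps 0 and 4, since a reset value of 3 means a sort every 4 steps. With `doLJPME` set, every step sorts regardless of system size.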