GPU implementation of L-BFGS (#5198)

* Make reference/CPU minimizer into a kernel
* Add per-platform support for GPU minimization
* Initial implementation of GPU minimization
* Fixes
* Increase robustness when initial gradient is huge
* Handle overflow leading to non-finite values gracefully
* Handle large forces in single precision more robustly
* Optimize kernels
* Fix kernel launch size
* Update banner years
* Don't create MinimizeKernel until first minimization requested
* Make some compile-time constants into kernel arguments
* Consolidate scale calculation kernel
* Condense alpha/beta reduction kernels using atomics
* Condense line search dot kernels with reductions
* Remove a download, and download grad norm separately
* Asynchronously check lbfgs convergence condition
* Restructure line search to avoid download waiting
* Start line search preemptively in case CPU evaluation is not needed
* In rare cases, constraint error might not decrease after one optimization round
* Better handling of unsupported 64-bit atomics, use FLT_MAX
* Pick gradient mode based on GPU vs. CPU evaluation
* Rework getDiff/getScale reduction, remove reduceBuffer
* Older CUDA might not like float hex literals
* Fix error in a comment

Commit 4ab645ea authored Feb 10, 2026 by Evan Pretti, committed by GitHub Feb 10, 2026
8 changed files
--- a/platforms/opencl/src/OpenCLPlatform.cpp
+++ b/platforms/opencl/src/OpenCLPlatform.cpp
@@ -64,6 +64,7 @@ OpenCLPlatform::OpenCLPlatform() {
    registerKernelFactory(UpdateStateDataKernel::Name(), factory);
    registerKernelFactory(ApplyConstraintsKernel::Name(), factory);
    registerKernelFactory(VirtualSitesKernel::Name(), factory);
+    registerKernelFactory(MinimizeKernel::Name(), factory);
    registerKernelFactory(CalcHarmonicBondForceKernel::Name(), factory);
    registerKernelFactory(CalcCustomBondForceKernel::Name(), factory);
    registerKernelFactory(CalcHarmonicAngleForceKernel::Name(), factory);

--- a/platforms/reference/include/ReferenceKernels.h
+++ b/platforms/reference/include/ReferenceKernels.h
@@ -7,7 +7,7 @@
 * This is part of the OpenMM molecular simulation toolkit.                   *
 * See https://openmm.org/development.                                        *
 *                                                                            *
- * Portions copyright (c) 2008-2025 Stanford University and the Authors.      *
+ * Portions copyright (c) 2008-2026 Stanford University and the Authors.      *
 * Authors: Peter Eastman                                                     *
 * Contributors:                                                              *
 *                                                                            *
@@ -289,6 +289,30 @@ public:
    void computePositions(ContextImpl& context);
 };

+/**
+ * This kernel performs local energy minimization.
+ */
+class ReferenceMinimizeKernel : public MinimizeKernel {
+public:
+    ReferenceMinimizeKernel(std::string name, const Platform& platform) : MinimizeKernel(name, platform) {
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     */
+    void initialize(const System& system);
+    /**
+     * Perform local energy minimization.
+     * 
+     * @param context        the context with which to perform the minimization
+     * @param tolerance      limiting root-mean-square value of all force components in kJ/mol/nm for convergence
+     * @param maxIterations  the maximum number of iterations to perform, or 0 to continue until convergence
+     * @param reporter       an optional reporter to invoke after each iteration of minimization
+     */
+    void execute(ContextImpl& context, double tolerance, int maxIterations, MinimizationReporter* reporter);
+};
+
 /**
 * This kernel is invoked by HarmonicBondForce to calculate the forces acting on the system and the energy of the system.
 */

--- a/platforms/reference/include/ReferenceMinimize.h
+++ b/platforms/reference/include/ReferenceMinimize.h
+#ifndef OPENMM_REFERENCEMINIMIZE_H_
+#define OPENMM_REFERENCEMINIMIZE_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit.                   *
+ * See https://openmm.org/development.                                        *
+ *                                                                            *
+ * Portions copyright (c) 2010-2026 Stanford University and the Authors.      *
+ * Authors: Evan Pretti                                                       *
+ * Contributors:                                                              *
+ *                                                                            *
+ * Permission is hereby granted, free of charge, to any person obtaining a    *
+ * copy of this software and associated documentation files (the "Software"), *
+ * to deal in the Software without restriction, including without limitation  *
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,   *
+ * and/or sell copies of the Software, and to permit persons to whom the      *
+ * Software is furnished to do so, subject to the following conditions:       *
+ *                                                                            *
+ * The above copyright notice and this permission notice shall be included in *
+ * all copies or substantial portions of the Software.                        *
+ *                                                                            *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR *
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,   *
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL    *
+ * THE AUTHORS, CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,    *
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR      *
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE  *
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.                                     *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/internal/ContextImpl.h"
+#include "openmm/LocalEnergyMinimizer.h"
+
+namespace OpenMM {
+
+class ReferenceMinimize {
+public:
+    static void minimize(ContextImpl& context, double tolerance, int maxIterations, MinimizationReporter* reporter);
+};
+
+} // namespace OpenMM
+
+#endif // OPENMM_REFERENCEMINIMIZE_H_
--- a/platforms/reference/src/ReferenceKernelFactory.cpp
+++ b/platforms/reference/src/ReferenceKernelFactory.cpp
@@ -4,7 +4,7 @@
 * This is part of the OpenMM molecular simulation toolkit.                   *
 * See https://openmm.org/development.                                        *
 *                                                                            *
- * Portions copyright (c) 2008-2025 Stanford University and the Authors.      *
+ * Portions copyright (c) 2008-2026 Stanford University and the Authors.      *
 * Authors: Peter Eastman                                                     *
 * Contributors:                                                              *
 *                                                                            *
@@ -44,6 +44,8 @@ KernelImpl* ReferenceKernelFactory::createKernelImpl(std::string name, const Pla
        return new ReferenceApplyConstraintsKernel(name, platform, data);
    if (name == VirtualSitesKernel::Name())
        return new ReferenceVirtualSitesKernel(name, platform);
+    if (name == MinimizeKernel::Name())
+        return new ReferenceMinimizeKernel(name, platform);
    if (name == CalcNonbondedForceKernel::Name())
        return new ReferenceCalcNonbondedForceKernel(name, platform);
    if (name == CalcConstantPotentialForceKernel::Name())

--- a/platforms/reference/src/ReferenceKernels.cpp
+++ b/platforms/reference/src/ReferenceKernels.cpp
@@ -4,7 +4,7 @@
 * This is part of the OpenMM molecular simulation toolkit.                   *
 * See https://openmm.org/development.                                        *
 *                                                                            *
- * Portions copyright (c) 2008-2025 Stanford University and the Authors.      *
+ * Portions copyright (c) 2008-2026 Stanford University and the Authors.      *
 * Authors: Peter Eastman                                                     *
 * Contributors:                                                              *
 *                                                                            *
@@ -56,6 +56,7 @@
 #include "ReferenceLCPOIxn.h"
 #include "ReferenceLJCoulomb14.h"
 #include "ReferenceLJCoulombIxn.h"
+#include "ReferenceMinimize.h"
 #include "ReferenceMonteCarloBarostat.h"
 #include "ReferenceNoseHooverChain.h"
 #include "ReferenceNoseHooverDynamics.h"
@@ -346,6 +347,13 @@ void ReferenceVirtualSitesKernel::computePositions(ContextImpl& context) {
    extractVirtualSites(context).computePositions(context.getSystem(), positions, extractBoxVectors(context));
 }

+void ReferenceMinimizeKernel::initialize(const System& system) {
+}
+
+void ReferenceMinimizeKernel::execute(ContextImpl& context, double tolerance, int maxIterations, MinimizationReporter* reporter) {
+    ReferenceMinimize::minimize(context, tolerance, maxIterations, reporter);
+}
+
 void ReferenceCalcHarmonicBondForceKernel::initialize(const System& system, const HarmonicBondForce& force) {
    numBonds = force.getNumBonds();
    bondIndexArray.resize(numBonds, vector<int>(2));

--- a/platforms/reference/src/ReferencePlatform.cpp
+++ b/platforms/reference/src/ReferencePlatform.cpp
@@ -4,7 +4,7 @@
 * This is part of the OpenMM molecular simulation toolkit.                   *
 * See https://openmm.org/development.                                        *
 *                                                                            *
- * Portions copyright (c) 2008-2025 Stanford University and the Authors.      *
+ * Portions copyright (c) 2008-2026 Stanford University and the Authors.      *
 * Authors: Peter Eastman                                                     *
 * Contributors:                                                              *
 *                                                                            *
@@ -44,6 +44,7 @@ ReferencePlatform::ReferencePlatform() {
    registerKernelFactory(UpdateStateDataKernel::Name(), factory);
    registerKernelFactory(ApplyConstraintsKernel::Name(), factory);
    registerKernelFactory(VirtualSitesKernel::Name(), factory);
+    registerKernelFactory(MinimizeKernel::Name(), factory);
    registerKernelFactory(CalcHarmonicBondForceKernel::Name(), factory);
    registerKernelFactory(CalcCustomBondForceKernel::Name(), factory);
    registerKernelFactory(CalcHarmonicAngleForceKernel::Name(), factory);
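
The hunks above touch two places for the new kernel: the platform registers a factory under `MinimizeKernel::Name()`, and the factory maps that name to a concrete kernel class. The sketch below shows the general name-keyed factory pattern with hypothetical, heavily simplified stand-in types. It is not OpenMM's actual class hierarchy, and the string `"Minimize"` is a placeholder rather than the real kernel name.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for illustration; these are not OpenMM's real classes.
struct KernelImplSketch {
    virtual ~KernelImplSketch() = default;
    virtual std::string describe() const = 0;
};

struct MinimizeKernelSketch : KernelImplSketch {
    static std::string Name() { return "Minimize"; }  // placeholder name
    std::string describe() const override { return "reference minimizer"; }
};

struct KernelFactorySketch {
    virtual ~KernelFactorySketch() = default;
    virtual std::unique_ptr<KernelImplSketch> create(const std::string& name) const = 0;
};

// The factory maps a kernel name to a concrete implementation.
struct ReferenceFactorySketch : KernelFactorySketch {
    std::unique_ptr<KernelImplSketch> create(const std::string& name) const override {
        if (name == MinimizeKernelSketch::Name())
            return std::make_unique<MinimizeKernelSketch>();
        throw std::runtime_error("Unknown kernel: " + name);
    }
};

// The platform records which factory is responsible for each kernel name.
class PlatformSketch {
public:
    void registerKernelFactory(const std::string& name, const KernelFactorySketch* factory) {
        factories[name] = factory;
    }
    std::unique_ptr<KernelImplSketch> createKernel(const std::string& name) const {
        auto it = factories.find(name);
        if (it == factories.end())
            throw std::runtime_error("No factory registered for kernel: " + name);
        return it->second->create(name);
    }
private:
    std::map<std::string, const KernelFactorySketch*> factories;
};
```

In this sketch, requesting a kernel whose name was never registered fails at lookup time, which is why a new kernel needs both the registration line and the factory branch.
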

--- a/platforms/reference/src/SimTKReference/ReferenceMinimize.cpp
+++ b/platforms/reference/src/SimTKReference/ReferenceMinimize.cpp
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit.                   *
+ * See https://openmm.org/development.                                        *
+ *                                                                            *
+ * Portions copyright (c) 2010-2026 Stanford University and the Authors.      *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * Permission is hereby granted, free of charge, to any person obtaining a    *
+ * copy of this software and associated documentation files (the "Software"), *
+ * to deal in the Software without restriction, including without limitation  *
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,   *
+ * and/or sell copies of the Software, and to permit persons to whom the      *
+ * Software is furnished to do so, subject to the following conditions:       *
+ *                                                                            *
+ * The above copyright notice and this permission notice shall be included in *
+ * all copies or substantial portions of the Software.                        *
+ *                                                                            *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR *
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,   *
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL    *
+ * THE AUTHORS, CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,    *
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR      *
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE  *
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.                                     *
+ * -------------------------------------------------------------------------- */
+
+#include "ReferenceMinimize.h"
+#include "lbfgs.h"
+#include <cmath>
+
+using namespace OpenMM;
+using namespace std;
+
+struct MinimizerData {
+    ContextImpl& context;
+    double k;
+    MinimizationReporter* reporter;
+    MinimizerData(ContextImpl& context, double k, MinimizationReporter* reporter) : context(context), k(k), reporter(reporter) {
+    }
+};
+
+static double computeForcesAndEnergy(ContextImpl& context, const vector<Vec3>& positions, lbfgsfloatval_t *g) {
+    context.setPositions(positions);
+    context.computeVirtualSites();
+    double potentialEnergy = context.calcForcesAndEnergy(true, true, context.getIntegrator().getIntegrationForceGroups());
+    vector<Vec3> forces;
+    context.getForces(forces);
+    const System& system = context.getSystem();
+    for (int i = 0; i < forces.size(); i++) {
+        if (system.getParticleMass(i) == 0) {
+            g[3*i] = 0.0;
+            g[3*i+1] = 0.0;
+            g[3*i+2] = 0.0;
+        }
+        else {
+            g[3*i] = -forces[i][0];
+            g[3*i+1] = -forces[i][1];
+            g[3*i+2] = -forces[i][2];
+        }
+    }
+    return potentialEnergy;
+}
+
+static lbfgsfloatval_t evaluate(void *instance, const lbfgsfloatval_t *x, lbfgsfloatval_t *g, const int n, const lbfgsfloatval_t step) {
+    MinimizerData* data = reinterpret_cast<MinimizerData*>(instance);
+    ContextImpl& context = data->context;
+    const System& system = context.getSystem();
+    int numParticles = system.getNumParticles();
+
+    // Compute the force and energy for this configuration.
+
+    vector<Vec3> positions(numParticles);
+    for (int i = 0; i < numParticles; i++)
+        positions[i] = Vec3(x[3*i], x[3*i+1], x[3*i+2]);
+    double energy = computeForcesAndEnergy(context, positions, g);
+
+    // Add harmonic forces for any constraints.
+
+    int numConstraints = system.getNumConstraints();
+    double k = data->k;
+    for (int i = 0; i < numConstraints; i++) {
+        int particle1, particle2;
+        double distance;
+        system.getConstraintParameters(i, particle1, particle2, distance);
+        Vec3 delta = positions[particle2]-positions[particle1];
+        double r2 = delta.dot(delta);
+        double r = sqrt(r2);
+        delta *= 1/r;
+        double dr = r-distance;
+        double kdr = k*dr;
+        energy += 0.5*kdr*dr;
+        if (system.getParticleMass(particle1) != 0) {
+            g[3*particle1] -= kdr*delta[0];
+            g[3*particle1+1] -= kdr*delta[1];
+            g[3*particle1+2] -= kdr*delta[2];
+        }
+        if (system.getParticleMass(particle2) != 0) {
+            g[3*particle2] += kdr*delta[0];
+            g[3*particle2+1] += kdr*delta[1];
+            g[3*particle2+2] += kdr*delta[2];
+        }
+    }
+    return energy;
+}
+
+static int report(void *instance, const lbfgsfloatval_t *x, const lbfgsfloatval_t *g, const lbfgsfloatval_t fx,
+        const lbfgsfloatval_t xnorm, const lbfgsfloatval_t gnorm, const lbfgsfloatval_t step, int n, int iteration, int ls) {
+    // Copy over the positions and gradients.
+
+    vector<double> xout(n), gradout(n);
+    for (int i = 0; i < n; i++) {
+        xout[i] = x[i];
+        gradout[i] = g[i];
+    }
+
+    // Compute the other arguments passed to the reporter.
+
+    MinimizerData* data = reinterpret_cast<MinimizerData*>(instance);
+    ContextImpl& context = data->context;
+    const System& system = context.getSystem();
+    double restraintEnergy = 0.0, maxError = 0.0;
+    double k = data->k;
+    for (int i = 0; i < system.getNumConstraints(); i++) {
+        int p1, p2;
+        double distance;
+        system.getConstraintParameters(i, p1, p2, distance);
+        Vec3 delta(x[3*p1]-x[3*p2], x[3*p1+1]-x[3*p2+1], x[3*p1+2]-x[3*p2+2]);
+        double r2 = delta.dot(delta);
+        double r = sqrt(r2);
+        double dr = r-distance;
+        restraintEnergy += 0.5*k*dr*dr;
+        maxError = max(maxError, fabs(dr)/distance);
+    }
+    map<string, double> args;
+    args["restraint energy"] = restraintEnergy;
+    args["system energy"] = fx-restraintEnergy;
+    args["restraint strength"] = k;
+    args["max constraint error"] = maxError;
+
+    // Invoke the reporter.
+
+    MinimizationReporter* reporter = reinterpret_cast<MinimizationReporter*>(data->reporter);
+    if (reporter->report(iteration-1, xout, gradout, args))
+        return 1;
+    return 0;
+}
+
+void ReferenceMinimize::minimize(ContextImpl& context, double tolerance, int maxIterations, MinimizationReporter* reporter) {
+    const System& system = context.getSystem();
+    int numParticles = system.getNumParticles();
+    double constraintTol = context.getIntegrator().getConstraintTolerance();
+    double workingConstraintTol = std::max(1e-4, constraintTol);
+    double k = 100/workingConstraintTol;
+    lbfgsfloatval_t *x = lbfgs_malloc(numParticles*3);
+    if (x == NULL)
+        throw OpenMMException("LocalEnergyMinimizer: Failed to allocate memory");
+    try {
+
+        // Initialize the minimizer.
+
+        lbfgs_parameter_t param;
+        lbfgs_parameter_init(&param);
+        if (!context.getPlatform().supportsDoublePrecision())
+            param.xtol = 1e-7;
+        param.max_iterations = maxIterations;
+        param.linesearch = LBFGS_LINESEARCH_BACKTRACKING_STRONG_WOLFE;
+
+        // Make sure the initial configuration satisfies all constraints.
+
+        context.applyConstraints(workingConstraintTol);
+
+        // Record the initial positions and determine a normalization constant for scaling the tolerance.
+
+        vector<Vec3> initialPos;
+        context.getPositions(initialPos);
+        double norm = 0.0;
+        for (int i = 0; i < numParticles; i++) {
+            x[3*i] = initialPos[i][0];
+            x[3*i+1] = initialPos[i][1];
+            x[3*i+2] = initialPos[i][2];
+            norm += initialPos[i].dot(initialPos[i]);
+        }
+        norm /= numParticles;
+        norm = (norm < 1 ? 1 : sqrt(norm));
+        param.epsilon = tolerance/norm;
+
+        // Repeatedly minimize, steadily increasing the strength of the springs until all constraints are satisfied.
+
+        double prevMaxError1 = 1e10, prevMaxError2 = 1e10;
+        MinimizerData data(context, k, reporter);
+        while (true) {
+            // Perform the minimization.
+
+            lbfgsfloatval_t fx;
+            lbfgs_progress_t reportFn = (reporter == NULL ? NULL : report);
+            lbfgs(numParticles*3, x, &fx, evaluate, reportFn, &data, &param);
+
+            // Check whether all constraints are satisfied.
+
+            vector<Vec3> positions;
+            context.getPositions(positions);
+            int numConstraints = system.getNumConstraints();
+            double maxError = 0.0;
+            for (int i = 0; i < numConstraints; i++) {
+                int particle1, particle2;
+                double distance;
+                system.getConstraintParameters(i, particle1, particle2, distance);
+                Vec3 delta = positions[particle2]-positions[particle1];
+                double r = sqrt(delta.dot(delta));
+                double error = fabs(r-distance)/distance;
+                if (error > maxError)
+                    maxError = error;
+            }
+            if (maxError <= workingConstraintTol)
+                break; // All constraints are satisfied.
+            context.setPositions(initialPos);
+            if (maxError >= prevMaxError2)
+                break; // Further tightening the springs doesn't seem to be helping, so just give up.
+            prevMaxError2 = prevMaxError1;
+            prevMaxError1 = maxError;
+            data.k *= 10;
+            if (maxError > 100*workingConstraintTol) {
+                // We've gotten far enough from a valid state that we might have trouble getting
+                // back, so reset to the original positions.
+
+                for (int i = 0; i < numParticles; i++) {
+                    x[3*i] = initialPos[i][0];
+                    x[3*i+1] = initialPos[i][1];
+                    x[3*i+2] = initialPos[i][2];
+                }
+            }
+        }
+    }
+    catch (...) {
+        lbfgs_free(x);
+        throw;
+    }
+    lbfgs_free(x);
+
+    // If necessary, do a final constraint projection to make sure they are satisfied
+    // to the full precision requested by the user.
+
+    if (constraintTol < workingConstraintTol)
+        context.applyConstraints(constraintTol);
+}
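
The `evaluate` callback above replaces each distance constraint with a harmonic restraint of energy 0.5*k*(r - distance)^2 and adds its gradient to `g`. The following standalone sketch isolates that calculation for a single pair of particles so it can be checked against a finite difference. The names `Vec3d`, `restraintEnergy`, and `addRestraintGradient` are illustrative and do not appear in the commit. Like the original, the sketch does not guard against r = 0.

```cpp
#include <array>
#include <cmath>

using Vec3d = std::array<double, 3>;

// Energy of a harmonic restraint keeping two particles at separation `distance`:
// E = 0.5*k*(r - distance)^2, with r = |p2 - p1|.
double restraintEnergy(const Vec3d& p1, const Vec3d& p2, double distance, double k) {
    double r2 = 0.0;
    for (int a = 0; a < 3; a++)
        r2 += (p2[a]-p1[a])*(p2[a]-p1[a]);
    double dr = std::sqrt(r2)-distance;
    return 0.5*k*dr*dr;
}

// Accumulates dE/dp1 into g1 and dE/dp2 into g2, using the same sign
// convention as the diff: particle 1 gets -k*dr*delta/r, particle 2 gets +k*dr*delta/r.
void addRestraintGradient(const Vec3d& p1, const Vec3d& p2, double distance, double k,
                          Vec3d& g1, Vec3d& g2) {
    Vec3d delta;
    double r2 = 0.0;
    for (int a = 0; a < 3; a++) {
        delta[a] = p2[a]-p1[a];
        r2 += delta[a]*delta[a];
    }
    double r = std::sqrt(r2);
    double kdr = k*(r-distance);
    for (int a = 0; a < 3; a++) {
        double component = kdr*delta[a]/r;
        g1[a] -= component;
        g2[a] += component;
    }
}
```

The two contributions are equal and opposite, so the restraint adds no net gradient to the pair. In the commit, the contribution to a particle is skipped when its mass is zero; that check is omitted here for brevity.
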
--- a/tests/TestLocalEnergyMinimizer.h
+++ b/tests/TestLocalEnergyMinimizer.h
@@ -5,7 +5,7 @@
 * This is part of the OpenMM molecular simulation toolkit.                   *
 * See https://openmm.org/development.                                        *
 *                                                                            *
- * Portions copyright (c) 2010-2023 Stanford University and the Authors.      *
+ * Portions copyright (c) 2010-2026 Stanford University and the Authors.      *
 * Authors: Peter Eastman                                                     *
 * Contributors:                                                              *
 *                                                                            *
@@ -206,13 +206,13 @@ void testLargeForces() {
    system.addForce(nonbonded);
    for (int i = 0; i < numParticles; i++) {
        system.addParticle(1.0);
-        nonbonded->addParticle(1.0, 0.2, 1.0);
+        nonbonded->addParticle(0.1, 0.2, 1.0);
    }
    vector<Vec3> positions(numParticles);
    OpenMM_SFMT::SFMT sfmt;
    init_gen_rand(0, sfmt);
    for (int i = 0; i < numParticles; i++)
-        positions[i] = Vec3(genrand_real2(sfmt), genrand_real2(sfmt), genrand_real2(sfmt))*1e-10;
+        positions[i] = Vec3(genrand_real2(sfmt), genrand_real2(sfmt), genrand_real2(sfmt))*1e-2;

    // Minimize it and verify that it didn't blow up.

@@ -226,7 +226,7 @@ void testLargeForces() {
        Vec3 r = state.getPositions()[i];
        maxdist = max(maxdist, sqrt(r.dot(r)));
    }
-    ASSERT(maxdist > 0.1);
+    ASSERT(maxdist > 1.0);
    ASSERT(maxdist < 10.0);
 }