Common compute framework to unify CUDA and OpenCL code (#2488)

* Began creating common compute framework to unify code between CUDA and OpenCL
* Began OpenCL implementation of common compute framework
* Common implementation of CMMotionRemover
* CUDA implementation of common compute interface
* Converted HarmonicBondForce to common compute API
* Converted standard bonded forces to common compute API
* Converted ExpressionUtilities to common compute API
* Created ComputeParameterSet
* Converted custom bonded forces to common compute API
* Converted CustomCentroidBondForce to common compute API
* Converted CustomManyParticleForce to common compute API
* Moved lots of duplicate code from CudaContext and OpenCLContext to ComputeContext
* Converted GayBerneForce to common compute API
* Removed obsolete kernels
* Converted verlet integrators to common compute API
* Converted Langevin and Brownian integrators to common compute API
* Converted CustomIntegrator to common compute API
* Converted CustomNonbondedForce to common compute API
* Removed uses of a deprecated API
* Fixed failing test cases
* Converted GBSAOBCForce to common compute API
* Began converting CustomGBForce to common compute API
* Finished converting CustomGBForce to common compute API
* Merged duplicated code in CudaIntegrationUtilities and OpenCLIntegrationUtilities
* Converted RMSDForce and AndersenThermostat to common compute API
* Converted CustomHbondForce to common compute API
* Merged scripts for encoding kernel sources
* Converted Drude plugin to common compute API
* Fixed errors in CMake scripts
* Attempt at fixing errors on Windows
* Added discussion of common compute API to developer guide
* Added Windows export macro for common classes
* Fixed error in CMMotionRemover
* Updated travis to newer Ubuntu version
* Fixed errors on CPU OpenCL
* Fixed Windows linking errors
* Added missing pragma for 32 bit atomics
* Replaced long long with mm_long
* More fixes to Windows linking
* Bug fix

Unverified Commit edbc8407 authored Jan 08, 2020 by peastman Committed by GitHub Jan 08, 2020
20 changed files
--- a/.travis.yml
+++ b/.travis.yml
@@ -17,7 +17,7 @@ env:
 matrix:
  include:
    - sudo: required
-      dist: trusty
+      dist: xenial
      env: ==CPU_OPENCL==
           OPENCL=true
           CUDA=false
@@ -40,7 +40,7 @@ matrix:
      addons: {apt: {packages: []}}

    - sudo: required
-      dist: trusty
+      dist: xenial
      env: ==CUDA_COMPILE==
           CUDA=true
           OPENCL=false
@@ -74,7 +74,7 @@ matrix:
      addons: {apt: {packages: []}}

    - sudo: false
-      dist: trusty
+      dist: xenial
      python: 3.6
      env: ==STATIC_LIB==
           OPENCL=false
@@ -84,7 +84,7 @@ matrix:
           CMAKE_FLAGS="-DOPENMM_BUILD_STATIC_LIB=ON"

    - sudo: false
-      dist: trusty
+      dist: xenial
      python: 3.6
      env: ==PYTHON_3_6==
           OPENCL=false

--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -341,6 +341,12 @@ IF(OPENMM_BUILD_OPENCL_LIB)
    ADD_SUBDIRECTORY(platforms/opencl)
 ENDIF(OPENMM_BUILD_OPENCL_LIB)

+# Common compute files
+
+IF(CUDA_FOUND OR OPENCL_FOUND)
+    ADD_SUBDIRECTORY(platforms/common)
+ENDIF()
+
 # Optimized CPU platform

 SET(OPENMM_BUILD_CPU_LIB ON CACHE BOOL "Build optimized CPU platform")

--- a/platforms/cuda/EncodeCUDAFiles.cmake
+++ b/platforms/cuda/EncodeCUDAFiles.cmake
-FILE(GLOB CUDA_KERNELS ${CUDA_SOURCE_DIR}/kernels/*.cu)
-SET(CUDA_FILE_DECLARATIONS)
-SET(CUDA_FILE_DEFINITIONS)
-CONFIGURE_FILE(${CUDA_SOURCE_DIR}/${CUDA_SOURCE_CLASS}.cpp.in ${CUDA_KERNELS_CPP})
-FOREACH(file ${CUDA_KERNELS})
+FILE(GLOB KERNEL_FILES ${KERNEL_SOURCE_DIR}/kernels/*.${KERNEL_FILE_EXTENSION})
+SET(KERNEL_FILE_DECLARATIONS)
+CONFIGURE_FILE(${KERNEL_SOURCE_DIR}/${KERNEL_SOURCE_CLASS}.cpp.in ${KERNELS_CPP})
+FOREACH(file ${KERNEL_FILES})
    # Load the file contents and process it.
    FILE(STRINGS ${file} file_content NEWLINE_CONSUME)
    # Replace all backslashes by double backslashes as they are being put in a C string.
@@ -15,13 +14,13 @@ FOREACH(file ${CUDA_KERNELS})
    STRING(REPLACE "\n" "\\n\"\n\"" file_content "${file_content}")

    # Determine a name for the variable that will contain this file's contents
-    FILE(RELATIVE_PATH filename ${CUDA_SOURCE_DIR}/kernels ${file})
+    FILE(RELATIVE_PATH filename ${KERNEL_SOURCE_DIR}/kernels ${file})
    STRING(LENGTH ${filename} filename_length)
    MATH(EXPR filename_length ${filename_length}-3)
    STRING(SUBSTRING ${filename} 0 ${filename_length} variable_name)

    # Record the variable declaration and definition.
-    SET(CUDA_FILE_DECLARATIONS ${CUDA_FILE_DECLARATIONS}static\ const\ std::string\ ${variable_name};\n)
-    FILE(APPEND ${CUDA_KERNELS_CPP} const\ string\ ${CUDA_SOURCE_CLASS}::${variable_name}\ =\ \"${file_content}\"\;\n)
+    SET(KERNEL_FILE_DECLARATIONS ${KERNEL_FILE_DECLARATIONS}static\ const\ std::string\ ${variable_name};\n)
+    FILE(APPEND ${KERNELS_CPP} const\ string\ ${KERNEL_SOURCE_CLASS}::${variable_name}\ =\ \"${file_content}\"\;\n)
 ENDFOREACH(file)
-CONFIGURE_FILE(${CUDA_SOURCE_DIR}/${CUDA_SOURCE_CLASS}.h.in ${CUDA_KERNELS_H})
+CONFIGURE_FILE(${KERNEL_SOURCE_DIR}/${KERNEL_SOURCE_CLASS}.h.in ${KERNELS_H})
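
The encoding script above turns each kernel file into a string constant on a generated C++ class. The same transformation can be sketched in C++ to make the escaping rules easier to follow. This is only an illustration under stated assumptions: the function names (`escapeForLiteral`, `variableName`, `makeDefinition`) are hypothetical, and the quote-escaping step is assumed, since the part of the script that would handle it is not shown in the diff.

```cpp
#include <cassert>
#include <string>

// Escape kernel source so it can sit inside a C++ string literal.
// Mirrors the visible CMake steps: backslashes are doubled, and each
// newline becomes an escaped \n followed by a closing quote, a real
// newline, and a reopening quote.
std::string escapeForLiteral(const std::string& source) {
    std::string out;
    for (char c : source) {
        if (c == '\\')
            out += "\\\\";
        else if (c == '"')
            out += "\\\"";      // Assumption: this step is not visible in the diff.
        else if (c == '\n')
            out += "\\n\"\n\"";
        else
            out += c;
    }
    return out;
}

// Derive the variable name by dropping the last three characters of the
// file name (a dot plus a two letter extension such as cu, cl or cc).
std::string variableName(const std::string& filename) {
    assert(filename.size() > 3);
    return filename.substr(0, filename.size()-3);
}

// Build the definition line that the script appends to the generated .cpp file.
std::string makeDefinition(const std::string& className,
                           const std::string& filename,
                           const std::string& source) {
    return "const string "+className+"::"+variableName(filename)+
           " = \""+escapeForLiteral(source)+"\";\n";
}
```

For a hypothetical file `example.cc`, `makeDefinition("CommonKernelSources", "example.cc", source)` would produce a definition of `CommonKernelSources::example`. Note that the fixed three character trim only works because all of the kernel extensions in use are two letters long.
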
--- a/docs-source/developerguide/developer.rst
+++ b/docs-source/developerguide/developer.rst
@@ -456,15 +456,22 @@ It also defines vector versions of these types (\ :code:`real2`\ ,
 Computing Forces
 ****************

-When forces are computed, they are stored in multiple buffers.  This is done to
-enable multiple work-items or work-groups to compute forces on the same particle
-at the same time; as long as each one writes to a different buffer, there is no
-danger of race conditions.  At the start of a force calculation, all forces in
-all buffers are set to zero.   Each Force is then free to add its contributions
-to any or all of the buffers.  Finally, the buffers are summed to produce the
-total force on each particle.
-
-The size of each buffer is equal to the number of particles, rounded up to the
+When forces are computed, they can be stored in either of two places.  There is
+an array of :code:`long` values storing them as 64 bit fixed point values, and
+a collection of buffers of :code:`real4` values storing them in floating point
+format.  Most GPUs support atomic operations on 64 bit integers, which allows
+many threads to simultaneously record forces without a danger of conflicts.
+Some low end GPUs do not support this, however, especially the embedded GPUs
+found in many laptops.  These devices write to the floating point buffers, with
+careful coordination to make sure two threads will never write to the same
+memory location at the same time.
+
+At the start of a force calculation, all forces in all buffers are set to zero.
+Each Force is then free to add its contributions to any or all of the buffers.
+Finally, the buffers are summed to produce the total force on each particle.
+The total is recorded in both the floating point and fixed point arrays.
+
+The size of each floating point buffer is equal to the number of particles, rounded up to the
 next multiple of 32.  Call :code:`getPaddedNumAtoms()` on the OpenCLContext
 to get that number.  The actual force buffers are obtained by calling 
 :code:`getForceBuffers()`\ .  The first *n* entries (where *n* is the
@@ -473,16 +480,13 @@ represent the second force buffer, and so on.  More generally, the *i*\ ’th
 force buffer’s contribution to the force on particle *j* is stored in
 element :code:`i*context.getPaddedNumAtoms()+j`\ .

-Depending on the device, a buffer may also be created that stores contributions
-to the forces in 64 bit fixed point format.  On devices that support atomic
-operations on 64 bit integers in global memory, this can be a more efficient way
-of accumulating forces than using a large number of force buffers.  To convert a
-value from floating point to fixed point, multiply it by 0x100000000 (2\ :sup:`32`\ ),
-then cast it to a :code:`long`\ .  The fixed point buffer is
-ordered differently from the others.  For atom *i*\ , the x component of its
-force is stored in element :code:`i`\ , the y component in element 
+The fixed point buffer is ordered differently.  For atom *i*\ , the x component
+of its force is stored in element :code:`i`\ , the y component in element 
 :code:`i+context.getPaddedNumAtoms()`\ , and the z component in element 
-:code:`i+2*context.getPaddedNumAtoms()`\ .
+:code:`i+2*context.getPaddedNumAtoms()`\ .  To convert a value from floating
+point to fixed point, multiply it by 0x100000000 (2\ :sup:`32`\ ),
+then cast it to a :code:`long`\ .  Call :code:`getLongForceBuffer()` to get the
+array of fixed point values.

 The potential energy is also accumulated in a set of buffers, but this one is
 simply a list of floating point values.  All of them are set to zero at the
@@ -490,15 +494,10 @@ start of a computation, and they are summed at the end of the computation to
 yield the total energy.

 The OpenCL implementation of each Force object should define a subclass of
-OpenCLForce, and register an instance of it by calling :code:`addForce()` on
-the OpenCLContext.  This serves two purposes:
-
-#. It reports how many force buffers are required when calculating this
-   particular Force.  The OpenCLContext sets the size of its force buffer array
-   based on the largest number of buffers required by any Force.
-#. It implements methods for determining whether particular particles or groups
-   of particles are identical.  This is important when reordering particles, and is
-   discussed below.
+ComputeForceInfo, and register an instance of it by calling :code:`addForce()` on
+the OpenCLContext.  It implements methods for determining whether particular
+particles or groups of particles are identical.  This is important when
+reordering particles, and is discussed below.


 Nonbonded Forces
@@ -586,8 +585,7 @@ where *k* is a per-particle parameter.  First we create a parameter as
 follows
 ::

-    nb.addParameter(OpenCLNonbondedUtilities::ParameterInfo("kparam", "float", 1,
-            sizeof(cl_float), kparam->getDeviceBuffer()));
+    nb.addParameter(ComputeParameterInfo(kparam, "kparam", "float", 1));

 where :code:`nb` is the OpenCLNonbondedUtilities for the context.  Now we
 call :code:`addInteraction()` to define an interaction with the following
@@ -700,7 +698,7 @@ exchanged without affecting the System in any way.

 Every Force can contribute to defining the boundaries of molecules, and to
 determining whether two molecules are identical.  This is done through the
-OpenCLForceInfo it adds to the OpenCLContext.  It can specify two types of
+ComputeForceInfo it adds to the OpenCLContext.  It can specify two types of
 information:

 #. Given a pair of particles, it can say whether those two particles are
@@ -792,3 +790,189 @@ buffer.  In contrast, the CUDA platform uses *only* the fixed point buffer
 the CUDA platform only works on devices that support 64 bit atomic operations
 (compute capability 1.2 or higher).

+
+.. _common-compute:
+
+Common Compute
+##############
+
+Common Compute is not a platform, but it shares many elements of one.  It exists
+to reduce code duplication between the OpenCL and CUDA platforms.  It allows a
+single implementation to be written for most kernels that can be used by both
+platforms.
+
+OpenCL and CUDA are very similar to each other.  Their computational models are
+nearly identical.  For example, each is based around launching kernels that are
+executed in parallel by many threads.  Each of them groups threads into blocks,
+with more communication and synchronization permitted between the threads
+in a block than between ones in different blocks.  They have very similar memory
+hierarchies: high latency global memory, low latency local/shared memory that
+can be used for communication between the threads of a block, and local variables
+that are visible only to a single thread.
+
+Even their languages for writing kernels are very similar.  Here is an OpenCL
+kernel that adds two arrays together, storing the result in a third array.
+::
+
+    __kernel void addArrays(__global const float* restrict a,
+                            __global const float* restrict b,
+                            __global float* restrict c,
+                            int length) {
+        for (int i = get_global_id(0); i < length; i += get_global_size(0))
+            c[i] = a[i]+b[i];
+    }
+
+Here is the corresponding CUDA kernel.
+::
+
+    extern "C" __global__ void addArrays(const float* __restrict__ a,
+                                         const float* __restrict__ b,
+                                         float* __restrict__ c,
+                                         int length) {
+        for (int i = blockIdx.x*blockDim.x+threadIdx.x; i < length; i += blockDim.x*gridDim.x)
+            c[i] = a[i]+b[i];
+    }
+
+The difference between them is largely just a mechanical find-and-replace.
+After many years of writing and maintaining nearly identical kernels by hand,
+it finally occurred to us that the translation could be done automatically by
+the compiler.  Simply by defining a few preprocessor macros, the following
+kernel can be compiled equally well either as OpenCL or as CUDA.
+::
+
+    KERNEL void addArrays(GLOBAL const float* RESTRICT a,
+                          GLOBAL const float* RESTRICT b,
+                          GLOBAL float* RESTRICT c,
+                          int length) {
+        for (int i = GLOBAL_ID; i < length; i += GLOBAL_SIZE)
+            c[i] = a[i]+b[i];
+    }
+
+Writing Device Code
+*******************
+
+When compiling kernels with the Common Compute API, the following macros are
+defined.
+
+.. tabularcolumns:: |l|l|L|
+
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|Macro                          |OpenCL Definition                                           |CUDA Definition                             |
++===============================+============================================================+============================================+
+|:code:`KERNEL`                 |:code:`__kernel`                                            |:code:`extern "C" __global__`               |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`DEVICE`                 |                                                            |:code:`__device__`                          |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`LOCAL`                  |:code:`__local`                                             |:code:`__shared__`                          |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`LOCAL_ARG`              |:code:`__local`                                             |                                            |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`GLOBAL`                 |:code:`__global`                                            |                                            |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`RESTRICT`               |:code:`restrict`                                            |:code:`__restrict__`                        |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`LOCAL_ID`               |:code:`get_local_id(0)`                                     |:code:`threadIdx.x`                         |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`LOCAL_SIZE`             |:code:`get_local_size(0)`                                   |:code:`blockDim.x`                          |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`GLOBAL_ID`              |:code:`get_global_id(0)`                                    |:code:`(blockIdx.x*blockDim.x+threadIdx.x)` |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`GLOBAL_SIZE`            |:code:`get_global_size(0)`                                  |:code:`(blockDim.x*gridDim.x)`              |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`GROUP_ID`               |:code:`get_group_id(0)`                                     |:code:`blockIdx.x`                          |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`NUM_GROUPS`             |:code:`get_num_groups(0)`                                   |:code:`gridDim.x`                           |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`SYNC_THREADS`           |:code:`barrier(CLK_LOCAL_MEM_FENCE+CLK_GLOBAL_MEM_FENCE);`  |:code:`__syncthreads();`                    |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`SYNC_WARPS`             | | if SIMT width >= 32:                                     | | if compute capability >= 7.0:            |
+|                               | | :code:`mem_fence(CLK_LOCAL_MEM_FENCE)`                   | | :code:`__syncwarp();`                    |
+|                               | | otherwise:                                               | | otherwise empty                          |
+|                               | | :code:`barrier(CLK_LOCAL_MEM_FENCE)`                     |                                            |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`MEM_FENCE`              |:code:`mem_fence(CLK_LOCAL_MEM_FENCE+CLK_GLOBAL_MEM_FENCE);`|:code:`__threadfence_block();`              |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`ATOMIC_ADD(dest, value)`|:code:`atom_add(dest, value)`                               |:code:`atomicAdd(dest, value)`              |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+
+A few other symbols may or may not be defined based on the device you are running on:
+:code:`SUPPORTS_DOUBLE_PRECISION` and :code:`SUPPORTS_64_BIT_ATOMICS`\ .  You
+can use :code:`#ifdef` blocks with these symbols to conditionally compile code
+based on the features supported by the device.  In addition, the CUDA compiler
+defines the symbol :code:`__CUDA_ARCH__`\ , so you can check for this symbol if
+you want to have different code blocks for CUDA and OpenCL.
+
+Both OpenCL and CUDA define vector types like :code:`int2` and :code:`float4`\ .
+The types they support are different but overlapping.  When writing common code,
+use only the vector types that are supported by both OpenCL and CUDA: 2, 3, and 4
+element vectors of type :code:`short`\ , :code:`int`\ , :code:`float`\ , and
+:code:`double`\ .
+
+CUDA uses functions to construct vector values, such as :code:`make_float2(x, y)`\ .
+OpenCL instead uses a typecast-like syntax: :code:`(float2) (x, y)`\ .  In common
+code, use the CUDA style :code:`make_` functions.  OpenMM provides definitions
+of these functions when compiling as OpenCL.
+
+In CUDA, vector types are simply data structures.  You can access their elements,
+but not do much more with them.  In contrast, OpenCL's vectors are mathematical
+types.  All standard math operators are defined for them, as well as geometrical
+functions like :code:`dot()` and :code:`cross()`\ .  When compiling kernels as
+CUDA, OpenMM provides definitions of these operators and functions.
+
+OpenCL also supports "swizzle" notation for vectors.  For example, if :code:`f`
+is a :code:`float4` you can construct a vector of its first three elements
+by writing :code:`f.xyz`\ , or you can swap its first two elements by writing
+:code:`f.xy = f.yx`\ .  Unfortunately, there is no practical way to support this
+in CUDA, so swizzle notation cannot be used in common code.  Because stripping
+the final element from a four component vector is such a common operation, OpenMM
+provides a special function for doing it: :code:`trimTo3(f)` returns a vector of the
+first three elements of :code:`f`\ .
+
+64 bit integers are another data type that needs special handling.  Both OpenCL
+and CUDA support them, but they use different names for them: :code:`long` in OpenCL,
+:code:`long long` in CUDA.  To work around this inconsistency, OpenMM provides
+the typedefs :code:`mm_long` and :code:`mm_ulong` for signed and unsigned 64 bit
+integers in device code.
+
+Writing Host Code
+*****************
+
+Host code for Common Compute is very similar to host code for OpenCL or CUDA.
+In fact, most of the classes provided by the OpenCL and CUDA platforms are
+subclasses of Common Compute classes.  For example, OpenCLContext and
+CudaContext are both subclasses of ComputeContext.  When writing common code,
+each KernelImpl should expect a ComputeContext to be passed to its constructor.
+By using the common API provided by that abstract class, it can be used for
+either OpenCL or CUDA just based on the particular context passed to it at
+runtime.  Similarly, OpenCLNonbondedUtilities and CudaNonbondedUtilities are
+subclasses of the abstract NonbondedUtilities class, and so on.
+
+ArrayInterface is an abstract class defining the interface for arrays stored on
+the device.  OpenCLArray and CudaArray are both subclasses of it.  To simplify
+code that creates and uses arrays, there is also a third subclass called
+ComputeArray.  It acts as a wrapper around an OpenCLArray or CudaArray,
+automatically creating an array of the appropriate type for the current
+platform.  In practice, just follow these rules:
+
+  1. Whenever you need to create an array, make it a ComputeArray.
+
+  2. Whenever you write a function that expects an array to be passed to it,
+     declare the type to be ArrayInterface.
+
+If you do these two things, all differences between platforms will be handled
+automatically.
+
+OpenCL and CUDA have quite different APIs for compiling and invoking kernels.
+To hide these differences, OpenMM provides a set of abstract classes.  To compile
+device code, pass the source code to :code:`compileProgram()` on the ComputeContext.
+This returns a ComputeProgram.  You can then call its :code:`createKernel()`
+method to get a ComputeKernel object, which has methods for setting arguments
+and invoking the kernel.
+
+Sometimes you need to refer to vector types in host code, such as to set the
+value for a kernel argument or to access the elements of an array.  OpenCL and
+CUDA both define types for them, but they have different names, and in any case
+you want to avoid using OpenCL-specific or CUDA-specific types in common code.
+OpenMM therefore defines types for vectors in host code.  They have the same
+names as the corresponding types in device code, only with the prefix :code:`mm_`\ ,
+for example :code:`mm_int2` and :code:`mm_float3`\ .
\ No newline at end of file
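
The force buffer layout described in the developer guide changes above (padding to a multiple of 32, scaling by 2^32 for the fixed point format, and storing all x components before all y and z components) can be sketched in plain host-side C++. This is a minimal illustration rather than OpenMM code: the helper names `paddedNumAtoms`, `toFixedPoint`, `fromFixedPoint` and `addForce` are hypothetical, and the reverse conversion is an assumption that simply inverts the documented scaling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Round the particle count up to the next multiple of 32.
int paddedNumAtoms(int numAtoms) {
    return (numAtoms+31)/32*32;
}

// Convert a floating point value to 64 bit fixed point:
// multiply by 0x100000000 (2^32), then cast to a 64 bit integer.
std::int64_t toFixedPoint(double value) {
    return static_cast<std::int64_t>(value*0x100000000LL);
}

// Assumed inverse of the conversion above (not stated in the guide).
double fromFixedPoint(std::int64_t value) {
    return static_cast<double>(value)/0x100000000LL;
}

// Accumulate a force on one atom in a fixed point buffer.  The x component
// lives in element i, y in element i+padded, and z in element i+2*padded.
void addForce(std::vector<std::int64_t>& buffer, int padded, int atom,
              double fx, double fy, double fz) {
    assert(atom >= 0 && atom < padded);
    assert(buffer.size() == 3*static_cast<std::size_t>(padded));
    buffer[atom] += toFixedPoint(fx);
    buffer[atom+padded] += toFixedPoint(fy);
    buffer[atom+2*padded] += toFixedPoint(fz);
}
```

On a real device the additions in `addForce` would be 64 bit atomic operations, which is what lets many threads record forces on the same particle without conflicts. Integer addition is also order independent, so the accumulated total does not depend on the order in which threads contribute.
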
--- a/libraries/asmjit/asmjit_apibegin.h
+++ b/libraries/asmjit/asmjit_apibegin.h
@@ -53,8 +53,8 @@
 // [GCC]
 #if ASMJIT_CC_GCC
 # pragma GCC diagnostic push
-# pragma GCC diagnostic ignored "-Wbool-operation"
 # if ASMJIT_CC_GCC_GE(8, 0, 0)
+#  pragma GCC diagnostic ignored "-Wbool-operation"
 #  pragma GCC diagnostic ignored "-Wclass-memaccess"
 # endif
 #endif // ASMJIT_CC_GCC

--- a/platforms/common/CMakeLists.txt
+++ b/platforms/common/CMakeLists.txt
+# Encode the kernel sources into a C++ class.
+
+SET(KERNEL_SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/src")
+SET(KERNEL_SOURCE_CLASS CommonKernelSources)
+SET(KERNELS_CPP ${CMAKE_CURRENT_BINARY_DIR}/src/${KERNEL_SOURCE_CLASS}.cpp)
+SET(KERNELS_H ${CMAKE_CURRENT_BINARY_DIR}/src/${KERNEL_SOURCE_CLASS}.h)
+INCLUDE_DIRECTORIES(BEFORE ${CMAKE_CURRENT_BINARY_DIR}/src)
+FILE(GLOB COMMON_KERNELS ${KERNEL_SOURCE_DIR}/kernels/*.cc)
+ADD_CUSTOM_COMMAND(OUTPUT ${KERNELS_CPP} ${KERNELS_H}
+    COMMAND ${CMAKE_COMMAND}
+    ARGS -D KERNEL_SOURCE_DIR=${KERNEL_SOURCE_DIR} -D KERNELS_CPP=${KERNELS_CPP} -D KERNELS_H=${KERNELS_H} -D KERNEL_SOURCE_CLASS=${KERNEL_SOURCE_CLASS} -D KERNEL_FILE_EXTENSION=cc -P ${CMAKE_SOURCE_DIR}/cmake_modules/EncodeKernelFiles.cmake
+    DEPENDS ${COMMON_KERNELS}
+)
+SET_SOURCE_FILES_PROPERTIES(${KERNELS_CPP} ${KERNELS_H} PROPERTIES GENERATED TRUE)
+ADD_CUSTOM_TARGET(CommonKernels DEPENDS ${KERNELS_CPP} ${KERNELS_H})
+
+# Install headers
+
+FILE(GLOB CORE_HEADERS include/openmm/common/*.h)
+INSTALL_FILES(/include/openmm/common FILES ${CORE_HEADERS})
--- a/platforms/common/include/openmm/common/ArrayInterface.h
+++ b/platforms/common/include/openmm/common/ArrayInterface.h
+#ifndef OPENMM_ARRAYINTERFACE_H_
+#define OPENMM_ARRAYINTERFACE_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/OpenMMException.h"
+#include "openmm/common/windowsExportCommon.h"
+#include <vector>
+
+namespace OpenMM {
+
+class ComputeContext;
+
+/**
+ * This abstract class defines the interface for arrays stored on a computing device.
+ */
+
+class OPENMM_EXPORT_COMMON ArrayInterface {
+public:
+    virtual ~ArrayInterface() {
+    }
+    /**
+     * Initialize this array.
+     *
+     * @param context           the context for which to create the array
+     * @param size              the number of elements in the array
+     * @param elementSize       the size of each element in bytes
+     * @param name              the name of the array
+     */
+    virtual void initialize(ComputeContext& context, int size, int elementSize, const std::string& name) = 0;
+    /**
+     * Initialize this object.  The template argument is the data type of each array element.
+     *
+     * @param context           the context for which to create the array
+     * @param size              the number of elements in the array
+     * @param name              the name of the array
+     */
+    template <class T>
+    void initialize(ComputeContext& context, int size, const std::string& name) {
+        initialize(context, size, sizeof(T), name);
+    }
+    /**
+     * Recreate the internal storage to have a different size.
+     */
+    virtual void resize(int size) = 0;
+    /**
+     * Get whether this array has been initialized.
+     */
+    virtual bool isInitialized() const = 0;
+    /**
+     * Get the number of elements in the array.
+     */
+    virtual int getSize() const = 0;
+    /**
+     * Get the size of each element in bytes.
+     */
+    virtual int getElementSize() const = 0;
+    /**
+     * Get the name of the array.
+     */
+    virtual const std::string& getName() const = 0;
+    /**
+     * Get the context this array belongs to.
+     */
+    virtual ComputeContext& getContext() = 0;
+    /**
+     * Copy the values in a vector to the device memory.
+     * 
+     * @param data      the data in host memory to copy
+     * @param convert   if true, automatic conversions between single and double
+     *                  precision will be performed as necessary
+     */
+    template <class T>
+    void upload(const std::vector<T>& data, bool convert=false) {
+        if (convert && data.size() == getSize() && sizeof(T) != getElementSize()) {
+            if (sizeof(T) == 2*getElementSize()) {
+                // Convert values from double to single precision.
+                const double* d = reinterpret_cast<const double*>(&data[0]);
+                std::vector<float> v(getElementSize()*getSize()/sizeof(float));
+                for (int i = 0; i < v.size(); i++)
+                    v[i] = (float) d[i];
+                upload(&v[0], true);
+                return;
+            }
+            if (2*sizeof(T) == getElementSize()) {
+                // Convert values from single to double precision.
+                const float* d = reinterpret_cast<const float*>(&data[0]);
+                std::vector<double> v(getElementSize()*getSize()/sizeof(double));
+                for (int i = 0; i < v.size(); i++)
+                    v[i] = (double) d[i];
+                upload(&v[0], true);
+                return;
+            }
+        }
+        if (sizeof(T) != getElementSize() || data.size() != getSize())
+            throw OpenMMException("Error uploading array "+getName()+": The specified vector does not match the size of the array");
+        upload(&data[0], true);
+    }
+    /**
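+     * Copy the values in the array to a vector.  The vector is resized to getSize() elements
+     * if necessary.  No precision conversion is performed: an OpenMMException is thrown if
+     * sizeof(T) does not equal getElementSize().
+     * 
+     * @param data     the vector in host memory to copy the values into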
+     */
+    template <class T>
+    void download(std::vector<T>& data) const {
+        if (sizeof(T) != getElementSize())
+            throw OpenMMException("Error downloading array "+getName()+": The specified vector has the wrong element size");
+        if (data.size() != getSize())
+            data.resize(getSize());
+        download(&data[0], true);
+    }
+    /**
+     * Copy the values from host memory to the array.
+     * 
+     * @param data     the data to copy
+     * @param blocking if true, this call will block until the transfer is complete.  Subclasses often
+     *                 have restrictions on non-blocking copies, such as that the source data must be
+     *                 in page-locked memory.
+     */
+    virtual void upload(const void* data, bool blocking=true) = 0;
+    /**
+     * Copy the values in the array to host memory.
+     * 
+     * @param data     the destination in host memory to copy the values to
+     * @param blocking if true, this call will block until the transfer is complete.  Subclasses often
+     *                 have restrictions on non-blocking copies, such as that the destination must be
+     *                 in page-locked memory.
+     */
+    virtual void download(void* data, bool blocking=true) const = 0;
+    /**
+     * Copy the values in this array to a second array.
+     * 
+     * @param dest     the destination array to copy to
+     */
+    virtual void copyTo(ArrayInterface& dest) const = 0;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_ARRAYINTERFACE_H_*/
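The precision handling in the templated upload() of ArrayInterface can be illustrated away from any device. The following is a minimal host-only sketch, not OpenMM code: HostArray, uploadRaw(), downloadRaw(), and demoConvertingUpload() are hypothetical names, and a byte vector stands in for device memory. It assumes, as the original does, that elements consist purely of float or double components.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical host-only stand-in for a device array (not part of OpenMM).
class HostArray {
public:
    HostArray(std::size_t size, std::size_t elementSize)
        : size(size), elementSize(elementSize), storage(size*elementSize) {}

    // Raw byte copies, playing the role of the virtual upload()/download().
    void uploadRaw(const void* data) {
        if (!storage.empty())
            std::memcpy(storage.data(), data, storage.size());
    }
    void downloadRaw(void* data) const {
        if (!storage.empty())
            std::memcpy(data, storage.data(), storage.size());
    }

    // Same decision logic as the templated upload(): convert only when asked
    // to and when the element sizes differ by exactly a factor of two.
    template <class T>
    void upload(const std::vector<T>& data, bool convert = false) {
        if (convert && data.size() == size && sizeof(T) != elementSize) {
            if (sizeof(T) == 2*elementSize) {
                // Host data is double precision, the array is single precision.
                const double* d = reinterpret_cast<const double*>(data.data());
                std::vector<float> v(elementSize*size/sizeof(float));
                for (std::size_t i = 0; i < v.size(); i++)
                    v[i] = static_cast<float>(d[i]);
                uploadRaw(v.data());
                return;
            }
            if (2*sizeof(T) == elementSize) {
                // Host data is single precision, the array is double precision.
                const float* d = reinterpret_cast<const float*>(data.data());
                std::vector<double> v(elementSize*size/sizeof(double));
                for (std::size_t i = 0; i < v.size(); i++)
                    v[i] = static_cast<double>(d[i]);
                uploadRaw(v.data());
                return;
            }
        }
        if (sizeof(T) != elementSize || data.size() != size)
            throw std::invalid_argument("vector does not match the size of the array");
        uploadRaw(data.data());
    }

    // Download without conversion; the element size must match exactly.
    template <class T>
    std::vector<T> download() const {
        if (sizeof(T) != elementSize)
            throw std::invalid_argument("vector has the wrong element size");
        std::vector<T> out(size);
        downloadRaw(out.data());
        return out;
    }

private:
    std::size_t size, elementSize;
    std::vector<unsigned char> storage;
};

// Usage sketch: upload doubles into a single-precision array and read them back.
std::vector<float> demoConvertingUpload() {
    HostArray array(3, sizeof(float));
    array.upload(std::vector<double>{1.5, 2.0, -0.25}, true);
    std::vector<float> result = array.download<float>();
    assert(result.size() == 3);
    return result;
}
```

Without convert=true the same call is rejected, because sizeof(double) does not match the array's element size. This mirrors the exception thrown at the end of the original upload().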
--- a/platforms/common/include/openmm/common/BondedUtilities.h
+++ b/platforms/common/include/openmm/common/BondedUtilities.h
+#ifndef OPENMM_BONDEDUTILITIES_H_
+#define OPENMM_BONDEDUTILITIES_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2011-2019 Stanford University and the Authors.      *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/ArrayInterface.h"
+#include <string>
+#include <vector>
+
+namespace OpenMM {
+
+/**
+ * This abstract class defines an interface for computing bonded interactions.  Call
+ * getBondedUtilities() on a ComputeContext to get the BondedUtilities object for that
+ * context.
+ * 
+ * This class provides a generic mechanism for evaluating bonded interactions.  You write only
+ * the source code needed to compute one interaction, and this object takes care of creating
+ * and executing a complete kernel that loops over bonds, evaluates each one, and accumulates
+ * the resulting forces and energies.  This offers two advantages.  First, it simplifies the
+ * task of writing a new Force.  Second, it allows multiple forces to be evaluated by a single
+ * kernel, which reduces overhead and improves performance.
+ * 
+ * A "bonded interaction" means an interaction that affects a small, fixed set of particles.
+ * The interaction energy must depend only on the positions of those particles, and the list of
+ * particles forming a "bond" must not change with time.  Examples of bonded interactions
+ * include HarmonicBondForce, HarmonicAngleForce, and PeriodicTorsionForce.
+ * 
+ * To create a bonded interaction, call addInteraction().  You pass to it a block of source
+ * code for evaluating the interaction.  The inputs and outputs for that source code are as
+ * follows:
+ * 
+ * <ol>
+ * <li>The index of the bond being evaluated will have been stored in the unsigned int variable "index".</li>
+ * <li>The indices of the atoms forming that bond will have been stored in the unsigned int variables "atom1",
+ * "atom2", ....</li>
+ * <li>The positions of those atoms will have been stored in the real4 variables "pos1", "pos2", ....</li>
+ * <li>A real variable called "energy" will exist.  Your code should add the potential energy of the
+ * bond to that variable.</li>
+ * <li>Your code should define real3 variables called "force1", "force2", ... that contain the force to
+ * apply to each atom.</li>
+ * </ol>
+ * 
+ * As a simple example, the following source code would be used to implement a pairwise interaction of
+ * the form E=r^2:
+ * 
+ * <tt><pre>
+ * real3 delta = make_real3(pos2.x-pos1.x, pos2.y-pos1.y, pos2.z-pos1.z);
+ * energy += delta.x*delta.x + delta.y*delta.y + delta.z*delta.z;
+ * real3 force1 = 2.0f*delta;
+ * real3 force2 = -2.0f*delta;
+ * </pre></tt>
+ * 
+ * Interactions will often depend on parameters or other data.  Call addArgument() to register an
+ * array holding that data.  The array is passed to the interaction kernel as an argument, and your
+ * interaction code can refer to it by the name that addArgument() returns.
+ */
+
+class OPENMM_EXPORT_COMMON BondedUtilities {
+public:
+    virtual ~BondedUtilities() {
+    }
+    /**
+     * Add a bonded interaction.
+     *
+     * @param atoms    this should have one entry for each bond, and that entry should contain the list
+     *                 of atoms involved in the bond.  Every entry must have the same number of atoms.
+     * @param source   the code to evaluate the interaction
+     * @param group    the force group in which the interaction should be calculated
+     */
+    virtual void addInteraction(const std::vector<std::vector<int> >& atoms, const std::string& source, int group) = 0;
+    /**
+     * Add an argument that should be passed to the interaction kernel.
+     * 
+     * @param data    the array containing the data to pass
+     * @param type    the data type of each element in the array (e.g. "float4")
+     * @return the name that will be used for the argument.  Any code you pass to addInteraction() should
+     * refer to it by this name.
+     */
+    virtual std::string addArgument(ArrayInterface& data, const std::string& type) = 0;
+    /**
+     * Register that the interaction kernel will be computing the derivative of the potential energy
+     * with respect to a parameter.
+     * 
+     * @param param   the name of the parameter
+     * @return the variable that will be used to accumulate the derivative.  Any code you pass to addInteraction() should
+     * add its contributions to this variable.
+     */
+    virtual std::string addEnergyParameterDerivative(const std::string& param) = 0;
+    /**
+     * Add some code that should be included in the program, before the start of the kernel.
+     * This can be used, for example, to define functions that will be called by the kernel.
+     * 
+     * @param source   the code to include
+     */
+    virtual void addPrefixCode(const std::string& source) = 0;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_BONDEDUTILITIES_H_*/
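As a rough host-side picture of what BondedUtilities does with an interaction snippet, the sketch below loops over bonds, evaluates the documented E=r^2 example for each one, and accumulates forces and energy. It is an illustration only: Vec3, evaluateBond(), and computeBonded() are hypothetical names, and the real class generates and runs a device kernel rather than a C++ loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain 3-vector standing in for the kernel's real3 type.
struct Vec3 {
    double x, y, z;
};

// Per-bond code, equivalent to the E=r^2 snippet in the class documentation:
// returns the bond energy and fills in the force on each of the two atoms.
double evaluateBond(const Vec3& pos1, const Vec3& pos2, Vec3& force1, Vec3& force2) {
    Vec3 delta = Vec3{pos2.x-pos1.x, pos2.y-pos1.y, pos2.z-pos1.z};
    double energy = delta.x*delta.x + delta.y*delta.y + delta.z*delta.z;
    force1 = Vec3{2.0*delta.x, 2.0*delta.y, 2.0*delta.z};
    force2 = Vec3{-2.0*delta.x, -2.0*delta.y, -2.0*delta.z};
    return energy;
}

// Host analogue of the generated kernel: loop over all bonds, evaluate each
// one, and accumulate the resulting forces and the total energy.
double computeBonded(const std::vector<Vec3>& positions,
                     const std::vector<std::array<int, 2> >& bonds,
                     std::vector<Vec3>& forces) {
    assert(forces.size() == positions.size());
    double energy = 0.0;
    for (std::size_t index = 0; index < bonds.size(); index++) {
        int atom1 = bonds[index][0];
        int atom2 = bonds[index][1];
        assert(atom1 >= 0 && atom1 < (int) positions.size());
        assert(atom2 >= 0 && atom2 < (int) positions.size());
        Vec3 force1, force2;
        energy += evaluateBond(positions[atom1], positions[atom2], force1, force2);
        forces[atom1].x += force1.x;
        forces[atom1].y += force1.y;
        forces[atom1].z += force1.z;
        forces[atom2].x += force2.x;
        forces[atom2].y += force2.y;
        forces[atom2].z += force2.z;
    }
    return energy;
}
```

Because each bond applies equal and opposite forces to its two atoms, the accumulated forces sum to zero, which is a convenient sanity check when writing a new interaction snippet.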
--- a/platforms/common/include/openmm/common/CommonKernels.h
+++ b/platforms/common/include/openmm/common/CommonKernels.h
+#ifndef OPENMM_COMMONKERNELS_H_
+#define OPENMM_COMMONKERNELS_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2008-2019 Stanford University and the Authors.      *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/ComputeArray.h"
+#include "openmm/common/ComputeContext.h"
+#include "openmm/common/ComputeParameterSet.h"
+#include "openmm/Platform.h"
+#include "openmm/kernels.h"
+#include "openmm/internal/CompiledExpressionSet.h"
+#include "openmm/internal/CustomIntegratorUtilities.h"
+#include "lepton/CompiledExpression.h"
+
+namespace OpenMM {
+
+
+/**
+ * This kernel is invoked by HarmonicBondForce to calculate the forces acting on the system and the energy of the system.
+ */
+class CommonCalcHarmonicBondForceKernel : public CalcHarmonicBondForceKernel {
+public:
+    CommonCalcHarmonicBondForceKernel(std::string name, const Platform& platform, ComputeContext& cc, const System& system) : CalcHarmonicBondForceKernel(name, platform),
+            hasInitializedKernel(false), cc(cc), system(system) {
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the HarmonicBondForce this kernel will be used for
+     */
+    void initialize(const System& system, const HarmonicBondForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the HarmonicBondForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const HarmonicBondForce& force);
+private:
+    class ForceInfo;
+    int numBonds;
+    bool hasInitializedKernel;
+    ComputeContext& cc;
+    ForceInfo* info;
+    const System& system;
+    ComputeArray params;
+};
+
+/**
+ * This kernel is invoked by CustomBondForce to calculate the forces acting on the system and the energy of the system.
+ */
+class CommonCalcCustomBondForceKernel : public CalcCustomBondForceKernel {
+public:
+    CommonCalcCustomBondForceKernel(std::string name, const Platform& platform, ComputeContext& cc, const System& system) : CalcCustomBondForceKernel(name, platform),
+            hasInitializedKernel(false), cc(cc), system(system), params(NULL) {
+    }
+    ~CommonCalcCustomBondForceKernel();
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CustomBondForce this kernel will be used for
+     */
+    void initialize(const System& system, const CustomBondForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CustomBondForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CustomBondForce& force);
+private:
+    class ForceInfo;
+    int numBonds;
+    bool hasInitializedKernel;
+    ComputeContext& cc;
+    ForceInfo* info;
+    const System& system;
+    ComputeParameterSet* params;
+    ComputeArray globals;
+    std::vector<std::string> globalParamNames;
+    std::vector<float> globalParamValues;
+};
+
+/**
+ * This kernel is invoked by HarmonicAngleForce to calculate the forces acting on the system and the energy of the system.
+ */
+class CommonCalcHarmonicAngleForceKernel : public CalcHarmonicAngleForceKernel {
+public:
+    CommonCalcHarmonicAngleForceKernel(std::string name, const Platform& platform, ComputeContext& cc, const System& system) : CalcHarmonicAngleForceKernel(name, platform),
+            hasInitializedKernel(false), cc(cc), system(system) {
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the HarmonicAngleForce this kernel will be used for
+     */
+    void initialize(const System& system, const HarmonicAngleForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the HarmonicAngleForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const HarmonicAngleForce& force);
+private:
+    class ForceInfo;
+    int numAngles;
+    bool hasInitializedKernel;
+    ComputeContext& cc;
+    ForceInfo* info;
+    const System& system;
+    ComputeArray params;
+};
+
+/**
+ * This kernel is invoked by CustomAngleForce to calculate the forces acting on the system and the energy of the system.
+ */
+class CommonCalcCustomAngleForceKernel : public CalcCustomAngleForceKernel {
+public:
+    CommonCalcCustomAngleForceKernel(std::string name, const Platform& platform, ComputeContext& cc, const System& system) : CalcCustomAngleForceKernel(name, platform),
+            hasInitializedKernel(false), cc(cc), system(system), params(NULL) {
+    }
+    ~CommonCalcCustomAngleForceKernel();
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CustomAngleForce this kernel will be used for
+     */
+    void initialize(const System& system, const CustomAngleForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CustomAngleForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CustomAngleForce& force);
+private:
+    class ForceInfo;
+    int numAngles;
+    bool hasInitializedKernel;
+    ComputeContext& cc;
+    ForceInfo* info;
+    const System& system;
+    ComputeParameterSet* params;
+    ComputeArray globals;
+    std::vector<std::string> globalParamNames;
+    std::vector<float> globalParamValues;
+};
+
+/**
+ * This kernel is invoked by PeriodicTorsionForce to calculate the forces acting on the system and the energy of the system.
+ */
+class CommonCalcPeriodicTorsionForceKernel : public CalcPeriodicTorsionForceKernel {
+public:
+    CommonCalcPeriodicTorsionForceKernel(std::string name, const Platform& platform, ComputeContext& cc, const System& system) : CalcPeriodicTorsionForceKernel(name, platform),
+            hasInitializedKernel(false), cc(cc), system(system) {
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the PeriodicTorsionForce this kernel will be used for
+     */
+    void initialize(const System& system, const PeriodicTorsionForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the PeriodicTorsionForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const PeriodicTorsionForce& force);
+private:
+    class ForceInfo;
+    int numTorsions;
+    bool hasInitializedKernel;
+    ComputeContext& cc;
+    ForceInfo* info;
+    const System& system;
+    ComputeArray params;
+};
+
+/**
+ * This kernel is invoked by RBTorsionForce to calculate the forces acting on the system and the energy of the system.
+ */
+class CommonCalcRBTorsionForceKernel : public CalcRBTorsionForceKernel {
+public:
+    CommonCalcRBTorsionForceKernel(std::string name, const Platform& platform, ComputeContext& cc, const System& system) : CalcRBTorsionForceKernel(name, platform),
+            hasInitializedKernel(false), cc(cc), system(system) {
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the RBTorsionForce this kernel will be used for
+     */
+    void initialize(const System& system, const RBTorsionForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the RBTorsionForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const RBTorsionForce& force);
+private:
+    class ForceInfo;
+    int numTorsions;
+    bool hasInitializedKernel;
+    ComputeContext& cc;
+    ForceInfo* info;
+    const System& system;
+    ComputeArray params1;
+    ComputeArray params2;
+};
+
+/**
+ * This kernel is invoked by CustomTorsionForce to calculate the forces acting on the system and the energy of the system.
+ */
+class CommonCalcCustomTorsionForceKernel : public CalcCustomTorsionForceKernel {
+public:
+    CommonCalcCustomTorsionForceKernel(std::string name, const Platform& platform, ComputeContext& cc, const System& system) : CalcCustomTorsionForceKernel(name, platform),
+            hasInitializedKernel(false), cc(cc), system(system), params(NULL) {
+    }
+    ~CommonCalcCustomTorsionForceKernel();
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CustomTorsionForce this kernel will be used for
+     */
+    void initialize(const System& system, const CustomTorsionForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CustomTorsionForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CustomTorsionForce& force);
+private:
+    class ForceInfo;
+    int numTorsions;
+    bool hasInitializedKernel;
+    ComputeContext& cc;
+    ForceInfo* info;
+    const System& system;
+    ComputeParameterSet* params;
+    ComputeArray globals;
+    std::vector<std::string> globalParamNames;
+    std::vector<float> globalParamValues;
+};
+
+/**
+ * This kernel is invoked by CMAPTorsionForce to calculate the forces acting on the system and the energy of the system.
+ */
+class CommonCalcCMAPTorsionForceKernel : public CalcCMAPTorsionForceKernel {
+public:
+    CommonCalcCMAPTorsionForceKernel(std::string name, const Platform& platform, ComputeContext& cc, const System& system) : CalcCMAPTorsionForceKernel(name, platform),
+            hasInitializedKernel(false), cc(cc), system(system) {
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CMAPTorsionForce this kernel will be used for
+     */
+    void initialize(const System& system, const CMAPTorsionForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CMAPTorsionForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CMAPTorsionForce& force);
+private:
+    class ForceInfo;
+    int numTorsions;
+    bool hasInitializedKernel;
+    ComputeContext& cc;
+    ForceInfo* info;
+    const System& system;
+    std::vector<mm_int2> mapPositionsVec;
+    ComputeArray coefficients;
+    ComputeArray mapPositions;
+    ComputeArray torsionMaps;
+};
+
+/**
+ * This kernel is invoked by CustomExternalForce to calculate the forces acting on the system and the energy of the system.
+ */
+class CommonCalcCustomExternalForceKernel : public CalcCustomExternalForceKernel {
+public:
+    CommonCalcCustomExternalForceKernel(std::string name, const Platform& platform, ComputeContext& cc, const System& system) : CalcCustomExternalForceKernel(name, platform),
+            hasInitializedKernel(false), cc(cc), system(system), params(NULL) {
+    }
+    ~CommonCalcCustomExternalForceKernel();
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CustomExternalForce this kernel will be used for
+     */
+    void initialize(const System& system, const CustomExternalForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CustomExternalForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CustomExternalForce& force);
+private:
+    class ForceInfo;
+    int numParticles;
+    bool hasInitializedKernel;
+    ComputeContext& cc;
+    ForceInfo* info;
+    const System& system;
+    ComputeParameterSet* params;
+    ComputeArray globals;
+    std::vector<std::string> globalParamNames;
+    std::vector<float> globalParamValues;
+};
+
+/**
+ * This kernel is invoked by CustomCompoundBondForce to calculate the forces acting on the system and the energy of the system.
+ */
+class CommonCalcCustomCompoundBondForceKernel : public CalcCustomCompoundBondForceKernel {
+public:
+    CommonCalcCustomCompoundBondForceKernel(std::string name, const Platform& platform, ComputeContext& cc, const System& system) : CalcCustomCompoundBondForceKernel(name, platform),
+            cc(cc), params(NULL), system(system) {
+    }
+    ~CommonCalcCustomCompoundBondForceKernel();
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CustomCompoundBondForce this kernel will be used for
+     */
+    void initialize(const System& system, const CustomCompoundBondForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CustomCompoundBondForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CustomCompoundBondForce& force);
+
+private:
+    class ForceInfo;
+    int numBonds;
+    ComputeContext& cc;
+    ForceInfo* info;
+    ComputeParameterSet* params;
+    ComputeArray globals;
+    std::vector<std::string> globalParamNames;
+    std::vector<float> globalParamValues;
+    std::vector<ComputeArray> tabulatedFunctions;
+    const System& system;
+};
+
+/**
+ * This kernel is invoked by CustomCentroidBondForce to calculate the forces acting on the system and the energy of the system.
+ */
+class CommonCalcCustomCentroidBondForceKernel : public CalcCustomCentroidBondForceKernel {
+public:
+    CommonCalcCustomCentroidBondForceKernel(std::string name, const Platform& platform, ComputeContext& cc, const System& system) : CalcCustomCentroidBondForceKernel(name, platform),
+            cc(cc), params(NULL), system(system) {
+    }
+    ~CommonCalcCustomCentroidBondForceKernel();
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CustomCentroidBondForce this kernel will be used for
+     */
+    void initialize(const System& system, const CustomCentroidBondForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CustomCentroidBondForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CustomCentroidBondForce& force);
+
+private:
+    class ForceInfo;
+    int numGroups, numBonds;
+    bool needEnergyParamDerivs;
+    ComputeContext& cc;
+    ForceInfo* info;
+    ComputeParameterSet* params;
+    ComputeArray globals, groupParticles, groupWeights, groupOffsets;
+    ComputeArray groupForces, bondGroups, centerPositions;
+    std::vector<std::string> globalParamNames;
+    std::vector<float> globalParamValues;
+    std::vector<ComputeArray> tabulatedFunctions;
+    std::vector<void*> groupForcesArgs;
+    ComputeKernel computeCentersKernel, groupForcesKernel, applyForcesKernel;
+    const System& system;
+};
+
+/**
+ * This kernel is invoked by CustomNonbondedForce to calculate the forces acting on the system.
+ */
+class CommonCalcCustomNonbondedForceKernel : public CalcCustomNonbondedForceKernel {
+public:
+    CommonCalcCustomNonbondedForceKernel(std::string name, const Platform& platform, ComputeContext& cc, const System& system) : CalcCustomNonbondedForceKernel(name, platform),
+            cc(cc), params(NULL), forceCopy(NULL), system(system), hasInitializedKernel(false) {
+    }
+    ~CommonCalcCustomNonbondedForceKernel();
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CustomNonbondedForce this kernel will be used for
+     */
+    void initialize(const System& system, const CustomNonbondedForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CustomNonbondedForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CustomNonbondedForce& force);
+private:
+    class ForceInfo;
+    void initInteractionGroups(const CustomNonbondedForce& force, const std::string& interactionSource, const std::vector<std::string>& tableTypes);
+    ComputeContext& cc;
+    ForceInfo* info;
+    ComputeParameterSet* params;
+    ComputeArray globals, interactionGroupData, filteredGroupData, numGroupTiles;
+    ComputeKernel interactionGroupKernel, prepareNeighborListKernel, buildNeighborListKernel;
+    std::vector<void*> interactionGroupArgs;
+    std::vector<std::string> globalParamNames;
+    std::vector<float> globalParamValues;
+    std::vector<ComputeArray> tabulatedFunctions;
+    double longRangeCoefficient;
+    std::vector<double> longRangeCoefficientDerivs;
+    bool hasInitializedLongRangeCorrection, hasInitializedKernel, hasParamDerivs, useNeighborList;
+    int numGroupThreadBlocks;
+    CustomNonbondedForce* forceCopy;
+    const System& system;
+};
+
+/**
+ * This kernel is invoked by GBSAOBCForce to calculate the forces acting on the system.
+ */
+class CommonCalcGBSAOBCForceKernel : public CalcGBSAOBCForceKernel {
+public:
+    CommonCalcGBSAOBCForceKernel(std::string name, const Platform& platform, ComputeContext& cc) : CalcGBSAOBCForceKernel(name, platform), cc(cc),
+            hasCreatedKernels(false) {
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the GBSAOBCForce this kernel will be used for
+     */
+    void initialize(const System& system, const GBSAOBCForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the GBSAOBCForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const GBSAOBCForce& force);
+private:
+    class ForceInfo;
+    double prefactor, surfaceAreaFactor, cutoff;
+    bool hasCreatedKernels;
+    int maxTiles;
+    ComputeContext& cc;
+    ForceInfo* info;
+    ComputeArray params, charges, bornSum, bornRadii, bornForce, obcChain;
+    ComputeKernel computeBornSumKernel, reduceBornSumKernel, force1Kernel, reduceBornForceKernel;
+};
+
+/**
+ * This kernel is invoked by CustomGBForce to calculate the forces acting on the system.
+ */
+class CommonCalcCustomGBForceKernel : public CalcCustomGBForceKernel {
+public:
+    CommonCalcCustomGBForceKernel(std::string name, const Platform& platform, ComputeContext& cc, const System& system) : CalcCustomGBForceKernel(name, platform),
+            hasInitializedKernels(false), cc(cc), params(NULL), computedValues(NULL), energyDerivs(NULL), energyDerivChain(NULL), system(system) {
+    }
+    ~CommonCalcCustomGBForceKernel();
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CustomGBForce this kernel will be used for
+     */
+    void initialize(const System& system, const CustomGBForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CustomGBForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CustomGBForce& force);
+private:
+    class ForceInfo;
+    double cutoff;
+    bool hasInitializedKernels, needParameterGradient, needEnergyParamDerivs;
+    int maxTiles, numComputedValues;
+    ComputeContext& cc;
+    ForceInfo* info;
+    ComputeParameterSet* params;
+    ComputeParameterSet* computedValues;
+    ComputeParameterSet* energyDerivs;
+    ComputeParameterSet* energyDerivChain;
+    std::vector<ComputeParameterSet*> dValuedParam;
+    std::vector<ComputeArray> dValue0dParam;
+    ComputeArray longEnergyDerivs, globals, valueBuffers;
+    std::vector<std::string> globalParamNames;
+    std::vector<float> globalParamValues;
+    std::vector<ComputeArray> tabulatedFunctions;
+    std::vector<bool> pairValueUsesParam, pairEnergyUsesParam, pairEnergyUsesValue;
+    const System& system;
+    ComputeKernel pairValueKernel, perParticleValueKernel, pairEnergyKernel, perParticleEnergyKernel, gradientChainRuleKernel;
+    std::string pairValueSrc, pairEnergySrc;
+    std::map<std::string, std::string> pairValueDefines, pairEnergyDefines;
+};
+
+/**
+ * This kernel is invoked by CustomHbondForce to calculate the forces acting on the system.
+ */
+class CommonCalcCustomHbondForceKernel : public CalcCustomHbondForceKernel {
+public:
+    CommonCalcCustomHbondForceKernel(std::string name, const Platform& platform, ComputeContext& cc, const System& system) : CalcCustomHbondForceKernel(name, platform),
+            hasInitializedKernel(false), cc(cc), donorParams(NULL), acceptorParams(NULL), system(system) {
+    }
+    ~CommonCalcCustomHbondForceKernel();
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CustomHbondForce this kernel will be used for
+     */
+    void initialize(const System& system, const CustomHbondForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CustomHbondForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CustomHbondForce& force);
+private:
+    class ForceInfo;
+    int numDonors, numAcceptors;
+    bool hasInitializedKernel;
+    ComputeContext& cc;
+    ForceInfo* info;
+    ComputeParameterSet* donorParams;
+    ComputeParameterSet* acceptorParams;
+    ComputeArray globals;
+    ComputeArray donors;
+    ComputeArray acceptors;
+    ComputeArray donorBufferIndices;
+    ComputeArray acceptorBufferIndices;
+    ComputeArray donorExclusions;
+    ComputeArray acceptorExclusions;
+    std::vector<std::string> globalParamNames;
+    std::vector<float> globalParamValues;
+    std::vector<ComputeArray> tabulatedFunctions;
+    const System& system;
+    ComputeKernel donorKernel, acceptorKernel;
+};
+
+/**
+ * This kernel is invoked by CustomManyParticleForce to calculate the forces acting on the system.
+ */
+class CommonCalcCustomManyParticleForceKernel : public CalcCustomManyParticleForceKernel {
+public:
+    CommonCalcCustomManyParticleForceKernel(std::string name, const Platform& platform, ComputeContext& cc, const System& system) : CalcCustomManyParticleForceKernel(name, platform),
+            hasInitializedKernel(false), cc(cc), params(NULL), system(system) {
+    }
+    ~CommonCalcCustomManyParticleForceKernel();
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CustomManyParticleForce this kernel will be used for
+     */
+    void initialize(const System& system, const CustomManyParticleForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CustomManyParticleForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CustomManyParticleForce& force);
+
+private:
+    class ForceInfo;
+    ComputeContext& cc;
+    ForceInfo* info;
+    bool hasInitializedKernel;
+    NonbondedMethod nonbondedMethod;
+    int maxNeighborPairs, forceWorkgroupSize, findNeighborsWorkgroupSize;
+    ComputeParameterSet* params;
+    ComputeArray globals, particleTypes, orderIndex, particleOrder;
+    ComputeArray exclusions, exclusionStartIndex, blockCenter, blockBoundingBox;
+    ComputeArray neighborPairs, numNeighborPairs, neighborStartIndex, numNeighborsForAtom, neighbors;
+    std::vector<std::string> globalParamNames;
+    std::vector<float> globalParamValues;
+    std::vector<ComputeArray> tabulatedFunctions;
+    const System& system;
+    ComputeKernel forceKernel, blockBoundsKernel, neighborsKernel, startIndicesKernel, copyPairsKernel;
+    ComputeEvent event;
+};
+
+/**
+ * This kernel is invoked by GayBerneForce to calculate the forces acting on the system.
+ */
+class CommonCalcGayBerneForceKernel : public CalcGayBerneForceKernel {
+public:
+    CommonCalcGayBerneForceKernel(std::string name, const Platform& platform, ComputeContext& cc) : CalcGayBerneForceKernel(name, platform), cc(cc),
+            hasInitializedKernels(false) {
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the GayBerneForce this kernel will be used for
+     */
+    void initialize(const System& system, const GayBerneForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the GayBerneForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const GayBerneForce& force);
+private:
+    class ForceInfo;
+    class ReorderListener;
+    void sortAtoms();
+    ComputeContext& cc;
+    ForceInfo* info;
+    bool hasInitializedKernels;
+    int numRealParticles, maxNeighborBlocks;
+    GayBerneForce::NonbondedMethod nonbondedMethod;
+    ComputeArray sortedParticles, axisParticleIndices, sigParams, epsParams;
+    ComputeArray scale, exceptionParticles, exceptionParams;
+    ComputeArray aMatrix, bMatrix, gMatrix;
+    ComputeArray exclusions, exclusionStartIndex, blockCenter, blockBoundingBox;
+    ComputeArray neighbors, neighborIndex, neighborBlockCount;
+    ComputeArray sortedPos, torque;
+    std::vector<bool> isRealParticle;
+    std::vector<std::pair<int, int> > exceptionAtoms;
+    std::vector<std::pair<int, int> > excludedPairs;
+    ComputeKernel framesKernel, blockBoundsKernel, neighborsKernel, forceKernel, torqueKernel;
+    ComputeEvent event;
+};
+
+/**
+ * This kernel is invoked by VerletIntegrator to take one time step.
+ */
+class CommonIntegrateVerletStepKernel : public IntegrateVerletStepKernel {
+public:
+    CommonIntegrateVerletStepKernel(std::string name, const Platform& platform, ComputeContext& cc) : IntegrateVerletStepKernel(name, platform), cc(cc),
+            hasInitializedKernels(false) {
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param integrator the VerletIntegrator this kernel will be used for
+     */
+    void initialize(const System& system, const VerletIntegrator& integrator);
+    /**
+     * Execute the kernel.
+     *
+     * @param context    the context in which to execute this kernel
+     * @param integrator the VerletIntegrator this kernel is being used for
+     */
+    void execute(ContextImpl& context, const VerletIntegrator& integrator);
+    /**
+     * Compute the kinetic energy.
+     * 
+     * @param context    the context in which to execute this kernel
+     * @param integrator the VerletIntegrator this kernel is being used for
+     */
+    double computeKineticEnergy(ContextImpl& context, const VerletIntegrator& integrator);
+private:
+    ComputeContext& cc;
+    bool hasInitializedKernels;
+    ComputeKernel kernel1, kernel2;
+};
+
+/**
+ * This kernel is invoked by LangevinIntegrator to take one time step.
+ */
+class CommonIntegrateLangevinStepKernel : public IntegrateLangevinStepKernel {
+public:
+    CommonIntegrateLangevinStepKernel(std::string name, const Platform& platform, ComputeContext& cc) : IntegrateLangevinStepKernel(name, platform), cc(cc),
+            hasInitializedKernels(false) {
+    }
+    /**
+     * Initialize the kernel, setting up the particle masses.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param integrator the LangevinIntegrator this kernel will be used for
+     */
+    void initialize(const System& system, const LangevinIntegrator& integrator);
+    /**
+     * Execute the kernel.
+     *
+     * @param context    the context in which to execute this kernel
+     * @param integrator the LangevinIntegrator this kernel is being used for
+     */
+    void execute(ContextImpl& context, const LangevinIntegrator& integrator);
+    /**
+     * Compute the kinetic energy.
+     * 
+     * @param context    the context in which to execute this kernel
+     * @param integrator the LangevinIntegrator this kernel is being used for
+     */
+    double computeKineticEnergy(ContextImpl& context, const LangevinIntegrator& integrator);
+private:
+    ComputeContext& cc;
+    double prevTemp, prevFriction, prevStepSize;
+    bool hasInitializedKernels;
+    ComputeArray params;
+    ComputeKernel kernel1, kernel2;
+};
+
+/**
+ * This kernel is invoked by BAOABLangevinIntegrator to take one time step.
+ */
+class CommonIntegrateBAOABStepKernel : public IntegrateBAOABStepKernel {
+public:
+    CommonIntegrateBAOABStepKernel(std::string name, const Platform& platform, ComputeContext& cc) : IntegrateBAOABStepKernel(name, platform), cc(cc),
+            hasInitializedKernels(false) {
+    }
+    /**
+     * Initialize the kernel, setting up the particle masses.
+     * 
+     * @param system     the System this kernel will be applied to
+     * @param integrator the BAOABLangevinIntegrator this kernel will be used for
+     */
+    void initialize(const System& system, const BAOABLangevinIntegrator& integrator);
+    /**
+     * Execute the kernel.
+     * 
+     * @param context    the context in which to execute this kernel
+     * @param integrator the BAOABLangevinIntegrator this kernel is being used for
+     * @param forcesAreValid if the context has been modified since the last time step, this will be
+     *                       false to show that cached forces are invalid and must be recalculated.
+     *                       On exit, this should specify whether the cached forces are valid at the
+     *                       end of the step.
+     */
+    void execute(ContextImpl& context, const BAOABLangevinIntegrator& integrator, bool& forcesAreValid);
+    /**
+     * Compute the kinetic energy.
+     * 
+     * @param context    the context in which to execute this kernel
+     * @param integrator the BAOABLangevinIntegrator this kernel is being used for
+     */
+    double computeKineticEnergy(ContextImpl& context, const BAOABLangevinIntegrator& integrator);
+private:
+    ComputeContext& cc;
+    double prevTemp, prevFriction, prevStepSize;
+    bool hasInitializedKernels;
+    ComputeArray params, oldDelta;
+    ComputeKernel kernel1, kernel2, kernel3, kernel4;
+};
+
+/**
+ * This kernel is invoked by BrownianIntegrator to take one time step.
+ */
+class CommonIntegrateBrownianStepKernel : public IntegrateBrownianStepKernel {
+public:
+    CommonIntegrateBrownianStepKernel(std::string name, const Platform& platform, ComputeContext& cc) : IntegrateBrownianStepKernel(name, platform), cc(cc),
+            hasInitializedKernels(false), prevTemp(-1), prevFriction(-1), prevStepSize(-1) {
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param integrator the BrownianIntegrator this kernel will be used for
+     */
+    void initialize(const System& system, const BrownianIntegrator& integrator);
+    /**
+     * Execute the kernel.
+     *
+     * @param context    the context in which to execute this kernel
+     * @param integrator the BrownianIntegrator this kernel is being used for
+     */
+    void execute(ContextImpl& context, const BrownianIntegrator& integrator);
+    /**
+     * Compute the kinetic energy.
+     * 
+     * @param context    the context in which to execute this kernel
+     * @param integrator the BrownianIntegrator this kernel is being used for
+     */
+    double computeKineticEnergy(ContextImpl& context, const BrownianIntegrator& integrator);
+private:
+    ComputeContext& cc;
+    double prevTemp, prevFriction, prevStepSize;
+    bool hasInitializedKernels;
+    ComputeKernel kernel1, kernel2;
+};
+
+/**
+ * This kernel is invoked by VariableVerletIntegrator to take one time step.
+ */
+class CommonIntegrateVariableVerletStepKernel : public IntegrateVariableVerletStepKernel {
+public:
+    CommonIntegrateVariableVerletStepKernel(std::string name, const Platform& platform, ComputeContext& cc) : IntegrateVariableVerletStepKernel(name, platform), cc(cc),
+            hasInitializedKernels(false) {
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param integrator the VariableVerletIntegrator this kernel will be used for
+     */
+    void initialize(const System& system, const VariableVerletIntegrator& integrator);
+    /**
+     * Execute the kernel.
+     *
+     * @param context    the context in which to execute this kernel
+     * @param integrator the VariableVerletIntegrator this kernel is being used for
+     * @param maxTime    the maximum time beyond which the simulation should not be advanced
+     * @return the size of the step that was taken
+     */
+    double execute(ContextImpl& context, const VariableVerletIntegrator& integrator, double maxTime);
+    /**
+     * Compute the kinetic energy.
+     * 
+     * @param context    the context in which to execute this kernel
+     * @param integrator the VariableVerletIntegrator this kernel is being used for
+     */
+    double computeKineticEnergy(ContextImpl& context, const VariableVerletIntegrator& integrator);
+private:
+    ComputeContext& cc;
+    bool hasInitializedKernels;
+    int blockSize;
+    ComputeKernel kernel1, kernel2, selectSizeKernel;
+};
+
+/**
+ * This kernel is invoked by VariableLangevinIntegrator to take one time step.
+ */
+class CommonIntegrateVariableLangevinStepKernel : public IntegrateVariableLangevinStepKernel {
+public:
+    CommonIntegrateVariableLangevinStepKernel(std::string name, const Platform& platform, ComputeContext& cc) : IntegrateVariableLangevinStepKernel(name, platform), cc(cc),
+            hasInitializedKernels(false) {
+    }
+    /**
+     * Initialize the kernel, setting up the particle masses.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param integrator the VariableLangevinIntegrator this kernel will be used for
+     */
+    void initialize(const System& system, const VariableLangevinIntegrator& integrator);
+    /**
+     * Execute the kernel.
+     *
+     * @param context    the context in which to execute this kernel
+     * @param integrator the VariableLangevinIntegrator this kernel is being used for
+     * @param maxTime    the maximum time beyond which the simulation should not be advanced
+     * @return the size of the step that was taken
+     */
+    double execute(ContextImpl& context, const VariableLangevinIntegrator& integrator, double maxTime);
+    /**
+     * Compute the kinetic energy.
+     * 
+     * @param context    the context in which to execute this kernel
+     * @param integrator the VariableLangevinIntegrator this kernel is being used for
+     */
+    double computeKineticEnergy(ContextImpl& context, const VariableLangevinIntegrator& integrator);
+private:
+    ComputeContext& cc;
+    bool hasInitializedKernels;
+    int blockSize;
+    ComputeArray params;
+    ComputeKernel kernel1, kernel2, selectSizeKernel;
+    double prevTemp, prevFriction, prevErrorTol;
+};
+
+/**
+ * This kernel is invoked by CustomIntegrator to take one time step.
+ */
+class CommonIntegrateCustomStepKernel : public IntegrateCustomStepKernel {
+public:
+    enum GlobalTargetType {DT, VARIABLE, PARAMETER};
+    CommonIntegrateCustomStepKernel(std::string name, const Platform& platform, ComputeContext& cc) : IntegrateCustomStepKernel(name, platform), cc(cc),
+            hasInitializedKernels(false), needsEnergyParamDerivs(false) {
+    }
+    /**
+     * Initialize the kernel.
+     * 
+     * @param system     the System this kernel will be applied to
+     * @param integrator the CustomIntegrator this kernel will be used for
+     */
+    void initialize(const System& system, const CustomIntegrator& integrator);
+    /**
+     * Execute the kernel.
+     * 
+     * @param context    the context in which to execute this kernel
+     * @param integrator the CustomIntegrator this kernel is being used for
+     * @param forcesAreValid if the context has been modified since the last time step, this will be
+     *                       false to show that cached forces are invalid and must be recalculated.
+     *                       On exit, this should specify whether the cached forces are valid at the
+     *                       end of the step.
+     */
+    void execute(ContextImpl& context, CustomIntegrator& integrator, bool& forcesAreValid);
+    /**
+     * Compute the kinetic energy.
+     * 
+     * @param context    the context in which to execute this kernel
+     * @param integrator the CustomIntegrator this kernel is being used for
+     * @param forcesAreValid if the context has been modified since the last time step, this will be
+     *                       false to show that cached forces are invalid and must be recalculated.
+     *                       On exit, this should specify whether the cached forces are valid at the
+     *                       end of the step.
+     */
+    double computeKineticEnergy(ContextImpl& context, CustomIntegrator& integrator, bool& forcesAreValid);
+    /**
+     * Get the values of all global variables.
+     *
+     * @param context   the context in which to execute this kernel
+     * @param values    on exit, this contains the values
+     */
+    void getGlobalVariables(ContextImpl& context, std::vector<double>& values) const;
+    /**
+     * Set the values of all global variables.
+     *
+     * @param context   the context in which to execute this kernel
+     * @param values    a vector containing the values
+     */
+    void setGlobalVariables(ContextImpl& context, const std::vector<double>& values);
+    /**
+     * Get the values of a per-DOF variable.
+     *
+     * @param context   the context in which to execute this kernel
+     * @param variable  the index of the variable to get
+     * @param values    on exit, this contains the values
+     */
+    void getPerDofVariable(ContextImpl& context, int variable, std::vector<Vec3>& values) const;
+    /**
+     * Set the values of a per-DOF variable.
+     *
+     * @param context   the context in which to execute this kernel
+     * @param variable  the index of the variable to set
+     * @param values    a vector containing the values
+     */
+    void setPerDofVariable(ContextImpl& context, int variable, const std::vector<Vec3>& values);
+private:
+    class ReorderListener;
+    class GlobalTarget;
+    class DerivFunction;
+    std::string createPerDofComputation(const std::string& variable, const Lepton::ParsedExpression& expr, CustomIntegrator& integrator,
+        const std::string& forceName, const std::string& energyName, std::vector<const TabulatedFunction*>& functions,
+        std::vector<std::pair<std::string, std::string> >& functionNames);
+    void prepareForComputation(ContextImpl& context, CustomIntegrator& integrator, bool& forcesAreValid);
+    Lepton::ExpressionTreeNode replaceDerivFunctions(const Lepton::ExpressionTreeNode& node, OpenMM::ContextImpl& context);
+    void findExpressionsForDerivs(const Lepton::ExpressionTreeNode& node, std::vector<std::pair<Lepton::ExpressionTreeNode, std::string> >& variableNodes);
+    void recordGlobalValue(double value, GlobalTarget target, CustomIntegrator& integrator);
+    void recordChangedParameters(ContextImpl& context);
+    bool evaluateCondition(int step);
+    ComputeContext& cc;
+    double energy;
+    float energyFloat;
+    int numGlobalVariables, sumWorkGroupSize;
+    bool hasInitializedKernels, deviceGlobalsAreCurrent, modifiesParameters, hasAnyConstraints, needsEnergyParamDerivs;
+    std::vector<bool> deviceValuesAreCurrent;
+    mutable std::vector<bool> localValuesAreCurrent;
+    ComputeArray globalValues, sumBuffer, summedValue;
+    ComputeArray uniformRandoms, randomSeed, perDofEnergyParamDerivs;
+    std::vector<ComputeArray> tabulatedFunctions, perDofValues;
+    std::map<int, double> savedEnergy;
+    std::map<int, ComputeArray> savedForces;
+    std::set<int> validSavedForces;
+    mutable std::vector<std::vector<mm_float4> > localPerDofValuesFloat;
+    mutable std::vector<std::vector<mm_double4> > localPerDofValuesDouble;
+    std::map<std::string, double> energyParamDerivs;
+    std::vector<std::string> perDofEnergyParamDerivNames;
+    std::vector<double> localPerDofEnergyParamDerivs;
+    std::vector<double> localGlobalValues;
+    std::vector<double> initialGlobalVariables;
+    std::vector<std::vector<ComputeKernel> > kernels;
+    ComputeKernel randomKernel, kineticEnergyKernel, sumKineticEnergyKernel;
+    std::vector<CustomIntegrator::ComputationType> stepType;
+    std::vector<CustomIntegratorUtilities::Comparison> comparisons;
+    std::vector<std::vector<Lepton::CompiledExpression> > globalExpressions;
+    CompiledExpressionSet expressionSet;
+    std::vector<bool> needsGlobals, needsForces, needsEnergy;
+    std::vector<bool> computeBothForceAndEnergy, invalidatesForces, merged;
+    std::vector<int> forceGroupFlags, blockEnd, requiredGaussian, requiredUniform;
+    std::vector<int> stepEnergyVariableIndex, globalVariableIndex, parameterVariableIndex;
+    int gaussianVariableIndex, uniformVariableIndex, dtVariableIndex;
+    std::vector<std::string> parameterNames;
+    std::vector<GlobalTarget> stepTarget;
+};
+
+class CommonIntegrateCustomStepKernel::GlobalTarget {
+public:
+    CommonIntegrateCustomStepKernel::GlobalTargetType type;
+    int variableIndex;
+    GlobalTarget() {
+    }
+    GlobalTarget(CommonIntegrateCustomStepKernel::GlobalTargetType type, int variableIndex) : type(type), variableIndex(variableIndex) {
+    }
+};
+
+/**
+ * This kernel is invoked to remove center of mass motion from the system.
+ */
+class CommonRemoveCMMotionKernel : public RemoveCMMotionKernel {
+public:
+    CommonRemoveCMMotionKernel(std::string name, const Platform& platform, ComputeContext& cc) : RemoveCMMotionKernel(name, platform), cc(cc) {
+    }
+    /**
+     * Initialize the kernel, setting up the particle masses.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CMMotionRemover this kernel will be used for
+     */
+    void initialize(const System& system, const CMMotionRemover& force);
+    /**
+     * Execute the kernel.
+     *
+     * @param context    the context in which to execute this kernel
+     */
+    void execute(ContextImpl& context);
+private:
+    ComputeContext& cc;
+    int frequency;
+    ComputeArray cmMomentum;
+    ComputeKernel kernel1, kernel2;
+};
+
+/**
+ * This kernel is invoked by RMSDForce to calculate the forces acting on the system and the energy of the system.
+ */
+class CommonCalcRMSDForceKernel : public CalcRMSDForceKernel {
+public:
+    CommonCalcRMSDForceKernel(std::string name, const Platform& platform, ComputeContext& cc) : CalcRMSDForceKernel(name, platform), cc(cc) {
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the RMSDForce this kernel will be used for
+     */
+    void initialize(const System& system, const RMSDForce& force);
+    /**
+     * Record the reference positions and particle indices.
+     */
+    void recordParameters(const RMSDForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * This is the internal implementation of execute(), templatized on whether we're
+     * using single or double precision.
+     */
+    template <class REAL>
+    double executeImpl(ContextImpl& context);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the RMSDForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const RMSDForce& force);
+private:
+    class ForceInfo;
+    ComputeContext& cc;
+    ForceInfo* info;
+    int blockSize;
+    double sumNormRef;
+    ComputeArray referencePos, particles, buffer;
+    ComputeKernel kernel1, kernel2;
+};
+
+/**
+ * This kernel is invoked by AndersenThermostat at the start of each time step to adjust the particle velocities.
+ */
+class CommonApplyAndersenThermostatKernel : public ApplyAndersenThermostatKernel {
+public:
+    CommonApplyAndersenThermostatKernel(std::string name, const Platform& platform, ComputeContext& cc) : ApplyAndersenThermostatKernel(name, platform), cc(cc) {
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param thermostat the AndersenThermostat this kernel will be used for
+     */
+    void initialize(const System& system, const AndersenThermostat& thermostat);
+    /**
+     * Execute the kernel.
+     *
+     * @param context    the context in which to execute this kernel
+     */
+    void execute(ContextImpl& context);
+private:
+    ComputeContext& cc;
+    int randomSeed;
+    ComputeArray atomGroups;
+    ComputeKernel kernel;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMMONKERNELS_H_*/
--- a/platforms/common/include/openmm/common/ComputeArray.h
+++ b/platforms/common/include/openmm/common/ComputeArray.h
+#ifndef OPENMM_COMPUTEARRAY_H_
+#define OPENMM_COMPUTEARRAY_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/ArrayInterface.h"
+
+namespace OpenMM {
+
+/**
+ * This is an implementation of ArrayInterface that acts as a wrapper around a platform-specific
+ * array implementation (typically CudaArray or OpenCLArray).  This class can be used in code that
+ * is not platform-specific, and an appropriate implementation array is created automatically
+ * based on the ComputeContext.
+ */
+
+class OPENMM_EXPORT_COMMON ComputeArray : public ArrayInterface {
+public:
+    /**
+     * Create an uninitialized ComputeArray object.  It cannot be used until initialize() is called on it.
+     */
+    ComputeArray();
+    /**
+     * Release all resources allocated by this object.
+     */
+    ~ComputeArray();
+    /**
+     * Get the internal array this object is wrapping.
+     */
+    ArrayInterface& getArray();
+    /**
+     * Initialize this array.
+     *
+     * @param context           the context for which to create the array
+     * @param size              the number of elements in the array
+     * @param elementSize       the size of each element in bytes
+     * @param name              the name of the array
+     */
+    void initialize(ComputeContext& context, int size, int elementSize, const std::string& name);
+    /**
+     * Initialize this object.  The template argument is the data type of each array element.
+     *
+     * @param context           the context for which to create the array
+     * @param size              the number of elements in the array
+     * @param name              the name of the array
+     */
+    template <class T>
+    void initialize(ComputeContext& context, int size, const std::string& name) {
+        initialize(context, size, sizeof(T), name);
+    }
+    /**
+     * Recreate the internal storage to have a different size.
+     */
+    void resize(int size);
+    /**
+     * Get whether this array has been initialized.
+     */
+    bool isInitialized() const;
+    /**
+     * Get the number of elements in the array.
+     */
+    int getSize() const;
+    /**
+     * Get the size of each element in bytes.
+     */
+    int getElementSize() const;
+    /**
+     * Get the name of the array.
+     */
+    const std::string& getName() const;
+    /**
+     * Get the context this array belongs to.
+     */
+    ComputeContext& getContext();
+    /**
+     * Copy the values in a vector to the array.
+     */
+    template <class T>
+    void upload(const std::vector<T>& data, bool convert=false) {
+        ArrayInterface::upload(data, convert);
+    }
+    /**
+     * Copy the values in the array to a vector.
+     */
+    template <class T>
+    void download(std::vector<T>& data) const {
+        ArrayInterface::download(data);
+    }
+    /**
+     * Copy the values from host memory to the array.
+     * 
+     * @param data     the data to copy
+     * @param blocking if true, this call will block until the transfer is complete.  Subclasses often
+     *                 have restrictions on non-blocking copies, such as that the source data must be
+     *                 in page-locked memory.
+     */
+    void upload(const void* data, bool blocking=true);
+    /**
+     * Copy the values in the array to host memory.
+     * 
+     * @param data     the destination to copy the values to
+     * @param blocking if true, this call will block until the transfer is complete.  Subclasses often
+     *                 have restrictions on non-blocking copies, such as that the destination must be
+     *                 in page-locked memory.
+     */
+    void download(void* data, bool blocking=true) const;
+    /**
+     * Copy the values in this array to a second array.
+     * 
+     * @param dest     the destination array to copy to
+     */
+    void copyTo(ArrayInterface& dest) const;
+private:
+    ArrayInterface* impl;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMPUTEARRAY_H_*/
--- a/platforms/common/include/openmm/common/ComputeContext.h
+++ b/platforms/common/include/openmm/common/ComputeContext.h
+#ifndef OPENMM_COMPUTECONTEXT_H_
+#define OPENMM_COMPUTECONTEXT_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#ifdef _MSC_VER
+    // Prevent Windows from defining macros that interfere with other code.
+    #define NOMINMAX
+#endif
+#include "openmm/common/ArrayInterface.h"
+#include "openmm/common/BondedUtilities.h"
+#include "openmm/common/ComputeEvent.h"
+#include "openmm/common/ComputeForceInfo.h"
+#include "openmm/common/ComputeProgram.h"
+#include "openmm/common/ComputeVectorTypes.h"
+#include "openmm/common/IntegrationUtilities.h"
+#include "openmm/common/NonbondedUtilities.h"
+#include "openmm/Vec3.h"
+#include <pthread.h>
+#include <map>
+#include <queue>
+#include <string>
+#include <vector>
+
+namespace OpenMM {
+
+class ExpressionUtilities;
+class System;
+class ThreadPool;
+
+/**
+ * This abstract class defines the interface by which platforms compile and execute
+ * kernels.  It also manages the arrays used for storing standard information, like
+ * positions and forces.
+ */
+
+class OPENMM_EXPORT_COMMON ComputeContext {
+public:
+    class WorkTask;
+    class WorkThread;
+    class ReorderListener;
+    class ForcePreComputation;
+    class ForcePostComputation;
+    ComputeContext(const System& system);
+    virtual ~ComputeContext();
+    /**
+     * Add a ComputeForceInfo to this context.  Force kernels call this during initialization
+     * to provide information about particular forces.
+     */
+    virtual void addForce(ComputeForceInfo* force);
+    /**
+     * Get all ComputeForceInfos that have been added to this context.
+     */
+    std::vector<ComputeForceInfo*>& getForceInfos() {
+        return forces;
+    }
+    /**
+     * Request that the context provide at least a particular number of force buffers.
+     * This is only meaningful for devices that do not support 64 bit atomic operations.
+     * On other devices, this will typically have no effect.  Force kernels should call
+     * this during initialization.
+     */
+    virtual void requestForceBuffers(int minBuffers) {
+    }
+    /**
+     * Set this as the current context for the calling thread.  This should be called before
+     * doing any computation when you do not know what other code has just been executing on
+     * the thread.  Platforms that rely on binding contexts to threads (such as CUDA) need to
+     * implement this.
+     */
+    virtual void setAsCurrent() {
+    }
+    /**
+     * Get the number of contexts being used for the current simulation.
+     * This is relevant when a simulation is parallelized across multiple devices.  In that case,
+     * one ComputeContext is created for each device.
+     */
+    virtual int getNumContexts() const = 0;
+    /**
+     * Get the index of this context in the list of ones being used for the current simulation.
+     * This is relevant when a simulation is parallelized across multiple devices.  In that case,
+     * one ComputeContext is created for each device.
+     */
+    virtual int getContextIndex() const = 0;
+    /**
+     * Construct an uninitialized array of the appropriate class for this platform.  The returned
+     * value should be created on the heap with the "new" operator.
+     */
+    virtual ArrayInterface* createArray() = 0;
+    /**
+     * Construct a ComputeEvent object of the appropriate class for this platform.
+     */
+    virtual ComputeEvent createEvent() = 0;
+    /**
+     * Compile source code to create a ComputeProgram.
+     *
+     * @param source             the source code of the program
+     * @param defines            a set of preprocessor definitions (name, value) to define when compiling the program
+     */
+    virtual ComputeProgram compileProgram(const std::string source, const std::map<std::string, std::string>& defines=std::map<std::string, std::string>()) = 0;
+    /**
+     * Set all elements of an array to 0.
+     */
+    virtual void clearBuffer(ArrayInterface& array) = 0;
+    /**
+     * Register an array that should be automatically cleared (all elements set to 0) at the start of each force or energy computation.
+     */
+    virtual void addAutoclearBuffer(ArrayInterface& array) = 0;
+    /**
+     * Get whether the device being used is a CPU.  In some cases, different algorithms
+     * may be more efficient on CPUs and GPUs.
+     */
+    virtual bool getIsCPU() const = 0;
+    /**
+     * Get the SIMD width of the device being used.
+     */
+    virtual int getSIMDWidth() const = 0;
+    /**
+     * Get whether the device being used supports 64 bit atomic operations on global memory.
+     */
+    virtual bool getSupports64BitGlobalAtomics() const = 0;
+    /**
+     * Get whether the device being used supports double precision math.
+     */
+    virtual bool getSupportsDoublePrecision() const = 0;
+    /**
+     * Get whether double precision is being used.
+     */
+    virtual bool getUseDoublePrecision() const = 0;
+    /**
+     * Get whether mixed precision is being used.
+     */
+    virtual bool getUseMixedPrecision() const = 0;
+    /**
+     * Get the current simulation time.
+     */
+    double getTime() {
+        return time;
+    }
+    /**
+     * Set the current simulation time.
+     */
+    void setTime(double t) {
+        time = t;
+    }
+    /**
+     * Get the number of integration steps that have been taken.
+     */
+    int getStepCount() {
+        return stepCount;
+    }
+    /**
+     * Set the number of integration steps that have been taken.
+     */
+    void setStepCount(int steps) {
+        stepCount = steps;
+    }
+    /**
+     * Get the number of times forces or energy has been computed.
+     */
+    int getComputeForceCount() {
+        return computeForceCount;
+    }
+    /**
+     * Set the number of times forces or energy has been computed.
+     */
+    void setComputeForceCount(int count) {
+        computeForceCount = count;
+    }
+    /**
+     * Get the number of time steps since the atoms were reordered.
+     */
+    int getStepsSinceReorder() const {
+        return stepsSinceReorder;
+    }
+    /**
+     * Set the number of time steps since the atoms were reordered.
+     */
+    void setStepsSinceReorder(int steps) {
+        stepsSinceReorder = steps;
+    }
+    /**
+     * Get whether atoms were reordered during the most recent force/energy computation.
+     */
+    bool getAtomsWereReordered() const {
+        return atomsWereReordered;
+    }
+    /**
+     * Set whether atoms were reordered during the most recent force/energy computation.
+     */
+    void setAtomsWereReordered(bool wereReordered) {
+        atomsWereReordered = wereReordered;
+    }
+    /**
+     * Reorder the internal arrays of atoms to try to keep spatially contiguous atoms close
+     * together in the arrays.
+     */
+    void reorderAtoms();
+    /**
+     * Add a listener that should be called whenever atoms get reordered.  The ComputeContext
+     * assumes ownership of the object, and deletes it when the context itself is deleted.
+     */
+    void addReorderListener(ReorderListener* listener);
+    /**
+     * Get the list of ReorderListeners.
+     */
+    std::vector<ReorderListener*>& getReorderListeners() {
+        return reorderListeners;
+    }
+    /**
+     * Add a pre-computation that should be called at the very start of force and energy evaluations.
+     * The ComputeContext assumes ownership of the object, and deletes it when the context itself is deleted.
+     */
+    void addPreComputation(ForcePreComputation* computation);
+    /**
+     * Get the list of ForcePreComputations.
+     */
+    std::vector<ForcePreComputation*>& getPreComputations() {
+        return preComputations;
+    }
+    /**
+     * Add a post-computation that should be called at the very end of force and energy evaluations.
+     * The ComputeContext assumes ownership of the object, and deletes it when the context itself is deleted.
+     */
+    void addPostComputation(ForcePostComputation* computation);
+    /**
+     * Get the list of ForcePostComputations.
+     */
+    std::vector<ForcePostComputation*>& getPostComputations() {
+        return postComputations;
+    }
+    /**
+     * Get the flag that marks whether the current force evaluation is valid.
+     */
+    bool getForcesValid() const {
+        return forcesValid;
+    }
+    /**
+     * Set the flag that marks whether the current force evaluation is valid.
+     */
+    void setForcesValid(bool valid) {
+        forcesValid = valid;
+    }
+    /**
+     * Get the number of atoms.
+     */
+    int getNumAtoms() const {
+        return numAtoms;
+    }
+    /**
+     * Get the number of atoms, rounded up to a multiple of TileSize.  This is the actual size of
+     * most arrays with one element per atom.
+     */
+    int getPaddedNumAtoms() const {
+        return paddedNumAtoms;
+    }
+    /**
+     * Get the standard number of thread blocks to use when executing kernels.
+     */
+    virtual int getNumThreadBlocks() const = 0;
+    /**
+     * Get the maximum number of threads in a thread block supported by this device.
+     */
+    virtual int getMaxThreadBlockSize() const = 0;
+    /**
+     * Get the array which contains the position (the xyz components) and charge (the w component) of each atom.
+     */
+    virtual ArrayInterface& getPosq() = 0;
+    /**
+     * Get the array which contains a correction to the position of each atom.  This only exists if getUseMixedPrecision() returns true.
+     */
+    virtual ArrayInterface& getPosqCorrection() = 0;
+    /**
+     * Get the array which contains the velocity (the xyz components) and inverse mass (the w component) of each atom.
+     */
+    virtual ArrayInterface& getVelm() = 0;
+    /**
+     * On devices that do not support 64 bit atomics, this returns an array containing buffers of type real4 in which
+     * forces can be accumulated.  Do not call this if getSupports64BitGlobalAtomics() returns true.  The returned value
+     * in that case is undefined, and it may throw an exception.
+     */
+    virtual ArrayInterface& getForceBuffers() = 0;
+    /**
+     * Get the array which contains a contribution to each force represented as 64 bit fixed point.
+     */
+    virtual ArrayInterface& getLongForceBuffer() = 0;
+    /**
+     * Get the array which contains the buffer in which energy is computed.
+     */
+    virtual ArrayInterface& getEnergyBuffer() = 0;
+    /**
+     * Get the array which contains the buffer in which derivatives of the energy with respect to parameters are computed.
+     */
+    virtual ArrayInterface& getEnergyParamDerivBuffer() = 0;
+    /**
+     * Get a pointer to a block of pinned memory that can be used for asynchronous transfers between host and device.
+     * This is guaranteed to be at least as large as any of the arrays returned by methods of this class.
+     * 
+     * Because this buffer is freely available to all code, care is needed to avoid conflicts.  Only access this
+     * buffer from the main thread, and make sure all transfers are complete before you invoke any other code that
+     * might make use of it.
+     */
+    virtual void* getPinnedBuffer() = 0;
+    /**
+     * Get a shared ThreadPool that code can use to parallelize operations.
+     * 
+     * Because this object is freely available to all code, care is needed to avoid conflicts.  Only use it
+     * from the main thread, and make sure all operations are complete before you invoke any other code that
+     * might make use of it.
+     */
+    virtual ThreadPool& getThreadPool() = 0;
+    /**
+     * Get the host-side vector which contains the index of each atom.
+     */
+    const std::vector<int>& getAtomIndex() const {
+        return atomIndex;
+    }
+    /**
+     * Get the array which contains the index of each atom.
+     */
+    virtual ArrayInterface& getAtomIndexArray() = 0;
+    /**
+     * Get the number of cells by which the positions are offset.
+     */
+    std::vector<mm_int4>& getPosCellOffsets() {
+        return posCellOffsets;
+    }
+    /**
+     * Replace all occurrences of a list of substrings.
+     *
+     * @param input   a string to process
+     * @param replacements a set of strings that should be replaced with new strings wherever they appear in the input string
+     * @return a new string produced by performing the replacements
+     */
+    std::string replaceStrings(const std::string& input, const std::map<std::string, std::string>& replacements) const;
+    /**
+     * Convert a number to a string in a format suitable for including in a kernel.
+     * This takes into account whether the context uses single or double precision.
+     */
+    std::string doubleToString(double value) const;
+    /**
+     * Convert a number to a string in a format suitable for including in a kernel.
+     */
+    std::string intToString(int value) const;
+    /**
+     * Get whether the periodic box is triclinic.
+     */
+    virtual bool getBoxIsTriclinic() const = 0;
+    /**
+     * Get the vectors defining the periodic box.
+     */
+    virtual void getPeriodicBoxVectors(Vec3& a, Vec3& b, Vec3& c) const = 0;
+    /**
+     * Set the vectors defining the periodic box.
+     */
+    virtual void setPeriodicBoxVectors(const Vec3& a, const Vec3& b, const Vec3& c) = 0; 
+    /**
+     * Get the IntegrationUtilities for this context.
+     */
+    virtual IntegrationUtilities& getIntegrationUtilities() = 0;
+    /**
+     * Get the ExpressionUtilities for this context.
+     */
+    virtual ExpressionUtilities& getExpressionUtilities() = 0;
+    /**
+     * Get the BondedUtilities for this context.
+     */
+    virtual BondedUtilities& getBondedUtilities() = 0;
+    /**
+     * Get the NonbondedUtilities for this context.
+     */
+    virtual NonbondedUtilities& getNonbondedUtilities() = 0;
+    /**
+     * This should be called by the Integrator from its own initialize() method.
+     * It ensures all contexts are fully initialized.
+     */
+    virtual void initializeContexts() = 0;
+    /**
+     * Get the thread used by this context for executing parallel computations.
+     */
+    WorkThread& getWorkThread() {
+        return *thread;
+    }
+    /**
+     * Get the names of all parameters with respect to which energy derivatives are computed.
+     */
+    virtual const std::vector<std::string>& getEnergyParamDerivNames() const = 0;
+    /**
+     * Get a workspace data structure used for accumulating the values of derivatives of the energy
+     * with respect to parameters.
+     */
+    virtual std::map<std::string, double>& getEnergyParamDerivWorkspace() = 0;
+    /**
+     * Register that the derivative of potential energy with respect to a context parameter
+     * will need to be calculated.  If this is called multiple times for a single parameter,
+     * it is only added to the list once.
+     * 
+     * @param param    the name of the parameter to add
+     */
+    virtual void addEnergyParameterDerivative(const std::string& param) = 0;
+    /**
+     * Mark that the current molecule definitions (and hence the atom order) may be invalid.
+     * This should be called whenever force field parameters change.  It will cause the definitions
+     * and order to be revalidated.
+     * 
+     * If you know which force has changed, calling the alternate form that takes a ComputeForceInfo
+     * is more efficient.
+     */
+    void invalidateMolecules();
+    /**
+     * Mark that the current molecule definitions from one particular force (and hence the atom order)
+     * may be invalid.  This should be called whenever force field parameters change.  It will cause the
+     * definitions and order to be revalidated.
+     */
+    bool invalidateMolecules(ComputeForceInfo* force);
+    /**
+     * Wait until all work that has been queued (kernel executions, asynchronous data transfers, etc.)
+     * has been submitted to the device.  This does not mean it has necessarily been completed.
+     * Calling this periodically may improve the responsiveness of the computer's GUI, but at the
+     * expense of reduced simulation performance.
+     */
+    virtual void flushQueue() = 0;
+protected:
+    struct Molecule;
+    struct MoleculeGroup;
+    class VirtualSiteInfo;
+    void findMoleculeGroups();
+    /**
+     * This is the internal implementation of reorderAtoms(), templatized by the numerical precision in use.
+     */
+    template <class Real, class Real4, class Mixed, class Mixed4>
+    void reorderAtomsImpl();
+    const System& system;
+    double time;
+    int numAtoms, paddedNumAtoms, stepCount, computeForceCount, stepsSinceReorder;
+    bool atomsWereReordered, forcesValid;
+    std::vector<ComputeForceInfo*> forces;
+    std::vector<Molecule> molecules;
+    std::vector<MoleculeGroup> moleculeGroups;
+    std::vector<int> atomIndex;
+    std::vector<mm_int4> posCellOffsets;
+    std::vector<ReorderListener*> reorderListeners;
+    std::vector<ForcePreComputation*> preComputations;
+    std::vector<ForcePostComputation*> postComputations;
+    WorkThread* thread;
+};
+
+struct ComputeContext::Molecule {
+    std::vector<int> atoms;
+    std::vector<int> constraints;
+    std::vector<std::vector<int> > groups;
+};
+
+struct ComputeContext::MoleculeGroup {
+    std::vector<int> atoms;
+    std::vector<int> instances;
+    std::vector<int> offsets;
+};
+
+/**
+ * This abstract class defines a task to be executed on the worker thread.
+ */
+class OPENMM_EXPORT_COMMON ComputeContext::WorkTask {
+public:
+    virtual void execute() = 0;
+    virtual ~WorkTask() {
+    }
+};
+
+class OPENMM_EXPORT_COMMON ComputeContext::WorkThread {
+public:
+    struct ThreadData;
+    WorkThread();
+    ~WorkThread();
+    /**
+     * Request that a task be executed on the worker thread.  The argument should have been allocated on the
+     * heap with the "new" operator.  After its execute() method finishes, the object will be deleted automatically.
+     */
+    void addTask(ComputeContext::WorkTask* task);
+    /**
+     * Get whether the worker thread is idle, waiting for a task to be added.
+     */
+    bool isWaiting();
+    /**
+     * Get whether the worker thread has exited.
+     */
+    bool isFinished();
+    /**
+     * Block until all tasks have finished executing and the worker thread is idle.
+     */
+    void flush();
+private:
+    std::queue<ComputeContext::WorkTask*> tasks;
+    bool waiting, finished;
+    pthread_mutex_t queueLock;
+    pthread_cond_t waitForTaskCondition, queueEmptyCondition;
+    pthread_t thread;
+};
+
+/**
+ * This abstract class defines a function to be executed whenever atoms get reordered.
+ * Objects that need to know when reordering happens should create a ReorderListener
+ * and register it by calling addReorderListener().
+ */
+class OPENMM_EXPORT_COMMON ComputeContext::ReorderListener {
+public:
+    virtual void execute() = 0;
+    virtual ~ReorderListener() {
+    }
+};
+
+/**
+ * This abstract class defines a function to be executed at the very beginning of force and
+ * energy evaluation, before any other calculation has been done.  It is useful for operations
+ * that need to be performed at a nonstandard point in the process.  After creating a
+ * ForcePreComputation, register it by calling addPreComputation().
+ */
+class OPENMM_EXPORT_COMMON ComputeContext::ForcePreComputation {
+public:
+    virtual ~ForcePreComputation() {
+    }
+    /**
+     * @param includeForces true if forces should be computed
+     * @param includeEnergy true if potential energy should be computed
+     * @param groups        a set of bit flags for which force groups to include
+     */
+    virtual void computeForceAndEnergy(bool includeForces, bool includeEnergy, int groups) = 0;
+};
+
+/**
+ * This abstract class defines a function to be executed at the very end of force and
+ * energy evaluation, after all other calculations have been done.  It is useful for operations
+ * that need to be performed at a nonstandard point in the process.  After creating a
+ * ForcePostComputation, register it by calling addPostComputation().
+ */
+class OPENMM_EXPORT_COMMON ComputeContext::ForcePostComputation {
+public:
+    virtual ~ForcePostComputation() {
+    }
+    /**
+     * @param includeForces true if forces should be computed
+     * @param includeEnergy true if potential energy should be computed
+     * @param groups        a set of bit flags for which force groups to include
+     * @return an optional contribution to add to the potential energy.
+     */
+    virtual double computeForceAndEnergy(bool includeForces, bool includeEnergy, int groups) = 0;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMPUTECONTEXT_H_*/
--- a/platforms/common/include/openmm/common/ComputeEvent.h
+++ b/platforms/common/include/openmm/common/ComputeEvent.h
+#ifndef OPENMM_COMPUTEEVENT_H_
+#define OPENMM_COMPUTEEVENT_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/windowsExportCommon.h"
+#include <memory>
+
+namespace OpenMM {
+
+/**
+ * This abstract class represents an event for synchronization between the host and
+ * device.  It is created by calling createEvent() on a ComputeContext, which returns
+ * an instance of a platform-specific subclass.  To use it, call enqueue() immediately
+ * after starting an asynchronous operation, such as a kernel invocation or non-blocking
+ * data transfer.  Then at a later point call wait().  This will cause the host to block
+ * until all operations started before the call to enqueue() have completed.
+ * 
+ * Instead of referring to this class directly, it is best to use a ComputeEvent, which is
+ * a typedef for a shared_ptr to a ComputeEventImpl.  This allows you to treat it as having
+ * value semantics, and frees you from having to manage memory.  
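+ * 
+ * A minimal usage sketch, assuming context is a ComputeContext and kernel is a
+ * ComputeKernel whose arguments have already been added (numThreads is a
+ * hypothetical variable):
+ * 
+ * ComputeEvent event = context.createEvent();
+ * kernel->execute(numThreads);   // start an asynchronous operation
+ * event->enqueue();              // mark this point in the device's queue
+ * // ... do unrelated work on the host ...
+ * event->wait();                 // block until the kernel has finished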
+ */
+
+class OPENMM_EXPORT_COMMON ComputeEventImpl {
+public:
+    virtual ~ComputeEventImpl() {
+    }
+    /**
+     * Place the event into the device's execution queue.
+     */
+    virtual void enqueue() = 0;
+    /**
+     * Block until all operations started before the call to enqueue() have completed.
+     */
+    virtual void wait() = 0;
+};
+
+typedef std::shared_ptr<ComputeEventImpl> ComputeEvent;
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMPUTEEVENT_H_*/
--- a/platforms/common/include/openmm/common/ComputeForceInfo.h
+++ b/platforms/common/include/openmm/common/ComputeForceInfo.h
+#ifndef OPENMM_COMPUTEFORCEINFO_H_
+#define OPENMM_COMPUTEFORCEINFO_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2009-2019 Stanford University and the Authors.      *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/windowsExportCommon.h"
+#include <vector>
+
+namespace OpenMM {
+
+/**
+ * ComputeForceInfo objects describe information about the behavior and requirements of
+ * a force.  They exist primarily to help a ComputeContext determine how particles can be
+ * reordered without affecting forces.  Force kernels create them during initialization
+ * and add them to the ComputeContext by calling addForce().
+ */
+
+class OPENMM_EXPORT_COMMON ComputeForceInfo {
+public:
+    ComputeForceInfo() {
+    }
+    /**
+     * Get whether or not two particles have identical force field parameters.
+     */
+    virtual bool areParticlesIdentical(int particle1, int particle2);
+    /**
+     * Get the number of particle groups defined by this force.
+     */
+    virtual int getNumParticleGroups();
+    /**
+     * Get the list of particles in a particular group.
+     */
+    virtual void getParticlesInGroup(int index, std::vector<int>& particles);
+    /**
+     * Get whether two particle groups are identical.
+     */
+    virtual bool areGroupsIdentical(int group1, int group2);
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMPUTEFORCEINFO_H_*/
--- a/platforms/common/include/openmm/common/ComputeKernel.h
+++ b/platforms/common/include/openmm/common/ComputeKernel.h
+#ifndef OPENMM_COMPUTEKERNEL_H_
+#define OPENMM_COMPUTEKERNEL_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/ArrayInterface.h"
+#include <memory>
+#include <string>
+#include <type_traits>
+
+namespace OpenMM {
+
+/**
+ * This abstract class represents a kernel that can be executed on a computing device.
+ * Call createKernel() on a ComputeProgramImpl to create an instance of a platform-specific
+ * subclass.  Then call addArg() to specify the values to pass for all of the kernel's arguments.
+ * Finally, call execute() to execute the kernel.  If you need to modify the values of kernel
+ * arguments between invocations, use setArg() to change the value of an argument.
+ * 
+ * Instead of referring to this class directly, it is best to use ComputeKernel, which is
+ * a typedef for a shared_ptr to a ComputeKernelImpl.  This allows you to treat it as having
+ * value semantics, and frees you from having to manage memory.  
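+ * 
+ * A minimal usage sketch, assuming program is a ComputeProgram and positions is an
+ * object implementing ArrayInterface (the kernel name and the variables numAtoms and
+ * scale are hypothetical):
+ * 
+ * ComputeKernel kernel = program->createKernel("scalePositions");
+ * kernel->addArg(positions);   // argument 0: an array
+ * kernel->addArg(numAtoms);    // argument 1: a primitive value, such as an int
+ * kernel->addArg();            // argument 2: a placeholder with no value yet
+ * kernel->setArg(2, scale);    // supply the value before executing
+ * kernel->execute(numAtoms);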
+ */
+
+class OPENMM_EXPORT_COMMON ComputeKernelImpl {
+public:
+    virtual ~ComputeKernelImpl() {
+    }
+    /**
+     * Get the name of this kernel.
+     */
+    virtual std::string getName() const = 0;
+    /**
+     * Add an argument to pass to the kernel when it is invoked.
+     * 
+     * @param value     the value to pass to the kernel
+     */
+    template <class T>
+    typename std::enable_if<std::is_trivially_copyable<T>::value, void>::type addArg(const T& value) {
+        addPrimitiveArg(&value, sizeof(value));
+    }
+    /**
+     * Add an argument to pass to the kernel when it is invoked.
+     * 
+     * @param value     the value to pass to the kernel
+     */
+    void addArg(ArrayInterface& value) {
+        addArrayArg(value);
+    }
+    /**
+     * Add a placeholder for an argument without specifying its value.  The value must
+     * be provided by calling setArg() before the kernel is executed.
+     */
+    void addArg() {
+        addEmptyArg();
+    }
+    /**
+     * Set the value of an argument to pass to the kernel when it is invoked.
+     * 
+     * @param index     the index of the argument to set
+     * @param value     the value to pass to the kernel
+     */
+    template <class T>
+    typename std::enable_if<std::is_trivially_copyable<T>::value, void>::type setArg(int index, const T& value) {
+        setPrimitiveArg(index, &value, sizeof(value));
+    }
+    /**
+     * Set the value of an argument to pass to the kernel when it is invoked.
+     * 
+     * @param index     the index of the argument to set
+     * @param value     the value to pass to the kernel
+     */
+    void setArg(int index, ArrayInterface& value) {
+        setArrayArg(index, value);
+    }
+    /**
+     * Execute this kernel.
+     *
+     * @param threads      the maximum number of threads that should be used.  Depending on the
+     *                     computing device, it may choose to use fewer threads than this number.
+     * @param blockSize    the number of threads in each thread block.  If this is omitted, a
+     *                     default size that is appropriate for the computing device is used.
+     */
+    virtual void execute(int threads, int blockSize=-1) = 0;
+protected:
+    /**
+     * Add an argument to pass to the kernel when it is invoked, where the value is a
+     * subclass of ArrayInterface.
+     * 
+     * @param value     the value to pass to the kernel
+     */
+    virtual void addArrayArg(ArrayInterface& value) = 0;
+    /**
+     * Add an argument to pass to the kernel when it is invoked, where the value is a primitive type.
+     * 
+     * @param value    a pointer to the argument value
+     * @param size     the size of the value in bytes
+     */
+    virtual void addPrimitiveArg(const void* value, int size) = 0;
+    /**
+     * Add a placeholder for an argument without specifying its value.
+     */
+    virtual void addEmptyArg() = 0;
+    /**
+     * Set the value of an argument to pass to the kernel when it is invoked, where the
+     * value is a subclass of ArrayInterface.
+     * 
+     * @param index     the index of the argument to set
+     * @param value     the value to pass to the kernel
+     */
+    virtual void setArrayArg(int index, ArrayInterface& value) = 0;
+    /**
+     * Set the value of an argument to pass to the kernel when it is invoked, where the value is a primitive type.
+     * 
+     * @param index     the index of the argument to set
+     * @param value     a pointer to the argument value
+     * @param size      the size of the value in bytes
+     */
+    virtual void setPrimitiveArg(int index, const void* value, int size) = 0;
+};
+
+typedef std::shared_ptr<ComputeKernelImpl> ComputeKernel;
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMPUTEKERNEL_H_*/
--- a/platforms/common/include/openmm/common/ComputeParameterInfo.h
+++ b/platforms/common/include/openmm/common/ComputeParameterInfo.h
+#ifndef OPENMM_COMPUTEPARAMETERINFO_H_
+#define OPENMM_COMPUTEPARAMETERINFO_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/ArrayInterface.h"
+#include <sstream>
+#include <string>
+
+namespace OpenMM {
+
+/**
+ * This class stores information about a parameter that can be passed to a kernel.
+ * It combines an ArrayInterface holding parameter values with additional information
+ * describing how to represent it in kernels: the variable name, the data type, etc.
+ * 
+ * The array is assumed to contain a parameter value for each of many objects (atoms,
+ * bonds, etc.).  Each value may in turn be a multi-component vector.  When creating
+ * a ComputeParameterInfo, specify the number of components in the vector and the
+ * type of each component.  For example, suppose you have an array of type float3
+ * containing a dipole moment for each atom.  The ComputeParameterInfo would be
+ * created like this:
+ * 
+ * ComputeParameterInfo parameter(dipoleArray, "dipole", "float", 3);
+ */
+
+class ComputeParameterInfo {
+public:
+    /**
+     * Create a ComputeParameterInfo.
+     *
+     * @param array          the array containing the parameter values
+     * @param name           the name of the variable to use for this parameter
+     * @param componentType  the data type of the parameter's components
+     * @param numComponents  the number of components in the parameter
+     * @param constant       whether the array memory should be marked as constant
+     */
+    ComputeParameterInfo(ArrayInterface& array, const std::string& name, const std::string& componentType, int numComponents, bool constant=true) :
+            array(array), name(name), componentType(componentType), numComponents(numComponents), constant(constant) {
+        if (numComponents == 1)
+            type = componentType;
+        else {
+            std::stringstream s;
+            s << componentType << numComponents;
+            type = s.str();
+        }
+    }
+    virtual ~ComputeParameterInfo() {
+    }
+    /**
+     * Get the array containing the parameter values.
+     */
+    ArrayInterface& getArray() {
+        return array;
+    }
+    /**
+     * Get the array containing the parameter values.
+     */
+    const ArrayInterface& getArray() const {
+        return array;
+    }
+    /**
+     * Get the name of the variable to use for this parameter.
+     */
+    const std::string& getName() const {
+        return name;
+    }
+    /**
+     * Get the data type of each component of the value.  For example, if getType() returns "float3",
+     * this will return "float".
+     */
+    const std::string& getComponentType() const {
+        return componentType;
+    }
+    /**
+     * Get the data type of each value.
+     */
+    const std::string& getType() const {
+        return type;
+    }
+    /**
+     * Get the number of components in each value.  If the values are not a vector
+     * type, this returns 1.
+     */
+    int getNumComponents() const {
+        return numComponents;
+    }
+    /**
+     * Get the size of each parameter value in bytes.
+     */
+    int getSize() const {
+        return array.getElementSize();
+    }
+    /**
+     * Get whether the array memory should be marked as constant.
+     */
+    bool isConstant() const {
+        return constant;
+    }
+private:
+    ArrayInterface& array;
+    std::string name;
+    std::string componentType;
+    std::string type;
+    int numComponents;
+    bool constant;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMPUTEPARAMETERINFO_H_*/
--- a/platforms/common/include/openmm/common/ComputeParameterSet.h
+++ b/platforms/common/include/openmm/common/ComputeParameterSet.h
+#ifndef OPENMM_COMPUTEPARAMETERSET_H_
+#define OPENMM_COMPUTEPARAMETERSET_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2009-2019 Stanford University and the Authors.      *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/ArrayInterface.h"
+#include "openmm/common/ComputeContext.h"
+#include "openmm/common/ComputeParameterInfo.h"
+#include <string>
+#include <vector>
+
+namespace OpenMM {
+
+/**
+ * This class represents a set of floating point parameter values for a set of objects (particles, bonds, etc.).
+ * It automatically creates an appropriate set of arrays to hold the parameter values, based
+ * on the number of parameters required.
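+ * 
+ * A minimal usage sketch, assuming context is a ComputeContext and that float is an
+ * appropriate element type when useDoublePrecision is false (numBonds and the name
+ * "bondParams" are hypothetical):
+ * 
+ * ComputeParameterSet params(context, 2, numBonds, "bondParams");
+ * std::vector<std::vector<float> > values(numBonds, std::vector<float>(2));
+ * // ... fill values[i][j] with the value of parameter j for object i ...
+ * params.setParameterValues(values);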
+ */
+
+class OPENMM_EXPORT_COMMON ComputeParameterSet {
+public:
+    /**
+     * Create a ComputeParameterSet.
+     *
+     * @param context          the context for which to create the parameter set
+     * @param numParameters    the number of parameters for each object
+     * @param numObjects       the number of objects to store parameter values for
+     * @param name             the name of the parameter set
+     * @param arrayPerParameter   if true, a separate array is created for each parameter.  If false,
+     *                            multiple parameters may be combined into a single array for efficiency.
+     * @param useDoublePrecision  if true, values are stored in double precision.  If false,
+     *                            they are stored in single precision.
+     */
+    ComputeParameterSet(ComputeContext& context, int numParameters, int numObjects, const std::string& name, bool arrayPerParameter=false, bool useDoublePrecision=false);
+    ~ComputeParameterSet();
+    /**
+     * Get the number of parameters.
+     */
+    int getNumParameters() const {
+        return numParameters;
+    }
+    /**
+     * Get the number of objects.
+     */
+    int getNumObjects() const {
+        return numObjects;
+    }
+    /**
+     * Get the values of all parameters.
+     *
+     * @param values on exit, values[i][j] contains the value of parameter j for object i
+     */
+    template <class T>
+    void getParameterValues(std::vector<std::vector<T> >& values);
+    /**
+     * Set the values of all parameters.
+     *
+     * @param values values[i][j] contains the value of parameter j for object i
+     */
+    template <class T>
+    void setParameterValues(const std::vector<std::vector<T> >& values);
+    /**
+     * Get a vector of ComputeParameterInfo objects which describe the arrays
+     * containing the data.
+     */
+    std::vector<ComputeParameterInfo>& getParameterInfos() {
+        return parameters;
+    }
+    /**
+     * Get a suffix to add to variable names when accessing a certain parameter.
+     *
+     * @param index         the index of the parameter
+     * @param extraSuffix   an extra suffix to add to the variable name
+     * @return the suffix to append
+     */
+    std::string getParameterSuffix(int index, const std::string& extraSuffix="") const;
+private:
+    ComputeContext& context;
+    int numParameters, numObjects, elementSize;
+    std::string name;
+    std::vector<ArrayInterface*> arrays;
+    std::vector<ComputeParameterInfo> parameters;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMPUTEPARAMETERSET_H_*/
--- a/platforms/common/include/openmm/common/ComputeProgram.h
+++ b/platforms/common/include/openmm/common/ComputeProgram.h
+#ifndef OPENMM_COMPUTEPROGRAM_H_
+#define OPENMM_COMPUTEPROGRAM_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/ComputeKernel.h"
+#include <memory>
+
+namespace OpenMM {
+
+/**
+ * This abstract class represents a compiled program that can be executed on a computing
+ * device.  A ComputeProgramImpl is created by calling compileProgram() on a ComputeContext,
+ * which returns an instance of a platform-specific subclass.  The source code for a
+ * ComputeProgramImpl typically contains one or more kernels.  Call createKernel() to get
+ * ComputeKernels for the kernels, which can then be executed.
+ * 
+ * Instead of referring to this class directly, it is best to use ComputeProgram, which is
+ * a typedef for a shared_ptr to a ComputeProgramImpl.  This allows you to treat it as having
+ * value semantics, and frees you from having to manage memory.  
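+ * 
+ * A minimal usage sketch, assuming context is a ComputeContext, source is a string
+ * containing kernel source code, and compileProgram() accepts that string as its
+ * argument (see ComputeContext.h for the exact signature; the kernel name is hypothetical):
+ * 
+ * ComputeProgram program = context.compileProgram(source);
+ * ComputeKernel kernel = program->createKernel("computeForces");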
+ */
+
+class OPENMM_EXPORT_COMMON ComputeProgramImpl {
+public:
+    virtual ~ComputeProgramImpl() {
+    }
+    /**
+     * Create a ComputeKernel for one of the kernels in this program.
+     * 
+     * @param name    the name of the kernel to get
+     */
+    virtual ComputeKernel createKernel(const std::string& name) = 0;
+};
+
+typedef std::shared_ptr<ComputeProgramImpl> ComputeProgram;
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMPUTEPROGRAM_H_*/
--- a/platforms/common/include/openmm/common/ComputeVectorTypes.h
+++ b/platforms/common/include/openmm/common/ComputeVectorTypes.h
+#ifndef OPENMM_COMPUTEVECTORTYPES_H_
+#define OPENMM_COMPUTEVECTORTYPES_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+namespace OpenMM {
+
+struct mm_short2 {
+    short x, y;
+    mm_short2() {
+    }
+    mm_short2(short x, short y) : x(x), y(y) {
+    }
+};
+struct mm_short3 {
+    short x, y, z, w;
+    mm_short3() {
+    }
+    mm_short3(short x, short y, short z) : x(x), y(y), z(z) {
+    }
+};
+struct mm_short4 {
+    short x, y, z, w;
+    mm_short4() {
+    }
+    mm_short4(short x, short y, short z, short w) : x(x), y(y), z(z), w(w) {
+    }
+};
+struct mm_int2 {
+    int x, y;
+    mm_int2() {
+    }
+    mm_int2(int x, int y) : x(x), y(y) {
+    }
+};
+struct mm_int3 {
+    int x, y, z, w;
+    mm_int3() {
+    }
+    mm_int3(int x, int y, int z) : x(x), y(y), z(z) {
+    }
+};
+struct mm_int4 {
+    int x, y, z, w;
+    mm_int4() {
+    }
+    mm_int4(int x, int y, int z, int w) : x(x), y(y), z(z), w(w) {
+    }
+};
+struct mm_float2 {
+    float x, y;
+    mm_float2() {
+    }
+    mm_float2(float x, float y) : x(x), y(y) {
+    }
+};
+struct mm_float3 {
+    float x, y, z, w;
+    mm_float3() {
+    }
+    mm_float3(float x, float y, float z) : x(x), y(y), z(z) {
+    }
+};
+struct mm_float4 {
+    float x, y, z, w;
+    mm_float4() {
+    }
+    mm_float4(float x, float y, float z, float w) : x(x), y(y), z(z), w(w) {
+    }
+};
+struct mm_double2 {
+    double x, y;
+    mm_double2() {
+    }
+    mm_double2(double x, double y) : x(x), y(y) {
+    }
+};
+struct mm_double3 {
+    double x, y, z, w;
+    mm_double3() {
+    }
+    mm_double3(double x, double y, double z) : x(x), y(y), z(z) {
+    }
+};
+struct mm_double4 {
+    double x, y, z, w;
+    mm_double4() {
+    }
+    mm_double4(double x, double y, double z, double w) : x(x), y(y), z(z), w(w) {
+    }
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMPUTEVECTORTYPES_H_*/
--- a/platforms/common/include/openmm/common/ExpressionUtilities.h
+++ b/platforms/common/include/openmm/common/ExpressionUtilities.h
+#ifndef OPENMM_EXPRESSIONUTILITIES_H_
+#define OPENMM_EXPRESSIONUTILITIES_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2009-2019 Stanford University and the Authors.      *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "ComputeContext.h"
+#include "openmm/TabulatedFunction.h"
+#include "lepton/CustomFunction.h"
+#include "lepton/ExpressionTreeNode.h"
+#include "lepton/ParsedExpression.h"
+#include <map>
+#include <sstream>
+#include <string>
+#include <utility>
+#include <vector>
+
+namespace OpenMM {
+
+/**
+ * This class is used by various classes to generate kernel source code implementing
+ * user-defined mathematical expressions.
+ */
+
+class OPENMM_EXPORT_COMMON ExpressionUtilities {
+public:
+    ExpressionUtilities(ComputeContext& context);
+    /**
+     * Generate the source code for calculating a set of expressions.
+     *
+     * @param expressions    the expressions to generate code for (keys are the variables to store the output values in)
+     * @param variables      defines the source code to generate for each variable that may appear in the expressions.  Keys are
+     *                       variable names, and the values are the code to generate for them.
+     * @param functions      the tabulated functions that may appear in the expressions
+     * @param functionNames  defines the variable name for each tabulated function that may appear in the expressions
+     * @param prefix         a prefix to put in front of temporary variables
+     * @param tempType       the type of value to use for temporary variables (defaults to "real")
+     */
+    std::string createExpressions(const std::map<std::string, Lepton::ParsedExpression>& expressions, const std::map<std::string, std::string>& variables,
+            const std::vector<const TabulatedFunction*>& functions, const std::vector<std::pair<std::string, std::string> >& functionNames,
+            const std::string& prefix, const std::string& tempType="real");
+    /**
+     * Generate the source code for calculating a set of expressions.
+     *
+     * @param expressions    the expressions to generate code for (keys are the variables to store the output values in)
+     * @param variables      defines the source code to generate for each variable or precomputed sub-expression that may appear in the expressions.
+     *                       Each entry pairs an ExpressionTreeNode with the code to generate wherever an identical node appears.
+     * @param functions      the tabulated functions that may appear in the expressions
+     * @param functionNames  defines the variable name for each tabulated function that may appear in the expressions
+     * @param prefix         a prefix to put in front of temporary variables
+     * @param tempType       the type of value to use for temporary variables (defaults to "real")
+     */
+    std::string createExpressions(const std::map<std::string, Lepton::ParsedExpression>& expressions, const std::vector<std::pair<Lepton::ExpressionTreeNode, std::string> >& variables,
+            const std::vector<const TabulatedFunction*>& functions, const std::vector<std::pair<std::string, std::string> >& functionNames,
+            const std::string& prefix, const std::string& tempType="real");
+    /**
+     * Calculate the spline coefficients for a tabulated function that appears in expressions.
+     *
+     * @param function   the function for which to compute coefficients
+     * @param width      on output, the number of floats used for each value
+     * @return the spline coefficients
+     */
+    std::vector<float> computeFunctionCoefficients(const TabulatedFunction& function, int& width);
+    /**
+     * Get a Lepton::CustomFunction that can be used to represent a TabulatedFunction when parsing expressions.
+     *
+     * @param function   the function for which to get a placeholder
+     */
+    Lepton::CustomFunction* getFunctionPlaceholder(const TabulatedFunction& function);
+    /**
+     * Get a Lepton::CustomFunction that can be used to represent the periodicdistance() function when parsing expressions.
+     */
+    Lepton::CustomFunction* getPeriodicDistancePlaceholder();
+private:
+    class FunctionPlaceholder : public Lepton::CustomFunction {
+        public:
+            FunctionPlaceholder(int numArgs) : numArgs(numArgs) {
+            }
+            int getNumArguments() const {
+                return numArgs;
+            }
+            double evaluate(const double* arguments) const {
+                return 0.0;
+            }
+            double evaluateDerivative(const double* arguments, const int* derivOrder) const {
+                return 0.0;
+            }
+            CustomFunction* clone() const {
+                return new FunctionPlaceholder(numArgs);
+            }
+        private:
+            int numArgs;
+    };
+    void processExpression(std::stringstream& out, const Lepton::ExpressionTreeNode& node,
+            std::vector<std::pair<Lepton::ExpressionTreeNode, std::string> >& temps,
+            const std::vector<const TabulatedFunction*>& functions, const std::vector<std::pair<std::string, std::string> >& functionNames,
+            const std::string& prefix, const std::vector<std::vector<double> >& functionParams, const std::vector<Lepton::ParsedExpression>& allExpressions, const std::string& tempType);
+    std::string getTempName(const Lepton::ExpressionTreeNode& node, const std::vector<std::pair<Lepton::ExpressionTreeNode, std::string> >& temps);
+    void findRelatedCustomFunctions(const Lepton::ExpressionTreeNode& node, const Lepton::ExpressionTreeNode& searchNode,
+            std::vector<const Lepton::ExpressionTreeNode*>& nodes);
+    void findRelatedPowers(const Lepton::ExpressionTreeNode& node, const Lepton::ExpressionTreeNode& searchNode,
+            std::map<int, const Lepton::ExpressionTreeNode*>& powers);
+    void callFunction(std::stringstream& out, std::string singleFn, std::string doubleFn, const std::string& arg, const std::string& tempType);
+    void callFunction2(std::stringstream& out, std::string singleFn, std::string doubleFn, const std::string& arg1, const std::string& arg2, const std::string& tempType);
+    std::vector<std::vector<double> > computeFunctionParameters(const std::vector<const TabulatedFunction*>& functions);
+    ComputeContext& context;
+    FunctionPlaceholder fp1, fp2, fp3, periodicDistance;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_EXPRESSIONUTILITIES_H_*/
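
Review note (illustrative sketch, not part of the patch): the private `FunctionPlaceholder` class above reports only an argument count and returns 0.0 from its evaluation methods, presumably because the real evaluation happens in generated kernel code rather than on the host. The minimal sketch below shows the same pattern without depending on Lepton. `CustomFunctionLike`, `Placeholder`, and `clonedArity` are hypothetical names introduced here; `CustomFunctionLike` is a simplified stand-in, not the actual `Lepton::CustomFunction` interface.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, simplified stand-in for a parser's custom-function interface.
class CustomFunctionLike {
public:
    virtual ~CustomFunctionLike() {
    }
    virtual int getNumArguments() const = 0;
    virtual double evaluate(const double* arguments) const = 0;
    virtual CustomFunctionLike* clone() const = 0;
};

// A placeholder that lets an expression mentioning the function be parsed.
// It carries the arity and nothing else; evaluate() returns a dummy value.
class Placeholder : public CustomFunctionLike {
public:
    explicit Placeholder(int numArgs) : numArgs(numArgs) {
    }
    int getNumArguments() const override {
        return numArgs;
    }
    double evaluate(const double*) const override {
        return 0.0;
    }
    CustomFunctionLike* clone() const override {
        return new Placeholder(numArgs);
    }
private:
    int numArgs;
};

// Hypothetical helper: clone a function and report the arity of the copy.
int clonedArity(const CustomFunctionLike& fn) {
    std::unique_ptr<CustomFunctionLike> copy(fn.clone());
    return copy->getNumArguments();
}
```

Because the only state is the arity, one instance per argument count is enough, which may be why the header keeps a small fixed set of members (`fp1`, `fp2`, `fp3`, `periodicDistance`).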
--- a/platforms/common/include/openmm/common/IntegrationUtilities.h
+++ b/platforms/common/include/openmm/common/IntegrationUtilities.h
+#ifndef OPENMM_INTEGRATIONUTILITIES_H_
+#define OPENMM_INTEGRATIONUTILITIES_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2009-2019 Stanford University and the Authors.      *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/ComputeArray.h"
+#include "openmm/common/ComputeKernel.h"
+#include "openmm/common/ComputeVectorTypes.h"
+#include "openmm/System.h"
+#include <iosfwd>
+#include <map>
+
+namespace OpenMM {
+
+class ComputeContext;
+
+/**
+ * This class implements features that are used by many different integrators, including
+ * common workspace arrays, random number generation, and enforcing constraints.
+ */
+
+class OPENMM_EXPORT_COMMON IntegrationUtilities {
+public:
+    IntegrationUtilities(ComputeContext& context, const System& system);
+    virtual ~IntegrationUtilities() {
+    }
+    /**
+     * Get the array which contains position deltas.  These are the amounts by
+     * which the position of each atom will change in the current step.  The actual
+     * positions should not be modified until after constraints have been applied.
+     */
+    virtual ArrayInterface& getPosDelta() = 0;
+    /**
+     * Get the array which contains random values.  Each element is a float4 whose components
+     * are independent, normally distributed random numbers with mean 0 and variance 1.
+     * Be sure to call initRandomNumberGenerator() and prepareRandomNumbers() before
+     * accessing this array.
+     */
+    virtual ArrayInterface& getRandom() = 0;
+    /**
+     * Get the array which contains the current step size.
+     */
+    virtual ArrayInterface& getStepSize() = 0;
+    /**
+     * Set the size to use for the next step.
+     */
+    void setNextStepSize(double size);
+    /**
+     * Get the size that was used for the last step.
+     */
+    double getLastStepSize();
+    /**
+     * Apply constraints to the atom positions.  When calling this method, the
+     * context's array of positions should contain the positions at the start of the
+     * step, and the array returned by getPosDelta() should contain the intended
+     * change to each position.  This method modifies the position deltas so that,
+     * once they are added to the positions, constraints will be satisfied.
+     *
+     * @param tol             the constraint tolerance
+     */
+    void applyConstraints(double tol);
+    /**
+     * Apply constraints to the atom velocities.
+     *
+     * @param tol             the constraint tolerance
+     */
+    void applyVelocityConstraints(double tol);
+    /**
+     * Initialize the random number generator.  This should be called once when the
+     * context is first created.  Subsequent calls will be ignored if the random
+     * seed is the same as on the first call, or throw an exception if the random
+     * seed is different.
+     */
+    void initRandomNumberGenerator(unsigned int randomNumberSeed);
+    /**
+     * Ensure that sufficient random numbers are available in the array, and generate new ones if not.
+     *
+     * @param numValues     the number of random float4's that will be required
+     * @return the index in the array at which to start reading
+     */
+    int prepareRandomNumbers(int numValues);
+    /**
+     * Compute the positions of virtual sites.
+     */
+    void computeVirtualSites();
+    /**
+     * Distribute forces from virtual sites to the atoms they are based on.
+     */
+    virtual void distributeForcesFromVirtualSites() = 0;
+    /**
+     * Create a checkpoint recording the current state of the random number generator.
+     *
+     * @param stream    an output stream the checkpoint data should be written to
+     */
+    void createCheckpoint(std::ostream& stream);
+    /**
+     * Load a checkpoint that was written by createCheckpoint().
+     *
+     * @param stream    an input stream the checkpoint data should be read from
+     */
+    void loadCheckpoint(std::istream& stream);
+    /**
+     * Compute the kinetic energy of the system, possibly shifting the velocities in time to account
+     * for a leapfrog integrator.
+     *
+     * @param timeShift   the amount by which to shift the velocities in time
+     */
+    double computeKineticEnergy(double timeShift);
+    /**
+     * Get the data structure that holds the state of all Nose-Hoover chains.
+     */
+    std::map<int, ComputeArray>& getNoseHooverChainState() {
+        return noseHooverChainState;
+    }
+protected:
+    virtual void applyConstraintsImpl(bool constrainVelocities, double tol) = 0;
+    ComputeContext& context;
+    ComputeKernel settlePosKernel, settleVelKernel;
+    ComputeKernel shakePosKernel, shakeVelKernel;
+    ComputeKernel ccmaDirectionsKernel, ccmaPosForceKernel, ccmaVelForceKernel;
+    ComputeKernel ccmaMultiplyKernel, ccmaUpdateKernel;
+    ComputeKernel vsitePositionKernel, vsiteForceKernel, vsiteSaveForcesKernel;
+    ComputeKernel randomKernel, timeShiftKernel;
+    ComputeArray posDelta;
+    ComputeArray settleAtoms;
+    ComputeArray settleParams;
+    ComputeArray shakeAtoms;
+    ComputeArray shakeParams;
+    ComputeArray random;
+    ComputeArray randomSeed;
+    ComputeArray stepSize;
+    ComputeArray ccmaAtoms;
+    ComputeArray ccmaDistance;
+    ComputeArray ccmaReducedMass;
+    ComputeArray ccmaAtomConstraints;
+    ComputeArray ccmaNumAtomConstraints;
+    ComputeArray ccmaConstraintMatrixColumn;
+    ComputeArray ccmaConstraintMatrixValue;
+    ComputeArray ccmaDelta1;
+    ComputeArray ccmaDelta2;
+    ComputeArray ccmaConverged;
+    ComputeArray vsite2AvgAtoms;
+    ComputeArray vsite2AvgWeights;
+    ComputeArray vsite3AvgAtoms;
+    ComputeArray vsite3AvgWeights;
+    ComputeArray vsiteOutOfPlaneAtoms;
+    ComputeArray vsiteOutOfPlaneWeights;
+    ComputeArray vsiteLocalCoordsIndex;
+    ComputeArray vsiteLocalCoordsAtoms;
+    ComputeArray vsiteLocalCoordsWeights;
+    ComputeArray vsiteLocalCoordsPos;
+    ComputeArray vsiteLocalCoordsStartIndex;
+    std::map<int, ComputeArray> noseHooverChainState;
+    int randomPos, lastSeed, numVsites;
+    bool hasOverlappingVsites;
+    mm_double2 lastStepSize;
+    struct ShakeCluster;
+    struct ConstraintOrderer;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_INTEGRATIONUTILITIES_H_*/
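
Review note (illustrative sketch, not part of the patch): the comments on `initRandomNumberGenerator()` and `prepareRandomNumbers()` describe a contract that a small host-only model can make concrete. Re-seeding with the same seed is ignored, re-seeding with a different seed throws, and a request returns the index at which to start reading, regenerating the pool when too few values remain. `RandomPool` and its members are hypothetical names, and the pool holds plain floats rather than float4 values. It also throws when a request exceeds the pool capacity, which is a choice made for this sketch and not a claim about what the real class does.

```cpp
#include <cassert>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model of a pre-generated pool of normal random values.
class RandomPool {
public:
    explicit RandomPool(int capacity) : values(capacity), pos(capacity), seeded(false), seed(0) {
    }
    // The first call seeds the generator. Later calls are ignored when the seed
    // matches and throw when it differs.
    void init(unsigned int newSeed) {
        if (seeded) {
            if (newSeed != seed)
                throw std::runtime_error("random seed differs from the first call");
            return;
        }
        seed = newSeed;
        seeded = true;
        generator.seed(newSeed);
    }
    // Return the index at which the caller may read numValues entries,
    // refilling the whole pool first if too few unread values remain.
    int prepare(int numValues) {
        if (!seeded)
            throw std::logic_error("init() must be called before prepare()");
        if (numValues > (int) values.size())
            throw std::invalid_argument("request exceeds pool capacity");
        if (pos+numValues > (int) values.size()) {
            refill();
            pos = 0;
        }
        int start = pos;
        pos += numValues;
        return start;
    }
    const std::vector<float>& data() const {
        return values;
    }
private:
    void refill() {
        std::normal_distribution<float> dist(0.0f, 1.0f); // mean 0, variance 1
        for (float& v : values)
            v = dist(generator);
    }
    std::vector<float> values;
    int pos;
    bool seeded;
    unsigned int seed;
    std::mt19937 generator;
};
```

Starting `pos` at the capacity forces the first `prepare()` call to fill the pool, so no values are generated until they are needed.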