Common compute framework to unify CUDA and OpenCL code (#2488)

* Began creating common compute framework to unify code between CUDA and OpenCL
* Began OpenCL implementation of common compute framework
* Common implementation of CMMotionRemover
* CUDA implementation of common compute interface
* Converted HarmonicBondForce to common compute API
* Converted standard bonded forces to common compute API
* Converted ExpressionUtilities to common compute API
* Created ComputeParameterSet
* Converted custom bonded forces to common compute API
* Converted CustomCentroidBondForce to common compute API
* Converted CustomManyParticleForce to common compute API
* Moved lots of duplicate code from CudaContext and OpenCLContext to ComputeContext
* Converted GayBerneForce to common compute API
* Removed obsolete kernels
* Converted verlet integrators to common compute API
* Converted Langevin and Brownian integrators to common compute API
* Converted CustomIntegrator to common compute API
* Converted CustomNonbondedForce to common compute API
* Removed uses of a deprecated API
* Fixed failing test cases
* Converted GBSAOBCForce to common compute API
* Began converting CustomGBForce to common compute API
* Finished converting CustomGBForce to common compute API
* Merged duplicated code in CudaIntegrationUtilities and OpenCLIntegrationUtilities
* Converted RMSDForce and AndersenThermostat to common compute API
* Converted CustomHbondForce to common compute API
* Merged scripts for encoding kernel sources
* Converted Drude plugin to common compute API
* Fixed errors in CMake scripts
* Attempt at fixing errors on Windows
* Added discussion of common compute API to developer guide
* Added Windows export macro for common classes
* Fixed error in CMMotionRemover
* Updated travis to newer Ubuntu version
* Fixed errors on CPU OpenCL
* Fixed Windows linking errors
* Added missing pragma for 32 bit atomics
* Replaced long long with mm_long
* More fixes to Windows linking
* Bug fix

Unverified Commit edbc8407 authored Jan 08, 2020 by peastman Committed by GitHub Jan 08, 2020
20 changed files
--- a/.travis.yml
+++ b/.travis.yml
@@ -17,7 +17,7 @@ env:
 matrix:
  include:
    - sudo: required
-      dist: trusty
+      dist: xenial
      env: ==CPU_OPENCL==
           OPENCL=true
           CUDA=false
@@ -40,7 +40,7 @@ matrix:
      addons: {apt: {packages: []}}

    - sudo: required
-      dist: trusty
+      dist: xenial
      env: ==CUDA_COMPILE==
           CUDA=true
           OPENCL=false
@@ -74,7 +74,7 @@ matrix:
      addons: {apt: {packages: []}}

    - sudo: false
-      dist: trusty
+      dist: xenial
      python: 3.6
      env: ==STATIC_LIB==
           OPENCL=false
@@ -84,7 +84,7 @@ matrix:
           CMAKE_FLAGS="-DOPENMM_BUILD_STATIC_LIB=ON"

    - sudo: false
-      dist: trusty
+      dist: xenial
      python: 3.6
      env: ==PYTHON_3_6==
           OPENCL=false

--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -341,6 +341,12 @@ IF(OPENMM_BUILD_OPENCL_LIB)
    ADD_SUBDIRECTORY(platforms/opencl)
 ENDIF(OPENMM_BUILD_OPENCL_LIB)

+# Common compute files
+
+IF(CUDA_FOUND OR OPENCL_FOUND)
+    ADD_SUBDIRECTORY(platforms/common)
+ENDIF()
+
 # Optimized CPU platform

 SET(OPENMM_BUILD_CPU_LIB ON CACHE BOOL "Build optimized CPU platform")

--- a/platforms/cuda/EncodeCUDAFiles.cmake
+++ b/platforms/cuda/EncodeCUDAFiles.cmake
-FILE(GLOB CUDA_KERNELS ${CUDA_SOURCE_DIR}/kernels/*.cu)
-SET(CUDA_FILE_DECLARATIONS)
-SET(CUDA_FILE_DEFINITIONS)
-CONFIGURE_FILE(${CUDA_SOURCE_DIR}/${CUDA_SOURCE_CLASS}.cpp.in ${CUDA_KERNELS_CPP})
-FOREACH(file ${CUDA_KERNELS})
+FILE(GLOB KERNEL_FILES ${KERNEL_SOURCE_DIR}/kernels/*.${KERNEL_FILE_EXTENSION})
+SET(KERNEL_FILE_DECLARATIONS)
+CONFIGURE_FILE(${KERNEL_SOURCE_DIR}/${KERNEL_SOURCE_CLASS}.cpp.in ${KERNELS_CPP})
+FOREACH(file ${KERNEL_FILES})
    # Load the file contents and process it.
    FILE(STRINGS ${file} file_content NEWLINE_CONSUME)
    # Replace all backslashes by double backslashes as they are being put in a C string.
@@ -15,13 +14,13 @@ FOREACH(file ${CUDA_KERNELS})
    STRING(REPLACE "\n" "\\n\"\n\"" file_content "${file_content}")

    # Determine a name for the variable that will contain this file's contents
-    FILE(RELATIVE_PATH filename ${CUDA_SOURCE_DIR}/kernels ${file})
+    FILE(RELATIVE_PATH filename ${KERNEL_SOURCE_DIR}/kernels ${file})
    STRING(LENGTH ${filename} filename_length)
    MATH(EXPR filename_length ${filename_length}-3)
    STRING(SUBSTRING ${filename} 0 ${filename_length} variable_name)

    # Record the variable declaration and definition.
-    SET(CUDA_FILE_DECLARATIONS ${CUDA_FILE_DECLARATIONS}static\ const\ std::string\ ${variable_name};\n)
-    FILE(APPEND ${CUDA_KERNELS_CPP} const\ string\ ${CUDA_SOURCE_CLASS}::${variable_name}\ =\ \"${file_content}\"\;\n)
+    SET(KERNEL_FILE_DECLARATIONS ${KERNEL_FILE_DECLARATIONS}static\ const\ std::string\ ${variable_name};\n)
+    FILE(APPEND ${KERNELS_CPP} const\ string\ ${KERNEL_SOURCE_CLASS}::${variable_name}\ =\ \"${file_content}\"\;\n)
 ENDFOREACH(file)
-CONFIGURE_FILE(${CUDA_SOURCE_DIR}/${CUDA_SOURCE_CLASS}.h.in ${CUDA_KERNELS_H})
+CONFIGURE_FILE(${KERNEL_SOURCE_DIR}/${KERNEL_SOURCE_CLASS}.h.in ${KERNELS_H})
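For readers unfamiliar with the encoding step, the transformation this script applies to each kernel file can be sketched in C++. This is a simplified illustration, not code from the commit: the function names are hypothetical, and the escaping of double quotes is an assumption, since that part of the script is not visible in the hunk above. The backslash doubling, the newline replacement, and the three-character extension trim do come from the script.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: derive the generated member name from a kernel file
// name by dropping its last three characters (".cu", ".cl" or ".cc"),
// mirroring the MATH(EXPR ... -3) / STRING(SUBSTRING ...) pair in the script.
inline std::string variableNameFor(const std::string& filename) {
    if (filename.size() < 3)
        return filename;  // nothing sensible to trim
    return filename.substr(0, filename.size() - 3);
}

// Hypothetical helper: turn raw kernel source into text that can sit between
// the quotes of a C++ string literal.
inline std::string encodeAsCString(const std::string& source) {
    std::string out;
    for (char c : source) {
        if (c == '\\')
            out += "\\\\";        // double every backslash
        else if (c == '"')
            out += "\\\"";        // assumption: quotes are escaped too
        else if (c == '\n')
            out += "\\n\"\n\"";   // end the literal line, reopen on the next
        else
            out += c;
    }
    return out;
}
```

With these definitions, a file named `verlet.cc` would yield a member called `verlet`, and each source line becomes its own adjacent string literal, which the C++ compiler concatenates back into one string.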
--- a/docs-source/developerguide/developer.rst
+++ b/docs-source/developerguide/developer.rst
@@ -456,15 +456,22 @@ It also defines vector versions of these types (\ :code:`real2`\ ,
 Computing Forces
 ****************

-When forces are computed, they are stored in multiple buffers.  This is done to
-enable multiple work-items or work-groups to compute forces on the same particle
-at the same time; as long as each one writes to a different buffer, there is no
-danger of race conditions.  At the start of a force calculation, all forces in
-all buffers are set to zero.   Each Force is then free to add its contributions
-to any or all of the buffers.  Finally, the buffers are summed to produce the
-total force on each particle.
-
-The size of each buffer is equal to the number of particles, rounded up to the
+When forces are computed, they can be stored in either of two places.  There is
+an array of :code:`long` values storing them as 64 bit fixed point values, and
+a collection of buffers of :code:`real4` values storing them in floating point
+format.  Most GPUs support atomic operations on 64 bit integers, which allows
+many threads to simultaneously record forces without a danger of conflicts.
+Some low end GPUs do not support this, however, especially the embedded GPUs
+found in many laptops.  These devices write to the floating point buffers, with
+careful coordination to make sure two threads will never write to the same
+memory location at the same time.
+
+At the start of a force calculation, all forces in all buffers are set to zero.
+Each Force is then free to add its contributions to any or all of the buffers.
+Finally, the buffers are summed to produce the total force on each particle.
+The total is recorded in both the floating point and fixed point arrays.
+
+The size of each floating point buffer is equal to the number of particles, rounded up to the
 next multiple of 32.  Call :code:`getPaddedNumAtoms()` on the OpenCLContext
 to get that number.  The actual force buffers are obtained by calling 
 :code:`getForceBuffers()`\ .  The first *n* entries (where *n* is the
@@ -473,16 +480,13 @@ represent the second force buffer, and so on.  More generally, the *i*\ ’th
 force buffer’s contribution to the force on particle *j* is stored in
 element :code:`i*context.getPaddedNumAtoms()+j`\ .

-Depending on the device, a buffer may also be created that stores contributions
-to the forces in 64 bit fixed point format.  On devices that support atomic
-operations on 64 bit integers in global memory, this can be a more efficient way
-of accumulating forces than using a large number of force buffers.  To convert a
-value from floating point to fixed point, multiply it by 0x100000000 (2\ :sup:`32`\ ),
-then cast it to a :code:`long`\ .  The fixed point buffer is
-ordered differently from the others.  For atom *i*\ , the x component of its
-force is stored in element :code:`i`\ , the y component in element 
+The fixed point buffer is ordered differently.  For atom *i*\ , the x component
+of its force is stored in element :code:`i`\ , the y component in element 
 :code:`i+context.getPaddedNumAtoms()`\ , and the z component in element 
-:code:`i+2*context.getPaddedNumAtoms()`\ .
+:code:`i+2*context.getPaddedNumAtoms()`\ .  To convert a value from floating
+point to fixed point, multiply it by 0x100000000 (2\ :sup:`32`\ ),
+then cast it to a :code:`long`\ .  Call :code:`getLongForceBuffer()` to get the
+array of fixed point values.

 The potential energy is also accumulated in a set of buffers, but this one is
 simply a list of floating point values.  All of them are set to zero at the
@@ -490,15 +494,10 @@ start of a computation, and they are summed at the end of the computation to
 yield the total energy.

 The OpenCL implementation of each Force object should define a subclass of
-OpenCLForce, and register an instance of it by calling :code:`addForce()` on
-the OpenCLContext.  This serves two purposes:
-
-#. It reports how many force buffers are required when calculating this
-   particular Force.  The OpenCLContext sets the size of its force buffer array
-   based on the largest number of buffers required by any Force.
-#. It implements methods for determining whether particular particles or groups
-   of particles are identical.  This is important when reordering particles, and is
-   discussed below.
+ComputeForceInfo, and register an instance of it by calling :code:`addForce()` on
+the OpenCLContext.  It implements methods for determining whether particular
+particles or groups of particles are identical.  This is important when
+reordering particles, and is discussed below.


 Nonbonded Forces
@@ -586,8 +585,7 @@ where *k* is a per-particle parameter.  First we create a parameter as
 follows
 ::

-    nb.addParameter(OpenCLNonbondedUtilities::ParameterInfo("kparam", "float", 1,
-            sizeof(cl_float), kparam->getDeviceBuffer()));
+    nb.addParameter(ComputeParameterInfo(kparam, "kparam", "float", 1));

 where :code:`nb` is the OpenCLNonbondedUtilities for the context.  Now we
 call :code:`addInteraction()` to define an interaction with the following
@@ -700,7 +698,7 @@ exchanged without affecting the System in any way.

 Every Force can contribute to defining the boundaries of molecules, and to
 determining whether two molecules are identical.  This is done through the
-OpenCLForceInfo it adds to the OpenCLContext.  It can specify two types of
+ComputeForceInfo it adds to the OpenCLContext.  It can specify two types of
 information:

 #. Given a pair of particles, it can say whether those two particles are
@@ -792,3 +790,189 @@ buffer.  In contrast, the CUDA platform uses *only* the fixed point buffer
 the CUDA platform only works on devices that support 64 bit atomic operations
 (compute capability 1.2 or higher).

+
+.. _common-compute:
+
+Common Compute
+##############
+
+Common Compute is not a platform, but it shares many elements of one.  It exists
+to reduce code duplication between the OpenCL and CUDA platforms.  It allows a
+single implementation to be written for most kernels that can be used by both
+platforms.
+
+OpenCL and CUDA are very similar to each other.  Their computational models are
+nearly identical.  For example, each is based around launching kernels that are
+executed in parallel by many threads.  Each of them groups threads into blocks,
+with more communication and synchronization permitted between the threads
+in a block than between ones in different blocks.  They have very similar memory
+hierarchies: high latency global memory, low latency local/shared memory that
+can be used for communication between the threads of a block, and local variables
+that are visible only to a single thread.
+
+Even their languages for writing kernels are very similar.  Here is an OpenCL
+kernel that adds two arrays together, storing the result in a third array.
+::
+
+    __kernel void addArrays(__global const float* restrict a,
+                            __global const float* restrict b,
+                            __global float* restrict c,
+                            int length) {
+        for (int i = get_global_id(0); i < length; i += get_global_size(0))
+            c[i] = a[i]+b[i];
+    }
+
+Here is the corresponding CUDA kernel.
+::
+
+    extern "C" __global__ void addArrays(const float* __restrict__ a,
+                                         const float* __restrict__ b,
+                                         float* __restrict__ c,
+                                         int length) {
+        for (int i = blockIdx.x*blockDim.x+threadIdx.x; i < length; i += blockDim.x*gridDim.x)
+            c[i] = a[i]+b[i];
+    }
+
+The difference between them is largely just a mechanical find-and-replace.
+After many years of writing and maintaining nearly identical kernels by hand,
+it finally occurred to us that the translation could be done automatically by
+the compiler.  Simply by defining a few preprocessor macros, the following
+kernel can be compiled equally well either as OpenCL or as CUDA.
+::
+
+    KERNEL void addArrays(GLOBAL const float* RESTRICT a,
+                          GLOBAL const float* RESTRICT b,
+                          GLOBAL float* RESTRICT c,
+                          int length) {
+        for (int i = GLOBAL_ID; i < length; i += GLOBAL_SIZE)
+            c[i] = a[i]+b[i];
+    }
+
+Writing Device Code
+*******************
+
+When compiling kernels with the Common Compute API, the following macros are
+defined.
+
+.. tabularcolumns:: |l|l|L|
+
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|Macro                          |OpenCL Definition                                           |CUDA Definition                             |
++===============================+============================================================+============================================+
+|:code:`KERNEL`                 |:code:`__kernel`                                            |:code:`extern "C" __global__`               |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`DEVICE`                 |                                                            |:code:`__device__`                          |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`LOCAL`                  |:code:`__local`                                             |:code:`__shared__`                          |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`LOCAL_ARG`              |:code:`__local`                                             |                                            |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`GLOBAL`                 |:code:`__global`                                            |                                            |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`RESTRICT`               |:code:`restrict`                                            |:code:`__restrict__`                        |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`LOCAL_ID`               |:code:`get_local_id(0)`                                     |:code:`threadIdx.x`                         |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`LOCAL_SIZE`             |:code:`get_local_size(0)`                                   |:code:`blockDim.x`                          |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`GLOBAL_ID`              |:code:`get_global_id(0)`                                    |:code:`(blockIdx.x*blockDim.x+threadIdx.x)` |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`GLOBAL_SIZE`            |:code:`get_global_size(0)`                                  |:code:`(blockDim.x*gridDim.x)`              |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`GROUP_ID`               |:code:`get_group_id(0)`                                     |:code:`blockIdx.x`                          |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`NUM_GROUPS`             |:code:`get_num_groups(0)`                                   |:code:`gridDim.x`                           |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`SYNC_THREADS`           |:code:`barrier(CLK_LOCAL_MEM_FENCE+CLK_GLOBAL_MEM_FENCE);`  |:code:`__syncthreads();`                    |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`SYNC_WARPS`             | | if SIMT width >= 32:                                     | | if compute capability >= 7.0:            |
+|                               | | :code:`mem_fence(CLK_LOCAL_MEM_FENCE)`                   | | :code:`__syncwarp();`                    |
+|                               | | otherwise:                                               | | otherwise empty                          |
+|                               | | :code:`barrier(CLK_LOCAL_MEM_FENCE)`                     |                                            |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`MEM_FENCE`              |:code:`mem_fence(CLK_LOCAL_MEM_FENCE+CLK_GLOBAL_MEM_FENCE);`|:code:`__threadfence_block();`              |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`ATOMIC_ADD(dest, value)`|:code:`atom_add(dest, value)`                               |:code:`atomicAdd(dest, value)`              |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+
+A few other symbols may or may not be defined based on the device you are running on:
+:code:`SUPPORTS_DOUBLE_PRECISION` and :code:`SUPPORTS_64_BIT_ATOMICS`\ .  You
+can use :code:`#ifdef` blocks with these symbols to conditionally compile code
+based on the features supported by the device.  In addition, the CUDA compiler
+defines the symbol :code:`__CUDA_ARCH__`\ , so you can check for this symbol if
+you want to have different code blocks for CUDA and OpenCL.
+
+Both OpenCL and CUDA define vector types like :code:`int2` and :code:`float4`\ .
+The types they support are different but overlapping.  When writing common code,
+use only the vector types that are supported by both OpenCL and CUDA: 2, 3, and 4
+element vectors of type :code:`short`\ , :code:`int`\ , :code:`float`\ , and
+:code:`double`\ .
+
+CUDA uses functions to construct vector values, such as :code:`make_float2(x, y)`\ .
+OpenCL instead uses a typecast-like syntax: :code:`(float2) (x, y)`\ .  In common
+code, use the CUDA style :code:`make_` functions.  OpenMM provides definitions
+of these functions when compiling as OpenCL.
+
+In CUDA, vector types are simply data structures.  You can access their elements,
+but not do much more with them.  In contrast, OpenCL's vectors are mathematical
+types.  All standard math operators are defined for them, as well as geometrical
+functions like :code:`dot()` and :code:`cross()`\ .  When compiling kernels as
+CUDA, OpenMM provides definitions of these operators and functions.
+
+OpenCL also supports "swizzle" notation for vectors.  For example, if :code:`f`
+is a :code:`float4` you can construct a vector of its first three elements
+by writing :code:`f.xyz`\ , or you can swap its first two elements by writing
+:code:`f.xy = f.yx`\ .  Unfortunately, there is no practical way to support this
+in CUDA, so swizzle notation cannot be used in common code.  Because stripping
+the final element from a four component vector is such a common operation, OpenMM
+provides a special function for doing it: :code:`trimTo3(f)` is a vector of its
+first three elements.
+
+64 bit integers are another data type that needs special handling.  Both OpenCL
+and CUDA support them, but they use different names for them: :code:`long` in OpenCL,
+:code:`long long` in CUDA.  To work around this inconsistency, OpenMM provides
+the typedefs :code:`mm_long` and :code:`mm_ulong` for signed and unsigned 64 bit
+integers in device code.
+
+Writing Host Code
+*****************
+
+Host code for Common Compute is very similar to host code for OpenCL or CUDA.
+In fact, most of the classes provided by the OpenCL and CUDA platforms are
+subclasses of Common Compute classes.  For example, OpenCLContext and
+CudaContext are both subclasses of ComputeContext.  When writing common code,
+each KernelImpl should expect a ComputeContext to be passed to its constructor.
+By using the common API provided by that abstract class, it can be used for
+either OpenCL or CUDA just based on the particular context passed to it at
+runtime.  Similarly, OpenCLNonbondedUtilities and CudaNonbondedUtilities are
+subclasses of the abstract NonbondedUtilities class, and so on.
+
+ArrayInterface is an abstract class defining the interface for arrays stored on
+the device.  OpenCLArray and CudaArray are both subclasses of it.  To simplify
+code that creates and uses arrays, there is also a third subclass called
+ComputeArray.  It acts as a wrapper around an OpenCLArray or CudaArray,
+automatically creating an array of the appropriate type for the current
+platform.  In practice, just follow these rules:
+
+  1. Whenever you need to create an array, make it a ComputeArray.
+
+  2. Whenever you write a function that expects an array to be passed to it,
+     declare the type to be ArrayInterface.
+
+If you do these two things, all differences between platforms will be handled
+automatically.
+
+OpenCL and CUDA have quite different APIs for compiling and invoking kernels.
+To hide these differences, OpenMM provides a set of abstract classes.  To compile
+device code, pass the source code to :code:`compileProgram()` on the ComputeContext.
+This returns a ComputeProgram.  You can then call its :code:`createKernel()`
+method to get a ComputeKernel object, which has methods for setting arguments
+and invoking the kernel.
+
+Sometimes you need to refer to vector types in host code, such as to set the
+value for a kernel argument or to access the elements of an array.  OpenCL and
+CUDA both define types for them, but they have different names, and in any case
+you want to avoid using OpenCL-specific or CUDA-specific types in common code.
+OpenMM therefore defines types for vectors in host code.  They have the same
+names as the corresponding types in device code, only with the prefix :code:`mm_`\ ,
+for example :code:`mm_int2` and :code:`mm_float3`\ .
\ No newline at end of file
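As a rough illustration of the force buffer layout described in the revised "Computing Forces" text above, the arithmetic can be sketched in host-side C++. The helper names below are hypothetical and are not part of the OpenMM API. Only the padding to a multiple of 32, the scaling by 0x100000000 (2^32), and the two index formulas come from the guide; the inverse conversion is added here purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round the particle count up to a multiple of 32,
// which is what the guide says getPaddedNumAtoms() reports.
inline int paddedNumAtoms(int numAtoms) {
    return ((numAtoms + 31) / 32) * 32;
}

// Convert a force component to 64 bit fixed point: multiply by 2^32,
// then cast to a 64 bit integer.
inline std::int64_t toFixedPoint(double value) {
    return static_cast<std::int64_t>(value * 4294967296.0);  // 0x100000000
}

// Inverse conversion (an assumption for illustration, not taken from the guide).
inline double fromFixedPoint(std::int64_t value) {
    return static_cast<double>(value) / 4294967296.0;
}

// Fixed point buffer: component 0 (x), 1 (y) or 2 (z) of atom i lives at
// element i + component*paddedAtoms.
inline int fixedPointIndex(int atom, int component, int paddedAtoms) {
    return atom + component * paddedAtoms;
}

// Floating point buffers: the i'th buffer's contribution to particle j
// lives at element i*paddedAtoms + j.
inline int floatBufferIndex(int buffer, int atom, int paddedAtoms) {
    return buffer * paddedAtoms + atom;
}
```

For a system of 33 particles the padded size is 64, so the z component of atom 5 sits at element 133 of the fixed point buffer, while the second floating point buffer's entry for the same atom sits at element 69.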
--- a/libraries/asmjit/asmjit_apibegin.h
+++ b/libraries/asmjit/asmjit_apibegin.h
@@ -53,8 +53,8 @@
 // [GCC]
 #if ASMJIT_CC_GCC
 # pragma GCC diagnostic push
-# pragma GCC diagnostic ignored "-Wbool-operation"
 # if ASMJIT_CC_GCC_GE(8, 0, 0)
+#  pragma GCC diagnostic ignored "-Wbool-operation"
 #  pragma GCC diagnostic ignored "-Wclass-memaccess"
 # endif
 #endif // ASMJIT_CC_GCC

--- a/platforms/common/CMakeLists.txt
+++ b/platforms/common/CMakeLists.txt
+# Encode the kernel sources into a C++ class.
+
+SET(KERNEL_SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/src")
+SET(KERNEL_SOURCE_CLASS CommonKernelSources)
+SET(KERNELS_CPP ${CMAKE_CURRENT_BINARY_DIR}/src/${KERNEL_SOURCE_CLASS}.cpp)
+SET(KERNELS_H ${CMAKE_CURRENT_BINARY_DIR}/src/${KERNEL_SOURCE_CLASS}.h)
+INCLUDE_DIRECTORIES(BEFORE ${CMAKE_CURRENT_BINARY_DIR}/src)
+FILE(GLOB COMMON_KERNELS ${KERNEL_SOURCE_DIR}/kernels/*.cc)
+ADD_CUSTOM_COMMAND(OUTPUT ${KERNELS_CPP} ${KERNELS_H}
+    COMMAND ${CMAKE_COMMAND}
+    ARGS -D KERNEL_SOURCE_DIR=${KERNEL_SOURCE_DIR} -D KERNELS_CPP=${KERNELS_CPP} -D KERNELS_H=${KERNELS_H} -D KERNEL_SOURCE_CLASS=${KERNEL_SOURCE_CLASS} -D KERNEL_FILE_EXTENSION=cc -P ${CMAKE_SOURCE_DIR}/cmake_modules/EncodeKernelFiles.cmake
+    DEPENDS ${COMMON_KERNELS}
+)
+SET_SOURCE_FILES_PROPERTIES(${KERNELS_CPP} ${KERNELS_H} PROPERTIES GENERATED TRUE)
+ADD_CUSTOM_TARGET(CommonKernels DEPENDS ${KERNELS_CPP} ${KERNELS_H})
+
+# Install headers
+
+FILE(GLOB CORE_HEADERS include/openmm/common/*.h)
+INSTALL_FILES(/include/openmm/common FILES ${CORE_HEADERS})
--- a/platforms/common/include/openmm/common/ArrayInterface.h
+++ b/platforms/common/include/openmm/common/ArrayInterface.h
+#ifndef OPENMM_ARRAYINTERFACE_H_
+#define OPENMM_ARRAYINTERFACE_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/OpenMMException.h"
+#include "openmm/common/windowsExportCommon.h"
+#include <vector>
+
+namespace OpenMM {
+
+class ComputeContext;
+
+/**
+ * This abstract class defines the interface for arrays stored on a computing device.
+ */
+
+class OPENMM_EXPORT_COMMON ArrayInterface {
+public:
+    virtual ~ArrayInterface() {
+    }
+    /**
+     * Initialize this array.
+     *
+     * @param context           the context for which to create the array
+     * @param size              the number of elements in the array
+     * @param elementSize       the size of each element in bytes
+     * @param name              the name of the array
+     */
+    virtual void initialize(ComputeContext& context, int size, int elementSize, const std::string& name) = 0;
+    /**
+     * Initialize this object.  The template argument is the data type of each array element.
+     *
+     * @param context           the context for which to create the array
+     * @param size              the number of elements in the array
+     * @param name              the name of the array
+     */
+    template <class T>
+    void initialize(ComputeContext& context, int size, const std::string& name) {
+        initialize(context, size, sizeof(T), name);
+    }
+    /**
+     * Recreate the internal storage to have a different size.
+     */
+    virtual void resize(int size) = 0;
+    /**
+     * Get whether this array has been initialized.
+     */
+    virtual bool isInitialized() const = 0;
+    /**
+     * Get the number of elements in the array.
+     */
+    virtual int getSize() const = 0;
+    /**
+     * Get the size of each element in bytes.
+     */
+    virtual int getElementSize() const = 0;
+    /**
+     * Get the name of the array.
+     */
+    virtual const std::string& getName() const = 0;
+    /**
+     * Get the context this array belongs to.
+     */
+    virtual ComputeContext& getContext() = 0;
+    /**
+     * Copy the values in a vector to the device memory.
+     * 
+     * @param data      the data in host memory to copy
+     * @param convert   if true, automatic conversions between single and double
+     *                  precision will be performed as necessary
+     */
+    template <class T>
+    void upload(const std::vector<T>& data, bool convert=false) {
+        if (convert && data.size() == getSize() && sizeof(T) != getElementSize()) {
+            if (sizeof(T) == 2*getElementSize()) {
+                // Convert values from double to single precision.
+                const double* d = reinterpret_cast<const double*>(&data[0]);
+                std::vector<float> v(getElementSize()*getSize()/sizeof(float));
+                for (int i = 0; i < v.size(); i++)
+                    v[i] = (float) d[i];
+                upload(&v[0], true);
+                return;
+            }
+            if (2*sizeof(T) == getElementSize()) {
+                // Convert values from single to double precision.
+                const float* d = reinterpret_cast<const float*>(&data[0]);
+                std::vector<double> v(getElementSize()*getSize()/sizeof(double));
+                for (int i = 0; i < v.size(); i++)
+                    v[i] = (double) d[i];
+                upload(&v[0], true);
+                return;
+            }
+        }
+        if (sizeof(T) != getElementSize() || data.size() != getSize())
+            throw OpenMMException("Error uploading array "+getName()+": The specified vector does not match the size of the array");
+        upload(&data[0], true);
+    }
+    /**
+     * Copy the values in the array to a vector.
+     */
+    template <class T>
+    void download(std::vector<T>& data) const {
+        if (sizeof(T) != getElementSize())
+            throw OpenMMException("Error downloading array "+getName()+": The specified vector has the wrong element size");
+        if (data.size() != getSize())
+            data.resize(getSize());
+        download(&data[0], true);
+    }
+    /**
+     * Copy the values from host memory to the array.
+     * 
+     * @param data     the data to copy
+     * @param blocking if true, this call will block until the transfer is complete.  Subclasses often
+     *                 have restrictions on non-blocking copies, such as that the source data must be
+     *                 in page-locked memory.
+     */
+    virtual void upload(const void* data, bool blocking=true) = 0;
+    /**
+     * Copy the values in the array to host memory.
+     * 
+     * @param data     the destination to copy the values to
+     * @param blocking if true, this call will block until the transfer is complete.  Subclasses often
+     *                 have restrictions on non-blocking copies, such as that the destination must be
+     *                 in page-locked memory.
+     */
+    virtual void download(void* data, bool blocking=true) const = 0;
+    /**
+     * Copy the values in this array to a second array.
+     * 
+     * @param dest     the destination array to copy to
+     */
+    virtual void copyTo(ArrayInterface& dest) const = 0;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_ARRAYINTERFACE_H_*/
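The precision conversion performed by the templated `upload()` above can be sketched on the host alone. This is a minimal illustration, not OpenMM code: `HostArray` is a hypothetical stand-in that keeps its storage in ordinary host memory, and `std::runtime_error` replaces `OpenMMException`. It mirrors the rule that conversion is attempted only when `convert` is true, the element counts match, and the host element is exactly twice or half the width of the array element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical host-memory stand-in for a device array; not part of OpenMM.
class HostArray {
public:
    HostArray(int size, int elementSize) : size(size), elementSize(elementSize), bytes((std::size_t) size*elementSize) {
        assert(size > 0 && elementSize > 0);
    }
    // Raw byte copy, analogous to upload(const void*, bool).
    void upload(const void* data) {
        std::memcpy(bytes.data(), data, bytes.size());
    }
    // Typed copy with optional conversion between single and double precision.
    template <class T>
    void upload(const std::vector<T>& data, bool convert=false) {
        if (convert && (int) data.size() == size && (int) sizeof(T) != elementSize) {
            if ((int) sizeof(T) == 2*elementSize) {
                // Host elements are twice as wide: narrow double to float.
                const double* d = reinterpret_cast<const double*>(data.data());
                std::vector<float> v(bytes.size()/sizeof(float));
                for (std::size_t i = 0; i < v.size(); i++)
                    v[i] = (float) d[i];
                upload(v.data());
                return;
            }
            if (2*(int) sizeof(T) == elementSize) {
                // Host elements are half as wide: widen float to double.
                const float* d = reinterpret_cast<const float*>(data.data());
                std::vector<double> v(bytes.size()/sizeof(double));
                for (std::size_t i = 0; i < v.size(); i++)
                    v[i] = (double) d[i];
                upload(v.data());
                return;
            }
        }
        if ((int) sizeof(T) != elementSize || (int) data.size() != size)
            throw std::runtime_error("The specified vector does not match the size of the array");
        upload(data.data());
    }
    // Typed copy back to the host; the element size must match exactly.
    template <class T>
    std::vector<T> download() const {
        if ((int) sizeof(T) != elementSize)
            throw std::runtime_error("The specified vector has the wrong element size");
        std::vector<T> result(size);
        std::memcpy(result.data(), bytes.data(), bytes.size());
        return result;
    }
private:
    int size, elementSize;
    std::vector<char> bytes;
};
```

With a three-element single precision `HostArray`, uploading a `std::vector<double>` with `convert=true` stores the narrowed values, while the same call with `convert=false` throws because the element sizes differ.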
--- a/platforms/common/include/openmm/common/BondedUtilities.h
+++ b/platforms/common/include/openmm/common/BondedUtilities.h
+#ifndef OPENMM_BONDEDUTILITIES_H_
+#define OPENMM_BONDEDUTILITIES_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2011-2019 Stanford University and the Authors.      *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/ArrayInterface.h"
+#include <string>
+#include <vector>
+
+namespace OpenMM {
+
+/**
+ * This abstract class defines an interface for computing bonded interactions.  Call
+ * getBondedUtilities() on a ComputeContext to get the BondedUtilities object for that
+ * context.
+ * 
+ * This class provides a generic mechanism for evaluating bonded interactions.  You write only
+ * the source code needed to compute one interaction, and this object takes care of creating
+ * and executing a complete kernel that loops over bonds, evaluates each one, and accumulates
+ * the resulting forces and energies.  This offers two advantages.  First, it simplifies the
+ * task of writing a new Force.  Second, it allows multiple forces to be evaluated by a single
+ * kernel, which reduces overhead and improves performance.
+ * 
+ * A "bonded interaction" means an interaction that affects a small, fixed set of particles.
+ * The interaction energy may depend on the positions of only those particles, and the list of
+ * particles forming a "bond" may not change with time.  Examples of bonded interactions
+ * include HarmonicBondForce, HarmonicAngleForce, and PeriodicTorsionForce.
+ * 
+ * To create a bonded interaction, call addInteraction().  You pass to it a block of source
+ * code for evaluating the interaction.  The inputs and outputs for that source code are as
+ * follows:
+ * 
+ * <ol>
+ * <li>The index of the bond being evaluated will have been stored in the unsigned int variable "index".</li>
+ * <li>The indices of the atoms forming that bond will have been stored in the unsigned int variables "atom1",
+ * "atom2", ....</li>
+ * <li>The positions of those atoms will have been stored in the real4 variables "pos1", "pos2", ....</li>
+ * <li>A real variable called "energy" will exist.  Your code should add the potential energy of the
+ * bond to that variable.</li>
+ * <li>Your code should define real3 variables called "force1", "force2", ... that contain the force to
+ * apply to each atom.</li>
+ * </ol>
+ * 
+ * As a simple example, the following source code would be used to implement a pairwise interaction of
+ * the form E=r^2:
+ * 
+ * <tt><pre>
+ * real3 delta = make_real3(pos2.x-pos1.x, pos2.y-pos1.y, pos2.z-pos1.z);
+ * energy += delta.x*delta.x + delta.y*delta.y + delta.z*delta.z;
+ * real3 force1 = 2.0f*delta;
+ * real3 force2 = -2.0f*delta;
+ * </pre></tt>
+ * 
+ * Interactions will often depend on parameters or other data.  Call addArgument() to provide the data
+ * to this class.  It will be passed to the interaction kernel as an argument, and you can refer to it
+ * from your interaction code.
+ */
+
+class OPENMM_EXPORT_COMMON BondedUtilities {
+public:
+    virtual ~BondedUtilities() {
+    }
+    /**
+     * Add a bonded interaction.
+     *
+     * @param atoms    this should have one entry for each bond, and that entry should contain the list
+     *                 of atoms involved in the bond.  Every entry must have the same number of atoms.
+     * @param source   the code to evaluate the interaction
+     * @param group    the force group in which the interaction should be calculated
+     */
+    virtual void addInteraction(const std::vector<std::vector<int> >& atoms, const std::string& source, int group) = 0;
+    /**
+     * Add an argument that should be passed to the interaction kernel.
+     * 
+     * @param data    the array containing the data to pass
+     * @param type    the data type contained in the memory (e.g. "float4")
+     * @return the name that will be used for the argument.  Any code you pass to addInteraction() should
+     * refer to it by this name.
+     */
+    virtual std::string addArgument(ArrayInterface& data, const std::string& type) = 0;
+    /**
+     * Register that the interaction kernel will be computing the derivative of the potential energy
+     * with respect to a parameter.
+     * 
+     * @param param   the name of the parameter
+     * @return the variable that will be used to accumulate the derivative.  Any code you pass to addInteraction() should
+     * add its contributions to this variable.
+     */
+    virtual std::string addEnergyParameterDerivative(const std::string& param) = 0;
+    /**
+     * Add some code that should be included in the program, before the start of the kernel.
+     * This can be used, for example, to define functions that will be called by the kernel.
+     * 
+     * @param source   the code to include
+     */
+    virtual void addPrefixCode(const std::string& source) = 0;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_BONDEDUTILITIES_H_*/
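The E=r^2 example in the BondedUtilities documentation above can be checked on the host. The following is a plain C++ sketch, not device code: `Vec3`, `PairResult`, `evaluatePair()` and `numericalForce1X()` are hypothetical names introduced here for illustration. It evaluates the energy and the two forces for one pair, and provides a central finite-difference estimate to confirm that `force1` is the negative gradient of the energy with respect to the first atom's position.

```cpp
#include <cassert>

// Hypothetical host-side 3-vector standing in for the device real3 type.
struct Vec3 {
    double x, y, z;
};

// Result of evaluating one bond: its energy and the force on each atom.
struct PairResult {
    double energy;
    Vec3 force1, force2;
};

// Evaluate E = r^2 for one pair, following the documented snippet.
PairResult evaluatePair(const Vec3& pos1, const Vec3& pos2) {
    Vec3 delta = {pos2.x-pos1.x, pos2.y-pos1.y, pos2.z-pos1.z};
    PairResult result;
    result.energy = delta.x*delta.x + delta.y*delta.y + delta.z*delta.z;
    // F1 = -dE/dpos1 = 2*delta, and F2 = -F1.
    result.force1 = Vec3{2*delta.x, 2*delta.y, 2*delta.z};
    result.force2 = Vec3{-2*delta.x, -2*delta.y, -2*delta.z};
    return result;
}

// Central finite-difference estimate of -dE/dx for atom 1, used as a check.
double numericalForce1X(const Vec3& pos1, const Vec3& pos2, double h) {
    assert(h > 0);
    Vec3 plus = pos1, minus = pos1;
    plus.x += h;
    minus.x -= h;
    return -(evaluatePair(plus, pos2).energy - evaluatePair(minus, pos2).energy)/(2*h);
}
```

For atoms at (0,0,0) and (1,2,2), `delta` is (1,2,2), so the energy is 9 and `force1` is (2,4,4), pointing from atom 1 toward atom 2.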
--- a/platforms/common/include/openmm/common/CommonKernels.h
+++ b/platforms/common/include/openmm/common/CommonKernels.h
--- a/platforms/common/include/openmm/common/ComputeArray.h
+++ b/platforms/common/include/openmm/common/ComputeArray.h
+#ifndef OPENMM_COMPUTEARRAY_H_
+#define OPENMM_COMPUTEARRAY_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/ArrayInterface.h"
+
+namespace OpenMM {
+
+/**
+ * This is an implementation of ArrayInterface that acts as a wrapper around a platform-specific
+ * array implementation (typically CudaArray or OpenCLArray).  This class can be used in code that
+ * is not platform-specific, and an appropriate implementation array is created automatically
+ * based on the ComputeContext.
+ */
+
+class OPENMM_EXPORT_COMMON ComputeArray : public ArrayInterface {
+public:
+    /**
+     * Create an uninitialized ComputeArray object.  It cannot be used until initialize() is called on it.
+     */
+    ComputeArray();
+    /**
+     * Release all resources allocated by this object.
+     */
+    ~ComputeArray();
+    /**
+     * Get the internal array this object is wrapping.
+     */
+    ArrayInterface& getArray();
+    /**
+     * Initialize this array.
+     *
+     * @param context           the context for which to create the array
+     * @param size              the number of elements in the array
+     * @param elementSize       the size of each element in bytes
+     * @param name              the name of the array
+     */
+    void initialize(ComputeContext& context, int size, int elementSize, const std::string& name);
+    /**
+     * Initialize this object.  The template argument is the data type of each array element.
+     *
+     * @param context           the context for which to create the array
+     * @param size              the number of elements in the array
+     * @param name              the name of the array
+     */
+    template <class T>
+    void initialize(ComputeContext& context, int size, const std::string& name) {
+        initialize(context, size, sizeof(T), name);
+    }
+    /**
+     * Recreate the internal storage to have a different size.
+     */
+    void resize(int size);
+    /**
+     * Get whether this array has been initialized.
+     */
+    bool isInitialized() const;
+    /**
+     * Get the number of elements in the array.
+     */
+    int getSize() const;
+    /**
+     * Get the size of each element in bytes.
+     */
+    int getElementSize() const;
+    /**
+     * Get the name of the array.
+     */
+    const std::string& getName() const;
+    /**
+     * Get the context this array belongs to.
+     */
+    ComputeContext& getContext();
+    /**
+     * Copy the values in a vector to the array.
+     */
+    template <class T>
+    void upload(const std::vector<T>& data, bool convert=false) {
+        ArrayInterface::upload(data, convert);
+    }
+    /**
+     * Copy the values in the array to a vector.
+     */
+    template <class T>
+    void download(std::vector<T>& data) const {
+        ArrayInterface::download(data);
+    }
+    /**
+     * Copy the values from host memory to the array.
+     * 
+     * @param data     the data to copy
+     * @param blocking if true, this call will block until the transfer is complete.  Subclasses often
+     *                 have restrictions on non-blocking copies, such as that the source data must be
+     *                 in page-locked memory.
+     */
+    void upload(const void* data, bool blocking=true);
+    /**
+     * Copy the values in the array to host memory.
+     * 
+     * @param data     the destination to copy the values to
+     * @param blocking if true, this call will block until the transfer is complete.  Subclasses often
+     *                 have restrictions on non-blocking copies, such as that the destination must be
+     *                 in page-locked memory.
+     */
+    void download(void* data, bool blocking=true) const;
+    /**
+     * Copy the values in this array to a second array.
+     * 
+     * @param dest     the destination array to copy to
+     */
+    void copyTo(ArrayInterface& dest) const;
+private:
+    ArrayInterface* impl;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMPUTEARRAY_H_*/
--- a/platforms/common/include/openmm/common/ComputeContext.h
+++ b/platforms/common/include/openmm/common/ComputeContext.h
--- a/platforms/common/include/openmm/common/ComputeEvent.h
+++ b/platforms/common/include/openmm/common/ComputeEvent.h
+#ifndef OPENMM_COMPUTEEVENT_H_
+#define OPENMM_COMPUTEEVENT_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include <memory>
+
+namespace OpenMM {
+
+/**
+ * This abstract class represents an event for synchronization between the host and
+ * device.  It is created by calling createEvent() on a ComputeContext, which returns
+ * an instance of a platform-specific subclass.  To use it, call enqueue() immediately
+ * after starting an asynchronous operation, such as a kernel invocation or non-blocking
+ * data transfer.  Then at a later point call wait().  This will cause the host to block
+ * until all operations started before the call to enqueue() have completed.
+ * 
+ * Instead of referring to this class directly, it is best to use a ComputeEvent, which is
+ * a typedef for a shared_ptr to a ComputeEventImpl.  This allows you to treat it as having
+ * value semantics, and frees you from having to manage memory.  
+ */
+
+class OPENMM_EXPORT_COMMON ComputeEventImpl {
+public:
+    virtual ~ComputeEventImpl() {
+    }
+    /**
+     * Place the event into the device's execution queue.
+     */
+    virtual void enqueue() = 0;
+    /**
+     * Block until all operations started before the call to enqueue() have completed.
+     */
+    virtual void wait() = 0;
+};
+
+typedef std::shared_ptr<ComputeEventImpl> ComputeEvent;
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMPUTEEVENT_H_*/
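The enqueue()/wait() contract described for ComputeEventImpl can be illustrated without a device. The sketch below is an assumption-laden model, not OpenMM code: `HostQueue` is a hypothetical in-order queue that defers its tasks until someone waits, and `HostEventImpl` and the `HostEvent` typedef are hypothetical counterparts of ComputeEventImpl and ComputeEvent. It shows that wait() only guarantees completion of work submitted before enqueue(), and that the shared_ptr typedef lets the event be copied freely.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical in-order queue that defers work until someone waits on it.
class HostQueue {
public:
    void submit(std::function<void()> task) {
        pending.push_back(std::move(task));
        submitted++;
    }
    // Run queued tasks in order until at least `target` have completed.
    void runUntil(std::size_t target) {
        assert(target <= submitted);
        while (completed < target && !pending.empty()) {
            pending.front()();
            pending.pop_front();
            completed++;
        }
    }
    std::size_t getSubmitted() const { return submitted; }
    std::size_t getCompleted() const { return completed; }
private:
    std::deque<std::function<void()> > pending;
    std::size_t submitted = 0, completed = 0;
};

// Hypothetical event: enqueue() marks a position in the queue, and wait()
// returns once everything submitted before that position has finished.
class HostEventImpl {
public:
    explicit HostEventImpl(HostQueue& queue) : queue(queue), marker(0) {
    }
    void enqueue() {
        marker = queue.getSubmitted();
    }
    void wait() {
        queue.runUntil(marker);
    }
private:
    HostQueue& queue;
    std::size_t marker;
};

typedef std::shared_ptr<HostEventImpl> HostEvent;
```

If one task is submitted, the event is enqueued, and a second task is submitted afterward, then calling wait() on any copy of the event completes the first task but leaves the second one pending.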
--- a/platforms/common/include/openmm/common/ComputeForceInfo.h
+++ b/platforms/common/include/openmm/common/ComputeForceInfo.h
+#ifndef OPENMM_COMPUTEFORCEINFO_H_
+#define OPENMM_COMPUTEFORCEINFO_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2009-2019 Stanford University and the Authors.      *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/windowsExportCommon.h"
+#include <vector>
+
+namespace OpenMM {
+
+/**
+ * ComputeForceInfo objects describe information about the behavior and requirements of
+ * a force.  They exist primarily to help a ComputeContext determine how particles can be
+ * reordered without affecting forces.  Force kernels create them during initialization
+ * and add them to the ComputeContext by calling addForce().
+ */
+
+class OPENMM_EXPORT_COMMON ComputeForceInfo {
+public:
+    ComputeForceInfo() {
+    }
+    /**
+     * Get whether or not two particles have identical force field parameters.
+     */
+    virtual bool areParticlesIdentical(int particle1, int particle2);
+    /**
+     * Get the number of particle groups defined by this force.
+     */
+    virtual int getNumParticleGroups();
+    /**
+     * Get the list of particles in a particular group.
+     */
+    virtual void getParticlesInGroup(int index, std::vector<int>& particles);
+    /**
+     * Get whether two particle groups are identical.
+     */
+    virtual bool areGroupsIdentical(int group1, int group2);
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMPUTEFORCEINFO_H_*/
--- a/platforms/common/include/openmm/common/ComputeKernel.h
+++ b/platforms/common/include/openmm/common/ComputeKernel.h
+#ifndef OPENMM_COMPUTEKERNEL_H_
+#define OPENMM_COMPUTEKERNEL_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/ArrayInterface.h"
+#include <memory>
+#include <string>
+#include <type_traits>
+
+namespace OpenMM {
+
+/**
+ * This abstract class represents a kernel that can be executed on a computing device.
+ * Call createKernel() on a ComputeProgramImpl to create an instance of a platform-specific
+ * subclass.  Then call addArg() to specify the values to pass for all of the kernel's arguments.
+ * Finally, call execute() to execute the kernel.  If you need to modify the values of kernel
+ * arguments between invocations, use setArg() to change the value of an argument.
+ * 
+ * Instead of referring to this class directly, it is best to use ComputeKernel, which is
+ * a typedef for a shared_ptr to a ComputeKernelImpl.  This allows you to treat it as having
+ * value semantics, and frees you from having to manage memory.  
+ */
+
+class OPENMM_EXPORT_COMMON ComputeKernelImpl {
+public:
+    virtual ~ComputeKernelImpl() {
+    }
+    /**
+     * Get the name of this kernel.
+     */
+    virtual std::string getName() const = 0;
+    /**
+     * Add an argument to pass to the kernel when it is invoked.
+     * 
+     * @param value     the value to pass to the kernel
+     */
+    template <class T>
+    typename std::enable_if<std::is_trivially_copyable<T>::value, void>::type addArg(const T& value) {
+        addPrimitiveArg(&value, sizeof(value));
+    }
+    /**
+     * Add an argument to pass to the kernel when it is invoked.
+     * 
+     * @param value     the value to pass to the kernel
+     */
+    void addArg(ArrayInterface& value) {
+        addArrayArg(value);
+    }
+    /**
+     * Add a placeholder for an argument without specifying its value.  The value must
+     * be provided by calling setArg() before the kernel is executed.
+     */
+    void addArg() {
+        addEmptyArg();
+    }
+    /**
+     * Set the value of an argument to pass to the kernel when it is invoked.
+     * 
+     * @param index     the index of the argument to set
+     * @param value     the value to pass to the kernel
+     */
+    template <class T>
+    typename std::enable_if<std::is_trivially_copyable<T>::value, void>::type setArg(int index, const T& value) {
+        setPrimitiveArg(index, &value, sizeof(value));
+    }
+    /**
+     * Set the value of an argument to pass to the kernel when it is invoked.
+     * 
+     * @param index     the index of the argument to set
+     * @param value     the value to pass to the kernel
+     */
+    void setArg(int index, ArrayInterface& value) {
+        setArrayArg(index, value);
+    }
+    /**
+     * Execute this kernel.
+     *
+     * @param threads      the maximum number of threads that should be used.  Depending on the
+     *                     computing device, it may choose to use fewer threads than this number.
+     * @param blockSize    the number of threads in each thread block.  If this is omitted, a
+     *                     default size that is appropriate for the computing device is used.
+     */
+    virtual void execute(int threads, int blockSize=-1) = 0;
+protected:
+    /**
+     * Add an argument to pass to the kernel when it is invoked, where the value is a
+     * subclass of ArrayInterface.
+     * 
+     * @param value     the value to pass to the kernel
+     */
+    virtual void addArrayArg(ArrayInterface& value) = 0;
+    /**
+     * Add an argument to pass to the kernel when it is invoked, where the value is a primitive type.
+     * 
+     * @param value    a pointer to the argument value
+     * @param size     the size of the value in bytes
+     */
+    virtual void addPrimitiveArg(const void* value, int size) = 0;
+    /**
+     * Add a placeholder for an argument without specifying its value.
+     */
+    virtual void addEmptyArg() = 0;
+    /**
+     * Set the value of an argument to pass to the kernel when it is invoked, where the value is a
+     * subclass of ArrayInterface.
+     * 
+     * @param index     the index of the argument to set
+     * @param value     the value to pass to the kernel
+     */
+    virtual void setArrayArg(int index, ArrayInterface& value) = 0;
+    /**
+     * Set the value of an argument to pass to the kernel when it is invoked, where the value is a primitive type.
+     * 
+     * @param index     the index of the argument to set
+     * @param value    a pointer to the argument value
+     * @param size     the size of the value in bytes
+     */
+    virtual void setPrimitiveArg(int index, const void* value, int size) = 0;
+};
+
+typedef std::shared_ptr<ComputeKernelImpl> ComputeKernel;
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMPUTEKERNEL_H_*/
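The way addArg() and setArg() dispatch between primitive and array arguments can be sketched with a small host-only model. Everything named here is hypothetical and exists only for illustration: `FakeArray` stands in for ArrayInterface, `FakeDeviceArray` for a platform-specific subclass, and `RecordingKernel` simply records what it was given instead of launching anything. The point of the sketch is the `std::enable_if` guard: because the array types have a virtual destructor, they are not trivially copyable, so the template overload is removed from consideration and the array overload is selected even for a derived class.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for ArrayInterface.  The virtual destructor makes it
// (and its subclasses) non-trivially-copyable, as with the real interface.
struct FakeArray {
    virtual ~FakeArray() {
    }
};
struct FakeDeviceArray : public FakeArray {
};

// Hypothetical kernel that records its arguments instead of executing.
class RecordingKernel {
public:
    struct Arg {
        enum Kind {EMPTY, PRIMITIVE, ARRAY};
        Kind kind;
        std::vector<char> bytes;   // used when kind == PRIMITIVE
        FakeArray* array;          // used when kind == ARRAY
    };
    // Primitive values: copied byte for byte.
    template <class T>
    typename std::enable_if<std::is_trivially_copyable<T>::value, void>::type addArg(const T& value) {
        args.push_back(makePrimitive(&value, sizeof(value)));
    }
    // Arrays: stored by reference.
    void addArg(FakeArray& value) {
        args.push_back(makeArray(value));
    }
    // Placeholder whose value must be supplied later with setArg().
    void addArg() {
        Arg arg;
        arg.kind = Arg::EMPTY;
        arg.array = nullptr;
        args.push_back(arg);
    }
    template <class T>
    typename std::enable_if<std::is_trivially_copyable<T>::value, void>::type setArg(int index, const T& value) {
        args.at(index) = makePrimitive(&value, sizeof(value));
    }
    void setArg(int index, FakeArray& value) {
        args.at(index) = makeArray(value);
    }
    const Arg& getArg(int index) const {
        return args.at(index);
    }
    int getNumArgs() const {
        return (int) args.size();
    }
private:
    static Arg makePrimitive(const void* value, int size) {
        assert(size > 0);
        const char* p = static_cast<const char*>(value);
        Arg arg;
        arg.kind = Arg::PRIMITIVE;
        arg.bytes.assign(p, p+size);
        arg.array = nullptr;
        return arg;
    }
    static Arg makeArray(FakeArray& value) {
        Arg arg;
        arg.kind = Arg::ARRAY;
        arg.array = &value;
        return arg;
    }
    std::vector<Arg> args;
};
```

A typical sequence in this model is to add an array, add a placeholder, add a scalar, and later fill the placeholder with setArg() using its index. Without the `enable_if` guard, passing a `FakeDeviceArray` would bind to the template overload (an exact match) rather than to the base-class reference overload.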
--- a/platforms/common/include/openmm/common/ComputeParameterInfo.h
+++ b/platforms/common/include/openmm/common/ComputeParameterInfo.h
+#ifndef OPENMM_COMPUTEPARAMETERINFO_H_
+#define OPENMM_COMPUTEPARAMETERINFO_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/ArrayInterface.h"
+#include <sstream>
+#include <string>
+
+namespace OpenMM {
+
+/**
+ * This class stores information about a parameter that can be passed to a kernel.
+ * It combines an ArrayInterface holding parameter values with additional information
+ * describing how to represent it in kernels: the variable name, the data type, etc.
+ * 
+ * The array is assumed to contain a parameter value for each of many objects (atoms,
+ * bonds, etc.).  Each value may in turn be a multi-component vector.  When creating
+ * a ComputeParameterInfo, specify the number of components in the vector and the
+ * type of each component.  For example, suppose you have an array of type float3
+ * containing a dipole moment for each atom.  The ComputeParameterInfo would be
+ * created like this:
+ * 
+ * ComputeParameterInfo parameter(dipoleArray, "dipole", "float", 3);
+ */
+
+class ComputeParameterInfo {
+public:
+    /**
+     * Create a ComputeParameterInfo.
+     *
+     * @param array          the array containing the parameter values
+     * @param name           the name of the variable to use for this parameter
+     * @param componentType  the data type of the parameter's components
+     * @param numComponents  the number of components in the parameter
+     * @param constant       whether the array memory should be marked as constant
+     */
+    ComputeParameterInfo(ArrayInterface& array, const std::string& name, const std::string& componentType, int numComponents, bool constant=true) :
+            array(array), name(name), componentType(componentType), numComponents(numComponents), constant(constant) {
+        if (numComponents == 1)
+            type = componentType;
+        else {
+            std::stringstream s;
+            s << componentType << numComponents;
+            type = s.str();
+        }
+    }
+    virtual ~ComputeParameterInfo() {
+    }
+    /**
+     * Get the array containing the parameter values.
+     */
+    ArrayInterface& getArray() {
+        return array;
+    }
+    /**
+     * Get the array containing the parameter values.
+     */
+    const ArrayInterface& getArray() const {
+        return array;
+    }
+    /**
+     * Get the name of the variable to use for this parameter.
+     */
+    const std::string& getName() const {
+        return name;
+    }
+    /**
+     * Get the data type of each component of the value.  For example, if getType() returns "float3",
+     * this will return "float".
+     */
+    const std::string& getComponentType() const {
+        return componentType;
+    }
+    /**
+     * Get the data type of each value.
+     */
+    const std::string& getType() const {
+        return type;
+    }
+    /**
+     * Get the number of components in each value.  If the values are not a vector
+     * type, this returns 1.
+     */
+    int getNumComponents() const {
+        return numComponents;
+    }
+    /**
+     * Get the size of each parameter value in bytes.
+     */
+    int getSize() const {
+        return array.getElementSize();
+    }
+    /**
+     * Get whether the array memory should be marked as constant.
+     */
+    bool isConstant() const {
+        return constant;
+    }
+private:
+    ArrayInterface& array;
+    std::string name;
+    std::string componentType;
+    std::string type;
+    int numComponents;
+    bool constant;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMPUTEPARAMETERINFO_H_*/
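Review note: the constructor above derives the full kernel type name from the component type and the component count. A minimal standalone sketch of that rule follows. The function name `makeKernelTypeName` is hypothetical and not part of this patch, and it uses `std::to_string` where the header uses a `std::stringstream`.

```cpp
#include <cassert>
#include <string>

// Hypothetical free function that mirrors the naming rule in the
// ComputeParameterInfo constructor: a scalar keeps the component type name,
// a vector appends the component count.
std::string makeKernelTypeName(const std::string& componentType, int numComponents) {
    assert(numComponents >= 1);
    if (numComponents == 1)
        return componentType;
    return componentType + std::to_string(numComponents);
}
```

Under this rule, the dipole example from the class comment ("float" with 3 components) would be declared in kernels as `float3`.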
--- a/platforms/common/include/openmm/common/ComputeParameterSet.h
+++ b/platforms/common/include/openmm/common/ComputeParameterSet.h
+#ifndef OPENMM_COMPUTEPARAMETERSET_H_
+#define OPENMM_COMPUTEPARAMETERSET_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2009-2019 Stanford University and the Authors.      *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/ArrayInterface.h"
+#include "openmm/common/ComputeContext.h"
+#include "openmm/common/ComputeParameterInfo.h"
+#include <string>
+#include <vector>
+
+namespace OpenMM {
+
+/**
+ * This class represents a set of floating point parameter values for a set of objects (particles, bonds, etc.).
+ * It automatically creates an appropriate set of arrays to hold the parameter values, based
+ * on the number of parameters required.
+ */
+
+class OPENMM_EXPORT_COMMON ComputeParameterSet {
+public:
+    /**
+     * Create a ComputeParameterSet.
+     *
+     * @param context          the context for which to create the parameter set
+     * @param numParameters    the number of parameters for each object
+     * @param numObjects       the number of objects to store parameter values for
+     * @param name             the name of the parameter set
+     * @param arrayPerParameter   if true, a separate array is created for each parameter.  If false,
+     *                            multiple parameters may be combined into a single array for efficiency.
+     * @param useDoublePrecision  if true, values are stored in double precision.  If false, they are stored in single precision.
+     */
+    ComputeParameterSet(ComputeContext& context, int numParameters, int numObjects, const std::string& name, bool arrayPerParameter=false, bool useDoublePrecision=false);
+    ~ComputeParameterSet();
+    /**
+     * Get the number of parameters.
+     */
+    int getNumParameters() const {
+        return numParameters;
+    }
+    /**
+     * Get the number of objects.
+     */
+    int getNumObjects() const {
+        return numObjects;
+    }
+    /**
+     * Get the values of all parameters.
+     *
+     * @param values on exit, values[i][j] contains the value of parameter j for object i
+     */
+    template <class T>
+    void getParameterValues(std::vector<std::vector<T> >& values);
+    /**
+     * Set the values of all parameters.
+     *
+     * @param values values[i][j] contains the value of parameter j for object i
+     */
+    template <class T>
+    void setParameterValues(const std::vector<std::vector<T> >& values);
+    /**
+     * Get a vector of ComputeParameterInfo objects which describe the arrays
+     * containing the data.
+     */
+    std::vector<ComputeParameterInfo>& getParameterInfos() {
+        return parameters;
+    }
+    /**
+     * Get a suffix to add to variable names when accessing a certain parameter.
+     *
+     * @param index         the index of the parameter
+     * @param extraSuffix   an extra suffix to add to the variable name
+     * @return the suffix to append
+     */
+    std::string getParameterSuffix(int index, const std::string& extraSuffix="") const;
+private:
+    ComputeContext& context;
+    int numParameters, numObjects, elementSize;
+    std::string name;
+    std::vector<ArrayInterface*> arrays;
+    std::vector<ComputeParameterInfo> parameters;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMPUTEPARAMETERSET_H_*/
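Review note: the header says that, when `arrayPerParameter` is false, several parameters may be combined into one array, but it does not say how. The sketch below shows one possible grouping: greedy packing into 4-component, then 2-component, then scalar arrays. This scheme is an assumption for illustration, not necessarily what ComputeParameterSet does, and `planArrayWidths` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns the number of components of each array that
// would be created to hold numParameters values per object.
std::vector<int> planArrayWidths(int numParameters, bool arrayPerParameter) {
    assert(numParameters >= 0);
    std::vector<int> widths;
    int remaining = numParameters;
    if (!arrayPerParameter) {
        // Pack as many parameters as possible into 4-component arrays.
        for (; remaining >= 4; remaining -= 4)
            widths.push_back(4);
        // At most three are left, so at most one 2-component array is needed.
        if (remaining >= 2) {
            widths.push_back(2);
            remaining -= 2;
        }
    }
    // Whatever is left gets one scalar array per parameter.
    for (; remaining > 0; remaining--)
        widths.push_back(1);
    return widths;
}
```

With this scheme, seven parameters need three arrays instead of seven, which reduces the number of kernel arguments and memory reads per object.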
--- a/platforms/common/include/openmm/common/ComputeProgram.h
+++ b/platforms/common/include/openmm/common/ComputeProgram.h
+#ifndef OPENMM_COMPUTEPROGRAM_H_
+#define OPENMM_COMPUTEPROGRAM_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/ComputeKernel.h"
+#include <memory>
+
+namespace OpenMM {
+
+/**
+ * This abstract class represents a compiled program that can be executed on a computing
+ * device.  A ComputeProgramImpl is created by calling compileProgram() on a ComputeContext,
+ * which returns an instance of a platform-specific subclass.  The source code for a
+ * ComputeProgramImpl typically contains one or more kernels.  Call createKernel() to get
+ * ComputeKernels for the kernels, which can then be executed.
+ * 
+ * Instead of referring to this class directly, it is best to use ComputeProgram, which is
+ * a typedef for a shared_ptr to a ComputeProgramImpl.  This allows you to treat it as having
+ * value semantics, and frees you from having to manage memory.  
+ */
+
+class OPENMM_EXPORT_COMMON ComputeProgramImpl {
+public:
+    virtual ~ComputeProgramImpl() {
+    }
+    /**
+     * Create a ComputeKernel for one of the kernels in this program.
+     * 
+     * @param name    the name of the kernel to get
+     */
+    virtual ComputeKernel createKernel(const std::string& name) = 0;
+};
+
+typedef std::shared_ptr<ComputeProgramImpl> ComputeProgram;
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMPUTEPROGRAM_H_*/
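Review note: the class comment recommends using the `shared_ptr` typedef instead of the abstract class. The sketch below shows that handle pattern with stand-in types. `ProgramImpl`, `Program`, `FakeProgram`, and `compileFakeProgram` are all hypothetical and only imitate the shape of `ComputeProgramImpl` and `ComputeProgram`.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stand-in for an abstract program class; not the OpenMM API.
class ProgramImpl {
public:
    virtual ~ProgramImpl() {
    }
    virtual std::string describeKernel(const std::string& name) = 0;
};

// The handle type that callers pass around by value.
typedef std::shared_ptr<ProgramImpl> Program;

// Stand-in for a platform-specific subclass.
class FakeProgram : public ProgramImpl {
public:
    std::string describeKernel(const std::string& name) {
        return "fake:" + name;
    }
};

// Stand-in for a factory method such as a context's compile function.
Program compileFakeProgram() {
    Program program(new FakeProgram());
    assert(program);
    return program;
}
```

Copying a `Program` copies only the handle. Both copies refer to the same object, which is destroyed when the last handle goes away, so callers never call `delete`.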
--- a/platforms/common/include/openmm/common/ComputeVectorTypes.h
+++ b/platforms/common/include/openmm/common/ComputeVectorTypes.h
+#ifndef OPENMM_COMPUTEVECTORTYPES_H_
+#define OPENMM_COMPUTEVECTORTYPES_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+namespace OpenMM {
+
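+/**
+ * Host side vector types.  Note that the three component types declare a fourth
+ * member, w, which their constructors leave uninitialized.  It acts as padding, so
+ * each three component type has the same size and layout as the corresponding
+ * four component type.
+ */
+
+struct mm_short2 {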
+    short x, y;
+    mm_short2() {
+    }
+    mm_short2(short x, short y) : x(x), y(y) {
+    }
+};
+struct mm_short3 {
+    short x, y, z, w;
+    mm_short3() {
+    }
+    mm_short3(short x, short y, short z) : x(x), y(y), z(z) {
+    }
+};
+struct mm_short4 {
+    short x, y, z, w;
+    mm_short4() {
+    }
+    mm_short4(short x, short y, short z, short w) : x(x), y(y), z(z), w(w) {
+    }
+};
+struct mm_int2 {
+    int x, y;
+    mm_int2() {
+    }
+    mm_int2(int x, int y) : x(x), y(y) {
+    }
+};
+struct mm_int3 {
+    int x, y, z, w;
+    mm_int3() {
+    }
+    mm_int3(int x, int y, int z) : x(x), y(y), z(z) {
+    }
+};
+struct mm_int4 {
+    int x, y, z, w;
+    mm_int4() {
+    }
+    mm_int4(int x, int y, int z, int w) : x(x), y(y), z(z), w(w) {
+    }
+};
+struct mm_float2 {
+    float x, y;
+    mm_float2() {
+    }
+    mm_float2(float x, float y) : x(x), y(y) {
+    }
+};
+struct mm_float3 {
+    float x, y, z, w;
+    mm_float3() {
+    }
+    mm_float3(float x, float y, float z) : x(x), y(y), z(z) {
+    }
+};
+struct mm_float4 {
+    float x, y, z, w;
+    mm_float4() {
+    }
+    mm_float4(float x, float y, float z, float w) : x(x), y(y), z(z), w(w) {
+    }
+};
+struct mm_double2 {
+    double x, y;
+    mm_double2() {
+    }
+    mm_double2(double x, double y) : x(x), y(y) {
+    }
+};
+struct mm_double3 {
+    double x, y, z, w;
+    mm_double3() {
+    }
+    mm_double3(double x, double y, double z) : x(x), y(y), z(z) {
+    }
+};
+struct mm_double4 {
+    double x, y, z, w;
+    mm_double4() {
+    }
+    mm_double4(double x, double y, double z, double w) : x(x), y(y), z(z), w(w) {
+    }
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_COMPUTEVECTORTYPES_H_*/
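Review note: the three component structs above carry a fourth member. The sketch below checks the consequence for buffer layout using stand-in structs that copy the member lists of `mm_float3` and `mm_float4`. The names `PaddedFloat3`, `Float4`, and `paddedElementOffset` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins with the same members as mm_float3 and mm_float4 above.
struct PaddedFloat3 {
    float x, y, z, w; // w is unused padding
};
struct Float4 {
    float x, y, z, w;
};

// The padded 3-vector occupies exactly as much memory as the 4-vector,
// and its members sit at the same offsets.
static_assert(sizeof(PaddedFloat3) == sizeof(Float4), "sizes should match");
static_assert(offsetof(PaddedFloat3, z) == offsetof(Float4, z), "offsets should match");

// Byte offset of element `index` in a tightly packed buffer of padded 3-vectors.
std::size_t paddedElementOffset(std::size_t index) {
    return index*sizeof(PaddedFloat3);
}
```

One consequence is that a host buffer of three component values has the same stride as a buffer of four component values, so code that sizes or indexes such buffers should use `sizeof` on the struct and not `3*sizeof(float)`.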
--- a/platforms/common/include/openmm/common/ExpressionUtilities.h
+++ b/platforms/common/include/openmm/common/ExpressionUtilities.h
+#ifndef OPENMM_EXPRESSIONUTILITIES_H_
+#define OPENMM_EXPRESSIONUTILITIES_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2009-2019 Stanford University and the Authors.      *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "ComputeContext.h"
+#include "openmm/TabulatedFunction.h"
+#include "lepton/CustomFunction.h"
+#include "lepton/ExpressionTreeNode.h"
+#include "lepton/ParsedExpression.h"
+#include <map>
+#include <sstream>
+#include <string>
+#include <utility>
+#include <vector>
+
+namespace OpenMM {
+
+/**
+ * This class is used by various classes to generate kernel source code implementing
+ * user defined mathematical expressions.
+ */
+
+class OPENMM_EXPORT_COMMON ExpressionUtilities {
+public:
+    ExpressionUtilities(ComputeContext& context);
+    /**
+     * Generate the source code for calculating a set of expressions.
+     *
+     * @param expressions    the expressions to generate code for (keys are the variables to store the output values in)
+     * @param variables      defines the source code to generate for each variable that may appear in the expressions.  Keys are
+     *                       variable names, and the values are the code to generate for them.
+     * @param functions      the tabulated functions that may appear in the expressions
+     * @param functionNames  defines the variable name for each tabulated function that may appear in the expressions
+     * @param prefix         a prefix to put in front of temporary variables
+     * @param tempType       the type of value to use for temporary variables (defaults to "real")
+     */
+    std::string createExpressions(const std::map<std::string, Lepton::ParsedExpression>& expressions, const std::map<std::string, std::string>& variables,
+            const std::vector<const TabulatedFunction*>& functions, const std::vector<std::pair<std::string, std::string> >& functionNames,
+            const std::string& prefix, const std::string& tempType="real");
+    /**
+     * Generate the source code for calculating a set of expressions.
+     *
+     * @param expressions    the expressions to generate code for (keys are the variables to store the output values in)
+     * @param variables      defines the source code to generate for each variable or precomputed sub-expression that may appear in the expressions.
+     *                       Each entry pairs an ExpressionTreeNode with the code to generate wherever an identical node appears.
+     * @param functions      the tabulated functions that may appear in the expressions
+     * @param functionNames  defines the variable name for each tabulated function that may appear in the expressions
+     * @param prefix         a prefix to put in front of temporary variables
+     * @param tempType       the type of value to use for temporary variables (defaults to "real")
+     */
+    std::string createExpressions(const std::map<std::string, Lepton::ParsedExpression>& expressions, const std::vector<std::pair<Lepton::ExpressionTreeNode, std::string> >& variables,
+            const std::vector<const TabulatedFunction*>& functions, const std::vector<std::pair<std::string, std::string> >& functionNames,
+            const std::string& prefix, const std::string& tempType="real");
+    /**
+     * Calculate the spline coefficients for a tabulated function that appears in expressions.
+     *
+     * @param function   the function for which to compute coefficients
+     * @param width      on output, the number of floats used for each value
+     * @return the spline coefficients
+     */
+    std::vector<float> computeFunctionCoefficients(const TabulatedFunction& function, int& width);
+    /**
+     * Get a Lepton::CustomFunction that can be used to represent a TabulatedFunction when parsing expressions.
+     * 
+     * @param function   the function for which to get a placeholder
+     */
+    Lepton::CustomFunction* getFunctionPlaceholder(const TabulatedFunction& function);
+    /**
+     * Get a Lepton::CustomFunction that can be used to represent the periodicdistance() function when parsing expressions.
+     */
+    Lepton::CustomFunction* getPeriodicDistancePlaceholder();
+private:
+    class FunctionPlaceholder : public Lepton::CustomFunction {
+        public:
+            FunctionPlaceholder(int numArgs) : numArgs(numArgs) {
+            }
+            int getNumArguments() const {
+                return numArgs;
+            }
+            double evaluate(const double* arguments) const {
+                return 0.0;
+            }
+            double evaluateDerivative(const double* arguments, const int* derivOrder) const {
+                return 0.0;
+            }
+            CustomFunction* clone() const {
+                return new FunctionPlaceholder(numArgs);
+            }
+        private:
+            int numArgs;
+    };
+    void processExpression(std::stringstream& out, const Lepton::ExpressionTreeNode& node,
+            std::vector<std::pair<Lepton::ExpressionTreeNode, std::string> >& temps,
+            const std::vector<const TabulatedFunction*>& functions, const std::vector<std::pair<std::string, std::string> >& functionNames,
+            const std::string& prefix, const std::vector<std::vector<double> >& functionParams, const std::vector<Lepton::ParsedExpression>& allExpressions, const std::string& tempType);
+    std::string getTempName(const Lepton::ExpressionTreeNode& node, const std::vector<std::pair<Lepton::ExpressionTreeNode, std::string> >& temps);
+    void findRelatedCustomFunctions(const Lepton::ExpressionTreeNode& node, const Lepton::ExpressionTreeNode& searchNode,
+            std::vector<const Lepton::ExpressionTreeNode*>& nodes);
+    void findRelatedPowers(const Lepton::ExpressionTreeNode& node, const Lepton::ExpressionTreeNode& searchNode,
+            std::map<int, const Lepton::ExpressionTreeNode*>& powers);
+    void callFunction(std::stringstream& out, std::string singleFn, std::string doubleFn, const std::string& arg, const std::string& tempType);
+    void callFunction2(std::stringstream& out, std::string singleFn, std::string doubleFn, const std::string& arg1, const std::string& arg2, const std::string& tempType);
+    std::vector<std::vector<double> > computeFunctionParameters(const std::vector<const TabulatedFunction*>& functions);
+    ComputeContext& context;
+    FunctionPlaceholder fp1, fp2, fp3, periodicDistance;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_EXPRESSIONUTILITIES_H_*/
--- a/platforms/common/include/openmm/common/IntegrationUtilities.h
+++ b/platforms/common/include/openmm/common/IntegrationUtilities.h
+#ifndef OPENMM_INTEGRATIONUTILITIES_H_
+#define OPENMM_INTEGRATIONUTILITIES_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2009-2019 Stanford University and the Authors.      *
+ * Authors: Peter Eastman                                                     *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/ComputeArray.h"
+#include "openmm/common/ComputeKernel.h"
+#include "openmm/common/ComputeVectorTypes.h"
+#include "openmm/System.h"
+#include <iosfwd>
+#include <map>
+
+namespace OpenMM {
+
+class ComputeContext;
+
+/**
+ * This class implements features that are used by many different integrators, including
+ * common workspace arrays, random number generation, and enforcing constraints.
+ */
+
+class OPENMM_EXPORT_COMMON IntegrationUtilities {
+public:
+    IntegrationUtilities(ComputeContext& context, const System& system);
+    virtual ~IntegrationUtilities() {
+    }
+    /**
+     * Get the array which contains position deltas.  These are the amounts by
+     * which the position of each atom will change in the current step.  The actual
+     * positions should not be modified until after constraints have been applied.
+     */
+    virtual ArrayInterface& getPosDelta() = 0;
+    /**
+     * Get the array which contains random values.  Each element is a float4 whose components
+     * are independent, normally distributed random numbers with mean 0 and variance 1.
+     * Be sure to call initRandomNumberGenerator() and prepareRandomNumbers() before
+     * accessing this array.
+     */
+    virtual ArrayInterface& getRandom() = 0;
+    /**
+     * Get the array which contains the current step size.
+     */
+    virtual ArrayInterface& getStepSize() = 0;
+    /**
+     * Set the size to use for the next step.
+     */
+    void setNextStepSize(double size);
+    /**
+     * Get the size that was used for the last step.
+     */
+    double getLastStepSize();
+    /**
+     * Apply constraints to the atom positions.  When calling this method, the
+     * context's array of positions should contain the positions at the start of the
+     * step, and the array returned by getPosDelta() should contain the intended
+     * change to each position.  This method modifies the position deltas so that,
+     * once they are added to the positions, constraints will be satisfied.
+     *
+     * @param tol             the constraint tolerance
+     */
+    void applyConstraints(double tol);
+    /**
+     * Apply constraints to the atom velocities.
+     *
+     * @param tol             the constraint tolerance
+     */
+    void applyVelocityConstraints(double tol);
+    /**
+     * Initialize the random number generator.  This should be called once when the
+     * context is first created.  Subsequent calls will be ignored if the random
+     * seed is the same as on the first call, or throw an exception if the random
+     * seed is different.
+     */
+    void initRandomNumberGenerator(unsigned int randomNumberSeed);
+    /**
+     * Ensure that sufficient random numbers are available in the array, and generate new ones if not.
+     *
+     * @param numValues     the number of random float4 values that will be required
+     * @return the index in the array at which to start reading
+     */
+    int prepareRandomNumbers(int numValues);
+    /**
+     * Compute the positions of virtual sites.
+     */
+    void computeVirtualSites();
+    /**
+     * Distribute forces from virtual sites to the atoms they are based on.
+     */
+    virtual void distributeForcesFromVirtualSites() = 0;
+    /**
+     * Create a checkpoint recording the current state of the random number generator.
+     * 
+     * @param stream    an output stream the checkpoint data should be written to
+     */
+    void createCheckpoint(std::ostream& stream);
+    /**
+     * Load a checkpoint that was written by createCheckpoint().
+     * 
+     * @param stream    an input stream the checkpoint data should be read from
+     */
+    void loadCheckpoint(std::istream& stream);
+    /**
+     * Compute the kinetic energy of the system, possibly shifting the velocities in time to account
+     * for a leapfrog integrator.
+     * 
+     * @param timeShift   the amount by which to shift the velocities in time
+     */
+    double computeKineticEnergy(double timeShift);
+    /**
+     * Get the data structure that holds the state of all Nose-Hoover chains.
+     */
+    std::map<int, ComputeArray>& getNoseHooverChainState() {
+        return noseHooverChainState;
+    }
+protected:
+    virtual void applyConstraintsImpl(bool constrainVelocities, double tol) = 0;
+    ComputeContext& context;
+    ComputeKernel settlePosKernel, settleVelKernel;
+    ComputeKernel shakePosKernel, shakeVelKernel;
+    ComputeKernel ccmaDirectionsKernel, ccmaPosForceKernel, ccmaVelForceKernel;
+    ComputeKernel ccmaMultiplyKernel, ccmaUpdateKernel;
+    ComputeKernel vsitePositionKernel, vsiteForceKernel, vsiteSaveForcesKernel;
+    ComputeKernel randomKernel, timeShiftKernel;
+    ComputeArray posDelta;
+    ComputeArray settleAtoms;
+    ComputeArray settleParams;
+    ComputeArray shakeAtoms;
+    ComputeArray shakeParams;
+    ComputeArray random;
+    ComputeArray randomSeed;
+    ComputeArray stepSize;
+    ComputeArray ccmaAtoms;
+    ComputeArray ccmaDistance;
+    ComputeArray ccmaReducedMass;
+    ComputeArray ccmaAtomConstraints;
+    ComputeArray ccmaNumAtomConstraints;
+    ComputeArray ccmaConstraintMatrixColumn;
+    ComputeArray ccmaConstraintMatrixValue;
+    ComputeArray ccmaDelta1;
+    ComputeArray ccmaDelta2;
+    ComputeArray ccmaConverged;
+    ComputeArray vsite2AvgAtoms;
+    ComputeArray vsite2AvgWeights;
+    ComputeArray vsite3AvgAtoms;
+    ComputeArray vsite3AvgWeights;
+    ComputeArray vsiteOutOfPlaneAtoms;
+    ComputeArray vsiteOutOfPlaneWeights;
+    ComputeArray vsiteLocalCoordsIndex;
+    ComputeArray vsiteLocalCoordsAtoms;
+    ComputeArray vsiteLocalCoordsWeights;
+    ComputeArray vsiteLocalCoordsPos;
+    ComputeArray vsiteLocalCoordsStartIndex;
+    std::map<int, ComputeArray> noseHooverChainState;
+    int randomPos, lastSeed, numVsites;
+    bool hasOverlappingVsites;
+    mm_double2 lastStepSize;
+    struct ShakeCluster;
+    struct ConstraintOrderer;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_INTEGRATIONUTILITIES_H_*/
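Review note: the comments on `initRandomNumberGenerator()` and `prepareRandomNumbers()` describe a contract: repeated initialization with the same seed is ignored, a different seed throws, and `prepareRandomNumbers()` returns the index at which to start reading, refilling the array when too few values remain. The sketch below models only that contract on the host. `RandomPoolSketch` and its members are hypothetical, the refill is a counter instead of a kernel launch, and the internal bookkeeping is an assumption.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical host-side model of the documented contract; not the OpenMM class.
class RandomPoolSketch {
public:
    // The pool starts out exhausted so that the first request triggers a refill.
    explicit RandomPoolSketch(int capacity) : capacity(capacity), position(capacity), seed(0), initialized(false), numRefills(0) {
        assert(capacity > 0);
    }
    void init(unsigned int newSeed) {
        if (initialized) {
            // A repeated call is a no-op for the same seed and an error otherwise.
            if (newSeed != seed)
                throw std::runtime_error("Random number generator was already initialized with a different seed");
            return;
        }
        seed = newSeed;
        initialized = true;
    }
    // Returns the index at which the caller should start reading.
    int prepare(int numValues) {
        if (!initialized)
            throw std::runtime_error("Call init() before prepare()");
        if (numValues > capacity)
            throw std::runtime_error("Requested more values than the pool can hold");
        if (position+numValues > capacity) {
            // Not enough unused values remain: regenerate and restart from 0.
            numRefills++;
            position = 0;
        }
        int start = position;
        position += numValues;
        return start;
    }
    int getNumRefills() const {
        return numRefills;
    }
private:
    int capacity, position;
    unsigned int seed;
    bool initialized;
    int numRefills;
};
```

With a capacity of 10, three successive requests for 4 values would start at indices 0, 4, and 0 again, since the third request does not fit in the 2 values that remain.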