Merge pull request #4632 from ex-rzr/make-hip-standard-platform

HIP platform

3b8df952 · Peter Eastman · GitHub · 5ce6a85d · 28fb2918 · 3b8df952
Commit 3b8df952 authored Sep 05, 2024 by Peter Eastman; committed by GitHub Sep 05, 2024
20 changed files
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -353,11 +353,24 @@ IF(OPENMM_BUILD_OPENCL_LIB)
    ADD_SUBDIRECTORY(platforms/opencl)
 ENDIF(OPENMM_BUILD_OPENCL_LIB)

+# HIP platform
+
+LIST(APPEND CMAKE_PREFIX_PATH $ENV{ROCM_PATH} /opt/rocm)
+FIND_PACKAGE(HIP CONFIG QUIET)
+IF(HIP_FOUND)
+    SET(OPENMM_BUILD_HIP_LIB ON CACHE BOOL "Build OpenMMHIP library for AMD GPUs")
+ELSE(HIP_FOUND)
+    SET(OPENMM_BUILD_HIP_LIB OFF CACHE BOOL "Build OpenMMHIP library for AMD GPUs")
+ENDIF(HIP_FOUND)
+IF(OPENMM_BUILD_HIP_LIB)
+    ADD_SUBDIRECTORY(platforms/hip)
+ENDIF(OPENMM_BUILD_HIP_LIB)
+
 # Common compute files

 SET(OPENMM_BUILD_COMMON OFF CACHE BOOL "Build common files even if CUDA or OpenCL platforms are not built")

-IF(OPENMM_BUILD_CUDA_LIB OR OPENMM_BUILD_OPENCL_LIB OR OPENMM_BUILD_COMMON)
+IF(OPENMM_BUILD_CUDA_LIB OR OPENMM_BUILD_OPENCL_LIB OR OPENMM_BUILD_HIP_LIB OR OPENMM_BUILD_COMMON)
    ADD_SUBDIRECTORY(platforms/common)
 ENDIF()


--- a/docs-source/licenses/Licenses.txt
+++ b/docs-source/licenses/Licenses.txt
@@ -177,3 +177,31 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE.
+
+
+9. VkFFT
+
+OpenMM uses the VkFFT library by Dmitrii Tolmachev.  It may be used under the
+terms of the MIT License:
+
+MIT License
+
+Copyright (c) 2020 - present Dmitrii Tolmachev
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/docs-source/usersguide/application/05_add_on_packages.rst
+++ b/docs-source/usersguide/application/05_add_on_packages.rst
@@ -104,17 +104,6 @@ For more information,  see the OpenMMTools_ website.

 .. _OpenMMTools: https://github.com/choderalab/openmmtools

-OpenMM-HIP
-**********
-
-This package adds a new platform that is implemented with AMD's HIP framework.
-When running on AMD GPUs, it often has much faster performance than the OpenCL
-platform.  For information about how to install it, see the OpenMM-HIP_ website.
-Once it is installed, the new platform can be selected and used exactly like the
-ones included in the main OpenMM package.
-
-.. _OpenMM-HIP: https://github.com/StreamHPC/openmm-hip
-
 openmmforcefields
 *****************


--- a/docs-source/usersguide/library/01_introduction.rst
+++ b/docs-source/usersguide/library/01_introduction.rst
@@ -46,7 +46,7 @@ license.  This is a very permissive license which allows them to be used in
 almost any way, requiring only that you retain the copyright notice and
 disclaimer when distributing them.

-The CUDA and OpenCL platforms are distributed under the GNU Lesser General
+The CUDA, HIP, and OpenCL platforms are distributed under the GNU Lesser General
 Public License (LGPL).  This also allows you to use, modify, and distribute them
 in any way you want, but it requires you to also distribute the source code for
 your modifications.  This restriction applies only to modifications to OpenMM
@@ -280,8 +280,8 @@ simulation; it is a fairly generic computational API.  In addition to defining
 the generic classes, OpenMM also defines abstract subclasses of KernelImpl
 corresponding to specific calculations.  For example, there is a class called
 CalcHarmonicBondForceKernel to implement HarmonicBondForce and a class called
-IntegrateLangevinStepKernel to implement LangevinIntegrator.  It is these
-classes for which each Platform must provide a concrete subclass.
+IntegrateLangevinMiddleStepKernel to implement LangevinMiddleIntegrator.  It is
+these classes for which each Platform must provide a concrete subclass.

 This architecture is designed to allow easy extensibility.  To support a new
 hardware platform, for example, you create concrete subclasses of all the
@@ -330,6 +330,9 @@ conventional CPUs.
 **CudaPlatform**\ : This platform is implemented using the CUDA language, and
 performs calculations on Nvidia GPUs.

+**HipPlatform**\ : This platform is implemented using the HIP language, and
+performs calculations on ROCm-compatible AMD GPUs.
+
 **OpenCLPlatform**\ : This platform is implemented using the OpenCL language,
 and performs calculations on a variety of types of GPUs and CPUs.

@@ -343,8 +346,8 @@ The choice of which platform to use for a simulation depends on various factors:
   some older computers.  Also, for simulations that use certain features
   (primarily the various “custom” force classes), it may be faster to use the
   OpenCL platform running on the CPU.
-#. The CUDA platform can only be used with NVIDIA GPUs.  For using an AMD or
-   Intel GPU, use the OpenCL platform.
-#. The AMOEBA force field only works with the CUDA platform, not with the OpenCL
-   platform.  It also works with the Reference and CPU platforms, but the performance
-   is usually too slow to be useful on those platforms.
+#. The CUDA platform can be used with NVIDIA GPUs.  For an AMD GPU, use the
+   HIP platform (or the OpenCL platform, which is usually slower).  For an
+   Intel GPU, use the OpenCL platform.
+#. The AMOEBA force field works with all platforms, but the performance
+   of the Reference and CPU platforms is usually too slow to be useful.
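
Reviewer note: the platform guidance in the revised list can be summarized as a small lookup. The sketch below is illustrative only. `GpuVendor` and `preferredPlatforms` are hypothetical names that do not exist in OpenMM, and the name strings ("CUDA", "HIP", "OpenCL") are assumed to match the names under which the platforms register. Listing OpenCL as a fallback on NVIDIA hardware is also an assumption, drawn from the description of OpenCLPlatform rather than from the list itself.

```cpp
#include <string>
#include <vector>

// Hypothetical enum for illustration; OpenMM does not define this type.
enum class GpuVendor { Nvidia, Amd, Intel };

// Return platform names in order of preference for a GPU vendor,
// following the guidance in the revised documentation text.
// The name strings are assumed, not taken from this diff.
std::vector<std::string> preferredPlatforms(GpuVendor vendor) {
    switch (vendor) {
        case GpuVendor::Nvidia:
            return {"CUDA", "OpenCL"};   // OpenCL fallback is an assumption
        case GpuVendor::Amd:
            return {"HIP", "OpenCL"};    // OpenCL is described as usually slower
        case GpuVendor::Intel:
            return {"OpenCL"};
    }
    return {};  // unreachable for valid enum values
}
```

A caller would typically try each returned name in order and keep the first platform that loads successfully.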
--- a/docs-source/usersguide/library/02_compiling.rst
+++ b/docs-source/usersguide/library/02_compiling.rst
@@ -167,12 +167,6 @@ fraction of the time.  These tests will say so in the error message:
    exception: Assertion failure at TestReferenceLangevinIntegrator.cpp:129.  Expected 9.97741,
        found 10.7884 (This test is stochastic and may occasionally fail)

-If you get an error message such as :code:`exception: Error launching CUDA compiler: 32512` you need
-to specify the path to the CUDA compiler (nvcc) using the :code:`OPENMM_CUDA_COMPILER` environment
-variable, for example using something like the following::
-
-    OPENMM_CUDA_COMPILER=/<path_to_custom_cuda_dir>/nvcc
-
 Step 3: Install
 ===============
 Install your local build of OpenMM using the following command::

--- a/libraries/lepton/src/ParsedExpression.cpp
+++ b/libraries/lepton/src/ParsedExpression.cpp
@@ -247,7 +247,7 @@ ExpressionTreeNode ParsedExpression::substituteSimplerExpression(const Expressio
        case Operation::DIVIDE:
        {
            if (children[0] == children[1])
-                return ExpressionTreeNode(new Operation::Constant(1.0)); // Dividing anything from itself is 0
+                return ExpressionTreeNode(new Operation::Constant(1.0)); // Dividing anything by itself is 1
            if (first_const && first == 0.0) // 0 divided by something
                return ExpressionTreeNode(new Operation::Constant(0.0));
            if (first_const && first == 1.0) // 1 divided by something
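
Reviewer note: the corrected comment describes a rewrite rule (an expression divided by itself becomes the constant 1), which is checked before the rule for a zero numerator. The sketch below reproduces that ordering on a deliberately tiny expression type. `Expr`, `sameTree` and `simplifyDivide` are hypothetical stand-ins, not Lepton's `ExpressionTreeNode` or `Operation` classes, and the equality test here is a plain structural comparison.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature expression tree, for illustration only.
struct Expr {
    enum class Kind { Constant, Variable, Divide };
    Kind kind;
    double value;                // meaningful when kind == Constant
    std::string name;            // meaningful when kind == Variable
    std::vector<Expr> children;  // two entries when kind == Divide
};

Expr constant(double v) { return {Expr::Kind::Constant, v, "", {}}; }
Expr variable(const std::string& n) { return {Expr::Kind::Variable, 0.0, n, {}}; }
Expr divide(const Expr& a, const Expr& b) { return {Expr::Kind::Divide, 0.0, "", {a, b}}; }

// Structural equality of two trees.
bool sameTree(const Expr& a, const Expr& b) {
    if (a.kind != b.kind || a.children.size() != b.children.size())
        return false;
    if (a.kind == Expr::Kind::Constant && a.value != b.value)
        return false;
    if (a.kind == Expr::Kind::Variable && a.name != b.name)
        return false;
    for (std::size_t i = 0; i < a.children.size(); i++)
        if (!sameTree(a.children[i], b.children[i]))
            return false;
    return true;
}

// Apply the two division rules in the same order as the hunk above.
Expr simplifyDivide(const Expr& e) {
    if (e.kind != Expr::Kind::Divide)
        return e;
    const Expr& numerator = e.children[0];
    const Expr& denominator = e.children[1];
    if (sameTree(numerator, denominator))
        return constant(1.0);  // dividing anything by itself is 1
    if (numerator.kind == Expr::Kind::Constant && numerator.value == 0.0)
        return constant(0.0);  // 0 divided by something
    return e;
}
```

Because the self-division rule is tested first, this sketch rewrites `0/0` to 1; like any symbolic simplification of `x/x`, it ignores the point where the denominator is zero.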

--- a/platforms/hip/CMakeLists.txt
+++ b/platforms/hip/CMakeLists.txt
+#---------------------------------------------------
+# OpenMM HIP Platform
+#
+# Creates OpenMMHIP library.
+#
+# Windows:
+#   OpenMMHIP.dll
+#   OpenMMHIP.lib
+#   OpenMMHIP_static.lib
+# Unix:
+#   libOpenMMHIP.so
+#   libOpenMMHIP_static.a
+#----------------------------------------------------
+
+FIND_PACKAGE(HIPRTC CONFIG)
+
+SET(OPENMM_BUILD_HIP_TESTS TRUE CACHE BOOL "Whether to build HIP test cases")
+IF(BUILD_TESTING AND OPENMM_BUILD_HIP_TESTS)
+    SUBDIRS(tests)
+ENDIF(BUILD_TESTING AND OPENMM_BUILD_HIP_TESTS)
+
+# The source is organized into subdirectories, but we handle them all from
+# this CMakeLists file rather than letting CMake visit them as SUBDIRS.
+SET(OPENMM_SOURCE_SUBDIRS . ../common)
+
+
+# Collect up information about the version of the OpenMM library we're building
+# and make it available to the code so it can be built into the binaries.
+
+SET(OPENMMHIP_LIBRARY_NAME OpenMMHIP)
+
+SET(SHARED_TARGET ${OPENMMHIP_LIBRARY_NAME})
+SET(STATIC_TARGET ${OPENMMHIP_LIBRARY_NAME}_static)
+
+
+# These are all the places to search for header files which are
+# to be part of the API.
+SET(API_INCLUDE_DIRS) # start empty
+FOREACH(subdir ${OPENMM_SOURCE_SUBDIRS})
+    # append
+    SET(API_INCLUDE_DIRS ${API_INCLUDE_DIRS}
+                         ${CMAKE_CURRENT_SOURCE_DIR}/${subdir}/include
+                         ${CMAKE_CURRENT_SOURCE_DIR}/${subdir}/include/internal)
+ENDFOREACH(subdir)
+
+# We'll need both *relative* path names, starting with their API_INCLUDE_DIRS,
+# and absolute pathnames.
+SET(API_REL_INCLUDE_FILES)   # start these out empty
+SET(API_ABS_INCLUDE_FILES)
+
+FOREACH(dir ${API_INCLUDE_DIRS})
+    FILE(GLOB fullpaths ${dir}/*.h)	# returns full pathnames
+    SET(API_ABS_INCLUDE_FILES ${API_ABS_INCLUDE_FILES} ${fullpaths})
+
+    FOREACH(pathname ${fullpaths})
+        GET_FILENAME_COMPONENT(filename ${pathname} NAME)
+        SET(API_REL_INCLUDE_FILES ${API_REL_INCLUDE_FILES} ${dir}/${filename})
+    ENDFOREACH(pathname)
+ENDFOREACH(dir)
+
+# collect up source files
+SET(SOURCE_FILES) # empty
+SET(SOURCE_INCLUDE_FILES)
+
+FOREACH(subdir ${OPENMM_SOURCE_SUBDIRS})
+    FILE(GLOB_RECURSE src_files  ${CMAKE_CURRENT_SOURCE_DIR}/${subdir}/src/*.cpp ${CMAKE_CURRENT_SOURCE_DIR}/${subdir}/src/*.c)
+    FILE(GLOB incl_files ${CMAKE_CURRENT_SOURCE_DIR}/${subdir}/src/*.h)
+    SET(SOURCE_FILES         ${SOURCE_FILES}         ${src_files})   #append
+    IF(MSVC)
+        FILE(GLOB_RECURSE kernel_files ${CMAKE_CURRENT_SOURCE_DIR}/${subdir}/src/kernels/*.hip)
+        SET(SOURCE_FILES ${SOURCE_FILES} ${kernel_files})
+    ENDIF(MSVC)
+    SET(SOURCE_INCLUDE_FILES ${SOURCE_INCLUDE_FILES} ${incl_files})
+    INCLUDE_DIRECTORIES(BEFORE ${CMAKE_CURRENT_SOURCE_DIR}/${subdir}/include)
+ENDFOREACH(subdir)
+
+INCLUDE_DIRECTORIES(BEFORE ${CMAKE_CURRENT_SOURCE_DIR}/src)
+
+# Encode the kernel sources into a C++ class
+
+SET(KERNEL_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/src)
+SET(KERNEL_SOURCE_CLASS HipKernelSources)
+SET(KERNELS_CPP ${CMAKE_CURRENT_BINARY_DIR}/src/${KERNEL_SOURCE_CLASS}.cpp)
+SET(KERNELS_H ${CMAKE_CURRENT_BINARY_DIR}/src/${KERNEL_SOURCE_CLASS}.h)
+SET(COMMON_KERNELS_CPP ${CMAKE_CURRENT_BINARY_DIR}/../common/src/CommonKernelSources.cpp)
+SET(SOURCE_FILES ${SOURCE_FILES} ${KERNELS_CPP} ${KERNELS_H} ${COMMON_KERNELS_CPP})
+INCLUDE_DIRECTORIES(BEFORE ${CMAKE_CURRENT_BINARY_DIR}/src)
+INCLUDE_DIRECTORIES(BEFORE ${CMAKE_CURRENT_BINARY_DIR}/../common/src)
+
+FILE(GLOB HIP_KERNELS ${KERNEL_SOURCE_DIR}/kernels/*.hip)
+ADD_CUSTOM_COMMAND(OUTPUT ${KERNELS_CPP} ${KERNELS_H}
+    COMMAND ${CMAKE_COMMAND}
+    ARGS -D KERNEL_SOURCE_DIR=${KERNEL_SOURCE_DIR} -D KERNELS_CPP=${KERNELS_CPP} -D KERNELS_H=${KERNELS_H} -D KERNEL_SOURCE_CLASS=${KERNEL_SOURCE_CLASS} -D KERNEL_FILE_EXTENSION=hip -P ${CMAKE_SOURCE_DIR}/cmake_modules/EncodeKernelFiles.cmake
+    DEPENDS ${HIP_KERNELS}
+)
+SET_SOURCE_FILES_PROPERTIES(${KERNELS_CPP} ${KERNELS_H} ${COMMON_KERNELS_CPP} PROPERTIES GENERATED TRUE)
+ADD_CUSTOM_TARGET(HipKernels DEPENDS ${KERNELS_CPP} ${KERNELS_H})
+
+IF(OPENMM_BUILD_SHARED_LIB)
+    ADD_LIBRARY(${SHARED_TARGET} SHARED ${SOURCE_FILES} ${SOURCE_INCLUDE_FILES} ${API_ABS_INCLUDE_FILES})
+    ADD_DEPENDENCIES(${SHARED_TARGET} CommonKernels HipKernels)
+
+    TARGET_LINK_LIBRARIES(${SHARED_TARGET} PUBLIC ${OPENMM_LIBRARY_NAME} ${PTHREADS_LIB} hip::host hiprtc::hiprtc)
+    SET_TARGET_PROPERTIES(${SHARED_TARGET} PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} -DOPENMM_COMMON_BUILDING_SHARED_LIBRARY")
+    SET_TARGET_PROPERTIES(${SHARED_TARGET} PROPERTIES LINK_FLAGS "${EXTRA_LINK_FLAGS}")
+
+    INSTALL_TARGETS(/lib/plugins RUNTIME_DIRECTORY /lib/plugins ${SHARED_TARGET})
+ENDIF(OPENMM_BUILD_SHARED_LIB)
+
+# Build the static library.
+
+IF(OPENMM_BUILD_STATIC_LIB)
+    ADD_LIBRARY(${STATIC_TARGET} STATIC ${SOURCE_FILES} ${SOURCE_INCLUDE_FILES} ${API_ABS_INCLUDE_FILES})
+    ADD_DEPENDENCIES(${STATIC_TARGET} CommonKernels HipKernels)
+
+    TARGET_LINK_LIBRARIES(${STATIC_TARGET} ${OPENMM_LIBRARY_NAME} ${PTHREADS_LIB_STATIC} hip::host hiprtc::hiprtc)
+    SET_TARGET_PROPERTIES(${STATIC_TARGET} PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} -DOPENMM_COMMON_BUILDING_STATIC_LIBRARY")
+    SET_TARGET_PROPERTIES(${STATIC_TARGET} PROPERTIES LINK_FLAGS "${EXTRA_LINK_FLAGS}")
+
+    INSTALL_TARGETS(/lib/plugins RUNTIME_DIRECTORY /lib/plugins ${STATIC_TARGET})
+ENDIF(OPENMM_BUILD_STATIC_LIB)
+
+# Install headers
+
+FILE(GLOB CORE_HEADERS include/*.h ${KERNELS_H})
+INSTALL_FILES(/include/openmm/hip FILES ${CORE_HEADERS})
--- a/platforms/hip/include/HipArray.h
+++ b/platforms/hip/include/HipArray.h
+#ifndef OPENMM_HIPARRAY_H_
+#define OPENMM_HIPARRAY_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2009-2022 Stanford University and the Authors.      *
+ * Portions copyright (c) 2020-2022 Advanced Micro Devices, Inc.              *
+ * Authors: Peter Eastman, Nicholas Curtis                                    *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/OpenMMException.h"
+#include "openmm/common/windowsExportCommon.h"
+#include "openmm/common/ArrayInterface.h"
+#include <hip/hip_runtime.h>
+#include <iostream>
+#include <sstream>
+#include <vector>
+
+namespace OpenMM {
+
+class HipContext;
+
+/**
+ * This class encapsulates a block of HIP device memory.  It provides a simplified API
+ * for working with it and for copying data to and from device memory.
+ */
+
+class OPENMM_EXPORT_COMMON HipArray : public ArrayInterface {
+public:
+    /**
+     * Create a HipArray object.  The object is allocated on the heap with the "new" operator.
+     * The template argument is the data type of each array element.
+     *
+     * @param context           the context for which to create the array
+     * @param size              the number of elements in the array
+     * @param name              the name of the array
+     */
+    template <class T>
+    static HipArray* create(HipContext& context, size_t size, const std::string& name) {
+        return new HipArray(context, size, sizeof(T), name);
+    }
+    /**
+     * Create an uninitialized HipArray object.  It does not point to any device memory,
+     * and cannot be used until initialize() is called on it.
+     */
+    HipArray();
+    /**
+     * Create a HipArray object.
+     *
+     * @param context           the context for which to create the array
+     * @param size              the number of elements in the array
+     * @param elementSize       the size of each element in bytes
+     * @param name              the name of the array
+     */
+    HipArray(HipContext& context, size_t size, int elementSize, const std::string& name);
+    ~HipArray();
+    /**
+     * Initialize this object.
+     *
+     * @param context           the context for which to create the array
+     * @param size              the number of elements in the array
+     * @param elementSize       the size of each element in bytes
+     * @param name              the name of the array
+     */
+    void initialize(ComputeContext& context, size_t size, int elementSize, const std::string& name);
+    /**
+     * Initialize this object.  The template argument is the data type of each array element.
+     *
+     * @param context           the context for which to create the array
+     * @param size              the number of elements in the array
+     * @param name              the name of the array
+     */
+    template <class T>
+    void initialize(ComputeContext& context, size_t size, const std::string& name) {
+        initialize(context, size, sizeof(T), name);
+    }
+    /**
+     * Recreate the internal storage to have a different size.
+     */
+    void resize(size_t size);
+    /**
+     * Get whether this array has been initialized.
+     */
+    bool isInitialized() const {
+        return (pointer != 0);
+    }
+    /**
+     * Get the number of elements in the array.
+     */
+    size_t getSize() const {
+        return size;
+    }
+    /**
+     * Get the size of each element in bytes.
+     */
+    int getElementSize() const {
+        return elementSize;
+    }
+    /**
+     * Get the name of the array.
+     */
+    const std::string& getName() const {
+        return name;
+    }
+    /**
+     * Get the context this array belongs to.
+     */
+    ComputeContext& getContext();
+    /**
+     * Get a pointer to the device memory.
+     */
+    hipDeviceptr_t& getDevicePointer() {
+        return pointer;
+    }
+    /**
+     * Copy the values in a vector to the device memory.
+     */
+    template <class T>
+    void upload(const std::vector<T>& data, bool convert=false) {
+        ArrayInterface::upload(data, convert);
+    }
+    /**
+     * Copy the values in the device memory to a vector.
+     */
+    template <class T>
+    void download(std::vector<T>& data) const {
+        ArrayInterface::download(data);
+    }
+    /**
+     * Copy the values from host memory to the array.
+     *
+     * @param data     the data to copy
+     * @param blocking if true, this call will block until the transfer is complete.  If false,
+     *                 the source array must be in page-locked memory.
+     */
+    void upload(const void* data, bool blocking=true) {
+        uploadSubArray(data, 0, getSize(), blocking);
+    }
+    /**
+     * Copy values from host memory to a subset of the array.
+     *
+     * @param data     the data to copy
+     * @param offset   the index of the element within the array at which the copy should begin
+     * @param elements the number of elements to copy
+     * @param blocking if true, this call will block until the transfer is complete.  If false,
+     *                 the source array must be in page-locked memory.
+     */
+    void uploadSubArray(const void* data, int offset, int elements, bool blocking=true);
+    /**
+     * Copy the values in the device memory to an array.
+     *
+     * @param data     the array to copy the memory to
+     * @param blocking if true, this call will block until the transfer is complete.  If false,
+     *                 the destination array must be in page-locked memory.
+     */
+    void download(void* data, bool blocking=true) const;
+    /**
+     * Copy the values in the device memory to a second array.
+     *
+     * @param dest     the destination array to copy to
+     */
+    void copyTo(ArrayInterface& dest) const;
+private:
+    HipContext* context;
+    hipDeviceptr_t pointer;
+    size_t size;
+    int elementSize;
+    bool ownsMemory;
+    std::string name;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_HIPARRAY_H_*/
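
Reviewer note: the doc comments on `uploadSubArray` give `offset` and `elements` in units of array elements, not bytes, so the byte offset into the allocation is presumably `offset * elementSize`. The sketch below models that arithmetic with a host-only stand-in. `HostArrayModel` is a hypothetical class that uses `std::memcpy` on ordinary memory; it does not call HIP, and the bounds check is an assumption, since the header does not show how the real class handles an out-of-range request.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical host-only model of an element-addressed array.
// The real HipArray copies to HIP device memory instead.
class HostArrayModel {
public:
    HostArrayModel(std::size_t size, int elementSize)
        : size(size), elementSize(elementSize), storage(size * elementSize) {}

    // Copy `elements` elements from `data` into the array, starting at
    // element index `offset`.  Both arguments count elements, not bytes.
    void uploadSubArray(const void* data, int offset, int elements) {
        if (offset < 0 || elements < 0 ||
            static_cast<std::size_t>(offset) + elements > size)
            throw std::out_of_range("uploadSubArray: range exceeds array size");
        std::memcpy(storage.data() + static_cast<std::size_t>(offset) * elementSize,
                    data,
                    static_cast<std::size_t>(elements) * elementSize);
    }

    // Uploading the whole array is the sub-array case with offset 0.
    void upload(const void* data) {
        uploadSubArray(data, 0, static_cast<int>(size));
    }

    // Copy the full contents back out to caller-provided memory.
    void download(void* data) const {
        std::memcpy(data, storage.data(), storage.size());
    }

    std::size_t getSize() const { return size; }
    int getElementSize() const { return elementSize; }

private:
    std::size_t size;
    int elementSize;
    std::vector<unsigned char> storage;
};
```

With four `float` elements initialized to zero, uploading `{5, 6}` at offset 1 should leave elements 0 and 3 untouched and overwrite elements 1 and 2.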
--- a/platforms/hip/include/HipBondedUtilities.h
+++ b/platforms/hip/include/HipBondedUtilities.h
+#ifndef OPENMM_HIPBONDEDUTILITIES_H_
+#define OPENMM_HIPBONDEDUTILITIES_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2011-2024 Stanford University and the Authors.      *
+ * Portions copyright (c) 2020 Advanced Micro Devices, Inc.                   *
+ * Authors: Peter Eastman, Nicholas Curtis                                    *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/BondedUtilities.h"
+#include "openmm/common/windowsExportCommon.h"
+
+namespace OpenMM {
+
+/**
+ * This class exists only for backward compatibility.  It adds no features beyond
+ * the base BondedUtilities class.
+ */
+
+class OPENMM_EXPORT_COMMON HipBondedUtilities : public BondedUtilities {
+public:
+    HipBondedUtilities(ComputeContext& context) : BondedUtilities(context) {
+    }
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_HIPBONDEDUTILITIES_H_*/
--- a/platforms/hip/include/HipContext.h
+++ b/platforms/hip/include/HipContext.h
+#ifndef OPENMM_HIPCONTEXT_H_
+#define OPENMM_HIPCONTEXT_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2009-2024 Stanford University and the Authors.      *
+ * Portions copyright (c) 2020-2023 Advanced Micro Devices, Inc.              *
+ * Authors: Peter Eastman, Nicholas Curtis                                    *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+
+/*
+ * Porting notes:
+  - HIP only marginally supports the CUDA context API, and will remove
+    support eventually.  To my knowledge, contexts don't really buy you anything
+    that streams / hipSetDevice don't.  Hence, for this implementation, we are doing
+    away entirely with the context usage.
+ */
+
+
+#include <map>
+#include <string>
+#include <utility>
+#define __CL_ENABLE_EXCEPTIONS
+#ifdef _MSC_VER
+    // Prevent Windows from defining macros that interfere with other code.
+    #define NOMINMAX
+#endif
+#include <pthread.h>
+#include <hip/hip_runtime.h>
+#include "openmm/common/windowsExportCommon.h"
+#include "HipArray.h"
+#include "HipBondedUtilities.h"
+#include "HipExpressionUtilities.h"
+#include "HipIntegrationUtilities.h"
+#include "HipNonbondedUtilities.h"
+#include "HipPlatform.h"
+#include "HipFFT3D.h"
+#include "openmm/OpenMMException.h"
+#include "openmm/common/ComputeContext.h"
+#include "openmm/Kernel.h"
+
+typedef unsigned int tileflags;
+
+namespace OpenMM {
+
+/**
+ * This class contains the information associated with a Context by the HIP Platform.  Each HipContext is
+ * specific to a particular device, and manages data structures and kernels for that device.  When running a simulation
+ * in parallel on multiple devices, there is a separate HipContext for each one.  The list of all contexts is
+ * stored in the HipPlatform::PlatformData.
+ * <p>
+ * In addition, a worker thread is created for each HipContext.  This is used for parallel computations, so that
+ * blocking calls to one device will not block other devices.  When only a single device is being used, the worker
+ * thread is not used and calculations are performed on the main application thread.
+ */
+
+class OPENMM_EXPORT_COMMON HipContext : public ComputeContext {
+public:
+    class WorkTask;
+    class WorkThread;
+    class ReorderListener;
+    class ForcePreComputation;
+    class ForcePostComputation;
+    static const int ThreadBlockSize;
+    static const int TileSize;
+    HipContext(const System& system, int deviceIndex, bool useBlockingSync, const std::string& precision,
+            const std::string& tempDir, HipPlatform::PlatformData& platformData, HipContext* originalContext);
+    ~HipContext();
+    /**
+     * This is called to initialize internal data structures after all Forces in the system
+     * have been initialized.
+     */
+    void initialize();
+    /**
+     * Get whether the context associated with this object is valid.
+     */
+    bool getContextIsValid() const {
+        return contextIsValid;
+    }
+    /**
+     * Set the device associated with this object to be the current device.  If the context is not
+     * valid, this returns without doing anything.
+     */
+    void setAsCurrent();
+    /**
+     * Push the device associated with this object to be the current device.  If the context is not
+     * valid, this returns without doing anything.
+     */
+    void pushAsCurrent();
+    /**
+     * Pop the device associated with this object off the stack of contexts.  If the context is not
+     * valid, this returns without doing anything.
+     */
+    void popAsCurrent();
+    /**
+     * Get the hipDevice_t associated with this object.
+     */
+    hipDevice_t getDevice() {
+        return device;
+    }
+    /**
+     * Get the compute capability of the device associated with this object.
+     */
+    double getComputeCapability() const {
+        return computeCapability;
+    }
+    /**
+     * Get the index of the hipDevice_t associated with this object.
+     */
+    int getDeviceIndex() const {
+        return deviceIndex;
+    }
+    /**
+     * Get the PlatformData object this context is part of.
+     */
+    HipPlatform::PlatformData& getPlatformData() {
+        return platformData;
+    }
+    /**
+     * Get the number of contexts being used for the current simulation.
+     * This is relevant when a simulation is parallelized across multiple devices.  In that case,
+     * one HipContext is created for each device.
+     */
+    int getNumContexts() const {
+        return platformData.contexts.size();
+    }
+    /**
+     * Get the index of this context in the list stored in the PlatformData.
+     */
+    int getContextIndex() const {
+        return contextIndex;
+    }
+    /**
+     * Get a list of all contexts being used for the current simulation.
+     * This is relevant when a simulation is parallelized across multiple devices.  In that case,
+     * one ComputeContext is created for each device.
+     */
+    std::vector<ComputeContext*> getAllContexts();
+    /**
+     * Get the stream currently being used for execution.
+     */
+    hipStream_t getCurrentStream();
+    /**
+     * Set the stream to use for execution.
+     */
+    void setCurrentStream(hipStream_t stream);
+    /**
+     * Reset the context to using the default stream for execution.
+     */
+    void restoreDefaultStream();
+    /**
+     * Construct an uninitialized array of the appropriate class for this platform.  The returned
+     * value should be created on the heap with the "new" operator.
+     */
+    HipArray* createArray();
+    /**
+     * Construct a ComputeEvent object of the appropriate class for this platform.
+     */
+    ComputeEvent createEvent();
+    /**
+     * Create a new HipFFT3D.
+     *
+     * @param xsize   the first dimension of the data sets on which FFTs will be performed
+     * @param ysize   the second dimension of the data sets on which FFTs will be performed
+     * @param zsize   the third dimension of the data sets on which FFTs will be performed
+     * @param realToComplex  if true, a real-to-complex transform will be done.  Otherwise, it is complex-to-complex.
+     * @param stream  HIP stream
+     * @param in      the data to transform, ordered such that in[x*ysize*zsize + y*zsize + z] contains element (x, y, z)
+     * @param out     on exit, this contains the transformed data
+     */
+    HipFFT3D* createFFT(int xsize, int ysize, int zsize, bool realToComplex, hipStream_t stream, HipArray& in, HipArray& out);
+    /**
+     * Get the smallest legal size for a dimension of the grid supported by the FFT.
+     */
+    virtual int findLegalFFTDimension(int minimum);
+    /**
+     * Compile source code to create a ComputeProgram.
+     *
+     * @param source             the source code of the program
+     * @param defines            a set of preprocessor definitions (name, value) to define when compiling the program
+     */
+    ComputeProgram compileProgram(const std::string source, const std::map<std::string, std::string>& defines=std::map<std::string, std::string>());
+    /**
+     * Convert an array to a HipArray.  If the argument is already a HipArray, this simply casts it.
+     * If the argument is a ComputeArray that wraps a HipArray, this returns the wrapped array.  For any
+     * other argument, this throws an exception.
+     */
+    HipArray& unwrap(ArrayInterface& array) const;
+    /**
+     * Get the array which contains the position (the xyz components) and charge (the w component) of each atom.
+     */
+    HipArray& getPosq() {
+        return posq;
+    }
+    /**
+     * Get the array which contains a correction to the position of each atom.  This only exists if getUseMixedPrecision() returns true.
+     */
+    HipArray& getPosqCorrection() {
+        return posqCorrection;
+    }
+    /**
+     * Get the array which contains the velocity (the xyz components) and inverse mass (the w component) of each atom.
+     */
+    HipArray& getVelm() {
+        return velm;
+    }
+    /**
+     * Get the array which contains the force on each atom (represented as three long longs in 64 bit fixed point).
+     */
+    HipArray& getForce() {
+        return force;
+    }
+    /**
+     * The HIP platform does not use floating point force buffers, so this throws an exception.
+     */
+    ArrayInterface& getFloatForceBuffer() {
+        throw OpenMMException("HIP platform does not use floating point force buffers");
+    }
+    /**
+     * Get the array which contains a contribution to each force represented as 64 bit fixed point.
+     * This is a synonym for getForce().  It exists to satisfy the ComputeContext interface.
+     */
+    HipArray& getLongForceBuffer() {
+        return force;
+    }
+    /**
+     * The HIP platform does not use floating point force buffers, so this throws an exception.
+     */
+    ArrayInterface& getForceBuffers() {
+        throw OpenMMException("HIP platform does not use floating point force buffers");
+    }
+    /**
+     * Get the array which contains the buffer in which energy is computed.
+     */
+    HipArray& getEnergyBuffer() {
+        return energyBuffer;
+    }
+    /**
+     * Get the array which contains the buffer in which derivatives of the energy with respect to parameters are computed.
+     */
+    HipArray& getEnergyParamDerivBuffer() {
+        return energyParamDerivBuffer;
+    }
+    /**
+     * Get a pointer to a block of pinned memory that can be used for efficient transfers between host and device.
+     * This is guaranteed to be at least as large as any of the arrays returned by methods of this class.
+     */
+    void* getPinnedBuffer() {
+        return pinnedBuffer;
+    }
+    /**
+     * Get a shared ThreadPool that code can use to parallelize operations.
+     *
+     * Because this object is freely available to all code, care is needed to avoid conflicts.  Only use it
+     * from the main thread, and make sure all operations are complete before you invoke any other code that
+     * might make use of it.
+     */
+    ThreadPool& getThreadPool() {
+        return getPlatformData().threads;
+    }
+    /**
+     * Get the array which contains the index of each atom.
+     */
+    HipArray& getAtomIndexArray() {
+        return atomIndexDevice;
+    }
+    /**
+     * Get a file name in tempDir unique for the current process and context.
+     */
+    std::string getTempFileName() const;
+    /**
+     * Get a hash of the given source string.
+     */
+    std::string getHash(const std::string& src) const;
+    /**
+     * Get a filename in cacheDir based on src hash.
+     */
+    std::string getCacheFileName(const std::string& src) const;
+    /**
+     * Create a HIP module from source code.
+     *
+     * @param source             the source code of the module
+     */
+    hipModule_t createModule(const std::string source);
+    /**
+     * Create a HIP module from source code.
+     *
+     * @param source             the source code of the module
+     * @param defines            a set of preprocessor definitions (name, value) to define when compiling the program
+     */
+    hipModule_t createModule(const std::string source, const std::map<std::string, std::string>& defines);
+    /**
+     * Get a kernel from a HIP module.
+     *
+     * @param module    the module to get the kernel from
+     * @param name      the name of the kernel to get
+     */
+    hipFunction_t getKernel(hipModule_t& module, const std::string& name);
+    /**
+     * Execute a kernel.
+     *
+     * @param kernel       the kernel to execute
+     * @param arguments    an array of pointers to the kernel arguments
+     * @param threads      the maximum number of threads that should be used
+     * @param blockSize    the size of each thread block to use
+     * @param sharedSize   the amount of dynamic shared memory to allocate for the kernel, in bytes
+     */
+    void executeKernel(hipFunction_t kernel, void** arguments, int threads, int blockSize = -1, unsigned int sharedSize = 0);
+    /**
+     * Execute a kernel with full grid.
+     *
+     * @param kernel       the kernel to execute
+     * @param arguments    an array of pointers to the kernel arguments
+     * @param threads      the total number of threads that should be used
+     * @param blockSize    the size of each thread block to use
+     * @param sharedSize   the amount of dynamic shared memory to allocate for the kernel, in bytes
+     */
+    void executeKernelFlat(hipFunction_t kernel, void** arguments, int threads, int blockSize = -1, unsigned int sharedSize = 0);
+    /**
+     * Compute the largest thread block size that can be used for a kernel that requires a particular amount of
+     * shared memory per thread.
+     *
+     * @param memory        the number of bytes of shared memory per thread
+     */
+    int computeThreadBlockSize(double memory) const;
+    /**
+     * Set all elements of an array to 0.
+     */
+    void clearBuffer(ArrayInterface& array);
+    /**
+     * Set all elements of an array to 0.
+     *
+     * @param memory     the memory to clear
+     * @param size       the size of the buffer in bytes
+     */
+    void clearBuffer(hipDeviceptr_t memory, int size);
+    /**
+     * Register a buffer that should be automatically cleared (all elements set to 0) at the start of each force or energy computation.
+     */
+    void addAutoclearBuffer(ArrayInterface& array);
+    /**
+     * Register a buffer that should be automatically cleared (all elements set to 0) at the start of each force or energy computation.
+     *
+     * @param memory     the memory to clear
+     * @param size       the size of the buffer in bytes
+     */
+    void addAutoclearBuffer(hipDeviceptr_t memory, int size);
+    /**
+     * Clear all buffers that have been registered with addAutoclearBuffer().
+     */
+    void clearAutoclearBuffers();
+    /**
+     * Sum the buffer containing energy.
+     */
+    double reduceEnergy();
+    /**
+     * Get the number of blocks of TileSize atoms.
+     */
+    int getNumAtomBlocks() const {
+        return numAtomBlocks;
+    }
+    /**
+     * Get the standard number of thread blocks to use when executing kernels.
+     */
+    int getNumThreadBlocks() const {
+        return numThreadBlocks;
+    }
+    /**
+     * Get the maximum number of threads in a thread block supported by this device.
+     */
+    int getMaxThreadBlockSize() const {
+        return 256;
+    }
+    /**
+     * Get whether the device being used is a CPU.  In some cases, different algorithms
+     * may be more efficient on CPUs and GPUs.
+     */
+    bool getIsCPU() const {
+        return false;
+    }
+    /**
+     * Get the SIMD width of the device being used.
+     */
+    int getSIMDWidth() const {
+        return simdWidth;
+    }
+    /**
+     * Get the number of multiprocessors (compute units) of the device being used.
+     */
+    int getMultiprocessors() const {
+        return multiprocessors;
+    }
+    /**
+     * Get whether the device being used supports 64 bit atomic operations on global memory.
+     */
+    bool getSupports64BitGlobalAtomics() const {
+        return true;
+    }
+    /**
+     * Get whether the device being used supports 32 bit floating point atomic operations
+     * on global memory (fast hardware instructions, not a compare-and-swap loop implementation).
+     */
+    bool getSupportsHardwareFloatGlobalAtomicAdd() const {
+        return supportsHardwareFloatGlobalAtomicAdd;
+    }
+    /**
+     * Get whether the device being used supports double precision math.
+     */
+    bool getSupportsDoublePrecision() const {
+        return true;
+    }
+    /**
+     * Get whether double precision is being used.
+     */
+    bool getUseDoublePrecision() const {
+        return useDoublePrecision;
+    }
+    /**
+     * Get whether mixed precision is being used.
+     */
+    bool getUseMixedPrecision() const {
+        return useMixedPrecision;
+    }
+    /**
+     * Get whether the periodic box is triclinic.
+     */
+    bool getBoxIsTriclinic() const {
+        return boxIsTriclinic;
+    }
+    /**
+     * Convert a HIP result code to the corresponding string description.
+     */
+    static std::string getErrorString(hipError_t result);
+    /**
+     * Get the vectors defining the periodic box.
+     */
+    void getPeriodicBoxVectors(Vec3& a, Vec3& b, Vec3& c) const {
+        a = Vec3(periodicBoxVecX.x, periodicBoxVecX.y, periodicBoxVecX.z);
+        b = Vec3(periodicBoxVecY.x, periodicBoxVecY.y, periodicBoxVecY.z);
+        c = Vec3(periodicBoxVecZ.x, periodicBoxVecZ.y, periodicBoxVecZ.z);
+    }
+    /**
+     * Set the vectors defining the periodic box.
+     */
+    void setPeriodicBoxVectors(const Vec3& a, const Vec3& b, const Vec3& c) {
+        periodicBoxVecX = make_double4(a[0], a[1], a[2], 0.0);
+        periodicBoxVecY = make_double4(b[0], b[1], b[2], 0.0);
+        periodicBoxVecZ = make_double4(c[0], c[1], c[2], 0.0);
+        periodicBoxVecXFloat = make_float4((float) a[0], (float) a[1], (float) a[2], 0.0f);
+        periodicBoxVecYFloat = make_float4((float) b[0], (float) b[1], (float) b[2], 0.0f);
+        periodicBoxVecZFloat = make_float4((float) c[0], (float) c[1], (float) c[2], 0.0f);
+        periodicBoxSize = make_double4(a[0], b[1], c[2], 0.0);
+        invPeriodicBoxSize = make_double4(1.0/a[0], 1.0/b[1], 1.0/c[2], 0.0);
+        periodicBoxSizeFloat = make_float4((float) a[0], (float) b[1], (float) c[2], 0.0f);
+        invPeriodicBoxSizeFloat = make_float4(1.0f/(float) a[0], 1.0f/(float) b[1], 1.0f/(float) c[2], 0.0f);
+    }
+    /**
+     * Get the size of the periodic box.
+     */
+    double4 getPeriodicBoxSize() const {
+        return periodicBoxSize;
+    }
+    /**
+     * Get the inverse of the size of the periodic box.
+     */
+    double4 getInvPeriodicBoxSize() const {
+        return invPeriodicBoxSize;
+    }
+    /**
+     * Get a pointer to the size of the periodic box, represented as either a float4 or double4 depending on
+     * this context's precision.  This value is suitable for passing to kernels as an argument.
+     */
+    void* getPeriodicBoxSizePointer() {
+        return (useDoublePrecision ? reinterpret_cast<void*>(&periodicBoxSize) : reinterpret_cast<void*>(&periodicBoxSizeFloat));
+    }
+    /**
+     * Get a pointer to the inverse of the size of the periodic box, represented as either a float4 or double4 depending on
+     * this context's precision.  This value is suitable for passing to kernels as an argument.
+     */
+    void* getInvPeriodicBoxSizePointer() {
+        return (useDoublePrecision ? reinterpret_cast<void*>(&invPeriodicBoxSize) : reinterpret_cast<void*>(&invPeriodicBoxSizeFloat));
+    }
+    /**
+     * Get a pointer to the first periodic box vector, represented as either a float4 or double4 depending on
+     * this context's precision.  This value is suitable for passing to kernels as an argument.
+     */
+    void* getPeriodicBoxVecXPointer() {
+        return (useDoublePrecision ? reinterpret_cast<void*>(&periodicBoxVecX) : reinterpret_cast<void*>(&periodicBoxVecXFloat));
+    }
+    /**
+     * Get a pointer to the second periodic box vector, represented as either a float4 or double4 depending on
+     * this context's precision.  This value is suitable for passing to kernels as an argument.
+     */
+    void* getPeriodicBoxVecYPointer() {
+        return (useDoublePrecision ? reinterpret_cast<void*>(&periodicBoxVecY) : reinterpret_cast<void*>(&periodicBoxVecYFloat));
+    }
+    /**
+     * Get a pointer to the third periodic box vector, represented as either a float4 or double4 depending on
+     * this context's precision.  This value is suitable for passing to kernels as an argument.
+     */
+    void* getPeriodicBoxVecZPointer() {
+        return (useDoublePrecision ? reinterpret_cast<void*>(&periodicBoxVecZ) : reinterpret_cast<void*>(&periodicBoxVecZFloat));
+    }
+    /**
+     * Get the HipIntegrationUtilities for this context.
+     */
+    HipIntegrationUtilities& getIntegrationUtilities() {
+        return *integration;
+    }
+    /**
+     * Get the HipExpressionUtilities for this context.
+     */
+    HipExpressionUtilities& getExpressionUtilities() {
+        return *expression;
+    }
+    /**
+     * Get the HipBondedUtilities for this context.
+     */
+    HipBondedUtilities& getBondedUtilities() {
+        return *bonded;
+    }
+    /**
+     * Get the HipNonbondedUtilities for this context.
+     */
+    HipNonbondedUtilities& getNonbondedUtilities() {
+        return *nonbonded;
+    }
+    /**
+     * Create a new NonbondedUtilities for use with this context.  This should be called
+     * only in unusual situations, when a Force needs its own NonbondedUtilities object
+     * separate from the standard one.  The caller is responsible for deleting the object
+     * when it is no longer needed.
+     */
+    HipNonbondedUtilities* createNonbondedUtilities() {
+        return new HipNonbondedUtilities(*this);
+    }
+    /**
+     * This should be called by the Integrator from its own initialize() method.
+     * It ensures all contexts are fully initialized.
+     */
+    void initializeContexts();
+    /**
+     * Set the particle charges.  These are packed into the fourth element of the posq array.
+     */
+    void setCharges(const std::vector<double>& charges);
+    /**
+     * Request to use the fourth element of the posq array for storing charges.  Since only one force can
+     * do that, this returns true the first time it is called, and false on all subsequent calls.
+     */
+    bool requestPosqCharges();
+    /**
+     * Get the names of all parameters with respect to which energy derivatives are computed.
+     */
+    const std::vector<std::string>& getEnergyParamDerivNames() const {
+        return energyParamDerivNames;
+    }
+    /**
+     * Get a workspace data structure used for accumulating the values of derivatives of the energy
+     * with respect to parameters.
+     */
+    std::map<std::string, double>& getEnergyParamDerivWorkspace() {
+        return energyParamDerivWorkspace;
+    }
+    /**
+     * Register that the derivative of potential energy with respect to a context parameter
+     * will need to be calculated.  If this is called multiple times for a single parameter,
+     * it is only added to the list once.
+     *
+     * @param param    the name of the parameter to add
+     */
+    void addEnergyParameterDerivative(const std::string& param);
+    /**
+     * Wait until all work that has been queued (kernel executions, asynchronous data transfers, etc.)
+     * has been submitted to the device.  This does not mean it has necessarily been completed.
+     * Calling this periodically may improve the responsiveness of the computer's GUI, but at the
+     * expense of reduced simulation performance.
+     */
+    void flushQueue();
+    /**
+     * Get the flags that should be used when creating hipEvent_t objects.
+     */
+    unsigned int getEventFlags();
+    /**
+     * Get the flags that should be used when allocating pinned host memory.
+     */
+    unsigned int getHostMallocFlags();
+private:
+    /**
+     * Compute a sorted list of device indices in decreasing order of desirability.
+     */
+    std::vector<int> getDevicePrecedence();
+    static bool hasInitializedHip;
+    double computeCapability;
+    HipPlatform::PlatformData& platformData;
+    int deviceIndex;
+    int contextIndex;
+    int numAtomBlocks;
+    int numThreadBlocks;
+    int simdWidth;
+    int multiprocessors;
+    int sharedMemPerBlock;
+    bool supportsHardwareFloatGlobalAtomicAdd;
+    bool useBlockingSync, useDoublePrecision, useMixedPrecision, contextIsValid, boxIsTriclinic, hasAssignedPosqCharges;
+    bool isLinkedContext;
+    std::string tempDir, cacheDir, gpuArchitecture;
+    float4 periodicBoxVecXFloat, periodicBoxVecYFloat, periodicBoxVecZFloat, periodicBoxSizeFloat, invPeriodicBoxSizeFloat;
+    double4 periodicBoxVecX, periodicBoxVecY, periodicBoxVecZ, periodicBoxSize, invPeriodicBoxSize;
+    std::map<std::string, std::string> compilationDefines;
+    std::vector<hipModule_t> loadedModules;
+    hipDevice_t device;
+    hipStream_t currentStream;
+    hipStream_t defaultStream;
+    hipFunction_t clearBufferKernel;
+    hipFunction_t clearTwoBuffersKernel;
+    hipFunction_t clearThreeBuffersKernel;
+    hipFunction_t clearFourBuffersKernel;
+    hipFunction_t clearFiveBuffersKernel;
+    hipFunction_t clearSixBuffersKernel;
+    hipFunction_t reduceEnergyKernel;
+    hipFunction_t setChargesKernel;
+    void* pinnedBuffer;
+    HipArray posq;
+    HipArray posqCorrection;
+    HipArray velm;
+    HipArray force;
+    HipArray energyBuffer;
+    HipArray energySum;
+    HipArray energyParamDerivBuffer;
+    HipArray atomIndexDevice;
+    HipArray chargeBuffer;
+    std::vector<std::string> energyParamDerivNames;
+    std::map<std::string, double> energyParamDerivWorkspace;
+    std::vector<hipDeviceptr_t> autoclearBuffers;
+    std::vector<int> autoclearBufferSizes;
+    HipIntegrationUtilities* integration;
+    HipExpressionUtilities* expression;
+    HipBondedUtilities* bonded;
+    HipNonbondedUtilities* nonbonded;
+};
+
+/**
+ * This class exists only for backward compatibility.  Use ComputeContext::WorkTask instead.
+ */
+class OPENMM_EXPORT_COMMON HipContext::WorkTask : public ComputeContext::WorkTask {
+};
+
+/**
+ * This class exists only for backward compatibility.  Use ComputeContext::ReorderListener instead.
+ */
+class OPENMM_EXPORT_COMMON HipContext::ReorderListener : public ComputeContext::ReorderListener {
+};
+
+/**
+ * This class exists only for backward compatibility.  Use ComputeContext::ForcePreComputation instead.
+ */
+class OPENMM_EXPORT_COMMON HipContext::ForcePreComputation : public ComputeContext::ForcePreComputation {
+};
+
+/**
+ * This class exists only for backward compatibility.  Use ComputeContext::ForcePostComputation instead.
+ */
+class OPENMM_EXPORT_COMMON HipContext::ForcePostComputation : public ComputeContext::ForcePostComputation {
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_HIPCONTEXT_H_*/
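
The periodic box accessors in HipContext.h keep each quantity in both double and single precision and hand kernels a `void*` to whichever copy matches the context's precision. The following is a minimal standalone sketch of that pattern, not the OpenMM code: `Double4`, `Float4`, and `BoxSketch` are hypothetical stand-ins for `double4`, `float4`, and the relevant `HipContext` members, so that it compiles without the HIP headers.

```cpp
#include <cassert>

// Hypothetical stand-ins for HIP's double4 and float4 vector types.
struct Double4 { double x, y, z, w; };
struct Float4 { float x, y, z, w; };

// Hypothetical class mirroring only the box-size bookkeeping of HipContext.
class BoxSketch {
public:
    explicit BoxSketch(bool useDouble) : useDoublePrecision(useDouble) {}

    // Store the box size and its inverse in both precisions, as
    // setPeriodicBoxVectors() does for the diagonal elements a[0], b[1], c[2].
    void setBoxSize(double ax, double by, double cz) {
        assert(ax > 0 && by > 0 && cz > 0);
        size = {ax, by, cz, 0.0};
        invSize = {1.0/ax, 1.0/by, 1.0/cz, 0.0};
        sizeFloat = {(float) ax, (float) by, (float) cz, 0.0f};
        invSizeFloat = {1.0f/(float) ax, 1.0f/(float) by, 1.0f/(float) cz, 0.0f};
    }

    // Return a pointer to the copy that matches the precision in use, so the
    // same argument list can be passed to either build of a kernel.
    void* getBoxSizePointer() {
        return useDoublePrecision ? static_cast<void*>(&size) : static_cast<void*>(&sizeFloat);
    }
    void* getInvBoxSizePointer() {
        return useDoublePrecision ? static_cast<void*>(&invSize) : static_cast<void*>(&invSizeFloat);
    }

private:
    bool useDoublePrecision;
    Double4 size{}, invSize{};
    Float4 sizeFloat{}, invSizeFloat{};
};
```

The caller has to know which precision is active before dereferencing the returned pointer; in this sketch that means casting to `Float4*` when the object was constructed with `false` and to `Double4*` otherwise.
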
--- a/platforms/hip/include/HipEvent.h
+++ b/platforms/hip/include/HipEvent.h
+#ifndef OPENMM_HIPEVENT_H_
+#define OPENMM_HIPEVENT_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Portions copyright (c) 2020 Advanced Micro Devices, Inc.                   *
+ * Authors: Peter Eastman, Nicholas Curtis                                    *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "HipContext.h"
+#include "openmm/common/ComputeEvent.h"
+
+namespace OpenMM {
+
+/**
+ * This is the HIP implementation of the ComputeEventImpl interface.
+ */
+
+class HipEvent : public ComputeEventImpl {
+public:
+    HipEvent(HipContext& context);
+    ~HipEvent();
+    /**
+     * Place the event into the device's execution queue.
+     */
+    void enqueue();
+    /**
+     * Block until all operations started before the call to enqueue() have completed.
+     */
+    void wait();
+private:
+    HipContext& context;
+    hipEvent_t event;
+    bool eventCreated;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_HIPEVENT_H_*/
--- a/platforms/hip/include/HipExpressionUtilities.h
+++ b/platforms/hip/include/HipExpressionUtilities.h
+#ifndef OPENMM_HIPEXPRESSIONUTILITIES_H_
+#define OPENMM_HIPEXPRESSIONUTILITIES_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Portions copyright (c) 2020 Advanced Micro Devices, Inc.                   *
+ * Authors: Peter Eastman, Nicholas Curtis                                    *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/ExpressionUtilities.h"
+#include "openmm/common/windowsExportCommon.h"
+
+namespace OpenMM {
+
+/**
+ * This class exists only for backward compatibility.  It adds no features beyond
+ * the base ExpressionUtilities class.
+ */
+
+class OPENMM_EXPORT_COMMON HipExpressionUtilities : public ExpressionUtilities {
+public:
+    HipExpressionUtilities(ComputeContext& context) : ExpressionUtilities(context) {
+    }
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_HIPEXPRESSIONUTILITIES_H_*/
--- a/platforms/hip/include/HipFFT3D.h
+++ b/platforms/hip/include/HipFFT3D.h
+#ifndef __OPENMM_HIPFFT3D_H__
+#define __OPENMM_HIPFFT3D_H__
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2009-2015 Stanford University and the Authors.      *
+ * Portions copyright (c) 2021 Advanced Micro Devices, Inc.                   *
+ * Authors:                                                                   *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "HipArray.h"
+
+#define VKFFT_BACKEND 2 // HIP
+#include "vkFFT.h"
+
+namespace OpenMM {
+
+class HipContext;
+
+/**
+ * This class performs three dimensional Fast Fourier Transforms using VkFFT by
+ * Dmitrii Tolmachev (https://github.com/DTolm/VkFFT).
+ * <p>
+ * Note that this class performs an unnormalized transform.  That means that if you perform
+ * a forward transform followed immediately by an inverse transform, the effect is to
+ * multiply every value of the original data set by the total number of data points.
+ */
+
+class OPENMM_EXPORT_COMMON HipFFT3D {
+public:
+    /**
+     * Create a HipFFT3D object for performing transforms of a particular size.
+     * <p>
+     * The transform cannot be done in-place: the input and output
+     * arrays must be different.  Also, the input array is used as workspace, so its contents
+     * are destroyed.  This also means that both arrays must be large enough to hold complex values,
+     * even when performing a real-to-complex transform.
+     * <p>
+     * When performing a real-to-complex transform, the output data is of size xsize*ysize*(zsize/2+1)
+     * and contains only the non-redundant elements.
+     *
+     * @param context the context in which to perform calculations
+     * @param xsize   the first dimension of the data sets on which FFTs will be performed
+     * @param ysize   the second dimension of the data sets on which FFTs will be performed
+     * @param zsize   the third dimension of the data sets on which FFTs will be performed
+     * @param realToComplex  if true, a real-to-complex transform will be done.  Otherwise, it is complex-to-complex.
+     * @param stream  HIP stream
+     * @param in      the data to transform, ordered such that in[x*ysize*zsize + y*zsize + z] contains element (x, y, z)
+     * @param out     on exit, this contains the transformed data
+     */
+    HipFFT3D(HipContext& context, int xsize, int ysize, int zsize, bool realToComplex, hipStream_t stream, HipArray& in, HipArray& out);
+    ~HipFFT3D();
+    /**
+     * Perform a Fourier transform.
+     *
+     * @param forward  true to perform a forward transform, false to perform an inverse transform
+     */
+    void execFFT(bool forward);
+    /**
+     * Get the smallest legal size for a dimension of the grid (that is, a size with no prime
+     * factors other than 2, 3, 5, 7, 11, 13).  VkFFT supports arbitrary sizes, but they may
+     * run more slowly.
+     *
+     * @param minimum   the minimum size the return value must be greater than or equal to
+     */
+    static int findLegalDimension(int minimum);
+private:
+    hipStream_t stream;
+    HipContext& context;
+    int deviceIndex;
+    void* inputBuffer;
+    void* outputBuffer;
+    uint64_t inputBufferSize;
+    uint64_t outputBufferSize;
+    VkFFTApplication* app;
+};
+
+} // namespace OpenMM
+
+#endif // __OPENMM_HIPFFT3D_H__
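
The `findLegalDimension()` comment in HipFFT3D.h describes a legal size as one with no prime factors other than 2, 3, 5, 7, 11, and 13. The implementation is not part of this header, so the following is only a sketch of one straightforward way to satisfy that contract. The function name `findLegalDimensionSketch` is hypothetical, and the search assumes `minimum` is at least 1 and small enough that the loop does not overflow `int`.

```cpp
#include <cassert>

// Hypothetical helper, not the OpenMM implementation: return the smallest
// n >= minimum whose prime factors all lie in {2, 3, 5, 7, 11, 13}.
int findLegalDimensionSketch(int minimum) {
    assert(minimum >= 1);
    const int primes[] = {2, 3, 5, 7, 11, 13};
    for (int n = minimum; ; ++n) {
        // Divide out every allowed prime; n is legal if nothing else remains.
        int remainder = n;
        for (int p : primes)
            while (remainder % p == 0)
                remainder /= p;
        if (remainder == 1)
            return n;
    }
}
```

Under this rule a requested minimum of 17 would be rounded up to 18 (2 x 3 x 3), and a minimum of 64 would be returned unchanged.
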
--- a/platforms/hip/include/HipForceInfo.h
+++ b/platforms/hip/include/HipForceInfo.h
+#ifndef OPENMM_HIPFORCEINFO_H_
+#define OPENMM_HIPFORCEINFO_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2009 Stanford University and the Authors.           *
+ * Portions copyright (c) 2020 Advanced Micro Devices, Inc.                   *
+ * Authors: Peter Eastman, Nicholas Curtis                                    *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/common/ComputeForceInfo.h"
+#include "openmm/common/windowsExportCommon.h"
+#include <vector>
+
+namespace OpenMM {
+
+/**
+ * This class exists solely for backward compatibility.  It adds no features beyond the ones
+ * in ComputeForceInfo.
+ */
+
+class OPENMM_EXPORT_COMMON HipForceInfo : public ComputeForceInfo {
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_HIPFORCEINFO_H_*/
--- a/platforms/hip/include/HipIntegrationUtilities.h
+++ b/platforms/hip/include/HipIntegrationUtilities.h
+#ifndef OPENMM_HIPINTEGRATIONUTILITIES_H_
+#define OPENMM_HIPINTEGRATIONUTILITIES_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2009-2019 Stanford University and the Authors.      *
+ * Portions copyright (c) 2020 Advanced Micro Devices, Inc.                   *
+ * Authors: Peter Eastman, Nicholas Curtis                                    *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "HipArray.h"
+#include "openmm/System.h"
+#include "openmm/common/IntegrationUtilities.h"
+#include "openmm/common/windowsExportCommon.h"
+#include <hip/hip_runtime.h>
+
+namespace OpenMM {
+
+class HipContext;
+
+/**
+ * This class implements features that are used by many different integrators, including
+ * common workspace arrays, random number generation, and enforcing constraints.
+ */
+
+class OPENMM_EXPORT_COMMON HipIntegrationUtilities : public IntegrationUtilities {
+public:
+    HipIntegrationUtilities(HipContext& context, const System& system);
+    ~HipIntegrationUtilities();
+    /**
+     * Get the array which contains position deltas.
+     */
+    HipArray& getPosDelta();
+    /**
+     * Get the array which contains random values.  Each element is a float4, whose components
+     * are independent, normally distributed random numbers with mean 0 and variance 1.
+     */
+    HipArray& getRandom();
+    /**
+     * Get the array which contains the current step size.
+     */
+    HipArray& getStepSize();
+    /**
+     * Distribute forces from virtual sites to the atoms they are based on.
+     */
+    void distributeForcesFromVirtualSites();
+private:
+    void applyConstraintsImpl(bool constrainVelocities, double tol);
+    int* ccmaConvergedMemory;
+    hipDeviceptr_t ccmaConvergedDeviceMemory;
+    hipEvent_t ccmaEvent;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_HIPINTEGRATIONUTILITIES_H_*/
--- a/platforms/hip/include/HipKernel.h
+++ b/platforms/hip/include/HipKernel.h
+#ifndef OPENMM_HIPKERNEL_H_
+#define OPENMM_HIPKERNEL_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2019 Stanford University and the Authors.           *
+ * Portions copyright (c) 2020 Advanced Micro Devices, Inc.                   *
+ * Authors: Peter Eastman, Nicholas Curtis                                    *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "HipArray.h"
+#include "HipContext.h"
+#include <string>
+#include <vector>
+
+namespace OpenMM {
+
+/**
+ * This is the HIP implementation of the ComputeKernelImpl interface.
+ */
+
+class HipKernel : public ComputeKernelImpl {
+public:
+    /**
+     * Create a new HipKernel.
+     *
+     * @param context      the context this kernel belongs to
+     * @param kernel       the kernel to be invoked
+     * @param name         the name of the kernel function
+     */
+    HipKernel(HipContext& context, hipFunction_t kernel, const std::string& name);
+    /**
+     * Get the name of this kernel.
+     */
+    std::string getName() const;
+    /**
+     * Get the maximum block size that can be used when executing this kernel.
+     */
+    int getMaxBlockSize() const;
+    /**
+     * Execute this kernel.
+     *
+     * @param threads      the maximum number of threads that should be used.  Depending on the
+     *                     computing device, it may choose to use fewer threads than this number.
+     * @param blockSize    the number of threads in each thread block.  If this is omitted, a
+     *                     default size that is appropriate for the computing device is used.
+     */
+    void execute(int threads, int blockSize=-1);
+protected:
+    /**
+     * Add an argument to pass to the kernel when it is invoked, where the value is a
+     * subclass of ArrayInterface.
+     *
+     * @param value     the value to pass to the kernel
+     */
+    void addArrayArg(ArrayInterface& value);
+    /**
+     * Add an argument to pass to the kernel when it is invoked, where the value is a primitive type.
+     *
+     * @param value    a pointer to the argument value
+     * @param size     the size of the value in bytes
+     */
+    void addPrimitiveArg(const void* value, int size);
+    /**
+     * Add a placeholder for an argument without specifying its value.
+     */
+    void addEmptyArg();
+    /**
+     * Set the value of an argument to pass to the kernel when it is invoked, where the value is a
+     * subclass of ArrayInterface.
+     *
+     * @param index     the index of the argument to set
+     * @param value     the value to pass to the kernel
+     */
+    void setArrayArg(int index, ArrayInterface& value);
+    /**
+     * Set the value of an argument to pass to the kernel when it is invoked, where the value is a primitive type.
+     *
+     * @param index     the index of the argument to set
+     * @param value     a pointer to the argument value
+     * @param size      the size of the value in bytes
+     */
+    void setPrimitiveArg(int index, const void* value, int size);
+private:
+    HipContext& context;
+    hipFunction_t kernel;
+    std::string name;
+    std::vector<double4> primitiveArgs;
+    std::vector<HipArray*> arrayArgs;
+    std::vector<void*> argPointers;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_HIPKERNEL_H_*/
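
The add/set argument methods declared in `HipKernel` suggest a pattern in which a slot can be reserved with `addEmptyArg()` and filled in later by index. The following is a rough sketch of that pattern only. `ArgListSketch` is a hypothetical stand-in, it stores raw bytes rather than the `double4` and `HipArray*` vectors used by the real class, and, unlike the declarations above, its add methods return the new index for convenience.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a kernel argument list; not the HipKernel implementation.
class ArgListSketch {
public:
    // Append a placeholder with no value yet; returns the index of the new slot.
    int addEmptyArg() {
        args.emplace_back();
        return (int) args.size()-1;
    }
    // Append a primitive argument by copying `size` bytes from `value`.
    int addPrimitiveArg(const void* value, int size) {
        int index = addEmptyArg();
        setPrimitiveArg(index, value, size);
        return index;
    }
    // Overwrite the argument stored at `index` with `size` bytes copied from `value`.
    void setPrimitiveArg(int index, const void* value, int size) {
        assert(index >= 0 && index < (int) args.size());
        assert(size >= 0);
        const unsigned char* bytes = static_cast<const unsigned char*>(value);
        args[index].assign(bytes, bytes+size);
    }
    int getNumArgs() const { return (int) args.size(); }
    int getArgSize(int index) const { return (int) args[index].size(); }
    const unsigned char* getArgData(int index) const { return args[index].data(); }
private:
    std::vector<std::vector<unsigned char> > args;
};
```

Copying the bytes at add/set time means the caller's variable does not need to outlive the call, which is presumably why the real interface takes a pointer plus a size.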
--- a/platforms/hip/include/HipKernelFactory.h
+++ b/platforms/hip/include/HipKernelFactory.h
+#ifndef OPENMM_HIPKERNELFACTORY_H_
+#define OPENMM_HIPKERNELFACTORY_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2012 Stanford University and the Authors.           *
+ * Portions copyright (c) 2020 Advanced Micro Devices, Inc.                   *
+ * Authors: Peter Eastman, Nicholas Curtis                                    *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/KernelFactory.h"
+
+namespace OpenMM {
+
+/**
+ * This KernelFactory creates all kernels for HipPlatform.
+ */
+
+class HipKernelFactory : public KernelFactory {
+public:
+    KernelImpl* createKernelImpl(std::string name, const Platform& platform, ContextImpl& context) const;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_HIPKERNELFACTORY_H_*/
--- a/platforms/hip/include/HipKernels.h
+++ b/platforms/hip/include/HipKernels.h
+#ifndef OPENMM_HIPKERNELS_H_
+#define OPENMM_HIPKERNELS_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2008-2024 Stanford University and the Authors.      *
+ * Portions copyright (c) 2020-2022 Advanced Micro Devices, Inc.              *
+ * Authors: Peter Eastman, Nicholas Curtis                                    *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "HipPlatform.h"
+#include "HipArray.h"
+#include "HipContext.h"
+#include "HipFFT3D.h"
+#include "HipSort.h"
+#include "openmm/kernels.h"
+#include "openmm/System.h"
+#include "openmm/common/CommonKernels.h"
+
+namespace OpenMM {
+
+/**
+ * This kernel is invoked at the beginning and end of force and energy computations.  It gives the
+ * Platform a chance to clear buffers and do other initialization at the beginning, and to do any
+ * necessary work at the end to determine the final results.
+ */
+class HipCalcForcesAndEnergyKernel : public CalcForcesAndEnergyKernel {
+public:
+    HipCalcForcesAndEnergyKernel(std::string name, const Platform& platform, HipContext& cu) : CalcForcesAndEnergyKernel(name, platform), cu(cu) {
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     */
+    void initialize(const System& system);
+    /**
+     * This is called at the beginning of each force/energy computation, before calcForcesAndEnergy() has been called on
+     * any ForceImpl.
+     *
+     * @param context       the context in which to execute this kernel
+     * @param includeForce  true if forces should be computed
+     * @param includeEnergy true if potential energy should be computed
+     * @param groups        a set of bit flags for which force groups to include
+     */
+    void beginComputation(ContextImpl& context, bool includeForce, bool includeEnergy, int groups);
+    /**
+     * This is called at the end of each force/energy computation, after calcForcesAndEnergy() has been called on
+     * every ForceImpl.
+     *
+     * @param context       the context in which to execute this kernel
+     * @param includeForce  true if forces should be computed
+     * @param includeEnergy true if potential energy should be computed
+     * @param groups        a set of bit flags for which force groups to include
+     * @param valid         the method may set this to false to indicate the results are invalid and the force/energy
+     *                      calculation should be repeated
+     * @return the potential energy of the system.  This value is added to all values returned by ForceImpls'
+     * calcForcesAndEnergy() methods.  That is, each force kernel may <i>either</i> return its contribution to the
+     * energy directly, <i>or</i> add it to an internal buffer so that it will be included here.
+     */
+    double finishComputation(ContextImpl& context, bool includeForce, bool includeEnergy, int groups, bool& valid);
+private:
+    HipContext& cu;
+};
+
+/**
+ * This kernel is invoked by NonbondedForce to calculate the forces acting on the system.
+ */
+class HipCalcNonbondedForceKernel : public CalcNonbondedForceKernel {
+public:
+    HipCalcNonbondedForceKernel(std::string name, const Platform& platform, HipContext& cu, const System& system) : CalcNonbondedForceKernel(name, platform),
+            cu(cu), hasInitializedFFT(false), sort(NULL), dispersionFft(NULL), fft(NULL), pmeio(NULL), usePmeStream(false) {
+    }
+    ~HipCalcNonbondedForceKernel();
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the NonbondedForce this kernel will be used for
+     */
+    void initialize(const System& system, const NonbondedForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @param includeDirect  true if direct space interactions should be included
+     * @param includeReciprocal  true if reciprocal space interactions should be included
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy, bool includeDirect, bool includeReciprocal);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the NonbondedForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const NonbondedForce& force);
+    /**
+     * Get the parameters being used for PME.
+     *
+     * @param alpha   the separation parameter
+     * @param nx      the number of grid points along the X axis
+     * @param ny      the number of grid points along the Y axis
+     * @param nz      the number of grid points along the Z axis
+     */
+    void getPMEParameters(double& alpha, int& nx, int& ny, int& nz) const;
+    /**
+     * Get the parameters being used for the dispersion term in LJPME.
+     *
+     * @param alpha   the separation parameter
+     * @param nx      the number of grid points along the X axis
+     * @param ny      the number of grid points along the Y axis
+     * @param nz      the number of grid points along the Z axis
+     */
+    void getLJPMEParameters(double& alpha, int& nx, int& ny, int& nz) const;
+private:
+    class SortTrait : public HipSort::SortTrait {
+        int getDataSize() const {return 8;}
+        int getKeySize() const {return 4;}
+        const char* getDataType() const {return "int2";}
+        const char* getKeyType() const {return "int";}
+        const char* getMinKey() const {return "(-2147483647-1)";}
+        const char* getMaxKey() const {return "2147483647";}
+        const char* getMaxValue() const {return "make_int2(2147483647, 2147483647)";}
+        const char* getSortKey() const {return "value.y";}
+    };
+    class ForceInfo;
+    class PmeIO;
+    class PmePreComputation;
+    class PmePostComputation;
+    class SyncStreamPreComputation;
+    class SyncStreamPostComputation;
+    HipContext& cu;
+    ForceInfo* info;
+    bool hasInitializedFFT;
+    HipArray charges;
+    HipArray sigmaEpsilon;
+    HipArray exceptionParams;
+    HipArray exclusionAtoms;
+    HipArray exclusionParams;
+    HipArray baseParticleParams;
+    HipArray baseExceptionParams;
+    HipArray particleParamOffsets;
+    HipArray exceptionParamOffsets;
+    HipArray particleOffsetIndices;
+    HipArray exceptionOffsetIndices;
+    HipArray globalParams;
+    HipArray cosSinSums;
+    HipArray pmeGrid1;
+    HipArray pmeGrid2;
+    HipArray pmeBsplineModuliX;
+    HipArray pmeBsplineModuliY;
+    HipArray pmeBsplineModuliZ;
+    HipArray pmeDispersionBsplineModuliX;
+    HipArray pmeDispersionBsplineModuliY;
+    HipArray pmeDispersionBsplineModuliZ;
+    HipArray pmeAtomGridIndex;
+    HipArray pmeEnergyBuffer;
+    HipSort* sort;
+    Kernel cpuPme;
+    PmeIO* pmeio;
+    hipStream_t pmeStream;
+    hipEvent_t pmeSyncEvent, paramsSyncEvent;
+    HipFFT3D* fft;
+    HipFFT3D* dispersionFft;
+    hipFunction_t computeParamsKernel, computeExclusionParamsKernel;
+    hipFunction_t ewaldSumsKernel;
+    hipFunction_t ewaldForcesKernel;
+    hipFunction_t pmeGridIndexKernel;
+    hipFunction_t pmeDispersionGridIndexKernel;
+    hipFunction_t pmeSpreadChargeKernel;
+    hipFunction_t pmeDispersionSpreadChargeKernel;
+    hipFunction_t pmeFinishSpreadChargeKernel;
+    hipFunction_t pmeDispersionFinishSpreadChargeKernel;
+    hipFunction_t pmeEvalEnergyKernel;
+    hipFunction_t pmeEvalDispersionEnergyKernel;
+    hipFunction_t pmeConvolutionKernel;
+    hipFunction_t pmeDispersionConvolutionKernel;
+    hipFunction_t pmeInterpolateForceKernel;
+    hipFunction_t pmeInterpolateDispersionForceKernel;
+    std::vector<std::pair<int, int> > exceptionAtoms;
+    std::vector<std::string> paramNames;
+    std::vector<double> paramValues;
+    double ewaldSelfEnergy, dispersionCoefficient, alpha, dispersionAlpha;
+    int interpolateForceThreads;
+    int gridSizeX, gridSizeY, gridSizeZ;
+    int dispersionGridSizeX, dispersionGridSizeY, dispersionGridSizeZ;
+    bool hasCoulomb, hasLJ, usePmeStream, doLJPME, usePosqCharges, recomputeParams, hasOffsets;
+    NonbondedMethod nonbondedMethod;
+    static const int PmeOrder = 5;
+};
+
+/**
+ * This kernel is invoked by CustomCVForce to calculate the forces acting on the system and the energy of the system.
+ */
+class HipCalcCustomCVForceKernel : public CommonCalcCustomCVForceKernel {
+public:
+    HipCalcCustomCVForceKernel(std::string name, const Platform& platform, ComputeContext& cc) : CommonCalcCustomCVForceKernel(name, platform, cc) {
+    }
+    ComputeContext& getInnerComputeContext(ContextImpl& innerContext) {
+        return *reinterpret_cast<HipPlatform::PlatformData*>(innerContext.getPlatformData())->contexts[0];
+    }
+};
+
+/**
+ * This kernel is invoked by ATMForce to calculate the forces acting on the system and the energy of the system.
+ */
+class HipCalcATMForceKernel : public CommonCalcATMForceKernel {
+public:
+    HipCalcATMForceKernel(std::string name, const Platform& platform, ComputeContext& cc) : CommonCalcATMForceKernel(name, platform, cc) {
+    }
+    ComputeContext& getInnerComputeContext(ContextImpl& innerContext) {
+        return *reinterpret_cast<HipPlatform::PlatformData*>(innerContext.getPlatformData())->contexts[0];
+    }
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_HIPKERNELS_H_*/
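
The `groups` argument of `beginComputation()` and `finishComputation()` is documented as a set of bit flags selecting force groups. A minimal sketch of how such a flag set can be tested follows. The helper name `includesGroupSketch` is hypothetical, and the assumption that bit `i` corresponds to force group `i` (with groups numbered 0 to 31) is mine rather than something stated in this header.

```cpp
#include <cassert>

// Hypothetical helper: true if force group `group` is selected in the
// bit-flag set `groups`, assuming bit i stands for group i.
bool includesGroupSketch(int groups, int group) {
    assert(group >= 0 && group < 32);
    // Cast to unsigned so that shifting a value with the top bit set is well defined.
    return ((static_cast<unsigned int>(groups) >> group) & 1u) != 0;
}
```

With this convention, a value of 5 (binary 101) would select groups 0 and 2, and a value of -1 (all bits set) would select every group.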
--- a/platforms/hip/include/HipNonbondedUtilities.h
+++ b/platforms/hip/include/HipNonbondedUtilities.h
+#ifndef OPENMM_HIPNONBONDEDUTILITIES_H_
+#define OPENMM_HIPNONBONDEDUTILITIES_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2009-2023 Stanford University and the Authors.      *
+ * Portions copyright (c) 2020-2023 Advanced Micro Devices, Inc.              *
+ * Authors: Peter Eastman, Nicholas Curtis                                    *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "openmm/System.h"
+#include "HipArray.h"
+#include "HipExpressionUtilities.h"
+#include "openmm/common/NonbondedUtilities.h"
+#include <hip/hip_runtime.h>
+#include <sstream>
+#include <string>
+#include <vector>
+
+namespace OpenMM {
+
+class HipContext;
+class HipSort;
+
+/**
+ * This class provides a generic interface for calculating nonbonded interactions.  It does this in two
+ * ways.  First, it can be used to create kernels that evaluate nonbonded interactions.  Clients
+ * only need to provide the code for evaluating a single interaction and the list of parameters it depends on.
+ * A complete kernel is then synthesized using an appropriate algorithm to evaluate all interactions on all
+ * atoms.
+ *
+ * Second, this class itself creates and invokes a single "default" interaction kernel, allowing several
+ * different forces to be evaluated at once for greater efficiency.  Call addInteraction() and addParameter()
+ * to add interactions to this default kernel.
+ *
+ * During each force or energy evaluation, the following sequence of steps takes place:
+ *
+ * 1. Data structures (e.g. neighbor lists) are calculated to allow nonbonded interactions to be evaluated
+ * quickly.
+ *
+ * 2. calcForcesAndEnergy() is called on each ForceImpl in the System.
+ *
+ * 3. Finally, the default interaction kernel is invoked to calculate all interactions that were added
+ * to it.
+ *
+ * This sequence means that the default interaction kernel may depend on quantities that were calculated
+ * by ForceImpls during calcForcesAndEnergy().
+ */
+
+class OPENMM_EXPORT_COMMON HipNonbondedUtilities : public NonbondedUtilities {
+public:
+    class ParameterInfo;
+    HipNonbondedUtilities(HipContext& context);
+    ~HipNonbondedUtilities();
+    /**
+     * Add a nonbonded interaction to be evaluated by the default interaction kernel.
+     *
+     * @param usesCutoff       specifies whether a cutoff should be applied to this interaction
+     * @param usesPeriodic     specifies whether periodic boundary conditions should be applied to this interaction
+     * @param usesExclusions   specifies whether this interaction uses exclusions.  If this is true, it must have identical exclusions to every other interaction.
+     * @param cutoffDistance   the cutoff distance for this interaction (ignored if usesCutoff is false)
+     * @param exclusionList    for each atom, specifies the list of other atoms whose interactions should be excluded
+     * @param kernel           the code to evaluate the interaction
+     * @param forceGroup       the force group in which the interaction should be calculated
+     * @param usesNeighborList specifies whether a neighbor list should be used to optimize this interaction.  This should
+     *                         be viewed as only a suggestion.  Even when it is false, a neighbor list may be used anyway.
+     */
+    void addInteraction(bool usesCutoff, bool usesPeriodic, bool usesExclusions, double cutoffDistance, const std::vector<std::vector<int> >& exclusionList, const std::string& kernel, int forceGroup, bool usesNeighborList = true);
+    /**
+     * Add a nonbonded interaction to be evaluated by the default interaction kernel.
+     *
+     * @param usesCutoff       specifies whether a cutoff should be applied to this interaction
+     * @param usesPeriodic     specifies whether periodic boundary conditions should be applied to this interaction
+     * @param usesExclusions   specifies whether this interaction uses exclusions.  If this is true, it must have identical exclusions to every other interaction.
+     * @param cutoffDistance   the cutoff distance for this interaction (ignored if usesCutoff is false)
+     * @param exclusionList    for each atom, specifies the list of other atoms whose interactions should be excluded
+     * @param kernel           the code to evaluate the interaction
+     * @param forceGroup       the force group in which the interaction should be calculated
+     * @param usesNeighborList specifies whether a neighbor list should be used to optimize this interaction.  This should
+     *                         be viewed as only a suggestion.  Even when it is false, a neighbor list may be used anyway.
+     * @param supportsPairList specifies whether this interaction can work with a neighbor list that uses a separate pair list
+     */
+    void addInteraction(bool usesCutoff, bool usesPeriodic, bool usesExclusions, double cutoffDistance, const std::vector<std::vector<int> >& exclusionList, const std::string& kernel, int forceGroup, bool usesNeighborList, bool supportsPairList);
+    /**
+     * Add a per-atom parameter that the default interaction kernel may depend on.
+     */
+    void addParameter(ComputeParameterInfo parameter);
+    /**
+     * Add a per-atom parameter that the default interaction kernel may depend on.
+     *
+     * @deprecated Use the version that takes a ComputeParameterInfo instead.
+     */
+    void addParameter(const ParameterInfo& parameter);
+    /**
+     * Add an array (other than a per-atom parameter) that should be passed as an argument to the default interaction kernel.
+     */
+    void addArgument(ComputeParameterInfo parameter);
+    /**
+     * Add an array (other than a per-atom parameter) that should be passed as an argument to the default interaction kernel.
+     *
+     * @deprecated Use the version that takes a ComputeParameterInfo instead.
+     */
+    void addArgument(const ParameterInfo& parameter);
+    /**
+     * Register that the interaction kernel will be computing the derivative of the potential energy
+     * with respect to a parameter.
+     *
+     * @param param   the name of the parameter
+     * @return the variable that will be used to accumulate the derivative.  Any code you pass to addInteraction() should
+     * add its contributions to this variable.
+     */
+    std::string addEnergyParameterDerivative(const std::string& param);
+    /**
+     * Specify the list of exclusions that an interaction outside the default kernel will depend on.
+     *
+     * @param exclusionList  for each atom, specifies the list of other atoms whose interactions should be excluded
+     */
+    void requestExclusions(const std::vector<std::vector<int> >& exclusionList);
+    /**
+     * Initialize this object in preparation for a simulation.
+     */
+    void initialize(const System& system);
+    /**
+     * Get the number of force buffers required for nonbonded forces.
+     */
+    int getNumForceBuffers() const {
+        return 0;
+    }
+    /**
+     * Get the number of energy buffers required for nonbonded forces.
+     */
+    int getNumEnergyBuffers() {
+        return numForceThreadBlocks*forceThreadBlockSize;
+    }
+    /**
+     * Get whether a cutoff is being used.
+     */
+    bool getUseCutoff() {
+        return useCutoff;
+    }
+    /**
+     * Get whether periodic boundary conditions are being used.
+     */
+    bool getUsePeriodic() {
+        return usePeriodic;
+    }
+    /**
+     * Get the number of work groups used for computing nonbonded forces.
+     */
+    int getNumForceThreadBlocks() {
+        return numForceThreadBlocks;
+    }
+    /**
+     * Get the size of each work group used for computing nonbonded forces.
+     */
+    int getForceThreadBlockSize() {
+        return forceThreadBlockSize;
+    }
+    /**
+     * Get the maximum cutoff distance used by any force group.
+     */
+    double getMaxCutoffDistance();
+    /**
+     * Given a nonbonded cutoff, get the padded cutoff distance used in computing
+     * the neighbor list.
+     */
+    double padCutoff(double cutoff);
+    /**
+     * Prepare to compute interactions.  This updates the neighbor list.
+     */
+    void prepareInteractions(int forceGroups);
+    /**
+     * Compute the nonbonded interactions.
+     *
+     * @param forceGroups    the flags specifying which force groups to include
+     * @param includeForces  whether to compute forces
+     * @param includeEnergy  whether to compute the potential energy
+     */
+    void computeInteractions(int forceGroups, bool includeForces, bool includeEnergy);
+    /**
+     * Check to see if the neighbor list arrays are large enough, and make them bigger if necessary.
+     *
+     * @return true if the neighbor list needed to be enlarged.
+     */
+    bool updateNeighborListSize();
+    /**
+     * Get the array containing the center of each atom block.
+     */
+    HipArray& getBlockCenters() {
+        return blockCenter;
+    }
+    /**
+     * Get the array containing the dimensions of each atom block.
+     */
+    HipArray& getBlockBoundingBoxes() {
+        return blockBoundingBox;
+    }
+    /**
+     * Get the array whose first element contains the number of tiles with interactions.
+     */
+    HipArray& getInteractionCount() {
+        return interactionCount;
+    }
+    /**
+     * Get the array containing tiles with interactions.
+     */
+    HipArray& getInteractingTiles() {
+        return interactingTiles;
+    }
+    /**
+     * Get the array containing the atoms in each tile with interactions.
+     */
+    HipArray& getInteractingAtoms() {
+        return interactingAtoms;
+    }
+    /**
+     * Get the array containing single pairs in the neighbor list.
+     */
+    HipArray& getSinglePairs() {
+        return singlePairs;
+    }
+    /**
+     * Get the array containing exclusion flags.
+     */
+    HipArray& getExclusions() {
+        return exclusions;
+    }
+    /**
+     * Get the array containing tiles with exclusions.
+     */
+    HipArray& getExclusionTiles() {
+        return exclusionTiles;
+    }
+    /**
+     * Get the array containing the index into the exclusion array for each tile.
+     */
+    HipArray& getExclusionIndices() {
+        return exclusionIndices;
+    }
+    /**
+     * Get the array listing where the exclusion data starts for each row.
+     */
+    HipArray& getExclusionRowIndices() {
+        return exclusionRowIndices;
+    }
+    /**
+     * Get the array containing a flag for whether the neighbor list was rebuilt
+     * on the most recent call to prepareInteractions().
+     */
+    HipArray& getRebuildNeighborList() {
+        return rebuildNeighborList;
+    }
+    /**
+     * Get the index of the first tile this context is responsible for processing.
+     */
+    int getStartTileIndex() const {
+        return startTileIndex;
+    }
+    /**
+     * Get the total number of tiles this context is responsible for processing.
+     */
+    int getNumTiles() const {
+        return numTiles;
+    }
+    /**
+     * Set whether to add padding to the cutoff distance when building the neighbor list.
+     * This increases the size of the neighbor list (and thus the cost of computing interactions),
+     * but also means we don't need to rebuild it every time step.  The default value is true,
+     * since usually this improves performance.  For very expensive interactions, however,
+     * it may be better to set this to false.
+     */
+    void setUsePadding(bool padding);
+    /**
+     * Set the range of atom blocks and tiles that should be processed by this context.
+     */
+    void setAtomBlockRange(double startFraction, double endFraction);
+    /**
+     * Create a Kernel for evaluating a nonbonded interaction.  Cutoffs and periodic boundary conditions
+     * are assumed to be the same as those for the default interaction Kernel, since this kernel will use
+     * the same neighbor list.
+     *
+     * @param source        the source code for evaluating the force and energy
+     * @param params        the per-atom parameters this kernel may depend on
+     * @param arguments     arrays (other than per-atom parameters) that should be passed as arguments to the kernel
+     * @param useExclusions specifies whether exclusions are applied to this interaction
+     * @param isSymmetric   specifies whether the interaction is symmetric
+     * @param groups        the set of force groups this kernel is for
+     * @param includeForces whether this kernel should compute forces
+     * @param includeEnergy whether this kernel should compute potential energy
+     */
+    hipFunction_t createInteractionKernel(const std::string& source, std::vector<ParameterInfo>& params, std::vector<ParameterInfo>& arguments, bool useExclusions, bool isSymmetric, int groups, bool includeForces, bool includeEnergy);
+    /**
+     * Create the set of kernels that will be needed for a particular combination of force groups.
+     *
+     * @param groups    the set of force groups
+     */
+    void createKernelsForGroups(int groups);
+    /**
+     * Set the source code for the main kernel.  This defaults to the content of nonbonded.hip.  It only needs to be
+     * changed in very unusual circumstances.
+     */
+    void setKernelSource(const std::string& source);
+private:
+    class KernelSet;
+    class BlockSortTrait;
+    HipContext& context;
+    std::map<int, KernelSet> groupKernels;
+    HipArray exclusionTiles;
+    HipArray exclusions;
+    HipArray exclusionIndices;
+    HipArray exclusionRowIndices;
+    HipArray interactingTiles;
+    HipArray interactingAtoms;
+    HipArray interactionCount;
+    HipArray singlePairs;
+    HipArray singlePairCount;
+    HipArray blockCenter;
+    HipArray blockBoundingBox;
+    HipArray sortedBlocks;
+    HipArray sortedBlockCenter;
+    HipArray sortedBlockBoundingBox;
+    HipArray blockSizeRange;
+    HipArray largeBlockCenter;
+    HipArray largeBlockBoundingBox;
+    HipArray oldPositions;
+    HipArray rebuildNeighborList;
+    HipSort* blockSorter;
+    hipEvent_t downloadCountEvent;
+    unsigned int* pinnedCountBuffer;
+    std::vector<void*> forceArgs, findBlockBoundsArgs, computeSortKeysArgs, sortBoxDataArgs, findInteractingBlocksArgs, copyInteractionCountsArgs;
+    std::vector<std::vector<int> > atomExclusions;
+    std::vector<ParameterInfo> parameters;
+    std::vector<ParameterInfo> arguments;
+    std::vector<std::string> energyParameterDerivatives;
+    std::map<int, double> groupCutoff;
+    std::map<int, std::string> groupKernelSource;
+    double lastCutoff;
+    bool useCutoff, usePeriodic, anyExclusions, usePadding, useNeighborList, forceRebuildNeighborList, canUsePairList, useLargeBlocks;
+    int startTileIndex, startBlockIndex, numBlocks, numTilesInBatch, maxExclusions;
+    int numForceThreadBlocks, forceThreadBlockSize, findInteractingBlocksThreadBlockSize, numAtoms, groupFlags;
+    unsigned int maxTiles, maxSinglePairs, tilesAfterReorder;
+    long long numTiles;
+    std::string kernelSource;
+};
+
+/**
+ * This class stores the kernels to execute for a set of force groups.
+ */
+
+class HipNonbondedUtilities::KernelSet {
+public:
+    bool hasForces;
+    double cutoffDistance;
+    std::string source;
+    hipFunction_t forceKernel, energyKernel, forceEnergyKernel;
+    hipFunction_t findBlockBoundsKernel;
+    hipFunction_t computeSortKeysKernel;
+    hipFunction_t sortBoxDataKernel;
+    hipFunction_t findInteractingBlocksKernel;
+    hipFunction_t copyInteractionCountsKernel;
+};
+
+/**
+ * This class stores information about a per-atom parameter that may be used in a nonbonded kernel.
+ */
+
+class HipNonbondedUtilities::ParameterInfo {
+public:
+    /**
+     * Create a ParameterInfo object.
+     *
+     * @param name           the name of the parameter
+     * @param componentType  the data type of the parameter's components
+     * @param numComponents  the number of components in the parameter
+     * @param size           the size of the parameter in bytes
+     * @param memory         the memory containing the parameter values
+     * @param constant       whether the memory should be marked as constant
+     */
+    ParameterInfo(const std::string& name, const std::string& componentType, int numComponents, int size, hipDeviceptr_t memory, bool constant=true) :
+            name(name), componentType(componentType), numComponents(numComponents), size(size), memory(memory), constant(constant) {
+        if (numComponents == 1)
+            type = componentType;
+        else {
+            std::stringstream s;
+            s << componentType << numComponents;
+            type = s.str();
+        }
+    }
+    const std::string& getName() const {
+        return name;
+    }
+    const std::string& getComponentType() const {
+        return componentType;
+    }
+    const std::string& getType() const {
+        return type;
+    }
+    int getNumComponents() const {
+        return numComponents;
+    }
+    int getSize() const {
+        return size;
+    }
+    hipDeviceptr_t& getMemory() {
+        return memory;
+    }
+    bool isConstant() const {
+        return constant;
+    }
+private:
+    std::string name;
+    std::string componentType;
+    std::string type;
+    int numComponents, size;
+    hipDeviceptr_t memory;
+    bool constant;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_HIPNONBONDEDUTILITIES_H_*/
--- a/platforms/hip/include/HipParallelKernels.h
+++ b/platforms/hip/include/HipParallelKernels.h
+#ifndef OPENMM_HIPPARALLELKERNELS_H_
+#define OPENMM_HIPPARALLELKERNELS_H_
+
+/* -------------------------------------------------------------------------- *
+ *                                   OpenMM                                   *
+ * -------------------------------------------------------------------------- *
+ * This is part of the OpenMM molecular simulation toolkit originating from   *
+ * Simbios, the NIH National Center for Physics-Based Simulation of           *
+ * Biological Structures at Stanford, funded under the NIH Roadmap for        *
+ * Medical Research, grant U54 GM072970. See https://simtk.org.               *
+ *                                                                            *
+ * Portions copyright (c) 2011-2019 Stanford University and the Authors.      *
+ * Portions copyright (c) 2020-2023 Advanced Micro Devices, Inc.              *
+ * Authors: Peter Eastman, Nicholas Curtis                                    *
+ * Contributors:                                                              *
+ *                                                                            *
+ * This program is free software: you can redistribute it and/or modify       *
+ * it under the terms of the GNU Lesser General Public License as published   *
+ * by the Free Software Foundation, either version 3 of the License, or       *
+ * (at your option) any later version.                                        *
+ *                                                                            *
+ * This program is distributed in the hope that it will be useful,            *
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
+ * GNU Lesser General Public License for more details.                        *
+ *                                                                            *
+ * You should have received a copy of the GNU Lesser General Public License   *
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
+ * -------------------------------------------------------------------------- */
+
+#include "HipPlatform.h"
+#include "HipContext.h"
+#include "HipKernels.h"
+#include "openmm/common/CommonKernels.h"
+
+namespace OpenMM {
+
+/**
+ * This kernel is invoked at the beginning and end of force and energy computations.  It gives the
+ * Platform a chance to clear buffers and do other initialization at the beginning, and to do any
+ * necessary work at the end to determine the final results.
+ */
+class HipParallelCalcForcesAndEnergyKernel : public CalcForcesAndEnergyKernel {
+public:
+    HipParallelCalcForcesAndEnergyKernel(std::string name, const Platform& platform, HipPlatform::PlatformData& data);
+    ~HipParallelCalcForcesAndEnergyKernel();
+    HipCalcForcesAndEnergyKernel& getKernel(int index) {
+        return dynamic_cast<HipCalcForcesAndEnergyKernel&>(kernels[index].getImpl());
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     */
+    void initialize(const System& system);
+    /**
+     * This is called at the beginning of each force/energy computation, before calcForcesAndEnergy() has been called on
+     * any ForceImpl.
+     *
+     * @param context       the context in which to execute this kernel
+     * @param includeForce  true if forces should be computed
+     * @param includeEnergy true if potential energy should be computed
+     * @param groups        a set of bit flags for which force groups to include
+     */
+    void beginComputation(ContextImpl& context, bool includeForce, bool includeEnergy, int groups);
+    /**
+     * This is called at the end of each force/energy computation, after calcForcesAndEnergy() has been called on
+     * every ForceImpl.
+     *
+     * @param context       the context in which to execute this kernel
+     * @param includeForce  true if forces should be computed
+     * @param includeEnergy true if potential energy should be computed
+     * @param groups        a set of bit flags for which force groups to include
+     * @param valid         the method may set this to false to indicate the results are invalid and the force/energy
+     *                      calculation should be repeated
+     * @return the potential energy of the system.  This value is added to all values returned by ForceImpls'
+     * calcForcesAndEnergy() methods.  That is, each force kernel may <i>either</i> return its contribution to the
+     * energy directly, <i>or</i> add it to an internal buffer so that it will be included here.
+     */
+    double finishComputation(ContextImpl& context, bool includeForce, bool includeEnergy, int groups, bool& valid);
+private:
+    class BeginComputationTask;
+    class FinishComputationTask;
+    HipPlatform::PlatformData& data;
+    std::vector<Kernel> kernels;
+    std::vector<long long> completionTimes;
+    std::vector<double> contextNonbondedFractions;
+    HipArray contextForces;
+    void* pinnedPositionBuffer;
+    long long* pinnedForceBuffer;
+    hipFunction_t sumKernel;
+    hipEvent_t event;
+    std::vector<hipEvent_t> peerCopyEvent;
+    std::vector<hipEvent_t> peerCopyEventLocal;
+    std::vector<hipStream_t> peerCopyStream;
+};
+
+/**
+ * This kernel is invoked by HarmonicBondForce to calculate the forces acting on the system and the energy of the system.
+ */
+class HipParallelCalcHarmonicBondForceKernel : public CalcHarmonicBondForceKernel {
+public:
+    HipParallelCalcHarmonicBondForceKernel(std::string name, const Platform& platform, HipPlatform::PlatformData& data, const System& system);
+    CommonCalcHarmonicBondForceKernel& getKernel(int index) {
+        return dynamic_cast<CommonCalcHarmonicBondForceKernel&>(kernels[index].getImpl());
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the HarmonicBondForce this kernel will be used for
+     */
+    void initialize(const System& system, const HarmonicBondForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the HarmonicBondForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const HarmonicBondForce& force);
+private:
+    class Task;
+    HipPlatform::PlatformData& data;
+    std::vector<Kernel> kernels;
+};
+
+/**
+ * This kernel is invoked by CustomBondForce to calculate the forces acting on the system and the energy of the system.
+ */
+class HipParallelCalcCustomBondForceKernel : public CalcCustomBondForceKernel {
+public:
+    HipParallelCalcCustomBondForceKernel(std::string name, const Platform& platform, HipPlatform::PlatformData& data, const System& system);
+    CommonCalcCustomBondForceKernel& getKernel(int index) {
+        return dynamic_cast<CommonCalcCustomBondForceKernel&>(kernels[index].getImpl());
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CustomBondForce this kernel will be used for
+     */
+    void initialize(const System& system, const CustomBondForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CustomBondForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CustomBondForce& force);
+private:
+    class Task;
+    HipPlatform::PlatformData& data;
+    std::vector<Kernel> kernels;
+};
+
+/**
+ * This kernel is invoked by HarmonicAngleForce to calculate the forces acting on the system and the energy of the system.
+ */
+class HipParallelCalcHarmonicAngleForceKernel : public CalcHarmonicAngleForceKernel {
+public:
+    HipParallelCalcHarmonicAngleForceKernel(std::string name, const Platform& platform, HipPlatform::PlatformData& data, const System& system);
+    CommonCalcHarmonicAngleForceKernel& getKernel(int index) {
+        return dynamic_cast<CommonCalcHarmonicAngleForceKernel&>(kernels[index].getImpl());
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the HarmonicAngleForce this kernel will be used for
+     */
+    void initialize(const System& system, const HarmonicAngleForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the HarmonicAngleForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const HarmonicAngleForce& force);
+private:
+    class Task;
+    HipPlatform::PlatformData& data;
+    std::vector<Kernel> kernels;
+};
+
+/**
+ * This kernel is invoked by CustomAngleForce to calculate the forces acting on the system and the energy of the system.
+ */
+class HipParallelCalcCustomAngleForceKernel : public CalcCustomAngleForceKernel {
+public:
+    HipParallelCalcCustomAngleForceKernel(std::string name, const Platform& platform, HipPlatform::PlatformData& data, const System& system);
+    CommonCalcCustomAngleForceKernel& getKernel(int index) {
+        return dynamic_cast<CommonCalcCustomAngleForceKernel&>(kernels[index].getImpl());
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CustomAngleForce this kernel will be used for
+     */
+    void initialize(const System& system, const CustomAngleForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CustomAngleForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CustomAngleForce& force);
+private:
+    class Task;
+    HipPlatform::PlatformData& data;
+    std::vector<Kernel> kernels;
+};
+
+/**
+ * This kernel is invoked by PeriodicTorsionForce to calculate the forces acting on the system and the energy of the system.
+ */
+class HipParallelCalcPeriodicTorsionForceKernel : public CalcPeriodicTorsionForceKernel {
+public:
+    HipParallelCalcPeriodicTorsionForceKernel(std::string name, const Platform& platform, HipPlatform::PlatformData& data, const System& system);
+    CommonCalcPeriodicTorsionForceKernel& getKernel(int index) {
+        return dynamic_cast<CommonCalcPeriodicTorsionForceKernel&>(kernels[index].getImpl());
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the PeriodicTorsionForce this kernel will be used for
+     */
+    void initialize(const System& system, const PeriodicTorsionForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the PeriodicTorsionForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const PeriodicTorsionForce& force);
+private:
+    class Task;
+    HipPlatform::PlatformData& data;
+    std::vector<Kernel> kernels;
+};
+
+/**
+ * This kernel is invoked by RBTorsionForce to calculate the forces acting on the system and the energy of the system.
+ */
+class HipParallelCalcRBTorsionForceKernel : public CalcRBTorsionForceKernel {
+public:
+    HipParallelCalcRBTorsionForceKernel(std::string name, const Platform& platform, HipPlatform::PlatformData& data, const System& system);
+    CommonCalcRBTorsionForceKernel& getKernel(int index) {
+        return dynamic_cast<CommonCalcRBTorsionForceKernel&>(kernels[index].getImpl());
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the RBTorsionForce this kernel will be used for
+     */
+    void initialize(const System& system, const RBTorsionForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the RBTorsionForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const RBTorsionForce& force);
+private:
+    class Task;
+    HipPlatform::PlatformData& data;
+    std::vector<Kernel> kernels;
+};
+
+/**
+ * This kernel is invoked by CMAPTorsionForce to calculate the forces acting on the system and the energy of the system.
+ */
+class HipParallelCalcCMAPTorsionForceKernel : public CalcCMAPTorsionForceKernel {
+public:
+    HipParallelCalcCMAPTorsionForceKernel(std::string name, const Platform& platform, HipPlatform::PlatformData& data, const System& system);
+    CommonCalcCMAPTorsionForceKernel& getKernel(int index) {
+        return dynamic_cast<CommonCalcCMAPTorsionForceKernel&>(kernels[index].getImpl());
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CMAPTorsionForce this kernel will be used for
+     */
+    void initialize(const System& system, const CMAPTorsionForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CMAPTorsionForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CMAPTorsionForce& force);
+private:
+    class Task;
+    HipPlatform::PlatformData& data;
+    std::vector<Kernel> kernels;
+};
+
+/**
+ * This kernel is invoked by CustomTorsionForce to calculate the forces acting on the system and the energy of the system.
+ */
+class HipParallelCalcCustomTorsionForceKernel : public CalcCustomTorsionForceKernel {
+public:
+    HipParallelCalcCustomTorsionForceKernel(std::string name, const Platform& platform, HipPlatform::PlatformData& data, const System& system);
+    CommonCalcCustomTorsionForceKernel& getKernel(int index) {
+        return dynamic_cast<CommonCalcCustomTorsionForceKernel&>(kernels[index].getImpl());
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CustomTorsionForce this kernel will be used for
+     */
+    void initialize(const System& system, const CustomTorsionForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CustomTorsionForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CustomTorsionForce& force);
+private:
+    class Task;
+    HipPlatform::PlatformData& data;
+    std::vector<Kernel> kernels;
+};
+
+/**
+ * This kernel is invoked by NonbondedForce to calculate the forces acting on the system.
+ */
+class HipParallelCalcNonbondedForceKernel : public CalcNonbondedForceKernel {
+public:
+    HipParallelCalcNonbondedForceKernel(std::string name, const Platform& platform, HipPlatform::PlatformData& data, const System& system);
+    HipCalcNonbondedForceKernel& getKernel(int index) {
+        return dynamic_cast<HipCalcNonbondedForceKernel&>(kernels[index].getImpl());
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the NonbondedForce this kernel will be used for
+     */
+    void initialize(const System& system, const NonbondedForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @param includeDirect  true if direct space interactions should be included
+     * @param includeReciprocal  true if reciprocal space interactions should be included
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy, bool includeDirect, bool includeReciprocal);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the NonbondedForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const NonbondedForce& force);
+    /**
+     * Get the parameters being used for PME.
+     *
+     * @param alpha   the separation parameter
+     * @param nx      the number of grid points along the X axis
+     * @param ny      the number of grid points along the Y axis
+     * @param nz      the number of grid points along the Z axis
+     */
+    void getPMEParameters(double& alpha, int& nx, int& ny, int& nz) const;
+    /**
+     * Get the parameters being used for the dispersion term in LJPME.
+     *
+     * @param alpha   the separation parameter
+     * @param nx      the number of grid points along the X axis
+     * @param ny      the number of grid points along the Y axis
+     * @param nz      the number of grid points along the Z axis
+     */
+    void getLJPMEParameters(double& alpha, int& nx, int& ny, int& nz) const;
+private:
+    class Task;
+    HipPlatform::PlatformData& data;
+    std::vector<Kernel> kernels;
+};
+
+/**
+ * This kernel is invoked by CustomNonbondedForce to calculate the forces acting on the system.
+ */
+class HipParallelCalcCustomNonbondedForceKernel : public CalcCustomNonbondedForceKernel {
+public:
+    HipParallelCalcCustomNonbondedForceKernel(std::string name, const Platform& platform, HipPlatform::PlatformData& data, const System& system);
+    CommonCalcCustomNonbondedForceKernel& getKernel(int index) {
+        return dynamic_cast<CommonCalcCustomNonbondedForceKernel&>(kernels[index].getImpl());
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CustomNonbondedForce this kernel will be used for
+     */
+    void initialize(const System& system, const CustomNonbondedForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CustomNonbondedForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CustomNonbondedForce& force);
+private:
+    class Task;
+    HipPlatform::PlatformData& data;
+    std::vector<Kernel> kernels;
+};
+
+/**
+ * This kernel is invoked by CustomExternalForce to calculate the forces acting on the system and the energy of the system.
+ */
+class HipParallelCalcCustomExternalForceKernel : public CalcCustomExternalForceKernel {
+public:
+    HipParallelCalcCustomExternalForceKernel(std::string name, const Platform& platform, HipPlatform::PlatformData& data, const System& system);
+    CommonCalcCustomExternalForceKernel& getKernel(int index) {
+        return dynamic_cast<CommonCalcCustomExternalForceKernel&>(kernels[index].getImpl());
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CustomExternalForce this kernel will be used for
+     */
+    void initialize(const System& system, const CustomExternalForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CustomExternalForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CustomExternalForce& force);
+private:
+    class Task;
+    HipPlatform::PlatformData& data;
+    std::vector<Kernel> kernels;
+};
+
+/**
+ * This kernel is invoked by CustomHbondForce to calculate the forces acting on the system.
+ */
+class HipParallelCalcCustomHbondForceKernel : public CalcCustomHbondForceKernel {
+public:
+    HipParallelCalcCustomHbondForceKernel(std::string name, const Platform& platform, HipPlatform::PlatformData& data, const System& system);
+    CommonCalcCustomHbondForceKernel& getKernel(int index) {
+        return dynamic_cast<CommonCalcCustomHbondForceKernel&>(kernels[index].getImpl());
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CustomHbondForce this kernel will be used for
+     */
+    void initialize(const System& system, const CustomHbondForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CustomHbondForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CustomHbondForce& force);
+private:
+    class Task;
+    HipPlatform::PlatformData& data;
+    std::vector<Kernel> kernels;
+};
+
+/**
+ * This kernel is invoked by CustomCompoundBondForce to calculate the forces acting on the system.
+ */
+class HipParallelCalcCustomCompoundBondForceKernel : public CalcCustomCompoundBondForceKernel {
+public:
+    HipParallelCalcCustomCompoundBondForceKernel(std::string name, const Platform& platform, HipPlatform::PlatformData& data, const System& system);
+    CommonCalcCustomCompoundBondForceKernel& getKernel(int index) {
+        return dynamic_cast<CommonCalcCustomCompoundBondForceKernel&>(kernels[index].getImpl());
+    }
+    /**
+     * Initialize the kernel.
+     *
+     * @param system     the System this kernel will be applied to
+     * @param force      the CustomCompoundBondForce this kernel will be used for
+     */
+    void initialize(const System& system, const CustomCompoundBondForce& force);
+    /**
+     * Execute the kernel to calculate the forces and/or energy.
+     *
+     * @param context        the context in which to execute this kernel
+     * @param includeForces  true if forces should be calculated
+     * @param includeEnergy  true if the energy should be calculated
+     * @return the potential energy due to the force
+     */
+    double execute(ContextImpl& context, bool includeForces, bool includeEnergy);
+    /**
+     * Copy changed parameters over to a context.
+     *
+     * @param context    the context to copy parameters to
+     * @param force      the CustomCompoundBondForce to copy the parameters from
+     */
+    void copyParametersToContext(ContextImpl& context, const CustomCompoundBondForce& force);
+private:
+    class Task;
+    HipPlatform::PlatformData& data;
+    std::vector<Kernel> kernels;
+};
+
+} // namespace OpenMM
+
+#endif /*OPENMM_HIPPARALLELKERNELS_H_*/