Always use nvrtc for compilation (#4146)

* Always use nvrtc for compilation
* Install nvrtc on CI
* Workaround for compiler error
* Set empty values for deprecated properties

dd07fa70 · Peter Eastman · GitHub · 007ab83d
Unverified commit dd07fa70, authored Jul 20, 2023 by Peter Eastman; committed by GitHub Jul 20, 2023
20 changed files
--- a/.github/workflows/CI.yml
+++ b/.github/workflows/CI.yml
@@ -289,8 +289,6 @@ jobs:
          # With CUDA, we _expect_ CUDA plugins :)
          if [[ "${{ matrix.cuda-version }}" != "" ]]; then
            test -f ${CONDA_PREFIX}/lib/plugins/libOpenMMCUDA.$SHLIB
-            # TODO: Check with Peter why this is not there. Maybe we need an extra flag?
-            # test -f ${CONDA_PREFIX}/lib/plugins/libOpenMMCudaCompiler.$SHLIB
          fi
          # OpenCL should also be there for CUDA and, well, OpenCL
          if [[ "${{ matrix.cuda-version }}" != "" || ${{ matrix.OPENCL }} == true ]]; then
@@ -447,7 +445,6 @@ jobs:
          if not "${{ matrix.cuda-version }}" == "" (
            if not exist %CONDA_PREFIX%/Library/lib/plugins/OpenMMCUDA.lib exit 1
            if not exist %CONDA_PREFIX%/Library/lib/plugins/OpenMMOpenCL.lib exit 1
-            if not exist %CONDA_PREFIX%/Library/lib/plugins/OpenMMCudaCompiler.lib exit 1
          ) else (
            if not exist %CONDA_PREFIX%/Library/lib/plugins/OpenMMCPU.lib exit 1
            if not exist %CONDA_PREFIX%/Library/lib/plugins/OpenMMPME.lib exit 1

--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -401,21 +401,6 @@ IF(OPENMM_BUILD_PME_PLUGIN)
   ADD_SUBDIRECTORY(plugins/cpupme)
 ENDIF(OPENMM_BUILD_PME_PLUGIN)

-# CUDA compiler plugin
-
-GET_FILENAME_COMPONENT(CUDA_LIB_DIR "${CUDA_cufft_LIBRARY}" PATH)
-FIND_LIBRARY(CUDA_nvrtc_LIBRARY nvrtc "${CUDA_LIB_DIR}")
-IF(CUDA_nvrtc_LIBRARY)
-    SET(OPENMM_BUILD_CUDA_COMPILER_PLUGIN ON CACHE BOOL "Build CUDA runtime compiler plugin")
-ELSE(CUDA_nvrtc_LIBRARY)
-    SET(OPENMM_BUILD_CUDA_COMPILER_PLUGIN OFF CACHE BOOL "Build CUDA runtime compiler plugin")
-ENDIF(CUDA_nvrtc_LIBRARY)
-SET(OPENMM_BUILD_CUDACOMPILER_PATH)
-IF(OPENMM_BUILD_CUDA_COMPILER_PLUGIN)
-   SET(OPENMM_BUILD_CUDACOMPILER_PATH ${CMAKE_CURRENT_SOURCE_DIR}/plugins/cudacompiler)
-   ADD_SUBDIRECTORY(plugins/cudacompiler)
-ENDIF(OPENMM_BUILD_CUDA_COMPILER_PLUGIN)
-
 IF(OPENMM_BUILD_SHARED_LIB)
    INSTALL_TARGETS(/lib RUNTIME_DIRECTORY /lib ${SHARED_TARGET})
 ENDIF(OPENMM_BUILD_SHARED_LIB)

--- a/devtools/ci/gh-actions/scripts/install_cuda.sh
+++ b/devtools/ci/gh-actions/scripts/install_cuda.sh
@@ -24,6 +24,7 @@ sudo apt-get install -y \
    cuda-drivers cuda-driver-dev-${CUDA_APT} \
    cuda-cudart-${CUDA_APT} cuda-cudart-dev-${CUDA_APT} \
    ${CUFFT}-${CUDA_APT} ${CUFFT}-dev-${CUDA_APT} \
+    cuda-nvrtc-${CUDA_APT} cuda-nvrtc-dev-${CUDA_APT} \
    cuda-nvprof-${CUDA_APT}
 sudo apt-get clean


--- a/docs-source/developerguide/07_cuda_platform.rst
+++ b/docs-source/developerguide/07_cuda_platform.rst
@@ -15,28 +15,17 @@ The CUDA platform is very similar to the OpenCL platform, and most of the
 previous chapter applies equally well to it, just changing “OpenCL” to “Cuda” in
 class names.  There are a few differences worth noting.

-Compiling Kernels
-*****************
+Caching Kernels
+***************

 Like the OpenCL platform, the CUDA platform compiles all its kernels at runtime.
-Unlike OpenCL, CUDA does not have built in support for runtime compilation.
-OpenMM therefore needs to implement this itself by writing the source code out
-to disk, invoking the nvcc compiler as a separate process, and then loading the
-compiled kernel in from disk.
-
-For the most part, you can ignore all of this.  Just call
-:code:`createModule()` on the CudaContext, passing it the CUDA source code.
-It takes care of the details of compilation and loading, returning a CUmodule
-object when it is done.  You can then call :code:`getKernel()` to look up
-individual kernels in the module (represented as CUfunction objects) and
-:code:`executeKernel()` to execute them.
-
-The CUDA platform does need two things to make this work: a directory on disk
-where it can write out temporary files, and the path to the nvcc compiler.
-These are specified by the “CudaTempDirectory” and “CudaCompiler” properties
-when you create a new Context.  It often can figure out suitable values for them
-on its own, but sometimes it needs help.  See the “Platform-Specific Properties”
-chapter of the User's Manual for details.
+To improve performance, it tries to cache the compiled kernels on disk for
+later use.  This allows subsequent Contexts to skip compiling some kernels.  To
+make this work, it needs a directory on disk where it can write out temporary
+files.  It is specified by the “CudaTempDirectory” property when you create a
+new Context.  It usually can figure out a suitable value on its own, but
+sometimes it needs help.  See the “Platform-Specific Properties” chapter of the
+User's Manual for details.

 Accumulating Forces
 *******************
@@ -44,6 +33,4 @@ Accumulating Forces
 The OpenCL platform, as described in Section :numref:`computing-forces`\ , uses two types of buffers for
 accumulating forces: a set of floating point buffers, and a single fixed point
 buffer.  In contrast, the CUDA platform uses *only* the fixed point buffer
-(represented by the CUDA type :code:`long` :code:`long`\ ).  This means
-the CUDA platform only works on devices that support 64 bit atomic operations
-(compute capability 1.2 or higher).
+(represented by the CUDA type :code:`long` :code:`long`\ ).
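The kernel caching described in the new "Caching Kernels" text amounts to a lookup keyed on everything that affects the compiled output. The sketch below is an illustration under stated assumptions, not OpenMM's actual naming scheme: `kernelCachePath` is a hypothetical helper, and `std::hash` is used only for brevity. The standard only guarantees that `std::hash` is consistent within a single run of a program, so a real on-disk cache would want a fixed hash function instead.

```cpp
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>

// Hypothetical helper: build the path of a cached PTX file inside tempDir.
// The key covers the source, the option string, and the target architecture,
// so changing any of them selects a different cache file.
std::string kernelCachePath(const std::string& tempDir, const std::string& source,
                            const std::string& options, int arch) {
    std::size_t key = std::hash<std::string>{}(source + '\n' + options + '\n' + std::to_string(arch));
    std::ostringstream path;
    path << tempDir << "/openmm-kernel-" << std::hex << key << ".ptx";
    return path.str();
}
```

A caller would first try to load the file at this path (the diff does this with `cuModuleLoad`) and compile only on a miss.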
--- a/docs-source/usersguide/library/04_platform_specifics.rst
+++ b/docs-source/usersguide/library/04_platform_specifics.rst
@@ -73,19 +73,6 @@ The CUDA Platform recognizes the following Platform-specific properties:
 * UseCpuPme: This selects whether to use the CPU-based PME implementation.
  The allowed values are “true” or “false”.  Depending on your hardware, this
  might (or might not) improve performance.
-* CudaCompiler: This specifies the path to the CUDA kernel compiler.  Versions
-  of CUDA before 7.0 require a separate compiler executable.  If you do
-  not specify this, OpenMM will try to locate the compiler itself.  Specify this
-  only when you want to override the default location.  The logic used to pick the
-  default location depends on the operating system:
-
-  * Mac/Linux: It first looks for an environment variable called
-    OPENMM_CUDA_COMPILER.  If that is set, its value is used.  Otherwise, the
-    default location is set to /usr/local/cuda/bin/nvcc.
-  * Windows: It looks for an environment variable called CUDA_BIN_PATH, then
-    appends \nvcc.exe to it.  That environment variable is set by the CUDA
-    installer, so it usually is present.
-
 * TempDirectory: This specifies a directory where temporary files can be
  written while compiling kernels.  OpenMM usually can locate your operating
  system’s temp directory automatically (for example, by looking for the TEMP

--- a/platforms/cuda/include/CudaContext.h
+++ b/platforms/cuda/include/CudaContext.h
@@ -9,7 +9,7 @@
 * Biological Structures at Stanford, funded under the NIH Roadmap for        *
 * Medical Research, grant U54 GM072970. See https://simtk.org.               *
 *                                                                            *
- * Portions copyright (c) 2009-2021 Stanford University and the Authors.      *
+ * Portions copyright (c) 2009-2023 Stanford University and the Authors.      *
 * Authors: Peter Eastman                                                     *
 * Contributors:                                                              *
 *                                                                            *
@@ -75,8 +75,7 @@ public:
    static const int ThreadBlockSize;
    static const int TileSize;
    CudaContext(const System& system, int deviceIndex, bool useBlockingSync, const std::string& precision,
-            const std::string& compiler, const std::string& tempDir, const std::string& hostCompiler, bool allowRuntimeCompiler,
-            CudaPlatform::PlatformData& platformData, CudaContext* originalContext);
+            const std::string& tempDir, CudaPlatform::PlatformData& platformData, CudaContext* originalContext);
    ~CudaContext();
    /**
     * This is called to initialize internal data structures after all Forces in the system
@@ -558,9 +557,9 @@ private:
    int numAtomBlocks;
    int numThreadBlocks;
    int gpuArchitecture;
-    bool useBlockingSync, useDoublePrecision, useMixedPrecision, contextIsValid, boxIsTriclinic, hasCompilerKernel, isNvccAvailable, hasAssignedPosqCharges;
+    bool useBlockingSync, useDoublePrecision, useMixedPrecision, contextIsValid, boxIsTriclinic, hasAssignedPosqCharges;
    bool isLinkedContext;
-    std::string compiler, tempDir, cacheDir;
+    std::string tempDir, cacheDir;
    float4 periodicBoxVecXFloat, periodicBoxVecYFloat, periodicBoxVecZFloat, periodicBoxSizeFloat, invPeriodicBoxSizeFloat;
    double4 periodicBoxVecX, periodicBoxVecY, periodicBoxVecZ, periodicBoxSize, invPeriodicBoxSize;
    std::string defaultOptimizationOptions;

--- a/platforms/cuda/include/CudaPlatform.h
+++ b/platforms/cuda/include/CudaPlatform.h
@@ -9,7 +9,7 @@
 * Biological Structures at Stanford, funded under the NIH Roadmap for        *
 * Medical Research, grant U54 GM072970. See https://simtk.org.               *
 *                                                                            *
- * Portions copyright (c) 2008-2021 Stanford University and the Authors.      *
+ * Portions copyright (c) 2008-2023 Stanford University and the Authors.      *
 * Authors: Peter Eastman                                                     *
 * Contributors:                                                              *
 *                                                                            *
@@ -91,14 +91,18 @@ public:
        return key;
    }
    /**
-     * This is the name of the parameter for specifying the path to the CUDA compiler.
+     * This property is ignored.  It exists only for backward compatibility.
+     * 
+     * @deprecated
     */
    static const std::string& CudaCompiler() {
        static const std::string key = "CudaCompiler";
        return key;
    }
    /**
-     * This is the name of the parameter for specifying the host compiler for the CUDA compiler to use.
+     * This property is ignored.  It exists only for backward compatibility.
+     * 
+     * @deprecated
     */
    static const std::string& CudaHostCompiler() {
        static const std::string key = "CudaHostCompiler";
@@ -130,15 +134,15 @@ public:
 class OPENMM_EXPORT_COMMON CudaPlatform::PlatformData {
 public:
    PlatformData(ContextImpl* context, const System& system, const std::string& deviceIndexProperty, const std::string& blockingProperty, const std::string& precisionProperty,
-            const std::string& cpuPmeProperty, const std::string& compilerProperty, const std::string& tempProperty, const std::string& hostCompilerProperty,
-            const std::string& pmeStreamProperty, const std::string& deterministicForcesProperty, int numThreads, bool allowRuntimeCompiler, ContextImpl* originalContext);
+            const std::string& cpuPmeProperty, const std::string& tempProperty, const std::string& pmeStreamProperty, const std::string& deterministicForcesProperty,
+            int numThreads, ContextImpl* originalContext);
    ~PlatformData();
    void initializeContexts(const System& system);
    void syncContexts();
    ContextImpl* context;
    std::vector<CudaContext*> contexts;
    std::vector<double> contextEnergy;
-    bool hasInitializedContexts, removeCM, peerAccessSupported, useCpuPme, disablePmeStream, deterministicForces, allowRuntimeCompiler;
+    bool hasInitializedContexts, removeCM, peerAccessSupported, useCpuPme, disablePmeStream, deterministicForces;
    int cmMotionFrequency, computeForceCount;
    long long stepCount;
    double time;
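`CudaCompiler` and `CudaHostCompiler` remain in the header so existing code that sets them still compiles and runs, but their values no longer have any effect; elsewhere in this commit both get an empty default and are always reported back as `""`. The simplified sketch below shows that behavior. `reportedProperties` and `kIgnoredProperties` are hypothetical names, and the real platform reports every property rather than only the ones a caller requested.

```cpp
#include <map>
#include <set>
#include <string>

// Properties still accepted for backward compatibility but ignored.
const std::set<std::string> kIgnoredProperties = {"CudaCompiler", "CudaHostCompiler"};

// Whatever value the caller passes for an ignored key, report an empty string.
std::map<std::string, std::string> reportedProperties(const std::map<std::string, std::string>& requested) {
    std::map<std::string, std::string> reported;
    for (const auto& entry : requested)
        reported[entry.first] = kIgnoredProperties.count(entry.first) ? std::string() : entry.second;
    return reported;
}
```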

--- a/platforms/cuda/sharedTarget/CMakeLists.txt
+++ b/platforms/cuda/sharedTarget/CMakeLists.txt
@@ -13,7 +13,7 @@ SET_SOURCE_FILES_PROPERTIES(${KERNELS_CPP} ${KERNELS_H} ${COMMON_KERNELS_CPP} PR
 ADD_LIBRARY(${SHARED_TARGET} SHARED ${SOURCE_FILES} ${SOURCE_INCLUDE_FILES} ${API_ABS_INCLUDE_FILES})
 ADD_DEPENDENCIES(${SHARED_TARGET} CommonKernels)

-TARGET_LINK_LIBRARIES(${SHARED_TARGET} ${OPENMM_LIBRARY_NAME} CUDA::cuda_driver CUDA::cufft ${PTHREADS_LIB})
+TARGET_LINK_LIBRARIES(${SHARED_TARGET} ${OPENMM_LIBRARY_NAME} CUDA::cuda_driver CUDA::cufft CUDA::nvrtc ${PTHREADS_LIB})
 SET_TARGET_PROPERTIES(${SHARED_TARGET} PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} -DOPENMM_COMMON_BUILDING_SHARED_LIBRARY")
 IF (APPLE)
    SET_TARGET_PROPERTIES(${SHARED_TARGET} PROPERTIES LINK_FLAGS "${EXTRA_COMPILE_FLAGS} -F/Library/Frameworks -framework CUDA")

--- a/platforms/cuda/src/CudaContext.cpp
+++ b/platforms/cuda/src/CudaContext.cpp
@@ -55,6 +55,7 @@
 #include <typeinfo>
 #include <sys/stat.h>
 #include <cudaProfiler.h>
+#include <nvrtc.h>
 #ifndef WIN32
  #include <unistd.h>
 #endif
@@ -67,6 +68,12 @@
        m<<prefix<<": "<<getErrorString(result)<<" ("<<result<<")"<<" at "<<__FILE__<<":"<<__LINE__; \
        throw OpenMMException(m.str());\
    }
+#define CHECK_NVRTC_RESULT(result, prefix) \
+    if (result != NVRTC_SUCCESS) { \
+        stringstream m; \
+        m<<prefix<<": "<<nvrtcGetErrorString(result)<<" ("<<result<<")"<<" at "<<__FILE__<<":"<<__LINE__; \
+        throw OpenMMException(m.str());\
+    }

 using namespace OpenMM;
 using namespace std;
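`CHECK_NVRTC_RESULT` expands its `result` argument three times (in the condition, in `nvrtcGetErrorString(result)`, and in the streamed code). When a call such as `nvrtcGetNumSupportedArchs(&numArchs)` fails, it is therefore executed again while the message is built. A common alternative evaluates the argument once into a local and wraps the body in `do { ... } while (0)`, so the macro acts as a single statement even in an unbraced `if`/`else`. The sketch assumes hypothetical stand-ins so it compiles without the CUDA toolkit: `Status` and `statusString` replace `nvrtcResult` and `nvrtcGetErrorString`, and `std::runtime_error` replaces `OpenMMException`.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for nvrtcResult / nvrtcGetErrorString.
enum Status { STATUS_SUCCESS = 0, STATUS_ERROR = 1 };
inline const char* statusString(Status s) { return s == STATUS_SUCCESS ? "success" : "error"; }

// Evaluates `call` exactly once, then reports a failure with file and line.
// The do/while(0) wrapper makes the expansion a single statement.
#define CHECK_STATUS(call, prefix) \
    do { \
        Status status_ = (call); \
        if (status_ != STATUS_SUCCESS) { \
            std::stringstream m_; \
            m_ << prefix << ": " << statusString(status_) << " (" << status_ << ")" \
               << " at " << __FILE__ << ":" << __LINE__; \
            throw std::runtime_error(m_.str()); \
        } \
    } while (0)
```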
@@ -75,67 +82,11 @@ const int CudaContext::ThreadBlockSize = 64;
 const int CudaContext::TileSize = sizeof(tileflags)*8;
 bool CudaContext::hasInitializedCuda = false;

-#ifdef WIN32
-#include <Windows.h>
-static int executeInWindows(const string &command) {
-    // COMSPEC is an env variable pointing to full dir of cmd.exe
-    // it always defined on pretty much all Windows OSes
-    string fullcommand = getenv("COMSPEC") + string(" /C ") + command;
-    STARTUPINFO si;
-    PROCESS_INFORMATION pi;
-    ZeroMemory( &si, sizeof(si) );
-    si.cb = sizeof(si);
-    ZeroMemory( &pi, sizeof(pi) );
-    vector<char> args(std::max(1000, (int) fullcommand.size()+1));
-    strcpy(&args[0], fullcommand.c_str());
-    si.dwFlags = STARTF_USESHOWWINDOW;
-    si.wShowWindow = SW_HIDE;
-    if (!CreateProcess(NULL, &args[0], NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi)) {
-        return -1;
-    }
-    WaitForSingleObject(pi.hProcess, INFINITE);
-    DWORD exitCode = -1;
-    if(!GetExitCodeProcess(pi.hProcess, &exitCode)) {
-        throw(OpenMMException("Could not get nvcc.exe's exit code\n"));
-    } else {
-        if(exitCode == 0)
-            return 0;
-        else
-            return -1;
-    }
-}
-#endif
-
-CudaContext::CudaContext(const System& system, int deviceIndex, bool useBlockingSync, const string& precision, const string& compiler,
-        const string& tempDir, const std::string& hostCompiler, bool allowRuntimeCompiler, CudaPlatform::PlatformData& platformData,
+CudaContext::CudaContext(const System& system, int deviceIndex, bool useBlockingSync, const string& precision, const string& tempDir, CudaPlatform::PlatformData& platformData,
        CudaContext* originalContext) : ComputeContext(system), currentStream(0), platformData(platformData), contextIsValid(false), hasAssignedPosqCharges(false),
-        hasCompilerKernel(false), isNvccAvailable(false), pinnedBuffer(NULL), integration(NULL), expression(NULL), bonded(NULL), nonbonded(NULL),
-        useBlockingSync(useBlockingSync) {
-    // Determine what compiler to use.
-    
-    this->compiler = "\""+compiler+"\"";
-    if (allowRuntimeCompiler && platformData.context != NULL) {
-        try {
-            compilerKernel = platformData.context->getPlatform().createKernel(CudaCompilerKernel::Name(), *platformData.context);
-            hasCompilerKernel = true;
-        }
-        catch (...) {
-            // The runtime compiler plugin isn't available.
-        }
-    }
-#ifdef WIN32
-    string testCompilerCommand = this->compiler+" --version > nul 2> nul";
-    int res = executeInWindows(testCompilerCommand.c_str());
-#else
-    string testCompilerCommand = this->compiler+" --version > /dev/null 2> /dev/null";
-    int res = std::system(testCompilerCommand.c_str());
-#endif
-    struct stat info;
-    isNvccAvailable = (res == 0 && stat(tempDir.c_str(), &info) == 0);
+        pinnedBuffer(NULL), integration(NULL), expression(NULL), bonded(NULL), nonbonded(NULL), useBlockingSync(useBlockingSync) {
    int cudaDriverVersion;
    cuDriverGetVersion(&cudaDriverVersion);
-    if (hostCompiler.size() > 0)
-        this->compiler = compiler+" --compiler-bindir "+hostCompiler;
    if (!hasInitializedCuda) {
        CHECK_RESULT2(cuInit(0), "Error initializing CUDA");
        hasInitializedCuda = true;
@@ -557,13 +508,19 @@ CUmodule CudaContext::createModule(const string source, const map<string, string
    
    // Determine what architecture to compile for.

-    string compileArchitecture;
-    if (hasCompilerKernel) {
-        int maxCompilerArchitecture = compilerKernel.getAs<CudaCompilerKernel>().getMaxSupportedArchitecture();
-        compileArchitecture = intToString(min(gpuArchitecture, maxCompilerArchitecture));
-    }
-    else
-        compileArchitecture = intToString(gpuArchitecture);
+    int maxCompilerArchitecture;
+#if CUDA_VERSION < 11020
+    // CUDA versions before 11.2 can't query the compiler to see what it supports.
+    
+    maxCompilerArchitecture = 75;
+#else
+    int numArchs;
+    CHECK_NVRTC_RESULT(nvrtcGetNumSupportedArchs(&numArchs), "Error querying supported architectures");
+    vector<int> archs(numArchs);
+    CHECK_NVRTC_RESULT(nvrtcGetSupportedArchs(archs.data()), "Error querying supported architectures");
+    maxCompilerArchitecture = archs.back();
+#endif
+    string compileArchitecture = intToString(min(gpuArchitecture, maxCompilerArchitecture));

    // See whether we already have PTX for this kernel cached.

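The architecture selection above takes the smaller of the GPU's architecture and the newest architecture NVRTC can target. NVRTC documents `nvrtcGetSupportedArchs` as filling its array in ascending order, which is why the code takes `archs.back()`. The pure C++ sketch below replaces the NVRTC query with a vector argument. `chooseCompileArchitecture` is a hypothetical name, the `10*major + minor` encoding (e.g. 86) is assumed from the `-arch=compute_` strings in the diff, and an empty list models the pre-11.2 fallback of 75.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// gpuArchitecture: 10*major + minor, e.g. 86 (assumed encoding).
// supportedArchs: ascending list, standing in for nvrtcGetSupportedArchs.
// An empty list models CUDA < 11.2, where the list cannot be queried.
std::string chooseCompileArchitecture(int gpuArchitecture, const std::vector<int>& supportedArchs) {
    const int maxCompilerArchitecture = supportedArchs.empty() ? 75 : supportedArchs.back();
    return std::to_string(std::min(gpuArchitecture, maxCompilerArchitecture));
}
```

Capping the target is safe because PTX generated for an older virtual architecture can be JIT-compiled by the driver for a newer GPU.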
@@ -582,7 +539,7 @@ CUmodule CudaContext::createModule(const string source, const map<string, string
    if (cuModuleLoad(&module, cacheFile.str().c_str()) == CUDA_SUCCESS)
        return module;

-    // Select names for the various temporary files.
+    // Select a name for the output file.

    stringstream tempFileName;
    tempFileName << "openmmTempKernel" << this; // Include a pointer to this context as part of the filename to avoid collisions.
@@ -591,22 +548,46 @@ CUmodule CudaContext::createModule(const string source, const map<string, string
 #else
    tempFileName << "_" << getpid();
 #endif
-    string inputFile = (tempDir+tempFileName.str()+".cu");
    string outputFile = (tempDir+tempFileName.str()+".ptx");
-    string logFile = (tempDir+tempFileName.str()+".log");
-    int res = 0;

-    // If the runtime compiler plugin is available, use it.
+    // Split the command line flags into an array of options.
+    
+    string flags = "-arch=compute_"+compileArchitecture+" "+options;
+    stringstream flagsStream(flags);
+    string flag;
+    vector<string> splitFlags;
+    while (flagsStream >> flag)
+        splitFlags.push_back(flag);
+    int numOptions = splitFlags.size();
+    vector<const char*> optionsVec(numOptions);
+    for (int i = 0; i < numOptions; i++)
+        optionsVec[i] = &splitFlags[i][0];
+    
+    // Compile the program to PTX.
    
-    if (hasCompilerKernel) {
-        string ptx = compilerKernel.getAs<CudaCompilerKernel>().createModule(src.str(), "-arch=compute_"+compileArchitecture+" "+options, *this);
+    nvrtcProgram program;
+    CHECK_NVRTC_RESULT(nvrtcCreateProgram(&program, src.str().c_str(), NULL, 0, NULL, NULL), "Error creating program");
+    try {
+        nvrtcResult result = nvrtcCompileProgram(program, optionsVec.size(), &optionsVec[0]);
+        if (result != NVRTC_SUCCESS) {
+            size_t logSize;
+            nvrtcGetProgramLogSize(program, &logSize);
+            vector<char> log(logSize);
+            nvrtcGetProgramLog(program, &log[0]);
+            throw OpenMMException("Error compiling program: "+string(&log[0]));
+        }
+        size_t ptxSize;
+        nvrtcGetPTXSize(program, &ptxSize);
+        vector<char> ptx(ptxSize);
+        nvrtcGetPTX(program, &ptx[0]);
+        nvrtcDestroyProgram(&program);

        // If possible, write the PTX out to a temporary file so we can cache it for later use.

        bool wroteCache = false;
        try {
            ofstream out(outputFile.c_str());
-            out << ptx;
+            out << string(&ptx[0]);
            out.close();
            if (!out.fail())
                wroteCache = true;
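`nvrtcCompileProgram` takes its options as an array of C strings, not a single command line, so the new code splits the flag string on whitespace. The sketch below shows that step on its own, with hypothetical helpers `splitOptions` and `toCStrings`. Like the diff, it splits on any whitespace, so an option value that contains spaces cannot be passed this way. Using `c_str()` and, at the call site, `data()` also sidesteps `&v[0]` on an empty vector, although in the diff the `-arch` flag guarantees at least one option.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a command-line style flag string into separate options.
std::vector<std::string> splitOptions(const std::string& flags) {
    std::istringstream in(flags);
    std::vector<std::string> options;
    std::string flag;
    while (in >> flag)
        options.push_back(flag);
    return options;
}

// Build a const char* array for APIs that take argv-style options.
// The pointers borrow from `options`, which must outlive the result
// and must not be modified while the pointers are in use.
std::vector<const char*> toCStrings(const std::vector<std::string>& options) {
    std::vector<const char*> result;
    result.reserve(options.size());
    for (const std::string& option : options)
        result.push_back(option.c_str());
    return result;
}
```

The result would then be passed as `nvrtcCompileProgram(program, (int) argv.size(), argv.data())`.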
@@ -621,57 +602,23 @@ CUmodule CudaContext::createModule(const string source, const map<string, string
            return module;
        }
    }
-    else {
-        // Write out the source to a temporary file.
-
-        ofstream out(inputFile.c_str());
-        out << src.str();
-        out.close();
-#ifdef WIN32
-#ifdef _DEBUG
-        string command = compiler+" --ptx -G -g --machine "+bits+" -arch=sm_"+compileArchitecture+" -o "+outputFile+" "+options+" "+inputFile+" 2> "+logFile;
-#else
-        string command = compiler+" --ptx -lineinfo --machine "+bits+" -arch=sm_"+compileArchitecture+" -o "+outputFile+" "+options+" "+inputFile+" 2> "+logFile;
-#endif
-        res = executeInWindows(command);
-#else
-        string command = compiler+" --ptx --machine "+bits+" -arch=sm_"+compileArchitecture+" -o \""+outputFile+"\" "+options+" \""+inputFile+"\" 2> \""+logFile+"\"";
-        res = std::system(command.c_str());
-#endif
+    catch (...) {
+        nvrtcDestroyProgram(&program);
+        throw;
    }
    try {
-        if (res != 0) {
-            // Load the error log.
-
-            stringstream error;
-            error << "Error launching CUDA compiler: " << res;
-            ifstream log(logFile.c_str());
-            if (log.is_open()) {
-                string line;
-                while (!log.eof()) {
-                    getline(log, line);
-                    error << '\n' << line;
-                }
-                log.close();
-            }
-            throw OpenMMException(error.str());
-        }
        CUresult result = cuModuleLoad(&module, outputFile.c_str());
        if (result != CUDA_SUCCESS) {
            std::stringstream m;
            m<<"Error loading CUDA module: "<<getErrorString(result)<<" ("<<result<<")";
            throw OpenMMException(m.str());
        }
-        remove(inputFile.c_str());
        if (rename(outputFile.c_str(), cacheFile.str().c_str()) != 0)
            remove(outputFile.c_str());
-        remove(logFile.c_str());
        return module;
    }
    catch (...) {
-        remove(inputFile.c_str());
        remove(outputFile.c_str());
-        remove(logFile.c_str());
        throw;
    }
 }
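The new `createModule` releases the `nvrtcProgram` by hand, once on the success path and again in `catch (...)`. An RAII wrapper is a common way to guarantee exactly one release however the scope is left. The sketch below is generic so it compiles without the CUDA toolkit. `ProgramState`, `ProgramHandle`, and `destroyHandle` are hypothetical stand-ins for the program object, `nvrtcProgram`, and `nvrtcDestroyProgram`, and `destroyCount` exists only to make the behavior observable.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for the NVRTC program handle and its destroy call.
struct ProgramState {};
using ProgramHandle = ProgramState*;
int destroyCount = 0;
void destroyHandle(ProgramHandle* handle) {
    ++destroyCount;
    delete *handle;
    *handle = nullptr;
}

// Owns a handle and destroys it exactly once when the guard leaves scope,
// whether the scope exits normally or through an exception.
class ProgramGuard {
public:
    explicit ProgramGuard(ProgramHandle handle) : handle_(handle) {}
    ~ProgramGuard() {
        if (handle_ != nullptr)
            destroyHandle(&handle_);
    }
    ProgramGuard(const ProgramGuard&) = delete;
    ProgramGuard& operator=(const ProgramGuard&) = delete;
    ProgramHandle get() const { return handle_; }
private:
    ProgramHandle handle_;
};

// A compile step that may throw; no catch block is needed for cleanup.
void compileWithGuard(bool fail) {
    ProgramGuard program(new ProgramState());
    if (fail)
        throw std::runtime_error("Error compiling program");
}
```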

--- a/platforms/cuda/src/CudaPlatform.cpp
+++ b/platforms/cuda/src/CudaPlatform.cpp
@@ -6,7 +6,7 @@
 * Biological Structures at Stanford, funded under the NIH Roadmap for        *
 * Medical Research, grant U54 GM072970. See https://simtk.org.               *
 *                                                                            *
- * Portions copyright (c) 2008-2021 Stanford University and the Authors.      *
+ * Portions copyright (c) 2008-2023 Stanford University and the Authors.      *
 * Authors: Peter Eastman                                                     *
 * Contributors:                                                              *
 *                                                                            *
@@ -123,27 +123,15 @@ CudaPlatform::CudaPlatform() {
    setPropertyDefaultValue(CudaUseCpuPme(), "false");
    setPropertyDefaultValue(CudaDisablePmeStream(), "false");
    setPropertyDefaultValue(CudaDeterministicForces(), "false");
+    setPropertyDefaultValue(CudaCompiler(), "");
+    setPropertyDefaultValue(CudaHostCompiler(), "");
 #ifdef _MSC_VER
-    char* bindir = getenv("CUDA_BIN_PATH");
-    string nvcc = (bindir == NULL ? "nvcc.exe" : string(bindir)+"\\nvcc.exe");
-    int length = GetShortPathName(nvcc.c_str(), NULL, 0);
-    if (length > 0) {
-        vector<char> shortName(length);
-        GetShortPathName(nvcc.c_str(), &shortName[0], length);
-        nvcc = string(&shortName[0]);
-    }
-    setPropertyDefaultValue(CudaCompiler(), nvcc);
    setPropertyDefaultValue(CudaTempDirectory(), string(getenv("TEMP")));
 #else
-    char* compiler = getenv("OPENMM_CUDA_COMPILER");
-    string nvcc = (compiler == NULL ? "/usr/local/cuda/bin/nvcc" : string(compiler));
-    setPropertyDefaultValue(CudaCompiler(), nvcc);
    char* tmpdir = getenv("TMPDIR");
    string tmp = (tmpdir == NULL ? string(P_tmpdir) : string(tmpdir));
    setPropertyDefaultValue(CudaTempDirectory(), tmp);
 #endif
-    char* hostCompiler = getenv("CUDA_HOST_COMPILER");
-    setPropertyDefaultValue(CudaHostCompiler(), (hostCompiler == NULL ? "" : string(hostCompiler)));
 }

 double CudaPlatform::getSpeed() const {
@@ -178,12 +166,8 @@ void CudaPlatform::contextCreated(ContextImpl& context, const map<string, string
            getPropertyDefaultValue(CudaPrecision()) : properties.find(CudaPrecision())->second);
    string cpuPmePropValue = (properties.find(CudaUseCpuPme()) == properties.end() ?
            getPropertyDefaultValue(CudaUseCpuPme()) : properties.find(CudaUseCpuPme())->second);
-    const string& compilerPropValue = (properties.find(CudaCompiler()) == properties.end() ?
-            getPropertyDefaultValue(CudaCompiler()) : properties.find(CudaCompiler())->second);
    const string& tempPropValue = (properties.find(CudaTempDirectory()) == properties.end() ?
            getPropertyDefaultValue(CudaTempDirectory()) : properties.find(CudaTempDirectory())->second);
-    const string& hostCompilerPropValue = (properties.find(CudaHostCompiler()) == properties.end() ?
-            getPropertyDefaultValue(CudaHostCompiler()) : properties.find(CudaHostCompiler())->second);
    string pmeStreamPropValue = (properties.find(CudaDisablePmeStream()) == properties.end() ?
            getPropertyDefaultValue(CudaDisablePmeStream()) : properties.find(CudaDisablePmeStream())->second);
    string deterministicForcesValue = (properties.find(CudaDeterministicForces()) == properties.end() ?
@@ -201,10 +185,8 @@ void CudaPlatform::contextCreated(ContextImpl& context, const map<string, string
    char* threadsEnv = getenv("OPENMM_CPU_THREADS");
    if (threadsEnv != NULL)
        stringstream(threadsEnv) >> threads;
-    char* compilerEnv = getenv("OPENMM_CUDA_COMPILER");
-    bool allowRuntimeCompiler = (compilerEnv == NULL && properties.find(CudaCompiler()) == properties.end());
-    context.setPlatformData(new PlatformData(&context, context.getSystem(), devicePropValue, blockingPropValue, precisionPropValue, cpuPmePropValue, compilerPropValue, tempPropValue,
-            hostCompilerPropValue, pmeStreamPropValue, deterministicForcesValue, threads, allowRuntimeCompiler, NULL));
+    context.setPlatformData(new PlatformData(&context, context.getSystem(), devicePropValue, blockingPropValue, precisionPropValue, cpuPmePropValue, tempPropValue,
+            pmeStreamPropValue, deterministicForcesValue, threads, NULL));
 }

 void CudaPlatform::linkedContextCreated(ContextImpl& context, ContextImpl& originalContext) const {
@@ -213,15 +195,12 @@ void CudaPlatform::linkedContextCreated(ContextImpl& context, ContextImpl& origi
    string blockingPropValue = platform.getPropertyValue(originalContext.getOwner(), CudaUseBlockingSync());
    string precisionPropValue = platform.getPropertyValue(originalContext.getOwner(), CudaPrecision());
    string cpuPmePropValue = platform.getPropertyValue(originalContext.getOwner(), CudaUseCpuPme());
-    string compilerPropValue = platform.getPropertyValue(originalContext.getOwner(), CudaCompiler());
    string tempPropValue = platform.getPropertyValue(originalContext.getOwner(), CudaTempDirectory());
-    string hostCompilerPropValue = platform.getPropertyValue(originalContext.getOwner(), CudaHostCompiler());
    string pmeStreamPropValue = platform.getPropertyValue(originalContext.getOwner(), CudaDisablePmeStream());
    string deterministicForcesValue = platform.getPropertyValue(originalContext.getOwner(), CudaDeterministicForces());
    int threads = reinterpret_cast<PlatformData*>(originalContext.getPlatformData())->threads.getNumThreads();
-    bool allowRuntimeCompiler = reinterpret_cast<PlatformData*>(originalContext.getPlatformData())->allowRuntimeCompiler;
-    context.setPlatformData(new PlatformData(&context, context.getSystem(), devicePropValue, blockingPropValue, precisionPropValue, cpuPmePropValue, compilerPropValue, tempPropValue,
-            hostCompilerPropValue, pmeStreamPropValue, deterministicForcesValue, threads, allowRuntimeCompiler, &originalContext));
+    context.setPlatformData(new PlatformData(&context, context.getSystem(), devicePropValue, blockingPropValue, precisionPropValue, cpuPmePropValue, tempPropValue,
+            pmeStreamPropValue, deterministicForcesValue, threads, &originalContext));
 }

 void CudaPlatform::contextDestroyed(ContextImpl& context) const {
@@ -230,10 +209,9 @@ void CudaPlatform::contextDestroyed(ContextImpl& context) const {
 }

 CudaPlatform::PlatformData::PlatformData(ContextImpl* context, const System& system, const string& deviceIndexProperty, const string& blockingProperty, const string& precisionProperty,
-            const string& cpuPmeProperty, const string& compilerProperty, const string& tempProperty, const string& hostCompilerProperty, const string& pmeStreamProperty,
-            const string& deterministicForcesProperty, int numThreads, bool allowRuntimeCompiler, ContextImpl* originalContext) :
-                context(context), removeCM(false), stepCount(0), computeForceCount(0), time(0.0), hasInitializedContexts(false),
-                threads(numThreads), allowRuntimeCompiler(allowRuntimeCompiler) {
+            const string& cpuPmeProperty, const string& tempProperty, const string& pmeStreamProperty, const string& deterministicForcesProperty,
+            int numThreads, ContextImpl* originalContext) : context(context), removeCM(false), stepCount(0), computeForceCount(0), time(0.0),
+                hasInitializedContexts(false), threads(numThreads) {
    bool blocking = (blockingProperty == "true");
    vector<string> devices;
    size_t searchPos = 0, nextPos;
@@ -250,11 +228,11 @@ CudaPlatform::PlatformData::PlatformData(ContextImpl* context, const System& sys
            if (devices[i].length() > 0) {
                int deviceIndex;
                stringstream(devices[i]) >> deviceIndex;
-                contexts.push_back(new CudaContext(system, deviceIndex, blocking, precisionProperty, compilerProperty, tempProperty, hostCompilerProperty, allowRuntimeCompiler, *this, (originalData == NULL ? NULL : originalData->contexts[i])));
+                contexts.push_back(new CudaContext(system, deviceIndex, blocking, precisionProperty, tempProperty, *this, (originalData == NULL ? NULL : originalData->contexts[i])));
            }
        }
        if (contexts.size() == 0)
-            contexts.push_back(new CudaContext(system, -1, blocking, precisionProperty, compilerProperty, tempProperty, hostCompilerProperty, allowRuntimeCompiler, *this, (originalData == NULL ? NULL : originalData->contexts[0])));
+            contexts.push_back(new CudaContext(system, -1, blocking, precisionProperty, tempProperty, *this, (originalData == NULL ? NULL : originalData->contexts[0])));
    }
    catch (...) {
        // If an exception was thrown, do our best to clean up memory.
@@ -282,9 +260,9 @@ CudaPlatform::PlatformData::PlatformData(ContextImpl* context, const System& sys
    propertyValues[CudaPlatform::CudaUseBlockingSync()] = blocking ? "true" : "false";
    propertyValues[CudaPlatform::CudaPrecision()] = precisionProperty;
    propertyValues[CudaPlatform::CudaUseCpuPme()] = useCpuPme ? "true" : "false";
-    propertyValues[CudaPlatform::CudaCompiler()] = compilerProperty;
+    propertyValues[CudaPlatform::CudaCompiler()] = "";
    propertyValues[CudaPlatform::CudaTempDirectory()] = tempProperty;
-    propertyValues[CudaPlatform::CudaHostCompiler()] = hostCompilerProperty;
+    propertyValues[CudaPlatform::CudaHostCompiler()] = "";
    propertyValues[CudaPlatform::CudaDisablePmeStream()] = disablePmeStream ? "true" : "false";
    propertyValues[CudaPlatform::CudaDeterministicForces()] = deterministicForces ? "true" : "false";
    contextEnergy.resize(contexts.size());

--- a/platforms/cuda/staticTarget/CMakeLists.txt
+++ b/platforms/cuda/staticTarget/CMakeLists.txt
@@ -12,7 +12,7 @@ ADD_CUSTOM_COMMAND(OUTPUT ${KERNELS_CPP} ${KERNELS_H}
 SET_SOURCE_FILES_PROPERTIES(${KERNELS_CPP} ${KERNELS_H} PROPERTIES GENERATED TRUE)
 ADD_LIBRARY(${STATIC_TARGET} STATIC ${SOURCE_FILES} ${SOURCE_INCLUDE_FILES} ${API_ABS_INCLUDE_FILES})

-TARGET_LINK_LIBRARIES(${STATIC_TARGET} ${OPENMM_LIBRARY_NAME} CUDA::cuda_driver CUDA::cufft_static ${PTHREADS_LIB_STATIC})
+TARGET_LINK_LIBRARIES(${STATIC_TARGET} ${OPENMM_LIBRARY_NAME} CUDA::cuda_driver CUDA::cufft_static CUDA::nvrtc_static ${PTHREADS_LIB_STATIC})
 SET_TARGET_PROPERTIES(${STATIC_TARGET} PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} -DOPENMM_COMMON_BUILDING_STATIC_LIBRARY")
 IF (APPLE)
    SET_TARGET_PROPERTIES(${STATIC_TARGET} PROPERTIES LINK_FLAGS "${EXTRA_COMPILE_FLAGS} -F/Library/Frameworks -framework CUDA")

--- a/platforms/cuda/tests/TestCudaFFT3D.cpp
+++ b/platforms/cuda/tests/TestCudaFFT3D.cpp
@@ -59,8 +59,8 @@ void testTransform(bool realToComplex, int xsize, int ysize, int zsize) {
    System system;
    system.addParticle(0.0);
    CudaPlatform::PlatformData platformData(NULL, system, "", "true", platform.getPropertyDefaultValue("CudaPrecision"), "false",
-            platform.getPropertyDefaultValue(CudaPlatform::CudaCompiler()), platform.getPropertyDefaultValue(CudaPlatform::CudaTempDirectory()),
-            platform.getPropertyDefaultValue(CudaPlatform::CudaHostCompiler()), platform.getPropertyDefaultValue(CudaPlatform::CudaDisablePmeStream()), "false", true, 1, NULL);
+            platform.getPropertyDefaultValue(CudaPlatform::CudaTempDirectory()),
+            platform.getPropertyDefaultValue(CudaPlatform::CudaDisablePmeStream()), "false", 1, NULL);
    CudaContext& context = *platformData.contexts[0];
    context.initialize();
    context.setAsCurrent();

--- a/platforms/cuda/tests/TestCudaRandom.cpp
+++ b/platforms/cuda/tests/TestCudaRandom.cpp
@@ -55,8 +55,8 @@ void testGaussian() {
    for (int i = 0; i < numAtoms; i++)
        system.addParticle(1.0);
    CudaPlatform::PlatformData platformData(NULL, system, "", "true", platform.getPropertyDefaultValue("CudaPrecision"), "false",
-            platform.getPropertyDefaultValue(CudaPlatform::CudaCompiler()), platform.getPropertyDefaultValue(CudaPlatform::CudaTempDirectory()),
-            platform.getPropertyDefaultValue(CudaPlatform::CudaHostCompiler()), platform.getPropertyDefaultValue(CudaPlatform::CudaDisablePmeStream()), "false", true, 1, NULL);
+            platform.getPropertyDefaultValue(CudaPlatform::CudaTempDirectory()),
+            platform.getPropertyDefaultValue(CudaPlatform::CudaDisablePmeStream()), "false", 1, NULL);
    CudaContext& context = *platformData.contexts[0];
    context.initialize();
    context.setAsCurrent();

--- a/platforms/cuda/tests/TestCudaSort.cpp
+++ b/platforms/cuda/tests/TestCudaSort.cpp
@@ -65,8 +65,8 @@ void verifySorting(vector<float> array, bool uniform) {
    System system;
    system.addParticle(0.0);
    CudaPlatform::PlatformData platformData(NULL, system, "", "true", platform.getPropertyDefaultValue("CudaPrecision"), "false",
-            platform.getPropertyDefaultValue(CudaPlatform::CudaCompiler()), platform.getPropertyDefaultValue(CudaPlatform::CudaTempDirectory()),
-            platform.getPropertyDefaultValue(CudaPlatform::CudaHostCompiler()), platform.getPropertyDefaultValue(CudaPlatform::CudaDisablePmeStream()), "false", true, 1, NULL);
+            platform.getPropertyDefaultValue(CudaPlatform::CudaTempDirectory()),
+            platform.getPropertyDefaultValue(CudaPlatform::CudaDisablePmeStream()), "false", 1, NULL);
    CudaContext& context = *platformData.contexts[0];
    context.initialize();
    context.setAsCurrent();

--- a/plugins/amoeba/platforms/common/src/kernels/amoebaVdwForce2.cc
+++ b/plugins/amoeba/platforms/common/src/kernels/amoebaVdwForce2.cc
@@ -16,7 +16,7 @@
 #endif
 #if VDW_ALCHEMICAL_METHOD != 0
       real lambda = vdwLambda[0];
-       epsilon = epsilon * POW(lambda, VDW_SOFTCORE_POWER);
+       epsilon = epsilon * POW(lambda, (real) VDW_SOFTCORE_POWER);
       softcore = VDW_SOFTCORE_ALPHA * (1.0f - lambda) * (1.0f - lambda);
    }
 #endif

--- a/plugins/cudacompiler/CMakeLists.txt
+++ b/plugins/cudacompiler/CMakeLists.txt
-#---------------------------------------------------
-# OpenMM CUDA runtime compiler
-#
-# Creates OpenMMCudaCompiler library.
-#
-# Windows:
-#   OpenMMCudaCompiler.dll
-#   OpenMMCudaCompiler.lib
-# Unix:
-#   libOpenMMCudaCompiler.so
-#----------------------------------------------------
-
-# The source is organized into subdirectories, but we handle them all from
-# this CMakeLists file rather than letting CMake visit them as SUBDIRS.
-SET(OPENMM_SOURCE_SUBDIRS .)
-
-
-# Collect up information about the version of the OpenMM library we're building
-# and make it available to the code so it can be built into the binaries.
-
-SET(OPENMMCUDACOMPILER_LIBRARY_NAME OpenMMCudaCompiler)
-
-SET(SHARED_TARGET ${OPENMMCUDACOMPILER_LIBRARY_NAME})
-SET(STATIC_TARGET ${OPENMMCUDACOMPILER_LIBRARY_NAME}_static)
-
-# These are all the places to search for header files which are
-# to be part of the API.
-SET(API_INCLUDE_DIRS) # start empty
-FOREACH(subdir ${OPENMM_SOURCE_SUBDIRS})
-    # append
-    SET(API_INCLUDE_DIRS ${API_INCLUDE_DIRS}
-                         ${CMAKE_CURRENT_SOURCE_DIR}/${subdir}/include
-                         ${CMAKE_CURRENT_SOURCE_DIR}/${subdir}/include/internal)
-ENDFOREACH(subdir)
-
-# We'll need both *relative* path names, starting with their API_INCLUDE_DIRS,
-# and absolute pathnames.
-SET(API_REL_INCLUDE_FILES)   # start these out empty
-SET(API_ABS_INCLUDE_FILES)
-
-FOREACH(dir ${API_INCLUDE_DIRS})
-    FILE(GLOB fullpaths ${dir}/*.h)	# returns full pathnames
-    SET(API_ABS_INCLUDE_FILES ${API_ABS_INCLUDE_FILES} ${fullpaths})
-
-    FOREACH(pathname ${fullpaths})
-        GET_FILENAME_COMPONENT(filename ${pathname} NAME)
-        SET(API_REL_INCLUDE_FILES ${API_REL_INCLUDE_FILES} ${dir}/${filename})
-    ENDFOREACH(pathname)
-ENDFOREACH(dir)
-
-# collect up source files
-SET(SOURCE_FILES) # empty
-SET(SOURCE_INCLUDE_FILES)
-
-FOREACH(subdir ${OPENMM_SOURCE_SUBDIRS})
-    FILE(GLOB_RECURSE src_files  ${CMAKE_CURRENT_SOURCE_DIR}/${subdir}/src/*.cpp ${CMAKE_CURRENT_SOURCE_DIR}/${subdir}/src/*.c)
-    FILE(GLOB incl_files ${CMAKE_CURRENT_SOURCE_DIR}/${subdir}/src/*.h)
-    SET(SOURCE_FILES         ${SOURCE_FILES}         ${src_files})   #append
-    SET(SOURCE_INCLUDE_FILES ${SOURCE_INCLUDE_FILES} ${incl_files})
-    INCLUDE_DIRECTORIES(BEFORE ${CMAKE_CURRENT_SOURCE_DIR}/${subdir}/include)
-ENDFOREACH(subdir)
-
-INCLUDE_DIRECTORIES(BEFORE ${CMAKE_CURRENT_SOURCE_DIR}/src)
-INCLUDE_DIRECTORIES(BEFORE ${CMAKE_SOURCE_DIR}/platforms/cuda/include)
-INCLUDE_DIRECTORIES(BEFORE ${CMAKE_SOURCE_DIR}/platforms/common/include)
-INCLUDE_DIRECTORIES(${CUDA_TOOLKIT_INCLUDE})
-
-# Build the shared plugin library.
-
-IF (OPENMM_BUILD_SHARED_LIB)
-    ADD_LIBRARY(${SHARED_TARGET} SHARED ${SOURCE_FILES} ${SOURCE_INCLUDE_FILES} ${API_INCLUDE_FILES})
-
-    TARGET_LINK_LIBRARIES(${SHARED_TARGET} ${OPENMM_LIBRARY_NAME} ${CUDA_nvrtc_LIBRARY})
-    TARGET_LINK_LIBRARIES(${SHARED_TARGET} ${OPENMM_LIBRARY_NAME}CUDA)
-    SET_TARGET_PROPERTIES(${SHARED_TARGET} PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} -DOPENMM_CUDACOMPILER_BUILDING_SHARED_LIBRARY")
-    IF (APPLE)
-        SET_TARGET_PROPERTIES(${SHARED_TARGET} PROPERTIES LINK_FLAGS "${EXTRA_COMPILE_FLAGS} -F/Library/Frameworks -framework CUDA")
-    ELSE (APPLE)
-        SET_TARGET_PROPERTIES(${SHARED_TARGET} PROPERTIES LINK_FLAGS "${EXTRA_LINK_FLAGS}")
-    ENDIF (APPLE)
-
-    INSTALL_TARGETS(/lib/plugins RUNTIME_DIRECTORY /lib/plugins ${SHARED_TARGET})
-ENDIF (OPENMM_BUILD_SHARED_LIB)
-
-# Build the static plugin library.
-
-IF(OPENMM_BUILD_STATIC_LIB)
-    ADD_LIBRARY(${STATIC_TARGET} STATIC ${SOURCE_FILES} ${SOURCE_INCLUDE_FILES} ${API_INCLUDE_FILES})
-
-    TARGET_LINK_LIBRARIES(${STATIC_TARGET} ${OPENMM_LIBRARY_NAME}_static ${CUDA_nvrtc_LIBRARY})
-    TARGET_LINK_LIBRARIES(${STATIC_TARGET} ${OPENMM_LIBRARY_NAME}CUDA)
-    SET_TARGET_PROPERTIES(${STATIC_TARGET} PROPERTIES COMPILE_FLAGS "${EXTRA_COMPILE_FLAGS} -DOPENMM_CUDACOMPILER_BUILDING_STATIC_LIBRARY")
-    IF (APPLE)
-        SET_TARGET_PROPERTIES(${STATIC_TARGET} PROPERTIES LINK_FLAGS "${EXTRA_COMPILE_FLAGS} -F/Library/Frameworks -framework CUDA")
-    ELSE (APPLE)
-        SET_TARGET_PROPERTIES(${STATIC_TARGET} PROPERTIES LINK_FLAGS "${EXTRA_LINK_FLAGS}")
-    ENDIF (APPLE)
-
-    INSTALL_TARGETS(/lib/plugins RUNTIME_DIRECTORY /lib/plugins ${STATIC_TARGET})
-ENDIF(OPENMM_BUILD_STATIC_LIB)
-
-INSTALL(TARGETS ${SHARED_TARGET} DESTINATION ${CMAKE_INSTALL_PREFIX}/lib/plugins)
-# Ensure that links to the main CUDA library will be resolved.
-IF (APPLE)
-    IF (OPENMM_BUILD_SHARED_LIB)
-        SET(CUDA_LIBRARY libOpenMMCUDA.dylib)
-        INSTALL(CODE "EXECUTE_PROCESS(COMMAND install_name_tool -change ${CUDA_LIBRARY} @loader_path/${CUDA_LIBRARY} ${CMAKE_INSTALL_PREFIX}/lib/plugins/lib${SHARED_TARGET}.dylib)")
-    ENDIF (OPENMM_BUILD_SHARED_LIB)
-ENDIF (APPLE)
-
-if(OPENMM_BUILD_CUDA_TESTS)
-    SUBDIRS (tests)
-endif(OPENMM_BUILD_CUDA_TESTS)
--- a/plugins/cudacompiler/include/internal/windowsExportCudaCompiler.h
+++ b/plugins/cudacompiler/include/internal/windowsExportCudaCompiler.h
-#ifndef OPENMM_WINDOWSEXPORTCUDACOMPILER_H_
-#define OPENMM_WINDOWSEXPORTCUDACOMPILER_H_
-
-/*
- * Shared libraries are messy in Visual Studio. We have to distinguish three
- * cases:
- *   (1) this header is being used to build the OpenMM shared library
- *       (dllexport)
- *   (2) this header is being used by a *client* of the OpenMM shared
- *       library (dllimport)
- *   (3) we are building the OpenMM static library, or the client is
- *       being compiled with the expectation of linking with the
- *       OpenMM static library (nothing special needed)
- * In the CMake script for building this library, we define one of the symbols
- *     OPENMM_CUDACOMPILER_BUILDING_{SHARED|STATIC}_LIBRARY
- * Client code normally has no special symbol defined, in which case we'll
- * assume it wants to use the shared library. However, if the client defines
- * the symbol OPENMM_USE_STATIC_LIBRARIES we'll suppress the dllimport so
- * that the client code can be linked with static libraries. Note that
- * the client symbol is not library dependent, while the library symbols
- * affect only the OpenMM library, meaning that other libraries can
- * be clients of this one. However, we are assuming all-static or all-shared.
- */
-
-#ifdef _MSC_VER
-    // We don't want to hear about how sprintf is "unsafe".
-    #pragma warning(disable:4996)
-    // Keep MS VC++ quiet about lack of dll export of private members.
-    #pragma warning(disable:4251)
-    #if defined(OPENMM_CUDACOMPILER_BUILDING_SHARED_LIBRARY)
-        #define OPENMM_EXPORT_CUDACOMPILER __declspec(dllexport)
-    #elif defined(OPENMM_CUDACOMPILER_BUILDING_STATIC_LIBRARY) || defined(OPENMM_CUDACOMPILER_USE_STATIC_LIBRARIES)
-        #define OPENMM_EXPORT_CUDACOMPILER
-    #else
-        #define OPENMM_EXPORT_CUDACOMPILER __declspec(dllimport)   // i.e., a client of a shared library
-    #endif
-#else
-    #define OPENMM_EXPORT_CUDACOMPILER // Linux, Mac
-#endif
-
-#endif // OPENMM_WINDOWSEXPORTCUDACOMPILER_H_
--- a/plugins/cudacompiler/src/CudaCompilerKernelFactory.cpp
+++ b/plugins/cudacompiler/src/CudaCompilerKernelFactory.cpp
-/* -------------------------------------------------------------------------- *
- *                           OpenMMCudaCompiler                               *
- * -------------------------------------------------------------------------- *
- * This is part of the OpenMM molecular simulation toolkit originating from   *
- * Simbios, the NIH National Center for Physics-Based Simulation of           *
- * Biological Structures at Stanford, funded under the NIH Roadmap for        *
- * Medical Research, grant U54 GM072970. See https://simtk.org.               *
- *                                                                            *
- * Portions copyright (c) 2015-2016 Stanford University and the Authors.      *
- * Authors: Peter Eastman                                                     *
- * Contributors:                                                              *
- *                                                                            *
- * This program is free software: you can redistribute it and/or modify       *
- * it under the terms of the GNU Lesser General Public License as published   *
- * by the Free Software Foundation, either version 3 of the License, or       *
- * (at your option) any later version.                                        *
- *                                                                            *
- * This program is distributed in the hope that it will be useful,            *
- * but WITHOUT ANY WARRANTY; without even the implied warranty of             *
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the              *
- * GNU Lesser General Public License for more details.                        *
- *                                                                            *
- * You should have received a copy of the GNU Lesser General Public License   *
- * along with this program.  If not, see <http://www.gnu.org/licenses/>.      *
- * -------------------------------------------------------------------------- */
-
-#include "CudaCompilerKernelFactory.h"
-#include "CudaCompilerKernels.h"
-#include "internal/windowsExportCudaCompiler.h"
-#include "openmm/internal/ContextImpl.h"
-#include "openmm/OpenMMException.h"
-
-using namespace OpenMM;
-
-#ifdef OPENMM_CUDACOMPILER_BUILDING_STATIC_LIBRARY
-static void registerKernelFactories() {
-#else
-extern "C" OPENMM_EXPORT_CUDACOMPILER void registerKernelFactories() {
-#endif
-    try {
-        // Make sure this is at least CUDA 7.0.
-        
-        int driverVersion;
-        cuDriverGetVersion(&driverVersion);
-        if (driverVersion >= 7000) {
-            Platform& platform = Platform::getPlatformByName("CUDA");
-            CudaCompilerKernelFactory* factory = new CudaCompilerKernelFactory();
-            platform.registerKernelFactory(CudaCompilerKernel::Name(), factory);
-        }
-    }
-    catch (std::exception ex) {
-        // Ignore
-    }
-}
-
-#ifdef OPENMM_CUDACOMPILER_BUILDING_STATIC_LIBRARY
-extern "C" void registerCudaCompilerKernelFactories() {
-    registerKernelFactories();
-}
-#else
-extern "C" OPENMM_EXPORT_CUDACOMPILER void registerCudaCompilerKernelFactories() {
-    registerKernelFactories();
-}
-extern "C" OPENMM_EXPORT_CUDACOMPILER void registerPlatforms() {
-}
-#endif
-
-KernelImpl* CudaCompilerKernelFactory::createKernelImpl(std::string name, const Platform& platform, ContextImpl& context) const {
-    if (name == CudaCompilerKernel::Name())
-        return new CudaRuntimeCompilerKernel(name, platform);
-    throw OpenMMException((std::string("Tried to create kernel with illegal kernel name '")+name+"'").c_str());
-}
\ No newline at end of file
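The removed `registerKernelFactories()` above only registered the runtime compiler when `cuDriverGetVersion()` reported at least `7000`. The companion kernel source checks `CUDA_VERSION < 11020` in the same way. Both values use CUDA's integer version encoding, `1000*major + 10*minor`, so `7000` means CUDA 7.0 and `11020` means CUDA 11.2.

The sketch below decodes that encoding. It is illustrative only. `CudaVersion`, `decodeCudaVersion`, and `supportsRuntimeCompiler` are hypothetical names and are not part of OpenMM or the CUDA API. The sketch also does not call the driver, so it builds without the CUDA toolkit.

```cpp
#include <cassert>

// Hypothetical type (not part of OpenMM or CUDA): a decoded CUDA version.
struct CudaVersion {
    int major;
    int minor;
};

// Hypothetical helper: decodes the integer encoding used by cuDriverGetVersion()
// and the CUDA_VERSION macro, i.e. 1000*major + 10*minor (11020 -> 11.2).
CudaVersion decodeCudaVersion(int encoded) {
    assert(encoded >= 0);  // negative values are not valid version codes
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Hypothetical helper mirroring the gate in the removed plugin code:
// the runtime compiler kernel was registered only for CUDA 7.0 or newer drivers.
bool supportsRuntimeCompiler(int driverVersion) {
    return driverVersion >= 7000;
}
```

Comparing the raw encoded integers, as the original code does, is enough for "at least version X" checks. Decoding is only needed when a human-readable version has to be reported.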
--- a/plugins/cudacompiler/src/CudaCompilerKernelFactory.h
+++ b/plugins/cudacompiler/src/CudaCompilerKernelFactory.h
-#ifndef OPENMM_CPUCUDACOMPILERKERNELFACTORY_H_
-#define OPENMM_CPUCUDACOMPILERKERNELFACTORY_H_
-
-/* -------------------------------------------------------------------------- *
- *                                   OpenMM                                   *
- * -------------------------------------------------------------------------- *
- * This is part of the OpenMM molecular simulation toolkit originating from   *
- * Simbios, the NIH National Center for Physics-Based Simulation of           *
- * Biological Structures at Stanford, funded under the NIH Roadmap for        *
- * Medical Research, grant U54 GM072970. See https://simtk.org.               *
- *                                                                            *
- * Portions copyright (c) 2015 Stanford University and the Authors.           *
- * Authors: Peter Eastman                                                     *
- * Contributors:                                                              *
- *                                                                            *
- * Permission is hereby granted, free of charge, to any person obtaining a    *
- * copy of this software and associated documentation files (the "Software"), *
- * to deal in the Software without restriction, including without limitation  *
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,   *
- * and/or sell copies of the Software, and to permit persons to whom the      *
- * Software is furnished to do so, subject to the following conditions:       *
- *                                                                            *
- * The above copyright notice and this permission notice shall be included in *
- * all copies or substantial portions of the Software.                        *
- *                                                                            *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR *
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,   *
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL    *
- * THE AUTHORS, CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,    *
- * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR      *
- * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE  *
- * USE OR OTHER DEALINGS IN THE SOFTWARE.                                     *
- * -------------------------------------------------------------------------- */
-
-#include "openmm/KernelFactory.h"
-
-namespace OpenMM {
-
-/**
- * This KernelFactory creates kernels for the CUDA runtime compiler.
- */
-
-class CudaCompilerKernelFactory : public KernelFactory {
-public:
-    KernelImpl* createKernelImpl(std::string name, const Platform& platform, ContextImpl& context) const;
-};
-
-} // namespace OpenMM
-
-#endif /*OPENMM_CPUCUDACOMPILERKERNELFACTORY_H_*/
--- a/plugins/cudacompiler/src/CudaCompilerKernels.cpp
+++ b/plugins/cudacompiler/src/CudaCompilerKernels.cpp
-/* -------------------------------------------------------------------------- *
- *                                   OpenMM                                   *
- * -------------------------------------------------------------------------- *
- * This is part of the OpenMM molecular simulation toolkit originating from   *
- * Simbios, the NIH National Center for Physics-Based Simulation of           *
- * Biological Structures at Stanford, funded under the NIH Roadmap for        *
- * Medical Research, grant U54 GM072970. See https://simtk.org.               *
- *                                                                            *
- * Portions copyright (c) 2015-2021 Stanford University and the Authors.      *
- * Authors: Peter Eastman                                                     *
- * Contributors:                                                              *
- *                                                                            *
- * Permission is hereby granted, free of charge, to any person obtaining a    *
- * copy of this software and associated documentation files (the "Software"), *
- * to deal in the Software without restriction, including without limitation  *
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,   *
- * and/or sell copies of the Software, and to permit persons to whom the      *
- * Software is furnished to do so, subject to the following conditions:       *
- *                                                                            *
- * The above copyright notice and this permission notice shall be included in *
- * all copies or substantial portions of the Software.                        *
- *                                                                            *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR *
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,   *
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL    *
- * THE AUTHORS, CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,    *
- * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR      *
- * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE  *
- * USE OR OTHER DEALINGS IN THE SOFTWARE.                                     *
- * -------------------------------------------------------------------------- */
-
-#include "CudaCompilerKernels.h"
-#include "openmm/OpenMMException.h"
-#include <sstream>
-#include <nvrtc.h>
-
-using namespace OpenMM;
-using namespace std;
-
-#define CHECK_RESULT(result, prefix) \
-    if (result != NVRTC_SUCCESS) { \
-        stringstream m; \
-        m<<prefix<<": "<<getErrorString(result)<<" ("<<result<<")"<<" at "<<__FILE__<<":"<<__LINE__; \
-        throw OpenMMException(m.str());\
-    }
-
-static string getErrorString(nvrtcResult result) {
-    return nvrtcGetErrorString(result);
-}
-
-CudaRuntimeCompilerKernel::CudaRuntimeCompilerKernel(const std::string& name, const Platform& platform) : CudaCompilerKernel(name, platform) {
-    // Find the maximum architecture the compiler supports.
-    
-#if CUDA_VERSION < 11020
-    // CUDA versions before 11.2 can't query the compiler to see what it supports.
-    
-    maxSupportedArchitecture = 75;
-#else
-    int numArchs;
-    CHECK_RESULT(nvrtcGetNumSupportedArchs(&numArchs), "Error querying supported architectures");
-    vector<int> archs(numArchs);
-    CHECK_RESULT(nvrtcGetSupportedArchs(archs.data()), "Error querying supported architectures");
-    maxSupportedArchitecture = archs.back();
-#endif
-}
-
-string CudaRuntimeCompilerKernel::createModule(const string& source, const string& flags, CudaContext& cu) {
-    // Split the command line flags into an array of options.
-    
-    stringstream flagsStream(flags);
-    string flag;
-    vector<string> splitFlags;
-    while (flagsStream >> flag)
-        splitFlags.push_back(flag);
-    int numOptions = splitFlags.size();
-    vector<const char*> options(numOptions);
-    for (int i = 0; i < numOptions; i++)
-        options[i] = &splitFlags[i][0];
-    
-    // Compile the program to PTX.
-    
-    nvrtcProgram program;
-    CHECK_RESULT(nvrtcCreateProgram(&program, source.c_str(), NULL, 0, NULL, NULL), "Error creating program");
-    try {
-        nvrtcResult result = nvrtcCompileProgram(program, options.size(), &options[0]);
-        if (result != NVRTC_SUCCESS) {
-            size_t logSize;
-            nvrtcGetProgramLogSize(program, &logSize);
-            vector<char> log(logSize);
-            nvrtcGetProgramLog(program, &log[0]);
-            throw OpenMMException("Error compiling program: "+string(&log[0]));
-        }
-        size_t ptxSize;
-        nvrtcGetPTXSize(program, &ptxSize);
-        vector<char> ptx(ptxSize);
-        nvrtcGetPTX(program, &ptx[0]);
-        nvrtcDestroyProgram(&program);
-        return string(&ptx[0]);
-    }
-    catch (...) {
-        nvrtcDestroyProgram(&program);
-        throw;
-    }
-}
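Before calling `nvrtcCompileProgram()`, the removed `createModule()` tokenized its whitespace-separated `flags` string into an argv-style array of C strings. That is the form NVRTC expects: an `int` count plus a `const char* const*` array.

The sketch below isolates that step so it builds without NVRTC. `splitFlags` and `toOptionPointers` are hypothetical helper names, not OpenMM functions.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of OpenMM or NVRTC): splits a whitespace-separated
// flag string into owned tokens, as the removed createModule() did with a stringstream.
std::vector<std::string> splitFlags(const std::string& flags) {
    std::stringstream flagsStream(flags);
    std::vector<std::string> tokens;
    std::string flag;
    while (flagsStream >> flag)  // operator>> skips any run of whitespace
        tokens.push_back(flag);
    return tokens;
}

// Hypothetical helper: builds the argv-style array of C strings that an
// options parameter such as NVRTC's expects. The pointers refer into `tokens`,
// so `tokens` must outlive the result and must not be modified while it is used.
std::vector<const char*> toOptionPointers(const std::vector<std::string>& tokens) {
    std::vector<const char*> options;
    options.reserve(tokens.size());
    for (const std::string& token : tokens)
        options.push_back(token.c_str());
    return options;
}
```

When handing the result to NVRTC, pass `static_cast<int>(options.size())` and `options.data()`. Avoid `&options[0]`: indexing an empty vector is undefined behavior, and the vector is empty whenever no flags are supplied.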