1. 05 Sep, 2024 2 commits
      Optimize PME kernels · a0acfbc9
      Anton Gorenko authored
      * Compile with -munsafe-fp-atomics to enable fast hardware f32 atomic
        add on global memory on MI100 and newer GPUs;
      * Use fixed point charge spreading on other GPUs, otherwise float atomic
        add will be compiled as a slow CAS loop;
      * Tune block sizes, use executeKernelFlat;
      * Tune launch bounds of PME grid-related kernels: force the compiler to
        use all registers by limiting max waves per EU to 1.
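The trade-off between a float atomic add compiled as a CAS loop and fixed-point charge spreading can be illustrated on the host. This is only a minimal sketch using `std::atomic`, not the plugin's kernel code: the helper names (`atomicAddCas`, `atomicAddFixed`, `toFixedPoint`, `fromFixedPoint`) and the 2^32 scale factor are assumptions chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical scale: 2^32 fixed-point steps per unit of charge (an assumption).
constexpr double kFixedScale = 4294967296.0;

// Float atomic add emulated with a compare-and-swap loop. This is the general
// shape of the slow fallback when no hardware f32 atomic add is available.
inline void atomicAddCas(std::atomic<float>& target, float value) {
    float expected = target.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak reloads `expected`, so we simply retry.
    while (!target.compare_exchange_weak(expected, expected + value,
                                         std::memory_order_relaxed)) {
    }
}

// Fixed-point alternative: convert once, then issue one integer atomic add.
inline std::int64_t toFixedPoint(float value) {
    assert(std::isfinite(value));
    return static_cast<std::int64_t>(
        std::llround(static_cast<double>(value) * kFixedScale));
}

inline void atomicAddFixed(std::atomic<std::int64_t>& target, float value) {
    target.fetch_add(toFixedPoint(value), std::memory_order_relaxed);
}

// Convert the accumulated fixed-point sum back to a float.
inline float fromFixedPoint(std::int64_t value) {
    return static_cast<float>(static_cast<double>(value) / kFixedScale);
}
```

Integer addition is associative, so the fixed-point sum does not depend on the order in which threads contribute, whereas the float version can vary in the last bits with ordering.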
      Optimize computeNonbonded · 67f5644d
      Anton Gorenko authored
      * All AMD GPUs support shuffle, double precision and 64-bit int atomics;
      * Remove unused code: !ENABLE_SHUFFLE code paths in nonbonded.hip;
      * Use intrinsics in single-precision;
      * Use realToFixedPoint (faster float32-to-int64);
      * Remove shared atomIndices, use shuffles;
      * Check early whether atoms are within the cutoff range: sometimes all
        lanes in a warp can skip the computation, and single pairs can skip
        useless atomics with zero values;
      * Remove volatile skipTiles access, use shuffles;
      * Distribute work for warps in a strided order;
      * Skip warps that may still be busy in the first loop;
      * Unify conditions for excluded atoms with `includeInteraction`;
      * Move multiprocessors to HipContext;
      * Increase number of warps for computeNonbonded;
      * Disable packed math for >=MI200 (it affects performance of some
        kernels like computeGKForces of amoebagk);
      * Remove defaultOptimizationOptions and createModule's optimizationFlags
        as they are never used;
      * Support -save-temps.
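The early cutoff check described in this commit can be pictured with a host-side emulation of one warp processing a tile of atoms. This is a rough sketch under stated assumptions, not the code in nonbonded.hip: the 64-lane width, the `Vec3` type, the `inRangeMask` and `processTile` helpers, and the placeholder 1/r^2 "energy" are all hypothetical. The bit mask stands in for what a warp-wide ballot would return on the GPU.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 64;  // assumed wavefront width

struct Vec3 { float x, y, z; };

inline float distanceSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// One bit per lane, set when that lane's atom is within the cutoff of `atom`.
inline std::uint64_t inRangeMask(const Vec3& atom,
                                 const std::array<Vec3, kLanes>& tile,
                                 float cutoff) {
    std::uint64_t mask = 0;
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        if (distanceSquared(atom, tile[lane]) < cutoff * cutoff) {
            mask |= std::uint64_t{1} << lane;
        }
    }
    return mask;
}

// Returns how many pair interactions were actually evaluated for this tile.
inline int processTile(const Vec3& atom, const std::array<Vec3, kLanes>& tile,
                       float cutoff, float& energy) {
    const std::uint64_t mask = inRangeMask(atom, tile, cutoff);
    if (mask == 0) {
        return 0;  // every lane is out of range: the whole warp skips the tile
    }
    int evaluated = 0;
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        if (((mask >> lane) & 1u) == 0) {
            continue;  // out-of-range pair: no computation, no zero-valued add
        }
        energy += 1.0f / distanceSquared(atom, tile[lane]);  // placeholder term
        ++evaluated;
    }
    return evaluated;
}
```

For a tile whose atoms all lie beyond the cutoff, `processTile` returns 0 without touching `energy`; with a single atom in range it evaluates exactly one pair.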
  2. 01 Sep, 2024 1 commit
  3. 07 Mar, 2022 1 commit
  4. 08 Jan, 2020 1 commit
      Common compute framework to unify CUDA and OpenCL code (#2488) · edbc8407
      peastman authored
      * Began creating common compute framework to unify code between CUDA and OpenCL
      
      * Began OpenCL implementation of common compute framework
      
      * Common implementation of CMMotionRemover
      
      * CUDA implementation of common compute interface
      
      * Converted HarmonicBondForce to common compute API
      
      * Converted standard bonded forces to common compute API
      
      * Converted ExpressionUtilities to common compute API
      
      * Created ComputeParameterSet
      
      * Converted custom bonded forces to common compute API
      
      * Converted CustomCentroidBondForce to common compute API
      
      * Converted CustomManyParticleForce to common compute API
      
      * Moved lots of duplicate code from CudaContext and OpenCLContext to ComputeContext
      
      * Converted GayBerneForce to common compute API
      
      * Removed obsolete kernels
      
      * Converted verlet integrators to common compute API
      
      * Converted Langevin and Brownian integrators to common compute API
      
      * Converted CustomIntegrator to common compute API
      
      * Converted CustomNonbondedForce to common compute API
      
      * Removed uses of a deprecated API
      
      * Fixed failing test cases
      
      * Converted GBSAOBCForce to common compute API
      
      * Began converting CustomGBForce to common compute API
      
      * Finished converting CustomGBForce to common compute API
      
      * Merged duplicated code in CudaIntegrationUtilities and OpenCLIntegrationUtilities
      
      * Converted RMSDForce and AndersenThermostat to common compute API
      
      * Converted CustomHbondForce to common compute API
      
      * Merged scripts for encoding kernel sources
      
      * Converted Drude plugin to common compute API
      
      * Fixed errors in CMake scripts
      
      * Attempt at fixing errors on Windows
      
      * Added discussion of common compute API to developer guide
      
      * Added Windows export macro for common classes
      
      * Fixed error in CMMotionRemover
      
      * Updated Travis to newer Ubuntu version
      
      * Fixed errors on CPU OpenCL
      
      * Fixed Windows linking errors
      
      * Added missing pragma for 32 bit atomics
      
      * Replaced long long with mm_long
      
      * More fixes to Windows linking
      
      * Bug fix