- 01 Sep, 2024 2 commits
-
-
Anton Gorenko authored
* Compile kernels with max block size of 256 threads: The default hipcc behavior since ROCm 4.2 is to compile kernels with 1024 threads unless __launch_bounds__ is specified. This significantly increases register pressure, especially in heavy kernels (double precision, for example), requiring register spilling
* Optimize computeRange by using multiple blocks for reduction
* Use blocks of 1024 threads for computeBucketPositions: it is executed as a single work group, so a larger block size is faster
* Sort up to lenghtNextPow2 instead of blockDim.x (faster for short buckets)
* Optimize sortShortList2
* Optimize sortBuckets with bit instructions
* Decrease bucket size for non-uniform sorting: too many buckets may have sizes too large to sort in shared memory
* Add more sizes in tests
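The "sort up to lenghtNextPow2 instead of blockDim.x" item can be illustrated with a small host-side sketch. It assumes the in-block sort is a bitonic-style network, which needs a power-of-two element count; the commit message does not spell this out. The helpers `nextPow2` and `bitonicStages` are hypothetical names introduced here for illustration and are not taken from the repository.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the repository): smallest power of two
// that is >= n, computed with bit operations only.
inline std::uint32_t nextPow2(std::uint32_t n) {
    assert(n <= 0x80000000u);  // the result must fit in 32 bits
    if (n <= 1) return 1;
    n--;                       // so exact powers of two map to themselves
    n |= n >> 1;               // smear the highest set bit downward...
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    return n + 1;              // ...then step up to the next power of two
}

// Hypothetical helper: number of compare-exchange stages in a bitonic
// sorting network over `size` elements, where `size` is a power of two.
// With k = log2(size), the network has k * (k + 1) / 2 stages.
inline std::uint32_t bitonicStages(std::uint32_t size) {
    std::uint32_t k = 0;
    while ((1u << k) < size) k++;
    return k * (k + 1) / 2;
}
```

Under these assumptions, a bucket of 20 elements padded to `nextPow2(20)` = 32 needs `bitonicStages(32)` = 15 stages, while padding it to a full 256-thread block would need `bitonicStages(256)` = 36, which is one way to see why the smaller bound helps short buckets.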
-
Anton Gorenko authored
Port changes in CUDA backend to HIP
Fix a warning about arithmetic operations on void* in HipArray::uploadSubArray
Fix "Error Initializing context ROCm 5.3.0" https://github.com/StreamHPC/openmm-hip/issues/3
hipDeviceSetCacheConfig returns hipErrorNotSupported on 5.3
Co-authored-by: Nick Curtis <nicholas.curtis@amd.com>
-
- 27 Dec, 2021 1 commit
-
-
Peter Eastman authored
* Optimized CudaSort for non-uniformly distributed data
* Optimized OpenCLSort for non-uniformly distributed data
* Further tuned distributing elements between buckets
* Copied optimizations over to OpenCL
-
- 08 Jan, 2020 1 commit
-
-
peastman authored
* Began creating common compute framework to unify code between CUDA and OpenCL
* Began OpenCL implementation of common compute framework
* Common implementation of CMMotionRemover
* CUDA implementation of common compute interface
* Converted HarmonicBondForce to common compute API
* Converted standard bonded forces to common compute API
* Converted ExpressionUtilities to common compute API
* Created ComputeParameterSet
* Converted custom bonded forces to common compute API
* Converted CustomCentroidBondForce to common compute API
* Converted CustomManyParticleForce to common compute API
* Moved lots of duplicate code from CudaContext and OpenCLContext to ComputeContext
* Converted GayBerneForce to common compute API
* Removed obsolete kernels
* Converted verlet integrators to common compute API
* Converted Langevin and Brownian integrators to common compute API
* Converted CustomIntegrator to common compute API
* Converted CustomNonbondedForce to common compute API
* Removed uses of a deprecated API
* Fixed failing test cases
* Converted GBSAOBCForce to common compute API
* Began converting CustomGBForce to common compute API
* Finished converting CustomGBForce to common compute API
* Merged duplicated code in CudaIntegrationUtilities and OpenCLIntegrationUtilities
* Converted RMSDForce and AndersenThermostat to common compute API
* Converted CustomHbondForce to common compute API
* Merged scripts for encoding kernel sources
* Converted Drude plugin to common compute API
* Fixed errors in CMake scripts
* Attempt at fixing errors on Windows
* Added discussion of common compute API to developer guide
* Added Windows export macro for common classes
* Fixed error in CMMotionRemover
* Updated travis to newer Ubuntu version
* Fixed errors on CPU OpenCL
* Fixed Windows linking errors
* Added missing pragma for 32 bit atomics
* Replaced long long with mm_long
* More fixes to Windows linking
* Bug fix
-
- 03 May, 2018 1 commit
-
-
peastman authored
-
- 12 Feb, 2018 1 commit
-
-
Peter Eastman authored
-
- 08 Jul, 2013 1 commit
-
-
peastman authored
Platform specific header files get installed. This allows plugins to be built with just an OpenMM installation, not a full source tree.
-
- 22 Mar, 2013 1 commit
-
-
Peter Eastman authored
-
- 12 Dec, 2012 1 commit
-
-
Peter Eastman authored
-
- 24 Oct, 2012 1 commit
-
-
Peter Eastman authored
-
- 28 Sep, 2012 1 commit
-
-
Peter Eastman authored
-
- 20 Jun, 2012 1 commit
-
-
Peter Eastman authored
-
- 15 Jun, 2012 1 commit
-
-
Peter Eastman authored
-
- 05 Jun, 2012 1 commit
-
-
Peter Eastman authored
-