Commits · 3b8df95215bbe3f3423eeb690c4ab812b2a80b3e · tsoc / openmm

01 Sep, 2024 2 commits

Optimize sorting kernels and tune block sizes · 7279c539

Anton Gorenko authored Aug 25, 2024

* Compile kernels with max block size of 256 threads:
  Since ROCm 4.2, hipcc by default compiles kernels for block sizes of up
  to 1024 threads unless __launch_bounds__ is specified. This
  significantly increases register pressure, especially in heavy kernels
  (double precision, for example), and causes register spilling;
* Optimize computeRange by using multiple blocks for reduction;
* Use blocks of 1024 threads for computeBucketPositions - it is executed
  as a single work group so larger block size is faster;
* Sort up to lenghtNextPow2 instead of blockDim.x (faster for short
  buckets);
* Optimize sortShortList2;
* Optimize sortBuckets with bit instructions;
* Decrease bucket size for non-uniform sorting: too many buckets may
  have sizes too large to sort in shared memory;
* Add more sizes in tests.
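
The "sort up to lenghtNextPow2 instead of blockDim.x" item can be illustrated on the CPU. The following is a minimal sketch, assuming the short-list sort is a bitonic network (the commit message does not confirm this). The names `nextPow2` and `bitonicSortShort` are hypothetical and are not taken from the commit; the real kernels run on the GPU in shared memory. The point is only that the network spans the next power of two above the bucket length, so a 5-element bucket needs an 8-slot network rather than one as wide as the whole thread block.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Smallest power of two that is >= n (returns 1 for n == 0 or n == 1).
std::size_t nextPow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Hypothetical CPU stand-in for a short-bucket sort: ascending bitonic sort
// whose network width is nextPow2(length) rather than a fixed block size.
void bitonicSortShort(std::vector<float>& data) {
    const std::size_t length = data.size();
    const std::size_t padded = nextPow2(length);

    // Pad with +infinity so the padding ends up after the real elements.
    std::vector<float> buf(padded, std::numeric_limits<float>::infinity());
    std::copy(data.begin(), data.end(), buf.begin());

    // k is the size of the bitonic sequences being merged,
    // j is the compare distance within each merge step.
    for (std::size_t k = 2; k <= padded; k <<= 1) {
        for (std::size_t j = k / 2; j > 0; j >>= 1) {
            // On a GPU each value of i would be handled by one thread.
            for (std::size_t i = 0; i < padded; ++i) {
                const std::size_t partner = i ^ j;
                if (partner <= i)
                    continue;  // each pair is handled once, by its lower index
                const bool ascending = (i & k) == 0;
                const bool outOfOrder = ascending ? buf[i] > buf[partner]
                                                  : buf[i] < buf[partner];
                if (outOfOrder)
                    std::swap(buf[i], buf[partner]);
            }
        }
    }

    // Drop the padding and copy the sorted elements back.
    std::copy(buf.begin(), buf.begin() + length, data.begin());
}
```

Calling `bitonicSortShort` on a vector of 5 elements runs 6 compare-exchange passes over 8 slots; a network sized for 256 slots would need 36 passes over 256 slots for the same data, which is the kind of saving the commit describes for short buckets.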

Add hipification of CUDA platform · 89d2ff0e

Anton Gorenko authored Aug 25, 2024

Port changes in CUDA backend to HIP

Fix a warning about arithmetic operations on void* in HipArray::uploadSubArray

Fix "Error Initializing context ROCm 5.3.0"

    https://github.com/StreamHPC/openmm-hip/issues/3

    hipDeviceSetCacheConfig returns hipErrorNotSupported on 5.3

Co-authored-by: Nick Curtis <nicholas.curtis@amd.com>

27 Dec, 2021 1 commit

Optimize sorting of non-uniformly distributed data (#3383) · 71d9bb13

Peter Eastman authored Dec 26, 2021

* Optimized CudaSort for non-uniformly distributed data

* Optimized OpenCLSort for non-uniformly distributed data

* Further tuned distributing elements between buckets

* Copied optimizations over to OpenCL

08 Jan, 2020 1 commit

Common compute framework to unify CUDA and OpenCL code (#2488) · edbc8407

peastman authored Jan 08, 2020

* Began creating common compute framework to unify code between CUDA and OpenCL

* Began OpenCL implementation of common compute framework

* Common implementation of CMMotionRemover

* CUDA implementation of common compute interface

* Converted HarmonicBondForce to common compute API

* Converted standard bonded forces to common compute API

* Converted ExpressionUtilities to common compute API

* Created ComputeParameterSet

* Converted custom bonded forces to common compute API

* Converted CustomCentroidBondForce to common compute API

* Converted CustomManyParticleForce to common compute API

* Moved lots of duplicate code from CudaContext and OpenCLContext to ComputeContext

* Converted GayBerneForce to common compute API

* Removed obsolete kernels

* Converted verlet integrators to common compute API

* Converted Langevin and Brownian integrators to common compute API

* Converted CustomIntegrator to common compute API

* Converted CustomNonbondedForce to common compute API

* Removed uses of a deprecated API

* Fixed failing test cases

* Converted GBSAOBCForce to common compute API

* Began converting CustomGBForce to common compute API

* Finished converting CustomGBForce to common compute API

* Merged duplicated code in CudaIntegrationUtilities and OpenCLIntegrationUtilities

* Converted RMSDForce and AndersenThermostat to common compute API

* Converted CustomHbondForce to common compute API

* Merged scripts for encoding kernel sources

* Converted Drude plugin to common compute API

* Fixed errors in CMake scripts

* Attempt at fixing errors on Windows

* Added discussion of common compute API to developer guide

* Added Windows export macro for common classes

* Fixed error in CMMotionRemover

* Updated Travis to newer Ubuntu version

* Fixed errors on CPU OpenCL

* Fixed Windows linking errors

* Added missing pragma for 32 bit atomics

* Replaced long long with mm_long

* More fixes to Windows linking

* Bug fix

03 May, 2018 1 commit
- Minor optimizations · f3c55c28
  peastman authored May 03, 2018
12 Feb, 2018 1 commit
- Began converting CudaArrays. · b8c86406
  Peter Eastman authored Feb 12, 2018
08 Jul, 2013 1 commit

Platform specific header files get installed. This allows plugins to be built... · 300758a5

peastman authored Jul 08, 2013

Platform specific header files get installed.  This allows plugins to be built with just an OpenMM installation, not a full source tree.

22 Mar, 2013 1 commit
- Merged 5.1Optimizations branch back to trunk · 93c467b2
  Peter Eastman authored Mar 22, 2013
12 Dec, 2012 1 commit
- Fixed compilation errors on Windows · b256555e
  Peter Eastman authored Dec 12, 2012
24 Oct, 2012 1 commit
- Minor fixes and cleanup · e479d792
  Peter Eastman authored Oct 24, 2012
28 Sep, 2012 1 commit
- Renamed cuda2 platform to cuda · 58b094ce
  Peter Eastman authored Sep 28, 2012
20 Jun, 2012 1 commit
- Bug fixes and additional test cases · fe589755
  Peter Eastman authored Jun 20, 2012
15 Jun, 2012 1 commit
- Continuing to implement new CUDA platform: NonbondedForce · 86aacbd8
  Peter Eastman authored Jun 15, 2012
05 Jun, 2012 1 commit
- Continuing to implement new CUDA platform · 3e16cab9
  Peter Eastman authored Jun 05, 2012