platforms/hip/src/HipContext.cpp · a0acfbc961f00562bb49a8f3a541012b5796c9ad · tsoc / openmm

Anton Gorenko authored Aug 25, 2024

* Compile with -munsafe-fp-atomics to enable fast hardware f32 atomic
  add on global memory on MI100 and newer GPUs;
* Use fixed point charge spreading on other GPUs, otherwise float atomic
  add will be compiled as a slow CAS loop;
* Tune block sizes, use executeKernelFlat;
* Tune launch bounds of PME grid-related kernels: force the compiler to
  use all registers by limiting max waves per EU to 1.
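
The trade-off between a float CAS loop and fixed-point accumulation can be sketched on the host in plain C++. This is not the code from HipContext.cpp or the PME kernels: `spreadFixed`, `spreadFloatCas`, `fixedToFloat` and the 2^32 scale factor are hypothetical names and values chosen for illustration, and `std::atomic` stands in for GPU global-memory atomics.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed scale for illustration: 2^32 fixed-point units per 1.0.
constexpr double FIXED_SCALE = 4294967296.0;

// Fixed-point path: the charge is converted to an integer and accumulated
// with a single integer atomic add, which needs no retry loop.
inline void spreadFixed(std::atomic<long long>& cell, float charge) {
    assert(charge == charge);  // reject NaN: converting NaN to an integer is undefined
    cell.fetch_add(static_cast<long long>(charge * FIXED_SCALE),
                   std::memory_order_relaxed);
}

// Convert an accumulated fixed-point value back to float.
inline float fixedToFloat(long long value) {
    return static_cast<float>(value / FIXED_SCALE);
}

// Float path without a native float atomic add: read the old bits, add,
// and retry with compare-and-swap until no other thread has interfered.
inline void spreadFloatCas(std::atomic<std::uint32_t>& cell, float charge) {
    std::uint32_t oldBits = cell.load(std::memory_order_relaxed);
    std::uint32_t newBits;
    do {
        float oldValue;
        std::memcpy(&oldValue, &oldBits, sizeof(float));
        float newValue = oldValue + charge;
        std::memcpy(&newBits, &newValue, sizeof(float));
        // On failure, compare_exchange_weak reloads oldBits with the current value.
    } while (!cell.compare_exchange_weak(oldBits, newBits,
                                         std::memory_order_relaxed));
}
```

Under contention the CAS loop may retry many times per update, while the integer add completes in one atomic operation. Integer addition is also order-independent, so the fixed-point sum does not depend on the order in which threads arrive.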

HipContext.cpp 35.2 KB