    Optimize PME kernels · a0acfbc9
    Anton Gorenko authored
    * Compile with -munsafe-fp-atomics to enable fast hardware f32 atomic
      add on global memory on pre-MI100 GPUs;
    * Use fixed-point charge spreading on other GPUs; otherwise the float
      atomic add is compiled to a slow CAS loop;
    * Tune block sizes, use executeKernelFlat;
    * Tune launch bounds of PME grid-related kernels: force the compiler to
      use all registers by limiting max waves per EU to 1.
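
    The fixed-point fallback mentioned above can be illustrated with a minimal
    host-side C++ sketch. It is not the code from this commit: the 2^32 scale
    factor and the names FIXED_SCALE, toFixed, fromFixed and addFixed are
    assumptions made for illustration. The idea is that each contribution is
    converted to a 64-bit integer and accumulated with an integer atomic add,
    which hardware supports natively, instead of a float atomic add that may
    be lowered to a compare-and-swap loop.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical scale factor (2^32); the value used by the real kernels is not
// stated in the commit message, so treat this as an assumption.
constexpr double FIXED_SCALE = 4294967296.0;

// Convert a real-valued contribution to 64-bit fixed point (truncating).
inline std::int64_t toFixed(float x) {
    return static_cast<std::int64_t>(static_cast<double>(x) * FIXED_SCALE);
}

// Convert an accumulated fixed-point value back to float.
inline float fromFixed(std::int64_t v) {
    return static_cast<float>(static_cast<double>(v) / FIXED_SCALE);
}

// Accumulate one contribution into a grid cell with an integer atomic add.
// This may be called concurrently from several threads on the same cell.
inline void addFixed(std::atomic<std::int64_t>& cell, float contribution) {
    cell.fetch_add(toFixed(contribution), std::memory_order_relaxed);
}
```

    Because integer addition is associative, the accumulated value does not
    depend on the order in which contributions arrive, unlike a sum of floats.
    The trade-off is a limited range and one extra conversion pass to turn the
    grid back into floating point.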
HipContext.cpp 35.2 KB