1. 05 Sep, 2024 1 commit
      Optimize PME kernels · a0acfbc9
      Anton Gorenko authored
      * Compile with -munsafe-fp-atomics to enable fast hardware f32 atomic
        add on global memory on MI100 and newer GPUs;
      * Use fixed point charge spreading on other GPUs, otherwise float atomic
        add will be compiled as a slow CAS loop;
      * Tune block sizes, use executeKernelFlat;
      * Tune launch bounds of PME grid-related kernels: force the compiler to
        use all registers by limiting max waves per EU to 1.
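      The trade-off between the two accumulation strategies named in the
      bullets above can be sketched on the host in plain C++. This is only an
      illustration, not the kernels from this commit: the function names and
      the 2^32 scale factor are assumptions chosen for the example, and
      std::atomic stands in for GPU global-memory atomics.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical scale: 32 fractional bits (an assumption for this sketch).
constexpr double kFixedScale = 4294967296.0; // 2^32

// Float accumulation without a native atomic add: a compare-and-swap
// retry loop. Under contention each failed exchange costs another round trip.
inline void atomicAddFloatCas(std::atomic<float>& cell, float value) {
    float expected = cell.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak reloads `expected`, so the sum is
    // recomputed from the latest stored value before retrying.
    while (!cell.compare_exchange_weak(expected, expected + value,
                                       std::memory_order_relaxed)) {
    }
}

// Fixed-point accumulation: convert once, then issue a single integer
// atomic add, which never needs a retry loop.
inline void atomicAddFixed(std::atomic<std::int64_t>& cell, float value) {
    assert(kFixedScale > 0.0);
    const auto scaled =
        static_cast<std::int64_t>(static_cast<double>(value) * kFixedScale);
    cell.fetch_add(scaled, std::memory_order_relaxed);
}

// Convert the accumulated fixed-point sum back to float after spreading.
inline float fixedToFloat(std::int64_t value) {
    return static_cast<float>(static_cast<double>(value) / kFixedScale);
}
```

      In this sketch the fixed-point path trades a final conversion pass and
      a bounded value range for a retry-free integer add. It also makes the
      sum independent of the order in which contributions arrive, which
      floating-point accumulation does not guarantee.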
  2. 01 Sep, 2024 1 commit