    Use fixed point charge spreading on RDNA4 (#4960) · 1ce5d91d
    Anton Gorenko authored
    * Use fixed-point charge spreading on RDNA4 as it is faster
    
    Even though RDNA4 (gfx12) has global_atomic_add_f32, micro-benchmarks and OpenMM benchmarks show
    that it is much slower than global_atomic_add_u64.
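
A minimal host-side sketch of the idea, not the code from pme.cc: each contribution is scaled to a 64-bit fixed-point integer and accumulated with an integer atomic add instead of a float atomic add. The scale factor of 2^32 and the names `kFixedScale`, `toFixed`, `fromFixed`, `spreadCharge` and `accumulateDemo` are assumptions made for illustration; `std::atomic<std::uint64_t>` stands in for the GPU's 64-bit global atomic.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed scale factor (2^32); the value actually used in pme.cc is not shown here.
constexpr double kFixedScale = 4294967296.0;

// Convert a signed real value to two's-complement fixed point stored in a u64.
inline std::uint64_t toFixed(double value) {
    // With a 2^32 scale, only |value| < 2^31 fits in a signed 64-bit integer.
    assert(std::fabs(value) < 2147483648.0);
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(value * kFixedScale));
}

// Convert the accumulated fixed-point sum back to a real value.
inline double fromFixed(std::uint64_t raw) {
    return static_cast<double>(static_cast<std::int64_t>(raw)) / kFixedScale;
}

// Hypothetical stand-in for one thread adding its contribution to a grid cell.
// Unsigned wrap-around makes adding a negative contribution work as subtraction.
inline void spreadCharge(std::atomic<std::uint64_t>& cell, double contribution) {
    cell.fetch_add(toFixed(contribution), std::memory_order_relaxed);
}

// Accumulate alternating positive and negative contributions into one cell.
inline double accumulateDemo() {
    std::atomic<std::uint64_t> cell{0};
    for (int i = 0; i < 2000; ++i) {
        spreadCharge(cell, 0.25);    // 2000 *  0.25  =  500
        spreadCharge(cell, -0.125);  // 2000 * -0.125 = -250
    }
    return fromFixed(cell.load());
}
```

Because integer addition is associative, the accumulated sum does not depend on the order in which the adds happen, unlike floating-point accumulation.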
    
    * Add a workaround for fixed point gridSpreadCharge on RDNA4
    
    Workaround for rare cases in which a few values of pmeGrid are very large
    and incorrect. The cause is unknown. It is also unknown why this workaround,
    or other unrelated changes such as adding a printf, helps.
pme.cc 16 KB