Anton Gorenko authored
* Use fixed-point spread charge on RDNA4, as it is faster.
  Even though RDNA4 (gfx12) has global_atomic_add_f32, micro-benchmarks and OpenMM benchmarks show that it is much slower than global_atomic_add_u64.
* Add a workaround for fixed-point gridSpreadCharge on RDNA4.
  This works around rare cases in which a few values of pmeGrid are very large and incorrect. The cause is unknown, as is why this workaround (or other seemingly unrelated changes, such as adding a printf) helps.
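The first change replaces floating-point atomic accumulation with fixed-point accumulation. The following is a minimal CPU-side sketch of that idea. It is not the kernel from this commit. It uses `std::atomic<std::uint64_t>::fetch_add` as a stand-in for the GPU's `global_atomic_add_u64` and assumes a scale factor of 2^32 (32 fractional bits). The names `toFixed`, `fromFixed`, `spreadFixed` and `spreadAndFinish` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed scale: 2^32 fractional bits (illustrative choice, not taken from the commit).
constexpr double kFixedScale = 4294967296.0;

// Convert a real value to a 64-bit fixed-point word. Negative values go through
// int64_t, so two's-complement wraparound makes unsigned addition act like signed addition.
inline std::uint64_t toFixed(double value) {
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(std::llround(value * kFixedScale)));
}

// Convert an accumulated fixed-point word back to a real value.
inline double fromFixed(std::uint64_t word) {
    return static_cast<double>(static_cast<std::int64_t>(word)) / kFixedScale;
}

// Accumulate one charge contribution into a grid cell with an integer atomic add
// (the CPU analogue of global_atomic_add_u64) instead of a float atomic add.
inline void spreadFixed(std::atomic<std::uint64_t>& cell, double charge) {
    cell.fetch_add(toFixed(charge), std::memory_order_relaxed);
}

// Spread all charges into a single cell from several threads, then convert back.
double spreadAndFinish(const std::vector<double>& charges, int numThreads) {
    assert(numThreads > 0);
    std::atomic<std::uint64_t> cell{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            // Each thread handles a strided subset of the charges.
            for (std::size_t i = t; i < charges.size(); i += numThreads)
                spreadFixed(cell, charges[i]);
        });
    }
    for (auto& w : workers) w.join();
    return fromFixed(cell.load());
}
```

For example, `spreadAndFinish({0.5, -0.25, 1.125, -2.0, 0.75}, 4)` returns 0.125. Because integer addition is associative, the result does not depend on the order in which threads add their contributions. A float atomic add does not guarantee this.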
1ce5d91d