one authored
Use explicit 128-thread block launches for selected HIP PME kernels that benefit from larger blocks. Keep the platform default block size unchanged, and leave small-system grid indexing and charge spreading on the existing default launch configuration. On HIP, the heuristic always launches finishSpreadCharge with 128 threads per block, and launches findAtomGridIndex and gridSpreadCharge with 128 threads per block only for larger systems. The Coulomb PME and LJPME dispersion paths receive the same treatment; interpolation and energy evaluation are unchanged.
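
The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this change: the function name `choosePmeBlockSize`, the string-based kernel dispatch, and the atom-count threshold `kLargeSystemAtomThreshold` are all assumptions made for the example. The description does not say where the "larger systems" cutoff lies, so the value here is a placeholder.

```cpp
#include <cassert>
#include <string>

// Block size used for the kernels that benefit from larger blocks.
constexpr int kLargeBlockSize = 128;

// HYPOTHETICAL cutoff: the description only says "larger systems",
// so this number is a placeholder, not the value used by the change.
constexpr int kLargeSystemAtomThreshold = 100000;

// Returns the thread-block size for a PME kernel launch on HIP.
// defaultBlockSize is whatever the platform would otherwise use;
// it is passed in unchanged for every kernel the heuristic skips.
int choosePmeBlockSize(const std::string& kernelName,
                       int numAtoms,
                       int defaultBlockSize) {
    assert(defaultBlockSize > 0);

    // finishSpreadCharge always gets the larger block.
    if (kernelName == "finishSpreadCharge")
        return kLargeBlockSize;

    // Grid indexing and charge spreading switch only for larger systems.
    if (kernelName == "findAtomGridIndex" || kernelName == "gridSpreadCharge")
        return numAtoms >= kLargeSystemAtomThreshold ? kLargeBlockSize
                                                     : defaultBlockSize;

    // Everything else (interpolation, energy evaluation) keeps the default.
    return defaultBlockSize;
}
```

Under these assumptions, a call such as `choosePmeBlockSize("gridSpreadCharge", 1000, 64)` keeps the default of 64 for a small system, while the same kernel with 200000 atoms gets 128. The same function would be called from both the Coulomb PME and LJPME dispersion launch sites.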
20e4b551