Enable split PME streams for HIP LJPME
Run Coulomb and dispersion reciprocal PME work on separate HIP queues for LJPME when PME streams are enabled. Use separate grids, sorters, events, and energy buffers so the two reciprocal branches can overlap safely. Keep the split HIP-only: CUDA profiling on an RTX 4090 showed that the same split increased PME spread/list contention and regressed apoa1ljpme.
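
The resource separation described above can be illustrated with a minimal host-side sketch. This is not the code from this change: it uses standard C++ only, two `std::async` tasks stand in for the two HIP queues, waiting on the futures stands in for event synchronization, and `PmeBranch`, `runReciprocal`, and `computeReciprocalEnergy` are hypothetical names chosen for illustration. The point it shows is that each branch writes only to its own grid and energy buffer, so the two can run concurrently without racing.

```cpp
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical per-branch resources: each reciprocal branch owns its own
// grid and energy buffer, so neither branch writes memory the other uses.
struct PmeBranch {
    std::vector<double> grid;  // stands in for the per-branch PME grid
    double energy = 0.0;       // stands in for the per-branch energy buffer
};

// Stand-in for the reciprocal work of one branch. It touches only `branch`,
// which is what makes concurrent execution of two branches safe.
void runReciprocal(PmeBranch& branch, const std::vector<double>& coefficients) {
    branch.grid.assign(coefficients.begin(), coefficients.end());
    branch.energy = std::accumulate(branch.grid.begin(), branch.grid.end(), 0.0);
}

// Launch both branches concurrently (assumption: one task per "queue"),
// wait for both to finish, then combine the per-branch energies.
double computeReciprocalEnergy(const std::vector<double>& charges,
                               const std::vector<double>& c6) {
    PmeBranch coulomb;
    PmeBranch dispersion;
    auto coulombDone = std::async(std::launch::async, runReciprocal,
                                  std::ref(coulomb), std::cref(charges));
    auto dispersionDone = std::async(std::launch::async, runReciprocal,
                                     std::ref(dispersion), std::cref(c6));
    coulombDone.get();     // analogous to waiting on the Coulomb branch event
    dispersionDone.get();  // analogous to waiting on the dispersion branch event
    return coulomb.energy + dispersion.energy;
}
```

If the two branches shared one grid or one energy buffer, the concurrent writes would race, which is why the split needs duplicated resources rather than only a second queue.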