Commit 1ce5d91d authored by Anton Gorenko, committed by GitHub

Use fixed point charge spreading on RDNA4 (#4960)

* Use fixed point spread charge on RDNA4 as it is faster

Even though RDNA4 (gfx12) has global_atomic_add_f32, micro-benchmarks and OpenMM benchmarks show
that it is very slow compared to global_atomic_add_u64.

* Add a workaround for fixed point gridSpreadCharge on RDNA4

Workaround for rare cases in which a few values of pmeGrid are very large and
incorrect. The cause is unknown. It is also unknown why this workaround, or
other unrelated changes such as adding a printf, helps.
parent a4b43a04
@@ -104,6 +104,12 @@ KERNEL void gridSpreadCharge(GLOBAL const real4* RESTRICT posq,
     real add = dzdx*data[iy].y;
 #ifdef USE_FIXED_POINT_CHARGE_SPREADING
     ATOMIC_ADD(&pmeGrid[index], (mm_ulong) realToFixedPoint(add));
+#if defined(__GFX12__)
+    // Workaround for rare cases when few values of pmeGrid are very large and
+    // incorrect. The cause is unknown. Why this workaround or other irrelevant
+    // changes like printf help is also unknown.
+    asm volatile("s_wait_storecnt 0x0");
+#endif
 #else
     ATOMIC_ADD(&pmeGrid[index], add);
 #endif
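
The fixed-point path in this hunk can be illustrated with a small host-side sketch. This is not the OpenMM kernel: `std::atomic` stands in for `ATOMIC_ADD`, the scale factor of 2^32 in `realToFixedPoint` is an assumption, and `fixedPointToReal`, `spreadFixedPoint` and `accumulate` are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Assumed scale: 32 fractional bits (2^32). The real kernel's scale may differ.
constexpr double kFixedPointScale = 4294967296.0;

// Convert a real value to signed 64-bit fixed point.
inline std::int64_t realToFixedPoint(double x) {
    // |x| must stay below 2^31 so that x * 2^32 fits in a signed 64-bit integer.
    assert(std::fabs(x) < 2147483648.0 && "value out of fixed-point range");
    return static_cast<std::int64_t>(x * kFixedPointScale);
}

// Hypothetical inverse: reinterpret the unsigned accumulator as signed and rescale.
// (The unsigned-to-signed conversion is modular on common compilers and in C++20.)
inline double fixedPointToReal(std::uint64_t raw) {
    return static_cast<double>(static_cast<std::int64_t>(raw)) / kFixedPointScale;
}

// Add one contribution to a grid cell using an unsigned 64-bit atomic add.
// Negative values wrap modulo 2^64, so sums of mixed signs still come out right.
inline void spreadFixedPoint(std::atomic<std::uint64_t>& cell, double add) {
    cell.fetch_add(static_cast<std::uint64_t>(realToFixedPoint(add)),
                   std::memory_order_relaxed);
}

// Hypothetical driver: accumulate all contributions into one cell and read it back.
inline double accumulate(const std::vector<double>& contributions) {
    std::atomic<std::uint64_t> cell{0};
    for (double c : contributions) {
        spreadFixedPoint(cell, c);
    }
    return fixedPointToReal(cell.load());
}
```

For example, `accumulate({0.5, -0.25, 0.125})` yields `0.375` regardless of the order of the contributions, because integer addition is associative. Floating-point atomic adds do not give that guarantee.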
@@ -174,11 +174,14 @@ HipContext::HipContext(const System& system, int deviceIndex, bool useBlockingSy
     // GPUs starting from CDNA1 and RDNA3 support atomic add for floats (global_atomic_add_f32),
     // which can be used in PME. Older GPUs use fixed point charge spreading instead.
-    this->supportsHardwareFloatGlobalAtomicAdd = true;
-    if (gpuArchitecture.find("gfx900") == 0 ||
-        gpuArchitecture.find("gfx906") == 0 ||
-        gpuArchitecture.find("gfx10") == 0) {
+    // RDNA4 also has this instruction but benchmarks show that it is very slow compared to
+    // global_atomic_add_u64.
     this->supportsHardwareFloatGlobalAtomicAdd = false;
+    if (gpuArchitecture.find("gfx908") == 0 ||
+        gpuArchitecture.find("gfx90a") == 0 ||
+        gpuArchitecture.find("gfx94") == 0 ||
+        gpuArchitecture.find("gfx11") == 0) {
+        this->supportsHardwareFloatGlobalAtomicAdd = true;
     }
     contextIsValid = true;
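
The effect of this hunk is to switch from a deny-list of architectures (float atomics on by default) to an allow-list (float atomics off by default), so that gfx12 and any future architecture fall back to fixed-point spreading unless explicitly listed. A minimal standalone sketch of that selection logic follows. The helper names `startsWith`, `prefersFloatAtomicAdd` and `checkKnownArchitectures` are hypothetical and not part of OpenMM; the prefixes are taken from the diff above.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Prefixes for which the diff enables the hardware float atomic add path.
constexpr std::array<std::string_view, 4> kFloatAtomicPrefixes = {
    "gfx908", "gfx90a", "gfx94", "gfx11"};

// True if s begins with prefix (same meaning as s.find(prefix) == 0,
// but without scanning the rest of the string).
inline bool startsWith(std::string_view s, std::string_view prefix) {
    return s.substr(0, prefix.size()) == prefix;
}

// Hypothetical helper mirroring the allow-list in the constructor:
// anything not explicitly listed uses fixed-point charge spreading.
inline bool prefersFloatAtomicAdd(std::string_view gpuArchitecture) {
    for (std::string_view prefix : kFloatAtomicPrefixes) {
        if (startsWith(gpuArchitecture, prefix)) {
            return true;
        }
    }
    return false;
}

// Spot checks against a few architecture names.
inline void checkKnownArchitectures() {
    assert(prefersFloatAtomicAdd("gfx90a"));    // listed prefix
    assert(prefersFloatAtomicAdd("gfx1100"));   // matches "gfx11"
    assert(!prefersFloatAtomicAdd("gfx906"));   // not listed, fixed point
    assert(!prefersFloatAtomicAdd("gfx1201"));  // gfx12 (RDNA4), fixed point
}
```

With an allow-list, an architecture string that matches none of the prefixes gets the conservative fixed-point path, which is the behavior the commit wants for RDNA4.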