- 05 Sep, 2024 2 commits
-
-
Anton Gorenko authored
* VkFFT-based 3D FFT
* Caching of compiled VkFFT kernels
* Extend FFT tests with more sizes
-
Anton Gorenko authored
* Compile with -munsafe-fp-atomics to enable fast hardware f32 atomic add on global memory on pre-MI100 GPUs
* Use fixed-point charge spreading on other GPUs; otherwise float atomic add is compiled as a slow CAS loop
* Tune block sizes, use executeKernelFlat
* Tune launch bounds of PME grid-related kernels: force the compiler to use all registers by limiting max waves per EU to 1
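The trade-off between a float CAS loop and fixed-point accumulation can be illustrated with a small host-side sketch. This is not the GPU code from the commit: it uses `std::atomic` in plain C++, and the 2^32 scale factor and all function names are assumptions chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed fixed-point scale (2^32); the scale used by the real kernels is not stated in the log.
constexpr double kFixedScale = 4294967296.0;

// Convert a float contribution to a 64-bit fixed-point integer.
inline std::int64_t toFixed(float v) {
    return static_cast<std::int64_t>(std::llround(static_cast<double>(v) * kFixedScale));
}

// Convert an accumulated fixed-point value back to float.
inline float fromFixed(std::int64_t a) {
    return static_cast<float>(static_cast<double>(a) / kFixedScale);
}

// Float accumulation without a native atomic add: retry with compare-and-swap
// until no other thread has modified the target in between.
inline void atomicAddCas(std::atomic<float>& target, float v) {
    float old = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(old, old + v, std::memory_order_relaxed)) {
        // On failure, `old` is refreshed with the current value and the loop retries.
    }
}

// Fixed-point accumulation: a single integer atomic add, no retry loop.
inline void atomicAddFixed(std::atomic<std::int64_t>& target, float v) {
    target.fetch_add(toFixed(v), std::memory_order_relaxed);
}
```

Integer addition is also order-independent, so the fixed-point sum does not depend on the order in which threads contribute, unlike the float sum.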
-
- 01 Sep, 2024 2 commits
-
-
Anton Gorenko authored
* Remove per-target setting of link libraries, include and link dirs, and compile flags; instead, let CMake handle them by linking the main library to hip::host hiprtc::hiprtc hip::hipfft
* Fix: custom command without ADD_CUSTOM_TARGET and ADD_DEPENDENCIES is executed for both static and shared targets
* Remove IF(APPLE) parts
-
Anton Gorenko authored
Fix SegFault in HipCalcHippoNonbondedForceKernel: HipSort was created using a temporary ref. Adding a `HipContext& cu` field to HipCalcHippoNonbondedForceKernel fixes the issue.
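One possible reading of this fix, sketched with hypothetical stand-in classes (`Context`, `Sorter`, `Kernel` are not the real HIP platform types, and the actual constructor signatures are not shown in the log): a helper that stores a reference is only safe if the referenced object outlives it, so the owning kernel keeps its own reference field to the long-lived context and builds the helper from that.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a long-lived device context.
struct Context {
    std::string name;
};

// Hypothetical stand-in for a helper that keeps a reference to the context.
class Sorter {
public:
    explicit Sorter(Context& ctx) : ctx(ctx) {}
    const std::string& contextName() const { return ctx.name; }
private:
    Context& ctx;  // dangles if the referenced object is destroyed first
};

// Hypothetical kernel class that owns the helper.
class Kernel {
public:
    explicit Kernel(Context& cu) : cu(cu) {}

    void initialize() {
        // Bind the helper to the member reference, which refers to the
        // long-lived context, rather than to a short-lived local object.
        sorter = std::make_unique<Sorter>(cu);
    }

    const std::string& sorterContextName() const { return sorter->contextName(); }

private:
    Context& cu;                     // field analogous to `HipContext& cu`
    std::unique_ptr<Sorter> sorter;  // created lazily in initialize()
};
```

In this sketch the helper stays valid for as long as the `Context` passed to `Kernel` is alive, which is the caller's responsibility.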
-