1. 06 May, 2026 1 commit
    • Optimize HIP pair-list handling for CDNA LJPME · 939ecf28
      one authored
      - Use bitwise prefix accounting when storing sparse interactions as single pairs in the HIP pair-list kernel. This reduces the number of ballot operations needed to compute per-lane single-pair offsets.
      - For HIP CDNA single precision, raise MAX_BITS_FOR_PAIRS to 8 so more sparse interactions are emitted as single pairs instead of full tiles. Keep the existing double precision and RDNA thresholds unchanged.
      - Also simplify the HIP LJPME direct correction by computing alpha^2*r2
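The per-lane offset idea in the first bullet can be illustrated on the host. This is a minimal sketch, not the kernel code from the commit: it assumes a 64-lane wavefront whose ballot result is a 64-bit mask, and the helper names `lanePrefixCount` and `totalCount` are hypothetical. Each lane derives its write offset from a single mask by counting the set bits below its own position.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Host-side emulation of one 64-lane wavefront ballot (assumption: bit i of
// ballotMask is set when lane i has a single pair to store).
// Returns how many lower-numbered lanes also store a pair, which is the
// lane's offset into the reserved output range.
inline unsigned lanePrefixCount(std::uint64_t ballotMask, unsigned lane) {
    assert(lane < 64);
    // Mask selecting the bits of all lanes below this one.
    const std::uint64_t lowerLanes = (std::uint64_t{1} << lane) - 1;
    return static_cast<unsigned>(std::bitset<64>(ballotMask & lowerLanes).count());
}

// Total number of slots the wavefront has to reserve for this ballot.
inline unsigned totalCount(std::uint64_t ballotMask) {
    return static_cast<unsigned>(std::bitset<64>(ballotMask).count());
}
```

For the mask 0x2D (lanes 0, 2, 3 and 5 active), `lanePrefixCount(0x2D, 3)` is 2 and `totalCount(0x2D)` is 4, so one ballot yields both the per-lane offsets and the size of the reservation.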
  2. 19 Feb, 2026 1 commit
  3. 14 Dec, 2025 1 commit
    • Support ROCm 7 (#5162) · 07b738c5
      Anton Gorenko authored
      * Remove std::enable_if, warpRotateLeft is always used with TILE_SIZE
      
      * Do not use built-in warpSize in constexpr contexts
      
      Starting from ROCm 7, warpSize is no longer constexpr.
      findInteractingBlocks.hip uses it for sizes of __shared__ arrays.
      
      * Check if hipHostMallocNumaUser is allowed before using it
  4. 05 Sep, 2024 3 commits
    • Port changes from the main repository (mostly related to large blocks) · 7d7490ea
      Anton Gorenko authored
      Skip neighbor list for very small systems
      
          https://github.com/openmm/openmm/pull/4070
      
      Store bounding box sizes in half precision
      
          https://github.com/openmm/openmm/commit/2ae50f9
      
      Use large blocks to optimize building the neighbor list
      
          https://github.com/openmm/openmm/commit/3955033
      
      Improved sorting of blocks when building neighbor list
      
          https://github.com/openmm/openmm/commit/796ffaa
      
      Fixed bug in large blocks optimization with triclinic boxes
      
          https://github.com/openmm/openmm/commit/4c10732
      
      Optimize sorting of non-uniformly distributed data
      
          https://github.com/openmm/openmm/commit/71d9bb1
      
      Co-authored-by: bdenhollander <44237618+bdenhollander@users.noreply.github.com>
    • Improve latencies, handling of streams and events, multi-GPU support · 70771a51
      Anton Gorenko authored
      Use a small kernel for copying interactionCounts to host memory
      
          hipMemcpy's CopyDeviceToHost operation has higher latency.
      
      Do not set stream and event blocking/spin related flags
      
          Let the runtime choose the best option because overriding does not
          improve performance in most cases.
      
      Remove NULL streams and use nonblocking streams explicitly
      
      Make HipContext::pushAsCurrent/popAsCurrent thread-safe as they can be
      called simultaneously from different threads via ContextSelector.
      
      Allow peer access to be enabled more than once (if there are multiple
      simulations one after another, like in benchmark.py).
      
      Create peerCopyStream on a corresponding device
      
      Use two-speed load balancing for multi GPU runs
      
          First 100 steps do coarse balancing, next 100 - fine tuning.
          Also ignore the slowest device (usually 0) if its fraction has
          reached 0 (i.e. no work can be transferred to other devices) and
          balance other devices.
      
      Do not download interactionCounts in parallel nonbonded tasks
      
          This is not required because updateNeighborListSize has been called
          and valid flag changed.
      
      Initialize tilesAfterReorder properly
      
          It may contain a garbage value, and if that value is large,
          updateNeighborListSize does not force atoms to be reordered after
          25 steps in extreme cases.
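The two-speed load balancing described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the step sizes `kCoarseDelta` and `kFineDelta` are hypothetical (the message gives only the 100-step phases), and `balancingDelta` and `rebalance` are made-up names. The sketch moves a small share of work from the slowest device that still has work to the fastest one, and skips any device whose fraction has already reached 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical step sizes; the commit message does not state the real values.
constexpr double kCoarseDelta = 0.01;
constexpr double kFineDelta = 0.001;

// Coarse balancing for the first 100 steps, fine tuning for the next 100,
// no further adjustment afterwards.
inline double balancingDelta(int step) {
    if (step < 100) return kCoarseDelta;
    if (step < 200) return kFineDelta;
    return 0.0;
}

// Shifts work from the slowest device that still has a non-zero fraction
// to the fastest device. A device whose fraction is 0 cannot give up any
// more work, so it is ignored when looking for the slowest one.
inline void rebalance(std::vector<double>& fractions,
                      const std::vector<double>& times, int step) {
    assert(fractions.size() == times.size());
    const double delta = balancingDelta(step);
    if (delta == 0.0 || fractions.size() < 2) return;
    std::size_t fastest = 0;
    std::size_t slowest = fractions.size();  // sentinel: none found yet
    for (std::size_t i = 0; i < fractions.size(); ++i) {
        if (times[i] < times[fastest]) fastest = i;
        if (fractions[i] > 0.0 &&
            (slowest == fractions.size() || times[i] > times[slowest]))
            slowest = i;
    }
    if (slowest == fractions.size() || slowest == fastest) return;
    const double moved = std::min(delta, fractions[slowest]);
    fractions[slowest] -= moved;
    fractions[fastest] += moved;
}
```

With fractions {0.0, 0.6, 0.4} and step times {5.0, 2.0, 1.0} at step 150, device 0 is skipped because it has no work left, and a fine-tuning share moves from device 1 to device 2.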
    • Optimize findInteractingBlocks · a96534c1
      Anton Gorenko authored
      Optimize findBlocksWithInteractions
      
      * Replace volatile shared mem accesses with shuffles;
      * Add NUM_TILES_IN_BATCH for processing block1 by multiple warps
        (for small systems);
      * Cherry-pick missing changes from .cu;
      * Tune MAX_BITS_FOR_PAIRS depending on device and the system size;
      * Store single pairs immediately (if there are any); this avoids storing
        flags to shared memory and filtering buffer and flagsBuffer after
        saving single pairs;
      * Use fma explicitly and sign bit for better device code;
      * Use CDNA's MFMA with single/mixed precision;
      * On CDNA the coarse grained stage processes warpSize blocks for
        one block1, the fine grained stage checks atoms of two block2 vs atoms
        of the same block1, singlePairs and interactingAtoms are also stored
        by warps, not half-warps;
      
      Optimize findBlockBounds
      
      * Use shuffles;
      * Use executeKernelFlat;
      * Process 2 tiles per 64-lane warp on CDNA;
      * Use more uniformly distributed keys when sorting blocks;
      
      Use compareInt2LargeSIMD when tile size < SIMD width
      
      Fix exclusion tiles sorting on AMD CDNA (64 threads per wave)
      
          The nonbonded kernel uses USE_NEIGHBOR_LIST (useNeighborList)
          so host code must also check it instead of useCutoff.
      
          See also https://github.com/openmm/openmm/issues/3462
  5. 01 Sep, 2024 1 commit
  6. 14 Dec, 2023 1 commit
  7. 11 Dec, 2023 1 commit
  8. 24 Jul, 2023 1 commit
  9. 14 May, 2023 1 commit
  10. 27 Jan, 2022 1 commit
  11. 11 Mar, 2021 1 commit
  12. 18 Feb, 2021 1 commit
  13. 28 Jan, 2021 1 commit
  14. 10 Dec, 2020 1 commit
  15. 25 Sep, 2020 1 commit
  16. 16 Sep, 2020 1 commit
  17. 20 Aug, 2020 1 commit
  18. 04 Oct, 2019 1 commit
  19. 03 Oct, 2019 1 commit
  20. 03 May, 2018 1 commit
  21. 21 Sep, 2017 1 commit
  22. 10 Jan, 2017 1 commit
  23. 02 Dec, 2016 1 commit
  24. 18 Oct, 2016 1 commit
  25. 13 Oct, 2016 1 commit
  26. 22 Sep, 2016 1 commit
  27. 14 Sep, 2016 1 commit
  28. 19 Aug, 2016 2 commits
  29. 06 Mar, 2015 1 commit
  30. 05 Jan, 2015 1 commit
  31. 10 Nov, 2014 1 commit
  32. 09 Sep, 2014 1 commit
  33. 08 Sep, 2014 1 commit
  34. 04 Jun, 2013 1 commit
  35. 16 May, 2013 1 commit
  36. 03 May, 2013 1 commit
  37. 24 Apr, 2013 1 commit