  1. 10 May, 2026 1 commit
      Tune HIP neighbor-list launch heuristics · 4d20b76e
      one authored
      Apply heuristics for HIP neighbor-list construction:
      use fewer nonbonded force blocks for small neighbor-list systems, use two
      tiles per batch for larger atom-block counts, and increase the
      findBlocksWithInteractions thread block size for small atom-block counts.
      
      Standard concurrent validation shows no clear per-case regression and a
      small geomean throughput improvement over the current blocksPerCU baseline.
  2. 06 May, 2026 2 commits
      Add wave64 LDS spreading in HIP LJ-PME · 4e7070c2
      one authored
      Optimize HIP pair-list handling for CDNA LJPME · 939ecf28
      one authored
      - Use bitwise prefix accounting when storing sparse interactions as single pairs in the HIP pair-list kernel. This reduces the number of ballot operations needed to compute per-lane single-pair offsets.
      - For HIP CDNA single precision, raise MAX_BITS_FOR_PAIRS to 8 so more sparse interactions are emitted as single pairs instead of full tiles. Keep the existing double precision and RDNA thresholds unchanged.
      - Also simplify the HIP LJPME direct correction by computing alpha^2*r2
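The first bullet of the commit above mentions computing per-lane single-pair offsets from ballot results. One common way to do this, sketched here on the host in plain C++, is to take a single ballot mask of the lanes that have something to store and let each lane count the set bits below its own index. This is only an illustration of that general technique under the assumption of a 64-lane wavefront; the names `popcount64` and `lanePrefixOffset` are hypothetical and are not taken from the HIP pair-list kernel.

```cpp
#include <cstdint>

// Count the set bits in a 64-bit mask (portable loop, no compiler builtins).
inline int popcount64(std::uint64_t x) {
    int n = 0;
    while (x) {
        x &= x - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Hypothetical helper: given a ballot-style mask with one bit per lane (0..63),
// return how many flagged lanes have a lower index than `lane`. A flagged lane
// can use this value as its write offset in a compacted output buffer.
inline int lanePrefixOffset(std::uint64_t mask, int lane) {
    // Mask of all bits strictly below `lane`; lane 0 is special-cased to
    // avoid an undefined 64-bit shift.
    std::uint64_t lower = (lane == 0) ? 0 : (~std::uint64_t{0} >> (64 - lane));
    return popcount64(mask & lower);
}
```

With this approach the offsets for all lanes follow from one mask, and `popcount64(mask)` gives the total number of entries written. On a GPU the mask would come from a ballot intrinsic and the count from a hardware popcount instruction.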
  3. 29 Apr, 2026 1 commit
  4. 24 Apr, 2026 1 commit
  5. 17 Apr, 2026 2 commits
  6. 16 Apr, 2026 4 commits
  7. 10 Apr, 2026 1 commit
  8. 07 Apr, 2026 1 commit
  9. 06 Apr, 2026 2 commits
  10. 02 Apr, 2026 2 commits
  11. 31 Mar, 2026 1 commit
  12. 30 Mar, 2026 1 commit
  13. 27 Mar, 2026 2 commits
  14. 26 Mar, 2026 1 commit
  15. 12 Mar, 2026 1 commit
  16. 05 Mar, 2026 1 commit
  17. 26 Feb, 2026 1 commit
  18. 24 Feb, 2026 1 commit
  19. 19 Feb, 2026 1 commit
  20. 17 Feb, 2026 1 commit
  21. 16 Feb, 2026 1 commit
  22. 11 Feb, 2026 1 commit
  23. 10 Feb, 2026 3 commits
      Update version number to 8.5 (#5210) · 017fca83
      Peter Eastman authored
      GPU implementation of L-BFGS (#5198) · 4ab645ea
      Evan Pretti authored
      * Make reference/CPU minimizer into a kernel
      
      * Add per-platform support for GPU minimization
      
      * Initial implementation of GPU minimization
      
      * Fixes
      
      * Increase robustness when initial gradient is huge
      
      * Handle overflow leading to non-finite values gracefully
      
      * Handle large forces in single precision more robustly
      
      * Optimize kernels
      
      * Fix kernel launch size
      
      * Update banner years
      
      * Don't create MinimizeKernel until first minimization requested
      
      * Make some compile-time constants into kernel arguments
      
      * Consolidate scale calculation kernel
      
      * Condense alpha/beta reduction kernels using atomics
      
      * Condense line search dot kernels with reductions
      
      * Remove a download, and download grad norm separately
      
      * Asynchronously check lbfgs convergence condition
      
      * Restructure line search to avoid download waiting
      
      * Start line search preemptively in case CPU evaluation is not needed
      
      * In rare cases, constraint error might not decrease after one optimization round
      
      * Better handling of unsupported 64-bit atomics, use FLT_MAX
      
      * Pick gradient mode based on GPU vs. CPU evaluation
      
      * Rework getDiff/getScale reduction, remove reduceBuffer
      
      * Older CUDA might not like float hex literals
      
      * Fix error in a comment
  24. 09 Feb, 2026 2 commits
      Residue templates can specify constraints (#5197) · 834b1294
      Peter Eastman authored
      * Residue templates can specify constraints
      
      * Patched template generation preserves constraints
      API for querying devices (#5192) · add95438
      Peter Eastman authored
      * API for querying devices
      
      * CUDA and HIP implementations of getDevices()
      
      * Fix test failures
      
      * Fix test failures
      
      * CUDA returns correct devices even if no context has been created
      
      * Return a single device for Reference and CPU
      
      * Fix CI failure
  25. 30 Jan, 2026 1 commit
  26. 14 Jan, 2026 1 commit
  27. 08 Jan, 2026 1 commit
  28. 30 Dec, 2025 1 commit
  29. 14 Dec, 2025 1 commit
      Support ROCm 7 (#5162) · 07b738c5
      Anton Gorenko authored
      * Remove std::enable_if, warpRotateLeft is always used with TILE_SIZE
      
      * Do not use built-in warpSize in constexpr contexts
      
      Starting from ROCm 7 warpSize is no longer constexpr.
      findInteractingBlocks.hip uses it for sizes of __shared__ arrays.
      
      * Check if hipHostMallocNumaUser is allowed before using it
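The ROCm 7 commit above notes that `warpSize` is no longer a constant expression and so cannot size `__shared__` arrays. The same constraint can be shown in plain host C++: a fixed-size buffer needs a compile-time constant, while a value known only at run time can still be used in ordinary arithmetic. This is a sketch under assumptions, not the code from findInteractingBlocks.hip; `kTileSize`, `runtimeWarpSize`, `TileBuffer` and `tilesPerWarp` are hypothetical names, and the values 32 and 64 are chosen only for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical compile-time tile width, standing in for a constant such as TILE_SIZE.
constexpr std::size_t kTileSize = 32;

// Hypothetical stand-in for a width that is only known at run time,
// as warpSize is from ROCm 7 onward. The value 64 is an assumption.
std::size_t runtimeWarpSize() { return 64; }

// The array extent must be a constant expression, so it is taken from
// kTileSize rather than from the run-time value.
struct TileBuffer {
    std::array<float, kTileSize> values{};
};

// Run-time values remain usable wherever a constant expression is not required.
std::size_t tilesPerWarp() { return runtimeWarpSize() / kTileSize; }
```

Writing `std::array<float, runtimeWarpSize()>` instead would fail to compile, which mirrors the problem the commit describes for array sizes.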