- 31 May, 2026 1 commit
-
- 27 May, 2026 1 commit
-
-
Peter Eastman authored
-
- 26 May, 2026 1 commit
-
-
Peter Eastman authored
* More barostats support scaling particles independently
* Documentation update
* MonteCarloMembraneBarostat can scale particles independently
-
- 12 May, 2026 1 commit
-
-
one authored
Use explicit 128-thread launches for the three LangevinMiddle integration kernels to improve HIP throughput while preserving the existing PME launch heuristics.
-
- 11 May, 2026 1 commit
-
-
one authored
Use explicit 128-thread block launches for selected HIP PME kernels that benefit from larger blocks. Keep the platform default block size unchanged, and leave small-system grid indexing and charge spreading on the existing default launch configuration. The heuristic applies 128-thread launches to finishSpreadCharge on HIP, and uses 128-thread launches for findAtomGridIndex and gridSpreadCharge only for larger systems. Coulomb PME and LJPME dispersion paths are handled in parallel, while interpolation and energy evaluation remain unchanged.
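The selection rule described in this commit can be sketched as a small lookup function. This is only an illustration of the stated heuristic, not the code from the commit: the function name `pme_block_size`, the default block size of 64, and the "larger system" cutoff of 100,000 atoms are all hypothetical placeholders, since the commit message does not state the actual values.

```python
# Hypothetical values: the commit message does not give the platform default
# block size or the atom count at which a system counts as "larger".
DEFAULT_BLOCK_SIZE = 64
LARGE_SYSTEM_ATOMS = 100_000


def pme_block_size(kernel, num_atoms,
                   default=DEFAULT_BLOCK_SIZE,
                   large_cutoff=LARGE_SYSTEM_ATOMS):
    """Return the thread block size to use for a HIP PME kernel launch."""
    # finishSpreadCharge always gets an explicit 128-thread launch.
    if kernel == "finishSpreadCharge":
        return 128
    # Grid indexing and charge spreading use 128 threads only for larger
    # systems; small systems stay on the default launch configuration.
    if kernel in ("findAtomGridIndex", "gridSpreadCharge"):
        return 128 if num_atoms >= large_cutoff else default
    # Every other kernel (interpolation, energy evaluation, ...) is unchanged.
    return default
```

Keeping the platform default as the fallback means any kernel not named in the heuristic behaves exactly as before.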
-
- 10 May, 2026 1 commit
-
-
one authored
Apply heuristics for HIP neighbor-list construction: use fewer nonbonded force blocks for small neighbor-list systems, use two tiles per batch for larger atom-block counts, and increase the findBlocksWithInteractions thread block size for small atom-block counts. Standard concurrent validation shows no clear per-case regression and a small geomean throughput improvement over the current blocksPerCU baseline.
-
- 06 May, 2026 2 commits
-
-
one authored
-
one authored
* Use bitwise prefix accounting when storing sparse interactions as single pairs in the HIP pair-list kernel. This reduces the number of ballot operations needed to compute per-lane single-pair offsets.
* For HIP CDNA single precision, raise MAX_BITS_FOR_PAIRS to 8 so more sparse interactions are emitted as single pairs instead of full tiles. Keep the existing double precision and RDNA thresholds unchanged.
* Also simplify the HIP LJPME direct correction by computing alpha^2*r2
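One common way to compute per-lane offsets with few ballots is to ballot each bit of the per-lane count and combine the popcounts of the lower lanes. The sketch below emulates that idea on the CPU. It is an assumption about the general technique, not a transcription of the HIP kernel: the helper names `ballot` and `exclusive_offsets` are hypothetical, and the real kernel may organise the work differently.

```python
def ballot(predicates):
    """Emulate a warp ballot: bit i is set when lane i's predicate is true."""
    mask = 0
    for lane, flag in enumerate(predicates):
        if flag:
            mask |= 1 << lane
    return mask


def exclusive_offsets(counts, num_bits=4):
    """Exclusive prefix sum of per-lane counts using one ballot per count bit.

    counts[i] is the number of single pairs lane i wants to store. Four bits
    cover counts up to 15, which is enough for a limit of 8 pairs per lane.
    """
    offsets = [0] * len(counts)
    for bit in range(num_bits):
        # One ballot per bit plane: which lanes have this bit set in their count?
        plane = ballot([(c >> bit) & 1 for c in counts])
        for lane in range(len(counts)):
            lower_lanes = (1 << lane) - 1  # mask of lanes below this one
            # Lower lanes with this bit set each contribute 2**bit entries.
            offsets[lane] += bin(plane & lower_lanes).count("1") << bit
    return offsets
```

With this scheme the number of ballots grows with the number of bits in the largest count rather than with the count itself, which is the kind of saving the commit message describes.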
-
- 04 May, 2026 1 commit
-
-
Evan Pretti authored
* Context deselection before energy evaluation
* Check that the correct context is popped by popAsCurrent()
-
- 29 Apr, 2026 2 commits
- 24 Apr, 2026 1 commit
-
-
one authored
-
- 17 Apr, 2026 2 commits
- 16 Apr, 2026 4 commits
- 06 Apr, 2026 1 commit
-
-
Peter Eastman authored
* PythonForce can be restricted to a subset of particles
* Fix exception with CUDA
-
- 31 Mar, 2026 1 commit
-
-
ramdoys authored
* optimization, switch to unordered map
* Remove emplace, reduce reserve allocation
-
- 30 Mar, 2026 1 commit
-
-
Peter Eastman authored
-
- 27 Mar, 2026 2 commits
-
-
Evan Pretti authored
-
Peter Eastman authored
* Cache coefficients for long range correction
* updateParametersInContext() clears cache
-
- 05 Mar, 2026 1 commit
-
-
Peter Eastman authored
-
- 19 Feb, 2026 1 commit
-
-
Peter Eastman authored
* Fixed issue that caused inefficient sorting when a block contained only one atom
* Add the fix to OpenCL and HIP
-
- 10 Feb, 2026 2 commits
-
-
Peter Eastman authored
-
Evan Pretti authored
* Make reference/CPU minimizer into a kernel
* Add per-platform support for GPU minimization
* Initial implementation of GPU minimization
* Fixes
* Increase robustness when initial gradient is huge
* Handle overflow leading to non-finite values gracefully
* Handle large forces in single precision more robustly
* Optimize kernels
* Fix kernel launch size
* Update banner years
* Don't create MinimizeKernel until first minimization requested
* Make some compile-time constants into kernel arguments
* Consolidate scale calculation kernel
* Condense alpha/beta reduction kernels using atomics
* Condense line search dot kernels with reductions
* Remove a download, and download grad norm separately
* Asynchronously check lbfgs convergence condition
* Restructure line search to avoid download waiting
* Start line search preemptively in case CPU evaluation is not needed
* In rare cases, constraint error might not decrease after one optimization round
* Better handling of unsupported 64-bit atomics, use FLT_MAX
* Pick gradient mode based on GPU vs. CPU evaluation
* Rework getDiff/getScale reduction, remove reduceBuffer
* Older CUDA might not like float hex literals
* Fix error in a comment
-
- 09 Feb, 2026 1 commit
-
-
Peter Eastman authored
* API for querying devices
* CUDA and HIP implementations of getDevices()
* Fix test failures
* Fix test failures
* CUDA returns correct devices even if no context has been created
* Return a single device for Reference and CPU
* Fix CI failure
-
- 14 Dec, 2025 1 commit
-
-
Anton Gorenko authored
* Remove std::enable_if, warpRotateLeft is always used with TILE_SIZE
* Do not use built-in warpSize in constexpr contexts. Starting from ROCm 7, warpSize is no longer constexpr, and findInteractingBlocks.hip uses it for sizes of __shared__ arrays.
* Check if hipHostMallocNumaUser is allowed before using it
-
- 13 Dec, 2025 1 commit
-
-
Anton Gorenko authored
adaptFrictionPart1 is executed first and reads the randomForce and segmentVelocity buffers before other kernels write them, so the buffers can contain garbage after allocation (cuMemAlloc/hipMalloc/clCreateBuffer do not clear allocated memory).
-
- 11 Dec, 2025 1 commit
-
-
Evan Pretti authored
* Basic LCPO support
* Add basic test for LCPO from a prmtop file
* API for LCPOForce
* Started LCPO reference implementation
* Finished reference forces & test cases
* Use other test for finite difference since grid might have discontinuous forces
* Reference platform formatting
* Initial implementation of CPU platform
* Bugfixes
* More vectorization and improve neighbor list query speed
* Parallelize part of neighbor search
* Check box size for LCPO with periodic boundary conditions
* Fixes for updating parameters in context
* GBSAOBCForce doesn't use first & last indices for updates, so no need for this optimization here
* Changes to neighbor checking and optimization
* Fixes and minor changes
* Add global surface tension parameter
* Only process half of the pairs in the neighbor list
* Remove unnecessary checks
* Initial version of common platform implementation
* Asynchronously download neighbor list size
* Debugging
* Do pair precomputation in copyPairsToNeighborList
* Recompute interactions instead of scanning neighbor list in inner loop
* Condense position array before computations
* Also make neighbor count download asynchronous on device
* Fixes for kernel launching
* Topology-based LCPO parameter assignment
* Fixes, and use test system for LCPO with nucleic acids
* Always raise instead of warn when LCPO parameters can't be assigned
* Use Amber convention for phosphates
-
- 01 Dec, 2025 1 commit
-
-
Peter Eastman authored
* Minor optimization to PME
* Minor simplification
-
- 13 Nov, 2025 1 commit
-
-
Evan Pretti authored
-
- 12 Nov, 2025 1 commit
-
-
Peter Eastman authored
* Initial implementation of PythonForce
* Continuing implementation of PythonForce
* Tests for PythonForce
* Fix memory leaks
* Documentation for PythonForce
* Fixed incorrect return type
* Fix compilation error on Python older than 3.12
* Handle all dtypes
* Optimizations to PythonForce
* Optimized getPositions()
* Test all platforms
* Fix test failures
-
- 11 Nov, 2025 1 commit
-
-
Peter Eastman authored
-
- 06 Nov, 2025 1 commit
-
-
Peter Eastman authored
-
- 22 Oct, 2025 3 commits
-
-
Peter Eastman authored
-
Peter Eastman authored
-
Peter Eastman authored
-
- 21 Oct, 2025 1 commit
-
-
Evan Pretti authored
* Ensure that neighbor list is valid before solving for charges
* Add test with neighbor list that needs to be resized
* Try another approach to skip interactions for neighbor list generation only
* Increase CG error tolerance for test
-