- 05 Sep, 2024 1 commit
Anton Gorenko authored
Optimize findBlocksWithInteractions
* Replace volatile shared mem accesses with shuffles;
* Add NUM_TILES_IN_BATCH for processing block1 by multiple warps (for small systems);
* Cherry-pick missing changes from .cu;
* Tune MAX_BITS_FOR_PAIRS depending on device and the system size;
* Store single pairs immediately (if there are any); flags then no longer need to be stored to shared memory, and buffer and flagsBuffer no longer need to be filtered after saving single pairs;
* Use fma explicitly and the sign bit for better device code;
* Use CDNA's MFMA with single/mixed precision;
* On CDNA the coarse-grained stage processes warpSize blocks for one block1, and the fine-grained stage checks atoms of two block2 against atoms of the same block1; singlePairs and interactingAtoms are also stored by warps, not half-warps;
Optimize findBlockBounds
* Use shuffles;
* Use executeKernelFlat;
* Process 2 tiles per 64-thread warp on CDNA;
* Use more uniformly distributed keys when sorting blocks;
Use compareInt2LargeSIMD when tile size < SIMD width
Fix exclusion tiles sorting on AMD CDNA (64 threads per wave)
The nonbonded kernel uses USE_NEIGHBOR_LIST (useNeighborList), so host code must also check it instead of useCutoff. See also https://github.com/openmm/openmm/issues/3462
- 01 Sep, 2024 1 commit
Anton Gorenko authored
Port changes in CUDA backend to HIP
Fix a warning about arithmetic operations on void* in HipArray::uploadSubArray
Fix "Error Initializing context ROCm 5.3.0" https://github.com/StreamHPC/openmm-hip/issues/3: hipDeviceSetCacheConfig returns hipErrorNotSupported on 5.3
Co-authored-by: Nick Curtis <nicholas.curtis@amd.com>
- 14 Dec, 2023 1 commit
Peter Eastman authored
- 11 Dec, 2023 1 commit
Peter Eastman authored
* Improved sorting of blocks when building neighbor list
* Improved block sorting for OpenCL
* Made sort keys more evenly distributed
- 24 Jul, 2023 1 commit
Peter Eastman authored
* Use large blocks to optimize building the neighbor list
* Large blocks optimization for OpenCL
* Fix test failures
* Select whether to use large blocks based on system size
- 14 May, 2023 1 commit
Peter Eastman authored
* Store bounding box sizes in half precision
* Work correctly in double precision mode
- 27 Jan, 2022 1 commit
Peter Eastman authored
* Fixed potential invalid memory access
* Fixed exception
- 11 Mar, 2021 1 commit
Peter Eastman authored
- 18 Feb, 2021 1 commit
Peter Eastman authored
- 28 Jan, 2021 1 commit
David Clark authored
* Frames distance calculation as matrix multiplication
* Adds comment explaining distance calculation
* Tunes launch bound for CUDA 11.2
* Simplifies the effective matrix multiplication
Co-authored-by: David Clark <daclark@nvidia.com>
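The "distance calculation as matrix multiplication" change can be illustrated with the identity |a - b|^2 = |a|^2 + |b|^2 - 2 a·b: for a block of atom positions A and a block B, every cross term a·b is an entry of the matrix product A B^T. The host-side C++ sketch below only demonstrates that identity; `Vec3`, `dot`, `directSquaredDistance`, and `pairwiseSquaredDistances` are hypothetical names, not code from the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3-component position type used only for this sketch.
struct Vec3 {
    float x, y, z;
};

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Reference form: |a - b|^2 computed from the difference vector.
float directSquaredDistance(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Matrix form: D[i][j] = |A_i|^2 + |B_j|^2 - 2 (A B^T)[i][j], stored row-major.
// The dot(A[i], B[j]) values are exactly the entries of A B^T, the part
// a matrix-multiply unit could compute in bulk.
std::vector<float> pairwiseSquaredDistances(const std::vector<Vec3>& A,
                                            const std::vector<Vec3>& B) {
    std::vector<float> normB(B.size());
    for (std::size_t j = 0; j < B.size(); ++j) normB[j] = dot(B[j], B[j]);

    std::vector<float> D(A.size() * B.size());
    for (std::size_t i = 0; i < A.size(); ++i) {
        float normA = dot(A[i], A[i]);
        for (std::size_t j = 0; j < B.size(); ++j) {
            D[i * B.size() + j] = normA + normB[j] - 2.0f * dot(A[i], B[j]);
        }
    }
    return D;
}
```

The expanded form can lose precision through cancellation when coordinates are large compared with the distances being measured, so the direct difference is the safer reference when checking results.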
- 10 Dec, 2020 1 commit
David Clark authored
* Changes name of NVRTC program
* Adds launch bounds for findInteractingBlocks
* Replaces launch bound parameter with named constant
Co-authored-by: David Clark <daclark@nvidia.com>
- 25 Sep, 2020 1 commit
peastman authored
- 16 Sep, 2020 1 commit
peastman authored
- 20 Aug, 2020 1 commit
peastman authored
* Fixed range overflow with very large numbers of atoms
* More fixes to overflow with large numbers of atoms
* Fix test failures
- 04 Oct, 2019 1 commit
Peter Eastman authored
- 03 Oct, 2019 1 commit
Peter Eastman authored
- 03 May, 2018 1 commit
peastman authored
- 21 Sep, 2017 1 commit
peastman authored
- 10 Jan, 2017 1 commit
Peter Eastman authored
- 02 Dec, 2016 1 commit
Peter Eastman authored
- 18 Oct, 2016 1 commit
Peter Eastman authored
- 13 Oct, 2016 1 commit
Peter Eastman authored
- 22 Sep, 2016 1 commit
Peter Eastman authored
- 14 Sep, 2016 1 commit
Peter Eastman authored
- 19 Aug, 2016 2 commits
Peter Eastman authored
Peter Eastman authored
- 06 Mar, 2015 1 commit
peastman authored
- 05 Jan, 2015 1 commit
Peter Eastman authored
- 10 Nov, 2014 1 commit
peastman authored
- 09 Sep, 2014 1 commit
peastman authored
- 08 Sep, 2014 1 commit
peastman authored
- 04 Jun, 2013 1 commit
peastman authored
Converted the array containing atom block indices for the neighbor list from ushort2 to int. This removes the hard limit of 2 million atoms.
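The "2 million atoms" figure is consistent with a 16-bit block index, assuming 32 atoms per block (an assumption here; the commit does not state the block size): an unsigned short addresses 65,536 blocks, and 65,536 × 32 = 2,097,152 atoms. The sketch below reproduces that arithmetic for both index types; `kAtomsPerBlock` and `maxAddressableAtoms` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: atoms are grouped into blocks of 32 (hypothetical constant for this sketch).
constexpr std::int64_t kAtomsPerBlock = 32;

// Distinct non-negative index values (0..max, i.e. max + 1) times atoms per block.
template <typename IndexT>
constexpr std::int64_t maxAddressableAtoms() {
    return (static_cast<std::int64_t>(std::numeric_limits<IndexT>::max()) + 1) * kAtomsPerBlock;
}

// One 16-bit unsigned component of a ushort2 vs. a 32-bit signed int.
constexpr std::int64_t kUshortLimit = maxAddressableAtoms<std::uint16_t>();  // 65,536 * 32
constexpr std::int64_t kIntLimit = maxAddressableAtoms<std::int32_t>();      // 2^31 * 32
```

The int figure is only the bound implied by this one field; other index arithmetic in the code may overflow earlier.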
- 16 May, 2013 1 commit
Yutong Zhao authored
- 03 May, 2013 1 commit
Peter Eastman authored
- 24 Apr, 2013 1 commit
Peter Eastman authored
- 15 Apr, 2013 1 commit
Yutong Zhao authored
- 22 Mar, 2013 1 commit
Peter Eastman authored
- 28 Sep, 2012 1 commit
Peter Eastman authored
- 28 Jun, 2012 1 commit
Peter Eastman authored
- 22 Jun, 2012 1 commit
Peter Eastman authored