Skip to content
GitLab
Menu
Projects
Groups
Snippets
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
Menu
Open sidebar
tsoc
openmm
Commits
fdb501e3
Unverified
Commit
fdb501e3
authored
Jul 25, 2018
by
peastman
Committed by
GitHub
Jul 25, 2018
Browse files
Merge pull request #2133 from peastman/atomic
Replaced gmx_atomic with C++ atomic
parents
e72a4e8c
72bfef12
Changes
18
Hide whitespace changes
Inline
Side-by-side
Showing
18 changed files
with
80 additions
and
1690 deletions
+80
-1690
openmmapi/include/openmm/internal/gmx_atomic.h
openmmapi/include/openmm/internal/gmx_atomic.h
+0
-1597
platforms/cpu/include/CpuCustomGBForce.h
platforms/cpu/include/CpuCustomGBForce.h
+2
-1
platforms/cpu/include/CpuCustomManyParticleForce.h
platforms/cpu/include/CpuCustomManyParticleForce.h
+2
-1
platforms/cpu/include/CpuCustomNonbondedForce.h
platforms/cpu/include/CpuCustomNonbondedForce.h
+2
-1
platforms/cpu/include/CpuGBSAOBCForce.h
platforms/cpu/include/CpuGBSAOBCForce.h
+3
-2
platforms/cpu/include/CpuGayBerneForce.h
platforms/cpu/include/CpuGayBerneForce.h
+2
-2
platforms/cpu/include/CpuNeighborList.h
platforms/cpu/include/CpuNeighborList.h
+3
-3
platforms/cpu/include/CpuNonbondedForce.h
platforms/cpu/include/CpuNonbondedForce.h
+3
-2
platforms/cpu/src/CpuCustomGBForce.cpp
platforms/cpu/src/CpuCustomGBForce.cpp
+9
-12
platforms/cpu/src/CpuCustomManyParticleForce.cpp
platforms/cpu/src/CpuCustomManyParticleForce.cpp
+3
-6
platforms/cpu/src/CpuCustomNonbondedForce.cpp
platforms/cpu/src/CpuCustomNonbondedForce.cpp
+5
-8
platforms/cpu/src/CpuGBSAOBCForce.cpp
platforms/cpu/src/CpuGBSAOBCForce.cpp
+9
-12
platforms/cpu/src/CpuGayBerneForce.cpp
platforms/cpu/src/CpuGayBerneForce.cpp
+6
-9
platforms/cpu/src/CpuNeighborList.cpp
platforms/cpu/src/CpuNeighborList.cpp
+3
-3
platforms/cpu/src/CpuNonbondedForce.cpp
platforms/cpu/src/CpuNonbondedForce.cpp
+7
-10
platforms/cpu/src/CpuSETTLE.cpp
platforms/cpu/src/CpuSETTLE.cpp
+8
-8
plugins/cpupme/src/CpuPmeKernels.cpp
plugins/cpupme/src/CpuPmeKernels.cpp
+9
-9
plugins/cpupme/src/CpuPmeKernels.h
plugins/cpupme/src/CpuPmeKernels.h
+4
-4
No files found.
openmmapi/include/openmm/internal/gmx_atomic.h
deleted
100644 → 0
View file @
e72a4e8c
/* -*- mode: c; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*-
*
* Copyright (c) 2004-2008, Erik Lindahl <lindahl@cbr.su.se>
*
* Unfortunately, some of the constructs in this file are _very_ sensitive
* to compiler optimizations and architecture changes. If you find any such
* errors, please send a message to lindahl@cbr.su.se to help us fix the
* upstream version too.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
* And Hey:
* Gnomes, ROck Monsters And Chili Sauce
*/
#ifndef _GMX_ATOMIC_H_
#define _GMX_ATOMIC_H_
/*! \file gmx_atomic.h
*
* @brief Atomic operations for fast SMP synchronization
*
* This file defines atomic integer operations and spinlocks for
* fast synchronization in performance-critical regions of gromacs.
*
* In general, the best option is to use functions without explicit
* locking, e.g. gmx_atomic_fetch_add() or gmx_atomic_cmpxchg().
*
* Not all architecture support atomic operations though inline assembly,
* and even if they do it might not be implemented here. In that case
* we use a fallback mutex implementation, so you can always count on
* the function interfaces working in Gromacs.
*
* Don't use spinlocks in non-performance-critical regions like file I/O.
* Since they always spin busy they would waste CPU cycles instead of
* properly yielding to a computation thread while waiting for the disk.
*
* Finally, note that all our spinlock operations are defined to return
* 0 if initialization or locking completes successfully.
* This is the opposite of some other implementations, but the same standard
* as used for pthread mutexes. So, if e.g. are trying to lock a spinlock,
* you will have gotten the lock if the return value is 0.
*
* gmx_spinlock_islocked(x) obviously still returns 1 if the lock is locked,
* and 0 if it is available, though...
*/
#include <stdio.h>
#define NOMINMAX
#include <pthread.h>
#ifdef __cplusplus
extern
"C"
{
#endif
#if 0
} /* Avoids screwing up auto-indentation */
#endif
#if ( ( (defined(__GNUC__) || defined(__INTEL_COMPILER) || defined(__PATHSCALE__)) && \
(defined(i386) || defined(__x86_64__)) ) \
|| defined (DOXYGEN) )
/* This code is executed for x86 and x86-64, with these compilers:
* GNU
* Intel
* Pathscale
* All these support GCC-style inline assembly.
* We also use this section for the documentation.
*/
/*! \brief Memory barrier operation
*
* Modern CPUs rely heavily on out-of-order execution, and one common feature
* is that load/stores might be reordered. Also, when using inline assembly
* the compiler might already have loaded the variable we are changing into
* a register, so any update to memory won't be visible.
*
* This command creates a memory barrier, i.e. all memory results before
* it in the code should be visible to all memory operations after it - the
* CPU cannot propagate load/stores across it.
*/
#define gmx_atomic_memory_barrier() __asm__ __volatile__("": : :"memory")
/* Only gcc and Intel support this check, otherwise set it to true (skip doc) */
#if (!defined(__GNUC__) && !defined(__INTEL_COMPILER) && !defined DOXYGEN)
#define __builtin_constant_p(i) (1)
#endif
/*! \brief Gromacs atomic operations datatype
*
* Portable synchronization primitives like mutexes are effective for
* many purposes, but usually not very high performance.
* One of the problem is that you have the overhead of a function call,
* and another is that Mutexes often have extra overhead to make the
* scheduling fair. Finally, if performance is important we don't want
* to suspend the thread if we cannot lock a mutex, but spin-lock at 100%
* CPU usage until the resources is available (e.g. increment a counter).
*
* These things can often be implemented with inline-assembly or other
* system-dependent functions, and we provide such functionality for the
* most common platforms. For portability we also have a fallback
* implementation using a mutex for locking.
*
* Performance-wise, the fastest solution is always to avoid locking
* completely (obvious, but remember it!). If you cannot do that, the
* next best thing is to use atomic operations that e.g. increment a
* counter without explicit locking. Spinlocks are useful to lock an
* entire region, but leads to more overhead and can be difficult to
* debug - it is up to you to make sure that only the thread owning the
* lock unlocks it!
*
* You should normally NOT use atomic operations for things like
* I/O threads. These should yield to other threads while waiting for
* the disk instead of spinning at 100% CPU usage.
*
* It is imperative that you use the provided routines for reading
* and writing, since some implementations require memory barriers before
* the CPU or memory sees an updated result. The structure contents is
* only visible here so it can be inlined for performance - it might
* change without further notice.
*
* \note No initialization is required for atomic variables.
*
* Currently, we have (real) atomic operations for:
*
* - x86 or x86_64, using GNU compilers
* - x86 or x86_64, using Intel compilers
* - x86 or x86_64, using Pathscale compilers
* - Itanium, using GNU compilers
* - Itanium, using Intel compilers
* - Itanium, using HP compilers
* - PowerPC, using GNU compilers
* - PowerPC, using IBM AIX compilers
* - PowerPC, using IBM compilers >=7.0 under Linux or Mac OS X.
*/
typedef
struct
gmx_atomic
{
volatile
int
value
;
/*!< Volatile, to avoid compiler aliasing */
}
gmx_atomic_t
;
/*! \brief Gromacs spinlock
*
* Spinlocks provide a faster synchronization than mutexes,
* although they consume CPU-cycles while waiting. They are implemented
* with atomic operations and inline assembly whenever possible, and
* otherwise we use a fallback implementation where a spinlock is identical
* to a mutex (this is one of the reasons why you have to initialize them).
*
* There are no guarantees whatsoever about fair scheduling or
* debugging if you make a mistake and unlock a variable somebody
* else has locked - performance is the primary goal of spinlocks.
*
*/
typedef
struct
gmx_spinlock
{
volatile
unsigned
int
lock
;
/*!< Volatile, to avoid compiler aliasing */
}
gmx_spinlock_t
;
/*! \brief Spinlock static initializer
*
* This is used for static spinlock initialization, and has the same
* properties as GMX_THREAD_MUTEX_INITIALIZER has for mutexes.
* This is only for inlining in the gmx_thread.h header file. Whether
* it is 0, 1, or something else when unlocked depends on the platform.
* Don't assume anything about it. It might even be a mutex when using the
* fallback implementation!
*/
#define GMX_SPINLOCK_INITIALIZER { 1 }
/*! \brief Return value of an atomic integer
*
* Also implements proper memory barriers when necessary.
* The actual implementation is system-dependent.
*
* \param a Atomic variable to read
* \return Integer value of the atomic variable
*/
#define gmx_atomic_read(a) ((a)->value)
/*! \brief Write value to an atomic integer
*
* Also implements proper memory barriers when necessary.
* The actual implementation is system-dependent.
*
* \param a Atomic variable
* \param i Integer to set the atomic variable to.
*/
#define gmx_atomic_set(a,i) (((a)->value) = (i))
/*! \brief Add integer to atomic variable
*
* Also implements proper memory barriers when necessary.
* The actual implementation is system-dependent.
*
* \param a atomic datatype to modify
* \param i integer to increment with. Use i<0 to subtract atomically.
*
* \return The new value (after summation).
*/
static
inline
int
gmx_atomic_add_return
(
gmx_atomic_t
*
a
,
volatile
int
i
)
{
int
__i
;
__i
=
i
;
__asm__
__volatile__
(
"lock ; xaddl %0, %1;"
:
"=r"
(
i
)
:
"m"
(
a
->
value
),
"0"
(
i
));
return
i
+
__i
;
}
/*! \brief Add to variable, return the old value.
*
* This operation is quite useful for synchronization counters.
* By performing a fetchadd with N, a thread can e.g. reserve a chunk
* with the next N iterations, and the return value is the index
* of the first element to treat.
*
* Also implements proper memory barriers when necessary.
* The actual implementation is system-dependent.
*
* \param a atomic datatype to modify
* \param i integer to increment with. Use i<0 to subtract atomically.
*
* \return The value of the atomic variable before addition.
*/
static
inline
int
gmx_atomic_fetch_add
(
gmx_atomic_t
*
a
,
volatile
int
i
)
{
int
__i
;
__i
=
i
;
__asm__
__volatile__
(
"lock ; xaddl %0, %1;"
:
"=r"
(
i
)
:
"m"
(
a
->
value
),
"0"
(
i
));
return
i
;
}
/*! \brief Atomic compare-exchange operation
*
* The \a old value is compared with the memory value in the atomic datatype.
* If the are identical, the atomic type is updated to the new value,
* and otherwise left unchanged.
*
* This is a very useful synchronization primitive: You can start by reading
* a value (without locking anything), perform some calculations, and then
* atomically try to update it in memory unless it has changed. If it has
* changed you will get an error return code - reread the new value
* an repeat the calculations in that case.
*
* \param a Atomic datatype ('memory' value)
* \param oldval Integer value read from the atomic type at an earlier point
* \param newval New value to write to the atomic type if it currently is
* identical to the old value.
*
* \return The value of the atomic memory variable in memory when this
* instruction was executed. This, if the operation succeeded the
* return value was identical to the \a old parameter, and if not
* it returns the updated value in memory so you can repeat your
* operations on it.
*
* \note The exchange occured if the return value is identical to \a old.
*/
static
inline
int
gmx_atomic_cmpxchg
(
gmx_atomic_t
*
a
,
int
oldval
,
int
newval
)
{
volatile
unsigned
long
prev
;
__asm__
__volatile__
(
"lock ; cmpxchgl %1,%2"
:
"=a"
(
prev
)
:
"q"
(
newval
),
"m"
(
a
->
value
),
"0"
(
oldval
)
:
"memory"
);
return
prev
;
}
/*! \brief Initialize spinlock
*
* In theory you can call this from multiple threads, but remember
* that we don't check for errors. If the first thread proceeded to
* lock the spinlock after initialization, the second will happily
* overwrite the contents and unlock it without warning you.
*
* \param x Gromacs spinlock pointer.
*/
static
inline
void
gmx_spinlock_init
(
gmx_spinlock_t
*
x
)
{
x
->
lock
=
1
;
}
/*! \brief Acquire spinlock
*
* This routine blocks until the spinlock is available, and
* the locks it again before returning.
*
* \param x Gromacs spinlock pointer
*/
static
inline
void
gmx_spinlock_lock
(
gmx_spinlock_t
*
x
)
{
__asm__
__volatile__
(
"
\n
1:
\t
"
"lock ; decb %0
\n\t
"
"jns 3f
\n
"
"2:
\t
"
"rep;nop
\n\t
"
"cmpb $0,%0
\n\t
"
"jle 2b
\n\t
"
"jmp 1b
\n
"
"3:
\n\t
"
:
"=m"
(
x
->
lock
)
:
:
"memory"
);
}
/*! \brief Attempt to acquire spinlock
*
* This routine acquires the spinlock if possible, but if
* already locked it return an error code immediately.
*
* \param x Gromacs spinlock pointer
*
* \return 0 if the mutex was available so we could lock it,
* otherwise a non-zero integer (1) if the lock is busy.
*/
static
inline
int
gmx_spinlock_trylock
(
gmx_spinlock_t
*
x
)
{
char
old_value
;
__asm__
__volatile__
(
"xchgb %b0,%1"
:
"=q"
(
old_value
),
"=m"
(
x
->
lock
)
:
"0"
(
0
)
:
"memory"
);
return
(
old_value
<=
0
);
}
/*! \brief Release spinlock
*
* \param x Gromacs spinlock pointer
*
* Unlocks the spinlock, regardless if which thread locked it.
*/
static
inline
void
gmx_spinlock_unlock
(
gmx_spinlock_t
*
x
)
{
char
old_value
=
1
;
__asm__
__volatile__
(
"xchgb %b0, %1"
:
"=q"
(
old_value
),
"=m"
(
x
->
lock
)
:
"0"
(
old_value
)
:
"memory"
);
}
/*! \brief Check if spinlock is locked
*
* This routine returns immediately with the lock status.
*
* \param x Gromacs spinlock pointer
*
* \return 1 if the spinlock is locked, 0 otherwise.
*/
static
inline
int
gmx_spinlock_islocked
(
gmx_spinlock_t
*
x
)
{
return
(
*
(
volatile
signed
char
*
)(
&
(
x
)
->
lock
)
<=
0
);
}
/*! \brief Wait for a spinlock to become available
*
* This routine blocks until the spinlock is unlocked,
* but in contrast to gmx_spinlock_lock() it returns without
* trying to lock the spinlock.
*
* \param x Gromacs spinlock pointer
*/
static
inline
void
gmx_spinlock_wait
(
gmx_spinlock_t
*
x
)
{
do
{
gmx_atomic_memory_barrier
();
}
while
(
gmx_spinlock_islocked
(
x
));
}
#elif ( defined(__GNUC__) && (defined(__powerpc__) || defined(__ppc__)))
/* PowerPC using proper GCC inline assembly.
* Recent versions of xlC (>=7.0) _partially_ support this, but since it is
* not 100% compatible we provide a separate implementation for xlC in
* the next section.
*/
/* Compiler-dependent stuff: GCC memory barrier */
#define gmx_atomic_memory_barrier() __asm__ __volatile__("": : :"memory")
typedef
struct
gmx_atomic
{
volatile
int
value
;
/*!< Volatile, to avoid compiler aliasing */
}
gmx_atomic_t
;
typedef
struct
gmx_spinlock
{
volatile
unsigned
int
lock
;
/*!< Volatile, to avoid compiler aliasing */
}
gmx_spinlock_t
;
#define GMX_SPINLOCK_INITIALIZER { 0 }
#define gmx_atomic_read(a) ((a)->value)
#define gmx_atomic_set(a,i) (((a)->value) = (i))
static
inline
int
gmx_atomic_add_return
(
gmx_atomic_t
*
a
,
int
i
)
{
int
t
;
__asm__
__volatile__
(
"1: lwarx %0,0,%2
\n
"
"
\t
add %0,%1,%0
\n
"
"
\t
stwcx. %0,0,%2
\n
"
"
\t
bne- 1b"
"
\t
isync
\n
"
:
"=&r"
(
t
)
:
"r"
(
i
),
"r"
(
&
a
->
value
)
:
"cc"
,
"memory"
);
return
t
;
}
static
inline
int
gmx_atomic_fetch_add
(
gmx_atomic_t
*
a
,
int
i
)
{
int
t
;
__asm__
__volatile__
(
"
\t
eieio
\n
"
"1: lwarx %0,0,%2
\n
"
"
\t
add %0,%1,%0
\n
"
"
\t
stwcx. %0,0,%2
\n
"
"
\t
bne- 1b
\n
"
"
\t
isync
\n
"
:
"=&r"
(
t
)
:
"r"
(
i
),
"r"
(
&
a
->
value
)
:
"cc"
,
"memory"
);
return
(
t
-
i
);
}
static
inline
int
gmx_atomic_cmpxchg
(
gmx_atomic_t
*
a
,
int
oldval
,
int
newval
)
{
int
prev
;
__asm__
__volatile__
(
"1: lwarx %0,0,%2
\n
"
"
\t
cmpw 0,%0,%3
\n
"
"
\t
bne 2f
\n
"
"
\t
stwcx. %4,0,%2
\n
"
"bne- 1b
\n
"
"
\t
sync
\n
"
"2:
\n
"
:
"=&r"
(
prev
),
"=m"
(
a
->
value
)
:
"r"
(
&
a
->
value
),
"r"
(
oldval
),
"r"
(
newval
),
"m"
(
a
->
value
)
:
"cc"
,
"memory"
);
return
prev
;
}
static
inline
void
gmx_spinlock_init
(
gmx_spinlock_t
*
x
)
{
x
->
lock
=
0
;
}
static
inline
void
gmx_spinlock_lock
(
gmx_spinlock_t
*
x
)
{
unsigned
int
tmp
;
__asm__
__volatile__
(
"
\t
b 1f
\n
"
"2: lwzx %0,0,%1
\n
"
"
\t
cmpwi 0,%0,0
\n
"
"
\t
bne+ 2b
\n
"
"1: lwarx %0,0,%1
\n
"
"
\t
cmpwi 0,%0,0
\n
"
"
\t
bne- 2b
\n
"
"
\t
stwcx. %2,0,%1
\n
"
"bne- 2b
\n
"
"
\t
isync
\n
"
:
"=&r"
(
tmp
)
:
"r"
(
&
x
->
lock
),
"r"
(
1
)
:
"cr0"
,
"memory"
);
}
static
inline
int
gmx_spinlock_trylock
(
gmx_spinlock_t
*
x
)
{
unsigned
int
old
,
t
;
unsigned
int
mask
=
1
;
volatile
unsigned
int
*
p
=
&
x
->
lock
;
__asm__
__volatile__
(
"
\t
eieio
\n
"
"1: lwarx %0,0,%4
\n
"
"
\t
or %1,%0,%3
\n
"
"
\t
stwcx. %1,0,%4
\n
"
"
\t
bne 1b
\n
"
"
\t
sync
\n
"
:
"=&r"
(
old
),
"=&r"
(
t
),
"=m"
(
*
p
)
:
"r"
(
mask
),
"r"
(
p
),
"m"
(
*
p
)
:
"cc"
,
"memory"
);
return
((
old
&
mask
)
!=
0
);
}
static
inline
void
gmx_spinlock_unlock
(
gmx_spinlock_t
*
x
)
{
__asm__
__volatile__
(
"
\t
eieio
\n
"
:
:
:
"memory"
);
x
->
lock
=
0
;
}
static
inline
int
gmx_spinlock_islocked
(
gmx_spinlock_t
*
x
)
{
return
(
x
->
lock
!=
0
);
}
static
inline
void
gmx_spinlock_wait
(
gmx_spinlock_t
*
x
)
{
do
{
gmx_atomic_memory_barrier
();
}
while
(
gmx_spinlock_islocked
(
x
));
}
#elif ( (defined(__IBM_GCC_ASM) || defined(__IBM_STDCPP_ASM)) && \
(defined(__powerpc__) || defined(__ppc__)))
/* PowerPC using xlC inline assembly.
* Recent versions of xlC (>=7.0) _partially_ support GCC inline assembly
* if you use the option -qasm=gcc but we have had to hack things a bit, in
* particular when it comes to clobbered variables. Since this implementation
* _could_ be buggy, we have separated it from the known-to-be-working gcc
* one above.
*/
/* memory barrier - no idea how to create one with xlc! */
#define gmx_atomic_memory_barrier()
typedef
struct
gmx_atomic
{
volatile
int
value
;
/*!< Volatile, to avoid compiler aliasing */
}
gmx_atomic_t
;
typedef
struct
gmx_spinlock
{
volatile
unsigned
int
lock
;
/*!< Volatile, to avoid compiler aliasing */
}
gmx_spinlock_t
;
#define GMX_SPINLOCK_INITIALIZER { 0 }
#define gmx_atomic_read(a) ((a)->value)
#define gmx_atomic_set(a,i) (((a)->value) = (i))
static
inline
int
gmx_atomic_add_return
(
gmx_atomic_t
*
a
,
int
i
)
{
int
t
;
__asm__
__volatile__
(
"1: lwarx %0,0,%2
\n
"
"
\t
add %0,%1,%0
\n
"
"
\t
stwcx. %0,0,%2
\n
"
"
\t
bne- 1b
\n
"
"
\t
isync
\n
"
:
"=&r"
(
t
)
:
"r"
(
i
),
"r"
(
&
a
->
value
)
);
return
t
;
}
static
inline
int
gmx_atomic_fetch_add
(
gmx_atomic_t
*
a
,
int
i
)
{
int
t
;
__asm__
__volatile__
(
"
\t
eieio
\n
"
"1: lwarx %0,0,%2
\n
"
"
\t
add %0,%1,%0
\n
"
"
\t
stwcx. %0,0,%2
\n
"
"
\t
bne- 1b
\n
"
"
\t
isync
\n
"
:
"=&r"
(
t
)
:
"r"
(
i
),
"r"
(
&
a
->
value
));
return
(
t
-
i
);
}
static
inline
int
gmx_atomic_cmpxchg
(
gmx_atomic_t
*
a
,
int
oldval
,
int
newval
)
{
int
prev
;
__asm__
__volatile__
(
"1: lwarx %0,0,%2
\n
"
"
\t
cmpw 0,%0,%3
\n
"
"
\t
bne 2f
\n
"
"
\t
stwcx. %4,0,%2
\n
"
"
\t
bne- 1b
\n
"
"
\t
sync
\n
"
"2:
\n
"
:
"=&r"
(
prev
),
"=m"
(
a
->
value
)
:
"r"
(
&
a
->
value
),
"r"
(
oldval
),
"r"
(
newval
),
"m"
(
a
->
value
));
return
prev
;
}
static
inline
void
gmx_spinlock_init
(
gmx_spinlock_t
*
x
)
{
x
->
lock
=
0
;
}
static
inline
void
gmx_spinlock_lock
(
gmx_spinlock_t
*
x
)
{
unsigned
int
tmp
;
__asm__
__volatile__
(
"
\t
b 1f
\n
"
"2: lwzx %0,0,%1
\n
"
"
\t
cmpwi 0,%0,0
\n
"
"
\t
bne+ 2b
\n
"
"1: lwarx %0,0,%1
\n
"
"
\t
cmpwi 0,%0,0
\n
"
"
\t
bne- 2b
\n
"
"
\t
stwcx. %2,0,%1
\n
"
"
\t
bne- 2b
\n
"
"
\t
isync
\n
"
:
"=&r"
(
tmp
)
:
"r"
(
&
x
->
lock
),
"r"
(
1
));
}
static
inline
int
gmx_spinlock_trylock
(
gmx_spinlock_t
*
x
)
{
unsigned
int
old
,
t
;
unsigned
int
mask
=
1
;
volatile
unsigned
int
*
p
=
&
x
->
lock
;
__asm__
__volatile__
(
"
\t
eieio
\n
"
"1: lwarx %0,0,%4
\n
"
"
\t
or %1,%0,%3
\n
"
"
\t
stwcx. %1,0,%4
\n
"
"
\t
bne 1b
\n
"
"
\t
sync
\n
"
:
"=&r"
(
old
),
"=&r"
(
t
),
"=m"
(
*
p
)
:
"r"
(
mask
),
"r"
(
p
),
"m"
(
*
p
));
return
((
old
&
mask
)
!=
0
);
}
static
inline
void
gmx_spinlock_unlock
(
gmx_spinlock_t
*
x
)
{
__asm__
__volatile__
(
"
\t
eieio
\n
"
);
x
->
lock
=
0
;
}
static
inline
void
gmx_spinlock_islocked
(
gmx_spinlock_t
*
x
)
{
return
(
x
->
lock
!=
0
);
}
static
inline
void
gmx_spinlock_wait
(
gmx_spinlock_t
*
x
)
{
do
{
gmx_atomic_memory_barrier
();
}
while
(
gmx_spinlock_islocked
(
x
));
}
#elif (defined(__ia64__) && (defined(__GNUC__) || defined(__INTEL_COMPILER)))
/* ia64 with GCC or Intel compilers. Since we need to define everything through
* cmpxchg and fetchadd on ia64, we merge the different compilers and only provide
* different implementations for that single function.
* Documentation? Check the gcc/x86 section.
*/
typedef
struct
gmx_atomic
{
volatile
int
value
;
/*!< Volatile, to avoid compiler aliasing */
}
gmx_atomic_t
;
typedef
struct
gmx_spinlock
{
volatile
unsigned
int
lock
;
/*!< Volatile, to avoid compiler aliasing */
}
gmx_spinlock_t
;
#define GMX_SPINLOCK_INITIALIZER { 0 }
#define gmx_atomic_read(a) ((a)->value)
#define gmx_atomic_set(a,i) (((a)->value) = (i))
/* Compiler thingies */
#ifdef __INTEL_COMPILER
void
__memory_barrier
(
void
);
int
_InterlockedCompareExchange
(
volatile
int
*
dest
,
int
xchg
,
int
comp
);
unsigned
__int64
__fetchadd4_rel
(
unsigned
int
*
addend
,
const
int
increment
);
/* ia64 memory barrier */
# define gmx_atomic_memory_barrier() __memory_barrier()
/* ia64 cmpxchg */
# define gmx_atomic_cmpxchg(a, oldval, newval) _InterlockedCompareExchange(&a->value,newval,oldval)
/* ia64 fetchadd, but it only works with increments +/- 1,4,8,16 */
# define gmx_ia64_fetchadd(a, inc) __fetchadd4_rel(a, inc)
#elif defined __GNUC__
/* ia64 memory barrier */
# define gmx_atomic_memory_barrier() asm volatile ("":::"memory")
/* ia64 cmpxchg */
static
inline
int
gmx_atomic_cmpxchg
(
gmx_atomic_t
*
a
,
int
oldval
,
int
newval
)
{
volatile
int
res
;
asm
volatile
(
"mov ar.ccv=%0;;"
::
"rO"
(
oldval
));
asm
volatile
(
"cmpxchg4.acq %0=[%1],%2,ar.ccv"
:
"=r"
(
res
)
:
"r"
(
&
a
->
value
),
"r"
(
newval
)
:
"memory"
);
return
res
;
}
/* fetchadd, but on ia64 it only works with increments +/- 1,4,8,16 */
#define gmx_ia64_fetchadd(a, inc) \
({ unsigned long res; \
asm volatile ("fetchadd4.rel %0=[%1],%2" \
: "=r"(res) : "r"(a), "i" (inc) : "memory"); \
res; \
})
#else
/* Unknown compiler */
# error Unknown ia64 compiler (not GCC or ICC) - modify gmx_thread.h!
#endif
static
inline
int
gmx_atomic_add_return
(
gmx_atomic_t
*
a
,
volatile
int
i
)
{
volatile
int
oldval
,
newval
;
volatile
int
__i
=
i
;
/* Use fetchadd if, and only if, the increment value can be determined
* at compile time (otherwise this check is optimized away) and it is
* a value supported by fetchadd (1,4,8,16,-1,-4,-8,-16).
*/
if
(
__builtin_constant_p
(
i
)
&&
(
(
__i
==
1
)
||
(
__i
==
4
)
||
(
__i
==
8
)
||
(
__i
==
16
)
||
(
__i
==
-
1
)
||
(
__i
==
-
4
)
||
(
__i
==
-
8
)
||
(
__i
==
-
16
)
)
)
{
oldval
=
gmx_ia64_fetchadd
(
a
,
__i
);
newval
=
oldval
+
i
;
}
else
{
/* Use compare-exchange addition that works with any value */
do
{
oldval
=
gmx_atomic_read
(
a
);
newval
=
oldval
+
i
;
}
while
(
gmx_atomic_cmpxchg
(
a
,
oldval
,
newval
)
!=
oldval
);
}
return
newval
;
}
static
inline
int
gmx_atomic_fetch_add
(
gmx_atomic_t
*
a
,
volatile
int
i
)
{
volatile
int
oldval
,
newval
;
volatile
int
__i
=
i
;
/* Use ia64 fetchadd if, and only if, the increment value can be determined
* at compile time (otherwise this check is optimized away) and it is
* a value supported by fetchadd (1,4,8,16,-1,-4,-8,-16).
*/
if
(
__builtin_constant_p
(
i
)
&&
(
(
__i
==
1
)
||
(
__i
==
4
)
||
(
__i
==
8
)
||
(
__i
==
16
)
||
(
__i
==
-
1
)
||
(
__i
==
-
4
)
||
(
__i
==
-
8
)
||
(
__i
==
-
16
)
)
)
{
oldval
=
gmx_ia64_fetchadd
(
a
,
__i
);
newval
=
oldval
+
i
;
}
else
{
/* Use compare-exchange addition that works with any value */
do
{
oldval
=
gmx_atomic_read
(
a
);
newval
=
oldval
+
i
;
}
while
(
gmx_atomic_cmpxchg
(
a
,
oldval
,
newval
)
!=
oldval
);
}
return
oldval
;
}
static
inline
void
gmx_spinlock_init
(
gmx_spinlock_t
*
x
)
{
x
->
lock
=
0
;
}
static
inline
void
gmx_spinlock_lock
(
gmx_spinlock_t
*
x
)
{
gmx_atomic_t
*
a
=
(
gmx_atomic_t
*
)
x
;
unsigned
long
value
;
value
=
gmx_atomic_cmpxchg
(
a
,
0
,
1
);
if
(
value
)
{
do
{
while
(
a
->
value
!=
0
)
{
gmx_atomic_memory_barrier
();
}
value
=
gmx_atomic_cmpxchg
(
a
,
0
,
1
);
}
while
(
value
);
}
}
static
inline
int
gmx_spinlock_trylock
(
gmx_spinlock_t
*
x
)
{
return
(
gmx_atomic_cmpxchg
((
gmx_atomic_t
*
)
x
,
0
,
1
)
!=
0
);
}
static
inline
void
gmx_spinlock_unlock
(
gmx_spinlock_t
*
x
)
{
do
{
gmx_atomic_memory_barrier
();
x
->
lock
=
0
;
}
while
(
0
);
}
static
inline
int
gmx_spinlock_islocked
(
gmx_spinlock_t
*
x
)
{
return
(
x
->
lock
!=
0
);
}
static
inline
void
gmx_spinlock_wait
(
gmx_spinlock_t
*
x
)
{
do
{
gmx_atomic_memory_barrier
();
}
while
(
gmx_spinlock_islocked
(
x
));
}
#undef gmx_ia64_fetchadd
#elif (defined(__hpux) || defined(__HP_cc)) && defined(__ia64)
/* HP compiler on ia64 */
#include <machine/sys/inline.h>
#define gmx_atomic_memory_barrier() _Asm_mf()
#define gmx_hpia64_fetchadd(a, i) \
_Asm_fetchadd((_Asm_fasz)_FASZ_W,(_Asm_sem)_SEM_REL, \
(UInt32*)a,(unsigned int) i, \
(_Asm_ldhint)LDHINT_NONE)
typedef
struct
gmx_atomic
{
volatile
int
value
;
/*!< Volatile, to avoid compiler aliasing */
}
gmx_atomic_t
;
typedef
struct
gmx_spinlock
{
volatile
unsigned
int
lock
;
/*!< Volatile, to avoid compiler aliasing */
}
gmx_spinlock_t
;
static
inline
int
gmx_atomic_cmpxchg
(
gmx_atomic_t
*
a
,
int
oldval
,
int
newval
)
{
int
ret
;
_Asm_mov_to_ar
((
_Asm_app_reg
)
_AREG_CCV
,(
Uint32
)
oldval
,
(
_Asm_fence
)(
_UP_CALL_FENCE
|
_UP_SYS_FENCE
|
_DOWN_CALL_FENCE
|
_DOWN_SYS_FENCE
));
ret
=
_Asm_cmpxchg
((
_Asm_sz
)
SZ_W
,(
_Asm_sem
)
_SEM_ACQ
,(
Uint32
*
)
a
,
(
Uint32
)
newval
,(
_Asm_ldhint
)
_LDHINT_NONE
);
return
ret
;
}
#define GMX_SPINLOCK_INITIALIZER { 0 }
#define gmx_atomic_read(a) ((a)->value)
#define gmx_atomic_set(a,i) (((a)->value) = (i))
static
inline
void
gmx_atomic_add_return
(
gmx_atomic_t
*
a
,
int
i
)
{
int
old
,
new
;
int
__i
=
i
;
/* On HP-UX we don't know any macro to determine whether the increment
* is known at compile time, but hopefully the call uses something simple
* like a constant, and then the optimizer should be able to do the job.
*/
if
(
(
__i
==
1
)
||
(
__i
==
4
)
||
(
__i
==
8
)
||
(
__i
==
16
)
||
(
__i
==
-
1
)
||
(
__i
==
-
4
)
||
(
__i
==
-
8
)
||
(
__i
==
-
16
)
)
{
oldval
=
gmx_hpia64_fetchadd
(
a
,
__i
);
newval
=
oldval
+
i
;
}
else
{
/* Use compare-exchange addition that works with any value */
do
{
oldval
=
gmx_atomic_read
(
a
);
newval
=
oldval
+
i
;
}
while
(
gmx_atomic_cmpxchg
(
a
,
oldval
,
newval
)
!=
oldval
);
}
return
newval
;
}
static
inline
int
gmx_atomic_fetch_add
(
gmx_atomic_t
*
a
,
int
i
)
{
int
oldval
,
newval
;
int
__i
=
i
;
/* On HP-UX we don't know any macro to determine whether the increment
* is known at compile time, but hopefully the call uses something simple
* like a constant, and then the optimizer should be able to do the job.
*/
if
(
(
__i
==
1
)
||
(
__i
==
4
)
||
(
__i
==
8
)
||
(
__i
==
16
)
||
(
__i
==
-
1
)
||
(
__i
==
-
4
)
||
(
__i
==
-
8
)
||
(
__i
==
-
16
)
)
{
oldval
=
gmx_hpia64_fetchadd
(
a
,
__i
);
newval
=
oldval
+
i
;
}
else
{
/* Use compare-exchange addition that works with any value */
do
{
oldval
=
gmx_atomic_read
(
a
);
newval
=
oldval
+
i
;
}
while
(
gmx_atomic_cmpxchg
(
a
,
oldval
,
newval
)
!=
oldval
);
}
return
oldval
;
}
static
inline
void
gmx_spinlock_init
(
gmx_spinlock_t
*
x
)
{
x
->
lock
=
0
;
}
static
inline
void
gmx_spinlock_trylock
(
gmx_spinlock_t
*
x
)
{
int
rc
;
rc
=
_Asm_xchg
((
_Asm_sz
)
_SZ_W
,
(
unsigned
int
*
)
x
,
1
(
_Asm_ldhit
)
_LDHINT_NONE
);
return
(
(
rc
>
0
)
?
1
:
0
);
}
static
inline
void
gmx_spinlock_lock
(
gmx_spinlock_t
*
x
)
{
int
status
=
1
;
do
{
if
(
*
((
unsigned
int
*
)
x
->
lock
)
==
0
)
{
status
=
gmx_spinlock_trylock
(
x
);
}
}
while
(
status
!=
0
);
}
static
inline
void
gmx_spinlock_unlock
(
gmx_spinlock_t
*
x
)
{
_Asm_fetchadd
((
_Asm_fasz
)
_SZ_W
,(
_Asm_sem
)
_SEM_REL
,
(
unsigned
int
*
)
x
,
-
1
,(
_Asm_ldhint
)
_LDHINT_NONE
);
}
static
inline
void
gmx_spinlock_islocked
(
gmx_spinlock_t
*
x
)
{
return
(
x
->
lock
!=
0
);
}
static
inline
void
gmx_spinlock_wait
(
gmx_spinlock_t
*
x
)
{
do
{
gmx_atomic_memory_barrier
();
}
while
(
gmx_spinlock_islocked
(
x
));
}
#undef gmx_hpia64_fetchadd
#elif (defined(_MSC_VER) && (_MSC_VER >= 1200))
/* Microsoft Visual C on x86, define taken from FFTW who got it from Morten Nissov */
#include <windows.h>
#define gmx_atomic_memory_barrier()
typedef
struct
gmx_atomic
{
LONG
volatile
value
;
/*!< Volatile, to avoid compiler aliasing */
}
gmx_atomic_t
;
typedef
struct
gmx_spinlock
{
LONG
volatile
lock
;
/*!< Volatile, to avoid compiler aliasing */
}
gmx_spinlock_t
;
#define GMX_SPINLOCK_INITIALIZER { 0 }
#define gmx_atomic_read(a) ((a)->value)
#define gmx_atomic_set(a,i) (((a)->value) = (i))
#define gmx_atomic_fetch_add(a, i) \
InterlockedExchangeAdd((LONG volatile *)a, (LONG) i)
#define gmx_atomic_add_return(a, i) \
( i + InterlockedExchangeAdd((LONG volatile *)a, (LONG) i) )
#define gmx_atomic_cmpxchg(a, oldval, newval) \
InterlockedCompareExchange((LONG volatile *)a, (LONG) newval, (LONG) oldval)
# define gmx_spinlock_lock(x) \
while((InterlockedCompareExchange((LONG volatile *)&x, 1, 0))!=0)
#define gmx_spinlock_trylock(x) \
InterlockedCompareExchange((LONG volatile *)&x, 1, 0)
static
inline
void
gmx_spinlock_unlock
(
gmx_spinlock_t
*
x
)
{
x
->
lock
=
0
;
}
static
inline
int
gmx_spinlock_islocked
(
gmx_spinlock_t
*
x
)
{
return
(
*
(
volatile
signed
char
*
)(
&
(
x
)
->
lock
)
!=
0
);
}
static
inline
void
gmx_spinlock_wait
(
gmx_spinlock_t
*
x
)
{
while
(
gmx_spinlock_islocked
(
x
))
{
Sleep
(
0
);
}
}
#elif defined(__xlC__) && defined (_AIX)
/* IBM xlC compiler on AIX */
#include <sys/atomic_op.h>
#define gmx_atomic_memory_barrier()
typedef
struct
gmx_atomic
{
volatile
int
value
;
/*!< Volatile, to avoid compiler aliasing */
}
gmx_atomic_t
;
typedef
struct
gmx_spinlock
{
volatile
unsigned
int
lock
;
/*!< Volatile, to avoid compiler aliasing */
}
gmx_spinlock_t
;
static
inline
int
gmx_atomic_cmpxchg
(
gmx_atomic_t
*
a
,
int
oldval
,
int
newval
)
{
int
t
;
if
(
__check_lock
((
atomic_p
)
&
a
->
value
,
oldval
,
newval
))
{
/* Not successful - value had changed in memory. Reload value. */
t
=
a
->
value
;
}
else
{
/* replacement suceeded */
t
=
oldval
;
}
return
t
;
}
static
inline
void
gmx_atomic_add_return
(
gmx_atomic_t
*
a
,
int
i
)
{
int
oldval
,
newval
;
do
{
oldval
=
gmx_atomic_read
(
a
);
newval
=
oldval
+
i
;
}
while
(
__check_lock
((
atomic_p
)
&
a
->
value
,
oldval
,
newval
));
return
newval
;
}
static
inline
void
gmx_atomic_fetch_add
(
gmx_atomic_t
*
a
,
int
i
)
{
int
oldval
,
newval
;
do
{
oldval
=
gmx_atomic_read
(
a
);
newval
=
oldval
+
i
;
}
while
(
__check_lock
((
atomic_p
)
&
a
->
value
,
oldval
,
newval
));
return
oldval
;
}
static
inline
void
gmx_spinlock_init
(
gmx_spinlock_t
*
x
)
{
__clear_lock
((
atomic_p
)
x
,
0
);
}
static
inline
void
gmx_spinlock_lock
(
gmx_spinlock_t
*
x
)
{
do
{
;
}
while
(
__check_lock
((
atomic_p
)
x
,
0
,
1
));
}
static
inline
void
gmx_spinlock_trylock
(
gmx_spinlock_t
*
x
)
{
/* Return 0 if we got the lock */
return
(
__check_lock
((
atomic_p
)
x
,
0
,
1
)
!=
0
)
}
static
inline
void
gmx_spinlock_unlock
(
gmx_spinlock_t
*
x
)
{
__clear_lock
((
atomic_p
)
x
,
0
);
}
static
inline
void
gmx_spinlock_islocked
(
gmx_spinlock_t
*
x
)
{
return
(
*
((
atomic_p
)
x
)
!=
0
);
}
static
inline
void
gmx_spinlock_wait
(
gmx_spinlock_t
*
x
)
{
while
(
gmx_spinlock_islocked
(
x
))
{
;
}
}
#else
/* No atomic operations, use mutex fallback. Documentation is in x86 section */
#define gmx_atomic_memory_barrier()
/* System mutex used for locking to guarantee atomicity */
static
pthread_mutex_t
gmx_atomic_mutex
=
PTHREAD_MUTEX_INITIALIZER
;
typedef
struct
gmx_atomic
{
int
value
;
}
gmx_atomic_t
;
#define gmx_spinlock_t pthread_mutex_t
# define GMX_SPINLOCK_INITIALIZER PTHREAD_MUTEX_INITIALIZER
/* Since mutexes guarantee memory barriers this works fine */
#define gmx_atomic_read(a) ((a)->value)
static
inline
void
gmx_atomic_set
(
gmx_atomic_t
*
a
,
int
i
)
{
/* Mutexes here are necessary to guarantee memory visibility */
pthread_mutex_lock
(
&
gmx_atomic_mutex
);
a
->
value
=
i
;
pthread_mutex_unlock
(
&
gmx_atomic_mutex
);
}
static
inline
int
gmx_atomic_add_return
(
gmx_atomic_t
*
a
,
int
i
)
{
int
t
;
pthread_mutex_lock
(
&
gmx_atomic_mutex
);
t
=
a
->
value
+
i
;
a
->
value
=
t
;
pthread_mutex_unlock
(
&
gmx_atomic_mutex
);
return
t
;
}
static
inline
int
gmx_atomic_fetch_add
(
gmx_atomic_t
*
a
,
int
i
)
{
int
old_value
;
pthread_mutex_lock
(
&
gmx_atomic_mutex
);
old_value
=
a
->
value
;
a
->
value
=
old_value
+
i
;
pthread_mutex_unlock
(
&
gmx_atomic_mutex
);
return
old_value
;
}
static
inline
int
gmx_atomic_cmpxchg
(
gmx_atomic_t
*
a
,
int
oldv
,
int
newv
)
{
int
t
;
pthread_mutex_lock
(
&
gmx_atomic_mutex
);
t
=
a
->
value
;
if
(
t
==
oldv
)
{
a
->
value
=
newv
;
}
pthread_mutex_unlock
(
&
gmx_atomic_mutex
);
return
t
;
}
#define gmx_spinlock_init(lock) pthread_mutex_init(lock)
#define gmx_spinlock_lock(lock) pthread_mutex_lock(lock)
#define gmx_spinlock_trylock(lock) pthread_mutex_trylock(lock)
#define gmx_spinlock_unlock(lock) pthread_mutex_unlock(lock)
static
inline
int
gmx_spinlock_islocked
(
gmx_spinlock_t
*
x
)
{
int
rc
;
if
(
gmx_spinlock_trylock
(
x
)
!=
0
)
{
/* It was locked */
return
1
;
}
else
{
/* We just locked it */
gmx_spinlock_unlock
(
x
);
return
0
;
}
}
static
inline
void
gmx_spinlock_wait
(
gmx_spinlock_t
*
x
)
{
int
rc
;
gmx_spinlock_lock
(
x
);
/* Got the lock now, so the waiting is over */
gmx_spinlock_unlock
(
x
);
}
#endif
/*! \brief Spinlock-based barrier type
*
* This barrier has the same functionality as the standard
* gmx_thread_barrier_t, but since it is based on spinlocks
* it provides faster synchronization at the cost of busy-waiting.
*
* Variables of this type should be initialized by calling
* gmx_spinlock_barrier_init() to set the number of threads
* that should be synchronized.
*/
typedef
struct
gmx_spinlock_barrier
{
gmx_atomic_t
count
;
/*!< Number of threads remaining */
int
threshold
;
/*!< Total number of threads */
volatile
int
cycle
;
/*!< Current cycle (alternating 0/1) */
}
gmx_spinlock_barrier_t
;
/*! \brief Initialize spinlock-based barrier
*
* \param barrier Pointer to _spinlock_ barrier. Note that this is not
* the same datatype as the full, thread based, barrier.
* \param count Number of threads to synchronize. All threads
* will be released after \a count calls to
* gmx_spinlock_barrier_wait().
*/
static
inline
void
gmx_spinlock_barrier_init
(
gmx_spinlock_barrier_t
*
barrier
,
int
count
)
{
barrier
->
threshold
=
count
;
barrier
->
cycle
=
0
;
gmx_atomic_set
(
&
(
barrier
->
count
),
count
);
}
/*! \brief Perform busy-waiting barrier synchronization
*
* This routine blocks until it has been called N times,
* where N is the count value the barrier was initialized with.
* After N total calls all threads return. The barrier automatically
* cycles, and thus requires another N calls to unblock another time.
*
* Note that spinlock-based barriers are completely different from
* standard ones (using mutexes and condition variables), only the
* functionality and names are similar.
*
* \param barrier Pointer to previously create barrier.
*
* \return The last thread returns -1, all the others 0.
*/
static
inline
int
gmx_spinlock_barrier_wait
(
gmx_spinlock_barrier_t
*
barrier
)
{
int
cycle
;
int
status
;
/* We don't need to lock or use atomic ops here, since the cycle index
* cannot change until after the last thread has performed the check
* further down. Further, they cannot reach this point in the next
* barrier iteration until all of them have been released, and that
* happens after the cycle value has been updated.
*
* No synchronization == fast synchronization.
*/
cycle
=
barrier
->
cycle
;
/* Decrement the count atomically and check if it is zero.
* This will only be true for the last thread calling us.
*/
if
(
gmx_atomic_add_return
(
&
(
barrier
->
count
),
-
1
)
==
0
)
{
gmx_atomic_set
(
&
(
barrier
->
count
),
barrier
->
threshold
);
barrier
->
cycle
=
!
barrier
->
cycle
;
status
=
-
1
;
}
else
{
/* Wait until the last thread changes the cycle index.
* We are both using a memory barrier, and explicit
* volatile pointer cast to make sure the compiler
* doesn't try to be smart and cache the contents.
*/
do
{
gmx_atomic_memory_barrier
();
}
while
(
*
(
volatile
int
*
)(
&
(
barrier
->
cycle
))
==
cycle
);
status
=
0
;
}
return
status
;
}
#ifdef __cplusplus
}
#endif
#endif
/* _GMX_ATOMIC_H_ */
platforms/cpu/include/CpuCustomGBForce.h
View file @
fdb501e3
...
@@ -31,6 +31,7 @@
...
@@ -31,6 +31,7 @@
#include "openmm/internal/CompiledExpressionSet.h"
#include "openmm/internal/CompiledExpressionSet.h"
#include "openmm/internal/ThreadPool.h"
#include "openmm/internal/ThreadPool.h"
#include "openmm/internal/vectorize.h"
#include "openmm/internal/vectorize.h"
#include <atomic>
#include <map>
#include <map>
#include <set>
#include <set>
#include <vector>
#include <vector>
...
@@ -63,7 +64,7 @@ private:
...
@@ -63,7 +64,7 @@ private:
const
std
::
map
<
std
::
string
,
double
>*
globalParameters
;
const
std
::
map
<
std
::
string
,
double
>*
globalParameters
;
std
::
vector
<
AlignedArray
<
float
>
>*
threadForce
;
std
::
vector
<
AlignedArray
<
float
>
>*
threadForce
;
bool
includeForce
,
includeEnergy
;
bool
includeForce
,
includeEnergy
;
void
*
atomicCounter
;
std
::
atomic
<
int
>
atomicCounter
;
/**
/**
* This routine contains the code executed by each thread.
* This routine contains the code executed by each thread.
...
...
platforms/cpu/include/CpuCustomManyParticleForce.h
View file @
fdb501e3
...
@@ -34,6 +34,7 @@
...
@@ -34,6 +34,7 @@
#include "openmm/internal/vectorize.h"
#include "openmm/internal/vectorize.h"
#include "lepton/CompiledExpression.h"
#include "lepton/CompiledExpression.h"
#include "lepton/ParsedExpression.h"
#include "lepton/ParsedExpression.h"
#include <atomic>
#include <map>
#include <map>
#include <set>
#include <set>
#include <utility>
#include <utility>
...
@@ -69,7 +70,7 @@ private:
...
@@ -69,7 +70,7 @@ private:
const
std
::
map
<
std
::
string
,
double
>*
globalParameters
;
const
std
::
map
<
std
::
string
,
double
>*
globalParameters
;
std
::
vector
<
AlignedArray
<
float
>
>*
threadForce
;
std
::
vector
<
AlignedArray
<
float
>
>*
threadForce
;
bool
includeForces
,
includeEnergy
;
bool
includeForces
,
includeEnergy
;
void
*
atomicCounter
;
std
::
atomic
<
int
>
atomicCounter
;
/**
/**
* This routine contains the code executed by each thread.
* This routine contains the code executed by each thread.
...
...
platforms/cpu/include/CpuCustomNonbondedForce.h
View file @
fdb501e3
...
@@ -30,6 +30,7 @@
...
@@ -30,6 +30,7 @@
#include "openmm/internal/CompiledExpressionSet.h"
#include "openmm/internal/CompiledExpressionSet.h"
#include "openmm/internal/ThreadPool.h"
#include "openmm/internal/ThreadPool.h"
#include "openmm/internal/vectorize.h"
#include "openmm/internal/vectorize.h"
#include <atomic>
#include <map>
#include <map>
#include <set>
#include <set>
#include <utility>
#include <utility>
...
@@ -147,7 +148,7 @@ private:
...
@@ -147,7 +148,7 @@ private:
const
std
::
map
<
std
::
string
,
double
>*
globalParameters
;
const
std
::
map
<
std
::
string
,
double
>*
globalParameters
;
std
::
vector
<
AlignedArray
<
float
>
>*
threadForce
;
std
::
vector
<
AlignedArray
<
float
>
>*
threadForce
;
bool
includeForce
,
includeEnergy
;
bool
includeForce
,
includeEnergy
;
void
*
atomicCounter
;
std
::
atomic
<
int
>
atomicCounter
;
/**
/**
* This routine contains the code executed by each thread.
* This routine contains the code executed by each thread.
...
...
platforms/cpu/include/CpuGBSAOBCForce.h
View file @
fdb501e3
/* Portions copyright (c) 2006-201
7
Stanford University and Simbios.
/* Portions copyright (c) 2006-201
8
Stanford University and Simbios.
* Contributors: Pande Group
* Contributors: Pande Group
*
*
* Permission is hereby granted, free of charge, to any person obtaining
* Permission is hereby granted, free of charge, to any person obtaining
...
@@ -28,6 +28,7 @@
...
@@ -28,6 +28,7 @@
#include "AlignedArray.h"
#include "AlignedArray.h"
#include "openmm/internal/ThreadPool.h"
#include "openmm/internal/ThreadPool.h"
#include "openmm/internal/vectorize.h"
#include "openmm/internal/vectorize.h"
#include <atomic>
#include <set>
#include <set>
#include <utility>
#include <utility>
#include <vector>
#include <vector>
...
@@ -112,7 +113,7 @@ private:
...
@@ -112,7 +113,7 @@ private:
float
const
*
posq
;
float
const
*
posq
;
std
::
vector
<
AlignedArray
<
float
>
>*
threadForce
;
std
::
vector
<
AlignedArray
<
float
>
>*
threadForce
;
bool
includeEnergy
;
bool
includeEnergy
;
void
*
atomicCounter
;
std
::
atomic
<
int
>
atomicCounter
;
static
const
int
NUM_TABLE_POINTS
;
static
const
int
NUM_TABLE_POINTS
;
static
const
float
TABLE_MIN
;
static
const
float
TABLE_MIN
;
...
...
platforms/cpu/include/CpuGayBerneForce.h
View file @
fdb501e3
...
@@ -6,7 +6,7 @@
...
@@ -6,7 +6,7 @@
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* *
* *
* Portions copyright (c) 2016-201
7
Stanford University and the Authors. *
* Portions copyright (c) 2016-201
8
Stanford University and the Authors. *
* Authors: Peter Eastman *
* Authors: Peter Eastman *
* Contributors: *
* Contributors: *
* *
* *
...
@@ -91,7 +91,7 @@ private:
...
@@ -91,7 +91,7 @@ private:
Vec3
const
*
positions
;
Vec3
const
*
positions
;
std
::
vector
<
AlignedArray
<
float
>
>*
threadForce
;
std
::
vector
<
AlignedArray
<
float
>
>*
threadForce
;
Vec3
*
boxVectors
;
Vec3
*
boxVectors
;
void
*
atomicCounter
;
std
::
atomic
<
int
>
atomicCounter
;
void
computeEllipsoidFrames
(
const
std
::
vector
<
Vec3
>&
positions
);
void
computeEllipsoidFrames
(
const
std
::
vector
<
Vec3
>&
positions
);
...
...
platforms/cpu/include/CpuNeighborList.h
View file @
fdb501e3
...
@@ -9,7 +9,7 @@
...
@@ -9,7 +9,7 @@
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* *
* *
* Portions copyright (c) 2013-201
7
Stanford University and the Authors. *
* Portions copyright (c) 2013-201
8
Stanford University and the Authors. *
* Authors: Peter Eastman *
* Authors: Peter Eastman *
* Contributors: *
* Contributors: *
* *
* *
...
@@ -35,8 +35,8 @@
...
@@ -35,8 +35,8 @@
#include "AlignedArray.h"
#include "AlignedArray.h"
#include "openmm/Vec3.h"
#include "openmm/Vec3.h"
#include "windowsExportCpu.h"
#include "windowsExportCpu.h"
#include "openmm/internal/gmx_atomic.h"
#include "openmm/internal/ThreadPool.h"
#include "openmm/internal/ThreadPool.h"
#include <atomic>
#include <set>
#include <set>
#include <utility>
#include <utility>
#include <vector>
#include <vector>
...
@@ -75,7 +75,7 @@ private:
...
@@ -75,7 +75,7 @@ private:
int
numAtoms
;
int
numAtoms
;
bool
usePeriodic
;
bool
usePeriodic
;
float
maxDistance
;
float
maxDistance
;
gmx_
atomic
_t
atomicCounter
;
std
::
atomic
<
int
>
atomicCounter
;
};
};
}
// namespace OpenMM
}
// namespace OpenMM
...
...
platforms/cpu/include/CpuNonbondedForce.h
View file @
fdb501e3
/* Portions copyright (c) 2006-201
7
Stanford University and Simbios.
/* Portions copyright (c) 2006-201
8
Stanford University and Simbios.
* Contributors: Pande Group
* Contributors: Pande Group
*
*
* Permission is hereby granted, free of charge, to any person obtaining
* Permission is hereby granted, free of charge, to any person obtaining
...
@@ -30,6 +30,7 @@
...
@@ -30,6 +30,7 @@
#include "ReferencePairIxn.h"
#include "ReferencePairIxn.h"
#include "openmm/internal/ThreadPool.h"
#include "openmm/internal/ThreadPool.h"
#include "openmm/internal/vectorize.h"
#include "openmm/internal/vectorize.h"
#include <atomic>
#include <set>
#include <set>
#include <utility>
#include <utility>
#include <vector>
#include <vector>
...
@@ -200,7 +201,7 @@ protected:
...
@@ -200,7 +201,7 @@ protected:
bool
includeEnergy
;
bool
includeEnergy
;
float
inverseRcut6
;
float
inverseRcut6
;
float
inverseRcut6Expterm
;
float
inverseRcut6Expterm
;
void
*
atomicCounter
;
std
::
atomic
<
int
>
atomicCounter
;
static
const
float
TWO_OVER_SQRT_PI
;
static
const
float
TWO_OVER_SQRT_PI
;
static
const
int
NUM_TABLE_POINTS
;
static
const
int
NUM_TABLE_POINTS
;
...
...
platforms/cpu/src/CpuCustomGBForce.cpp
View file @
fdb501e3
...
@@ -28,7 +28,6 @@
...
@@ -28,7 +28,6 @@
#include "SimTKOpenMMUtilities.h"
#include "SimTKOpenMMUtilities.h"
#include "ReferenceForce.h"
#include "ReferenceForce.h"
#include "CpuCustomGBForce.h"
#include "CpuCustomGBForce.h"
#include "openmm/internal/gmx_atomic.h"
using
namespace
OpenMM
;
using
namespace
OpenMM
;
using
namespace
std
;
using
namespace
std
;
...
@@ -191,13 +190,11 @@ void CpuCustomGBForce::calculateIxn(int numberOfAtoms, float* posq, vector<vecto
...
@@ -191,13 +190,11 @@ void CpuCustomGBForce::calculateIxn(int numberOfAtoms, float* posq, vector<vecto
this
->
includeForce
=
includeForce
;
this
->
includeForce
=
includeForce
;
this
->
includeEnergy
=
includeEnergy
;
this
->
includeEnergy
=
includeEnergy
;
threadEnergy
.
resize
(
threads
.
getNumThreads
());
threadEnergy
.
resize
(
threads
.
getNumThreads
());
gmx_atomic_t
counter
;
this
->
atomicCounter
=
&
counter
;
// Calculate the first computed value.
// Calculate the first computed value.
auto
task
=
[
&
]
(
ThreadPool
&
threads
,
int
threadIndex
)
{
threadComputeForce
(
threads
,
threadIndex
);
};
auto
task
=
[
&
]
(
ThreadPool
&
threads
,
int
threadIndex
)
{
threadComputeForce
(
threads
,
threadIndex
);
};
gmx_
atomic
_set
(
&
c
ounter
,
0
)
;
atomic
C
ounter
=
0
;
threads
.
execute
(
task
);
threads
.
execute
(
task
);
threads
.
waitForThreads
();
threads
.
waitForThreads
();
...
@@ -217,7 +214,7 @@ void CpuCustomGBForce::calculateIxn(int numberOfAtoms, float* posq, vector<vecto
...
@@ -217,7 +214,7 @@ void CpuCustomGBForce::calculateIxn(int numberOfAtoms, float* posq, vector<vecto
// Calculate the energy terms.
// Calculate the energy terms.
for
(
int
i
=
0
;
i
<
(
int
)
threadData
[
0
]
->
energyExpressions
.
size
();
i
++
)
{
for
(
int
i
=
0
;
i
<
(
int
)
threadData
[
0
]
->
energyExpressions
.
size
();
i
++
)
{
gmx_
atomic
_set
(
&
c
ounter
,
0
)
;
atomic
C
ounter
=
0
;
threads
.
execute
(
task
);
threads
.
execute
(
task
);
threads
.
waitForThreads
();
threads
.
waitForThreads
();
}
}
...
@@ -229,7 +226,7 @@ void CpuCustomGBForce::calculateIxn(int numberOfAtoms, float* posq, vector<vecto
...
@@ -229,7 +226,7 @@ void CpuCustomGBForce::calculateIxn(int numberOfAtoms, float* posq, vector<vecto
// Apply the chain rule to evaluate forces.
// Apply the chain rule to evaluate forces.
gmx_
atomic
_set
(
&
c
ounter
,
0
)
;
atomic
C
ounter
=
0
;
threads
.
resumeThreads
();
threads
.
resumeThreads
();
threads
.
waitForThreads
();
threads
.
waitForThreads
();
...
@@ -361,7 +358,7 @@ void CpuCustomGBForce::calculateParticlePairValue(int index, ThreadData& data, i
...
@@ -361,7 +358,7 @@ void CpuCustomGBForce::calculateParticlePairValue(int index, ThreadData& data, i
// Loop over all pairs in the neighbor list.
// Loop over all pairs in the neighbor list.
while
(
true
)
{
while
(
true
)
{
int
blockIndex
=
gmx_atomic_fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
1
)
;
int
blockIndex
=
atomicCounter
++
;
if
(
blockIndex
>=
neighborList
->
getNumBlocks
())
if
(
blockIndex
>=
neighborList
->
getNumBlocks
())
break
;
break
;
const
int
blockSize
=
neighborList
->
getBlockSize
();
const
int
blockSize
=
neighborList
->
getBlockSize
();
...
@@ -386,7 +383,7 @@ void CpuCustomGBForce::calculateParticlePairValue(int index, ThreadData& data, i
...
@@ -386,7 +383,7 @@ void CpuCustomGBForce::calculateParticlePairValue(int index, ThreadData& data, i
// Perform an O(N^2) loop over all atom pairs.
// Perform an O(N^2) loop over all atom pairs.
while
(
true
)
{
while
(
true
)
{
int
i
=
gmx_atomic_fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
1
)
;
int
i
=
atomicCounter
++
;
if
(
i
>=
numAtoms
)
if
(
i
>=
numAtoms
)
break
;
break
;
for
(
int
j
=
i
+
1
;
j
<
numAtoms
;
j
++
)
{
for
(
int
j
=
i
+
1
;
j
<
numAtoms
;
j
++
)
{
...
@@ -456,7 +453,7 @@ void CpuCustomGBForce::calculateParticlePairEnergyTerm(int index, ThreadData& da
...
@@ -456,7 +453,7 @@ void CpuCustomGBForce::calculateParticlePairEnergyTerm(int index, ThreadData& da
// Loop over all pairs in the neighbor list.
// Loop over all pairs in the neighbor list.
while
(
true
)
{
while
(
true
)
{
int
blockIndex
=
gmx_atomic_fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
1
)
;
int
blockIndex
=
atomicCounter
++
;
if
(
blockIndex
>=
neighborList
->
getNumBlocks
())
if
(
blockIndex
>=
neighborList
->
getNumBlocks
())
break
;
break
;
const
int
blockSize
=
neighborList
->
getBlockSize
();
const
int
blockSize
=
neighborList
->
getBlockSize
();
...
@@ -480,7 +477,7 @@ void CpuCustomGBForce::calculateParticlePairEnergyTerm(int index, ThreadData& da
...
@@ -480,7 +477,7 @@ void CpuCustomGBForce::calculateParticlePairEnergyTerm(int index, ThreadData& da
// Perform an O(N^2) loop over all atom pairs.
// Perform an O(N^2) loop over all atom pairs.
while
(
true
)
{
while
(
true
)
{
int
i
=
gmx_atomic_fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
1
)
;
int
i
=
atomicCounter
++
;
if
(
i
>=
numAtoms
)
if
(
i
>=
numAtoms
)
break
;
break
;
for
(
int
j
=
i
+
1
;
j
<
numAtoms
;
j
++
)
{
for
(
int
j
=
i
+
1
;
j
<
numAtoms
;
j
++
)
{
...
@@ -543,7 +540,7 @@ void CpuCustomGBForce::calculateChainRuleForces(ThreadData& data, int numAtoms,
...
@@ -543,7 +540,7 @@ void CpuCustomGBForce::calculateChainRuleForces(ThreadData& data, int numAtoms,
// Loop over all pairs in the neighbor list.
// Loop over all pairs in the neighbor list.
while
(
true
)
{
while
(
true
)
{
int
blockIndex
=
gmx_atomic_fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
1
)
;
int
blockIndex
=
atomicCounter
++
;
if
(
blockIndex
>=
neighborList
->
getNumBlocks
())
if
(
blockIndex
>=
neighborList
->
getNumBlocks
())
break
;
break
;
const
int
blockSize
=
neighborList
->
getBlockSize
();
const
int
blockSize
=
neighborList
->
getBlockSize
();
...
@@ -567,7 +564,7 @@ void CpuCustomGBForce::calculateChainRuleForces(ThreadData& data, int numAtoms,
...
@@ -567,7 +564,7 @@ void CpuCustomGBForce::calculateChainRuleForces(ThreadData& data, int numAtoms,
// Perform an O(N^2) loop over all atom pairs.
// Perform an O(N^2) loop over all atom pairs.
while
(
true
)
{
while
(
true
)
{
int
i
=
gmx_atomic_fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
1
)
;
int
i
=
atomicCounter
++
;
if
(
i
>=
numAtoms
)
if
(
i
>=
numAtoms
)
break
;
break
;
for
(
int
j
=
i
+
1
;
j
<
numAtoms
;
j
++
)
{
for
(
int
j
=
i
+
1
;
j
<
numAtoms
;
j
++
)
{
...
...
platforms/cpu/src/CpuCustomManyParticleForce.cpp
View file @
fdb501e3
...
@@ -32,7 +32,6 @@
...
@@ -32,7 +32,6 @@
#include "ReferenceTabulatedFunction.h"
#include "ReferenceTabulatedFunction.h"
#include "openmm/internal/CustomManyParticleForceImpl.h"
#include "openmm/internal/CustomManyParticleForceImpl.h"
#include "lepton/CustomFunction.h"
#include "lepton/CustomFunction.h"
#include "openmm/internal/gmx_atomic.h"
using
namespace
OpenMM
;
using
namespace
OpenMM
;
using
namespace
std
;
using
namespace
std
;
...
@@ -99,9 +98,7 @@ void CpuCustomManyParticleForce::calculateIxn(AlignedArray<float>& posq, vector<
...
@@ -99,9 +98,7 @@ void CpuCustomManyParticleForce::calculateIxn(AlignedArray<float>& posq, vector<
this
->
threadForce
=
&
threadForce
;
this
->
threadForce
=
&
threadForce
;
this
->
includeForces
=
includeForces
;
this
->
includeForces
=
includeForces
;
this
->
includeEnergy
=
includeEnergy
;
this
->
includeEnergy
=
includeEnergy
;
gmx_atomic_t
counter
;
atomicCounter
=
0
;
gmx_atomic_set
(
&
counter
,
0
);
this
->
atomicCounter
=
&
counter
;
if
(
useCutoff
)
{
if
(
useCutoff
)
{
// Construct a neighbor list. We use CpuNeighborList to do this, but then copy the result
// Construct a neighbor list. We use CpuNeighborList to do this, but then copy the result
// into a new data structure. This is needed because in UniqueCentralParticle mode, the
// into a new data structure. This is needed because in UniqueCentralParticle mode, the
...
@@ -156,7 +153,7 @@ void CpuCustomManyParticleForce::threadComputeForce(ThreadPool& threads, int thr
...
@@ -156,7 +153,7 @@ void CpuCustomManyParticleForce::threadComputeForce(ThreadPool& threads, int thr
// Loop over interactions from the neighbor list.
// Loop over interactions from the neighbor list.
while
(
true
)
{
while
(
true
)
{
int
i
=
gmx_atomic_fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
1
)
;
int
i
=
atomicCounter
++
;
if
(
i
>=
numParticles
)
if
(
i
>=
numParticles
)
break
;
break
;
particleIndices
[
0
]
=
i
;
particleIndices
[
0
]
=
i
;
...
@@ -170,7 +167,7 @@ void CpuCustomManyParticleForce::threadComputeForce(ThreadPool& threads, int thr
...
@@ -170,7 +167,7 @@ void CpuCustomManyParticleForce::threadComputeForce(ThreadPool& threads, int thr
for
(
int
i
=
0
;
i
<
numParticles
;
i
++
)
for
(
int
i
=
0
;
i
<
numParticles
;
i
++
)
particles
[
i
]
=
i
;
particles
[
i
]
=
i
;
while
(
true
)
{
while
(
true
)
{
int
i
=
gmx_atomic_fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
1
)
;
int
i
=
atomicCounter
++
;
if
(
i
>=
numParticles
)
if
(
i
>=
numParticles
)
break
;
break
;
particleIndices
[
0
]
=
i
;
particleIndices
[
0
]
=
i
;
...
...
platforms/cpu/src/CpuCustomNonbondedForce.cpp
View file @
fdb501e3
/* Portions copyright (c) 2009-201
7
Stanford University and Simbios.
/* Portions copyright (c) 2009-201
8
Stanford University and Simbios.
* Contributors: Peter Eastman
* Contributors: Peter Eastman
*
*
* Permission is hereby granted, free of charge, to any person obtaining
* Permission is hereby granted, free of charge, to any person obtaining
...
@@ -28,7 +28,6 @@
...
@@ -28,7 +28,6 @@
#include "SimTKOpenMMUtilities.h"
#include "SimTKOpenMMUtilities.h"
#include "ReferenceForce.h"
#include "ReferenceForce.h"
#include "CpuCustomNonbondedForce.h"
#include "CpuCustomNonbondedForce.h"
#include "openmm/internal/gmx_atomic.h"
using
namespace
OpenMM
;
using
namespace
OpenMM
;
using
namespace
std
;
using
namespace
std
;
...
@@ -134,9 +133,7 @@ void CpuCustomNonbondedForce::calculatePairIxn(int numberOfAtoms, float* posq, v
...
@@ -134,9 +133,7 @@ void CpuCustomNonbondedForce::calculatePairIxn(int numberOfAtoms, float* posq, v
this
->
includeForce
=
includeForce
;
this
->
includeForce
=
includeForce
;
this
->
includeEnergy
=
includeEnergy
;
this
->
includeEnergy
=
includeEnergy
;
threadEnergy
.
resize
(
threads
.
getNumThreads
());
threadEnergy
.
resize
(
threads
.
getNumThreads
());
gmx_atomic_t
counter
;
atomicCounter
=
0
;
gmx_atomic_set
(
&
counter
,
0
);
this
->
atomicCounter
=
&
counter
;
// Signal the threads to start running and wait for them to finish.
// Signal the threads to start running and wait for them to finish.
...
@@ -177,7 +174,7 @@ void CpuCustomNonbondedForce::threadComputeForce(ThreadPool& threads, int thread
...
@@ -177,7 +174,7 @@ void CpuCustomNonbondedForce::threadComputeForce(ThreadPool& threads, int thread
// The user has specified interaction groups, so compute only the requested interactions.
// The user has specified interaction groups, so compute only the requested interactions.
while
(
true
)
{
while
(
true
)
{
int
i
=
gmx_atomic_fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
1
)
;
int
i
=
atomicCounter
++
;
if
(
i
>=
groupInteractions
.
size
())
if
(
i
>=
groupInteractions
.
size
())
break
;
break
;
int
atom1
=
groupInteractions
[
i
].
first
;
int
atom1
=
groupInteractions
[
i
].
first
;
...
@@ -193,7 +190,7 @@ void CpuCustomNonbondedForce::threadComputeForce(ThreadPool& threads, int thread
...
@@ -193,7 +190,7 @@ void CpuCustomNonbondedForce::threadComputeForce(ThreadPool& threads, int thread
// We are using a cutoff, so get the interactions from the neighbor list.
// We are using a cutoff, so get the interactions from the neighbor list.
while
(
true
)
{
while
(
true
)
{
int
blockIndex
=
gmx_atomic_fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
1
)
;
int
blockIndex
=
atomicCounter
++
;
if
(
blockIndex
>=
neighborList
->
getNumBlocks
())
if
(
blockIndex
>=
neighborList
->
getNumBlocks
())
break
;
break
;
const
int
blockSize
=
neighborList
->
getBlockSize
();
const
int
blockSize
=
neighborList
->
getBlockSize
();
...
@@ -219,7 +216,7 @@ void CpuCustomNonbondedForce::threadComputeForce(ThreadPool& threads, int thread
...
@@ -219,7 +216,7 @@ void CpuCustomNonbondedForce::threadComputeForce(ThreadPool& threads, int thread
// Every particle interacts with every other one.
// Every particle interacts with every other one.
while
(
true
)
{
while
(
true
)
{
int
ii
=
gmx_atomic_fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
1
)
;
int
ii
=
atomicCounter
++
;
if
(
ii
>=
numberOfAtoms
)
if
(
ii
>=
numberOfAtoms
)
break
;
break
;
for
(
int
jj
=
ii
+
1
;
jj
<
numberOfAtoms
;
jj
++
)
{
for
(
int
jj
=
ii
+
1
;
jj
<
numberOfAtoms
;
jj
++
)
{
...
...
platforms/cpu/src/CpuGBSAOBCForce.cpp
View file @
fdb501e3
/* Portions copyright (c) 2006-201
7
Stanford University and Simbios.
/* Portions copyright (c) 2006-201
8
Stanford University and Simbios.
* Contributors: Pande Group
* Contributors: Pande Group
*
*
* Permission is hereby granted, free of charge, to any person obtaining
* Permission is hereby granted, free of charge, to any person obtaining
...
@@ -24,7 +24,6 @@
...
@@ -24,7 +24,6 @@
#include "CpuGBSAOBCForce.h"
#include "CpuGBSAOBCForce.h"
#include "SimTKOpenMMRealType.h"
#include "SimTKOpenMMRealType.h"
#include "openmm/internal/vectorize.h"
#include "openmm/internal/vectorize.h"
#include "openmm/internal/gmx_atomic.h"
#include <algorithm>
#include <algorithm>
#include <cmath>
#include <cmath>
#include <cstdlib>
#include <cstdlib>
...
@@ -95,21 +94,19 @@ void CpuGBSAOBCForce::computeForce(const AlignedArray<float>& posq, vector<Align
...
@@ -95,21 +94,19 @@ void CpuGBSAOBCForce::computeForce(const AlignedArray<float>& posq, vector<Align
threadBornForces
.
resize
(
numThreads
);
threadBornForces
.
resize
(
numThreads
);
for
(
int
i
=
0
;
i
<
numThreads
;
i
++
)
for
(
int
i
=
0
;
i
<
numThreads
;
i
++
)
threadBornForces
[
i
].
resize
(
particleParams
.
size
()
+
3
);
threadBornForces
[
i
].
resize
(
particleParams
.
size
()
+
3
);
gmx_atomic_t
counter
;
this
->
atomicCounter
=
&
counter
;
// Signal the threads to start running and wait for them to finish.
// Signal the threads to start running and wait for them to finish.
gmx_
atomic
_set
(
&
c
ounter
,
0
)
;
atomic
C
ounter
=
0
;
threads
.
execute
([
&
]
(
ThreadPool
&
threads
,
int
threadIndex
)
{
threadComputeForce
(
threads
,
threadIndex
);
});
threads
.
execute
([
&
]
(
ThreadPool
&
threads
,
int
threadIndex
)
{
threadComputeForce
(
threads
,
threadIndex
);
});
threads
.
waitForThreads
();
// Compute Born radii
threads
.
waitForThreads
();
// Compute Born radii
gmx_
atomic
_set
(
&
c
ounter
,
0
)
;
atomic
C
ounter
=
0
;
threads
.
resumeThreads
();
threads
.
resumeThreads
();
threads
.
waitForThreads
();
// Compute surface area term
threads
.
waitForThreads
();
// Compute surface area term
gmx_
atomic
_set
(
&
c
ounter
,
0
)
;
atomic
C
ounter
=
0
;
threads
.
resumeThreads
();
threads
.
resumeThreads
();
threads
.
waitForThreads
();
// First loop
threads
.
waitForThreads
();
// First loop
gmx_
atomic
_set
(
&
c
ounter
,
0
)
;
atomic
C
ounter
=
0
;
threads
.
resumeThreads
();
threads
.
resumeThreads
();
threads
.
waitForThreads
();
// Second loop
threads
.
waitForThreads
();
// Second loop
...
@@ -138,7 +135,7 @@ void CpuGBSAOBCForce::threadComputeForce(ThreadPool& threads, int threadIndex) {
...
@@ -138,7 +135,7 @@ void CpuGBSAOBCForce::threadComputeForce(ThreadPool& threads, int threadIndex) {
// Calculate Born radii
// Calculate Born radii
while
(
true
)
{
while
(
true
)
{
int
blockStart
=
gmx_
atomic
_
fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
4
);
int
blockStart
=
atomic
Counter
.
fetch_add
(
4
);
if
(
blockStart
>=
numParticles
)
if
(
blockStart
>=
numParticles
)
break
;
break
;
int
numInBlock
=
min
(
4
,
numParticles
-
blockStart
);
int
numInBlock
=
min
(
4
,
numParticles
-
blockStart
);
...
@@ -215,7 +212,7 @@ void CpuGBSAOBCForce::threadComputeForce(ThreadPool& threads, int threadIndex) {
...
@@ -215,7 +212,7 @@ void CpuGBSAOBCForce::threadComputeForce(ThreadPool& threads, int threadIndex) {
for
(
int
i
=
0
;
i
<
numParticles
;
i
++
)
for
(
int
i
=
0
;
i
<
numParticles
;
i
++
)
bornForces
[
i
]
=
0.0
f
;
bornForces
[
i
]
=
0.0
f
;
while
(
true
)
{
while
(
true
)
{
int
atomI
=
gmx_atomic_fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
1
)
;
int
atomI
=
atomicCounter
++
;
if
(
atomI
>=
numParticles
)
if
(
atomI
>=
numParticles
)
break
;
break
;
if
(
bornRadii
[
atomI
]
>
0
)
{
if
(
bornRadii
[
atomI
]
>
0
)
{
...
@@ -240,7 +237,7 @@ void CpuGBSAOBCForce::threadComputeForce(ThreadPool& threads, int threadIndex) {
...
@@ -240,7 +237,7 @@ void CpuGBSAOBCForce::threadComputeForce(ThreadPool& threads, int threadIndex) {
else
else
preFactor
=
0.0
f
;
preFactor
=
0.0
f
;
while
(
true
)
{
while
(
true
)
{
int
blockStart
=
gmx_
atomic
_
fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
4
);
int
blockStart
=
atomic
Counter
.
fetch_add
(
4
);
if
(
blockStart
>=
numParticles
)
if
(
blockStart
>=
numParticles
)
break
;
break
;
int
numInBlock
=
min
(
4
,
numParticles
-
blockStart
);
int
numInBlock
=
min
(
4
,
numParticles
-
blockStart
);
...
@@ -318,7 +315,7 @@ void CpuGBSAOBCForce::threadComputeForce(ThreadPool& threads, int threadIndex) {
...
@@ -318,7 +315,7 @@ void CpuGBSAOBCForce::threadComputeForce(ThreadPool& threads, int threadIndex) {
// Second loop of Born energy computation.
// Second loop of Born energy computation.
while
(
true
)
{
while
(
true
)
{
int
blockStart
=
gmx_
atomic
_
fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
4
);
int
blockStart
=
atomic
Counter
.
fetch_add
(
4
);
if
(
blockStart
>=
numParticles
)
if
(
blockStart
>=
numParticles
)
break
;
break
;
fvec4
bornForce
(
0.0
f
);
fvec4
bornForce
(
0.0
f
);
...
...
platforms/cpu/src/CpuGayBerneForce.cpp
View file @
fdb501e3
...
@@ -6,7 +6,7 @@
...
@@ -6,7 +6,7 @@
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* *
* *
* Portions copyright (c) 2016-201
7
Stanford University and the Authors. *
* Portions copyright (c) 2016-201
8
Stanford University and the Authors. *
* Authors: Peter Eastman *
* Authors: Peter Eastman *
* Contributors: *
* Contributors: *
* *
* *
...
@@ -37,7 +37,6 @@
...
@@ -37,7 +37,6 @@
#include "ReferenceForce.h"
#include "ReferenceForce.h"
#include "openmm/OpenMMException.h"
#include "openmm/OpenMMException.h"
#include "openmm/GayBerneForce.h"
#include "openmm/GayBerneForce.h"
#include "openmm/internal/gmx_atomic.h"
#include <algorithm>
#include <algorithm>
#include <cmath>
#include <cmath>
...
@@ -120,9 +119,7 @@ double CpuGayBerneForce::calculateForce(const vector<Vec3>& positions, std::vect
...
@@ -120,9 +119,7 @@ double CpuGayBerneForce::calculateForce(const vector<Vec3>& positions, std::vect
this
->
boxVectors
=
boxVectors
;
this
->
boxVectors
=
boxVectors
;
threadEnergy
.
resize
(
numThreads
);
threadEnergy
.
resize
(
numThreads
);
threadTorque
.
resize
(
numThreads
);
threadTorque
.
resize
(
numThreads
);
gmx_atomic_t
counter
;
atomicCounter
=
0
;
gmx_atomic_set
(
&
counter
,
0
);
this
->
atomicCounter
=
&
counter
;
// Signal the threads to compute the pairwise interactions.
// Signal the threads to compute the pairwise interactions.
...
@@ -131,7 +128,7 @@ double CpuGayBerneForce::calculateForce(const vector<Vec3>& positions, std::vect
...
@@ -131,7 +128,7 @@ double CpuGayBerneForce::calculateForce(const vector<Vec3>& positions, std::vect
// Signal the threads to compute exceptions.
// Signal the threads to compute exceptions.
gmx_
atomic
_set
(
&
c
ounter
,
0
)
;
atomic
C
ounter
=
0
;
threads
.
resumeThreads
();
threads
.
resumeThreads
();
threads
.
waitForThreads
();
threads
.
waitForThreads
();
...
@@ -162,7 +159,7 @@ void CpuGayBerneForce::threadComputeForce(ThreadPool& threads, int threadIndex,
...
@@ -162,7 +159,7 @@ void CpuGayBerneForce::threadComputeForce(ThreadPool& threads, int threadIndex,
if
(
neighborList
==
NULL
)
{
if
(
neighborList
==
NULL
)
{
while
(
true
)
{
while
(
true
)
{
int
i
=
gmx_atomic_fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
1
)
;
int
i
=
atomicCounter
++
;
if
(
i
>=
numParticles
)
if
(
i
>=
numParticles
)
break
;
break
;
if
(
particles
[
i
].
sqrtEpsilon
==
0.0
f
)
if
(
particles
[
i
].
sqrtEpsilon
==
0.0
f
)
...
@@ -180,7 +177,7 @@ void CpuGayBerneForce::threadComputeForce(ThreadPool& threads, int threadIndex,
...
@@ -180,7 +177,7 @@ void CpuGayBerneForce::threadComputeForce(ThreadPool& threads, int threadIndex,
}
}
else
{
else
{
while
(
true
)
{
while
(
true
)
{
int
blockIndex
=
gmx_atomic_fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
1
)
;
int
blockIndex
=
atomicCounter
++
;
if
(
blockIndex
>=
neighborList
->
getNumBlocks
())
if
(
blockIndex
>=
neighborList
->
getNumBlocks
())
break
;
break
;
const
int
blockSize
=
neighborList
->
getBlockSize
();
const
int
blockSize
=
neighborList
->
getBlockSize
();
...
@@ -211,7 +208,7 @@ void CpuGayBerneForce::threadComputeForce(ThreadPool& threads, int threadIndex,
...
@@ -211,7 +208,7 @@ void CpuGayBerneForce::threadComputeForce(ThreadPool& threads, int threadIndex,
int
numExceptions
=
exceptions
.
size
();
int
numExceptions
=
exceptions
.
size
();
const
int
groupSize
=
max
(
1
,
numExceptions
/
(
10
*
numThreads
));
const
int
groupSize
=
max
(
1
,
numExceptions
/
(
10
*
numThreads
));
while
(
true
)
{
while
(
true
)
{
int
start
=
gmx_
atomic
_
fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
groupSize
);
int
start
=
atomic
Counter
.
fetch_add
(
groupSize
);
if
(
start
>=
numExceptions
)
if
(
start
>=
numExceptions
)
break
;
break
;
int
end
=
min
(
start
+
groupSize
,
numExceptions
);
int
end
=
min
(
start
+
groupSize
,
numExceptions
);
...
...
platforms/cpu/src/CpuNeighborList.cpp
View file @
fdb501e3
...
@@ -6,7 +6,7 @@
...
@@ -6,7 +6,7 @@
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* *
* *
* Portions copyright (c) 2013-201
7
Stanford University and the Authors. *
* Portions copyright (c) 2013-201
8
Stanford University and the Authors. *
* Authors: Peter Eastman *
* Authors: Peter Eastman *
* Contributors: *
* Contributors: *
* *
* *
...
@@ -476,7 +476,7 @@ void CpuNeighborList::computeNeighborList(int numAtoms, const AlignedArray<float
...
@@ -476,7 +476,7 @@ void CpuNeighborList::computeNeighborList(int numAtoms, const AlignedArray<float
// Signal the threads to start running and wait for them to finish.
// Signal the threads to start running and wait for them to finish.
gmx_atomic_set
(
&
atomicCounter
,
0
)
;
atomicCounter
=
0
;
threads
.
resumeThreads
();
threads
.
resumeThreads
();
threads
.
waitForThreads
();
threads
.
waitForThreads
();
...
@@ -538,7 +538,7 @@ void CpuNeighborList::threadComputeNeighborList(ThreadPool& threads, int threadI
...
@@ -538,7 +538,7 @@ void CpuNeighborList::threadComputeNeighborList(ThreadPool& threads, int threadI
vector
<
float
>
blockAtomX
(
blockSize
),
blockAtomY
(
blockSize
),
blockAtomZ
(
blockSize
);
vector
<
float
>
blockAtomX
(
blockSize
),
blockAtomY
(
blockSize
),
blockAtomZ
(
blockSize
);
vector
<
VoxelIndex
>
atomVoxelIndex
;
vector
<
VoxelIndex
>
atomVoxelIndex
;
while
(
true
)
{
while
(
true
)
{
int
i
=
gmx_atomic_fetch_add
(
&
atomicCounter
,
1
)
;
int
i
=
atomicCounter
++
;
if
(
i
>=
numBlocks
)
if
(
i
>=
numBlocks
)
break
;
break
;
...
...
platforms/cpu/src/CpuNonbondedForce.cpp
View file @
fdb501e3
/* Portions copyright (c) 2006-201
7
Stanford University and Simbios.
/* Portions copyright (c) 2006-201
8
Stanford University and Simbios.
* Contributors: Pande Group
* Contributors: Pande Group
*
*
* Permission is hereby granted, free of charge, to any person obtaining
* Permission is hereby granted, free of charge, to any person obtaining
...
@@ -28,7 +28,6 @@
...
@@ -28,7 +28,6 @@
#include "CpuNonbondedForce.h"
#include "CpuNonbondedForce.h"
#include "ReferenceForce.h"
#include "ReferenceForce.h"
#include "ReferencePME.h"
#include "ReferencePME.h"
#include "openmm/internal/gmx_atomic.h"
#include <algorithm>
#include <algorithm>
#include <iostream>
#include <iostream>
...
@@ -389,9 +388,7 @@ void CpuNonbondedForce::calculateDirectIxn(int numberOfAtoms, float* posq, const
...
@@ -389,9 +388,7 @@ void CpuNonbondedForce::calculateDirectIxn(int numberOfAtoms, float* posq, const
this
->
threadForce
=
&
threadForce
;
this
->
threadForce
=
&
threadForce
;
includeEnergy
=
(
totalEnergy
!=
NULL
);
includeEnergy
=
(
totalEnergy
!=
NULL
);
threadEnergy
.
resize
(
threads
.
getNumThreads
());
threadEnergy
.
resize
(
threads
.
getNumThreads
());
gmx_atomic_t
counter
;
atomicCounter
=
0
;
gmx_atomic_set
(
&
counter
,
0
);
this
->
atomicCounter
=
&
counter
;
// Signal the threads to start running and wait for them to finish.
// Signal the threads to start running and wait for them to finish.
...
@@ -401,7 +398,7 @@ void CpuNonbondedForce::calculateDirectIxn(int numberOfAtoms, float* posq, const
...
@@ -401,7 +398,7 @@ void CpuNonbondedForce::calculateDirectIxn(int numberOfAtoms, float* posq, const
// Signal the threads to subtract the exclusions.
// Signal the threads to subtract the exclusions.
if
(
ewald
||
pme
)
{
if
(
ewald
||
pme
)
{
gmx_
atomic
_set
(
&
c
ounter
,
0
)
;
atomic
C
ounter
=
0
;
threads
.
resumeThreads
();
threads
.
resumeThreads
();
threads
.
waitForThreads
();
threads
.
waitForThreads
();
}
}
...
@@ -429,7 +426,7 @@ void CpuNonbondedForce::threadComputeDirect(ThreadPool& threads, int threadIndex
...
@@ -429,7 +426,7 @@ void CpuNonbondedForce::threadComputeDirect(ThreadPool& threads, int threadIndex
if
(
ewald
||
pme
||
ljpme
)
{
if
(
ewald
||
pme
||
ljpme
)
{
// Compute the interactions from the neighbor list.
// Compute the interactions from the neighbor list.
while
(
true
)
{
while
(
true
)
{
int
nextBlock
=
gmx_atomic_fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
1
)
;
int
nextBlock
=
atomicCounter
++
;
if
(
nextBlock
>=
neighborList
->
getNumBlocks
())
if
(
nextBlock
>=
neighborList
->
getNumBlocks
())
break
;
break
;
calculateBlockEwaldIxn
(
nextBlock
,
forces
,
energyPtr
,
boxSize
,
invBoxSize
);
calculateBlockEwaldIxn
(
nextBlock
,
forces
,
energyPtr
,
boxSize
,
invBoxSize
);
...
@@ -440,7 +437,7 @@ void CpuNonbondedForce::threadComputeDirect(ThreadPool& threads, int threadIndex
...
@@ -440,7 +437,7 @@ void CpuNonbondedForce::threadComputeDirect(ThreadPool& threads, int threadIndex
threads
.
syncThreads
();
threads
.
syncThreads
();
const
int
groupSize
=
max
(
1
,
numberOfAtoms
/
(
10
*
numThreads
));
const
int
groupSize
=
max
(
1
,
numberOfAtoms
/
(
10
*
numThreads
));
while
(
true
)
{
while
(
true
)
{
int
start
=
gmx_
atomic
_
fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
groupSize
);
int
start
=
atomic
Counter
.
fetch_add
(
groupSize
);
if
(
start
>=
numberOfAtoms
)
if
(
start
>=
numberOfAtoms
)
break
;
break
;
int
end
=
min
(
start
+
groupSize
,
numberOfAtoms
);
int
end
=
min
(
start
+
groupSize
,
numberOfAtoms
);
...
@@ -490,7 +487,7 @@ void CpuNonbondedForce::threadComputeDirect(ThreadPool& threads, int threadIndex
...
@@ -490,7 +487,7 @@ void CpuNonbondedForce::threadComputeDirect(ThreadPool& threads, int threadIndex
// Compute the interactions from the neighbor list.
// Compute the interactions from the neighbor list.
while
(
true
)
{
while
(
true
)
{
int
nextBlock
=
gmx_atomic_fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
1
)
;
int
nextBlock
=
atomicCounter
++
;
if
(
nextBlock
>=
neighborList
->
getNumBlocks
())
if
(
nextBlock
>=
neighborList
->
getNumBlocks
())
break
;
break
;
calculateBlockIxn
(
nextBlock
,
forces
,
energyPtr
,
boxSize
,
invBoxSize
);
calculateBlockIxn
(
nextBlock
,
forces
,
energyPtr
,
boxSize
,
invBoxSize
);
...
@@ -500,7 +497,7 @@ void CpuNonbondedForce::threadComputeDirect(ThreadPool& threads, int threadIndex
...
@@ -500,7 +497,7 @@ void CpuNonbondedForce::threadComputeDirect(ThreadPool& threads, int threadIndex
// Loop over all atom pairs
// Loop over all atom pairs
while
(
true
)
{
while
(
true
)
{
int
i
=
gmx_atomic_fetch_add
(
reinterpret_cast
<
gmx_atomic_t
*>
(
atomicCounter
),
1
)
;
int
i
=
atomicCounter
++
;
if
(
i
>=
numberOfAtoms
)
if
(
i
>=
numberOfAtoms
)
break
;
break
;
for
(
int
j
=
i
+
1
;
j
<
numberOfAtoms
;
j
++
)
for
(
int
j
=
i
+
1
;
j
<
numberOfAtoms
;
j
++
)
...
...
platforms/cpu/src/CpuSETTLE.cpp
View file @
fdb501e3
...
@@ -6,7 +6,7 @@
...
@@ -6,7 +6,7 @@
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* *
* *
* Portions copyright (c) 2013-201
7
Stanford University and the Authors. *
* Portions copyright (c) 2013-201
8
Stanford University and the Authors. *
* Authors: Peter Eastman *
* Authors: Peter Eastman *
* Contributors: *
* Contributors: *
* *
* *
...
@@ -30,7 +30,7 @@
...
@@ -30,7 +30,7 @@
* -------------------------------------------------------------------------- */
* -------------------------------------------------------------------------- */
#include "CpuSETTLE.h"
#include "CpuSETTLE.h"
#include
"openmm/internal/gmx_
atomic
.h"
#include
<
atomic
>
using
namespace
OpenMM
;
using
namespace
OpenMM
;
using
namespace
std
;
using
namespace
std
;
...
@@ -61,11 +61,11 @@ CpuSETTLE::~CpuSETTLE() {
...
@@ -61,11 +61,11 @@ CpuSETTLE::~CpuSETTLE() {
}
}
void
CpuSETTLE
::
apply
(
vector
<
OpenMM
::
Vec3
>&
atomCoordinates
,
vector
<
OpenMM
::
Vec3
>&
atomCoordinatesP
,
vector
<
double
>&
inverseMasses
,
double
tolerance
)
{
void
CpuSETTLE
::
apply
(
vector
<
OpenMM
::
Vec3
>&
atomCoordinates
,
vector
<
OpenMM
::
Vec3
>&
atomCoordinatesP
,
vector
<
double
>&
inverseMasses
,
double
tolerance
)
{
gmx_
atomic
_t
atomicCounter
;
atomic
<
int
>
atomicCounter
;
gmx_atomic_set
(
&
atomicCounter
,
0
)
;
atomicCounter
=
0
;
threads
.
execute
([
&
]
(
ThreadPool
&
threads
,
int
threadIndex
)
{
threads
.
execute
([
&
]
(
ThreadPool
&
threads
,
int
threadIndex
)
{
while
(
true
)
{
while
(
true
)
{
int
index
=
gmx_atomic_fetch_add
(
&
atomicCounter
,
1
)
;
int
index
=
atomicCounter
++
;
if
(
index
>=
threadSettle
.
size
())
if
(
index
>=
threadSettle
.
size
())
break
;
break
;
threadSettle
[
index
]
->
apply
(
atomCoordinates
,
atomCoordinatesP
,
inverseMasses
,
tolerance
);
threadSettle
[
index
]
->
apply
(
atomCoordinates
,
atomCoordinatesP
,
inverseMasses
,
tolerance
);
...
@@ -75,11 +75,11 @@ void CpuSETTLE::apply(vector<OpenMM::Vec3>& atomCoordinates, vector<OpenMM::Vec3
...
@@ -75,11 +75,11 @@ void CpuSETTLE::apply(vector<OpenMM::Vec3>& atomCoordinates, vector<OpenMM::Vec3
}
}
void
CpuSETTLE
::
applyToVelocities
(
vector
<
OpenMM
::
Vec3
>&
atomCoordinates
,
vector
<
OpenMM
::
Vec3
>&
velocities
,
vector
<
double
>&
inverseMasses
,
double
tolerance
)
{
void
CpuSETTLE
::
applyToVelocities
(
vector
<
OpenMM
::
Vec3
>&
atomCoordinates
,
vector
<
OpenMM
::
Vec3
>&
velocities
,
vector
<
double
>&
inverseMasses
,
double
tolerance
)
{
gmx_
atomic
_t
atomicCounter
;
atomic
<
int
>
atomicCounter
;
gmx_atomic_set
(
&
atomicCounter
,
0
)
;
atomicCounter
=
0
;
threads
.
execute
([
&
]
(
ThreadPool
&
threads
,
int
threadIndex
)
{
threads
.
execute
([
&
]
(
ThreadPool
&
threads
,
int
threadIndex
)
{
while
(
true
)
{
while
(
true
)
{
int
index
=
gmx_atomic_fetch_add
(
&
atomicCounter
,
1
)
;
int
index
=
atomicCounter
++
;
if
(
index
>=
threadSettle
.
size
())
if
(
index
>=
threadSettle
.
size
())
break
;
break
;
threadSettle
[
index
]
->
applyToVelocities
(
atomCoordinates
,
velocities
,
inverseMasses
,
tolerance
);
threadSettle
[
index
]
->
applyToVelocities
(
atomCoordinates
,
velocities
,
inverseMasses
,
tolerance
);
...
...
plugins/cpupme/src/CpuPmeKernels.cpp
View file @
fdb501e3
...
@@ -6,7 +6,7 @@
...
@@ -6,7 +6,7 @@
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* *
* *
* Portions copyright (c) 2013-201
7
Stanford University and the Authors. *
* Portions copyright (c) 2013-201
8
Stanford University and the Authors. *
* Authors: Peter Eastman *
* Authors: Peter Eastman *
* Contributors: *
* Contributors: *
* *
* *
...
@@ -52,7 +52,7 @@ bool CpuCalcDispersionPmeReciprocalForceKernel::hasInitializedThreads = false;
...
@@ -52,7 +52,7 @@ bool CpuCalcDispersionPmeReciprocalForceKernel::hasInitializedThreads = false;
int
CpuCalcDispersionPmeReciprocalForceKernel
::
numThreads
=
0
;
int
CpuCalcDispersionPmeReciprocalForceKernel
::
numThreads
=
0
;
static
void
spreadCharge
(
float
*
posq
,
float
*
grid
,
int
gridx
,
int
gridy
,
int
gridz
,
int
numParticles
,
Vec3
*
periodicBoxVectors
,
Vec3
*
recipBoxVectors
,
static
void
spreadCharge
(
float
*
posq
,
float
*
grid
,
int
gridx
,
int
gridy
,
int
gridz
,
int
numParticles
,
Vec3
*
periodicBoxVectors
,
Vec3
*
recipBoxVectors
,
gmx_
atomic
_t
&
atomicCounter
,
const
float
epsilonFactor
,
int
threadIndex
,
int
numThreads
,
bool
deterministic
)
{
atomic
<
int
>
&
atomicCounter
,
const
float
epsilonFactor
,
int
threadIndex
,
int
numThreads
,
bool
deterministic
)
{
float
temp
[
4
];
float
temp
[
4
];
fvec4
boxSize
((
float
)
periodicBoxVectors
[
0
][
0
],
(
float
)
periodicBoxVectors
[
1
][
1
],
(
float
)
periodicBoxVectors
[
2
][
2
],
0
);
fvec4
boxSize
((
float
)
periodicBoxVectors
[
0
][
0
],
(
float
)
periodicBoxVectors
[
1
][
1
],
(
float
)
periodicBoxVectors
[
2
][
2
],
0
);
fvec4
invBoxSize
((
float
)
recipBoxVectors
[
0
][
0
],
(
float
)
recipBoxVectors
[
1
][
1
],
(
float
)
recipBoxVectors
[
2
][
2
],
0
);
fvec4
invBoxSize
((
float
)
recipBoxVectors
[
0
][
0
],
(
float
)
recipBoxVectors
[
1
][
1
],
(
float
)
recipBoxVectors
[
2
][
2
],
0
);
...
@@ -69,7 +69,7 @@ static void spreadCharge(float* posq, float* grid, int gridx, int gridy, int gri
...
@@ -69,7 +69,7 @@ static void spreadCharge(float* posq, float* grid, int gridx, int gridy, int gri
int
i
=
threadIndex
;
int
i
=
threadIndex
;
while
(
true
)
{
while
(
true
)
{
if
(
!
deterministic
)
if
(
!
deterministic
)
i
=
gmx_atomic_fetch_add
(
&
atomicCounter
,
1
)
;
i
=
atomicCounter
++
;
if
(
i
>=
numParticles
)
if
(
i
>=
numParticles
)
break
;
break
;
...
@@ -310,7 +310,7 @@ static void reciprocalConvolution(int start, int end, fftwf_complex* grid, vecto
...
@@ -310,7 +310,7 @@ static void reciprocalConvolution(int start, int end, fftwf_complex* grid, vecto
}
}
}
}
static
void
interpolateForces
(
float
*
posq
,
float
*
force
,
float
*
grid
,
int
gridx
,
int
gridy
,
int
gridz
,
int
numParticles
,
Vec3
*
periodicBoxVectors
,
Vec3
*
recipBoxVectors
,
gmx_
atomic
_t
&
atomicCounter
,
const
float
epsilonFactor
)
{
static
void
interpolateForces
(
float
*
posq
,
float
*
force
,
float
*
grid
,
int
gridx
,
int
gridy
,
int
gridz
,
int
numParticles
,
Vec3
*
periodicBoxVectors
,
Vec3
*
recipBoxVectors
,
atomic
<
int
>
&
atomicCounter
,
const
float
epsilonFactor
)
{
fvec4
boxSize
((
float
)
periodicBoxVectors
[
0
][
0
],
(
float
)
periodicBoxVectors
[
1
][
1
],
(
float
)
periodicBoxVectors
[
2
][
2
],
0
);
fvec4
boxSize
((
float
)
periodicBoxVectors
[
0
][
0
],
(
float
)
periodicBoxVectors
[
1
][
1
],
(
float
)
periodicBoxVectors
[
2
][
2
],
0
);
fvec4
invBoxSize
((
float
)
recipBoxVectors
[
0
][
0
],
(
float
)
recipBoxVectors
[
1
][
1
],
(
float
)
recipBoxVectors
[
2
][
2
],
0
);
fvec4
invBoxSize
((
float
)
recipBoxVectors
[
0
][
0
],
(
float
)
recipBoxVectors
[
1
][
1
],
(
float
)
recipBoxVectors
[
2
][
2
],
0
);
fvec4
recipBoxVec0
((
float
)
recipBoxVectors
[
0
][
0
],
(
float
)
recipBoxVectors
[
0
][
1
],
(
float
)
recipBoxVectors
[
0
][
2
],
0
);
fvec4
recipBoxVec0
((
float
)
recipBoxVectors
[
0
][
0
],
(
float
)
recipBoxVectors
[
0
][
1
],
(
float
)
recipBoxVectors
[
0
][
2
],
0
);
...
@@ -321,7 +321,7 @@ static void interpolateForces(float* posq, float* force, float* grid, int gridx,
...
@@ -321,7 +321,7 @@ static void interpolateForces(float* posq, float* force, float* grid, int gridx,
fvec4
one
(
1
);
fvec4
one
(
1
);
fvec4
scale
(
1.0
f
/
(
PME_ORDER
-
1
));
fvec4
scale
(
1.0
f
/
(
PME_ORDER
-
1
));
while
(
true
)
{
while
(
true
)
{
int
i
=
gmx_atomic_fetch_add
(
&
atomicCounter
,
1
)
;
int
i
=
atomicCounter
++
;
if
(
i
>=
numParticles
)
if
(
i
>=
numParticles
)
break
;
break
;
...
@@ -545,7 +545,7 @@ void CpuCalcPmeReciprocalForceKernel::runMainThread() {
...
@@ -545,7 +545,7 @@ void CpuCalcPmeReciprocalForceKernel::runMainThread() {
if
(
isDeleted
)
if
(
isDeleted
)
break
;
break
;
posq
=
io
->
getPosq
();
posq
=
io
->
getPosq
();
gmx_atomic_set
(
&
atomicCounter
,
0
)
;
atomicCounter
=
0
;
threads
.
execute
([
&
]
(
ThreadPool
&
threads
,
int
threadIndex
)
{
runWorkerThread
(
threads
,
threadIndex
);
});
// Signal threads to perform charge spreading.
threads
.
execute
([
&
]
(
ThreadPool
&
threads
,
int
threadIndex
)
{
runWorkerThread
(
threads
,
threadIndex
);
});
// Signal threads to perform charge spreading.
threads
.
waitForThreads
();
threads
.
waitForThreads
();
threads
.
resumeThreads
();
// Signal threads to sum the charge grids.
threads
.
resumeThreads
();
// Signal threads to sum the charge grids.
...
@@ -564,7 +564,7 @@ void CpuCalcPmeReciprocalForceKernel::runMainThread() {
...
@@ -564,7 +564,7 @@ void CpuCalcPmeReciprocalForceKernel::runMainThread() {
threads
.
resumeThreads
();
// Signal threads to perform reciprocal convolution.
threads
.
resumeThreads
();
// Signal threads to perform reciprocal convolution.
threads
.
waitForThreads
();
threads
.
waitForThreads
();
fftwf_execute_dft_c2r
(
backwardFFT
,
complexGrid
,
realGrid
);
fftwf_execute_dft_c2r
(
backwardFFT
,
complexGrid
,
realGrid
);
gmx_atomic_set
(
&
atomicCounter
,
0
)
;
atomicCounter
=
0
;
threads
.
resumeThreads
();
// Signal threads to interpolate forces.
threads
.
resumeThreads
();
// Signal threads to interpolate forces.
threads
.
waitForThreads
();
threads
.
waitForThreads
();
isFinished
=
true
;
isFinished
=
true
;
...
@@ -837,7 +837,7 @@ void CpuCalcDispersionPmeReciprocalForceKernel::runMainThread() {
...
@@ -837,7 +837,7 @@ void CpuCalcDispersionPmeReciprocalForceKernel::runMainThread() {
break
;
break
;
posq
=
io
->
getPosq
();
posq
=
io
->
getPosq
();
ComputeTask
task
(
*
this
);
ComputeTask
task
(
*
this
);
gmx_atomic_set
(
&
atomicCounter
,
0
)
;
atomicCounter
=
0
;
threads
.
execute
(
task
);
// Signal threads to perform charge spreading.
threads
.
execute
(
task
);
// Signal threads to perform charge spreading.
threads
.
waitForThreads
();
threads
.
waitForThreads
();
threads
.
resumeThreads
();
// Signal threads to sum the charge grids.
threads
.
resumeThreads
();
// Signal threads to sum the charge grids.
...
@@ -856,7 +856,7 @@ void CpuCalcDispersionPmeReciprocalForceKernel::runMainThread() {
...
@@ -856,7 +856,7 @@ void CpuCalcDispersionPmeReciprocalForceKernel::runMainThread() {
threads
.
resumeThreads
();
// Signal threads to perform reciprocal convolution.
threads
.
resumeThreads
();
// Signal threads to perform reciprocal convolution.
threads
.
waitForThreads
();
threads
.
waitForThreads
();
fftwf_execute_dft_c2r
(
backwardFFT
,
complexGrid
,
realGrid
);
fftwf_execute_dft_c2r
(
backwardFFT
,
complexGrid
,
realGrid
);
gmx_atomic_set
(
&
atomicCounter
,
0
)
;
atomicCounter
=
0
;
threads
.
resumeThreads
();
// Signal threads to interpolate forces.
threads
.
resumeThreads
();
// Signal threads to interpolate forces.
threads
.
waitForThreads
();
threads
.
waitForThreads
();
isFinished
=
true
;
isFinished
=
true
;
...
...
plugins/cpupme/src/CpuPmeKernels.h
View file @
fdb501e3
...
@@ -9,7 +9,7 @@
...
@@ -9,7 +9,7 @@
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Biological Structures at Stanford, funded under the NIH Roadmap for *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* Medical Research, grant U54 GM072970. See https://simtk.org. *
* *
* *
* Portions copyright (c) 2013-201
7
Stanford University and the Authors. *
* Portions copyright (c) 2013-201
8
Stanford University and the Authors. *
* Authors: Peter Eastman *
* Authors: Peter Eastman *
* Contributors: *
* Contributors: *
* *
* *
...
@@ -36,8 +36,8 @@
...
@@ -36,8 +36,8 @@
#include "internal/windowsExportPme.h"
#include "internal/windowsExportPme.h"
#include "openmm/kernels.h"
#include "openmm/kernels.h"
#include "openmm/Vec3.h"
#include "openmm/Vec3.h"
#include "openmm/internal/gmx_atomic.h"
#include "openmm/internal/ThreadPool.h"
#include "openmm/internal/ThreadPool.h"
#include <atomic>
#include <fftw3.h>
#include <fftw3.h>
#include <pthread.h>
#include <pthread.h>
#include <vector>
#include <vector>
...
@@ -132,7 +132,7 @@ private:
...
@@ -132,7 +132,7 @@ private:
float
*
posq
;
float
*
posq
;
Vec3
periodicBoxVectors
[
3
],
recipBoxVectors
[
3
];
Vec3
periodicBoxVectors
[
3
],
recipBoxVectors
[
3
];
bool
includeEnergy
;
bool
includeEnergy
;
gmx_
atomic
_t
atomicCounter
;
std
::
atomic
<
int
>
atomicCounter
;
};
};
...
@@ -226,7 +226,7 @@ private:
...
@@ -226,7 +226,7 @@ private:
float
*
posq
;
float
*
posq
;
Vec3
periodicBoxVectors
[
3
],
recipBoxVectors
[
3
];
Vec3
periodicBoxVectors
[
3
],
recipBoxVectors
[
3
];
bool
includeEnergy
;
bool
includeEnergy
;
gmx_
atomic
_t
atomicCounter
;
std
::
atomic
<
int
>
atomicCounter
;
};
};
}
// namespace OpenMM
}
// namespace OpenMM
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment