Unverified commit f5166695, authored by Josh A. Mitchell, committed by GitHub

Reorganise documentation (#3165)

* Break user guide into parts

* Break up file I missed

* Add basic .gitignore to suit out-of-tree builds in build, build1, build2 ... build9

* Small changes to autonumber.py for doc compilation

This is a small change designed not to fix the incorrect logic of
autonumber.py, but just to get the docs compiling. While assigning
numbers, the code now just ignores the autonumber_by_chapter
setting for a particular object if it can't find the appropriate
section in section_numbers. This is a temporary fix!

* Include part and chapter in autonumbered objects

* Fix autonumber.py to correctly reference autonumber roles in file-level sections

* Unify and simplify styling across documentation

* Break dev docs down into individual chapter-files

* Correct absolute links to relative

* Disable browser suggestions for lunrsearch box in API docs

* Remove part name from chapter 2.1

* Rename ambiguous 'Home' link to OpenMM.org

* Typo

* Minor fixes and reversions

Reverts some changes I had made and later thought better of,
and fixes some recurring typos across the documentation.

* Number developers guide chapters

* Fix responsiveness

* Remove header.rst and center captions

* Add a level of depth to main TOC and styling to accommodate

* Add third level to Part-level TOCs

* Use :numref: instead of :ref: to number links to sections

* Continuously number chapters in user guide

* navbar links to other docs relative, not absolute

* Improve navbar spacing with new template

* Fix sidebar while allowing it to scroll

* Hard -> Soft links for navigation.html template

* Add navigation.html template to cmakelists

* Add another level of .. to relative links

* Fix flex on C++ and remove layer of ..
parent 1344f2e0
__pycache__
build
build?
@@ -4,6 +4,9 @@ file(GLOB STAGING_INPUT_FILES RELATIVE "${CMAKE_CURRENT_SOURCE_DIR}"
 "*.rst.jinja2"
 "*.py"
 "_static/logo.png"
+"_static/custom.css"
+"_templates/lunrsearch.html"
+"_templates/navigation.html"
 )
 set(WRAPPER_DOXYGEN_DIR "${CMAKE_CURRENT_BINARY_DIR}/doxygen")
@@ -51,6 +54,9 @@ add_custom_command(
 "${CMAKE_CURRENT_BINARY_DIR}/library.rst"
 "${CMAKE_CURRENT_BINARY_DIR}/index.rst"
 "${CMAKE_CURRENT_BINARY_DIR}/_static/logo.png"
+"${CMAKE_CURRENT_BINARY_DIR}/_static/custom.css"
+"${CMAKE_CURRENT_BINARY_DIR}/_templates/lunrsearch.html"
+"${CMAKE_CURRENT_BINARY_DIR}/_templates/navigation.html"
 "${WRAPPER_DOXYGEN_DIR}/xml/index.xml"
 )
......
../../api-python/_static/custom.css
\ No newline at end of file
../../api-python/_templates/lunrsearch.html
\ No newline at end of file
../../api-python/_templates/navigation.html
\ No newline at end of file
-import sys
 import os
+import sys
-extensions = ['sphinx.ext.mathjax','sphinx.ext.autosummary',
-'sphinx.ext.autodoc', 'sphinxcontrib.lunrsearch',
-'sphinxcontrib.autodoc_doxygen']
+extensions = [
+"sphinx.ext.mathjax",
+"sphinx.ext.autosummary",
+"sphinx.ext.autodoc",
+"sphinxcontrib.lunrsearch",
+"sphinxcontrib.autodoc_doxygen",
+]
 autosummary_generate = True
-autodoc_member_order = 'bysource'
-source_suffix = '.rst'
-master_doc = 'index'
-project = u'OpenMM'
-copyright = u'2015, Stanford University and the Authors'
-version = '@OPENMM_MAJOR_VERSION@.@OPENMM_MINOR_VERSION@'
-release = '@OPENMM_MAJOR_VERSION@.@OPENMM_MINOR_VERSION@'
-exclude_patterns = ['_build', '_templates']
-html_static_path = ['_static']
-templates_path = ['_templates']
-pygments_style = 'sphinx'
+autodoc_member_order = "bysource"
+source_suffix = ".rst"
+master_doc = "index"
+project = u"OpenMM C++ API"
+copyright = u"2015, Stanford University and the Authors"
+version = "@OPENMM_MAJOR_VERSION@.@OPENMM_MINOR_VERSION@"
+release = "@OPENMM_MAJOR_VERSION@.@OPENMM_MINOR_VERSION@"
+exclude_patterns = ["_build", "_templates"]
+html_static_path = ["_static"]
+templates_path = ["_templates"]
+pygments_style = "sphinx"
 html_theme = "alabaster"
 html_theme_options = {
-'description': "High performance molecular simulation on GPUs",
-'github_button': False,
-# 'github_user': 'pandegroup',
-# 'github_repo': 'openmm',
-'logo_name': False,
-'logo': 'logo.png',
+"github_button": False,
+"github_user": "openmm",
+"github_repo": "openmm",
+"logo_name": True,
+"logo": "logo.png",
+"extra_nav_links": [
+{
+"title": "OpenMM.org",
+"uri": "https://openmm.org",
+"relative": False,
+},
+{
+"title": "User's Manual",
+"uri": "../userguide/",
+"relative": True,
+},
+{
+"title": "Developer Guide",
+"uri": "../developerguide/",
+"relative": True,
+},
+{
+"title": "Python API reference",
+"uri": "../api-python/",
+"relative": True,
+},
+{
+"title": "GitHub",
+"uri": "https://github.com/openmm",
+"relative": False,
+},
+],
+"show_relbar_bottom": True,
 }
 html_sidebars = {
-'**': [
-'about.html',
-'searchbox.html',
-'navigation.html',
+"**": [
+"about.html",
+"lunrsearch.html",
+"navigation.html",
 ]
 }
......
@@ -7,7 +7,10 @@ file(GLOB STAGING_INPUT_FILES RELATIVE "${CMAKE_CURRENT_SOURCE_DIR}"
 "conf.py"
 "process-docstring.py"
 "_static/logo.png"
+"_static/custom.css"
 "_templates/class.rst"
+"_templates/lunrsearch.html"
+"_templates/navigation.html"
 )
 foreach(INIT_FILE ${STAGING_INPUT_FILES})
@@ -41,7 +44,10 @@ add_custom_command(
 "${CMAKE_CURRENT_BINARY_DIR}/library.rst"
 "${CMAKE_CURRENT_BINARY_DIR}/index.rst"
 "${CMAKE_CURRENT_BINARY_DIR}/_static/logo.png"
+"${CMAKE_CURRENT_BINARY_DIR}/_static/custom.css"
 "${CMAKE_CURRENT_BINARY_DIR}/_templates/class.rst"
+"${CMAKE_CURRENT_BINARY_DIR}/_templates/lunrsearch.html"
+"${CMAKE_CURRENT_BINARY_DIR}/_templates/navigation.html"
 PythonInstall
 )
 file(MAKE_DIRECTORY ${CMAKE_BINARY_DIR}/api-python/)
......
/* Reasonable defaults */
html {
overflow-x: hidden;
overflow-y: scroll;
text-rendering: optimizeLegibility;
text-size-adjust: 100%;
-moz-osx-font-smoothing: grayscale;
-webkit-font-smoothing: antialiased;
}
/* Fix responsiveness */
body {
overflow-x: hidden;
}
div.body {
min-width: unset;
}
@media screen and (max-width: 870px) {
div.sphinxsidebar p.logo {
display: unset;
}
}
@media screen and (max-width: 875px) {
ul {
margin-left: 30px;
}
div.sphinxsidebar {
width: 100vw;
padding: 0;
}
}
@media screen and (min-width: 871px) and (max-width: 940px) {
div.document {
width: 100vw
}
}
/* When search bar is in nav footer, don't let it stretch too far */
.searchformwrapper {
max-width: 250px;
}
/* Fix next/prev links in footer */
/* Don't just float the whole thing right */
nav#rellinks {
float: unset;
}
nav#rellinks ul {
padding-left: 0;
display: flex;
justify-content: space-between;
flex-wrap: wrap;
}
nav#rellinks li {
line-height: 1.3;
padding: 5px 0px;
}
nav#rellinks li:first-child {
display: block;
text-indent: -17px;
padding-left: 17px;
}
nav#rellinks li + li {
margin-left: auto;
text-align: right;
display: flex;
}
nav#rellinks li + li a {
display: inline-block;
margin-right: 5px;
}
nav#rellinks li + li:before {
content: "";
}
/* Put the title and logo side by side*/
.sphinxsidebarwrapper {
display: flex;
flex-wrap: wrap;
align-items: flex-start;
flex-direction: row-reverse;
justify-content: space-between;
align-content: flex-start;
}
.sphinxsidebar .logo-name {
flex-basis: 140px;
font-size: 20px;
}
.sphinxsidebar p.logo {
flex-basis: 60px;
text-align: right;
display: block;
margin-top: 0 !important;
}
/* Get control over the image */
.sphinxsidebar p.logo a {
height: auto;
display: block;
}
/* Make sure the remaining items use the width of
* the whole navbar and don't get squished together
* by flex.
*/
.sphinxsidebar .logo-name ~ * {
flex-basis: 100%
}
/* Emulate a placeholder rather than a heading for search */
.sphinxsidebar #searchbox form.search input[type="text"] {
background-image: url("data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg' version='1.1' height='50px' width='120px'><text x='10' y='17' fill='gray' font-size='15'>Search...</text></svg>");
background-repeat: no-repeat;
}
.sphinxsidebar #searchbox form.search input[type="text"]:focus {
background-image: none;
}
/* Hide unwanted elements*/
.sphinxsidebarwrapper #searchbox h3, /* Search heading */
.sphinxsidebarwrapper > h3, /* Navigation heading */
.sphinxsidebarwrapper p:empty, /* Empty elements taking up space */
.sphinxsidebar .logo-name + a[href], /* Inexplicable but ugly link */
.sphinxsidebarwrapper hr /* Horizontal rules */
{
display: none;
}
/* Hide logo on tiny screens */
@media screen and (max-width: 280px) {
div.sphinxsidebar p.logo {
display: none;
}
.sphinxsidebar .logo-name {
flex-basis: 100%;
}
}
/* Style TOC in sidebar more clearly */
.sphinxsidebarwrapper li.toctree-l1 {
padding: 0.15em 0;
line-height: 1.4;
}
.sphinxsidebarwrapper a.current,
.sphinxsidebarwrapper a.current:hover {
text-decoration: none;
border-bottom: none;
cursor: text;
}
/* Tweak spacing */
div.sphinxsidebarwrapper #searchbox {
margin-bottom: 0;
}
div.sphinxsidebarwrapper .nav-toctree > ul {
margin: 5px 0;
}
/* Enlarge space between toctrees and external links */
div.sphinxsidebarwrapper .nav-toctree {
margin-top: 15px;
margin-bottom: 15px;
}
div.sphinxsidebarwrapper .extra-nav-links {
margin-bottom: 0;
}
/* Custom body styling */
/* Center captions of figures, examples, etc. */
.body .caption {
text-align: center;
}
.body .toctree-l1 {
font-weight: bold;
}
.body .toctree-l2 {
font-weight: normal;
}
.body .toctree-l3 {
font-size: 0.8em;
}
/* Fix navbar to top */
@media screen and (min-width: 875px) {
.sphinxsidebar {
position: fixed;
height: 100vh;
overflow-y: hidden;
top: 0;
float: unset !important;
margin-left: 0 !important;
}
.sphinxsidebarwrapper {
height: calc(100% - 60px);
overflow-y: auto;
padding-top: 30px !important;
padding-bottom: 30px !important;
/* Hide scrollbar */
-ms-overflow-style: none; /* IE and Edge */
scrollbar-width: none; /* Firefox */
}
/* Hide scrollbar */
.sphinxsidebarwrapper::-webkit-scrollbar {
display: none
}
}
<!--
sphinxcontrib-lunrsearch injects its own search template at the front of the
line, so to overwrite it, I'm using this template with a different name, and
specifying it in html_sidebars. This is a temporary measure until we replace
or remove sphinxcontrib-lunrsearch
-->
<script type="text/javascript">
var Search = {
store : null,
setIndex : function (data) {
this.store = data.store;
},
};
</script>
{# The script searchindex.js contains the code Search.setIndex(...) where
the content is an object built from IndexBuilder.freeze(). So we need to
setup the Search.setIndex function beforehand just to store the data.
This should all be finished when onload fires, and at that point the code in
searchbox.js will pull the data out of Search.store and build the actual
index and callbacks.
#}
<script src="{{ pathto('searchindex.js', 1) }}" type="text/javascript"></script>
<form class="search" action="" method="get">
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
<input type="hidden" id="ls_lunrsearch-highlight" value="{{ lunrsearch_highlight }}" />
<input type="text" class="search-field" id="ls_search-field" name="q" placeholder="Search API..." autocomplete="off" />
<ul class="results" id="ls_search-results"></ul>
</form>
<div class="navigation-scrollbox">
<div class="nav-toctree">
{{ toctree(includehidden=theme_sidebar_includehidden, collapse=theme_sidebar_collapse) }}
</div>
{% if theme_extra_nav_links %}
<ul class="extra-nav-links">
{% for link in theme_extra_nav_links %}
<li class="toctree-l1">
<a href="{{ pathto(link.uri, 1) if link.relative else link.uri }}">
{{ link.title }}
</a>
</li>
{% endfor %}
</ul>
{% endif %}
</div>
 # -*- coding: utf-8 -*-
-import sys
 import os
+import sys
 import openmm.version
-extensions = ['sphinx.ext.mathjax', 'sphinx.ext.ifconfig', 'sphinx.ext.autosummary',
-'sphinx.ext.autodoc', 'sphinx.ext.napoleon', 'process-docstring',
-'sphinxcontrib.lunrsearch']
+extensions = [
+"sphinx.ext.mathjax",
+"sphinx.ext.ifconfig",
+"sphinx.ext.autosummary",
+"sphinx.ext.autodoc",
+"sphinx.ext.napoleon",
+"process-docstring",
+"sphinxcontrib.lunrsearch",
+]
 autosummary_generate = True
 autodoc_default_options = {
-'members': True,
-'inherited-members': True,
-'member-order': 'bysource'
+"members": True,
+"inherited-members": True,
+"member-order": "bysource",
 }
-source_suffix = '.rst'
-master_doc = 'index'
-project = u'OpenMM'
-copyright = u'2015, Stanford University and the Authors'
+source_suffix = ".rst"
+master_doc = "index"
+project = u"OpenMM Python API"
+copyright = u"2015, Stanford University and the Authors"
 version = openmm.version.short_version
 release = openmm.version.full_version
-exclude_patterns = ['_build', '_templates']
-html_static_path = ['_static']
-templates_path = ['_templates']
-pygments_style = 'sphinx'
+exclude_patterns = ["_build", "_templates"]
+html_static_path = ["_static"]
+templates_path = ["_templates"]
+pygments_style = "sphinx"
 html_theme = "alabaster"
 html_theme_options = {
-'description': "High performance molecular simulation on GPUs",
-'github_button': False,
-# 'github_user': 'pandegroup',
-# 'github_repo': 'openmm',
-'logo_name': False,
-'logo': 'logo.png',
+"github_button": False,
+"github_user": "openmm",
+"github_repo": "openmm",
+"logo_name": True,
+"logo": "logo.png",
+"extra_nav_links": [
+{
+"title": "OpenMM.org",
+"uri": "https://openmm.org",
+"relative": False,
+},
+{
+"title": "User's Manual",
+"uri": "../userguide/",
+"relative": True,
+},
+{
+"title": "Developer Guide",
+"uri": "../developerguide/",
+"relative": True,
+},
+{
+"title": "C++ API reference",
+"uri": "../api-c++/",
+"relative": True,
+},
+{
+"title": "GitHub",
+"uri": "https://github.com/openmm",
+"relative": False,
+},
+],
+"show_relbar_bottom": True,
 }
 html_sidebars = {
-'**': [
-'about.html',
-'searchbox.html',
-'navigation.html',
+"**": [
+"about.html",
+"lunrsearch.html",
+"navigation.html",
 ]
 }
......
.. role:: code
.. raw:: html
<style> .code {font-family:monospace;} </style>
<style> .caption {text-align:center;} </style>
.. highlight:: c++
Introduction
############
This guide describes the internal architecture of the OpenMM library. It is
targeted at developers who want to add features to OpenMM, either by modifying
the core library directly or by writing plugins. If you just want to write
applications that use OpenMM, you do not need to read this guide; the User's
Manual tells you everything you need to know. This guide is intended for
people who want to contribute to OpenMM itself.
It is organized as follows:
* Chapter :numref:`the-core-library` describes the architecture of the core OpenMM library. It
discusses how the high level and low level APIs relate to each other, and the
flow of execution between them.
* Chapter :numref:`writing-plugins` describes in detail how to write a plugin. It focuses on the two
most common types of plugins: those which define new Forces, and those which
implement new Platforms.
* Chapter :numref:`the-reference-platform` discusses the architecture of the reference Platform, providing
information relevant to writing reference implementations of new features.
* Chapter :numref:`the-cpu-platform` discusses the architecture of the CPU Platform, providing
information relevant to writing CPU implementations of new features.
* Chapter :numref:`the-opencl-platform` discusses the architecture of the OpenCL Platform, providing
information relevant to writing OpenCL implementations of new features.
* Chapter :numref:`the-cuda-platform` discusses the architecture of the CUDA Platform, providing
information relevant to writing CUDA implementations of new features.
* Chapter :numref:`common-compute` describes the Common Compute framework, which lets you
write a single implementation of a feature that can be used for both OpenCL and CUDA.
This guide assumes you are already familiar with the public API and how to use
OpenMM in applications. If that is not the case, you should first read the
User's Manual and work through some of the example programs. Pay especially
close attention to the “Introduction to the OpenMM Library” chapter, since it
introduces concepts that are important in understanding this guide.
.. role:: code
.. raw:: html
<style> .code {font-family:monospace;} </style>
<style> .caption {text-align:center;} </style>
.. highlight:: c++
.. _the-core-library:
The Core Library
################
OpenMM is based on a layered architecture, as shown in the following diagram:
.. figure:: ../images/ArchitectureLayers.jpg
:align: center
:width: 100%
:autonumber:`Figure,Architecture Layers`\ : OpenMM architecture
The public API layer consists of the classes you access when using OpenMM in an
application: System; Force and its subclasses; Integrator and its subclasses;
and Context. These classes define a public interface but do no computation.
The next layer down consists of “implementation” classes that mirror the public
API classes: ContextImpl, ForceImpl, and a subclass of ForceImpl for each
subclass of Force (HarmonicBondForceImpl, NonbondedForceImpl, etc.). These
objects are created automatically when you create a Context. They store
information related to a particular simulation, and define methods for
performing calculations.
Note that, whereas a Force is logically “part of” a System, a ForceImpl is
logically “part of” a Context. (See :autonumref:`Figure,API Relationships`\ .) If you create many Contexts
for simulating the same System, there is still only one System and only one copy
of each Force in it. But there will be separate ForceImpls for each Context,
and those ForceImpls store information related to their particular Contexts.
.. figure:: ../images/SystemContextRelationships.jpg
:align: center
:autonumber:`Figure,API Relationships`\ : Relationships between public API and implementation layer objects
Also note that there is no “IntegratorImpl” class, because it is not needed.
Integrator is already specific to one Context. Many Contexts can all simulate
the same System, but each of them must have its own Integrator, so information
specific to one simulation can be stored directly in the Integrator.
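The split between a shared Force and a per-Context ForceImpl can be illustrated with a small self-contained sketch. Every class here (ToyForce, ToyForceImpl, ToySystem, ToyContext) is a hypothetical stand-in invented for this example, not one of the real OpenMM classes; the only thing it is meant to show is that each Context asks the same Force object for its own implementation object.

```cpp
#include <cassert>
#include <memory>
#include <vector>

class ToyForceImpl;

// Hypothetical stand-in for a public API Force: it holds parameters only.
class ToyForce {
public:
    explicit ToyForce(double k) : k(k) {}
    double getK() const { return k; }
    // Each call creates a fresh implementation object for one Context.
    std::unique_ptr<ToyForceImpl> createImpl() const;
private:
    double k;
};

// Hypothetical stand-in for a ForceImpl: it holds per-Context state.
class ToyForceImpl {
public:
    explicit ToyForceImpl(const ToyForce& owner) : owner(owner) {}
    const ToyForce& getOwner() const { return owner; }
    int evaluations = 0;  // state that belongs to one Context only
private:
    const ToyForce& owner;
};

std::unique_ptr<ToyForceImpl> ToyForce::createImpl() const {
    return std::unique_ptr<ToyForceImpl>(new ToyForceImpl(*this));
}

// Hypothetical stand-in for a System: it owns the single copy of each Force.
struct ToySystem {
    std::vector<ToyForce> forces;
};

// Hypothetical stand-in for a Context: it owns one impl per Force.
class ToyContext {
public:
    explicit ToyContext(const ToySystem& system) {
        for (const ToyForce& force : system.forces)
            impls.push_back(force.createImpl());
        assert(impls.size() == system.forces.size());
    }
    std::vector<std::unique_ptr<ToyForceImpl>> impls;
};
```

Creating two ToyContext objects from one ToySystem yields two distinct ToyForceImpl objects that both refer back to the same ToyForce, which mirrors the relationship described above.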
The next layer down is the OpenMM Low Level API (OLLA). The important classes
in this layer are: Platform; Kernel; KernelImpl and its subclasses; and
KernelFactory. A Kernel is just a reference-counted pointer to a KernelImpl;
the real work is done by KernelImpl objects (or more precisely, by instances of
its subclasses). A KernelFactory creates KernelImpl objects, and a Platform
ties together a set of KernelFactories, as well as defining information that
applies generally to performing computations with that Platform.
All of these classes (except Kernel) are abstract. A particular Platform
provides concrete subclasses of all of them. For example, the reference
platform defines a Platform subclass called ReferencePlatform, a KernelFactory
subclass called ReferenceKernelFactory, and a concrete subclass of each abstract
KernelImpl type: ReferenceCalcNonbondedForceKernel extends
CalcNonbondedForceKernel (which in turn extends KernelImpl),
ReferenceIntegrateVerletStepKernel extends IntegrateVerletStepKernel, and so on.
We can understand this better by walking through the entire sequence of events
that takes place when you create a Context. As an example, suppose you create a
System; add a NonbondedForce to it; create a VerletIntegrator; and then create a
Context for them using the reference Platform. Here is what happens.
#. The Context constructor creates a ContextImpl.
#. The ContextImpl calls :code:`createImpl()` on each Force in the System,
which creates an instance of the appropriate ForceImpl subclass.
#. The ContextImpl calls :code:`contextCreated()` on the Platform, which
in turn calls :code:`setPlatformData()` on the ContextImpl. This allows
Platform-specific information to be stored in a ContextImpl. Every Platform has
its own mechanism for storing particle masses, constraint definitions, particle
positions, and so on. ContextImpl therefore allows the Platform to create an
arbitrary block of data and store it where it can be accessed by that Platform’s
kernels.
#. The ContextImpl calls :code:`createKernel()` on the Platform several
times to get instances of various kernels that it needs:
CalcKineticEnergyKernel, ApplyConstraintsKernel, etc.
#. For each kernel, the Platform looks up which KernelFactory has been
registered for that particular kernel. In this case, it will be a
ReferenceKernelFactory.
#. It calls :code:`createKernelImpl()` on the KernelFactory, which
creates and returns an instance of an appropriate KernelImpl subclass:
ReferenceCalcKineticEnergyKernel, ReferenceApplyConstraintsKernel, etc.
#. The ContextImpl loops over all of its ForceImpls and calls
:code:`initialize()` on each one.
#. Each ForceImpl asks the Platform to create whatever kernels it needs. In
this example, NonbondedForceImpl will request a CalcNonbondedForceKernel, and
get back a ReferenceCalcNonbondedForceKernel.
#. The ContextImpl calls :code:`initialize()` on the Integrator which, like
the other objects, requests kernels from the Platform. In this example,
VerletIntegrator requests an IntegrateVerletStepKernel and gets back a
ReferenceIntegrateVerletStepKernel.
At this point, the Context is fully initialized and ready for doing computation.
Reference implementations of various KernelImpls have been created, but they are
always referenced through abstract superclasses. Similarly, data structures
specific to the reference Platform have been created and stored in the
ContextImpl, but the format and content of these structures are opaque to the
ContextImpl. Whenever it needs to access them (for example, to get or set
particle positions), it does so through a kernel (UpdateStateDataKernel in this
case).
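The factory lookup in that sequence can be sketched in a few lines. This is a simplified model under stated assumptions, not the real OpenMM code: the classes below reuse the names Platform, Kernel, KernelImpl, and KernelFactory for readability, but their members and signatures are invented for this example, and ToyCalcNonbondedForceKernel and ToyKernelFactory are purely hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified stand-ins; the signatures are assumptions made for this sketch.
class KernelImpl {
public:
    virtual ~KernelImpl() = default;
    virtual std::string describe() const = 0;
};

// A Kernel is modelled as a reference-counted handle to a KernelImpl.
using Kernel = std::shared_ptr<KernelImpl>;

class KernelFactory {
public:
    virtual ~KernelFactory() = default;
    virtual Kernel createKernelImpl(const std::string& name) const = 0;
};

class Platform {
public:
    void registerKernelFactory(const std::string& name,
                               std::shared_ptr<KernelFactory> factory) {
        factories[name] = std::move(factory);
    }
    // Look up the factory registered for this kernel name and delegate to it.
    Kernel createKernel(const std::string& name) const {
        auto it = factories.find(name);
        if (it == factories.end())
            throw std::runtime_error("No factory registered for kernel " + name);
        return it->second->createKernelImpl(name);
    }
private:
    std::map<std::string, std::shared_ptr<KernelFactory>> factories;
};

// Hypothetical concrete kernel and factory for one toy platform.
class ToyCalcNonbondedForceKernel : public KernelImpl {
public:
    std::string describe() const override { return "toy nonbonded kernel"; }
};

class ToyKernelFactory : public KernelFactory {
public:
    Kernel createKernelImpl(const std::string& name) const override {
        if (name == "CalcNonbondedForce")
            return std::make_shared<ToyCalcNonbondedForceKernel>();
        throw std::runtime_error("Unsupported kernel " + name);
    }
};

// Registers the factory, then requests a kernel through the abstract types.
std::string demoCreateKernel() {
    Platform platform;
    platform.registerKernelFactory("CalcNonbondedForce",
                                   std::make_shared<ToyKernelFactory>());
    Kernel kernel = platform.createKernel("CalcNonbondedForce");
    assert(kernel != nullptr);
    return kernel->describe();
}
```

The caller only ever holds a Kernel handle, so it never needs to know which concrete KernelImpl subclass the factory produced.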
Now suppose that you call :code:`step()` on the VerletIntegrator. Here is
what happens to execute each time step.
#. The VerletIntegrator calls :code:`updateContextState()` on the
ContextImpl. This gives each Force an opportunity to modify the state of the
Context at the start of each time step.
#. The ContextImpl loops over its ForceImpls and calls
:code:`updateContextState()` on each one. In this case, our only ForceImpl is
a NonbondedForceImpl, which returns without doing anything. On the other hand,
if we had an AndersenThermostat in our System, its ForceImpl would invoke a
kernel to modify particle velocities.
#. The VerletIntegrator calls :code:`calcForcesAndEnergy()` on the
ContextImpl to request that the forces be computed.
#. The ContextImpl calls :code:`beginComputation()` on its
CalcForcesAndEnergyKernel. This initializes all the forces to zero and does any
other initialization the Platform requires before forces can be computed. For
example, some Platforms construct their nonbonded neighbor lists at this point.
#. The ContextImpl loops over its ForceImpls and calls
:code:`calcForcesAndEnergy()` on each one. In this case, we have a
NonbondedForceImpl which invokes its CalcNonbondedForceKernel to compute forces.
#. The ContextImpl then calls :code:`finishComputation()` on its
CalcForcesAndEnergyKernel. This does any additional work needed to determine
the final forces, such as summing the values from intermediate buffers.
#. Finally, the VerletIntegrator invokes its IntegrateVerletStepKernel. This
takes the forces, positions, and velocities that are stored in a
Platform-specific format in the ContextImpl, uses them to compute new positions
and velocities, and stores them in the ContextImpl.
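The order of calls in one time step can be traced with a toy model. The sketch below is only an illustration under loud assumptions: ToyContextImpl, ToyForceImpl, ToySpringForceImpl, and toyStep are hypothetical names, the "kernels" are inlined as plain loops, and the integration is a simple unit-mass update rather than the real Verlet kernel. The log vector exists purely to record the sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-Context state.
struct ToyContextImpl {
    std::vector<double> positions;
    std::vector<double> velocities;
    std::vector<double> forces;
    std::vector<std::string> log;  // records the order of calls
};

class ToyForceImpl {
public:
    virtual ~ToyForceImpl() = default;
    // Default: a force that does not modify the state, like NonbondedForceImpl.
    virtual void updateContextState(ToyContextImpl& ctx) {
        ctx.log.push_back("updateContextState");
    }
    virtual void calcForcesAndEnergy(ToyContextImpl& ctx) = 0;
};

// A 1D spring toward the origin, used only so the numbers are easy to check.
class ToySpringForceImpl : public ToyForceImpl {
public:
    explicit ToySpringForceImpl(double k) : k(k) {}
    void calcForcesAndEnergy(ToyContextImpl& ctx) override {
        ctx.log.push_back("calcForcesAndEnergy");
        for (std::size_t i = 0; i < ctx.positions.size(); i++)
            ctx.forces[i] += -k * ctx.positions[i];  // add, never overwrite
    }
private:
    double k;
};

void toyStep(ToyContextImpl& ctx, const std::vector<ToyForceImpl*>& forceImpls,
             double dt) {
    assert(ctx.forces.size() == ctx.positions.size());
    assert(ctx.velocities.size() == ctx.positions.size());
    // Each force gets a chance to modify the state first.
    for (ToyForceImpl* impl : forceImpls)
        impl->updateContextState(ctx);
    // beginComputation: reset all forces to zero.
    ctx.log.push_back("beginComputation");
    for (double& f : ctx.forces)
        f = 0.0;
    // Every ForceImpl adds its contribution.
    for (ToyForceImpl* impl : forceImpls)
        impl->calcForcesAndEnergy(ctx);
    // finishComputation: nothing left to sum in this toy model.
    ctx.log.push_back("finishComputation");
    // Integration: unit mass, velocity updated before position.
    ctx.log.push_back("integrate");
    for (std::size_t i = 0; i < ctx.positions.size(); i++) {
        ctx.velocities[i] += dt * ctx.forces[i];
        ctx.positions[i] += dt * ctx.velocities[i];
    }
}
```

In the real library each of these phases is delegated to a kernel supplied by the Platform; the sketch collapses them into one function to make the ordering visible.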
.. role:: code
.. raw:: html
<style> .code {font-family:monospace;} </style>
<style> .caption {text-align:center;} </style>
.. highlight:: c++
.. _writing-plugins:
Writing Plugins
###############
A plugin is a dynamic library that adds new features to OpenMM. It is typically
stored in the :code:`lib/plugins` directory inside your OpenMM installation,
and gets loaded along with all other plugins when the user calls
::
    Platform::loadPluginsFromDirectory(Platform::getDefaultPluginsDirectory());
It is also possible to load plugins from a different directory, or to load them
individually by calling :code:`Platform::loadPluginLibrary()`\ .
Every plugin must implement two functions that are declared in the
PluginInitializer.h header file:
::
    extern "C" void registerPlatforms();
    extern "C" void registerKernelFactories();
When a plugin is loaded, these two functions are invoked to register any
Platforms and KernelFactories defined by the plugin. When many plugins are
loaded at once by calling :code:`Platform::loadPluginsFromDirectory()`\ ,
:code:`registerPlatforms()` is first called on all of them, then
:code:`registerKernelFactories()` is called on all of them. This allows one
plugin to define a Platform, and a different plugin to add KernelFactories to
it; the Platform is guaranteed to be registered by the first plugin before the
second plugin tries to add its KernelFactories, regardless of what order the
plugins happen to be loaded in.
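The two-pass ordering is what makes the load order irrelevant, and a small model shows why. This is a sketch under stated assumptions rather than OpenMM's loader: ToyPlugin, initializeToyPlugins, and demoLoadOrder are hypothetical names, and the two entry points are modelled as std::function members instead of symbols resolved from a dynamic library.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a loaded plugin: its two entry points.
struct ToyPlugin {
    std::function<void()> registerPlatforms;
    std::function<void()> registerKernelFactories;
};

// Two passes: every plugin registers its Platforms before any plugin
// registers KernelFactories.
void initializeToyPlugins(const std::vector<ToyPlugin>& plugins) {
    for (const ToyPlugin& plugin : plugins)
        plugin.registerPlatforms();
    for (const ToyPlugin& plugin : plugins)
        plugin.registerKernelFactories();
}

std::vector<std::string> demoLoadOrder() {
    std::vector<std::string> platforms;  // names of registered toy platforms
    std::vector<std::string> log;

    // This plugin only adds a factory to a platform defined elsewhere.
    ToyPlugin factoryPlugin{
        [] {},
        [&platforms, &log] {
            bool found = std::find(platforms.begin(), platforms.end(), "Toy")
                         != platforms.end();
            log.push_back(found ? "factory added to Toy" : "Toy platform missing");
        }};
    // This plugin defines the platform.
    ToyPlugin platformPlugin{
        [&platforms, &log] {
            platforms.push_back("Toy");
            log.push_back("platform Toy registered");
        },
        [] {}};

    // The factory plugin is deliberately listed first.
    initializeToyPlugins({factoryPlugin, platformPlugin});
    assert(log.size() == 2);
    return log;
}
```

Even though the factory plugin comes first in the list, its second entry point runs only after the platform plugin's first entry point, so the platform is already registered when the factory is added.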
Creating New Platforms
**********************
One common type of plugin defines a new Platform. There are four such plugins
that come with OpenMM: one for the reference Platform, one for the CPU Platform,
one for the CUDA Platform, and one for the OpenCL Platform.
To define a new Platform, you must create subclasses of the various abstract
classes in the OpenMM Low Level API: a subclass of Platform, one or more
subclasses of KernelFactory, and a subclass of each KernelImpl. That is easy to
say, but a huge amount of work to actually do. There are many different
algorithms involved in computing forces, enforcing constraints, performing
integration, and so on, all of which together make up a Platform. Of course,
there is no requirement that every Platform must implement every possible
feature. If you do not provide an implementation of a particular kernel, it
simply means your Platform cannot be used for any simulation that requires that
kernel; if a user tries to do so, an exception will be thrown.
Your plugin’s :code:`registerPlatforms()` function should create an instance
of your Platform subclass, then register it by calling
:code:`Platform::registerPlatform()`\ . You also must register the
KernelFactory for each kernel your Platform supports. This can be done in the
:code:`registerKernelFactories()` function, or more simply, directly in the
Platform’s constructor. You can use as many different KernelFactories as you
want for different kernels, but usually it is simplest to use a single
KernelFactory for all of them. The support for multiple KernelFactories exists
primarily to let plugins add new features to existing Platforms, as described in
the next section.
Creating New Forces
*******************
Another common type of plugin defines new Forces and provides implementations of
them for existing Platforms. (Defining new Integrators is not specifically
discussed here, but the process is very similar.) There are two such plugins
that come with OpenMM. They implement the AMOEBA force field and Drude
oscillators, respectively.
As an example, suppose you want to create a new Force subclass called
StringForce that uses the equations of String Theory to compute the interactions
between particles. You want to provide implementations of it for all four
standard platforms: Reference, CPU, CUDA, and OpenCL.
The first thing to realize is that this *cannot* be done with only a plugin
library. Plugins are loaded dynamically at runtime, and they relate to the low
level API; but you must also provide a public API. Users of your class need to
create StringForce objects and call methods on them. That means providing a
header file with the class declaration, and a (non-plugin) library with the
class definition to link their code against. The implementations for particular
Platforms can be in plugins, but the public API class itself cannot. Or to put
it differently, the full “plugin” (from the user’s perspective) consists of
three parts: the library OpenMM loads at runtime (which is what OpenMM considers
to be the “plugin”), a second library for users to link their code against, and
a header file for them to include in their source code.
To define the API, you will need to create the following classes:
#. StringForce. This is the public API for your force, and users will directly
link against the library containing it.
#. StringForceImpl. This is the ForceImpl subclass corresponding to
StringForce. It should be defined in the same library as StringForce, and
StringForce’s :code:`createImpl()` method should create an instance of it.
#. CalcStringForceKernel. This is an abstract class that extends KernelImpl,
and defines the API by which StringForceImpl invokes its kernel. You only need
to provide a header file for it, not an implementation; those will be provided
by Platforms.
Now suppose you are writing the OpenCL implementation of StringForce. Here are
the classes you need to write:
#. OpenCLCalcStringForceKernel. This extends CalcStringForceKernel and provides
implementations of its virtual methods. The code for this class will probably
be very complicated (and if it actually works, worth a Nobel Prize). It may
execute many different GPU kernels and create its own internal data structures.
But those details are entirely internal to your own code. As long as this class
implements the virtual methods of CalcStringForceKernel, you can do anything you
want inside it.
#. OpenCLStringForceKernelFactory. This is a KernelFactory subclass that knows
how to create instances of OpenCLCalcStringForceKernel.
Both of these classes should be packaged into a dynamic library (.so on Linux,
.dylib on Mac, .dll on Windows) that can be loaded as a plugin. This library
must also implement the two functions from PluginInitializer.h.
:code:`registerPlatforms()` will do nothing, since this plugin does not
implement any new Platforms. :code:`registerKernelFactories()` should call
\ :code:`Platform::getPlatformByName("OpenCL")` to get the OpenCL Platform,
then create a new OpenCLStringForceKernelFactory and call
:code:`registerKernelFactory()` on the Platform to register it. If the OpenCL
Platform is not available, you should catch the exception and return without
doing anything. Most likely this means there is no OpenCL runtime on the
computer your code is running on.
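The "look up the Platform, register, and quietly give up if it is missing" logic might look like the following. To keep the example self-contained it does not use the OpenMM headers: ToyPlatform, ToyRegistry, and registerStringForceKernels are hypothetical stand-ins, and the assumption that a failed lookup throws an exception is taken from the description above.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a Platform that accepts kernel factories.
class ToyPlatform {
public:
    void registerKernelFactory(const std::string& kernelName) {
        kernelNames.push_back(kernelName);
    }
    std::vector<std::string> kernelNames;
};

// Hypothetical stand-in for the lookup of Platforms by name.
class ToyRegistry {
public:
    void addPlatform(const std::string& name) { platforms[name]; }
    ToyPlatform& getPlatformByName(const std::string& name) {
        auto it = platforms.find(name);
        if (it == platforms.end())
            throw std::runtime_error("Unknown platform: " + name);
        return it->second;
    }
private:
    std::map<std::string, ToyPlatform> platforms;
};

// Mirrors the role of registerKernelFactories() in the plugin.
// Returns true if the factory was registered, false if OpenCL was missing.
bool registerStringForceKernels(ToyRegistry& registry) {
    try {
        ToyPlatform& platform = registry.getPlatformByName("OpenCL");
        platform.registerKernelFactory("CalcStringForce");
        assert(!platform.kernelNames.empty());
        return true;
    } catch (const std::exception&) {
        // The Platform is not available; return without doing anything.
        return false;
    }
}
```

The real entry point returns void; the bool here is only a convenience so the outcome can be checked in the example.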
.. role:: code
.. raw:: html
<style> .code {font-family:monospace;} </style>
<style> .caption {text-align:center;} </style>
.. highlight:: c++
.. _the-reference-platform:
The Reference Platform
######################
The reference Platform is written with simplicity and clarity in mind, not
performance. (It is still not always as simple or clear as one might hope, but
that is the goal.) When implementing a new feature, it is recommended to create
the reference implementation first, then use that as a model for the versions in
other Platforms.
When using the reference Platform, the “platform-specific data” stored in
ContextImpl is of type ReferencePlatform::PlatformData, which is declared in
ReferencePlatform.h. It has fields for storing positions, velocities, box
vectors, and other types of data.
The PlatformData’s vector of forces contains one element for each particle. At
the start of each force evaluation, all elements of it are set to zero. Each
Force adds its own contributions to the vector, so that at the end, it contains
the total force acting on each particle.
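The accumulation pattern can be sketched as follows. The :code:`Vec3` struct and the two force functions are hypothetical stand-ins chosen for this example; the only point is that the vector is zeroed once and every Force then adds to it.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-in for a three-component vector type.
struct Vec3 {
    double x, y, z;
};

// Reset every per-particle force before a new evaluation.
void zeroForces(std::vector<Vec3>& forces) {
    for (Vec3& f : forces)
        f = Vec3{0.0, 0.0, 0.0};
}

// Hypothetical Force 1: harmonic restraint pulling each particle to the origin.
void addHarmonicRestraint(const std::vector<Vec3>& positions,
                          std::vector<Vec3>& forces, double k) {
    for (std::size_t i = 0; i < positions.size(); i++) {
        forces[i].x -= k*positions[i].x;
        forces[i].y -= k*positions[i].y;
        forces[i].z -= k*positions[i].z;
    }
}

// Hypothetical Force 2: the same constant force applied to every particle.
void addConstantForce(std::vector<Vec3>& forces, const Vec3& f) {
    for (Vec3& total : forces) {
        total.x += f.x;
        total.y += f.y;
        total.z += f.z;
    }
}
```

Because each function only adds to the existing values, the order in which the Forces are evaluated does not change which contributions end up in the total.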
There are a few additional classes that contain useful static methods.
SimTKOpenMMUtilities has various utility functions, of which the most important
is a random number generator. ReferenceForce provides methods for calculating
the displacement between two positions, optionally taking periodic boundary
conditions into account.
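As an illustration of what such a displacement calculation involves, here is a minimal sketch for a rectangular box. The function name and signature are assumptions made for this example and are not ReferenceForce's actual interface; triclinic boxes need additional handling that is not shown.

```cpp
#include <cmath>

// Minimal stand-in for a three-component vector type.
struct Vec3 {
    double x, y, z;
};

// Displacement from a to b. With periodic boundaries, each component is
// wrapped to the nearest periodic image (rectangular box only).
Vec3 displacement(const Vec3& a, const Vec3& b, bool periodic, const Vec3& boxSize) {
    Vec3 d{b.x-a.x, b.y-a.y, b.z-a.z};
    if (periodic) {
        d.x -= boxSize.x*std::floor(d.x/boxSize.x+0.5);
        d.y -= boxSize.y*std::floor(d.y/boxSize.y+0.5);
        d.z -= boxSize.z*std::floor(d.z/boxSize.z+0.5);
    }
    return d;
}
```

For two particles at x = 0.5 and x = 9.5 in a box of length 10, the direct displacement is 9 while the periodic one is -1, because the nearest image lies across the box boundary.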
.. role:: code
.. raw:: html
<style> .code {font-family:monospace;} </style>
<style> .caption {text-align:center;} </style>
.. highlight:: c++
.. _the-cpu-platform:
The CPU Platform
################
CpuPlatform is a subclass of ReferencePlatform. It provides optimized versions
of a small number of kernels, while using the reference implementations for all
the others. Any kernel implementation written for the reference Platform will
work equally well with the CPU platform. Of course, if that kernel happens to
be a performance bottleneck, you will probably want to write an optimized
version of it. But many kernels have negligible effect on performance, and for
these you can just use the same implementation for both platforms.
If you choose to do that, you can easily support both platforms with a single
plugin library. Just implement :code:`registerKernelFactories()` like this:
::
    extern "C" void registerKernelFactories() {
        for (int i = 0; i < Platform::getNumPlatforms(); i++) {
            Platform& platform = Platform::getPlatform(i);
            if (dynamic_cast<ReferencePlatform*>(&platform) != NULL) {
                // Create and register your KernelFactory.
            }
        }
    }
The loop identifies every ReferencePlatform, either an instance of the base
class or of a subclass, and registers a KernelFactory for every one.
@@ -6,366 +6,6 @@
.. highlight:: c++
Introduction
############
This guide describes the internal architecture of the OpenMM library. It is
targeted at developers who want to add features to OpenMM, either by modifying
the core library directly or by writing plugins. If you just want to write
applications that use OpenMM, you do not need to read this guide; the Users
Manual tells you everything you need to know. This guide is *only* for
people who want to contribute to OpenMM itself.
It is organized as follows:
* Chapter :ref:`the-core-library` describes the architecture of the core OpenMM library. It
discusses how the high level and low level APIs relate to each other, and the
flow of execution between them.
* Chapter :ref:`writing-plugins` describes in detail how to write a plugin. It focuses on the two
most common types of plugins: those which define new Forces, and those which
implement new Platforms.
* Chapter :ref:`the-reference-platform` discusses the architecture of the reference Platform, providing
information relevant to writing reference implementations of new features.
* Chapter :ref:`the-cpu-platform` discusses the architecture of the CPU Platform, providing
information relevant to writing CPU implementations of new features.
* Chapter :ref:`the-opencl-platform` discusses the architecture of the OpenCL Platform, providing
information relevant to writing OpenCL implementations of new features.
* Chapter :ref:`the-cuda-platform` discusses the architecture of the CUDA Platform, providing
information relevant to writing CUDA implementations of new features.
* Chapter :ref:`common-compute` describes the Common Compute framework, which lets you
write a single implementation of a feature that can be used for both OpenCL and CUDA.
This guide assumes you are already familiar with the public API and how to use
OpenMM in applications. If that is not the case, you should first read the
Users Manual and work through some of the example programs. Pay especially
close attention to the “Introduction to the OpenMM Library” chapter, since it
introduces concepts that are important in understanding this guide.
.. _the-core-library:
The Core Library
################
OpenMM is based on a layered architecture, as shown in the following diagram:
.. figure:: ../images/ArchitectureLayers.jpg
:align: center
:width: 100%
:autonumber:`Figure,Architecture Layers`\ : OpenMM architecture
The public API layer consists of the classes you access when using OpenMM in an
application: System; Force and its subclasses; Integrator and its subclasses;
and Context. These classes define a public interface but do no computation.
The next layer down consists of “implementation” classes that mirror the public
API classes: ContextImpl, ForceImpl, and a subclass of ForceImpl for each
subclass of Force (HarmonicBondForceImpl, NonbondedForceImpl, etc.). These
objects are created automatically when you create a Context. They store
information related to a particular simulation, and define methods for
performing calculations.
Note that, whereas a Force is logically “part of” a System, a ForceImpl is
logically “part of” a Context. (See :autonumref:`Figure,API Relationships`\ .) If you create many Contexts
for simulating the same System, there is still only one System and only one copy
of each Force in it. But there will be separate ForceImpls for each Context,
and those ForceImpls store information related to their particular Contexts.
.. figure:: ../images/SystemContextRelationships.jpg
:align: center
:autonumber:`Figure,API Relationships`\ : Relationships between public API and implementation layer objects
Also note that there is no “IntegratorImpl” class, because it is not needed.
Integrator is already specific to one Context. Many Contexts can all simulate
the same System, but each of them must have its own Integrator, so information
specific to one simulation can be stored directly in the Integrator.
The next layer down is the OpenMM Low Level API (OLLA). The important classes
in this layer are: Platform; Kernel; KernelImpl and its subclasses; and
KernelFactory. A Kernel is just a reference counted pointer to a KernelImpl;
the real work is done by KernelImpl objects (or more precisely, by instances of
its subclasses). A KernelFactory creates KernelImpl objects, and a Platform
ties together a set of KernelFactories, as well as defining information that
applies generally to performing computations with that Platform.
All of these classes (except Kernel) are abstract. A particular Platform
provides concrete subclasses of all of them. For example, the reference
platform defines a Platform subclass called ReferencePlatform, a KernelFactory
subclass called ReferenceKernelFactory, and a concrete subclass of each abstract
KernelImpl type: ReferenceCalcNonbondedForceKernel extends
CalcNonbondedForceKernel (which in turn extends KernelImpl),
ReferenceIntegrateVerletStepKernel extends IntegrateVerletStepKernel, and so on.
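The relationship between these classes can be sketched in a few lines. This is a simplified model written for this guide, not the real headers: :code:`Kernel` is modeled as a :code:`std::shared_ptr`, and :code:`createKernelImpl()` takes only a name, whereas the real method takes further arguments.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Abstract base for everything a Platform can execute.
class KernelImpl {
public:
    explicit KernelImpl(std::string name) : name(std::move(name)) {}
    virtual ~KernelImpl() = default;
    const std::string& getName() const { return name; }
private:
    std::string name;
};

// Assumption for brevity: a Kernel is just a shared pointer to a KernelImpl.
using Kernel = std::shared_ptr<KernelImpl>;

// Abstract kernel type that the implementation layer programs against.
class CalcNonbondedForceKernel : public KernelImpl {
public:
    CalcNonbondedForceKernel() : KernelImpl("CalcNonbondedForce") {}
    virtual double execute() = 0;
};

// Concrete version supplied by one Platform.
class ReferenceCalcNonbondedForceKernel : public CalcNonbondedForceKernel {
public:
    double execute() override { return 0.0; }  // placeholder energy
};

class KernelFactory {
public:
    virtual ~KernelFactory() = default;
    virtual Kernel createKernelImpl(const std::string& name) const = 0;
};

class ReferenceKernelFactory : public KernelFactory {
public:
    Kernel createKernelImpl(const std::string& name) const override {
        if (name == "CalcNonbondedForce")
            return std::make_shared<ReferenceCalcNonbondedForceKernel>();
        throw std::runtime_error("Unsupported kernel: " + name);
    }
};

// A Platform maps kernel names to the factories registered for them.
class Platform {
public:
    void registerKernelFactory(const std::string& name,
                               std::shared_ptr<KernelFactory> factory) {
        factories[name] = std::move(factory);
    }
    Kernel createKernel(const std::string& name) const {
        auto it = factories.find(name);
        if (it == factories.end())
            throw std::runtime_error("No KernelFactory registered for " + name);
        return it->second->createKernelImpl(name);
    }
private:
    std::map<std::string, std::shared_ptr<KernelFactory>> factories;
};
```

Callers hold only the abstract :code:`CalcNonbondedForceKernel` type; which concrete subclass they get depends entirely on the factory the Platform has registered.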
We can understand this better by walking through the entire sequence of events
that takes place when you create a Context. As an example, suppose you create a
System; add a NonbondedForce to it; create a VerletIntegrator; and then create a
Context for them using the reference Platform. Here is what happens.
#. The Context constructor creates a ContextImpl.
#. The ContextImpl calls :code:`createImpl()` on each Force in the System,
which creates an instance of the appropriate ForceImpl subclass.
#. The ContextImpl calls :code:`contextCreated()` on the Platform, which
in turn calls :code:`setPlatformData()` on the ContextImpl. This allows
Platform-specific information to be stored in a ContextImpl. Every Platform has
its own mechanism for storing particle masses, constraint definitions, particle
positions, and so on. ContextImpl therefore allows the Platform to create an
arbitrary block of data and store it where it can be accessed by that Platform’s
kernels.
#. The ContextImpl calls :code:`createKernel()` on the Platform several
times to get instances of various kernels that it needs:
CalcKineticEnergyKernel, ApplyConstraintsKernel, etc.
#. For each kernel, the Platform looks up which KernelFactory has been
registered for that particular kernel. In this case, it will be a
ReferenceKernelFactory.
#. It calls :code:`createKernelImpl()` on the KernelFactory, which
creates and returns an instance of an appropriate KernelImpl subclass:
ReferenceCalcKineticEnergyKernel, ReferenceApplyConstraintsKernel, etc.
#. The ContextImpl loops over all of its ForceImpls and calls
:code:`initialize()` on each one.
#. Each ForceImpl asks the Platform to create whatever kernels it needs. In
this example, NonbondedForceImpl will request a CalcNonbondedForceKernel, and
get back a ReferenceCalcNonbondedForceKernel.
#. The ContextImpl calls :code:`initialize()` on the Integrator which, like
the other objects, requests kernels from the Platform. In this example,
VerletIntegrator requests an IntegrateVerletStepKernel and gets back a
ReferenceIntegrateVerletStepKernel.
At this point, the Context is fully initialized and ready for doing computation.
Reference implementations of various KernelImpls have been created, but they are
always referenced through abstract superclasses. Similarly, data structures
specific to the reference Platform have been created and stored in the
ContextImpl, but the format and content of these structures is opaque to the
ContextImpl. Whenever it needs to access them (for example, to get or set
particle positions), it does so through a kernel (UpdateStateDataKernel in this
case).
Now suppose that you call :code:`step()` on the VerletIntegrator. Here is
what happens to execute each time step.
#. The VerletIntegrator calls :code:`updateContextState()` on the
ContextImpl. This gives each Force an opportunity to modify the state of the
Context at the start of each time step.
#. The ContextImpl loops over its ForceImpls and calls
:code:`updateContextState()` on each one. In this case, our only ForceImpl is
a NonbondedForceImpl, which returns without doing anything. On the other hand,
if we had an AndersenThermostat in our System, its ForceImpl would invoke a
kernel to modify particle velocities.
#. The VerletIntegrator calls :code:`calcForcesAndEnergy()` on the
ContextImpl to request that the forces be computed.
#. The ContextImpl calls :code:`beginComputation()` on its
CalcForcesAndEnergyKernel. This initializes all the forces to zero and does any
other initialization the Platform requires before forces can be computed. For
example, some Platforms construct their nonbonded neighbor lists at this point.
#. The ContextImpl loops over its ForceImpls and calls
:code:`calcForcesAndEnergy()` on each one. In this case, we have a
NonbondedForceImpl which invokes its CalcNonbondedForceKernel to compute forces.
#. The ContextImpl then calls :code:`finishComputation()` on its
CalcForcesAndEnergyKernel. This does any additional work needed to determine
the final forces, such as summing the values from intermediate buffers.
#. Finally, the VerletIntegrator invokes its IntegrateVerletStepKernel. This
takes the forces, positions, and velocities that are stored in a Platform-
specific format in the ContextImpl, uses them to compute new positions and
velocities, and stores them in the ContextImpl.
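The order of calls in the steps above can be traced with a small model. The :code:`...Sketch` types below are hypothetical and only record which method would be invoked; no forces are actually computed.

```cpp
#include <string>
#include <vector>

// Records the order of calls so the flow can be inspected.
using CallLog = std::vector<std::string>;

struct ForceImplSketch {
    std::string name;
    void updateContextState(CallLog& log) {
        log.push_back(name + "::updateContextState");
    }
    void calcForcesAndEnergy(CallLog& log) {
        log.push_back(name + "::calcForcesAndEnergy");
    }
};

struct ContextImplSketch {
    std::vector<ForceImplSketch> forceImpls;

    void updateContextState(CallLog& log) {
        for (auto& impl : forceImpls)
            impl.updateContextState(log);
    }
    void calcForcesAndEnergy(CallLog& log) {
        // Zero the forces and do any Platform-specific setup.
        log.push_back("CalcForcesAndEnergyKernel::beginComputation");
        for (auto& impl : forceImpls)
            impl.calcForcesAndEnergy(log);
        // Combine intermediate results into the final forces.
        log.push_back("CalcForcesAndEnergyKernel::finishComputation");
    }
};

// One Verlet time step as seen from the integrator.
CallLog verletStep(ContextImplSketch& context) {
    CallLog log;
    context.updateContextState(log);
    context.calcForcesAndEnergy(log);
    log.push_back("IntegrateVerletStepKernel::execute");
    return log;
}
```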
.. _writing-plugins:
Writing Plugins
###############
A plugin is a dynamic library that adds new features to OpenMM. It is typically
stored in the :code:`lib/plugins` directory inside your OpenMM installation,
and gets loaded along with all other plugins when the user calls
::
    Platform::loadPluginsFromDirectory(Platform::getDefaultPluginsDirectory());
It is also possible to load plugins from a different directory, or to load them
individually by calling :code:`Platform::loadPluginLibrary()`\ .
Every plugin must implement two functions that are declared in the
PluginInitializer.h header file:
::
    extern "C" void registerPlatforms();
    extern "C" void registerKernelFactories();
When a plugin is loaded, these two functions are invoked to register any
Platforms and KernelFactories defined by the plugin. When many plugins are
loaded at once by calling :code:`Platform::loadPluginsFromDirectory()`\ ,
:code:`registerPlatforms()` is first called on all of them, then
:code:`registerKernelFactories()` is called on all of them. This allows one
plugin to define a Platform, and a different plugin to add KernelFactories to
it; the Platform is guaranteed to be registered by the first plugin before the
second plugin tries to add its KernelFactories, regardless of what order the
plugins happen to be loaded in.
Creating New Platforms
**********************
One common type of plugin defines a new Platform. There are four such plugins
that come with OpenMM: one for the Reference platform, one for the CPU Platform,
one for the CUDA Platform, and one for the OpenCL Platform.
To define a new Platform, you must create subclasses of the various abstract
classes in the OpenMM Low Level API: a subclass of Platform, one or more
subclasses of KernelFactory, and a subclass of each KernelImpl. That is easy to
say, but a huge amount of work to actually do. There are many different
algorithms involved in computing forces, enforcing constraints, performing
integration, and so on, all of which together make up a Platform. Of course,
there is no requirement that every Platform must implement every possible
feature. If you do not provide an implementation of a particular kernel, it
simply means your Platform cannot be used for any simulation that requires that
kernel; if a user tries to do so, an exception will be thrown.
Your plugin’s :code:`registerPlatforms()` function should create an instance
of your Platform subclass, then register it by calling
:code:`Platform::registerPlatform()`\ . You also must register the
KernelFactory for each kernel your Platform supports. This can be done in the
:code:`registerKernelFactories()` function, or more simply, directly in the
Platform’s constructor. You can use as many different KernelFactories as you
want for different kernels, but usually it is simplest to use a single
KernelFactory for all of them. The support for multiple KernelFactories exists
primarily to let plugins add new features to existing Platforms, as described in
the next section.
Creating New Forces
*******************
Another common type of plugin defines new Forces and provides implementations of
them for existing Platforms. (Defining new Integrators is not specifically
discussed here, but the process is very similar.) There are two such plugins
that come with OpenMM. They implement the AMOEBA force field and Drude
oscillators, respectively.
As an example, suppose you want to create a new Force subclass called
StringForce that uses the equations of String Theory to compute the interactions
between particles. You want to provide implementations of it for all four
standard platforms: Reference, CPU, CUDA, and OpenCL.
The first thing to realize is that this *cannot* be done with only a plugin
library. Plugins are loaded dynamically at runtime, and they relate to the low
level API; but you must also provide a public API. Users of your class need to
create StringForce objects and call methods on them. That means providing a
header file with the class declaration, and a (non-plugin) library with the
class definition to link their code against. The implementations for particular
Platforms can be in plugins, but the public API class itself cannot. Or to put
it differently, the full “plugin” (from the user’s perspective) consists of
three parts: the library OpenMM loads at runtime (which is what OpenMM considers
to be the “plugin”), a second library for users to link their code against, and
a header file for them to include in their source code.
To define the API, you will need to create the following classes:
#. StringForce. This is the public API for your force, and users will directly
link against the library containing it.
#. StringForceImpl. This is the ForceImpl subclass corresponding to
StringForce. It should be defined in the same library as StringForce, and
StringForce’s :code:`createImpl()` method should create an instance of it.
#. CalcStringForceKernel. This is an abstract class that extends KernelImpl,
and defines the API by which StringForceImpl invokes its kernel. You only need
to provide a header file for it, not an implementation; those will be provided
by Platforms.
Now suppose you are writing the OpenCL implementation of StringForce. Here are
the classes you need to write:
#. OpenCLCalcStringForceKernel. This extends CalcStringForceKernel and provides
implementations of its virtual methods. The code for this class will probably
be very complicated (and if it actually works, worth a Nobel Prize). It may
execute many different GPU kernels and create its own internal data structures.
But those details are entirely internal to your own code. As long as this class
implements the virtual methods of CalcStringForceKernel, you can do anything you
want inside it.
#. OpenCLStringForceKernelFactory. This is a KernelFactory subclass that knows
how to create instances of OpenCLCalcStringForceKernel.
Both of these classes should be packaged into a dynamic library (.so on Linux,
.dylib on Mac, .dll on Windows) that can be loaded as a plugin. This library
must also implement the two functions from PluginInitializer.h.
:code:`registerPlatforms()` will do nothing, since this plugin does not
implement any new Platforms. :code:`registerKernelFactories()` should call
\ :code:`Platform::getPlatformByName("OpenCL")` to get the OpenCL Platform,
then create a new OpenCLStringForceKernelFactory and call
:code:`registerKernelFactory()` on the Platform to register it. If the OpenCL
Platform is not available, you should catch the exception and return without
doing anything. Most likely this means there is no OpenCL runtime on the
computer your code is running on.
.. _the-reference-platform:
The Reference Platform
######################
The reference Platform is written with simplicity and clarity in mind, not
performance. (It is still not always as simple or clear as one might hope, but
that is the goal.) When implementing a new feature, it is recommended to create
the reference implementation first, then use that as a model for the versions in
other Platforms.
When using the reference Platform, the “platform-specific data” stored in
ContextImpl is of type ReferencePlatform::PlatformData, which is declared in
ReferencePlatform.h. It has fields for storing positions, velocities, box
vectors, and other types of data.
The PlatformData’s vector of forces contains one element for each particle. At
the start of each force evaluation, all elements of it are set to zero. Each
Force adds its own contributions to the vector, so that at the end, it contains
the total force acting on each particle.
There are a few additional classes that contain useful static methods.
SimTKOpenMMUtilities has various utility functions, of which the most important
is a random number generator. ReferenceForce provides methods for calculating
the displacement between two positions, optionally taking periodic boundary
conditions into account.
.. _the-cpu-platform:
The CPU Platform
################
CpuPlatform is a subclass of ReferencePlatform. It provides optimized versions
of a small number of kernels, while using the reference implementations for all
the others. Any kernel implementation written for the reference Platform will
work equally well with the CPU platform. Of course, if that kernel happens to
be a performance bottleneck, you will probably want to write an optimized
version of it. But many kernels have negligible effect on performance, and for
these you can just use the same implementation for both platforms.
If you choose to do that, you can easily support both platforms with a single
plugin library. Just implement :code:`registerKernelFactories()` like this:
::
    extern "C" void registerKernelFactories() {
        for (int i = 0; i < Platform::getNumPlatforms(); i++) {
            Platform& platform = Platform::getPlatform(i);
            if (dynamic_cast<ReferencePlatform*>(&platform) != NULL) {
                // Create and register your KernelFactory.
            }
        }
    }
The loop identifies every ReferencePlatform, either an instance of the base
class or of a subclass, and registers a KernelFactory for every one.
.. _the-opencl-platform:
The OpenCL Platform
@@ -748,241 +388,3 @@ OpenCLContext plus the delta stored in the OpenCLIntegrationUtilities. It then
modifies the deltas so that all distance constraints are satisfied. The
integrator must then finish the time step by adding the deltas to the positions
and storing them into the main position array.
.. _the-cuda-platform:
The CUDA Platform
#################
The CUDA platform is very similar to the OpenCL platform, and most of the
previous chapter applies equally well to it, just changing “OpenCL” to “Cuda” in
class names. There are a few differences worth noting.
Compiling Kernels
*****************
Like the OpenCL platform, the CUDA platform compiles all its kernels at runtime.
Unlike OpenCL, CUDA does not have built in support for runtime compilation.
OpenMM therefore needs to implement this itself by writing the source code out
to disk, invoking the nvcc compiler as a separate process, and then loading the
compiled kernel in from disk.
For the most part, you can ignore all of this. Just call
:code:`createModule()` on the CudaContext, passing it the CUDA source code.
It takes care of the details of compilation and loading, returning a CUmodule
object when it is done. You can then call :code:`getKernel()` to look up
individual kernels in the module (represented as CUfunction objects) and
:code:`executeKernel()` to execute them.
The CUDA platform does need two things to make this work: a directory on disk
where it can write out temporary files, and the path to the nvcc compiler.
These are specified by the “CudaTempDirectory” and “CudaCompiler” properties
when you create a new Context. It often can figure out suitable values for them
on its own, but sometimes it needs help. See the “Platform-Specific Properties”
chapter of the Users Manual for details.
Accumulating Forces
*******************
The OpenCL platform, as described in Section :ref:`computing-forces`\ , uses two types of buffers for
accumulating forces: a set of floating point buffers, and a single fixed point
buffer. In contrast, the CUDA platform uses *only* the fixed point buffer
(represented by the CUDA type :code:`long long`\ ). This means
the CUDA platform only works on devices that support 64 bit atomic operations
(compute capability 1.2 or higher).
.. _common-compute:
Common Compute
##############
Common Compute is not a platform, but it shares many elements of one. It exists
to reduce code duplication between the OpenCL and CUDA platforms. It allows a
single implementation to be written for most kernels that can be used by both
platforms.
OpenCL and CUDA are very similar to each other. Their computational models are
nearly identical. For example, each is based around launching kernels that are
executed in parallel by many threads. Each of them groups threads into blocks,
with more communication and synchronization permitted between the threads
in a block than between ones in different blocks. They have very similar memory
hierarchies: high latency global memory, low latency local/shared memory that
can be used for communication between the threads of a block, and local variables
that are visible only to a single thread.
Even their languages for writing kernels are very similar. Here is an OpenCL
kernel that adds two arrays together, storing the result in a third array.
::
    __kernel void addArrays(__global const float* restrict a,
                            __global const float* restrict b,
                            __global float* restrict c,
                            int length) {
        for (int i = get_global_id(0); i < length; i += get_global_size(0))
            c[i] = a[i]+b[i];
    }
Here is the corresponding CUDA kernel.
::
    extern "C" __global__ void addArrays(const float* __restrict__ a,
                                         const float* __restrict__ b,
                                         float* __restrict__ c,
                                         int length) {
        for (int i = blockIdx.x*blockDim.x+threadIdx.x; i < length; i += blockDim.x*gridDim.x)
            c[i] = a[i]+b[i];
    }
The difference between them is largely just a mechanical find-and-replace.
After many years of writing and maintaining nearly identical kernels by hand,
it finally occurred to us that the translation could be done automatically by
the compiler. Simply by defining a few preprocessor macros, the following
kernel can be compiled equally well either as OpenCL or as CUDA.
::
    KERNEL void addArrays(GLOBAL const float* RESTRICT a,
                          GLOBAL const float* RESTRICT b,
                          GLOBAL float* RESTRICT c,
                          int length) {
        for (int i = GLOBAL_ID; i < length; i += GLOBAL_SIZE)
            c[i] = a[i]+b[i];
    }
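To see that the translation really is nothing more than macro substitution, the same kernel source can be compiled as ordinary single-threaded C++ by supplying a third set of definitions. The definitions below are hypothetical and exist only for this illustration; OpenMM does not provide a host target like this.

```cpp
// Hypothetical macro definitions for a single-threaded host build.
// With one "thread", the global index is 0 and the global size is 1.
#define KERNEL
#define GLOBAL
#define RESTRICT
#define GLOBAL_ID 0
#define GLOBAL_SIZE 1

// The kernel source itself is unchanged from the common version.
KERNEL void addArrays(GLOBAL const float* RESTRICT a,
                      GLOBAL const float* RESTRICT b,
                      GLOBAL float* RESTRICT c,
                      int length) {
    for (int i = GLOBAL_ID; i < length; i += GLOBAL_SIZE)
        c[i] = a[i]+b[i];
}
```

With these definitions the strided loop degenerates into a plain loop over every element, which is exactly what a single thread should do.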
Writing Device Code
*******************
When compiling kernels with the Common Compute API, the following macros are
defined.
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|Macro                          |OpenCL Definition                                           |CUDA Definition                             |
+===============================+============================================================+============================================+
|:code:`KERNEL`                 |:code:`__kernel`                                            |:code:`extern "C" __global__`               |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`DEVICE`                 |                                                            |:code:`__device__`                          |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`LOCAL`                  |:code:`__local`                                             |:code:`__shared__`                          |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`LOCAL_ARG`              |:code:`__local`                                             |                                            |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`GLOBAL`                 |:code:`__global`                                            |                                            |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`RESTRICT`               |:code:`restrict`                                            |:code:`__restrict__`                        |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`LOCAL_ID`               |:code:`get_local_id(0)`                                     |:code:`threadIdx.x`                         |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`LOCAL_SIZE`             |:code:`get_local_size(0)`                                   |:code:`blockDim.x`                          |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`GLOBAL_ID`              |:code:`get_global_id(0)`                                    |:code:`(blockIdx.x*blockDim.x+threadIdx.x)` |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`GLOBAL_SIZE`            |:code:`get_global_size(0)`                                  |:code:`(blockDim.x*gridDim.x)`              |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`GROUP_ID`               |:code:`get_group_id(0)`                                     |:code:`blockIdx.x`                          |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`NUM_GROUPS`             |:code:`get_num_groups(0)`                                   |:code:`gridDim.x`                           |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`SYNC_THREADS`           |:code:`barrier(CLK_LOCAL_MEM_FENCE+CLK_GLOBAL_MEM_FENCE);`  |:code:`__syncthreads();`                    |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`SYNC_WARPS`             | | if SIMT width >= 32:                                     | | if compute capability >= 7.0:            |
|                               | | :code:`mem_fence(CLK_LOCAL_MEM_FENCE)`                   | | :code:`__syncwarp();`                    |
|                               | | otherwise:                                               | | otherwise empty                          |
|                               | | :code:`barrier(CLK_LOCAL_MEM_FENCE)`                     |                                            |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`MEM_FENCE`              |:code:`mem_fence(CLK_LOCAL_MEM_FENCE+CLK_GLOBAL_MEM_FENCE);`|:code:`__threadfence_block();`              |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`ATOMIC_ADD(dest, value)`|:code:`atom_add(dest, value)`                               |:code:`atomicAdd(dest, value)`              |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
A few other symbols may or may not be defined based on the device you are running on:
:code:`SUPPORTS_DOUBLE_PRECISION` and :code:`SUPPORTS_64_BIT_ATOMICS`\ . You
can use :code:`#ifdef` blocks with these symbols to conditionally compile code
based on the features supported by the device. In addition, the CUDA compiler
defines the symbol :code:`__CUDA_ARCH__`\ , so you can check for this symbol if
you want to have different code blocks for CUDA and OpenCL.
Both OpenCL and CUDA define vector types like :code:`int2` and :code:`float4`\ .
The types they support are different but overlapping. When writing common code,
use only the vector types that are supported by both OpenCL and CUDA: 2, 3, and 4
element vectors of type :code:`short`\ , :code:`int`\ , :code:`float`\ , and
:code:`double`\ .
CUDA uses functions to construct vector values, such as :code:`make_float2(x, y)`\ .
OpenCL instead uses a typecast-like syntax: :code:`(float2) (x, y)`\ . In common
code, use the CUDA style :code:`make_` functions. OpenMM provides definitions
of these functions when compiling as OpenCL.
In CUDA, vector types are simply data structures. You can access their elements,
but not do much more with them. In contrast, OpenCL's vectors are mathematical
types. All standard math operators are defined for them, as well as geometrical
functions like :code:`dot()` and :code:`cross()`\ . When compiling kernels as
CUDA, OpenMM provides definitions of these operators and functions.
OpenCL also supports "swizzle" notation for vectors. For example, if :code:`f`
is a :code:`float4` you can construct a vector of its first three elements
by writing :code:`f.xyz`\ , or you can swap its first two elements by writing
:code:`f.xy = f.yx`\ . Unfortunately, there is no practical way to support this
in CUDA, so swizzle notation cannot be used in common code. Because stripping
the final element from a four component vector is such a common operation, OpenMM
provides a special function for doing it: :code:`trimTo3(f)` is a vector of its
first three elements.
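The following host-side sketch shows roughly what such compatibility definitions amount to. The structs and functions are plain C++ stand-ins written for this example, assuming simple component-wise semantics; they are not the definitions OpenMM actually ships with its device code.

```cpp
// Plain structs standing in for the device vector types.
struct float3 {
    float x, y, z;
};
struct float4 {
    float x, y, z, w;
};

// CUDA-style constructor functions.
inline float3 make_float3(float x, float y, float z) {
    return float3{x, y, z};
}
inline float4 make_float4(float x, float y, float z, float w) {
    return float4{x, y, z, w};
}

// Component-wise operators of the kind OpenCL has built in.
inline float3 operator+(float3 a, float3 b) {
    return make_float3(a.x+b.x, a.y+b.y, a.z+b.z);
}
inline float3 operator*(float3 a, float s) {
    return make_float3(a.x*s, a.y*s, a.z*s);
}

// Geometric helpers.
inline float dot(float3 a, float3 b) {
    return a.x*b.x+a.y*b.y+a.z*b.z;
}
inline float3 cross(float3 a, float3 b) {
    return make_float3(a.y*b.z-a.z*b.y, a.z*b.x-a.x*b.z, a.x*b.y-a.y*b.x);
}

// Drop the fourth component, replacing the swizzle f.xyz.
inline float3 trimTo3(float4 v) {
    return make_float3(v.x, v.y, v.z);
}
```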
64 bit integers are another data type that needs special handling. Both OpenCL
and CUDA support them, but they use different names for them: :code:`long` in OpenCL,
:code:`long long` in CUDA. To work around this inconsistency, OpenMM provides
the typedefs :code:`mm_long` and :code:`mm_ulong` for signed and unsigned 64 bit
integers in device code.
Writing Host Code
*****************
Host code for Common Compute is very similar to host code for OpenCL or CUDA.
In fact, most of the classes provided by the OpenCL and CUDA platforms are
subclasses of Common Compute classes. For example, OpenCLContext and
CudaContext are both subclasses of ComputeContext. When writing common code,
each KernelImpl should expect a ComputeContext to be passed to its constructor.
By using the common API provided by that abstract class, it can be used for
either OpenCL or CUDA just based on the particular context passed to it at
runtime. Similarly, OpenCLNonbondedUtilities and CudaNonbondedUtilities are
subclasses of the abstract NonbondedUtilities class, and so on.
ArrayInterface is an abstract class defining the interface for arrays stored on
the device. OpenCLArray and CudaArray are both subclasses of it. To simplify
code that creates and uses arrays, there is also a third subclass called
ComputeArray. It acts as a wrapper around an OpenCLArray or CudaArray,
automatically creating an array of the appropriate type for the current
platform. In practice, just follow these rules:
1. Whenever you need to create an array, make it a ComputeArray.
2. Whenever you write a function that expects an array to be passed to it,
declare the type to be ArrayInterface.
If you do these two things, all differences between platforms will be handled
automatically.
OpenCL and CUDA have quite different APIs for compiling and invoking kernels.
To hide these differences, OpenMM provides a set of abstract classes. To compile
device code, pass the source code to :code:`compileProgram()` on the ComputeContext.
This returns a ComputeProgram. You can then call its :code:`createKernel()`
method to get a ComputeKernel object, which has methods for setting arguments
and invoking the kernel.
Sometimes you need to refer to vector types in host code, such as to set the
value for a kernel argument or to access the elements of an array. OpenCL and
CUDA both define types for them, but they have different names, and in any case
you want to avoid using OpenCL-specific or CUDA-specific types in common code.
OpenMM therefore defines types for vectors in host code. They have the same
names as the corresponding types in device code, only with the prefix :code:`mm_`\ ,
for example :code:`mm_int2` and :code:`mm_float4`\ .
Three component vectors need special care in this context, because the platforms
define them differently. In OpenCL, a three component vector is essentially a
four component vector whose last component is ignored. For example,
:code:`sizeof(float3)` is 12 in CUDA but 16 in OpenCL. Within a kernel this
distinction can usually be ignored, but when communicating between host and
device it becomes vitally important. It is generally best to avoid storing
three component vectors in arrays or passing them as arguments. There are no
:code:`mm_` host types defined for three component vectors, because CUDA and
OpenCL would require them to be defined in different ways.
\ No newline at end of file
.. role:: code
.. raw:: html
<style> .code {font-family:monospace;} </style>
<style> .caption {text-align:center;} </style>
.. highlight:: c++
.. _the-cuda-platform:
The CUDA Platform
#################
The CUDA platform is very similar to the OpenCL platform, and most of the
previous chapter applies equally well to it, just changing “OpenCL” to “Cuda” in
class names. There are a few differences worth noting.
Compiling Kernels
*****************
Like the OpenCL platform, the CUDA platform compiles all its kernels at runtime.
Unlike OpenCL, CUDA does not have built in support for runtime compilation.
OpenMM therefore needs to implement this itself by writing the source code out
to disk, invoking the nvcc compiler as a separate process, and then loading the
compiled kernel in from disk.
For the most part, you can ignore all of this. Just call
:code:`createModule()` on the CudaContext, passing it the CUDA source code.
It takes care of the details of compilation and loading, returning a CUmodule
object when it is done. You can then call :code:`getKernel()` to look up
individual kernels in the module (represented as CUfunction objects) and
:code:`executeKernel()` to execute them.
The CUDA platform does need two things to make this work: a directory on disk
where it can write out temporary files, and the path to the nvcc compiler.
These are specified by the “CudaTempDirectory” and “CudaCompiler” properties
when you create a new Context. The platform can often determine suitable values
for them on its own, but sometimes it needs help. See the “Platform-Specific
Properties” chapter of the User's Manual for details.
Accumulating Forces
*******************
The OpenCL platform, as described in Section :numref:`computing-forces`\ , uses two types of buffers for
accumulating forces: a set of floating point buffers, and a single fixed point
buffer. In contrast, the CUDA platform uses *only* the fixed point buffer
(represented by the CUDA type :code:`long long`\ ). This means
the CUDA platform only works on devices that support 64 bit atomic operations
(compute capability 1.2 or higher).
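The idea behind fixed point accumulation can be illustrated on the host. The sketch below is plain C++, not OpenMM code: the scale factor of 2^32 and the function names are assumptions chosen for illustration. It converts each floating point contribution to a 64 bit integer before summing, so the total does not depend on the order in which contributions arrive. That is the property that makes integer atomic adds attractive when many threads write to the same buffer.

```cpp
#include <cassert>
#include <vector>

// Assumed scale for this sketch: 2^32 fixed point units per unit of force.
const double FIXED_POINT_SCALE = 4294967296.0;

// Convert one floating point contribution to fixed point (truncating).
long long toFixedPoint(float value) {
    return (long long) (value*FIXED_POINT_SCALE);
}

// Convert an accumulated fixed point total back to floating point.
double fromFixedPoint(long long value) {
    return value/FIXED_POINT_SCALE;
}

// Sum contributions as integers. Integer addition is associative, so the
// result is identical for any ordering of the contributions.
long long accumulateFixedPoint(const std::vector<float>& contributions) {
    long long total = 0;
    for (float f : contributions)
        total += toFixedPoint(f);
    return total;
}
```

Summing the same contributions in forward and reverse order gives bit-for-bit identical totals here, which is generally not true of a floating point sum.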
.. role:: code
.. raw:: html
<style> .code {font-family:monospace;} </style>
<style> .caption {text-align:center;} </style>
.. highlight:: c++
.. _common-compute:
Common Compute
##############
Common Compute is not a platform, but it shares many elements of one. It exists
to reduce code duplication between the OpenCL and CUDA platforms. It allows a
single implementation to be written for most kernels that can be used by both
platforms.
OpenCL and CUDA are very similar to each other. Their computational models are
nearly identical. For example, each is based around launching kernels that are
executed in parallel by many threads. Each of them groups threads into blocks,
with more communication and synchronization permitted between the threads
in a block than between ones in different blocks. They have very similar memory
hierarchies: high latency global memory, low latency local/shared memory that
can be used for communication between the threads of a block, and local variables
that are visible only to a single thread.
Even their languages for writing kernels are very similar. Here is an OpenCL
kernel that adds two arrays together, storing the result in a third array.
::
    __kernel void addArrays(__global const float* restrict a,
                            __global const float* restrict b,
                            __global float* restrict c,
                            int length) {
        for (int i = get_global_id(0); i < length; i += get_global_size(0))
            c[i] = a[i]+b[i];
    }
Here is the corresponding CUDA kernel.
::
    extern "C" __global__ void addArrays(const float* __restrict__ a,
                                         const float* __restrict__ b,
                                         float* __restrict__ c,
                                         int length) {
        for (int i = blockIdx.x*blockDim.x+threadIdx.x; i < length; i += blockDim.x*gridDim.x)
            c[i] = a[i]+b[i];
    }
The difference between them is largely just a mechanical find-and-replace.
After many years of writing and maintaining nearly identical kernels by hand,
it finally occurred to us that the translation could be done automatically by
the compiler. Simply by defining a few preprocessor macros, the following
kernel can be compiled equally well either as OpenCL or as CUDA.
::
    KERNEL void addArrays(GLOBAL const float* RESTRICT a,
                          GLOBAL const float* RESTRICT b,
                          GLOBAL float* RESTRICT c,
                          int length) {
        for (int i = GLOBAL_ID; i < length; i += GLOBAL_SIZE)
            c[i] = a[i]+b[i];
    }
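To see how the macro approach works without a GPU, the kernel can be compiled as ordinary C++ by supplying stand-in definitions. The sketch below is only an illustration: the empty macro definitions, the :code:`emulatedGlobalId` and :code:`emulatedGlobalSize` variables, and the :code:`runAddArrays()` driver are hypothetical and are not what OpenMM defines on a device. Each emulated thread is simply run in turn on the host.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side stand-ins for the Common Compute macros, so the
// kernel body can be compiled and stepped through as ordinary C++.
// These are NOT the definitions OpenMM uses when compiling for a device.
#define KERNEL
#define GLOBAL
#define RESTRICT
static int emulatedGlobalId = 0;    // index of the emulated thread
static int emulatedGlobalSize = 1;  // total number of emulated threads
#define GLOBAL_ID emulatedGlobalId
#define GLOBAL_SIZE emulatedGlobalSize

// The common kernel, unchanged apart from being compiled on the host.
KERNEL void addArrays(GLOBAL const float* RESTRICT a,
                      GLOBAL const float* RESTRICT b,
                      GLOBAL float* RESTRICT c,
                      int length) {
    for (int i = GLOBAL_ID; i < length; i += GLOBAL_SIZE)
        c[i] = a[i]+b[i];
}

// Run the kernel once per emulated thread, one after another.
std::vector<float> runAddArrays(const std::vector<float>& a,
                                const std::vector<float>& b,
                                int numThreads) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size(), 0.0f);
    emulatedGlobalSize = numThreads;
    for (emulatedGlobalId = 0; emulatedGlobalId < numThreads; emulatedGlobalId++)
        addArrays(a.data(), b.data(), c.data(), (int) a.size());
    return c;
}
```

With two emulated threads, thread 0 handles the even indices and thread 1 the odd ones, which mirrors the strided loop used on the device.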
Writing Device Code
*******************
When compiling kernels with the Common Compute API, the following macros are
defined.
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|Macro                          |OpenCL Definition                                           |CUDA Definition                             |
+===============================+============================================================+============================================+
|:code:`KERNEL`                 |:code:`__kernel`                                            |:code:`extern "C" __global__`               |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`DEVICE`                 |                                                            |:code:`__device__`                          |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`LOCAL`                  |:code:`__local`                                             |:code:`__shared__`                          |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`LOCAL_ARG`              |:code:`__local`                                             |                                            |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`GLOBAL`                 |:code:`__global`                                            |                                            |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`RESTRICT`               |:code:`restrict`                                            |:code:`__restrict__`                        |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`LOCAL_ID`               |:code:`get_local_id(0)`                                     |:code:`threadIdx.x`                         |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`LOCAL_SIZE`             |:code:`get_local_size(0)`                                   |:code:`blockDim.x`                          |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`GLOBAL_ID`              |:code:`get_global_id(0)`                                    |:code:`(blockIdx.x*blockDim.x+threadIdx.x)` |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`GLOBAL_SIZE`            |:code:`get_global_size(0)`                                  |:code:`(blockDim.x*gridDim.x)`              |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`GROUP_ID`               |:code:`get_group_id(0)`                                     |:code:`blockIdx.x`                          |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`NUM_GROUPS`             |:code:`get_num_groups(0)`                                   |:code:`gridDim.x`                           |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`SYNC_THREADS`           |:code:`barrier(CLK_LOCAL_MEM_FENCE+CLK_GLOBAL_MEM_FENCE);`  |:code:`__syncthreads();`                    |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`SYNC_WARPS`             | | if SIMT width >= 32:                                     | | if compute capability >= 7.0:            |
|                               | |   :code:`mem_fence(CLK_LOCAL_MEM_FENCE)`                 | |   :code:`__syncwarp();`                  |
|                               | | otherwise:                                               | | otherwise empty                          |
|                               | |   :code:`barrier(CLK_LOCAL_MEM_FENCE)`                   |                                            |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`MEM_FENCE`              |:code:`mem_fence(CLK_LOCAL_MEM_FENCE+CLK_GLOBAL_MEM_FENCE);`|:code:`__threadfence_block();`              |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
|:code:`ATOMIC_ADD(dest, value)`|:code:`atom_add(dest, value)`                               |:code:`atomicAdd(dest, value)`              |
+-------------------------------+------------------------------------------------------------+--------------------------------------------+
A few other symbols may or may not be defined based on the device you are running on:
:code:`SUPPORTS_DOUBLE_PRECISION` and :code:`SUPPORTS_64_BIT_ATOMICS`\ . You
can use :code:`#ifdef` blocks with these symbols to conditionally compile code
based on the features supported by the device. In addition, the CUDA compiler
defines the symbol :code:`__CUDA_ARCH__`\ , so you can check for this symbol if
you want to have different code blocks for CUDA and OpenCL.
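As a rough illustration of conditional compilation with these symbols, the sketch below selects an accumulator type depending on whether :code:`SUPPORTS_DOUBLE_PRECISION` is defined. It is written as host C++ so it can be compiled anywhere. The symbol is defined by hand here purely for the example, and the :code:`accum_t` typedef and :code:`sumValues()` function are hypothetical names, not part of OpenMM.

```cpp
#include <cassert>

// Assumption for this sketch: pretend the device reports double precision
// support. In real device code the symbol is defined (or not) for you.
#define SUPPORTS_DOUBLE_PRECISION

#ifdef SUPPORTS_DOUBLE_PRECISION
typedef double accum_t;  // hypothetical name: accumulate in double when available
#else
typedef float accum_t;   // otherwise fall back to single precision
#endif

// Sum an array using the widest accumulator the "device" supports.
float sumValues(const float* values, int length) {
    assert(length >= 0);
    accum_t total = 0;
    for (int i = 0; i < length; i++)
        total += values[i];
    return (float) total;
}
```

Removing the :code:`#define` line switches the accumulator to :code:`float` without touching the function body.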
Both OpenCL and CUDA define vector types like :code:`int2` and :code:`float4`\ .
The types they support are different but overlapping. When writing common code,
use only the vector types that are supported by both OpenCL and CUDA: 2, 3, and 4
element vectors of type :code:`short`\ , :code:`int`\ , :code:`float`\ , and
:code:`double`\ .
CUDA uses functions to construct vector values, such as :code:`make_float2(x, y)`\ .
OpenCL instead uses a typecast-like syntax: :code:`(float2) (x, y)`\ . In common
code, use the CUDA-style :code:`make_` functions. OpenMM provides definitions
of these functions when compiling as OpenCL.
In CUDA, vector types are simply data structures. You can access their elements,
but not do much more with them. In contrast, OpenCL's vectors are mathematical
types. All standard math operators are defined for them, as well as geometrical
functions like :code:`dot()` and :code:`cross()`\ . When compiling kernels as
CUDA, OpenMM provides definitions of these operators and functions.
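To give a sense of what supplying operators and functions for a plain data structure involves, here is a minimal host-side sketch. The :code:`Float3` struct and :code:`makeFloat3()` helper are hypothetical stand-ins for a CUDA-style vector type and its constructor function. This is not OpenMM's implementation, only an example of the technique.

```cpp
#include <cassert>

// Hypothetical stand-in for a CUDA-style three element vector: a plain struct.
struct Float3 { float x, y, z; };

// Constructor function in the style of the make_ functions.
inline Float3 makeFloat3(float x, float y, float z) {
    Float3 v = {x, y, z};
    return v;
}

// Component-wise addition.
inline Float3 operator+(Float3 a, Float3 b) {
    return makeFloat3(a.x+b.x, a.y+b.y, a.z+b.z);
}

// Multiplication by a scalar.
inline Float3 operator*(Float3 a, float s) {
    return makeFloat3(a.x*s, a.y*s, a.z*s);
}

// Dot product.
inline float dot(Float3 a, Float3 b) {
    return a.x*b.x + a.y*b.y + a.z*b.z;
}

// Cross product.
inline Float3 cross(Float3 a, Float3 b) {
    return makeFloat3(a.y*b.z-a.z*b.y, a.z*b.x-a.x*b.z, a.x*b.y-a.y*b.x);
}
```

With definitions like these in place, an expression such as :code:`dot(a+b, cross(a, b))` reads the same way it would in OpenCL.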
OpenCL also supports "swizzle" notation for vectors. For example, if :code:`f`
is a :code:`float4` you can construct a vector of its first three elements
by writing :code:`f.xyz`\ , or you can swap its first two elements by writing
:code:`f.xy = f.yx`\ . Unfortunately, there is no practical way to support this
in CUDA, so swizzle notation cannot be used in common code. Because stripping
the final element from a four-component vector is such a common operation, OpenMM
provides a special function for doing it: :code:`trimTo3(f)` returns a vector
containing the first three elements of :code:`f`\ .
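The behaviour of such a trimming function can be sketched on the host as follows. The :code:`Float3` and :code:`Float4` structs and the name :code:`trimTo3Sketch()` are hypothetical; the example only shows the intended effect of dropping the fourth component, for instance when a four element value carries extra data in its last slot.

```cpp
#include <cassert>

// Hypothetical stand-ins for three and four element vector types.
struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// Return the first three elements of a four element vector, discarding w.
inline Float3 trimTo3Sketch(Float4 v) {
    Float3 result = {v.x, v.y, v.z};
    return result;
}
```

This does by function call what :code:`f.xyz` would do with swizzle notation.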
64 bit integers are another data type that needs special handling. Both OpenCL
and CUDA support them, but they use different names for them: :code:`long` in OpenCL,
:code:`long long` in CUDA. To work around this inconsistency, OpenMM provides
the typedefs :code:`mm_long` and :code:`mm_ulong` for signed and unsigned 64 bit
integers in device code.
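A sketch of how typedefs like these might be selected is shown below. The :code:`SKETCH_COMPILING_AS_OPENCL` switch is a hypothetical macro invented for this example, and the definitions are an assumption about the general approach rather than a copy of OpenMM's. On the host, :code:`long` is not guaranteed to be 64 bits wide, so the example takes the :code:`long long` branch and checks the width at compile time.

```cpp
#include <cassert>
#include <climits>

// Hypothetical switch for this sketch; define it to mimic an OpenCL build.
// #define SKETCH_COMPILING_AS_OPENCL

#ifdef SKETCH_COMPILING_AS_OPENCL
typedef long mm_long;                 // OpenCL spelling of a 64 bit integer
typedef unsigned long mm_ulong;
#else
typedef long long mm_long;            // CUDA spelling of a 64 bit integer
typedef unsigned long long mm_ulong;
#endif

static_assert(sizeof(mm_long)*CHAR_BIT == 64, "mm_long must be 64 bits wide");
static_assert(sizeof(mm_ulong)*CHAR_BIT == 64, "mm_ulong must be 64 bits wide");

// Pack two 32 bit values into one 64 bit value. Code written against the
// typedefs does not need to know which underlying name is in use.
mm_ulong packPair(unsigned int high, unsigned int low) {
    return (((mm_ulong) high) << 32) | low;
}
```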
Writing Host Code
*****************
Host code for Common Compute is very similar to host code for OpenCL or CUDA.
In fact, most of the classes provided by the OpenCL and CUDA platforms are
subclasses of Common Compute classes. For example, OpenCLContext and
CudaContext are both subclasses of ComputeContext. When writing common code,
each KernelImpl should expect a ComputeContext to be passed to its constructor.
By using the common API provided by that abstract class, it can be used for
either OpenCL or CUDA just based on the particular context passed to it at
runtime. Similarly, OpenCLNonbondedUtilities and CudaNonbondedUtilities are
subclasses of the abstract NonbondedUtilities class, and so on.
ArrayInterface is an abstract class defining the interface for arrays stored on
the device. OpenCLArray and CudaArray are both subclasses of it. To simplify
code that creates and uses arrays, there is also a third subclass called
ComputeArray. It acts as a wrapper around an OpenCLArray or CudaArray,
automatically creating an array of the appropriate type for the current
platform. In practice, just follow these rules:
1. Whenever you need to create an array, make it a ComputeArray.
2. Whenever you write a function that expects an array to be passed to it,
declare the type to be ArrayInterface.
If you do these two things, all differences between platforms will be handled
automatically.
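The two rules follow a familiar wrapper pattern, which the toy model below tries to capture. Every class here is hypothetical (note the :code:`Toy` prefix) and none of the method names should be read as OpenMM's actual API. The point is only that the creating code names the wrapper type, while consuming functions name the abstract interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Toy abstract interface, playing the role of ArrayInterface.
class ToyArrayInterface {
public:
    virtual ~ToyArrayInterface() {}
    virtual std::size_t getSize() const = 0;
    virtual std::string getBackendName() const = 0;
};

// Toy platform-specific arrays.
class ToyOpenCLArray : public ToyArrayInterface {
public:
    explicit ToyOpenCLArray(std::size_t size) : size(size) {}
    std::size_t getSize() const override { return size; }
    std::string getBackendName() const override { return "OpenCL"; }
private:
    std::size_t size;
};

class ToyCudaArray : public ToyArrayInterface {
public:
    explicit ToyCudaArray(std::size_t size) : size(size) {}
    std::size_t getSize() const override { return size; }
    std::string getBackendName() const override { return "CUDA"; }
private:
    std::size_t size;
};

enum class ToyBackend { OpenCL, CUDA };

// Toy wrapper, playing the role of ComputeArray: it creates the right
// platform-specific array and forwards every call to it.
class ToyComputeArray : public ToyArrayInterface {
public:
    ToyComputeArray(ToyBackend backend, std::size_t size) {
        if (backend == ToyBackend::OpenCL)
            impl.reset(new ToyOpenCLArray(size));
        else
            impl.reset(new ToyCudaArray(size));
    }
    std::size_t getSize() const override { return impl->getSize(); }
    std::string getBackendName() const override { return impl->getBackendName(); }
private:
    std::unique_ptr<ToyArrayInterface> impl;
};

// Rule 2: a function that receives an array declares the abstract type.
std::size_t bytesNeeded(const ToyArrayInterface& array, std::size_t elementSize) {
    return array.getSize()*elementSize;
}
```

Because :code:`bytesNeeded()` accepts the interface, it works equally with the wrapper or with either platform-specific array.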
OpenCL and CUDA have quite different APIs for compiling and invoking kernels.
To hide these differences, OpenMM provides a set of abstract classes. To compile
device code, pass the source code to :code:`compileProgram()` on the ComputeContext.
This returns a ComputeProgram. You can then call its :code:`createKernel()`
method to get a ComputeKernel object, which has methods for setting arguments
and invoking the kernel.
Sometimes you need to refer to vector types in host code, such as to set the
value for a kernel argument or to access the elements of an array. OpenCL and
CUDA both define types for them, but they have different names, and in any case
you want to avoid using OpenCL-specific or CUDA-specific types in common code.
OpenMM therefore defines types for vectors in host code. They have the same
names as the corresponding types in device code, only with the prefix :code:`mm_`\ ,
for example :code:`mm_int2` and :code:`mm_float4`\ .
Three component vectors need special care in this context, because the platforms
define them differently. In OpenCL, a three component vector is essentially a
four component vector whose last component is ignored. For example,
:code:`sizeof(float3)` is 12 in CUDA but 16 in OpenCL. Within a kernel this
distinction can usually be ignored, but when communicating between host and
device it becomes vitally important. It is generally best to avoid storing
three component vectors in arrays or passing them as arguments. There are no
:code:`mm_` host types defined for three component vectors, because CUDA and
OpenCL would require them to be defined in different ways.
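The size mismatch is easy to reproduce on the host. In the sketch below, :code:`PackedFloat3` and :code:`PaddedFloat3` are hypothetical structs that model the two layouts described above. They are not types defined by CUDA, OpenCL, or OpenMM. The offsets show why an array written with one layout cannot be read correctly with the other.

```cpp
#include <cassert>
#include <cstddef>

// Models the CUDA-like layout: three floats packed into 12 bytes.
struct PackedFloat3 { float x, y, z; };

// Models the OpenCL-like layout: three floats padded out to 16 bytes.
struct alignas(16) PaddedFloat3 { float x, y, z; };

static_assert(sizeof(PackedFloat3) == 12, "packed layout should be 12 bytes");
static_assert(sizeof(PaddedFloat3) == 16, "padded layout should be 16 bytes");

// Byte offset of element `index` in an array of each layout.
std::size_t packedOffset(std::size_t index) { return index*sizeof(PackedFloat3); }
std::size_t paddedOffset(std::size_t index) { return index*sizeof(PaddedFloat3); }
```

Already at the second element the two layouts disagree about where the data starts (byte 12 versus byte 16), and the gap grows with every element.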
../../api-python/_static/custom.css
\ No newline at end of file