Reorganise documentation (#3165)

* Break user guide into parts
* Break up file I missed
* Add basic .gitignore to suit out-of-tree builds in build, build1, build2 ... build9
* Small changes to autonumber.py for doc compilation

  This is a small change designed not to fix the incorrect logic of autonumber.py, but just to get the docs compiling. While assigning numbers, the code now just ignores the autonumber_by_chapter setting for a particular object if it can't find the appropriate section in section_numbers. This is a temporary fix!
* Include part and chapter in autonumbered objects
* Fix autonumber.py to correctly reference autonumber roles in file-level sections
* Unify and simplify styling across documentation
* Break dev docs down into individual chapter-files
* Correct absolute links to relative
* Disable browser suggestions for lunrsearch box in API docs
* Remove part name from chapter 2.1
* Rename ambiguous 'Home' link to OpenMM.org
* Typo
* Minor fixes and reversions

  Reverts some changes I had made and later thought better of, and fixes some recurring typos across the documentation.
* Number developers guide chapters
* Fix responsiveness
* Remove header.rst and center captions
* Add a level of depth to main TOC and styling to accommodate
* Add third level to Part-level TOCs
* Use :numref: instead of :ref: to number links to sections
* Continuously number chapters in user guide
* navbar links to other docs relative, not absolute
* Improve navbar spacing with new template
* Fix sidebar while allowing it to scroll
* Hard -> Soft links for navigation.html template
* Add navigation.html template to cmakelists
* Add another level of .. to relative links
* Fix flex on C++ and remove layer of ..

Unverified Commit f5166695 authored Jul 16, 2021 by Josh A. Mitchell Committed by GitHub Jul 15, 2021
20 changed files
--- a/.gitignore
+++ b/.gitignore
+__pycache__
+build
+build?
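The `build?` entry relies on `?` matching exactly one character, which is how it covers `build1` through `build9` while plain `build` needs its own line. The sketch below checks this with Python's `fnmatch`, which is only a stand-in for git's matcher (it should agree for slash-free patterns like these); the helper name `is_ignored` is hypothetical.

```python
from fnmatch import fnmatchcase

# The three patterns this commit adds to .gitignore.
PATTERNS = ["__pycache__", "build", "build?"]


def is_ignored(name: str) -> bool:
    """Return True if a directory name matches any of the patterns.

    fnmatchcase approximates git's glob matching; gitignore has extra
    rules (slashes, negation) that these patterns do not exercise.
    """
    return any(fnmatchcase(name, pattern) for pattern in PATTERNS)


for name in ["build", "build1", "build9", "build10", "builds"]:
    print(name, is_ignored(name))
```

Note that `build?` is not limited to digits: a directory called `builds` is ignored too, while `build10` is not.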
--- a/docs-source/api-c++/CMakeLists.txt
+++ b/docs-source/api-c++/CMakeLists.txt
@@ -4,6 +4,9 @@ file(GLOB STAGING_INPUT_FILES RELATIVE "${CMAKE_CURRENT_SOURCE_DIR}"
    "*.rst.jinja2"
    "*.py"
    "_static/logo.png"
+    "_static/custom.css"
+    "_templates/lunrsearch.html"
+    "_templates/navigation.html"
 )
 set(WRAPPER_DOXYGEN_DIR "${CMAKE_CURRENT_BINARY_DIR}/doxygen")
@@ -51,6 +54,9 @@ add_custom_command(
            "${CMAKE_CURRENT_BINARY_DIR}/library.rst"
            "${CMAKE_CURRENT_BINARY_DIR}/index.rst"
            "${CMAKE_CURRENT_BINARY_DIR}/_static/logo.png"
+            "${CMAKE_CURRENT_BINARY_DIR}/_static/custom.css"
+            "${CMAKE_CURRENT_BINARY_DIR}/_templates/lunrsearch.html"
+            "${CMAKE_CURRENT_BINARY_DIR}/_templates/navigation.html"
            "${WRAPPER_DOXYGEN_DIR}/xml/index.xml"
 )

--- a/docs-source/api-c++/_static/custom.css
+++ b/docs-source/api-c++/_static/custom.css
+../../api-python/_static/custom.css
\ No newline at end of file
--- a/docs-source/api-c++/_templates/lunrsearch.html
+++ b/docs-source/api-c++/_templates/lunrsearch.html
+../../api-python/_templates/lunrsearch.html
\ No newline at end of file
--- a/docs-source/api-c++/_templates/navigation.html
+++ b/docs-source/api-c++/_templates/navigation.html
+../../api-python/_templates/navigation.html
\ No newline at end of file
--- a/docs-source/api-c++/conf.py
+++ b/docs-source/api-c++/conf.py
-import sys
 import os
+import sys
-extensions = ['sphinx.ext.mathjax','sphinx.ext.autosummary',
-              'sphinx.ext.autodoc', 'sphinxcontrib.lunrsearch',
-              'sphinxcontrib.autodoc_doxygen']
+extensions = [
+    "sphinx.ext.mathjax",
+    "sphinx.ext.autosummary",
+    "sphinx.ext.autodoc",
+    "sphinxcontrib.lunrsearch",
+    "sphinxcontrib.autodoc_doxygen",
+]
 autosummary_generate = True
-autodoc_member_order = 'bysource'
+autodoc_member_order = "bysource"
-source_suffix = '.rst'
+source_suffix = ".rst"
-master_doc = 'index'
+master_doc = "index"
-project = u'OpenMM'
+project = u"OpenMM C++ API"
-copyright = u'2015, Stanford University and the Authors'
+copyright = u"2015, Stanford University and the Authors"
-version = '@OPENMM_MAJOR_VERSION@.@OPENMM_MINOR_VERSION@'
+version = "@OPENMM_MAJOR_VERSION@.@OPENMM_MINOR_VERSION@"
-release = '@OPENMM_MAJOR_VERSION@.@OPENMM_MINOR_VERSION@'
+release = "@OPENMM_MAJOR_VERSION@.@OPENMM_MINOR_VERSION@"
-exclude_patterns = ['_build', '_templates']
+exclude_patterns = ["_build", "_templates"]
-html_static_path = ['_static']
+html_static_path = ["_static"]
-templates_path = ['_templates']
+templates_path = ["_templates"]
-pygments_style = 'sphinx'
+pygments_style = "sphinx"
 html_theme = "alabaster"
 html_theme_options = {
-    'description': "High performance molecular simulation on GPUs",
-    'github_button': False,
-    # 'github_user': 'pandegroup',
-    # 'github_repo': 'openmm',
-    'logo_name': False,
-    'logo': 'logo.png',
+    "github_button": False,
+    "github_user": "openmm",
+    "github_repo": "openmm",
+    "logo_name": True,
+    "logo": "logo.png",
+    "extra_nav_links": [
+        {
+            "title": "OpenMM.org",
+            "uri": "https://openmm.org",
+            "relative": False,
+        },
+        {
+            "title": "User's Manual",
+            "uri": "../userguide/",
+            "relative": True,
+        },
+        {
+            "title": "Developer Guide",
+            "uri": "../developerguide/",
+            "relative": True,
+        },
+        {
+            "title": "Python API reference",
+            "uri": "../api-python/",
+            "relative": True,
+        },
+        {
+            "title": "GitHub",
+            "uri": "https://github.com/openmm",
+            "relative": False,
+        },
+    ],
+    "show_relbar_bottom": True,
 }
 html_sidebars = {
-    '**': [
+    "**": [
-        'about.html',
+        "about.html",
-        'searchbox.html',
+        "lunrsearch.html",
-        'navigation.html',
+        "navigation.html",
    ]
 }

--- a/docs-source/api-python/CMakeLists.txt
+++ b/docs-source/api-python/CMakeLists.txt
@@ -7,7 +7,10 @@ file(GLOB STAGING_INPUT_FILES RELATIVE "${CMAKE_CURRENT_SOURCE_DIR}"
    "conf.py"
    "process-docstring.py"
    "_static/logo.png"
+    "_static/custom.css"
    "_templates/class.rst"
+    "_templates/lunrsearch.html"
+    "_templates/navigation.html"
 )
 foreach(INIT_FILE ${STAGING_INPUT_FILES})
@@ -41,7 +44,10 @@ add_custom_command(
            "${CMAKE_CURRENT_BINARY_DIR}/library.rst"
            "${CMAKE_CURRENT_BINARY_DIR}/index.rst"
            "${CMAKE_CURRENT_BINARY_DIR}/_static/logo.png"
+            "${CMAKE_CURRENT_BINARY_DIR}/_static/custom.css"
            "${CMAKE_CURRENT_BINARY_DIR}/_templates/class.rst"
+            "${CMAKE_CURRENT_BINARY_DIR}/_templates/lunrsearch.html"
+            "${CMAKE_CURRENT_BINARY_DIR}/_templates/navigation.html"
            PythonInstall
 )
 file(MAKE_DIRECTORY ${CMAKE_BINARY_DIR}/api-python/)

--- a/docs-source/api-python/_static/custom.css
+++ b/docs-source/api-python/_static/custom.css
+/* Reasonable defaults */
+html {
+    overflow-x: hidden;
+    overflow-y: scroll;
+    text-rendering: optimizeLegibility;
+    text-size-adjust: 100%;
+    -moz-osx-font-smoothing: grayscale;
+    -webkit-font-smoothing: antialiased;
+}
+/* Fix responsiveness */
+body {
+    overflow-x: hidden;
+}
+div.body {
+    min-width: unset;
+}
+@media screen and (max-width: 870px) {
+    div.sphinxsidebar p.logo {
+        display: unset;
+    }
+}
+@media screen and (max-width: 875px) {
+    ul {
+        margin-left: 30px;
+    }
+    div.sphinxsidebar {
+        width: 100vw;
+        padding: 0;
+    }
+}
+@media screen and (min-width: 871px) and (max-width: 940px) {
+    div.document {
+        width: 100vw
+    }
+}
+/* When search bar is in nav footer, don't let it stretch too far */
+.searchformwrapper {
+    max-width: 250px;
+}
+/* Fix next/prev links in footer */
+/* Don't just float the whole thing right */
+nav#rellinks {
+    float: unset;
+}
+nav#rellinks ul {
+    padding-left: 0;
+    display: flex;
+    justify-content: space-between;
+    flex-wrap: wrap;
+}
+nav#rellinks li {
+    line-height: 1.3;
+    padding: 5px 0px;
+}
+nav#rellinks li:first-child {
+    display: block;
+    text-indent: -17px;
+    padding-left: 17px;
+}
+nav#rellinks li + li {
+    margin-left: auto;
+    text-align: right;
+    display: flex;
+}
+nav#rellinks li + li a {
+    display: inline-block;
+    margin-right: 5px;
+}
+nav#rellinks li + li:before {
+    content: "";
+}
+/* Put the title and logo side by side*/
+.sphinxsidebarwrapper {
+    display: flex;
+    flex-wrap: wrap;
+    align-items: flex-start;
+    flex-direction: row-reverse;
+    justify-content: space-between;
+    align-content: flex-start;
+}
+.sphinxsidebar .logo-name {
+    flex-basis: 140px;
+    font-size: 20px;
+}
+.sphinxsidebar p.logo {
+    flex-basis: 60px;
+    text-align: right;
+    display: block;
+    margin-top: 0 !important;
+}
+/* Get control over the image */
+.sphinxsidebar p.logo a {
+    height: auto;
+    display: block;
+}
+/* Make sure the remaining items use the width of
+ * the whole navbar and don't get squished together
+ * by flex.
+ */
+.sphinxsidebar .logo-name ~ * {
+    flex-basis: 100%
+}
+/* Emulate a placeholder rather than a heading for search */
+.sphinxsidebar #searchbox form.search input[type="text"] {
+    background-image: url("data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg' version='1.1' height='50px' width='120px'><text x='10' y='17' fill='gray' font-size='15'>Search...</text></svg>");
+    background-repeat: no-repeat;
+}
+.sphinxsidebar #searchbox form.search input[type="text"]:focus {
+    background-image: none;
+}
+/* Hide unwanted elements*/
+.sphinxsidebarwrapper #searchbox h3, /* Search heading */
+.sphinxsidebarwrapper > h3, /* Navigation heading */
+.sphinxsidebarwrapper p:empty, /* Empty elements taking up space */
+.sphinxsidebar .logo-name + a[href], /* Inexplicable but ugly link */
+.sphinxsidebarwrapper hr /* Horizontal rules */
+{
+    display: none;
+}
+/* Hide logo on tiny screens */
+@media screen and (max-width: 280px) {
+    div.sphinxsidebar p.logo {
+        display: none;
+    }
+    .sphinxsidebar .logo-name {
+        flex-basis: 100%;
+    }
+}
+/* Style TOC in sidebar more clearly */
+.sphinxsidebarwrapper li.toctree-l1 {
+    padding: 0.15em 0;
+    line-height: 1.4;
+}
+.sphinxsidebarwrapper a.current,
+.sphinxsidebarwrapper a.current:hover {
+    text-decoration: none;
+    border-bottom: none;
+    cursor: text;
+}
+/* Tweak spacing */
+div.sphinxsidebarwrapper #searchbox {
+    margin-bottom: 0;
+}
+div.sphinxsidebarwrapper .nav-toctree > ul {
+    margin: 5px 0;
+}
+/* Enlarge space between toctrees and external links */
+div.sphinxsidebarwrapper .nav-toctree {
+    margin-top: 15px;
+    margin-bottom: 15px;
+}
+div.sphinxsidebarwrapper .extra-nav-links {
+    margin-bottom: 0;
+}
+/* Custom body styling */
+/* Center captions of figures, examples, etc. */
+.body .caption {
+    text-align: center;
+}
+.body .toctree-l1 {
+    font-weight: bold;
+}
+.body .toctree-l2 {
+    font-weight: normal;
+}
+.body .toctree-l3 {
+    font-size: 0.8em;
+}
+/* Fix navbar to top */
+@media screen and (min-width: 875px) {
+    .sphinxsidebar {
+        position: fixed;
+        height: 100vh;
+        overflow-y: hidden;
+        top: 0;
+        float: unset !important;
+        margin-left: 0 !important;
+    }
+    .sphinxsidebarwrapper {
+        height: calc(100% - 60px);
+        overflow-y: auto;
+        padding-top: 30px !important;
+        padding-bottom: 30px !important;
+        /* Hide scrollbar */
+        -ms-overflow-style: none;  /* IE and Edge */
+        scrollbar-width: none;  /* Firefox */
+    }
+    /* Hide scrollbar */
+    .sphinxsidebarwrapper::-webkit-scrollbar {
+        display: none
+    }
+}
--- a/docs-source/api-python/_templates/lunrsearch.html
+++ b/docs-source/api-python/_templates/lunrsearch.html
+<!--
+    sphinxcontrib-lunrsearch injects its own search template at the front of the
+    line, so to overwrite it, I'm using this template with a different name, and
+    specifying it in html_sidebars. This is a temporary measure until we replace
+    or remove sphinxcontrib-lunrsearch
+-->
+<script type="text/javascript">
+var Search = {
+    store : null,
+    setIndex : function (data) {
+        this.store = data.store;
+    },
+};
+</script>
+{# The script searchindex.js contains the code Search.setIndex(...) where
+   the content is an object built from IndexBuilder.freeze(). So we need to
+   set up the Search.setIndex function beforehand just to store the data.
+   This should all be finished when onload fires, and at that point the code in
+   searchbox.js will pull the data out of Search.store and build the actual
+   index and callbacks.
+ #}
+<script src="{{ pathto('searchindex.js', 1) }}" type="text/javascript"></script>
+<form class="search" action="" method="get">
+  <input type="hidden" name="check_keywords" value="yes" />
+  <input type="hidden" name="area" value="default" />
+  <input type="hidden" id="ls_lunrsearch-highlight" value="{{ lunrsearch_highlight }}" />
+  <input type="text" class="search-field" id="ls_search-field" name="q" placeholder="Search API..." autocomplete="off" />
+  <ul class="results" id="ls_search-results"></ul>
+</form>
--- a/docs-source/api-python/_templates/navigation.html
+++ b/docs-source/api-python/_templates/navigation.html
+<div class="navigation-scrollbox">
+    <div class="nav-toctree">
+    {{ toctree(includehidden=theme_sidebar_includehidden, collapse=theme_sidebar_collapse) }}
+    </div>
+    {% if theme_extra_nav_links %}
+    <ul class="extra-nav-links">
+        {% for link in theme_extra_nav_links %}
+        <li class="toctree-l1">
+            <a href="{{ pathto(link.uri, 1) if link.relative else link.uri }}">
+                {{ link.title }}
+            </a>
+        </li>
+        {% endfor %}
+    </ul>
+    {% endif %}
+</div>
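The template above decides per link whether to pass the URI through Sphinx's `pathto()` or to emit it unchanged. The sketch below approximates that decision in plain Python so the effect of the `relative` flag is easier to see. It assumes that `pathto(uri, 1)` prefixes the URI with one `../` for each directory level of the current page; the function name `resolve_nav_link` is hypothetical and this is not Sphinx's own code.

```python
def resolve_nav_link(link: dict, pagename: str) -> str:
    """Approximate the href emitted for one extra_nav_links entry.

    link     -- dict with "uri" and "relative" keys, as in conf.py
    pagename -- Sphinx-style page name without suffix, e.g. "generated/foo"
    """
    if not link["relative"]:
        # Absolute links are written into the page unchanged.
        return link["uri"]
    # Assumed behaviour of pathto(uri, 1): climb out of the current page's
    # directory, then append the URI exactly as written in conf.py.
    depth = pagename.count("/")
    return "../" * depth + link["uri"]


user_guide = {"title": "User's Manual", "uri": "../userguide/", "relative": True}
home = {"title": "OpenMM.org", "uri": "https://openmm.org", "relative": False}

print(resolve_nav_link(user_guide, "index"))
print(resolve_nav_link(user_guide, "generated/openmm.openmm.System"))
print(resolve_nav_link(home, "generated/openmm.openmm.System"))
```

Under this assumption a page in a subdirectory reaches the sibling documentation tree with a longer `../` prefix than the index page does, while external links are the same on every page.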
--- a/docs-source/api-python/conf.py
+++ b/docs-source/api-python/conf.py
 # -*- coding: utf-8 -*-
-import sys
 import os
+import sys
 import openmm.version
-extensions = ['sphinx.ext.mathjax', 'sphinx.ext.ifconfig', 'sphinx.ext.autosummary',
-              'sphinx.ext.autodoc', 'sphinx.ext.napoleon', 'process-docstring',
-              'sphinxcontrib.lunrsearch']
+extensions = [
+    "sphinx.ext.mathjax",
+    "sphinx.ext.ifconfig",
+    "sphinx.ext.autosummary",
+    "sphinx.ext.autodoc",
+    "sphinx.ext.napoleon",
+    "process-docstring",
+    "sphinxcontrib.lunrsearch",
+]
 autosummary_generate = True
 autodoc_default_options = {
-    'members': True,
+    "members": True,
-    'inherited-members': True,
+    "inherited-members": True,
-    'member-order': 'bysource'
+    "member-order": "bysource",
 }
-source_suffix = '.rst'
+source_suffix = ".rst"
-master_doc = 'index'
+master_doc = "index"
-project = u'OpenMM'
+project = u"OpenMM Python API"
-copyright = u'2015, Stanford University and the Authors'
+copyright = u"2015, Stanford University and the Authors"
 version = openmm.version.short_version
 release = openmm.version.full_version
-exclude_patterns = ['_build', '_templates']
+exclude_patterns = ["_build", "_templates"]
-html_static_path = ['_static']
+html_static_path = ["_static"]
-templates_path = ['_templates']
+templates_path = ["_templates"]
-pygments_style = 'sphinx'
+pygments_style = "sphinx"
 html_theme = "alabaster"
 html_theme_options = {
-    'description': "High performance molecular simulation on GPUs",
-    'github_button': False,
-    # 'github_user': 'pandegroup',
-    # 'github_repo': 'openmm',
-    'logo_name': False,
-    'logo': 'logo.png',
+    "github_button": False,
+    "github_user": "openmm",
+    "github_repo": "openmm",
+    "logo_name": True,
+    "logo": "logo.png",
+    "extra_nav_links": [
+        {
+            "title": "OpenMM.org",
+            "uri": "https://openmm.org",
+            "relative": False,
+        },
+        {
+            "title": "User's Manual",
+            "uri": "../userguide/",
+            "relative": True,
+        },
+        {
+            "title": "Developer Guide",
+            "uri": "../developerguide/",
+            "relative": True,
+        },
+        {
+            "title": "C++ API reference",
+            "uri": "../api-c++/",
+            "relative": True,
+        },
+        {
+            "title": "GitHub",
+            "uri": "https://github.com/openmm",
+            "relative": False,
+        },
+    ],
+    "show_relbar_bottom": True,
 }
 html_sidebars = {
-    '**': [
+    "**": [
-        'about.html',
+        "about.html",
-        'searchbox.html',
+        "lunrsearch.html",
-        'navigation.html',
+        "navigation.html",
    ]
 }

--- a/docs-source/developerguide/01_introduction.rst
+++ b/docs-source/developerguide/01_introduction.rst
+.. role:: code
+.. raw:: html
+    <style> .code {font-family:monospace;} </style>
+    <style> .caption {text-align:center;} </style>
+.. highlight:: c++
+Introduction
+############
+This guide describes the internal architecture of the OpenMM library.  It is
+targeted at developers who want to add features to OpenMM, either by modifying
+the core library directly or by writing plugins.  If you just want to write
+applications that use OpenMM, you do not need to read this guide; the User's
+Manual tells you everything you need to know.  This guide is intended for
+people who want to contribute to OpenMM itself.
+It is organized as follows:
+* Chapter :numref:`the-core-library` describes the architecture of the core OpenMM library.  It
+  discusses how the high level and low level APIs relate to each other, and the
+  flow of execution between them.
+* Chapter :numref:`writing-plugins` describes in detail how to write a plugin.  It focuses on the two
+  most common types of plugins: those which define new Forces, and those which
+  implement new Platforms.
+* Chapter :numref:`the-reference-platform` discusses the architecture of the reference Platform, providing
+  information relevant to writing reference implementations of new features.
+* Chapter :numref:`the-cpu-platform` discusses the architecture of the CPU Platform, providing
+  information relevant to writing CPU implementations of new features.
+* Chapter :numref:`the-opencl-platform` discusses the architecture of the OpenCL Platform, providing
+  information relevant to writing OpenCL implementations of new features.
+* Chapter :numref:`the-cuda-platform` discusses the architecture of the CUDA Platform, providing
+  information relevant to writing CUDA implementations of new features.
+* Chapter :numref:`common-compute` describes the Common Compute framework, which lets you
+  write a single implementation of a feature that can be used for both OpenCL and CUDA.
+This guide assumes you are already familiar with the public API and how to use
+OpenMM in applications.  If that is not the case, you should first read the
+User's Manual and work through some of the example programs.  Pay especially
+close attention to the “Introduction to the OpenMM Library” chapter, since it
+introduces concepts that are important in understanding this guide.
--- a/docs-source/developerguide/02_core_library.rst
+++ b/docs-source/developerguide/02_core_library.rst
+.. role:: code
+.. raw:: html
+    <style> .code {font-family:monospace;} </style>
+    <style> .caption {text-align:center;} </style>
+.. highlight:: c++
+.. _the-core-library:
+The Core Library
+################
+OpenMM is based on a layered architecture, as shown in the following diagram:
+.. figure:: ../images/ArchitectureLayers.jpg
+   :align: center
+   :width: 100%
+   :autonumber:`Figure,Architecture Layers`\ : OpenMM architecture
+The public API layer consists of the classes you access when using OpenMM in an
+application: System; Force and its subclasses; Integrator and its subclasses;
+and Context.  These classes define a public interface but do no computation.
+The next layer down consists of “implementation” classes that mirror the public
+API classes: ContextImpl, ForceImpl, and a subclass of ForceImpl for each
+subclass of Force (HarmonicBondForceImpl, NonbondedForceImpl, etc.).  These
+objects are created automatically when you create a Context.  They store
+information related to a particular simulation, and define methods for
+performing calculations.
+Note that, whereas a Force is logically “part of” a System, a ForceImpl is
+logically “part of” a Context.  (See :autonumref:`Figure,API Relationships`\ .)  If you create many Contexts
+for simulating the same System, there is still only one System and only one copy
+of each Force in it.  But there will be separate ForceImpls for each Context,
+and those ForceImpls store information related to their particular Contexts.
+.. figure:: ../images/SystemContextRelationships.jpg
+   :align: center
+   :autonumber:`Figure,API Relationships`\ : Relationships between public API and implementation layer objects
+Also note that there is no “IntegratorImpl” class, because it is not needed.
+Integrator is already specific to one Context.  Many Contexts can all simulate
+the same System, but each of them must have its own Integrator, so information
+specific to one simulation can be stored directly in the Integrator.
+The next layer down is the OpenMM Low Level API (OLLA).  The important classes
+in this layer are: Platform; Kernel; KernelImpl and its subclasses; and
+KernelFactory.  A Kernel is just a reference counted pointer to a KernelImpl;
+the real work is done by KernelImpl objects (or more precisely, by instances of
+its subclasses).  A KernelFactory creates KernelImpl objects, and a Platform
+ties together a set of KernelFactories, as well as defining information that
+applies generally to performing computations with that Platform.
+All of these classes (except Kernel) are abstract.  A particular Platform
+provides concrete subclasses of all of them.  For example, the reference
+platform defines a Platform subclass called ReferencePlatform, a KernelFactory
+subclass called ReferenceKernelFactory, and a concrete subclass of each abstract
+KernelImpl type: ReferenceCalcNonbondedForceKernel extends
+CalcNonbondedForceKernel (which in turn extends KernelImpl),
+ReferenceIntegrateVerletStepKernel extends IntegrateVerletStepKernel, and so on.
+We can understand this better by walking through the entire sequence of events
+that takes place when you create a Context.  As an example, suppose you create a
+System; add a NonbondedForce to it; create a VerletIntegrator; and then create a
+Context for them using the reference Platform.  Here is what happens.
+#. The Context constructor creates a ContextImpl.
+#. The ContextImpl calls :code:`createImpl()` on each Force in the System,
+   which creates an instance of the appropriate ForceImpl subclass.
+#. The ContextImpl calls :code:`contextCreated()` on the Platform, which
+   in turn calls :code:`setPlatformData()` on the ContextImpl.  This allows
+   Platform-specific information to be stored in a ContextImpl.  Every Platform has
+   its own mechanism for storing particle masses, constraint definitions, particle
+   positions, and so on.  ContextImpl therefore allows the Platform to create an
+   arbitrary block of data and store it where it can be accessed by that Platform’s
+   kernels.
+#. The ContextImpl calls :code:`createKernel()` on the Platform several
+   times to get instances of various kernels that it needs:
+   CalcKineticEnergyKernel, ApplyConstraintsKernel, etc.
+   #. For each kernel, the Platform looks up which KernelFactory has been
+      registered for that particular kernel.  In this case, it will be a
+      ReferenceKernelFactory.
+   #. It calls :code:`createKernelImpl()` on the KernelFactory, which
+      creates and returns an instance of an appropriate KernelImpl subclass:
+      ReferenceCalcKineticEnergyKernel, ReferenceApplyConstraintsKernel, etc.
+#. The ContextImpl loops over all of its ForceImpls and calls
+   :code:`initialize()` on each one.
+   #. Each ForceImpl asks the Platform to create whatever kernels it needs.  In
+      this example, NonbondedForceImpl will request a CalcNonbondedForceKernel, and
+      get back a ReferenceCalcNonbondedForceKernel.
+#. The ContextImpl calls :code:`initialize()` on the Integrator which, like
+   the other objects, requests kernels from the Platform.  In this example,
+   VerletIntegrator requests an IntegrateVerletStepKernel and gets back a
+   ReferenceIntegrateVerletStepKernel.
+At this point, the Context is fully initialized and ready for doing computation.
+Reference implementations of various KernelImpls have been created, but they are
+always referenced through abstract superclasses.  Similarly, data structures
+specific to the reference Platform have been created and stored in the
+ContextImpl, but the format and content of these structures is opaque to the
+ContextImpl.  Whenever it needs to access them (for example, to get or set
+particle positions), it does so through a kernel (UpdateStateDataKernel in this
+case).
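The creation sequence described above can be modelled in a few lines. The following is a toy Python stand-in, not OpenMM's C++ classes: the class names mirror the ones in the text, but every method name in snake_case, the string kernel keys, and the helper `make_reference_platform` are assumptions made for illustration. It covers only steps 2, 5 and 6 of the list (creating ForceImpls, initializing them, initializing the Integrator).

```python
class KernelImpl:
    """Toy base class for platform-specific kernel implementations."""


class ReferenceCalcNonbondedForceKernel(KernelImpl):
    """Toy reference implementation of the nonbonded force kernel."""


class ReferenceIntegrateVerletStepKernel(KernelImpl):
    """Toy reference implementation of the Verlet step kernel."""


class ReferenceKernelFactory:
    """Creates the reference KernelImpl registered under a kernel name."""

    KERNELS = {
        "CalcNonbondedForce": ReferenceCalcNonbondedForceKernel,
        "IntegrateVerletStep": ReferenceIntegrateVerletStepKernel,
    }

    def create_kernel_impl(self, name):
        return self.KERNELS[name]()


class Platform:
    """Ties kernel names to the factories registered for them."""

    def __init__(self):
        self.factories = {}

    def register_kernel_factory(self, name, factory):
        self.factories[name] = factory

    def create_kernel(self, name):
        # Look up the factory registered for this kernel, then delegate.
        return self.factories[name].create_kernel_impl(name)


class NonbondedForceImpl:
    def initialize(self, platform):
        self.kernel = platform.create_kernel("CalcNonbondedForce")


class NonbondedForce:
    def create_impl(self):
        return NonbondedForceImpl()


class VerletIntegrator:
    def initialize(self, platform):
        self.kernel = platform.create_kernel("IntegrateVerletStep")


class ContextImpl:
    def __init__(self, forces, integrator, platform):
        # Step 2: one ForceImpl per Force in the System.
        self.force_impls = [force.create_impl() for force in forces]
        # Step 5: each ForceImpl requests the kernels it needs.
        for impl in self.force_impls:
            impl.initialize(platform)
        # Step 6: the Integrator requests its kernels the same way.
        integrator.initialize(platform)
        self.integrator = integrator


def make_reference_platform():
    """Build a Platform with one factory registered for every kernel."""
    platform = Platform()
    factory = ReferenceKernelFactory()
    for name in ReferenceKernelFactory.KERNELS:
        platform.register_kernel_factory(name, factory)
    return platform


context = ContextImpl([NonbondedForce()], VerletIntegrator(), make_reference_platform())
print(type(context.force_impls[0].kernel).__name__)
print(type(context.integrator.kernel).__name__)
```

The point of the model is that neither `NonbondedForceImpl` nor `VerletIntegrator` names a reference class; they ask the Platform for a kernel by name and receive whatever implementation the registered factory produces.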
+Now suppose that you call :code:`step()` on the VerletIntegrator.  Here is
+what happens to execute each time step.
+#. The VerletIntegrator calls :code:`updateContextState()` on the
+   ContextImpl.  This gives each Force an opportunity to modify the state of the
+   Context at the start of each time step.
+   #. The ContextImpl loops over its ForceImpls and calls
+      :code:`updateContextState()` on each one.  In this case, our only ForceImpl is
+      a NonbondedForceImpl, which returns without doing anything.  On the other hand,
+      if we had an AndersenThermostat in our System, its ForceImpl would invoke a
+      kernel to modify particle velocities.
+#. The VerletIntegrator calls :code:`calcForcesAndEnergy()` on the
+   ContextImpl to request that the forces be computed.
+   #. The ContextImpl calls :code:`beginComputation()` on its
+      CalcForcesAndEnergyKernel.  This initializes all the forces to zero and does any
+      other initialization the Platform requires before forces can be computed.  For
+      example, some Platforms construct their nonbonded neighbor lists at this point.
+   #. The ContextImpl loops over its ForceImpls and calls
+      :code:`calcForcesAndEnergy()` on each one.  In this case, we have a
+      NonbondedForceImpl which invokes its CalcNonbondedForceKernel to compute forces.
+   #. Finally, the ContextImpl calls :code:`finishComputation()` on its
+      CalcForcesAndEnergyKernel.  This does any additional work needed to determine
+      the final forces, such as summing the values from intermediate buffers.
+#. Finally, the VerletIntegrator invokes its IntegrateVerletStepKernel.  This
+   takes the forces, positions, and velocities that are stored in a
+   Platform-specific format in the ContextImpl, uses them to compute new positions and
+   velocities, and stores them in the ContextImpl.
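The order of calls within one time step can be traced with a small recording model. This is a toy Python sketch under the assumption that each step simply appends a label to a log; `ToyContextImpl` and `verlet_step` are hypothetical names, and no real forces are computed.

```python
class ToyContextImpl:
    """Records the calls the text describes instead of doing any physics."""

    def __init__(self, force_impl_names):
        self.force_impl_names = list(force_impl_names)
        self.log = []

    def update_context_state(self):
        # Each ForceImpl gets a chance to modify the Context's state.
        for name in self.force_impl_names:
            self.log.append(f"{name}.updateContextState")

    def calc_forces_and_energy(self):
        # Zero the forces and do platform-specific setup.
        self.log.append("CalcForcesAndEnergyKernel.beginComputation")
        # Let every ForceImpl contribute its forces.
        for name in self.force_impl_names:
            self.log.append(f"{name}.calcForcesAndEnergy")
        # Combine intermediate results into the final forces.
        self.log.append("CalcForcesAndEnergyKernel.finishComputation")


def verlet_step(context):
    """Run one toy time step and return the recorded call order."""
    context.update_context_state()
    context.calc_forces_and_energy()
    # The integrator's own kernel runs last, using the finished forces.
    context.log.append("IntegrateVerletStepKernel")
    return context.log


for entry in verlet_step(ToyContextImpl(["NonbondedForceImpl"])):
    print(entry)
```

With more than one ForceImpl in the list, all of their `updateContextState` entries appear before any force computation begins, which matches the ordering in the numbered steps.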
--- a/docs-source/developerguide/03_writing_plugins.rst
+++ b/docs-source/developerguide/03_writing_plugins.rst
+.. role:: code
+.. raw:: html
+    <style> .code {font-family:monospace;} </style>
+    <style> .caption {text-align:center;} </style>
+.. highlight:: c++
+.. _writing-plugins:
+Writing Plugins
+###############
+A plugin is a dynamic library that adds new features to OpenMM.  It is typically
+stored in the :code:`lib/plugins` directory inside your OpenMM installation,
+and gets loaded along with all other plugins when the user calls
+::
+    Platform::loadPluginsFromDirectory(Platform::getDefaultPluginsDirectory());
+It is also possible to load plugins from a different directory, or to load them
+individually by calling :code:`Platform::loadPluginLibrary()`\ .
+Every plugin must implement two functions that are declared in the
+PluginInitializer.h header file:
+::
+    extern "C" void registerPlatforms();
+    extern "C" void registerKernelFactories();
+When a plugin is loaded, these two functions are invoked to register any
+Platforms and KernelFactories defined by the plugin.  When many plugins are
+loaded at once by calling :code:`Platform::loadPluginsFromDirectory()`\ ,
+:code:`registerPlatforms()` is first called on all of them, then
+:code:`registerKernelFactories()` is called on all of them.  This allows one
+plugin to define a Platform, and a different plugin to add KernelFactories to
+it; the Platform is guaranteed to be registered by the first plugin before the
+second plugin tries to add its KernelFactories, regardless of what order the
+plugins happen to be loaded in.
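The two-pass loading order described above can be illustrated with a toy loader. This Python sketch is not OpenMM's loader: `ToyPlugin`, `load_plugins`, and the platform and factory names are all hypothetical, and a dictionary stands in for the Platform registry.

```python
class ToyPlugin:
    """A plugin that may define platforms and/or add kernel factories."""

    def __init__(self, platforms=(), kernel_factories=()):
        self.platforms = list(platforms)
        # Pairs of (platform name, factory name) to register.
        self.kernel_factories = list(kernel_factories)


def load_plugins(plugins):
    """Register all platforms first, then all kernel factories."""
    registry = {}  # platform name -> sorted list of factory names
    # Pass 1: the equivalent of calling registerPlatforms() on every plugin.
    for plugin in plugins:
        for platform in plugin.platforms:
            registry.setdefault(platform, [])
    # Pass 2: the equivalent of calling registerKernelFactories() on every
    # plugin. Every platform already exists, whatever the load order was.
    for plugin in plugins:
        for platform, factory in plugin.kernel_factories:
            registry[platform].append(factory)
    return {name: sorted(factories) for name, factories in registry.items()}


defines_platform = ToyPlugin(platforms=["ToyPlatform"])
extends_platform = ToyPlugin(kernel_factories=[("ToyPlatform", "ToyForceKernelFactory")])

print(load_plugins([defines_platform, extends_platform]))
print(load_plugins([extends_platform, defines_platform]))
```

Both calls produce the same registry. A single-pass loader that ran each plugin's two functions back to back would fail on the second ordering, because the factory would be added before its platform existed.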
+Creating New Platforms
+**********************
+One common type of plugin defines a new Platform.  There are four such plugins
+that come with OpenMM: one for the Reference platform, one for the CPU Platform,
+one for the CUDA Platform, and one for the OpenCL Platform.
+To define a new Platform, you must create subclasses of the various abstract
+classes in the OpenMM Low Level API: a subclass of Platform, one or more
+subclasses of KernelFactory, and a subclass of each KernelImpl.  That is easy to
+say, but a huge amount of work to actually do.  There are many different
+algorithms involved in computing forces, enforcing constraints, performing
+integration, and so on, all of which together make up a Platform.  Of course,
+there is no requirement that every Platform must implement every possible
+feature.  If you do not provide an implementation of a particular kernel, it
+simply means your Platform cannot be used for any simulation that requires that
+kernel; if a user tries to do so, an exception will be thrown.
+Your plugin’s :code:`registerPlatforms()` function should create an instance
+of your Platform subclass, then register it by calling
+:code:`Platform::registerPlatform()`\ .  You also must register the
+KernelFactory for each kernel your Platform supports.  This can be done in the
+:code:`registerKernelFactories()` function, or more simply, directly in the
+Platform’s constructor.  You can use as many different KernelFactories as you
+want for different kernels, but usually it is simplest to use a single
+KernelFactory for all of them.  The support for multiple KernelFactories exists
+primarily to let plugins add new features to existing Platforms, as described in
+the next section.
+Creating New Forces
+*******************
+Another common type of plugin defines new Forces and provides implementations of
+them for existing Platforms.  (Defining new Integrators is not specifically
+discussed here, but the process is very similar.)  There are two such plugins
+that come with OpenMM.  They implement the AMOEBA force field and Drude
+oscillators, respectively.
+As an example, suppose you want to create a new Force subclass called
+StringForce that uses the equations of String Theory to compute the interactions
+between particles.  You want to provide implementations of it for all four
+standard platforms: Reference, CPU, CUDA, and OpenCL.
+The first thing to realize is that this *cannot* be done with only a plugin
+library.  Plugins are loaded dynamically at runtime, and they relate to the low
+level API; but you must also provide a public API.  Users of your class need to
+create StringForce objects and call methods on them.  That means providing a
+header file with the class declaration, and a (non-plugin) library with the
+class definition to link their code against.  The implementations for particular
+Platforms can be in plugins, but the public API class itself cannot.  Or to put
+it differently, the full “plugin” (from the user’s perspective) consists of
+three parts: the library OpenMM loads at runtime (which is what OpenMM considers
+to be the “plugin”), a second library for users to link their code against, and
+a header file for them to include in their source code.
+To define the API, you will need to create the following classes:
+#. StringForce.  This is the public API for your force, and users will directly
+   link against the library containing it.
+#. StringForceImpl.  This is the ForceImpl subclass corresponding to
+   StringForce.  It should be defined in the same library as StringForce, and
+   StringForce’s :code:`createImpl()` method should create an instance of it.
+#. CalcStringForceKernel.  This is an abstract class that extends KernelImpl,
+   and defines the API by which StringForceImpl invokes its kernel.  You only need
+   to provide a header file for it, not an implementation; those will be provided
+   by Platforms.
+Now suppose you are writing the OpenCL implementation of StringForce.  Here are
+the classes you need to write:
+#. OpenCLCalcStringForceKernel.  This extends CalcStringForceKernel and provides
+   implementations of its virtual methods.  The code for this class will probably
+   be very complicated (and if it actually works, worth a Nobel Prize).  It may
+   execute many different GPU kernels and create its own internal data structures.
+   But those details are entirely internal to your own code.  As long as this class
+   implements the virtual methods of CalcStringForceKernel, you can do anything you
+   want inside it.
+#. OpenCLStringForceKernelFactory.  This is a KernelFactory subclass that knows
+   how to create instances of OpenCLCalcStringForceKernel.
+Both of these classes should be packaged into a dynamic library (.so on Linux,
+.dylib on Mac, .dll on Windows) that can be loaded as a plugin.  This library
+must also implement the two functions from PluginInitializer.h.
+:code:`registerPlatforms()` will do nothing, since this plugin does not
+implement any new Platforms.  :code:`registerKernelFactories()` should call
+\ :code:`Platform::getPlatformByName("OpenCL")` to get the OpenCL Platform,
+then create a new OpenCLStringForceKernelFactory and call
+:code:`registerKernelFactory()` on the Platform to register it.  If the OpenCL
+Platform is not available, :code:`getPlatformByName()` throws an exception; you
+should catch it and return without doing anything.  Most likely this means there
+is no OpenCL runtime on the computer your code is running on.
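The lookup-then-register pattern with a silent fallback might look roughly like the sketch below. It is deliberately not written against the real OpenMM headers: ToyPlatform, toyPlatforms, toyGetPlatformByName(), and toyRegisterKernelFactories() are hypothetical stand-ins, and the function returns a bool only so the outcome can be observed (the real :code:`registerKernelFactories()` returns void).

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a Platform; it only records the kernel names
// for which a factory has been registered.
struct ToyPlatform {
    std::vector<std::string> kernelNames;
    void registerKernelFactory(const std::string& kernelName) {
        assert(!kernelName.empty());
        kernelNames.push_back(kernelName);
    }
};

// Toy table of registered platforms, keyed by name.
static std::map<std::string, ToyPlatform> toyPlatforms;

// Mimics a lookup that throws when the requested platform is missing.
ToyPlatform& toyGetPlatformByName(const std::string& name) {
    auto it = toyPlatforms.find(name);
    if (it == toyPlatforms.end())
        throw std::runtime_error("no registered platform called " + name);
    return it->second;
}

// Registers the factory if the platform exists; otherwise does nothing.
// Returns true if a factory was registered (for demonstration only).
bool toyRegisterKernelFactories() {
    try {
        ToyPlatform& platform = toyGetPlatformByName("OpenCL");
        platform.registerKernelFactory("CalcStringForce");
        return true;
    } catch (const std::exception&) {
        // Platform not available: return quietly so loading can continue.
        return false;
    }
}
```

Swallowing the exception is a deliberate choice here: a missing platform is an expected condition for an optional plugin, not an error worth aborting the whole plugin load for.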
--- a/docs-source/developerguide/04_reference_platform.rst
+++ b/docs-source/developerguide/04_reference_platform.rst
+.. role:: code
+.. raw:: html
+    <style> .code {font-family:monospace;} </style>
+    <style> .caption {text-align:center;} </style>
+.. highlight:: c++
+.. _the-reference-platform:
+The Reference Platform
+######################
+The reference Platform is written with simplicity and clarity in mind, not
+performance.  (It is still not always as simple or clear as one might hope, but
+that is the goal.)  When implementing a new feature, it is recommended to create
+the reference implementation first, then use that as a model for the versions in
+other Platforms.
+When using the reference Platform, the “platform-specific data” stored in
+ContextImpl is of type ReferencePlatform::PlatformData, which is declared in
+ReferencePlatform.h.  It has fields for storing positions, velocities, box
+vectors, and other types of data.
+The PlatformData’s vector of forces contains one element for each particle.  At
+the start of each force evaluation, all elements of it are set to zero.  Each
+Force adds its own contributions to the vector, so that at the end, it contains
+the total force acting on each particle.
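As a rough illustration of that accumulation scheme, the sketch below zeroes a per-particle force vector and then lets two contributions add into it. ToyVec3, toyAddForce(), and toyCalcForces() are hypothetical names chosen for this example and do not correspond to the actual reference Platform classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3-component vector; stands in for a real vector type.
struct ToyVec3 {
    double x, y, z;
};

// One "Force" adding its per-particle contribution to the shared totals.
void toyAddForce(std::vector<ToyVec3>& totals, const std::vector<ToyVec3>& contribution) {
    assert(totals.size() == contribution.size());
    for (std::size_t i = 0; i < totals.size(); i++) {
        totals[i].x += contribution[i].x;
        totals[i].y += contribution[i].y;
        totals[i].z += contribution[i].z;
    }
}

// One force evaluation: reset every element to zero, then accumulate.
void toyCalcForces(std::vector<ToyVec3>& totals) {
    std::fill(totals.begin(), totals.end(), ToyVec3{0.0, 0.0, 0.0});
    const std::vector<ToyVec3> bonded(totals.size(), ToyVec3{1.0, 0.0, 0.0});
    const std::vector<ToyVec3> nonbonded(totals.size(), ToyVec3{0.0, 2.0, -0.5});
    toyAddForce(totals, bonded);
    toyAddForce(totals, nonbonded);
}
```

Because the vector is reset at the start of each evaluation, calling `toyCalcForces()` repeatedly yields the same totals rather than letting values pile up across evaluations.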
+There are a few additional classes that contain useful static methods.
+SimTKOpenMMUtilities has various utility functions, of which the most important
+is a random number generator.  ReferenceForce provides methods for calculating
+the displacement between two positions, optionally taking periodic boundary
+conditions into account.
--- a/docs-source/developerguide/05_cpu_platform.rst
+++ b/docs-source/developerguide/05_cpu_platform.rst
+.. role:: code
+.. raw:: html
+    <style> .code {font-family:monospace;} </style>
+    <style> .caption {text-align:center;} </style>
+.. highlight:: c++
+.. _the-cpu-platform:
+The CPU Platform
+################
+CpuPlatform is a subclass of ReferencePlatform.  It provides optimized versions
+of a small number of kernels, while using the reference implementations for all
+the others.  Any kernel implementation written for the reference Platform will
+work equally well with the CPU platform.  Of course, if that kernel happens to
+be a performance bottleneck, you will probably want to write an optimized
+version of it.  But many kernels have negligible effect on performance, and for
+these you can just use the same implementation for both platforms.
+If you choose to do that, you can easily support both platforms with a single
+plugin library.  Just implement :code:`registerKernelFactories()` like this:
+::
+    extern "C" void registerKernelFactories() {
+        for (int i = 0; i < Platform::getNumPlatforms(); i++) {
+            Platform& platform = Platform::getPlatform(i);
+            if (dynamic_cast<ReferencePlatform*>(&platform) != NULL) {
+                // Create and register your KernelFactory.
+            }
+        }
+    }
+The loop identifies every ReferencePlatform, either an instance of the base
+class or of a subclass, and registers a KernelFactory for every one.
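To see why the :code:`dynamic_cast` test matches both the base class and its subclasses, here is a minimal sketch with a toy class hierarchy. The Toy* types and toyRegisterOnReferencePlatforms() are hypothetical and only mirror the inheritance relationship described above; incrementing a counter stands in for creating and registering a KernelFactory.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical hierarchy mirroring Platform <- ReferencePlatform <- CpuPlatform.
struct ToyPlatform {
    virtual ~ToyPlatform() = default;  // polymorphic, so dynamic_cast works
    int numFactories = 0;
};
struct ToyReferencePlatform : ToyPlatform {};
struct ToyCpuPlatform : ToyReferencePlatform {};
struct ToyOpenCLPlatform : ToyPlatform {};

// "Registers" a factory on every platform that is a ToyReferencePlatform or
// a subclass of it, and returns how many platforms matched.
int toyRegisterOnReferencePlatforms(
        const std::vector<std::unique_ptr<ToyPlatform>>& platforms) {
    int matched = 0;
    for (const auto& platform : platforms) {
        assert(platform != nullptr);
        // The cast yields a null pointer for unrelated platform types.
        if (dynamic_cast<ToyReferencePlatform*>(platform.get()) != nullptr) {
            platform->numFactories++;
            matched++;
        }
    }
    return matched;
}
```

With one platform of each derived type in the list, the reference and CPU stand-ins match while the OpenCL stand-in is skipped.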
--- a/docs-source/developerguide/developer.rst
+++ b/docs-source/developerguide/developer.rst
@@ -6,366 +6,6 @@
 .. highlight:: c++
-Introduction
-############
-This guide describes the internal architecture of the OpenMM library.  It is
-targeted at developers who want to add features to OpenMM, either by modifying
-the core library directly or by writing plugins.  If you just want to write
-applications that use OpenMM, you do not need to read this guide; the Users
-Manual tells you everything you need to know.  This guide is *only* for
-people who want to contribute to OpenMM itself.
-It is organized as follows:
-* Chapter :ref:`the-core-library` describes the architecture of the core OpenMM library.  It
-  discusses how the high level and low level APIs relate to each other, and the
-  flow of execution between them.
-* Chapter :ref:`writing-plugins` describes in detail how to write a plugin.  It focuses on the two
-  most common types of plugins: those which define new Forces, and those which
-  implement new Platforms.
-* Chapter :ref:`the-reference-platform` discusses the architecture of the reference Platform, providing
-  information relevant to writing reference implementations of new features.
-* Chapter :ref:`the-cpu-platform` discusses the architecture of the CPU Platform, providing
-  information relevant to writing CPU implementations of new features.
-* Chapter :ref:`the-opencl-platform` discusses the architecture of the OpenCL Platform, providing
-  information relevant to writing OpenCL implementations of new features.
-* Chapter :ref:`the-cuda-platform` discusses the architecture of the CUDA Platform, providing
-  information relevant to writing CUDA implementations of new features.
-* Chapter :ref:`common-compute` describes the Common Compute framework, which lets you
-  write a single implementation of a feature that can be used for both OpenCL and CUDA.
-This guide assumes you are already familiar with the public API and how to use
-OpenMM in applications.  If that is not the case, you should first read the
-Users Manual and work through some of the example programs.  Pay especially
-close attention to the “Introduction to the OpenMM Library” chapter, since it
-introduces concepts that are important in understanding this guide.
-.. _the-core-library:
-The Core Library
-################
-OpenMM is based on a layered architecture, as shown in the following diagram:
-.. figure:: ../images/ArchitectureLayers.jpg
-   :align: center
-   :width: 100%
-   :autonumber:`Figure,Architecture Layers`\ : OpenMM architecture
-The public API layer consists of the classes you access when using OpenMM in an
-application: System; Force and its subclasses; Integrator and its subclasses;
-and Context.  These classes define a public interface but do no computation.
-The next layer down consists of “implementation” classes that mirror the public
-API classes: ContextImpl, ForceImpl, and a subclass of ForceImpl for each
-subclass of Force (HarmonicBondForceImpl, NonbondedForceImpl, etc.).  These
-objects are created automatically when you create a Context.  They store
-information related to a particular simulation, and define methods for
-performing calculations.
-Note that, whereas a Force is logically “part of” a System, a ForceImpl is
-logically “part of” a Context.  (See :autonumref:`Figure,API Relationships`\ .)  If you create many Contexts
-for simulating the same System, there is still only one System and only one copy
-of each Force in it.  But there will be separate ForceImpls for each Context,
-and those ForceImpls store information related to their particular Contexts.
-.. figure:: ../images/SystemContextRelationships.jpg
-   :align: center
-   :autonumber:`Figure,API Relationships`\ : Relationships between public API and implementation layer objects
-Also note that there is no “IntegratorImpl” class, because it is not needed.
-Integrator is already specific to one Context.  Many Contexts can all simulate
-the same System, but each of them must have its own Integrator, so information
-specific to one simulation can be stored directly in the Integrator.
-The next layer down is the OpenMM Low Level API (OLLA).  The important classes
-in this layer are: Platform; Kernel; KernelImpl and its subclasses; and
-KernelFactory.  A Kernel is just a reference counted pointer to a KernelImpl;
-the real work is done by KernelImpl objects (or more precisely, by instances of
-its subclasses).  A KernelFactory creates KernelImpl objects, and a Platform
-ties together a set of KernelFactories, as well as defining information that
-applies generally to performing computations with that Platform.
-All of these classes (except Kernel) are abstract.  A particular Platform
-provides concrete subclasses of all of them.  For example, the reference
-platform defines a Platform subclass called ReferencePlatform, a KernelFactory
-subclass called ReferenceKernelFactory, and a concrete subclass of each abstract
-KernelImpl type: ReferenceCalcNonbondedForceKernel extends
-CalcNonbondedForceKernel (which in turn extends KernelImpl),
-ReferenceIntegrateVerletStepKernel extends IntegrateVerletStepKernel, and so on.
-We can understand this better by walking through the entire sequence of events
-that takes place when you create a Context.  As an example, suppose you create a
-System; add a NonbondedForce to it; create a VerletIntegrator; and then create a
-Context for them using the reference Platform.  Here is what happens.
-#. The Context constructor creates a ContextImpl.
-#. The ContextImpl calls :code:`createImpl()` on each Force in the System,
-   which creates an instance of the appropriate ForceImpl subclass.
-#. The ContextImpl calls :code:`contextCreated()` on the Platform, which
-   in turn calls :code:`setPlatformData()` on the ContextImpl.  This allows
-   Platform-specific information to be stored in a ContextImpl.  Every Platform has
-   its own mechanism for storing particle masses, constraint definitions, particle
-   positions, and so on.  ContextImpl therefore allows the Platform to create an
-   arbitrary block of data and store it where it can be accessed by that Platform’s
-   kernels.
-#. The ContextImpl calls :code:`createKernel()` on the Platform several
-   times to get instances of various kernels that it needs:
-   CalcKineticEnergyKernel, ApplyConstraintsKernel, etc.
-   #. For each kernel, the Platform looks up which KernelFactory has been
-      registered for that particular kernel.  In this case, it will be a
-      ReferenceKernelFactory.
-   #. It calls :code:`createKernelImpl()` on the KernelFactory, which
-      creates and returns an instance of an appropriate KernelImpl subclass:
-      ReferenceCalcKineticEnergyKernel, ReferenceApplyConstraintsKernel, etc.
-#. The ContextImpl loops over all of its ForceImpls and calls 
-   :code:`initialize()` on each one.
-   #. Each ForceImpl asks the Platform to create whatever kernels it needs.  In
-      this example, NonbondedForceImpl will request a CalcNonbondedForceKernel, and
-      get back a ReferenceCalcNonbondedForceKernel.
-#. The ContextImpl calls :code:`initialize()` on the Integrator which, like
-   the other objects, requests kernels from the Platform.  In this example,
-   VerletIntegrator requests an IntegrateVerletStepKernel and gets back a
-   ReferenceIntegrateVerletStepKernel.
-At this point, the Context is fully initialized and ready for doing computation.
-Reference implementations of various KernelImpls have been created, but they are
-always referenced through abstract superclasses.  Similarly, data structures
-specific to the reference Platform have been created and stored in the
-ContextImpl, but the format and content of these structures is opaque to the
-ContextImpl.  Whenever it needs to access them (for example, to get or set
-particle positions), it does so through a kernel (UpdateStateDataKernel in this
-case).
-Now suppose that you call :code:`step()` on the VerletIntegrator.  Here is
-what happens to execute each time step.
-#. The VerletIntegrator calls :code:`updateContextState()` on the
-   ContextImpl.  This gives each Force an opportunity to modify the state of the
-   Context at the start of each time step.
-   #. The ContextImpl loops over its ForceImpls and calls 
-      :code:`updateContextState()` on each one.  In this case, our only ForceImpl is
-      a NonbondedForceImpl, which returns without doing anything.  On the other hand,
-      if we had an AndersenThermostat in our System, its ForceImpl would invoke a
-      kernel to modify particle velocities.
-#. The VerletIntegrator calls :code:`calcForcesAndEnergy()` on the
-   ContextImpl to request that the forces be computed.
-   #. The ContextImpl calls :code:`beginComputation()` on its
-      CalcForcesAndEnergyKernel.  This initializes all the forces to zero and does any
-      other initialization the Platform requires before forces can be computed.  For
-      example, some Platforms construct their nonbonded neighbor lists at this point.
-   #. The ContextImpl loops over its ForceImpls and calls 
-      :code:`calcForcesAndEnergy()` on each one.  In this case, we have a
-      NonbondedForceImpl which invokes its CalcNonbondedForceKernel to compute forces.
-   #. Finally, the ContextImpl calls :code:`finishComputation()` on its
-      CalcForcesAndEnergyKernel.  This does any additional work needed to determine
-      the final forces, such as summing the values from intermediate buffers.
-#. Finally, the VerletIntegrator invokes its IntegrateVerletStepKernel.  This
-   takes the forces, positions, and velocities that are stored in a Platform-
-   specific format in the ContextImpl, uses them to compute new positions and
-   velocities, and stores them in the ContextImpl.
-.. _writing-plugins:
-Writing Plugins
-###############
-A plugin is a dynamic library that adds new features to OpenMM.  It is typically
-stored in the :code:`lib/plugins` directory inside your OpenMM installation,
-and gets loaded along with all other plugins when the user calls
-::
-    Platform::loadPluginsFromDirectory(Platform::getDefaultPluginsDirectory());
-It is also possible to load plugins from a different directory, or to load them
-individually by calling :code:`Platform::loadPluginLibrary()`\ .
-Every plugin must implement two functions that are declared in the
-PluginInitializer.h header file:
-::
-    extern "C" void registerPlatforms();
-    extern "C" void registerKernelFactories();
-When a plugin is loaded, these two functions are invoked to register any
-Platforms and KernelFactories defined by the plugin.  When many plugins are
-loaded at once by calling :code:`Platform::loadPluginsFromDirectory()`\ , 
-:code:`registerPlatforms()` is first called on all of them, then 
-:code:`registerKernelFactories()` is called on all of them.  This allows one
-plugin to define a Platform, and a different plugin to add KernelFactories to
-it; the Platform is guaranteed to be registered by the first plugin before the
-second plugin tries to add its KernelFactories, regardless of what order the
-plugins happen to be loaded in.
-Creating New Platforms
-**********************
-One common type of plugin defines a new Platform.  There are four such plugins
-that come with OpenMM: one for the Reference platform, one for the CPU Platform,
-one for the CUDA Platform, and one for the OpenCL Platform.
-To define a new Platform, you must create subclasses of the various abstract
-classes in the OpenMM Low Level API: a subclass of Platform, one or more
-subclasses of KernelFactory, and a subclass of each KernelImpl.  That is easy to
-say, but a huge amount of work to actually do.  There are many different
-algorithms involved in computing forces, enforcing constraints, performing
-integration, and so on, all of which together make up a Platform.  Of course,
-there is no requirement that every Platform must implement every possible
-feature.  If you do not provide an implementation of a particular kernel, it
-simply means your Platform cannot be used for any simulation that requires that
-kernel; if a user tries to do so, an exception will be thrown.
-Your plugin’s :code:`registerPlatforms()` function should create an instance
-of your Platform subclass, then register it by calling 
-:code:`Platform::registerPlatform()`\ .  You also must register the
-KernelFactory for each kernel your Platform supports.  This can be done in the 
-:code:`registerKernelFactories()` function, or more simply, directly in the
-Platform’s constructor.  You can use as many different KernelFactories as you
-want for different kernels, but usually it is simplest to use a single
-KernelFactory for all of them.  The support for multiple KernelFactories exists
-primarily to let plugins add new features to existing Platforms, as described in
-the next section.
-Creating New Forces
-*******************
-Another common type of plugin defines new Forces and provides implementations of
-them for existing Platforms.  (Defining new Integrators is not specifically
-discussed here, but the process is very similar.)  There are two such plugins
-that come with OpenMM.  They implement the AMOEBA force field and Drude
-oscillators, respectively.
-As an example, suppose you want to create a new Force subclass called
-StringForce that uses the equations of String Theory to compute the interactions
-between particles.  You want to provide implementations of it for all four
-standard platforms: Reference, CPU, CUDA, and OpenCL.
-The first thing to realize is that this *cannot* be done with only a plugin
-library.  Plugins are loaded dynamically at runtime, and they relate to the low
-level API; but you must also provide a public API.  Users of your class need to
-create StringForce objects and call methods on them.  That means providing a
-header file with the class declaration, and a (non-plugin) library with the
-class definition to link their code against.  The implementations for particular
-Platforms can be in plugins, but the public API class itself cannot.  Or to put
-it differently, the full “plugin” (from the user’s perspective) consists of
-three parts: the library OpenMM loads at runtime (which is what OpenMM considers
-to be the “plugin”), a second library for users to link their code against, and
-a header file for them to include in their source code.
-To define the API, you will need to create the following classes:
-#. StringForce.  This is the public API for your force, and users will directly
-   link against the library containing it.
-#. StringForceImpl.  This is the ForceImpl subclass corresponding to
-   StringForce.  It should be defined in the same library as StringForce, and
-   StringForce’s :code:`createImpl()` method should create an instance of it.
-#. CalcStringForceKernel.  This is an abstract class that extends KernelImpl,
-   and defines the API by which StringForceImpl invokes its kernel.  You only need
-   to provide a header file for it, not an implementation; those will be provided
-   by Platforms.
-Now suppose you are writing the OpenCL implementation of StringForce.  Here are
-the classes you need to write:
-#. OpenCLCalcStringForceKernel.  This extends CalcStringForceKernel and provides
-   implementations of its virtual methods.  The code for this class will probably
-   be very complicated (and if it actually works, worth a Nobel Prize).  It may
-   execute many different GPU kernels and create its own internal data structures.
-   But those details are entirely internal to your own code.  As long as this class
-   implements the virtual methods of CalcStringForceKernel, you can do anything you
-   want inside it.
-#. OpenCLStringForceKernelFactory.  This is a KernelFactory subclass that knows
-   how to create instances of OpenCLCalcStringForceKernel.
-Both of these classes should be packaged into a dynamic library (.so on Linux,
-.dylib on Mac, .dll on Windows) that can be loaded as a plugin.  This library
-must also implement the two functions from PluginInitializer.h. 
-:code:`registerPlatforms()` will do nothing, since this plugin does not
-implement any new Platforms.  :code:`registerKernelFactories()` should call
-\ :code:`Platform::getPlatformByName("OpenCL")` to get the OpenCL Platform,
-then create a new OpenCLStringForceKernelFactory and call 
-:code:`registerKernelFactory()` on the Platform to register it.  If the OpenCL
-Platform is not available, you should catch the exception then return without
-doing anything.  Most likely this means there is no OpenCL runtime on the
-computer your code is running on.
-.. _the-reference-platform:
-The Reference Platform
-######################
-The reference Platform is written with simplicity and clarity in mind, not
-performance.  (It is still not always as simple or clear as one might hope, but
-that is the goal.)  When implementing a new feature, it is recommended to create
-the reference implementation first, then use that as a model for the versions in
-other Platforms.
-When using the reference Platform, the “platform-specific data” stored in
-ContextImpl is of type ReferencePlatform::PlatformData, which is declared in
-ReferencePlatform.h.  It has fields for storing positions, velocities, box
-vectors, and other types of data.
-The PlatformData’s vector of forces contains one element for each particle.  At
-the start of each force evaluation, all elements of it are set to zero.  Each
-Force adds its own contributions to the vector, so that at the end, it contains
-the total force acting on each particle.
-There are a few additional classes that contain useful static methods.
-SimTKOpenMMUtilities has various utility functions, of which the most important
-is a random number generator.  ReferenceForce provides methods for calculating
-the displacement between two positions, optionally taking periodic boundary
-conditions into account.
-.. _the-cpu-platform:
-The CPU Platform
-################
-CpuPlatform is a subclass of ReferencePlatform.  It provides optimized versions
-of a small number of kernels, while using the reference implementations for all
-the others.  Any kernel implementation written for the reference Platform will
-work equally well with the CPU platform.  Of course, if that kernel happens to
-be a performance bottleneck, you will probably want to write an optimized
-version of it.  But many kernels have negligible effect on performance, and for
-these you can just use the same implementation for both platforms.
-If you choose to do that, you can easily support both platforms with a single
-plugin library.  Just implement :code:`registerKernelFactories()` like this:
-::
-    extern "C" void registerKernelFactories() {
-        for (int i = 0; i < Platform::getNumPlatforms(); i++) {
-            Platform& platform = Platform::getPlatform(i);
-            if (dynamic_cast<ReferencePlatform*>(&platform) != NULL) {
-                // Create and register your KernelFactory.
-            }
-        }
-    }
-The loop identifies every ReferencePlatform, either an instance of the base
-class or of a subclass, and registers a KernelFactory for every one.
 .. _the-opencl-platform:
 The OpenCL Platform
@@ -748,241 +388,3 @@ OpenCLContext plus the delta stored in the OpenCLIntegrationUtilities.  It then
 modifies the deltas so that all distance constraints are satisfied.  The
 integrator must then finish the time step by adding the deltas to the positions
 and storing them into the main position array.
-.. _the-cuda-platform:
-The CUDA Platform
-#################
-The CUDA platform is very similar to the OpenCL platform, and most of the
-previous chapter applies equally well to it, just changing “OpenCL” to “Cuda” in
-class names.  There are a few differences worth noting.
-Compiling Kernels
-*****************
-Like the OpenCL platform, the CUDA platform compiles all its kernels at runtime.
-Unlike OpenCL, CUDA does not have built-in support for runtime compilation.
-OpenMM therefore needs to implement this itself by writing the source code out
-to disk, invoking the nvcc compiler as a separate process, and then loading the
-compiled kernel in from disk.
-For the most part, you can ignore all of this.  Just call 
-:code:`createModule()` on the CudaContext, passing it the CUDA source code.
-It takes care of the details of compilation and loading, returning a CUmodule
-object when it is done.  You can then call :code:`getKernel()` to look up
-individual kernels in the module (represented as CUfunction objects) and 
-:code:`executeKernel()` to execute them.
-The CUDA platform does need two things to make this work: a directory on disk
-where it can write out temporary files, and the path to the nvcc compiler.
-These are specified by the “CudaTempDirectory” and “CudaCompiler” properties
-when you create a new Context.  It often can figure out suitable values for them
-on its own, but sometimes it needs help.  See the “Platform-Specific Properties”
-chapter of the Users Manual for details.
-Accumulating Forces
-*******************
-The OpenCL platform, as described in Section :ref:`computing-forces`\ , uses two types of buffers for
-accumulating forces: a set of floating point buffers, and a single fixed point
-buffer.  In contrast, the CUDA platform uses *only* the fixed point buffer
-(represented by the CUDA type :code:`long long`\ ).  This means
-the CUDA platform only works on devices that support 64-bit atomic operations
-(compute capability 1.2 or higher).
-.. _common-compute:
-Common Compute
-##############
-Common Compute is not a platform, but it shares many elements of one.  It exists
-to reduce code duplication between the OpenCL and CUDA platforms.  It allows a
-single implementation to be written for most kernels that can be used by both
-platforms.
-OpenCL and CUDA are very similar to each other.  Their computational models are
-nearly identical.  For example, each is based around launching kernels that are
-executed in parallel by many threads.  Each of them groups threads into blocks,
-with more communication and synchronization permitted between the threads
-in a block than between ones in different blocks.  They have very similar memory
-hierarchies: high latency global memory, low latency local/shared memory that
-can be used for communication between the threads of a block, and local variables
-that are visible only to a single thread.
-Even their languages for writing kernels are very similar.  Here is an OpenCL
-kernel that adds two arrays together, storing the result in a third array.
-::
-    __kernel void addArrays(__global const float* restrict a,
-                            __global const float* restrict b,
-                            __global float* restrict c,
-                            int length) {
-        for (int i = get_global_id(0); i < length; i += get_global_size(0))
-            c[i] = a[i]+b[i];
-    }
-Here is the corresponding CUDA kernel.
-::
-    extern "C" __global__ void addArrays(const float* __restrict__ a,
-                                         const float* __restrict__ b,
-                                         float* __restrict__ c,
-                                         int length) {
-        for (int i = blockIdx.x*blockDim.x+threadIdx.x; i < length; i += blockDim.x*gridDim.x)
-            c[i] = a[i]+b[i];
-    }
-The difference between them is largely just a mechanical find-and-replace.
-After many years of writing and maintaining nearly identical kernels by hand,
-it finally occurred to us that the translation could be done automatically by
-the compiler.  Simply by defining a few preprocessor macros, the following
-kernel can be compiled equally well either as OpenCL or as CUDA.
-::
-    KERNEL void addArrays(GLOBAL const float* RESTRICT a,
-                          GLOBAL const float* RESTRICT b,
-                          GLOBAL float* RESTRICT c,
-                          int length) {
-        for (int i = GLOBAL_ID; i < length; i += GLOBAL_SIZE)
-            c[i] = a[i]+b[i];
-    }
-Writing Device Code
-*******************
-When compiling kernels with the Common Compute API, the following macros are
-defined.
-+-------------------------------+------------------------------------------------------------+--------------------------------------------+
-|Macro                          |OpenCL Definition                                           |CUDA Definition                             |
-+===============================+============================================================+============================================+
-|:code:`KERNEL`                 |:code:`__kernel`                                            |:code:`extern "C" __global__`               |
-+-------------------------------+------------------------------------------------------------+--------------------------------------------+
-|:code:`DEVICE`                 |                                                            |:code:`__device__`                          |
-+-------------------------------+------------------------------------------------------------+--------------------------------------------+
-|:code:`LOCAL`                  |:code:`__local`                                             |:code:`__shared__`                          |
-+-------------------------------+------------------------------------------------------------+--------------------------------------------+
-|:code:`LOCAL_ARG`              |:code:`__local`                                             |                                            |
-+-------------------------------+------------------------------------------------------------+--------------------------------------------+
-|:code:`GLOBAL`                 |:code:`__global`                                            |                                            |
-+-------------------------------+------------------------------------------------------------+--------------------------------------------+
-|:code:`RESTRICT`               |:code:`restrict`                                            |:code:`__restrict__`                        |
-+-------------------------------+------------------------------------------------------------+--------------------------------------------+
-|:code:`LOCAL_ID`               |:code:`get_local_id(0)`                                     |:code:`threadIdx.x`                         |
-+-------------------------------+------------------------------------------------------------+--------------------------------------------+
-|:code:`LOCAL_SIZE`             |:code:`get_local_size(0)`                                   |:code:`blockDim.x`                          |
-+-------------------------------+------------------------------------------------------------+--------------------------------------------+
-|:code:`GLOBAL_ID`              |:code:`get_global_id(0)`                                    |:code:`(blockIdx.x*blockDim.x+threadIdx.x)` |
-+-------------------------------+------------------------------------------------------------+--------------------------------------------+
-|:code:`GLOBAL_SIZE`            |:code:`get_global_size(0)`                                  |:code:`(blockDim.x*gridDim.x)`              |
-+-------------------------------+------------------------------------------------------------+--------------------------------------------+
-|:code:`GROUP_ID`               |:code:`get_group_id(0)`                                     |:code:`blockIdx.x`                          |
-+-------------------------------+------------------------------------------------------------+--------------------------------------------+
-|:code:`NUM_GROUPS`             |:code:`get_num_groups(0)`                                   |:code:`gridDim.x`                           |
-+-------------------------------+------------------------------------------------------------+--------------------------------------------+
-|:code:`SYNC_THREADS`           |:code:`barrier(CLK_LOCAL_MEM_FENCE+CLK_GLOBAL_MEM_FENCE);`  |:code:`__syncthreads();`                    |
-+-------------------------------+------------------------------------------------------------+--------------------------------------------+
-|:code:`SYNC_WARPS`             | | if SIMT width >= 32:                                     | | if compute capability >= 7.0:            |
-|                               | | :code:`mem_fence(CLK_LOCAL_MEM_FENCE)`                   | | :code:`__syncwarp();`                    |
-|                               | | otherwise:                                               | | otherwise empty                          |
-|                               | | :code:`barrier(CLK_LOCAL_MEM_FENCE)`                     |                                            |
-+-------------------------------+------------------------------------------------------------+--------------------------------------------+
-|:code:`MEM_FENCE`              |:code:`mem_fence(CLK_LOCAL_MEM_FENCE+CLK_GLOBAL_MEM_FENCE);`|:code:`__threadfence_block();`              |
-+-------------------------------+------------------------------------------------------------+--------------------------------------------+
-|:code:`ATOMIC_ADD(dest, value)`|:code:`atom_add(dest, value)`                               |:code:`atomicAdd(dest, value)`              |
-+-------------------------------+------------------------------------------------------------+--------------------------------------------+
-A few other symbols may or may not be defined based on the device you are running on:
-:code:`SUPPORTS_DOUBLE_PRECISION` and :code:`SUPPORTS_64_BIT_ATOMICS`\ .  You
-can use :code:`#ifdef` blocks with these symbols to conditionally compile code
-based on the features supported by the device.  In addition, the CUDA compiler
-defines the symbol :code:`__CUDA_ARCH__`\ , so you can check for this symbol if
-you want to have different code blocks for CUDA and OpenCL.
-Both OpenCL and CUDA define vector types like :code:`int2` and :code:`float4`\ .
-The types they support are different but overlapping.  When writing common code,
-use only the vector types that are supported by both OpenCL and CUDA: 2, 3, and 4
-element vectors of type :code:`short`\ , :code:`int`\ , :code:`float`\ , and
-:code:`double`\ .
-CUDA uses functions to construct vector values, such as :code:`make_float2(x, y)`\ .
-OpenCL instead uses a typecast like syntax: :code:`(float2) (x, y)`\ .  In common
-code, use the CUDA style :code:`make_` functions.  OpenMM provides definitions
-of these functions when compiling as OpenCL.
-In CUDA, vector types are simply data structures.  You can access their elements,
-but not do much more with them.  In contrast, OpenCL's vectors are mathematical
-types.  All standard math operators are defined for them, as well as geometrical
-functions like :code:`dot()` and :code:`cross()`\ .  When compiling kernels as
-CUDA, OpenMM provides definitions of these operators and functions.
-OpenCL also supports "swizzle" notation for vectors.  For example, if :code:`f`
-is a :code:`float4` you can construct a vector of its first three elements
-by writing :code:`f.xyz`\ , or you can swap its first two elements by writing
-:code:`f.xy = f.yx`\ .  Unfortunately, there is no practical way to support this
-in CUDA, so swizzle notation cannot be used in common code.  Because stripping
-the final element from a four component vector is such a common operation, OpenMM
-provides a special function for doing it: :code:`trimTo3(f)` is a vector of its
-first three elements.
-64 bit integers are another data type that needs special handling.  Both OpenCL
-and CUDA support them, but they use different names for them: :code:`long` in OpenCL,
-:code:`long long` in CUDA.  To work around this inconsistency, OpenMM provides
-the typedefs :code:`mm_long` and :code:`mm_ulong` for signed and unsigned 64 bit
-integers in device code.
-Writing Host Code
-*****************
-Host code for Common Compute is very similar to host code for OpenCL or CUDA.
-In fact, most of the classes provided by the OpenCL and CUDA platforms are
-subclasses of Common Compute classes.  For example, OpenCLContext and
-CudaContext are both subclasses of ComputeContext.  When writing common code,
-each KernelImpl should expect a ComputeContext to be passed to its constructor.
-By using the common API provided by that abstract class, it can be used for
-either OpenCL or CUDA just based on the particular context passed to it at
-runtime.  Similarly, OpenCLNonbondedUtilities and CudaNonbondedUtilities are
-subclasses of the abstract NonbondedUtilities class, and so on.
-ArrayInterface is an abstract class defining the interface for arrays stored on
-the device.  OpenCLArray and CudaArray are both subclasses of it.  To simplify
-code that creates and uses arrays, there is also a third subclass called
-ComputeArray.  It acts as a wrapper around an OpenCLArray or CudaArray,
-automatically creating an array of the appropriate type for the current
-platform.  In practice, just follow these rules:
-  1. Whenever you need to create an array, make it a ComputeArray.
-  2. Whenever you write a function that expects an array to be passed to it,
-     declare the type to be ArrayInterface.
-If you do these two things, all differences between platforms will be handled
-automatically.
-OpenCL and CUDA have quite different APIs for compiling and invoking kernels.
-To hide these differences, OpenMM provides a set of abstract classes.  To compile
-device code, pass the source code to :code:`compileProgram()` on the ComputeContext.
-This returns a ComputeProgram.  You can then call its :code:`createKernel()`
-method to get a ComputeKernel object, which has methods for setting arguments
-and invoking the kernel.
-Sometimes you need to refer to vector types in host code, such as to set the
-value for a kernel argument or to access the elements of an array.  OpenCL and
-CUDA both define types for them, but they have different names, and in any case
-you want to avoid using OpenCL-specific or CUDA-specific types in common code.
-OpenMM therefore defines types for vectors in host code.  They have the same
-names as the corresponding types in device code, only with the prefix :code:`mm_`\ ,
-for example :code:`mm_int2` and :code:`mm_float4`\ .
-Three component vectors need special care in this context, because the platforms
-define them differently.  In OpenCL, a three component vector is essentially a
-four component vector whose last component is ignored.  For example,
-:code:`sizeof(float3)` is 12 in CUDA but 16 in OpenCL.  Within a kernel this
-distinction can usually be ignored, but when communicating between host and
-device it becomes vitally important.  It is generally best to avoid storing
-three component vectors in arrays or passing them as arguments.  There are no
-:code:`mm_` host types defined for three component vectors, because CUDA and
-OpenCL would require them to be defined in different ways.
\ No newline at end of file
--- a/docs-source/developerguide/07_cuda_platform.rst
+++ b/docs-source/developerguide/07_cuda_platform.rst
+.. role:: code
+.. raw:: html
+    <style> .code {font-family:monospace;} </style>
+    <style> .caption {text-align:center;} </style>
+.. highlight:: c++
+.. _the-cuda-platform:
+The CUDA Platform
+#################
+The CUDA platform is very similar to the OpenCL platform, and most of the
+previous chapter applies equally well to it, just changing “OpenCL” to “Cuda” in
+class names.  There are a few differences worth noting.
+Compiling Kernels
+*****************
+Like the OpenCL platform, the CUDA platform compiles all its kernels at runtime.
+Unlike OpenCL, CUDA does not have built-in support for runtime compilation.
+OpenMM therefore needs to implement this itself by writing the source code out
+to disk, invoking the nvcc compiler as a separate process, and then loading the
+compiled kernel in from disk.
+For the most part, you can ignore all of this.  Just call
+:code:`createModule()` on the CudaContext, passing it the CUDA source code.
+It takes care of the details of compilation and loading, returning a CUmodule
+object when it is done.  You can then call :code:`getKernel()` to look up
+individual kernels in the module (represented as CUfunction objects) and
+:code:`executeKernel()` to execute them.
+The CUDA platform does need two things to make this work: a directory on disk
+where it can write out temporary files, and the path to the nvcc compiler.
+These are specified by the “CudaTempDirectory” and “CudaCompiler” properties
+when you create a new Context.  The platform can often determine suitable values
+on its own, but sometimes it needs help.  See the “Platform-Specific Properties”
+chapter of the User's Manual for details.
+Accumulating Forces
+*******************
+The OpenCL platform, as described in Section :numref:`computing-forces`\ , uses two types of buffers for
+accumulating forces: a set of floating point buffers, and a single fixed point
+buffer.  In contrast, the CUDA platform uses *only* the fixed point buffer
+(represented by the CUDA type :code:`long long`\ ).  This means
+the CUDA platform only works on devices that support 64 bit atomic operations
+(compute capability 1.2 or higher).
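One practical consequence of accumulating in fixed point is that integer addition is exact, so the total does not depend on the order in which threads add their contributions. The following is a minimal host-side C++ sketch of that idea, not the platform's actual code. The scale factor of 2^32 and the helper names `toFixed`, `fromFixed`, and `accumulate` are assumptions made for illustration; the text above does not specify them.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Assumed scale factor (2^32). The chapter does not state the value used.
constexpr double FIXED_SCALE = 4294967296.0;

// Convert one floating point contribution to 64 bit fixed point (truncating).
long long toFixed(double force) {
    return static_cast<long long>(force * FIXED_SCALE);
}

// Convert an accumulated fixed point total back to floating point.
double fromFixed(long long fixed) {
    return static_cast<double>(fixed) / FIXED_SCALE;
}

// Add every contribution to a single buffer with 64 bit atomic adds,
// in whatever order the caller supplies them.
long long accumulate(const std::vector<double>& contributions) {
    std::atomic<long long> buffer(0);
    for (double f : contributions)
        buffer.fetch_add(toFixed(f));
    return buffer.load();
}
```

Calling `accumulate` on the same contributions in two different orders yields bit-identical totals, which is generally not true of a floating point sum.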
--- a/docs-source/developerguide/08_common_compute.rst
+++ b/docs-source/developerguide/08_common_compute.rst
+.. role:: code
+.. raw:: html
+    <style> .code {font-family:monospace;} </style>
+    <style> .caption {text-align:center;} </style>
+.. highlight:: c++
+.. _common-compute:
+Common Compute
+##############
+Common Compute is not a platform, but it shares many elements of one.  It exists
+to reduce code duplication between the OpenCL and CUDA platforms.  It allows a
+single implementation to be written for most kernels that can be used by both
+platforms.
+OpenCL and CUDA are very similar to each other.  Their computational models are
+nearly identical.  For example, each is based around launching kernels that are
+executed in parallel by many threads.  Each of them groups threads into blocks,
+with more communication and synchronization permitted between the threads
+in a block than between ones in different blocks.  They have very similar memory
+hierarchies: high latency global memory, low latency local/shared memory that
+can be used for communication between the threads of a block, and local variables
+that are visible only to a single thread.
+Even their languages for writing kernels are very similar.  Here is an OpenCL
+kernel that adds two arrays together, storing the result in a third array.
+::
+    __kernel void addArrays(__global const float* restrict a,
+                            __global const float* restrict b,
+                            __global float* restrict c,
+                            int length) {
+        for (int i = get_global_id(0); i < length; i += get_global_size(0))
+            c[i] = a[i]+b[i];
+    }
+Here is the corresponding CUDA kernel.
+::
+    extern "C" __global__ void addArrays(const float* __restrict__ a,
+                                         const float* __restrict__ b,
+                                         float* __restrict__ c,
+                                         int length) {
+        for (int i = blockIdx.x*blockDim.x+threadIdx.x; i < length; i += blockDim.x*gridDim.x)
+            c[i] = a[i]+b[i];
+    }
+The difference between them is largely just a mechanical find-and-replace.
+After many years of writing and maintaining nearly identical kernels by hand,
+it finally occurred to us that the translation could be done automatically by
+the compiler.  Simply by defining a few preprocessor macros, the following
+kernel can be compiled equally well either as OpenCL or as CUDA.
+::
+    KERNEL void addArrays(GLOBAL const float* RESTRICT a,
+                          GLOBAL const float* RESTRICT b,
+                          GLOBAL float* RESTRICT c,
+                          int length) {
+        for (int i = GLOBAL_ID; i < length; i += GLOBAL_SIZE)
+            c[i] = a[i]+b[i];
+    }
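To make the macro approach concrete, the sketch below compiles the same kernel body as ordinary host C++ by supplying a third set of definitions. This is purely an illustration: the empty qualifier macros, the `emulatedGlobalId` and `emulatedGlobalSize` variables, and the `launchAddArrays` helper are all hypothetical and are not part of OpenMM. The emulated threads run one after another, which is enough to show how the strided loop divides the work.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side stand-ins for the macros (NOT OpenMM's definitions).
#define KERNEL
#define GLOBAL
#define RESTRICT
static int emulatedGlobalId = 0;    // index of the emulated thread
static int emulatedGlobalSize = 1;  // total number of emulated threads
#define GLOBAL_ID emulatedGlobalId
#define GLOBAL_SIZE emulatedGlobalSize

// The common kernel from the text, unchanged apart from the macro definitions.
KERNEL void addArrays(GLOBAL const float* RESTRICT a,
                      GLOBAL const float* RESTRICT b,
                      GLOBAL float* RESTRICT c,
                      int length) {
    for (int i = GLOBAL_ID; i < length; i += GLOBAL_SIZE)
        c[i] = a[i]+b[i];
}

// Run the kernel body once per emulated thread, sequentially.
std::vector<float> launchAddArrays(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   int numThreads) {
    std::vector<float> c(a.size(), 0.0f);
    emulatedGlobalSize = numThreads;
    for (int thread = 0; thread < numThreads; thread++) {
        emulatedGlobalId = thread;
        addArrays(a.data(), b.data(), c.data(), static_cast<int>(a.size()));
    }
    return c;
}
```

With three emulated threads and five elements, thread 0 handles elements 0 and 3, thread 1 handles 1 and 4, and thread 2 handles element 2, so every element is written exactly once.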
+Writing Device Code
+*******************
+When compiling kernels with the Common Compute API, the following macros are
+defined.
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|Macro                          |OpenCL Definition                                           |CUDA Definition                             |
++===============================+============================================================+============================================+
+|:code:`KERNEL`                 |:code:`__kernel`                                            |:code:`extern "C" __global__`               |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`DEVICE`                 |                                                            |:code:`__device__`                          |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`LOCAL`                  |:code:`__local`                                             |:code:`__shared__`                          |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`LOCAL_ARG`              |:code:`__local`                                             |                                            |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`GLOBAL`                 |:code:`__global`                                            |                                            |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`RESTRICT`               |:code:`restrict`                                            |:code:`__restrict__`                        |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`LOCAL_ID`               |:code:`get_local_id(0)`                                     |:code:`threadIdx.x`                         |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`LOCAL_SIZE`             |:code:`get_local_size(0)`                                   |:code:`blockDim.x`                          |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`GLOBAL_ID`              |:code:`get_global_id(0)`                                    |:code:`(blockIdx.x*blockDim.x+threadIdx.x)` |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`GLOBAL_SIZE`            |:code:`get_global_size(0)`                                  |:code:`(blockDim.x*gridDim.x)`              |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`GROUP_ID`               |:code:`get_group_id(0)`                                     |:code:`blockIdx.x`                          |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`NUM_GROUPS`             |:code:`get_num_groups(0)`                                   |:code:`gridDim.x`                           |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`SYNC_THREADS`           |:code:`barrier(CLK_LOCAL_MEM_FENCE+CLK_GLOBAL_MEM_FENCE);`  |:code:`__syncthreads();`                    |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`SYNC_WARPS`             | | if SIMT width >= 32:                                     | | if compute capability >= 7.0:            |
+|                               | | :code:`mem_fence(CLK_LOCAL_MEM_FENCE)`                   | | :code:`__syncwarp();`                    |
+|                               | | otherwise:                                               | | otherwise empty                          |
+|                               | | :code:`barrier(CLK_LOCAL_MEM_FENCE)`                     |                                            |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`MEM_FENCE`              |:code:`mem_fence(CLK_LOCAL_MEM_FENCE+CLK_GLOBAL_MEM_FENCE);`|:code:`__threadfence_block();`              |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+|:code:`ATOMIC_ADD(dest, value)`|:code:`atom_add(dest, value)`                               |:code:`atomicAdd(dest, value)`              |
++-------------------------------+------------------------------------------------------------+--------------------------------------------+
+A few other symbols may or may not be defined based on the device you are running on:
+:code:`SUPPORTS_DOUBLE_PRECISION` and :code:`SUPPORTS_64_BIT_ATOMICS`\ .  You
+can use :code:`#ifdef` blocks with these symbols to conditionally compile code
+based on the features supported by the device.  In addition, the CUDA compiler
+defines the symbol :code:`__CUDA_ARCH__`\ , so you can check for this symbol if
+you want to have different code blocks for CUDA and OpenCL.
+Both OpenCL and CUDA define vector types like :code:`int2` and :code:`float4`\ .
+The types they support are different but overlapping.  When writing common code,
+use only the vector types that are supported by both OpenCL and CUDA: 2, 3, and 4
+element vectors of type :code:`short`\ , :code:`int`\ , :code:`float`\ , and
+:code:`double`\ .
+CUDA uses functions to construct vector values, such as :code:`make_float2(x, y)`\ .
+OpenCL instead uses a typecast-like syntax: :code:`(float2) (x, y)`\ .  In common
+code, use the CUDA-style :code:`make_` functions.  OpenMM provides definitions
+of these functions when compiling as OpenCL.
+In CUDA, vector types are simply data structures.  You can access their elements,
+but not do much more with them.  In contrast, OpenCL's vectors are mathematical
+types.  All standard math operators are defined for them, as well as geometrical
+functions like :code:`dot()` and :code:`cross()`\ .  When compiling kernels as
+CUDA, OpenMM provides definitions of these operators and functions.
+OpenCL also supports "swizzle" notation for vectors.  For example, if :code:`f`
+is a :code:`float4` you can construct a vector of its first three elements
+by writing :code:`f.xyz`\ , or you can swap its first two elements by writing
+:code:`f.xy = f.yx`\ .  Unfortunately, there is no practical way to support this
+in CUDA, so swizzle notation cannot be used in common code.  Because stripping
+the final element from a four component vector is such a common operation, OpenMM
+provides a special function for doing it: :code:`trimTo3(f)` returns a vector of
+the first three elements of :code:`f`\ .
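As a rough picture of what such compatibility definitions can look like, here is a host C++ sketch that gives plain structs constructor functions, arithmetic operators, geometric functions, and a trimming helper. The types `Float3` and `Float4` and the functions `makeFloat3` and `makeFloat4` are hypothetical names chosen for this example; only `trimTo3`, `dot`, and `cross` are names taken from the text, and the bodies shown are an assumption about their behavior rather than OpenMM's source.

```cpp
#include <cassert>

// Plain data structures, in the spirit of CUDA's vector types.
struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// Function-style constructors (hypothetical names for this sketch).
inline Float3 makeFloat3(float x, float y, float z) { return Float3{x, y, z}; }
inline Float4 makeFloat4(float x, float y, float z, float w) { return Float4{x, y, z, w}; }

// Element-wise operators that the plain structs do not provide by themselves.
inline Float3 operator+(Float3 a, Float3 b) { return makeFloat3(a.x+b.x, a.y+b.y, a.z+b.z); }
inline Float3 operator*(Float3 a, float s) { return makeFloat3(a.x*s, a.y*s, a.z*s); }

// Geometric functions.
inline float dot(Float3 a, Float3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
inline Float3 cross(Float3 a, Float3 b) {
    return makeFloat3(a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x);
}

// Drop the fourth component, replacing the swizzle f.xyz.
inline Float3 trimTo3(Float4 v) { return makeFloat3(v.x, v.y, v.z); }
```

Code written against helpers like these avoids both the OpenCL cast syntax and swizzles, which is the property common code needs.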
+64 bit integers are another data type that needs special handling.  Both OpenCL
+and CUDA support them, but they use different names for them: :code:`long` in OpenCL,
+:code:`long long` in CUDA.  To work around this inconsistency, OpenMM provides
+the typedefs :code:`mm_long` and :code:`mm_ulong` for signed and unsigned 64 bit
+integers in device code.
+Writing Host Code
+*****************
+Host code for Common Compute is very similar to host code for OpenCL or CUDA.
+In fact, most of the classes provided by the OpenCL and CUDA platforms are
+subclasses of Common Compute classes.  For example, OpenCLContext and
+CudaContext are both subclasses of ComputeContext.  When writing common code,
+each KernelImpl should expect a ComputeContext to be passed to its constructor.
+By using the common API provided by that abstract class, it can be used for
+either OpenCL or CUDA just based on the particular context passed to it at
+runtime.  Similarly, OpenCLNonbondedUtilities and CudaNonbondedUtilities are
+subclasses of the abstract NonbondedUtilities class, and so on.
+ArrayInterface is an abstract class defining the interface for arrays stored on
+the device.  OpenCLArray and CudaArray are both subclasses of it.  To simplify
+code that creates and uses arrays, there is also a third subclass called
+ComputeArray.  It acts as a wrapper around an OpenCLArray or CudaArray,
+automatically creating an array of the appropriate type for the current
+platform.  In practice, just follow these rules:
+  1. Whenever you need to create an array, make it a ComputeArray.
+  2. Whenever you write a function that expects an array to be passed to it,
+     declare the type to be ArrayInterface.
+If you do these two things, all differences between platforms will be handled
+automatically.
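The two rules above describe a familiar wrapper pattern, which can be sketched with toy classes as follows. The classes live in a `sketch` namespace to make clear that they only borrow the names from the text: the `Backend` enum, the `initialize`, `getSize`, and `getBackendName` members, and the `countElements` function are all hypothetical and do not reproduce the real OpenMM interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

namespace sketch {

// Toy abstract interface for a device array.
class ArrayInterface {
public:
    virtual ~ArrayInterface() = default;
    virtual std::size_t getSize() const = 0;
    virtual std::string getBackendName() const = 0;
};

// Toy platform-specific arrays; real ones would own device memory.
class OpenCLArray : public ArrayInterface {
public:
    explicit OpenCLArray(std::size_t size) : size(size) {}
    std::size_t getSize() const override { return size; }
    std::string getBackendName() const override { return "OpenCL"; }
private:
    std::size_t size;
};

class CudaArray : public ArrayInterface {
public:
    explicit CudaArray(std::size_t size) : size(size) {}
    std::size_t getSize() const override { return size; }
    std::string getBackendName() const override { return "CUDA"; }
private:
    std::size_t size;
};

enum class Backend { OpenCL, Cuda };

// Rule 1: create arrays as ComputeArray, which picks the concrete type.
class ComputeArray : public ArrayInterface {
public:
    void initialize(Backend backend, std::size_t size) {
        if (backend == Backend::Cuda)
            impl = std::make_unique<CudaArray>(size);
        else
            impl = std::make_unique<OpenCLArray>(size);
    }
    std::size_t getSize() const override { return impl->getSize(); }
    std::string getBackendName() const override { return impl->getBackendName(); }
private:
    std::unique_ptr<ArrayInterface> impl;
};

// Rule 2: functions accept the abstract ArrayInterface.
inline std::size_t countElements(const ArrayInterface& array) {
    return array.getSize();
}

} // namespace sketch
```

Because `countElements` takes an `ArrayInterface`, it works unchanged whether it is handed a wrapper or one of the platform-specific arrays directly.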
+OpenCL and CUDA have quite different APIs for compiling and invoking kernels.
+To hide these differences, OpenMM provides a set of abstract classes.  To compile
+device code, pass the source code to :code:`compileProgram()` on the ComputeContext.
+This returns a ComputeProgram.  You can then call its :code:`createKernel()`
+method to get a ComputeKernel object, which has methods for setting arguments
+and invoking the kernel.
+Sometimes you need to refer to vector types in host code, such as to set the
+value for a kernel argument or to access the elements of an array.  OpenCL and
+CUDA both define types for them, but they have different names, and in any case
+you want to avoid using OpenCL-specific or CUDA-specific types in common code.
+OpenMM therefore defines types for vectors in host code.  They have the same
+names as the corresponding types in device code, only with the prefix :code:`mm_`\ ,
+for example :code:`mm_int2` and :code:`mm_float4`\ .
+Three component vectors need special care in this context, because the platforms
+define them differently.  In OpenCL, a three component vector is essentially a
+four component vector whose last component is ignored.  For example,
+:code:`sizeof(float3)` is 12 in CUDA but 16 in OpenCL.  Within a kernel this
+distinction can usually be ignored, but when communicating between host and
+device it becomes vitally important.  It is generally best to avoid storing
+three component vectors in arrays or passing them as arguments.  There are no
+:code:`mm_` host types defined for three component vectors, because CUDA and
+OpenCL would require them to be defined in different ways.
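The size mismatch is easy to reproduce on the host. In the sketch below, `PackedFloat3` and `PaddedFloat3` are hypothetical structs that imitate the two layouts described above (assuming a 4 byte `float`); they are not types defined by OpenMM, CUDA, or OpenCL.

```cpp
#include <cassert>
#include <cstddef>

// Packed layout, like CUDA's float3: three floats, 12 bytes.
struct PackedFloat3 { float x, y, z; };

// Padded layout, imitating OpenCL's float3: same three fields,
// but aligned and sized like a four component vector.
struct alignas(16) PaddedFloat3 { float x, y, z; };

static_assert(sizeof(PackedFloat3) == 12, "packed layout should be 12 bytes");
static_assert(sizeof(PaddedFloat3) == 16, "padded layout should be 16 bytes");

// Bytes occupied by an array of n elements under each layout.
constexpr std::size_t packedBytes(std::size_t n) { return n*sizeof(PackedFloat3); }
constexpr std::size_t paddedBytes(std::size_t n) { return n*sizeof(PaddedFloat3); }
```

A buffer of 1000 elements therefore takes 12000 bytes in one layout and 16000 in the other, so host code that copies it with the wrong assumption would misplace every element after the first.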
--- a/docs-source/developerguide/_static/custom.css
+++ b/docs-source/developerguide/_static/custom.css
+../../api-python/_static/custom.css
\ No newline at end of file