Commit 7f39db93 authored by Michael Carilli

Documentation updates

parent df83b67e
@@ -4,23 +4,23 @@

apex.amp
===================================

This page documents the updated API for Amp (Automatic Mixed Precision),
a tool to enable Tensor Core-accelerated training in only 3 lines of Python.
Amp allows users to easily experiment with different pure and mixed precision modes, including
pure FP16 training and pure FP32 training. Commonly-used default modes are chosen by
selecting an "optimization level" or ``opt_level``; each ``opt_level`` establishes a set of
properties that govern Amp's implementation of pure or mixed precision training.
Finer-grained control of how a given ``opt_level`` behaves can be achieved by passing values for
particular properties directly to ``amp.initialize``. These manually specified values will
override the defaults established by the ``opt_level``. If you attempt to override a property
that does not make sense for the current ``opt_level``, Amp will raise an error with an explanation.

Users **should not** manually cast their model or data to ``.half()``, regardless of what ``opt_level``
or properties are chosen. Amp intends that users start with an existing default (FP32) script,
add the three lines corresponding to the Amp API, and begin training with mixed precision.
Amp can also be disabled, in which case the original script will behave exactly as it used to.
In this way, there's no risk adhering to the Amp API, and a lot of potential performance benefit.

Example::

    model = torch.nn.Linear(D_in, D_out).cuda()
@@ -32,6 +32,14 @@ Example::
        scaled_loss.backward()
    ...
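
For example, a property established by the chosen ``opt_level`` can be overridden by passing it
directly to ``amp.initialize``. A minimal sketch (the particular override shown here is only an
illustration)::

    # "O3" (pure FP16) with batchnorm kept in FP32 for stability
    model, optimizer = amp.initialize(model, optimizer,
                                      opt_level="O3",
                                      keep_batchnorm_fp32=True)
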
A `runnable, comprehensive Imagenet example`_ demonstrating good practices can be found
on the Github page.

DCGAN is a tricky case that many people have requested. A comprehensive example is under construction.

.. _`runnable, comprehensive Imagenet example`:
    https://github.com/NVIDIA/apex/tree/master/examples/imagenet

.. automodule:: apex.amp
.. currentmodule:: apex.amp

@@ -39,6 +47,72 @@ Example::

.. autofunction:: scale_loss

Advanced use cases
------------------

The new Amp API supports gradient accumulation across iterations,
multiple backward passes per iteration, multiple models/optimizers,
and forcing layers to a particular type (a brief multiple-model sketch appears below).
Further details can be found here:

.. toctree::
    :maxdepth: 1

    advanced
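
As a quick taste of the multiple-models/optimizers case, here is a minimal sketch; it assumes two models,
each with its own optimizer and loss (all names are illustrative), and the ``advanced`` page above
remains the authoritative reference::

    # Models and optimizers may be passed to amp.initialize as lists
    [model_a, model_b], [opt_a, opt_b] = amp.initialize([model_a, model_b], [opt_a, opt_b],
                                                        opt_level="O1", num_losses=2)

    # ... forward passes producing loss_a and loss_b ...

    # Each loss gets its own loss scaler, selected via loss_id
    with amp.scale_loss(loss_a, opt_a, loss_id=0) as scaled_loss:
        scaled_loss.backward()
    with amp.scale_loss(loss_b, opt_b, loss_id=1) as scaled_loss:
        scaled_loss.backward()
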
Transition Guide for Old API Users
----------------------------------

We strongly encourage moving to the new Amp API, because it's more versatile, easier to use, and
future-proof. The original :class:`FP16_Optimizer` and the old "Amp" API are deprecated, and
subject to removal at any time.

**For users of the old "Amp" API**

In the new API, ``opt_level="O1"`` performs the same patching of the Torch namespace as the old Amp API.
However, the new API allows choosing static or dynamic loss scaling, while the old API only allowed dynamic loss scaling.
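
For example, loss scaling can be chosen explicitly (a minimal sketch; ``loss_scale`` accepts either
the string ``"dynamic"`` or a fixed value)::

    # Dynamic loss scaling (the "O1" default)
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1", loss_scale="dynamic")

    # Static loss scaling with a user-chosen scale
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1", loss_scale=128.0)
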
In the new API, the old call to ``amp_handle = amp.init()``, and the returned ``amp_handle``, are no
longer exposed or necessary. The new ``amp.initialize()`` does the duty of ``amp.init()`` (and more).
Therefore, any existing calls to ``amp_handle = amp.init()`` should be deleted.
The functions formerly exposed through ``amp_handle`` are now free
functions accessible through the ``amp`` module.

The backward context manager must be changed accordingly::

    # old API
    with amp_handle.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()

    ->

    # new API
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()

For now, the deprecated "Amp" API documentation can still be found on the Github README:
https://github.com/NVIDIA/apex/tree/master/apex/amp. The old API calls that `annotate user functions`_ to run
with a particular precision are still honored by the new API.

.. _`annotate user functions`:
    https://github.com/NVIDIA/apex/tree/master/apex/amp#annotating-user-functions

**For users of the old FP16_Optimizer**

``opt_level="O2"`` is equivalent to :class:`FP16_Optimizer` with ``dynamic_loss_scale=True``.

Once again, the backward pass must be changed to the unified version::

    optimizer.backward(loss)

    ->

    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()

One annoying aspect of :class:`FP16_Optimizer` was that the user had to manually convert their model to half
(either by calling ``.half()`` on it, or using a function or module wrapper from
``apex.fp16_utils``), and also manually call ``.half()`` on input data. **Neither of these is
necessary in the new API. No matter what ``opt_level``
you choose, you can and should simply build your model in the default FP32 format.** The new Amp
API will perform the right conversions during
``model, optimizer = amp.initialize(model, optimizer, opt_level=...)`` based on the ``opt_level``
and any overridden flags. Floating point input data may be float or half, but you may as well just
let it be float, because the ``model`` returned by ``amp.initialize`` will have its ``forward``
method patched to cast the input data appropriately.
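
To make the transition concrete, here is a before/after sketch (``Net`` is a placeholder for your own
model class, and the old-style lines mirror a typical :class:`FP16_Optimizer` script)::

    # Old FP16_Optimizer-style setup: manual half conversion of model and data
    model = Net().cuda().half()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True)
    ...
    optimizer.backward(loss)

    # New API: build the model in FP32 and let Amp perform the casts
    model = Net().cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    model, optimizer = amp.initialize(model, optimizer, opt_level="O2")
    ...
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()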

.. note::
    Aside from the call to ``amp.initialize`` itself, it's never necessary to manually cast
    your model or data with the new API. Therefore, a script that adheres to the new API
    can switch between different ``opt_level`` settings without having to make any other changes.
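
For instance, the same script can run in pure FP32, mixed precision, or pure FP16 purely by changing
the ``opt_level`` string (a sketch; ``args.opt_level`` is a hypothetical command-line argument)::

    # "O0" is pure FP32, "O1" and "O2" are mixed precision, "O3" is pure FP16
    model, optimizer = amp.initialize(model, optimizer, opt_level=args.opt_level)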

@@ -11,14 +11,7 @@ Apex (A PyTorch Extension)

This site contains the API documentation for Apex (https://github.com/nvidia/apex),
a Pytorch extension with NVIDIA-maintained utilities to streamline mixed precision and distributed training. Some of the code here will be included in upstream Pytorch eventually. The intention of Apex is to make up-to-date utilities available to users as quickly as possible.

Installation instructions can be found here: https://github.com/NVIDIA/apex#quick-start.

.. toctree::
    :maxdepth: 1

@@ -26,12 +19,6 @@ Installation requires CUDA 9 or later, PyTorch 0.4 or later, and Python 3. Insta

    amp

.. toctree::
    :maxdepth: 1
    :caption: Distributed Training

@@ -50,6 +37,12 @@ Installation requires CUDA 9 or later, PyTorch 0.4 or later, and Python 3. Insta

    layernorm

.. toctree::
    :maxdepth: 1
    :caption: Deprecated mixed precision utilities

    fp16_utils

.. reparameterization
.. RNN
...

@@ -39,6 +39,12 @@ if "--cuda_ext" in sys.argv:
    if torch.utils.cpp_extension.CUDA_HOME is None:
        raise RuntimeError("--cuda_ext was requested, but nvcc was not found. Are you sure your environment has nvcc available? If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.")
    else:
        # Set up macros for forward/backward compatibility hack around
        # https://github.com/pytorch/pytorch/commit/4404762d7dd955383acee92e6f06b48144a0742e
        version_ge_1_1 = []
        if (TORCH_MAJOR > 1) or (TORCH_MAJOR == 1 and TORCH_MINOR > 0):
            version_ge_1_1 = ['-DVERSION_GE_1_1']
        ext_modules.append(
            CUDAExtension(name='amp_C',
                          sources=['csrc/amp_C_frontend.cpp',
@@ -63,10 +69,10 @@ if "--cuda_ext" in sys.argv:
            CUDAExtension(name='fused_layer_norm_cuda',
                          sources=['apex/normalization/csrc/layer_norm_cuda.cpp',
                                   'apex/normalization/csrc/layer_norm_cuda_kernel.cu'],
                          extra_compile_args={'cxx': ['-O3'] + version_ge_1_1,
                                              'nvcc':['-maxrregcount=50',
                                                      '-O3',
                                                      '--use_fast_math'] + version_ge_1_1}))

setup(
    name='apex',
...