The [Imagenet with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/imagenet) and [word_language_model with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/word_language_model) directories also contain `main.py` files that demonstrate manual management of master parameters and static loss scaling. These examples illustrate what sort of operations `FP16_Optimizer` is performing automatically.
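As a rough illustration of those operations (a self-contained sketch, not code from the example scripts; the tiny `nn.Linear` model and random data are stand-ins), the loop below keeps FP16 model parameters, an FP32 master copy that the optimizer updates, and a static loss scale:

```
import torch
import torch.nn as nn

static_loss_scale = 128.0

model = nn.Linear(64, 10).cuda().half()               # FP16 model parameters
criterion = nn.CrossEntropyLoss()

# FP32 "master" copies of the parameters; the optimizer updates these.
master_params = [p.detach().clone().float().requires_grad_(True)
                 for p in model.parameters()]
optimizer = torch.optim.SGD(master_params, lr=0.1)

for step in range(3):
    inputs = torch.randn(32, 64, device="cuda").half()
    targets = torch.randint(0, 10, (32,), device="cuda")

    for p in model.parameters():
        p.grad = None                                  # clear FP16 grads from the previous step
    loss = criterion(model(inputs).float(), targets)
    (loss * static_loss_scale).backward()              # scale so small FP16 grads don't flush to zero

    # Copy FP16 grads into the FP32 masters, undoing the scale before the update.
    for mp, p in zip(master_params, model.parameters()):
        mp.grad = p.grad.detach().float() / static_loss_scale

    optimizer.step()                                   # FP32 weight update

    # Copy the updated FP32 masters back into the FP16 model parameters.
    with torch.no_grad():
        for p, mp in zip(model.parameters(), master_params):
            p.copy_(mp)
```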
When used with ``multiproc.py``, :class:`DistributedDataParallel`
assigns 1 process to each of the available (visible) GPUs on the node.
Parameters are broadcast across participating processes on initialization, and gradients are
allreduced and averaged over processes during ``backward()``.

:class:`DistributedDataParallel` is optimized for use with NCCL. It achieves high performance by
overlapping communication with computation during ``backward()`` and bucketing smaller gradient
transfers to reduce the total number of transfers required.

:class:`DistributedDataParallel` assumes that your script accepts the command line
arguments "rank" and "world-size." It also assumes that your script calls
``torch.cuda.set_device(args.rank)`` before creating the model.
Args:
    module: Network definition to be run in multi-gpu/distributed mode.
    message_size (Default = 1e7): Minimum number of elements in a communication bucket.
    shared_param (Default = False): If your model uses shared parameters this must be True. It will disable bucketing of parameters to avoid race conditions.
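A minimal usage sketch consistent with the description above. The argument names, the ``torch.cuda.set_device`` call, and the constructor arguments follow the docstring; the process-group initialization (NCCL backend, ``env://`` rendezvous) is an assumption here, so check the repository's distributed example for the exact calls it uses:

```
import argparse

import torch
import torch.nn as nn
import torch.distributed as dist

from apex.parallel import DistributedDataParallel

parser = argparse.ArgumentParser()
parser.add_argument("--rank", type=int, default=0)
parser.add_argument("--world-size", type=int, default=1)
args = parser.parse_args()

# Select this process's GPU *before* creating the model.
torch.cuda.set_device(args.rank)

# Assumed initialization; the distributed example shows the exact calls.
dist.init_process_group(backend="nccl", init_method="env://",
                        world_size=args.world_size, rank=args.rank)

model = nn.Linear(64, 10).cuda()
model = DistributedDataParallel(model,
                                message_size=10000000,  # minimum elements per communication bucket
                                shared_param=False)      # set True if the model shares parameters
```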
This site contains the API documentation for Apex (https://github.com/nvidia/apex),
a PyTorch extension with NVIDIA-maintained utilities to streamline mixed precision and distributed training. Some of the code here will be included in upstream PyTorch eventually. The intention of Apex is to make up-to-date utilities available to users as quickly as possible.
Installation requires CUDA 9 or later, PyTorch 0.4 or later, and Python 3.
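The version requirements can be checked from a Python session; this snippet is only an illustration, not part of Apex:

```
import sys
import torch

print("Python:", sys.version.split()[0])             # needs Python 3
print("PyTorch:", torch.__version__)                  # needs 0.4 or later
print("CUDA toolkit:", torch.version.cuda)            # needs 9 or later
print("CUDA available:", torch.cuda.is_available())
```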
To launch a training script on all of the node's GPUs, run

```
python -m apex.parallel.multiproc main.py args...
```

adding any `args...` you like. The launch script `apex.parallel.multiproc` will
spawn one process for each of your system's available (visible) GPUs.
Each process will run `python main.py args... --world-size <worldsize> --rank <rank>`
(the `--world-size` and `--rank` arguments are determined and appended by `apex.parallel.multiproc`).
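Conceptually, the launcher behaves like the sketch below. This is only an illustration of the protocol just described, not the actual implementation of `apex.parallel.multiproc`, and the file name `launcher_sketch.py` is made up:

```
# launcher_sketch.py: spawn one training process per visible GPU,
# appending --world-size and --rank as described above.
import subprocess
import sys

import torch


def launch(script_and_args):
    world_size = torch.cuda.device_count()    # one process per visible GPU
    procs = []
    for rank in range(world_size):
        cmd = [sys.executable] + script_and_args + [
            "--world-size", str(world_size),
            "--rank", str(rank),
        ]
        procs.append(subprocess.Popen(cmd))
    for p in procs:
        p.wait()


if __name__ == "__main__":
    # e.g. python launcher_sketch.py main.py args...
    launch(sys.argv[1:])
```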
To understand how to convert your own model, please see all sections of `main.py` within the `#=====START: ADDED FOR DISTRIBUTED======` and `#=====END: ADDED FOR DISTRIBUTED======` flags.
[Example with Imagenet and mixed precision training](https://github.com/NVIDIA/apex/tree/master/examples/imagenet)
## Requirements

PyTorch master branch built from source. This is required in order to use NCCL as a distributed backend.
This example is based on [https://github.com/pytorch/examples/tree/master/imagenet](https://github.com/pytorch/examples/tree/master/imagenet).
It implements training of popular model architectures, such as ResNet, AlexNet, and VGG, on the ImageNet dataset.

`main.py` and `main_fp16_optimizer.py` have been modified to use the `DistributedDataParallel` module in Apex instead of the one in upstream PyTorch. For a description of how this works, please see the distributed example included in this repo.

`main.py` with the `--fp16` argument demonstrates mixed precision training with manual management of master parameters and loss scaling.
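For comparison, a rough sketch of the `FP16_Optimizer` route (which `main_fp16_optimizer.py` is named for): wrap an ordinary optimizer and let it manage master parameters and loss scaling. The import path and constructor arguments shown are assumptions about the `apex.fp16_utils` API, so check the `FP16_Optimizer` documentation for the exact signature:

```
import torch
import torch.nn as nn

from apex.fp16_utils import FP16_Optimizer   # assumed import path

model = nn.Linear(64, 10).cuda().half()
criterion = nn.CrossEntropyLoss()

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer = FP16_Optimizer(optimizer, static_loss_scale=128.0)   # assumed kwarg name

inputs = torch.randn(32, 64, device="cuda").half()
targets = torch.randint(0, 10, (32,), device="cuda")

optimizer.zero_grad()
loss = criterion(model(inputs).float(), targets)
optimizer.backward(loss)   # replaces loss.backward(); applies the loss scale internally
optimizer.step()           # updates FP32 master weights, then copies them back to the FP16 model
```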
## Requirements

- Apex, which can be installed from https://www.github.com/nvidia/apex
- Install PyTorch from source, master branch of [pytorch on github](https://www.github.com/pytorch/pytorch).
- `pip install -r requirements.txt`
- Download the ImageNet dataset and move validation images to labeled subfolders
  - To do this, you can use the following script: https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh