This repository holds NVIDIA-maintained utilities to streamline mixed precision and distributed training in PyTorch. The modules and utilities here are experimental and under active development; the repository is not intended as a long-term or production solution, and some of the code here will eventually be included in upstream PyTorch. The intention of Apex is to make up-to-date utilities available to users as quickly as possible.
`apex.amp` is a tool designed for ease of use and maximum safety in FP16 training. All potentially unsafe ops are performed in FP32 under the hood, while safe ops are performed using faster, Tensor Core-friendly FP16 math. `amp` also automatically implements dynamic loss scaling.
The intention of `amp` is to be the "on-ramp" to easy FP16 training: achieve all the numerical stability of full FP32 training, with most of the performance benefits of full FP16 training.
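Below is a minimal sketch of how `amp` is typically wired into a training loop. The handle-based calls follow the `amp` documentation of this era but may differ between Apex versions, so treat them as illustrative; the model, optimizer, and data here are placeholders.
```
import torch
from apex import amp

# Initialize amp once, near the top of the training script (a sketch; see the amp docs).
amp_handle = amp.init()

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
data = torch.randn(32, 1024, device="cuda")

optimizer.zero_grad()
loss = model(data).sum()

# Replace loss.backward() with a scaled backward pass so dynamic loss
# scaling can protect small FP16 gradients from underflow.
with amp_handle.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```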
`apex.FP16_Optimizer` wraps an existing Python optimizer and automatically implements master parameters and static or dynamic loss scaling under the hood.
The intention of `FP16_Optimizer` is to be the "highway" for FP16 training: achieve most of the numerical stability of full FP32 training, and almost all the performance benefits of full FP16 training.
### Examples:
[Simple examples with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/FP16_Optimizer_simple)
[Imagenet with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/imagenet)
[word_language_model with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/word_language_model)
The Imagenet and word_language_model directories also contain examples that show manual management of master parameters and static loss scaling.
These examples illustrate what sort of operations `amp` and `FP16_Optimizer` are performing automatically.
## 2. Distributed Training
`apex.parallel.DistributedDataParallel` is a module wrapper, similar to
`torch.nn.parallel.DistributedDataParallel`. It enables convenient multiprocess distributed training,
optimized for NVIDIA's NCCL communication library.
`apex.parallel.multiproc` is a launch utility that helps set up arguments for `DistributedDataParallel`.
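As a quick illustration, the sketch below shows the usual pattern of wrapping a model with `apex.parallel.DistributedDataParallel`. The process-group setup (env:// rendezvous with environment variables supplied by a launcher) and the toy model are assumptions made for this example, not part of Apex itself.
```
import torch
import torch.distributed as dist
from apex.parallel import DistributedDataParallel as DDP

# Sketch: assumes RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT are set in the
# environment, e.g. by a launch utility such as apex.parallel.multiproc.
dist.init_process_group(backend="nccl", init_method="env://")

# Pin each process to one GPU before building the model.
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Linear(1024, 1024).cuda()
model = DDP(model)  # gradients are all-reduced across processes during backward()
```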
PyTorch 0.4 or newer. We recommend using the latest stable release, obtainable from [https://pytorch.org/](https://pytorch.org/). We also test against the latest master branch, obtainable from [https://github.com/pytorch/pytorch](https://github.com/pytorch/pytorch).
If you have any problems building, please file an issue.
To build the extension run the following command in the root directory of this project:
```
python setup.py install
```
To use the extension simply run
```
import apex
```
and optionally (if required for your use)
```
import apex_C as apex_backend
```
# What's included
The current version of Apex contains:
1. Mixed precision utilities, documented [here](https://nvidia.github.io/apex/fp16_utils). Examples of using these utilities can be found in the [PyTorch imagenet example](https://github.com/csarofeen/examples/tree/apex/imagenet) and the [PyTorch word language model example](https://github.com/csarofeen/examples/tree/apex/word_language_model).
2. Parallel utilities, documented [here](https://nvidia.github.io/apex/parallel), with an example/walkthrough [here](https://github.com/csarofeen/examples/tree/apex/distributed).
   - apex/parallel/distributed.py contains a simplified implementation of PyTorch's DistributedDataParallel that is optimized for use with NCCL in single-GPU-per-process mode.
   - apex/parallel/multiproc.py is a simple multi-process launcher that can be used on a single node/computer with multiple GPUs.
3. A reparameterization function that allows you to recursively apply reparameterization to an entire module (including child modules). This API is currently under construction.
4. An experimental, still-in-development flexible RNN API.
fp16_optimizer.py contains `FP16_Optimizer`, a Python class designed to wrap an existing PyTorch optimizer and automatically enable master parameters and loss scaling in a manner transparent to the user. To use `FP16_Optimizer`, only two lines of one's Python model need to change, as shown in the sketch below.
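A minimal sketch of the two-line change (the model, data, and hyperparameters here are illustrative placeholders):
```
import torch
from apex.fp16_utils import FP16_Optimizer

model = torch.nn.Linear(1024, 1024).cuda().half()  # an FP16 model (illustrative)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Change 1 of 2: wrap the existing optimizer.
optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True)

data = torch.randn(32, 1024, device="cuda", dtype=torch.half)

optimizer.zero_grad()
loss = model(data).float().sum()

# Change 2 of 2: call optimizer.backward(loss) instead of loss.backward().
optimizer.backward(loss)
optimizer.step()
```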
### [FP16_Optimizer API documentation](https://nvidia.github.io/apex/fp16_utils.html#automatic-management-of-master-params-loss-scaling)
[Simple examples with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/FP16_Optimizer_simple)
[Imagenet with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/imagenet)
[word_language_model with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/word_language_model)
fp16_util.py contains a number of utilities to manually manage master parameters and loss scaling, if the user chooses.
In addition to `FP16_Optimizer` examples, the Imagenet and word_language_model directories contain examples that demonstrate manual management of master parameters and static loss scaling.
These examples illustrate what sort of operations `FP16_Optimizer` is performing automatically.
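For reference, the manual pattern roughly follows the sketch below. It uses helper names from `apex.fp16_utils` as described in the fp16_utils documentation; the model, loss scale value, and data are placeholders, and the exact helper signatures should be checked against the docs.
```
import torch
from apex.fp16_utils import (prep_param_lists,
                             model_grads_to_master_grads,
                             master_params_to_model_params)

model = torch.nn.Linear(1024, 1024).cuda().half()
model_params, master_params = prep_param_lists(model)    # FP16 params + FP32 master copies
optimizer = torch.optim.SGD(master_params, lr=1e-3)      # the optimizer updates the FP32 masters

loss_scale = 128.0                                        # static loss scale (illustrative value)
data = torch.randn(32, 1024, device="cuda", dtype=torch.half)

model.zero_grad()
loss = model(data).float().sum()
(loss * loss_scale).backward()                            # scaled backward pass through the FP16 model

model_grads_to_master_grads(model_params, master_params)  # copy FP16 grads onto the FP32 masters
for param in master_params:
    param.grad.data.mul_(1.0 / loss_scale)                # unscale before the update
optimizer.step()
master_params_to_model_params(model_params, master_params)  # copy updated FP32 weights back to FP16
```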
distributed.py contains the source code for `apex.parallel.DistributedDataParallel`, a module wrapper that enables multi-process multi-GPU data parallel training, optimized for NVIDIA's NCCL communication library.
`apex.parallel.DistributedDataParallel` achieves high performance by overlapping communication with
computation in the backward pass and bucketing smaller transfers to reduce the total number of
transfers required.
multiproc.py contains the source code for `apex.parallel.multiproc`, a launch utility that places one process on each of the node's available GPUs.
# Basic Multiprocess Example based on pytorch/examples/mnist
This example requires Apex, which can be installed from https://www.github.com/nvidia/apex. It demonstrates how to modify a network to use a simple but effective distributed data-parallel module. This parallel method is designed to make multi-GPU runs on a single node easy. It was created because the parallel methods currently integrated into PyTorch can induce significant overhead due to the Python GIL; this method reduces the influence of those overheads and can provide a performance benefit, especially for networks with a significant number of fast-running operations.
and start a single-process run to allow the dataset to be downloaded (this will not work properly multi-GPU; you can stop the job as soon as it starts iterating):
```python main.py```
You can now launch multi-process, multi-GPU runs with the `apex.parallel.multiproc` launcher,
adding any normal option you'd like. Each process will run on one of your system's available GPUs.
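A typical invocation might look like the following (a sketch, assuming this example's `main.py` as the training script):
```python -m apex.parallel.multiproc main.py```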
## Converting your own model
To understand how to convert your own model to use the distributed module included, please see all sections of main.py within ```#=====START: ADDED FOR DISTRIBUTED======``` and ```#=====END: ADDED FOR DISTRIBUTED======``` flags.
## Requirements
PyTorch master branch built from source. This requirement exists in order to use NCCL as a distributed backend.
Apex installed from https://www.github.com/nvidia/apex