Commit c41d9f2b authored by Michael Carilli

README wiring in a reasonable state, Sphinx docstrings updated

parent 5f8c3183
@@ -2,15 +2,15 @@ fp16_optimizer.py contains `FP16_Optimizer`, a Python class designed to wrap an
### [FP16_Optimizer API documentation](https://nvidia.github.io/apex/fp16_utils.html#automatic-management-of-master-params-loss-scaling)
[Simple examples with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/FP16_Optimizer_simple)
### [Simple examples with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/FP16_Optimizer_simple)
[Imagenet with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/imagenet)
### [Imagenet with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/imagenet)
[word_language_model with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/word_language_model)
### [word_language_model with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/word_language_model)
fp16_util.py contains a number of utilities to manually manage master parameters and loss scaling, if the user chooses.
### [Manual management documentation](https://nvidia.github.io/apex/fp16_utils.html#manual-master-parameter-management)
In addition to `FP16_Optimizer` examples, the Imagenet and word_language_model directories contain examples that demonstrate manual management of master parameters and static loss scaling.
These examples illustrate what sort of operations `FP16_Optimizer` is performing automatically.
The [Imagenet with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/imagenet) and [word_language_model with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/word_language_model) directories also contain `main.py` files that demonstrate manual management of master parameters and static loss scaling. These examples illustrate what sort of operations `FP16_Optimizer` is performing automatically.
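As a rough orientation for what that manual management involves, here is a minimal sketch of a single fp16 training step with static loss scaling. It assumes the `prep_param_lists`, `model_grads_to_master_grads`, and `master_params_to_model_params` helpers described in the manual management documentation; the toy model, data, and loss scale value are placeholders, so treat the linked examples as the authoritative reference.

```python
import torch
from apex.fp16_utils import (prep_param_lists,
                             model_grads_to_master_grads,
                             master_params_to_model_params)

# Minimal sketch of one manually managed fp16 step with static loss scaling.
# The tiny model and random data below are placeholders for illustration only.
torch.cuda.set_device(0)
model = torch.nn.Linear(32, 8).cuda().half()
model_params, master_params = prep_param_lists(model)       # fp16 params + fp32 master copies
optimizer = torch.optim.SGD(master_params, lr=0.1)           # the optimizer updates the fp32 masters
loss_scale = 128.0

inputs = torch.randn(16, 32).cuda().half()
targets = torch.randn(16, 8).cuda().half()

model.zero_grad()
loss = torch.nn.functional.mse_loss(model(inputs), targets)
(loss * loss_scale).backward()                               # scale the loss so fp16 grads don't underflow
model_grads_to_master_grads(model_params, master_params)     # copy fp16 grads onto the fp32 masters
for p in master_params:
    p.grad.data.mul_(1.0 / loss_scale)                       # unscale in fp32
optimizer.step()
master_params_to_model_params(model_params, master_params)   # copy updated fp32 weights back to fp16
```

`FP16_Optimizer` packages exactly this kind of bookkeeping (plus optional dynamic loss scaling) behind an ordinary optimizer interface.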
@@ -53,7 +53,7 @@ class FP16_Module(nn.Module):
class FP16_Optimizer(object):
"""
:class:`FP16_Optimizer` is designed to wrap an existing PyTorch optimizer,
and manage (dynamic) loss scaling and master weights in a manner transparent to the user.
and manage static or dynamic loss scaling and master weights in a manner transparent to the user.
For standard use, only two lines must be changed: creating the :class:`FP16_Optimizer` instance,
and changing the call to ``backward``.
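Concretely, the intended change to an existing fp16 training script is small; the sketch below is illustrative (the toy model and the `static_loss_scale` value are placeholders, and the constructor arguments should be checked against the API documentation linked in the README):

```python
import torch
from apex.fp16_utils import FP16_Optimizer

model = torch.nn.Linear(32, 8).cuda().half()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer = FP16_Optimizer(optimizer, static_loss_scale=128.0)   # changed line 1: wrap the optimizer

inputs = torch.randn(16, 32).cuda().half()
targets = torch.randn(16, 8).cuda().half()

optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(inputs), targets)
optimizer.backward(loss)   # changed line 2: was loss.backward()
optimizer.step()
```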
......
@@ -43,13 +43,13 @@ class DistributedDataParallel(Module):
When used with ``multiproc.py``, :class:`DistributedDataParallel`
assigns 1 process to each of the available (visible) GPUs on the node.
Parameters are broadcast across participating processes on initialization, and gradients are
allreduced and averaged over processes during ``backward()`.
allreduced and averaged over processes during ``backward()``.
:class:``DistributedDataParallel`` is optimized for use with NCCL. It achieves high performance by
:class:`DistributedDataParallel` is optimized for use with NCCL. It achieves high performance by
overlapping communication with computation during ``backward()`` and bucketing smaller gradient
transfers to reduce the total number of transfers required.
:class:``DistributedDataParallel`` assumes that your script accepts the command line
:class:`DistributedDataParallel` assumes that your script accepts the command line
arguments "rank" and "world-size." It also assumes that your script calls
``torch.cuda.set_device(args.rank)`` before creating the model.
@@ -60,8 +60,7 @@ class DistributedDataParallel(Module):
Args:
module: Network definition to be run in multi-gpu/distributed mode.
message_size (Default = 1e7): Minimum number of elements in a communication bucket.
shared_param (Default = False): If your model uses shared parameters this must be True.
It will disable bucketing of parameters to avoid race conditions.
shared_param (Default = False): If your model uses shared parameters this must be True. It will disable bucketing of parameters to avoid race conditions.
"""
......
@@ -13,7 +13,8 @@ help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
docset: html
doc2dash --name $(SPHINXPROJ) --icon $(SOURCEDIR)/_static/img/nv-pytorch2.png --enable-js --online-redirect-url http://pytorch.org/docs/ --force $(BUILDDIR)/html/
doc2dash --name $(SPHINXPROJ) --enable-js --online-redirect-url http://pytorch.org/docs/ --force $(BUILDDIR)/html/
# doc2dash --name $(SPHINXPROJ) --icon $(SOURCEDIR)/_static/img/nv-pytorch2.png --enable-js --online-redirect-url http://pytorch.org/docs/ --force $(BUILDDIR)/html/
# Manually fix because Zeal doesn't deal well with `icon.png`-only at 2x resolution.
cp $(SPHINXPROJ).docset/icon.png $(SPHINXPROJ).docset/icon@2x.png
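For reference, a typical invocation of this target (assuming `doc2dash` is installed, e.g. via `pip install doc2dash`) would simply be:

```bash
# Builds the html target first (docset depends on html), then packages
# the Sphinx output as a Dash/Zeal docset via doc2dash.
make docset
```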
......
@@ -62,7 +62,7 @@ source_suffix = '.rst'
master_doc = 'index'
# General information about the project.
project = 'APEx'
project = 'Apex'
copyright = '2018'
author = 'Christian Sarofeen, Natalia Gimelshein, Michael Carilli, Raul Puri'
@@ -115,7 +115,7 @@ html_theme_options = {
'logo_only': True,
}
html_logo = '_static/img/nv-pytorch2.png'
# html_logo = '_static/img/nv-pytorch2.png'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
@@ -161,7 +161,7 @@ latex_elements = {
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'apex.tex', 'APEx Documentation',
(master_doc, 'apex.tex', 'Apex Documentation',
'Torch Contributors', 'manual'),
]
@@ -171,7 +171,7 @@ latex_documents = [
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'APEx', 'APEx Documentation',
(master_doc, 'Apex', 'Apex Documentation',
[author], 1)
]
@@ -182,8 +182,8 @@ man_pages = [
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'APEx', 'APEx Documentation',
author, 'APEx', 'One line description of project.',
(master_doc, 'Apex', 'Apex Documentation',
author, 'Apex', 'One line description of project.',
'Miscellaneous'),
]
......
@@ -10,12 +10,29 @@ presented by NVIDIA `on Parallel Forall`_ and in GTC 2018 Sessions
`Training Neural Networks with Mixed Precision: Real Examples`_.
For Pytorch users, Real Examples in particular is recommended.
Full runnable Python scripts demonstrating ``apex.fp16_utils``
can be found on the Github page:
| `Simple FP16_Optimizer demos`_
|
| `Distributed Mixed Precision Training with imagenet`_
|
| `Mixed Precision Training with word_language_model`_
|
|
.. _`on Parallel Forall`:
https://devblogs.nvidia.com/mixed-precision-training-deep-neural-networks/
.. _`Training Neural Networks with Mixed Precision: Theory and Practice`:
http://on-demand.gputechconf.com/gtc/2018/video/S8923/
.. _`Training Neural Networks with Mixed Precision: Real Examples`:
http://on-demand.gputechconf.com/gtc/2018/video/S81012/
.. _`Simple FP16_Optimizer demos`:
https://github.com/NVIDIA/apex/tree/master/examples/FP16_Optimizer_simple
.. _`Distributed Mixed Precision Training with imagenet`:
https://github.com/NVIDIA/apex/tree/master/examples/imagenet
.. _`Mixed Precision Training with word_language_model`:
https://github.com/NVIDIA/apex/tree/master/examples/word_language_model
.. automodule:: apex.fp16_utils
.. currentmodule:: apex.fp16_utils
......
@@ -11,17 +11,24 @@ Apex (A PyTorch Extension)
This site contains the API documentation for Apex (https://github.com/nvidia/apex),
a Pytorch extension with NVIDIA-maintained utilities to streamline mixed precision and distributed training. Some of the code here will be included in upstream Pytorch eventually. The intention of Apex is to make up-to-date utilities available to users as quickly as possible.
Installation requires CUDA 9 or later, PyTorch 0.4 or later, and Python 3. Installation can be done by running
Installation requires CUDA 9 or later, PyTorch 0.4 or later, and Python 3. Install by running
::
git clone https://www.github.com/nvidia/apex
cd apex
python setup.py install
git clone https://www.github.com/nvidia/apex
cd apex
python setup.py install
.. toctree::
:maxdepth: 1
:caption: AMP: Automatic Mixed Precision
amp
.. toctree::
:maxdepth: 1
:caption: FP16/Mixed Precision Training
:caption: FP16/Mixed Precision Utilities
fp16_utils
......
@@ -9,25 +9,28 @@
See [the API documentation](https://nvidia.github.io/apex/fp16_utils.html#apex.fp16_utils.FP16_Optimizer.step) for more details.
<!---
TODO: add checkpointing example showing deserialization on the correct device
#### Checkpointing
`FP16_Optimizer` also supports checkpointing with the same control flow as ordinary Pytorch optimizers.
`save_load.py` shows an example. Test via `python save_load.py`.
See [the API documentation](https://nvidia.github.io/apex/fp16_utils.html#apex.fp16_utils.FP16_Optimizer.load_state_dict) for more details.
-->
#### Distributed
**distributed_pytorch** shows an example using `FP16_Optimizer` with Pytorch DistributedDataParallel.
**distributed_apex** shows an example using `FP16_Optimizer` with Apex DistributedDataParallel.
The usage of `FP16_Optimizer` with distributed does not need to change from ordinary single-process
usage. Test via
```bash
cd distributed_pytorch
cd distributed_apex
bash run.sh
```
**distributed_pytorch** shows an example using `FP16_Optimizer` with Apex DistributedDataParallel.
**distributed_pytorch** shows an example using `FP16_Optimizer` with Pytorch DistributedDataParallel.
Again, the usage of `FP16_Optimizer` with distributed does not need to change from ordinary
single-process usage. Test via
```bash
cd distributed_apex
cd distributed_pytorch
bash run.sh
```
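In other words, the wrapping pattern in both directories is roughly the sketch below; process-group initialization is omitted here because the `run.sh` launchers handle it, and the toy model and loss scale are placeholders (the `distributed_pytorch` variant would import `torch.nn.parallel.DistributedDataParallel` instead). The `optimizer.backward(loss)` and `optimizer.step()` calls are identical to the single-process examples; see the scripts themselves for the runnable versions.

```python
import torch
from apex.fp16_utils import FP16_Optimizer
from apex.parallel import DistributedDataParallel

# Process-group setup (rank / world size) is omitted in this sketch;
# the run.sh launchers in these directories take care of it.
model = DistributedDataParallel(torch.nn.Linear(32, 8).cuda().half())
optimizer = FP16_Optimizer(torch.optim.SGD(model.parameters(), lr=0.1),
                           static_loss_scale=128.0)
```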
@@ -17,6 +17,8 @@ transfers to reduce the total number of transfers required.
[Source Code](https://github.com/NVIDIA/apex/tree/master/apex/parallel)
[Another Example: Imagenet with mixed precision](https://github.com/NVIDIA/apex/tree/master/examples/imagenet)
## Getting started
Prior to running please run
```pip install -r requirements.txt```
@@ -26,8 +28,10 @@ To download the dataset, run
without any arguments. Once you have downloaded the dataset, you should not need to do this again.
You can now launch multi-process distributed data parallel jobs via
```python -m apex.parallel.multiproc main.py args...```
adding any args... you'd like. The launch script `apex.parallel.multiproc` will
```bash
python -m apex.parallel.multiproc main.py args...
```
adding any `args...` you like. The launch script `apex.parallel.multiproc` will
spawn one process for each of your system's available (visible) GPUs.
Each process will run `python main.py args... --world-size <worldsize> --rank <rank>`
(the `--world-size` and `--rank` arguments are determined and appended by `apex.parallel.multiproc`).
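For example, on a node with two visible GPUs the launcher effectively runs something like the following (illustrative only; `multiproc` determines and appends the values itself):

```bash
# Rough equivalent of what apex.parallel.multiproc spawns on a 2-GPU node.
python main.py args... --world-size 2 --rank 0 &
python main.py args... --world-size 2 --rank 1 &
wait
```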
@@ -45,7 +49,5 @@ which will run on devices 0 and 1. By default, if `CUDA_VISIBLE_DEVICES` is uns
To understand how to convert your own model, please see all sections of main.py within ```#=====START: ADDED FOR DISTRIBUTED======``` and ```#=====END: ADDED FOR DISTRIBUTED======``` flags.
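Those delimiters look like this in the example source; the import shown between them is just an illustration of the pattern, not an excerpt from `main.py`:

```python
#=====START: ADDED FOR DISTRIBUTED======
# Everything between these markers was added to the original single-process
# script for distributed training (imports, initialization, model wrapping, ...).
from apex.parallel import DistributedDataParallel
#=====END: ADDED FOR DISTRIBUTED======
```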
[Example with Imagenet and mixed precision training](https://github.com/NVIDIA/apex/tree/master/examples/imagenet)
## Requirements
Pytorch master branch built from source. This requirement is to use NCCL as a distributed backend.
@@ -3,7 +3,7 @@
This example is based on [https://github.com/pytorch/examples/tree/master/imagenet](https://github.com/pytorch/examples/tree/master/imagenet).
It implements training of popular model architectures, such as ResNet, AlexNet, and VGG on the ImageNet dataset.
`main.py` and `main_fp16_optimizer.py` have been modified to use the `DistributedDataParallel` module in APEx instead of the one in upstream PyTorch. For description of how this works please see the distributed example included in this repo.
`main.py` and `main_fp16_optimizer.py` have been modified to use the `DistributedDataParallel` module in Apex instead of the one in upstream PyTorch. For a description of how this works, please see the distributed example included in this repo.
`main.py` with the `--fp16` argument demonstrates mixed precision training with manual management of master parameters and loss scaling.
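A single-process run might be launched roughly as follows; only `--fp16` is taken from the description above, while the architecture, batch size, and dataset path are placeholders borrowed from the upstream ImageNet example's conventions:

```bash
# Illustrative invocation; check main.py --help for the authoritative argument list.
python main.py --fp16 --arch resnet50 --batch-size 128 /path/to/imagenet
```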
@@ -15,8 +15,8 @@ adding any normal arguments.
## Requirements
- APEx which can be installed from https://www.github.com/nvidia/apex
- Install PyTorch from source, master branch of ([pytorch on github](https://www.github.com/pytorch/pytorch)
- Apex which can be installed from https://www.github.com/nvidia/apex
- Install PyTorch from source, master branch of [pytorch on github](https://www.github.com/pytorch/pytorch).
- `pip install -r requirements.txt`
- Download the ImageNet dataset and move validation images to labeled subfolders
- To do this, you can use the following script: https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh
......