Commit 0f703d13 authored by Michael Carilli

Added Dockerfile example, more readme updates

parent 942174bf
@@ -6,25 +6,29 @@ Some of the code here will be included in upstream Pytorch eventually.
The intention of Apex is to make up-to-date utilities available to users as quickly as possible.

## Full API Documentation: [https://nvidia.github.io/apex](https://nvidia.github.io/apex)

# Contents

## 1. Mixed Precision

### amp: Automatic Mixed Precision

`apex.amp` is a tool designed for ease of use and maximum safety in FP16 training. All potentially unsafe ops are performed in FP32 under the hood, while safe ops are performed using faster, Tensor Core-friendly FP16 math. `amp` also automatically implements dynamic loss scaling.

The intention of `amp` is to be the "on-ramp" to easy FP16 training: achieve all the numerical stability of full FP32 training, with most of the performance benefits of full FP16 training.

[Python Source and API Documentation](https://github.com/NVIDIA/apex/tree/master/apex/amp)
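For orientation, typical `amp` usage looks roughly like the sketch below. The model, data shapes, and optimizer are hypothetical, and the `amp.init()` handle with its `scale_loss` context manager follows the interface described in the amp documentation linked above; treat the linked source as authoritative.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from apex import amp

# Initialize amp once near the top of the training script. It transparently
# routes potentially unsafe ops to FP32 and Tensor Core-friendly ops to FP16.
amp_handle = amp.init()

# Hypothetical model, optimizer, and data purely for illustration.
model = nn.Linear(1024, 1024).cuda()
optimizer = optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(100):
    x = torch.randn(32, 1024, device='cuda')
    target = torch.randn(32, 1024, device='cuda')
    optimizer.zero_grad()
    loss = loss_fn(model(x), target)
    # Wrap the backward pass so amp can apply dynamic loss scaling.
    with amp_handle.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```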
### FP16_Optimizer
`apex.FP16_Optimizer` wraps an existing Python optimizer and automatically implements master parameters and static or dynamic loss scaling under the hood.

The intention of `FP16_Optimizer` is to be the "highway" for FP16 training: achieve most of the numerical stability of full FP32 training, and almost all the performance benefits of full FP16 training.

[Python Source](https://github.com/NVIDIA/apex/tree/master/apex/fp16_utils)
[API Documentation](https://nvidia.github.io/apex/fp16_utils.html#automatic-management-of-master-params-loss-scaling)
[Simple examples with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/FP16_Optimizer_simple)
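The wrapping pattern those examples demonstrate is roughly the following sketch. The model and data are hypothetical; `FP16_Optimizer`, its `backward` method, and the `dynamic_loss_scale` argument come from the fp16_utils source linked above, so consult that source and the API documentation for the exact constructor options.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from apex.fp16_utils import FP16_Optimizer

# Hypothetical FP16 model purely for illustration.
model = nn.Linear(1024, 1024).cuda().half()

optimizer = optim.SGD(model.parameters(), lr=1e-3)
# Wrap the optimizer. FP16_Optimizer maintains FP32 master copies of the FP16
# parameters and applies loss scaling (dynamic here) under the hood.
optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True)

x = torch.randn(32, 1024, device='cuda', dtype=torch.half)
target = torch.randn(32, 1024, device='cuda', dtype=torch.half)

optimizer.zero_grad()
loss = F.mse_loss(model(x), target)
# Call backward through the wrapper so the loss can be scaled and the FP16
# gradients copied into the FP32 master gradients.
optimizer.backward(loss)
optimizer.step()
```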
@@ -43,9 +47,11 @@ optimized for NVIDIA's NCCL communication library.
`apex.parallel.multiproc` is a launch utility that helps set up arguments for `DistributedDataParallel`.

[API Documentation](https://nvidia.github.io/apex/parallel.html)
[Python Source](https://github.com/NVIDIA/apex/tree/master/apex/parallel)
[Example/Walkthrough](https://github.com/NVIDIA/apex/tree/master/examples/distributed)
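In code, adopting the Apex `DistributedDataParallel` is roughly the sketch below. It assumes one process per GPU with `RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT` already set in the environment (the kind of plumbing `apex.parallel.multiproc` and the walkthrough cover); `LOCAL_RANK` is only an illustrative variable name, not something Apex defines.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from apex.parallel import DistributedDataParallel as DDP

# Assumes one process per GPU, with RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT
# already exported; setting that up is what apex.parallel.multiproc helps with.
dist.init_process_group(backend='nccl', init_method='env://')

# LOCAL_RANK is an illustrative convention for choosing this process's GPU.
local_rank = int(os.environ.get('LOCAL_RANK', '0'))
torch.cuda.set_device(local_rank)

model = nn.Linear(1024, 1024).cuda()
# Wrapping the model overlaps gradient all-reduce (over NCCL) with the backward
# pass; the rest of the training loop is unchanged.
model = DDP(model)
```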
# Requirements
...
@@ -7,6 +7,6 @@ multiproc.py contains the source code for `apex.parallel.multiproc`, a launch ut
### [API Documentation](https://nvidia.github.io/apex/parallel.html)

### [Example/Walkthrough](https://github.com/NVIDIA/apex/tree/master/examples/distributed)
## Contents:
- `distributed`: Walkthrough of the Apex distributed data parallel utilities.
- `FP16_Optimizer_simple`: Simple examples demonstrating various use cases of `FP16_Optimizer` to automatically manage master parameters and static or dynamic loss scaling.
- `imagenet`: Example based on [https://github.com/pytorch/examples/tree/master/imagenet](https://github.com/pytorch/examples/tree/master/imagenet) showing the use of `FP16_Optimizer`, as well as manual management of master parameters and loss scaling for illustration/comparison (a generic sketch of that manual pattern follows this list).
- `word_language_model`: Example based on [https://github.com/pytorch/examples/tree/master/word_language_model](https://github.com/pytorch/examples/tree/master/word_language_model) showing the use of `FP16_Optimizer`, as well as manual management of master parameters and loss scaling for illustration/comparison.
- `docker`: Example of a minimal Dockerfile that installs Apex on top of the Pytorch 0.4 container.
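For readers who want a feel for what "manual management of master parameters and loss scaling" involves before opening the imagenet or word_language_model code, the core idea reduces to something like the generic sketch below. It uses a toy model and a static loss scale, and is not the examples' actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Toy FP16 model purely for illustration.
model = nn.Linear(1024, 1024).cuda().half()

# FP32 "master" copies of the FP16 parameters; the optimizer updates these.
master_params = [p.detach().clone().float() for p in model.parameters()]
for master in master_params:
    master.requires_grad = True
optimizer = optim.SGD(master_params, lr=1e-3)

loss_scale = 128.0  # static scale here; real code may adjust it dynamically

x = torch.randn(32, 1024, device='cuda', dtype=torch.half)
target = torch.randn(32, 1024, device='cuda', dtype=torch.half)

optimizer.zero_grad()
loss = F.mse_loss(model(x), target)
# Scale the loss so small FP16 gradients do not underflow during backward.
(loss * loss_scale).backward()

# Copy the FP16 gradients into the FP32 master gradients and undo the scaling.
for param, master in zip(model.parameters(), master_params):
    if master.grad is None:
        master.grad = torch.empty_like(master)
    master.grad.copy_(param.grad)
    master.grad.div_(loss_scale)

optimizer.step()  # the weight update happens in FP32

# Copy the updated FP32 master weights back into the FP16 model.
for param, master in zip(model.parameters(), master_params):
    param.data.copy_(master.data)
```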
FROM pytorch/pytorch:0.4_cuda9_cudnn7
WORKDIR /workspace
# uninstall Apex if present
RUN pip uninstall -y apex || :
# SHA is something the user can alter to force recreation of this Docker layer,
# and therefore force cloning the latest version of Apex
RUN SHA=43d1ae08 git clone https://github.com/NVIDIA/apex.git
# Build and install Apex from the cloned source, then return to the workspace root
WORKDIR /workspace/apex
RUN python setup.py install
WORKDIR /workspace
Example of a minimal Dockerfile that installs Apex on top of upstream Pytorch's stable 0.4 container (pytorch/pytorch:0.4_cuda9_cudnn7).
# ImageNet training in PyTorch

This example is based on [https://github.com/pytorch/examples/tree/master/imagenet](https://github.com/pytorch/examples/tree/master/imagenet).

It implements training of popular model architectures, such as ResNet, AlexNet, and VGG, on the ImageNet dataset.

`main.py` and `main_fp16_optimizer.py` have been modified to use the `DistributedDataParallel` module in Apex instead of the one in upstream PyTorch. For a description of how this works, please see the distributed example included in this repo.
...
# Word-level language modeling RNN

This example is based on [https://github.com/pytorch/examples/tree/master/word_language_model](https://github.com/pytorch/examples/tree/master/word_language_model).

It trains a multi-layer RNN (Elman, GRU, or LSTM) on a language modeling task.
By default, the training script uses the provided Wikitext-2 dataset.
The trained model can then be used by the generate script to generate new text.
...