# Introduction

This repository holds NVIDIA-maintained utilities to streamline mixed precision and distributed training in PyTorch. Some of the code here will be included in upstream PyTorch eventually. The intention of Apex is to make these up-to-date utilities available to users as quickly as possible.

## Full API Documentation: [https://nvidia.github.io/apex](https://nvidia.github.io/apex)

# Contents

## 1. Mixed Precision 

### amp: Automatic Mixed Precision

`apex.amp` is a tool designed for ease of use and maximum safety in FP16 training.  All potentially unsafe ops are performed in FP32 under the hood, while safe ops are performed using faster, Tensor Core-friendly FP16 math.  `amp` also automatically implements dynamic loss scaling. 

The intention of `amp` is to be the "on-ramp" to easy FP16 training: achieve all the numerical stability of full FP32 training, with most of the performance benefits of full FP16 training.
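
A minimal sketch of the workflow (the `amp.init()` handle and `scale_loss` context manager shown here are illustrative; the documentation linked below is authoritative):

```python
import torch
from apex import amp

# Any standard PyTorch model and optimizer can be used here.
model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Initialize amp once, before training.  The handle routes unsafe ops to FP32,
# runs Tensor Core-friendly ops in FP16, and manages dynamic loss scaling.
amp_handle = amp.init()

for step in range(10):
    data = torch.randn(64, 512, device="cuda")
    target = torch.randn(64, 512, device="cuda")

    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(data), target)
    # Backward through a scaled loss so small FP16 gradients do not underflow.
    with amp_handle.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```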

[Python Source and API Documentation](https://github.com/NVIDIA/apex/tree/master/apex/amp)

### FP16_Optimizer

`apex.FP16_Optimizer` wraps an existing PyTorch optimizer and automatically implements master parameters and static or dynamic loss scaling under the hood.

The intention of `FP16_Optimizer` is to be the "highway" for FP16 training: achieve most of the numerical stability of full FP32 training, and almost all of the performance benefits of full FP16 training.
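
A minimal sketch, assuming the wrap-and-`optimizer.backward(loss)` pattern described in the documentation linked below:

```python
import torch
from apex.fp16_utils import FP16_Optimizer

# An FP16 model; FP16_Optimizer keeps FP32 "master" copies of the parameters.
model = torch.nn.Linear(512, 512).cuda().half()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True)

for step in range(10):
    data = torch.randn(64, 512, device="cuda").half()
    target = torch.randn(64, 512, device="cuda").half()

    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(data), target)
    # Replaces loss.backward(): scales the loss, backpropagates, and copies
    # gradients onto the FP32 master parameters before the update.
    optimizer.backward(loss)
    optimizer.step()
```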

[API Documentation](https://nvidia.github.io/apex/fp16_utils.html#automatic-management-of-master-params-loss-scaling)

[Python Source](https://github.com/NVIDIA/apex/tree/master/apex/fp16_utils)

[Simple examples with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/FP16_Optimizer_simple)

[Imagenet with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/imagenet)

[word_language_model with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/word_language_model)

The Imagenet and word_language_model directories also contain examples that show manual management of master parameters and static loss scaling.  

These manual examples illustrate what sort of operations `amp` and `FP16_Optimizer` are performing automatically.
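
For reference, a compressed sketch of that manual pattern: keep an FP32 master copy of the weights, scale the loss by a static factor before backward, then unscale the gradients and update the master copy (the value `128.0` and the helper structure here are illustrative, not taken from those examples):

```python
import torch

loss_scale = 128.0  # static loss scale (illustrative value)

model = torch.nn.Linear(512, 512).cuda().half()
# FP32 master copies of the FP16 model parameters; the optimizer updates these.
master_params = [p.detach().clone().float().requires_grad_(True)
                 for p in model.parameters()]
optimizer = torch.optim.SGD(master_params, lr=1e-3)

for step in range(10):
    data = torch.randn(64, 512, device="cuda").half()
    target = torch.randn(64, 512, device="cuda").half()

    model.zero_grad()
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(data), target)
    # Scale the loss so small gradients survive the FP16 backward pass.
    (loss * loss_scale).backward()

    # Copy the FP16 gradients onto the FP32 master params and unscale them.
    for master, param in zip(master_params, model.parameters()):
        if param.grad is not None:
            master.grad = param.grad.detach().float() / loss_scale

    optimizer.step()

    # Copy the updated FP32 master weights back into the FP16 model.
    with torch.no_grad():
        for master, param in zip(master_params, model.parameters()):
            param.copy_(master)
```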

## 2. Distributed Training

`apex.parallel.DistributedDataParallel` is a module wrapper, similar to 
`torch.nn.parallel.DistributedDataParallel`.  It enables convenient multiprocess distributed training,
optimized for NVIDIA's NCCL communication library.

`apex.parallel.multiproc` is a launch utility that helps set up arguments for `DistributedDataParallel`.
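
A per-process sketch of how the wrapper is typically applied (the `--local_rank` argument and the `env://` initialization are assumptions about the launch setup; the example/walkthrough linked below covers launching one process per GPU):

```python
import argparse
import torch
from apex.parallel import DistributedDataParallel as DDP

# The launcher starts one copy of this script per GPU and passes a rank.
parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)
# NCCL backend; init_method="env://" assumes the launcher sets MASTER_ADDR,
# MASTER_PORT, RANK, and WORLD_SIZE in the environment.
torch.distributed.init_process_group(backend="nccl", init_method="env://")

model = torch.nn.Linear(512, 512).cuda()
# Gradients are averaged across processes during backward() over NCCL.
model = DDP(model)
```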

[API Documentation](https://nvidia.github.io/apex/parallel.html)

[Python Source](https://github.com/NVIDIA/apex/tree/master/apex/parallel)

[Example/Walkthrough](https://github.com/NVIDIA/apex/tree/master/examples/distributed)

The [Imagenet with FP16_Optimizer](https://github.com/NVIDIA/apex/tree/master/examples/imagenet) 
mixed precision examples also demonstrate `apex.parallel.DistributedDataParallel`.

# Requirements

Python 3

CUDA 9

PyTorch 0.4 or newer.  We recommend using the latest stable release, obtainable from 
[https://pytorch.org/](https://pytorch.org/).  We also test against the latest master branch, obtainable from [https://github.com/pytorch/pytorch](https://github.com/pytorch/pytorch).  
If you have any problems building, please file an issue.



# Quick Start

### Linux
To build the extension, run
```
python setup.py install
```
in the root directory of the cloned repository.

To use the extension:
```
import apex
```

### Windows support
Windows support is experimental, and Linux is recommended.  However, since Apex is Python-only, there's a good chance it "just works" the same way as Linux.  If you installed PyTorch in a Conda environment, make sure to install Apex in that same environment.

<!--
reparametrization and RNN API under construction

Current version of apex contains:
1. Reparameterization function that allows you to recursively apply reparameterization to an entire module (including children modules).
2. An experimental and in-development flexible RNN API.
-->