# Simple examples of FP16_Optimizer functionality

To use `FP16_Optimizer` on a half-precision model, or a model with a mixture of 
half and float parameters, only two lines of your training script need to change:
1. Construct an `FP16_Optimizer` instance from an existing optimizer.
2. Replace `loss.backward()` with `optimizer.backward(loss)`.
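
A minimal sketch of that two-line change (the tiny model, loss, and data below are placeholders for illustration, not part of the shipped examples):

```python
import torch
from apex.fp16_utils import FP16_Optimizer

# Placeholder half-precision model and data, purely for illustration.
model = torch.nn.Linear(16, 4).cuda().half()
criterion = torch.nn.MSELoss()
inputs = torch.randn(8, 16).cuda().half()
target = torch.randn(8, 4).cuda().half()

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer = FP16_Optimizer(optimizer)   # 1. wrap the existing optimizer

optimizer.zero_grad()
loss = criterion(model(inputs), target)
optimizer.backward(loss)                # 2. instead of loss.backward()
optimizer.step()
```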

#### [Full API Documentation](https://nvidia.github.io/apex/fp16_utils.html#automatic-management-of-master-params-loss-scaling)

See "Other Options" at the bottom of this page for some cases that require special treatment.

#### Minimal Working Sample
`minimal.py` shows the basic usage of `FP16_Optimizer` with either static or dynamic loss scaling.  Test via `python minimal.py`.
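
The two scaling modes differ only in the constructor arguments; roughly (the `128.0` scale is an illustrative value, and `model` is the placeholder from the sketch above):

```python
import torch
from apex.fp16_utils import FP16_Optimizer

# Static loss scaling: the loss is multiplied by a fixed factor before backward.
static_optimizer = FP16_Optimizer(torch.optim.SGD(model.parameters(), lr=0.1),
                                  static_loss_scale=128.0)

# Dynamic loss scaling: the factor is adjusted automatically when gradient
# overflows are detected.
dynamic_optimizer = FP16_Optimizer(torch.optim.SGD(model.parameters(), lr=0.1),
                                   dynamic_loss_scale=True)
```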

#### Closures
`FP16_Optimizer` supports closures with the same control flow as ordinary PyTorch optimizers.

`closure.py` shows an example.  Test via `python closure.py`.

See [the API documentation](https://nvidia.github.io/apex/fp16_utils.html#apex.fp16_utils.FP16_Optimizer.step) for more details.
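
Roughly, the only difference from a plain PyTorch closure is that the closure calls `optimizer.backward(loss)` (`model`, `criterion`, `inputs`, and `target` are the placeholders from the first sketch):

```python
def closure():
    optimizer.zero_grad()
    loss = criterion(model(inputs), target)
    optimizer.backward(loss)   # not loss.backward()
    return loss

optimizer.step(closure)
```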

#### Serialization/Deserialization
`FP16_Optimizer` supports saving and loading with the same control flow as ordinary PyTorch optimizers.

`save_load.py` shows an example.  Test via `python save_load.py`.

See [the API documentation](https://nvidia.github.io/apex/fp16_utils.html#apex.fp16_utils.FP16_Optimizer.load_state_dict) for more details.
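
In other words, the usual checkpoint pattern carries over unchanged (the file name below is arbitrary):

```python
# Saving
checkpoint = {'model': model.state_dict(),
              'optimizer': optimizer.state_dict()}
torch.save(checkpoint, 'checkpoint.pt')

# Loading
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
```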

#### Distributed
**distributed_apex** shows an example using `FP16_Optimizer` with Apex DistributedDataParallel.
The usage of `FP16_Optimizer` with distributed does not need to change from ordinary single-process 
usage. Test via
```bash
cd distributed_apex
bash run.sh
```

**distributed_pytorch** shows an example using `FP16_Optimizer` with PyTorch DistributedDataParallel.
Again, the usage of `FP16_Optimizer` with distributed does not need to change from ordinary 
single-process usage.  Test via
```bash
cd distributed_pytorch
bash run.sh
```
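
In both cases the pattern is simply to wrap the model for distributed training and then construct `FP16_Optimizer` as usual. A rough sketch with the Apex wrapper (distributed process-group setup and the launch commands are omitted here; see the `run.sh` scripts):

```python
import torch
from apex.parallel import DistributedDataParallel as DDP
from apex.fp16_utils import FP16_Optimizer

# Assumes the distributed process group has already been initialized.
model = torch.nn.Linear(16, 4).cuda().half()   # placeholder model
model = DDP(model)                             # averages gradients across processes

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True)

# The training loop is identical to the single-process case:
#   optimizer.zero_grad(); optimizer.backward(loss); optimizer.step()
```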

#### Other Options

Gradient clipping requires that calls to `torch.nn.utils.clip_grad_norm`
be replaced with [fp16_optimizer_instance.clip_master_grads()](https://nvidia.github.io/apex/fp16_utils.html#apex.fp16_utils.FP16_Optimizer.clip_master_grads).  The [word_language_model example](https://github.com/NVIDIA/apex/blob/master/examples/word_language_model/main_fp16_optimizer.py) uses this feature.
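
Roughly (the `5.0` threshold is just an example value):

```python
optimizer.backward(loss)
# Clip the fp32 master gradients rather than the model's fp16 gradients.
optimizer.clip_master_grads(5.0)
optimizer.step()
```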

Multiple losses will work if you simply replace
```python
loss1.backward()
loss2.backward()
```
with 
```python
optimizer.backward(loss1)
optimizer.backward(loss2)
```
but `FP16_Optimizer` can be told to handle this more efficiently using the 
[update_master_grads()](https://nvidia.github.io/apex/fp16_utils.html#apex.fp16_utils.FP16_Optimizer.update_master_grads) option.
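
Concretely, the deferred pattern described in that documentation looks roughly like this:

```python
optimizer.backward(loss1, update_master_grads=False)
optimizer.backward(loss2, update_master_grads=False)
# Copy the accumulated fp16 grads into the fp32 master grads once, then step.
optimizer.update_master_grads()
optimizer.step()
```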