.. role:: hidden
    :class: hidden-section

apex.amp
===================================

Unified API
-----------

This page documents the updated API for Amp (Automatic Mixed Precision),
a tool to enable Tensor Core-accelerated training in only 3 lines of Python.

Amp allows users to easily experiment with different pure and mixed precision modes, including
pure FP16 training and pure FP32 training.  Commonly-used default modes are chosen by
selecting an "optimization level" or ``opt_level``; each ``opt_level`` establishes a set of
properties that govern Amp's implementation of pure or mixed precision training.
Finer-grained control of how a given ``opt_level`` behaves can be achieved by passing values for
particular properties directly to ``amp.initialize``.  These manually specified values will
override the defaults established by the ``opt_level``.  If you attempt to override a property
that does not make sense for the current ``opt_level``, Amp will raise an error with an explanation.
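
For example, an override might look like the following (a sketch; ``keep_batchnorm_fp32`` and
``loss_scale`` are two of the properties documented under ``initialize`` below)::

    model, optimizer = amp.initialize(
        model, optimizer,
        opt_level="O2",
        keep_batchnorm_fp32=True,   # override: keep batchnorm weights in FP32
        loss_scale="dynamic")       # override: use dynamic rather than static loss scaling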

Users **should not** manually cast their model or data to ``.half()``, regardless of what ``opt_level``
or properties are chosen.  Amp intends that users start with an existing default (FP32) script,
add the three lines corresponding to the Amp API, and begin training with mixed precision.
Amp can also be disabled, in which case the original script will behave exactly as it used to.
In this way, there's no risk in adhering to the Amp API, and a lot of potential performance benefit.

Example::

        # Declare model and optimizer as usual, with default (FP32) precision
        model = torch.nn.Linear(D_in, D_out).cuda()
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

        # Allow Amp to perform casts as required by the opt_level
        model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
        ...
        # loss.backward() becomes:
        with amp.scale_loss(loss, optimizer) as scaled_loss:
            scaled_loss.backward()
        ...
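
As mentioned above, Amp can also be disabled without removing these lines (a sketch using
the ``enabled`` argument of ``amp.initialize``)::

    # With enabled=False, amp.initialize returns the model and optimizer unmodified,
    # and amp.scale_loss should simply yield the unscaled loss, so the script runs
    # in ordinary FP32 exactly as it did before Amp was added.
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1", enabled=False)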

A `runnable, comprehensive Imagenet example`_ demonstrating good practices can be found
on the Github page.

GANs are a tricky case that many people have requested.  A `comprehensive DCGAN example`_
is under construction.

``opt_level``\ s and Properties
-------------------------------

.. _`runnable, comprehensive Imagenet example`:
    https://github.com/NVIDIA/apex/tree/master/examples/imagenet

.. _`comprehensive DCGAN example`:
    https://github.com/NVIDIA/apex/tree/master/examples/dcgan

.. automodule:: apex.amp
.. currentmodule:: apex.amp

.. autofunction:: initialize

.. autofunction:: scale_loss

.. autofunction:: master_params

Advanced use cases
------------------

The new Amp API supports gradient accumulation across iterations,
multiple backward passes per iteration, multiple models/optimizers,
and custom/user-defined autograd functions.  Gradient clipping and GANs also
require special treatment, but this treatment does not need to change
for different ``opt_level``\ s.  Further details can be found here:

.. toctree::
   :maxdepth: 1

   advanced
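
As a quick preview, gradient clipping looks the same under every ``opt_level``: clip the
parameters returned by ``amp.master_params`` rather than ``model.parameters()``
(a sketch; ``max_norm`` is a value you choose)::

    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()

    # Gradients are unscaled once the scale_loss context exits, so clip the
    # master params Amp maintains for this optimizer.
    torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), max_norm)

    optimizer.step()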

Transition guide for old API users
----------------------------------

We strongly encourage moving to the new Amp API, because it's more versatile, easier to use, and future-proof.  The original :class:`FP16_Optimizer` and the old "Amp" API are deprecated, and subject to removal at any time.

**For users of the old "Amp" API**

In the new API, ``opt_level O1`` performs the same patching of the Torch namespace as the old Amp API.
However, the new API allows choosing static or dynamic loss scaling, while the old API only allowed dynamic loss scaling.
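
For example (a sketch; ``loss_scale`` accepts either a positive value for static scaling or
the string ``"dynamic"``)::

    # Dynamic loss scaling (the only behavior the old API offered)
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1", loss_scale="dynamic")

    # Static loss scaling with a fixed scale factor
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1", loss_scale=128.0)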

In the new API, the old call to ``amp_handle = amp.init()``, and the returned ``amp_handle``, are no
longer exposed or necessary.  The new ``amp.initialize()`` does the duty of ``amp.init()`` (and more).
Therefore, any existing calls to ``amp_handle = amp.init()`` should be deleted.
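
In before/after form::

    # old API
    amp_handle = amp.init()
    ->
    # new API
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")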

The functions formerly exposed through ``amp_handle`` are now free
functions accessible through the ``amp`` module.

The backward context manager must be changed accordingly::

    # old API
    with amp_handle.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    ->
    # new API
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()

For now, the deprecated "Amp" API documentation can still be found on the Github README:  https://github.com/NVIDIA/apex/tree/master/apex/amp.  The old API calls that `annotate user functions`_ to run
with a particular precision are still honored by the new API.

.. _`annotate user functions`:
    https://github.com/NVIDIA/apex/tree/master/apex/amp#annotating-user-functions


**For users of the old FP16_Optimizer**

``opt_level O2`` is equivalent to :class:`FP16_Optimizer` with ``dynamic_loss_scale=True``.
Once again, the backward pass must be changed to the unified version::

    optimizer.backward(loss)
    ->
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()

One annoying aspect of FP16_Optimizer was that the user had to manually convert their model to half
(either by calling ``.half()`` on it, or using a function or module wrapper from
``apex.fp16_utils``), and also manually call ``.half()`` on input data.  **Neither of these is
necessary in the new API.  No matter what opt_level
you choose, you can and should simply build your model in the default FP32 format.**  The new Amp
API will perform the right conversions during
``model, optimizer = amp.initialize(model, optimizer, opt_level=...)`` based on the ``opt_level``
and any overridden properties.  Floating point input data may be FP32 or FP16, but you may as well just
let it be FP16, because the ``model`` returned by ``amp.initialize`` will have its ``forward``
method patched to cast the input data appropriately.
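
Concretely, the construction changes along these lines (a sketch; the old snippet assumes
``FP16_Optimizer`` was imported from ``apex.fp16_utils``)::

    # old API
    model = model.cuda().half()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True)
    ->
    # new API: build the model in default FP32 and let Amp insert the casts
    model = model.cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    model, optimizer = amp.initialize(model, optimizer, opt_level="O2")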

.. note::
    Aside from the call to ``amp.initialize`` itself, it's never necessary to manually cast
    your model or data with the new API.  Therefore, a script that adheres to the new API
    can switch between different ``opt_level``\ s without having to make any other changes.