"git@developer.sourcefind.cn:OpenDAS/ktransformers.git" did not exist on "b5ef7c26dc2eda7bb8335277e1d62face0c24f26"
Commit 4606df98 authored by Michael Carilli's avatar Michael Carilli
Browse files

Rearranging documentation

parent 0c2a629d
@@ -12,7 +12,7 @@ users as quickly as possible.
## 1. Mixed Precision

### Amp: Automatic Mixed Precision

`apex.amp` is a tool to enable mixed precision training by changing only 3 lines of your script.
Users can easily experiment with different pure and mixed precision training modes by supplying
different flags to `amp.initialize`.
@@ -27,7 +27,7 @@ different flags to `amp.initialize`.
[DCGAN example coming soon...](https://github.com/NVIDIA/apex/tree/master/examples/dcgan)

[Moving to the new Amp API](https://nvidia.github.io/apex/amp.html#transition-guide-for-old-api-users) (for users of the deprecated tools formerly called "Amp" and "FP16_Optimizer")
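In a typical script the three lines look something like the following (a minimal sketch, assuming `model`, `optimizer`, and `loss` are already defined in default FP32):

```python
from apex import amp

# After building your model and optimizer as usual (in FP32):
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

# ...and inside the training loop, loss.backward() becomes:
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
```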
## 2. Distributed Training
...
@@ -7,11 +7,22 @@ apex.amp
This page documents the updated API for Amp (Automatic Mixed Precision),
a tool to enable Tensor Core-accelerated training in only 3 lines of Python.

A `runnable, comprehensive Imagenet example`_ demonstrating good practices can be found
on the Github page.

GANs are a tricky case that many people have requested. A `comprehensive DCGAN example`_
is under construction.

``opt_level``\ s and Properties
-------------------------------
Amp allows users to easily experiment with different pure and mixed precision modes.
Commonly-used default modes are chosen by
selecting an "optimization level" or ``opt_level``; each ``opt_level`` establishes a set of
properties that govern Amp's implementation of pure or mixed precision training.
Finer-grained control of how a given ``opt_level`` behaves can be achieved by passing values for
particular properties directly to ``amp.initialize``. These manually specified values
override the defaults established by the ``opt_level``.
Example::
@@ -27,11 +38,16 @@ Example::
        scaled_loss.backward()
    ...

Users **should not** manually cast their model or data to ``.half()``, regardless of what ``opt_level``
or properties are chosen. Amp intends that users start with an existing default (FP32) script,
add the three lines corresponding to the Amp API, and begin training with mixed precision.
Amp can also be disabled, in which case the original script will behave exactly as it used to.
In this way, there's no risk adhering to the Amp API, and a lot of potential performance benefit.

.. note::
    Because it's never necessary to manually cast your model (aside from the call to ``amp.initialize``)
    or input data, a script that adheres to the new API
    can switch between different ``opt_level``\ s without having to make any other changes.
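For example, the ``opt_level`` can be exposed as a command-line flag, so the very same script covers every mode (a minimal sketch; the ``argparse`` wiring and variable names are assumptions, not part of Amp itself)::

    import argparse
    from apex import amp

    parser = argparse.ArgumentParser()
    parser.add_argument("--opt-level", type=str, default="O1")
    args = parser.parse_args()

    # model and optimizer are built in default FP32, as usual
    model, optimizer = amp.initialize(model, optimizer, opt_level=args.opt_level)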
.. _`runnable, comprehensive Imagenet example`:
    https://github.com/NVIDIA/apex/tree/master/examples/imagenet

@@ -39,17 +55,6 @@ is under construction.

.. _`comprehensive DCGAN example`:
    https://github.com/NVIDIA/apex/tree/master/examples/dcgan
Properties
**********
@@ -57,9 +62,9 @@ Currently, the under-the-hood properties that govern pure or mixed precision tra
- ``cast_model_type``: Casts your model's parameters and buffers to the desired type.
- ``patch_torch_functions``: Patch all Torch functions and Tensor methods to perform Tensor Core-friendly ops like GEMMs and convolutions in FP16, and any ops that benefit from FP32 precision in FP32.
- ``keep_batchnorm_fp32``: To enhance precision and enable cudnn batchnorm (which improves performance), it's often beneficial to keep batchnorm weights in FP32 even if the rest of the model is FP16.
- ``master_weights``: Maintain FP32 master weights to accompany any FP16 model weights. FP32 master weights are stepped by the optimizer to enhance precision and capture small gradients.
- ``loss_scale``: If ``loss_scale`` is a float value, use this value as the static (fixed) loss scale. If ``loss_scale`` is the string ``"dynamic"``, adaptively adjust the loss scale over time. Dynamic loss scale adjustments are performed by Amp automatically.
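These property names double as keyword arguments to ``amp.initialize``. A sketch of overriding a couple of them on top of an ``opt_level`` (the particular values here are illustrative only, not recommendations)::

    model, optimizer = amp.initialize(
        model, optimizer,
        opt_level="O2",
        keep_batchnorm_fp32=True,   # explicit property override
        loss_scale=128.0)           # static loss scale instead of the "dynamic" default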
Again, you often don't need to specify these properties by hand. Instead, select an ``opt_level``,
which will set them up for you. After selecting an ``opt_level``, you can optionally pass property
@@ -85,7 +90,7 @@ Your incoming model should be FP32 already, so this is likely a no-op.
| Default properties set by ``O0``:
| ``cast_model_type=torch.float32``
| ``patch_torch_functions=False``
| ``keep_batchnorm_fp32=None`` (effectively, "not applicable," everything is FP32)
| ``master_weights=False``
| ``loss_scale=1.0``
|
@@ -116,13 +121,13 @@ what gives the best speedup and accuracy for your model.
Patch all Torch functions and Tensor methods to cast their inputs according to a whitelist-blacklist
model. Whitelist ops (for example, Tensor Core-friendly ops like GEMMs and convolutions) are performed
in FP16. Blacklist ops that benefit from FP32 precision (for example, softmax)
are performed in FP32. ``O1`` also uses dynamic loss scaling, unless overridden.

| Default properties set by ``O1``:
| ``cast_model_type=None`` (not applicable)
| ``patch_torch_functions=True``
| ``keep_batchnorm_fp32=None`` (again, "not applicable," all model weights remain FP32)
| ``master_weights=None`` (not applicable, model weights remain FP32)
| ``loss_scale="dynamic"``
|
@@ -130,7 +135,9 @@ are performed in FP32. ``O1`` also uses dynamic loss scaling, unless overridden
``O2``: Fast Mixed Precision
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``O2`` casts the model weights to FP16,
patches the model's ``forward`` method to cast input
data to FP16, keeps batchnorms in FP32, maintains FP32 master weights,
and implements dynamic loss scaling (unless overridden).
Unlike ``O1``, ``O2`` does not patch Torch functions or Tensor methods.
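In practice an ``O2`` run can look like the following sketch (``train_loader`` and ``criterion`` are assumed placeholders; the input tensors stay FP32 because the patched ``forward`` casts them)::

    model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

    for input, target in train_loader:
        input, target = input.cuda(), target.cuda()   # input remains FP32; forward casts it to FP16
        output = model(input)
        loss = criterion(output, target)
        optimizer.zero_grad()
        with amp.scale_loss(loss, optimizer) as scaled_loss:
            scaled_loss.backward()                    # backward runs on the scaled loss
        optimizer.step()                              # steps the FP32 master weights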
@@ -221,14 +228,9 @@ One annoying aspect of FP16_Optimizer was that the user had to manually convert
(either by calling ``.half()`` on it, or using a function or module wrapper from
``apex.fp16_utils``), and also manually call ``.half()`` on input data. **Neither of these is
necessary in the new API. No matter what --opt-level
you choose, you can and should simply build your model and pass input data in the default FP32 format.**
The new Amp API will perform the right conversions during
``model, optimizer = amp.initialize(model, optimizer, opt_level=....)`` based on the ``--opt-level``
and any overridden flags. Floating point input data may be FP32 or FP16, but you may as well just
let it be FP16, because the ``model`` returned by ``amp.initialize`` will have its ``forward``
method patched to cast the input data appropriately.