To use `FP16_Optimizer` on a half-precision model, or a model with a mixture of
half and float parameters, only two lines of your training script need to change:
1. Construct an `FP16_Optimizer` instance from an existing optimizer.
2. Replace `loss.backward()` with `optimizer.backward(loss)`.
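For concreteness, here is a minimal sketch of those two changes in an otherwise ordinary training step. The model, data, and loss-scale value below are illustrative placeholders, not part of the library or the example scripts:

```python
import torch
from apex.fp16_utils import FP16_Optimizer

# Illustrative half-precision model and a standard PyTorch optimizer.
model = torch.nn.Linear(512, 10).cuda().half()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Change 1: construct an FP16_Optimizer from the existing optimizer
# (the loss scale of 128.0 is an arbitrary example value).
optimizer = FP16_Optimizer(optimizer, static_loss_scale=128.0)

inputs = torch.randn(32, 512).cuda().half()   # toy fp16 batch
optimizer.zero_grad()
loss = model(inputs).float().pow(2).mean()    # toy loss, for illustration only

# Change 2: replace loss.backward() with optimizer.backward(loss).
optimizer.backward(loss)
optimizer.step()
```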
[Full API Documentation](https://nvidia.github.io/apex/fp16_utils.html#automatic-management-of-master-params-loss-scaling)

See "Other Options" at the bottom of this page for some cases that require special treatment.
#### Other Options
Gradient clipping requires that calls to `torch.nn.utils.clip_grad_norm`
be replaced with [fp16_optimizer_instance.clip_master_grads()](https://nvidia.github.io/apex/fp16_utils.html#apex.fp16_utils.FP16_Optimizer.clip_master_grads). The [word_language_model example](https://github.com/NVIDIA/apex/blob/master/examples/word_language_model/main_fp16_optimizer.py) uses this feature.
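As a rough, self-contained sketch (placeholder model and an arbitrary clipping threshold of 0.25), the call goes between `optimizer.backward(loss)` and `optimizer.step()`:

```python
import torch
from apex.fp16_utils import FP16_Optimizer

model = torch.nn.Linear(512, 10).cuda().half()            # placeholder model
optimizer = FP16_Optimizer(torch.optim.SGD(model.parameters(), lr=1e-3),
                           static_loss_scale=128.0)

optimizer.zero_grad()
inputs = torch.randn(32, 512).cuda().half()               # toy fp16 batch
loss = model(inputs).float().pow(2).mean()                # toy loss
optimizer.backward(loss)

# Instead of torch.nn.utils.clip_grad_norm(model.parameters(), 0.25),
# clip the fp32 master gradients maintained by FP16_Optimizer.
optimizer.clip_master_grads(0.25)

optimizer.step()
```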
Multiple losses will work if you simply replace
```python
...
optimizer.backward(loss1)
optimizer.backward(loss2)
```
but `FP16_Optimizer` can be told to handle this more efficiently using the