To use `FP16_Optimizer` on a half-precision model, or a model with a mixture of
half and float parameters, only two lines of your training script need to change:
1. Construct an `FP16_Optimizer` instance from an existing optimizer.
2. Replace `loss.backward()` with `optimizer.backward(loss)`.
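For concreteness, here is a minimal sketch of both changes. It assumes apex is installed and a CUDA device is available; the toy `Linear` model, the dummy data, and the `static_loss_scale` value are illustrative, not prescribed by the docs:
```python
import torch
from apex.fp16_utils import FP16_Optimizer

# A toy half-precision model and dummy data, for illustration only.
model = torch.nn.Linear(64, 1).cuda().half()
data = torch.randn(16, 64).cuda().half()

# Change 1: construct an FP16_Optimizer from an existing optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
optimizer = FP16_Optimizer(optimizer, static_loss_scale=128.0)

optimizer.zero_grad()
loss = model(data).float().mean()  # forward pass as usual
optimizer.backward(loss)           # Change 2: replaces loss.backward()
optimizer.step()
```
Dynamic loss scaling can be requested instead by constructing with `FP16_Optimizer(optimizer, dynamic_loss_scale=True)`.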
[Full API Documentation](https://nvidia.github.io/apex/fp16_utils.html#automatic-management-of-master-params-loss-scaling)
See "Other Options" at the bottom of this page for some cases that require special treatment.
...
...
#### Other Options
Gradient clipping requires that calls to `torch.nn.utils.clip_grad_norm`
be replaced with [fp16_optimizer_instance.clip_master_grads()](https://nvidia.github.io/apex/fp16_utils.html#apex.fp16_utils.FP16_Optimizer.clip_master_grads). The [word_language_model example](https://github.com/NVIDIA/apex/blob/master/examples/word_language_model/main_fp16_optimizer.py) uses this feature.
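As a sketch of how that substitution looks inside a training loop (reusing the hypothetical `model`, `data`, and wrapped `optimizer` from the sketch above; the `max_norm` value of 0.25 is just an example):
```python
optimizer.zero_grad()
loss = model(data).float().mean()
optimizer.backward(loss)
# Clip the fp32 master gradients instead of the fp16 model gradients;
# like clip_grad_norm, this returns the total norm before clipping.
grad_norm = optimizer.clip_master_grads(0.25)
optimizer.step()
```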
Multiple losses will work if you simply replace
```python
...
...
optimizer.backward(loss1)
optimizer.backward(loss2)
```
but `FP16_Optimizer` can be told to handle this more efficiently using the