Commit f5790a1e authored by Michael Carilli

Adding closure and minimal examples

parent 71671993
# Simple examples of FP16_Optimizer functionality
`minimal.py` shows the basic usage of `FP16_Optimizer` with either static or dynamic loss scaling. Test via
```bash
python minimal.py
```
`FP16_Optimizer` supports closures with the same control flow as ordinary Pytorch optimizers.
`closure.py` shows an example. Test via
```bash
python closure.py
```
See [the API documentation](https://nvidia.github.io/apex/fp16_utils.html#apex.fp16_utils.FP16_Optimizer.step) for more details.
`FP16_Optimizer` also supports checkpointing with the same control flow as ordinary Pytorch optimizers.
`save_load.py` shows an example. Test via
```bash
python save_load.py
```
See [the API documentation](https://nvidia.github.io/apex/fp16_utils.html#apex.fp16_utils.FP16_Optimizer.load_state_dict) for more details.
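This is not the contents of `save_load.py`, just a minimal sketch of the checkpointing round trip described above, assuming the usual `state_dict()`/`load_state_dict()` calls (`load_state_dict` is the method documented at the link above); the layer sizes and checkpoint filename are illustrative:
```python
import torch
from apex.fp16_utils import FP16_Optimizer

model = torch.nn.Linear(1024, 16).cuda().half()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
optimizer = FP16_Optimizer(optimizer, static_loss_scale=128.0)

# Save with the ordinary Pytorch syntax...
checkpoint = {'model': model.state_dict(),
              'optimizer': optimizer.state_dict()}
torch.save(checkpoint, 'checkpoint.pt')  # illustrative filename

# ...and restore both the model and the FP16_Optimizer state.
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
```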
**distributed_pytorch** shows an example using `FP16_Optimizer` with Pytorch DistributedDataParallel.
Using `FP16_Optimizer` with distributed training requires no changes from ordinary single-process
usage. Run via
```bash
cd distributed_pytorch
bash run.sh
```
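`run.sh` itself is not shown here. Since the Pytorch DDP example below reads `args.local_rank`, which `torch.distributed.launch` supplies, the script presumably wraps a launcher invocation roughly like the following; the training-script name and process count are assumptions, not taken from the example:
```bash
# Hypothetical launch command (script name and process count are illustrative).
python -m torch.distributed.launch --nproc_per_node=2 distributed_data_parallel.py
```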
**distributed_pytorch** shows an example using `FP16_Optimizer` with Apex DistributedDataParallel.
Again, using `FP16_Optimizer` with distributed training requires no changes from ordinary
single-process usage. Run via
```bash
bash run.sh
```
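The main difference between the two distributed examples is how the model is wrapped. Below is a minimal sketch of the Apex wrapping step, assuming the process group is initialized the same way as in the Pytorch example; everything except the `apex.parallel` import and the bare `DDP(model)` call (which appears in the diff further down) is illustrative:
```python
import torch
import torch.distributed as dist
from apex.parallel import DistributedDataParallel as DDP

# Illustrative setup; in the real example this is driven by run.sh.
dist.init_process_group(backend='nccl', init_method='env://')

model = torch.nn.Linear(1024, 16).cuda().half()
# Apex DDP is wrapped without the device_ids/output_device arguments
# that the torch.nn.parallel version passes explicitly.
model = DDP(model)
```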
import torch
from torch.autograd import Variable
from apex.fp16_utils import FP16_Optimizer
torch.backends.cudnn.benchmark = True
N, D_in, D_out = 64, 1024, 16
x = Variable(torch.cuda.FloatTensor(N, D_in ).normal_()).half()
y = Variable(torch.cuda.FloatTensor(N, D_out).normal_()).half()
model = torch.nn.Linear(D_in, D_out).cuda().half()
optimizer = torch.optim.LBFGS(model.parameters())
### Construct FP16_Optimizer
optimizer = FP16_Optimizer(optimizer, static_loss_scale=128.0)
###
loss_fn = torch.nn.MSELoss()
for t in range(5):
    def closure():
        optimizer.zero_grad()
        y_pred = model(x)
        loss = loss_fn(y_pred.float(), y.float())
        ### Change loss.backward() within the closure to: ###
        optimizer.backward(loss)
        ###
        return loss
    loss = optimizer.step(closure)
print("final loss = ", loss)
model = torch.nn.Linear(D_in, D_out).cuda().half()
model = DDP(model)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
### Construct FP16_Optimizer ###
optimizer = FP16_Optimizer(optimizer)
###
loss_fn = torch.nn.MSELoss()
for t in range(500):
    optimizer.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred.float(), y.float())
    ### Change loss.backward() to: ###
    optimizer.backward(loss)
    ###
    optimizer.step()
model = torch.nn.parallel.DistributedDataParallel(model,
                                                  output_device=args.local_rank)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
### Construct FP16_Optimizer ###
optimizer = FP16_Optimizer(optimizer)
###
loss_fn = torch.nn.MSELoss()
for t in range(500):
    optimizer.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred.float(), y.float())
    ### Change loss.backward() to: ###
    optimizer.backward(loss)
    ###
    optimizer.step()
import torch
from torch.autograd import Variable
from apex.fp16_utils import FP16_Optimizer
torch.backends.cudnn.benchmark = True
N, D_in, D_out = 64, 1024, 16
x = Variable(torch.cuda.FloatTensor(N, D_in ).normal_()).half()
y = Variable(torch.cuda.FloatTensor(N, D_out).normal_()).half()
model = torch.nn.Linear(D_in, D_out).cuda().half()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
### Construct FP16_Optimizer with static loss scaling ###
optimizer = FP16_Optimizer(optimizer, static_loss_scale=128.0)
### ...or construct with dynamic loss scaling ###
# optimizer = FP16_Optimizer(optimizer,
#                            dynamic_loss_scale=True,
#                            dynamic_loss_args={'scale_factor' : 4})
### dynamic_loss_args is optional, for "power users," and unnecessary in most cases.
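### Note: dynamic loss scaling typically starts from a high scale, lowers the
### scale when inf/nan gradients are detected, and raises it again after a
### stretch of overflow-free iterations; 'scale_factor' above is the
### multiplier used for those adjustments.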
loss_fn = torch.nn.MSELoss()
for t in range(1000):
    optimizer.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred.float(), y.float())
    ### Change loss.backward() to: ###
    optimizer.backward(loss)
    ###
    optimizer.step()
print("final loss = ", loss)
## Contents:
**distributed**: Walkthrough of apex distributed data parallel utilities.
**FP16_Optimizer_simple**: Simple examples demonstrating various use cases of `FP16_Optimizer` to automatically manage master parameters and static or dynamic loss scaling.
**imagenet**: Example based on [https://github.com/pytorch/examples/tree/master/imagenet](https://github.com/pytorch/examples/tree/master/imagenet) showing the use of `FP16_Optimizer`, as well as manual management of master parameters and loss scaling for illustration/comparison.
**word_language_model**: Example based on [https://github.com/pytorch/examples/tree/master/word_language_model](https://github.com/pytorch/examples/tree/master/word_language_model) showing the use of `FP16_Optimizer`, as well as manual management of master parameters and loss scaling for illustration/comparison (a rough sketch of the manual pattern appears after this list).
**docker**: Example of a minimal Dockerfile that installs Apex on top of the Pytorch 0.4 container.
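The imagenet and word_language_model entries above mention manual management of master parameters and loss scaling. For orientation, here is a rough sketch of that manual pattern (what `FP16_Optimizer` automates); it is not taken from either example, and the layer sizes, iteration count, and loss scale are illustrative:
```python
import torch
from torch.autograd import Variable

N, D_in, D_out = 64, 1024, 16
x = Variable(torch.cuda.FloatTensor(N, D_in).normal_()).half()
y = Variable(torch.cuda.FloatTensor(N, D_out).normal_()).half()

model = torch.nn.Linear(D_in, D_out).cuda().half()
# Keep fp32 "master" copies of the fp16 parameters and step on the copies.
master_params = [p.detach().clone().float() for p in model.parameters()]
for p in master_params:
    p.requires_grad = True
optimizer = torch.optim.SGD(master_params, lr=1e-3)
loss_scale = 128.0
loss_fn = torch.nn.MSELoss()

for t in range(100):
    model.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred.float(), y.float())
    # Scale the loss so small fp16 gradients do not flush to zero.
    (loss * loss_scale).backward()
    # Unscale into the fp32 master gradients and step in fp32.
    for master, p in zip(master_params, model.parameters()):
        master.grad = p.grad.detach().float() / loss_scale
    optimizer.step()
    # Copy the updated fp32 master weights back into the fp16 model.
    with torch.no_grad():
        for master, p in zip(master_params, model.parameters()):
            p.copy_(master.half())
print("final loss = ", loss)
```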
# Base image must at least have nvcc and CUDA installed.
FROM gitlab-dl.nvidia.com:5005/dgx/pytorch:18.04-py3-devel
WORKDIR /workspace
# uninstall Apex if present
RUN pip uninstall -y apex || :
# SHA is something the user can alter to force recreation of this Docker layer,
# and therefore force cloning the latest version of Apex
RUN SHA=43f1ae08 git clone https://github.com/NVIDIA/apex.git
WORKDIR /workspace/apex
RUN python setup.py install
WORKDIR /workspace
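For reference, the image can be built and run in the usual way; the tag `apex-example` is arbitrary, and the `--runtime=nvidia` flag assumes nvidia-docker 2:
```bash
# Build the image and start an interactive container with GPU access.
docker build -t apex-example .
docker run --runtime=nvidia -it --rm apex-example
```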