Commit f5790a1e authored by Michael Carilli

Adding closure and minimal examples

parent 71671993
# Simple examples of FP16_Optimizer functionality

`minimal.py` shows the basic usage of `FP16_Optimizer` with either static or dynamic loss scaling. Test via
```bash
python minimal.py
```
`FP16_Optimizer` supports closures with the same control flow as ordinary Pytorch optimizers.
`closure.py` shows an example. Test via
```bash
python closure.py
```
See [the API documentation](https://nvidia.github.io/apex/fp16_utils.html#apex.fp16_utils.FP16_Optimizer.step) for more details.
`FP16_Optimizer` also supports checkpointing with the same control flow as ordinary Pytorch
optimizers. `save_load.py` shows an example. Test via
```bash
python save_load.py
```
See [the API documentation](https://nvidia.github.io/apex/fp16_utils.html#apex.fp16_utils.FP16_Optimizer.load_state_dict) for more details.
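For reference, the checkpointing pattern exercised by `save_load.py` looks roughly like the sketch below. This is a minimal illustration rather than the exact script; the checkpoint filename and the point in training at which you save are placeholders.
```python
# Save: FP16_Optimizer exposes state_dict() like an ordinary Pytorch optimizer.
checkpoint = {'model': model.state_dict(),
              'optimizer': optimizer.state_dict()}
torch.save(checkpoint, 'checkpoint.pt')

# Load: rebuild the model and re-wrap the optimizer in FP16_Optimizer first,
# then restore both states.
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
```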
**distributed_pytorch** shows an example using `FP16_Optimizer` with Pytorch DistributedDataParallel.
The usage of `FP16_Optimizer` with distributed does not need to change from ordinary single-process
usage. Run via
```bash
cd distributed_pytorch
bash run.sh
```
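The launch script itself is not reproduced in this commit; for the Pytorch DistributedDataParallel example, `run.sh` would typically wrap a `torch.distributed.launch` invocation along the lines of the sketch below (the GPU count and script name are assumptions):
```bash
# Hypothetical launch command; set --nproc_per_node to the number of local GPUs.
python -m torch.distributed.launch --nproc_per_node=2 main.py
```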
**distributed_apex** shows an example using `FP16_Optimizer` with Apex DistributedDataParallel.
Again, the usage of `FP16_Optimizer` with distributed does not need to change from ordinary
single-process usage. Run via
```bash
cd distributed_apex
bash run.sh
```
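The main model-side change relative to the single-process scripts is wrapping the network in Apex's DistributedDataParallel; a minimal sketch (assuming the fp16 model built as in the examples above) is:
```python
from apex.parallel import DistributedDataParallel as DDP

# Wrap the half-precision model; FP16_Optimizer usage stays exactly the same.
model = DDP(model)
```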
import torch
from torch.autograd import Variable
from apex.fp16_utils import FP16_Optimizer
torch.backends.cudnn.benchmark = True
N, D_in, D_out = 64, 1024, 16
x = Variable(torch.cuda.FloatTensor(N, D_in ).normal_()).half()
y = Variable(torch.cuda.FloatTensor(N, D_out).normal_()).half()
model = torch.nn.Linear(D_in, D_out).cuda().half()
optimizer = torch.optim.LBFGS(model.parameters())
### Construct FP16_Optimizer
optimizer = FP16_Optimizer(optimizer, static_loss_scale=128.0)
###
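### As in minimal.py, dynamic loss scaling could be used here instead of a static scale
### (illustrative alternative, not part of the original script):
# optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True)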
loss_fn = torch.nn.MSELoss()
for t in range(5):
    def closure():
        optimizer.zero_grad()
        y_pred = model(x)
        loss = loss_fn(y_pred.float(), y.float())
        ### Change loss.backward() within the closure to: ###
        optimizer.backward(loss)
        ###
        return loss
    loss = optimizer.step(closure)
print("final loss = ", loss)
model = torch.nn.Linear(D_in, D_out).cuda().half()
model = DDP(model)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
### Construct FP16_Optimizer ###
optimizer = FP16_Optimizer(optimizer)
###

loss_fn = torch.nn.MSELoss()

for t in range(500):
    optimizer.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred.float(), y.float())
    ### Change loss.backward() to: ###
    optimizer.backward(loss)
    ###
    optimizer.step()
model = torch.nn.parallel.DistributedDataParallel(model,
                                                  output_device=args.local_rank)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
### Construct FP16_Optimizer ###
optimizer = FP16_Optimizer(optimizer)
###

loss_fn = torch.nn.MSELoss()

for t in range(500):
    optimizer.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred.float(), y.float())
    ### Change loss.backward() to: ###
    optimizer.backward(loss)
    ###
    optimizer.step()
import torch
from torch.autograd import Variable
from apex.fp16_utils import FP16_Optimizer
torch.backends.cudnn.benchmark = True
N, D_in, D_out = 64, 1024, 16
x = Variable(torch.cuda.FloatTensor(N, D_in ).normal_()).half()
y = Variable(torch.cuda.FloatTensor(N, D_out).normal_()).half()
model = torch.nn.Linear(D_in, D_out).cuda().half()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
### Construct FP16_Optimizer with static loss scaling ###
optimizer = FP16_Optimizer(optimizer, static_loss_scale=128.0)
### ...or construct with dynamic loss scaling ###
# optimizer = FP16_Optimizer(optimizer,
# dynamic_loss_scale=True,
# dynamic_loss_args={'scale_factor' : 4})
### dynamic_loss_args is optional, for "power users," and unnecessary in most cases.
loss_fn = torch.nn.MSELoss()
for t in range(1000):
    optimizer.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred.float(), y.float())
    ### Change loss.backward() to: ###
    optimizer.backward(loss)
    ###
    optimizer.step()
print("final loss = ", loss)
## Contents:

**distributed**: Walkthrough of apex distributed data parallel utilities.

**FP16_Optimizer_simple**: Simple examples demonstrating various use cases of `FP16_Optimizer` to automatically manage master parameters and static or dynamic loss scaling.

**imagenet**: Example based on [https://github.com/pytorch/examples/tree/master/imagenet](https://github.com/pytorch/examples/tree/master/imagenet) showing the use of `FP16_Optimizer`, as well as manual management of master parameters and loss scaling for illustration/comparison.

**word_language_model**: Example based on [https://github.com/pytorch/examples/tree/master/word_language_model](https://github.com/pytorch/examples/tree/master/word_language_model) showing the use of `FP16_Optimizer`, as well as manual management of master parameters and loss scaling for illustration/comparison.

**docker**: Example of a minimal Dockerfile that installs Apex on top of the Pytorch 0.4 container.
# Base image must at least have nvcc and CUDA installed.
FROM gitlab-dl.nvidia.com:5005/dgx/pytorch:18.04-py3-devel
WORKDIR /workspace
# uninstall Apex if present
RUN pip uninstall -y apex || :
# SHA is something the user can alter to force recreation of this Docker layer,
# and therefore force cloning the latest version of Apex
RUN SHA=43f1ae08 git clone https://github.com/NVIDIA/apex.git
WORKDIR /workspace/apex
RUN python setup.py install
WORKDIR /workspace
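A typical way to build and run this image would be something like the commands below. The image tag is arbitrary, and the run command assumes `nvidia-docker` (or an equivalent GPU-enabled runtime) is installed.
```bash
# Build the image from the directory containing this Dockerfile; the tag is illustrative.
docker build -t apex-examples .
# Run with GPU access (assumes nvidia-docker; newer Docker releases would use --gpus all).
nvidia-docker run --rm -it apex-examples bash
```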