Commit f5790a1e authored by Michael Carilli

Adding closure and minimal examples

parent 71671993
# Simple examples of FP16_Optimizer functionality
`minimal.py` shows the basic usage of `FP16_Optimizer` with either static or dynamic loss scaling. Test via
```bash
python minimal.py
```
`FP16_Optimizer` supports closures with the same control flow as ordinary Pytorch optimizers.
`closure.py` shows an example. Test via
```bash
python closure.py
```
See [the API documentation](https://nvidia.github.io/apex/fp16_utils.html#apex.fp16_utils.FP16_Optimizer.step) for more details.
`FP16_Optimizer` also supports checkpointing with the same control flow as ordinary Pytorch optimizers.
`save_load.py` shows an example. Test via
```bash
python save_load.py
```
See [the API documentation](https://nvidia.github.io/apex/fp16_utils.html#apex.fp16_utils.FP16_Optimizer.load_state_dict) for more details.
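This is not the contents of `save_load.py`, just a minimal sketch of the checkpointing round trip described above, assuming the usual `state_dict()`/`load_state_dict()` calls (`load_state_dict` is the method documented at the link above); the layer sizes and checkpoint filename are illustrative:
```python
import torch
from apex.fp16_utils import FP16_Optimizer

model = torch.nn.Linear(1024, 16).cuda().half()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
optimizer = FP16_Optimizer(optimizer, static_loss_scale=128.0)

# Save with the ordinary Pytorch syntax...
checkpoint = {'model': model.state_dict(),
              'optimizer': optimizer.state_dict()}
torch.save(checkpoint, 'checkpoint.pt')  # illustrative filename

# ...and restore both the model and the FP16_Optimizer state.
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
```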
**distributed_pytorch** shows an example using `FP16_Optimizer` with Pytorch DistributedDataParallel.
Using `FP16_Optimizer` with distributed training requires no changes from ordinary single-process
usage. Run via
```bash
cd distributed_pytorch
bash run.sh
```
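`run.sh` itself is not shown here. Since the Pytorch DDP example below reads `args.local_rank`, which `torch.distributed.launch` supplies, the script presumably wraps a launcher invocation roughly like the following; the training-script name and process count are assumptions, not taken from the example:
```bash
# Hypothetical launch command (script name and process count are illustrative).
python -m torch.distributed.launch --nproc_per_node=2 distributed_data_parallel.py
```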
**distributed_pytorch** shows an example using `FP16_Optimizer` with Apex DistributedDataParallel.
Again, using `FP16_Optimizer` with distributed training requires no changes from ordinary
single-process usage. Run via
```bash
bash run.sh
```
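The main difference between the two distributed examples is how the model is wrapped. Below is a minimal sketch of the Apex wrapping step, assuming the process group is initialized the same way as in the Pytorch example; everything except the `apex.parallel` import and the bare `DDP(model)` call (which appears in the diff further down) is illustrative:
```python
import torch
import torch.distributed as dist
from apex.parallel import DistributedDataParallel as DDP

# Illustrative setup; in the real example this is driven by run.sh.
dist.init_process_group(backend='nccl', init_method='env://')

model = torch.nn.Linear(1024, 16).cuda().half()
# Apex DDP is wrapped without the device_ids/output_device arguments
# that the torch.nn.parallel version passes explicitly.
model = DDP(model)
```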
import torch
from torch.autograd import Variable
from apex.fp16_utils import FP16_Optimizer
torch.backends.cudnn.benchmark = True
N, D_in, D_out = 64, 1024, 16
x = Variable(torch.cuda.FloatTensor(N, D_in ).normal_()).half()
y = Variable(torch.cuda.FloatTensor(N, D_out).normal_()).half()
model = torch.nn.Linear(D_in, D_out).cuda().half()
optimizer = torch.optim.LBFGS(model.parameters())
### Construct FP16_Optimizer
optimizer = FP16_Optimizer(optimizer, static_loss_scale=128.0)
###
loss_fn = torch.nn.MSELoss()
for t in range(5):
    def closure():
        optimizer.zero_grad()
        y_pred = model(x)
        loss = loss_fn(y_pred.float(), y.float())
        ### Change loss.backward() within the closure to: ###
        optimizer.backward(loss)
        ###
        return loss
    loss = optimizer.step(closure)
print("final loss = ", loss)
model = torch.nn.Linear(D_in, D_out).cuda().half()
model = DDP(model)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
### Construct FP16_Optimizer ###
optimizer = FP16_Optimizer(optimizer)
###
loss_fn = torch.nn.MSELoss()
for t in range(500):
    optimizer.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred.float(), y.float())
    ### Change loss.backward() to: ###
    optimizer.backward(loss)
    ###
    optimizer.step()
model = torch.nn.parallel.DistributedDataParallel(model,
                                                  output_device=args.local_rank)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
### Construct FP16_Optimizer ###
optimizer = FP16_Optimizer(optimizer)
###
loss_fn = torch.nn.MSELoss()
for t in range(500):
    optimizer.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred.float(), y.float())
    ### Change loss.backward() to: ###
    optimizer.backward(loss)
    ###
    optimizer.step()
import torch
from torch.autograd import Variable
from apex.fp16_utils import FP16_Optimizer
torch.backends.cudnn.benchmark = True
N, D_in, D_out = 64, 1024, 16
x = Variable(torch.cuda.FloatTensor(N, D_in ).normal_()).half()
y = Variable(torch.cuda.FloatTensor(N, D_out).normal_()).half()
model = torch.nn.Linear(D_in, D_out).cuda().half()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
### Construct FP16_Optimizer with static loss scaling ###
optimizer = FP16_Optimizer(optimizer, static_loss_scale=128.0)
### ...or construct with dynamic loss scaling ###
# optimizer = FP16_Optimizer(optimizer,
#                            dynamic_loss_scale=True,
#                            dynamic_loss_args={'scale_factor' : 4})
### dynamic_loss_args is optional, for "power users," and unnecessary in most cases.
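### Note: dynamic loss scaling typically starts from a high scale, lowers the
### scale when inf/nan gradients are detected, and raises it again after a
### stretch of overflow-free iterations; 'scale_factor' above is the
### multiplier used for those adjustments.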
loss_fn = torch.nn.MSELoss()
for t in range(1000):
    optimizer.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred.float(), y.float())
    ### Change loss.backward() to: ###
    optimizer.backward(loss)
    ###
    optimizer.step()
print("final loss = ", loss)
## Contents:
**distributed**: Walkthrough of apex distributed data parallel utilities.
**FP16_Optimizer_simple**: Simple examples demonstrating various use cases of `FP16_Optimizer` to automatically manage master parameters and static or dynamic loss scaling.
**imagenet**: Example based on [https://github.com/pytorch/examples/tree/master/imagenet](https://github.com/pytorch/examples/tree/master/imagenet) showing the use of `FP16_Optimizer`, as well as manual management of master parameters and loss scaling for illustration/comparison.
**word_language_model**: Example based on [https://github.com/pytorch/examples/tree/master/word_language_model](https://github.com/pytorch/examples/tree/master/word_language_model) showing the use of `FP16_Optimizer`, as well as manual management of master parameters and loss scaling for illustration/comparison (a rough sketch of the manual pattern appears after this list).
**docker**: Example of a minimal Dockerfile that installs Apex on top of the Pytorch 0.4 container.
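The imagenet and word_language_model entries above mention manual management of master parameters and loss scaling. For orientation, here is a rough sketch of that manual pattern (what `FP16_Optimizer` automates); it is not taken from either example, and the layer sizes, iteration count, and loss scale are illustrative:
```python
import torch
from torch.autograd import Variable

N, D_in, D_out = 64, 1024, 16
x = Variable(torch.cuda.FloatTensor(N, D_in).normal_()).half()
y = Variable(torch.cuda.FloatTensor(N, D_out).normal_()).half()

model = torch.nn.Linear(D_in, D_out).cuda().half()
# Keep fp32 "master" copies of the fp16 parameters and step on the copies.
master_params = [p.detach().clone().float() for p in model.parameters()]
for p in master_params:
    p.requires_grad = True
optimizer = torch.optim.SGD(master_params, lr=1e-3)
loss_scale = 128.0
loss_fn = torch.nn.MSELoss()

for t in range(100):
    model.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred.float(), y.float())
    # Scale the loss so small fp16 gradients do not flush to zero.
    (loss * loss_scale).backward()
    # Unscale into the fp32 master gradients and step in fp32.
    for master, p in zip(master_params, model.parameters()):
        master.grad = p.grad.detach().float() / loss_scale
    optimizer.step()
    # Copy the updated fp32 master weights back into the fp16 model.
    with torch.no_grad():
        for master, p in zip(master_params, model.parameters()):
            p.copy_(master.half())
print("final loss = ", loss)
```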
# Base image must at least have nvcc and CUDA installed.
FROM gitlab-dl.nvidia.com:5005/dgx/pytorch:18.04-py3-devel
WORKDIR /workspace
# uninstall Apex if present
RUN pip uninstall -y apex || :
# SHA is something the user can alter to force recreation of this Docker layer,
# and therefore force cloning the latest version of Apex
RUN SHA=43f1ae08 git clone https://github.com/NVIDIA/apex.git
WORKDIR /workspace/apex
RUN python setup.py install
WORKDIR /workspace
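For reference, the image can be built and run in the usual way; the tag `apex-example` is arbitrary, and the `--runtime=nvidia` flag assumes nvidia-docker 2:
```bash
# Build the image and start an interactive container with GPU access.
docker build -t apex-example .
docker run --runtime=nvidia -it --rm apex-example
```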