"vscode:/vscode.git/clone" did not exist on "8fe65d5f560611f4f60b8ea549b6bf7e75e3ae7f"
Commit 254cad2d authored by Michael Carilli

Comprehensive imagenet example for the new API

parent 484292f0
```diff
@@ -170,15 +170,15 @@ def check_params_fp32(model):
         if param.type() != "torch.cuda.FloatTensor":
             print("Warning: Found param {} with type {}, expected torch.cuda.FloatTensor.\n"
                   "When using amp.initialize, you do not need to call .half() on your model\n"
-                  "before passing it, no matter what optimization level you choose.",
-                  name, param.type())
+                  "before passing it, no matter what optimization level you choose.".format(
+                  name, param.type()))
     for name, param in model.named_buffers():
         if param.type() != "torch.cuda.FloatTensor":
             print("Warning: Found buffer {} with type {}, expected torch.cuda.FloatTensor.\n"
                   "When using amp.initialize, you do not need to call .half() on your model\n"
-                  "before passing it, no matter what optimization level you choose.",
-                  name, param.type())
+                  "before passing it, no matter what optimization level you choose.".format(
+                  name, param.type()))
     # allow user to directly pass Properties struct as well?
```
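The change matters because `print` does not substitute extra positional arguments into `{}` placeholders; `str.format` does. A minimal demonstration (the parameter name and type here are illustrative):

```python
# Passing extra arguments to print() just prints them after the template,
# leaving the {} placeholders untouched:
print("Found param {} with type {}", "fc.weight", "torch.cuda.HalfTensor")
# -> Found param {} with type {} fc.weight torch.cuda.HalfTensor

# str.format() substitutes the values into the placeholders as intended:
print("Found param {} with type {}".format("fc.weight", "torch.cuda.HalfTensor"))
# -> Found param fc.weight with type torch.cuda.HalfTensor
```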
**distributed_data_parallel.py** and **run.sh** show an example using `FP16_Optimizer` with
`apex.parallel.DistributedDataParallel` in conjuction with the legacy Apex
launcher script, `apex.parallel.multiproc`. See
[FP16_Optimizer_simple/distributed_apex](https://github.com/NVIDIA/apex/tree/torch_launcher/examples/FP16_Optimizer_simple/distributed_apex) for a more up-to-date example that uses the Pytorch launcher
script, `torch.distributed.launch`.
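For reference, launching with the PyTorch launcher looks roughly like the sketch below; `your_script.py` is a placeholder for a script that parses the `--local_rank` argument the launcher passes to each spawned process:

```bash
# Spawn one process per GPU (two here); the launcher appends --local_rank=<n>
# to the arguments of each copy of the script.
python -m torch.distributed.launch --nproc_per_node=2 your_script.py
```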
Usage of `FP16_Optimizer` in a distributed run does not need to change from ordinary
single-process usage. Test via
```bash
bash run.sh
```
**distributed_data_parallel.py**:
```python
import torch
import argparse
from apex.parallel import DistributedDataParallel as DDP
from apex.fp16_utils import FP16_Optimizer

parser = argparse.ArgumentParser()
parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str,
                    help='url used to set up distributed training')
parser.add_argument('--world-size', default=2, type=int,
                    help='Number of distributed processes.')
parser.add_argument("--rank", type=int,
                    help='Rank of this process')
args = parser.parse_args()

torch.cuda.set_device(args.rank)
torch.distributed.init_process_group(backend='nccl',
                                     init_method=args.dist_url,
                                     world_size=args.world_size,
                                     rank=args.rank)

torch.backends.cudnn.benchmark = True

N, D_in, D_out = 64, 1024, 16

x = torch.randn(N, D_in, device='cuda', dtype=torch.half)
y = torch.randn(N, D_out, device='cuda', dtype=torch.half)

model = torch.nn.Linear(D_in, D_out).cuda().half()
model = DDP(model)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
### Construct FP16_Optimizer ###
optimizer = FP16_Optimizer(optimizer)
###

loss_fn = torch.nn.MSELoss()

for t in range(500):
    optimizer.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred.float(), y.float())
    ### Change loss.backward() to: ###
    optimizer.backward(loss)
    ###
    optimizer.step()

print("final loss = ", loss)
```
**run.sh**:
```bash
#!/bin/bash
# By default, apex.parallel.multiproc will attempt to use all available GPUs on the system.
# The number of GPUs to use can be limited by setting CUDA_VISIBLE_DEVICES:
export CUDA_VISIBLE_DEVICES=0,1
python -m apex.parallel.multiproc distributed_data_parallel.py
```
## Contents:

This directory contains examples illustrating Apex mixed precision and distributed tools.

**distributed**: Walkthrough of apex distributed data parallel utilities.

**FP16_Optimizer_simple**: Simple examples demonstrating various use cases of `FP16_Optimizer` to automatically manage master parameters and static or dynamic loss scaling (see the sketch after this list).

**imagenet**: Example based on [https://github.com/pytorch/examples/tree/master/imagenet](https://github.com/pytorch/examples/tree/master/imagenet) showing the use of `FP16_Optimizer`, as well as manual management of master parameters and loss scaling for illustration/comparison.

**word_language_model**: Example based on [https://github.com/pytorch/examples/tree/master/word_language_model](https://github.com/pytorch/examples/tree/master/word_language_model) showing the use of `FP16_Optimizer`, as well as manual management of master parameters and loss scaling for illustration/comparison.

**docker**: Example of a minimal Dockerfile that installs Apex on top of an existing container.
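As referenced above, a minimal sketch of the two loss-scaling modes; the model and base optimizer here are illustrative:

```python
import torch
from apex.fp16_utils import FP16_Optimizer

model = torch.nn.Linear(1024, 16).cuda().half()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Static loss scaling: gradients are scaled by a fixed constant.
optimizer = FP16_Optimizer(optimizer, static_loss_scale=128.0)

# Alternatively, dynamic loss scaling: the scale is raised and lowered
# automatically as gradient overflows are (or are not) detected.
# optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True)
```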
**Note for users of the pre-unification API**:
`deprecated_api` contains examples illustrating the old (pre-unified) APIs. These APIs will be removed soon, and users are strongly encouraged to switch. The separate mixed precision tools called `Amp` and `FP16_Optimizer` in the old API are exposed via different flags/optimization levels in the new API.
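For orientation, a minimal sketch of the unified API: `amp.initialize` takes an `opt_level` string (`"O0"` through `"O3"`) that selects the mixed precision strategy, and `amp.scale_loss` handles loss scaling. This is a simplified illustration, not a drop-in replacement for the examples above:

```python
import torch
from apex import amp

model = torch.nn.Linear(1024, 16).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# amp.initialize patches the model and optimizer according to opt_level;
# "O1" inserts casts around functions, "O2" keeps FP32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

loss_fn = torch.nn.MSELoss()
x = torch.randn(64, 1024, device='cuda')
y = torch.randn(64, 16, device='cuda')

for t in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    # scale_loss applies the loss scaling appropriate for the chosen opt_level:
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```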
**Dockerfile**:
```diff
 # Base image must at least have pytorch and CUDA installed.
-ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:18.12-py3
+ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:19.01-py3
 FROM $BASE_IMAGE
 ARG BASE_IMAGE
 RUN echo "Installing Apex on top of ${BASE_IMAGE}"
@@ -10,5 +10,5 @@ RUN pip uninstall -y apex || :
 # and therefore force cloning of the latest version of Apex
 RUN SHA=ToUcHMe git clone https://github.com/NVIDIA/apex.git
 WORKDIR /workspace/apex
-RUN python setup.py install
+RUN python setup.py install --cuda_ext --cpp_ext
 WORKDIR /workspace
```
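A sketch of building the image; the tag `apex:latest` is an arbitrary choice, and `BASE_IMAGE` is the build argument declared in the Dockerfile above:

```bash
# Build from the directory containing the Dockerfile, overriding the base image:
docker build -t apex:latest --build-arg BASE_IMAGE=nvcr.io/nvidia/pytorch:19.01-py3 .
```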