"vscode:/vscode.git/clone" did not exist on "8fe65d5f560611f4f60b8ea549b6bf7e75e3ae7f"
Commit 254cad2d authored by Michael Carilli

Comprehensive imagenet example for the new API

parent 484292f0
```diff
@@ -170,15 +170,15 @@ def check_params_fp32(model):
         if param.type() != "torch.cuda.FloatTensor":
             print("Warning: Found param {} with type {}, expected torch.cuda.FloatTensor.\n"
                   "When using amp.initialize, you do not need to call .half() on your model\n"
-                  "before passing it, no matter what optimization level you choose.",
-                  name, param.type())
+                  "before passing it, no matter what optimization level you choose.".format(
+                  name, param.type()))
     for name, param in model.named_buffers():
         if param.type() != "torch.cuda.FloatTensor":
             print("Warning: Found buffer {} with type {}, expected torch.cuda.FloatTensor.\n"
                   "When using amp.initialize, you do not need to call .half() on your model\n"
-                  "before passing it, no matter what optimization level you choose.",
-                  name, param.type())
+                  "before passing it, no matter what optimization level you choose.".format(
+                  name, param.type()))
     # allow user to directly pass Properties struct as well?
```
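The change matters because `print` does not substitute extra positional arguments into `{}` placeholders; `str.format` does. A minimal demonstration (the parameter name and type here are illustrative):

```python
# Passing extra arguments to print() just prints them after the template,
# leaving the {} placeholders untouched:
print("Found param {} with type {}", "fc.weight", "torch.cuda.HalfTensor")
# -> Found param {} with type {} fc.weight torch.cuda.HalfTensor

# str.format() substitutes the values into the placeholders as intended:
print("Found param {} with type {}".format("fc.weight", "torch.cuda.HalfTensor"))
# -> Found param fc.weight with type torch.cuda.HalfTensor
```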
**distributed_data_parallel.py** and **run.sh** show an example using `FP16_Optimizer` with
`apex.parallel.DistributedDataParallel` in conjuction with the legacy Apex
launcher script, `apex.parallel.multiproc`. See
[FP16_Optimizer_simple/distributed_apex](https://github.com/NVIDIA/apex/tree/torch_launcher/examples/FP16_Optimizer_simple/distributed_apex) for a more up-to-date example that uses the Pytorch launcher
script, `torch.distributed.launch`.
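For reference, launching with the PyTorch launcher looks roughly like the sketch below; `your_script.py` is a placeholder for a script that parses the `--local_rank` argument the launcher passes to each spawned process:

```bash
# Spawn one process per GPU (two here); the launcher appends --local_rank=<n>
# to the arguments of each copy of the script.
python -m torch.distributed.launch --nproc_per_node=2 your_script.py
```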
Usage of `FP16_Optimizer` in a distributed run does not need to change from ordinary
single-process usage. Test via
```bash
bash run.sh
```
**distributed_data_parallel.py**:
```python
import torch
import argparse
from apex.parallel import DistributedDataParallel as DDP
from apex.fp16_utils import FP16_Optimizer

parser = argparse.ArgumentParser()
parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str,
                    help='url used to set up distributed training')
parser.add_argument('--world-size', default=2, type=int,
                    help='Number of distributed processes.')
parser.add_argument("--rank", type=int,
                    help='Rank of this process')
args = parser.parse_args()

torch.cuda.set_device(args.rank)
torch.distributed.init_process_group(backend='nccl',
                                     init_method=args.dist_url,
                                     world_size=args.world_size,
                                     rank=args.rank)

torch.backends.cudnn.benchmark = True

N, D_in, D_out = 64, 1024, 16

x = torch.randn(N, D_in, device='cuda', dtype=torch.half)
y = torch.randn(N, D_out, device='cuda', dtype=torch.half)

model = torch.nn.Linear(D_in, D_out).cuda().half()
model = DDP(model)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
### Construct FP16_Optimizer ###
optimizer = FP16_Optimizer(optimizer)
###

loss_fn = torch.nn.MSELoss()

for t in range(500):
    optimizer.zero_grad()
    y_pred = model(x)
    loss = loss_fn(y_pred.float(), y.float())
    ### Change loss.backward() to: ###
    optimizer.backward(loss)
    ###
    optimizer.step()

print("final loss = ", loss)
```
**run.sh**:
```bash
#!/bin/bash
# By default, apex.parallel.multiproc will attempt to use all available GPUs on the system.
# The number of GPUs to use can be limited by setting CUDA_VISIBLE_DEVICES:
export CUDA_VISIBLE_DEVICES=0,1
python -m apex.parallel.multiproc distributed_data_parallel.py
```
## Contents:

This directory contains examples illustrating Apex mixed precision and distributed tools.

**distributed**: Walkthrough of apex distributed data parallel utilities.

**FP16_Optimizer_simple**: Simple examples demonstrating various use cases of `FP16_Optimizer` to automatically manage master parameters and static or dynamic loss scaling (see the sketch after this list).

**imagenet**: Example based on [https://github.com/pytorch/examples/tree/master/imagenet](https://github.com/pytorch/examples/tree/master/imagenet) showing the use of `FP16_Optimizer`, as well as manual management of master parameters and loss scaling for illustration/comparison.

**word_language_model**: Example based on [https://github.com/pytorch/examples/tree/master/word_language_model](https://github.com/pytorch/examples/tree/master/word_language_model) showing the use of `FP16_Optimizer`, as well as manual management of master parameters and loss scaling for illustration/comparison.

**docker**: Example of a minimal Dockerfile that installs Apex on top of an existing container.
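As referenced above, a minimal sketch of the two loss-scaling modes; the model and base optimizer here are illustrative:

```python
import torch
from apex.fp16_utils import FP16_Optimizer

model = torch.nn.Linear(1024, 16).cuda().half()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Static loss scaling: gradients are scaled by a fixed constant.
optimizer = FP16_Optimizer(optimizer, static_loss_scale=128.0)

# Alternatively, dynamic loss scaling: the scale is raised and lowered
# automatically as gradient overflows are (or are not) detected.
# optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True)
```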
**Note for users of the pre-unification API**:
`deprecated_api` contains examples illustrating the old (pre-unified) APIs. These APIs will be removed soon, and users are strongly encouraged to switch. The separate mixed precision tools called `Amp` and `FP16_Optimizer` in the old API are exposed via different flags/optimization levels in the new API.
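For orientation, a minimal sketch of the unified API: `amp.initialize` takes an `opt_level` string (`"O0"` through `"O3"`) that selects the mixed precision strategy, and `amp.scale_loss` handles loss scaling. This is a simplified illustration, not a drop-in replacement for the examples above:

```python
import torch
from apex import amp

model = torch.nn.Linear(1024, 16).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# amp.initialize patches the model and optimizer according to opt_level;
# "O1" inserts casts around functions, "O2" keeps FP32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

loss_fn = torch.nn.MSELoss()
x = torch.randn(64, 1024, device='cuda')
y = torch.randn(64, 16, device='cuda')

for t in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    # scale_loss applies the loss scaling appropriate for the chosen opt_level:
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```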
**Dockerfile**:
```diff
 # Base image must at least have pytorch and CUDA installed.
-ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:18.12-py3
+ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:19.01-py3
 FROM $BASE_IMAGE
 ARG BASE_IMAGE
 RUN echo "Installing Apex on top of ${BASE_IMAGE}"
@@ -10,5 +10,5 @@ RUN pip uninstall -y apex || :
 # and therefore force cloning of the latest version of Apex
 RUN SHA=ToUcHMe git clone https://github.com/NVIDIA/apex.git
 WORKDIR /workspace/apex
-RUN python setup.py install
+RUN python setup.py install --cuda_ext --cpp_ext
 WORKDIR /workspace
```
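A sketch of building the image; the tag `apex:latest` is an arbitrary choice, and `BASE_IMAGE` is the build argument declared in the Dockerfile above:

```bash
# Build from the directory containing the Dockerfile, overriding the base image:
docker build -t apex:latest --build-arg BASE_IMAGE=nvcr.io/nvidia/pytorch:19.01-py3 .
```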