Commit b90b0570 authored by Michael Carilli

Minor typos

parent a3dbea38
@@ -30,7 +30,7 @@ CPU data loading bottlenecks.
`O0` and `O3` can be told to use loss scaling via manual overrides, but using loss scaling with `O0`
(pure FP32 training) does not really make sense, and will trigger a warning.
- Softlink training and validation dataset into current directory
+ Softlink training and validation dataset into current directory:
```
$ ln -sf /data/imagenet/train-jpeg/ train
$ ln -sf /data/imagenet/val-jpeg/ val
@@ -42,7 +42,7 @@ Amp enables easy experimentation with various pure and mixed precision options.
```
$ python main_amp.py -a resnet50 --b 128 --workers 4 --opt-level O0 ./
$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 ./
- $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-FP32 True ./
+ $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-fp32 True ./
$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 --loss-scale 128.0 ./
$ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
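The `--loss-scale 128.0` override in the commands above maps onto the `loss_scale` argument of `amp.initialize`. A minimal sketch of that programmatic equivalent (toy model, assuming apex and a CUDA device are available):

```python
import torch
from apex import amp

# Toy stand-in for the script's ResNet-50.
model = torch.nn.Linear(1024, 1000).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Static loss scale of 128.0, as with `--loss-scale 128.0`; leaving loss_scale
# unset (or passing "dynamic") lets amp manage the scale automatically.
model, optimizer = amp.initialize(model, optimizer,
                                  opt_level="O1", loss_scale=128.0)
```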
@@ -64,7 +64,7 @@ $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 ./
```
FP16 training with FP32 batchnorm:
```
- $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-FP32 True ./
+ $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-fp32 True ./
```
Keeping the batchnorms in FP32 improves stability and allows Pytorch
to use cudnn batchnorms, which significantly increases speed in Resnet50.
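The flag corrected in this hunk corresponds to the `keep_batchnorm_fp32` keyword of `amp.initialize`. A minimal sketch of the `O3` + FP32-batchnorm combination (toy model, assuming apex and a CUDA device are available):

```python
import torch
from apex import amp

# Toy convolutional stack standing in for ResNet-50.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3), torch.nn.BatchNorm2d(64), torch.nn.ReLU()
).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# O3 casts the model to FP16; keep_batchnorm_fp32=True leaves the batchnorm
# layers in FP32 so the cudnn batchnorm kernels can still be used.
model, optimizer = amp.initialize(model, optimizer,
                                  opt_level="O3", keep_batchnorm_fp32=True)
```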
@@ -72,8 +72,8 @@ to use cudnn batchnorms, which significantly increases speed in Resnet50.
The `O3` options might not converge, because they are not true mixed precision.
However, they can be useful to establish "speed of light" performance for
your model, which provides a baseline for comparison with `O1` and `O2`.
- For Resnet50 in particular, `--opt-level O3 --keep-batchnorm-FP32 True` establishes
+ For Resnet50 in particular, `--opt-level O3 --keep-batchnorm-fp32 True` establishes
- the "speed of light." (Without `--keep-batchnorm-FP32`, it's slower, because it does
+ the "speed of light." (Without `--keep-batchnorm-fp32`, it's slower, because it does
not use cudnn batchnorm.)
#### `--opt-level O1` ("conservative mixed precision")
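Under `O1`, the training step wraps `backward()` in `amp.scale_loss`. A self-contained sketch with a toy model and random data (not the script's actual ResNet-50/ImageNet loop):

```python
import torch
from apex import amp

model = torch.nn.Linear(1024, 1000).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss().cuda()
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

for step in range(10):
    input = torch.randn(32, 1024, device="cuda")
    target = torch.randint(0, 1000, (32,), device="cuda")
    optimizer.zero_grad()
    loss = criterion(model(input), target)
    # Gradients are computed against the scaled loss and unscaled by amp
    # before optimizer.step(), which is what dynamic loss scaling requires.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```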
@@ -95,15 +95,10 @@ def fast_collate(batch):
best_prec1 = 0
args = parser.parse_args()
- # Let multi_tensor_applier be the canary in the coalmine
- # that verifies if the backend is what we think it is
- assert multi_tensor_applier.available == args.has_ext
print("opt_level = {}".format(args.opt_level))
print("keep_batchnorm_fp32 = {}".format(args.keep_batchnorm_fp32), type(args.keep_batchnorm_fp32))
print("loss_scale = {}".format(args.loss_scale), type(args.loss_scale))
print("\nCUDNN VERSION: {}\n".format(torch.backends.cudnn.version()))
if args.deterministic:
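The deleted assert used `multi_tensor_applier.available` as a canary for whether apex was built with its fused C++/CUDA extensions. A small sketch of checking that availability without hard-failing (the `args.has_ext` flag it was compared against is specific to this script):

```python
from apex.multi_tensor_apply import multi_tensor_applier

# .available is True only when apex was installed with its C++/CUDA extensions.
if multi_tensor_applier.available:
    print("apex fused multi-tensor kernels available")
else:
    print("apex built without CUDA extensions; fused kernels disabled")
```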
@@ -342,8 +337,8 @@ def train(train_loader, model, criterion, optimizer, epoch):
input, target = prefetcher.next()
if i%args.print_freq == 0:
- # Every print_freq iterations, let's check the accuracy and speed.
+ # Every print_freq iterations, check the loss accuracy and speed.
- # For best performance, it doesn't make sense to collect these metrics every
+ # For best performance, it doesn't make sense to print these metrics every
# iteration, since they incur an allreduce and some host<->device syncs.
# Measure accuracy
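The comments touched in this hunk explain why metrics are only gathered every `print_freq` iterations: each measurement costs an allreduce and host<->device syncs. A rough sketch of that gating pattern, using placeholder names rather than the script's actual variables:

```python
import torch
import torch.distributed as dist

def maybe_log(i, print_freq, output, target, loss):
    """Only pay for the allreduce and the .item() syncs every print_freq iterations."""
    if i % print_freq != 0:
        return
    with torch.no_grad():
        acc1 = (output.argmax(dim=1) == target).float().mean()
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(acc1)            # cross-rank synchronization (allreduce)
        acc1 /= dist.get_world_size()
    # .item() forces a device->host copy, another reason not to log every step.
    print("iter {}: loss {:.4f}  top-1 {:.3f}".format(i, loss.item(), acc1.item()))
```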
@@ -374,8 +369,8 @@ def train(train_loader, model, criterion, optimizer, epoch):
'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t'
'Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format(
epoch, i, len(train_loader),
- args.print_freq*args.world_size*args.batch_size/batch_time.val,
+ args.world_size*args.batch_size/batch_time.val,
- args.print_freq*args.world_size*args.batch_size/batch_time.avg,
+ args.world_size*args.batch_size/batch_time.avg,
batch_time=batch_time,
loss=losses, top1=top1, top5=top5))
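The change in this hunk drops a stray `print_freq` factor from the reported speed: `batch_time` already measures seconds per iteration, so throughput is simply `world_size * batch_size / batch_time`. A tiny illustrative calculation (numbers are made up):

```python
# Illustrative only: 2 GPUs, per-GPU batch of 224, 0.35 s per iteration.
world_size, batch_size = 2, 224
seconds_per_iter = 0.35
images_per_sec = world_size * batch_size / seconds_per_iter
print("{:.0f} img/s".format(images_per_sec))  # -> 1280 img/s
```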