apex: Commit b90b0570
Authored Mar 03, 2019 by Michael Carilli
Minor typos
Parent: a3dbea38
Showing 2 changed files with 9 additions and 14 deletions:

examples/imagenet/README.md   (+5, -5)
examples/imagenet/main_amp.py (+4, -9)
examples/imagenet/README.md
@@ -30,7 +30,7 @@ CPU data loading bottlenecks.
 `O0` and `O3` can be told to use loss scaling via manual overrides, but using loss scaling with `O0`
 (pure FP32 training) does not really make sense, and will trigger a warning.
 
-Softlink training and validation dataset into current directory
+Softlink training and validation dataset into current directory:
 ```
 $ ln -sf /data/imagenet/train-jpeg/ train
 $ ln -sf /data/imagenet/val-jpeg/ val
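Note on the loss-scaling overrides mentioned in this hunk: in the amp API they correspond to the `loss_scale` argument of `amp.initialize`, which this example script exposes as `--loss-scale`. A minimal, hypothetical sketch, separate from `main_amp.py` itself (the toy Linear model and random data are placeholders):

```
# Sketch only: forcing a static loss scale through amp.initialize.
import torch
from apex import amp

model = torch.nn.Linear(10, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# loss_scale accepts a fixed value such as 128.0 or the string "dynamic";
# passing it explicitly is the "manual override" referred to above.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1", loss_scale=128.0)

loss = model(torch.randn(4, 10, device="cuda")).sum()
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```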
@@ -42,7 +42,7 @@ Amp enables easy experimentation with various pure and mixed precision options.
 ```
 $ python main_amp.py -a resnet50 --b 128 --workers 4 --opt-level O0 ./
 $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 ./
-$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-FP32 True ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-fp32 True ./
 $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
 $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 --loss-scale 128.0 ./
 $ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
@@ -64,7 +64,7 @@ $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 ./
 ```
 FP16 training with FP32 batchnorm:
 ```
-$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-FP32 True ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-fp32 True ./
 ```
 Keeping the batchnorms in FP32 improves stability and allows Pytorch
 to use cudnn batchnorms, which significantly increases speed in Resnet50.
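As an aside on the flag being renamed in this hunk: when calling amp directly instead of through the example's command line, the same behavior is selected with the `keep_batchnorm_fp32` argument of `amp.initialize`. A rough sketch (the ResNet-50 construction here is illustrative, not taken from `main_amp.py`):

```
# Sketch: O3 (pure FP16) with batchnorm weights kept in FP32, mirroring
# --opt-level O3 --keep-batchnorm-fp32 True from the command line above.
import torch
import torchvision.models as models
from apex import amp

model = models.resnet50().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Keeping batchnorm in FP32 lets the cuDNN batchnorm kernels be used,
# as the README text in this hunk notes.
model, optimizer = amp.initialize(model, optimizer,
                                  opt_level="O3",
                                  keep_batchnorm_fp32=True)
```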
@@ -72,8 +72,8 @@ to use cudnn batchnorms, which significantly increases speed in Resnet50.
 The `O3` options might not converge, because they are not true mixed precision.
 However, they can be useful to establish "speed of light" performance for
 your model, which provides a baseline for comparison with `O1` and `O2`.
-For Resnet50 in particular, `--opt-level O3 --keep-batchnorm-FP32 True` establishes
-the "speed of light." (Without `--keep-batchnorm-FP32`, it's slower, because it does
+For Resnet50 in particular, `--opt-level O3 --keep-batchnorm-fp32 True` establishes
+the "speed of light." (Without `--keep-batchnorm-fp32`, it's slower, because it does
 not use cudnn batchnorm.)
 
 #### `--opt-level O1` ("conservative mixed precision")
examples/imagenet/main_amp.py
@@ -95,15 +95,10 @@ def fast_collate(batch):
 best_prec1 = 0
 
 args = parser.parse_args()
 
-# Let multi_tensor_applier be the canary in the coalmine
-# that verifies if the backend is what we think it is
-assert multi_tensor_applier.available == args.has_ext
-
 print("opt_level = {}".format(args.opt_level))
 print("keep_batchnorm_fp32 = {}".format(args.keep_batchnorm_fp32), type(args.keep_batchnorm_fp32))
 print("loss_scale = {}".format(args.loss_scale), type(args.loss_scale))
-
 print("\nCUDNN VERSION: {}\n".format(torch.backends.cudnn.version()))
 
 if args.deterministic:
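A note on the assertion removed in this hunk: `args.has_ext` appears to have been a testing flag, and `multi_tensor_applier.available` reports whether apex's fused C++/CUDA backend was built. Outside that test setting, a script can still query the flag directly; a small sketch, not part of `main_amp.py`:

```
# Sketch: report whether apex was installed with its C++/CUDA extensions.
# multi_tensor_applier.available is False for a Python-only install
# (no cpp/cuda extensions built).
from apex.multi_tensor_apply import multi_tensor_applier

if multi_tensor_applier.available:
    print("apex extensions loaded: fused multi-tensor kernels are usable.")
else:
    print("apex installed without extensions: fused kernels unavailable.")
```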
@@ -342,8 +337,8 @@ def train(train_loader, model, criterion, optimizer, epoch):
         input, target = prefetcher.next()
 
         if i%args.print_freq == 0:
-            # Every print_freq iterations, let's check the accuracy and speed.
-            # For best performance, it doesn't make sense to collect these metrics every
+            # Every print_freq iterations, check the loss, accuracy and speed.
+            # For best performance, it doesn't make sense to print these metrics every
             # iteration, since they incur an allreduce and some host<->device syncs.
 
             # Measure accuracy
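The comment rewritten in this hunk carries the performance rationale: computing and printing the metrics forces host<->device synchronization (and an allreduce in distributed runs), so it is gated to every `print_freq` iterations. A generic, self-contained sketch of that pattern with a toy model and data, not the ImageNet pipeline:

```
# Sketch of the print_freq gating pattern: do the hot-loop work every
# iteration, but only synchronize with the device (loss.item()) and print
# every print_freq iterations.
import torch

model = torch.nn.Linear(10, 2)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
print_freq = 10  # main_amp.py reads this from --print-freq

for i in range(100):
    images = torch.randn(8, 10)
    target = torch.randint(0, 2, (8,))

    output = model(images)
    loss = criterion(output, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if i % print_freq == 0:
        # loss.item() blocks until the value is ready (a host<->device sync
        # when the tensor lives on the GPU), so keep it off the fast path.
        print("iter {}: loss {:.4f}".format(i, loss.item()))
```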
@@ -374,8 +369,8 @@ def train(train_loader, model, criterion, optimizer, epoch):
                       'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t'
                       'Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format(
                        epoch, i, len(train_loader),
-                       args.print_freq*args.world_size*args.batch_size/batch_time.val,
-                       args.print_freq*args.world_size*args.batch_size/batch_time.avg,
+                       args.world_size*args.batch_size/batch_time.val,
+                       args.world_size*args.batch_size/batch_time.avg,
                        batch_time=batch_time,
                        loss=losses, top1=top1, top5=top5))
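This last hunk drops the `args.print_freq` factor from the reported throughput (consistent with `batch_time` being tracked as a per-iteration average). With the corrected expression and purely illustrative numbers, say world size 2, per-process batch 224, and 0.30 s per iteration:

```
# Hypothetical numbers, only to illustrate the corrected expression.
world_size, batch_size, batch_time_val = 2, 224, 0.30
print(world_size * batch_size / batch_time_val)  # ~1493 images/sec across both processes
```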