OpenDAS / apex · Commit b90b0570
Authored Mar 03, 2019 by Michael Carilli

    Minor typos

Parent: a3dbea38
Showing 2 changed files with 9 additions and 14 deletions:

  examples/imagenet/README.md   (+5, −5)
  examples/imagenet/main_amp.py (+4, −9)
examples/imagenet/README.md
@@ -30,7 +30,7 @@ CPU data loading bottlenecks.
 `O0` and `O3` can be told to use loss scaling via manual overrides, but using loss scaling with `O0`
 (pure FP32 training) does not really make sense, and will trigger a warning.
 
-Softlink training and validation dataset into current directory
+Softlink training and validation dataset into current directory:
 ```
 $ ln -sf /data/imagenet/train-jpeg/ train
 $ ln -sf /data/imagenet/val-jpeg/ val
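The "manual overrides" in this hunk refer to Amp's `loss_scale` argument to `amp.initialize`. A minimal sketch (the toy model and optimizer are placeholders, not from this repo):

```python
import torch
from apex import amp

model = torch.nn.Linear(10, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# loss_scale may be a fixed float such as 128.0 or the string "dynamic".
# Requesting loss scaling under O0 (pure FP32) is what triggers the
# warning the README mentions.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1", loss_scale=128.0)
```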
@@ -42,7 +42,7 @@ Amp enables easy experimentation with various pure and mixed precision options.
 ```
 $ python main_amp.py -a resnet50 --b 128 --workers 4 --opt-level O0 ./
 $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 ./
-$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-FP32 True ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-fp32 True ./
 $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
 $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 --loss-scale 128.0 ./
 $ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
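The last command uses `torch.distributed.launch`, which spawns one process per GPU and passes each a `--local_rank` argument. A sketch of the receiving side, mirroring the startup pattern main_amp.py uses:

```python
import argparse
import torch

parser = argparse.ArgumentParser()
# torch.distributed.launch injects --local_rank=<device index> into argv.
parser.add_argument('--local_rank', type=int, default=0)
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)
torch.distributed.init_process_group(backend='nccl', init_method='env://')
```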
@@ -64,7 +64,7 @@ $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 ./
 ```
 FP16 training with FP32 batchnorm:
 ```
-$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-FP32 True ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-fp32 True ./
 ```
 Keeping the batchnorms in FP32 improves stability and allows Pytorch
 to use cudnn batchnorms, which significantly increases speed in Resnet50.
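The `--keep-batchnorm-fp32` flag maps onto the `keep_batchnorm_fp32` keyword of `amp.initialize`. A hedged sketch with placeholder model and optimizer:

```python
import torch
from apex import amp

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.BatchNorm2d(8)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Under O3 (pure FP16) this keeps batchnorm parameters and statistics in
# FP32, which is what lets cuDNN's batchnorm kernels run, per the README
# text above.
model, optimizer = amp.initialize(model, optimizer,
                                  opt_level="O3",
                                  keep_batchnorm_fp32=True)
```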
@@ -72,8 +72,8 @@ to use cudnn batchnorms, which significantly increases speed in Resnet50.
 The `O3` options might not converge, because they are not true mixed precision.
 However, they can be useful to establish "speed of light" performance for
 your model, which provides a baseline for comparison with `O1` and `O2`.
-For Resnet50 in particular, `--opt-level O3 --keep-batchnorm-FP32 True` establishes
-the "speed of light." (Without `--keep-batchnorm-FP32`, it's slower, because it does
+For Resnet50 in particular, `--opt-level O3 --keep-batchnorm-fp32 True` establishes
+the "speed of light." (Without `--keep-batchnorm-fp32`, it's slower, because it does
 not use cudnn batchnorm.)
 
 #### `--opt-level O1` ("conservative mixed precision")
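Whichever opt level is chosen, the backward pass in user code goes through Amp's scaled-loss context manager, the pattern apex documents (here `loss` and `optimizer` stand in for the script's own variables):

```python
from apex import amp

# amp.scale_loss multiplies the loss by the current loss scale on entry
# and arranges for gradients to be unscaled before optimizer.step().
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```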
examples/imagenet/main_amp.py
@@ -95,15 +95,10 @@ def fast_collate(batch):
 best_prec1 = 0
 args = parser.parse_args()
 
-# Let multi_tensor_applier be the canary in the coalmine
-# that verifies if the backend is what we think it is
-assert multi_tensor_applier.available == args.has_ext
-
 print("opt_level = {}".format(args.opt_level))
 print("keep_batchnorm_fp32 = {}".format(args.keep_batchnorm_fp32), type(args.keep_batchnorm_fp32))
 print("loss_scale = {}".format(args.loss_scale), type(args.loss_scale))
 print("\nCUDNN VERSION: {}\n".format(torch.backends.cudnn.version()))
 
 if args.deterministic:
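The deleted assertion used `multi_tensor_applier.available` as a "canary" for whether apex's compiled backend was importable (`--has_ext` was a flag specific to this example script). The check itself is ordinary apex usage:

```python
from apex.multi_tensor_apply import multi_tensor_applier

# available is True only when apex was built with its fused C++/CUDA
# extensions (installed with --cpp_ext --cuda_ext); otherwise Amp falls
# back to slower non-fused code paths.
print(multi_tensor_applier.available)
```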
@@ -342,8 +337,8 @@ def train(train_loader, model, criterion, optimizer, epoch):
         input, target = prefetcher.next()
 
         if i % args.print_freq == 0:
-            # Every print_freq iterations, let's check the accuracy and speed.
-            # For best performance, it doesn't make sense to collect these metrics every
+            # Every print_freq iterations, check the loss, accuracy, and speed.
+            # For best performance, it doesn't make sense to print these metrics every
             # iteration, since they incur an allreduce and some host<->device syncs.
 
             # Measure accuracy
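The allreduce mentioned in the comment is the cross-worker averaging of loss and accuracy. A sketch along the lines of the script's `reduce_tensor` helper (the exact signature in the script may differ):

```python
import torch.distributed as dist

def reduce_tensor(tensor, world_size):
    # Sum a scalar metric (loss, prec@1, ...) across workers, then average.
    # Doing this every iteration, rather than every print_freq iterations,
    # adds the allreduce and host<->device syncs the comment warns about.
    rt = tensor.clone()
    dist.all_reduce(rt, op=dist.ReduceOp.SUM)
    rt /= world_size
    return rt
```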
@@ -374,8 +369,8 @@ def train(train_loader, model, criterion, optimizer, epoch):
                   'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t'
                   'Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format(
                    epoch, i, len(train_loader),
-                   args.print_freq*args.world_size*args.batch_size/batch_time.val,
-                   args.print_freq*args.world_size*args.batch_size/batch_time.avg,
+                   args.world_size*args.batch_size/batch_time.val,
+                   args.world_size*args.batch_size/batch_time.avg,
                    batch_time=batch_time,
                    loss=losses, top1=top1, top5=top5))
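Why this hunk is a fix rather than cleanup: the script updates `batch_time` with the elapsed time divided by `print_freq`, so `batch_time.val` already holds per-iteration time, and multiplying by `print_freq` again overstated throughput by that factor. With hypothetical numbers:

```python
# Hypothetical values: 2 GPUs, per-GPU batch 224, 0.25 s per iteration, print_freq 10.
world_size, batch_size, batch_time_val, print_freq = 2, 224, 0.25, 10

print(world_size * batch_size / batch_time_val)               # 1792.0 images/s (new formula)
print(print_freq * world_size * batch_size / batch_time_val)  # 17920.0 images/s (old, 10x too high)
```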