apex · Commit a3dbea38
Authored Mar 03, 2019 by Michael Carilli

Adding summary

parent 26b30d13

Showing 1 changed file with 19 additions and 3 deletions (+19, −3):
examples/imagenet/README.md
@@ -36,7 +36,23 @@ $ ln -sf /data/imagenet/train-jpeg/ train
 $ ln -sf /data/imagenet/val-jpeg/ val
 ```
 
-### `--opt-level O0` (FP32 training) and `O3` (FP16 training)
+### Summary
+
+Amp enables easy experimentation with various pure and mixed precision options.
+```
+$ python main_amp.py -a resnet50 --b 128 --workers 4 --opt-level O0 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-FP32 True ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 --loss-scale 128.0 ./
+$ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O2 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O2 --loss-scale 128.0 ./
+$ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O2 ./
+```
+Options are broken down in detail below.
+
+#### `--opt-level O0` (FP32 training) and `O3` (FP16 training)
 
 "Pure FP32" training:
 ```
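All of the command lines above exercise the same small Amp API inside main_amp.py. A minimal sketch, assuming apex's documented `amp.initialize` / `amp.scale_loss` entry points; the toy model and data here are stand-ins, not the script's actual ResNet-50 pipeline:

```python
import torch
from apex import amp

# Toy stand-ins for the model and optimizer that main_amp.py builds.
model = torch.nn.Linear(10, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# opt_level selects the precision recipe ("O0", "O1", "O2", or "O3");
# loss_scale may be a number (static scaling) or "dynamic".
model, optimizer = amp.initialize(model, optimizer,
                                  opt_level="O1", loss_scale=128.0)

loss = model(torch.randn(4, 10).cuda()).sum()

# scale_loss multiplies the loss by the scale factor for backward(),
# then unscales the gradients before optimizer.step().
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```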
@@ -60,7 +76,7 @@ For Resnet50 in particular, `--opt-level O3 --keep-batchnorm-FP32 True` establishes
 the "speed of light." (Without `--keep-batchnorm-FP32`, it's slower, because it does
 not use cudnn batchnorm.)
 
-### `--opt-level O1` ("conservative mixed precision")
+#### `--opt-level O1` ("conservative mixed precision")
 
 `O1` patches Torch functions to cast inputs according to a whitelist-blacklist model.
 FP16-friendly (Tensor Core) ops like gemms and convolutions run in FP16, while ops
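To make the whitelist-blacklist behavior concrete, here is a hypothetical probe of what `O1` casting looks like from the outside (illustrative only, not apex's internal patching code); it assumes convolution sits on the whitelist and softmax on the blacklist, as the amp documentation describes:

```python
import torch
from apex import amp

model = torch.nn.Conv2d(3, 8, 3).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(1, 3, 16, 16).cuda()   # FP32 input
y = model(x)
# Convolution is whitelisted, so its inputs were cast down to half.
print(y.dtype)                          # torch.float16

p = torch.nn.functional.softmax(y, dim=1)
# Softmax is blacklisted, so its inputs were cast up to float.
print(p.dtype)                          # torch.float32
```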
@@ -81,7 +97,7 @@ $ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50
 For best performance, set `--nproc_per_node` equal to the total number of GPUs on the node
 to use all available resources.
 
-### `--opt-level O2` ("fast mixed precision")
+#### `--opt-level O2` ("fast mixed precision")
 
 `O2` casts the model to FP16, keeps batchnorms in FP32,
 maintains master weights in FP32, and implements
...
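As a rough mental model of the FP32 master weights that `O2` maintains, consider this sketch (purely illustrative, not apex's implementation): the optimizer steps on an FP32 copy, so that updates too small for FP16's resolution are not rounded away, and the FP16 model weights are then refreshed from it.

```python
import torch

fp16_weight = torch.randn(10, device="cuda").half()   # weight used in forward/backward
master_weight = fp16_weight.float()                   # FP32 master copy the optimizer owns

fp16_grad = torch.randn(10, device="cuda").half()     # gradient from the FP16 backward pass
lr = 1e-4

# Step in FP32: an update of size lr * grad can fall below FP16's resolution
# near a weight of magnitude ~1, where FP16 would round it to zero.
master_weight -= lr * fp16_grad.float()

# Copy the updated master back into the FP16 weight for the next iteration.
fp16_weight.copy_(master_weight)
```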