OpenDAS / apex
Commit a3dbea38, authored Mar 03, 2019 by Michael Carilli
Adding summary
parent 26b30d13
Showing 1 changed file with 19 additions and 3 deletions:
examples/imagenet/README.md (+19, -3)
examples/imagenet/README.md
...
@@ -36,7 +36,23 @@ $ ln -sf /data/imagenet/train-jpeg/ train
$ ln -sf /data/imagenet/val-jpeg/ val
```
-### `--opt-level O0` (FP32 training) and `O3` (FP16 training)
+### Summary
+Amp enables easy experimentation with various pure and mixed precision options.
+```
+$ python main_amp.py -a resnet50 --b 128 --workers 4 --opt-level O0 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-FP32 True ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 --loss-scale 128.0 ./
+$ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O2 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O2 --loss-scale 128.0 ./
+$ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O2 ./
+```
+Options are broken down in detail below.
+#### `--opt-level O0` (FP32 training) and `O3` (FP16 training)
"Pure FP32" training:
```
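The opt levels exercised by the commands above differ along a few axes: whether the model itself is cast to FP16, whether Torch functions are patched to cast their inputs, how batchnorm is handled, whether FP32 master weights are kept, and how the loss is scaled. A minimal pure-Python sketch of that breakdown follows; the property names and table are illustrative only, not apex's actual internals:

```python
# Illustrative summary of what each Amp opt level configures.
# This is a sketch for intuition, NOT apex's real implementation;
# the property names below are made up for this example.
OPT_LEVELS = {
    "O0": dict(cast_model_to_fp16=False, patch_torch_functions=False,
               keep_batchnorm_fp32=None, master_weights=False,
               loss_scale=1.0),          # pure FP32
    "O1": dict(cast_model_to_fp16=False, patch_torch_functions=True,
               keep_batchnorm_fp32=None, master_weights=False,
               loss_scale="dynamic"),    # conservative mixed precision
    "O2": dict(cast_model_to_fp16=True, patch_torch_functions=False,
               keep_batchnorm_fp32=True, master_weights=True,
               loss_scale="dynamic"),    # fast mixed precision
    "O3": dict(cast_model_to_fp16=True, patch_torch_functions=False,
               keep_batchnorm_fp32=False, master_weights=False,
               loss_scale=1.0),          # pure FP16

}

def describe(level):
    """Return the illustrative property dict for an opt level."""
    return OPT_LEVELS[level]
```

Flags like `--keep-batchnorm-FP32` and `--loss-scale` in the commands above override individual rows of this table for a given opt level.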
...
@@ -60,7 +76,7 @@ For Resnet50 in particular, `--opt-level O3 --keep-batchnorm-FP32 True` establis
the "speed of light." (Without `--keep-batchnorm-FP32`, it's slower, because it does
not use cudnn batchnorm.)
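Beyond the cudnn point above, there is also a numerical reason batchnorm statistics are usually kept in FP32: once a running sum grows past 2048, FP16 can no longer represent consecutive integers, so small contributions vanish. A small demonstration using Python's built-in half-precision `struct` format (not apex code):

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE half precision."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# Above 2048, FP16 integer spacing is 2, so adding 1 is a no-op:
assert to_fp16(2049.0) == 2048.0

# Accumulating a sum of ones in emulated FP16 stalls at 2048,
# while the FP32/double accumulation reaches the true total.
total_fp16 = 0.0
total_fp32 = 0.0
for _ in range(4096):
    total_fp16 = to_fp16(total_fp16 + 1.0)
    total_fp32 += 1.0
```

This is exactly the kind of accumulation batchnorm performs when computing per-batch mean and variance, which is why `--keep-batchnorm-FP32 True` exists even for otherwise "pure FP16" `O3` runs.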
-### `--opt-level O1` ("conservative mixed precision")
+#### `--opt-level O1` ("conservative mixed precision")
`O1` patches Torch functions to cast inputs according to a whitelist-blacklist model.
FP16-friendly (Tensor Core) ops like gemms and convolutions run in FP16, while ops
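The whitelist-blacklist dispatch described here can be sketched in a few lines of plain Python. The op names and tables below are toy examples chosen for illustration, not apex's actual lists or patching mechanism:

```python
# Toy sketch of O1-style whitelist/blacklist input casting.
# NOT apex's implementation: ops are plain strings and dtypes
# are the strings "fp16"/"fp32".
FP16_WHITELIST = {"gemm", "conv2d"}          # Tensor Core friendly
FP32_BLACKLIST = {"softmax", "exp", "norm"}  # precision sensitive

def run_op(op: str, *input_dtypes: str) -> str:
    """Return the dtype an op would execute in after input casting."""
    if op in FP16_WHITELIST:
        return "fp16"   # inputs cast down to FP16
    if op in FP32_BLACKLIST:
        return "fp32"   # inputs cast up to FP32
    # Everything else runs in the widest dtype present, to be safe.
    return "fp32" if "fp32" in input_dtypes else "fp16"
```

Because the casts are inserted at call sites, the model's own parameters can stay in FP32 under `O1`, which is what makes it the "conservative" option.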
...
@@ -81,7 +97,7 @@ $ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50
For best performance, set `--nproc_per_node` equal to the total number of GPUs on the node
to use all available resources.
-### `--opt-level O2` ("fast mixed precision")
+#### `--opt-level O2` ("fast mixed precision")
`O2` casts the model to FP16, keeps batchnorms in FP32,
maintains master weights in FP32, and implements
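What "master weights in FP32" plus loss scaling amounts to per optimizer step can be sketched with plain Python lists. This assumes simple SGD and a static loss scale; it is an illustration of the bookkeeping, not apex's code:

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE half precision."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def step(master_weights, fp16_grads, lr=0.1, loss_scale=128.0):
    """One O2-style update (illustrative, SGD assumed):
    1. unscale the FP16 gradients by the loss scale,
    2. apply the update to the FP32 master weights,
    3. re-cast an FP16 copy of the weights for the next forward pass.
    """
    new_master = [w - lr * (g / loss_scale)
                  for w, g in zip(master_weights, fp16_grads)]
    fp16_weights = [to_fp16(w) for w in new_master]
    return new_master, fp16_weights
```

Scaling the loss before `backward()` keeps small gradients from flushing to zero in FP16; the unscale in step 1 restores their true magnitude before the FP32 update, so the master weights accumulate updates that an FP16 weight copy would lose.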
...