Commit 0ec8addb authored by Christian Sarofeen, committed by mcarilli

Remove some fp16 examples that don't converge (#45)

* Remove some fp16 examples that don't converge

The default static loss scale of 1.0 doesn't converge for resnet50. Either remove the example or set a static loss scale of 128 on it, which is known to converge well (see the sketch below the commit metadata).

* Update README.md
parent 48f105d9
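
As background for the change below: at a loss scale of 1.0, many small fp16 gradients underflow to zero, which is why the removed examples fail to converge. A minimal sketch of what a static loss scale does (illustrative code, not taken from the example scripts):

```python
import torch

STATIC_LOSS_SCALE = 128.0  # the value the updated README commands pass

def scaled_backward_and_step(loss, optimizer):
    # Scale the loss up so gradients computed in fp16 don't flush to zero.
    (loss * STATIC_LOSS_SCALE).backward()
    # Divide the gradients back down so the effective step size is unchanged.
    for group in optimizer.param_groups:
        for p in group["params"]:
            if p.grad is not None:
                p.grad.div_(STATIC_LOSS_SCALE)
    optimizer.step()
    optimizer.zero_grad()
```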
@@ -50,15 +50,13 @@ $ ln -sf /data/imagenet/train-jpeg/ train
 ### Softlink validation dataset into current directory
 $ ln -sf /data/imagenet/val-jpeg/ val
 ### Single-process training
-$ python main.py -a resnet50 --fp16 --b 256 --workers 4 ./
-### Multi-process training (uses all visible GPU on the node)
-$ python -m torch.distributed.launch --nproc_per_node=NUM_GPUS main.py -a resnet50 --fp16 --b 256 --workers 4 ./
+$ python main.py -a resnet50 --fp16 --b 256 --workers 4 --static-loss-scale 128.0 ./
+### Multi-process training (uses all visible GPUs on the node)
+$ python -m torch.distributed.launch --nproc_per_node=NUM_GPUS main.py -a resnet50 --fp16 --b 256 --workers 4 --static-loss-scale 128.0 ./
 ### Multi-process training on GPUs 0 and 1 only
 $ export CUDA_VISIBLE_DEVICES=0,1
 $ python -m torch.distributed.launch --nproc_per_node=2 main.py -a resnet50 --fp16 --b 256 --workers 4 ./
-### Multi-process training with FP16_Optimizer, default loss scale 1.0 (still uses FP32 master params)
-$ python -m torch.distributed.launch --nproc_per_node=NUM_GPUS main_fp16_optimizer.py -a resnet50 --fp16 --b 256 --workers 4 ./
-# Multi-process training with FP16_Optimizer, static loss scale
+### Multi-process training with FP16_Optimizer, static loss scale 128.0 (still uses FP32 master params)
 $ python -m torch.distributed.launch --nproc_per_node=NUM_GPUS main_fp16_optimizer.py -a resnet50 --fp16 --b 256 --static-loss-scale 128.0 --workers 4 ./
 ### Multi-process training with FP16_Optimizer, dynamic loss scaling
 $ python -m torch.distributed.launch --nproc_per_node=NUM_GPUS main_fp16_optimizer.py -a resnet50 --fp16 --b 256 --dynamic-loss-scale --workers 4 ./
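
For reference, a minimal sketch of how the `--static-loss-scale` and `--dynamic-loss-scale` flags above map onto apex's `FP16_Optimizer` wrapper (the model and optimizer here are placeholders, not the example's actual setup):

```python
import torch
from apex.fp16_utils import FP16_Optimizer

model = torch.nn.Linear(1024, 1000).cuda().half()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# --static-loss-scale 128.0 corresponds to:
optimizer = FP16_Optimizer(optimizer, static_loss_scale=128.0)
# --dynamic-loss-scale would instead be:
# optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True)

inp = torch.randn(8, 1024).cuda().half()
loss = model(inp).float().mean()
optimizer.backward(loss)  # replaces loss.backward(); applies and undoes the scale
optimizer.step()          # updates fp32 master params, copies them back to fp16
```

The wrapper keeps fp32 master copies of the parameters, so the fp16 model weights never accumulate rounding error from the updates themselves.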