Unverified Commit 3266e5c4 authored by Shruti Pulstya, committed by GitHub

Updated references README files to use torchrun instead of distributed.launch (#4451)

parent 85982ac6
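For context: `torchrun` always passes each worker's rank information through environment variables (`LOCAL_RANK`, `RANK`, `WORLD_SIZE`), which is why the `--use_env` flag needed by `torch.distributed.launch` disappears from every command below. A minimal sketch of the pattern a script launched by `torchrun` typically follows (illustrative only, not the repository's actual `train.py`; the script name in the comment is a placeholder):

```python
# Minimal sketch, assuming the script is launched by torchrun, e.g.:
#   torchrun --nproc_per_node=8 this_script.py
# torchrun exports LOCAL_RANK, RANK and WORLD_SIZE for every worker process.
import os

import torch
import torch.distributed as dist


def setup_distributed():
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun; no --local_rank argument needed
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")     # reads RANK/WORLD_SIZE/MASTER_* from the environment
    return local_rank


if __name__ == "__main__":
    local_rank = setup_distributed()
    print(f"worker {dist.get_rank()} of {dist.get_world_size()} on GPU {local_rank}")
    dist.destroy_process_group()
```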
@@ -23,7 +23,7 @@ Since `AlexNet` and the original `VGG` architectures do not include batch
normalization, the default initial learning rate `--lr 0.1` is too high.
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
+torchrun --nproc_per_node=8 train.py\
--model $MODEL --lr 1e-2
```
@@ -33,7 +33,7 @@ normalization and thus are trained with the default parameters.
### ResNext-50 32x4d
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
+torchrun --nproc_per_node=8 train.py\
--model resnext50_32x4d --epochs 100
```
@@ -41,7 +41,7 @@ python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
### ResNext-101 32x8d
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
+torchrun --nproc_per_node=8 train.py\
--model resnext101_32x8d --epochs 100
```
@@ -54,7 +54,7 @@ which are respectively batch_size=32 and lr=0.1
### MobileNetV2
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
+torchrun --nproc_per_node=8 train.py\
--model mobilenet_v2 --epochs 300 --lr 0.045 --wd 0.00004\
--lr-step-size 1 --lr-gamma 0.98
```
@@ -62,7 +62,7 @@ python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
### MobileNetV3 Large & Small
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
+torchrun --nproc_per_node=8 train.py\
--model $MODEL --epochs 600 --opt rmsprop --batch-size 128 --lr 0.064\
--wd 0.00001 --lr-step-size 2 --lr-gamma 0.973 --auto-augment imagenet --random-erase 0.2
```
@@ -85,7 +85,7 @@ Automatic Mixed Precision (AMP) training on GPU for Pytorch can be enabled with
Mixed precision training makes use of both FP32 and FP16 precisions where appropriate. FP16 operations can leverage the Tensor cores on NVIDIA GPUs (Volta, Turing or newer architectures) for improved throughput, generally without loss in model accuracy. Mixed precision training also often allows larger batch sizes. GPU automatic mixed precision training for PyTorch Vision can be enabled via the flag value `--apex=True`.
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
+torchrun --nproc_per_node=8 train.py\
--model resnext50_32x4d --epochs 100 --apex
```
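For reference, the mixed precision pattern behind this flag is the standard `torch.cuda.amp` loop. A minimal, self-contained sketch is shown below; it is illustrative only — the real loop lives in the reference `train.py`, and the placeholder model and random data are not part of the recipe:

```python
# Minimal torch.cuda.amp sketch: autocast runs the forward pass in FP16 where
# safe, GradScaler scales the loss so FP16 gradients do not underflow.
import torch
import torch.nn as nn

device = torch.device("cuda")
model = nn.Linear(224, 10).to(device)                    # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):                                      # placeholder data loader
    images = torch.randn(32, 224, device=device)
    targets = torch.randint(0, 10, (32,), device=device)

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                      # FP16 where safe, FP32 elsewhere
        loss = criterion(model(images), targets)
    scaler.scale(loss).backward()                        # backward on the scaled loss
    scaler.step(optimizer)                               # unscales grads, then optimizer.step()
    scaler.update()                                      # adjust the scale for the next step
```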
@@ -120,7 +120,7 @@ For Mobilenet-v2, the model was trained with quantization aware training, the se
12. weight-decay: 0.0001
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train_quantization.py --model='mobilenet_v2'
+torchrun --nproc_per_node=8 train_quantization.py --model='mobilenet_v2'
```
Training converges at about 10 epochs.
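For orientation, eager-mode quantization aware training with `torch.quantization` follows roughly the pattern below. This is a minimal sketch, not the reference implementation — `train_quantization.py` handles the full recipe, and exact argument names can differ across PyTorch/torchvision versions:

```python
# Minimal QAT sketch: fake-quantization observers are inserted for training,
# then the trained model is converted to a real int8 model for inference.
import torch
from torchvision.models.quantization import mobilenet_v2

model = mobilenet_v2(pretrained=True, quantize=False)  # float pretrained weights (kwarg name varies by version)
model.fuse_model()                                      # fuse conv+bn+relu blocks before quantization
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
model.train()
torch.quantization.prepare_qat(model, inplace=True)     # insert fake-quant modules

# ... run the usual training loop here (per the note above, ~10 epochs suffice) ...

model.eval()
quantized_model = torch.quantization.convert(model)     # int8 model ready for inference
```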
@@ -140,7 +140,7 @@ For Mobilenet-v3 Large, the model was trained with quantization aware training,
12. weight-decay: 0.00001
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train_quantization.py --model='mobilenet_v3_large' \
+torchrun --nproc_per_node=8 train_quantization.py --model='mobilenet_v3_large' \
--wd 0.00001 --lr 0.001
```
@@ -22,35 +22,35 @@ Except otherwise noted, all models have been trained on 8x V100 GPUs.
### Faster R-CNN ResNet-50 FPN
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
+torchrun --nproc_per_node=8 train.py\
--dataset coco --model fasterrcnn_resnet50_fpn --epochs 26\
--lr-steps 16 22 --aspect-ratio-group-factor 3
```
### Faster R-CNN MobileNetV3-Large FPN
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
+torchrun --nproc_per_node=8 train.py\
--dataset coco --model fasterrcnn_mobilenet_v3_large_fpn --epochs 26\
--lr-steps 16 22 --aspect-ratio-group-factor 3
```
### Faster R-CNN MobileNetV3-Large 320 FPN
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
+torchrun --nproc_per_node=8 train.py\
--dataset coco --model fasterrcnn_mobilenet_v3_large_320_fpn --epochs 26\
--lr-steps 16 22 --aspect-ratio-group-factor 3
```
### RetinaNet
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
+torchrun --nproc_per_node=8 train.py\
--dataset coco --model retinanet_resnet50_fpn --epochs 26\
--lr-steps 16 22 --aspect-ratio-group-factor 3 --lr 0.01
```
### SSD300 VGG16
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
+torchrun --nproc_per_node=8 train.py\
--dataset coco --model ssd300_vgg16 --epochs 120\
--lr-steps 80 110 --aspect-ratio-group-factor 3 --lr 0.002 --batch-size 4\
--weight-decay 0.0005 --data-augmentation ssd
@@ -58,7 +58,7 @@ python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
### SSDlite320 MobileNetV3-Large
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
+torchrun --nproc_per_node=8 train.py\
--dataset coco --model ssdlite320_mobilenet_v3_large --epochs 660\
--aspect-ratio-group-factor 3 --lr-scheduler cosineannealinglr --lr 0.15 --batch-size 24\
--weight-decay 0.00004 --data-augmentation ssdlite
@@ -67,7 +67,7 @@ python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
### Mask R-CNN
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
+torchrun --nproc_per_node=8 train.py\
--dataset coco --model maskrcnn_resnet50_fpn --epochs 26\
--lr-steps 16 22 --aspect-ratio-group-factor 3
```
@@ -75,7 +75,7 @@ python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
### Keypoint R-CNN
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
+torchrun --nproc_per_node=8 train.py\
--dataset coco_kp --model keypointrcnn_resnet50_fpn --epochs 46\
--lr-steps 36 43 --aspect-ratio-group-factor 3
```
@@ -14,30 +14,30 @@ You must modify the following flags:
## fcn_resnet50
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --lr 0.02 --dataset coco -b 4 --model fcn_resnet50 --aux-loss
+torchrun --nproc_per_node=8 train.py --lr 0.02 --dataset coco -b 4 --model fcn_resnet50 --aux-loss
```
## fcn_resnet101
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --lr 0.02 --dataset coco -b 4 --model fcn_resnet101 --aux-loss
+torchrun --nproc_per_node=8 train.py --lr 0.02 --dataset coco -b 4 --model fcn_resnet101 --aux-loss
```
## deeplabv3_resnet50
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --lr 0.02 --dataset coco -b 4 --model deeplabv3_resnet50 --aux-loss
+torchrun --nproc_per_node=8 train.py --lr 0.02 --dataset coco -b 4 --model deeplabv3_resnet50 --aux-loss
```
## deeplabv3_resnet101
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --lr 0.02 --dataset coco -b 4 --model deeplabv3_resnet101 --aux-loss
+torchrun --nproc_per_node=8 train.py --lr 0.02 --dataset coco -b 4 --model deeplabv3_resnet101 --aux-loss
```
## deeplabv3_mobilenet_v3_large
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --dataset coco -b 4 --model deeplabv3_mobilenet_v3_large --aux-loss --wd 0.000001
+torchrun --nproc_per_node=8 train.py --dataset coco -b 4 --model deeplabv3_mobilenet_v3_large --aux-loss --wd 0.000001
```
## lraspp_mobilenet_v3_large
```
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --dataset coco -b 4 --model lraspp_mobilenet_v3_large --wd 0.000001
+torchrun --nproc_per_node=8 train.py --dataset coco -b 4 --model lraspp_mobilenet_v3_large --wd 0.000001
```
@@ -18,7 +18,7 @@ We assume the training and validation AVI videos are stored at `/data/kinectics4
Run the training on a single node with 8 GPUs:
```bash
-python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --data-path=/data/kinectics400 --train-dir=train --val-dir=val --batch-size=16 --cache-dataset --sync-bn --apex
+torchrun --nproc_per_node=8 train.py --data-path=/data/kinectics400 --train-dir=train --val-dir=val --batch-size=16 --cache-dataset --sync-bn --apex
```
**Note:** all our models were trained on 8 nodes with 8 V100 GPUs each for a total of 64 GPUs. Expected training time for 64 GPUs is 24 hours, depending on the storage solution.