# Image classification reference training scripts

This folder contains reference training scripts for image classification.
They serve as a log of how to train specific models, and provide baseline
training and evaluation scripts to quickly bootstrap research.

Unless otherwise noted, all models have been trained on 8x V100 GPUs with
the following parameters:

| Parameter                | Value  |
| ------------------------ | ------ |
| `--batch-size`           | `32`   |
| `--epochs`               | `90`   |
| `--lr`                   | `0.1`  |
| `--momentum`             | `0.9`  |
| `--wd`, `--weight-decay` | `1e-4` |
| `--lr-step-size`         | `30`   |
| `--lr-gamma`             | `0.1`  |
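
As a rough illustration of what these defaults imply, the sketch below assumes a plain step-decay schedule (the learning rate is multiplied by `--lr-gamma` every `--lr-step-size` epochs) and assumes that the batch size of 32 is per GPU. The helper names `step_lr` and `global_batch_size` are hypothetical and are not part of `train.py`.

```python
def step_lr(epoch, base_lr=0.1, step_size=30, gamma=0.1):
    """Learning rate at a 0-indexed epoch under step decay (assumed schedule)."""
    return base_lr * gamma ** (epoch // step_size)


def global_batch_size(per_gpu_batch=32, num_gpus=8):
    """Images per optimizer step, assuming the batch size is per GPU."""
    return per_gpu_batch * num_gpus


if __name__ == "__main__":
    # Show the learning rate around each decay boundary of a 90-epoch run.
    for epoch in (0, 29, 30, 59, 60, 89):
        print(f"epoch {epoch:2d}: lr = {step_lr(epoch):.4g}")
    print("global batch size:", global_batch_size())
```

Under these assumptions the learning rate drops by a factor of 10 at epochs 30 and 60, and each step sees 256 images.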

### AlexNet and VGG

Since `AlexNet` and the original `VGG` architectures do not include batch
normalization, the default initial learning rate `--lr 0.1` is too high.

```
torchrun --nproc_per_node=8 train.py \
    --model $MODEL --lr 1e-2
```

Here `$MODEL` is one of `alexnet`, `vgg11`, `vgg13`, `vgg16` or `vgg19`. Note
that `vgg11_bn`, `vgg13_bn`, `vgg16_bn`, and `vgg19_bn` include batch
normalization and thus are trained with the default parameters.

### Inception V3

The weights of the Inception V3 model are ported from the original paper rather than trained from scratch.

Since it expects tensors with a size of N x 3 x 299 x 299, to validate the model use the following command:

```
torchrun --nproc_per_node=8 train.py --model inception_v3 \
      --val-resize-size 342 --val-crop-size 299 --train-crop-size 299 --test-only --pretrained
```

### ResNet
```
torchrun --nproc_per_node=8 train.py --model $MODEL
```

Here `$MODEL` is one of `resnet18`, `resnet34`, `resnet50`, `resnet101` or `resnet152`.

### ResNext
```
torchrun --nproc_per_node=8 train.py \
    --model $MODEL --epochs 100
```

Here `$MODEL` is one of `resnext50_32x4d` or `resnext101_32x8d`.
Note that the above command corresponds to a single node with 8 GPUs. If you use
a different number of GPUs and/or a different batch size, then the learning rate
should be scaled accordingly. For example, the pretrained model provided by
`torchvision` was trained on 8 nodes, each with 8 GPUs (for a total of 64 GPUs),
with `--batch-size 16` and `--lr 0.4`, instead of the current defaults of
`--batch-size 32` and `--lr 0.1`.
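
The numbers in this example are consistent with scaling the learning rate linearly with the global batch size. The following is a minimal sketch of that rule, assuming the batch size is per GPU; `scaled_lr` is a hypothetical helper and not part of `train.py`.

```python
def scaled_lr(base_lr, base_global_batch, num_gpus, per_gpu_batch):
    """Scale the learning rate linearly with the global batch size."""
    global_batch = num_gpus * per_gpu_batch
    return base_lr * global_batch / base_global_batch


# Reference setup from the defaults: 8 GPUs x 32 images per GPU, lr 0.1.
BASE_LR = 0.1
BASE_GLOBAL_BATCH = 8 * 32

# 8 nodes x 8 GPUs with 16 images per GPU, as in the example above.
lr_64_gpus = scaled_lr(BASE_LR, BASE_GLOBAL_BATCH, num_gpus=64, per_gpu_batch=16)
print(f"lr for 64 GPUs x 16 images: {lr_64_gpus:.3f}")
```

The global batch grows from 256 to 1024 images, so the learning rate grows by the same factor of 4, from 0.1 to 0.4.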

### MobileNetV2
```
torchrun --nproc_per_node=8 train.py \
     --model mobilenet_v2 --epochs 300 --lr 0.045 --wd 0.00004 \
     --lr-step-size 1 --lr-gamma 0.98
```

### MobileNetV3 Large & Small
```
torchrun --nproc_per_node=8 train.py \
     --model $MODEL --epochs 600 --opt rmsprop --batch-size 128 --lr 0.064 \
     --wd 0.00001 --lr-step-size 2 --lr-gamma 0.973 --auto-augment imagenet --random-erase 0.2
```

Here `$MODEL` is one of `mobilenet_v3_large` or `mobilenet_v3_small`.

After training, we averaged the parameters of the last 3 checkpoints that improved the Acc@1. See [#3182](https://github.com/pytorch/vision/pull/3182)
and [#3354](https://github.com/pytorch/vision/pull/3354) for details.
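
The idea of averaging the last few improving checkpoints can be sketched as follows. This is an illustration only and may differ from the procedure in the linked pull requests: the function names are hypothetical, and plain floats stand in for parameter tensors.

```python
def improving_checkpoints(history):
    """Return the epochs whose Acc@1 beat every earlier epoch.

    `history` is a list of (epoch, acc1) pairs in training order.
    """
    best = float("-inf")
    improved = []
    for epoch, acc1 in history:
        if acc1 > best:
            best = acc1
            improved.append(epoch)
    return improved


def average_state_dicts(state_dicts):
    """Average parameters key by key across checkpoints.

    Values only need to support + and /, so floats (used here) and tensors
    both work. Integer buffers would need separate handling.
    """
    n = len(state_dicts)
    return {key: sum(sd[key] for sd in state_dicts) / n for key in state_dicts[0]}


# Toy data standing in for real checkpoints (hypothetical values).
history = [(0, 60.0), (1, 62.5), (2, 61.0), (3, 63.0), (4, 64.5), (5, 64.0)]
checkpoints = {epoch: {"w": float(epoch), "b": 2.0 * epoch} for epoch, _ in history}

last_three = improving_checkpoints(history)[-3:]
averaged = average_state_dicts([checkpoints[e] for e in last_three])
print(last_three, averaged)
```

In this toy run, epochs 2 and 5 do not improve Acc@1, so they are skipped and epochs 1, 3 and 4 are averaged.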

86

87
88
89
90
91
92
### EfficientNet

The weights of the B0-B4 variants are ported from Ross Wightman's [timm repo](https://github.com/rwightman/pytorch-image-models/blob/01cb46a9a50e3ba4be167965b5764e9702f09b30/timm/models/efficientnet.py#L95-L108).

The weights of the B5-B7 variants are ported from Luke Melas' [EfficientNet-PyTorch repo](https://github.com/lukemelas/EfficientNet-PyTorch/blob/1039e009545d9329ea026c9f7541341439712b96/efficientnet_pytorch/utils.py#L562-L564).

All models were trained using bicubic interpolation, and each has its own crop and resize sizes. To validate the models, use the following commands:
```
torchrun --nproc_per_node=8 train.py --model efficientnet_b0 --interpolation bicubic \
     --val-resize-size 256 --val-crop-size 224 --train-crop-size 224 --test-only --pretrained
torchrun --nproc_per_node=8 train.py --model efficientnet_b1 --interpolation bicubic \
     --val-resize-size 256 --val-crop-size 240 --train-crop-size 240 --test-only --pretrained
torchrun --nproc_per_node=8 train.py --model efficientnet_b2 --interpolation bicubic \
     --val-resize-size 288 --val-crop-size 288 --train-crop-size 288 --test-only --pretrained
torchrun --nproc_per_node=8 train.py --model efficientnet_b3 --interpolation bicubic \
     --val-resize-size 320 --val-crop-size 300 --train-crop-size 300 --test-only --pretrained
torchrun --nproc_per_node=8 train.py --model efficientnet_b4 --interpolation bicubic \
     --val-resize-size 384 --val-crop-size 380 --train-crop-size 380 --test-only --pretrained
torchrun --nproc_per_node=8 train.py --model efficientnet_b5 --interpolation bicubic \
     --val-resize-size 456 --val-crop-size 456 --train-crop-size 456 --test-only --pretrained
torchrun --nproc_per_node=8 train.py --model efficientnet_b6 --interpolation bicubic \
     --val-resize-size 528 --val-crop-size 528 --train-crop-size 528 --test-only --pretrained
torchrun --nproc_per_node=8 train.py --model efficientnet_b7 --interpolation bicubic \
     --val-resize-size 600 --val-crop-size 600 --train-crop-size 600 --test-only --pretrained
```
```
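
Since the eight commands differ only in the model name and the sizes, they can also be generated from a small table. The sketch below is a convenience wrapper, not part of the reference scripts; `EFFICIENTNET_SIZES` and `validation_command` are hypothetical names, and the sizes are copied from the commands above.

```python
import shlex

# (val-resize-size, crop size) per variant, copied from the commands above.
EFFICIENTNET_SIZES = {
    "efficientnet_b0": (256, 224),
    "efficientnet_b1": (256, 240),
    "efficientnet_b2": (288, 288),
    "efficientnet_b3": (320, 300),
    "efficientnet_b4": (384, 380),
    "efficientnet_b5": (456, 456),
    "efficientnet_b6": (528, 528),
    "efficientnet_b7": (600, 600),
}


def validation_command(model, nproc_per_node=8):
    """Build the validation command line for one EfficientNet variant."""
    resize, crop = EFFICIENTNET_SIZES[model]
    args = [
        "torchrun", f"--nproc_per_node={nproc_per_node}", "train.py",
        "--model", model, "--interpolation", "bicubic",
        "--val-resize-size", str(resize),
        "--val-crop-size", str(crop),
        "--train-crop-size", str(crop),
        "--test-only", "--pretrained",
    ]
    return shlex.join(args)


for name in EFFICIENTNET_SIZES:
    print(validation_command(name))
```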

### RegNet

#### Small models
```
torchrun --nproc_per_node=8 train.py\
     --model $MODEL --epochs 100 --batch-size 128 --wd 0.00005 --lr=0.8\
     --lr-scheduler=cosineannealinglr --lr-warmup-method=linear\
     --lr-warmup-epochs=5 --lr-warmup-decay=0.1
```
Here `$MODEL` is one of `regnet_x_400mf`, `regnet_x_800mf`, `regnet_x_1_6gf`, `regnet_y_400mf`, `regnet_y_800mf` and `regnet_y_1_6gf`. Please note we used learning rate 0.4 for `regnet_y_400mf` to get the same Acc@1 as [the paper](https://arxiv.org/abs/2003.13678).

#### Medium models
```
torchrun --nproc_per_node=8 train.py\
     --model $MODEL --epochs 100 --batch-size 64 --wd 0.00005 --lr=0.4\
     --lr-scheduler=cosineannealinglr --lr-warmup-method=linear\
     --lr-warmup-epochs=5 --lr-warmup-decay=0.1
```
Here `$MODEL` is one of `regnet_x_3_2gf`, `regnet_x_8gf`, `regnet_x_16gf`, `regnet_y_3_2gf` and `regnet_y_8gf`.

#### Large models
```
torchrun --nproc_per_node=8 train.py\
     --model $MODEL --epochs 100 --batch-size 32 --wd 0.00005 --lr=0.2\
     --lr-scheduler=cosineannealinglr --lr-warmup-method=linear\
     --lr-warmup-epochs=5 --lr-warmup-decay=0.1
```
Here `$MODEL` is one of `regnet_x_32gf`, `regnet_y_16gf` and `regnet_y_32gf`.
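
All three RegNet recipes share the same schedule flags and differ only in batch size and peak learning rate (128 and 0.8, 64 and 0.4, 32 and 0.2). The sketch below shows one plausible reading of those flags: a linear warmup over the first 5 epochs, starting at `--lr-warmup-decay` times the peak rate, followed by cosine annealing. Treating `--lr-warmup-decay` as the starting factor, stepping per epoch, and annealing to zero are assumptions; `warmup_cosine_lr` is a hypothetical helper, not the code in `train.py`.

```python
import math


def warmup_cosine_lr(epoch, base_lr, epochs=100, warmup_epochs=5, warmup_decay=0.1):
    """Per-epoch learning rate: linear warmup, then cosine annealing to 0."""
    if epoch < warmup_epochs:
        # Ramp linearly from warmup_decay * base_lr up to base_lr.
        factor = warmup_decay + (1.0 - warmup_decay) * epoch / warmup_epochs
        return base_lr * factor
    # Cosine decay over the remaining epochs.
    progress = (epoch - warmup_epochs) / (epochs - warmup_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for tier, base_lr in (("small", 0.8), ("medium", 0.4), ("large", 0.2)):
    schedule = [warmup_cosine_lr(e, base_lr) for e in range(100)]
    print(tier, f"start={schedule[0]:.3f}", f"peak={max(schedule):.3f}",
          f"end={schedule[-1]:.5f}")
```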

## Mixed precision training
Automatic Mixed Precision (AMP) training on GPU for PyTorch can be enabled with [torch.cuda.amp](https://pytorch.org/docs/stable/amp.html?highlight=amp#module-torch.cuda.amp).

Mixed precision training makes use of both FP32 and FP16 precisions where appropriate. FP16 operations can leverage the Tensor Cores on NVIDIA GPUs (Volta, Turing or newer architectures) for improved throughput, generally without loss in model accuracy. Mixed precision training also often allows larger batch sizes. GPU automatic mixed precision training for PyTorch Vision can be enabled via the `--amp` flag:

```
torchrun --nproc_per_node=8 train.py \
    --model resnext50_32x4d --epochs 100 --amp
```
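
One reason AMP implementations such as `torch.cuda.amp` pair FP16 compute with gradient scaling is that small values underflow in half precision. The standalone sketch below illustrates the effect using only the standard library. It does not use PyTorch, and the gradient magnitude and scale factor are hypothetical values chosen for illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


tiny_grad = 1e-8        # hypothetical gradient magnitude
loss_scale = 2.0 ** 16  # hypothetical scale factor

# Stored directly in FP16, the value is too small to represent.
unscaled = to_fp16(tiny_grad)

# Scaled before the FP16 round trip, then unscaled in higher precision.
recovered = to_fp16(tiny_grad * loss_scale) / loss_scale

print("FP16 without scaling:", unscaled)
print("FP16 with scaling:   ", recovered)
```

Without scaling the value is flushed to zero, while the scaled round trip keeps it to within FP16 rounding error.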

## Quantized

### Post training quantized models

For all post training quantized models, the settings are:

1. num_calibration_batches: 32
2. num_workers: 16
3. batch_size: 32
4. eval_batch_size: 128
5. backend: 'fbgemm'

```
python train_quantization.py --device='cpu' --post-training-quantize --backend='fbgemm' --model="$MODEL"
```
Here `$MODEL` is one of `googlenet`, `inception_v3`, `resnet18`, `resnet50`, `resnext101_32x8d` and `shufflenet_v2_x1_0`.
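
To give an intuition for what the calibration batches are used for, the sketch below implements a toy min/max observer with 8-bit affine quantization in plain Python. It is a simplified illustration under stated assumptions: the observers and kernels actually used by `train_quantization.py` and the `fbgemm` backend may differ (for example per-channel or histogram-based observers), and all names and data here are hypothetical.

```python
def calibrate(batches, qmin=0, qmax=255):
    """Derive (scale, zero_point) from the min/max seen over calibration batches."""
    lo = min(min(batch) for batch in batches)
    hi = max(max(batch) for batch in batches)
    # The representable range must include 0.0 so that zero maps exactly.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to an integer in [qmin, qmax]."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to an approximate float."""
    return (q - zero_point) * scale


# Hypothetical activations standing in for 32 calibration batches.
batches = [[-1.0 + 0.01 * i, 0.5, 3.0 - 0.02 * i] for i in range(32)]
scale, zero_point = calibrate(batches)

x = 1.234
x_hat = dequantize(quantize(x, scale, zero_point), scale, zero_point)
print(f"scale={scale:.6f} zero_point={zero_point} x={x} x_hat={x_hat:.4f}")
```

More calibration batches give the observer a better estimate of the activation range, at the cost of a longer calibration pass.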

### QAT MobileNetV2

For MobileNetV2, the model was trained with quantization aware training. The settings used are:
1. num_workers: 16
2. batch_size: 32
3. eval_batch_size: 128
4. backend: 'qnnpack'
5. learning-rate: 0.0001
6. num_epochs: 90
7. num_observer_update_epochs: 4
8. num_batch_norm_update_epochs: 3
9. momentum: 0.9
10. lr_step_size: 30
11. lr_gamma: 0.1
12. weight-decay: 0.0001

```
torchrun --nproc_per_node=8 train_quantization.py --model='mobilenet_v2'
```

Training converges at about 10 epochs.

### QAT MobileNetV3

For MobileNetV3 Large, the model was trained with quantization aware training. The settings used are:
1. num_workers: 16
2. batch_size: 32
3. eval_batch_size: 128
4. backend: 'qnnpack'
5. learning-rate: 0.001
6. num_epochs: 90
7. num_observer_update_epochs: 4
8. num_batch_norm_update_epochs: 3
9. momentum: 0.9
10. lr_step_size: 30
11. lr_gamma: 0.1
12. weight-decay: 0.00001

```
torchrun --nproc_per_node=8 train_quantization.py --model='mobilenet_v3_large' \
    --wd 0.00001 --lr 0.001
```

For post training quantization, the device is set to CPU. For training, the device is set to CUDA.

### Command to evaluate quantized models using the pre-trained weights:

```
python train_quantization.py --device='cpu' --test-only --backend='<backend>' --model='<model_name>'
```

For inception_v3 you need to pass the following extra parameters:
```
--val-resize-size 342 --val-crop-size 299 --train-crop-size 299
```