# Image classification reference training scripts

This folder contains reference training scripts for image classification.
They serve as a log of how to train specific models, and provide baseline
training and evaluation scripts to quickly bootstrap research.

Except where otherwise noted, all models have been trained on 8x V100 GPUs with
the following parameters:

| Parameter                | Value  |
| ------------------------ | ------ |
| `--batch_size`           | `32`   |
| `--epochs`               | `90`   |
| `--lr`                   | `0.1`  |
| `--momentum`             | `0.9`  |
| `--wd`, `--weight-decay` | `1e-4` |
| `--lr-step-size`         | `30`   |
| `--lr-gamma`             | `0.1`  |
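
For reference, these flags conventionally map to plain SGD with a step learning-rate schedule; a minimal PyTorch sketch (the model choice is illustrative, not part of the defaults):

```
import torch
from torchvision import models

model = models.resnet50()  # any classification model; illustrative choice
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4
)
# Multiply the learning rate by 0.1 (lr-gamma) every 30 epochs (lr-step-size).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
```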

### AlexNet and VGG

Since `AlexNet` and the original `VGG` architectures do not include batch
normalization, the default initial learning rate `--lr 0.1` is too high.

```
torchrun --nproc_per_node=8 train.py \
    --model $MODEL --lr 1e-2
```

Here `$MODEL` is one of `alexnet`, `vgg11`, `vgg13`, `vgg16` or `vgg19`. Note
that `vgg11_bn`, `vgg13_bn`, `vgg16_bn`, and `vgg19_bn` include batch
normalization and thus are trained with the default parameters.

### ResNext-50 32x4d
```
torchrun --nproc_per_node=8 train.py \
    --model resnext50_32x4d --epochs 100
```


### ResNext-101 32x8d

```
torchrun --nproc_per_node=8 train.py \
    --model resnext101_32x8d --epochs 100
```

Note that the above command corresponds to a single node with 8 GPUs. If you use
a different number of GPUs and/or a different batch size, then the learning rate
should be scaled accordingly. For example, the pretrained model provided by
`torchvision` was trained on 8 nodes, each with 8 GPUs (for a total of 64 GPUs),
with `--batch_size 16` and `--lr 0.4`, instead of the defaults `--batch_size 32`
and `--lr 0.1`: the total batch size is 4x larger (64 × 16 = 1024 vs. 8 × 32 = 256),
so the learning rate is scaled up by the same factor.
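
A hypothetical helper (not part of `train.py`) that makes this scaling rule explicit:

```
# Scale the learning rate linearly with the total batch size.
def scaled_lr(base_lr=0.1, base_total_batch=8 * 32, gpus=64, per_gpu_batch=16):
    return base_lr * (gpus * per_gpu_batch) / base_total_batch

print(scaled_lr())  # 0.4 -- the --lr used for the 64-GPU pretrained run
```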

### MobileNetV2
```
torchrun --nproc_per_node=8 train.py \
     --model mobilenet_v2 --epochs 300 --lr 0.045 --wd 0.00004 \
     --lr-step-size 1 --lr-gamma 0.98
```

### MobileNetV3 Large & Small
64
```
65
torchrun --nproc_per_node=8 train.py\
66
     --model $MODEL --epochs 600 --opt rmsprop --batch-size 128 --lr 0.064\ 
67
68
69
     --wd 0.00001 --lr-step-size 2 --lr-gamma 0.973 --auto-augment imagenet --random-erase 0.2
```

Here `$MODEL` is one of `mobilenet_v3_large` or `mobilenet_v3_small`.

Then we averaged the parameters of the last 3 checkpoints that improved the Acc@1. See [#3182](https://github.com/pytorch/vision/pull/3182) 
and [#3354](https://github.com/pytorch/vision/pull/3354) for details.
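
A minimal sketch of that averaging step; the checkpoint filenames and the `"model"` key are assumptions about the checkpoint layout, see the PRs above for the exact procedure:

```
import torch

# Hypothetical paths to the last 3 checkpoints that improved Acc@1.
paths = ["checkpoint_597.pth", "checkpoint_598.pth", "checkpoint_599.pth"]
state_dicts = [torch.load(p, map_location="cpu")["model"] for p in paths]

averaged = {}
for key in state_dicts[0]:
    tensors = [sd[key] for sd in state_dicts]
    if tensors[0].is_floating_point():
        averaged[key] = torch.stack(tensors).mean(dim=0)
    else:
        # Integer buffers (e.g. BatchNorm's num_batches_tracked) are not averaged.
        averaged[key] = tensors[-1]

torch.save({"model": averaged}, "mobilenet_v3_averaged.pth")
```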

### EfficientNet

The weights of the B0-B4 variants are ported from Ross Wightman's [timm repo](https://github.com/rwightman/pytorch-image-models/blob/01cb46a9a50e3ba4be167965b5764e9702f09b30/timm/models/efficientnet.py#L95-L108).

The weights of the B5-B7 variants are ported from Luke Melas' [EfficientNet-PyTorch repo](https://github.com/lukemelas/EfficientNet-PyTorch/blob/1039e009545d9329ea026c9f7541341439712b96/efficientnet_pytorch/utils.py#L562-L564).

## Mixed precision training
Automatic Mixed Precision (AMP) training on GPU for PyTorch can be enabled with the [NVIDIA Apex extension](https://github.com/NVIDIA/apex).

Mixed precision training makes use of both FP32 and FP16 precision where appropriate. FP16 operations can leverage the Tensor Cores on NVIDIA GPUs (Volta, Turing or newer architectures) for improved throughput, generally without loss in model accuracy. Mixed precision training also often allows larger batch sizes. GPU automatic mixed precision training for PyTorch Vision can be enabled via the `--apex` flag.

```
torchrun --nproc_per_node=8 train.py \
    --model resnext50_32x4d --epochs 100 --apex
```
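
For reference, a minimal sketch of what enabling Apex AMP involves inside a training loop; the `"O1"` opt level and the loss computation are assumptions, not necessarily what `train.py` does:

```
import torch
from apex import amp
from torchvision import models

model = models.resnext50_32x4d().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Patch the model and optimizer for mixed precision ("O1" = conservative mixed precision).
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

images = torch.randn(32, 3, 224, 224, device="cuda")       # stand-in batch
targets = torch.randint(0, 1000, (32,), device="cuda")     # stand-in labels
loss = torch.nn.functional.cross_entropy(model(images), targets)

# Scale the loss to avoid FP16 gradient underflow, then backpropagate.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```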

## Quantized

### Parameters used for generating quantized models:

For all post-training quantized models (all quantized models except MobileNet-V2), the settings are:

1. num_calibration_batches: 32
2. num_workers: 16
3. batch_size: 32
4. eval_batch_size: 128
5. backend: 'fbgemm'

```
python train_quantization.py --device='cpu' --post-training-quantize --backend='fbgemm' --model='<model_name>'
```
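
At a high level this follows the standard eager-mode post-training quantization workflow in PyTorch; a minimal sketch (not the exact `train_quantization.py` internals, and the model choice is illustrative):

```
import torch
from torchvision import models

# Quantizable model with pretrained float weights.
model = models.quantization.resnet18(pretrained=True, quantize=False)
model.eval()
model.fuse_model()

# Use the fbgemm backend (x86 server CPUs), matching the settings above.
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)

# Calibrate by running representative batches through the model;
# random tensors here stand in for real ImageNet calibration data.
with torch.no_grad():
    for _ in range(32):  # num_calibration_batches from the settings above
        model(torch.randn(32, 3, 224, 224))

torch.quantization.convert(model, inplace=True)
```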

For MobileNet-V2, the model was trained with quantization-aware training; the settings used are:
1. num_workers: 16
2. batch_size: 32
3. eval_batch_size: 128
4. backend: 'qnnpack'
5. learning-rate: 0.0001
6. num_epochs: 90
7. num_observer_update_epochs: 4
8. num_batch_norm_update_epochs: 3
9. momentum: 0.9
10. lr_step_size: 30
11. lr_gamma: 0.1
12. weight-decay: 0.0001

```
torchrun --nproc_per_node=8 train_quantization.py --model='mobilenet_v2'
```

Training converges in about 10 epochs.
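
A minimal sketch of the eager-mode quantization-aware training workflow these settings correspond to (not the exact `train_quantization.py` internals; the single dummy training step stands in for real fine-tuning):

```
import torch
from torchvision import models

model = models.quantization.mobilenet_v2(pretrained=True, quantize=False)
model.fuse_model()
model.qconfig = torch.quantization.get_default_qat_qconfig("qnnpack")
model.train()
torch.quantization.prepare_qat(model, inplace=True)

# Fine-tune with fake quantization enabled; one dummy step shown here.
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001, momentum=0.9)
images = torch.randn(32, 3, 224, 224)   # stand-in for real training batches
targets = torch.randint(0, 1000, (32,))
loss = torch.nn.functional.cross_entropy(model(images), targets)
loss.backward()
optimizer.step()

# After a few epochs, freeze observers and batch-norm statistics,
# then convert to a truly quantized model for evaluation.
model.apply(torch.quantization.disable_observer)
model.apply(torch.nn.intrinsic.qat.freeze_bn_stats)
quantized = torch.quantization.convert(model.eval(), inplace=False)
```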

For MobileNet-V3 Large, the model was trained with quantization-aware training; the settings used are:
1. num_workers: 16
2. batch_size: 32
3. eval_batch_size: 128
4. backend: 'qnnpack'
5. learning-rate: 0.001
6. num_epochs: 90
7. num_observer_update_epochs: 4
8. num_batch_norm_update_epochs: 3
9. momentum: 0.9
10. lr_step_size: 30
11. lr_gamma: 0.1
12. weight-decay: 0.00001

```
torchrun --nproc_per_node=8 train_quantization.py --model='mobilenet_v3_large' \
    --wd 0.00001 --lr 0.001
```

For post-training quantization, the device is set to CPU; for training, the device is set to CUDA.

### Command to evaluate quantized models using the pre-trained weights:

```
python train_quantization.py --device='cpu' --test-only --backend='<backend>' --model='<model_name>'
```