MODEL_GARDEN.md 14.2 KB
Newer Older
Xianzhi Du's avatar
Xianzhi Du committed
1
# TF-Vision Model Garden
Abdullah Rashwan's avatar
Abdullah Rashwan committed
2
3
4

## Introduction

Xianzhi Du's avatar
Xianzhi Du committed
5
6
7
TF-Vision modeling library for computer vision provides a collection of
baselines and checkpoints for image classification, object detection, and
segmentation.
Abdullah Rashwan's avatar
Abdullah Rashwan committed
8
9

## Image Classification
Xianzhi Du's avatar
Xianzhi Du committed
10

Abdullah Rashwan's avatar
Abdullah Rashwan committed
11
### ImageNet Baselines
Xianzhi Du's avatar
Xianzhi Du committed
12
13
14
15
16

#### ResNet models trained with vanilla settings

* Models are trained from scratch with batch size 4096 and 1.6 initial learning
  rate.
A. Unique TensorFlower's avatar
A. Unique TensorFlower committed
17
18
19
* Linear warmup is applied for the first 5 epochs.
* Models trained with l2 weight regularization and ReLU activation.

Xianzhi Du's avatar
Xianzhi Du committed
20
21
| Model        | Resolution    | Epochs  |  Top-1  |  Top-5  | Download |
| ------------ |:-------------:|--------:|--------:|--------:|---------:|
A. Unique TensorFlower's avatar
A. Unique TensorFlower committed
22
23
24
25
| ResNet-50    | 224x224       |    90    | 76.1 | 92.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml) |
| ResNet-50    | 224x224       |    200   | 77.1 | 93.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml) |
| ResNet-101   | 224x224       |    200   | 78.3 | 94.2 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnet101_tpu.yaml) |
| ResNet-152   | 224x224       |    200   | 78.7 | 94.3 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnet152_tpu.yaml) |
A. Unique TensorFlower's avatar
A. Unique TensorFlower committed
26

Xianzhi Du's avatar
Xianzhi Du committed
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
#### ResNet-RS models trained with various settings

We support state-of-the-art [ResNet-RS](https://arxiv.org/abs/2103.07579) image
classification models with features:

* ResNet-RS architectural changes and Swish activation. (Note that ResNet-RS
  adopts ReLU activation in the paper.)
* Regularization methods including Random Augment, 4e-5 weight decay, stochastic
depth, label smoothing and dropout.
* New training methods including a 350-epoch schedule, cosine learning rate and
  EMA.
* Configs are in this [directory](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification).

| Model     | Resolution | Params (M) | Top-1 | Top-5 | Download |
| --------- | :--------: | ---------: | ----: | ----: | --------:|
| ResNet-RS-50 | 160x160    | 35.7    | 79.1  | 94.5  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs50_i160.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-50-i160.tar.gz) |
| ResNet-RS-101 | 160x160    | 63.7    | 80.2  | 94.9  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs101_i160.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-101-i160.tar.gz) |
| ResNet-RS-101 | 192x192    | 63.7    | 81.3  | 95.6  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs101_i192.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-101-i192.tar.gz) |
| ResNet-RS-152 | 192x192    | 86.8    | 81.9  | 95.8  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i192.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i192.tar.gz) |
| ResNet-RS-152 | 224x224    | 86.8    | 82.5  | 96.1  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i224.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i224.tar.gz) |
| ResNet-RS-152 | 256x256    | 86.8    | 83.1  | 96.3  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i256.tar.gz) |
| ResNet-RS-200 | 256x256    | 93.4    | 83.5  | 96.6  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs200_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-200-i256.tar.gz) |
| ResNet-RS-270 | 256x256    | 130.1    | 83.6  | 96.6  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs270_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-270-i256.tar.gz) |
| ResNet-RS-350 | 256x256    |  164.3   | 83.7  | 96.7  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs350_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i256.tar.gz) |
| ResNet-RS-350 | 320x320    | 164.3   | 84.2  | 96.9  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs420_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i320.tar.gz) |
Abdullah Rashwan's avatar
Abdullah Rashwan committed
52
53

## Object Detection and Instance Segmentation
Xianzhi Du's avatar
Xianzhi Du committed
54

Abdullah Rashwan's avatar
Abdullah Rashwan committed
55
### Common Settings and Notes
Xianzhi Du's avatar
Xianzhi Du committed
56

Xianzhi Du's avatar
Xianzhi Du committed
57
58
59
60
61
62
* We provide models adopting [ResNet-FPN](https://arxiv.org/abs/1612.03144) and
  [SpineNet](https://arxiv.org/abs/1912.05027) backbones  based on detection frameworks:
  * [RetinaNet](https://arxiv.org/abs/1708.02002) and [RetinaNet-RS](https://arxiv.org/abs/2107.00057)
  * [Mask R-CNN](https://arxiv.org/abs/1703.06870)
  * [Cascade RCNN](https://arxiv.org/abs/1712.00726) and [Cascade RCNN-RS](https://arxiv.org/abs/2107.00057)

Abdullah Rashwan's avatar
Abdullah Rashwan committed
63
64
* Models are all trained on COCO train2017 and evaluated on COCO val2017.
* Training details:
Xianzhi Du's avatar
Xianzhi Du committed
65
66
67
68
69
70
71
72
73
74
  * Models finetuned from ImageNet pretrained checkpoints adopt the 12 or 36
    epochs schedule. Models trained from scratch adopt the 350 epochs schedule.
  * The default training data augmentation implements horizontal flipping and
    scale jittering with a random scale between [0.5, 2.0].
  * Unless noted, all models are trained with l2 weight regularization and ReLU
    activation.
  * We use batch size 256 and stepwise learning rate that decays at the last 30
    and 10 epoch.
  * We use square image as input by resizing the long side of an image to the
    target size then padding the short side with zeros.
Abdullah Rashwan's avatar
Abdullah Rashwan committed
75
76

### COCO Object Detection Baselines
Xianzhi Du's avatar
Xianzhi Du committed
77

Abdullah Rashwan's avatar
Abdullah Rashwan committed
78
#### RetinaNet (ImageNet pretrained)
Xianzhi Du's avatar
Xianzhi Du committed
79
80
81
82
83

| Backbone     | Resolution    | Epochs  | FLOPs (B)     | Params (M) | Box AP | Download |
| ------------ |:-------------:| -------:|--------------:|-----------:|-------:|---------:|
| R50-FPN      | 640x640       |    12   | 97.0 | 34.0 | 34.3 | config|
| R50-FPN      | 640x640       |    72   | 97.0 | 34.0 | 36.8 | config \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/retinanet/retinanet-resnet50fpn.tar.gz) |
Abdullah Rashwan's avatar
Abdullah Rashwan committed
84

A. Unique TensorFlower's avatar
A. Unique TensorFlower committed
85
#### RetinaNet (Trained from scratch) with training features including:
Xianzhi Du's avatar
Xianzhi Du committed
86

A. Unique TensorFlower's avatar
A. Unique TensorFlower committed
87
88
89
* Stochastic depth with drop rate 0.2.
* Swish activation.

Xianzhi Du's avatar
Xianzhi Du committed
90
91
| Backbone     | Resolution    | Epochs  | FLOPs (B)     | Params (M) |  Box AP | Download |
| ------------ |:-------------:| -------:|--------------:|-----------:|--------:|---------:|
Xianzhi Du's avatar
Xianzhi Du committed
92
93
94
95
96
97
| SpineNet-49  | 640x640       |    500    | 85.4| 28.5 | 44.2 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/retinanet/coco_spinenet49_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)|
| SpineNet-96  | 1024x1024     |    500    | 265.4 | 43.0 | 48.5 |  [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/retinanet/coco_spinenet96_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)|
| SpineNet-143 | 1280x1280     |    500    | 524.0 | 67.0 | 50.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/retinanet/coco_spinenet143_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)|

#### Mobile-size RetinaNet (Trained from scratch):

Xianzhi Du's avatar
Xianzhi Du committed
98
99
100
101
| Backbone    | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download |
| ----------- | :--------: | -----: | --------: | ---------: | -----: | --------:|
| MobileNetv2 | 256x256    | 600    | -         | 2.27       | 23.5   | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/retinanet/coco_mobilenetv2_tpu.yaml) |
| Mobile SpineNet-49  | 384x384    | 600    | 1.0      | 2.32       | 28.1   | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/retinanet/coco_spinenet49_mobile_tpu.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/retinanet/spinenet49mobile.tar.gz) |
Abdullah Rashwan's avatar
Abdullah Rashwan committed
102
103
104
105

### Instance Segmentation Baselines

#### Mask R-CNN (Trained from scratch)
Yin Cui's avatar
Yin Cui committed
106

Xianzhi Du's avatar
Xianzhi Du committed
107
108
| Backbone     | Resolution    | Epochs  | FLOPs (B)  | Params (M) | Box AP | Mask AP | Download |
| ------------ |:-------------:| -------:|-----------:|-----------:|-------:|--------:|---------:|
Xianzhi Du's avatar
Xianzhi Du committed
109
110
111
112
113
114
115
116
117
118
119
120
ResNet50-FPN | 640x640    | 350    | 227.7     | 46.3       | 42.3   | 37.6    | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/r50fpn_640_coco_scratch_tpu4x4.yaml) |
| SpineNet-49  | 640x640       |  350    | 215.7      | 40.8       | 42.6   | 37.9    | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet49_mrcnn_tpu.yaml) |
SpineNet-96  | 1024x1024  | 500    | 315.0     | 55.2       | 48.1   | 42.4    | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet96_mrcnn_tpu.yaml) |
SpineNet-143 | 1280x1280  | 500    | 498.8     | 79.2       | 49.3   | 43.4    | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet143_mrcnn_tpu.yaml) |


#### Cascade RCNN-RS (Trained from scratch)

backbone     | resolution | epochs | params (M) | box AP | mask AP | download
------------ | :--------: | -----: | ---------: | -----: | ------: | -------:
SpineNet-49  | 640x640    | 500    | 56.4       | 46.4   | 40.0    | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet49_cascadercnn_tpu.yaml)|
SpineNet-143 | 1280x1280  | 500    | 94.9       | 51.9   | 45.0    | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet143_cascadercnn_tpu.yaml)|
Xianzhi Du's avatar
Xianzhi Du committed
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140

## Semantic Segmentation

* We support [DeepLabV3](https://arxiv.org/pdf/1706.05587.pdf) and
  [DeepLabV3+](https://arxiv.org/pdf/1802.02611.pdf) architectures, with
  Dilated ResNet backbones.
* Backbones are pre-trained on ImageNet.

### PASCAL-VOC

| Model      | Backbone           | Resolution | Steps | mIoU | Download |
| ---------- | :----------------: | :--------: | ----: | ---: | --------:|
| DeepLabV3  | Dilated Resnet-101 | 512x512    | 30k   | 78.7 |          |
| DeepLabV3+ | Dilated Resnet-101 | 512x512    | 30k   | 79.2 |          |

### CITYSCAPES

| Model      | Backbone           | Resolution | Steps | mIoU  | Download |
| ---------- | :----------------: | :--------: | ----: | ----: | --------:|
| DeepLabV3+ | Dilated Resnet-101 | 1024x2048  | 90k   | 78.79 |          |
Yin Cui's avatar
Yin Cui committed
141
142

## Video Classification
Xianzhi Du's avatar
Xianzhi Du committed
143

Yin Cui's avatar
Yin Cui committed
144
### Common Settings and Notes
Xianzhi Du's avatar
Xianzhi Du committed
145

146
147
148
149
150
151
152
153
*   We provide models for video classification with backbones:
    *   SlowOnly in
        [SlowFast Networks for Video Recognition](https://arxiv.org/abs/1812.03982).
    *   ResNet-3D (R3D) in
        [Spatiotemporal Contrastive Video Representation Learning](https://arxiv.org/abs/2008.03800).
    *   ResNet-3D-RS (R3D-RS) in
        [Revisiting 3D ResNets for Video Recognition](https://arxiv.org/pdf/2109.01696.pdf).

Yin Cui's avatar
Yin Cui committed
154
* Training and evaluation details:
Xianzhi Du's avatar
Xianzhi Du committed
155
156
157
158
159
160
  * All models are trained from scratch with vision modality (RGB) for 200
    epochs.
  * We use batch size of 1024 and cosine learning rate decay with linear warmup
    in first 5 epochs.
  * We follow [SlowFast](https://arxiv.org/abs/1812.03982) to perform 30-view
    evaluation.
Yin Cui's avatar
Yin Cui committed
161
162

### Kinetics-400 Action Recognition Baselines
Xianzhi Du's avatar
Xianzhi Du committed
163
164

| Model    | Input (frame x stride) |  Top-1  |  Top-5  | Download |
Yin Cui's avatar
Yin Cui committed
165
166
167
168
| -------- |:----------------------:|--------:|--------:|---------:|
| SlowOnly | 8 x 8                  |  74.1   |  91.4   | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k400_slowonly8x8_tpu.yaml) |
| SlowOnly | 16 x 4                 |  75.6   |  92.1   | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k400_slowonly16x4_tpu.yaml) |
| R3D-50   | 32 x 2                 |  77.0   |  93.0   | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k400_3d-resnet50_tpu.yaml) |
169
170
171
172
173
| R3D-RS-50   | 32 x 2                 |  78.2   |  93.7   | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k400_resnet3drs_50_tpu.yaml) |
| R3D-RS-101 | 32 x 2                 | 79.5  | 94.2  | -
| R3D-RS-152 | 32 x 2                 | 79.9  | 94.3  | -
| R3D-RS-200 | 32 x 2                 | 80.4  | 94.4  | -
| R3D-RS-200 | 48 x 2                 | 81.0  | -     | -
Yin Cui's avatar
Yin Cui committed
174
175

### Kinetics-600 Action Recognition Baselines
Xianzhi Du's avatar
Xianzhi Du committed
176
177

| Model    | Input (frame x stride) |  Top-1  |  Top-5  | Download |
Yin Cui's avatar
Yin Cui committed
178
179
180
| -------- |:----------------------:|--------:|--------:|---------:|
| SlowOnly | 8 x 8                  |  77.3   |  93.6   | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k600_slowonly8x8_tpu.yaml) |
| R3D-50   | 32 x 2                 |  79.5   |  94.8   | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k600_3d-resnet50_tpu.yaml) |
181
182
| R3D-RS-200 | 32 x 2                 | 83.1  | -     | -
| R3D-RS-200 | 48 x 2                 | 83.8  | -     | -