# TF-Vision Model Garden

⚠️ Disclaimer: Datasets hyperlinked from this page are not owned or
distributed by Google. They are made available by third parties.
Please review the third parties' terms and conditions before using the data.

## Introduction

The TF-Vision modeling library for computer vision provides a collection of
baselines and checkpoints for image classification, object detection,
segmentation, and video classification.

## Image Classification

### ImageNet Baselines

#### ResNet models trained with vanilla settings

* Models are trained from scratch with a batch size of 4096 and an initial
  learning rate of 1.6.
* Linear warmup is applied for the first 5 epochs.
* Models are trained with L2 weight regularization and ReLU activation.
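
The warmup described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name, the warmup starting value of 0, and the ImageNet-1k training-set size of 1,281,167 images are assumptions, and the schedule used after warmup is defined in the linked configs rather than here.

```python
def warmup_learning_rate(step, steps_per_epoch, base_lr=1.6, warmup_epochs=5):
    """Linearly ramp the learning rate from 0 to base_lr during warmup.

    Assumption: warmup starts at 0. After warmup this sketch simply holds
    base_lr; the real decay schedule lives in the experiment configs.
    """
    warmup_steps = warmup_epochs * steps_per_epoch
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr


# Assumption: ImageNet-1k has 1,281,167 training images, so a batch size of
# 4096 gives 312 full steps per epoch.
STEPS_PER_EPOCH = 1281167 // 4096

if __name__ == "__main__":
    for epoch in range(7):
        lr = warmup_learning_rate(epoch * STEPS_PER_EPOCH, STEPS_PER_EPOCH)
        print(f"epoch {epoch}: lr = {lr:.2f}")
```

Under these assumptions the learning rate reaches 1.6 at the start of epoch 5 and stays there in the sketch.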

| Model        | Resolution    | Epochs  |  Top-1  |  Top-5  | Download |
| ------------ |:-------------:|--------:|--------:|--------:|---------:|
| ResNet-50    | 224x224       |    90    | 76.1 | 92.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml) |
| ResNet-50    | 224x224       |    200   | 77.1 | 93.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml) |
| ResNet-101   | 224x224       |    200   | 78.3 | 94.2 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet101_tpu.yaml) |
| ResNet-152   | 224x224       |    200   | 78.7 | 94.3 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet152_tpu.yaml) |

#### ResNet-RS models trained with various settings

We support state-of-the-art [ResNet-RS](https://arxiv.org/abs/2103.07579) image
classification models with the following features:

* ResNet-RS architectural changes and Swish activation. (Note that ResNet-RS
  adopts ReLU activation in the paper.)
* Regularization methods including RandAugment, 4e-5 weight decay, stochastic
  depth, label smoothing and dropout.
* New training methods including a 350-epoch schedule, cosine learning rate and
  EMA.
* Configs are in this [directory](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification).
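
Two of the techniques listed above, label smoothing and EMA of the weights, are simple enough to sketch in plain Python. The function names, the smoothing factor of 0.1, and the EMA decay of 0.9999 are illustrative assumptions; the values actually used are set in the configs.

```python
def smooth_labels(label, num_classes, smoothing=0.1):
    """One-hot encode `label`, then mix it with the uniform distribution.

    Assumption: smoothing=0.1 is a placeholder, not the configured value.
    """
    off_value = smoothing / num_classes
    on_value = 1.0 - smoothing + off_value
    return [on_value if i == label else off_value for i in range(num_classes)]


def update_ema(ema_weights, weights, decay=0.9999):
    """One EMA step: ema <- decay * ema + (1 - decay) * weights.

    Assumption: decay=0.9999 is a placeholder, not the configured value.
    """
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


if __name__ == "__main__":
    print(smooth_labels(2, 4))
    print(update_ema([0.0, 1.0], [1.0, 1.0], decay=0.9))
```

The smoothed targets still sum to 1, and the EMA weights move only a small fraction of the way toward the current weights on each step.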

| Model     | Resolution | Params (M) | Top-1 | Top-5 | Download |
| --------- | :--------: | ---------: | ----: | ----: | --------:|
| ResNet-RS-50 | 160x160    | 35.7    | 79.1  | 94.5  | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs50_i160.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-50-i160.tar.gz) |
| ResNet-RS-101 | 160x160    | 63.7    | 80.2  | 94.9  | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs101_i160.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-101-i160.tar.gz) |
| ResNet-RS-101 | 192x192    | 63.7    | 81.3  | 95.6  | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs101_i192.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-101-i192.tar.gz) |
| ResNet-RS-152 | 192x192    | 86.8    | 81.9  | 95.8  | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i192.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i192.tar.gz) |
| ResNet-RS-152 | 224x224    | 86.8    | 82.5  | 96.1  | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i224.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i224.tar.gz) |
| ResNet-RS-152 | 256x256    | 86.8    | 83.1  | 96.3  | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i256.tar.gz) |
| ResNet-RS-200 | 256x256    | 93.4    | 83.5  | 96.6  | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs200_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-200-i256.tar.gz) |
| ResNet-RS-270 | 256x256    | 130.1    | 83.6  | 96.6  | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs270_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-270-i256.tar.gz) |
| ResNet-RS-350 | 256x256    |  164.3   | 83.7  | 96.7  | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs350_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i256.tar.gz) |
| ResNet-RS-350 | 320x320    | 164.3   | 84.2  | 96.9  | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs350_i320.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i320.tar.gz) |


#### Vision Transformer (ViT)

We support [ViT](https://arxiv.org/abs/2010.11929) and
[DeiT](https://arxiv.org/abs/2012.12877) implementations in a TF-Vision
[project](https://github.com/tensorflow/models/tree/master/official/projects/vit).
The following ViT models are trained under the DeiT settings:

| Model    | Resolution | Top-1 | Top-5 |
| -------- | :--------: | ----: | ----: |
| ViT-s16  | 224x224    | 79.4  | 94.7  |
| ViT-b16  | 224x224    | 81.8  | 95.8  |
| ViT-l16  | 224x224    | 82.2  | 95.8  |


## Object Detection and Instance Segmentation

### Common Settings and Notes

* We provide models adopting [ResNet-FPN](https://arxiv.org/abs/1612.03144) and
  [SpineNet](https://arxiv.org/abs/1912.05027) backbones, based on the
  following detection frameworks:
  * [RetinaNet](https://arxiv.org/abs/1708.02002) and [RetinaNet-RS](https://arxiv.org/abs/2107.00057)
  * [Mask R-CNN](https://arxiv.org/abs/1703.06870)
  * [Cascade RCNN](https://arxiv.org/abs/1712.00726) and [Cascade RCNN-RS](https://arxiv.org/abs/2107.00057)
* Models are all trained on [COCO](https://cocodataset.org/) train2017 and
  evaluated on [COCO](https://cocodataset.org/) val2017.
* Training details:
  * Models finetuned from [ImageNet](https://www.image-net.org/) pretrained
    checkpoints adopt a 12- or 36-epoch schedule. Models trained from scratch
    adopt a 350-epoch schedule.
  * The default training data augmentation implements horizontal flipping and
    scale jittering with a random scale in the range [0.5, 2.0].
  * Unless noted, all models are trained with L2 weight regularization and ReLU
    activation.
  * We use a batch size of 256 and a stepwise learning rate that decays at the
    last 30 and 10 epochs.
  * We use square images as input by resizing the long side of an image to the
    target size and then padding the short side with zeros.
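
The input geometry described in the last bullet, together with the jitter range from the augmentation bullet, can be sketched as follows. This is a rough illustration rather than the library's preprocessing code: the function names are hypothetical, and placing the zero padding at the bottom and right is an assumption, since the text only says the short side is padded.

```python
import random


def resize_and_pad_shape(height, width, target_size):
    """Scale so the long side equals target_size, then pad to a square.

    Returns (scaled_height, scaled_width, pad_bottom, pad_right).
    Assumption: padding goes on the bottom and right edges.
    """
    scale = target_size / max(height, width)
    scaled_h = min(target_size, round(height * scale))
    scaled_w = min(target_size, round(width * scale))
    return scaled_h, scaled_w, target_size - scaled_h, target_size - scaled_w


def sample_jitter_scale(rng=random, min_scale=0.5, max_scale=2.0):
    """Draw a random scale factor for scale jittering from [0.5, 2.0]."""
    return rng.uniform(min_scale, max_scale)


if __name__ == "__main__":
    print(resize_and_pad_shape(480, 640, 640))
    print(resize_and_pad_shape(1000, 500, 640))
    print(sample_jitter_scale(random.Random(0)))
```

For a 480x640 image and a 640 target, the image keeps its size and receives 160 rows of zero padding to become square.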

### COCO Object Detection Baselines

#### RetinaNet (ImageNet pretrained)

| Backbone     | Resolution    | Epochs  | FLOPs (B)     | Params (M) | Box AP | Download |
| ------------ |:-------------:| -------:|--------------:|-----------:|-------:|---------:|
| R50-FPN      | 640x640       |    12   | 97.0 | 34.0 | 34.3 | config|
| R50-FPN      | 640x640       |    72   | 97.0 | 34.0 | 36.8 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/retinanet.py#L187-L258) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/retinanet/retinanet-resnet50fpn.tar.gz) |

#### RetinaNet (Trained from scratch)

Training features include:

* Stochastic depth with drop rate 0.2.
* Swish activation.
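
Both features can be sketched on scalar values in plain Python. The function names are hypothetical, and the stochastic depth variant shown (drop the residual branch, rescale it when kept) is one common formulation assumed here for illustration. The text gives a drop rate of 0.2 but does not say how it is distributed across layers, so the sketch applies it to a single block.

```python
import math
import random


def swish(x):
    """Swish (SiLU) activation: x * sigmoid(x), written to avoid overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    exp_x = math.exp(x)
    return x * exp_x / (1.0 + exp_x)


def stochastic_depth(shortcut, residual, drop_rate=0.2, training=True, rng=random):
    """Add a residual branch to its shortcut under stochastic depth.

    During training the residual branch is dropped with probability
    drop_rate and, when kept, rescaled by 1 / (1 - drop_rate) so that its
    expected value is unchanged. At inference the branch is always kept.
    """
    if not training or drop_rate == 0.0:
        return shortcut + residual
    keep_prob = 1.0 - drop_rate
    if rng.random() < keep_prob:
        return shortcut + residual / keep_prob
    return shortcut


if __name__ == "__main__":
    print(swish(1.0))
    rng = random.Random(0)
    print([stochastic_depth(1.0, 2.0, rng=rng) for _ in range(5)])
```

In a real network the same logic is applied per example to whole feature tensors rather than to scalars.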

| Backbone     | Resolution    | Epochs  | FLOPs (B)     | Params (M) |  Box AP | Download |
| ------------ |:-------------:| -------:|--------------:|-----------:|--------:|---------:|
| SpineNet-49  | 640x640       |    500    | 85.4| 28.5 | 44.2 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_spinenet49_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)|
| SpineNet-96  | 1024x1024     |    500    | 265.4 | 43.0 | 48.5 |  [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_spinenet96_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)|
| SpineNet-143 | 1280x1280     |    500    | 524.0 | 67.0 | 50.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_spinenet143_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)|

#### Mobile-size RetinaNet (Trained from scratch)

| Backbone    | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download |
| ----------- | :--------: | -----: | --------: | ---------: | -----: | --------:|
| MobileNetv2 | 256x256    | 600    | -         | 2.27       | 23.5   | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_mobilenetv2_tpu.yaml) |
| Mobile SpineNet-49  | 384x384    | 600    | 1.0      | 2.32       | 28.1   | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_spinenet49_mobile_tpu.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/retinanet/spinenet49mobile.tar.gz) |

### Instance Segmentation Baselines

#### Mask R-CNN (Trained from scratch)

| Backbone     | Resolution    | Epochs  | FLOPs (B)  | Params (M) | Box AP | Mask AP | Download |
| ------------ |:-------------:| -------:|-----------:|-----------:|-------:|--------:|---------:|
| ResNet50-FPN | 640x640    | 350    | 227.7     | 46.3       | 42.3   | 37.6    | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/r50fpn_640_coco_scratch_tpu4x4.yaml) |
| SpineNet-49  | 640x640       |  350    | 215.7      | 40.8       | 42.6   | 37.9    | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet49_mrcnn_tpu.yaml) |
| SpineNet-96  | 1024x1024  | 500    | 315.0     | 55.2       | 48.1   | 42.4    | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet96_mrcnn_tpu.yaml) |
| SpineNet-143 | 1280x1280  | 500    | 498.8     | 79.2       | 49.3   | 43.4    | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet143_mrcnn_tpu.yaml) |


#### Cascade RCNN-RS (Trained from scratch)

| Backbone     | Resolution | Epochs | Params (M) | Box AP | Mask AP | Download |
| ------------ | :--------: | -----: | ---------: | -----: | ------: | -------: |
| SpineNet-49  | 640x640    | 500    | 56.4       | 46.4   | 40.0    | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet49_cascadercnn_tpu.yaml)|
| SpineNet-96 | 1024x1024  | 500    | 70.8   | 50.9   | 43.8    | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet96_cascadercnn_tpu.yaml)|
| SpineNet-143 | 1280x1280  | 500    | 94.9       | 51.9   | 45.0    | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet143_cascadercnn_tpu.yaml)|

## Semantic Segmentation

* We support [DeepLabV3](https://arxiv.org/pdf/1706.05587.pdf) and
  [DeepLabV3+](https://arxiv.org/pdf/1802.02611.pdf) architectures, with
  Dilated ResNet backbones.
* Backbones are pre-trained on ImageNet.

### PASCAL-VOC

| Model      | Backbone           | Resolution | Steps | mIoU | Download |
| ---------- | :----------------: | :--------: | ----: | ---: | --------:|
| DeepLabV3  | Dilated ResNet-101 | 512x512    | 30k   | 78.7 |          |
| DeepLabV3+ | Dilated ResNet-101 | 512x512    | 30k   | 79.2 |          |

### CITYSCAPES

| Model      | Backbone           | Resolution | Steps | mIoU  | Download |
| ---------- | :----------------: | :--------: | ----: | ----: | --------:|
| DeepLabV3+ | Dilated ResNet-101 | 1024x2048  | 90k   | 78.79 |          |

## Video Classification

### Common Settings and Notes

*   We provide models for video classification with backbones:
    *   SlowOnly in
        [SlowFast Networks for Video Recognition](https://arxiv.org/abs/1812.03982).
    *   ResNet-3D (R3D) in
        [Spatiotemporal Contrastive Video Representation Learning](https://arxiv.org/abs/2008.03800).
    *   ResNet-3D-RS (R3D-RS) in
        [Revisiting 3D ResNets for Video Recognition](https://arxiv.org/pdf/2109.01696.pdf).
    *   Mobile Video Networks (MoViNets) in
        [MoViNets: Mobile Video Networks for Efficient Video Recognition](https://arxiv.org/abs/2103.11511).

* Training and evaluation details (SlowFast and ResNet):
  * All models are trained from scratch on the visual modality (RGB) for 200
    epochs.
  * We use a batch size of 1024 and cosine learning rate decay with linear
    warmup in the first 5 epochs.
  * We follow [SlowFast](https://arxiv.org/abs/1812.03982) to perform 30-view
    evaluation.
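
The learning rate schedule described above can be sketched as a function of the (possibly fractional) epoch. This is an illustration under stated assumptions, not the library's schedule class: the function name is hypothetical, the base learning rate is not given in the text and is left as a parameter, warmup is assumed to start at 0, and the cosine phase is assumed to begin after warmup and decay to 0 at epoch 200.

```python
import math


def cosine_lr_with_warmup(epoch, base_lr, total_epochs=200, warmup_epochs=5):
    """Linear warmup for warmup_epochs, then cosine decay to 0.

    Assumptions: warmup starts at 0 and the cosine phase spans the
    remaining total_epochs - warmup_epochs epochs.
    """
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # base_lr=1.0 is a placeholder so the output reads as a fraction of the peak.
    for epoch in (0, 2.5, 5, 50, 102.5, 200):
        print(f"epoch {epoch}: {cosine_lr_with_warmup(epoch, base_lr=1.0):.3f}")
```

With `base_lr=1.0` the printed values are multipliers of the peak learning rate, so they can be scaled by whatever base value a config specifies.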

### Kinetics-400 Action Recognition Baselines

| Model    | Input (frame x stride) |  Top-1  |  Top-5  | Download |
| -------- |:----------------------:|--------:|--------:|---------:|
| SlowOnly | 8 x 8                  |  74.1   |  91.4   | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k400_slowonly8x8_tpu.yaml) |
| SlowOnly | 16 x 4                 |  75.6   |  92.1   | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k400_slowonly16x4_tpu.yaml) |
| R3D-50   | 32 x 2                 |  77.0   |  93.0   | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k400_3d-resnet50_tpu.yaml) |
| R3D-RS-50   | 32 x 2                 |  78.2   |  93.7   | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k400_resnet3drs_50_tpu.yaml) |
| R3D-RS-101 | 32 x 2                 | 79.5  | 94.2  | - |
| R3D-RS-152 | 32 x 2                 | 79.9  | 94.3  | - |
| R3D-RS-200 | 32 x 2                 | 80.4  | 94.4  | - |
| R3D-RS-200 | 48 x 2                 | 81.0  | -     | - |
| MoViNet-A0-Base | 50 x 5            | 69.40 | 89.18 | - |
| MoViNet-A1-Base | 50 x 5            | 74.57 | 92.03 | - |
| MoViNet-A2-Base | 50 x 5            | 75.91 | 92.63 | - |
| MoViNet-A3-Base | 120 x 2           | 79.34 | 94.52 | - |
| MoViNet-A4-Base | 80 x 3            | 80.64 | 94.93 | - |
| MoViNet-A5-Base | 120 x 2           | 81.39 | 95.06 | - |
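
The `Input (frame x stride)` column can be read with the small sketch below. It assumes the column means the number of sampled frames times the temporal stride between them, measured in raw video frames; the function names are hypothetical and the actual sampling logic lives in the data pipeline.

```python
def sample_frame_indices(start, num_frames, stride):
    """Indices of the raw video frames that make up one input clip."""
    return [start + i * stride for i in range(num_frames)]


def clip_span(num_frames, stride):
    """Raw frames covered from the first to the last sampled frame, inclusive."""
    return (num_frames - 1) * stride + 1


if __name__ == "__main__":
    print(sample_frame_indices(0, 8, 8))
    print(clip_span(8, 8), clip_span(32, 2))
```

Under this reading, an `8 x 8` clip and a `32 x 2` clip cover a similar stretch of video, but the latter samples it four times more densely.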

### Kinetics-600 Action Recognition Baselines

| Model    | Input (frame x stride) |  Top-1  |  Top-5  | Download |
| -------- |:----------------------:|--------:|--------:|---------:|
| SlowOnly | 8 x 8                  |  77.3   |  93.6   | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k600_slowonly8x8_tpu.yaml) |
| R3D-50   | 32 x 2                 |  79.5   |  94.8   | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k600_3d-resnet50_tpu.yaml) |
| R3D-RS-200 | 32 x 2                 | 83.1  | -     | - |
| R3D-RS-200 | 48 x 2                 | 83.8  | -     | - |
| MoViNet-A0-Base | 50 x 5            | 72.05 | 90.92 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml) |
| MoViNet-A1-Base | 50 x 5            | 76.69 | 93.40 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a1_k600_8x8.yaml) |
| MoViNet-A2-Base | 50 x 5            | 78.62 | 94.17 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a2_k600_8x8.yaml) |
| MoViNet-A3-Base | 120 x 2           | 81.79 | 95.67 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a3_k600_8x8.yaml) |
| MoViNet-A4-Base | 80 x 3            | 83.48 | 96.16 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a4_k600_8x8.yaml) |
| MoViNet-A5-Base | 120 x 2           | 84.27 | 96.39 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a5_k600_8x8.yaml) |