# TF-Vision Model Garden

⚠️ Disclaimer: None of the datasets hyperlinked from this page are owned or
distributed by Google. The datasets are made available by third parties.
Please review the terms and conditions made available by the third parties
before using the data.

## Introduction

The TF-Vision modeling library for computer vision provides a collection of
baselines and checkpoints for image classification, object detection, and
segmentation.

## Image Classification

### ImageNet Baselines

#### ResNet models trained with vanilla settings

* Models are trained from scratch with a batch size of 4096 and an initial
  learning rate of 1.6.
* Linear warmup is applied for the first 5 epochs.
* Models are trained with L2 weight regularization and ReLU activation.
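
The warmup setting above can be sketched in a few lines of Python. This is a minimal illustration, not the library's scheduler: the function name `warmup_learning_rate` is hypothetical, the warmup is assumed to start from 0, and the decay applied after warmup is not described in this section, so the sketch simply holds the base rate.

```python
def warmup_learning_rate(epoch: float,
                         base_lr: float = 1.6,
                         warmup_epochs: float = 5.0) -> float:
    """Linear warmup from 0 (assumed) to base_lr over the first warmup_epochs.

    After warmup the rate is held at base_lr; the actual decay schedule is
    not modeled here.
    """
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    return base_lr


# A base rate of 1.6 at batch size 4096 is the same as 0.1 per 256 examples
# under the usual linear scaling convention.
lr_per_256 = 1.6 * 256 / 4096

for epoch in (0, 1, 2.5, 5, 90):
    print(epoch, warmup_learning_rate(epoch))
```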

| Model        | Resolution    | Epochs  |  Top-1  |  Top-5  | Download |
| ------------ |:-------------:|--------:|--------:|--------:|---------:|
| ResNet-50    | 224x224       |    90    | 76.1 | 92.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml) |
| ResNet-50    | 224x224       |    200   | 77.1 | 93.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml) |
| ResNet-101   | 224x224       |    200   | 78.3 | 94.2 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnet101_tpu.yaml) |
| ResNet-152   | 224x224       |    200   | 78.7 | 94.3 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnet152_tpu.yaml) |
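
For readers unfamiliar with the Top-1 and Top-5 columns, the following sketch shows how top-k accuracy is conventionally computed from per-class scores. The function name `top_k_accuracy` and the toy scores are illustrative assumptions and are not taken from the TF-Vision evaluation code.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   k: int) -> float:
    """Percentage of examples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the largest scores for this example.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


# Toy data: three examples, three classes.
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [1, 1, 0]
print(top_k_accuracy(toy_scores, toy_labels, k=1))
print(top_k_accuracy(toy_scores, toy_labels, k=2))
```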

#### ResNet-RS models trained with various settings

We support state-of-the-art [ResNet-RS](https://arxiv.org/abs/2103.07579) image
classification models with the following features:

* ResNet-RS architectural changes and Swish activation. (Note that ResNet-RS
  adopts ReLU activation in the paper.)
* Regularization methods including RandAugment, 4e-5 weight decay, stochastic
  depth, label smoothing and dropout.
* New training methods including a 350-epoch schedule, a cosine learning rate
  schedule and an exponential moving average (EMA) of the weights.
* Configs are in this [directory](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification).
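
Two of the training methods listed above, the cosine learning rate and EMA, follow simple formulas. The sketch below is a plain-Python illustration under stated assumptions: the function names are hypothetical, and the base learning rate and EMA decay used in the example are placeholder values, not the values from the ResNet-RS configs.

```python
import math
from typing import List


def cosine_learning_rate(step: int, total_steps: int, base_lr: float) -> float:
    """Cosine decay from base_lr at step 0 to 0 at total_steps."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


def ema_update(average: List[float], weights: List[float],
               decay: float) -> List[float]:
    """One EMA step: average <- decay * average + (1 - decay) * weights."""
    return [decay * a + (1.0 - decay) * w for a, w in zip(average, weights)]


# Placeholder values for illustration only.
print(cosine_learning_rate(step=50, total_steps=100, base_lr=1.0))
print(ema_update(average=[1.0, 2.0], weights=[0.0, 4.0], decay=0.9))
```

The EMA weights are typically the ones used for evaluation, while the optimizer keeps updating the raw weights.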

| Model     | Resolution | Params (M) | Top-1 | Top-5 | Download |
| --------- | :--------: | ---------: | ----: | ----: | --------:|
| ResNet-RS-50 | 160x160    | 35.7    | 79.1  | 94.5  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs50_i160.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-50-i160.tar.gz) |
| ResNet-RS-101 | 160x160    | 63.7    | 80.2  | 94.9  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs101_i160.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-101-i160.tar.gz) |
| ResNet-RS-101 | 192x192    | 63.7    | 81.3  | 95.6  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs101_i192.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-101-i192.tar.gz) |
| ResNet-RS-152 | 192x192    | 86.8    | 81.9  | 95.8  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i192.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i192.tar.gz) |
| ResNet-RS-152 | 224x224    | 86.8    | 82.5  | 96.1  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i224.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i224.tar.gz) |
| ResNet-RS-152 | 256x256    | 86.8    | 83.1  | 96.3  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i256.tar.gz) |
| ResNet-RS-200 | 256x256    | 93.4    | 83.5  | 96.6  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs200_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-200-i256.tar.gz) |
| ResNet-RS-270 | 256x256    | 130.1    | 83.6  | 96.6  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs270_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-270-i256.tar.gz) |
| ResNet-RS-350 | 256x256    |  164.3   | 83.7  | 96.7  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs350_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i256.tar.gz) |
| ResNet-RS-350 | 320x320    | 164.3   | 84.2  | 96.9  | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs350_i320.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i320.tar.gz) |


#### Vision Transformer (ViT)

We support [ViT](https://arxiv.org/abs/2010.11929) and
[DeiT](https://arxiv.org/abs/2012.12877) implementations in a TF-Vision
[project](https://github.com/tensorflow/models/tree/master/official/projects/vit).
The ViT models below are trained under the DeiT settings:

| Model    | Resolution | Top-1 | Top-5 |
| -------- | :--------: | ----: | ----: |
| ViT-s16  | 224x224    | 79.4  | 94.7  |
| ViT-b16  | 224x224    | 81.8  | 95.8  |
| ViT-l16  | 224x224    | 82.2  | 95.8  |


## Object Detection and Instance Segmentation

### Common Settings and Notes

* We provide models adopting [ResNet-FPN](https://arxiv.org/abs/1612.03144) and
  [SpineNet](https://arxiv.org/abs/1912.05027) backbones, built on the following detection frameworks:
  * [RetinaNet](https://arxiv.org/abs/1708.02002) and [RetinaNet-RS](https://arxiv.org/abs/2107.00057)
  * [Mask R-CNN](https://arxiv.org/abs/1703.06870)
  * [Cascade RCNN](https://arxiv.org/abs/1712.00726) and [Cascade RCNN-RS](https://arxiv.org/abs/2107.00057)
* Models are all trained on [COCO](https://cocodataset.org/) train2017 and
  evaluated on [COCO](https://cocodataset.org/) val2017.
* Training details:
  * Models fine-tuned from [ImageNet](https://www.image-net.org/) pretrained
    checkpoints adopt the 12- or 36-epoch schedule. Models trained from scratch
    adopt the 350-epoch schedule.
  * The default training data augmentation applies horizontal flipping and
    scale jittering with a random scale factor in [0.5, 2.0].
  * Unless noted, all models are trained with L2 weight regularization and ReLU
    activation.
  * We use a batch size of 256 and a stepwise learning rate that decays at the
    last 30 and last 10 epochs.
  * We use square images as input: the long side of each image is resized to
    the target size, then the short side is padded with zeros.
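
The square-input rule above reduces to simple arithmetic on the image shape. The sketch below computes only the resized shape and the amount of zero padding; it is not the TF-Vision preprocessing code. The function name `resize_and_pad_shape` is hypothetical, and placing the padding at the bottom and right is an assumption, since the text does not say which side is padded.

```python
from typing import Tuple


def resize_and_pad_shape(height: int, width: int,
                         target_size: int) -> Tuple[int, int, int, int]:
    """Returns (new_height, new_width, pad_bottom, pad_right).

    The long side is scaled to target_size with the aspect ratio preserved,
    and the short side is zero-padded up to target_size (assumed here to be
    padded at the bottom/right).
    """
    scale = target_size / max(height, width)
    new_height = round(height * scale)
    new_width = round(width * scale)
    return (new_height, new_width,
            target_size - new_height, target_size - new_width)


# A 600x800 (height x width) image prepared for a 640x640 model input.
print(resize_and_pad_shape(600, 800, 640))
```

Scale jittering would multiply `scale` by a random factor in [0.5, 2.0] before resizing, followed by a crop or pad back to the target size.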

### COCO Object Detection Baselines

#### RetinaNet (ImageNet pretrained)

| Backbone     | Resolution    | Epochs  | FLOPs (B)     | Params (M) | Box AP | Download |
| ------------ |:-------------:| -------:|--------------:|-----------:|-------:|---------:|
| R50-FPN      | 640x640       |    12   | 97.0 | 34.0 | 34.3 | config|
| R50-FPN      | 640x640       |    72   | 97.0 | 34.0 | 36.8 | config \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/retinanet/retinanet-resnet50fpn.tar.gz) |

#### RetinaNet (Trained from scratch)

Training features include:

* Stochastic depth with drop rate 0.2.
* Swish activation.

| Backbone     | Resolution    | Epochs  | FLOPs (B)     | Params (M) |  Box AP | Download |
| ------------ |:-------------:| -------:|--------------:|-----------:|--------:|---------:|
| SpineNet-49  | 640x640       |    500    | 85.4| 28.5 | 44.2 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/retinanet/coco_spinenet49_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)|
| SpineNet-96  | 1024x1024     |    500    | 265.4 | 43.0 | 48.5 |  [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/retinanet/coco_spinenet96_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)|
| SpineNet-143 | 1280x1280     |    500    | 524.0 | 67.0 | 50.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/retinanet/coco_spinenet143_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)|

#### Mobile-size RetinaNet (Trained from scratch)

| Backbone    | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download |
| ----------- | :--------: | -----: | --------: | ---------: | -----: | --------:|
| MobileNetv2 | 256x256    | 600    | -         | 2.27       | 23.5   | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/retinanet/coco_mobilenetv2_tpu.yaml) |
| Mobile SpineNet-49  | 384x384    | 600    | 1.0      | 2.32       | 28.1   | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/retinanet/coco_spinenet49_mobile_tpu.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/retinanet/spinenet49mobile.tar.gz) |

### Instance Segmentation Baselines

#### Mask R-CNN (Trained from scratch)

| Backbone     | Resolution    | Epochs  | FLOPs (B)  | Params (M) | Box AP | Mask AP | Download |
| ------------ |:-------------:| -------:|-----------:|-----------:|-------:|--------:|---------:|
| ResNet50-FPN | 640x640    | 350    | 227.7     | 46.3       | 42.3   | 37.6    | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/r50fpn_640_coco_scratch_tpu4x4.yaml) |
| SpineNet-49  | 640x640       |  350    | 215.7      | 40.8       | 42.6   | 37.9    | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet49_mrcnn_tpu.yaml) |
| SpineNet-96  | 1024x1024  | 500    | 315.0     | 55.2       | 48.1   | 42.4    | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet96_mrcnn_tpu.yaml) |
| SpineNet-143 | 1280x1280  | 500    | 498.8     | 79.2       | 49.3   | 43.4    | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet143_mrcnn_tpu.yaml) |


#### Cascade RCNN-RS (Trained from scratch)

| Backbone     | Resolution | Epochs | Params (M) | Box AP | Mask AP | Download |
| ------------ | :--------: | -----: | ---------: | -----: | ------: | -------: |
| SpineNet-49  | 640x640    | 500    | 56.4       | 46.4   | 40.0    | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet49_cascadercnn_tpu.yaml)|
| SpineNet-96 | 1024x1024  | 500    | 70.8   | 50.9   | 43.8    | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet96_cascadercnn_tpu.yaml)|
| SpineNet-143 | 1280x1280  | 500    | 94.9       | 51.9   | 45.0    | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet143_cascadercnn_tpu.yaml)|

## Semantic Segmentation

* We support [DeepLabV3](https://arxiv.org/pdf/1706.05587.pdf) and
  [DeepLabV3+](https://arxiv.org/pdf/1802.02611.pdf) architectures, with
  Dilated ResNet backbones.
* Backbones are pre-trained on ImageNet.

### PASCAL-VOC

| Model      | Backbone           | Resolution | Steps | mIoU | Download |
| ---------- | :----------------: | :--------: | ----: | ---: | --------:|
| DeepLabV3  | Dilated ResNet-101 | 512x512    | 30k   | 78.7 |          |
| DeepLabV3+ | Dilated ResNet-101 | 512x512    | 30k   | 79.2 |          |

### Cityscapes

| Model      | Backbone           | Resolution | Steps | mIoU  | Download |
| ---------- | :----------------: | :--------: | ----: | ----: | --------:|
| DeepLabV3+ | Dilated ResNet-101 | 1024x2048  | 90k   | 78.79 |          |
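
The mIoU column in the tables above is the mean intersection-over-union across classes. As a rough illustration, the sketch below computes it from a pixel-level confusion matrix. The function name `mean_iou` and the toy matrix are assumptions for illustration, and skipping classes that never appear is one common convention, not necessarily the one used to produce the numbers above.

```python
from typing import Sequence


def mean_iou(confusion: Sequence[Sequence[int]]) -> float:
    """Mean IoU in percent.

    confusion[i][j] is the number of pixels whose true class is i and whose
    predicted class is j.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        true_pos = confusion[c][c]
        false_neg = sum(confusion[c]) - true_pos
        false_pos = sum(confusion[r][c] for r in range(num_classes)) - true_pos
        union = true_pos + false_pos + false_neg
        if union > 0:  # Skip classes absent from both labels and predictions.
            ious.append(true_pos / union)
    if not ious:
        return 0.0
    return 100.0 * sum(ious) / len(ious)


# Toy two-class confusion matrix.
toy_confusion = [[8, 2], [1, 9]]
print(mean_iou(toy_confusion))
```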

## Video Classification

### Common Settings and Notes

*   We provide models for video classification with backbones:
    *   SlowOnly in
        [SlowFast Networks for Video Recognition](https://arxiv.org/abs/1812.03982).
    *   ResNet-3D (R3D) in
        [Spatiotemporal Contrastive Video Representation Learning](https://arxiv.org/abs/2008.03800).
    *   ResNet-3D-RS (R3D-RS) in
        [Revisiting 3D ResNets for Video Recognition](https://arxiv.org/pdf/2109.01696.pdf).

* Training and evaluation details:
  * All models are trained from scratch on the visual modality (RGB) for 200
    epochs.
  * We use a batch size of 1024 and cosine learning rate decay with linear
    warmup over the first 5 epochs.
  * We follow [SlowFast](https://arxiv.org/abs/1812.03982) to perform 30-view
    evaluation.
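
In the SlowFast paper, the 30 views of a video come from 10 clips sampled uniformly along the temporal axis, each with 3 spatial crops, and the per-view softmax scores are averaged into one video-level prediction. The sketch below shows only that averaging step; the function name `aggregate_views` and the toy scores are illustrative assumptions rather than the TF-Vision evaluation code.

```python
from typing import List, Sequence

NUM_CLIPS = 10  # temporal clips per video, as in the SlowFast paper
NUM_CROPS = 3   # spatial crops per clip, as in the SlowFast paper


def aggregate_views(view_scores: Sequence[Sequence[float]]) -> List[float]:
    """Averages per-class scores over all views of a single video."""
    num_views = len(view_scores)
    num_classes = len(view_scores[0])
    return [sum(view[c] for view in view_scores) / num_views
            for c in range(num_classes)]


# Toy example with 2 views and 2 classes; a real run would pass
# NUM_CLIPS * NUM_CROPS = 30 score vectors per video.
video_scores = aggregate_views([[0.2, 0.8], [0.6, 0.4]])
predicted_class = max(range(len(video_scores)), key=video_scores.__getitem__)
print(video_scores, predicted_class)
```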

### Kinetics-400 Action Recognition Baselines

| Model    | Input (frame x stride) |  Top-1  |  Top-5  | Download |
| -------- |:----------------------:|--------:|--------:|---------:|
| SlowOnly | 8 x 8                  |  74.1   |  91.4   | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k400_slowonly8x8_tpu.yaml) |
| SlowOnly | 16 x 4                 |  75.6   |  92.1   | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k400_slowonly16x4_tpu.yaml) |
| R3D-50   | 32 x 2                 |  77.0   |  93.0   | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k400_3d-resnet50_tpu.yaml) |
| R3D-RS-50   | 32 x 2                 |  78.2   |  93.7   | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k400_resnet3drs_50_tpu.yaml) |
| R3D-RS-101 | 32 x 2                 | 79.5  | 94.2  | - |
| R3D-RS-152 | 32 x 2                 | 79.9  | 94.3  | - |
| R3D-RS-200 | 32 x 2                 | 80.4  | 94.4  | - |
| R3D-RS-200 | 48 x 2                 | 81.0  | -     | - |
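
Assuming the "Input (frame x stride)" column follows the SlowFast convention of number of frames times temporal stride, an "8 x 8" clip takes 8 frames spaced 8 raw frames apart. The helper below, with the hypothetical name `clip_frame_indices`, lists the raw frame indices such a clip would read; it is a sketch of the sampling pattern, not the library's input pipeline.

```python
from typing import List


def clip_frame_indices(start: int, num_frames: int, stride: int) -> List[int]:
    """Raw video frame indices for a clip of num_frames sampled every stride frames."""
    return [start + i * stride for i in range(num_frames)]


slow_only_clip = clip_frame_indices(start=0, num_frames=8, stride=8)
r3d_clip = clip_frame_indices(start=0, num_frames=32, stride=2)
print(slow_only_clip)
print(r3d_clip[0], r3d_clip[-1], len(r3d_clip))
```

Under this reading, "8 x 8" and "32 x 2" cover a similar temporal extent of roughly 64 raw frames, but the latter samples it four times more densely.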

### Kinetics-600 Action Recognition Baselines

| Model    | Input (frame x stride) |  Top-1  |  Top-5  | Download |
| -------- |:----------------------:|--------:|--------:|---------:|
| SlowOnly | 8 x 8                  |  77.3   |  93.6   | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k600_slowonly8x8_tpu.yaml) |
| R3D-50   | 32 x 2                 |  79.5   |  94.8   | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k600_3d-resnet50_tpu.yaml) |
| R3D-RS-200 | 32 x 2                 | 83.1  | -     | - |
| R3D-RS-200 | 48 x 2                 | 83.8  | -     | - |