# DeepLab: Deep Labelling for Semantic Image Segmentation

**To new and existing DeepLab users**: We have released a unified codebase for
dense pixel labeling tasks in TensorFlow2 at https://github.com/google-research/deeplab2.
Please consider switching to the newer codebase for better support. 

DeepLab is a state-of-the-art deep learning model for semantic image segmentation,
where the goal is to assign semantic labels (e.g., person, dog, cat and so on)
to every pixel in the input image. Current implementation includes the following
features:

1.  DeepLabv1 [1]: We use *atrous convolution* to explicitly control the
    resolution at which feature responses are computed within Deep Convolutional
    Neural Networks.

2.  DeepLabv2 [2]: We use *atrous spatial pyramid pooling* (ASPP) to robustly
    segment objects at multiple scales with filters at multiple sampling rates
    and effective fields-of-views.

3.  DeepLabv3 [3]: We augment the ASPP module with *image-level feature* [5, 6]
    to capture longer range information. We also include *batch normalization*
    [7] parameters to facilitate the training. In particular, we apply atrous
    convolution to extract output features at different output strides during
    training and evaluation, which efficiently enables training BN at output
    stride = 16 and attains a high performance at output stride = 8 during
    evaluation.

4.  DeepLabv3+ [4]: We extend DeepLabv3 to include a simple yet effective
    decoder module to refine the segmentation results especially along object
    boundaries. Furthermore, in this encoder-decoder structure one can
    arbitrarily control the resolution of extracted encoder features by atrous
    convolution to trade off precision and runtime.
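
To make the idea of atrous convolution concrete, the following is a minimal pure-Python sketch of a 1-D atrous (dilated) convolution. It is an illustration only, not code from this repository: the helper names `atrous_conv1d` and `effective_kernel_size` are hypothetical, and the real model applies the same idea to 2-D feature maps through TensorFlow ops. The sketch shows how a larger rate widens the field-of-view without adding kernel weights.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """'Valid' 1-D atrous convolution (cross-correlation form).

    Kernel taps are spaced `rate` samples apart; rate=1 is an ordinary
    convolution. Hypothetical helper for illustration only.
    """
    # Number of input samples spanned by the dilated kernel.
    span = (len(kernel) - 1) * rate + 1
    return [
        sum(w * signal[start + i * rate] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def effective_kernel_size(kernel_size, rate):
    """Field-of-view of a kernel with `rate - 1` holes between adjacent taps."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


signal = list(range(10))
kernel = [1, 0, -1]

dense = atrous_conv1d(signal, kernel, rate=1)    # taps at offsets 0, 1, 2
dilated = atrous_conv1d(signal, kernel, rate=2)  # taps at offsets 0, 2, 4

print(len(dense), len(dilated))
print(effective_kernel_size(3, 2))
```

With the same three weights, the rate-2 kernel covers five input samples instead of three, which is why the 'valid' output is shorter. Using several rates in parallel is the intuition behind ASPP.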

If you find the code useful for your research, please consider citing our latest
works:

*   DeepLabv3+:

```
@inproceedings{deeplabv3plus2018,
  title={Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation},
  author={Liang-Chieh Chen and Yukun Zhu and George Papandreou and Florian Schroff and Hartwig Adam},
  booktitle={ECCV},
  year={2018}
}
```

*   MobileNetv2:

```
@inproceedings{mobilenetv22018,
  title={MobileNetV2: Inverted Residuals and Linear Bottlenecks},
  author={Mark Sandler and Andrew Howard and Menglong Zhu and Andrey Zhmoginov and Liang-Chieh Chen},
  booktitle={CVPR},
  year={2018}
}
```

*   MobileNetv3:

```
@inproceedings{mobilenetv32019,
  title={Searching for MobileNetV3},
  author={Andrew Howard and Mark Sandler and Grace Chu and Liang-Chieh Chen and Bo Chen and Mingxing Tan and Weijun Wang and Yukun Zhu and Ruoming Pang and Vijay Vasudevan and Quoc V. Le and Hartwig Adam},
  booktitle={ICCV},
  year={2019}
}
```

*  Architecture search for dense prediction cell:

```
@inproceedings{dpc2018,
  title={Searching for Efficient Multi-Scale Architectures for Dense Image Prediction},
  author={Liang-Chieh Chen and Maxwell D. Collins and Yukun Zhu and George Papandreou and Barret Zoph and Florian Schroff and Hartwig Adam and Jonathon Shlens},
  booktitle={NIPS},
  year={2018}
}

```

*  Auto-DeepLab (also called hnasnet in core/nas_network.py):

```
@inproceedings{autodeeplab2019,
  title={Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic
Image Segmentation},
  author={Chenxi Liu and Liang-Chieh Chen and Florian Schroff and Hartwig Adam
  and Wei Hua and Alan Yuille and Li Fei-Fei},
  booktitle={CVPR},
  year={2019}
}

```


In the current implementation, we support adopting the following network
backbones:

1.  MobileNetv2 [8] and MobileNetv3 [16]: A fast network structure designed
    for mobile devices.

2.  Xception [9, 10]: A powerful network structure intended for server-side
    deployment.

3.  ResNet-v1-{50,101} [14]: We provide both the original ResNet-v1 and its
    'beta' variant where the 'stem' is modified for semantic segmentation.

4.  PNASNet [15]: A powerful network structure found by neural architecture
    search.

5.  Auto-DeepLab (called HNASNet in the code): A segmentation-specific network
    backbone found by neural architecture search.

This directory contains our TensorFlow [11] implementation. We provide code
that allows users to train the model, evaluate results in terms of mIOU (mean
intersection-over-union), and visualize segmentation results. We use the PASCAL
VOC 2012 [12] and Cityscapes [13] semantic segmentation benchmarks as examples
in the code.
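
As a rough illustration of the mIOU metric mentioned above, the sketch below computes it from flat lists of per-pixel class ids. This is a simplified stand-in, not the evaluation code shipped in this directory: the function name `mean_iou` is hypothetical, it assumes every prediction lies in `[0, num_classes)`, and it assumes label `255` marks pixels to ignore (adjust `ignore_label` if your dataset uses a different value).

```python
def mean_iou(labels, predictions, num_classes, ignore_label=255):
    """Mean intersection-over-union, averaged over classes that occur.

    `labels` and `predictions` are equal-length sequences of class ids.
    Pixels whose label equals `ignore_label` are skipped (assumed convention).
    """
    intersection = [0] * num_classes
    label_count = [0] * num_classes
    pred_count = [0] * num_classes

    for y, p in zip(labels, predictions):
        if y == ignore_label:
            continue
        label_count[y] += 1
        pred_count[p] += 1
        if y == p:
            intersection[y] += 1

    ious = []
    for c in range(num_classes):
        # |A union B| = |A| + |B| - |A intersect B|
        union = label_count[c] + pred_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


labels = [0, 0, 1, 1, 2, 255]
predictions = [0, 1, 1, 1, 2, 0]
score = mean_iou(labels, predictions, num_classes=3)
print(round(score, 4))
```

Here the per-class IOUs are 1/2, 2/3, and 1, so the mean is 13/18. The last pixel is ignored because its label is `255`.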

Some segmentation results on Flickr images:
<p align="center">
    <img src="g3doc/img/vis1.png" width=600><br>
    <img src="g3doc/img/vis2.png" width=600><br>
    <img src="g3doc/img/vis3.png" width=600><br>
</p>

## Contacts (Maintainers)

*   Liang-Chieh Chen, github: [aquariusjay](https://github.com/aquariusjay)
*   YuKun Zhu, github: [yknzhu](https://github.com/YknZhu)
*   George Papandreou, github: [gpapan](https://github.com/gpapan)
*   Hui Hui, github: [huihui-personal](https://github.com/huihui-personal)
*   Maxwell D. Collins, github: [mcollinswisc](https://github.com/mcollinswisc)
*   Ting Liu, github: [tingliu](https://github.com/tingliu)

## Table of Contents

Demo:

*   <a href='https://colab.sandbox.google.com/github/tensorflow/models/blob/master/research/deeplab/deeplab_demo.ipynb'>Colab notebook for off-the-shelf inference.</a><br>

Running:

*   <a href='g3doc/installation.md'>Installation.</a><br>
*   <a href='g3doc/pascal.md'>Running DeepLab on PASCAL VOC 2012 semantic segmentation dataset.</a><br>
*   <a href='g3doc/cityscapes.md'>Running DeepLab on Cityscapes semantic segmentation dataset.</a><br>
*   <a href='g3doc/ade20k.md'>Running DeepLab on ADE20K semantic segmentation dataset.</a><br>

Models:

*   <a href='g3doc/model_zoo.md'>Checkpoints and frozen inference graphs.</a><br>

Misc:

*   Please check the <a href='g3doc/faq.md'>FAQ</a> if you have questions before reporting issues.<br>

## Getting Help

To get help with issues you may encounter while using the DeepLab TensorFlow
implementation, create a new question on
[StackOverflow](https://stackoverflow.com/) with the tag "tensorflow".

Please report bugs (i.e., broken code, not usage questions) to the
tensorflow/models GitHub [issue
tracker](https://github.com/tensorflow/models/issues), prefixing the issue name
with "deeplab".

## License

All the code in the deeplab folder is covered by the [LICENSE](https://github.com/tensorflow/models/blob/master/LICENSE)
under tensorflow/models. Please refer to the LICENSE for details.

## Change Logs

### March 26, 2020
* Supported EdgeTPU-DeepLab and EdgeTPU-DeepLab-slim on Cityscapes.
**Contributor**: Yun Long.

### November 20, 2019
* Supported MobileNetV3 large and small model variants on Cityscapes.
**Contributor**: Yukun Zhu.


### March 27, 2019

* Supported using different loss weights on different classes during training.
**Contributor**: Yuwei Yang.


### March 26, 2019

* Supported ResNet-v1-18. **Contributor**: Michalis Raptis.


### March 6, 2019

* Released the evaluation code (under the `evaluation` folder) for image
parsing, a.k.a. panoptic segmentation. In particular, the released code supports
evaluating the parsing results in terms of both the parsing covering and
panoptic quality metrics. **Contributors**: Maxwell Collins and Ting Liu.


### February 6, 2019

* Updated decoder module to exploit multiple low-level features with different
output_strides.

### December 3, 2018

* Released the MobileNet-v2 checkpoint on ADE20K.


### November 19, 2018

* Supported NAS architecture for feature extraction. **Contributor**: Chenxi Liu.

* Supported hard pixel mining during training.


### October 1, 2018

* Released MobileNet-v2 depth-multiplier = 0.5 COCO-pretrained checkpoints on
PASCAL VOC 2012, and Xception-65 COCO pretrained checkpoint (i.e., no PASCAL
pretrained).


### September 5, 2018

* Released Cityscapes pretrained checkpoints with the best dense prediction cell found.

### May 26, 2018

* Updated ADE20K pretrained checkpoint.


### May 18, 2018
* Added builders for ResNet-v1 and Xception model variants.
* Added ADE20K support, including colormap and pretrained Xception_65 checkpoint.
* Fixed a bug on using non-default depth_multiplier for MobileNet-v2.


### March 22, 2018

* Released checkpoints using MobileNet-V2 as network backbone and pretrained on
PASCAL VOC 2012 and Cityscapes.


### March 5, 2018

* First release of DeepLab in TensorFlow including deeper Xception network
backbone. Included checkpoints that have been pretrained on PASCAL VOC 2012
and Cityscapes.

## References

1.  **Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs**<br />
    Liang-Chieh Chen+, George Papandreou+, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille (+ equal
    contribution). <br />
    [[link]](https://arxiv.org/abs/1412.7062). In ICLR, 2015.

2.  **DeepLab: Semantic Image Segmentation with Deep Convolutional Nets,**
    **Atrous Convolution, and Fully Connected CRFs** <br />
    Liang-Chieh Chen+, George Papandreou+, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille (+ equal
    contribution). <br />
    [[link]](http://arxiv.org/abs/1606.00915). TPAMI 2017.

3.  **Rethinking Atrous Convolution for Semantic Image Segmentation**<br />
    Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam.<br />
    [[link]](http://arxiv.org/abs/1706.05587). arXiv: 1706.05587, 2017.

4.  **Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation**<br />
    Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam.<br />
    [[link]](https://arxiv.org/abs/1802.02611). In ECCV, 2018.

5.  **ParseNet: Looking Wider to See Better**<br />
    Wei Liu, Andrew Rabinovich, Alexander C Berg<br />
    [[link]](https://arxiv.org/abs/1506.04579). arXiv:1506.04579, 2015.

6.  **Pyramid Scene Parsing Network**<br />
    Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia<br />
    [[link]](https://arxiv.org/abs/1612.01105). In CVPR, 2017.

7.  **Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift**<br />
    Sergey Ioffe, Christian Szegedy <br />
    [[link]](https://arxiv.org/abs/1502.03167). In ICML, 2015.

8.  **MobileNetV2: Inverted Residuals and Linear Bottlenecks**<br />
    Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen<br />
    [[link]](https://arxiv.org/abs/1801.04381). In CVPR, 2018.

9.  **Xception: Deep Learning with Depthwise Separable Convolutions**<br />
    François Chollet<br />
    [[link]](https://arxiv.org/abs/1610.02357). In CVPR, 2017.

10. **Deformable Convolutional Networks -- COCO Detection and Segmentation Challenge 2017 Entry**<br />
    Haozhi Qi, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng, Yichen Wei, Jifeng Dai<br />
    [[link]](http://presentations.cocodataset.org/COCO17-Detect-MSRA.pdf). ICCV COCO Challenge
    Workshop, 2017.

11. **TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems**<br />
    M. Abadi, A. Agarwal, et al. <br />
    [[link]](https://arxiv.org/abs/1603.04467). arXiv:1603.04467, 2016.

12. **The Pascal Visual Object Classes Challenge – A Retrospective**<br />
    Mark Everingham, S. M. Ali Eslami, Luc Van Gool, Christopher K. I. Williams, John
    Winn, and Andrew Zisserman. <br />
    [[link]](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/). IJCV, 2014.

13. **The Cityscapes Dataset for Semantic Urban Scene Understanding**<br />
    Cordts, Marius, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele. <br />
    [[link]](https://www.cityscapes-dataset.com/). In CVPR, 2016.

14. **Deep Residual Learning for Image Recognition**<br />
    Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. <br />
    [[link]](https://arxiv.org/abs/1512.03385). In CVPR, 2016.

15. **Progressive Neural Architecture Search**<br />
    Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy. <br />
    [[link]](https://arxiv.org/abs/1712.00559). In ECCV, 2018.

16. **Searching for MobileNetV3**<br />
    Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam. <br />
    [[link]](https://arxiv.org/abs/1905.02244). In ICCV, 2019.