Context Encoding for Semantic Segmentation (EncNet)
===================================================

Install Package
---------------

- Clone the GitHub repo::
    
    git clone https://github.com/zhanghang1989/PyTorch-Encoding

- Install PyTorch Encoding (if not yet done), following the installation guide `Installing PyTorch Encoding <../notes/compile.html>`_. A quick import check is shown below.
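
To verify the installation, a quick import check (a minimal smoke test; it assumes nothing beyond the package itself)::

    python -c "import torch, encoding; print(torch.cuda.is_available())"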

Test Pre-trained Model
----------------------

.. hint::
    The model names encode the training configuration. For instance, ``FCN_ResNet50_PContext``:
      - ``FCN`` indicates the algorithm, Fully Convolutional Networks for semantic segmentation,
      - ``ResNet50`` is the name of the backbone network,
      - ``PContext`` means the PASCAL-Context dataset.

    To download a pretrained model, for example ``FCN_ResNet50_PContext``::

        model = encoding.models.get_model('FCN_ResNet50_PContext', pretrained=True)

    Prepare the datasets by running the scripts in the ``scripts/`` folder, for example preparing the ``PASCAL Context`` dataset::

        python scripts/prepare_pcontext.py
    
    The test script is in the ``experiments/segmentation/`` folder. To evaluate a model
    with multi-scale testing (MS), for example ``Encnet_ResNet50_PContext``::

        python test.py --dataset PContext --model-zoo Encnet_ResNet50_PContext --eval
        # pixAcc: 0.792, mIoU: 0.510: 100%|████████████████████████| 1276/1276 [46:31<00:00,  2.19s/it]

    The command used to train each model can be found by clicking ``cmd`` in the table below.
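
The ``pixAcc`` and ``mIoU`` figures reported above are computed with the metric helpers in
``encoding.utils`` that the test script uses. A minimal sketch with random placeholder
tensors (the ``nclass`` value, shapes, and inputs here are illustrative assumptions, not the
actual test loop):

.. code-block:: python

    import torch
    import numpy as np
    import encoding

    # Placeholder inputs for illustration: logits (B x nclass x H x W)
    # and ground-truth labels (B x H x W).
    nclass = 59
    output = torch.randn(1, nclass, 480, 480)
    target = torch.randint(0, nclass, (1, 480, 480))

    correct, labeled = encoding.utils.batch_pix_accuracy(output, target)
    inter, union = encoding.utils.batch_intersection_union(output, target, nclass)

    # Over a full validation run, sum these counts across batches first,
    # then form the ratios.
    pixAcc = 1.0 * correct / (np.spacing(1) + labeled)
    mIoU = (1.0 * inter / (np.spacing(1) + union)).mean()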

.. role:: raw-html(raw)
   :format: html

+----------------------------------+-----------+-----------+----------------------------------------------------------------------------------------------+------------+
| Model                            | pixAcc    | mIoU      | Command                                                                                      | Logs       |
+==================================+===========+===========+==============================================================================================+============+
| Encnet_ResNet50_PContext         | 79.2%     | 51.0%     | :raw-html:`<a href="javascript:toggleblock('cmd_enc50_pcont')" class="toggleblock">cmd</a>`  | ENC50PC_   |
+----------------------------------+-----------+-----------+----------------------------------------------------------------------------------------------+------------+
| EncNet_ResNet101_PContext        | 80.7%     | 54.1%     | :raw-html:`<a href="javascript:toggleblock('cmd_enc101_pcont')" class="toggleblock">cmd</a>` | ENC101PC_  |
+----------------------------------+-----------+-----------+----------------------------------------------------------------------------------------------+------------+
| EncNet_ResNet50_ADE              | 80.1%     | 41.5%     | :raw-html:`<a href="javascript:toggleblock('cmd_enc50_ade')" class="toggleblock">cmd</a>`    | ENC50ADE_  |
+----------------------------------+-----------+-----------+----------------------------------------------------------------------------------------------+------------+
| EncNet_ResNet101_ADE             | 81.3%     | 44.4%     | :raw-html:`<a href="javascript:toggleblock('cmd_enc101_ade')" class="toggleblock">cmd</a>`   | ENC101ADE_ |
+----------------------------------+-----------+-----------+----------------------------------------------------------------------------------------------+------------+
| EncNet_ResNet101_VOC             | N/A       | 85.9%     | :raw-html:`<a href="javascript:toggleblock('cmd_enc101_voc')" class="toggleblock">cmd</a>`   | ENC101VOC_ |
+----------------------------------+-----------+-----------+----------------------------------------------------------------------------------------------+------------+

.. _ENC50PC: https://github.com/zhanghang1989/image-data/blob/master/encoding/segmentation/logs/encnet_resnet50_pcontext.log?raw=true
.. _ENC101PC: https://github.com/zhanghang1989/image-data/blob/master/encoding/segmentation/logs/encnet_resnet101_pcontext.log?raw=true
.. _ENC50ADE: https://github.com/zhanghang1989/image-data/blob/master/encoding/segmentation/logs/encnet_resnet50_ade.log?raw=true
.. _ENC101ADE: https://github.com/zhanghang1989/image-data/blob/master/encoding/segmentation/logs/encnet_resnet101_ade.log?raw=true
.. _ENC101VOC: https://github.com/zhanghang1989/image-data/blob/master/encoding/segmentation/logs/encnet_resnet101_voc.log?raw=true


.. raw:: html

    <code xml:space="preserve" id="cmd_fcn50_pcont" style="display: none; text-align: left; white-space: pre-wrap">
    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset PContext --model FCN
    </code>

    <code xml:space="preserve" id="cmd_enc50_pcont" style="display: none; text-align: left; white-space: pre-wrap">
    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset PContext --model EncNet --aux --se-loss
    </code>

    <code xml:space="preserve" id="cmd_enc101_pcont" style="display: none; text-align: left; white-space: pre-wrap">
    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset PContext --model EncNet --aux --se-loss --backbone resnet101
    </code>

    <code xml:space="preserve" id="cmd_psp50_ade" style="display: none; text-align: left; white-space: pre-wrap">
    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset ADE20K --model PSP --aux
    </code>

    <code xml:space="preserve" id="cmd_enc50_ade" style="display: none; text-align: left; white-space: pre-wrap">
    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset ADE20K --model EncNet --aux --se-loss
    </code>


    <code xml:space="preserve" id="cmd_enc101_ade" style="display: none; text-align: left; white-space: pre-wrap">
    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset ADE20K --model EncNet --aux --se-loss --backbone resnet101 
    </code>

    <code xml:space="preserve" id="cmd_enc101_voc" style="display: none; text-align: left; white-space: pre-wrap">
    # First fine-tune the COCO-pretrained model on the augmented set
    # (you can also pretrain on COCO from scratch yourself)
    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset Pascal_aug --model-zoo EncNet_Resnet101_COCO --aux --se-loss --lr 0.001 --syncbn --ngpus 4 --checkname res101
    # Then fine-tune on the original set
    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset Pascal_voc --model encnet --aux --se-loss --backbone resnet101 --lr 0.0001 --syncbn --ngpus 4 --checkname res101 --resume runs/Pascal_aug/encnet/res101/checkpoint.params
    </code>

Quick Demo
~~~~~~~~~~

.. code-block:: python

    import torch
    import encoding

    # Get the model
    model = encoding.models.get_model('Encnet_ResNet50_PContext', pretrained=True).cuda()
    model.eval()

    # Prepare the image
    url = 'https://github.com/zhanghang1989/image-data/blob/master/' + \
          'encoding/segmentation/pcontext/2010_001829_org.jpg?raw=true'
    filename = 'example.jpg'
    img = encoding.utils.load_image(
        encoding.utils.download(url, filename)).cuda().unsqueeze(0)

    # Make prediction
    output = model.evaluate(img)
    predict = torch.max(output, 1)[1].cpu().numpy() + 1

    # Get color pallete for visualization
    mask = encoding.utils.get_mask_pallete(predict, 'pcontext')
    mask.save('output.png')


.. image:: https://raw.githubusercontent.com/zhanghang1989/image-data/master/encoding/segmentation/pcontext/2010_001829_org.jpg
   :width: 45%

.. image:: https://raw.githubusercontent.com/zhanghang1989/image-data/master/encoding/segmentation/pcontext/2010_001829.png
   :width: 45%
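
To compare the input and the prediction directly, the mask can be blended over the image
with plain PIL (an optional sketch; it only assumes the ``example.jpg`` and ``output.png``
files written by the demo above):

.. code-block:: python

    from PIL import Image

    # Blend the predicted palette mask over the input image.
    base = Image.open('example.jpg').convert('RGBA')
    overlay = Image.open('output.png').convert('RGBA').resize(base.size)
    Image.blend(base, overlay, alpha=0.5).save('overlay.png')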

Train Your Own Model
--------------------

- Prepare the datasets by running the scripts in the ``scripts/`` folder, for example preparing the ``PASCAL Context`` dataset::

    python scripts/prepare_pcontext.py

- The training script is in the ``experiments/segmentation/`` folder; an example training command::

    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset pcontext --model encnet --aux --se-loss

- For detailed training options, please run ``python train.py -h``. The commands for reproducing the pre-trained models can be found in the table above; a sketch of the main pieces ``train.py`` assembles is shown below.
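
Under the hood, ``train.py`` builds the network and its composite loss roughly as follows
(a minimal sketch; the argument names are taken from the training script's usage and may
differ across versions):

.. code-block:: python

    import encoding

    # EncNet on PASCAL-Context with the auxiliary FCN head (--aux) and the
    # Semantic Encoding loss (--se-loss).
    model = encoding.models.get_segmentation_model(
        'encnet', dataset='pcontext', backbone='resnet50',
        aux=True, se_loss=True)

    # Composite criterion: per-pixel cross-entropy plus the auxiliary and
    # SE-loss terms, matching the flags above.
    criterion = encoding.nn.SegmentationLosses(
        se_loss=True, aux=True, nclass=model.nclass)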

.. hint::
    The validation metrics reported during training use center crops only and serve
    just to monitor that training is progressing correctly. To evaluate a trained
    checkpoint on the validation set with multi-scale testing (MS), use::

        CUDA_VISIBLE_DEVICES=0,1,2,3 python test.py --dataset pcontext --model encnet --aux --se-loss --resume mycheckpoint --eval
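
Programmatically, the multi-scale evaluation wraps the model in ``MultiEvalModule``, the
wrapper used by ``test.py`` (a sketch; the class count and scale list below are assumptions
for PASCAL-Context, check the script for the exact defaults):

.. code-block:: python

    import encoding

    # Wrap a pretrained model for multi-scale, flipped inference.
    model = encoding.models.get_model('Encnet_ResNet50_PContext',
                                      pretrained=True).cuda()
    evaluator = encoding.models.MultiEvalModule(
        model, 59,  # PASCAL-Context uses 59 classes (assumed here)
        scales=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75]).cuda()
    evaluator.eval()
    # output = evaluator(img)  # img: a 1 x 3 x H x W CUDA tensor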

Citation
--------

.. note::
    * Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal. "Context Encoding for Semantic Segmentation." *The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018*::

        @InProceedings{Zhang_2018_CVPR,
            author = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit},
            title = {Context Encoding for Semantic Segmentation},
            booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
            month = {June},
            year = {2018}
        }