Context Encoding for Semantic Segmentation (EncNet)
===================================================
Install Package
---------------
- Clone the GitHub repo::

    git clone https://github.com/zhanghang1989/PyTorch-Encoding

- Install PyTorch Encoding (if not already installed). Please follow the installation guide `Installing PyTorch Encoding <../notes/compile.html>`_.
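
To quickly verify the installation, a minimal sanity check (it only confirms that the package imports):

.. code-block:: python

    # A quick sanity check that PyTorch-Encoding is importable.
    import encoding
    print(encoding.__file__)  # location of the installed package
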
Test Pre-trained Model
----------------------

.. hint::

    The model names encode the training configuration. For instance, ``FCN_ResNet50_PContext``:

    - ``FCN`` indicates the algorithm, “Fully Convolutional Networks for Semantic Segmentation”.
    - ``ResNet50`` is the name of the backbone network.
    - ``PContext`` means the PASCAL in Context dataset.
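
As a minimal illustration of this naming scheme, the helper below (hypothetical, not part of the library) splits a zoo name into its parts:

.. code-block:: python

    # Hypothetical helper, for illustration only: split a model-zoo name
    # of the form <algorithm>_<backbone>_<dataset> into its components.
    def parse_model_name(name):
        algorithm, backbone, dataset = name.split('_')
        return {'algorithm': algorithm, 'backbone': backbone, 'dataset': dataset}

    print(parse_model_name('FCN_ResNet50_PContext'))
    # {'algorithm': 'FCN', 'backbone': 'ResNet50', 'dataset': 'PContext'}
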
To get a pre-trained model, for example ``FCN_ResNet50_PContext``::

    model = encoding.models.get_model('FCN_ResNet50_PContext', pretrained=True)

Prepare the datasets by running the scripts in the ``scripts/`` folder, for example to prepare the ``PASCAL Context`` dataset::

    python scripts/prepare_pcontext.py

The test script is in the ``experiments/segmentation/`` folder. To evaluate a model using multi-scale (MS) testing, for example ``Encnet_ResNet50_PContext``::

    python test.py --dataset PContext --model-zoo Encnet_ResNet50_PContext --eval
    # pixAcc: 0.792, mIoU: 0.510: 100%|████████████████████████| 1276/1276 [46:31<00:00,  2.19s/it]
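
For reference, the two reported numbers are standard segmentation metrics. A minimal sketch of their usual definitions, computed from a confusion matrix (not the library's own implementation):

.. code-block:: python

    import numpy as np

    # hist[i, j] counts pixels with ground-truth class i predicted as class j.
    def pixel_accuracy(hist):
        # fraction of all pixels classified correctly
        return np.diag(hist).sum() / hist.sum()

    def mean_iou(hist):
        # per-class intersection-over-union, averaged over the classes
        inter = np.diag(hist)
        union = hist.sum(axis=0) + hist.sum(axis=1) - inter
        return (inter / np.maximum(union, 1)).mean()
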
The commands used for training each model are listed after the table.

+----------------------------------+-----------+-----------+
| Model                            | pixAcc    | mIoU      |
+==================================+===========+===========+
| Encnet_ResNet50_PContext         | 79.2%     | 51.0%     |
+----------------------------------+-----------+-----------+
| EncNet_ResNet101_PContext        | 80.7%     | 54.1%     |
+----------------------------------+-----------+-----------+
| EncNet_ResNet50_ADE              | 80.1%     | 41.5%     |
+----------------------------------+-----------+-----------+
| EncNet_ResNet101_ADE             | 81.3%     | 44.4%     |
+----------------------------------+-----------+-----------+
| EncNet_ResNet101_VOC             | N/A       | 85.9%     |
+----------------------------------+-----------+-----------+

Training commands (including the FCN and PSP baselines)::

    # FCN baseline on PASCAL Context
    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset PContext --model FCN
    # Encnet_ResNet50_PContext
    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset PContext --model EncNet --aux --se-loss
    # EncNet_ResNet101_PContext
    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset PContext --model EncNet --aux --se-loss --backbone resnet101
    # PSP baseline on ADE20K
    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset ADE20K --model PSP --aux
    # EncNet_ResNet50_ADE
    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset ADE20K --model EncNet --aux --se-loss
    # EncNet_ResNet101_ADE
    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset ADE20K --model EncNet --aux --se-loss --backbone resnet101 --base-size 640 --crop-size 576
    # EncNet_ResNet101_VOC
    # First, fine-tune the COCO-pretrained model on the augmented set
    # (you can also train from scratch on COCO yourself)
    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset Pascal_aug --model-zoo EncNet_Resnet101_COCO --aux --se-loss --lr 0.001 --syncbn --ngpus 4 --checkname res101 --ft
    # Then fine-tune on the original set
    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset Pascal_voc --model encnet --aux --se-loss --backbone resnet101 --lr 0.0001 --syncbn --ngpus 4 --checkname res101 --resume runs/Pascal_aug/encnet/res101/checkpoint.params --ft
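
The ``--se-loss`` flag above enables EncNet's Semantic Encoding loss, an extra per-image classification loss on which categories are present in the scene. A minimal sketch of the idea (shapes and names here are illustrative, not the library's API):

.. code-block:: python

    import torch
    import torch.nn as nn

    def se_target(mask, nclass):
        # mask: (B, H, W) integer label map -> (B, nclass) 0/1 presence vector
        target = torch.zeros(mask.size(0), nclass)
        for i in range(mask.size(0)):
            labels = torch.unique(mask[i])
            labels = labels[(labels >= 0) & (labels < nclass)]
            target[i, labels.long()] = 1
        return target

    # se_pred: sigmoid outputs of shape (B, nclass) from the encoding head
    criterion = nn.BCELoss()
    # loss = criterion(se_pred, se_target(mask, nclass))
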
Quick Demo
~~~~~~~~~~

.. code-block:: python

    import torch
    import encoding

    # Get the model
    model = encoding.models.get_model('Encnet_ResNet50_PContext', pretrained=True).cuda()
    model.eval()

    # Prepare the image
    url = 'https://github.com/zhanghang1989/image-data/blob/master/' + \
        'encoding/segmentation/pcontext/2010_001829_org.jpg?raw=true'
    filename = 'example.jpg'
    img = encoding.utils.load_image(
        encoding.utils.download(url, filename)).cuda().unsqueeze(0)

    # Make prediction
    output = model.evaluate(img)
    predict = torch.max(output, 1)[1].cpu().numpy() + 1

    # Get color palette for visualization
    mask = encoding.utils.get_mask_pallete(predict, 'pcontext')
    mask.save('output.png')
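
The ``+ 1`` shifts the zero-based class indices returned by ``torch.max`` to the one-based label ids expected by the PASCAL Context palette.
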
.. image:: https://raw.githubusercontent.com/zhanghang1989/image-data/master/encoding/segmentation/pcontext/2010_001829_org.jpg
    :width: 45%

.. image:: https://raw.githubusercontent.com/zhanghang1989/image-data/master/encoding/segmentation/pcontext/2010_001829.png
    :width: 45%

Train Your Own Model
--------------------

- Prepare the datasets by running the scripts in the ``scripts/`` folder, for example to prepare the ``PASCAL Context`` dataset::

    python scripts/prepare_pcontext.py

- The training script is in the ``experiments/segmentation/`` folder; an example training command::

    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset pcontext --model encnet --aux --se-loss

- For detailed training options, run ``python train.py -h``. The commands for reproducing the pre-trained models can be found in the previous section.

.. hint::

    The validation metrics printed during training use only a center crop and are meant solely for monitoring training correctness. To evaluate a pretrained model on the validation set using multi-scale (MS) testing, use::

        CUDA_VISIBLE_DEVICES=0,1,2,3 python test.py --dataset pcontext --model encnet --aux --se-loss --resume mycheckpoint --eval
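
For reference, multi-scale evaluation averages predictions over several rescaled copies of the input. A minimal sketch of the idea, assuming ``model`` maps an image batch to per-pixel class logits (real MS testing typically also averages over horizontal flips):

.. code-block:: python

    import torch.nn.functional as F

    def multiscale_predict(model, img, scales=(0.75, 1.0, 1.25)):
        _, _, h, w = img.shape
        score = 0
        for s in scales:
            # rescale the input, run the model, then upsample the logits
            # back to the original resolution before accumulating
            scaled = F.interpolate(img, scale_factor=s, mode='bilinear',
                                   align_corners=True)
            out = model(scaled)
            score = score + F.interpolate(out, size=(h, w), mode='bilinear',
                                          align_corners=True)
        return score / len(scales)
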
Citation
--------

.. note::

    * Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal. "Context Encoding for Semantic Segmentation." *The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018*::

        @InProceedings{Zhang_2018_CVPR,
            author    = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit},
            title     = {Context Encoding for Semantic Segmentation},
            booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
            month     = {June},
            year      = {2018}
        }