"src/vscode:/vscode.git/clone" did not exist on "b0550a66cc3c882a1b88470df7e26103208b13de"
Commit 453e151f authored by WXinlong's avatar WXinlong
Browse files

add SOLO

parent 695fcddd
@@ -113,6 +113,7 @@ data
# custom
*.pkl
*.pkl.json
*.segm.json
*.log.json
work_dirs/
# SOLO: Segmenting Objects by Locations
This project hosts the code for implementing the SOLO algorithms for instance segmentation.
> [**SOLO: Segmenting Objects by Locations**](https://arxiv.org/abs/1912.04488),
> Xinlong Wang, Tao Kong, Chunhua Shen, Yuning Jiang, Lei Li
> *arXiv preprint ([arXiv 1912.04488](https://arxiv.org/abs/1912.04488))*
> [**SOLOv2: Dynamic, Faster and Stronger**](https://arxiv.org/abs/2003.10152),
> Xinlong Wang, Rufeng Zhang, Tao Kong, Lei Li, Chunhua Shen
> *arXiv preprint ([arXiv 2003.10152](https://arxiv.org/abs/2003.10152))*
More code and models will be released soon. Stay tuned.
![demo image](demo/coco_test_12510.jpg)
## Highlights
- **Totally box-free:** SOLO is totally box-free, so it is not restricted by (anchor) box locations and scales, and it naturally benefits from the inherent advantages of FCNs.
- **Direct instance segmentation:** Our method takes an image as input and directly outputs instance masks with their corresponding class probabilities, in a fully convolutional, box-free and grouping-free paradigm.
- **State-of-the-art performance:** Our best single model, based on ResNet-101 and deformable convolutions, achieves **41.7%** AP on COCO test-dev (without multi-scale testing). A light-weight version of SOLOv2 runs at **31.3** FPS on a single V100 GPU and yields **37.1%** AP.
# MMDetection
**News**: We released the technical report on [ArXiv](https://arxiv.org/abs/1906.07155).
Documentation: https://mmdetection.readthedocs.io/
## Introduction
The master branch works with **PyTorch 1.1** or higher.
mmdetection is an open source object detection toolbox based on PyTorch. It is part of the open-mmlab project developed by [Multimedia Laboratory, CUHK](http://mmlab.ie.cuhk.edu.hk/).
### Major features
- **Modular design:** We decompose the detection framework into different components, and one can easily construct a customized object detection framework by combining different modules.
- **Support for multiple frameworks out of the box:** The toolbox directly supports popular and contemporary detection frameworks, *e.g.* Faster R-CNN, Mask R-CNN, RetinaNet, etc.
- **High efficiency:** All basic bbox and mask operations now run on GPUs. The training speed is faster than or comparable to other codebases, including [Detectron](https://github.com/facebookresearch/Detectron), [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark) and [SimpleDet](https://github.com/TuSimple/simpledet).
- **State of the art:** The toolbox stems from the codebase developed by the *MMDet* team, who won the [COCO Detection Challenge](http://cocodataset.org/#detection-leaderboard) in 2018, and we keep pushing it forward.
Apart from MMDetection, we also released [mmcv](https://github.com/open-mmlab/mmcv), a library for computer vision research on which this toolbox heavily depends.
## License
This project is released under the [Apache 2.0 license](LICENSE).
## Changelog
v1.0.0 was released on 30/1/2020, with more than 20 fixes and improvements.
Please refer to [CHANGELOG.md](docs/CHANGELOG.md) for details and release history.
## Benchmark and model zoo
Supported methods and backbones are shown in the table below.
Results and models are available in the [Model zoo](docs/MODEL_ZOO.md).
| | ResNet | ResNeXt | SENet | VGG | HRNet |
|--------------------|:--------:|:--------:|:--------:|:--------:|:-----:|
| RPN | ✓ | ✓ | ☐ | ✗ | ✓ |
| Fast R-CNN | ✓ | ✓ | ☐ | ✗ | ✓ |
| Faster R-CNN | ✓ | ✓ | ☐ | ✗ | ✓ |
| Mask R-CNN | ✓ | ✓ | ☐ | ✗ | ✓ |
| Cascade R-CNN | ✓ | ✓ | ☐ | ✗ | ✓ |
| Cascade Mask R-CNN | ✓ | ✓ | ☐ | ✗ | ✓ |
| SSD | ✗ | ✗ | ✗ | ✓ | ✗ |
| RetinaNet | ✓ | ✓ | ☐ | ✗ | ✓ |
| GHM | ✓ | ✓ | ☐ | ✗ | ✓ |
| Mask Scoring R-CNN | ✓ | ✓ | ☐ | ✗ | ✓ |
| Double-Head R-CNN | ✓ | ✓ | ☐ | ✗ | ✓ |
| Grid R-CNN (Plus) | ✓ | ✓ | ☐ | ✗ | ✓ |
| Hybrid Task Cascade| ✓ | ✓ | ☐ | ✗ | ✓ |
| Libra R-CNN | ✓ | ✓ | ☐ | ✗ | ✓ |
| Guided Anchoring | ✓ | ✓ | ☐ | ✗ | ✓ |
| FCOS | ✓ | ✓ | ☐ | ✗ | ✓ |
| RepPoints | ✓ | ✓ | ☐ | ✗ | ✓ |
| Foveabox | ✓ | ✓ | ☐ | ✗ | ✓ |
| FreeAnchor | ✓ | ✓ | ☐ | ✗ | ✓ |
| NAS-FPN | ✓ | ✓ | ☐ | ✗ | ✓ |
| ATSS | ✓ | ✓ | ☐ | ✗ | ✓ |
Other features
- [x] DCNv2
- [x] Group Normalization
- [x] Weight Standardization
- [x] OHEM
- [x] Soft-NMS
- [x] Generalized Attention
- [x] GCNet
- [x] Mixed Precision (FP16) Training
- [x] [InstaBoost](configs/instaboost/README.md)
## Updates
- SOLOv1 is available. Code and trained models of SOLO and Decoupled SOLO are released. (28/03/2020)
## Installation
This implementation is based on [mmdetection](https://github.com/open-mmlab/mmdetection) (v1.0.0). Please refer to [INSTALL.md](docs/INSTALL.md) for installation and dataset preparation.
## Models
For your convenience, we provide the following trained models on COCO (more models are coming soon).
Model | Multi-scale training | Testing time / im | AP (minival) | Link
--- |:---:|:---:|:---:|:---:
SOLO_R50_FPN_1x | No | 77ms | 32.9 | [download](https://cloudstor.aarnet.edu.au/plus/s/nTOgDldI4dvDrPs/download)
SOLO_R50_FPN_3x | Yes | 77ms | 35.8 | [download](https://cloudstor.aarnet.edu.au/plus/s/x4Fb4XQ0OmkBvaQ/download)
Decoupled_SOLO_R50_FPN_1x | No | 85ms | 33.9 | [download](https://cloudstor.aarnet.edu.au/plus/s/RcQyLrZQeeS6JIy/download)
Decoupled_SOLO_R50_FPN_3x | Yes | 85ms | 36.4 | [download](https://cloudstor.aarnet.edu.au/plus/s/dXz11J672ax0Z1Q/download)
## Get Started
Please see [GETTING_STARTED.md](docs/GETTING_STARTED.md) for the basic usage of MMDetection.
## Usage
### Train with multiple GPUs
```shell
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}
```
Example:
```shell
./tools/dist_train.sh configs/solo/solo_r50_fpn_8gpu_1x.py 8
```
### Train with single GPU
```shell
python tools/train.py ${CONFIG_FILE}
```
Example:
```shell
python tools/train.py configs/solo/solo_r50_fpn_8gpu_1x.py
```
### Testing
```shell
python tools/test_ins.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --show --out ${OUTPUT_FILE} --eval segm
```
Example:
```shell
python tools/test_ins.py configs/solo/solo_r50_fpn_8gpu_1x.py SOLO_R50_FPN_1x.pth --show --out results_solo.pkl --eval segm
```
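The `--out` pickle stores, per image, a per-class list of `(RLE, score)` pairs (see `get_masks` in `tools/test_ins.py`). Below is a minimal sketch for inspecting it, assuming `pycocotools` is installed:
```python
import mmcv
import pycocotools.mask as mask_util

# results_solo.pkl: one entry per image; each entry is a list indexed by
# class id, holding (RLE, score) pairs for that class.
results = mmcv.load('results_solo.pkl')
for cls_id, dets in enumerate(results[0]):
    for rle, score in dets:
        mask = mask_util.decode(rle)  # (h, w) uint8 binary mask
        print(cls_id, float(score), int(mask.sum()))
```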
### Visualization
```shell
python tools/test_ins_vis.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --show --save_dir ${SAVE_DIR}
```
Example:
```shell
python tools/test_ins_vis.py configs/solo/solo_r50_fpn_8gpu_1x.py SOLO_R50_FPN_1x.pth --show --save_dir work_dirs/vis_solo
```
## Contributing
We appreciate all contributions to improve MMDetection. Please refer to [CONTRIBUTING.md](.github/CONTRIBUTING.md) for the contributing guideline. Any pull requests or issues are welcome.
## Acknowledgement
MMDetection is an open source project contributed by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedback.
We hope the toolbox and benchmark can serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop new detectors.
## Citations
If you use this toolbox or benchmark in your research, please cite this project. Please also consider citing our papers if the project helps your research. BibTeX references are as follows.
```
@article{mmdetection,
  title   = {{MMDetection}: Open MMLab Detection Toolbox and Benchmark},
  author  = {Chen, Kai and Wang, Jiaqi and Pang, Jiangmiao and Cao, Yuhang and
             Xiong, Yu and Li, Xiaoxiao and Sun, Shuyang and Feng, Wansen and
             Liu, Ziwei and Xu, Jiarui and Zhang, Zheng and Cheng, Dazhi and
             Zhu, Chenchen and Cheng, Tianheng and Zhao, Qijie and Li, Buyu and
             Lu, Xin and Zhu, Rui and Wu, Yue and Dai, Jifeng and Wang, Jingdong
             and Shi, Jianping and Ouyang, Wanli and Loy, Chen Change and Lin, Dahua},
  journal = {arXiv preprint arXiv:1906.07155},
  year    = {2019}
}

@article{wang2019solo,
  title   = {SOLO: Segmenting Objects by Locations},
  author  = {Wang, Xinlong and Kong, Tao and Shen, Chunhua and Jiang, Yuning and Li, Lei},
  journal = {arXiv preprint arXiv:1912.04488},
  year    = {2019}
}

@article{wang2020solov2,
  title   = {SOLOv2: Dynamic, Faster and Stronger},
  author  = {Wang, Xinlong and Zhang, Rufeng and Kong, Tao and Li, Lei and Shen, Chunhua},
  journal = {arXiv preprint arXiv:2003.10152},
  year    = {2020}
}
```
## Contact
This repo is currently maintained by Kai Chen ([@hellock](http://github.com/hellock)), Yuhang Cao ([@yhcao6](https://github.com/yhcao6)), Wenwei Zhang ([@ZwwWayne](https://github.com/ZwwWayne)), Jiangmiao Pang ([@OceanPang](https://github.com/OceanPang)) and Jiaqi Wang ([@myownskyW7](https://github.com/myownskyW7)).
# model settings
model = dict(
type='SOLO',
pretrained='torchvision://resnet50',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3), # C2, C3, C4, C5
frozen_stages=1,
style='pytorch'),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
start_level=0,
num_outs=5),
bbox_head=dict(
type='DecoupledSOLOHead',
num_classes=81,
in_channels=256,
stacked_convs=7,
seg_feat_channels=256,
strides=[8, 8, 16, 32, 32],
scale_ranges=((1, 96), (48, 192), (96, 384), (192, 768), (384, 2048)),
sigma=0.2,
num_grids=[40, 36, 24, 16, 12],
cate_down_pos=0,
with_deform=False,
loss_ins=dict(
type='DiceLoss',
use_sigmoid=True,
loss_weight=3.0),
loss_cate=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
))
# training and testing settings
train_cfg = dict()
test_cfg = dict(
nms_pre=500,
score_thr=0.1,
mask_thr=0.5,
update_thr=0.05,
kernel='gaussian', # gaussian/linear
sigma=2.0,
max_per_img=100)
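# Inference post-processing (see SOLOHead.get_seg_single): the top `nms_pre`
# masks by category score enter Matrix NMS, scores are decayed with a
# Gaussian kernel (sigma=2.0), masks whose decayed score falls below
# `update_thr` are discarded, and at most `max_per_img` instances are kept.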
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_train2017.json',
img_prefix=data_root + 'train2017/',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=test_pipeline))
# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
step=[9, 11])
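# "1x" schedule: 12 epochs, with the learning rate divided by 10 after
# epochs 9 and 11, preceded by 500 iterations of linear warmup.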
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
# yapf:enable
# runtime settings
total_epochs = 12
device_ids = range(8)
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/decoupled_solo_release_r50_fpn_8gpu_1x'
load_from = None
resume_from = None
workflow = [('train', 1)]
# model settings
model = dict(
type='SOLO',
pretrained='torchvision://resnet50',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3), # C2, C3, C4, C5
frozen_stages=1,
style='pytorch'),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
start_level=0,
num_outs=5),
bbox_head=dict(
type='DecoupledSOLOHead',
num_classes=81,
in_channels=256,
stacked_convs=7,
seg_feat_channels=256,
strides=[8, 8, 16, 32, 32],
scale_ranges=((1, 96), (48, 192), (96, 384), (192, 768), (384, 2048)),
sigma=0.2,
num_grids=[40, 36, 24, 16, 12],
cate_down_pos=0,
with_deform=False,
loss_ins=dict(
type='DiceLoss',
use_sigmoid=True,
loss_weight=3.0),
loss_cate=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
))
# training and testing settings
train_cfg = dict()
test_cfg = dict(
nms_pre=500,
score_thr=0.1,
mask_thr=0.5,
update_thr=0.05,
kernel='gaussian', # gaussian/linear
sigma=2.0,
max_per_img=100)
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
dict(type='Resize',
img_scale=[(1333, 800), (1333, 768), (1333, 736),
(1333, 704), (1333, 672), (1333, 640)],
multiscale_mode='value',
keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_train2017.json',
img_prefix=data_root + 'train2017/',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=test_pipeline))
# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
step=[27, 33])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
# yapf:enable
# runtime settings
total_epochs = 36
device_ids = range(8)
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/decoupled_solo_release_r50_fpn_8gpu_3x'
load_from = None
resume_from = None
workflow = [('train', 1)]
# model settings
model = dict(
type='SOLO',
pretrained='torchvision://resnet50',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3), # C2, C3, C4, C5
frozen_stages=1,
style='pytorch'),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
start_level=0,
num_outs=5),
bbox_head=dict(
type='SOLOHead',
num_classes=81,
in_channels=256,
stacked_convs=7,
seg_feat_channels=256,
strides=[8, 8, 16, 32, 32],
scale_ranges=((1, 96), (48, 192), (96, 384), (192, 768), (384, 2048)),
sigma=0.2,
num_grids=[40, 36, 24, 16, 12],
cate_down_pos=0,
with_deform=False,
loss_ins=dict(
type='DiceLoss',
use_sigmoid=True,
loss_weight=3.0),
loss_cate=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
))
# training and testing settings
train_cfg = dict()
test_cfg = dict(
nms_pre=500,
score_thr=0.1,
mask_thr=0.5,
update_thr=0.05,
kernel='gaussian', # gaussian/linear
sigma=2.0,
max_per_img=100)
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_train2017.json',
img_prefix=data_root + 'train2017/',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=test_pipeline))
# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
step=[9, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
# yapf:enable
# runtime settings
total_epochs = 12
device_ids = range(8)
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/solo_release_r50_fpn_8gpu_1x'
load_from = None
resume_from = None
workflow = [('train', 1)]
# model settings
model = dict(
type='SOLO',
pretrained='torchvision://resnet50',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3), # C2, C3, C4, C5
frozen_stages=1,
style='pytorch'),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
start_level=0,
num_outs=5),
bbox_head=dict(
type='SOLOHead',
num_classes=81,
in_channels=256,
stacked_convs=7,
seg_feat_channels=256,
strides=[8, 8, 16, 32, 32],
scale_ranges=((1, 96), (48, 192), (96, 384), (192, 768), (384, 2048)),
sigma=0.2,
num_grids=[40, 36, 24, 16, 12],
cate_down_pos=0,
with_deform=False,
loss_ins=dict(
type='DiceLoss',
use_sigmoid=True,
loss_weight=3.0),
loss_cate=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
))
# training and testing settings
train_cfg = dict()
test_cfg = dict(
nms_pre=500,
score_thr=0.1,
mask_thr=0.5,
update_thr=0.05,
kernel='gaussian', # gaussian/linear
sigma=2.0,
max_per_img=100)
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
dict(type='Resize',
img_scale=[(1333, 800), (1333, 768), (1333, 736),
(1333, 704), (1333, 672), (1333, 640)],
multiscale_mode='value',
keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_train2017.json',
img_prefix=data_root + 'train2017/',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=test_pipeline))
# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
step=[27, 33])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
# yapf:enable
# runtime settings
total_epochs = 36
device_ids = range(8)
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/solo_release_r50_fpn_8gpu_3x'
load_from = None
resume_from = None
workflow = [('train', 1)]
@@ -12,8 +12,8 @@ RUN apt-get update && apt-get install -y libglib2.0-0 libsm6 libxrender-dev libx
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Install mmdetection
# Install SOLO
RUN conda install cython -y && conda clean --all
RUN git clone https://github.com/open-mmlab/mmdetection.git /mmdetection
WORKDIR /mmdetection
RUN git clone https://github.com/WXinlong/SOLO.git /SOLO
WORKDIR /SOLO
RUN pip install --no-cache-dir -e .
@@ -17,13 +17,13 @@ We have tested the following versions of OS and software:
- NCCL: 2.1.15/2.2.13/2.3.7/2.4.2
- GCC(G++): 4.9/5.3/5.4/7.3
### Install mmdetection
### Install SOLO
a. Create a conda virtual environment and activate it.
```shell
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
conda create -n solo python=3.7 -y
conda activate solo
```
b. Install PyTorch and torchvision following the [official instructions](https://pytorch.org/), e.g.,
@@ -32,14 +32,14 @@ b. Install PyTorch and torchvision following the [official instructions](https:/
```shell
conda install pytorch torchvision -c pytorch
```
c. Clone the mmdetection repository.
c. Clone the SOLO repository.
```shell
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
git clone https://github.com/WXinlong/SOLO.git
cd SOLO
```
d. Install build requirements and then install mmdetection.
d. Install build requirements and then install SOLO.
(We install pycocotools via the github repo instead of pypi because the pypi version is old and not compatible with the latest numpy.)
@@ -53,7 +53,7 @@ Note:
1. The git commit id will be written to the version number with step d, e.g. 0.6.0+2e7045c. The version will also be saved in trained models.
It is recommended that you run step d each time you pull some updates from GitHub. If C++/CUDA code is modified, then this step is compulsory.
2. Following the above instructions, mmdetection is installed in `dev` mode; any local modifications made to the code will take effect without the need to reinstall it (unless you submit some commits and want to update the version number).
2. Following the above instructions, SOLO is installed in `dev` mode; any local modifications made to the code will take effect without the need to reinstall it (unless you submit some commits and want to update the version number).
3. If you would like to use `opencv-python-headless` instead of `opencv-python`,
you can install it before installing MMCV.
@@ -62,20 +62,20 @@ you can install it before installing MMCV.
### Another option: Docker Image
We provide a [Dockerfile](https://github.com/open-mmlab/mmdetection/blob/master/docker/Dockerfile) to build an image.
We provide a [Dockerfile](https://github.com/WXinlong/SOLO/blob/master/docker/Dockerfile) to build an image.
```shell
# build an image with PyTorch 1.1, CUDA 10.0 and CUDNN 7.5
docker build -t mmdetection docker/
docker build -t solo docker/
```
### Prepare datasets
It is recommended to symlink the dataset root to `$MMDETECTION/data`.
It is recommended to symlink the dataset root to `$SOLO/data`.
If your folder structure is different, you may need to change the corresponding paths in config files.
```
mmdetection
SOLO
├── mmdet
├── tools
├── configs
```
@@ -104,16 +104,16 @@ mv train/*/* train/
### A from-scratch setup script
Here is a full script for setting up mmdetection with conda and linking the dataset path (supposing that your COCO dataset path is $COCO_ROOT).
Here is a full script for setting up SOLO with conda and linking the dataset path (supposing that your COCO dataset path is $COCO_ROOT).
```shell
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
conda create -n solo python=3.7 -y
conda activate solo
conda install -c pytorch pytorch torchvision -y
conda install cython -y
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
git clone https://github.com/WXinlong/SOLO.git
cd SOLO
pip install -r requirements/build.txt
pip install "git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI"
pip install -v -e .
```
from .class_names import (coco_classes, dataset_aliases, get_classes,
imagenet_det_classes, imagenet_vid_classes,
voc_classes)
from .coco_utils import coco_eval, fast_eval_recall, results2json
from .coco_utils import coco_eval, fast_eval_recall, results2json, results2json_segm
from .eval_hooks import (CocoDistEvalmAPHook, CocoDistEvalRecallHook,
DistEvalHook, DistEvalmAPHook)
from .mean_ap import average_precision, eval_map, print_map_summary
@@ -14,5 +14,5 @@ __all__ = [
'fast_eval_recall', 'results2json', 'DistEvalHook', 'DistEvalmAPHook',
'CocoDistEvalRecallHook', 'CocoDistEvalmAPHook', 'average_precision',
'eval_map', 'print_map_summary', 'eval_recalls', 'print_recall_summary',
'plot_num_recall', 'plot_iou_recall'
'plot_num_recall', 'plot_iou_recall', 'results2json_segm'
]
@@ -198,6 +198,26 @@ def segm2json(dataset, results):
return bbox_json_results, segm_json_results
def segm2json_segm(dataset, results):
segm_json_results = []
for idx in range(len(dataset)):
img_id = dataset.img_ids[idx]
seg = results[idx]
for label in range(len(seg)):
masks = seg[label]
for i in range(len(masks)):
mask_score = masks[i][1]
segm = masks[i][0]
data = dict()
data['image_id'] = img_id
data['score'] = float(mask_score)
data['category_id'] = dataset.cat_ids[label]
segm['counts'] = segm['counts'].decode()
data['segmentation'] = segm
segm_json_results.append(data)
return segm_json_results
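# Each entry above follows the COCO instance segmentation results format:
# {"image_id": int, "category_id": int, "score": float, "segmentation": RLE},
# with the RLE "counts" bytes decoded to str so the list is JSON-serializable.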
def results2json(dataset, results, out_file):
result_files = dict()
if isinstance(results[0], list):
@@ -219,3 +239,12 @@ def results2json(dataset, results, out_file):
else:
raise TypeError('invalid type of results')
return result_files
def results2json_segm(dataset, results, out_file):
result_files = dict()
json_results = segm2json_segm(dataset, results)
result_files['segm'] = '{}.{}.json'.format(out_file, 'segm')
mmcv.dump(json_results, result_files['segm'])
return result_files
from .bbox_nms import multiclass_nms
from .matrix_nms import matrix_nms
from .merge_augs import (merge_aug_bboxes, merge_aug_masks,
merge_aug_proposals, merge_aug_scores)
__all__ = [
'multiclass_nms', 'merge_aug_proposals', 'merge_aug_bboxes',
'merge_aug_scores', 'merge_aug_masks'
'merge_aug_scores', 'merge_aug_masks', 'matrix_nms'
]
import torch
def matrix_nms(seg_masks, cate_labels, cate_scores, kernel='gaussian', sigma=2.0, sum_masks=None):
"""Matrix NMS for multi-class masks.
Args:
seg_masks (Tensor): shape (n, h, w)
cate_labels (Tensor): shape (n), mask labels in descending order
cate_scores (Tensor): shape (n), mask scores in descending order
        kernel (str): 'linear' or 'gaussian'
sigma (float): std in gaussian method
sum_masks (Tensor): The sum of seg_masks
Returns:
Tensor: cate_scores_update, tensors of shape (n)
"""
n_samples = len(cate_labels)
if n_samples == 0:
return []
    if sum_masks is None:
sum_masks = seg_masks.sum((1, 2)).float()
seg_masks = seg_masks.reshape(n_samples, -1).float()
# inter.
inter_matrix = torch.mm(seg_masks, seg_masks.transpose(1, 0))
# union.
sum_masks_x = sum_masks.expand(n_samples, n_samples)
# iou.
iou_matrix = (inter_matrix / (sum_masks_x + sum_masks_x.transpose(1, 0) - inter_matrix)).triu(diagonal=1)
# label_specific matrix.
cate_labels_x = cate_labels.expand(n_samples, n_samples)
label_matrix = (cate_labels_x == cate_labels_x.transpose(1, 0)).float().triu(diagonal=1)
# IoU compensation
compensate_iou, _ = (iou_matrix * label_matrix).max(0)
compensate_iou = compensate_iou.expand(n_samples, n_samples).transpose(1, 0)
# IoU decay
decay_iou = iou_matrix * label_matrix
# matrix nms
if kernel == 'gaussian':
decay_matrix = torch.exp(-1 * sigma * (decay_iou ** 2))
compensate_matrix = torch.exp(-1 * sigma * (compensate_iou ** 2))
decay_coefficient, _ = (decay_matrix / compensate_matrix).min(0)
elif kernel == 'linear':
decay_matrix = (1-decay_iou)/(1-compensate_iou)
decay_coefficient, _ = decay_matrix.min(0)
else:
raise NotImplementedError
# update the score.
cate_scores_update = cate_scores * decay_coefficient
return cate_scores_update
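# A minimal self-check sketch (not part of the original file): two overlapping
# masks of the same class, sorted by descending score. Matrix NMS should leave
# the top score untouched and decay only the lower one.
if __name__ == '__main__':
    toy_masks = torch.zeros(2, 8, 8, dtype=torch.bool)
    toy_masks[0, :6, :6] = True   # higher-scoring mask
    toy_masks[1, 2:, 2:] = True   # lower-scoring, overlapping mask
    toy_labels = torch.tensor([0, 0])
    toy_scores = torch.tensor([0.9, 0.8])
    print(matrix_nms(toy_masks, toy_labels, toy_scores, kernel='gaussian'))

# bbox_nms.py context below: its imports are elided by the diff view; in
# mmdet v1.x the file begins with:
import torch

from mmdet.ops.nms import nms_wrapper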
def multiclass_nms(multi_bboxes,
multi_scores,
score_thr,
nms_cfg,
max_num=-1,
score_factors=None):
"""NMS for multi-class bboxes.
Args:
multi_bboxes (Tensor): shape (n, #class*4) or (n, 4)
multi_scores (Tensor): shape (n, #class), where the 0th column
contains scores of the background class, but this will be ignored.
score_thr (float): bbox threshold, bboxes with scores lower than it
will not be considered.
nms_thr (float): NMS IoU threshold
max_num (int): if there are more than max_num bboxes after NMS,
only top max_num will be kept.
score_factors (Tensor): The factors multiplied to scores before
applying NMS
Returns:
tuple: (bboxes, labels), tensors of shape (k, 5) and (k, 1). Labels
are 0-based.
"""
num_classes = multi_scores.shape[1]
bboxes, labels = [], []
nms_cfg_ = nms_cfg.copy()
nms_type = nms_cfg_.pop('type', 'nms')
nms_op = getattr(nms_wrapper, nms_type)
for i in range(1, num_classes):
cls_inds = multi_scores[:, i] > score_thr
if not cls_inds.any():
continue
# get bboxes and scores of this class
if multi_bboxes.shape[1] == 4:
_bboxes = multi_bboxes[cls_inds, :]
else:
_bboxes = multi_bboxes[cls_inds, i * 4:(i + 1) * 4]
_scores = multi_scores[cls_inds, i]
if score_factors is not None:
_scores *= score_factors[cls_inds]
cls_dets = torch.cat([_bboxes, _scores[:, None]], dim=1)
cls_dets, _ = nms_op(cls_dets, **nms_cfg_)
cls_labels = multi_bboxes.new_full((cls_dets.shape[0], ),
i - 1,
dtype=torch.long)
bboxes.append(cls_dets)
labels.append(cls_labels)
if bboxes:
bboxes = torch.cat(bboxes)
labels = torch.cat(labels)
if bboxes.shape[0] > max_num:
_, inds = bboxes[:, -1].sort(descending=True)
inds = inds[:max_num]
bboxes = bboxes[inds]
labels = labels[inds]
else:
bboxes = multi_bboxes.new_zeros((0, 5))
labels = multi_bboxes.new_zeros((0, ), dtype=torch.long)
return bboxes, labels
@@ -11,10 +11,12 @@ from .retina_head import RetinaHead
from .retina_sepbn_head import RetinaSepBNHead
from .rpn_head import RPNHead
from .ssd_head import SSDHead
from .solo_head import SOLOHead
from .decoupled_solo_head import DecoupledSOLOHead
__all__ = [
'AnchorHead', 'GuidedAnchorHead', 'FeatureAdaption', 'RPNHead',
'GARPNHead', 'RetinaHead', 'RetinaSepBNHead', 'GARetinaHead', 'SSDHead',
'FCOSHead', 'RepPointsHead', 'FoveaHead', 'FreeAnchorRetinaHead',
'ATSSHead'
'ATSSHead', 'SOLOHead', 'DecoupledSOLOHead'
]
import mmcv
import torch
import torch.nn as nn
import torch.nn.functional as F
from mmcv.cnn import normal_init
from mmdet.ops import DeformConv, roi_align
from mmdet.core import multi_apply, bbox2roi, matrix_nms
from ..builder import build_loss
from ..registry import HEADS
from ..utils import bias_init_with_prob, ConvModule
from scipy import ndimage

INF = 1e8
def points_nms(heat, kernel=2):
# kernel must be 2
hmax = nn.functional.max_pool2d(
heat, (kernel, kernel), stride=1, padding=1)
keep = (hmax[:, :, :-1, :-1] == heat).float()
return heat * keep
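# points_nms keeps only local maxima of the category heatmap: a 2x2 max-pool
# (stride 1, padding 1) is compared against the input, and grid cells that are
# not the local maximum are zeroed. This is the NMS applied to the S x S grid.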
def dice_loss(input, target):
input = input.contiguous().view(input.size()[0], -1)
target = target.contiguous().view(target.size()[0], -1).float()
a = torch.sum(input * target, 1)
b = torch.sum(input * input, 1) + 0.001
c = torch.sum(target * target, 1) + 0.001
d = (2 * a) / (b + c)
return 1-d
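# Per-instance Dice loss: d = 2*sum(p*t) / (sum(p^2) + sum(t^2)), with 0.001
# added to both denominator terms for numerical stability. The loss 1 - d is
# 0 for a perfect mask prediction and approaches 1 for a disjoint one.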
@HEADS.register_module
class SOLOHead(nn.Module):
def __init__(self,
num_classes,
in_channels,
seg_feat_channels=256,
stacked_convs=4,
strides=(4, 8, 16, 32, 64),
base_edge_list=(16, 32, 64, 128, 256),
scale_ranges=((8, 32), (16, 64), (32, 128), (64, 256), (128, 512)),
sigma=0.4,
num_grids=None,
cate_down_pos=0,
with_deform=False,
loss_ins=None,
loss_cate=None,
conv_cfg=None,
norm_cfg=None):
super(SOLOHead, self).__init__()
self.num_classes = num_classes
self.seg_num_grids = num_grids
self.cate_out_channels = self.num_classes - 1
self.in_channels = in_channels
self.seg_feat_channels = seg_feat_channels
self.stacked_convs = stacked_convs
self.strides = strides
self.sigma = sigma
self.cate_down_pos = cate_down_pos
self.base_edge_list = base_edge_list
self.scale_ranges = scale_ranges
self.with_deform = with_deform
self.loss_cate = build_loss(loss_cate)
self.ins_loss_weight = loss_ins['loss_weight']
self.conv_cfg = conv_cfg
self.norm_cfg = norm_cfg
self._init_layers()
def _init_layers(self):
norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)
self.ins_convs = nn.ModuleList()
self.cate_convs = nn.ModuleList()
for i in range(self.stacked_convs):
chn = self.in_channels + 2 if i == 0 else self.seg_feat_channels
self.ins_convs.append(
ConvModule(
chn,
self.seg_feat_channels,
3,
stride=1,
padding=1,
norm_cfg=norm_cfg,
bias=norm_cfg is None))
chn = self.in_channels if i == 0 else self.seg_feat_channels
self.cate_convs.append(
ConvModule(
chn,
self.seg_feat_channels,
3,
stride=1,
padding=1,
norm_cfg=norm_cfg,
bias=norm_cfg is None))
self.solo_ins_list = nn.ModuleList()
for seg_num_grid in self.seg_num_grids:
self.solo_ins_list.append(
nn.Conv2d(
self.seg_feat_channels, seg_num_grid**2, 1))
self.solo_cate = nn.Conv2d(
self.seg_feat_channels, self.cate_out_channels, 3, padding=1)
def init_weights(self):
for m in self.ins_convs:
normal_init(m.conv, std=0.01)
for m in self.cate_convs:
normal_init(m.conv, std=0.01)
bias_ins = bias_init_with_prob(0.01)
for m in self.solo_ins_list:
normal_init(m, std=0.01, bias=bias_ins)
bias_cate = bias_init_with_prob(0.01)
normal_init(self.solo_cate, std=0.01, bias=bias_cate)
def forward(self, feats, eval=False):
new_feats = self.split_feats(feats)
featmap_sizes = [featmap.size()[-2:] for featmap in new_feats]
upsampled_size = (featmap_sizes[0][0] * 2, featmap_sizes[0][1] * 2)
ins_pred, cate_pred = multi_apply(self.forward_single, new_feats,
list(range(len(self.seg_num_grids))),
eval=eval, upsampled_size=upsampled_size)
return ins_pred, cate_pred
def split_feats(self, feats):
return (F.interpolate(feats[0], scale_factor=0.5, mode='bilinear'),
feats[1],
feats[2],
feats[3],
F.interpolate(feats[4], size=feats[3].shape[-2:], mode='bilinear'))
def forward_single(self, x, idx, eval=False, upsampled_size=None):
ins_feat = x
cate_feat = x
# ins branch
# concat coord
x_range = torch.linspace(-1, 1, ins_feat.shape[-1], device=ins_feat.device)
y_range = torch.linspace(-1, 1, ins_feat.shape[-2], device=ins_feat.device)
y, x = torch.meshgrid(y_range, x_range)
y = y.expand([ins_feat.shape[0], 1, -1, -1])
x = x.expand([ins_feat.shape[0], 1, -1, -1])
coord_feat = torch.cat([x, y], 1)
ins_feat = torch.cat([ins_feat, coord_feat], 1)
for i, ins_layer in enumerate(self.ins_convs):
ins_feat = ins_layer(ins_feat)
ins_feat = F.interpolate(ins_feat, scale_factor=2, mode='bilinear')
ins_pred = self.solo_ins_list[idx](ins_feat)
# cate branch
for i, cate_layer in enumerate(self.cate_convs):
if i == self.cate_down_pos:
seg_num_grid = self.seg_num_grids[idx]
cate_feat = F.interpolate(cate_feat, size=seg_num_grid, mode='bilinear')
cate_feat = cate_layer(cate_feat)
cate_pred = self.solo_cate(cate_feat)
if eval:
ins_pred = F.interpolate(ins_pred.sigmoid(), size=upsampled_size, mode='bilinear')
cate_pred = points_nms(cate_pred.sigmoid(), kernel=2).permute(0, 2, 3, 1)
return ins_pred, cate_pred
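    # forward_single (above) makes the instance branch location-aware by
    # concatenating normalized x/y coordinate maps to the features
    # (CoordConv-style), then predicts S^2 mask channels; the category branch
    # is first resized to the S x S grid before predicting class scores.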
def loss(self,
ins_preds,
cate_preds,
gt_bbox_list,
gt_label_list,
gt_mask_list,
img_metas,
cfg,
gt_bboxes_ignore=None):
featmap_sizes = [featmap.size()[-2:] for featmap in
ins_preds]
ins_label_list, cate_label_list, ins_ind_label_list = multi_apply(
self.solo_target_single,
gt_bbox_list,
gt_label_list,
gt_mask_list,
featmap_sizes=featmap_sizes)
# ins
ins_labels = [torch.cat([ins_labels_level_img[ins_ind_labels_level_img, ...]
for ins_labels_level_img, ins_ind_labels_level_img in
zip(ins_labels_level, ins_ind_labels_level)], 0)
for ins_labels_level, ins_ind_labels_level in zip(zip(*ins_label_list), zip(*ins_ind_label_list))]
ins_preds = [torch.cat([ins_preds_level_img[ins_ind_labels_level_img, ...]
for ins_preds_level_img, ins_ind_labels_level_img in
zip(ins_preds_level, ins_ind_labels_level)], 0)
for ins_preds_level, ins_ind_labels_level in zip(ins_preds, zip(*ins_ind_label_list))]
ins_ind_labels = [
torch.cat([ins_ind_labels_level_img.flatten()
for ins_ind_labels_level_img in ins_ind_labels_level])
for ins_ind_labels_level in zip(*ins_ind_label_list)
]
flatten_ins_ind_labels = torch.cat(ins_ind_labels)
num_ins = flatten_ins_ind_labels.sum()
# dice loss
loss_ins = []
for input, target in zip(ins_preds, ins_labels):
if input.size()[0] == 0:
continue
input = torch.sigmoid(input)
loss_ins.append(dice_loss(input, target))
loss_ins = torch.cat(loss_ins).mean()
loss_ins = loss_ins * self.ins_loss_weight
# cate
cate_labels = [
torch.cat([cate_labels_level_img.flatten()
for cate_labels_level_img in cate_labels_level])
for cate_labels_level in zip(*cate_label_list)
]
flatten_cate_labels = torch.cat(cate_labels)
cate_preds = [
cate_pred.permute(0, 2, 3, 1).reshape(-1, self.cate_out_channels)
for cate_pred in cate_preds
]
flatten_cate_preds = torch.cat(cate_preds)
loss_cate = self.loss_cate(flatten_cate_preds, flatten_cate_labels, avg_factor=num_ins + 1)
return dict(
loss_ins=loss_ins,
loss_cate=loss_cate)
def solo_target_single(self,
gt_bboxes_raw,
gt_labels_raw,
gt_masks_raw,
featmap_sizes=None):
device = gt_labels_raw[0].device
# ins
gt_areas = torch.sqrt((gt_bboxes_raw[:, 2] - gt_bboxes_raw[:, 0]) * (
gt_bboxes_raw[:, 3] - gt_bboxes_raw[:, 1]))
ins_label_list = []
cate_label_list = []
ins_ind_label_list = []
for (lower_bound, upper_bound), stride, featmap_size, num_grid \
in zip(self.scale_ranges, self.strides, featmap_sizes, self.seg_num_grids):
ins_label = torch.zeros([num_grid ** 2, featmap_size[0], featmap_size[1]], dtype=torch.uint8, device=device)
cate_label = torch.zeros([num_grid, num_grid], dtype=torch.int64, device=device)
ins_ind_label = torch.zeros([num_grid ** 2], dtype=torch.bool, device=device)
hit_indices = ((gt_areas >= lower_bound) & (gt_areas <= upper_bound)).nonzero().flatten()
if len(hit_indices) == 0:
ins_label_list.append(ins_label)
cate_label_list.append(cate_label)
ins_ind_label_list.append(ins_ind_label)
continue
gt_bboxes = gt_bboxes_raw[hit_indices]
gt_labels = gt_labels_raw[hit_indices]
gt_masks = gt_masks_raw[hit_indices.cpu().numpy(), ...]
half_ws = 0.5 * (gt_bboxes[:, 2] - gt_bboxes[:, 0]) * self.sigma
half_hs = 0.5 * (gt_bboxes[:, 3] - gt_bboxes[:, 1]) * self.sigma
output_stride = stride / 2
for seg_mask, gt_label, half_h, half_w in zip(gt_masks, gt_labels, half_hs, half_ws):
if seg_mask.sum() < 10:
continue
# mass center
upsampled_size = (featmap_sizes[0][0] * 4, featmap_sizes[0][1] * 4)
center_h, center_w = ndimage.measurements.center_of_mass(seg_mask)
coord_w = int((center_w / upsampled_size[1]) // (1. / num_grid))
coord_h = int((center_h / upsampled_size[0]) // (1. / num_grid))
# left, top, right, down
top_box = max(0, int(((center_h - half_h) / upsampled_size[0]) // (1. / num_grid)))
down_box = min(num_grid - 1, int(((center_h + half_h) / upsampled_size[0]) // (1. / num_grid)))
left_box = max(0, int(((center_w - half_w) / upsampled_size[1]) // (1. / num_grid)))
right_box = min(num_grid - 1, int(((center_w + half_w) / upsampled_size[1]) // (1. / num_grid)))
top = max(top_box, coord_h-1)
down = min(down_box, coord_h+1)
left = max(coord_w-1, left_box)
right = min(right_box, coord_w+1)
cate_label[top:(down+1), left:(right+1)] = gt_label
# ins
seg_mask = mmcv.imrescale(seg_mask, scale=1. / output_stride)
seg_mask = torch.Tensor(seg_mask)
for i in range(top, down+1):
for j in range(left, right+1):
label = int(i * num_grid + j)
ins_label[label, :seg_mask.shape[0], :seg_mask.shape[1]] = seg_mask
ins_ind_label[label] = True
ins_label_list.append(ins_label)
cate_label_list.append(cate_label)
ins_ind_label_list.append(ins_ind_label)
return ins_label_list, cate_label_list, ins_ind_label_list
def get_seg(self, seg_preds, cate_preds, img_metas, cfg, rescale=None):
assert len(seg_preds) == len(cate_preds)
num_levels = len(cate_preds)
featmap_size = seg_preds[0].size()[-2:]
result_list = []
for img_id in range(len(img_metas)):
cate_pred_list = [
cate_preds[i][img_id].view(-1, self.cate_out_channels).detach() for i in range(num_levels)
]
seg_pred_list = [
seg_preds[i][img_id].detach() for i in range(num_levels)
]
img_shape = img_metas[img_id]['img_shape']
scale_factor = img_metas[img_id]['scale_factor']
ori_shape = img_metas[img_id]['ori_shape']
cate_pred_list = torch.cat(cate_pred_list, dim=0)
seg_pred_list = torch.cat(seg_pred_list, dim=0)
result = self.get_seg_single(cate_pred_list, seg_pred_list,
featmap_size, img_shape, ori_shape, scale_factor, cfg, rescale)
result_list.append(result)
return result_list
def get_seg_single(self,
cate_preds,
seg_preds,
featmap_size,
img_shape,
ori_shape,
scale_factor,
cfg,
rescale=False, debug=False):
assert len(cate_preds) == len(seg_preds)
# overall info.
h, w, _ = img_shape
upsampled_size_out = (featmap_size[0] * 4, featmap_size[1] * 4)
# process.
inds = (cate_preds > cfg.score_thr)
# category scores.
cate_scores = cate_preds[inds]
if len(cate_scores) == 0:
return None
# category labels.
inds = inds.nonzero()
cate_labels = inds[:, 1]
# strides.
size_trans = cate_labels.new_tensor(self.seg_num_grids).pow(2).cumsum(0)
strides = cate_scores.new_ones(size_trans[-1])
n_stage = len(self.seg_num_grids)
strides[:size_trans[0]] *= self.strides[0]
for ind_ in range(1, n_stage):
strides[size_trans[ind_ - 1]:size_trans[ind_]] *= self.strides[ind_]
strides = strides[inds[:, 0]]
# masks.
seg_preds = seg_preds[inds[:, 0]]
seg_masks = seg_preds > cfg.mask_thr
sum_masks = seg_masks.sum((1, 2)).float()
# filter.
keep = sum_masks > strides
if keep.sum() == 0:
return None
seg_masks = seg_masks[keep, ...]
seg_preds = seg_preds[keep, ...]
sum_masks = sum_masks[keep]
cate_scores = cate_scores[keep]
cate_labels = cate_labels[keep]
# mask scoring.
seg_scores = (seg_preds * seg_masks.float()).sum((1, 2)) / sum_masks
cate_scores *= seg_scores
# sort and keep top nms_pre
sort_inds = torch.argsort(cate_scores, descending=True)
if len(sort_inds) > cfg.nms_pre:
sort_inds = sort_inds[:cfg.nms_pre]
seg_masks = seg_masks[sort_inds, :, :]
seg_preds = seg_preds[sort_inds, :, :]
sum_masks = sum_masks[sort_inds]
cate_scores = cate_scores[sort_inds]
cate_labels = cate_labels[sort_inds]
# Matrix NMS
cate_scores = matrix_nms(seg_masks, cate_labels, cate_scores,
kernel=cfg.kernel, sigma=cfg.sigma, sum_masks=sum_masks)
# filter.
keep = cate_scores >= cfg.update_thr
if keep.sum() == 0:
return None
seg_preds = seg_preds[keep, :, :]
cate_scores = cate_scores[keep]
cate_labels = cate_labels[keep]
# sort and keep top_k
sort_inds = torch.argsort(cate_scores, descending=True)
if len(sort_inds) > cfg.max_per_img:
sort_inds = sort_inds[:cfg.max_per_img]
seg_preds = seg_preds[sort_inds, :, :]
cate_scores = cate_scores[sort_inds]
cate_labels = cate_labels[sort_inds]
seg_preds = F.interpolate(seg_preds.unsqueeze(0),
size=upsampled_size_out,
mode='bilinear')[:, :, :h, :w]
seg_masks = F.interpolate(seg_preds,
size=ori_shape[:2],
mode='bilinear').squeeze(0)
seg_masks = seg_masks > cfg.mask_thr
return seg_masks, cate_labels, cate_scores
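    # get_seg_single (above) implements the inference pipeline: threshold
    # category scores, binarize masks at mask_thr, drop masks smaller than
    # their stride, rescore by mean in-mask probability, keep the top nms_pre,
    # apply Matrix NMS, filter by update_thr, keep the top max_per_img, and
    # finally upsample the surviving masks to the original image shape.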
@@ -14,11 +14,13 @@ from .reppoints_detector import RepPointsDetector
from .retinanet import RetinaNet
from .rpn import RPN
from .single_stage import SingleStageDetector
from .single_stage_ins import SingleStageInsDetector
from .two_stage import TwoStageDetector
from .solo import SOLO
__all__ = [
'ATSS', 'BaseDetector', 'SingleStageDetector', 'TwoStageDetector', 'RPN',
'FastRCNN', 'FasterRCNN', 'MaskRCNN', 'CascadeRCNN', 'HybridTaskCascade',
'DoubleHeadRCNN', 'RetinaNet', 'FCOS', 'GridRCNN', 'MaskScoringRCNN',
'RepPointsDetector', 'FOVEA'
'RepPointsDetector', 'FOVEA', 'SingleStageInsDetector', 'SOLO'
]
import torch.nn as nn
from mmdet.core import bbox2result
from .. import builder
from ..registry import DETECTORS
from .base import BaseDetector
@DETECTORS.register_module
class SingleStageInsDetector(BaseDetector):
def __init__(self,
backbone,
neck=None,
bbox_head=None,
train_cfg=None,
test_cfg=None,
pretrained=None):
super(SingleStageInsDetector, self).__init__()
self.backbone = builder.build_backbone(backbone)
if neck is not None:
self.neck = builder.build_neck(neck)
self.bbox_head = builder.build_head(bbox_head)
self.train_cfg = train_cfg
self.test_cfg = test_cfg
self.init_weights(pretrained=pretrained)
def init_weights(self, pretrained=None):
super(SingleStageInsDetector, self).init_weights(pretrained)
self.backbone.init_weights(pretrained=pretrained)
if self.with_neck:
if isinstance(self.neck, nn.Sequential):
for m in self.neck:
m.init_weights()
else:
self.neck.init_weights()
self.bbox_head.init_weights()
def extract_feat(self, img):
x = self.backbone(img)
if self.with_neck:
x = self.neck(x)
return x
def forward_dummy(self, img):
x = self.extract_feat(img)
outs = self.bbox_head(x)
return outs
def forward_train(self,
img,
img_metas,
gt_bboxes,
gt_labels,
gt_bboxes_ignore=None,
gt_masks=None):
x = self.extract_feat(img)
outs = self.bbox_head(x)
loss_inputs = outs + (gt_bboxes, gt_labels, gt_masks, img_metas, self.train_cfg)
losses = self.bbox_head.loss(
*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
return losses
def simple_test(self, img, img_meta, rescale=False):
x = self.extract_feat(img)
outs = self.bbox_head(x, eval=True)
seg_inputs = outs + (img_meta, self.test_cfg, rescale)
seg_result = self.bbox_head.get_seg(*seg_inputs)
return seg_result
def aug_test(self, imgs, img_metas, rescale=False):
raise NotImplementedError
from .single_stage_ins import SingleStageInsDetector
from ..registry import DETECTORS
@DETECTORS.register_module
class SOLO(SingleStageInsDetector):
def __init__(self,
backbone,
neck,
bbox_head,
train_cfg=None,
test_cfg=None,
pretrained=None):
super(SOLO, self).__init__(backbone, neck, bbox_head, train_cfg,
test_cfg, pretrained)
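# A minimal sketch of building SOLO from one of the configs above (assumes an
# mmdet v1.x environment; the config path is illustrative):
#
#   from mmcv import Config
#   from mmdet.models import build_detector
#   cfg = Config.fromfile('configs/solo/solo_r50_fpn_8gpu_1x.py')
#   model = build_detector(cfg.model, train_cfg=cfg.train_cfg,
#                          test_cfg=cfg.test_cfg)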
matplotlib
mmcv>=0.2.15
numpy
scipy
# need older pillow until torchvision is fixed
Pillow<=6.2.2
six
import argparse
import os
import os.path as osp
import shutil
import tempfile
import mmcv
import torch
import torch.nn.functional as F
import torch.distributed as dist
from mmcv.parallel import MMDataParallel, MMDistributedDataParallel
from mmcv.runner import init_dist, get_dist_info, load_checkpoint
from mmdet.core import coco_eval, results2json, results2json_segm, wrap_fp16_model, tensor2imgs, get_classes
from mmdet.datasets import build_dataloader, build_dataset
from mmdet.models import build_detector
import time
import numpy as np
import pycocotools.mask as mask_util
def get_masks(result, num_classes=80):
for cur_result in result:
masks = [[] for _ in range(num_classes)]
if cur_result is None:
return masks
seg_pred = cur_result[0].cpu().numpy().astype(np.uint8)
        cate_label = cur_result[1].cpu().numpy().astype(int)
        cate_score = cur_result[2].cpu().numpy().astype(float)
num_ins = seg_pred.shape[0]
for idx in range(num_ins):
cur_mask = seg_pred[idx, ...]
rle = mask_util.encode(
np.array(cur_mask[:, :, np.newaxis], order='F'))[0]
rst = (rle, cate_score[idx])
masks[cate_label[idx]].append(rst)
return masks
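# get_masks (above) converts one image's (seg_pred, cate_label, cate_score)
# output into a per-class list of (RLE, score) pairs via pycocotools, i.e. the
# format consumed by results2json_segm.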
def single_gpu_test(model, data_loader, show=False, verbose=True):
model.eval()
results = []
dataset = data_loader.dataset
num_classes = len(dataset.CLASSES)
prog_bar = mmcv.ProgressBar(len(dataset))
for i, data in enumerate(data_loader):
with torch.no_grad():
seg_result = model(return_loss=False, rescale=not show, **data)
result = get_masks(seg_result, num_classes=num_classes)
results.append(result)
batch_size = data['img'][0].size(0)
for _ in range(batch_size):
prog_bar.update()
return results
def multi_gpu_test(model, data_loader, tmpdir=None):
model.eval()
results = []
dataset = data_loader.dataset
rank, world_size = get_dist_info()
if rank == 0:
prog_bar = mmcv.ProgressBar(len(dataset))
for i, data in enumerate(data_loader):
with torch.no_grad():
result = model(return_loss=False, rescale=True, **data)
results.append(result)
if rank == 0:
batch_size = data['img'][0].size(0)
for _ in range(batch_size * world_size):
prog_bar.update()
# collect results from all ranks
results = collect_results(results, len(dataset), tmpdir)
return results
def collect_results(result_part, size, tmpdir=None):
rank, world_size = get_dist_info()
# create a tmp dir if it is not specified
if tmpdir is None:
MAX_LEN = 512
# 32 is whitespace
dir_tensor = torch.full((MAX_LEN, ),
32,
dtype=torch.uint8,
device='cuda')
if rank == 0:
tmpdir = tempfile.mkdtemp()
tmpdir = torch.tensor(
bytearray(tmpdir.encode()), dtype=torch.uint8, device='cuda')
dir_tensor[:len(tmpdir)] = tmpdir
dist.broadcast(dir_tensor, 0)
tmpdir = dir_tensor.cpu().numpy().tobytes().decode().rstrip()
else:
mmcv.mkdir_or_exist(tmpdir)
# dump the part result to the dir
mmcv.dump(result_part, osp.join(tmpdir, 'part_{}.pkl'.format(rank)))
dist.barrier()
# collect all parts
if rank != 0:
return None
else:
# load results of all parts from tmp dir
part_list = []
for i in range(world_size):
part_file = osp.join(tmpdir, 'part_{}.pkl'.format(i))
part_list.append(mmcv.load(part_file))
# sort the results
ordered_results = []
for res in zip(*part_list):
ordered_results.extend(list(res))
# the dataloader may pad some samples
ordered_results = ordered_results[:size]
# remove tmp dir
shutil.rmtree(tmpdir)
return ordered_results
def parse_args():
parser = argparse.ArgumentParser(description='MMDet test detector')
parser.add_argument('config', help='test config file path')
parser.add_argument('checkpoint', help='checkpoint file')
parser.add_argument('--out', help='output result file')
parser.add_argument(
'--json_out',
help='output result file name without extension',
type=str)
parser.add_argument(
'--eval',
type=str,
nargs='+',
choices=['proposal', 'proposal_fast', 'bbox', 'segm', 'keypoints'],
help='eval types')
parser.add_argument('--show', action='store_true', help='show results')
parser.add_argument('--tmpdir', help='tmp dir for writing some results')
parser.add_argument(
'--launcher',
choices=['none', 'pytorch', 'slurm', 'mpi'],
default='none',
help='job launcher')
parser.add_argument('--local_rank', type=int, default=0)
args = parser.parse_args()
if 'LOCAL_RANK' not in os.environ:
os.environ['LOCAL_RANK'] = str(args.local_rank)
return args
def main():
args = parse_args()
assert args.out or args.show or args.json_out, \
('Please specify at least one operation (save or show the results) '
'with the argument "--out" or "--show" or "--json_out"')
if args.out is not None and not args.out.endswith(('.pkl', '.pickle')):
raise ValueError('The output file must be a pkl file.')
if args.json_out is not None and args.json_out.endswith('.json'):
args.json_out = args.json_out[:-5]
cfg = mmcv.Config.fromfile(args.config)
# set cudnn_benchmark
if cfg.get('cudnn_benchmark', False):
torch.backends.cudnn.benchmark = True
cfg.model.pretrained = None
cfg.data.test.test_mode = True
# init distributed env first, since logger depends on the dist info.
if args.launcher == 'none':
distributed = False
else:
distributed = True
init_dist(args.launcher, **cfg.dist_params)
# build the dataloader
# TODO: support multiple images per gpu (only minor changes are needed)
dataset = build_dataset(cfg.data.test)
data_loader = build_dataloader(
dataset,
imgs_per_gpu=1,
workers_per_gpu=cfg.data.workers_per_gpu,
dist=distributed,
shuffle=False)
# build the model and load checkpoint
model = build_detector(cfg.model, train_cfg=None, test_cfg=cfg.test_cfg)
fp16_cfg = cfg.get('fp16', None)
if fp16_cfg is not None:
wrap_fp16_model(model)
while not osp.isfile(args.checkpoint):
print('Waiting for {} to exist...'.format(args.checkpoint))
time.sleep(60)
checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu')
    # old versions did not save class info in checkpoints, this workaround is
    # for backward compatibility
if 'CLASSES' in checkpoint['meta']:
model.CLASSES = checkpoint['meta']['CLASSES']
else:
model.CLASSES = dataset.CLASSES
assert not distributed
if not distributed:
model = MMDataParallel(model, device_ids=[0])
outputs = single_gpu_test(model, data_loader)
else:
model = MMDistributedDataParallel(model.cuda())
outputs = multi_gpu_test(model, data_loader, args.tmpdir)
rank, _ = get_dist_info()
if args.out and rank == 0:
print('\nwriting results to {}'.format(args.out))
mmcv.dump(outputs, args.out)
eval_types = args.eval
if eval_types:
            print('Starting to evaluate {}'.format(' and '.join(eval_types)))
if eval_types == ['proposal_fast']:
result_file = args.out
coco_eval(result_file, eval_types, dataset.coco)
else:
if not isinstance(outputs[0], dict):
result_files = results2json_segm(dataset, outputs, args.out)
coco_eval(result_files, eval_types, dataset.coco)
else:
for name in outputs[0]:
print('\nEvaluating {}'.format(name))
outputs_ = [out[name] for out in outputs]
result_file = args.out + '.{}'.format(name)
result_files = results2json(dataset, outputs_,
result_file)
coco_eval(result_files, eval_types, dataset.coco)
# Save predictions in the COCO json format
if args.json_out and rank == 0:
if not isinstance(outputs[0], dict):
results2json(dataset, outputs, args.json_out)
else:
for name in outputs[0]:
outputs_ = [out[name] for out in outputs]
result_file = args.json_out + '.{}'.format(name)
results2json(dataset, outputs_, result_file)
if __name__ == '__main__':
main()