# Benchmark and Model Zoo
## Environment
### Hardware
- 8 NVIDIA Tesla V100 GPUs
- Intel Xeon 4114 CPU @ 2.20GHz
### Software
- Python 3.6 / 3.7
- PyTorch 0.4.1
- CUDA 9.0.176
- CUDNN 7.0.4
- NCCL 2.1.15
## Common settings
- All baselines were trained using 8 GPUs with a batch size of 16 (2 images per GPU).
- All models were trained on `coco_2017_train` and tested on `coco_2017_val`.
- We use distributed training, and the BN layers' statistics are frozen.
- We adopt the same training schedules as Detectron: 1x indicates 12 epochs and 2x indicates 24 epochs, which corresponds to slightly fewer iterations than Detectron; the difference is negligible.
- All pytorch-style pretrained backbones on ImageNet are from PyTorch model zoo.
- We report the training GPU memory as the maximum value of `torch.cuda.max_memory_cached()`
over all 8 GPUs (see the measurement sketch after this list). Note that this value is usually
smaller than what `nvidia-smi` shows, but closer to the actual requirement.
- We report the inference time as the overall time including data loading,
network forwarding and post processing.
- The training memory and time of the 2x schedule are simply copied from 1x;
they should be very close to the actual values.
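For reference, here is a minimal sketch of how such a peak value can be read out with the PyTorch 0.4.x API (each of the 8 workers queries its own device; the tables list the maximum across GPUs):
```python
import torch

def peak_cached_memory_gb(device=None):
    """Peak cached GPU memory in GB, read via torch.cuda.max_memory_cached()."""
    return torch.cuda.max_memory_cached(device) / 1024**3

# Called once after training finishes; in PyTorch 0.4.x the statistic
# accumulates over the lifetime of the process.
```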
## Baselines
We released RPN, Faster R-CNN and Mask R-CNN models in the first version. More models with different backbones will be added to the model zoo.
### RPN
| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | AR1000 | Download |
|:--------:|:-------:|:-------:|:--------:|:-------------------:|:--------------:|:------:|:--------:|
| R-50-FPN | caffe | 1x | 4.5 | 0.379 | 14.4 | 58.2 | - |
| R-50-FPN | pytorch | 1x | 4.8 | 0.407 | 14.5 | 57.1 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/rpn_r50_fpn_1x_20181010-4a9c0712.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/rpn_r50_fpn_1x_20181010_results.pkl.json) |
| R-50-FPN | pytorch | 2x | 4.8 | 0.407 | 14.5 | 57.6 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/rpn_r50_fpn_2x_20181010-88a4a471.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/rpn_r50_fpn_2x_20181010_results.pkl.json) |
### Faster R-CNN
| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | Download |
|:--------:|:-------:|:-------:|:--------:|:-------------------:|:--------------:|:------:|:--------:|
| R-50-FPN | caffe | 1x | 4.9 | 0.525 | 10.0 | 36.7 | - |
| R-50-FPN | pytorch | 1x | 5.1 | 0.554 | 9.9 | 36.4 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/faster_rcnn_r50_fpn_1x_20181010_results.pkl.json) |
| R-50-FPN | pytorch | 2x | 5.1 | 0.554 | 9.9 | 37.7 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/faster_rcnn_r50_fpn_2x_20181010-443129e1.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/faster_rcnn_r50_fpn_2x_20181010_results.pkl.json) |
### Mask R-CNN
| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | mask AP | Download |
|:--------:|:-------:|:-------:|:--------:|:-------------------:|:--------------:|:------:|:-------:|:--------:|
| R-50-FPN | caffe | 1x | 5.9 | 0.658 | 7.7 | 37.5 | 34.4 | - |
| R-50-FPN | pytorch | 1x | 5.8 | 0.690 | 7.7 | 37.3 | 34.2 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/mask_rcnn_r50_fpn_1x_20181010_results.pkl.json) |
| R-50-FPN | pytorch | 2x | 5.8 | 0.690 | 7.7 | 38.6 | 35.1 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/mask_rcnn_r50_fpn_2x_20181010-41d35c05.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/mask_rcnn_r50_fpn_2x_20181010_results.pkl.json) |
### Fast R-CNN (with pre-computed proposals)
| Backbone | Style | Type | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | mask AP | Download |
|:--------:|:-------:|:------:|:-------:|:--------:|:-------------------:|:--------------:|:------:|:-------:|:--------:|
| R-50-FPN | caffe | Faster | 1x | 3.5 | 0.35 | 14.6 | 36.6 | - | - |
| R-50-FPN | pytorch | Faster | 1x | 4.0 | 0.38 | 14.5 | 35.8 | - | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/fast_rcnn_r50_fpn_1x_20181010-08160859.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/fast_rcnn_r50_fpn_1x_20181010_results.pkl.json) |
| R-50-FPN | pytorch | Faster | 2x | 4.0 | 0.38 | 14.5 | 37.1 | - | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/fast_rcnn_r50_fpn_2x_20181010-d263ada5.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/fast_rcnn_r50_fpn_2x_20181010_results.pkl.json) |
| R-50-FPN | caffe | Mask | 1x | 5.4 | 0.47 | 10.7 | 37.3 | 34.5 | - |
| R-50-FPN | pytorch | Mask | 1x | 5.3 | 0.50 | 10.6 | 36.8 | 34.1 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/fast_mask_rcnn_r50_fpn_1x_20181010-e030a38f.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/fast_mask_rcnn_r50_fpn_1x_20181010_results.pkl.json) |
| R-50-FPN | pytorch | Mask | 2x | 5.3 | 0.50 | 10.6 | 37.9 | 34.8 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/fast_mask_rcnn_r50_fpn_2x_20181010-5048cb03.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/fast_mask_rcnn_r50_fpn_2x_20181010_results.pkl.json) |
### RetinaNet (coming soon)
| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | Download |
|:--------:|:-------:|:-------:|:--------:|:-------------------:|:--------------:|:------:|:--------:|
| R-50-FPN | caffe | 1x | | | | | |
| R-50-FPN | pytorch | 1x | | | | | |
| R-50-FPN | pytorch | 2x | | | | | |
## Comparison with Detectron
We compare mmdetection with [Detectron](https://github.com/facebookresearch/Detectron)
and [Detectron.pytorch](https://github.com/roytseng-tw/Detectron.pytorch),
a third-party port of Detectron to Pytorch. The backbone used is R-50-FPN.
In general, mmdetection has 3 advantages over Detectron.
- **Higher performance** (especially in terms of mask AP)
- **Faster training speed**
- **Memory efficient**
### Performance
Detectron and Detectron.pytorch use caffe-style ResNet as the backbone.
In order to utilize the PyTorch model zoo, we use pytorch-style ResNet in our experiments.
Meanwhile, we also train 1x models with caffe-style ResNet for comparison.
We find that pytorch-style ResNet usually converges more slowly than caffe-style ResNet,
leading to slightly lower results with the 1x schedule, but higher final results
with the 2x schedule.
We report results using both caffe-style (weights converted from
[here](https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md#imagenet-pretrained-models))
and pytorch-style (weights from the official model zoo) ResNet backbone,
indicated as *pytorch-style results* / *caffe-style results*.
<table>
<tr>
<th>Type</th>
<th>Lr schd</th>
<th>Detectron</th>
<th>Detectron.pytorch</th>
<th>mmdetection</th>
</tr>
<tr>
<td rowspan="2">RPN</td>
<td>1x</td>
<td>57.2</td>
<td>-</td>
<td>57.1 / 58.2</td>
</tr>
<tr>
<td>2x</td>
<td>-</td>
<td>-</td>
<td>57.6 / -</td>
</tr>
<tr>
<td rowspan="2">Faster R-CNN</td>
<td>1x</td>
<td>36.7</td>
<td>37.1</td>
<td>36.4 / 36.7</td>
</tr>
<tr>
<td>2x</td>
<td>37.9</td>
<td>-</td>
<td>37.7 / -</td>
</tr>
<tr>
<td rowspan="2">Mask R-CNN</td>
<td>1x</td>
<td>37.7 &amp; 33.9</td>
<td>37.7 &amp; 33.7</td>
<td>37.3 &amp; 34.2 / 37.5 &amp; 34.4</td>
</tr>
<tr>
<td>2x</td>
<td>38.6 &amp; 34.5</td>
<td>-</td>
<td>38.6 &amp; 35.1 / -</td>
</tr>
<tr>
<td rowspan="2">Fast R-CNN</td>
<td>1x</td>
<td>36.4</td>
<td>-</td>
<td>35.8 / 36.6</td>
</tr>
<tr>
<td>2x</td>
<td>36.8</td>
<td>-</td>
<td>37.1 / -</td>
</tr>
<tr>
<td rowspan="2">Fast R-CNN (w/mask)</td>
<td>1x</td>
<td>37.3 &amp; 33.7</td>
<td>-</td>
<td>36.8 &amp; 34.1 / 37.3 &amp; 34.5</td>
</tr>
<tr>
<td>2x</td>
<td>37.7 &amp; 34.0</td>
<td>-</td>
<td>37.9 &amp; 34.8 / -</td>
</tr>
</table>
### Training Speed
The training speed is measured in s/iter. Lower is better.
<table>
<tr>
<th>Type</th>
<th>Detectron (P100<sup>1</sup>)</th>
<th>Detectron.pytorch (XP<sup>2</sup>)</th>
<th>mmdetection<sup>3</sup> (V100<sup>4</sup> / XP)</th>
</tr>
<tr>
<td>RPN</td>
<td>0.416</td>
<td>-</td>
<td>0.407 / 0.413</td>
</tr>
<tr>
<td>Faster R-CNN</td>
<td>0.544</td>
<td>1.015</td>
<td>0.554 / 0.579</td>
</tr>
<tr>
<td>Mask R-CNN</td>
<td>0.889</td>
<td>1.435</td>
<td>0.690 / 0.732</td>
</tr>
<tr>
<td>Fast R-CNN</td>
<td>0.285</td>
<td>-</td>
<td>0.375 / 0.398</td>
</tr>
<tr>
<td>Fast R-CNN (w/mask)</td>
<td>0.377</td>
<td>-</td>
<td>0.504 / 0.574</td>
</tr>
</table>
\*1. Detectron reports the speed on Facebook's Big Basin servers (P100).
It runs more slowly on our V100 servers, so we quote the officially reported values.
\*2. Detectron.pytorch does not report its runtime, and we encountered some issues
running it on V100, so we report the speed on TITAN XP.
\*3. Pytorch-style ResNet is approximately 5% slower than caffe-style,
and we report the pytorch-style results here.
\*4. We also ran the models on a DGX-1 server (P100) and the speed is almost the same as on our V100 servers.
### Inference Speed
The inference speed is measured in fps (img/s) on a single GPU. Higher is better.
<table>
<tr>
<th>Type</th>
<th>Detectron (P100)</th>
<th>Detectron.pytorch (XP)</th>
<th>mmdetection (V100 / XP)</th>
</tr>
<tr>
<td>RPN</td>
<td>12.5</td>
<td>-</td>
<td>14.5 / 15.4</td>
</tr>
<tr>
<td>Faster R-CNN</td>
<td>10.3</td>
<td>-</td>
<td>9.9 / 9.8</td>
</tr>
<tr>
<td>Mask R-CNN</td>
<td>8.5</td>
<td>-</td>
<td>7.7 / 7.4</td>
</tr>
<tr>
<td>Fast R-CNN</td>
<td>12.5</td>
<td>-</td>
<td>14.5 / 14.1</td>
</tr>
<tr>
<td>Fast R-CNN (w/mask)</td>
<td>9.9</td>
<td>-</td>
<td>10.6 / 10.3</td>
</tr>
</table>
### Training Memory
Our tests show that mmdetection is clearly more memory efficient than Detectron;
the main cause is the underlying deep learning framework itself, not our own optimizations.
Note also that Caffe2 and PyTorch expose different APIs for querying memory usage,
and their implementations are not exactly the same.
`nvidia-smi` shows a larger memory usage for both Detectron and mmdetection, e.g.,
when training Mask R-CNN with 2 images per GPU it reports 10.6G for Detectron and 9.3G for mmdetection, which is clearly more than is actually required.
> With mmdetection, we can train R-50 FPN Mask R-CNN with **4** images per GPU (TITAN XP, 12G),
which is a promising result.
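The per-GPU batch size is controlled by `imgs_per_gpu` in the data config (see the config files later in this commit); a sketch of the 4-images-per-GPU setting:
```python
# Sketch: the per-GPU batch size lives in the data config;
# the train/val/test dataset settings stay as in the released configs.
data = dict(
    imgs_per_gpu=4,  # released configs use 2; 4 fits in 12G on a TITAN XP
    workers_per_gpu=2)
```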
# mmdetection
## Introduction
mmdetection is an open source object detection toolbox based on PyTorch. It is
a part of the open-mmlab project developed by [Multimedia Laboratory, CUHK](http://mmlab.ie.cuhk.edu.hk/).
### Major features
- **Modular Design**
One can easily construct a customized object detection framework by combining different components.
- **Support of multiple frameworks out of box**
The toolbox directly supports popular detection frameworks, *e.g.*, Faster RCNN, Mask RCNN, RetinaNet, etc.
- **Efficient**
All basic bbox and mask operations now run on GPUs.
The training speed is about 5%–20% faster than Detectron, depending on the model.
- **State of the art**
This was the codebase of the *MMDet* team, who won the [COCO Detection 2018 challenge](http://cocodataset.org/#detection-leaderboard).
Apart from mmdetection, we also released [mmcv](https://github.com/open-mmlab/mmcv), a library for computer vision research, on which this toolbox depends heavily.
## License
This project is released under the [GPLv3 license](LICENSE).
## Benchmark and model zoo
We provide our baseline results and a comparison with Detectron, the most
popular detection project. Results and models are available in the [Model zoo](MODEL_ZOO.md).
## Installation
### Requirements
- Linux (tested on Ubuntu 16.04 and CentOS 7.2)
- Python 3.4+
- PyTorch 0.4.1 and torchvision
- Cython
- [mmcv](https://github.com/open-mmlab/mmcv)
### Install mmdetection
a. Install PyTorch 0.4.1 and torchvision following the [official instructions](https://pytorch.org/).
b. Clone the mmdetection repository.
```shell
git clone https://github.com/open-mmlab/mmdetection.git
```
c. Compile CUDA extensions.
```shell
cd mmdetection
pip install cython # or "conda install cython" if you prefer conda
./compile.sh # or "PYTHON=python3 ./compile.sh" if you use system python3 without virtual environments
```
d. Install mmdetection (other dependencies will be installed automatically).
```shell
python(3) setup.py install # add --user if you want to install it locally
# or "pip install ."
```
Note: You need to re-run the last step each time you pull updates from GitHub,
since the git commit id is written into the version number and saved in trained models.
### Prepare COCO dataset
It is recommended to symlink the dataset root to `$MMDETECTION/data`.
```
mmdetection
├── mmdet
├── tools
├── configs
├── data
│ ├── coco
│ │ ├── annotations
│ │ ├── train2017
│ │ ├── val2017
│ │ ├── test2017
```
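For example, assuming COCO has been downloaded to `/data/coco` (a placeholder path), the symlink can be created with:
```shell
cd mmdetection
mkdir -p data
ln -s /data/coco data/coco
```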
> [Here](https://gist.github.com/hellock/bf23cd7348c727d69d48682cb6909047) is
a script for setting up mmdetection with conda for reference.
## Inference with pretrained models
### Test a dataset
- [x] single GPU testing
- [x] multiple GPU testing
- [x] visualize detection results
We allow running one or multiple processes on each GPU, e.g., 8 processes on 8 GPUs
or 16 processes on 8 GPUs. When the GPU workload is not very heavy for a single
process, running multiple processes will accelerate testing; the number of processes
is specified with the argument `--proc_per_gpu <PROCESS_NUM>`.
To test a dataset and save the results:
```shell
python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --gpus <GPU_NUM> --out <OUT_FILE>
```
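To run multiple processes per GPU as described above, e.g., 2 processes per GPU:
```shell
python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --gpus <GPU_NUM> --proc_per_gpu 2 --out <OUT_FILE>
```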
To perform evaluation after testing, add `--eval <EVAL_TYPES>`. Supported types are:
- `proposal_fast`: evaluate recalls of proposals with our own code (expected to give the same results as the official evaluation).
- `proposal`: evaluate recalls of proposals with the official COCO code.
- `bbox`: evaluate box AP with the official COCO code.
- `segm`: evaluate mask AP with the official COCO code.
- `keypoints`: evaluate keypoint AP with the official COCO code.
For example, to evaluate Mask R-CNN with 8 GPUs and save the result as `results.pkl`:
```shell
python tools/test.py configs/mask_rcnn_r50_fpn_1x.py <CHECKPOINT_FILE> --gpus 8 --out results.pkl --eval bbox segm
```
It is also convenient to visualize the results during testing by adding the argument `--show`.
```shell
python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --show
```
### Test image(s)
We provide some high-level APIs (experimental) for testing an image.
```python
import mmcv
from mmcv.runner import load_checkpoint
from mmdet.models import build_detector
from mmdet.apis import inference_detector, show_result
cfg = mmcv.Config.fromfile('configs/faster_rcnn_r50_fpn_1x.py')
cfg.model.pretrained = None
# construct the model and load checkpoint
model = build_detector(cfg.model, test_cfg=cfg.test_cfg)
_ = load_checkpoint(model, 'https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth')
# test a single image
img = mmcv.imread('test.jpg')
result = inference_detector(model, img, cfg)
show_result(img, result)
# test a list of images
imgs = ['test1.jpg', 'test2.jpg']
for i, result in enumerate(inference_detector(model, imgs, cfg, device='cuda:0')):
    print(i, imgs[i])
    show_result(imgs[i], result)
```
## Train a model
mmdetection implements distributed and non-distributed training,
using `MMDistributedDataParallel` and `MMDataParallel` respectively.
We suggest using distributed training even on a single machine, since it is faster;
non-distributed training is left for debugging and other purposes.
### Distributed training
mmdetection potentially supports multiple launch methods, e.g., PyTorch’s built-in launch utility, slurm and MPI.
We provide a training script using the launch utility provided by PyTorch.
```shell
./tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> [optional arguments]
```
Supported arguments are:
- `--validate`: perform evaluation every k (default: 1) epochs during training.
- `--work_dir <WORK_DIR>`: if specified, the path in the config file will be overridden.
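For example, to train Mask R-CNN with the 1x schedule on 8 GPUs, evaluating every epoch:
```shell
./tools/dist_train.sh configs/mask_rcnn_r50_fpn_1x.py 8 --validate --work_dir ./work_dirs/mask_rcnn_r50_fpn_1x
```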
### Non-distributed training
```shell
python tools/train.py <CONFIG_FILE> --gpus <GPU_NUM> --work_dir <WORK_DIR> --validate
```
Expected results in WORK_DIR:
- log file
- saved checkpoints (every k epochs, default: 1)
- a symlink to the latest checkpoint
## Technical details
Some implementation details and project structures are described in the [technical details](TECHNICAL_DETAILS.md).
## Overview
In this section, we will introduce the main units of training a detector:
data loading, model and iteration pipeline.
## Data loading
Following typical conventions, we use `Dataset` and `DataLoader` for data loading
with multiple workers. `Dataset` returns a dict of data items corresponding
to the arguments of the model's forward method.
Since the data in object detection may not be the same size (image size, gt bbox size, etc.),
we introduce a new `DataContainer` type in `mmcv` to help collect and distribute
data of different sizes.
See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details.
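As a rough illustration (a sketch only; the import path is taken from the linked file, while the `stack` and `cpu_only` flags are assumptions based on it), a dataset sample might wrap fields of different sizes like this:
```python
import torch
from mmcv.parallel import DataContainer as DC

# Dummy fields of the kind an object detection sample carries.
img = torch.randn(3, 800, 1216)  # image padded to a multiple of size_divisor
gt_bboxes = torch.rand(5, 4)     # 5 boxes here; another image may have 12
img_meta = dict(ori_shape=(800, 1200, 3), flip=False)

# The wrapper tells the collate logic how to batch each field
# (flag names assumed from data_container.py):
sample = dict(
    img=DC(img, stack=True),               # equal padded sizes -> stack into a batch
    gt_bboxes=DC(gt_bboxes),               # variable length -> keep as per-image list
    img_meta=DC(img_meta, cpu_only=True))  # plain metadata stays on the CPU
```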
## Model
In mmdetection, model components are basically categorized into 4 types.
- backbone: usually an FCN network to extract feature maps, e.g., ResNet.
- neck: the component between backbones and heads, e.g., FPN, ASPP.
- head: the component for specific tasks, e.g., bbox prediction and mask prediction.
- roi extractor: the component for extracting features from feature maps, e.g., RoI Align.
We also implement some general detection pipelines with the above components,
such as `SingleStageDetector` and `TwoStageDetector`.
### Build a model with basic components
Following some basic pipelines (e.g., two-stage detectors), the model structure
can be customized through config files with no pain.
If we want to implement some new components, e.g., the path aggregation
FPN structure in [Path Aggregation Network for Instance Segmentation](https://arxiv.org/abs/1803.01534), there are two things to do.
1. Create a new file `mmdet/models/necks/pafpn.py`.
```python
import torch.nn as nn


class PAFPN(nn.Module):

    def __init__(self,
                 in_channels,
                 out_channels,
                 num_outs,
                 start_level=0,
                 end_level=-1,
                 add_extra_convs=False):
        pass

    def forward(self, inputs):
        # implementation is ignored
        pass
```
2. modify the config file from
```python
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5)
```
to
```python
neck=dict(
type='PAFPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5)
```
We will release more components (backbones, necks, heads) for research purposes.
### Write a new model
To write a new detection pipeline, you need to inherit from `BaseDetector`,
which defines the following abstract methods.
- `extract_feat()`: given an image batch of shape (n, c, h, w), extract the feature map(s).
- `forward_train()`: forward method of the training mode.
- `simple_test()`: single scale testing without augmentation.
- `aug_test()`: testing with augmentations (multi-scale, flip, etc.).
[TwoStageDetector](https://github.com/hellock/mmdetection/blob/master/mmdet/models/detectors/two_stage.py)
is a good example which shows how to do that.
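A minimal skeleton (signatures are inferred from the method descriptions above, not copied from the actual base class) could look like:
```python
from mmdet.models.detectors.base import BaseDetector  # assumed module path


class MyDetector(BaseDetector):

    def extract_feat(self, img):
        # (n, c, h, w) image batch -> feature map(s), e.g. via backbone + neck
        raise NotImplementedError

    def forward_train(self, img, img_meta, **kwargs):
        # return a dict of losses for the optimizer
        raise NotImplementedError

    def simple_test(self, img, img_meta, **kwargs):
        # single-scale testing without augmentation
        raise NotImplementedError

    def aug_test(self, imgs, img_metas, **kwargs):
        # testing with augmentations (multi-scale, flip, etc.)
        raise NotImplementedError
```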
## Iteration pipeline
We adopt distributed training for both single machines and multiple machines.
Supposing the server has 8 GPUs, 8 processes will be started, each running on a single GPU.
Each process keeps an isolated model, data loader, and optimizer.
Model parameters are only synchronized once at the beginning.
After a forward and backward pass, gradients are allreduced among all GPUs,
and the optimizer updates the model parameters.
Since the gradients are allreduced, model parameters stay the same across all processes after each iteration.
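Conceptually, the synchronization step amounts to the following plain `torch.distributed` sketch (the actual `MMDistributedDataParallel` buckets gradients and overlaps communication, so this illustrates the idea, not the implementation):
```python
import torch.distributed as dist


def allreduce_grads(model, world_size):
    # Sum each gradient over all processes, then average, so every
    # process applies the identical update and parameters stay in sync.
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad.data)  # default reduce op is SUM
            param.grad.data.div_(world_size)
```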
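# --- configs/fast_mask_rcnn_r50_fpn_1x.py (path inferred from work_dir below) ---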
# model settings
model = dict(
type='FastRCNN',
pretrained='modelzoo://resnet50',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
style='pytorch'),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5),
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=dict(
type='SharedFCRoIHead',
num_fcs=2,
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=81,
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2],
reg_class_agnostic=False),
mask_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', out_size=14, sample_num=2),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
mask_head=dict(
type='FCNMaskHead',
num_convs=4,
in_channels=256,
conv_out_channels=256,
num_classes=81))
# model training and testing settings
train_cfg = dict(
rcnn=dict(
mask_size=28,
pos_iou_thr=0.5,
neg_iou_thr=0.5,
crowd_thr=1.1,
roi_batch_size=512,
add_gt_as_proposals=True,
pos_fraction=0.25,
pos_balance_sampling=False,
neg_pos_ub=512,
neg_balance_thr=0,
min_pos_iou=0.5,
pos_weight=-1,
debug=False))
test_cfg = dict(
rcnn=dict(
score_thr=0.05, max_per_img=100, nms_thr=0.5, mask_thr_binary=0.5))
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_train2017.json',
img_prefix=data_root + 'train2017/',
img_scale=(1333, 800),
img_norm_cfg=img_norm_cfg,
size_divisor=32,
proposal_file=data_root + 'proposals/rpn_r50_fpn_1x_train2017.pkl',
flip_ratio=0.5,
with_mask=True,
with_crowd=True,
with_label=True),
val=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
img_scale=(1333, 800),
img_norm_cfg=img_norm_cfg,
proposal_file=data_root + 'proposals/rpn_r50_fpn_1x_val2017.pkl',
size_divisor=32,
flip_ratio=0,
with_mask=True,
with_crowd=True,
with_label=True),
test=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
img_scale=(1333, 800),
img_norm_cfg=img_norm_cfg,
proposal_file=data_root + 'proposals/rpn_r50_fpn_1x_val2017.pkl',
size_divisor=32,
flip_ratio=0,
with_mask=False,
with_label=False,
test_mode=True))
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
step=[8, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
# yapf:enable
# runtime settings
total_epochs = 12
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/fast_mask_rcnn_r50_fpn_1x'
load_from = None
resume_from = None
workflow = [('train', 1)]
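# --- configs/fast_rcnn_r50_fpn_1x.py (path inferred from work_dir below) ---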
# model settings
model = dict(
type='FastRCNN',
pretrained='modelzoo://resnet50',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
style='pytorch'),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5),
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=dict(
type='SharedFCRoIHead',
num_fcs=2,
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=81,
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2],
reg_class_agnostic=False))
# model training and testing settings
train_cfg = dict(
rcnn=dict(
pos_iou_thr=0.5,
neg_iou_thr=0.5,
crowd_thr=1.1,
roi_batch_size=512,
add_gt_as_proposals=True,
pos_fraction=0.25,
pos_balance_sampling=False,
neg_pos_ub=512,
neg_balance_thr=0,
min_pos_iou=0.5,
pos_weight=-1,
debug=False))
test_cfg = dict(rcnn=dict(score_thr=0.05, max_per_img=100, nms_thr=0.5))
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_train2017.json',
img_prefix=data_root + 'train2017/',
img_scale=(1333, 800),
img_norm_cfg=img_norm_cfg,
size_divisor=32,
proposal_file=data_root + 'proposals/rpn_r50_fpn_1x_train2017.pkl',
flip_ratio=0.5,
with_mask=False,
with_crowd=True,
with_label=True),
val=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
img_scale=(1333, 800),
img_norm_cfg=img_norm_cfg,
proposal_file=data_root + 'proposals/rpn_r50_fpn_1x_val2017.pkl',
size_divisor=32,
flip_ratio=0,
with_mask=False,
with_crowd=True,
with_label=True),
test=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
img_scale=(1333, 800),
img_norm_cfg=img_norm_cfg,
proposal_file=data_root + 'proposals/rpn_r50_fpn_1x_val2017.pkl',
size_divisor=32,
flip_ratio=0,
with_mask=False,
with_label=False,
test_mode=True))
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
step=[8, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
# yapf:enable
# runtime settings
total_epochs = 12
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/fast_rcnn_r50_fpn_1x'
load_from = None
resume_from = None
workflow = [('train', 1)]
@@ -65,7 +65,7 @@ train_cfg = dict(
         pos_balance_sampling=False,
         neg_pos_ub=512,
         neg_balance_thr=0,
-        min_pos_iou=1.1,
+        min_pos_iou=0.5,
         pos_weight=-1,
         debug=False))
 test_cfg = dict(
@@ -139,7 +139,6 @@ log_config = dict(
 # yapf:enable
 # runtime settings
 total_epochs = 12
-device_ids = range(8)
 dist_params = dict(backend='nccl')
 log_level = 'INFO'
 work_dir = './work_dirs/faster_rcnn_r50_fpn_1x'
@@ -77,7 +77,7 @@ train_cfg = dict(
         pos_balance_sampling=False,
         neg_pos_ub=512,
         neg_balance_thr=0,
-        min_pos_iou=1.1,
+        min_pos_iou=0.5,
         pos_weight=-1,
         debug=False))
 test_cfg = dict(
@@ -152,7 +152,6 @@ log_config = dict(
 # yapf:enable
 # runtime settings
 total_epochs = 12
-device_ids = range(8)
 dist_params = dict(backend='nccl')
 log_level = 'INFO'
 work_dir = './work_dirs/mask_rcnn_r50_fpn_1x'
 from .env import init_dist, get_root_logger, set_random_seed
 from .train import train_detector
-from .inference import inference_detector
+from .inference import inference_detector, show_result

 __all__ = [
     'init_dist', 'get_root_logger', 'set_random_seed', 'train_detector',
-    'inference_detector'
+    'inference_detector', 'show_result'
 ]
@@ -23,19 +23,29 @@ def _prepare_data(img, img_transform, cfg, device):
     return dict(img=[img], img_meta=[img_meta])


-def inference_detector(model, imgs, cfg, device='cuda:0'):
-    imgs = imgs if isinstance(imgs, list) else [imgs]
+def _inference_single(model, img, img_transform, cfg, device):
+    img = mmcv.imread(img)
+    data = _prepare_data(img, img_transform, cfg, device)
+    with torch.no_grad():
+        result = model(return_loss=False, rescale=True, **data)
+    return result
+
+
+def _inference_generator(model, imgs, img_transform, cfg, device):
+    for img in imgs:
+        yield _inference_single(model, img, img_transform, cfg, device)
+
+
+def inference_detector(model, imgs, cfg, device='cuda:0'):
     img_transform = ImageTransform(
         size_divisor=cfg.data.test.size_divisor, **cfg.img_norm_cfg)
     model = model.to(device)
     model.eval()
-    for img in imgs:
-        img = mmcv.imread(img)
-        data = _prepare_data(img, img_transform, cfg, device)
-        with torch.no_grad():
-            result = model(return_loss=False, rescale=True, **data)
-        yield result
+    if not isinstance(imgs, list):
+        return _inference_single(model, imgs, img_transform, cfg, device)
+    else:
+        return _inference_generator(model, imgs, img_transform, cfg, device)
@@ -46,6 +56,7 @@ def show_result(img, result, dataset='coco', score_thr=0.3):
     ]
     labels = np.concatenate(labels)
     bboxes = np.vstack(result)
+    img = mmcv.imread(img)
     mmcv.imshow_det_bboxes(
         img.copy(),
         bboxes,
@@ -2,7 +2,7 @@ from mmcv.runner import obj_from_dict
 from torch import nn

 from . import (backbones, necks, roi_extractors, rpn_heads, bbox_heads,
-               mask_heads, detectors)
+               mask_heads)

 __all__ = [
     'build_backbone', 'build_neck', 'build_rpn_head', 'build_roi_extractor',
@@ -48,4 +48,5 @@ def build_mask_head(cfg):

 def build_detector(cfg, train_cfg=None, test_cfg=None):
+    from . import detectors
     return build(cfg, detectors, dict(train_cfg=train_cfg, test_cfg=test_cfg))
@@ -140,7 +140,6 @@ class TwoStageDetector(BaseDetector, RPNTestMixin, BBoxTestMixin,
     def simple_test(self, img, img_meta, proposals=None, rescale=False):
         """Test without augmentation."""
-        assert proposals is None, "Fast RCNN hasn't been implemented."
         assert self.with_bbox, "Bbox head must be implemented."
         x = self.extract_feat(img)
@@ -48,8 +48,8 @@ class RPNHead(nn.Module):
         self.anchor_scales = anchor_scales
         self.anchor_ratios = anchor_ratios
         self.anchor_strides = anchor_strides
-        self.anchor_base_sizes = anchor_strides.copy(
-        ) if anchor_base_sizes is None else anchor_base_sizes
+        self.anchor_base_sizes = list(
+            anchor_strides) if anchor_base_sizes is None else anchor_base_sizes
         self.target_means = target_means
         self.target_stds = target_stds
         self.use_sigmoid_cls = use_sigmoid_cls
@@ -5,8 +5,8 @@ from mmcv import Config
 from mmcv.runner import obj_from_dict

 from mmdet import datasets, __version__
-from mmdet.api import (train_detector, init_dist, get_root_logger,
-                       set_random_seed)
+from mmdet.apis import (train_detector, init_dist, get_root_logger,
+                        set_random_seed)
 from mmdet.models import build_detector