# Benchmark and Model Zoo
## Environment
### Hardware
- 8 NVIDIA Tesla V100 GPUs
- Intel Xeon 4114 CPU @ 2.20GHz
### Software
- Python 3.6 / 3.7
- PyTorch 0.4.1
- CUDA 9.0.176
- CUDNN 7.0.4
- NCCL 2.1.15
## Common settings
- All baselines were trained using 8 GPUs with a batch size of 16 (2 images per GPU).
- All models were trained on `coco_2017_train` and tested on `coco_2017_val`.
- We use distributed training, and the BN layers' statistics are frozen.
- We adopt the same training schedules as Detectron: 1x indicates 12 epochs and 2x indicates 24 epochs, which corresponds to slightly fewer iterations than Detectron; the difference is negligible.
- All pytorch-style pretrained backbones on ImageNet are from PyTorch model zoo.
- We report the training GPU memory as the maximum value of `torch.cuda.max_memory_cached()`
over all 8 GPUs (see the measurement sketch after this list). Note that this value is usually
smaller than what `nvidia-smi` shows, but closer to the actual requirement.
- We report the inference time as the overall time including data loading,
network forwarding and post processing.
- The training memory and time of the 2x schedule are simply copied from 1x;
they should be very close to the actual values.
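For reference, here is a minimal sketch of how such a peak value can be read out with the PyTorch 0.4.x API (each of the 8 workers queries its own device; the tables list the maximum across GPUs):
```python
import torch

def peak_cached_memory_gb(device=None):
    """Peak cached GPU memory in GB, read via torch.cuda.max_memory_cached()."""
    return torch.cuda.max_memory_cached(device) / 1024**3

# Called once after training finishes; in PyTorch 0.4.x the statistic
# accumulates over the lifetime of the process.
```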
## Baselines
We released RPN, Faster R-CNN and Mask R-CNN models in the first version. More models with different backbones will be added to the model zoo.
### RPN
| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | AR1000 | Download |
|:--------:|:-------:|:-------:|:--------:|:-------------------:|:--------------:|:------:|:--------:|
| R-50-FPN | caffe | 1x | 4.5 | 0.379 | 14.4 | 58.2 | - |
| R-50-FPN | pytorch | 1x | 4.8 | 0.407 | 14.5 | 57.1 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/rpn_r50_fpn_1x_20181010-4a9c0712.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/rpn_r50_fpn_1x_20181010_results.pkl.json) |
| R-50-FPN | pytorch | 2x | 4.8 | 0.407 | 14.5 | 57.6 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/rpn_r50_fpn_2x_20181010-88a4a471.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/rpn_r50_fpn_2x_20181010_results.pkl.json) |
### Faster R-CNN
| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | Download |
|:--------:|:-------:|:-------:|:--------:|:-------------------:|:--------------:|:------:|:--------:|
| R-50-FPN | caffe | 1x | 4.9 | 0.525 | 10.0 | 36.7 | - |
| R-50-FPN | pytorch | 1x | 5.1 | 0.554 | 9.9 | 36.4 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/faster_rcnn_r50_fpn_1x_20181010_results.pkl.json) |
| R-50-FPN | pytorch | 2x | 5.1 | 0.554 | 9.9 | 37.7 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/faster_rcnn_r50_fpn_2x_20181010-443129e1.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/faster_rcnn_r50_fpn_2x_20181010_results.pkl.json) |
### Mask R-CNN
| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | mask AP | Download |
|:--------:|:-------:|:-------:|:--------:|:-------------------:|:--------------:|:------:|:-------:|:--------:|
| R-50-FPN | caffe | 1x | 5.9 | 0.658 | 7.7 | 37.5 | 34.4 | - |
| R-50-FPN | pytorch | 1x | 5.8 | 0.690 | 7.7 | 37.3 | 34.2 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/mask_rcnn_r50_fpn_1x_20181010_results.pkl.json) |
| R-50-FPN | pytorch | 2x | 5.8 | 0.690 | 7.7 | 38.6 | 35.1 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/mask_rcnn_r50_fpn_2x_20181010-41d35c05.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/mask_rcnn_r50_fpn_2x_20181010_results.pkl.json) |
### Fast R-CNN (with pre-computed proposals)
| Backbone | Style | Type | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | mask AP | Download |
|:--------:|:-------:|:------:|:-------:|:--------:|:-------------------:|:--------------:|:------:|:-------:|:--------:|
| R-50-FPN | caffe | Faster | 1x | 3.5 | 0.35 | 14.6 | 36.6 | - | - |
| R-50-FPN | pytorch | Faster | 1x | 4.0 | 0.38 | 14.5 | 35.8 | - | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/fast_rcnn_r50_fpn_1x_20181010-08160859.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/fast_rcnn_r50_fpn_1x_20181010_results.pkl.json) |
| R-50-FPN | pytorch | Faster | 2x | 4.0 | 0.38 | 14.5 | 37.1 | - | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/fast_rcnn_r50_fpn_2x_20181010-d263ada5.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/fast_rcnn_r50_fpn_2x_20181010_results.pkl.json) |
| R-50-FPN | caffe | Mask | 1x | 5.4 | 0.47 | 10.7 | 37.3 | 34.5 | - |
| R-50-FPN | pytorch | Mask | 1x | 5.3 | 0.50 | 10.6 | 36.8 | 34.1 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/fast_mask_rcnn_r50_fpn_1x_20181010-e030a38f.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/fast_mask_rcnn_r50_fpn_1x_20181010_results.pkl.json) |
| R-50-FPN | pytorch | Mask | 2x | 5.3 | 0.50 | 10.6 | 37.9 | 34.8 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/fast_mask_rcnn_r50_fpn_2x_20181010-5048cb03.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/fast_mask_rcnn_r50_fpn_2x_20181010_results.pkl.json) |
### RetinaNet (coming soon)
| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | Download |
|:--------:|:-------:|:-------:|:--------:|:-------------------:|:--------------:|:------:|:--------:|
| R-50-FPN | caffe | 1x | | | | | |
| R-50-FPN | pytorch | 1x | | | | | |
| R-50-FPN | pytorch | 2x | | | | | |
## Comparison with Detectron
We compare mmdetection with [Detectron](https://github.com/facebookresearch/Detectron)
and [Detectron.pytorch](https://github.com/roytseng-tw/Detectron.pytorch),
a third-party port of Detectron to Pytorch. The backbone used is R-50-FPN.
In general, mmdetection has 3 advantages over Detectron.
- **Higher performance** (especially in terms of mask AP)
- **Faster training speed**
- **Memory efficient**
### Performance
Detectron and Detectron.pytorch use caffe-style ResNet as the backbone.
In order to utilize the PyTorch model zoo, we use pytorch-style ResNet in our experiments.
Meanwhile, we also train 1x models with caffe-style ResNet for comparison.
We find that pytorch-style ResNet usually converges more slowly than caffe-style ResNet,
leading to slightly lower results with the 1x schedule, but higher final results
with the 2x schedule.
We report results using both caffe-style (weights converted from
[here](https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md#imagenet-pretrained-models))
and pytorch-style (weights from the official model zoo) ResNet backbone,
indicated as *pytorch-style results* / *caffe-style results*.
<table>
<tr>
<th>Type</th>
<th>Lr schd</th>
<th>Detectron</th>
<th>Detectron.pytorch</th>
<th>mmdetection</th>
</tr>
<tr>
<td rowspan="2">RPN</td>
<td>1x</td>
<td>57.2</td>
<td>-</td>
<td>57.1 / 58.2</td>
</tr>
<tr>
<td>2x</td>
<td>-</td>
<td>-</td>
<td>57.6 / -</td>
</tr>
<tr>
<td rowspan="2">Faster R-CNN</td>
<td>1x</td>
<td>36.7</td>
<td>37.1</td>
<td>36.4 / 36.7</td>
</tr>
<tr>
<td>2x</td>
<td>37.9</td>
<td>-</td>
<td>37.7 / -</td>
</tr>
<tr>
<td rowspan="2">Mask R-CNN</td>
<td>1x</td>
<td>37.7 &amp; 33.9</td>
<td>37.7 &amp; 33.7</td>
<td>37.3 &amp; 34.2 / 37.5 &amp; 34.4</td>
</tr>
<tr>
<td>2x</td>
<td>38.6 &amp; 34.5</td>
<td>-</td>
<td>38.6 &amp; 35.1 / -</td>
</tr>
<tr>
<td rowspan="2">Fast R-CNN</td>
<td>1x</td>
<td>36.4</td>
<td>-</td>
<td>35.8 / 36.6</td>
</tr>
<tr>
<td>2x</td>
<td>36.8</td>
<td>-</td>
<td>37.1 / -</td>
</tr>
<tr>
<td rowspan="2">Fast R-CNN (w/mask)</td>
<td>1x</td>
<td>37.3 &amp; 33.7</td>
<td>-</td>
<td>36.8 &amp; 34.1 / 37.3 &amp; 34.5</td>
</tr>
<tr>
<td>2x</td>
<td>37.7 &amp; 34.0</td>
<td>-</td>
<td>37.9 &amp; 34.8 / -</td>
</tr>
</table>
### Training Speed
The training speed is measured in s/iter. Lower is better.
<table>
<tr>
<th>Type</th>
<th>Detectron (P100<sup>1</sup>)</th>
<th>Detectron.pytorch (XP<sup>2</sup>)</th>
<th>mmdetection<sup>3</sup> (V100<sup>4</sup> / XP)</th>
</tr>
<tr>
<td>RPN</td>
<td>0.416</td>
<td>-</td>
<td>0.407 / 0.413</td>
</tr>
<tr>
<td>Faster R-CNN</td>
<td>0.544</td>
<td>1.015</td>
<td>0.554 / 0.579</td>
</tr>
<tr>
<td>Mask R-CNN</td>
<td>0.889</td>
<td>1.435</td>
<td>0.690 / 0.732</td>
</tr>
<tr>
<td>Fast R-CNN</td>
<td>0.285</td>
<td>-</td>
<td>0.375 / 0.398</td>
</tr>
<tr>
<td>Fast R-CNN (w/mask)</td>
<td>0.377</td>
<td>-</td>
<td>0.504 / 0.574</td>
</tr>
</table>
\*1. Detectron reports the speed on Facebook's Big Basin servers (P100).
It runs more slowly on our V100 servers, so we quote the officially reported values.
\*2. Detectron.pytorch does not report its runtime, and we encountered some issues
running it on V100, so we report the speed on TITAN XP.
\*3. Pytorch-style ResNet is approximately 5% slower than caffe-style,
and we report the pytorch-style results here.
\*4. We also ran the models on a DGX-1 server (P100) and the speed is almost the same as on our V100 servers.
### Inference Speed
The inference speed is measured in fps (img/s) on a single GPU. Higher is better.
<table>
<tr>
<th>Type</th>
<th>Detectron (P100)</th>
<th>Detectron.pytorch (XP)</th>
<th>mmdetection (V100 / XP)</th>
</tr>
<tr>
<td>RPN</td>
<td>12.5</td>
<td>-</td>
<td>14.5 / 15.4</td>
</tr>
<tr>
<td>Faster R-CNN</td>
<td>10.3</td>
<td>-</td>
<td>9.9 / 9.8</td>
</tr>
<tr>
<td>Mask R-CNN</td>
<td>8.5</td>
<td>-</td>
<td>7.7 / 7.4</td>
</tr>
<tr>
<td>Fast R-CNN</td>
<td>12.5</td>
<td>-</td>
<td>14.5 / 14.1</td>
</tr>
<tr>
<td>Fast R-CNN (w/mask)</td>
<td>9.9</td>
<td>-</td>
<td>10.6 / 10.3</td>
</tr>
</table>
### Training Memory
Our tests show that mmdetection is clearly more memory efficient than Detectron;
the main cause is the underlying deep learning framework itself, not our own optimizations.
Note also that Caffe2 and PyTorch expose different APIs for querying memory usage,
and their implementations are not exactly the same.
`nvidia-smi` shows a larger memory usage for both Detectron and mmdetection, e.g.,
when training Mask R-CNN with 2 images per GPU it reports 10.6G for Detectron and 9.3G for mmdetection, which is clearly more than is actually required.
> With mmdetection, we can train R-50 FPN Mask R-CNN with **4** images per GPU (TITAN XP, 12G),
which is a promising result.
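The per-GPU batch size is controlled by `imgs_per_gpu` in the data config (see the config files later in this commit); a sketch of the 4-images-per-GPU setting:
```python
# Sketch: the per-GPU batch size lives in the data config;
# the train/val/test dataset settings stay as in the released configs.
data = dict(
    imgs_per_gpu=4,  # released configs use 2; 4 fits in 12G on a TITAN XP
    workers_per_gpu=2)
```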
# mmdetection
## Introduction
mmdetection is an open source object detection toolbox based on PyTorch. It is
a part of the open-mmlab project developed by [Multimedia Laboratory, CUHK](http://mmlab.ie.cuhk.edu.hk/).
### Major features
- **Modular Design**
One can easily construct a customized object detection framework by combining different components.
- **Support of multiple frameworks out of box**
The toolbox directly supports popular detection frameworks, *e.g.*, Faster RCNN, Mask RCNN, RetinaNet, etc.
- **Efficient**
All basic bbox and mask operations now run on GPUs.
The training speed is about 5%–20% faster than Detectron, depending on the model.
- **State of the art**
This was the codebase of the *MMDet* team, who won the [COCO Detection 2018 challenge](http://cocodataset.org/#detection-leaderboard).
Apart from mmdetection, we also released [mmcv](https://github.com/open-mmlab/mmcv), a library for computer vision research, on which this toolbox depends heavily.
## License
This project is released under the [GPLv3 license](LICENSE).
## Benchmark and model zoo
We provide our baseline results and a comparison with Detectron, the most
popular detection project. Results and models are available in the [Model zoo](MODEL_ZOO.md).
## Installation
### Requirements
- Linux (tested on Ubuntu 16.04 and CentOS 7.2)
- Python 3.4+
- PyTorch 0.4.1 and torchvision
- Cython
- [mmcv](https://github.com/open-mmlab/mmcv)
### Install mmdetection
a. Install PyTorch 0.4.1 and torchvision following the [official instructions](https://pytorch.org/).
b. Clone the mmdetection repository.
```shell
git clone https://github.com/open-mmlab/mmdetection.git
```
c. Compile CUDA extensions.
```shell
cd mmdetection
pip install cython # or "conda install cython" if you prefer conda
./compile.sh # or "PYTHON=python3 ./compile.sh" if you use system python3 without virtual environments
```
d. Install mmdetection (other dependencies will be installed automatically).
```shell
python(3) setup.py install # add --user if you want to install it locally
# or "pip install ."
```
Note: You need to re-run the last step each time you pull updates from GitHub,
since the git commit id is written into the version number and saved in trained models.
### Prepare COCO dataset
It is recommended to symlink the dataset root to `$MMDETECTION/data`.
```
mmdetection
├── mmdet
├── tools
├── configs
├── data
│ ├── coco
│ │ ├── annotations
│ │ ├── train2017
│ │ ├── val2017
│ │ ├── test2017
```
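For example, assuming COCO has been downloaded to `/data/coco` (a placeholder path), the symlink can be created with:
```shell
cd mmdetection
mkdir -p data
ln -s /data/coco data/coco
```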
> [Here](https://gist.github.com/hellock/bf23cd7348c727d69d48682cb6909047) is
a script for setting up mmdetection with conda for reference.
## Inference with pretrained models
### Test a dataset
- [x] single GPU testing
- [x] multiple GPU testing
- [x] visualize detection results
We allow running one or multiple processes on each GPU, e.g., 8 processes on 8 GPUs
or 16 processes on 8 GPUs. When the GPU workload is not very heavy for a single
process, running multiple processes will accelerate testing; the number of processes
is specified with the argument `--proc_per_gpu <PROCESS_NUM>`.
To test a dataset and save the results:
```shell
python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --gpus <GPU_NUM> --out <OUT_FILE>
```
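To run multiple processes per GPU as described above, e.g., 2 processes per GPU:
```shell
python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --gpus <GPU_NUM> --proc_per_gpu 2 --out <OUT_FILE>
```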
To perform evaluation after testing, add `--eval <EVAL_TYPES>`. Supported types are:
- `proposal_fast`: evaluate recalls of proposals with our own code (expected to give the same results as the official evaluation).
- `proposal`: evaluate recalls of proposals with the official COCO code.
- `bbox`: evaluate box AP with the official COCO code.
- `segm`: evaluate mask AP with the official COCO code.
- `keypoints`: evaluate keypoint AP with the official COCO code.
For example, to evaluate Mask R-CNN with 8 GPUs and save the result as `results.pkl`:
```shell
python tools/test.py configs/mask_rcnn_r50_fpn_1x.py <CHECKPOINT_FILE> --gpus 8 --out results.pkl --eval bbox segm
```
It is also convenient to visualize the results during testing by adding the argument `--show`.
```shell
python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --show
```
### Test image(s)
We provide some high-level APIs (experimental) for testing an image.
```python
import mmcv
from mmcv.runner import load_checkpoint
from mmdet.models import build_detector
from mmdet.apis import inference_detector, show_result
cfg = mmcv.Config.fromfile('configs/faster_rcnn_r50_fpn_1x.py')
cfg.model.pretrained = None
# construct the model and load checkpoint
model = build_detector(cfg.model, test_cfg=cfg.test_cfg)
_ = load_checkpoint(model, 'https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth')
# test a single image
img = mmcv.imread('test.jpg')
result = inference_detector(model, img, cfg)
show_result(img, result)
# test a list of images
imgs = ['test1.jpg', 'test2.jpg']
for i, result in enumerate(inference_detector(model, imgs, cfg, device='cuda:0')):
    print(i, imgs[i])
    show_result(imgs[i], result)
```
## Train a model
mmdetection implements distributed and non-distributed training,
using `MMDistributedDataParallel` and `MMDataParallel` respectively.
We suggest using distributed training even on a single machine, since it is faster;
non-distributed training is left for debugging and other purposes.
### Distributed training
mmdetection potentially supports multiple launch methods, e.g., PyTorch’s built-in launch utility, slurm and MPI.
We provide a training script using the launch utility provided by PyTorch.
```shell
./tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> [optional arguments]
```
Supported arguments are:
- `--validate`: perform evaluation every k (default: 1) epochs during training.
- `--work_dir <WORK_DIR>`: if specified, the path in the config file will be overridden.
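For example, to train Mask R-CNN with the 1x schedule on 8 GPUs, evaluating every epoch:
```shell
./tools/dist_train.sh configs/mask_rcnn_r50_fpn_1x.py 8 --validate --work_dir ./work_dirs/mask_rcnn_r50_fpn_1x
```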
### Non-distributed training
```shell
python tools/train.py <CONFIG_FILE> --gpus <GPU_NUM> --work_dir <WORK_DIR> --validate
```
Expected results in WORK_DIR:
- log file
- saved checkpoints (every k epochs, default: 1)
- a symlink to the latest checkpoint
## Technical details
Some implementation details and project structures are described in the [technical details](TECHNICAL_DETAILS.md).
## Overview
In this section, we will introduce the main units of training a detector:
data loading, model and iteration pipeline.
## Data loading
Following typical conventions, we use `Dataset` and `DataLoader` for data loading
with multiple workers. `Dataset` returns a dict of data items corresponding
to the arguments of the model's forward method.
Since the data in object detection may not be the same size (image size, gt bbox size, etc.),
we introduce a new `DataContainer` type in `mmcv` to help collect and distribute
data of different sizes.
See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details.
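As a rough illustration (a sketch only; the import path is taken from the linked file, while the `stack` and `cpu_only` flags are assumptions based on it), a dataset sample might wrap fields of different sizes like this:
```python
import torch
from mmcv.parallel import DataContainer as DC

# Dummy fields of the kind an object detection sample carries.
img = torch.randn(3, 800, 1216)  # image padded to a multiple of size_divisor
gt_bboxes = torch.rand(5, 4)     # 5 boxes here; another image may have 12
img_meta = dict(ori_shape=(800, 1200, 3), flip=False)

# The wrapper tells the collate logic how to batch each field
# (flag names assumed from data_container.py):
sample = dict(
    img=DC(img, stack=True),               # equal padded sizes -> stack into a batch
    gt_bboxes=DC(gt_bboxes),               # variable length -> keep as per-image list
    img_meta=DC(img_meta, cpu_only=True))  # plain metadata stays on the CPU
```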
## Model
In mmdetection, model components are basically categorized into 4 types.
- backbone: usually an FCN network to extract feature maps, e.g., ResNet.
- neck: the component between backbones and heads, e.g., FPN, ASPP.
- head: the component for specific tasks, e.g., bbox prediction and mask prediction.
- roi extractor: the component for extracting features from feature maps, e.g., RoI Align.
We also implement some general detection pipelines with the above components,
such as `SingleStageDetector` and `TwoStageDetector`.
### Build a model with basic components
Following some basic pipelines (e.g., two-stage detectors), the model structure
can be customized through config files with no pain.
If we want to implement some new components, e.g., the path aggregation
FPN structure in [Path Aggregation Network for Instance Segmentation](https://arxiv.org/abs/1803.01534), there are two things to do.
1. Create a new file `mmdet/models/necks/pafpn.py`.
```python
import torch.nn as nn


class PAFPN(nn.Module):

    def __init__(self,
                 in_channels,
                 out_channels,
                 num_outs,
                 start_level=0,
                 end_level=-1,
                 add_extra_convs=False):
        pass

    def forward(self, inputs):
        # implementation is ignored
        pass
```
2. modify the config file from
```python
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5)
```
to
```python
neck=dict(
type='PAFPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5)
```
We will release more components (backbones, necks, heads) for research purposes.
### Write a new model
To write a new detection pipeline, you need to inherit from `BaseDetector`,
which defines the following abstract methods.
- `extract_feat()`: given an image batch of shape (n, c, h, w), extract the feature map(s).
- `forward_train()`: forward method of the training mode.
- `simple_test()`: single scale testing without augmentation.
- `aug_test()`: testing with augmentations (multi-scale, flip, etc.).
[TwoStageDetector](https://github.com/hellock/mmdetection/blob/master/mmdet/models/detectors/two_stage.py)
is a good example which shows how to do that.
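A minimal skeleton (signatures are inferred from the method descriptions above, not copied from the actual base class) could look like:
```python
from mmdet.models.detectors.base import BaseDetector  # assumed module path


class MyDetector(BaseDetector):

    def extract_feat(self, img):
        # (n, c, h, w) image batch -> feature map(s), e.g. via backbone + neck
        raise NotImplementedError

    def forward_train(self, img, img_meta, **kwargs):
        # return a dict of losses for the optimizer
        raise NotImplementedError

    def simple_test(self, img, img_meta, **kwargs):
        # single-scale testing without augmentation
        raise NotImplementedError

    def aug_test(self, imgs, img_metas, **kwargs):
        # testing with augmentations (multi-scale, flip, etc.)
        raise NotImplementedError
```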
## Iteration pipeline
We adopt distributed training for both single machines and multiple machines.
Supposing the server has 8 GPUs, 8 processes will be started, each running on a single GPU.
Each process keeps an isolated model, data loader, and optimizer.
Model parameters are only synchronized once at the beginning.
After a forward and backward pass, gradients are allreduced among all GPUs,
and the optimizer updates the model parameters.
Since the gradients are allreduced, model parameters stay the same across all processes after each iteration.
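Conceptually, the synchronization step amounts to the following plain `torch.distributed` sketch (the actual `MMDistributedDataParallel` buckets gradients and overlaps communication, so this illustrates the idea, not the implementation):
```python
import torch.distributed as dist


def allreduce_grads(model, world_size):
    # Sum each gradient over all processes, then average, so every
    # process applies the identical update and parameters stay in sync.
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad.data)  # default reduce op is SUM
            param.grad.data.div_(world_size)
```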
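# --- configs/fast_mask_rcnn_r50_fpn_1x.py (path inferred from work_dir below) ---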
# model settings
model = dict(
type='FastRCNN',
pretrained='modelzoo://resnet50',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
style='pytorch'),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5),
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=dict(
type='SharedFCRoIHead',
num_fcs=2,
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=81,
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2],
reg_class_agnostic=False),
mask_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', out_size=14, sample_num=2),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
mask_head=dict(
type='FCNMaskHead',
num_convs=4,
in_channels=256,
conv_out_channels=256,
num_classes=81))
# model training and testing settings
train_cfg = dict(
rcnn=dict(
mask_size=28,
pos_iou_thr=0.5,
neg_iou_thr=0.5,
crowd_thr=1.1,
roi_batch_size=512,
add_gt_as_proposals=True,
pos_fraction=0.25,
pos_balance_sampling=False,
neg_pos_ub=512,
neg_balance_thr=0,
min_pos_iou=0.5,
pos_weight=-1,
debug=False))
test_cfg = dict(
rcnn=dict(
score_thr=0.05, max_per_img=100, nms_thr=0.5, mask_thr_binary=0.5))
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_train2017.json',
img_prefix=data_root + 'train2017/',
img_scale=(1333, 800),
img_norm_cfg=img_norm_cfg,
size_divisor=32,
proposal_file=data_root + 'proposals/rpn_r50_fpn_1x_train2017.pkl',
flip_ratio=0.5,
with_mask=True,
with_crowd=True,
with_label=True),
val=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
img_scale=(1333, 800),
img_norm_cfg=img_norm_cfg,
proposal_file=data_root + 'proposals/rpn_r50_fpn_1x_val2017.pkl',
size_divisor=32,
flip_ratio=0,
with_mask=True,
with_crowd=True,
with_label=True),
test=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
img_scale=(1333, 800),
img_norm_cfg=img_norm_cfg,
proposal_file=data_root + 'proposals/rpn_r50_fpn_1x_val2017.pkl',
size_divisor=32,
flip_ratio=0,
with_mask=False,
with_label=False,
test_mode=True))
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
step=[8, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
# yapf:enable
# runtime settings
total_epochs = 12
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/fast_mask_rcnn_r50_fpn_1x'
load_from = None
resume_from = None
workflow = [('train', 1)]
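# --- configs/fast_rcnn_r50_fpn_1x.py (path inferred from work_dir below) ---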
# model settings
model = dict(
type='FastRCNN',
pretrained='modelzoo://resnet50',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
style='pytorch'),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5),
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=dict(
type='SharedFCRoIHead',
num_fcs=2,
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=81,
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2],
reg_class_agnostic=False))
# model training and testing settings
train_cfg = dict(
rcnn=dict(
pos_iou_thr=0.5,
neg_iou_thr=0.5,
crowd_thr=1.1,
roi_batch_size=512,
add_gt_as_proposals=True,
pos_fraction=0.25,
pos_balance_sampling=False,
neg_pos_ub=512,
neg_balance_thr=0,
min_pos_iou=0.5,
pos_weight=-1,
debug=False))
test_cfg = dict(rcnn=dict(score_thr=0.05, max_per_img=100, nms_thr=0.5))
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_train2017.json',
img_prefix=data_root + 'train2017/',
img_scale=(1333, 800),
img_norm_cfg=img_norm_cfg,
size_divisor=32,
proposal_file=data_root + 'proposals/rpn_r50_fpn_1x_train2017.pkl',
flip_ratio=0.5,
with_mask=False,
with_crowd=True,
with_label=True),
val=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
img_scale=(1333, 800),
img_norm_cfg=img_norm_cfg,
proposal_file=data_root + 'proposals/rpn_r50_fpn_1x_val2017.pkl',
size_divisor=32,
flip_ratio=0,
with_mask=False,
with_crowd=True,
with_label=True),
test=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
img_scale=(1333, 800),
img_norm_cfg=img_norm_cfg,
proposal_file=data_root + 'proposals/rpn_r50_fpn_1x_val2017.pkl',
size_divisor=32,
flip_ratio=0,
with_mask=False,
with_label=False,
test_mode=True))
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
step=[8, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
# yapf:enable
# runtime settings
total_epochs = 12
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/fast_rcnn_r50_fpn_1x'
load_from = None
resume_from = None
workflow = [('train', 1)]
@@ -65,7 +65,7 @@ train_cfg = dict(
         pos_balance_sampling=False,
         neg_pos_ub=512,
         neg_balance_thr=0,
-        min_pos_iou=1.1,
+        min_pos_iou=0.5,
         pos_weight=-1,
         debug=False))
 test_cfg = dict(
@@ -139,7 +139,6 @@ log_config = dict(
 # yapf:enable
 # runtime settings
 total_epochs = 12
-device_ids = range(8)
 dist_params = dict(backend='nccl')
 log_level = 'INFO'
 work_dir = './work_dirs/faster_rcnn_r50_fpn_1x'
@@ -77,7 +77,7 @@ train_cfg = dict(
         pos_balance_sampling=False,
         neg_pos_ub=512,
         neg_balance_thr=0,
-        min_pos_iou=1.1,
+        min_pos_iou=0.5,
         pos_weight=-1,
         debug=False))
 test_cfg = dict(
@@ -152,7 +152,6 @@ log_config = dict(
 # yapf:enable
 # runtime settings
 total_epochs = 12
-device_ids = range(8)
 dist_params = dict(backend='nccl')
 log_level = 'INFO'
 work_dir = './work_dirs/mask_rcnn_r50_fpn_1x'
 from .env import init_dist, get_root_logger, set_random_seed
 from .train import train_detector
-from .inference import inference_detector
+from .inference import inference_detector, show_result

 __all__ = [
     'init_dist', 'get_root_logger', 'set_random_seed', 'train_detector',
-    'inference_detector'
+    'inference_detector', 'show_result'
 ]
@@ -23,19 +23,29 @@ def _prepare_data(img, img_transform, cfg, device):
     return dict(img=[img], img_meta=[img_meta])


-def inference_detector(model, imgs, cfg, device='cuda:0'):
-    imgs = imgs if isinstance(imgs, list) else [imgs]
+def _inference_single(model, img, img_transform, cfg, device):
+    img = mmcv.imread(img)
+    data = _prepare_data(img, img_transform, cfg, device)
+    with torch.no_grad():
+        result = model(return_loss=False, rescale=True, **data)
+    return result
+
+
+def _inference_generator(model, imgs, img_transform, cfg, device):
+    for img in imgs:
+        yield _inference_single(model, img, img_transform, cfg, device)
+
+
+def inference_detector(model, imgs, cfg, device='cuda:0'):
     img_transform = ImageTransform(
         size_divisor=cfg.data.test.size_divisor, **cfg.img_norm_cfg)
     model = model.to(device)
     model.eval()
-    for img in imgs:
-        img = mmcv.imread(img)
-        data = _prepare_data(img, img_transform, cfg, device)
-        with torch.no_grad():
-            result = model(return_loss=False, rescale=True, **data)
-        yield result
+    if not isinstance(imgs, list):
+        return _inference_single(model, imgs, img_transform, cfg, device)
+    else:
+        return _inference_generator(model, imgs, img_transform, cfg, device)
@@ -46,6 +56,7 @@ def show_result(img, result, dataset='coco', score_thr=0.3):
     ]
     labels = np.concatenate(labels)
     bboxes = np.vstack(result)
+    img = mmcv.imread(img)
     mmcv.imshow_det_bboxes(
         img.copy(),
         bboxes,
@@ -2,7 +2,7 @@ from mmcv.runner import obj_from_dict
 from torch import nn

 from . import (backbones, necks, roi_extractors, rpn_heads, bbox_heads,
-               mask_heads, detectors)
+               mask_heads)

 __all__ = [
     'build_backbone', 'build_neck', 'build_rpn_head', 'build_roi_extractor',
@@ -48,4 +48,5 @@ def build_mask_head(cfg):

 def build_detector(cfg, train_cfg=None, test_cfg=None):
+    from . import detectors
     return build(cfg, detectors, dict(train_cfg=train_cfg, test_cfg=test_cfg))
@@ -140,7 +140,6 @@ class TwoStageDetector(BaseDetector, RPNTestMixin, BBoxTestMixin,
     def simple_test(self, img, img_meta, proposals=None, rescale=False):
         """Test without augmentation."""
-        assert proposals is None, "Fast RCNN hasn't been implemented."
         assert self.with_bbox, "Bbox head must be implemented."
         x = self.extract_feat(img)
@@ -48,8 +48,8 @@ class RPNHead(nn.Module):
         self.anchor_scales = anchor_scales
         self.anchor_ratios = anchor_ratios
         self.anchor_strides = anchor_strides
-        self.anchor_base_sizes = anchor_strides.copy(
-        ) if anchor_base_sizes is None else anchor_base_sizes
+        self.anchor_base_sizes = list(
+            anchor_strides) if anchor_base_sizes is None else anchor_base_sizes
         self.target_means = target_means
         self.target_stds = target_stds
         self.use_sigmoid_cls = use_sigmoid_cls
@@ -5,8 +5,8 @@ from mmcv import Config
 from mmcv.runner import obj_from_dict

 from mmdet import datasets, __version__
-from mmdet.api import (train_detector, init_dist, get_root_logger,
-                       set_random_seed)
+from mmdet.apis import (train_detector, init_dist, get_root_logger,
+                        set_random_seed)
 from mmdet.models import build_detector