## Changelog
### v2.0.0 (6/5/2020)
In this release, we made many major refactorings and modifications.
1. **Faster speed**. We optimize the training and inference speed for common models, achieving up to 30% speedup for training and 25% for inference. Please refer to [model zoo](model_zoo.md#comparison-with-detectron2) for details.
2. **Higher performance**. We change some default hyperparameters with no additional cost, which leads to a gain of performance for most models. Please refer to [compatibility](compatibility.md#training-hyperparameters) for details.
3. **More documentation and tutorials**. We add a bunch of documentation and tutorials to help users get started more smoothly. Read it [here](https://mmdetection.readthedocs.io/en/latest/).
4. **Support PyTorch 1.5**. The support for 1.1 and 1.2 is dropped, and we switch to some new APIs.
5. **Better configuration system**. Inheritance is supported to reduce the redundancy of configs.
6. **Better modular design**. Towards the goal of simplicity and flexibility, we simplify some encapsulation while adding more configurable modules like BBoxCoder, IoUCalculator, OptimizerConstructor, and RoIHead. Target computation is now included in the heads, and the call hierarchy is simpler.
7. Support new methods: [FSAF](https://arxiv.org/abs/1903.00621) and PAFPN (part of [PANet](https://arxiv.org/abs/1803.01534)).
**Breaking Changes**
Models trained with MMDetection 1.x are not fully compatible with 2.0. Please refer to the [compatibility doc](compatibility.md) for details and how to migrate to the new version.
**Improvements**
- Unify cuda and cpp API for custom ops. (#2277)
- New config files with inheritance. (#2216)
- Encapsulate the second stage into RoI heads. (#1999)
- Refactor GCNet/EmpiricalAttention into plugins. (#2345)
- Set low quality match as an option in IoU-based bbox assigners. (#2375)
- Change the codebase's coordinate system. (#2380)
- Refactor the category order in heads. Label 0 now means the first positive class instead of background. (#2374)
- Add bbox sampler and assigner registry. (#2419)
- Speed up the inference of RPN. (#2420)
- Add `train_cfg` and `test_cfg` as class members in all anchor heads. (#2422)
- Merge target computation methods into heads. (#2429)
- Add bbox coder to support different bbox encoding and losses. (#2480)
- Unify the API for regression loss. (#2156)
- Refactor Anchor Generator. (#2474)
- Make `lr` an optional argument for optimizers. (#2509)
- Migrate to modules and methods in MMCV. (#2502, #2511, #2569, #2572)
- Support PyTorch 1.5. (#2524)
- Drop the support for Python 3.5 and use f-strings in the codebase. (#2531)
**Bug Fixes**
- Fix the scale factors for resized images without keeping the aspect ratio. (#2039)
- Check if max_num > 0 before slicing in NMS. (#2486)
- Fix Deformable RoIPool when there is no instance. (#2490)
- Fix the default value of assigned labels. (#2536)
- Fix the evaluation of Cityscapes. (#2578)
**New Features**
- Add deep_stem and avg_down option to ResNet, i.e., support ResNetV1d. (#2252)
- Add L1 loss. (#2376)
- Support both polygon and bitmap for instance masks. (#2353, #2540)
- Support CPU mode for inference. (#2385)
- Add optimizer constructor for complicated configuration of optimizers. (#2397, #2488)
- Implement PAFPN. (#2392)
- Support empty tensor input for some modules. (#2280)
- Support custom dataset classes without overriding them. (#2408, #2443)
- Support training on subsets of the COCO dataset. (#2340)
- Add iou_calculator to potentially support more IoU calculation methods. (#2405)
- Support class wise mean AP (was removed in the last version). (#2459)
- Add option to save the testing result images. (#2414)
- Support MomentumUpdaterHook. (#2571)
- Add a demo to inference a single image. (#2605)
### v1.1.0 (24/2/2020)

**Highlights**
# Getting Started

This page provides basic tutorials about the usage of MMDetection.
For installation instructions, please see [install.md](install.md).
## Prepare datasets
It is recommended to symlink the dataset root to `$MMDETECTION/data`.
If your folder structure is different, you may need to change the corresponding paths in config files.
```
mmdetection
├── mmdet
├── tools
├── configs
├── data
│ ├── coco
│ │ ├── annotations
│ │ ├── train2017
│ │ ├── val2017
│ │ ├── test2017
│ ├── cityscapes
│ │ ├── annotations
│ │ ├── leftImg8bit
│ │ │ ├── train
│ │ │ ├── val
│ │ ├── gtFine
│ │ │ ├── train
│ │ │ ├── val
│ ├── VOCdevkit
│ │ ├── VOC2007
│ │ ├── VOC2012
```
The cityscapes annotations have to be converted into the coco format using `tools/convert_datasets/cityscapes.py`:
```shell
pip install cityscapesscripts
python tools/convert_datasets/cityscapes.py ./data/cityscapes --nproc 8 --out_dir ./data/cityscapes/annotations
```
Currently the config files in `cityscapes` use COCO pre-trained weights to initialize.
You may download the pre-trained models in advance if your network is unavailable or slow; otherwise errors would occur at the beginning of training.
For using custom datasets, please refer to [Tutorials 2: Adding New Dataset](tutorials/new_dataset.md).
## Inference with pretrained models

We provide testing scripts to evaluate a whole dataset, and also some high-level APIs for easier integration into other projects.
### Test a dataset

- single GPU
- single node multiple GPU
- multiple node

You can use the following commands to test a dataset.
Optional arguments:
- `RESULT_FILE`: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
- `EVAL_METRICS`: Items to be evaluated on the results. Allowed values depend on the dataset, e.g., `proposal_fast`, `proposal`, `bbox`, `segm` are available for COCO, and `mAP`, `recall` for PASCAL VOC. Cityscapes can be evaluated by `cityscapes` as well as all COCO metrics.
- `--show`: If specified, detection results will be plotted on the images and shown in a new window. It is only applicable to single GPU testing and used for debugging and visualization. Please make sure that a GUI is available in your environment, otherwise you may encounter an error like `cannot connect to X server`.
- `--show-dir`: If specified, detection results will be plotted on the images and saved to the specified directory. It is only applicable to single GPU testing and used for debugging and visualization. You do NOT need a GUI available in your environment for using this option.
- `--show-score-thr`: If specified, detections with score below this threshold will be removed.
If you would like to evaluate the dataset, do not specify `--show` at the same time.
Examples:

Assume that you have already downloaded the checkpoints to the directory `checkpoints/`.
1. Test Faster R-CNN and visualize the results. Press any key for the next image.

```shell
python tools/test.py configs/faster_rcnn_r50_fpn_1x_coco.py \
    checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth \
    --show
```
2. Test Faster R-CNN and save the painted images for later visualization.
```shell
python tools/test.py configs/faster_rcnn_r50_fpn_1x_coco.py \
    checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth \
    --show-dir faster_rcnn_r50_fpn_1x_results
```
3. Test Faster R-CNN on PASCAL VOC (without saving the test results) and evaluate the mAP.
```shell
python tools/test.py configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc.py \
    checkpoints/SOME_CHECKPOINT.pth \
    --eval mAP
```
4. Test Mask R-CNN with 8 GPUs, and evaluate the bbox and mask AP.

```shell
./tools/dist_test.sh configs/mask_rcnn_r50_fpn_1x_coco.py \
    checkpoints/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth \
    8 --out results.pkl --eval bbox segm
```
5. Test Mask R-CNN with 8 GPUs, and evaluate the **classwise** bbox and mask AP.

```shell
./tools/dist_test.sh configs/mask_rcnn_r50_fpn_1x_coco.py \
    checkpoints/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth \
    8 --out results.pkl --eval bbox segm --options "classwise=True"
```
6. Test Mask R-CNN on COCO test-dev with 8 GPUs, and generate the json files to be submitted to the official evaluation server.

```shell
./tools/dist_test.sh configs/mask_rcnn_r50_fpn_1x_coco.py \
    checkpoints/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth \
    8 --format-only --options "jsonfile_prefix=./mask_rcnn_test-dev_results"
```

You will get two json files `mask_rcnn_test-dev_results.bbox.json` and `mask_rcnn_test-dev_results.segm.json`.
7. Test Mask R-CNN on Cityscapes test with 8 GPUs, and generate the txt and png files to be submitted to the official evaluation server.

```shell
./tools/dist_test.sh configs/cityscapes/mask_rcnn_r50_fpn_1x_cityscapes.py \
    checkpoints/mask_rcnn_r50_fpn_1x_cityscapes_20200227-afe51d5a.pth \
    8 --format-only --options "txtfile_prefix=./mask_rcnn_cityscapes_test_results"
```

The generated png and txt files would be under the `./mask_rcnn_cityscapes_test_results` directory.
### Image demo
We provide a demo script to test a single image.
```shell
python demo/image_demo.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--score-thr ${SCORE_THR}]
```
Examples:
```shell
python demo/image_demo.py demo/demo.jpg configs/faster_rcnn_r50_fpn_1x_coco.py \
    checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth --device cpu
```
### Webcam demo

We provide a webcam demo to illustrate the results.

```shell
python demo/webcam_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${GPU_ID}] [--camera-id ${CAMERA_ID}] [--score-thr ${SCORE_THR}]
```

Examples:

```shell
python demo/webcam_demo.py configs/faster_rcnn_r50_fpn_1x_coco.py \
    checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth
```
### High-level APIs for testing images

Here is an example of building the model and testing given images.

```python
from mmdet.apis import init_detector, inference_detector
import mmcv

config_file = 'configs/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth'

# build the model from a config file and a checkpoint file
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# test a single image and show the results
img = 'test.jpg'  # or img = mmcv.imread(img), which will only load it once
result = inference_detector(model, img)
# visualize the results in a new window
model.show_result(img, result)
# or save the visualization results to image files
model.show_result(img, result, out_file='result.jpg')

# test a video and show the results
video = mmcv.VideoReader('video.mp4')
for frame in video:
    result = inference_detector(model, frame)
    model.show_result(frame, result, wait_time=1)
```

A notebook demo can be found in [demo/inference_demo.ipynb](https://github.com/open-mmlab/mmdetection/blob/master/demo/inference_demo.ipynb).
See `tests/async_benchmark.py` to compare the speed of synchronous and asynchronous interfaces.

```python
import asyncio
import torch
from mmdet.apis import init_detector, async_inference_detector
from mmdet.utils.contextmanagers import concurrent

async def main():
    config_file = 'configs/faster_rcnn_r50_fpn_1x_coco.py'
    checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth'
    device = 'cuda:0'
    model = init_detector(config_file, checkpoint=checkpoint_file, device=device)

    img = 'test.jpg'

    result = await async_inference_detector(model, img)

    # visualize the results in a new window
    model.show_result(img, result)
    # or save the visualization results to image files
    model.show_result(img, result, out_file='result.jpg')

asyncio.run(main())
```
## Train a model

According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you need to set the learning rate proportional to the batch size if you use different GPUs or images per GPU, e.g., `lr=0.01` for 4 GPUs * 2 img/gpu and `lr=0.08` for 16 GPUs * 4 img/gpu.
### Train with a single GPU

```shell
python tools/train.py ${CONFIG_FILE} [optional arguments]
```

If you want to specify the working directory in the command, you can add an argument `--work-dir ${YOUR_WORK_DIR}`.
### Train with multiple GPUs

Use `./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]`.
Optional arguments are:

- `--no-validate` (**not suggested**): By default, the codebase will perform evaluation at every k (default value is 1, which can be modified like [this](https://github.com/open-mmlab/mmdetection/blob/master/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py#L174)) epochs during the training. To disable this behavior, use `--no-validate`.
- `--work-dir ${WORK_DIR}`: Override the working directory specified in the config file.
- `--resume-from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.

Difference between `resume-from` and `load-from`:
`resume-from` loads both the model weights and optimizer status, and the epoch is also inherited from the specified checkpoint. It is usually used for resuming a training process that was interrupted accidentally.
`load-from` only loads the model weights and the training epoch starts from 0. It is usually used for finetuning.
### Train with multiple machines

If you run MMDetection on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `slurm_train.sh`. (This script also supports single machine training.)

```shell
[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}
```

Here is an example of using 16 GPUs to train Mask R-CNN on the dev partition.

```shell
GPUS=16 ./tools/slurm_train.sh dev mask_r50_1x configs/mask_rcnn_r50_fpn_1x_coco.py /nfs/xxxx/mask_rcnn_r50_fpn_1x
```

You can check [slurm_train.sh](https://github.com/open-mmlab/mmdetection/blob/master/tools/slurm_train.sh) for full arguments and environment variables.
If you launch multiple jobs on a single machine, you need to set different communication ports for each job, e.g., `dist_params = dict(backend='nccl', port=29501)` in one of the configs.

Then you can launch two jobs with `config1.py` and `config2.py`.

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR}
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR}
```
## Useful tools
### Test the robustness of detectors

Please refer to [robustness_benchmarking.md](robustness_benchmarking.md).
### Convert to ONNX (experimental)

We provide a script to convert models to the [ONNX](https://github.com/onnx/onnx) format. The converted model can be visualized by tools like [Netron](https://github.com/lutzroeder/netron).

```shell
python tools/pytorch2onnx.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --out ${ONNX_FILE} [--shape ${INPUT_SHAPE}]
```

**Note**: This tool is still experimental. Customized operators are not supported for now. We set `use_torchvision=True` on-the-fly for `RoIPool` and `RoIAlign`.

## How-to

### Use my own datasets

The simplest way is to convert your dataset to an existing dataset format (COCO or PASCAL VOC).

Here we show an example of adding a custom dataset of 5 classes, assuming it is also in COCO format.
In `mmdet/datasets/my_dataset.py`:
```python
from .coco import CocoDataset
from .registry import DATASETS


@DATASETS.register_module()
class MyDataset(CocoDataset):

    CLASSES = ('a', 'b', 'c', 'd', 'e')
```
In `mmdet/datasets/__init__.py`:
```python
from .my_dataset import MyDataset
```
Then you can use `MyDataset` in config files, with the same API as `CocoDataset`.

It is also fine if you do not want to convert the annotation format to the COCO or PASCAL VOC format.
Actually, we define a simple annotation format and all existing datasets are
processed to be compatible with it, either online or offline.

The annotation of a dataset is a list of dicts, and each dict corresponds to an image.
There are 3 fields `filename` (relative path), `width`, and `height` for testing,
and an additional field `ann` for training. `ann` is also a dict containing at least 2 fields:
`bboxes` and `labels`, both of which are numpy arrays. Some datasets may provide
annotations like crowd/difficult/ignored bboxes; we use `bboxes_ignore` and `labels_ignore`
to cover them.
Here is an example.
```
[
    {
        'filename': 'a.jpg',
        'width': 1280,
        'height': 720,
        'ann': {
            'bboxes': <np.ndarray, float32> (n, 4),
            'labels': <np.ndarray, int64> (n, ),
            'bboxes_ignore': <np.ndarray, float32> (k, 4),
            'labels_ignore': <np.ndarray, int64> (k, ) (optional field)
        }
    },
    ...
]
```
There are two ways to work with custom datasets.

- online conversion

  You can write a new Dataset class inherited from `CustomDataset`, and overwrite two methods
  `load_annotations(self, ann_file)` and `get_ann_info(self, idx)`,
  like [CocoDataset](https://github.com/open-mmlab/mmdetection/blob/master/mmdet/datasets/coco.py) and [VOCDataset](https://github.com/open-mmlab/mmdetection/blob/master/mmdet/datasets/voc.py). A minimal sketch is shown after this list.

- offline conversion

  You can convert the annotation format to the expected format above and save it to
  a pickle or json file, like [pascal_voc.py](https://github.com/open-mmlab/mmdetection/blob/master/tools/convert_datasets/pascal_voc.py).
  Then you can simply use `CustomDataset`.
### Customize optimizer

An example of a customized optimizer `CopyOfSGD` is defined in `mmdet/core/optimizer/copy_of_sgd.py`.
More generally, a customized optimizer could be defined as follows.

In `mmdet/core/optimizer/my_optimizer.py`:

```python
from .registry import OPTIMIZERS
from torch.optim import Optimizer


@OPTIMIZERS.register_module()
class MyOptimizer(Optimizer):
    pass  # implement __init__ and step() here
```
In `mmdet/core/optimizer/__init__.py`:
```python
from .my_optimizer import MyOptimizer
```
Then you can use `MyOptimizer` in the `optimizer` field of config files, as sketched below.
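A minimal sketch of the config usage; the hyperparameters are illustrative and are simply forwarded to `MyOptimizer.__init__`:

```python
# hypothetical hyperparameters -- use whatever arguments MyOptimizer defines
optimizer = dict(type='MyOptimizer', lr=0.02, momentum=0.9, weight_decay=0.0001)
```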
### Develop new components

We basically categorize model components into 4 types.

- backbone: usually an FCN network to extract feature maps, e.g., ResNet, MobileNet.
- neck: the component between backbones and heads, e.g., FPN, PAFPN.
- head: the component for specific tasks, e.g., bbox prediction and mask prediction.
- roi extractor: the part for extracting RoI features from feature maps, e.g., RoI Align.

Here we show how to develop new components with an example of MobileNet.
1. Create a new file `mmdet/models/backbones/mobilenet.py`.

```python
import torch.nn as nn

from ..registry import BACKBONES


@BACKBONES.register_module()
class MobileNet(nn.Module):

    def __init__(self, arg1, arg2):
        pass

    def forward(self, x):  # should return a tuple
        pass

    def init_weights(self, pretrained=None):
        pass
```
2. Import the module in `mmdet/models/backbones/__init__.py`.

```python
from .mobilenet import MobileNet
```

3. Use it in your config file.

```python
model = dict(
    ...
    backbone=dict(
        type='MobileNet',
        arg1=xxx,
        arg2=xxx),
    ...
```
For more information on how it works, you can refer to [TECHNICAL_DETAILS.md](TECHNICAL_DETAILS.md) (TODO).

## Tutorials

Currently, we provide three tutorials for users to [finetune models](tutorials/finetune.md), [add new datasets](tutorials/new_dataset.md), and [add new modules](tutorials/new_modules.md).
# Installation
### Requirements

- Linux or macOS (Windows is not currently officially supported)
- Python 3.6+
- PyTorch 1.3+
- CUDA 9.2+ (If you build PyTorch from source, CUDA 9.0 is also compatible)
- GCC 4.9+
- [mmcv](https://github.com/open-mmlab/mmcv)
### Install mmdetection

a. Create a conda virtual environment and activate it.

```shell
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab
```

b. Install PyTorch and torchvision following the [official instructions](https://pytorch.org/), e.g.,

```shell
conda install pytorch torchvision -c pytorch
```
Note: Make sure that your compilation CUDA version and runtime CUDA version match.
You can check the supported CUDA versions for precompiled packages on the [PyTorch website](https://pytorch.org/).

`E.g. 1` If you have CUDA 10.1 installed under `/usr/local/cuda` and would like to install PyTorch 1.5, you need to install the prebuilt PyTorch with CUDA 10.1.

```shell
conda install pytorch cudatoolkit=10.1 torchvision -c pytorch
```

`E.g. 2` If you have CUDA 9.2 installed under `/usr/local/cuda` and would like to install PyTorch 1.3.1, you need to install the prebuilt PyTorch with CUDA 9.2.

```shell
conda install pytorch=1.3.1 cudatoolkit=9.2 torchvision=0.4.2 -c pytorch
```

If you build PyTorch from source instead of installing the prebuilt package, you can use more CUDA versions such as 9.0.
c. Clone the mmdetection repository.

```shell
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
```

d. Install build requirements and then install mmdetection.

```shell
pip install -r requirements/build.txt
pip install "git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI"
pip install -v -e .  # or "python setup.py develop"
```
If you build mmdetection on macOS, replace the last command with

```shell
CC=clang CXX=clang++ CFLAGS='-stdlib=libc++' pip install -e .
```
Note:

4. Some dependencies are optional. Simply running `pip install -v -e .` will only install the minimum runtime requirements. To use optional dependencies like `albumentations` and `imagecorruptions` either install them manually with `pip install -r requirements/optional.txt` or specify desired extras when calling `pip` (e.g. `pip install -v -e .[optional]`). Valid keys for the extras field are: `all`, `tests`, `build`, and `optional`.
### Install with CPU only

The code can be built for a CPU-only environment (where CUDA isn't available).
In CPU mode you can run `demo/webcam_demo.py`, for example.
However, some functionality is missing in this mode:

- Deformable Convolution
- Deformable RoI pooling
- CARAFE: Content-Aware ReAssembly of FEatures
- nms_cuda
- sigmoid_focal_loss_cuda

So if you try to run inference with a model containing deformable convolution you will get an error.

Note: We set `use_torchvision=True` on-the-fly in CPU mode for `RoIPool` and `RoIAlign`.
### Another option: Docker Image

We provide a [Dockerfile](https://github.com/open-mmlab/mmdetection/blob/master/docker/Dockerfile) to build an image.

```shell
# build an image with PyTorch 1.5, CUDA 10.1
docker build -t mmdetection docker/
```
Run it with

```shell
docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmdetection/data mmdetection
```
### A from-scratch setup script

Here is a full script for setting up mmdetection with conda.

```shell
conda create -n open-mmlab python=3.7 -y
conda activate open-mmlab

# install the latest pytorch prebuilt with the default prebuilt CUDA version (usually the latest)
conda install -c pytorch pytorch torchvision -y

git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -r requirements/build.txt
pip install "git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI"
pip install -v -e .
```
### Using multiple MMDetection versions

The train and test scripts already modify the `PYTHONPATH` to ensure the script uses the MMDetection in the current directory.

To use the default MMDetection installed in the environment rather than the one you are working with, you can remove the following line in those scripts:

```shell
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH
```
API Documentation
=================

mmdet3d.apis
------------

.. automodule:: mmdet3d.apis
    :members:

mmdet3d.core
------------

anchor
^^^^^^

.. automodule:: mmdet3d.core.anchor
    :members:

bbox
^^^^

.. automodule:: mmdet3d.core.bbox
    :members:

evaluation
^^^^^^^^^^

.. automodule:: mmdet3d.core.evaluation
    :members:

post_processing
^^^^^^^^^^^^^^^

.. automodule:: mmdet3d.core.post_processing
    :members:

optimizer
^^^^^^^^^

.. automodule:: mmdet3d.core.optimizer
    :members:

utils
^^^^^

.. automodule:: mmdet3d.core.utils
    :members:

mmdet3d.datasets
----------------

datasets
^^^^^^^^

.. automodule:: mmdet3d.datasets
    :members:

pipelines
^^^^^^^^^

.. automodule:: mmdet3d.datasets.pipelines
    :members:

mmdet3d.models
--------------

detectors
^^^^^^^^^

.. automodule:: mmdet3d.models.detectors
    :members:

backbones
^^^^^^^^^

.. automodule:: mmdet3d.models.backbones
    :members:

dense_heads
^^^^^^^^^^^

.. automodule:: mmdet3d.models.dense_heads
    :members:

roi_heads
^^^^^^^^^

.. automodule:: mmdet3d.models.roi_heads
    :members:
# Compatibility with MMDetection 1.x
MMDetection 2.0 goes through a big refactoring and addresses many legacy issues. It is not compatible with the 1.x version, i.e., running inference with the same model weights in these two versions will produce different results. Thus, MMDetection 2.0 re-benchmarks all the models and provides their links and logs in the model zoo.

The major differences are in four folds: coordinate system, codebase conventions, training hyperparameters, and modular design.
## Coordinate System
The new coordinate system is consistent with [Detectron2](https://github.com/facebookresearch/detectron2/) and treats the center of the top-left pixel as (0, 0) rather than the top-left corner of that pixel.
Accordingly, the system interprets the coordinates in COCO bounding box and segmentation annotations as coordinates in the range `[0, width]` or `[0, height]`.
This modification affects all the computation related to bbox and pixel selection, and it is more natural and accurate.
- The height and width of a box with corners (x1, y1) and (x2, y2) in the new coordinate system are computed as `width = x2 - x1` and `height = y2 - y1`.
  In MMDetection 1.x and previous versions, a "+ 1" was added to both height and width (see the sketch after this list).
  This modification affects three parts:

  1. Box transformation and encoding/decoding in regression.
  2. IoU calculation. This affects the matching process between ground truth boxes and bounding boxes and the NMS process. The effect on compatibility is negligible, though.
  3. The corners of bounding boxes are floats and no longer quantized. This should produce more accurate bounding box results. It also means that bounding boxes and RoIs are no longer required to have a minimum size of 1, whose effect is small, though.
- The anchors are center-aligned to feature grid points and are floats.
  In MMDetection 1.x and previous versions, the anchors are integers and not center-aligned.
  This affects the anchor generation in RPN and all the anchor-based methods.
- RoIAlign is better aligned with the image coordinate system. The new implementation is adopted from [Detectron2](https://github.com/facebookresearch/detectron2/tree/master/detectron2/layers/csrc/ROIAlign).
  The RoIs are shifted by half a pixel by default when they are used to crop RoI features, compared to MMDetection 1.x.
  The old behavior is still available by setting `aligned=False` instead of `aligned=True`.
- Mask cropping and pasting are more accurate.

  1. We use the new RoIAlign to crop mask targets. In MMDetection 1.x, the bounding box is quantized before it is used to crop the mask target, and the crop process is implemented in numpy. In the new implementation, the bounding box for the crop is not quantized and is sent to RoIAlign. This accelerates training by a large margin (~0.1s per iter, ~2 hours when training Mask R-50 for the 1x schedule) and should be more accurate.
  2. In MMDetection 2.0, the "paste_mask()" function is different and should be more accurate than those in previous versions. This change follows the modification in [Detectron2](https://github.com/facebookresearch/detectron2/blob/master/detectron2/structures/masks.py) and can improve mask AP on COCO by ~0.5% absolute.
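A small sketch of the box size convention change on a toy box; this is plain arithmetic, not MMDetection API:

```python
# toy box in (x1, y1, x2, y2) form
x1, y1, x2, y2 = 10.0, 10.0, 20.0, 20.0

# MMDetection 1.x and earlier: the "+ 1" convention
w_old, h_old = x2 - x1 + 1, y2 - y1 + 1   # 11.0, 11.0

# MMDetection 2.0: continuous coordinates, no "+ 1"
w_new, h_new = x2 - x1, y2 - y1           # 10.0, 10.0
```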
## Codebase Conventions
- MMDetection 2.0 changes the order of class labels to reduce unused parameters in the regression and mask branches more naturally (without +1 and -1).
  This affects all the classification layers of the model, which now have a different ordering of class labels. The final layers of the regression branch and the mask head no longer keep K+1 channels for K categories, and their class orders are consistent with the classification branch (see the sketch after this list).

  - In MMDetection 2.0, label "K" means background, and labels [0, K-1] correspond to the K = num_categories object categories.
  - In MMDetection 1.x and previous versions, label "0" means background, and labels [1, K] correspond to the K categories.
- Low quality matching in R-CNN is not used. In MMDetection 1.x and previous versions, the `max_iou_assigner` will match low quality boxes for each ground truth box in both RPN and R-CNN training. We observe that this sometimes does not assign the best GT box to some bounding boxes, so MMDetection 2.0 does not allow low quality matching by default in R-CNN training in the new system. This sometimes slightly improves the box AP (~0.1% absolute).
- Separate scale factors for width and height. In MMDetection 1.x and previous versions, the scale factor is a single float in `keep_ratio=True` mode. This is slightly inaccurate because the scale factors for width and height differ slightly. MMDetection 2.0 adopts separate scale factors for width and height; the improvement in AP is ~0.1% absolute.
- Config name conventions are changed. MMDetection V2.0 adopts the new name convention to maintain the gradually growing model zoo as follows:
```
[model]_(model setting)_[backbone]_[neck]_(norm setting)_(misc)_(gpu x batch)_[schedule]_[dataset].py,
```
where the (misc) field includes DCN, GCBlock, etc. More details are illustrated in the [documentation for config](config.md).
- MMDetection V2.0 uses new ResNet Caffe backbones to reduce warnings when loading pre-trained models. Most of the new backbones' weights are the same as the former ones but do not have `conv.bias`, and they use a different `img_norm_cfg`. Thus, the new backbones will not cause warnings of unexpected keys.
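A minimal sketch of the label re-ordering described above, written as plain numpy arithmetic rather than any MMDetection utility:

```python
import numpy as np


def labels_1x_to_2x(labels_1x, num_classes):
    """Map 1.x labels (0 = background, 1..K = classes) to
    2.0 labels (0..K-1 = classes, K = background)."""
    labels = np.empty_like(labels_1x)
    background = labels_1x == 0
    labels[background] = num_classes                   # background becomes K
    labels[~background] = labels_1x[~background] - 1   # classes shift down by 1
    return labels
```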
## Training Hyperparameters
The change in training hyperparameters does not affect
model-level compatibility but slightly improves the performance. The major ones are:
- The number of proposals after NMS is changed from 2000 to 1000 by setting `nms_post=1000` and `max_num=1000`.
  This slightly improves both mask AP and bbox AP by ~0.2% absolute.
- The default box regression losses for Mask R-CNN, Faster R-CNN and RetinaNet are changed from smooth L1 loss to L1 loss. This leads to an overall improvement in box AP (~0.6% absolute). However, using L1 loss for other methods such as Cascade R-CNN and HTC does not improve the performance, so we keep the original settings for these methods.
- The sample num of the RoIAlign layer is set to 0 for simplicity. This leads to a slight improvement in mask AP (~0.2% absolute).
- The default setting does not use gradient clipping anymore during training, for faster training speed. This does not degrade the performance of most models. For some models such as RepPoints we keep using gradient clipping to stabilize the training process and obtain better performance.
- The default warmup ratio is changed from 1/3 to 0.001 for a smoother warmup process, since gradient clipping is usually not used. The effect was found to be negligible during our re-benchmarking, though.
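The changed defaults above map onto config fields roughly as follows. The key names are taken from this page and from the schedule example in the finetuning tutorial, so treat the exact nesting as an assumption and check your own config:

```python
# RPN proposal settings: proposals kept after NMS drop from 2000 to 1000
proposal_cfg = dict(nms_post=1000, max_num=1000)  # nesting varies per config

# gradient clipping disabled by default
optimizer_config = dict(grad_clip=None)

# smoother warmup: warmup_ratio was 1/3 in 1.x
lr_config = dict(policy='step', warmup='linear', warmup_iters=500,
                 warmup_ratio=0.001, step=[8, 11])
```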
## Upgrade Models from 1.x to 2.0

To convert the models trained by MMDetection V1.x to MMDetection V2.0, users can use the script `tools/upgrade_model_version.py` to convert
their models. The converted models can be run in MMDetection V2.0 with slightly dropped performance (less than 1% AP absolute).
Details can be found in `configs/legacy`.
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys

sys.path.insert(0, os.path.abspath('..'))

# -- Project information -----------------------------------------------------

project = 'MMDetection3D'
copyright = '2020-2023, OpenMMLab'
author = 'MMDetection3D Authors'

# The full version, including alpha/beta/rc tags
with open('../mmdet3d/VERSION', 'r') as f:
    release = f.read().strip()

# -- General configuration ---------------------------------------------------
extensions = [
    ...,
    'sphinx_markdown_tables',
]

autodoc_mock_imports = [
    'matplotlib', 'pycocotools', 'terminaltables', 'mmdet3d.version',
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# Config System
We incorporate modular and inheritance design into our config system, which makes it convenient to conduct various experiments.
If you wish to inspect the config file, you may run `python tools/print_config.py /PATH/TO/CONFIG` to see the complete config.
You may also pass `--options xxx.yyy=zzz` to see updated config.
## Config File Structure
There are 4 basic component types under `config/_base_`: dataset, model, schedule, and default_runtime.
Many methods, such as Faster R-CNN, Mask R-CNN, Cascade R-CNN, RPN, and SSD, can be easily constructed with one of each.
The configs that are composed of components from `_base_` are called _primitive_.

For all configs under the same folder, it is recommended to have only **one** _primitive_ config. All other configs should inherit from the _primitive_ config. In this way, the maximum inheritance level is 3.

For easy understanding, we recommend contributors to inherit from existing methods.
For example, if some modification is made based on Faster R-CNN, users may first inherit the basic Faster R-CNN structure by specifying `_base_ = ../faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py`, then modify the necessary fields in the config files, as sketched below.

If you are building an entirely new method that does not share the structure with any of the existing methods, you may create a folder `xxx_rcnn` under `configs` and put the new configs there.

Please refer to [mmcv](https://mmcv.readthedocs.io/en/latest/utils.html#config) for detailed documentation.
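As a minimal sketch of this inheritance pattern (the file name is made up for illustration; `total_epochs` is the field used by the finetuning tutorial later in these docs):

```python
# hypothetical file: configs/faster_rcnn/my_faster_rcnn_2x_coco.py
_base_ = './faster_rcnn_r50_fpn_1x_coco.py'

# override only what differs from the primitive config,
# e.g. train twice as long
total_epochs = 24
```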
## Config Name Style
We follow the below style to name config files. Contributors are advised to follow the same style.
```
{model}_[model setting]_{backbone}_{neck}_[norm setting]_[misc]_[gpu x batch_per_gpu]_{schedule}_{dataset}
```
`{xxx}` is a required field and `[yyy]` is optional.
- `{model}`: model type like `faster_rcnn`, `mask_rcnn`, etc.
- `[model setting]`: specific setting for some model, like `without_semantic` for `htc`, `moment` for `reppoints`, etc.
- `{backbone}`: backbone type like `r50` (ResNet-50), `x101` (ResNeXt-101).
- `{neck}`: neck type like `fpn`, `pafpn`, `nasfpn`, `c4`.
- `[norm_setting]`: `bn` (Batch Normalization) is used unless specified, other norm layer type could be `gn` (Group Normalization), `syncbn` (Synchronized Batch Normalization).
`gn-head`/`gn-neck` indicates GN is applied in head/neck only, while `gn-all` means GN is applied in the entire model, e.g. backbone, neck, head.
- `[misc]`: miscellaneous setting/plugins of model, e.g. `dconv`, `gcb`, `attention`, `albu`, `mstrain`.
- `[gpu x batch_per_gpu]`: GPUs and samples per GPU, `8x2` is used by default.
- `{schedule}`: training schedule, options are `1x`, `2x`, `20e`, etc.
`1x` and `2x` means 12 epochs and 24 epochs respectively.
`20e` is adopted in cascade models, which denotes 20 epochs.
For `1x`/`2x`, the initial learning rate decays by a factor of 10 at the 8th/16th and 11th/22nd epochs.
For `20e`, initial learning rate decays by a factor of 10 at the 16th and 19th epochs.
- `{dataset}`: dataset like `coco`, `cityscapes`, `voc_0712`, `wider_face`.
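For example, a config name used throughout these docs decodes as:

```
mask_rcnn_r50_fpn_1x_coco.py
{model}=mask_rcnn  {backbone}=r50  {neck}=fpn  {schedule}=1x  {dataset}=coco
```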
## FAQ
### Ignore some fields in the base configs
Sometimes, you may set `_delete_=True` to ignore some of the fields in the base configs.
You may refer to [mmcv](https://mmcv.readthedocs.io/en/latest/utils.html#inherit-from-base-config-with-ignored-fields) for a simple illustration.

In MMDetection, for example, suppose you want to change the backbone of Mask R-CNN, which uses the following config.
```python
model = dict(
    type='MaskRCNN',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(...),
    rpn_head=dict(...),
    roi_head=dict(...))
```
`ResNet` and `HRNet` use different keywords to construct, so the old keys in `backbone` need to be replaced entirely.
```python
_base_ = '../mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py'
model = dict(
    pretrained='open-mmlab://msra/hrnetv2_w32',
    backbone=dict(
        _delete_=True,
        type='HRNet',
        extra=dict(
            stage1=dict(
                num_modules=1,
                num_branches=1,
                block='BOTTLENECK',
                num_blocks=(4, ),
                num_channels=(64, )),
            stage2=dict(
                num_modules=1,
                num_branches=2,
                block='BASIC',
                num_blocks=(4, 4),
                num_channels=(32, 64)),
            stage3=dict(
                num_modules=4,
                num_branches=3,
                block='BASIC',
                num_blocks=(4, 4, 4),
                num_channels=(32, 64, 128)),
            stage4=dict(
                num_modules=3,
                num_branches=4,
                block='BASIC',
                num_blocks=(4, 4, 4, 4),
                num_channels=(32, 64, 128, 256)))),
    neck=dict(...))
```
The `_delete_=True` replaces all old keys in the `backbone` field with the new keys.
### Use intermediate variables in configs
Some intermediate variables are used in the config files, like `train_pipeline`/`test_pipeline` in datasets.
It's worth noting that when modifying intermediate variables in the children configs, users need to pass the intermediate variables into the corresponding fields again.
For example, we would like to use a multi-scale strategy to train a Mask R-CNN. `train_pipeline`/`test_pipeline` are the intermediate variables we would like to modify.
```python
_base_ = './mask_rcnn_r50_fpn_1x_coco.py'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(
        type='Resize',
        img_scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736),
                   (1333, 768), (1333, 800)],
        multiscale_mode="value",
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
```
We first define the new `train_pipeline`/`test_pipeline` and pass them into `data`.
Welcome to MMDetection's documentation!
=======================================

.. toctree::
   :maxdepth: 2

   install.md
   getting_started.md
   config.md
   tutorials/finetune.md
   tutorials/new_dataset.md
   tutorials/data_pipeline.md
   tutorials/new_modules.md
   api.rst

Indices and tables
==================
# Tutorial 3: Custom Data Pipelines

## Design of Data pipelines

Following typical conventions, we use `Dataset` and `DataLoader` for data loading
with multiple workers. `Dataset` returns a dict of data items corresponding to the arguments of the model's forward method. A dataset defines how to process the annotations, and a data pipeline defines all the steps to prepare a data dict.

A pipeline consists of a sequence of operations. Each operation takes a dict as input and also outputs a dict for the next transform.

We present a classical pipeline in the following figure. The blue blocks are pipeline operations. As the pipeline goes on, each operator can add new keys (marked as green) to the result dict or update the existing keys (marked as orange).

![pipeline figure](../../demo/data_pipeline.png)

The operations are categorized into data loading, pre-processing, formatting and test-time augmentation.

For each operation, we list the related dict fields that are added/updated/removed.

`MultiScaleFlipAug`
## Extend and use custom pipelines

1. Write a new pipeline in any file, e.g., `my_pipeline.py`. It takes a dict as input and returns a dict.

```python
from mmdet.datasets import PIPELINES


@PIPELINES.register_module()
class MyTransform:

    def __call__(self, results):
        results['dummy'] = True
        return results
```

2. Import the new class.

```python
from .my_pipeline import MyTransform
```

3. Use it in config files.

```python
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='MyTransform'),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
```

# Technical Details

In this section, we introduce the main units of training a detector: the model and the iteration pipeline.

## Model

In MMDetection, model components are basically categorized as 4 types.

- backbone: usually an FCN network to extract feature maps, e.g., ResNet.
- neck: the part between backbones and heads, e.g., FPN, ASPP.
- head: the part for specific tasks, e.g., bbox prediction and mask prediction.
- roi extractor: the part for extracting features from feature maps, e.g., RoI Align.

We also implement some general detection pipelines with the above components, such as `SingleStageDetector` and `TwoStageDetector`.

### Build a model with basic components

Following some basic pipelines (e.g., two-stage detectors), the model structure can be customized through config files with no pain.

If we want to implement some new components, e.g., the path aggregation FPN structure in [Path Aggregation Network for Instance Segmentation](https://arxiv.org/abs/1803.01534), there are three things to do.

1. Create a new file in `mmdet/models/necks/pafpn.py`.

```python
import torch.nn as nn

from ..registry import NECKS


@NECKS.register_module()
class PAFPN(nn.Module):

    def __init__(self,
                 in_channels,
                 out_channels,
                 num_outs,
                 start_level=0,
                 end_level=-1,
                 add_extra_convs=False):
        pass

    def forward(self, inputs):
        # implementation is ignored
        pass
```

2. Import the module in `mmdet/models/necks/__init__.py`.

```python
from .pafpn import PAFPN
```

3. Modify the config file from

```python
neck=dict(
    type='FPN',
    in_channels=[256, 512, 1024, 2048],
    out_channels=256,
    num_outs=5)
```

to

```python
neck=dict(
    type='PAFPN',
    in_channels=[256, 512, 1024, 2048],
    out_channels=256,
    num_outs=5)
```

We will release more components (backbones, necks, heads) for research purposes.
### Write a new model

To write a new detection pipeline, you need to inherit from `BaseDetector`, which defines the following abstract methods.

- `extract_feat()`: given an image batch of shape (n, c, h, w), extract the feature map(s).
- `forward_train()`: forward method of the training mode.
- `simple_test()`: single scale testing without augmentation.
- `aug_test()`: testing with augmentation (multi-scale, flip, etc.).

[TwoStageDetector](https://github.com/hellock/mmdetection/blob/master/mmdet/models/detectors/two_stage.py) is a good example which shows how to do that.
## Iteration pipeline
We adopt distributed training for both single machine and multiple machines.
Supposing that the server has 8 GPUs, 8 processes will be started and each process runs on a single GPU.
Each process keeps an isolated model, data loader, and optimizer.
Model parameters are only synchronized once at the beginning.
After a forward and backward pass, gradients will be allreduced among all GPUs,
and the optimizer will update model parameters.
Since the gradients are allreduced, the model parameter stays the same for all processes after the iteration.
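The following PyTorch-only sketch (not MMDetection's actual runner) illustrates this loop; it assumes a launcher that starts one process per GPU and sets the `LOCAL_RANK` and rendezvous environment variables.
```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel


def main():
    # MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE are assumed to be set by the launcher
    dist.init_process_group(backend='nccl')
    rank = int(os.environ.get('LOCAL_RANK', 0))
    # parameters are broadcast from rank 0 once, at construction time
    model = DistributedDataParallel(
        nn.Linear(16, 4).cuda(rank), device_ids=[rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.02)
    for _ in range(10):
        inputs = torch.randn(8, 16, device=rank)  # each rank sees its own data shard
        loss = model(inputs).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()   # gradients are allreduced across GPUs here
        optimizer.step()  # so the update is identical on every process


if __name__ == '__main__':
    main()
```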
## Other information
For more information, please refer to our [technical report](https://arxiv.org/abs/1906.07155).
# Tutorial 1: Finetuning Models
Detectors pre-trained on the COCO dataset can serve as good pre-trained models for other datasets, e.g., Cityscapes and KITTI.
This tutorial provides instructions for using the models in the [Model Zoo](../model_zoo.md) on other datasets to obtain better performance.
There are two steps to finetune a model on a new dataset.
- Add support for the new dataset following [Tutorial 2: Adding New Dataset](new_dataset.md).
- Modify the configs as will be discussed in this tutorial.
Taking the finetuning process on the Cityscapes dataset as an example, users need to modify five parts of the config.
## Inherit base configs
To release the burden and reduce bugs in writing whole configs, MMDetection V2.0 supports inheriting from multiple existing configs. To finetune a Mask R-CNN model, the new config needs to inherit
`_base_/models/mask_rcnn_r50_fpn.py` to build the basic structure of the model. To use the Cityscapes dataset, the new config can also simply inherit `_base_/datasets/cityscapes_instance.py`. For runtime settings such as training schedules, the new config needs to inherit `_base_/default_runtime.py`. These configs are in the `configs` directory and users can also choose to write the whole contents rather than use inheritance.
```python
_base_ = [
    '../_base_/models/mask_rcnn_r50_fpn.py',
    '../_base_/datasets/cityscapes_instance.py', '../_base_/default_runtime.py'
]
```
## Modify head
Then the new config needs to modify the head according to the number of classes in the new dataset. By only changing `num_classes` in the `roi_head`, the weights of the pre-trained model are mostly reused, except for the final prediction head.
```python
model = dict(
    pretrained=None,
    roi_head=dict(
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=8,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.1, 0.1, 0.2, 0.2],
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)),
        mask_head=dict(
            type='FCNMaskHead',
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=8,
            loss_mask=dict(
                type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))))
```
## Modify dataset
Users may also need to prepare the dataset and write the dataset configs. MMDetection V2.0 already supports the VOC, WIDER FACE, COCO, and Cityscapes datasets.
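If the data does not live where the inherited dataset config expects, the new config can override the paths. The snippet below is a sketch only; the exact keys and file names are assumptions about `_base_/datasets/cityscapes_instance.py` (which wraps the training set in a `RepeatDataset`).
```python
# sketch only: keys and paths below are assumptions about the inherited config
data_root = 'data/cityscapes/'
data = dict(
    train=dict(
        dataset=dict(  # the base config wraps train in a RepeatDataset
            ann_file=data_root +
            'annotations/instancesonly_filtered_gtFine_train.json',
            img_prefix=data_root + 'leftImg8bit/train/')))
```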
## Modify training schedule
The finetuning hyperparameters differ from the default schedule; finetuning usually requires a smaller learning rate and fewer training epochs.
```python
# optimizer
# lr is set for a batch size of 8
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    # [7] yields higher performance than [6]
    step=[7])
total_epochs = 8 # actual epoch = 8 * 8 = 64
log_config = dict(interval=100)
```
## Use pre-trained model
To use the pre-trained model, the new config adds the link to the pre-trained model weights in `load_from`. Users might want to download the model weights before training to avoid the download time during training.
```python
load_from = 'https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/mask_rcnn_r50_fpn_2x_20181010-41d35c05.pth' # noqa
```
# Tutorial 2: Adding New Dataset
## Customize datasets by reorganizing data
### Reorganize dataset to existing format
The simplest way is to convert your dataset to existing dataset formats (COCO or PASCAL VOC).
The annotation JSON files in COCO format have the following necessary keys:
```python
'images': [
    {
        'file_name': 'COCO_val2014_000000001268.jpg',
        'height': 427,
        'width': 640,
        'id': 1268
    },
    ...
],

'annotations': [
    {
        'segmentation': [[192.81,
            247.09,
            ...
            219.03,
            249.06]],  # if you have mask labels
        'area': 1035.749,
        'iscrowd': 0,
        'image_id': 1268,
        'bbox': [192.81, 224.8, 74.73, 33.43],
        'category_id': 16,
        'id': 42986
    },
    ...
],

'categories': [
    {'id': 0, 'name': 'car'},
]
```
There are three necessary keys in the json file:
- `images`: contains a list of images, with information such as `file_name`, `height`, `width`, and `id`.
- `annotations`: contains the list of instance annotations.
- `categories`: contains the list of category names and their IDs.
After the data pre-processing, the users need to further modify the config files to use the dataset.
Here we show an example of using a custom dataset of 5 classes, assuming it is also in COCO format.
In `configs/my_custom_config.py`:
```python
...
# dataset settings
dataset_type = 'CocoDataset'
classes = ('a', 'b', 'c', 'd', 'e')
...
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        classes=classes,
        ann_file='path/to/your/train/data',
        ...),
    val=dict(
        type=dataset_type,
        classes=classes,
        ann_file='path/to/your/val/data',
        ...),
    test=dict(
        type=dataset_type,
        classes=classes,
        ann_file='path/to/your/test/data',
        ...))
...
```
We use this approach to support the Cityscapes dataset. The script is in [cityscapes.py](https://github.com/open-mmlab/mmdetection/blob/master/tools/convert_datasets/cityscapes.py) and we also provide the finetuning [configs](https://github.com/open-mmlab/mmdetection/blob/master/configs/cityscapes).
### Reorganize dataset to middle format
It is also fine if you do not want to convert the annotation format to COCO or PASCAL VOC format.
Actually, we define a simple annotation format and all existing datasets are
processed to be compatible with it, either online or offline.
The annotation of a dataset is a list of dicts, and each dict corresponds to an image.
There are 3 fields `filename` (relative path), `width`, and `height` for testing,
and an additional field `ann` for training. `ann` is also a dict containing at least 2 fields:
`bboxes` and `labels`, both of which are numpy arrays. Some datasets may provide
annotations like crowd/difficult/ignored bboxes; we use `bboxes_ignore` and `labels_ignore`
to cover them.
Here is an example.
```
[
    {
        'filename': 'a.jpg',
        'width': 1280,
        'height': 720,
        'ann': {
            'bboxes': <np.ndarray, float32> (n, 4),
            'labels': <np.ndarray, int64> (n, ),
            'bboxes_ignore': <np.ndarray, float32> (k, 4),
            'labels_ignore': <np.ndarray, int64> (k, ) (optional field)
        }
    },
    ...
]
```
There are two ways to work with custom datasets.
- online conversion
You can write a new Dataset class inherited from `CustomDataset`, and overwrite two methods
`load_annotations(self, ann_file)` and `get_ann_info(self, idx)`,
like [CocoDataset](https://github.com/open-mmlab/mmdetection/blob/master/mmdet/datasets/coco.py) and [VOCDataset](https://github.com/open-mmlab/mmdetection/blob/master/mmdet/datasets/voc.py).
- offline conversion
You can convert the annotation format to the expected format above and save it to
a pickle or json file, like [pascal_voc.py](https://github.com/open-mmlab/mmdetection/blob/master/tools/convert_datasets/pascal_voc.py).
Then you can simply use `CustomDataset` (a minimal sketch of this route follows below).
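For the offline route, a minimal sketch of dumping the middle format with mmcv could look like the following; the file names and values are placeholders.
```python
import mmcv
import numpy as np

# hypothetical converter output: a list of middle-format dicts as shown above
annotations = [
    dict(
        filename='a.jpg',
        width=1280,
        height=720,
        ann=dict(
            bboxes=np.array([[10., 20., 40., 60.]], dtype=np.float32),
            labels=np.array([0], dtype=np.int64))),
]
# serialize to a pickle file that CustomDataset can consume
mmcv.dump(annotations, 'middle_format_annotations.pkl')
```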
### An example of customized dataset
Assume the annotations are in a new format in text files.
The bounding box annotations are stored in the text file `annotation.txt` as follows:
```
#
000001.jpg
1280 720
2
10 20 40 60 1
20 40 50 60 2
#
000002.jpg
1280 720
3
50 20 40 60 2
20 40 30 45 2
30 40 50 60 3
```
We can create a new dataset in `mmdet/datasets/my_dataset.py` to load the data.
```python
import mmcv
import numpy as np

from .builder import DATASETS
from .custom import CustomDataset


@DATASETS.register_module()
class MyDataset(CustomDataset):

    CLASSES = ('person', 'bicycle', 'car', 'motorcycle')

    def load_annotations(self, ann_file):
        ann_list = mmcv.list_from_file(ann_file)

        data_infos = []
        for i, ann_line in enumerate(ann_list):
            # each image record starts with a '#' separator line
            if ann_line != '#':
                continue

            img_shape = ann_list[i + 2].split(' ')
            width = int(img_shape[0])
            height = int(img_shape[1])
            bbox_number = int(ann_list[i + 3])

            bboxes = []
            labels = []
            # each bbox line has the format: x1 y1 x2 y2 label
            for bbox_line in ann_list[i + 4:i + 4 + bbox_number]:
                ann = bbox_line.split(' ')
                bboxes.append([float(coord) for coord in ann[:4]])
                labels.append(int(ann[4]))

            data_infos.append(
                dict(
                    filename=ann_list[i + 1],
                    width=width,
                    height=height,
                    ann=dict(
                        bboxes=np.array(bboxes).astype(np.float32),
                        labels=np.array(labels).astype(np.int64))))

        return data_infos

    def get_ann_info(self, idx):
        return self.data_infos[idx]['ann']
```
Then in the config, to use `MyDataset`, you can modify the config as follows.
```python
dataset_A_train = dict(
    type='MyDataset',
    ann_file='image_list.txt',
    pipeline=train_pipeline
)
```
## Customize datasets by mixing datasets
MMDetection also supports mixing datasets for training.
Currently it supports concatenating and repeating datasets.
### Repeat dataset
We use `RepeatDataset` as a wrapper to repeat the dataset. For example, suppose the original dataset is `Dataset_A`; to repeat it, the config looks like the following.
```python
dataset_A_train = dict(
    type='RepeatDataset',
    times=N,
    dataset=dict(  # This is the original config of Dataset_A
        type='Dataset_A',
        ...
        pipeline=train_pipeline
    )
)
```
### Concatenate dataset
There are two ways to concatenate datasets.
1. If the datasets you want to concatenate are of the same type but have different annotation files, you can concatenate the dataset configs like the following.
```python
dataset_A_train = dict(
    type='Dataset_A',
    ann_file=['anno_file_1', 'anno_file_2'],
    pipeline=train_pipeline
)
```
2. In case the datasets you want to concatenate are of different types, you can concatenate the dataset configs like the following.
```python
dataset_A_train = dict()
dataset_B_train = dict()

data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=[
        dataset_A_train,
        dataset_B_train
    ],
    val=dataset_A_val,
    test=dataset_A_test
)
```
A more complex example that repeats `Dataset_A` and `Dataset_B` N and M times, respectively, and then concatenates the repeated datasets is as follows.
```python
dataset_A_train = dict(
    type='RepeatDataset',
    times=N,
    dataset=dict(
        type='Dataset_A',
        ...
        pipeline=train_pipeline
    )
)
dataset_A_val = dict(
    ...
    pipeline=test_pipeline
)
dataset_A_test = dict(
    ...
    pipeline=test_pipeline
)
dataset_B_train = dict(
    type='RepeatDataset',
    times=M,
    dataset=dict(
        type='Dataset_B',
        ...
        pipeline=train_pipeline
    )
)
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=[
        dataset_A_train,
        dataset_B_train
    ],
    val=dataset_A_val,
    test=dataset_A_test
)
```
### Modify classes of existing dataset
With existing dataset types, we can modify their class names to train on a subset of the annotations.
For example, if you want to train only three classes of the current dataset,
you can modify the classes of the dataset.
The dataset will then automatically select the subset of the data that contains at least one class in `classes`.
```python
classes = ('person', 'bicycle', 'car')
data = dict(
    train=dict(classes=classes),
    val=dict(classes=classes),
    test=dict(classes=classes))
```
MMDetection V2.0 also supports reading the classes from a file, which is common in real applications.
For example, assume `classes.txt` contains the following class names.
```
person
bicycle
car
```
Users can set `classes` to a file path; the dataset will load it and convert it to a list automatically.
```python
classes = 'path/to/classes.txt'
data = dict(
    train=dict(classes=classes),
    val=dict(classes=classes),
    test=dict(classes=classes))
```
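For reference, such a file is a plain list of names; loading it with `mmcv.list_from_file` (a sketch of what the dataset does internally) yields a Python list.
```python
import mmcv

# mirrors what the dataset does when `classes` is given as a string path
class_names = mmcv.list_from_file('path/to/classes.txt')
print(class_names)  # ['person', 'bicycle', 'car']
```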
# Tutorial 4: Adding New Modules
## Customize optimizer
An example of customized optimizer `CopyOfSGD` is defined in `mmdet/core/optimizer/copy_of_sgd.py`.
More generally, a customized optimizer could be defined as follows.
Assume you want to add an optimizer named `MyOptimizer`, which has arguments `a`, `b`, and `c`.
You need to first implement the new optimizer in a file, e.g., in `mmdet/core/optimizer/my_optimizer.py`:
```python
from torch.optim import Optimizer

from .registry import OPTIMIZERS


@OPTIMIZERS.register_module
class MyOptimizer(Optimizer):

    def __init__(self, a, b, c):
        pass
```
Then add this module in `mmdet/core/optimizer/__init__.py` so that the registry will
find the new module and add it:
```python
from .my_optimizer import MyOptimizer
```
Then you can use `MyOptimizer` in `optimizer` field of config files.
In the configs, the optimizers are defined by the field `optimizer` like the following:
```python
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
```
To use your own optimizer, the field can be changed as
```python
optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value)
```
We already support using all the optimizers implemented by PyTorch; the only modification needed is to change the `optimizer` field of the config files.
For example, if you want to use `Adam`, though the performance will drop a lot, the modification could be as follows.
```python
optimizer = dict(type='Adam', lr=0.0003, weight_decay=0.0001)
```
The users can directly set arguments following the [API doc](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim) of PyTorch.
## Customize optimizer constructor
Some models may have parameter-specific settings for optimization, e.g., weight decay for BatchNorm layers.
Users can do such fine-grained parameter tuning by customizing the optimizer constructor.
```python
from mmcv.utils import build_from_cfg

from mmdet.core.optimizer import OPTIMIZER_BUILDERS, OPTIMIZERS
from mmdet.utils import get_root_logger
from .cocktail_optimizer import CocktailOptimizer


@OPTIMIZER_BUILDERS.register_module
class CocktailOptimizerConstructor(object):

    def __init__(self, optimizer_cfg, paramwise_cfg=None):

    def __call__(self, model):

        return my_optimizer
```
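As a concrete illustration of the idea, the hypothetical constructor below exempts BatchNorm parameters from weight decay; the class name is invented for illustration, and the registry usage follows the skeleton above.
```python
import torch.nn as nn
from mmcv.utils import build_from_cfg

from mmdet.core.optimizer import OPTIMIZER_BUILDERS, OPTIMIZERS


@OPTIMIZER_BUILDERS.register_module
class NoDecayBNOptimizerConstructor(object):  # hypothetical name
    """Build an optimizer whose BatchNorm params get no weight decay."""

    def __init__(self, optimizer_cfg, paramwise_cfg=None):
        self.optimizer_cfg = optimizer_cfg

    def __call__(self, model):
        decay, no_decay = [], []
        for module in model.modules():
            is_bn = isinstance(module, nn.modules.batchnorm._BatchNorm)
            for param in module.parameters(recurse=False):
                (no_decay if is_bn else decay).append(param)
        optimizer_cfg = self.optimizer_cfg.copy()
        # torch optimizers accept per-group overrides via param groups
        optimizer_cfg['params'] = [
            dict(params=decay),
            dict(params=no_decay, weight_decay=0.)
        ]
        return build_from_cfg(optimizer_cfg, OPTIMIZERS)
```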
## Develop new components
We basically categorize model components into 4 types.
- backbone: usually an FCN network to extract feature maps, e.g., ResNet, MobileNet.
- neck: the component between backbones and heads, e.g., FPN, PAFPN.
- head: the component for specific tasks, e.g., bbox prediction and mask prediction.
- roi extractor: the part for extracting RoI features from feature maps, e.g., RoI Align.
### Add new backbones
Here we show how to develop new components with an example of MobileNet.
1. Create a new file `mmdet/models/backbones/mobilenet.py`.
```python
import torch.nn as nn

from ..registry import BACKBONES


@BACKBONES.register_module
class MobileNet(nn.Module):

    def __init__(self, arg1, arg2):
        pass

    def forward(self, x):  # should return a tuple
        pass

    def init_weights(self, pretrained=None):
        pass
```
2. Import the module in `mmdet/models/backbones/__init__.py`.
```python
from .mobilenet import MobileNet
```
3. Use it in your config file.
```python
model = dict(
    ...
    backbone=dict(
        type='MobileNet',
        arg1=xxx,
        arg2=xxx),
    ...
```
### Add new necks
Here we take PAFPN as an example.
1. Create a new file in `mmdet/models/necks/pafpn.py`.
```python
import torch.nn as nn

from ..registry import NECKS


@NECKS.register_module
class PAFPN(nn.Module):

    def __init__(self,
                 in_channels,
                 out_channels,
                 num_outs,
                 start_level=0,
                 end_level=-1,
                 add_extra_convs=False):
        pass

    def forward(self, inputs):
        # implementation is ignored
        pass
```
2. Import the module in `mmdet/models/necks/__init__.py`.
```python
from .pafpn import PAFPN
```
3. Modify the config file.
```python
neck=dict(
    type='PAFPN',
    in_channels=[256, 512, 1024, 2048],
    out_channels=256,
    num_outs=5)
```
### Add new heads
Here we show how to develop a new head with the example of [Double Head R-CNN](https://arxiv.org/abs/1904.06493).
First, add a new bbox head in `mmdet/models/bbox_heads/double_bbox_head.py`.
Double Head R-CNN implements a new bbox head for object detection.
To implement a bbox head, we basically need to implement three functions of the new module, as follows.
```python
@HEADS.register_module
class DoubleConvFCBBoxHead(BBoxHead):
    r"""Bbox head used in Double-Head R-CNN

                                      /-> cls
                  /-> shared convs ->
                                      \-> reg
    roi features
                                      /-> cls
                  \-> shared fc    ->
                                      \-> reg
    """  # noqa: W605

    def __init__(self,
                 num_convs=0,
                 num_fcs=0,
                 conv_out_channels=1024,
                 fc_out_channels=1024,
                 conv_cfg=None,
                 norm_cfg=dict(type='BN'),
                 **kwargs):
        kwargs.setdefault('with_avg_pool', True)
        super(DoubleConvFCBBoxHead, self).__init__(**kwargs)

    def init_weights(self):
        # conv layers are already initialized by ConvModule

    def forward(self, x_cls, x_reg):
```
Second, implement a new RoI Head if it is necessary. We plan to inherit the new `DoubleHeadRoIHead` from `StandardRoIHead`. We can find that a `StandardRoIHead` already implements the following functions.
```python
import torch

from mmdet.core import bbox2result, bbox2roi, build_assigner, build_sampler
from ..builder import HEADS, build_head, build_roi_extractor
from .base_roi_head import BaseRoIHead
from .test_mixins import BBoxTestMixin, MaskTestMixin


@HEADS.register_module
class StandardRoIHead(BaseRoIHead, BBoxTestMixin, MaskTestMixin):
    """Simplest base roi head including one bbox head and one mask head."""

    def init_assigner_sampler(self):

    def init_bbox_head(self, bbox_roi_extractor, bbox_head):

    def init_mask_head(self, mask_roi_extractor, mask_head):

    def init_weights(self, pretrained):

    def forward_dummy(self, x, proposals):

    def forward_train(self,
                      x,
                      img_metas,
                      proposal_list,
                      gt_bboxes,
                      gt_labels,
                      gt_bboxes_ignore=None,
                      gt_masks=None):

    def _bbox_forward(self, x, rois):

    def _bbox_forward_train(self, x, sampling_results, gt_bboxes, gt_labels,
                            img_metas):

    def _mask_forward_train(self, x, sampling_results, bbox_feats, gt_masks,
                            img_metas):

    def _mask_forward(self, x, rois=None, pos_inds=None, bbox_feats=None):

    def simple_test(self,
                    x,
                    proposal_list,
                    img_metas,
                    proposals=None,
                    rescale=False):
        """Test without augmentation."""
```
Double Head's modification is mainly in the `_bbox_forward` logic, and it inherits the other logic from `StandardRoIHead`.
In `mmdet/models/roi_heads/double_roi_head.py`, we implement the new RoI head as follows:
```python
from ..builder import HEADS
from .standard_roi_head import StandardRoIHead


@HEADS.register_module
class DoubleHeadRoIHead(StandardRoIHead):
    """RoI head for Double Head RCNN

    https://arxiv.org/abs/1904.06493
    """

    def __init__(self, reg_roi_scale_factor, **kwargs):
        super(DoubleHeadRoIHead, self).__init__(**kwargs)
        self.reg_roi_scale_factor = reg_roi_scale_factor

    def _bbox_forward(self, x, rois):
        bbox_cls_feats = self.bbox_roi_extractor(
            x[:self.bbox_roi_extractor.num_inputs], rois)
        bbox_reg_feats = self.bbox_roi_extractor(
            x[:self.bbox_roi_extractor.num_inputs],
            rois,
            roi_scale_factor=self.reg_roi_scale_factor)
        if self.with_shared_head:
            bbox_cls_feats = self.shared_head(bbox_cls_feats)
            bbox_reg_feats = self.shared_head(bbox_reg_feats)
        cls_score, bbox_pred = self.bbox_head(bbox_cls_feats, bbox_reg_feats)
        bbox_results = dict(
            cls_score=cls_score,
            bbox_pred=bbox_pred,
            bbox_feats=bbox_cls_feats)
        return bbox_results
```
Last, users need to add the module in `mmdet/models/bbox_heads/__init__.py` and `mmdet/models/roi_heads/__init__.py` so that the corresponding registries can find and load them.
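For reference, based on the module and class names above, the two additions would look roughly like the following.
```python
# in mmdet/models/bbox_heads/__init__.py
from .double_bbox_head import DoubleConvFCBBoxHead

# in mmdet/models/roi_heads/__init__.py
from .double_roi_head import DoubleHeadRoIHead
```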
The config file of Double Head R-CNN is as follows.
```python
_base_ = '../faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
model = dict(
    roi_head=dict(
        type='DoubleHeadRoIHead',
        reg_roi_scale_factor=1.3,
        bbox_head=dict(
            _delete_=True,
            type='DoubleConvFCBBoxHead',
            num_convs=4,
            num_fcs=2,
            in_channels=256,
            conv_out_channels=1024,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=80,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=2.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=2.0))))
```
Since MMDetection 2.0, the config system supports inheriting configs so that users can focus on the modification.
Double Head R-CNN mainly uses a new `DoubleHeadRoIHead` and a new `DoubleConvFCBBoxHead`; the arguments are set according to the `__init__` function of each module.
### Add new loss
Assume you want to add a new loss `MyLoss` for bounding box regression.
To add a new loss function, users need to implement it in `mmdet/models/losses/my_loss.py`.
The decorator `weighted_loss` enables the loss to be weighted for each element.
```python
import torch
import torch.nn as nn

from ..builder import LOSSES
from .utils import weighted_loss


@weighted_loss
def my_loss(pred, target):
    assert pred.size() == target.size() and target.numel() > 0
    loss = torch.abs(pred - target)
    return loss


@LOSSES.register_module
class MyLoss(nn.Module):

    def __init__(self, reduction='mean', loss_weight=1.0):
        super(MyLoss, self).__init__()
        self.reduction = reduction
        self.loss_weight = loss_weight

    def forward(self,
                pred,
                target,
                weight=None,
                avg_factor=None,
                reduction_override=None):
        assert reduction_override in (None, 'none', 'mean', 'sum')
        reduction = (
            reduction_override if reduction_override else self.reduction)
        loss_bbox = self.loss_weight * my_loss(
            pred, target, weight, reduction=reduction, avg_factor=avg_factor)
        return loss_bbox
```
Then users need to add it in `mmdet/models/losses/__init__.py`.
```python
from .my_loss import MyLoss, my_loss
```
To use it, modify the `loss_xxx` field.
Since `MyLoss` is for regression, you need to modify the `loss_bbox` field in the head.
```python
loss_bbox=dict(type='MyLoss', loss_weight=1.0)
```
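For instance, following the config structure shown earlier in this tutorial, the field sits inside the head config (a sketch):
```python
model = dict(
    roi_head=dict(
        bbox_head=dict(
            loss_bbox=dict(type='MyLoss', loss_weight=1.0))))
```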