"examples/community/lpw_stable_diffusion_xl.py" did not exist on "0e82fb19e16bd2d45ade31c9a4b871de56e7e80a"
Commit d08a9ec2 by Kai Chen: Merge branch 'master' into single-stage (parents 626e1e19, 810b7110)
# Benchmark and Model Zoo
## Environment
### Hardware
- 8 NVIDIA Tesla V100 GPUs
- Intel Xeon 4114 CPU @ 2.20GHz
### Software environment
- Python 3.6 / 3.7
- PyTorch 0.4.1
- CUDA 9.0.176
- CUDNN 7.0.4
- NCCL 2.1.15
## Common settings
- All baselines were trained using 8 GPUs with a batch size of 16 (2 images per GPU).
- All models were trained on `coco_2017_train` and tested on `coco_2017_val`.
- We use distributed training, and BN layer stats are fixed.
- We adopt the same training schedules as Detectron. 1x indicates 12 epochs and 2x indicates 24 epochs, which corresponds to slightly fewer iterations than Detectron; the difference can be ignored.
- All pytorch-style pretrained backbones on ImageNet are from the PyTorch model zoo.
- We report the training GPU memory as the maximum value of `torch.cuda.max_memory_cached()`
  over all 8 GPUs (see the sketch after this list). Note that this value is usually smaller than what `nvidia-smi` shows, but
  closer to the actual requirement.
- We report the inference time as the overall time, including data loading,
  network forwarding and post processing.
- The training memory and time of the 2x schedule are simply copied from 1x;
  they should be very close to the actual memory and time.
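
A minimal sketch of how such a number can be collected (illustrative only, not the repository's actual logging code):

```python
import torch

# Peak cached memory per GPU, reduced to the single reported value (GB).
# torch.cuda.max_memory_cached() is the PyTorch 0.4.x API named above.
mem_gb = max(
    torch.cuda.max_memory_cached(i)
    for i in range(torch.cuda.device_count())) / 1024**3
print('max memory cached: {:.1f} GB'.format(mem_gb))
```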
## Baselines
We released RPN, Faster R-CNN and Mask R-CNN models in the first version. More models with different backbones will be added to the model zoo.
### RPN
| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | AR1000 | Download |
|:--------:|:-------:|:-------:|:--------:|:-------------------:|:--------------:|:------:|:--------:|
| R-50-FPN | caffe | 1x | 4.5 | 0.379 | 14.4 | 58.2 | - |
| R-50-FPN | pytorch | 1x | 4.8 | 0.407 | 14.5 | 57.1 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/rpn_r50_fpn_1x_20181010-4a9c0712.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/rpn_r50_fpn_1x_20181010_results.pkl.json) |
| R-50-FPN | pytorch | 2x | 4.8 | 0.407 | 14.5 | 57.6 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/rpn_r50_fpn_2x_20181010-88a4a471.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/rpn_r50_fpn_2x_20181010_results.pkl.json) |
### Faster R-CNN
| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | Download |
|:--------:|:-------:|:-------:|:--------:|:-------------------:|:--------------:|:------:|:--------:|
| R-50-FPN | caffe | 1x | 4.9 | 0.525 | 10.0 | 36.7 | - |
| R-50-FPN | pytorch | 1x | 5.1 | 0.554 | 9.9 | 36.4 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/faster_rcnn_r50_fpn_1x_20181010_results.pkl.json) |
| R-50-FPN | pytorch | 2x | 5.1 | 0.554 | 9.9 | 37.7 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/faster_rcnn_r50_fpn_2x_20181010-443129e1.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/faster_rcnn_r50_fpn_2x_20181010_results.pkl.json) |
### Mask R-CNN
| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | mask AP | Download |
|:--------:|:-------:|:-------:|:--------:|:-------------------:|:--------------:|:------:|:-------:|:--------:|
| R-50-FPN | caffe | 1x | 5.9 | 0.658 | 7.7 | 37.5 | 34.4 | - |
| R-50-FPN | pytorch | 1x | 5.8 | 0.690 | 7.7 | 37.3 | 34.2 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/mask_rcnn_r50_fpn_1x_20181010_results.pkl.json) |
| R-50-FPN | pytorch | 2x | 5.8 | 0.690 | 7.7 | 38.6 | 35.1 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/mask_rcnn_r50_fpn_2x_20181010-41d35c05.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/mask_rcnn_r50_fpn_2x_20181010_results.pkl.json) |
### Fast R-CNN (with pre-computed proposals)
| Backbone | Style | Type | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | mask AP | Download |
|:--------:|:-------:|:------:|:-------:|:--------:|:-------------------:|:--------------:|:------:|:-------:|:--------:|
| R-50-FPN | caffe | Faster | 1x | 3.5 | 0.35 | 14.6 | 36.6 | - | - |
| R-50-FPN | pytorch | Faster | 1x | 4.0 | 0.38 | 14.5 | 35.8 | - | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/fast_rcnn_r50_fpn_1x_20181010-08160859.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/fast_rcnn_r50_fpn_1x_20181010_results.pkl.json) |
| R-50-FPN | pytorch | Faster | 2x | 4.0 | 0.38 | 14.5 | 37.1 | - | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/fast_rcnn_r50_fpn_2x_20181010-d263ada5.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/fast_rcnn_r50_fpn_2x_20181010_results.pkl.json) |
| R-50-FPN | caffe | Mask | 1x | 5.4 | 0.47 | 10.7 | 37.3 | 34.5 | - |
| R-50-FPN | pytorch | Mask | 1x | 5.3 | 0.50 | 10.6 | 36.8 | 34.1 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/fast_mask_rcnn_r50_fpn_1x_20181010-e030a38f.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/fast_mask_rcnn_r50_fpn_1x_20181010_results.pkl.json) |
| R-50-FPN | pytorch | Mask | 2x | 5.3 | 0.50 | 10.6 | 37.9 | 34.8 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/fast_mask_rcnn_r50_fpn_2x_20181010-5048cb03.pth) \| [result](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/results/fast_mask_rcnn_r50_fpn_2x_20181010_results.pkl.json) |
### RetinaNet (coming soon)
| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | Download |
|:--------:|:-------:|:-------:|:--------:|:-------------------:|:--------------:|:------:|:--------:|
| R-50-FPN | caffe | 1x | | | | | |
| R-50-FPN | pytorch | 1x | | | | | |
| R-50-FPN | pytorch | 2x | | | | | |
## Comparison with Detectron
We compare mmdetection with [Detectron](https://github.com/facebookresearch/Detectron)
and [Detectron.pytorch](https://github.com/roytseng-tw/Detectron.pytorch),
a third-party port of Detectron to PyTorch. The backbone used is R-50-FPN.
In general, mmdetection has 3 advantages over Detectron.
- **Higher performance** (especially in terms of mask AP)
- **Faster training speed**
- **Memory efficient**
### Performance
Detectron and Detectron.pytorch use caffe-style ResNet as the backbone.
In order to utilize the PyTorch model zoo, we use pytorch-style ResNet in our experiments.
Meanwhile, we also train models with caffe-style ResNet in the 1x experiments for comparison.
We find that pytorch-style ResNet usually converges more slowly than caffe-style ResNet,
thus leading to slightly lower results with the 1x schedule, but the final results
of the 2x schedule are higher.
We report results using both caffe-style (weights converted from
[here](https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md#imagenet-pretrained-models))
and pytorch-style (weights from the official model zoo) ResNet backbone,
indicated as *pytorch-style results* / *caffe-style results*.
<table>
<tr>
<th>Type</th>
<th>Lr schd</th>
<th>Detectron</th>
<th>Detectron.pytorch</th>
<th>mmdetection</th>
</tr>
<tr>
<td rowspan="2">RPN</td>
<td>1x</td>
<td>57.2</td>
<td>-</td>
<td>57.1 / 58.2</td>
</tr>
<tr>
<td>2x</td>
<td>-</td>
<td>-</td>
<td>57.6 / -</td>
</tr>
<tr>
<td rowspan="2">Faster R-CNN</td>
<td>1x</td>
<td>36.7</td>
<td>37.1</td>
<td>36.4 / 36.7</td>
</tr>
<tr>
<td>2x</td>
<td>37.9</td>
<td>-</td>
<td>37.7 / -</td>
</tr>
<tr>
<td rowspan="2">Mask R-CNN</td>
<td>1x</td>
<td>37.7 &amp; 33.9</td>
<td>37.7 &amp; 33.7</td>
<td>37.3 &amp; 34.2 / 37.5 &amp; 34.4</td>
</tr>
<tr>
<td>2x</td>
<td>38.6 &amp; 34.5</td>
<td>-</td>
<td>38.6 &amp; 35.1 / -</td>
</tr>
<tr>
<td rowspan="2">Fast R-CNN</td>
<td>1x</td>
<td>36.4</td>
<td>-</td>
<td>35.8 / 36.6</td>
</tr>
<tr>
<td>2x</td>
<td>36.8</td>
<td>-</td>
<td>37.1 / -</td>
</tr>
<tr>
<td rowspan="2">Fast R-CNN (w/mask)</td>
<td>1x</td>
<td>37.3 &amp; 33.7</td>
<td>-</td>
<td>36.8 &amp; 34.1 / 37.3 &amp; 34.5</td>
</tr>
<tr>
<td>2x</td>
<td>37.7 &amp; 34.0</td>
<td>-</td>
<td>37.9 &amp; 34.8 / -</td>
</tr>
</table>
### Training Speed
The training speed is measured in s/iter. Lower is better.
<table>
<tr>
<th>Type</th>
<th>Detectron (P100<sup>1</sup>)</th>
<th>Detectron.pytorch (XP<sup>2</sup>)</th>
<th>mmdetection<sup>3</sup> (V100<sup>4</sup> / XP)</th>
</tr>
<tr>
<td>RPN</td>
<td>0.416</td>
<td>-</td>
<td>0.407 / 0.413</td>
</tr>
<tr>
<td>Faster R-CNN</td>
<td>0.544</td>
<td>1.015</td>
<td>0.554 / 0.579</td>
</tr>
<tr>
<td>Mask R-CNN</td>
<td>0.889</td>
<td>1.435</td>
<td>0.690 / 0.732</td>
</tr>
<tr>
<td>Fast R-CNN</td>
<td>0.285</td>
<td>-</td>
<td>0.375 / 0.398</td>
</tr>
<tr>
<td>Fast R-CNN (w/mask)</td>
<td>0.377</td>
<td>-</td>
<td>0.504 / 0.574</td>
</tr>
</table>
\*1. Detectron reports the speed on Facebook's Big Basin servers (P100);
on our V100 servers it is slower, so we use the officially reported values.
\*2. Detectron.pytorch does not report the runtime, and we encountered some issues
running it on V100, so we report the speed on TITAN XP.
\*3. The speed of pytorch-style ResNet is approximately 5% slower than caffe-style,
and we report the pytorch-style results here.
\*4. We also ran the models on a DGX-1 server (P100) and the speed is almost the same as on our V100 servers.
### Inference Speed
The inference speed is measured in fps (img/s) on a single GPU. Higher is better.
<table>
<tr>
<th>Type</th>
<th>Detectron (P100)</th>
<th>Detectron.pytorch (XP)</th>
<th>mmdetection (V100 / XP)</th>
</tr>
<tr>
<td>RPN</td>
<td>12.5</td>
<td>-</td>
<td>14.5 / 15.4</td>
</tr>
<tr>
<td>Faster R-CNN</td>
<td>10.3</td>
<td>-</td>
<td>9.9 / 9.8</td>
</tr>
<tr>
<td>Mask R-CNN</td>
<td>8.5</td>
<td>-</td>
<td>7.7 / 7.4</td>
</tr>
<tr>
<td>Fast R-CNN</td>
<td>12.5</td>
<td>-</td>
<td>14.5 / 14.1</td>
</tr>
<tr>
<td>Fast R-CNN (w/mask)</td>
<td>9.9</td>
<td>-</td>
<td>10.6 / 10.3</td>
</tr>
</table>
### Training memory
Our tests consistently show that mmdetection is more memory
efficient than Detectron; the main cause is the deep learning framework itself, not our implementation.
Besides, Caffe2 and PyTorch expose different APIs for querying memory usage,
and their implementations are not exactly the same.
`nvidia-smi` shows a larger memory usage for both Detectron and mmdetection, e.g.,
when training Mask R-CNN with 2 images per GPU we observe 10.6G for Detectron and 9.3G for mmdetection, which is clearly more than actually required.

> With mmdetection, we can train R-50 FPN Mask R-CNN with **4** images per GPU (TITAN XP, 12G),
> which is a promising result.
# mmdetection

## Introduction

mmdetection is an open source object detection toolbox based on PyTorch. It is
a part of the open-mmlab project developed by [Multimedia Laboratory, CUHK](http://mmlab.ie.cuhk.edu.hk/).

### Major features

- **Modular Design**

- **Support of multiple frameworks out of box**

  The toolbox directly supports popular detection frameworks, *e.g.* Faster RCNN, Mask RCNN, RetinaNet, etc.

- **Efficient**

  All basic bbox and mask operations run on GPUs now.
  The training speed is about 5% ~ 20% faster than Detectron for different models.

- **State of the art**

  This was the codebase of the *MMDet* team, who won the [COCO Detection 2018 challenge](http://cocodataset.org/#detection-leaderboard).
Apart from mmdetection, we also released a library [mmcv](https://github.com/open-mmlab/mmcv) for computer vision research,
on which this toolbox heavily depends.
## License
This project is released under the [Apache 2.0 license](LICENSE).
## Updates
v0.5.1 (20/10/2018)
- Add BBoxAssigner and BBoxSampler, the `train_cfg` field in config files is restructured.
- `ConvFCRoIHead` / `SharedFCRoIHead` are renamed to `ConvFCBBoxHead` / `SharedFCBBoxHead` for consistency.
## Benchmark and model zoo
We provide our baseline results and a comparison with Detectron, the most
popular detection project. Results and models are available in the [Model zoo](MODEL_ZOO.md).
## Installation
### Requirements
- Linux (tested on Ubuntu 16.04 and CentOS 7.2)
- Python 3.4+
- PyTorch 0.4.1 and torchvision
- Cython
- [mmcv](https://github.com/open-mmlab/mmcv)
### Install mmdetection
a. Install PyTorch 0.4.1 and torchvision following the [official instructions](https://pytorch.org/).
b. Clone the mmdetection repository.
```shell
git clone https://github.com/open-mmlab/mmdetection.git
```
c. Compile CUDA extensions.
```shell
cd mmdetection
pip install cython # or "conda install cython" if you prefer conda
./compile.sh # or "PYTHON=python3 ./compile.sh" if you use system python3 without virtual environments
```
d. Install mmdetection (other dependencies will be installed automatically).
```shell
python(3) setup.py install # add --user if you want to install it locally
# or "pip install ."
```
Note: You need to run the last step each time you pull updates from GitHub.
The git commit id will be written to the version number and also saved in trained models.
### Prepare COCO dataset.
It is recommended to symlink the dataset root to `$MMDETECTION/data`.
```
mmdetection
├── mmdet
├── tools
├── configs
├── data
│ ├── coco
│ │ ├── annotations
│ │ ├── train2017
│ │ ├── val2017
│ │ ├── test2017
```
> [Here](https://gist.github.com/hellock/bf23cd7348c727d69d48682cb6909047) is
a script for setting up mmdetection with conda for reference.
## Inference with pretrained models
### Test a dataset
- [x] single GPU testing
- [x] multiple GPU testing
- [x] visualize detection results
We allow running one or multiple processes on each GPU, e.g., 8 processes on 8 GPUs
or 16 processes on 8 GPUs. When the GPU workload is not very heavy for a single
process, running multiple processes will accelerate the testing; the number of
processes per GPU is specified with the argument `--proc_per_gpu <PROCESS_NUM>`.

To test a dataset and save the results:
```shell
python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --gpus <GPU_NUM> --out <OUT_FILE>
```
To perform evaluation after testing, add `--eval <EVAL_TYPES>`. Supported types are:

- `proposal_fast`: evaluate recalls of proposals with our own implementation (expected to give the same results as the official evaluation).
- `proposal`: evaluate recalls of proposals with the official code provided by COCO.
- `bbox`: evaluate box AP with the official code provided by COCO.
- `segm`: evaluate mask AP with the official code provided by COCO.
- `keypoints`: evaluate keypoint AP with the official code provided by COCO.
For example, to evaluate Mask R-CNN with 8 GPUs and save the result as `results.pkl`:
```shell
python tools/test.py configs/mask_rcnn_r50_fpn_1x.py <CHECKPOINT_FILE> --gpus 8 --out results.pkl --eval bbox segm
```
It is also convenient to visualize the results during testing by adding an argument `--show`.
```shell
python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --show
```
### Test image(s)
We provide some high-level APIs (experimental) to test an image.
```python
import mmcv
from mmcv.runner import load_checkpoint
from mmdet.models import build_detector
from mmdet.apis import inference_detector, show_result
cfg = mmcv.Config.fromfile('configs/faster_rcnn_r50_fpn_1x.py')
cfg.model.pretrained = None
# construct the model and load checkpoint
model = build_detector(cfg.model, test_cfg=cfg.test_cfg)
_ = load_checkpoint(model, 'https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth')
# test a single image
img = mmcv.imread('test.jpg')
result = inference_detector(model, img, cfg)
show_result(img, result)
# test a list of images
imgs = ['test1.jpg', 'test2.jpg']
for i, result in enumerate(inference_detector(model, imgs, cfg, device='cuda:0')):
    print(i, imgs[i])
    show_result(imgs[i], result)
```
## Train a model
mmdetection implements distributed training and non-distributed training,
which use `MMDistributedDataParallel` and `MMDataParallel` respectively.
### Distributed training
mmdetection potentially supports multiple launch methods, e.g., PyTorch’s built-in launch utility, slurm and MPI.
We provide a training script using the launch utility provided by PyTorch.
```shell
./tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> [optional arguments]
```
Supported arguments are:

- `--validate`: perform evaluation every k (default value: 1) epochs during training.
- `--work_dir <WORK_DIR>`: if specified, the path in the config file will be overwritten.
### Non-distributed training
```shell
python tools/train.py <CONFIG_FILE> --gpus <GPU_NUM> --work_dir <WORK_DIR> --validate
```
Expected results in WORK_DIR:

- log file
- saved checkpoints (every k epochs, default k=1)
- a symlink to the latest checkpoint
> **Note**
> 1. We recommend using distributed training with NCCL2 even on a single machine, since it is faster. Non-distributed training is for debugging or other purposes.
> 2. The default learning rate is for 8 GPUs. If you use fewer or more than 8 GPUs, you need to set the learning rate proportional to the GPU number, e.g., 0.01 for 4 GPUs or 0.04 for 16 GPUs (a small sketch follows).
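
A tiny illustration of the scaling rule in note 2 (the helper is hypothetical, not a repository API):

```python
def scaled_lr(num_gpus, base_lr=0.02, base_gpus=8):
    # Linear scaling: keep the lr proportional to the total batch size.
    return base_lr * num_gpus / base_gpus


assert scaled_lr(4) == 0.01 and scaled_lr(16) == 0.04
```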
## Technical details
Some implementation details and project structures are described in the [technical details](TECHNICAL_DETAILS.md).
## Citation
If you use our codebase or models in your research, please cite this project.
We will release a paper or technical report later.
```
@misc{mmdetection2018,
author = {Kai Chen and Jiangmiao Pang and Jiaqi Wang and Yu Xiong and Xiaoxiao Li
and Shuyang Sun and Wansen Feng and Ziwei Liu and Jianping Shi and
Wanli Ouyang and Chen Change Loy and Dahua Lin},
title = {mmdetection},
howpublished = {\url{https://github.com/open-mmlab/mmdetection}},
year = {2018}
}
```
## Overview
In this section, we will introduce the main units of training a detector:
data loading, model and iteration pipeline.
## Data loading
Following typical conventions, we use `Dataset` and `DataLoader` for data loading
with multiple workers. `Dataset` returns a dict of data items corresponding
to the arguments of the model's forward method.
Since the data in object detection may not be the same size (image size, gt bbox size, etc.),
we introduce a new `DataContainer` type in `mmcv` to help collect and distribute
data of different sizes.
See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details; a usage sketch follows.
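
A rough usage sketch (the exact constructor arguments are an assumption based on the linked source):

```python
import torch
from mmcv.parallel import DataContainer as DC

# Images in a batch can be padded and stacked into one tensor, while
# gt bboxes keep their per-image sizes and are collected as a list.
item = dict(
    img=DC(torch.rand(3, 800, 1216), stack=True),  # assumed 'stack' kwarg
    gt_bboxes=DC(torch.rand(5, 4)))
```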
## Model
In mmdetection, model components are basically categorized into 4 types.

- backbone: usually an FCN to extract feature maps, e.g., ResNet.
- neck: the part between backbones and heads, e.g., FPN, ASPP.
- head: the part for specific tasks, e.g., bbox prediction and mask prediction.
- roi extractor: the part for extracting features from feature maps, e.g., RoI Align.
We also implement some general detection pipelines with the above components,
such as `SingleStageDetector` and `TwoStageDetector`; a config sketch showing how the component types fit together follows.
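
For instance, a two-stage pipeline wires the four component types together roughly like this (a sketch with illustrative values, not a shipped config):

```python
model = dict(
    type='FasterRCNN',                       # a TwoStageDetector pipeline
    backbone=dict(type='ResNet', depth=50),  # backbone
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),                         # neck
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor'),          # roi extractor
    bbox_head=dict(type='SharedFCBBoxHead'))  # head
```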
### Build a model with basic components
Following some basic pipelines (e.g., two-stage detectors), the model structure
can be customized through config files with no pain.
If we want to implement some new components, e.g., the path aggregation
FPN structure in [Path Aggregation Network for Instance Segmentation](https://arxiv.org/abs/1803.01534), there are two things to do.

1. Create a new file `mmdet/models/necks/pafpn.py`.
```python
import torch.nn as nn


class PAFPN(nn.Module):

    def __init__(self,
                 in_channels,
                 out_channels,
                 num_outs,
                 start_level=0,
                 end_level=-1,
                 add_extra_convs=False):
        pass

    def forward(self, inputs):
        # implementation is ignored
        pass
```
2. Modify the config file from
```python
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5)
```
to
```python
neck=dict(
type='PAFPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5)
```
We will release more components (backbones, necks, heads) for research purposes.
### Write a new model
To write a new detection pipeline, you need to inherit from `BaseDetector`,
which defines the following abstract methods.

- `extract_feat()`: given an image batch of shape (n, c, h, w), extract the feature map(s).
- `forward_train()`: forward method of the training mode.
- `simple_test()`: single scale testing without augmentation.
- `aug_test()`: testing with augmentation (multi-scale, flip, etc.).

[TwoStageDetector](https://github.com/hellock/mmdetection/blob/master/mmdet/models/detectors/two_stage.py)
is a good example which shows how to do that; a minimal skeleton is sketched below.
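
A bare-bones skeleton (class name and method bodies are placeholders, assuming `BaseDetector` is importable from `mmdet.models.detectors`):

```python
from mmdet.models.detectors import BaseDetector


class MyDetector(BaseDetector):

    def extract_feat(self, imgs):
        # (n, c, h, w) image batch -> feature map(s)
        pass

    def forward_train(self, imgs, img_metas, **kwargs):
        # compute and return a dict of losses
        pass

    def simple_test(self, img, img_meta, **kwargs):
        pass

    def aug_test(self, imgs, img_metas, **kwargs):
        pass
```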
## Iteration pipeline
We adopt distributed training for both single machine and multiple machines.
Supposing that the server has 8 GPUs, 8 processes will be started and each process runs on a single GPU.
Each process keeps an isolated model, data loader, and optimizer.
Model parameters are only synchronized once at the beginning.
After a forward and backward pass, gradients are allreduced among all GPUs,
and the optimizer updates the model parameters.
Since the gradients are allreduced, the model parameters stay the same on all processes after each iteration; a sketch follows.
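
The per-iteration flow can be sketched with plain `torch.distributed` calls (illustrative only; `MMDistributedDataParallel` hides these details):

```python
import torch.distributed as dist


def train_iter(model, data, optimizer, world_size):
    loss = model(**data)          # forward on this process's local batch
    optimizer.zero_grad()
    loss.backward()               # local gradients
    for p in model.parameters():  # allreduce, then average the gradients
        if p.grad is not None:
            dist.all_reduce(p.grad.data)
            p.grad.data.div_(world_size)
    optimizer.step()              # identical update on every process
```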
```diff
@@ -20,7 +20,7 @@ model = dict(
         out_channels=256,
         featmap_strides=[4, 8, 16, 32]),
     bbox_head=dict(
-        type='SharedFCRoIHead',
+        type='SharedFCBBoxHead',
         num_fcs=2,
         in_channels=256,
         fc_out_channels=1024,
@@ -43,17 +43,19 @@ model = dict(
 # model training and testing settings
 train_cfg = dict(
     rcnn=dict(
-        mask_size=28,
-        pos_iou_thr=0.5,
-        neg_iou_thr=0.5,
-        crowd_thr=1.1,
-        roi_batch_size=512,
-        add_gt_as_proposals=True,
-        pos_fraction=0.25,
-        pos_balance_sampling=False,
-        neg_pos_ub=512,
-        neg_balance_thr=0,
-        min_pos_iou=0.5,
+        assigner=dict(
+            pos_iou_thr=0.5,
+            neg_iou_thr=0.5,
+            min_pos_iou=0.5,
+            ignore_iof_thr=-1),
+        sampler=dict(
+            num=512,
+            pos_fraction=0.25,
+            neg_pos_ub=-1,
+            add_gt_as_proposals=True,
+            pos_balance_sampling=False,
+            neg_balance_thr=0),
+        mask_size=28,
         pos_weight=-1,
         debug=False))
 test_cfg = dict(
 ...
```
```diff
@@ -20,7 +20,7 @@ model = dict(
         out_channels=256,
         featmap_strides=[4, 8, 16, 32]),
     bbox_head=dict(
-        type='SharedFCRoIHead',
+        type='SharedFCBBoxHead',
         num_fcs=2,
         in_channels=256,
         fc_out_channels=1024,
@@ -32,16 +32,18 @@ model = dict(
 # model training and testing settings
 train_cfg = dict(
     rcnn=dict(
-        pos_iou_thr=0.5,
-        neg_iou_thr=0.5,
-        crowd_thr=1.1,
-        roi_batch_size=512,
-        add_gt_as_proposals=True,
-        pos_fraction=0.25,
-        pos_balance_sampling=False,
-        neg_pos_ub=512,
-        neg_balance_thr=0,
-        min_pos_iou=0.5,
+        assigner=dict(
+            pos_iou_thr=0.5,
+            neg_iou_thr=0.5,
+            min_pos_iou=0.5,
+            ignore_iof_thr=-1),
+        sampler=dict(
+            num=512,
+            pos_fraction=0.25,
+            neg_pos_ub=-1,
+            add_gt_as_proposals=True,
+            pos_balance_sampling=False,
+            neg_balance_thr=0),
         pos_weight=-1,
         debug=False))
 test_cfg = dict(rcnn=dict(score_thr=0.05, max_per_img=100, nms_thr=0.5))
 ...
```
```diff
@@ -30,7 +30,7 @@ model = dict(
         out_channels=256,
         featmap_strides=[4, 8, 16, 32]),
     bbox_head=dict(
-        type='SharedFCRoIHead',
+        type='SharedFCBBoxHead',
         num_fcs=2,
         in_channels=256,
         fc_out_channels=1024,
@@ -42,30 +42,35 @@ model = dict(
 # model training and testing settings
 train_cfg = dict(
     rpn=dict(
-        pos_fraction=0.5,
-        pos_balance_sampling=False,
-        neg_pos_ub=256,
-        allowed_border=0,
-        crowd_thr=1.1,
-        anchor_batch_size=256,
-        pos_iou_thr=0.7,
-        neg_iou_thr=0.3,
-        neg_balance_thr=0,
-        min_pos_iou=0.3,
+        assigner=dict(
+            pos_iou_thr=0.7,
+            neg_iou_thr=0.3,
+            min_pos_iou=0.3,
+            ignore_iof_thr=-1),
+        sampler=dict(
+            num=256,
+            pos_fraction=0.5,
+            neg_pos_ub=-1,
+            add_gt_as_proposals=False,
+            pos_balance_sampling=False,
+            neg_balance_thr=0),
+        allowed_border=0,
         pos_weight=-1,
         smoothl1_beta=1 / 9.0,
         debug=False),
     rcnn=dict(
-        pos_iou_thr=0.5,
-        neg_iou_thr=0.5,
-        crowd_thr=1.1,
-        roi_batch_size=512,
-        add_gt_as_proposals=True,
-        pos_fraction=0.25,
-        pos_balance_sampling=False,
-        neg_pos_ub=512,
-        neg_balance_thr=0,
-        min_pos_iou=0.5,
+        assigner=dict(
+            pos_iou_thr=0.5,
+            neg_iou_thr=0.5,
+            min_pos_iou=0.5,
+            ignore_iof_thr=-1),
+        sampler=dict(
+            num=512,
+            pos_fraction=0.25,
+            neg_pos_ub=-1,
+            add_gt_as_proposals=True,
+            pos_balance_sampling=False,
+            neg_balance_thr=0),
         pos_weight=-1,
         debug=False))
 test_cfg = dict(
 ...
```
```diff
@@ -30,7 +30,7 @@ model = dict(
         out_channels=256,
         featmap_strides=[4, 8, 16, 32]),
     bbox_head=dict(
-        type='SharedFCRoIHead',
+        type='SharedFCBBoxHead',
         num_fcs=2,
         in_channels=256,
         fc_out_channels=1024,
@@ -53,31 +53,36 @@ model = dict(
 # model training and testing settings
 train_cfg = dict(
     rpn=dict(
-        pos_fraction=0.5,
-        pos_balance_sampling=False,
-        neg_pos_ub=256,
-        allowed_border=0,
-        crowd_thr=1.1,
-        anchor_batch_size=256,
-        pos_iou_thr=0.7,
-        neg_iou_thr=0.3,
-        neg_balance_thr=0,
-        min_pos_iou=0.3,
+        assigner=dict(
+            pos_iou_thr=0.7,
+            neg_iou_thr=0.3,
+            min_pos_iou=0.3,
+            ignore_iof_thr=-1),
+        sampler=dict(
+            num=256,
+            pos_fraction=0.5,
+            neg_pos_ub=-1,
+            add_gt_as_proposals=False,
+            pos_balance_sampling=False,
+            neg_balance_thr=0),
+        allowed_border=0,
         pos_weight=-1,
         smoothl1_beta=1 / 9.0,
         debug=False),
     rcnn=dict(
-        mask_size=28,
-        pos_iou_thr=0.5,
-        neg_iou_thr=0.5,
-        crowd_thr=1.1,
-        roi_batch_size=512,
-        add_gt_as_proposals=True,
-        pos_fraction=0.25,
-        pos_balance_sampling=False,
-        neg_pos_ub=512,
-        neg_balance_thr=0,
-        min_pos_iou=0.5,
+        assigner=dict(
+            pos_iou_thr=0.5,
+            neg_iou_thr=0.5,
+            min_pos_iou=0.5,
+            ignore_iof_thr=-1),
+        sampler=dict(
+            num=512,
+            pos_fraction=0.25,
+            neg_pos_ub=-1,
+            add_gt_as_proposals=True,
+            pos_balance_sampling=False,
+            neg_balance_thr=0),
+        mask_size=28,
         pos_weight=-1,
         debug=False))
 test_cfg = dict(
 ...
```
```diff
@@ -27,16 +27,19 @@ model = dict(
 # model training and testing settings
 train_cfg = dict(
     rpn=dict(
-        pos_fraction=0.5,
-        pos_balance_sampling=False,
-        neg_pos_ub=256,
-        allowed_border=0,
-        crowd_thr=1.1,
-        anchor_batch_size=256,
-        pos_iou_thr=0.7,
-        neg_iou_thr=0.3,
-        neg_balance_thr=0,
-        min_pos_iou=0.3,
+        assigner=dict(
+            pos_iou_thr=0.7,
+            neg_iou_thr=0.3,
+            min_pos_iou=0.3,
+            ignore_iof_thr=-1),
+        sampler=dict(
+            num=256,
+            pos_fraction=0.5,
+            neg_pos_ub=-1,
+            add_gt_as_proposals=False,
+            pos_balance_sampling=False,
+            neg_balance_thr=0),
+        allowed_border=0,
         pos_weight=-1,
         smoothl1_beta=1 / 9.0,
         debug=False))
 ...
```
```diff
 import torch
 
-from ..bbox import bbox_assign, bbox2delta, bbox_sampling
+from ..bbox import assign_and_sample, BBoxAssigner, SamplingResult, bbox2delta
 from ..utils import multi_apply
@@ -102,37 +102,40 @@ def anchor_target_single(flat_anchors,
         return (None, ) * 6
     # assign gt and sample anchors
     anchors = flat_anchors[inside_flags, :]
-    assigned_gt_inds, argmax_overlaps, max_overlaps = bbox_assign(
-        anchors,
-        gt_bboxes,
-        pos_iou_thr=cfg.pos_iou_thr,
-        neg_iou_thr=cfg.neg_iou_thr,
-        min_pos_iou=cfg.min_pos_iou)
     if sampling:
-        pos_inds, neg_inds = bbox_sampling(
-            assigned_gt_inds, cfg.anchor_batch_size, cfg.pos_fraction,
-            cfg.neg_pos_ub, cfg.pos_balance_sampling, max_overlaps,
-            cfg.neg_balance_thr)
+        assign_result, sampling_result = assign_and_sample(
+            anchors, gt_bboxes, None, None, cfg)
     else:
-        pos_inds = torch.nonzero(assigned_gt_inds > 0).squeeze(-1).unique()
-        neg_inds = torch.nonzero(assigned_gt_inds == 0).squeeze(-1).unique()
+        bbox_assigner = BBoxAssigner(**cfg.assigner)
+        assign_result = bbox_assigner.assign(anchors, gt_bboxes, None,
+                                             gt_labels)
+        pos_inds = torch.nonzero(
+            assign_result.gt_inds > 0).squeeze(-1).unique()
+        neg_inds = torch.nonzero(
+            assign_result.gt_inds == 0).squeeze(-1).unique()
+        gt_flags = anchors.new_zeros(anchors.shape[0], dtype=torch.uint8)
+        sampling_result = SamplingResult(pos_inds, neg_inds, anchors,
+                                         gt_bboxes, assign_result, gt_flags)
 
+    num_valid_anchors = anchors.shape[0]
     bbox_targets = torch.zeros_like(anchors)
     bbox_weights = torch.zeros_like(anchors)
-    labels = torch.zeros_like(assigned_gt_inds)
-    label_weights = torch.zeros_like(assigned_gt_inds, dtype=anchors.dtype)
+    labels = anchors.new_zeros((num_valid_anchors, ))
+    label_weights = anchors.new_zeros((num_valid_anchors, ))
 
+    pos_inds = sampling_result.pos_inds
+    neg_inds = sampling_result.neg_inds
     if len(pos_inds) > 0:
-        pos_anchors = anchors[pos_inds, :]
-        pos_gt_bbox = gt_bboxes[assigned_gt_inds[pos_inds] - 1, :]
-        pos_bbox_targets = bbox2delta(pos_anchors, pos_gt_bbox, target_means,
-                                      target_stds)
+        pos_bbox_targets = bbox2delta(sampling_result.pos_bboxes,
+                                      sampling_result.pos_gt_bboxes,
+                                      target_means, target_stds)
         bbox_targets[pos_inds, :] = pos_bbox_targets
         bbox_weights[pos_inds, :] = 1.0
         if gt_labels is None:
             labels[pos_inds] = 1
         else:
-            labels[pos_inds] = gt_labels[assigned_gt_inds[pos_inds] - 1]
+            labels[pos_inds] = gt_labels[sampling_result.pos_assigned_gt_inds]
         if cfg.pos_weight <= 0:
             label_weights[pos_inds] = 1.0
         else:
 ...
```
from .geometry import bbox_overlaps from .geometry import bbox_overlaps
from .sampling import (random_choice, bbox_assign, bbox_assign_wrt_overlaps, from .assignment import BBoxAssigner, AssignResult
bbox_sampling, bbox_sampling_pos, bbox_sampling_neg, from .sampling import (BBoxSampler, SamplingResult, assign_and_sample,
sample_bboxes) random_choice)
from .transforms import (bbox2delta, delta2bbox, bbox_flip, bbox_mapping, from .transforms import (bbox2delta, delta2bbox, bbox_flip, bbox_mapping,
bbox_mapping_back, bbox2roi, roi2bbox, bbox2result) bbox_mapping_back, bbox2roi, roi2bbox, bbox2result)
from .bbox_target import bbox_target from .bbox_target import bbox_target
__all__ = [ __all__ = [
'bbox_overlaps', 'random_choice', 'bbox_assign', 'bbox_overlaps', 'BBoxAssigner', 'AssignResult', 'BBoxSampler',
'bbox_assign_wrt_overlaps', 'bbox_sampling', 'bbox_sampling_pos', 'SamplingResult', 'assign_and_sample', 'random_choice', 'bbox2delta',
'bbox_sampling_neg', 'sample_bboxes', 'bbox2delta', 'delta2bbox', 'delta2bbox', 'bbox_flip', 'bbox_mapping', 'bbox_mapping_back', 'bbox2roi',
'bbox_flip', 'bbox_mapping', 'bbox_mapping_back', 'bbox2roi', 'roi2bbox', 'roi2bbox', 'bbox2result', 'bbox_target'
'bbox2result', 'bbox_target'
] ]
```python
import torch

from .geometry import bbox_overlaps


class BBoxAssigner(object):
    """Assign a corresponding gt bbox or background to each bbox.

    Each proposals will be assigned with `-1`, `0`, or a positive integer
    indicating the ground truth index.

    - -1: don't care
    - 0: negative sample, no assigned gt
    - positive integer: positive sample, index (1-based) of assigned gt

    Args:
        pos_iou_thr (float): IoU threshold for positive bboxes.
        neg_iou_thr (float or tuple): IoU threshold for negative bboxes.
        min_pos_iou (float): Minimum iou for a bbox to be considered as a
            positive bbox. For RPN, it is usually set as 0.3, for Fast R-CNN,
            it is usually set as pos_iou_thr
        ignore_iof_thr (float): IoF threshold for ignoring bboxes (if
            `gt_bboxes_ignore` is specified). Negative values mean not
            ignoring any bboxes.
    """

    def __init__(self,
                 pos_iou_thr,
                 neg_iou_thr,
                 min_pos_iou=.0,
                 ignore_iof_thr=-1):
        self.pos_iou_thr = pos_iou_thr
        self.neg_iou_thr = neg_iou_thr
        self.min_pos_iou = min_pos_iou
        self.ignore_iof_thr = ignore_iof_thr

    def assign(self, bboxes, gt_bboxes, gt_bboxes_ignore=None, gt_labels=None):
        """Assign gt to bboxes.

        This method assign a gt bbox to every bbox (proposal/anchor), each bbox
        will be assigned with -1, 0, or a positive number. -1 means don't care,
        0 means negative sample, positive number is the index (1-based) of
        assigned gt.
        The assignment is done in following steps, the order matters.

        1. assign every bbox to -1
        2. assign proposals whose iou with all gts < neg_iou_thr to 0
        3. for each bbox, if the iou with its nearest gt >= pos_iou_thr,
           assign it to that bbox
        4. for each gt bbox, assign its nearest proposals (may be more than
           one) to itself

        Args:
            bboxes (Tensor): Bounding boxes to be assigned, shape(n, 4).
            gt_bboxes (Tensor): Groundtruth boxes, shape (k, 4).
            gt_bboxes_ignore (Tensor, optional): Ground truth bboxes that are
                labelled as `ignored`, e.g., crowd boxes in COCO.
            gt_labels (Tensor, optional): Label of gt_bboxes, shape (k, ).

        Returns:
            :obj:`AssignResult`: The assign result.
        """
        if bboxes.shape[0] == 0 or gt_bboxes.shape[0] == 0:
            raise ValueError('No gt or bboxes')
        bboxes = bboxes[:, :4]
        overlaps = bbox_overlaps(bboxes, gt_bboxes)

        if (self.ignore_iof_thr > 0) and (gt_bboxes_ignore is not None) and (
                gt_bboxes_ignore.numel() > 0):
            ignore_overlaps = bbox_overlaps(
                bboxes, gt_bboxes_ignore, mode='iof')
            ignore_max_overlaps, _ = ignore_overlaps.max(dim=1)
            ignore_bboxes_inds = torch.nonzero(
                ignore_max_overlaps > self.ignore_iof_thr).squeeze()
            if ignore_bboxes_inds.numel() > 0:
                overlaps[ignore_bboxes_inds[:, 0], :] = -1

        assign_result = self.assign_wrt_overlaps(overlaps, gt_labels)
        return assign_result

    def assign_wrt_overlaps(self, overlaps, gt_labels=None):
        """Assign w.r.t. the overlaps of bboxes with gts.

        Args:
            overlaps (Tensor): Overlaps between n bboxes and k gt_bboxes,
                shape(n, k).
            gt_labels (Tensor, optional): Labels of k gt_bboxes, shape (k, ).

        Returns:
            :obj:`AssignResult`: The assign result.
        """
        if overlaps.numel() == 0:
            raise ValueError('No gt or proposals')

        num_bboxes, num_gts = overlaps.size(0), overlaps.size(1)

        # 1. assign -1 by default
        assigned_gt_inds = overlaps.new_full(
            (num_bboxes, ), -1, dtype=torch.long)

        assert overlaps.size() == (num_bboxes, num_gts)
        # for each anchor, which gt best overlaps with it
        # for each anchor, the max iou of all gts
        max_overlaps, argmax_overlaps = overlaps.max(dim=1)
        # for each gt, which anchor best overlaps with it
        # for each gt, the max iou of all proposals
        gt_max_overlaps, gt_argmax_overlaps = overlaps.max(dim=0)

        # 2. assign negative: below
        if isinstance(self.neg_iou_thr, float):
            assigned_gt_inds[(max_overlaps >= 0)
                             & (max_overlaps < self.neg_iou_thr)] = 0
        elif isinstance(self.neg_iou_thr, tuple):
            assert len(self.neg_iou_thr) == 2
            assigned_gt_inds[(max_overlaps >= self.neg_iou_thr[0])
                             & (max_overlaps < self.neg_iou_thr[1])] = 0

        # 3. assign positive: above positive IoU threshold
        pos_inds = max_overlaps >= self.pos_iou_thr
        assigned_gt_inds[pos_inds] = argmax_overlaps[pos_inds] + 1

        # 4. assign fg: for each gt, proposals with highest IoU
        for i in range(num_gts):
            if gt_max_overlaps[i] >= self.min_pos_iou:
                assigned_gt_inds[overlaps[:, i] == gt_max_overlaps[i]] = i + 1

        if gt_labels is not None:
            assigned_labels = assigned_gt_inds.new_zeros((num_bboxes, ))
            pos_inds = torch.nonzero(assigned_gt_inds > 0).squeeze()
            if pos_inds.numel() > 0:
                assigned_labels[pos_inds] = gt_labels[
                    assigned_gt_inds[pos_inds] - 1]
        else:
            assigned_labels = None

        return AssignResult(
            num_gts, assigned_gt_inds, max_overlaps, labels=assigned_labels)


class AssignResult(object):

    def __init__(self, num_gts, gt_inds, max_overlaps, labels=None):
        self.num_gts = num_gts
        self.gt_inds = gt_inds
        self.max_overlaps = max_overlaps
        self.labels = labels

    def add_gt_(self, gt_labels):
        self_inds = torch.arange(
            1, len(gt_labels) + 1, dtype=torch.long, device=gt_labels.device)
        self.gt_inds = torch.cat([self_inds, self.gt_inds])
        self.max_overlaps = torch.cat(
            [self.max_overlaps.new_ones(self.num_gts), self.max_overlaps])
        if self.labels is not None:
            self.labels = torch.cat([gt_labels, self.labels])
```
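
For illustration, a hedged usage sketch of `BBoxAssigner` with toy boxes (thresholds taken from the RPN config above):

```python
import torch

assigner = BBoxAssigner(pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3)
anchors = torch.tensor([[8., 8., 52., 52.],
                        [0., 0., 12., 12.],
                        [55., 55., 95., 95.]])
gt_bboxes = torch.tensor([[10., 10., 50., 50.], [60., 60., 90., 90.]])
result = assigner.assign(anchors, gt_bboxes)
print(result.gt_inds)  # 1-based gt index per anchor, 0 = negative, -1 = ignore
```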
```diff
@@ -4,23 +4,23 @@ from .transforms import bbox2delta
 from ..utils import multi_apply
 
 
-def bbox_target(pos_proposals_list,
-                neg_proposals_list,
+def bbox_target(pos_bboxes_list,
+                neg_bboxes_list,
                 pos_gt_bboxes_list,
                 pos_gt_labels_list,
                 cfg,
-                reg_num_classes=1,
+                reg_classes=1,
                 target_means=[.0, .0, .0, .0],
                 target_stds=[1.0, 1.0, 1.0, 1.0],
                 concat=True):
     labels, label_weights, bbox_targets, bbox_weights = multi_apply(
-        proposal_target_single,
-        pos_proposals_list,
-        neg_proposals_list,
+        bbox_target_single,
+        pos_bboxes_list,
+        neg_bboxes_list,
         pos_gt_bboxes_list,
         pos_gt_labels_list,
         cfg=cfg,
-        reg_num_classes=reg_num_classes,
+        reg_classes=reg_classes,
         target_means=target_means,
         target_stds=target_stds)
@@ -32,34 +32,34 @@ def bbox_target(pos_bboxes_list,
     return labels, label_weights, bbox_targets, bbox_weights
 
 
-def proposal_target_single(pos_proposals,
-                           neg_proposals,
-                           pos_gt_bboxes,
-                           pos_gt_labels,
-                           cfg,
-                           reg_num_classes=1,
-                           target_means=[.0, .0, .0, .0],
-                           target_stds=[1.0, 1.0, 1.0, 1.0]):
-    num_pos = pos_proposals.size(0)
-    num_neg = neg_proposals.size(0)
+def bbox_target_single(pos_bboxes,
+                       neg_bboxes,
+                       pos_gt_bboxes,
+                       pos_gt_labels,
+                       cfg,
+                       reg_classes=1,
+                       target_means=[.0, .0, .0, .0],
+                       target_stds=[1.0, 1.0, 1.0, 1.0]):
+    num_pos = pos_bboxes.size(0)
+    num_neg = neg_bboxes.size(0)
     num_samples = num_pos + num_neg
-    labels = pos_proposals.new_zeros(num_samples, dtype=torch.long)
-    label_weights = pos_proposals.new_zeros(num_samples)
-    bbox_targets = pos_proposals.new_zeros(num_samples, 4)
-    bbox_weights = pos_proposals.new_zeros(num_samples, 4)
+    labels = pos_bboxes.new_zeros(num_samples, dtype=torch.long)
+    label_weights = pos_bboxes.new_zeros(num_samples)
+    bbox_targets = pos_bboxes.new_zeros(num_samples, 4)
+    bbox_weights = pos_bboxes.new_zeros(num_samples, 4)
     if num_pos > 0:
         labels[:num_pos] = pos_gt_labels
         pos_weight = 1.0 if cfg.pos_weight <= 0 else cfg.pos_weight
         label_weights[:num_pos] = pos_weight
-        pos_bbox_targets = bbox2delta(pos_proposals, pos_gt_bboxes,
-                                      target_means, target_stds)
+        pos_bbox_targets = bbox2delta(pos_bboxes, pos_gt_bboxes, target_means,
+                                      target_stds)
         bbox_targets[:num_pos, :] = pos_bbox_targets
         bbox_weights[:num_pos, :] = 1
     if num_neg > 0:
         label_weights[-num_neg:] = 1.0
-    if reg_num_classes > 1:
+    if reg_classes > 1:
         bbox_targets, bbox_weights = expand_target(bbox_targets, bbox_weights,
-                                                   labels, reg_num_classes)
+                                                   labels, reg_classes)
 
     return labels, label_weights, bbox_targets, bbox_weights
 ...
```
import numpy as np import numpy as np
import torch import torch
from .geometry import bbox_overlaps from .assignment import BBoxAssigner
def random_choice(gallery, num): def random_choice(gallery, num):
...@@ -21,158 +21,68 @@ def random_choice(gallery, num): ...@@ -21,158 +21,68 @@ def random_choice(gallery, num):
return gallery[rand_inds] return gallery[rand_inds]
def bbox_assign(proposals, def assign_and_sample(bboxes, gt_bboxes, gt_bboxes_ignore, gt_labels, cfg):
gt_bboxes, bbox_assigner = BBoxAssigner(**cfg.assigner)
gt_bboxes_ignore=None, bbox_sampler = BBoxSampler(**cfg.sampler)
gt_labels=None, assign_result = bbox_assigner.assign(bboxes, gt_bboxes, gt_bboxes_ignore,
pos_iou_thr=0.5, gt_labels)
neg_iou_thr=0.5, sampling_result = bbox_sampler.sample(assign_result, bboxes, gt_bboxes,
min_pos_iou=.0, gt_labels)
crowd_thr=-1): return assign_result, sampling_result
"""Assign a corresponding gt bbox or background to each proposal/anchor.
Each proposals will be assigned with `-1`, `0`, or a positive integer.
- -1: don't care class BBoxSampler(object):
- 0: negative sample, no assigned gt """Sample positive and negative bboxes given assigned results.
- positive integer: positive sample, index (1-based) of assigned gt
If `gt_bboxes_ignore` is specified, bboxes which have iof (intersection
over foreground) with `gt_bboxes_ignore` above `crowd_thr` will be ignored.
Args:
proposals (Tensor): Proposals or RPN anchors, shape (n, 4).
gt_bboxes (Tensor): Ground truth bboxes, shape (k, 4).
gt_bboxes_ignore (Tensor, optional): shape(m, 4).
gt_labels (Tensor, optional): shape (k, ).
pos_iou_thr (float): IoU threshold for positive bboxes.
neg_iou_thr (float or tuple): IoU threshold for negative bboxes.
min_pos_iou (float): Minimum iou for a bbox to be considered as a
positive bbox. For RPN, it is usually set as 0.3, for Fast R-CNN,
it is usually set as pos_iou_thr
crowd_thr (float): IoF threshold for ignoring bboxes. Negative value
for not ignoring any bboxes.
Returns:
tuple: (assigned_gt_inds, argmax_overlaps, max_overlaps), shape (n, )
"""
# calculate overlaps between the proposals and the gt boxes
overlaps = bbox_overlaps(proposals, gt_bboxes)
if overlaps.numel() == 0:
raise ValueError('No gt bbox or proposals')
# ignore proposals according to crowd bboxes
if (crowd_thr > 0) and (gt_bboxes_ignore is
not None) and (gt_bboxes_ignore.numel() > 0):
crowd_overlaps = bbox_overlaps(proposals, gt_bboxes_ignore, mode='iof')
crowd_max_overlaps, _ = crowd_overlaps.max(dim=1)
crowd_bboxes_inds = torch.nonzero(
crowd_max_overlaps > crowd_thr).long()
if crowd_bboxes_inds.numel() > 0:
overlaps[crowd_bboxes_inds, :] = -1
return bbox_assign_wrt_overlaps(overlaps, gt_labels, pos_iou_thr,
neg_iou_thr, min_pos_iou)
def bbox_assign_wrt_overlaps(overlaps,
gt_labels=None,
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=.0):
"""Assign a corresponding gt bbox or background to each proposal/anchor.
This method assign a gt bbox to every proposal, each proposals will be
assigned with -1, 0, or a positive number. -1 means don't care, 0 means
negative sample, positive number is the index (1-based) of assigned gt.
The assignment is done in following steps, the order matters:
1. assign every anchor to -1
2. assign proposals whose iou with all gts < neg_iou_thr to 0
3. for each anchor, if the iou with its nearest gt >= pos_iou_thr,
assign it to that bbox
4. for each gt bbox, assign its nearest proposals(may be more than one)
to itself
Args: Args:
overlaps (Tensor): Overlaps between n proposals and k gt_bboxes, pos_fraction (float): Positive sample fraction.
shape(n, k). neg_pos_ub (float): Negative/Positive upper bound.
gt_labels (Tensor, optional): Labels of k gt_bboxes, shape (k, ). pos_balance_sampling (bool): Whether to sample positive samples around
pos_iou_thr (float): IoU threshold for positive bboxes. each gt bbox evenly.
neg_iou_thr (float or tuple): IoU threshold for negative bboxes. neg_balance_thr (float, optional): IoU threshold for simple/hard
min_pos_iou (float): Minimum IoU for a bbox to be considered as a negative balance sampling.
positive bbox. This argument only affects the 4th step. neg_hard_fraction (float, optional): Fraction of hard negative samples
for negative balance sampling.
Returns:
tuple: (assigned_gt_inds, [assigned_labels], argmax_overlaps,
max_overlaps), shape (n, )
""" """
num_bboxes, num_gts = overlaps.size(0), overlaps.size(1)
# 1. assign -1 by default
assigned_gt_inds = overlaps.new(num_bboxes).long().fill_(-1)
if overlaps.numel() == 0:
raise ValueError('No gt bbox or proposals')
assert overlaps.size() == (num_bboxes, num_gts)
# for each anchor, which gt best overlaps with it
# for each anchor, the max iou of all gts
max_overlaps, argmax_overlaps = overlaps.max(dim=1)
# for each gt, which anchor best overlaps with it
# for each gt, the max iou of all proposals
gt_max_overlaps, gt_argmax_overlaps = overlaps.max(dim=0)
# 2. assign negative: below
if isinstance(neg_iou_thr, float):
assigned_gt_inds[(max_overlaps >= 0)
& (max_overlaps < neg_iou_thr)] = 0
elif isinstance(neg_iou_thr, tuple):
assert len(neg_iou_thr) == 2
assigned_gt_inds[(max_overlaps >= neg_iou_thr[0])
& (max_overlaps < neg_iou_thr[1])] = 0
# 3. assign positive: above positive IoU threshold def __init__(self,
pos_inds = max_overlaps >= pos_iou_thr num,
assigned_gt_inds[pos_inds] = argmax_overlaps[pos_inds] + 1 pos_fraction,
neg_pos_ub=-1,
# 4. assign fg: for each gt, proposals with highest IoU add_gt_as_proposals=True,
for i in range(num_gts): pos_balance_sampling=False,
if gt_max_overlaps[i] >= min_pos_iou: neg_balance_thr=0,
assigned_gt_inds[overlaps[:, i] == gt_max_overlaps[i]] = i + 1 neg_hard_fraction=0.5):
self.num = num
if gt_labels is None: self.pos_fraction = pos_fraction
return assigned_gt_inds, argmax_overlaps, max_overlaps self.neg_pos_ub = neg_pos_ub
else: self.add_gt_as_proposals = add_gt_as_proposals
assigned_labels = assigned_gt_inds.new(num_bboxes).fill_(0) self.pos_balance_sampling = pos_balance_sampling
pos_inds = torch.nonzero(assigned_gt_inds > 0).squeeze() self.neg_balance_thr = neg_balance_thr
if pos_inds.numel() > 0: self.neg_hard_fraction = neg_hard_fraction
assigned_labels[pos_inds] = gt_labels[assigned_gt_inds[pos_inds] -
1] def _sample_pos(self, assign_result, num_expected):
return assigned_gt_inds, assigned_labels, argmax_overlaps, max_overlaps
def bbox_sampling_pos(assigned_gt_inds, num_expected, balance_sampling=True):
"""Balance sampling for positive bboxes/anchors. """Balance sampling for positive bboxes/anchors.
1. calculate average positive num for each gt: num_per_gt 1. calculate average positive num for each gt: num_per_gt
2. sample at most num_per_gt positives for each gt 2. sample at most num_per_gt positives for each gt
3. random sampling from rest anchors if not enough fg 3. random sampling from rest anchors if not enough fg
""" """
pos_inds = torch.nonzero(assigned_gt_inds > 0) pos_inds = torch.nonzero(assign_result.gt_inds > 0)
if pos_inds.numel() != 0: if pos_inds.numel() != 0:
pos_inds = pos_inds.squeeze(1) pos_inds = pos_inds.squeeze(1)
if pos_inds.numel() <= num_expected: if pos_inds.numel() <= num_expected:
return pos_inds return pos_inds
elif not balance_sampling: elif not self.pos_balance_sampling:
return random_choice(pos_inds, num_expected) return random_choice(pos_inds, num_expected)
else: else:
unique_gt_inds = torch.unique(assigned_gt_inds[pos_inds].cpu()) unique_gt_inds = torch.unique(
assign_result.gt_inds[pos_inds].cpu())
num_gts = len(unique_gt_inds) num_gts = len(unique_gt_inds)
num_per_gt = int(round(num_expected / float(num_gts)) + 1) num_per_gt = int(round(num_expected / float(num_gts)) + 1)
sampled_inds = [] sampled_inds = []
for i in unique_gt_inds: for i in unique_gt_inds:
inds = torch.nonzero(assigned_gt_inds == i.item()) inds = torch.nonzero(assign_result.gt_inds == i.item())
if inds.numel() != 0: if inds.numel() != 0:
inds = inds.squeeze(1) inds = inds.squeeze(1)
else: else:
...@@ -188,56 +98,53 @@ def bbox_sampling_pos(assigned_gt_inds, num_expected, balance_sampling=True): ...@@ -188,56 +98,53 @@ def bbox_sampling_pos(assigned_gt_inds, num_expected, balance_sampling=True):
if len(extra_inds) > num_extra: if len(extra_inds) > num_extra:
extra_inds = random_choice(extra_inds, num_extra) extra_inds = random_choice(extra_inds, num_extra)
extra_inds = torch.from_numpy(extra_inds).to( extra_inds = torch.from_numpy(extra_inds).to(
assigned_gt_inds.device).long() assign_result.gt_inds.device).long()
sampled_inds = torch.cat([sampled_inds, extra_inds]) sampled_inds = torch.cat([sampled_inds, extra_inds])
elif len(sampled_inds) > num_expected: elif len(sampled_inds) > num_expected:
sampled_inds = random_choice(sampled_inds, num_expected) sampled_inds = random_choice(sampled_inds, num_expected)
return sampled_inds return sampled_inds
def _sample_neg(self, assign_result, num_expected):
def bbox_sampling_neg(assigned_gt_inds,
num_expected,
max_overlaps=None,
balance_thr=0,
hard_fraction=0.5):
"""Balance sampling for negative bboxes/anchors. """Balance sampling for negative bboxes/anchors.
Negative samples are split into 2 set: hard (balance_thr <= iou < Negative samples are split into 2 set: hard (balance_thr <= iou <
neg_iou_thr) and easy(iou < balance_thr). The sampling ratio is controlled neg_iou_thr) and easy (iou < balance_thr). The sampling ratio is
by `hard_fraction`. controlled by `hard_fraction`.
""" """
neg_inds = torch.nonzero(assigned_gt_inds == 0) neg_inds = torch.nonzero(assign_result.gt_inds == 0)
if neg_inds.numel() != 0: if neg_inds.numel() != 0:
neg_inds = neg_inds.squeeze(1) neg_inds = neg_inds.squeeze(1)
if len(neg_inds) <= num_expected: if len(neg_inds) <= num_expected:
return neg_inds return neg_inds
elif balance_thr <= 0: elif self.neg_balance_thr <= 0:
# uniform sampling among all negative samples # uniform sampling among all negative samples
return random_choice(neg_inds, num_expected) return random_choice(neg_inds, num_expected)
else: else:
assert max_overlaps is not None max_overlaps = assign_result.max_overlaps.cpu().numpy()
max_overlaps = max_overlaps.cpu().numpy()
# balance sampling for negative samples # balance sampling for negative samples
neg_set = set(neg_inds.cpu().numpy()) neg_set = set(neg_inds.cpu().numpy())
easy_set = set( easy_set = set(
np.where( np.where(
np.logical_and(max_overlaps >= 0, np.logical_and(max_overlaps >= 0,
max_overlaps < balance_thr))[0]) max_overlaps < self.neg_balance_thr))[0])
hard_set = set(np.where(max_overlaps >= balance_thr)[0]) hard_set = set(np.where(max_overlaps >= self.neg_balance_thr)[0])
easy_neg_inds = list(easy_set & neg_set) easy_neg_inds = list(easy_set & neg_set)
hard_neg_inds = list(hard_set & neg_set) hard_neg_inds = list(hard_set & neg_set)
num_expected_hard = int(num_expected * hard_fraction) num_expected_hard = int(num_expected * self.neg_hard_fraction)
if len(hard_neg_inds) > num_expected_hard: if len(hard_neg_inds) > num_expected_hard:
sampled_hard_inds = random_choice(hard_neg_inds, num_expected_hard) sampled_hard_inds = random_choice(hard_neg_inds,
num_expected_hard)
else: else:
sampled_hard_inds = np.array(hard_neg_inds, dtype=np.int) sampled_hard_inds = np.array(hard_neg_inds, dtype=np.int)
num_expected_easy = num_expected - len(sampled_hard_inds) num_expected_easy = num_expected - len(sampled_hard_inds)
if len(easy_neg_inds) > num_expected_easy: if len(easy_neg_inds) > num_expected_easy:
sampled_easy_inds = random_choice(easy_neg_inds, num_expected_easy) sampled_easy_inds = random_choice(easy_neg_inds,
num_expected_easy)
else: else:
sampled_easy_inds = np.array(easy_neg_inds, dtype=np.int) sampled_easy_inds = np.array(easy_neg_inds, dtype=np.int)
sampled_inds = np.concatenate((sampled_easy_inds, sampled_hard_inds)) sampled_inds = np.concatenate((sampled_easy_inds,
sampled_hard_inds))
if len(sampled_inds) < num_expected: if len(sampled_inds) < num_expected:
num_extra = num_expected - len(sampled_inds) num_extra = num_expected - len(sampled_inds)
extra_inds = np.array(list(neg_set - set(sampled_inds))) extra_inds = np.array(list(neg_set - set(sampled_inds)))
@@ -245,99 +152,76 @@ def bbox_sampling_neg(assigned_gt_inds,
     extra_inds = random_choice(extra_inds, num_extra)
     sampled_inds = np.concatenate((sampled_inds, extra_inds))
     sampled_inds = torch.from_numpy(sampled_inds).long().to(
-        assigned_gt_inds.device)
+        assign_result.gt_inds.device)
     return sampled_inds


-def bbox_sampling(assigned_gt_inds,
-                  num_expected,
-                  pos_fraction,
-                  neg_pos_ub,
-                  pos_balance_sampling=True,
-                  max_overlaps=None,
-                  neg_balance_thr=0,
-                  neg_hard_fraction=0.5):
-    """Sample positive and negative bboxes given assigned results.
-
-    Args:
-        assigned_gt_inds (Tensor): Assigned gt indices for each bbox.
-        num_expected (int): Expected total samples (pos and neg).
-        pos_fraction (float): Positive sample fraction.
-        neg_pos_ub (float): Negative/Positive upper bound.
-        pos_balance_sampling (bool): Whether to sample positive samples around
-            each gt bbox evenly.
-        max_overlaps (Tensor, optional): For each bbox, the max IoU of all gts.
-            Used for negative balance sampling only.
-        neg_balance_thr (float, optional): IoU threshold for simple/hard
-            negative balance sampling.
-        neg_hard_fraction (float, optional): Fraction of hard negative samples
-            for negative balance sampling.
-
-    Returns:
-        tuple[Tensor]: positive bbox indices, negative bbox indices.
-    """
-    num_expected_pos = int(num_expected * pos_fraction)
-    pos_inds = bbox_sampling_pos(assigned_gt_inds, num_expected_pos,
-                                 pos_balance_sampling)
-    # We found that sampled indices have duplicated items occasionally.
-    # (may be a bug of PyTorch)
-    pos_inds = pos_inds.unique()
-    num_sampled_pos = pos_inds.numel()
-    num_neg_max = int(
-        neg_pos_ub *
-        num_sampled_pos) if num_sampled_pos > 0 else int(neg_pos_ub)
-    num_expected_neg = min(num_neg_max, num_expected - num_sampled_pos)
-    neg_inds = bbox_sampling_neg(assigned_gt_inds, num_expected_neg,
-                                 max_overlaps, neg_balance_thr,
-                                 neg_hard_fraction)
-    neg_inds = neg_inds.unique()
-    return pos_inds, neg_inds
-
-
-def sample_bboxes(bboxes, gt_bboxes, gt_bboxes_ignore, gt_labels, cfg):
-    """Sample positive and negative bboxes.
-
-    This is a simple implementation of bbox sampling given candidates and
-    ground truth bboxes, which includes 3 steps.
-
-    1. Assign gt to each bbox.
-    2. Add gt bboxes to the sampling pool (optional).
-    3. Perform positive and negative sampling.
-
-    Args:
-        bboxes (Tensor): Boxes to be sampled from.
-        gt_bboxes (Tensor): Ground truth bboxes.
-        gt_bboxes_ignore (Tensor): Ignored ground truth bboxes. In MS COCO,
-            `crowd` bboxes are considered as ignored.
-        gt_labels (Tensor): Class labels of ground truth bboxes.
-        cfg (dict): Sampling configs.
-
-    Returns:
-        tuple[Tensor]: pos_bboxes, neg_bboxes, pos_assigned_gt_inds,
-            pos_gt_bboxes, pos_gt_labels
-    """
-    bboxes = bboxes[:, :4]
-    assigned_gt_inds, assigned_labels, argmax_overlaps, max_overlaps = \
-        bbox_assign(bboxes, gt_bboxes, gt_bboxes_ignore, gt_labels,
-                    cfg.pos_iou_thr, cfg.neg_iou_thr, cfg.min_pos_iou,
-                    cfg.crowd_thr)
-    if cfg.add_gt_as_proposals:
-        bboxes = torch.cat([gt_bboxes, bboxes], dim=0)
-        gt_assign_self = torch.arange(
-            1, len(gt_labels) + 1, dtype=torch.long, device=bboxes.device)
-        assigned_gt_inds = torch.cat([gt_assign_self, assigned_gt_inds])
-        assigned_labels = torch.cat([gt_labels, assigned_labels])
-
-    pos_inds, neg_inds = bbox_sampling(
-        assigned_gt_inds, cfg.roi_batch_size, cfg.pos_fraction, cfg.neg_pos_ub,
-        cfg.pos_balance_sampling, max_overlaps, cfg.neg_balance_thr)
-    pos_bboxes = bboxes[pos_inds]
-    neg_bboxes = bboxes[neg_inds]
-    pos_assigned_gt_inds = assigned_gt_inds[pos_inds] - 1
-    pos_gt_bboxes = gt_bboxes[pos_assigned_gt_inds, :]
-    pos_gt_labels = assigned_labels[pos_inds]
-    return (pos_bboxes, neg_bboxes, pos_assigned_gt_inds, pos_gt_bboxes,
-            pos_gt_labels)
+    def sample(self, assign_result, bboxes, gt_bboxes, gt_labels=None):
+        """Sample positive and negative bboxes.
+
+        This is a simple implementation of bbox sampling given candidates,
+        assigning results and ground truth bboxes.
+
+        1. Assign gt to each bbox.
+        2. Add gt bboxes to the sampling pool (optional).
+        3. Perform positive and negative sampling.
+
+        Args:
+            assign_result (:obj:`AssignResult`): Bbox assigning results.
+            bboxes (Tensor): Boxes to be sampled from.
+            gt_bboxes (Tensor): Ground truth bboxes.
+            gt_labels (Tensor, optional): Class labels of ground truth bboxes.
+
+        Returns:
+            :obj:`SamplingResult`: Sampling result.
+        """
+        bboxes = bboxes[:, :4]
+
+        gt_flags = bboxes.new_zeros((bboxes.shape[0], ), dtype=torch.uint8)
+        if self.add_gt_as_proposals:
+            bboxes = torch.cat([gt_bboxes, bboxes], dim=0)
+            assign_result.add_gt_(gt_labels)
+            gt_flags = torch.cat([
+                bboxes.new_ones((gt_bboxes.shape[0], ), dtype=torch.uint8),
+                gt_flags
+            ])
+
+        num_expected_pos = int(self.num * self.pos_fraction)
+        pos_inds = self._sample_pos(assign_result, num_expected_pos)
+        # We found that sampled indices have duplicated items occasionally.
+        # (may be a bug of PyTorch)
+        pos_inds = pos_inds.unique()
+        num_sampled_pos = pos_inds.numel()
+        num_expected_neg = self.num - num_sampled_pos
+        if self.neg_pos_ub >= 0:
+            num_neg_max = int(self.neg_pos_ub *
+                              num_sampled_pos) if num_sampled_pos > 0 else int(
+                                  self.neg_pos_ub)
+            num_expected_neg = min(num_neg_max, num_expected_neg)
+        neg_inds = self._sample_neg(assign_result, num_expected_neg)
+        neg_inds = neg_inds.unique()
+
+        return SamplingResult(pos_inds, neg_inds, bboxes, gt_bboxes,
+                              assign_result, gt_flags)
+
+
+class SamplingResult(object):
+
+    def __init__(self, pos_inds, neg_inds, bboxes, gt_bboxes, assign_result,
+                 gt_flags):
+        self.pos_inds = pos_inds
+        self.neg_inds = neg_inds
+        self.pos_bboxes = bboxes[pos_inds]
+        self.neg_bboxes = bboxes[neg_inds]
+        self.pos_is_gt = gt_flags[pos_inds]
+
+        self.num_gts = gt_bboxes.shape[0]
+        self.pos_assigned_gt_inds = assign_result.gt_inds[pos_inds] - 1
+        self.pos_gt_bboxes = gt_bboxes[self.pos_assigned_gt_inds, :]
+        if assign_result.labels is not None:
+            self.pos_gt_labels = assign_result.labels[pos_inds]
+        else:
+            self.pos_gt_labels = None

+    @property
+    def bboxes(self):
+        return torch.cat([self.pos_bboxes, self.neg_bboxes])
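The hunk above replaces the free-standing `sample_bboxes` pipeline with a sampler method that returns a `SamplingResult`. The toy script below checks the contract of that result object; it assumes the `SamplingResult` class from the hunk is in scope, and `SimpleAssignResult` plus the hand-picked indices are invented for illustration only.

```python
# Toy check of the SamplingResult contract; assumes SamplingResult (from the
# hunk above) is importable. SimpleAssignResult is a stand-in assigner output.
import torch


class SimpleAssignResult(object):
    """Stand-in: gt_inds is 0 for negatives and i + 1 for the i-th gt box."""

    def __init__(self, gt_inds, labels=None):
        self.gt_inds = gt_inds
        self.labels = labels


bboxes = torch.rand(8, 4) * 100     # candidate proposals (x1, y1, x2, y2)
gt_bboxes = torch.rand(2, 4) * 100  # two ground-truth boxes
# pretend proposals 0 and 3 matched gt 1 and gt 2; the rest are negatives
assign = SimpleAssignResult(
    gt_inds=torch.tensor([1, 0, 0, 2, 0, 0, 0, 0]),
    labels=torch.tensor([1, 0, 0, 3, 0, 0, 0, 0]))

pos_inds = torch.nonzero(assign.gt_inds > 0).squeeze(1)
neg_inds = torch.nonzero(assign.gt_inds == 0).squeeze(1)
gt_flags = bboxes.new_zeros((bboxes.shape[0], ), dtype=torch.uint8)

result = SamplingResult(pos_inds, neg_inds, bboxes, gt_bboxes, assign,
                        gt_flags)
print(result.pos_gt_bboxes.shape)  # (2, 4): the matched gt box per positive
print(result.pos_gt_labels)        # tensor([1, 3])
print(result.bboxes.shape)         # (8, 4): positives stacked over negatives
```

Bundling indices, boxes, and matched gt info into one object is what lets the heads below (`get_target` in both bbox and mask heads) take a single `sampling_results` list instead of five parallel lists.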
@@ -203,13 +203,22 @@ class CocoDataset(Dataset):
         # load proposals if necessary
         if self.proposals is not None:
-            proposals = self.proposals[idx][:self.num_max_proposals, :4]
+            proposals = self.proposals[idx][:self.num_max_proposals]
             # TODO: Handle empty proposals properly. Currently images with
             # no proposals are simply skipped, although in principle they
             # could still be used for training.
             if len(proposals) == 0:
                 idx = self._rand_another(idx)
                 continue
+            if not (proposals.shape[1] == 4 or proposals.shape[1] == 5):
+                raise AssertionError(
+                    'proposals should have shapes (n, 4) or (n, 5), '
+                    'but found {}'.format(proposals.shape))
+            if proposals.shape[1] == 5:
+                scores = proposals[:, 4, None]
+                proposals = proposals[:, :4]
+            else:
+                scores = None
+
         ann = self._parse_ann_info(ann_info, self.with_mask)
         gt_bboxes = ann['bboxes']
@@ -228,6 +237,8 @@ class CocoDataset(Dataset):
         if self.proposals is not None:
             proposals = self.bbox_transform(proposals, img_shape,
                                             scale_factor, flip)
+            proposals = np.hstack(
+                [proposals, scores]) if scores is not None else proposals
         gt_bboxes = self.bbox_transform(gt_bboxes, img_shape, scale_factor,
                                         flip)
         gt_bboxes_ignore = self.bbox_transform(gt_bboxes_ignore, img_shape,
@@ -263,8 +274,14 @@ class CocoDataset(Dataset):
         """Prepare an image for testing (multi-scale and flipping)"""
         img_info = self.img_infos[idx]
         img = mmcv.imread(osp.join(self.img_prefix, img_info['file_name']))
-        proposal = (self.proposals[idx][:, :4]
-                    if self.proposals is not None else None)
+        if self.proposals is not None:
+            proposal = self.proposals[idx][:self.num_max_proposals]
+            if not (proposal.shape[1] == 4 or proposal.shape[1] == 5):
+                raise AssertionError(
+                    'proposals should have shapes (n, 4) or (n, 5), '
+                    'but found {}'.format(proposal.shape))
+        else:
+            proposal = None

         def prepare_single(img, scale, flip, proposal=None):
             _img, img_shape, pad_shape, scale_factor = self.img_transform(
@@ -277,8 +294,15 @@ class CocoDataset(Dataset):
                 scale_factor=scale_factor,
                 flip=flip)
             if proposal is not None:
+                if proposal.shape[1] == 5:
+                    score = proposal[:, 4, None]
+                    proposal = proposal[:, :4]
+                else:
+                    score = None
                 _proposal = self.bbox_transform(proposal, img_shape,
                                                 scale_factor, flip)
+                _proposal = np.hstack(
+                    [_proposal, score]) if score is not None else _proposal
                 _proposal = to_tensor(_proposal)
             else:
                 _proposal = None
...
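The `CocoDataset` changes above let pre-computed proposals carry an optional score column: `(n, 5)` proposals are split into boxes and scores, only the boxes go through `bbox_transform`, and the scores are re-attached afterwards. A minimal numpy sketch of that round trip, where `flip_boxes` is a stand-in for the dataset's `bbox_transform`:

```python
# Sketch of (n, 5) proposal handling: split off the score column, transform
# only the box coordinates, then re-attach the scores unchanged.
import numpy as np


def flip_boxes(boxes, img_width):
    """Horizontally flip (n, 4) boxes; a stand-in for bbox_transform."""
    flipped = boxes.copy()
    flipped[:, 0] = img_width - boxes[:, 2] - 1
    flipped[:, 2] = img_width - boxes[:, 0] - 1
    return flipped


proposals = np.array([[10., 10., 50., 60., 0.9],
                      [20., 30., 80., 90., 0.4]])  # (n, 5) with scores

assert proposals.shape[1] in (4, 5), \
    'proposals should have shapes (n, 4) or (n, 5)'

if proposals.shape[1] == 5:
    scores = proposals[:, 4, None]  # keep shape (n, 1) so hstack works
    proposals = proposals[:, :4]
else:
    scores = None

proposals = flip_boxes(proposals, img_width=100)
proposals = np.hstack([proposals, scores]) if scores is not None else proposals
print(proposals)  # boxes flipped, scores preserved in column 4
```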
 from .bbox_head import BBoxHead
-from .convfc_bbox_head import ConvFCRoIHead, SharedFCRoIHead
+from .convfc_bbox_head import ConvFCBBoxHead, SharedFCBBoxHead

-__all__ = ['BBoxHead', 'ConvFCRoIHead', 'SharedFCRoIHead']
+__all__ = ['BBoxHead', 'ConvFCBBoxHead', 'SharedFCBBoxHead']
@@ -59,16 +59,20 @@ class BBoxHead(nn.Module):
         bbox_pred = self.fc_reg(x) if self.with_reg else None
         return cls_score, bbox_pred

-    def get_bbox_target(self, pos_proposals, neg_proposals, pos_gt_bboxes,
-                        pos_gt_labels, rcnn_train_cfg):
-        reg_num_classes = 1 if self.reg_class_agnostic else self.num_classes
+    def get_target(self, sampling_results, gt_bboxes, gt_labels,
+                   rcnn_train_cfg):
+        pos_proposals = [res.pos_bboxes for res in sampling_results]
+        neg_proposals = [res.neg_bboxes for res in sampling_results]
+        pos_gt_bboxes = [res.pos_gt_bboxes for res in sampling_results]
+        pos_gt_labels = [res.pos_gt_labels for res in sampling_results]
+        reg_classes = 1 if self.reg_class_agnostic else self.num_classes
         cls_reg_targets = bbox_target(
             pos_proposals,
             neg_proposals,
             pos_gt_bboxes,
             pos_gt_labels,
             rcnn_train_cfg,
-            reg_num_classes,
+            reg_classes,
             target_means=self.target_means,
             target_stds=self.target_stds)
         return cls_reg_targets
...
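`get_target` now takes the per-image `SamplingResult` list directly and unpacks the fields that `bbox_target` needs. A small sketch of that unpacking, using namedtuple stand-ins in place of real sampling results:

```python
# Minimal sketch of what get_target does before calling bbox_target: unpack
# one field per image from a list of sampling results. FakeRes is a stand-in.
import torch
from collections import namedtuple

FakeRes = namedtuple(
    'FakeRes', ['pos_bboxes', 'neg_bboxes', 'pos_gt_bboxes', 'pos_gt_labels'])

sampling_results = [
    FakeRes(torch.rand(4, 4), torch.rand(12, 4), torch.rand(4, 4),
            torch.randint(0, 81, (4, ))),
    FakeRes(torch.rand(2, 4), torch.rand(14, 4), torch.rand(2, 4),
            torch.randint(0, 81, (2, ))),
]

pos_proposals = [res.pos_bboxes for res in sampling_results]
neg_proposals = [res.neg_bboxes for res in sampling_results]
# bbox_target receives per-image lists, so images with different numbers of
# sampled boxes can be handled without padding.
print([p.shape[0] for p in pos_proposals])  # [4, 2]
print([n.shape[0] for n in neg_proposals])  # [12, 14]
```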
@@ -4,7 +4,7 @@ from .bbox_head import BBoxHead
 from ..utils import ConvModule


-class ConvFCRoIHead(BBoxHead):
+class ConvFCBBoxHead(BBoxHead):
     """More general bbox head, with shared conv and fc layers and two optional
     separated branches.
@@ -22,9 +22,10 @@ class ConvFCRoIHead(BBoxHead):
                  num_reg_fcs=0,
                  conv_out_channels=256,
                  fc_out_channels=1024,
+                 normalize=None,
                  *args,
                  **kwargs):
-        super(ConvFCRoIHead, self).__init__(*args, **kwargs)
+        super(ConvFCBBoxHead, self).__init__(*args, **kwargs)
         assert (num_shared_convs + num_shared_fcs + num_cls_convs + num_cls_fcs
                 + num_reg_convs + num_reg_fcs > 0)
         if num_cls_convs > 0 or num_reg_convs > 0:
@@ -41,6 +42,8 @@ class ConvFCRoIHead(BBoxHead):
         self.num_reg_fcs = num_reg_fcs
         self.conv_out_channels = conv_out_channels
         self.fc_out_channels = fc_out_channels
+        self.normalize = normalize
+        self.with_bias = normalize is None

         # add shared convs and fcs
         self.shared_convs, self.shared_fcs, last_layer_dim = \
@@ -116,7 +119,7 @@ class ConvFCRoIHead(BBoxHead):
         return branch_convs, branch_fcs, last_layer_dim

     def init_weights(self):
-        super(ConvFCRoIHead, self).init_weights()
+        super(ConvFCBBoxHead, self).init_weights()
         for module_list in [self.shared_fcs, self.cls_fcs, self.reg_fcs]:
             for m in module_list.modules():
                 if isinstance(m, nn.Linear):
@@ -162,11 +165,11 @@ class ConvFCRoIHead(BBoxHead):
         return cls_score, bbox_pred


-class SharedFCRoIHead(ConvFCRoIHead):
+class SharedFCBBoxHead(ConvFCBBoxHead):

     def __init__(self, num_fcs=2, fc_out_channels=1024, *args, **kwargs):
         assert num_fcs >= 1
-        super(SharedFCRoIHead, self).__init__(
+        super(SharedFCBBoxHead, self).__init__(
             num_shared_convs=0,
             num_shared_fcs=num_fcs,
             num_cls_convs=0,
...
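The new `normalize` argument and the `with_bias = normalize is None` line follow a common convention: a convolution followed by a normalization layer does not need its own bias, because the normalization's shift parameter absorbs it. A plain-PyTorch sketch of that convention; this is not the actual `ConvModule` implementation, and the `'bn'` config value is assumed for the example:

```python
# Bias is only enabled when no normalization follows the conv, since a norm
# layer's learned shift makes the conv bias redundant.
import torch.nn as nn


def make_conv(in_ch, out_ch, normalize=None):
    with_bias = normalize is None
    layers = [nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=with_bias)]
    if normalize == 'bn':  # assumed config value for this sketch
        layers.append(nn.BatchNorm2d(out_ch))
    return nn.Sequential(*layers)


print(make_conv(256, 256))                  # conv with bias, no norm
print(make_conv(256, 256, normalize='bn'))  # bias-free conv + BatchNorm
```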
@@ -4,7 +4,7 @@ import torch.nn as nn
 from .base import BaseDetector
 from .test_mixins import RPNTestMixin, BBoxTestMixin, MaskTestMixin
 from .. import builder
-from mmdet.core import sample_bboxes, bbox2roi, bbox2result, multi_apply
+from mmdet.core import (assign_and_sample, bbox2roi, bbox2result, multi_apply)


 class TwoStageDetector(BaseDetector, RPNTestMixin, BBoxTestMixin,
@@ -80,10 +80,11 @@ class TwoStageDetector(BaseDetector, RPNTestMixin, BBoxTestMixin,
                       gt_labels,
                       gt_masks=None,
                       proposals=None):
-        losses = dict()
-
         x = self.extract_feat(img)

+        losses = dict()
+
+        # RPN forward and loss
         if self.with_rpn:
             rpn_outs = self.rpn_head(x)
             rpn_loss_inputs = rpn_outs + (gt_bboxes, img_meta,
@@ -96,44 +97,43 @@ class TwoStageDetector(BaseDetector, RPNTestMixin, BBoxTestMixin,
         else:
             proposal_list = proposals

-        if self.with_bbox:
-            (pos_proposals, neg_proposals, pos_assigned_gt_inds, pos_gt_bboxes,
-             pos_gt_labels) = multi_apply(
-                 sample_bboxes,
+        # assign gts and sample proposals
+        if self.with_bbox or self.with_mask:
+            assign_results, sampling_results = multi_apply(
+                assign_and_sample,
                 proposal_list,
                 gt_bboxes,
                 gt_bboxes_ignore,
                 gt_labels,
                 cfg=self.train_cfg.rcnn)
-            (labels, label_weights, bbox_targets,
-             bbox_weights) = self.bbox_head.get_bbox_target(
-                 pos_proposals, neg_proposals, pos_gt_bboxes, pos_gt_labels,
-                 self.train_cfg.rcnn)
-
-            rois = bbox2roi([
-                torch.cat([pos, neg], dim=0)
-                for pos, neg in zip(pos_proposals, neg_proposals)
-            ])
-            # TODO: a more flexible way to configure feat maps
-            roi_feats = self.bbox_roi_extractor(
+
+        # bbox head forward and loss
+        if self.with_bbox:
+            rois = bbox2roi([res.bboxes for res in sampling_results])
+            # TODO: a more flexible way to decide which feature maps to use
+            bbox_feats = self.bbox_roi_extractor(
                 x[:self.bbox_roi_extractor.num_inputs], rois)
-            cls_score, bbox_pred = self.bbox_head(roi_feats)
-            loss_bbox = self.bbox_head.loss(cls_score, bbox_pred, labels,
-                                            label_weights, bbox_targets,
-                                            bbox_weights)
+            cls_score, bbox_pred = self.bbox_head(bbox_feats)
+
+            bbox_targets = self.bbox_head.get_target(
+                sampling_results, gt_bboxes, gt_labels, self.train_cfg.rcnn)
+            loss_bbox = self.bbox_head.loss(cls_score, bbox_pred,
+                                            *bbox_targets)
             losses.update(loss_bbox)

+        # mask head forward and loss
         if self.with_mask:
-            mask_targets = self.mask_head.get_mask_target(
-                pos_proposals, pos_assigned_gt_inds, gt_masks,
-                self.train_cfg.rcnn)
-            pos_rois = bbox2roi(pos_proposals)
+            pos_rois = bbox2roi([res.pos_bboxes for res in sampling_results])
             mask_feats = self.mask_roi_extractor(
                 x[:self.mask_roi_extractor.num_inputs], pos_rois)
             mask_pred = self.mask_head(mask_feats)
+
+            mask_targets = self.mask_head.get_target(
+                sampling_results, gt_masks, self.train_cfg.rcnn)
+            pos_labels = torch.cat(
+                [res.pos_gt_labels for res in sampling_results])
             loss_mask = self.mask_head.loss(mask_pred, mask_targets,
-                                            torch.cat(pos_gt_labels))
+                                            pos_labels)
             losses.update(loss_mask)

         return losses
@@ -145,8 +145,7 @@ class TwoStageDetector(BaseDetector, RPNTestMixin, BBoxTestMixin,
         x = self.extract_feat(img)

         proposal_list = self.simple_test_rpn(
-            x, img_meta,
-            self.test_cfg.rpn) if proposals is None else proposals
+            x, img_meta, self.test_cfg.rpn) if proposals is None else proposals

         det_bboxes, det_labels = self.simple_test_bboxes(
             x, img_meta, proposal_list, self.test_cfg.rcnn, rescale=rescale)
...
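`forward_train` now calls `multi_apply(assign_and_sample, ...)`, i.e. it maps one function over per-image argument lists and regroups the outputs by position. A minimal re-implementation showing just that calling convention; `assign_and_sample_stub` is a stand-in, not the real mmdet function:

```python
# multi_apply maps a function over per-image arguments and regroups the
# outputs; a sketch of the convention, not mmdet's actual implementation.
from functools import partial


def multi_apply(func, *args, **kwargs):
    pfunc = partial(func, **kwargs) if kwargs else func
    map_results = map(pfunc, *args)              # one call per image
    return tuple(map(list, zip(*map_results)))   # regroup by output position


def assign_and_sample_stub(proposals, gt, cfg=None):
    # stand-in returning (assign_result, sampling_result) for one image
    return 'assign({})'.format(proposals), 'sample({})'.format(gt)


assign_results, sampling_results = multi_apply(
    assign_and_sample_stub, ['p0', 'p1'], ['g0', 'g1'], cfg='rcnn_cfg')
print(assign_results)    # ['assign(p0)', 'assign(p1)']
print(sampling_results)  # ['sample(g0)', 'sample(g1)']
```

This is why the bbox and mask branches can both consume `sampling_results`: the assignment-and-sampling step runs once per image, up front, whenever either head is present.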
@@ -86,8 +86,11 @@ class FCNMaskHead(nn.Module):
         mask_pred = self.conv_logits(x)
         return mask_pred

-    def get_mask_target(self, pos_proposals, pos_assigned_gt_inds, gt_masks,
-                        rcnn_train_cfg):
+    def get_target(self, sampling_results, gt_masks, rcnn_train_cfg):
+        pos_proposals = [res.pos_bboxes for res in sampling_results]
+        pos_assigned_gt_inds = [
+            res.pos_assigned_gt_inds for res in sampling_results
+        ]
         mask_targets = mask_target(pos_proposals, pos_assigned_gt_inds,
                                    gt_masks, rcnn_train_cfg)
         return mask_targets
...
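For the mask branch, `get_target` now gathers `pos_assigned_gt_inds` from the sampling results so `mask_target` can pick the matched gt mask for each sampled positive. A numpy sketch of that selection step only; the crop-and-resize inside `mask_target` is omitted, and the masks and indices below are fabricated for illustration:

```python
# pos_assigned_gt_inds selects one gt instance mask per sampled positive box.
import numpy as np

gt_masks = np.zeros((3, 64, 64), dtype=np.uint8)  # three instance masks
gt_masks[0, :32] = 1
gt_masks[1, 32:] = 1
gt_masks[2, :, :32] = 1

pos_assigned_gt_inds = np.array([2, 0, 0])  # 3 positives -> their gt indices
per_roi_masks = gt_masks[pos_assigned_gt_inds]
print(per_roi_masks.shape)  # (3, 64, 64): one full-image mask per positive
```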