Unverified Commit cbee4278 authored by Tong He, committed by GitHub

[Model] Scene Graph Extraction Model with GluonCV (#1260)



* add working scripts

* add frcnn training script

* remove redundant files

* refactor validation computation, will optimize sgdet and training

* validation finally finished

* f-rcnn training

* test reldn

* rm file

* update reldn training

* data preprocess to h5

* temp

* use coco json

* fix conflict

* new obj dataset for detection

* update training

* before cleanup

* remove abundant files

* add arg parse to train

* cleanup code file

* update

* fix

* add readme

* add ipynb as demo

* add demo pic

* update readme

* add demo script

* improve paths

* improve readme

* add docstrings

* fix args description

* update readme

* add models from s3

* update README
Co-authored-by: Minjie Wang <minjie.wang@nyu.edu>
parent ce93330e
@@ -147,3 +147,6 @@ cscope.*
*.swo
*.un~
*~
# parameters
*.params
# Scene Graph Extraction
Scene graph extraction aims not only to detect objects in a given image, but also to classify the relationships between pairs of them.
This example reproduces [Graphical Contrastive Losses for Scene Graph Parsing](https://arxiv.org/abs/1903.02728); the authors' code can be found [here](https://github.com/NVIDIA/ContrastiveLosses4VRD).
![DEMO](https://raw.githubusercontent.com/dmlc/web-data/master/dgl/examples/mxnet/scenegraph/old-couple-pred.png)
## Results
**VisualGenome**
| Model | Backbone | mAP@50 | SGDET@20 | SGDET@50 | SGDET@100 | PHRCLS@20 | PHRCLS@50 |PHRCLS@100 | PREDCLS@20 | PREDCLS@50 | PREDCLS@100 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| RelDN, L0 | ResNet101 | 29.5 | 22.65 | 30.02 | 35.04 | 32.84 | 35.60 | 36.26 | 60.58 | 65.53 | 66.51 |
## Preparation
This implementation is based on GluonCV. Install GluonCV with
```
pip install gluoncv --upgrade
```
The implementation contains the following files:
```
.
|-- data
| |-- dataloader.py
| |-- __init__.py
| |-- object.py
| |-- prepare_visualgenome.py
| `-- relation.py
|-- demo_reldn.py
|-- model
| |-- faster_rcnn.py
| |-- __init__.py
| `-- reldn.py
|-- README.md
|-- train_faster_rcnn.py
|-- train_faster_rcnn.sh
|-- train_freq_prior.py
|-- train_reldn.py
|-- train_reldn.sh
|-- utils
| |-- build_graph.py
| |-- __init__.py
| |-- metric.py
| |-- sampling.py
| `-- viz.py
|-- validate_reldn.py
`-- validate_reldn.sh
```
- The folder `data` contains the data preparation script and the dataset definitions for object detection and scene graph extraction.
- The folder `model` contains the model definitions.
- The folder `utils` contains helper functions for training, validation, and visualization.
- The script `train_faster_rcnn.py` trains a Faster R-CNN model on the VisualGenome dataset, and `train_faster_rcnn.sh` includes preset parameters.
- The script `train_freq_prior.py` trains the frequency counts for RelDN model training.
- The script `train_reldn.py` trains a RelDN model, and `train_reldn.sh` includes preset parameters.
- The script `validate_reldn.py` validates the trained Faster R-CNN and RelDN models, and `validate_reldn.sh` includes preset parameters.
- The script `demo_reldn.py` uses the trained parameters to extract a scene graph from an arbitrary input image.
Below are further steps for training your own models. We also provide pretrained model files for validation and the demo:
1. [Faster R-CNN Model for Object Detection](http://data.dgl.ai/models/SceneGraph/faster_rcnn_resnet101_v1d_visualgenome.params)
2. [RelDN Model](http://data.dgl.ai/models/SceneGraph/reldn.params)
3. [Faster R-CNN Model for Edge Feature](http://data.dgl.ai/models/SceneGraph/detector_feature.params)
## Data preparation
We provide scripts to download and prepare the VisualGenome dataset. One can run with
```
python data/prepare_visualgenome.py
```
## Object Detector
First, one needs to train the object detection model on VisualGenome.
```
bash train_faster_rcnn.sh
```
It runs for about 20 hours on a machine with 64 CPU cores and 8 V100 GPUs.
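If you prefer to call the training script directly rather than through the shell wrapper, a command along the following lines should work; the GPU ids, batch size, and worker count below are illustrative, and `train_faster_rcnn.sh` contains the exact preset:
```
python train_faster_rcnn.py --dataset visualgenome --network resnet101_v1d \
    --gpus 0,1,2,3,4,5,6,7 --batch-size 8 -j 64
```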
## Training RelDN
With a trained Faster R-CNN model, one can start training the RelDN model by running
```
bash train_reldn.sh
```
It runs for about 2 days with a single GPU and 8 CPU cores.
## Validate RelDN
After training, one can evaluate the results with several commonly used metrics:
```
bash validate_reldn.sh
```
## Demo
We provide a demo script for running the model on real-world pictures. Note that trained model parameters are needed to generate meaningful results; if none are provided, the script downloads the pre-trained models automatically.
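For example, one might run it on a custom picture with a command like the following; the image filename is a placeholder, leaving `--image` empty falls back to the provided sample picture, and omitting `--gpu` runs inference on the CPU:
```
python demo_reldn.py --image my-picture.jpg --gpu 0
```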
from .object import *
from .relation import *
from .dataloader import *
"""DataLoader utils."""
import dgl
from mxnet import nd
from gluoncv.data.batchify import Pad
def dgl_mp_batchify_fn(data):
    """Batchify a list of (DGLGraph, image) samples.

    Tuples are unzipped and batchified field by field: DGLGraph fields are
    collected into a list, and image NDArrays are padded along their spatial
    axes into a single batch.
    """
    if isinstance(data[0], tuple):
        data = zip(*data)
        return [dgl_mp_batchify_fn(i) for i in data]
    for dt in data:
        if dt is not None:
            if isinstance(dt, dgl.DGLGraph):
                return [d for d in data if isinstance(d, dgl.DGLGraph)]
            elif isinstance(dt, nd.NDArray):
                pad = Pad(axis=(1, 2), num_shards=1, ret_length=False)
                data_list = [dt for dt in data if dt is not None]
                return pad(data_list)
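As a minimal usage sketch (not part of this file), assuming the VisualGenome data has already been prepared with `data/prepare_visualgenome.py`, the batchify function can be applied to a list of `(graph, image)` samples directly, or passed as `batchify_fn` to a gluon `DataLoader`:
```
from data import VGRelation, dgl_mp_batchify_fn

dataset = VGRelation(split='val')
samples = [dataset[0], dataset[1]]            # each sample is a (DGLGraph, image NDArray) pair
graphs, images = dgl_mp_batchify_fn(samples)  # list of DGLGraphs, zero-padded image batch
print(len(graphs))
```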
"""Pascal VOC object detection dataset."""
from __future__ import absolute_import
from __future__ import division
import os
import logging
import warnings
import json
import pickle
import numpy as np
import mxnet as mx
from gluoncv.data import COCODetection
from collections import Counter
class VGObject(COCODetection):
CLASSES = ["airplane", "animal", "arm", "bag", "banana", "basket", "beach",
"bear", "bed", "bench", "bike", "bird", "board", "boat", "book",
"boot", "bottle", "bowl", "box", "boy", "branch", "building", "bus",
"cabinet", "cap", "car", "cat", "chair", "child", "clock", "coat",
"counter", "cow", "cup", "curtain", "desk", "dog", "door", "drawer",
"ear", "elephant", "engine", "eye", "face", "fence", "finger", "flag",
"flower", "food", "fork", "fruit", "giraffe", "girl", "glass", "glove",
"guy", "hair", "hand", "handle", "hat", "head", "helmet", "hill",
"horse", "house", "jacket", "jean", "kid", "kite", "lady", "lamp",
"laptop", "leaf", "leg", "letter", "light", "logo", "man", "men",
"motorcycle", "mountain", "mouth", "neck", "nose", "number", "orange",
"pant", "paper", "paw", "people", "person", "phone", "pillow", "pizza",
"plane", "plant", "plate", "player", "pole", "post", "pot", "racket",
"railing", "rock", "roof", "room", "screen", "seat", "sheep", "shelf",
"shirt", "shoe", "short", "sidewalk", "sign", "sink", "skateboard",
"ski", "skier", "sneaker", "snow", "sock", "stand", "street",
"surfboard", "table", "tail", "tie", "tile", "tire", "toilet",
"towel", "tower", "track", "train", "tree", "truck", "trunk",
"umbrella", "vase", "vegetable", "vehicle", "wave", "wheel",
"window", "windshield", "wing", "wire", "woman", "zebra"]
def __init__(self, **kwargs):
super(VGObject, self).__init__(**kwargs)
@property
def annotation_dir(self):
return ''
def _parse_image_path(self, entry):
dirname = 'VG_100K'
filename = entry['file_name']
abs_path = os.path.join(self._root, dirname, filename)
return abs_path
"""Prepare Visual Genome datasets"""
import os
import shutil
import argparse
import zipfile
import random
import json
import tqdm
import pickle
from gluoncv.utils import download, makedirs
_TARGET_DIR = os.path.expanduser('~/.mxnet/datasets/visualgenome')
def parse_args():
parser = argparse.ArgumentParser(
description='Initialize Visual Genome dataset.',
        epilog='Example: python data/prepare_visualgenome.py --download-dir ~/visualgenome',
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('--download-dir', type=str, default='~/visualgenome/',
help='dataset directory on disk')
parser.add_argument('--no-download', action='store_true', help='disable automatic download if set')
parser.add_argument('--overwrite', action='store_true', help='overwrite downloaded files if set, in case they are corrupted')
args = parser.parse_args()
return args
def download_vg(path, overwrite=False):
_DOWNLOAD_URLS = [
('https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip',
'a055367f675dd5476220e9b93e4ca9957b024b94'),
('https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip',
'2add3aab77623549e92b7f15cda0308f50b64ecf'),
]
makedirs(path)
for url, checksum in _DOWNLOAD_URLS:
filename = download(url, path=path, overwrite=overwrite, sha1_hash=checksum)
# extract
if filename.endswith('zip'):
with zipfile.ZipFile(filename) as zf:
zf.extractall(path=path)
# move all images into folder `VG_100K`
vg_100k_path = os.path.join(path, 'VG_100K')
vg_100k_2_path = os.path.join(path, 'VG_100K_2')
files_2 = os.listdir(vg_100k_2_path)
for fl in files_2:
shutil.move(os.path.join(vg_100k_2_path, fl),
os.path.join(vg_100k_path, fl))
def download_json(path, overwrite=False):
    url = 'https://data.dgl.ai/dataset/vg.zip'
    filename = download(url, path=path, overwrite=overwrite)
    with zipfile.ZipFile(filename) as zf:
        zf.extractall(path=path)
json_path = os.path.join(path, 'vg')
json_files = os.listdir(json_path)
for fl in json_files:
shutil.move(os.path.join(json_path, fl),
os.path.join(path, fl))
os.rmdir(json_path)
if __name__ == '__main__':
args = parse_args()
path = os.path.expanduser(args.download_dir)
if not os.path.isdir(path):
if args.no_download:
            raise ValueError(('{} is not a valid directory, make sure it is present,'
                              ' or do not set "--no-download" so it can be downloaded automatically.').format(path))
else:
download_vg(path, overwrite=args.overwrite)
download_json(path, overwrite=args.overwrite)
# make symlink
makedirs(os.path.expanduser('~/.mxnet/datasets'))
if os.path.isdir(_TARGET_DIR):
os.rmdir(_TARGET_DIR)
os.symlink(path, _TARGET_DIR)
"""Pascal VOC object detection dataset."""
from __future__ import absolute_import
from __future__ import division
import os
import logging
import warnings
import json
import dgl
import pickle
import numpy as np
import mxnet as mx
from gluoncv.data.base import VisionDataset
from collections import Counter
from gluoncv.data.transforms.presets.rcnn import FasterRCNNDefaultTrainTransform, FasterRCNNDefaultValTransform
class VGRelation(VisionDataset):
def __init__(self, root=os.path.join('~', '.mxnet', 'datasets', 'visualgenome'), split='train'):
super(VGRelation, self).__init__(root)
self._root = os.path.expanduser(root)
self._img_path = os.path.join(self._root, 'VG_100K', '{}')
if split == 'train':
self._dict_path = os.path.join(self._root, 'rel_annotations_train.json')
elif split == 'val':
self._dict_path = os.path.join(self._root, 'rel_annotations_val.json')
else:
raise NotImplementedError
with open(self._dict_path) as f:
tmp = f.read()
self._dict = json.loads(tmp)
self._predicates_path = os.path.join(self._root, 'predicates.json')
with open(self._predicates_path, 'r') as f:
tmp = f.read()
self.rel_classes = json.loads(tmp)
self.num_rel_classes = len(self.rel_classes) + 1
self._objects_path = os.path.join(self._root, 'objects.json')
with open(self._objects_path, 'r') as f:
tmp = f.read()
self.obj_classes = json.loads(tmp)
self.num_obj_classes = len(self.obj_classes)
if split == 'val':
self.img_transform = FasterRCNNDefaultValTransform(short=600, max_size=1000)
else:
self.img_transform = FasterRCNNDefaultTrainTransform(short=600, max_size=1000)
self.split = split
def __len__(self):
return len(self._dict)
def _hash_bbox(self, object):
num_list = [object['category']] + object['bbox']
return '_'.join([str(num) for num in num_list])
def __getitem__(self, idx):
img_id = list(self._dict)[idx]
img_path = self._img_path.format(img_id)
img = mx.image.imread(img_path)
item = self._dict[img_id]
n_edges = len(item)
# edge to node ids
sub_node_hash = []
ob_node_hash = []
for i, it in enumerate(item):
sub_node_hash.append(self._hash_bbox(it['subject']))
ob_node_hash.append(self._hash_bbox(it['object']))
node_set = sorted(list(set(sub_node_hash + ob_node_hash)))
n_nodes = len(node_set)
node_to_id = {}
for i, node in enumerate(node_set):
node_to_id[node] = i
sub_id = []
ob_id = []
for i in range(n_edges):
sub_id.append(node_to_id[sub_node_hash[i]])
ob_id.append(node_to_id[ob_node_hash[i]])
# node features
bbox = mx.nd.zeros((n_nodes, 4))
node_class_ids = mx.nd.zeros((n_nodes, 1))
node_visited = [False for i in range(n_nodes)]
for i, it in enumerate(item):
if not node_visited[sub_id[i]]:
ind = sub_id[i]
sub = it['subject']
node_class_ids[ind] = sub['category']
# y1y2x1x2 to x1y1x2y2
bbox[ind,0] = sub['bbox'][2]
bbox[ind,1] = sub['bbox'][0]
bbox[ind,2] = sub['bbox'][3]
bbox[ind,3] = sub['bbox'][1]
node_visited[ind] = True
if not node_visited[ob_id[i]]:
ind = ob_id[i]
ob = it['object']
node_class_ids[ind] = ob['category']
# y1y2x1x2 to x1y1x2y2
bbox[ind,0] = ob['bbox'][2]
bbox[ind,1] = ob['bbox'][0]
bbox[ind,2] = ob['bbox'][3]
bbox[ind,3] = ob['bbox'][1]
node_visited[ind] = True
        # label-smoothed soft one-hot encoding of the node classes
        eta = 0.1
node_class_vec = node_class_ids[:,0].one_hot(self.num_obj_classes,
on_value = 1 - eta + eta / self.num_obj_classes,
off_value = eta / self.num_obj_classes)
# augmentation
if self.split == 'val':
img, bbox, _ = self.img_transform(img, bbox)
else:
img, bbox = self.img_transform(img, bbox)
# build the graph
g = dgl.DGLGraph(multigraph=True)
g.add_nodes(n_nodes)
adjmat = np.zeros((n_nodes, n_nodes))
predicate = []
for i, it in enumerate(item):
adjmat[sub_id[i], ob_id[i]] = 1
predicate.append(it['predicate'])
predicate = mx.nd.array(predicate).expand_dims(1)
g.add_edges(sub_id, ob_id, {'rel_class': mx.nd.array(predicate) + 1})
empty_edge_list = []
for i in range(n_nodes):
for j in range(n_nodes):
if i != j and adjmat[i, j] == 0:
empty_edge_list.append((i, j))
if len(empty_edge_list) > 0:
src, dst = tuple(zip(*empty_edge_list))
g.add_edges(src, dst, {'rel_class': mx.nd.zeros((len(empty_edge_list), 1))})
# assign features
g.ndata['bbox'] = bbox
g.ndata['node_class'] = node_class_ids
g.ndata['node_class_vec'] = node_class_vec
return g, img
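To make the structure of the returned samples concrete, here is a small, illustrative inspection snippet (again assuming the VisualGenome annotations have been prepared beforehand); the index 0 is arbitrary:
```
from data import VGRelation

ds = VGRelation(split='val')
g, img = ds[0]
print(g.number_of_nodes(), g.number_of_edges())
print(g.ndata['bbox'].shape)      # (n_nodes, 4), boxes in x1y1x2y2 after the val transform
print(g.edata['rel_class'][:5])   # predicate id + 1; 0 marks an added no-relation edge
```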
import dgl
import argparse
import mxnet as mx
import gluoncv as gcv
from gluoncv.utils import download
from gluoncv.data.transforms import presets
from model import faster_rcnn_resnet101_v1d_custom, RelDN
from utils import *
from data import *
def parse_args():
parser = argparse.ArgumentParser(description='Demo of Scene Graph Extraction.')
parser.add_argument('--image', type=str, default='',
help="The image for scene graph extraction.")
parser.add_argument('--gpu', type=str, default='',
help="GPU id to use for inference, default is not using GPU.")
parser.add_argument('--pretrained-faster-rcnn-params', type=str, default='',
help="Path to saved Faster R-CNN model parameters.")
    parser.add_argument('--reldn-params', type=str, default='',
                        help="Path to saved RelDN model parameters.")
    parser.add_argument('--faster-rcnn-params', type=str, default='',
                        help="Path to saved Faster R-CNN parameters for edge feature extraction.")
parser.add_argument('--freq-prior', type=str, default='freq_prior.pkl',
help="Path to saved frequency prior data.")
args = parser.parse_args()
return args
args = parse_args()
if args.gpu:
ctx = mx.gpu(int(args.gpu))
else:
ctx = mx.cpu()
net = RelDN(n_classes=50, prior_pkl=args.freq_prior, semantic_only=False)
if args.reldn_params == '':
download('http://data.dgl.ai/models/SceneGraph/reldn.params')
    net.load_parameters('reldn.params', ctx=ctx)
else:
net.load_parameters(args.reldn_params, ctx=ctx)
# dataset and dataloader
vg_val = VGRelation(split='val')
detector = faster_rcnn_resnet101_v1d_custom(classes=vg_val.obj_classes,
pretrained_base=False, pretrained=False,
additional_output=True)
if args.pretrained_faster_rcnn_params == '':
download('http://data.dgl.ai/models/SceneGraph/faster_rcnn_resnet101_v1d_visualgenome.params')
params_path = 'faster_rcnn_resnet101_v1d_visualgenome.params'
else:
params_path = args.pretrained_faster_rcnn_params
detector.load_parameters(params_path, ctx=ctx, ignore_extra=True, allow_missing=True)
detector_feat = faster_rcnn_resnet101_v1d_custom(classes=vg_val.obj_classes,
pretrained_base=False, pretrained=False,
additional_output=True)
detector_feat.load_parameters(params_path, ctx=ctx, ignore_extra=True, allow_missing=True)
if args.faster_rcnn_params == '':
download('http://data.dgl.ai/models/SceneGraph/faster_rcnn_resnet101_v1d_visualgenome.params')
detector_feat.features.load_parameters('faster_rcnn_resnet101_v1d_visualgenome.params', ctx=ctx)
else:
detector_feat.features.load_parameters(args.faster_rcnn_params, ctx=ctx)
# image input
if args.image:
image_path = args.image
else:
gcv.utils.download('https://raw.githubusercontent.com/dmlc/web-data/master/' +
'dgl/examples/mxnet/scenegraph/old-couple.png',
'old-couple.png')
image_path = 'old-couple.png'
x, img = presets.rcnn.load_test(image_path, short=detector.short, max_size=detector.max_size)
x = x.as_in_context(ctx)
# detector prediction
ids, scores, bboxes, feat, feat_ind, spatial_feat = detector(x)
# build graph, extract edge features
g = build_graph_validate_pred(x, ids, scores, bboxes, feat_ind, spatial_feat, bbox_improvement=True, scores_top_k=75, overlap=False)
rel_bbox = g.edata['rel_bbox'].expand_dims(0).as_in_context(ctx)
_, _, _, spatial_feat_rel = detector_feat(x, None, None, rel_bbox)
g.edata['edge_feat'] = spatial_feat_rel[0]
# graph prediction
g = net(g)
_, preds = extract_pred(g, joint_preds=True)
preds = preds[preds[:,1].argsort()[::-1]]
plot_sg(img, preds, detector.classes, vg_val.rel_classes, 10)
from .faster_rcnn import *
from .reldn import *
"""Faster RCNN Model."""
from __future__ import absolute_import
import os
import warnings
import mxnet as mx
from mxnet import autograd
from mxnet.gluon import nn
from mxnet.gluon.contrib.nn import SyncBatchNorm
from gluoncv.model_zoo.faster_rcnn.rcnn_target import RCNNTargetSampler, RCNNTargetGenerator
from gluoncv.model_zoo.rcnn import RCNN
from gluoncv.model_zoo.rpn import RPN
from gluoncv.nn.feature import FPNFeatureExpander
__all__ = ['FasterRCNN', 'get_faster_rcnn',
'faster_rcnn_resnet50_v1b_coco',
'faster_rcnn_resnet50_v1b_custom',
'faster_rcnn_resnet101_v1d_coco',
'faster_rcnn_resnet101_v1d_custom']
class FasterRCNN(RCNN):
r"""Faster RCNN network.
Parameters
----------
features : gluon.HybridBlock
Base feature extractor before feature pooling layer.
top_features : gluon.HybridBlock
Tail feature extractor after feature pooling layer.
classes : iterable of str
Names of categories, its length is ``num_class``.
box_features : gluon.HybridBlock, default is None
feature head for transforming shared ROI output (top_features) for box prediction.
If set to None, global average pooling will be used.
short : int, default is 600.
Input image short side size.
max_size : int, default is 1000.
Maximum size of input image long side.
min_stage : int, default is 4
Minimum stage NO. for FPN stages.
max_stage : int, default is 4
Maximum stage NO. for FPN stages.
train_patterns : str, default is None.
Matching pattern for trainable parameters.
nms_thresh : float, default is 0.3.
Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.
nms_topk : int, default is 400
Apply NMS to top k detection results, use -1 to disable so that every Detection
result is used in NMS.
post_nms : int, default is 100
Only return top `post_nms` detection results, the rest is discarded. The number is
based on COCO dataset which has maximum 100 objects per image. You can adjust this
number if expecting more objects. You can use -1 to return all detections.
roi_mode : str, default is align
ROI pooling mode. Currently support 'pool' and 'align'.
roi_size : tuple of int, length 2, default is (14, 14)
(height, width) of the ROI region.
strides : int/tuple of ints, default is 16
Feature map stride with respect to original image.
This is usually the ratio between original image size and feature map size.
For FPN, use a tuple of ints.
clip : float, default is None
Clip bounding box target to this value.
rpn_channel : int, default is 1024
Channel number used in RPN convolutional layers.
base_size : int
The width(and height) of reference anchor box.
scales : iterable of float, default is (8, 16, 32)
The areas of anchor boxes.
We use the following form to compute the shapes of anchors:
.. math::
width_{anchor} = size_{base} \times scale \times \sqrt{ 1 / ratio}
height_{anchor} = size_{base} \times scale \times \sqrt{ratio}
ratios : iterable of float, default is (0.5, 1, 2)
The aspect ratios of anchor boxes. We expect it to be a list or tuple.
alloc_size : tuple of int
Allocate size for the anchor boxes as (H, W).
Usually we generate enough anchors for large feature map, e.g. 128x128.
Later in inference we can have variable input sizes,
at which time we can crop corresponding anchors from this large
anchor map so we can skip re-generating anchors for each input.
rpn_train_pre_nms : int, default is 12000
Filter top proposals before NMS in training of RPN.
rpn_train_post_nms : int, default is 2000
Return top proposal results after NMS in training of RPN.
Will be set to rpn_train_pre_nms if it is larger than rpn_train_pre_nms.
rpn_test_pre_nms : int, default is 6000
Filter top proposals before NMS in testing of RPN.
rpn_test_post_nms : int, default is 300
Return top proposal results after NMS in testing of RPN.
Will be set to rpn_test_pre_nms if it is larger than rpn_test_pre_nms.
rpn_nms_thresh : float, default is 0.7
IOU threshold for NMS. It is used to remove overlapping proposals.
rpn_num_sample : int, default is 256
Number of samples for RPN targets.
rpn_pos_iou_thresh : float, default is 0.7
Anchor with IOU larger than ``pos_iou_thresh`` is regarded as positive samples.
rpn_neg_iou_thresh : float, default is 0.3
Anchor with IOU smaller than ``neg_iou_thresh`` is regarded as negative samples.
Anchors with IOU in between ``pos_iou_thresh`` and ``neg_iou_thresh`` are
ignored.
rpn_pos_ratio : float, default is 0.5
``pos_ratio`` defines how many positive samples (``pos_ratio * num_sample``) is
to be sampled.
rpn_box_norm : array-like of size 4, default is (1., 1., 1., 1.)
Std value to be divided from encoded values.
rpn_min_size : int, default is 16
Proposals whose size is smaller than ``min_size`` will be discarded.
per_device_batch_size : int, default is 1
Batch size for each device during training.
num_sample : int, default is 128
Number of samples for RCNN targets.
pos_iou_thresh : float, default is 0.5
Proposal whose IOU larger than ``pos_iou_thresh`` is regarded as positive samples.
pos_ratio : float, default is 0.25
``pos_ratio`` defines how many positive samples (``pos_ratio * num_sample``) is
to be sampled.
max_num_gt : int, default is 300
Maximum ground-truth number in whole training dataset. This is only an upper bound, not
necessarily very precise. However, using a very big number may impact the training speed.
additional_output : boolean, default is False
``additional_output`` is only used for Mask R-CNN to get internal outputs.
force_nms : bool, default is False
        Apply NMS to all categories; this is to avoid overlapping detection results from different
categories.
Attributes
----------
classes : iterable of str
Names of categories, its length is ``num_class``.
num_class : int
Number of positive categories.
short : int
Input image short side size.
max_size : int
Maximum size of input image long side.
train_patterns : str
Matching pattern for trainable parameters.
nms_thresh : float
Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.
nms_topk : int
Apply NMS to top k detection results, use -1 to disable so that every Detection
result is used in NMS.
force_nms : bool
        Apply NMS to all categories; this is to avoid overlapping detection results
from different categories.
post_nms : int
Only return top `post_nms` detection results, the rest is discarded. The number is
based on COCO dataset which has maximum 100 objects per image. You can adjust this
number if expecting more objects. You can use -1 to return all detections.
rpn_target_generator : gluon.Block
Generate training targets with cls_target, box_target, and box_mask.
target_generator : gluon.Block
Generate training targets with boxes, samples, matches, gt_label and gt_box.
"""
def __init__(self, features, top_features, classes, box_features=None,
short=600, max_size=1000, min_stage=4, max_stage=4, train_patterns=None,
nms_thresh=0.3, nms_topk=400, post_nms=100,
roi_mode='align', roi_size=(14, 14), strides=16, clip=None,
rpn_channel=1024, base_size=16, scales=(8, 16, 32),
ratios=(0.5, 1, 2), alloc_size=(128, 128), rpn_nms_thresh=0.7,
rpn_train_pre_nms=12000, rpn_train_post_nms=2000, rpn_test_pre_nms=6000,
rpn_test_post_nms=300, rpn_min_size=16, per_device_batch_size=1, num_sample=128,
pos_iou_thresh=0.5, pos_ratio=0.25, max_num_gt=300, additional_output=False,
force_nms=False, **kwargs):
super(FasterRCNN, self).__init__(
features=features, top_features=top_features, classes=classes,
box_features=box_features, short=short, max_size=max_size,
train_patterns=train_patterns, nms_thresh=nms_thresh, nms_topk=nms_topk,
post_nms=post_nms, roi_mode=roi_mode, roi_size=roi_size, strides=strides, clip=clip,
force_nms=force_nms, **kwargs)
if rpn_train_post_nms > rpn_train_pre_nms:
rpn_train_post_nms = rpn_train_pre_nms
if rpn_test_post_nms > rpn_test_pre_nms:
rpn_test_post_nms = rpn_test_pre_nms
self.ashape = alloc_size[0]
self._min_stage = min_stage
self._max_stage = max_stage
self.num_stages = max_stage - min_stage + 1
if self.num_stages > 1:
assert len(scales) == len(strides) == self.num_stages, \
"The num_stages (%d) must match number of scales (%d) and strides (%d)" \
% (self.num_stages, len(scales), len(strides))
self._batch_size = per_device_batch_size
self._num_sample = num_sample
self._rpn_test_post_nms = rpn_test_post_nms
self._target_generator = RCNNTargetGenerator(self.num_class, int(num_sample * pos_ratio),
self._batch_size)
self._additional_output = additional_output
with self.name_scope():
self.rpn = RPN(
channels=rpn_channel, strides=strides, base_size=base_size,
scales=scales, ratios=ratios, alloc_size=alloc_size,
clip=clip, nms_thresh=rpn_nms_thresh, train_pre_nms=rpn_train_pre_nms,
train_post_nms=rpn_train_post_nms, test_pre_nms=rpn_test_pre_nms,
test_post_nms=rpn_test_post_nms, min_size=rpn_min_size,
multi_level=self.num_stages > 1, per_level_nms=False)
self.sampler = RCNNTargetSampler(num_image=self._batch_size,
num_proposal=rpn_train_post_nms, num_sample=num_sample,
pos_iou_thresh=pos_iou_thresh, pos_ratio=pos_ratio,
max_num_gt=max_num_gt)
@property
def target_generator(self):
"""Returns stored target generator
Returns
-------
mxnet.gluon.HybridBlock
The RCNN target generator
"""
return self._target_generator
def reset_class(self, classes, reuse_weights=None):
"""Reset class categories and class predictors.
Parameters
----------
classes : iterable of str
The new categories. ['apple', 'orange'] for example.
reuse_weights : dict
A {new_integer : old_integer} or mapping dict or {new_name : old_name} mapping dict,
or a list of [name0, name1,...] if class names don't change.
This allows the new predictor to reuse the
previously trained weights specified.
Example
-------
>>> net = gluoncv.model_zoo.get_model('faster_rcnn_resnet50_v1b_coco', pretrained=True)
>>> # use direct name to name mapping to reuse weights
>>> net.reset_class(classes=['person'], reuse_weights={'person':'person'})
        >>> # or use integer mapping, person is the 14th category in VOC
>>> net.reset_class(classes=['person'], reuse_weights={0:14})
>>> # you can even mix them
>>> net.reset_class(classes=['person'], reuse_weights={'person':14})
        >>> # or use a list of strings if class names don't change
>>> net.reset_class(classes=['person'], reuse_weights=['person'])
"""
super(FasterRCNN, self).reset_class(classes, reuse_weights)
self._target_generator = RCNNTargetGenerator(self.num_class, self.sampler._max_pos,
self._batch_size)
def _pyramid_roi_feats(self, F, features, rpn_rois, roi_size, strides, roi_mode='align',
roi_canonical_scale=224.0, eps=1e-6):
"""Assign rpn_rois to specific FPN layers according to its area
and then perform `ROIPooling` or `ROIAlign` to generate final
region proposals aggregated features.
Parameters
----------
features : list of mx.ndarray or mx.symbol
Features extracted from FPN base network
rpn_rois : mx.ndarray or mx.symbol
(N, 5) with [[batch_index, x1, y1, x2, y2], ...] like
roi_size : tuple
The size of each roi with regard to ROI-Wise operation
each region proposal will be roi_size spatial shape.
strides : tuple e.g. [4, 8, 16, 32]
Define the gap that ori image and feature map have
roi_mode : str, default is align
ROI pooling mode. Currently support 'pool' and 'align'.
roi_canonical_scale : float, default is 224.0
Hyperparameters for the RoI-to-FPN level mapping heuristic.
Returns
-------
Pooled roi features aggregated according to its roi_level
"""
max_stage = self._max_stage
if self._max_stage > 5: # do not use p6 for RCNN
max_stage = self._max_stage - 1
_, x1, y1, x2, y2 = F.split(rpn_rois, axis=-1, num_outputs=5)
h = y2 - y1 + 1
w = x2 - x1 + 1
roi_level = F.floor(4 + F.log2(F.sqrt(w * h) / roi_canonical_scale + eps))
roi_level = F.squeeze(F.clip(roi_level, self._min_stage, max_stage))
# [2,2,..,3,3,...,4,4,...,5,5,...] ``Prohibit swap order here``
# roi_level_sorted_args = F.argsort(roi_level, is_ascend=True)
# roi_level = F.sort(roi_level, is_ascend=True)
# rpn_rois = F.take(rpn_rois, roi_level_sorted_args, axis=0)
pooled_roi_feats = []
for i, l in enumerate(range(self._min_stage, max_stage + 1)):
if roi_mode == 'pool':
# Pool features with all rois first, and then set invalid pooled features to zero,
# at last ele-wise add together to aggregate all features.
pooled_feature = F.ROIPooling(features[i], rpn_rois, roi_size, 1. / strides[i])
pooled_feature = F.where(roi_level == l, pooled_feature,
F.zeros_like(pooled_feature))
elif roi_mode == 'align':
if 'box_encode' in F.contrib.__dict__ and 'box_decode' in F.contrib.__dict__:
# TODO(jerryzcn): clean this up for once mx 1.6 is released.
masked_rpn_rois = F.where(roi_level == l, rpn_rois, F.ones_like(rpn_rois) * -1.)
pooled_feature = F.contrib.ROIAlign(features[i], masked_rpn_rois, roi_size,
1. / strides[i], sample_ratio=2)
else:
pooled_feature = F.contrib.ROIAlign(features[i], rpn_rois, roi_size,
1. / strides[i], sample_ratio=2)
pooled_feature = F.where(roi_level == l, pooled_feature,
F.zeros_like(pooled_feature))
else:
raise ValueError("Invalid roi mode: {}".format(roi_mode))
pooled_roi_feats.append(pooled_feature)
# Ele-wise add to aggregate all pooled features
pooled_roi_feats = F.ElementWiseSum(*pooled_roi_feats)
        # Sort all pooled features in ascending order
# [2,2,..,3,3,...,4,4,...,5,5,...]
# pooled_roi_feats = F.take(pooled_roi_feats, roi_level_sorted_args)
# pooled roi feats (B*N, C, 7, 7), N = N2 + N3 + N4 + N5 = num_roi, C=256 in ori paper
return pooled_roi_feats
# pylint: disable=arguments-differ
def hybrid_forward(self, F, x, gt_box=None, gt_label=None, m_rpn_box=None):
"""Forward Faster-RCNN network.
The behavior during training and inference is different.
Parameters
----------
x : mxnet.nd.NDArray or mxnet.symbol
The network input tensor.
gt_box : type, only required during training
The ground-truth bbox tensor with shape (B, N, 4).
gt_label : type, only required during training
The ground-truth label tensor with shape (B, 1, 4).
Returns
-------
(ids, scores, bboxes)
During inference, returns final class id, confidence scores, bounding
boxes.
"""
def _split(x, axis, num_outputs, squeeze_axis):
x = F.split(x, axis=axis, num_outputs=num_outputs, squeeze_axis=squeeze_axis)
if isinstance(x, list):
return x
else:
return [x]
if m_rpn_box is not None:
manual_rpn_box = True
else:
manual_rpn_box = False
feat = self.features(x)
if not isinstance(feat, (list, tuple)):
feat = [feat]
# RPN proposals
if autograd.is_training():
if manual_rpn_box:
rpn_box = m_rpn_box
self.nms_thresh = 1
else:
rpn_score, rpn_box, raw_rpn_score, raw_rpn_box, anchors = \
self.rpn(F.zeros_like(x), *feat)
rpn_box, samples, matches = self.sampler(rpn_box, rpn_score, gt_box)
else:
if manual_rpn_box:
rpn_box = m_rpn_box
self.nms_thresh = 1
else:
_, rpn_box = self.rpn(F.zeros_like(x), *feat)
# create batchid for roi
if not manual_rpn_box:
num_roi = self._num_sample if autograd.is_training() else self._rpn_test_post_nms
batch_size = self._batch_size if autograd.is_training() else 1
else:
num_roi = m_rpn_box.shape[1]
batch_size = rpn_box.shape[0]
with autograd.pause():
roi_batchid = F.arange(0, batch_size)
roi_batchid = F.repeat(roi_batchid, num_roi)
            # remove batch dim because ROIPooling requires 2d input
rpn_roi = F.concat(*[roi_batchid.reshape((-1, 1)), rpn_box.reshape((-1, 4))], dim=-1)
rpn_roi = F.stop_gradient(rpn_roi)
if self.num_stages > 1:
# using FPN
pooled_feat = self._pyramid_roi_feats(F, feat, rpn_roi, self._roi_size,
self._strides, roi_mode=self._roi_mode)
else:
# ROI features
if self._roi_mode == 'pool':
pooled_feat = F.ROIPooling(feat[0], rpn_roi, self._roi_size, 1. / self._strides)
elif self._roi_mode == 'align':
pooled_feat = F.contrib.ROIAlign(feat[0], rpn_roi, self._roi_size,
1. / self._strides, sample_ratio=2)
else:
raise ValueError("Invalid roi mode: {}".format(self._roi_mode))
# RCNN prediction
if self.top_features is not None:
top_feat = self.top_features(pooled_feat)
else:
top_feat = pooled_feat
if self.box_features is None:
box_feat = F.contrib.AdaptiveAvgPooling2D(top_feat, output_size=1)
else:
box_feat = self.box_features(top_feat)
cls_pred = self.class_predictor(box_feat)
# cls_pred (B * N, C) -> (B, N, C)
cls_pred = cls_pred.reshape((batch_size, num_roi, self.num_class + 1))
if manual_rpn_box:
spatial_feat = top_feat.mean(axis=1).reshape((-4, rpn_box.shape[0], rpn_box.shape[1], -3))
cls_ids, scores = self.cls_decoder(F.softmax(cls_pred, axis=-1))
cls_ids = cls_ids.transpose((0, 2, 1)).reshape((0, 0, 0, 1))
scores = scores.transpose((0, 2, 1)).reshape((0, 0, 0, 1))
cls_ids = _split(cls_ids, axis=0, num_outputs=batch_size, squeeze_axis=True)
scores = _split(scores, axis=0, num_outputs=batch_size, squeeze_axis=True)
return cls_ids, scores, rpn_box, spatial_feat
# no need to convert bounding boxes in training, just return
if autograd.is_training():
cls_targets, box_targets, box_masks, indices = \
self._target_generator(rpn_box, samples, matches, gt_label, gt_box)
box_feat = F.reshape(box_feat.expand_dims(0), (batch_size, -1, 0))
box_pred = self.box_predictor(F.concat(
*[F.take(F.slice_axis(box_feat, axis=0, begin=i, end=i + 1).squeeze(),
F.slice_axis(indices, axis=0, begin=i, end=i + 1).squeeze())
for i in range(batch_size)], dim=0))
# box_pred (B * N, C * 4) -> (B, N, C, 4)
box_pred = box_pred.reshape((batch_size, -1, self.num_class, 4))
if self._additional_output:
return (cls_pred, box_pred, rpn_box, samples, matches, raw_rpn_score, raw_rpn_box,
anchors, cls_targets, box_targets, box_masks, top_feat, indices)
return (cls_pred, box_pred, rpn_box, samples, matches, raw_rpn_score, raw_rpn_box,
anchors, cls_targets, box_targets, box_masks, indices)
box_pred = self.box_predictor(box_feat)
# box_pred (B * N, C * 4) -> (B, N, C, 4)
box_pred = box_pred.reshape((batch_size, num_roi, self.num_class, 4))
# cls_ids (B, N, C), scores (B, N, C)
cls_ids, scores = self.cls_decoder(F.softmax(cls_pred, axis=-1))
# cls_ids, scores (B, N, C) -> (B, C, N) -> (B, C, N, 1)
cls_ids = cls_ids.transpose((0, 2, 1)).reshape((0, 0, 0, 1))
scores = scores.transpose((0, 2, 1)).reshape((0, 0, 0, 1))
# box_pred (B, N, C, 4) -> (B, C, N, 4)
box_pred = box_pred.transpose((0, 2, 1, 3))
# rpn_boxes (B, N, 4) -> B * (1, N, 4)
rpn_boxes = _split(rpn_box, axis=0, num_outputs=batch_size, squeeze_axis=False)
# cls_ids, scores (B, C, N, 1) -> B * (C, N, 1)
cls_ids = _split(cls_ids, axis=0, num_outputs=batch_size, squeeze_axis=True)
scores = _split(scores, axis=0, num_outputs=batch_size, squeeze_axis=True)
# box_preds (B, C, N, 4) -> B * (C, N, 4)
box_preds = _split(box_pred, axis=0, num_outputs=batch_size, squeeze_axis=True)
# per batch predict, nms, each class has topk outputs
results = []
# add feat index
if self._additional_output:
sizes = scores[0].shape[0:2]
# ind = mx.nd.array(list(range(sizes[1])))
ind = mx.nd.linspace(0, 999, 1000)
ind = mx.nd.repeat(ind, repeats=sizes[0])
ind = ind.reshape(sizes[1], sizes[0]).transpose((1, 0)).expand_dims(axis=2)
for rpn_box, cls_id, score, box_pred in zip(rpn_boxes, cls_ids, scores, box_preds):
# box_pred (C, N, 4) rpn_box (1, N, 4) -> bbox (C, N, 4)
bbox = self.box_decoder(box_pred, rpn_box)
if self._additional_output:
# res (C, N, 7)
res = F.concat(*[cls_id, score, bbox, ind], dim=-1)
else:
# res (C, N, 6)
res = F.concat(*[cls_id, score, bbox], dim=-1)
if self.force_nms:
                # res (1, C*N, 6), to allow cross-category suppression
res = res.reshape((1, -1, 0))
# res (C, self.nms_topk, 6)
res = F.contrib.box_nms(
res, overlap_thresh=self.nms_thresh, topk=self.nms_topk, valid_thresh=0.001,
id_index=0, score_index=1, coord_start=2, force_suppress=self.force_nms)
# res (C * self.nms_topk, 6)
res = res.reshape((-3, 0))
results.append(res)
# result B * (C * topk, 6) -> (B, C * topk, 6)
result = F.stack(*results, axis=0)
ids = F.slice_axis(result, axis=-1, begin=0, end=1)
scores = F.slice_axis(result, axis=-1, begin=1, end=2)
bboxes = F.slice_axis(result, axis=-1, begin=2, end=6)
if self._additional_output:
feat_ind = F.slice_axis(result, axis=-1, begin=6, end=7)
spatial_feat = top_feat.mean(axis=1).expand_dims(0).reshape(batch_size, 0, -1)
return ids, scores, bboxes, feat, feat_ind, spatial_feat
return ids, scores, bboxes
def get_faster_rcnn(name, dataset, pretrained=False, ctx=mx.cpu(),
root=os.path.join('~', '.mxnet', 'models'), **kwargs):
r"""Utility function to return faster rcnn networks.
Parameters
----------
name : str
Model name.
dataset : str
The name of dataset.
pretrained : bool or str
Boolean value controls whether to load the default pretrained weights for model.
String value represents the hashtag for a certain version of pretrained weights.
ctx : mxnet.Context
Context such as mx.cpu(), mx.gpu(0).
root : str
Model weights storing path.
Returns
-------
mxnet.gluon.HybridBlock
The Faster-RCNN network.
"""
net = FasterRCNN(**kwargs)
if pretrained:
from gluoncv.model_zoo.model_store import get_model_file
full_name = '_'.join(('faster_rcnn', name, dataset))
net.load_parameters(get_model_file(full_name, tag=pretrained, root=root), ctx=ctx,
ignore_extra=True, allow_missing=True)
else:
for v in net.collect_params().values():
try:
v.reset_ctx(ctx)
except ValueError:
pass
return net
def faster_rcnn_resnet50_v1b_coco(pretrained=False, pretrained_base=True, **kwargs):
r"""Faster RCNN model from the paper
"Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards
real-time object detection with region proposal networks"
Parameters
----------
pretrained : bool or str
Boolean value controls whether to load the default pretrained weights for model.
String value represents the hashtag for a certain version of pretrained weights.
pretrained_base : bool or str, optional, default is True
Load pretrained base network, the extra layers are randomized. Note that
if pretrained is `True`, this has no effect.
ctx : Context, default CPU
The context in which to load the pretrained weights.
root : str, default '~/.mxnet/models'
Location for keeping the model parameters.
Examples
--------
    >>> model = faster_rcnn_resnet50_v1b_coco(pretrained=True)
>>> print(model)
"""
from gluoncv.model_zoo.resnetv1b import resnet50_v1b
from gluoncv.data import COCODetection
classes = COCODetection.CLASSES
pretrained_base = False if pretrained else pretrained_base
base_network = resnet50_v1b(pretrained=pretrained_base, dilated=False,
use_global_stats=True, **kwargs)
features = nn.HybridSequential()
top_features = nn.HybridSequential()
for layer in ['conv1', 'bn1', 'relu', 'maxpool', 'layer1', 'layer2', 'layer3']:
features.add(getattr(base_network, layer))
for layer in ['layer4']:
top_features.add(getattr(base_network, layer))
train_patterns = '|'.join(['.*dense', '.*rpn', '.*down(2|3|4)_conv', '.*layers(2|3|4)_conv'])
return get_faster_rcnn(
name='resnet50_v1b', dataset='coco', pretrained=pretrained,
features=features, top_features=top_features, classes=classes,
short=800, max_size=1333, train_patterns=train_patterns,
nms_thresh=0.7, nms_topk=-1, post_nms=-1,
roi_mode='align', roi_size=(14, 14), strides=16, clip=4.14,
rpn_channel=1024, base_size=16, scales=(2, 4, 8, 16, 32),
ratios=(0.5, 1, 2), alloc_size=(128, 128), rpn_nms_thresh=0.7,
rpn_train_pre_nms=12000, rpn_train_post_nms=2000,
rpn_test_pre_nms=6000, rpn_test_post_nms=1000, rpn_min_size=1,
num_sample=128, pos_iou_thresh=0.5, pos_ratio=0.25,
max_num_gt=3000, **kwargs)
def faster_rcnn_resnet50_v1b_custom(classes, transfer=None, pretrained_base=True,
pretrained=False, **kwargs):
r"""Faster RCNN model with resnet50_v1b base network on custom dataset.
Parameters
----------
classes : iterable of str
Names of custom foreground classes. `len(classes)` is the number of foreground classes.
transfer : str or None
If not `None`, will try to reuse pre-trained weights from faster RCNN networks trained
on other datasets.
pretrained : bool or str
Boolean value controls whether to load the default pretrained weights for model.
String value represents the hashtag for a certain version of pretrained weights.
pretrained_base : bool or str
Boolean value controls whether to load the default pretrained weights for model.
String value represents the hashtag for a certain version of pretrained weights.
ctx : Context, default CPU
The context in which to load the pretrained weights.
root : str, default '~/.mxnet/models'
Location for keeping the model parameters.
Returns
-------
mxnet.gluon.HybridBlock
Hybrid faster RCNN network.
"""
if pretrained:
warnings.warn("Custom models don't provide `pretrained` weights, ignored.")
if transfer is None:
from gluoncv.model_zoo.resnetv1b import resnet50_v1b
base_network = resnet50_v1b(pretrained=pretrained_base, dilated=False,
use_global_stats=True, **kwargs)
features = nn.HybridSequential()
top_features = nn.HybridSequential()
for layer in ['conv1', 'bn1', 'relu', 'maxpool', 'layer1', 'layer2', 'layer3']:
features.add(getattr(base_network, layer))
for layer in ['layer4']:
top_features.add(getattr(base_network, layer))
train_patterns = '|'.join(['.*dense', '.*rpn', '.*down(2|3|4)_conv',
'.*layers(2|3|4)_conv'])
return get_faster_rcnn(
name='resnet50_v1b', dataset='custom', pretrained=pretrained,
features=features, top_features=top_features, classes=classes,
short=600, max_size=1000, train_patterns=train_patterns,
nms_thresh=0.7, nms_topk=400, post_nms=100,
roi_mode='align', roi_size=(14, 14), strides=16, clip=4.14,
rpn_channel=1024, base_size=16, scales=(2, 4, 8, 16, 32),
ratios=(0.5, 1, 2), alloc_size=(128, 128), rpn_nms_thresh=0.7,
rpn_train_pre_nms=12000, rpn_train_post_nms=2000,
rpn_test_pre_nms=6000, rpn_test_post_nms=300, rpn_min_size=16,
num_sample=128, pos_iou_thresh=0.5, pos_ratio=0.25, max_num_gt=3000,
**kwargs)
else:
from gluoncv.model_zoo import get_model
net = get_model('faster_rcnn_resnet50_v1b_' + str(transfer), pretrained=True, **kwargs)
reuse_classes = [x for x in classes if x in net.classes]
net.reset_class(classes, reuse_weights=reuse_classes)
return net
def faster_rcnn_resnet101_v1d_coco(pretrained=False, pretrained_base=True, **kwargs):
r"""Faster RCNN model from the paper
"Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards
real-time object detection with region proposal networks"
Parameters
----------
pretrained : bool, optional, default is False
Load pretrained weights.
pretrained_base : bool or str, optional, default is True
Load pretrained base network, the extra layers are randomized. Note that
if pretrained is `True`, this has no effect.
ctx : Context, default CPU
The context in which to load the pretrained weights.
root : str, default '~/.mxnet/models'
Location for keeping the model parameters.
Examples
--------
    >>> model = faster_rcnn_resnet101_v1d_coco(pretrained=True)
>>> print(model)
"""
from gluoncv.model_zoo.resnetv1b import resnet101_v1d
from gluoncv.data import COCODetection
classes = COCODetection.CLASSES
pretrained_base = False if pretrained else pretrained_base
base_network = resnet101_v1d(pretrained=pretrained_base, dilated=False,
use_global_stats=True, **kwargs)
features = nn.HybridSequential()
top_features = nn.HybridSequential()
for layer in ['conv1', 'bn1', 'relu', 'maxpool', 'layer1', 'layer2', 'layer3']:
features.add(getattr(base_network, layer))
for layer in ['layer4']:
top_features.add(getattr(base_network, layer))
train_patterns = '|'.join(['.*dense', '.*rpn', '.*down(2|3|4)_conv', '.*layers(2|3|4)_conv'])
return get_faster_rcnn(
name='resnet101_v1d', dataset='coco', pretrained=pretrained,
features=features, top_features=top_features, classes=classes,
short=800, max_size=1333, train_patterns=train_patterns,
nms_thresh=0.5, nms_topk=-1, post_nms=100,
roi_mode='align', roi_size=(14, 14), strides=16, clip=4.14,
rpn_channel=1024, base_size=16, scales=(2, 4, 8, 16, 32),
ratios=(0.5, 1, 2), alloc_size=(128, 128), rpn_nms_thresh=0.7,
rpn_train_pre_nms=12000, rpn_train_post_nms=2000,
rpn_test_pre_nms=6000, rpn_test_post_nms=1000, rpn_min_size=1,
num_sample=128, pos_iou_thresh=0.5, pos_ratio=0.25, max_num_gt=3000,
**kwargs)
def faster_rcnn_resnet101_v1d_custom(classes, transfer=None, pretrained_base=True,
pretrained=False, **kwargs):
r"""Faster RCNN model with resnet101_v1d base network on custom dataset.
Parameters
----------
classes : iterable of str
Names of custom foreground classes. `len(classes)` is the number of foreground classes.
transfer : str or None
If not `None`, will try to reuse pre-trained weights from faster RCNN networks trained
on other datasets.
pretrained_base : bool or str
Boolean value controls whether to load the default pretrained weights for model.
String value represents the hashtag for a certain version of pretrained weights.
ctx : Context, default CPU
The context in which to load the pretrained weights.
root : str, default '~/.mxnet/models'
Location for keeping the model parameters.
Returns
-------
mxnet.gluon.HybridBlock
Hybrid faster RCNN network.
"""
if pretrained:
warnings.warn("Custom models don't provide `pretrained` weights, ignored.")
if transfer is None:
from gluoncv.model_zoo.resnetv1b import resnet101_v1d
base_network = resnet101_v1d(pretrained=pretrained_base, dilated=False,
use_global_stats=True, **kwargs)
features = nn.HybridSequential()
top_features = nn.HybridSequential()
for layer in ['conv1', 'bn1', 'relu', 'maxpool', 'layer1', 'layer2', 'layer3']:
features.add(getattr(base_network, layer))
for layer in ['layer4']:
top_features.add(getattr(base_network, layer))
train_patterns = '|'.join(['.*dense', '.*rpn', '.*down(2|3|4)_conv',
'.*layers(2|3|4)_conv'])
return get_faster_rcnn(
name='resnet101_v1d', dataset='custom', pretrained=pretrained,
features=features, top_features=top_features, classes=classes,
short=600, max_size=1000, train_patterns=train_patterns,
nms_thresh=0.5, nms_topk=400, post_nms=100,
roi_mode='align', roi_size=(14, 14), strides=16, clip=4.14,
rpn_channel=1024, base_size=16, scales=(2, 4, 8, 16, 32),
ratios=(0.5, 1, 2), alloc_size=(128, 128), rpn_nms_thresh=0.7,
rpn_train_pre_nms=12000, rpn_train_post_nms=2000,
rpn_test_pre_nms=6000, rpn_test_post_nms=300, rpn_min_size=16,
num_sample=128, pos_iou_thresh=0.5, pos_ratio=0.25, max_num_gt=3000,
**kwargs)
else:
net = faster_rcnn_resnet101_v1d_coco(pretrained=True)
reuse_classes = [x for x in classes if x in net.classes]
net.reset_class(classes, reuse_weights=reuse_classes)
return net
import dgl
import gluoncv as gcv
import mxnet as mx
import numpy as np
from mxnet import nd
from mxnet.gluon import nn
from dgl.utils import toindex
import pickle
from dgl.nn.mxnet import GraphConv
__all__ = ['RelDN']
class EdgeConfMLP(nn.Block):
'''compute the confidence for edges'''
def __init__(self):
super(EdgeConfMLP, self).__init__()
def forward(self, edges):
score_pred = nd.log_softmax(edges.data['preds'])[:,1:].max(axis=1)
score_phr = score_pred + edges.src['node_class_logit'] + edges.dst['node_class_logit']
return {'score_pred': score_pred,
'score_phr': score_phr}
class EdgeBBoxExtend(nn.Block):
'''encode the bounding boxes'''
def __init__(self):
super(EdgeBBoxExtend, self).__init__()
def bbox_delta(self, bbox_a, bbox_b):
n = bbox_a.shape[0]
result = nd.zeros((n, 4), ctx=bbox_a.context)
result[:,0] = bbox_a[:,0] - bbox_b[:,0]
result[:,1] = bbox_a[:,1] - bbox_b[:,1]
result[:,2] = nd.log((bbox_a[:,2] - bbox_a[:,0] + 1e-8) / (bbox_b[:,2] - bbox_b[:,0] + 1e-8))
result[:,3] = nd.log((bbox_a[:,3] - bbox_a[:,1] + 1e-8) / (bbox_b[:,3] - bbox_b[:,1] + 1e-8))
return result
def forward(self, edges):
ctx = edges.src['pred_bbox'].context
n = edges.src['pred_bbox'].shape[0]
delta_src_obj = self.bbox_delta(edges.src['pred_bbox'], edges.dst['pred_bbox'])
delta_src_rel = self.bbox_delta(edges.src['pred_bbox'], edges.data['rel_bbox'])
delta_rel_obj = self.bbox_delta(edges.data['rel_bbox'], edges.dst['pred_bbox'])
result = nd.zeros((n, 12), ctx=ctx)
result[:,0:4] = delta_src_obj
result[:,4:8] = delta_src_rel
result[:,8:12] = delta_rel_obj
return {'pred_bbox_additional': result}
class EdgeFreqPrior(nn.Block):
'''make use of the pre-trained frequency prior'''
def __init__(self, prior_pkl):
super(EdgeFreqPrior, self).__init__()
with open(prior_pkl, 'rb') as f:
freq_prior = pickle.load(f)
self.freq_prior = freq_prior
def forward(self, edges):
ctx = edges.src['node_class_pred'].context
src_ind = edges.src['node_class_pred'].asnumpy().astype(int)
dst_ind = edges.dst['node_class_pred'].asnumpy().astype(int)
prob = self.freq_prior[src_ind, dst_ind]
out = nd.array(prob, ctx=ctx)
return {'freq_prior': out}
class EdgeSpatial(nn.Block):
'''spatial feature branch'''
def __init__(self, n_classes):
super(EdgeSpatial, self).__init__()
self.mlp = nn.Sequential()
self.mlp.add(nn.Dense(64))
self.mlp.add(nn.LeakyReLU(0.1))
self.mlp.add(nn.Dense(64))
self.mlp.add(nn.LeakyReLU(0.1))
self.mlp.add(nn.Dense(n_classes))
def forward(self, edges):
feat = nd.concat(edges.src['pred_bbox'], edges.dst['pred_bbox'],
edges.data['rel_bbox'], edges.data['pred_bbox_additional'])
out = self.mlp(feat)
return {'spatial': out}
class EdgeVisual(nn.Block):
'''visual feature branch'''
def __init__(self, n_classes, vis_feat_dim=7*7*3):
super(EdgeVisual, self).__init__()
self.dim_in = vis_feat_dim
self.mlp_joint = nn.Sequential()
self.mlp_joint.add(nn.Dense(vis_feat_dim // 2))
self.mlp_joint.add(nn.LeakyReLU(0.1))
self.mlp_joint.add(nn.Dense(vis_feat_dim // 3))
self.mlp_joint.add(nn.LeakyReLU(0.1))
self.mlp_joint.add(nn.Dense(n_classes))
self.mlp_sub = nn.Dense(n_classes)
self.mlp_ob = nn.Dense(n_classes)
def forward(self, edges):
feat = nd.concat(edges.src['node_feat'], edges.dst['node_feat'], edges.data['edge_feat'])
out_joint = self.mlp_joint(feat)
out_sub = self.mlp_sub(edges.src['node_feat'])
out_ob = self.mlp_ob(edges.dst['node_feat'])
out = out_joint + out_sub + out_ob
return {'visual': out}
class RelDN(nn.Block):
'''The RelDN Model'''
def __init__(self, n_classes, prior_pkl, semantic_only=False):
super(RelDN, self).__init__()
# output layers
self.edge_bbox_extend = EdgeBBoxExtend()
# semantic through mlp encoding
if prior_pkl is not None:
self.freq_prior = EdgeFreqPrior(prior_pkl)
# with predicate class and a link class
self.spatial = EdgeSpatial(n_classes + 1)
# with visual features
self.visual = EdgeVisual(n_classes + 1)
self.edge_conf_mlp = EdgeConfMLP()
self.semantic_only = semantic_only
def forward(self, g):
if g is None or g.number_of_nodes() == 0:
return g
# predictions
g.apply_edges(self.freq_prior)
if self.semantic_only:
g.edata['preds'] = g.edata['freq_prior']
else:
# bbox extension
g.apply_edges(self.edge_bbox_extend)
g.apply_edges(self.spatial)
g.apply_edges(self.visual)
g.edata['preds'] = g.edata['freq_prior'] + g.edata['spatial'] + g.edata['visual']
# subgraph for gconv
g.apply_edges(self.edge_conf_mlp)
return g
"""Train Faster-RCNN end to end."""
import argparse
import os
# disable autotune
os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'
import logging
import time
import numpy as np
import mxnet as mx
from mxnet import gluon
from mxnet import autograd
from mxnet.contrib import amp
import gluoncv as gcv
from gluoncv import data as gdata
from gluoncv import utils as gutils
from gluoncv.model_zoo import get_model
from gluoncv.data.batchify import FasterRCNNTrainBatchify, Tuple, Append
from gluoncv.data.transforms.presets.rcnn import FasterRCNNDefaultTrainTransform, \
FasterRCNNDefaultValTransform
from gluoncv.utils.metrics.voc_detection import VOC07MApMetric
from gluoncv.utils.metrics.coco_detection import COCODetectionMetric
from gluoncv.utils.parallel import Parallelizable, Parallel
from gluoncv.utils.metrics.rcnn import RPNAccMetric, RPNL1LossMetric, RCNNAccMetric, \
RCNNL1LossMetric
from data import *
from model import faster_rcnn_resnet101_v1d_custom, faster_rcnn_resnet50_v1b_custom
try:
import horovod.mxnet as hvd
except ImportError:
hvd = None
def parse_args():
parser = argparse.ArgumentParser(description='Train Faster-RCNN networks e2e.')
parser.add_argument('--network', type=str, default='resnet101_v1d',
help="Base network name which serves as feature extraction base.")
parser.add_argument('--dataset', type=str, default='visualgenome',
                        help='Training dataset. Now supports voc, coco and visualgenome.')
parser.add_argument('--num-workers', '-j', dest='num_workers', type=int,
default=8, help='Number of data workers, you can use larger '
'number to accelerate data loading, '
'if your CPU and GPUs are powerful.')
parser.add_argument('--batch-size', type=int, default=8, help='Training mini-batch size.')
parser.add_argument('--gpus', type=str, default='0',
help='Training with GPUs, you can specify 1,3 for example.')
parser.add_argument('--epochs', type=str, default='',
help='Training epochs.')
parser.add_argument('--resume', type=str, default='',
help='Resume from previously saved parameters if not None. '
'For example, you can resume from ./faster_rcnn_xxx_0123.params')
parser.add_argument('--start-epoch', type=int, default=0,
                        help='Starting epoch for resuming, default is 0 for new training. '
                             'You can specify 100, for example, to resume from epoch 100.')
parser.add_argument('--lr', type=str, default='',
help='Learning rate, default is 0.001 for voc single gpu training.')
parser.add_argument('--lr-decay', type=float, default=0.1,
help='decay rate of learning rate. default is 0.1.')
parser.add_argument('--lr-decay-epoch', type=str, default='',
help='epochs at which learning rate decays. default is 14,20 for voc.')
parser.add_argument('--lr-warmup', type=str, default='',
help='warmup iterations to adjust learning rate, default is 0 for voc.')
parser.add_argument('--lr-warmup-factor', type=float, default=1. / 3.,
help='warmup factor of base lr.')
parser.add_argument('--momentum', type=float, default=0.9,
help='SGD momentum, default is 0.9')
parser.add_argument('--wd', type=str, default='',
help='Weight decay, default is 5e-4 for voc')
parser.add_argument('--log-interval', type=int, default=100,
help='Logging mini-batch interval. Default is 100.')
parser.add_argument('--save-prefix', type=str, default='',
help='Saving parameter prefix')
parser.add_argument('--save-interval', type=int, default=1,
help='Saving parameters epoch interval, best model will always be saved.')
parser.add_argument('--val-interval', type=int, default=1,
                        help='Epoch interval for validation, increasing the number will reduce the '
                             'training time if validation is slow.')
parser.add_argument('--seed', type=int, default=233,
help='Random seed to be fixed.')
parser.add_argument('--verbose', dest='verbose', action='store_true',
help='Print helpful debugging info once set.')
parser.add_argument('--mixup', action='store_true', help='Use mixup training.')
parser.add_argument('--no-mixup-epochs', type=int, default=20,
help='Disable mixup training if enabled in the last N epochs.')
# Norm layer options
parser.add_argument('--norm-layer', type=str, default=None,
help='Type of normalization layer to use. '
'If set to None, backbone normalization layer will be fixed,'
' and no normalization layer will be used. '
'Currently supports \'bn\', and None, default is None.'
'Note that if horovod is enabled, sync bn will not work correctly.')
# FPN options
parser.add_argument('--use-fpn', action='store_true',
help='Whether to use feature pyramid network.')
# Performance options
parser.add_argument('--disable-hybridization', action='store_true',
                        help='Whether to disable hybridizing the model. '
                             'Memory usage and speed will decrease.')
parser.add_argument('--static-alloc', action='store_true',
help='Whether to use static memory allocation. Memory usage will increase.')
parser.add_argument('--amp', action='store_true',
help='Use MXNet AMP for mixed precision training.')
parser.add_argument('--horovod', action='store_true',
help='Use MXNet Horovod for distributed training. Must be run with OpenMPI. '
'--gpus is ignored when using --horovod.')
parser.add_argument('--executor-threads', type=int, default=1,
help='Number of threads for executor for scheduling ops. '
'More threads may incur higher GPU memory footprint, '
'but may speed up throughput. Note that when horovod is used, '
'it is set to 1.')
parser.add_argument('--kv-store', type=str, default='nccl',
help='KV store options. local, device, nccl, dist_sync, dist_device_sync, '
'dist_async are available.')
args = parser.parse_args()
if args.horovod:
if hvd is None:
raise SystemExit("Horovod not found, please check if you installed it correctly.")
hvd.init()
if args.dataset == 'voc':
args.epochs = int(args.epochs) if args.epochs else 20
args.lr_decay_epoch = args.lr_decay_epoch if args.lr_decay_epoch else '14,20'
args.lr = float(args.lr) if args.lr else 0.001
args.lr_warmup = args.lr_warmup if args.lr_warmup else -1
args.wd = float(args.wd) if args.wd else 5e-4
elif args.dataset == 'visualgenome':
args.epochs = int(args.epochs) if args.epochs else 20
args.lr_decay_epoch = args.lr_decay_epoch if args.lr_decay_epoch else '14,20'
args.lr = float(args.lr) if args.lr else 0.001
args.lr_warmup = args.lr_warmup if args.lr_warmup else -1
args.wd = float(args.wd) if args.wd else 5e-4
elif args.dataset == 'coco':
args.epochs = int(args.epochs) if args.epochs else 26
args.lr_decay_epoch = args.lr_decay_epoch if args.lr_decay_epoch else '17,23'
args.lr = float(args.lr) if args.lr else 0.01
args.lr_warmup = args.lr_warmup if args.lr_warmup else 1000
args.wd = float(args.wd) if args.wd else 1e-4
return args
def get_dataset(dataset, args):
if dataset.lower() == 'voc':
train_dataset = gdata.VOCDetection(
splits=[(2007, 'trainval'), (2012, 'trainval')])
val_dataset = gdata.VOCDetection(
splits=[(2007, 'test')])
val_metric = VOC07MApMetric(iou_thresh=0.5, class_names=val_dataset.classes)
elif dataset.lower() == 'coco':
train_dataset = gdata.COCODetection(splits='instances_train2017', use_crowd=False)
val_dataset = gdata.COCODetection(splits='instances_val2017', skip_empty=False)
val_metric = COCODetectionMetric(val_dataset, args.save_prefix + '_eval', cleanup=True)
elif dataset.lower() == 'visualgenome':
train_dataset = VGObject(root=os.path.join('~', '.mxnet', 'datasets', 'visualgenome'),
splits='detections_train', use_crowd=False)
val_dataset = VGObject(root=os.path.join('~', '.mxnet', 'datasets', 'visualgenome'),
splits='detections_val', skip_empty=False)
val_metric = COCODetectionMetric(val_dataset, args.save_prefix + '_eval', cleanup=True)
else:
raise NotImplementedError('Dataset: {} not implemented.'.format(dataset))
if args.mixup:
from gluoncv.data.mixup import detection
train_dataset = detection.MixupDetection(train_dataset)
return train_dataset, val_dataset, val_metric
def get_dataloader(net, train_dataset, val_dataset, train_transform, val_transform, batch_size,
num_shards, args):
"""Get dataloader."""
train_bfn = FasterRCNNTrainBatchify(net, num_shards)
if hasattr(train_dataset, 'get_im_aspect_ratio'):
im_aspect_ratio = train_dataset.get_im_aspect_ratio()
else:
im_aspect_ratio = [1.] * len(train_dataset)
train_sampler = \
gcv.nn.sampler.SplitSortedBucketSampler(im_aspect_ratio, batch_size,
num_parts=hvd.size() if args.horovod else 1,
part_index=hvd.rank() if args.horovod else 0,
shuffle=True)
train_loader = mx.gluon.data.DataLoader(train_dataset.transform(
train_transform(net.short, net.max_size, net, ashape=net.ashape, multi_stage=args.use_fpn)),
batch_sampler=train_sampler, batchify_fn=train_bfn, num_workers=args.num_workers)
if val_dataset is None:
val_loader = None
else:
val_bfn = Tuple(*[Append() for _ in range(3)])
short = net.short[-1] if isinstance(net.short, (tuple, list)) else net.short
# validation use 1 sample per device
val_loader = mx.gluon.data.DataLoader(
val_dataset.transform(val_transform(short, net.max_size)), num_shards, False,
batchify_fn=val_bfn, last_batch='keep', num_workers=args.num_workers)
return train_loader, val_loader
def save_params(net, logger, best_map, current_map, epoch, save_interval, prefix):
current_map = float(current_map)
if current_map > best_map[0]:
logger.info('[Epoch {}] mAP {} higher than current best {}, saving to {}'.format(
epoch, current_map, best_map, '{:s}_best.params'.format(prefix)))
best_map[0] = current_map
net.save_parameters('{:s}_best.params'.format(prefix))
with open(prefix + '_best_map.log', 'a') as f:
f.write('{:04d}:\t{:.4f}\n'.format(epoch, current_map))
if save_interval and (epoch + 1) % save_interval == 0:
logger.info('[Epoch {}] Saving parameters to {}'.format(
epoch, '{:s}_{:04d}_{:.4f}.params'.format(prefix, epoch, current_map)))
net.save_parameters('{:s}_{:04d}_{:.4f}.params'.format(prefix, epoch, current_map))
def split_and_load(batch, ctx_list):
"""Split data to 1 batch each device."""
new_batch = []
for i, data in enumerate(batch):
if isinstance(data, (list, tuple)):
new_data = [x.as_in_context(ctx) for x, ctx in zip(data, ctx_list)]
else:
new_data = [data.as_in_context(ctx_list[0])]
new_batch.append(new_data)
return new_batch
def validate(net, val_data, ctx, eval_metric, args):
"""Test on validation dataset."""
clipper = gcv.nn.bbox.BBoxClipToImage()
eval_metric.reset()
if not args.disable_hybridization:
# input format is different from training, thus re-hybridization is needed.
net.hybridize(static_alloc=args.static_alloc)
for i, batch in enumerate(val_data):
batch = split_and_load(batch, ctx_list=ctx)
det_bboxes = []
det_ids = []
det_scores = []
gt_bboxes = []
gt_ids = []
gt_difficults = []
for x, y, im_scale in zip(*batch):
# get prediction results
ids, scores, bboxes = net(x)
det_ids.append(ids)
det_scores.append(scores)
# clip to image size
det_bboxes.append(clipper(bboxes, x))
# rescale to original resolution
im_scale = im_scale.reshape((-1)).asscalar()
det_bboxes[-1] *= im_scale
# split ground truths
gt_ids.append(y.slice_axis(axis=-1, begin=4, end=5))
gt_bboxes.append(y.slice_axis(axis=-1, begin=0, end=4))
gt_bboxes[-1] *= im_scale
gt_difficults.append(y.slice_axis(axis=-1, begin=5, end=6) if y.shape[-1] > 5 else None)
# update metric
for det_bbox, det_id, det_score, gt_bbox, gt_id, gt_diff in zip(det_bboxes, det_ids,
det_scores, gt_bboxes,
gt_ids, gt_difficults):
eval_metric.update(det_bbox, det_id, det_score, gt_bbox, gt_id, gt_diff)
return eval_metric.get()
def get_lr_at_iter(alpha, lr_warmup_factor=1. / 3.):
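# Linear warm-up: interpolate from lr_warmup_factor to 1 as alpha goes from 0 to 1.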
return lr_warmup_factor * (1 - alpha) + alpha
class ForwardBackwardTask(Parallelizable):
def __init__(self, net, optimizer, rpn_cls_loss, rpn_box_loss, rcnn_cls_loss, rcnn_box_loss,
mix_ratio):
super(ForwardBackwardTask, self).__init__()
self.net = net
self._optimizer = optimizer
self.rpn_cls_loss = rpn_cls_loss
self.rpn_box_loss = rpn_box_loss
self.rcnn_cls_loss = rcnn_cls_loss
self.rcnn_box_loss = rcnn_box_loss
self.mix_ratio = mix_ratio
def forward_backward(self, x):
data, label, rpn_cls_targets, rpn_box_targets, rpn_box_masks = x
with autograd.record():
gt_label = label[:, :, 4:5]
gt_box = label[:, :, :4]
cls_pred, box_pred, roi, samples, matches, rpn_score, rpn_box, anchors, cls_targets, \
box_targets, box_masks, _ = net(data, gt_box, gt_label)
# losses of rpn
rpn_score = rpn_score.squeeze(axis=-1)
num_rpn_pos = (rpn_cls_targets >= 0).sum()
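# Normalize both RPN losses by the number of non-ignored anchor samples so the scale is batch-independent.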
rpn_loss1 = self.rpn_cls_loss(rpn_score, rpn_cls_targets,
rpn_cls_targets >= 0) * rpn_cls_targets.size / num_rpn_pos
rpn_loss2 = self.rpn_box_loss(rpn_box, rpn_box_targets,
rpn_box_masks) * rpn_box.size / num_rpn_pos
# rpn overall loss, use sum rather than average
rpn_loss = rpn_loss1 + rpn_loss2
# losses of rcnn
num_rcnn_pos = (cls_targets >= 0).sum()
rcnn_loss1 = self.rcnn_cls_loss(cls_pred, cls_targets,
cls_targets.expand_dims(-1) >= 0) * cls_targets.size / \
num_rcnn_pos
rcnn_loss2 = self.rcnn_box_loss(box_pred, box_targets, box_masks) * box_pred.size / \
num_rcnn_pos
rcnn_loss = rcnn_loss1 + rcnn_loss2
# overall losses
total_loss = rpn_loss.sum() * self.mix_ratio + rcnn_loss.sum() * self.mix_ratio
rpn_loss1_metric = rpn_loss1.mean() * self.mix_ratio
rpn_loss2_metric = rpn_loss2.mean() * self.mix_ratio
rcnn_loss1_metric = rcnn_loss1.mean() * self.mix_ratio
rcnn_loss2_metric = rcnn_loss2.mean() * self.mix_ratio
rpn_acc_metric = [[rpn_cls_targets, rpn_cls_targets >= 0], [rpn_score]]
rpn_l1_loss_metric = [[rpn_box_targets, rpn_box_masks], [rpn_box]]
rcnn_acc_metric = [[cls_targets], [cls_pred]]
rcnn_l1_loss_metric = [[box_targets, box_masks], [box_pred]]
if args.amp:
with amp.scale_loss(total_loss, self._optimizer) as scaled_losses:
autograd.backward(scaled_losses)
else:
total_loss.backward()
return rpn_loss1_metric, rpn_loss2_metric, rcnn_loss1_metric, rcnn_loss2_metric, \
rpn_acc_metric, rpn_l1_loss_metric, rcnn_acc_metric, rcnn_l1_loss_metric
def train(net, train_data, val_data, eval_metric, batch_size, ctx, args):
"""Training pipeline"""
args.kv_store = 'device' if (args.amp and 'nccl' in args.kv_store) else args.kv_store
kv = mx.kvstore.create(args.kv_store)
net.collect_params().setattr('grad_req', 'null')
net.collect_train_params().setattr('grad_req', 'write')
optimizer_params = {'learning_rate': args.lr, 'wd': args.wd, 'momentum': args.momentum}
if args.horovod:
hvd.broadcast_parameters(net.collect_params(), root_rank=0)
trainer = hvd.DistributedTrainer(
net.collect_train_params(), # fix batchnorm, fix first stage, etc...
'sgd',
optimizer_params)
else:
trainer = gluon.Trainer(
net.collect_train_params(), # fix batchnorm, fix first stage, etc...
'sgd',
optimizer_params,
update_on_kvstore=(False if args.amp else None), kvstore=kv)
if args.amp:
amp.init_trainer(trainer)
# lr decay policy
lr_decay = float(args.lr_decay)
lr_steps = sorted([float(ls) for ls in args.lr_decay_epoch.split(',') if ls.strip()])
lr_warmup = float(args.lr_warmup) # avoid int division
# TODO(zhreshold) losses?
rpn_cls_loss = mx.gluon.loss.SigmoidBinaryCrossEntropyLoss(from_sigmoid=False)
rpn_box_loss = mx.gluon.loss.HuberLoss(rho=1 / 9.) # == smoothl1
rcnn_cls_loss = mx.gluon.loss.SoftmaxCrossEntropyLoss()
rcnn_box_loss = mx.gluon.loss.HuberLoss() # == smoothl1
metrics = [mx.metric.Loss('RPN_Conf'),
mx.metric.Loss('RPN_SmoothL1'),
mx.metric.Loss('RCNN_CrossEntropy'),
mx.metric.Loss('RCNN_SmoothL1'), ]
rpn_acc_metric = RPNAccMetric()
rpn_bbox_metric = RPNL1LossMetric()
rcnn_acc_metric = RCNNAccMetric()
rcnn_bbox_metric = RCNNL1LossMetric()
metrics2 = [rpn_acc_metric, rpn_bbox_metric, rcnn_acc_metric, rcnn_bbox_metric]
# set up logger
logging.basicConfig()
logger = logging.getLogger()
logger.setLevel(logging.INFO)
log_file_path = args.save_prefix + '_train.log'
log_dir = os.path.dirname(log_file_path)
if log_dir and not os.path.exists(log_dir):
os.makedirs(log_dir)
fh = logging.FileHandler(log_file_path)
logger.addHandler(fh)
logger.info(args)
if args.verbose:
logger.info('Trainable parameters:')
logger.info(net.collect_train_params().keys())
logger.info('Start training from [Epoch {}]'.format(args.start_epoch))
best_map = [0]
for epoch in range(args.start_epoch, args.epochs):
mix_ratio = 1.0
if not args.disable_hybridization:
net.hybridize(static_alloc=args.static_alloc)
rcnn_task = ForwardBackwardTask(net, trainer, rpn_cls_loss, rpn_box_loss, rcnn_cls_loss,
rcnn_box_loss, mix_ratio=1.0)
executor = Parallel(args.executor_threads, rcnn_task) if not args.horovod else None
if args.mixup:
# TODO(zhreshold) only support evenly mixup now, target generator needs to be modified otherwise
train_data._dataset._data.set_mixup(np.random.uniform, 0.5, 0.5)
mix_ratio = 0.5
if epoch >= args.epochs - args.no_mixup_epochs:
train_data._dataset._data.set_mixup(None)
mix_ratio = 1.0
while lr_steps and epoch >= lr_steps[0]:
new_lr = trainer.learning_rate * lr_decay
lr_steps.pop(0)
trainer.set_learning_rate(new_lr)
logger.info("[Epoch {}] Set learning rate to {}".format(epoch, new_lr))
for metric in metrics:
metric.reset()
tic = time.time()
btic = time.time()
base_lr = trainer.learning_rate
rcnn_task.mix_ratio = mix_ratio
logger.info('Total Num of Batches: %d'%(len(train_data)))
for i, batch in enumerate(train_data):
if epoch == 0 and i <= lr_warmup:
# adjust based on real percentage
new_lr = base_lr * get_lr_at_iter(i / lr_warmup, args.lr_warmup_factor)
if new_lr != trainer.learning_rate:
if i % args.log_interval == 0:
logger.info(
'[Epoch 0 Iteration {}] Set learning rate to {}'.format(i, new_lr))
trainer.set_learning_rate(new_lr)
batch = split_and_load(batch, ctx_list=ctx)
metric_losses = [[] for _ in metrics]
add_losses = [[] for _ in metrics2]
if executor is not None:
for data in zip(*batch):
executor.put(data)
for j in range(len(ctx)):
if executor is not None:
result = executor.get()
else:
result = rcnn_task.forward_backward(list(zip(*batch))[0])
if (not args.horovod) or hvd.rank() == 0:
for k in range(len(metric_losses)):
metric_losses[k].append(result[k])
for k in range(len(add_losses)):
add_losses[k].append(result[len(metric_losses) + k])
for metric, record in zip(metrics, metric_losses):
metric.update(0, record)
for metric, records in zip(metrics2, add_losses):
for pred in records:
metric.update(pred[0], pred[1])
trainer.step(batch_size)
# update metrics
if (not args.horovod or hvd.rank() == 0) and args.log_interval \
and not (i + 1) % args.log_interval:
msg = ','.join(
['{}={:.3f}'.format(*metric.get()) for metric in metrics + metrics2])
logger.info('[Epoch {}][Batch {}], Speed: {:.3f} samples/sec, {}'.format(
epoch, i, args.log_interval * args.batch_size / (time.time() - btic), msg))
btic = time.time()
if (not args.horovod) or hvd.rank() == 0:
msg = ','.join(['{}={:.3f}'.format(*metric.get()) for metric in metrics])
logger.info('[Epoch {}] Training cost: {:.3f}, {}'.format(
epoch, (time.time() - tic), msg))
if not (epoch + 1) % args.val_interval:
# consider reduce the frequency of validation to save time
if val_data is not None:
map_name, mean_ap = validate(net, val_data, ctx, eval_metric, args)
val_msg = '\n'.join(['{}={}'.format(k, v) for k, v in zip(map_name, mean_ap)])
logger.info('[Epoch {}] Validation: \n{}'.format(epoch, val_msg))
current_map = float(mean_ap[-1])
else:
current_map = 0
else:
current_map = 0.
save_params(net, logger, best_map, current_map, epoch, args.save_interval,
args.save_prefix)
if __name__ == '__main__':
import sys
sys.setrecursionlimit(1100)
args = parse_args()
# fix seed for mxnet, numpy and python builtin random generator.
gutils.random.seed(args.seed)
if args.amp:
amp.init()
# training contexts
if args.horovod:
ctx = [mx.gpu(hvd.local_rank())]
else:
ctx = [mx.gpu(int(i)) for i in args.gpus.split(',') if i.strip()]
ctx = ctx if ctx else [mx.cpu()]
# network
kwargs = {}
module_list = []
if args.use_fpn:
module_list.append('fpn')
if args.norm_layer is not None:
module_list.append(args.norm_layer)
if args.norm_layer == 'bn':
kwargs['num_devices'] = len(args.gpus.split(','))
net_name = '_'.join(('faster_rcnn', *module_list, args.network, 'custom'))
args.save_prefix += net_name
gutils.makedirs(args.save_prefix)
train_dataset, val_dataset, eval_metric = get_dataset(args.dataset, args)
net = faster_rcnn_resnet101_v1d_custom(classes=train_dataset.classes, transfer='coco',
pretrained_base=False, additional_output=False,
per_device_batch_size=args.batch_size // len(ctx), **kwargs)
if args.resume.strip():
net.load_parameters(args.resume.strip())
else:
for param in net.collect_params().values():
if param._data is not None:
continue
param.initialize()
net.collect_params().reset_ctx(ctx)
# training data
batch_size = args.batch_size // len(ctx) if args.horovod else args.batch_size
train_data, val_data = get_dataloader(
net, train_dataset, val_dataset, FasterRCNNDefaultTrainTransform,
FasterRCNNDefaultValTransform, batch_size, len(ctx), args)
# training
train(net, train_data, val_data, eval_metric, batch_size, ctx, args)
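# Preset command for training Faster R-CNN on VisualGenome with 8 GPUs; the MXNET_* variables
# disable cuDNN autotuning and use a rounded GPU memory pool to reduce memory fragmentation.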
MXNET_CUDNN_AUTOTUNE_DEFAULT=0 CUDNN_AUTOTUNE_DEFAULT=0 MXNET_GPU_MEM_POOL_TYPE=Round MXNET_GPU_MEM_POOL_ROUND_LINEAR_CUTOFF=28 python train_faster_rcnn.py \
--gpus 0,1,2,3,4,5,6,7 --dataset visualgenome -j 60 --batch-size 8 --val-interval 20 --save-prefix faster_rcnn_resnet101_v1d_visualgenome/
import numpy as np
import json, pickle, os, argparse
def parse_args():
parser = argparse.ArgumentParser(description='Train the Frequency Prior for RelDN.')
parser.add_argument('--overlap', action='store_true',
help="Only count overlap boxes.")
parser.add_argument('--json-path', type=str, default='~/.mxnet/datasets/visualgenome',
help="Only count overlap boxes.")
args = parser.parse_args()
return args
args = parse_args()
use_overlap = args.overlap
PATH_TO_DATASETS = os.path.expanduser(args.json_path)
path_to_json = os.path.join(PATH_TO_DATASETS, 'rel_annotations_train.json')
# format in y1y2x1x2
def with_overlap(boxA, boxB):
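# Return 1 if the two boxes (y1, y2, x1, x2 format) overlap along both axes, else 0.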
xA = max(boxA[2], boxB[2])
xB = min(boxA[3], boxB[3])
if xB > xA:
yA = max(boxA[0], boxB[0])
yB = min(boxA[1], boxB[1])
if yB > yA:
return 1
return 0
def box_ious(boxes):
n = len(boxes)
res = np.zeros((n, n))
for i in range(n-1):
for j in range(i+1, n):
iou_val = with_overlap(boxes[i], boxes[j])
res[i, j] = iou_val
res[j, i] = iou_val
return res
with open(path_to_json, 'r') as f:
tmp = f.read()
train_data = json.loads(tmp)
fg_matrix = np.zeros((150, 150, 51), dtype=np.int64)
bg_matrix = np.zeros((150, 150), dtype=np.int64)
for _, item in train_data.items():
gt_box_to_label = {}
for rel in item:
sub_bbox = rel['subject']['bbox']
ob_bbox = rel['object']['bbox']
sub_class = rel['subject']['category']
ob_class = rel['object']['category']
rel_class = rel['predicate']
sub_node = tuple(sub_bbox)
ob_node = tuple(ob_bbox)
if sub_node not in gt_box_to_label:
gt_box_to_label[sub_node] = sub_class
if ob_node not in gt_box_to_label:
gt_box_to_label[ob_node] = ob_class
fg_matrix[sub_class, ob_class, rel_class + 1] += 1
if use_overlap:
gt_boxes = [*gt_box_to_label]
gt_classes = np.array([*gt_box_to_label.values()])
iou_mat = box_ious(gt_boxes)
cols, rows = np.where(iou_mat)
if len(cols) and len(rows):
for col, row in zip(cols, rows):
bg_matrix[gt_classes[col], gt_classes[row]] += 1
else:
all_possib = np.ones_like(iou_mat, dtype=bool)
np.fill_diagonal(all_possib, 0)
cols, rows = np.where(all_possib)
for col, row in zip(cols, rows):
bg_matrix[gt_classes[col], gt_classes[row]] += 1
else:
for b1, l1 in gt_box_to_label.items():
for b2, l2 in gt_box_to_label.items():
if b1 == b2:
continue
bg_matrix[l1, l2] += 1
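# Build the frequency prior: add-one smooth the background counts, store them in relation slot 0,
# and take the log of the per-(subject, object) conditional relation distribution.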
eps = 1e-3
bg_matrix += 1
fg_matrix[:, :, 0] = bg_matrix
pred_dist = np.log(fg_matrix / (fg_matrix.sum(2)[:, :, None] + eps) + eps)
if use_overlap:
with open('freq_prior_overlap.pkl', 'wb') as f:
pickle.dump(pred_dist, f)
else:
with open('freq_prior.pkl', 'wb') as f:
pickle.dump(pred_dist, f)
import dgl
import mxnet as mx
import numpy as np
import logging, time, argparse
from mxnet import nd, gluon
from gluoncv.data.batchify import Pad
from gluoncv.utils import makedirs
from model import faster_rcnn_resnet101_v1d_custom, RelDN
from utils import *
from data import *
def parse_args():
parser = argparse.ArgumentParser(description='Train RelDN Model.')
parser.add_argument('--gpus', type=str, default='0',
help="Training with GPUs, you can specify 1,3 for example.")
parser.add_argument('--batch-size', type=int, default=8,
help="Total batch-size for training.")
parser.add_argument('--epochs', type=int, default=9,
help="Training epochs.")
parser.add_argument('--lr-reldn', type=float, default=0.01,
help="Learning rate for RelDN module.")
parser.add_argument('--wd-reldn', type=float, default=0.0001,
help="Weight decay for RelDN module.")
parser.add_argument('--lr-faster-rcnn', type=float, default=0.01,
help="Learning rate for Faster R-CNN module.")
parser.add_argument('--wd-faster-rcnn', type=float, default=0.0001,
help="Weight decay for RelDN module.")
parser.add_argument('--lr-decay-epochs', type=str, default='5,8',
help="Learning rate decay points.")
parser.add_argument('--lr-warmup-iters', type=int, default=4000,
help="Learning rate warm-up iterations.")
parser.add_argument('--save-dir', type=str, default='params_resnet101_v1d_reldn',
help="Path to save model parameters.")
parser.add_argument('--log-dir', type=str, default='reldn_output.log',
help="Path to save training logs.")
parser.add_argument('--pretrained-faster-rcnn-params', type=str, required=True,
help="Path to saved Faster R-CNN model parameters.")
parser.add_argument('--freq-prior', type=str, default='freq_prior.pkl',
help="Path to saved frequency prior data.")
parser.add_argument('--verbose-freq', type=int, default=100,
help="Frequency of log printing in number of iterations.")
args = parser.parse_args()
return args
args = parse_args()
filehandler = logging.FileHandler(args.log_dir)
streamhandler = logging.StreamHandler()
logger = logging.getLogger('')
logger.setLevel(logging.INFO)
logger.addHandler(filehandler)
logger.addHandler(streamhandler)
# Hyperparams
ctx = [mx.gpu(int(i)) for i in args.gpus.split(',') if i.strip()]
if ctx:
num_gpus = len(ctx)
assert args.batch_size % num_gpus == 0
per_device_batch_size = int(args.batch_size / num_gpus)
else:
ctx = [mx.cpu()]
num_gpus = 1
per_device_batch_size = args.batch_size
aggregate_grad = per_device_batch_size > 1
nepoch = args.epochs
N_relations = 50
N_objects = 150
save_dir = args.save_dir
makedirs(save_dir)
batch_verbose_freq = args.verbose_freq
lr_decay_epochs = [int(i) for i in args.lr_decay_epochs.split(',')]
# Dataset and dataloader
vg_train = VGRelation(split='train')
logger.info('data loaded!')
train_data = gluon.data.DataLoader(vg_train, batch_size=len(ctx), shuffle=True, num_workers=8*num_gpus,
batchify_fn=dgl_mp_batchify_fn)
n_batches = len(train_data)
# Network definition
net = RelDN(n_classes=N_relations, prior_pkl=args.freq_prior)
net.spatial.initialize(mx.init.Normal(1e-4), ctx=ctx)
net.visual.initialize(mx.init.Normal(1e-4), ctx=ctx)
for k, v in net.collect_params().items():
v.grad_req = 'add' if aggregate_grad else 'write'
net_params = net.collect_params()
net_trainer = gluon.Trainer(net.collect_params(), 'adam',
{'learning_rate': args.lr_reldn, 'wd': args.wd_reldn})
det_params_path = args.pretrained_faster_rcnn_params
detector = faster_rcnn_resnet101_v1d_custom(classes=vg_train.obj_classes,
pretrained_base=False, pretrained=False,
additional_output=True)
detector.load_parameters(det_params_path, ctx=ctx, ignore_extra=True, allow_missing=True)
for k, v in detector.collect_params().items():
v.grad_req = 'null'
detector_feat = faster_rcnn_resnet101_v1d_custom(classes=vg_train.obj_classes,
pretrained_base=False, pretrained=False,
additional_output=True)
detector_feat.load_parameters(det_params_path, ctx=ctx, ignore_extra=True, allow_missing=True)
for k, v in detector_feat.collect_params().items():
v.grad_req = 'null'
for k, v in detector_feat.features.collect_params().items():
v.grad_req = 'add' if aggregate_grad else 'write'
det_params = detector_feat.features.collect_params()
det_trainer = gluon.Trainer(detector_feat.features.collect_params(), 'adam',
{'learning_rate': args.lr_faster_rcnn, 'wd': args.wd_faster_rcnn})
def get_data_batch(g_list, img_list, ctx_list):
if g_list is None or len(g_list) == 0:
return None, None
n_gpu = len(ctx_list)
size = len(g_list)
if size < n_gpu:
raise Exception("too small batch")
step = size // n_gpu
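# Split the graphs and images evenly across devices; the last device takes any remainder.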
G_list = [g_list[i*step:(i+1)*step] if i < n_gpu - 1 else g_list[i*step:size] for i in range(n_gpu)]
img_list = [img_list[i*step:(i+1)*step] if i < n_gpu - 1 else img_list[i*step:size] for i in range(n_gpu)]
for G_slice, ctx in zip(G_list, ctx_list):
for G in G_slice:
G.ndata['bbox'] = G.ndata['bbox'].as_in_context(ctx)
G.ndata['node_class'] = G.ndata['node_class'].as_in_context(ctx)
G.ndata['node_class_vec'] = G.ndata['node_class_vec'].as_in_context(ctx)
G.edata['rel_class'] = G.edata['rel_class'].as_in_context(ctx)
img_list = [img.as_in_context(ctx) for img in img_list]
return G_list, img_list
L_rel = gluon.loss.SoftmaxCELoss()
train_metric = mx.metric.Accuracy(name='rel_acc')
train_metric_top5 = mx.metric.TopKAccuracy(5, name='rel_acc_top5')
metric_list = [train_metric, train_metric_top5]
def batch_print(epoch, i, batch_verbose_freq, n_batches, btic, loss_rel_val, metric_list):
if (i+1) % batch_verbose_freq == 0:
print_txt = 'Epoch[%d] Batch[%d/%d], time: %d, loss_rel=%.4f '%\
(epoch, i, n_batches, int(time.time() - btic),
loss_rel_val / (i+1), )
for metric in metric_list:
metric_name, metric_val = metric.get()
print_txt += '%s=%.4f '%(metric_name, metric_val)
logger.info(print_txt)
btic = time.time()
loss_rel_val = 0
return btic, loss_rel_val
for epoch in range(nepoch):
loss_rel_val = 0
tic = time.time()
btic = time.time()
for metric in metric_list:
metric.reset()
if epoch == 0:
net_trainer_base_lr = net_trainer.learning_rate
det_trainer_base_lr = det_trainer.learning_rate
if epoch in lr_decay_epochs:
net_trainer.set_learning_rate(net_trainer.learning_rate*0.1)
det_trainer.set_learning_rate(det_trainer.learning_rate*0.1)
for i, (G_list, img_list) in enumerate(train_data):
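# Linear learning-rate warm-up over the first args.lr_warmup_iters iterations of epoch 0 (factor ramps from 1/3 to 1).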
if epoch == 0 and i < args.lr_warmup_iters:
alpha = i / args.lr_warmup_iters
warmup_factor = 1/3 * (1 - alpha) + alpha
net_trainer.set_learning_rate(net_trainer_base_lr*warmup_factor)
det_trainer.set_learning_rate(det_trainer_base_lr*warmup_factor)
G_list, img_list = get_data_batch(G_list, img_list, ctx)
if G_list is None or img_list is None:
btic, loss_rel_val = batch_print(epoch, i, batch_verbose_freq, n_batches, btic, loss_rel_val, metric_list)
continue
loss = []
detector_res_list = []
G_batch = []
bbox_pad = Pad(axis=(0))
with mx.autograd.record():
for G_slice, img in zip(G_list, img_list):
cur_ctx = img.context
bbox_list = [G.ndata['bbox'] for G in G_slice]
bbox_stack = bbox_pad(bbox_list).as_in_context(cur_ctx)
with mx.autograd.pause():
ids, scores, bbox, feat, feat_ind, spatial_feat = detector(img)
g_pred_batch = build_graph_train(G_slice, bbox_stack, img, ids, scores, bbox, feat_ind,
spatial_feat, scores_top_k=300, overlap=False)
g_batch = l0_sample(g_pred_batch)
if g_batch is None:
continue
rel_bbox = g_batch.edata['rel_bbox']
batch_id = g_batch.edata['batch_id'].asnumpy()
n_sample_edges = g_batch.number_of_edges()
n_graph = len(G_slice)
bbox_rel_list = []
for j in range(n_graph):
eids = np.where(batch_id == j)[0]
if len(eids) > 0:
bbox_rel_list.append(rel_bbox[eids])
bbox_rel_stack = bbox_pad(bbox_rel_list).as_in_context(cur_ctx)
img_size = img.shape[2:4]
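# rel_bbox is stored in relative coordinates; scale back to pixel coordinates so the detector can pool union-region features.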
bbox_rel_stack[:, :, 0] *= img_size[1]
bbox_rel_stack[:, :, 1] *= img_size[0]
bbox_rel_stack[:, :, 2] *= img_size[1]
bbox_rel_stack[:, :, 3] *= img_size[0]
_, _, _, spatial_feat_rel = detector_feat(img, None, None, bbox_rel_stack)
spatial_feat_rel_list = []
for j in range(n_graph):
eids = np.where(batch_id == j)[0]
if len(eids) > 0:
spatial_feat_rel_list.append(spatial_feat_rel[j, 0:len(eids)])
g_batch.edata['edge_feat'] = nd.concat(*spatial_feat_rel_list, dim=0)
G_batch.append(g_batch)
G_batch = [net(G) for G in G_batch]
for G_pred, img in zip(G_batch, img_list):
if G_pred is None or G_pred.number_of_nodes() == 0:
continue
loss_rel = L_rel(G_pred.edata['preds'], G_pred.edata['rel_class'],
G_pred.edata['sample_weights'])
loss.append(loss_rel.sum())
loss_rel_val += loss_rel.mean().asscalar() / num_gpus
if len(loss) == 0:
btic, loss_rel_val = batch_print(epoch, i, batch_verbose_freq, n_batches, btic, loss_rel_val, metric_list)
continue
for l in loss:
l.backward()
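# Step the optimizers once every per_device_batch_size iterations; when aggregate_grad is set,
# gradients accumulate (grad_req='add') and are cleared after the step to emulate a larger effective batch.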
if (i+1) % per_device_batch_size == 0 or i == n_batches - 1:
net_trainer.step(args.batch_size)
det_trainer.step(args.batch_size)
if aggregate_grad:
for k, v in net_params.items():
v.zero_grad()
for k, v in det_params.items():
v.zero_grad()
for G_pred, img_slice in zip(G_batch, img_list):
if G_pred is None or G_pred.number_of_nodes() == 0:
continue
link_ind = np.where(G_pred.edata['rel_class'].asnumpy() > 0)[0]
if len(link_ind) == 0:
continue
train_metric.update([G_pred.edata['rel_class'][link_ind]],
[G_pred.edata['preds'][link_ind]])
train_metric_top5.update([G_pred.edata['rel_class'][link_ind]],
[G_pred.edata['preds'][link_ind]])
btic, loss_rel_val = batch_print(epoch, i, batch_verbose_freq, n_batches, btic, loss_rel_val, metric_list)
if (i+1) % batch_verbose_freq == 0:
net.save_parameters('%s/model-%d.params'%(save_dir, epoch))
detector_feat.features.save_parameters('%s/detector_feat.features-%d.params'%(save_dir, epoch))
print_txt = 'Epoch[%d], time: %d, loss_rel=%.4f,'%\
(epoch, int(time.time() - tic),
loss_rel_val / (i+1))
for metric in metric_list:
metric_name, metric_val = metric.get()
print_txt += '%s=%.4f '%(metric_name, metric_val)
logger.info(print_txt)
net.save_parameters('%s/model-%d.params'%(save_dir, epoch))
detector_feat.features.save_parameters('%s/detector_feat.features-%d.params'%(save_dir, epoch))
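# Preset command for training RelDN on top of the pretrained Faster R-CNN detector;
# disabling cuDNN autotuning avoids repeated tuning overhead for variable-sized inputs.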
MXNET_CUDNN_AUTOTUNE_DEFAULT=0 python train_reldn.py \
--pretrained-faster-rcnn-params faster_rcnn_resnet101_v1d_visualgenome/faster_rcnn_resnet101_v1d_custom_best.params
from .metric import *
from .build_graph import *
from .sampling import *
from .viz import *
import dgl
from mxnet import nd
import numpy as np
def bbox_improve(bbox):
'''augment each bbox with its area as an extra feature column'''
area = (bbox[:,2] - bbox[:,0]) * (bbox[:,3] - bbox[:,1])
return nd.concat(bbox, area.expand_dims(1))
def extract_edge_bbox(g):
'''compute the union bbox covering the source and destination boxes of each edge'''
src, dst = g.edges(order='eid')
n = g.number_of_edges()
src_bbox = g.ndata['pred_bbox'][src.asnumpy()]
dst_bbox = g.ndata['pred_bbox'][dst.asnumpy()]
edge_bbox = nd.zeros((n, 4), ctx=g.ndata['pred_bbox'].context)
edge_bbox[:,0] = nd.stack(src_bbox[:,0], dst_bbox[:,0]).min(axis=0)
edge_bbox[:,1] = nd.stack(src_bbox[:,1], dst_bbox[:,1]).min(axis=0)
edge_bbox[:,2] = nd.stack(src_bbox[:,2], dst_bbox[:,2]).max(axis=0)
edge_bbox[:,3] = nd.stack(src_bbox[:,3], dst_bbox[:,3]).max(axis=0)
return edge_bbox
def build_graph_train(g_slice, gt_bbox, img, ids, scores, bbox, feat_ind,
spatial_feat, iou_thresh=0.5,
bbox_improvement=True, scores_top_k=50, overlap=False):
'''given ground-truth and predicted bboxes, assign labels to the predictions w.r.t. iou_thresh and build the training graphs'''
# match and re-factor the graph
img_size = img.shape[2:4]
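# Normalize ground-truth and predicted boxes to [0, 1] relative coordinates (img_size is (H, W)).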
gt_bbox[:, :, 0] /= img_size[1]
gt_bbox[:, :, 1] /= img_size[0]
gt_bbox[:, :, 2] /= img_size[1]
gt_bbox[:, :, 3] /= img_size[0]
bbox[:, :, 0] /= img_size[1]
bbox[:, :, 1] /= img_size[0]
bbox[:, :, 2] /= img_size[1]
bbox[:, :, 3] /= img_size[0]
n_graph = len(g_slice)
g_pred_batch = []
for gi in range(n_graph):
g = g_slice[gi]
ctx = g.ndata['bbox'].context
inds = np.where(scores[gi, :, 0].asnumpy() > 0)[0].tolist()
if len(inds) == 0:
return None
if len(inds) > scores_top_k:
top_score_inds = scores[gi, inds, 0].asnumpy().argsort()[::-1][0:scores_top_k]
inds = np.array(inds)[top_score_inds].tolist()
n_nodes = len(inds)
roi_ind = feat_ind[gi, inds].squeeze(axis=1)
g_pred = dgl.DGLGraph(multigraph=True)
g_pred.add_nodes(n_nodes, {'pred_bbox': bbox[gi, inds],
'node_feat': spatial_feat[gi, roi_ind],
'node_class_pred': ids[gi, inds, 0],
'node_class_logit': nd.log(scores[gi, inds, 0] + 1e-7)})
# iou matching
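# Greedy matching: repeatedly take the highest remaining IoU pair, stop once it falls below
# iou_thresh, and mark matched rows/columns as used so each box is matched at most once.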
ious = nd.contrib.box_iou(gt_bbox[gi], g_pred.ndata['pred_bbox']).asnumpy()
H, W = ious.shape
h = H
w = W
pred_to_gt_ind = np.array([-1 for i in range(W)])
pred_to_gt_class_match = [0 for i in range(W)]
pred_to_gt_class_match_id = [0 for i in range(W)]
while h > 0 and w > 0:
ind = int(ious.argmax())
row_ind = ind // W
col_ind = ind % W
if ious[row_ind, col_ind] < iou_thresh:
break
pred_to_gt_ind[col_ind] = row_ind
gt_node_class = g.ndata['node_class'][row_ind]
pred_node_class = g_pred.ndata['node_class_pred'][col_ind]
if gt_node_class == pred_node_class:
pred_to_gt_class_match[col_ind] = 1
pred_to_gt_class_match_id[col_ind] = row_ind
ious[row_ind, :] = -1
ious[:, col_ind] = -1
h -= 1
w -= 1
n_nodes = g_pred.number_of_nodes()
triplet = []
adjmat = np.zeros((n_nodes, n_nodes))
src, dst = g.all_edges(order='eid')
eid_keys = np.column_stack([src.asnumpy(), dst.asnumpy()])
eid_dict = {}
for i, key in enumerate(eid_keys):
k = tuple(key)
if k not in eid_dict:
eid_dict[k] = [i]
else:
eid_dict[k].append(i)
ori_rel_class = g.edata['rel_class'].asnumpy()
for i in range(n_nodes):
for j in range(n_nodes):
if i != j:
if pred_to_gt_class_match[i] and pred_to_gt_class_match[j]:
sub_gt_id = pred_to_gt_class_match_id[i]
ob_gt_id = pred_to_gt_class_match_id[j]
eids = eid_dict[(sub_gt_id, ob_gt_id)]
rel_cls = ori_rel_class[eids]
n_edges_between = len(rel_cls)
for ii in range(n_edges_between):
triplet.append((i, j, rel_cls[ii]))
adjmat[i,j] = 1
else:
triplet.append((i, j, 0))
src, dst, rel_class = tuple(zip(*triplet))
rel_class = nd.array(rel_class, ctx=ctx).expand_dims(1)
g_pred.add_edges(src, dst, data={'rel_class': rel_class})
# other operations
n_nodes = g_pred.number_of_nodes()
n_edges = g_pred.number_of_edges()
if bbox_improvement:
g_pred.ndata['pred_bbox'] = bbox_improve(g_pred.ndata['pred_bbox'])
g_pred.edata['rel_bbox'] = extract_edge_bbox(g_pred)
g_pred.edata['batch_id'] = nd.zeros((n_edges, 1), ctx = ctx) + gi
# remove non-overlapping edges
if overlap:
overlap_ious = nd.contrib.box_iou(g_pred.ndata['pred_bbox'][:,0:4],
g_pred.ndata['pred_bbox'][:,0:4]).asnumpy()
cols, rows = np.where(overlap_ious <= 1e-7)
if cols.shape[0] > 0:
eids = g_pred.edge_ids(cols, rows)[2].asnumpy().tolist()
if len(eids):
g_pred.remove_edges(eids)
if g_pred.number_of_edges() == 0:
g_pred = None
g_pred_batch.append(g_pred)
if n_graph > 1:
return dgl.batch(g_pred_batch)
else:
return g_pred_batch[0]
def build_graph_validate_gt_obj(img, gt_ids, bbox, spatial_feat,
bbox_improvement=True, overlap=False):
'''given ground truth bbox and label, build graph for validation'''
n_batch = img.shape[0]
img_size = img.shape[2:4]
bbox[:, :, 0] /= img_size[1]
bbox[:, :, 1] /= img_size[0]
bbox[:, :, 2] /= img_size[1]
bbox[:, :, 3] /= img_size[0]
ctx = img.context
g_batch = []
for btc in range(n_batch):
inds = np.where(bbox[btc].sum(1).asnumpy() > 0)[0].tolist()
if len(inds) == 0:
continue
n_nodes = len(inds)
g_pred = dgl.DGLGraph()
g_pred.add_nodes(n_nodes, {'pred_bbox': bbox[btc, inds],
'node_feat': spatial_feat[btc, inds],
'node_class_pred': gt_ids[btc, inds, 0],
'node_class_logit': nd.zeros_like(gt_ids[btc, inds, 0], ctx=ctx)})
edge_list = []
for i in range(n_nodes - 1):
for j in range(i + 1, n_nodes):
edge_list.append((i, j))
src, dst = tuple(zip(*edge_list))
g_pred.add_edges(src, dst)
g_pred.add_edges(dst, src)
n_nodes = g_pred.number_of_nodes()
n_edges = g_pred.number_of_edges()
if bbox_improvement:
g_pred.ndata['pred_bbox'] = bbox_improve(g_pred.ndata['pred_bbox'])
g_pred.edata['rel_bbox'] = extract_edge_bbox(g_pred)
g_pred.edata['batch_id'] = nd.zeros((n_edges, 1), ctx = ctx) + btc
g_batch.append(g_pred)
if len(g_batch) == 0:
return None
if len(g_batch) > 1:
return dgl.batch(g_batch)
return g_batch[0]
def build_graph_validate_gt_bbox(img, ids, scores, bbox, spatial_feat, gt_ids=None,
bbox_improvement=True, overlap=False):
'''given ground truth bbox, build graph for validation'''
n_batch = img.shape[0]
img_size = img.shape[2:4]
bbox[:, :, 0] /= img_size[1]
bbox[:, :, 1] /= img_size[0]
bbox[:, :, 2] /= img_size[1]
bbox[:, :, 3] /= img_size[0]
ctx = img.context
g_batch = []
for btc in range(n_batch):
id_btc = scores[btc][:,:,0].argmax(0)
score_btc = scores[btc][:,:,0].max(0)
inds = np.where(bbox[btc].sum(1).asnumpy() > 0)[0].tolist()
if len(inds) == 0:
continue
n_nodes = len(inds)
g_pred = dgl.DGLGraph()
g_pred.add_nodes(n_nodes, {'pred_bbox': bbox[btc, inds],
'node_feat': spatial_feat[btc, inds],
'node_class_pred': id_btc,
'node_class_logit': nd.log(score_btc + 1e-7)})
edge_list = []
for i in range(n_nodes - 1):
for j in range(i + 1, n_nodes):
edge_list.append((i, j))
src, dst = tuple(zip(*edge_list))
g_pred.add_edges(src, dst)
g_pred.add_edges(dst, src)
n_nodes = g_pred.number_of_nodes()
n_edges = g_pred.number_of_edges()
if bbox_improvement:
g_pred.ndata['pred_bbox'] = bbox_improve(g_pred.ndata['pred_bbox'])
g_pred.edata['rel_bbox'] = extract_edge_bbox(g_pred)
g_pred.edata['batch_id'] = nd.zeros((n_edges, 1), ctx = ctx) + btc
g_batch.append(g_pred)
if len(g_batch) == 0:
return None
if len(g_batch) > 1:
return dgl.batch(g_batch)
return g_batch[0]
def build_graph_validate_pred(img, ids, scores, bbox, feat_ind, spatial_feat,
bbox_improvement=True, scores_top_k=50, overlap=False):
'''given predicted bbox, build graph for validation'''
n_batch = img.shape[0]
img_size = img.shape[2:4]
bbox[:, :, 0] /= img_size[1]
bbox[:, :, 1] /= img_size[0]
bbox[:, :, 2] /= img_size[1]
bbox[:, :, 3] /= img_size[0]
ctx = img.context
g_batch = []
for btc in range(n_batch):
inds = np.where(scores[btc, :, 0].asnumpy() > 0)[0].tolist()
if len(inds) == 0:
continue
if len(inds) > scores_top_k:
top_score_inds = scores[btc, inds, 0].asnumpy().argsort()[::-1][0:scores_top_k]
inds = np.array(inds)[top_score_inds].tolist()
n_nodes = len(inds)
roi_ind = feat_ind[btc, inds].squeeze(axis=1)
g_pred = dgl.DGLGraph()
g_pred.add_nodes(n_nodes, {'pred_bbox': bbox[btc, inds],
'node_feat': spatial_feat[btc, roi_ind],
'node_class_pred': ids[btc, inds, 0],
'node_class_logit': nd.log(scores[btc, inds, 0] + 1e-7)})
edge_list = []
for i in range(n_nodes - 1):
for j in range(i + 1, n_nodes):
edge_list.append((i, j))
src, dst = tuple(zip(*edge_list))
g_pred.add_edges(src, dst)
g_pred.add_edges(dst, src)
n_nodes = g_pred.number_of_nodes()
n_edges = g_pred.number_of_edges()
if bbox_improvement:
g_pred.ndata['pred_bbox'] = bbox_improve(g_pred.ndata['pred_bbox'])
g_pred.edata['rel_bbox'] = extract_edge_bbox(g_pred)
g_pred.edata['batch_id'] = nd.zeros((n_edges, 1), ctx = ctx) + btc
g_batch.append(g_pred)
if len(g_batch) == 0:
return None
if len(g_batch) > 1:
return dgl.batch(g_batch)
return g_batch[0]
import dgl
import mxnet as mx
import numpy as np
import logging, time
from operator import attrgetter, itemgetter
from mxnet import nd, gluon
from mxnet.gluon import nn
from dgl.utils import toindex
from dgl.nn.mxnet import GraphConv
from gluoncv.model_zoo import get_model
from gluoncv.data.batchify import Pad
def iou(boxA, boxB):
# determine the (x, y)-coordinates of the intersection rectangle
xA = max(boxA[0], boxB[0])
yA = max(boxA[1], boxB[1])
xB = min(boxA[2], boxB[2])
yB = min(boxA[3], boxB[3])
interArea = max(0, xB - xA) * max(0, yB - yA)
if interArea < 1e-7 :
return 0
boxAArea = (boxA[2] - boxA[0]) * (boxA[3] - boxA[1])
boxBArea = (boxB[2] - boxB[0]) * (boxB[3] - boxB[1])
if boxAArea + boxBArea - interArea < 1e-7:
return 0
iou_val = interArea / float(boxAArea + boxBArea - interArea)
return iou_val
def object_iou_thresh(gt_object, pred_object, iou_thresh=0.5):
obj_iou = iou(gt_object[1:5], pred_object[1:5])
if obj_iou >= iou_thresh:
return True
return False
def triplet_iou_thresh(pred_triplet, gt_triplet, iou_thresh=0.5):
sub_iou = iou(gt_triplet[5:9], pred_triplet[5:9])
if sub_iou >= iou_thresh:
ob_iou = iou(gt_triplet[9:13], pred_triplet[9:13])
if ob_iou >= iou_thresh:
return True
return False
@mx.metric.register
@mx.metric.alias('auc')
class AUCMetric(mx.metric.EvalMetric):
def __init__(self, name='auc', eps=1e-12):
super(AUCMetric, self).__init__(name)
self.eps = eps
def update(self, labels, preds):
mx.metric.check_label_shapes(labels, preds)
label_weight = labels[0].asnumpy()
preds = preds[0].asnumpy()
tmp = []
for i in range(preds.shape[0]):
tmp.append((label_weight[i], preds[i][1]))
tmp = sorted(tmp, key=itemgetter(1), reverse=True)
label_sum = label_weight.sum()
if label_sum == 0 or label_sum == label_weight.size:
return
label_one_num = np.count_nonzero(label_weight)
label_zero_num = len(label_weight) - label_one_num
total_area = label_zero_num * label_one_num
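# Sweep predictions from highest to lowest score; each positive raises the curve, each negative
# moves it right, and the accumulated area under this step curve gives the AUC.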
height = 0
width = 0
area = 0
for a, _ in tmp:
if a == 1.0:
height += 1.0
else:
width += 1.0
area += height
self.sum_metric += area / total_area
self.num_inst += 1
@mx.metric.register
@mx.metric.alias('predcls')
class PredCls(mx.metric.EvalMetric):
'''PredCls recall@topk: ground-truth object boxes and labels are given, only the predicate is predicted'''
def __init__(self, topk=20, iou_thresh=0.99):
super(PredCls, self).__init__('predcls@%d'%(topk))
self.topk = topk
self.iou_thresh = iou_thresh
def update(self, labels, preds):
if labels is None or preds is None:
self.num_inst += 1
return
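# Rank predicted triplets by predicate score (column 0) and evaluate recall among the top-k.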
preds = preds[preds[:,0].argsort()[::-1]]
m = min(self.topk, preds.shape[0])
count = 0
gt_edge_num = labels.shape[0]
label_matched = [False for label in labels]
for i in range(m):
pred = preds[i]
for j in range(gt_edge_num):
if label_matched[j]:
continue
label = labels[j]
if int(label[2]) == int(pred[2]) and \
triplet_iou_thresh(pred, label, self.iou_thresh):
count += 1
label_matched[j] = True
total = labels.shape[0]
self.sum_metric += count / total
self.num_inst += 1
@mx.metric.register
@mx.metric.alias('phrcls')
class PhrCls(mx.metric.EvalMetric):
'''PhrCls recall@topk: ground-truth boxes are given, object labels and predicates come from the model'''
def __init__(self, topk=20, iou_thresh=0.99):
super(PhrCls, self).__init__('phrcls@%d'%(topk))
self.topk = topk
self.iou_thresh = iou_thresh
def update(self, labels, preds):
if labels is None or preds is None:
self.num_inst += 1
return
preds = preds[preds[:,1].argsort()[::-1]]
m = min(self.topk, preds.shape[0])
count = 0
gt_edge_num = labels.shape[0]
label_matched = [False for label in labels]
for i in range(m):
pred = preds[i]
for j in range(gt_edge_num):
if label_matched[j]:
continue
label = labels[j]
if int(label[2]) == int(pred[2]) and \
int(label[3]) == int(pred[3]) and \
int(label[4]) == int(pred[4]) and \
triplet_iou_thresh(pred, label, self.iou_thresh):
count += 1
label_matched[j] = True
total = labels.shape[0]
self.sum_metric += count / total
self.num_inst += 1
@mx.metric.register
@mx.metric.alias('sgdet')
class SGDet(mx.metric.EvalMetric):
'''SGDet recall@topk: object boxes and labels are both predicted by the detector'''
def __init__(self, topk=20, iou_thresh=0.5):
super(SGDet, self).__init__('sgdet@%d'%(topk))
self.topk = topk
self.iou_thresh = iou_thresh
def update(self, labels, preds):
if labels is None or preds is None:
self.num_inst += 1
return
preds = preds[preds[:,1].argsort()[::-1]]
m = min(self.topk, len(preds))
count = 0
gt_edge_num = labels.shape[0]
label_matched = [False for label in labels]
for i in range(m):
pred = preds[i]
for j in range(gt_edge_num):
if label_matched[j]:
continue
label = labels[j]
if int(label[2]) == int(pred[2]) and \
int(label[3]) == int(pred[3]) and \
int(label[4]) == int(pred[4]) and \
triplet_iou_thresh(pred, label, self.iou_thresh):
count += 1
label_matched[j] = True
total = labels.shape[0]
self.sum_metric += count / total
self.num_inst += 1
@mx.metric.register
@mx.metric.alias('sgdet+')
class SGDetPlus(mx.metric.EvalMetric):
'''Metric proposed by `Graph R-CNN for Scene Graph Generation`'''
def __init__(self, topk=20, iou_thresh=0.5):
super(SGDetPlus, self).__init__('sgdet+@%d'%(topk))
self.topk = topk
self.iou_thresh = iou_thresh
def update(self, labels, preds):
label_objects, label_triplets = labels
pred_objects, pred_triplets = preds
if label_objects is None or pred_objects is None:
self.num_inst += 1
return
count = 0
# count objects
object_matched = [False for obj in label_objects]
m = len(pred_objects)
gt_obj_num = label_objects.shape[0]
for i in range(m):
pred = pred_objects[i]
for j in range(gt_obj_num):
if object_matched[j]:
continue
label = label_objects[j]
if int(label[0]) == int(pred[0]) and \
object_iou_thresh(pred, label, self.iou_thresh):
count += 1
object_matched[j] = True
# count predicate and triplet
pred_triplets = pred_triplets[pred_triplets[:,1].argsort()[::-1]]
m = min(self.topk, len(pred_triplets))
gt_triplet_num = label_triplets.shape[0]
triplet_matched = [False for label in label_triplets]
predicate_matched = [False for label in label_triplets]
for i in range(m):
pred = pred_triplets[i]
for j in range(gt_triplet_num):
label = label_triplets[j]
if not predicate_matched[j]:
if int(label[2]) == int(pred[2]) and \
triplet_iou_thresh(pred, label, self.iou_thresh):
count += label[3]
predicate_matched[j] = True
if not triplet_matched[j]:
if int(label[2]) == int(pred[2]) and \
int(label[3]) == int(pred[3]) and \
int(label[4]) == int(pred[4]) and \
triplet_iou_thresh(pred, label, self.iou_thresh):
count += 1
triplet_matched[j] = True
# compute sum
total = label_triplets.shape[0]
N = gt_obj_num + 2 * total
self.sum_metric += count / N
self.num_inst += 1
def extract_gt(g, img_size):
'''extract ground-truth objects and triplets from the ground-truth graph'''
if g is None or g.number_of_nodes() == 0:
return None, None
gt_eids = np.where(g.edata['rel_class'].asnumpy() > 0)[0]
if len(gt_eids) == 0:
return None, None
gt_class = g.ndata['node_class'][:,0].asnumpy()
gt_bbox = g.ndata['bbox'].asnumpy()
gt_bbox[:, 0] /= img_size[1]
gt_bbox[:, 1] /= img_size[0]
gt_bbox[:, 2] /= img_size[1]
gt_bbox[:, 3] /= img_size[0]
gt_objects = np.vstack([gt_class, gt_bbox.transpose(1, 0)]).transpose(1, 0)
gt_node_ids = g.find_edges(gt_eids)
gt_node_sub = gt_node_ids[0].asnumpy()
gt_node_ob = gt_node_ids[1].asnumpy()
gt_rel_class = g.edata['rel_class'][gt_eids,0].asnumpy() - 1
gt_sub_class = gt_class[gt_node_sub]
gt_ob_class = gt_class[gt_node_ob]
gt_sub_bbox = gt_bbox[gt_node_sub]
gt_ob_bbox = gt_bbox[gt_node_ob]
n = len(gt_eids)
gt_triplets = np.vstack([np.ones(n), np.ones(n),
gt_rel_class, gt_sub_class, gt_ob_class,
gt_sub_bbox.transpose(1, 0),
gt_ob_bbox.transpose(1, 0)]).transpose(1, 0)
return gt_objects, gt_triplets
def extract_pred(g, topk=100, joint_preds=False):
'''extract prediction from prediction graph for validation and visualization'''
if g is None or g.number_of_nodes() == 0:
return None, None
pred_class = g.ndata['node_class_pred'].asnumpy()
pred_class_prob = g.ndata['node_class_logit'].asnumpy()
pred_bbox = g.ndata['pred_bbox'][:,0:4].asnumpy()
pred_objects = np.vstack([pred_class, pred_bbox.transpose(1, 0)]).transpose(1, 0)
score_pred = g.edata['score_pred'].asnumpy()
score_phr = g.edata['score_phr'].asnumpy()
score_pred_topk_eids = (-score_pred).argsort()[0:topk].tolist()
score_phr_topk_eids = (-score_phr).argsort()[0:topk].tolist()
topk_eids = sorted(list(set(score_pred_topk_eids + score_phr_topk_eids)))
pred_rel_prob = g.edata['preds'][topk_eids].asnumpy()
if joint_preds:
pred_rel_class = pred_rel_prob[:,1:].argmax(axis=1)
else:
pred_rel_class = pred_rel_prob.argmax(axis=1)
pred_node_ids = g.find_edges(topk_eids)
pred_node_sub = pred_node_ids[0].asnumpy()
pred_node_ob = pred_node_ids[1].asnumpy()
pred_sub_class = pred_class[pred_node_sub]
pred_sub_class_prob = pred_class_prob[pred_node_sub]
pred_sub_bbox = pred_bbox[pred_node_sub]
pred_ob_class = pred_class[pred_node_ob]
pred_ob_class_prob = pred_class_prob[pred_node_ob]
pred_ob_bbox = pred_bbox[pred_node_ob]
pred_triplets = np.vstack([score_pred[topk_eids], score_phr[topk_eids],
pred_rel_class, pred_sub_class, pred_ob_class,
pred_sub_bbox.transpose(1, 0),
pred_ob_bbox.transpose(1, 0)]).transpose(1, 0)
return pred_objects, pred_triplets
import dgl
from dgl.utils import toindex
import mxnet as mx
import numpy as np
def l0_sample(g, positive_max=128, negative_ratio=3):
'''sampling positive and negative edges'''
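# Keep at most positive_max positive (related) edges and negative_ratio times as many negatives;
# the returned edge subgraph carries per-edge sample weights marking the chosen edges.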
if g is None:
return None
n_eids = g.number_of_edges()
pos_eids = np.where(g.edata['rel_class'].asnumpy() > 0)[0]
neg_eids = np.where(g.edata['rel_class'].asnumpy() == 0)[0]
if len(pos_eids) == 0:
return None
positive_num = min(len(pos_eids), positive_max)
negative_num = min(len(neg_eids), positive_num * negative_ratio)
pos_sample = np.random.choice(pos_eids, positive_num, replace=False)
neg_sample = np.random.choice(neg_eids, negative_num, replace=False)
weights = np.zeros(n_eids)
# np.add.at(weights, pos_sample, 1)
weights[pos_sample] = 1
weights[neg_sample] = 1
# g.edata['sample_weights'] = mx.nd.array(weights, ctx=g.edata['rel_class'].context)
# return g
eids = np.where(weights > 0)[0]
sub_g = g.edge_subgraph(toindex(eids.tolist()))
sub_g.copy_from_parent()
sub_g.edata['sample_weights'] = mx.nd.array(weights[eids],
ctx=g.edata['rel_class'].context)
return sub_g