Commit 2e5628b4, authored by q.yao and committed by GitHub

[Refactor]: Remove deployment for dev-2.x (#2225)

* remove deploy for 2.0

* update onnx ut
parent 961373ad
@@ -131,15 +131,6 @@ jobs:
- run:
    name: Install psutil
    command: python -m pip install psutil
- run:
    name: Download onnxruntime library and install onnxruntime
    command: |
      wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-1.8.1.tgz
      tar -zxvf onnxruntime-linux-x64-1.8.1.tgz
      echo 'export ONNXRUNTIME_DIR=$(pwd)/onnxruntime-linux-x64-1.8.1' >> $BASH_ENV
      echo 'export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH' >> $BASH_ENV
      source $BASH_ENV
      python -m pip install onnxruntime==1.8.1
- run:
    name: Build and install
    command: |
......
@@ -241,10 +241,6 @@ b. Install the lite version.
pip install mmcv
``` ```
c. Install the full version with custom operators for onnxruntime
- Check [here](docs/en/deployment/onnxruntime_op.md) for detailed instructions.
If you would like to build MMCV from source, please refer to the [guide](https://mmcv.readthedocs.io/en/latest/get_started/build.html).
## FAQ
......
@@ -238,10 +238,6 @@ b. Install the lite version
pip install mmcv
``` ```
c. Install the full version and build the custom operators for onnxruntime
- See [here](docs/zh_cn/deployment/onnxruntime_op.md) for detailed instructions.
If you would like to build MMCV from source, please refer to [this document](https://mmcv.readthedocs.io/zh_CN/latest/get_started/build.html).
## FAQ
......
## Introduction of mmcv.onnx module
### <span style="color:red">DeprecationWarning</span>
ONNX support will be deprecated in the future.
We recommend using the unified model deployment toolbox MMDeploy instead: https://github.com/open-mmlab/mmdeploy
### register_extra_symbolics
Some extra symbolic functions need to be registered before exporting a PyTorch model to ONNX.
#### Example
```python
import mmcv
from mmcv.onnx import register_extra_symbolics
opset_version = 11
register_extra_symbolics(opset_version)
```
#### Reminder
- *Please note that this feature is experimental and may change in the future.*
#### FAQs
- None
## ONNX Runtime Custom Ops
<!-- TOC -->
- [ONNX Runtime Custom Ops](#onnx-runtime-custom-ops)
  - [SoftNMS](#softnms)
    - [Description](#description)
    - [Parameters](#parameters)
    - [Inputs](#inputs)
    - [Outputs](#outputs)
    - [Type Constraints](#type-constraints)
  - [RoIAlign](#roialign)
    - [Description](#description-1)
    - [Parameters](#parameters-1)
    - [Inputs](#inputs-1)
    - [Outputs](#outputs-1)
    - [Type Constraints](#type-constraints-1)
  - [NMS](#nms)
    - [Description](#description-2)
    - [Parameters](#parameters-2)
    - [Inputs](#inputs-2)
    - [Outputs](#outputs-2)
    - [Type Constraints](#type-constraints-2)
  - [grid_sampler](#grid_sampler)
    - [Description](#description-3)
    - [Parameters](#parameters-3)
    - [Inputs](#inputs-3)
    - [Outputs](#outputs-3)
    - [Type Constraints](#type-constraints-3)
  - [CornerPool](#cornerpool)
    - [Description](#description-4)
    - [Parameters](#parameters-4)
    - [Inputs](#inputs-4)
    - [Outputs](#outputs-4)
    - [Type Constraints](#type-constraints-4)
  - [cummax](#cummax)
    - [Description](#description-5)
    - [Parameters](#parameters-5)
    - [Inputs](#inputs-5)
    - [Outputs](#outputs-5)
    - [Type Constraints](#type-constraints-5)
  - [cummin](#cummin)
    - [Description](#description-6)
    - [Parameters](#parameters-6)
    - [Inputs](#inputs-6)
    - [Outputs](#outputs-6)
    - [Type Constraints](#type-constraints-6)
  - [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d)
    - [Description](#description-7)
    - [Parameters](#parameters-7)
    - [Inputs](#inputs-7)
    - [Outputs](#outputs-7)
    - [Type Constraints](#type-constraints-7)
  - [MMCVDeformConv2d](#mmcvdeformconv2d)
    - [Description](#description-8)
    - [Parameters](#parameters-8)
    - [Inputs](#inputs-8)
    - [Outputs](#outputs-8)
    - [Type Constraints](#type-constraints-8)
<!-- TOC -->
### SoftNMS
#### Description
Perform soft NMS on `boxes` with `scores`. See [Soft-NMS -- Improving Object Detection With One Line of Code](https://arxiv.org/abs/1704.04503) for details.
#### Parameters
| Type | Parameter | Description |
| ------- | --------------- | -------------------------------------------------------------- |
| `float` | `iou_threshold` | IoU threshold for NMS |
| `float` | `sigma` | hyperparameter for the Gaussian method |
| `float` | `min_score` | score filter threshold |
| `int` | `method` | NMS method (0: `naive`, 1: `linear`, 2: `gaussian`) |
| `int` | `offset` | 0 or 1; the width or height of `boxes` is computed as (x2 - x1 + offset) |
#### Inputs
<dl>
<dt><tt>boxes</tt>: T</dt>
<dd>Input boxes. 2-D tensor of shape (N, 4). N is the number of boxes.</dd>
<dt><tt>scores</tt>: T</dt>
<dd>Input scores. 1-D tensor of shape (N, ).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>dets</tt>: T</dt>
<dd>Output boxes and scores. 2-D tensor of shape (num_valid_boxes, 5), [[x1, y1, x2, y2, score], ...]. num_valid_boxes is the number of valid boxes.</dd>
<dt><tt>indices</tt>: tensor(int64)</dt>
<dd>Output indices. 1-D tensor of shape (num_valid_boxes, ).</dd>
</dl>
#### Type Constraints
- T:tensor(float32)
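The parameters above can be read as a short reference procedure. The following is a minimal pure-Python sketch, not the MMCV kernel: the helper names `box_iou` and `soft_nms`, the default values, and the handling of threshold boundaries are assumptions made for illustration.

```python
import math


def box_iou(a, b, offset=0):
    """IoU of two boxes [x1, y1, x2, y2]; a side length is (x2 - x1 + offset)."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + offset
    ih = min(a[3], b[3]) - max(a[1], b[1]) + offset
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0] + offset) * (a[3] - a[1] + offset)
    area_b = (b[2] - b[0] + offset) * (b[3] - b[1] + offset)
    return inter / (area_a + area_b - inter)


def soft_nms(boxes, scores, iou_threshold=0.3, sigma=0.5, min_score=1e-3,
             method=1, offset=0):
    """Return (dets, indices). method: 0 naive, 1 linear, 2 gaussian."""
    scores = list(scores)  # work on a copy so the caller's list is untouched
    remaining = list(range(len(boxes)))
    dets, indices = [], []
    while remaining:
        # Select the remaining box with the highest (possibly decayed) score.
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        dets.append(list(boxes[best]) + [scores[best]])
        indices.append(best)
        survivors = []
        for i in remaining:
            ov = box_iou(boxes[best], boxes[i], offset)
            if method == 0:    # naive: hard suppression
                weight = 0.0 if ov >= iou_threshold else 1.0
            elif method == 1:  # linear decay above the threshold
                weight = 1.0 - ov if ov >= iou_threshold else 1.0
            else:              # gaussian decay for every overlap
                weight = math.exp(-(ov * ov) / sigma)
            scores[i] *= weight
            if scores[i] >= min_score:  # drop boxes whose score decayed away
                survivors.append(i)
        remaining = survivors
    return dets, indices
```

With `method=0` the overlapping box is removed outright, while `method=1` keeps it with a reduced score, which is why `dets` carries the updated score in its fifth column.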
### RoIAlign
#### Description
Perform RoIAlign on the output feature map; used in the bbox_head of most two-stage detectors.
#### Parameters
| Type | Parameter | Description |
| ------- | ---------------- | ------------------------------------------------------------------------------------------------------------- |
| `int` | `output_height` | height of output roi |
| `int` | `output_width` | width of output roi |
| `float` | `spatial_scale` | used to scale the input boxes |
| `int` | `sampling_ratio` | number of input samples to take for each output sample. `0` means to take samples densely for current models. |
| `str` | `mode` | pooling mode in each bin. `avg` or `max` |
| `int` | `aligned` | If `aligned=0`, use the legacy implementation in MMDetection. Otherwise, align the results more precisely. |
#### Inputs
<dl>
<dt><tt>input</tt>: T</dt>
<dd>Input feature map; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of channels, H and W are the height and width of the data.</dd>
<dt><tt>rois</tt>: T</dt>
<dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are in the coordinate system of input.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>feat</tt>: T</dt>
<dd>RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element feat[r-1] is a pooled feature map corresponding to the r-th RoI rois[r-1].</dd>
</dl>
#### Type Constraints
- T:tensor(float32)
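To illustrate how `spatial_scale` and `aligned` interact, here is a small sketch of the RoI-to-feature-map coordinate mapping. It follows the half-pixel shift convention used by common RoIAlign implementations; that this matches the operator exactly is an assumption, and the function name `roi_to_feature_coords` is hypothetical.

```python
def roi_to_feature_coords(roi, spatial_scale, aligned):
    """Map one RoI [batch_index, x1, y1, x2, y2] to feature-map coordinates.

    Returns (start_w, start_h, roi_width, roi_height).
    """
    _, x1, y1, x2, y2 = roi
    # Assumed convention: aligned mode shifts coordinates by half a pixel.
    shift = 0.5 if aligned else 0.0
    start_w = x1 * spatial_scale - shift
    start_h = y1 * spatial_scale - shift
    end_w = x2 * spatial_scale - shift
    end_h = y2 * spatial_scale - shift
    roi_w = end_w - start_w
    roi_h = end_h - start_h
    if not aligned:
        # Assumed legacy behaviour: RoIs are forced to be at least 1x1.
        roi_w = max(roi_w, 1.0)
        roi_h = max(roi_h, 1.0)
    return start_w, start_h, roi_w, roi_h
```

Each output bin then covers `roi_width / output_width` by `roi_height / output_height` feature-map units, and `sampling_ratio` controls how many points are sampled inside each bin.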
### NMS
#### Description
Filter out boxes that have a high IoU overlap with previously selected boxes.
#### Parameters
| Type | Parameter | Description |
| ------- | --------------- | ------------------------------------------------------------------------------------------------------------------ |
| `float` | `iou_threshold` | The threshold for deciding whether boxes overlap too much with respect to IoU. Value range \[0, 1\]. Defaults to 0. |
| `int` | `offset` | 0 or 1; the width or height of a box is computed as (x2 - x1 + offset). |
#### Inputs
<dl>
<dt><tt>bboxes</tt>: T</dt>
<dd>Input boxes. 2-D tensor of shape (num_boxes, 4). num_boxes is the number of input boxes.</dd>
<dt><tt>scores</tt>: T</dt>
<dd>Input scores. 1-D tensor of shape (num_boxes, ).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>indices</tt>: tensor(int32, Linear)</dt>
<dd>Selected indices. 1-D tensor of shape (num_valid_boxes, ). num_valid_boxes is the number of valid boxes.</dd>
</dl>
#### Type Constraints
- T:tensor(float32)
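The effect of `offset` on the overlap test is easiest to see in a small example. Below is a hedged pure-Python sketch of greedy NMS; the names `iou_with_offset` and `greedy_nms` are hypothetical, and treating an IoU strictly greater than the threshold as "overlapping" is an assumption about the boundary case.

```python
def iou_with_offset(a, b, offset=0):
    """IoU of boxes [x1, y1, x2, y2] where a side length is (x2 - x1 + offset)."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + offset
    ih = min(a[3], b[3]) - max(a[1], b[1]) + offset
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0] + offset) * (a[3] - a[1] + offset)
    area_b = (b[2] - b[0] + offset) * (b[3] - b[1] + offset)
    return inter / (area_a + area_b - inter)


def greedy_nms(bboxes, scores, iou_threshold=0.0, offset=0):
    """Return indices of kept boxes, in descending score order."""
    order = sorted(range(len(bboxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already selected box.
        if all(iou_with_offset(bboxes[i], bboxes[k], offset) <= iou_threshold
               for k in keep):
            keep.append(i)
    return keep
```

For the boxes `[0, 0, 9, 9]` and `[1, 1, 10, 10]`, the IoU is 64/98 with `offset=0` and 81/119 with `offset=1`, so a threshold between those two values gives different results depending on `offset`.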
### grid_sampler
#### Description
Sample from `input` at the pixel locations given by `grid`.
#### Parameters
| Type | Parameter | Description |
| ----- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `int` | `interpolation_mode` | Interpolation mode to calculate output values. (0: `bilinear` , 1: `nearest`) |
| `int` | `padding_mode` | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`) |
| `int` | `align_corners` | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |
#### Inputs
<dl>
<dt><tt>input</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.</dd>
<dt><tt>grid</tt>: T</dt>
<dd>Input offset; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of the offset and output.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>output</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
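The `align_corners` description can be made concrete with the coordinate mapping used by `torch.nn.functional.grid_sample`, which this operator is assumed to mirror. The sketch below handles one sampling point on a single-channel 2-D list with `bilinear` interpolation and `zeros` padding; the function names are hypothetical.

```python
import math


def unnormalize(coord, size, align_corners):
    """Map a normalized coordinate in [-1, 1] to a pixel coordinate."""
    if align_corners:
        # -1 and 1 refer to the centers of the corner pixels.
        return (coord + 1.0) / 2.0 * (size - 1)
    # -1 and 1 refer to the outer edges of the corner pixels.
    return ((coord + 1.0) * size - 1.0) / 2.0


def sample_bilinear(image, gx, gy, align_corners=0):
    """Bilinear sample of a 2-D list at normalized (gx, gy), zeros padding."""
    h, w = len(image), len(image[0])
    x = unnormalize(gx, w, align_corners)
    y = unnormalize(gy, h, align_corners)
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yy, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:  # out-of-range pixels count as zero
                value += wy * wx * image[yy][xx]
    return value
```

For a width of 4, the coordinate `1` maps to pixel 3.0 with `align_corners=1` and to 3.5 with `align_corners=0`, so in the second case part of the interpolation window falls into the padding.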
### CornerPool
#### Description
Perform CornerPool on `input` features. Read [CornerNet -- Detecting Objects as Paired Keypoints](https://arxiv.org/abs/1808.01244) for more details.
#### Parameters
| Type | Parameter | Description |
| ----- | --------- | ---------------------------------------------------------------- |
| `int` | `mode` | corner pool mode, (0: `top`, 1: `bottom`, 2: `left`, 3: `right`) |
#### Inputs
<dl>
<dt><tt>input</tt>: T</dt>
<dd>Input features. 4-D tensor of shape (N, C, H, W). N is the batch size.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>output</tt>: T</dt>
<dd>Output the pooled features. 4-D tensor of shape (N, C, H, W).</dd>
</dl>
#### Type Constraints
- T:tensor(float32)
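As a rough illustration of the four modes, the sketch below applies corner pooling to one channel stored as a 2-D list. It assumes the usual CornerNet definition, where `top` pooling takes the maximum over the current row and all rows below it (and analogously for the other directions); the function name `corner_pool_2d` is hypothetical.

```python
def corner_pool_2d(feature, mode):
    """Corner pooling on an H x W list. mode: 0 top, 1 bottom, 2 left, 3 right."""
    h, w = len(feature), len(feature[0])
    out = [row[:] for row in feature]
    if mode == 0:    # top: running max scanned from the bottom row upwards
        for i in range(h - 2, -1, -1):
            for j in range(w):
                out[i][j] = max(out[i][j], out[i + 1][j])
    elif mode == 1:  # bottom: running max scanned from the top row downwards
        for i in range(1, h):
            for j in range(w):
                out[i][j] = max(out[i][j], out[i - 1][j])
    elif mode == 2:  # left: running max scanned from the right column leftwards
        for i in range(h):
            for j in range(w - 2, -1, -1):
                out[i][j] = max(out[i][j], out[i][j + 1])
    elif mode == 3:  # right: running max scanned from the left column rightwards
        for i in range(h):
            for j in range(1, w):
                out[i][j] = max(out[i][j], out[i][j - 1])
    else:
        raise ValueError('mode must be 0, 1, 2 or 3')
    return out
```

Each mode is a cumulative maximum along one spatial axis, which is why the output keeps the (N, C, H, W) shape of the input.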
### cummax
#### Description
Returns a tuple (`values`, `indices`), where `values` holds the cumulative maximum of the elements of `input` along dimension `dim`, and `indices` holds the index of each maximum value found along `dim`. See [torch.cummax](https://pytorch.org/docs/stable/generated/torch.cummax.html) for more details.
#### Parameters
| Type | Parameter | Description |
| ----- | --------- | -------------------------------------- |
| `int` | `dim` | the dimension to do the operation over |
#### Inputs
<dl>
<dt><tt>input</tt>: T</dt>
<dd>The input tensor, of any shape. Empty tensors are also supported.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>output</tt>: T</dt>
<dd>Output the cumulative maximum elements of `input` in the dimension `dim`, with the same shape and dtype as `input`.</dd>
<dt><tt>indices</tt>: tensor(int64)</dt>
<dd>Output the index location of each cumulative maximum value found in the dimension `dim`, with the same shape as `input`.</dd>
</dl>
#### Type Constraints
- T:tensor(float32)
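A one-dimensional sketch of the values/indices pair may help. The function name `cummax_1d` is hypothetical, and the tie-breaking rule (keep the earliest index) is an assumption that may differ from the operator; `cummin` follows by flipping the comparison.

```python
def cummax_1d(values):
    """Return (cumulative maxima, index of each maximum) for a 1-D list."""
    out, indices = [], []
    best = None  # index of the largest element seen so far
    for i, v in enumerate(values):
        if best is None or v > values[best]:
            best = i
        out.append(values[best])
        indices.append(best)
    return out, indices
```

An empty input yields two empty lists, matching the note that empty tensors are supported.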
### cummin
#### Description
Returns a tuple (`values`, `indices`), where `values` holds the cumulative minimum of the elements of `input` along dimension `dim`, and `indices` holds the index of each minimum value found along `dim`. See [torch.cummin](https://pytorch.org/docs/stable/generated/torch.cummin.html) for more details.
#### Parameters
| Type | Parameter | Description |
| ----- | --------- | -------------------------------------- |
| `int` | `dim` | the dimension to do the operation over |
#### Inputs
<dl>
<dt><tt>input</tt>: T</dt>
<dd>The input tensor, of any shape. Empty tensors are also supported.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>output</tt>: T</dt>
<dd>Output the cumulative minimum elements of `input` in the dimension `dim`, with the same shape and dtype as `input`.</dd>
<dt><tt>indices</tt>: tensor(int64)</dt>
<dd>Output the index location of each cumulative minimum value found in the dimension `dim`, with the same shape as `input`.</dd>
</dl>
#### Type Constraints
- T:tensor(float32)
### MMCVModulatedDeformConv2d
#### Description
Perform Modulated Deformable Convolution on the input feature. See [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for details.
#### Parameters
| Type | Parameter | Description |
| -------------- | ------------------- | ------------------------------------------------------------------------------------- |
| `list of ints` | `stride` | The stride of the convolving kernel. (sH, sW) |
| `list of ints` | `padding` | Paddings on both sides of the input. (padH, padW) |
| `list of ints` | `dilation` | The spacing between kernel elements. (dH, dW) |
| `int` | `deformable_groups` | Groups of deformable offset. |
| `int` | `groups` | Split input into groups. `input_channel` should be divisible by the number of groups. |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input offset; 4-D tensor of shape (N, deformable_group * 2 * kH * kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and output.</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Input mask; 4-D tensor of shape (N, deformable_group * kH * kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and output.</dd>
<dt><tt>inputs[3]</tt>: T</dt>
<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
<dt><tt>inputs[4]</tt>: T, optional</dt>
<dd>Input bias; 1-D tensor of shape (output_channel).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
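The shapes listed above are tied together by the kernel size and the convolution parameters. The sketch below computes the expected offset, mask, and output shapes, assuming the standard convolution output-size formula applies to this operator; the function names are hypothetical.

```python
def conv_output_size(in_size, kernel, stride, padding, dilation):
    """Standard convolution output size along one spatial axis."""
    return (in_size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def modulated_deform_conv_shapes(input_shape, weight_shape, stride, padding,
                                 dilation, deformable_groups):
    """Expected shapes of inputs[1] (offset), inputs[2] (mask) and outputs[0]."""
    n, _, in_h, in_w = input_shape
    out_c, _, k_h, k_w = weight_shape
    out_h = conv_output_size(in_h, k_h, stride[0], padding[0], dilation[0])
    out_w = conv_output_size(in_w, k_w, stride[1], padding[1], dilation[1])
    return {
        # Two values (one 2-D displacement) per kernel location and group.
        'offset': (n, deformable_groups * 2 * k_h * k_w, out_h, out_w),
        # One modulation scalar per kernel location and group.
        'mask': (n, deformable_groups * k_h * k_w, out_h, out_w),
        'output': (n, out_c, out_h, out_w),
    }
```

For a 3x3 kernel with one deformable group, the offset tensor therefore has 18 channels and the mask tensor has 9.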
### MMCVDeformConv2d
#### Description
Perform Deformable Convolution on the input feature. See [Deformable Convolutional Network](https://arxiv.org/abs/1703.06211) for details.
#### Parameters
| Type | Parameter | Description |
| -------------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------- |
| `list of ints` | `stride` | The stride of the convolving kernel. (sH, sW) |
| `list of ints` | `padding` | Paddings on both sides of the input. (padH, padW) |
| `list of ints` | `dilation` | The spacing between kernel elements. (dH, dW) |
| `int` | `deformable_group` | Groups of deformable offset. |
| `int` | `group` | Split input into groups. `input_channel` should be divisible by the number of groups. |
| `int` | `im2col_step` | DeformableConv2d uses im2col to compute the convolution. `im2col_step` is used to split the input and offset, which reduces the memory usage of the column buffer. |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input offset; 4-D tensor of shape (N, deformable_group * 2 * kH * kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and output.</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
## ONNX Runtime Deployment
### <span style="color:red">DeprecationWarning</span>
ONNX support will be deprecated in the future.
We recommend using the unified model deployment toolbox MMDeploy instead: https://github.com/open-mmlab/mmdeploy
### Introduction of ONNX Runtime
**ONNX Runtime** is a cross-platform inferencing and training accelerator compatible with many popular ML/DNN frameworks. Check its [GitHub repository](https://github.com/microsoft/onnxruntime) for more information.
### Introduction of ONNX
**ONNX** stands for **Open Neural Network Exchange**, which acts as an *Intermediate Representation (IR)* for ML/DNN models from many frameworks. Check its [GitHub repository](https://github.com/onnx/onnx) for more information.
### Why include custom operators for ONNX Runtime in MMCV
- To verify the correctness of exported ONNX models in ONNX Runtime.
- To ease the deployment of ONNX models with custom operators from `mmcv.ops` in ONNX Runtime.
### List of operators for ONNX Runtime supported in MMCV
| Operator | CPU | GPU | MMCV Releases |
| :----------------------------------------------------- | :-: | :-: | :-----------: |
| [SoftNMS](onnxruntime_custom_ops.md#softnms) | Y | N | 1.2.3 |
| [RoIAlign](onnxruntime_custom_ops.md#roialign) | Y | N | 1.2.5 |
| [NMS](onnxruntime_custom_ops.md#nms) | Y | N | 1.2.7 |
| [grid_sampler](onnxruntime_custom_ops.md#grid_sampler) | Y | N | 1.3.1 |
| [CornerPool](onnxruntime_custom_ops.md#cornerpool) | Y | N | 1.3.4 |
| [cummax](onnxruntime_custom_ops.md#cummax) | Y | N | 1.3.4 |
| [cummin](onnxruntime_custom_ops.md#cummin) | Y | N | 1.3.4 |
### How to build custom operators for ONNX Runtime
*Please note that only the CPU version of **onnxruntime>=1.8.1** on the Linux platform has been tested so far.*
#### Prerequisite
- Clone repository
```bash
git clone https://github.com/open-mmlab/mmcv.git
```
- Download `onnxruntime-linux` from ONNX Runtime [releases](https://github.com/microsoft/onnxruntime/releases/tag/v1.8.1), extract it, expose `ONNXRUNTIME_DIR` and finally add the lib path to `LD_LIBRARY_PATH` as below:
```bash
wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-1.8.1.tgz
tar -zxvf onnxruntime-linux-x64-1.8.1.tgz
cd onnxruntime-linux-x64-1.8.1
export ONNXRUNTIME_DIR=$(pwd)
export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH
```
#### Build on Linux
```bash
cd mmcv  # to MMCV root directory
MMCV_WITH_OPS=1 MMCV_WITH_ORT=1 python setup.py develop
```
### How to do inference using exported ONNX models with custom operators in ONNX Runtime in python
Install ONNX Runtime with `pip`
```bash
pip install onnxruntime==1.8.1
```
Inference Demo
```python
import os
import numpy as np
import onnxruntime as ort
from mmcv.ops import get_onnxruntime_op_path
ort_custom_op_path = get_onnxruntime_op_path()
assert os.path.exists(ort_custom_op_path)
session_options = ort.SessionOptions()
session_options.register_custom_ops_library(ort_custom_op_path)
# exported ONNX model with custom operators
onnx_file = 'sample.onnx'
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
sess = ort.InferenceSession(onnx_file, session_options)
onnx_results = sess.run(None, {'input': input_data})
```
### How to add a new custom operator for ONNX Runtime in MMCV
#### Reminder
- *Please note that this feature is experimental and may change in the future. We strongly suggest always trying the latest master branch.*
- The custom operator is not included in [supported operator list](https://github.com/microsoft/onnxruntime/blob/master/docs/OperatorKernels.md) in ONNX Runtime.
- The custom operator should be able to be exported to ONNX.
#### Main procedures
Take custom operator `soft_nms` for example.
1. Add header `soft_nms.h` to ONNX Runtime include directory `mmcv/ops/csrc/onnxruntime/`
2. Add source `soft_nms.cpp` to ONNX Runtime source directory `mmcv/ops/csrc/onnxruntime/cpu/`
3. Register `soft_nms` operator in [onnxruntime_register.cpp](../../../mmcv/ops/csrc/onnxruntime/cpu/onnxruntime_register.cpp)
```c++
#include "soft_nms.h"

SoftNmsOp c_SoftNmsOp;

if (auto status = ortApi->CustomOpDomain_Add(domain, &c_SoftNmsOp)) {
  return status;
}
```
4. Add unit test into `tests/test_ops/test_onnx.py`
Check [here](../../tests/test_ops/test_onnx.py) for examples.
**Finally, PRs that add custom operators for ONNX Runtime to MMCV are welcome.** :nerd_face:
### Known Issues
- "RuntimeError: tuple appears in op that does not forward tuples, unsupported kind: `prim::PythonOp`."
1. In general, `cummax` or `cummin` can be exported to ONNX as long as the torch version is >= 1.5.0, since `torch.cummax` is only supported with torch >= 1.5.0. However, when `cummax` or `cummin` serves as an intermediate component whose outputs are used as inputs to other modules, the torch version must be >= 1.7.0. Otherwise, the above error might arise when running the exported ONNX model with onnxruntime.
2. Solution: update the torch version to 1.7.0 or higher.
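The version requirement above can be checked before exporting. The helper below is a hypothetical sketch (neither function is part of MMCV or PyTorch); it only parses a version string such as the value of `torch.__version__` and compares it against 1.7.0.

```python
def version_tuple(version):
    """Parse '1.7.0+cu110' or '1.8.0a0' into a tuple such as (1, 7, 0)."""
    core = version.split('+')[0]  # drop a local build suffix like '+cu110'
    parts = []
    for piece in core.split('.')[:3]:
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break  # stop at pre-release markers like 'a0' or 'rc1'
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def supports_intermediate_cummax(torch_version):
    """True if cummax/cummin outputs can feed other modules after export."""
    return version_tuple(torch_version) >= (1, 7, 0)
```

A typical call would be `supports_intermediate_cummax(torch.__version__)`, made before running the export.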
### References
- [How to export Pytorch model with custom op to ONNX and run it in ONNX Runtime](https://github.com/onnx/tutorials/blob/master/PyTorchCustomOperator/README.md)
- [How to add a custom operator/kernel in ONNX Runtime](https://onnxruntime.ai/docs/reference/operators/add-custom-op.html)
## TensorRT Custom Ops
<!-- TOC -->
- [TensorRT Custom Ops](#tensorrt-custom-ops)
  - [MMCVRoIAlign](#mmcvroialign)
    - [Description](#description)
    - [Parameters](#parameters)
    - [Inputs](#inputs)
    - [Outputs](#outputs)
    - [Type Constraints](#type-constraints)
  - [ScatterND](#scatternd)
    - [Description](#description-1)
    - [Parameters](#parameters-1)
    - [Inputs](#inputs-1)
    - [Outputs](#outputs-1)
    - [Type Constraints](#type-constraints-1)
  - [NonMaxSuppression](#nonmaxsuppression)
    - [Description](#description-2)
    - [Parameters](#parameters-2)
    - [Inputs](#inputs-2)
    - [Outputs](#outputs-2)
    - [Type Constraints](#type-constraints-2)
  - [MMCVDeformConv2d](#mmcvdeformconv2d)
    - [Description](#description-3)
    - [Parameters](#parameters-3)
    - [Inputs](#inputs-3)
    - [Outputs](#outputs-3)
    - [Type Constraints](#type-constraints-3)
  - [grid_sampler](#grid_sampler)
    - [Description](#description-4)
    - [Parameters](#parameters-4)
    - [Inputs](#inputs-4)
    - [Outputs](#outputs-4)
    - [Type Constraints](#type-constraints-4)
  - [cummax](#cummax)
    - [Description](#description-5)
    - [Parameters](#parameters-5)
    - [Inputs](#inputs-5)
    - [Outputs](#outputs-5)
    - [Type Constraints](#type-constraints-5)
  - [cummin](#cummin)
    - [Description](#description-6)
    - [Parameters](#parameters-6)
    - [Inputs](#inputs-6)
    - [Outputs](#outputs-6)
    - [Type Constraints](#type-constraints-6)
  - [MMCVInstanceNormalization](#mmcvinstancenormalization)
    - [Description](#description-7)
    - [Parameters](#parameters-7)
    - [Inputs](#inputs-7)
    - [Outputs](#outputs-7)
    - [Type Constraints](#type-constraints-7)
  - [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d)
    - [Description](#description-8)
    - [Parameters](#parameters-8)
    - [Inputs](#inputs-8)
    - [Outputs](#outputs-8)
    - [Type Constraints](#type-constraints-8)
<!-- TOC -->
### MMCVRoIAlign
#### Description
Perform RoIAlign on the output feature map; used in the bbox_head of most two-stage detectors.
#### Parameters
| Type | Parameter | Description |
| ------- | ---------------- | ------------------------------------------------------------------------------------------------------------- |
| `int` | `output_height` | height of output roi |
| `int` | `output_width` | width of output roi |
| `float` | `spatial_scale` | used to scale the input boxes |
| `int` | `sampling_ratio` | number of input samples to take for each output sample. `0` means to take samples densely for current models. |
| `str` | `mode` | pooling mode in each bin. `avg` or `max` |
| `int` | `aligned` | If `aligned=0`, use the legacy implementation in MMDetection. Otherwise, align the results more precisely. |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature map; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of channels, H and W are the height and width of the data.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are in the coordinate system of inputs[0].</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element outputs[0][r-1] is a pooled feature map corresponding to the r-th RoI inputs[1][r-1].</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
### ScatterND
#### Description
ScatterND takes three inputs: a `data` tensor of rank r >= 1, an `indices` tensor of rank q >= 1, and an `updates` tensor of rank q + r - indices.shape\[-1\] - 1. The output is produced by creating a copy of the input `data` and then updating its values to those given in `updates` at the index positions given in `indices`. The output shape is the same as the shape of `data`. Note that `indices` should not have duplicate entries; that is, two or more updates for the same index location are not supported.
The `output` is calculated via the following procedure:
```python
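import numpy as np

# `data`, `indices` and `updates` are the three operator inputs (NumPy arrays).
output = np.copy(data)
update_indices = indices.shape[:-1]
for idx in np.ndindex(update_indices):
    # A tuple index addresses one location; a bare array would fancy-index axis 0.
    output[tuple(indices[idx])] = updates[idx]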
```
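As a worked example of this procedure, here is a dependency-free sketch operating on nested Python lists instead of tensors. The function name `scatter_nd_lists` is hypothetical, and `indices` is restricted to rank q = 2 (a plain list of index lists) to keep the example short.

```python
import copy


def scatter_nd_lists(data, indices, updates):
    """ScatterND on nested lists; `indices` is a list of index lists."""
    output = copy.deepcopy(data)  # the input `data` is left unchanged
    for index, update in zip(indices, updates):
        target = output
        for axis_index in index[:-1]:
            target = target[axis_index]  # walk down to the enclosing list
        # Replace either a scalar or a whole sub-list (slice update).
        target[index[-1]] = copy.deepcopy(update)
    return output
```

When an index has fewer entries than the rank of `data`, the corresponding update replaces a whole slice, which is why `updates` has rank q + r - indices.shape[-1] - 1.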
#### Parameters
None
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Tensor of rank r>=1.</dd>
<dt><tt>inputs[1]</tt>: tensor(int32, Linear)</dt>
<dd>Tensor of rank q>=1.</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Tensor of rank q + r - indices.shape[-1] - 1.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Tensor of rank r >= 1.</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear), tensor(int32, Linear)
### NonMaxSuppression
#### Description
Filter out boxes that have a high IoU overlap with previously selected boxes or a low score. Output the indices of valid boxes. Indices of invalid boxes will be filled with -1.
#### Parameters
| Type | Parameter | Description |
| ------- | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| `int` | `center_point_box` | 0 - the box data is supplied as \[y1, x1, y2, x2\]; 1 - the box data is supplied as \[x_center, y_center, width, height\]. |
| `int` | `max_output_boxes_per_class` | The maximum number of boxes to be selected per batch per class. Defaults to 0, in which case the number of output boxes equals the number of input boxes. |
| `float` | `iou_threshold` | The threshold for deciding whether boxes overlap too much with respect to IoU. Value range \[0, 1\]. Defaults to 0. |
| `float` | `score_threshold` | The threshold for deciding when to remove boxes based on score. |
| `int` | `offset` | 0 or 1, boxes' width or height is (x2 - x1 + offset). |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input boxes. 3-D tensor of shape (num_batches, spatial_dimension, 4).</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input scores. 3-D tensor of shape (num_batches, num_classes, spatial_dimension).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: tensor(int32, Linear)</dt>
<dd>Selected indices. 2-D tensor of shape (num_selected_indices, 3) as [[batch_index, class_index, box_index], ...].</dd>
<dd>num_selected_indices = num_batches * num_classes * min(max_output_boxes_per_class, spatial_dimension).</dd>
<dd>All invalid indices will be filled with -1.</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
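Because the output has a fixed size and is padded with -1, it can help to compute the expected shape up front. The sketch below is an illustration only: the function names are hypothetical, and treating `max_output_boxes_per_class=0` as "no limit" is an interpretation of the parameter description rather than confirmed behaviour.

```python
def nms_output_shape(num_batches, num_classes, spatial_dimension,
                     max_output_boxes_per_class=0):
    """Expected shape of outputs[0] as (num_selected_indices, 3)."""
    per_class = spatial_dimension
    if max_output_boxes_per_class > 0:
        per_class = min(max_output_boxes_per_class, spatial_dimension)
    return (num_batches * num_classes * per_class, 3)


def pad_selected_indices(selected, num_selected_indices):
    """Pad [batch_index, class_index, box_index] rows with -1 rows."""
    padded = [list(row) for row in selected]
    missing = num_selected_indices - len(padded)
    padded.extend([-1, -1, -1] for _ in range(missing))
    return padded
```

Downstream code can then drop the padding by filtering out rows whose box index is -1.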
### MMCVDeformConv2d
#### Description
Perform Deformable Convolution on the input feature. See [Deformable Convolutional Network](https://arxiv.org/abs/1703.06211) for details.
#### Parameters
| Type | Parameter | Description |
| -------------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------- |
| `list of ints` | `stride` | The stride of the convolving kernel. (sH, sW) |
| `list of ints` | `padding` | Paddings on both sides of the input. (padH, padW) |
| `list of ints` | `dilation` | The spacing between kernel elements. (dH, dW) |
| `int` | `deformable_group` | Groups of deformable offset. |
| `int` | `group` | Split input into groups. `input_channel` should be divisible by the number of groups. |
| `int` | `im2col_step` | DeformableConv2d uses im2col to compute the convolution. `im2col_step` is used to split the input and offset, which reduces the memory usage of the column buffer. |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input offset; 4-D tensor of shape (N, deformable_group * 2 * kH * kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and output.</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
### grid_sampler
#### Description
Sample from `input` at the pixel locations given by `grid`.
#### Parameters
| Type | Parameter | Description |
| ----- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `int` | `interpolation_mode` | Interpolation mode to calculate output values. (0: `bilinear` , 1: `nearest`) |
| `int` | `padding_mode` | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`) |
| `int` | `align_corners` | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input offset; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of the offset and output.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
### cummax
#### Description
Returns a namedtuple (`values`, `indices`), where `values` holds the cumulative maximum of the elements of `input` along dimension `dim`, and `indices` holds the index of each maximum value found along `dim`.
#### Parameters
| Type | Parameter | Description |
| ----- | --------- | --------------------------------------- |
| `int` | `dim` | The dimension to do the operation over. |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>The input tensor.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output values.</dd>
<dt><tt>outputs[1]</tt>: tensor(int32, Linear)</dt>
<dd>Output indices.</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
### cummin
#### Description
Returns a namedtuple (`values`, `indices`), where `values` holds the cumulative minimum of the elements of `input` along dimension `dim`, and `indices` holds the index of each minimum value found along `dim`.
#### Parameters
| Type | Parameter | Description |
| ----- | --------- | --------------------------------------- |
| `int` | `dim` | The dimension to do the operation over. |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>The input tensor.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output values.</dd>
<dt><tt>outputs[1]</tt>: tensor(int32, Linear)</dt>
<dd>Output indices.</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
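The two cumulative operators above can be sketched in plain Python for a single 1-D slice along `dim`. This is only an illustration, not the plugin implementation; the function name `cumulative_extreme` is hypothetical, and the tie-breaking rule (the later index wins) is an assumption.

```python
from typing import List, Sequence, Tuple


def cumulative_extreme(values: Sequence[float],
                       take_max: bool = True) -> Tuple[List[float], List[int]]:
    """Return the running maximum (or minimum) and the index where it was found."""
    out_values: List[float] = []
    out_indices: List[int] = []
    best_index = 0
    for i, v in enumerate(values):
        best = values[best_index]
        # Assumption: on ties the later index replaces the earlier one.
        better = v >= best if take_max else v <= best
        if better:
            best_index = i
        out_values.append(values[best_index])
        out_indices.append(best_index)
    return out_values, out_indices


max_values, max_indices = cumulative_extreme([1.0, 3.0, 2.0, 5.0, 4.0])
min_values, min_indices = cumulative_extreme([3.0, 1.0, 2.0, 0.0], take_max=False)
```

Both outputs have the same length as the input, matching the description that `values` and `indices` share the shape of `input`.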
### MMCVInstanceNormalization
#### Description
Carries out instance normalization as described in the paper https://arxiv.org/abs/1607.08022.
y = scale * (x - mean) / sqrt(variance + epsilon) + B, where mean and variance are computed per instance per channel.
#### Parameters
| Type | Parameter | Description |
| ------- | --------- | -------------------------------------------------------------------- |
| `float` | `epsilon` | The epsilon value to use to avoid division by zero. Default is 1e-05 |
#### Inputs
<dl>
<dt><tt>input</tt>: T</dt>
<dd>Input data tensor from the previous operator; dimensions for image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For non image case, the dimensions are in the form of (N x C x D1 x D2 ... Dn), where N is the batch size.</dd>
<dt><tt>scale</tt>: T</dt>
<dd>The input 1-dimensional scale tensor of size C.</dd>
<dt><tt>B</tt>: T</dt>
<dd>The input 1-dimensional bias tensor of size C.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>output</tt>: T</dt>
<dd>The output tensor of the same shape as input.</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
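The formula in the description can be written out for one (instance, channel) pair as follows. This is a minimal sketch under the assumption that the variance is the biased (population) variance over the spatial values; the function name `instance_norm_channel` is hypothetical.

```python
import math
from typing import List


def instance_norm_channel(x: List[float], scale: float, bias: float,
                          epsilon: float = 1e-5) -> List[float]:
    """Normalize the flattened H*W values of one instance and one channel."""
    mean = sum(x) / len(x)
    # Assumption: biased variance, computed over this instance and channel only.
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(variance + epsilon)
    # y = scale * (x - mean) / sqrt(variance + epsilon) + B
    return [scale * (v - mean) * inv_std + bias for v in x]


normalized = instance_norm_channel([1.0, 2.0, 3.0, 4.0], scale=2.0, bias=1.0)
```

Applying this function to every channel of every instance, with `scale[c]` and `B[c]` for channel `c`, would reproduce the per-instance, per-channel behaviour described above.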
### MMCVModulatedDeformConv2d
#### Description
Performs modulated deformable convolution on the input feature. Read [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for details.
#### Parameters
| Type | Parameter | Description |
| -------------- | ------------------ | ------------------------------------------------------------------------------------- |
| `list of ints` | `stride` | The stride of the convolving kernel. (sH, sW) |
| `list of ints` | `padding` | Paddings on both sides of the input. (padH, padW) |
| `list of ints` | `dilation` | The spacing between kernel elements. (dH, dW) |
| `int` | `deformable_group` | Groups of deformable offset. |
| `int` | `group` | Split input into groups. `input_channel` should be divisible by the number of groups. |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and output.</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Input mask; 4-D tensor of shape (N, deformable_group* kH* kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and output.</dd>
<dt><tt>inputs[3]</tt>: T</dt>
<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
<dt><tt>inputs[4]</tt>: T, optional</dt>
<dd>Input bias; 1-D tensor of shape (output_channel).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
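The tensor shapes listed above can be derived from the parameters with ordinary convolution arithmetic. The sketch below assumes the standard output-size formula for a convolution with symmetric padding; the helper names are hypothetical and not part of MMCV.

```python
from typing import Dict, Tuple


def conv_output_size(in_size: int, kernel: int, stride: int,
                     padding: int, dilation: int) -> int:
    """Standard convolution output size along one spatial axis."""
    return (in_size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def modulated_deform_conv_shapes(
        n: int, in_h: int, in_w: int, output_channel: int,
        k_h: int, k_w: int,
        stride: Tuple[int, int], padding: Tuple[int, int],
        dilation: Tuple[int, int],
        deformable_group: int) -> Dict[str, Tuple[int, int, int, int]]:
    """Expected shapes of the offset, mask and output tensors."""
    out_h = conv_output_size(in_h, k_h, stride[0], padding[0], dilation[0])
    out_w = conv_output_size(in_w, k_w, stride[1], padding[1], dilation[1])
    return {
        # Two offsets (one per axis) for every kernel location and group.
        "offset": (n, deformable_group * 2 * k_h * k_w, out_h, out_w),
        # One modulation scalar for every kernel location and group.
        "mask": (n, deformable_group * k_h * k_w, out_h, out_w),
        "output": (n, output_channel, out_h, out_w),
    }


shapes = modulated_deform_conv_shapes(
    1, 32, 32, 16, 3, 3, (1, 1), (1, 1), (1, 1), deformable_group=1)
```

For a 3x3 kernel with one deformable group, the offset tensor therefore has 18 channels and the mask tensor has 9.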
## TensorRT Deployment
### <span style="color:red">DeprecationWarning</span>
TensorRT support will be deprecated in the future.
You are welcome to use the unified model deployment toolbox MMDeploy instead: https://github.com/open-mmlab/mmdeploy
<!-- TOC -->
- [TensorRT Deployment](#tensorrt-deployment)
- [<span style="color:red">DeprecationWarning</span>](#deprecationwarning)
- [Introduction](#introduction)
- [List of TensorRT plugins supported in MMCV](#list-of-tensorrt-plugins-supported-in-mmcv)
- [How to build TensorRT plugins in MMCV](#how-to-build-tensorrt-plugins-in-mmcv)
- [Prerequisite](#prerequisite)
- [Build on Linux](#build-on-linux)
- [Create TensorRT engine and run inference in python](#create-tensorrt-engine-and-run-inference-in-python)
- [How to add a TensorRT plugin for custom op in MMCV](#how-to-add-a-tensorrt-plugin-for-custom-op-in-mmcv)
- [Main procedures](#main-procedures)
- [Reminders](#reminders)
- [Known Issues](#known-issues)
- [References](#references)
<!-- TOC -->
### Introduction
**NVIDIA TensorRT** is a software development kit (SDK) for high-performance inference of deep learning models. It includes a deep learning inference optimizer and runtime that deliver low latency and high throughput for deep learning inference applications. Please check its [developer's website](https://developer.nvidia.com/tensorrt) for more information.
To ease the deployment of trained models with custom operators from `mmcv.ops` using TensorRT, a series of TensorRT plugins are included in MMCV.
### List of TensorRT plugins supported in MMCV
| ONNX Operator | TensorRT Plugin | MMCV Releases |
| :------------------------ | :------------------------------------------------------------------------------ | :-----------: |
| MMCVRoiAlign | [MMCVRoiAlign](./tensorrt_custom_ops.md#mmcvroialign) | 1.2.6 |
| ScatterND | [ScatterND](./tensorrt_custom_ops.md#scatternd) | 1.2.6 |
| NonMaxSuppression | [NonMaxSuppression](./tensorrt_custom_ops.md#nonmaxsuppression) | 1.3.0 |
| MMCVDeformConv2d | [MMCVDeformConv2d](./tensorrt_custom_ops.md#mmcvdeformconv2d) | 1.3.0 |
| grid_sampler | [grid_sampler](./tensorrt_custom_ops.md#grid-sampler) | 1.3.1 |
| cummax | [cummax](./tensorrt_custom_ops.md#cummax) | 1.3.5 |
| cummin | [cummin](./tensorrt_custom_ops.md#cummin) | 1.3.5 |
| MMCVInstanceNormalization | [MMCVInstanceNormalization](./tensorrt_custom_ops.md#mmcvinstancenormalization) | 1.3.5 |
| MMCVModulatedDeformConv2d | [MMCVModulatedDeformConv2d](./tensorrt_custom_ops.md#mmcvmodulateddeformconv2d) | 1.3.8 |
Notes
- All plugins listed above are developed on TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0
### How to build TensorRT plugins in MMCV
#### Prerequisite
- Clone repository
```bash
git clone https://github.com/open-mmlab/mmcv.git
```
- Install TensorRT
Download the corresponding TensorRT build from [NVIDIA Developer Zone](https://developer.nvidia.com/nvidia-tensorrt-download).
For example, for Ubuntu 16.04 on x86-64 with cuda-10.2, the downloaded file is `TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0.tar.gz`.
Then, install as below:
```bash
cd ~/Downloads
tar -xvzf TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0.tar.gz
export TENSORRT_DIR=`pwd`/TensorRT-7.2.1.6
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TENSORRT_DIR/lib
```
Install the Python packages `tensorrt`, `graphsurgeon`, and `onnx-graphsurgeon`:
```bash
pip install $TENSORRT_DIR/python/tensorrt-7.2.1.6-cp37-none-linux_x86_64.whl
pip install $TENSORRT_DIR/onnx_graphsurgeon/onnx_graphsurgeon-0.2.6-py2.py3-none-any.whl
pip install $TENSORRT_DIR/graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
```
For more details on installing TensorRT from the tar file, please refer to [NVIDIA's website](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-721/install-guide/index.html#installing-tar).
- Install cuDNN
Install cuDNN 8 following the instructions on [NVIDIA's website](https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installlinux-tar).
#### Build on Linux
```bash
cd mmcv  # to MMCV root directory
MMCV_WITH_OPS=1 MMCV_WITH_TRT=1 pip install -e .
```
### Create TensorRT engine and run inference in python
Here is an example.
```python
import torch
import onnx
from mmcv.tensorrt import (TRTWrapper, onnx2trt, save_trt_engine,
                           is_tensorrt_plugin_loaded)
assert is_tensorrt_plugin_loaded(), 'Requires compiling the TensorRT plugins in mmcv'
onnx_file = 'sample.onnx'
trt_file = 'sample.trt'
onnx_model = onnx.load(onnx_file)
# Model input
inputs = torch.rand(1, 3, 224, 224).cuda()
# Model input shape info
opt_shape_dict = {
    'input': [list(inputs.shape),
              list(inputs.shape),
              list(inputs.shape)]
}
# Create TensorRT engine
max_workspace_size = 1 << 30
trt_engine = onnx2trt(
    onnx_model,
    opt_shape_dict,
    max_workspace_size=max_workspace_size)
# Save TensorRT engine
save_trt_engine(trt_engine, trt_file)
# Run inference with TensorRT
trt_model = TRTWrapper(trt_file, ['input'], ['output'])
with torch.no_grad():
    trt_outputs = trt_model({'input': inputs})
    output = trt_outputs['output']
```
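The `opt_shape_dict` in the example above lists three shapes per input. Assuming these are the minimum, optimal, and maximum shapes of a dynamic-shape profile, in that order, a small helper can build and sanity-check such a dictionary before it is passed to `onnx2trt`. The helper `make_opt_shape_dict` is hypothetical and not part of `mmcv.tensorrt`.

```python
from typing import Dict, List, Sequence


def make_opt_shape_dict(name: str,
                        min_shape: Sequence[int],
                        opt_shape: Sequence[int],
                        max_shape: Sequence[int]) -> Dict[str, List[List[int]]]:
    """Build {input_name: [min_shape, opt_shape, max_shape]} with basic checks."""
    shapes = [list(min_shape), list(opt_shape), list(max_shape)]
    if len({len(s) for s in shapes}) != 1:
        raise ValueError('min/opt/max shapes must have the same rank')
    for low, mid, high in zip(*shapes):
        # Assumption: every dimension must satisfy min <= opt <= max.
        if not low <= mid <= high:
            raise ValueError('each dimension must satisfy min <= opt <= max')
    return {name: shapes}


# A profile that allows batch sizes from 1 to 8 and prefers 4.
opt_shape_dict = make_opt_shape_dict(
    'input', (1, 3, 224, 224), (4, 3, 224, 224), (8, 3, 224, 224))
```

When all three shapes are identical, as in the example above, the engine is effectively built for a single static shape.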
### How to add a TensorRT plugin for custom op in MMCV
#### Main procedures
Below are the main steps:
1. Add C++ header file
2. Add C++ source file
3. Add CUDA kernel file
4. Register plugin in `trt_plugin.cpp`
5. Add unit test in `tests/test_ops/test_tensorrt.py`
**Take RoIAlign plugin `roi_align` for example.**
1. Add header `trt_roi_align.hpp` to TensorRT include directory `mmcv/ops/csrc/tensorrt/`
2. Add source `trt_roi_align.cpp` to TensorRT source directory `mmcv/ops/csrc/tensorrt/plugins/`
3. Add CUDA kernel `trt_roi_align_kernel.cu` to TensorRT source directory `mmcv/ops/csrc/tensorrt/plugins/`
4. Register `roi_align` plugin in [trt_plugin.cpp](https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/csrc/tensorrt/plugins/trt_plugin.cpp)
```c++
#include "trt_plugin.hpp"
#include "trt_roi_align.hpp"
REGISTER_TENSORRT_PLUGIN(RoIAlignPluginDynamicCreator);
extern "C" {
bool initLibMMCVInferPlugins() { return true; }
} // extern "C"
```
5. Add unit test into `tests/test_ops/test_tensorrt.py`
Check [here](https://github.com/open-mmlab/mmcv/blob/master/tests/test_ops/test_tensorrt.py) for examples.
#### Reminders
- *Please note that this feature is experimental and may change in the future. We strongly suggest always trying with the latest master branch.*
- Some of the [custom ops](https://mmcv.readthedocs.io/en/latest/ops.html) in `mmcv` have CUDA implementations, which can be used as references.
### Known Issues
- None
### References
- [Developer guide of Nvidia TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html)
- [TensorRT Open Source Software](https://github.com/NVIDIA/TensorRT)
- [onnx-tensorrt](https://github.com/onnx/onnx-tensorrt)
- [TensorRT python API](https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/index.html)
- [TensorRT c++ plugin API](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_plugin.html)
@@ -177,8 +177,4 @@ b. Install the lite version.
pip install mmcv pip install mmcv
``` ```
c. Install full version with custom operators for onnxruntime
- Check [here](https://mmcv.readthedocs.io/en/latest/deployment/onnxruntime_custom_ops.html) for detailed instruction.
If you would like to build MMCV from source, please refer to the [guide](https://mmcv.readthedocs.io/en/latest/get_started/build.html).
@@ -25,11 +25,6 @@ You can switch between Chinese and English documents in the lower-left corner of
:caption: Deployment
deployment/mmcv_ops_definition.md
deployment/onnx.md
deployment/onnxruntime_custom_ops.md
deployment/onnxruntime_op.md
deployment/tensorrt_custom_ops.md
deployment/tensorrt_plugin.md
.. toctree::
:caption: Switch Language
......
## Introduction to the ONNX module in MMCV (experimental)
### register_extra_symbolics
Extra symbolic functions need to be registered when exporting a PyTorch model to ONNX.
#### Example
```python
import mmcv
from mmcv.onnx import register_extra_symbolics
opset_version = 11
register_extra_symbolics(opset_version)
```
#### FAQ
-
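## ONNX Runtime Custom Ops
<!-- TOC -->
- [ONNX Runtime Custom Ops](#onnx-runtime-custom-ops)
- [SoftNMS](#softnms)
- [Description](#description)
- [Parameters](#parameters)
- [Inputs](#inputs)
- [Outputs](#outputs)
- [Type Constraints](#type-constraints)
- [RoIAlign](#roialign)
- [Description](#description-1)
- [Parameters](#parameters-1)
- [Inputs](#inputs-1)
- [Outputs](#outputs-1)
- [Type Constraints](#type-constraints-1)
- [NMS](#nms)
- [Description](#description-2)
- [Parameters](#parameters-2)
- [Inputs](#inputs-2)
- [Outputs](#outputs-2)
- [Type Constraints](#type-constraints-2)
- [grid_sampler](#grid_sampler)
- [Description](#description-3)
- [Parameters](#parameters-3)
- [Inputs](#inputs-3)
- [Outputs](#outputs-3)
- [Type Constraints](#type-constraints-3)
- [CornerPool](#cornerpool)
- [Description](#description-4)
- [Parameters](#parameters-4)
- [Inputs](#inputs-4)
- [Outputs](#outputs-4)
- [Type Constraints](#type-constraints-4)
- [cummax](#cummax)
- [Description](#description-5)
- [Parameters](#parameters-5)
- [Inputs](#inputs-5)
- [Outputs](#outputs-5)
- [Type Constraints](#type-constraints-5)
- [cummin](#cummin)
- [Description](#description-6)
- [Parameters](#parameters-6)
- [Inputs](#inputs-6)
- [Outputs](#outputs-6)
- [Type Constraints](#type-constraints-6)
- [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d)
- [Description](#description-7)
- [Parameters](#parameters-7)
- [Inputs](#inputs-7)
- [Outputs](#outputs-7)
- [Type Constraints](#type-constraints-7)
<!-- TOC -->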
### SoftNMS
#### Description
Performs soft NMS on `boxes` according to `scores`. Read [Soft-NMS -- Improving Object Detection With One Line of Code](https://arxiv.org/abs/1704.04503) for details.
#### Parameters
| Type    | Parameter       | Description                                                                            |
| ------- | --------------- | -------------------------------------------------------------------------------------- |
| `float` | `iou_threshold` | Threshold used to decide whether boxes overlap; value range \[0, 1\]. Default is 0.    |
| `float` | `sigma`         | Hyperparameter of the gaussian method.                                                 |
| `float` | `min_score`     | Score threshold for NMS.                                                               |
| `int`   | `method`        | NMS method (0: `naive`, 1: `linear`, 2: `gaussian`).                                   |
| `int`   | `offset`        | Used to compute box width and height as (x2 - x1 + offset). Valid values are 0 and 1.  |
#### Inputs
<dl>
<dt><tt>boxes</tt>: T</dt>
<dd>Input boxes; 2-D tensor of shape (N, 4), where N is the number of boxes.</dd>
<dt><tt>scores</tt>: T</dt>
<dd>Input scores; 1-D tensor of shape (N, ).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>dets</tt>: T</dt>
<dd>Output boxes and scores; 2-D tensor of shape (num_valid_boxes, 5) with contents [[x1, y1, x2, y2, score], ...]. num_valid_boxes is the number of valid boxes.</dd>
<dt><tt>indices</tt>: tensor(int64)</dt>
<dd>Output indices; 1-D tensor of shape (num_valid_boxes, ).</dd>
</dl>
#### Type Constraints
- T:tensor(float32)
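The three values of `method` differ in how the score of a box is rescaled once a higher-scoring box has been selected. The sketch below follows the decay rules from the Soft-NMS paper; it is an illustration under that assumption rather than the operator's actual kernel, and the function name `soft_nms_weight` is hypothetical.

```python
import math


def soft_nms_weight(iou: float, method: int,
                    iou_threshold: float, sigma: float) -> float:
    """Factor multiplied into a box score (method: 0 naive, 1 linear, 2 gaussian)."""
    if method == 2:
        # Gaussian: every overlapping box is decayed smoothly.
        return math.exp(-(iou * iou) / sigma)
    if iou <= iou_threshold:
        # Naive and linear: boxes below the threshold keep their score.
        return 1.0
    # Linear decays in proportion to the overlap; naive suppresses completely.
    return 1.0 - iou if method == 1 else 0.0


linear = soft_nms_weight(0.6, method=1, iou_threshold=0.5, sigma=0.5)
naive = soft_nms_weight(0.6, method=0, iou_threshold=0.5, sigma=0.5)
gaussian = soft_nms_weight(0.6, method=2, iou_threshold=0.5, sigma=0.5)
```

Boxes whose rescaled score falls below `min_score` would then be dropped, which is how the number of valid boxes ends up smaller than N.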
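### RoIAlign
#### Description
Performs RoIAlign on the feature map; typically used in the bbox_head of two-stage object detection models.
#### Parameters
| Type    | Parameter        | Description                                                       |
| ------- | ---------------- | ----------------------------------------------------------------- |
| `int`   | `output_height`  | Height of the output roi feature.                                 |
| `int`   | `output_width`   | Width of the output roi feature.                                  |
| `float` | `spatial_scale`  | Scale factor applied to the input boxes.                          |
| `int`   | `sampling_ratio` | Sampling ratio of the output. `0` means dense sampling.           |
| `str`   | `mode`           | Pooling mode: `avg` or `max`.                                     |
| `int`   | `aligned`        | If `aligned=1`, pixels are shifted by -0.5 for better alignment.  |
#### Inputs
<dl>
<dt><tt>input</tt>: T</dt>
<dd>Input feature map; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of input channels, and H and W are the height and width of the input feature map.</dd>
<dt><tt>rois</tt>: T</dt>
<dd>Regions of interest to pool over; 2-D tensor of shape (num_rois, 5) with contents [[batch_index, x1, y1, x2, y2], ...]. The RoI coordinates are in the coordinate system of the input feature map.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>feat</tt>: T</dt>
<dd>Pooled output; 4-D tensor of shape (num_rois, C, output_height, output_width). Each output feature feat[i] corresponds to the input region of interest rois[i].</dd>
</dl>
#### Type Constraints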
- T:tensor(float32)
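### NMS
#### Description
Performs non-maximum suppression on the boxes according to an IoU threshold.
#### Parameters
| Type    | Parameter       | Description                                                                            |
| ------- | --------------- | -------------------------------------------------------------------------------------- |
| `float` | `iou_threshold` | Threshold used to decide whether boxes overlap; value range \[0, 1\]. Default is 0.    |
| `int`   | `offset`        | Used to compute box width and height as (x2 - x1 + offset). Valid values are 0 and 1.  |
#### Inputs
<dl>
<dt><tt>boxes</tt>: T</dt>
<dd>Input boxes; 2-D tensor of shape (N, 4), where N is the number of boxes.</dd>
<dt><tt>scores</tt>: T</dt>
<dd>Input scores; 1-D tensor of shape (N, ).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>indices</tt>: tensor(int32, Linear)</dt>
<dd>Indices of the selected boxes; 1-D tensor of shape (num_valid_boxes, ), where num_valid_boxes is the number of selected boxes.</dd>
</dl>
#### Type Constraints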
- T:tensor(float32)
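How `iou_threshold` and `offset` interact can be seen in a plain-Python greedy NMS. This is a minimal sketch assuming the conventional greedy algorithm (keep the highest-scoring box, discard boxes whose IoU with it exceeds the threshold); the function names are hypothetical and the code is not the operator's kernel.

```python
from typing import List, Sequence


def box_iou(a: Sequence[float], b: Sequence[float], offset: int = 0) -> float:
    """IoU of two [x1, y1, x2, y2] boxes; sizes are computed as (x2 - x1 + offset)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0]) + offset
    inter_h = min(a[3], b[3]) - max(a[1], b[1]) + offset
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0] + offset) * (a[3] - a[1] + offset)
    area_b = (b[2] - b[0] + offset) * (b[3] - b[1] + offset)
    return inter / (area_a + area_b - inter)


def greedy_nms(boxes: Sequence[Sequence[float]], scores: Sequence[float],
               iou_threshold: float, offset: int = 0) -> List[int]:
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order
                 if box_iou(boxes[best], boxes[i], offset) <= iou_threshold]
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
kept = greedy_nms(boxes, [0.9, 0.8, 0.7], iou_threshold=0.5)
```

The returned list plays the role of the `indices` output: its length is num_valid_boxes.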
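### grid_sampler
#### Description
Performs grid sampling on `input` at the pixel locations given by `grid`.
#### Parameters
| Type  | Parameter            | Description                                                                                                                                                                                                                                                                                         |
| ----- | -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `int` | `interpolation_mode` | Interpolation mode used to compute the output (0: `bilinear`, 1: `nearest`).                                                                                                                                                                                                                        |
| `int` | `padding_mode`       | Padding mode for locations outside the input (0: `zeros`, 1: `border`, 2: `reflection`).                                                                                                                                                                                                            |
| `int` | `align_corners`      | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic.    |
#### Inputs
<dl>
<dt><tt>input</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of input channels, and inH and inW are the height and width of the input feature map.</dd>
<dt><tt>grid</tt>: T</dt>
<dd>Input grid; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of the output.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>output</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
</dl>
#### Type Constraints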
- T:tensor(float32, Linear)
### CornerPool
#### Description
Performs CornerPool on `input`. Read [CornerNet -- Detecting Objects as Paired Keypoints](https://arxiv.org/abs/1808.01244) for more details.
#### Parameters
| Type  | Parameter | Description                                                     |
| ----- | --------- | --------------------------------------------------------------- |
| `int` | `mode`    | Pooling mode (0: `top`, 1: `bottom`, 2: `left`, 3: `right`).    |
#### Inputs
<dl>
<dt><tt>input</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of input channels, and H and W are the height and width of the input feature map.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>output</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, C, H, W).</dd>
</dl>
#### Type Constraints
- T:tensor(float32)
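The four pooling modes can be sketched on a single H x W map as directional running maxima. The scan directions below follow my reading of the CornerNet paper (top pooling takes the maximum over the current row and all rows below it, and so on) and should be treated as an assumption; the function name `corner_pool` is hypothetical.

```python
from typing import List


def corner_pool(feature: List[List[float]], mode: int) -> List[List[float]]:
    """Corner pooling on one H x W map (mode: 0 top, 1 bottom, 2 left, 3 right)."""
    h, w = len(feature), len(feature[0])
    out = [row[:] for row in feature]
    if mode == 0:
        # top: maximum over the current row and every row below it.
        for i in range(h - 2, -1, -1):
            for j in range(w):
                out[i][j] = max(out[i][j], out[i + 1][j])
    elif mode == 1:
        # bottom: maximum over the current row and every row above it.
        for i in range(1, h):
            for j in range(w):
                out[i][j] = max(out[i][j], out[i - 1][j])
    elif mode == 2:
        # left: maximum over the current column and every column to its right.
        for row in out:
            for j in range(w - 2, -1, -1):
                row[j] = max(row[j], row[j + 1])
    elif mode == 3:
        # right: maximum over the current column and every column to its left.
        for row in out:
            for j in range(1, w):
                row[j] = max(row[j], row[j - 1])
    else:
        raise ValueError('mode must be 0, 1, 2 or 3')
    return out


feature_map = [[1.0, 2.0], [3.0, 0.0]]
top = corner_pool(feature_map, 0)
left = corner_pool(feature_map, 2)
```

The output keeps the input shape, consistent with the (N, C, H, W) output described above; the operator would apply the same scan to every channel of every sample.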
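### cummax
#### Description
Returns a tuple (`values`, `indices`), where `values` is the cumulative maximum of `input` along dimension `dim` and `indices` is the location of each maximum along dimension `dim`. Read [torch.cummax](https://pytorch.org/docs/stable/generated/torch.cummax.html) for more details.
#### Parameters
| Type  | Parameter | Description                              |
| ----- | --------- | ---------------------------------------- |
| `int` | `dim`     | The dimension to do the operation over.  |
#### Inputs
<dl>
<dt><tt>input</tt>: T</dt>
<dd>The input tensor; it can have any shape, and empty tensors are also supported.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>output</tt>: T</dt>
<dd>The cumulative maximum of `input` along dimension `dim`, with the same shape and type as `input`.</dd>
<dt><tt>indices</tt>: tensor(int64)</dt>
<dd>The location of each maximum along dimension `dim`, with the same shape as `input`.</dd>
</dl>
#### Type Constraints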
- T:tensor(float32)
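### cummin
#### Description
Returns a tuple (`values`, `indices`), where `values` is the cumulative minimum of `input` along dimension `dim` and `indices` is the location of each minimum along dimension `dim`. Read [torch.cummin](https://pytorch.org/docs/stable/generated/torch.cummin.html) for more details.
#### Parameters
| Type  | Parameter | Description                              |
| ----- | --------- | ---------------------------------------- |
| `int` | `dim`     | The dimension to do the operation over.  |
#### Inputs
<dl>
<dt><tt>input</tt>: T</dt>
<dd>The input tensor; it can have any shape, and empty tensors are also supported.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>output</tt>: T</dt>
<dd>The cumulative minimum of `input` along dimension `dim`, with the same shape and type as `input`.</dd>
<dt><tt>indices</tt>: tensor(int64)</dt>
<dd>The location of each minimum along dimension `dim`, with the same shape as `input`.</dd>
</dl>
#### Type Constraints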
- T:tensor(float32)
### MMCVModulatedDeformConv2d
#### Description
Performs modulated deformable convolution on the input feature. Read [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for more details.
#### Parameters
| Type           | Parameter           | Description                                                                                  |
| -------------- | ------------------- | -------------------------------------------------------------------------------------------- |
| `list of ints` | `stride`            | Stride of the convolution (sH, sW).                                                          |
| `list of ints` | `padding`           | Padding of the input feature (padH, padW).                                                   |
| `list of ints` | `dilation`          | Spacing between kernel elements (dH, dW).                                                    |
| `int`          | `deformable_groups` | Number of deformable offset groups; setting it to 1 is usually sufficient.                   |
| `int`          | `groups`            | Number of convolution groups; `input_channel` is split into this many groups for computing.  |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of input channels, and inH and inW are the height and width of the input feature map.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and output.</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Input mask; 4-D tensor of shape (N, deformable_group* kH* kW, outH, outW).</dd>
<dt><tt>inputs[3]</tt>: T</dt>
<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
<dt><tt>inputs[4]</tt>: T, optional</dt>
<dd>Input bias; 1-D tensor of shape (output_channel).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
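## ONNX Runtime Custom Operators in MMCV
### Introduction to ONNX Runtime
**ONNX Runtime** is a cross-platform inference and training accelerator compatible with many popular ML/DNN frameworks. Visit its [GitHub repository](https://github.com/microsoft/onnxruntime) for more information.
### Introduction to ONNX
**ONNX** stands for **Open Neural Network Exchange**, an *intermediate representation (IR)* used by many ML/DNN frameworks. Visit its [GitHub repository](https://github.com/onnx/onnx) for more information.
### Why add ONNX custom operators to MMCV?
- To verify the correctness of ONNX model inference in ONNX Runtime.
- To ease the deployment of models that use custom operators from `mmcv.ops`.
### Operators supported in MMCV
| Operator | CPU | GPU | MMCV Releases |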
| :------------------------------------------------------------------------------: | :-: | :-: | :------: |
| [SoftNMS](onnxruntime_custom_ops.md#softnms) | Y | N | 1.2.3 |
| [RoIAlign](onnxruntime_custom_ops.md#roialign) | Y | N | 1.2.5 |
| [NMS](onnxruntime_custom_ops.md#nms) | Y | N | 1.2.7 |
| [grid_sampler](onnxruntime_custom_ops.md#grid_sampler) | Y | N | 1.3.1 |
| [CornerPool](onnxruntime_custom_ops.md#cornerpool) | Y | N | 1.3.4 |
| [cummax](onnxruntime_custom_ops.md#cummax) | Y | N | 1.3.4 |
| [cummin](onnxruntime_custom_ops.md#cummin) | Y | N | 1.3.4 |
| [MMCVModulatedDeformConv2d](onnxruntime_custom_ops.md#mmcvmodulateddeformconv2d) | Y | N | 1.3.12 |
### How to build custom operators for ONNX Runtime
*Please note that we have only tested on the Linux x86-64 CPU platform with **onnxruntime>=1.8.1**.*
#### Prerequisite
- Clone the repository
```bash
git clone https://github.com/open-mmlab/mmcv.git
```
- Download `onnxruntime-linux` from the ONNX Runtime [releases](https://github.com/microsoft/onnxruntime/releases/tag/v1.8.1), extract it, set the variable `ONNXRUNTIME_DIR` to the extracted path, and add its lib directory to `LD_LIBRARY_PATH`, as follows:
```bash
wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-1.8.1.tgz
tar -zxvf onnxruntime-linux-x64-1.8.1.tgz
cd onnxruntime-linux-x64-1.8.1
export ONNXRUNTIME_DIR=$(pwd)
export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH
```
#### Build on Linux
```bash
cd mmcv  # to MMCV root directory
MMCV_WITH_OPS=1 MMCV_WITH_ORT=1 python setup.py develop
```
### How to run inference on exported ONNX models with ONNX Runtime in Python
Install ONNX Runtime with `pip`:
```bash
pip install onnxruntime==1.8.1
```
Inference example:
```python
import os
import numpy as np
import onnxruntime as ort
from mmcv.ops import get_onnxruntime_op_path
ort_custom_op_path = get_onnxruntime_op_path()
assert os.path.exists(ort_custom_op_path)
session_options = ort.SessionOptions()
session_options.register_custom_ops_library(ort_custom_op_path)
# exported ONNX model with custom operators
onnx_file = 'sample.onnx'
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
sess = ort.InferenceSession(onnx_file, session_options)
onnx_results = sess.run(None, {'input' : input_data})
```
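### How to add a new custom operator for ONNX Runtime in MMCV
#### Reminders
- The operator is not yet supported by an existing ONNX Runtime implementation; check the [list of implemented operators](https://github.com/microsoft/onnxruntime/blob/master/docs/OperatorKernels.md).
- Make sure the custom operator can be exported to ONNX.
#### Main procedures
Take the custom operator `soft_nms` as an example:
1. Add the header `soft_nms.h` to the ONNX Runtime include directory `mmcv/ops/csrc/onnxruntime/`.
2. Add the implementation `soft_nms.cpp` to the ONNX Runtime source directory `mmcv/ops/csrc/onnxruntime/cpu/`.
3. Register the `soft_nms` operator in [onnxruntime_register.cpp](../../../mmcv/ops/csrc/onnxruntime/cpu/onnxruntime_register.cpp):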
```c++
#include "soft_nms.h"
SoftNmsOp c_SoftNmsOp;
if (auto status = ortApi->CustomOpDomain_Add(domain, &c_SoftNmsOp)) {
  return status;
}
```
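4. Add unit tests to `tests/test_ops/test_onnx.py`;
see [here](../../tests/test_ops/test_onnx.py) for examples.
**Finally, contributions of ONNX Runtime custom operators to MMCV are welcome** :nerd_face:
### Known Issues
- "RuntimeError: tuple appears in op that does not forward tuples, unsupported kind: `prim::PythonOp`."
1. Note that the `cummax` and `cummin` operators were added in torch >= 1.5.0, but they can only be exported correctly with torch >= 1.7.0. Otherwise, the error above occurs during export.
2. Solution: upgrade PyTorch to 1.7.0 or later.
### References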
- [How to export Pytorch model with custom op to ONNX and run it in ONNX Runtime](https://github.com/onnx/tutorials/blob/master/PyTorchCustomOperator/README.md)
- [How to add a custom operator/kernel in ONNX Runtime](https://onnxruntime.ai/docs/reference/operators/add-custom-op.html)
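## TensorRT Custom Ops
<!-- TOC -->
- [TensorRT Custom Ops](#tensorrt-custom-ops)
- [MMCVRoIAlign](#mmcvroialign)
- [Description](#description)
- [Parameters](#parameters)
- [Inputs](#inputs)
- [Outputs](#outputs)
- [Type Constraints](#type-constraints)
- [ScatterND](#scatternd)
- [Description](#description-1)
- [Parameters](#parameters-1)
- [Inputs](#inputs-1)
- [Outputs](#outputs-1)
- [Type Constraints](#type-constraints-1)
- [NonMaxSuppression](#nonmaxsuppression)
- [Description](#description-2)
- [Parameters](#parameters-2)
- [Inputs](#inputs-2)
- [Outputs](#outputs-2)
- [Type Constraints](#type-constraints-2)
- [MMCVDeformConv2d](#mmcvdeformconv2d)
- [Description](#description-3)
- [Parameters](#parameters-3)
- [Inputs](#inputs-3)
- [Outputs](#outputs-3)
- [Type Constraints](#type-constraints-3)
- [grid_sampler](#grid_sampler)
- [Description](#description-4)
- [Parameters](#parameters-4)
- [Inputs](#inputs-4)
- [Outputs](#outputs-4)
- [Type Constraints](#type-constraints-4)
- [cummax](#cummax)
- [Description](#description-5)
- [Parameters](#parameters-5)
- [Inputs](#inputs-5)
- [Outputs](#outputs-5)
- [Type Constraints](#type-constraints-5)
- [cummin](#cummin)
- [Description](#description-6)
- [Parameters](#parameters-6)
- [Inputs](#inputs-6)
- [Outputs](#outputs-6)
- [Type Constraints](#type-constraints-6)
- [MMCVInstanceNormalization](#mmcvinstancenormalization)
- [Description](#description-7)
- [Parameters](#parameters-7)
- [Inputs](#inputs-7)
- [Outputs](#outputs-7)
- [Type Constraints](#type-constraints-7)
- [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d)
- [Description](#description-8)
- [Parameters](#parameters-8)
- [Inputs](#inputs-8)
- [Outputs](#outputs-8)
- [Type Constraints](#type-constraints-8)
<!-- TOC -->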
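### MMCVRoIAlign
#### Description
Performs RoIAlign on the feature map; used in the bbox_head of most two-stage object detection models.
#### Parameters
| Type    | Parameter        | Description                                                       |
| ------- | ---------------- | ----------------------------------------------------------------- |
| `int`   | `output_height`  | Height of the output roi feature.                                 |
| `int`   | `output_width`   | Width of the output roi feature.                                  |
| `float` | `spatial_scale`  | Scale factor applied to the input boxes.                          |
| `int`   | `sampling_ratio` | Sampling ratio of the output. `0` means dense sampling.           |
| `str`   | `mode`           | Pooling mode: `avg` or `max`.                                     |
| `int`   | `aligned`        | If `aligned=1`, pixels are shifted by -0.5 for better alignment.  |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature map; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of input channels, and H and W are the height and width of the input feature map.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Regions of interest to pool over; 2-D tensor of shape (num_rois, 5) with contents [[batch_index, x1, y1, x2, y2], ...]. The RoI coordinates are in the coordinate system of the input feature map.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Pooled output; 4-D tensor of shape (num_rois, C, output_height, output_width). Each output feature feat[i] corresponds to the input region of interest rois[i].</dd>
</dl>
#### Type Constraints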
- T:tensor(float32, Linear)
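### ScatterND
#### Description
ScatterND takes three inputs: `data` of rank r >= 1, `indices` of rank q >= 1, and `updates` of rank q + r - indices.shape\[-1\] - 1. The output is produced by first creating a copy of `data` and then updating the copy with the values in `updates` at the positions given by `indices`. Note that `indices` should not contain duplicate entries; that is, updating the same position more than once is not allowed.
The output can be computed as in the following code:
```python
import numpy as np
output = np.copy(data)
update_indices = indices.shape[:-1]
for idx in np.ndindex(update_indices):
    # Convert the index row to a tuple so that it addresses one location (or slice).
    output[tuple(indices[idx])] = updates[idx]
```
#### Parameters
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input `data` of rank r >= 1.</dd>
<dt><tt>inputs[1]</tt>: tensor(int32, Linear)</dt>
<dd>Input `indices` of rank q >= 1.</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Input `updates` of rank q + r - indices.shape[-1] - 1.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output tensor of rank r >= 1.</dd>
</dl>
#### Type Constraints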
- T:tensor(float32, Linear), tensor(int32, Linear)
### NonMaxSuppression
#### Description
Performs non-maximum suppression on the boxes according to an IoU threshold.
#### Parameters
| Type    | Parameter                    | Description                                                                                                                        |
| ------- | ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| `int`   | `center_point_box`           | 0: boxes are in \[y1, x1, y2, x2\] format; 1: boxes are in \[x_center, y_center, width, height\] format.                           |
| `int`   | `max_output_boxes_per_class` | Maximum number of output boxes per class. Default is 0, in which case the number of output boxes equals the number of input boxes. |
| `float` | `iou_threshold`              | Threshold used to decide whether boxes overlap; value range \[0, 1\]. Default is 0.                                                |
| `float` | `score_threshold`            | Threshold used to decide whether a box is valid.                                                                                   |
| `int`   | `offset`                     | Box width and height are computed as (x2 - x1 + offset). Valid values are 0 and 1.                                                 |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input boxes; 3-D tensor of shape (num_batches, spatial_dimension, 4).</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input scores; 3-D tensor of shape (num_batches, num_classes, spatial_dimension).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: tensor(int32, Linear)</dt>
<dd>Indices of the selected boxes; 2-D tensor of shape (num_selected_indices, 3). Each row is [batch_index, class_index, box_index].</dd>
<dd>Here num_selected_indices=num_batches* num_classes* min(max_output_boxes_per_class, spatial_dimension).</dd>
<dd>The indices of all boxes that are not selected are filled with -1.</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
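### MMCVDeformConv2d
#### Description
Performs deformable convolution on the input feature. Read [Deformable Convolutional Network](https://arxiv.org/abs/1703.06211) for more details.
#### Parameters
| Type           | Parameter          | Description                                                                                                                                               |
| -------------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `list of ints` | `stride`           | Stride of the convolution (sH, sW).                                                                                                                       |
| `list of ints` | `padding`          | Padding of the input feature (padH, padW).                                                                                                                |
| `list of ints` | `dilation`         | Spacing between kernel elements (dH, dW).                                                                                                                 |
| `int`          | `deformable_group` | Number of deformable offset groups.                                                                                                                       |
| `int`          | `group`            | Number of convolution groups; `input_channel` is split into this many groups for computing.                                                               |
| `int`          | `im2col_step`      | Deformable convolution uses im2col to compute the convolution. The input and offset are processed in chunks of im2col_step to reduce temporary memory use. |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of input channels, and inH and inW are the height and width of the input feature map.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and output.</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
</dl>
#### Type Constraints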
- T:tensor(float32, Linear)
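### grid_sampler
#### Description
Performs grid sampling on `input` at the pixel locations given by `grid`.
#### Parameters
| Type  | Parameter            | Description                                                                                                                                                                                                                                                                                         |
| ----- | -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `int` | `interpolation_mode` | Interpolation mode used to compute the output (0: `bilinear`, 1: `nearest`).                                                                                                                                                                                                                        |
| `int` | `padding_mode`       | Padding mode for locations outside the input (0: `zeros`, 1: `border`, 2: `reflection`).                                                                                                                                                                                                            |
| `int` | `align_corners`      | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic.    |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of input channels, and inH and inW are the height and width of the input feature map.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input grid; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of the output.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
</dl>
#### Type Constraints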
- T:tensor(float32, Linear)
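### cummax
#### Description
Returns a tuple (`values`, `indices`), where `values` is the cumulative maximum of `input` along dimension `dim` and `indices` is the location of each maximum along dimension `dim`. Read [torch.cummax](https://pytorch.org/docs/stable/generated/torch.cummax.html) for more details.
#### Parameters
| Type  | Parameter | Description                              |
| ----- | --------- | ---------------------------------------- |
| `int` | `dim`     | The dimension to do the operation over.  |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>The input tensor; it can have any shape.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>The cumulative maximum of `input` along dimension `dim`, with the same shape and type as `input`.</dd>
<dt><tt>outputs[1]</tt>: tensor(int32, Linear)</dt>
<dd>The location of each maximum along dimension `dim`, with the same shape as `input`.</dd>
</dl>
#### Type Constraints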
- T:tensor(float32, Linear)
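### cummin
#### Description
Returns a tuple (`values`, `indices`), where `values` is the cumulative minimum of `input` along dimension `dim` and `indices` is the location of each minimum along dimension `dim`. Read [torch.cummin](https://pytorch.org/docs/stable/generated/torch.cummin.html) for more details.
#### Parameters
| Type  | Parameter | Description                              |
| ----- | --------- | ---------------------------------------- |
| `int` | `dim`     | The dimension to do the operation over.  |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>The input tensor; it can have any shape.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>The cumulative minimum of `input` along dimension `dim`, with the same shape and type as `input`.</dd>
<dt><tt>outputs[1]</tt>: tensor(int32, Linear)</dt>
<dd>The location of each minimum along dimension `dim`, with the same shape as `input`.</dd>
</dl>
#### Type Constraints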
- T:tensor(float32, Linear)
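### MMCVInstanceNormalization
#### Description
Performs instance normalization on the input feature. Read [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022) for more details.
#### Parameters
| Type    | Parameter | Description                                          |
| ------- | --------- | ---------------------------------------------------- |
| `float` | `epsilon` | Used to avoid division by zero. Default is 1e-05.    |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of input channels, and H and W are the height and width of the input feature map.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input scale; 1-D tensor of shape (C,).</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Input bias; 1-D tensor of shape (C,).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, C, H, W).</dd>
</dl>
#### Type Constraints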
- T:tensor(float32, Linear)
### MMCVModulatedDeformConv2d
#### Description
Performs modulated deformable convolution on the input feature. Read [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for more details.
#### Parameters
| Type           | Parameter           | Description                                                                                  |
| -------------- | ------------------- | -------------------------------------------------------------------------------------------- |
| `list of ints` | `stride`            | Stride of the convolution (sH, sW).                                                          |
| `list of ints` | `padding`           | Padding of the input feature (padH, padW).                                                   |
| `list of ints` | `dilation`          | Spacing between kernel elements (dH, dW).                                                    |
| `int`          | `deformable_groups` | Number of deformable offset groups; setting it to 1 is usually sufficient.                   |
| `int`          | `groups`            | Number of convolution groups; `input_channel` is split into this many groups for computing.  |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of input channels, and inH and inW are the height and width of the input feature map.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and output.</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Input mask; 4-D tensor of shape (N, deformable_group* kH* kW, outH, outW).</dd>
<dt><tt>inputs[3]</tt>: T</dt>
<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
<dt><tt>inputs[4]</tt>: T, optional</dt>
<dd>Input bias; 1-D tensor of shape (output_channel).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
## TensorRT Plugins for custom operators in MMCV (Experimental)
<!-- TOC -->
- [TensorRT Plugins for custom operators in MMCV (Experimental)](#mmcv%E4%B8%AD%E7%9A%84tensorrt%E8%87%AA%E5%AE%9A%E4%B9%89%E7%AE%97%E5%AD%90-%E5%AE%9E%E9%AA%8C%E6%80%A7)
  - [Introduction](#%E4%BB%8B%E7%BB%8D)
  - [List of TensorRT plugins supported in MMCV](#mmcv%E4%B8%AD%E7%9A%84tensorrt%E6%8F%92%E4%BB%B6%E5%88%97%E8%A1%A8)
  - [How to build TensorRT plugins in MMCV](#%E5%A6%82%E4%BD%95%E7%BC%96%E8%AF%91mmcv%E4%B8%AD%E7%9A%84tensorrt%E6%8F%92%E4%BB%B6)
    - [Prerequisite](#%E5%87%86%E5%A4%87)
    - [Build on Linux](#%E5%9C%A8linux%E4%B8%8A%E7%BC%96%E8%AF%91)
  - [Create TensorRT engine and run inference in Python](#%E5%88%9B%E5%BB%BAtensorrt%E6%8E%A8%E7%90%86%E5%BC%95%E6%93%8E%E5%B9%B6%E5%9C%A8python%E4%B8%8B%E8%BF%9B%E8%A1%8C%E6%8E%A8%E7%90%86)
  - [How to add a TensorRT plugin for custom op in MMCV](#%E5%A6%82%E4%BD%95%E5%9C%A8mmcv%E4%B8%AD%E6%B7%BB%E5%8A%A0%E6%96%B0%E7%9A%84tensorrt%E8%87%AA%E5%AE%9A%E4%B9%89%E7%AE%97%E5%AD%90)
    - [Main procedures](#%E4%B8%BB%E8%A6%81%E6%B5%81%E7%A8%8B)
    - [Reminders](#%E6%B3%A8%E6%84%8F)
  - [Known Issues](#%E5%B7%B2%E7%9F%A5%E9%97%AE%E9%A2%98)
  - [References](#%E5%BC%95%E7%94%A8)
<!-- TOC -->
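### Introduction
**NVIDIA TensorRT** is a software development kit (SDK) for high-performance inference of deep learning models. It includes a deep learning inference optimizer and runtime that deliver low latency and high throughput for deep learning inference applications. See the [developer's website](https://developer.nvidia.com/tensorrt) for more information.
To simplify deploying models that contain MMCV custom operators with TensorRT, a series of TensorRT plugins are included in MMCV.
### List of TensorRT plugins supported in MMCV
|       ONNX Operator       |                                 TensorRT Plugin                                 | MMCV Releases |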
| :-----------------------: | :-----------------------------------------------------------------------------: | :------: |
| MMCVRoiAlign | [MMCVRoiAlign](./tensorrt_custom_ops.md#mmcvroialign) | 1.2.6 |
| ScatterND | [ScatterND](./tensorrt_custom_ops.md#scatternd) | 1.2.6 |
| NonMaxSuppression | [NonMaxSuppression](./tensorrt_custom_ops.md#nonmaxsuppression) | 1.3.0 |
| MMCVDeformConv2d | [MMCVDeformConv2d](./tensorrt_custom_ops.md#mmcvdeformconv2d) | 1.3.0 |
| grid_sampler | [grid_sampler](./tensorrt_custom_ops.md#grid-sampler) | 1.3.1 |
| cummax | [cummax](./tensorrt_custom_ops.md#cummax) | 1.3.5 |
| cummin | [cummin](./tensorrt_custom_ops.md#cummin) | 1.3.5 |
| MMCVInstanceNormalization | [MMCVInstanceNormalization](./tensorrt_custom_ops.md#mmcvinstancenormalization) | 1.3.5 |
| MMCVModulatedDeformConv2d | [MMCVModulatedDeformConv2d](./tensorrt_custom_ops.md#mmcvmodulateddeformconv2d) | master |
Notes
- All plugins listed above were developed on TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0.
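The release column of the table can be read as "available from this MMCV version onward". Below is a small sketch of that lookup. The helper names are hypothetical, the data is copied from the table, and plugins marked `master` are assumed not to be part of any numbered release.

```python
from typing import List, Tuple

# Copied from the plugin table above: ONNX operator -> first MMCV release.
PLUGIN_RELEASES = {
    'MMCVRoiAlign': '1.2.6',
    'ScatterND': '1.2.6',
    'NonMaxSuppression': '1.3.0',
    'MMCVDeformConv2d': '1.3.0',
    'grid_sampler': '1.3.1',
    'cummax': '1.3.5',
    'cummin': '1.3.5',
    'MMCVInstanceNormalization': '1.3.5',
    'MMCVModulatedDeformConv2d': 'master',
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split('.'))


def plugins_available_in(mmcv_version: str) -> List[str]:
    """List plugins whose first release is not newer than mmcv_version.

    Assumption: entries marked 'master' are unreleased and are skipped.
    """
    target = parse_version(mmcv_version)
    return [
        name for name, release in PLUGIN_RELEASES.items()
        if release != 'master' and parse_version(release) <= target
    ]
```

Under these assumptions, `plugins_available_in('1.3.0')` lists the first four plugins of the table.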
### How to build TensorRT plugins in MMCV
#### Prerequisite
- Clone the repository
```bash
git clone https://github.com/open-mmlab/mmcv.git
```
- Install TensorRT
Download the corresponding TensorRT build from [NVIDIA Developer Zone](https://developer.nvidia.com/nvidia-tensorrt-download).
For example, for Ubuntu 16.04 on x86-64 with cuda-10.2, the downloaded file is `TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0.tar.gz`.
Then install it and set up the environment as follows:
```bash
cd ~/Downloads
tar -xvzf TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0.tar.gz
export TENSORRT_DIR=`pwd`/TensorRT-7.2.1.6
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TENSORRT_DIR/lib
```
Install the Python packages: tensorrt, graphsurgeon, onnx-graphsurgeon
```bash
pip install $TENSORRT_DIR/python/tensorrt-7.2.1.6-cp37-none-linux_x86_64.whl
pip install $TENSORRT_DIR/onnx_graphsurgeon/onnx_graphsurgeon-0.2.6-py2.py3-none-any.whl
pip install $TENSORRT_DIR/graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
```
For more details on installing TensorRT from a tar package, see [NVIDIA's website](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-721/install-guide/index.html#installing-tar).
- Install cuDNN
Install cuDNN 8 following [NVIDIA's website](https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installlinux-tar).
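Before building, it can help to confirm that the environment variables exported above are visible to the current shell. The following sketch uses only the standard library. The helper name `check_tensorrt_env` is hypothetical, and it checks only the two variables set in the install step.

```python
import os
from typing import List, Mapping


def check_tensorrt_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems; empty means it looks fine.

    Hypothetical helper; it only inspects TENSORRT_DIR and LD_LIBRARY_PATH.
    """
    problems: List[str] = []
    trt_dir = env.get('TENSORRT_DIR')
    if not trt_dir:
        problems.append('TENSORRT_DIR is not set')
        return problems
    lib_dir = os.path.join(trt_dir, 'lib')
    if not os.path.isdir(lib_dir):
        problems.append(f'{lib_dir} does not exist')
    # LD_LIBRARY_PATH is a list of directories joined by os.pathsep.
    search_path = env.get('LD_LIBRARY_PATH', '').split(os.pathsep)
    if lib_dir not in search_path:
        problems.append(f'{lib_dir} is not on LD_LIBRARY_PATH')
    return problems
```

Calling `check_tensorrt_env(os.environ)` in the shell used for the build returns an empty list when both variables are set as in the install step.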
#### Build on Linux
```bash
cd mmcv ## to MMCV root directory
MMCV_WITH_OPS=1 MMCV_WITH_TRT=1 pip install -e .
```
### Create TensorRT engine and run inference in Python
Here is an example:
```python
import torch
import onnx

from mmcv.tensorrt import (TRTWrapper, onnx2trt, save_trt_engine,
                           is_tensorrt_plugin_loaded)

assert is_tensorrt_plugin_loaded(), 'Requires to compile TensorRT plugins in mmcv'

onnx_file = 'sample.onnx'
trt_file = 'sample.trt'
onnx_model = onnx.load(onnx_file)

# Model input
inputs = torch.rand(1, 3, 224, 224).cuda()
# Model input shape info
opt_shape_dict = {
    'input': [list(inputs.shape),
              list(inputs.shape),
              list(inputs.shape)]
}

# Create TensorRT engine
max_workspace_size = 1 << 30
trt_engine = onnx2trt(
    onnx_model,
    opt_shape_dict,
    max_workspace_size=max_workspace_size)

# Save TensorRT engine
save_trt_engine(trt_engine, trt_file)

# Run inference with TensorRT
trt_model = TRTWrapper(trt_file, ['input'], ['output'])

with torch.no_grad():
    trt_outputs = trt_model({'input': inputs})
    output = trt_outputs['output']
```
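The example passes three identical shapes per input in `opt_shape_dict`. The sketch below assumes these three entries are the minimum, optimal and maximum shapes of a TensorRT optimization profile, which the text above does not state explicitly. The helper `make_opt_shape_dict` is hypothetical and only builds and validates the dictionary; it does not call MMCV or TensorRT.

```python
from typing import Dict, List, Sequence


def make_opt_shape_dict(
    name: str,
    min_shape: Sequence[int],
    opt_shape: Sequence[int],
    max_shape: Sequence[int],
) -> Dict[str, List[List[int]]]:
    """Build {name: [min, opt, max]} and check that min <= opt <= max.

    Assumption: the three lists are interpreted as min/opt/max shapes.
    """
    shapes = [list(min_shape), list(opt_shape), list(max_shape)]
    if len({len(shape) for shape in shapes}) != 1:
        raise ValueError('min, opt and max shapes must have the same rank')
    # Compare the three shapes dimension by dimension.
    for axis, (low, mid, high) in enumerate(zip(*shapes)):
        if not low <= mid <= high:
            raise ValueError(f'axis {axis}: expected {low} <= {mid} <= {high}')
    return {name: shapes}
```

With a fixed input size, as in the example above, all three shapes are the same: `make_opt_shape_dict('input', [1, 3, 224, 224], [1, 3, 224, 224], [1, 3, 224, 224])`.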
### How to add a TensorRT plugin for custom op in MMCV
#### Main procedures
The main steps are:
1. Add the C++ header file
2. Add the C++ source file
3. Add the CUDA kernel file
4. Register the plugin in `trt_plugin.cpp`
5. Add a unit test in `tests/test_ops/test_tensorrt.py`
**Take the RoIAlign plugin `roi_align` as an example.**
1. Add the header file `trt_roi_align.hpp` to the TensorRT include directory `mmcv/ops/csrc/tensorrt/`
2. Add the source file `trt_roi_align.cpp` to the TensorRT source directory `mmcv/ops/csrc/tensorrt/plugins/`
3. Add the CUDA kernel file `trt_roi_align_kernel.cu` to the TensorRT source directory `mmcv/ops/csrc/tensorrt/plugins/`
4. Register the `roi_align` plugin in [trt_plugin.cpp](https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/csrc/tensorrt/plugins/trt_plugin.cpp)
```c++
#include "trt_plugin.hpp"
#include "trt_roi_align.hpp"
REGISTER_TENSORRT_PLUGIN(RoIAlignPluginDynamicCreator);
extern "C" {
bool initLibMMCVInferPlugins() { return true; }
} // extern "C"
```
5. Add a unit test in `tests/test_ops/test_tensorrt.py`
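The file naming used in the `roi_align` walkthrough can be summarized as a small lookup. This is only an illustrative sketch: the helper `plugin_file_layout` is hypothetical, and it assumes other plugins follow the same `trt_<op>` naming pattern as `roi_align`.

```python
from pathlib import PurePosixPath
from typing import Dict


def plugin_file_layout(op_name: str) -> Dict[str, str]:
    """Map each step of the procedure to the file it touches.

    Assumption: every plugin follows the trt_<op_name> naming of roi_align.
    """
    root = PurePosixPath('mmcv/ops/csrc/tensorrt')
    plugins = root / 'plugins'
    return {
        'header': str(root / f'trt_{op_name}.hpp'),
        'source': str(plugins / f'trt_{op_name}.cpp'),
        'cuda_kernel': str(plugins / f'trt_{op_name}_kernel.cu'),
        'registration': str(plugins / 'trt_plugin.cpp'),
        'unit_test': 'tests/test_ops/test_tensorrt.py',
    }
```

For example, `plugin_file_layout('roi_align')['header']` gives the header path from step 1.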
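#### Reminders
- Some of the custom operators in MMCV have corresponding CUDA implementations, which can serve as references when developing TensorRT plugins.
### Known Issues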
-
### References
- [Developer guide of Nvidia TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html)
- [TensorRT Open Source Software](https://github.com/NVIDIA/TensorRT)
- [onnx-tensorrt](https://github.com/onnx/onnx-tensorrt)
- [TensorRT python API](https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/index.html)
- [TensorRT c++ plugin API](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_plugin.html)
...@@ -173,8 +173,4 @@ b. Install the lite version ...@@ -173,8 +173,4 @@ b. Install the lite version
pip install mmcv pip install mmcv
``` ```
c. Install the full version and build the custom operators for onnxruntime
- See [here](https://mmcv.readthedocs.io/zh_CN/latest/deployment/onnxruntime_custom_ops.html) for detailed instructions.
To build MMCV from source, please refer to [this guide](https://mmcv.readthedocs.io/zh_CN/latest/get_started/build.html) To build MMCV from source, please refer to [this guide](https://mmcv.readthedocs.io/zh_CN/latest/get_started/build.html)
...@@ -21,16 +21,6 @@ ...@@ -21,16 +21,6 @@
understand_mmcv/cnn.md understand_mmcv/cnn.md
understand_mmcv/ops.md understand_mmcv/ops.md
.. toctree::
:maxdepth: 2
:caption: Deployment
deployment/onnx.md
deployment/onnxruntime_op.md
deployment/onnxruntime_custom_ops.md
deployment/tensorrt_plugin.md
deployment/tensorrt_custom_ops.md
.. toctree:: .. toctree::
:caption: Switch Language :caption: Switch Language
......
# Copyright (c) OpenMMLab. All rights reserved. # Copyright (c) OpenMMLab. All rights reserved.
from .info import is_custom_op_loaded
from .symbolic import register_extra_symbolics from .symbolic import register_extra_symbolics
__all__ = ['register_extra_symbolics', 'is_custom_op_loaded'] __all__ = ['register_extra_symbolics']
# Copyright (c) OpenMMLab. All rights reserved.
import os
import warnings
import torch
def is_custom_op_loaded() -> bool:
    # Following strings of text style are from colorama package
    bright_style, reset_style = '\x1b[1m', '\x1b[0m'
    red_text, blue_text = '\x1b[31m', '\x1b[34m'
    white_background = '\x1b[107m'
    msg = white_background + bright_style + red_text
    msg += 'DeprecationWarning: This function will be deprecated in future. '
    msg += blue_text + 'Welcome to use the unified model deployment toolbox '
    msg += 'MMDeploy: https://github.com/open-mmlab/mmdeploy'
    msg += reset_style
    warnings.warn(msg)
    flag = False
    try:
        from ..tensorrt import is_tensorrt_plugin_loaded
        flag = is_tensorrt_plugin_loaded()
    except (ImportError, ModuleNotFoundError):
        pass
    if not flag:
        try:
            from ..ops import get_onnxruntime_op_path
            ort_lib_path = get_onnxruntime_op_path()
            flag = os.path.exists(ort_lib_path)
        except (ImportError, ModuleNotFoundError):
            pass
    return flag or torch.__version__ == 'parrots'
...@@ -27,8 +27,7 @@ from .furthest_point_sample import (furthest_point_sample, ...@@ -27,8 +27,7 @@ from .furthest_point_sample import (furthest_point_sample,
from .fused_bias_leakyrelu import FusedBiasLeakyReLU, fused_bias_leakyrelu from .fused_bias_leakyrelu import FusedBiasLeakyReLU, fused_bias_leakyrelu
from .gather_points import gather_points from .gather_points import gather_points
from .group_points import GroupAll, QueryAndGroup, grouping_operation from .group_points import GroupAll, QueryAndGroup, grouping_operation
from .info import (get_compiler_version, get_compiling_cuda_version, from .info import get_compiler_version, get_compiling_cuda_version
get_onnxruntime_op_path)
from .iou3d import (boxes_iou3d, boxes_iou_bev, boxes_overlap_bev, nms3d, from .iou3d import (boxes_iou3d, boxes_iou_bev, boxes_overlap_bev, nms3d,
nms3d_normal, nms_bev, nms_normal_bev) nms3d_normal, nms_bev, nms_normal_bev)
from .knn import knn from .knn import knn
...@@ -76,9 +75,8 @@ __all__ = [ ...@@ -76,9 +75,8 @@ __all__ = [
'deform_conv2d', 'DeformRoIPool', 'DeformRoIPoolPack', 'deform_conv2d', 'DeformRoIPool', 'DeformRoIPoolPack',
'ModulatedDeformRoIPoolPack', 'deform_roi_pool', 'SigmoidFocalLoss', 'ModulatedDeformRoIPoolPack', 'deform_roi_pool', 'SigmoidFocalLoss',
'SoftmaxFocalLoss', 'sigmoid_focal_loss', 'softmax_focal_loss', 'SoftmaxFocalLoss', 'sigmoid_focal_loss', 'softmax_focal_loss',
'get_compiler_version', 'get_compiling_cuda_version', 'get_compiler_version', 'get_compiling_cuda_version', 'MaskedConv2d',
'get_onnxruntime_op_path', 'MaskedConv2d', 'masked_conv2d', 'masked_conv2d', 'ModulatedDeformConv2d', 'ModulatedDeformConv2dPack',
'ModulatedDeformConv2d', 'ModulatedDeformConv2dPack',
'modulated_deform_conv2d', 'batched_nms', 'nms', 'soft_nms', 'nms_match', 'modulated_deform_conv2d', 'batched_nms', 'nms', 'soft_nms', 'nms_match',
'RoIAlign', 'roi_align', 'RoIPool', 'roi_pool', 'SyncBatchNorm', 'Conv2d', 'RoIAlign', 'roi_align', 'RoIPool', 'roi_pool', 'SyncBatchNorm', 'Conv2d',
'ConvTranspose2d', 'Linear', 'MaxPool2d', 'CrissCrossAttention', 'PSAMask', 'ConvTranspose2d', 'Linear', 'MaxPool2d', 'CrissCrossAttention', 'PSAMask',
......