[Refactor]: Remove deployment for dev-2.x (#2225)

* remove deploy for 2.0
* update onnx ut

Unverified Commit 2e5628b4 authored Aug 26, 2022 by q.yao Committed by GitHub Aug 26, 2022
20 changed files
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -131,15 +131,6 @@ jobs:
      - run:
          name: Install psutil
          command: python -m pip install psutil
-      - run:
-          name: Download onnxruntime library and install onnxruntime
-          command: |
-            wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-1.8.1.tgz
-            tar -zxvf onnxruntime-linux-x64-1.8.1.tgz
-            echo 'export ONNXRUNTIME_DIR=$(pwd)/onnxruntime-linux-x64-1.8.1' >> $BASH_ENV
-            echo 'export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH' >> $BASH_ENV
-            source $BASH_ENV
-            python -m pip install onnxruntime==1.8.1
      - run:
          name: Build and install
          command: |

--- a/README.md
+++ b/README.md
@@ -241,10 +241,6 @@ b. Install the lite version.
 pip install mmcv
 ```

-c. Install full version with custom operators for onnxruntime
-
- Check [here](docs/en/deployment/onnxruntime_op.md) for detailed instruction.
-
 If you would like to build MMCV from source, please refer to the [guide](https://mmcv.readthedocs.io/en/latest/get_started/build.html).

 ## FAQ

--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -238,10 +238,6 @@ b. 安装精简版
 pip install mmcv
 ```

-c. 安装完整版并且编译 onnxruntime 的自定义算子
-
- 详细的指南请查看[这里](docs/zh_cn/deployment/onnxruntime_op.md)。
-
 如果想从源码编译 MMCV，请参考[该文档](https://mmcv.readthedocs.io/zh_CN/latest/get_started/build.html)。

 ## FAQ

--- a/docs/en/deployment/onnx.md
+++ b/docs/en/deployment/onnx.md
-## Introduction of mmcv.onnx module
-
-### <span style="color:red">DeprecationWarning</span>
-
-ONNX support will be deprecated in the future.
-Welcome to use the unified model deployment toolbox MMDeploy: https://github.com/open-mmlab/mmdeploy
-
-### register_extra_symbolics
-
-Some extra symbolic functions need to be registered before exporting PyTorch model to ONNX.
-
-#### Example
-
-```python
-import mmcv
-from mmcv.onnx import register_extra_symbolics
-
-opset_version = 11
-register_extra_symbolics(opset_version)
-```
-
-#### Reminder
-
- *Please note that this feature is experimental and may change in the future.*
-
-#### FAQs
-
- None
--- a/docs/en/deployment/onnxruntime_custom_ops.md
+++ b/docs/en/deployment/onnxruntime_custom_ops.md
-## ONNX Runtime Custom Ops
-
-<!-- TOC -->
-
- [ONNX Runtime Custom Ops](#onnx-runtime-custom-ops)
-  - [SoftNMS](#softnms)
-    - [Description](#description)
-    - [Parameters](#parameters)
-    - [Inputs](#inputs)
-    - [Outputs](#outputs)
-    - [Type Constraints](#type-constraints)
-  - [RoIAlign](#roialign)
-    - [Description](#description-1)
-    - [Parameters](#parameters-1)
-    - [Inputs](#inputs-1)
-    - [Outputs](#outputs-1)
-    - [Type Constraints](#type-constraints-1)
-  - [NMS](#nms)
-    - [Description](#description-2)
-    - [Parameters](#parameters-2)
-    - [Inputs](#inputs-2)
-    - [Outputs](#outputs-2)
-    - [Type Constraints](#type-constraints-2)
-  - [grid_sampler](#grid_sampler)
-    - [Description](#description-3)
-    - [Parameters](#parameters-3)
-    - [Inputs](#inputs-3)
-    - [Outputs](#outputs-3)
-    - [Type Constraints](#type-constraints-3)
-  - [CornerPool](#cornerpool)
-    - [Description](#description-4)
-    - [Parameters](#parameters-4)
-    - [Inputs](#inputs-4)
-    - [Outputs](#outputs-4)
-    - [Type Constraints](#type-constraints-4)
-  - [cummax](#cummax)
-    - [Description](#description-5)
-    - [Parameters](#parameters-5)
-    - [Inputs](#inputs-5)
-    - [Outputs](#outputs-5)
-    - [Type Constraints](#type-constraints-5)
-  - [cummin](#cummin)
-    - [Description](#description-6)
-    - [Parameters](#parameters-6)
-    - [Inputs](#inputs-6)
-    - [Outputs](#outputs-6)
-    - [Type Constraints](#type-constraints-6)
-  - [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d)
-    - [Description](#description-7)
-    - [Parameters](#parameters-7)
-    - [Inputs](#inputs-7)
-    - [Outputs](#outputs-7)
-    - [Type Constraints](#type-constraints-7)
-  - [MMCVDeformConv2d](#mmcvdeformconv2d)
-    - [Description](#description-8)
-    - [Parameters](#parameters-8)
-    - [Inputs](#inputs-8)
-    - [Outputs](#outputs-8)
-    - [Type Constraints](#type-constraints-8)
-
-<!-- TOC -->
-
-### SoftNMS
-
-#### Description
-
-Perform soft NMS on `boxes` with `scores`. Read [Soft-NMS -- Improving Object Detection With One Line of Code](https://arxiv.org/abs/1704.04503) for detail.
-
-#### Parameters
-
-| Type    | Parameter       | Description                                                    |
-| ------- | --------------- | -------------------------------------------------------------- |
-| `float` | `iou_threshold` | IoU threshold for NMS                                          |
-| `float` | `sigma`         | hyperparameter for gaussian method                             |
-| `float` | `min_score`     | score filter threshold                                         |
-| `int`   | `method`        | method to do the nms, (0: `naive`, 1: `linear`, 2: `gaussian`) |
-| `int`   | `offset`        | `boxes` width or height is (x2 - x1 + offset). (0 or 1)        |
-
-#### Inputs
-
-<dl>
-<dt><tt>boxes</tt>: T</dt>
-<dd>Input boxes. 2-D tensor of shape (N, 4). N is the number of boxes.</dd>
-<dt><tt>scores</tt>: T</dt>
-<dd>Input scores. 1-D tensor of shape (N, ).</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>dets</tt>: T</dt>
-<dd>Output boxes and scores. 2-D tensor of shape (num_valid_boxes, 5), [[x1, y1, x2, y2, score], ...]. num_valid_boxes is the number of valid boxes.</dd>
-<dt><tt>indices</tt>: tensor(int64)</dt>
-<dd>Output indices. 1-D tensor of shape (num_valid_boxes, ).</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32)
-
-### RoIAlign
-
-#### Description
-
-Perform RoIAlign on output feature, used in bbox_head of most two-stage detectors.
-
-#### Parameters
-
-| Type    | Parameter        | Description                                                                                                   |
-| ------- | ---------------- | ------------------------------------------------------------------------------------------------------------- |
-| `int`   | `output_height`  | height of output roi                                                                                          |
-| `int`   | `output_width`   | width of output roi                                                                                           |
-| `float` | `spatial_scale`  | used to scale the input boxes                                                                                 |
-| `int`   | `sampling_ratio` | number of input samples to take for each output sample. `0` means to take samples densely for current models. |
-| `str`   | `mode`           | pooling mode in each bin. `avg` or `max`                                                                      |
-| `int`   | `aligned`        | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly.         |
-
-#### Inputs
-
-<dl>
-<dt><tt>input</tt>: T</dt>
-<dd>Input feature map; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of channels, H and W are the height and width of the data.</dd>
-<dt><tt>rois</tt>: T</dt>
-<dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are the coordinate system of input.</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>feat</tt>: T</dt>
-<dd>RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element feat[r-1] is a pooled feature map corresponding to the r-th RoI RoIs[r-1].</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32)
-
-### NMS
-
-#### Description
-
-Filter out boxes that have a high IoU overlap with previously selected boxes.
-
-#### Parameters
-
-| Type    | Parameter       | Description                                                                                                        |
-| ------- | --------------- | ------------------------------------------------------------------------------------------------------------------ |
-| `float` | `iou_threshold` | The threshold for deciding whether boxes overlap too much with respect to IoU. Value range \[0, 1\]. Default to 0. |
-| `int`   | `offset`        | 0 or 1, boxes' width or height is (x2 - x1 + offset).                                                              |
-
-#### Inputs
-
-<dl>
-<dt><tt>bboxes</tt>: T</dt>
-<dd>Input boxes. 2-D tensor of shape (num_boxes, 4). num_boxes is the number of input boxes.</dd>
-<dt><tt>scores</tt>: T</dt>
-<dd>Input scores. 1-D tensor of shape (num_boxes, ).</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>indices</tt>: tensor(int32, Linear)</dt>
-<dd>Selected indices. 1-D tensor of shape (num_valid_boxes, ). num_valid_boxes is the number of valid boxes.</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32)
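The NMS description above can be illustrated with a minimal pure-Python sketch. It is not the removed C++ kernel: the function names `box_iou` and `greedy_nms` are hypothetical, and the sketch assumes the usual greedy procedure (visit boxes in descending score order, keep a box only if its IoU with every kept box does not exceed `iou_threshold`). The `offset` argument follows the parameter table, where a box's width or height is `x2 - x1 + offset`.

```python
def box_iou(a, b, offset=0):
    """IoU of two boxes given as (x1, y1, x2, y2); offset is 0 or 1."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0]) + offset
    inter_h = min(a[3], b[3]) - max(a[1], b[1]) + offset
    inter = max(inter_w, 0.0) * max(inter_h, 0.0)
    area_a = (a[2] - a[0] + offset) * (a[3] - a[1] + offset)
    area_b = (b[2] - b[0] + offset) * (b[3] - b[1] + offset)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(bboxes, scores, iou_threshold, offset=0):
    """Return indices of kept boxes, highest score first (hypothetical helper)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap too much with any kept box.
        if all(box_iou(bboxes[i], bboxes[j], offset) <= iou_threshold
               for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores, iou_threshold=0.5)
```

Here the second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed at a threshold of 0.5, while the third box does not overlap anything and is kept.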
-
-### grid_sampler
-
-#### Description
-
-Sample from `input` at the pixel locations given by `grid`.
-
-#### Parameters
-
-| Type  | Parameter            | Description                                                                                                                                                                                                                                                                                     |
-| ----- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `int` | `interpolation_mode` | Interpolation mode to calculate output values. (0: `bilinear` , 1: `nearest`)                                                                                                                                                                                                                   |
-| `int` | `padding_mode`       | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`)                                                                                                                                                                                                                |
-| `int` | `align_corners`      | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |
-
-#### Inputs
-
-<dl>
-<dt><tt>input</tt>: T</dt>
-<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.</dd>
-<dt><tt>grid</tt>: T</dt>
-<dd>Input offset; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of offset and output.</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>output</tt>: T</dt>
-<dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32, Linear)
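The effect of `align_corners` described in the parameter table can be made concrete with a small sketch. It assumes the coordinate mapping used by PyTorch's `grid_sample`, on which this operator's attributes appear to be modelled; the helper name `unnormalize` is hypothetical and the removed kernel itself is not shown here.

```python
def unnormalize(coord, size, align_corners):
    """Map a normalized grid coordinate in [-1, 1] to a pixel coordinate.

    size is the number of pixels along the axis (inW or inH).
    """
    if align_corners:
        # -1 and 1 refer to the centers of the first and last pixels.
        return (coord + 1) / 2 * (size - 1)
    # -1 and 1 refer to the outer edges of the first and last pixels.
    return ((coord + 1) * size - 1) / 2


left_aligned = unnormalize(-1.0, 4, align_corners=1)   # center of pixel 0
right_aligned = unnormalize(1.0, 4, align_corners=1)   # center of pixel 3
left_edge = unnormalize(-1.0, 4, align_corners=0)      # half a pixel left of pixel 0
right_edge = unnormalize(1.0, 4, align_corners=0)      # half a pixel right of pixel 3
```

With `align_corners=0` the extrema land half a pixel outside the outermost pixel centers, which is why `padding_mode` matters for grid values near the border.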
-
-### CornerPool
-
-#### Description
-
-Perform CornerPool on `input` features. Read [CornerNet -- Detecting Objects as Paired Keypoints](https://arxiv.org/abs/1808.01244) for more details.
-
-#### Parameters
-
-| Type  | Parameter | Description                                                      |
-| ----- | --------- | ---------------------------------------------------------------- |
-| `int` | `mode`    | corner pool mode, (0: `top`, 1: `bottom`, 2: `left`, 3: `right`) |
-
-#### Inputs
-
-<dl>
-<dt><tt>input</tt>: T</dt>
-<dd>Input features. 4-D tensor of shape (N, C, H, W). N is the batch size.</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>output</tt>: T</dt>
-<dd>Output the pooled features. 4-D tensor of shape (N, C, H, W).</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32)
-
-### cummax
-
-#### Description
-
-Returns a tuple (`values`, `indices`) where `values` is the cumulative maximum elements of `input` in the dimension `dim`. And `indices` is the index location of each maximum value found in the dimension `dim`. Read [torch.cummax](https://pytorch.org/docs/stable/generated/torch.cummax.html) for more details.
-
-#### Parameters
-
-| Type  | Parameter | Description                            |
-| ----- | --------- | -------------------------------------- |
-| `int` | `dim`     | the dimension to do the operation over |
-
-#### Inputs
-
-<dl>
-<dt><tt>input</tt>: T</dt>
-<dd>The input tensor with various shapes. Tensor with empty element is also supported.</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>output</tt>: T</dt>
-<dd>Output the cumulative maximum elements of `input` in the dimension `dim`, with the same shape and dtype as `input`.</dd>
-<dt><tt>indices</tt>: tensor(int64)</dt>
-<dd>Output the index location of each cumulative maximum value found in the dimension `dim`, with the same shape as `input`.</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32)
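As a rough illustration of the `values`/`indices` pair described above, here is a pure-Python sketch for a 1-D input. The function name `cummax_1d` is hypothetical, and the tie handling (keeping the most recent index when values are equal) is an assumption of this sketch rather than something the removed documentation states.

```python
def cummax_1d(values):
    """Return (running maxima, index of each running maximum) for a 1-D list."""
    out, indices = [], []
    best = None  # index of the current maximum
    for i, v in enumerate(values):
        # Assumption: on ties, prefer the most recent position.
        if best is None or v >= values[best]:
            best = i
        out.append(values[best])
        indices.append(best)
    return out, indices


running_max, running_idx = cummax_1d([1.0, 3.0, 2.0, 5.0, 4.0])
```

Both outputs have the same length as the input, and an empty input yields two empty lists, matching the note that tensors with no elements are supported. `cummin` follows by flipping the comparison.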
-
-### cummin
-
-#### Description
-
-Returns a tuple (`values`, `indices`) where `values` is the cumulative minimum elements of `input` in the dimension `dim`. And `indices` is the index location of each minimum value found in the dimension `dim`. Read [torch.cummin](https://pytorch.org/docs/stable/generated/torch.cummin.html) for more details.
-
-#### Parameters
-
-| Type  | Parameter | Description                            |
-| ----- | --------- | -------------------------------------- |
-| `int` | `dim`     | the dimension to do the operation over |
-
-#### Inputs
-
-<dl>
-<dt><tt>input</tt>: T</dt>
-<dd>The input tensor with various shapes. Tensor with empty element is also supported.</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>output</tt>: T</dt>
-<dd>Output the cumulative minimum elements of `input` in the dimension `dim`, with the same shape and dtype as `input`.</dd>
-<dt><tt>indices</tt>: tensor(int64)</dt>
-<dd>Output the index location of each cumulative minimum value found in the dimension `dim`, with the same shape as `input`.</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32)
-
-### MMCVModulatedDeformConv2d
-
-#### Description
-
-Perform Modulated Deformable Convolution on input feature, read [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for detail.
-
-#### Parameters
-
-| Type           | Parameter           | Description                                                                           |
-| -------------- | ------------------- | ------------------------------------------------------------------------------------- |
-| `list of ints` | `stride`            | The stride of the convolving kernel. (sH, sW)                                         |
-| `list of ints` | `padding`           | Paddings on both sides of the input. (padH, padW)                                     |
-| `list of ints` | `dilation`          | The spacing between kernel elements. (dH, dW)                                         |
-| `int`          | `deformable_groups` | Groups of deformable offset.                                                          |
-| `int`          | `groups`            | Split input into groups. `input_channel` should be divisible by the number of groups. |
-
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.</dd>
-<dt><tt>inputs[1]</tt>: T</dt>
-<dd>Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW are the height and width of weight, outH and outW are the height and width of offset and output.</dd>
-<dt><tt>inputs[2]</tt>: T</dt>
-<dd>Input mask; 4-D tensor of shape (N, deformable_group* kH* kW, outH, outW), where kH and kW are the height and width of weight, outH and outW are the height and width of offset and output.</dd>
-<dt><tt>inputs[3]</tt>: T</dt>
-<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
-<dt><tt>inputs[4]</tt>: T, optional</dt>
-<dd>Input bias; 1-D tensor of shape (output_channel).</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32, Linear)
-
-### MMCVDeformConv2d
-
-#### Description
-
-Perform Deformable Convolution on input feature, read [Deformable Convolutional Network](https://arxiv.org/abs/1703.06211) for detail.
-
-#### Parameters
-
-| Type           | Parameter          | Description                                                                                                                       |
-| -------------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------- |
-| `list of ints` | `stride`           | The stride of the convolving kernel. (sH, sW)                                                                                     |
-| `list of ints` | `padding`          | Paddings on both sides of the input. (padH, padW)                                                                                 |
-| `list of ints` | `dilation`         | The spacing between kernel elements. (dH, dW)                                                                                     |
-| `int`          | `deformable_group` | Groups of deformable offset.                                                                                                      |
-| `int`          | `group`            | Split input into groups. `input_channel` should be divisible by the number of groups.                                             |
-| `int`          | `im2col_step`      | DeformableConv2d use im2col to compute convolution. im2col_step is used to split input and offset, reduce memory usage of column. |
-
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.</dd>
-<dt><tt>inputs[1]</tt>: T</dt>
-<dd>Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW are the height and width of weight, outH and outW are the height and width of offset and output.</dd>
-<dt><tt>inputs[2]</tt>: T</dt>
-<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32, Linear)
--- a/docs/en/deployment/onnxruntime_op.md
+++ b/docs/en/deployment/onnxruntime_op.md
-## ONNX Runtime Deployment
-
-### <span style="color:red">DeprecationWarning</span>
-
-ONNX support will be deprecated in the future.
-Welcome to use the unified model deployment toolbox MMDeploy: https://github.com/open-mmlab/mmdeploy
-
-### Introduction of ONNX Runtime
-
-**ONNX Runtime** is a cross-platform inferencing and training accelerator compatible with many popular ML/DNN frameworks. Check its [github](https://github.com/microsoft/onnxruntime) for more information.
-
-### Introduction of ONNX
-
-**ONNX** stands for **Open Neural Network Exchange**, which acts as *Intermediate Representation(IR)* for ML/DNN models from many frameworks. Check its [github](https://github.com/onnx/onnx) for more information.
-
-### Why include custom operators for ONNX Runtime in MMCV
-
- To verify the correctness of exported ONNX models in ONNX Runtime.
- To ease the deployment of ONNX models with custom operators from `mmcv.ops` in ONNX Runtime.
-
-### List of operators for ONNX Runtime supported in MMCV
-
-| Operator                                               | CPU | GPU | MMCV Releases |
-| :----------------------------------------------------- | :-: | :-: | :-----------: |
-| [SoftNMS](onnxruntime_custom_ops.md#softnms)           |  Y  |  N  |     1.2.3     |
-| [RoIAlign](onnxruntime_custom_ops.md#roialign)         |  Y  |  N  |     1.2.5     |
-| [NMS](onnxruntime_custom_ops.md#nms)                   |  Y  |  N  |     1.2.7     |
-| [grid_sampler](onnxruntime_custom_ops.md#grid_sampler) |  Y  |  N  |     1.3.1     |
-| [CornerPool](onnxruntime_custom_ops.md#cornerpool)     |  Y  |  N  |     1.3.4     |
-| [cummax](onnxruntime_custom_ops.md#cummax)             |  Y  |  N  |     1.3.4     |
-| [cummin](onnxruntime_custom_ops.md#cummin)             |  Y  |  N  |     1.3.4     |
-
-### How to build custom operators for ONNX Runtime
-
-*Please note that only the CPU version of **onnxruntime>=1.8.1** on Linux has been tested so far.*
-
-#### Prerequisite
-
- Clone repository
-
-```bash
-git clone https://github.com/open-mmlab/mmcv.git
-```
-
- Download `onnxruntime-linux` from ONNX Runtime [releases](https://github.com/microsoft/onnxruntime/releases/tag/v1.8.1), extract it, expose `ONNXRUNTIME_DIR` and finally add the lib path to `LD_LIBRARY_PATH` as below:
-
-```bash
-wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-1.8.1.tgz
-
-tar -zxvf onnxruntime-linux-x64-1.8.1.tgz
-cd onnxruntime-linux-x64-1.8.1
-export ONNXRUNTIME_DIR=$(pwd)
-export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH
-```
-
-#### Build on Linux
-
-```bash
-cd mmcv  # to MMCV root directory
-MMCV_WITH_OPS=1 MMCV_WITH_ORT=1 python setup.py develop
-```
-
-### How to do inference using exported ONNX models with custom operators in ONNX Runtime in python
-
-Install ONNX Runtime with `pip`
-
-```bash
-pip install onnxruntime==1.8.1
-```
-
-Inference Demo
-
-```python
-import os
-
-import numpy as np
-import onnxruntime as ort
-
-from mmcv.ops import get_onnxruntime_op_path
-
-ort_custom_op_path = get_onnxruntime_op_path()
-assert os.path.exists(ort_custom_op_path)
-session_options = ort.SessionOptions()
-session_options.register_custom_ops_library(ort_custom_op_path)
-# exported ONNX model with custom operators
-onnx_file = 'sample.onnx'
-input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
-sess = ort.InferenceSession(onnx_file, session_options)
-onnx_results = sess.run(None, {'input' : input_data})
-```
-
-### How to add a new custom operator for ONNX Runtime in MMCV
-
-#### Reminder
-
- *Please note that this feature is experimental and may change in the future. We strongly suggest always trying the latest master branch.*
-
- The custom operator is not included in [supported operator list](https://github.com/microsoft/onnxruntime/blob/master/docs/OperatorKernels.md) in ONNX Runtime.
-
- The custom operator should be able to be exported to ONNX.
-
-#### Main procedures
-
-Take custom operator `soft_nms` for example.
-
-1. Add header `soft_nms.h` to ONNX Runtime include directory `mmcv/ops/csrc/onnxruntime/`
-
-2. Add source `soft_nms.cpp` to ONNX Runtime source directory `mmcv/ops/csrc/onnxruntime/cpu/`
-
-3. Register `soft_nms` operator in [onnxruntime_register.cpp](../../../mmcv/ops/csrc/onnxruntime/cpu/onnxruntime_register.cpp)
-
-   ```c++
-   #include "soft_nms.h"
-
-   SoftNmsOp c_SoftNmsOp;
-
-   if (auto status = ortApi->CustomOpDomain_Add(domain, &c_SoftNmsOp)) {
-     return status;
-   }
-   ```
-
-4. Add unit test into `tests/test_ops/test_onnx.py`
-   Check [here](../../tests/test_ops/test_onnx.py) for examples.
-
-**Finally, welcome to send us PR of adding custom operators for ONNX Runtime in MMCV.** :nerd_face:
-
-### Known Issues
-
- "RuntimeError: tuple appears in op that does not forward tuples, unsupported kind: `prim::PythonOp`."
-  1. `cummax` and `cummin` can generally be exported to ONNX as long as the torch version is >= 1.5.0, since `torch.cummax` is only supported with torch >= 1.5.0. However, when `cummax` or `cummin` serves as an intermediate component whose outputs are used as inputs to other modules, the torch version must be >= 1.7.0. Otherwise the above error might arise when running the exported ONNX model with onnxruntime.
-  2. Solution: update the torch version to 1.7.0 or higher.
-
-### References
-
- [How to export Pytorch model with custom op to ONNX and run it in ONNX Runtime](https://github.com/onnx/tutorials/blob/master/PyTorchCustomOperator/README.md)
- [How to add a custom operator/kernel in ONNX Runtime](https://onnxruntime.ai/docs/reference/operators/add-custom-op.html)
--- a/docs/en/deployment/tensorrt_custom_ops.md
+++ b/docs/en/deployment/tensorrt_custom_ops.md
-## TensorRT Custom Ops
-
-<!-- TOC -->
-
- [TensorRT Custom Ops](#tensorrt-custom-ops)
-  - [MMCVRoIAlign](#mmcvroialign)
-    - [Description](#description)
-    - [Parameters](#parameters)
-    - [Inputs](#inputs)
-    - [Outputs](#outputs)
-    - [Type Constraints](#type-constraints)
-  - [ScatterND](#scatternd)
-    - [Description](#description-1)
-    - [Parameters](#parameters-1)
-    - [Inputs](#inputs-1)
-    - [Outputs](#outputs-1)
-    - [Type Constraints](#type-constraints-1)
-  - [NonMaxSuppression](#nonmaxsuppression)
-    - [Description](#description-2)
-    - [Parameters](#parameters-2)
-    - [Inputs](#inputs-2)
-    - [Outputs](#outputs-2)
-    - [Type Constraints](#type-constraints-2)
-  - [MMCVDeformConv2d](#mmcvdeformconv2d)
-    - [Description](#description-3)
-    - [Parameters](#parameters-3)
-    - [Inputs](#inputs-3)
-    - [Outputs](#outputs-3)
-    - [Type Constraints](#type-constraints-3)
-  - [grid_sampler](#grid_sampler)
-    - [Description](#description-4)
-    - [Parameters](#parameters-4)
-    - [Inputs](#inputs-4)
-    - [Outputs](#outputs-4)
-    - [Type Constraints](#type-constraints-4)
-  - [cummax](#cummax)
-    - [Description](#description-5)
-    - [Parameters](#parameters-5)
-    - [Inputs](#inputs-5)
-    - [Outputs](#outputs-5)
-    - [Type Constraints](#type-constraints-5)
-  - [cummin](#cummin)
-    - [Description](#description-6)
-    - [Parameters](#parameters-6)
-    - [Inputs](#inputs-6)
-    - [Outputs](#outputs-6)
-    - [Type Constraints](#type-constraints-6)
-  - [MMCVInstanceNormalization](#mmcvinstancenormalization)
-    - [Description](#description-7)
-    - [Parameters](#parameters-7)
-    - [Inputs](#inputs-7)
-    - [Outputs](#outputs-7)
-    - [Type Constraints](#type-constraints-7)
-  - [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d)
-    - [Description](#description-8)
-    - [Parameters](#parameters-8)
-    - [Inputs](#inputs-8)
-    - [Outputs](#outputs-8)
-    - [Type Constraints](#type-constraints-8)
-
-<!-- TOC -->
-
-### MMCVRoIAlign
-
-#### Description
-
-Perform RoIAlign on output feature, used in bbox_head of most two stage
-detectors.
-
-#### Parameters
-
-| Type    | Parameter        | Description                                                                                                   |
-| ------- | ---------------- | ------------------------------------------------------------------------------------------------------------- |
-| `int`   | `output_height`  | height of output roi                                                                                          |
-| `int`   | `output_width`   | width of output roi                                                                                           |
-| `float` | `spatial_scale`  | used to scale the input boxes                                                                                 |
-| `int`   | `sampling_ratio` | number of input samples to take for each output sample. `0` means to take samples densely for current models. |
-| `str`   | `mode`           | pooling mode in each bin. `avg` or `max`                                                                      |
-| `int`   | `aligned`        | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly.         |
-
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Input feature map; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of channels, H and W are the height and width of the data.</dd>
-<dt><tt>inputs[1]</tt>: T</dt>
-<dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are the coordinate system of inputs[0].</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element output[0][r-1] is a pooled feature map corresponding to the r-th RoI inputs[1][r-1].</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32, Linear)
-
-### ScatterND
-
-#### Description
-
-ScatterND takes three inputs `data` tensor of rank r >= 1, `indices` tensor of rank q >= 1, and `updates` tensor of rank q + r - indices.shape\[-1\] - 1. The output of the operation is produced by creating a copy of the input `data`, and then updating its value to values specified by updates at specific index positions specified by `indices`. Its output shape is the same as the shape of `data`. Note that `indices` should not have duplicate entries. That is, two or more updates for the same index-location is not supported.
-
-The `output` is calculated via the following equation:
-
-```python
-  output = np.copy(data)
-  update_indices = indices.shape[:-1]
-  for idx in np.ndindex(update_indices):
-      output[indices[idx]] = updates[idx]
-```
-
-#### Parameters
-
-None
-
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Tensor of rank r>=1.</dd>
-
-<dt><tt>inputs[1]</tt>: tensor(int32, Linear)</dt>
-<dd>Tensor of rank q>=1.</dd>
-
-<dt><tt>inputs[2]</tt>: T</dt>
-<dd>Tensor of rank q + r - indices_shape[-1] - 1.</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>Tensor of rank r >= 1.</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32, Linear), tensor(int32, Linear)
-
-### NonMaxSuppression
-
-#### Description
-
-Filter out boxes that have a high IoU overlap with previously selected boxes or a low score. Output the indices of valid boxes. Indices of invalid boxes will be filled with -1.
-
-#### Parameters
-
-| Type    | Parameter                    | Description                                                                                                                          |
-| ------- | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
-| `int`   | `center_point_box`           | 0 - the box data is supplied as \[y1, x1, y2, x2\], 1-the box data is supplied as \[x_center, y_center, width, height\].             |
-| `int`   | `max_output_boxes_per_class` | The maximum number of boxes to be selected per batch per class. Default to 0, number of output boxes equal to number of input boxes. |
-| `float` | `iou_threshold`              | The threshold for deciding whether boxes overlap too much with respect to IoU. Value range \[0, 1\]. Default to 0.                   |
-| `float` | `score_threshold`            | The threshold for deciding when to remove boxes based on score.                                                                      |
-| `int`   | `offset`                     | 0 or 1, boxes' width or height is (x2 - x1 + offset).                                                                                |
-
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Input boxes. 3-D tensor of shape (num_batches, spatial_dimension, 4).</dd>
-<dt><tt>inputs[1]</tt>: T</dt>
-<dd>Input scores. 3-D tensor of shape (num_batches, num_classes, spatial_dimension).</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: tensor(int32, Linear)</dt>
-<dd>Selected indices. 2-D tensor of shape (num_selected_indices, 3) as [[batch_index, class_index, box_index], ...].</dd>
-<dd>num_selected_indices = num_batches * num_classes * min(max_output_boxes_per_class, spatial_dimension).</dd>
-<dd>All invalid indices will be filled with -1.</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32, Linear)
-
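Reviewer note on the removed NonMaxSuppression text: the `offset` parameter changes how box width and height, and therefore IoU, are computed. The sketch below is a minimal pure-Python illustration, assuming corner-format boxes `(x1, y1, x2, y2)`; the function name `iou` is hypothetical and this is not the plugin's implementation.

```python
def iou(box_a, box_b, offset=0):
    """IoU of two corner-format boxes; side length is (x2 - x1 + offset)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection sides are clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1) + offset)
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1) + offset)
    inter = inter_w * inter_h
    area_a = (ax2 - ax1 + offset) * (ay2 - ay1 + offset)
    area_b = (bx2 - bx1 + offset) * (by2 - by1 + offset)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 5, 15, 15), offset=0))
print(iou((0, 0, 10, 10), (5, 5, 15, 15), offset=1))
```

With `offset=1` coordinates are treated as inclusive pixel indices, so the same pair of boxes yields a slightly larger IoU than with `offset=0`.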
-### MMCVDeformConv2d
-
-#### Description
-
-Perform Deformable Convolution on the input feature. See [Deformable Convolutional Network](https://arxiv.org/abs/1703.06211) for details.
-
-#### Parameters
-
-| Type           | Parameter          | Description                                                                                                                       |
-| -------------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------- |
-| `list of ints` | `stride`           | The stride of the convolving kernel. (sH, sW)                                                                                     |
-| `list of ints` | `padding`          | Paddings on both sides of the input. (padH, padW)                                                                                 |
-| `list of ints` | `dilation`         | The spacing between kernel elements. (dH, dW)                                                                                     |
-| `int`          | `deformable_group` | Groups of deformable offset.                                                                                                      |
-| `int`          | `group`            | Split input into groups. `input_channel` should be divisible by the number of groups.                                             |
-| `int`          | `im2col_step`      | DeformableConv2d uses im2col to compute the convolution. `im2col_step` splits the input and offset into chunks to reduce the memory used by the column buffer. |
-
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, and inH and inW are the height and width of the data.</dd>
-<dt><tt>inputs[1]</tt>: T</dt>
-<dd>Input offset; 4-D tensor of shape (N, deformable_group * 2 * kH * kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and output.</dd>
-<dt><tt>inputs[2]</tt>: T</dt>
-<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32, Linear)
-
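Reviewer note on the removed MMCVDeformConv2d text: the offset and output shapes listed above follow from the usual convolution size arithmetic. The sketch below assumes the standard formula used for ordinary 2-D convolutions (the removed document does not state it), and the helper names are hypothetical.

```python
def conv_output_size(in_size, kernel, stride, padding, dilation):
    """Standard convolution output size (assumed, not from the removed doc)."""
    return (in_size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def deform_conv2d_shapes(input_shape, weight_shape, stride, padding,
                         dilation, deformable_group):
    """Return the expected (offset_shape, output_shape) tuples."""
    n, _, in_h, in_w = input_shape
    out_channels, _, k_h, k_w = weight_shape
    out_h = conv_output_size(in_h, k_h, stride[0], padding[0], dilation[0])
    out_w = conv_output_size(in_w, k_w, stride[1], padding[1], dilation[1])
    # Two offsets (y and x) per kernel element per deformable group.
    offset_shape = (n, deformable_group * 2 * k_h * k_w, out_h, out_w)
    output_shape = (n, out_channels, out_h, out_w)
    return offset_shape, output_shape


shapes = deform_conv2d_shapes((1, 16, 32, 32), (8, 16, 3, 3),
                              stride=(1, 1), padding=(1, 1),
                              dilation=(1, 1), deformable_group=1)
print(shapes)
```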
-### grid_sampler
-
-#### Description
-
-Sample from `input` at the pixel locations given by `grid`.
-
-#### Parameters
-
-| Type  | Parameter            | Description                                                                                                                                                                                                                                                                                     |
-| ----- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `int` | `interpolation_mode` | Interpolation mode to calculate output values. (0: `bilinear` , 1: `nearest`)                                                                                                                                                                                                                   |
-| `int` | `padding_mode`       | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`)                                                                                                                                                                                                                |
-| `int` | `align_corners`      | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |
-
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, and inH and inW are the height and width of the data.</dd>
-<dt><tt>inputs[1]</tt>: T</dt>
-<dd>Input grid; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of the grid and output.</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32, Linear)
-
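Reviewer note on the removed grid_sampler text: the effect of `align_corners` is easiest to see in how a normalized coordinate in [-1, 1] maps to a pixel coordinate. The sketch below uses the mapping commonly documented for PyTorch's `grid_sample`; that mapping is an assumption here, since the removed document only describes it in words, and `unnormalize` is a hypothetical helper name.

```python
def unnormalize(coord, size, align_corners):
    """Map a normalized coordinate in [-1, 1] to a pixel coordinate."""
    if align_corners:
        # -1 and 1 land on the centers of the first and last pixels.
        return (coord + 1) / 2 * (size - 1)
    # -1 and 1 land on the outer edges of the first and last pixels.
    return ((coord + 1) * size - 1) / 2


for flag in (1, 0):
    print(flag, [unnormalize(c, 4, flag) for c in (-1.0, 0.0, 1.0)])
```

For a width of 4, the extrema map to pixel centers 0 and 3 when `align_corners=1`, and to the edges -0.5 and 3.5 when `align_corners=0`.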
-### cummax
-
-#### Description
-
-Returns a namedtuple (`values`, `indices`) where `values` is the cumulative maximum of the elements of `input` along dimension `dim`, and `indices` is the index location of each maximum value found along `dim`.
-
-#### Parameters
-
-| Type  | Parameter | Description                             |
-| ----- | --------- | --------------------------------------- |
-| `int` | `dim`     | The dimension to do the operation over. |
-
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>The input tensor.</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>Output values.</dd>
-<dt><tt>outputs[1]</tt>: tensor(int32, Linear)</dt>
-<dd>Output indices.</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32, Linear)
-
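Reviewer note on the removed cummax text: the (`values`, `indices`) pair can be sketched for a 1-D sequence in pure Python. The function name is hypothetical, and the tie rule (keeping the latest index when values are equal) is an assumption, since the removed document does not specify it. `cummin` follows by flipping the comparison.

```python
def cummax_1d(values):
    """Cumulative maximum of a 1-D sequence with the index of each maximum."""
    out_values, out_indices = [], []
    best_value, best_index = None, -1
    for i, value in enumerate(values):
        # Assumed tie rule: an equal value moves the index forward.
        if best_index < 0 or value >= best_value:
            best_value, best_index = value, i
        out_values.append(best_value)
        out_indices.append(best_index)
    return out_values, out_indices


running_values, running_indices = cummax_1d([1, 3, 2, 5, 4])
print(running_values, running_indices)
```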
-### cummin
-
-#### Description
-
-Returns a namedtuple (`values`, `indices`) where `values` is the cumulative minimum of the elements of `input` along dimension `dim`, and `indices` is the index location of each minimum value found along `dim`.
-
-#### Parameters
-
-| Type  | Parameter | Description                             |
-| ----- | --------- | --------------------------------------- |
-| `int` | `dim`     | The dimension to do the operation over. |
-
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>The input tensor.</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>Output values.</dd>
-<dt><tt>outputs[1]</tt>: tensor(int32, Linear)</dt>
-<dd>Output indices.</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32, Linear)
-
-### MMCVInstanceNormalization
-
-#### Description
-
-Carries out instance normalization as described in the paper https://arxiv.org/abs/1607.08022.
-
-y = scale * (x - mean) / sqrt(variance + epsilon) + B, where mean and variance are computed per instance per channel.
-
-#### Parameters
-
-| Type    | Parameter | Description                                                          |
-| ------- | --------- | -------------------------------------------------------------------- |
-| `float` | `epsilon` | The epsilon value to use to avoid division by zero. Default is 1e-05 |
-
-#### Inputs
-
-<dl>
-<dt><tt>input</tt>: T</dt>
-<dd>Input data tensor from the previous operator; dimensions for image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For non image case, the dimensions are in the form of (N x C x D1 x D2 ... Dn), where N is the batch size.</dd>
-<dt><tt>scale</tt>: T</dt>
-<dd>The input 1-dimensional scale tensor of size C.</dd>
-<dt><tt>B</tt>: T</dt>
-<dd>The input 1-dimensional bias tensor of size C.</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>output</tt>: T</dt>
-<dd>The output tensor of the same shape as input.</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32, Linear)
-
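Reviewer note on the removed MMCVInstanceNormalization text: the formula `y = scale * (x - mean) / sqrt(variance + epsilon) + B` can be checked on a single channel of a single instance. The sketch below flattens the spatial dimensions into one list and uses the population (biased) variance, which is an assumption; the function name is hypothetical.

```python
import math


def instance_norm_channel(x, scale, bias, epsilon=1e-5):
    """Normalize one channel of one instance, given as a flat list."""
    mean = sum(x) / len(x)
    # Population variance over the spatial elements (assumed).
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(variance + epsilon)
    return [scale * (v - mean) * inv_std + bias for v in x]


normalized = instance_norm_channel([1.0, 2.0, 3.0, 4.0], scale=2.0, bias=1.0,
                                   epsilon=0.0)
print(normalized)
```

After normalization the channel mean equals `B` and the spread is controlled by `scale`, independently of the other channels and instances.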
-### MMCVModulatedDeformConv2d
-
-#### Description
-
-Perform Modulated Deformable Convolution on the input feature. See [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for details.
-
-#### Parameters
-
-| Type           | Parameter          | Description                                                                           |
-| -------------- | ------------------ | ------------------------------------------------------------------------------------- |
-| `list of ints` | `stride`           | The stride of the convolving kernel. (sH, sW)                                         |
-| `list of ints` | `padding`          | Paddings on both sides of the input. (padH, padW)                                     |
-| `list of ints` | `dilation`         | The spacing between kernel elements. (dH, dW)                                         |
-| `int`          | `deformable_group` | Groups of deformable offset.                                                          |
-| `int`          | `group`            | Split input into groups. `input_channel` should be divisible by the number of groups. |
-
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.</dd>
-<dt><tt>inputs[1]</tt>: T</dt>
-<dd>Input offset; 4-D tensor of shape (N, deformable_group * 2 * kH * kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and output.</dd>
-<dt><tt>inputs[2]</tt>: T</dt>
-<dd>Input mask; 4-D tensor of shape (N, deformable_group * kH * kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and output.</dd>
-<dt><tt>inputs[3]</tt>: T</dt>
-<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
-<dt><tt>inputs[4]</tt>: T, optional</dt>
-<dd>Input bias; 1-D tensor of shape (output_channel).</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32, Linear)
--- a/docs/en/deployment/tensorrt_plugin.md
+++ b/docs/en/deployment/tensorrt_plugin.md
-## TensorRT Deployment
-
-### <span style="color:red">DeprecationWarning</span>
-
-TensorRT support will be deprecated in the future.
-Welcome to use the unified model deployment toolbox MMDeploy: https://github.com/open-mmlab/mmdeploy
-
-<!-- TOC -->
-
- [TensorRT Deployment](#tensorrt-deployment)
-  - [<span style="color:red">DeprecationWarning</span>](#deprecationwarning)
-  - [Introduction](#introduction)
-  - [List of TensorRT plugins supported in MMCV](#list-of-tensorrt-plugins-supported-in-mmcv)
-  - [How to build TensorRT plugins in MMCV](#how-to-build-tensorrt-plugins-in-mmcv)
-    - [Prerequisite](#prerequisite)
-    - [Build on Linux](#build-on-linux)
-  - [Create TensorRT engine and run inference in python](#create-tensorrt-engine-and-run-inference-in-python)
-  - [How to add a TensorRT plugin for custom op in MMCV](#how-to-add-a-tensorrt-plugin-for-custom-op-in-mmcv)
-    - [Main procedures](#main-procedures)
-    - [Reminders](#reminders)
-  - [Known Issues](#known-issues)
-  - [References](#references)
-
-<!-- TOC -->
-
-### Introduction
-
-**NVIDIA TensorRT** is a software development kit (SDK) for high-performance inference of deep learning models. It includes a deep learning inference optimizer and a runtime that deliver low latency and high throughput for deep learning inference applications. Please check its [developer's website](https://developer.nvidia.com/tensorrt) for more information.
-To ease the deployment of trained models with custom operators from `mmcv.ops` using TensorRT, MMCV includes a series of TensorRT plugins.
-
-### List of TensorRT plugins supported in MMCV
-
-| ONNX Operator             | TensorRT Plugin                                                                 | MMCV Releases |
-| :------------------------ | :------------------------------------------------------------------------------ | :-----------: |
-| MMCVRoiAlign              | [MMCVRoiAlign](./tensorrt_custom_ops.md#mmcvroialign)                           |     1.2.6     |
-| ScatterND                 | [ScatterND](./tensorrt_custom_ops.md#scatternd)                                 |     1.2.6     |
-| NonMaxSuppression         | [NonMaxSuppression](./tensorrt_custom_ops.md#nonmaxsuppression)                 |     1.3.0     |
-| MMCVDeformConv2d          | [MMCVDeformConv2d](./tensorrt_custom_ops.md#mmcvdeformconv2d)                   |     1.3.0     |
-| grid_sampler              | [grid_sampler](./tensorrt_custom_ops.md#grid-sampler)                           |     1.3.1     |
-| cummax                    | [cummax](./tensorrt_custom_ops.md#cummax)                                       |     1.3.5     |
-| cummin                    | [cummin](./tensorrt_custom_ops.md#cummin)                                       |     1.3.5     |
-| MMCVInstanceNormalization | [MMCVInstanceNormalization](./tensorrt_custom_ops.md#mmcvinstancenormalization) |     1.3.5     |
-| MMCVModulatedDeformConv2d | [MMCVModulatedDeformConv2d](./tensorrt_custom_ops.md#mmcvmodulateddeformconv2d) |     1.3.8     |
-
-Notes
-
- All plugins listed above are developed on TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0
-
-### How to build TensorRT plugins in MMCV
-
-#### Prerequisite
-
- Clone repository
-
-```bash
-git clone https://github.com/open-mmlab/mmcv.git
-```
-
- Install TensorRT
-
-Download the corresponding TensorRT build from [NVIDIA Developer Zone](https://developer.nvidia.com/nvidia-tensorrt-download).
-
-For example, for Ubuntu 16.04 on x86-64 with cuda-10.2, the downloaded file is `TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0.tar.gz`.
-
-Then, install as below:
-
-```bash
-cd ~/Downloads
-tar -xvzf TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0.tar.gz
-export TENSORRT_DIR=`pwd`/TensorRT-7.2.1.6
-export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TENSORRT_DIR/lib
-```
-
-Install python packages: tensorrt, graphsurgeon, onnx-graphsurgeon
-
-```bash
-pip install $TENSORRT_DIR/python/tensorrt-7.2.1.6-cp37-none-linux_x86_64.whl
-pip install $TENSORRT_DIR/onnx_graphsurgeon/onnx_graphsurgeon-0.2.6-py2.py3-none-any.whl
-pip install $TENSORRT_DIR/graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
-```
-
-For more details on installing TensorRT from the tar file, please refer to [NVIDIA's website](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-721/install-guide/index.html#installing-tar).
-
- Install cuDNN
-
-Install cuDNN 8 following [NVIDIA's website](https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installlinux-tar).
-
-#### Build on Linux
-
-```bash
-cd mmcv ## to MMCV root directory
-MMCV_WITH_OPS=1 MMCV_WITH_TRT=1 pip install -e .
-```
-
-### Create TensorRT engine and run inference in python
-
-Here is an example.
-
-```python
-import torch
-import onnx
-
-from mmcv.tensorrt import (TRTWrapper, onnx2trt, save_trt_engine,
-                           is_tensorrt_plugin_loaded)
-
-assert is_tensorrt_plugin_loaded(), 'Requires compiling the TensorRT plugins in mmcv'
-
-onnx_file = 'sample.onnx'
-trt_file = 'sample.trt'
-onnx_model = onnx.load(onnx_file)
-
-## Model input
-inputs = torch.rand(1, 3, 224, 224).cuda()
-## Model input shape info
-opt_shape_dict = {
-    'input': [list(inputs.shape),
-              list(inputs.shape),
-              list(inputs.shape)]
-}
-
-## Create TensorRT engine
-max_workspace_size = 1 << 30
-trt_engine = onnx2trt(
-    onnx_model,
-    opt_shape_dict,
-    max_workspace_size=max_workspace_size)
-
-## Save TensorRT engine
-save_trt_engine(trt_engine, trt_file)
-
-## Run inference with TensorRT
-trt_model = TRTWrapper(trt_file, ['input'], ['output'])
-
-with torch.no_grad():
-    trt_outputs = trt_model({'input': inputs})
-    output = trt_outputs['output']
-
-```
-
-### How to add a TensorRT plugin for custom op in MMCV
-
-#### Main procedures
-
-Below are the main steps:
-
-1. Add C++ header file
-2. Add C++ source file
-3. Add CUDA kernel file
-4. Register plugin in `trt_plugin.cpp`
-5. Add unit test in `tests/test_ops/test_tensorrt.py`
-
-**Take RoIAlign plugin `roi_align` for example.**
-
-1. Add header `trt_roi_align.hpp` to TensorRT include directory `mmcv/ops/csrc/tensorrt/`
-
-2. Add source `trt_roi_align.cpp` to TensorRT source directory `mmcv/ops/csrc/tensorrt/plugins/`
-
-3. Add cuda kernel `trt_roi_align_kernel.cu` to TensorRT source directory `mmcv/ops/csrc/tensorrt/plugins/`
-
-4. Register `roi_align` plugin in [trt_plugin.cpp](https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/csrc/tensorrt/plugins/trt_plugin.cpp)
-
-   ```c++
-   #include "trt_plugin.hpp"
-
-   #include "trt_roi_align.hpp"
-
-   REGISTER_TENSORRT_PLUGIN(RoIAlignPluginDynamicCreator);
-
-   extern "C" {
-   bool initLibMMCVInferPlugins() { return true; }
-   }  // extern "C"
-   ```
-
-5. Add unit test into `tests/test_ops/test_tensorrt.py`
-   Check [here](https://github.com/open-mmlab/mmcv/blob/master/tests/test_ops/test_tensorrt.py) for examples.
-
-#### Reminders
-
- *Please note that this feature is experimental and may change in the future. We strongly suggest always trying the latest master branch.*
-
- Some of the [custom ops](https://mmcv.readthedocs.io/en/latest/ops.html) in `mmcv` have CUDA implementations that can be used as references.
-
-### Known Issues
-
- None
-
-### References
-
- [Developer guide of Nvidia TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html)
- [TensorRT Open Source Software](https://github.com/NVIDIA/TensorRT)
- [onnx-tensorrt](https://github.com/onnx/onnx-tensorrt)
- [TensorRT python API](https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/index.html)
- [TensorRT c++ plugin API](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_plugin.html)
--- a/docs/en/get_started/installation.md
+++ b/docs/en/get_started/installation.md
@@ -177,8 +177,4 @@ b. Install the lite version.
 pip install mmcv
 ```

-c. Install full version with custom operators for onnxruntime
-
- Check [here](https://mmcv.readthedocs.io/en/latest/deployment/onnxruntime_custom_ops.html) for detailed instruction.
-
 If you would like to build MMCV from source, please refer to the [guide](https://mmcv.readthedocs.io/en/latest/get_started/build.html).
--- a/docs/en/index.rst
+++ b/docs/en/index.rst
@@ -25,11 +25,6 @@ You can switch between Chinese and English documents in the lower-left corner of
   :caption: Deployment

   deployment/mmcv_ops_definition.md
-   deployment/onnx.md
-   deployment/onnxruntime_custom_ops.md
-   deployment/onnxruntime_op.md
-   deployment/tensorrt_custom_ops.md
-   deployment/tensorrt_plugin.md

 .. toctree::
   :caption: Switch Language

--- a/docs/zh_cn/deployment/onnx.md
+++ b/docs/zh_cn/deployment/onnx.md
-## Introduction to the ONNX module in MMCV (experimental)
-
-### register_extra_symbolics
-
-Extra symbolic functions need to be registered before exporting a PyTorch model to ONNX.
-
-#### Example
-
-```python
-import mmcv
-from mmcv.onnx import register_extra_symbolics
-
-opset_version = 11
-register_extra_symbolics(opset_version)
-```
-
-#### FAQs
-
- None
--- a/docs/zh_cn/deployment/onnxruntime_custom_ops.md
+++ b/docs/zh_cn/deployment/onnxruntime_custom_ops.md
-## ONNX Runtime Custom Operators
-
-<!-- TOC -->
-
- [ONNX Runtime Custom Operators](#onnx-runtime自定义算子)
-  - [SoftNMS](#softnms)
-    - [Description](#描述)
-    - [Parameters](#模型参数)
-    - [Inputs](#输入)
-    - [Outputs](#输出)
-    - [Type Constraints](#类型约束)
-  - [RoIAlign](#roialign)
-    - [Description](#描述-1)
-    - [Parameters](#模型参数-1)
-    - [Inputs](#输入-1)
-    - [Outputs](#输出-1)
-    - [Type Constraints](#类型约束-1)
-  - [NMS](#nms)
-    - [Description](#描述-2)
-    - [Parameters](#模型参数-2)
-    - [Inputs](#输入-2)
-    - [Outputs](#输出-2)
-    - [Type Constraints](#类型约束-2)
-  - [grid_sampler](#grid_sampler)
-    - [Description](#描述-3)
-    - [Parameters](#模型参数-3)
-    - [Inputs](#输入-3)
-    - [Outputs](#输出-3)
-    - [Type Constraints](#类型约束-3)
-  - [CornerPool](#cornerpool)
-    - [Description](#描述-4)
-    - [Parameters](#模型参数-4)
-    - [Inputs](#输入-4)
-    - [Outputs](#输出-4)
-    - [Type Constraints](#类型约束-4)
-  - [cummax](#cummax)
-    - [Description](#描述-5)
-    - [Parameters](#模型参数-5)
-    - [Inputs](#输入-5)
-    - [Outputs](#输出-5)
-    - [Type Constraints](#类型约束-5)
-  - [cummin](#cummin)
-    - [Description](#描述-6)
-    - [Parameters](#模型参数-6)
-    - [Inputs](#输入-6)
-    - [Outputs](#输出-6)
-    - [Type Constraints](#类型约束-6)
-  - [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d)
-    - [Description](#描述-7)
-    - [Parameters](#模型参数-7)
-    - [Inputs](#输入-7)
-    - [Outputs](#输出-7)
-    - [Type Constraints](#类型约束-7)
-
-<!-- TOC -->
-
-### SoftNMS
-
-#### Description
-
-Compute soft NMS for `boxes` according to their `scores`. Read [Soft-NMS -- Improving Object Detection With One Line of Code](https://arxiv.org/abs/1704.04503) for details.
-
-#### Parameters
-
-| Type    | Parameter       | Description                                                                                       |
-| ------- | --------------- | ------------------------------------------------------------------------------------------------- |
-| `float` | `iou_threshold` | IoU threshold used to decide whether boxes overlap too much. Value range \[0, 1\]. Defaults to 0. |
-| `float` | `sigma`         | Hyperparameter of the gaussian method.                                                            |
-| `float` | `min_score`     | Score threshold for NMS.                                                                          |
-| `int`   | `method`        | NMS method (0: `naive`, 1: `linear`, 2: `gaussian`).                                              |
-| `int`   | `offset`        | Used to compute box width and height as (x2 - x1 + offset). Either 0 or 1.                        |
-
-#### Inputs
-
-<dl>
-<dt><tt>boxes</tt>: T</dt>
-<dd>Input boxes. 2-D tensor of shape (N, 4), where N is the number of boxes.</dd>
-<dt><tt>scores</tt>: T</dt>
-<dd>Input scores. 1-D tensor of shape (N, ).</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>dets</tt>: T</dt>
-<dd>Output boxes and scores. 2-D tensor of shape (num_valid_boxes, 5) laid out as [[x1, y1, x2, y2, score], ...], where num_valid_boxes is the number of valid boxes.</dd>
-<dt><tt>indices</tt>: tensor(int64)</dt>
-<dd>Output indices. 1-D tensor of shape (num_valid_boxes, ).</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32)
-
-### RoIAlign
-
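-#### Description
-
-Perform RoIAlign on the feature map. It is typically used in the bbox_head of two-stage object detection models.
-
-#### Parameters
-
-| Type    | Parameter        | Description                                                      |
-| ------- | ---------------- | ---------------------------------------------------------------- |
-| `int`   | `output_height`  | Height of the output RoI feature.                                |
-| `int`   | `output_width`   | Width of the output RoI feature.                                 |
-| `float` | `spatial_scale`  | Scale factor applied to the input boxes.                         |
-| `int`   | `sampling_ratio` | Sampling ratio of the output. `0` means dense sampling.          |
-| `str`   | `mode`           | Pooling mode: `avg` or `max`.                                    |
-| `int`   | `aligned`        | If `aligned=1`, pixels are shifted by -0.5 for better alignment. |
-
-#### Inputs
-
-<dl>
-<dt><tt>input</tt>: T</dt>
-<dd>Input feature map; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of input channels, and H and W are the height and width of the input feature map.</dd>
-<dt><tt>rois</tt>: T</dt>
-<dd>Regions of interest to pool; 2-D tensor of shape (num_rois, 5) laid out as [[batch_index, x1, y1, x2, y2], ...]. The RoI coordinates are in the coordinate system of the input feature map.</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>feat</tt>: T</dt>
-<dd>Pooled output; 4-D tensor of shape (num_rois, C, output_height, output_width). Each output feature feat[i] corresponds to the input region of interest rois[i].</dd>
-</dl>
-
-#### Type Constraints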
-
- T:tensor(float32)
-
-### NMS
-
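-#### Description
-
-Apply non-maximum suppression to the boxes according to the IoU threshold.
-
-#### Parameters
-
-| Type    | Parameter       | Description                                                                                       |
-| ------- | --------------- | ------------------------------------------------------------------------------------------------- |
-| `float` | `iou_threshold` | IoU threshold used to decide whether boxes overlap too much. Value range \[0, 1\]. Defaults to 0. |
-| `int`   | `offset`        | Used to compute box width and height as (x2 - x1 + offset). Either 0 or 1.                        |
-
-#### Inputs
-
-<dl>
-<dt><tt>boxes</tt>: T</dt>
-<dd>Input boxes. 2-D tensor of shape (N, 4), where N is the number of boxes.</dd>
-<dt><tt>scores</tt>: T</dt>
-<dd>Input scores. 1-D tensor of shape (N, ).</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>indices</tt>: tensor(int32, Linear)</dt>
-<dd>Indices of the selected boxes. 1-D tensor of shape (num_valid_boxes, ), where num_valid_boxes is the number of selected boxes.</dd>
-</dl>
-
-#### Type Constraints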
-
- T:tensor(float32)
-
-### grid_sampler
-
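-#### Description
-
-Sample from `input` at the pixel locations given by `grid`.
-
-#### Parameters
-
-| Type  | Parameter            | Description                                                                                                                                                                                                                                     |
-| ----- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `int` | `interpolation_mode` | Interpolation mode used to compute the output values (0: `bilinear`, 1: `nearest`).                                                                                                                                                             |
-| `int` | `padding_mode`       | Padding mode at the borders (0: `zeros`, 1: `border`, 2: `reflection`).                                                                                                                                                                         |
-| `int` | `align_corners`      | If `align_corners=1`, the extrema (`-1` and `1`) refer to the center points of the input's corner pixels. If `align_corners=0`, they refer to the corner points of the input's corner pixels, which makes the sampling less resolution dependent. |
-
-#### Inputs
-
-<dl>
-<dt><tt>input</tt>: T</dt>
-<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of input channels, and inH and inW are the height and width of the input feature map.</dd>
-<dt><tt>grid</tt>: T</dt>
-<dd>Input grid; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of the output.</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>output</tt>: T</dt>
-<dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
-</dl>
-
-#### Type Constraints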
-
- T:tensor(float32, Linear)
-
-### CornerPool
-
-#### Description
-
-Perform CornerPool on `input`. Read [CornerNet -- Detecting Objects as Paired Keypoints](https://arxiv.org/abs/1808.01244) for more details.
-
-#### Parameters
-
-| Type  | Parameter | Description                                                 |
-| ----- | --------- | ----------------------------------------------------------- |
-| `int` | `mode`    | Pooling mode (0: `top`, 1: `bottom`, 2: `left`, 3: `right`). |
-
-#### Inputs
-
-<dl>
-<dt><tt>input</tt>: T</dt>
-<dd>Input feature; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of input channels, and H and W are the height and width of the input feature map.</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>output</tt>: T</dt>
-<dd>Output feature; 4-D tensor of shape (N, C, H, W).</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32)
-
-### cummax
-
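-#### Description
-
-Returns a tuple (`values`, `indices`), where `values` is the cumulative maximum of `input` along dimension `dim` and `indices` is the location of each maximum along `dim`. Read [torch.cummax](https://pytorch.org/docs/stable/generated/torch.cummax.html) for more details.
-
-#### Parameters
-
-| Type  | Parameter | Description                             |
-| ----- | --------- | --------------------------------------- |
-| `int` | `dim`     | The dimension over which to accumulate. |
-
-#### Inputs
-
-<dl>
-<dt><tt>input</tt>: T</dt>
-<dd>Input tensor; may have any shape. Empty tensors are also supported.</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>output</tt>: T</dt>
-<dd>Cumulative maximum of `input` along dimension `dim`, with the same shape and type as `input`.</dd>
-<dt><tt>indices</tt>: tensor(int64)</dt>
-<dd>Location of each maximum along dimension `dim`, with the same shape as `input`.</dd>
-</dl>
-
-#### Type Constraints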
-
- T:tensor(float32)
-
-### cummin
-
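-#### Description
-
-Returns a tuple (`values`, `indices`), where `values` is the cumulative minimum of `input` along dimension `dim` and `indices` is the location of each minimum along `dim`. Read [torch.cummin](https://pytorch.org/docs/stable/generated/torch.cummin.html) for more details.
-
-#### Parameters
-
-| Type  | Parameter | Description                             |
-| ----- | --------- | --------------------------------------- |
-| `int` | `dim`     | The dimension over which to accumulate. |
-
-#### Inputs
-
-<dl>
-<dt><tt>input</tt>: T</dt>
-<dd>Input tensor; may have any shape. Empty tensors are also supported.</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>output</tt>: T</dt>
-<dd>Cumulative minimum of `input` along dimension `dim`, with the same shape and type as `input`.</dd>
-<dt><tt>indices</tt>: tensor(int64)</dt>
-<dd>Location of each minimum along dimension `dim`, with the same shape as `input`.</dd>
-</dl>
-
-#### Type Constraints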
-
- T:tensor(float32)
-
-### MMCVModulatedDeformConv2d
-
-#### Description
-
-Perform Modulated Deformable Convolution on the input feature. Read [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for more details.
-
-#### Parameters
-
-| Type           | Parameter           | Description                                                                                    |
-| -------------- | ------------------- | ---------------------------------------------------------------------------------------------- |
-| `list of ints` | `stride`            | The stride of the convolution (sH, sW).                                                        |
-| `list of ints` | `padding`           | Padding size of the input feature (padH, padW).                                                |
-| `list of ints` | `dilation`          | The spacing between kernel elements (dH, dW).                                                  |
-| `int`          | `deformable_groups` | Groups of deformable offsets. Setting it to 1 is usually enough.                               |
-| `int`          | `groups`            | Number of convolution groups. `input_channel` is split into this many groups for computation. |
-
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of input channels, and inH and inW are the height and width of the input feature map.</dd>
-<dt><tt>inputs[1]</tt>: T</dt>
-<dd>Input offset; 4-D tensor of shape (N, deformable_group * 2 * kH * kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and output.</dd>
-<dt><tt>inputs[2]</tt>: T</dt>
-<dd>Input mask; 4-D tensor of shape (N, deformable_group * kH * kW, outH, outW).</dd>
-<dt><tt>inputs[3]</tt>: T</dt>
-<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
-<dt><tt>inputs[4]</tt>: T, optional</dt>
-<dd>Input bias; 1-D tensor of shape (output_channel).</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32, Linear)
--- a/docs/zh_cn/deployment/onnxruntime_op.md
+++ b/docs/zh_cn/deployment/onnxruntime_op.md
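-## ONNX Runtime Custom Operators in MMCV
-
-### Introduction to ONNX Runtime
-
-**ONNX Runtime** is a cross-platform inference and training accelerator compatible with many popular ML/DNN frameworks. Check its [github](https://github.com/microsoft/onnxruntime) for more information.
-
-### Introduction to ONNX
-
-**ONNX** stands for **Open Neural Network Exchange**, an *intermediate representation (IR)* used by many ML/DNN frameworks. Check its [github](https://github.com/onnx/onnx) for more information.
-
-### Why add ONNX custom operators to MMCV?
-
- To verify the correctness of inference with exported ONNX models in ONNX Runtime.
- To ease the deployment of models that use custom operators from `mmcv.ops`.
-
-### Operators supported in MMCV
-
-|                                     Operator                                     | CPU | GPU | MMCV Release |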
-| :------------------------------------------------------------------------------: | :-: | :-: | :------: |
-|                   [SoftNMS](onnxruntime_custom_ops.md#softnms)                   |  Y  |  N  |  1.2.3   |
-|                  [RoIAlign](onnxruntime_custom_ops.md#roialign)                  |  Y  |  N  |  1.2.5   |
-|                       [NMS](onnxruntime_custom_ops.md#nms)                       |  Y  |  N  |  1.2.7   |
-|              [grid_sampler](onnxruntime_custom_ops.md#grid_sampler)              |  Y  |  N  |  1.3.1   |
-|                [CornerPool](onnxruntime_custom_ops.md#cornerpool)                |  Y  |  N  |  1.3.4   |
-|                    [cummax](onnxruntime_custom_ops.md#cummax)                    |  Y  |  N  |  1.3.4   |
-|                    [cummin](onnxruntime_custom_ops.md#cummin)                    |  Y  |  N  |  1.3.4   |
-| [MMCVModulatedDeformConv2d](onnxruntime_custom_ops.md#mmcvmodulateddeformconv2d) |  Y  |  N  |  1.3.12  |
-
-### How to build custom operators for ONNX Runtime
-
-*Please note that this has only been tested with **onnxruntime>=1.8.1** on the Linux x86-64 CPU platform.*
-
-#### Prerequisites
-
- Clone the repository
-
-```bash
-git clone https://github.com/open-mmlab/mmcv.git
-```
-
- Download `onnxruntime-linux` from the ONNX Runtime [releases](https://github.com/microsoft/onnxruntime/releases/tag/v1.8.1), extract it, set the variable `ONNXRUNTIME_DIR` to its path, and add the `lib` directory under that path to `LD_LIBRARY_PATH`, as follows:
-
-```bash
-wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-1.8.1.tgz
-
-tar -zxvf onnxruntime-linux-x64-1.8.1.tgz
-cd onnxruntime-linux-x64-1.8.1
-export ONNXRUNTIME_DIR=$(pwd)
-export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH
-```
-
-#### Build on Linux
-
-```bash
-cd mmcv ## to MMCV root directory
-MMCV_WITH_OPS=1 MMCV_WITH_ORT=1 python setup.py develop
-```
-
-### How to run inference on exported ONNX models with ONNX Runtime in Python
-
-Install ONNX Runtime with `pip`
-
-```bash
-pip install onnxruntime==1.8.1
-```
-
-Inference example
-
-```python
-import os
-
-import numpy as np
-import onnxruntime as ort
-
-from mmcv.ops import get_onnxruntime_op_path
-
-ort_custom_op_path = get_onnxruntime_op_path()
-assert os.path.exists(ort_custom_op_path)
-session_options = ort.SessionOptions()
-session_options.register_custom_ops_library(ort_custom_op_path)
-## exported ONNX model with custom operators
-onnx_file = 'sample.onnx'
-input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
-sess = ort.InferenceSession(onnx_file, session_options)
-onnx_results = sess.run(None, {'input' : input_data})
-```
-
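-### How to add a custom ONNX Runtime operator to MMCV
-
-#### Before you start
-
- Check that the operator is not already supported by ONNX Runtime (see the [list of implemented operators](https://github.com/microsoft/onnxruntime/blob/master/docs/OperatorKernels.md)).
- Make sure the custom operator can be exported to ONNX.
-
-#### Main procedures
-
-Take `soft_nms` as an example:
-
-1. Add the header `soft_nms.h` to the ONNX Runtime include directory `mmcv/ops/csrc/onnxruntime/`
-
-2. Add the operator implementation `soft_nms.cpp` to the ONNX Runtime source directory `mmcv/ops/csrc/onnxruntime/cpu/`
-
-3. Register the implemented operator `soft_nms` in [onnxruntime_register.cpp](../../../mmcv/ops/csrc/onnxruntime/cpu/onnxruntime_register.cpp)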
-
-   ```c++
-   #include "soft_nms.h"
-
-   SoftNmsOp c_SoftNmsOp;
-
-   if (auto status = ortApi->CustomOpDomain_Add(domain, &c_SoftNmsOp)) {
-     return status;
-   }
-   ```
-
-4. Add a unit test in `tests/test_ops/test_onnx.py`;
-   see [here](../../tests/test_ops/test_onnx.py) for reference.
-
-**Finally, contributions of ONNX Runtime custom operators to MMCV are welcome** :nerd_face:
-
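-### Known Issues
-
- "RuntimeError: tuple appears in op that does not forward tuples, unsupported kind: `prim::PythonOp`."
-  1. The `cummax` and `cummin` operators were added in torch >= 1.5.0, but they can only be exported correctly with torch >= 1.7.0. With earlier versions, the error above is raised during export.
-  2. Solution: upgrade PyTorch to 1.7.0 or later.
-
-### References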
-
- [How to export Pytorch model with custom op to ONNX and run it in ONNX Runtime](https://github.com/onnx/tutorials/blob/master/PyTorchCustomOperator/README.md)
- [How to add a custom operator/kernel in ONNX Runtime](https://onnxruntime.ai/docs/reference/operators/add-custom-op.html)
--- a/docs/zh_cn/deployment/tensorrt_custom_ops.md
+++ b/docs/zh_cn/deployment/tensorrt_custom_ops.md
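-## TensorRT Custom Ops
-
-<!-- TOC -->
-
- [TensorRT Custom Ops](#tensorrt-custom-ops)
-  - [MMCVRoIAlign](#mmcvroialign)
-    - [Description](#description)
-    - [Parameters](#parameters)
-    - [Inputs](#inputs)
-    - [Outputs](#outputs)
-    - [Type Constraints](#type-constraints)
-  - [ScatterND](#scatternd)
-    - [Description](#description-1)
-    - [Parameters](#parameters-1)
-    - [Inputs](#inputs-1)
-    - [Outputs](#outputs-1)
-    - [Type Constraints](#type-constraints-1)
-  - [NonMaxSuppression](#nonmaxsuppression)
-    - [Description](#description-2)
-    - [Parameters](#parameters-2)
-    - [Inputs](#inputs-2)
-    - [Outputs](#outputs-2)
-    - [Type Constraints](#type-constraints-2)
-  - [MMCVDeformConv2d](#mmcvdeformconv2d)
-    - [Description](#description-3)
-    - [Parameters](#parameters-3)
-    - [Inputs](#inputs-3)
-    - [Outputs](#outputs-3)
-    - [Type Constraints](#type-constraints-3)
-  - [grid_sampler](#grid_sampler)
-    - [Description](#description-4)
-    - [Parameters](#parameters-4)
-    - [Inputs](#inputs-4)
-    - [Outputs](#outputs-4)
-    - [Type Constraints](#type-constraints-4)
-  - [cummax](#cummax)
-    - [Description](#description-5)
-    - [Parameters](#parameters-5)
-    - [Inputs](#inputs-5)
-    - [Outputs](#outputs-5)
-    - [Type Constraints](#type-constraints-5)
-  - [cummin](#cummin)
-    - [Description](#description-6)
-    - [Parameters](#parameters-6)
-    - [Inputs](#inputs-6)
-    - [Outputs](#outputs-6)
-    - [Type Constraints](#type-constraints-6)
-  - [MMCVInstanceNormalization](#mmcvinstancenormalization)
-    - [Description](#description-7)
-    - [Parameters](#parameters-7)
-    - [Inputs](#inputs-7)
-    - [Outputs](#outputs-7)
-    - [Type Constraints](#type-constraints-7)
-  - [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d)
-    - [Description](#description-8)
-    - [Parameters](#parameters-8)
-    - [Inputs](#inputs-8)
-    - [Outputs](#outputs-8)
-    - [Type Constraints](#type-constraints-8)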
-
-<!-- TOC -->
-
-### MMCVRoIAlign
-
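-#### Description
-
-Performs RoIAlign on feature maps; used in the bbox_head of most two-stage object detection models.
-
-#### Parameters
-
-| Type    | Parameter        | Description                                                                |
-| ------- | ---------------- | -------------------------------------------------------------------------- |
-| `int`   | `output_height`  | Output height of the RoI features                                          |
-| `int`   | `output_width`   | Output width of the RoI features                                           |
-| `float` | `spatial_scale`  | Scale factor applied to the input boxes                                    |
-| `int`   | `sampling_ratio` | Sampling ratio for the output. `0` means dense sampling                    |
-| `str`   | `mode`           | Pooling mode: `avg` or `max`                                               |
-| `int`   | `aligned`        | If `aligned=1`, pixel coordinates are shifted by -0.5 for better alignment |
-
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Input feature map; a 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of input channels, and H and W are the height and width of the input feature map.</dd>
-<dt><tt>inputs[1]</tt>: T</dt>
-<dd>Regions of interest to pool over; a 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. RoI coordinates are in the coordinate system of the input feature map.</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>Pooled output; a 4-D tensor of shape (num_rois, C, output_height, output_width). Each output feature feat[i] corresponds to the input region of interest rois[i].</dd>
-</dl>
-
-#### Type Constraints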
-
- T:tensor(float32, Linear)
-
-### ScatterND
-
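-#### Description
-
-ScatterND takes three inputs: `data` of rank r >= 1, `indices` of rank q >= 1, and `updates` of rank q + r - indices.shape\[-1\] - 1. The output is produced by first creating a copy of `data` and then updating that copy with the values in `updates` at the positions given by `indices`. Note that `indices` must not contain duplicate entries; that is, updating the same location more than once is not allowed.
-
-The output can be computed as in the following code: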
-
-```python
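-import numpy as np
-
-# data, indices and updates are NumPy arrays
-output = np.copy(data)
-update_indices = indices.shape[:-1]
-for idx in np.ndindex(update_indices):
-    # index with a tuple so the last axis of `indices` addresses one location
-    output[tuple(indices[idx])] = updates[idx]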
-```
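As a worked example of the update rule above, the following minimal sketch wraps it in a function and applies it to a small 1-D tensor. The function name `scatter_nd` and the sample values are illustrative assumptions, not part of the MMCV plugin API.

```python
import numpy as np


def scatter_nd(data, indices, updates):
    """Reference ScatterND: copy `data`, then write `updates` at `indices`."""
    output = np.copy(data)
    for idx in np.ndindex(indices.shape[:-1]):
        # Each row of `indices` addresses one location (or slice) of `output`.
        output[tuple(indices[idx])] = updates[idx]
    return output


# Illustrative values: r = 1, q = 2, so `updates` has rank 2 + 1 - 1 - 1 = 1.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.float32)
indices = np.array([[4], [3], [1], [7]], dtype=np.int32)
updates = np.array([9, 10, 11, 12], dtype=np.float32)

result = scatter_nd(data, indices, updates)
print(result)
```

Positions 4, 3, 1 and 7 of the copy are overwritten; all other entries keep the values from `data`.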
-
-#### Parameters
-
-None
-
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Input `data` of rank r >= 1</dd>
-
-<dt><tt>inputs[1]</tt>: tensor(int32, Linear)</dt>
-<dd>Input `indices` of rank q >= 1</dd>
-
-<dt><tt>inputs[2]</tt>: T</dt>
-<dd>Input `updates` of rank q + r - indices.shape[-1] - 1</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>Output tensor of rank r >= 1</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32, Linear), tensor(int32, Linear)
-
-### NonMaxSuppression
-
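-#### Description
-
-Performs non-maximum suppression on candidate boxes according to an IoU threshold.
-
-#### Parameters
-
-| Type    | Parameter                    | Description                                                                                                                            |
-| ------- | ---------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
-| `int`   | `center_point_box`           | 0: boxes are given as \[y1, x1, y2, x2\]; 1: boxes are given as \[x_center, y_center, width, height\]                                  |
-| `int`   | `max_output_boxes_per_class` | Maximum number of boxes to output per class. Defaults to 0, in which case the number of output boxes equals the number of input boxes |
-| `float` | `iou_threshold`              | Threshold used to decide whether boxes overlap too much; range \[0, 1\]. Defaults to 0                                                 |
-| `float` | `score_threshold`            | Threshold used to decide whether a box is valid                                                                                        |
-| `int`   | `offset`                     | Box width and height are computed as (x2 - x1 + offset); valid values are 0 or 1                                                       |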
-
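To make the roles of `iou_threshold` and `offset` concrete, here is a minimal NumPy sketch of greedy non-maximum suppression for a single class. It is an illustration under stated assumptions, not the plugin's implementation: boxes are taken as [x1, y1, x2, y2], and `score_threshold`, `max_output_boxes_per_class`, and batching are left out. The function name `nms` and the sample boxes are hypothetical.

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.5, offset=0):
    """Greedy NMS; `boxes` is (num_boxes, 4) as [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    # Width and height follow the (x2 - x1 + offset) convention.
    areas = (x2 - x1 + offset) * (y2 - y1 + offset)
    order = np.argsort(-scores)  # highest score first
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        # Intersection of the current box with every remaining box.
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + offset)
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + offset)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop boxes that overlap the current box by more than the threshold.
        order = rest[iou <= iou_threshold]
    return keep


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
kept = nms(boxes, scores, iou_threshold=0.5)
print(kept)
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap and is kept.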
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Input boxes; a 3-D tensor of shape (num_batches, spatial_dimension, 4)</dd>
-<dt><tt>inputs[1]</tt>: T</dt>
-<dd>Input scores; a 3-D tensor of shape (num_batches, num_classes, spatial_dimension)</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: tensor(int32, Linear)</dt>
-<dd>Indices of the selected boxes; a 2-D tensor of shape (num_selected_indices, 3). Each row is [batch_index, class_index, box_index].</dd>
-<dd>Here num_selected_indices = num_batches * num_classes * min(max_output_boxes_per_class, spatial_dimension).</dd>
-<dd>Indices of all boxes that were not selected are filled with -1.</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32, Linear)
-
-### MMCVDeformConv2d
-
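-#### Description
-
-Performs deformable convolution on the input feature. See [Deformable Convolutional Network](https://arxiv.org/abs/1703.06211) for more details.
-
-#### Parameters
-
-| Type           | Parameter          | Description                                                                                                                                                        |
-| -------------- | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| `list of ints` | `stride`           | Stride of the convolution (sH, sW)                                                                                                                                 |
-| `list of ints` | `padding`          | Padding added to the input feature (padH, padW)                                                                                                                    |
-| `list of ints` | `dilation`         | Spacing between kernel elements (dH, dW)                                                                                                                           |
-| `int`          | `deformable_group` | Number of deformable offset groups                                                                                                                                 |
-| `int`          | `group`            | Number of convolution groups; `input_channel` is split into this many groups for computation                                                                       |
-| `int`          | `im2col_step`      | Deformable convolution uses im2col to compute the convolution. The input and offsets are processed in chunks of im2col_step, which reduces temporary memory usage. |
-
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Input feature; a 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of input channels, and inH and inW are the height and width of the input feature map</dd>
-<dt><tt>inputs[1]</tt>: T</dt>
-<dd>Input offset; a 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW are the height and width of the kernel, and outH and outW are the height and width of the output feature map</dd>
-<dt><tt>inputs[2]</tt>: T</dt>
-<dd>Input weight; a 4-D tensor of shape (output_channel, input_channel, kH, kW)</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>Output feature; a 4-D tensor of shape (N, output_channel, outH, outW)</dd>
-</dl>
-
-#### Type Constraints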
-
- T:tensor(float32, Linear)
-
-### grid_sampler
-
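-#### Description
-
-Samples `input` at the pixel locations specified by `grid`.
-
-#### Parameters
-
-| Type  | Parameter            | Description                                                                                                                                                                                                                                   |
-| ----- | -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `int` | `interpolation_mode` | Interpolation mode used to compute the output (0: `bilinear`, 1: `nearest`)                                                                                                                                                                   |
-| `int` | `padding_mode`       | Padding mode for locations outside the input (0: `zeros`, 1: `border`, 2: `reflection`)                                                                                                                                                       |
-| `int` | `align_corners`      | If `align_corners=1`, the extrema (`-1` and `1`) are treated as the centers of the input's corner pixels. If `align_corners=0`, they are treated as the corner points of the input's corner pixels, which reduces the effect of resolution on sampling |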
-
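The effect of `align_corners` is easiest to see in how a normalized grid coordinate in \[-1, 1\] maps to a pixel coordinate. The sketch below uses the mapping that `torch.nn.functional.grid_sample` follows; the helper name `unnormalize` is hypothetical and is not a function of the plugin.

```python
def unnormalize(coord, size, align_corners):
    """Map a normalized coordinate in [-1, 1] to a pixel coordinate.

    `size` is the number of pixels along the axis (inW or inH).
    """
    if align_corners:
        # -1 and 1 hit the centers of the first and last pixels.
        return (coord + 1) / 2 * (size - 1)
    # -1 and 1 hit the outer edges of the first and last pixels.
    return ((coord + 1) * size - 1) / 2


# For an axis with 4 pixels, whose centers sit at 0, 1, 2 and 3:
print(unnormalize(-1.0, 4, align_corners=1), unnormalize(1.0, 4, align_corners=1))
print(unnormalize(-1.0, 4, align_corners=0), unnormalize(1.0, 4, align_corners=0))
```

With `align_corners=1` the extrema land on pixel centers 0 and 3; with `align_corners=0` they land half a pixel outside, at -0.5 and 3.5.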
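-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Input feature; a 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of input channels, and inH and inW are the height and width of the input feature map</dd>
-<dt><tt>inputs[1]</tt>: T</dt>
-<dd>Input grid; a 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of the output</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>Output feature; a 4-D tensor of shape (N, C, outH, outW)</dd>
-</dl>
-
-#### Type Constraints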
-
- T:tensor(float32, Linear)
-
-### cummax
-
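-#### Description
-
-Returns a tuple (`values`, `indices`), where `values` is the cumulative maximum of `input` along dimension `dim` and `indices` is the position of each maximum along `dim`. See [torch.cummax](https://pytorch.org/docs/stable/generated/torch.cummax.html) for more details.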
-
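A minimal NumPy sketch of these semantics follows, assuming a non-empty `dim` axis. The function name `cummax` and the rule for equal values (the later position is reported) are assumptions made for illustration; the sample input avoids ties.

```python
import numpy as np


def cummax(x, dim=0):
    """Return (values, indices) of the running maximum along `dim`."""
    x = np.moveaxis(np.asarray(x), dim, 0)
    values = np.empty_like(x)
    indices = np.empty(x.shape, dtype=np.int64)
    values[0] = x[0]
    indices[0] = 0
    for i in range(1, x.shape[0]):
        # Assumption: on ties, the later position replaces the earlier one.
        take_new = x[i] >= values[i - 1]
        values[i] = np.where(take_new, x[i], values[i - 1])
        indices[i] = np.where(take_new, i, indices[i - 1])
    return np.moveaxis(values, 0, dim), np.moveaxis(indices, 0, dim)


values, indices = cummax(np.array([1, 3, 2, 5, 4]), dim=0)
print(values, indices)
```

Each position holds the largest value seen so far and where it occurred; `cummin` follows the same pattern with the comparison reversed.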
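-#### Parameters
-
-| Type  | Parameter | Description                                        |
-| ----- | --------- | -------------------------------------------------- |
-| `int` | `dim`     | Dimension along which the accumulation is computed |
-
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Input tensor; may have any shape</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>Cumulative maximum of `input` along dimension `dim`, with the same shape and type as `input`</dd>
-<dt><tt>outputs[1]</tt>: (int32, Linear)</dt>
-<dd>Position of each maximum along dimension `dim`, with the same shape as `input`</dd>
-</dl>
-
-#### Type Constraints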
-
- T:tensor(float32, Linear)
-
-### cummin
-
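-#### Description
-
-Returns a tuple (`values`, `indices`), where `values` is the cumulative minimum of `input` along dimension `dim` and `indices` is the position of each minimum along `dim`. See [torch.cummin](https://pytorch.org/docs/stable/generated/torch.cummin.html) for more details.
-
-#### Parameters
-
-| Type  | Parameter | Description                                        |
-| ----- | --------- | -------------------------------------------------- |
-| `int` | `dim`     | Dimension along which the accumulation is computed |
-
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Input tensor; may have any shape</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>Cumulative minimum of `input` along dimension `dim`, with the same shape and type as `input`</dd>
-<dt><tt>outputs[1]</tt>: (int32, Linear)</dt>
-<dd>Position of each minimum along dimension `dim`, with the same shape as `input`</dd>
-</dl>
-
-#### Type Constraints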
-
- T:tensor(float32, Linear)
-
-### MMCVInstanceNormalization
-
-#### Description
-
-Performs instance normalization on the input feature. See [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022) for more details.
-
-#### Parameters
-
-| Type    | Parameter | Description                                       |
-| ------- | --------- | ------------------------------------------------- |
-| `float` | `epsilon` | Used to avoid division by zero. Defaults to 1e-05 |
-
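For reference, instance normalization can be sketched in NumPy as below: each (sample, channel) slice is normalized over its spatial dimensions, then scaled and shifted per channel, with `epsilon` added to the variance before the square root. The function name `instance_norm` and the sample tensors are illustrative assumptions, not the plugin's code.

```python
import numpy as np


def instance_norm(x, scale, bias, epsilon=1e-5):
    """Instance normalization for an (N, C, H, W) array."""
    # Statistics are computed per sample and per channel, over H and W.
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + epsilon)
    # `scale` and `bias` have shape (C,) and broadcast over N, H and W.
    return x_hat * scale.reshape(1, -1, 1, 1) + bias.reshape(1, -1, 1, 1)


x = np.arange(2 * 3 * 4 * 4, dtype=np.float64).reshape(2, 3, 4, 4)
scale = np.ones(3)
bias = np.array([0.0, 1.0, 2.0])
y = instance_norm(x, scale, bias)
print(y.shape, y.mean(axis=(2, 3)))
```

After normalization, the mean of every (sample, channel) slice equals the corresponding entry of `bias`, and the output keeps the input shape.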
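-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Input feature; a 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of input channels, and H and W are the height and width of the input feature map</dd>
-<dt><tt>inputs[1]</tt>: T</dt>
-<dd>Input scale; a 1-D tensor of shape (C,)</dd>
-<dt><tt>inputs[2]</tt>: T</dt>
-<dd>Input bias; a 1-D tensor of shape (C,)</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>Output feature; a 4-D tensor of shape (N, C, H, W)</dd>
-</dl>
-
-#### Type Constraints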
-
- T:tensor(float32, Linear)
-
-### MMCVModulatedDeformConv2d
-
-#### Description
-
-Performs modulated deformable convolution on the input feature. See [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for more details.
-
-#### Parameters
-
-| Type           | Parameter           | Description                                                                                  |
-| -------------- | ------------------- | -------------------------------------------------------------------------------------------- |
-| `list of ints` | `stride`            | Stride of the convolution (sH, sW)                                                           |
-| `list of ints` | `padding`           | Padding added to the input feature (padH, padW)                                              |
-| `list of ints` | `dilation`          | Spacing between kernel elements (dH, dW)                                                     |
-| `int`          | `deformable_groups` | Number of deformable offset groups; setting it to 1 is usually sufficient                    |
-| `int`          | `groups`            | Number of convolution groups; `input_channel` is split into this many groups for computation |
-
-#### Inputs
-
-<dl>
-<dt><tt>inputs[0]</tt>: T</dt>
-<dd>Input feature; a 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of input channels, and inH and inW are the height and width of the input feature map</dd>
-<dt><tt>inputs[1]</tt>: T</dt>
-<dd>Input offset; a 4-D tensor of shape (N, deformable_groups* 2* kH* kW, outH, outW), where kH and kW are the height and width of the kernel, and outH and outW are the height and width of the output feature map</dd>
-<dt><tt>inputs[2]</tt>: T</dt>
-<dd>Input mask; a 4-D tensor of shape (N, deformable_groups* kH* kW, outH, outW)</dd>
-<dt><tt>inputs[3]</tt>: T</dt>
-<dd>Input weight; a 4-D tensor of shape (output_channel, input_channel, kH, kW)</dd>
-<dt><tt>inputs[4]</tt>: T, optional</dt>
-<dd>Input bias; a 1-D tensor of shape (output_channel)</dd>
-</dl>
-
-#### Outputs
-
-<dl>
-<dt><tt>outputs[0]</tt>: T</dt>
-<dd>Output feature; a 4-D tensor of shape (N, output_channel, outH, outW)</dd>
-</dl>
-
-#### Type Constraints
-
- T:tensor(float32, Linear)
--- a/docs/zh_cn/deployment/tensorrt_plugin.md
+++ b/docs/zh_cn/deployment/tensorrt_plugin.md
-## TensorRT Custom Ops in MMCV (Experimental)
-
-<!-- TOC -->
-
- [TensorRT Custom Ops in MMCV (Experimental)](#tensorrt-custom-ops-in-mmcv-experimental)
-  - [Introduction](#introduction)
-  - [List of TensorRT plugins in MMCV](#list-of-tensorrt-plugins-in-mmcv)
-  - [How to build the TensorRT plugins in MMCV](#how-to-build-the-tensorrt-plugins-in-mmcv)
-    - [Prerequisites](#prerequisites)
-    - [Build on Linux](#build-on-linux)
-  - [Create a TensorRT engine and run inference in Python](#create-a-tensorrt-engine-and-run-inference-in-python)
-  - [How to add a new TensorRT custom op to MMCV](#how-to-add-a-new-tensorrt-custom-op-to-mmcv)
-    - [Main procedure](#main-procedure)
-    - [Reminders](#reminders)
-  - [Known Issues](#known-issues)
-  - [References](#references)
-
-<!-- TOC -->
-
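-### Introduction
-
-**NVIDIA TensorRT** is a software development kit (SDK) for high-performance inference of deep learning models. It includes a deep learning inference optimizer and runtime that deliver low latency and high throughput for deep learning inference applications. Please see the [developer's website](https://developer.nvidia.com/tensorrt) for more information.
-To simplify deploying models that use MMCV custom operators with TensorRT, a series of TensorRT plugins has been added to MMCV.
-
-### List of TensorRT plugins in MMCV
-
-|       ONNX Operator       |                                 TensorRT Plugin                                 | MMCV Version |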
-| :-----------------------: | :-----------------------------------------------------------------------------: | :------: |
-|       MMCVRoiAlign        |              [MMCVRoiAlign](./tensorrt_custom_ops.md#mmcvroialign)              |  1.2.6   |
-|         ScatterND         |                 [ScatterND](./tensorrt_custom_ops.md#scatternd)                 |  1.2.6   |
-|     NonMaxSuppression     |         [NonMaxSuppression](./tensorrt_custom_ops.md#nonmaxsuppression)         |  1.3.0   |
-|     MMCVDeformConv2d      |          [MMCVDeformConv2d](./tensorrt_custom_ops.md#mmcvdeformconv2d)          |  1.3.0   |
-|       grid_sampler        |              [grid_sampler](./tensorrt_custom_ops.md#grid-sampler)              |  1.3.1   |
-|          cummax           |                    [cummax](./tensorrt_custom_ops.md#cummax)                    |  1.3.5   |
-|          cummin           |                    [cummin](./tensorrt_custom_ops.md#cummin)                    |  1.3.5   |
-| MMCVInstanceNormalization | [MMCVInstanceNormalization](./tensorrt_custom_ops.md#mmcvinstancenormalization) |  1.3.5   |
-| MMCVModulatedDeformConv2d | [MMCVModulatedDeformConv2d](./tensorrt_custom_ops.md#mmcvmodulateddeformconv2d) |  master  |
-
-Notes
-
- All plugins listed above were developed with TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0.
-
-### How to build the TensorRT plugins in MMCV
-
-#### Prerequisites
-
- Clone the repository
-
-```bash
-git clone https://github.com/open-mmlab/mmcv.git
-```
-
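- Install TensorRT
-
-Download the appropriate TensorRT build from the [NVIDIA Developer Zone](https://developer.nvidia.com/nvidia-tensorrt-download).
-
-For example, for an x86-64 Ubuntu 16.04 machine with cuda-10.2 installed, the file to download is `TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0.tar.gz`.
-
-Then install it and set up the environment as follows: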
-
-```bash
-cd ~/Downloads
-tar -xvzf TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0.tar.gz
-export TENSORRT_DIR=`pwd`/TensorRT-7.2.1.6
-export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TENSORRT_DIR/lib
-```
-
-Install the Python dependencies: tensorrt, graphsurgeon, onnx-graphsurgeon
-
-```bash
-pip install $TENSORRT_DIR/python/tensorrt-7.2.1.6-cp37-none-linux_x86_64.whl
-pip install $TENSORRT_DIR/onnx_graphsurgeon/onnx_graphsurgeon-0.2.6-py2.py3-none-any.whl
-pip install $TENSORRT_DIR/graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
-```
-
-For more information on installing TensorRT from a tar file, see [NVIDIA's website](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-721/install-guide/index.html#installing-tar).
-
- Install cuDNN
-
-Install cuDNN 8 following [NVIDIA's website](https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installlinux-tar).
-
-#### Build on Linux
-
-```bash
-cd mmcv ## to MMCV root directory
-MMCV_WITH_OPS=1 MMCV_WITH_TRT=1 pip install -e .
-```
-
-### Create a TensorRT engine and run inference in Python
-
-Example:
-
-```python
-import torch
-import onnx
-
-from mmcv.tensorrt import (TRTWrapper, onnx2trt, save_trt_engine,
-                           is_tensorrt_plugin_loaded)
-
-assert is_tensorrt_plugin_loaded(), 'Requires to compile TensorRT plugins in mmcv'
-
-onnx_file = 'sample.onnx'
-trt_file = 'sample.trt'
-onnx_model = onnx.load(onnx_file)
-
-## Model input
-inputs = torch.rand(1, 3, 224, 224).cuda()
-## Model input shape info
-opt_shape_dict = {
-    'input': [list(inputs.shape),
-              list(inputs.shape),
-              list(inputs.shape)]
-}
-
-## Create TensorRT engine
-max_workspace_size = 1 << 30
-trt_engine = onnx2trt(
-    onnx_model,
-    opt_shape_dict,
-    max_workspace_size=max_workspace_size)
-
-## Save TensorRT engine
-save_trt_engine(trt_engine, trt_file)
-
-## Run inference with TensorRT
-trt_model = TRTWrapper(trt_file, ['input'], ['output'])
-
-with torch.no_grad():
-    trt_outputs = trt_model({'input': inputs})
-    output = trt_outputs['output']
-
-```
-
-### How to add a new TensorRT custom op to MMCV
-
-#### Main procedure
-
-The main steps are:
-
-1. Add a C++ header file
-2. Add a C++ source file
-3. Add a CUDA kernel file
-4. Register the plugin in `trt_plugin.cpp`
-5. Add a unit test in `tests/test_ops/test_tensorrt.py`
-
-**Take the RoIAlign plugin `roi_align` as an example.**
-
-1. Add the header `trt_roi_align.hpp` to the TensorRT include directory `mmcv/ops/csrc/tensorrt/`
-
-2. Add the source file `trt_roi_align.cpp` to the TensorRT source directory `mmcv/ops/csrc/tensorrt/plugins/`
-
-3. Add the CUDA kernel file `trt_roi_align_kernel.cu` to the TensorRT source directory `mmcv/ops/csrc/tensorrt/plugins/`
-
-4. Register the `roi_align` plugin in [trt_plugin.cpp](https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/csrc/tensorrt/plugins/trt_plugin.cpp)
-
-   ```c++
-   #include "trt_plugin.hpp"
-
-   #include "trt_roi_align.hpp"
-
-   REGISTER_TENSORRT_PLUGIN(RoIAlignPluginDynamicCreator);
-
-   extern "C" {
-   bool initLibMMCVInferPlugins() { return true; }
-   }  // extern "C"
-   ```
-
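-5. Add a unit test in `tests/test_ops/test_tensorrt.py`
-
-#### Reminders
-
- Some of the custom ops in MMCV already have CUDA implementations, which can serve as a reference when developing TensorRT plugins.
-
-### Known Issues
-
- None
-
-### References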
-
- [Developer guide of Nvidia TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html)
- [TensorRT Open Source Software](https://github.com/NVIDIA/TensorRT)
- [onnx-tensorrt](https://github.com/onnx/onnx-tensorrt)
- [TensorRT python API](https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/index.html)
- [TensorRT c++ plugin API](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_plugin.html)
--- a/docs/zh_cn/get_started/installation.md
+++ b/docs/zh_cn/get_started/installation.md
@@ -173,8 +173,4 @@ b. Install the lite version
 pip install mmcv
 ```

-c. Install the full version and build the custom operators for onnxruntime
-
- See [here](https://mmcv.readthedocs.io/zh_CN/latest/deployment/onnxruntime_custom_ops.html) for detailed instructions.
-
 If you would like to build MMCV from source, please refer to the [guide](https://mmcv.readthedocs.io/zh_CN/latest/get_started/build.html).
--- a/docs/zh_cn/index.rst
+++ b/docs/zh_cn/index.rst
@@ -21,16 +21,6 @@
   understand_mmcv/cnn.md
   understand_mmcv/ops.md

-.. toctree::
-   :maxdepth: 2
-   :caption: Deployment
-
-   deployment/onnx.md
-   deployment/onnxruntime_op.md
-   deployment/onnxruntime_custom_ops.md
-   deployment/tensorrt_plugin.md
-   deployment/tensorrt_custom_ops.md
-
 .. toctree::
   :caption: Switch Language


--- a/mmcv/onnx/__init__.py
+++ b/mmcv/onnx/__init__.py
 # Copyright (c) OpenMMLab. All rights reserved.
-from .info import is_custom_op_loaded
 from .symbolic import register_extra_symbolics

-__all__ = ['register_extra_symbolics', 'is_custom_op_loaded']
+__all__ = ['register_extra_symbolics']
--- a/mmcv/onnx/info.py
+++ b/mmcv/onnx/info.py
-# Copyright (c) OpenMMLab. All rights reserved.
-import os
-import warnings
-
-import torch
-
-
-def is_custom_op_loaded() -> bool:
-
-    # Following strings of text style are from colorama package
-    bright_style, reset_style = '\x1b[1m', '\x1b[0m'
-    red_text, blue_text = '\x1b[31m', '\x1b[34m'
-    white_background = '\x1b[107m'
-
-    msg = white_background + bright_style + red_text
-    msg += 'DeprecationWarning: This function will be deprecated in future. '
-    msg += blue_text + 'Welcome to use the unified model deployment toolbox '
-    msg += 'MMDeploy: https://github.com/open-mmlab/mmdeploy'
-    msg += reset_style
-    warnings.warn(msg)
-
-    flag = False
-    try:
-        from ..tensorrt import is_tensorrt_plugin_loaded
-        flag = is_tensorrt_plugin_loaded()
-    except (ImportError, ModuleNotFoundError):
-        pass
-    if not flag:
-        try:
-            from ..ops import get_onnxruntime_op_path
-            ort_lib_path = get_onnxruntime_op_path()
-            flag = os.path.exists(ort_lib_path)
-        except (ImportError, ModuleNotFoundError):
-            pass
-    return flag or torch.__version__ == 'parrots'
--- a/mmcv/ops/__init__.py
+++ b/mmcv/ops/__init__.py
@@ -27,8 +27,7 @@ from .furthest_point_sample import (furthest_point_sample,
 from .fused_bias_leakyrelu import FusedBiasLeakyReLU, fused_bias_leakyrelu
 from .gather_points import gather_points
 from .group_points import GroupAll, QueryAndGroup, grouping_operation
-from .info import (get_compiler_version, get_compiling_cuda_version,
-                   get_onnxruntime_op_path)
+from .info import get_compiler_version, get_compiling_cuda_version
 from .iou3d import (boxes_iou3d, boxes_iou_bev, boxes_overlap_bev, nms3d,
                    nms3d_normal, nms_bev, nms_normal_bev)
 from .knn import knn
@@ -76,9 +75,8 @@ __all__ = [
    'deform_conv2d', 'DeformRoIPool', 'DeformRoIPoolPack',
    'ModulatedDeformRoIPoolPack', 'deform_roi_pool', 'SigmoidFocalLoss',
    'SoftmaxFocalLoss', 'sigmoid_focal_loss', 'softmax_focal_loss',
-    'get_compiler_version', 'get_compiling_cuda_version',
-    'get_onnxruntime_op_path', 'MaskedConv2d', 'masked_conv2d',
-    'ModulatedDeformConv2d', 'ModulatedDeformConv2dPack',
+    'get_compiler_version', 'get_compiling_cuda_version', 'MaskedConv2d',
+    'masked_conv2d', 'ModulatedDeformConv2d', 'ModulatedDeformConv2dPack',
    'modulated_deform_conv2d', 'batched_nms', 'nms', 'soft_nms', 'nms_match',
    'RoIAlign', 'roi_align', 'RoIPool', 'roi_pool', 'SyncBatchNorm', 'Conv2d',
    'ConvTranspose2d', 'Linear', 'MaxPool2d', 'CrissCrossAttention', 'PSAMask',