[Docs] Add docs_zh_CN of TensorRT (#1336)

* Add docs_zh_CN of TensorRT
* fix according to doc-cn-onnx
* update doc of cummax cummin
* Apply suggestions from code review

  Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
* update heading

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

Unverified Commit afe0794c authored Sep 24, 2021 by q.yao Committed by GitHub Sep 24, 2021
4 changed files
--- a/docs/deployment/tensorrt_custom_ops.md
+++ b/docs/deployment/tensorrt_custom_ops.md
-# TensorRT Custom Ops
+## TensorRT Custom Ops

 <!-- TOC -->

@@ -60,14 +60,14 @@

 <!-- TOC -->

-## MMCVRoIAlign
+### MMCVRoIAlign

-### Description
+#### Description

 Perform RoIAlign on the output feature; used in the bbox_head of most two-stage
 detectors.

-### Parameters
+#### Parameters

 | Type    | Parameter        | Description                                                                                                   |
 | ------- | ---------------- | ------------------------------------------------------------------------------------------------------------- |
@@ -78,7 +78,7 @@ detectors.
 | `str`   | `mode`           | pooling mode in each bin. `avg` or `max`                                                                      |
 | `int`   | `aligned`        | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly.         |
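
The effect of `aligned` on how an RoI is mapped onto the feature map can be sketched as follows. This is a minimal pure-Python illustration, assuming the common RoIAlign convention of a 0.5-pixel shift in aligned mode and a minimum RoI size of 1 in legacy mode; `roi_to_feature_coords` is a hypothetical helper, not part of MMCV.

```python
def roi_to_feature_coords(roi, spatial_scale, aligned):
    """Map an RoI [batch_index, x1, y1, x2, y2] onto the feature map.

    Returns (x_start, y_start, roi_width, roi_height) in feature-map units.
    """
    _, x1, y1, x2, y2 = roi
    # Assumption: aligned mode shifts coordinates by half a pixel.
    offset = 0.5 if aligned else 0.0
    x_start = x1 * spatial_scale - offset
    y_start = y1 * spatial_scale - offset
    x_end = x2 * spatial_scale - offset
    y_end = y2 * spatial_scale - offset
    roi_width = x_end - x_start
    roi_height = y_end - y_start
    if not aligned:
        # Assumption: legacy mode forces every RoI to cover at least one pixel.
        roi_width = max(roi_width, 1.0)
        roi_height = max(roi_height, 1.0)
    return x_start, y_start, roi_width, roi_height


legacy = roi_to_feature_coords([0, 16, 16, 48, 80], 1 / 16, aligned=0)
shifted = roi_to_feature_coords([0, 16, 16, 48, 80], 1 / 16, aligned=1)
```

With `spatial_scale = 1/16`, both modes produce the same RoI size here; only the starting corner differs by the half-pixel shift.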

-### Inputs
+#### Inputs

 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
@@ -87,20 +87,20 @@ detectors.
 <dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoI coordinates are in the coordinate system of inputs[0].</dd>
 </dl>

-### Outputs
+#### Outputs

 <dl>
 <dt><tt>outputs[0]</tt>: T</dt>
 <dd>RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element output[0][r-1] is a pooled feature map corresponding to the r-th RoI inputs[1][r-1].</dd>
 </dl>

-### Type Constraints
+#### Type Constraints

 - T:tensor(float32, Linear)

-## ScatterND
+### ScatterND

-### Description
+#### Description

 ScatterND takes three inputs: a `data` tensor of rank r >= 1, an `indices` tensor of rank q >= 1, and an `updates` tensor of rank q + r - indices.shape[-1] - 1. The output is produced by creating a copy of the input `data` and then updating its values to those specified by `updates` at the index positions specified by `indices`. The output shape is the same as the shape of `data`. Note that `indices` should not have duplicate entries; that is, two or more updates to the same index location are not supported.

@@ -113,11 +113,11 @@ The `output` is calculated via the following equation:
      output[tuple(indices[idx])] = updates[idx]
 ```
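
The update rule can also be tried without NumPy. The following is a minimal sketch on nested Python lists, assuming `indices` is a flat list of index tuples (that is, an `indices` tensor of rank 2); `scatter_nd` is a hypothetical reference helper, not the TensorRT plugin itself.

```python
import copy


def scatter_nd(data, indices, updates):
    """Reference ScatterND on nested lists.

    indices: list of index tuples, each no longer than the rank of data.
    updates: one entry per index tuple (a scalar, or a sub-list for a slice).
    """
    output = copy.deepcopy(data)  # the input `data` is left untouched
    for index, update in zip(indices, updates):
        target = output
        # Walk down to the container that holds the addressed element.
        for i in index[:-1]:
            target = target[i]
        target[index[-1]] = copy.deepcopy(update)
    return output


data = [1, 2, 3, 4, 5, 6, 7, 8]
result = scatter_nd(data, [(4,), (3,), (1,), (7,)], [9, 10, 11, 12])
nested = scatter_nd([[1, 2], [3, 4]], [(1, 0)], [9])
```

Because duplicate indices are not supported, the order in which the updates are applied does not matter in this sketch.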

-### Parameters
+#### Parameters

 None

-### Inputs
+#### Inputs

 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
@@ -130,24 +130,24 @@ None
 <dd>Tensor of rank q + r - indices_shape[-1] - 1.</dd>
 </dl>

-### Outputs
+#### Outputs

 <dl>
 <dt><tt>outputs[0]</tt>: T</dt>
 <dd>Tensor of rank r >= 1.</dd>
 </dl>

-### Type Constraints
+#### Type Constraints

 - T:tensor(float32, Linear), tensor(int32, Linear)

-## NonMaxSuppression
+### NonMaxSuppression

-### Description
+#### Description

 Filter out boxes that have a high IoU overlap with previously selected boxes or a low score. Output the indices of valid boxes. Indices of invalid boxes will be filled with -1.

-### Parameters
+#### Parameters

 | Type    | Parameter                    | Description                                                                                                                          |
 | ------- | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
@@ -157,7 +157,7 @@ Filter out boxes has high IoU overlap with previously selected boxes or low scor
 | `float` | `score_threshold`            | The threshold for deciding when to remove boxes based on score.                                                                      |
 | `int`   | `offset`                     | 0 or 1, boxes' width or height is (x2 - x1 + offset).                                                                                |

-### Inputs
+#### Inputs

 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
@@ -166,7 +166,7 @@ Filter out boxes has high IoU overlap with previously selected boxes or low scor
 <dd>Input scores. 3-D tensor of shape (num_batches, num_classes, spatial_dimension).</dd>
 </dl>

-### Outputs
+#### Outputs

 <dl>
 <dt><tt>outputs[0]</tt>: tensor(int32, Linear)</dt>
@@ -175,17 +175,17 @@ Filter out boxes has high IoU overlap with previously selected boxes or low scor
 <dd>All invalid indices will be filled with -1.</dd>
 </dl>
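
The filtering described for NonMaxSuppression can be sketched as greedy suppression for a single batch and class. This is a simplified illustration under stated assumptions: boxes are given as corner coordinates, the comparisons against both thresholds are strict, and the result is padded with -1 up to the number of input boxes. `nms_single_class` and its helpers are hypothetical names, not the plugin's API.

```python
def box_area(box, offset=0):
    x1, y1, x2, y2 = box
    # Width and height follow the (x2 - x1 + offset) convention.
    return (x2 - x1 + offset) * (y2 - y1 + offset)


def iou(a, b, offset=0):
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]) + offset, 0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]) + offset, 0)
    inter = inter_w * inter_h
    union = box_area(a, offset) + box_area(b, offset) - inter
    return inter / union if union > 0 else 0.0


def nms_single_class(boxes, scores, iou_threshold, score_threshold, offset=0):
    """Return kept box indices (highest score first), padded with -1."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if scores[i] <= score_threshold:
            continue  # low-score boxes are never selected
        # Keep the box only if it does not overlap a previously kept box.
        if all(iou(boxes[i], boxes[j], offset) <= iou_threshold for j in keep):
            keep.append(i)
    return keep + [-1] * (len(boxes) - len(keep))


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
selected = nms_single_class(boxes, [0.9, 0.8, 0.7], 0.5, 0.05)
```

Here the second box overlaps the first with an IoU of about 0.68, so it is suppressed and its slot is filled with -1.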

-### Type Constraints
+#### Type Constraints

 - T:tensor(float32, Linear)

-## MMCVDeformConv2d
+### MMCVDeformConv2d

-### Description
+#### Description

 Perform Deformable Convolution on the input feature; read [Deformable Convolutional Network](https://arxiv.org/abs/1703.06211) for details.

-### Parameters
+#### Parameters

 | Type           | Parameter          | Description                                                                                                                       |
 | -------------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------- |
@@ -196,7 +196,7 @@ Perform Deformable Convolution on input feature, read [Deformable Convolutional
 | `int`          | `group`            | Split input into groups. `input_channel` should be divisible by the number of groups.                                             |
 | `int`          | `im2col_step`      | DeformableConv2d uses im2col to compute convolution. im2col_step splits the input and offset to reduce column memory usage.       |

-### Inputs
+#### Inputs

 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
@@ -207,24 +207,24 @@ Perform Deformable Convolution on input feature, read [Deformable Convolutional
 <dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
 </dl>
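
The tensor shapes involved in MMCVDeformConv2d can be checked with a small sketch. It assumes the standard convolution output-size formula and the offset layout (N, deformable_group * 2 * kH * kW, outH, outW) given in this document; `deform_conv_shapes` is a hypothetical helper for illustration only.

```python
def conv_output_size(in_size, kernel, stride, padding, dilation):
    """Standard convolution output-size formula for one spatial dimension."""
    return (in_size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def deform_conv_shapes(input_shape, weight_shape, stride, padding, dilation,
                       deformable_group):
    """Return the expected (offset_shape, output_shape) for a deformable conv."""
    n, _, in_h, in_w = input_shape
    out_channel, _, k_h, k_w = weight_shape
    out_h = conv_output_size(in_h, k_h, stride[0], padding[0], dilation[0])
    out_w = conv_output_size(in_w, k_w, stride[1], padding[1], dilation[1])
    # Two offset values (one per spatial axis) for every kernel position.
    offset_shape = (n, deformable_group * 2 * k_h * k_w, out_h, out_w)
    output_shape = (n, out_channel, out_h, out_w)
    return offset_shape, output_shape


offset_shape, output_shape = deform_conv_shapes(
    (2, 64, 32, 32), (128, 64, 3, 3),
    stride=(1, 1), padding=(1, 1), dilation=(1, 1), deformable_group=1)
```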

-### Outputs
+#### Outputs

 <dl>
 <dt><tt>outputs[0]</tt>: T</dt>
 <dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
 </dl>

-### Type Constraints
+#### Type Constraints

 - T:tensor(float32, Linear)

-## grid_sampler
+### grid_sampler

-### Description
+#### Description

 Sample from `input` at the pixel locations given by `grid`.

-### Parameters
+#### Parameters

 | Type  | Parameter            | Description                                                                                                                                                                                                                                                                                     |
 | ----- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -232,7 +232,7 @@ Perform sample from `input` with pixel locations from `grid`.
 | `int` | `padding_mode`       | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`)                                                                                                                                                                                                                |
 | `int` | `align_corners`      | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |
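
The difference between the two `align_corners` settings can be shown by mapping a normalized grid coordinate in [-1, 1] to a pixel coordinate. This is a minimal sketch assuming the convention used by PyTorch's `grid_sample`; `unnormalize` is a hypothetical helper name.

```python
def unnormalize(coord, size, align_corners):
    """Map a normalized coordinate in [-1, 1] to pixel space along one axis."""
    if align_corners:
        # -1 and 1 refer to the centers of the first and last pixels.
        return (coord + 1) / 2 * (size - 1)
    # -1 and 1 refer to the outer edges of the first and last pixels.
    return ((coord + 1) * size - 1) / 2


centers = (unnormalize(-1.0, 4, 1), unnormalize(1.0, 4, 1))
edges = (unnormalize(-1.0, 4, 0), unnormalize(1.0, 4, 0))
```

For a 4-pixel axis, `align_corners=1` maps the extrema to pixel centers 0 and 3, while `align_corners=0` maps them to the outer edges at -0.5 and 3.5.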

-### Inputs
+#### Inputs

 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
@@ -241,37 +241,37 @@ Perform sample from `input` with pixel locations from `grid`.
 <dd>Input offset; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of offset and output. </dd>
 </dl>

-### Outputs
+#### Outputs

 <dl>
 <dt><tt>outputs[0]</tt>: T</dt>
 <dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
 </dl>

-### Type Constraints
+#### Type Constraints

 - T:tensor(float32, Linear)

-## cummax
+### cummax

-### Description
+#### Description

 Returns a namedtuple (`values`, `indices`) where `values` is the cumulative maximum of elements of `input` in the dimension `dim`. And `indices` is the index location of each maximum value found in the dimension `dim`.
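
For a one-dimensional input, the two outputs can be computed as in the sketch below. It is a pure-Python illustration rather than the plugin code, and it assumes that ties resolve to the most recent index; `cummax_1d` is a hypothetical name.

```python
def cummax_1d(values):
    """Return (cumulative maxima, index of each maximum) for a 1-D sequence."""
    out_values, out_indices = [], []
    best_index = 0
    for i, value in enumerate(values):
        # Assumption: on ties, the later index wins.
        if value >= values[best_index]:
            best_index = i
        out_values.append(values[best_index])
        out_indices.append(best_index)
    return out_values, out_indices


cum_values, cum_indices = cummax_1d([1, 3, 2, 5, 4])
```

`cummin` follows the same pattern with the comparison reversed.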

-### Parameters
+#### Parameters

 | Type  | Parameter | Description                             |
 | ----- | --------- | --------------------------------------- |
 | `int` | `dim`     | The dimension to do the operation over. |

-### Inputs
+#### Inputs

 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
 <dd>The input tensor.</dd>
 </dl>

-### Outputs
+#### Outputs

 <dl>
 <dt><tt>outputs[0]</tt>: T</dt>
@@ -280,30 +280,30 @@ Returns a namedtuple (`values`, `indices`) where `values` is the cumulative maxi
 <dd>Output indices.</dd>
 </dl>

-### Type Constraints
+#### Type Constraints

 - T:tensor(float32, Linear)

-## cummin
+### cummin

-### Description
+#### Description

 Returns a namedtuple (`values`, `indices`) where `values` is the cumulative minimum of elements of `input` in the dimension `dim`. And `indices` is the index location of each minimum value found in the dimension `dim`.

-### Parameters
+#### Parameters

 | Type  | Parameter | Description                             |
 | ----- | --------- | --------------------------------------- |
 | `int` | `dim`     | The dimension to do the operation over. |

-### Inputs
+#### Inputs

 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
 <dd>The input tensor.</dd>
 </dl>

-### Outputs
+#### Outputs

 <dl>
 <dt><tt>outputs[0]</tt>: T</dt>
@@ -312,25 +312,25 @@ Returns a namedtuple (`values`, `indices`) where `values` is the cumulative mini
 <dd>Output indices.</dd>
 </dl>

-### Type Constraints
+#### Type Constraints

 - T:tensor(float32, Linear)

-## MMCVInstanceNormalization
+### MMCVInstanceNormalization

-### Description
+#### Description

 Carries out instance normalization as described in the paper https://arxiv.org/abs/1607.08022.

 y = scale * (x - mean) / sqrt(variance + epsilon) + B, where mean and variance are computed per instance per channel.
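
The formula can be evaluated directly on nested lists. The following is a minimal sketch, assuming the biased (population) variance over the H x W values of each channel of each instance; `instance_norm` is a hypothetical helper, not the plugin implementation.

```python
import math


def instance_norm(x, scale, bias, epsilon=1e-5):
    """x is a nested list of shape (N, C, H, W); scale and bias have length C."""
    output = []
    for instance in x:
        normalized_instance = []
        for c, channel in enumerate(instance):
            pixels = [v for row in channel for v in row]
            # Statistics are computed per instance and per channel.
            mean = sum(pixels) / len(pixels)
            variance = sum((v - mean) ** 2 for v in pixels) / len(pixels)
            denom = math.sqrt(variance + epsilon)
            normalized_instance.append(
                [[scale[c] * (v - mean) / denom + bias[c] for v in row]
                 for row in channel])
        output.append(normalized_instance)
    return output


y = instance_norm([[[[1.0, 3.0]]]], scale=[2.0], bias=[1.0], epsilon=0.0)
```

In this example the channel has mean 2 and variance 1, so the two values normalize to -1 and 1 before being scaled by 2 and shifted by 1.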

-### Parameters
+#### Parameters

 | Type    | Parameter | Description                                                          |
 | ------- | --------- | -------------------------------------------------------------------- |
 | `float` | `epsilon` | The epsilon value to use to avoid division by zero. Default is 1e-05 |

-### Inputs
+#### Inputs

 <dl>
 <dt><tt>input</tt>: T</dt>
@@ -341,24 +341,24 @@ y = scale * (x - mean) / sqrt(variance + epsilon) + B, where mean and variance a
 <dd>The input 1-dimensional bias tensor of size C.</dd>
 </dl>

-### Outputs
+#### Outputs

 <dl>
 <dt><tt>output</tt>: T</dt>
 <dd>The output tensor of the same shape as input.</dd>
 </dl>

-### Type Constraints
+#### Type Constraints

 - T:tensor(float32, Linear)

-## MMCVModulatedDeformConv2d
+### MMCVModulatedDeformConv2d

-### Description
+#### Description

 Perform Modulated Deformable Convolution on the input feature; read [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for details.

-### Parameters
+#### Parameters

 | Type           | Parameter          | Description                                                                           |
 | -------------- | ------------------ | ------------------------------------------------------------------------------------- |
@@ -368,7 +368,7 @@ Perform Modulated Deformable Convolution on input feature, read [Deformable Conv
 | `int`          | `deformable_group` | Groups of deformable offset.                                                          |
 | `int`          | `group`            | Split input into groups. `input_channel` should be divisible by the number of groups. |

-### Inputs
+#### Inputs

 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
@@ -383,13 +383,13 @@ Perform Modulated Deformable Convolution on input feature, read [Deformable Conv
 <dd>Input bias; 1-D tensor of shape (output_channel).</dd>
 </dl>

-### Outputs
+#### Outputs

 <dl>
 <dt><tt>outputs[0]</tt>: T</dt>
 <dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
 </dl>

-### Type Constraints
+#### Type Constraints

 - T:tensor(float32, Linear)
--- a/docs/deployment/tensorrt_plugin.md
+++ b/docs/deployment/tensorrt_plugin.md
-# TensorRT Plugins for custom operators in MMCV (Experimental)
+## TensorRT Plugins for custom operators in MMCV (Experimental)

 <!-- TOC -->

@@ -17,12 +17,12 @@

 <!-- TOC -->

-## Introduction
+### Introduction

 **NVIDIA TensorRT** is a software development kit (SDK) for high-performance inference of deep learning models. It includes a deep learning inference optimizer and runtime that deliver low latency and high throughput for deep learning inference applications. Please check its [developer's website](https://developer.nvidia.com/tensorrt) for more information.
 To ease the deployment of trained models with custom operators from `mmcv.ops` using TensorRT, a series of TensorRT plugins are included in MMCV.

-## List of TensorRT plugins supported in MMCV
+### List of TensorRT plugins supported in MMCV

 |       ONNX Operator       |                                 TensorRT Plugin                                 | MMCV Releases |
 | :-----------------------: | :-----------------------------------------------------------------------------: | :-----------: |
@@ -40,9 +40,9 @@ Notes

 - All plugins listed above are developed on TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0

-## How to build TensorRT plugins in MMCV
+### How to build TensorRT plugins in MMCV

-### Prerequisite
+#### Prerequisite

 - Clone repository

@@ -75,14 +75,14 @@ pip install $TENSORRT_DIR/graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl

 For more detailed information on installing TensorRT using tar, please refer to [NVIDIA's website](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-721/install-guide/index.html#installing-tar).

-### Build on Linux
+#### Build on Linux

 ```bash
-cd mmcv # to MMCV root directory
+cd mmcv ## to MMCV root directory
 MMCV_WITH_OPS=1 MMCV_WITH_TRT=1 pip install -e .
 ```

-## Create TensorRT engine and run inference in python
+### Create TensorRT engine and run inference in python

 Here is an example.

@@ -99,26 +99,26 @@ onnx_file = 'sample.onnx'
 trt_file = 'sample.trt'
 onnx_model = onnx.load(onnx_file)

-# Model input
+## Model input
 inputs = torch.rand(1, 3, 224, 224).cuda()
-# Model input shape info
+## Model input shape info
 opt_shape_dict = {
    'input': [list(inputs.shape),
              list(inputs.shape),
              list(inputs.shape)]
 }

-# Create TensorRT engine
+## Create TensorRT engine
 max_workspace_size = 1 << 30
 trt_engine = onnx2trt(
    onnx_model,
    opt_shape_dict,
    max_workspace_size=max_workspace_size)

-# Save TensorRT engine
+## Save TensorRT engine
 save_trt_engine(trt_engine, trt_file)

-# Run inference with TensorRT
+## Run inference with TensorRT
 trt_model = TRTWrapper(trt_file, ['input'], ['output'])

 with torch.no_grad():
@@ -127,9 +127,9 @@ with torch.no_grad():

 ```
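
The example passes the same shape three times in `opt_shape_dict`, which yields a fixed-shape engine. A minimal sketch of building such a mapping with a dynamic range is shown below, assuming the three entries are interpreted as minimum, optimal, and maximum shapes, as in a TensorRT optimization profile; `make_opt_shape_dict` is a hypothetical helper and not part of `mmcv.tensorrt`.

```python
def make_opt_shape_dict(input_name, min_shape, opt_shape, max_shape):
    """Build the {name: [min, opt, max]} mapping and sanity-check the ranges."""
    shapes = [list(min_shape), list(opt_shape), list(max_shape)]
    if len({len(shape) for shape in shapes}) != 1:
        raise ValueError('min, opt and max shapes must have the same rank')
    for low, opt, high in zip(*shapes):
        if not low <= opt <= high:
            raise ValueError('each dimension must satisfy min <= opt <= max')
    return {input_name: shapes}


opt_shape_dict = make_opt_shape_dict(
    'input', (1, 3, 224, 224), (1, 3, 224, 224), (4, 3, 448, 448))
```

The resulting dictionary has the same structure as the one passed to `onnx2trt` in the example.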

-## How to add a TensorRT plugin for custom op in MMCV
+### How to add a TensorRT plugin for custom op in MMCV

-### Main procedures
+#### Main procedures

 Below are the main steps:

@@ -161,15 +161,15 @@ Below are the main steps:
 5. Add unit test into `tests/test_ops/test_tensorrt.py`
   Check [here](https://github.com/open-mmlab/mmcv/blob/master/tests/test_ops/test_tensorrt.py) for examples.

-### Reminders
+#### Reminders

 - Some of the [custom ops](https://mmcv.readthedocs.io/en/latest/ops.html) in `mmcv` have cuda implementations, which can be used as references.

-## Known Issues
+### Known Issues

 - None

-## References
+### References

 - [Developer guide of Nvidia TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html)
 - [TensorRT Open Source Software](https://github.com/NVIDIA/TensorRT)

--- a/docs_zh_CN/deployment/tensorrt_custom_ops.md
+++ b/docs_zh_CN/deployment/tensorrt_custom_ops.md
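-# TensorRT Custom Ops
+## TensorRT Custom Ops

-Anyone interested is welcome to help translate the MMCV documentation. If you would like to contribute, please open an issue at [MMCV issue](https://github.com/open-mmlab/mmcv/issues) to confirm which document you will translate.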
+<!-- TOC -->
+
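+- [TensorRT Custom Ops](#tensorrt-custom-ops)
+  - [MMCVRoIAlign](#mmcvroialign)
+    - [Description](#description)
+    - [Parameters](#parameters)
+    - [Inputs](#inputs)
+    - [Outputs](#outputs)
+    - [Type Constraints](#type-constraints)
+  - [ScatterND](#scatternd)
+    - [Description](#description-1)
+    - [Parameters](#parameters-1)
+    - [Inputs](#inputs-1)
+    - [Outputs](#outputs-1)
+    - [Type Constraints](#type-constraints-1)
+  - [NonMaxSuppression](#nonmaxsuppression)
+    - [Description](#description-2)
+    - [Parameters](#parameters-2)
+    - [Inputs](#inputs-2)
+    - [Outputs](#outputs-2)
+    - [Type Constraints](#type-constraints-2)
+  - [MMCVDeformConv2d](#mmcvdeformconv2d)
+    - [Description](#description-3)
+    - [Parameters](#parameters-3)
+    - [Inputs](#inputs-3)
+    - [Outputs](#outputs-3)
+    - [Type Constraints](#type-constraints-3)
+  - [grid_sampler](#grid_sampler)
+    - [Description](#description-4)
+    - [Parameters](#parameters-4)
+    - [Inputs](#inputs-4)
+    - [Outputs](#outputs-4)
+    - [Type Constraints](#type-constraints-4)
+  - [cummax](#cummax)
+    - [Description](#description-5)
+    - [Parameters](#parameters-5)
+    - [Inputs](#inputs-5)
+    - [Outputs](#outputs-5)
+    - [Type Constraints](#type-constraints-5)
+  - [cummin](#cummin)
+    - [Description](#description-6)
+    - [Parameters](#parameters-6)
+    - [Inputs](#inputs-6)
+    - [Outputs](#outputs-6)
+    - [Type Constraints](#type-constraints-6)
+  - [MMCVInstanceNormalization](#mmcvinstancenormalization)
+    - [Description](#description-7)
+    - [Parameters](#parameters-7)
+    - [Inputs](#inputs-7)
+    - [Outputs](#outputs-7)
+    - [Type Constraints](#type-constraints-7)
+  - [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d)
+    - [Description](#description-8)
+    - [Parameters](#parameters-8)
+    - [Inputs](#inputs-8)
+    - [Outputs](#outputs-8)
+    - [Type Constraints](#type-constraints-8)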
+
+<!-- TOC -->
+
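+### MMCVRoIAlign
+
+#### Description
+
+Perform RoIAlign on the feature map; used in the bbox_head of most two-stage detectors.
+
+#### Parameters
+
+| Type    | Parameter        | Description                                                     |
+| ------- | ---------------- | --------------------------------------------------------------- |
+| `int`   | `output_height`  | Height of the output RoI feature                                |
+| `int`   | `output_width`   | Width of the output RoI feature                                 |
+| `float` | `spatial_scale`  | Scale factor applied to the input boxes                         |
+| `int`   | `sampling_ratio` | Sampling ratio for the output. `0` means dense sampling         |
+| `str`   | `mode`           | Pooling mode. `avg` or `max`                                    |
+| `int`   | `aligned`        | If `aligned=1`, pixels are shifted by -0.5 for better alignment |
+
+#### Inputs
+
+<dl>
+<dt><tt>inputs[0]</tt>: T</dt>
+<dd>Input feature map; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of input channels, and H and W are the height and width of the input feature map.</dd>
+<dt><tt>inputs[1]</tt>: T</dt>
+<dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoI coordinates are in the coordinate system of the input feature map.</dd>
+</dl>
+
+#### Outputs
+
+<dl>
+<dt><tt>outputs[0]</tt>: T</dt>
+<dd>Pooled output; 4-D tensor of shape (num_rois, C, output_height, output_width). Each output feature feat[i] corresponds one-to-one to the input RoI rois[i].</dd>
+</dl>
+
+#### Type Constraints
+
+- T:tensor(float32, Linear)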
+
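+### ScatterND
+
+#### Description
+
+ScatterND takes three inputs: a `data` tensor of rank r >= 1, an `indices` tensor of rank q >= 1, and an `updates` tensor of rank q + r - indices.shape[-1] - 1. The output is produced by first creating a copy of `data` and then updating that copy with the values in `updates` at the positions given by `indices`. Note that `indices` should not contain duplicate entries; that is, updating the same location more than once is not supported.
+
+The output can be computed as in the following code:
+
+```python
+  import numpy as np
+
+  output = np.copy(data)
+  update_indices = indices.shape[:-1]
+  for idx in np.ndindex(update_indices):
+      # Convert the index vector to a tuple so it addresses a single location
+      output[tuple(indices[idx])] = updates[idx]
+```
+
+#### Parameters
+
+None
+
+#### Inputs
+
+<dl>
+<dt><tt>inputs[0]</tt>: T</dt>
+<dd>Input `data` of rank r >= 1.</dd>
+
+<dt><tt>inputs[1]</tt>: tensor(int32, Linear)</dt>
+<dd>Input `indices` of rank q >= 1.</dd>
+
+<dt><tt>inputs[2]</tt>: T</dt>
+<dd>Input `updates` of rank q + r - indices.shape[-1] - 1.</dd>
+</dl>
+
+#### Outputs
+
+<dl>
+<dt><tt>outputs[0]</tt>: T</dt>
+<dd>Output tensor of rank r >= 1.</dd>
+</dl>
+
+#### Type Constraints
+
+- T:tensor(float32, Linear), tensor(int32, Linear)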
+
+### NonMaxSuppression
+
+#### Description
+
+Perform non-maximum suppression on candidate boxes according to the IoU threshold.
+
+#### Parameters
+
+| Type    | Parameter                    | Description                                                                                                  |
+| ------- | ---------------------------- | ------------------------------------------------------------------------------------------------------------ |
+| `int`   | `center_point_box`           | 0: boxes are [y1, x1, y2, x2]; 1: boxes are [x_center, y_center, width, height]                              |
+| `int`   | `max_output_boxes_per_class` | Maximum number of output boxes per class. Default is 0, which outputs as many boxes as there are input boxes |
+| `float` | `iou_threshold`              | Threshold for deciding whether boxes overlap, in the range [0, 1]. Default is 0                              |
+| `float` | `score_threshold`            | Threshold for deciding whether a box is valid                                                                |
+| `int`   | `offset`                     | Box width and height are computed as (x2 - x1 + offset); 0 or 1                                              |
+
+#### Inputs
+
+<dl>
+<dt><tt>inputs[0]</tt>: T</dt>
+<dd>Input boxes. 3-D tensor of shape (num_batches, spatial_dimension, 4).</dd>
+<dt><tt>inputs[1]</tt>: T</dt>
+<dd>Input scores. 3-D tensor of shape (num_batches, num_classes, spatial_dimension).</dd>
+</dl>
+
+#### Outputs
+
+<dl>
+<dt><tt>outputs[0]</tt>: tensor(int32, Linear)</dt>
+<dd>Indices of the selected boxes. 2-D tensor of shape (num_selected_indices, 3). Each row is [batch_index, class_index, box_index].</dd>
+<dd>Here num_selected_indices = num_batches * num_classes * min(max_output_boxes_per_class, spatial_dimension).</dd>
+<dd>Indices of all boxes that are not selected are filled with -1.</dd>
+</dl>
+
+#### Type Constraints
+
+- T:tensor(float32, Linear)
+
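+### MMCVDeformConv2d
+
+#### Description
+
+Perform Deformable Convolution on the input feature; read [Deformable Convolutional Network](https://arxiv.org/abs/1703.06211) for more details.
+
+#### Parameters
+
+| Type           | Parameter          | Description                                                                                                                       |
+| -------------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------- |
+| `list of ints` | `stride`           | Stride of the convolution (sH, sW)                                                                                                |
+| `list of ints` | `padding`          | Padding added to the input feature (padH, padW)                                                                                   |
+| `list of ints` | `dilation`         | Spacing between kernel elements (dH, dW)                                                                                          |
+| `int`          | `deformable_group` | Number of deformable offset groups                                                                                                |
+| `int`          | `group`            | Number of convolution groups; `input_channel` is split into this many groups                                                      |
+| `int`          | `im2col_step`      | Deformable convolution uses im2col. The input and offset are processed in chunks of im2col_step to reduce temporary memory usage. |
+
+#### Inputs
+
+<dl>
+<dt><tt>inputs[0]</tt>: T</dt>
+<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of input channels, and inH and inW are the height and width of the input feature map.</dd>
+<dt><tt>inputs[1]</tt>: T</dt>
+<dd>Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW are the height and width of the weight (kernel), and outH and outW are the height and width of the offset and output.</dd>
+<dt><tt>inputs[2]</tt>: T</dt>
+<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
+</dl>
+
+#### Outputs
+
+<dl>
+<dt><tt>outputs[0]</tt>: T</dt>
+<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
+</dl>
+
+#### Type Constraints
+
+- T:tensor(float32, Linear)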
+
+### grid_sampler
+
+#### Description
+
+Sample `input` on a grid, using the pixel locations given by `grid`.
+
+#### Parameters
+
+| Type  | Parameter            | Description |
+| ----- | -------------------- | ----------- |
+| `int` | `interpolation_mode` | Interpolation mode used to compute the output. (0: `bilinear`, 1: `nearest`) |
+| `int` | `padding_mode`       | Padding mode for values outside the grid. (0: `zeros`, 1: `border`, 2: `reflection`) |
+| `int` | `align_corners`      | If `align_corners=1`, the extrema (`-1` and `1`) refer to the center points of the input's corner pixels. If `align_corners=0`, they refer to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |
+
+#### Inputs
+
+<dl>
+<dt><tt>inputs[0]</tt>: T</dt>
+<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of input channels, and inH and inW are the height and width of the input feature map.</dd>
+<dt><tt>inputs[1]</tt>: T</dt>
+<dd>Input grid; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of the output.</dd>
+</dl>
+
+#### Outputs
+
+<dl>
+<dt><tt>outputs[0]</tt>: T</dt>
+<dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
+</dl>
+
+#### Type Constraints
+
+- T:tensor(float32, Linear)
+
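+### cummax
+
+#### Description
+
+Returns a tuple (`values`, `indices`), where `values` is the cumulative maximum of `input` along dimension `dim` and `indices` is the location of each maximum along dimension `dim`. Read [torch.cummax](https://pytorch.org/docs/stable/generated/torch.cummax.html) for more details.
+
+#### Parameters
+
+| Type  | Parameter | Description                      |
+| ----- | --------- | -------------------------------- |
+| `int` | `dim`     | The dimension to accumulate over |
+
+#### Inputs
+
+<dl>
+<dt><tt>inputs[0]</tt>: T</dt>
+<dd>Input tensor; can be of any shape.</dd>
+</dl>
+
+#### Outputs
+
+<dl>
+<dt><tt>outputs[0]</tt>: T</dt>
+<dd>Cumulative maximum of `input` along dimension `dim`, with the same shape and type as `input`.</dd>
+<dt><tt>outputs[1]</tt>: tensor(int32, Linear)</dt>
+<dd>Location of each maximum along dimension `dim`, with the same shape as `input`.</dd>
+</dl>
+
+#### Type Constraints
+
+- T:tensor(float32, Linear)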
+
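+### cummin
+
+#### Description
+
+Returns a tuple (`values`, `indices`), where `values` is the cumulative minimum of `input` along dimension `dim` and `indices` is the location of each minimum along dimension `dim`. Read [torch.cummin](https://pytorch.org/docs/stable/generated/torch.cummin.html) for more details.
+
+#### Parameters
+
+| Type  | Parameter | Description                      |
+| ----- | --------- | -------------------------------- |
+| `int` | `dim`     | The dimension to accumulate over |
+
+#### Inputs
+
+<dl>
+<dt><tt>inputs[0]</tt>: T</dt>
+<dd>Input tensor; can be of any shape.</dd>
+</dl>
+
+#### Outputs
+
+<dl>
+<dt><tt>outputs[0]</tt>: T</dt>
+<dd>Cumulative minimum of `input` along dimension `dim`, with the same shape and type as `input`.</dd>
+<dt><tt>outputs[1]</tt>: tensor(int32, Linear)</dt>
+<dd>Location of each minimum along dimension `dim`, with the same shape as `input`.</dd>
+</dl>
+
+#### Type Constraints
+
+- T:tensor(float32, Linear)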
+
+### MMCVInstanceNormalization
+
+#### Description
+
+Perform instance normalization on the feature; read [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022) for more details.
+
+#### Parameters
+
+| Type    | Parameter | Description                                      |
+| ------- | --------- | ------------------------------------------------ |
+| `float` | `epsilon` | Used to avoid division by zero. Default is 1e-05 |
+
+#### Inputs
+
+<dl>
+<dt><tt>inputs[0]</tt>: T</dt>
+<dd>Input feature. 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of input channels, and H and W are the height and width of the input feature map.</dd>
+<dt><tt>inputs[1]</tt>: T</dt>
+<dd>Input scale. 1-D tensor of shape (C,).</dd>
+<dt><tt>inputs[2]</tt>: T</dt>
+<dd>Input bias. 1-D tensor of shape (C,).</dd>
+</dl>
+
+#### Outputs
+
+<dl>
+<dt><tt>outputs[0]</tt>: T</dt>
+<dd>Output feature. 4-D tensor of shape (N, C, H, W).</dd>
+</dl>
+
+#### Type Constraints
+
+- T:tensor(float32, Linear)
+
+### MMCVModulatedDeformConv2d
+
+#### Description
+
+Perform Modulated Deformable Convolution on the input feature; read [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for more details.
+
+#### Parameters
+
+| Type           | Parameter           | Description                                                                  |
+| -------------- | ------------------- | ---------------------------------------------------------------------------- |
+| `list of ints` | `stride`            | Stride of the convolution (sH, sW)                                           |
+| `list of ints` | `padding`           | Padding added to the input feature (padH, padW)                              |
+| `list of ints` | `dilation`          | Spacing between kernel elements (dH, dW)                                     |
+| `int`          | `deformable_groups` | Number of deformable offset groups; 1 is usually sufficient                  |
+| `int`          | `groups`            | Number of convolution groups; `input_channel` is split into this many groups |
+
+#### Inputs
+
+<dl>
+<dt><tt>inputs[0]</tt>: T</dt>
+<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of input channels, and inH and inW are the height and width of the input feature map.</dd>
+<dt><tt>inputs[1]</tt>: T</dt>
+<dd>Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW are the height and width of the weight (kernel), and outH and outW are the height and width of the offset and output.</dd>
+<dt><tt>inputs[2]</tt>: T</dt>
+<dd>Input mask; 4-D tensor of shape (N, deformable_group* kH* kW, outH, outW).</dd>
+<dt><tt>inputs[3]</tt>: T</dt>
+<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
+<dt><tt>inputs[4]</tt>: T, optional</dt>
+<dd>Input bias; 1-D tensor of shape (output_channel).</dd>
+</dl>
+
+#### Outputs
+
+<dl>
+<dt><tt>outputs[0]</tt>: T</dt>
+<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
+</dl>
+
+#### Type Constraints
+
+- T:tensor(float32, Linear)
--- a/docs_zh_CN/deployment/tensorrt_plugin.md
+++ b/docs_zh_CN/deployment/tensorrt_plugin.md
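-# TensorRT Plugins for Custom Operators in MMCV (Experimental)
+## TensorRT Custom Operators in MMCV (Experimental)

-Anyone interested is welcome to help translate the MMCV documentation. If you would like to contribute, please open an issue at [MMCV issue](https://github.com/open-mmlab/mmcv/issues) to confirm which document you will translate.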
+<!-- TOC -->
+
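+- [TensorRT Custom Operators in MMCV (Experimental)](#tensorrt-custom-operators-in-mmcv-experimental)
+  - [Introduction](#introduction)
+  - [List of TensorRT plugins in MMCV](#list-of-tensorrt-plugins-in-mmcv)
+  - [How to build TensorRT plugins in MMCV](#how-to-build-tensorrt-plugins-in-mmcv)
+    - [Prerequisite](#prerequisite)
+    - [Build on Linux](#build-on-linux)
+  - [Create TensorRT engine and run inference in python](#create-tensorrt-engine-and-run-inference-in-python)
+  - [How to add a new TensorRT custom operator in MMCV](#how-to-add-a-new-tensorrt-custom-operator-in-mmcv)
+    - [Main procedures](#main-procedures)
+    - [Reminders](#reminders)
+  - [Known Issues](#known-issues)
+  - [References](#references)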
+
+<!-- TOC -->
+
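+### Introduction
+
+**NVIDIA TensorRT** is a software development kit (SDK) for high-performance inference of deep learning models. It includes a deep learning inference optimizer and runtime that deliver low latency and high throughput for deep learning inference applications. Please visit the [developer's website](https://developer.nvidia.com/tensorrt) for more information.
+To simplify deploying models that use MMCV custom operators with TensorRT, a series of TensorRT plugins has been added to MMCV.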
+
+### List of TensorRT plugins in MMCV
+
+|       ONNX Operator       |                                 TensorRT Plugin                                 | MMCV Releases |
+| :-----------------------: | :-----------------------------------------------------------------------------: | :-----------: |
+|       MMCVRoiAlign        |              [MMCVRoiAlign](./tensorrt_custom_ops.md#mmcvroialign)              |  1.2.6   |
+|         ScatterND         |                 [ScatterND](./tensorrt_custom_ops.md#scatternd)                 |  1.2.6   |
+|     NonMaxSuppression     |         [NonMaxSuppression](./tensorrt_custom_ops.md#nonmaxsuppression)         |  1.3.0   |
+|     MMCVDeformConv2d      |          [MMCVDeformConv2d](./tensorrt_custom_ops.md#mmcvdeformconv2d)          |  1.3.0   |
+|       grid_sampler        |              [grid_sampler](./tensorrt_custom_ops.md#grid-sampler)              |  1.3.1   |
+|          cummax           |                    [cummax](./tensorrt_custom_ops.md#cummax)                    |  1.3.5   |
+|          cummin           |                    [cummin](./tensorrt_custom_ops.md#cummin)                    |  1.3.5   |
+| MMCVInstanceNormalization | [MMCVInstanceNormalization](./tensorrt_custom_ops.md#mmcvinstancenormalization) |  1.3.5   |
+| MMCVModulatedDeformConv2d | [MMCVModulatedDeformConv2d](./tensorrt_custom_ops.md#mmcvmodulateddeformconv2d) |  master  |
+
+Notes
+
+- All plugins listed above were developed on TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0
+
+### How to build TensorRT plugins in MMCV
+
+#### Prerequisite
+
+- Clone repository
+
+```bash
+git clone https://github.com/open-mmlab/mmcv.git
+```
+
+- Install TensorRT
+
+Download the corresponding TensorRT build from [NVIDIA Developer Zone](https://developer.nvidia.com/nvidia-tensorrt-download).
+
+For example, for Ubuntu 16.04 on x86-64 with cuda-10.2, the downloaded file is `TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0.tar.gz`.
+
+Then install it and configure the environment as below:
+
+```bash
+cd ~/Downloads
+tar -xvzf TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0.tar.gz
+export TENSORRT_DIR=`pwd`/TensorRT-7.2.1.6
+export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TENSORRT_DIR/lib
+```
+
+Install the python packages: tensorrt, graphsurgeon, onnx-graphsurgeon
+
+```bash
+pip install $TENSORRT_DIR/python/tensorrt-7.2.1.6-cp37-none-linux_x86_64.whl
+pip install $TENSORRT_DIR/onnx_graphsurgeon/onnx_graphsurgeon-0.2.6-py2.py3-none-any.whl
+pip install $TENSORRT_DIR/graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
+```
+
+For more detailed information on installing TensorRT from the tar package, please refer to [NVIDIA's website](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-721/install-guide/index.html#installing-tar).
+
+#### Build on Linux
+
+```bash
+cd mmcv ## to MMCV root directory
+MMCV_WITH_OPS=1 MMCV_WITH_TRT=1 pip install -e .
+```
+
+### Create TensorRT engine and run inference in python
+
+Here is an example:
+
+```python
+import torch
+import onnx
+
+from mmcv.tensorrt import (TRTWrapper, onnx2trt, save_trt_engine,
+                                   is_tensorrt_plugin_loaded)
+
+assert is_tensorrt_plugin_loaded(), 'Requires to compile TensorRT plugins in mmcv'
+
+onnx_file = 'sample.onnx'
+trt_file = 'sample.trt'
+onnx_model = onnx.load(onnx_file)
+
+## Model input
+inputs = torch.rand(1, 3, 224, 224).cuda()
+## Model input shape info
+opt_shape_dict = {
+    'input': [list(inputs.shape),
+              list(inputs.shape),
+              list(inputs.shape)]
+}
+
+## Create TensorRT engine
+max_workspace_size = 1 << 30
+trt_engine = onnx2trt(
+    onnx_model,
+    opt_shape_dict,
+    max_workspace_size=max_workspace_size)
+
+## Save TensorRT engine
+save_trt_engine(trt_engine, trt_file)
+
+## Run inference with TensorRT
+trt_model = TRTWrapper(trt_file, ['input'], ['output'])
+
+with torch.no_grad():
+    trt_outputs = trt_model({'input': inputs})
+    output = trt_outputs['output']
+
+```
+
+### How to add a new TensorRT custom operator in MMCV
+
+#### Main procedures
+
+Below are the main steps:
+
+1. Add the c++ header file
+2. Add the c++ source file
+3. Add the cuda kernel file
+4. Register the plugin in `trt_plugin.cpp`
+5. Add unit tests in `tests/test_ops/test_tensorrt.py`
+
+**Take the RoIAlign plugin `roi_align` as an example.**
+
+1. Add the header file `trt_roi_align.hpp` to the TensorRT include directory `mmcv/ops/csrc/tensorrt/`
+2. Add the source file `trt_roi_align.cpp` to the TensorRT source directory `mmcv/ops/csrc/tensorrt/plugins/`
+3. Add the cuda kernel file `trt_roi_align_kernel.cu` to the TensorRT source directory `mmcv/ops/csrc/tensorrt/plugins/`
+4. Register the `roi_align` plugin in [trt_plugin.cpp](https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/csrc/tensorrt/plugins/trt_plugin.cpp)
+
+    ```c++
+    #include "trt_plugin.hpp"
+
+    #include "trt_roi_align.hpp"
+
+    REGISTER_TENSORRT_PLUGIN(RoIAlignPluginDynamicCreator);
+
+    extern "C" {
+    bool initLibMMCVInferPlugins() { return true; }
+    }  // extern "C"
+    ```
+
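+5. Add unit tests in `tests/test_ops/test_tensorrt.py`
+
+#### Reminders
+
+- Some of the custom ops in MMCV have corresponding cuda implementations, which can serve as references when developing TensorRT plugins.
+
+### Known Issues
+
+- None
+
+### References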
+
+- [Developer guide of Nvidia TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html)
+- [TensorRT Open Source Software](https://github.com/NVIDIA/TensorRT)
+- [onnx-tensorrt](https://github.com/onnx/onnx-tensorrt)
+- [TensorRT python API](https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/index.html)
+- [TensorRT c++ plugin API](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_plugin.html)