Unverified Commit 2b39d7a8 authored by q.yao, committed by GitHub

[Docs] Add zh_cn document of ONNX (#1331)



* add doc-cn of ONNX

* Update docs_zh_CN/deployment/onnxruntime_custom_ops.md
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* update doc of cummax

* fix en doc of softnms

* update heading
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
parent 6f533ff1
## Introduction of onnx module in MMCV (Experimental)

### register_extra_symbolics
Some extra symbolic functions need to be registered before exporting PyTorch model to ONNX.
#### Example
```python
import mmcv
from mmcv.onnx import register_extra_symbolics

opset_version = 11
register_extra_symbolics(opset_version)
```
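Once the extra symbolic functions are registered, the model can be exported as usual with `torch.onnx.export`. Below is a minimal sketch; the toy module and the output file name are only illustrative, not part of MMCV.

```python
import torch

from mmcv.onnx import register_extra_symbolics

opset_version = 11
register_extra_symbolics(opset_version)


class TinyModel(torch.nn.Module):
    """A toy module; in practice, export the model whose ops rely on the patched symbolics."""

    def forward(self, x):
        return torch.nn.functional.interpolate(
            x, scale_factor=2, mode='bilinear', align_corners=False)


model = TinyModel().eval()
dummy_input = torch.rand(1, 3, 32, 32)
torch.onnx.export(model, dummy_input, 'tiny.onnx', opset_version=opset_version)
```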
#### FAQs
- None
## Onnxruntime Custom Ops
<!-- TOC -->
- [Onnxruntime Custom Ops](#onnxruntime-custom-ops)
  - [SoftNMS](#softnms)
    - [Description](#description)
    - [Parameters](#parameters)
    - [Inputs](#inputs)
    - [Outputs](#outputs)
    - [Type Constraints](#type-constraints)
  - [RoIAlign](#roialign)
    - [Description](#description-1)
    - [Parameters](#parameters-1)
    - [Inputs](#inputs-1)
    - [Outputs](#outputs-1)
    - [Type Constraints](#type-constraints-1)
  - [NMS](#nms)
    - [Description](#description-2)
    - [Parameters](#parameters-2)
    - [Inputs](#inputs-2)
    - [Outputs](#outputs-2)
    - [Type Constraints](#type-constraints-2)
  - [grid_sampler](#grid_sampler)
    - [Description](#description-3)
    - [Parameters](#parameters-3)
    - [Inputs](#inputs-3)
    - [Outputs](#outputs-3)
    - [Type Constraints](#type-constraints-3)
  - [CornerPool](#cornerpool)
    - [Description](#description-4)
    - [Parameters](#parameters-4)
    - [Inputs](#inputs-4)
    - [Outputs](#outputs-4)
    - [Type Constraints](#type-constraints-4)
  - [cummax](#cummax)
    - [Description](#description-5)
    - [Parameters](#parameters-5)
    - [Inputs](#inputs-5)
    - [Outputs](#outputs-5)
    - [Type Constraints](#type-constraints-5)
  - [cummin](#cummin)
    - [Description](#description-6)
    - [Parameters](#parameters-6)
    - [Inputs](#inputs-6)
    - [Outputs](#outputs-6)
    - [Type Constraints](#type-constraints-6)
  - [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d)
    - [Description](#description-7)
    - [Parameters](#parameters-7)
    - [Inputs](#inputs-7)
    - [Outputs](#outputs-7)
    - [Type Constraints](#type-constraints-7)
<!-- TOC -->
### SoftNMS

#### Description
Perform soft NMS on `boxes` with `scores`. Read [Soft-NMS -- Improving Object Detection With One Line of Code](https://arxiv.org/abs/1704.04503) for detail.
#### Parameters
| Type | Parameter | Description |
| ------- | --------------- | -------------------------------------------------------------- |
| `float` | `iou_threshold` | IoU threshold for deciding whether boxes overlap too much. Value range [0, 1]. Default to 0.  |
| `float` | `sigma`         | hyperparameter for the gaussian method                                                         |
| `float` | `min_score`     | score threshold of NMS                                                                         |
| `int` | `method` | method to do the nms, (0: `naive`, 1: `linear`, 2: `gaussian`) |
| `int` | `offset` | `boxes` width or height is (x2 - x1 + offset). (0 or 1) |
#### Inputs
<dl>
<dt><tt>boxes</tt>: T</dt>
<dd>Input boxes. 2-D tensor of shape (N, 4). N is the number of boxes.</dd>
<dt><tt>scores</tt>: T</dt>
<dd>Input scores. 1-D tensor of shape (N, ).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>dets</tt>: T</dt>
<dd>Output boxes and scores. 2-D tensor of shape (num_valid_boxes, 5), [[x1, y1, x2, y2, score], ...]. num_valid_boxes is the number of valid boxes.</dd>
<dt><tt>indices</tt>: tensor(int64)</dt>
<dd>Output indices. 1-D tensor of shape (num_valid_boxes, ).</dd>
</dl>
#### Type Constraints
- T:tensor(float32)
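For reference, here is a small sketch of the eager `mmcv.ops.soft_nms` call that this custom op corresponds to. The keyword names follow the attribute table above; the exact Python signature should be checked against your mmcv version.

```python
import torch
from mmcv.ops import soft_nms

boxes = torch.tensor([[0., 0., 10., 10.],
                      [1., 1., 11., 11.],
                      [20., 20., 30., 30.]])   # (N, 4)
scores = torch.tensor([0.9, 0.8, 0.7])         # (N, )

# linear decay (method 1); 'gaussian' would correspond to method 2
dets, inds = soft_nms(boxes, scores, iou_threshold=0.3, sigma=0.5,
                      min_score=1e-3, method='linear')
print(dets.shape)   # (num_valid_boxes, 5): [x1, y1, x2, y2, score]
print(inds.shape)   # (num_valid_boxes, )
```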
### RoIAlign

#### Description
Perform RoIAlign on output feature, used in bbox_head of most two-stage detectors.
#### Parameters
| Type | Parameter | Description |
| ------- | ---------------- | ------------------------------------------------------------------------------------------------------------- |
| `int`   | `output_height`  | height of the output roi                                                                                        |
| `int`   | `output_width`   | width of the output roi                                                                                         |
| `float` | `spatial_scale`  | scale factor used to map the input boxes to the feature map                                                     |
| `int`   | `sampling_ratio` | number of input samples to take for each output sample. `0` means to take samples densely.                      |
| `str` | `mode` | pooling mode in each bin. `avg` or `max` |
| `int` | `aligned` | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly. |
#### Inputs
<dl>
<dt><tt>input</tt>: T</dt>
<dd>Input feature map; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and width of the data.</dd>
<dt><tt>rois</tt>: T</dt>
<dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are given in the coordinate system of the input.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>feat</tt>: T</dt>
<dd>RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element feat[r-1] is a pooled feature map corresponding to the r-th RoI RoIs[r-1].</dd>
</dl>
#### Type Constraints
- T:tensor(float32)
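A rough sketch of the corresponding eager computation with the `mmcv.ops.RoIAlign` module; the tensor shapes follow the spec above, while the constructor keywords are assumptions to be checked against your mmcv version.

```python
import torch
from mmcv.ops import RoIAlign

feat = torch.rand(1, 16, 32, 32)               # (N, C, H, W)
rois = torch.tensor([[0., 4., 4., 20., 20.]])  # (num_rois, 5): [batch_index, x1, y1, x2, y2]

roi_layer = RoIAlign(output_size=(7, 7), spatial_scale=1.0,
                     sampling_ratio=0, aligned=True)
pooled = roi_layer(feat, rois)
print(pooled.shape)   # (num_rois, C, output_height, output_width) -> (1, 16, 7, 7)
```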
### NMS

#### Description
Filter out boxes that have a high IoU overlap with previously selected boxes.
#### Parameters
| Type | Parameter | Description |
| ------- | --------------- | ---------------------------------------------------------------------------------------------------------------- |
| `float` | `iou_threshold` | The threshold for deciding whether boxes overlap too much with respect to IoU. Value range [0, 1]. Default to 0. |
| `int` | `offset` | 0 or 1, boxes' width or height is (x2 - x1 + offset). |
#### Inputs
<dl>
<dt><tt>bboxes</tt>: T</dt>
<dd>Input boxes. 2-D tensor of shape (num_boxes, 4).</dd>
<dt><tt>scores</tt>: T</dt>
<dd>Input scores. 1-D tensor of shape (num_boxes, ).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>indices</tt>: tensor(int32, Linear)</dt>
<dd>Selected indices. 1-D tensor of shape (num_valid_boxes, ). num_valid_boxes is the number of valid boxes.</dd>
</dl>
#### Type Constraints
- T:tensor(float32)
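A small sketch of the eager `mmcv.ops.nms` call this custom op mirrors; note that the ONNX op itself only returns the selected indices.

```python
import torch
from mmcv.ops import nms

boxes = torch.tensor([[0., 0., 10., 10.],
                      [1., 1., 11., 11.],
                      [20., 20., 30., 30.]])   # (num_boxes, 4)
scores = torch.tensor([0.9, 0.8, 0.7])         # (num_boxes, )

dets, inds = nms(boxes, scores, iou_threshold=0.5, offset=0)
print(inds)   # indices of the kept boxes, here tensor([0, 2])
```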
### grid_sampler

#### Description
Perform grid sampling from `input` at the pixel locations specified by `grid`.
#### Parameters
| Type | Parameter | Description |
| ----- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `int` | `interpolation_mode` | Interpolation mode to calculate output. (0: `bilinear`, 1: `nearest`)                                                                                                                                                                                                              |
| `int` | `padding_mode` | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`) |
| `int` | `align_corners` | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |
#### Inputs
<dl>
<dt><tt>input</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, and inH and inW are the height and width of the data.</dd>
<dt><tt>grid</tt>: T</dt>
<dd>Input sampling grid; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of the grid and the output.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>output</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
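This custom op mirrors `torch.nn.functional.grid_sample`; a quick eager sketch of the shapes involved (the identity grid used here is only illustrative).

```python
import torch
import torch.nn.functional as F

feat = torch.rand(1, 3, 8, 8)   # (N, C, inH, inW)

# Identity sampling grid in normalized [-1, 1] coordinates, shape (N, outH, outW, 2)
theta = torch.tensor([[[1., 0., 0.],
                       [0., 1., 0.]]])
grid = F.affine_grid(theta, size=(1, 3, 8, 8), align_corners=False)

out = F.grid_sample(feat, grid, mode='bilinear',
                    padding_mode='zeros', align_corners=False)
print(out.shape)   # (N, C, outH, outW) -> (1, 3, 8, 8)
```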
### CornerPool

#### Description
Perform CornerPool on `input` features. Read [CornerNet -- Detecting Objects as Paired Keypoints](https://arxiv.org/abs/1808.01244) for more details.
#### Parameters

| Type  | Parameter | Description                                                       |
| ----- | --------- | ----------------------------------------------------------------- |
| `int` | `mode`    | corner pool mode, (0: `top`, 1: `bottom`, 2: `left`, 3: `right`)  |
#### Inputs
<dl>
<dt><tt>input</tt>: T</dt>
<dd>Input features. 4-D tensor of shape (N, C, H, W). N is the batch size.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>output</tt>: T</dt>
<dd>Output the pooled features. 4-D tensor of shape (N, C, H, W).</dd>
</dl>
#### Type Constraints
- T:tensor(float32)
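A brief sketch using the eager `mmcv.ops.CornerPool` module that this op corresponds to; the string mode here is assumed to map to the integer `mode` attribute above.

```python
import torch
from mmcv.ops import CornerPool

x = torch.rand(2, 8, 16, 16)   # (N, C, H, W)

top_pool = CornerPool('top')   # 'top' corresponds to mode 0 in the table above
y = top_pool(x)
print(y.shape)                 # (N, C, H, W), same shape as the input
```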
### cummax

#### Description
Returns a tuple (`values`, `indices`), where `values` contains the cumulative maximum elements of `input` in the dimension `dim`, and `indices` is the index location of each maximum value found in that dimension. Read [torch.cummax](https://pytorch.org/docs/stable/generated/torch.cummax.html) for more details.
#### Parameters

| Type  | Parameter | Description                            |
| ----- | --------- | -------------------------------------- |
| `int` | `dim`     | the dimension to do the operation over |
#### Inputs
<dl>
<dt><tt>input</tt>: T</dt>
<dd>The input tensor, which can have any shape. Empty tensors are also supported.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>output</tt>: T</dt>
<dd>Output the cumulative maximum elements of `input` in the dimension `dim`, with the same shape and type as `input`.</dd>
<dt><tt>indices</tt>: tensor(int64)</dt>
<dd>Output the index location of each cumulative maximum value found in the dimension `dim`, with the same shape as `input`.</dd>
</dl>
#### Type Constraints
- T:tensor(float32)
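The op follows `torch.cummax`; a quick eager check of the two outputs (`torch.cummin`, described next, behaves symmetrically).

```python
import torch

x = torch.tensor([[1., 3., 2.],
                  [4., 0., 5.]])

values, indices = torch.cummax(x, dim=1)
print(values)    # tensor([[1., 3., 3.], [4., 4., 5.]])
print(indices)   # tensor([[0, 1, 1], [0, 0, 2]]), dtype torch.int64
```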
### cummin

#### Description
Returns a tuple (`values`, `indices`), where `values` contains the cumulative minimum elements of `input` in the dimension `dim`, and `indices` is the index location of each minimum value found in that dimension. Read [torch.cummin](https://pytorch.org/docs/stable/generated/torch.cummin.html) for more details.
#### Parameters

| Type  | Parameter | Description                            |
| ----- | --------- | -------------------------------------- |
| `int` | `dim`     | the dimension to do the operation over |
#### Inputs
<dl>
<dt><tt>input</tt>: T</dt>
<dd>The input tensor, which can have any shape. Empty tensors are also supported.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>output</tt>: T</dt>
<dd>Output the cumulative minimum elements of `input` in the dimension `dim`, with the same shape and type as `input`.</dd>
<dt><tt>indices</tt>: tensor(int64)</dt>
<dd>Output the index location of each cumulative minimum value found in the dimension `dim`, with the same shape as `input`.</dd>
</dl>
#### Type Constraints
- T:tensor(float32)
### MMCVModulatedDeformConv2d

#### Description
Perform Modulated Deformable Convolution on the input feature map; read [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for details.
#### Parameters

| Type           | Parameter           | Description                                                                             |
| -------------- | ------------------- | --------------------------------------------------------------------------------------- |
| `list of ints` | `stride`            | The stride of the convolving kernel. (sH, sW)                                            |
| `list of ints` | `padding`           | Paddings on both sides of the input. (padH, padW)                                        |
| `list of ints` | `dilation`          | The spacing between kernel elements. (dH, dW)                                            |
| `int` | `deformable_groups` | Groups of deformable offset. |
| `int` | `groups` | Split input into groups. `input_channel` should be divisible by the number of groups. |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, and inH and inW are the height and width of the data.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and output.</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Input mask; 4-D tensor of shape (N, deformable_group* kH* kW, outH, outW).</dd>
<dt><tt>inputs[3]</tt>: T</dt>
<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
<dt><tt>inputs[4]</tt>: T, optional</dt>
<dd>Input bias; 1-D tensor of shape (output_channel).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
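A hedged sketch using `mmcv.ops.ModulatedDeformConv2dPack`, one of the modules that export to this op; it predicts `offset` and `mask` internally with the shapes listed above. The constructor keywords are assumptions following the usual Conv2d-style interface, and an mmcv build with ops is required.

```python
import torch
from mmcv.ops import ModulatedDeformConv2dPack

conv = ModulatedDeformConv2dPack(in_channels=16, out_channels=32,
                                 kernel_size=3, stride=1, padding=1,
                                 deform_groups=1)

x = torch.rand(1, 16, 24, 24)   # (N, C, inH, inW)
y = conv(x)
print(y.shape)                  # (N, output_channel, outH, outW) -> (1, 32, 24, 24)
```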
## Custom operators for ONNX Runtime in MMCV

### Introduction of ONNX Runtime
**ONNX Runtime** is a cross-platform inferencing and training accelerator compatible with many popular ML/DNN frameworks. Check its [github](https://github.com/microsoft/onnxruntime) for more information.
### Introduction of ONNX
**ONNX** stands for **Open Neural Network Exchange**, which acts as an *Intermediate Representation (IR)* for ML/DNN models from many frameworks. Check its [github](https://github.com/onnx/onnx) for more information.
### Why include custom operators for ONNX Runtime in MMCV
- To verify the correctness of exported ONNX models in ONNX Runtime.
- To ease the deployment of ONNX models with custom operators from `mmcv.ops` in ONNX Runtime.
### List of operators for ONNX Runtime supported in MMCV
| Operator | CPU | GPU | MMCV Releases |
| :----------------------------------------------------: | :---: | :---: | :-----------: |
| [SoftNMS](onnxruntime_custom_ops.md#softnms)            | Y     | N     | 1.2.3         |
| [RoIAlign](onnxruntime_custom_ops.md#roialign)          | Y     | N     | 1.2.5         |
| [NMS](onnxruntime_custom_ops.md#nms)                    | Y     | N     | 1.2.7         |
| [grid_sampler](onnxruntime_custom_ops.md#grid_sampler)  | Y     | N     | 1.3.1         |
| [CornerPool](onnxruntime_custom_ops.md#cornerpool)      | Y     | N     | 1.3.4         |
| [cummax](onnxruntime_custom_ops.md#cummax) | Y | N | master |
| [cummin](onnxruntime_custom_ops.md#cummin) | Y | N | master |
### How to build custom operators for ONNX Runtime
*Please note that only the CPU version of **onnxruntime>=1.8.1** on Linux has been tested so far.*
#### Prerequisite
- Clone repository
```bash
git clone https://github.com/open-mmlab/mmcv.git
```

- Download `onnxruntime-linux` from the ONNX Runtime [releases](https://github.com/microsoft/onnxruntime/releases/tag/v1.8.1), extract it, expose `ONNXRUNTIME_DIR`, and finally add the lib path to `LD_LIBRARY_PATH` as follows:

```bash
wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-1.8.1.tgz

tar -zxvf onnxruntime-linux-x64-1.8.1.tgz
cd onnxruntime-linux-x64-1.8.1
export ONNXRUNTIME_DIR=$(pwd)
export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH
```
#### Build on Linux
```bash
cd mmcv  # to MMCV root directory
MMCV_WITH_OPS=1 MMCV_WITH_ORT=1 python setup.py develop
```
### How to do inference using exported ONNX models with custom operators in ONNX Runtime in Python
Install ONNX Runtime with `pip`
```bash
pip install onnxruntime==1.8.1
```

Inference Demo

```python
import os

import numpy as np
import onnxruntime as ort

from mmcv.ops import get_onnxruntime_op_path

ort_custom_op_path = get_onnxruntime_op_path()
assert os.path.exists(ort_custom_op_path)
session_options = ort.SessionOptions()
session_options.register_custom_ops_library(ort_custom_op_path)
# exported ONNX model with custom operators
onnx_file = 'sample.onnx'
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
sess = ort.InferenceSession(onnx_file, session_options)
onnx_results = sess.run(None, {'input' : input_data})
```
### How to add a new custom operator for ONNX Runtime in MMCV

#### Reminder
- The custom operator is not included in [supported operator list](https://github.com/microsoft/onnxruntime/blob/master/docs/OperatorKernels.md) in ONNX Runtime.
- The custom operator should be able to be exported to ONNX.
#### Main procedures
Take custom operator `soft_nms` for example.
1. Add the header `soft_nms.h` to the ONNX Runtime include directory `mmcv/ops/csrc/onnxruntime/`.
2. Add the source `soft_nms.cpp` to the ONNX Runtime source directory `mmcv/ops/csrc/onnxruntime/cpu/`.
3. Register the `soft_nms` operator in [onnxruntime_register.cpp](../../mmcv/ops/csrc/onnxruntime/cpu/onnxruntime_register.cpp)

    ```c++
    #include "soft_nms.h"

    SoftNmsOp c_SoftNmsOp;

    if (auto status = ortApi->CustomOpDomain_Add(domain, &c_SoftNmsOp)) {
      return status;
    }
    ```

4. Add a unit test in `tests/test_ops/test_onnx.py`; check [here](../../tests/test_ops/test_onnx.py) for examples.
**Finally, welcome to send us PR of adding custom operators for ONNX Runtime in MMCV.** :nerd_face:
### Known Issues
- "RuntimeError: tuple appears in op that does not forward tuples, unsupported kind: `prim::PythonOp`."
  1. Note that `cummax` and `cummin` are generally exportable to ONNX as long as torch >= 1.5.0, since `torch.cummax` is only supported from torch 1.5.0. However, when `cummax` or `cummin` serves as an intermediate component whose outputs are used as inputs to other modules, torch >= 1.7.0 is required; otherwise the above error may arise when running the exported ONNX model with ONNX Runtime. A minimal sketch of this setting is shown below.
  2. Solution: update the torch version to 1.7.0 or higher.
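A minimal sketch (assuming torch >= 1.7.0; the module and file names are only illustrative) of the setting described above, where the tuple returned by `cummax` feeds another op:

```python
import torch

from mmcv.onnx import register_extra_symbolics

register_extra_symbolics(11)


class CumMaxModel(torch.nn.Module):

    def forward(self, x):
        values, indices = torch.cummax(x, dim=1)
        return values + 1, indices   # the tuple output is consumed by a later op


torch.onnx.export(CumMaxModel(), torch.rand(2, 4), 'cummax.onnx',
                  opset_version=11)
```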
### References
- [How to export Pytorch model with custom op to ONNX and run it in ONNX Runtime](https://github.com/onnx/tutorials/blob/master/PyTorchCustomOperator/README.md)
- [How to add a custom operator/kernel in ONNX Runtime](https://github.com/microsoft/onnxruntime/blob/master/docs/AddingCustomOp.md)
## Introduction of onnx module in MMCV (Experimental)

We welcome anyone interested to help translate the MMCV documentation. If you are interested, please open an issue at [MMCV issue](https://github.com/open-mmlab/mmcv/issues) to claim the document you would like to translate.

### register_extra_symbolics

Some extra symbolic functions need to be registered before exporting a PyTorch model to ONNX.

#### Example
```python
import mmcv
from mmcv.onnx import register_extra_symbolics
opset_version = 11
register_extra_symbolics(opset_version)
```
#### FAQs

- None
## ONNX Runtime Custom Ops

We welcome anyone interested to help translate the MMCV documentation. If you are interested, please open an issue at [MMCV issue](https://github.com/open-mmlab/mmcv/issues) to claim the document you would like to translate.
<!-- TOC -->
- [ONNX Runtime Custom Ops](#onnx-runtime-custom-ops)
  - [SoftNMS](#softnms)
    - [Description](#description)
    - [Parameters](#parameters)
    - [Inputs](#inputs)
    - [Outputs](#outputs)
    - [Type Constraints](#type-constraints)
  - [RoIAlign](#roialign)
    - [Description](#description-1)
    - [Parameters](#parameters-1)
    - [Inputs](#inputs-1)
    - [Outputs](#outputs-1)
    - [Type Constraints](#type-constraints-1)
  - [NMS](#nms)
    - [Description](#description-2)
    - [Parameters](#parameters-2)
    - [Inputs](#inputs-2)
    - [Outputs](#outputs-2)
    - [Type Constraints](#type-constraints-2)
  - [grid_sampler](#grid_sampler)
    - [Description](#description-3)
    - [Parameters](#parameters-3)
    - [Inputs](#inputs-3)
    - [Outputs](#outputs-3)
    - [Type Constraints](#type-constraints-3)
  - [CornerPool](#cornerpool)
    - [Description](#description-4)
    - [Parameters](#parameters-4)
    - [Inputs](#inputs-4)
    - [Outputs](#outputs-4)
    - [Type Constraints](#type-constraints-4)
  - [cummax](#cummax)
    - [Description](#description-5)
    - [Parameters](#parameters-5)
    - [Inputs](#inputs-5)
    - [Outputs](#outputs-5)
    - [Type Constraints](#type-constraints-5)
  - [cummin](#cummin)
    - [Description](#description-6)
    - [Parameters](#parameters-6)
    - [Inputs](#inputs-6)
    - [Outputs](#outputs-6)
    - [Type Constraints](#type-constraints-6)
  - [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d)
    - [Description](#description-7)
    - [Parameters](#parameters-7)
    - [Inputs](#inputs-7)
    - [Outputs](#outputs-7)
    - [Type Constraints](#type-constraints-7)
<!-- TOC -->
### SoftNMS
#### Description

Perform soft NMS on `boxes` with `scores`. Read [Soft-NMS -- Improving Object Detection With One Line of Code](https://arxiv.org/abs/1704.04503) for details.

#### Parameters

| Type    | Parameter       | Description                                                                                   |
| ------- | --------------- | ---------------------------------------------------------------------------------------------- |
| `float` | `iou_threshold` | IoU threshold for deciding whether boxes overlap too much. Value range [0, 1]. Default to 0.   |
| `float` | `sigma`         | hyperparameter for the gaussian method                                                          |
| `float` | `min_score`     | score threshold of NMS                                                                          |
| `int`   | `method`        | method to do the nms, (0: `naive`, 1: `linear`, 2: `gaussian`)                                  |
| `int`   | `offset`        | `boxes` width or height is (x2 - x1 + offset). (0 or 1)                                         |

#### Inputs

<dl>
<dt><tt>boxes</tt>: T</dt>
<dd>Input boxes. 2-D tensor of shape (N, 4). N is the number of boxes.</dd>
<dt><tt>scores</tt>: T</dt>
<dd>Input scores. 1-D tensor of shape (N, ).</dd>
</dl>

#### Outputs

<dl>
<dt><tt>dets</tt>: T</dt>
<dd>Output boxes and scores. 2-D tensor of shape (num_valid_boxes, 5), [[x1, y1, x2, y2, score], ...]. num_valid_boxes is the number of valid boxes.</dd>
<dt><tt>indices</tt>: tensor(int64)</dt>
<dd>Output indices. 1-D tensor of shape (num_valid_boxes, ).</dd>
</dl>

#### Type Constraints
- T:tensor(float32)
### RoIAlign
#### Description

Perform RoIAlign on the output feature map, typically used in the bbox_head of most two-stage detectors.

#### Parameters

| Type    | Parameter        | Description                                                      |
| ------- | ---------------- | ----------------------------------------------------------------- |
| `int`   | `output_height`  | height of the output roi                                          |
| `int`   | `output_width`   | width of the output roi                                           |
| `float` | `spatial_scale`  | scale factor applied to the input boxes                           |
| `int`   | `sampling_ratio` | sampling ratio of the output. `0` means dense sampling            |
| `str`   | `mode`           | pooling mode in each bin. `avg` or `max`                          |
| `int`   | `aligned`        | if `aligned=1`, pixels are shifted by -0.5 for better alignment   |

#### Inputs

<dl>
<dt><tt>input</tt>: T</dt>
<dd>Input feature map; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and width of the feature map.</dd>
<dt><tt>rois</tt>: T</dt>
<dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are given in the coordinate system of the input feature map.</dd>
</dl>

#### Outputs

<dl>
<dt><tt>feat</tt>: T</dt>
<dd>RoI pooled output; 4-D tensor of shape (num_rois, C, output_height, output_width). Each output feature feat[i] corresponds to the input RoI rois[i].</dd>
</dl>

#### Type Constraints
- T:tensor(float32)
### NMS
#### Description

Perform non-maximum suppression on the input boxes according to the IoU threshold.

#### Parameters

| Type    | Parameter       | Description                                                                                   |
| ------- | --------------- | ---------------------------------------------------------------------------------------------- |
| `float` | `iou_threshold` | IoU threshold for deciding whether boxes overlap too much. Value range [0, 1]. Default to 0.   |
| `int`   | `offset`        | `boxes` width or height is (x2 - x1 + offset). (0 or 1)                                         |

#### Inputs

<dl>
<dt><tt>boxes</tt>: T</dt>
<dd>Input boxes. 2-D tensor of shape (N, 4). N is the number of boxes.</dd>
<dt><tt>scores</tt>: T</dt>
<dd>Input scores. 1-D tensor of shape (N, ).</dd>
</dl>

#### Outputs

<dl>
<dt><tt>indices</tt>: tensor(int32, Linear)</dt>
<dd>Indices of the selected boxes. 1-D tensor of shape (num_valid_boxes, ). num_valid_boxes is the number of selected boxes.</dd>
</dl>

#### Type Constraints
- T:tensor(float32)
### grid_sampler
#### Description

Perform grid sampling from `input` at the pixel locations specified by `grid`.

#### Parameters

| Type  | Parameter            | Description                                                                                                                                                                                                                                                                        |
| ----- | -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `int` | `interpolation_mode` | Interpolation mode to calculate output. (0: `bilinear`, 1: `nearest`)                                                                                                                                                                                                              |
| `int` | `padding_mode`       | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`)                                                                                                                                                                                                   |
| `int` | `align_corners`      | If `align_corners=1`, the extrema (`-1` and `1`) are treated as referring to the center points of the input's corner pixels. If `align_corners=0`, they are treated as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |

#### Inputs

<dl>
<dt><tt>input</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, and inH and inW are the height and width of the input feature map.</dd>
<dt><tt>grid</tt>: T</dt>
<dd>Input sampling grid; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of the output.</dd>
</dl>

#### Outputs

<dl>
<dt><tt>output</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
</dl>

#### Type Constraints
- T:tensor(float32, Linear)
### CornerPool
#### Description

Perform CornerPool on the `input` features. Read [CornerNet -- Detecting Objects as Paired Keypoints](https://arxiv.org/abs/1808.01244) for more details.

#### Parameters

| Type  | Parameter | Description                                                      |
| ----- | --------- | ----------------------------------------------------------------- |
| `int` | `mode`    | corner pool mode, (0: `top`, 1: `bottom`, 2: `left`, 3: `right`)  |

#### Inputs

<dl>
<dt><tt>input</tt>: T</dt>
<dd>Input features; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and width of the feature map.</dd>
</dl>

#### Outputs

<dl>
<dt><tt>output</tt>: T</dt>
<dd>The pooled features; 4-D tensor of shape (N, C, H, W).</dd>
</dl>

#### Type Constraints
- T:tensor(float32)
### cummax
#### Description

Returns a tuple (`values`, `indices`), where `values` contains the cumulative maximum elements of `input` in the dimension `dim`, and `indices` is the index location of each maximum value found in that dimension. Read [torch.cummax](https://pytorch.org/docs/stable/generated/torch.cummax.html) for more details.

#### Parameters

| Type  | Parameter | Description                            |
| ----- | --------- | -------------------------------------- |
| `int` | `dim`     | the dimension to do the operation over |

#### Inputs

<dl>
<dt><tt>input</tt>: T</dt>
<dd>The input tensor, which can have any shape. Empty tensors are also supported.</dd>
</dl>

#### Outputs

<dl>
<dt><tt>output</tt>: T</dt>
<dd>The cumulative maximum values of `input` in the dimension `dim`, with the same shape and type as `input`.</dd>
<dt><tt>indices</tt>: tensor(int64)</dt>
<dd>The index location of each maximum value found in the dimension `dim`, with the same shape as `input`.</dd>
</dl>

#### Type Constraints
- T:tensor(float32)
### cummin
#### Description

Returns a tuple (`values`, `indices`), where `values` contains the cumulative minimum elements of `input` in the dimension `dim`, and `indices` is the index location of each minimum value found in that dimension. Read [torch.cummin](https://pytorch.org/docs/stable/generated/torch.cummin.html) for more details.

#### Parameters

| Type  | Parameter | Description                            |
| ----- | --------- | -------------------------------------- |
| `int` | `dim`     | the dimension to do the operation over |

#### Inputs

<dl>
<dt><tt>input</tt>: T</dt>
<dd>The input tensor, which can have any shape. Empty tensors are also supported.</dd>
</dl>

#### Outputs

<dl>
<dt><tt>output</tt>: T</dt>
<dd>The cumulative minimum values of `input` in the dimension `dim`, with the same shape and type as `input`.</dd>
<dt><tt>indices</tt>: tensor(int64)</dt>
<dd>The index location of each minimum value found in the dimension `dim`, with the same shape as `input`.</dd>
</dl>

#### Type Constraints
- T:tensor(float32)
### MMCVModulatedDeformConv2d
#### Description

Perform Modulated Deformable Convolution on the input feature map. Read [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for more details.

#### Parameters

| Type           | Parameter           | Description                                                                                |
| -------------- | ------------------- | ------------------------------------------------------------------------------------------- |
| `list of ints` | `stride`            | The stride of the convolving kernel. (sH, sW)                                                |
| `list of ints` | `padding`           | Paddings on both sides of the input. (padH, padW)                                            |
| `list of ints` | `dilation`          | The spacing between kernel elements. (dH, dW)                                                |
| `int`          | `deformable_groups` | Groups of deformable offset; usually 1 is enough.                                            |
| `int`          | `groups`            | Split the input into groups; `input_channel` is divided by this number for the computation. |

#### Inputs

<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, and inH and inW are the height and width of the input feature map.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and output.</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Input mask; 4-D tensor of shape (N, deformable_group* kH* kW, outH, outW).</dd>
<dt><tt>inputs[3]</tt>: T</dt>
<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
<dt><tt>inputs[4]</tt>: T, optional</dt>
<dd>Input bias; 1-D tensor of shape (output_channel).</dd>
</dl>

#### Outputs

<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
</dl>

#### Type Constraints
- T:tensor(float32, Linear)
## Custom operators for ONNX Runtime in MMCV

We welcome anyone interested to help translate the MMCV documentation. If you are interested, please open an issue at [MMCV issue](https://github.com/open-mmlab/mmcv/issues) to claim the document you would like to translate.

### Introduction of ONNX Runtime

**ONNX Runtime** is a cross-platform inference and training accelerator compatible with many popular ML/DNN frameworks. Check its [github](https://github.com/microsoft/onnxruntime) for more information.

### Introduction of ONNX

**ONNX** stands for **Open Neural Network Exchange**, which acts as an *Intermediate Representation (IR)* for ML/DNN models from many frameworks. Check its [github](https://github.com/onnx/onnx) for more information.

### Why include custom operators for ONNX Runtime in MMCV

- To verify the correctness of exported ONNX models in ONNX Runtime.
- To ease the deployment of ONNX models with custom operators from `mmcv.ops` in ONNX Runtime.

### List of operators for ONNX Runtime supported in MMCV
| Operator                                                                          | CPU   | GPU   | MMCV Releases |
| :-------------------------------------------------------------------------------: | :---: | :---: | :-----------: |
| [SoftNMS](onnxruntime_custom_ops.md#softnms) | Y | N | 1.2.3 |
| [RoIAlign](onnxruntime_custom_ops.md#roialign) | Y | N | 1.2.5 |
| [NMS](onnxruntime_custom_ops.md#nms) | Y | N | 1.2.7 |
| [grid_sampler](onnxruntime_custom_ops.md#grid_sampler) | Y | N | 1.3.1 |
| [CornerPool](onnxruntime_custom_ops.md#cornerpool) | Y | N | 1.3.4 |
| [cummax](onnxruntime_custom_ops.md#cummax) | Y | N | 1.3.4 |
| [cummin](onnxruntime_custom_ops.md#cummin) | Y | N | 1.3.4 |
| [MMCVModulatedDeformConv2d](onnxruntime_custom_ops.md#mmcvmodulateddeformconv2d) | Y | N | 1.3.12 |
### How to build custom operators for ONNX Runtime

*Please note that only the CPU version of **onnxruntime>=1.8.1** on the Linux x86-64 platform has been tested so far.*

#### Prerequisite

- Clone the repository
```bash
git clone https://github.com/open-mmlab/mmcv.git
```
- Download `onnxruntime-linux` from the ONNX Runtime [releases](https://github.com/microsoft/onnxruntime/releases/tag/v1.8.1), extract it, create the variable `ONNXRUNTIME_DIR` from its path, and add the lib directory to `LD_LIBRARY_PATH` as follows:
```bash
wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-1.8.1.tgz
tar -zxvf onnxruntime-linux-x64-1.8.1.tgz
cd onnxruntime-linux-x64-1.8.1
export ONNXRUNTIME_DIR=$(pwd)
export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH
```
#### Build on Linux

```bash
cd mmcv  # to MMCV root directory
MMCV_WITH_OPS=1 MMCV_WITH_ORT=1 python setup.py develop
```
### How to do inference using exported ONNX models with custom operators in ONNX Runtime in Python

Install ONNX Runtime with `pip`
```bash
pip install onnxruntime==1.8.1
```
Inference Demo
```python
import os
import numpy as np
import onnxruntime as ort
from mmcv.ops import get_onnxruntime_op_path
ort_custom_op_path = get_onnxruntime_op_path()
assert os.path.exists(ort_custom_op_path)
session_options = ort.SessionOptions()
session_options.register_custom_ops_library(ort_custom_op_path)
# exported ONNX model with custom operators
onnx_file = 'sample.onnx'
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
sess = ort.InferenceSession(onnx_file, session_options)
onnx_results = sess.run(None, {'input' : input_data})
```
### How to add a new custom operator for ONNX Runtime in MMCV

#### Reminder

- The custom operator is not included in the [supported operator list](https://github.com/microsoft/onnxruntime/blob/master/docs/OperatorKernels.md) of ONNX Runtime.
- The custom operator should be able to be exported to ONNX.

#### Main procedures

Take the custom operator `soft_nms` as an example:

1. Add the header `soft_nms.h` to the ONNX Runtime include directory `mmcv/ops/csrc/onnxruntime/`.
2. Add the source `soft_nms.cpp` to the ONNX Runtime source directory `mmcv/ops/csrc/onnxruntime/cpu/`.
3. Register the `soft_nms` operator in [onnxruntime_register.cpp](../../mmcv/ops/csrc/onnxruntime/cpu/onnxruntime_register.cpp)
```c++
#include "soft_nms.h"
SoftNmsOp c_SoftNmsOp;
if (auto status = ortApi->CustomOpDomain_Add(domain, &c_SoftNmsOp)) {
return status;
}
```
4. Add a unit test in `tests/test_ops/test_onnx.py`; see [here](../../tests/test_ops/test_onnx.py) for examples.

**Finally, you are welcome to contribute custom operators for ONNX Runtime to MMCV.** :nerd_face:

### Known Issues
- "RuntimeError: tuple appears in op that does not forward tuples, unsupported kind: `prim::PythonOp`."
  1. Note that the `cummax` and `cummin` operators were added in torch >= 1.5.0, but they can only be exported correctly with torch >= 1.7.0; otherwise the above error occurs at export time.
  2. Solution: upgrade PyTorch to 1.7.0 or higher.

### References
- [How to export Pytorch model with custom op to ONNX and run it in ONNX Runtime](https://github.com/onnx/tutorials/blob/master/PyTorchCustomOperator/README.md)
- [How to add a custom operator/kernel in ONNX Runtime](https://github.com/microsoft/onnxruntime/blob/master/docs/AddingCustomOp.md)