Unverified commit afe0794c authored by q.yao, committed by GitHub

[Docs] Add docs_zh_CN of TensorRT (#1336)



* Add docs_zh_CN of TensorRT

* fix according to doc-cn-onnx

* update doc of cummax cummin

* Apply suggestions from code review
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

* update heading
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
parent 2b39d7a8
# TensorRT Custom Ops
## TensorRT Custom Ops
<!-- TOC -->
......@@ -60,14 +60,14 @@
<!-- TOC -->
## MMCVRoIAlign
### MMCVRoIAlign
### Description
#### Description
Performs RoIAlign on the input feature map; used in the bbox_head of most two-stage
detectors.
### Parameters
#### Parameters
| Type | Parameter | Description |
| ------- | ---------------- | ------------------------------------------------------------------------------------------------------------- |
......@@ -78,7 +78,7 @@ detectors.
| `str` | `mode` | pooling mode in each bin. `avg` or `max` |
| `int` | `aligned` | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly. |
### Inputs
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
......@@ -87,20 +87,20 @@ detectors.
<dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are in the coordinate system of inputs[0].</dd>
</dl>
### Outputs
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element output[0][r-1] is a pooled feature map corresponding to the r-th RoI inputs[1][r-1].</dd>
</dl>
### Type Constraints
#### Type Constraints
- T:tensor(float32, Linear)
## ScatterND
### ScatterND
### Description
#### Description
ScatterND takes three inputs: a `data` tensor of rank r >= 1, an `indices` tensor of rank q >= 1, and an `updates` tensor of rank q + r - indices.shape[-1] - 1. The output of the operation is produced by creating a copy of the input `data` and then updating its values to those specified by `updates` at the index positions specified by `indices`. Its output shape is the same as the shape of `data`. Note that `indices` should not have duplicate entries; that is, two or more updates for the same index location are not supported.
......@@ -113,11 +113,11 @@ The `output` is calculated via the following equation:
output[tuple(indices[idx])] = updates[idx]
```
### Parameters
#### Parameters
None
### Inputs
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
......@@ -130,24 +130,24 @@ None
<dd>Tensor of rank q + r - indices_shape[-1] - 1.</dd>
</dl>
### Outputs
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Tensor of rank r >= 1.</dd>
</dl>
### Type Constraints
#### Type Constraints
- T:tensor(float32, Linear), tensor(int32, Linear)
## NonMaxSuppression
### NonMaxSuppression
### Description
#### Description
Filters out boxes that have a high IoU overlap with previously selected boxes or a low score, and outputs the indices of the valid boxes. Indices of invalid boxes are filled with -1.
### Parameters
#### Parameters
| Type | Parameter | Description |
| ------- | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
......@@ -157,7 +157,7 @@ Filter out boxes has high IoU overlap with previously selected boxes or low scor
| `float` | `score_threshold` | The threshold for deciding when to remove boxes based on score. |
| `int` | `offset` | 0 or 1, boxes' width or height is (x2 - x1 + offset). |
### Inputs
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
......@@ -166,7 +166,7 @@ Filter out boxes has high IoU overlap with previously selected boxes or low scor
<dd>Input scores. 3-D tensor of shape (num_batches, num_classes, spatial_dimension).</dd>
</dl>
### Outputs
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: tensor(int32, Linear)</dt>
......@@ -175,17 +175,17 @@ Filter out boxes has high IoU overlap with previously selected boxes or low scor
<dd>All invalid indices will be filled with -1.</dd>
</dl>
### Type Constraints
#### Type Constraints
- T:tensor(float32, Linear)
## MMCVDeformConv2d
### MMCVDeformConv2d
### Description
#### Description
Performs Deformable Convolution on the input feature; see [Deformable Convolutional Network](https://arxiv.org/abs/1703.06211) for details.
### Parameters
#### Parameters
| Type | Parameter | Description |
| -------------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------- |
......@@ -196,7 +196,7 @@ Perform Deformable Convolution on input feature, read [Deformable Convolutional
| `int` | `group` | Split input into groups. `input_channel` should be divisible by the number of groups. |
| `int`          | `im2col_step`      | DeformableConv2d uses im2col to compute the convolution. `im2col_step` is used to split the input and offset into chunks, reducing the memory used by the column buffer. |
### Inputs
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
......@@ -207,24 +207,24 @@ Perform Deformable Convolution on input feature, read [Deformable Convolutional
<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
</dl>
### Outputs
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
</dl>
### Type Constraints
#### Type Constraints
- T:tensor(float32, Linear)
## grid_sampler
### grid_sampler
### Description
#### Description
Samples `input` at the pixel locations given by `grid`.
### Parameters
#### Parameters
| Type | Parameter | Description |
| ----- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
......@@ -232,7 +232,7 @@ Perform sample from `input` with pixel locations from `grid`.
| `int` | `padding_mode` | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`) |
| `int` | `align_corners` | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |
### Inputs
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
......@@ -241,37 +241,37 @@ Perform sample from `input` with pixel locations from `grid`.
<dd>Input grid; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of the grid and output.</dd>
</dl>
### Outputs
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
</dl>
### Type Constraints
#### Type Constraints
- T:tensor(float32, Linear)
## cummax
### cummax
### Description
#### Description
Returns a namedtuple (`values`, `indices`), where `values` is the cumulative maximum of the elements of `input` along dimension `dim`, and `indices` is the index location of each maximum value found along dimension `dim`.
### Parameters
#### Parameters
| Type | Parameter | Description |
| ----- | --------- | --------------------------------------- |
| `int` | `dim` | The dimension to do the operation over. |
### Inputs
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>The input tensor.</dd>
</dl>
### Outputs
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
......@@ -280,30 +280,30 @@ Returns a namedtuple (`values`, `indices`) where `values` is the cumulative maxi
<dd>Output indices.</dd>
</dl>
### Type Constraints
#### Type Constraints
- T:tensor(float32, Linear)
## cummin
### cummin
### Description
#### Description
Returns a namedtuple (`values`, `indices`), where `values` is the cumulative minimum of the elements of `input` along dimension `dim`, and `indices` is the index location of each minimum value found along dimension `dim`.
### Parameters
#### Parameters
| Type | Parameter | Description |
| ----- | --------- | --------------------------------------- |
| `int` | `dim` | The dimension to do the operation over. |
### Inputs
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>The input tensor.</dd>
</dl>
### Outputs
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
......@@ -312,25 +312,25 @@ Returns a namedtuple (`values`, `indices`) where `values` is the cumulative mini
<dd>Output indices.</dd>
</dl>
### Type Constraints
#### Type Constraints
- T:tensor(float32, Linear)
## MMCVInstanceNormalization
### MMCVInstanceNormalization
### Description
#### Description
Carries out instance normalization as described in the paper https://arxiv.org/abs/1607.08022.
y = scale * (x - mean) / sqrt(variance + epsilon) + B, where mean and variance are computed per instance per channel.
### Parameters
#### Parameters
| Type | Parameter | Description |
| ------- | --------- | -------------------------------------------------------------------- |
| `float` | `epsilon` | The epsilon value to use to avoid division by zero. Default is 1e-05 |
### Inputs
#### Inputs
<dl>
<dt><tt>input</tt>: T</dt>
......@@ -341,24 +341,24 @@ y = scale * (x - mean) / sqrt(variance + epsilon) + B, where mean and variance a
<dd>The input 1-dimensional bias tensor of size C.</dd>
</dl>
### Outputs
#### Outputs
<dl>
<dt><tt>output</tt>: T</dt>
<dd>The output tensor of the same shape as input.</dd>
</dl>
### Type Constraints
#### Type Constraints
- T:tensor(float32, Linear)
## MMCVModulatedDeformConv2d
### MMCVModulatedDeformConv2d
### Description
#### Description
Performs Modulated Deformable Convolution on the input feature; see [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for details.
### Parameters
#### Parameters
| Type | Parameter | Description |
| -------------- | ------------------ | ------------------------------------------------------------------------------------- |
......@@ -368,7 +368,7 @@ Perform Modulated Deformable Convolution on input feature, read [Deformable Conv
| `int` | `deformable_group` | Groups of deformable offset. |
| `int` | `group` | Split input into groups. `input_channel` should be divisible by the number of groups. |
### Inputs
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
......@@ -383,13 +383,13 @@ Perform Modulated Deformable Convolution on input feature, read [Deformable Conv
<dd>Input bias; 1-D tensor of shape (output_channel).</dd>
</dl>
### Outputs
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
</dl>
### Type Constraints
#### Type Constraints
- T:tensor(float32, Linear)
# TensorRT Plugins for custom operators in MMCV (Experimental)
## TensorRT Plugins for custom operators in MMCV (Experimental)
<!-- TOC -->
......@@ -17,12 +17,12 @@
<!-- TOC -->
## Introduction
### Introduction
**NVIDIA TensorRT** is a software development kit (SDK) for high-performance inference of deep learning models. It includes a deep learning inference optimizer and runtime that deliver low latency and high throughput for deep learning inference applications. Please check its [developer's website](https://developer.nvidia.com/tensorrt) for more information.
To ease the deployment of trained models with custom operators from `mmcv.ops` using TensorRT, a series of TensorRT plugins are included in MMCV.
## List of TensorRT plugins supported in MMCV
### List of TensorRT plugins supported in MMCV
| ONNX Operator | TensorRT Plugin | MMCV Releases |
| :-----------------------: | :-----------------------------------------------------------------------------: | :-----------: |
......@@ -40,9 +40,9 @@ Notes
- All plugins listed above are developed on TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0
## How to build TensorRT plugins in MMCV
### How to build TensorRT plugins in MMCV
### Prerequisite
#### Prerequisite
- Clone repository
......@@ -75,14 +75,14 @@ pip install $TENSORRT_DIR/graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
For more detailed information on installing TensorRT from a tar package, please refer to [NVIDIA's website](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-721/install-guide/index.html#installing-tar).
### Build on Linux
#### Build on Linux
```bash
cd mmcv # to MMCV root directory
cd mmcv ## to MMCV root directory
MMCV_WITH_OPS=1 MMCV_WITH_TRT=1 pip install -e .
```
## Create TensorRT engine and run inference in python
### Create TensorRT engine and run inference in python
Here is an example.
......@@ -99,26 +99,26 @@ onnx_file = 'sample.onnx'
trt_file = 'sample.trt'
onnx_model = onnx.load(onnx_file)
# Model input
## Model input
inputs = torch.rand(1, 3, 224, 224).cuda()
# Model input shape info
## Model input shape info
opt_shape_dict = {
'input': [list(inputs.shape),
list(inputs.shape),
list(inputs.shape)]
}
# Create TensorRT engine
## Create TensorRT engine
max_workspace_size = 1 << 30
trt_engine = onnx2trt(
onnx_model,
opt_shape_dict,
max_workspace_size=max_workspace_size)
# Save TensorRT engine
## Save TensorRT engine
save_trt_engine(trt_engine, trt_file)
# Run inference with TensorRT
## Run inference with TensorRT
trt_model = TRTWrapper(trt_file, ['input'], ['output'])
with torch.no_grad():
......@@ -127,9 +127,9 @@ with torch.no_grad():
```
## How to add a TensorRT plugin for custom op in MMCV
### How to add a TensorRT plugin for custom op in MMCV
### Main procedures
#### Main procedures
Below are the main steps:
......@@ -161,15 +161,15 @@ Below are the main steps:
5. Add unit test into `tests/test_ops/test_tensorrt.py`
Check [here](https://github.com/open-mmlab/mmcv/blob/master/tests/test_ops/test_tensorrt.py) for examples.
### Reminders
#### Reminders
- Some of the [custom ops](https://mmcv.readthedocs.io/en/latest/ops.html) in `mmcv` have CUDA implementations, which can be used as references.
## Known Issues
### Known Issues
- None
## References
### References
- [Developer guide of Nvidia TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html)
- [TensorRT Open Source Software](https://github.com/NVIDIA/TensorRT)
......
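# TensorRT Custom Ops
## TensorRT Custom Ops
Contributions to translating the MMCV documentation are welcome. If you are interested, please open an issue at [MMCV issue](https://github.com/open-mmlab/mmcv/issues) to claim the document you want to translate.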
<!-- TOC -->
- [TensorRT Custom Ops](#tensorrt自定义算子)
  - [MMCVRoIAlign](#mmcvroialign)
    - [Description](#描述)
    - [Parameters](#模型参数)
    - [Inputs](#输入)
    - [Outputs](#输出)
    - [Type Constraints](#类型约束)
  - [ScatterND](#scatternd)
    - [Description](#描述-1)
    - [Parameters](#模型参数-1)
    - [Inputs](#输入-1)
    - [Outputs](#输出-1)
    - [Type Constraints](#类型约束-1)
  - [NonMaxSuppression](#nonmaxsuppression)
    - [Description](#描述-2)
    - [Parameters](#模型参数-2)
    - [Inputs](#输入-2)
    - [Outputs](#输出-2)
    - [Type Constraints](#类型约束-2)
  - [MMCVDeformConv2d](#mmcvdeformconv2d)
    - [Description](#描述-3)
    - [Parameters](#模型参数-3)
    - [Inputs](#输入-3)
    - [Outputs](#输出-3)
    - [Type Constraints](#类型约束-3)
  - [grid_sampler](#grid_sampler)
    - [Description](#描述-4)
    - [Parameters](#模型参数-4)
    - [Inputs](#输入-4)
    - [Outputs](#输出-4)
    - [Type Constraints](#类型约束-4)
  - [cummax](#cummax)
    - [Description](#描述-5)
    - [Parameters](#模型参数-5)
    - [Inputs](#输入-5)
    - [Outputs](#输出-5)
    - [Type Constraints](#类型约束-5)
  - [cummin](#cummin)
    - [Description](#描述-6)
    - [Parameters](#模型参数-6)
    - [Inputs](#输入-6)
    - [Outputs](#输出-6)
    - [Type Constraints](#类型约束-6)
  - [MMCVInstanceNormalization](#mmcvinstancenormalization)
    - [Description](#描述-7)
    - [Parameters](#模型参数-7)
    - [Inputs](#输入-7)
    - [Outputs](#输出-7)
    - [Type Constraints](#类型约束-7)
  - [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d)
    - [Description](#描述-8)
    - [Parameters](#模型参数-8)
    - [Inputs](#输入-8)
    - [Outputs](#输出-8)
    - [Type Constraints](#类型约束-8)
<!-- TOC -->
### MMCVRoIAlign
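#### Description
Performs RoIAlign on the feature map; used in the bbox_head of most two-stage detectors.
#### Parameters
| Type    | Parameter        | Description                                                             |
| ------- | ---------------- | ----------------------------------------------------------------------- |
| `int`   | `output_height`  | Output height of the RoI feature                                        |
| `int`   | `output_width`   | Output width of the RoI feature                                         |
| `float` | `spatial_scale`  | Scale factor applied to the input boxes                                 |
| `int`   | `sampling_ratio` | Sampling ratio for each output bin. `0` means dense sampling            |
| `str`   | `mode`           | Pooling mode. `avg` or `max`                                            |
| `int`   | `aligned`        | If `aligned=1`, pixels are shifted by -0.5 to achieve better alignment  |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature map; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of input channels, and H and W are the height and width of the input feature map.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are in the coordinate system of the input feature map.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Pooled output; 4-D tensor of shape (num_rois, C, output_height, output_width). Each output feature feat[i] corresponds one-to-one to the input RoI rois[i].</dd>
</dl>
#### Type Constraints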
- T:tensor(float32, Linear)
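To make `spatial_scale`, `aligned`, and the output size concrete, here is a minimal sketch of how one RoI row could be mapped onto the feature map and split into bins. It assumes the -0.5 pixel shift for `aligned=1` described in the table above and, as we understand MMCV's RoIAlign kernels to do, that unaligned RoIs are clamped to at least 1x1; `roi_to_bins` is a hypothetical helper, not an MMCV API.

```python
def roi_to_bins(roi, spatial_scale, output_height, output_width, aligned):
    """Hypothetical helper: map one RoI row [batch_index, x1, y1, x2, y2]
    to feature-map coordinates.

    Returns the batch index, the RoI start point, and the size of each bin.
    """
    batch_index, x1, y1, x2, y2 = roi
    # aligned=1 shifts coordinates by -0.5 so pixel centers line up.
    shift = 0.5 if aligned else 0.0
    start_x = x1 * spatial_scale - shift
    start_y = y1 * spatial_scale - shift
    roi_w = x2 * spatial_scale - shift - start_x
    roi_h = y2 * spatial_scale - shift - start_y
    if not aligned:
        # Assumed legacy behaviour: force tiny RoIs to be at least 1x1.
        roi_w = max(roi_w, 1.0)
        roi_h = max(roi_h, 1.0)
    bin_w = roi_w / output_width
    bin_h = roi_h / output_height
    return int(batch_index), (start_x, start_y), (bin_w, bin_h)
```

For example, a 16x16 RoI at `spatial_scale=0.25` covers 4x4 feature pixels, so a 2x2 output gives 2x2-pixel bins.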
### ScatterND
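#### Description
ScatterND takes three inputs: `data` of rank r >= 1, `indices` of rank q >= 1, and `updates` of rank q + r - indices.shape[-1] - 1. The output is computed by first creating a copy of `data` and then updating that copy with the values in `updates` at the positions given by `indices`. Note that `indices` must not contain duplicate entries; that is, updating the same location more than once is not supported.
The output can be computed as in the following code: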
```python
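import numpy as np

def scatter_nd(data, indices, updates):
    output = np.copy(data)  # leave `data` itself unchanged
    update_indices = indices.shape[:-1]
    for idx in np.ndindex(update_indices):
        # tuple() makes NumPy treat indices[idx] as one multi-dimensional index
        output[tuple(indices[idx])] = updates[idx]
    return output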
```
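Because duplicate entries in `indices` are not supported, it can be worth checking them on the host before export. The following NumPy sketch is a hypothetical helper (not part of MMCV): it treats each row of `indices` along its last axis as one index tuple and reports whether any tuple repeats.

```python
import numpy as np

def has_duplicate_indices(indices):
    """Hypothetical helper: True if two ScatterND update positions collide."""
    indices = np.asarray(indices)
    # Collapse all leading dims so each row is one index tuple into `data`.
    rows = indices.reshape(-1, indices.shape[-1])
    unique_rows = {tuple(int(v) for v in row) for row in rows}
    return len(unique_rows) != len(rows)
```

Since all index tuples have the same length, distinct tuples address disjoint slices of `data`, so an exact-match check is sufficient; note that this sketch does not normalize negative indices.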
#### Parameters
None
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input `data` of rank r >= 1.</dd>
<dt><tt>inputs[1]</tt>: tensor(int32, Linear)</dt>
<dd>Input `indices` of rank q >= 1.</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Input `updates` of rank q + r - indices.shape[-1] - 1.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output tensor of rank r >= 1.</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear), tensor(int32, Linear)
### NonMaxSuppression
#### Description
Performs non-maximum suppression on candidate boxes according to the IoU threshold.
#### Parameters
| Type    | Parameter                    | Description                                                                                                                  |
| ------- | ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `int`   | `center_point_box`           | 0: boxes are given as [y1, x1, y2, x2]; 1: boxes are given as [x_center, y_center, width, height]                            |
| `int`   | `max_output_boxes_per_class` | Maximum number of output boxes per class. Defaults to 0, in which case the number of output boxes equals the number of input boxes |
| `float` | `iou_threshold`              | Threshold on IoU for deciding whether boxes overlap too much, in the range [0, 1]. Defaults to 0                             |
| `float` | `score_threshold`            | Threshold on score for deciding whether a box is valid                                                                       |
| `int`   | `offset`                     | 0 or 1; box width and height are computed as (x2 - x1 + offset)                                                              |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input boxes; 3-D tensor of shape (num_batches, spatial_dimension, 4).</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input scores; 3-D tensor of shape (num_batches, num_classes, spatial_dimension).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: tensor(int32, Linear)</dt>
<dd>Indices of the selected boxes; 2-D tensor of shape (num_selected_indices, 3), where each row is [batch_index, class_index, box_index].</dd>
<dd>Here num_selected_indices = num_batches * num_classes * min(max_output_boxes_per_class, spatial_dimension).</dd>
<dd>All indices of unselected boxes are filled with -1.</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
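The per-class selection described above can be sketched in NumPy as a greedy loop. This is an illustrative reference under stated assumptions, not the plugin's code: boxes are in corner format (`center_point_box=0`), a box is considered only if its score is strictly greater than `score_threshold`, it is kept if its IoU with every already-kept box is at most `iou_threshold`, and the result is padded with -1 to min(max_output_boxes_per_class, spatial_dimension) entries, treating 0 as "no limit" as in the parameter table. `box_iou` and `nms_single_class` are hypothetical names.

```python
import numpy as np

def box_iou(a, b, offset=0):
    # Corner-format boxes; IoU is symmetric in x and y, so [y1, x1, y2, x2]
    # gives the same result as [x1, y1, x2, y2].
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]) + offset, 0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]) + offset, 0)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0] + offset) * (a[3] - a[1] + offset)
    area_b = (b[2] - b[0] + offset) * (b[3] - b[1] + offset)
    return inter / (area_a + area_b - inter)

def nms_single_class(boxes, scores, iou_threshold, score_threshold,
                     max_output_boxes_per_class=0, offset=0):
    """Hypothetical greedy NMS for one (batch, class) pair."""
    num_boxes = len(boxes)
    # 0 means "keep up to all boxes", following the parameter table above.
    if max_output_boxes_per_class <= 0:
        limit = num_boxes
    else:
        limit = min(max_output_boxes_per_class, num_boxes)
    # Visit candidates from highest to lowest score, skipping low scores.
    order = [int(i) for i in np.argsort(-scores, kind="stable")
             if scores[i] > score_threshold]
    keep = []
    for i in order:
        if len(keep) == limit:
            break
        if all(box_iou(boxes[i], boxes[k], offset) <= iou_threshold
               for k in keep):
            keep.append(i)
    # Pad with -1 so every (batch, class) pair yields a fixed number of rows.
    return keep + [-1] * (limit - len(keep))
```

A full implementation would run this for every (batch_index, class_index) pair and emit rows of [batch_index, class_index, box_index].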
### MMCVDeformConv2d
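#### Description
Performs Deformable Convolution on the input feature; see [Deformable Convolutional Network](https://arxiv.org/abs/1703.06211) for details.
#### Parameters
| Type           | Parameter          | Description                                                                                                                        |
| -------------- | ------------------ | ---------------------------------------------------------------------------------------------------------------------------------- |
| `list of ints` | `stride`           | Stride of the convolution (sH, sW)                                                                                                 |
| `list of ints` | `padding`          | Padding of the input feature (padH, padW)                                                                                          |
| `list of ints` | `dilation`         | Spacing between kernel elements (dH, dW)                                                                                           |
| `int`          | `deformable_group` | Number of groups for the deformable offset                                                                                         |
| `int`          | `group`            | Number of convolution groups; `input_channel` is split into this many groups for computation                                       |
| `int`          | `im2col_step`      | Deformable convolution uses im2col to compute the convolution. The input and offset are processed in chunks of `im2col_step` to reduce temporary memory usage. |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of input channels, and inH and inW are the height and width of the input feature map.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input offset; 4-D tensor of shape (N, deformable_group * 2 * kH * kW, outH, outW), where kH and kW are the height and width of the kernel, and outH and outW are the height and width of the output.</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
</dl>
#### Type Constraints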
- T:tensor(float32, Linear)
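The shapes listed above are tied together by the usual convolution arithmetic. The sketch below computes outH/outW and the expected offset shape from `stride`, `padding`, and `dilation`. It assumes the standard PyTorch-style output-size formula applies to MMCVDeformConv2d; `conv_out_size` and `deform_conv2d_shapes` are hypothetical helpers.

```python
def conv_out_size(in_size, kernel, stride, padding, dilation):
    # Standard convolution output size along one spatial axis.
    return (in_size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

def deform_conv2d_shapes(input_shape, weight_shape, stride, padding, dilation,
                         deformable_group):
    """Hypothetical helper: (offset_shape, output_shape) for NCHW input
    and (output_channel, input_channel, kH, kW) weight."""
    n, _, in_h, in_w = input_shape
    out_c, _, k_h, k_w = weight_shape
    out_h = conv_out_size(in_h, k_h, stride[0], padding[0], dilation[0])
    out_w = conv_out_size(in_w, k_w, stride[1], padding[1], dilation[1])
    # Two offset values (one per spatial axis) for every kernel position
    # in every deformable group.
    offset_shape = (n, deformable_group * 2 * k_h * k_w, out_h, out_w)
    output_shape = (n, out_c, out_h, out_w)
    return offset_shape, output_shape
```

For a 3x3 kernel with stride 1, padding 1, and dilation 1, the spatial size is preserved and the offset has 18 channels per deformable group.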
### grid_sampler
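#### Description
Samples `input` at the pixel locations given by `grid`.
#### Parameters
| Type  | Parameter            | Description                                                                                                                                                                                                                     |
| ----- | -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `int` | `interpolation_mode` | Interpolation mode used to compute the output (0: `bilinear`, 1: `nearest`)                                                                                                                                                      |
| `int` | `padding_mode`       | Padding mode for grid values outside the input (0: `zeros`, 1: `border`, 2: `reflection`)                                                                                                                                        |
| `int` | `align_corners`      | If `align_corners=1`, the extrema (`-1` and `1`) refer to the center points of the input's corner pixels. If `align_corners=0`, they refer to the outer corner points of the input's corner pixels, making sampling less dependent on resolution |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of input channels, and inH and inW are the height and width of the input feature map.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input grid; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of the output.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
</dl>
#### Type Constraints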
- T:tensor(float32, Linear)
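The effect of `align_corners` is easiest to see in how a normalized grid value in [-1, 1] is turned into a pixel coordinate. This sketch follows PyTorch's `grid_sample` convention, which we assume the plugin mirrors since the parameter descriptions match; `unnormalize` is a hypothetical helper.

```python
def unnormalize(coord, size, align_corners):
    """Hypothetical helper: map a normalized grid coordinate in [-1, 1]
    to a pixel coordinate.

    `size` is the input width (for x) or height (for y).
    """
    if align_corners:
        # -1 and 1 land on the centers of the first and last pixels.
        return (coord + 1) / 2 * (size - 1)
    # -1 and 1 land on the outer edges of the first and last pixels.
    return ((coord + 1) * size - 1) / 2
```

For a 4-pixel-wide input, `align_corners=1` maps -1 and 1 to pixel coordinates 0 and 3, whereas `align_corners=0` maps them to -0.5 and 3.5; `padding_mode` then decides how samples that fall outside the input are filled.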
### cummax
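#### Description
Returns a tuple (`values`, `indices`), where `values` is the cumulative maximum of `input` along dimension `dim`, and `indices` is the index location of each maximum value along dimension `dim`. See [torch.cummax](https://pytorch.org/docs/stable/generated/torch.cummax.html) for details.
#### Parameters
| Type  | Parameter | Description                                                    |
| ----- | --------- | -------------------------------------------------------------- |
| `int` | `dim`     | The dimension along which to compute the cumulative operation  |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input tensor; may have any shape.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Cumulative maximum of `input` along dimension `dim`; same shape and type as `input`.</dd>
<dt><tt>outputs[1]</tt>: tensor(int32, Linear)</dt>
<dd>Index location of each maximum along dimension `dim`; same shape as `input`.</dd>
</dl>
#### Type Constraints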
- T:tensor(float32, Linear)
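For checking plugin outputs against a host-side reference, a NumPy sketch of the (`values`, `indices`) pair follows. It is an illustration rather than MMCV code: `cumulative_extreme` is a hypothetical name, it covers both cummax (`op="max"`) and cummin (`op="min"`, documented in the next section), it expects a non-empty `dim`, and on ties it keeps the later index, which is our understanding of `torch.cummax`/`torch.cummin`; the plugin's tie handling may differ.

```python
import numpy as np

def cumulative_extreme(x, dim, op="max"):
    """Hypothetical helper: (values, indices) of the running max/min along `dim`."""
    x = np.moveaxis(np.asarray(x), dim, 0)  # work along axis 0
    values = np.empty_like(x)
    indices = np.empty(x.shape, dtype=np.int32)
    values[0], indices[0] = x[0], 0
    for i in range(1, x.shape[0]):
        # Take the new element when it is at least as extreme
        # (ties resolve to the later index).
        if op == "max":
            take = x[i] >= values[i - 1]
        else:
            take = x[i] <= values[i - 1]
        values[i] = np.where(take, x[i], values[i - 1])
        indices[i] = np.where(take, i, indices[i - 1])
    return np.moveaxis(values, 0, dim), np.moveaxis(indices, 0, dim)
```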
### cummin
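#### Description
Returns a tuple (`values`, `indices`), where `values` is the cumulative minimum of `input` along dimension `dim`, and `indices` is the index location of each minimum value along dimension `dim`. See [torch.cummin](https://pytorch.org/docs/stable/generated/torch.cummin.html) for details.
#### Parameters
| Type  | Parameter | Description                                                    |
| ----- | --------- | -------------------------------------------------------------- |
| `int` | `dim`     | The dimension along which to compute the cumulative operation  |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input tensor; may have any shape.</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Cumulative minimum of `input` along dimension `dim`; same shape and type as `input`.</dd>
<dt><tt>outputs[1]</tt>: tensor(int32, Linear)</dt>
<dd>Index location of each minimum along dimension `dim`; same shape as `input`.</dd>
</dl>
#### Type Constraints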
- T:tensor(float32, Linear)
### MMCVInstanceNormalization
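#### Description
Performs instance normalization on the input feature; see [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022) for details.
#### Parameters
| Type    | Parameter | Description                                        |
| ------- | --------- | -------------------------------------------------- |
| `float` | `epsilon` | Used to avoid division by zero. Defaults to 1e-05  |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of input channels, and H and W are the height and width of the input feature map.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input scale; 1-D tensor of shape (C,).</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Input bias; 1-D tensor of shape (C,).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, C, H, W).</dd>
</dl>
#### Type Constraints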
- T:tensor(float32, Linear)
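Instance normalization computes y = scale * (x - mean) / sqrt(variance + epsilon) + B, where mean and variance are taken over H and W separately for each instance and channel. A NumPy reference that could be used to check plugin outputs is sketched below; `instance_norm` is a hypothetical helper, and it assumes the biased (population) variance, as is usual for normalization layers.

```python
import numpy as np

def instance_norm(x, scale, bias, epsilon=1e-5):
    """Hypothetical reference instance normalization for an (N, C, H, W) array."""
    # Statistics are computed over the spatial axes for each (n, c) pair.
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)  # biased (population) variance
    x_hat = (x - mean) / np.sqrt(var + epsilon)
    # scale and bias have shape (C,) and broadcast over N, H and W.
    return scale.reshape(1, -1, 1, 1) * x_hat + bias.reshape(1, -1, 1, 1)
```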
### MMCVModulatedDeformConv2d
#### Description
Performs Modulated Deformable Convolution on the input feature; see [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for details.
#### Parameters
| Type           | Parameter           | Description                                                                                   |
| -------------- | ------------------- | --------------------------------------------------------------------------------------------- |
| `list of ints` | `stride`            | Stride of the convolution (sH, sW)                                                            |
| `list of ints` | `padding`           | Padding of the input feature (padH, padW)                                                     |
| `list of ints` | `dilation`          | Spacing between kernel elements (dH, dW)                                                      |
| `int`          | `deformable_groups` | Number of groups for the deformable offset; usually set to 1                                  |
| `int`          | `groups`            | Number of convolution groups; `input_channel` is split into this many groups for computation  |
#### Inputs
<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of input channels, and inH and inW are the height and width of the input feature map.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input offset; 4-D tensor of shape (N, deformable_group * 2 * kH * kW, outH, outW), where kH and kW are the height and width of the kernel, and outH and outW are the height and width of the output.</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Input mask; 4-D tensor of shape (N, deformable_group * kH * kW, outH, outW).</dd>
<dt><tt>inputs[3]</tt>: T</dt>
<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
<dt><tt>inputs[4]</tt>: T, optional</dt>
<dd>Input bias; 1-D tensor of shape (output_channel).</dd>
</dl>
#### Outputs
<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
</dl>
#### Type Constraints
- T:tensor(float32, Linear)
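Compared with MMCVDeformConv2d, the modulated variant adds a mask input whose layout mirrors the offset tensor but with one value per sampling point instead of two. A small shape check like the following can catch mismatches before an engine is built. It is a hypothetical helper, not part of MMCV, and it assumes offset and mask share the batch size and output spatial size, as the shapes listed above indicate.

```python
def check_modulated_deform_inputs(input_shape, offset_shape, mask_shape,
                                  weight_shape, deformable_group):
    """Hypothetical helper: raise ValueError if offset/mask shapes do not
    match the layout documented above."""
    n = input_shape[0]
    k_h, k_w = weight_shape[2], weight_shape[3]
    out_h, out_w = offset_shape[2], offset_shape[3]
    # Offset: two values per sampling point; mask: one value per sampling point.
    expected_offset = (n, deformable_group * 2 * k_h * k_w, out_h, out_w)
    expected_mask = (n, deformable_group * k_h * k_w, out_h, out_w)
    if tuple(offset_shape) != expected_offset:
        raise ValueError(f"offset shape {tuple(offset_shape)} != {expected_offset}")
    if tuple(mask_shape) != expected_mask:
        raise ValueError(f"mask shape {tuple(mask_shape)} != {expected_mask}")
    return True
```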
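# TensorRT Plugins for Custom Operators in MMCV (Experimental)
## TensorRT Custom Operators in MMCV (Experimental)
Contributions to translating the MMCV documentation are welcome. If you are interested, please open an issue at [MMCV issue](https://github.com/open-mmlab/mmcv/issues) to claim the document you want to translate.
<!-- TOC -->
- [TensorRT Custom Operators in MMCV (Experimental)](#mmcv中的tensorrt自定义算子-实验性)
  - [Introduction](#介绍)
  - [List of TensorRT plugins in MMCV](#mmcv中的tensorrt插件列表)
  - [How to build TensorRT plugins in MMCV](#如何编译mmcv中的tensorrt插件)
    - [Prerequisite](#准备)
    - [Build on Linux](#在linux上编译)
  - [Create TensorRT engine and run inference in Python](#创建tensorrt推理引擎并在python下进行推理)
  - [How to add a TensorRT plugin for custom op in MMCV](#如何在mmcv中添加新的tensorrt自定义算子)
    - [Main procedures](#主要流程)
    - [Reminders](#注意)
  - [Known Issues](#已知问题)
  - [References](#引用)
<!-- TOC -->
### Introduction
**NVIDIA TensorRT** is a software development kit (SDK) for high-performance inference of deep learning models. It includes a deep learning inference optimizer and runtime that deliver low latency and high throughput for deep learning inference applications. Please check its [developer's website](https://developer.nvidia.com/tensorrt) for more information.
To simplify deploying models that contain MMCV custom operators with TensorRT, a series of TensorRT plugins are included in MMCV.
### List of TensorRT plugins in MMCV
|       ONNX Operator       |                                 TensorRT Plugin                                 | MMCV Releases |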
| :-----------------------: | :-----------------------------------------------------------------------------: | :------: |
| MMCVRoiAlign | [MMCVRoiAlign](./tensorrt_custom_ops.md#mmcvroialign) | 1.2.6 |
| ScatterND | [ScatterND](./tensorrt_custom_ops.md#scatternd) | 1.2.6 |
| NonMaxSuppression | [NonMaxSuppression](./tensorrt_custom_ops.md#nonmaxsuppression) | 1.3.0 |
| MMCVDeformConv2d | [MMCVDeformConv2d](./tensorrt_custom_ops.md#mmcvdeformconv2d) | 1.3.0 |
| grid_sampler | [grid_sampler](./tensorrt_custom_ops.md#grid-sampler) | 1.3.1 |
| cummax | [cummax](./tensorrt_custom_ops.md#cummax) | 1.3.5 |
| cummin | [cummin](./tensorrt_custom_ops.md#cummin) | 1.3.5 |
| MMCVInstanceNormalization | [MMCVInstanceNormalization](./tensorrt_custom_ops.md#mmcvinstancenormalization) | 1.3.5 |
| MMCVModulatedDeformConv2d | [MMCVModulatedDeformConv2d](./tensorrt_custom_ops.md#mmcvmodulateddeformconv2d) | master |
Notes
- All plugins listed above were developed on TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0.
### How to build TensorRT plugins in MMCV
#### Prerequisite
- Clone repository
```bash
git clone https://github.com/open-mmlab/mmcv.git
```
- Install TensorRT
Download the appropriate TensorRT version from [NVIDIA Developer Zone](https://developer.nvidia.com/nvidia-tensorrt-download).
For example, for Ubuntu 16.04 on x86-64 with cuda-10.2, the downloaded file is `TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0.tar.gz`.
Then install it and set up the environment as follows:
```bash
cd ~/Downloads
tar -xvzf TensorRT-7.2.1.6.Ubuntu-16.04.x86_64-gnu.cuda-10.2.cudnn8.0.tar.gz
export TENSORRT_DIR=`pwd`/TensorRT-7.2.1.6
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TENSORRT_DIR/lib
```
Install the Python dependencies: tensorrt, graphsurgeon, onnx-graphsurgeon
```bash
pip install $TENSORRT_DIR/python/tensorrt-7.2.1.6-cp37-none-linux_x86_64.whl
pip install $TENSORRT_DIR/onnx_graphsurgeon/onnx_graphsurgeon-0.2.6-py2.py3-none-any.whl
pip install $TENSORRT_DIR/graphsurgeon/graphsurgeon-0.4.5-py2.py3-none-any.whl
```
For more information on installing TensorRT from a tar package, please refer to [NVIDIA's website](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-721/install-guide/index.html#installing-tar).
#### Build on Linux
```bash
cd mmcv # to MMCV root directory
MMCV_WITH_OPS=1 MMCV_WITH_TRT=1 pip install -e .
```
### Create TensorRT engine and run inference in Python
Here is an example:
```python
import torch
import onnx
from mmcv.tensorrt import (TRTWrapper, onnx2trt, save_trt_engine,
                           is_tensorrt_plugin_loaded)
assert is_tensorrt_plugin_loaded(), 'TensorRT plugins in mmcv must be compiled first'
onnx_file = 'sample.onnx'
trt_file = 'sample.trt'
onnx_model = onnx.load(onnx_file)
# Model input
inputs = torch.rand(1, 3, 224, 224).cuda()
# Model input shape info
opt_shape_dict = {
    'input': [list(inputs.shape),
              list(inputs.shape),
              list(inputs.shape)]
}
# Create TensorRT engine
max_workspace_size = 1 << 30
trt_engine = onnx2trt(
    onnx_model,
    opt_shape_dict,
    max_workspace_size=max_workspace_size)
# Save TensorRT engine
save_trt_engine(trt_engine, trt_file)
# Run inference with TensorRT
trt_model = TRTWrapper(trt_file, ['input'], ['output'])
with torch.no_grad():
    trt_outputs = trt_model({'input': inputs})
    output = trt_outputs['output']
```
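In the example above, each entry of `opt_shape_dict` lists three shapes for an input, which, as we read MMCV's `onnx2trt`, are used as the minimum, optimal, and maximum shapes of the TensorRT optimization profile; the example pins all three to the same static shape. To accept a range of batch sizes, the three shapes can differ. The helper below is a hypothetical sketch that only builds the dictionary and does not call TensorRT; the [min, opt, max] ordering is the assumption stated above.

```python
def make_opt_shape_dict(input_name, base_shape, min_batch, opt_batch, max_batch):
    """Hypothetical helper: build an opt_shape_dict entry with a variable batch.

    Returns {input_name: [min_shape, opt_shape, max_shape]}, copying every
    non-batch dimension from `base_shape`.
    """
    if not 1 <= min_batch <= opt_batch <= max_batch:
        raise ValueError("expected 1 <= min_batch <= opt_batch <= max_batch")
    rest = list(base_shape[1:])
    return {
        input_name: [[min_batch] + rest,
                     [opt_batch] + rest,
                     [max_batch] + rest]
    }
```

For example, `make_opt_shape_dict('input', (1, 3, 224, 224), 1, 4, 8)` describes an engine that accepts batches of 1 to 8 and is tuned for 4. Passing such a dictionary to `onnx2trt` only makes sense if the ONNX model was exported with a dynamic batch axis.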
### How to add a TensorRT plugin for custom op in MMCV
#### Main procedures
Below are the main steps:
1. Add the C++ header file
2. Add the C++ source file
3. Add the CUDA kernel file
4. Register the plugin in `trt_plugin.cpp`
5. Add unit tests in `tests/test_ops/test_tensorrt.py`
**Take the RoIAlign plugin `roi_align` as an example.**
1. Add the header file `trt_roi_align.hpp` to the TensorRT include directory `mmcv/ops/csrc/tensorrt/`
2. Add the source file `trt_roi_align.cpp` to the TensorRT source directory `mmcv/ops/csrc/tensorrt/plugins/`
3. Add the CUDA kernel file `trt_roi_align_kernel.cu` to the TensorRT source directory `mmcv/ops/csrc/tensorrt/plugins/`
4. Register the `roi_align` plugin in [trt_plugin.cpp](https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/csrc/tensorrt/plugins/trt_plugin.cpp)
```c++
#include "trt_plugin.hpp"
#include "trt_roi_align.hpp"
REGISTER_TENSORRT_PLUGIN(RoIAlignPluginDynamicCreator);
extern "C" {
bool initLibMMCVInferPlugins() { return true; }
} // extern "C"
```
5.`tests/test_ops/test_tensorrt.py`中添加单元测试
#### 注意
- 部分MMCV中的自定义算子存在对应的cuda实现,在进行TensorRT插件开发的时候可以参考。
### 已知问题
-
### 引用
- [Developer guide of Nvidia TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html)
- [TensorRT Open Source Software](https://github.com/NVIDIA/TensorRT)
- [onnx-tensorrt](https://github.com/onnx/onnx-tensorrt)
- [TensorRT python API](https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/index.html)
- [TensorRT c++ plugin API](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_plugin.html)