[Docs] Replace markdownlint with mdformat for avoiding installing ruby (#1936)

* Use mdformat pre-commit hook
* allows consecutive numbering
* improve .mdformat.toml
* test mdformat
* format markdown
* minor fix
* fix codespell
* fix circleci
* add linkify-it-py dependency for circleci
* add comments
* replace flake8 url
* add mdformat-myst dependency
* remove mdformat-myst dependency
* update contributing.md

Commit b326a219, authored May 16, 2022 by Zaida Zhou, committed by GitHub May 16, 2022
20 changed files
--- a/docs/en/understand_mmcv/cnn.md
+++ b/docs/en/understand_mmcv/cnn.md
--- a/docs/en/understand_mmcv/registry.md
+++ b/docs/en/understand_mmcv/registry.md
@@ -51,6 +51,7 @@ class Converter1(object):
        self.a = a
        self.b = b
 ```
+
 ```python
 # converter2.py
 from .builder import CONVERTERS
@@ -61,6 +62,7 @@ from .converter1 import Converter1
 def converter2(a, b):
    return Converter1(a, b)
 ```
+
 The key step to use registry for managing the modules is to register the implemented module into the registry `CONVERTERS` through
 `@CONVERTERS.register_module()` when you are creating the module. By this way, a mapping between a string and the class (function) is built and maintained by `CONVERTERS` as below

@@ -68,6 +70,7 @@ The key step to use registry for managing the modules is to register the impleme
 'Converter1' -> <class 'Converter1'>
 'converter2' -> <function 'converter2'>
 ```
+
 ```{note}
 The registry mechanism will be triggered only when the file where the module is located is imported.
 So you need to import that file somewhere. More details can be found at https://github.com/open-mmlab/mmdetection/issues/5974.
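
The string-to-object mapping described in `registry.md` can be illustrated with a small, self-contained sketch. `MiniRegistry` below is a hypothetical stand-in written for this illustration; it is not MMCV's `Registry` class and only mimics the `register_module()` decorator shown in the documentation. It also shows why the defining file must be imported: the decorator runs, and the mapping is filled, only when the module body executes.

```python
class MiniRegistry:
    """Hypothetical, simplified registry; not the MMCV implementation."""

    def __init__(self, name):
        self.name = name
        self._module_dict = {}

    def register_module(self):
        # Return a decorator that stores the class or function under its name.
        def _register(obj):
            self._module_dict[obj.__name__] = obj
            return obj

        return _register

    def get(self, key):
        # Look up a registered class or function by its string key.
        return self._module_dict[key]


CONVERTERS = MiniRegistry('converter')


@CONVERTERS.register_module()
class Converter1:
    def __init__(self, a, b):
        self.a = a
        self.b = b


@CONVERTERS.register_module()
def converter2(a, b):
    return Converter1(a, b)


# Both a class and a function can be retrieved by name and then called.
converter = CONVERTERS.get('converter2')(a=1, b=2)
```

Because registration happens as a side effect of executing the decorated definitions, a file that is never imported never adds its entries to the mapping.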

--- a/docs/en/understand_mmcv/runner.md
+++ b/docs/en/understand_mmcv/runner.md
@@ -8,7 +8,7 @@ The runner class is designed to manage the training. It eases the training proce

 ### EpochBasedRunner

-As its name indicates, workflow in `EpochBasedRunner` should be set based on epochs. For example, [('train', 2), ('val', 1)] means running 2 epochs for training and 1 epoch for validation, iteratively. And each epoch may contain multiple iterations. Currently, MMDetection uses `EpochBasedRunner` by default.
+As its name indicates, workflow in `EpochBasedRunner` should be set based on epochs. For example, \[('train', 2), ('val', 1)\] means running 2 epochs for training and 1 epoch for validation, iteratively. And each epoch may contain multiple iterations. Currently, MMDetection uses `EpochBasedRunner` by default.

 Let's take a look at its core logic:

@@ -44,7 +44,7 @@ def train(self, data_loader, **kwargs):

 ### IterBasedRunner

-Different from `EpochBasedRunner`, workflow in `IterBasedRunner` should be set based on iterations. For example, [('train', 2), ('val', 1)] means running 2 iters for training and 1 iter for validation, iteratively. Currently, MMSegmentation uses `IterBasedRunner` by default.
+Different from `EpochBasedRunner`, workflow in `IterBasedRunner` should be set based on iterations. For example, \[('train', 2), ('val', 1)\] means running 2 iters for training and 1 iter for validation, iteratively. Currently, MMSegmentation uses `IterBasedRunner` by default.

 Let's take a look at its core logic:

@@ -156,8 +156,8 @@ runner.run(data_loaders, cfg.workflow)

 Let's take `EpochBasedRunner` for example and go a little bit into details about setting workflow:

- Say we only want to put train in the workflow, then we can set: workflow = [('train', 1)]. The runner will only execute train iteratively in this case.
- Say we want to put both train and val in the workflow, then we can set: workflow = [('train', 3), ('val',1)]. The runner will first execute train for 3 epochs and then switch to val mode and execute val for 1 epoch. The workflow will be repeated until the current epoch hit the max_epochs.
- Workflow is highly flexible. Therefore, you can set workflow = [('val', 1), ('train',1)] if you would like the runner to validate first and train after.
+- Say we only want to put train in the workflow, then we can set: workflow = \[('train', 1)\]. The runner will only execute train iteratively in this case.
+- Say we want to put both train and val in the workflow, then we can set: workflow = \[('train', 3), ('val',1)\]. The runner will first execute train for 3 epochs and then switch to val mode and execute val for 1 epoch. The workflow will be repeated until the current epoch hit the max_epochs.
+- Workflow is highly flexible. Therefore, you can set workflow = \[('val', 1), ('train',1)\] if you would like the runner to validate first and train after.

 The code we demonstrated above is already in `train.py` in MM repositories. Simply modify the corresponding keys in the configuration files and the script will execute the expected workflow automatically.
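
The workflow settings described in `runner.md` can be sketched as a plain function that expands a workflow into the sequence of modes the runner would execute. This is a simplified illustration rather than the `EpochBasedRunner` code: `expand_workflow` is a hypothetical helper, and it assumes that only `train` epochs count toward `max_epochs`.

```python
def expand_workflow(workflow, max_epochs):
    """Return the order of modes executed for a given workflow.

    Assumption for this sketch: only 'train' epochs advance the epoch
    counter that is compared against max_epochs.
    """
    schedule = []
    epoch = 0
    while epoch < max_epochs:
        for mode, num_epochs in workflow:
            for _ in range(num_epochs):
                if mode == 'train':
                    if epoch >= max_epochs:
                        break  # training budget is used up
                    epoch += 1
                schedule.append(mode)
    return schedule


# Train for 3 epochs, validate for 1, and repeat until 4 train epochs are done.
schedule = expand_workflow([('train', 3), ('val', 1)], max_epochs=4)
```

Swapping the tuples, for example `[('val', 1), ('train', 1)]`, makes the sketch emit a validation epoch before each training epoch, matching the "validate first and train after" case in the list above.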
--- a/docs/zh_cn/community/contributing.md
+++ b/docs/zh_cn/community/contributing.md
@@ -7,7 +7,9 @@
 - 添加新功能和新组件

 ### 工作流
+
 | 详细工作流见 [拉取请求](pr.md)
+
 1. 复刻并拉取最新的 OpenMMLab 算法库
 2. 创建新的分支（不建议使用主分支提拉取请求）
 3. 提交你的修改
@@ -16,16 +18,18 @@
 ```{note}
 如果你计划添加新功能并且该功能包含比较大的改动，建议先开 issue 讨论
 ```
+
 ### 代码风格

 #### Python

 [PEP8](https://www.python.org/dev/peps/pep-0008/) 作为 OpenMMLab 算法库首选的代码规范，我们使用以下工具检查和格式化代码

- [flake8](http://flake8.pycqa.org/en/latest/): Python 官方发布的代码规范检查工具，是多个检查工具的封装
- [yapf](https://github.com/google/yapf): Google 发布的代码规范检查工具
+- [flake8](https://github.com/PyCQA/flake8): Python 官方发布的代码规范检查工具，是多个检查工具的封装
 - [isort](https://github.com/timothycrosley/isort): 自动调整模块导入顺序的工具
- [markdownlint](https://github.com/markdownlint/markdownlint): 检查 markdown 文件的工具
+- [yapf](https://github.com/google/yapf): Google 发布的代码规范检查工具
+- [codespell](https://github.com/codespell-project/codespell): 检查单词拼写是否有误
+- [mdformat](https://github.com/executablebooks/mdformat): 检查 markdown 文件的工具
 - [docformatter](https://github.com/myint/docformatter): 格式化 docstring 的工具

 yapf 和 isort 的配置可以在 [setup.cfg](./setup.cfg) 找到
@@ -46,23 +50,7 @@ pip install -U pre-commit
 pre-commit install
 ```

-如果安装 markdownlint 遇到了问题，可以尝试使用以下的步骤安装 ruby
-
-```shell
-# install rvm
-curl -L https://get.rvm.io | bash -s -- --autolibs=read-fail
-[[ -s "$HOME/.rvm/scripts/rvm" ]] && source "$HOME/.rvm/scripts/rvm"
-rvm autolibs disable
-
-# install ruby
-rvm install 2.7.1
-```
-
-或者参考 [这个代码库](https://github.com/innerlee/setup) 和 [`zzruby.sh`](https://github.com/innerlee/setup/blob/master/zzruby.sh)。
-
-至此，每一次 commit 修改都会触发 pre-commit 检查代码格式。
-
->提交拉取请求前，请确保你的代码符合 yapf 的格式
+> 提交拉取请求前，请确保你的代码符合 yapf 的格式

 #### C++ and CUDA


--- a/docs/zh_cn/community/pr.md
+++ b/docs/zh_cn/community/pr.md
@@ -21,10 +21,10 @@

 #### 1. 获取最新的代码库

-+ 当你第一次提 PR 时
+- 当你第一次提 PR 时

  复刻 OpenMMLab 原代码库，点击 GitHub 页面右上角的 **Fork** 按钮即可
-    ![avatar](../../en/_static/community/1.png)
+  ![avatar](../../en/_static/community/1.png)

  克隆复刻的代码库到本地

@@ -38,14 +38,14 @@
  git remote add upstream git@github.com:open-mmlab/mmcv
  ```

-+ 从第二个 PR 起
+- 从第二个 PR 起

  检出本地代码库的主分支，然后从最新的原代码库的主分支拉取更新

  ```bash
  git checkout master
  git pull upstream master
-   ```
+  ```

 #### 2. 从主分支创建一个新的开发分支

@@ -56,6 +56,7 @@ git checkout -b branchname
 ```{tip}
 为了保证提交历史清晰可读，我们强烈推荐您先检出主分支 (master)，再创建新的分支。
 ```
+
 #### 3. 提交你的修改

 ```bash
@@ -66,23 +67,23 @@ git commit -m 'messages'

 #### 4. 推送你的修改到复刻的代码库，并创建一个`拉取请求`

-+ 推送当前分支到远端复刻的代码库
+- 推送当前分支到远端复刻的代码库

-    ```bash
-    git push origin branchname
-    ```
+  ```bash
+  git push origin branchname
+  ```

-+ 创建一个`拉取请求`
-![avatar](../../en/_static/community/2.png)
+- 创建一个`拉取请求`
+  ![avatar](../../en/_static/community/2.png)

-+ 修改`拉取请求`信息模板，描述修改原因和修改内容。还可以在 PR 描述中，手动关联到相关的`议题` (issue),（更多细节，请参考[官方文档](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue)）。
+- 修改`拉取请求`信息模板，描述修改原因和修改内容。还可以在 PR 描述中，手动关联到相关的`议题` (issue),（更多细节，请参考[官方文档](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue)）。

 #### 5. 讨论并评审你的代码

-+ 创建`拉取请求`时，可以关联给相关人员进行评审
-![avatar](../../en/_static/community/3.png)
+- 创建`拉取请求`时，可以关联给相关人员进行评审
+  ![avatar](../../en/_static/community/3.png)

-+ 根据评审人员的意见修改代码，并推送修改
+- 根据评审人员的意见修改代码，并推送修改

 #### 6. `拉取请求`合并之后删除该分支

@@ -99,15 +100,15 @@ git push origin --delete branchname # delete remote branch

 3. 粒度要细，一个PR只做一件事情，避免超大的PR

-    + Bad：实现 Faster R-CNN
-    + Acceptable：给 Faster R-CNN 添加一个 box head
-    + Good：给 box head 增加一个参数来支持自定义的 conv 层数
+   - Bad：实现 Faster R-CNN
+   - Acceptable：给 Faster R-CNN 添加一个 box head
+   - Good：给 box head 增加一个参数来支持自定义的 conv 层数

 4. 每次 Commit 时需要提供清晰且有意义 commit 信息

 5. 提供清晰且有意义的`拉取请求`描述

-    + 标题写明白任务名称，一般格式:[Prefix] Short description of the pull request (Suffix)
-    + prefix: 新增功能 [Feature], 修 bug [Fix], 文档相关 [Docs], 开发中 [WIP] (暂时不会被review)
-    + 描述里介绍`拉取请求`的主要修改内容，结果，以及对其他部分的影响, 参考`拉取请求`模板
-    + 关联相关的`议题` (issue) 和其他`拉取请求`
+   - 标题写明白任务名称，一般格式:\[Prefix\] Short description of the pull request (Suffix)
+   - prefix: 新增功能 \[Feature\], 修 bug \[Fix\], 文档相关 \[Docs\], 开发中 \[WIP\] (暂时不会被review)
+   - 描述里介绍`拉取请求`的主要修改内容，结果，以及对其他部分的影响, 参考`拉取请求`模板
+   - 关联相关的`议题` (issue) 和其他`拉取请求`
--- a/docs/zh_cn/deployment/onnxruntime_custom_ops.md
+++ b/docs/zh_cn/deployment/onnxruntime_custom_ops.md
@@ -2,55 +2,55 @@

 <!-- TOC -->

- [ONNX Runtime自定义算子](#onnx-runtime自定义算子)
+- [ONNX Runtime自定义算子](#onnx-runtime%E8%87%AA%E5%AE%9A%E4%B9%89%E7%AE%97%E5%AD%90)
  - [SoftNMS](#softnms)
-    - [描述](#描述)
-    - [模型参数](#模型参数)
-    - [输入](#输入)
-    - [输出](#输出)
-    - [类型约束](#类型约束)
+    - [描述](#%E6%8F%8F%E8%BF%B0)
+    - [模型参数](#%E6%A8%A1%E5%9E%8B%E5%8F%82%E6%95%B0)
+    - [输入](#%E8%BE%93%E5%85%A5)
+    - [输出](#%E8%BE%93%E5%87%BA)
+    - [类型约束](#%E7%B1%BB%E5%9E%8B%E7%BA%A6%E6%9D%9F)
  - [RoIAlign](#roialign)
-    - [描述](#描述-1)
-    - [模型参数](#模型参数-1)
-    - [输入](#输入-1)
-    - [输出](#输出-1)
-    - [类型约束](#类型约束-1)
+    - [描述](#%E6%8F%8F%E8%BF%B0-1)
+    - [模型参数](#%E6%A8%A1%E5%9E%8B%E5%8F%82%E6%95%B0-1)
+    - [输入](#%E8%BE%93%E5%85%A5-1)
+    - [输出](#%E8%BE%93%E5%87%BA-1)
+    - [类型约束](#%E7%B1%BB%E5%9E%8B%E7%BA%A6%E6%9D%9F-1)
  - [NMS](#nms)
-    - [描述](#描述-2)
-    - [模型参数](#模型参数-2)
-    - [输入](#输入-2)
-    - [输出](#输出-2)
-    - [类型约束](#类型约束-2)
+    - [描述](#%E6%8F%8F%E8%BF%B0-2)
+    - [模型参数](#%E6%A8%A1%E5%9E%8B%E5%8F%82%E6%95%B0-2)
+    - [输入](#%E8%BE%93%E5%85%A5-2)
+    - [输出](#%E8%BE%93%E5%87%BA-2)
+    - [类型约束](#%E7%B1%BB%E5%9E%8B%E7%BA%A6%E6%9D%9F-2)
  - [grid_sampler](#grid_sampler)
-    - [描述](#描述-3)
-    - [模型参数](#模型参数-3)
-    - [输入](#输入-3)
-    - [输出](#输出-3)
-    - [类型约束](#类型约束-3)
+    - [描述](#%E6%8F%8F%E8%BF%B0-3)
+    - [模型参数](#%E6%A8%A1%E5%9E%8B%E5%8F%82%E6%95%B0-3)
+    - [输入](#%E8%BE%93%E5%85%A5-3)
+    - [输出](#%E8%BE%93%E5%87%BA-3)
+    - [类型约束](#%E7%B1%BB%E5%9E%8B%E7%BA%A6%E6%9D%9F-3)
  - [CornerPool](#cornerpool)
-    - [描述](#描述-4)
-    - [模型参数](#模型参数-4)
-    - [输入](#输入-4)
-    - [输出](#输出-4)
-    - [类型约束](#类型约束-4)
+    - [描述](#%E6%8F%8F%E8%BF%B0-4)
+    - [模型参数](#%E6%A8%A1%E5%9E%8B%E5%8F%82%E6%95%B0-4)
+    - [输入](#%E8%BE%93%E5%85%A5-4)
+    - [输出](#%E8%BE%93%E5%87%BA-4)
+    - [类型约束](#%E7%B1%BB%E5%9E%8B%E7%BA%A6%E6%9D%9F-4)
  - [cummax](#cummax)
-    - [描述](#描述-5)
-    - [模型参数](#模型参数-5)
-    - [输入](#输入-5)
-    - [输出](#输出-5)
-    - [类型约束](#类型约束-5)
+    - [描述](#%E6%8F%8F%E8%BF%B0-5)
+    - [模型参数](#%E6%A8%A1%E5%9E%8B%E5%8F%82%E6%95%B0-5)
+    - [输入](#%E8%BE%93%E5%85%A5-5)
+    - [输出](#%E8%BE%93%E5%87%BA-5)
+    - [类型约束](#%E7%B1%BB%E5%9E%8B%E7%BA%A6%E6%9D%9F-5)
  - [cummin](#cummin)
-    - [描述](#描述-6)
-    - [模型参数](#模型参数-6)
-    - [输入](#输入-6)
-    - [输出](#输出-6)
-    - [类型约束](#类型约束-6)
+    - [描述](#%E6%8F%8F%E8%BF%B0-6)
+    - [模型参数](#%E6%A8%A1%E5%9E%8B%E5%8F%82%E6%95%B0-6)
+    - [输入](#%E8%BE%93%E5%85%A5-6)
+    - [输出](#%E8%BE%93%E5%87%BA-6)
+    - [类型约束](#%E7%B1%BB%E5%9E%8B%E7%BA%A6%E6%9D%9F-6)
  - [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d)
-    - [描述](#描述-7)
-    - [模型参数](#模型参数-7)
-    - [输入](#输入-7)
-    - [输出](#输出-7)
-    - [类型约束](#类型约束-7)
+    - [描述](#%E6%8F%8F%E8%BF%B0-7)
+    - [模型参数](#%E6%A8%A1%E5%9E%8B%E5%8F%82%E6%95%B0-7)
+    - [输入](#%E8%BE%93%E5%85%A5-7)
+    - [输出](#%E8%BE%93%E5%87%BA-7)
+    - [类型约束](#%E7%B1%BB%E5%9E%8B%E7%BA%A6%E6%9D%9F-7)

 <!-- TOC -->

@@ -62,13 +62,13 @@

 #### 模型参数

-| 类型    | 参数名          | 描述                                                    |
-| ------- | --------------- | ------------------------------------------------------- |
-| `float` | `iou_threshold` | 用来判断候选框重合度的阈值，取值范围[0, 1]。默认值为0   |
-| `float` | `sigma`         | 高斯方法的超参数                                        |
-| `float` | `min_score`     | NMS的score阈值                                          |
+| 类型      | 参数名             | 描述                                                 |
+| ------- | --------------- | -------------------------------------------------- |
+| `float` | `iou_threshold` | 用来判断候选框重合度的阈值，取值范围\[0, 1\]。默认值为0                   |
+| `float` | `sigma`         | 高斯方法的超参数                                           |
+| `float` | `min_score`     | NMS的score阈值                                        |
 | `int`   | `method`        | NMS的计算方式, (0: `naive`, 1: `linear`, 2: `gaussian`) |
-| `int`   | `offset`        | 用来计算候选框的宽高(x2 - x1 + offset)。可选值0或1      |
+| `int`   | `offset`        | 用来计算候选框的宽高(x2 - x1 + offset)。可选值0或1                |

 #### 输入

@@ -100,13 +100,13 @@

 #### 模型参数

-| 类型    | 参数名           | 描述                                                    |
-| ------- | ---------------- | ------------------------------------------------------- |
-| `int`   | `output_height`  | roi特征的输出高度                                       |
-| `int`   | `output_width`   | roi特征的输出宽度                                       |
-| `float` | `spatial_scale`  | 输入检测框的缩放系数                                    |
-| `int`   | `sampling_ratio` | 输出的采样率。`0`表示使用密集采样                       |
-| `str`   | `mode`           | 池化方式。 `avg`或`max`                                 |
+| 类型      | 参数名              | 描述                                  |
+| ------- | ---------------- | ----------------------------------- |
+| `int`   | `output_height`  | roi特征的输出高度                          |
+| `int`   | `output_width`   | roi特征的输出宽度                          |
+| `float` | `spatial_scale`  | 输入检测框的缩放系数                          |
+| `int`   | `sampling_ratio` | 输出的采样率。`0`表示使用密集采样                  |
+| `str`   | `mode`           | 池化方式。 `avg`或`max`                   |
 | `int`   | `aligned`        | 如果`aligned=1`，则像素会进行-0.5的偏移以达到更好的对齐 |

 #### 输入
@@ -137,10 +137,10 @@

 #### 模型参数

-| 类型    | 参数名          | 描述                                                  |
-| ------- | --------------- | ----------------------------------------------------- |
-| `float` | `iou_threshold` | 用来判断候选框重合度的阈值，取值范围[0, 1]。默认值为0 |
-| `int`   | `offset`        | 用来计算候选框的宽高(x2 - x1 + offset)。可选值0或1    |
+| 类型      | 参数名             | 描述                                  |
+| ------- | --------------- | ----------------------------------- |
+| `float` | `iou_threshold` | 用来判断候选框重合度的阈值，取值范围\[0, 1\]。默认值为0    |
+| `int`   | `offset`        | 用来计算候选框的宽高(x2 - x1 + offset)。可选值0或1 |

 #### 输入

@@ -170,10 +170,10 @@

 #### 模型参数

-| 类型  | 参数名               | 描述                                                                                                                                                 |
-| ----- | -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `int` | `interpolation_mode` | 计算输出使用的插值模式。(0: `bilinear` , 1: `nearest`)                                                                                               |
-| `int` | `padding_mode`       | 边缘填充模式。(0: `zeros`, 1: `border`, 2: `reflection`)                                                                                             |
+| 类型    | 参数名                  | 描述                                                                                               |
+| ----- | -------------------- | ------------------------------------------------------------------------------------------------ |
+| `int` | `interpolation_mode` | 计算输出使用的插值模式。(0: `bilinear` , 1: `nearest`)                                                       |
+| `int` | `padding_mode`       | 边缘填充模式。(0: `zeros`, 1: `border`, 2: `reflection`)                                                |
 | `int` | `align_corners`      | 如果`align_corners=1`，则极值(`-1`和`1`)会被当做输入边缘像素的中心点。如果`align_corners=0`，则它们会被看做是边缘像素的边缘点,减小分辨率对采样的影响 |

 #### 输入
@@ -204,8 +204,8 @@

 #### 模型参数

-| 类型  | 参数名 | 描述                                                     |
-| ----- | ------ | -------------------------------------------------------- |
+| 类型    | 参数名    | 描述                                                  |
+| ----- | ------ | --------------------------------------------------- |
 | `int` | `mode` | 池化模式。(0: `top`, 1: `bottom`, 2: `left`, 3: `right`) |

 #### 输入
@@ -234,9 +234,9 @@

 #### 模型参数

-| 类型  | 参数名 | 描述               |
-| ----- | ------ | ------------------ |
-| `int` | `dim`  | 进行累计计算的维度 |
+| 类型    | 参数名   | 描述        |
+| ----- | ----- | --------- |
+| `int` | `dim` | 进行累计计算的维度 |

 #### 输入

@@ -266,9 +266,9 @@

 #### 模型参数

-| 类型  | 参数名 | 描述               |
-| ----- | ------ | ------------------ |
-| `int` | `dim`  | 进行累计计算的维度 |
+| 类型    | 参数名   | 描述        |
+| ----- | ----- | --------- |
+| `int` | `dim` | 进行累计计算的维度 |

 #### 输入

@@ -298,12 +298,12 @@

 #### 模型参数

-| 类型           | 参数名              | 描述                                                          |
-| -------------- | ------------------- | ------------------------------------------------------------- |
-| `list of ints` | `stride`            | 卷积的步长 (sH, sW)                                           |
-| `list of ints` | `padding`           | 输入特征填充大小 (padH, padW)                                 |
-| `list of ints` | `dilation`          | 卷积核各元素间隔 (dH, dW)                                     |
-| `int`          | `deformable_groups` | 可变偏移量的分组，通常置位1即可                               |
+| 类型             | 参数名                 | 描述                                     |
+| -------------- | ------------------- | -------------------------------------- |
+| `list of ints` | `stride`            | 卷积的步长 (sH, sW)                         |
+| `list of ints` | `padding`           | 输入特征填充大小 (padH, padW)                  |
+| `list of ints` | `dilation`          | 卷积核各元素间隔 (dH, dW)                      |
+| `int`          | `deformable_groups` | 可变偏移量的分组，通常置位1即可                       |
 | `int`          | `groups`            | 卷积分组数，`input_channel`会根据这个值被分为数个分组进行计算 |

 #### 输入

--- a/docs/zh_cn/deployment/onnxruntime_op.md
+++ b/docs/zh_cn/deployment/onnxruntime_op.md
@@ -15,16 +15,16 @@

 ### MMCV已支持的算子

-|                                       算子                                       |  CPU  |  GPU  | MMCV版本 |
-| :------------------------------------------------------------------------------: | :---: | :---: | :------: |
-|                   [SoftNMS](onnxruntime_custom_ops.md#softnms)                   |   Y   |   N   |  1.2.3   |
-|                  [RoIAlign](onnxruntime_custom_ops.md#roialign)                  |   Y   |   N   |  1.2.5   |
-|                       [NMS](onnxruntime_custom_ops.md#nms)                       |   Y   |   N   |  1.2.7   |
-|              [grid_sampler](onnxruntime_custom_ops.md#grid_sampler)              |   Y   |   N   |  1.3.1   |
-|                [CornerPool](onnxruntime_custom_ops.md#cornerpool)                |   Y   |   N   |  1.3.4   |
-|                    [cummax](onnxruntime_custom_ops.md#cummax)                    |   Y   |   N   |  1.3.4   |
-|                    [cummin](onnxruntime_custom_ops.md#cummin)                    |   Y   |   N   |  1.3.4   |
-| [MMCVModulatedDeformConv2d](onnxruntime_custom_ops.md#mmcvmodulateddeformconv2d) |   Y   |   N   |  1.3.12  |
+|                                        算子                                        | CPU | GPU | MMCV版本 |
+| :------------------------------------------------------------------------------: | :-: | :-: | :----: |
+|                   [SoftNMS](onnxruntime_custom_ops.md#softnms)                   |  Y  |  N  | 1.2.3  |
+|                  [RoIAlign](onnxruntime_custom_ops.md#roialign)                  |  Y  |  N  | 1.2.5  |
+|                       [NMS](onnxruntime_custom_ops.md#nms)                       |  Y  |  N  | 1.2.7  |
+|              [grid_sampler](onnxruntime_custom_ops.md#grid_sampler)              |  Y  |  N  | 1.3.1  |
+|                [CornerPool](onnxruntime_custom_ops.md#cornerpool)                |  Y  |  N  | 1.3.4  |
+|                    [cummax](onnxruntime_custom_ops.md#cummax)                    |  Y  |  N  | 1.3.4  |
+|                    [cummin](onnxruntime_custom_ops.md#cummin)                    |  Y  |  N  | 1.3.4  |
+| [MMCVModulatedDeformConv2d](onnxruntime_custom_ops.md#mmcvmodulateddeformconv2d) |  Y  |  N  | 1.3.12 |

 ### 如何编译ONNX Runtime自定义算子？

@@ -97,18 +97,20 @@ onnx_results = sess.run(None, {'input' : input_data})
 以`soft_nms`为例：

 1. 在ONNX Runtime头文件目录`mmcv/ops/csrc/onnxruntime/`下添加头文件`soft_nms.h`
+
 2. 在ONNX Runtime源码目录`mmcv/ops/csrc/onnxruntime/cpu/`下添加算子实现`soft_nms.cpp`
+
 3. 在[onnxruntime_register.cpp](../../mmcv/ops/csrc/onnxruntime/cpu/onnxruntime_register.cpp)中注册实现的算子`soft_nms`

-    ```c++
-    #include "soft_nms.h"
+   ```c++
+   #include "soft_nms.h"

-    SoftNmsOp c_SoftNmsOp;
+   SoftNmsOp c_SoftNmsOp;

-    if (auto status = ortApi->CustomOpDomain_Add(domain, &c_SoftNmsOp)) {
-    return status;
-    }
-    ```
+   if (auto status = ortApi->CustomOpDomain_Add(domain, &c_SoftNmsOp)) {
+   return status;
+   }
+   ```

 4. 在`tests/test_ops/test_onnx.py`添加单元测试，
   可以参考[here](../../tests/test_ops/test_onnx.py)。
@@ -118,8 +120,8 @@ onnx_results = sess.run(None, {'input' : input_data})
 ### 已知问题

 - "RuntimeError: tuple appears in op that does not forward tuples, unsupported kind: `prim::PythonOp`."
-   1. 请注意`cummax`和`cummin`算子是在torch >= 1.5.0被添加的。但他们需要在torch version >= 1.7.0才能正确导出。否则会在导出时发生上面的错误。
-   2. 解决方法：升级PyTorch到1.7.0以上版本
+  1. 请注意`cummax`和`cummin`算子是在torch >= 1.5.0被添加的。但他们需要在torch version >= 1.7.0才能正确导出。否则会在导出时发生上面的错误。
+  2. 解决方法：升级PyTorch到1.7.0以上版本

 ### 引用


--- a/docs/zh_cn/deployment/tensorrt_custom_ops.md
+++ b/docs/zh_cn/deployment/tensorrt_custom_ops.md
@@ -2,61 +2,61 @@

 <!-- TOC -->

- [TensorRT自定义算子](#tensorrt自定义算子)
+- [TensorRT自定义算子](#tensorrt%E8%87%AA%E5%AE%9A%E4%B9%89%E7%AE%97%E5%AD%90)
  - [MMCVRoIAlign](#mmcvroialign)
-    - [描述](#描述)
-    - [模型参数](#模型参数)
-    - [输入](#输入)
-    - [输出](#输出)
-    - [类型约束](#类型约束)
+    - [描述](#%E6%8F%8F%E8%BF%B0)
+    - [模型参数](#%E6%A8%A1%E5%9E%8B%E5%8F%82%E6%95%B0)
+    - [输入](#%E8%BE%93%E5%85%A5)
+    - [输出](#%E8%BE%93%E5%87%BA)
+    - [类型约束](#%E7%B1%BB%E5%9E%8B%E7%BA%A6%E6%9D%9F)
  - [ScatterND](#scatternd)
-    - [描述](#描述-1)
-    - [模型参数](#模型参数-1)
-    - [输入](#输入-1)
-    - [输出](#输出-1)
-    - [类型约束](#类型约束-1)
+    - [描述](#%E6%8F%8F%E8%BF%B0-1)
+    - [模型参数](#%E6%A8%A1%E5%9E%8B%E5%8F%82%E6%95%B0-1)
+    - [输入](#%E8%BE%93%E5%85%A5-1)
+    - [输出](#%E8%BE%93%E5%87%BA-1)
+    - [类型约束](#%E7%B1%BB%E5%9E%8B%E7%BA%A6%E6%9D%9F-1)
  - [NonMaxSuppression](#nonmaxsuppression)
-    - [描述](#描述-2)
-    - [模型参数](#模型参数-2)
-    - [输入](#输入-2)
-    - [输出](#输出-2)
-    - [类型约束](#类型约束-2)
+    - [描述](#%E6%8F%8F%E8%BF%B0-2)
+    - [模型参数](#%E6%A8%A1%E5%9E%8B%E5%8F%82%E6%95%B0-2)
+    - [输入](#%E8%BE%93%E5%85%A5-2)
+    - [输出](#%E8%BE%93%E5%87%BA-2)
+    - [类型约束](#%E7%B1%BB%E5%9E%8B%E7%BA%A6%E6%9D%9F-2)
  - [MMCVDeformConv2d](#mmcvdeformconv2d)
-    - [描述](#描述-3)
-    - [模型参数](#模型参数-3)
-    - [输入](#输入-3)
-    - [输出](#输出-3)
-    - [类型约束](#类型约束-3)
+    - [描述](#%E6%8F%8F%E8%BF%B0-3)
+    - [模型参数](#%E6%A8%A1%E5%9E%8B%E5%8F%82%E6%95%B0-3)
+    - [输入](#%E8%BE%93%E5%85%A5-3)
+    - [输出](#%E8%BE%93%E5%87%BA-3)
+    - [类型约束](#%E7%B1%BB%E5%9E%8B%E7%BA%A6%E6%9D%9F-3)
  - [grid_sampler](#grid_sampler)
-    - [描述](#描述-4)
-    - [模型参数](#模型参数-4)
-    - [输入](#输入-4)
-    - [输出](#输出-4)
-    - [类型约束](#类型约束-4)
+    - [描述](#%E6%8F%8F%E8%BF%B0-4)
+    - [模型参数](#%E6%A8%A1%E5%9E%8B%E5%8F%82%E6%95%B0-4)
+    - [输入](#%E8%BE%93%E5%85%A5-4)
+    - [输出](#%E8%BE%93%E5%87%BA-4)
+    - [类型约束](#%E7%B1%BB%E5%9E%8B%E7%BA%A6%E6%9D%9F-4)
  - [cummax](#cummax)
-    - [描述](#描述-5)
-    - [模型参数](#模型参数-5)
-    - [输入](#输入-5)
-    - [输出](#输出-5)
-    - [类型约束](#类型约束-5)
+    - [描述](#%E6%8F%8F%E8%BF%B0-5)
+    - [模型参数](#%E6%A8%A1%E5%9E%8B%E5%8F%82%E6%95%B0-5)
+    - [输入](#%E8%BE%93%E5%85%A5-5)
+    - [输出](#%E8%BE%93%E5%87%BA-5)
+    - [类型约束](#%E7%B1%BB%E5%9E%8B%E7%BA%A6%E6%9D%9F-5)
  - [cummin](#cummin)
-    - [描述](#描述-6)
-    - [模型参数](#模型参数-6)
-    - [输入](#输入-6)
-    - [输出](#输出-6)
-    - [类型约束](#类型约束-6)
+    - [描述](#%E6%8F%8F%E8%BF%B0-6)
+    - [模型参数](#%E6%A8%A1%E5%9E%8B%E5%8F%82%E6%95%B0-6)
+    - [输入](#%E8%BE%93%E5%85%A5-6)
+    - [输出](#%E8%BE%93%E5%87%BA-6)
+    - [类型约束](#%E7%B1%BB%E5%9E%8B%E7%BA%A6%E6%9D%9F-6)
  - [MMCVInstanceNormalization](#mmcvinstancenormalization)
-    - [描述](#描述-7)
-    - [模型参数](#模型参数-7)
-    - [输入](#输入-7)
-    - [输出](#输出-7)
-    - [类型约束](#类型约束-7)
+    - [描述](#%E6%8F%8F%E8%BF%B0-7)
+    - [模型参数](#%E6%A8%A1%E5%9E%8B%E5%8F%82%E6%95%B0-7)
+    - [输入](#%E8%BE%93%E5%85%A5-7)
+    - [输出](#%E8%BE%93%E5%87%BA-7)
+    - [类型约束](#%E7%B1%BB%E5%9E%8B%E7%BA%A6%E6%9D%9F-7)
  - [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d)
-    - [描述](#描述-8)
-    - [模型参数](#模型参数-8)
-    - [输入](#输入-8)
-    - [输出](#输出-8)
-    - [类型约束](#类型约束-8)
+    - [描述](#%E6%8F%8F%E8%BF%B0-8)
+    - [模型参数](#%E6%A8%A1%E5%9E%8B%E5%8F%82%E6%95%B0-8)
+    - [输入](#%E8%BE%93%E5%85%A5-8)
+    - [输出](#%E8%BE%93%E5%87%BA-8)
+    - [类型约束](#%E7%B1%BB%E5%9E%8B%E7%BA%A6%E6%9D%9F-8)

 <!-- TOC -->

@@ -68,13 +68,13 @@

 #### 模型参数

-| 类型    | 参数名           | 描述                                                    |
-| ------- | ---------------- | ------------------------------------------------------- |
-| `int`   | `output_height`  | roi特征的输出高度                                       |
-| `int`   | `output_width`   | roi特征的输出宽度                                       |
-| `float` | `spatial_scale`  | 输入检测框的缩放系数                                    |
-| `int`   | `sampling_ratio` | 输出的采样率。`0`表示使用密集采样                       |
-| `str`   | `mode`           | 池化方式。 `avg`或`max`                                 |
+| 类型      | 参数名              | 描述                                  |
+| ------- | ---------------- | ----------------------------------- |
+| `int`   | `output_height`  | roi特征的输出高度                          |
+| `int`   | `output_width`   | roi特征的输出宽度                          |
+| `float` | `spatial_scale`  | 输入检测框的缩放系数                          |
+| `int`   | `sampling_ratio` | 输出的采样率。`0`表示使用密集采样                  |
+| `str`   | `mode`           | 池化方式。 `avg`或`max`                   |
 | `int`   | `aligned`        | 如果`aligned=1`，则像素会进行-0.5的偏移以达到更好的对齐 |

 #### 输入
@@ -100,7 +100,7 @@

 #### 描述

-ScatterND接收三个输入，分别为秩为r >= 1的`data`，秩为q >= 1的`indices`以及秩为 q + r - indices.shape[-1] -1 的`update`。输出的计算方式为：首先创建一个`data`的拷贝，然后根据`indces`的值使用`update`对拷贝的`data`进行更新。注意`indices`中不应该存在相同的条目，也就是说对同一个位置进行一次以上的更新是不允许的。
+ScatterND接收三个输入，分别为秩为r >= 1的`data`，秩为q >= 1的`indices`以及秩为 q + r - indices.shape\[-1\] -1 的`update`。输出的计算方式为：首先创建一个`data`的拷贝，然后根据`indces`的值使用`update`对拷贝的`data`进行更新。注意`indices`中不应该存在相同的条目，也就是说对同一个位置进行一次以上的更新是不允许的。

 输出的计算方式可以参考如下代码：

@@ -147,13 +147,13 @@ ScatterND接收三个输入，分别为秩为r >= 1的`data`，秩为q >= 1的`i

 #### 模型参数

-| 类型    | 参数名                       | 描述                                                                                     |
-| ------- | ---------------------------- | ---------------------------------------------------------------------------------------- |
-| `int`   | `center_point_box`           | 0 - 候选框的格式为[y1, x1, y2, x2]， 1-候选框的格式为[x_center, y_center, width, height] |
-| `int`   | `max_output_boxes_per_class` | 每一类最大的输出检测框个数。默认为0，输出检测框个数等于输入候选框数                      |
-| `float` | `iou_threshold`              | 用来判断候选框重合度的阈值，取值范围[0, 1]。默认值为0                                    |
-| `float` | `score_threshold`            | 用来判断候选框是否合法的阈值                                                             |
-| `int`   | `offset`                     | 检测框长宽计算方式为(x2 - x1 + offset)，可选值0或1                                       |
+| 类型      | 参数名                          | 描述                                                                            |
+| ------- | ---------------------------- | ----------------------------------------------------------------------------- |
+| `int`   | `center_point_box`           | 0 - 候选框的格式为\[y1, x1, y2, x2\]， 1-候选框的格式为\[x_center, y_center, width, height\] |
+| `int`   | `max_output_boxes_per_class` | 每一类最大的输出检测框个数。默认为0，输出检测框个数等于输入候选框数                                            |
+| `float` | `iou_threshold`              | 用来判断候选框重合度的阈值，取值范围\[0, 1\]。默认值为0                                              |
+| `float` | `score_threshold`            | 用来判断候选框是否合法的阈值                                                                |
+| `int`   | `offset`                     | 检测框长宽计算方式为(x2 - x1 + offset)，可选值0或1                                           |

 #### 输入

@@ -185,13 +185,13 @@ ScatterND接收三个输入，分别为秩为r >= 1的`data`，秩为q >= 1的`i

 #### 模型参数

-| 类型           | 参数名             | 描述                                                                                          |
-| -------------- | ------------------ | --------------------------------------------------------------------------------------------- |
-| `list of ints` | `stride`           | 卷积的步长 (sH, sW)                                                                           |
-| `list of ints` | `padding`          | 输入特征填充大小 (padH, padW)                                                                 |
-| `list of ints` | `dilation`         | 卷积核各元素间隔 (dH, dW)                                                                     |
-| `int`          | `deformable_group` | 可变偏移量的分组                                                                              |
-| `int`          | `group`            | 卷积分组数，`input_channel`会根据这个值被分为数个分组进行计算                                 |
+| 类型             | 参数名                | 描述                                                      |
+| -------------- | ------------------ | ------------------------------------------------------- |
+| `list of ints` | `stride`           | 卷积的步长 (sH, sW)                                          |
+| `list of ints` | `padding`          | 输入特征填充大小 (padH, padW)                                   |
+| `list of ints` | `dilation`         | 卷积核各元素间隔 (dH, dW)                                       |
+| `int`          | `deformable_group` | 可变偏移量的分组                                                |
+| `int`          | `group`            | 卷积分组数，`input_channel`会根据这个值被分为数个分组进行计算                  |
 | `int`          | `im2col_step`      | 可变卷积使用im2col计算卷积。输入与偏移量会以im2col_step为步长分块计算，减少临时空间的使用量。 |

 #### 输入
@@ -224,10 +224,10 @@ ScatterND接收三个输入，分别为秩为r >= 1的`data`，秩为q >= 1的`i

 #### 模型参数

-| 类型  | 参数名               | 描述                                                                                                                                                 |
-| ----- | -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `int` | `interpolation_mode` | 计算输出使用的插值模式。(0: `bilinear` , 1: `nearest`)                                                                                               |
-| `int` | `padding_mode`       | 边缘填充模式。(0: `zeros`, 1: `border`, 2: `reflection`)                                                                                             |
+| 类型    | 参数名                  | 描述                                                                                               |
+| ----- | -------------------- | ------------------------------------------------------------------------------------------------ |
+| `int` | `interpolation_mode` | 计算输出使用的插值模式。(0: `bilinear` , 1: `nearest`)                                                       |
+| `int` | `padding_mode`       | 边缘填充模式。(0: `zeros`, 1: `border`, 2: `reflection`)                                                |
 | `int` | `align_corners`      | 如果`align_corners=1`，则极值(`-1`和`1`)会被当做输入边缘像素的中心点。如果`align_corners=0`，则它们会被看做是边缘像素的边缘点,减小分辨率对采样的影响 |

 #### 输入
@@ -258,9 +258,9 @@ ScatterND接收三个输入，分别为秩为r >= 1的`data`，秩为q >= 1的`i

 #### 模型参数

-| 类型  | 参数名 | 描述               |
-| ----- | ------ | ------------------ |
-| `int` | `dim`  | 进行累计计算的维度 |
+| 类型    | 参数名   | 描述        |
+| ----- | ----- | --------- |
+| `int` | `dim` | 进行累计计算的维度 |

 #### 输入

@@ -290,9 +290,9 @@ ScatterND接收三个输入，分别为秩为r >= 1的`data`，秩为q >= 1的`i

 #### 模型参数

-| 类型  | 参数名 | 描述               |
-| ----- | ------ | ------------------ |
-| `int` | `dim`  | 进行累计计算的维度 |
+| 类型    | 参数名   | 描述        |
+| ----- | ----- | --------- |
+| `int` | `dim` | 进行累计计算的维度 |

 #### 输入

@@ -322,8 +322,8 @@ ScatterND接收三个输入，分别为秩为r >= 1的`data`，秩为q >= 1的`i

 #### 模型参数

-| 类型    | 参数名    | 描述                         |
-| ------- | --------- | ---------------------------- |
+| 类型      | 参数名       | 描述                |
+| ------- | --------- | ----------------- |
 | `float` | `epsilon` | 用来避免除0错误。默认为1e-05 |

 #### 输入
@@ -356,12 +356,12 @@ ScatterND接收三个输入，分别为秩为r >= 1的`data`，秩为q >= 1的`i

 #### 模型参数

-| 类型           | 参数名              | 描述                                                          |
-| -------------- | ------------------- | ------------------------------------------------------------- |
-| `list of ints` | `stride`            | 卷积的步长 (sH, sW)                                           |
-| `list of ints` | `padding`           | 输入特征填充大小 (padH, padW)                                 |
-| `list of ints` | `dilation`          | 卷积核各元素间隔 (dH, dW)                                     |
-| `int`          | `deformable_groups` | 可变偏移量的分组，通常置位1即可                               |
+| 类型             | 参数名                 | 描述                                     |
+| -------------- | ------------------- | -------------------------------------- |
+| `list of ints` | `stride`            | 卷积的步长 (sH, sW)                         |
+| `list of ints` | `padding`           | 输入特征填充大小 (padH, padW)                  |
+| `list of ints` | `dilation`          | 卷积核各元素间隔 (dH, dW)                      |
+| `int`          | `deformable_groups` | 可变偏移量的分组，通常置位1即可                       |
 | `int`          | `groups`            | 卷积分组数，`input_channel`会根据这个值被分为数个分组进行计算 |

 #### 输入

--- a/docs/zh_cn/deployment/tensorrt_plugin.md
+++ b/docs/zh_cn/deployment/tensorrt_plugin.md
@@ -2,18 +2,18 @@

 <!-- TOC -->

-- [MMCV中的TensorRT自定义算子 (实验性)](#mmcv中的tensorrt自定义算子-实验性)
-  - [介绍](#介绍)
-  - [MMCV中的TensorRT插件列表](#mmcv中的tensorrt插件列表)
-  - [如何编译MMCV中的TensorRT插件](#如何编译mmcv中的tensorrt插件)
-    - [准备](#准备)
-    - [在Linux上编译](#在linux上编译)
-  - [创建TensorRT推理引擎并在python下进行推理](#创建tensorrt推理引擎并在python下进行推理)
-  - [如何在MMCV中添加新的TensorRT自定义算子](#如何在mmcv中添加新的tensorrt自定义算子)
-    - [主要流程](#主要流程)
-    - [注意](#注意)
-  - [已知问题](#已知问题)
-  - [引用](#引用)
+- [MMCV中的TensorRT自定义算子 (实验性)](#mmcv%E4%B8%AD%E7%9A%84tensorrt%E8%87%AA%E5%AE%9A%E4%B9%89%E7%AE%97%E5%AD%90-%E5%AE%9E%E9%AA%8C%E6%80%A7)
+  - [介绍](#%E4%BB%8B%E7%BB%8D)
+  - [MMCV中的TensorRT插件列表](#mmcv%E4%B8%AD%E7%9A%84tensorrt%E6%8F%92%E4%BB%B6%E5%88%97%E8%A1%A8)
+  - [如何编译MMCV中的TensorRT插件](#%E5%A6%82%E4%BD%95%E7%BC%96%E8%AF%91mmcv%E4%B8%AD%E7%9A%84tensorrt%E6%8F%92%E4%BB%B6)
+    - [准备](#%E5%87%86%E5%A4%87)
+    - [在Linux上编译](#%E5%9C%A8linux%E4%B8%8A%E7%BC%96%E8%AF%91)
+  - [创建TensorRT推理引擎并在python下进行推理](#%E5%88%9B%E5%BB%BAtensorrt%E6%8E%A8%E7%90%86%E5%BC%95%E6%93%8E%E5%B9%B6%E5%9C%A8python%E4%B8%8B%E8%BF%9B%E8%A1%8C%E6%8E%A8%E7%90%86)
+  - [如何在MMCV中添加新的TensorRT自定义算子](#%E5%A6%82%E4%BD%95%E5%9C%A8mmcv%E4%B8%AD%E6%B7%BB%E5%8A%A0%E6%96%B0%E7%9A%84tensorrt%E8%87%AA%E5%AE%9A%E4%B9%89%E7%AE%97%E5%AD%90)
+    - [主要流程](#%E4%B8%BB%E8%A6%81%E6%B5%81%E7%A8%8B)
+    - [注意](#%E6%B3%A8%E6%84%8F)
+  - [已知问题](#%E5%B7%B2%E7%9F%A5%E9%97%AE%E9%A2%98)
+  - [引用](#%E5%BC%95%E7%94%A8)

 <!-- TOC -->

@@ -24,17 +24,17 @@

 ### MMCV中的TensorRT插件列表

-|         ONNX算子          |                                  TensorRT插件                                   | MMCV版本 |
-| :-----------------------: | :-----------------------------------------------------------------------------: | :------: |
-|       MMCVRoiAlign        |              [MMCVRoiAlign](./tensorrt_custom_ops.md#mmcvroialign)              |  1.2.6   |
-|         ScatterND         |                 [ScatterND](./tensorrt_custom_ops.md#scatternd)                 |  1.2.6   |
-|     NonMaxSuppression     |         [NonMaxSuppression](./tensorrt_custom_ops.md#nonmaxsuppression)         |  1.3.0   |
-|     MMCVDeformConv2d      |          [MMCVDeformConv2d](./tensorrt_custom_ops.md#mmcvdeformconv2d)          |  1.3.0   |
-|       grid_sampler        |              [grid_sampler](./tensorrt_custom_ops.md#grid-sampler)              |  1.3.1   |
-|          cummax           |                    [cummax](./tensorrt_custom_ops.md#cummax)                    |  1.3.5   |
-|          cummin           |                    [cummin](./tensorrt_custom_ops.md#cummin)                    |  1.3.5   |
-| MMCVInstanceNormalization | [MMCVInstanceNormalization](./tensorrt_custom_ops.md#mmcvinstancenormalization) |  1.3.5   |
-| MMCVModulatedDeformConv2d | [MMCVModulatedDeformConv2d](./tensorrt_custom_ops.md#mmcvmodulateddeformconv2d) |  master  |
+|          ONNX算子           |                                   TensorRT插件                                    | MMCV版本 |
+| :-----------------------: | :-----------------------------------------------------------------------------: | :----: |
+|       MMCVRoiAlign        |              [MMCVRoiAlign](./tensorrt_custom_ops.md#mmcvroialign)              | 1.2.6  |
+|         ScatterND         |                 [ScatterND](./tensorrt_custom_ops.md#scatternd)                 | 1.2.6  |
+|     NonMaxSuppression     |         [NonMaxSuppression](./tensorrt_custom_ops.md#nonmaxsuppression)         | 1.3.0  |
+|     MMCVDeformConv2d      |          [MMCVDeformConv2d](./tensorrt_custom_ops.md#mmcvdeformconv2d)          | 1.3.0  |
+|       grid_sampler        |              [grid_sampler](./tensorrt_custom_ops.md#grid-sampler)              | 1.3.1  |
+|          cummax           |                    [cummax](./tensorrt_custom_ops.md#cummax)                    | 1.3.5  |
+|          cummin           |                    [cummin](./tensorrt_custom_ops.md#cummin)                    | 1.3.5  |
+| MMCVInstanceNormalization | [MMCVInstanceNormalization](./tensorrt_custom_ops.md#mmcvinstancenormalization) | 1.3.5  |
+| MMCVModulatedDeformConv2d | [MMCVModulatedDeformConv2d](./tensorrt_custom_ops.md#mmcvmodulateddeformconv2d) | master |

 注意

@@ -146,21 +146,24 @@ with torch.no_grad():
 **以RoIAlign算子插件`roi_align`举例。**

 1. 在TensorRT包含目录`mmcv/ops/csrc/tensorrt/`中添加头文件`trt_roi_align.hpp`
+
 2. 在TensorRT源码目录`mmcv/ops/csrc/tensorrt/plugins/`中添加源文件`trt_roi_align.cpp`
+
 3. 在TensorRT源码目录`mmcv/ops/csrc/tensorrt/plugins/`中添加cuda kernel文件`trt_roi_align_kernel.cu`
+
 4. 在[trt_plugin.cpp](https://github.com/open-mmlab/mmcv/blob/master/mmcv/ops/csrc/tensorrt/plugins/trt_plugin.cpp)中注册`roi_align`插件

-    ```c++
-    #include "trt_plugin.hpp"
+   ```c++
+   #include "trt_plugin.hpp"

-    #include "trt_roi_align.hpp"
+   #include "trt_roi_align.hpp"

-    REGISTER_TENSORRT_PLUGIN(RoIAlignPluginDynamicCreator);
+   REGISTER_TENSORRT_PLUGIN(RoIAlignPluginDynamicCreator);

-    extern "C" {
-    bool initLibMMCVInferPlugins() { return true; }
-    }  // extern "C"
-    ```
+   extern "C" {
+   bool initLibMMCVInferPlugins() { return true; }
+   }  // extern "C"
+   ```

 5. 在`tests/test_ops/test_tensorrt.py`中添加单元测试


--- a/docs/zh_cn/faq.md
+++ b/docs/zh_cn/faq.md
@@ -7,51 +7,51 @@

 - KeyError: "xxx: 'yyy is not in the zzz registry'"

-    只有模块所在的文件被导入时，注册机制才会被触发，所以您需要在某处导入该文件，更多详情请查看 https://github.com/open-mmlab/mmdetection/issues/5974。
+  只有模块所在的文件被导入时，注册机制才会被触发，所以您需要在某处导入该文件，更多详情请查看 https://github.com/open-mmlab/mmdetection/issues/5974 。

-- "No module named 'mmcv.ops'"; "No module named 'mmcv._ext'"
+- "No module named 'mmcv.ops'"; "No module named 'mmcv.\_ext'"

-    1. 使用 `pip uninstall mmcv` 卸载您环境中的 mmcv
-    2. 参考 [installation instruction](https://mmcv.readthedocs.io/en/latest/get_started/installation.html) 或者 [Build MMCV from source](https://mmcv.readthedocs.io/en/latest/get_started/build.html) 安装 mmcv-full
+  1. 使用 `pip uninstall mmcv` 卸载您环境中的 mmcv
+  2. 参考 [installation instruction](https://mmcv.readthedocs.io/en/latest/get_started/installation.html) 或者 [Build MMCV from source](https://mmcv.readthedocs.io/en/latest/get_started/build.html) 安装 mmcv-full

 - "invalid device function" 或者 "no kernel image is available for execution"

-    1. 检查 GPU 的 CUDA 计算能力
-    2. 运行  `python mmdet/utils/collect_env.py` 来检查 PyTorch、torchvision 和 MMCV 是否是针对正确的 GPU 架构构建的，您可能需要去设置 `TORCH_CUDA_ARCH_LIST` 来重新安装 MMCV。兼容性问题可能会出现在使用旧版的 GPUs，如：colab 上的 Tesla K80 (3.7)
-    3. 检查运行环境是否和 mmcv/mmdet 编译时的环境相同。例如，您可能使用 CUDA 10.0 编译 mmcv，但在 CUDA 9.0 的环境中运行它
+  1. 检查 GPU 的 CUDA 计算能力
+  2. 运行  `python mmdet/utils/collect_env.py` 来检查 PyTorch、torchvision 和 MMCV 是否是针对正确的 GPU 架构构建的，您可能需要去设置 `TORCH_CUDA_ARCH_LIST` 来重新安装 MMCV。兼容性问题可能会出现在使用旧版的 GPUs，如：colab 上的 Tesla K80 (3.7)
+  3. 检查运行环境是否和 mmcv/mmdet 编译时的环境相同。例如，您可能使用 CUDA 10.0 编译 mmcv，但在 CUDA 9.0 的环境中运行它

 - "undefined symbol" 或者 "cannot open xxx.so"

-    1. 如果符号和 CUDA/C++ 相关（例如：libcudart.so 或者 GLIBCXX），请检查 CUDA/GCC 运行时的版本是否和编译 mmcv 的一致
-    2. 如果符号和 PyTorch 相关（例如：符号包含 caffe、aten 和 TH），请检查 PyTorch 运行时的版本是否和编译 mmcv 的一致
-    3. 运行 `python mmdet/utils/collect_env.py` 以检查 PyTorch、torchvision 和 MMCV 构建和运行的环境是否相同
+  1. 如果符号和 CUDA/C++ 相关（例如：libcudart.so 或者 GLIBCXX），请检查 CUDA/GCC 运行时的版本是否和编译 mmcv 的一致
+  2. 如果符号和 PyTorch 相关（例如：符号包含 caffe、aten 和 TH），请检查 PyTorch 运行时的版本是否和编译 mmcv 的一致
+  3. 运行 `python mmdet/utils/collect_env.py` 以检查 PyTorch、torchvision 和 MMCV 构建和运行的环境是否相同

 - "RuntimeError: CUDA error: invalid configuration argument"

-    这个错误可能是由于您的 GPU 性能不佳造成的。尝试降低[THREADS_PER_BLOCK](https://github.com/open-mmlab/mmcv/blob/cac22f8cf5a904477e3b5461b1cc36856c2793da/mmcv/ops/csrc/common_cuda_helper.hpp#L10)
-    的值并重新编译 mmcv。
+  这个错误可能是由于您的 GPU 性能不佳造成的。尝试降低[THREADS_PER_BLOCK](https://github.com/open-mmlab/mmcv/blob/cac22f8cf5a904477e3b5461b1cc36856c2793da/mmcv/ops/csrc/common_cuda_helper.hpp#L10)
+  的值并重新编译 mmcv。

 - "RuntimeError: nms is not compiled with GPU support"

-    这个错误是由于您的 CUDA 环境没有正确安装。
-    您可以尝试重新安装您的 CUDA 环境，然后删除 mmcv/build 文件夹并重新编译 mmcv。
+  这个错误是由于您的 CUDA 环境没有正确安装。
+  您可以尝试重新安装您的 CUDA 环境，然后删除 mmcv/build 文件夹并重新编译 mmcv。

 - "Segmentation fault"

-    1. 检查 GCC 的版本，通常是因为 PyTorch 版本与 GCC 版本不匹配 （例如 GCC < 4.9 )，我们推荐用户使用 GCC 5.4，我们也不推荐使用 GCC 5.5， 因为有反馈 GCC 5.5 会导致 "segmentation fault" 并且切换到 GCC 5.4 就可以解决问题
-    2. 检查是否正确安装 CUDA 版本的 PyTorch。输入以下命令并检查是否返回 True
-        ```shell
-        python -c 'import torch; print(torch.cuda.is_available())'
-        ```
-    3. 如果 `torch` 安装成功，那么检查 MMCV 是否安装成功。输入以下命令，如果没有报错说明 mmcv-full 安装成功。
-        ```shell
-        python -c 'import mmcv; import mmcv.ops'
-        ```
-    4. 如果 MMCV 与 PyTorch 都安装成功了，则可以使用 `ipdb` 设置断点或者使用 `print` 函数，分析是哪一部分的代码导致了 `segmentation fault`
+  1. 检查 GCC 的版本，通常是因为 PyTorch 版本与 GCC 版本不匹配 （例如 GCC \< 4.9 )，我们推荐用户使用 GCC 5.4，我们也不推荐使用 GCC 5.5， 因为有反馈 GCC 5.5 会导致 "segmentation fault" 并且切换到 GCC 5.4 就可以解决问题
+  2. 检查是否正确安装 CUDA 版本的 PyTorch。输入以下命令并检查是否返回 True
+     ```shell
+     python -c 'import torch; print(torch.cuda.is_available())'
+     ```
+  3. 如果 `torch` 安装成功，那么检查 MMCV 是否安装成功。输入以下命令，如果没有报错说明 mmcv-full 安装成功。
+     ```shell
+     python -c 'import mmcv; import mmcv.ops'
+     ```
+  4. 如果 MMCV 与 PyTorch 都安装成功了，则可以使用 `ipdb` 设置断点或者使用 `print` 函数，分析是哪一部分的代码导致了 `segmentation fault`

 - "libtorch_cuda_cu.so: cannot open shared object file"

-    `mmcv-full` 依赖 `libtorch_cuda_cu.so` 文件，但程序运行时没能找到该文件。我们可以检查该文件是否存在 `~/miniconda3/envs/{environment-name}/lib/python3.7/site-packages/torch/lib` 也可以尝试重装 PyTorch。
+  `mmcv-full` 依赖 `libtorch_cuda_cu.so` 文件，但程序运行时没能找到该文件。我们可以检查该文件是否存在 `~/miniconda3/envs/{environment-name}/lib/python3.7/site-packages/torch/lib` 也可以尝试重装 PyTorch。

 - "fatal error C1189: #error:  -- unsupported Microsoft Visual Studio version!"

@@ -59,11 +59,11 @@

 - "error: member "torch::jit::detail::ModulePolicy::all_slots" may not be initialized"

-  如果您在 Windows 上编译 mmcv-full 并且 PyTorch 的版本是 1.5.0，您很可能会遇到这个问题 `- torch/csrc/jit/api/module.h(474): error: member "torch::jit::detail::ModulePolicy::all_slots" may not be initialized`。解决这个问题的方法是将 `torch/csrc/jit/api/module.h` 文件中所有 `static constexpr bool all_slots = false;` 替换为 `static bool all_slots = false;`。更多细节可以查看 https://github.com/pytorch/pytorch/issues/39394。
+  如果您在 Windows 上编译 mmcv-full 并且 PyTorch 的版本是 1.5.0，您很可能会遇到这个问题 `- torch/csrc/jit/api/module.h(474): error: member "torch::jit::detail::ModulePolicy::all_slots" may not be initialized`。解决这个问题的方法是将 `torch/csrc/jit/api/module.h` 文件中所有 `static constexpr bool all_slots = false;` 替换为 `static bool all_slots = false;`。更多细节可以查看 https://github.com/pytorch/pytorch/issues/39394 。

 - "error: a member with an in-class initializer must be const"

-  如果您在 Windows 上编译 mmcv-full 并且 PyTorch 的版本是 1.6.0，您很可能会遇到这个问题 `"- torch/include\torch/csrc/jit/api/module.h(483): error: a member with an in-class initializer must be const"`. 解决这个问题的方法是将 `torch/include\torch/csrc/jit/api/module.h` 文件中的所有 `CONSTEXPR_EXCEPT_WIN_CUDA ` 替换为 `const`。更多细节可以查看 https://github.com/open-mmlab/mmcv/issues/575。
+  如果您在 Windows 上编译 mmcv-full 并且 PyTorch 的版本是 1.6.0，您很可能会遇到这个问题 `"- torch/include\torch/csrc/jit/api/module.h(483): error: a member with an in-class initializer must be const"`. 解决这个问题的方法是将 `torch/include\torch/csrc/jit/api/module.h` 文件中的所有 `CONSTEXPR_EXCEPT_WIN_CUDA ` 替换为 `const`。更多细节可以查看 https://github.com/open-mmlab/mmcv/issues/575 。

 - "error: member "torch::jit::ProfileOptionalOp::Kind" may not be initialized"

@@ -73,7 +73,7 @@
  - 将 `torch\include\pybind11\cast.h` 文件中的 `explicit operator type&() { return *(this->value); }` 替换为 `explicit operator type&() { return *((type*)this->value); }`
  - 将 `torch/include\torch/csrc/jit/api/module.h` 文件中的 所有 `CONSTEXPR_EXCEPT_WIN_CUDA` 替换为 `const`

-  更多细节可以查看 https://github.com/pytorch/pytorch/pull/45956。
+  更多细节可以查看 https://github.com/pytorch/pytorch/pull/45956 。

 - MMCV 和 MMDetection 的兼容性问题；"ConvWS is already registered in conv layer"

@@ -83,9 +83,9 @@

 - "RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one"

-    1. 这个错误是因为有些参数没有参与 loss 的计算，可能是代码中存在多个分支，导致有些分支没有参与 loss 的计算。更多细节见 https://github.com/pytorch/pytorch/issues/55582。
-    2. 你可以设置 DDP 中的 `find_unused_parameters` 为 `True`，或者手动查找哪些参数没有用到。
+  1. 这个错误是因为有些参数没有参与 loss 的计算，可能是代码中存在多个分支，导致有些分支没有参与 loss 的计算。更多细节见 https://github.com/pytorch/pytorch/issues/55582 。
+  2. 你可以设置 DDP 中的 `find_unused_parameters` 为 `True`，或者手动查找哪些参数没有用到。

 - "RuntimeError: Trying to backward through the graph a second time"

-    不能同时设置 `GradientCumulativeOptimizerHook` 和 `OptimizerHook`，这会导致 `loss.backward()` 被调用两次，于是程序抛出 `RuntimeError`。我们只需设置其中的一个。更多细节见 https://github.com/open-mmlab/mmcv/issues/1379。
+  不能同时设置 `GradientCumulativeOptimizerHook` 和 `OptimizerHook`，这会导致 `loss.backward()` 被调用两次，于是程序抛出 `RuntimeError`。我们只需设置其中的一个。更多细节见 https://github.com/open-mmlab/mmcv/issues/1379 。
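
When a bare URL is followed directly by a full-width full stop (`。`, U+3002), a formatter that linkifies bare URLs can absorb the punctuation into the link as `%E3%80%82`, the percent-encoded UTF-8 form of that character. The following is a minimal sketch of a checker for this, not part of the commit: the function names `find_polluted_urls` and `repair_line` are hypothetical, and the URL pattern is a deliberately simple assumption rather than a full URL grammar.

```python
import re

# Percent-encoded UTF-8 bytes of the full-width full stop "。" (U+3002).
ENCODED_FULL_STOP = "%E3%80%82"

# Simplified bare-URL pattern (assumption): stops at whitespace and at
# common closing delimiters such as ")", ">" and "]".
URL_RE = re.compile(r"https?://[^\s)>\]]+")


def find_polluted_urls(text):
    """Return (line_number, url) pairs whose URL ends with an encoded "。"."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in URL_RE.finditer(line):
            url = match.group(0)
            if url.upper().endswith(ENCODED_FULL_STOP):
                hits.append((lineno, url))
    return hits


def repair_line(line):
    """Move an absorbed "。" back out of the URL, separated by a space."""
    return re.sub(
        r"(https?://[^\s)>\]]+?)%E3%80%82",
        lambda m: m.group(1) + " 。",
        line,
        flags=re.IGNORECASE,
    )
```

Running `find_polluted_urls` over each Markdown file after formatting would list the affected lines. `repair_line` puts a space before the restored `。`, on the assumption that the linkifier ends a URL at whitespace.
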
--- a/docs/zh_cn/get_started/build.md
+++ b/docs/zh_cn/get_started/build.md
@@ -42,6 +42,7 @@ CC=clang CXX=clang++ CFLAGS='-stdlib=libc++' MMCV_WITH_OPS=1 pip install -e .
 ```{note}
 如果你打算使用 `opencv-python-headless` 而不是 `opencv-python`，例如在一个很小的容器环境或者没有图形用户界面的服务器中，你可以先安装 `opencv-python-headless`，这样在安装 mmcv 依赖的过程中会跳过 `opencv-python`
 ```
+
 ### 在 Windows 上编译 MMCV

 在 Windows 上编译 MMCV 比 Linux 复杂，本节将一步步介绍如何在 Windows 上编译 MMCV。
@@ -69,38 +70,38 @@ CC=clang CXX=clang++ CFLAGS='-stdlib=libc++' MMCV_WITH_OPS=1 pip install -e .

 2. 创建一个新的 Conda 环境

-    ```shell
-    conda create --name mmcv python=3.7  # 经测试，3.6, 3.7, 3.8 也能通过
-    conda activate mmcv  # 确保做任何操作前先激活环境
-    ```
+   ```shell
+   conda create --name mmcv python=3.7  # 经测试，3.6, 3.7, 3.8 也能通过
+   conda activate mmcv  # 确保做任何操作前先激活环境
+   ```

 3. 安装 PyTorch 时，可以根据需要安装支持 CUDA 或不支持 CUDA 的版本

-    ```shell
-    # CUDA version
-    conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
-    # CPU version
-    conda install pytorch torchvision cpuonly -c pytorch
-    ```
+   ```shell
+   # CUDA version
+   conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
+   # CPU version
+   conda install pytorch torchvision cpuonly -c pytorch
+   ```

 4. 准备 MMCV 源代码

-    ```shell
-    git clone https://github.com/open-mmlab/mmcv.git
-    cd mmcv
-    ```
+   ```shell
+   git clone https://github.com/open-mmlab/mmcv.git
+   cd mmcv
+   ```

 5. 安装所需 Python 依赖包

-    ```shell
-    pip3 install -r requirements/runtime.txt
-    ```
+   ```shell
+   pip3 install -r requirements/runtime.txt
+   ```

 6. 建议安装 `ninja` 以加快编译速度

-    ```bash
-    pip install -r requirements/optional.txt
-    ```
+   ```bash
+   pip install -r requirements/optional.txt
+   ```

 #### 编译与安装 MMCV

@@ -108,33 +109,33 @@ MMCV 有三种安装的模式：

 1. Lite 版本（不包含算子）

-    这种方式下，没有算子被编译，这种模式的 mmcv 是原生的 python 包
+   这种方式下，没有算子被编译，这种模式的 mmcv 是原生的 python 包

 2. Full 版本（只包含 CPU 算子）

-    编译 CPU 算子，但只有 x86 将会被编译，并且编译版本只能在 CPU only 情况下运行
+   编译 CPU 算子，但只有 x86 将会被编译，并且编译版本只能在 CPU only 情况下运行

 3. Full 版本（既包含 CPU 算子，又包含 CUDA 算子）

-    同时编译 CPU 和 CUDA 算子，`ops` 模块的 x86 与 CUDA 的代码都可以被编译。同时编译的版本可以在 CUDA 上调用 GPU
+   同时编译 CPU 和 CUDA 算子，`ops` 模块的 x86 与 CUDA 的代码都可以被编译。同时编译的版本可以在 CUDA 上调用 GPU

 ##### 通用步骤

 1. 设置 MSVC 编译器

-    设置环境变量。添加 `C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\bin\Hostx86\x64` 到 `PATH`，则 `cl.exe` 可以在命令行中运行，如下所示。
+   设置环境变量。添加 `C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\bin\Hostx86\x64` 到 `PATH`，则 `cl.exe` 可以在命令行中运行，如下所示。

-    ```none
-    (base) PS C:\Users\xxx> cl
-    Microsoft (R) C/C++ Optimizing  Compiler Version 19.27.29111 for x64
-    Copyright (C) Microsoft Corporation.   All rights reserved.
+   ```none
+   (base) PS C:\Users\xxx> cl
+   Microsoft (R) C/C++ Optimizing  Compiler Version 19.27.29111 for x64
+   Copyright (C) Microsoft Corporation.   All rights reserved.

-    usage: cl [ option... ] filename... [ / link linkoption... ]
-    ```
+   usage: cl [ option... ] filename... [ / link linkoption... ]
+   ```

-    为了兼容性，我们使用 x86-hosted 以及 x64-targeted 版本，即路径中的 `Hostx86\x64` 。
+   为了兼容性，我们使用 x86-hosted 以及 x64-targeted 版本，即路径中的 `Hostx86\x64` 。

-    因为 PyTorch 将解析 `cl.exe` 的输出以检查其版本，只有 utf-8 将会被识别，你可能需要将系统语言更改为英语。控制面板 -> 地区-> 管理-> 非 Unicode 来进行语言转换。
+   因为 PyTorch 将解析 `cl.exe` 的输出以检查其版本，只有 utf-8 将会被识别，你可能需要将系统语言更改为英语。控制面板 -> 地区-> 管理-> 非 Unicode 来进行语言转换。

 ##### 安装方式一：Lite version（不包含算子）

@@ -157,20 +158,20 @@ pip list

 2. 设置环境变量

-    ```shell
-    $env:MMCV_WITH_OPS = 1
-    $env:MAX_JOBS = 8  # 根据你可用CPU以及内存量进行设置
-    ```
+   ```shell
+   $env:MMCV_WITH_OPS = 1
+   $env:MAX_JOBS = 8  # 根据你可用CPU以及内存量进行设置
+   ```

 3. 编译安装

-    ```shell
-    conda activate mmcv  # 激活环境
-    cd mmcv  # 改变路径
-    python setup.py build_ext  # 如果成功, cl 将被启动用于编译算子
-    python setup.py develop  # 安装
-    pip list  # 检查是否安装成功
-    ```
+   ```shell
+   conda activate mmcv  # 激活环境
+   cd mmcv  # 改变路径
+   python setup.py build_ext  # 如果成功, cl 将被启动用于编译算子
+   python setup.py develop  # 安装
+   pip list  # 检查是否安装成功
+   ```

 ##### 安装方式三：Full version（既编译 CPU 算子又编译 CUDA 算子）

@@ -178,38 +179,38 @@ pip list

 2. 设置环境变量

-    ```shell
-    $env:MMCV_WITH_OPS = 1
-    $env:MAX_JOBS = 8  # 根据你可用CPU以及内存量进行设置
-    ```
+   ```shell
+   $env:MMCV_WITH_OPS = 1
+   $env:MAX_JOBS = 8  # 根据你可用CPU以及内存量进行设置
+   ```

-3.  检查 `CUDA_PATH` 或者 `CUDA_HOME` 环境变量已经存在在 `envs` 之中
+3. 检查 `CUDA_PATH` 或者 `CUDA_HOME` 环境变量已经存在在 `envs` 之中

-    ```none
-    (base) PS C:\Users\WRH> ls env:
+   ```none
+   (base) PS C:\Users\WRH> ls env:

-    Name                           Value
-    ----                           -----
-    CUDA_PATH                      C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
-    CUDA_PATH_V10_1                C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
-    CUDA_PATH_V10_2                C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
-    ```
+   Name                           Value
+   ----                           -----
+   CUDA_PATH                      C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
+   CUDA_PATH_V10_1                C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
+   CUDA_PATH_V10_2                C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
+   ```

-    如果没有，你可以按照下面的步骤设置
+   如果没有，你可以按照下面的步骤设置

-    ```shell
-    $env:CUDA_HOME = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2"
-    # 或者
-    $env:CUDA_HOME = $env:CUDA_PATH_V10_2  # CUDA_PATH_V10_2 已经在环境变量中
-    ```
+   ```shell
+   $env:CUDA_HOME = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2"
+   # 或者
+   $env:CUDA_HOME = $env:CUDA_PATH_V10_2  # CUDA_PATH_V10_2 已经在环境变量中
+   ```

 4. 设置 CUDA 的目标架构

-    ```shell
-    $env:TORCH_CUDA_ARCH_LIST="6.1" # 支持 GTX 1080
-    # 或者用所有支持的版本，但可能会变得很慢
-    $env:TORCH_CUDA_ARCH_LIST="3.5 3.7 5.0 5.2 6.0 6.1 7.0 7.5"
-    ```
+   ```shell
+   $env:TORCH_CUDA_ARCH_LIST="6.1" # 支持 GTX 1080
+   # 或者用所有支持的版本，但可能会变得很慢
+   $env:TORCH_CUDA_ARCH_LIST="3.5 3.7 5.0 5.2 6.0 6.1 7.0 7.5"
+   ```

 ```{note}
 我们可以在 [here](https://developer.nvidia.com/cuda-gpus) 查看 GPU 的计算能力
@@ -217,15 +218,15 @@ pip list

 5. 编译安装

-    ```shell
-    $env:MMCV_WITH_OPS = 1
-    $env:MAX_JOBS = 8 # 根据你可用CPU以及内存量进行设置
-    conda activate mmcv # 激活环境
-    cd mmcv  # 改变路径
-    python setup.py build_ext  # 如果成功, cl 将被启动用于编译算子
-    python setup.py develop # 安装
-    pip list # 检查是否安装成功
-    ```
+   ```shell
+   $env:MMCV_WITH_OPS = 1
+   $env:MAX_JOBS = 8 # 根据你可用CPU以及内存量进行设置
+   conda activate mmcv # 激活环境
+   cd mmcv  # 改变路径
+   python setup.py build_ext  # 如果成功, cl 将被启动用于编译算子
+   python setup.py develop # 安装
+   pip list # 检查是否安装成功
+   ```

 ```{note}
 如果你的 PyTorch 版本是 1.6.0，你可能会遇到一些这个 [issue](https://github.com/pytorch/pytorch/issues/42467) 提到的错误，则可以参考这个 [pull request](https://github.com/pytorch/pytorch/pull/43380/files) 修改 本地环境的 PyTorch 源代码

--- a/docs/zh_cn/get_started/installation.md
+++ b/docs/zh_cn/get_started/installation.md
@@ -17,13 +17,13 @@ a. 安装完整版

 i. 安装最新版本

-如下是安装最新版 ``mmcv-full`` 的命令
+如下是安装最新版 `mmcv-full` 的命令

 ```shell
 pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
 ```

-请将链接中的 ``{cu_version}`` 和 ``{torch_version}`` 根据自身需求替换成实际的版本号，例如想安装和 ``CUDA 11.1``、``PyTorch 1.9.0`` 兼容的最新版 ``mmcv-full``，使用如下替换过的命令
+请将链接中的 `{cu_version}` 和 `{torch_version}` 根据自身需求替换成实际的版本号，例如想安装和 `CUDA 11.1`、`PyTorch 1.9.0` 兼容的最新版 `mmcv-full`，使用如下替换过的命令

 ```shell
 pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
@@ -37,18 +37,18 @@ PyTorch 版本是 1.8.1、CUDA 版本是 11.1，你可以使用以下命令安
 `pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html`
 ```

-如果想知道更多 CUDA 和 PyTorch 版本的命令，可以参考下面的表格，将链接中的 ``=={mmcv_version}`` 删去即可。
+如果想知道更多 CUDA 和 PyTorch 版本的命令，可以参考下面的表格，将链接中的 `=={mmcv_version}` 删去即可。

 ii. 安装特定的版本

-如下是安装特定版本 ``mmcv-full`` 的命令
+如下是安装特定版本 `mmcv-full` 的命令

 ```shell
 pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
 ```

-首先请参考版本发布信息找到想要安装的版本号，将 ``{mmcv_version}`` 替换成该版本号，例如 ``1.3.9``。
-然后将链接中的 ``{cu_version}`` 和 ``{torch_version}`` 根据自身需求替换成实际的版本号，例如想安装和 ``CUDA 11.1``、``PyTorch 1.9.0`` 兼容的 ``mmcv-full`` 1.3.9 版本，使用如下替换过的命令
+首先请参考版本发布信息找到想要安装的版本号，将 `{mmcv_version}` 替换成该版本号，例如 `1.3.9`。
+然后将链接中的 `{cu_version}` 和 `{torch_version}` 根据自身需求替换成实际的版本号，例如想安装和 `CUDA 11.1`、`PyTorch 1.9.0` 兼容的 `mmcv-full` 1.3.9 版本，使用如下替换过的命令

 ```shell
 pip install mmcv-full==1.3.9 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html

--- a/docs/zh_cn/get_started/previous_versions.md
+++ b/docs/zh_cn/get_started/previous_versions.md
-
 ## 其他版本的 PyTorch

 我们不再提供在较低的 `PyTorch` 版本下编译的 `mmcv-full` 包，但为了您的方便，您可以在下面找到它们。

 ### PyTorch 1.4

-| 1.0.0 <= mmcv_version <= 1.2.1
+| 1.0.0 \<= mmcv_version \<= 1.2.1

 #### CUDA 10.1

@@ -27,7 +26,7 @@ pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dis

 ### PyTorch v1.3

-| 1.0.0 <= mmcv_version <= 1.3.16
+| 1.0.0 \<= mmcv_version \<= 1.3.16

 #### CUDA 10.1


--- a/docs/zh_cn/understand_mmcv/cnn.md
+++ b/docs/zh_cn/understand_mmcv/cnn.md
--- a/docs/zh_cn/understand_mmcv/config.md
+++ b/docs/zh_cn/understand_mmcv/config.md
@@ -40,6 +40,7 @@ d = 'string'
 这里是一个带有预定义变量的配置文件的例子。

 `config_a.py`
+
 ```python
 a = 1
 b = './work_dir/{{ fileBasenameNoExtension }}'
@@ -65,6 +66,7 @@ c = '{{ fileExtname }}'
 a = 1
 b = dict(b1=[0, 1, 2], b2=None)
 ```
+
 ### 不含重复键值对从基类配置文件继承

 `config_b.py`
@@ -83,6 +85,7 @@ d = 'string'
 ...      c=(1, 2),
 ...      d='string')
 ```
+
 在`config_b.py`里的新字段与在`config_a.py`里的旧字段拼接

 ### 含重复键值对从基类配置文件继承

--- a/docs/zh_cn/understand_mmcv/io.md
+++ b/docs/zh_cn/understand_mmcv/io.md
@@ -107,6 +107,7 @@ c
 d
 e
 ```
+
 #### 从硬盘读取

 使用 `list_from_file` 读取 `a.txt`

--- a/docs/zh_cn/understand_mmcv/registry.md
+++ b/docs/zh_cn/understand_mmcv/registry.md
 ## 注册器
+
 MMCV 使用 [注册器](https://github.com/open-mmlab/mmcv/blob/master/mmcv/utils/registry.py) 来管理具有相似功能的不同模块, 例如, 检测器中的主干网络、头部、和模型颈部。
 在 OpenMMLab 家族中的绝大部分开源项目使用注册器去管理数据集和模型的模块，例如 [MMDetection](https://github.com/open-mmlab/mmdetection), [MMDetection3D](https://github.com/open-mmlab/mmdetection3d), [MMClassification](https://github.com/open-mmlab/mmclassification), [MMEditing](https://github.com/open-mmlab/mmediting) 等。

@@ -7,6 +8,7 @@ MMCV 使用 [注册器](https://github.com/open-mmlab/mmcv/blob/master/mmcv/util
 ```

 ### 什么是注册器
+
 在MMCV中，注册器可以看作类或函数到字符串的映射。
 一个注册器中的类或函数通常有相似的接口，但是可以实现不同的算法或支持不同的数据集。
 借助注册器，用户可以通过使用相应的字符串查找类或函数，并根据他们的需要实例化对应模块或调用函数获取结果。
@@ -46,6 +48,7 @@ class Converter1(object):
        self.a = a
        self.b = b
 ```
+
 ```python
 # converter2.py
 from .builder import CONVERTERS
@@ -56,6 +59,7 @@ from .converter1 import Converter1
 def converter2(a, b):
    return Converter1(a, b)
 ```
+
 使用注册器管理模块的关键步骤是，将实现的模块注册到注册表 `CONVERTERS` 中。通过 `@CONVERTERS.register_module()` 装饰所实现的模块，字符串到类或函数之间的映射就可以由 `CONVERTERS` 构建和维护，如下所示：

 通过这种方式，就可以通过 `CONVERTERS` 建立字符串与类或函数之间的映射，如下所示：
@@ -64,9 +68,11 @@ def converter2(a, b)
 'Converter1' -> <class 'Converter1'>
 'converter2' -> <function 'converter2'>
 ```
+
 ```{note}
 只有模块所在的文件被导入时，注册机制才会被触发，所以您需要在某处导入该文件。更多详情请查看 https://github.com/open-mmlab/mmdetection/issues/5974。
 ```
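
The string-to-object mapping described above can be illustrated with a toy registry. This is a sketch under stated assumptions, not MMCV's `Registry` implementation: `MiniRegistry` and its `get` and `build` methods are hypothetical names written for this example, and only the decorator usage `@CONVERTERS.register_module()` mirrors the documented snippet.

```python
class MiniRegistry:
    """Toy name -> class/function mapping (illustrative, not mmcv.Registry)."""

    def __init__(self, name):
        self.name = name
        self._module_dict = {}

    def register_module(self):
        """Return a decorator that stores the decorated object under its __name__."""
        def _register(obj):
            key = obj.__name__
            if key in self._module_dict:
                raise KeyError(f"{key} is already registered in {self.name}")
            self._module_dict[key] = obj
            return obj
        return _register

    def get(self, key):
        """Look up a registered class or function by its string name."""
        if key not in self._module_dict:
            raise KeyError(f"{key} is not in the {self.name} registry")
        return self._module_dict[key]

    def build(self, cfg):
        """Look up cfg['type'] and call it with the remaining keys as kwargs."""
        args = dict(cfg)
        obj = self.get(args.pop("type"))
        return obj(**args)


CONVERTERS = MiniRegistry("converter")


@CONVERTERS.register_module()
class Converter1:
    def __init__(self, a, b):
        self.a = a
        self.b = b


@CONVERTERS.register_module()
def converter2(a, b):
    return Converter1(a, b)


# Both entries are reachable through their string names.
converter = CONVERTERS.build(dict(type="converter2", a=1, b=2))
```

The decorator body runs only when the defining file is executed, which is why a module that is never imported never appears in the mapping.
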
+
 如果模块被成功注册了，你可以通过配置文件使用这个转换器（converter），如下所示：

 ```python

--- a/docs/zh_cn/understand_mmcv/runner.md
+++ b/docs/zh_cn/understand_mmcv/runner.md
@@ -8,7 +8,7 @@

 ### EpochBasedRunner

-顾名思义，`EpochBasedRunner` 是指以 epoch 为周期的工作流，例如设置 workflow = [('train', 2), ('val', 1)] 表示循环迭代地训练 2 个 epoch，然后验证 1 个 epoch。MMDetection 目标检测框架默认采用的是 `EpochBasedRunner`。
+顾名思义，`EpochBasedRunner` 是指以 epoch 为周期的工作流，例如设置 workflow = \[('train', 2), ('val', 1)\] 表示循环迭代地训练 2 个 epoch，然后验证 1 个 epoch。MMDetection 目标检测框架默认采用的是 `EpochBasedRunner`。

 其抽象逻辑如下所示：

@@ -25,6 +25,7 @@ while curr_epoch < max_epochs:
        for _ in range(epochs):
            epoch_runner(data_loaders[i], **kwargs)
 ```
+
 目前支持训练和验证两个工作流，以训练函数为例，其抽象逻辑是：

 ```python
@@ -40,7 +41,8 @@ def train(self, data_loader, **kwargs):
 ```

 ### IterBasedRunner
-不同于 `EpochBasedRunner`，`IterBasedRunner` 是指以 iter 为周期的工作流，例如设置 workflow = [('train', 2)， ('val', 1)] 表示循环迭代的训练 2 个 iter，然后验证 1 个 iter，MMSegmentation 语义分割框架默认采用的是  `IterBasedRunner`。
+
+不同于 `EpochBasedRunner`，`IterBasedRunner` 是指以 iter 为周期的工作流，例如设置 workflow = \[('train', 2)， ('val', 1)\] 表示循环迭代的训练 2 个 iter，然后验证 1 个 iter，MMSegmentation 语义分割框架默认采用的是  `IterBasedRunner`。

 其抽象逻辑如下所示：

@@ -59,6 +61,7 @@ while curr_iter < max_iters:
        for _ in range(iters):
            iter_runner(iter_loaders[i], **kwargs)
 ```
+
 目前支持训练和验证两个工作流，以验证函数为例，其抽象逻辑是：

 ```python
@@ -75,6 +78,7 @@ def val(self, data_loader, **kwargs):
 除了上述基础功能外，`EpochBasedRunner` 和 `IterBasedRunner` 还提供了 resume 、 save_checkpoint 和注册 hook 功能。

 ### 一个简单例子
+
 以最常用的分类任务为例详细说明 `runner` 的使用方法。 开启任何一个训练任务，都需要包括如下步骤：

 **(1) dataloader、model 和优化器等类初始化**
@@ -148,8 +152,8 @@ runner.run(data_loaders, cfg.workflow)

 关于 workflow 设置，以 `EpochBasedRunner` 为例，详情如下：

-- 假设只想运行训练工作流，则可以设置 workflow = [('train', 1)]，表示只进行迭代训练
-- 假设想运行训练和验证工作流，则可以设置 workflow = [('train',  3), ('val', 1)]，表示先训练 3 个 epoch ，然后切换到 val 工作流，运行 1 个 epoch，然后循环，直到训练 epoch 次数达到指定值
-- 工作流设置还自由定制，例如你可以先验证再训练 workflow = [('val', 1), ('train', 1)]
+- 假设只想运行训练工作流，则可以设置 workflow = \[('train', 1)\]，表示只进行迭代训练
+- 假设想运行训练和验证工作流，则可以设置 workflow = \[('train',  3), ('val', 1)\]，表示先训练 3 个 epoch ，然后切换到 val 工作流，运行 1 个 epoch，然后循环，直到训练 epoch 次数达到指定值
+- 工作流设置还自由定制，例如你可以先验证再训练 workflow = \[('val', 1), ('train', 1)\]

 上述代码都已经封装到了各个代码库的 train.py 中，用户只需要设置相应的配置即可，上述流程会自动运行。
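
The workflow settings listed above can be traced with a small standalone simulation. It is a sketch and does not call MMCV: `expand_workflow` is a hypothetical helper, and it assumes that only `train` epochs count toward `max_epochs`, as the description of `[('train', 3), ('val', 1)]` suggests.

```python
def expand_workflow(workflow, max_epochs):
    """List the phases run, in order, until `max_epochs` training epochs finish."""
    if not any(mode == "train" and n > 0 for mode, n in workflow):
        raise ValueError("workflow needs at least one 'train' phase to terminate")
    schedule = []
    trained = 0
    while trained < max_epochs:
        for mode, epochs in workflow:
            for _ in range(epochs):
                if mode == "train":
                    if trained >= max_epochs:
                        break  # training budget used up; skip remaining train epochs
                    trained += 1
                schedule.append(mode)
    return schedule


print(expand_workflow([("train", 3), ("val", 1)], max_epochs=4))
# ['train', 'train', 'train', 'val', 'train', 'val']
print(expand_workflow([("val", 1), ("train", 1)], max_epochs=2))
# ['val', 'train', 'val', 'train']
```

In this sketch a validation phase that follows the final training epoch still runs once before the loop exits. Whether the real runner does the same should be checked against its source.
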
--- a/docs/zh_cn/understand_mmcv/utils.md
+++ b/docs/zh_cn/understand_mmcv/utils.md
@@ -58,7 +58,6 @@ with mmcv.Timer():

 你也可以使用 `since_start()` 和 `since_last_check()` 。前者返回计时器启动后的运行时长，后者返回最近一次查看计时器后的运行时长。

-
 ```python
 timer = mmcv.Timer()
 # code block 1 here

--- a/mmcv/ops/csrc/README.md
+++ b/mmcv/ops/csrc/README.md
@@ -76,95 +76,95 @@ This folder contains all non-python code for MMCV custom ops. Please follow the

 1. (Optional) Add shared kernel in `common` to support special hardware platform.

-    ```c++
-    // src/common/cuda/new_ops_cuda_kernel.cuh
-
-    template <typename T>
-    __global__ void new_ops_forward_cuda_kernel(const T* input, T* output, ...) {
-        // forward here
-    }
-
-    ```
-
-    Add cuda kernel launcher in `pytorch/cuda`.
-
-    ```c++
-    // src/pytorch/cuda
-    #include <new_ops_cuda_kernel.cuh>
-
-    void NewOpsForwardCUDAKernelLauncher(Tensor input, Tensor output, ...){
-        // initialize
-        at::cuda::CUDAGuard device_guard(input.device());
-        cudaStream_t stream = at::cuda::getCurrentCUDAStream();
-        ...
-        AT_DISPATCH_FLOATING_TYPES_AND_HALF(
-            input.scalar_type(), "new_ops_forward_cuda_kernel", ([&] {
-                new_ops_forward_cuda_kernel<scalar_t>
-                    <<<GET_BLOCKS(output_size), THREADS_PER_BLOCK, 0, stream>>>(
-                        input.data_ptr<scalar_t>(), output.data_ptr<scalar_t>(),...);
-            }));
-        AT_CUDA_CHECK(cudaGetLastError());
-    }
-    ```
+   ```c++
+   // src/common/cuda/new_ops_cuda_kernel.cuh
+
+   template <typename T>
+   __global__ void new_ops_forward_cuda_kernel(const T* input, T* output, ...) {
+       // forward here
+   }
+
+   ```
+
+   Add cuda kernel launcher in `pytorch/cuda`.
+
+   ```c++
+   // src/pytorch/cuda
+   #include <new_ops_cuda_kernel.cuh>
+
+   void NewOpsForwardCUDAKernelLauncher(Tensor input, Tensor output, ...){
+       // initialize
+       at::cuda::CUDAGuard device_guard(input.device());
+       cudaStream_t stream = at::cuda::getCurrentCUDAStream();
+       ...
+       AT_DISPATCH_FLOATING_TYPES_AND_HALF(
+           input.scalar_type(), "new_ops_forward_cuda_kernel", ([&] {
+               new_ops_forward_cuda_kernel<scalar_t>
+                   <<<GET_BLOCKS(output_size), THREADS_PER_BLOCK, 0, stream>>>(
+                       input.data_ptr<scalar_t>(), output.data_ptr<scalar_t>(),...);
+           }));
+       AT_CUDA_CHECK(cudaGetLastError());
+   }
+   ```

 2. Register implementation for different devices.

-    ```c++
-    // src/pytorch/cuda/cudabind.cpp
-    ...
+   ```c++
+   // src/pytorch/cuda/cudabind.cpp
+   ...

-    Tensor new_ops_forward_cuda(Tensor input, Tensor output, ...){
-        // implement cuda forward here
-        // use `NewOpsForwardCUDAKernelLauncher` here
-    }
-    // declare interface here.
-    Tensor new_ops_forward_impl(Tensor input, Tensor output, ...);
-    // register the implementation for given device (CUDA here).
-    REGISTER_DEVICE_IMPL(new_ops_forward_impl, CUDA, new_ops_forward_cuda);
-    ```
+   Tensor new_ops_forward_cuda(Tensor input, Tensor output, ...){
+       // implement cuda forward here
+       // use `NewOpsForwardCUDAKernelLauncher` here
+   }
+   // declare interface here.
+   Tensor new_ops_forward_impl(Tensor input, Tensor output, ...);
+   // register the implementation for given device (CUDA here).
+   REGISTER_DEVICE_IMPL(new_ops_forward_impl, CUDA, new_ops_forward_cuda);
+   ```

 3. Add the op implementation in the `pytorch` directory. It selects the implementation that matches the device type of the input.

-    ```c++
-    // src/pytorch/new_ops.cpp
-    Tensor new_ops_forward_impl(Tensor input, Tensor output, ...){
-        // dispatch the implementation according to the device type of input.
-        DISPATCH_DEVICE_IMPL(new_ops_forward_impl, input, output, ...);
-    }
-    ...
+   ```c++
+   // src/pytorch/new_ops.cpp
+   Tensor new_ops_forward_impl(Tensor input, Tensor output, ...){
+       // dispatch the implementation according to the device type of input.
+       DISPATCH_DEVICE_IMPL(new_ops_forward_impl, input, output, ...);
+   }
+   ...

-    Tensor new_ops_forward(Tensor input, Tensor output, ...){
-        return new_ops_forward_impl(input, output, ...);
-    }
-    ```
+   Tensor new_ops_forward(Tensor input, Tensor output, ...){
+       return new_ops_forward_impl(input, output, ...);
+   }
+   ```

 4. Bind the implementation in `pytorch/pybind.cpp`.

-    ```c++
-    // src/pytorch/pybind.cpp
+   ```c++
+   // src/pytorch/pybind.cpp

-    ...
+   ...

-    Tensor new_ops_forward(Tensor input, Tensor output, ...);
+   Tensor new_ops_forward(Tensor input, Tensor output, ...);

-    ...
+   ...

-    // bind with pybind11
-    m.def("new_ops_forward", &new_ops_forward, "new_ops_forward",
-            py::arg("input"), py::arg("output"), ...);
+   // bind with pybind11
+   m.def("new_ops_forward", &new_ops_forward, "new_ops_forward",
+           py::arg("input"), py::arg("output"), ...);

-    ...
+   ...

-    ```
+   ```

 5. Build MMCV again, then call the new op from Python.

-    ```python
-    from ..utils import ext_loader
-    ext_module = ext_loader.load_ext('_ext', ['new_ops_forward'])
+   ```python
+   from ..utils import ext_loader
+   ext_module = ext_loader.load_ext('_ext', ['new_ops_forward'])

-    ...
+   ...

-    ext_module.new_ops_forward(input, output, ...)
+   ext_module.new_ops_forward(input, output, ...)

-    ```
+   ```