Commit 4353fa59 authored by limm

add part code
# One-click Script Installation
Through user surveys we learned that most users are already familiar with python and torch before they come to mmdeploy, so we provide scripts that simplify the mmdeploy installation.
Assuming you already have
- python3 -m pip (required; either conda or pyenv works)
- nvcc (depending on the inference backend)
- torch (optional; can be installed later)
run this script to install mmdeploy + the ncnn backend (`nproc` may be left unspecified).
```bash
$ cd /path/to/mmdeploy
$ python3 tools/scripts/build_ubuntu_x64_ncnn.py
..
```
The script may ask for your sudo password along the way. It does its best to build the mmdeploy SDK and demos:
- It detects the OS version, the number of jobs for make and whether you are root, and also fixes pip problems automatically
- It looks for the required base tools, such as g++-7, cmake and wget
- It builds the required dependencies, such as pyncnn and protobuf
The script also tries not to pollute the host environment:
- Dependencies built from source are placed in the `mmdeploy-dep` directory, a sibling of mmdeploy
- It does not modify variables such as PATH, LD_LIBRARY_PATH or PYTHONPATH on its own
- It prints the environment variables you need to set, so **pay attention to the final output**
The script finishes by running `python3 tools/check_env.py`. On success it should print the version of each installed backend together with `ops_is_avaliable : True`, for example:
```bash
$ python3 tools/check_env.py
..
2022-09-13 14:49:13,767 - mmdeploy - INFO - **********Backend information**********
2022-09-13 14:49:14,116 - mmdeploy - INFO - onnxruntime: 1.8.0 ops_is_avaliable : True
2022-09-13 14:49:14,131 - mmdeploy - INFO - tensorrt: 8.4.1.5 ops_is_avaliable : True
2022-09-13 14:49:14,139 - mmdeploy - INFO - ncnn: 1.0.20220901 ops_is_avaliable : True
2022-09-13 14:49:14,150 - mmdeploy - INFO - pplnn_is_avaliable: True
..
```
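If you want to check the result from a script rather than by eye, one lightweight option (our own sketch, not part of mmdeploy) is to grep the `check_env.py` output for the availability flag. The log text below is the sample from above; in practice you would pipe `python3 tools/check_env.py 2>&1` into the same filter:

```shell
# sketch: list the backends whose custom ops are reported available.
# note: the tool itself prints the flag as "ops_is_avaliable".
log='2022-09-13 14:49:14,116 - mmdeploy - INFO - onnxruntime: 1.8.0 ops_is_avaliable : True
2022-09-13 14:49:14,139 - mmdeploy - INFO - ncnn: 1.0.20220901 ops_is_avaliable : True'
# keep only the backend names whose flag is True
available=$(printf '%s\n' "$log" | grep 'ops_is_avaliable : True' | awk '{print $8}' | tr -d ':')
echo "$available"
```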
These installation scripts have been verified. If you want mmdeploy to support several backends at once, simply run each script once:
| script | OS version |
| :-----------------------------: | :-----------------: |
| build_ubuntu_x64_ncnn.py | 18.04/20.04 |
| build_ubuntu_x64_ort.py | 18.04/20.04 |
| build_ubuntu_x64_pplnn.py | 18.04/20.04 |
| build_ubuntu_x64_torchscript.py | 18.04/20.04 |
| build_ubuntu_x64_tvm.py | 18.04/20.04 |
| build_jetson_orin_python38.sh | JetPack5.0 L4T 34.1 |
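Enabling several backends boils down to running one script from the table per backend. A dry-run sketch (it only echoes the commands; remove the `echo` to actually run them, and the script subset chosen here is arbitrary):

```shell
# sketch: run one install script per desired backend (dry run).
cmds=$(for script in build_ubuntu_x64_ncnn.py build_ubuntu_x64_ort.py; do
  echo python3 "tools/scripts/${script}"
done)
echo "$cmds"
```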
# Building from Source Manually
If your network connection is good, we recommend the [docker](build_from_docker.md) or [one-click script](build_from_script.md) approach instead.
## Download
```bash
git clone -b main git@github.com:open-mmlab/mmdeploy.git --recursive
```
### FAQ
- If fetching the repository submodules fails due to network problems, you can try to install the submodules manually with the following commands:
```bash
git clone git@github.com:NVIDIA/cub.git third_party/cub
cd third_party/cub
git checkout c3cceac115
# go back to the third_party directory and clone pybind11
cd ..
git clone git@github.com:pybind/pybind11.git pybind11
cd pybind11
git checkout 70a58c5
cd ..
git clone git@github.com:gabime/spdlog.git spdlog
cd spdlog
git checkout 9e8e52c048
```
- If `git clone` over `SSH` fails, you can try downloading the code over the `HTTPS` protocol:
```bash
git clone -b main https://github.com/open-mmlab/mmdeploy.git MMDeploy
cd MMDeploy
git submodule update --init --recursive
```
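Instead of cloning again over HTTPS, another option is to rewrite the SSH submodule URLs in `.gitmodules` before running `git submodule update`. A sketch of the rewrite (the sample `.gitmodules` content is illustrative; in a real checkout you would run the `sed` on the file itself and then `git submodule sync`):

```shell
# sketch: turn SSH GitHub URLs into HTTPS ones.
gitmodules='[submodule "third_party/pybind11"]
	url = git@github.com:pybind/pybind11.git'
rewritten=$(printf '%s\n' "$gitmodules" | sed 's#git@github.com:#https://github.com/#')
echo "$rewritten"
```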
## Build
Depending on your target platform, follow the corresponding link below to build MMDeploy:
- [Linux-x86_64](linux-x86_64.md)
- [Windows](windows.md)
- [macOS](macos-arm64.md)
- [Android-aarch64](android.md)
- [NVIDIA Jetson](jetsons.md)
- [Qcom SNPE](snpe.md)
- [RISC-V](riscv.md)
- [Rockchip](rockchip.md)
# CMake Build Options
<table class="docutils">
<thead>
<tr>
<th>Option</th>
<th>Values</th>
<th>Default</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>MMDEPLOY_SHARED_LIBS</td>
<td>{ON, OFF}</td>
<td>ON</td>
<td>Switch for building shared libraries. When set to OFF, static libraries are built</td>
</tr>
<tr>
<td>MMDEPLOY_BUILD_SDK</td>
<td>{ON, OFF}</td>
<td>OFF</td>
<td>Switch for building the MMDeploy SDK</td>
</tr>
<tr>
<td>MMDEPLOY_BUILD_SDK_MONOLITHIC</td>
<td>{ON, OFF}</td>
<td>OFF</td>
<td>Build a single monolithic lib file</td>
</tr>
<tr>
<td>MMDEPLOY_BUILD_TEST</td>
<td>{ON, OFF}</td>
<td>OFF</td>
<td>Switch for building the unit tests of the MMDeploy SDK</td>
</tr>
<tr>
<td>MMDEPLOY_BUILD_SDK_PYTHON_API</td>
<td>{ON, OFF}</td>
<td>OFF</td>
<td>Switch for building the SDK python package</td>
</tr>
<tr>
<td>MMDEPLOY_BUILD_SDK_CSHARP_API</td>
<td>{ON, OFF}</td>
<td>OFF</td>
<td>Switch for building the SDK C# package</td>
</tr>
<tr>
<td>MMDEPLOY_BUILD_SDK_JAVA_API</td>
<td>{ON, OFF}</td>
<td>OFF</td>
<td>Switch for building the SDK Java package</td>
</tr>
<tr>
<td>MMDEPLOY_BUILD_EXAMPLES</td>
<td>{ON, OFF}</td>
<td>OFF</td>
<td>Whether to build the demos</td>
</tr>
<tr>
<td>MMDEPLOY_SPDLOG_EXTERNAL</td>
<td>{ON, OFF}</td>
<td>OFF</td>
<td>Whether to use the system-installed spdlog package</td>
</tr>
<tr>
<td>MMDEPLOY_ZIP_MODEL</td>
<td>{ON, OFF}</td>
<td>OFF</td>
<td>Whether to use the zip-format sdk directory</td>
</tr>
<tr>
<td>MMDEPLOY_COVERAGE</td>
<td>{ON, OFF}</td>
<td>OFF</td>
<td>Add extra compile options to generate code coverage reports</td>
</tr>
<tr>
<td>MMDEPLOY_TARGET_DEVICES</td>
<td>{"cpu", "cuda"}</td>
<td>cpu</td>
<td>Set the target devices. When there are multiple devices, separate their names with semicolons, e.g. -DMMDEPLOY_TARGET_DEVICES="cpu;cuda"</td>
</tr>
<tr>
<td>MMDEPLOY_TARGET_BACKENDS</td>
<td>{"trt", "ort", "pplnn", "ncnn", "openvino", "torchscript", "snpe", "coreml", "tvm"}</td>
<td>N/A</td>
<td> <b>By default, the SDK enables no backend</b>, since the choice is highly application-specific. When selecting multiple backends, separate them with semicolons, e.g. <pre><code>-DMMDEPLOY_TARGET_BACKENDS="trt;ort;pplnn;ncnn;openvino"</code></pre>
At build time, almost every backend requires some path variables for locating its dependencies.<br>
1. <b>trt</b>: TensorRT. <code>TENSORRT_DIR</code> and <code>CUDNN_DIR</code> are required
<pre><code>
-DTENSORRT_DIR=$env:TENSORRT_DIR
-DCUDNN_DIR=$env:CUDNN_DIR
</code></pre>
2. <b>ort</b>: ONNXRuntime. <code>ONNXRUNTIME_DIR</code> is required
<pre><code>-DONNXRUNTIME_DIR=$env:ONNXRUNTIME_DIR</code></pre>
3. <b>pplnn</b>: PPL.NN. <code>pplnn_DIR</code> is required<br>
4. <b>ncnn</b>: ncnn. <code>ncnn_DIR</code> is required<br>
5. <b>openvino</b>: OpenVINO. <code>InferenceEngine_DIR</code> is required<br>
6. <b>torchscript</b>: TorchScript. <code>Torch_DIR</code> is required<br>
7. <b>snpe</b>: Qualcomm SNPE. The environment variable SNPE_ROOT must be set<br>
8. <b>coreml</b>: Core ML. <code>Torch_DIR</code> is currently required for model conversion<br>
9. <b>tvm</b>: TVM. <code>TVM_DIR</code> is required<br>
</td>
</tr>
<tr>
<td>MMDEPLOY_CODEBASES</td>
<td>{"mmpretrain", "mmdet", "mmseg", "mmagic", "mmocr", "all"}</td>
<td>all</td>
<td>Sets the SDK post-processing components, i.e. which OpenMMLab codebases' post-processing to load. When selecting multiple codebases, separate them with semicolons, e.g. <code>-DMMDEPLOY_CODEBASES="mmpretrain;mmdet"</code>. You can also load all codebases with <code>-DMMDEPLOY_CODEBASES=all</code>.</td>
</tr>
</tbody>
</table>
# Cross-compiling for aarch64 on Ubuntu
mmdeploy chooses ncnn as the inference backend for aarch64 embedded Linux devices. A complete deployment consists of two parts:
Host
- Model conversion
- Cross-compiling the SDK and binaries needed by the embedded device
Device
- Running the build artifacts
## 1. Model Conversion on the Host
Install [mmdeploy](../01-how-to-build/) and [mmpretrain](https://github.com/open-mmlab/mmpretrain) as documented, then convert the resnet18 model package:
```bash
export MODEL_CONFIG=/path/to/mmpretrain/configs/resnet/resnet18_8xb32_in1k.py
export MODEL_PATH=https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth
# model conversion
cd /path/to/mmdeploy
python tools/deploy.py \
configs/mmpretrain/classification_ncnn_static.py \
$MODEL_CONFIG \
$MODEL_PATH \
tests/data/tiger.jpeg \
--work-dir resnet18 \
--device cpu \
--dump-info
```
## 2. Cross-compiling on the Host
We recommend building directly with the script:
```bash
sh -x tools/scripts/ubuntu_cross_build_aarch64.sh
```
The manual steps behind the script are listed below.
a) Install the aarch64 cross-compilation toolchain
```bash
sudo apt install -y gcc-aarch64-linux-gnu g++-aarch64-linux-gnu
```
b) Cross-compile opencv and install it into a tmp directory
```bash
git clone https://github.com/opencv/opencv --depth=1 --branch=4.x --recursive
cd opencv/platforms/linux/
mkdir build && cd build
cmake ../../.. \
-DCMAKE_INSTALL_PREFIX=/tmp/ocv-aarch64 \
-DCMAKE_TOOLCHAIN_FILE=../aarch64-gnu.toolchain.cmake
make -j && make install
ls -alh /tmp/ocv-aarch64
..
```
c) Cross-compile ncnn and install it into a tmp directory
```bash
git clone https://github.com/tencent/ncnn --branch 20221128 --depth=1
cd ncnn
mkdir build && cd build
cmake .. \
-DCMAKE_TOOLCHAIN_FILE=../toolchains/aarch64-linux-gnu.toolchain.cmake \
-DCMAKE_INSTALL_PREFIX=/tmp/ncnn-aarch64
make -j && make install
ls -alh /tmp/ncnn-aarch64
..
```
d) Cross-compile mmdeploy; the executables are installed under install/bin
```bash
cd /path/to/mmdeploy
git submodule init
git submodule update
mkdir build && cd build
cmake .. \
-DCMAKE_TOOLCHAIN_FILE=../cmake/toolchains/aarch64-linux-gnu.cmake \
-DMMDEPLOY_TARGET_DEVICES="cpu" \
-DMMDEPLOY_TARGET_BACKENDS="ncnn" \
-Dncnn_DIR=/tmp/ncnn-aarch64/lib/cmake/ncnn \
-DOpenCV_DIR=/tmp/ocv-aarch64/lib/cmake/opencv4
make install
ls -lah install/bin/*
..
```
## 3. Running on the Device
Make sure the model was converted with `--dump-info`, so that the `resnet18` directory contains `pipeline.json` and the other files the SDK needs.
Copy the dumped model directory (resnet18), the executable (image_classification), the test image (tests/data/tiger.jpeg) and the cross-compiled OpenCV (/tmp/ocv-aarch64) to the device:
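The copy itself is usually done with `scp`. A dry-run sketch that only prints the commands (the device address and target directory are assumptions; drop the `echo` to execute):

```shell
# sketch: print the scp commands for the artifacts listed above (dry run).
device=root@192.168.1.100   # hypothetical device address
cmds=$(for f in resnet18 install/bin/image_classification tests/data/tiger.jpeg /tmp/ocv-aarch64; do
  echo scp -r "$f" "${device}:/root/deploy/"
done)
echo "$cmds"
```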
```bash
./image_classification cpu ./resnet18 tiger.jpeg
..
label: 292, score: 0.9261
label: 282, score: 0.0726
label: 290, score: 0.0008
label: 281, score: 0.0002
label: 340, score: 0.0001
```
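If you want to consume the demo output in a script, the top-1 label can be extracted with standard text tools. A sketch, using the sample output above as stand-in input:

```shell
# sketch: pick the label with the highest score from the demo output.
out='label: 292, score: 0.9261
label: 282, score: 0.0726
label: 290, score: 0.0008'
# sort by the score field (after the second colon), highest first
top1=$(printf '%s\n' "$out" | sort -t: -k3 -rn | head -n1 | awk '{print $2}' | tr -d ',')
echo "$top1"
```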
# How to Install MMDeploy on Jetson Modules
This tutorial introduces how to install MMDeploy on NVIDIA Jetson platforms. It has been verified on the following 3 Jetson modules:
- Jetson Nano
- Jetson TX2
- Jetson AGX Xavier
## Prerequisites
First, the JetPack SDK has to be installed on the Jetson module.
In addition, converting PyTorch models to ONNX with MMDeploy's Model Converter requires an environment with PyTorch installed.
Finally, regarding the toolchain, CMake and GCC must be at least version 3.14 and 7.0 respectively.
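The minimum-version requirement can be checked in a script by comparing version strings with `sort -V`. A sketch (the `installed` value is a sample; in practice take it from `cmake --version`):

```shell
# sketch: verify that an installed version is at least the required one.
required=3.14.0
installed=3.23.1   # sample value
# version-sort the two strings; if the requirement sorts first, it is met
lowest=$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n1)
[ "$lowest" = "$required" ] && meets=yes || meets=no
echo "$meets"
```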
### JetPack SDK
The JetPack SDK provides a full development environment for building hardware-accelerated edge AI applications.
It supports all Jetson modules and developer kits.
There are two main ways to install the JetPack SDK:
1. Use the SD card image and write the image to the SD card directly
2. Use the NVIDIA SDK Manager
You can find a detailed installation guide on the NVIDIA [website](https://developer.nvidia.com/jetpack-sdk-50dp).
Here we choose [JetPack 4.6.1](https://developer.nvidia.com/jetpack-sdk-461) for the Jetson modules. MMDeploy has been tested on JetPack 4.6 rev3 and above with TensorRT 8.0.1.6 and above. Earlier JetPack versions are incompatible with TensorRT 7.x.
### Conda
Install [Archiconda](https://github.com/Archiconda/build-tools/releases) instead of Anaconda, because the latter does not provide wheel files for Jetson.
```shell
wget https://github.com/Archiconda/build-tools/releases/download/0.2.3/Archiconda3-0.2.3-Linux-aarch64.sh
bash Archiconda3-0.2.3-Linux-aarch64.sh -b
echo -e '\n# set environment variable for conda' >> ~/.bashrc
echo ". ~/archiconda3/etc/profile.d/conda.sh" >> ~/.bashrc
echo 'export PATH=$PATH:~/archiconda3/bin' >> ~/.bashrc
echo -e '\n# set environment variable for pip' >> ~/.bashrc
echo 'export OPENBLAS_CORETYPE=ARMV8' >> ~/.bashrc
source ~/.bashrc
conda --version
```
After the installation, create and activate a conda environment.
```shell
# get the version of the default python3
export PYTHON_VERSION=`python3 --version | cut -d' ' -f 2 | cut -d'.' -f1,2`
conda create -y -n mmdeploy python=${PYTHON_VERSION}
conda activate mmdeploy
```
```{note}
JetPack SDK 4+ ships with python 3.6. We strongly recommend sticking with the default python version. Trying to upgrade python may break the JetPack environment.
If a newer python is a must, consider installing JetPack 5+, which provides python 3.8.
```
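The `PYTHON_VERSION` pipeline above can be sanity-checked on a fixed string (the `Python 3.6.9` sample matches JetPack 4's default):

```shell
# sketch: the same cut pipeline as above, fed a sample version string.
sample='Python 3.6.9'
version=$(printf '%s' "$sample" | cut -d' ' -f2 | cut -d'.' -f1,2)
echo "$version"
```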
### PyTorch
Download the PyTorch wheel for Jetson from [here](https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-10-now-available/72048) and save it in the local directory `/opt`.
Also, since torchvision does not provide prebuilt packages for the Jetson platform, it has to be built from source.
Take `torch 1.10.0` and `torchvision 0.11.1` as an example; they can be installed as follows:
```shell
# pytorch
wget https://nvidia.box.com/shared/static/fjtbno0vpo676a25cgvuqc1wty0fkkg6.whl -O torch-1.10.0-cp36-cp36m-linux_aarch64.whl
pip3 install torch-1.10.0-cp36-cp36m-linux_aarch64.whl
# torchvision
sudo apt-get install libjpeg-dev zlib1g-dev libpython3-dev libavcodec-dev libavformat-dev libswscale-dev libopenblas-dev -y
sudo rm -r torchvision
git clone https://github.com/pytorch/vision torchvision
cd torchvision
git checkout tags/v0.11.1 -b v0.11.1
export BUILD_VERSION=0.11.1
pip install -e .
```
If you install other versions of PyTorch and torchvision, check the table [here](https://pypi.org/project/torchvision/) to make sure the versions are compatible.
### CMake
Here we use the latest CMake release as of April 2022, v3.23.1.
```shell
# purge existing
sudo apt-get purge cmake
sudo snap remove cmake
# install prebuilt binary
export CMAKE_VER=3.23.1
export ARCH=aarch64
wget https://github.com/Kitware/CMake/releases/download/v${CMAKE_VER}/cmake-${CMAKE_VER}-linux-${ARCH}.sh
chmod +x cmake-${CMAKE_VER}-linux-${ARCH}.sh
sudo ./cmake-${CMAKE_VER}-linux-${ARCH}.sh --prefix=/usr --skip-license
cmake --version
```
## Install Dependencies
The Model Converter in MMDeploy depends on [MMCV](https://github.com/open-mmlab/mmcv) and the inference engine [TensorRT](https://developer.nvidia.com/tensorrt).
Meanwhile, MMDeploy's C/C++ Inference SDK depends on [spdlog](https://github.com/gabime/spdlog), OpenCV, [ppl.cv](https://github.com/openppl-public/ppl.cv), TensorRT and more.
Therefore we first show how to set up TensorRT,
and then present the steps for installing the dependencies of the Model Converter and the C/C++ Inference SDK.
### Set up TensorRT
The JetPack SDK ships with TensorRT.
However, to be able to import it inside the conda environment, we need to copy TensorRT into the conda environment created earlier.
```shell
cp -r /usr/lib/python${PYTHON_VERSION}/dist-packages/tensorrt* ~/archiconda3/envs/mmdeploy/lib/python${PYTHON_VERSION}/site-packages/
conda deactivate
conda activate mmdeploy
python -c "import tensorrt; print(tensorrt.__version__)" # will print the TensorRT version
# set an environment variable for building MMDeploy later
export TENSORRT_DIR=/usr/include/aarch64-linux-gnu
# append the cuda bin and lib paths to `$PATH` and `$LD_LIBRARY_PATH`, also for building MMDeploy later
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
```
You can also make these environment variables permanent by adding them to `~/.bashrc`.
```shell
echo -e '\n# set environment variable for TensorRT' >> ~/.bashrc
echo 'export TENSORRT_DIR=/usr/include/aarch64-linux-gnu' >> ~/.bashrc
echo -e '\n# set environment variable for CUDA' >> ~/.bashrc
echo 'export PATH=$PATH:/usr/local/cuda/bin' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64' >> ~/.bashrc
source ~/.bashrc
conda activate mmdeploy
```
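Re-running the snippet above appends duplicate lines to `~/.bashrc`. A sketch of an idempotent variant, guarded with `grep -qxF` (demonstrated on a temporary file so it does not touch your real rc file):

```shell
# sketch: append an export line only if it is not already present.
rc=$(mktemp)
line='export TENSORRT_DIR=/usr/include/aarch64-linux-gnu'
for run in 1 2; do                 # simulate running the setup twice
  grep -qxF "$line" "$rc" || echo "$line" >> "$rc"
done
count=$(grep -cxF "$line" "$rc")   # the line appears once, not twice
echo "$count"
rm -f "$rc"
```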
### Install the Dependencies of the Model Converter
- Install [MMCV](https://github.com/open-mmlab/mmcv)
MMCV does not provide prebuilt packages for the Jetson platform yet, so we have to build it from source.
```shell
sudo apt-get install -y libssl-dev
git clone --branch 2.x https://github.com/open-mmlab/mmcv.git
cd mmcv
MMCV_WITH_OPS=1 pip install -e .
```
- Install ONNX
Do not install the latest ONNX; the recommended ONNX version is 1.10.0.
```shell
# choose either of the following
python3 -m pip install onnx==1.10.0
conda install -c conda-forge onnx
```
If the installation fails with an error like the following:
```
CMake Error at CMakeLists.txt:299 (message):
Protobuf compiler not found
```
install the missing dependencies as follows:
```shell
sudo apt-get install protobuf-compiler libprotoc-dev
```
- Install ONNX Runtime [optional]
Visit [Jetson_Zoo#ONNX_Runtime](https://elinux.org/Jetson_Zoo#ONNX_Runtime) to find the matching ONNX Runtime version, then download and install it.
Example:
```
# Download pip wheel from location mentioned above
$ wget https://nvidia.box.com/shared/static/jy7nqva7l88mq9i8bw3g3sklzf4kccn2.whl -O onnxruntime_gpu-1.10.0-cp36-cp36m-linux_aarch64.whl
# Install pip wheel
$ pip3 install onnxruntime_gpu-1.10.0-cp36-cp36m-linux_aarch64.whl
```
- Install h5py and pycuda
The Model Converter uses HDF5 to store the calibration data for TensorRT INT8 quantization, and pycuda to copy device memory
```shell
sudo apt-get install -y pkg-config libhdf5-100 libhdf5-dev
pip install versioned-hdf5 pycuda
```
### Install the Dependencies of the SDK
You can skip this step if you do not need the MMDeploy C/C++ Inference SDK.
- Install [spdlog](https://github.com/gabime/spdlog)
`spdlog` is a fast, header-only C++ logging library.
```shell
sudo apt-get install -y libspdlog-dev
```
- Install [ppl.cv](https://github.com/openppl-public/ppl.cv)
`ppl.cv` is the high-performance image processing library of [OpenPPL](https://openppl.ai/home).
```shell
git clone https://github.com/openppl-public/ppl.cv.git
cd ppl.cv
export PPLCV_DIR=$(pwd)
echo -e '\n# set environment variable for ppl.cv' >> ~/.bashrc
echo "export PPLCV_DIR=$(pwd)" >> ~/.bashrc
./build.sh cuda
```
## Install MMDeploy
```shell
git clone -b main --recursive https://github.com/open-mmlab/mmdeploy.git
cd mmdeploy
export MMDEPLOY_DIR=$(pwd)
```
### Install the Model Converter
Some operators use implementations from the OpenMMLab codebases that are not supported by TensorRT,
so we provide custom TensorRT plugins for them, such as `roi_align` and `scatternd`.
You can find the full list of custom plugins [here](../06-custom-ops/tensorrt.md).
```shell
# build the TensorRT custom ops
mkdir -p build && cd build
cmake .. -DMMDEPLOY_TARGET_BACKENDS="trt"
make -j$(nproc) && make install
# install the model converter
cd ${MMDEPLOY_DIR}
pip install -v -e .
# "-v" means verbose installation
# "-e" means installing in editable mode,
# so any local change to the code takes effect without reinstalling.
```
### Install the C/C++ Inference SDK
You can skip this step if you do not need the MMDeploy C/C++ Inference SDK.
1. Build the SDK libraries and demos
```shell
mkdir -p build && cd build
cmake .. \
-DMMDEPLOY_BUILD_SDK=ON \
-DMMDEPLOY_BUILD_SDK_PYTHON_API=ON \
-DMMDEPLOY_BUILD_EXAMPLES=ON \
-DMMDEPLOY_TARGET_DEVICES="cuda;cpu" \
-DMMDEPLOY_TARGET_BACKENDS="trt" \
-DMMDEPLOY_CODEBASES=all \
-Dpplcv_DIR=${PPLCV_DIR}/cuda-build/install/lib/cmake/ppl
make -j$(nproc) && make install
```
2. Run a demo
Take object detection as an example:
```shell
./object_detection cuda ${directory/to/the/converted/models} ${path/to/an/image}
```
## Troubleshooting
### Installation
- `pip install` fails with `Illegal instruction (core dumped)`
```shell
echo '# set env for pip' >> ~/.bashrc
echo 'export OPENBLAS_CORETYPE=ARMV8' >> ~/.bashrc
source ~/.bashrc
```
If the above does not solve the problem, check whether you are using a conda mirror. If so, try:
```shell
rm .condarc
conda clean -i
conda create -n xxx python=${PYTHON_VERSION}
```
### Runtime
- `#assertion/root/workspace/mmdeploy/csrc/backend_ops/tensorrt/batched_nms/trt_batched_nms.cpp,98` or `pre_top_k need to be reduced for devices with arch 7.2`
1. Switch to `MAX N` mode and run `sudo nvpmodel -m 0 && sudo jetson_clocks`
2. Following [mmdet pre_top_k](https://github.com/open-mmlab/mmdeploy/blob/34879e638cc2db511e798a376b9a4b9932660fe1/configs/mmdet/_base_/base_static.py#L13), reduce `pre_top_k` in the config file, e.g. to `1000`
3. Convert the model again and re-run the demo.
### FAQ
- Error `error: cannot import name 'ProcessGroup' from 'torch.distributed'.`
- Visit [pytorch-for-jetson](https://forums.developer.nvidia.com/t/pytorch-for-jetson/72048) and install pytorch 1.11.
# Build for Linux-x86_64
- [Build for Linux-x86_64](#build-for-linux-x86_64)
  - [Install from Source](#install-from-source)
    - [Install Toolchains](#install-toolchains)
    - [Install Dependencies](#install-dependencies)
      - [Install Dependencies for MMDeploy Converter](#install-dependencies-for-mmdeploy-converter)
      - [Install Dependencies for MMDeploy SDK](#install-dependencies-for-mmdeploy-sdk)
      - [Install Inference Engines](#install-inference-engines)
    - [Build MMDeploy](#build-mmdeploy)
      - [Build Model Converter](#build-model-converter)
      - [Install Model Converter](#install-model-converter)
      - [Build SDK and Demos](#build-sdk-and-demos)
______________________________________________________________________
## Install from Source
### Install Toolchains
- cmake
**Make sure the cmake version is >= 3.14.0**. If not, you can install version 3.20.0 with the commands below. For other versions, please refer to [here](https://cmake.org/install).
```bash
wget https://github.com/Kitware/CMake/releases/download/v3.20.0/cmake-3.20.0-linux-x86_64.tar.gz
tar -xzvf cmake-3.20.0-linux-x86_64.tar.gz
sudo ln -sf $(pwd)/cmake-3.20.0-linux-x86_64/bin/* /usr/bin/
```
- GCC 7+
The MMDeploy SDK uses C++17 features, so gcc 7+ is required.
```bash
# if the Ubuntu version is < 18.04, add the toolchain repository
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-7
sudo apt-get install g++-7
```
### Install Dependencies
#### Install Dependencies for MMDeploy Converter
<table class="docutils">
<thead>
<tr>
<th>Name </th>
<th>Installation </th>
</tr>
</thead>
<tbody>
<tr>
<td>conda </td>
<td>Please install conda following the <a href="https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html">official guide</a>.<br> Then create and activate a python environment with conda.<br>
<pre><code>
conda create -n mmdeploy python=3.7 -y
conda activate mmdeploy
</code></pre>
</td>
</tr>
<tr>
<td>PyTorch <br>(>=1.8.0) </td>
<td>Install PyTorch (torch>=1.8.0). See the <a href="https://pytorch.org/">official site</a> for detailed instructions. Make sure the CUDA version required by PyTorch matches the CUDA version on your host.<br>
<pre><code>
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge
</code></pre>
</td>
</tr>
<tr>
<td>mmcv </td>
<td>Install mmcv with the commands below. For more installation options, see the <a href="https://github.com/open-mmlab/mmcv">mmcv website</a>.<br>
<pre><code>
export cu_version=cu111 # cuda 11.1
export torch_version=torch1.8
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0rc2"
</code></pre>
</td>
</tr>
</tbody>
</table>
#### Install Dependencies for MMDeploy SDK
You can skip this section if you are only interested in model conversion.
<table class="docutils">
<thead>
<tr>
<th>Name </th>
<th>Installation </th>
</tr>
</thead>
<tbody>
<tr>
<td>OpenCV<br>(>=3.0) </td>
<td>
On Ubuntu 18.04 and above:
<pre><code>
sudo apt-get install libopencv-dev
</code></pre>
On Ubuntu 16.04, OpenCV has to be built from source; you can refer to <a href="https://docs.opencv.org/3.4/d7/d9f/tutorial_linux_install.html">here</a>.
</td>
</tr>
<tr>
<td>pplcv </td>
<td>pplcv is a high-performance image processing library developed by openPPL. <b>This dependency is optional and only needs to be installed on the cuda platform.</b><br>
<pre><code>
git clone https://github.com/openppl-public/ppl.cv.git
cd ppl.cv
export PPLCV_DIR=$(pwd)
git checkout tags/v0.7.0 -b v0.7.0
./build.sh cuda
</code></pre>
</td>
</tr>
</tbody>
</table>
#### Install Inference Engines
The Model Converter and the SDK of MMDeploy share the same inference engines. You can pick the ones you are interested in and install them as described below.
<table class="docutils">
<thead>
<tr>
<th>Name</th>
<th>Package</th>
<th>Installation</th>
</tr>
</thead>
<tbody>
<tr>
<td>ONNXRuntime</td>
<td>onnxruntime<br>(>=1.8.1) </td>
<td>
1. Install the python package of onnxruntime
<pre><code>pip install onnxruntime==1.8.1</code></pre>
2. Download the prebuilt onnxruntime package from <a href="https://github.com/microsoft/onnxruntime/releases/tag/v1.8.1">here</a>. Extract it and set the environment variables with the following commands:
<pre><code>
wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-1.8.1.tgz
tar -zxvf onnxruntime-linux-x64-1.8.1.tgz
cd onnxruntime-linux-x64-1.8.1
export ONNXRUNTIME_DIR=$(pwd)
export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH
</code></pre>
</td>
</tr>
<tr>
<td rowspan="2">TensorRT<br> </td>
<td>TensorRT <br> </td>
<td>
1. Log in to the <a href="https://www.nvidia.com/">NVIDIA website</a> and download the TensorRT tarball from <a href="https://developer.nvidia.com/nvidia-tensorrt-download">here</a>. Make sure it matches your machine's CPU architecture and CUDA version.<br>
You can follow this <a href="https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#installing-tar">guide</a> to install TensorRT.<br>
2. As a reference, here is how to install TensorRT 8.2 GA Update 2 on Linux x86_64 with CUDA 11.x. First, click <a href="https://developer.nvidia.com/compute/machine-learning/tensorrt/secure/8.2.3.0/tars/tensorrt-8.2.3.0.linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz">here</a> to download the CUDA 11.x TensorRT 8.2.3.0 tarball. Then install and configure TensorRT and its dependencies with the commands below.
<pre><code>
cd /the/path/of/tensorrt/tar/gz/file
tar -zxvf TensorRT-8.2.3.0.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz
pip install TensorRT-8.2.3.0/python/tensorrt-8.2.3.0-cp37-none-linux_x86_64.whl
export TENSORRT_DIR=$(pwd)/TensorRT-8.2.3.0
export LD_LIBRARY_PATH=$TENSORRT_DIR/lib:$LD_LIBRARY_PATH
pip install pycuda
</code></pre>
</td>
</tr>
<tr>
<td>cuDNN </td>
<td>
1. Select the cuDNN version matching your CPU architecture, CUDA version and TensorRT version from the <a href="https://developer.nvidia.com/rdp/cudnn-archive">cuDNN Archive</a>. Following the TensorRT example above, which needs cudnn8.2, download <a href="https://developer.nvidia.com/compute/machine-learning/cudnn/secure/8.2.1.32/11.3_06072021/cudnn-11.3-linux-x64-v8.2.1.32.tgz">CUDA 11.x cuDNN 8.2</a>.<br>
2. Extract the tarball and set the environment variables
<pre><code>
cd /the/path/of/cudnn/tgz/file
tar -zxvf cudnn-11.3-linux-x64-v8.2.1.32.tgz
export CUDNN_DIR=$(pwd)/cuda
export LD_LIBRARY_PATH=$CUDNN_DIR/lib64:$LD_LIBRARY_PATH
</code></pre>
</td>
</tr>
<tr>
<td>PPL.NN</td>
<td>ppl.nn </td>
<td>
1. Build ppl.nn and install pyppl following its <a href="https://github.com/openppl-public/ppl.nn/blob/master/docs/en/building-from-source.md">build guide</a>.<br>
2. Export the pplnn root directory as an environment variable
<pre><code>
cd ppl.nn
export PPLNN_DIR=$(pwd)
</code></pre>
</td>
</tr>
<tr>
<td>OpenVINO</td>
<td>openvino </td>
<td>1. Install <a href="https://docs.openvino.ai/2021.4/get_started.html">OpenVINO</a>
<pre><code>
pip install openvino-dev
</code></pre>
2. <b>Optional</b>. If you want to use OpenVINO in the MMDeploy SDK, install and configure it following this <a href="https://docs.openvino.ai/2021.4/openvino_docs_install_guides_installing_openvino_linux.html#install-openvino">guide</a>
</td>
</tr>
<tr>
<td>ncnn </td>
<td>ncnn </td>
<td>1. Build ncnn following its <a href="https://github.com/Tencent/ncnn/wiki/how-to-build">wiki</a>.
Make sure to enable <code>-DNCNN_PYTHON=ON</code> when building<br>
2. Export the ncnn root directory as an environment variable
<pre><code>
cd ncnn
export NCNN_DIR=$(pwd)
</code></pre>
3. Install pyncnn
<pre><code>
cd ${NCNN_DIR}/python
pip install -e .
</code></pre>
</td>
</tr>
<tr>
<td>TorchScript</td>
<td>libtorch</td>
<td>
1. Download libtorch from <a href="https://pytorch.org/get-started/locally/">here</a>. Please note that only <b>Pre-cxx11 ABI</b> and <b>version 1.8.1+</b> on Linux platform are supported by now. For previous versions of libtorch, you can find them in the <a href="https://github.com/pytorch/pytorch/issues/40961#issuecomment-1017317786">issue comment</a>. <br>
2. Take Libtorch1.8.1+cu111 as an example. You can install it like this:
<pre><code>
wget https://download.pytorch.org/libtorch/cu111/libtorch-shared-with-deps-1.8.1%2Bcu111.zip
unzip libtorch-shared-with-deps-1.8.1+cu111.zip
cd libtorch
export Torch_DIR=$(pwd)
export LD_LIBRARY_PATH=$Torch_DIR/lib:$LD_LIBRARY_PATH
</code></pre>
</td>
</tr>
<tr>
<td>Ascend</td>
<td>CANN</td>
<td>
1. Install the CANN toolkit following the <a href="https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/60RC1alpha02/softwareinstall/instg/atlasdeploy_03_0002.html">official guide</a>.<br>
2. Set up the environment
<pre><code>
export ASCEND_TOOLKIT_HOME="/usr/local/Ascend/ascend-toolkit/latest"
</code></pre>
</td>
</tr>
<tr>
<td>TVM</td>
<td>TVM</td>
<td>
1. Install TVM following the <a href="https://tvm.apache.org/docs/install/from_source.html">official guide</a>.<br>
2. Set up the environment
<pre><code>
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${TVM_HOME}/build
export PYTHONPATH=${TVM_HOME}/python:${PYTHONPATH}
</code></pre>
</td>
</tr>
</tbody>
</table>
Note: <br>
If you want to make the above environment variables permanent, add them to <code>~/.bashrc</code>. Take the ONNXRuntime variables as an example:
```bash
echo '# set env for onnxruntime' >> ~/.bashrc
echo "export ONNXRUNTIME_DIR=${ONNXRUNTIME_DIR}" >> ~/.bashrc
echo "export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH" >> ~/.bashrc
source ~/.bashrc
```
### Build MMDeploy
```bash
cd /the/root/path/of/MMDeploy
export MMDEPLOY_DIR=$(pwd)
```
#### Build Model Converter
If you selected ONNXRuntime, TensorRT, ncnn or torchscript as an inference backend, you need to build the corresponding custom ops library.
- **ONNXRuntime** custom ops
```bash
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake -DCMAKE_CXX_COMPILER=g++-7 -DMMDEPLOY_TARGET_BACKENDS=ort -DONNXRUNTIME_DIR=${ONNXRUNTIME_DIR} ..
make -j$(nproc) && make install
```
- **TensorRT** custom ops
```bash
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake -DCMAKE_CXX_COMPILER=g++-7 -DMMDEPLOY_TARGET_BACKENDS=trt -DTENSORRT_DIR=${TENSORRT_DIR} -DCUDNN_DIR=${CUDNN_DIR} ..
make -j$(nproc) && make install
```
- **ncnn** custom ops
```bash
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake -DCMAKE_CXX_COMPILER=g++-7 -DMMDEPLOY_TARGET_BACKENDS=ncnn -Dncnn_DIR=${NCNN_DIR}/build/install/lib/cmake/ncnn ..
make -j$(nproc) && make install
```
- **torchscript** custom ops
```bash
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake -DMMDEPLOY_TARGET_BACKENDS=torchscript -DTorch_DIR=${Torch_DIR} ..
make -j$(nproc) && make install
```
See the [cmake options](cmake_option.md) for reference.
#### Install Model Converter
```bash
cd ${MMDEPLOY_DIR}
mim install -e .
```
**Note**
- Some dependencies are optional. Running `pip install -e .` installs only the minimum requirements. To install the other optional dependencies, run `pip install -r requirements/optional.txt`,
or `pip install -e .[optional]`, where `[optional]` can be replaced by `all`, `tests`, `build` or `optional`.
- On cuda 10, we recommend installing the [patch](https://developer.nvidia.com/cuda-10.2-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=runfilelocal), otherwise GEMM-related errors may occur when running models
#### Build SDK and Demos
The following shows some examples of building the SDK. You can follow them to enable other inference engines.
- cpu + ONNXRuntime
```Bash
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake .. \
-DCMAKE_CXX_COMPILER=g++-7 \
-DMMDEPLOY_BUILD_SDK=ON \
-DMMDEPLOY_BUILD_EXAMPLES=ON \
-DMMDEPLOY_BUILD_SDK_PYTHON_API=ON \
-DMMDEPLOY_TARGET_DEVICES=cpu \
-DMMDEPLOY_TARGET_BACKENDS=ort \
-DONNXRUNTIME_DIR=${ONNXRUNTIME_DIR}
make -j$(nproc) && make install
```
- cuda + TensorRT
```Bash
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake .. \
-DCMAKE_CXX_COMPILER=g++-7 \
-DMMDEPLOY_BUILD_SDK=ON \
-DMMDEPLOY_BUILD_EXAMPLES=ON \
-DMMDEPLOY_BUILD_SDK_PYTHON_API=ON \
-DMMDEPLOY_TARGET_DEVICES="cuda;cpu" \
-DMMDEPLOY_TARGET_BACKENDS=trt \
-Dpplcv_DIR=${PPLCV_DIR}/cuda-build/install/lib/cmake/ppl \
-DTENSORRT_DIR=${TENSORRT_DIR} \
-DCUDNN_DIR=${CUDNN_DIR}
make -j$(nproc) && make install
```
- pplnn
```Bash
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake .. \
-DCMAKE_CXX_COMPILER=g++-7 \
-DMMDEPLOY_BUILD_SDK=ON \
-DMMDEPLOY_BUILD_EXAMPLES=ON \
-DMMDEPLOY_BUILD_SDK_PYTHON_API=ON \
-DMMDEPLOY_TARGET_DEVICES="cuda;cpu" \
-DMMDEPLOY_TARGET_BACKENDS=pplnn \
-Dpplcv_DIR=${PPLCV_DIR}/cuda-build/install/lib/cmake/ppl \
-Dpplnn_DIR=${PPLNN_DIR}/pplnn-build/install/lib/cmake/ppl
make -j$(nproc) && make install
```
- cuda + TensorRT + onnxruntime + openvino + ncnn
If you used the [ncnn auto-install script](../../../tools/scripts/build_ubuntu_x64_ncnn.py), protobuf is installed in mmdeploy-dep/pbinstall, a sibling directory of mmdeploy.
```Bash
export PROTO_DIR=/path/to/mmdeploy-dep/pbinstall
cmake .. \
-DCMAKE_CXX_COMPILER=g++-7 \
-DMMDEPLOY_BUILD_SDK=ON \
-DMMDEPLOY_BUILD_EXAMPLES=ON \
-DMMDEPLOY_BUILD_SDK_PYTHON_API=ON \
-DMMDEPLOY_TARGET_DEVICES="cuda;cpu" \
-DMMDEPLOY_TARGET_BACKENDS="trt;ort;ncnn;openvino" \
-Dpplcv_DIR=${PPLCV_DIR}/cuda-build/install/lib/cmake/ppl \
-DTENSORRT_DIR=${TENSORRT_DIR} \
-DCUDNN_DIR=${CUDNN_DIR} \
-DONNXRUNTIME_DIR=${ONNXRUNTIME_DIR} \
-DInferenceEngine_DIR=${OPENVINO_DIR}/runtime/cmake \
-Dncnn_DIR=${NCNN_DIR}/build/install/lib/cmake/ncnn \
-DProtobuf_LIBRARIES=${PROTO_DIR}/lib/libprotobuf.so \
-DProtobuf_PROTOC_EXECUTABLE=${PROTO_DIR}/bin/protoc \
-DProtobuf_INCLUDE_DIR=${PROTO_DIR}/include
```
# Build for macOS-arm64
- [Build for macOS-arm64](#build-for-macos-arm64)
  - [Install from Source](#install-from-source)
    - [Install Toolchains](#install-toolchains)
    - [Install Dependencies](#install-dependencies)
      - [Install Dependencies for MMDeploy Converter](#install-dependencies-for-mmdeploy-converter)
      - [Install Dependencies for MMDeploy SDK](#install-dependencies-for-mmdeploy-sdk)
      - [Install Inference Engines](#install-inference-engines)
    - [Build MMDeploy](#build-mmdeploy)
      - [Build Model Converter](#build-model-converter)
      - [Install Model Converter](#install-model-converter)
      - [Build SDK and Demos](#build-sdk-and-demos)
## Install from Source
### Install Toolchains
- cmake
```
brew install cmake
```
- clang
Install Xcode, or install the Command Line Tools with:
```
xcode-select --install
```
### Install Dependencies
#### Install Dependencies for MMDeploy Converter
Install conda as described in [get_started](../get_started.md).
```bash
# install pytorch & mmcv
conda install pytorch==1.9.0 torchvision==0.10.0 -c pytorch
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0rc2"
```
#### Install Dependencies for MMDeploy SDK
You can skip this section if you are only interested in model conversion.
<table class="docutils">
<thead>
<tr>
<th>Name </th>
<th>Installation </th>
</tr>
</thead>
<tbody>
<tr>
<td>OpenCV<br>(>=3.0) </td>
<td>
<pre><code>
brew install opencv
</code></pre>
</td>
</tr>
</tbody>
</table>
#### Install Inference Engines
The Model Converter and the SDK of MMDeploy share the same inference engines. You can pick the ones you are interested in and install them as described below. This section focuses on Core ML; ONNX Runtime, ncnn and TorchScript are installed much as on Linux, see [linux-x86_64](linux-x86_64.md).
Core ML conversion uses a TorchScript model as the IR. To support models with custom ops, such as the detection models in mmdet, libtorch needs to be installed; a brief description follows.
<table class="docutils">
<thead>
<tr>
<th>Name</th>
<th>Package</th>
<th>Installation</th>
</tr>
</thead>
<tbody>
<tr>
<td>Core ML</td>
<td>coremltools</td>
<td>
<pre><code>
pip install coremltools==6.3
</code></pre>
</td>
</tr>
<tr>
<td>TorchScript</td>
<td>libtorch</td>
<td>
1. libtorch does not yet provide an arm build, so it has to be built from source. Make sure the libtorch version matches your pytorch version, otherwise the custom ops built against it will fail to load.<br>
2. Take libtorch 1.9.0 as an example; it can be installed with:
<pre><code>
git clone -b v1.9.0 --recursive https://github.com/pytorch/pytorch.git
cd pytorch
mkdir build && cd build
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DPYTHON_EXECUTABLE=`which python` \
-DCMAKE_INSTALL_PREFIX=install \
-DDISABLE_SVE=ON # low version like 1.9.0 of pytorch need DISABLE_SVE option
make -j4 && make install
export Torch_DIR=$(pwd)/install/share/cmake/Torch
</code></pre>
</td>
</tr>
</tbody>
</table>
### Build MMDeploy
```bash
cd /the/root/path/of/MMDeploy
export MMDEPLOY_DIR=$(pwd)
```
#### Build Model Converter
This part covers what is needed to use Core ML as the inference backend.
- **Core ML**
Core ML uses torchscript as its IR, and some codebases such as mmdet require building the torchscript custom ops.
- **torchscript** custom ops
```bash
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake -DMMDEPLOY_TARGET_BACKENDS=coreml -DTorch_DIR=${Torch_DIR} ..
make -j4 && make install
```
See the [cmake options](cmake_option.md) for reference.
#### Install Model Converter
```bash
# grpcio, a dependency in requirements/runtime.txt, cannot be imported when installed via pip; install it with conda instead
conda install grpcio
```
```bash
cd ${MMDEPLOY_DIR}
mim install -v -e .
```
**Note**
- Some dependencies are optional. Running `pip install -e .` installs only the minimum requirements. To install the other optional dependencies, run `pip install -r requirements/optional.txt`,
or `pip install -e .[optional]`, where `[optional]` can be replaced by `all`, `tests`, `build` or `optional`.
#### Build SDK and Demos
The following shows an example of building the SDK with Core ML as the inference engine.
- cpu + Core ML
```Bash
cd ${MMDEPLOY_DIR}
mkdir -p build && cd build
cmake .. \
-DMMDEPLOY_BUILD_SDK=ON \
-DMMDEPLOY_BUILD_EXAMPLES=ON \
-DMMDEPLOY_BUILD_SDK_PYTHON_API=ON \
-DMMDEPLOY_TARGET_DEVICES=cpu \
-DMMDEPLOY_TARGET_BACKENDS=coreml \
-DTorch_DIR=${Torch_DIR}
make -j4 && make install
```
# Support for RISC-V
MMDeploy chooses ncnn as the inference backend on the RISC-V platform. The complete deployment consists of two steps:
Model conversion: convert the PyTorch model to an ncnn model on the host, then transfer the converted model to the device.
Model deployment: cross-compile ncnn and MMDeploy on the host, then transfer the build to the device for inference.
## 1. Model Conversion
a) Install MMDeploy
Follow the [build documentation](./linux-x86_64.md) to install the ncnn inference engine and MMDeploy.
b) Convert the model
Take Resnet-18 as an example. First install mmpretrain following its [documentation](https://github.com/open-mmlab/mmpretrain), then convert the model with `tools/deploy.py`.
```bash
export MODEL_CONFIG=/path/to/mmpretrain/configs/resnet/resnet18_8xb32_in1k.py
export MODEL_PATH=https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth
# model conversion
cd /path/to/mmdeploy
python tools/deploy.py \
configs/mmpretrain/classification_ncnn_static.py \
$MODEL_CONFIG \
$MODEL_PATH \
tests/data/tiger.jpeg \
--work-dir resnet18 \
--device cpu \
--dump-info
```
## 2. Model Deployment
a) Download the cross-compilation toolchain and set environment variables
```bash
# download Xuantie-900-gcc-linux-5.10.4-glibc-x86_64-V2.2.6-20220516.tar.gz from
# https://occ.t-head.cn/community/download?id=4046947553902661632
tar xf Xuantie-900-gcc-linux-5.10.4-glibc-x86_64-V2.2.6-20220516.tar.gz
export RISCV_ROOT_PATH=`realpath Xuantie-900-gcc-linux-5.10.4-glibc-x86_64-V2.2.6`
```
b) Build ncnn & opencv
```bash
# ncnn
# see https://github.com/Tencent/ncnn/wiki/how-to-build#build-for-allwinner-d1
# opencv
git clone https://github.com/opencv/opencv.git
cd opencv
mkdir build_riscv && cd build_riscv
cmake .. \
-DCMAKE_TOOLCHAIN_FILE=/path/to/mmdeploy/cmake/toolchains/riscv64-unknown-linux-gnu.cmake \
-DCMAKE_INSTALL_PREFIX=install \
-DBUILD_PERF_TESTS=OFF \
-DBUILD_SHARED_LIBS=OFF \
-DBUILD_TESTS=OFF \
-DCMAKE_BUILD_TYPE=Release
make -j$(nproc) && make install
```
c) 编译 mmdeploy SDK & demo
```bash
cd /path/to/mmdeploy
mkdir build_riscv && cd build_riscv
cmake .. \
-DCMAKE_TOOLCHAIN_FILE=../cmake/toolchains/riscv64-unknown-linux-gnu.cmake \
-DMMDEPLOY_BUILD_SDK=ON \
-DMMDEPLOY_SHARED_LIBS=OFF \
-DMMDEPLOY_BUILD_EXAMPLES=ON \
-DMMDEPLOY_TARGET_DEVICES="cpu" \
-DMMDEPLOY_TARGET_BACKENDS="ncnn" \
-Dncnn_DIR=${ncnn_DIR}/build-c906/install/lib/cmake/ncnn/ \
-DMMDEPLOY_CODEBASES=all \
-DOpenCV_DIR=${OpenCV_DIR}/build_riscv/install/lib/cmake/opencv4
make -j$(nproc) && make install
```
执行 `make install` 之后,examples 的可执行文件会保存在 `install/bin` 目录下:
```
tree -L 1 install/bin/
.
├── image_classification
├── image_restorer
├── image_segmentation
├── object_detection
├── ocr
├── pose_detection
└── rotated_object_detection
```
d) 运行 demo
先确认测试模型用了 `--dump-info`,这样 `resnet18` 目录才有 `pipeline.json` 等 SDK 所需文件。
把 dump 好的模型目录(resnet18)、可执行文件(image_classification)、测试图片(tests/data/tiger.jpeg)拷贝到设备中
```bash
./image_classification cpu ./resnet18 tiger.jpeg
```
# 瑞芯微 NPU 部署
- [瑞芯微 NPU 部署](#瑞芯微-npu-部署)
- [模型转换](#模型转换)
- [安装环境](#安装环境)
- [分类模型转换](#分类模型转换)
- [检测模型转换](#检测模型转换)
- [部署 config 说明](#部署-config-说明)
- [问题说明](#问题说明)
- [模型推理](#模型推理)
- [Host 交叉编译](#host-交叉编译)
- [Device 执行推理](#device-执行推理)
______________________________________________________________________
MMDeploy 支持把模型部署到瑞芯微设备上。已支持的芯片:RV1126、RK3588。
完整的部署过程包含两个步骤:
1. 模型转换
- 在主机上,将 PyTorch 模型转换为 RKNN 模型
2. 模型推理
- 在主机上, 使用交叉编译工具得到设备所需的 SDK 和 bin
- 把转好的模型和编好的 SDK、bin,传到设备,进行推理
## 模型转换
### 安装环境
1. 请参考[快速入门](../get_started.md),创建 conda 虚拟环境,并安装 PyTorch、mmcv-full
2. 安装 RKNN Toolkit
如下表所示,瑞芯微提供了 2 套 RKNN Toolkit,对应于不同的芯片型号
<table>
<thead>
<tr>
<th>Device</th>
<th>RKNN-Toolkit</th>
<th>Installation Guide</th>
</tr>
</thead>
<tbody>
<tr>
<td>RK1808 / RK1806 / RV1109 / RV1126</td>
<td><code>git clone https://github.com/rockchip-linux/rknn-toolkit</code></td>
<td><a href="https://github.com/rockchip-linux/rknn-toolkit/tree/master/docs">安装指南</a></td>
</tr>
<tr>
<td>RK3566 / RK3568 / RK3588 / RV1103 / RV1106</td>
<td><code>git clone https://github.com/rockchip-linux/rknn-toolkit2</code></td>
<td><a href="https://github.com/rockchip-linux/rknn-toolkit2/tree/master/doc">安装指南</a></td>
</tr>
</tbody>
</table>
2.1 通过 `git clone` 下载和设备匹配的 RKNN Toolkit
2.2 参考表中的安装指南,安装 RKNN python 安装包。建议在安装时,使用选项 `--no-deps`,以避免依赖包的冲突。以 rknn-toolkit2 为例:
```
pip install packages/rknn_toolkit2-1.4.0_22dcfef4-cp36-cp36m-linux_x86_64.whl --no-deps
```
2.3 先安装 onnx==1.8.0,然后参照 [build 文档](../01-how-to-build/build_from_source.md) 源码安装 MMDeploy。需要注意的是,MMDeploy 和 RKNN 依赖的安装包之间存在版本冲突。以下是建议在 python 3.6 环境中使用的安装包版本:
```
protobuf==3.19.4
onnx==1.8.0
onnxruntime==1.8.0
torch==1.8.0
torchvision==0.9.0
```
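上面的版本约束可以用一小段 Python 做快速自查(`pins` 字典取自上表;`find_conflicts` 为本文示意函数,这里用写死的 `installed` 字典模拟已安装版本,实际使用时可改为从 `importlib.metadata` 读取):

```python
# 建议的版本约束,来自上文列表
pins = {
    'protobuf': '3.19.4',
    'onnx': '1.8.0',
    'onnxruntime': '1.8.0',
    'torch': '1.8.0',
    'torchvision': '0.9.0',
}

def find_conflicts(pins, installed):
    """返回 {包名: (期望版本, 实际版本)} 形式的冲突项。"""
    return {name: (want, installed.get(name))
            for name, want in pins.items()
            if installed.get(name) != want}

installed = dict(pins, onnx='1.13.0')   # 模拟 onnx 版本不符
print(find_conflicts(pins, installed))  # {'onnx': ('1.8.0', '1.13.0')}
```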
### 分类模型转换
以 mmpretrain 中的 resnet50 为例,模型转换命令如下:
```shell
# 安装 mmpretrain
pip install mmpretrain
git clone https://github.com/open-mmlab/mmpretrain
# 执行转换命令
cd /the/path/of/mmdeploy
python tools/deploy.py \
configs/mmpretrain/classification_rknn-fp16_static-224x224.py \
/the/path/of/mmpretrain/configs/resnet/resnet50_8xb32_in1k.py \
https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_batch256_imagenet_20200708-cfb998bf.pth \
/the/path/of/mmpretrain/demo/demo.JPEG \
--work-dir mmdeploy_models/mmpretrain/resnet50 \
--device cpu \
--dump-info
```
```{note}
若转换过程中遇到 ModuleNotFoundError 的问题,使用 pip install 安装对应的包
```
### 检测模型转换
- YOLOV3 & YOLOX
将下面的模型拆分配置写入到 [detection_rknn-int8_static-320x320.py](https://github.com/open-mmlab/mmdeploy/blob/main/configs/mmdet/detection/detection_rknn-int8_static-320x320.py)
```python
# yolov3, yolox for rknn-toolkit and rknn-toolkit2
partition_config = dict(
type='rknn', # the partition policy name
apply_marks=True, # should always be set to True
partition_cfg=[
dict(
save_file='model.onnx', # name to save the partitioned onnx
start=['detector_forward:input'], # [mark_name:input, ...]
end=['yolo_head:input'], # [mark_name:output, ...]
output_names=[f'pred_maps.{i}' for i in range(3)]) # output names
])
```
执行命令:
```shell
# 安装 mmdet
pip install mmdet
git clone https://github.com/open-mmlab/mmdetection
# 执行转换命令
cd /the/path/of/mmdeploy
python tools/deploy.py \
configs/mmdet/detection/detection_rknn-int8_static-320x320.py \
/the/path/of/mmdet/configs/yolov3/yolov3_mobilenetv2_320_300e_coco.py \
https://download.openmmlab.com/mmdetection/v2.0/yolo/yolov3_mobilenetv2_320_300e_coco/yolov3_mobilenetv2_320_300e_coco_20210719_215349-d18dff72.pth \
/the/path/of/mmdet/demo/demo.jpg \
--work-dir mmdeploy_models/mmdet/yolov3 \
--device cpu \
--dump-info
```
- RTMDet
将下面的模型拆分配置写入到 [detection_rknn-int8_static-640x640.py](https://github.com/open-mmlab/mmdeploy/blob/main/configs/mmdet/detection/detection_rknn-int8_static-640x640.py)
```python
# rtmdet for rknn-toolkit and rknn-toolkit2
partition_config = dict(
type='rknn', # the partition policy name
apply_marks=True, # should always be set to True
partition_cfg=[
dict(
save_file='model.onnx', # name to save the partitioned onnx
start=['detector_forward:input'], # [mark_name:input, ...]
end=['rtmdet_head:output'], # [mark_name:output, ...]
output_names=[f'pred_maps.{i}' for i in range(6)]) # output names
])
```
- RetinaNet & SSD & FSAF with rknn-toolkit2
将下面的模型拆分配置写入到 [detection_rknn-int8_static-320x320.py](https://github.com/open-mmlab/mmdeploy/blob/main/configs/mmdet/detection/detection_rknn-int8_static-320x320.py)。使用 rknn-toolkit 的用户则无需此步骤。
```python
# retinanet, ssd and fsaf for rknn-toolkit2
partition_config = dict(
type='rknn', # the partition policy name
apply_marks=True,
partition_cfg=[
dict(
save_file='model.onnx',
start='detector_forward:input',
end=['BaseDenseHead:output'],
output_names=[f'BaseDenseHead.cls.{i}' for i in range(5)] +
[f'BaseDenseHead.loc.{i}' for i in range(5)])
])
```
### 部署 config 说明
在部署 config 中,你可以根据需要修改 `backend_config` 字段。下面是一个 mmpretrain 的 `backend_config` 例子:
```python
backend_config = dict(
type='rknn',
common_config=dict(
mean_values=None,
std_values=None,
target_platform='rk3588',
optimization_level=3),
quantization_config=dict(do_quantization=False, dataset=None),
input_size_list=[[3, 224, 224]])
```
`common_config` 的内容服务于 `rknn.config()`,`quantization_config` 的内容服务于 `rknn.build()`。
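这个对应关系可以用一段纯 Python 示意(以下 `build_rknn_kwargs` 为本文假设的辅助函数,并不真正调用 rknn-toolkit):

```python
# 示意:backend_config 中各字段与 rknn-toolkit API 的对应关系
backend_config = dict(
    type='rknn',
    common_config=dict(
        mean_values=None,
        std_values=None,
        target_platform='rk3588',
        optimization_level=3),
    quantization_config=dict(do_quantization=False, dataset=None),
    input_size_list=[[3, 224, 224]])

def build_rknn_kwargs(cfg):
    """把部署 config 拆成 rknn.config() 与 rknn.build() 的关键字参数。"""
    config_kwargs = dict(cfg['common_config'])       # -> rknn.config(**config_kwargs)
    build_kwargs = dict(cfg['quantization_config'])  # -> rknn.build(**build_kwargs)
    return config_kwargs, build_kwargs

config_kwargs, build_kwargs = build_rknn_kwargs(backend_config)
print(config_kwargs['target_platform'])  # rk3588
print(build_kwargs['do_quantization'])   # False
```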
### 问题说明
- SDK 只支持 int8 的 rknn 模型,这需要在转换模型时设置 `do_quantization=True`
- 模型速度问题:如果使用的设备运行的是 RKNPU,比如 rv1126 设备,请记得在 `quantization_config` 中设置 `pre_compile=True`
## 模型推理
### Host 交叉编译
若 host 是 Ubuntu 18.04 及以上版本,推荐脚本编译:
```shell
bash tools/scripts/ubuntu_cross_build_rknn.sh <model>
```
命令中的参数 model 表示瑞芯微芯片的型号,目前支持 rv1126,rk3588。
以下是对脚本中编译过程的说明。
如下表所示,瑞芯微提供了 2 套 RKNN API 工具包,对应于不同的芯片型号。而每套 RKNN API 工具包又分别对应不同的 gcc 交叉编译工具。
| Device | RKNN API |
| ------------------------------------------ | -------------------------------------------------- |
| RK1808 / RK1806 / RV1109 / RV1126 | [rknpu](https://github.com/rockchip-linux/rknpu) |
| RK3566 / RK3568 / RK3588 / RV1103 / RV1106 | [rknpu2](https://github.com/rockchip-linux/rknpu2) |
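上表的对应关系也可以写成一个简单的查找函数(字典内容与上表一致,函数名 `pick_rknn_api` 为本文示意):

```python
# 芯片型号 -> RKNN API 工具包,内容与上表一致
RKNN_API = {}
for chip in ['rk1808', 'rk1806', 'rv1109', 'rv1126']:
    RKNN_API[chip] = 'rknpu'
for chip in ['rk3566', 'rk3568', 'rk3588', 'rv1103', 'rv1106']:
    RKNN_API[chip] = 'rknpu2'

def pick_rknn_api(chip):
    """根据芯片型号返回应使用的 RKNN API 仓库名。"""
    try:
        return RKNN_API[chip.lower()]
    except KeyError:
        raise ValueError(f'unsupported chip: {chip}')

print(pick_rknn_api('rv1126'))  # rknpu
print(pick_rknn_api('RK3588'))  # rknpu2
```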
以支持的 rv1126 和 rk3588 为例,mmdeploy 在 ubuntu18.04 上的交叉编译过程如下:
- **rv1126**
1. 下载 RKNN API 包
```shell
git clone https://github.com/rockchip-linux/rknpu
export RKNPU_DIR=$(pwd)/rknpu
```
2. 准备 gcc 交叉编译工具
```shell
sudo apt-get update
sudo apt-get install gcc-arm-linux-gnueabihf
sudo apt-get install g++-arm-linux-gnueabihf
```
3. 源码安装 OpenCV
```shell
git clone https://github.com/opencv/opencv --depth=1 --branch=4.6.0 --recursive
cd opencv
mkdir -p build_arm_gnueabi && cd build_arm_gnueabi
cmake .. -DCMAKE_INSTALL_PREFIX=install \
-DCMAKE_TOOLCHAIN_FILE=../platforms/linux/arm-gnueabi.toolchain.cmake \
-DBUILD_PERF_TESTS=OFF -DBUILD_SHARED_LIBS=OFF -DBUILD_TESTS=OFF -DCMAKE_BUILD_TYPE=Release
make -j $(nproc) && make install
export OpenCV_ARM_INSTALL_DIR=$(pwd)/install
```
4. 编译 mmdeploy SDK
```shell
cd /path/to/mmdeploy
mkdir -p build && cd build
cmake .. \
-DCMAKE_TOOLCHAIN_FILE=../cmake/toolchains/arm-linux-gnueabihf.cmake \
-DMMDEPLOY_BUILD_SDK=ON \
-DMMDEPLOY_BUILD_EXAMPLES=ON \
-DMMDEPLOY_TARGET_BACKENDS="rknn" \
-DRKNPU_DEVICE_DIR=${RKNPU_DIR}/rknn/rknn_api/librknn_api \
-DOpenCV_DIR=${OpenCV_ARM_INSTALL_DIR}/lib/cmake/opencv4
make -j$(nproc) && make install
```
- **rk3588**
1. 下载 RKNN API 包
```shell
git clone https://github.com/rockchip-linux/rknpu2
export RKNPU2_DEVICE_DIR=$(pwd)/rknpu2/runtime/RK3588
```
2. 准备 gcc 交叉编译工具
```shell
git clone https://github.com/Caesar-github/gcc-buildroot-9.3.0-2020.03-x86_64_aarch64-rockchip-linux-gnu
export RKNN_TOOL_CHAIN=$(pwd)/gcc-buildroot-9.3.0-2020.03-x86_64_aarch64-rockchip-linux-gnu
export LD_LIBRARY_PATH=$RKNN_TOOL_CHAIN/lib64:$LD_LIBRARY_PATH
```
3. 源码编译 opencv
```shell
git clone https://github.com/opencv/opencv --depth=1 --branch=4.6.0 --recursive
cd opencv
mkdir -p build_aarch64 && cd build_aarch64
cmake .. -DCMAKE_INSTALL_PREFIX=install \
-DCMAKE_TOOLCHAIN_FILE=../platforms/linux/aarch64-gnu.toolchain.cmake \
-DBUILD_PERF_TESTS=OFF -DBUILD_SHARED_LIBS=OFF -DBUILD_TESTS=OFF -DCMAKE_BUILD_TYPE=Release
make -j $(nproc) && make install
export OpenCV_AARCH64_INSTALL_DIR=$(pwd)/install
```
4. 编译 mmdeploy SDK
```shell
cd /path/to/mmdeploy
mkdir -p build && cd build
export LD_LIBRARY_PATH=$RKNN_TOOL_CHAIN/lib64:$LD_LIBRARY_PATH
cmake .. \
-DCMAKE_TOOLCHAIN_FILE=../cmake/toolchains/rknpu2-linux-gnu.cmake \
-DMMDEPLOY_BUILD_SDK=ON \
-DMMDEPLOY_TARGET_BACKENDS="rknn" \
-DMMDEPLOY_BUILD_EXAMPLES=ON \
-DOpenCV_DIR=${OpenCV_AARCH64_INSTALL_DIR}/lib/cmake/opencv4
make -j $(nproc) && make install
```
### Device 执行推理
首先,确保转模型时使用了 `--dump-info`,这样工作目录下才会包含 SDK 所需的配置文件 `pipeline.json`。
使用 `adb push` 将转好的模型、编好的 SDK 和 bin 文件推到设备上。
```bash
cd {/the/path/to/mmdeploy}
adb push mmdeploy_models/mmpretrain/resnet50 /root/resnet50
adb push {/the/path/of/mmpretrain}/demo/demo.JPEG /root/demo.JPEG
adb push build/install /root/mmdeploy_sdk
```
通过 adb shell,打开设备终端,设置环境变量,运行例子。
```bash
adb shell
cd /root/mmdeploy_sdk
export LD_LIBRARY_PATH=$(pwd)/lib:${LD_LIBRARY_PATH}
./bin/image_classification cpu ../resnet50 ../demo.JPEG
```
结果显示:
```shell
label: 65, score: 0.95
```
# 支持 SNPE
mmdeploy 集成 snpe 的方式简单且有效: Client/Server 模式。
这种模式
1. 能剥离`模型转换``推理`环境:
- 推理无关事项在算力更高的设备上完成;
- 对于推理计算,能拿到 gpu/npu 真实运行结果,而非 cpu 模拟数值。
2. 能覆盖成本敏感的设备。armv7/risc-v/mips 芯片满足产品需求,但往往对 Python 支持有限;
3. 能简化 mmdeploy 安装步骤。如果只想转 snpe 模型测试精度,不需要编译 .whl 包。
## 一、运行推理服务
下载预编译 snpe 推理服务包, `adb push` 到手机、执行。
注意**手机要有 qcom 芯片**
```bash
$ wget https://media.githubusercontent.com/media/tpoisonooo/mmdeploy_snpe_testdata/main/snpe-inference-server-1.59.tar.gz
...
$ sudo apt install adb
$ adb push snpe-inference-server-1.59.tar.gz /data/local/tmp/
# 解压运行
$ adb shell
venus:/ $ cd /data/local/tmp
130|venus:/data/local/tmp $ tar xvf snpe-inference-server-1.59.tar.gz
...
130|venus:/data/local/tmp $ source export1.59.sh
130|venus:/data/local/tmp $ ./inference_server 60000
...
Server listening on [::]:60000
```
此时推理服务应打印设备所有 ipv6 和 ipv4 地址,并监听端口。
tips:
- 如果 `adb devices` 找不到设备,可能因为:
- 有些廉价线只能充电、不能传输数据
- 或者没有打开手机的“开发者模式”
- 如果需要自己编译,可参照 [NDK 交叉编译 snpe 推理服务](../appendix/cross_build_snpe_service.md)
- 如果监听端口时 `segmentation fault`,可能是因为:
- 端口号已占用,换一个端口
## 二、安装 mmdeploy
1. 环境要求
| 事项 | 版本 | 备注 |
| ------- | ------------------ | ------------- |
| host OS | ubuntu18.04 x86_64 | snpe 指定版本 |
| Python | **3.6.0** | snpe 指定版本 |
2. 安装
[官网下载 snpe-1.59](https://developer.qualcomm.com/qfile/69652/snpe-1.59.0.zip),解压设置环境变量
```bash
$ unzip snpe-1.59.0.zip
$ export SNPE_ROOT=${PWD}/snpe-1.59.0.3230
$ cd /path/to/mmdeploy
$ export PYTHONPATH=${PWD}/service/snpe/client:${SNPE_ROOT}/lib/python:${PYTHONPATH}
$ export LD_LIBRARY_PATH=${SNPE_ROOT}/lib/x86_64-linux-clang:${LD_LIBRARY_PATH}
$ export PATH=${SNPE_ROOT}/bin/x86_64-linux-clang:${PATH}
$ python3 -m pip install -e .
```
tips:
- 如果网络不好,[这个 .tar.gz](https://github.com/tpoisonooo/mmdeploy_snpe_testdata/blob/main/snpe-1.59.tar.gz) 仅减小官方包体积,没有修改原始内容。
## 三、测试模型
以 Resnet-18 为例。先参照[文档](https://github.com/open-mmlab/mmpretrain/tree/main)安装 mmpretrain,然后使用 `tools/deploy.py` 转换模型。
```bash
$ export MODEL_CONFIG=/path/to/mmpretrain/configs/resnet/resnet18_8xb16_cifar10.py
$ export MODEL_PATH=https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_b16x8_cifar10_20210528-bd6371c8.pth
# 模型转换
$ cd /path/to/mmdeploy
$ python3 tools/deploy.py configs/mmpretrain/classification_snpe_static.py $MODEL_CONFIG $MODEL_PATH /path/to/test.png --work-dir resnet18 --device cpu --uri 192.168.1.1\:60000 --dump-info
# 精度测试
$ python3 tools/test.py configs/mmpretrain/classification_snpe_static.py $MODEL_CONFIG --model resnet18/end2end.dlc --metrics accuracy precision f1_score recall --uri 192.168.1.1\:60000
```
注意需要 `--uri` 指明 snpe 推理服务的 ip 和端口号,可以使用 ipv4 和 ipv6 地址。
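由于 ipv6 地址本身含有多个冒号,从 uri 中取端口时应从右侧切分。下面是一个纯 Python 示意(`parse_uri` 为本文假设的函数名,mmdeploy 内部实现可能不同):

```python
def parse_uri(uri):
    """解析 --uri 传入的 "主机:端口" 字符串,兼容 ipv6 地址。"""
    host, _, port = uri.rpartition(':')  # 从右侧切分一次,避免 ipv6 中的冒号干扰
    return host, int(port)

print(parse_uri('192.168.1.1:60000'))   # ('192.168.1.1', 60000)
print(parse_uri('fe80::1:60000'))       # ipv6 地址同样可以切出端口
```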
## 四、Android NDK 编译 SDK
如果你还需要用 Android NDK 编译 mmdeploy SDK,请继续阅读本章节。
### 1. 下载 OCV、NDK,设置环境变量
```bash
# 下载 android OCV
$ export OPENCV_VERSION=4.5.4
$ wget https://github.com/opencv/opencv/releases/download/${OPENCV_VERSION}/opencv-${OPENCV_VERSION}-android-sdk.zip
$ unzip opencv-${OPENCV_VERSION}-android-sdk.zip
$ export ANDROID_OCV_ROOT=`realpath opencv-${OPENCV_VERSION}-android-sdk`
# 下载 ndk r23b
$ wget https://dl.google.com/android/repository/android-ndk-r23b-linux.zip
$ unzip android-ndk-r23b-linux.zip
$ export ANDROID_NDK_ROOT=`realpath android-ndk-r23b`
```
### 2. 编译 mmdeploy SDK 和 demo
```bash
$ cd /path/to/mmdeploy
$ mkdir build && cd build
$ cmake .. \
    -DMMDEPLOY_BUILD_SDK=ON \
    -DMMDEPLOY_SHARED_LIBS=ON \
-DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_ROOT}/build/cmake/android.toolchain.cmake \
-DMMDEPLOY_TARGET_BACKENDS=snpe \
-DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-30 \
-DANDROID_STL=c++_static \
-DOpenCV_DIR=${ANDROID_OCV_ROOT}/sdk/native/jni/abi-arm64-v8a \
-DMMDEPLOY_BUILD_EXAMPLES=ON
$ make && make install
$ tree ./bin
./bin
├── image_classification
├── image_restorer
├── image_segmentation
├── mmdeploy_onnx2ncnn
├── object_detection
├── ocr
├── pose_detection
└── rotated_object_detection
```
选项说明
| 选项 | 说明 |
| ----------------------------- | ------------------------------------- |
| CMAKE_TOOLCHAIN_FILE | 加载 NDK 参数,主要用于选择编译器版本 |
| MMDEPLOY_TARGET_BACKENDS=snpe | 使用 snpe 推理 |
| ANDROID_STL=c++\_static | 避免 NDK 环境找不到合适的 c++ lib |
| MMDEPLOY_SHARED_LIBS=ON | 官方 snpe 没有提供静态库 |
[这里](../01-how-to-build/cmake_option.md)是完整的编译选项说明
### 3. 运行 demo
先确认测试模型用了 `--dump-info`,这样 `resnet18` 目录才有 `pipeline.json` 等 SDK 所需文件。
把 dump 好的模型目录、可执行文件和 lib 都 `adb push` 到设备里
```bash
$ cd /path/to/mmdeploy
$ adb push resnet18 /data/local/tmp
$ adb push tests/data/tiger.jpeg /data/local/tmp/resnet18/
$ cd /path/to/install/
$ adb push lib /data/local/tmp
$ adb push bin/image_classification /data/local/tmp/resnet18/
```
设置环境变量,执行样例
```bash
$ adb push /path/to/mmpretrain/demo/demo.JPEG /data/local/tmp
$ adb shell
venus:/ $ cd /data/local/tmp/resnet18
venus:/data/local/tmp/resnet18 $ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/data/local/tmp/lib
venus:/data/local/tmp/resnet18 $ ./image_classification cpu ./ tiger.jpeg
..
label: 3, score: 0.3214
```
# Win10 下构建方式
- [Win10 下构建方式](#win10-下构建方式)
- [源码安装](#源码安装)
- [安装构建和编译工具链](#安装构建和编译工具链)
- [安装依赖包](#安装依赖包)
- [安装 MMDeploy Converter 依赖](#安装-mmdeploy-converter-依赖)
- [安装 MMDeploy SDK 依赖](#安装-mmdeploy-sdk-依赖)
- [安装推理引擎](#安装推理引擎)
- [编译 MMDeploy](#编译-mmdeploy)
- [编译 Model Converter](#编译-model-converter)
- [安装 Model Converter](#安装-model-converter)
- [编译 SDK 和 Demos](#编译-sdk-和-demos)
- [注意事项](#注意事项)
______________________________________________________________________
## 源码安装
下述安装方式,均是在 **Windows 10** 下进行,使用 **PowerShell Preview** 版本。
### 安装构建和编译工具链
1. 下载并安装 [Visual Studio 2019](https://visualstudio.microsoft.com) 。安装时请勾选 "使用 C++ 的桌面开发" 和 "Windows 10 SDK" <br>
2. 把 cmake 路径加入到环境变量 PATH 中, "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community\\Common7\\IDE\\CommonExtensions\\Microsoft\\CMake\\CMake\\bin" <br>
3. 如果系统中配置了 NVIDIA 显卡,根据[官网教程](https://developer.nvidia.com/cuda-downloads),下载并安装 cuda toolkit。<br>
### 安装依赖包
#### 安装 MMDeploy Converter 依赖
<table class="docutils">
<thead>
<tr>
<th>名称 </th>
<th>安装方法 </th>
</tr>
</thead>
<tbody>
<tr>
<td>conda </td>
<td>请参考 <a href="https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html">这里</a> 安装 conda。安装完毕后,打开系统开始菜单,<b>以管理员的身份打开 anaconda powershell prompt</b>。 因为,<br>
<b>1. 下文中的安装命令均是在 anaconda powershell 中测试验证的。</b><br>
<b>2. 使用管理员权限,可以把第三方库安装到系统目录。能够简化 MMDeploy 编译命令。</b><br>
<b>说明:如果你对 cmake 工作原理很熟悉,也可以使用普通用户权限打开 anaconda powershell prompt</b>
</td>
</tr>
<tr>
<td>PyTorch <br>(>=1.8.0) </td>
<td> 安装 PyTorch,要求版本是 torch>=1.8.0。可查看<a href="https://pytorch.org/">官网</a>获取更多详细的安装教程。请确保 PyTorch 要求的 CUDA 版本和您主机的 CUDA 版本是一致<br>
<pre><code>
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
</code></pre>
</td>
</tr>
<tr>
<td>mmcv </td>
<td>参考如下命令安装 mmcv。更多安装方式,可查看 <a href="https://github.com/open-mmlab/mmcv/tree/2.x#installation">mmcv 官网</a><br>
<pre><code>
$env:cu_version="cu111"
$env:torch_version="torch1.8"
pip install -U openmim
mim install "mmcv>=2.0.0rc1"
</code></pre>
</td>
</tr>
</tbody>
</table>
#### 安装 MMDeploy SDK 依赖
如果您只对模型转换感兴趣,那么可以跳过本章节。
<table class="docutils">
<thead>
<tr>
<th>名称 </th>
<th>安装方法 </th>
</tr>
</thead>
<tbody>
<tr>
<td>OpenCV </td>
<td>
1. 从 <a href="https://github.com/opencv/opencv/releases">这里</a> 下载 OpenCV 3+。
2. 您可以下载并安装 OpenCV 预编译包到指定的目录下,也可以选择源码编译安装的方式。
3. 在安装目录中,找到 <code>OpenCVConfig.cmake</code>,并把它的路径添加到环境变量 <code>PATH</code> 中。像这样:
<pre><code>$env:path = "\the\path\where\OpenCVConfig.cmake\locates;" + "$env:path"</code></pre>
</td>
</tr>
<tr>
<td>pplcv </td>
<td>pplcv 是 openPPL 开发的高性能图像处理库。 <b>此依赖项为可选项,只有在 cuda 平台下,才需安装。</b><br>
<pre><code>
git clone https://github.com/openppl-public/ppl.cv.git
cd ppl.cv
git checkout tags/v0.7.0 -b v0.7.0
$env:PPLCV_DIR = "$pwd"
mkdir pplcv-build
cd pplcv-build
cmake .. -G "Visual Studio 16 2019" -T v142 -A x64 -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=install -DPPLCV_USE_CUDA=ON -DPPLCV_USE_MSVC_STATIC_RUNTIME=OFF
cmake --build . --config Release -- /m
cmake --install . --config Release
cd ../..
</code></pre>
</td>
</tr>
</tbody>
</table>
#### 安装推理引擎
MMDeploy 的 Model Converter 和 SDK 共享推理引擎。您可以参考下文,选择自己感兴趣的推理引擎安装。
**目前,在 Windows 平台下,MMDeploy 支持 ONNXRuntime 和 TensorRT 两种推理引擎**。其他推理引擎尚未进行验证,或者验证未通过。后续将陆续予以支持
<table class="docutils">
<thead>
<tr>
<th>推理引擎 </th>
<th>依赖包</th>
<th>安装方法 </th>
</tr>
</thead>
<tbody>
<tr>
<td>ONNXRuntime</td>
<td>onnxruntime<br>(>=1.8.1) </td>
<td>
1. 安装 onnxruntime 的 python 包
<pre><code>pip install onnxruntime==1.8.1</code></pre>
2. 从 <a href="https://github.com/microsoft/onnxruntime/releases/tag/v1.8.1">这里</a> 下载 onnxruntime 的预编译二进制包,解压并配置环境变量
<pre><code>
Invoke-WebRequest -Uri https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-win-x64-1.8.1.zip -OutFile onnxruntime-win-x64-1.8.1.zip
Expand-Archive onnxruntime-win-x64-1.8.1.zip .
$env:ONNXRUNTIME_DIR = "$pwd\onnxruntime-win-x64-1.8.1"
$env:path = "$env:ONNXRUNTIME_DIR\lib;" + $env:path
</code></pre>
</td>
</tr>
<tr>
<td rowspan="2">TensorRT<br> </td>
<td>TensorRT <br> </td>
<td>
1. 登录 <a href="https://www.nvidia.com/">NVIDIA 官网</a>,从<a href="https://developer.nvidia.com/nvidia-tensorrt-download">这里</a>选取并下载 TensorRT tar 包。要保证它和您机器的 CPU 架构以及 CUDA 版本是匹配的。您可以参考这份 <a href="https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#installing-tar">指南</a> 安装 TensorRT。<br>
2. 这里也有一份 TensorRT 8.2 GA Update 2 在 Windows x86_64 和 CUDA 11.x 下的安装示例,供您参考。首先,点击<a href="https://developer.nvidia.com/compute/machine-learning/tensorrt/secure/8.2.3.0/zip/TensorRT-8.2.3.0.Windows10.x86_64.cuda-11.4.cudnn8.2.zip">此处</a>下载 CUDA 11.x TensorRT 8.2.3.0。然后,根据如下命令,安装并配置 TensorRT 以及相关依赖。
<pre><code>
cd \the\path\of\tensorrt\zip\file
Expand-Archive TensorRT-8.2.3.0.Windows10.x86_64.cuda-11.4.cudnn8.2.zip .
$env:TENSORRT_DIR = "$pwd\TensorRT-8.2.3.0"
pip install $env:TENSORRT_DIR\python\tensorrt-8.2.3.0-cp37-none-win_amd64.whl
$env:path = "$env:TENSORRT_DIR\lib;" + $env:path
pip install pycuda
</code></pre>
</td>
</tr>
<tr>
<td>cudnn </td>
<td>
1. 在 <a href="https://developer.nvidia.com/rdp/cudnn-archive">cuDNN Archive</a> 中选择和您环境中 CPU 架构、CUDA 版本以及 TensorRT 版本配套的 cuDNN。以前文 TensorRT 安装说明为例,它需要 cudnn8.2。因此,可以下载 <a href="https://developer.nvidia.com/compute/machine-learning/cudnn/secure/8.2.1.32/11.3_06072021/cudnn-11.3-windows-x64-v8.2.1.32.zip">CUDA 11.x cuDNN 8.2</a><br>
2. 解压压缩包,并设置环境变量
<pre><code>
cd \the\path\of\cudnn\zip\file
Expand-Archive cudnn-11.3-windows-x64-v8.2.1.32.zip .
$env:CUDNN_DIR="$pwd\cuda"
$env:path = "$env:CUDNN_DIR\bin;" + $env:path
</code></pre>
</td>
</tr>
<tr>
<td>PPL.NN</td>
<td>ppl.nn </td>
<td> TODO </td>
</tr>
<tr>
<td>OpenVINO</td>
<td>openvino </td>
<td>TODO </td>
</tr>
<tr>
<td>ncnn </td>
<td>ncnn </td>
<td>
1. 下载 <a href="https://github.com/google/protobuf/archive/v3.11.2.zip">protobuf-3.11.2</a><br>
2. 编译protobuf
<pre><code>cd &lt;protobuf-dir&gt;
mkdir build
cd build
cmake -G "Visual Studio 16 2019" -A x64 -DCMAKE_INSTALL_PREFIX=%cd%/install -Dprotobuf_BUILD_TESTS=OFF -Dprotobuf_MSVC_STATIC_RUNTIME=OFF ../cmake
cmake --build . --config Release -j 2
cmake --build . --config Release --target install</code></pre>
3. 下载 ncnn
<pre><code>git clone --recursive https://github.com/Tencent/ncnn.git
cd &lt;ncnn-dir&gt;
mkdir -p ncnn_build
cd ncnn_build
cmake -G "Visual Studio 16 2019" -A x64 -DCMAKE_INSTALL_PREFIX=%cd%/install -Dprotobuf_DIR=<protobuf-dir>/build/install/cmake -DNCNN_VULKAN=OFF ..
cmake --build . --config Release -j 2
cmake --build . --config Release --target install
</code></pre>
</td>
</tr>
</tbody>
</table>
### 编译 MMDeploy
```powershell
cd \the\root\path\of\MMDeploy
$env:MMDEPLOY_DIR="$pwd"
```
#### 编译 Model Converter
如果您选择了 ONNXRuntime、TensorRT 和 ncnn 中的任意一种推理后端,您需要编译对应的自定义算子库。
- **ONNXRuntime** 自定义算子
```powershell
mkdir build -ErrorAction SilentlyContinue
cd build
cmake .. -G "Visual Studio 16 2019" -A x64 -T v142 -DMMDEPLOY_TARGET_BACKENDS="ort" -DONNXRUNTIME_DIR="$env:ONNXRUNTIME_DIR"
cmake --build . --config Release -- /m
cmake --install . --config Release
```
- **TensorRT** 自定义算子
```powershell
mkdir build -ErrorAction SilentlyContinue
cd build
cmake .. -G "Visual Studio 16 2019" -A x64 -T v142 -DMMDEPLOY_TARGET_BACKENDS="trt" -DTENSORRT_DIR="$env:TENSORRT_DIR" -DCUDNN_DIR="$env:CUDNN_DIR"
cmake --build . --config Release -- /m
cmake --install . --config Release
```
- **ncnn** 自定义算子
```powershell
mkdir build -ErrorAction SilentlyContinue
cd build
cmake .. -G "Visual Studio 16 2019" -A x64 -T v142 `
    -DMMDEPLOY_TARGET_BACKENDS="ncnn" `
    -Dncnn_DIR="<ncnn-dir>/ncnn_build/install/lib/cmake/ncnn" `
    -Dprotobuf_DIR="<protobuf-dir>/build/install/cmake" `
    -DProtobuf_LIBRARIES="<protobuf-dir>\build\install\lib" `
    -DProtobuf_INCLUDE_DIR="<protobuf-dir>\build\install\include"
cmake --build . --config Release -- /m
cmake --install . --config Release
```
参考 [cmake 选项说明](cmake_option.md)
#### 安装 Model Converter
```powershell
cd $env:MMDEPLOY_DIR
pip install -e .
```
**注意**
- 有些依赖项是可选的。运行 `pip install -e .` 将进行最小化依赖安装。如果需安装其他可选依赖项,请执行 `pip install -r requirements/optional.txt`,
  或者 `pip install -e .[optional]`。其中,`[optional]` 可以替换为 `all`、`tests`、`build` 或 `optional`。
#### 编译 SDK 和 Demos
下文展示 3 个构建 SDK 的样例,分别用 ONNXRuntime、TensorRT 和 ncnn 作为推理引擎。您可以参考它们,并结合前文 SDK 的编译选项说明,激活其他的推理引擎。
- cpu + ONNXRuntime
```PowerShell
cd $env:MMDEPLOY_DIR
mkdir build -ErrorAction SilentlyContinue
cd build
cmake .. -G "Visual Studio 16 2019" -A x64 -T v142 `
-DMMDEPLOY_BUILD_SDK=ON `
-DMMDEPLOY_BUILD_EXAMPLES=ON `
-DMMDEPLOY_BUILD_SDK_PYTHON_API=ON `
-DMMDEPLOY_TARGET_DEVICES="cpu" `
-DMMDEPLOY_TARGET_BACKENDS="ort" `
-DONNXRUNTIME_DIR="$env:ONNXRUNTIME_DIR"
cmake --build . --config Release -- /m
cmake --install . --config Release
```
- cuda + TensorRT
```PowerShell
cd $env:MMDEPLOY_DIR
mkdir build
cd build
cmake .. -G "Visual Studio 16 2019" -A x64 -T v142 `
-DMMDEPLOY_BUILD_SDK=ON `
-DMMDEPLOY_BUILD_EXAMPLES=ON `
-DMMDEPLOY_BUILD_SDK_PYTHON_API=ON `
-DMMDEPLOY_TARGET_DEVICES="cuda" `
-DMMDEPLOY_TARGET_BACKENDS="trt" `
-Dpplcv_DIR="$env:PPLCV_DIR/pplcv-build/install/lib/cmake/ppl" `
-DTENSORRT_DIR="$env:TENSORRT_DIR" `
-DCUDNN_DIR="$env:CUDNN_DIR"
cmake --build . --config Release -- /m
cmake --install . --config Release
```
- cpu + ncnn
```PowerShell
cd $env:MMDEPLOY_DIR
mkdir build
cd build
cmake .. -G "Visual Studio 16 2019" -A x64 -T v142 `
-DMMDEPLOY_BUILD_SDK=ON `
-DMMDEPLOY_BUILD_EXAMPLES=ON `
-DMMDEPLOY_BUILD_SDK_PYTHON_API=ON `
-DMMDEPLOY_TARGET_DEVICES="cpu" `
-DMMDEPLOY_TARGET_BACKENDS="ncnn" `
    -Dncnn_DIR="<ncnn-dir>/ncnn_build/install/lib/cmake/ncnn" `
    -Dprotobuf_DIR="<protobuf-dir>/build/install/cmake" `
    -DProtobuf_LIBRARIES="<protobuf-dir>\build\install\lib" `
    -DProtobuf_INCLUDE_DIR="<protobuf-dir>\build\install\include"
cmake --build . --config Release -- /m
cmake --install . --config Release
```
### 注意事项
1. Release / Debug 库不能混用。若 MMDeploy 编译的是 Release 版本,所有第三方依赖都要是 Release 版本,反之亦然。
# 如何转换模型
<!-- TOC -->
- [如何转换模型](#如何转换模型)
- [如何将模型从pytorch形式转换成其他后端形式](#如何将模型从pytorch形式转换成其他后端形式)
- [准备工作](#准备工作)
- [使用方法](#使用方法)
- [参数描述](#参数描述)
- [如何查找pytorch模型对应的部署配置文件](#如何查找pytorch模型对应的部署配置文件)
- [示例](#示例)
- [如何评测模型](#如何评测模型)
- [各后端已支持导出的模型列表](#各后端已支持导出的模型列表)
<!-- TOC -->
这篇教程介绍了如何使用 MMDeploy 的工具将一个 OpenMMLab 模型转换成某个后端的模型文件。
注意:
- 现在已支持的后端包括 [ONNXRuntime](../05-supported-backends/onnxruntime.md)、[TensorRT](../05-supported-backends/tensorrt.md)、[ncnn](../05-supported-backends/ncnn.md)、[PPLNN](../05-supported-backends/pplnn.md)、[OpenVINO](../05-supported-backends/openvino.md)。
- 现在已支持的代码库包括 [MMPretrain](../04-supported-codebases/mmpretrain.md)、[MMDetection](../04-supported-codebases/mmdet.md)、[MMSegmentation](../04-supported-codebases/mmseg.md)、[MMOCR](../04-supported-codebases/mmocr.md)、[MMagic](../04-supported-codebases/mmagic.md)。
## 如何将模型从pytorch形式转换成其他后端形式
### 准备工作
1. 安装您的目标后端。您可以参考 [ONNXRuntime-install](../05-supported-backends/onnxruntime.md)、[TensorRT-install](../05-supported-backends/tensorrt.md)、[ncnn-install](../05-supported-backends/ncnn.md)、[PPLNN-install](../05-supported-backends/pplnn.md)、[OpenVINO-install](../05-supported-backends/openvino.md)。
2. 安装您的目标代码库。您可以参考 [MMPretrain-install](https://mmpretrain.readthedocs.io/en/latest/get_started.html#installation)、[MMDetection-install](https://mmdetection.readthedocs.io/en/latest/get_started.html#installation)、[MMSegmentation-install](https://mmsegmentation.readthedocs.io/en/latest/get_started.html#installation)、[MMOCR-install](https://mmocr.readthedocs.io/en/latest/get_started/install.html#installation-steps)、[MMagic-install](https://mmagic.readthedocs.io/en/latest/get_started/install.html#installation)。
### 使用方法
```bash
python ./tools/deploy.py \
${DEPLOY_CFG_PATH} \
${MODEL_CFG_PATH} \
${MODEL_CHECKPOINT_PATH} \
${INPUT_IMG} \
--test-img ${TEST_IMG} \
--work-dir ${WORK_DIR} \
--calib-dataset-cfg ${CALIB_DATA_CFG} \
--device ${DEVICE} \
--log-level INFO \
--show \
--dump-info
```
### 参数描述
- `deploy_cfg` : mmdeploy 针对此模型的部署配置,包含推理框架类型、是否量化、输入 shape 是否动态等。配置文件之间可能有引用关系,`configs/mmpretrain/classification_ncnn_static.py` 是一个示例。
- `model_cfg` : mm 算法库的模型配置,例如 `mmpretrain/configs/vision_transformer/vit-base-p32_ft-64xb64_in1k-384.py`,与 mmdeploy 的路径无关。
- `checkpoint` : torch 模型路径。可以 http/https 开头,详见 `mmcv.FileClient` 的实现。
- `img` : 模型转换时,用做测试的图像或点云文件路径。
- `--test-img` : 用于测试模型的图像文件路径。默认设置成`None`
- `--work-dir` : 工作目录,用来保存日志和模型文件。
- `--calib-dataset-cfg` : 此参数只在 int8 模式下生效,用于指定校准数据集的配置文件。若在 int8 模式下未传入此参数,则会自动使用模型配置文件中的 'val' 数据集进行校准。
- `--device` : 用于模型转换的设备。 默认是`cpu`,对于 trt 可使用 `cuda:0` 这种形式。
- `--log-level` : 设置日志的等级,选项包括 `'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'`。默认是 `INFO`。
- `--show` : 是否显示检测的结果。
- `--dump-info` : 是否输出 SDK 信息。
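上述参数也可以在脚本里拼成完整的命令行,便于批量转换(`build_deploy_cmd` 为本文示意函数,并非 mmdeploy 提供的 API):

```python
def build_deploy_cmd(deploy_cfg, model_cfg, checkpoint, img, **options):
    """按上文的参数顺序拼出 tools/deploy.py 的命令行参数列表。"""
    cmd = ['python', './tools/deploy.py', deploy_cfg, model_cfg, checkpoint, img]
    for key, value in options.items():
        flag = '--' + key.replace('_', '-')
        if value is True:          # 开关型参数,如 --dump-info
            cmd.append(flag)
        elif value is not None:    # 带取值的参数,如 --device cpu
            cmd += [flag, str(value)]
    return cmd

cmd = build_deploy_cmd(
    'configs/mmpretrain/classification_ncnn_static.py',
    'resnet18_8xb32_in1k.py', 'resnet18.pth', 'demo.JPEG',
    work_dir='resnet18', device='cpu', dump_info=True)
print(' '.join(cmd))
```

拼好的列表可以直接传给 `subprocess.run`。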
### 如何查找pytorch模型对应的部署配置文件
1.`configs/` 文件夹中找到模型对应的代码库文件夹。 例如,转换一个yolov3模型您可以查找到 `configs/mmdet` 文件夹。
2. 根据模型的任务类型在 `configs/codebase_folder/` 下查找对应的文件夹。 例如yolov3模型,您可以查找到 `configs/mmdet/detection` 文件夹。
3.`configs/codebase_folder/task_folder/` 下找到模型的部署配置文件。 例如部署yolov3您可以使用 `configs/mmdet/detection/detection_onnxruntime_dynamic.py`
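上述三步的查找过程可以简化为一次路径拼接(命名规则为本文示意,实际文件名以 `configs/` 目录为准):

```python
def find_deploy_cfg(codebase, task, backend, shape='dynamic'):
    """按 "代码库 -> 任务 -> 后端与输入形状" 三步拼出部署配置文件的相对路径。"""
    return f'configs/{codebase}/{task}/{task}_{backend}_{shape}.py'

print(find_deploy_cfg('mmdet', 'detection', 'onnxruntime'))
# configs/mmdet/detection/detection_onnxruntime_dynamic.py
```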
### 示例
```bash
python ./tools/deploy.py \
configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
$PATH_TO_MMDET/configs/yolo/yolov3_d53_8xb8-ms-608-273e_coco.py \
$PATH_TO_MMDET/checkpoints/yolo/yolov3_d53_mstrain-608_273e_coco_20210518_115020-a2c3acb8.pth \
$PATH_TO_MMDET/demo/demo.jpg \
--work-dir work_dir \
--show \
--device cuda:0
```
## 如何评测模型
您可以尝试去评测转换出来的模型 ,参考 [profile 模型](profile_model.md)
## 各后端已支持导出的模型列表
参考[已支持的模型列表](../03-benchmark/supported_models.md)
# 融合预处理(实验性功能)
MMDeploy 提供了一些 Transform 融合的能力。当使用 SDK 进行推理时,可以通过修改 pipeline.json 来开启融合选项,在某些 Transform 的组合下可以对预处理进行加速。
若要在 MMDeploy 的 SDK 中加入融合能力,可参考 CVFusion 的使用。
## 一、使用CVFusion
有两种选择:一种是在编译 mmdeploy 的时候,使用我们提供的融合 kernel 代码;另一种是自己使用 CVFusion 生成融合 kernel 的代码。
A) 使用提供的 kernel 代码
1. 从 [elena_kernel-20220823.tar.gz](https://github.com/open-mmlab/mmdeploy/files/9399795/elena_kernel-20220823.tar.gz) 下载代码并解压,将 csrc 文件夹拷贝到 mmdeploy 的根目录。
2. 编译mmdeploy的时候,增加选项`-DMMDEPLOY_ELENA_FUSION=ON`
B) 使用CVFusion生成kernel
1. 编译CVFusion
```bash
$ git clone --recursive https://github.com/OpenComputeLab/CVFusion.git
$ cd CVFusion
$ bash build.sh
# add OpFuse to PATH
$ export PATH=`pwd`/build/examples/MMDeploy:$PATH
```
2. 下载各个算法codebase
```bash
$ tree -L 1 .
├── mmdeploy
├── mmpretrain
├── mmdetection
├── mmsegmentation
├── ...
```
3. 生成融合kernel
```bash
python tools/elena/extract_transform.py ..
# 生成的代码会保存在csrc/preprocess/elena/{cpu_kernel}/{cuda_kernel}
```
4. 编译mmdeploy的时候,增加选项`-DMMDEPLOY_ELENA_FUSION=ON`
## 二、模型转换
模型转换时通过`--dump-info`生成SDK所需文件。
```bash
$ export MODEL_CONFIG=/path/to/mmpretrain/configs/resnet/resnet18_8xb32_in1k.py
$ export MODEL_PATH=https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth
$ python tools/deploy.py \
configs/mmpretrain/classification_onnxruntime_static.py \
$MODEL_CONFIG \
$MODEL_PATH \
tests/data/tiger.jpeg \
--work-dir resnet18 \
--device cpu \
--dump-info
```
## 三、模型推理
若当前 pipeline 的预处理模块支持融合,`pipeline.json` 中会有 `fuse_transform` 字段,表示融合开关,默认为 `false`。若要启用融合,需要把 `false` 改为 `true`。
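这个开关的修改可以用一小段 Python 完成(以下代码只依赖上文提到的 `fuse_transform` 字段名,不对字段在 JSON 中的具体位置做假设):

```python
import json

def enable_fuse(obj):
    """递归地把 JSON 对象中所有 fuse_transform 字段置为 true。"""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == 'fuse_transform':
                obj[key] = True
            else:
                enable_fuse(value)
    elif isinstance(obj, list):
        for item in obj:
            enable_fuse(item)
    return obj

# 用一个最小化的 pipeline.json 片段演示(结构仅为示意)
pipeline = json.loads('{"pipeline": {"tasks": [{"fuse_transform": false}]}}')
print(enable_fuse(pipeline))
```

实际使用时,先 `json.load` 读入工作目录下的 `pipeline.json`,调用该函数后再 `json.dump` 写回即可。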
# Win10 下预编译包的使用
- [Win10 下预编译包的使用](#win10-下预编译包的使用)
- [准备工作](#准备工作)
- [ONNX Runtime](#onnx-runtime)
- [TensorRT](#tensorrt)
- [模型转换](#模型转换)
- [ONNX Runtime Example](#onnx-runtime-example)
- [TensorRT Example](#tensorrt-example)
- [模型推理](#模型推理)
- [Backend Inference](#backend-inference)
- [ONNXRuntime](#onnxruntime)
- [TensorRT](#tensorrt-1)
- [Python SDK](#python-sdk)
- [ONNXRuntime](#onnxruntime-1)
- [TensorRT](#tensorrt-2)
- [C SDK](#c-sdk)
- [ONNXRuntime](#onnxruntime-2)
- [TensorRT](#tensorrt-3)
- [可能遇到的问题](#可能遇到的问题)
______________________________________________________________________
目前,`MMDeploy` 在 `Windows` 平台下提供 `cpu` 以及 `cuda` 两种 Device 的预编译包,其中 `cpu` 版支持使用 onnxruntime cpu 进行推理,`cuda` 版支持使用 onnxruntime-gpu 以及 tensorrt 进行推理,可以从 [Releases](https://github.com/open-mmlab/mmdeploy/releases) 获取。
本篇教程以`mmdeploy-1.3.1-windows-amd64.zip``mmdeploy-1.3.1-windows-amd64-cuda11.8.zip`为例,展示预编译包的使用方法。
为了方便使用者快速上手,本教程以分类模型(mmpretrain)为例,展示两种预编译包的使用方法。
预编译包的目录结构如下。
```
.
├── build_sdk.ps1
├── example
├── include
├── install_opencv.ps1
├── lib
├── README.md
├── set_env.ps1
└── thirdparty
```
## 准备工作
使用预编译包来进行`模型转换`以及`模型推理`,除了预编译包中的内容外,还需要安装一些第三方依赖库。下面分别介绍以 `ONNX Runtime` 和 `TensorRT` 为推理后端所需进行的准备工作。
两种推理后端环境准备工作中,其中一些操作是共有的,下面先介绍这些共有的操作,再分别介绍各自特有的操作。
首先新建一个工作目录workspace
1. 请按照 [get_started](../get_started.md) 文档,准备虚拟环境,安装 pytorch、torchvision、mmcv。若要使用 SDK 的 C 接口,需要安装 VS2019 及以上版本,以及 OpenCV。
:point_right: 这里建议使用`pip`而不是`conda`安装pytorch、torchvision
2. 克隆mmdeploy仓库
```bash
git clone -b main https://github.com/open-mmlab/mmdeploy.git
```
:point_right: 这里主要为了使用configs文件,所以没有加`--recursive`来下载submodule,也不需要编译`mmdeploy`
3. 安装mmpretrain
```bash
git clone -b main https://github.com/open-mmlab/mmpretrain.git
cd mmpretrain
pip install -e .
```
4. 准备一个PyTorch的模型文件当作我们的示例
这里选择了[resnet18_8xb32_in1k_20210831-fbbb1da6.pth](https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth),对应的训练config为[resnet18_8xb32_in1k.py](https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnet/resnet18_8xb32_in1k.py)
做好以上工作后,当前工作目录的结构应为:
```
.
|-- mmpretrain
|-- mmdeploy
|-- resnet18_8xb32_in1k_20210831-fbbb1da6.pth
```
### ONNX Runtime
This section covers the environment setup specific to running `mmdeploy` inference with `ONNX Runtime`.
5. Install the prebuilt packages `mmdeploy` (model conversion) and `mmdeploy_runtime` (model inference Python API)
```bash
pip install mmdeploy==1.3.1
pip install mmdeploy-runtime==1.3.1
```
:point_right: If these were installed before, uninstall the old versions first.
6. Install the onnxruntime package
```
pip install onnxruntime==1.8.1
```
7. Download [`onnxruntime`](https://github.com/microsoft/onnxruntime/releases/tag/v1.8.1) and set up environment variables
   Add onnxruntime's lib directory to `PATH` as shown below; adjust the exact path to your own setup.
![sys-path](https://user-images.githubusercontent.com/16019484/181463801-1d7814a8-b256-46e9-86f2-c08de0bc150b.png)
:exclamation: Restart PowerShell so the environment variables take effect. You can verify the setting with `echo $env:PATH`.
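If you drive the Python API rather than the SDK executables, you can also prepend the directory at runtime instead of editing the system PATH permanently. A minimal sketch, assuming a placeholder onnxruntime unpack location that you must adapt:

```python
import os

# Assumed unpack location of the onnxruntime release -- adjust to your setup.
ort_lib = r"C:\workspace\onnxruntime-win-x64-1.8.1\lib"

# Prepend so the bundled DLLs are found before any other copies on PATH.
os.environ["PATH"] = ort_lib + os.pathsep + os.environ.get("PATH", "")
```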
8. Download the SDK C/C++ library mmdeploy-1.3.1-windows-amd64.zip
### TensorRT
This section covers the environment setup specific to running `mmdeploy` inference with `TensorRT`.
5. Install the prebuilt packages `mmdeploy` (model conversion) and `mmdeploy_runtime` (model inference Python API)
```bash
pip install mmdeploy==1.3.1
pip install mmdeploy-runtime-gpu==1.3.1
```
:point_right: If these were installed before, uninstall the old versions first
6. Install CUDA components and set up environment variables
- CUDA Toolkit 11.1
- TensorRT 8.2.3.0 (python package + environment variables)
- cuDNN 8.2.1.0
The CUDA environment variables are added automatically when installing the CUDA Toolkit. After unpacking TensorRT and cuDNN, add their runtime library paths to `PATH` yourself; see the onnxruntime setup figure for reference.
:exclamation: Restart PowerShell so the environment variables take effect. You can verify the setting with `echo $env:PATH`
:exclamation: It is recommended to add only one TensorRT version's lib to `PATH`, and not to copy TensorRT DLLs into the CUDA directory on drive C; keeping them separate makes DLL version problems easier to expose.
7. Install pycuda: `pip install pycuda`
8. Download the SDK C/C++ library mmdeploy-1.3.1-windows-amd64-cuda11.8.zip
## Model Conversion
### ONNX Runtime Example
The following shows how to convert the previously downloaded checkpoint with the prebuilt `mmdeploy` package.
After the preparation above, the working directory structure should be:
```
..
|-- mmdeploy-1.3.1-windows-amd64
|-- mmpretrain
|-- mmdeploy
`-- resnet18_8xb32_in1k_20210831-fbbb1da6.pth
```
Python conversion code:
```python
from mmdeploy.apis import torch2onnx
from mmdeploy.backend.sdk.export_info import export2SDK
img = 'mmpretrain/demo/demo.JPEG'
work_dir = 'work_dir/onnx/resnet'
save_file = 'end2end.onnx'
deploy_cfg = 'mmdeploy/configs/mmpretrain/classification_onnxruntime_dynamic.py'
model_cfg = 'mmpretrain/configs/resnet/resnet18_8xb32_in1k.py'
model_checkpoint = 'resnet18_8xb32_in1k_20210831-fbbb1da6.pth'
device = 'cpu'
# 1. convert model to onnx
torch2onnx(img, work_dir, save_file, deploy_cfg, model_cfg,
model_checkpoint, device)
# 2. extract pipeline info for sdk use (dump-info)
export2SDK(deploy_cfg, model_cfg, work_dir, pth=model_checkpoint, device=device)
```
The converted model directory structure should be:
```bash
.\work_dir\
`-- onnx
`-- resnet
|-- deploy.json
|-- detail.json
|-- end2end.onnx
`-- pipeline.json
```
### TensorRT Example
The following shows how to convert the previously downloaded checkpoint with the prebuilt mmdeploy package.
After the preparation above, the working directory structure should be:
```
..
|-- mmdeploy-1.3.1-windows-amd64-cuda11.8
|-- mmpretrain
|-- mmdeploy
`-- resnet18_8xb32_in1k_20210831-fbbb1da6.pth
```
Python conversion code:
```python
from mmdeploy.apis import torch2onnx
from mmdeploy.apis.tensorrt import onnx2tensorrt
from mmdeploy.backend.sdk.export_info import export2SDK
import os
img = 'mmpretrain/demo/demo.JPEG'
work_dir = 'work_dir/trt/resnet'
save_file = 'end2end.onnx'
deploy_cfg = 'mmdeploy/configs/mmpretrain/classification_tensorrt_static-224x224.py'
model_cfg = 'mmpretrain/configs/resnet/resnet18_8xb32_in1k.py'
model_checkpoint = 'resnet18_8xb32_in1k_20210831-fbbb1da6.pth'
device = 'cpu'
# 1. convert model to IR(onnx)
torch2onnx(img, work_dir, save_file, deploy_cfg, model_cfg,
model_checkpoint, device)
# 2. convert IR to tensorrt
onnx_model = os.path.join(work_dir, save_file)
save_file = 'end2end.engine'
model_id = 0
device = 'cuda'
onnx2tensorrt(work_dir, save_file, model_id, deploy_cfg, onnx_model, device)
# 3. extract pipeline info for sdk use (dump-info)
export2SDK(deploy_cfg, model_cfg, work_dir, pth=model_checkpoint, device=device)
```
The converted model directory structure should be:
```
.\work_dir\
`-- trt
`-- resnet
|-- deploy.json
|-- detail.json
|-- end2end.engine
|-- end2end.onnx
`-- pipeline.json
```
## Model Inference
The following assumes that the two conversion examples above have been completed, producing one or both of these folders:
```
.\work_dir\onnx\resnet
.\work_dir\trt\resnet
```
The current working directory should be:
```
.
|-- mmdeploy-1.3.1-windows-amd64
|-- mmdeploy-1.3.1-windows-amd64-cuda11.8
|-- mmpretrain
|-- mmdeploy
|-- resnet18_8xb32_in1k_20210831-fbbb1da6.pth
`-- work_dir
```
### Backend Inference
:exclamation: Note that this interface is not intended for deployment; it abstracts away the backend inference APIs and is used to verify that a converted model runs correctly.
#### ONNXRuntime
Python code:
```python
from mmdeploy.apis import inference_model
model_cfg = 'mmpretrain/configs/resnet/resnet18_8xb32_in1k.py'
deploy_cfg = 'mmdeploy/configs/mmpretrain/classification_onnxruntime_dynamic.py'
backend_files = ['work_dir/onnx/resnet/end2end.onnx']
img = 'mmpretrain/demo/demo.JPEG'
device = 'cpu'
result = inference_model(model_cfg, deploy_cfg, backend_files, img, device)
```
#### TensorRT
Python code:
```python
from mmdeploy.apis import inference_model
model_cfg = 'mmpretrain/configs/resnet/resnet18_8xb32_in1k.py'
deploy_cfg = 'mmdeploy/configs/mmpretrain/classification_tensorrt_static-224x224.py'
backend_files = ['work_dir/trt/resnet/end2end.engine']
img = 'mmpretrain/demo/demo.JPEG'
device = 'cuda'
result = inference_model(model_cfg, deploy_cfg, backend_files, img, device)
```
### Python SDK
This section shows how to run inference with the SDK's Python API.
#### ONNXRuntime
Inference code:
```bash
python .\mmdeploy\demo\python\image_classification.py cpu .\work_dir\onnx\resnet\ .\mmpretrain\demo\demo.JPEG
```
#### TensorRT
Inference code:
```bash
python .\mmdeploy\demo\python\image_classification.py cuda .\work_dir\trt\resnet\ .\mmpretrain\demo\demo.JPEG
```
### C SDK
This section shows how to run inference with the SDK's C API.
#### ONNXRuntime
1. Set environment variables
   Use the script described in the README.md inside the SDK folder.
2. Build the examples
   Use the script described in the README.md inside the SDK folder.
3. Inference:
   We recommend running from cmd: if the exe cannot find a required DLL at startup, a dialog will pop up.
   In the mmdeploy-1.3.1-windows-amd64\\example\\cpp\\build\\Release directory:
```
.\image_classification.exe cpu C:\workspace\work_dir\onnx\resnet\ C:\workspace\mmpretrain\demo\demo.JPEG
```
#### TensorRT
1. Set environment variables
   Use the script described in the README.md inside the SDK folder.
2. Build the examples
   Use the script described in the README.md inside the SDK folder.
3. Inference
   We recommend running from cmd: if the exe cannot find a required DLL at startup, a dialog will pop up.
   In the mmdeploy-1.3.1-windows-amd64-cuda11.8\\example\\cpp\\build\\Release directory:
```
.\image_classification.exe cuda C:\workspace\work_dir\trt\resnet C:\workspace\mmpretrain\demo\demo.JPEG
```
## Troubleshooting
If you run into problems, see the [FAQ](../faq.md)
# How to Profile a Model
After model conversion, MMDeploy provides `tools/test.py` as an evaluation tool.
## Prerequisites
Install the dependencies following the [build instructions](../01-how-to-build/build_from_source.md)
and convert a model following the [conversion guide](../02-how-to-run/convert_model.md).
## Usage
```shell
python tools/test.py \
${DEPLOY_CFG} \
${MODEL_CFG} \
--model ${BACKEND_MODEL_FILES} \
[--speed-test] \
[--warmup ${WARM_UP}] \
    [--log-interval ${LOG_INTERVAL}] \
[--log2file ${LOG_RESULT_TO_FILE}]
```
## Parameter Description
| Parameter    | Description                         |
| ------------ | ----------------------------------- |
| deploy_cfg   | deploy config file                  |
| model_cfg    | model config file from the codebase |
| log2file     | path to save logs and results       |
| speed-test   | whether to run a speed test         |
| warm-up      | warm-up before measuring            |
| log-interval | interval between log prints         |
## Example
Run model inference:
```shell
python tools/test.py \
configs/mmpretrain/classification_onnxruntime_static.py \
{MMPRETRAIN_DIR}/configs/resnet/resnet50_b32x8_imagenet.py \
--model model.onnx \
--out out.pkl \
--device cuda:0
```
Run a speed test:
```shell
python tools/test.py \
configs/mmpretrain/classification_onnxruntime_static.py \
{MMPRETRAIN_DIR}/configs/resnet/resnet50_b32x8_imagenet.py \
--model model.onnx \
--speed-test \
--device cpu
```
# How to Quantize a Model
## Why Quantize
Compared with fp32 models, fixed-point models have several advantages:
- Smaller size: an 8-bit model cuts file size by 75%
- Faster inference: the smaller model improves cache hit rates
- Chips often provide dedicated fixed-point instructions, which are faster and more power-efficient (on common CPUs, int8 takes only about 10% of the energy)
Package size and device heating are key metrics for mobile apps, while on the server side the speedup means you can keep the same QPS while trading up to a larger model for better accuracy.
## mmdeploy Offline Quantization
Taking the ncnn backend as an example, the complete workflow is:
<div align="center">
<img src="../_static/image/quant_model.png"/>
</div>
mmdeploy generates the quantization table required by the inference framework from the static graph (onnx), then uses the backend tool to convert the floating-point model to fixed point.
mmdeploy currently supports ncnn PTQ.
## Converting a Model to Fixed Point
After [installing mmdeploy](../01-how-to-build/build_from_source.md), clone and install ppq:
```bash
git clone https://github.com/openppl-public/ppq.git
cd ppq
pip install -r requirements.txt
python3 setup.py install
```
Back in mmdeploy, enable quantization with the `--quant` option of `tools/deploy.py`.
```bash
cd /path/to/mmdeploy
export MODEL_CONFIG=/path/to/mmpretrain/configs/resnet/resnet18_8xb16_cifar10.py
export MODEL_PATH=https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_b16x8_cifar10_20210528-bd6371c8.pth
# grab some imagenet sample images
git clone https://github.com/nihui/imagenet-sample-images --depth=1
# quantize the model
python3 tools/deploy.py configs/mmpretrain/classification_ncnn-int8_static.py ${MODEL_CONFIG} ${MODEL_PATH} /path/to/self-test.png --work-dir work_dir --device cpu --quant --quant-image-dir /path/to/imagenet-sample-images
...
```
Parameter description
| Parameter         |                              Description                                |
| :---------------: | :---------------------------------------------------------------------: |
| --quant           | enable quantization, defaults to False                                  |
| --quant-image-dir | calibration dataset; defaults to the **validation set** in MODEL_CONFIG |
## Building Your Own Calibration Dataset
The calibration set is used to compute quantization parameters for each layer; some DFQ (Data-Free Quantization) methods do not even need one
- Create a folder and put images directly in it (no directory structure, no negative samples, no naming requirements)
- The images must come from the real business scenario; data that differs too much will hurt accuracy
- Do not calibrate on the test set directly, or quantization overfits it
| Type  | Training set | Validation set | Test set      | Calibration set |
| ----- | ------------ | -------------- | ------------- | --------------- |
| Usage | QAT          | PTQ            | accuracy test | PTQ             |
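Following the rules above, assembling a calibration folder amounts to sampling images from your own data into a flat directory. A hedged sketch (the function name, paths, and sample count are all placeholders, not an mmdeploy API):

```python
import random
import shutil
from pathlib import Path

def build_calib_set(src_dir: str, dst_dir: str, n: int = 100, seed: int = 0) -> int:
    """Copy a random sample of images into a flat calibration folder."""
    exts = {".jpg", ".jpeg", ".png", ".bmp"}
    images = [p for p in Path(src_dir).rglob("*") if p.suffix.lower() in exts]
    random.Random(seed).shuffle(images)
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    picked = images[:n]
    for p in picked:
        shutil.copy(p, Path(dst_dir) / p.name)  # flat layout, no subdirectories
    return len(picked)
```

Pass the resulting folder to `--quant-image-dir`.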
**Strongly recommended**: after quantization, verify model accuracy [as described here](profile_model.md). [Here](../03-benchmark/quantization.md) are some quantized model benchmark results.
# More Tools
Besides `deploy.py`, the tools directory contains many other useful utilities.
## torch2onnx
Convert an OpenMMLab model to ONNX format.
### Usage
```bash
python tools/torch2onnx.py \
${DEPLOY_CFG} \
${MODEL_CFG} \
${CHECKPOINT} \
${INPUT_IMG} \
--work-dir ${WORK_DIR} \
--device cpu \
--log-level INFO
```
### Parameter Description
- `deploy_cfg` : The path of the deploy config file in MMDeploy codebase.
- `model_cfg` : The path of model config file in OpenMMLab codebase.
- `checkpoint` : The path of the model checkpoint file.
- `img` : The path of the image file used to convert the model.
- `--work-dir` : Directory to save output ONNX models. Default is `./work-dir`.
- `--device` : The device used for conversion. If not specified, it will be set to `cpu`.
- `--log-level` : To set log level which in `'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'`. If not specified, it will be set to `INFO`.
## extract
An ONNX model with `Mark` nodes can be split into several subgraphs; this tool extracts a subgraph from such a model.
### Usage
```bash
python tools/extract.py \
${INPUT_MODEL} \
${OUTPUT_MODEL} \
--start ${PARTITION_START} \
--end ${PARTITION_END} \
--log-level INFO
```
### Parameter Description
- `input_model` : The path of input ONNX model. The output ONNX model will be extracted from this model.
- `output_model` : The path of output ONNX model.
- `--start` : The start point of extracted model with format `<function_name>:<input/output>`. The `function_name` comes from the decorator `@mark`.
- `--end` : The end point of extracted model with format `<function_name>:<input/output>`. The `function_name` comes from the decorator `@mark`.
- `--log-level` : To set log level which in `'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'`. If not specified, it will be set to `INFO`.
### Notes
To support model partitioning, mark nodes decorated with `@mark` must be present in the ONNX model.
In the example below, `multiclass_nms` is marked; setting `end=multiclass_nms:input` extracts the subgraph that ends right before NMS.
```python
@mark('multiclass_nms', inputs=['boxes', 'scores'], outputs=['dets', 'labels'])
def multiclass_nms(*args, **kwargs):
"""Wrapper function for `_multiclass_nms`."""
```
## onnx2pplnn
This tool converts an ONNX model into the PPLNN format.
### Usage
```bash
python tools/onnx2pplnn.py \
${ONNX_PATH} \
${OUTPUT_PATH} \
--device cuda:0 \
--opt-shapes [224,224] \
--log-level INFO
```
### Parameter Description
- `onnx_path`: The path of the `ONNX` model to convert.
- `output_path`: The converted `PPLNN` algorithm path in json format.
- `device`: The device of the model during conversion.
- `opt-shapes`: Optimal shapes for PPLNN optimization. The shape of each tensor should be wrapped with "[]" or "()", and the shapes of different tensors separated by ",".
- `--log-level`: To set log level which in `'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'`. If not specified, it will be set to `INFO`.
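To make the `--opt-shapes` format above concrete, here is a small illustrative parser. It is a sketch for intuition only, not the tool's actual implementation:

```python
import re

def parse_opt_shapes(spec: str) -> list:
    """Parse a spec like "[1,3,224,224],(1,1000)" into lists of ints."""
    groups = re.findall(r"[\[\(]([\d,\s]+)[\]\)]", spec)
    return [[int(d) for d in g.split(",") if d.strip()] for g in groups]
```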
## onnx2tensorrt
This tool converts an ONNX model into a TensorRT `.engine` file.
### Usage
```bash
python tools/onnx2tensorrt.py \
${DEPLOY_CFG} \
${ONNX_PATH} \
${OUTPUT} \
--device-id 0 \
--log-level INFO \
--calib-file /path/to/file
```
### Parameter Description
- `deploy_cfg` : The path of the deploy config file in MMDeploy codebase.
- `onnx_path` : The ONNX model path to convert.
- `output` : The path of output TensorRT engine.
- `--device-id` : The device index, default to `0`.
- `--calib-file` : The calibration data used to calibrate engine to int8.
- `--log-level` : To set log level which in `'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'`. If not specified, it will be set to `INFO`.
## onnx2ncnn
Convert an ONNX model to ncnn.
### Usage
```bash
python tools/onnx2ncnn.py \
${ONNX_PATH} \
${NCNN_PARAM} \
${NCNN_BIN} \
--log-level INFO
```
### Parameter Description
- `onnx_path` : The path of the `ONNX` model to convert from.
- `output_param` : The converted `ncnn` param path.
- `output_bin` : The converted `ncnn` bin path.
- `--log-level` : To set log level which in `'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'`. If not specified, it will be set to `INFO`.
## profiler
This tool measures the speed of backends such as torch and TensorRT. Note that pre- and post-processing are not included in the measurement.
### Usage
```bash
python tools/profiler.py \
${DEPLOY_CFG} \
${MODEL_CFG} \
${IMAGE_DIR} \
--model ${MODEL} \
--device ${DEVICE} \
--shape ${SHAPE} \
--num-iter ${NUM_ITER} \
--warmup ${WARMUP} \
--cfg-options ${CFG_OPTIONS} \
--batch-size ${BATCH_SIZE} \
--img-ext ${IMG_EXT}
```
### Parameter Description
- `deploy_cfg` : The path of the deploy config file in MMDeploy codebase.
- `model_cfg` : The path of model config file in OpenMMLab codebase.
- `image_dir` : The directory to image files that used to test the model.
- `--model` : The path of the model to be tested.
- `--shape` : Input shape of the model by `HxW`, e.g., `800x1344`. If not specified, it would use `input_shape` from deploy config.
- `--num-iter` : Number of iteration to run inference. Default is `100`.
- `--warmup` : Number of iteration to warm-up the machine. Default is `10`.
- `--device` : The device type. If not specified, it will be set to `cuda:0`.
- `--cfg-options` : Optional key-value pairs to override in the model config.
- `--batch-size`: the batch size for test inference. Default is `1`. Note that not all models support `batch_size>1`.
- `--img-ext`: the file extensions for input images from `image_dir`. Defaults to `['.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif']`.
### Example
```shell
python tools/profiler.py \
configs/mmpretrain/classification_tensorrt_dynamic-224x224-224x224.py \
../mmpretrain/configs/resnet/resnet18_8xb32_in1k.py \
../mmpretrain/demo/ \
--model work-dirs/mmpretrain/resnet/trt/end2end.engine \
--device cuda \
--shape 224x224 \
--num-iter 100 \
--warmup 10 \
--batch-size 1
```
Output:
```text
----- Settings:
+------------+---------+
| batch size | 1 |
| shape | 224x224 |
| iterations | 100 |
| warmup | 10 |
+------------+---------+
----- Results:
+--------+------------+---------+
| Stats | Latency/ms | FPS |
+--------+------------+---------+
| Mean | 1.535 | 651.656 |
| Median | 1.665 | 600.569 |
| Min | 1.308 | 764.341 |
| Max | 1.689 | 591.983 |
+--------+------------+---------+
```
## generate_md_table
Generate the table of backends supported by mmdeploy.
### Usage
```shell
python tools/generate_md_table.py \
${YML_FILE} \
${OUTPUT} \
--backends ${BACKENDS}
```
### Parameter Description
- `yml_file`: input yml config path
- `output`: output markdown file path
- `--backends`: backends to include; defaults to onnxruntime tensorrt torchscript pplnn openvino ncnn
### Example
Generate the backend support table of mmdeploy from mmocr.yml:
```shell
python tools/generate_md_table.py tests/regression/mmocr.yml tests/regression/mmocr.md --backends onnxruntime tensorrt torchscript pplnn openvino ncnn
```
Output:
| model | task | onnxruntime | tensorrt | torchscript | pplnn | openvino | ncnn |
| :----------------------------------------------------------------------------------- | :-------------- | :---------: | :------: | :---------: | :---: | :------: | :--: |
| [DBNet](https://github.com/open-mmlab/mmocr/tree/main/configs/textdet/dbnet) | TextDetection | Y | Y | Y | Y | Y | Y |
| [DBNetpp](https://github.com/open-mmlab/mmocr/tree/main/configs/textdet/dbnetpp) | TextDetection | Y | Y | N | N | Y | Y |
| [PANet](https://github.com/open-mmlab/mmocr/tree/main/configs/textdet/panet) | TextDetection | Y | Y | Y | Y | Y | Y |
| [PSENet](https://github.com/open-mmlab/mmocr/tree/main/configs/textdet/psenet) | TextDetection | Y | Y | Y | Y | Y | Y |
| [TextSnake](https://github.com/open-mmlab/mmocr/tree/main/configs/textdet/textsnake) | TextDetection | Y | Y | Y | N | N | N |
| [MaskRCNN](https://github.com/open-mmlab/mmocr/tree/main/configs/textdet/maskrcnn) | TextDetection | Y | Y | Y | N | N | N |
| [CRNN](https://github.com/open-mmlab/mmocr/tree/main/configs/textrecog/crnn) | TextRecognition | Y | Y | Y | Y | N | Y |
| [SAR](https://github.com/open-mmlab/mmocr/tree/main/configs/textrecog/sar) | TextRecognition | Y | N | Y | N | N | N |
| [SATRN](https://github.com/open-mmlab/mmocr/tree/main/configs/textrecog/satrn) | TextRecognition | Y | Y | Y | N | N | N |
| [ABINet](https://github.com/open-mmlab/mmocr/tree/main/configs/textrecog/abinet) | TextRecognition | Y | Y | Y | N | N | N |
# How to Write a Model Conversion Config
This tutorial describes how to write a config for model conversion and deployment. A deployment config consists of an `onnx config`, a `codebase config`, and a `backend config`.
<!-- TOC -->
- [How to Write a Model Conversion Config](#how-to-write-a-model-conversion-config)
  - [1. How to Write the ONNX Config](#1-how-to-write-the-onnx-config)
    - [ONNX Config Parameters](#onnx-config-parameters)
    - [Example](#example)
    - [Dynamic Input and Output Config](#dynamic-input-and-output-config)
    - [Example](#example-1)
  - [2. How to Write the Codebase Config](#2-how-to-write-the-codebase-config)
    - [Codebase Config Parameters](#codebase-config-parameters)
    - [Example](#example-2)
  - [3. How to Write the Backend Config](#3-how-to-write-the-backend-config)
    - [Example](#example-3)
  - [4. A Complete Deployment Config Example](#4-a-complete-deployment-config-example)
  - [5. Deployment Config Naming Convention](#5-deployment-config-naming-convention)
    - [Example](#example-4)
  - [6. How to Write a Model Config](#6-how-to-write-a-model-config)
<!-- TOC -->
## 1. How to Write the ONNX Config
The ONNX config describes how to convert a PyTorch model into an ONNX model.
### ONNX Config Parameters
- `type`: config type. Defaults to `onnx`.
- `export_params`: if specified, export all model parameters. Set this to False if you only want to export an untrained model.
- `keep_initializers_as_inputs`:
  if True, all initializers (typically corresponding to parameters) are also exported as inputs and added to the computation graph. If False, initializers are not exported as inputs; only non-parameter inputs are added to the graph.
- `opset_version`: ONNX opset version. Defaults to 11.
- `save_file`: output ONNX model file.
- `input_names`: names of the input nodes in the computation graph.
- `output_names`: names of the output nodes in the computation graph.
- `input_shape`: height and width of the model's input tensor.
#### Example
```python
onnx_config = dict(
type='onnx',
export_params=True,
keep_initializers_as_inputs=False,
opset_version=11,
save_file='end2end.onnx',
input_names=['input'],
output_names=['output'],
input_shape=None)
```
### Dynamic Input and Output Config
If the model requires dynamically sized inputs and outputs, add a `dynamic_axes` entry to the ONNX config.
- `dynamic_axes`: describes the dimension information of inputs and outputs.
#### Example
```python
dynamic_axes={
'input': {
0: 'batch',
2: 'height',
3: 'width'
},
'dets': {
0: 'batch',
1: 'num_dets',
},
'labels': {
0: 'batch',
1: 'num_dets',
},
}
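Each key in `dynamic_axes` must name a tensor declared in the ONNX config's `input_names` or `output_names`. A small hedged helper to sanity-check this (illustrative only, not part of mmdeploy):

```python
def check_dynamic_axes(dynamic_axes, input_names, output_names):
    """Raise if dynamic_axes refers to a tensor the ONNX config never declares."""
    known = set(input_names) | set(output_names)
    unknown = set(dynamic_axes) - known
    if unknown:
        raise ValueError(f"dynamic_axes refers to undeclared tensors: {sorted(unknown)}")

# Matches the example above: 'input' is an input, 'dets'/'labels' are outputs.
check_dynamic_axes(
    {'input': {0: 'batch', 2: 'height', 3: 'width'},
     'dets': {0: 'batch', 1: 'num_dets'},
     'labels': {0: 'batch', 1: 'num_dets'}},
    input_names=['input'],
    output_names=['dets', 'labels'])
```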
```
## 2. How to Write the Codebase Config
The codebase refers to the OpenMMLab model codebases. The codebase config consists of the codebase's short name and the task type.
### Codebase Config Parameters
- `type`: short name of the OpenMMLab codebase, such as `mmpretrain`, `mmdet`, `mmseg`, `mmocr`, or `mmagic`.
- `task`: task type of the OpenMMLab model; see the [task list of all codebases](#list-of-tasks-in-all-codebases) for details.
#### Example
```python
codebase_config = dict(type='mmpretrain', task='Classification')
```
## 3. How to Write the Backend Config
The backend config specifies which inference framework the model runs on and provides the information the framework needs at runtime; for details see [ONNX Runtime](../05-supported-backends/onnxruntime.md), [TensorRT](../05-supported-backends/tensorrt.md), [ncnn](../05-supported-backends/ncnn.md), and [PPLNN](../05-supported-backends/pplnn.md).
- `type`: the inference framework, one of `onnxruntime`, `ncnn`, `pplnn`, `tensorrt`, or `openvino`.
#### Example
```python
backend_config = dict(
type='tensorrt',
common_config=dict(
fp16_mode=False, max_workspace_size=1 << 30),
model_inputs=[
dict(
input_shapes=dict(
input=dict(
min_shape=[1, 3, 512, 1024],
opt_shape=[1, 3, 1024, 2048],
max_shape=[1, 3, 2048, 2048])))
])
```
## 4. A Complete Deployment Config Example
Here is a complete deployment config for an mmpretrain image classification task with TensorRT as the inference framework.
```python
codebase_config = dict(type='mmpretrain', task='Classification')
backend_config = dict(
type='tensorrt',
common_config=dict(
fp16_mode=False,
max_workspace_size=1 << 30),
model_inputs=[
dict(
input_shapes=dict(
input=dict(
min_shape=[1, 3, 224, 224],
opt_shape=[4, 3, 224, 224],
max_shape=[64, 3, 224, 224])))])
onnx_config = dict(
type='onnx',
dynamic_axes={
'input': {
0: 'batch',
2: 'height',
3: 'width'
},
'output': {
0: 'batch'
}
},
export_params=True,
keep_initializers_as_inputs=False,
opset_version=11,
save_file='end2end.onnx',
input_names=['input'],
output_names=['output'],
input_shape=[224, 224])
```
## 5. Deployment Config Naming Convention
We follow the style below to name config files. Contributors are advised to follow the same style.
```bash
(task name)_(backend name)_(dynamic or static).py
```
- `task name`: the model's task type.
- `backend name`: the inference framework name. Note: if you use quantization, include the quantization type, e.g. `tensorrt-int8`.
- `dynamic or static`: dynamic or static export. Note: if the inference framework requires explicit shape information, add an input-size description in `height x width` format, e.g. `dynamic-512x1024-2048x2048`, meaning the minimum input shape is `512x1024` and the maximum is `2048x2048`.
#### Example
```bash
detection_tensorrt-int8_dynamic-320x320-1344x1344.py
```
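The naming convention can also be checked mechanically. A hedged sketch of such a parser (illustrative only, not an mmdeploy utility):

```python
def parse_deploy_cfg_name(filename: str) -> dict:
    """Split a deploy config filename into its task, backend, and shape parts."""
    stem = filename[:-3] if filename.endswith(".py") else filename
    task, backend, shape = stem.split("_", 2)  # underscores separate the three parts
    return {"task": task, "backend": backend, "shape": shape}

info = parse_deploy_cfg_name("detection_tensorrt-int8_dynamic-320x320-1344x1344.py")
```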
## 6. How to Write a Model Config
Write the model config according to the codebase of your specific task. The model config is used to initialize the model; for details see [MMPretrain](https://github.com/open-mmlab/mmpretrain/blob/main/docs/zh_CN/user_guides/config.md), [MMDetection](https://github.com/open-mmlab/mmdetection/blob/3.x/docs/zh_cn/user_guides/config.md), [MMSegmentation](https://github.com/open-mmlab/mmsegmentation/blob/main/docs/zh_cn/user_guides/1_config.md), [MMOCR](https://github.com/open-mmlab/mmocr/blob/main/docs/en/user_guides/config.md), and [MMagic](https://github.com/open-mmlab/mmagic/blob/main/docs/en/user_guides/config.md).
# Accuracy and Speed Test Results
## Backends
CPU: ncnn, ONNXRuntime, OpenVINO
GPU: ncnn, TensorRT, PPLNN
## Hardware and Software Environment
- Ubuntu 18.04
- ncnn 20211208
- Cuda 11.3
- TensorRT 7.2.3.4
- Docker 20.10.8
- NVIDIA Tesla T4 Tensor Core GPU for TensorRT
### Configuration
- static graph export
- batch size 1
- average latency measured over 100 images from each dataset
You can obtain your own speed test results directly via [model profiling](../02-how-to-run/profile_model.md). The numbers below were measured in our environment:
## Speed Test
<div style="margin-left: 25px;">
<table class="docutils">
<thead>
<tr>
<th align="center" colspan="2">mmpretrain</th>
<th align="center" colspan="5">TensorRT(ms)</th>
<th align="center" colspan="2">PPLNN(ms)</th>
<th align="center" colspan="2">ncnn(ms)</th>
<th align="center" colspan="1">Ascend(ms)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" colspan="1" rowspan="2">model</td>
<td align="center" colspan="1" rowspan="2">spatial</td>
<td align="center" colspan="3">T4</td>
<td align="center" colspan="2">JetsonNano2GB</td>
<td align="center" colspan="1">Jetson TX2</td>
<td align="center" colspan="1">T4</td>
<td align="center" colspan="1">SnapDragon888</td>
<td align="center" colspan="1">Adreno660</td>
<td align="center" colspan="1">Ascend310</td>
</tr>
<tr>
<td align="center" colspan="1">fp32</td>
<td align="center" colspan="1">fp16</td>
<td align="center" colspan="1">int8</td>
<td align="center" colspan="1">fp32</td>
<td align="center" colspan="1">fp16</td>
<td align="center" colspan="1">fp32</td>
<td align="center" colspan="1">fp16</td>
<td align="center" colspan="1">fp32</td>
<td align="center" colspan="1">fp32</td>
<td align="center" colspan="1">fp32</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnet/resnet50_8xb32_in1k.py"> ResNet </a></td>
<td align="center">224x224</td>
<td align="center">2.97</td>
<td align="center">1.26</td>
<td align="center">1.21</td>
<td align="center">59.32</td>
<td align="center">30.54</td>
<td align="center">24.13</td>
<td align="center">1.30</td>
<td align="center">33.91</td>
<td align="center">25.93</td>
<td align="center">2.49</td>
</tr>
<tr>
<td align="center"> <a href="https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnext/resnext50-32x4d_8xb32_in1k.py"> ResNeXt </a></td>
<td align="center">224x224</td>
<td align="center">4.31</td>
<td align="center">1.42</td>
<td align="center">1.37</td>
<td align="center">88.10</td>
<td align="center">49.18</td>
<td align="center">37.45</td>
<td align="center">1.36</td>
<td align="center">133.44</td>
<td align="center">69.38</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"> <a href="https://github.com/open-mmlab/mmpretrain/blob/main/configs/seresnet/seresnet50_8xb32_in1k.py"> SE-ResNet </a></td>
<td align="center">224x224</td>
<td align="center">3.41</td>
<td align="center">1.66</td>
<td align="center">1.51</td>
<td align="center">74.59</td>
<td align="center">48.78</td>
<td align="center">29.62</td>
<td align="center">1.91</td>
<td align="center">107.84</td>
<td align="center">80.85</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmpretrain/blob/main/configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py"> ShuffleNetV2 </a></td>
<td align="center">224x224</td>
<td align="center">1.37</td>
<td align="center">1.19</td>
<td align="center">1.13</td>
<td align="center">15.26</td>
<td align="center">10.23</td>
<td align="center">7.37</td>
<td align="center">4.69</td>
<td align="center">9.55</td>
<td align="center">10.66</td>
<td align="center">-</td>
</tr>
</tbody>
</table>
<table class="docutils">
<thead>
<tr>
<th align="center" colspan="2">mmdet part1</th>
<th align="center" colspan="4">TensorRT(ms)</th>
<th align="center" colspan="1">PPLNN(ms)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" rowspan="2" colspan="1">model</td>
<td align="center" rowspan="2" colspan="1">spatial</td>
<td align="center" colspan="3">T4</td>
<td align="center" colspan="1">Jetson TX2</td>
<td align="center" colspan="1">T4</td>
</tr>
<tr>
<td align="center" colspan="1">fp32</td>
<td align="center" colspan="1">fp16</td>
<td align="center" colspan="1">int8</td>
<td align="center" colspan="1">fp32</td>
<td align="center" colspan="1">fp16</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/yolo/yolov3_d53_320_273e_coco.py">YOLOv3</a></td>
<td align="center">320x320</td>
<td align="center">14.76</td>
<td align="center">24.92</td>
<td align="center">24.92</td>
<td align="center">-</td>
<td align="center">18.07</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/ssd/ssdlite_mobilenetv2_scratch_600e_coco.py">SSD-Lite</a></td>
<td align="center">320x320</td>
<td align="center">8.84</td>
<td align="center">9.21</td>
<td align="center">8.04</td>
<td align="center">1.28</td>
<td align="center">19.72</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/retinanet/retinanet_r50_fpn_1x_coco.py">RetinaNet</a></td>
<td align="center">800x1344</td>
<td align="center">97.09</td>
<td align="center">25.79</td>
<td align="center">16.88</td>
<td align="center">780.48</td>
<td align="center">38.34</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/fcos/fcos_r50_caffe_fpn_gn-head_1x_coco.py">FCOS</a></td>
<td align="center">800x1344</td>
<td align="center">84.06</td>
<td align="center">23.15</td>
<td align="center">17.68</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/fsaf/fsaf_r50_fpn_1x_coco.py">FSAF</a></td>
<td align="center">800x1344</td>
<td align="center">82.96</td>
<td align="center">21.02</td>
<td align="center">13.50</td>
<td align="center">-</td>
<td align="center">30.41</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py">Faster R-CNN</a></td>
<td align="center">800x1344</td>
<td align="center">88.08</td>
<td align="center">26.52</td>
<td align="center">19.14</td>
<td align="center">733.81</td>
<td align="center">65.40</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py">Mask R-CNN</a></td>
<td align="center">800x1344</td>
<td align="center">104.83</td>
<td align="center">58.27</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">86.80</td>
</tr>
</tbody>
</table>
</div>
<div style="margin-left: 25px;">
<table>
<thead>
<tr>
<th align="center" colspan="2">mmdet part2</th>
<th align="center" colspan="2">ncnn</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" rowspan="2">model</td>
<td align="center" rowspan="2">spatial</td>
<td align="center" colspan="1">SnapDragon888</td>
<td align="center" colspan="1">Adreno660</td>
</tr>
<tr>
<td align="center" colspan="1">fp32</td>
<td align="center" colspan="1">fp32</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/yolo/yolov3_mobilenetv2_mstrain-416_300e_coco.py">MobileNetv2-YOLOv3</a></td>
<td align="center">320x320</td>
<td align="center">48.57</td>
<td align="center">66.55</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/ssd/ssdlite_mobilenetv2_scratch_600e_coco.py">SSD-Lite</a></td>
<td align="center">320x320</td>
<td align="center">44.91</td>
<td align="center">66.19</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/yolox/yolox_tiny_8x8_300e_coco.py">YOLOX</a></td>
<td align="center">416x416</td>
<td align="center">111.60</td>
<td align="center">134.50</td>
</tr>
</tbody>
</table>
</div>
<div style="margin-left: 25px;">
<table class="docutils">
<thead>
<tr>
<th align="center" colspan="2">mmagic</th>
<th align="center" colspan="4">TensorRT(ms)</th>
<th align="center" colspan="1">PPLNN(ms)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" rowspan="2">model</td>
<td align="center" rowspan="2">spatial</td>
<td align="center" colspan="3">T4</td>
<td align="center" colspan="1">Jetson TX2</td>
<td align="center" colspan="1">T4</td>
</tr>
<tr>
<td align="center" colspan="1">fp32</td>
<td align="center" colspan="1">fp16</td>
<td align="center" colspan="1">int8</td>
<td align="center" colspan="1">fp32</td>
<td align="center" colspan="1">fp16</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmagic/blob/main/configs/esrgan/esrgan_psnr-x4c64b23g32_1xb16-1000k_div2k.py">ESRGAN</a></td>
<td align="center">32x32</td>
<td align="center">12.64</td>
<td align="center">12.42</td>
<td align="center">12.45</td>
<td align="center">-</td>
<td align="center">7.67</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmagic/blob/main/configs/srcnn/srcnn_x4k915_1xb16-1000k_div2k.py">SRCNN</a></td>
<td align="center">32x32</td>
<td align="center">0.70</td>
<td align="center">0.35</td>
<td align="center">0.26</td>
<td align="center">58.86</td>
<td align="center">0.56</td>
</tr>
</tbody>
</table>
</div>
<div style="margin-left: 25px;">
<table class="docutils">
<thead>
<tr>
<th align="center" colspan="2">mmocr</th>
<th align="center" colspan="3">TensorRT(ms)</th>
<th align="center" colspan="1">PPLNN(ms)</th>
<th align="center" colspan="2">ncnn(ms)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" rowspan="2">model</td>
<td align="center" rowspan="2">spatial</td>
<td align="center" colspan="3">T4</td>
<td align="center" colspan="1">T4</td>
<td align="center" colspan="1">SnapDragon888</td>
<td align="center" colspan="1">Adreno660</td>
</tr>
<tr>
<td align="center" colspan="1">fp32</td>
<td align="center" colspan="1">fp16</td>
<td align="center" colspan="1">int8</td>
<td align="center" colspan="1">fp16</td>
<td align="center" colspan="1">fp32</td>
<td align="center" colspan="1">fp32</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py">DBNet</a></td>
<td align="center">640x640</td>
<td align="center">10.70</td>
<td align="center">5.62</td>
<td align="center">5.00</td>
<td align="center">34.84</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/crnn/crnn_mini-vgg_5e_mj.py">CRNN</a></td>
<td align="center">32x32</td>
<td align="center">1.93 </td>
<td align="center">1.40</td>
<td align="center">1.36</td>
<td align="center">-</td>
<td align="center">10.57</td>
<td align="center">20.00</td>
</tr>
</tbody>
</table>
</div>
<div style="margin-left: 25px;">
<table class="docutils">
<thead>
<tr>
<th align="center" colspan="2">mmseg</th>
<th align="center" colspan="4">TensorRT(ms)</th>
<th align="center" colspan="1">PPLNN(ms)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" rowspan="2">model</td>
<td align="center" rowspan="2">spatial</td>
<td align="center" colspan="3">T4</td>
<td align="center" colspan="1">Jetson TX2</td>
<td align="center" colspan="1">T4</td>
</tr>
<tr>
<td align="center" colspan="1">fp32</td>
<td align="center" colspan="1">fp16</td>
<td align="center" colspan="1">int8</td>
<td align="center" colspan="1">fp32</td>
<td align="center" colspan="1">fp16</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/fcn/fcn_r50-d8_4xb2-40k_cityscapes-512x1024.py">FCN</a></td>
<td align="center">512x1024</td>
<td align="center">128.42</td>
<td align="center">23.97</td>
<td align="center">18.13</td>
<td align="center">1682.54</td>
<td align="center">27.00</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/pspnet/pspnet_r50-d8_4xb2-80k_cityscapes-512x1024.py">PSPNet</a></td>
<td align="center">512x1024</td>
<td align="center">119.77</td>
<td align="center">24.10</td>
<td align="center">16.33</td>
<td align="center">1586.19</td>
<td align="center">27.26</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/deeplabv3/deeplabv3/deeplabv3_r50-d8_4xb2-80k_cityscapes-512x1024.py">DeepLabV3</a></td>
<td align="center">512x1024</td>
<td align="center">226.75</td>
<td align="center">31.80</td>
<td align="center">19.85</td>
<td align="center">-</td>
<td align="center">36.01</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/deeplabv3plus/deeplabv3plus_r50-d8_4xb2-80k_cityscapes-512x1024.py">DeepLabV3+</a></td>
<td align="center">512x1024</td>
<td align="center">151.25</td>
<td align="center">47.03</td>
<td align="center">50.38</td>
<td align="center">2534.96</td>
<td align="center">34.80</td>
</tr>
</tbody>
</table>
</div>
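The speed tables above report per-image latency in milliseconds; throughput in frames per second is simply the reciprocal. A minimal conversion sketch, using the FCN TensorRT fp16 figure from the mmseg table above as the example input:

```python
def fps(latency_ms):
    """Convert a per-image latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms

# FCN on T4, TensorRT fp16 (from the mmseg table above): 23.97 ms/image
print(round(fps(23.97), 1))  # → 41.7
```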
## Accuracy Test
<div style="margin-left: 25px;">
<table class="docutils">
<thead>
<tr>
<th align="center" colspan="2">mmpretrain</th>
<th align="center">PyTorch</th>
<th align="center">TorchScript</th>
<th align="center">ONNX Runtime</th>
<th align="center" colspan="3">TensorRT</th>
<th align="center">PPLNN</th>
<th align="center">Ascend</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">model</td>
<td align="center">metric</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp16</td>
<td align="center">int8</td>
<td align="center">fp16</td>
<td align="center">fp32</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnet/resnet18_8xb32_in1k.py">ResNet-18</a></td>
<td align="center">top-1</td>
<td align="center">69.90</td>
<td align="center">69.90</td>
<td align="center">69.88</td>
<td align="center">69.88</td>
<td align="center">69.86</td>
<td align="center">69.86</td>
<td align="center">69.86</td>
<td align="center">69.91</td>
</tr>
<tr>
<td align="center">top-5</td>
<td align="center">89.43</td>
<td align="center">89.43</td>
<td align="center">89.34</td>
<td align="center">89.34</td>
<td align="center">89.33</td>
<td align="center">89.38</td>
<td align="center">89.34</td>
<td align="center">89.43</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmpretrain/blob/main/configs/resnext/resnext50-32x4d_8xb32_in1k.py">ResNeXt-50</a></td>
<td align="center">top-1</td>
<td align="center">77.90</td>
<td align="center">77.90</td>
<td align="center">77.90</td>
<td align="center">77.90</td>
<td align="center">-</td>
<td align="center">77.78</td>
<td align="center">77.89</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">top-5</td>
<td align="center">93.66</td>
<td align="center">93.66</td>
<td align="center">93.66</td>
<td align="center">93.66</td>
<td align="center">-</td>
<td align="center">93.64</td>
<td align="center">93.65</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmpretrain/blob/main/configs/seresnet/seresnext50-32x4d_8xb32_in1k.py">SE-ResNet-50</a></td>
<td align="center">top-1</td>
<td align="center">77.74</td>
<td align="center">77.74</td>
<td align="center">77.74</td>
<td align="center">77.74</td>
<td align="center">77.75</td>
<td align="center">77.63</td>
<td align="center">77.73</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">top-5</td>
<td align="center">93.84</td>
<td align="center">93.84</td>
<td align="center">93.84</td>
<td align="center">93.84</td>
<td align="center">93.83</td>
<td align="center">93.72</td>
<td align="center">93.84</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmpretrain/blob/main/configs/shufflenet_v1/shufflenet-v1-1x_16xb64_in1k.py">ShuffleNetV1 1.0x</a></td>
<td align="center">top-1</td>
<td align="center">68.13</td>
<td align="center">68.13</td>
<td align="center">68.13</td>
<td align="center">68.13</td>
<td align="center">68.13</td>
<td align="center">67.71</td>
<td align="center">68.11</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">top-5</td>
<td align="center">87.81</td>
<td align="center">87.81</td>
<td align="center">87.81</td>
<td align="center">87.81</td>
<td align="center">87.81</td>
<td align="center">87.58</td>
<td align="center">87.80</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmpretrain/blob/main/configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py">ShuffleNetV2 1.0x</a></td>
<td align="center">top-1</td>
<td align="center">69.55</td>
<td align="center">69.55</td>
<td align="center">69.55</td>
<td align="center">69.55</td>
<td align="center">69.54</td>
<td align="center">69.10</td>
<td align="center">69.54</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">top-5</td>
<td align="center">88.92</td>
<td align="center">88.92</td>
<td align="center">88.92</td>
<td align="center">88.92</td>
<td align="center">88.91</td>
<td align="center">88.58</td>
<td align="center">88.92</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmpretrain/blob/main/configs/mobilenet_v2/mobilenet-v2_8xb32_in1k.py">MobileNet V2</a></td>
<td align="center">top-1</td>
<td align="center">71.86</td>
<td align="center">71.86</td>
<td align="center">71.86</td>
<td align="center">71.86</td>
<td align="center">71.87</td>
<td align="center">70.91</td>
<td align="center">71.84</td>
<td align="center">71.87</td>
</tr>
<tr>
<td align="center">top-5</td>
<td align="center">90.42</td>
<td align="center">90.42</td>
<td align="center">90.42</td>
<td align="center">90.42</td>
<td align="center">90.40</td>
<td align="center">89.85</td>
<td align="center">90.41</td>
<td align="center">90.42</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmpretrain/blob/main/configs/vision_transformer/vit-base-p16_ft-64xb64_in1k-384.py">Vision Transformer</a></td>
<td align="center">top-1</td>
<td align="center">85.43</td>
<td align="center">85.43</td>
<td align="center">-</td>
<td align="center">85.43</td>
<td align="center">85.42</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">85.43</td>
</tr>
<tr>
<td align="center">top-5</td>
<td align="center">97.77</td>
<td align="center">97.77</td>
<td align="center">-</td>
<td align="center">97.77</td>
<td align="center">97.76</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">97.77</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmpretrain/blob/main/configs/swin_transformer/swin-tiny_16xb64_in1k.py">Swin Transformer</a></td>
<td align="center">top-1</td>
<td align="center">81.18</td>
<td align="center">81.18</td>
<td align="center">81.18</td>
<td align="center">81.18</td>
<td align="center">81.18</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">top-5</td>
<td align="center">95.61</td>
<td align="center">95.61</td>
<td align="center">95.61</td>
<td align="center">95.61</td>
<td align="center">95.61</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmpretrain/blob/main/configs/efficientformer/efficientformer-l1_8xb128_in1k.py">EfficientFormer</a></td>
<td align="center">top-1</td>
<td align="center">80.46</td>
<td align="center">80.45</td>
<td align="center">80.46</td>
<td align="center">80.46</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">top-5</td>
<td align="center">94.99</td>
<td align="center">94.98</td>
<td align="center">94.99</td>
<td align="center">94.99</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
</tbody>
</table>
</div>
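The top-1 / top-5 columns in the classification table above are standard top-k accuracies: the fraction of samples whose ground-truth class appears among the k highest-scoring predictions. A minimal pure-Python sketch of the metric (the scores and labels below are made-up toy values, not from the table):

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # indices of the k largest scores in this row
        topk = sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]
        hits += label in topk
    return hits / len(labels)

# Toy example: 3 samples, 3 classes; the second sample is misranked at top-1.
scores = [[0.1, 0.7, 0.2], [0.5, 0.2, 0.3], [0.05, 0.15, 0.8]]
labels = [1, 2, 2]
print(topk_accuracy(scores, labels, 1))  # top-1
print(topk_accuracy(scores, labels, 2))  # top-2 is never lower than top-1
```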
<div style="margin-left: 25px;">
<table class="docutils">
<thead>
<tr>
<th align="center" colspan="4">mmdet</th>
<th align="center">PyTorch</th>
<th align="center">TorchScript</th>
<th align="center">ONNX Runtime</th>
<th align="center" colspan="3">TensorRT</th>
<th align="center">PPLNN</th>
<th align="center">Ascend</th>
<th align="center">OpenVINO</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">model</td>
<td align="center">task</td>
<td align="center">dataset</td>
<td align="center">metric</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp16</td>
<td align="center">int8</td>
<td align="center">fp16</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/yolo/yolov3_d53_320_273e_coco.py">YOLOV3</a></td>
<td align="center">Object Detection</td>
<td align="center">COCO2017</td>
<td align="center">box AP</td>
<td align="center">33.7</td>
<td align="center">33.7</td>
<td align="center">-</td>
<td align="center">33.5</td>
<td align="center">33.5</td>
<td align="center">33.5</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/ssd/ssd300_coco.py">SSD</a></td>
<td align="center">Object Detection</td>
<td align="center">COCO2017</td>
<td align="center">box AP</td>
<td align="center">25.5</td>
<td align="center">25.5</td>
<td align="center">-</td>
<td align="center">25.5</td>
<td align="center">25.5</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/retinanet/retinanet_r50_fpn_1x_coco.py">RetinaNet</a></td>
<td align="center">Object Detection</td>
<td align="center">COCO2017</td>
<td align="center">box AP</td>
<td align="center">36.5</td>
<td align="center">36.4</td>
<td align="center">-</td>
<td align="center">36.4</td>
<td align="center">36.4</td>
<td align="center">36.3</td>
<td align="center">36.5</td>
<td align="center">36.4</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/fcos/fcos_r50_caffe_fpn_gn-head_1x_coco.py">FCOS</a></td>
<td align="center">Object Detection</td>
<td align="center">COCO2017</td>
<td align="center">box AP</td>
<td align="center">36.6</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">36.6</td>
<td align="center">36.5</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/fsaf/fsaf_r50_fpn_1x_coco.py">FSAF</a></td>
<td align="center">Object Detection</td>
<td align="center">COCO2017</td>
<td align="center">box AP</td>
<td align="center">37.4</td>
<td align="center">37.4</td>
<td align="center">-</td>
<td align="center">37.4</td>
<td align="center">37.4</td>
<td align="center">37.2</td>
<td align="center">37.4</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/3.x/configs/centernet/centernet_r18_8xb16-crop512-140e_coco.py">CenterNet</a></td>
<td align="center">Object Detection</td>
<td align="center">COCO2017</td>
<td align="center">box AP</td>
<td align="center">25.9</td>
<td align="center">26.0</td>
<td align="center">26.0</td>
<td align="center">26.0</td>
<td align="center">25.8</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/yolox/yolox_s_8x8_300e_coco.py">YOLOX</a></td>
<td align="center">Object Detection</td>
<td align="center">COCO2017</td>
<td align="center">box AP</td>
<td align="center">40.5</td>
<td align="center">40.3</td>
<td align="center">-</td>
<td align="center">40.3</td>
<td align="center">40.3</td>
<td align="center">29.3</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py">Faster R-CNN</a></td>
<td align="center">Object Detection</td>
<td align="center">COCO2017</td>
<td align="center">box AP</td>
<td align="center">37.4</td>
<td align="center">37.3</td>
<td align="center">-</td>
<td align="center">37.3</td>
<td align="center">37.3</td>
<td align="center">37.1</td>
<td align="center">37.3</td>
<td align="center">37.2</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/atss/atss_r50_fpn_1x_coco.py">ATSS</a></td>
<td align="center">Object Detection</td>
<td align="center">COCO2017</td>
<td align="center">box AP</td>
<td align="center">39.4</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">39.4</td>
<td align="center">39.4</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/cascade_rcnn/cascade_rcnn_r50_caffe_fpn_1x_coco.py">Cascade R-CNN</a></td>
<td align="center">Object Detection</td>
<td align="center">COCO2017</td>
<td align="center">box AP</td>
<td align="center">40.4</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">40.4</td>
<td align="center">40.4</td>
<td align="center">-</td>
<td align="center">40.4</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/gfl/gfl_r50_fpn_1x_coco.py">GFL</a></td>
<td align="center">Object Detection</td>
<td align="center">COCO2017</td>
<td align="center">box AP</td>
<td align="center">40.2</td>
<td align="center">-</td>
<td align="center">40.2</td>
<td align="center">40.2</td>
<td align="center">40.0</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/reppoints/reppoints_moment_r50_fpn_1x_coco.py">RepPoints</a></td>
<td align="center">Object Detection</td>
<td align="center">COCO2017</td>
<td align="center">box AP</td>
<td align="center">37.0</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">36.9</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/detr/detr_r50_8x2_150e_coco.py">DETR</a></td>
<td align="center">Object Detection</td>
<td align="center">COCO2017</td>
<td align="center">box AP</td>
<td align="center">40.1</td>
<td align="center">40.1</td>
<td align="center">-</td>
<td align="center">40.1</td>
<td align="center">40.1</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmdetection/tree/main/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py">Mask R-CNN</a></td>
<td align="center" rowspan="2">Instance Segmentation</td>
<td align="center" rowspan="2">COCO2017</td>
<td align="center">box AP</td>
<td align="center">38.2</td>
<td align="center">38.1</td>
<td align="center">-</td>
<td align="center">38.1</td>
<td align="center">38.1</td>
<td align="center">-</td>
<td align="center">38.0</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">mask AP</td>
<td align="center">34.7</td>
<td align="center">34.7</td>
<td align="center">-</td>
<td align="center">33.7</td>
<td align="center">33.7</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmdetection/blob/main/configs/swin/mask_rcnn_swin-t-p4-w7_fpn_1x_coco.py">Swin-Transformer</a></td>
<td align="center" rowspan="2">Instance Segmentation</td>
<td align="center" rowspan="2">COCO2017</td>
<td align="center">box AP</td>
<td align="center">42.7</td>
<td align="center">-</td>
<td align="center">42.7</td>
<td align="center">42.5</td>
<td align="center">37.7</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">mask AP</td>
<td align="center">39.3</td>
<td align="center">-</td>
<td align="center">39.3</td>
<td align="center">39.3</td>
<td align="center">35.4</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/3.x/configs/solo/solo_r50_fpn_1x_coco.py">SOLO</a></td>
<td align="center">Instance Segmentation</td>
<td align="center">COCO2017</td>
<td align="center">mask AP</td>
<td align="center">33.1</td>
<td align="center">-</td>
<td align="center">32.7</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">32.7</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmdetection/tree/3.x/configs/solov2/solov2_r50_fpn_1x_coco.py">SOLOv2</a></td>
<td align="center">Instance Segmentation</td>
<td align="center">COCO2017</td>
<td align="center">mask AP</td>
<td align="center">34.8</td>
<td align="center">-</td>
<td align="center">34.5</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">34.5</td>
</tr>
</tbody>
</table>
</div>
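The PSNR values in the super-resolution table below are peak signal-to-noise ratios in dB, computed from the mean squared error over 8-bit pixel values. A small sketch of the formula; the MSE value here is an illustrative number chosen to land near the EDSR / SRResNet rows, not a figure from the table:

```python
import math

def psnr(mse, max_val=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images: 10*log10(MAX^2/MSE)."""
    return 10 * math.log10(max_val ** 2 / mse)

# An MSE of ~61.8 on 8-bit pixels corresponds to roughly 30.22 dB.
print(round(psnr(61.8), 2))  # → 30.22
```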
<div style="margin-left: 25px;">
<table class="docutils">
<thead>
<tr>
<th align="center" colspan="4">mmagic</th>
<th align="center">Pytorch</th>
<th align="center">TorchScript</th>
<th align="center">ONNX Runtime</th>
<th align="center" colspan="3">TensorRT</th>
<th align="center">PPLNN</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">model</td>
<td align="center">task</td>
<td align="center">dataset</td>
<td align="center">metric</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp16</td>
<td align="center">int8</td>
<td align="center">fp16</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmagic/blob/main/configs/srcnn/srcnn_x4k915_1xb16-1000k_div2k.py">SRCNN</a></td>
<td align="center" rowspan="2">Super Resolution</td>
<td align="center" rowspan="2">Set5</td>
<td align="center">PSNR</td>
<td align="center">28.4316</td>
<td align="center">28.4120</td>
<td align="center">28.4323</td>
<td align="center">28.4323</td>
<td align="center">28.4286</td>
<td align="center">28.1995</td>
<td align="center">28.4311</td>
</tr>
<tr>
<td align="center">SSIM</td>
<td align="center">0.8099</td>
<td align="center">0.8106</td>
<td align="center">0.8097</td>
<td align="center">0.8097</td>
<td align="center">0.8096</td>
<td align="center">0.7934</td>
<td align="center">0.8096</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmagic/blob/main/configs/esrgan/esrgan_x4c64b23g32_1xb16-400k_div2k.py">ESRGAN</a></td>
<td align="center" rowspan="2">Super Resolution</td>
<td align="center" rowspan="2">Set5</td>
<td align="center">PSNR</td>
<td align="center">28.2700</td>
<td align="center">28.2619</td>
<td align="center">28.2592</td>
<td align="center">28.2592</td>
<td align="center"> - </td>
<td align="center"> - </td>
<td align="center">28.2624</td>
</tr>
<tr>
<td align="center">SSIM</td>
<td align="center">0.7778</td>
<td align="center">0.7784</td>
<td align="center">0.7764</td>
<td align="center">0.7774</td>
<td align="center"> - </td>
<td align="center"> - </td>
<td align="center">0.7765</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmagic/blob/main/configs/esrgan/esrgan_psnr-x4c64b23g32_1xb16-1000k_div2k.py">ESRGAN-PSNR</a></td>
<td align="center" rowspan="2">Super Resolution</td>
<td align="center" rowspan="2">Set5</td>
<td align="center">PSNR</td>
<td align="center">30.6428</td>
<td align="center">30.6306</td>
<td align="center">30.6444</td>
<td align="center">30.6430</td>
<td align="center"> - </td>
<td align="center"> - </td>
<td align="center">27.0426</td>
</tr>
<tr>
<td align="center">SSIM</td>
<td align="center">0.8559</td>
<td align="center">0.8565</td>
<td align="center">0.8558</td>
<td align="center">0.8558</td>
<td align="center"> - </td>
<td align="center"> - </td>
<td align="center">0.8557</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmagic/blob/main/configs/srgan_resnet/srgan_x4c64b16_1xb16-1000k_div2k.py">SRGAN</a></td>
<td align="center" rowspan="2">Super Resolution</td>
<td align="center" rowspan="2">Set5</td>
<td align="center">PSNR</td>
<td align="center">27.9499</td>
<td align="center">27.9252</td>
<td align="center">27.9408</td>
<td align="center">27.9408</td>
<td align="center"> - </td>
<td align="center"> - </td>
<td align="center">27.9388</td>
</tr>
<tr>
<td align="center">SSIM</td>
<td align="center">0.7846</td>
<td align="center">0.7851</td>
<td align="center">0.7839</td>
<td align="center">0.7839</td>
<td align="center"> - </td>
<td align="center"> - </td>
<td align="center">0.7839</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmagic/blob/main/configs/srgan_resnet/msrresnet_x4c64b16_1xb16-1000k_div2k.py">SRResNet</a></td>
<td align="center" rowspan="2">Super Resolution</td>
<td align="center" rowspan="2">Set5</td>
<td align="center">PSNR</td>
<td align="center">30.2252</td>
<td align="center">30.2069</td>
<td align="center">30.2300</td>
<td align="center">30.2300</td>
<td align="center"> - </td>
<td align="center"> - </td>
<td align="center">30.2294</td>
</tr>
<tr>
<td align="center">SSIM</td>
<td align="center">0.8491</td>
<td align="center">0.8497</td>
<td align="center">0.8488</td>
<td align="center">0.8488</td>
<td align="center"> - </td>
<td align="center"> - </td>
<td align="center">0.8488</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmagic/blob/main/configs/real_esrgan/realesrnet_c64b23g32_4xb12-lr2e-4-1000k_df2k-ost.py">Real-ESRNet</a></td>
<td align="center" rowspan="2">Super Resolution</td>
<td align="center" rowspan="2">Set5</td>
<td align="center">PSNR</td>
<td align="center">28.0297</td>
<td align="center">-</td>
<td align="center">27.7016</td>
<td align="center">27.7016</td>
<td align="center"> - </td>
<td align="center"> - </td>
<td align="center">27.7049</td>
</tr>
<tr>
<td align="center">SSIM</td>
<td align="center">0.8236</td>
<td align="center">-</td>
<td align="center">0.8122</td>
<td align="center">0.8122</td>
<td align="center"> - </td>
<td align="center"> - </td>
<td align="center">0.8123</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmagic/blob/main/configs/edsr/edsr_x4c64b16_1xb16-300k_div2k.py">EDSR</a></td>
<td align="center" rowspan="2">Super Resolution</td>
<td align="center" rowspan="2">Set5</td>
<td align="center">PSNR</td>
<td align="center">30.2223</td>
<td align="center">30.2192</td>
<td align="center">30.2214</td>
<td align="center">30.2214</td>
<td align="center">30.2211</td>
<td align="center">30.1383</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">SSIM</td>
<td align="center">0.8500</td>
<td align="center">0.8507</td>
<td align="center">0.8497</td>
<td align="center">0.8497</td>
<td align="center">0.8497</td>
<td align="center">0.8469</td>
<td align="center"> - </td>
</tr>
</tbody>
</table>
</div>
<div style="margin-left: 25px;">
<table class="docutils">
<thead>
<tr>
<th align="center" colspan="4">mmocr</th>
<th align="center">PyTorch</th>
<th align="center">TorchScript</th>
<th align="center">ONNX Runtime</th>
<th align="center" colspan="3">TensorRT</th>
<th align="center">PPLNN</th>
<th align="center">OpenVINO</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">model</td>
<td align="center">task</td>
<td align="center">dataset</td>
<td align="center">metric</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp16</td>
<td align="center">int8</td>
<td align="center">fp16</td>
<td align="center">fp32</td>
</tr>
<tr>
<td align="center" rowspan="3"><a href="https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/dbnet/dbnet_resnet18_fpnc_1200e_icdar2015.py">DBNet*</a></td>
<td align="center" rowspan="3">TextDetection</td>
<td align="center" rowspan="3">ICDAR2015</td>
<td align="center">recall</td>
<td align="center">0.7310</td>
<td align="center">0.7308</td>
<td align="center">0.7304</td>
<td align="center">0.7198</td>
<td align="center">0.7179</td>
<td align="center">0.7111</td>
<td align="center">0.7304</td>
<td align="center">0.7309</td>
</tr>
<tr>
<td align="center">precision</td>
<td align="center">0.8714</td>
<td align="center">0.8718</td>
<td align="center">0.8714</td>
<td align="center">0.8677</td>
<td align="center">0.8674</td>
<td align="center">0.8688</td>
<td align="center">0.8718</td>
<td align="center">0.8714</td>
</tr>
<tr>
<td align="center">hmean</td>
<td align="center">0.7950</td>
<td align="center">0.7949</td>
<td align="center">0.7950</td>
<td align="center">0.7868</td>
<td align="center">0.7856</td>
<td align="center">0.7821</td>
<td align="center">0.7949</td>
<td align="center">0.7950</td>
</tr>
<tr>
<td align="center" rowspan="3"><a href="https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/dbnetpp/dbnetpp_resnet50_fpnc_1200e_icdar2015.py">DBNet++</a></td>
<td align="center" rowspan="3">TextDetection</td>
<td align="center" rowspan="3">ICDAR2015</td>
<td align="center">recall</td>
<td align="center">0.8209</td>
<td align="center">0.8209</td>
<td align="center">0.8209</td>
<td align="center">0.8199</td>
<td align="center">0.8204</td>
<td align="center">0.8204</td>
<td align="center">-</td>
<td align="center">0.8209</td>
</tr>
<tr>
<td align="center">precision</td>
<td align="center">0.9079</td>
<td align="center">0.9079</td>
<td align="center">0.9079</td>
<td align="center">0.9117</td>
<td align="center">0.9117</td>
<td align="center">0.9142</td>
<td align="center">-</td>
<td align="center">0.9079</td>
</tr>
<tr>
<td align="center">hmean</td>
<td align="center">0.8622</td>
<td align="center">0.8622</td>
<td align="center">0.8622</td>
<td align="center">0.8634</td>
<td align="center">0.8637</td>
<td align="center">0.8648</td>
<td align="center">-</td>
<td align="center">0.8622</td>
</tr>
<tr>
<td align="center" rowspan="3"><a href="https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/psenet/psenet_resnet50_fpnf_600e_icdar2015.py">PSENet</a></td>
<td align="center" rowspan="3">TextDetection</td>
<td align="center" rowspan="3">ICDAR2015</td>
<td align="center">recall</td>
<td align="center">0.7526</td>
<td align="center">0.7526</td>
<td align="center">0.7526</td>
<td align="center">0.7526</td>
<td align="center">0.7520</td>
<td align="center">0.7496</td>
<td align="center">-</td>
<td align="center">0.7526</td>
</tr>
<tr>
<td align="center">precision</td>
<td align="center">0.8669</td>
<td align="center">0.8669</td>
<td align="center">0.8669</td>
<td align="center">0.8669</td>
<td align="center">0.8668</td>
<td align="center">0.8550</td>
<td align="center">-</td>
<td align="center">0.8669</td>
</tr>
<tr>
<td align="center">hmean</td>
<td align="center">0.8057</td>
<td align="center">0.8057</td>
<td align="center">0.8057</td>
<td align="center">0.8057</td>
<td align="center">0.8054</td>
<td align="center">0.7989</td>
<td align="center">-</td>
<td align="center">0.8057</td>
</tr>
<tr>
<td align="center" rowspan="3"><a href="https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015.py">PANet</a></td>
<td align="center" rowspan="3">TextDetection</td>
<td align="center" rowspan="3">ICDAR2015</td>
<td align="center">recall</td>
<td align="center">0.7401</td>
<td align="center">0.7401</td>
<td align="center">0.7401</td>
<td align="center">0.7357</td>
<td align="center">0.7366</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">0.7401</td>
</tr>
<tr>
<td align="center">precision</td>
<td align="center">0.8601</td>
<td align="center">0.8601</td>
<td align="center">0.8601</td>
<td align="center">0.8570</td>
<td align="center">0.8586</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">0.8601</td>
</tr>
<tr>
<td align="center">hmean</td>
<td align="center">0.7955</td>
<td align="center">0.7955</td>
<td align="center">0.7955</td>
<td align="center">0.7917</td>
<td align="center">0.7930</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">0.7955</td>
</tr>
<tr>
<td align="center" rowspan="3"><a href="https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/textsnake/textsnake_resnet50_fpn-unet_1200e_ctw1500.py">TextSnake</a></td>
<td align="center" rowspan="3">TextDetection</td>
<td align="center" rowspan="3">CTW1500</td>
<td align="center">recall</td>
<td align="center">0.8052</td>
<td align="center">0.8052</td>
<td align="center">0.8052</td>
<td align="center">0.8055</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">precision</td>
<td align="center">0.8535</td>
<td align="center">0.8535</td>
<td align="center">0.8535</td>
<td align="center">0.8538</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">hmean</td>
<td align="center">0.8286</td>
<td align="center">0.8286</td>
<td align="center">0.8286</td>
<td align="center">0.8290</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center" rowspan="3"><a href="https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/maskrcnn/mask-rcnn_resnet50_fpn_160e_icdar2015.py">MaskRCNN</a></td>
<td align="center" rowspan="3">TextDetection</td>
<td align="center" rowspan="3">ICDAR2015</td>
<td align="center">recall</td>
<td align="center">0.7766</td>
<td align="center">0.7766</td>
<td align="center">0.7766</td>
<td align="center">0.7766</td>
<td align="center">0.7761</td>
<td align="center">0.7670</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">precision</td>
<td align="center">0.8644</td>
<td align="center">0.8644</td>
<td align="center">0.8644</td>
<td align="center">0.8644</td>
<td align="center">0.8630</td>
<td align="center">0.8705</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">hmean</td>
<td align="center">0.8182</td>
<td align="center">0.8182</td>
<td align="center">0.8182</td>
<td align="center">0.8182</td>
<td align="center">0.8172</td>
<td align="center">0.8155</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/crnn/crnn_mini-vgg_5e_mj.py">CRNN</a></td>
<td align="center">TextRecognition</td>
<td align="center">IIIT5K</td>
<td align="center">acc</td>
<td align="center">0.8067</td>
<td align="center">0.8067</td>
<td align="center">0.8067</td>
<td align="center">0.8067</td>
<td align="center">0.8063</td>
<td align="center">0.8067</td>
<td align="center">0.8067</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/sar/sar_resnet31_parallel-decoder_5e_st-sub_mj-sub_sa_real.py">SAR</a></td>
<td align="center">TextRecognition</td>
<td align="center">IIIT5K</td>
<td align="center">acc</td>
<td align="center">0.9517</td>
<td align="center">-</td>
<td align="center">0.9287</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/satrn/satrn_shallow-small_5e_st_mj.py">SATRN</a></td>
<td align="center">TextRecognition</td>
<td align="center">IIIT5K</td>
<td align="center">acc</td>
<td align="center">0.9470</td>
<td align="center">0.9487</td>
<td align="center">0.9487</td>
<td align="center">0.9487</td>
<td align="center">0.9483</td>
<td align="center">0.9483</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/abinet/abinet_20e_st-an_mj.py">ABINet</a></td>
<td align="center">TextRecognition</td>
<td align="center">IIIT5K</td>
<td align="center">acc</td>
<td align="center">0.9603</td>
<td align="center">0.9563</td>
<td align="center">0.9563</td>
<td align="center">0.9573</td>
<td align="center">0.9507</td>
<td align="center">0.9510</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
</tbody>
</table>
</div>
<div style="margin-left: 25px;">
<table class="docutils">
<thead>
<tr>
<th align="center" colspan="3">mmseg</th>
<th align="center">Pytorch</th>
<th align="center">TorchScript</th>
<th align="center">ONNXRuntime</th>
<th align="center" colspan="3">TensorRT</th>
<th align="center">PPLNN</th>
<th align="center">Ascend</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">model</td>
<td align="center">dataset</td>
<td align="center">metric</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp16</td>
<td align="center">int8</td>
<td align="center">fp16</td>
<td align="center">fp32</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/fcn/fcn_r50-d8_4xb2-40k_cityscapes-512x1024.py">FCN</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">72.25</td>
<td align="center">72.36</td>
<td align="center">-</td>
<td align="center">72.36</td>
<td align="center">72.35</td>
<td align="center">74.19</td>
<td align="center">72.35</td>
<td align="center">72.35</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/pspnet/pspnet_r50-d8_4xb2-80k_cityscapes-512x1024.py">PSPNet</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">78.55</td>
<td align="center">78.66</td>
<td align="center">-</td>
<td align="center">78.26</td>
<td align="center">78.24</td>
<td align="center">77.97</td>
<td align="center">78.09</td>
<td align="center">78.67</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/deeplabv3/deeplabv3_r50-d8_4xb2-40k_cityscapes-512x1024.py">deeplabv3</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">79.09</td>
<td align="center">79.12</td>
<td align="center">-</td>
<td align="center">79.12</td>
<td align="center">79.12</td>
<td align="center">78.96</td>
<td align="center">79.12</td>
<td align="center">79.06</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/deeplabv3plus/deeplabv3plus_r50-d8_4xb2-40k_cityscapes-512x1024.py">deeplabv3+</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">79.61</td>
<td align="center">79.60</td>
<td align="center">-</td>
<td align="center">79.60</td>
<td align="center">79.60</td>
<td align="center">79.43</td>
<td align="center">79.60</td>
<td align="center">79.51</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/fastscnn/fast_scnn_8xb4-160k_cityscapes-512x1024.py">Fast-SCNN</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">70.96</td>
<td align="center">70.96</td>
<td align="center">-</td>
<td align="center">70.93</td>
<td align="center">70.92</td>
<td align="center">66.00</td>
<td align="center">70.92</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/unet/unet-s5-d16_fcn_4xb4-160k_cityscapes-512x1024.py">UNet</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">69.10</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">69.10</td>
<td align="center">69.10</td>
<td align="center">68.95</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/ann/ann_r50-d8_4xb2-40k_cityscapes-512x1024.py">ANN</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">77.40</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">77.32</td>
<td align="center">77.32</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/apcnet/apcnet_r50-d8_4xb2-40k_cityscapes-512x1024.py">APCNet</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">77.40</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">77.32</td>
<td align="center">77.32</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/bisenetv1/bisenetv1_r18-d32_4xb4-160k_cityscapes-1024x1024.py">BiSeNetV1</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">74.44</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">74.44</td>
<td align="center">74.43</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/bisenetv2/bisenetv2_fcn_4xb4-160k_cityscapes-1024x1024.py">BiSeNetV2</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">73.21</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">73.21</td>
<td align="center">73.21</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/cgnet/cgnet_fcn_4xb8-60k_cityscapes-512x1024.py">CGNet</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">68.25</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">68.27</td>
<td align="center">68.27</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/emanet/emanet_r50-d8_4xb2-80k_cityscapes-512x1024.py">EMANet</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">77.59</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">77.59</td>
<td align="center">77.6</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/encnet/encnet_r50-d8_4xb2-40k_cityscapes-512x1024.py">EncNet</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">75.67</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">75.66</td>
<td align="center">75.66</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/erfnet/erfnet_fcn_4xb4-160k_cityscapes-512x1024.py">ERFNet</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">71.08</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">71.08</td>
<td align="center">71.07</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/fastfcn/fastfcn_r50-d32_jpu_aspp_4xb2-80k_cityscapes-512x1024.py">FastFCN</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">79.12</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">79.12</td>
<td align="center">79.12</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/gcnet/gcnet_r50-d8_4xb2-40k_cityscapes-512x1024.py">GCNet</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">77.69</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">77.69</td>
<td align="center">77.69</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/icnet/icnet_r18-d8_4xb2-80k_cityscapes-832x832.py">ICNet</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">76.29</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">76.36</td>
<td align="center">76.36</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/isanet/isanet_r50-d8_4xb2-40k_cityscapes-512x1024.py">ISANet</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">78.49</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">78.49</td>
<td align="center">78.49</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/ocrnet/ocrnet_hr18s_4xb2-40k_cityscapes-512x1024.py">OCRNet</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">74.30</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">73.66</td>
<td align="center">73.67</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/point_rend/pointrend_r50_4xb2-80k_cityscapes-512x1024.py">PointRend</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">76.47</td>
<td align="center">76.47</td>
<td align="center">-</td>
<td align="center">76.41</td>
<td align="center">76.42</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/sem_fpn/fpn_r50_4xb2-80k_cityscapes-512x1024.py">Semantic FPN</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">74.52</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">74.52</td>
<td align="center">74.52</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/stdc/stdc1_in1k-pre_4xb12-80k_cityscapes-512x1024.py">STDC</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">75.10</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">75.10</td>
<td align="center">75.10</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/stdc/stdc2_in1k-pre_4xb12-80k_cityscapes-512x1024.py">STDC</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">77.17</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">77.17</td>
<td align="center">77.17</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/tree/main/configs/upernet/upernet_r50_4xb2-40k_cityscapes-512x1024.py">UPerNet</a></td>
<td align="center">Cityscapes</td>
<td align="center">mIoU</td>
<td align="center">77.10</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">77.19</td>
<td align="center">77.18</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmsegmentation/blob/main/configs/segmenter/segmenter_vit-s_fcn_8xb1-160k_ade20k-512x512.py">Segmenter</a></td>
<td align="center">ADE20K</td>
<td align="center">mIoU</td>
<td align="center">44.32</td>
<td align="center">44.29</td>
<td align="center">44.29</td>
<td align="center">44.29</td>
<td align="center">43.34</td>
<td align="center">43.35</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
</tbody>
</table>
</div>
<div style="margin-left: 25px;">
<table class="docutils">
<thead>
<tr>
<th align="center" colspan="4">mmpose</th>
<th align="center">Pytorch</th>
<th align="center">ONNXRuntime</th>
<th align="center" colspan="2">TensorRT</th>
<th align="center">PPLNN</th>
<th align="center">OpenVINO</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">model</td>
<td align="center">task</td>
<td align="center">dataset</td>
<td align="center">metric</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp16</td>
<td align="center">fp16</td>
<td align="center">fp32</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmpose/blob/main/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w48_8xb32-210e_coco-256x192.py">HRNet</a></td>
<td align="center" rowspan="2">Pose Detection</td>
<td align="center" rowspan="2">COCO</td>
<td align="center">AP</td>
<td align="center">0.748</td>
<td align="center">0.748</td>
<td align="center">0.748</td>
<td align="center">0.748</td>
<td align="center">-</td>
<td align="center">0.748</td>
</tr>
<tr>
<td align="center">AR</td>
<td align="center">0.802</td>
<td align="center">0.802</td>
<td align="center">0.802</td>
<td align="center">0.802</td>
<td align="center">-</td>
<td align="center">0.802</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmpose/blob/main/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_litehrnet-30_8xb64-210e_coco-256x192.py">LiteHRNet</a></td>
<td align="center" rowspan="2">Pose Detection</td>
<td align="center" rowspan="2">COCO</td>
<td align="center">AP</td>
<td align="center">0.663</td>
<td align="center">0.663</td>
<td align="center">0.663</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">0.663</td>
</tr>
<tr>
<td align="center">AR</td>
<td align="center">0.728</td>
<td align="center">0.728</td>
<td align="center">0.728</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">0.728</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmpose/blob/main/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_4xmspn50_8xb32-210e_coco-256x192.py">MSPN</a></td>
<td align="center" rowspan="2">Pose Detection</td>
<td align="center" rowspan="2">COCO</td>
<td align="center">AP</td>
<td align="center">0.762</td>
<td align="center">0.762</td>
<td align="center">0.762</td>
<td align="center">0.762</td>
<td align="center">-</td>
<td align="center">0.762</td>
</tr>
<tr>
<td align="center">AR</td>
<td align="center">0.825</td>
<td align="center">0.825</td>
<td align="center">0.825</td>
<td align="center">0.825</td>
<td align="center">-</td>
<td align="center">0.825</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmpose/blob/main/configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hourglass52_8xb32-210e_coco-256x256.py">Hourglass</a></td>
<td align="center" rowspan="2">Pose Detection</td>
<td align="center" rowspan="2">COCO</td>
<td align="center">AP</td>
<td align="center">0.717</td>
<td align="center">0.717</td>
<td align="center">0.717</td>
<td align="center">0.717</td>
<td align="center">-</td>
<td align="center">0.717</td>
</tr>
<tr>
<td align="center">AR</td>
<td align="center">0.774</td>
<td align="center">0.774</td>
<td align="center">0.774</td>
<td align="center">0.774</td>
<td align="center">-</td>
<td align="center">0.774</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmpose/blob/main/configs/body_2d_keypoint/simcc/coco/simcc_mobilenetv2_wo-deconv-8xb64-210e_coco-256x192.py">SimCC</a></td>
<td align="center" rowspan="2">Pose Detection</td>
<td align="center" rowspan="2">COCO</td>
<td align="center">AP</td>
<td align="center">0.607</td>
<td align="center">-</td>
<td align="center">0.608</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">AR</td>
<td align="center">0.668</td>
<td align="center">-</td>
<td align="center">0.672</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
</tbody>
</table>
</div>
<div style="margin-left: 25px;">
<table class="docutils">
<thead>
<tr>
<th align="center" colspan="4">mmrotate</th>
<th align="center">Pytorch</th>
<th align="center">ONNXRuntime</th>
<th align="center" colspan="2">TensorRT</th>
<th align="center">PPLNN</th>
<th align="center">OpenVINO</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">model</td>
<td align="center">task</td>
<td align="center">dataset</td>
<td align="center">metrics</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp16</td>
<td align="center">fp16</td>
<td align="center">fp32</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmrotate/tree/main/configs/rotated_retinanet/rotated-retinanet-hbox-oc_r50_fpn_1x_dota.py">RotatedRetinaNet</a></td>
<td align="center">Rotated Detection</td>
<td align="center">DOTA-v1.0</td>
<td align="center">mAP</td>
<td align="center">0.698</td>
<td align="center">0.698</td>
<td align="center">0.698</td>
<td align="center">0.697</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmrotate/tree/1.x/configs/oriented_rcnn/oriented-rcnn-le90_r50_fpn_1x_dota.py">Oriented RCNN</a></td>
<td align="center">Rotated Detection</td>
<td align="center">DOTA-v1.0</td>
<td align="center">mAP</td>
<td align="center">0.756</td>
<td align="center">0.756</td>
<td align="center">0.758</td>
<td align="center">0.730</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmrotate/blob/1.x/configs/gliding_vertex/gliding-vertex-rbox_r50_fpn_1x_dota.py">GlidingVertex</a></td>
<td align="center">Rotated Detection</td>
<td align="center">DOTA-v1.0</td>
<td align="center">mAP</td>
<td align="center">0.732</td>
<td align="center">-</td>
<td align="center">0.733</td>
<td align="center">0.731</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center"><a href="https://github.com/open-mmlab/mmrotate/blob/1.x/configs/roi_trans/roi-trans-le90_r50_fpn_1x_dota.py">RoI Transformer</a></td>
<td align="center">Rotated Detection</td>
<td align="center">DOTA-v1.0</td>
<td align="center">mAP</td>
<td align="center">0.761</td>
<td align="center">-</td>
<td align="center">0.758</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
</tbody>
</table>
</div>
<div style="margin-left: 25px;">
<table class="docutils">
<thead>
<tr>
<th align="center" colspan="4">mmaction2</th>
<th align="center">Pytorch</th>
<th align="center">ONNXRuntime</th>
<th align="center" colspan="2">TensorRT</th>
<th align="center">PPLNN</th>
<th align="center">OpenVINO</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">model</td>
<td align="center">task</td>
<td align="center">dataset</td>
<td align="center">metrics</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp32</td>
<td align="center">fp16</td>
<td align="center">fp16</td>
<td align="center">fp32</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmaction2/blob/main/configs/recognition/tsn/tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py">TSN</a></td>
<td align="center" rowspan="2">Recognition</td>
<td align="center" rowspan="2">Kinetics-400</td>
<td align="center">top-1</td>
<td align="center">69.71</td>
<td align="center">-</td>
<td align="center">69.71</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">top-5</td>
<td align="center">88.75</td>
<td align="center">-</td>
<td align="center">88.75</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center" rowspan="2"><a href="https://github.com/open-mmlab/mmaction2/blob/main/configs/recognition/slowfast/slowfast_r50_8xb8-4x16x1-256e_kinetics400-rgb.py">SlowFast</a></td>
<td align="center" rowspan="2">Recognition</td>
<td align="center" rowspan="2">Kinetics-400</td>
<td align="center">top-1</td>
<td align="center">74.45</td>
<td align="center">-</td>
<td align="center">75.62</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
<tr>
<td align="center">top-5</td>
<td align="center">91.55</td>
<td align="center">-</td>
<td align="center">92.10</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
</tr>
</tbody>
</table>
</div>
## Notes

- Since some codebases (e.g. MMDet) contain dataset images of various resolutions, the speed benchmarks are obtained with static configs in MMDeploy, while the accuracy benchmarks use dynamic configs.
- Some TensorRT int8 benchmarks require an NVIDIA GPU with Tensor Cores; otherwise performance drops significantly.
- DBNet uses `nearest` interpolation in its `neck`, for which TensorRT-7 applies a strategy entirely different from PyTorch's. For compatibility with TensorRT-7, we rewrote the `neck` to use `bilinear` interpolation, which improves detection performance. To get accuracy matching PyTorch, TensorRT-8+ is recommended, since its interpolation behaves the same as PyTorch's.
- For mmpose models, `flip_test` must be set to `False` in the model config.
- Some models may suffer significant accuracy loss in fp16 mode; adjust the model on a case-by-case basis.
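The `flip_test` note above can be illustrated with a minimal config excerpt. This is a hypothetical sketch: the nesting (`model` → `test_cfg` → `flip_test`) follows common mmpose topdown configs, but verify the exact keys against your actual config file.

```python
# Hypothetical excerpt of an mmpose model config prepared for deployment.
# The key layout is sketched after common mmpose topdown configs; check it
# against the real config you are converting.
model = dict(
    type='TopdownPoseEstimator',
    test_cfg=dict(
        flip_test=False,  # must be False before converting the model
    ),
)

print(model['test_cfg']['flip_test'])  # False
```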
# Test Results on Edge Devices

This section gives our test conclusions on edge devices. You can obtain results for your own environment directly via [model profiling](../02-how-to-run/profile_model.md).

## Software and Hardware Environment

- host OS: Ubuntu 18.04
- backend: SNPE-1.59
- device: Mi 11 (Qualcomm 888)
## mmpretrain Models
| model | dataset | spatial | fp32 top-1 (%) | snpe gpu hybrid fp32 top-1 (%) | latency (ms) |
| :----------------------------------------------------------------------------------------------------------------------: | :---------: | :-----: | :------------: | :----------------------------: | :----------: |
| [ShuffleNetV2](https://github.com/open-mmlab/mmpretrain/blob/main/configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py) | ImageNet-1k | 224x224 | 69.55 | 69.83\* | 20±7 |
| [MobilenetV2](https://github.com/open-mmlab/mmpretrain/blob/main/configs/mobilenet_v2/mobilenet-v2_8xb32_in1k.py) | ImageNet-1k | 224x224 | 71.86 | 72.14\* | 15±6 |
Tips:

1. The ImageNet-1k dataset is large, so only a subset (8000/50000 images) was used for testing.
2. Edge devices throttle under thermal load, so latency fluctuates in practice. The numbers here are the stable values after running for a while, which better reflect real-world usage.
## mmocr Detection
| model | dataset | spatial | fp32 hmean | snpe gpu hybrid hmean | latency(ms) |
| :--------------------------------------------------------------------------------------------------------------------: | :-------: | :------: | :--------: | :-------------------: | :---------: |
| [PANet](https://github.com/open-mmlab/mmocr/blob/main/configs/textdet/panet/panet_resnet18_fpem-ffm_600e_icdar2015.py) | ICDAR2015 | 1312x736 | 0.795 | 0.785 @thr=0.9 | 3100±100 |
## mmpose Models
| model | dataset | spatial | snpe hybrid AR@IoU=0.50 | snpe hybrid AP@IoU=0.50 | latency(ms) |
| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------: | :-----: | :---------------------: | :---------------------: | :---------: |
| [pose_hrnet_w32](https://github.com/open-mmlab/mmpose/blob/main/configs/animal_2d_keypoint/topdown_heatmap/animalpose/td-hm_hrnet-w32_8xb64-210e_animalpose-256x256.py) | Animalpose | 256x256 | 0.997 | 0.989 | 630±50 |
Tips:

- pose_hrnet was tested on the test dataset of AnimalPose, not the val dataset.
## mmseg
| model | dataset | spatial | mIoU | latency(ms) |
| :------------------------------------------------------------------------------------------------------------------: | :--------: | :------: | :---: | :---------: |
| [fcn](https://github.com/open-mmlab/mmsegmentation/blob/main/configs/fcn/fcn_r18-d8_4xb2-80k_cityscapes-512x1024.py) | Cityscapes | 512x1024 | 71.11 | 4915±500 |
Tips:

- fcn runs fine at 512x1024. The native 1024x2048 resolution of Cityscapes causes the device to reboot.
## Other Models

- mmdet models need to be manually split into two parts, because:
  - `onnx_to_ir.py` in the SNPE source can only parse inputs, and `ir_to_dlc.py` does not yet support topk
  - UDO (user-defined operators) cannot be used together with `snpe-onnx-to-dlc`
- mmagic models:
  - srcnn requires cubic resize, which SNPE does not support
  - esrgan converts successfully, but loading the model reboots the device
- mmrotate depends on [e2cnn](https://pypi.org/project/e2cnn/); its [Python 3.6-compatible branch](https://github.com/QUVA-Lab/e2cnn) must be installed manually
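As a sketch of how the mmdet model split mentioned above might be done offline, ONNX's model extractor can cut a graph at a named intermediate tensor. This is a toy example, not an actual mmdet export: the tensor names `X`/`Y`/`Z` and the two-node graph are illustrative only.

```python
import onnx
import onnx.utils
from onnx import TensorProto, helper

# Build a toy two-node model: X -> Relu -> Y -> Sigmoid -> Z.
x = helper.make_tensor_value_info('X', TensorProto.FLOAT, [1, 3])
z = helper.make_tensor_value_info('Z', TensorProto.FLOAT, [1, 3])
graph = helper.make_graph(
    [helper.make_node('Relu', ['X'], ['Y']),
     helper.make_node('Sigmoid', ['Y'], ['Z'])],
    'toy', [x], [z])
onnx.save(helper.make_model(graph), 'full.onnx')

# Cut the graph at the intermediate tensor Y: everything up to Y goes into
# part1.onnx, everything after Y into part2.onnx.
onnx.utils.extract_model('full.onnx', 'part1.onnx', ['X'], ['Y'])
onnx.utils.extract_model('full.onnx', 'part2.onnx', ['Y'], ['Z'])

print([n.op_type for n in onnx.load('part1.onnx').graph.node])
```

For a real mmdet export you would cut just before the unsupported topk node, then feed each part to `snpe-onnx-to-dlc` separately and stitch the outputs together on the host.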
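Since SNPE lacks cubic resize, one workaround for srcnn-style models is to run the bicubic upsampling on the host before feeding the device. A minimal sketch with Pillow; the function name and `scale` parameter are illustrative, not part of mmdeploy:

```python
import numpy as np
from PIL import Image


def bicubic_resize(img: np.ndarray, scale: int = 4) -> np.ndarray:
    """Upsample an HxWxC uint8 image with bicubic interpolation on the host."""
    h, w = img.shape[:2]
    out = Image.fromarray(img).resize((w * scale, h * scale), Image.BICUBIC)
    return np.asarray(out)


# Pre-upsample a low-resolution input before handing it to the device.
lr = np.zeros((16, 16, 3), dtype=np.uint8)
sr_in = bicubic_resize(lr, scale=4)
print(sr_in.shape)  # (64, 64, 3)
```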