- `#assertion/root/workspace/mmdeploy/csrc/backend_ops/tensorrt/batched_nms/trt_batched_nms.cpp,98` or `pre_top_k need to be reduced for devices with arch 7.2`
1. Log in to the <a href="https://www.nvidia.com/">NVIDIA official website</a> and choose and download the TensorRT tar file from <a href="https://developer.nvidia.com/nvidia-tensorrt-download">here</a>. Make sure it matches your machine's CPU architecture and CUDA version.<br>
2. Here is an example of installing TensorRT 8.2 GA Update 2 on Linux x86_64 with CUDA 11.x, for your reference. First, click <a href="https://developer.nvidia.com/compute/machine-learning/tensorrt/secure/8.2.3.0/tars/tensorrt-8.2.3.0.linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz">here</a> to download CUDA 11.x TensorRT 8.2.3.0. Then, install and configure TensorRT and its dependencies with the commands below.
<pre><code>
cd /the/path/of/tensorrt/tar/gz/file
tar -zxvf TensorRT-8.2.3.0.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz
</code></pre>
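After unpacking, install the bundled Python wheel and expose the libraries to the dynamic loader. A minimal sketch, assuming the layout of the 8.2.3.0 tarball (pick the wheel that matches your Python version; <code>cp38</code> below is an assumption):
<pre><code>
cd TensorRT-8.2.3.0
pip install python/tensorrt-8.2.3.0-cp38-none-linux_x86_64.whl
export TENSORRT_DIR=$(pwd)
export LD_LIBRARY_PATH=$TENSORRT_DIR/lib:$LD_LIBRARY_PATH
</code></pre>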
1. Download libtorch from <a href="https://pytorch.org/get-started/locally/">here</a>. Please note that only the <b>Pre-cxx11 ABI</b> builds of <b>version 1.8.1+</b> on the Linux platform are supported for now. Earlier libtorch versions can be found in this <a href="https://github.com/pytorch/pytorch/issues/40961#issuecomment-1017317786">issue comment</a>. <br>
2. Take libtorch 1.8.1+cu111 as an example. You can install it like this:
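(A sketch, assuming the standard libtorch download URL naming for the pre-cxx11 ABI and that the build reads <code>Torch_DIR</code>; adjust the CUDA tag to your setup.)
<pre><code>
wget https://download.pytorch.org/libtorch/cu111/libtorch-shared-with-deps-1.8.1%2Bcu111.zip
unzip libtorch-shared-with-deps-1.8.1+cu111.zip
cd libtorch
export Torch_DIR=$(pwd)
export LD_LIBRARY_PATH=$Torch_DIR/lib:$LD_LIBRARY_PATH
</code></pre>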
1. Log in to the <a href="https://www.nvidia.com/">NVIDIA official website</a> and choose and download the TensorRT tar file from <a href="https://developer.nvidia.com/nvidia-tensorrt-download">here</a>. Make sure it matches your machine's CPU architecture and CUDA version. You can refer to this <a href="https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#installing-tar">guide</a> to install TensorRT.<br>
2. Here is an example of installing TensorRT 8.2 GA Update 2 on Windows x86_64 with CUDA 11.x, for your reference. First, click <a href="https://developer.nvidia.com/compute/machine-learning/tensorrt/secure/8.2.3.0/zip/TensorRT-8.2.3.0.Windows10.x86_64.cuda-11.4.cudnn8.2.zip">here</a> to download CUDA 11.x TensorRT 8.2.3.0. Then, install and configure TensorRT and its dependencies with the commands below.
- `deploy_cfg` : The path of the deploy config file in MMDeploy codebase.
- `model_cfg` : The path of the model config file in the OpenMMLab codebase.
- `checkpoint` : The path of the model checkpoint file.
- `img` : The path of the image file used to convert the model.
- `--work-dir` : The directory to save the output ONNX model. Default is `./work-dir`.
- `--device` : The device used for conversion. If not specified, it will be set to `cpu`.
- `--log-level` : The log level, one of `'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'`. If not specified, it will be set to `INFO`.
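
A usage sketch assembled from the arguments above (assuming the script is `tools/torch2onnx.py`, by analogy with the other tools in this document; all paths are placeholders):

```bash
python tools/torch2onnx.py \
    ${DEPLOY_CFG} \
    ${MODEL_CFG} \
    ${CHECKPOINT} \
    ${IMG} \
    --work-dir ${WORK_DIR} \
    --device cpu \
    --log-level INFO
```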
- `input_model` : The path of the input ONNX model. The output ONNX model will be extracted from this model.
- `output_model` : The path of the output ONNX model.
- `--start` : The start point of the extracted model, in the format `<function_name>:<input/output>`. The `function_name` comes from the `@mark` decorator.
- `--end` : The end point of the extracted model, in the format `<function_name>:<input/output>`. The `function_name` comes from the `@mark` decorator.
- `--log-level` : The log level, one of `'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'`. If not specified, it will be set to `INFO`.
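
A usage sketch assembled from the arguments above (assuming the script is `tools/extract.py`; all paths are placeholders):

```bash
python tools/extract.py \
    ${INPUT_MODEL} \
    ${OUTPUT_MODEL} \
    --start ${START} \
    --end ${END} \
    --log-level INFO
```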
### Notes
To support model partitioning, mark nodes must be added in the ONNX model via the `@mark` decorator.
In the example below, `multiclass_nms` is marked; setting `end=multiclass_nms:input` extracts the subgraph that ends right before NMS.
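
As an illustration, marking in the Python source looks roughly like this; the sketch assumes the `mark` decorator exported by `mmdeploy.core`, and the exact signature may differ across versions:

```python
from mmdeploy.core import mark

# Marking a function turns its inputs/outputs into candidate cut points.
# With this mark in place, `--end multiclass_nms:input` extracts the
# subgraph that ends right before NMS.
@mark('multiclass_nms', inputs=['boxes', 'scores'], outputs=['dets', 'labels'])
def multiclass_nms(boxes, scores, *args, **kwargs):
    ...
```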
- `onnx_path`: The path of the `ONNX` model to convert.
- `output_path`: The converted `PPLNN` algorithm path in json format.
- `device`: The device of the model during conversion.
- `opt-shapes`: Optimal shapes for PPLNN optimization. The shape of each tensor should be wrapped with "[]" or "()", and the shapes of tensors should be separated by ",".
- `--log-level`: The log level, one of `'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'`. If not specified, it will be set to `INFO`.
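
A usage sketch assembled from the arguments above (assuming the script is `tools/onnx2pplnn.py` and that `device` and `opt-shapes` are passed as optional flags):

```bash
python tools/onnx2pplnn.py \
    ${ONNX_PATH} \
    ${OUTPUT_PATH} \
    --device cuda:0 \
    --opt-shapes "[224,224]" \
    --log-level INFO
```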
## onnx2tensorrt
This tool converts ONNX files into the TensorRT `.engine` format.
### Usage
```bash
python tools/onnx2tensorrt.py \
${DEPLOY_CFG} \
${ONNX_PATH} \
${OUTPUT} \
--device-id 0 \
--log-level INFO \
--calib-file /path/to/file
```
### Description of all arguments
- `deploy_cfg` : The path of the deploy config file in MMDeploy codebase.
- `onnx_path` : The path of the ONNX model to convert.
- `output` : The path of the output TensorRT engine.
- `--device-id` : The device index. Defaults to `0`.
- `--calib-file` : The calibration data used to calibrate the engine to int8.
- `--log-level` : The log level, one of `'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'`. If not specified, it will be set to `INFO`.
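
For instance, converting a detection model could look like the following sketch; the deploy config and file paths are illustrative:

```bash
python tools/onnx2tensorrt.py \
    configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
    work-dir/end2end.onnx \
    work-dir/end2end.engine \
    --device-id 0 \
    --log-level INFO
```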
## onnx2ncnn
This tool converts ONNX models to the ncnn format.
### Usage
```bash
python tools/onnx2ncnn.py \
${ONNX_PATH} \
${NCNN_PARAM} \
${NCNN_BIN} \
--log-level INFO
```
### Description of all arguments
- `onnx_path` : The path of the `ONNX` model to convert from.
- `output_param` : The converted `ncnn` param path.
- `output_bin` : The converted `ncnn` bin path.
- `--log-level` : The log level, one of `'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'`. If not specified, it will be set to `INFO`.
## profiler
This tool measures the inference speed of backends such as torch and TensorRT. Note that the measurement does not include pre- or post-processing.
### Usage
```bash
python tools/profiler.py \
${DEPLOY_CFG} \
${MODEL_CFG} \
${IMAGE_DIR} \
--model ${MODEL} \
--device ${DEVICE} \
--shape ${SHAPE} \
--num-iter ${NUM_ITER} \
--warmup ${WARMUP} \
--cfg-options ${CFG_OPTIONS} \
--batch-size ${BATCH_SIZE} \
--img-ext ${IMG_EXT}
```
### Description of all arguments
- `deploy_cfg` : The path of the deploy config file in MMDeploy codebase.
- `model_cfg` : The path of the model config file in the OpenMMLab codebase.
- `image_dir` : The directory of images used to test the model.
- `--model` : The path of the model to be tested.
- `--shape` : The input shape of the model as `HxW`, e.g., `800x1344`. If not specified, it uses `input_shape` from the deploy config.
- `--num-iter` : Number of iterations to run inference. Default is `100`.
- `--warmup` : Number of iterations to warm up the machine. Default is `10`.
- `--device` : The device type. If not specified, it will be set to `cuda:0`.
- `--cfg-options` : Optional key-value pairs to override the model config.
- `--batch-size`: The batch size for test inference. Default is `1`. Note that not all models support `batch_size > 1`.
- `--img-ext`: The file extensions of input images in `image_dir`. Defaults to `['.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif']`.
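
For example, profiling a TensorRT classification model could look like the following sketch; the deploy config, model config, and engine paths are illustrative:

```bash
python tools/profiler.py \
    configs/mmcls/classification_tensorrt_dynamic-224x224-224x224.py \
    ../mmclassification/configs/resnet/resnet18_8xb32_in1k.py \
    ../mmclassification/demo/ \
    --model work-dirs/mmcls/resnet/trt/end2end.engine \
    --device cuda \
    --shape 224x224 \
    --num-iter 100 \
    --warmup 10 \
    --batch-size 1
```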
<!-- This tutorial describes how to write a config for model conversion and deployment. A deployment config includes `onnx config`, `codebase config`, `backend config`. -->
- `dynamic or static`: dynamic or static shape export. Note: if the inference backend requires explicit shape information, you need to add a description of the input size in the format `height x width`. For example, `dynamic-512x1024-2048x2048` means the minimum input shape is `512x1024` and the maximum input shape is `2048x2048`.