Commit c6a27e0b authored by panhb's avatar panhb
Browse files

first init

parent e4b993b1
Pipeline #2192 canceled with stages
# PicoDet MNN Demo
本Demo提供的预测代码是根据[Alibaba's MNN framework](https://github.com/alibaba/MNN) 推理库预测的。
## C++ Demo
- 第一步:根据[MNN官方编译文档](https://www.yuque.com/mnn/en/build_linux) 编译生成预测库.
- 第二步:编译或下载得到OpenCV库,可参考OpenCV官网,为了方便如果环境是gcc8.2 x86环境,可直接下载以下库:
```shell
wget https://paddledet.bj.bcebos.com/data/opencv-3.4.16_gcc8.2_ffmpeg.tar.gz
tar -xf opencv-3.4.16_gcc8.2_ffmpeg.tar.gz
```
- 第三步:准备模型
```shell
modelName=picodet_s_320_coco_lcnet
# 导出Inference model
python tools/export_model.py \
-c configs/picodet/${modelName}.yml \
-o weights=${modelName}.pdparams \
--output_dir=inference_model
# 转换到ONNX
paddle2onnx --model_dir inference_model/${modelName} \
--model_filename model.pdmodel \
--params_filename model.pdiparams \
--opset_version 11 \
--save_file ${modelName}.onnx
# 简化模型
python -m onnxsim ${modelName}.onnx ${modelName}_processed.onnx
# 将模型转换至MNN格式
python -m MNN.tools.mnnconvert -f ONNX --modelFile picodet_s_320_lcnet_processed.onnx --MNNModel picodet_s_320_lcnet.mnn
```
为了快速测试,可直接下载:[picodet_s_320_lcnet.mnn](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_lcnet.mnn)(不带后处理)。
**注意:**由于MNN里,Matmul算子的输入shape如果不一致计算有问题,带后处理的Demo正在升级中,很快发布。
## 编译可执行程序
- 第一步:导入lib包
```
mkdir mnn && cd mnn && mkdir lib
cp /path/to/MNN/build/libMNN.so .
cd ..
cp -r /path/to/MNN/include .
```
- 第二步:修改CMakeLists.txt中OpenCV和MNN的路径
- 第三步:开始编译
``` shell
mkdir build && cd build
cmake ..
make
```
如果在build目录下生成`picodet-mnn`可执行文件,就证明成功了。
## 开始运行
首先新建预测结果存放目录:
```shell
cp -r ../demo_onnxruntime/imgs .
cd build
mkdir ../results
```
- 预测一张图片
``` shell
./picodet-mnn 0 ../picodet_s_320_lcnet_3.mnn 320 320 ../imgs/dog.jpg
```
-测试速度Benchmark
``` shell
./picodet-mnn 1 ../picodet_s_320_lcnet.mnn 320 320
```
## FAQ
- 预测结果精度不对:
请先确认模型输入shape是否对齐,并且模型输出name是否对齐,不带后处理的PicoDet增强版模型输出name如下:
```shell
# 分类分支 | 检测分支
{"transpose_0.tmp_0", "transpose_1.tmp_0"},
{"transpose_2.tmp_0", "transpose_3.tmp_0"},
{"transpose_4.tmp_0", "transpose_5.tmp_0"},
{"transpose_6.tmp_0", "transpose_7.tmp_0"},
```
可使用[netron](https://netron.app)查看具体name,并修改`picodet_mnn.hpp`中相应`non_postprocess_heads_info`数组。
## Reference
[MNN](https://github.com/alibaba/MNN)
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "picodet_mnn.hpp"
#include <iostream>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#define __SAVE_RESULT__ // if defined save drawed results to ../results, else
// show it in windows
struct object_rect {
int x;
int y;
int width;
int height;
};
std::vector<int> GenerateColorMap(int num_class) {
auto colormap = std::vector<int>(3 * num_class, 0);
for (int i = 0; i < num_class; ++i) {
int j = 0;
int lab = i;
while (lab) {
colormap[i * 3] |= (((lab >> 0) & 1) << (7 - j));
colormap[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j));
colormap[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j));
++j;
lab >>= 3;
}
}
return colormap;
}
void draw_bboxes(const cv::Mat &im, const std::vector<BoxInfo> &bboxes,
std::string save_path = "None") {
static const char *class_names[] = {
"person", "bicycle", "car",
"motorcycle", "airplane", "bus",
"train", "truck", "boat",
"traffic light", "fire hydrant", "stop sign",
"parking meter", "bench", "bird",
"cat", "dog", "horse",
"sheep", "cow", "elephant",
"bear", "zebra", "giraffe",
"backpack", "umbrella", "handbag",
"tie", "suitcase", "frisbee",
"skis", "snowboard", "sports ball",
"kite", "baseball bat", "baseball glove",
"skateboard", "surfboard", "tennis racket",
"bottle", "wine glass", "cup",
"fork", "knife", "spoon",
"bowl", "banana", "apple",
"sandwich", "orange", "broccoli",
"carrot", "hot dog", "pizza",
"donut", "cake", "chair",
"couch", "potted plant", "bed",
"dining table", "toilet", "tv",
"laptop", "mouse", "remote",
"keyboard", "cell phone", "microwave",
"oven", "toaster", "sink",
"refrigerator", "book", "clock",
"vase", "scissors", "teddy bear",
"hair drier", "toothbrush"};
cv::Mat image = im.clone();
int src_w = image.cols;
int src_h = image.rows;
int thickness = 2;
auto colormap = GenerateColorMap(sizeof(class_names));
for (size_t i = 0; i < bboxes.size(); i++) {
const BoxInfo &bbox = bboxes[i];
std::cout << bbox.x1 << ". " << bbox.y1 << ". " << bbox.x2 << ". "
<< bbox.y2 << ". " << std::endl;
int c1 = colormap[3 * bbox.label + 0];
int c2 = colormap[3 * bbox.label + 1];
int c3 = colormap[3 * bbox.label + 2];
cv::Scalar color = cv::Scalar(c1, c2, c3);
// cv::Scalar color = cv::Scalar(0, 0, 255);
cv::rectangle(image, cv::Rect(cv::Point(bbox.x1, bbox.y1),
cv::Point(bbox.x2, bbox.y2)),
color, 1, cv::LINE_AA);
char text[256];
sprintf(text, "%s %.1f%%", class_names[bbox.label], bbox.score * 100);
int baseLine = 0;
cv::Size label_size =
cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine);
int x = bbox.x1;
int y = bbox.y1 - label_size.height - baseLine;
if (y < 0)
y = 0;
if (x + label_size.width > image.cols)
x = image.cols - label_size.width;
cv::rectangle(image, cv::Rect(cv::Point(x, y),
cv::Size(label_size.width,
label_size.height + baseLine)),
color, -1);
cv::putText(image, text, cv::Point(x, y + label_size.height),
cv::FONT_HERSHEY_SIMPLEX, 0.4, cv::Scalar(255, 255, 255), 1,
cv::LINE_AA);
}
if (save_path == "None") {
cv::imshow("image", image);
} else {
cv::imwrite(save_path, image);
std::cout << save_path << std::endl;
}
}
int image_demo(PicoDet &detector, const char *imagepath) {
std::vector<cv::String> filenames;
cv::glob(imagepath, filenames, false);
for (auto img_name : filenames) {
cv::Mat image = cv::imread(img_name, cv::IMREAD_COLOR);
if (image.empty()) {
fprintf(stderr, "cv::imread %s failed\n", img_name.c_str());
return -1;
}
std::vector<BoxInfo> results;
detector.detect(image, results, false);
std::cout << "detect done." << std::endl;
#ifdef __SAVE_RESULT__
std::string save_path = img_name;
draw_bboxes(image, results, save_path.replace(3, 4, "results"));
#else
draw_bboxes(image, results);
cv::waitKey(0);
#endif
}
return 0;
}
int benchmark(PicoDet &detector, int width, int height) {
int loop_num = 100;
int warm_up = 8;
double time_min = DBL_MAX;
double time_max = -DBL_MAX;
double time_avg = 0;
cv::Mat image(width, height, CV_8UC3, cv::Scalar(1, 1, 1));
for (int i = 0; i < warm_up + loop_num; i++) {
auto start = std::chrono::steady_clock::now();
std::vector<BoxInfo> results;
detector.detect(image, results, false);
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> elapsed = end - start;
double time = elapsed.count();
if (i >= warm_up) {
time_min = (std::min)(time_min, time);
time_max = (std::max)(time_max, time);
time_avg += time;
}
}
time_avg /= loop_num;
fprintf(stderr, "%20s min = %7.2f max = %7.2f avg = %7.2f\n", "picodet",
time_min, time_max, time_avg);
return 0;
}
int main(int argc, char **argv) {
int mode = atoi(argv[1]);
std::string model_path = argv[2];
int height = 320;
int width = 320;
if (argc == 4) {
height = atoi(argv[3]);
width = atoi(argv[4]);
}
PicoDet detector = PicoDet(model_path, width, height, 4, 0.45, 0.3);
if (mode == 1) {
benchmark(detector, width, height);
} else {
if (argc != 5) {
std::cout << "Must set image file, such as ./picodet-mnn 0 "
"../picodet_s_320_lcnet.mnn 320 320 img.jpg"
<< std::endl;
}
const char *images = argv[5];
image_demo(detector, images);
}
}
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_mnn
#include "picodet_mnn.hpp"
using namespace std;
PicoDet::PicoDet(const std::string &mnn_path, int input_width, int input_length,
int num_thread_, float score_threshold_,
float nms_threshold_) {
num_thread = num_thread_;
in_w = input_width;
in_h = input_length;
score_threshold = score_threshold_;
nms_threshold = nms_threshold_;
PicoDet_interpreter = std::shared_ptr<MNN::Interpreter>(
MNN::Interpreter::createFromFile(mnn_path.c_str()));
MNN::ScheduleConfig config;
config.numThread = num_thread;
MNN::BackendConfig backendConfig;
backendConfig.precision = (MNN::BackendConfig::PrecisionMode)2;
config.backendConfig = &backendConfig;
PicoDet_session = PicoDet_interpreter->createSession(config);
input_tensor = PicoDet_interpreter->getSessionInput(PicoDet_session, nullptr);
}
PicoDet::~PicoDet() {
PicoDet_interpreter->releaseModel();
PicoDet_interpreter->releaseSession(PicoDet_session);
}
int PicoDet::detect(cv::Mat &raw_image, std::vector<BoxInfo> &result_list,
bool has_postprocess) {
if (raw_image.empty()) {
std::cout << "image is empty ,please check!" << std::endl;
return -1;
}
image_h = raw_image.rows;
image_w = raw_image.cols;
cv::Mat image;
cv::resize(raw_image, image, cv::Size(in_w, in_h));
PicoDet_interpreter->resizeTensor(input_tensor, {1, 3, in_h, in_w});
PicoDet_interpreter->resizeSession(PicoDet_session);
std::shared_ptr<MNN::CV::ImageProcess> pretreat(MNN::CV::ImageProcess::create(
MNN::CV::BGR, MNN::CV::BGR, mean_vals, 3, norm_vals, 3));
pretreat->convert(image.data, in_w, in_h, image.step[0], input_tensor);
auto start = chrono::steady_clock::now();
// run network
PicoDet_interpreter->runSession(PicoDet_session);
// get output data
std::vector<std::vector<BoxInfo>> results;
results.resize(num_class);
if (has_postprocess) {
auto bbox_out_tensor = PicoDet_interpreter->getSessionOutput(
PicoDet_session, nms_heads_info[0].c_str());
auto class_out_tensor = PicoDet_interpreter->getSessionOutput(
PicoDet_session, nms_heads_info[1].c_str());
// bbox branch
auto tensor_bbox_host =
new MNN::Tensor(bbox_out_tensor, MNN::Tensor::CAFFE);
bbox_out_tensor->copyToHostTensor(tensor_bbox_host);
auto bbox_output_shape = tensor_bbox_host->shape();
int output_size = 1;
for (int j = 0; j < bbox_output_shape.size(); ++j) {
output_size *= bbox_output_shape[j];
}
std::cout << "output_size:" << output_size << std::endl;
bbox_output_data_.resize(output_size);
std::copy_n(tensor_bbox_host->host<float>(), output_size,
bbox_output_data_.data());
delete tensor_bbox_host;
// class branch
auto tensor_class_host =
new MNN::Tensor(class_out_tensor, MNN::Tensor::CAFFE);
class_out_tensor->copyToHostTensor(tensor_class_host);
auto class_output_shape = tensor_class_host->shape();
output_size = 1;
for (int j = 0; j < class_output_shape.size(); ++j) {
output_size *= class_output_shape[j];
}
std::cout << "output_size:" << output_size << std::endl;
class_output_data_.resize(output_size);
std::copy_n(tensor_class_host->host<float>(), output_size,
class_output_data_.data());
delete tensor_class_host;
} else {
for (const auto &head_info : non_postprocess_heads_info) {
MNN::Tensor *tensor_scores = PicoDet_interpreter->getSessionOutput(
PicoDet_session, head_info.cls_layer.c_str());
MNN::Tensor *tensor_boxes = PicoDet_interpreter->getSessionOutput(
PicoDet_session, head_info.dis_layer.c_str());
MNN::Tensor tensor_scores_host(tensor_scores,
tensor_scores->getDimensionType());
tensor_scores->copyToHostTensor(&tensor_scores_host);
MNN::Tensor tensor_boxes_host(tensor_boxes,
tensor_boxes->getDimensionType());
tensor_boxes->copyToHostTensor(&tensor_boxes_host);
decode_infer(&tensor_scores_host, &tensor_boxes_host, head_info.stride,
score_threshold, results);
}
}
auto end = chrono::steady_clock::now();
chrono::duration<double> elapsed = end - start;
cout << "inference time:" << elapsed.count() << " s, ";
for (int i = 0; i < (int)results.size(); i++) {
nms(results[i], nms_threshold);
for (auto box : results[i]) {
box.x1 = box.x1 / in_w * image_w;
box.x2 = box.x2 / in_w * image_w;
box.y1 = box.y1 / in_h * image_h;
box.y2 = box.y2 / in_h * image_h;
result_list.push_back(box);
}
}
cout << "detect " << result_list.size() << " objects" << endl;
return 0;
}
void PicoDet::decode_infer(MNN::Tensor *cls_pred, MNN::Tensor *dis_pred,
int stride, float threshold,
std::vector<std::vector<BoxInfo>> &results) {
int feature_h = ceil((float)in_h / stride);
int feature_w = ceil((float)in_w / stride);
for (int idx = 0; idx < feature_h * feature_w; idx++) {
const float *scores = cls_pred->host<float>() + (idx * num_class);
int row = idx / feature_w;
int col = idx % feature_w;
float score = 0;
int cur_label = 0;
for (int label = 0; label < num_class; label++) {
if (scores[label] > score) {
score = scores[label];
cur_label = label;
}
}
if (score > threshold) {
const float *bbox_pred =
dis_pred->host<float>() + (idx * 4 * (reg_max + 1));
results[cur_label].push_back(
disPred2Bbox(bbox_pred, cur_label, score, col, row, stride));
}
}
}
BoxInfo PicoDet::disPred2Bbox(const float *&dfl_det, int label, float score,
int x, int y, int stride) {
float ct_x = (x + 0.5) * stride;
float ct_y = (y + 0.5) * stride;
std::vector<float> dis_pred;
dis_pred.resize(4);
for (int i = 0; i < 4; i++) {
float dis = 0;
float *dis_after_sm = new float[reg_max + 1];
activation_function_softmax(dfl_det + i * (reg_max + 1), dis_after_sm,
reg_max + 1);
for (int j = 0; j < reg_max + 1; j++) {
dis += j * dis_after_sm[j];
}
dis *= stride;
dis_pred[i] = dis;
delete[] dis_after_sm;
}
float xmin = (std::max)(ct_x - dis_pred[0], .0f);
float ymin = (std::max)(ct_y - dis_pred[1], .0f);
float xmax = (std::min)(ct_x + dis_pred[2], (float)in_w);
float ymax = (std::min)(ct_y + dis_pred[3], (float)in_h);
return BoxInfo{xmin, ymin, xmax, ymax, score, label};
}
void PicoDet::nms(std::vector<BoxInfo> &input_boxes, float NMS_THRESH) {
std::sort(input_boxes.begin(), input_boxes.end(),
[](BoxInfo a, BoxInfo b) { return a.score > b.score; });
std::vector<float> vArea(input_boxes.size());
for (int i = 0; i < int(input_boxes.size()); ++i) {
vArea[i] = (input_boxes.at(i).x2 - input_boxes.at(i).x1 + 1) *
(input_boxes.at(i).y2 - input_boxes.at(i).y1 + 1);
}
for (int i = 0; i < int(input_boxes.size()); ++i) {
for (int j = i + 1; j < int(input_boxes.size());) {
float xx1 = (std::max)(input_boxes[i].x1, input_boxes[j].x1);
float yy1 = (std::max)(input_boxes[i].y1, input_boxes[j].y1);
float xx2 = (std::min)(input_boxes[i].x2, input_boxes[j].x2);
float yy2 = (std::min)(input_boxes[i].y2, input_boxes[j].y2);
float w = (std::max)(float(0), xx2 - xx1 + 1);
float h = (std::max)(float(0), yy2 - yy1 + 1);
float inter = w * h;
float ovr = inter / (vArea[i] + vArea[j] - inter);
if (ovr >= NMS_THRESH) {
input_boxes.erase(input_boxes.begin() + j);
vArea.erase(vArea.begin() + j);
} else {
j++;
}
}
}
}
inline float fast_exp(float x) {
union {
uint32_t i;
float f;
} v{};
v.i = (1 << 23) * (1.4426950409 * x + 126.93490512f);
return v.f;
}
inline float sigmoid(float x) { return 1.0f / (1.0f + fast_exp(-x)); }
template <typename _Tp>
int activation_function_softmax(const _Tp *src, _Tp *dst, int length) {
const _Tp alpha = *std::max_element(src, src + length);
_Tp denominator{0};
for (int i = 0; i < length; ++i) {
dst[i] = fast_exp(src[i] - alpha);
denominator += dst[i];
}
for (int i = 0; i < length; ++i) {
dst[i] /= denominator;
}
return 0;
}
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef __PicoDet_H__
#define __PicoDet_H__
#pragma once
#include "Interpreter.hpp"
#include "ImageProcess.hpp"
#include "MNNDefine.h"
#include "Tensor.hpp"
#include <algorithm>
#include <chrono>
#include <iostream>
#include <memory>
#include <opencv2/opencv.hpp>
#include <string>
#include <vector>
typedef struct NonPostProcessHeadInfo_ {
std::string cls_layer;
std::string dis_layer;
int stride;
} NonPostProcessHeadInfo;
typedef struct BoxInfo_ {
float x1;
float y1;
float x2;
float y2;
float score;
int label;
} BoxInfo;
class PicoDet {
public:
PicoDet(const std::string &mnn_path, int input_width, int input_length,
int num_thread_ = 4, float score_threshold_ = 0.5,
float nms_threshold_ = 0.3);
~PicoDet();
int detect(cv::Mat &img, std::vector<BoxInfo> &result_list,
bool has_postprocess);
private:
void decode_infer(MNN::Tensor *cls_pred, MNN::Tensor *dis_pred, int stride,
float threshold,
std::vector<std::vector<BoxInfo>> &results);
BoxInfo disPred2Bbox(const float *&dfl_det, int label, float score, int x,
int y, int stride);
void nms(std::vector<BoxInfo> &input_boxes, float NMS_THRESH);
private:
std::shared_ptr<MNN::Interpreter> PicoDet_interpreter;
MNN::Session *PicoDet_session = nullptr;
MNN::Tensor *input_tensor = nullptr;
int num_thread;
int image_w;
int image_h;
int in_w = 320;
int in_h = 320;
float score_threshold;
float nms_threshold;
const float mean_vals[3] = {103.53f, 116.28f, 123.675f};
const float norm_vals[3] = {0.017429f, 0.017507f, 0.017125f};
const int num_class = 80;
const int reg_max = 7;
std::vector<float> bbox_output_data_;
std::vector<float> class_output_data_;
std::vector<std::string> nms_heads_info{"tmp_16", "concat_4.tmp_0"};
// If not export post-process, will use non_postprocess_heads_info
std::vector<NonPostProcessHeadInfo> non_postprocess_heads_info{
// cls_pred|dis_pred|stride
{"transpose_0.tmp_0", "transpose_1.tmp_0", 8},
{"transpose_2.tmp_0", "transpose_3.tmp_0", 16},
{"transpose_4.tmp_0", "transpose_5.tmp_0", 32},
{"transpose_6.tmp_0", "transpose_7.tmp_0", 64},
};
};
template <typename _Tp>
int activation_function_softmax(const _Tp *src, _Tp *dst, int length);
inline float fast_exp(float x);
inline float sigmoid(float x);
#endif
cmake_minimum_required(VERSION 3.9)
project(tinypose-mnn)
set(CMAKE_CXX_STANDARD 17)
set(MNN_DIR {YOUR_MNN_DIR})
find_package(OpenCV REQUIRED)
include_directories(
${MNN_DIR}/include
${MNN_DIR}/include/MNN
${CMAKE_SOURCE_DIR}
)
link_directories(mnn/lib)
add_library(libMNN SHARED IMPORTED)
set_target_properties(
libMNN
PROPERTIES IMPORTED_LOCATION
${CMAKE_SOURCE_DIR}/mnn/lib/libMNN.so
)
add_executable(tinypose-mnn main.cpp picodet_mnn.cpp keypoint_detector.cpp keypoint_postprocess.cpp)
target_link_libraries(tinypose-mnn MNN ${OpenCV_LIBS} libMNN.so)
cmake_minimum_required(VERSION 3.9)
project(tinypose-mnn)
set(CMAKE_CXX_STANDARD 17)
set(MNN_DIR {YOUR_MNN_DIR})
set(NDK_ROOT {YOUR_ANDROID_NDK_PATH})
set(LDFLAGS -latomic -pthread -ldl -llog -lz -static-libstdc++)
set(OpenCV_DIR ${CMAKE_SOURCE_DIR}/third/opencv4.1.0/arm64-v8a)
set(OpenCV_DEPS ${OpenCV_DIR}/libs/libopencv_imgcodecs.a
${OpenCV_DIR}/libs/libopencv_imgproc.a
${OpenCV_DIR}/libs/libopencv_core.a
${OpenCV_DIR}/3rdparty/libs/libtegra_hal.a
${OpenCV_DIR}/3rdparty/libs/liblibjpeg-turbo.a
${OpenCV_DIR}/3rdparty/libs/liblibwebp.a
${OpenCV_DIR}/3rdparty/libs/liblibpng.a
${OpenCV_DIR}/3rdparty/libs/liblibjasper.a
${OpenCV_DIR}/3rdparty/libs/liblibtiff.a
${OpenCV_DIR}/3rdparty/libs/libIlmImf.a
${OpenCV_DIR}/3rdparty/libs/libtbb.a
${OpenCV_DIR}/3rdparty/libs/libcpufeatures.a)
set(FLAGS "-pie -Wl,--gc-sections -funwind-tables -no-canonical-prefixes -D__ANDROID_API__=21 -fexceptions -frtti -std=c++11 -O3 -DNDEBUG -fPIE -fopenmp")
set(CMAKE_CXX_FLAGS "--sysroot=${NDK_ROOT}/sysroot ${FLAGS}")
set(STDCXX ${NDK_ROOT}/sources/cxx-stl/llvm-libc++/libs/arm64-v8a/libc++_static.a
${NDK_ROOT}/sources/cxx-stl/llvm-libc++/libs/arm64-v8a/libc++abi.a
${NDK_ROOT}/platforms/android-21/arch-arm64/usr/lib/libstdc++.a)
set(SYS_INCS ${NDK_ROOT}/sysroot/usr/include/aarch64-linux-android/ ${NDK_ROOT}/sources/cxx-stl/llvm-libc++/include/ ${NDK_ROOT}/sources/cxx-stl/llvm-libc++abi/include/ ${NDK_ROOT}/sources/android/support/include/ ${NDK_ROOT}/sysroot/usr/include/)
include_directories(
${SYS_INCS}
${OpenCV_DIR}/include
${MNN_DIR}/include
${MNN_DIR}/include/MNN
${CMAKE_SOURCE_DIR}
)
link_directories(${NDK_ROOT}/platforms/android-21/arch-arm64)
link_directories(${MNN_DIR}/project/android/build_64)
add_executable(tinypose-mnn picodet_mnn.cpp keypoint_postprocess.cpp keypoint_detector.cpp main.cpp)
target_link_libraries(tinypose-mnn -lMNN ${OpenCV_DEPS} ${STDCXX} ${LDFLAGS})
# TinyPose MNN Demo
This fold provides PicoDet+TinyPose inference code using
[Alibaba's MNN framework](https://github.com/alibaba/MNN). Most of the implements in
this fold are same as *demo_ncnn*.
## Install MNN
### Python library
Just run:
``` shell
pip install MNN
```
### C++ library
Please follow the [official document](https://www.yuque.com/mnn/en/build_linux) to build MNN engine.
- Create picodet_m_416_coco.onnx and tinypose256.onnx
example:
```shell
modelName=picodet_m_416_coco
# export model
python tools/export_model.py \
-c configs/picodet/${modelName}.yml \
-o weights=${modelName}.pdparams \
--output_dir=inference_model
# convert to onnx
paddle2onnx --model_dir inference_model/${modelName} \
--model_filename model.pdmodel \
--params_filename model.pdiparams \
--opset_version 11 \
--save_file ${modelName}.onnx
# onnxsim
python -m onnxsim ${modelName}.onnx ${modelName}_processed.onnx
```
- Convert model
example:
``` shell
python -m MNN.tools.mnnconvert -f ONNX --modelFile picodet-416.onnx --MNNModel picodet-416.mnn
```
Here are converted model
[picodet_m_416](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416.mnn).
[tinypose256](https://paddledet.bj.bcebos.com/deploy/third_engine/tinypose256.mnn)
## Build
For C++ code, replace `libMNN.so` under *./mnn/lib* with the one you just compiled, modify OpenCV path and MNN path at CMake file,
and run
``` shell
mkdir build && cd build
cmake ..
make
```
Note that a flag at `main.cpp` is used to control whether to show the detection result or save it into a fold.
``` c++
#define __SAVE_RESULT__ // if defined save drawed results to ../results, else show it in windows
```
#### ARM Build
Prepare OpenCV library [OpenCV_4_1](https://paddle-inference-dist.bj.bcebos.com/opencv4.1.0.tar.gz).
``` shell
mkdir third && cd third
wget https://paddle-inference-dist.bj.bcebos.com/opencv4.1.0.tar.gz
tar -zxvf opencv4.1.0.tar.gz
cd ..
mkdir build && cd build
cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI="arm64-v8a" -DANDROID_PLATFORM=android-21 -DANDROID_TOOLCHAIN=gcc ..
make
```
## Run
To detect images in a fold, run:
``` shell
./tinypose-mnn [mode] [image_file]
```
| param | detail |
| ---- | ---- |
| --mode | input mode,0:camera;1:image;2:video;3:benchmark |
| --image_file | input image path |
for example:
``` shell
./tinypose-mnn "1" "../imgs/test.jpg"
```
For speed benchmark:
``` shell
./tinypose-mnn "3" "0"
```
## Benchmark
Plateform: Kirin980
Model: [tinypose256](https://paddledet.bj.bcebos.com/deploy/third_engine/tinypose256.mnn)
| param | Min(s) | Max(s) | Avg(s) |
| -------- | ------ | ------ | ------ |
| Thread=4 | 0.018 | 0.021 | 0.019 |
| Thread=1 | 0.031 | 0.041 | 0.032 |
## Reference
[MNN](https://github.com/alibaba/MNN)
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <sstream>
// for setprecision
#include <chrono>
#include <iomanip>
#include "keypoint_detector.h"
namespace PaddleDetection {
// Visualiztion MaskDetector results
cv::Mat VisualizeKptsResult(const cv::Mat& img,
const std::vector<KeyPointResult>& results,
const std::vector<int>& colormap,
float threshold) {
const int edge[][2] = {{0, 1},
{0, 2},
{1, 3},
{2, 4},
{3, 5},
{4, 6},
{5, 7},
{6, 8},
{7, 9},
{8, 10},
{5, 11},
{6, 12},
{11, 13},
{12, 14},
{13, 15},
{14, 16},
{11, 12}};
cv::Mat vis_img = img.clone();
for (int batchid = 0; batchid < results.size(); batchid++) {
for (int i = 0; i < results[batchid].num_joints; i++) {
if (results[batchid].keypoints[i * 3] > threshold) {
int x_coord = int(results[batchid].keypoints[i * 3 + 1]);
int y_coord = int(results[batchid].keypoints[i * 3 + 2]);
cv::circle(vis_img,
cv::Point2d(x_coord, y_coord),
1,
cv::Scalar(0, 0, 255),
2);
}
}
for (int i = 0; i < results[batchid].num_joints; i++) {
if (results[batchid].keypoints[edge[i][0] * 3] > threshold &&
results[batchid].keypoints[edge[i][1] * 3] > threshold) {
int x_start = int(results[batchid].keypoints[edge[i][0] * 3 + 1]);
int y_start = int(results[batchid].keypoints[edge[i][0] * 3 + 2]);
int x_end = int(results[batchid].keypoints[edge[i][1] * 3 + 1]);
int y_end = int(results[batchid].keypoints[edge[i][1] * 3 + 2]);
cv::line(vis_img,
cv::Point2d(x_start, y_start),
cv::Point2d(x_end, y_end),
colormap[i],
1);
}
}
}
return vis_img;
}
void KeyPointDetector::Postprocess(std::vector<float>& output,
std::vector<int>& output_shape,
std::vector<int>& idxout,
std::vector<int>& idx_shape,
std::vector<KeyPointResult>* result,
std::vector<std::vector<float>>& center_bs,
std::vector<std::vector<float>>& scale_bs) {
std::vector<float> preds(output_shape[1] * 3, 0);
for (int batchid = 0; batchid < output_shape[0]; batchid++) {
get_final_preds(output,
output_shape,
idxout,
idx_shape,
center_bs[batchid],
scale_bs[batchid],
preds,
batchid,
this->use_dark());
KeyPointResult result_item;
result_item.num_joints = output_shape[1];
result_item.keypoints.clear();
for (int i = 0; i < output_shape[1]; i++) {
result_item.keypoints.emplace_back(preds[i * 3]);
result_item.keypoints.emplace_back(preds[i * 3 + 1]);
result_item.keypoints.emplace_back(preds[i * 3 + 2]);
}
result->push_back(result_item);
}
}
void KeyPointDetector::Predict(const std::vector<cv::Mat> imgs,
std::vector<std::vector<float>>& center_bs,
std::vector<std::vector<float>>& scale_bs,
std::vector<KeyPointResult>* result) {
int batch_size = imgs.size();
KeyPointDet_interpreter->resizeTensor(input_tensor,
{batch_size, 3, in_h, in_w});
KeyPointDet_interpreter->resizeSession(KeyPointDet_session);
auto insize = 3 * in_h * in_w;
// Preprocess image
cv::Mat resized_im;
for (int bs_idx = 0; bs_idx < batch_size; bs_idx++) {
cv::Mat im = imgs.at(bs_idx);
cv::resize(im, resized_im, cv::Size(in_w, in_h));
std::shared_ptr<MNN::CV::ImageProcess> pretreat(
MNN::CV::ImageProcess::create(
MNN::CV::BGR, MNN::CV::RGB, mean_vals, 3, norm_vals, 3));
pretreat->convert(
resized_im.data, in_w, in_h, resized_im.step[0], input_tensor);
}
// Run predictor
auto inference_start = std::chrono::steady_clock::now();
KeyPointDet_interpreter->runSession(KeyPointDet_session);
// Get output tensor
auto out_tensor = KeyPointDet_interpreter->getSessionOutput(
KeyPointDet_session, "conv2d_441.tmp_1");
auto nchwoutTensor = new Tensor(out_tensor, Tensor::CAFFE);
out_tensor->copyToHostTensor(nchwoutTensor);
auto output_shape = nchwoutTensor->shape();
// Calculate output length
int output_size = 1;
for (int j = 0; j < output_shape.size(); ++j) {
output_size *= output_shape[j];
}
output_data_.resize(output_size);
std::copy_n(nchwoutTensor->host<float>(), output_size, output_data_.data());
delete nchwoutTensor;
auto idx_tensor = KeyPointDet_interpreter->getSessionOutput(
KeyPointDet_session, "argmax_0.tmp_0");
auto idxhostTensor = new Tensor(idx_tensor, Tensor::CAFFE);
idx_tensor->copyToHostTensor(idxhostTensor);
auto idx_shape = idxhostTensor->shape();
// Calculate output length
output_size = 1;
for (int j = 0; j < idx_shape.size(); ++j) {
output_size *= idx_shape[j];
}
idx_data_.resize(output_size);
std::copy_n(idxhostTensor->host<int>(), output_size, idx_data_.data());
delete idxhostTensor;
auto inference_end = std::chrono::steady_clock::now();
std::chrono::duration<double> elapsed = inference_end - inference_start;
printf("keypoint inference time: %f s\n", elapsed.count());
// Postprocessing result
Postprocess(output_data_,
output_shape,
idx_data_,
idx_shape,
result,
center_bs,
scale_bs);
}
} // namespace PaddleDetection
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <ctime>
#include <memory>
#include <string>
#include <utility>
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include "Interpreter.hpp"
#include "ImageProcess.hpp"
#include "MNNDefine.h"
#include "Tensor.hpp"
#include "keypoint_postprocess.h"
using namespace MNN;
namespace PaddleDetection {
// Object KeyPoint Result
struct KeyPointResult {
// Keypoints: shape(N x 3); N: number of Joints; 3: x,y,conf
std::vector<float> keypoints;
int num_joints = -1;
};
// Visualiztion KeyPoint Result
cv::Mat VisualizeKptsResult(const cv::Mat& img,
const std::vector<KeyPointResult>& results,
const std::vector<int>& colormap,
float threshold = 0.2);
class KeyPointDetector {
public:
explicit KeyPointDetector(const std::string& model_path,
int num_thread = 4,
int input_height = 256,
int input_width = 192,
float score_threshold = 0.3,
const int batch_size = 1,
bool use_dark = true) {
printf("config path: %s",
model_path.substr(0, model_path.find_last_of('/') + 1).c_str());
use_dark_ = use_dark;
in_w = input_width;
in_h = input_height;
threshold_ = score_threshold;
KeyPointDet_interpreter = std::shared_ptr<MNN::Interpreter>(
MNN::Interpreter::createFromFile(model_path.c_str()));
MNN::ScheduleConfig config;
config.type = MNN_FORWARD_CPU;
/*modeNum means gpuMode for GPU usage, Or means numThread for CPU usage.*/
config.numThread = num_thread;
// If type not fount, let it failed
config.backupType = MNN_FORWARD_CPU;
BackendConfig backendConfig;
backendConfig.precision = static_cast<MNN::BackendConfig::PrecisionMode>(1);
config.backendConfig = &backendConfig;
KeyPointDet_session = KeyPointDet_interpreter->createSession(config);
input_tensor =
KeyPointDet_interpreter->getSessionInput(KeyPointDet_session, nullptr);
}
~KeyPointDetector() {
KeyPointDet_interpreter->releaseModel();
KeyPointDet_interpreter->releaseSession(KeyPointDet_session);
}
// Load Paddle inference model
void LoadModel(std::string model_file, int num_theads);
// Run predictor
void Predict(const std::vector<cv::Mat> imgs,
std::vector<std::vector<float>>& center,
std::vector<std::vector<float>>& scale,
std::vector<KeyPointResult>* result = nullptr);
bool use_dark() { return this->use_dark_; }
inline float get_threshold() { return threshold_; };
// const float mean_vals[3] = { 103.53f, 116.28f, 123.675f };
// const float norm_vals[3] = { 0.017429f, 0.017507f, 0.017125f };
const float mean_vals[3] = {0.f, 0.f, 0.f};
const float norm_vals[3] = {1.f, 1.f, 1.f};
int in_w = 128;
int in_h = 256;
private:
// Postprocess result
void Postprocess(std::vector<float>& output,
std::vector<int>& output_shape,
std::vector<int>& idxout,
std::vector<int>& idx_shape,
std::vector<KeyPointResult>* result,
std::vector<std::vector<float>>& center,
std::vector<std::vector<float>>& scale);
std::vector<float> output_data_;
std::vector<int> idx_data_;
float threshold_;
bool use_dark_;
std::shared_ptr<MNN::Interpreter> KeyPointDet_interpreter;
MNN::Session* KeyPointDet_session = nullptr;
MNN::Tensor* input_tensor = nullptr;
};
} // namespace PaddleDetection
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "keypoint_postprocess.h"
#define PI 3.1415926535
#define HALF_CIRCLE_DEGREE 180
cv::Point2f get_3rd_point(cv::Point2f& a, cv::Point2f& b) {
cv::Point2f direct{a.x - b.x, a.y - b.y};
return cv::Point2f(a.x - direct.y, a.y + direct.x);
}
std::vector<float> get_dir(float src_point_x,
float src_point_y,
float rot_rad) {
float sn = sin(rot_rad);
float cs = cos(rot_rad);
std::vector<float> src_result{0.0, 0.0};
src_result[0] = src_point_x * cs - src_point_y * sn;
src_result[1] = src_point_x * sn + src_point_y * cs;
return src_result;
}
void affine_tranform(
float pt_x, float pt_y, cv::Mat& trans, std::vector<float>& preds, int p) {
double new1[3] = {pt_x, pt_y, 1.0};
cv::Mat new_pt(3, 1, trans.type(), new1);
cv::Mat w = trans * new_pt;
preds[p * 3 + 1] = static_cast<float>(w.at<double>(0, 0));
preds[p * 3 + 2] = static_cast<float>(w.at<double>(1, 0));
}
void get_affine_transform(std::vector<float>& center,
std::vector<float>& scale,
float rot,
std::vector<int>& output_size,
cv::Mat& trans,
int inv) {
float src_w = scale[0];
float dst_w = static_cast<float>(output_size[0]);
float dst_h = static_cast<float>(output_size[1]);
float rot_rad = rot * PI / HALF_CIRCLE_DEGREE;
std::vector<float> src_dir = get_dir(-0.5 * src_w, 0, rot_rad);
std::vector<float> dst_dir{static_cast<float>(-0.5) * dst_w, 0.0};
cv::Point2f srcPoint2f[3], dstPoint2f[3];
srcPoint2f[0] = cv::Point2f(center[0], center[1]);
srcPoint2f[1] = cv::Point2f(center[0] + src_dir[0], center[1] + src_dir[1]);
srcPoint2f[2] = get_3rd_point(srcPoint2f[0], srcPoint2f[1]);
dstPoint2f[0] = cv::Point2f(dst_w * 0.5, dst_h * 0.5);
dstPoint2f[1] =
cv::Point2f(dst_w * 0.5 + dst_dir[0], dst_h * 0.5 + dst_dir[1]);
dstPoint2f[2] = get_3rd_point(dstPoint2f[0], dstPoint2f[1]);
if (inv == 0) {
trans = cv::getAffineTransform(srcPoint2f, dstPoint2f);
} else {
trans = cv::getAffineTransform(dstPoint2f, srcPoint2f);
}
}
void transform_preds(std::vector<float>& coords,
std::vector<float>& center,
std::vector<float>& scale,
std::vector<int>& output_size,
std::vector<int>& dim,
std::vector<float>& target_coords) {
cv::Mat trans(2, 3, CV_64FC1);
get_affine_transform(center, scale, 0, output_size, trans, 1);
for (int p = 0; p < dim[1]; ++p) {
affine_tranform(coords[p * 2], coords[p * 2 + 1], trans, target_coords, p);
}
}
// only for batchsize == 1
void get_max_preds(std::vector<float>& heatmap,
std::vector<int>& dim,
std::vector<float>& preds,
std::vector<float>& maxvals,
int batchid,
int joint_idx) {
int num_joints = dim[1];
int width = dim[3];
std::vector<int> idx;
idx.resize(num_joints * 2);
for (int j = 0; j < dim[1]; j++) {
float* index = &(
heatmap[batchid * num_joints * dim[2] * dim[3] + j * dim[2] * dim[3]]);
float* end = index + dim[2] * dim[3];
float* max_dis = std::max_element(index, end);
auto max_id = std::distance(index, max_dis);
maxvals[j] = *max_dis;
if (*max_dis > 0) {
preds[j * 2] = static_cast<float>(max_id % width);
preds[j * 2 + 1] = static_cast<float>(max_id / width);
}
}
}
void dark_parse(std::vector<float>& heatmap,
std::vector<int>& dim,
std::vector<float>& coords,
int px,
int py,
int index,
int ch) {
/*DARK postpocessing, Zhang et al. Distribution-Aware Coordinate
Representation for Human Pose Estimation (CVPR 2020).
1) offset = - hassian.inv() * derivative
2) dx = (heatmap[x+1] - heatmap[x-1])/2.
3) dxx = (dx[x+1] - dx[x-1])/2.
4) derivative = Mat([dx, dy])
5) hassian = Mat([[dxx, dxy], [dxy, dyy]])
*/
std::vector<float>::const_iterator first1 = heatmap.begin() + index;
std::vector<float>::const_iterator last1 =
heatmap.begin() + index + dim[2] * dim[3];
std::vector<float> heatmap_ch(first1, last1);
cv::Mat heatmap_mat = cv::Mat(heatmap_ch).reshape(0, dim[2]);
heatmap_mat.convertTo(heatmap_mat, CV_32FC1);
cv::GaussianBlur(heatmap_mat, heatmap_mat, cv::Size(3, 3), 0, 0);
heatmap_mat = heatmap_mat.reshape(1, 1);
heatmap_ch = std::vector<float>(heatmap_mat.reshape(1, 1));
float epsilon = 1e-10;
// sample heatmap to get values in around target location
float xy = log(fmax(heatmap_ch[py * dim[3] + px], epsilon));
float xr = log(fmax(heatmap_ch[py * dim[3] + px + 1], epsilon));
float xl = log(fmax(heatmap_ch[py * dim[3] + px - 1], epsilon));
float xr2 = log(fmax(heatmap_ch[py * dim[3] + px + 2], epsilon));
float xl2 = log(fmax(heatmap_ch[py * dim[3] + px - 2], epsilon));
float yu = log(fmax(heatmap_ch[(py + 1) * dim[3] + px], epsilon));
float yd = log(fmax(heatmap_ch[(py - 1) * dim[3] + px], epsilon));
float yu2 = log(fmax(heatmap_ch[(py + 2) * dim[3] + px], epsilon));
float yd2 = log(fmax(heatmap_ch[(py - 2) * dim[3] + px], epsilon));
float xryu = log(fmax(heatmap_ch[(py + 1) * dim[3] + px + 1], epsilon));
float xryd = log(fmax(heatmap_ch[(py - 1) * dim[3] + px + 1], epsilon));
float xlyu = log(fmax(heatmap_ch[(py + 1) * dim[3] + px - 1], epsilon));
float xlyd = log(fmax(heatmap_ch[(py - 1) * dim[3] + px - 1], epsilon));
// compute dx/dy and dxx/dyy with sampled values
float dx = 0.5 * (xr - xl);
float dy = 0.5 * (yu - yd);
float dxx = 0.25 * (xr2 - 2 * xy + xl2);
float dxy = 0.25 * (xryu - xryd - xlyu + xlyd);
float dyy = 0.25 * (yu2 - 2 * xy + yd2);
// finally get offset by derivative and hassian, which combined by dx/dy and
// dxx/dyy
if (dxx * dyy - dxy * dxy != 0) {
float M[2][2] = {dxx, dxy, dxy, dyy};
float D[2] = {dx, dy};
cv::Mat hassian(2, 2, CV_32F, M);
cv::Mat derivative(2, 1, CV_32F, D);
cv::Mat offset = -hassian.inv() * derivative;
coords[ch * 2] += offset.at<float>(0, 0);
coords[ch * 2 + 1] += offset.at<float>(1, 0);
}
}
void get_final_preds(std::vector<float>& heatmap,
std::vector<int>& dim,
std::vector<int>& idxout,
std::vector<int>& idxdim,
std::vector<float>& center,
std::vector<float> scale,
std::vector<float>& preds,
int batchid,
bool DARK) {
std::vector<float> coords;
coords.resize(dim[1] * 2);
int heatmap_height = dim[2];
int heatmap_width = dim[3];
for (int j = 0; j < dim[1]; ++j) {
int index = (batchid * dim[1] + j) * dim[2] * dim[3];
int idx = idxout[batchid * dim[1] + j];
preds[j * 3] = heatmap[index + idx];
coords[j * 2] = idx % heatmap_width;
coords[j * 2 + 1] = idx / heatmap_width;
int px = int(coords[j * 2] + 0.5);
int py = int(coords[j * 2 + 1] + 0.5);
if (DARK && px > 1 && px < heatmap_width - 2 && py > 1 &&
py < heatmap_height - 2) {
dark_parse(heatmap, dim, coords, px, py, index, j);
} else {
if (px > 0 && px < heatmap_width - 1) {
float diff_x = heatmap[index + py * dim[3] + px + 1] -
heatmap[index + py * dim[3] + px - 1];
coords[j * 2] += diff_x > 0 ? 1 : -1 * 0.25;
}
if (py > 0 && py < heatmap_height - 1) {
float diff_y = heatmap[index + (py + 1) * dim[3] + px] -
heatmap[index + (py - 1) * dim[3] + px];
coords[j * 2 + 1] += diff_y > 0 ? 1 : -1 * 0.25;
}
}
}
std::vector<int> img_size{heatmap_width, heatmap_height};
transform_preds(coords, center, scale, img_size, dim, preds);
}
void CropImg(cv::Mat& img,
cv::Mat& crop_img,
std::vector<int>& area,
std::vector<float>& center,
std::vector<float>& scale,
float expandratio) {
int crop_x1 = std::max(0, area[0]);
int crop_y1 = std::max(0, area[1]);
int crop_x2 = std::min(img.cols - 1, area[2]);
int crop_y2 = std::min(img.rows - 1, area[3]);
int center_x = (crop_x1 + crop_x2) / 2.;
int center_y = (crop_y1 + crop_y2) / 2.;
int half_h = (crop_y2 - crop_y1) / 2.;
int half_w = (crop_x2 - crop_x1) / 2.;
if (half_h * 3 > half_w * 4) {
half_w = static_cast<int>(half_h * 0.75);
} else {
half_h = static_cast<int>(half_w * 4 / 3);
}
crop_x1 =
std::max(0, center_x - static_cast<int>(half_w * (1 + expandratio)));
crop_y1 =
std::max(0, center_y - static_cast<int>(half_h * (1 + expandratio)));
crop_x2 = std::min(img.cols - 1,
static_cast<int>(center_x + half_w * (1 + expandratio)));
crop_y2 = std::min(img.rows - 1,
static_cast<int>(center_y + half_h * (1 + expandratio)));
crop_img =
img(cv::Range(crop_y1, crop_y2 + 1), cv::Range(crop_x1, crop_x2 + 1));
center.clear();
center.emplace_back((crop_x1 + crop_x2) / 2);
center.emplace_back((crop_y1 + crop_y2) / 2);
scale.clear();
scale.emplace_back((crop_x2 - crop_x1));
scale.emplace_back((crop_y2 - crop_y1));
}
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <vector>
std::vector<float> get_3rd_point(std::vector<float>& a, std::vector<float>& b);
std::vector<float> get_dir(float src_point_x, float src_point_y, float rot_rad);
void affine_tranform(float pt_x,
float pt_y,
cv::Mat& trans,
std::vector<float>& x,
int p,
int num);
cv::Mat get_affine_transform(std::vector<float>& center,
std::vector<float>& scale,
float rot,
std::vector<int>& output_size,
int inv);
void transform_preds(std::vector<float>& coords,
std::vector<float>& center,
std::vector<float>& scale,
std::vector<int>& output_size,
std::vector<int>& dim,
std::vector<float>& target_coords);
void box_to_center_scale(std::vector<int>& box,
int width,
int height,
std::vector<float>& center,
std::vector<float>& scale);
void get_max_preds(std::vector<float>& heatmap,
std::vector<int>& dim,
std::vector<float>& preds,
std::vector<float>& maxvals,
int batchid,
int joint_idx);
void get_final_preds(std::vector<float>& heatmap,
std::vector<int>& dim,
std::vector<int>& idxout,
std::vector<int>& idxdim,
std::vector<float>& center,
std::vector<float> scale,
std::vector<float>& preds,
int batchid,
bool DARK = true);
void CropImg(cv::Mat& img,
cv::Mat& crop_img,
std::vector<int>& area,
std::vector<float>& center,
std::vector<float>& scale,
float expandratio = 0.25);
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_mnn
#include <iostream>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include "keypoint_detector.h"
#include "picodet_mnn.h"
#define __SAVE_RESULT__ // if defined save drawed results to ../results, else
// show it in windows
using namespace PaddleDetection;
struct object_rect {
int x;
int y;
int width;
int height;
};
int resize_uniform(cv::Mat& src,
cv::Mat& dst,
cv::Size dst_size,
object_rect& effect_area) {
int w = src.cols;
int h = src.rows;
int dst_w = dst_size.width;
int dst_h = dst_size.height;
dst = cv::Mat(cv::Size(dst_w, dst_h), CV_8UC3, cv::Scalar(0));
float ratio_src = w * 1.0 / h;
float ratio_dst = dst_w * 1.0 / dst_h;
int tmp_w = 0;
int tmp_h = 0;
if (ratio_src > ratio_dst) {
tmp_w = dst_w;
tmp_h = floor((dst_w * 1.0 / w) * h);
} else if (ratio_src < ratio_dst) {
tmp_h = dst_h;
tmp_w = floor((dst_h * 1.0 / h) * w);
} else {
cv::resize(src, dst, dst_size);
effect_area.x = 0;
effect_area.y = 0;
effect_area.width = dst_w;
effect_area.height = dst_h;
return 0;
}
cv::Mat tmp;
cv::resize(src, tmp, cv::Size(tmp_w, tmp_h));
if (tmp_w != dst_w) {
int index_w = floor((dst_w - tmp_w) / 2.0);
for (int i = 0; i < dst_h; i++) {
memcpy(dst.data + i * dst_w * 3 + index_w * 3,
tmp.data + i * tmp_w * 3,
tmp_w * 3);
}
effect_area.x = index_w;
effect_area.y = 0;
effect_area.width = tmp_w;
effect_area.height = tmp_h;
} else if (tmp_h != dst_h) {
int index_h = floor((dst_h - tmp_h) / 2.0);
memcpy(dst.data + index_h * dst_w * 3, tmp.data, tmp_w * tmp_h * 3);
effect_area.x = 0;
effect_area.y = index_h;
effect_area.width = tmp_w;
effect_area.height = tmp_h;
} else {
printf("error\n");
}
return 0;
}
const int color_list[80][3] = {
{216, 82, 24}, {236, 176, 31}, {125, 46, 141}, {118, 171, 47},
{76, 189, 237}, {238, 19, 46}, {76, 76, 76}, {153, 153, 153},
{255, 0, 0}, {255, 127, 0}, {190, 190, 0}, {0, 255, 0},
{0, 0, 255}, {170, 0, 255}, {84, 84, 0}, {84, 170, 0},
{84, 255, 0}, {170, 84, 0}, {170, 170, 0}, {170, 255, 0},
{255, 84, 0}, {255, 170, 0}, {255, 255, 0}, {0, 84, 127},
{0, 170, 127}, {0, 255, 127}, {84, 0, 127}, {84, 84, 127},
{84, 170, 127}, {84, 255, 127}, {170, 0, 127}, {170, 84, 127},
{170, 170, 127}, {170, 255, 127}, {255, 0, 127}, {255, 84, 127},
{255, 170, 127}, {255, 255, 127}, {0, 84, 255}, {0, 170, 255},
{0, 255, 255}, {84, 0, 255}, {84, 84, 255}, {84, 170, 255},
{84, 255, 255}, {170, 0, 255}, {170, 84, 255}, {170, 170, 255},
{170, 255, 255}, {255, 0, 255}, {255, 84, 255}, {255, 170, 255},
{42, 0, 0}, {84, 0, 0}, {127, 0, 0}, {170, 0, 0},
{212, 0, 0}, {255, 0, 0}, {0, 42, 0}, {0, 84, 0},
{0, 127, 0}, {0, 170, 0}, {0, 212, 0}, {0, 255, 0},
{0, 0, 42}, {0, 0, 84}, {0, 0, 127}, {0, 0, 170},
{0, 0, 212}, {0, 0, 255}, {0, 0, 0}, {36, 36, 36},
{72, 72, 72}, {109, 109, 109}, {145, 145, 145}, {182, 182, 182},
{218, 218, 218}, {0, 113, 188}, {80, 182, 188}, {127, 127, 0},
};
void draw_bboxes(const cv::Mat& bgr,
const std::vector<BoxInfo>& bboxes,
object_rect effect_roi,
std::string save_path = "None") {
static const char* class_names[] = {
"person", "bicycle", "car",
"motorcycle", "airplane", "bus",
"train", "truck", "boat",
"traffic light", "fire hydrant", "stop sign",
"parking meter", "bench", "bird",
"cat", "dog", "horse",
"sheep", "cow", "elephant",
"bear", "zebra", "giraffe",
"backpack", "umbrella", "handbag",
"tie", "suitcase", "frisbee",
"skis", "snowboard", "sports ball",
"kite", "baseball bat", "baseball glove",
"skateboard", "surfboard", "tennis racket",
"bottle", "wine glass", "cup",
"fork", "knife", "spoon",
"bowl", "banana", "apple",
"sandwich", "orange", "broccoli",
"carrot", "hot dog", "pizza",
"donut", "cake", "chair",
"couch", "potted plant", "bed",
"dining table", "toilet", "tv",
"laptop", "mouse", "remote",
"keyboard", "cell phone", "microwave",
"oven", "toaster", "sink",
"refrigerator", "book", "clock",
"vase", "scissors", "teddy bear",
"hair drier", "toothbrush"};
cv::Mat image = bgr.clone();
int src_w = image.cols;
int src_h = image.rows;
int dst_w = effect_roi.width;
int dst_h = effect_roi.height;
float width_ratio = (float)src_w / (float)dst_w;
float height_ratio = (float)src_h / (float)dst_h;
for (size_t i = 0; i < bboxes.size(); i++) {
const BoxInfo& bbox = bboxes[i];
cv::Scalar color = cv::Scalar(color_list[bbox.label][0],
color_list[bbox.label][1],
color_list[bbox.label][2]);
cv::rectangle(image,
cv::Rect(cv::Point((bbox.x1 - effect_roi.x) * width_ratio,
(bbox.y1 - effect_roi.y) * height_ratio),
cv::Point((bbox.x2 - effect_roi.x) * width_ratio,
(bbox.y2 - effect_roi.y) * height_ratio)),
color);
char text[256];
sprintf(text, "%s %.1f%%", class_names[bbox.label], bbox.score * 100);
int baseLine = 0;
cv::Size label_size =
cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine);
int x = (bbox.x1 - effect_roi.x) * width_ratio;
int y =
(bbox.y1 - effect_roi.y) * height_ratio - label_size.height - baseLine;
if (y < 0) y = 0;
if (x + label_size.width > image.cols) x = image.cols - label_size.width;
cv::rectangle(
image,
cv::Rect(cv::Point(x, y),
cv::Size(label_size.width, label_size.height + baseLine)),
color,
-1);
cv::putText(image,
text,
cv::Point(x, y + label_size.height),
cv::FONT_HERSHEY_SIMPLEX,
0.4,
cv::Scalar(255, 255, 255));
}
if (save_path == "None") {
cv::imshow("image", image);
} else {
cv::imwrite(save_path, image);
std::cout << save_path << std::endl;
}
}
std::vector<BoxInfo> coordsback(const cv::Mat image,
const object_rect effect_roi,
const std::vector<BoxInfo>& bboxes) {
int src_w = image.cols;
int src_h = image.rows;
int dst_w = effect_roi.width;
int dst_h = effect_roi.height;
float width_ratio = (float)src_w / (float)dst_w;
float height_ratio = (float)src_h / (float)dst_h;
std::vector<BoxInfo> bboxes_oimg;
for (int i = 0; i < bboxes.size(); i++) {
auto bbox = bboxes[i];
bbox.x1 = (bbox.x1 - effect_roi.x) * width_ratio;
bbox.y1 = (bbox.y1 - effect_roi.y) * height_ratio;
bbox.x2 = (bbox.x2 - effect_roi.x) * width_ratio;
bbox.y2 = (bbox.y2 - effect_roi.y) * height_ratio;
bboxes_oimg.emplace_back(bbox);
}
return bboxes_oimg;
}
void image_infer_kpts(KeyPointDetector* kpts_detector,
cv::Mat image,
const object_rect effect_roi,
const std::vector<BoxInfo>& results,
std::string img_name = "kpts_vis",
bool save_img = true) {
std::vector<cv::Mat> cropimgs;
std::vector<std::vector<float>> center_bs;
std::vector<std::vector<float>> scale_bs;
std::vector<KeyPointResult> kpts_results;
auto results_oimg = coordsback(image, effect_roi, results);
for (int i = 0; i < results_oimg.size(); i++) {
auto rect = results_oimg[i];
if (rect.label == 0) {
cv::Mat cropimg;
std::vector<float> center, scale;
std::vector<int> area = {static_cast<int>(rect.x1),
static_cast<int>(rect.y1),
static_cast<int>(rect.x2),
static_cast<int>(rect.y2)};
CropImg(image, cropimg, area, center, scale);
// cv::imwrite("./test_crop_"+std::to_string(i)+".jpg", cropimg);
cropimgs.emplace_back(cropimg);
center_bs.emplace_back(center);
scale_bs.emplace_back(scale);
}
if (cropimgs.size() == 1 ||
(cropimgs.size() > 0 && i == results_oimg.size() - 1)) {
kpts_detector->Predict(cropimgs, center_bs, scale_bs, &kpts_results);
cropimgs.clear();
center_bs.clear();
scale_bs.clear();
}
}
std::vector<int> compression_params;
compression_params.push_back(cv::IMWRITE_JPEG_QUALITY);
compression_params.push_back(95);
std::string kpts_savepath =
"keypoint_" + img_name.substr(img_name.find_last_of('/') + 1);
cv::Mat kpts_vis_img =
VisualizeKptsResult(image, kpts_results, {0, 255, 0}, 0.3);
if (save_img) {
cv::imwrite(kpts_savepath, kpts_vis_img, compression_params);
printf("Visualized output saved as %s\n", kpts_savepath.c_str());
} else {
cv::imshow("image", kpts_vis_img);
}
}
int image_demo(PicoDet& detector,
KeyPointDetector* kpts_detector,
const char* imagepath) {
std::vector<cv::String> filenames;
cv::glob(imagepath, filenames, false);
for (auto img_name : filenames) {
cv::Mat image = cv::imread(img_name);
if (image.empty()) {
fprintf(stderr, "cv::imread %s failed\n", img_name.c_str());
return -1;
}
object_rect effect_roi;
cv::Mat resized_img;
resize_uniform(image, resized_img, cv::Size(320, 320), effect_roi);
std::vector<BoxInfo> results;
detector.detect(resized_img, results);
if (kpts_detector) {
image_infer_kpts(kpts_detector, image, effect_roi, results, img_name);
}
}
return 0;
}
int webcam_demo(PicoDet& detector,
KeyPointDetector* kpts_detector,
int cam_id) {
cv::Mat image;
cv::VideoCapture cap(cam_id);
while (true) {
cap >> image;
object_rect effect_roi;
cv::Mat resized_img;
resize_uniform(image, resized_img, cv::Size(320, 320), effect_roi);
std::vector<BoxInfo> results;
detector.detect(resized_img, results);
if (kpts_detector) {
image_infer_kpts(kpts_detector, image, effect_roi, results, "", false);
}
}
return 0;
}
int video_demo(PicoDet& detector,
KeyPointDetector* kpts_detector,
const char* path) {
cv::Mat image;
cv::VideoCapture cap(path);
while (true) {
cap >> image;
object_rect effect_roi;
cv::Mat resized_img;
resize_uniform(image, resized_img, cv::Size(320, 320), effect_roi);
std::vector<BoxInfo> results;
detector.detect(resized_img, results);
if (kpts_detector) {
image_infer_kpts(kpts_detector, image, effect_roi, results, "", false);
}
}
return 0;
}
int benchmark(KeyPointDetector* kpts_detector) {
int loop_num = 100;
int warm_up = 8;
double time_min = DBL_MAX;
double time_max = -DBL_MAX;
double time_avg = 0;
cv::Mat image(256, 192, CV_8UC3, cv::Scalar(1, 1, 1));
std::vector<float> center = {128, 96};
std::vector<float> scale = {256, 192};
std::vector<cv::Mat> cropimgs = {image};
std::vector<std::vector<float>> center_bs = {center};
std::vector<std::vector<float>> scale_bs = {scale};
std::vector<KeyPointResult> kpts_results;
for (int i = 0; i < warm_up + loop_num; i++) {
auto start = std::chrono::steady_clock::now();
std::vector<BoxInfo> results;
kpts_detector->Predict(cropimgs, center_bs, scale_bs, &kpts_results);
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> elapsed = end - start;
double time = elapsed.count();
if (i >= warm_up) {
time_min = (std::min)(time_min, time);
time_max = (std::max)(time_max, time);
time_avg += time;
}
}
time_avg /= loop_num;
fprintf(stderr,
"%20s min = %7.2f max = %7.2f avg = %7.2f\n",
"tinypose",
time_min,
time_max,
time_avg);
return 0;
}
int main(int argc, char** argv) {
if (argc != 3) {
fprintf(stderr,
"usage: %s [mode] [path]. \n For webcam mode=0, path is cam id; \n "
"For image demo, mode=1, path=xxx/xxx/*.jpg; \n For video, mode=2; "
"\n For benchmark, mode=3 path=0.\n",
argv[0]);
return -1;
}
PicoDet detector =
PicoDet("../weight/picodet_m_416.mnn", 416, 416, 4, 0.45, 0.3);
KeyPointDetector* kpts_detector =
new KeyPointDetector("../weight/tinypose256.mnn", 4, 256, 192);
int mode = atoi(argv[1]);
switch (mode) {
case 0: {
int cam_id = atoi(argv[2]);
webcam_demo(detector, kpts_detector, cam_id);
break;
}
case 1: {
const char* images = argv[2];
image_demo(detector, kpts_detector, images);
break;
}
case 2: {
const char* path = argv[2];
video_demo(detector, kpts_detector, path);
break;
}
case 3: {
benchmark(kpts_detector);
break;
}
default: {
fprintf(stderr,
"usage: %s [mode] [path]. \n For webcam mode=0, path is cam id; "
"\n For image demo, mode=1, path=xxx/xxx/*.jpg; \n For video, "
"mode=2; \n For benchmark, mode=3 path=0.\n",
argv[0]);
break;
}
}
delete kpts_detector;
kpts_detector = nullptr;
}
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_mnn
#include "picodet_mnn.h"
using namespace std;
PicoDet::PicoDet(const std::string &mnn_path,
int input_width,
int input_length,
int num_thread_,
float score_threshold_,
float nms_threshold_) {
num_thread = num_thread_;
in_w = input_width;
in_h = input_length;
score_threshold = score_threshold_;
nms_threshold = nms_threshold_;
PicoDet_interpreter = std::shared_ptr<MNN::Interpreter>(
MNN::Interpreter::createFromFile(mnn_path.c_str()));
MNN::ScheduleConfig config;
config.numThread = num_thread;
MNN::BackendConfig backendConfig;
backendConfig.precision = (MNN::BackendConfig::PrecisionMode)2;
config.backendConfig = &backendConfig;
PicoDet_session = PicoDet_interpreter->createSession(config);
input_tensor = PicoDet_interpreter->getSessionInput(PicoDet_session, nullptr);
}
PicoDet::~PicoDet() {
PicoDet_interpreter->releaseModel();
PicoDet_interpreter->releaseSession(PicoDet_session);
}
int PicoDet::detect(cv::Mat &raw_image, std::vector<BoxInfo> &result_list) {
if (raw_image.empty()) {
std::cout << "image is empty ,please check!" << std::endl;
return -1;
}
image_h = raw_image.rows;
image_w = raw_image.cols;
cv::Mat image;
cv::resize(raw_image, image, cv::Size(in_w, in_h));
PicoDet_interpreter->resizeTensor(input_tensor, {1, 3, in_h, in_w});
PicoDet_interpreter->resizeSession(PicoDet_session);
std::shared_ptr<MNN::CV::ImageProcess> pretreat(MNN::CV::ImageProcess::create(
MNN::CV::BGR, MNN::CV::BGR, mean_vals, 3, norm_vals, 3));
pretreat->convert(image.data, in_w, in_h, image.step[0], input_tensor);
auto start = chrono::steady_clock::now();
// run network
PicoDet_interpreter->runSession(PicoDet_session);
// get output data
std::vector<std::vector<BoxInfo>> results;
results.resize(num_class);
for (const auto &head_info : heads_info) {
MNN::Tensor *tensor_scores = PicoDet_interpreter->getSessionOutput(
PicoDet_session, head_info.cls_layer.c_str());
MNN::Tensor *tensor_boxes = PicoDet_interpreter->getSessionOutput(
PicoDet_session, head_info.dis_layer.c_str());
MNN::Tensor tensor_scores_host(tensor_scores,
tensor_scores->getDimensionType());
tensor_scores->copyToHostTensor(&tensor_scores_host);
MNN::Tensor tensor_boxes_host(tensor_boxes,
tensor_boxes->getDimensionType());
tensor_boxes->copyToHostTensor(&tensor_boxes_host);
decode_infer(&tensor_scores_host,
&tensor_boxes_host,
head_info.stride,
score_threshold,
results);
}
auto end = chrono::steady_clock::now();
chrono::duration<double> elapsed = end - start;
cout << "inference time:" << elapsed.count() << " s, ";
for (int i = 0; i < (int)results.size(); i++) {
nms(results[i], nms_threshold);
for (auto box : results[i]) {
box.x1 = box.x1 / in_w * image_w;
box.x2 = box.x2 / in_w * image_w;
box.y1 = box.y1 / in_h * image_h;
box.y2 = box.y2 / in_h * image_h;
result_list.push_back(box);
}
}
cout << "detect " << result_list.size() << " objects." << std::endl;
;
return 0;
}
void PicoDet::decode_infer(MNN::Tensor *cls_pred,
MNN::Tensor *dis_pred,
int stride,
float threshold,
std::vector<std::vector<BoxInfo>> &results) {
int feature_h = in_h / stride;
int feature_w = in_w / stride;
for (int idx = 0; idx < feature_h * feature_w; idx++) {
const float *scores = cls_pred->host<float>() + (idx * num_class);
int row = idx / feature_w;
int col = idx % feature_w;
float score = 0;
int cur_label = 0;
for (int label = 0; label < num_class; label++) {
if (scores[label] > score) {
score = scores[label];
cur_label = label;
}
}
if (score > threshold) {
const float *bbox_pred =
dis_pred->host<float>() + (idx * 4 * (reg_max + 1));
results[cur_label].push_back(
disPred2Bbox(bbox_pred, cur_label, score, col, row, stride));
}
}
}
BoxInfo PicoDet::disPred2Bbox(
const float *&dfl_det, int label, float score, int x, int y, int stride) {
float ct_x = (x + 0.5) * stride;
float ct_y = (y + 0.5) * stride;
std::vector<float> dis_pred;
dis_pred.resize(4);
for (int i = 0; i < 4; i++) {
float dis = 0;
float *dis_after_sm = new float[reg_max + 1];
activation_function_softmax(
dfl_det + i * (reg_max + 1), dis_after_sm, reg_max + 1);
for (int j = 0; j < reg_max + 1; j++) {
dis += j * dis_after_sm[j];
}
dis *= stride;
dis_pred[i] = dis;
delete[] dis_after_sm;
}
float xmin = (std::max)(ct_x - dis_pred[0], .0f);
float ymin = (std::max)(ct_y - dis_pred[1], .0f);
float xmax = (std::min)(ct_x + dis_pred[2], (float)in_w);
float ymax = (std::min)(ct_y + dis_pred[3], (float)in_h);
return BoxInfo{xmin, ymin, xmax, ymax, score, label};
}
void PicoDet::nms(std::vector<BoxInfo> &input_boxes, float NMS_THRESH) {
std::sort(input_boxes.begin(), input_boxes.end(), [](BoxInfo a, BoxInfo b) {
return a.score > b.score;
});
std::vector<float> vArea(input_boxes.size());
for (int i = 0; i < int(input_boxes.size()); ++i) {
vArea[i] = (input_boxes.at(i).x2 - input_boxes.at(i).x1 + 1) *
(input_boxes.at(i).y2 - input_boxes.at(i).y1 + 1);
}
for (int i = 0; i < int(input_boxes.size()); ++i) {
for (int j = i + 1; j < int(input_boxes.size());) {
float xx1 = (std::max)(input_boxes[i].x1, input_boxes[j].x1);
float yy1 = (std::max)(input_boxes[i].y1, input_boxes[j].y1);
float xx2 = (std::min)(input_boxes[i].x2, input_boxes[j].x2);
float yy2 = (std::min)(input_boxes[i].y2, input_boxes[j].y2);
float w = (std::max)(float(0), xx2 - xx1 + 1);
float h = (std::max)(float(0), yy2 - yy1 + 1);
float inter = w * h;
float ovr = inter / (vArea[i] + vArea[j] - inter);
if (ovr >= NMS_THRESH) {
input_boxes.erase(input_boxes.begin() + j);
vArea.erase(vArea.begin() + j);
} else {
j++;
}
}
}
}
string PicoDet::get_label_str(int label) { return labels[label]; }
inline float fast_exp(float x) {
union {
uint32_t i;
float f;
} v{};
v.i = (1 << 23) * (1.4426950409 * x + 126.93490512f);
return v.f;
}
inline float sigmoid(float x) { return 1.0f / (1.0f + fast_exp(-x)); }
template <typename _Tp>
int activation_function_softmax(const _Tp *src, _Tp *dst, int length) {
const _Tp alpha = *std::max_element(src, src + length);
_Tp denominator{0};
for (int i = 0; i < length; ++i) {
dst[i] = fast_exp(src[i] - alpha);
denominator += dst[i];
}
for (int i = 0; i < length; ++i) {
dst[i] /= denominator;
}
return 0;
}
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_mnn
#ifndef __PicoDet_H__
#define __PicoDet_H__
#pragma once
#include "Interpreter.hpp"
#include <algorithm>
#include <chrono>
#include <iostream>
#include <memory>
#include <opencv2/opencv.hpp>
#include <string>
#include <vector>
#include "ImageProcess.hpp"
#include "MNNDefine.h"
#include "Tensor.hpp"
typedef struct HeadInfo_ {
std::string cls_layer;
std::string dis_layer;
int stride;
} HeadInfo;
typedef struct BoxInfo_ {
float x1;
float y1;
float x2;
float y2;
float score;
int label;
} BoxInfo;
class PicoDet {
public:
PicoDet(const std::string &mnn_path,
int input_width,
int input_length,
int num_thread_ = 4,
float score_threshold_ = 0.5,
float nms_threshold_ = 0.3);
~PicoDet();
int detect(cv::Mat &img, std::vector<BoxInfo> &result_list);
std::string get_label_str(int label);
private:
void decode_infer(MNN::Tensor *cls_pred,
MNN::Tensor *dis_pred,
int stride,
float threshold,
std::vector<std::vector<BoxInfo>> &results);
BoxInfo disPred2Bbox(
const float *&dfl_det, int label, float score, int x, int y, int stride);
void nms(std::vector<BoxInfo> &input_boxes, float NMS_THRESH);
private:
std::shared_ptr<MNN::Interpreter> PicoDet_interpreter;
MNN::Session *PicoDet_session = nullptr;
MNN::Tensor *input_tensor = nullptr;
int num_thread;
int image_w;
int image_h;
int in_w = 320;
int in_h = 320;
float score_threshold;
float nms_threshold;
const float mean_vals[3] = {103.53f, 116.28f, 123.675f};
const float norm_vals[3] = {0.017429f, 0.017507f, 0.017125f};
const int num_class = 80;
const int reg_max = 7;
std::vector<HeadInfo> heads_info{
// cls_pred|dis_pred|stride
{"save_infer_model/scale_0.tmp_1", "save_infer_model/scale_4.tmp_1", 8},
{"save_infer_model/scale_1.tmp_1", "save_infer_model/scale_5.tmp_1", 16},
{"save_infer_model/scale_2.tmp_1", "save_infer_model/scale_6.tmp_1", 32},
{"save_infer_model/scale_3.tmp_1", "save_infer_model/scale_7.tmp_1", 64},
};
std::vector<std::string> labels{
"person", "bicycle", "car",
"motorcycle", "airplane", "bus",
"train", "truck", "boat",
"traffic light", "fire hydrant", "stop sign",
"parking meter", "bench", "bird",
"cat", "dog", "horse",
"sheep", "cow", "elephant",
"bear", "zebra", "giraffe",
"backpack", "umbrella", "handbag",
"tie", "suitcase", "frisbee",
"skis", "snowboard", "sports ball",
"kite", "baseball bat", "baseball glove",
"skateboard", "surfboard", "tennis racket",
"bottle", "wine glass", "cup",
"fork", "knife", "spoon",
"bowl", "banana", "apple",
"sandwich", "orange", "broccoli",
"carrot", "hot dog", "pizza",
"donut", "cake", "chair",
"couch", "potted plant", "bed",
"dining table", "toilet", "tv",
"laptop", "mouse", "remote",
"keyboard", "cell phone", "microwave",
"oven", "toaster", "sink",
"refrigerator", "book", "clock",
"vase", "scissors", "teddy bear",
"hair drier", "toothbrush"};
};
template <typename _Tp>
int activation_function_softmax(const _Tp *src, _Tp *dst, int length);
inline float fast_exp(float x);
inline float sigmoid(float x);
#endif
cmake_minimum_required(VERSION 3.9)
set(CMAKE_CXX_STANDARD 17)
project(picodet_demo)
find_package(OpenMP REQUIRED)
if(OPENMP_FOUND)
message("OPENMP FOUND")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}")
endif()
# find_package(OpenCV REQUIRED)
find_package(OpenCV REQUIRED PATHS "/path/to/opencv-3.4.16_gcc8.2_ffmpeg")
# find_package(ncnn REQUIRED)
find_package(ncnn REQUIRED PATHS "/path/to/ncnn/build/install/lib/cmake/ncnn")
if(NOT TARGET ncnn)
message(WARNING "ncnn NOT FOUND! Please set ncnn_DIR environment variable")
else()
message("ncnn FOUND ")
endif()
include_directories(
${OpenCV_INCLUDE_DIRS}
${CMAKE_CURRENT_SOURCE_DIR}
${CMAKE_CURRENT_BINARY_DIR}
)
add_executable(picodet_demo main.cpp picodet.cpp)
target_link_libraries(
picodet_demo
ncnn
${OpenCV_LIBS}
)
# PicoDet NCNN Demo
该Demo提供的预测代码是根据[Tencent's NCNN framework](https://github.com/Tencent/ncnn)推理库预测的。
# 第一步:编译
## Windows
### Step1.
Download and Install Visual Studio from https://visualstudio.microsoft.com/vs/community/
### Step2.
Download and install OpenCV from https://github.com/opencv/opencv/releases
为了方便,如果环境是gcc8.2 x86环境,可直接下载以下库:
```shell
wget https://paddledet.bj.bcebos.com/data/opencv-3.4.16_gcc8.2_ffmpeg.tar.gz
tar -xf opencv-3.4.16_gcc8.2_ffmpeg.tar.gz
```
### Step3(可选).
Download and install Vulkan SDK from https://vulkan.lunarg.com/sdk/home
### Step4:编译NCNN
``` shell script
git clone --recursive https://github.com/Tencent/ncnn.git
```
Build NCNN following this tutorial: [Build for Windows x64 using VS2017](https://github.com/Tencent/ncnn/wiki/how-to-build#build-for-windows-x64-using-visual-studio-community-2017)
### Step5.
增加 `ncnn_DIR` = `YOUR_NCNN_PATH/build/install/lib/cmake/ncnn` 到系统变量中
Build project: Open x64 Native Tools Command Prompt for VS 2019 or 2017
``` cmd
cd <this-folder>
mkdir -p build
cd build
cmake ..
msbuild picodet_demo.vcxproj /p:configuration=release /p:platform=x64
```
## Linux
### Step1.
Build and install OpenCV from https://github.com/opencv/opencv
### Step2(可选).
Download Vulkan SDK from https://vulkan.lunarg.com/sdk/home
### Step3:编译NCNN
Clone NCNN repository
``` shell script
git clone --recursive https://github.com/Tencent/ncnn.git
```
Build NCNN following this tutorial: [Build for Linux / NVIDIA Jetson / Raspberry Pi](https://github.com/Tencent/ncnn/wiki/how-to-build#build-for-linux)
### Step4:编译可执行文件
``` shell script
cd <this-folder>
mkdir build
cd build
cmake ..
make
```
# Run demo
- 准备模型
```shell
modelName=picodet_s_320_coco_lcnet
# 导出Inference model
python tools/export_model.py \
-c configs/picodet/${modelName}.yml \
-o weights=${modelName}.pdparams \
--output_dir=inference_model
# 转换到ONNX
paddle2onnx --model_dir inference_model/${modelName} \
--model_filename model.pdmodel \
--params_filename model.pdiparams \
--opset_version 11 \
--save_file ${modelName}.onnx
# 简化模型
python -m onnxsim ${modelName}.onnx ${modelName}_processed.onnx
# 将模型转换至NCNN格式
Run onnx2ncnn in ncnn tools to generate ncnn .param and .bin file.
```
转NCNN模型可以利用在线转换工具 [https://convertmodel.com](https://convertmodel.com/)
为了快速测试,可直接下载:[picodet_s_320_coco_lcnet-opt.bin](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco_lcnet-opt.bin)/ [picodet_s_320_coco_lcnet-opt.param](https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_s_320_coco_lcnet-opt.param)(不带后处理)。
**注意:**由于带后处理后,NCNN预测会出NAN,暂时使用不带后处理Demo即可,带后处理的Demo正在升级中,很快发布。
## 开始运行
首先新建预测结果存放目录:
```shell
cp -r ../demo_onnxruntime/imgs .
cd build
mkdir ../results
```
- 预测一张图片
``` shell
./picodet_demo 0 ../picodet_s_320_coco_lcnet.bin ../picodet_s_320_coco_lcnet.param 320 320 ../imgs/dog.jpg 0
```
具体参数解析可参考`main.cpp`
-测试速度Benchmark
``` shell
./picodet_demo 1 ../picodet_s_320_lcnet.bin ../picodet_s_320_lcnet.param 320 320 0
```
## FAQ
- 预测结果精度不对:
请先确认模型输入shape是否对齐,并且模型输出name是否对齐,不带后处理的PicoDet增强版模型输出name如下:
```shell
# 分类分支 | 检测分支
{"transpose_0.tmp_0", "transpose_1.tmp_0"},
{"transpose_2.tmp_0", "transpose_3.tmp_0"},
{"transpose_4.tmp_0", "transpose_5.tmp_0"},
{"transpose_6.tmp_0", "transpose_7.tmp_0"},
```
可使用[netron](https://netron.app)查看具体name,并修改`picodet_mnn.hpp`中相应`non_postprocess_heads_info`数组。
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_ncnn
#include "picodet.h"
#include <benchmark.h>
#include <iostream>
#include <net.h>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#define __SAVE_RESULT__ // if defined save drawed results to ../results, else
// show it in windows
struct object_rect {
int x;
int y;
int width;
int height;
};
std::vector<int> GenerateColorMap(int num_class) {
auto colormap = std::vector<int>(3 * num_class, 0);
for (int i = 0; i < num_class; ++i) {
int j = 0;
int lab = i;
while (lab) {
colormap[i * 3] |= (((lab >> 0) & 1) << (7 - j));
colormap[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j));
colormap[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j));
++j;
lab >>= 3;
}
}
return colormap;
}
void draw_bboxes(const cv::Mat &im, const std::vector<BoxInfo> &bboxes,
std::string save_path = "None") {
static const char *class_names[] = {
"person", "bicycle", "car",
"motorcycle", "airplane", "bus",
"train", "truck", "boat",
"traffic light", "fire hydrant", "stop sign",
"parking meter", "bench", "bird",
"cat", "dog", "horse",
"sheep", "cow", "elephant",
"bear", "zebra", "giraffe",
"backpack", "umbrella", "handbag",
"tie", "suitcase", "frisbee",
"skis", "snowboard", "sports ball",
"kite", "baseball bat", "baseball glove",
"skateboard", "surfboard", "tennis racket",
"bottle", "wine glass", "cup",
"fork", "knife", "spoon",
"bowl", "banana", "apple",
"sandwich", "orange", "broccoli",
"carrot", "hot dog", "pizza",
"donut", "cake", "chair",
"couch", "potted plant", "bed",
"dining table", "toilet", "tv",
"laptop", "mouse", "remote",
"keyboard", "cell phone", "microwave",
"oven", "toaster", "sink",
"refrigerator", "book", "clock",
"vase", "scissors", "teddy bear",
"hair drier", "toothbrush"};
cv::Mat image = im.clone();
int src_w = image.cols;
int src_h = image.rows;
int thickness = 2;
auto colormap = GenerateColorMap(sizeof(class_names));
for (size_t i = 0; i < bboxes.size(); i++) {
const BoxInfo &bbox = bboxes[i];
std::cout << bbox.x1 << ". " << bbox.y1 << ". " << bbox.x2 << ". "
<< bbox.y2 << ". " << std::endl;
int c1 = colormap[3 * bbox.label + 0];
int c2 = colormap[3 * bbox.label + 1];
int c3 = colormap[3 * bbox.label + 2];
cv::Scalar color = cv::Scalar(c1, c2, c3);
// cv::Scalar color = cv::Scalar(0, 0, 255);
cv::rectangle(image, cv::Rect(cv::Point(bbox.x1, bbox.y1),
cv::Point(bbox.x2, bbox.y2)),
color, 1);
char text[256];
sprintf(text, "%s %.1f%%", class_names[bbox.label], bbox.score * 100);
int baseLine = 0;
cv::Size label_size =
cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.4, 1, &baseLine);
int x = bbox.x1;
int y = bbox.y1 - label_size.height - baseLine;
if (y < 0)
y = 0;
if (x + label_size.width > image.cols)
x = image.cols - label_size.width;
cv::rectangle(image, cv::Rect(cv::Point(x, y),
cv::Size(label_size.width,
label_size.height + baseLine)),
color, -1);
cv::putText(image, text, cv::Point(x, y + label_size.height),
cv::FONT_HERSHEY_SIMPLEX, 0.4, cv::Scalar(255, 255, 255), 1);
}
if (save_path == "None") {
cv::imshow("image", image);
} else {
cv::imwrite(save_path, image);
std::cout << "Result save in: " << save_path << std::endl;
}
}
int image_demo(PicoDet &detector, const char *imagepath,
int has_postprocess = 0) {
std::vector<cv::String> filenames;
cv::glob(imagepath, filenames, false);
bool is_postprocess = has_postprocess > 0 ? true : false;
for (auto img_name : filenames) {
cv::Mat image = cv::imread(img_name, cv::IMREAD_COLOR);
if (image.empty()) {
fprintf(stderr, "cv::imread %s failed\n", img_name.c_str());
return -1;
}
std::vector<BoxInfo> results;
detector.detect(image, results, is_postprocess);
std::cout << "detect done." << std::endl;
#ifdef __SAVE_RESULT__
std::string save_path = img_name;
draw_bboxes(image, results, save_path.replace(3, 4, "results"));
#else
draw_bboxes(image, results);
cv::waitKey(0);
#endif
}
return 0;
}
int benchmark(PicoDet &detector, int width, int height,
int has_postprocess = 0) {
int loop_num = 100;
int warm_up = 8;
double time_min = DBL_MAX;
double time_max = -DBL_MAX;
double time_avg = 0;
cv::Mat image(width, height, CV_8UC3, cv::Scalar(1, 1, 1));
bool is_postprocess = has_postprocess > 0 ? true : false;
for (int i = 0; i < warm_up + loop_num; i++) {
double start = ncnn::get_current_time();
std::vector<BoxInfo> results;
detector.detect(image, results, is_postprocess);
double end = ncnn::get_current_time();
double time = end - start;
if (i >= warm_up) {
time_min = (std::min)(time_min, time);
time_max = (std::max)(time_max, time);
time_avg += time;
}
}
time_avg /= loop_num;
fprintf(stderr, "%20s min = %7.2f max = %7.2f avg = %7.2f\n", "picodet",
time_min, time_max, time_avg);
return 0;
}
int main(int argc, char **argv) {
int mode = atoi(argv[1]);
char *bin_model_path = argv[2];
char *param_model_path = argv[3];
int height = 320;
int width = 320;
if (argc == 5) {
height = atoi(argv[4]);
width = atoi(argv[5]);
}
PicoDet detector =
PicoDet(param_model_path, bin_model_path, width, height, true, 0.45, 0.3);
if (mode == 1) {
benchmark(detector, width, height, atoi(argv[6]));
} else {
if (argc != 6) {
std::cout << "Must set image file, such as ./picodet_demo 0 "
"../picodet_s_320_lcnet.bin ../picodet_s_320_lcnet.param "
"320 320 img.jpg"
<< std::endl;
}
const char *images = argv[6];
image_demo(detector, images, atoi(argv[7]));
}
}
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_ncnn
#include "picodet.h"
#include <benchmark.h>
#include <iostream>
inline float fast_exp(float x) {
union {
uint32_t i;
float f;
} v{};
v.i = (1 << 23) * (1.4426950409 * x + 126.93490512f);
return v.f;
}
inline float sigmoid(float x) { return 1.0f / (1.0f + fast_exp(-x)); }
template <typename _Tp>
int activation_function_softmax(const _Tp *src, _Tp *dst, int length) {
const _Tp alpha = *std::max_element(src, src + length);
_Tp denominator{0};
for (int i = 0; i < length; ++i) {
dst[i] = fast_exp(src[i] - alpha);
denominator += dst[i];
}
for (int i = 0; i < length; ++i) {
dst[i] /= denominator;
}
return 0;
}
bool PicoDet::hasGPU = false;
PicoDet *PicoDet::detector = nullptr;
PicoDet::PicoDet(const char *param, const char *bin, int input_width,
int input_hight, bool useGPU, float score_threshold_ = 0.5,
float nms_threshold_ = 0.3) {
this->Net = new ncnn::Net();
#if NCNN_VULKAN
this->hasGPU = ncnn::get_gpu_count() > 0;
#endif
this->Net->opt.use_vulkan_compute = this->hasGPU && useGPU;
this->Net->opt.use_fp16_arithmetic = true;
this->Net->load_param(param);
this->Net->load_model(bin);
this->in_w = input_width;
this->in_h = input_hight;
this->score_threshold = score_threshold_;
this->nms_threshold = nms_threshold_;
}
PicoDet::~PicoDet() { delete this->Net; }
void PicoDet::preprocess(cv::Mat &image, ncnn::Mat &in) {
// cv::resize(image, image, cv::Size(this->in_w, this->in_h), 0.f, 0.f);
int img_w = image.cols;
int img_h = image.rows;
in = ncnn::Mat::from_pixels_resize(image.data, ncnn::Mat::PIXEL_BGR, img_w,
img_h, this->in_w, this->in_h);
const float mean_vals[3] = {103.53f, 116.28f, 123.675f};
const float norm_vals[3] = {0.017429f, 0.017507f, 0.017125f};
in.substract_mean_normalize(mean_vals, norm_vals);
}
int PicoDet::detect(cv::Mat image, std::vector<BoxInfo> &result_list,
bool has_postprocess) {
ncnn::Mat input;
preprocess(image, input);
auto ex = this->Net->create_extractor();
ex.set_light_mode(false);
ex.set_num_threads(4);
#if NCNN_VULKAN
ex.set_vulkan_compute(this->hasGPU);
#endif
ex.input("image", input); // picodet
this->image_h = image.rows;
this->image_w = image.cols;
std::vector<std::vector<BoxInfo>> results;
results.resize(this->num_class);
if (has_postprocess) {
ncnn::Mat dis_pred;
ncnn::Mat cls_pred;
ex.extract(this->nms_heads_info[0].c_str(), dis_pred);
ex.extract(this->nms_heads_info[1].c_str(), cls_pred);
std::cout << dis_pred.h << " " << dis_pred.w << std::endl;
std::cout << cls_pred.h << " " << cls_pred.w << std::endl;
this->nms_boxes(cls_pred, dis_pred, this->score_threshold, results);
} else {
for (const auto &head_info : this->non_postprocess_heads_info) {
ncnn::Mat dis_pred;
ncnn::Mat cls_pred;
ex.extract(head_info.dis_layer.c_str(), dis_pred);
ex.extract(head_info.cls_layer.c_str(), cls_pred);
this->decode_infer(cls_pred, dis_pred, head_info.stride,
this->score_threshold, results);
}
}
for (int i = 0; i < (int)results.size(); i++) {
this->nms(results[i], this->nms_threshold);
for (auto box : results[i]) {
box.x1 = box.x1 / this->in_w * this->image_w;
box.x2 = box.x2 / this->in_w * this->image_w;
box.y1 = box.y1 / this->in_h * this->image_h;
box.y2 = box.y2 / this->in_h * this->image_h;
result_list.push_back(box);
}
}
return 0;
}
void PicoDet::nms_boxes(ncnn::Mat &cls_pred, ncnn::Mat &dis_pred,
float score_threshold,
std::vector<std::vector<BoxInfo>> &result_list) {
BoxInfo bbox;
int i, j;
for (i = 0; i < dis_pred.h; i++) {
bbox.x1 = dis_pred.row(i)[0];
bbox.y1 = dis_pred.row(i)[1];
bbox.x2 = dis_pred.row(i)[2];
bbox.y2 = dis_pred.row(i)[3];
const float *scores = cls_pred.row(i);
float score = 0;
int cur_label = 0;
for (int label = 0; label < this->num_class; label++) {
float score_ = cls_pred.row(label)[i];
if (score_ > score) {
score = score_;
cur_label = label;
}
}
bbox.score = score;
bbox.label = cur_label;
result_list[cur_label].push_back(bbox);
}
}
void PicoDet::decode_infer(ncnn::Mat &cls_pred, ncnn::Mat &dis_pred, int stride,
float threshold,
std::vector<std::vector<BoxInfo>> &results) {
int feature_h = ceil((float)this->in_w / stride);
int feature_w = ceil((float)this->in_h / stride);
for (int idx = 0; idx < feature_h * feature_w; idx++) {
const float *scores = cls_pred.row(idx);
int row = idx / feature_w;
int col = idx % feature_w;
float score = 0;
int cur_label = 0;
for (int label = 0; label < this->num_class; label++) {
if (scores[label] > score) {
score = scores[label];
cur_label = label;
}
}
if (score > threshold) {
const float *bbox_pred = dis_pred.row(idx);
results[cur_label].push_back(
this->disPred2Bbox(bbox_pred, cur_label, score, col, row, stride));
}
}
}
BoxInfo PicoDet::disPred2Bbox(const float *&dfl_det, int label, float score,
int x, int y, int stride) {
float ct_x = (x + 0.5) * stride;
float ct_y = (y + 0.5) * stride;
std::vector<float> dis_pred;
dis_pred.resize(4);
for (int i = 0; i < 4; i++) {
float dis = 0;
float *dis_after_sm = new float[this->reg_max + 1];
activation_function_softmax(dfl_det + i * (this->reg_max + 1), dis_after_sm,
this->reg_max + 1);
for (int j = 0; j < this->reg_max + 1; j++) {
dis += j * dis_after_sm[j];
}
dis *= stride;
dis_pred[i] = dis;
delete[] dis_after_sm;
}
float xmin = (std::max)(ct_x - dis_pred[0], .0f);
float ymin = (std::max)(ct_y - dis_pred[1], .0f);
float xmax = (std::min)(ct_x + dis_pred[2], (float)this->in_w);
float ymax = (std::min)(ct_y + dis_pred[3], (float)this->in_w);
return BoxInfo{xmin, ymin, xmax, ymax, score, label};
}
void PicoDet::nms(std::vector<BoxInfo> &input_boxes, float NMS_THRESH) {
std::sort(input_boxes.begin(), input_boxes.end(),
[](BoxInfo a, BoxInfo b) { return a.score > b.score; });
std::vector<float> vArea(input_boxes.size());
for (int i = 0; i < int(input_boxes.size()); ++i) {
vArea[i] = (input_boxes.at(i).x2 - input_boxes.at(i).x1 + 1) *
(input_boxes.at(i).y2 - input_boxes.at(i).y1 + 1);
}
for (int i = 0; i < int(input_boxes.size()); ++i) {
for (int j = i + 1; j < int(input_boxes.size());) {
float xx1 = (std::max)(input_boxes[i].x1, input_boxes[j].x1);
float yy1 = (std::max)(input_boxes[i].y1, input_boxes[j].y1);
float xx2 = (std::min)(input_boxes[i].x2, input_boxes[j].x2);
float yy2 = (std::min)(input_boxes[i].y2, input_boxes[j].y2);
float w = (std::max)(float(0), xx2 - xx1 + 1);
float h = (std::max)(float(0), yy2 - yy1 + 1);
float inter = w * h;
float ovr = inter / (vArea[i] + vArea[j] - inter);
if (ovr >= NMS_THRESH) {
input_boxes.erase(input_boxes.begin() + j);
vArea.erase(vArea.begin() + j);
} else {
j++;
}
}
}
}
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// reference from https://github.com/RangiLyu/nanodet/tree/main/demo_ncnn
#ifndef PICODET_H
#define PICODET_H
#include <net.h>
#include <opencv2/core/core.hpp>
typedef struct NonPostProcessHeadInfo {
std::string cls_layer;
std::string dis_layer;
int stride;
} NonPostProcessHeadInfo;
typedef struct BoxInfo {
float x1;
float y1;
float x2;
float y2;
float score;
int label;
} BoxInfo;
class PicoDet {
public:
PicoDet(const char *param, const char *bin, int input_width, int input_hight,
bool useGPU, float score_threshold_, float nms_threshold_);
~PicoDet();
static PicoDet *detector;
ncnn::Net *Net;
static bool hasGPU;
int detect(cv::Mat image, std::vector<BoxInfo> &result_list,
bool has_postprocess);
private:
void preprocess(cv::Mat &image, ncnn::Mat &in);
void decode_infer(ncnn::Mat &cls_pred, ncnn::Mat &dis_pred, int stride,
float threshold,
std::vector<std::vector<BoxInfo>> &results);
BoxInfo disPred2Bbox(const float *&dfl_det, int label, float score, int x,
int y, int stride);
static void nms(std::vector<BoxInfo> &result, float nms_threshold);
void nms_boxes(ncnn::Mat &cls_pred, ncnn::Mat &dis_pred,
float score_threshold,
std::vector<std::vector<BoxInfo>> &result_list);
int image_w;
int image_h;
int in_w = 320;
int in_h = 320;
int num_class = 80;
int reg_max = 7;
float score_threshold;
float nms_threshold;
std::vector<float> bbox_output_data_;
std::vector<float> class_output_data_;
std::vector<std::string> nms_heads_info{"tmp_16", "concat_4.tmp_0"};
// If not export post-process, will use non_postprocess_heads_info
std::vector<NonPostProcessHeadInfo> non_postprocess_heads_info{
// cls_pred|dis_pred|stride
{"transpose_0.tmp_0", "transpose_1.tmp_0", 8},
{"transpose_2.tmp_0", "transpose_3.tmp_0", 16},
{"transpose_4.tmp_0", "transpose_5.tmp_0", 32},
{"transpose_6.tmp_0", "transpose_7.tmp_0", 64},
};
};
#endif
# PP-YOLOE 转ONNX-TRT教程
本教程内容为:使用PP-YOLOE模型导出转换为ONNX格式,并定制化修改网络,使用[EfficientNMS_TRT](https://github.com/NVIDIA/TensorRT/tree/main/plugin/efficientNMSPlugin) OP,
可成功运行在[TensorRT](https://github.com/NVIDIA/TensorRT)上,示例仅供参考
## 1. 环境依赖
CUDA 10.2 + [cudnn 8.2.1](https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html) + [TensorRT 8.2](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-821/install-guide/index.htm)
```commandline
onnx
onnxruntime
paddle2onnx
```
## 2. Paddle模型导出
```commandline
python tools/export_model.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams trt=True exclude_nms=True
```
## 3. ONNX模型转换 + 定制化修改EfficientNMS_TRT
```commandline
python deploy/third_engine/demo_onnx_trt/onnx_custom.py --onnx_file=output_inference/ppyoloe_crn_l_300e_coco/ppyoloe_crn_l_300e_coco.onnx --model_dir=output_inference/ppyoloe_crn_l_300e_coco/ --opset_version=11
```
## 4. TensorRT Engine
```commandline
trtexec --onnx=output_inference/ppyoloe_crn_l_300e_coco/ppyoloe_crn_l_300e_coco.onnx --saveEngine=ppyoloe_crn_l_300e_coco.engine
```
**注意**:若运行报错,可尝试添加`--tacticSources=-cublasLt,+cublas`参数解决
## 5. 运行TensorRT推理
```commandline
python deploy/third_engine/demo_onnx_trt/trt_infer.py --infer_cfg=output_inference/ppyoloe_crn_l_300e_coco/infer_cfg.yml --trt_engine=ppyoloe_crn_l_300e_coco.engine --image_file=demo/000000014439.jpg
```
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment