Commit b952e97b authored by chenych

First Commit.
*.mdb
*.pth
*.tar
*.sh
*.txt
*.ipynb
*.zip
*.eps
*.pdf
### Linux ###
*~
# temporary files which can be created if a process still has a handle open of a deleted file
.fuse_hidden*
# KDE directory preferences
.directory
# Linux trash folder which might appear on any partition or disk
.Trash-*
# .nfs files are created when an open file is removed but is still being accessed
.nfs*
### OSX ###
# General
.DS_Store
.AppleDouble
.LSOverride
# Icon must end with two \r
Icon
# Thumbnails
._*
# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent
# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk
### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
### Python Patch ###
.venv/
### Python.VirtualEnv Stack ###
# Virtualenv
# http://iamzed.com/2009/05/07/a-primer-on-virtualenv/
[Bb]in
[Ii]nclude
[Ll]ib64
[Ll]ocal
[Ss]cripts
pyvenv.cfg
pip-selfcheck.json
### Windows ###
# Windows thumbnail cache files
Thumbs.db
ehthumbs.db
ehthumbs_vista.db
# Dump file
*.stackdump
# Folder config file
[Dd]esktop.ini
# Recycle Bin used on file shares
$RECYCLE.BIN/
# Windows Installer files
*.cab
*.msi
*.msix
*.msm
*.msp
# Windows shortcuts
*.lnk
.idea/
.vscode/
output/
exp/
data/
*.pyc
*.mp4
*.zip
# CenterFace
## Paper
[CenterFace: Joint Face Detection and Alignment Using Face as Point](https://arxiv.org/abs/1911.03599)
## Model Architecture
CenterFace is a face detection algorithm that uses the lightweight MobileNetV2 network as its backbone, combined with a Feature Pyramid Network (FPN), to perform anchor-free face detection.
![Architecture of the CenterFace](<Architecture of the CenterFace.png>)
## Algorithm
CenterFace is a single-stage face detector. Following the idea of CenterNet, it casts face detection as a center-point estimation problem: the face center is located on a heatmap, and the box size and five facial landmarks are regressed from that center point.
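As a rough illustration of this idea, the sketch below decodes one face from hypothetical network outputs. It is illustrative only, not the repository's actual decode code (that lives under `src/lib` and in the TensorRT engine code further below); the tensor layout and the exact parameterization of the size/landmark heads are assumptions.
```
def decode_one_face(heatmap, wh, offset, lm, y, x, down_ratio=4):
    """Illustrative only: recover a box and 5 landmarks from a peak (y, x)
    on the center heatmap. heatmap: (H, W); wh, offset: (2, H, W); lm: (10, H, W)."""
    cx = (x + offset[0, y, x]) * down_ratio      # refined center, in input-image scale
    cy = (y + offset[1, y, x]) * down_ratio
    w = wh[0, y, x] * down_ratio                 # box size regressed at the center
    h = wh[1, y, x] * down_ratio
    box = [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]
    # the five landmarks are regressed as offsets from the face center
    points = [(cx + lm[2 * k, y, x] * down_ratio,
               cy + lm[2 * k + 1, y, x] * down_ratio) for k in range(5)]
    return heatmap[y, x], box, points
```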
## Environment Setup
### Docker (Option 1)
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04.1-py38-latest
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
cd /path/workspace/
pip3 install -r requirements.txt
```
### Dockerfile (Option 2)
```
cd ./docker
docker build --no-cache -t centerface:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name centerface:latest bash
```
### Anaconda (Option 3)
1. The DCU-specific deep learning libraries required by this project can be downloaded from the Guanghe developer community: https://developer.hpccube.com/tool/
```
DTK software stack: dtk23.04.1
python:python3.8
torch:1.13.1
torchvision:0.14.1
```
`Tip: the versions of the DTK stack, Python, PyTorch and other DCU-related tools listed above must match each other exactly.`
2. Other, non-DCU-specific libraries can be installed directly from requirements.txt:
```
pip3 install -r requirements.txt
```
## Dataset
[WIDER_FACE](http://shuoyang1213.me/WIDERFACE/index.html)
![datasets](datasets.png)
Download the three packages marked by the red boxes in the image above and extract them, or download them directly via the links below:
[WIDER Face Training Images(Tencent Drive)](https://share.weiyun.com/5WjCBWV)
[WIDER Face Validation Images(Tencent Drive)](https://share.weiyun.com/5ot9Qv1)
[WIDER Face Testing Images(Tencent Drive)](https://share.weiyun.com/5vSUomP)
The annotation files use the COCO format and can be downloaded from Baidu Netdisk:
[Baidu](https://pan.baidu.com/s/1j_2wggZ3bvCuOAfZvjWqTg) (extraction code: f9hh)
After everything is extracted, the dataset directory structure is as follows:
```
├── WIDER_train
│ ├── images
├── WIDER_test
│ ├── images
├── WIDER_val
│ ├── images
├── annotations
│ ├── train_wider_face.json
│ ├── val_wider_face.json
```
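The COCO-style annotation files can be sanity-checked with a few lines of Python. This is a minimal sketch; it only assumes the standard COCO top-level keys `images` and `annotations`:
```
import json

# Quick sanity check of the COCO-format WIDER FACE annotations.
with open('annotations/train_wider_face.json') as f:
    coco = json.load(f)

print('images:', len(coco['images']))
print('annotations (faces):', len(coco['annotations']))
print('first image record:', coco['images'][0])
```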
## Training
### Single node, single GPU
```
cd ./src
bash train.sh
```
### Single node, multiple GPUs
```
cd ./src
bash train_multi.sh
```
## Inference
#### Single-GPU inference
```
cd ./src
python test_wider_face.py
```
## Results
![Result](draw_img.jpg)
### Accuracy
Results on the WIDER FACE validation set:
| Method | Easy | Medium | Hard|
|:--------:| :--------:| :---------:| :------:|
| ours(one scale) | 0.9264 | 0.9133 | 0.7479 |
| original | 0.922 | 0.911 | 0.782|
## Application Scenarios
### Algorithm Category
Face recognition
### Key Application Industries
Education, transportation, public security, healthcare
## Pretrained Weights
./models/model_best.pth
## Source Repository and Issue Reporting
https://developer.hpccube.com/codes/modelzoo/centerface-pytorch
## References
https://github.com/chenjun2hao/CenterFace.pytorch
import os.path as osp
import sys
def add_path(path):
if path not in sys.path:
sys.path.insert(0, path)
father_dir = osp.dirname(osp.dirname(__file__))
# Add lib to PYTHONPATH
lib_path = osp.join(father_dir, 'src', 'lib')
add_path(lib_path)
import os
import sys
import cv2
import numpy as np
import torch
import torch.nn as nn
# from config import cfg, update_config
from models.utils import _gather_feat, _tranpose_and_gather_feat
from tensorrt_model import TRTModel
from utils.image import get_affine_transform, transform_preds
class CenterNetTensorRTEngine(object):
def __init__(self, config, weight_file):
# update_config(cfg, config_file)
self.cfg = config
self.trtmodel = TRTModel(weight_file)
def preprocess(self, image, scale=1, meta=None):
height, width = image.shape[0:2]
new_height = int(height * scale)
new_width = int(width * scale)
self.mean = np.array(self.cfg.mean, dtype=np.float32).reshape(1, 1, 3)
self.std = np.array(self.cfg.std, dtype=np.float32).reshape(1, 1, 3)
if self.cfg.fix_res:
inp_height, inp_width = self.cfg.input_h, self.cfg.input_w
c = np.array([new_width / 2., new_height / 2.], dtype=np.float32)
s = max(height, width) * 1.0
else:
inp_height = (new_height | self.cfg.pad) + 1
inp_width = (new_width | self.cfg.pad) + 1
c = np.array([new_width // 2, new_height // 2], dtype=np.float32)
s = np.array([inp_width, inp_height], dtype=np.float32)
trans_input = get_affine_transform(c, s, 0, [inp_width, inp_height])
resized_image = cv2.resize(image, (new_width, new_height))
inp_image = cv2.warpAffine(
resized_image, trans_input, (inp_width, inp_height), flags=cv2.INTER_LINEAR)
inp_image = ((inp_image / 255. - self.mean) / self.std).astype(np.float32)
images = inp_image.transpose(2, 0, 1).reshape(1, 3, inp_height, inp_width)
if self.cfg.flip_test:
images = np.concatenate((images, images[:, :, :, ::-1]), axis=0)
meta = {'c': c, 's': s,
'out_height': inp_height // self.cfg.down_ratio,
'out_width': inp_width // self.cfg.down_ratio}
return np.ascontiguousarray(images), meta
def run(self, imgs):
        images, meta = self.preprocess(imgs)  # preprocess the image
trt_output = self.trtmodel(images) # tensorrt inference
predictions = self.postprocess(trt_output, meta)
return predictions
def _nms(self, heat, kernel=3):
pad = (kernel - 1) // 2
hmax = nn.functional.max_pool2d(
heat, (kernel, kernel), stride=1, padding=pad)
keep = (hmax == heat).float()
return heat * keep
def _topk(self, scores, K=40):
batch, cat, height, width = scores.size()
topk_scores, topk_inds = torch.topk(scores.view(batch, cat, -1), K)
topk_inds = topk_inds % (height * width)
topk_ys = (topk_inds / width).int().float()
topk_xs = (topk_inds % width).int().float()
topk_score, topk_ind = torch.topk(topk_scores.view(batch, -1), K)
topk_clses = (topk_ind / K).int()
topk_inds = _gather_feat(
topk_inds.view(batch, -1, 1), topk_ind).view(batch, K)
topk_ys = _gather_feat(topk_ys.view(batch, -1, 1), topk_ind).view(batch, K)
topk_xs = _gather_feat(topk_xs.view(batch, -1, 1), topk_ind).view(batch, K)
return topk_score, topk_inds, topk_clses, topk_ys, topk_xs
def _topk_channel(self, scores, K=40):
batch, cat, height, width = scores.size()
topk_scores, topk_inds = torch.topk(scores.view(batch, cat, -1), K)
topk_inds = topk_inds % (height * width)
topk_ys = (topk_inds / width).int().float()
topk_xs = (topk_inds % width).int().float()
return topk_scores, topk_inds, topk_ys, topk_xs
def multi_pose_decode(self,
heat, wh, kps, reg=None, hm_hp=None, hp_offset=None, K=100):
batch, cat, height, width = heat.size()
num_joints = kps.shape[1] // 2
# perform nms on heatmaps
heat = self._nms(heat)
scores, inds, clses, ys, xs = self._topk(heat, K=K)
kps = _tranpose_and_gather_feat(kps, inds)
kps = kps.view(batch, K, num_joints * 2)
kps[..., ::2] += xs.view(batch, K, 1).expand(batch, K, num_joints)
kps[..., 1::2] += ys.view(batch, K, 1).expand(batch, K, num_joints)
if reg is not None:
reg = _tranpose_and_gather_feat(reg, inds)
reg = reg.view(batch, K, 2)
xs = xs.view(batch, K, 1) + reg[:, :, 0:1]
ys = ys.view(batch, K, 1) + reg[:, :, 1:2]
else:
xs = xs.view(batch, K, 1) + 0.5
ys = ys.view(batch, K, 1) + 0.5
wh = _tranpose_and_gather_feat(wh, inds)
wh = wh.view(batch, K, 2)
clses = clses.view(batch, K, 1).float()
scores = scores.view(batch, K, 1)
bboxes = torch.cat([xs - wh[..., 0:1] / 2,
ys - wh[..., 1:2] / 2,
xs + wh[..., 0:1] / 2,
ys + wh[..., 1:2] / 2], dim=2)
if hm_hp is not None:
hm_hp = self._nms(hm_hp)
thresh = 0.1
kps = kps.view(batch, K, num_joints, 2).permute(
0, 2, 1, 3).contiguous() # b x J x K x 2
reg_kps = kps.unsqueeze(3).expand(batch, num_joints, K, K, 2)
hm_score, hm_inds, hm_ys, hm_xs = self._topk_channel(hm_hp, K=K) # b x J x K
if hp_offset is not None:
hp_offset = _tranpose_and_gather_feat(
hp_offset, hm_inds.view(batch, -1))
hp_offset = hp_offset.view(batch, num_joints, K, 2)
hm_xs = hm_xs + hp_offset[:, :, :, 0]
hm_ys = hm_ys + hp_offset[:, :, :, 1]
else:
hm_xs = hm_xs + 0.5
hm_ys = hm_ys + 0.5
mask = (hm_score > thresh).float()
hm_score = (1 - mask) * -1 + mask * hm_score
hm_ys = (1 - mask) * (-10000) + mask * hm_ys
hm_xs = (1 - mask) * (-10000) + mask * hm_xs
hm_kps = torch.stack([hm_xs, hm_ys], dim=-1).unsqueeze(
2).expand(batch, num_joints, K, K, 2)
dist = (((reg_kps - hm_kps) ** 2).sum(dim=4) ** 0.5)
min_dist, min_ind = dist.min(dim=3) # b x J x K
hm_score = hm_score.gather(2, min_ind).unsqueeze(-1) # b x J x K x 1
min_dist = min_dist.unsqueeze(-1)
min_ind = min_ind.view(batch, num_joints, K, 1, 1).expand(
batch, num_joints, K, 1, 2)
hm_kps = hm_kps.gather(3, min_ind)
hm_kps = hm_kps.view(batch, num_joints, K, 2)
l = bboxes[:, :, 0].view(batch, 1, K, 1).expand(batch, num_joints, K, 1)
t = bboxes[:, :, 1].view(batch, 1, K, 1).expand(batch, num_joints, K, 1)
r = bboxes[:, :, 2].view(batch, 1, K, 1).expand(batch, num_joints, K, 1)
b = bboxes[:, :, 3].view(batch, 1, K, 1).expand(batch, num_joints, K, 1)
mask = (hm_kps[..., 0:1] < l) + (hm_kps[..., 0:1] > r) + \
(hm_kps[..., 1:2] < t) + (hm_kps[..., 1:2] > b) + \
(hm_score < thresh) + (min_dist > (torch.max(b - t, r - l) * 0.3))
mask = (mask > 0).float().expand(batch, num_joints, K, 2)
kps = (1 - mask) * hm_kps + mask * kps
kps = kps.permute(0, 2, 1, 3).contiguous().view(
batch, K, num_joints * 2)
detections = torch.cat([bboxes, scores, kps, torch.transpose(hm_score.squeeze(dim=3), 1, 2)], dim=2)
return detections
def multi_pose_post_process(self, dets, c, s, h, w):
# dets: batch x max_dets x 40
# return list of 39 in image coord
ret = []
for i in range(dets.shape[0]):
bbox = transform_preds(dets[i, :, :4].reshape(-1, 2), c[i], s[i], (w, h))
pts = transform_preds(dets[i, :, 5:15].reshape(-1, 2), c[i], s[i], (w, h))
top_preds = np.concatenate(
[bbox.reshape(-1, 4), dets[i, :, 4:5],
pts.reshape(-1, 10), dets[i, :, 15:20]], axis=1).astype(np.float32).tolist()
ret.append({np.ones(1, dtype=np.int32)[0]: top_preds})
return ret
def post_process(self, dets, meta, scale=1):
dets = dets.detach().cpu().numpy().reshape(1, -1, dets.shape[2])
dets = self.multi_pose_post_process(
dets.copy(), [meta['c']], [meta['s']],
meta['out_height'], meta['out_width'])
for j in range(1, self.cfg.num_classes + 1):
dets[0][j] = np.array(dets[0][j], dtype=np.float32).reshape(-1, 20)
dets[0][j][:, :4] /= scale
dets[0][j][:, 5:] /= scale
return dets[0]
def postprocess(self, *args):
        hm, wh, hps, reg, hm_hp, hp_offset = args[0]
        meta = args[1]
hm = hm.sigmoid_()
hm_hp = hm_hp.sigmoid_()
detections = self.multi_pose_decode(hm, wh, hps, reg=reg, hm_hp=hm_hp, hp_offset=hp_offset, K=self.cfg.K)
dets = self.post_process(detections, meta, 1)
return dets
import logging
import os
import _init_paths
import cv2
import numpy as np
import onnxruntime as nxrun
import torch
from opts_pose import opts
from datasets.dataset_factory import get_dataset
from models.model import create_model, load_model
from utils.image import get_affine_transform
from detectors.detector_factory import detector_factory
logger = logging.getLogger(__name__)
class class_centernet(object):
def __init__(self, opt):
if opt.gpus[0] >= 0:
opt.device = torch.device('cuda')
else:
opt.device = torch.device('cpu')
print('Creating model...')
self.model = create_model(opt.arch, opt.heads, opt.head_conv)
self.model = load_model(self.model, opt.load_model)
self.model = self.model.to(opt.device)
self.model.eval()
self.mean = np.array(opt.mean, dtype=np.float32).reshape(1, 1, 3)
self.std = np.array(opt.std, dtype=np.float32).reshape(1, 1, 3)
self.max_per_image = 100
self.num_classes = opt.num_classes
self.scales = opt.test_scales
self.opt = opt
self.pause = True
def pre_process(self, image, scale, meta=None):
height, width = image.shape[0:2]
new_height = int(height * scale)
new_width = int(width * scale)
if self.opt.fix_res:
inp_height, inp_width = self.opt.input_h, self.opt.input_w
c = np.array([new_width / 2., new_height / 2.], dtype=np.float32)
s = max(height, width) * 1.0
else:
inp_height = (new_height | self.opt.pad) + 1
inp_width = (new_width | self.opt.pad) + 1
c = np.array([new_width // 2, new_height // 2], dtype=np.float32)
s = np.array([inp_width, inp_height], dtype=np.float32)
trans_input = get_affine_transform(c, s, 0, [inp_width, inp_height])
resized_image = cv2.resize(image, (new_width, new_height))
inp_image = cv2.warpAffine(
resized_image, trans_input, (inp_width, inp_height),
flags=cv2.INTER_LINEAR)
inp_image = ((inp_image / 255. - self.mean) / self.std).astype(np.float32)
images = inp_image.transpose(2, 0, 1).reshape(1, 3, inp_height, inp_width)
if self.opt.flip_test:
images = np.concatenate((images, images[:, :, :, ::-1]), axis=0)
images = torch.from_numpy(images)
meta = {'c': c, 's': s,
'out_height': inp_height // self.opt.down_ratio,
'out_width': inp_width // self.opt.down_ratio}
return images, meta
def main(opt):
# init model
os.environ['CUDA_VISIBLE_DEVICES'] = opt.gpus_str
Detector = detector_factory[opt.task]
detector = Detector(opt)
    debug = 0  # return the detection result without displaying it
    threshold = 0.05
    TASK = 'multi_pose'  # face detection with a box plus five landmarks
    input_h, input_w = 800, 800
MODEL_PATH = '/your/centerface/exp/multi_pose/mobilev2_10/model_best.pth'
    opt = opts().init('--task {} --load_model {} --debug {} --vis_thresh {} --input_h {} --input_w {}'.format(
        TASK, MODEL_PATH, debug, threshold, input_h, input_w).split(' '))
detector = detector_factory[opt.task](opt)
out_onnx_path = "../output/onnx_model/mobilev2_aspaper.onnx"
image = cv2.imread('../test_img/test.png')
torch_input, meta = detector.pre_process(image, scale=1)
torch_input = torch_input.cuda()
# pytorch output
torch_output = detector.model(torch_input)
torch.onnx.export(detector.model, torch_input, out_onnx_path, verbose=False)
sess = nxrun.InferenceSession(out_onnx_path)
print('save done')
input_name = sess.get_inputs()[0].name
output_onnx = sess.run(None, {input_name: torch_input.cpu().data.numpy()})
temp = 1
if __name__ == '__main__':
opt = opts().init()
main(opt)
import logging
import math
import os
import pickle
import time
import cv2
import numpy as np
import tensorrt as trt
import torch
import _init_paths
from torchvision import transforms
from opts_pose import opts
from centernet_tensorrt_engine import CenterNetTensorRTEngine
logger = logging.getLogger(__name__)
TRT_LOGGER = trt.Logger() # required by TensorRT
def build_engine(onnx_file_path, engine_file_path, precision, max_batch_size, cache_file=None):
"""Builds a new TensorRT engine and saves it, if no engine presents"""
if os.path.exists(engine_file_path):
logger.info('{} TensorRT engine already exists. Skip building engine...'.format(precision))
return
logger.info('Building {} TensorRT engine from onnx file...'.format(precision))
with trt.Builder(TRT_LOGGER) as b, b.create_network() as n, trt.OnnxParser(n, TRT_LOGGER) as p:
b.max_workspace_size = 1 << 30 # 1GB
b.max_batch_size = max_batch_size
if precision == 'fp16':
b.fp16_mode = True
elif precision == 'int8':
from ..calibrator import Calibrator
b.int8_mode = True
b.int8_calibrator = Calibrator(cache_file=cache_file)
elif precision == 'fp32':
pass
else:
logger.error('Engine precision not supported: {}'.format(precision))
raise NotImplementedError
# Parse model file
with open(onnx_file_path, 'rb') as model:
p.parse(model.read())
if p.num_errors:
logger.error('Parsing onnx file found {} errors.'.format(p.num_errors))
engine = b.build_cuda_engine(n)
print(engine_file_path)
with open(engine_file_path, "wb") as f:
f.write(engine.serialize())
def add_coco_bbox(image, bbox, conf=1):
txt = '{}{:.1f}'.format('person', conf)
font = cv2.FONT_HERSHEY_SIMPLEX
cv2.rectangle(image, (bbox[0], bbox[1]), (bbox[2], bbox[3]), (0, 255, 255), 2)
cv2.putText(image, txt, (bbox[0], bbox[1] - 2),
font, 0.5, (0, 255, 0), thickness=1, lineType=cv2.LINE_AA)
def add_coco_hp(image, points, keypoints_prob):
for j in range(5):
if keypoints_prob[j] > 0.5:
cv2.circle(image, (points[j, 0], points[j, 1]), 2, (255, 255, 0), -1)
return image
if __name__ == '__main__':
    # 0. build the TensorRT engine (convert the ONNX model to TensorRT)
# onnx_path = '../output/onnx_model/mobilev2_large.onnx'
trt_path = '../output/onnx_model/mobilev2.trt'
# build_engine(onnx_path, trt_path, 'fp32', 1)
    # print('build tensorrt engine done')
config = opts().init()
    # 1. load the TensorRT engine
body_engine = CenterNetTensorRTEngine(weight_file=trt_path, config=config)
    print('load tensorrt engine done')
# 2. video for the tracking
cap = cv2.VideoCapture('/your/path/xxx.mp4')
# 3. write the result image into video
if config.output_video:
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # for .mp4 output the codec needs to be mp4v
im_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
im_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
write_cap = cv2.VideoWriter(config.output_video, fourcc, 50, (im_width, im_height))
k = 1; start_time = time.time()
while cap.grab():
k += 1
ret, image = cap.retrieve() # Capture frame-by-frame
rgb_img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
detections = body_engine.run(rgb_img)[1]
print('fps is:{:.3f}'.format(k/(time.time() - start_time)))
for i, bbox in enumerate(detections):
if bbox[4] > 0.4:
body_bbox = np.array(bbox[:4], dtype=np.int32)
body_prob = bbox[4]
add_coco_bbox(image, body_bbox, body_prob)
body_pose = np.array(bbox[5:15], dtype=np.int32)
keypoints = np.array(body_pose, dtype=np.int32).reshape(5, 2)
keypoints_prob = bbox[15:]
image = add_coco_hp(image, keypoints, keypoints_prob)
# debug show
# cv2.imshow('image result', image)
# if cv2.waitKey(1) & 0xFF == ord('q'):
# break
# write into video
if config.output_video:
write_cap.write(image)
import atexit
import tensorrt as trt
import torch
def torch_dtype_to_trt(dtype):
if dtype == torch.int8:
return trt.int8
elif dtype == torch.int32:
return trt.int32
elif dtype == torch.float16:
return trt.float16
elif dtype == torch.float32:
return trt.float32
else:
raise TypeError('%s is not supported by tensorrt' % dtype)
def torch_dtype_from_trt(dtype):
if dtype == trt.int8:
return torch.int8
elif dtype == trt.int32:
return torch.int32
elif dtype == trt.float16:
return torch.float16
elif dtype == trt.float32:
return torch.float32
else:
raise TypeError('%s is not supported by torch' % dtype)
def torch_device_to_trt(device):
if device.type == torch.device('cuda').type:
return trt.TensorLocation.DEVICE
elif device.type == torch.device('cpu').type:
return trt.TensorLocation.HOST
else:
return TypeError('%s is not supported by tensorrt' % device)
def torch_device_from_trt(device):
if device == trt.TensorLocation.DEVICE:
return torch.device('cuda')
elif device == trt.TensorLocation.HOST:
return torch.device('cpu')
else:
return TypeError('%s is not supported by torch' % device)
class TRTModel(object):
def __init__(self, engine_path, input_names=None, output_names=None, final_shapes=None):
# load engine
self.logger = trt.Logger()
self.runtime = trt.Runtime(self.logger)
with open(engine_path, 'rb') as f:
self.engine = self.runtime.deserialize_cuda_engine(f.read())
self.context = self.engine.create_execution_context()
if input_names is None:
self.input_names = self._trt_input_names()
else:
self.input_names = input_names
if output_names is None:
self.output_names = self._trt_output_names()
else:
self.output_names = output_names
self.final_shapes = final_shapes
def _input_binding_indices(self):
return [i for i in range(self.engine.num_bindings) if self.engine.binding_is_input(i)]
def _output_binding_indices(self):
return [i for i in range(self.engine.num_bindings) if not self.engine.binding_is_input(i)]
def _trt_input_names(self):
return [self.engine.get_binding_name(i) for i in self._input_binding_indices()]
def _trt_output_names(self):
return [self.engine.get_binding_name(i) for i in self._output_binding_indices()]
def create_output_buffers(self, batch_size):
outputs = [None] * len(self.output_names)
for i, output_name in enumerate(self.output_names):
idx = self.engine.get_binding_index(output_name)
dtype = torch_dtype_from_trt(self.engine.get_binding_dtype(idx))
if self.final_shapes is not None:
shape = (batch_size, ) + self.final_shapes[i]
else:
shape = (batch_size, ) + tuple(self.engine.get_binding_shape(idx))
device = torch_device_from_trt(self.engine.get_location(idx))
output = torch.empty(size=shape, dtype=dtype, device=device)
outputs[i] = output
return outputs
def execute(self, *inputs):
batch_size = inputs[0].shape[0]
bindings = [None] * (len(self.input_names) + len(self.output_names))
# map input bindings
inputs_torch = [None] * len(self.input_names)
for i, name in enumerate(self.input_names):
idx = self.engine.get_binding_index(name)
# convert to appropriate format
inputs_torch[i] = torch.from_numpy(inputs[i])
inputs_torch[i] = inputs_torch[i].to(torch_device_from_trt(self.engine.get_location(idx)))
inputs_torch[i] = inputs_torch[i].type(torch_dtype_from_trt(self.engine.get_binding_dtype(idx)))
bindings[idx] = int(inputs_torch[i].data_ptr())
output_buffers = self.create_output_buffers(batch_size)
# map output bindings
for i, name in enumerate(self.output_names):
idx = self.engine.get_binding_index(name)
bindings[idx] = int(output_buffers[i].data_ptr())
self.context.execute(batch_size, bindings)
outputs = [buffer for buffer in output_buffers]
return outputs
def __call__(self, *inputs):
return self.execute(*inputs)
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04-py38-latest
RUN source /opt/dtk/env.sh
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
ADD https://ultralytics.com/assets/Arial.ttf /root/.config/Ultralytics/
# WiderFace-Eval-Python
Evaluation code for the WIDER FACE validation set.
1. Generate prediction results
Generate prediction results laid out as follows:
```
0--Parade
0_Parade_marchingband_1_20.txt
0_Parade_marchingband_1_74.txt
...
1--Handshaking
```
Here, 0--Parade is one of the scene folders (WIDER FACE has 61 scenes in total), and 0_Parade_marchingband_1_20.txt holds the predictions for the corresponding image, in the following format:
```
image_name
the number of faces   # how many faces were detected
x, y, w, h, confidence   # x and y are the coordinates of the top-left corner of the box
```
For example:
```
0_Parade_marchingband_1_309.jpg
536
499.62817 73.10439 34.215393 38.730423 0.93176836
47.55735 86.14974 21.215218 25.779213 0.7041396
```
2. Download the ground truth data
Download from the official site to obtain these four files: `wider_easy_val.mat, wider_face_val.mat, wider_hard_val.mat, wider_medium_val.mat`
The official download can be slow; the files can also be downloaded from here: https://pan.baidu.com/s/1AErRlTlYaok6p7OGV7VShQ
3. Build the evaluation tools
Run the command
bash make.sh
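The contents of make.sh are not shown here; given the `setup.py` included with this evaluation code, it presumably just compiles the Cython `bbox_overlaps` extension, which can also be done manually with:
```
python3 setup.py build_ext --inplace
```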
4. Compute AP
```
python3 evaluation.py -p <your prediction dir> -g <ground truth dir>  # evaluate the easy, medium, and hard subsets separately
```
```
python3 evaluation.py -p <your prediction dir> -g <ground truth dir> --all  # evaluate easy, medium, and hard together
```
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Sergey Karayev
# --------------------------------------------------------
cimport cython
import numpy as np
cimport numpy as np
DTYPE = np.float64
ctypedef np.float_t DTYPE_t
def bbox_overlaps(
np.ndarray[DTYPE_t, ndim=2] boxes,
np.ndarray[DTYPE_t, ndim=2] query_boxes):
"""
Parameters
----------
boxes: (N, 4) ndarray of float
query_boxes: (K, 4) ndarray of float
Returns
-------
overlaps: (N, K) ndarray of overlap between boxes and query_boxes
"""
cdef unsigned int N = boxes.shape[0]
cdef unsigned int K = query_boxes.shape[0]
cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE)
cdef DTYPE_t iw, ih, box_area
cdef DTYPE_t ua
cdef unsigned int k, n
for k in range(K):
box_area = (
(query_boxes[k, 2] - query_boxes[k, 0] + 1) *
(query_boxes[k, 3] - query_boxes[k, 1] + 1)
)
for n in range(N):
iw = (
min(boxes[n, 2], query_boxes[k, 2]) -
max(boxes[n, 0], query_boxes[k, 0]) + 1
)
if iw > 0:
ih = (
min(boxes[n, 3], query_boxes[k, 3]) -
max(boxes[n, 1], query_boxes[k, 1]) + 1
)
if ih > 0:
ua = float(
(boxes[n, 2] - boxes[n, 0] + 1) *
(boxes[n, 3] - boxes[n, 1] + 1) +
box_area - iw * ih
)
overlaps[n, k] = iw * ih / ua
return overlaps
# -*-coding:utf-8-*-
from __future__ import division
"""
WiderFace evaluation code
author: wondervictor
mail: tianhengcheng@gmail.com
copyright@wondervictor
"""
import os
import tqdm
import pickle
import argparse
import numpy as np
from scipy.io import loadmat
from bbox import bbox_overlaps
from IPython import embed
def get_gt_boxes(gt_dir):
""" gt dir: (wider_face_val.mat, wider_easy_val.mat, wider_medium_val.mat, wider_hard_val.mat)"""
gt_mat = loadmat(os.path.join(gt_dir, 'wider_face_val.mat'))
hard_mat = loadmat(os.path.join(gt_dir, 'wider_hard_val.mat'))
medium_mat = loadmat(os.path.join(gt_dir, 'wider_medium_val.mat'))
easy_mat = loadmat(os.path.join(gt_dir, 'wider_easy_val.mat'))
facebox_list = gt_mat['face_bbx_list']
event_list = gt_mat['event_list']
file_list = gt_mat['file_list']
hard_gt_list = hard_mat['gt_list']
medium_gt_list = medium_mat['gt_list']
easy_gt_list = easy_mat['gt_list']
return facebox_list, event_list, file_list, hard_gt_list, medium_gt_list, easy_gt_list
def get_gt_boxes_from_txt(gt_path, cache_dir):
cache_file = os.path.join(cache_dir, 'gt_cache.pkl')
if os.path.exists(cache_file):
f = open(cache_file, 'rb')
boxes = pickle.load(f)
f.close()
return boxes
f = open(gt_path, 'r')
state = 0
lines = f.readlines()
lines = list(map(lambda x: x.rstrip('\r\n'), lines))
boxes = {}
f.close()
current_boxes = []
current_name = None
for line in lines:
if state == 0 and '--' in line:
state = 1
current_name = line
continue
if state == 1:
state = 2
continue
if state == 2 and '--' in line:
state = 1
boxes[current_name] = np.array(current_boxes).astype('float32')
current_name = line
current_boxes = []
continue
if state == 2:
box = [float(x) for x in line.split(' ')[:4]]
current_boxes.append(box)
continue
f = open(cache_file, 'wb')
pickle.dump(boxes, f)
f.close()
return boxes
def read_pred_file(filepath):
with open(filepath, 'r') as f:
lines = f.readlines()
img_file = lines[0].rstrip('\n\r')
lines = lines[2:]
boxes = np.array(list(map(lambda x: [float(a) for a in x.rstrip(
'\r\n').split(' ')], lines))).astype('float')
return img_file.split('/')[-1], boxes
def get_preds(pred_dir):
events = os.listdir(pred_dir)
boxes = dict()
pbar = tqdm.tqdm(events)
for event in pbar:
pbar.set_description('Reading Predictions ')
event_dir = os.path.join(pred_dir, event)
event_images = os.listdir(event_dir)
current_event = dict()
for imgtxt in event_images:
imgname, _boxes = read_pred_file(os.path.join(event_dir, imgtxt))
current_event[imgname.rstrip('.jpg')] = _boxes
boxes[event] = current_event
return boxes
def norm_score(pred):
""" norm score
pred {key: [[x1,y1,x2,y2,s]]}
"""
max_score = 0
min_score = 1
for _, k in pred.items():
for _, v in k.items():
if len(v) == 0:
continue
_min = np.min(v[:, -1])
_max = np.max(v[:, -1])
max_score = max(_max, max_score)
min_score = min(_min, min_score)
diff = max_score - min_score
for _, k in pred.items():
for _, v in k.items():
if len(v) == 0:
continue
v[:, -1] = (v[:, -1] - min_score)/diff
def image_eval(pred, gt, ignore, iou_thresh):
""" single image evaluation
pred: Nx5
gt: Nx4
ignore:
"""
_pred = pred.copy()
_gt = gt.copy()
pred_recall = np.zeros(_pred.shape[0])
recall_list = np.zeros(_gt.shape[0])
proposal_list = np.ones(_pred.shape[0])
_pred[:, 2] = _pred[:, 2] + _pred[:, 0]
_pred[:, 3] = _pred[:, 3] + _pred[:, 1]
_gt[:, 2] = _gt[:, 2] + _gt[:, 0]
_gt[:, 3] = _gt[:, 3] + _gt[:, 1]
overlaps = bbox_overlaps(_pred[:, :4], _gt)
for h in range(_pred.shape[0]):
gt_overlap = overlaps[h]
max_overlap, max_idx = gt_overlap.max(), gt_overlap.argmax()
if max_overlap >= iou_thresh:
if ignore[max_idx] == 0:
recall_list[max_idx] = -1
proposal_list[h] = -1
elif recall_list[max_idx] == 0:
recall_list[max_idx] = 1
r_keep_index = np.where(recall_list == 1)[0]
pred_recall[h] = len(r_keep_index)
return pred_recall, proposal_list
def img_pr_info(thresh_num, pred_info, proposal_list, pred_recall):
pr_info = np.zeros((thresh_num, 2)).astype('float')
for t in range(thresh_num):
thresh = 1 - (t+1)/thresh_num
r_index = np.where(pred_info[:, 4] >= thresh)[0]
if len(r_index) == 0:
pr_info[t, 0] = 0
pr_info[t, 1] = 0
else:
r_index = r_index[-1]
p_index = np.where(proposal_list[:r_index+1] == 1)[0]
pr_info[t, 0] = len(p_index)
pr_info[t, 1] = pred_recall[r_index]
return pr_info
def dataset_pr_info(thresh_num, pr_curve, count_face):
_pr_curve = np.zeros((thresh_num, 2))
for i in range(thresh_num):
_pr_curve[i, 0] = pr_curve[i, 1] / pr_curve[i, 0]
_pr_curve[i, 1] = pr_curve[i, 1] / count_face
return _pr_curve
def voc_ap(rec, prec):
# correct AP calculation
# first append sentinel values at the end
mrec = np.concatenate(([0.], rec, [1.]))
mpre = np.concatenate(([0.], prec, [0.]))
# compute the precision envelope
for i in range(mpre.size - 1, 0, -1):
mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
# to calculate area under PR curve, look for points
# where X axis (recall) changes value
i = np.where(mrec[1:] != mrec[:-1])[0]
# and sum (\Delta recall) * prec
ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
return ap
def evaluation(pred, gt_path, all, iou_thresh=0.4):
pred = get_preds(pred)
norm_score(pred)
facebox_list, event_list, file_list, hard_gt_list, medium_gt_list, easy_gt_list = get_gt_boxes(
gt_path)
event_num = len(event_list)
thresh_num = 1000
settings = ['easy', 'medium', 'hard']
setting_gts = [easy_gt_list, medium_gt_list, hard_gt_list]
if not all:
aps = []
for setting_id in range(3):
# different setting
gt_list = setting_gts[setting_id]
count_face = 0
pr_curve = np.zeros((thresh_num, 2)).astype('float')
# [hard, medium, easy]
pbar = tqdm.tqdm(range(event_num)) # 61
error_count = 0
for i in pbar:
pbar.set_description(
'Processing {}'.format(settings[setting_id]))
event_name = str(event_list[i][0][0])
img_list = file_list[i][0]
pred_list = pred[event_name]
sub_gt_list = gt_list[i][0]
# print("shape of sub_gt_list is: ",sub_gt_list.shape)
gt_bbx_list = facebox_list[i][0]
for j in range(len(img_list)):
try:
pred_info = pred_list[str(img_list[j][0][0])]
except:
error_count += 1
continue
gt_boxes = gt_bbx_list[j][0].astype('float')
keep_index = sub_gt_list[j][0]
count_face += len(keep_index)
if len(gt_boxes) == 0 or len(pred_info) == 0:
continue
ignore = np.zeros(gt_boxes.shape[0])
if len(keep_index) != 0:
ignore[keep_index-1] = 1
pred_recall, proposal_list = image_eval(
pred_info, gt_boxes, ignore, iou_thresh)
_img_pr_info = img_pr_info(
thresh_num, pred_info, proposal_list, pred_recall)
pr_curve += _img_pr_info
print("error_count is: ", error_count)
pr_curve = dataset_pr_info(thresh_num, pr_curve, count_face)
propose = pr_curve[:, 0]
recall = pr_curve[:, 1]
ap = voc_ap(recall, propose)
aps.append(ap)
print("==================== Results ====================")
print("Easy Val AP: {}".format(aps[0]))
print("Medium Val AP: {}".format(aps[1]))
print("Hard Val AP: {}".format(aps[2]))
print("=================================================")
else:
aps = []
# different setting
count_face = 0
pr_curve = np.zeros((thresh_num, 2)).astype(
            'float')  # accumulates counts over all samples
# [hard, medium, easy]
pbar = tqdm.tqdm(range(event_num))
error_count = 0
for i in pbar:
pbar.set_description('Processing {}'.format("all"))
# print("event_list is: ",event_list)
# '0--Parade', '1--Handshaking'
event_name = str(event_list[i][0][0])
img_list = file_list[i][0]
            pred_list = pred[event_name]  # all detection results for this event folder
sub_gt_list = [setting_gts[0][i][0],
setting_gts[1][i][0], setting_gts[2][i][0]]
gt_bbx_list = facebox_list[i][0]
for j in range(len(img_list)):
try:
                    # str(img_list[j][0][0]) is the image name under each event folder
pred_info = pred_list[str(img_list[j][0][0])]
except:
error_count += 1
continue
gt_boxes = gt_bbx_list[j][0].astype('float')
temp_i = []
for ii in range(3):
if len(sub_gt_list[ii][j][0]) != 0:
temp_i.append(ii)
if len(temp_i) != 0:
keep_index = np.concatenate(
tuple([sub_gt_list[xx][j][0] for xx in temp_i]))
else:
keep_index = []
count_face += len(keep_index)
if len(gt_boxes) == 0 or len(pred_info) == 0:
continue
ignore = np.zeros(gt_boxes.shape[0]) # no ignore
if len(keep_index) != 0:
ignore[keep_index-1] = 1
pred_recall, proposal_list = image_eval(
pred_info, gt_boxes, ignore, iou_thresh)
_img_pr_info = img_pr_info(
thresh_num, pred_info, proposal_list, pred_recall)
pr_curve += _img_pr_info
pr_curve = dataset_pr_info(thresh_num, pr_curve, count_face)
propose = pr_curve[:, 0]
recall = pr_curve[:, 1]
ap = voc_ap(recall, propose)
aps.append(ap)
print("==================== Results ====================")
print("All Val AP: {}".format(aps[0]))
print("=================================================")
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('-p', '--pred', default='../output/widerface')
parser.add_argument('-g', '--gt', default='./ground_truth')
parser.add_argument(
'--all', help='if test all together', action='store_true')
args = parser.parse_args()
evaluation(args.pred, args.gt, args.all)
"""
WiderFace evaluation code
author: wondervictor
mail: tianhengcheng@gmail.com
copyright@wondervictor
"""
from distutils.core import setup, Extension
from Cython.Build import cythonize
import numpy
package = Extension('bbox', ['box_overlaps.pyx'], include_dirs=[numpy.get_include()])
setup(ext_modules=cythonize([package]))