Commit 2887714a authored by zk

Add README

parent ca23112b
......@@ -169,14 +169,11 @@ checkpoints/
__pycache__/
*.pyc
.ipynb_checkpoints/
build/
# build/
dist/
# Ignore local images or datasets (adjust to your actual folder names)
data/
images/
*.jpg
*.png
# Ignore IDE configuration files
.vscode/
......
# GroundingDINO Inference & Deployment

This project covers native PyTorch inference for GroundingDINO as well as an inference deployment pipeline based on ONNX Runtime (ORT), and provides custom acceleration and quantization schemes for the `ms_deform_attn` operator.
## 1. Environment Setup

The environment configuration follows the [official GroundingDINO repository](https://github.com/IDEA-Research/GroundingDINO).

### 1.1 Base Environment Variables

Before building and running, activate the relevant compute-stack environment and configure the HuggingFace mirror:
```bash
source /opt/dtk/cuda/env.sh
export HF_ENDPOINT=https://hf-mirror.com
```
### 1.2 Build the GroundingDINO Library

Enter the project directory and build and install in editable mode:
```bash
cd GroundingDINO/
pip3 install -e . --no-build-isolation
```
> **💡 Note: NumPy version compatibility**
> If you hit runtime errors caused by a too-new `numpy`, force-reinstall the pinned version:
> ```bash
> python3 -m pip install numpy==1.26.4 --force-reinstall
> ```
### 1.3 Model Download

Create a `weights` folder and download the model weights:
```bash
mkdir weights
cd weights
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth
cd ..
```
---
## 2. Pre-Inference Configuration

Exporting the ONNX model changes the model's input structure, so before running inference you **must manually swap the underlying file** to match the chosen backend.

Target file path: `groundingdino/models/GroundingDINO/groundingdino.py`

* **For PyTorch inference**:
  Copy the contents of `groundingdino_torch.py` over the target file above.
* **For ONNX Runtime export and inference**:
  Copy the contents of `groundingdino_onnx.py` over the target file above (see the sketch after this list).
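A minimal sketch of the swap, assuming `groundingdino_torch.py` and `groundingdino_onnx.py` sit in the repository root (back up the original first):
```bash
# Keep a backup of the original implementation
cp groundingdino/models/GroundingDINO/groundingdino.py groundingdino/models/GroundingDINO/groundingdino.py.bak

# PyTorch mode:
cp groundingdino_torch.py groundingdino/models/GroundingDINO/groundingdino.py

# ONNX Runtime mode (instead of the above):
# cp groundingdino_onnx.py groundingdino/models/GroundingDINO/groundingdino.py
```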
---
## 3. Native PyTorch Inference

After switching the code to Torch mode as described above, run the test script directly:
```bash
nvidia-smi
bash infer_test.sh
```
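For reference, minimal Python usage of the upstream `groundingdino.util.inference` API looks like this (paths and prompt are placeholders to adjust):
```python
import cv2
from groundingdino.util.inference import annotate, load_image, load_model, predict

# Placeholder paths/prompt -- adjust to your setup
model = load_model("groundingdino/config/GroundingDINO_SwinT_OGC.py", "weights/groundingdino_swint_ogc.pth")
IMAGE_PATH = "weights/dog-3.jpeg"
TEXT_PROMPT = "chair . person . dog ."
BOX_THRESHOLD = 0.35
TEXT_THRESHOLD = 0.25

image_source, image = load_image(IMAGE_PATH)
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption=TEXT_PROMPT,
    box_threshold=BOX_THRESHOLD,
    text_threshold=TEXT_THRESHOLD,
)
annotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
cv2.imwrite("annotated_image.jpg", annotated_frame)
```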
---
## 4. Standard ONNX Runtime Inference

Make sure the code has been switched to ONNX mode first. This test runs 5 warmup rounds followed by 10 timed rounds.
**Step 1: Export the ONNX model**
```bash
python export_onnx.py
```
**Step 2: Run ORT inference**
```bash
python onnx_inference1.py
```
> **📝 Parameter notes**:
> Before running, fill in the corresponding `.onnx` model path and image path in `onnx_inference1.py`. The non-image Text/Mask inputs are generated and filled in automatically (they can be produced with `get_caption_mask.py`). A timing sketch of this step follows below.
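For illustration only, a minimal ORT timing harness matching the 5-warmup / 10-timed protocol might look like the following; the model path, input name, and shape are assumptions, and the real exported model also expects the text/mask tensors produced by `get_caption_mask.py`:
```python
import time

import numpy as np
import onnxruntime as ort

# Hypothetical path/input name -- replace with the values used by onnx_inference1.py;
# the actual model also takes the text/mask inputs, omitted here for brevity.
session = ort.InferenceSession("weights/ground.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
inputs = {"img": np.random.randn(1, 3, 800, 1200).astype(np.float32)}

for _ in range(5):  # 5 warmup rounds
    session.run(None, inputs)

start = time.perf_counter()
for _ in range(10):  # 10 timed rounds
    session.run(None, inputs)
avg_ms = (time.perf_counter() - start) / 10 * 1000
print(f"avg latency: {avg_ms:.2f} ms, FPS: {1000 / avg_ms:.2f}")
```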
---
## 5. Advanced: ORT Inference with the Custom Operator (ms_deform_attn)

To further optimize performance, we implemented a custom `ms_deform_attn` operator and provide several optimization variants (including FP16).

### 5.1 Build the Custom Operator

Pick the operator implementation directory that matches your needs (supported variants: `ort_plugin`, `ort_plugin_fp16`, `ort_plugin_fp16_B`, `ort_plugin_fp16_C`).

Taking `ort_plugin` as an example:
```bash
cd ort_plugin
mkdir build && cd build
cmake ..
make
```
After a successful build, the shared library `libms_deform_attn_ort.so` is generated in that directory.
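For reference, registering such a library with ONNX Runtime typically looks like the sketch below; the paths are assumptions, and the provided inference scripts handle this internally:
```python
import onnxruntime as ort

sess_options = ort.SessionOptions()
# Register the custom ms_deform_attn kernel built in step 5.1
sess_options.register_custom_ops_library("./ort_plugin/build/libms_deform_attn_ort.so")

session = ort.InferenceSession(
    "weights/ground_deform.onnx",  # hypothetical path to the ONNX exported in 5.2
    sess_options,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```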
### 5.2 Export and Optimize the Model

Enter the deform inference workspace:
```bash
cd deform_ort
```
**1. Export the ONNX model with the custom operator:**
```bash
python export_onnx_deform.py
```
**2. Model simplification and quantization:**
```bash
python onnx_optimize.py
```
> This script outputs two quantized ONNX models:
> - one that **skips** the custom operator;
> - one that **keeps** the custom operator (to be used with the non-FP16 `.so` library).
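For illustration, skipping a custom operator during dynamic quantization is commonly done by excluding its nodes; this is only a sketch of the general pattern (file and node names are assumptions, not the actual contents of `onnx_optimize.py`):
```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Quantize weights to INT8 while leaving the custom ms_deform_attn
# nodes untouched; the node name below is a hypothetical placeholder.
quantize_dynamic(
    model_input="weights/ground_simplified1.onnx",
    model_output="weights/ground_int8.onnx",
    weight_type=QuantType.QInt8,
    nodes_to_exclude=["MultiScaleDeformableAttn_0"],
)
```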
### 5.3 Run Inference with the Custom Operator

Finally, run the optimized inference script (5 warmup rounds, 10 timed rounds):
```bash
python onnx_inference_deform_optim.py
```
> **📝 Parameter notes**:
> Before running, make sure to fill in the **ONNX model path** and the path to the **custom-operator `.so` library built in step 5.1** in the code (see the registration sketch in 5.1).
---
## 6. Benchmark Results

Single BW150 card, 800x1200 image input, batch size = 1.

| Inference mode | Optimization / precision | Warmup runs | Timed runs | Avg latency (ms) | FPS |
| :--- | :--- | :---: | :---: | :---: | :--- |
| PyTorch | FP32 (base) | 5 | 10 | 144.25 | 6.93 |
| ORT | Standard ONNX | 5 | 10 | 173.66 | 5.76 |
| ORT + Deform plugin | Shared library (ort_plugin) | 5 | 10 | 121.67 | 8.22 |
| ORT + Deform plugin | FP16 mixed-precision quantization (ort_plugin) | 5 | 10 | 95.17 | 10.50 |
| ORT + Deform plugin | Pure FP16, variant B (ort_plugin_fp16_B) | 5 | 10 | 87.34 | 11.44 |
| ORT + Deform plugin | Pure FP16, variant C (ort_plugin_fp16_C) | 5 | 10 | 84.52 | 11.82 |
\ No newline at end of file
{
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.6.0.dev0",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
......@@ -26,7 +26,7 @@ def load_model(model_config_path, model_checkpoint_path, cpu_only=False):
# Load the model
model = load_model(config_file, checkpoint_path, cpu_only=True)
# Prompt used for actual inference, plus the related mask
# Prompt used for actual inference, plus the related mask; it can be generated ahead of time with get_caption_mask.py
caption = "car ."
input_ids = model.tokenizer([caption], return_tensors="pt")["input_ids"]
position_ids = torch.tensor([[0, 0, 1, 0]])
......@@ -39,10 +39,10 @@ text_token_mask = torch.tensor([[[True, False, False, False],
# Fixed input resolution
img = torch.randn(1, 3, 800, 1200)
img = torch.randn(1, 3, 400, 600)
# img = torch.randn(1, 3, 400, 600)
# Export the original ONNX model
onnx_output_path = "weights_400x600/ground.onnx"
onnx_output_path = "weights/ground.onnx"
simplified_onnx_path = "weights/ground_simplified1.onnx"
......