Commit 2887714a authored by zk

Add README

parent ca23112b
......@@ -169,14 +169,11 @@ checkpoints/
__pycache__/
*.pyc
.ipynb_checkpoints/
build/
# build/
dist/
# Ignore local images or datasets (adjust to your actual folder names)
data/
images/
*.jpg
*.png
# Ignore IDE configuration files
.vscode/
......
# GroundingDINO Inference & Deployment

This project covers native PyTorch inference for GroundingDINO as well as an inference deployment pipeline based on ONNX Runtime (ORT), and provides custom acceleration and quantization schemes for the `ms_deform_attn` operator.
## 1. Environment Setup

The environment configuration follows the [official GroundingDINO repository](https://github.com/IDEA-Research/GroundingDINO).

### 1.1 Base Environment Variables

Before building and running, activate the relevant compute-stack environment and configure the HuggingFace mirror:
```bash
source /opt/dtk/cuda/env.sh
export HF_ENDPOINT=https://hf-mirror.com
```
### 1.2 Build the GroundingDINO Library

Enter the project directory and build and install in editable mode:
```bash
cd GroundingDINO/
pip3 install -e . --no-build-isolation
```
> **💡 Note: NumPy version compatibility**
> If you hit runtime errors caused by a too-new `numpy`, force-reinstall the pinned version:
> ```bash
> python3 -m pip install numpy==1.26.4 --force-reinstall
> ```
### 1.3 Model Download

Create a `weights` folder and download the model weights:
```bash
mkdir weights
cd weights
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth
cd ..
```
---
## 2. Pre-Inference Configuration

Exporting the ONNX model changes the model's input structure, so before running inference you **must manually swap the underlying file** to match the chosen backend.

Target file path: `groundingdino/models/GroundingDINO/groundingdino.py`

* **For PyTorch inference**:
  Copy the contents of `groundingdino_torch.py` over the target file above.
* **For ONNX Runtime export and inference**:
  Copy the contents of `groundingdino_onnx.py` over the target file above (see the sketch after this list).
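A minimal sketch of the swap, assuming `groundingdino_torch.py` and `groundingdino_onnx.py` sit in the repository root (back up the original first):
```bash
# Keep a backup of the original implementation
cp groundingdino/models/GroundingDINO/groundingdino.py groundingdino/models/GroundingDINO/groundingdino.py.bak

# PyTorch mode:
cp groundingdino_torch.py groundingdino/models/GroundingDINO/groundingdino.py

# ONNX Runtime mode (instead of the above):
# cp groundingdino_onnx.py groundingdino/models/GroundingDINO/groundingdino.py
```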
---
## 3. Native PyTorch Inference

After switching the code to Torch mode as described above, run the test script directly:
```bash
nvidia-smi
bash infer_test.sh
```
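For reference, minimal Python usage of the upstream `groundingdino.util.inference` API looks like this (paths and prompt are placeholders to adjust):
```python
import cv2
from groundingdino.util.inference import annotate, load_image, load_model, predict

# Placeholder paths/prompt -- adjust to your setup
model = load_model("groundingdino/config/GroundingDINO_SwinT_OGC.py", "weights/groundingdino_swint_ogc.pth")
IMAGE_PATH = "weights/dog-3.jpeg"
TEXT_PROMPT = "chair . person . dog ."
BOX_THRESHOLD = 0.35
TEXT_THRESHOLD = 0.25

image_source, image = load_image(IMAGE_PATH)
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption=TEXT_PROMPT,
    box_threshold=BOX_THRESHOLD,
    text_threshold=TEXT_THRESHOLD,
)
annotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
cv2.imwrite("annotated_image.jpg", annotated_frame)
```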
---
## 4. Standard ONNX Runtime Inference

Make sure the code has been switched to ONNX mode first. This test runs 5 warmup rounds followed by 10 timed rounds.
**Step 1: Export the ONNX model**
```bash
python export_onnx.py
```
**Step 2: Run ORT inference**
```bash
python onnx_inference1.py
```
> **📝 Parameter notes**:
> Before running, fill in the corresponding `.onnx` model path and image path in `onnx_inference1.py`. The non-image Text/Mask inputs are generated and filled in automatically (they can be produced with `get_caption_mask.py`). A timing sketch of this step follows below.
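For illustration only, a minimal ORT timing harness matching the 5-warmup / 10-timed protocol might look like the following; the model path, input name, and shape are assumptions, and the real exported model also expects the text/mask tensors produced by `get_caption_mask.py`:
```python
import time

import numpy as np
import onnxruntime as ort

# Hypothetical path/input name -- replace with the values used by onnx_inference1.py;
# the actual model also takes the text/mask inputs, omitted here for brevity.
session = ort.InferenceSession("weights/ground.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
inputs = {"img": np.random.randn(1, 3, 800, 1200).astype(np.float32)}

for _ in range(5):  # 5 warmup rounds
    session.run(None, inputs)

start = time.perf_counter()
for _ in range(10):  # 10 timed rounds
    session.run(None, inputs)
avg_ms = (time.perf_counter() - start) / 10 * 1000
print(f"avg latency: {avg_ms:.2f} ms, FPS: {1000 / avg_ms:.2f}")
```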
---
## 5. Advanced: ORT Inference with the Custom Operator (ms_deform_attn)

To further optimize performance, we implemented a custom `ms_deform_attn` operator and provide several optimization variants (including FP16).

### 5.1 Build the Custom Operator

Pick the operator implementation directory that matches your needs (supported variants: `ort_plugin`, `ort_plugin_fp16`, `ort_plugin_fp16_B`, `ort_plugin_fp16_C`).

Taking `ort_plugin` as an example:
```bash
cd ort_plugin
mkdir build && cd build
cmake ..
make
```
After a successful build, the shared library `libms_deform_attn_ort.so` is generated in that directory.
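For reference, registering such a library with ONNX Runtime typically looks like the sketch below; the paths are assumptions, and the provided inference scripts handle this internally:
```python
import onnxruntime as ort

sess_options = ort.SessionOptions()
# Register the custom ms_deform_attn kernel built in step 5.1
sess_options.register_custom_ops_library("./ort_plugin/build/libms_deform_attn_ort.so")

session = ort.InferenceSession(
    "weights/ground_deform.onnx",  # hypothetical path to the ONNX exported in 5.2
    sess_options,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```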
### 5.2 Export and Optimize the Model

Enter the deform inference workspace:
```bash
cd deform_ort
```
**1. Export the ONNX model with the custom operator:**
```bash
python export_onnx_deform.py
```
**2. Model simplification and quantization:**
```bash
python onnx_optimize.py
```
> This script outputs two quantized ONNX models:
> - one that **skips** the custom operator;
> - one that **keeps** the custom operator (to be used with the non-FP16 `.so` library).
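For illustration, skipping a custom operator during dynamic quantization is commonly done by excluding its nodes; this is only a sketch of the general pattern (file and node names are assumptions, not the actual contents of `onnx_optimize.py`):
```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Quantize weights to INT8 while leaving the custom ms_deform_attn
# nodes untouched; the node name below is a hypothetical placeholder.
quantize_dynamic(
    model_input="weights/ground_simplified1.onnx",
    model_output="weights/ground_int8.onnx",
    weight_type=QuantType.QInt8,
    nodes_to_exclude=["MultiScaleDeformableAttn_0"],
)
```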
### 5.3 Run Inference with the Custom Operator

Finally, run the optimized inference script (5 warmup rounds, 10 timed rounds):
```bash
python onnx_inference_deform_optim.py
```
> **📝 Parameter notes**:
> Before running, make sure to fill in the **ONNX model path** and the path to the **custom-operator `.so` library built in step 5.1** in the code (see the registration sketch in 5.1).
---
## 6. Benchmark Results

Single BW150 card, 800x1200 image input, batch size = 1.

| Inference mode | Optimization / precision | Warmup runs | Timed runs | Avg latency (ms) | FPS |
| :--- | :--- | :---: | :---: | :---: | :--- |
| PyTorch | FP32 (base) | 5 | 10 | 144.25 | 6.93 |
| ORT | Standard ONNX | 5 | 10 | 173.66 | 5.76 |
| ORT + Deform plugin | Shared library (ort_plugin) | 5 | 10 | 121.67 | 8.22 |
| ORT + Deform plugin | FP16 mixed-precision quantization (ort_plugin) | 5 | 10 | 95.17 | 10.50 |
| ORT + Deform plugin | Pure FP16, variant B (ort_plugin_fp16_B) | 5 | 10 | 87.34 | 11.44 |
| ORT + Deform plugin | Pure FP16, variant C (ort_plugin_fp16_C) | 5 | 10 | 84.52 | 11.82 |
\ No newline at end of file
{
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.6.0.dev0",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
......@@ -26,7 +26,7 @@ def load_model(model_config_path, model_checkpoint_path, cpu_only=False):
# Load the model
model = load_model(config_file, checkpoint_path, cpu_only=True)
# Prompt used for actual inference, plus the related mask
# Prompt used for actual inference, plus the related mask; it can be generated ahead of time with get_caption_mask.py
caption = "car ."
input_ids = model.tokenizer([caption], return_tensors="pt")["input_ids"]
position_ids = torch.tensor([[0, 0, 1, 0]])
......@@ -39,10 +39,10 @@ text_token_mask = torch.tensor([[[True, False, False, False],
# Fixed input resolution
img = torch.randn(1, 3, 800, 1200)
img = torch.randn(1, 3, 400, 600)
# img = torch.randn(1, 3, 400, 600)
# Export the original ONNX model
onnx_output_path = "weights_400x600/ground.onnx"
onnx_output_path = "weights/ground.onnx"
simplified_onnx_path = "weights/ground_simplified1.onnx"
......