README.md

# DragDiffusion
DragDiffusion 模型，利用扩散模型进行基于点的交互式图像编辑，允许用户将图像中的任意点“拖动”到目标位置，以精确控制姿势、形状、表情和布局。

## 论文
`DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing`
- https://arxiv.org/abs/2306.14435
- CVPR 2024

## 模型结构
<!-- 此处一句话简要介绍模型结构 -->
<!-- DragDiffusion利用扩散模型进行基于点的交互式图像编辑。主要是基于 StyleGAN 模型架构： -->

<div align=center>
    <img src="./doc/overview.png"/>
    <div >DragDiffusion</div>
</div>


## 算法原理
DragDiffusion 算法受 DragGAN 的启发，把编解码的图像重建部分利用上大规模预训练扩散模型，极大提升了基于点的交互式编辑在现实世界场景中的适用性。

（1）先通过LoRA微调SD模型，数据集为用户输入的图像。目的是在编辑过程中（其实也是生成过程）更好的保留输入图像中物体和风格特征。\
（2）通过运动监督（Motion Supervision）和点跟踪（Point Tracking）实现对扩散 latent 进行优化，确保多步迭代过程中更加的精准和有效。\
（3）在最后一步去噪的过程中为了保证统一以及质量，从 MasaCtrl 中汲取灵感，提出利用自注意力模块的属性来引导去噪过程。

在编辑过程中，需要增加正则项确保非编辑区域（编辑mask区域外）不变。

<!-- <div align=center>
    <img src="./doc/pipeline.png"/>
    <div >DragDiffusion</div>
</div> -->


## 环境配置
```
mv dragdiffusion_pytoch dragdiffusion # 去框架名后缀
# docker的-v 路径、docker_name和imageID根据实际情况修改
# pip安装时如果出现下载慢可以尝试别的镜像源
```
### Docker（方法一）
<!-- 此处提供[光源](https://www.sourcefind.cn/#/service-details)拉取docker镜像的地址与使用步骤 -->
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.2-py3.10 # 本镜像imageID为：2f1f619d0182
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=16G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --network=host --name docker_name imageID bash
cd /your_code_path/dragdiffusion
pip install -r requirements.txt
```
### Dockerfile（方法二）
<!-- 此处提供dockerfile的使用方法 -->
```
cd /your_code_path/dragdiffusion/docker
docker build --no-cache -t codestral:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=16G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --network=host --name docker_name imageID bash
cd /your_code_path/dragdiffusion
pip install -r requirements.txt
```
### Anaconda（方法三）
<!-- 此处提供本地配置、编译的详细步骤，例如： -->

关于本项目DCU显卡所需的特殊深度学习库可从[光合](https://developer.hpccube.com/tool/)开发者社区下载安装。
```
DTK驱动: dtk24.04.2
python: python3.10
pytorch: 2.1.0
```
`Tips：以上DTK驱动、python、pytorch等DCU相关工具版本需要严格一一对应`

其它非深度学习库参照requirements.txt安装：
```
pip install -r requirements.txt
```
## 数据集
测试数据集 [DragBench](https://github.com/Yujun-Shi/DragDiffusion/releases/download/v0.1.1/DragBench.zip) 或者从 [`SCNet`](http://113.200.138.88:18080/aidatasets/dragbench) 上下载。\
下载后放在 ./drag_bench_evaluation/drag_bench_data 并解压，文件构成:
<br>
DragBench<br>
--- animals<br>
------ JH_2023-09-14-1820-16<br>
------ JH_2023-09-14-1821-23<br>
------ JH_2023-09-14-1821-58<br>
------ ...<br>
--- art_work<br>
--- building_city_view<br>
--- ...<br>
--- other_objects<br>
<br>


## 训练
```
# LoRA微调
python run_lora_training.py
# 训练drag扩散模型，并输出结果
python run_drag_diffusion.py
# 如果出现huggingface访问不通，请执行 `export HF_ENDPOINT=https://hf-mirror.com`
```
亦或者webui界面。

## 推理
<!-- 下载模型权重:
```
python scripts/download_model.py
```
或者从 [SCNet](http://113.200.138.88:18080/aimodels/findsource-dependency/stylegan2_pytorch) 上快速下载，并放在 /checkpoints 文件夹下。 -->

可视化webui推理：
```
python drag_ui.py --listen
# 如果出现huggingface访问不通，请执行 `export HF_ENDPOINT=https://hf-mirror.com`
```
<div align=center>
    <img src="./doc/webui.png" width=600/>
    <div >webui界面</div>
</div>

1、上传图片；\
2、输入提示；\
3、LoRA训练；\
4、通过鼠标选择要编辑的区域；\
5、通过鼠标标记点位；\
6、运行。\
ps：Drag以及LoRA的一些参数自行视情况修改。

## result
<!-- 此处填算法效果测试图（包括输入、输出） -->


<div align=center>
    <img src="./doc/result1.png" width=600/>
    <div >推理结果</div>
</div>

<!-- <div align=center>
    <img src="./doc/image (1).png" width=600/>
    <div >输出</div>
</div> -->


### 精度
测试集 `DragBench`，如上所述下载并解压好。
```
python run_lora_training.py
python run_drag_diffusion.py
python run_eval_similarity.py --eval_root /your_dragging_results
python run_eval_point_matching.py --eval_root /your_dragging_results
# ps:上述脚本的一些文件路径自行根据情况修改
```

| 加速卡K100_AI | 1-lpips ↑ | clip sim ↑| mean distance ↓ |
| :-----| :----- | :---- | :---- |
| paper | 0.885 | 0.977 | 35.260 |
| 优化后 | 0.869 | 0.975 | 29.824 |
<!-- | 单元格 | 单元格 | 单元格 | -->

ps:优化两点 1）lora训练部分的rank选择；2）drag部分采用了多层融合。

<div align=center>
    <img src="./doc/对比1.jpg" width=600/>
</div>
<div align=center>
    <img src="./doc/对比2.jpg" width=600/>
</div>
<div align=center>
    <img src="./doc/对比3.jpg" width=600/>
</div>
<div align=center>
    <img src="./doc/对比4.jpg" width=600/>
    <div >paper结果和优化后的结果</div>
</div>

## 应用场景
### 算法类别

<!-- 超出以上分类的类别命名也可参考此网址中的类别名：https://huggingface.co/ \ -->
`AIGC`

### 热点应用行业
<!-- 应用行业的填写需要做大量调研，从而为使用者提供专业、全面的推荐，除特殊算法，通常推荐数量>=3。 -->
`零售,制造,电商,医疗,教育`

<!-- ## 预训练权重 -->
<!-- - 此处填写预训练权重在公司内部的下载地址（预训练权重存放中心为：[SCNet AIModels](http://113.200.138.88:18080/aimodels) ，模型用到的各预训练权重请分别填上具体地址。），过小权重文件可打包到项目里。
- 此处填写公开预训练权重官网下载地址（非必须）。 -->

## 源码仓库及问题反馈
<!-- - 此处填本项目gitlab地址 -->
- https://developer.sourcefind.cn/codes/modelzoo/dragdiffusion_pytorch
## 参考资料
- https://github.com/XingangPan/DragDiffusion