
# OOTDiffusion

## Paper

**OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on**

* https://arxiv.org/pdf/2403.01779

## Model Architecture
The model is based on `Stable Diffusion` and adds an `Outfitting UNet` that learns garment features.

![alt text](readme_imgs/image-1.png)


## Algorithm

The algorithm builds on `Stable Diffusion`: an additional UNet learns garment features, which are fused into the backbone network through cross-attention.

![alt text](readme_imgs/image-2.png)
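
To make the fusion step more concrete, here is a toy sketch of scaled dot-product cross-attention in plain Python. It is not the repository's implementation: the function name `cross_attention`, the tiny 2-D feature vectors, and the omission of learned projections and multiple heads are all simplifying assumptions. Person (backbone) tokens act as queries, and garment tokens act as keys and values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Toy single-head cross-attention without learned projections.

    queries: backbone (person) tokens, each a list of d floats.
    keys, values: garment tokens, hypothetical stand-ins for the
    features produced by the extra UNet.
    """
    d = len(queries[0])
    fused = []
    for q in queries:
        # Similarity of this person token to every garment token.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of garment values gives the fused feature.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


person_tokens = [[1.0, 0.0], [0.0, 1.0]]
garment_tokens = [[10.0, 0.0], [0.0, 10.0]]
fused = cross_attention(person_tokens, garment_tokens, garment_tokens)
print(fused)
```

Each person token ends up dominated by the garment token it is most similar to, which is the intuition behind injecting garment detail into the denoising backbone.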

## Environment Setup

### Docker (Option 1)
    
    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310

    docker run --shm-size 10g --network=host --name=ottd --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute-path-to-project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install -r requirements.txt


### Dockerfile (Option 2)

    docker build -t <IMAGE_NAME>:<TAG> .

    docker run --shm-size 10g --network=host --name=ottd --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute-path-to-project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install -r requirements.txt


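### Anaconda (Option 3)
Step 1: The DCU-specific deep learning libraries this project needs can be downloaded and installed from the 光合开发者社区 (Hygon developer community):
https://developer.hpccube.com/tool/

    DTK driver: dtk24.04
    python: python3.10
    torch: 2.1.0
    torchvision: 0.16.0
    onnx: 1.15.0

Tips: The versions of the DTK driver, Python, torch, and the other DCU-related tools listed above must match each other exactly.

Step 2: Install the remaining general-purpose libraries from requirements.txt: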

    pip install -r requirements.txt
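
Because the DCU-related versions must line up exactly, a quick check after installation can catch mismatches early. The following is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and the prefix comparison assumes that vendor builds may append a local suffix to the version string. The DTK driver and the Python interpreter are not pip packages, so they are not checked here.

```python
from importlib import metadata

# Versions listed in this README for the pip-installable packages.
EXPECTED = {"torch": "2.1.0", "torchvision": "0.16.0", "onnx": "1.15.0"}


def check_versions(expected, get_version=metadata.version):
    """Return {package: (expected, installed_or_None, ok)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Prefix match: assumes a vendor build may add a suffix
        # after the base version (an assumption, not a documented rule).
        ok = installed is not None and installed.startswith(wanted)
        report[name] = (wanted, installed, ok)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {wanted}, found {installed} [{status}]")
```

Running the script inside the prepared environment prints one line per package; any `MISMATCH` line points at a package to reinstall.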

## Datasets

|Name|Link|
|:---|:---|
|VITON-HD|https://openxlab.org.cn/datasets/OpenDataLab/VITON-HD/tree/main/raw <br> https://github.com/shadow2496/VITON-HD  (either source works)|
|Dress Code|https://github.com/aimagelab/dress-code  (requires filling out a request form to download)|

## Training

    cd train
    mkdir -p checkpoints/unet_garm checkpoints/unet_vton

    HIP_VISIBLE_DEVICES=0,1,2,3 python main.py

Note: this training code is an unofficial implementation and currently supports training only on `VITON-HD`-style datasets.
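
For scripted or repeated runs, the shell steps above can be mirrored in Python. This is only a sketch under stated assumptions: the helper names `prepare_training` and `launch_training` are hypothetical, and the script is assumed to be started from the repository root so that `train/main.py` exists.

```python
import os
import subprocess
import sys
from pathlib import Path


def prepare_training(train_dir, device_ids):
    """Create the checkpoint directories and build the launch environment."""
    train_dir = Path(train_dir)
    for sub in ("unet_garm", "unet_vton"):
        # Same effect as `mkdir -p checkpoints/<sub>`.
        (train_dir / "checkpoints" / sub).mkdir(parents=True, exist_ok=True)
    env = dict(os.environ)
    # Restrict the run to the selected DCUs, as in the shell command.
    env["HIP_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return env


def launch_training(train_dir="train", device_ids=(0, 1, 2, 3)):
    """Run `python main.py` inside train_dir with the selected devices."""
    env = prepare_training(train_dir, device_ids)
    return subprocess.run(
        [sys.executable, "main.py"], cwd=train_dir, env=env, check=True
    )
```

Calling `launch_training(device_ids=(0, 1))` would start the same training entry point on two devices instead of four.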

## Inference

### Model Download

https://hf-mirror.com/levihsu/OOTDiffusion/tree/main/checkpoints

https://hf-mirror.com/openai/clip-vit-large-patch14/tree/main

Download all model files from the links above and place them in the `checkpoints` directory, laid out as follows:

    checkpoints/
    ├── clip-vit-large-patch14
    │   ├── config.json
    │   ├── merges.txt
    │   ├── preprocessor_config.json
    │   ├── pytorch_model.bin
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   ├── tokenizer.json
    │   └── vocab.json
    ├── humanparsing
    │   ├── download.sh
    │   ├── exp-schp-201908261155-lip.pth
    │   ├── exp-schp-201908301523-atr.pth
    │   ├── parsing_atr.onnx
    │   └── parsing_lip.onnx
    ├── ootd
    │   ├── feature_extractor
    │   │   └── preprocessor_config.json
    │   ├── model_index.json
    │   ├── ootd_dc
    │   │   └── checkpoint-36000
    │   │       ├── unet_garm
    │   │       │   ├── config.json
    │   │       │   └── diffusion_pytorch_model.safetensors
    │   │       └── unet_vton
    │   │           ├── config.json
    │   │           └── diffusion_pytorch_model.safetensors
    │   ├── ootd_hd
    │   │   └── checkpoint-36000
    │   │       ├── unet_garm
    │   │       │   ├── config.json
    │   │       │   └── diffusion_pytorch_model.safetensors
    │   │       └── unet_vton
    │   │           ├── config.json
    │   │           └── diffusion_pytorch_model.safetensors
    │   ├── scheduler
    │   │   └── scheduler_config.json
    │   ├── text_encoder
    │   │   ├── config.json
    │   │   └── pytorch_model.bin
    │   ├── tokenizer
    │   │   ├── merges.txt
    │   │   ├── special_tokens_map.json
    │   │   ├── tokenizer_config.json
    │   │   └── vocab.json
    │   └── vae
    │       ├── config.json
    │       └── diffusion_pytorch_model.bin
    ├── openpose
    │   └── ckpts
    │       └── body_pose_model.pth
    └── README.txt
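
A missing or misplaced weight file usually only surfaces as a load error at inference time, so it can help to verify the layout first. The sketch below checks a subset of the files from the tree above; the helper name `missing_checkpoints` and the choice of which files to check are assumptions, so extend the list as needed.

```python
from pathlib import Path

# A subset of the files shown in the tree above.
REQUIRED_FILES = [
    "clip-vit-large-patch14/config.json",
    "clip-vit-large-patch14/pytorch_model.bin",
    "humanparsing/parsing_atr.onnx",
    "humanparsing/parsing_lip.onnx",
    "ootd/model_index.json",
    "ootd/ootd_hd/checkpoint-36000/unet_garm/diffusion_pytorch_model.safetensors",
    "ootd/ootd_hd/checkpoint-36000/unet_vton/diffusion_pytorch_model.safetensors",
    "ootd/ootd_dc/checkpoint-36000/unet_garm/diffusion_pytorch_model.safetensors",
    "ootd/ootd_dc/checkpoint-36000/unet_vton/diffusion_pytorch_model.safetensors",
    "ootd/vae/diffusion_pytorch_model.bin",
    "openpose/ckpts/body_pose_model.pth",
]


def missing_checkpoints(root="checkpoints", required=REQUIRED_FILES):
    """Return the required files that are absent under root."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All listed checkpoint files are present.")
```

Run it from the directory that contains `checkpoints/`, or pass another root explicitly.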

### Commands

Half-body

    # --model_path is the path to the person (model) image
    cd OOTDiffusion/run
    python run_ootd.py --model_path <model-image-path> --cloth_path <cloth-image-path> --scale 2.0 --sample 1

Full-body

    # --category: 0 = upper body, 1 = lower body, 2 = dress
    cd OOTDiffusion/run
    python run_ootd.py --model_path <model-image-path> --cloth_path <cloth-image-path> --model_type dc --category 2 --scale 2.0 --sample 1
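
When several person/garment pairs need to be processed, the two commands above can be assembled programmatically. This is a hedged sketch: `build_command`, `run_tryon`, and the `CATEGORIES` mapping are hypothetical helpers, and only the flags that appear in the commands above are used. Image paths are passed through unchanged, so relative paths are resolved against the `run` directory.

```python
import subprocess
import sys

# Category codes as documented for the full-body model.
CATEGORIES = {"upper": 0, "lower": 1, "dress": 2}


def build_command(model_path, cloth_path, model_type=None, category=None,
                  scale=2.0, sample=1):
    """Build the argument list for run_ootd.py.

    model_type and category are omitted unless given, which reproduces
    the half-body command; pass model_type="dc" and a category name for
    the full-body command.
    """
    cmd = [sys.executable, "run_ootd.py",
           "--model_path", str(model_path),
           "--cloth_path", str(cloth_path)]
    if model_type is not None:
        cmd += ["--model_type", model_type]
    if category is not None:
        cmd += ["--category", str(CATEGORIES[category])]
    cmd += ["--scale", str(scale), "--sample", str(sample)]
    return cmd


def run_tryon(run_dir="OOTDiffusion/run", **kwargs):
    """Execute run_ootd.py from run_dir with the built arguments."""
    return subprocess.run(build_command(**kwargs), cwd=run_dir, check=True)
```

For example, `run_tryon(model_path="person.jpg", cloth_path="dress.jpg", model_type="dc", category="dress")` corresponds to the full-body command, with placeholder file names.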

### Web UI

    cd OOTDiffusion/run
    python gradio_ootd.py


## Results

|Model|Person|Garment|Result|
|:---|:---:|:---:|:---|
|hd|![alt text](readme_imgs/input_11.png)|![alt text](readme_imgs/input_12.png)|![alt text](readme_imgs/output1.png)|
|dc|![alt text](readme_imgs/input_21.png)|![alt text](readme_imgs/input_22.png)|![alt text](readme_imgs/output2.png)|


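### Accuracy

|SSIM|LPIPS|
|:---:|:---:|
|0.86|0.075|

Note: these metrics were obtained by training and testing at size=(512, 384) and may differ from the official implementation (not open-sourced).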
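
For readers unfamiliar with SSIM, the sketch below computes a simplified, single-window version of the statistic in plain Python. It is an illustration only and not the evaluation script behind the table: the usual metric averages the statistic over local windows, and the function name `global_ssim` and the flat-list image format are assumptions. LPIPS depends on a pretrained network and is not sketched here.

```python
def global_ssim(x, y, data_range=255.0):
    """SSIM computed once over whole images given as flat pixel lists.

    Simplification: one global window instead of averaged local windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Standard stabilizing constants with K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


reference = [0, 64, 128, 255]
inverted = [255, 191, 127, 0]
print(global_ssim(reference, reference), global_ssim(reference, inverted))
```

Higher SSIM means the generated image is structurally closer to the ground truth, while for LPIPS lower values are better.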

## Application Scenarios

### Algorithm Category

`AIGC`

### Key Application Industries

`Retail, broadcast media, e-commerce`

## Source Repository and Issue Feedback

* https://developer.hpccube.com/codes/modelzoo/ootdiffusion_pytorch

## References

* https://github.com/levihsu/ootdiffusion
* https://github.com/lyc0929/OOTDiffusion-train