# AnimateDiff

## Paper

**AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning**

* https://arxiv.org/abs/2307.04725

## Model Architecture

$`\mathcal{E}`$ (encoder, compresses the original image into a latent), `Base T2I` (a text-to-image model such as Stable Diffusion), `Motion Modeling Module` (the motion module), `Personalized T2I` (a personalized image-generation model, e.g. one trained with DreamBooth), and $`\mathcal{D}`$ (decoder, reconstructs the image from the latent).

![Alt text](readme_imgs/image-1.png)

## Algorithm

Built on Stable Diffusion, the algorithm inflates the 2D network to handle video and inserts a temporal attention mechanism, enabling it to generate high-quality animations. The details are as follows.

1. Network inflation

Each 2D convolution and attention layer in the original image model is converted into a spatial-only pseudo-3D layer by reshaping the frame axis into the batch axis, so the network processes every frame independently. The newly inserted motion modules then operate across the frames of each batch to achieve smooth motion and content consistency in the animation clip.

![Alt text](readme_imgs/image-2.png)
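
The frame-to-batch reshape described above can be sketched in PyTorch as follows (a minimal illustration; the function names are our own, not from the repository):

```python
import torch

def to_image_batch(x):
    # x: (batch, channels, frames, height, width) video latents.
    # Merge the frame axis into the batch axis so the frozen 2D image
    # layers process every frame independently.
    b, c, f, h, w = x.shape
    return x.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w), f

def to_video_batch(x, f):
    # Inverse reshape: restore the frame axis after the 2D layers.
    bf, c, h, w = x.shape
    return x.reshape(bf // f, f, c, h, w).permute(0, 2, 1, 3, 4)

x = torch.randn(2, 4, 16, 32, 32)   # 2 clips, 16 frames of 32x32 latents
flat, f = to_image_batch(x)
assert flat.shape == (32, 4, 32, 32)          # frames folded into the batch
assert torch.equal(to_video_batch(flat, f), x)  # round-trip is lossless
```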

2. Attention mechanism

The motion module is designed as vanilla temporal attention, enabling effective information exchange across frames.
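
A minimal sketch of such a temporal self-attention block (assuming PyTorch; the class and its internals are illustrative, not the repository's exact implementation):

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    # Self-attention applied along the frame axis only: spatial positions
    # are folded into the batch, so each pixel location exchanges
    # information across frames but not across space.
    def __init__(self, dim, heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, frames, height, width, channels)
        b, f, h, w, c = x.shape
        # Fold spatial positions into the batch; attend over frames only.
        x = x.reshape(b, f, h * w, c).permute(0, 2, 1, 3).reshape(b * h * w, f, c)
        y = self.norm(x)
        y, _ = self.attn(y, y, y)
        x = x + y  # residual connection around the temporal attention
        return x.reshape(b, h * w, f, c).permute(0, 2, 1, 3).reshape(b, f, h, w, c)

block = TemporalAttention(dim=64)
out = block(torch.randn(1, 16, 8, 8, 64))
assert out.shape == (1, 16, 8, 8, 64)
```

In the paper, the motion module's output projection is zero-initialized, so at the start of training the residual branch contributes nothing and the inflated model behaves exactly like the pretrained image model.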

## Environment Setup

### Docker (Option 1)

    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
    docker run --shm-size 10g --network=host --name=animatediff --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro -v <absolute path to project>:/home/ -it <your IMAGE ID> bash
    pip install -r requirements.txt

### Dockerfile (Option 2)

    # Run this from the directory containing the Dockerfile
    docker build -t <IMAGE_NAME>:<TAG> .
    # Replace <your IMAGE ID> with the ID of the image built above
    docker run --shm-size 10g --network=host --name=animatediff --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro -v <absolute path to project>:/home/ -it <your IMAGE ID> bash
    pip install -r requirements.txt

### Anaconda (Option 3)
1. The DCU-specific deep learning libraries required by this project can be downloaded from the SourceFind developer community:
https://developer.sourcefind.cn/tool/

    DTK driver: dtk24.04.1
    python: python3.10
    torch: 2.1.0
    torchvision: 0.16.0

Tip: the DTK driver, Python, torch, and other DCU-related tool versions above must match each other exactly.

2. Install the remaining common libraries from requirements.txt:

    pip install -r requirements.txt

## Dataset

The official dataset has been taken down. To train, prepare your own `text-video` data or use the dataset provided by this project.

[Original link](fudan-fuxi/VIDGEN-1M)

This project provides a script for processing the data; see `scripts/process_data.py` for usage.

    data.csv

    caption,video_path
    xxxxxxx,xxxxx.mp4
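
For illustration, a `data.csv` in this format can be produced with the Python standard library (the caption and path below are made-up placeholders):

```python
import csv
import io

# Made-up (caption, video_path) pairs; replace with your own data.
rows = [("a cat playing the piano", "videos/0001.mp4")]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["caption", "video_path"])  # header row expected by the training config
writer.writerows(rows)
print(buf.getvalue())
```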
## Training

After preparing the data, update the data paths in the `yaml` files under `configs/training`, as shown below.

    train_data:
      csv_path:        "<path/to/data.csv>"
      video_folder:    ""

### Fine-tune the original UNet layers (image layers)

    torchrun --nnodes=1 --nproc_per_node=1 train.py --config configs/training/v1/image_finetune.yaml

### Train the motion modules

    torchrun --nnodes=1 --nproc_per_node=1 train.py --config configs/training/v1/training.yaml

## Inference

    python -m scripts.animate --config configs/prompts/v1/v1-1-ToonYou.yaml --without-xformers

    python -m scripts.animate --config configs/prompts/v1/v1-2-Lyriel.yaml --without-xformers

    python -m scripts.animate --config configs/prompts/v2/v2-1-RealisticVision.yaml --without-xformers
 
    python -m scripts.animate --config configs/prompts/v3/v3-1-T2V.yaml --without-xformers

    python -m scripts.animate --config configs/prompts/v3/v3-2-animation-RealisticVision.yaml --without-xformers

Note: these are only sample inference commands; you can modify them or write your own `yaml` files.

## Results
![Alt text](readme_imgs/sample.gif)
### Accuracy



## Application Scenarios

### Algorithm Category
`AIGC`

### Key Application Industries

`Media, research, education`

## Pretrained Weights

### Model Downloads
CLIP: [Original link](https://huggingface.co/openai/clip-vit-large-patch14/tree/main)
DreamBooth_LoRA:
- toonyou_beta6: [Original link](https://hf-mirror.com/frankjoshua/toonyou_beta6)
- Others: [civitai](https://civitai.com/models)

sd1.5: [Original link](https://hf-mirror.com/Jiali/stable-diffusion-1.5/tree/main)
Motion_Module: [Original link](https://huggingface.co/guoyww/animatediff/tree/main)
    openai/
    └── clip-vit-large-patch14
        ├── config.json
        ├── merges.txt
        ├── preprocessor_config.json
        ├── pytorch_model.bin
        ├── special_tokens_map.json
        ├── tokenizer_config.json
        ├── tokenizer.json
        └── vocab.json

    models/
    ├── DreamBooth_LoRA
    │   ├── lyriel_v16.safetensors
    │   ├── Put personalized T2I checkpoints here.txt
    │   ├── realisticVisionV51_v51VAE.safetensors
    │   ├── toonyou_beta3.safetensors
    │   └── toonyou_beta6.safetensors
    ├── MotionLoRA
    │   ├── Put MotionLoRA checkpoints here.txt
    │   └── v2_lora_ZoomIn.ckpt
    ├── Motion_Module
    │   ├── mm_sd_v14.ckpt
    │   ├── mm_sd_v15.ckpt
    │   ├── mm_sd_v15_v2.ckpt
    │   ├── Put motion module checkpoints here.txt
    │   ├── v3_sd15_adapter.ckpt
    │   └── v3_sd15_mm.ckpt
    ├── SparseCtrl
    │   └── v3_sd15_sparsectrl_rgb.ckpt
    └── StableDiffusion
        ├── Put diffusers stable-diffusion-v1-5 repo here.txt
        └── stable-diffusion-v1-5
            ├── feature_extractor
            │   └── preprocessor_config.json
            ├── model_index.json
            ├── scheduler
            │   └── scheduler_config.json
            ├── text_encoder
            │   ├── config.json
            │   └── pytorch_model.bin
            ├── tokenizer
            │   ├── merges.txt
            │   ├── special_tokens_map.json
            │   ├── tokenizer_config.json
            │   └── vocab.json
            ├── unet
            │   ├── config.json
            │   └── diffusion_pytorch_model.bin
            ├── v1-5-pruned.ckpt
            └── vae
                ├── config.json
                └── diffusion_pytorch_model.bin

Note: not all of the models above are required; the listing only shows the expected file layout. Choose the subset (or other models) you need.

## Source Repository & Issue Feedback

https://developer.sourcefind.cn/codes/modelzoo/animatediff_pytorch

## References

* https://github.com/guoyww/AnimateDiff

* https://github.com/m-bain/webvid