# DynamiCrafter

## Paper

**DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors**

* https://arxiv.org/abs/2310.12190

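## Model Architecture

The model extends Stable Diffusion so that it can generate video. Training uses a dual-stream image injection (`Dual-stream image injection`) mechanism, which inherits visual details and extracts features from the input image in a context-aware manner. The overall pipeline is as follows. The inputs are a video `x` and $`x^m`$ (a randomly selected frame of `x`). The video `x` is passed frame by frame through the `VAE` encoder to obtain $`z_0`$. The image $`x^m`$ is passed through the same encoder, `Repeat`ed, and concatenated with $`z_t`$ (obtained by diffusing $`z_0`$) before entering the `Denoising U-Net`. At the same time, the condition obtained by passing $`x^m`$ through the `CLIP image encoder` and the `Query transformer` enters the `U-Net` together with the `FPS` and `Text` features for training.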

![Alt text](readme_imgs/image-1.png)
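
The data flow described above can be written compactly. This is only a sketch: it assumes the standard DDPM forward process with cumulative noise schedule $`\bar{\alpha}_t`$ and concatenation along the channel axis. The symbols $`\mathcal{E}`$ (VAE encoder), $`\epsilon_\theta`$ (denoising U-Net), $`c_{\mathrm{img}}`$, $`c_{\mathrm{txt}}`$, and $`L`$ (number of frames) are notation introduced here for illustration and are not defined elsewhere in this README.

```math
\begin{aligned}
z_0 &= \mathcal{E}(x), \\
z_t &= \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \\
\tilde{z}_t &= \operatorname{concat}\big(z_t,\ \operatorname{repeat}(\mathcal{E}(x^m),\, L)\big), \\
c_{\mathrm{img}} &= \operatorname{QueryTransformer}\big(\operatorname{CLIP}_{\mathrm{img}}(x^m)\big), \\
\hat{\epsilon} &= \epsilon_\theta\big(\tilde{z}_t,\ t,\ c_{\mathrm{img}},\ c_{\mathrm{txt}},\ \mathrm{fps}\big)
\end{aligned}
```

The first stream ($`\tilde{z}_t`$) carries pixel-level detail from the input frame, while the second stream ($`c_{\mathrm{img}}`$) carries a context-aware representation of the same frame.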


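## Algorithm

Building on text-to-video generation, the algorithm adds visual information from the input image so that visual details are preserved during video generation.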

![Alt text](readme_imgs/image-2.png)

## Environment Setup

### Docker (Method 1)

    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10

    docker run --shm-size 10g --network=host --name=dynamicrafter --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install -r requirements.txt


### Docker (Method 2)

    # Run from the directory that contains the Dockerfile
    docker build -t <IMAGE_NAME>:<TAG> .

    docker run --shm-size 10g --network=host --name=dynamicrafter --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install -r requirements.txt


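### Anaconda (Method 3)

1. The special deep learning libraries that this project requires for DCU cards can be downloaded and installed from the Guanghe developer community (光合开发者社区):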
https://developer.sourcefind.cn/tool/

    DTK driver:dtk24.04.1
    python:python3.10
    torch:2.1.0
    torchvision:0.16.0
    triton:2.1.0


Tips: The versions of the DCU-related tools listed above (DTK driver, python, torch, etc.) must match each other exactly.

2. Install the other, non-special libraries according to requirements.txt:

    pip install -r requirements.txt
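
Because the DCU-related versions must match exactly, it can help to check the installed packages before running inference. The following is a minimal sketch using only the Python standard library. The function name `find_mismatches` is hypothetical, and the handling of a local version suffix (anything after `+`) is an assumption about how DCU builds may be labeled, not something this README specifies.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this README for the Anaconda setup.
REQUIRED = {"torch": "2.1.0", "torchvision": "0.16.0", "triton": "2.1.0"}


def find_mismatches(required):
    """Return {package: (expected, found)} for missing or mismatched packages."""
    mismatches = {}
    for name, expected in required.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        # Assumption: ignore a local build suffix such as "2.1.0+<build tag>".
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(REQUIRED).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means every listed package is installed at the expected version. The DTK driver and the Python interpreter version are not covered by this check.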


## Dataset



## Inference

    export HF_ENDPOINT=https://hf-mirror.com
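
When the scripts are launched from a notebook or another Python process rather than a shell, the same mirror can be set from Python. This is a minimal sketch; it assumes the variable is set before any library that reads `HF_ENDPOINT` (such as `huggingface_hub`) is imported, since such libraries typically read it at import time.

```python
import os

# Assumption: this runs before importing any library that reads HF_ENDPOINT.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

print(os.environ["HF_ENDPOINT"])
```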

### Command Line

    # Select the model based on required resolutions: i.e., 1024|512|320:
    bash scripts/run.sh 512

    bash scripts/run_application.sh interp # Generate frame interpolation
    bash scripts/run_application.sh loop   # Looping video generation

### Gradio Interface

    python gradio_app.py --res 512

    python gradio_app_interp_and_loop.py 

## Results

### normal
||Input|Output|
|:---|:---|:---|
|image|![alt text](readme_imgs/bloom01.png)|![Alt text](readme_imgs/image-3.gif)|
|prompt|time-lapse of a blooming flower with leaves and a stem||


### interp
||Input 1|Input 2|Result|
|:---|:---|:---|:---|
|image|![alt text](readme_imgs/smile_01.png)|![alt text](readme_imgs/smile_02.png)|![alt text](readme_imgs/r2.gif)|
|prompt|a smiling girl|||

### loop
||Input|Result|
|:---|:---|:---|
|image|![alt text](readme_imgs/24.png)|![alt text](readme_imgs/r3.gif)|
|prompt|a beach with waves and clouds at sunset||

### Accuracy



## Application Scenarios

### Algorithm Category

`AIGC`

### Key Application Industries

`Media, scientific research, education`

## Pretrained Weights
|Model|Resolution|GPU Mem|Checkpoint|
|:---------|:---------|:--------|:--------|
|DynamiCrafter1024|576x1024|18.3GB|[huggingface](https://huggingface.co/Doubiiu/DynamiCrafter_1024/blob/main/model.ckpt)/[SCNet]|
|DynamiCrafter512|320x512|12.8GB|[huggingface](https://huggingface.co/Doubiiu/DynamiCrafter_512/blob/main/model.ckpt)/[SCNet]|
|DynamiCrafter256|256x256|11.9GB |[huggingface](https://huggingface.co/Doubiiu/DynamiCrafter/blob/main/model.ckpt)/[SCNet]|
|DynamiCrafter512_interp|320x512|12.8GB|[huggingface](https://huggingface.co/Doubiiu/DynamiCrafter_512_Interp/blob/main/model.ckpt)/[SCNet]|

The model files are organized as follows:

    checkpoints/
    |── dynamicrafter_512_v1
        └── model.ckpt
    |── dynamicrafter_1024_v1
        └── model.ckpt
    |── dynamicrafter_256_v1
        └── model.ckpt
    └── ...
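
Before running inference, the layout above can be verified with a short script. This is a minimal sketch under stated assumptions: the helper names `checkpoint_path` and `missing_checkpoints` are hypothetical, the folder names are copied from the tree above, and the repository IDs are taken from the Hugging Face links in the weights table. The interpolation checkpoint is left out because its folder is not shown in the tree.

```python
from pathlib import Path

# resolution -> (Hugging Face repository from the table, folder from the tree)
CHECKPOINTS = {
    1024: ("Doubiiu/DynamiCrafter_1024", "dynamicrafter_1024_v1"),
    512: ("Doubiiu/DynamiCrafter_512", "dynamicrafter_512_v1"),
    256: ("Doubiiu/DynamiCrafter", "dynamicrafter_256_v1"),
}


def checkpoint_path(resolution, root="checkpoints"):
    """Return the expected location of model.ckpt for a given resolution."""
    _, folder = CHECKPOINTS[resolution]
    return Path(root) / folder / "model.ckpt"


def missing_checkpoints(root="checkpoints"):
    """Return the resolutions whose model.ckpt is not present under root."""
    return [res for res in CHECKPOINTS if not checkpoint_path(res, root).is_file()]


if __name__ == "__main__":
    for res in missing_checkpoints():
        repo, _ = CHECKPOINTS[res]
        print(f"missing {checkpoint_path(res)} (download model.ckpt from {repo})")
```

Only the checkpoint for the resolution actually used needs to be present, so a non-empty report is not necessarily an error.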


## Source Repository and Issue Feedback

https://developer.sourcefind.cn/codes/modelzoo/dynamicrafter_pytorch

## References

* https://github.com/Doubiiu/DynamiCrafter