# Latte

## Paper

**Latte: Latent Diffusion Transformer for Video Generation**

* https://arxiv.org/abs/2401.03048v1

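## Model Architecture

The model uses a Transformer as its denoising network. The overall flow is: the input video is embedded and split into tokens, `Transformer Blocks` extract spatio-temporal information from those tokens, and then `Layer Norm` followed by `Linear and Reshape` produces the `Noise` and `Variance` outputs. The four variants in the figure below are different ways of extracting spatio-temporal information.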

![alt text](readme_imgs/image-1.png)
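
The shape bookkeeping behind "embed, tokenize, `Linear and Reshape`" can be sketched in plain Python. This is a minimal illustration and not code from this repository: it assumes non-overlapping square patches and a head that predicts noise and variance as two channel groups, and the names `LatentVideo`, `token_grid`, `head_output_dim`, and `output_shapes` as well as the example sizes (16 frames, 4 channels, 32x32 latents, patch size 2) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentVideo:
    """Size of the latent video fed to the denoiser (hypothetical container)."""
    frames: int
    channels: int
    height: int
    width: int


def token_grid(video: LatentVideo, patch: int) -> tuple[int, int]:
    """Return (frames, tokens_per_frame) after non-overlapping patch embedding."""
    if video.height % patch or video.width % patch:
        raise ValueError("latent height and width must be divisible by the patch size")
    return video.frames, (video.height // patch) * (video.width // patch)


def head_output_dim(video: LatentVideo, patch: int, learn_variance: bool = True) -> int:
    """Features per token that the final Linear layer must emit before the reshape."""
    out_channels = video.channels * (2 if learn_variance else 1)
    return patch * patch * out_channels


def output_shapes(video: LatentVideo) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Shapes of the noise and variance tensors after reshape and channel split."""
    shape = (video.frames, video.channels, video.height, video.width)
    return shape, shape


# Illustrative sizes only; they are assumptions, not values from the configs.
video = LatentVideo(frames=16, channels=4, height=32, width=32)
frames, per_frame = token_grid(video, patch=2)
print("tokens:", frames * per_frame)
print("features per token:", head_output_dim(video, patch=2))
```

The point of the sketch is that the number of tokens times the per-token output width equals the number of elements in the noise and variance tensors together, which is what makes the final reshape possible.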

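## Algorithm

The main idea is to replace the commonly used `Unet` with a `Transformer` as the denoising model. Compared with a `Unet`, a `Transformer` can improve the model's speed, and it also models spatio-temporal information well.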

![alt text](readme_imgs/image-2.png)
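
One generic way to model spatio-temporal information with attention is to factor it into a spatial pass (tokens attend within their own frame) and a temporal pass (tokens attend across frames at the same location). The sketch below only shows how token indices are grouped for each pass and gives a rough count of pairwise attention scores. It is an illustration under these assumptions, not this repository's implementation, and the function names are hypothetical.

```python
def spatial_groups(frames: int, tokens_per_frame: int) -> list[list[int]]:
    """One attention sequence per frame: tokens attend within their own frame."""
    return [
        [f * tokens_per_frame + n for n in range(tokens_per_frame)]
        for f in range(frames)
    ]


def temporal_groups(frames: int, tokens_per_frame: int) -> list[list[int]]:
    """One attention sequence per spatial location: tokens attend across frames."""
    return [
        [f * tokens_per_frame + n for f in range(frames)]
        for n in range(tokens_per_frame)
    ]


def attention_pairs(frames: int, tokens_per_frame: int) -> tuple[int, int]:
    """Rough pairwise-score counts: (joint attention, spatial + temporal attention).

    Ignores heads, feature width, and MLP cost; it only compares sequence lengths.
    """
    joint = (frames * tokens_per_frame) ** 2
    factorized = frames * tokens_per_frame ** 2 + tokens_per_frame * frames ** 2
    return joint, factorized


# Tokens are numbered frame by frame: token index = frame * tokens_per_frame + position.
print(spatial_groups(2, 3))
print(temporal_groups(2, 3))
```

Under this simplified count, factoring attention replaces one long sequence with many short ones, which is the usual motivation for separating the spatial and temporal passes.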

## Environment Setup

### Docker (Method 1)

    docker pull image.sourcefind.cn:5000/dcu/admin/base/dtk:23.10-ubuntu20.04-py310

    docker run --shm-size 10g --network=host --name=latte --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal -v <absolute path to the project>:/home/ -it <your IMAGE ID> bash

    pip install torch-2.1.0a0%2Bgit793d2b5.abi0.dtk2310-cp310-cp310-manylinux2014_x86_64.whl  # this wheel is inside whl.zip

    pip install torchvision-0.16.0+git267eff6.abi0.dtk2310.torch2.1.0-cp310-cp310-linux_x86_64.whl

    pip install -r requirements.txt
    pip install timm --no-deps

### Dockerfile (Method 2)

    # Run this from the directory that contains the Dockerfile
    docker build -t <IMAGE_NAME>:<TAG> .

    docker run --shm-size 10g --network=host --name=latte --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal -v <absolute path to the project>:/home/ -it <your IMAGE ID> bash

    pip install torch-2.1.0a0%2Bgit793d2b5.abi0.dtk2310-cp310-cp310-manylinux2014_x86_64.whl  # this wheel is inside whl.zip

    pip install torchvision-0.16.0+git267eff6.abi0.dtk2310.torch2.1.0-cp310-cp310-linux_x86_64.whl

    pip install -r requirements.txt
    pip install timm --no-deps


### Anaconda (Method 3)
Step 1. The DCU-specific deep learning libraries this project needs can be downloaded from the Guanghe (光合) Developer Community:
https://developer.hpccube.com/tool/

    DTK driver:  dtk23.10
    python:      python3.10

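Tip: the versions of the DTK driver, Python, torch, and the other DCU-related tools listed above must match one another exactly.

Step 2. Install the remaining, non-DCU-specific libraries from requirements.txt: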

    pip install -r requirements.txt

    pip install timm --no-deps

    pip install torch-2.1.0a0%2Bgit793d2b5.abi0.dtk2310-cp310-cp310-manylinux2014_x86_64.whl  # this wheel is inside whl.zip

    pip install torchvision-0.16.0+git267eff6.abi0.dtk2310.torch2.1.0-cp310-cp310-linux_x86_64.whl
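
Because the DTK, Python, and torch versions have to line up, it can help to read the tags out of a wheel filename before installing it. The following is a minimal sketch using only the standard library. It assumes the five-field wheel naming layout (name, version, Python tag, ABI tag, platform tag) and that a `dtkNNNN` fragment in the version string identifies the DTK release; `parse_wheel_name` and `expected_python_tag` are hypothetical helper names.

```python
import re
import sys
from typing import Optional
from urllib.parse import unquote


def parse_wheel_name(filename: str) -> dict[str, Optional[str]]:
    """Split a wheel filename into name, version, tags, and an optional DTK tag."""
    decoded = unquote(filename)  # "%2B" in a downloaded name is a URL-encoded "+"
    if not decoded.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = decoded[: -len(".whl")].split("-")
    if len(parts) != 5:
        # Wheels with an optional build tag have six fields; not handled here.
        raise ValueError(f"unexpected wheel name layout: {filename}")
    name, version, python_tag, abi_tag, platform_tag = parts
    dtk = re.search(r"dtk(\d+)", version)  # assumption: DTK release is embedded here
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
        "dtk": dtk.group(1) if dtk else None,
    }


def expected_python_tag(version_info=sys.version_info) -> str:
    """CPython tag of the given interpreter version, e.g. (3, 10) gives 'cp310'."""
    return f"cp{version_info[0]}{version_info[1]}"


info = parse_wheel_name(
    "torch-2.1.0a0%2Bgit793d2b5.abi0.dtk2310-cp310-cp310-manylinux2014_x86_64.whl"
)
if info["python_tag"] != expected_python_tag():
    print("warning: wheel targets", info["python_tag"],
          "but this interpreter is", expected_python_tag())
```

Running the same check on the torchvision wheel and comparing the two `dtk` values is a quick way to catch a mismatched pair before `pip install`.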


## Datasets

|Name|URL|Access requirements|
|:---|:---|:---|
|UCF101|https://www.crcv.ucf.edu/research/data-sets/ucf101/|None|
|FaceForensics|https://github.com/ondyari/FaceForensics/tree/original|Fill out a request form|
|Taichi|https://github.com/AliaksandrSiarohin/first-order-model/blob/master/data/taichi-loading/README.md|None|
|SkyTimelapse|https://drive.google.com/file/d/1xWLiU-MBGN7MrsFHQm4_yXmfHBsMbJQo/view|None|

Directory layout: the tree below is sample data (only UCF-101 is shown). Prepare the full datasets using the same structure.

    train_datasets/
    └── UCF-101_tiny
        ├── ApplyEyeMakeup
        │   └── v_ApplyEyeMakeup_g01_c01.avi
        ├── ApplyLipstick
        │   └── v_ApplyLipstick_g01_c01.avi
        ├── Archery
        │   └── v_Archery_g01_c01.avi
        ├── BabyCrawling
        │   └── v_BabyCrawling_g01_c01.avi
        ├── BalanceBeam
        │   └── v_BalanceBeam_g01_c01.avi
        ├── BandMarching
        │   └── v_BandMarching_g01_c01.avi
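
Before training, the layout above (one sub-directory per class, video files inside) can be checked with a short script. This is a sketch using only the standard library; `index_dataset` and `VIDEO_SUFFIXES` are hypothetical names, and the accepted file extensions are an assumption based on the `.avi` files shown in the tree.

```python
from pathlib import Path

# Assumption: videos are stored as .avi (as in the tree above) or .mp4.
VIDEO_SUFFIXES = {".avi", ".mp4"}


def index_dataset(root: Path) -> dict[str, list[Path]]:
    """Map each class directory under `root` to its sorted list of video files."""
    if not root.is_dir():
        raise FileNotFoundError(f"dataset root not found: {root}")
    classes: dict[str, list[Path]] = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        videos = sorted(
            v for v in class_dir.iterdir()
            if v.is_file() and v.suffix.lower() in VIDEO_SUFFIXES
        )
        if videos:  # skip class directories that contain no videos
            classes[class_dir.name] = videos
    return classes


def summarize(classes: dict[str, list[Path]]) -> str:
    """One line per class with its video count, for a quick visual check."""
    return "\n".join(f"{name}: {len(videos)} video(s)" for name, videos in classes.items())
```

For the sample data, `print(summarize(index_dataset(Path("train_datasets/UCF-101_tiny"))))` would list each class directory with its video count.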


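## Training

    # Train on UCF-101 (replace N with the number of processes to launch on this node, typically one per DCU)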
    torchrun --nnodes=1 --nproc_per_node=N train.py --config ./configs/ucf101/ucf101_train.yaml

Note: the required pretrained models must be prepared before training; see `Inference - Model Download`.
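
When the launch is scripted, the training command can be assembled programmatically so that `N` is always filled in. The sketch below only builds the argument list for the `torchrun` invocation shown above and does not start training; `build_torchrun_command` is a hypothetical helper, and the single-node setting (`--nnodes=1`) is taken from the command above.

```python
import shlex


def build_torchrun_command(
    nproc_per_node: int,
    config: str = "./configs/ucf101/ucf101_train.yaml",
) -> list[str]:
    """Return the single-node torchrun argument list for train.py."""
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be at least 1")
    return [
        "torchrun",
        "--nnodes=1",
        f"--nproc_per_node={nproc_per_node}",
        "train.py",
        "--config",
        config,
    ]


# Print the command for 4 processes; pass the list to subprocess.run to execute it.
print(shlex.join(build_torchrun_command(4)))
```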

## Inference

### Model Download

https://hf-mirror.com/maxin-cn/Latte/tree/main

https://hf-mirror.com/PixArt-alpha/PixArt-XL-2-512x512/tree/main/transformer

    share_ckpts/
    ├── ffs.pt
    ├── skytimelapse.pt
    ├── t2v.pt
    └── ...
    
    pretrained_models/
    ├── sd-vae-ft-ema
    │   ├── config.json
    │   └── diffusion_pytorch_model.bin
    ├── sd-vae-ft-mse
    │   ├── config.json
    │   └── diffusion_pytorch_model.bin
    ├── ....
    ├── t2v_required_models
    │   ├── model_index.json
    │   ├── scheduler
    │   │   └── scheduler_config.json
    │   ├── text_encoder
    │   │   ├── config.json
    │   │   ├── model-00001-of-00004.safetensors
    │   │   ├── model-00002-of-00004.safetensors
    │   │   ├── model-00003-of-00004.safetensors
    │   │   ├── model-00004-of-00004.safetensors
    │   │   └── model.safetensors.index.json
    │   ├── tokenizer
    │   │   ├── added_tokens.json
    │   │   ├── special_tokens_map.json
    │   │   ├── spiece.model
    │   │   └── tokenizer_config.json
    │   ├── transformer
    │   │   ├── config.json
    │   │   └── diffusion_pytorch_model.safetensors
    │   └── vae
    │       ├── config.json
    │       └── diffusion_pytorch_model.safetensors
    └── vae
        ├── config.json
        └── diffusion_pytorch_model.bin
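
After downloading, a quick check that the files landed in the layout above can save a failed run. This is a minimal sketch: `EXPECTED_FILES` lists only a subset of the paths shown in the tree, `missing_files` is a hypothetical helper, and the base directory that holds `share_ckpts/` and `pretrained_models/` is an assumption you should adjust to your setup.

```python
from pathlib import Path

# A subset of the files shown in the tree above; extend it as needed.
EXPECTED_FILES = [
    "share_ckpts/ffs.pt",
    "share_ckpts/skytimelapse.pt",
    "share_ckpts/t2v.pt",
    "pretrained_models/sd-vae-ft-ema/config.json",
    "pretrained_models/sd-vae-ft-ema/diffusion_pytorch_model.bin",
    "pretrained_models/t2v_required_models/model_index.json",
    "pretrained_models/t2v_required_models/transformer/config.json",
    "pretrained_models/vae/config.json",
    "pretrained_models/vae/diffusion_pytorch_model.bin",
]


def missing_files(base: Path, expected: list[str] = EXPECTED_FILES) -> list[str]:
    """Return the expected relative paths that do not exist under `base`."""
    return [rel for rel in expected if not (base / rel).is_file()]


# Assumption: the script is run from the directory holding both folders.
missing = missing_files(Path("."))
if missing:
    print("missing files:")
    for rel in missing:
        print("  " + rel)
else:
    print("all listed files are present")
```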


### Commands

    # FaceForensics (face videos)
    # Generate a single video
    bash sample/ffs.sh

    # Generate multiple videos
    bash sample/ffs_ddp.sh

    # sky (sky videos)
    bash sample/sky.sh
    
    bash sample/sky_ddp.sh

    # taichi (tai chi videos)
    bash sample/taichi.sh

    bash sample/taichi_ddp.sh

    # ucf101 (action videos)
    bash sample/ucf101.sh

    bash sample/ucf101_ddp.sh

    # Text-to-video
    bash sample/t2v.sh
    bash sample/t2v.sh
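
The commands above follow a regular pattern: one script per task, plus a `_ddp` script for generating multiple videos where one is listed. A small lookup, sketched below, makes that pattern explicit for use in automation. The task keys and the `sample_command` helper are hypothetical; the script paths are the ones listed above, and no `_ddp` script is listed for text-to-video.

```python
from typing import Optional

# task -> (single-video script, multi-video script or None if none is listed above)
SAMPLE_SCRIPTS: dict[str, tuple[str, Optional[str]]] = {
    "ffs": ("sample/ffs.sh", "sample/ffs_ddp.sh"),
    "sky": ("sample/sky.sh", "sample/sky_ddp.sh"),
    "taichi": ("sample/taichi.sh", "sample/taichi_ddp.sh"),
    "ucf101": ("sample/ucf101.sh", "sample/ucf101_ddp.sh"),
    "t2v": ("sample/t2v.sh", None),
}


def sample_command(task: str, multiple: bool = False) -> list[str]:
    """Return the bash command (as an argument list) for the given sampling task."""
    if task not in SAMPLE_SCRIPTS:
        raise KeyError(f"unknown task {task!r}; choose from {sorted(SAMPLE_SCRIPTS)}")
    single, multi = SAMPLE_SCRIPTS[task]
    if multiple and multi is None:
        raise ValueError(f"no multi-video script is listed for task {task!r}")
    return ["bash", multi if multiple else single]


print(" ".join(sample_command("sky", multiple=True)))
```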

## Results

![alt text](readme_imgs/test.gif)

### Accuracy



## Application Scenarios

### Algorithm Category

`Video generation`

### Key Application Industries

`Media, scientific research, education`

## Source Repository and Issue Feedback

* https://developer.hpccube.com/codes/modelzoo/latte_pytorch

## References

* https://github.com/Vchitect/Latte