# Open-Sora

## Paper

**Video generation models as world simulators**

* https://openai.com/research/video-generation-models-as-world-simulators

## Model Architecture

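This is a `Transformer`-based video generation model. It consists of a `Video Encoder-Decoder` that compresses videos/images into a latent space and reconstructs them, a `Transformer-based Latent Stable Diffusion` model that performs diffusion (noising) and denoising, and a `Conditioning` module that supplies the condition for the training videos (here, their text descriptions).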

![alt text](readme_imgs/image-1.png)
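
The sketch below makes the compression step concrete by tracking tensor shapes from pixels to Transformer tokens. Every number in it is an assumption about an Open-Sora v1-style setup (an image VAE with 8× spatial downsampling, 4 latent channels and no temporal compression, plus a (1, 2, 2) patch size), not a value read from this repository's configs; `LatentConfig`, `latent_shape` and `num_tokens` are hypothetical helpers.

```python
# Hypothetical shape bookkeeping: video (frames, H, W) -> VAE latent -> tokens.
# All defaults below are ASSUMPTIONS, not values taken from configs/opensora/.
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentConfig:
    vae_spatial_stride: int = 8    # assumed VAE downsampling per spatial axis
    vae_temporal_stride: int = 1   # assumed: frames are encoded independently
    latent_channels: int = 4       # assumed number of VAE latent channels
    patch_size: tuple = (1, 2, 2)  # assumed (T, H, W) patch size of the Transformer


def latent_shape(frames, height, width, cfg=LatentConfig()):
    """Return the (C, T, H, W) shape of the VAE latent for one video."""
    return (
        cfg.latent_channels,
        frames // cfg.vae_temporal_stride,
        height // cfg.vae_spatial_stride,
        width // cfg.vae_spatial_stride,
    )


def num_tokens(frames, height, width, cfg=LatentConfig()):
    """Return the number of Transformer tokens after patchifying the latent."""
    _, t, h, w = latent_shape(frames, height, width, cfg)
    pt, ph, pw = cfg.patch_size
    return (t // pt) * (h // ph) * (w // pw)


if __name__ == "__main__":
    for f, h, w in [(16, 256, 256), (16, 512, 512)]:
        print(f"{f}x{h}x{w}: latent {latent_shape(f, h, w)}, tokens {num_tokens(f, h, w)}")
```

Under these assumptions a 16×256×256 clip becomes a 4×16×32×32 latent and 4,096 tokens, while 16×512×512 becomes 16,384 tokens, which is consistent with the higher time and memory figures given for the 512×512 config under Inference.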


## Algorithm

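The algorithm learns the distribution of videos by running diffusion (noising) and reverse diffusion (denoising) on videos in the latent space with a `Transformer` model.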

![alt text](readme_imgs/image-2.png)
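
As a minimal sketch of the diffusion half, the code below implements the standard DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise, on a flattened latent. The linear beta schedule (1e-4 to 0.02 over 1000 steps) is the common DDPM default and is assumed here; the scheduler this repository actually uses may differ. Plain Python lists stand in for tensors so the example runs without PyTorch.

```python
# Minimal sketch of DDPM-style forward noising, assuming a linear beta schedule.
# Standard library only; a real implementation operates on torch tensors.
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) and return (x_t, noise) so noise can be a training target."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]
    return xt, noise


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = linear_alpha_bars()
    latent = [1.0] * 8  # stand-in for a flattened video latent
    for t in (0, 500, 999):
        xt, _ = q_sample(latent, t, alpha_bars, rng)
        print(t, round(alpha_bars[t], 4), [round(v, 2) for v in xt[:3]])
```

During training, the Transformer receives x_t, the step t and the text condition and is trained to predict `noise` (typically with a mean-squared-error loss); sampling runs the process in reverse, starting from pure noise.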


## Environment Setup

### Docker (Option 1)

    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.3.0-ubuntu20.04-dtk24.04.2-py3.10

    docker run --shm-size 10g --network=host --name=opensora --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute/path/to/project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install -r requirements.txt

    pip install .
    
    # Download the two wheels from the Developer Community first:
    # https://download.sourcefind.cn:65024/directlink/4/bitsandbytes/DAS1.2/bitsandbytes-0.42.0+das.opt1.dtk24042-py3-none-any.whl
    # https://download.sourcefind.cn:65024/directlink/4/diffusers/DAS1.2/diffusers-0.29.0+das.opt1.dtk24042-py3-none-any.whl
    pip install bitsandbytes-0.42.0+das.opt1.dtk24042-py3-none-any.whl
    pip install diffusers-0.29.0+das.opt1.dtk24042-py3-none-any.whl

### Dockerfile (Option 2)

    # Run from the directory that contains the Dockerfile
    docker build -t <IMAGE_NAME>:<TAG> .

    docker run --shm-size 10g --network=host --name=opensora --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute/path/to/project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install -r requirements.txt

    pip install .
    
    # Download the two wheels from the Developer Community first:
    # https://download.sourcefind.cn:65024/directlink/4/bitsandbytes/DAS1.2/bitsandbytes-0.42.0+das.opt1.dtk24042-py3-none-any.whl
    # https://download.sourcefind.cn:65024/directlink/4/diffusers/DAS1.2/diffusers-0.29.0+das.opt1.dtk24042-py3-none-any.whl
    pip install bitsandbytes-0.42.0+das.opt1.dtk24042-py3-none-any.whl
    pip install diffusers-0.29.0+das.opt1.dtk24042-py3-none-any.whl

### Anaconda (Option 3)
1. The DCU-specific deep learning libraries required by this project can be downloaded from the Guanghe Developer Community:
https://developer.sourcefind.cn/tool/

    DTK driver: dtk24.04.2
    python:python3.10
    torch:2.3.0
    torchvision:0.18.1
    triton:2.1.0
    apex:1.3.0
    bitsandbytes:0.42.0
    diffusers:0.29.0

Tip: the versions of the DCU-related tools above (DTK driver, python, torch, etc.) must match each other exactly.
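
Because the versions must match exactly, a quick check can save debugging time. The sketch below compares installed Python packages against the list above using the standard-library `importlib.metadata`. It cannot check the DTK driver, and the distribution names (especially `apex`) are assumptions that may differ for the DCU builds; `find_mismatches` and `installed_versions` are hypothetical helpers.

```python
# Sketch: compare installed package versions with the pinned DCU stack above.
# The package names are ASSUMED to match the installed distribution names.
import sys
from importlib import metadata

EXPECTED = {
    "torch": "2.3.0",
    "torchvision": "0.18.1",
    "triton": "2.1.0",
    "apex": "1.3.0",
    "bitsandbytes": "0.42.0",
    "diffusers": "0.29.0",
}
EXPECTED_PYTHON = (3, 10)


def public_version(version):
    """Strip a PEP 440 local version label: '2.3.0+das.opt1' -> '2.3.0'."""
    return version.split("+", 1)[0]


def find_mismatches(installed, expected=EXPECTED):
    """Return {name: (expected, found)} for missing or mismatched packages."""
    problems = {}
    for name, want in expected.items():
        found = installed.get(name)
        if found is None or public_version(found) != want:
            problems[name] = (want, found)
    return problems


def installed_versions(names):
    """Look up installed distribution versions; missing ones map to None."""
    result = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    if sys.version_info[:2] != EXPECTED_PYTHON:
        print(f"python: expected {EXPECTED_PYTHON}, found {sys.version_info[:2]}")
    for name, (want, found) in find_mismatches(installed_versions(EXPECTED)).items():
        print(f"{name}: expected {want}, found {found}")
```

Only the part before `+` is compared, because DCU wheels such as `bitsandbytes-0.42.0+das.opt1.dtk24042` carry a local version suffix.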

2. Install the remaining libraries from requirements.txt, then install the project itself:

    pip install -r requirements.txt

    pip install .

## Dataset

Full dataset (hd-vg-130m) download: https://drive.google.com/drive/folders/154S6raNg9NpDGQRlRhhAaYcAx5xq1Ok8

The following data can be used for quick validation:

https://opendatalab.com/OpenDataLab/ImageNet-1K/tree/main/raw (ImageNet)

https://www.crcv.ucf.edu/research/data-sets/ucf101/ (UCF101)

Mini dataset: https://pan.baidu.com/s/1nPEAC_52IuB5KF-5BAqGDA (extraction code: kwai)

Directory layout (UCF101):

    UCF-101/
    ├── ApplyEyeMakeup
    │   ├── v_ApplyEyeMakeup_g01_c01.avi
    │   ├── v_ApplyEyeMakeup_g01_c02.avi
    │   ├── v_ApplyEyeMakeup_g01_c03.avi
    │   ├── ...

Process the data with the conversion script to generate the corresponding CSV file:

    # ImageNet
    python -m tools.datasets.convert_dataset imagenet IMAGENET_FOLDER --split train

    # UCF101
    python -m tools.datasets.convert_dataset ucf101 UCF101_FOLDER --split videos  # e.g. --split ApplyEyeMakeup
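
For orientation, the sketch below shows the kind of CSV this step produces for a UCF-101-style tree: one row per video, pairing its path with a caption. The column layout (path, caption, no header) and the use of the raw class folder name as the caption are assumptions; `collect_samples` and `write_csv` are hypothetical, and the repository's `tools.datasets.convert_dataset` remains the tool to use.

```python
# Hypothetical illustration of a video -> CSV conversion for a UCF-101-style tree
# (one subfolder per action class). The (path, caption) layout is an ASSUMPTION.
import csv
from pathlib import Path

VIDEO_EXTS = {".avi", ".mp4"}


def collect_samples(root):
    """Return sorted (absolute_path, class_name) pairs for videos under root/<class>/."""
    root = Path(root)
    samples = []
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for video in sorted(class_dir.iterdir()):
            if video.suffix.lower() in VIDEO_EXTS:
                samples.append((str(video.resolve()), class_dir.name))
    return samples


def write_csv(samples, out_path):
    """Write one (path, caption) row per sample, without a header row."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(samples)


if __name__ == "__main__":
    import sys

    write_csv(collect_samples(sys.argv[1]), sys.argv[2])
```

Saved as a hypothetical `make_csv.py`, it would be run as `python make_csv.py /path/to/UCF-101 ucf101.csv`.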

## Training

Coming soon!

<!-- ### Model Download

### Command Line
    
    # If you cannot connect to huggingface, run:
    export HF_ENDPOINT=https://hf-mirror.com

    # 1 GPU, 16x256x256
    torchrun --nnodes=1 --nproc_per_node=1 scripts/train.py configs/opensora/train/16x256x256.py --data-path YOUR_CSV_PATH
    # 8 GPUs, 64x512x512
    torchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT


See also the T5 download in the `Inference` section. -->

<!-- ### Command Line -->


## Inference

### Model Download

| Resolution | Data   | Iterations | Batch Size | GPU days (H800) | URL                                                                                  | SCNet high-speed download |
| ---------- | ------ | ---------- | ---------- | --------------- | ------------------------------------------------------------------------------------ | ------------------------- |
| 16×256×256 | 366K   | 80k        | 8×64       | 117             | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth    | [SCNet]                   |
| 16×256×256 | 20K HQ | 24k        | 8×64       | 45              | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth | [SCNet]                   |
| 16×512×512 | 20K HQ | 20k        | 2×64       | 35              | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth | [SCNet]                   |


T5 text encoder: [t5-v1_1-xxl](https://huggingface.co/DeepFloyd/t5-v1_1-xxl/tree/main)

    pretrained_models/
    └── t5_ckpts
        └── t5-v1_1-xxl
            ├── config.json
            ├── pytorch_model-00001-of-00002.bin
            ├── pytorch_model-00002-of-00002.bin
            ├── pytorch_model.bin.index.json
            ├── special_tokens_map.json
            ├── spiece.model
            └── tokenizer_config.json
    
    models/
    ├── OpenSora-v1-HQ-16x256x256.pth
    └── ...
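
Before launching inference, a small check like the one below can confirm that the files are where the layout above expects them. The file names are copied from the tree; the helper `missing_files` is hypothetical, and it only checks that files exist, not that they are complete or uncorrupted.

```python
# Sketch: verify the weight layout shown above before running inference.
from pathlib import Path

T5_DIR = Path("pretrained_models/t5_ckpts/t5-v1_1-xxl")
T5_FILES = [
    "config.json",
    "pytorch_model-00001-of-00002.bin",
    "pytorch_model-00002-of-00002.bin",
    "pytorch_model.bin.index.json",
    "special_tokens_map.json",
    "spiece.model",
    "tokenizer_config.json",
]


def missing_files(base="."):
    """Return the expected paths (relative to base) that are not present."""
    base = Path(base)
    missing = [
        (T5_DIR / name).as_posix()
        for name in T5_FILES
        if not (base / T5_DIR / name).is_file()
    ]
    models_dir = base / "models"
    # At least one Open-Sora checkpoint (*.pth) is expected in models/.
    if not (models_dir.is_dir() and any(models_dir.glob("*.pth"))):
        missing.append("models/*.pth")
    return missing


if __name__ == "__main__":
    problems = missing_files()
    print("OK" if not problems else "Missing:\n  " + "\n  ".join(problems))
```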


Note: you can use `https://hf-mirror.com` to speed up downloading the model weights.
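
A minimal download sketch, assuming the `huggingface_hub` package is installed (`pip install huggingface_hub`). It uses `snapshot_download` and `hf_hub_download` to place the T5 files and one Open-Sora checkpoint into the layout above. The mirror is selected through the `HF_ENDPOINT` environment variable, which huggingface_hub reads when it is imported, so the variable is set before the deferred import. Repository IDs and the file name come from the links above; the `allow_patterns` filter is an assumption meant to skip files the tree does not list.

```python
# Sketch: download weights into the layout above via huggingface_hub (assumed installed).
import os

# Must be set before huggingface_hub is imported; an existing value is kept.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

T5_REPO = "DeepFloyd/t5-v1_1-xxl"
T5_DIR = "pretrained_models/t5_ckpts/t5-v1_1-xxl"
OPENSORA_REPO = "hpcai-tech/Open-Sora"
CKPT = "OpenSora-v1-HQ-16x256x256.pth"  # any checkpoint file name from the table


def download(ckpt=CKPT):
    """Download the T5 encoder and one Open-Sora checkpoint; return their local paths."""
    from huggingface_hub import hf_hub_download, snapshot_download  # deferred import

    t5_path = snapshot_download(
        repo_id=T5_REPO,
        local_dir=T5_DIR,
        allow_patterns=["*.json", "*.bin", "spiece.model"],  # assumed filter
    )
    ckpt_path = hf_hub_download(repo_id=OPENSORA_REPO, filename=ckpt, local_dir="models")
    return t5_path, ckpt_path


if __name__ == "__main__":
    print(download())
```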


### Command Line

    # Sample 16x256x256 (~5 s/sample), GPU memory ~32 GB
    torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path ./path/to/your/ckpt.pth

    # Sample 16x512x512 (~20 s/sample, 100 time steps), GPU memory > 32 GB
    torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x512x512.py --ckpt-path ./path/to/your/ckpt.pth


## Results

|Model|Prompt|Result|
|:---|:---|:---|
|16×256×256|`assets/texts/t2v_samples.txt:1`|![alt text](readme_imgs/r0.gif)|
|16×256×256|`assets/texts/t2v_samples.txt:2`|![alt text](readme_imgs/r1.gif)|
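
The prompt column refers to a line of a prompt file. Assuming `file:N` means the 1-based line N, the hypothetical helper below looks up the prompt behind a result row.

```python
# Sketch: resolve a "path:N" prompt reference, ASSUMING N is a 1-based line number.
from pathlib import Path


def read_prompt(ref):
    """Return line N (1-based) of the file named in 'path:N', with whitespace stripped."""
    path, _, lineno = ref.rpartition(":")  # split on the last colon only
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return lines[int(lineno) - 1].strip()


if __name__ == "__main__":
    print(read_prompt("assets/texts/t2v_samples.txt:1"))
```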

### Accuracy



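## Application Scenarios

### Algorithm Category

`Video generation`

### Key Application Industries

`Media, scientific research, education`

## Source Repository and Issue Feedback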

* https://developer.sourcefind.cn/codes/modelzoo/open-sora_pytorch

## References

* https://github.com/hpcaitech/Open-Sora