# Open-Sora

## Paper

**video-generation-models-as-world-simulators**

* https://openai.com/research/video-generation-models-as-world-simulators

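## Model Architecture

This is a `Transformer`-based video generation model. It consists of a `Video Encoder-Decoder` that compresses and reconstructs videos/images, a `Transformer-based Latent Stable Diffusion` module that performs diffusion and denoising, and a `Conditioning` module that produces the conditions for the training videos (here, text descriptions).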

![alt text](readme_imgs/image-1.png)
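
To give a rough sense of what the Transformer operates on after the encoder has compressed a clip, the sketch below counts latent tokens for a given clip size. The 8x spatial compression of the encoder and the 1x2x2 patch size are assumptions chosen for illustration; this README does not state them, and `latent_token_count` is a hypothetical helper, not part of the repository.

```python
def latent_token_count(frames, height, width,
                       encoder_stride=(1, 8, 8),  # assumed (T, H, W) compression
                       patch_size=(1, 2, 2)):     # assumed (T, H, W) patch size
    """Count the tokens produced by compressing a clip and patchifying the latent."""
    tokens = 1
    for size, stride, patch in zip((frames, height, width), encoder_stride, patch_size):
        step = stride * patch
        if size % step != 0:
            raise ValueError(f"dimension {size} is not divisible by {step}")
        tokens *= size // step
    return tokens


# Under the assumed strides, a 16x256x256 clip gives 16 * 16 * 16 tokens.
tokens_256 = latent_token_count(16, 256, 256)
tokens_512 = latent_token_count(16, 512, 512)
```

Under these assumptions the sequence length grows four-fold when the resolution doubles, which fits the higher memory requirement noted for the 16x512x512 inference command.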


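## Algorithm

The algorithm learns the distribution of videos by running diffusion and reverse diffusion on them in latent space with a `Transformer` model.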

![alt text](readme_imgs/image-2.png)
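
The forward (noising) half of this process can be sketched in a few lines of plain Python. The README does not specify the noise schedule, so the linear beta schedule below is an assumption borrowed from DDPM-style diffusion, and the latent is represented as a flat list of floats purely for illustration. All function names are hypothetical.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule (DDPM-style); not taken from this repository."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative product of (1 - beta_t): the fraction of signal variance kept at step t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(latent, t, abar, rng):
    """Sample x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps for a flat latent."""
    signal_scale = math.sqrt(abar[t])
    noise_scale = math.sqrt(1.0 - abar[t])
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    noisy = [signal_scale * x + noise_scale * e for x, e in zip(latent, noise)]
    return noisy, noise


abar = alpha_bars(linear_betas(1000))
noisy_latent, target_noise = add_noise([0.5, -1.0, 0.25], 500, abar, random.Random(0))
```

During training, a denoising network (here the Transformer) would be asked to predict `target_noise` from `noisy_latent`, the step index, and the text condition; sampling runs the process in reverse.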


## Environment Setup

### Docker (Method 1)

    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk23.10.1-py38

    docker run --shm-size 10g --network=host --name=opensora --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install flash_attn-2.0.4_torch2.1_dtk2310-cp38-cp38-linux_x86_64.whl  # provided in whl.zip

    pip install triton-2.1.0%2Bgit34f8189.abi0.dtk2310-cp38-cp38-manylinux2014_x86_64.whl  # download from the developer community

    cd xformers && pip install xformers==0.0.23 --no-deps && bash patch_xformers.rocm.sh  # provided in whl.zip

    pip install -r requirements.txt

    pip install .

### Dockerfile (Method 2)

    # Run this in the directory that contains the Dockerfile
    docker build -t <IMAGE_NAME>:<TAG> .

    docker run --shm-size 10g --network=host --name=opensora --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install flash_attn-2.0.4_torch2.1_dtk2310-cp38-cp38-linux_x86_64.whl  # provided in whl.zip

    pip install triton-2.1.0%2Bgit34f8189.abi0.dtk2310-cp38-cp38-manylinux2014_x86_64.whl  # download from the developer community

    cd xformers && pip install xformers==0.0.23 --no-deps && bash patch_xformers.rocm.sh  # provided in whl.zip

    pip install -r requirements.txt

    pip install .

### Anaconda (Method 3)
(1) The special deep learning libraries required by the DCU cards used in this project can be downloaded and installed from the Guanghe (光合) developer community:
https://developer.hpccube.com/tool/

    DTK driver: dtk23.10.1
    python: python3.8
    torch: 2.1.0
    torchvision: 0.16.0
    triton: 2.1.0
    apex: 0.1

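Tips: the versions of the DTK driver, python, torch, and the other DCU-related tools listed above must correspond to each other exactly.

(2) Install the other, non-special libraries according to requirements.txt:

    pip install flash_attn-2.0.4_torch2.1_dtk2310-cp38-cp38-linux_x86_64.whl  # provided in whl.zip

    cd xformers && pip install xformers==0.0.23 --no-deps && bash patch_xformers.rocm.sh  # provided in whl.zip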

    pip install -r requirements.txt

    pip install .
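
Because the versions listed in the first step of this method must match exactly, it can help to check them after installation. The sketch below is a hypothetical helper (not part of the repository) that compares installed package versions against that list using the standard library. It assumes DCU wheels may carry a local suffix after `+`, as the triton wheel filename above suggests, and ignores that suffix when comparing.

```python
import sys
from importlib import metadata

# Versions copied from the list in this section.
EXPECTED_PYTHON = (3, 8)
EXPECTED_PACKAGES = {
    "torch": "2.1.0",
    "torchvision": "0.16.0",
    "triton": "2.1.0",
    "apex": "0.1",
}


def version_matches(installed, expected):
    """Compare versions, ignoring any local build suffix such as '+git34f8189...'."""
    base = installed.split("+", 1)[0]
    return base == expected or base.startswith(expected + ".")


def check_environment(expected=EXPECTED_PACKAGES):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        report[name] = (have, have is not None and version_matches(have, want))
    return report


def python_matches(expected=EXPECTED_PYTHON):
    """True when the running interpreter has the expected major.minor version."""
    return sys.version_info[:2] == expected
```

Calling `check_environment()` inside the container or conda environment returns one entry per package, so mismatches can be spotted before running training or inference. The DTK driver version is not visible to `pip` and has to be checked separately.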

## Dataset

Full dataset download: https://drive.google.com/drive/folders/154S6raNg9NpDGQRlRhhAaYcAx5xq1Ok8

SCNet high-speed download channel: [hd-vg-130m](http://113.200.138.88:18080/aidatasets/project-dependency/hd-vg-130m/-/tree/main?ref_type=heads)

The following data can be used for quick validation:

https://opendatalab.com/OpenDataLab/ImageNet-1K/tree/main/raw (ImageNet)

https://www.crcv.ucf.edu/research/data-sets/ucf101/ (UCF101)

Link: https://pan.baidu.com/s/1nPEAC_52IuB5KF-5BAqGDA
Extraction code: kwai  (mini dataset)

SCNet high-speed download channel:
- [imagenet-1k](http://113.200.138.88:18080/aidatasets/project-dependency/imagenet-1k)
- [UCF101](http://113.200.138.88:18080/aidatasets/project-dependency/ucf101/-/blob/master/UCF101.rar?ref_type=heads)
- [mini](http://113.200.138.88:18080/aidatasets/project-dependency/mini/-/tree/master/datasets?ref_type=heads)

Data structure

    UCF-101/
    ├── ApplyEyeMakeup
    │   ├── v_ApplyEyeMakeup_g01_c01.avi
    │   ├── v_ApplyEyeMakeup_g01_c02.avi
    │   ├── v_ApplyEyeMakeup_g01_c03.avi
    │   ├── ...
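
As an illustration of how this layout maps to per-video records, the sketch below walks a `UCF-101/`-style folder and writes one row per `.avi` file, using the class folder name as the label. It is not the repository's `tools.datasets.convert_dataset` script, and the `path` and `label` column names are hypothetical; use the official script to produce the csv files that training expects.

```python
import csv
from pathlib import Path


def list_ucf101(root):
    """Return (video_path, class_name) pairs for every .avi under root/<class>/."""
    rows = []
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        for video in sorted(class_dir.glob("*.avi")):
            rows.append((str(video), class_dir.name))
    return rows


def write_csv(rows, out_path):
    """Write the pairs to a csv file with a header row (hypothetical column names)."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "label"])
        writer.writerows(rows)
```

For example, `write_csv(list_ucf101("UCF-101"), "ucf101_preview.csv")` gives a quick way to confirm that the dataset was extracted into the expected structure.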

Use the scripts below to process the data and generate the corresponding csv files:

    # ImageNet
    python -m tools.datasets.convert_dataset imagenet IMAGENET_FOLDER --split train

    # UCF101
    python -m tools.datasets.convert_dataset ucf101 UCF101_FOLDER --split videos  # e.g. ApplyEyeMakeup

## Training

Coming soon!

<!-- ### Model Download

### Command Line
    
    # If the connection to huggingface fails, run this command first
    export HF_ENDPOINT=https://hf-mirror.com

    # 1 GPU, 16x256x256
    torchrun --nnodes=1 --nproc_per_node=1 scripts/train.py configs/opensora/train/16x256x256.py --data-path YOUR_CSV_PATH
    # 8 GPUs, 64x512x512
    torchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT


See also the T5 download instructions in the `Inference` section. -->

<!-- ### Command Line -->


## Inference

### Model Download

| Resolution | Data   | #iterations | Batch Size | GPU days (H800) | URL | SCNet high-speed download channel |
| ---------- | ------ | ----------- | ---------- | --------------- | --- | --------------------------------- |
| 16×256×256 | 366K   | 80k         | 8×64       | 117             | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth    |[OpenSora-v1-16x256x256.pth](http://113.200.138.88:18080/aimodels/Open-Sora/-/blob/main/OpenSora-v1-16x256x256.pth?ref_type=heads)|
| 16×256×256 | 20K HQ | 24k         | 8×64       | 45              | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth |[OpenSora-v1-HQ-16x256x256.pth](http://113.200.138.88:18080/aimodels/Open-Sora/-/blob/main/OpenSora-v1-HQ-16x256x256.pth?ref_type=heads)|
| 16×512×512 | 20K HQ | 20k         | 2×64       | 35              | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth |[OpenSora-v1-HQ-16x512x512.pth](http://113.200.138.88:18080/aimodels/Open-Sora/-/blob/main/OpenSora-v1-HQ-16x512x512.pth?ref_type=heads)|


https://huggingface.co/DeepFloyd/t5-v1_1-xxl/tree/main  (T5)
SCNet high-speed download channel: [T5](http://113.200.138.88:18080/aimodels/t5-v1_1-xxl)

    pretrained_models/
    └── t5_ckpts
        └── t5-v1_1-xxl
            ├── config.json
            ├── pytorch_model-00001-of-00002.bin
            ├── pytorch_model-00002-of-00002.bin
            ├── pytorch_model.bin.index.json
            ├── special_tokens_map.json
            ├── spiece.model
            └── tokenizer_config.json
    
    models/
    ├── OpenSora-v1-HQ-16x256x256.pth
    └── ...
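
Before running inference, it can be worth confirming that the T5 files are where this layout expects them. The sketch below is a hypothetical helper (not part of the repository) that reports which of the files listed above are missing; the file names are copied from the tree.

```python
from pathlib import Path

# File names copied from the directory layout above.
T5_FILES = [
    "config.json",
    "pytorch_model-00001-of-00002.bin",
    "pytorch_model-00002-of-00002.bin",
    "pytorch_model.bin.index.json",
    "special_tokens_map.json",
    "spiece.model",
    "tokenizer_config.json",
]


def missing_t5_files(root="pretrained_models"):
    """Return the expected T5 files that are absent under root/t5_ckpts/t5-v1_1-xxl."""
    base = Path(root) / "t5_ckpts" / "t5-v1_1-xxl"
    return [name for name in T5_FILES if not (base / name).is_file()]
```

An empty list from `missing_t5_files()` means every listed file is present; it does not verify that the downloads are complete or uncorrupted.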


Note: `https://hf-mirror.com` can be used to speed up downloading the model weights.
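
One way to use the mirror is to swap the host in the download URL. The sketch below builds direct-download URLs for the checkpoints in the table above, taking the endpoint from the `HF_ENDPOINT` environment variable when it is set. `resolve_url` is a hypothetical helper, and the sketch assumes that the mirror serves files under the same `/<repo>/resolve/<revision>/<file>` path layout as huggingface.co.

```python
import os

DEFAULT_ENDPOINT = "https://huggingface.co"


def resolve_url(repo_id, filename, revision="main", endpoint=None):
    """Build a direct-download URL; the endpoint falls back to HF_ENDPOINT, then the default."""
    endpoint = endpoint or os.environ.get("HF_ENDPOINT", DEFAULT_ENDPOINT)
    return f"{endpoint.rstrip('/')}/{repo_id}/resolve/{revision}/{filename}"


# Checkpoint file names taken from the model table in this section.
CHECKPOINTS = [
    "OpenSora-v1-16x256x256.pth",
    "OpenSora-v1-HQ-16x256x256.pth",
    "OpenSora-v1-HQ-16x512x512.pth",
]

mirror_urls = [
    resolve_url("hpcai-tech/Open-Sora", name, endpoint="https://hf-mirror.com")
    for name in CHECKPOINTS
]
```

The resulting URLs can be passed to any downloader such as `wget` or `curl`. Note that the table links use `/blob/`, which points at the web page for the file, whereas `/resolve/` returns the file itself.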


### Command Line

    # Sample 16x256x256 (5s/sample), GPU memory ~32 GB
    torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path ./path/to/your/ckpt.pth

    # Sample 16x512x512 (20s/sample, 100 time steps), GPU memory > 32 GB
    torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x512x512.py --ckpt-path ./path/to/your/ckpt.pth
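
When both resolutions need to be sampled from a script, the commands above can be assembled programmatically. The sketch below only builds the argument list, using the flags and config paths shown in this section; `build_inference_cmd` is a hypothetical helper, and the checkpoint path in the example mirrors the `models/` layout described earlier.

```python
import shlex


def build_inference_cmd(config, ckpt_path, nproc_per_node=1):
    """Assemble the torchrun argument list used for inference in this README."""
    return [
        "torchrun", "--standalone",
        "--nproc_per_node", str(nproc_per_node),
        "scripts/inference.py", config,
        "--ckpt-path", ckpt_path,
    ]


cmd = build_inference_cmd(
    "configs/opensora/inference/16x256x256.py",
    "models/OpenSora-v1-HQ-16x256x256.pth",
)
printable = shlex.join(cmd)  # shell-safe string for logging
# To launch it: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems if the checkpoint path contains spaces.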


## Results

|Model|Prompt|Result|
|:---|:---|:---|
|16×256×256|`assets/texts/t2v_samples.txt:1`|![alt text](readme_imgs/r0.gif)|
|16×256×256|`assets/texts/t2v_samples.txt:2`|![alt text](readme_imgs/r1.gif)|

### Accuracy



## Application Scenarios

### Algorithm Category

`Video generation`

### Key Application Industries

`Media, scientific research, education`

## Source Repository and Issue Feedback

* https://developer.hpccube.com/codes/modelzoo/open-sora_pytorch

## References

* https://github.com/hpcaitech/Open-Sora