# Open-Sora

## Paper

**video-generation-models-as-world-simulators**

* https://openai.com/research/video-generation-models-as-world-simulators

## Model Architecture

The model is a `Transformer`-based video generation model. It consists of a `Video Encoder-Decoder` that compresses videos/images into a latent space and reconstructs them from it, a `Transformer-based Latent Stable Diffusion` module that performs the diffusion/denoising process, and a `Conditioning` module that supplies the condition for each training video (here, a text description).

![alt text](readme_imgs/image-1.png)
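
To make the data flow concrete, here is a minimal, illustrative PyTorch sketch of the three stages described above. The class names, shapes, and the way the text condition is injected are assumptions for illustration only, not the actual modules in this repository.

    # Illustrative sketch only (not Open-Sora code): encode video into latents,
    # denoise the latents with a text-conditioned Transformer, decode back.
    import torch
    import torch.nn as nn

    class ToyVideoAutoencoder(nn.Module):
        """Stand-in for the Video Encoder-Decoder (compression / reconstruction)."""
        def __init__(self, in_dim=3 * 16 * 16, latent_dim=64):
            super().__init__()
            self.encoder = nn.Linear(in_dim, latent_dim)
            self.decoder = nn.Linear(latent_dim, in_dim)

        def encode(self, x):  # x: (batch, tokens, in_dim)
            return self.encoder(x)

        def decode(self, z):  # z: (batch, tokens, latent_dim)
            return self.decoder(z)

    class ToyDiffusionTransformer(nn.Module):
        """Stand-in for the Transformer-based latent diffusion model."""
        def __init__(self, latent_dim=64, cond_dim=64, nhead=4):
            super().__init__()
            layer = nn.TransformerEncoderLayer(latent_dim, nhead, batch_first=True)
            self.backbone = nn.TransformerEncoder(layer, num_layers=2)
            self.cond_proj = nn.Linear(cond_dim, latent_dim)

        def forward(self, noisy_latents, text_emb):
            # Real models use cross-attention / adaLN for conditioning;
            # simple addition keeps the sketch short.
            h = noisy_latents + self.cond_proj(text_emb)
            return self.backbone(h)  # predicted noise

    vae = ToyVideoAutoencoder()
    dit = ToyDiffusionTransformer()
    video_tokens = torch.randn(2, 128, 3 * 16 * 16)  # fake patchified video
    text_emb = torch.randn(2, 128, 64)               # fake text embeddings
    latents = vae.encode(video_tokens)
    noise_pred = dit(latents, text_emb)
    recon = vae.decode(latents - noise_pred)         # one illustrative denoising step
    print(recon.shape)                               # torch.Size([2, 128, 768])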


## Algorithm

The algorithm learns the distribution of videos by performing diffusion and denoising on them with a `Transformer` model in the latent space.

![alt text](readme_imgs/image-2.png)
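
As a concrete example of what "diffusion/denoising in the latent space" means, the following is a generic DDPM-style training step (a sketch under the assumption of a standard noise-prediction objective, not the repository's exact schedule or loss): noise the latents at a random timestep in the forward process, then train the Transformer to predict that noise given the text condition.

    # Generic latent-diffusion training step (illustrative; `model` can be any
    # noise-prediction network, e.g. the toy Transformer sketched above).
    import torch
    import torch.nn.functional as F

    def diffusion_training_step(model, latents, text_emb, alphas_cumprod):
        """latents: (B, N, D) from the video encoder; alphas_cumprod: (T,) noise schedule."""
        b = latents.shape[0]
        t = torch.randint(0, alphas_cumprod.shape[0], (b,))      # random timestep per sample
        a = alphas_cumprod[t].view(b, 1, 1)
        noise = torch.randn_like(latents)
        noisy = a.sqrt() * latents + (1.0 - a).sqrt() * noise    # forward diffusion q(z_t | z_0)
        noise_pred = model(noisy, text_emb)                      # reverse process: predict the noise
        return F.mse_loss(noise_pred, noise)                     # standard noise-prediction loss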


## Environment Setup

### Docker (Method 1)

    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk23.10.1-py38

    docker run --shm-size 10g --network=host --name=opensora --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to this project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install flash_attn-2.0.4_torch2.1_dtk2310-cp38-cp38-linux_x86_64.whl  # included in whl.zip

    pip install triton-2.1.0%2Bgit34f8189.abi0.dtk2310-cp38-cp38-manylinux2014_x86_64.whl  # download from the developer community

    cd xformers && pip install xformers==0.0.23 --no-deps && bash patch_xformers.rocm.sh  # included in whl.zip

    pip install -r requirements.txt

    pip install .
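
Once inside the container, a quick check (a generic PyTorch snippet, not part of this repository) that the DCU devices passed through via `--device=/dev/kfd --device=/dev/dri` are actually visible:

    # Generic device-visibility check; DCU devices appear through the ROCm/HIP backend.
    import torch

    print("torch        :", torch.__version__)
    print("hip build    :", torch.version.hip)        # non-empty on a ROCm/DCU build
    print("device avail :", torch.cuda.is_available())
    print("device count :", torch.cuda.device_count())
    if torch.cuda.is_available():
        print("device 0     :", torch.cuda.get_device_name(0))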

### Dockerfile (Method 2)

    # Run from the directory that contains the Dockerfile
    docker build -t <IMAGE_NAME>:<TAG> .

    docker run --shm-size 10g --network=host --name=opensora --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to this project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install flash_attn-2.0.4_torch2.1_dtk2310-cp38-cp38-linux_x86_64.whl  # included in whl.zip

    pip install triton-2.1.0%2Bgit34f8189.abi0.dtk2310-cp38-cp38-manylinux2014_x86_64.whl  # download from the developer community

    cd xformers && pip install xformers==0.0.23 --no-deps && bash patch_xformers.rocm.sh  # included in whl.zip

    pip install -r requirements.txt

    pip install .

### Anaconda (Method 3)

1. The DCU-specific deep learning libraries required by this project can be downloaded and installed from the developer community:
https://developer.hpccube.com/tool/

    DTK driver: dtk23.10.1
    python: 3.8
    torch: 2.1.0
    torchvision: 0.16.0
    triton: 2.1.0
    apex: 0.1

Tips: the DTK driver, Python, torch, and the other DCU-related tools above must be used in exactly these matching versions.

2. Install the remaining non-DCU-specific libraries according to requirements.txt:

    pip install flash_attn-2.0.4_torch2.1_dtk2310-cp38-cp38-linux_x86_64.whl  # included in whl.zip

    cd xformers && pip install xformers==0.0.23 --no-deps && bash patch_xformers.rocm.sh  # included in whl.zip

    pip install -r requirements.txt

    pip install .
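
To confirm the strict version alignment the tip above calls for, a small check such as the following can be run after installation (generic snippet; `flash_attn` and `xformers` only import once the wheels above are installed):

    # Verify installed versions against the list above (expected values in comments).
    import torch, torchvision, flash_attn, xformers

    print("torch      :", torch.__version__)        # expect 2.1.0 (dtk23.10.1 build)
    print("torchvision:", torchvision.__version__)  # expect 0.16.0
    print("flash_attn :", flash_attn.__version__)   # expect 2.0.x
    print("xformers   :", xformers.__version__)     # expect 0.0.23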

## Dataset

Full dataset download: https://drive.google.com/drive/folders/154S6raNg9NpDGQRlRhhAaYcAx5xq1Ok8

The following datasets can be used for quick validation:

https://opendatalab.com/OpenDataLab/ImageNet-1K/tree/main/raw (ImageNet)

https://www.crcv.ucf.edu/research/data-sets/ucf101/ (UCF101)

Link: https://pan.baidu.com/s/1nPEAC_52IuB5KF-5BAqGDA
Extraction code: kwai  (mini dataset)

Data layout:

    UCF-101/
    ├── ApplyEyeMakeup
    │   ├── v_ApplyEyeMakeup_g01_c01.avi
    │   ├── v_ApplyEyeMakeup_g01_c02.avi
    │   ├── v_ApplyEyeMakeup_g01_c03.avi
    │   ├── ...

Use the scripts below to process the data and generate the corresponding CSV files:

    # ImageNet
    python -m tools.datasets.convert_dataset imagenet IMAGENET_FOLDER --split train

    # UCF101
    python -m tools.datasets.convert_dataset ucf101 UCF101_FOLDER --split videos  # the split folder, e.g., ApplyEyeMakeup
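
After conversion, a quick way to inspect the generated CSV before training (the file path below is a placeholder; nothing is assumed here about the exact column layout the script emits):

    # Sanity-check the CSV produced by tools.datasets.convert_dataset.
    import pandas as pd

    df = pd.read_csv("path/to/generated.csv")  # placeholder path
    print(df.columns.tolist())                 # columns emitted by the script
    print(df.head())
    print(len(df), "samples")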

## Training

Coming soon!

<!-- ### Model Download

### Command Line

    # If the connection to huggingface fails, run:
    export HF_ENDPOINT=https://hf-mirror.com

    # 1 GPU, 16x256x256
    torchrun --nnodes=1 --nproc_per_node=1 scripts/train.py configs/opensora/train/16x256x256.py --data-path YOUR_CSV_PATH
    # 8 GPUs, 64x512x512
    torchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT


Also refer to the T5 download in the `Inference` section. -->

<!-- ### Command Line -->


## Inference

### Model Download

| Resolution | Data   | #iterations | Batch Size | GPU days (H800) | URL                                                                                           |
| ---------- | ------ | ----------- | ---------- | --------------- | --------------------------------------------------------------------------------------------- |
| 16×256×256 | 366K   | 80k         | 8×64       | 117             | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth    |
| 16×256×256 | 20K HQ | 24k         | 8×64       | 45              | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth |
| 16×512×512 | 20K HQ | 20k         | 2×64       | 35              | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth |


https://huggingface.co/DeepFloyd/t5-v1_1-xxl/tree/main  (T5)

    pretrained_models/
    └── t5_ckpts
        └── t5-v1_1-xxl
            ├── config.json
            ├── pytorch_model-00001-of-00002.bin
            ├── pytorch_model-00002-of-00002.bin
            ├── pytorch_model.bin.index.json
            ├── special_tokens_map.json
            ├── spiece.model
            └── tokenizer_config.json
    
    models/
    ├── OpenSora-v1-HQ-16x256x256.pth
    └── ...


Note: the mirror `https://hf-mirror.com` can be used to speed up downloading the model weights.
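
One way to fetch the T5 weights and an Open-Sora checkpoint into the layout above is via `huggingface_hub` (a sketch; the target directories are examples, and the `HF_ENDPOINT` environment variable should be set to the mirror before launching Python if needed):

    # Download the T5 text encoder and one Open-Sora checkpoint (illustrative paths).
    from huggingface_hub import snapshot_download, hf_hub_download

    snapshot_download(
        repo_id="DeepFloyd/t5-v1_1-xxl",
        local_dir="pretrained_models/t5_ckpts/t5-v1_1-xxl",
    )
    hf_hub_download(
        repo_id="hpcai-tech/Open-Sora",
        filename="OpenSora-v1-HQ-16x256x256.pth",
        local_dir="models",
    )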


### Command Line

    # Sample 16x256x256 (5 s/sample), GPU memory ~32 GB
    torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path ./path/to/your/ckpt.pth

    # Sample 16x512x512 (20 s/sample, 100 time steps), GPU memory > 32 GB
    torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x512x512.py --ckpt-path ./path/to/your/ckpt.pth


## Results

| Model | Prompt | Result |
|:---|:---|:---|
|16×256×256|`assets/texts/t2v_samples.txt:1`|![alt text](readme_imgs/r0.gif)|
|16×256×256|`assets/texts/t2v_samples.txt:2`|![alt text](readme_imgs/r1.gif)|

### Accuracy



## Application Scenarios

### Algorithm Category

`Video Generation`

### Key Application Industries

`Media, Research, Education`

## Source Repository and Issue Feedback

* https://developer.hpccube.com/codes/modelzoo/open-sora_pytorch

## References

* https://github.com/hpcaitech/Open-Sora