# Open-Sora

## Paper

**video-generation-models-as-world-simulators**

* https://openai.com/research/video-generation-models-as-world-simulators

## Model Architecture

The model is a Transformer-based video generator. It consists of a `Video Encoder-Decoder` that compresses videos/images into a latent space and reconstructs them from it, a `Transformer-based Latent Stable Diffusion` module that runs the diffusion/denoising process, and a `Conditioning` module that injects the generation conditions for the training videos (here, text descriptions).

![alt text](readme_imgs/image-1.png)

## Algorithm Principle

The algorithm learns the distribution of videos by performing forward/reverse diffusion in latent space with a `Transformer` model.

![alt text](readme_imgs/image-2.png)

## Environment Setup

### Docker (Option 1)

```shell
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.3.0-ubuntu20.04-dtk24.04.2-py3.10

docker run --shm-size 10g --network=host --name=opensora --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <image_id> bash

pip install -r requirements.txt
pip install .
pip install bitsandbytes-0.42.0+das.opt1.dtk24042-py3-none-any.whl
pip install diffusers-0.29.0+das.opt1.dtk24042-py3-none-any.whl
```

Wheel downloads: [bitsandbytes (developer community)](https://download.sourcefind.cn:65024/directlink/4/bitsandbytes/DAS1.2/bitsandbytes-0.42.0+das.opt1.dtk24042-py3-none-any.whl), [diffusers (developer community)](https://download.sourcefind.cn:65024/directlink/4/diffusers/DAS1.2/diffusers-0.29.0+das.opt1.dtk24042-py3-none-any.whl)

### Dockerfile (Option 2)

```shell
# run from the directory containing the Dockerfile
docker build -t <image_name>:<tag> .

docker run --shm-size 10g --network=host --name=opensora --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <image_name>:<tag> bash

pip install -r requirements.txt
pip install .
```
```shell
pip install bitsandbytes-0.42.0+das.opt1.dtk24042-py3-none-any.whl
pip install diffusers-0.29.0+das.opt1.dtk24042-py3-none-any.whl
```

Wheel downloads: [bitsandbytes (developer community)](https://download.sourcefind.cn:65024/directlink/4/bitsandbytes/DAS1.2/bitsandbytes-0.42.0+das.opt1.dtk24042-py3-none-any.whl), [diffusers (developer community)](https://download.sourcefind.cn:65024/directlink/4/diffusers/DAS1.2/diffusers-0.29.0+das.opt1.dtk24042-py3-none-any.whl)

### Anaconda (Option 3)

1. The DCU-specific deep learning libraries required by this project can be downloaded from the developer community: https://developer.hpccube.com/tool/

   - DTK driver: dtk24.04.2
   - python: 3.10
   - torch: 2.3.0
   - torchvision: 0.18.1
   - triton: 2.1.0
   - apex: 1.3.0
   - bitsandbytes: 0.42.0
   - diffusers: 0.29.0

   Tip: the DTK driver, python, torch, and the other DCU-related tool versions above must match one another exactly.

2. Install the remaining ordinary dependencies according to requirements.txt:

   ```shell
   pip install -r requirements.txt
   pip install .
   ```

## Dataset

Full dataset download: https://drive.google.com/drive/folders/154S6raNg9NpDGQRlRhhAaYcAx5xq1Ok8

SCNet high-speed mirror: [hd-vg-130m](http://113.200.138.88:18080/aidatasets/project-dependency/hd-vg-130m/-/tree/main?ref_type=heads)

The following datasets can be used for quick validation:

* ImageNet: https://opendatalab.com/OpenDataLab/ImageNet-1K/tree/main/raw
* UCF101: https://www.crcv.ucf.edu/research/data-sets/ucf101/
* mini dataset: https://pan.baidu.com/s/1nPEAC_52IuB5KF-5BAqGDA (extraction code: kwai)

SCNet high-speed mirrors:

- [imagenet-1k](http://113.200.138.88:18080/aidatasets/project-dependency/imagenet-1k)
- [UCF101](http://113.200.138.88:18080/aidatasets/project-dependency/ucf101/-/blob/master/UCF101.rar?ref_type=heads)
- [mini](http://113.200.138.88:18080/aidatasets/project-dependency/mini/-/tree/master/datasets?ref_type=heads)

Data layout:

```
UCF-101/
├── ApplyEyeMakeup
│   ├── v_ApplyEyeMakeup_g01_c01.avi
│   ├── v_ApplyEyeMakeup_g01_c02.avi
│   ├── v_ApplyEyeMakeup_g01_c03.avi
│   ├── ...
```

Use the conversion script to process the data and produce the corresponding CSV files:

```shell
# ImageNet
python -m tools.datasets.convert_dataset imagenet IMAGENET_FOLDER --split train

# UCF101 (the split is a subfolder name, e.g. ApplyEyeMakeup)
python -m tools.datasets.convert_dataset ucf101 UCF101_FOLDER --split videos
```

## Training

Coming soon!
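The UCF101 conversion step can be illustrated with a minimal sketch. This assumes the tool writes one `(video path, class name)` row per clip; `convert_ucf101` is a hypothetical helper for illustration, not the actual `tools.datasets.convert_dataset` implementation, whose output columns may differ:

```python
import csv
import os
import tempfile

def convert_ucf101(root, out_csv):
    # Walk the class subfolders (ApplyEyeMakeup, ...) and write one
    # CSV row per .avi clip: the clip path plus its class name.
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        for cls in sorted(os.listdir(root)):
            cls_dir = os.path.join(root, cls)
            if not os.path.isdir(cls_dir):
                continue
            for clip in sorted(os.listdir(cls_dir)):
                if clip.endswith(".avi"):
                    writer.writerow([os.path.join(cls_dir, clip), cls])

# Build a tiny fake UCF-101 tree and convert it.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "ApplyEyeMakeup"))
open(os.path.join(root, "ApplyEyeMakeup", "v_g01_c01.avi"), "w").close()
out = os.path.join(root, "ucf101.csv")
convert_ucf101(root, out)
with open(out) as f:
    rows = list(csv.reader(f))
print(len(rows))  # 1
```

The resulting CSV pairs each clip with a text label, which is the general shape the training/inference pipeline expects from the dataset preparation step.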
## Inference

### Model Download

| Resolution | Data | #iterations | Batch Size | GPU days (H800) | URL | SCNet mirror |
| ---------- | ------ | ----------- | ---------- | --------------- | --- | ------------ |
| 16×256×256 | 366K | 80k | 8×64 | 117 | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth | [OpenSora-v1-16x256x256.pth](http://113.200.138.88:18080/aimodels/Open-Sora/-/blob/main/OpenSora-v1-16x256x256.pth?ref_type=heads) |
| 16×256×256 | 20K HQ | 24k | 8×64 | 45 | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth | [OpenSora-v1-HQ-16x256x256.pth](http://113.200.138.88:18080/aimodels/Open-Sora/-/blob/main/OpenSora-v1-HQ-16x256x256.pth?ref_type=heads) |
| 16×512×512 | 20K HQ | 20k | 2×64 | 35 | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth | [OpenSora-v1-HQ-16x512x512.pth](http://113.200.138.88:18080/aimodels/Open-Sora/-/blob/main/OpenSora-v1-HQ-16x512x512.pth?ref_type=heads) |

T5 text encoder: https://huggingface.co/DeepFloyd/t5-v1_1-xxl/tree/main

SCNet high-speed mirror: [T5](http://113.200.138.88:18080/aimodels/t5-v1_1-xxl)

Expected directory layout:

```
pretrained_models/
└── t5_ckpts
    └── t5-v1_1-xxl
        ├── config.json
        ├── pytorch_model-00001-of-00002.bin
        ├── pytorch_model-00002-of-00002.bin
        ├── pytorch_model.bin.index.json
        ├── special_tokens_map.json
        ├── spiece.model
        └── tokenizer_config.json

models/
├── OpenSora-v1-HQ-16x256x256.pth
└── ...
```
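Misplaced weights are a common cause of inference failures, so it can help to verify the layout above before launching. The file names below are taken from the T5 tree; the checking helper itself is an assumption for illustration, not part of the Open-Sora tooling:

```python
import os

# Expected files under pretrained_models/t5_ckpts/t5-v1_1-xxl/
T5_FILES = [
    "config.json",
    "pytorch_model-00001-of-00002.bin",
    "pytorch_model-00002-of-00002.bin",
    "pytorch_model.bin.index.json",
    "special_tokens_map.json",
    "spiece.model",
    "tokenizer_config.json",
]

def missing_files(base="pretrained_models"):
    # Return the expected T5 files that are absent from the layout.
    t5_dir = os.path.join(base, "t5_ckpts", "t5-v1_1-xxl")
    return [f for f in T5_FILES if not os.path.isfile(os.path.join(t5_dir, f))]

# With no weights downloaded yet, all seven files are reported missing.
print(len(missing_files("pretrained_models_not_downloaded")))  # 7
```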
Note: `https://hf-mirror.com` can be used to speed up downloading the model weights.

### Command Line

```shell
# Sample 16x256x256 (5 s/sample), ~32 GB of GPU memory
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path ./path/to/your/ckpt.pth

# Sample 16x512x512 (20 s/sample, 100 time steps), > 32 GB of GPU memory
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x512x512.py --ckpt-path ./path/to/your/ckpt.pth
```

## Result

| Model | Prompt | Result |
|:---|:---|:---|
| 16×256×256 | `assets/texts/t2v_samples.txt:1` | ![alt text](readme_imgs/r0.gif) |
| 16×256×256 | `assets/texts/t2v_samples.txt:2` | ![alt text](readme_imgs/r1.gif) |

### Accuracy

N/A

## Application Scenarios

### Algorithm Category

`Video Generation`

### Key Application Industries

`Media, Research, Education`

## Source Repository & Issue Feedback

* https://developer.hpccube.com/codes/modelzoo/open-sora_pytorch

## References

* https://github.com/hpcaitech/Open-Sora
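As a closing aside: the two inference commands above differ only in their config file, so scripted sweeps over resolutions can assemble the argument list programmatically. `inference_cmd` is a hypothetical helper sketched here, not part of the repo; the config paths and flags are copied from the commands in the Command Line section:

```python
def inference_cmd(resolution, ckpt_path, nproc=1):
    # Map resolution -> config file used by scripts/inference.py.
    cfg = {
        "16x256x256": "configs/opensora/inference/16x256x256.py",
        "16x512x512": "configs/opensora/inference/16x512x512.py",
    }[resolution]
    return [
        "torchrun", "--standalone", "--nproc_per_node", str(nproc),
        "scripts/inference.py", cfg, "--ckpt-path", ckpt_path,
    ]

cmd = inference_cmd("16x256x256", "./models/OpenSora-v1-16x256x256.pth")
print(" ".join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd)` to launch a sampling job.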