
# Open-Sora

## Paper

**Video generation models as world simulators**

* https://openai.com/research/video-generation-models-as-world-simulators

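## Model Architecture

This is a `Transformer`-based video generation model. It consists of a `Video Encoder-Decoder` that compresses videos/images into a latent space and reconstructs them, a `Transformer-based Latent Stable Diffusion` module that performs diffusion and denoising, and a `Conditioning` module that produces the conditions for the training videos (here, text descriptions).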

![alt text](readme_imgs/image-1.png)


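## Algorithm

The algorithm learns the distribution of videos by running the diffusion and reverse-diffusion processes in latent space with a `Transformer` model.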

![alt text](readme_imgs/image-2.png)
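
The forward (noising) half of that process can be sketched in a few lines. This is a minimal illustration that assumes a DDPM-style linear beta schedule and a flat list of numbers standing in for an encoded video latent; the function names, the schedule, and its parameters are illustrative assumptions, not the scheduler this repository actually uses.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(latent, t, alpha_bars, rng):
    """Sample x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps for one latent."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    noisy = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(latent, noise)]
    return noisy, noise


rng = random.Random(0)
alpha_bars = make_alpha_bars(1000)
latent = [0.5, -1.0, 0.25, 2.0]  # hypothetical stand-in for an encoded video latent
noisy, noise = add_noise(latent, 999, alpha_bars, rng)
```

In a setup like this, the denoising network would typically be trained to predict `noise` from `noisy`, the timestep, and the text condition. Sampling then runs the process in reverse and decodes the resulting latent back to video.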


## Environment Setup

### Docker (Method 1)

    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.3.0-ubuntu20.04-dtk24.04.2-py3.10

    docker run --shm-size 10g --network=host --name=opensora --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install -r requirements.txt

    pip install .
    
    # Download the DCU wheels from the developer community first:
    #   https://download.sourcefind.cn:65024/directlink/4/bitsandbytes/DAS1.2/bitsandbytes-0.42.0+das.opt1.dtk24042-py3-none-any.whl
    #   https://download.sourcefind.cn:65024/directlink/4/diffusers/DAS1.2/diffusers-0.29.0+das.opt1.dtk24042-py3-none-any.whl
    pip install bitsandbytes-0.42.0+das.opt1.dtk24042-py3-none-any.whl
    pip install diffusers-0.29.0+das.opt1.dtk24042-py3-none-any.whl


### Dockerfile (Method 2)

    # Run this from the directory that contains the Dockerfile
    docker build -t <IMAGE_NAME>:<TAG> .

    docker run --shm-size 10g --network=host --name=opensora --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install -r requirements.txt

    pip install .
    
    # Download the DCU wheels from the developer community first:
    #   https://download.sourcefind.cn:65024/directlink/4/bitsandbytes/DAS1.2/bitsandbytes-0.42.0+das.opt1.dtk24042-py3-none-any.whl
    #   https://download.sourcefind.cn:65024/directlink/4/diffusers/DAS1.2/diffusers-0.29.0+das.opt1.dtk24042-py3-none-any.whl
    pip install bitsandbytes-0.42.0+das.opt1.dtk24042-py3-none-any.whl
    pip install diffusers-0.29.0+das.opt1.dtk24042-py3-none-any.whl


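### Anaconda (Method 3)

1. The DCU-specific deep learning libraries required by this project can be downloaded and installed from the 光合 developer community:
https://developer.sourcefind.cn/tool/

    DTK driver: dtk24.04.2
    python: python3.10
    torch: 2.3.0
    torchvision: 0.18.1
    triton: 2.1.0
    apex: 1.3.0
    bitsandbytes: 0.42.0
    diffusers: 0.29.0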

Tips: the versions of the DTK driver, Python, torch, and the other DCU-related tools listed above must match each other exactly.
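
Because a mismatched version is easy to miss, a small check script can help. The sketch below uses only the standard library and compares installed package versions against the list above. It assumes each package is installed under the distribution name shown in the list, and that DCU builds carry a local version suffix (such as `+das.opt1.dtk24042`) which can be ignored for the comparison. The helper names are illustrative.

```python
from importlib import metadata

# Versions copied from the list above.
EXPECTED = {
    "torch": "2.3.0",
    "torchvision": "0.18.1",
    "triton": "2.1.0",
    "apex": "1.3.0",
    "bitsandbytes": "0.42.0",
    "diffusers": "0.29.0",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Return {package: (expected, found)} for every package that does not match."""
    problems = {}
    for name, want in expected.items():
        found = lookup(name)
        # Assumption: compare only the public part, dropping any "+local" suffix.
        public = found.split("+", 1)[0] if found else None
        if public != want:
            problems[name] = (want, found)
    return problems


if __name__ == "__main__":
    for name, (want, found) in check_versions(EXPECTED).items():
        print(f"{name}: expected {want}, found {found or 'not installed'}")
```

An empty result means every listed package matches. The DTK driver and Python version are not covered by this sketch and still need to be checked separately.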

2. Install the remaining, non-DCU-specific libraries from requirements.txt:

    pip install -r requirements.txt

    pip install .

## Datasets

Full dataset (hd-vg-130m) download: https://drive.google.com/drive/folders/154S6raNg9NpDGQRlRhhAaYcAx5xq1Ok8

The following datasets can be used for quick validation:

https://opendatalab.com/OpenDataLab/ImageNet-1K/tree/main/raw (ImageNet)

https://www.crcv.ucf.edu/research/data-sets/ucf101/ (UCF101)

Mini dataset: https://pan.baidu.com/s/1nPEAC_52IuB5KF-5BAqGDA (extraction code: kwai)

Directory structure:

    UCF-101/
    ├── ApplyEyeMakeup
    │   ├── v_ApplyEyeMakeup_g01_c01.avi
    │   ├── v_ApplyEyeMakeup_g01_c02.avi
    │   ├── v_ApplyEyeMakeup_g01_c03.avi
    │   ├── ...
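
Before converting the dataset, it can be useful to confirm that the extracted folder really has this shape. The following is a minimal sketch using only the standard library; `count_videos` is an illustrative helper, not part of this repository, and it assumes videos are `.avi` files sitting directly inside each class folder, as in the tree above.

```python
from pathlib import Path


def count_videos(root, suffix=".avi"):
    """Map each class folder directly under `root` to its number of video files."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() == suffix
            )
    return counts
```

Calling `count_videos("UCF-101")` returns a dictionary keyed by class name; a class with a count of zero usually points to an incomplete download or an unexpected nesting level.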

Use the scripts below to process the data and generate the corresponding CSV files:

    # ImageNet
    python -m tools.datasets.convert_dataset imagenet IMAGENET_FOLDER --split train

    # UCF101
    python -m tools.datasets.convert_dataset ucf101 UCF101_FOLDER --split videos  # e.g. ApplyEyeMakeup

## Training

Coming soon!

<!-- ### Model Download

### Command Line
    
    # If the connection to Hugging Face fails, run:
    export HF_ENDPOINT=https://hf-mirror.com

    # 1 GPU, 16x256x256
    torchrun --nnodes=1 --nproc_per_node=1 scripts/train.py configs/opensora/train/16x256x256.py --data-path YOUR_CSV_PATH
    # 8 GPUs, 64x512x512
    torchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT


See also the T5 download instructions in the `Inference` section. -->

<!-- ### Command Line -->


## Inference

### Model Download

| Resolution | Data   | #iterations | Batch Size | GPU days (H800) | URL                                                                                           | SCNet fast download |
| ---------- | ------ | ----------- | ---------- | --------------- | --------------------------------------------------------------------------------------------- | ------------------- |
| 16×256×256 | 366K   | 80k         | 8×64       | 117             | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth    |[SCNet]|
| 16×256×256 | 20K HQ | 24k         | 8×64       | 45              | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth |[SCNet]|
| 16×512×512 | 20K HQ | 20k         | 2×64       | 35              | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth |[SCNet]|


[t5-v1_1-xxl](https://huggingface.co/DeepFloyd/t5-v1_1-xxl/tree/main) (T5)

    pretrained_models/
    └── t5_ckpts
        └── t5-v1_1-xxl
            ├── config.json
            ├── pytorch_model-00001-of-00002.bin
            ├── pytorch_model-00002-of-00002.bin
            ├── pytorch_model.bin.index.json
            ├── special_tokens_map.json
            ├── spiece.model
            └── tokenizer_config.json
    
    models/
    ├── OpenSora-v1-HQ-16x256x256.pth
    └── ...
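
A missing tokenizer or weight shard tends to surface only as a loading error at inference time, so it may be worth checking the layout up front. This minimal sketch lists which of the T5 files from the tree above are absent; `missing_t5_files` is an illustrative helper and assumes `pretrained_models/` is resolved relative to the current working directory.

```python
from pathlib import Path

# File names copied from the tree above.
T5_FILES = [
    "config.json",
    "pytorch_model-00001-of-00002.bin",
    "pytorch_model-00002-of-00002.bin",
    "pytorch_model.bin.index.json",
    "special_tokens_map.json",
    "spiece.model",
    "tokenizer_config.json",
]


def missing_t5_files(root="pretrained_models"):
    """Return the expected T5 files that are not present under `root`."""
    t5_dir = Path(root) / "t5_ckpts" / "t5-v1_1-xxl"
    return [name for name in T5_FILES if not (t5_dir / name).is_file()]
```

An empty list means every expected T5 file was found.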


Note: `https://hf-mirror.com` can be used to speed up downloading the model weights.
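
The checkpoint URLs in the table point at Hugging Face file pages (`/blob/`), not at the files themselves. The sketch below rewrites such a URL into a direct download link on the mirror. It assumes the mirror keeps the same path layout as huggingface.co and serves files under `/resolve/`; the function name is illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def to_mirror_download_url(url, mirror_host="hf-mirror.com"):
    """Turn a huggingface.co file-page URL into a direct download URL on the mirror."""
    parts = urlsplit(url)
    if parts.netloc != "huggingface.co":
        raise ValueError(f"not a huggingface.co URL: {url}")
    # File pages use /blob/<revision>/...; direct downloads use /resolve/<revision>/...
    path = parts.path.replace("/blob/", "/resolve/", 1)
    return urlunsplit((parts.scheme, mirror_host, path, parts.query, parts.fragment))


checkpoint_url = to_mirror_download_url(
    "https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth"
)
```

The resulting URL can be passed to any downloader; save the file under `models/` to match the layout shown above.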


### Command Line

    # Sample 16x256x256 (5s/sample), ~32 GB GPU memory
    torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path ./path/to/your/ckpt.pth

    # Sample 16x512x512 (20s/sample, 100 time steps), > 32 GB GPU memory
    torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x512x512.py --ckpt-path ./path/to/your/ckpt.pth


## Results

|Model|Prompt|Result|
|:---|:---|:---|
|16×256×256|`assets/texts/t2v_samples.txt:1`|![alt text](readme_imgs/r0.gif)|
|16×256×256|`assets/texts/t2v_samples.txt:2`|![alt text](readme_imgs/r1.gif)|

### Accuracy

N/A

## Application Scenarios

### Algorithm Category

`Video generation`

### Key Application Industries

`Media, scientific research, education`

## Source Repository and Issue Reporting

* https://developer.sourcefind.cn/codes/modelzoo/open-sora_pytorch

## References

* https://github.com/hpcaitech/Open-Sora