Commit 78bae405 authored by mashun1

open_sora_inference
name: Close inactive issues
on:
  schedule:
    - cron: "30 1 * * *"

jobs:
  close-issues:
    runs-on: ubuntu-latest
    permissions:
      issues: write
      pull-requests: write
    steps:
      - uses: actions/stale@v5
        with:
          days-before-issue-stale: 7
          days-before-issue-close: 7
          stale-issue-label: "stale"
          stale-issue-message: "This issue is stale because it has been open for 7 days with no activity."
          close-issue-message: "This issue was closed because it has been inactive for 7 days since being marked as stale."
          days-before-pr-stale: -1
          days-before-pr-close: -1
          repo-token: ${{ secrets.GITHUB_TOKEN }}
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/
.vscode/
# macos
*.DS_Store
# misc files
data/
dataset/
runs/
checkpoints/
outputs/
samples/
pretrained_models/
# Secret files
hostfile
models/*
pretrained_models/*
[settings]
line_length = 120
multi_line_output = 3
include_trailing_comma = true
ignore_comments = true
profile = black
honor_noqa = true
repos:
  - repo: https://github.com/PyCQA/autoflake
    rev: v2.2.1
    hooks:
      - id: autoflake
        name: autoflake (python)
        args: ['--in-place', '--remove-unused-variables', '--remove-all-unused-imports', '--ignore-init-module-imports']

  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
        name: sort all imports (python)

  - repo: https://github.com/psf/black-pre-commit-mirror
    rev: 23.9.1
    hooks:
      - id: black
        name: black formatter
        args: ['--line-length=120', '--target-version=py37', '--target-version=py38', '--target-version=py39', '--target-version=py310']

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
      - id: check-yaml
      - id: check-merge-conflict
      - id: check-case-conflict
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: mixed-line-ending
        args: ['--fix=lf']
# Contributing
The Open-Sora project welcomes any constructive contribution from the community, and the team is more than willing to work on problems you have encountered to make it a better project.
## Development Environment Setup
To contribute to Open-Sora, we would like to first guide you through setting up a proper development environment so that you can better implement your code. You can install this library from source with the `editable` flag (`-e`, for development mode) so that your changes to the source code are reflected at runtime without re-installation.
You can refer to the [Installation Section](./README.md#installation) and replace `pip install -v .` with `pip install -v -e .`.
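For example, a minimal development setup could look like this (the commands mirror the Installation section, with `-e` added):

```shell
# clone the repository and install it in editable (development) mode
git clone https://github.com/hpcaitech/Open-Sora
cd Open-Sora
pip install -v -e .
```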
### Code Style
Some static checks run when you commit your code changes; please make sure your changes pass all of them and that your coding style meets our requirements. We use pre-commit hooks to keep the code aligned with the writing standard. To set up code style checking, follow the steps below.
```shell
# these commands are executed under the Open-Sora directory
pip install pre-commit
pre-commit install
```
Code format checking will be automatically executed when you commit your changes.
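You can also trigger the hooks manually at any time, e.g. before opening a pull request:

```shell
# run every configured pre-commit hook against all files in the repository
pre-commit run --all-files
```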
## Contribution Guide
You need to follow the steps below to contribute to the main repository via a pull request. You can learn more about pull requests [here](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests).
### 1. Fork the Official Repository
Firstly, you need to visit the [Open-Sora repository](https://github.com/hpcaitech/Open-Sora) and fork it into your own account. The `fork` button is at the top right corner of the web page, alongside buttons such as `watch` and `star`.
Now, you can clone your own forked repository into your local environment.
```shell
git clone https://github.com/<YOUR-USERNAME>/Open-Sora.git
```
### 2. Configure Git
You need to set the official repository as your upstream so that you can synchronize with the latest updates in the official repository. You can learn about upstreams [here](https://www.atlassian.com/git/tutorials/git-forks-and-upstreams).
Then add the original repository as upstream:
```shell
cd Open-Sora
git remote add upstream https://github.com/hpcaitech/Open-Sora.git
```
You can use the following command to verify that the remotes are set. You should see both `origin` and `upstream` in the output.
```shell
git remote -v
```
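The output should look roughly like this (with your GitHub username in place of the placeholder):

```shell
origin    https://github.com/<YOUR-USERNAME>/Open-Sora.git (fetch)
origin    https://github.com/<YOUR-USERNAME>/Open-Sora.git (push)
upstream  https://github.com/hpcaitech/Open-Sora.git (fetch)
upstream  https://github.com/hpcaitech/Open-Sora.git (push)
```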
### 3. Synchronize with Official Repository
Before you make changes to the codebase, it is always good to fetch the latest updates from the official repository. To do so, you can use the commands below.
```shell
git fetch upstream
git checkout main
git merge upstream/main
git push origin main
```
### 4. Create a New Branch
You should not make changes to the `main` branch of your forked repository, as this might make upstream synchronization difficult. You can create a new branch with an appropriate name. Branch names should generally start with `hotfix/` or `feature/`: `hotfix/` is for bug fixes and `feature/` is for adding new features.
```shell
git checkout -b <NEW-BRANCH-NAME>
```
### 5. Implementation and Code Commit
Now you can implement your code changes in the source code. Remember that you installed the project in development mode, so you do not need to uninstall and reinstall it for the changes to take effect; they are reflected in every new Python execution.
You can commit and push the changes to your forked repository. The changes should be kept logical, modular, and atomic.
```shell
git add -A
git commit -m "<COMMIT-MESSAGE>"
git push -u origin <NEW-BRANCH-NAME>
```
### 6. Open a Pull Request
You can now create a pull request on the GitHub webpage of your repository. The source branch is `<NEW-BRANCH-NAME>` of your repository, and the target branch should be `main` of `hpcaitech/Open-Sora`. After creating this pull request, you should be able to see it [here](https://github.com/hpcaitech/Open-Sora/pulls).
The Open-Sora team will review your code change and merge your code if applicable.
# Open-Sora
## Paper
## Model Architecture
This is a `Transformer`-based video generation model. It consists of a `Video Encoder-Decoder` for compressing/reconstructing videos and images, a `Transformer-based Latent Stable Diffusion` module for the diffusion/denoising process, and a `Conditioning` module that produces conditions on the training videos (here, text descriptions).
![alt text](readme_imgs/image-1.png)
## Algorithm
The algorithm learns the distribution of videos by performing diffusion/denoising on videos in latent space with a `Transformer` model.
![alt text](readme_imgs/image-2.png)
## Environment Setup
### Docker (Option 1)
```shell
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk23.10.1-py38
docker run --shm-size 10g --network=host --name=opensora --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <project path (absolute)>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash
pip install flash_attn-2.0.4_torch2.1_dtk2310-cp38-cp38-linux_x86_64.whl  # found in whl.zip
pip install triton-2.1.0%2Bgit34f8189.abi0.dtk2310-cp38-cp38-manylinux2014_x86_64.whl  # download from the developer community
cd xformers && pip install xformers==0.0.23 --no-deps && bash patch_xformers.rocm.sh  # found in whl.zip
pip install -r requirements.txt
```
### Docker (Option 2)
```shell
# run inside the directory containing the Dockerfile
docker build -t <IMAGE_NAME>:<TAG> .
docker run --shm-size 10g --network=host --name=opensora --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <project path (absolute)>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash
pip install flash_attn-2.0.4_torch2.1_dtk2310-cp38-cp38-linux_x86_64.whl  # found in whl.zip
pip install triton-2.1.0%2Bgit34f8189.abi0.dtk2310-cp38-cp38-manylinux2014_x86_64.whl  # download from the developer community
cd xformers && pip install xformers==0.0.23 --no-deps && bash patch_xformers.rocm.sh  # found in whl.zip
pip install -r requirements.txt
```
### Anaconda (Option 3)
1. The special deep learning libraries this project requires for DCU GPUs can be downloaded and installed from the developer community: https://developer.hpccube.com/tool/
   - DTK driver: dtk23.10.1
   - python: python3.8
   - torch: 2.1.0
   - torchvision: 0.16.0
   - triton: 2.1.0
   - apex:

   Note: the versions of the DTK driver, python, torch, and the other DCU-related tools above must correspond exactly.
2. Other, non-special libraries can be installed following requirements.txt:
```shell
pip install flash_attn-2.0.4_torch2.1_dtk2310-cp38-cp38-linux_x86_64.whl  # found in whl.zip
cd xformers && pip install xformers==0.0.23 --no-deps && bash patch_xformers.rocm.sh  # found in whl.zip
pip install -r requirements.txt
```
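Whichever setup you use, a quick sanity check can confirm the environment before running anything heavy (a minimal sketch; it assumes the DCU build of torch exposes the usual CUDA-compatible device API):

```shell
# print the torch version and whether a DCU/GPU device is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```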
## Datasets
Full dataset download: https://drive.google.com/drive/folders/154S6raNg9NpDGQRlRhhAaYcAx5xq1Ok8
The following datasets can be used for quick validation:
https://opendatalab.com/OpenDataLab/ImageNet-1K/tree/main/raw (ImageNet)
https://www.crcv.ucf.edu/research/data-sets/ucf101/ (UCF101)
Link: https://pan.baidu.com/s/1nPEAC_52IuB5KF-5BAqGDA
Extraction code: kwai (mini dataset)
Data layout:
```
UCF-101/
├── ApplyEyeMakeup
│   ├── v_ApplyEyeMakeup_g01_c01.avi
│   ├── v_ApplyEyeMakeup_g01_c02.avi
│   ├── v_ApplyEyeMakeup_g01_c03.avi
│   ├── ...
```
Use the scripts below to process the data and generate the corresponding CSV files:
```shell
# ImageNet
python -m tools.datasets.convert_dataset imagenet IMAGENET_FOLDER --split train
# UCF101
python -m tools.datasets.convert_dataset ucf101 UCF101_FOLDER --split videos  # e.g., ApplyEyeMakeup
```
## Training
Coming soon!
<!-- ### Model Download
### Command Line
```shell
# If the connection to huggingface fails, run:
export HF_ENDPOINT=https://hf-mirror.com
# 1 GPU, 16x256x256
torchrun --nnodes=1 --nproc_per_node=1 scripts/train.py configs/opensora/train/16x256x256.py --data-path YOUR_CSV_PATH
# 8 GPUs, 64x512x512
torchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT
```
Also refer to the T5 download in the `Inference` section. -->
## Inference
### Model Download
| Resolution | Data | #iterations | Batch Size | GPU days (H800) | URL |
| ---------- | ------ | ----------- | ---------- | --------------- | --- |
| 16×256×256 | 366K | 80k | 8×64 | 117 | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth |
| 16×256×256 | 20K HQ | 24k | 8×64 | 45 | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth |
| 16×512×512 | 20K HQ | 20k | 2×64 | 35 | https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth |

https://huggingface.co/DeepFloyd/t5-v1_1-xxl/tree/main (T5)

The downloaded weights should be organized as follows:
```
pretrained_models/
└── t5_ckpts
    └── t5-v1_1-xxl
        ├── config.json
        ├── pytorch_model-00001-of-00002.bin
        ├── pytorch_model-00002-of-00002.bin
        ├── pytorch_model.bin.index.json
        ├── special_tokens_map.json
        ├── spiece.model
        └── tokenizer_config.json
models/
├── OpenSora-v1-HQ-16x256x256.pth
└── ...
```
Note: `https://hf-mirror.com` can be used to speed up downloading the model weights.
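One possible way to fetch the T5 weights from the command line (a sketch; it assumes the `huggingface_hub` package, which provides `huggingface-cli`, is installed):

```shell
# optionally route the download through the mirror mentioned above
export HF_ENDPOINT=https://hf-mirror.com
# place the T5 encoder into the directory layout expected above
huggingface-cli download DeepFloyd/t5-v1_1-xxl --local-dir pretrained_models/t5_ckpts/t5-v1_1-xxl
```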
### Command Line
```shell
# Sample 16x256x256 (5s/sample)
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path ./path/to/your/ckpt.pth
# Sample 16x512x512 (20s/sample, 100 time steps)
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x512x512.py --ckpt-path ./path/to/your/ckpt.pth
# Sample 64x512x512 (40s/sample, 100 time steps)
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/64x512x512.py --ckpt-path ./path/to/your/ckpt.pth
# Sample 64x512x512 with sequence parallelism (30s/sample, 100 time steps)
# sequence parallelism is enabled automatically when nproc_per_node is larger than 1
torchrun --standalone --nproc_per_node 2 scripts/inference.py configs/opensora/inference/64x512x512.py --ckpt-path ./path/to/your/ckpt.pth
```
## Results
| Model | Prompt | Result |
|:---|:---|:---|
|16×256×256|`assets/texts/t2v_samples.txt:1`|![alt text](readme_imgs/r0.gif)|
|16×256×256|`assets/texts/t2v_samples.txt:2`|![alt text](readme_imgs/r1.gif)|
### Accuracy
## Application Scenarios
### Algorithm Category
`Video Generation`
### Key Application Industries
`Media, Research, Education`
## Source Repository & Issue Feedback
* https://developer.hpccube.com/codes/modelzoo/open-sora_pytorch
## References
* https://github.com/hpcaitech/Open-Sora
<p align="center">
<img src="./assets/readme/icon.png" width="250"/>
</p>
<div align="center">
<a href="https://github.com/hpcaitech/Open-Sora/stargazers"><img src="https://img.shields.io/github/stars/hpcaitech/Open-Sora?style=social"></a>
<a href="https://hpcaitech.github.io/Open-Sora/"><img src="https://img.shields.io/badge/Gallery-View-orange?logo=&amp"></a>
<a href="https://discord.gg/shpbperhGs"><img src="https://img.shields.io/badge/Discord-join-blueviolet?logo=discord&amp"></a>
<a href="https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-247ipg9fk-KRRYmUl~u2ll2637WRURVA"><img src="https://img.shields.io/badge/Slack-ColossalAI-blueviolet?logo=slack&amp"></a>
<a href="https://twitter.com/yangyou1991/status/1769411544083996787?s=61&t=jT0Dsx2d-MS5vS9rNM5e5g"><img src="https://img.shields.io/badge/Twitter-Discuss-blue?logo=twitter&amp"></a>
<a href="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png"><img src="https://img.shields.io/badge/微信-小助手加群-green?logo=wechat&amp"></a>
</div>
## Open-Sora: Democratizing Efficient Video Production for All
We present **Open-Sora**, an initiative dedicated to **efficiently** producing high-quality videos and making the models,
tools and content accessible to all. By embracing **open-source** principles,
Open-Sora not only democratizes access to advanced video generation techniques, but also offers a
streamlined and user-friendly platform that simplifies the complexities of video production.
With Open-Sora, we aim to inspire innovation, creativity, and inclusivity in the realm of content creation. [[中文]](/docs/README_zh.md)
## 📰 News
* **[2024.03.18]** 🔥 We release **Open-Sora 1.0**, a fully open-source project for video generation.
Open-Sora 1.0 supports a full pipeline of video data preprocessing, training with
<a href="https://github.com/hpcaitech/ColossalAI"><img src="assets/readme/colossal_ai.png" width="8%" ></a> acceleration,
inference, and more. Our provided [checkpoints](#model-weights) can produce 2~5s 512x512 videos with only 3 days of training.
* **[2024.03.04]** Open-Sora provides training with a 46% cost reduction.
## 🎥 Latest Demo
| **2s 512×512** | **2s 512×512** | **2s 512×512** |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| [<img src="assets/readme/sample_0.gif" width="">](https://github.com/hpcaitech/Open-Sora/assets/99191637/de1963d3-b43b-4e68-a670-bb821ebb6f80) | [<img src="assets/readme/sample_1.gif" width="">](https://github.com/hpcaitech/Open-Sora/assets/99191637/13f8338f-3d42-4b71-8142-d234fbd746cc) | [<img src="assets/readme/sample_2.gif" width="">](https://github.com/hpcaitech/Open-Sora/assets/99191637/fa6a65a6-e32a-4d64-9a9e-eabb0ebb8c16) |
| A serene night scene in a forested area. [...] The video is a time-lapse, capturing the transition from day to night, with the lake and forest serving as a constant backdrop. | A soaring drone footage captures the majestic beauty of a coastal cliff, [...] The water gently laps at the rock base and the greenery that clings to the top of the cliff. | The majestic beauty of a waterfall cascading down a cliff into a serene lake. [...] The camera angle provides a bird's eye view of the waterfall. |
| [<img src="assets/readme/sample_3.gif" width="">](https://github.com/hpcaitech/Open-Sora/assets/99191637/64232f84-1b36-4750-a6c0-3e610fa9aa94) | [<img src="assets/readme/sample_4.gif" width="">](https://github.com/hpcaitech/Open-Sora/assets/99191637/983a1965-a374-41a7-a76b-c07941a6c1e9) | [<img src="assets/readme/sample_5.gif" width="">](https://github.com/hpcaitech/Open-Sora/assets/99191637/ec10c879-9767-4c31-865f-2e8d6cf11e65) |
| A bustling city street at night, filled with the glow of car headlights and the ambient light of streetlights. [...] | The vibrant beauty of a sunflower field. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. [...] | A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell [...] |
Videos are downsampled to `.gif` for display. Click for original videos. Prompts are trimmed for display; see [here](/assets/texts/t2v_samples.txt) for full prompts. See more samples at our [gallery](https://hpcaitech.github.io/Open-Sora/).
## 🔆 New Features/Updates
* 📍 Open-Sora-v1 released. Model weights are available [here](#model-weights). With only 400K video clips and 200 H800 days (compared with 152M samples in Stable Video Diffusion), we are able to generate 2s 512×512 videos.
* ✅ Three-stage training from an image diffusion model to a video diffusion model. We provide the weights for each stage.
* ✅ Support for training acceleration, including an accelerated transformer, faster T5 and VAE, and sequence parallelism. Open-Sora improves training speed by **55%** when training on 64x512x512 videos. Details are in [acceleration.md](docs/acceleration.md).
* ✅ We provide video cutting and captioning tools for data preprocessing. Instructions can be found [here](tools/data/README.md) and our data collection plan can be found at [datasets.md](docs/datasets.md).
* ✅ We find the VQ-VAE from [VideoGPT](https://wilson1yan.github.io/videogpt/index.html) produces low-quality results and thus adopt a better VAE from [Stability-AI](https://huggingface.co/stabilityai/sd-vae-ft-mse-original). We also find that patching in the time dimension deteriorates quality. See our **[report](docs/report_v1.md)** for more discussions.
* ✅ We investigate different architectures including DiT, Latte, and our proposed STDiT. Our **STDiT** achieves a better trade-off between quality and speed. See our **[report](docs/report_v1.md)** for more discussions.
* ✅ Support CLIP and T5 text conditioning.
* ✅ By viewing images as one-frame videos, our project supports training DiT on both images and videos (e.g., ImageNet & UCF101). See [command.md](docs/command.md) for more instructions.
* ✅ Support inference with official weights from [DiT](https://github.com/facebookresearch/DiT), [Latte](https://github.com/Vchitect/Latte), and [PixArt](https://pixart-alpha.github.io/).
<details>
<summary>View more</summary>
* ✅ Refactor the codebase. See [structure.md](docs/structure.md) to learn the project structure and how to use the config files.
</details>
### TODO list sorted by priority
* [ ] Complete the data processing pipeline (including dense optical flow, aesthetics scores, text-image similarity, deduplication, etc.). See [datasets.md](/docs/datasets.md) for more information. **[WIP]**
* [ ] Training Video-VAE. **[WIP]**
<details>
<summary>View more</summary>
* [ ] Support image and video conditioning.
* [ ] Evaluation pipeline.
* [ ] Incorporate a better scheduler, e.g., rectified flow in SD3.
* [ ] Support variable aspect ratios, resolutions, durations.
* [ ] Support SD3 when released.
</details>
## Contents
* [Installation](#installation)
* [Model Weights](#model-weights)
* [Inference](#inference)
* [Data Processing](#data-processing)
* [Training](#training)
* [Contribution](#contribution)
* [Acknowledgement](#acknowledgement)
* [Citation](#citation)
## Installation
```bash
# create a virtual env
conda create -n opensora python=3.10
# install torch
# the command below is for CUDA 12.1, choose install commands from
# https://pytorch.org/get-started/locally/ based on your own CUDA version
pip3 install torch torchvision
# install flash attention (optional)
pip install packaging ninja
pip install flash-attn --no-build-isolation
# install apex (optional)
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/NVIDIA/apex.git
# install xformers
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu121
# install this project
git clone https://github.com/hpcaitech/Open-Sora
cd Open-Sora
pip install -v .
```
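As a quick smoke test of the installation (assuming the project is importable as `opensora`, the package directory of this repository):

```bash
python -c "import opensora; print(opensora.__file__)"
```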
After installation, we suggest reading [structure.md](docs/structure.md) to learn the project structure and how to use the config files.
## Model Weights
| Resolution | Data | #iterations | Batch Size | GPU days (H800) | URL |
| ---------- | ------ | ----------- | ---------- | --------------- | --------------------------------------------------------------------------------------------- |
| 16×256×256 | 366K | 80k | 8×64 | 117 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth) |
| 16×256×256 | 20K HQ | 24k | 8×64 | 45 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth) |
| 16×512×512 | 20K HQ | 20k | 2×64 | 35 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth) |
| 64×512×512 | 50K HQ | | | | TBD |
Our model's weights are partially initialized from [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha). The number of parameters is 724M. More information about training can be found in our **[report](/docs/report_v1.md)**. More about the dataset can be found in [dataset.md](/docs/dataset.md). HQ means high quality.
:warning: **LIMITATION**: Our model is trained on a limited budget. The quality and text alignment are relatively poor. The model performs especially poorly at generating humans and cannot follow detailed instructions. We are working on improving the quality and text alignment.
## Inference
To run inference with our provided weights, first download [T5](https://huggingface.co/DeepFloyd/t5-v1_1-xxl/tree/main) weights into `pretrained_models/t5_ckpts/t5-v1_1-xxl`. Then download the model weights. Run the following commands to generate samples. See [here](docs/structure.md#inference-config-demos) to customize the configuration.
```bash
# Sample 16x256x256 (5s/sample)
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path ./path/to/your/ckpt.pth
# Sample 16x512x512 (20s/sample, 100 time steps)
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x512x512.py --ckpt-path ./path/to/your/ckpt.pth
# Sample 64x512x512 (40s/sample, 100 time steps)
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/64x512x512.py --ckpt-path ./path/to/your/ckpt.pth
# Sample 64x512x512 with sequence parallelism (30s/sample, 100 time steps)
# sequence parallelism is enabled automatically when nproc_per_node is larger than 1
torchrun --standalone --nproc_per_node 2 scripts/inference.py configs/opensora/inference/64x512x512.py --ckpt-path ./path/to/your/ckpt.pth
```
The speed is tested on H800 GPUs. For inference with other models, see [here](docs/commands.md) for more instructions.
## Data Processing
High-quality data is the key to high-quality models. The datasets we use and our data collection plan are described [here](/docs/datasets.md). We provide tools to process video data. Currently, our data processing pipeline includes the following steps:
1. Download datasets. [[docs](/tools/datasets/README.md)]
2. Split videos into clips. [[docs](/tools/scenedetect/README.md)]
3. Generate video captions. [[docs](/tools/caption/README.md)]
## Training
To launch training, first download [T5](https://huggingface.co/DeepFloyd/t5-v1_1-xxl/tree/main) weights into `pretrained_models/t5_ckpts/t5-v1_1-xxl`. Then run the following commands to launch training on a single node.
```bash
# 1 GPU, 16x256x256
torchrun --nnodes=1 --nproc_per_node=1 scripts/train.py configs/opensora/train/16x256x256.py --data-path YOUR_CSV_PATH
# 8 GPUs, 64x512x512
torchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT
```
To launch training on multiple nodes, prepare a hostfile according to [ColossalAI](https://colossalai.org/docs/basics/launch_colossalai/#launch-with-colossal-ai-cli), and run the following commands.
```bash
colossalai run --nproc_per_node 8 --hostfile hostfile scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT
```
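The hostfile itself is a plain text file with one reachable hostname per line; a minimal sketch (node names are placeholders for your own machines):

```bash
# create a two-node hostfile for colossalai run
cat > hostfile <<EOF
node001
node002
EOF
```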
For training other models and advanced usage, see [here](docs/commands.md) for more instructions.
## Contribution
If you wish to contribute to this project, you can refer to the [Contribution Guideline](./CONTRIBUTING.md).
## Acknowledgement
* [DiT](https://github.com/facebookresearch/DiT): Scalable Diffusion Models with Transformers.
* [OpenDiT](https://github.com/NUS-HPC-AI-Lab/OpenDiT): An acceleration framework for DiT training. We adopted valuable acceleration strategies from OpenDiT for our training process.
* [PixArt](https://github.com/PixArt-alpha/PixArt-alpha): An open-source DiT-based text-to-image model.
* [Latte](https://github.com/Vchitect/Latte): An attempt to efficiently train DiT for video.
* [StabilityAI VAE](https://huggingface.co/stabilityai/sd-vae-ft-mse-original): A powerful image VAE model.
* [CLIP](https://github.com/openai/CLIP): A powerful text-image embedding model.
* [T5](https://github.com/google-research/text-to-text-transfer-transformer): A powerful text encoder.
* [LLaVA](https://github.com/haotian-liu/LLaVA): A powerful image captioning model based on [Yi-34B](https://huggingface.co/01-ai/Yi-34B).
We are grateful for their exceptional work and generous contribution to open source.
## Citation
```bibtex
@software{opensora,
author = {Zangwei Zheng and Xiangyu Peng and Yang You},
title = {Open-Sora: Democratizing Efficient Video Production for All},
month = {March},
year = {2024},
url = {https://github.com/hpcaitech/Open-Sora}
}
```
[Zangwei Zheng](https://github.com/zhengzangw) and [Xiangyu Peng](https://github.com/xyupeng) equally contributed to this work during their internship at [HPC-AI Tech](https://hpc-ai.com/).
## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=hpcaitech/Open-Sora&type=Date)](https://star-history.com/#hpcaitech/Open-Sora&Date)
207
360
387
974
88
979
417
279
golden retriever
otter
lesser panda
geyser
macaw
valley
balloon
golden panda