## DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
     
[![Open in OpenXLab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/JinboXING/DynamiCrafter)

_**[Jinbo Xing](https://doubiiu.github.io/), [Menghan Xia*](https://menghanxia.github.io), [Yong Zhang](https://yzhang2016.github.io), [Haoxin Chen](), [Wangbo Yu](), [Hanyuan Liu](https://github.com/hyliu), [Xintao Wang](https://xinntao.github.io/), [Tien-Tsin Wong*](https://www.cse.cuhk.edu.hk/~ttwong/myself.html), [Ying Shan](https://scholar.google.com/citations?hl=en&user=4oXBp9UAAAAJ&view_op=list_works&sortby=pubdate)**_

(* corresponding authors) From CUHK and Tencent AI Lab.
## 🔆 Introduction

### 🔥🔥 New Update Rolls Out for DynamiCrafter! Better Dynamics, Higher Resolution, and Stronger Coherence!
🤗 DynamiCrafter can animate open-domain still images based on a text prompt by leveraging pre-trained video diffusion priors. Please check our project page and paper for more information.
😀 We will continue to improve the model's performance.

👀 Seeking comparisons with [Stable Video Diffusion](https://stability.ai/news/stable-video-diffusion-open-ai-video-model) and [PikaLabs](https://pika.art/)? Click the image below.

[![](https://img.youtube.com/vi/0NfmIsNAg-g/0.jpg)](https://www.youtube.com/watch?v=0NfmIsNAg-g)

### 1.1. Showcases (576x1024)
### 1.2. Showcases (320x512)
### 1.3. Showcases (256x256)
"bear playing guitar happily, snowing" "boy walking on the street"
### 2. Applications

#### 2.1 Storytelling video generation (see project page for more details)
#### 2.2 Looping video generation
#### 2.3 Generative frame interpolation
Input starting frame · Input ending frame · Generated video
## ๐Ÿ“ Changelog - __[2024.02.05]__: ๐Ÿ”ฅ๐Ÿ”ฅ Release high-resolution models (320x512 & 576x1024). - __[2023.12.02]__: Launch the local Gradio demo. - __[2023.11.29]__: Release the main model at a resolution of 256x256. - __[2023.11.27]__: Launch the project page and update the arXiv preprint.
## 🧰 Models

|Model|Resolution|GPU Mem. & Inference Time (A100, 50 DDIM steps)|Checkpoint|
|:---------|:---------|:--------|:--------|
|DynamiCrafter1024|576x1024|18.3GB & 75s (`perframe_ae=True`)|[Hugging Face](https://huggingface.co/Doubiiu/DynamiCrafter_1024/blob/main/model.ckpt)|
|DynamiCrafter512|320x512|12.8GB & 20s (`perframe_ae=True`)|[Hugging Face](https://huggingface.co/Doubiiu/DynamiCrafter_512/blob/main/model.ckpt)|
|DynamiCrafter256|256x256|11.9GB & 10s (`perframe_ae=False`)|[Hugging Face](https://huggingface.co/Doubiiu/DynamiCrafter/blob/main/model.ckpt)|

Currently, DynamiCrafter supports generating videos of up to 16 frames at a resolution of 576x1024. Inference time can be reduced by using fewer DDIM steps.

GPU memory consumption on an RTX 4090, as reported by @noguchis on [Twitter](https://x.com/noguchis/status/1754488826016432341?s=20): 18.3GB (576x1024), 12.8GB (320x512), 11.9GB (256x256).

## ⚙️ Setup

### Install Environment via Anaconda (Recommended)

```bash
conda create -n dynamicrafter python=3.8.5
conda activate dynamicrafter
pip install -r requirements.txt
```

## 💫 Inference

### 1. Command line

1) Download the pretrained models via Hugging Face, and put the `model.ckpt` of the required resolution at `checkpoints/dynamicrafter_[1024|512|256]_v1/model.ckpt` (a command-line download sketch is provided at the end of this README).
2) Run the commands below in terminal, based on your device and needs.

```bash
# Run on a single GPU:
# Select the model based on the required resolution, i.e., 1024|512|256:
sh scripts/run.sh 1024

# Run on multiple GPUs for parallel inference:
sh scripts/run_mp.sh 1024
```

### 2. Local Gradio demo

1. Download the pretrained models and put them in the corresponding directories according to the guidelines above.
2. Input the following command in terminal (choose a model based on the required resolution: 1024, 512, or 256).

```bash
python gradio_app.py --res 1024
```

Community extensions: [ComfyUI](https://github.com/chaojie/ComfyUI-DynamiCrafter) (thanks to [chaojie](https://github.com/chaojie)).

## 👨‍👩‍👧‍👦 Crafter Family

[VideoCrafter1](https://github.com/AILab-CVC/VideoCrafter): Framework for high-quality video generation.

[ScaleCrafter](https://github.com/YingqingHe/ScaleCrafter): Tuning-free method for high-resolution image/video generation.

[TaleCrafter](https://github.com/AILab-CVC/TaleCrafter): An interactive story visualization tool that supports multiple characters.

[LongerCrafter](https://github.com/arthur-qiu/LongerCrafter): Tuning-free method for longer high-quality video generation.

[MakeYourVideo, might be a Crafter:)](https://doubiiu.github.io/projects/Make-Your-Video/): Video generation/editing with textual and structural guidance.

[StyleCrafter](https://gongyeliu.github.io/StyleCrafter.github.io/): Stylized-image-guided text-to-image and text-to-video generation.

## 😉 Citation

```bib
@article{xing2023dynamicrafter,
  title={DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors},
  author={Xing, Jinbo and Xia, Menghan and Zhang, Yong and Chen, Haoxin and Yu, Wangbo and Liu, Hanyuan and Wang, Xintao and Wong, Tien-Tsin and Shan, Ying},
  journal={arXiv preprint arXiv:2310.12190},
  year={2023}
}
```

## 🙏 Acknowledgements

We would like to thank [AK(@_akhaliq)](https://twitter.com/_akhaliq?lang=en) for helping set up the Hugging Face online demo, and [camenduru](https://twitter.com/camenduru) for providing the Replicate and Colab online demos.
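As referenced in step 1 of the command-line inference section, the checkpoints are hosted on Hugging Face. Below is a minimal download sketch, assuming Hugging Face's usual `resolve/main` direct-download URL scheme for the three repositories linked in the Models table; the target paths follow the `checkpoints/dynamicrafter_[1024|512|256]_v1/` layout that the run scripts expect.

```bash
# Sketch: fetch each checkpoint into the layout expected by scripts/run.sh.
# Repo ids and target paths are taken from the Models table above; the
# resolve/main URLs are an assumption based on Hugging Face's usual scheme.
mkdir -p checkpoints/dynamicrafter_1024_v1 checkpoints/dynamicrafter_512_v1 checkpoints/dynamicrafter_256_v1

wget -O checkpoints/dynamicrafter_1024_v1/model.ckpt \
  https://huggingface.co/Doubiiu/DynamiCrafter_1024/resolve/main/model.ckpt
wget -O checkpoints/dynamicrafter_512_v1/model.ckpt \
  https://huggingface.co/Doubiiu/DynamiCrafter_512/resolve/main/model.ckpt
wget -O checkpoints/dynamicrafter_256_v1/model.ckpt \
  https://huggingface.co/Doubiiu/DynamiCrafter/resolve/main/model.ckpt
```

In practice you only need the checkpoint matching the resolution you intend to run; the files are large, so downloading all three is rarely necessary.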
## 📢 Disclaimer

We developed this repository for RESEARCH purposes, so it may only be used for personal, research, or other non-commercial purposes.