## Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
Official PyTorch Implementation
[arXiv](https://arxiv.org/abs/2407.15642) | [Project Page](https://maxin-cn.github.io/cinemo_project/) | [Hugging Face Space](https://huggingface.co/spaces/maxin-cn/Cinemo)
> [**Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models**](https://maxin-cn.github.io/cinemo_project/)
> [Xin Ma](https://maxin-cn.github.io/), [Yaohui Wang*†](https://wyhsirius.github.io/), [Gengyun Jia](https://scholar.google.com/citations?user=_04pkGgAAAAJ&hl=zh-CN), [Xinyuan Chen](https://scholar.google.com/citations?user=3fWSC8YAAAAJ), [Yuan-Fang Li](https://users.monash.edu/~yli/), [Cunjian Chen*](https://cunjian.github.io/), [Yu Qiao](https://scholar.google.com.hk/citations?user=gFtI-8QAAAAJ&hl=zh-CN)
> (*Corresponding authors, †Project Lead)
This repo contains the pre-trained weights and sampling code of Cinemo. Please visit our [project page](https://maxin-cn.github.io/cinemo_project/) for more results.
## News
- (🔥 New) Jul. 29, 2024. 💥 A [Hugging Face Space](https://huggingface.co/spaces/maxin-cn/Cinemo) has been added; you can also launch the [Gradio interface](#gradio-interface) locally.
- (🔥 New) Jul. 23, 2024. 💥 Our paper is released on [arXiv](https://arxiv.org/abs/2407.15642).
- (🔥 New) Jun. 2, 2024. 💥 The inference code is released. The checkpoint can be found [here](https://huggingface.co/maxin-cn/Cinemo/tree/main).
## Setup
Download and set up the repo:
```bash
git clone https://github.com/maxin-cn/Cinemo
cd Cinemo
conda env create -f environment.yml
conda activate cinemo
```
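Optionally, you can verify that the environment is set up correctly and that PyTorch can see your GPU (this assumes a CUDA-enabled build):

```bash
# Optional sanity check: print the PyTorch version and GPU availability
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```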
## Animation
You can sample from our **pre-trained Cinemo models** with [`animation.py`](pipelines/animation.py). Weights for the pre-trained Cinemo model can be found [here](https://huggingface.co/maxin-cn/Cinemo/tree/main). The script exposes various arguments for adjusting the number of sampling steps, changing the classifier-free guidance scale, and so on:
```bash
bash pipelines/animation.sh
```
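The shell script presumably wraps the Python entry point, so you can also pass arguments directly. The config path below is an assumption for illustration; check `pipelines/animation.sh` for the exact invocation:

```bash
# Hypothetical direct call — the config path is an assumption;
# see pipelines/animation.sh for the actual arguments
python pipelines/animation.py --config configs/animation.yaml
```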
Related model weights will be downloaded automatically, and the following results can be obtained:
*(Input images and output videos for the prompts "People Walking", "Sea Swell", "Girl Dancing under the Stars", "Dragon Glowing Eyes", "Bubbles Floating upwards", and "Snowman Waving his Hand"; see the [project page](https://maxin-cn.github.io/cinemo_project/) for the animated results.)*
## Gradio interface
We also provide a local Gradio interface; just run:
```bash
python app.py
```
You can specify the `--share` and `--server_name` arguments to meet your requirements.
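For example, to create a public share link and listen on all interfaces (the flag values below are typical Gradio settings; the sketch assumes `app.py` forwards them unchanged):

```bash
# Expose the demo beyond localhost; --share creates a temporary public URL
python app.py --share --server_name 0.0.0.0
```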
## Other Applications
You can also utilize Cinemo for other applications, such as motion transfer and video editing:
```bash
bash pipelines/video_editing.sh
```
Related checkpoints will be downloaded automatically, and the following results will be obtained:
*(Video editing example showing the input video, first frame, edited first frame, and output video; see the [project page](https://maxin-cn.github.io/cinemo_project/) for the results.)*
Cinemo can likewise be used for motion transfer.
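A dedicated wrapper script is expected to follow the same pattern as the video-editing one; the script name below is an assumption, so check the `pipelines/` directory for the actual file:

```bash
# Assumed script name, by analogy with pipelines/animation.sh and
# pipelines/video_editing.sh — check the pipelines/ directory
bash pipelines/motion_transfer.sh
```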
*(Motion transfer example showing the input video, first frame, edited first frame, and output video; see the [project page](https://maxin-cn.github.io/cinemo_project/) for the results.)*
## Contact Us
- Xin Ma: xin.ma1@monash.edu
- Yaohui Wang: wangyaohui@pjlab.org.cn
## Citation
If you find this work useful for your research, please consider citing it.
```bibtex
@article{ma2024cinemo,
  title={Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models},
  author={Ma, Xin and Wang, Yaohui and Jia, Gengyun and Chen, Xinyuan and Li, Yuan-Fang and Chen, Cunjian and Qiao, Yu},
  journal={arXiv preprint arXiv:2407.15642},
  year={2024}
}
```
## Acknowledgments
Cinemo has been greatly inspired by the following amazing works and teams: [LaVie](https://github.com/Vchitect/LaVie) and [SEINE](https://github.com/Vchitect/SEINE). We thank all the contributors for open-sourcing their work.
## License
The code and model weights are licensed under [LICENSE](LICENSE).