"examples/trials/vscode:/vscode.git/clone" did not exist on "f892ed67aa3eaa1ee1ff4c12a94b2241c8a3788e"
Commit b96ae489 authored by mashun1's avatar mashun1
Browse files

magic-animate

parents
Pipeline #674 canceled with stages
__pycache__
.vscode
samples
xformers
src
third_party
backup
pretrained_models
*.nfs*
./*.png
./*.mp4
demo/tmp
demo/outputs
BSD 3-Clause License
Copyright 2023 MagicAnimate Team All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# magic-animate
## Paper
**MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model**
* https://arxiv.org/pdf/2311.16498.pdf
## Model Structure
As shown in the figure, the model takes three inputs: a `reference image` (the appearance reference), a `DensePose sequence` (the target motion), and `noisy latents` (randomly initialized noise with the same number of frames as the DensePose sequence). The `Appearance Encoder` extracts features from the `reference image`, the `ControlNet` extracts the motion features, and `Temporal Attention` layers are inserted into the `2D-UNet` to turn it into a `3D-UNet`.
![Alt text](readme_images/image-1.png)
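To make the 2D→3D conversion concrete, below is a minimal, self-contained sketch (not the repository's actual implementation) of a temporal self-attention block: spatial positions are folded into the batch dimension so attention runs only across the frame axis, which is the standard trick for inserting temporal layers into a 2D-UNet.

```python
import torch
import torch.nn as nn
from einops import rearrange

class TemporalSelfAttention(nn.Module):
    """Illustrative temporal attention layer: attends across frames only."""
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width) -- a 3D-UNet feature map
        b, c, f, h, w = x.shape
        # fold every spatial position into the batch so attention sees only the frame axis
        tokens = rearrange(x, "b c f h w -> (b h w) f c")
        normed = self.norm(tokens)
        attended, _ = self.attn(normed, normed, normed)
        tokens = tokens + attended  # residual connection keeps the 2D-UNet behaviour as a baseline
        return rearrange(tokens, "(b h w) f c -> b c f h w", b=b, h=h, w=w)

# shape check: a 16-frame feature map keeps its shape after temporal attention
x = torch.randn(1, 320, 16, 32, 32)
print(TemporalSelfAttention(320)(x).shape)  # torch.Size([1, 320, 16, 32, 32])
```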
## Algorithm
Purpose: the algorithm animates the person in an image so that they follow a given motion sequence.
Principles:
1. Temporal consistency: temporal attention layers are inserted into the original 2D-UNet, turning it into a 3D-UNet and extending the image diffusion model to the video domain.
2. Reference feature extraction: an appearance encoder improves identity and background preservation, enhancing single-frame fidelity and temporal coherence.
3. Joint image-video training: compared with image datasets, video datasets are much smaller and less varied in identity, background, and pose, which limits how well the animation framework can learn the reference conditioning; joint training mitigates this problem.
4. Video fusion: overlapping segments of the generated video are averaged (see the sketch after the figure below).
<img src="readme_images/image-5.png" alt="alt_text" width="300" height="200">
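As a rough illustration of step 4 (not the repository's code), the sketch below averages per-frame predictions from temporally overlapping segments; the segment length and stride used here are hypothetical.

```python
from typing import List
import numpy as np

def fuse_segments(segments: List[np.ndarray], stride: int) -> np.ndarray:
    """Average overlapping video segments. segments[i] has shape (L, H, W, C)
    and is assumed to start at frame i * stride."""
    seg_len = segments[0].shape[0]
    total = stride * (len(segments) - 1) + seg_len
    acc = np.zeros((total,) + segments[0].shape[1:], dtype=np.float64)
    count = np.zeros((total, 1, 1, 1), dtype=np.float64)
    for i, seg in enumerate(segments):
        start = i * stride
        acc[start:start + seg_len] += seg      # sum predictions frame by frame
        count[start:start + seg_len] += 1.0    # how many segments cover each frame
    return acc / count                         # per-frame average over the overlaps

# two 16-frame segments overlapping by 8 frames fuse into 24 frames
segs = [np.random.rand(16, 4, 4, 3), np.random.rand(16, 4, 4, 3)]
print(fuse_segments(segs, stride=8).shape)  # (24, 4, 4, 3)
```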
## Environment Setup
### Docker (Option 1)
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04.1-py39-latest
docker run --shm-size 10g --network=host --name=magic_animate --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to this project>:/home/ -it <your IMAGE ID> bash
pip install -r requirements.txt
### Docker (Option 2)
# run from the directory containing the Dockerfile
docker build -t <IMAGE_NAME>:<TAG> .
# replace <your IMAGE ID> with the ID of the image built above
docker run -it --shm-size 10g --network=host --name=magic_animate --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined <your IMAGE ID> bash
### Anaconda (Option 3)
1. The DCU-specific deep learning libraries required by this project can be downloaded from the HPC developer community:
https://developer.hpccube.com/tool/
DTK driver: dtk23.04.1
python: python3.9
torch: 1.13.1
torchvision: 0.14.1
torchaudio: 0.13.1
deepspeed: 0.9.2
apex: 0.1
Tips: the DTK driver, python, torch, and other DCU-related tool versions above must match each other exactly.
2. Install the remaining non-DCU-specific dependencies from requirements.txt:
pip install -r requirements.txt
## Dataset
## Inference
Model download links:
MagicAnimate - https://huggingface.co/zcxu-eric/MagicAnimate/tree/main
sd-vae-ft-mse - https://huggingface.co/stabilityai/sd-vae-ft-mse
Note: if these links are unreachable, the mirror https://hf-mirror.com/ can be used instead.
pretrained_models/
├── MagicAnimate
│ ├── appearance_encoder
│ │ ├── config.json
│ │ └── diffusion_pytorch_model.safetensors
│ ├── densepose_controlnet
│ │ ├── config.json
│ │ └── diffusion_pytorch_model.safetensors
│ └── temporal_attention
│ └── temporal_attention.ckpt
├── sd-vae-ft-mse
│ ├── config.json
│ └── diffusion_pytorch_model.bin
└── stable-diffusion-v1-5
├── text_encoder
│ ├── config.json
│ └── pytorch_model.bin
├── tokenizer
│ ├── merges.txt
│ ├── special_tokens_map.json
│ ├── tokenizer_config.json
│ └── vocab.json
├── unet
│ ├── config.json
│ └── diffusion_pytorch_model.bin
├── v1-5-pruned.ckpt
└── vae
├── config.json
└── diffusion_pytorch_model.bin
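After downloading, a quick check such as the one below (not part of the repository) can confirm that the files match the layout above:

```python
from pathlib import Path

# key files from the directory tree above; extend the list as needed
required = [
    "MagicAnimate/appearance_encoder/diffusion_pytorch_model.safetensors",
    "MagicAnimate/densepose_controlnet/diffusion_pytorch_model.safetensors",
    "MagicAnimate/temporal_attention/temporal_attention.ckpt",
    "sd-vae-ft-mse/diffusion_pytorch_model.bin",
    "stable-diffusion-v1-5/unet/diffusion_pytorch_model.bin",
]
root = Path("pretrained_models")
missing = [p for p in required if not (root / p).is_file()]
print("all checkpoints found" if not missing else f"missing files: {missing}")
```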
### Command Line
Run on a single card:
bash scripts/animate.sh
Run on multiple cards:
bash scripts/animate_dist.sh
### WebUI
Run on a single card:
python3 -m demo.gradio_animate
Run on multiple cards:
python3 -m demo.gradio_animate_dist
## Results
![alt](readme_images/m.gif)
### Accuracy
## Application Scenarios
### Algorithm Category
`AIGC`
### Target Industries
`Media, Research, Education`
## Source Repository & Issue Feedback
* https://developer.hpccube.com/codes/modelzoo/magic-animate_pytorch
## References
* https://github.com/magic-research/magic-animate
<!-- # magic-edit.github.io -->
<p align="center">
<h2 align="center">MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model</h2>
<p align="center">
<a href="https://scholar.google.com/citations?user=-4iADzMAAAAJ&hl=en"><strong>Zhongcong Xu</strong></a>
·
<a href="http://jeff95.me/"><strong>Jianfeng Zhang</strong></a>
·
<a href="https://scholar.google.com.sg/citations?user=8gm-CYYAAAAJ&hl=en"><strong>Jun Hao Liew</strong></a>
·
<a href="https://hanshuyan.github.io/"><strong>Hanshu Yan</strong></a>
·
<a href="https://scholar.google.com/citations?user=stQQf7wAAAAJ&hl=en"><strong>Jia-Wei Liu</strong></a>
·
<a href="https://zhangchenxu528.github.io/"><strong>Chenxu Zhang</strong></a>
·
<a href="https://sites.google.com/site/jshfeng/home"><strong>Jiashi Feng</strong></a>
·
<a href="https://sites.google.com/view/showlab"><strong>Mike Zheng Shou</strong></a>
<br>
<br>
<a href="https://arxiv.org/abs/2311.16498"><img src='https://img.shields.io/badge/arXiv-MagicAnimate-red' alt='Paper PDF'></a>
<a href='https://showlab.github.io/magicanimate'><img src='https://img.shields.io/badge/Project_Page-MagicAnimate-green' alt='Project Page'></a>
<a href='https://huggingface.co/spaces/zcxu-eric/magicanimate'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
<br>
<b>National University of Singapore &nbsp; | &nbsp; ByteDance</b>
</p>
<table align="center">
<tr>
<td>
<img src="assets/teaser/t4.gif">
</td>
<td>
<img src="assets/teaser/t2.gif">
</td>
</tr>
</table>
## 📢 News
* **[2023.12.4]** Release inference code and gradio demo. We are working to improve MagicAnimate, stay tuned!
* **[2023.11.23]** Release MagicAnimate paper and project page.
## 🏃‍♂️ Getting Started
Please download the pretrained base models for [StableDiffusion V1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) and [MSE-finetuned VAE](https://huggingface.co/stabilityai/sd-vae-ft-mse).
Download our MagicAnimate [checkpoints](https://huggingface.co/zcxu-eric/MagicAnimate).
**Place them as follows:**
```bash
magic-animate
|----pretrained_models
|----MagicAnimate
|----appearance_encoder
|----diffusion_pytorch_model.safetensors
|----config.json
|----densepose_controlnet
|----diffusion_pytorch_model.safetensors
|----config.json
|----temporal_attention
|----temporal_attention.ckpt
|----sd-vae-ft-mse
|----...
|----stable-diffusion-v1-5
|----...
|----...
```
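One way to fetch the checkpoints programmatically is sketched below; it assumes `huggingface_hub` is installed and that the Hugging Face repositories mirror the folder layout above.

```python
from huggingface_hub import snapshot_download

# download each repo into the folder layout expected by configs/prompts/animation.yaml
snapshot_download("zcxu-eric/MagicAnimate", local_dir="pretrained_models/MagicAnimate")
snapshot_download("stabilityai/sd-vae-ft-mse", local_dir="pretrained_models/sd-vae-ft-mse")
snapshot_download("runwayml/stable-diffusion-v1-5", local_dir="pretrained_models/stable-diffusion-v1-5")
```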
## ⚒️ Installation
prerequisites: `python>=3.8`, `CUDA>=11.3`, and `ffmpeg`.
Install with `conda`:
```bash
conda env create -f environment.yaml
conda activate manimate
```
or `pip`:
```bash
pip3 install -r requirements.txt
```
## 💃 Inference
Run inference on single GPU:
```bash
bash scripts/animate.sh
```
Run inference with multiple GPUs:
```bash
bash scripts/animate_dist.sh
```
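The scripts ultimately drive the `MagicAnimate` wrapper in `demo/animate.py`; mirroring what the gradio demo does, a single sample can also be generated directly from Python. The example image and driving video below come from `inputs/applications/`:

```python
import numpy as np
from PIL import Image
from demo.animate import MagicAnimate  # same entry point the gradio demo uses

animator = MagicAnimate()  # loads configs/prompts/animation.yaml by default
source = np.array(Image.open("inputs/applications/source_image/monalisa.png").convert("RGB"))
out_path = animator(
    source,                                               # reference image as a numpy array
    "inputs/applications/driving/densepose/running.mp4",  # DensePose driving video
    random_seed=1,
    step=25,
    guidance_scale=7.5,
)
print("animation saved to", out_path)
```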
## 🎨 Gradio Demo
#### Online Gradio Demo:
Try our [online gradio demo](https://huggingface.co/spaces/zcxu-eric/magicanimate) for a quick start.
#### Local Gradio Demo:
Launch local gradio demo on single GPU:
```bash
python3 -m demo.gradio_animate
```
Launch local gradio demo if you have multiple GPUs:
```bash
python3 -m demo.gradio_animate_dist
```
Then open the gradio demo in your local browser.
## 🙏 Acknowledgements
We would like to thank [AK(@_akhaliq)](https://twitter.com/_akhaliq?lang=en) and the Hugging Face team for their help in setting up the online gradio demo.
## 🎓 Citation
If you find this codebase useful for your research, please use the following entry.
```BibTeX
@inproceedings{xu2023magicanimate,
author = {Xu, Zhongcong and Zhang, Jianfeng and Liew, Jun Hao and Yan, Hanshu and Liu, Jia-Wei and Zhang, Chenxu and Feng, Jiashi and Shou, Mike Zheng},
title = {MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model},
booktitle = {arXiv},
year = {2023}
}
```
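# configs/inference/inference.yaml -- the file referenced by animation.yaml's
# `inference_config`: motion-module (temporal attention) and noise-scheduler settings.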
unet_additional_kwargs:
  unet_use_cross_frame_attention: false
  unet_use_temporal_attention: false
  use_motion_module: true
  motion_module_resolutions:
    - 1
    - 2
    - 4
    - 8
  motion_module_mid_block: false
  motion_module_decoder_only: false
  motion_module_type: Vanilla
  motion_module_kwargs:
    num_attention_heads: 8
    num_transformer_block: 1
    attention_block_types:
      - Temporal_Self
      - Temporal_Self
    temporal_position_encoding: true
    temporal_position_encoding_max_len: 24
    temporal_attention_dim_div: 1

noise_scheduler_kwargs:
  beta_start: 0.00085
  beta_end: 0.012
  beta_schedule: "linear"
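
# configs/prompts/animation.yaml -- default inference config (checkpoint paths,
# example inputs, sampling settings) loaded by demo/animate.py and demo/animate_dist.py.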
pretrained_model_path: "pretrained_models/stable-diffusion-v1-5"
pretrained_vae_path: "pretrained_models/sd-vae-ft-mse"
pretrained_controlnet_path: "pretrained_models/MagicAnimate/densepose_controlnet"
pretrained_appearance_encoder_path: "pretrained_models/MagicAnimate/appearance_encoder"
pretrained_unet_path: ""
motion_module: "pretrained_models/MagicAnimate/temporal_attention/temporal_attention.ckpt"
savename: null
fusion_blocks: "midup"
seed: [1]
steps: 25
guidance_scale: 7.5
source_image:
- "inputs/applications/source_image/monalisa.png"
- "inputs/applications/source_image/demo4.png"
- "inputs/applications/source_image/dalle2.jpeg"
- "inputs/applications/source_image/dalle8.jpeg"
- "inputs/applications/source_image/multi1_source.png"
video_path:
- "inputs/applications/driving/densepose/running.mp4"
- "inputs/applications/driving/densepose/demo4.mp4"
- "inputs/applications/driving/densepose/running2.mp4"
- "inputs/applications/driving/densepose/dancing2.mp4"
- "inputs/applications/driving/densepose/multi_dancing.mp4"
inference_config: "configs/inference/inference.yaml"
size: 256
L: 16
S: 1
I: 0
clip: 0
offset: 0
max_length: null
video_type: "condition"
invert_video: false
save_individual_videos: false
# Copyright 2023 ByteDance and/or its affiliates.
#
# Copyright (2023) MagicAnimate Authors
#
# ByteDance, its affiliates and licensors retain all intellectual
# property and proprietary rights in and to this material, related
# documentation and any modifications thereto. Any use, reproduction,
# disclosure or distribution of this material and related documentation
# without an express license agreement from ByteDance or
# its affiliates is strictly prohibited.
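# demo/animate.py -- single-process MagicAnimate pipeline wrapper (imported by demo.gradio_animate).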
import argparse
import datetime
import inspect
import os
import numpy as np
from PIL import Image
from omegaconf import OmegaConf
from collections import OrderedDict
import torch
from diffusers import AutoencoderKL, DDIMScheduler, UniPCMultistepScheduler
from tqdm import tqdm
from transformers import CLIPTextModel, CLIPTokenizer
from magicanimate.models.unet_controlnet import UNet3DConditionModel
from magicanimate.models.controlnet import ControlNetModel
from magicanimate.models.appearance_encoder import AppearanceEncoderModel
from magicanimate.models.mutual_self_attention import ReferenceAttentionControl
from magicanimate.pipelines.pipeline_animation import AnimationPipeline
from magicanimate.utils.util import save_videos_grid
from accelerate.utils import set_seed
from magicanimate.utils.videoreader import VideoReader
from einops import rearrange, repeat
import csv, pdb, glob
from safetensors import safe_open
import math
from pathlib import Path
class MagicAnimate():
def __init__(self, config="configs/prompts/animation.yaml") -> None:
print("Initializing MagicAnimate Pipeline...")
*_, func_args = inspect.getargvalues(inspect.currentframe())
func_args = dict(func_args)
config = OmegaConf.load(config)
inference_config = OmegaConf.load(config.inference_config)
motion_module = config.motion_module
### >>> create animation pipeline >>> ###
tokenizer = CLIPTokenizer.from_pretrained(config.pretrained_model_path, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(config.pretrained_model_path, subfolder="text_encoder")
if config.pretrained_unet_path:
unet = UNet3DConditionModel.from_pretrained_2d(config.pretrained_unet_path, unet_additional_kwargs=OmegaConf.to_container(inference_config.unet_additional_kwargs))
else:
unet = UNet3DConditionModel.from_pretrained_2d(config.pretrained_model_path, subfolder="unet", unet_additional_kwargs=OmegaConf.to_container(inference_config.unet_additional_kwargs))
self.appearance_encoder = AppearanceEncoderModel.from_pretrained(config.pretrained_appearance_encoder_path, subfolder="appearance_encoder").cuda()
self.reference_control_writer = ReferenceAttentionControl(self.appearance_encoder, do_classifier_free_guidance=True, mode='write', fusion_blocks=config.fusion_blocks)
self.reference_control_reader = ReferenceAttentionControl(unet, do_classifier_free_guidance=True, mode='read', fusion_blocks=config.fusion_blocks)
if config.pretrained_vae_path is not None:
vae = AutoencoderKL.from_pretrained(config.pretrained_vae_path)
else:
vae = AutoencoderKL.from_pretrained(config.pretrained_model_path, subfolder="vae")
### Load controlnet
controlnet = ControlNetModel.from_pretrained(config.pretrained_controlnet_path)
vae.to(torch.float16)
unet.to(torch.float16)
text_encoder.to(torch.float16)
controlnet.to(torch.float16)
self.appearance_encoder.to(torch.float16)
unet.enable_xformers_memory_efficient_attention()
self.appearance_encoder.enable_xformers_memory_efficient_attention()
controlnet.enable_xformers_memory_efficient_attention()
self.pipeline = AnimationPipeline(
vae=vae, text_encoder=text_encoder, tokenizer=tokenizer, unet=unet, controlnet=controlnet,
scheduler=DDIMScheduler(**OmegaConf.to_container(inference_config.noise_scheduler_kwargs)),
# NOTE: UniPCMultistepScheduler
).to("cuda")
# 1. unet ckpt
# 1.1 motion module
motion_module_state_dict = torch.load(motion_module, map_location="cpu")
if "global_step" in motion_module_state_dict: func_args.update({"global_step": motion_module_state_dict["global_step"]})
motion_module_state_dict = motion_module_state_dict['state_dict'] if 'state_dict' in motion_module_state_dict else motion_module_state_dict
try:
# extra steps for self-trained models
state_dict = OrderedDict()
for key in motion_module_state_dict.keys():
if key.startswith("module."):
_key = key.split("module.")[-1]
state_dict[_key] = motion_module_state_dict[key]
else:
state_dict[key] = motion_module_state_dict[key]
motion_module_state_dict = state_dict
del state_dict
missing, unexpected = self.pipeline.unet.load_state_dict(motion_module_state_dict, strict=False)
assert len(unexpected) == 0
except:
_tmp_ = OrderedDict()
for key in motion_module_state_dict.keys():
if "motion_modules" in key:
if key.startswith("unet."):
_key = key.split('unet.')[-1]
_tmp_[_key] = motion_module_state_dict[key]
else:
_tmp_[key] = motion_module_state_dict[key]
missing, unexpected = unet.load_state_dict(_tmp_, strict=False)
assert len(unexpected) == 0
del _tmp_
del motion_module_state_dict
self.pipeline.to("cuda")
self.L = config.L
print("Initialization Done!")
def __call__(self, source_image, motion_sequence, random_seed, step, guidance_scale, size=512):
prompt = n_prompt = ""
random_seed = int(random_seed)
step = int(step)
guidance_scale = float(guidance_scale)
samples_per_video = []
# manually set random seed for reproduction
if random_seed != -1:
torch.manual_seed(random_seed)
set_seed(random_seed)
else:
torch.seed()
if motion_sequence.endswith('.mp4'):
control = VideoReader(motion_sequence).read()
if control[0].shape[0] != size:
control = [np.array(Image.fromarray(c).resize((size, size))) for c in control]
control = np.array(control)
if source_image.shape[0] != size:
source_image = np.array(Image.fromarray(source_image).resize((size, size)))
H, W, C = source_image.shape
init_latents = None
original_length = control.shape[0]
if control.shape[0] % self.L > 0:
control = np.pad(control, ((0, self.L-control.shape[0] % self.L), (0, 0), (0, 0), (0, 0)), mode='edge')
generator = torch.Generator(device=torch.device("cuda:0"))
generator.manual_seed(torch.initial_seed())
sample = self.pipeline(
prompt,
negative_prompt = n_prompt,
num_inference_steps = step,
guidance_scale = guidance_scale,
width = W,
height = H,
video_length = len(control),
controlnet_condition = control,
init_latents = init_latents,
generator = generator,
appearance_encoder = self.appearance_encoder,
reference_control_writer = self.reference_control_writer,
reference_control_reader = self.reference_control_reader,
source_image = source_image,
).videos
source_images = np.array([source_image] * original_length)
source_images = rearrange(torch.from_numpy(source_images), "t h w c -> 1 c t h w") / 255.0
samples_per_video.append(source_images)
control = control / 255.0
control = rearrange(control, "t h w c -> 1 c t h w")
control = torch.from_numpy(control)
samples_per_video.append(control[:, :, :original_length])
samples_per_video.append(sample[:, :, :original_length])
samples_per_video = torch.cat(samples_per_video)
time_str = datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
savedir = f"demo/outputs"
animation_path = f"{savedir}/{time_str}.mp4"
os.makedirs(savedir, exist_ok=True)
save_videos_grid(samples_per_video, animation_path)
return animation_path
# Copyright 2023 ByteDance and/or its affiliates.
#
# Copyright (2023) MagicAnimate Authors
#
# ByteDance, its affiliates and licensors retain all intellectual
# property and proprietary rights in and to this material, related
# documentation and any modifications thereto. Any use, reproduction,
# disclosure or distribution of this material and related documentation
# without an express license agreement from ByteDance or
# its affiliates is strictly prohibited.
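# demo/animate_dist.py -- distributed (multi-GPU) inference entry point,
# invoked by demo.gradio_animate_dist as "python -m demo.animate_dist".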
import argparse
import datetime
import inspect
import os
import numpy as np
from PIL import Image
from omegaconf import OmegaConf
from collections import OrderedDict
import torch
import random
from diffusers import AutoencoderKL, DDIMScheduler, UniPCMultistepScheduler
from transformers import CLIPTextModel, CLIPTokenizer
from magicanimate.models.unet_controlnet import UNet3DConditionModel
from magicanimate.models.controlnet import ControlNetModel
from magicanimate.models.appearance_encoder import AppearanceEncoderModel
from magicanimate.models.mutual_self_attention import ReferenceAttentionControl
from magicanimate.pipelines.pipeline_animation import AnimationPipeline
from magicanimate.utils.util import save_videos_grid
from magicanimate.utils.dist_tools import distributed_init
from accelerate.utils import set_seed
from magicanimate.utils.videoreader import VideoReader
from einops import rearrange
animator = None
class MagicAnimate():
def __init__(self, args) -> None:
config=args.config
device = torch.device(f"cuda:{args.rank}")
print("Initializing MagicAnimate Pipeline...")
*_, func_args = inspect.getargvalues(inspect.currentframe())
func_args = dict(func_args)
config = OmegaConf.load(config)
inference_config = OmegaConf.load(config.inference_config)
motion_module = config.motion_module
### >>> create animation pipeline >>> ###
tokenizer = CLIPTokenizer.from_pretrained(config.pretrained_model_path, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(config.pretrained_model_path, subfolder="text_encoder")
if config.pretrained_unet_path:
unet = UNet3DConditionModel.from_pretrained_2d(config.pretrained_unet_path, unet_additional_kwargs=OmegaConf.to_container(inference_config.unet_additional_kwargs))
else:
unet = UNet3DConditionModel.from_pretrained_2d(config.pretrained_model_path, subfolder="unet", unet_additional_kwargs=OmegaConf.to_container(inference_config.unet_additional_kwargs))
self.appearance_encoder = AppearanceEncoderModel.from_pretrained(config.pretrained_appearance_encoder_path, subfolder="appearance_encoder").to(device)
self.reference_control_writer = ReferenceAttentionControl(self.appearance_encoder, do_classifier_free_guidance=True, mode='write', fusion_blocks=config.fusion_blocks)
self.reference_control_reader = ReferenceAttentionControl(unet, do_classifier_free_guidance=True, mode='read', fusion_blocks=config.fusion_blocks)
if config.pretrained_vae_path is not None:
vae = AutoencoderKL.from_pretrained(config.pretrained_vae_path)
else:
vae = AutoencoderKL.from_pretrained(config.pretrained_model_path, subfolder="vae")
### Load controlnet
controlnet = ControlNetModel.from_pretrained(config.pretrained_controlnet_path)
vae.to(torch.float16)
unet.to(torch.float16)
text_encoder.to(torch.float16)
controlnet.to(torch.float16)
self.appearance_encoder.to(torch.float16)
unet.enable_xformers_memory_efficient_attention()
self.appearance_encoder.enable_xformers_memory_efficient_attention()
controlnet.enable_xformers_memory_efficient_attention()
self.pipeline = AnimationPipeline(
vae=vae, text_encoder=text_encoder, tokenizer=tokenizer, unet=unet, controlnet=controlnet,
scheduler=DDIMScheduler(**OmegaConf.to_container(inference_config.noise_scheduler_kwargs)),
# NOTE: UniPCMultistepScheduler
)
# 1. unet ckpt
# 1.1 motion module
motion_module_state_dict = torch.load(motion_module, map_location="cpu")
if "global_step" in motion_module_state_dict: func_args.update({"global_step": motion_module_state_dict["global_step"]})
motion_module_state_dict = motion_module_state_dict['state_dict'] if 'state_dict' in motion_module_state_dict else motion_module_state_dict
try:
# extra steps for self-trained models
state_dict = OrderedDict()
for key in motion_module_state_dict.keys():
if key.startswith("module."):
_key = key.split("module.")[-1]
state_dict[_key] = motion_module_state_dict[key]
else:
state_dict[key] = motion_module_state_dict[key]
motion_module_state_dict = state_dict
del state_dict
missing, unexpected = self.pipeline.unet.load_state_dict(motion_module_state_dict, strict=False)
assert len(unexpected) == 0
except:
_tmp_ = OrderedDict()
for key in motion_module_state_dict.keys():
if "motion_modules" in key:
if key.startswith("unet."):
_key = key.split('unet.')[-1]
_tmp_[_key] = motion_module_state_dict[key]
else:
_tmp_[key] = motion_module_state_dict[key]
missing, unexpected = unet.load_state_dict(_tmp_, strict=False)
assert len(unexpected) == 0
del _tmp_
del motion_module_state_dict
self.pipeline.to(device)
self.L = config.L
print("Initialization Done!")
dist_kwargs = {"rank":args.rank, "world_size":args.world_size, "dist":args.dist}
self.predict(args.reference_image, args.motion_sequence, args.random_seed, args.step, args.guidance_scale, args.save_path, dist_kwargs)
def predict(self, source_image, motion_sequence, random_seed, step, guidance_scale, save_path, dist_kwargs, size=512):
prompt = n_prompt = ""
samples_per_video = []
# manually set random seed for reproduction
if random_seed != -1:
torch.manual_seed(random_seed)
set_seed(random_seed)
else:
torch.seed()
if motion_sequence.endswith('.mp4'):
control = VideoReader(motion_sequence).read()
if control[0].shape[0] != size:
control = [np.array(Image.fromarray(c).resize((size, size))) for c in control]
control = np.array(control)
if not isinstance(source_image, np.ndarray):
source_image = np.array(Image.open(source_image))
if source_image.shape[0] != size:
source_image = np.array(Image.fromarray(source_image).resize((size, size)))
H, W, C = source_image.shape
init_latents = None
original_length = control.shape[0]
if control.shape[0] % self.L > 0:
control = np.pad(control, ((0, self.L-control.shape[0] % self.L), (0, 0), (0, 0), (0, 0)), mode='edge')
generator = torch.Generator(device=torch.device("cuda:0"))
generator.manual_seed(torch.initial_seed())
sample = self.pipeline(
prompt,
negative_prompt = n_prompt,
num_inference_steps = step,
guidance_scale = guidance_scale,
width = W,
height = H,
video_length = len(control),
controlnet_condition = control,
init_latents = init_latents,
generator = generator,
appearance_encoder = self.appearance_encoder,
reference_control_writer = self.reference_control_writer,
reference_control_reader = self.reference_control_reader,
source_image = source_image,
**dist_kwargs,
).videos
if dist_kwargs.get('rank', 0) == 0:
source_images = np.array([source_image] * original_length)
source_images = rearrange(torch.from_numpy(source_images), "t h w c -> 1 c t h w") / 255.0
samples_per_video.append(source_images)
control = control / 255.0
control = rearrange(control, "t h w c -> 1 c t h w")
control = torch.from_numpy(control)
samples_per_video.append(control[:, :, :original_length])
samples_per_video.append(sample[:, :, :original_length])
samples_per_video = torch.cat(samples_per_video)
save_videos_grid(samples_per_video, save_path)
def distributed_main(device_id, args):
args.rank = device_id
args.device_id = device_id
if torch.cuda.is_available():
torch.cuda.set_device(args.device_id)
torch.cuda.init()
distributed_init(args)
MagicAnimate(args)
def run(args):
if args.dist:
args.world_size = max(1, torch.cuda.device_count())
assert args.world_size <= torch.cuda.device_count()
if args.world_size > 0 and torch.cuda.device_count() > 1:
port = random.randint(10000, 20000)
args.init_method = f"tcp://localhost:{port}"
torch.multiprocessing.spawn(
fn=distributed_main,
args=(args,),
nprocs=args.world_size,
)
else:
MagicAnimate(args)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument("--config", type=str, default="configs/prompts/animation.yaml", required=False)
parser.add_argument("--dist", type=bool, default=True, required=False)
parser.add_argument("--rank", type=int, default=0, required=False)
parser.add_argument("--world_size", type=int, default=1, required=False)
parser.add_argument("--reference_image", type=str, default=None, required=True)
parser.add_argument("--motion_sequence", type=str, default=None, required=True)
parser.add_argument("--random_seed", type=int, default=1, required=False)
parser.add_argument("--step", type=int, default=25, required=False)
parser.add_argument("--guidance_scale", type=float, default=7.5, required=False)
parser.add_argument("--save_path", type=str, default=None, required=True)
args = parser.parse_args()
run(args)
# Copyright 2023 ByteDance and/or its affiliates.
#
# Copyright (2023) MagicAnimate Authors
#
# ByteDance, its affiliates and licensors retain all intellectual
# property and proprietary rights in and to this material, related
# documentation and any modifications thereto. Any use, reproduction,
# disclosure or distribution of this material and related documentation
# without an express license agreement from ByteDance or
# its affiliates is strictly prohibited.
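# demo/gradio_animate.py -- single-GPU gradio demo ("python3 -m demo.gradio_animate").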
import argparse
import imageio
import numpy as np
import gradio as gr
from PIL import Image
from demo.animate import MagicAnimate
animator = MagicAnimate()
def animate(reference_image, motion_sequence_state, seed, steps, guidance_scale):
return animator(reference_image, motion_sequence_state, seed, steps, guidance_scale)
with gr.Blocks() as demo:
gr.HTML(
"""
<div style="display: flex; justify-content: center; align-items: center; text-align: center;">
<a href="https://github.com/magic-research/magic-animate" style="margin-right: 20px; text-decoration: none; display: flex; align-items: center;">
</a>
<div>
<h1 >MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model</h1>
<h5 style="margin: 0;">If you like our project, please give us a star ✨ on Github for the latest update.</h5>
<div style="display: flex; justify-content: center; align-items: center; text-align: center;>
<a href="https://arxiv.org/abs/2311.16498"><img src="https://img.shields.io/badge/Arxiv-2311.16498-red"></a>
<a href='https://showlab.github.io/magicanimate'><img src='https://img.shields.io/badge/Project_Page-MagicAnimate-green' alt='Project Page'></a>
<a href='https://github.com/magic-research/magic-animate'><img src='https://img.shields.io/badge/Github-Code-blue'></a>
</div>
</div>
</div>
""")
animation = gr.Video(format="mp4", label="Animation Results", autoplay=True)
with gr.Row():
reference_image = gr.Image(label="Reference Image")
motion_sequence = gr.Video(format="mp4", label="Motion Sequence")
with gr.Column():
random_seed = gr.Textbox(label="Random seed", value=1, info="default: 1, set to -1 for a random seed")
sampling_steps = gr.Textbox(label="Sampling steps", value=25, info="default: 25")
guidance_scale = gr.Textbox(label="Guidance scale", value=7.5, info="default: 7.5")
submit = gr.Button("Animate")
def read_video(video):
reader = imageio.get_reader(video)
fps = reader.get_meta_data()['fps']
return video
def read_image(image, size=512):
return np.array(Image.fromarray(image).resize((size, size)))
# when user uploads a new video
motion_sequence.upload(
read_video,
motion_sequence,
motion_sequence
)
# when `first_frame` is updated
reference_image.upload(
read_image,
reference_image,
reference_image
)
# when the `submit` button is clicked
submit.click(
animate,
[reference_image, motion_sequence, random_seed, sampling_steps, guidance_scale],
animation
)
# Examples
gr.Markdown("## Examples")
gr.Examples(
examples=[
["inputs/applications/source_image/monalisa.png", "inputs/applications/driving/densepose/running.mp4"],
["inputs/applications/source_image/demo4.png", "inputs/applications/driving/densepose/demo4.mp4"],
["inputs/applications/source_image/dalle2.jpeg", "inputs/applications/driving/densepose/running2.mp4"],
["inputs/applications/source_image/dalle8.jpeg", "inputs/applications/driving/densepose/dancing2.mp4"],
["inputs/applications/source_image/multi1_source.png", "inputs/applications/driving/densepose/multi_dancing.mp4"],
],
inputs=[reference_image, motion_sequence],
outputs=animation,
)
demo.launch(share=True)
# Copyright 2023 ByteDance and/or its affiliates.
#
# Copyright (2023) MagicAnimate Authors
#
# ByteDance, its affiliates and licensors retain all intellectual
# property and proprietary rights in and to this material, related
# documentation and any modifications thereto. Any use, reproduction,
# disclosure or distribution of this material and related documentation
# without an express license agreement from ByteDance or
# its affiliates is strictly prohibited.
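# demo/gradio_animate_dist.py -- multi-GPU gradio demo ("python3 -m demo.gradio_animate_dist");
# each request shells out to demo.animate_dist and returns the saved mp4.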
import argparse
import imageio
import os, datetime
import numpy as np
import gradio as gr
from PIL import Image
from subprocess import PIPE, run
os.makedirs("./demo/tmp", exist_ok=True)
savedir = f"demo/outputs"
os.makedirs(savedir, exist_ok=True)
def animate(reference_image, motion_sequence, seed, steps, guidance_scale):
time_str = datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
animation_path = f"{savedir}/{time_str}.mp4"
save_path = "./demo/tmp/input_reference_image.png"
Image.fromarray(reference_image).save(save_path)
command = "python -m demo.animate_dist --reference_image {} --motion_sequence {} --random_seed {} --step {} --guidance_scale {} --save_path {}".format(
save_path,
motion_sequence,
seed,
steps,
guidance_scale,
animation_path
)
run(command, stdout=PIPE, stderr=PIPE, universal_newlines=True, shell=True)
return animation_path
with gr.Blocks() as demo:
gr.HTML(
"""
<div style="display: flex; justify-content: center; align-items: center; text-align: center;">
<a href="https://github.com/magic-research/magic-animate" style="margin-right: 20px; text-decoration: none; display: flex; align-items: center;">
</a>
<div>
<h1 >MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model</h1>
<h5 style="margin: 0;">If you like our project, please give us a star ✨ on Github for the latest update.</h5>
<div style="display: flex; justify-content: center; align-items: center; text-align: center;>
<a href="https://arxiv.org/abs/2311.16498"><img src="https://img.shields.io/badge/Arxiv-2311.16498-red"></a>
<a href='https://showlab.github.io/magicanimate'><img src='https://img.shields.io/badge/Project_Page-MagicAnimate-green' alt='Project Page'></a>
<a href='https://github.com/magic-research/magic-animate'><img src='https://img.shields.io/badge/Github-Code-blue'></a>
</div>
</div>
</div>
""")
animation = gr.Video(format="mp4", label="Animation Results", autoplay=True)
with gr.Row():
reference_image = gr.Image(label="Reference Image")
motion_sequence = gr.Video(format="mp4", label="Motion Sequence")
with gr.Column():
random_seed = gr.Textbox(label="Random seed", value=1, info="default: 1, set to -1 for a random seed")
sampling_steps = gr.Textbox(label="Sampling steps", value=25, info="default: 25")
guidance_scale = gr.Textbox(label="Guidance scale", value=7.5, info="default: 7.5")
submit = gr.Button("Animate")
def read_video(video, size=512):
size = int(size)
reader = imageio.get_reader(video)
# fps = reader.get_meta_data()['fps']
frames = []
for img in reader:
frames.append(np.array(Image.fromarray(img).resize((size, size))))
save_path = "./demo/tmp/input_motion_sequence.mp4"
imageio.mimwrite(save_path, frames, fps=25)
return save_path
def read_image(image, size=512):
img = np.array(Image.fromarray(image).resize((size, size)))
return img
# when user uploads a new video
motion_sequence.upload(
read_video,
motion_sequence,
motion_sequence
)
# when `first_frame` is updated
reference_image.upload(
read_image,
reference_image,
reference_image
)
# when the `submit` button is clicked
submit.click(
animate,
[reference_image, motion_sequence, random_seed, sampling_steps, guidance_scale],
animation
)
# Examples
gr.Markdown("## Examples")
gr.Examples(
examples=[
["inputs/applications/source_image/monalisa.png", "inputs/applications/driving/densepose/running.mp4"],
["inputs/applications/source_image/demo4.png", "inputs/applications/driving/densepose/demo4.mp4"],
["inputs/applications/source_image/dalle2.jpeg", "inputs/applications/driving/densepose/running2.mp4"],
["inputs/applications/source_image/dalle8.jpeg", "inputs/applications/driving/densepose/dancing2.mp4"],
["inputs/applications/source_image/multi1_source.png", "inputs/applications/driving/densepose/multi_dancing.mp4"],
],
inputs=[reference_image, motion_sequence],
outputs=animation,
)
# demo.queue(max_size=10)
demo.launch(share=True)
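# environment.yaml -- conda environment used by "conda env create -f environment.yaml" (see Installation above).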
name: manimate
channels:
- conda-forge
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- asttokens=2.2.1=pyhd8ed1ab_0
- backcall=0.2.0=pyh9f0ad1d_0
- backports=1.0=pyhd8ed1ab_3
- backports.functools_lru_cache=1.6.5=pyhd8ed1ab_0
- ca-certificates=2023.7.22=hbcca054_0
- comm=0.1.4=pyhd8ed1ab_0
- debugpy=1.6.7=py38h6a678d5_0
- decorator=5.1.1=pyhd8ed1ab_0
- entrypoints=0.4=pyhd8ed1ab_0
- executing=1.2.0=pyhd8ed1ab_0
- ipykernel=6.25.1=pyh71e2992_0
- ipython=8.12.0=pyh41d4057_0
- jedi=0.19.0=pyhd8ed1ab_0
- jupyter_client=7.3.4=pyhd8ed1ab_0
- jupyter_core=4.12.0=py38h578d9bd_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.3=he6710b0_2
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libsodium=1.0.18=h36c2ea0_1
- libstdcxx-ng=11.2.0=h1234567_1
- matplotlib-inline=0.1.6=pyhd8ed1ab_0
- ncurses=6.4=h6a678d5_0
- nest-asyncio=1.5.6=pyhd8ed1ab_0
- openssl=1.1.1l=h7f98852_0
- packaging=23.1=pyhd8ed1ab_0
- parso=0.8.3=pyhd8ed1ab_0
- pexpect=4.8.0=pyh1a96a4e_2
- pickleshare=0.7.5=py_1003
- pip=23.2.1=py38h06a4308_0
- prompt-toolkit=3.0.39=pyha770c72_0
- prompt_toolkit=3.0.39=hd8ed1ab_0
- ptyprocess=0.7.0=pyhd3deb0d_0
- pure_eval=0.2.2=pyhd8ed1ab_0
- pygments=2.16.1=pyhd8ed1ab_0
- python=3.8.5=h7579374_1
- python-dateutil=2.8.2=pyhd8ed1ab_0
- python_abi=3.8=2_cp38
- pyzmq=25.1.0=py38h6a678d5_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py38h06a4308_0
- six=1.16.0=pyh6c4a22f_0
- sqlite=3.41.2=h5eee18b_0
- stack_data=0.6.2=pyhd8ed1ab_0
- tk=8.6.12=h1ccaba5_0
- tornado=6.1=py38h0a891b7_3
- traitlets=5.9.0=pyhd8ed1ab_0
- typing_extensions=4.7.1=pyha770c72_0
- wcwidth=0.2.6=pyhd8ed1ab_0
- wheel=0.38.4=py38h06a4308_0
- xz=5.4.2=h5eee18b_0
- zeromq=4.3.4=h9c3ff4c_1
- zlib=1.2.13=h5eee18b_0
- pip:
- absl-py==1.4.0
- accelerate==0.22.0
- aiofiles==23.2.1
- aiohttp==3.8.5
- aiosignal==1.3.1
- altair==5.0.1
- annotated-types==0.5.0
- antlr4-python3-runtime==4.9.3
- anyio==3.7.1
- async-timeout==4.0.3
- attrs==23.1.0
- cachetools==5.3.1
- certifi==2023.7.22
- charset-normalizer==3.2.0
- click==8.1.7
- cmake==3.27.2
- contourpy==1.1.0
- cycler==0.11.0
- datasets==2.14.4
- dill==0.3.7
- einops==0.6.1
- exceptiongroup==1.1.3
- fastapi==0.103.0
- ffmpy==0.3.1
- filelock==3.12.2
- fonttools==4.42.1
- frozenlist==1.4.0
- fsspec==2023.6.0
- google-auth==2.22.0
- google-auth-oauthlib==1.0.0
- gradio==3.41.2
- gradio-client==0.5.0
- grpcio==1.57.0
- h11==0.14.0
- httpcore==0.17.3
- httpx==0.24.1
- huggingface-hub==0.16.4
- idna==3.4
- importlib-metadata==6.8.0
- importlib-resources==6.0.1
- jinja2==3.1.2
- joblib==1.3.2
- jsonschema==4.19.0
- jsonschema-specifications==2023.7.1
- kiwisolver==1.4.5
- lightning-utilities==0.9.0
- lit==16.0.6
- markdown==3.4.4
- markupsafe==2.1.3
- matplotlib==3.7.2
- mpmath==1.3.0
- multidict==6.0.4
- multiprocess==0.70.15
- networkx==3.1
- numpy==1.24.4
- nvidia-cublas-cu11==11.10.3.66
- nvidia-cuda-cupti-cu11==11.7.101
- nvidia-cuda-nvrtc-cu11==11.7.99
- nvidia-cuda-runtime-cu11==11.7.99
- nvidia-cudnn-cu11==8.5.0.96
- nvidia-cufft-cu11==10.9.0.58
- nvidia-curand-cu11==10.2.10.91
- nvidia-cusolver-cu11==11.4.0.1
- nvidia-cusparse-cu11==11.7.4.91
- nvidia-nccl-cu11==2.14.3
- nvidia-nvtx-cu11==11.7.91
- oauthlib==3.2.2
- omegaconf==2.3.0
- opencv-python==4.8.0.76
- orjson==3.9.5
- pandas==2.0.3
- pillow==9.5.0
- pkgutil-resolve-name==1.3.10
- protobuf==4.24.2
- psutil==5.9.5
- pyarrow==13.0.0
- pyasn1==0.5.0
- pyasn1-modules==0.3.0
- pydantic==2.3.0
- pydantic-core==2.6.3
- pydub==0.25.1
- pyparsing==3.0.9
- python-multipart==0.0.6
- pytorch-lightning==2.0.7
- pytz==2023.3
- pyyaml==6.0.1
- referencing==0.30.2
- regex==2023.8.8
- requests==2.31.0
- requests-oauthlib==1.3.1
- rpds-py==0.9.2
- rsa==4.9
- safetensors==0.3.3
- semantic-version==2.10.0
- sniffio==1.3.0
- starlette==0.27.0
- sympy==1.12
- tensorboard==2.14.0
- tensorboard-data-server==0.7.1
- tokenizers==0.13.3
- toolz==0.12.0
- torchmetrics==1.1.0
- tqdm==4.66.1
- transformers==4.32.0
- triton==2.0.0
- tzdata==2023.3
- urllib3==1.26.16
- uvicorn==0.23.2
- websockets==11.0.3
- werkzeug==2.3.7
- xxhash==3.3.0
- yarl==1.9.2
- zipp==3.16.2
- decord
- imageio==2.9.0
- imageio-ffmpeg==0.4.3
- timm
- scipy
- scikit-image
- av
- imgaug
- lpips
- ffmpeg-python
- torch==2.0.1
- torchvision==0.15.2
- xformers==0.0.22
- diffusers==0.21.4
prefix: /home/tiger/miniconda3/envs/manimate