"examples/trials/vscode:/vscode.git/clone" did not exist on "f892ed67aa3eaa1ee1ff4c12a94b2241c8a3788e"
Commit b96ae489 authored by mashun1's avatar mashun1
Browse files

magic-animate

parents
Pipeline #674 canceled with stages
__pycache__
.vscode
samples
xformers
src
third_party
backup
pretrained_models
*.nfs*
./*.png
./*.mp4
demo/tmp
demo/outputs
BSD 3-Clause License
Copyright 2023 MagicAnimate Team All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# magic-animate
## Paper
**MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model**
* https://arxiv.org/pdf/2311.16498.pdf
## Model Structure
As shown in the figure, the model takes three inputs: a `reference image` (the appearance reference), a `DensePose sequence` (the target motion), and `noisy latents` (randomly initialized noise with the same number of frames as the DensePose sequence). The `Appearance Encoder` extracts features from the `reference image`, the `ControlNet` extracts the motion features, and `Temporal Attention` layers are inserted into the `2D-UNet` to turn it into a `3D-UNet`.
![Alt text](readme_images/image-1.png)
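To make the 2D→3D conversion concrete, below is a minimal, self-contained sketch (not the repository's actual implementation) of a temporal self-attention block: spatial positions are folded into the batch dimension so attention runs only across the frame axis, which is the standard trick for inserting temporal layers into a 2D-UNet.

```python
import torch
import torch.nn as nn
from einops import rearrange

class TemporalSelfAttention(nn.Module):
    """Illustrative temporal attention layer: attends across frames only."""
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width) -- a 3D-UNet feature map
        b, c, f, h, w = x.shape
        # fold every spatial position into the batch so attention sees only the frame axis
        tokens = rearrange(x, "b c f h w -> (b h w) f c")
        normed = self.norm(tokens)
        attended, _ = self.attn(normed, normed, normed)
        tokens = tokens + attended  # residual connection keeps the 2D-UNet behaviour as a baseline
        return rearrange(tokens, "(b h w) f c -> b c f h w", b=b, h=h, w=w)

# shape check: a 16-frame feature map keeps its shape after temporal attention
x = torch.randn(1, 320, 16, 32, 32)
print(TemporalSelfAttention(320)(x).shape)  # torch.Size([1, 320, 16, 32, 32])
```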
## Algorithm
Purpose: the algorithm animates the person in an image so that they follow a given motion sequence.
Principles:
1. Temporal consistency: temporal attention layers are inserted into the original 2D-UNet, turning it into a 3D-UNet and extending the image diffusion model to the video domain.
2. Reference feature extraction: an appearance encoder improves identity and background preservation, enhancing single-frame fidelity and temporal coherence.
3. Joint image-video training: compared with image datasets, video datasets are much smaller and less varied in identity, background, and pose, which limits how well the animation framework can learn the reference conditioning; joint training mitigates this problem.
4. Video fusion: overlapping segments of the generated video are averaged (see the sketch after the figure below).
<img src="readme_images/image-5.png" alt="alt_text" width="300" height="200">
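As a rough illustration of step 4 (not the repository's code), the sketch below averages per-frame predictions from temporally overlapping segments; the segment length and stride used here are hypothetical.

```python
from typing import List
import numpy as np

def fuse_segments(segments: List[np.ndarray], stride: int) -> np.ndarray:
    """Average overlapping video segments. segments[i] has shape (L, H, W, C)
    and is assumed to start at frame i * stride."""
    seg_len = segments[0].shape[0]
    total = stride * (len(segments) - 1) + seg_len
    acc = np.zeros((total,) + segments[0].shape[1:], dtype=np.float64)
    count = np.zeros((total, 1, 1, 1), dtype=np.float64)
    for i, seg in enumerate(segments):
        start = i * stride
        acc[start:start + seg_len] += seg      # sum predictions frame by frame
        count[start:start + seg_len] += 1.0    # how many segments cover each frame
    return acc / count                         # per-frame average over the overlaps

# two 16-frame segments overlapping by 8 frames fuse into 24 frames
segs = [np.random.rand(16, 4, 4, 3), np.random.rand(16, 4, 4, 3)]
print(fuse_segments(segs, stride=8).shape)  # (24, 4, 4, 3)
```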
## Environment Setup
### Docker (Option 1)
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04.1-py39-latest
docker run --shm-size 10g --network=host --name=magic_animate --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to this project>:/home/ -it <your IMAGE ID> bash
pip install -r requirements.txt
### Docker (Option 2)
# run from the directory containing the Dockerfile
docker build -t <IMAGE_NAME>:<TAG> .
# replace <your IMAGE ID> with the ID of the image built above
docker run -it --shm-size 10g --network=host --name=magic_animate --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined <your IMAGE ID> bash
### Anaconda (Option 3)
1. The DCU-specific deep learning libraries required by this project can be downloaded from the HPC developer community:
https://developer.hpccube.com/tool/
DTK driver: dtk23.04.1
python: python3.9
torch: 1.13.1
torchvision: 0.14.1
torchaudio: 0.13.1
deepspeed: 0.9.2
apex: 0.1
Tips: the DTK driver, python, torch, and other DCU-related tool versions above must match each other exactly.
2. Install the remaining non-DCU-specific dependencies from requirements.txt:
pip install -r requirements.txt
## Dataset
## Inference
Model download links:
MagicAnimate - https://huggingface.co/zcxu-eric/MagicAnimate/tree/main
sd-vae-ft-mse - https://huggingface.co/stabilityai/sd-vae-ft-mse
Note: if these links are unreachable, the mirror https://hf-mirror.com/ can be used instead.
pretrained_models/
├── MagicAnimate
│ ├── appearance_encoder
│ │ ├── config.json
│ │ └── diffusion_pytorch_model.safetensors
│ ├── densepose_controlnet
│ │ ├── config.json
│ │ └── diffusion_pytorch_model.safetensors
│ └── temporal_attention
│ └── temporal_attention.ckpt
├── sd-vae-ft-mse
│ ├── config.json
│ └── diffusion_pytorch_model.bin
└── stable-diffusion-v1-5
├── text_encoder
│ ├── config.json
│ └── pytorch_model.bin
├── tokenizer
│ ├── merges.txt
│ ├── special_tokens_map.json
│ ├── tokenizer_config.json
│ └── vocab.json
├── unet
│ ├── config.json
│ └── diffusion_pytorch_model.bin
├── v1-5-pruned.ckpt
└── vae
├── config.json
└── diffusion_pytorch_model.bin
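After downloading, a quick check such as the one below (not part of the repository) can confirm that the files match the layout above:

```python
from pathlib import Path

# key files from the directory tree above; extend the list as needed
required = [
    "MagicAnimate/appearance_encoder/diffusion_pytorch_model.safetensors",
    "MagicAnimate/densepose_controlnet/diffusion_pytorch_model.safetensors",
    "MagicAnimate/temporal_attention/temporal_attention.ckpt",
    "sd-vae-ft-mse/diffusion_pytorch_model.bin",
    "stable-diffusion-v1-5/unet/diffusion_pytorch_model.bin",
]
root = Path("pretrained_models")
missing = [p for p in required if not (root / p).is_file()]
print("all checkpoints found" if not missing else f"missing files: {missing}")
```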
### Command Line
Run on a single card:
bash scripts/animate.sh
Run on multiple cards:
bash scripts/animate_dist.sh
### WebUI
Run on a single card:
python3 -m demo.gradio_animate
Run on multiple cards:
python3 -m demo.gradio_animate_dist
## Results
![alt](readme_images/m.gif)
### Accuracy
## Application Scenarios
### Algorithm Category
`AIGC`
### Target Industries
`Media, Research, Education`
## Source Repository & Issue Feedback
* https://developer.hpccube.com/codes/modelzoo/magic-animate_pytorch
## References
* https://github.com/magic-research/magic-animate
<!-- # magic-edit.github.io -->
<p align="center">
<h2 align="center">MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model</h2>
<p align="center">
<a href="https://scholar.google.com/citations?user=-4iADzMAAAAJ&hl=en"><strong>Zhongcong Xu</strong></a>
·
<a href="http://jeff95.me/"><strong>Jianfeng Zhang</strong></a>
·
<a href="https://scholar.google.com.sg/citations?user=8gm-CYYAAAAJ&hl=en"><strong>Jun Hao Liew</strong></a>
·
<a href="https://hanshuyan.github.io/"><strong>Hanshu Yan</strong></a>
·
<a href="https://scholar.google.com/citations?user=stQQf7wAAAAJ&hl=en"><strong>Jia-Wei Liu</strong></a>
·
<a href="https://zhangchenxu528.github.io/"><strong>Chenxu Zhang</strong></a>
·
<a href="https://sites.google.com/site/jshfeng/home"><strong>Jiashi Feng</strong></a>
·
<a href="https://sites.google.com/view/showlab"><strong>Mike Zheng Shou</strong></a>
<br>
<br>
<a href="https://arxiv.org/abs/2311.16498"><img src='https://img.shields.io/badge/arXiv-MagicAnimate-red' alt='Paper PDF'></a>
<a href='https://showlab.github.io/magicanimate'><img src='https://img.shields.io/badge/Project_Page-MagicAnimate-green' alt='Project Page'></a>
<a href='https://huggingface.co/spaces/zcxu-eric/magicanimate'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
<br>
<b>National University of Singapore &nbsp; | &nbsp; ByteDance</b>
</p>
<table align="center">
<tr>
<td>
<img src="assets/teaser/t4.gif">
</td>
<td>
<img src="assets/teaser/t2.gif">
</td>
</tr>
</table>
## 📢 News
* **[2023.12.4]** Release inference code and gradio demo. We are working to improve MagicAnimate, stay tuned!
* **[2023.11.23]** Release MagicAnimate paper and project page.
## 🏃‍♂️ Getting Started
Please download the pretrained base models for [StableDiffusion V1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) and [MSE-finetuned VAE](https://huggingface.co/stabilityai/sd-vae-ft-mse).
Download our MagicAnimate [checkpoints](https://huggingface.co/zcxu-eric/MagicAnimate).
**Place them as follows:**
```bash
magic-animate
|----pretrained_models
|----MagicAnimate
|----appearance_encoder
|----diffusion_pytorch_model.safetensors
|----config.json
|----densepose_controlnet
|----diffusion_pytorch_model.safetensors
|----config.json
|----temporal_attention
|----temporal_attention.ckpt
|----sd-vae-ft-mse
|----...
|----stable-diffusion-v1-5
|----...
|----...
```
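One way to fetch the checkpoints programmatically is sketched below; it assumes `huggingface_hub` is installed and that the Hugging Face repositories mirror the folder layout above.

```python
from huggingface_hub import snapshot_download

# download each repo into the folder layout expected by configs/prompts/animation.yaml
snapshot_download("zcxu-eric/MagicAnimate", local_dir="pretrained_models/MagicAnimate")
snapshot_download("stabilityai/sd-vae-ft-mse", local_dir="pretrained_models/sd-vae-ft-mse")
snapshot_download("runwayml/stable-diffusion-v1-5", local_dir="pretrained_models/stable-diffusion-v1-5")
```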
## ⚒️ Installation
prerequisites: `python>=3.8`, `CUDA>=11.3`, and `ffmpeg`.
Install with `conda`:
```bash
conda env create -f environment.yaml
conda activate manimate
```
or `pip`:
```bash
pip3 install -r requirements.txt
```
## 💃 Inference
Run inference on single GPU:
```bash
bash scripts/animate.sh
```
Run inference with multiple GPUs:
```bash
bash scripts/animate_dist.sh
```
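The scripts ultimately drive the `MagicAnimate` wrapper in `demo/animate.py`; mirroring what the gradio demo does, a single sample can also be generated directly from Python. The example image and driving video below come from `inputs/applications/`:

```python
import numpy as np
from PIL import Image
from demo.animate import MagicAnimate  # same entry point the gradio demo uses

animator = MagicAnimate()  # loads configs/prompts/animation.yaml by default
source = np.array(Image.open("inputs/applications/source_image/monalisa.png").convert("RGB"))
out_path = animator(
    source,                                               # reference image as a numpy array
    "inputs/applications/driving/densepose/running.mp4",  # DensePose driving video
    random_seed=1,
    step=25,
    guidance_scale=7.5,
)
print("animation saved to", out_path)
```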
## 🎨 Gradio Demo
#### Online Gradio Demo:
Try our [online gradio demo](https://huggingface.co/spaces/zcxu-eric/magicanimate) for a quick start.
#### Local Gradio Demo:
Launch local gradio demo on single GPU:
```bash
python3 -m demo.gradio_animate
```
Launch local gradio demo if you have multiple GPUs:
```bash
python3 -m demo.gradio_animate_dist
```
Then open the gradio demo in your local browser.
## 🙏 Acknowledgements
We would like to thank [AK(@_akhaliq)](https://twitter.com/_akhaliq?lang=en) and the Hugging Face team for their help in setting up the online gradio demo.
## 🎓 Citation
If you find this codebase useful for your research, please use the following entry.
```BibTeX
@inproceedings{xu2023magicanimate,
author = {Xu, Zhongcong and Zhang, Jianfeng and Liew, Jun Hao and Yan, Hanshu and Liu, Jia-Wei and Zhang, Chenxu and Feng, Jiashi and Shou, Mike Zheng},
title = {MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model},
booktitle = {arXiv},
year = {2023}
}
```
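# configs/inference/inference.yaml -- the file referenced by animation.yaml's
# `inference_config`: motion-module (temporal attention) and noise-scheduler settings.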
unet_additional_kwargs:
  unet_use_cross_frame_attention: false
  unet_use_temporal_attention: false
  use_motion_module: true
  motion_module_resolutions:
    - 1
    - 2
    - 4
    - 8
  motion_module_mid_block: false
  motion_module_decoder_only: false
  motion_module_type: Vanilla
  motion_module_kwargs:
    num_attention_heads: 8
    num_transformer_block: 1
    attention_block_types:
      - Temporal_Self
      - Temporal_Self
    temporal_position_encoding: true
    temporal_position_encoding_max_len: 24
    temporal_attention_dim_div: 1

noise_scheduler_kwargs:
  beta_start: 0.00085
  beta_end: 0.012
  beta_schedule: "linear"
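
# configs/prompts/animation.yaml -- default inference config (checkpoint paths,
# example inputs, sampling settings) loaded by demo/animate.py and demo/animate_dist.py.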
pretrained_model_path: "pretrained_models/stable-diffusion-v1-5"
pretrained_vae_path: "pretrained_models/sd-vae-ft-mse"
pretrained_controlnet_path: "pretrained_models/MagicAnimate/densepose_controlnet"
pretrained_appearance_encoder_path: "pretrained_models/MagicAnimate/appearance_encoder"
pretrained_unet_path: ""
motion_module: "pretrained_models/MagicAnimate/temporal_attention/temporal_attention.ckpt"
savename: null
fusion_blocks: "midup"
seed: [1]
steps: 25
guidance_scale: 7.5
source_image:
- "inputs/applications/source_image/monalisa.png"
- "inputs/applications/source_image/demo4.png"
- "inputs/applications/source_image/dalle2.jpeg"
- "inputs/applications/source_image/dalle8.jpeg"
- "inputs/applications/source_image/multi1_source.png"
video_path:
- "inputs/applications/driving/densepose/running.mp4"
- "inputs/applications/driving/densepose/demo4.mp4"
- "inputs/applications/driving/densepose/running2.mp4"
- "inputs/applications/driving/densepose/dancing2.mp4"
- "inputs/applications/driving/densepose/multi_dancing.mp4"
inference_config: "configs/inference/inference.yaml"
size: 256
L: 16
S: 1
I: 0
clip: 0
offset: 0
max_length: null
video_type: "condition"
invert_video: false
save_individual_videos: false
# Copyright 2023 ByteDance and/or its affiliates.
#
# Copyright (2023) MagicAnimate Authors
#
# ByteDance, its affiliates and licensors retain all intellectual
# property and proprietary rights in and to this material, related
# documentation and any modifications thereto. Any use, reproduction,
# disclosure or distribution of this material and related documentation
# without an express license agreement from ByteDance or
# its affiliates is strictly prohibited.
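# demo/animate.py -- single-process MagicAnimate pipeline wrapper (imported by demo.gradio_animate).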
import argparse
import datetime
import inspect
import os
import numpy as np
from PIL import Image
from omegaconf import OmegaConf
from collections import OrderedDict
import torch
from diffusers import AutoencoderKL, DDIMScheduler, UniPCMultistepScheduler
from tqdm import tqdm
from transformers import CLIPTextModel, CLIPTokenizer
from magicanimate.models.unet_controlnet import UNet3DConditionModel
from magicanimate.models.controlnet import ControlNetModel
from magicanimate.models.appearance_encoder import AppearanceEncoderModel
from magicanimate.models.mutual_self_attention import ReferenceAttentionControl
from magicanimate.pipelines.pipeline_animation import AnimationPipeline
from magicanimate.utils.util import save_videos_grid
from accelerate.utils import set_seed
from magicanimate.utils.videoreader import VideoReader
from einops import rearrange, repeat
import csv, pdb, glob
from safetensors import safe_open
import math
from pathlib import Path
class MagicAnimate():
def __init__(self, config="configs/prompts/animation.yaml") -> None:
print("Initializing MagicAnimate Pipeline...")
*_, func_args = inspect.getargvalues(inspect.currentframe())
func_args = dict(func_args)
config = OmegaConf.load(config)
inference_config = OmegaConf.load(config.inference_config)
motion_module = config.motion_module
### >>> create animation pipeline >>> ###
tokenizer = CLIPTokenizer.from_pretrained(config.pretrained_model_path, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(config.pretrained_model_path, subfolder="text_encoder")
if config.pretrained_unet_path:
unet = UNet3DConditionModel.from_pretrained_2d(config.pretrained_unet_path, unet_additional_kwargs=OmegaConf.to_container(inference_config.unet_additional_kwargs))
else:
unet = UNet3DConditionModel.from_pretrained_2d(config.pretrained_model_path, subfolder="unet", unet_additional_kwargs=OmegaConf.to_container(inference_config.unet_additional_kwargs))
self.appearance_encoder = AppearanceEncoderModel.from_pretrained(config.pretrained_appearance_encoder_path, subfolder="appearance_encoder").cuda()
self.reference_control_writer = ReferenceAttentionControl(self.appearance_encoder, do_classifier_free_guidance=True, mode='write', fusion_blocks=config.fusion_blocks)
self.reference_control_reader = ReferenceAttentionControl(unet, do_classifier_free_guidance=True, mode='read', fusion_blocks=config.fusion_blocks)
if config.pretrained_vae_path is not None:
vae = AutoencoderKL.from_pretrained(config.pretrained_vae_path)
else:
vae = AutoencoderKL.from_pretrained(config.pretrained_model_path, subfolder="vae")
### Load controlnet
controlnet = ControlNetModel.from_pretrained(config.pretrained_controlnet_path)
vae.to(torch.float16)
unet.to(torch.float16)
text_encoder.to(torch.float16)
controlnet.to(torch.float16)
self.appearance_encoder.to(torch.float16)
unet.enable_xformers_memory_efficient_attention()
self.appearance_encoder.enable_xformers_memory_efficient_attention()
controlnet.enable_xformers_memory_efficient_attention()
self.pipeline = AnimationPipeline(
vae=vae, text_encoder=text_encoder, tokenizer=tokenizer, unet=unet, controlnet=controlnet,
scheduler=DDIMScheduler(**OmegaConf.to_container(inference_config.noise_scheduler_kwargs)),
# NOTE: UniPCMultistepScheduler
).to("cuda")
# 1. unet ckpt
# 1.1 motion module
motion_module_state_dict = torch.load(motion_module, map_location="cpu")
if "global_step" in motion_module_state_dict: func_args.update({"global_step": motion_module_state_dict["global_step"]})
motion_module_state_dict = motion_module_state_dict['state_dict'] if 'state_dict' in motion_module_state_dict else motion_module_state_dict
try:
# extra steps for self-trained models
state_dict = OrderedDict()
for key in motion_module_state_dict.keys():
if key.startswith("module."):
_key = key.split("module.")[-1]
state_dict[_key] = motion_module_state_dict[key]
else:
state_dict[key] = motion_module_state_dict[key]
motion_module_state_dict = state_dict
del state_dict
missing, unexpected = self.pipeline.unet.load_state_dict(motion_module_state_dict, strict=False)
assert len(unexpected) == 0
except:
_tmp_ = OrderedDict()
for key in motion_module_state_dict.keys():
if "motion_modules" in key:
if key.startswith("unet."):
_key = key.split('unet.')[-1]
_tmp_[_key] = motion_module_state_dict[key]
else:
_tmp_[key] = motion_module_state_dict[key]
missing, unexpected = unet.load_state_dict(_tmp_, strict=False)
assert len(unexpected) == 0
del _tmp_
del motion_module_state_dict
self.pipeline.to("cuda")
self.L = config.L
print("Initialization Done!")
def __call__(self, source_image, motion_sequence, random_seed, step, guidance_scale, size=512):
prompt = n_prompt = ""
random_seed = int(random_seed)
step = int(step)
guidance_scale = float(guidance_scale)
samples_per_video = []
# manually set random seed for reproduction
if random_seed != -1:
torch.manual_seed(random_seed)
set_seed(random_seed)
else:
torch.seed()
if motion_sequence.endswith('.mp4'):
control = VideoReader(motion_sequence).read()
if control[0].shape[0] != size:
control = [np.array(Image.fromarray(c).resize((size, size))) for c in control]
control = np.array(control)
if source_image.shape[0] != size:
source_image = np.array(Image.fromarray(source_image).resize((size, size)))
H, W, C = source_image.shape
init_latents = None
original_length = control.shape[0]
if control.shape[0] % self.L > 0:
control = np.pad(control, ((0, self.L-control.shape[0] % self.L), (0, 0), (0, 0), (0, 0)), mode='edge')
generator = torch.Generator(device=torch.device("cuda:0"))
generator.manual_seed(torch.initial_seed())
sample = self.pipeline(
prompt,
negative_prompt = n_prompt,
num_inference_steps = step,
guidance_scale = guidance_scale,
width = W,
height = H,
video_length = len(control),
controlnet_condition = control,
init_latents = init_latents,
generator = generator,
appearance_encoder = self.appearance_encoder,
reference_control_writer = self.reference_control_writer,
reference_control_reader = self.reference_control_reader,
source_image = source_image,
).videos
source_images = np.array([source_image] * original_length)
source_images = rearrange(torch.from_numpy(source_images), "t h w c -> 1 c t h w") / 255.0
samples_per_video.append(source_images)
control = control / 255.0
control = rearrange(control, "t h w c -> 1 c t h w")
control = torch.from_numpy(control)
samples_per_video.append(control[:, :, :original_length])
samples_per_video.append(sample[:, :, :original_length])
samples_per_video = torch.cat(samples_per_video)
time_str = datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
savedir = f"demo/outputs"
animation_path = f"{savedir}/{time_str}.mp4"
os.makedirs(savedir, exist_ok=True)
save_videos_grid(samples_per_video, animation_path)
return animation_path
# Copyright 2023 ByteDance and/or its affiliates.
#
# Copyright (2023) MagicAnimate Authors
#
# ByteDance, its affiliates and licensors retain all intellectual
# property and proprietary rights in and to this material, related
# documentation and any modifications thereto. Any use, reproduction,
# disclosure or distribution of this material and related documentation
# without an express license agreement from ByteDance or
# its affiliates is strictly prohibited.
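# demo/animate_dist.py -- distributed (multi-GPU) inference entry point,
# invoked by demo.gradio_animate_dist as "python -m demo.animate_dist".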
import argparse
import datetime
import inspect
import os
import numpy as np
from PIL import Image
from omegaconf import OmegaConf
from collections import OrderedDict
import torch
import random
from diffusers import AutoencoderKL, DDIMScheduler, UniPCMultistepScheduler
from transformers import CLIPTextModel, CLIPTokenizer
from magicanimate.models.unet_controlnet import UNet3DConditionModel
from magicanimate.models.controlnet import ControlNetModel
from magicanimate.models.appearance_encoder import AppearanceEncoderModel
from magicanimate.models.mutual_self_attention import ReferenceAttentionControl
from magicanimate.pipelines.pipeline_animation import AnimationPipeline
from magicanimate.utils.util import save_videos_grid
from magicanimate.utils.dist_tools import distributed_init
from accelerate.utils import set_seed
from magicanimate.utils.videoreader import VideoReader
from einops import rearrange
animator = None
class MagicAnimate():
def __init__(self, args) -> None:
config=args.config
device = torch.device(f"cuda:{args.rank}")
print("Initializing MagicAnimate Pipeline...")
*_, func_args = inspect.getargvalues(inspect.currentframe())
func_args = dict(func_args)
config = OmegaConf.load(config)
inference_config = OmegaConf.load(config.inference_config)
motion_module = config.motion_module
### >>> create animation pipeline >>> ###
tokenizer = CLIPTokenizer.from_pretrained(config.pretrained_model_path, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(config.pretrained_model_path, subfolder="text_encoder")
if config.pretrained_unet_path:
unet = UNet3DConditionModel.from_pretrained_2d(config.pretrained_unet_path, unet_additional_kwargs=OmegaConf.to_container(inference_config.unet_additional_kwargs))
else:
unet = UNet3DConditionModel.from_pretrained_2d(config.pretrained_model_path, subfolder="unet", unet_additional_kwargs=OmegaConf.to_container(inference_config.unet_additional_kwargs))
self.appearance_encoder = AppearanceEncoderModel.from_pretrained(config.pretrained_appearance_encoder_path, subfolder="appearance_encoder").to(device)
self.reference_control_writer = ReferenceAttentionControl(self.appearance_encoder, do_classifier_free_guidance=True, mode='write', fusion_blocks=config.fusion_blocks)
self.reference_control_reader = ReferenceAttentionControl(unet, do_classifier_free_guidance=True, mode='read', fusion_blocks=config.fusion_blocks)
if config.pretrained_vae_path is not None:
vae = AutoencoderKL.from_pretrained(config.pretrained_vae_path)
else:
vae = AutoencoderKL.from_pretrained(config.pretrained_model_path, subfolder="vae")
### Load controlnet
controlnet = ControlNetModel.from_pretrained(config.pretrained_controlnet_path)
vae.to(torch.float16)
unet.to(torch.float16)
text_encoder.to(torch.float16)
controlnet.to(torch.float16)
self.appearance_encoder.to(torch.float16)
unet.enable_xformers_memory_efficient_attention()
self.appearance_encoder.enable_xformers_memory_efficient_attention()
controlnet.enable_xformers_memory_efficient_attention()
self.pipeline = AnimationPipeline(
vae=vae, text_encoder=text_encoder, tokenizer=tokenizer, unet=unet, controlnet=controlnet,
scheduler=DDIMScheduler(**OmegaConf.to_container(inference_config.noise_scheduler_kwargs)),
# NOTE: UniPCMultistepScheduler
)
# 1. unet ckpt
# 1.1 motion module
motion_module_state_dict = torch.load(motion_module, map_location="cpu")
if "global_step" in motion_module_state_dict: func_args.update({"global_step": motion_module_state_dict["global_step"]})
motion_module_state_dict = motion_module_state_dict['state_dict'] if 'state_dict' in motion_module_state_dict else motion_module_state_dict
try:
# extra steps for self-trained models
state_dict = OrderedDict()
for key in motion_module_state_dict.keys():
if key.startswith("module."):
_key = key.split("module.")[-1]
state_dict[_key] = motion_module_state_dict[key]
else:
state_dict[key] = motion_module_state_dict[key]
motion_module_state_dict = state_dict
del state_dict
missing, unexpected = self.pipeline.unet.load_state_dict(motion_module_state_dict, strict=False)
assert len(unexpected) == 0
except:
_tmp_ = OrderedDict()
for key in motion_module_state_dict.keys():
if "motion_modules" in key:
if key.startswith("unet."):
_key = key.split('unet.')[-1]
_tmp_[_key] = motion_module_state_dict[key]
else:
_tmp_[key] = motion_module_state_dict[key]
missing, unexpected = unet.load_state_dict(_tmp_, strict=False)
assert len(unexpected) == 0
del _tmp_
del motion_module_state_dict
self.pipeline.to(device)
self.L = config.L
print("Initialization Done!")
dist_kwargs = {"rank":args.rank, "world_size":args.world_size, "dist":args.dist}
self.predict(args.reference_image, args.motion_sequence, args.random_seed, args.step, args.guidance_scale, args.save_path, dist_kwargs)
def predict(self, source_image, motion_sequence, random_seed, step, guidance_scale, save_path, dist_kwargs, size=512):
prompt = n_prompt = ""
samples_per_video = []
# manually set random seed for reproduction
if random_seed != -1:
torch.manual_seed(random_seed)
set_seed(random_seed)
else:
torch.seed()
if motion_sequence.endswith('.mp4'):
control = VideoReader(motion_sequence).read()
if control[0].shape[0] != size:
control = [np.array(Image.fromarray(c).resize((size, size))) for c in control]
control = np.array(control)
if not isinstance(source_image, np.ndarray):
source_image = np.array(Image.open(source_image))
if source_image.shape[0] != size:
source_image = np.array(Image.fromarray(source_image).resize((size, size)))
H, W, C = source_image.shape
init_latents = None
original_length = control.shape[0]
if control.shape[0] % self.L > 0:
control = np.pad(control, ((0, self.L-control.shape[0] % self.L), (0, 0), (0, 0), (0, 0)), mode='edge')
generator = torch.Generator(device=torch.device("cuda:0"))
generator.manual_seed(torch.initial_seed())
sample = self.pipeline(
prompt,
negative_prompt = n_prompt,
num_inference_steps = step,
guidance_scale = guidance_scale,
width = W,
height = H,
video_length = len(control),
controlnet_condition = control,
init_latents = init_latents,
generator = generator,
appearance_encoder = self.appearance_encoder,
reference_control_writer = self.reference_control_writer,
reference_control_reader = self.reference_control_reader,
source_image = source_image,
**dist_kwargs,
).videos
if dist_kwargs.get('rank', 0) == 0:
source_images = np.array([source_image] * original_length)
source_images = rearrange(torch.from_numpy(source_images), "t h w c -> 1 c t h w") / 255.0
samples_per_video.append(source_images)
control = control / 255.0
control = rearrange(control, "t h w c -> 1 c t h w")
control = torch.from_numpy(control)
samples_per_video.append(control[:, :, :original_length])
samples_per_video.append(sample[:, :, :original_length])
samples_per_video = torch.cat(samples_per_video)
save_videos_grid(samples_per_video, save_path)
def distributed_main(device_id, args):
args.rank = device_id
args.device_id = device_id
if torch.cuda.is_available():
torch.cuda.set_device(args.device_id)
torch.cuda.init()
distributed_init(args)
MagicAnimate(args)
def run(args):
if args.dist:
args.world_size = max(1, torch.cuda.device_count())
assert args.world_size <= torch.cuda.device_count()
if args.world_size > 0 and torch.cuda.device_count() > 1:
port = random.randint(10000, 20000)
args.init_method = f"tcp://localhost:{port}"
torch.multiprocessing.spawn(
fn=distributed_main,
args=(args,),
nprocs=args.world_size,
)
else:
MagicAnimate(args)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument("--config", type=str, default="configs/prompts/animation.yaml", required=False)
parser.add_argument("--dist", type=bool, default=True, required=False)
parser.add_argument("--rank", type=int, default=0, required=False)
parser.add_argument("--world_size", type=int, default=1, required=False)
parser.add_argument("--reference_image", type=str, default=None, required=True)
parser.add_argument("--motion_sequence", type=str, default=None, required=True)
parser.add_argument("--random_seed", type=int, default=1, required=False)
parser.add_argument("--step", type=int, default=25, required=False)
parser.add_argument("--guidance_scale", type=float, default=7.5, required=False)
parser.add_argument("--save_path", type=str, default=None, required=True)
args = parser.parse_args()
run(args)
# Copyright 2023 ByteDance and/or its affiliates.
#
# Copyright (2023) MagicAnimate Authors
#
# ByteDance, its affiliates and licensors retain all intellectual
# property and proprietary rights in and to this material, related
# documentation and any modifications thereto. Any use, reproduction,
# disclosure or distribution of this material and related documentation
# without an express license agreement from ByteDance or
# its affiliates is strictly prohibited.
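# demo/gradio_animate.py -- single-GPU gradio demo ("python3 -m demo.gradio_animate").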
import argparse
import imageio
import numpy as np
import gradio as gr
from PIL import Image
from demo.animate import MagicAnimate
animator = MagicAnimate()
def animate(reference_image, motion_sequence_state, seed, steps, guidance_scale):
return animator(reference_image, motion_sequence_state, seed, steps, guidance_scale)
with gr.Blocks() as demo:
gr.HTML(
"""
<div style="display: flex; justify-content: center; align-items: center; text-align: center;">
<a href="https://github.com/magic-research/magic-animate" style="margin-right: 20px; text-decoration: none; display: flex; align-items: center;">
</a>
<div>
<h1 >MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model</h1>
<h5 style="margin: 0;">If you like our project, please give us a star ✨ on Github for the latest update.</h5>
<div style="display: flex; justify-content: center; align-items: center; text-align: center;>
<a href="https://arxiv.org/abs/2311.16498"><img src="https://img.shields.io/badge/Arxiv-2311.16498-red"></a>
<a href='https://showlab.github.io/magicanimate'><img src='https://img.shields.io/badge/Project_Page-MagicAnimate-green' alt='Project Page'></a>
<a href='https://github.com/magic-research/magic-animate'><img src='https://img.shields.io/badge/Github-Code-blue'></a>
</div>
</div>
</div>
""")
animation = gr.Video(format="mp4", label="Animation Results", autoplay=True)
with gr.Row():
reference_image = gr.Image(label="Reference Image")
motion_sequence = gr.Video(format="mp4", label="Motion Sequence")
with gr.Column():
random_seed = gr.Textbox(label="Random seed", value=1, info="default: 1, set to -1 for a random seed")
sampling_steps = gr.Textbox(label="Sampling steps", value=25, info="default: 25")
guidance_scale = gr.Textbox(label="Guidance scale", value=7.5, info="default: 7.5")
submit = gr.Button("Animate")
def read_video(video):
reader = imageio.get_reader(video)
fps = reader.get_meta_data()['fps']
return video
def read_image(image, size=512):
return np.array(Image.fromarray(image).resize((size, size)))
# when user uploads a new video
motion_sequence.upload(
read_video,
motion_sequence,
motion_sequence
)
# when `first_frame` is updated
reference_image.upload(
read_image,
reference_image,
reference_image
)
# when the `submit` button is clicked
submit.click(
animate,
[reference_image, motion_sequence, random_seed, sampling_steps, guidance_scale],
animation
)
# Examples
gr.Markdown("## Examples")
gr.Examples(
examples=[
["inputs/applications/source_image/monalisa.png", "inputs/applications/driving/densepose/running.mp4"],
["inputs/applications/source_image/demo4.png", "inputs/applications/driving/densepose/demo4.mp4"],
["inputs/applications/source_image/dalle2.jpeg", "inputs/applications/driving/densepose/running2.mp4"],
["inputs/applications/source_image/dalle8.jpeg", "inputs/applications/driving/densepose/dancing2.mp4"],
["inputs/applications/source_image/multi1_source.png", "inputs/applications/driving/densepose/multi_dancing.mp4"],
],
inputs=[reference_image, motion_sequence],
outputs=animation,
)
demo.launch(share=True)
# Copyright 2023 ByteDance and/or its affiliates.
#
# Copyright (2023) MagicAnimate Authors
#
# ByteDance, its affiliates and licensors retain all intellectual
# property and proprietary rights in and to this material, related
# documentation and any modifications thereto. Any use, reproduction,
# disclosure or distribution of this material and related documentation
# without an express license agreement from ByteDance or
# its affiliates is strictly prohibited.
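# demo/gradio_animate_dist.py -- multi-GPU gradio demo ("python3 -m demo.gradio_animate_dist");
# each request shells out to demo.animate_dist and returns the saved mp4.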
import argparse
import imageio
import os, datetime
import numpy as np
import gradio as gr
from PIL import Image
from subprocess import PIPE, run
os.makedirs("./demo/tmp", exist_ok=True)
savedir = f"demo/outputs"
os.makedirs(savedir, exist_ok=True)
def animate(reference_image, motion_sequence, seed, steps, guidance_scale):
time_str = datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
animation_path = f"{savedir}/{time_str}.mp4"
save_path = "./demo/tmp/input_reference_image.png"
Image.fromarray(reference_image).save(save_path)
command = "python -m demo.animate_dist --reference_image {} --motion_sequence {} --random_seed {} --step {} --guidance_scale {} --save_path {}".format(
save_path,
motion_sequence,
seed,
steps,
guidance_scale,
animation_path
)
run(command, stdout=PIPE, stderr=PIPE, universal_newlines=True, shell=True)
return animation_path
with gr.Blocks() as demo:
gr.HTML(
"""
<div style="display: flex; justify-content: center; align-items: center; text-align: center;">
<a href="https://github.com/magic-research/magic-animate" style="margin-right: 20px; text-decoration: none; display: flex; align-items: center;">
</a>
<div>
<h1 >MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model</h1>
<h5 style="margin: 0;">If you like our project, please give us a star ✨ on Github for the latest update.</h5>
<div style="display: flex; justify-content: center; align-items: center; text-align: center;>
<a href="https://arxiv.org/abs/2311.16498"><img src="https://img.shields.io/badge/Arxiv-2311.16498-red"></a>
<a href='https://showlab.github.io/magicanimate'><img src='https://img.shields.io/badge/Project_Page-MagicAnimate-green' alt='Project Page'></a>
<a href='https://github.com/magic-research/magic-animate'><img src='https://img.shields.io/badge/Github-Code-blue'></a>
</div>
</div>
</div>
""")
animation = gr.Video(format="mp4", label="Animation Results", autoplay=True)
with gr.Row():
reference_image = gr.Image(label="Reference Image")
motion_sequence = gr.Video(format="mp4", label="Motion Sequence")
with gr.Column():
random_seed = gr.Textbox(label="Random seed", value=1, info="default: 1, set to -1 for a random seed")
sampling_steps = gr.Textbox(label="Sampling steps", value=25, info="default: 25")
guidance_scale = gr.Textbox(label="Guidance scale", value=7.5, info="default: 7.5")
submit = gr.Button("Animate")
def read_video(video, size=512):
size = int(size)
reader = imageio.get_reader(video)
# fps = reader.get_meta_data()['fps']
frames = []
for img in reader:
frames.append(np.array(Image.fromarray(img).resize((size, size))))
save_path = "./demo/tmp/input_motion_sequence.mp4"
imageio.mimwrite(save_path, frames, fps=25)
return save_path
def read_image(image, size=512):
img = np.array(Image.fromarray(image).resize((size, size)))
return img
# when user uploads a new video
motion_sequence.upload(
read_video,
motion_sequence,
motion_sequence
)
# when `first_frame` is updated
reference_image.upload(
read_image,
reference_image,
reference_image
)
# when the `submit` button is clicked
submit.click(
animate,
[reference_image, motion_sequence, random_seed, sampling_steps, guidance_scale],
animation
)
# Examples
gr.Markdown("## Examples")
gr.Examples(
examples=[
["inputs/applications/source_image/monalisa.png", "inputs/applications/driving/densepose/running.mp4"],
["inputs/applications/source_image/demo4.png", "inputs/applications/driving/densepose/demo4.mp4"],
["inputs/applications/source_image/dalle2.jpeg", "inputs/applications/driving/densepose/running2.mp4"],
["inputs/applications/source_image/dalle8.jpeg", "inputs/applications/driving/densepose/dancing2.mp4"],
["inputs/applications/source_image/multi1_source.png", "inputs/applications/driving/densepose/multi_dancing.mp4"],
],
inputs=[reference_image, motion_sequence],
outputs=animation,
)
# demo.queue(max_size=10)
demo.launch(share=True)
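# environment.yaml -- conda environment used by "conda env create -f environment.yaml" (see Installation above).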
name: manimate
channels:
- conda-forge
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- asttokens=2.2.1=pyhd8ed1ab_0
- backcall=0.2.0=pyh9f0ad1d_0
- backports=1.0=pyhd8ed1ab_3
- backports.functools_lru_cache=1.6.5=pyhd8ed1ab_0
- ca-certificates=2023.7.22=hbcca054_0
- comm=0.1.4=pyhd8ed1ab_0
- debugpy=1.6.7=py38h6a678d5_0
- decorator=5.1.1=pyhd8ed1ab_0
- entrypoints=0.4=pyhd8ed1ab_0
- executing=1.2.0=pyhd8ed1ab_0
- ipykernel=6.25.1=pyh71e2992_0
- ipython=8.12.0=pyh41d4057_0
- jedi=0.19.0=pyhd8ed1ab_0
- jupyter_client=7.3.4=pyhd8ed1ab_0
- jupyter_core=4.12.0=py38h578d9bd_0
- ld_impl_linux-64=2.38=h1181459_1
- libffi=3.3=he6710b0_2
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libsodium=1.0.18=h36c2ea0_1
- libstdcxx-ng=11.2.0=h1234567_1
- matplotlib-inline=0.1.6=pyhd8ed1ab_0
- ncurses=6.4=h6a678d5_0
- nest-asyncio=1.5.6=pyhd8ed1ab_0
- openssl=1.1.1l=h7f98852_0
- packaging=23.1=pyhd8ed1ab_0
- parso=0.8.3=pyhd8ed1ab_0
- pexpect=4.8.0=pyh1a96a4e_2
- pickleshare=0.7.5=py_1003
- pip=23.2.1=py38h06a4308_0
- prompt-toolkit=3.0.39=pyha770c72_0
- prompt_toolkit=3.0.39=hd8ed1ab_0
- ptyprocess=0.7.0=pyhd3deb0d_0
- pure_eval=0.2.2=pyhd8ed1ab_0
- pygments=2.16.1=pyhd8ed1ab_0
- python=3.8.5=h7579374_1
- python-dateutil=2.8.2=pyhd8ed1ab_0
- python_abi=3.8=2_cp38
- pyzmq=25.1.0=py38h6a678d5_0
- readline=8.2=h5eee18b_0
- setuptools=68.0.0=py38h06a4308_0
- six=1.16.0=pyh6c4a22f_0
- sqlite=3.41.2=h5eee18b_0
- stack_data=0.6.2=pyhd8ed1ab_0
- tk=8.6.12=h1ccaba5_0
- tornado=6.1=py38h0a891b7_3
- traitlets=5.9.0=pyhd8ed1ab_0
- typing_extensions=4.7.1=pyha770c72_0
- wcwidth=0.2.6=pyhd8ed1ab_0
- wheel=0.38.4=py38h06a4308_0
- xz=5.4.2=h5eee18b_0
- zeromq=4.3.4=h9c3ff4c_1
- zlib=1.2.13=h5eee18b_0
- pip:
- absl-py==1.4.0
- accelerate==0.22.0
- aiofiles==23.2.1
- aiohttp==3.8.5
- aiosignal==1.3.1
- altair==5.0.1
- annotated-types==0.5.0
- antlr4-python3-runtime==4.9.3
- anyio==3.7.1
- async-timeout==4.0.3
- attrs==23.1.0
- cachetools==5.3.1
- certifi==2023.7.22
- charset-normalizer==3.2.0
- click==8.1.7
- cmake==3.27.2
- contourpy==1.1.0
- cycler==0.11.0
- datasets==2.14.4
- dill==0.3.7
- einops==0.6.1
- exceptiongroup==1.1.3
- fastapi==0.103.0
- ffmpy==0.3.1
- filelock==3.12.2
- fonttools==4.42.1
- frozenlist==1.4.0
- fsspec==2023.6.0
- google-auth==2.22.0
- google-auth-oauthlib==1.0.0
- gradio==3.41.2
- gradio-client==0.5.0
- grpcio==1.57.0
- h11==0.14.0
- httpcore==0.17.3
- httpx==0.24.1
- huggingface-hub==0.16.4
- idna==3.4
- importlib-metadata==6.8.0
- importlib-resources==6.0.1
- jinja2==3.1.2
- joblib==1.3.2
- jsonschema==4.19.0
- jsonschema-specifications==2023.7.1
- kiwisolver==1.4.5
- lightning-utilities==0.9.0
- lit==16.0.6
- markdown==3.4.4
- markupsafe==2.1.3
- matplotlib==3.7.2
- mpmath==1.3.0
- multidict==6.0.4
- multiprocess==0.70.15
- networkx==3.1
- numpy==1.24.4
- nvidia-cublas-cu11==11.10.3.66
- nvidia-cuda-cupti-cu11==11.7.101
- nvidia-cuda-nvrtc-cu11==11.7.99
- nvidia-cuda-runtime-cu11==11.7.99
- nvidia-cudnn-cu11==8.5.0.96
- nvidia-cufft-cu11==10.9.0.58
- nvidia-curand-cu11==10.2.10.91
- nvidia-cusolver-cu11==11.4.0.1
- nvidia-cusparse-cu11==11.7.4.91
- nvidia-nccl-cu11==2.14.3
- nvidia-nvtx-cu11==11.7.91
- oauthlib==3.2.2
- omegaconf==2.3.0
- opencv-python==4.8.0.76
- orjson==3.9.5
- pandas==2.0.3
- pillow==9.5.0
- pkgutil-resolve-name==1.3.10
- protobuf==4.24.2
- psutil==5.9.5
- pyarrow==13.0.0
- pyasn1==0.5.0
- pyasn1-modules==0.3.0
- pydantic==2.3.0
- pydantic-core==2.6.3
- pydub==0.25.1
- pyparsing==3.0.9
- python-multipart==0.0.6
- pytorch-lightning==2.0.7
- pytz==2023.3
- pyyaml==6.0.1
- referencing==0.30.2
- regex==2023.8.8
- requests==2.31.0
- requests-oauthlib==1.3.1
- rpds-py==0.9.2
- rsa==4.9
- safetensors==0.3.3
- semantic-version==2.10.0
- sniffio==1.3.0
- starlette==0.27.0
- sympy==1.12
- tensorboard==2.14.0
- tensorboard-data-server==0.7.1
- tokenizers==0.13.3
- toolz==0.12.0
- torchmetrics==1.1.0
- tqdm==4.66.1
- transformers==4.32.0
- triton==2.0.0
- tzdata==2023.3
- urllib3==1.26.16
- uvicorn==0.23.2
- websockets==11.0.3
- werkzeug==2.3.7
- xxhash==3.3.0
- yarl==1.9.2
- zipp==3.16.2
- decord
- imageio==2.9.0
- imageio-ffmpeg==0.4.3
- timm
- scipy
- scikit-image
- av
- imgaug
- lpips
- ffmpeg-python
- torch==2.0.1
- torchvision==0.15.2
- xformers==0.0.22
- diffusers==0.21.4
prefix: /home/tiger/miniconda3/envs/manimate