# AnimateDiff

## Paper

**AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning**

* https://arxiv.org/abs/2307.04725

## Model Architecture

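The framework consists of: $`\mathcal{E}`$ (encoder, compresses the original image), `Base T2I` (the base text-to-image model, e.g. Stable Diffusion), `Motion Modeling Module` (the newly inserted module that models motion across frames), `Personalized T2I` (a personalized text-to-image model, e.g. one trained with DreamBooth), and $`\mathcal{D}`$ (decoder, recovers/generates images).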

![Alt text](readme_imgs/image-1.png)

## Algorithm

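Purpose: animate personalized text-to-image models, producing high-quality animated clips without model-specific tuning.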

How it works:

1. Network inflation

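Every 2D convolution and attention layer of the original image model is converted into a spatial-only pseudo-3D layer: the frame axis is reshaped into the batch axis, so the network processes each frame independently. The newly inserted motion modules, in contrast, operate across frames within each batch item, giving the animation clip smooth motion and consistent content.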

![Alt text](readme_imgs/image-2.png)
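The frame-to-batch reshaping described above can be sketched in NumPy as follows. This is a minimal illustration assuming a `(batch, channels, frames, height, width)` layout; the function names are hypothetical and not taken from the AnimateDiff code base, which works on PyTorch tensors.

```python
import numpy as np

def frames_to_batch(video: np.ndarray) -> np.ndarray:
    """(b, c, f, h, w) -> (b*f, c, h, w): fold the frame axis into the batch axis."""
    b, c, f, h, w = video.shape
    # Put frames next to the batch axis, then merge the two axes.
    return video.transpose(0, 2, 1, 3, 4).reshape(b * f, c, h, w)

def batch_to_frames(images: np.ndarray, num_frames: int) -> np.ndarray:
    """(b*f, c, h, w) -> (b, c, f, h, w): split the batch axis back into batch and frames."""
    bf, c, h, w = images.shape
    return images.reshape(bf // num_frames, num_frames, c, h, w).transpose(0, 2, 1, 3, 4)

def pseudo_3d(layer_2d, video: np.ndarray) -> np.ndarray:
    """Apply an image (2D) layer to every frame independently."""
    num_frames = video.shape[2]
    return batch_to_frames(layer_2d(frames_to_batch(video)), num_frames)

def per_image_norm(x: np.ndarray) -> np.ndarray:
    # Stand-in for a 2D layer: subtract each image's own mean.
    return x - x.mean(axis=(1, 2, 3), keepdims=True)

if __name__ == "__main__":
    video = np.random.rand(2, 4, 8, 16, 16)  # (batch, channels, frames, height, width)
    out = pseudo_3d(per_image_norm, video)
    print(out.shape)  # (2, 4, 8, 16, 16)
```

Because the 2D layer only ever sees `(b*f, c, h, w)` inputs, its pretrained weights can be reused unchanged; any cross-frame interaction has to come from the separately inserted motion modules.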

2. Attention mechanism

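The motion module uses vanilla temporal attention, i.e. self-attention along the frame axis, to exchange information effectively between frames.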
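A minimal NumPy sketch of single-head temporal self-attention is shown below. It is a simplification, not AnimateDiff's motion module: the upstream module is a full PyTorch transformer block with extra components (for example normalization, output projections and position encoding over frames) that are omitted here, and the weight matrices `w_q`, `w_k`, `w_v` are hypothetical inputs. The key point it illustrates is that every spatial location becomes its own sequence of frame tokens, so attention mixes information only across frames.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(video, w_q, w_k, w_v):
    """Single-head self-attention along the frame axis only.

    video: (b, c, f, h, w); w_q, w_k, w_v: (c, c) projection matrices.
    """
    b, c, f, h, w = video.shape
    # Each (batch, y, x) location becomes a sequence of f tokens: (b*h*w, f, c).
    tokens = video.transpose(0, 3, 4, 2, 1).reshape(b * h * w, f, c)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(c)  # (b*h*w, f, f): frame-to-frame
    attended = softmax(scores, axis=-1) @ v          # (b*h*w, f, c)
    out = tokens + attended                          # residual connection
    # Restore the (b, c, f, h, w) layout.
    return out.reshape(b, h, w, f, c).transpose(0, 4, 3, 1, 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = 8
    video = rng.standard_normal((1, c, 16, 4, 4))
    w_q, w_k, w_v = (rng.standard_normal((c, c)) * 0.1 for _ in range(3))
    print(temporal_self_attention(video, w_q, w_k, w_v).shape)  # (1, 8, 16, 4, 4)
```

With `w_v` set to zero the block reduces to an identity map, which is one convenient way to insert such a module into a pretrained network without changing its initial output.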

## Environment Setup

### Docker (Option 1)

    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04.1-py39-latest
    docker run --shm-size 10g --network=host --name=animatediff --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <project path (absolute)>:/home/ -it <your IMAGE ID> bash
    pip install -r requirements.txt

### Dockerfile (Option 2)

    # Run from the directory that contains the Dockerfile
    docker build -t <IMAGE_NAME>:<TAG> .
    # Replace <your IMAGE ID> with the ID of the image built above
    docker run -it --shm-size 10g --network=host --name=animatediff --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined <your IMAGE ID> bash
    pip install -r requirements.txt

### Anaconda (Option 3)
1. The DCU-specific deep learning libraries required by this project can be downloaded from the HPCCube developer community:
https://developer.hpccube.com/tool/

    DTK driver: dtk23.04.1
    python: python3.9
    torch: 1.13.1
    torchvision: 0.14.1
    torchaudio: 0.13.1
    deepspeed: 0.9.2
    apex: 0.1

Tip: the versions of the DTK driver, Python, torch and the other DCU-related tools listed above must match one another exactly.
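Because exact version matching matters, a quick check of the installed Python packages can save debugging time. The helper below is a hypothetical sketch, not part of this repository: it reads installed versions with the standard-library `importlib.metadata`, strips any local build suffix after `+` (assuming DCU builds of torch may carry one), and compares the result against the list above. The DTK driver is not a pip package and is not checked.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions from the list above (pip distribution names).
EXPECTED = {
    "torch": "1.13.1",
    "torchvision": "0.14.1",
    "torchaudio": "0.13.1",
    "deepspeed": "0.9.2",
    "apex": "0.1",
}

def check_versions(expected=EXPECTED, get_version=version):
    """Return {package: installed version or None if missing} for every mismatch."""
    problems = {}
    for pkg, want in expected.items():
        try:
            got = get_version(pkg)
        except PackageNotFoundError:
            problems[pkg] = None
            continue
        # Ignore a local build suffix such as "1.13.1+<build tag>".
        if got.split("+")[0] != want:
            problems[pkg] = got
    return problems

if __name__ == "__main__":
    if sys.version_info[:2] != (3, 9):
        print("python:", sys.version.split()[0], "(expected 3.9)")
    for pkg, got in check_versions().items():
        print(f"{pkg}: expected {EXPECTED[pkg]}, found {got or 'not installed'}")
```

An empty report only means the pip-visible versions match; it does not verify that the installed wheels were built for dtk23.04.1.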

2. Install the remaining (non-DCU-specific) libraries from requirements.txt:

    pip install -r requirements.txt


## Dataset

2.5M - contains 2.5M (prompt, video) pairs

http://www.robots.ox.ac.uk/~maxbain/webvid/results_2M_train.csv

http://www.robots.ox.ac.uk/~maxbain/webvid/results_2M_val.csv

10M - contains 10M (prompt, video) pairs

http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_train.csv

http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_val.csv

For details, see: https://github.com/m-bain/webvid

After downloading the `csv` files above, run `download.py` from the `webvid` project to download the corresponding video files.

Note: this dataset is only used for training.

## Inference

### Model Download

https://huggingface.co/guoyww/animatediff/tree/main

https://civitai.com/models/4201?modelVersionId=130072

https://civitai.com/models/30240?modelVersionId=125771

https://huggingface.co/openai/clip-vit-large-patch14/tree/main

    openai/
    └── clip-vit-large-patch14
        ├── config.json
        ├── merges.txt
        ├── preprocessor_config.json
        ├── pytorch_model.bin
        ├── special_tokens_map.json
        ├── tokenizer_config.json
        ├── tokenizer.json
        └── vocab.json

    models/
    ├── DreamBooth_LoRA
    │   ├── lyriel_v16.safetensors
    │   ├── Put personalized T2I checkpoints here.txt
    │   ├── realisticVisionV51_v51VAE.safetensors
    │   ├── toonyou_beta3.safetensors
    │   └── toonyou_beta6.safetensors
    ├── MotionLoRA
    │   ├── Put MotionLoRA checkpoints here.txt
    │   └── v2_lora_ZoomIn.ckpt
    ├── Motion_Module
    │   ├── mm_sd_v14.ckpt
    │   ├── mm_sd_v15.ckpt
    │   ├── mm_sd_v15_v2.ckpt
    │   ├── Put motion module checkpoints here.txt
    │   ├── v3_sd15_adapter.ckpt
    │   └── v3_sd15_mm.ckpt
    ├── SparseCtrl
    │   └── v3_sd15_sparsectrl_rgb.ckpt
    └── StableDiffusion
        ├── Put diffusers stable-diffusion-v1-5 repo here.txt
        └── stable-diffusion-v1-5
            ├── feature_extractor
            │   └── preprocessor_config.json
            ├── model_index.json
            ├── scheduler
            │   └── scheduler_config.json
            ├── text_encoder
            │   ├── config.json
            │   └── pytorch_model.bin
            ├── tokenizer
            │   ├── merges.txt
            │   ├── special_tokens_map.json
            │   ├── tokenizer_config.json
            │   └── vocab.json
            ├── unet
            │   ├── config.json
            │   └── diffusion_pytorch_model.bin
            ├── v1-5-pruned.ckpt
            └── vae
                ├── config.json
                └── diffusion_pytorch_model.bin

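Note: not all of the models above are required; the trees only illustrate the expected file layout. Download the subset (or other models) that your configuration needs.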
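Since only part of the tree is needed, it can help to confirm that the files a given config refers to are in place before running inference. A minimal sketch, assuming `models/` and `openai/` sit in the project root as in the trees above; `missing_files` and the `REQUIRED` entries are hypothetical examples, so replace them with the paths your `yaml` config actually references.

```python
from pathlib import Path

# Hypothetical example entries taken from the trees above; replace them with the
# files referenced by the yaml config you plan to run.
REQUIRED = [
    "models/StableDiffusion/stable-diffusion-v1-5/model_index.json",
    "models/Motion_Module/mm_sd_v15_v2.ckpt",
    "openai/clip-vit-large-patch14/config.json",
]

def missing_files(root=".", required=REQUIRED):
    """Return the entries of `required` that do not exist as files under `root`."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).is_file()]

if __name__ == "__main__":
    # Run from the project root (assumed location of models/ and openai/).
    for rel in missing_files():
        print("missing:", rel)
```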

### Commands

    python -m scripts.animate --config configs/prompts/v1/v1-1-ToonYou.yaml --without-xformers

    python -m scripts.animate --config configs/prompts/v1/v1-2-Lyriel.yaml --without-xformers

    python -m scripts.animate --config configs/prompts/v2/v2-1-RealisticVision.yaml --without-xformers
 
    python -m scripts.animate --config configs/prompts/v3/v3-1-T2V.yaml --without-xformers

    python -m scripts.animate --config configs/prompts/v3/v3-2-animation-RealisticVision.yaml --without-xformers

Note: these are only a few inference examples; you can modify the existing `yaml` files or write your own.

## Training

### Data Layout

    data/
    ├── videos
    │   ├── xxx.mp4
    │   └── xxx.mp4
    └── xxx.csv

Note: once the data is ready, update the data paths in the `yaml` files under `configs/training`, for example:

    train_data:
      csv_path:        "data/results_2M_val.csv"
      video_folder:    "data/videos"
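If only part of WebVid was downloaded, the csv will contain rows without a matching video. The helper below is a hypothetical sketch, not part of this repository, that writes a filtered copy of the csv keeping only rows whose `<videoid>.mp4` exists in `video_folder`. It assumes the csv has a `videoid` column and that videos are stored flat as `data/videos/<videoid>.mp4`, as in the layout above; if your downloader stores videos in sub-folders, adapt the path accordingly.

```python
import csv
from pathlib import Path

def filter_downloaded(csv_path, video_folder, out_csv):
    """Copy csv_path to out_csv, keeping only rows whose <videoid>.mp4 exists.

    Returns (rows kept, rows read). Assumes a `videoid` column and a flat video folder.
    """
    video_folder = Path(video_folder)
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    kept = [row for row in rows if (video_folder / f"{row['videoid']}.mp4").is_file()]
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept), len(rows)

if __name__ == "__main__":
    kept, total = filter_downloaded(
        "data/results_2M_val.csv", "data/videos", "data/results_2M_val_downloaded.csv"
    )
    print(f"kept {kept} of {total} rows")
```

Point `csv_path` in the training config at the filtered file afterwards.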

### Fine-tune the original UNet layers (image layers)

    torchrun --nnodes=1 --nproc_per_node=1 train.py --config configs/training/v1/image_finetune.yaml

### Train the motion modules

    torchrun --nnodes=1 --nproc_per_node=1 train.py --config configs/training/v1/training.yaml

## Results

![Alt text](readme_imgs/sample.gif)

### Accuracy

None

## Application Scenarios

### Algorithm Category

`AIGC`

### Key Application Industries

`Media, Research, Education`

## Source Repository and Issue Feedback

https://developer.hpccube.com/codes/modelzoo/animatediff_pytorch

## References

* https://github.com/guoyww/AnimateDiff

* https://github.com/m-bain/webvid