update

Commit 4cdc2890 authored Sep 03, 2024 by mashun1
10 changed files
--- a/.gitignore
+++ b/.gitignore
@@ -27,3 +27,6 @@ models/StableDiffusion/*
 !models/MotionLoRA/
 !models/MotionLoRA/*.txt
 openai/
+train_data.csv
+train_data
\ No newline at end of file
--- a/Dockerfile
+++ b/Dockerfile
-FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04.1-py39-latest
+FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
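The Dockerfile now targets the `dtk24.04.1` / Python 3.10 / torch 2.1.0 stack, and the README's Tips line says these versions must match exactly. The sketch below checks those versions from inside a container. It assumes the wheels are installed under the distribution names `torch` and `torchvision`, and that any DCU-specific build label shows up as a `+...` local version suffix. It does not check the DTK driver itself. The helper names are illustrative and are not part of the repository.

```python
import sys
from importlib import metadata
from typing import List

# Versions listed in the updated README for the dtk24.04.1 environment.
# The DTK driver is not checked here (assumed to come with the base image).
EXPECTED_PYTHON = (3, 10)
EXPECTED_PACKAGES = {"torch": "2.1.0", "torchvision": "0.16.0"}


def version_matches(installed: str, expected: str) -> bool:
    """Compare the public release part only, ignoring any '+...' local build tag."""
    return installed.split("+", 1)[0] == expected


def check_environment() -> List[str]:
    """Return a list of mismatches; an empty list means everything matches."""
    problems = []
    if tuple(sys.version_info[:2]) != EXPECTED_PYTHON:
        problems.append("python %d.%d != 3.10" % sys.version_info[:2])
    for pkg, expected in EXPECTED_PACKAGES.items():
        try:
            installed = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            problems.append(f"{pkg} is not installed")
            continue
        if not version_matches(installed, expected):
            problems.append(f"{pkg} {installed} != {expected}")
    return problems


if __name__ == "__main__":
    for message in check_environment() or ["environment matches the README versions"]:
        print(message)
```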
--- a/README.md
+++ b/README.md
@@ -30,8 +30,8 @@ $`\mathcal{E}`$ (Encoder, used to compress the original image), `Base T2I` (text gen
 ### Docker (Method 1)
-    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04.1-py39-latest
+    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
-    docker run --shm-size 10g --network=host --name=animatediff --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <project path (absolute)>:/home/ -it <your IMAGE ID> bash
+    docker run --shm-size 10g --network=host --name=animatediff --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro -v <project path (absolute)>:/home/ -it <your IMAGE ID> bash
    pip install -r requirements.txt
 ### Dockerfile (Method 2)
@@ -39,20 +39,17 @@ $`\mathcal{E}`$ (Encoder, used to compress the original image), `Base T2I` (text gen
    # Run from the corresponding directory
    docker build -t <IMAGE_NAME>:<TAG> .
    # Replace <your IMAGE ID> with the ID of the image built above
-    docker run -it --shm-size 10g --network=host --name=animatediff --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined <your IMAGE ID> bash
+    docker run --shm-size 10g --network=host --name=animatediff --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro -v <project path (absolute)>:/home/ -it <your IMAGE ID> bash
    pip install -r requirements.txt
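 ### Anaconda (Method 3)
 1. The special deep learning libraries that this project needs for DCU accelerators can be downloaded from the Guanghe developer community (光合开发者社区):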
 https://developer.hpccube.com/tool/
-    DTK driver: dtk23.04.1
+    DTK driver: dtk24.04.1
-    python: python3.9
+    python: python3.10
-    torch:1.13.1
+    torch:2.1.0
-    torchvision:0.14.1
+    torchvision:0.16.0
-    torchaudio:0.13.1
-    deepspeed:0.9.2
-    apex:0.1
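 Tips: the DCU-related tool versions above (DTK driver, python, torch, etc.) must match each other exactly.
@@ -60,39 +57,27 @@ Tips: the DCU-related tool versions above (DTK driver, python, torch, etc.) must strictly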
    pip install -r requirements.txt
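 ## Dataset
-The official data has been taken down. For training, prepare your own `text-video` data.
+The official data has been taken down. For training, prepare your own `text-video` data or use the dataset provided for this project.
-<!-- 2.5M - contains 2.5M samples (prompt-video)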
-http://www.robots.ox.ac.uk/~maxbain/webvid/results_2M_train.csv
-http://www.robots.ox.ac.uk/~maxbain/webvid/results_2M_val.csv
-10M - contains 10M samples (prompt-video)
+[Original link](fudan-fuxi/VIDGEN-1M) | [SCNet high-speed download](http://113.200.138.88:18080/aidatasets/fudan-fuxi/VIDGEN-1M)
-http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_train.csv
+This project provides a data-processing script; see `scripts/process_data.py` for usage.
-http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_val.csv
+    data.csv
-For details, see: https://github.com/m-bain/webvid
+    |caption|video_path|
+    |xxxxxxx|xxxxx.mp4|
-After downloading the `csv` files above, run `download.py` from the `webvid` project to download the corresponding video files.
-    data/
-    └── videos
-        ├── xxx.mp4
-        └── xxx.mp4
-    └── xxx.csv -->
 ## Training
 Once the data is ready, update the data paths in the `yaml` files under `configs/training`, as shown below.
    train_data:
-    csv_path:        "data/results_2M_val.csv"
+    csv_path:        "<path/to/data.csv>"
-    video_folder:    "data/videos"
+    video_folder:    ""
 ### Fine-tuning the original UNet layers (image layers)
@@ -104,17 +89,49 @@ http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_val.csv
 ## Inference
-### Model download
+    python -m scripts.animate --config configs/prompts/v1/v1-1-ToonYou.yaml --without-xformers
+    python -m scripts.animate --config configs/prompts/v1/v1-2-Lyriel.yaml --without-xformers
+    python -m scripts.animate --config configs/prompts/v2/v2-1-RealisticVision.yaml --without-xformers
+    python -m scripts.animate --config configs/prompts/v3/v3-1-T2V.yaml --without-xformers
+    python -m scripts.animate --config configs/prompts/v3/v3-2-animation-RealisticVision.yaml --without-xformers
+Note: these are only a few inference examples; you can modify the `yaml` files or write your own.
+## result
-https://huggingface.co/guoyww/animatediff/tree/main
+![Alt text](readme_imgs/sample.gif)
-https://civitai.com/models/4201?modelVersionId=130072
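+### Accuracy
+None
+## Application scenarios
+### Algorithm category
-https://civitai.com/models/30240?modelVersionId=125771
+`AIGC`
+### Key application industries
+`Media, research, education`
+## Pretrained weights
+### Model download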
-https://huggingface.co/openai/clip-vit-large-patch14/tree/main
+CLIP: [Original link](https://huggingface.co/openai/clip-vit-large-patch14/tree/main) | [SCNet high-speed download](http://113.200.138.88:18080/aimodels/clip-vit-large-patch14)
-You can replace `huggingface.co` with `hf-mirror.com` to speed up model downloads.
+DreamBooth_LoRA:
+- toonyou_beta6: [Original link](https://hf-mirror.com/frankjoshua/toonyou_beta6) | [SCNet high-speed download](http://113.200.138.88:18080/aimodels/frankjoshua/toonyou_beta6)
+- Others: [civitai](https://civitai.com/models)
+sd1.5: [Original link](https://hf-mirror.com/Jiali/stable-diffusion-1.5/tree/main) | [SCNet high-speed download](http://113.200.138.88:18080/aimodels/stable-diffusion-v1-5)
+Motion_Module: [Original link](https://huggingface.co/guoyww/animatediff/tree/main) | [SCNet high-speed download](http://113.200.138.88:18080/aimodels/AnimateDiff)
    openai/
    └── clip-vit-large-patch14
@@ -172,38 +189,6 @@ https://huggingface.co/openai/clip-vit-large-patch14/tree/main
 Note: the models above are not all required; this only shows the file layout. Pick a subset, or other models, as needed.
-### Commands
-    python -m scripts.animate --config configs/prompts/v1/v1-1-ToonYou.yaml --without-xformers
-    python -m scripts.animate --config configs/prompts/v1/v1-2-Lyriel.yaml --without-xformers
-    python -m scripts.animate --config configs/prompts/v2/v2-1-RealisticVision.yaml --without-xformers
-    python -m scripts.animate --config configs/prompts/v3/v3-1-T2V.yaml --without-xformers
-    python -m scripts.animate --config configs/prompts/v3/v3-2-animation-RealisticVision.yaml --without-xformers
-Note: these are only a few inference examples; you can modify the `yaml` files or write your own.
-## result
-![Alt text](readme_imgs/sample.gif)
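-### Accuracy
-None
-## Application scenarios
-### Algorithm category
-`AIGC`
-### Key application industries
-`Media, research, education`
 ## Source repository and issue reporting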
 https://developer.hpccube.com/codes/modelzoo/animatediff_pytorch
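The new `data.csv` layout has two columns, `caption` and `video_path`, and the training configs default to `csv_path: "train_data.csv"`. The sketch below produces such a file with Python's standard `csv` module. `scripts/process_data.py` is the repository's own route and is not reproduced here. The source layout in `pairs_from_folder`, where each `clip.mp4` has a sibling `clip.txt` holding its caption, is a hypothetical example. Both helper names are illustrative.

```python
import csv
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# Column names follow the data.csv layout in the README.
FIELDNAMES = ["caption", "video_path"]


def write_data_csv(rows: Iterable[Tuple[str, str]], csv_path: str) -> int:
    """Write (caption, video_path) pairs to csv_path and return the row count.

    csv.DictWriter quotes captions that contain commas, so they round-trip
    through csv.DictReader unchanged.
    """
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        for caption, video_path in rows:
            writer.writerow({"caption": caption, "video_path": video_path})
            count += 1
    return count


def pairs_from_folder(folder: str) -> Iterator[Tuple[str, str]]:
    """Hypothetical layout: clip.mp4 plus clip.txt (caption) in the same folder.

    Absolute paths are emitted because the dataset opens video_path as-is.
    """
    for video in sorted(Path(folder).glob("*.mp4")):
        caption_file = video.with_suffix(".txt")
        if caption_file.exists():
            yield caption_file.read_text(encoding="utf-8").strip(), str(video.resolve())
```

For example, `write_data_csv(pairs_from_folder("raw_clips"), "train_data.csv")` would produce the file the training configs point to, assuming a `raw_clips` folder in that layout.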

--- a/animatediff/data/dataset.py
+++ b/animatediff/data/dataset.py
@@ -77,6 +77,34 @@ class WebVid10M(Dataset):
        return sample
+class VIDGen(WebVid10M):
+    def get_batch(self, idx):
+        video_dict = self.dataset[idx]
+        # Caption and video path come straight from the CSV row; video_folder is not joined in.
+        name, video_dir = video_dict['caption'], video_dict['video_path']
+        video_reader = VideoReader(video_dir)
+        video_length = len(video_reader)
+        if not self.is_image:
+            clip_length = min(video_length, (self.sample_n_frames - 1) * self.sample_stride + 1)
+            start_idx   = random.randint(0, video_length - clip_length)
+            batch_index = np.linspace(start_idx, start_idx + clip_length - 1, self.sample_n_frames, dtype=int)
+        else:
+            batch_index = [random.randint(0, video_length - 1)]
+        pixel_values = torch.from_numpy(video_reader.get_batch(batch_index).asnumpy()).permute(0, 3, 1, 2).contiguous()
+        pixel_values = pixel_values / 255.
+        del video_reader
+        if self.is_image:
+            pixel_values = pixel_values[0]
+        return pixel_values, name
 if __name__ == "__main__":
    from animatediff.utils.util import save_videos_grid
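The frame-index arithmetic in `VIDGen.get_batch` can be checked on its own, without a video decoder. The function below is a standalone copy of that logic. The name `sample_frame_indices` and the injectable `rng` argument are additions for testing and do not exist in the repository. With `sample_stride: 4` from `training.yaml` and, for example, 16 frames, a clip spans (16 - 1) * 4 + 1 = 61 source frames.

```python
import random

import numpy as np


def sample_frame_indices(video_length, sample_n_frames, sample_stride,
                         is_image=False, rng=random):
    """Standalone copy of the index selection in VIDGen.get_batch."""
    if is_image:
        # Image fine-tuning: one random frame (randint is inclusive at both ends).
        return [rng.randint(0, video_length - 1)]
    # Span needed for sample_n_frames frames spaced sample_stride apart,
    # capped at the video length when the video is too short.
    clip_length = min(video_length, (sample_n_frames - 1) * sample_stride + 1)
    start_idx = rng.randint(0, video_length - clip_length)
    # Spread sample_n_frames integer indices evenly over the chosen span.
    return np.linspace(start_idx, start_idx + clip_length - 1,
                       sample_n_frames, dtype=int)
```

When a video has fewer frames than `sample_n_frames`, the capped span makes `linspace` repeat some indices rather than fail, so short clips yield duplicated frames.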

--- a/configs/prompts/v2/v2-1-RealisticVision.yaml
+++ b/configs/prompts/v2/v2-1-RealisticVision.yaml
@@ -2,6 +2,7 @@
  motion_module:    "models/Motion_Module/mm_sd_v15_v2.ckpt"
  dreambooth_path: "models/DreamBooth_LoRA/realisticVisionV51_v51VAE.safetensors"
+  # dreambooth_path: "models/DreamBooth_LoRA/toonyou_beta6.safetensors"
  lora_model_path: ""
  seed:           [13100322578370451493, 14752961627088720670, 9329399085567825781, 16987697414827649302]
@@ -10,12 +11,12 @@
  prompt:
    - "b&w photo of 42 y.o man in black clothes, bald, face, half body, body, high detailed skin, skin pores, coastline, overcast weather, wind, waves, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3"
-    - "close up photo of a rabbit, forest, haze, halation, bloom, dramatic atmosphere, centred, rule of thirds, 200mm 1.4f macro shot"
+    # - "close up photo of a rabbit, forest, haze, halation, bloom, dramatic atmosphere, centred, rule of thirds, 200mm 1.4f macro shot"
-    - "photo of coastline, rocks, storm weather, wind, waves, lightning, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3"
+    # - "photo of coastline, rocks, storm weather, wind, waves, lightning, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3"
-    - "night, b&w photo of old house, post apocalypse, forest, storm weather, wind, rocks, 8k uhd, dslr, soft lighting, high quality, film grain"
+    # - "night, b&w photo of old house, post apocalypse, forest, storm weather, wind, rocks, 8k uhd, dslr, soft lighting, high quality, film grain"
  n_prompt:
    - "semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck"
-    - "semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck"
+    # - "semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck"
-    - "blur, haze, deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, mutated hands and fingers, deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation"
+    # - "blur, haze, deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, mutated hands and fingers, deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation"
-    - "blur, haze, deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, art, mutated hands and fingers, deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation"
+    # - "blur, haze, deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, art, mutated hands and fingers, deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation"
--- a/configs/training/v1/image_finetune.yaml
+++ b/configs/training/v1/image_finetune.yaml
@@ -12,7 +12,7 @@ noise_scheduler_kwargs:
  clip_sample:         false
 train_data:
-  csv_path:        "data/results_2M_val.csv"
+  csv_path:        "train_data.csv"
  video_folder:    "data/videos"
  sample_size:  128

--- a/configs/training/v1/training.yaml
+++ b/configs/training/v1/training.yaml
@@ -28,7 +28,7 @@ noise_scheduler_kwargs:
  clip_sample:         false
 train_data:
-  csv_path:        "data/results_2M_val.csv"
+  csv_path:        "train_data.csv"
  video_folder:    "data/videos"
  sample_size:     128
  sample_stride:   4
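Both training configs now point `csv_path` at `train_data.csv` but keep `video_folder: "data/videos"`. The `VIDGen.get_batch` shown above opens `video_path` directly and never joins it with `video_folder`. As a result, every path in the CSV must resolve on its own, either as an absolute path or relative to the directory training is launched from. The sketch below runs a quick pre-flight check on the CSV. It assumes the file is read with `csv.DictReader`, and the helper name is illustrative.

```python
import csv
import os
from typing import List

REQUIRED_COLUMNS = {"caption", "video_path"}


def find_missing_videos(csv_path: str) -> List[str]:
    """Return video_path entries that do not exist as files.

    Relative paths are resolved against the current working directory,
    matching how VIDGen opens them without prepending video_folder.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = set(reader.fieldnames or [])
        if not REQUIRED_COLUMNS <= columns:
            missing = sorted(REQUIRED_COLUMNS - columns)
            raise ValueError(f"{csv_path} is missing columns: {missing}")
        return [row["video_path"] for row in reader
                if not os.path.isfile(row["video_path"])]
```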

--- a/icon.png
+++ b/icon.png
--- a/requirements.txt
+++ b/requirements.txt
 diffusers==0.11.1
 transformers==4.25.1
-xformers==0.0.16
+# xformers==0.0.16
 imageio==2.27.0
 decord==0.6.0
 gdown
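With `xformers==0.0.16` commented out of `requirements.txt`, every inference command in the README passes `--without-xformers`. The sketch below builds that command line and adds the flag only when `xformers` cannot be found. The helper names are hypothetical. Note that `importlib.util.find_spec` only confirms the package is importable, not that it works on DCU.

```python
import importlib.util
from typing import List, Optional


def xformers_installed() -> bool:
    # find_spec returns None when no top-level package by that name is importable.
    return importlib.util.find_spec("xformers") is not None


def animate_command(config: str, use_xformers: Optional[bool] = None) -> List[str]:
    """Build the scripts.animate invocation used in the README.

    --without-xformers is appended unless xformers is in use; pass
    use_xformers explicitly to override auto-detection.
    """
    if use_xformers is None:
        use_xformers = xformers_installed()
    cmd = ["python", "-m", "scripts.animate", "--config", config]
    if not use_xformers:
        cmd.append("--without-xformers")
    return cmd
```

The resulting list can be passed to `subprocess.run(..., check=True)` from the repository root, for example with `configs/prompts/v2/v2-1-RealisticVision.yaml`.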

--- a/train.py
+++ b/train.py
@@ -34,7 +34,7 @@ from diffusers.utils.import_utils import is_xformers_available
 import transformers
 from transformers import CLIPTextModel, CLIPTokenizer
-from animatediff.data.dataset import WebVid10M
+from animatediff.data.dataset import WebVid10M, VIDGen
 from animatediff.models.unet import UNet3DConditionModel
 from animatediff.pipelines.pipeline_animation import AnimationPipeline
 from animatediff.utils.util import save_videos_grid, zero_rank_print
@@ -228,7 +228,8 @@ def main(
    text_encoder.to(local_rank)
    # Get the training dataset
-    train_dataset = WebVid10M(**train_data, is_image=image_finetune)
+    # train_dataset = WebVid10M(**train_data, is_image=image_finetune)
+    train_dataset = VIDGen(**train_data, is_image=image_finetune)
    distributed_sampler = DistributedSampler(
        train_dataset,
        num_replicas=num_processes,
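`train.py` currently switches datasets by commenting out the `WebVid10M` line. One alternative is to select the class from the config. The sketch below does this under stated assumptions. The `dataset_type` key does not exist in the current YAML files and is hypothetical. `train_data` is assumed to behave like a mapping, which the existing `**train_data` call already requires. The key is removed before the remaining entries are forwarded, so the dataset constructor never receives it.

```python
from typing import Any, Callable, Dict


def build_train_dataset(train_data: Dict[str, Any], image_finetune: bool,
                        registry: Dict[str, Callable[..., Any]],
                        default: str = "VIDGen") -> Any:
    """Instantiate the dataset named by a hypothetical `dataset_type` key."""
    kwargs = dict(train_data)  # copy so the caller's config is not mutated
    name = kwargs.pop("dataset_type", default)
    if name not in registry:
        raise KeyError(f"unknown dataset_type {name!r}; choose from {sorted(registry)}")
    return registry[name](**kwargs, is_image=image_finetune)
```

In `train.py` this would be called as `build_train_dataset(train_data, image_finetune, {"WebVid10M": WebVid10M, "VIDGen": VIDGen})`, keeping `VIDGen` as the default so existing configs behave as they do after this commit.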