Commit 4cdc2890 authored by mashun1

update

parent 040d4074
...@@ -27,3 +27,6 @@ models/StableDiffusion/* ...@@ -27,3 +27,6 @@ models/StableDiffusion/*
!models/MotionLoRA/ !models/MotionLoRA/
!models/MotionLoRA/*.txt !models/MotionLoRA/*.txt
openai/ openai/
train_data.csv
train_data
\ No newline at end of file
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04.1-py39-latest FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
...@@ -30,8 +30,8 @@ $`\mathcal{E}`$ (Encoder, used to compress the original image), `Base T2I` (text gen… ...@@ -30,8 +30,8 @@ $`\mathcal{E}`$ (Encoder, used to compress the original image), `Base T2I` (text gen…
### Docker (Method 1) ### Docker (Method 1)
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04.1-py39-latest docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
docker run --shm-size 10g --network=host --name=animatediff --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -it <your IMAGE ID> bash docker run --shm-size 10g --network=host --name=animatediff --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro -v <absolute path to the project>:/home/ -it <your IMAGE ID> bash
pip install -r requirements.txt pip install -r requirements.txt
### Dockerfile (Method 2) ### Dockerfile (Method 2)
...@@ -39,20 +39,17 @@ $`\mathcal{E}`$ (Encoder, used to compress the original image), `Base T2I` (text gen… ...@@ -39,20 +39,17 @@ $`\mathcal{E}`$ (Encoder, used to compress the original image), `Base T2I` (text gen…
# Must be run from the corresponding directory # Must be run from the corresponding directory
docker build -t <IMAGE_NAME>:<TAG> . docker build -t <IMAGE_NAME>:<TAG> .
# Replace <your IMAGE ID> with the ID of the Docker image pulled above # Replace <your IMAGE ID> with the ID of the Docker image pulled above
docker run -it --shm-size 10g --network=host --name=animatediff --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined <your IMAGE ID> bash docker run --shm-size 10g --network=host --name=animatediff --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro -v <absolute path to the project>:/home/ -it <your IMAGE ID> bash
pip install -r requirements.txt pip install -r requirements.txt
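### Anaconda (Method 3) ### Anaconda (Method 3)
1. The special deep learning libraries this project requires for DCU GPUs can be downloaded and installed from the Guanghe developer community (光合开发者社区): 1. The special deep learning libraries this project requires for DCU GPUs can be downloaded and installed from the Guanghe developer community (光合开发者社区):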
https://developer.hpccube.com/tool/ https://developer.hpccube.com/tool/
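DTK driver: dtk23.04.1 DTK driver: dtk24.04.1
python: python3.9 python: python3.10
torch: 1.13.1 torch: 2.1.0
torchvision: 0.14.1 torchvision: 0.16.0
torchaudio: 0.13.1
deepspeed: 0.9.2
apex: 0.1
Tips: the versions of the DCU-related tools above (DTK driver, python, torch, etc.) must match one another exactly Tips: the versions of the DCU-related tools above (DTK driver, python, torch, etc.) must match one another exactly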
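The version-matching tip can be checked mechanically. The following is a minimal sketch, not part of the commit: it compares installed package versions against the values in the updated README. The helper names are hypothetical, and the prefix comparison is an assumption made because DCU builds may append a local suffix to the version string.

```python
from importlib import metadata

# Versions taken from the updated README. Matching on a prefix is an
# assumption: vendor builds may append a local suffix such as "+<build>".
EXPECTED = {"torch": "2.1.0", "torchvision": "0.16.0"}


def find_mismatches(expected, installed):
    """Return {package: (expected, found)} for each package that does not match."""
    mismatches = {}
    for name, want in expected.items():
        found = installed.get(name)
        if found is None or not found.startswith(want):
            mismatches[name] = (want, found)
    return mismatches


def installed_versions(names):
    """Look up the installed version of each package, or None if it is absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # An empty dict means every listed package matches the README.
    print(find_mismatches(EXPECTED, installed_versions(EXPECTED)))
```

The DTK driver version is not a Python package, so this sketch does not cover it.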
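...@@ -60,39 +57,27 @@ Tips: the versions of the DCU-related tools above (DTK driver, python, torch, etc.) must strictly… ...@@ -60,39 +57,27 @@ Tips: the versions of the DCU-related tools above (DTK driver, python, torch, etc.) must strictly…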
pip install -r requirements.txt pip install -r requirements.txt
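## Dataset ## Dataset
The official dataset has been taken down; to train, prepare your own `text-video` data. The official dataset has been taken down; to train, prepare your own `text-video` data or use the dataset provided by this project.
<!-- 2.5M - contains 2.5M samples (prompt-video)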
http://www.robots.ox.ac.uk/~maxbain/webvid/results_2M_train.csv
http://www.robots.ox.ac.uk/~maxbain/webvid/results_2M_val.csv
10M - contains 10M samples (prompt-video) [Original link](fudan-fuxi/VIDGEN-1M) | [SCNet high-speed download channel](http://113.200.138.88:18080/aidatasets/fudan-fuxi/VIDGEN-1M)
http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_train.csv This project provides a script for processing the data; see `scripts/process_data.py` for usage.
http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_val.csv data.csv
For details, see: https://github.com/m-bain/webvid |caption|video_path|
|xxxxxxx|xxxxx.mp4|
After downloading the `csv` files above, run `download.py` from the `webvid` project to download the corresponding video files.
data/
└── videos
├── xxx.mp4
└── xxx.mp4
└── xxx.csv -->
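For readers preparing their own data, the `caption` / `video_path` layout shown for `data.csv` can be produced and checked with the standard `csv` module. This is a hedged sketch and not the project's `scripts/process_data.py`: the function names are hypothetical, and it assumes a comma-separated file with a header row, which is what a `csv.DictReader`-based loader would expect.

```python
import csv
from pathlib import Path

# Column names taken from the README's data.csv example.
FIELDNAMES = ["caption", "video_path"]


def write_training_csv(pairs, csv_path):
    """Write (caption, video_path) pairs to csv_path; return the row count."""
    rows = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        for caption, video_path in pairs:
            writer.writerow({"caption": caption, "video_path": str(video_path)})
            rows += 1
    return rows


def find_missing_videos(csv_path):
    """Return every video_path listed in csv_path that is not a file on disk."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [
            row["video_path"]
            for row in csv.DictReader(f)
            if not Path(row["video_path"]).is_file()
        ]
```

Running `find_missing_videos` before training surfaces broken paths early, instead of failing inside the data loader.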
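## Training ## Training
After preparing the data, update the data paths in the `yaml` files under `configs/training`, as shown below. After preparing the data, update the data paths in the `yaml` files under `configs/training`, as shown below.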
train_data: train_data:
csv_path: "data/results_2M_val.csv" csv_path: "<path/to/data.csv>"
video_folder: "data/videos" video_folder: ""
### Fine-tuning the original UNet layers (image layers) ### Fine-tuning the original UNet layers (image layers)
...@@ -104,17 +89,49 @@ http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_val.csv ...@@ -104,17 +89,49 @@ http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_val.csv
## Inference ## Inference
### Model download python -m scripts.animate --config configs/prompts/v1/v1-1-ToonYou.yaml --without-xformers
python -m scripts.animate --config configs/prompts/v1/v1-2-Lyriel.yaml --without-xformers
python -m scripts.animate --config configs/prompts/v2/v2-1-RealisticVision.yaml --without-xformers
python -m scripts.animate --config configs/prompts/v3/v3-1-T2V.yaml --without-xformers
python -m scripts.animate --config configs/prompts/v3/v3-2-animation-RealisticVision.yaml --without-xformers
Note: the commands above are only a few inference examples; you can modify them or write your own `yaml` files.
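When running several prompt configs in a row, the commands above can be assembled programmatically. The sketch below is an illustration under stated assumptions, not part of the repository: `build_animate_command` and `run_configs` are hypothetical helpers, and the only flags used are `--config` and `--without-xformers`, both taken from the commands above. It adds `--without-xformers` only when the `xformers` package cannot be imported.

```python
import importlib.util
import subprocess
import sys


def build_animate_command(config_path, xformers_available=None):
    """Build the argument list for one `python -m scripts.animate` run."""
    if xformers_available is None:
        # Detect xformers without importing it.
        xformers_available = importlib.util.find_spec("xformers") is not None
    command = [sys.executable, "-m", "scripts.animate", "--config", config_path]
    if not xformers_available:
        command.append("--without-xformers")
    return command


def run_configs(config_paths):
    """Run each config in turn; stop at the first failing command."""
    for config_path in config_paths:
        subprocess.run(build_animate_command(config_path), check=True)
```

For example, `run_configs(["configs/prompts/v1/v1-1-ToonYou.yaml", "configs/prompts/v3/v3-1-T2V.yaml"])` would run two of the listed configs from the repository root.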
## result
https://huggingface.co/guoyww/animatediff/tree/main ![Alt text](readme_imgs/sample.gif)
https://civitai.com/models/4201?modelVersionId=130072 ### Accuracy
## Application scenarios
### Algorithm category
https://civitai.com/models/30240?modelVersionId=125771 `AIGC`
### Key application industries
`Media, scientific research, education`
## Pretrained weights
### Model download
https://huggingface.co/openai/clip-vit-large-patch14/tree/main CLIP: [Original link](https://huggingface.co/openai/clip-vit-large-patch14/tree/main) | [SCNet high-speed download channel](http://113.200.138.88:18080/aimodels/clip-vit-large-patch14)
You can replace `huggingface.co` with `hf-mirror.com` to speed up model downloads. DreamBooth_LORA:
- toonyou_beta6: [Original link](https://hf-mirror.com/frankjoshua/toonyou_beta6) | [SCNet high-speed download channel](http://113.200.138.88:18080/aimodels/frankjoshua/toonyou_beta6)
- Others: [civitai](https://civitai.com/models)
sd1.5: [Original link](https://hf-mirror.com/Jiali/stable-diffusion-1.5/tree/main) | [SCNet high-speed download channel](http://113.200.138.88:18080/aimodels/stable-diffusion-v1-5)
Motion_Module: [Original link](https://huggingface.co/guoyww/animatediff/tree/main) | [SCNet high-speed download channel](http://113.200.138.88:18080/aimodels/AnimateDiff)
openai/ openai/
└── clip-vit-large-patch14 └── clip-vit-large-patch14
...@@ -172,38 +189,6 @@ https://huggingface.co/openai/clip-vit-large-patch14/tree/main ...@@ -172,38 +189,6 @@ https://huggingface.co/openai/clip-vit-large-patch14/tree/main
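Note: none of the models above is required; they only illustrate the file structure. Choose a subset or other models as needed. Note: none of the models above is required; they only illustrate the file structure. Choose a subset or other models as needed.
### Commands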
python -m scripts.animate --config configs/prompts/v1/v1-1-ToonYou.yaml --without-xformers
python -m scripts.animate --config configs/prompts/v1/v1-2-Lyriel.yaml --without-xformers
python -m scripts.animate --config configs/prompts/v2/v2-1-RealisticVision.yaml --without-xformers
python -m scripts.animate --config configs/prompts/v3/v3-1-T2V.yaml --without-xformers
python -m scripts.animate --config configs/prompts/v3/v3-2-animation-RealisticVision.yaml --without-xformers
Note: the commands above are only a few inference examples; you can modify them or write your own `yaml` files.
## result
![Alt text](readme_imgs/sample.gif)
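### Accuracy
## Application scenarios
### Algorithm category
`AIGC`
### Key application industries
`Media, scientific research, education`
## Source repository and issue feedback ## Source repository and issue feedback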
https://developer.hpccube.com/codes/modelzoo/animatediff_pytorch https://developer.hpccube.com/codes/modelzoo/animatediff_pytorch
......
...@@ -77,6 +77,34 @@ class WebVid10M(Dataset): ...@@ -77,6 +77,34 @@ class WebVid10M(Dataset):
return sample return sample
class VIDGen(WebVid10M):
    def get_batch(self, idx):
        video_dict = self.dataset[idx]
        # videoid, name, page_dir = video_dict['videoid'], video_dict['name'], video_dict['page_dir']
        name, video_dir = video_dict['caption'], video_dict['video_path']
        # video_dir = os.path.join(self.video_folder, f"{videoid}.mp4")
        video_reader = VideoReader(video_dir)
        video_length = len(video_reader)
        if not self.is_image:
            clip_length = min(video_length, (self.sample_n_frames - 1) * self.sample_stride + 1)
            start_idx = random.randint(0, video_length - clip_length)
            batch_index = np.linspace(start_idx, start_idx + clip_length - 1, self.sample_n_frames, dtype=int)
        else:
            batch_index = [random.randint(0, video_length - 1)]
        pixel_values = torch.from_numpy(video_reader.get_batch(batch_index).asnumpy()).permute(0, 3, 1, 2).contiguous()
        pixel_values = pixel_values / 255.
        del video_reader
        if self.is_image:
            pixel_values = pixel_values[0]
        return pixel_values, name
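The frame-index arithmetic in `get_batch` can be studied in isolation. The sketch below reproduces only the `clip_length` / `start_idx` / `np.linspace` steps from the added class; the standalone function `sample_frame_indices` and its `rng` parameter are illustrative additions, not names from the repository.

```python
import random

import numpy as np


def sample_frame_indices(video_length, sample_n_frames, sample_stride, rng=random):
    """Pick sample_n_frames indices from a random window of the video.

    Mirrors the non-image branch of get_batch: the window spans at most
    (sample_n_frames - 1) * sample_stride + 1 frames and is clamped to the
    video length, so short videos yield repeated indices.
    """
    clip_length = min(video_length, (sample_n_frames - 1) * sample_stride + 1)
    start_idx = rng.randint(0, video_length - clip_length)
    return np.linspace(start_idx, start_idx + clip_length - 1, sample_n_frames, dtype=int)


if __name__ == "__main__":
    # A 61-frame video with 16 frames at stride 4 has exactly one valid window.
    print(sample_frame_indices(61, 16, 4).tolist())
```

With 16 frames and a stride of 4 the window is 61 frames long, so any video shorter than that is sampled more densely than the configured stride, and very short videos return duplicate frames.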
if __name__ == "__main__": if __name__ == "__main__":
from animatediff.utils.util import save_videos_grid from animatediff.utils.util import save_videos_grid
......
...@@ -2,6 +2,7 @@ ...@@ -2,6 +2,7 @@
motion_module: "models/Motion_Module/mm_sd_v15_v2.ckpt" motion_module: "models/Motion_Module/mm_sd_v15_v2.ckpt"
dreambooth_path: "models/DreamBooth_LoRA/realisticVisionV51_v51VAE.safetensors" dreambooth_path: "models/DreamBooth_LoRA/realisticVisionV51_v51VAE.safetensors"
# dreambooth_path: "models/DreamBooth_LoRA/toonyou_beta6.safetensors"
lora_model_path: "" lora_model_path: ""
seed: [13100322578370451493, 14752961627088720670, 9329399085567825781, 16987697414827649302] seed: [13100322578370451493, 14752961627088720670, 9329399085567825781, 16987697414827649302]
...@@ -10,12 +11,12 @@ ...@@ -10,12 +11,12 @@
prompt: prompt:
- "b&w photo of 42 y.o man in black clothes, bald, face, half body, body, high detailed skin, skin pores, coastline, overcast weather, wind, waves, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3" - "b&w photo of 42 y.o man in black clothes, bald, face, half body, body, high detailed skin, skin pores, coastline, overcast weather, wind, waves, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3"
- "close up photo of a rabbit, forest, haze, halation, bloom, dramatic atmosphere, centred, rule of thirds, 200mm 1.4f macro shot" # - "close up photo of a rabbit, forest, haze, halation, bloom, dramatic atmosphere, centred, rule of thirds, 200mm 1.4f macro shot"
- "photo of coastline, rocks, storm weather, wind, waves, lightning, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3" # - "photo of coastline, rocks, storm weather, wind, waves, lightning, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3"
- "night, b&w photo of old house, post apocalypse, forest, storm weather, wind, rocks, 8k uhd, dslr, soft lighting, high quality, film grain" # - "night, b&w photo of old house, post apocalypse, forest, storm weather, wind, rocks, 8k uhd, dslr, soft lighting, high quality, film grain"
n_prompt: n_prompt:
- "semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck" - "semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck"
- "semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck" # - "semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck"
- "blur, haze, deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, mutated hands and fingers, deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation" # - "blur, haze, deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, mutated hands and fingers, deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation"
- "blur, haze, deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, art, mutated hands and fingers, deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation" # - "blur, haze, deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, art, mutated hands and fingers, deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation"
...@@ -12,7 +12,7 @@ noise_scheduler_kwargs: ...@@ -12,7 +12,7 @@ noise_scheduler_kwargs:
clip_sample: false clip_sample: false
train_data: train_data:
csv_path: "data/results_2M_val.csv" csv_path: "train_data.csv"
video_folder: "data/videos" video_folder: "data/videos"
sample_size: 128 sample_size: 128
......
...@@ -28,7 +28,7 @@ noise_scheduler_kwargs: ...@@ -28,7 +28,7 @@ noise_scheduler_kwargs:
clip_sample: false clip_sample: false
train_data: train_data:
csv_path: "data/results_2M_val.csv" csv_path: "train_data.csv"
video_folder: "data/videos" video_folder: "data/videos"
sample_size: 128 sample_size: 128
sample_stride: 4 sample_stride: 4
......
icon.png

68.4 KB

diffusers==0.11.1 diffusers==0.11.1
transformers==4.25.1 transformers==4.25.1
xformers==0.0.16 # xformers==0.0.16
imageio==2.27.0 imageio==2.27.0
decord==0.6.0 decord==0.6.0
gdown gdown
......
...@@ -34,7 +34,7 @@ from diffusers.utils.import_utils import is_xformers_available ...@@ -34,7 +34,7 @@ from diffusers.utils.import_utils import is_xformers_available
import transformers import transformers
from transformers import CLIPTextModel, CLIPTokenizer from transformers import CLIPTextModel, CLIPTokenizer
from animatediff.data.dataset import WebVid10M from animatediff.data.dataset import WebVid10M, VIDGen
from animatediff.models.unet import UNet3DConditionModel from animatediff.models.unet import UNet3DConditionModel
from animatediff.pipelines.pipeline_animation import AnimationPipeline from animatediff.pipelines.pipeline_animation import AnimationPipeline
from animatediff.utils.util import save_videos_grid, zero_rank_print from animatediff.utils.util import save_videos_grid, zero_rank_print
...@@ -228,7 +228,8 @@ def main( ...@@ -228,7 +228,8 @@ def main(
text_encoder.to(local_rank) text_encoder.to(local_rank)
# Get the training dataset # Get the training dataset
train_dataset = WebVid10M(**train_data, is_image=image_finetune) # train_dataset = WebVid10M(**train_data, is_image=image_finetune)
train_dataset = VIDGen(**train_data, is_image=image_finetune)
distributed_sampler = DistributedSampler( distributed_sampler = DistributedSampler(
train_dataset, train_dataset,
num_replicas=num_processes, num_replicas=num_processes,
......
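The change above switches datasets by commenting out one constructor call and adding another. One possible alternative, sketched here purely as an illustration, is to select the class from the config. Everything in this block is hypothetical: the `dataset_type` key does not exist in the repository's configs, and the two classes are stand-ins for `animatediff.data.dataset.WebVid10M` and `VIDGen` so that the example runs on its own.

```python
class WebVid10M:
    """Stand-in for animatediff.data.dataset.WebVid10M (illustration only)."""

    def __init__(self, csv_path, video_folder, is_image=False, **kwargs):
        self.csv_path = csv_path
        self.video_folder = video_folder
        self.is_image = is_image


class VIDGen(WebVid10M):
    """Stand-in for the VIDGen subclass added in this commit."""


DATASET_CLASSES = {"WebVid10M": WebVid10M, "VIDGen": VIDGen}


def build_dataset(train_data, is_image):
    """Instantiate the dataset named by the hypothetical `dataset_type` key."""
    options = dict(train_data)  # copy so the caller's config is not mutated
    name = options.pop("dataset_type", "WebVid10M")
    if name not in DATASET_CLASSES:
        raise ValueError(f"unknown dataset_type: {name!r}")
    return DATASET_CLASSES[name](**options, is_image=is_image)
```

With such a helper, the training script would call `build_dataset(train_data, is_image=image_finetune)` and the choice of dataset would live in the `yaml` file rather than in code.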