ModelZoo / AnimateDiff_pytorch
Commit 4cdc2890, authored Sep 03, 2024 by mashun1
update
Parent: 040d4074
Showing 10 changed files with 98 additions and 80 deletions (+98 -80)
.gitignore (+3 -0)
Dockerfile (+1 -1)
README.md (+53 -68)
animatediff/data/dataset.py (+28 -0)
configs/prompts/v2/v2-1-RealisticVision.yaml (+7 -6)
configs/training/v1/image_finetune.yaml (+1 -1)
configs/training/v1/training.yaml (+1 -1)
icon.png (+0 -0)
requirements.txt (+1 -1)
train.py (+3 -2)
.gitignore
@@ -27,3 +27,6 @@ models/StableDiffusion/*
 !models/MotionLoRA/
 !models/MotionLoRA/*.txt
 openai/
+train_data.csv
+train_data
\ No newline at end of file
Dockerfile
-FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04.1-py39-latest
+FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
README.md
@@ -30,8 +30,8 @@ $`\mathcal{E}`$ (Encoder, used to compress the original image), `Base T2I` (text gen
 ### Docker (Method 1)
-docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04.1-py39-latest
+docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
-docker run --shm-size 10g --network=host --name=animatediff --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to project>:/home/ -it <your IMAGE ID> bash
+docker run --shm-size 10g --network=host --name=animatediff --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro -v <absolute path to project>:/home/ -it <your IMAGE ID> bash
 pip install -r requirements.txt
 ### Dockerfile (Method 2)
@@ -39,20 +39,17 @@ $`\mathcal{E}`$ (Encoder, used to compress the original image), `Base T2I` (text gen
 # Must be run from the corresponding directory
 docker build -t <IMAGE_NAME>:<TAG> .
 # Replace <your IMAGE ID> with the ID of the docker image obtained above
-docker run -it --shm-size 10g --network=host --name=animatediff --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined <your IMAGE ID> bash
+docker run --shm-size 10g --network=host --name=animatediff --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro -v <absolute path to project>:/home/ -it <your IMAGE ID> bash
 pip install -r requirements.txt
 ### Anaconda (Method 3)
 1. The special deep learning libraries this project needs for DCU GPUs can be downloaded and installed from the Guanghe (光合) developer community:
 https://developer.hpccube.com/tool/
-DTK driver: dtk23.04.1
+DTK driver: dtk24.04.1
-python: python3.9
+python: python3.10
-torch: 1.13.1
+torch: 2.1.0
-torchvision: 0.14.1
+torchvision: 0.16.0
-torchaudio: 0.13.1
-deepspeed: 0.9.2
-apex: 0.1
 Tips: the versions of the DTK driver, python, torch, and other DCU-related tools above must correspond strictly one to one
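Because the README asks for the DTK, python, and torch versions to line up exactly, a quick check before training can save a failed run. The following is a minimal sketch, not part of the commit: the pins are copied from the updated table, while the helper names and the rule of ignoring a local version suffix (the part after `+`) are assumptions.

```python
import sys
from importlib import metadata

# Pins copied from the updated README table. Ignoring a "+local" suffix
# when comparing is an assumption of this sketch.
EXPECTED = {"torch": "2.1.0", "torchvision": "0.16.0"}
EXPECTED_PYTHON = (3, 10)


def version_matches(installed: str, expected: str) -> bool:
    """True if `installed` equals `expected` once any '+local' suffix is dropped."""
    return installed.split("+", 1)[0] == expected


def check_environment() -> list:
    """Return human-readable mismatches; an empty list means the pins are met."""
    problems = []
    if sys.version_info[:2] != EXPECTED_PYTHON:
        problems.append(
            f"python {sys.version_info[0]}.{sys.version_info[1]} != 3.10"
        )
    for name, expected in EXPECTED.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not version_matches(installed, expected):
            problems.append(f"{name} {installed} != {expected}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the README pins"]:
        print(line)
```

The DTK driver version is not visible to `importlib.metadata`, so it still has to be checked by hand.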
@@ -60,39 +57,27 @@ Tips: the versions of the DTK driver, python, torch, and other DCU-related tools above must strictly
 pip install -r requirements.txt
 ## Dataset
-The official dataset has been taken down; to train, prepare your own `text-video` data.
+The official dataset has been taken down; to train, prepare your own `text-video` data or use the dataset provided by this project.
-<!-- 2.5M - contains 2.5M samples (prompt-video)
-http://www.robots.ox.ac.uk/~maxbain/webvid/results_2M_train.csv
-http://www.robots.ox.ac.uk/~maxbain/webvid/results_2M_val.csv
-10M - contains 10M samples (prompt-video)
-http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_train.csv
-http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_val.csv
-For details, see: https://github.com/m-bain/webvid
-After downloading the `csv` files above, run `download.py` from the `webvid` project to download the corresponding video files.
-data/
-└── videos
-    ├── xxx.mp4
-    └── xxx.mp4
-└── xxx.csv -->
+[Original link](fudan-fuxi/VIDGEN-1M) | [SCNet high-speed download channel](http://113.200.138.88:18080/aidatasets/fudan-fuxi/VIDGEN-1M)
+This project provides a script for processing the data; see `scripts/process_data.py` for usage.
+data.csv
+|caption|video_path|
+|xxxxxxx|xxxxx.mp4|
 ## Training
 Once the data is ready, update the data paths in the `yaml` files under `configs/training`, as shown below.
 train_data:
-  csv_path: "data/results_2M_val.csv"
+  csv_path: "<path/to/data.csv>"
-  video_folder: "data/videos"
+  video_folder: ""
 ### Fine-tuning the original UNet layers (image layers)
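The `data.csv` layout described in this hunk (a `caption` column and a `video_path` column) can be produced with the standard library. This is a minimal sketch under stated assumptions: `write_manifest` and `read_manifest` are hypothetical helpers, and this is not the project's `scripts/process_data.py`, whose contents are not shown in this commit.

```python
import csv


def write_manifest(rows, csv_path):
    """Write (caption, video_path) pairs using the header the README shows."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["caption", "video_path"])
        writer.writeheader()
        for caption, video_path in rows:
            writer.writerow({"caption": caption, "video_path": str(video_path)})


def read_manifest(csv_path):
    """Read the manifest back as a list of {'caption', 'video_path'} dicts."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Since `VIDGen.get_batch` in this commit passes `video_path` straight to `VideoReader` and no longer joins it with `video_folder`, each path in the CSV should resolve on its own, for example as an absolute path.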
@@ -104,17 +89,49 @@ http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_val.csv
 ## Inference
-### Model download
-https://huggingface.co/guoyww/animatediff/tree/main
-https://civitai.com/models/4201?modelVersionId=130072
-https://civitai.com/models/30240?modelVersionId=125771
-https://huggingface.co/openai/clip-vit-large-patch14/tree/main
+python -m scripts.animate --config configs/prompts/v1/v1-1-ToonYou.yaml --without-xformers
+python -m scripts.animate --config configs/prompts/v1/v1-2-Lyriel.yaml --without-xformers
+python -m scripts.animate --config configs/prompts/v2/v2-1-RealisticVision.yaml --without-xformers
+python -m scripts.animate --config configs/prompts/v3/v3-1-T2V.yaml --without-xformers
+python -m scripts.animate --config configs/prompts/v3/v3-2-animation-RealisticVision.yaml --without-xformers
+Note: these are only some inference examples; you can modify them or write your own `yaml` files.
+## result
+### Accuracy
+None
+## Application scenarios
+### Algorithm category
+`AIGC`
+### Popular application industries
+`Media, scientific research, education`
+## Pretrained weights
+### Model download
+CLIP: [Original link](https://huggingface.co/openai/clip-vit-large-patch14/tree/main) | [SCNet high-speed download channel](http://113.200.138.88:18080/aimodels/clip-vit-large-patch14)
+You can replace `huggingface.co` with `hf-mirror.com` to speed up model downloads.
+DreamBooth_LORA:
+- toonyou_beta6: [Original link](https://hf-mirror.com/frankjoshua/toonyou_beta6) | [SCNet high-speed download channel](http://113.200.138.88:18080/aimodels/frankjoshua/toonyou_beta6)
+- Others: [civitai](https://civitai.com/models)
+sd1.5: [Original link](https://hf-mirror.com/Jiali/stable-diffusion-1.5/tree/main) | [SCNet high-speed download channel](http://113.200.138.88:18080/aimodels/stable-diffusion-v1-5)
+Motion_Module: [Original link](https://huggingface.co/guoyww/animatediff/tree/main) | [SCNet high-speed download channel](http://113.200.138.88:18080/aimodels/AnimateDiff)
 openai/
 └── clip-vit-large-patch14
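The README's tip about swapping `huggingface.co` for `hf-mirror.com` amounts to rewriting the host part of a URL. A minimal sketch, assuming a hypothetical helper named `to_mirror` and that only exact `huggingface.co` hosts should be rewritten:

```python
from urllib.parse import urlsplit, urlunsplit


def to_mirror(url: str, mirror_host: str = "hf-mirror.com") -> str:
    """Swap the host of a huggingface.co URL for the mirror; leave other URLs alone."""
    parts = urlsplit(url)
    if parts.netloc != "huggingface.co":
        return url
    # SplitResult is a namedtuple, so _replace returns a copy with the new host.
    return urlunsplit(parts._replace(netloc=mirror_host))


print(to_mirror("https://huggingface.co/guoyww/animatediff/tree/main"))
```

Parsing the URL rather than calling `str.replace` avoids touching a path or query string that happens to contain the same text.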
@@ -172,38 +189,6 @@ https://huggingface.co/openai/clip-vit-large-patch14/tree/main
 Note: the models above are not all required; only the file structure is shown. Pick a subset, or other models, as needed.
-### Commands
-python -m scripts.animate --config configs/prompts/v1/v1-1-ToonYou.yaml --without-xformers
-python -m scripts.animate --config configs/prompts/v1/v1-2-Lyriel.yaml --without-xformers
-python -m scripts.animate --config configs/prompts/v2/v2-1-RealisticVision.yaml --without-xformers
-python -m scripts.animate --config configs/prompts/v3/v3-1-T2V.yaml --without-xformers
-python -m scripts.animate --config configs/prompts/v3/v3-2-animation-RealisticVision.yaml --without-xformers
-Note: these are only some inference examples; you can modify them or write your own `yaml` files.
-## result
-### Accuracy
-None
-## Application scenarios
-### Algorithm category
-`AIGC`
-### Popular application industries
-`Media, scientific research, education`
 ## Source repository and issue feedback
 https://developer.hpccube.com/codes/modelzoo/animatediff_pytorch
animatediff/data/dataset.py
@@ -77,6 +77,34 @@ class WebVid10M(Dataset):
         return sample
 
+class VIDGen(WebVid10M):
+
+    def get_batch(self, idx):
+        video_dict = self.dataset[idx]
+        # videoid, name, page_dir = video_dict['videoid'], video_dict['name'], video_dict['page_dir']
+        name, video_dir = video_dict['caption'], video_dict['video_path']
+
+        # video_dir = os.path.join(self.video_folder, f"{videoid}.mp4")
+        video_reader = VideoReader(video_dir)
+        video_length = len(video_reader)
+
+        if not self.is_image:
+            clip_length = min(video_length, (self.sample_n_frames - 1) * self.sample_stride + 1)
+            start_idx = random.randint(0, video_length - clip_length)
+            batch_index = np.linspace(start_idx, start_idx + clip_length - 1, self.sample_n_frames, dtype=int)
+        else:
+            batch_index = [random.randint(0, video_length - 1)]
+
+        pixel_values = torch.from_numpy(video_reader.get_batch(batch_index).asnumpy()).permute(0, 3, 1, 2).contiguous()
+        pixel_values = pixel_values / 255.
+        del video_reader
+
+        if self.is_image:
+            pixel_values = pixel_values[0]
+
+        return pixel_values, name
+
 if __name__ == "__main__":
     from animatediff.utils.util import save_videos_grid
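The frame-selection arithmetic in `VIDGen.get_batch` can be tried out without decoding any video. The sketch below reproduces only the index computation for the video (non-image) branch. The function name and the `rng` parameter are assumptions, and integer division stands in for `np.linspace(..., dtype=int)` so the example needs only the standard library.

```python
import random


def sample_frame_indices(video_length, sample_n_frames, sample_stride, rng=random):
    """Pick `sample_n_frames` frame indices from one contiguous clip of the video."""
    # The clip spans (n - 1) * stride + 1 frames, clamped to the video length.
    clip_length = min(video_length, (sample_n_frames - 1) * sample_stride + 1)
    # randint is inclusive on both ends, so the clip always fits inside the video.
    start_idx = rng.randint(0, video_length - clip_length)
    if sample_n_frames == 1:
        return [start_idx]
    # Integer stand-in for np.linspace(start, start + clip_length - 1, n, dtype=int):
    # evenly spaced positions across the clip, rounded down.
    span = clip_length - 1
    return [
        start_idx + (i * span) // (sample_n_frames - 1)
        for i in range(sample_n_frames)
    ]


if __name__ == "__main__":
    print(sample_frame_indices(100, 16, 4, random.Random(0)))
```

When the video is shorter than the requested clip, the clip covers the whole video and some indices repeat, so very short videos yield batches with duplicated frames.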
configs/prompts/v2/v2-1-RealisticVision.yaml
@@ -2,6 +2,7 @@
   motion_module: "models/Motion_Module/mm_sd_v15_v2.ckpt"
   dreambooth_path: "models/DreamBooth_LoRA/realisticVisionV51_v51VAE.safetensors"
+  # dreambooth_path: "models/DreamBooth_LoRA/toonyou_beta6.safetensors"
   lora_model_path: ""
   seed: [13100322578370451493, 14752961627088720670, 9329399085567825781, 16987697414827649302]
@@ -10,12 +11,12 @@
   prompt:
     - "b&w photo of 42 y.o man in black clothes, bald, face, half body, body, high detailed skin, skin pores, coastline, overcast weather, wind, waves, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3"
-    - "close up photo of a rabbit, forest, haze, halation, bloom, dramatic atmosphere, centred, rule of thirds, 200mm 1.4f macro shot"
+    # - "close up photo of a rabbit, forest, haze, halation, bloom, dramatic atmosphere, centred, rule of thirds, 200mm 1.4f macro shot"
-    - "photo of coastline, rocks, storm weather, wind, waves, lightning, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3"
+    # - "photo of coastline, rocks, storm weather, wind, waves, lightning, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3"
-    - "night, b&w photo of old house, post apocalypse, forest, storm weather, wind, rocks, 8k uhd, dslr, soft lighting, high quality, film grain"
+    # - "night, b&w photo of old house, post apocalypse, forest, storm weather, wind, rocks, 8k uhd, dslr, soft lighting, high quality, film grain"
   n_prompt:
     - "semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck"
-    - "semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck"
+    # - "semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck"
-    - "blur, haze, deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, mutated hands and fingers, deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation"
+    # - "blur, haze, deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, mutated hands and fingers, deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation"
-    - "blur, haze, deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, art, mutated hands and fingers, deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation"
+    # - "blur, haze, deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime, art, mutated hands and fingers, deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, disconnected limbs, mutation, mutated, ugly, disgusting, amputation"
configs/training/v1/image_finetune.yaml
@@ -12,7 +12,7 @@ noise_scheduler_kwargs:
   clip_sample: false
 train_data:
-  csv_path: "data/results_2M_val.csv"
+  csv_path: "train_data.csv"
   video_folder: "data/videos"
   sample_size: 128
configs/training/v1/training.yaml
@@ -28,7 +28,7 @@ noise_scheduler_kwargs:
   clip_sample: false
 train_data:
-  csv_path: "data/results_2M_val.csv"
+  csv_path: "train_data.csv"
   video_folder: "data/videos"
   sample_size: 128
   sample_stride: 4
icon.png
New file (mode 0 → 100644), 68.4 KB
requirements.txt
 diffusers==0.11.1
 transformers==4.25.1
-xformers==0.0.16
+# xformers==0.0.16
 imageio==2.27.0
 decord==0.6.0
 gdown
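With `xformers` commented out of `requirements.txt`, any code path that uses it has to tolerate its absence, which is why the README's inference commands pass `--without-xformers`. The sketch below shows one way to probe for an optional package using only the standard library. The function names are hypothetical, and the project itself imports `is_xformers_available` from `diffusers` rather than using this approach.

```python
import importlib.util


def xformers_available() -> bool:
    """Report whether the optional `xformers` package can be imported."""
    return importlib.util.find_spec("xformers") is not None


def attention_backend(prefer_xformers: bool = True) -> str:
    """Choose an attention backend name, falling back when xformers is missing."""
    if prefer_xformers and xformers_available():
        return "xformers"
    return "default"


if __name__ == "__main__":
    print(attention_backend())
```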
train.py
@@ -34,7 +34,7 @@ from diffusers.utils.import_utils import is_xformers_available
 import transformers
 from transformers import CLIPTextModel, CLIPTokenizer
 
-from animatediff.data.dataset import WebVid10M
+from animatediff.data.dataset import WebVid10M, VIDGen
 from animatediff.models.unet import UNet3DConditionModel
 from animatediff.pipelines.pipeline_animation import AnimationPipeline
 from animatediff.utils.util import save_videos_grid, zero_rank_print
@@ -228,7 +228,8 @@ def main(
     text_encoder.to(local_rank)
 
     # Get the training dataset
-    train_dataset = WebVid10M(**train_data, is_image=image_finetune)
+    # train_dataset = WebVid10M(**train_data, is_image=image_finetune)
+    train_dataset = VIDGen(**train_data, is_image=image_finetune)
     distributed_sampler = DistributedSampler(
         train_dataset,
         num_replicas=num_processes,
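The commit switches datasets by commenting out the `WebVid10M` line and hard-coding `VIDGen`. One alternative, sketched here with stand-in classes, is to select the class by name so both remain usable. The `DATASETS` registry, the `build_dataset` helper, and the constructor signature of the stand-ins are assumptions for illustration; the real classes live in `animatediff.data.dataset`.

```python
class WebVid10M:
    """Stand-in for animatediff.data.dataset.WebVid10M (signature is assumed)."""

    def __init__(self, csv_path, video_folder, is_image=False, **kwargs):
        self.csv_path = csv_path
        self.video_folder = video_folder
        self.is_image = is_image
        self.extra = kwargs  # e.g. sample_size, sample_stride


class VIDGen(WebVid10M):
    """Stand-in for the subclass added in this commit."""


# Hypothetical registry mapping a config value to a dataset class.
DATASETS = {"WebVid10M": WebVid10M, "VIDGen": VIDGen}


def build_dataset(name, train_data, image_finetune):
    """Instantiate the named dataset the same way train.py does: **train_data plus is_image."""
    try:
        dataset_cls = DATASETS[name]
    except KeyError:
        raise ValueError(f"unknown dataset {name!r}; choose from {sorted(DATASETS)}")
    return dataset_cls(**train_data, is_image=image_finetune)


if __name__ == "__main__":
    cfg = {"csv_path": "train_data.csv", "video_folder": "data/videos", "sample_size": 128}
    print(type(build_dataset("VIDGen", cfg, image_finetune=False)).__name__)
```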