# Wan2.1

## Paper

`Tech Report`

* https://wanxai.com/

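## Model architecture

The model uses the mainstream `Latent Diffusion` architecture: a `3D VAE` for data compression and reconstruction, a `DiT` denoising module, and a `T5` encoder for text.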

![alt text](readme_imgs/arch.png)

## Algorithm

The model uses the flow matching algorithm.

![alt text](readme_imgs/alg.png)
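
As a rough illustration of what flow matching optimizes, the sketch below uses the straight-line (rectified flow) variant: a noise sample `x0` and a data sample `x1` are linearly interpolated, and the network is trained to regress the constant velocity `x1 - x0`. The choice of this variant, the pure-Python list representation, and all function names are assumptions made for illustration; this is not the repository's training code.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path; this is the regression target."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x1, rng):
    """One-sample estimate of the loss for a data point x1."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
    t = rng.random()                         # time drawn from [0, 1)
    xt = interpolate(x0, x1, t)
    v_pred = predict_velocity(xt, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(x1)


def euler_sample(predict_velocity, x0, steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = predict_velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    zero_model = lambda x, t: [0.0] * len(x)  # stand-in for an untrained network
    print(flow_matching_loss(zero_model, [2.0, -1.0], rng))
```

At inference time a trained velocity predictor is integrated from noise towards data, which `euler_sample` does with the simplest possible solver; real pipelines typically use more elaborate schedulers.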

## Environment setup

### Docker (Method 1)
    
    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.3.0-ubuntu22.04-dtk24.04.3-py3.10

    docker run --shm-size 100g --network=host --name=wan --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install -r requirements.txt

    pip install "xfuser>=0.4" --no-deps torch

    bash modified/fix.sh

### Dockerfile (Method 2)

    docker build -t <IMAGE_NAME>:<TAG> .

    docker run --shm-size 100g --network=host --name=wan --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash
    
    pip install -r requirements.txt

    pip install "xfuser>=0.4" --no-deps torch

    bash modified/fix.sh

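### Anaconda (Method 3)

1. The special deep learning libraries that this project needs for DCU GPUs can be downloaded and installed from the Guanghe (光合) developer community: https://developer.hpccube.com/tool/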

```
DTK driver:dtk24.04.3
python:python3.10
torch:2.3.0
torchvision:0.18.1
torchaudio:2.1.2
triton:2.1.0
vllm:0.6.2
flash-attn:2.6.1
deepspeed:0.14.2
apex:1.3.0
xformers:0.0.25
transformers:4.48.0
```

2. Install the remaining, non-special libraries directly from requirements.txt:

```
pip install -r requirements.txt

pip install "xfuser>=0.4" --no-deps torch

# Apply the code changes at the corresponding locations by following the commands in modified/fix.sh
```
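
To confirm that a conda environment matches the version list above, a small check along the following lines may help. It is only a sketch: the distribution names passed to `importlib.metadata` (for example `flash-attn`) and the assumption that vendor builds append a local suffix after `+` are guesses, and `check_versions` is a hypothetical helper, not part of this repository.

```python
from importlib import metadata

# Versions copied from the list above; distribution names are assumptions.
EXPECTED = {
    "torch": "2.3.0",
    "torchvision": "0.18.1",
    "torchaudio": "2.1.2",
    "triton": "2.1.0",
    "vllm": "0.6.2",
    "flash-attn": "2.6.1",
    "deepspeed": "0.14.2",
    "apex": "1.3.0",
    "xformers": "0.0.25",
    "transformers": "4.48.0",
}


def check_versions(expected, get_version=metadata.version):
    """Return {package: (expected, found)} for missing or mismatched packages."""
    problems = {}
    for name, want in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore a local build suffix such as "+something" when comparing.
        if found is None or found.split("+")[0] != want:
            problems[name] = (want, found)
    return problems


if __name__ == "__main__":
    for name, (want, found) in check_versions(EXPECTED).items():
        print(f"{name}: expected {want}, found {found or 'not installed'}")
```

An empty result means every listed package is installed at the expected base version.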

## Dataset



## Training



## Inference

### Text-to-video generation

1. Single GPU

<!-- # # The 14B model supports 480P/720P
# python generate.py  --task t2v-14B --size 832*480 --ckpt_dir models/Wan2.1-T2V-14B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." -->

```bash
# The 1.3B model supports 480P
python generate.py  --task t2v-1.3B --size 832*480 --ckpt_dir models/Wan2.1-T2V-1.3B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
```

Note: if you run out of GPU memory, try adding `--offload_model True` and `--t5_cpu`.

2. Multiple GPUs

```bash
# 1.3B
torchrun --nproc_per_node=4 generate.py --task t2v-1.3B --size 832*480 --ckpt_dir models/Wan2.1-T2V-1.3B --dit_fsdp --t5_fsdp --ulysses_size 4 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

# 14B
torchrun --nproc_per_node=4 generate.py --task t2v-14B --size 1280*720 --ckpt_dir models/Wan2.1-T2V-14B --dit_fsdp --t5_fsdp --ulysses_size 4 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
```
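
The single-GPU and multi-GPU invocations above differ only in the launcher and a few flags, so they can be assembled by one helper. The sketch below is a hypothetical convenience wrapper, not part of the repository. It assumes `--ulysses_size` should equal `--nproc_per_node`, as in every multi-GPU command in this README, and it applies the low-memory flags only in the single-GPU case, where the note above mentions them.

```python
def build_generate_cmd(task, size, ckpt_dir, prompt, num_gpus=1, low_memory=False):
    """Return the argv list for generate.py, mirroring the commands above.

    low_memory is only honoured for single-GPU runs.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    common = ["generate.py", "--task", task, "--size", size,
              "--ckpt_dir", ckpt_dir, "--prompt", prompt]
    if num_gpus == 1:
        cmd = ["python", *common]
        if low_memory:
            cmd += ["--offload_model", "True", "--t5_cpu"]
        return cmd
    # Assumption: the Ulysses degree matches the number of launched processes.
    return ["torchrun", f"--nproc_per_node={num_gpus}", *common,
            "--dit_fsdp", "--t5_fsdp", "--ulysses_size", str(num_gpus)]


if __name__ == "__main__":
    cmd = build_generate_cmd("t2v-1.3B", "832*480", "models/Wan2.1-T2V-1.3B",
                             "Two anthropomorphic cats boxing on a stage.",
                             num_gpus=4)
    print(" ".join(cmd))
    # To launch: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list to `subprocess.run` also avoids any shell expansion of the `*` in values such as `832*480`.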

Enable prompt extension

```bash
<command> --use_prompt_extend --prompt_extend_method 'local_qwen' --prompt_extend_target_lang 'ch' --prompt_extend_model models/Qwen2.5-7B-Instruct

# example
python generate.py  --task t2v-1.3B --size 832*480 --ckpt_dir models/Wan2.1-T2V-1.3B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage" --use_prompt_extend --prompt_extend_method 'local_qwen' --prompt_extend_target_lang 'ch' --prompt_extend_model models/Qwen2.5-7B-Instruct
```

### Image-to-video generation

<!-- 1. Single GPU

```bash
python generate.py --task i2v-14B --size 1280*720 --ckpt_dir models/Wan2.1-I2V-14B-720P --image examples/i2v_input.JPG --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
```

Note: if you run out of GPU memory, try adding `--offload_model True` and `--t5_cpu`. -->


```bash
torchrun --nproc_per_node=4 generate.py --task i2v-14B --size 1280*720 --ckpt_dir models/Wan2.1-I2V-14B-720P --image examples/i2v_input.JPG --dit_fsdp --t5_fsdp --ulysses_size 4 --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
```

Enable prompt extension

```bash
<command> --use_prompt_extend --prompt_extend_model models/Qwen2.5-VL-7B-Instruct

# example
torchrun --nproc_per_node=4 generate.py --task i2v-14B --size 1280*720 --ckpt_dir models/Wan2.1-I2V-14B-720P --image examples/i2v_input.JPG --dit_fsdp --t5_fsdp --ulysses_size 4 --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." --use_prompt_extend --prompt_extend_model models/Qwen2.5-VL-7B-Instruct
```

### Text-to-image generation

<!-- 1. Single GPU

```bash
python generate.py --task t2i-14B --size 1024*1024 --ckpt_dir models/Wan2.1-T2V-14B  --prompt '一个朴素端庄的美人'
```

Note: if you run out of GPU memory, try adding `--offload_model True` and `--t5_cpu`. -->

```bash
torchrun --nproc_per_node=4 generate.py --dit_fsdp --t5_fsdp --ulysses_size 4 --base_seed 0 --frame_num 1 --task t2i-14B  --size 1024*1024 --prompt '一个朴素端庄的美人' --ckpt_dir models/Wan2.1-T2V-14B
```

Enable prompt extension

```bash
<command> --use_prompt_extend --prompt_extend_method 'local_qwen' --prompt_extend_target_lang 'ch' --prompt_extend_model models/Qwen2.5-7B-Instruct

# example
torchrun --nproc_per_node=4 generate.py --dit_fsdp --t5_fsdp --ulysses_size 4 --base_seed 0 --frame_num 1 --task t2i-14B  --size 1024*1024 --prompt '一个朴素端庄的美人' --ckpt_dir models/Wan2.1-T2V-14B --use_prompt_extend --prompt_extend_method 'local_qwen' --prompt_extend_target_lang 'ch' --prompt_extend_model models/Qwen2.5-7B-Instruct
```

### webui

```bash
python gradio/t2v_1.3B_singleGPU.py --ckpt_dir models/Wan2.1-T2V-1.3B --prompt_extend_method 'local_qwen' --prompt_extend_model models/Qwen2.5-7B-Instruct
```

## Results


|model/task|t2v|i2v|t2i|
|:---:|:---:|:---:|:---:|
|T2V-14B|![](readme_imgs/t2v-14B.gif)||![](readme_imgs/t2i-14B.png)|
|T2V-1.3B|![](readme_imgs/t2v-1.3B.gif)|||
|I2V-14B-720P||![](readme_imgs/i2v-14B_720.gif)||
|I2V-14B-480P||![](readme_imgs/i2v-14B_480.gif)||


### Accuracy



## Application scenarios

### Algorithm category

`Video generation`

### Key application industries

`E-commerce, education, broadcast media`

## Pretrained weights

Place the downloaded models in the `models` directory (create it yourself).


|Models|Download links|
|:---:|:---:|
|T2V-14B|[modelscope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-14B) \| [SCNet high-speed download channel](http://113.200.138.88:18080/aimodels/Wan-AI/Wan2.1-T2V-14B) |
|I2V-14B-720P|[modelscope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-I2V-14B-720P) \| [SCNet high-speed download channel](http://113.200.138.88:18080/aimodels/Wan-AI/Wan2.1-I2V-14B-720P) |
|I2V-14B-480P|[modelscope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-I2V-14B-480P) \| [SCNet high-speed download channel](http://113.200.138.88:18080/aimodels/Wan-AI/Wan2.1-I2V-14B-480P) |
|T2V-1.3B|[modelscope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-1.3B) \| [SCNet high-speed download channel](http://113.200.138.88:18080/aimodels/Wan-AI/Wan2.1-T2V-1.3B) |
|Qwen2.5-7B-Instruct|[modelscope](https://www.modelscope.cn/models/Qwen/Qwen2.5-7B-Instruct) \| [SCNet high-speed download channel](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-7B-Instruct) |
|Qwen2.5-VL-7B-Instruct|[modelscope](https://www.modelscope.cn/models/Qwen/Qwen2.5-VL-7B-Instruct) \| [SCNet high-speed download channel](http://113.200.138.88:18080/aimodels/qwen/qwen2.5-vl-7b-instruct) |
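
Before running inference it can be useful to confirm that the expected checkpoint folders exist under `models`. The following is a minimal sketch: the folder names are taken from the `--ckpt_dir` and `--prompt_extend_model` arguments used in this README and from the model names in the download URLs, and `missing_checkpoints` is a hypothetical helper that checks only for the presence of each directory, not its contents.

```python
from pathlib import Path

# Folder names as used in the commands of this README (assumed layout).
CHECKPOINTS = [
    "Wan2.1-T2V-14B",
    "Wan2.1-T2V-1.3B",
    "Wan2.1-I2V-14B-720P",
    "Wan2.1-I2V-14B-480P",
    "Qwen2.5-7B-Instruct",
    "Qwen2.5-VL-7B-Instruct",
]


def missing_checkpoints(models_dir, names):
    """Return the names that do not exist as directories under models_dir."""
    root = Path(models_dir)
    return [name for name in names if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_checkpoints("models", CHECKPOINTS)
    if missing:
        print("Missing checkpoint folders:", ", ".join(missing))
    else:
        print("All checkpoint folders found.")
```

Only the checkpoints needed for the task at hand have to be present, so a shorter list can be passed for a single task.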


## Source repository and issue feedback

* https://developer.sourcefind.cn/codes/modelzoo/wan2.1_pytorch

## References

* https://github.com/Wan-Video/Wan2.1