# Wan2.1

## Paper

`Tech Report`

* https://wanxai.com/

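## Model Architecture

The model follows the mainstream `Latent Diffusion` architecture: a `3D VAE` compresses and reconstructs the data, a `DiT` performs the denoising, and a `T5` encoder processes the text.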

![alt text](readme_imgs/arch.png)
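
As a rough illustration of how much the `3D VAE` shrinks the tensor that the `DiT` has to denoise, the sketch below computes a latent grid size with shell arithmetic. The 4x temporal and 8x spatial compression factors and the 81-frame clip length are assumptions taken from the upstream Wan2.1 release, not from this README, and `latent_shape` is a hypothetical helper.

```bash
# Assumed compression factors (not stated in this README).
TEMPORAL_FACTOR=4
SPATIAL_FACTOR=8

# Print the latent grid as frames x height x width for a given clip.
# Assumes a causal VAE that keeps the first frame and groups the rest.
latent_shape() {
  local frames="$1" width="$2" height="$3"
  local t=$(( (frames - 1) / TEMPORAL_FACTOR + 1 ))
  local h=$(( height / SPATIAL_FACTOR ))
  local w=$(( width / SPATIAL_FACTOR ))
  echo "${t}x${h}x${w}"
}

latent_shape 81 832 480   # prints 21x60x104
```

Under these assumptions, an 832*480 clip of 81 frames is denoised on a 21x60x104 latent grid instead of the full pixel volume.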

## Algorithm

The model uses the flow matching algorithm.

![alt text](readme_imgs/alg.png)
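
For reference, a common formulation of flow matching (the rectified-flow form) is written out below. This README does not spell out the equations, so the notation is an assumption and may differ from the figure above: $x_1$ is a clean video latent, $x_0$ is Gaussian noise, $c$ is the text condition, and $u_\theta$ is the velocity predicted by the `DiT`.

$$
\begin{aligned}
x_t &= t\,x_1 + (1-t)\,x_0, \qquad x_0 \sim \mathcal{N}(0, I),\ t \in [0, 1] \\
v_t &= \frac{\mathrm{d}x_t}{\mathrm{d}t} = x_1 - x_0 \\
\mathcal{L}(\theta) &= \mathbb{E}_{x_0, x_1, c, t}\,\bigl\| u_\theta(x_t, c, t) - v_t \bigr\|^2
\end{aligned}
$$

In this formulation, sampling starts from noise $x_0$ and integrates $\mathrm{d}x_t/\mathrm{d}t = u_\theta(x_t, c, t)$ from $t = 0$ to $t = 1$, for example with Euler steps $x_{t+\Delta t} = x_t + \Delta t\,u_\theta(x_t, c, t)$, after which the `3D VAE` decodes the result.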

## Environment Setup

### Docker (Method 1)
    
    docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.3.0-ubuntu22.04-dtk24.04.3-py3.10

    docker run --shm-size 100g --network=host --name=wan --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash

    pip install -r requirements.txt

    pip install "xfuser==0.4.2" --no-deps torch

    bash modified/fix.sh

### Dockerfile (Method 2)

    docker build -t <IMAGE_NAME>:<TAG> .

    docker run --shm-size 100g --network=host --name=wan --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash
    
    pip install -r requirements.txt

    pip install "xfuser==0.4.2" --no-deps torch

    bash modified/fix.sh

### Anaconda (Method 3)

1. The DCU-specific deep learning libraries required by this project can be downloaded from the Guanghe (光合) developer community: https://developer.hpccube.com/tool/

```
DTK driver:dtk24.04.3
python:python3.10
torch:2.3.0
torchvision:0.18.1
torchaudio:2.1.2
triton:2.1.0
vllm:0.6.2
flash-attn:2.6.1
deepspeed:0.14.2
apex:1.3.0
xformers:0.0.25
transformers:4.48.0
```

2. Install the remaining, non-DCU-specific libraries from requirements.txt:

```
pip install -r requirements.txt

pip install "xfuser==0.4.2" --no-deps torch

# Then modify the code at the corresponding locations by following the commands in modified/fix.sh
```
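
To confirm that an Anaconda environment matches the version list above, a check along the following lines can help. It is a sketch, not part of the project: `version_matches` and `check_env` are hypothetical helpers, the versions are read through `pip show`, and the prefix match assumes that vendor builds may append a suffix to the base version.

```bash
# Succeed when the installed version equals the required one or only adds a
# suffix (for example a hypothetical build tag such as "2.3.0+local").
version_matches() {
  case "$1" in
    "$2"|"$2"[-+.]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Compare each pinned package from the list above against what pip reports.
check_env() {
  local spec name required installed
  for spec in torch=2.3.0 torchvision=0.18.1 torchaudio=2.1.2 triton=2.1.0 \
              vllm=0.6.2 flash-attn=2.6.1 deepspeed=0.14.2 apex=1.3.0 \
              xformers=0.0.25 transformers=4.48.0; do
    name="${spec%%=*}"
    required="${spec#*=}"
    installed="$(pip show "$name" 2>/dev/null | awk '/^Version:/ {print $2}')"
    if version_matches "$installed" "$required"; then
      echo "OK        $name $installed"
    else
      echo "MISMATCH  $name: want $required, found ${installed:-not installed}"
    fi
  done
}
```

After sourcing the snippet, run `check_env` inside the activated environment; any `MISMATCH` line points to a package to reinstall.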

## Dataset

None

## Training

None

## Inference

### Text-to-Video Generation

1. Single GPU

<!-- # # The 14B model supports 480P and 720P
# python generate.py  --task t2v-14B --size 832*480 --ckpt_dir models/Wan2.1-T2V-14B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." -->

```bash
# The 1.3B model supports 480P
python generate.py  --task t2v-1.3B --size 832*480 --ckpt_dir models/Wan2.1-T2V-1.3B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
```

Note: if you run out of GPU memory, try `--offload_model True` and `--t5_cpu`.
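
Both flags from the note above can be combined in a single call. The wrapper below is only a sketch: `t2v_low_mem` is a hypothetical helper name, and quoting `832*480` is a precaution so the shell never expands `*` as a glob.

```bash
# Single-GPU text-to-video run with both memory-saving flags enabled.
# Usage: t2v_low_mem "<prompt>"
t2v_low_mem() {
  python generate.py --task t2v-1.3B --size '832*480' \
    --ckpt_dir models/Wan2.1-T2V-1.3B \
    --offload_model True --t5_cpu \
    --prompt "$1"
}
```

For example: `t2v_low_mem "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."`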

2. Multi-GPU

```bash
# 1.3B
torchrun --nproc_per_node=4 generate.py --task t2v-1.3B --size 832*480 --ckpt_dir models/Wan2.1-T2V-1.3B --dit_fsdp --t5_fsdp --ulysses_size 4 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

# 14B
torchrun --nproc_per_node=4 generate.py --task t2v-14B --size 1280*720 --ckpt_dir models/Wan2.1-T2V-14B --dit_fsdp --t5_fsdp --ulysses_size 4 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
```
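
The multi-GPU commands above pass the same value to `--nproc_per_node` and `--ulysses_size`. A small pre-flight check can catch a mismatch before launching. The rule that the attention head count must be divisible by `--ulysses_size`, and the head counts used in the usage note (12 for the 1.3B model, 40 for the 14B model), are assumptions based on the upstream Wan2.1 code rather than on this README; `check_ulysses` is a hypothetical helper.

```bash
# Usage: check_ulysses <num_processes> <ulysses_size> <attention_heads>
# Prints "ok" and succeeds when the settings look consistent.
check_ulysses() {
  local gpus="$1" ulysses="$2" heads="$3"
  if [ "$gpus" -ne "$ulysses" ]; then
    echo "ulysses_size ($ulysses) should equal the number of processes ($gpus)"
    return 1
  fi
  if [ $(( heads % ulysses )) -ne 0 ]; then
    echo "attention heads ($heads) are not divisible by ulysses_size ($ulysses)"
    return 1
  fi
  echo "ok"
}
```

Under these assumptions, `check_ulysses 4 4 12` and `check_ulysses 4 4 40` both pass, while `check_ulysses 8 8 12` reports a divisibility problem.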

Enable prompt extension

```bash
<command> --use_prompt_extend --prompt_extend_method 'local_qwen' --prompt_extend_target_lang 'ch' --prompt_extend_model models/Qwen2.5-7B-Instruct

# example
python generate.py  --task t2v-1.3B --size 832*480 --ckpt_dir models/Wan2.1-T2V-1.3B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage" --use_prompt_extend --prompt_extend_method 'local_qwen' --prompt_extend_target_lang 'ch' --prompt_extend_model models/Qwen2.5-7B-Instruct
```

### Image-to-Video Generation

<!-- 1. Single GPU

```bash
python generate.py --task i2v-14B --size 1280*720 --ckpt_dir models/Wan2.1-I2V-14B-720P --image examples/i2v_input.JPG --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
```

Note: if you run out of GPU memory, try `--offload_model True` and `--t5_cpu`. -->


```bash
torchrun --nproc_per_node=4 generate.py --task i2v-14B --size 1280*720 --ckpt_dir models/Wan2.1-I2V-14B-720P --image examples/i2v_input.JPG --dit_fsdp --t5_fsdp --ulysses_size 4 --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
```

Enable prompt extension

```bash
<command> --use_prompt_extend --prompt_extend_model models/Qwen2.5-VL-7B-Instruct

# example
torchrun --nproc_per_node=4 generate.py --task i2v-14B --size 1280*720 --ckpt_dir models/Wan2.1-I2V-14B-720P --image examples/i2v_input.JPG --dit_fsdp --t5_fsdp --ulysses_size 4 --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." --use_prompt_extend --prompt_extend_model models/Qwen2.5-VL-7B-Instruct
```

### Text-to-Image Generation

<!-- 1. Single GPU

```bash
python generate.py --task t2i-14B --size 1024*1024 --ckpt_dir models/Wan2.1-T2V-14B  --prompt '一个朴素端庄的美人'
```

Note: if you run out of GPU memory, try `--offload_model True` and `--t5_cpu`. -->

```bash
torchrun --nproc_per_node=4 generate.py --dit_fsdp --t5_fsdp --ulysses_size 4 --base_seed 0 --frame_num 1 --task t2i-14B  --size 1024*1024 --prompt '一个朴素端庄的美人' --ckpt_dir models/Wan2.1-T2V-14B
```

Enable prompt extension

```bash
<command> --use_prompt_extend --prompt_extend_method 'local_qwen' --prompt_extend_target_lang 'ch' --prompt_extend_model models/Qwen2.5-7B-Instruct

# example
torchrun --nproc_per_node=4 generate.py --dit_fsdp --t5_fsdp --ulysses_size 4 --base_seed 0 --frame_num 1 --task t2i-14B  --size 1024*1024 --prompt '一个朴素端庄的美人' --ckpt_dir models/Wan2.1-T2V-14B --use_prompt_extend --prompt_extend_method 'local_qwen' --prompt_extend_target_lang 'ch' --prompt_extend_model models/Qwen2.5-7B-Instruct
```

### Web UI

```bash
python gradio/t2v_1.3B_singleGPU.py --ckpt_dir models/Wan2.1-T2V-1.3B --prompt_extend_method 'local_qwen' --prompt_extend_model models/Qwen2.5-7B-Instruct
```

## Results


|model/task|t2v|i2v|t2i|
|:---:|:---:|:---:|:---:|
|T2V-14B|![](readme_imgs/t2v-14B.gif)||![](readme_imgs/t2i-14B.png)|
|T2V-1.3B|![](readme_imgs/t2v-1.3B.gif)|||
|I2V-14B-720P||![](readme_imgs/i2v-14B_720.gif)||
|I2V-14B-480P||![](readme_imgs/i2v-14B_480.gif)||


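### Accuracy

None

## Application Scenarios

### Algorithm Category

`Video generation`

### Key Application Industries

`E-commerce, education, broadcast media`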

## Pretrained Weights

Place the downloaded models in the `models` directory (create it yourself).


|Model|Download link|
|:---:|:---:|
|T2V-14B|[modelscope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-14B) \| [SCNet] |
|I2V-14B-720P|[modelscope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-I2V-14B-720P)  \| [SCNet] |
|I2V-14B-480P|[modelscope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-I2V-14B-480P) \| [SCNet] |
|T2V-1.3B|[modelscope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-1.3B) \| [SCNet] |
|Qwen2.5-7B-Instruct|[modelscope](https://www.modelscope.cn/models/Qwen/Qwen2.5-7B-Instruct) \| [SCNet] |
|Qwen2.5-VL-7B-Instruct|[modelscope](https://www.modelscope.cn/models/Qwen/Qwen2.5-VL-7B-Instruct) \| [SCNet]|
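
One possible way to fetch the weights into the layout that the inference commands expect is sketched below. It assumes the ModelScope command-line tool is installed (for example via `pip install modelscope`) and that its `download` subcommand accepts `--model` and `--local_dir`; `target_dir` and `download_weights` are hypothetical helper names.

```bash
# Map a ModelScope repo id to the local directory used in this README,
# for example Wan-AI/Wan2.1-T2V-1.3B -> models/Wan2.1-T2V-1.3B.
target_dir() {
  echo "models/${1#*/}"
}

# Download every checkpoint listed in the table (these are large downloads).
download_weights() {
  local repo
  mkdir -p models
  for repo in Wan-AI/Wan2.1-T2V-14B Wan-AI/Wan2.1-I2V-14B-720P \
              Wan-AI/Wan2.1-I2V-14B-480P Wan-AI/Wan2.1-T2V-1.3B \
              Qwen/Qwen2.5-7B-Instruct Qwen/Qwen2.5-VL-7B-Instruct; do
    modelscope download --model "$repo" --local_dir "$(target_dir "$repo")"
  done
}
```

Trim the repo list in `download_weights` to the models you actually plan to run before calling it.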


## Source Repository and Issue Reporting

* https://developer.sourcefind.cn/codes/modelzoo/wan2.1_pytorch

## References

* https://github.com/Wan-Video/Wan2.1
