# Wan2.1

## Paper

`Tech Report`

* https://wanxai.com/

## Model Architecture

The model uses the mainstream `Latent Diffusion` architecture: a `3D VAE` compresses video into latents and reconstructs it, a `DiT` serves as the denoising network, and text prompts are encoded by a `T5` encoder.

![alt text](readme_imgs/arch.png)

## Algorithm

The model uses the Flow Matching algorithm.

![alt text](readme_imgs/alg.png)

## Environment Setup

### Docker (Option 1)

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.3.0-ubuntu22.04-dtk24.04.3-py3.10
docker run --shm-size 100g --network=host --name=wan --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <your IMAGE ID> bash
pip install -r requirements.txt
pip install "xfuser>=0.4" --no-deps torch
bash modified/fix.sh
```

### Dockerfile (Option 2)

```
docker build -t <image_name>:<tag> .
docker run --shm-size 100g --network=host --name=wan --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <absolute path to the project>:/home/ -v /opt/hyhal:/opt/hyhal:ro -it <image_name>:<tag> bash
pip install -r requirements.txt
pip install "xfuser>=0.4" --no-deps torch
bash modified/fix.sh
```

### Anaconda (Option 3)

1. The DCU-specific deep learning libraries required by this project can be downloaded from the Guanghe (光合) developer community: https://developer.hpccube.com/tool/

```
DTK driver: dtk24.04.3
python: python3.10
torch: 2.3.0
torchvision: 0.18.1
torchaudio: 2.1.2
triton: 2.1.0
vllm: 0.6.2
flash-attn: 2.6.1
deepspeed: 0.14.2
apex: 1.3.0
xformers: 0.0.25
transformers: 4.48.0
```

2. Install the remaining, non-DCU-specific libraries from requirements.txt:

```
pip install -r requirements.txt
pip install "xfuser>=0.4" --no-deps torch
# Then modify the corresponding code following the commands in modified/fix.sh
```

## Dataset

None

## Training

None

## Inference

### Text-to-Video

1. Single GPU

```bash
# The 1.3B model supports 480P
python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir models/Wan2.1-T2V-1.3B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
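# Low-memory variant (a sketch, not a benchmarked setting): the same 1.3B run
# with --offload_model True (offloads the model to the CPU after each forward
# pass) and --t5_cpu (keeps the T5 text encoder on the CPU). Generation is
# likely slower in exchange for lower GPU memory use.
python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir models/Wan2.1-T2V-1.3B --offload_model True --t5_cpu --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."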
```

Note: if you run out of GPU memory, try adding `--offload_model True` and `--t5_cpu`.

2. Multi-GPU

```bash
# 1.3B
torchrun --nproc_per_node=4 generate.py --task t2v-1.3B --size 832*480 --ckpt_dir models/Wan2.1-T2V-1.3B --dit_fsdp --t5_fsdp --ulysses_size 4 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
# 14B
torchrun --nproc_per_node=4 generate.py --task t2v-14B --size 1280*720 --ckpt_dir models/Wan2.1-T2V-14B --dit_fsdp --t5_fsdp --ulysses_size 4 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
```

Enable prompt extension:

```bash
<command> --use_prompt_extend --prompt_extend_method 'local_qwen' --prompt_extend_target_lang 'ch' --prompt_extend_model models/Qwen2.5-7B-Instruct
# example
python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir models/Wan2.1-T2V-1.3B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage" --use_prompt_extend --prompt_extend_method 'local_qwen' --prompt_extend_target_lang 'ch' --prompt_extend_model models/Qwen2.5-7B-Instruct
```

### Image-to-Video

```bash
torchrun --nproc_per_node=4 generate.py --task i2v-14B --size 1280*720 --ckpt_dir models/Wan2.1-I2V-14B-720P --image examples/i2v_input.JPG --dit_fsdp --t5_fsdp --ulysses_size 4 --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside."
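# 480P variant (assumption: the I2V-14B-480P checkpoint from the weights table
# is run at --size 832*480; check the sizes accepted by generate.py first)
torchrun --nproc_per_node=4 generate.py --task i2v-14B --size 832*480 --ckpt_dir models/Wan2.1-I2V-14B-480P --image examples/i2v_input.JPG --dit_fsdp --t5_fsdp --ulysses_size 4 --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard."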
```

Enable prompt extension:

```bash
<command> --use_prompt_extend --prompt_extend_model models/Qwen2.5-VL-7B-Instruct
# example
torchrun --nproc_per_node=4 generate.py --task i2v-14B --size 1280*720 --ckpt_dir models/Wan2.1-I2V-14B-720P --image examples/i2v_input.JPG --dit_fsdp --t5_fsdp --ulysses_size 4 --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." --use_prompt_extend --prompt_extend_model models/Qwen2.5-VL-7B-Instruct
```

### Text-to-Image

```bash
# The prompt means "a simple, dignified beauty"
torchrun --nproc_per_node=4 generate.py --dit_fsdp --t5_fsdp --ulysses_size 4 --base_seed 0 --frame_num 1 --task t2i-14B --size 1024*1024 --prompt '一个朴素端庄的美人' --ckpt_dir models/Wan2.1-T2V-14B
```

Enable prompt extension:

```bash
<command> --use_prompt_extend --prompt_extend_method 'local_qwen' --prompt_extend_target_lang 'ch' --prompt_extend_model models/Qwen2.5-7B-Instruct
# example
torchrun --nproc_per_node=4 generate.py --dit_fsdp --t5_fsdp --ulysses_size 4 --base_seed 0 --frame_num 1 --task t2i-14B --size 1024*1024 --prompt '一个朴素端庄的美人' --ckpt_dir models/Wan2.1-T2V-14B --use_prompt_extend --prompt_extend_method 'local_qwen' --prompt_extend_target_lang 'ch' --prompt_extend_model models/Qwen2.5-7B-Instruct
```

### WebUI

```bash
python gradio/t2v_1.3B_singleGPU.py --ckpt_dir models/Wan2.1-T2V-1.3B --prompt_extend_method 'local_qwen' --prompt_extend_model models/Qwen2.5-7B-Instruct
```

## Results

|Model / Task|t2v|i2v|t2i|
|:---:|:---:|:---:|:---:|
|T2V-14B|![](readme_imgs/t2v-14B.gif)||![](readme_imgs/t2i-14B.png)|
|T2V-1.3B|![](readme_imgs/t2v-1.3B.gif)|||
|I2V-14B-720P||![](readme_imgs/i2v-14B_720.gif)||
|I2V-14B-480P||![](readme_imgs/i2v-14B_480.gif)||

### Accuracy

None

## Application Scenarios

### Algorithm Category

`Video generation`

### Key Application Industries

`E-commerce, education, broadcast media`

## Pretrained Weights

Place the downloaded models in the `models` directory (create it yourself).

|Models|Download links|
|:---:|:---:|
|T2V-14B|[modelscope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-14B) \| [SCNet high-speed mirror](http://113.200.138.88:18080/aimodels/Wan-AI/Wan2.1-T2V-14B) |
|I2V-14B-720P|[modelscope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-I2V-14B-720P) \| [SCNet high-speed mirror](http://113.200.138.88:18080/aimodels/Wan-AI/Wan2.1-I2V-14B-720P) |
|I2V-14B-480P|[modelscope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-I2V-14B-480P) \| [SCNet high-speed mirror](http://113.200.138.88:18080/aimodels/Wan-AI/Wan2.1-I2V-14B-480P) |
|T2V-1.3B|[modelscope](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-1.3B) \| [SCNet high-speed mirror](http://113.200.138.88:18080/aimodels/Wan-AI/Wan2.1-T2V-1.3B) |
|Qwen2.5-7B-Instruct|[modelscope](https://www.modelscope.cn/models/Qwen/Qwen2.5-7B-Instruct) \| [SCNet high-speed mirror](http://113.200.138.88:18080/aimodels/qwen/Qwen2.5-7B-Instruct) |
|Qwen2.5-VL-7B-Instruct|[modelscope](https://www.modelscope.cn/models/Qwen/Qwen2.5-VL-7B-Instruct) \| [SCNet high-speed mirror](http://113.200.138.88:18080/aimodels/qwen/qwen2.5-vl-7b-instruct) |

## Source Repository and Issue Reporting

* https://developer.sourcefind.cn/codes/modelzoo/wan2.1_pytorch

## References

* https://github.com/Wan-Video/Wan2.1
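The ModelScope downloads listed under Pretrained Weights can also be scripted so that each checkpoint lands in `models/<name>`, matching the `--ckpt_dir` and `--prompt_extend_model` paths used in the commands above. This is a minimal sketch, assuming the ModelScope CLI is installed (`pip install modelscope`) and that your version supports `modelscope download --model <repo> --local_dir <dir>`. The function `fetch_models`, the array `WAN_REPOS`, and the `DRY_RUN` switch are hypothetical helpers introduced here, not part of this repository.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): download the checkpoints from the weights table
# into a models directory. Assumes the ModelScope CLI is installed.
# Set DRY_RUN=1 to print the download commands instead of running them.

# ModelScope repo IDs taken from the weights table; trim to the ones you need.
WAN_REPOS=(
  "Wan-AI/Wan2.1-T2V-1.3B"
  "Wan-AI/Wan2.1-T2V-14B"
  "Wan-AI/Wan2.1-I2V-14B-720P"
  "Wan-AI/Wan2.1-I2V-14B-480P"
  "Qwen/Qwen2.5-7B-Instruct"
  "Qwen/Qwen2.5-VL-7B-Instruct"
)

fetch_models() {
  local models_dir="${1:-models}"
  mkdir -p "$models_dir"
  local repo target
  for repo in "${WAN_REPOS[@]}"; do
    # Strip the organization prefix, e.g. models/Wan2.1-T2V-1.3B
    target="$models_dir/${repo#*/}"
    if [ -d "$target" ]; then
      echo "skip $target"
      continue
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "modelscope download --model $repo --local_dir $target"
    else
      modelscope download --model "$repo" --local_dir "$target" || return 1
    fi
  done
}
```

For example, `DRY_RUN=1 fetch_models models` prints the commands for checkpoints that are not yet in `models/`, and `fetch_models models` runs them. Any existing directory is treated as complete and skipped, so after an interrupted download, delete the partial directory before re-running.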