Commit e08c4f90 authored by sandy, committed by GitHub

Merge branch 'main' into audio_r2v

parents 12bfd120 6d07a72e
...@@ -25,3 +25,5 @@
build/
dist/
.cache/
server_cache/
app/.gradio/
<div align="center" style="font-family: charter;">
<h1>⚡️ LightX2V:<br> Lightweight Video Generation Inference Framework</h1>

<div align="center" id="lightx2v">
<img alt="logo" src="assets/img_lightx2v.png" width=75%></img>

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ModelTC/lightx2v)
[![Doc](https://img.shields.io/badge/docs-English-99cc2)](https://lightx2v-en.readthedocs.io/en/latest)
[![Doc](https://img.shields.io/badge/文档-中文-99cc2)](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest)
[![Papers](https://img.shields.io/badge/论文集-中文-99cc2)](https://lightx2v-papers-zhcn.readthedocs.io/zh-cn/latest)
[![Docker](https://badgen.net/badge/icon/docker?icon=docker&label)](https://hub.docker.com/r/lightx2v/lightx2v/tags)

**\[ English | [中文](README_zh.md) \]**

</div>

--------------------------------------------------------------------------------
**LightX2V** is an advanced lightweight video generation inference framework engineered to deliver efficient, high-performance video synthesis. This unified platform integrates multiple state-of-the-art video generation techniques and supports diverse generation tasks, including text-to-video (T2V) and image-to-video (I2V). **X2V denotes the transformation of different input modalities (X, such as text or images) into video output (V).**

## 💡 Quick Start

For comprehensive usage instructions, please refer to our documentation: **[English Docs](https://lightx2v-en.readthedocs.io/en/latest/) | [中文文档](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/)**
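If you just want to try it locally, a minimal setup sketch is shown below. It assumes a standard pip-installable source layout and uses a placeholder Docker tag, so please follow the linked documentation for the authoritative steps.

```shell
# Sketch only — see the documentation above for the authoritative setup steps.
git clone https://github.com/ModelTC/lightx2v.git
cd lightx2v
pip install -e .   # assumes a standard pip-installable source layout

# Or pull a prebuilt image (pick a real tag from the Docker Hub page linked above):
docker pull lightx2v/lightx2v:<tag>
```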
## 🤖 Supported Model Ecosystem

### Official Open-Source Models
- [HunyuanVideo](https://huggingface.co/tencent/HunyuanVideo)
- [Wan2.1](https://huggingface.co/Wan-AI/)
- [SkyReels-V2-DF](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-540P)
- [CogVideoX1.5-5B-T2V](https://huggingface.co/THUDM/CogVideoX1.5-5B)

### Quantized Models
- [Wan2.1-T2V-1.3B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-1.3B-Lightx2v)
- [Wan2.1-T2V-14B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-Lightx2v)
- [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-Lightx2v)
- [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v)

### Distilled Models (**🚀 Recommended: 4-step inference**)
- [Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v)
- [Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v)
- [Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v)

### Autoregressive Models
- [Wan2.1-T2V-CausVid](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-CausVid)
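As an illustration, any of the checkpoints above can be fetched from Hugging Face before running inference. The snippet below is a sketch that assumes the `huggingface_hub` CLI is installed; swap in whichever repository id you need.

```shell
pip install -U "huggingface_hub[cli]"
# Example: download one of the 4-step distilled I2V checkpoints listed above
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v \
    --local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v
```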
## 🚀 Core Features

### 🎯 **Ultimate Performance Optimization**
- **🔥 SOTA Inference Speed**: Achieves **~15x** acceleration through step distillation and system-level optimization (single GPU)
- **⚡️ Revolutionary 4-Step Distillation**: Compresses the original 40-50 inference steps down to just 4, with no CFG required
- **🛠️ Advanced Operator Support**: Integrated with cutting-edge operators including [Sage Attention](https://github.com/thu-ml/SageAttention), [Flash Attention](https://github.com/Dao-AILab/flash-attention), [Radial Attention](https://github.com/mit-han-lab/radial-attention), [q8-kernel](https://github.com/KONAKONA666/q8_kernels), [sgl-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel), and [vllm](https://github.com/vllm-project/vllm) (an install sketch follows this list)
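These operator backends are optional dependencies. The commands below are a hedged sketch using the upstream projects' published package names; verify versions and CUDA/PyTorch compatibility against each linked repository. q8-kernel and Radial Attention are omitted here, as they follow their own repositories' install instructions.

```shell
# Optional attention/GEMM backends — check each project's install guide for version constraints
pip install sageattention                      # Sage Attention
pip install flash-attn --no-build-isolation    # Flash Attention
pip install vllm sgl-kernel                    # vLLM kernels and sgl-kernel
```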
### 💾 **Resource-Efficient Deployment**
- **💡 Breaking Hardware Barriers**: Run 14B models for 480P/720P video generation with only **8GB VRAM + 16GB RAM**
- **🔧 Intelligent Parameter Offloading**: Disk-CPU-GPU three-tier offloading architecture with phase- and block-level granular management
- **⚙️ Comprehensive Quantization**: Support for `w8a8-int8`, `w8a8-fp8`, `w4a4-nvfp4`, and other quantization strategies

### 🎨 **Rich Feature Ecosystem**
- **📈 Smart Feature Caching**: Intelligent caching mechanisms that eliminate redundant computation
- **🔄 Parallel Inference**: Multi-GPU parallel processing for higher throughput
- **📱 Flexible Deployment Options**: Support for Gradio, service deployment, ComfyUI, and other deployment methods
- **🎛️ Dynamic Resolution Inference**: Adaptive resolution adjustment for optimal generation quality
## 🏆 Performance Benchmarks

For detailed performance metrics and comparisons, please refer to our [benchmark documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/benchmark_source.md).
[Detailed Service Deployment Guide →](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/deploy_service.html)
## 📚 Technical Documentation
### 📖 **Method Tutorials**
- [Model Quantization](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/quantization.html) - Comprehensive guide to quantization strategies
- [Feature Caching](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/cache.html) - Intelligent caching mechanisms
- [Attention Mechanisms](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/attention.html) - State-of-the-art attention operators
- [Parameter Offloading](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/offload.html) - Three-tier storage architecture
- [Parallel Inference](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/parallel.html) - Multi-GPU acceleration strategies
- [Step Distillation](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/step_distill.html) - 4-step inference technology
### 🛠️ **Deployment Guides**
- [Low-Resource Deployment](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/for_low_resource.html) - Optimized 8GB VRAM solutions
- [Low-Latency Deployment](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/for_low_latency.html) - Ultra-fast inference optimization
- [Gradio Deployment](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/deploy_gradio.html) - Web interface setup
- [Service Deployment](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/deploy_service.html) - Production API service deployment
## 🧾 Contributing Guidelines
We maintain code quality through automated pre-commit hooks to ensure consistent formatting across the project.
> [!TIP]
> **Setup Instructions:**
>
> 1. Install required dependencies:
> ```shell
> pip install ruff pre-commit
> ```
>
> 2. Run before committing:
> ```shell
> pre-commit run --all-files
> ```
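Optionally, you can also register the hook once so the checks run automatically on every `git commit` (standard `pre-commit` behavior, not specific to this repository):

```shell
pre-commit install   # installs the git hook; subsequent commits run the checks automatically
```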
We appreciate your contributions to making LightX2V better!
## 🤝 Acknowledgments
We extend our gratitude to all the model repositories and research communities that inspired and contributed to the development of LightX2V. This framework builds upon the collective efforts of the open-source community.
## 🌟 Star History
[![Star History Chart](https://api.star-history.com/svg?repos=ModelTC/lightx2v&type=Timeline)](https://star-history.com/#ModelTC/lightx2v&Timeline)
## ✏️ Citation
If you find LightX2V useful in your research, please consider citing our work:
```bibtex
@misc{lightx2v,
author = {LightX2V Contributors},
title = {LightX2V: Light Video Generation Inference Framework},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/ModelTC/lightx2v}},
}
```
## 📞 Contact & Support
For questions, suggestions, or support, please feel free to reach out through:
- 🐛 [GitHub Issues](https://github.com/ModelTC/lightx2v/issues) - Bug reports and feature requests
- 💬 [GitHub Discussions](https://github.com/ModelTC/lightx2v/discussions) - Community discussions and Q&A
---
<div align="center">
Built with ❤️ by the LightX2V team
</div>
<div align="center" style="font-family: charter;">
<h1>⚡️ LightX2V:<br> 轻量级视频生成推理框架</h1>
<img alt="logo" src="assets/img_lightx2v.png" width=75%></img>
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ModelTC/lightx2v)
[![Doc](https://img.shields.io/badge/docs-English-99cc2)](https://lightx2v-en.readthedocs.io/en/latest)
[![Doc](https://img.shields.io/badge/文档-中文-99cc2)](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest)
[![Papers](https://img.shields.io/badge/论文集-中文-99cc2)](https://lightx2v-papers-zhcn.readthedocs.io/zh-cn/latest)
[![Docker](https://badgen.net/badge/icon/docker?icon=docker&label)](https://hub.docker.com/r/lightx2v/lightx2v/tags)
**\[ [English](README.md) | 中文 \]**
</div>
--------------------------------------------------------------------------------
**LightX2V** 是一个先进的轻量级视频生成推理框架,专为提供高效、高性能的视频合成解决方案而设计。该统一平台集成了多种前沿的视频生成技术,支持文本生成视频(T2V)和图像生成视频(I2V)等多样化生成任务。**X2V 表示将不同的输入模态(X,如文本或图像)转换为视频输出(V)**
## 💡 快速开始
详细使用说明请参考我们的文档:**[英文文档](https://lightx2v-en.readthedocs.io/en/latest/) | [中文文档](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/)**
## 🤖 支持的模型生态
### 官方开源模型
- [HunyuanVideo](https://huggingface.co/tencent/HunyuanVideo)
- [Wan2.1](https://huggingface.co/Wan-AI/)
- [SkyReels-V2-DF](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-540P)
- [CogVideoX1.5-5B-T2V](https://huggingface.co/THUDM/CogVideoX1.5-5B)
### 量化模型
- [Wan2.1-T2V-1.3B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-1.3B-Lightx2v)
- [Wan2.1-T2V-14B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-Lightx2v)
- [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-Lightx2v)
- [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v)
### 蒸馏模型 (**🚀 推荐:4步推理**)
- [Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v)
- [Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v)
- [Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v)
### 自回归模型
- [Wan2.1-T2V-CausVid](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-CausVid)
## 🚀 核心特性
### 🎯 **极致性能优化**
- **🔥 SOTA推理速度**: 通过步数蒸馏和系统优化实现**15倍**极速加速(单GPU)
- **⚡️ 革命性4步蒸馏**: 将原始40-50步推理压缩至仅需4步,且无需CFG配置
- **🛠️ 先进算子支持**: 集成顶尖算子,包括[Sage Attention](https://github.com/thu-ml/SageAttention)、[Flash Attention](https://github.com/Dao-AILab/flash-attention)、[Radial Attention](https://github.com/mit-han-lab/radial-attention)、[q8-kernel](https://github.com/KONAKONA666/q8_kernels)、[sgl-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel)、[vllm](https://github.com/vllm-project/vllm)
### 💾 **资源高效部署**
- **💡 突破硬件限制**: **仅需8GB显存 + 16GB内存**即可运行14B模型生成480P/720P视频
- **🔧 智能参数卸载**: 先进的磁盘-CPU-GPU三级卸载架构,支持阶段/块级别的精细化管理
- **⚙️ 全面量化支持**: 支持`w8a8-int8`、`w8a8-fp8`、`w4a4-nvfp4`等多种量化策略
### 🎨 **丰富功能生态**
- **📈 智能特征缓存**: 智能缓存机制,消除冗余计算,提升效率
- **🔄 并行推理加速**: 多GPU并行处理,显著提升性能表现
- **📱 灵活部署选择**: 支持Gradio、服务化部署、ComfyUI等多种部署方式
- **🎛️ 动态分辨率推理**: 自适应分辨率调整,优化生成质量
## 🏆 性能基准测试
详细的性能指标和对比分析,请参考我们的[基准测试文档](https://github.com/ModelTC/LightX2V/blob/main/docs/ZH_CN/source/getting_started/benchmark_source.md)
[详细服务部署指南 →](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/deploy_service.html)
## 📚 技术文档
### 📖 **方法教程**
- [模型量化](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/quantization.html) - 量化策略全面指南
- [特征缓存](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/cache.html) - 智能缓存机制详解
- [注意力机制](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/attention.html) - 前沿注意力算子
- [参数卸载](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/offload.html) - 三级存储架构
- [并行推理](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/parallel.html) - 多GPU加速策略
- [步数蒸馏](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/step_distill.html) - 4步推理技术
### 🛠️ **部署指南**
- [低资源场景部署](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/for_low_resource.html) - 优化的8GB显存解决方案
- [低延迟场景部署](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/for_low_latency.html) - 极速推理优化
- [Gradio部署](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/deploy_gradio.html) - Web界面搭建
- [服务化部署](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/deploy_service.html) - 生产级API服务部署
## 🧾 代码贡献指南
我们通过自动化的预提交钩子来保证代码质量,确保项目代码格式的一致性。
> [!TIP]
> **安装说明:**
>
> 1. 安装必要的依赖:
> ```shell
> pip install ruff pre-commit
> ```
>
> 2. 提交前运行:
> ```shell
> pre-commit run --all-files
> ```
感谢您为LightX2V的改进做出贡献!
## 🤝 致谢
我们向所有启发和促进LightX2V开发的模型仓库和研究社区表示诚挚的感谢。此框架基于开源社区的集体努力而构建。
## 🌟 Star 历史
[![Star History Chart](https://api.star-history.com/svg?repos=ModelTC/lightx2v&type=Timeline)](https://star-history.com/#ModelTC/lightx2v&Timeline)
## ✏️ 引用
如果您发现LightX2V对您的研究有用,请考虑引用我们的工作:
```bibtex
@misc{lightx2v,
author = {LightX2V Contributors},
title = {LightX2V: Light Video Generation Inference Framework},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/ModelTC/lightx2v}},
}
```
## 📞 联系与支持
如有任何问题、建议或需要支持,欢迎通过以下方式联系我们:
- 🐛 [GitHub Issues](https://github.com/ModelTC/lightx2v/issues) - 错误报告和功能请求
- 💬 [GitHub Discussions](https://github.com/ModelTC/lightx2v/discussions) - 社区讨论和问答
---
<div align="center">
由 LightX2V 团队用 ❤️ 构建
</div>
-----BEGIN CERTIFICATE-----
MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
-----END CERTIFICATE-----
# Gradio Demo
Please refer to our Gradio deployment docs:
[English doc: Gradio Deployment](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/deploy_gradio.html)
[中文文档: Gradio 部署](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/deploy_gradio.html)
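For a quick local run, the demo under `app/` can also be launched directly. The command below is a sketch: the script name `gradio_demo.py` is a placeholder, but the flags mirror the argument parser visible in the demo code (`--model_cls`, `--model_size`, `--task`, `--server_port`, `--server_name`, plus the model path).

```shell
# Placeholder entry point — see the deployment docs above for the exact script name.
python app/gradio_demo.py \
    --model_path /path/to/Wan2.1-I2V-14B-480P-Lightx2v \
    --model_cls wan2.1 \
    --model_size 14b \
    --task i2v \
    --server_port 7862 \
    --server_name 0.0.0.0
```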
import os import os
import gradio as gr import gradio as gr
import asyncio
import argparse import argparse
import json import json
import torch import torch
...@@ -109,6 +108,24 @@ def get_cpu_memory(): ...@@ -109,6 +108,24 @@ def get_cpu_memory():
return available_bytes / 1024**3 return available_bytes / 1024**3
def cleanup_memory():
gc.collect()
if torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.synchronize()
try:
if hasattr(psutil, "virtual_memory"):
if os.name == "posix":
try:
os.system("sync")
except: # noqa
pass
except: # noqa
pass
def generate_unique_filename(base_dir="./saved_videos"): def generate_unique_filename(base_dir="./saved_videos"):
os.makedirs(base_dir, exist_ok=True) os.makedirs(base_dir, exist_ok=True)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
...@@ -147,11 +164,8 @@ for op_name, is_installed in available_attn_ops: ...@@ -147,11 +164,8 @@ for op_name, is_installed in available_attn_ops:
def run_inference( def run_inference(
model_type,
task,
prompt, prompt,
negative_prompt, negative_prompt,
image_path,
save_video_path, save_video_path,
torch_compile, torch_compile,
infer_steps, infer_steps,
...@@ -175,30 +189,30 @@ def run_inference( ...@@ -175,30 +189,30 @@ def run_inference(
cpu_offload, cpu_offload,
offload_granularity, offload_granularity,
offload_ratio, offload_ratio,
t5_cpu_offload,
unload_modules,
t5_offload_granularity, t5_offload_granularity,
attention_type, attention_type,
quant_op, quant_op,
rotary_chunk, rotary_chunk,
rotary_chunk_size, rotary_chunk_size,
clean_cuda_cache, clean_cuda_cache,
image_path=None,
): ):
cleanup_memory()
quant_op = quant_op.split("(")[0].strip() quant_op = quant_op.split("(")[0].strip()
attention_type = attention_type.split("(")[0].strip() attention_type = attention_type.split("(")[0].strip()
global global_runner, current_config, model_path global global_runner, current_config, model_path, task
global cur_dit_quant_scheme, cur_clip_quant_scheme, cur_t5_quant_scheme, cur_precision_mode, cur_enable_teacache global cur_dit_quant_scheme, cur_clip_quant_scheme, cur_t5_quant_scheme, cur_precision_mode, cur_enable_teacache
if os.path.exists(os.path.join(model_path, "config.json")): if os.path.exists(os.path.join(model_path, "config.json")):
with open(os.path.join(model_path, "config.json"), "r") as f: with open(os.path.join(model_path, "config.json"), "r") as f:
model_config = json.load(f) model_config = json.load(f)
if task == "Image to Video":
task = "i2v"
elif task == "Text to Video":
task = "t2v"
if task == "t2v": if task == "t2v":
if model_type == "Wan2.1 1.3B": if model_size == "1.3b":
# 1.3B # 1.3B
coefficient = [ coefficient = [
[ [
...@@ -293,6 +307,7 @@ def run_inference( ...@@ -293,6 +307,7 @@ def run_inference(
needs_reinit = ( needs_reinit = (
lazy_load lazy_load
or unload_modules
or global_runner is None or global_runner is None
or current_config is None or current_config is None
or cur_dit_quant_scheme is None or cur_dit_quant_scheme is None
...@@ -331,6 +346,8 @@ def run_inference( ...@@ -331,6 +346,8 @@ def run_inference(
if os.path.exists(os.path.join(dit_quantized_ckpt, "config.json")): if os.path.exists(os.path.join(dit_quantized_ckpt, "config.json")):
with open(os.path.join(dit_quantized_ckpt, "config.json"), "r") as f: with open(os.path.join(dit_quantized_ckpt, "config.json"), "r") as f:
quant_model_config = json.load(f) quant_model_config = json.load(f)
else:
quant_model_config = {}
else: else:
mm_type = "Default" mm_type = "Default"
dit_quantized_ckpt = None dit_quantized_ckpt = None
...@@ -361,6 +378,8 @@ def run_inference( ...@@ -361,6 +378,8 @@ def run_inference(
"coefficients": coefficient[0] if use_ret_steps else coefficient[1], "coefficients": coefficient[0] if use_ret_steps else coefficient[1],
"use_ret_steps": use_ret_steps, "use_ret_steps": use_ret_steps,
"teacache_thresh": teacache_thresh, "teacache_thresh": teacache_thresh,
"t5_cpu_offload": t5_cpu_offload,
"unload_modules": unload_modules,
"t5_quantized": is_t5_quant, "t5_quantized": is_t5_quant,
"t5_quantized_ckpt": t5_quant_ckpt, "t5_quantized_ckpt": t5_quant_ckpt,
"t5_quant_scheme": t5_quant_scheme, "t5_quant_scheme": t5_quant_scheme,
...@@ -399,7 +418,6 @@ def run_inference( ...@@ -399,7 +418,6 @@ def run_inference(
config.update({k: v for k, v in vars(args).items()}) config.update({k: v for k, v in vars(args).items()})
config = EasyDict(config) config = EasyDict(config)
config["mode"] = "infer"
config.update(model_config) config.update(model_config)
config.update(quant_model_config) config.update(quant_model_config)
...@@ -429,17 +447,27 @@ def run_inference( ...@@ -429,17 +447,27 @@ def run_inference(
else: else:
runner.config = config runner.config = config
asyncio.run(runner.run_pipeline()) runner.run_pipeline()
if lazy_load: del config, args, model_config, quant_model_config
del runner if "dit_quantized_ckpt" in locals():
torch.cuda.empty_cache() del dit_quantized_ckpt
gc.collect() if "t5_quant_ckpt" in locals():
del t5_quant_ckpt
if "clip_quant_ckpt" in locals():
del clip_quant_ckpt
cleanup_memory()
return save_video_path return save_video_path
def auto_configure(enable_auto_config, model_type, resolution): def handle_lazy_load_change(lazy_load_enabled):
"""Handle lazy_load checkbox change to automatically enable unload_modules"""
return gr.update(value=lazy_load_enabled)
def auto_configure(enable_auto_config, resolution):
default_config = { default_config = {
"torch_compile_val": False, "torch_compile_val": False,
"lazy_load_val": False, "lazy_load_val": False,
...@@ -449,6 +477,8 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -449,6 +477,8 @@ def auto_configure(enable_auto_config, model_type, resolution):
"cpu_offload_val": False, "cpu_offload_val": False,
"offload_granularity_val": "block", "offload_granularity_val": "block",
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_cpu_offload_val": False,
"unload_modules_val": False,
"t5_offload_granularity_val": "model", "t5_offload_granularity_val": "model",
"attention_type_val": attn_op_choices[0][1], "attention_type_val": attn_op_choices[0][1],
"quant_op_val": quant_op_choices[0][1], "quant_op_val": quant_op_choices[0][1],
...@@ -505,7 +535,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -505,7 +535,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
else: else:
res = "480p" res = "480p"
if model_type in ["Wan2.1 14B"]: if model_size == "14b":
is_14b = True is_14b = True
else: else:
is_14b = False is_14b = False
...@@ -513,13 +543,14 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -513,13 +543,14 @@ def auto_configure(enable_auto_config, model_type, resolution):
if res == "720p" and is_14b: if res == "720p" and is_14b:
gpu_rules = [ gpu_rules = [
(80, {}), (80, {}),
(48, {"cpu_offload_val": True, "offload_ratio_val": 0.5}), (48, {"cpu_offload_val": True, "offload_ratio_val": 0.5, "t5_cpu_offload_val": True}),
(40, {"cpu_offload_val": True, "offload_ratio_val": 0.8}), (40, {"cpu_offload_val": True, "offload_ratio_val": 0.8, "t5_cpu_offload_val": True}),
(32, {"cpu_offload_val": True, "offload_ratio_val": 1}), (32, {"cpu_offload_val": True, "offload_ratio_val": 1, "t5_cpu_offload_val": True}),
( (
24, 24,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -530,6 +561,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -530,6 +561,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
16, 16,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -543,6 +575,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -543,6 +575,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
12, 12,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -551,12 +584,14 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -551,12 +584,14 @@ def auto_configure(enable_auto_config, model_type, resolution):
"rotary_chunk_val": True, "rotary_chunk_val": True,
"rotary_chunk_size_val": 100, "rotary_chunk_size_val": 100,
"clean_cuda_cache_val": True, "clean_cuda_cache_val": True,
"use_tiny_vae_val": True,
}, },
), ),
( (
8, 8,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -569,6 +604,8 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -569,6 +604,8 @@ def auto_configure(enable_auto_config, model_type, resolution):
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"dit_quant_scheme_val": quant_type, "dit_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"unload_modules_val": True,
"use_tiny_vae_val": True,
}, },
), ),
] ]
...@@ -576,13 +613,14 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -576,13 +613,14 @@ def auto_configure(enable_auto_config, model_type, resolution):
elif is_14b: elif is_14b:
gpu_rules = [ gpu_rules = [
(80, {}), (80, {}),
(48, {"cpu_offload_val": True, "offload_ratio_val": 0.2}), (48, {"cpu_offload_val": True, "offload_ratio_val": 0.2, "t5_cpu_offload_val": True}),
(40, {"cpu_offload_val": True, "offload_ratio_val": 0.5}), (40, {"cpu_offload_val": True, "offload_ratio_val": 0.5, "t5_cpu_offload_val": True}),
(24, {"cpu_offload_val": True, "offload_ratio_val": 0.8}), (24, {"cpu_offload_val": True, "offload_ratio_val": 0.8, "t5_cpu_offload_val": True}),
( (
16, 16,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -595,6 +633,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -595,6 +633,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
( (
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -604,12 +643,15 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -604,12 +643,15 @@ def auto_configure(enable_auto_config, model_type, resolution):
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"dit_quant_scheme_val": quant_type, "dit_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"unload_modules_val": True,
"rotary_chunk_val": True, "rotary_chunk_val": True,
"rotary_chunk_size_val": 10000, "rotary_chunk_size_val": 10000,
"use_tiny_vae_val": True,
} }
if res == "540p" if res == "540p"
else { else {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -619,11 +661,26 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -619,11 +661,26 @@ def auto_configure(enable_auto_config, model_type, resolution):
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"dit_quant_scheme_val": quant_type, "dit_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"unload_modules_val": True,
"use_tiny_vae_val": True,
} }
), ),
), ),
] ]
else:
gpu_rules = [
(24, {}),
(
8,
{
"t5_cpu_offload_val": True,
"t5_offload_granularity_val": "block",
"t5_quant_scheme_val": quant_type,
},
),
]
if is_14b: if is_14b:
cpu_rules = [ cpu_rules = [
(128, {}), (128, {}),
...@@ -636,6 +693,19 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -636,6 +693,19 @@ def auto_configure(enable_auto_config, model_type, resolution):
"t5_quant_scheme_val": quant_type, "t5_quant_scheme_val": quant_type,
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"unload_modules_val": True,
},
),
]
else:
cpu_rules = [
(64, {}),
(
16,
{
"t5_quant_scheme_val": quant_type,
"unload_modules_val": True,
"use_tiny_vae_val": True,
}, },
), ),
] ]
...@@ -654,12 +724,6 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -654,12 +724,6 @@ def auto_configure(enable_auto_config, model_type, resolution):
def main(): def main():
def update_model_type(task_type):
if task_type == "Image to Video":
return gr.update(choices=["Wan2.1 14B"], value="Wan2.1 14B")
elif task_type == "Text to Video":
return gr.update(choices=["Wan2.1 14B", "Wan2.1 1.3B"], value="Wan2.1 14B")
def toggle_image_input(task): def toggle_image_input(task):
return gr.update(visible=(task == "Image to Video")) return gr.update(visible=(task == "Image to Video"))
...@@ -683,37 +747,15 @@ def main(): ...@@ -683,37 +747,15 @@ def main():
with gr.Group(): with gr.Group():
gr.Markdown("## 📥 Input Parameters") gr.Markdown("## 📥 Input Parameters")
with gr.Row(): if task == "i2v":
task = gr.Dropdown( with gr.Row():
choices=["Image to Video", "Text to Video"], image_path = gr.Image(
value="Image to Video", label="Input Image",
label="Task Type", type="filepath",
) height=300,
model_type = gr.Dropdown( interactive=True,
choices=["Wan2.1 14B"], visible=True,
value="Wan2.1 14B", )
label="Model Type",
)
task.change(
fn=update_model_type,
inputs=task,
outputs=model_type,
)
with gr.Row():
image_path = gr.Image(
label="Input Image",
type="filepath",
height=300,
interactive=True,
visible=True, # Initially visible
)
task.change(
fn=toggle_image_input,
inputs=task,
outputs=image_path,
)
with gr.Row(): with gr.Row():
with gr.Column(): with gr.Column():
...@@ -755,6 +797,13 @@ def main(): ...@@ -755,6 +797,13 @@ def main():
value="832x480", value="832x480",
label="Maximum Resolution", label="Maximum Resolution",
) )
with gr.Column():
enable_auto_config = gr.Checkbox(
label="Auto-configure Inference Options",
value=False,
info="Automatically optimize GPU settings to match the current resolution. After changing the resolution, please re-check this option to prevent potential performance degradation or runtime errors.",
)
with gr.Column(scale=9): with gr.Column(scale=9):
seed = gr.Slider( seed = gr.Slider(
label="Random Seed", label="Random Seed",
...@@ -836,14 +885,6 @@ def main(): ...@@ -836,14 +885,6 @@ def main():
with gr.Tab("⚙️ Advanced Options", id=2): with gr.Tab("⚙️ Advanced Options", id=2):
with gr.Group(elem_classes="advanced-options"): with gr.Group(elem_classes="advanced-options"):
gr.Markdown("### Auto configuration")
with gr.Row():
enable_auto_config = gr.Checkbox(
label="Auto configuration",
value=False,
info="Auto-tune optimization settings for your GPU",
)
gr.Markdown("### GPU Memory Optimization") gr.Markdown("### GPU Memory Optimization")
with gr.Row(): with gr.Row():
rotary_chunk = gr.Checkbox( rotary_chunk = gr.Checkbox(
...@@ -861,6 +902,11 @@ def main(): ...@@ -861,6 +902,11 @@ def main():
info="Controls the chunk size for applying rotary embeddings. Larger values may improve performance but increase memory usage. Only effective if 'rotary_chunk' is checked.", info="Controls the chunk size for applying rotary embeddings. Larger values may improve performance but increase memory usage. Only effective if 'rotary_chunk' is checked.",
) )
unload_modules = gr.Checkbox(
label="Unload Modules",
value=False,
info="Unload modules (T5, CLIP, DIT, etc.) after inference to reduce GPU/CPU memory usage",
)
clean_cuda_cache = gr.Checkbox( clean_cuda_cache = gr.Checkbox(
label="Clean CUDA Memory Cache", label="Clean CUDA Memory Cache",
value=False, value=False,
...@@ -895,6 +941,12 @@ def main(): ...@@ -895,6 +941,12 @@ def main():
value=1.0, value=1.0,
info="Controls how much of the Dit model is offloaded to the CPU", info="Controls how much of the Dit model is offloaded to the CPU",
) )
t5_cpu_offload = gr.Checkbox(
label="T5 CPU Offloading",
value=False,
info="Offload the T5 Encoder model to CPU to reduce GPU memory usage",
)
t5_offload_granularity = gr.Dropdown( t5_offload_granularity = gr.Dropdown(
label="T5 Encoder Offload Granularity", label="T5 Encoder Offload Granularity",
choices=["model", "block"], choices=["model", "block"],
...@@ -983,7 +1035,7 @@ def main(): ...@@ -983,7 +1035,7 @@ def main():
enable_auto_config.change( enable_auto_config.change(
fn=auto_configure, fn=auto_configure,
inputs=[enable_auto_config, model_type, resolution], inputs=[enable_auto_config, resolution],
outputs=[ outputs=[
torch_compile, torch_compile,
lazy_load, lazy_load,
...@@ -993,6 +1045,8 @@ def main(): ...@@ -993,6 +1045,8 @@ def main():
cpu_offload, cpu_offload,
offload_granularity, offload_granularity,
offload_ratio, offload_ratio,
t5_cpu_offload,
unload_modules,
t5_offload_granularity, t5_offload_granularity,
attention_type, attention_type,
quant_op, quant_op,
...@@ -1008,46 +1062,92 @@ def main(): ...@@ -1008,46 +1062,92 @@ def main():
], ],
) )
infer_btn.click( lazy_load.change(
fn=run_inference, fn=handle_lazy_load_change,
inputs=[ inputs=[lazy_load],
model_type, outputs=[unload_modules],
task, )
prompt, if task == "i2v":
negative_prompt, infer_btn.click(
image_path, fn=run_inference,
save_video_path, inputs=[
torch_compile, prompt,
infer_steps, negative_prompt,
num_frames, save_video_path,
resolution, torch_compile,
seed, infer_steps,
sample_shift, num_frames,
enable_teacache, resolution,
teacache_thresh, seed,
use_ret_steps, sample_shift,
enable_cfg, enable_teacache,
cfg_scale, teacache_thresh,
dit_quant_scheme, use_ret_steps,
t5_quant_scheme, enable_cfg,
clip_quant_scheme, cfg_scale,
fps, dit_quant_scheme,
use_tiny_vae, t5_quant_scheme,
use_tiling_vae, clip_quant_scheme,
lazy_load, fps,
precision_mode, use_tiny_vae,
cpu_offload, use_tiling_vae,
offload_granularity, lazy_load,
offload_ratio, precision_mode,
t5_offload_granularity, cpu_offload,
attention_type, offload_granularity,
quant_op, offload_ratio,
rotary_chunk, t5_cpu_offload,
rotary_chunk_size, unload_modules,
clean_cuda_cache, t5_offload_granularity,
], attention_type,
outputs=output_video, quant_op,
) rotary_chunk,
rotary_chunk_size,
clean_cuda_cache,
image_path,
],
outputs=output_video,
)
else:
infer_btn.click(
fn=run_inference,
inputs=[
prompt,
negative_prompt,
save_video_path,
torch_compile,
infer_steps,
num_frames,
resolution,
seed,
sample_shift,
enable_teacache,
teacache_thresh,
use_ret_steps,
enable_cfg,
cfg_scale,
dit_quant_scheme,
t5_quant_scheme,
clip_quant_scheme,
fps,
use_tiny_vae,
use_tiling_vae,
lazy_load,
precision_mode,
cpu_offload,
offload_granularity,
offload_ratio,
t5_cpu_offload,
unload_modules,
t5_offload_granularity,
attention_type,
quant_op,
rotary_chunk,
rotary_chunk_size,
clean_cuda_cache,
],
outputs=output_video,
)
demo.launch(share=True, server_port=args.server_port, server_name=args.server_name) demo.launch(share=True, server_port=args.server_port, server_name=args.server_name)
...@@ -1062,12 +1162,16 @@ if __name__ == "__main__": ...@@ -1062,12 +1162,16 @@ if __name__ == "__main__":
default="wan2.1", default="wan2.1",
help="Model class to use", help="Model class to use",
) )
parser.add_argument("--model_size", type=str, required=True, choices=["14b", "1.3b"], help="Model type to use")
parser.add_argument("--task", type=str, required=True, choices=["i2v", "t2v"], help="Specify the task type. 'i2v' for image-to-video translation, 't2v' for text-to-video generation.")
parser.add_argument("--server_port", type=int, default=7862, help="Server port") parser.add_argument("--server_port", type=int, default=7862, help="Server port")
parser.add_argument("--server_name", type=str, default="0.0.0.0", help="Server ip") parser.add_argument("--server_name", type=str, default="0.0.0.0", help="Server ip")
args = parser.parse_args() args = parser.parse_args()
global model_path, model_cls global model_path, model_cls, model_size
model_path = args.model_path model_path = args.model_path
model_cls = args.model_cls model_cls = args.model_cls
model_size = args.model_size
task = args.task
main() main()
import os import os
import gradio as gr import gradio as gr
import asyncio
import argparse import argparse
import json import json
import torch import torch
...@@ -13,7 +12,6 @@ import importlib.util ...@@ -13,7 +12,6 @@ import importlib.util
import psutil import psutil
import random import random
logger.add( logger.add(
"inference_logs.log", "inference_logs.log",
rotation="100 MB", rotation="100 MB",
...@@ -98,7 +96,7 @@ def get_gpu_memory(gpu_idx=0): ...@@ -98,7 +96,7 @@ def get_gpu_memory(gpu_idx=0):
try: try:
with torch.cuda.device(gpu_idx): with torch.cuda.device(gpu_idx):
memory_info = torch.cuda.mem_get_info() memory_info = torch.cuda.mem_get_info()
total_memory = memory_info[1] / (1024**3) total_memory = memory_info[1] / (1024**3) # Convert bytes to GB
return total_memory return total_memory
except Exception as e: except Exception as e:
logger.warning(f"获取GPU内存失败: {e}") logger.warning(f"获取GPU内存失败: {e}")
...@@ -110,6 +108,26 @@ def get_cpu_memory(): ...@@ -110,6 +108,26 @@ def get_cpu_memory():
return available_bytes / 1024**3 return available_bytes / 1024**3
def cleanup_memory():
gc.collect()
if torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.synchronize()
try:
import psutil
if hasattr(psutil, "virtual_memory"):
if os.name == "posix":
try:
os.system("sync")
except: # noqa
pass
except: # noqa
pass
def generate_unique_filename(base_dir="./saved_videos"): def generate_unique_filename(base_dir="./saved_videos"):
os.makedirs(base_dir, exist_ok=True) os.makedirs(base_dir, exist_ok=True)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
...@@ -148,11 +166,8 @@ for op_name, is_installed in available_attn_ops: ...@@ -148,11 +166,8 @@ for op_name, is_installed in available_attn_ops:
def run_inference( def run_inference(
model_type,
task,
prompt, prompt,
negative_prompt, negative_prompt,
image_path,
save_video_path, save_video_path,
torch_compile, torch_compile,
infer_steps, infer_steps,
...@@ -176,30 +191,30 @@ def run_inference( ...@@ -176,30 +191,30 @@ def run_inference(
cpu_offload, cpu_offload,
offload_granularity, offload_granularity,
offload_ratio, offload_ratio,
t5_cpu_offload,
unload_modules,
t5_offload_granularity, t5_offload_granularity,
attention_type, attention_type,
quant_op, quant_op,
rotary_chunk, rotary_chunk,
rotary_chunk_size, rotary_chunk_size,
clean_cuda_cache, clean_cuda_cache,
image_path=None,
): ):
cleanup_memory()
quant_op = quant_op.split("(")[0].strip() quant_op = quant_op.split("(")[0].strip()
attention_type = attention_type.split("(")[0].strip() attention_type = attention_type.split("(")[0].strip()
global global_runner, current_config, model_path global global_runner, current_config, model_path, task
global cur_dit_quant_scheme, cur_clip_quant_scheme, cur_t5_quant_scheme, cur_precision_mode, cur_enable_teacache global cur_dit_quant_scheme, cur_clip_quant_scheme, cur_t5_quant_scheme, cur_precision_mode, cur_enable_teacache
if os.path.exists(os.path.join(model_path, "config.json")): if os.path.exists(os.path.join(model_path, "config.json")):
with open(os.path.join(model_path, "config.json"), "r") as f: with open(os.path.join(model_path, "config.json"), "r") as f:
model_config = json.load(f) model_config = json.load(f)
if task == "图像生成视频":
task = "i2v"
elif task == "文本生成视频":
task = "t2v"
if task == "t2v": if task == "t2v":
if model_type == "Wan2.1 1.3B": if model_size == "1.3b":
# 1.3B # 1.3B
coefficient = [ coefficient = [
[ [
...@@ -294,6 +309,7 @@ def run_inference( ...@@ -294,6 +309,7 @@ def run_inference(
needs_reinit = ( needs_reinit = (
lazy_load lazy_load
or unload_modules
or global_runner is None or global_runner is None
or current_config is None or current_config is None
or cur_dit_quant_scheme is None or cur_dit_quant_scheme is None
...@@ -332,6 +348,8 @@ def run_inference( ...@@ -332,6 +348,8 @@ def run_inference(
if os.path.exists(os.path.join(dit_quantized_ckpt, "config.json")): if os.path.exists(os.path.join(dit_quantized_ckpt, "config.json")):
with open(os.path.join(dit_quantized_ckpt, "config.json"), "r") as f: with open(os.path.join(dit_quantized_ckpt, "config.json"), "r") as f:
quant_model_config = json.load(f) quant_model_config = json.load(f)
else:
quant_model_config = {}
else: else:
mm_type = "Default" mm_type = "Default"
dit_quantized_ckpt = None dit_quantized_ckpt = None
...@@ -362,6 +380,8 @@ def run_inference( ...@@ -362,6 +380,8 @@ def run_inference(
"coefficients": coefficient[0] if use_ret_steps else coefficient[1], "coefficients": coefficient[0] if use_ret_steps else coefficient[1],
"use_ret_steps": use_ret_steps, "use_ret_steps": use_ret_steps,
"teacache_thresh": teacache_thresh, "teacache_thresh": teacache_thresh,
"t5_cpu_offload": t5_cpu_offload,
"unload_modules": unload_modules,
"t5_quantized": is_t5_quant, "t5_quantized": is_t5_quant,
"t5_quantized_ckpt": t5_quant_ckpt, "t5_quantized_ckpt": t5_quant_ckpt,
"t5_quant_scheme": t5_quant_scheme, "t5_quant_scheme": t5_quant_scheme,
...@@ -400,13 +420,13 @@ def run_inference( ...@@ -400,13 +420,13 @@ def run_inference(
config.update({k: v for k, v in vars(args).items()}) config.update({k: v for k, v in vars(args).items()})
config = EasyDict(config) config = EasyDict(config)
config["mode"] = "infer"
config.update(model_config) config.update(model_config)
config.update(quant_model_config) config.update(quant_model_config)
logger.info(f"使用模型: {model_path}") logger.info(f"使用模型: {model_path}")
logger.info(f"推理配置:\n{json.dumps(config, indent=4, ensure_ascii=False)}") logger.info(f"推理配置:\n{json.dumps(config, indent=4, ensure_ascii=False)}")
# Initialize or reuse the runner
runner = global_runner runner = global_runner
if needs_reinit: if needs_reinit:
if runner is not None: if runner is not None:
...@@ -429,17 +449,27 @@ def run_inference( ...@@ -429,17 +449,27 @@ def run_inference(
else: else:
runner.config = config runner.config = config
asyncio.run(runner.run_pipeline()) runner.run_pipeline()
if lazy_load: del config, args, model_config, quant_model_config
del runner if "dit_quantized_ckpt" in locals():
torch.cuda.empty_cache() del dit_quantized_ckpt
gc.collect() if "t5_quant_ckpt" in locals():
del t5_quant_ckpt
if "clip_quant_ckpt" in locals():
del clip_quant_ckpt
cleanup_memory()
return save_video_path return save_video_path
def auto_configure(enable_auto_config, model_type, resolution): def handle_lazy_load_change(lazy_load_enabled):
"""Handle lazy_load checkbox change to automatically enable unload_modules"""
return gr.update(value=lazy_load_enabled)
def auto_configure(enable_auto_config, resolution):
default_config = { default_config = {
"torch_compile_val": False, "torch_compile_val": False,
"lazy_load_val": False, "lazy_load_val": False,
...@@ -449,6 +479,8 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -449,6 +479,8 @@ def auto_configure(enable_auto_config, model_type, resolution):
"cpu_offload_val": False, "cpu_offload_val": False,
"offload_granularity_val": "block", "offload_granularity_val": "block",
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_cpu_offload_val": False,
"unload_modules_val": False,
"t5_offload_granularity_val": "model", "t5_offload_granularity_val": "model",
"attention_type_val": attn_op_choices[0][1], "attention_type_val": attn_op_choices[0][1],
"quant_op_val": quant_op_choices[0][1], "quant_op_val": quant_op_choices[0][1],
...@@ -505,7 +537,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -505,7 +537,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
else: else:
res = "480p" res = "480p"
if model_type in ["Wan2.1 14B"]: if model_size == "14b":
is_14b = True is_14b = True
else: else:
is_14b = False is_14b = False
...@@ -513,13 +545,14 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -513,13 +545,14 @@ def auto_configure(enable_auto_config, model_type, resolution):
if res == "720p" and is_14b: if res == "720p" and is_14b:
gpu_rules = [ gpu_rules = [
(80, {}), (80, {}),
(48, {"cpu_offload_val": True, "offload_ratio_val": 0.5}), (48, {"cpu_offload_val": True, "offload_ratio_val": 0.5, "t5_cpu_offload_val": True}),
(40, {"cpu_offload_val": True, "offload_ratio_val": 0.8}), (40, {"cpu_offload_val": True, "offload_ratio_val": 0.8, "t5_cpu_offload_val": True}),
(32, {"cpu_offload_val": True, "offload_ratio_val": 1}), (32, {"cpu_offload_val": True, "offload_ratio_val": 1, "t5_cpu_offload_val": True}),
( (
24, 24,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -530,6 +563,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -530,6 +563,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
16, 16,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -543,6 +577,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -543,6 +577,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
12, 12,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -551,12 +586,14 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -551,12 +586,14 @@ def auto_configure(enable_auto_config, model_type, resolution):
"rotary_chunk_val": True, "rotary_chunk_val": True,
"rotary_chunk_size_val": 100, "rotary_chunk_size_val": 100,
"clean_cuda_cache_val": True, "clean_cuda_cache_val": True,
"use_tiny_vae_val": True,
}, },
), ),
( (
8, 8,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -569,6 +606,8 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -569,6 +606,8 @@ def auto_configure(enable_auto_config, model_type, resolution):
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"dit_quant_scheme_val": quant_type, "dit_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"unload_modules_val": True,
"use_tiny_vae_val": True,
}, },
), ),
] ]
...@@ -576,13 +615,14 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -576,13 +615,14 @@ def auto_configure(enable_auto_config, model_type, resolution):
elif is_14b: elif is_14b:
gpu_rules = [ gpu_rules = [
(80, {}), (80, {}),
(48, {"cpu_offload_val": True, "offload_ratio_val": 0.2}), (48, {"cpu_offload_val": True, "offload_ratio_val": 0.2, "t5_cpu_offload_val": True}),
(40, {"cpu_offload_val": True, "offload_ratio_val": 0.5}), (40, {"cpu_offload_val": True, "offload_ratio_val": 0.5, "t5_cpu_offload_val": True}),
(24, {"cpu_offload_val": True, "offload_ratio_val": 0.8}), (24, {"cpu_offload_val": True, "offload_ratio_val": 0.8, "t5_cpu_offload_val": True}),
( (
16, 16,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -595,6 +635,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -595,6 +635,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
( (
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -604,12 +645,15 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -604,12 +645,15 @@ def auto_configure(enable_auto_config, model_type, resolution):
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"dit_quant_scheme_val": quant_type, "dit_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"unload_modules_val": True,
"rotary_chunk_val": True, "rotary_chunk_val": True,
"rotary_chunk_size_val": 10000, "rotary_chunk_size_val": 10000,
"use_tiny_vae_val": True,
} }
if res == "540p" if res == "540p"
else { else {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -619,11 +663,26 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -619,11 +663,26 @@ def auto_configure(enable_auto_config, model_type, resolution):
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"dit_quant_scheme_val": quant_type, "dit_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"unload_modules_val": True,
"use_tiny_vae_val": True,
} }
), ),
), ),
] ]
else:
gpu_rules = [
(24, {}),
(
8,
{
"t5_cpu_offload_val": True,
"t5_offload_granularity_val": "block",
"t5_quant_scheme_val": quant_type,
},
),
]
if is_14b: if is_14b:
cpu_rules = [ cpu_rules = [
(128, {}), (128, {}),
...@@ -636,6 +695,19 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -636,6 +695,19 @@ def auto_configure(enable_auto_config, model_type, resolution):
"t5_quant_scheme_val": quant_type, "t5_quant_scheme_val": quant_type,
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"unload_modules_val": True,
},
),
]
else:
cpu_rules = [
(64, {}),
(
16,
{
"t5_quant_scheme_val": quant_type,
"unload_modules_val": True,
"use_tiny_vae_val": True,
}, },
), ),
] ]
...@@ -654,17 +726,11 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -654,17 +726,11 @@ def auto_configure(enable_auto_config, model_type, resolution):
def main(): def main():
def update_model_type(task_type):
if task_type == "图像生成视频":
return gr.update(choices=["Wan2.1 14B"], value="Wan2.1 14B")
elif task_type == "文本生成视频":
return gr.update(choices=["Wan2.1 14B", "Wan2.1 1.3B"], value="Wan2.1 14B")
def toggle_image_input(task): def toggle_image_input(task):
return gr.update(visible=(task == "图像生成视频")) return gr.update(visible=(task == "i2v"))
with gr.Blocks( with gr.Blocks(
title="Lightx2v (轻量级视频生成推理引擎)", title="Lightx2v (轻量级视频推理和生成引擎)",
css=""" css="""
.main-content { max-width: 1400px; margin: auto; } .main-content { max-width: 1400px; margin: auto; }
.output-video { max-height: 650px; } .output-video { max-height: 650px; }
...@@ -683,37 +749,15 @@ def main(): ...@@ -683,37 +749,15 @@ def main():
with gr.Group(): with gr.Group():
gr.Markdown("## 📥 输入参数") gr.Markdown("## 📥 输入参数")
with gr.Row(): if task == "i2v":
task = gr.Dropdown( with gr.Row():
choices=["图像生成视频", "文本生成视频"], image_path = gr.Image(
value="图像生成视频", label="输入图像",
label="任务类型", type="filepath",
) height=300,
model_type = gr.Dropdown( interactive=True,
choices=["Wan2.1 14B"], visible=True,
value="Wan2.1 14B", )
label="模型类型",
)
task.change(
fn=update_model_type,
inputs=task,
outputs=model_type,
)
with gr.Row():
image_path = gr.Image(
label="输入图像",
type="filepath",
height=300,
interactive=True,
visible=True,
)
task.change(
fn=toggle_image_input,
inputs=task,
outputs=image_path,
)
with gr.Row(): with gr.Row():
with gr.Column(): with gr.Column():
...@@ -755,6 +799,11 @@ def main(): ...@@ -755,6 +799,11 @@ def main():
value="832x480", value="832x480",
label="最大分辨率", label="最大分辨率",
) )
with gr.Column():
enable_auto_config = gr.Checkbox(
label="自动配置推理选项", value=False, info="自动优化GPU设置以匹配当前分辨率。修改分辨率后,请重新勾选此选项,否则可能导致性能下降或运行失败。"
)
with gr.Column(scale=9): with gr.Column(scale=9):
seed = gr.Slider( seed = gr.Slider(
label="随机种子", label="随机种子",
...@@ -764,9 +813,10 @@ def main(): ...@@ -764,9 +813,10 @@ def main():
value=generate_random_seed(), value=generate_random_seed(),
) )
with gr.Column(scale=1): with gr.Column(scale=1):
randomize_btn = gr.Button("🎲 生成随机种子", variant="secondary") randomize_btn = gr.Button("🎲 随机化", variant="secondary")
randomize_btn.click(fn=generate_random_seed, inputs=None, outputs=seed) randomize_btn.click(fn=generate_random_seed, inputs=None, outputs=seed)
with gr.Column(): with gr.Column():
infer_steps = gr.Slider( infer_steps = gr.Slider(
label="推理步数", label="推理步数",
...@@ -774,7 +824,7 @@ def main(): ...@@ -774,7 +824,7 @@ def main():
maximum=100, maximum=100,
step=1, step=1,
value=40, value=40,
info="视频生成的推理步数。增加步数可能提高质量但降低速度", info="视频生成的推理步数。增加步数可能提高质量但降低速度",
) )
enable_cfg = gr.Checkbox( enable_cfg = gr.Checkbox(
...@@ -788,7 +838,7 @@ def main(): ...@@ -788,7 +838,7 @@ def main():
maximum=10, maximum=10,
step=1, step=1,
value=5, value=5,
info="控制提示词的影响强度。值越高,提示词的影响越大", info="控制提示词的影响强度。值越高,提示词的影响越大",
) )
sample_shift = gr.Slider( sample_shift = gr.Slider(
label="分布偏移", label="分布偏移",
...@@ -796,7 +846,7 @@ def main(): ...@@ -796,7 +846,7 @@ def main():
minimum=0, minimum=0,
maximum=10, maximum=10,
step=1, step=1,
info="控制样本分布偏移的程度。值越大表示偏移越明显", info="控制样本分布偏移的程度。值越大表示偏移越明显",
) )
fps = gr.Slider( fps = gr.Slider(
...@@ -805,7 +855,7 @@ def main(): ...@@ -805,7 +855,7 @@ def main():
maximum=30, maximum=30,
step=1, step=1,
value=16, value=16,
info="视频的每秒帧数。较高的FPS会产生更流畅的视频", info="视频的每秒帧数。较高的FPS会产生更流畅的视频",
) )
num_frames = gr.Slider( num_frames = gr.Slider(
label="总帧数", label="总帧数",
...@@ -813,7 +863,7 @@ def main(): ...@@ -813,7 +863,7 @@ def main():
maximum=120, maximum=120,
step=1, step=1,
value=81, value=81,
info="视频中的总帧数。更多帧数会产生更长的视频", info="视频中的总帧数。更多帧数会产生更长的视频",
) )
save_video_path = gr.Textbox( save_video_path = gr.Textbox(
...@@ -835,14 +885,6 @@ def main(): ...@@ -835,14 +885,6 @@ def main():
with gr.Tab("⚙️ 高级选项", id=2): with gr.Tab("⚙️ 高级选项", id=2):
with gr.Group(elem_classes="advanced-options"): with gr.Group(elem_classes="advanced-options"):
gr.Markdown("### 自动配置")
with gr.Row():
enable_auto_config = gr.Checkbox(
label="自动配置",
value=False,
info="自动调整优化设置以适应您的GPU",
)
gr.Markdown("### GPU内存优化") gr.Markdown("### GPU内存优化")
with gr.Row(): with gr.Row():
rotary_chunk = gr.Checkbox( rotary_chunk = gr.Checkbox(
...@@ -857,13 +899,17 @@ def main(): ...@@ -857,13 +899,17 @@ def main():
minimum=100, minimum=100,
maximum=10000, maximum=10000,
step=100, step=100,
info="控制应用旋转编码的块大小, 较大的值可能提高性能但增加内存使用, 仅在'rotary_chunk'勾选时有效", info="控制应用旋转编码的块大小。较大的值可能提高性能但增加内存使用。仅在'rotary_chunk'勾选时有效。",
)
unload_modules = gr.Checkbox(
label="卸载模块",
value=False,
info="推理后卸载模块(T5、CLIP、DIT等)以减少GPU/CPU内存使用",
) )
clean_cuda_cache = gr.Checkbox( clean_cuda_cache = gr.Checkbox(
label="清理CUDA内存缓存", label="清理CUDA内存缓存",
value=False, value=False,
info="及时释放GPU内存, 但会减慢推理速度。", info="启用时,及时释放GPU内存但会减慢推理速度。",
) )
gr.Markdown("### 异步卸载") gr.Markdown("### 异步卸载")
...@@ -877,14 +923,14 @@ def main(): ...@@ -877,14 +923,14 @@ def main():
lazy_load = gr.Checkbox( lazy_load = gr.Checkbox(
label="启用延迟加载", label="启用延迟加载",
value=False, value=False,
info="在推理过程中延迟加载模型组件, 仅在'cpu_offload'勾选和使用量化Dit模型时有效", info="在推理过程中延迟加载模型组件。需要CPU加载和DIT量化。",
) )
offload_granularity = gr.Dropdown( offload_granularity = gr.Dropdown(
label="Dit卸载粒度", label="Dit卸载粒度",
choices=["block", "phase"], choices=["block", "phase"],
value="phase", value="phase",
info="设置Dit模型卸载粒度: 块或计算阶段", info="设置Dit模型卸载粒度块或计算阶段",
) )
offload_ratio = gr.Slider( offload_ratio = gr.Slider(
label="Dit模型卸载比例", label="Dit模型卸载比例",
...@@ -894,6 +940,11 @@ def main(): ...@@ -894,6 +940,11 @@ def main():
value=1.0, value=1.0,
info="控制将多少Dit模型卸载到CPU", info="控制将多少Dit模型卸载到CPU",
) )
t5_cpu_offload = gr.Checkbox(
label="T5 CPU卸载",
value=False,
info="将T5编码器模型卸载到CPU以减少GPU内存使用",
)
t5_offload_granularity = gr.Dropdown( t5_offload_granularity = gr.Dropdown(
label="T5编码器卸载粒度", label="T5编码器卸载粒度",
choices=["model", "block"], choices=["model", "block"],
...@@ -926,25 +977,25 @@ def main(): ...@@ -926,25 +977,25 @@ def main():
label="Dit", label="Dit",
choices=["fp8", "int8", "bf16"], choices=["fp8", "int8", "bf16"],
value="bf16", value="bf16",
info="Dit模型的推理精度", info="Dit模型的量化精度",
) )
t5_quant_scheme = gr.Dropdown( t5_quant_scheme = gr.Dropdown(
label="T5编码器", label="T5编码器",
choices=["fp8", "int8", "bf16"], choices=["fp8", "int8", "bf16"],
value="bf16", value="bf16",
info="T5编码器模型的推理精度", info="T5编码器模型的量化精度",
) )
clip_quant_scheme = gr.Dropdown( clip_quant_scheme = gr.Dropdown(
label="Clip编码器", label="Clip编码器",
choices=["fp8", "int8", "fp16"], choices=["fp8", "int8", "fp16"],
value="fp16", value="fp16",
info="Clip编码器的推理精度", info="Clip编码器的量化精度",
) )
precision_mode = gr.Dropdown( precision_mode = gr.Dropdown(
label="敏感层精度", label="敏感层精度模式",
choices=["fp32", "bf16"], choices=["fp32", "bf16"],
value="fp32", value="fp32",
info="选择用于敏感层(如norm层和embedding层)的数值精度", info="选择用于关键模型组件(如归一化和嵌入层)的数值精度。FP32提供更高精度,而BF16在兼容硬件上提高性能。",
) )
gr.Markdown("### 变分自编码器(VAE)") gr.Markdown("### 变分自编码器(VAE)")
...@@ -982,7 +1033,7 @@ def main(): ...@@ -982,7 +1033,7 @@ def main():
enable_auto_config.change( enable_auto_config.change(
fn=auto_configure, fn=auto_configure,
inputs=[enable_auto_config, model_type, resolution], inputs=[enable_auto_config, resolution],
outputs=[ outputs=[
torch_compile, torch_compile,
lazy_load, lazy_load,
...@@ -992,6 +1043,8 @@ def main(): ...@@ -992,6 +1043,8 @@ def main():
cpu_offload, cpu_offload,
offload_granularity, offload_granularity,
offload_ratio, offload_ratio,
t5_cpu_offload,
unload_modules,
t5_offload_granularity, t5_offload_granularity,
attention_type, attention_type,
quant_op, quant_op,
...@@ -1007,46 +1060,92 @@ def main(): ...@@ -1007,46 +1060,92 @@ def main():
], ],
) )
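The `auto_configure` callback wired above is defined earlier in gradio_demo_zh.py and is not part of this hunk. For orientation, a minimal sketch of the shape such a Gradio `.change()` callback takes — the name and the resolution heuristic below are illustrative assumptions, not the actual implementation, and the real function returns one update per entry in the long `outputs=[...]` list:

```python
import gradio as gr

# Sketch only: one gr.update(...) must be returned per output component,
# in the same order as the outputs=[...] list in the wiring above.
def auto_configure_sketch(enabled, resolution):
    if not enabled:
        # Leave every control unchanged when the checkbox is cleared.
        return [gr.update() for _ in range(3)]
    low_memory = resolution in ("832x480", "480x832")  # assumed heuristic
    return [
        gr.update(value=not low_memory),  # torch_compile
        gr.update(value=low_memory),      # lazy_load
        gr.update(value=low_memory),      # cpu_offload
        # ...and so on for the remaining outputs (offload_ratio, quant schemes, etc.)
    ]
```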
infer_btn.click( lazy_load.change(
fn=run_inference, fn=handle_lazy_load_change,
inputs=[ inputs=[lazy_load],
model_type, outputs=[unload_modules],
task, )
prompt, if task == "i2v":
negative_prompt, infer_btn.click(
image_path, fn=run_inference,
save_video_path, inputs=[
torch_compile, prompt,
infer_steps, negative_prompt,
num_frames, save_video_path,
resolution, torch_compile,
seed, infer_steps,
sample_shift, num_frames,
enable_teacache, resolution,
teacache_thresh, seed,
use_ret_steps, sample_shift,
enable_cfg, enable_teacache,
cfg_scale, teacache_thresh,
dit_quant_scheme, use_ret_steps,
t5_quant_scheme, enable_cfg,
clip_quant_scheme, cfg_scale,
fps, dit_quant_scheme,
use_tiny_vae, t5_quant_scheme,
use_tiling_vae, clip_quant_scheme,
lazy_load, fps,
precision_mode, use_tiny_vae,
cpu_offload, use_tiling_vae,
offload_granularity, lazy_load,
offload_ratio, precision_mode,
t5_offload_granularity, cpu_offload,
attention_type, offload_granularity,
quant_op, offload_ratio,
rotary_chunk, t5_cpu_offload,
rotary_chunk_size, unload_modules,
clean_cuda_cache, t5_offload_granularity,
], attention_type,
outputs=output_video, quant_op,
) rotary_chunk,
rotary_chunk_size,
clean_cuda_cache,
image_path,
],
outputs=output_video,
)
else:
infer_btn.click(
fn=run_inference,
inputs=[
prompt,
negative_prompt,
save_video_path,
torch_compile,
infer_steps,
num_frames,
resolution,
seed,
sample_shift,
enable_teacache,
teacache_thresh,
use_ret_steps,
enable_cfg,
cfg_scale,
dit_quant_scheme,
t5_quant_scheme,
clip_quant_scheme,
fps,
use_tiny_vae,
use_tiling_vae,
lazy_load,
precision_mode,
cpu_offload,
offload_granularity,
offload_ratio,
t5_cpu_offload,
unload_modules,
t5_offload_granularity,
attention_type,
quant_op,
rotary_chunk,
rotary_chunk_size,
clean_cuda_cache,
],
outputs=output_video,
)
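Both branches above pass `run_inference` the same arguments except for the trailing `image_path` in the i2v case. A possible tightening — a sketch, not part of this commit — is to build the shared list once so the two branches cannot drift apart:

```python
# Sketch only; uses the same component variables that are in scope above.
common_inputs = [
    prompt, negative_prompt, save_video_path, torch_compile, infer_steps,
    num_frames, resolution, seed, sample_shift, enable_teacache,
    teacache_thresh, use_ret_steps, enable_cfg, cfg_scale, dit_quant_scheme,
    t5_quant_scheme, clip_quant_scheme, fps, use_tiny_vae, use_tiling_vae,
    lazy_load, precision_mode, cpu_offload, offload_granularity, offload_ratio,
    t5_cpu_offload, unload_modules, t5_offload_granularity, attention_type,
    quant_op, rotary_chunk, rotary_chunk_size, clean_cuda_cache,
]
infer_inputs = common_inputs + [image_path] if task == "i2v" else common_inputs
infer_btn.click(fn=run_inference, inputs=infer_inputs, outputs=output_video)
```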
demo.launch(share=True, server_port=args.server_port, server_name=args.server_name) demo.launch(share=True, server_port=args.server_port, server_name=args.server_name)
...@@ -1061,12 +1160,16 @@ if __name__ == "__main__": ...@@ -1061,12 +1160,16 @@ if __name__ == "__main__":
default="wan2.1", default="wan2.1",
help="要使用的模型类别", help="要使用的模型类别",
) )
parser.add_argument("--model_size", type=str, required=True, choices=["14b", "1.3b"], help="模型大小:14b 或 1.3b")
parser.add_argument("--task", type=str, required=True, choices=["i2v", "t2v"], help="指定任务类型。'i2v'用于图像到视频转换,'t2v'用于文本到视频生成。")
parser.add_argument("--server_port", type=int, default=7862, help="服务器端口") parser.add_argument("--server_port", type=int, default=7862, help="服务器端口")
parser.add_argument("--server_name", type=str, default="0.0.0.0", help="服务器IP") parser.add_argument("--server_name", type=str, default="0.0.0.0", help="服务器IP")
args = parser.parse_args() args = parser.parse_args()
global model_path, model_cls global model_path, model_cls, model_size
model_path = args.model_path model_path = args.model_path
model_cls = args.model_cls model_cls = args.model_cls
model_size = args.model_size
task = args.task
main() main()
#!/bin/bash #!/bin/bash
lightx2v_path=/mtc/gushiqiao/llmc_workspace/lightx2v_new/lightx2v # Lightx2v Gradio Demo Startup Script
model_path=/data/nvme0/gushiqiao/models/I2V/Wan2.1-I2V-14B-720P-Lightx2v-Step-Distill # Supports both Image-to-Video (i2v) and Text-to-Video (t2v) modes
export CUDA_VISIBLE_DEVICES=7 # ==================== Configuration Area ====================
# ⚠️ Important: Please modify the following paths according to your actual environment
# 🚨 Storage Performance Tips 🚨
# 💾 Strongly recommend storing model files on SSD solid-state drives!
# 📈 SSD can significantly improve model loading speed and inference performance
# 🐌 Using mechanical hard drives (HDD) may cause slow model loading and affect overall experience
# Lightx2v project root directory path
# Example: /home/user/lightx2v or /data/video_gen/lightx2v
lightx2v_path=/path/to/lightx2v
# Model path configuration
# Image-to-video model path (for i2v tasks)
# Example: /path/to/Wan2.1-I2V-14B-720P-Lightx2v
i2v_model_path=/path/to/Wan2.1-I2V-14B-720P-Lightx2v-Step-Distill
# Text-to-video model path (for t2v tasks)
# Example: /path/to/Wan2.1-T2V-1.3B
t2v_model_path=/path/to/Wan2.1-T2V-1.3B
# Model size configuration
# Default model size (14b, 1.3b)
model_size="14b"
# Server configuration
server_name="0.0.0.0"
server_port=8032
# GPU configuration
gpu_id=0
# ==================== Environment Variables Setup ====================
export CUDA_VISIBLE_DEVICES=$gpu_id
export CUDA_LAUNCH_BLOCKING=1 export CUDA_LAUNCH_BLOCKING=1
export PYTHONPATH=${lightx2v_path}:$PYTHONPATH export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
export ENABLE_PROFILING_DEBUG=true export ENABLE_PROFILING_DEBUG=true
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
python gradio_demo.py \ # ==================== Parameter Parsing ====================
--model_path $model_path \ # Default task type
--server_name 0.0.0.0 \ task="i2v"
--server_port 8005 # Default interface language
lang="zh"
# Parse command line arguments
while [[ $# -gt 0 ]]; do
case $1 in
--task)
task="$2"
shift 2
;;
--lang)
lang="$2"
shift 2
;;
--port)
server_port="$2"
shift 2
;;
--gpu)
gpu_id="$2"
export CUDA_VISIBLE_DEVICES=$gpu_id
shift 2
;;
--model_size)
model_size="$2"
shift 2
;;
--help)
echo "🎬 Lightx2v Gradio Demo Startup Script"
echo "=========================================="
echo "Usage: $0 [options]"
echo ""
echo "📋 Available options:"
echo " --task i2v|t2v Task type (default: i2v)"
echo " i2v: Image-to-video generation"
echo " t2v: Text-to-video generation"
echo " --lang zh|en Interface language (default: zh)"
echo " zh: Chinese interface"
echo " en: English interface"
echo " --port PORT Server port (default: 8032)"
echo " --gpu GPU_ID GPU device ID (default: 0)"
echo " --model_size MODEL_SIZE"
echo " Model size (default: 14b)"
echo " 14b: 14 billion parameters model"
echo " 1.3b: 1.3 billion parameters model"
echo " --help Show this help message"
echo ""
echo "🚀 Usage examples:"
echo " $0 # Default startup for image-to-video mode"
echo " $0 --task i2v --lang zh --port 8032 # Start with specified parameters"
echo " $0 --task t2v --lang en --port 7860 # Text-to-video with English interface"
echo " $0 --task i2v --gpu 1 --port 8032 # Use GPU 1"
echo " $0 --task t2v --model_size 1.3b # Use 1.3B model"
echo " $0 --task i2v --model_size 14b # Use 14B model"
echo ""
echo "📝 Notes:"
echo " - Edit script to configure model paths before first use"
echo " - Ensure required Python dependencies are installed"
echo " - Recommended to use GPU with 8GB+ VRAM"
echo " - 🚨 Strongly recommend storing models on SSD for better performance"
exit 0
;;
*)
echo "Unknown parameter: $1"
echo "Use --help to see help information"
exit 1
;;
esac
done
# ==================== Parameter Validation ====================
if [[ "$task" != "i2v" && "$task" != "t2v" ]]; then
echo "Error: Task type must be 'i2v' or 't2v'"
exit 1
fi
if [[ "$lang" != "zh" && "$lang" != "en" ]]; then
echo "Error: Language must be 'zh' or 'en'"
exit 1
fi
# Validate model size
if [[ "$model_size" != "14b" && "$model_size" != "1.3b" ]]; then
echo "Error: Model size must be '14b' or '1.3b'"
exit 1
fi
# Select model path based on task type
if [[ "$task" == "i2v" ]]; then
model_path=$i2v_model_path
echo "🎬 Starting Image-to-Video mode"
else
model_path=$t2v_model_path
echo "🎬 Starting Text-to-Video mode"
fi
# Check if model path exists
if [[ ! -d "$model_path" ]]; then
echo "❌ Error: Model path does not exist"
echo "📁 Path: $model_path"
echo "🔧 Solutions:"
echo " 1. Check model path configuration in script"
echo " 2. Ensure model files are properly downloaded"
echo " 3. Verify path permissions are correct"
echo " 4. 💾 Recommend storing models on SSD for faster loading"
exit 1
fi
# Select demo file based on language
if [[ "$lang" == "zh" ]]; then
demo_file="gradio_demo_zh.py"
echo "🌏 Using Chinese interface"
else
demo_file="gradio_demo.py"
echo "🌏 Using English interface"
fi
# Check if demo file exists
if [[ ! -f "$demo_file" ]]; then
echo "❌ Error: Demo file does not exist"
echo "📄 File: $demo_file"
echo "🔧 Solutions:"
echo " 1. Ensure script is run in the correct directory"
echo " 2. Check if file has been renamed or moved"
echo " 3. Re-clone or download project files"
exit 1
fi
# ==================== System Information Display ====================
echo "=========================================="
echo "🚀 Lightx2v Gradio Demo Starting..."
echo "=========================================="
echo "📁 Project path: $lightx2v_path"
echo "🤖 Model path: $model_path"
echo "🎯 Task type: $task"
echo "🤖 Model size: $model_size"
echo "🌏 Interface language: $lang"
echo "🖥️ GPU device: $gpu_id"
echo "🌐 Server address: $server_name:$server_port"
echo "=========================================="
# Display system resource information
echo "💻 System resource information:"
free -h | grep -E "Mem|Swap"
echo ""
# Display GPU information
if command -v nvidia-smi &> /dev/null; then
echo "🎮 GPU information:"
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader,nounits | head -1
echo ""
fi
# ==================== Start Demo ====================
echo "🎬 Starting Gradio demo..."
echo "📱 Please access in browser: http://$server_name:$server_port"
echo "⏹️ Press Ctrl+C to stop service"
echo "🔄 First startup may take several minutes to load model..."
echo "=========================================="
# Start Python demo
python $demo_file \
--model_path "$model_path" \
--task "$task" \
--server_name "$server_name" \
--server_port "$server_port" \
--model_size "$model_size"
# python gradio_demo_zh.py \ # Display final system resource usage
# --model_path $model_path \ echo ""
# --server_name 0.0.0.0 \ echo "=========================================="
# --server_port 8005 echo "📊 Final system resource usage:"
free -h | grep -E "Mem|Swap"
{
"infer_steps": 40,
"target_video_length": 81,
"target_height": 480,
"target_width": 832,
"self_attn_1_type": "radial_attn",
"cross_attn_1_type": "flash_attn3",
"cross_attn_2_type": "flash_attn3",
"seed": 42,
"sample_guide_scale": 5,
"sample_shift": 5,
"enable_cfg": true,
"cpu_offload": false
}
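The JSON above is an inference config whose values mirror the demo defaults (40 steps, 81 frames at 832x480, seed 42, CFG guide scale 5, shift 5) and switches the first self-attention to `radial_attn`. A minimal loading sketch — the filename is an assumption; this commit does not show where the file lives or what consumes it:

```python
import json

# Hypothetical filename; only the config contents appear in this commit.
with open("wan_i2v_radial_attn.json") as f:
    cfg = json.load(f)

# UI or CLI values could override the file defaults before inference, e.g.:
cfg.update({"infer_steps": 20, "seed": 123})
print(cfg["self_attn_1_type"])  # -> "radial_attn"
```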