Commit e08c4f90 authored by sandy, committed by GitHub

Merge branch 'main' into audio_r2v

parents 12bfd120 6d07a72e
...@@ -25,3 +25,5 @@
build/
dist/
.cache/
server_cache/
app/.gradio/
<div align="center" style="font-family: charter;">
<h1>⚡️ LightX2V:<br> Lightweight Video Generation Inference Framework</h1>

<div align="center" id="lightx2v">
<img alt="logo" src="assets/img_lightx2v.png" width=75%></img>

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ModelTC/lightx2v)
[![Doc](https://img.shields.io/badge/docs-English-99cc2)](https://lightx2v-en.readthedocs.io/en/latest)
[![Doc](https://img.shields.io/badge/文档-中文-99cc2)](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest)
[![Papers](https://img.shields.io/badge/论文集-中文-99cc2)](https://lightx2v-papers-zhcn.readthedocs.io/zh-cn/latest)
[![Docker](https://badgen.net/badge/icon/docker?icon=docker&label)](https://hub.docker.com/r/lightx2v/lightx2v/tags)

**\[ English | [中文](README_zh.md) \]**

</div>

--------------------------------------------------------------------------------
**LightX2V** is an advanced lightweight video generation inference framework engineered to deliver efficient, high-performance video synthesis. This unified platform integrates multiple state-of-the-art video generation techniques and supports diverse generation tasks, including text-to-video (T2V) and image-to-video (I2V). **X2V denotes the transformation of different input modalities (X, such as text or images) into video output (V).**

## 💡 Quick Start

For comprehensive usage instructions, please refer to our documentation: **[English Docs](https://lightx2v-en.readthedocs.io/en/latest/) | [中文文档](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/)**
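If you just want to try it locally, a minimal setup sketch is shown below. It assumes a standard pip-installable source layout and uses a placeholder Docker tag, so please follow the linked documentation for the authoritative steps.

```shell
# Sketch only — see the documentation above for the authoritative setup steps.
git clone https://github.com/ModelTC/lightx2v.git
cd lightx2v
pip install -e .   # assumes a standard pip-installable source layout

# Or pull a prebuilt image (pick a real tag from the Docker Hub page linked above):
docker pull lightx2v/lightx2v:<tag>
```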
## 🤖 Supported Model Ecosystem

### Official Open-Source Models
- [HunyuanVideo](https://huggingface.co/tencent/HunyuanVideo)
- [Wan2.1](https://huggingface.co/Wan-AI/)
- [SkyReels-V2-DF](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-540P)
- [CogVideoX1.5-5B-T2V](https://huggingface.co/THUDM/CogVideoX1.5-5B)

### Quantized Models
- [Wan2.1-T2V-1.3B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-1.3B-Lightx2v)
- [Wan2.1-T2V-14B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-Lightx2v)
- [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-Lightx2v)
- [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v)

### Distilled Models (**🚀 Recommended: 4-step inference**)
- [Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v)
- [Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v)
- [Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v)

### Autoregressive Models
- [Wan2.1-T2V-CausVid](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-CausVid)
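As an illustration, any of the checkpoints above can be fetched from Hugging Face before running inference. The snippet below is a sketch that assumes the `huggingface_hub` CLI is installed; swap in whichever repository id you need.

```shell
pip install -U "huggingface_hub[cli]"
# Example: download one of the 4-step distilled I2V checkpoints listed above
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v \
    --local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v
```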
## 🚀 Core Features

### 🎯 **Ultimate Performance Optimization**
- **🔥 SOTA Inference Speed**: Achieves **~15x** acceleration through step distillation and system-level optimization (single GPU)
- **⚡️ Revolutionary 4-Step Distillation**: Compresses the original 40-50 inference steps down to just 4, with no CFG required
- **🛠️ Advanced Operator Support**: Integrated with cutting-edge operators including [Sage Attention](https://github.com/thu-ml/SageAttention), [Flash Attention](https://github.com/Dao-AILab/flash-attention), [Radial Attention](https://github.com/mit-han-lab/radial-attention), [q8-kernel](https://github.com/KONAKONA666/q8_kernels), [sgl-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel), and [vllm](https://github.com/vllm-project/vllm) (an install sketch follows this list)
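These operator backends are optional dependencies. The commands below are a hedged sketch using the upstream projects' published package names; verify versions and CUDA/PyTorch compatibility against each linked repository. q8-kernel and Radial Attention are omitted here, as they follow their own repositories' install instructions.

```shell
# Optional attention/GEMM backends — check each project's install guide for version constraints
pip install sageattention                      # Sage Attention
pip install flash-attn --no-build-isolation    # Flash Attention
pip install vllm sgl-kernel                    # vLLM kernels and sgl-kernel
```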
### 💾 **Resource-Efficient Deployment**
- **💡 Breaking Hardware Barriers**: Run 14B models for 480P/720P video generation with only **8GB VRAM + 16GB RAM**
- **🔧 Intelligent Parameter Offloading**: Disk-CPU-GPU three-tier offloading architecture with phase- and block-level granular management
- **⚙️ Comprehensive Quantization**: Support for `w8a8-int8`, `w8a8-fp8`, `w4a4-nvfp4`, and other quantization strategies

### 🎨 **Rich Feature Ecosystem**
- **📈 Smart Feature Caching**: Intelligent caching mechanisms that eliminate redundant computation
- **🔄 Parallel Inference**: Multi-GPU parallel processing for higher throughput
- **📱 Flexible Deployment Options**: Support for Gradio, service deployment, ComfyUI, and other deployment methods
- **🎛️ Dynamic Resolution Inference**: Adaptive resolution adjustment for optimal generation quality
## 🏆 Performance Benchmarks

For detailed performance metrics and comparisons, please refer to our [benchmark documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/benchmark_source.md).
[Detailed Service Deployment Guide →](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/deploy_service.html)
## 📚 Technical Documentation
### 📖 **Method Tutorials**
- [Model Quantization](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/quantization.html) - Comprehensive guide to quantization strategies
- [Feature Caching](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/cache.html) - Intelligent caching mechanisms
- [Attention Mechanisms](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/attention.html) - State-of-the-art attention operators
- [Parameter Offloading](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/offload.html) - Three-tier storage architecture
- [Parallel Inference](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/parallel.html) - Multi-GPU acceleration strategies
- [Step Distillation](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/step_distill.html) - 4-step inference technology
### 🛠️ **Deployment Guides**
- [Low-Resource Deployment](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/for_low_resource.html) - Optimized 8GB VRAM solutions
- [Low-Latency Deployment](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/for_low_latency.html) - Ultra-fast inference optimization
- [Gradio Deployment](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/deploy_gradio.html) - Web interface setup
- [Service Deployment](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/deploy_service.html) - Production API service deployment
## 🧾 Contributing Guidelines
We maintain code quality through automated pre-commit hooks to ensure consistent formatting across the project.
> [!TIP]
> **Setup Instructions:**
>
> 1. Install required dependencies:
> ```shell
> pip install ruff pre-commit
> ```
>
> 2. Run before committing:
> ```shell
> pre-commit run --all-files
> ```
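Optionally, you can also register the hook once so the checks run automatically on every `git commit` (standard `pre-commit` behavior, not specific to this repository):

```shell
pre-commit install   # installs the git hook; subsequent commits run the checks automatically
```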
We appreciate your contributions to making LightX2V better!
## 🤝 Acknowledgments
We extend our gratitude to all the model repositories and research communities that inspired and contributed to the development of LightX2V. This framework builds upon the collective efforts of the open-source community.
## 🌟 Star History
[![Star History Chart](https://api.star-history.com/svg?repos=ModelTC/lightx2v&type=Timeline)](https://star-history.com/#ModelTC/lightx2v&Timeline)
## ✏️ Citation
If you find LightX2V useful in your research, please consider citing our work:
```bibtex
@misc{lightx2v,
author = {LightX2V Contributors},
title = {LightX2V: Light Video Generation Inference Framework},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/ModelTC/lightx2v}},
}
```
## 📞 Contact & Support
For questions, suggestions, or support, please feel free to reach out through:
- 🐛 [GitHub Issues](https://github.com/ModelTC/lightx2v/issues) - Bug reports and feature requests
- 💬 [GitHub Discussions](https://github.com/ModelTC/lightx2v/discussions) - Community discussions and Q&A
---
<div align="center">
Built with ❤️ by the LightX2V team
</div>
<div align="center" style="font-family: charter;">
<h1>⚡️ LightX2V:<br> 轻量级视频生成推理框架</h1>
<img alt="logo" src="assets/img_lightx2v.png" width=75%></img>
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ModelTC/lightx2v)
[![Doc](https://img.shields.io/badge/docs-English-99cc2)](https://lightx2v-en.readthedocs.io/en/latest)
[![Doc](https://img.shields.io/badge/文档-中文-99cc2)](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest)
[![Papers](https://img.shields.io/badge/论文集-中文-99cc2)](https://lightx2v-papers-zhcn.readthedocs.io/zh-cn/latest)
[![Docker](https://badgen.net/badge/icon/docker?icon=docker&label)](https://hub.docker.com/r/lightx2v/lightx2v/tags)
**\[ [English](README.md) | 中文 \]**
</div>
--------------------------------------------------------------------------------
**LightX2V** 是一个先进的轻量级视频生成推理框架,专为提供高效、高性能的视频合成解决方案而设计。该统一平台集成了多种前沿的视频生成技术,支持文本生成视频(T2V)和图像生成视频(I2V)等多样化生成任务。**X2V 表示将不同的输入模态(X,如文本或图像)转换为视频输出(V)**
## 💡 快速开始
详细使用说明请参考我们的文档:**[英文文档](https://lightx2v-en.readthedocs.io/en/latest/) | [中文文档](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/)**
## 🤖 支持的模型生态
### 官方开源模型
- [HunyuanVideo](https://huggingface.co/tencent/HunyuanVideo)
- [Wan2.1](https://huggingface.co/Wan-AI/)
- [SkyReels-V2-DF](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-540P)
- [CogVideoX1.5-5B-T2V](https://huggingface.co/THUDM/CogVideoX1.5-5B)
### 量化模型
- [Wan2.1-T2V-1.3B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-1.3B-Lightx2v)
- [Wan2.1-T2V-14B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-Lightx2v)
- [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-Lightx2v)
- [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v)
### 蒸馏模型 (**🚀 推荐:4步推理**)
- [Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v)
- [Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v)
- [Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v)
### 自回归模型
- [Wan2.1-T2V-CausVid](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-CausVid)
## 🚀 核心特性
### 🎯 **极致性能优化**
- **🔥 SOTA推理速度**: 通过步数蒸馏和系统优化实现**15倍**极速加速(单GPU)
- **⚡️ 革命性4步蒸馏**: 将原始40-50步推理压缩至仅需4步,且无需CFG配置
- **🛠️ 先进算子支持**: 集成顶尖算子,包括[Sage Attention](https://github.com/thu-ml/SageAttention)、[Flash Attention](https://github.com/Dao-AILab/flash-attention)、[Radial Attention](https://github.com/mit-han-lab/radial-attention)、[q8-kernel](https://github.com/KONAKONA666/q8_kernels)、[sgl-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel)、[vllm](https://github.com/vllm-project/vllm)
### 💾 **资源高效部署**
- **💡 突破硬件限制**: **仅需8GB显存 + 16GB内存**即可运行14B模型生成480P/720P视频
- **🔧 智能参数卸载**: 先进的磁盘-CPU-GPU三级卸载架构,支持阶段/块级别的精细化管理
- **⚙️ 全面量化支持**: 支持`w8a8-int8`、`w8a8-fp8`、`w4a4-nvfp4`等多种量化策略
### 🎨 **丰富功能生态**
- **📈 智能特征缓存**: 智能缓存机制,消除冗余计算,提升效率
- **🔄 并行推理加速**: 多GPU并行处理,显著提升性能表现
- **📱 灵活部署选择**: 支持Gradio、服务化部署、ComfyUI等多种部署方式
- **🎛️ 动态分辨率推理**: 自适应分辨率调整,优化生成质量
## 🏆 性能基准测试
详细的性能指标和对比分析,请参考我们的[基准测试文档](https://github.com/ModelTC/LightX2V/blob/main/docs/ZH_CN/source/getting_started/benchmark_source.md)
[详细服务部署指南 →](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/deploy_service.html)
## 📚 技术文档
### 📖 **方法教程**
- [模型量化](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/quantization.html) - 量化策略全面指南
- [特征缓存](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/cache.html) - 智能缓存机制详解
- [注意力机制](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/attention.html) - 前沿注意力算子
- [参数卸载](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/offload.html) - 三级存储架构
- [并行推理](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/parallel.html) - 多GPU加速策略
- [步数蒸馏](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/method_tutorials/step_distill.html) - 4步推理技术
### 🛠️ **部署指南**
- [低资源场景部署](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/for_low_resource.html) - 优化的8GB显存解决方案
- [低延迟场景部署](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/for_low_latency.html) - 极速推理优化
- [Gradio部署](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/deploy_gradio.html) - Web界面搭建
- [服务化部署](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/deploy_service.html) - 生产级API服务部署
## 🧾 代码贡献指南
我们通过自动化的预提交钩子来保证代码质量,确保项目代码格式的一致性。
> [!TIP]
> **安装说明:**
>
> 1. 安装必要的依赖:
> ```shell
> pip install ruff pre-commit
> ```
>
> 2. 提交前运行:
> ```shell
> pre-commit run --all-files
> ```
感谢您为LightX2V的改进做出贡献!
## 🤝 致谢
我们向所有启发和促进LightX2V开发的模型仓库和研究社区表示诚挚的感谢。此框架基于开源社区的集体努力而构建。
## 🌟 Star 历史
[![Star History Chart](https://api.star-history.com/svg?repos=ModelTC/lightx2v&type=Timeline)](https://star-history.com/#ModelTC/lightx2v&Timeline)
## ✏️ 引用
如果您发现LightX2V对您的研究有用,请考虑引用我们的工作:
```bibtex
@misc{lightx2v,
author = {LightX2V Contributors},
title = {LightX2V: Light Video Generation Inference Framework},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/ModelTC/lightx2v}},
}
```
## 📞 联系与支持
如有任何问题、建议或需要支持,欢迎通过以下方式联系我们:
- 🐛 [GitHub Issues](https://github.com/ModelTC/lightx2v/issues) - 错误报告和功能请求
- 💬 [GitHub Discussions](https://github.com/ModelTC/lightx2v/discussions) - 社区讨论和问答
---
<div align="center">
由 LightX2V 团队用 ❤️ 构建
</div>
-----BEGIN CERTIFICATE-----
MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
-----END CERTIFICATE-----
# Gradio Demo
Please refer to our Gradio deployment docs:
[English doc: Gradio Deployment](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/deploy_gradio.html)
[中文文档: Gradio 部署](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/deploy_gradio.html)
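For a quick local run, the demo under `app/` can also be launched directly. The command below is a sketch: the script name `gradio_demo.py` is a placeholder, but the flags mirror the argument parser visible in the demo code (`--model_cls`, `--model_size`, `--task`, `--server_port`, `--server_name`, plus the model path).

```shell
# Placeholder entry point — see the deployment docs above for the exact script name.
python app/gradio_demo.py \
    --model_path /path/to/Wan2.1-I2V-14B-480P-Lightx2v \
    --model_cls wan2.1 \
    --model_size 14b \
    --task i2v \
    --server_port 7862 \
    --server_name 0.0.0.0
```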
import os import os
import gradio as gr import gradio as gr
import asyncio
import argparse import argparse
import json import json
import torch import torch
...@@ -109,6 +108,24 @@ def get_cpu_memory(): ...@@ -109,6 +108,24 @@ def get_cpu_memory():
return available_bytes / 1024**3 return available_bytes / 1024**3
def cleanup_memory():
gc.collect()
if torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.synchronize()
try:
if hasattr(psutil, "virtual_memory"):
if os.name == "posix":
try:
os.system("sync")
except: # noqa
pass
except: # noqa
pass
def generate_unique_filename(base_dir="./saved_videos"): def generate_unique_filename(base_dir="./saved_videos"):
os.makedirs(base_dir, exist_ok=True) os.makedirs(base_dir, exist_ok=True)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
...@@ -147,11 +164,8 @@ for op_name, is_installed in available_attn_ops: ...@@ -147,11 +164,8 @@ for op_name, is_installed in available_attn_ops:
def run_inference( def run_inference(
model_type,
task,
prompt, prompt,
negative_prompt, negative_prompt,
image_path,
save_video_path, save_video_path,
torch_compile, torch_compile,
infer_steps, infer_steps,
...@@ -175,30 +189,30 @@ def run_inference( ...@@ -175,30 +189,30 @@ def run_inference(
cpu_offload, cpu_offload,
offload_granularity, offload_granularity,
offload_ratio, offload_ratio,
t5_cpu_offload,
unload_modules,
t5_offload_granularity, t5_offload_granularity,
attention_type, attention_type,
quant_op, quant_op,
rotary_chunk, rotary_chunk,
rotary_chunk_size, rotary_chunk_size,
clean_cuda_cache, clean_cuda_cache,
image_path=None,
): ):
cleanup_memory()
quant_op = quant_op.split("(")[0].strip() quant_op = quant_op.split("(")[0].strip()
attention_type = attention_type.split("(")[0].strip() attention_type = attention_type.split("(")[0].strip()
global global_runner, current_config, model_path global global_runner, current_config, model_path, task
global cur_dit_quant_scheme, cur_clip_quant_scheme, cur_t5_quant_scheme, cur_precision_mode, cur_enable_teacache global cur_dit_quant_scheme, cur_clip_quant_scheme, cur_t5_quant_scheme, cur_precision_mode, cur_enable_teacache
if os.path.exists(os.path.join(model_path, "config.json")): if os.path.exists(os.path.join(model_path, "config.json")):
with open(os.path.join(model_path, "config.json"), "r") as f: with open(os.path.join(model_path, "config.json"), "r") as f:
model_config = json.load(f) model_config = json.load(f)
if task == "Image to Video":
task = "i2v"
elif task == "Text to Video":
task = "t2v"
if task == "t2v": if task == "t2v":
if model_type == "Wan2.1 1.3B": if model_size == "1.3b":
# 1.3B # 1.3B
coefficient = [ coefficient = [
[ [
...@@ -293,6 +307,7 @@ def run_inference( ...@@ -293,6 +307,7 @@ def run_inference(
needs_reinit = ( needs_reinit = (
lazy_load lazy_load
or unload_modules
or global_runner is None or global_runner is None
or current_config is None or current_config is None
or cur_dit_quant_scheme is None or cur_dit_quant_scheme is None
...@@ -331,6 +346,8 @@ def run_inference( ...@@ -331,6 +346,8 @@ def run_inference(
if os.path.exists(os.path.join(dit_quantized_ckpt, "config.json")): if os.path.exists(os.path.join(dit_quantized_ckpt, "config.json")):
with open(os.path.join(dit_quantized_ckpt, "config.json"), "r") as f: with open(os.path.join(dit_quantized_ckpt, "config.json"), "r") as f:
quant_model_config = json.load(f) quant_model_config = json.load(f)
else:
quant_model_config = {}
else: else:
mm_type = "Default" mm_type = "Default"
dit_quantized_ckpt = None dit_quantized_ckpt = None
...@@ -361,6 +378,8 @@ def run_inference( ...@@ -361,6 +378,8 @@ def run_inference(
"coefficients": coefficient[0] if use_ret_steps else coefficient[1], "coefficients": coefficient[0] if use_ret_steps else coefficient[1],
"use_ret_steps": use_ret_steps, "use_ret_steps": use_ret_steps,
"teacache_thresh": teacache_thresh, "teacache_thresh": teacache_thresh,
"t5_cpu_offload": t5_cpu_offload,
"unload_modules": unload_modules,
"t5_quantized": is_t5_quant, "t5_quantized": is_t5_quant,
"t5_quantized_ckpt": t5_quant_ckpt, "t5_quantized_ckpt": t5_quant_ckpt,
"t5_quant_scheme": t5_quant_scheme, "t5_quant_scheme": t5_quant_scheme,
...@@ -399,7 +418,6 @@ def run_inference( ...@@ -399,7 +418,6 @@ def run_inference(
config.update({k: v for k, v in vars(args).items()}) config.update({k: v for k, v in vars(args).items()})
config = EasyDict(config) config = EasyDict(config)
config["mode"] = "infer"
config.update(model_config) config.update(model_config)
config.update(quant_model_config) config.update(quant_model_config)
...@@ -429,17 +447,27 @@ def run_inference( ...@@ -429,17 +447,27 @@ def run_inference(
else: else:
runner.config = config runner.config = config
asyncio.run(runner.run_pipeline()) runner.run_pipeline()
if lazy_load: del config, args, model_config, quant_model_config
del runner if "dit_quantized_ckpt" in locals():
torch.cuda.empty_cache() del dit_quantized_ckpt
gc.collect() if "t5_quant_ckpt" in locals():
del t5_quant_ckpt
if "clip_quant_ckpt" in locals():
del clip_quant_ckpt
cleanup_memory()
return save_video_path return save_video_path
def auto_configure(enable_auto_config, model_type, resolution): def handle_lazy_load_change(lazy_load_enabled):
"""Handle lazy_load checkbox change to automatically enable unload_modules"""
return gr.update(value=lazy_load_enabled)
def auto_configure(enable_auto_config, resolution):
default_config = { default_config = {
"torch_compile_val": False, "torch_compile_val": False,
"lazy_load_val": False, "lazy_load_val": False,
...@@ -449,6 +477,8 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -449,6 +477,8 @@ def auto_configure(enable_auto_config, model_type, resolution):
"cpu_offload_val": False, "cpu_offload_val": False,
"offload_granularity_val": "block", "offload_granularity_val": "block",
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_cpu_offload_val": False,
"unload_modules_val": False,
"t5_offload_granularity_val": "model", "t5_offload_granularity_val": "model",
"attention_type_val": attn_op_choices[0][1], "attention_type_val": attn_op_choices[0][1],
"quant_op_val": quant_op_choices[0][1], "quant_op_val": quant_op_choices[0][1],
...@@ -505,7 +535,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -505,7 +535,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
else: else:
res = "480p" res = "480p"
if model_type in ["Wan2.1 14B"]: if model_size == "14b":
is_14b = True is_14b = True
else: else:
is_14b = False is_14b = False
...@@ -513,13 +543,14 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -513,13 +543,14 @@ def auto_configure(enable_auto_config, model_type, resolution):
if res == "720p" and is_14b: if res == "720p" and is_14b:
gpu_rules = [ gpu_rules = [
(80, {}), (80, {}),
(48, {"cpu_offload_val": True, "offload_ratio_val": 0.5}), (48, {"cpu_offload_val": True, "offload_ratio_val": 0.5, "t5_cpu_offload_val": True}),
(40, {"cpu_offload_val": True, "offload_ratio_val": 0.8}), (40, {"cpu_offload_val": True, "offload_ratio_val": 0.8, "t5_cpu_offload_val": True}),
(32, {"cpu_offload_val": True, "offload_ratio_val": 1}), (32, {"cpu_offload_val": True, "offload_ratio_val": 1, "t5_cpu_offload_val": True}),
( (
24, 24,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -530,6 +561,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -530,6 +561,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
16, 16,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -543,6 +575,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -543,6 +575,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
12, 12,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -551,12 +584,14 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -551,12 +584,14 @@ def auto_configure(enable_auto_config, model_type, resolution):
"rotary_chunk_val": True, "rotary_chunk_val": True,
"rotary_chunk_size_val": 100, "rotary_chunk_size_val": 100,
"clean_cuda_cache_val": True, "clean_cuda_cache_val": True,
"use_tiny_vae_val": True,
}, },
), ),
( (
8, 8,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -569,6 +604,8 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -569,6 +604,8 @@ def auto_configure(enable_auto_config, model_type, resolution):
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"dit_quant_scheme_val": quant_type, "dit_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"unload_modules_val": True,
"use_tiny_vae_val": True,
}, },
), ),
] ]
...@@ -576,13 +613,14 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -576,13 +613,14 @@ def auto_configure(enable_auto_config, model_type, resolution):
elif is_14b: elif is_14b:
gpu_rules = [ gpu_rules = [
(80, {}), (80, {}),
(48, {"cpu_offload_val": True, "offload_ratio_val": 0.2}), (48, {"cpu_offload_val": True, "offload_ratio_val": 0.2, "t5_cpu_offload_val": True}),
(40, {"cpu_offload_val": True, "offload_ratio_val": 0.5}), (40, {"cpu_offload_val": True, "offload_ratio_val": 0.5, "t5_cpu_offload_val": True}),
(24, {"cpu_offload_val": True, "offload_ratio_val": 0.8}), (24, {"cpu_offload_val": True, "offload_ratio_val": 0.8, "t5_cpu_offload_val": True}),
( (
16, 16,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -595,6 +633,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -595,6 +633,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
( (
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -604,12 +643,15 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -604,12 +643,15 @@ def auto_configure(enable_auto_config, model_type, resolution):
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"dit_quant_scheme_val": quant_type, "dit_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"unload_modules_val": True,
"rotary_chunk_val": True, "rotary_chunk_val": True,
"rotary_chunk_size_val": 10000, "rotary_chunk_size_val": 10000,
"use_tiny_vae_val": True,
} }
if res == "540p" if res == "540p"
else { else {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -619,11 +661,26 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -619,11 +661,26 @@ def auto_configure(enable_auto_config, model_type, resolution):
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"dit_quant_scheme_val": quant_type, "dit_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"unload_modules_val": True,
"use_tiny_vae_val": True,
} }
), ),
), ),
] ]
else:
gpu_rules = [
(24, {}),
(
8,
{
"t5_cpu_offload_val": True,
"t5_offload_granularity_val": "block",
"t5_quant_scheme_val": quant_type,
},
),
]
if is_14b: if is_14b:
cpu_rules = [ cpu_rules = [
(128, {}), (128, {}),
...@@ -636,6 +693,19 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -636,6 +693,19 @@ def auto_configure(enable_auto_config, model_type, resolution):
"t5_quant_scheme_val": quant_type, "t5_quant_scheme_val": quant_type,
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"unload_modules_val": True,
},
),
]
else:
cpu_rules = [
(64, {}),
(
16,
{
"t5_quant_scheme_val": quant_type,
"unload_modules_val": True,
"use_tiny_vae_val": True,
}, },
), ),
] ]
...@@ -654,12 +724,6 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -654,12 +724,6 @@ def auto_configure(enable_auto_config, model_type, resolution):
def main(): def main():
def update_model_type(task_type):
if task_type == "Image to Video":
return gr.update(choices=["Wan2.1 14B"], value="Wan2.1 14B")
elif task_type == "Text to Video":
return gr.update(choices=["Wan2.1 14B", "Wan2.1 1.3B"], value="Wan2.1 14B")
def toggle_image_input(task): def toggle_image_input(task):
return gr.update(visible=(task == "Image to Video")) return gr.update(visible=(task == "Image to Video"))
...@@ -683,37 +747,15 @@ def main(): ...@@ -683,37 +747,15 @@ def main():
with gr.Group(): with gr.Group():
gr.Markdown("## 📥 Input Parameters") gr.Markdown("## 📥 Input Parameters")
with gr.Row(): if task == "i2v":
task = gr.Dropdown( with gr.Row():
choices=["Image to Video", "Text to Video"], image_path = gr.Image(
value="Image to Video", label="Input Image",
label="Task Type", type="filepath",
) height=300,
model_type = gr.Dropdown( interactive=True,
choices=["Wan2.1 14B"], visible=True,
value="Wan2.1 14B", )
label="Model Type",
)
task.change(
fn=update_model_type,
inputs=task,
outputs=model_type,
)
with gr.Row():
image_path = gr.Image(
label="Input Image",
type="filepath",
height=300,
interactive=True,
visible=True, # Initially visible
)
task.change(
fn=toggle_image_input,
inputs=task,
outputs=image_path,
)
with gr.Row(): with gr.Row():
with gr.Column(): with gr.Column():
...@@ -755,6 +797,13 @@ def main(): ...@@ -755,6 +797,13 @@ def main():
value="832x480", value="832x480",
label="Maximum Resolution", label="Maximum Resolution",
) )
with gr.Column():
enable_auto_config = gr.Checkbox(
label="Auto-configure Inference Options",
value=False,
info="Automatically optimize GPU settings to match the current resolution. After changing the resolution, please re-check this option to prevent potential performance degradation or runtime errors.",
)
with gr.Column(scale=9): with gr.Column(scale=9):
seed = gr.Slider( seed = gr.Slider(
label="Random Seed", label="Random Seed",
...@@ -836,14 +885,6 @@ def main(): ...@@ -836,14 +885,6 @@ def main():
with gr.Tab("⚙️ Advanced Options", id=2): with gr.Tab("⚙️ Advanced Options", id=2):
with gr.Group(elem_classes="advanced-options"): with gr.Group(elem_classes="advanced-options"):
gr.Markdown("### Auto configuration")
with gr.Row():
enable_auto_config = gr.Checkbox(
label="Auto configuration",
value=False,
info="Auto-tune optimization settings for your GPU",
)
gr.Markdown("### GPU Memory Optimization") gr.Markdown("### GPU Memory Optimization")
with gr.Row(): with gr.Row():
rotary_chunk = gr.Checkbox( rotary_chunk = gr.Checkbox(
...@@ -861,6 +902,11 @@ def main(): ...@@ -861,6 +902,11 @@ def main():
info="Controls the chunk size for applying rotary embeddings. Larger values may improve performance but increase memory usage. Only effective if 'rotary_chunk' is checked.", info="Controls the chunk size for applying rotary embeddings. Larger values may improve performance but increase memory usage. Only effective if 'rotary_chunk' is checked.",
) )
unload_modules = gr.Checkbox(
label="Unload Modules",
value=False,
info="Unload modules (T5, CLIP, DIT, etc.) after inference to reduce GPU/CPU memory usage",
)
clean_cuda_cache = gr.Checkbox( clean_cuda_cache = gr.Checkbox(
label="Clean CUDA Memory Cache", label="Clean CUDA Memory Cache",
value=False, value=False,
...@@ -895,6 +941,12 @@ def main(): ...@@ -895,6 +941,12 @@ def main():
value=1.0, value=1.0,
info="Controls how much of the Dit model is offloaded to the CPU", info="Controls how much of the Dit model is offloaded to the CPU",
) )
t5_cpu_offload = gr.Checkbox(
label="T5 CPU Offloading",
value=False,
info="Offload the T5 Encoder model to CPU to reduce GPU memory usage",
)
t5_offload_granularity = gr.Dropdown( t5_offload_granularity = gr.Dropdown(
label="T5 Encoder Offload Granularity", label="T5 Encoder Offload Granularity",
choices=["model", "block"], choices=["model", "block"],
...@@ -983,7 +1035,7 @@ def main(): ...@@ -983,7 +1035,7 @@ def main():
enable_auto_config.change( enable_auto_config.change(
fn=auto_configure, fn=auto_configure,
inputs=[enable_auto_config, model_type, resolution], inputs=[enable_auto_config, resolution],
outputs=[ outputs=[
torch_compile, torch_compile,
lazy_load, lazy_load,
...@@ -993,6 +1045,8 @@ def main(): ...@@ -993,6 +1045,8 @@ def main():
cpu_offload, cpu_offload,
offload_granularity, offload_granularity,
offload_ratio, offload_ratio,
t5_cpu_offload,
unload_modules,
t5_offload_granularity, t5_offload_granularity,
attention_type, attention_type,
quant_op, quant_op,
...@@ -1008,46 +1062,92 @@ def main(): ...@@ -1008,46 +1062,92 @@ def main():
], ],
) )
infer_btn.click( lazy_load.change(
fn=run_inference, fn=handle_lazy_load_change,
inputs=[ inputs=[lazy_load],
model_type, outputs=[unload_modules],
task, )
prompt, if task == "i2v":
negative_prompt, infer_btn.click(
image_path, fn=run_inference,
save_video_path, inputs=[
torch_compile, prompt,
infer_steps, negative_prompt,
num_frames, save_video_path,
resolution, torch_compile,
seed, infer_steps,
sample_shift, num_frames,
enable_teacache, resolution,
teacache_thresh, seed,
use_ret_steps, sample_shift,
enable_cfg, enable_teacache,
cfg_scale, teacache_thresh,
dit_quant_scheme, use_ret_steps,
t5_quant_scheme, enable_cfg,
clip_quant_scheme, cfg_scale,
fps, dit_quant_scheme,
use_tiny_vae, t5_quant_scheme,
use_tiling_vae, clip_quant_scheme,
lazy_load, fps,
precision_mode, use_tiny_vae,
cpu_offload, use_tiling_vae,
offload_granularity, lazy_load,
offload_ratio, precision_mode,
t5_offload_granularity, cpu_offload,
attention_type, offload_granularity,
quant_op, offload_ratio,
rotary_chunk, t5_cpu_offload,
rotary_chunk_size, unload_modules,
clean_cuda_cache, t5_offload_granularity,
], attention_type,
outputs=output_video, quant_op,
) rotary_chunk,
rotary_chunk_size,
clean_cuda_cache,
image_path,
],
outputs=output_video,
)
else:
infer_btn.click(
fn=run_inference,
inputs=[
prompt,
negative_prompt,
save_video_path,
torch_compile,
infer_steps,
num_frames,
resolution,
seed,
sample_shift,
enable_teacache,
teacache_thresh,
use_ret_steps,
enable_cfg,
cfg_scale,
dit_quant_scheme,
t5_quant_scheme,
clip_quant_scheme,
fps,
use_tiny_vae,
use_tiling_vae,
lazy_load,
precision_mode,
cpu_offload,
offload_granularity,
offload_ratio,
t5_cpu_offload,
unload_modules,
t5_offload_granularity,
attention_type,
quant_op,
rotary_chunk,
rotary_chunk_size,
clean_cuda_cache,
],
outputs=output_video,
)
demo.launch(share=True, server_port=args.server_port, server_name=args.server_name) demo.launch(share=True, server_port=args.server_port, server_name=args.server_name)
...@@ -1062,12 +1162,16 @@ if __name__ == "__main__": ...@@ -1062,12 +1162,16 @@ if __name__ == "__main__":
default="wan2.1", default="wan2.1",
help="Model class to use", help="Model class to use",
) )
parser.add_argument("--model_size", type=str, required=True, choices=["14b", "1.3b"], help="Model type to use")
parser.add_argument("--task", type=str, required=True, choices=["i2v", "t2v"], help="Specify the task type. 'i2v' for image-to-video translation, 't2v' for text-to-video generation.")
parser.add_argument("--server_port", type=int, default=7862, help="Server port") parser.add_argument("--server_port", type=int, default=7862, help="Server port")
parser.add_argument("--server_name", type=str, default="0.0.0.0", help="Server ip") parser.add_argument("--server_name", type=str, default="0.0.0.0", help="Server ip")
args = parser.parse_args() args = parser.parse_args()
global model_path, model_cls global model_path, model_cls, model_size
model_path = args.model_path model_path = args.model_path
model_cls = args.model_cls model_cls = args.model_cls
model_size = args.model_size
task = args.task
main() main()
import os import os
import gradio as gr import gradio as gr
import asyncio
import argparse import argparse
import json import json
import torch import torch
...@@ -13,7 +12,6 @@ import importlib.util ...@@ -13,7 +12,6 @@ import importlib.util
import psutil import psutil
import random import random
logger.add( logger.add(
"inference_logs.log", "inference_logs.log",
rotation="100 MB", rotation="100 MB",
...@@ -98,7 +96,7 @@ def get_gpu_memory(gpu_idx=0): ...@@ -98,7 +96,7 @@ def get_gpu_memory(gpu_idx=0):
try: try:
with torch.cuda.device(gpu_idx): with torch.cuda.device(gpu_idx):
memory_info = torch.cuda.mem_get_info() memory_info = torch.cuda.mem_get_info()
total_memory = memory_info[1] / (1024**3) total_memory = memory_info[1] / (1024**3) # Convert bytes to GB
return total_memory return total_memory
except Exception as e: except Exception as e:
logger.warning(f"获取GPU内存失败: {e}") logger.warning(f"获取GPU内存失败: {e}")
...@@ -110,6 +108,26 @@ def get_cpu_memory(): ...@@ -110,6 +108,26 @@ def get_cpu_memory():
return available_bytes / 1024**3 return available_bytes / 1024**3
def cleanup_memory():
gc.collect()
if torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.synchronize()
try:
import psutil
if hasattr(psutil, "virtual_memory"):
if os.name == "posix":
try:
os.system("sync")
except: # noqa
pass
except: # noqa
pass
def generate_unique_filename(base_dir="./saved_videos"): def generate_unique_filename(base_dir="./saved_videos"):
os.makedirs(base_dir, exist_ok=True) os.makedirs(base_dir, exist_ok=True)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
...@@ -148,11 +166,8 @@ for op_name, is_installed in available_attn_ops: ...@@ -148,11 +166,8 @@ for op_name, is_installed in available_attn_ops:
def run_inference( def run_inference(
model_type,
task,
prompt, prompt,
negative_prompt, negative_prompt,
image_path,
save_video_path, save_video_path,
torch_compile, torch_compile,
infer_steps, infer_steps,
...@@ -176,30 +191,30 @@ def run_inference( ...@@ -176,30 +191,30 @@ def run_inference(
cpu_offload, cpu_offload,
offload_granularity, offload_granularity,
offload_ratio, offload_ratio,
t5_cpu_offload,
unload_modules,
t5_offload_granularity, t5_offload_granularity,
attention_type, attention_type,
quant_op, quant_op,
rotary_chunk, rotary_chunk,
rotary_chunk_size, rotary_chunk_size,
clean_cuda_cache, clean_cuda_cache,
image_path=None,
): ):
cleanup_memory()
quant_op = quant_op.split("(")[0].strip() quant_op = quant_op.split("(")[0].strip()
attention_type = attention_type.split("(")[0].strip() attention_type = attention_type.split("(")[0].strip()
global global_runner, current_config, model_path global global_runner, current_config, model_path, task
global cur_dit_quant_scheme, cur_clip_quant_scheme, cur_t5_quant_scheme, cur_precision_mode, cur_enable_teacache global cur_dit_quant_scheme, cur_clip_quant_scheme, cur_t5_quant_scheme, cur_precision_mode, cur_enable_teacache
if os.path.exists(os.path.join(model_path, "config.json")): if os.path.exists(os.path.join(model_path, "config.json")):
with open(os.path.join(model_path, "config.json"), "r") as f: with open(os.path.join(model_path, "config.json"), "r") as f:
model_config = json.load(f) model_config = json.load(f)
if task == "图像生成视频":
task = "i2v"
elif task == "文本生成视频":
task = "t2v"
if task == "t2v": if task == "t2v":
if model_type == "Wan2.1 1.3B": if model_size == "1.3b":
# 1.3B # 1.3B
coefficient = [ coefficient = [
[ [
...@@ -294,6 +309,7 @@ def run_inference( ...@@ -294,6 +309,7 @@ def run_inference(
needs_reinit = ( needs_reinit = (
lazy_load lazy_load
or unload_modules
or global_runner is None or global_runner is None
or current_config is None or current_config is None
or cur_dit_quant_scheme is None or cur_dit_quant_scheme is None
...@@ -332,6 +348,8 @@ def run_inference( ...@@ -332,6 +348,8 @@ def run_inference(
if os.path.exists(os.path.join(dit_quantized_ckpt, "config.json")): if os.path.exists(os.path.join(dit_quantized_ckpt, "config.json")):
with open(os.path.join(dit_quantized_ckpt, "config.json"), "r") as f: with open(os.path.join(dit_quantized_ckpt, "config.json"), "r") as f:
quant_model_config = json.load(f) quant_model_config = json.load(f)
else:
quant_model_config = {}
else: else:
mm_type = "Default" mm_type = "Default"
dit_quantized_ckpt = None dit_quantized_ckpt = None
...@@ -362,6 +380,8 @@ def run_inference( ...@@ -362,6 +380,8 @@ def run_inference(
"coefficients": coefficient[0] if use_ret_steps else coefficient[1], "coefficients": coefficient[0] if use_ret_steps else coefficient[1],
"use_ret_steps": use_ret_steps, "use_ret_steps": use_ret_steps,
"teacache_thresh": teacache_thresh, "teacache_thresh": teacache_thresh,
"t5_cpu_offload": t5_cpu_offload,
"unload_modules": unload_modules,
"t5_quantized": is_t5_quant, "t5_quantized": is_t5_quant,
"t5_quantized_ckpt": t5_quant_ckpt, "t5_quantized_ckpt": t5_quant_ckpt,
"t5_quant_scheme": t5_quant_scheme, "t5_quant_scheme": t5_quant_scheme,
...@@ -400,13 +420,13 @@ def run_inference( ...@@ -400,13 +420,13 @@ def run_inference(
config.update({k: v for k, v in vars(args).items()}) config.update({k: v for k, v in vars(args).items()})
config = EasyDict(config) config = EasyDict(config)
config["mode"] = "infer"
config.update(model_config) config.update(model_config)
config.update(quant_model_config) config.update(quant_model_config)
logger.info(f"使用模型: {model_path}") logger.info(f"使用模型: {model_path}")
logger.info(f"推理配置:\n{json.dumps(config, indent=4, ensure_ascii=False)}") logger.info(f"推理配置:\n{json.dumps(config, indent=4, ensure_ascii=False)}")
# Initialize or reuse the runner
runner = global_runner runner = global_runner
if needs_reinit: if needs_reinit:
if runner is not None: if runner is not None:
...@@ -429,17 +449,27 @@ def run_inference( ...@@ -429,17 +449,27 @@ def run_inference(
else: else:
runner.config = config runner.config = config
asyncio.run(runner.run_pipeline()) runner.run_pipeline()
if lazy_load: del config, args, model_config, quant_model_config
del runner if "dit_quantized_ckpt" in locals():
torch.cuda.empty_cache() del dit_quantized_ckpt
gc.collect() if "t5_quant_ckpt" in locals():
del t5_quant_ckpt
if "clip_quant_ckpt" in locals():
del clip_quant_ckpt
cleanup_memory()
return save_video_path return save_video_path
def auto_configure(enable_auto_config, model_type, resolution): def handle_lazy_load_change(lazy_load_enabled):
"""Handle lazy_load checkbox change to automatically enable unload_modules"""
return gr.update(value=lazy_load_enabled)
def auto_configure(enable_auto_config, resolution):
default_config = { default_config = {
"torch_compile_val": False, "torch_compile_val": False,
"lazy_load_val": False, "lazy_load_val": False,
...@@ -449,6 +479,8 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -449,6 +479,8 @@ def auto_configure(enable_auto_config, model_type, resolution):
"cpu_offload_val": False, "cpu_offload_val": False,
"offload_granularity_val": "block", "offload_granularity_val": "block",
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_cpu_offload_val": False,
"unload_modules_val": False,
"t5_offload_granularity_val": "model", "t5_offload_granularity_val": "model",
"attention_type_val": attn_op_choices[0][1], "attention_type_val": attn_op_choices[0][1],
"quant_op_val": quant_op_choices[0][1], "quant_op_val": quant_op_choices[0][1],
...@@ -505,7 +537,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -505,7 +537,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
else: else:
res = "480p" res = "480p"
if model_type in ["Wan2.1 14B"]: if model_size == "14b":
is_14b = True is_14b = True
else: else:
is_14b = False is_14b = False
...@@ -513,13 +545,14 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -513,13 +545,14 @@ def auto_configure(enable_auto_config, model_type, resolution):
if res == "720p" and is_14b: if res == "720p" and is_14b:
gpu_rules = [ gpu_rules = [
(80, {}), (80, {}),
(48, {"cpu_offload_val": True, "offload_ratio_val": 0.5}), (48, {"cpu_offload_val": True, "offload_ratio_val": 0.5, "t5_cpu_offload_val": True}),
(40, {"cpu_offload_val": True, "offload_ratio_val": 0.8}), (40, {"cpu_offload_val": True, "offload_ratio_val": 0.8, "t5_cpu_offload_val": True}),
(32, {"cpu_offload_val": True, "offload_ratio_val": 1}), (32, {"cpu_offload_val": True, "offload_ratio_val": 1, "t5_cpu_offload_val": True}),
( (
24, 24,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -530,6 +563,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -530,6 +563,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
16, 16,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -543,6 +577,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -543,6 +577,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
12, 12,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -551,12 +586,14 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -551,12 +586,14 @@ def auto_configure(enable_auto_config, model_type, resolution):
"rotary_chunk_val": True, "rotary_chunk_val": True,
"rotary_chunk_size_val": 100, "rotary_chunk_size_val": 100,
"clean_cuda_cache_val": True, "clean_cuda_cache_val": True,
"use_tiny_vae_val": True,
}, },
), ),
( (
8, 8,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -569,6 +606,8 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -569,6 +606,8 @@ def auto_configure(enable_auto_config, model_type, resolution):
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"dit_quant_scheme_val": quant_type, "dit_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"unload_modules_val": True,
"use_tiny_vae_val": True,
}, },
), ),
] ]
...@@ -576,13 +615,14 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -576,13 +615,14 @@ def auto_configure(enable_auto_config, model_type, resolution):
elif is_14b: elif is_14b:
gpu_rules = [ gpu_rules = [
(80, {}), (80, {}),
(48, {"cpu_offload_val": True, "offload_ratio_val": 0.2}), (48, {"cpu_offload_val": True, "offload_ratio_val": 0.2, "t5_cpu_offload_val": True}),
(40, {"cpu_offload_val": True, "offload_ratio_val": 0.5}), (40, {"cpu_offload_val": True, "offload_ratio_val": 0.5, "t5_cpu_offload_val": True}),
(24, {"cpu_offload_val": True, "offload_ratio_val": 0.8}), (24, {"cpu_offload_val": True, "offload_ratio_val": 0.8, "t5_cpu_offload_val": True}),
( (
16, 16,
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -595,6 +635,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -595,6 +635,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
( (
{ {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -604,12 +645,15 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -604,12 +645,15 @@ def auto_configure(enable_auto_config, model_type, resolution):
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"dit_quant_scheme_val": quant_type, "dit_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"unload_modules_val": True,
"rotary_chunk_val": True, "rotary_chunk_val": True,
"rotary_chunk_size_val": 10000, "rotary_chunk_size_val": 10000,
"use_tiny_vae_val": True,
} }
if res == "540p" if res == "540p"
else { else {
"cpu_offload_val": True, "cpu_offload_val": True,
"t5_cpu_offload_val": True,
"offload_ratio_val": 1, "offload_ratio_val": 1,
"t5_offload_granularity_val": "block", "t5_offload_granularity_val": "block",
"precision_mode_val": "bf16", "precision_mode_val": "bf16",
...@@ -619,11 +663,26 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -619,11 +663,26 @@ def auto_configure(enable_auto_config, model_type, resolution):
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"dit_quant_scheme_val": quant_type, "dit_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"unload_modules_val": True,
"use_tiny_vae_val": True,
} }
), ),
), ),
] ]
else:
gpu_rules = [
(24, {}),
(
8,
{
"t5_cpu_offload_val": True,
"t5_offload_granularity_val": "block",
"t5_quant_scheme_val": quant_type,
},
),
]
if is_14b: if is_14b:
cpu_rules = [ cpu_rules = [
(128, {}), (128, {}),
...@@ -636,6 +695,19 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -636,6 +695,19 @@ def auto_configure(enable_auto_config, model_type, resolution):
"t5_quant_scheme_val": quant_type, "t5_quant_scheme_val": quant_type,
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"unload_modules_val": True,
},
),
]
else:
cpu_rules = [
(64, {}),
(
16,
{
"t5_quant_scheme_val": quant_type,
"unload_modules_val": True,
"use_tiny_vae_val": True,
}, },
), ),
] ]
...@@ -654,17 +726,11 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -654,17 +726,11 @@ def auto_configure(enable_auto_config, model_type, resolution):
def main(): def main():
def update_model_type(task_type):
if task_type == "图像生成视频":
return gr.update(choices=["Wan2.1 14B"], value="Wan2.1 14B")
elif task_type == "文本生成视频":
return gr.update(choices=["Wan2.1 14B", "Wan2.1 1.3B"], value="Wan2.1 14B")
def toggle_image_input(task): def toggle_image_input(task):
return gr.update(visible=(task == "图像生成视频")) return gr.update(visible=(task == "i2v"))
with gr.Blocks( with gr.Blocks(
title="Lightx2v (轻量级视频生成推理引擎)", title="Lightx2v (轻量级视频推理和生成引擎)",
css=""" css="""
.main-content { max-width: 1400px; margin: auto; } .main-content { max-width: 1400px; margin: auto; }
.output-video { max-height: 650px; } .output-video { max-height: 650px; }
...@@ -683,37 +749,15 @@ def main(): ...@@ -683,37 +749,15 @@ def main():
with gr.Group(): with gr.Group():
gr.Markdown("## 📥 输入参数") gr.Markdown("## 📥 输入参数")
with gr.Row(): if task == "i2v":
task = gr.Dropdown( with gr.Row():
choices=["图像生成视频", "文本生成视频"], image_path = gr.Image(
value="图像生成视频", label="输入图像",
label="任务类型", type="filepath",
) height=300,
model_type = gr.Dropdown( interactive=True,
choices=["Wan2.1 14B"], visible=True,
value="Wan2.1 14B", )
label="模型类型",
)
task.change(
fn=update_model_type,
inputs=task,
outputs=model_type,
)
with gr.Row():
image_path = gr.Image(
label="输入图像",
type="filepath",
height=300,
interactive=True,
visible=True,
)
task.change(
fn=toggle_image_input,
inputs=task,
outputs=image_path,
)
with gr.Row(): with gr.Row():
with gr.Column(): with gr.Column():
...@@ -755,6 +799,11 @@ def main(): ...@@ -755,6 +799,11 @@ def main():
value="832x480", value="832x480",
label="最大分辨率", label="最大分辨率",
) )
with gr.Column():
enable_auto_config = gr.Checkbox(
label="自动配置推理选项", value=False, info="自动优化GPU设置以匹配当前分辨率。修改分辨率后,请重新勾选此选项,否则可能导致性能下降或运行失败。"
)
with gr.Column(scale=9): with gr.Column(scale=9):
seed = gr.Slider( seed = gr.Slider(
label="随机种子", label="随机种子",
...@@ -764,9 +813,10 @@ def main(): ...@@ -764,9 +813,10 @@ def main():
value=generate_random_seed(), value=generate_random_seed(),
) )
with gr.Column(scale=1): with gr.Column(scale=1):
randomize_btn = gr.Button("🎲 生成随机种子", variant="secondary") randomize_btn = gr.Button("🎲 随机化", variant="secondary")
randomize_btn.click(fn=generate_random_seed, inputs=None, outputs=seed) randomize_btn.click(fn=generate_random_seed, inputs=None, outputs=seed)
with gr.Column(): with gr.Column():
infer_steps = gr.Slider( infer_steps = gr.Slider(
label="推理步数", label="推理步数",
...@@ -774,7 +824,7 @@ def main(): ...@@ -774,7 +824,7 @@ def main():
maximum=100, maximum=100,
step=1, step=1,
value=40, value=40,
info="视频生成的推理步数。增加步数可能提高质量但降低速度", info="视频生成的推理步数。增加步数可能提高质量但降低速度",
) )
enable_cfg = gr.Checkbox( enable_cfg = gr.Checkbox(
...@@ -788,7 +838,7 @@ def main(): ...@@ -788,7 +838,7 @@ def main():
maximum=10, maximum=10,
step=1, step=1,
value=5, value=5,
info="控制提示词的影响强度。值越高,提示词的影响越大", info="控制提示词的影响强度。值越高,提示词的影响越大",
) )
sample_shift = gr.Slider( sample_shift = gr.Slider(
label="分布偏移", label="分布偏移",
...@@ -796,7 +846,7 @@ def main(): ...@@ -796,7 +846,7 @@ def main():
minimum=0, minimum=0,
maximum=10, maximum=10,
step=1, step=1,
info="控制样本分布偏移的程度。值越大表示偏移越明显", info="控制样本分布偏移的程度。值越大表示偏移越明显",
) )
fps = gr.Slider( fps = gr.Slider(
...@@ -805,7 +855,7 @@ def main(): ...@@ -805,7 +855,7 @@ def main():
maximum=30, maximum=30,
step=1, step=1,
value=16, value=16,
info="视频的每秒帧数。较高的FPS会产生更流畅的视频", info="视频的每秒帧数。较高的FPS会产生更流畅的视频",
) )
num_frames = gr.Slider( num_frames = gr.Slider(
label="总帧数", label="总帧数",
...@@ -813,7 +863,7 @@ def main(): ...@@ -813,7 +863,7 @@ def main():
maximum=120, maximum=120,
step=1, step=1,
value=81, value=81,
info="视频中的总帧数。更多帧数会产生更长的视频", info="视频中的总帧数。更多帧数会产生更长的视频",
) )
save_video_path = gr.Textbox( save_video_path = gr.Textbox(
...@@ -835,14 +885,6 @@ def main(): ...@@ -835,14 +885,6 @@ def main():
with gr.Tab("⚙️ 高级选项", id=2): with gr.Tab("⚙️ 高级选项", id=2):
with gr.Group(elem_classes="advanced-options"): with gr.Group(elem_classes="advanced-options"):
gr.Markdown("### 自动配置")
with gr.Row():
enable_auto_config = gr.Checkbox(
label="自动配置",
value=False,
info="自动调整优化设置以适应您的GPU",
)
gr.Markdown("### GPU内存优化") gr.Markdown("### GPU内存优化")
with gr.Row(): with gr.Row():
rotary_chunk = gr.Checkbox( rotary_chunk = gr.Checkbox(
...@@ -857,13 +899,17 @@ def main(): ...@@ -857,13 +899,17 @@ def main():
minimum=100, minimum=100,
maximum=10000, maximum=10000,
step=100, step=100,
info="控制应用旋转编码的块大小, 较大的值可能提高性能但增加内存使用, 仅在'rotary_chunk'勾选时有效", info="控制应用旋转编码的块大小。较大的值可能提高性能但增加内存使用。仅在'rotary_chunk'勾选时有效。",
)
unload_modules = gr.Checkbox(
label="卸载模块",
value=False,
info="推理后卸载模块(T5、CLIP、DIT等)以减少GPU/CPU内存使用",
) )
clean_cuda_cache = gr.Checkbox( clean_cuda_cache = gr.Checkbox(
label="清理CUDA内存缓存", label="清理CUDA内存缓存",
value=False, value=False,
info="及时释放GPU内存, 但会减慢推理速度。", info="启用时,及时释放GPU内存但会减慢推理速度。",
) )
gr.Markdown("### 异步卸载") gr.Markdown("### 异步卸载")
...@@ -877,14 +923,14 @@ def main(): ...@@ -877,14 +923,14 @@ def main():
lazy_load = gr.Checkbox( lazy_load = gr.Checkbox(
label="启用延迟加载", label="启用延迟加载",
value=False, value=False,
info="在推理过程中延迟加载模型组件, 仅在'cpu_offload'勾选和使用量化Dit模型时有效", info="在推理过程中延迟加载模型组件。需要CPU加载和DIT量化。",
) )
offload_granularity = gr.Dropdown( offload_granularity = gr.Dropdown(
label="Dit卸载粒度", label="Dit卸载粒度",
choices=["block", "phase"], choices=["block", "phase"],
value="phase", value="phase",
info="设置Dit模型卸载粒度: 块或计算阶段", info="设置Dit模型卸载粒度块或计算阶段",
) )
offload_ratio = gr.Slider( offload_ratio = gr.Slider(
label="Dit模型卸载比例", label="Dit模型卸载比例",
...@@ -894,6 +940,11 @@ def main(): ...@@ -894,6 +940,11 @@ def main():
value=1.0, value=1.0,
info="控制将多少Dit模型卸载到CPU", info="控制将多少Dit模型卸载到CPU",
) )
t5_cpu_offload = gr.Checkbox(
label="T5 CPU卸载",
value=False,
info="将T5编码器模型卸载到CPU以减少GPU内存使用",
)
t5_offload_granularity = gr.Dropdown( t5_offload_granularity = gr.Dropdown(
label="T5编码器卸载粒度", label="T5编码器卸载粒度",
choices=["model", "block"], choices=["model", "block"],
...@@ -926,25 +977,25 @@ def main(): ...@@ -926,25 +977,25 @@ def main():
label="Dit", label="Dit",
choices=["fp8", "int8", "bf16"], choices=["fp8", "int8", "bf16"],
value="bf16", value="bf16",
info="Dit模型的推理精度", info="Dit模型的量化精度",
) )
t5_quant_scheme = gr.Dropdown( t5_quant_scheme = gr.Dropdown(
label="T5编码器", label="T5编码器",
choices=["fp8", "int8", "bf16"], choices=["fp8", "int8", "bf16"],
value="bf16", value="bf16",
info="T5编码器模型的推理精度", info="T5编码器模型的量化精度",
) )
clip_quant_scheme = gr.Dropdown( clip_quant_scheme = gr.Dropdown(
label="Clip编码器", label="Clip编码器",
choices=["fp8", "int8", "fp16"], choices=["fp8", "int8", "fp16"],
value="fp16", value="fp16",
info="Clip编码器的推理精度", info="Clip编码器的量化精度",
) )
precision_mode = gr.Dropdown( precision_mode = gr.Dropdown(
label="敏感层精度", label="敏感层精度模式",
choices=["fp32", "bf16"], choices=["fp32", "bf16"],
value="fp32", value="fp32",
info="选择用于敏感层(如norm层和embedding层)的数值精度", info="选择用于关键模型组件(如归一化和嵌入层)的数值精度。FP32提供更高精度,而BF16在兼容硬件上提高性能。",
) )
gr.Markdown("### 变分自编码器(VAE)") gr.Markdown("### 变分自编码器(VAE)")
...@@ -982,7 +1033,7 @@ def main(): ...@@ -982,7 +1033,7 @@ def main():
enable_auto_config.change( enable_auto_config.change(
fn=auto_configure, fn=auto_configure,
inputs=[enable_auto_config, model_type, resolution], inputs=[enable_auto_config, resolution],
outputs=[ outputs=[
torch_compile, torch_compile,
lazy_load, lazy_load,
...@@ -992,6 +1043,8 @@ def main(): ...@@ -992,6 +1043,8 @@ def main():
cpu_offload, cpu_offload,
offload_granularity, offload_granularity,
offload_ratio, offload_ratio,
t5_cpu_offload,
unload_modules,
t5_offload_granularity, t5_offload_granularity,
attention_type, attention_type,
quant_op, quant_op,
...@@ -1007,46 +1060,92 @@ def main(): ...@@ -1007,46 +1060,92 @@ def main():
], ],
) )
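The `auto_configure` callback wired above is defined earlier in gradio_demo_zh.py and is not part of this hunk. For orientation, a minimal sketch of the shape such a Gradio `.change()` callback takes — the name and the resolution heuristic below are illustrative assumptions, not the actual implementation, and the real function returns one update per entry in the long `outputs=[...]` list:

```python
import gradio as gr

# Sketch only: one gr.update(...) must be returned per output component,
# in the same order as the outputs=[...] list in the wiring above.
def auto_configure_sketch(enabled, resolution):
    if not enabled:
        # Leave every control unchanged when the checkbox is cleared.
        return [gr.update() for _ in range(3)]
    low_memory = resolution in ("832x480", "480x832")  # assumed heuristic
    return [
        gr.update(value=not low_memory),  # torch_compile
        gr.update(value=low_memory),      # lazy_load
        gr.update(value=low_memory),      # cpu_offload
        # ...and so on for the remaining outputs (offload_ratio, quant schemes, etc.)
    ]
```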
infer_btn.click( lazy_load.change(
fn=run_inference, fn=handle_lazy_load_change,
inputs=[ inputs=[lazy_load],
model_type, outputs=[unload_modules],
task, )
prompt, if task == "i2v":
negative_prompt, infer_btn.click(
image_path, fn=run_inference,
save_video_path, inputs=[
torch_compile, prompt,
infer_steps, negative_prompt,
num_frames, save_video_path,
resolution, torch_compile,
seed, infer_steps,
sample_shift, num_frames,
enable_teacache, resolution,
teacache_thresh, seed,
use_ret_steps, sample_shift,
enable_cfg, enable_teacache,
cfg_scale, teacache_thresh,
dit_quant_scheme, use_ret_steps,
t5_quant_scheme, enable_cfg,
clip_quant_scheme, cfg_scale,
fps, dit_quant_scheme,
use_tiny_vae, t5_quant_scheme,
use_tiling_vae, clip_quant_scheme,
lazy_load, fps,
precision_mode, use_tiny_vae,
cpu_offload, use_tiling_vae,
offload_granularity, lazy_load,
offload_ratio, precision_mode,
t5_offload_granularity, cpu_offload,
attention_type, offload_granularity,
quant_op, offload_ratio,
rotary_chunk, t5_cpu_offload,
rotary_chunk_size, unload_modules,
clean_cuda_cache, t5_offload_granularity,
], attention_type,
outputs=output_video, quant_op,
) rotary_chunk,
rotary_chunk_size,
clean_cuda_cache,
image_path,
],
outputs=output_video,
)
else:
infer_btn.click(
fn=run_inference,
inputs=[
prompt,
negative_prompt,
save_video_path,
torch_compile,
infer_steps,
num_frames,
resolution,
seed,
sample_shift,
enable_teacache,
teacache_thresh,
use_ret_steps,
enable_cfg,
cfg_scale,
dit_quant_scheme,
t5_quant_scheme,
clip_quant_scheme,
fps,
use_tiny_vae,
use_tiling_vae,
lazy_load,
precision_mode,
cpu_offload,
offload_granularity,
offload_ratio,
t5_cpu_offload,
unload_modules,
t5_offload_granularity,
attention_type,
quant_op,
rotary_chunk,
rotary_chunk_size,
clean_cuda_cache,
],
outputs=output_video,
)
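Both branches above pass `run_inference` the same arguments except for the trailing `image_path` in the i2v case. A possible tightening — a sketch, not part of this commit — is to build the shared list once so the two branches cannot drift apart:

```python
# Sketch only; uses the same component variables that are in scope above.
common_inputs = [
    prompt, negative_prompt, save_video_path, torch_compile, infer_steps,
    num_frames, resolution, seed, sample_shift, enable_teacache,
    teacache_thresh, use_ret_steps, enable_cfg, cfg_scale, dit_quant_scheme,
    t5_quant_scheme, clip_quant_scheme, fps, use_tiny_vae, use_tiling_vae,
    lazy_load, precision_mode, cpu_offload, offload_granularity, offload_ratio,
    t5_cpu_offload, unload_modules, t5_offload_granularity, attention_type,
    quant_op, rotary_chunk, rotary_chunk_size, clean_cuda_cache,
]
infer_inputs = common_inputs + [image_path] if task == "i2v" else common_inputs
infer_btn.click(fn=run_inference, inputs=infer_inputs, outputs=output_video)
```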
demo.launch(share=True, server_port=args.server_port, server_name=args.server_name) demo.launch(share=True, server_port=args.server_port, server_name=args.server_name)
...@@ -1061,12 +1160,16 @@ if __name__ == "__main__": ...@@ -1061,12 +1160,16 @@ if __name__ == "__main__":
default="wan2.1", default="wan2.1",
help="要使用的模型类别", help="要使用的模型类别",
) )
parser.add_argument("--model_size", type=str, required=True, choices=["14b", "1.3b"], help="模型大小:14b 或 1.3b")
parser.add_argument("--task", type=str, required=True, choices=["i2v", "t2v"], help="指定任务类型。'i2v'用于图像到视频转换,'t2v'用于文本到视频生成。")
parser.add_argument("--server_port", type=int, default=7862, help="服务器端口") parser.add_argument("--server_port", type=int, default=7862, help="服务器端口")
parser.add_argument("--server_name", type=str, default="0.0.0.0", help="服务器IP") parser.add_argument("--server_name", type=str, default="0.0.0.0", help="服务器IP")
args = parser.parse_args() args = parser.parse_args()
global model_path, model_cls global model_path, model_cls, model_size
model_path = args.model_path model_path = args.model_path
model_cls = args.model_cls model_cls = args.model_cls
model_size = args.model_size
task = args.task
main() main()
#!/bin/bash #!/bin/bash
lightx2v_path=/mtc/gushiqiao/llmc_workspace/lightx2v_new/lightx2v # Lightx2v Gradio Demo Startup Script
model_path=/data/nvme0/gushiqiao/models/I2V/Wan2.1-I2V-14B-720P-Lightx2v-Step-Distill # Supports both Image-to-Video (i2v) and Text-to-Video (t2v) modes
export CUDA_VISIBLE_DEVICES=7 # ==================== Configuration Area ====================
# ⚠️ Important: Please modify the following paths according to your actual environment
# 🚨 Storage Performance Tips 🚨
# 💾 Strongly recommend storing model files on SSD solid-state drives!
# 📈 SSD can significantly improve model loading speed and inference performance
# 🐌 Using mechanical hard drives (HDD) may cause slow model loading and affect overall experience
# Lightx2v project root directory path
# Example: /home/user/lightx2v or /data/video_gen/lightx2v
lightx2v_path=/path/to/lightx2v
# Model path configuration
# Image-to-video model path (for i2v tasks)
# Example: /path/to/Wan2.1-I2V-14B-720P-Lightx2v
i2v_model_path=/path/to/Wan2.1-I2V-14B-720P-Lightx2v-Step-Distill
# Text-to-video model path (for t2v tasks)
# Example: /path/to/Wan2.1-T2V-1.3B
t2v_model_path=/path/to/Wan2.1-T2V-1.3B
# Model size configuration
# Default model size (14b, 1.3b)
model_size="14b"
# Server configuration
server_name="0.0.0.0"
server_port=8032
# GPU configuration
gpu_id=0
# ==================== Environment Variables Setup ====================
export CUDA_VISIBLE_DEVICES=$gpu_id
export CUDA_LAUNCH_BLOCKING=1 export CUDA_LAUNCH_BLOCKING=1
export PYTHONPATH=${lightx2v_path}:$PYTHONPATH export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
export ENABLE_PROFILING_DEBUG=true export ENABLE_PROFILING_DEBUG=true
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
python gradio_demo.py \ # ==================== Parameter Parsing ====================
--model_path $model_path \ # Default task type
--server_name 0.0.0.0 \ task="i2v"
--server_port 8005 # Default interface language
lang="zh"
# Parse command line arguments
while [[ $# -gt 0 ]]; do
case $1 in
--task)
task="$2"
shift 2
;;
--lang)
lang="$2"
shift 2
;;
--port)
server_port="$2"
shift 2
;;
--gpu)
gpu_id="$2"
export CUDA_VISIBLE_DEVICES=$gpu_id
shift 2
;;
--model_size)
model_size="$2"
shift 2
;;
--help)
echo "🎬 Lightx2v Gradio Demo Startup Script"
echo "=========================================="
echo "Usage: $0 [options]"
echo ""
echo "📋 Available options:"
echo " --task i2v|t2v Task type (default: i2v)"
echo " i2v: Image-to-video generation"
echo " t2v: Text-to-video generation"
echo " --lang zh|en Interface language (default: zh)"
echo " zh: Chinese interface"
echo " en: English interface"
echo " --port PORT Server port (default: 8032)"
echo " --gpu GPU_ID GPU device ID (default: 0)"
echo " --model_size MODEL_SIZE"
echo " Model size (default: 14b)"
echo " 14b: 14 billion parameters model"
echo " 1.3b: 1.3 billion parameters model"
echo " --help Show this help message"
echo ""
echo "🚀 Usage examples:"
echo " $0 # Default startup for image-to-video mode"
echo " $0 --task i2v --lang zh --port 8032 # Start with specified parameters"
echo " $0 --task t2v --lang en --port 7860 # Text-to-video with English interface"
echo " $0 --task i2v --gpu 1 --port 8032 # Use GPU 1"
echo " $0 --task t2v --model_size 1.3b # Use 1.3B model"
echo " $0 --task i2v --model_size 14b # Use 14B model"
echo ""
echo "📝 Notes:"
echo " - Edit script to configure model paths before first use"
echo " - Ensure required Python dependencies are installed"
echo " - Recommended to use GPU with 8GB+ VRAM"
echo " - 🚨 Strongly recommend storing models on SSD for better performance"
exit 0
;;
*)
echo "Unknown parameter: $1"
echo "Use --help to see help information"
exit 1
;;
esac
done
# ==================== Parameter Validation ====================
if [[ "$task" != "i2v" && "$task" != "t2v" ]]; then
echo "Error: Task type must be 'i2v' or 't2v'"
exit 1
fi
if [[ "$lang" != "zh" && "$lang" != "en" ]]; then
echo "Error: Language must be 'zh' or 'en'"
exit 1
fi
# Validate model size
if [[ "$model_size" != "14b" && "$model_size" != "1.3b" ]]; then
echo "Error: Model size must be '14b' or '1.3b'"
exit 1
fi
# Select model path based on task type
if [[ "$task" == "i2v" ]]; then
model_path=$i2v_model_path
echo "🎬 Starting Image-to-Video mode"
else
model_path=$t2v_model_path
echo "🎬 Starting Text-to-Video mode"
fi
# Check if model path exists
if [[ ! -d "$model_path" ]]; then
echo "❌ Error: Model path does not exist"
echo "📁 Path: $model_path"
echo "🔧 Solutions:"
echo " 1. Check model path configuration in script"
echo " 2. Ensure model files are properly downloaded"
echo " 3. Verify path permissions are correct"
echo " 4. 💾 Recommend storing models on SSD for faster loading"
exit 1
fi
# Select demo file based on language
if [[ "$lang" == "zh" ]]; then
demo_file="gradio_demo_zh.py"
echo "🌏 Using Chinese interface"
else
demo_file="gradio_demo.py"
echo "🌏 Using English interface"
fi
# Check if demo file exists
if [[ ! -f "$demo_file" ]]; then
echo "❌ Error: Demo file does not exist"
echo "📄 File: $demo_file"
echo "🔧 Solutions:"
echo " 1. Ensure script is run in the correct directory"
echo " 2. Check if file has been renamed or moved"
echo " 3. Re-clone or download project files"
exit 1
fi
# ==================== System Information Display ====================
echo "=========================================="
echo "🚀 Lightx2v Gradio Demo Starting..."
echo "=========================================="
echo "📁 Project path: $lightx2v_path"
echo "🤖 Model path: $model_path"
echo "🎯 Task type: $task"
echo "🤖 Model size: $model_size"
echo "🌏 Interface language: $lang"
echo "🖥️ GPU device: $gpu_id"
echo "🌐 Server address: $server_name:$server_port"
echo "=========================================="
# Display system resource information
echo "💻 System resource information:"
free -h | grep -E "Mem|Swap"
echo ""
# Display GPU information
if command -v nvidia-smi &> /dev/null; then
echo "🎮 GPU information:"
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader,nounits | head -1
echo ""
fi
# ==================== Start Demo ====================
echo "🎬 Starting Gradio demo..."
echo "📱 Please access in browser: http://$server_name:$server_port"
echo "⏹️ Press Ctrl+C to stop service"
echo "🔄 First startup may take several minutes to load model..."
echo "=========================================="
# Start Python demo
python $demo_file \
--model_path "$model_path" \
--task "$task" \
--server_name "$server_name" \
--server_port "$server_port" \
--model_size "$model_size"
# python gradio_demo_zh.py \ # Display final system resource usage
# --model_path $model_path \ echo ""
# --server_name 0.0.0.0 \ echo "=========================================="
# --server_port 8005 echo "📊 Final system resource usage:"
free -h | grep -E "Mem|Swap"
{
"infer_steps": 40,
"target_video_length": 81,
"target_height": 480,
"target_width": 832,
"self_attn_1_type": "radial_attn",
"cross_attn_1_type": "flash_attn3",
"cross_attn_2_type": "flash_attn3",
"seed": 42,
"sample_guide_scale": 5,
"sample_shift": 5,
"enable_cfg": true,
"cpu_offload": false
}
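The JSON above is an inference config whose values mirror the demo defaults (40 steps, 81 frames at 832x480, seed 42, CFG guide scale 5, shift 5) and switches the first self-attention to `radial_attn`. A minimal loading sketch — the filename is an assumption; this commit does not show where the file lives or what consumes it:

```python
import json

# Hypothetical filename; only the config contents appear in this commit.
with open("wan_i2v_radial_attn.json") as f:
    cfg = json.load(f)

# UI or CLI values could override the file defaults before inference, e.g.:
cfg.update({"infer_steps": 20, "seed": 123})
print(cfg["self_attn_1_type"])  # -> "radial_attn"
```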