Commit ae089db4 authored by GoatWu

Merge branch 'main' of github.com:ModelTC/lightx2v into dev-debug-distill

parents 8b213df0 4796fc6e
<div align="center" style="font-family: charter;">
<h1>⚡️ LightX2V:<br> Light Video Generation Inference Framework</h1>
<div align="center" id="lightx2v">
<img alt="logo" src="assets/img_lightx2v.png" width=75%></img>

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ModelTC/lightx2v)
[![Doc](https://img.shields.io/badge/docs-English-99cc2)](https://lightx2v-en.readthedocs.io/en/latest)
[![Doc](https://img.shields.io/badge/文档-中文-99cc2)](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest)
[![Docker](https://badgen.net/badge/icon/docker?icon=docker&label)](https://hub.docker.com/r/lightx2v/lightx2v/tags)

**\[ English | [中文](README_zh.md) | [日本語](README_ja.md) \]**

</div>
</div>

--------------------------------------------------------------------------------

**LightX2V** is a lightweight video generation inference framework designed to provide an inference tool that leverages multiple advanced video generation inference techniques. As a unified inference platform, the framework supports various generation tasks, such as text-to-video (T2V) and image-to-video (I2V), across different models. **X2V means transforming different input modalities (such as text or images) to video output.**

## 💡 How to Start

Please refer to our documentation: **[English Docs](https://lightx2v-en.readthedocs.io/en/latest/) | [中文文档](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/)**.

## 🤖 Supported Model List

- [HunyuanVideo-T2V](https://huggingface.co/tencent/HunyuanVideo)
- [HunyuanVideo-I2V](https://huggingface.co/tencent/HunyuanVideo-I2V)
- [Wan2.1-T2V](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B)
- [Wan2.1-I2V](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P)
- [Wan2.1-T2V-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) (recommended 🚀🚀🚀)
- [Wan2.1-T2V-CausVid](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-CausVid)
- [SkyReels-V2-DF](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-540P)
- [CogVideoX1.5-5B-T2V](https://huggingface.co/THUDM/CogVideoX1.5-5B)

## 🧾 Contributing Guidelines

We have prepared a `pre-commit` hook to enforce consistent code formatting across the project.

> [!TIP]
> - Install the required dependencies:
>
> ```shell
> pip install ruff pre-commit
> ```
>
> - Then, run the following command before committing:
>
> ```shell
> pre-commit run --all-files
> ```

Thank you for your contributions!

## 🤝 Acknowledgments

We built the code for this repository by referencing the code repositories involved in all the models mentioned above.

## 🌟 Star History

[![Star History Chart](https://api.star-history.com/svg?repos=ModelTC/lightx2v&type=Timeline)](https://star-history.com/#ModelTC/lightx2v&Timeline)

## ✏️ Citation

If you find our framework useful to your research, please kindly cite our work:

```bibtex
@misc{lightx2v,
  author = {lightx2v contributors},
  title = {LightX2V: Light Video Generation Inference Framework},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ModelTC/lightx2v}},
}
```
<div align="center" style="font-family: charter;">
<h1>⚡️ LightX2V:<br> Light Video Generation Inference Framework</h1>

<img alt="logo" src="assets/img_lightx2v.png" width=75%></img>

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ModelTC/lightx2v)
[![Doc](https://img.shields.io/badge/docs-English-99cc2)](https://lightx2v-en.readthedocs.io/en/latest)
[![Doc](https://img.shields.io/badge/ドキュメント-日本語-99cc2)](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest)
[![Docker](https://badgen.net/badge/icon/docker?icon=docker&label)](https://hub.docker.com/r/lightx2v/lightx2v/tags)

**\[ [English](README.md) | [中文](README_zh.md) | 日本語 \]**

</div>

--------------------------------------------------------------------------------

**LightX2V** is a lightweight video generation inference framework that combines multiple advanced video generation inference techniques. A single platform supports a variety of generation tasks and models, including text-to-video (T2V) and image-to-video (I2V). **X2V means transforming different input modalities (such as text or images) to video output.**

## 💡 Getting Started

For detailed instructions, please see the documentation: **[English Docs](https://lightx2v-en.readthedocs.io/en/latest/)** | **[中文文档](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/)**

## 🤖 Supported Model List

- [HunyuanVideo-T2V](https://huggingface.co/tencent/HunyuanVideo)
- [HunyuanVideo-I2V](https://huggingface.co/tencent/HunyuanVideo-I2V)
- [Wan2.1-T2V](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B)
- [Wan2.1-I2V](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P)
- [Wan2.1-T2V-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) (recommended 🚀🚀🚀)
- [Wan2.1-T2V-CausVid](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-CausVid)
- [SkyReels-V2-DF](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-540P)
- [CogVideoX1.5-5B-T2V](https://huggingface.co/THUDM/CogVideoX1.5-5B)

## 🧾 Contributing Guidelines

We provide a `pre-commit` hook to enforce consistent code formatting across the project.

> [!TIP]
> 1. Install the required dependencies:
>
> ```bash
> pip install ruff pre-commit
> ```
>
> 2. Then, run the following command before committing:
>
> ```bash
> pre-commit run --all-files
> ```

Thank you for your contributions!

## 🤝 Acknowledgments

The implementation of this repository references the code repositories of all the models mentioned above.

## 🌟 Star History

[![Star History Chart](https://api.star-history.com/svg?repos=ModelTC/lightx2v&type=Timeline)](https://star-history.com/#ModelTC/lightx2v&Timeline)

## ✏️ Citation

If you find this framework useful to your research, please cite:

```bibtex
@misc{lightx2v,
  author = {lightx2v contributors},
  title = {LightX2V: Light Video Generation Inference Framework},
  year = {2025},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/ModelTC/lightx2v}},
}
```
<div align="center" style="font-family: charter;">
<h1>⚡️ LightX2V:<br>Light Video Generation Inference Framework</h1>

<img alt="logo" src="assets/img_lightx2v.png" width=75%></img>

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ModelTC/lightx2v)
[![Doc](https://img.shields.io/badge/docs-English-99cc2)](https://lightx2v-en.readthedocs.io/en/latest)
[![Doc](https://img.shields.io/badge/文档-中文-99cc2)](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest)
[![Docker](https://badgen.net/badge/icon/docker?icon=docker&label)](https://hub.docker.com/r/lightx2v/lightx2v/tags)

**\[ [English](README.md) | 中文 | [日本語](README_ja.md) \]**

</div>

--------------------------------------------------------------------------------

**LightX2V** is a lightweight video generation inference framework that integrates multiple advanced video generation inference techniques, providing unified support for generation tasks and models such as text-to-video (T2V) and image-to-video (I2V). **X2V means transforming different input modalities (X, such as text or images) to video output (V).**

## 💡 Quick Start

Please refer to the documentation: **[English Docs](https://lightx2v-en.readthedocs.io/en/latest/)** | **[中文文档](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/)**

## 🤖 Supported Model List

- [HunyuanVideo-T2V](https://huggingface.co/tencent/HunyuanVideo)
- [HunyuanVideo-I2V](https://huggingface.co/tencent/HunyuanVideo-I2V)
- [Wan2.1-T2V](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B)
- [Wan2.1-I2V](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P)
- [Wan2.1-T2V-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) (recommended 🚀🚀🚀)
- [Wan2.1-T2V-CausVid](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-CausVid)
- [SkyReels-V2-DF](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-540P)
- [CogVideoX1.5-5B-T2V](https://huggingface.co/THUDM/CogVideoX1.5-5B)

## 🧾 Contributing Guidelines

We use `pre-commit` to enforce a consistent code format.

> [!TIP]
> - Install the required dependencies:
>
> ```shell
> pip install ruff pre-commit
> ```
>
> - Then, run the following command before committing:
>
> ```shell
> pre-commit run --all-files
> ```

Contributions are welcome!

## 🤝 Acknowledgments

The implementation of this repository references the code repositories of all the models listed above.

## 🌟 Star History

[![Star History Chart](https://api.star-history.com/svg?repos=ModelTC/lightx2v&type=Timeline)](https://star-history.com/#ModelTC/lightx2v&Timeline)

## ✏️ Citation

If you find this framework helpful for your research, please cite:

```bibtex
@misc{lightx2v,
  author = {lightx2v contributors},
  title = {LightX2V: Light Video Generation Inference Framework},
  year = {2025},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/ModelTC/lightx2v}},
}
```
# Lightx2v Gradio Demo Interface
## 📖 Overview
Lightx2v is a lightweight video inference and generation engine that provides a web interface based on Gradio, supporting both Image-to-Video and Text-to-Video generation modes.
This project contains two main demo files:
- `gradio_demo.py` - English interface version
- `gradio_demo_zh.py` - Chinese interface version
## 🚀 Quick Start
### System Requirements
- Python 3.10+ (recommended)
- CUDA 12.4+ (recommended)
- At least 8GB GPU VRAM
- At least 16GB system memory
- At least 128 GB of SSD storage (**💾 We strongly recommend storing model files on an SSD: with "lazy loading" enabled at startup, it significantly improves model loading speed and inference performance**)
### Install Dependencies
```bash
# Install basic dependencies
pip install -r ../requirements.txt
pip install gradio
```
#### Recommended Optimization Library Configuration
- [Flash attention](https://github.com/Dao-AILab/flash-attention)
- [Sage attention](https://github.com/thu-ml/SageAttention)
- [vllm-kernel](https://github.com/vllm-project/vllm)
- [sglang-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel)
- [q8-kernel](https://github.com/KONAKONA666/q8_kernels) (only supports ADA architecture GPUs)
### 🤖 Supported Models
#### 🎬 Image-to-Video Models
| Model Name | Resolution | Parameters | Features | Recommended Use |
|------------|------------|------------|----------|-----------------|
| ✅ [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 480p | 14B | Standard version | Balance speed and quality |
| ✅ [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 720p | 14B | HD version | Pursue high-quality output |
| ✅ [Wan2.1-I2V-14B-480P-Lightx2v-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 480p | 14B | Distilled optimized version | Faster inference speed |
| ✅ [Wan2.1-I2V-14B-720P-Lightx2v-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 720p | 14B | HD distilled version | High quality + fast inference |
#### 📝 Text-to-Video Models
| Model Name | Parameters | Features | Recommended Use |
|------------|------------|----------|-----------------|
| ✅ [Wan2.1-T2V-1.3B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 1.3B | Lightweight | Fast prototyping and testing |
| ✅ [Wan2.1-T2V-14B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 14B | Standard version | Balance speed and quality |
| ✅ [Wan2.1-T2V-14B-Lightx2v-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 14B | Distilled optimized version | High quality + fast inference |
**💡 Model Selection Recommendations**:
- **First-time use**: Recommend choosing distilled versions
- **Pursuing quality**: Choose 720p resolution or 14B parameter models
- **Pursuing speed**: Choose 480p resolution or 1.3B parameter models
- **Resource-constrained**: Prioritize distilled versions and lower resolutions
### Startup Methods
#### Method 1: Using Startup Script (Recommended)
```bash
# 1. Edit the startup script to configure relevant paths
vim run_gradio.sh
# Configuration items that need to be modified:
# - lightx2v_path: Lightx2v project root directory path
# - i2v_model_path: Image-to-video model path
# - t2v_model_path: Text-to-video model path
# 💾 Important note: Recommend pointing model paths to SSD storage locations
# Example: /mnt/ssd/models/ or /data/ssd/models/
# 2. Run the startup script
bash run_gradio.sh
# 3. Or start with parameters (recommended)
bash run_gradio.sh --task i2v --lang en --port 8032
# bash run_gradio.sh --task t2v --lang en --port 8032
```
#### Method 2: Direct Command Line Startup
**Image-to-Video Mode:**
```bash
python gradio_demo.py \
--model_path /path/to/Wan2.1-I2V-14B-720P-Lightx2v \
--task i2v \
--server_name 0.0.0.0 \
--server_port 7862
```
**Text-to-Video Mode:**
```bash
python gradio_demo.py \
--model_path /path/to/Wan2.1-T2V-1.3B \
--task t2v \
--server_name 0.0.0.0 \
--server_port 7862
```
**Chinese Interface Version:**
```bash
python gradio_demo_zh.py \
--model_path /path/to/model \
--task i2v \
--server_name 0.0.0.0 \
--server_port 7862
```
## 📋 Command Line Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `--model_path` | str | ✅ | - | Model folder path |
| `--model_cls` | str | ❌ | wan2.1 | Model class (currently only supports wan2.1) |
| `--task` | str | ✅ | - | Task type: `i2v` (image-to-video) or `t2v` (text-to-video) |
| `--server_port` | int | ❌ | 7862 | Server port |
| `--server_name` | str | ❌ | 0.0.0.0 | Server IP address |
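
The table above maps directly onto a standard `argparse` setup. A minimal sketch (parameter names and defaults follow the table; the `build_parser` helper name is illustrative):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the CLI table: --model_path and --task are required,
    # the rest fall back to the documented defaults.
    parser = argparse.ArgumentParser(description="Lightx2v Gradio demo")
    parser.add_argument("--model_path", type=str, required=True, help="Model folder path")
    parser.add_argument("--model_cls", type=str, default="wan2.1", choices=["wan2.1"], help="Model class (currently only wan2.1)")
    parser.add_argument("--task", type=str, required=True, choices=["i2v", "t2v"], help="i2v (image-to-video) or t2v (text-to-video)")
    parser.add_argument("--server_port", type=int, default=7862, help="Server port")
    parser.add_argument("--server_name", type=str, default="0.0.0.0", help="Server IP address")
    return parser


# Example invocation with an explicit argv list:
args = build_parser().parse_args(["--model_path", "/path/to/model", "--task", "i2v"])
```

Passing `--task` with any value other than `i2v` or `t2v` makes `argparse` exit with an error, which is why the demo validates the task type at startup rather than at inference time.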
## 🎯 Features
### Basic Settings
#### Model Type Selection
- **Wan2.1 14B**: Large parameter count, high generation quality, suitable for high-quality video generation
- **Wan2.1 1.3B**: Lightweight model, fast speed, suitable for rapid prototyping and testing
#### Input Parameters
- **Prompt**: Describe the expected video content
- **Negative Prompt**: Specify elements you don't want to appear
- **Resolution**: Supports multiple preset resolutions (480p/540p/720p)
- **Random Seed**: Controls the randomness of generation results
- **Inference Steps**: Affects the balance between generation quality and speed
#### Video Parameters
- **FPS**: Frames per second
- **Total Frames**: Video length
- **CFG Scale Factor**: Controls prompt influence strength (1-10)
- **Distribution Shift**: Controls generation style deviation degree (0-10)
### Advanced Optimization Options
#### GPU Memory Optimization
- **Chunked Rotary Position Embedding**: Saves GPU memory
- **Rotary Embedding Chunk Size**: Controls chunk granularity
- **Clean CUDA Cache**: Promptly frees GPU memory
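
The chunked options above all follow the same pattern: process a long sequence in slices so that only one chunk's worth of temporary buffers is alive at a time. A toy sketch of that pattern (pure Python; the actual demo applies it to CUDA tensors, and `apply_in_chunks` is an illustrative name, not a LightX2V API):

```python
def apply_in_chunks(seq, fn, chunk_size=100):
    """Apply fn to seq one chunk at a time instead of all at once,
    bounding the size of intermediate buffers at the cost of more calls."""
    out = []
    for start in range(0, len(seq), chunk_size):
        out.extend(fn(seq[start:start + chunk_size]))
    return out


# Smaller chunk_size => lower peak memory, more per-chunk overhead.
doubled = apply_in_chunks(list(range(10)), lambda chunk: [2 * x for x in chunk], chunk_size=4)
```

This is the trade-off the "Rotary Embedding Chunk Size" option exposes: smaller chunks save more GPU memory but add per-chunk launch overhead.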
#### Asynchronous Offloading
- **CPU Offloading**: Transfers partial computation to CPU
- **Lazy Loading**: Loads model components on-demand, significantly reduces system memory consumption
- **Offload Granularity Control**: Fine-grained control of offloading strategies
#### Low-Precision Quantization
- **Attention Operators**: Flash Attention, Sage Attention, etc.
- **Quantization Operators**: vLLM, SGL, Q8F, etc.
- **Precision Modes**: FP8, INT8, BF16, etc.
#### VAE Optimization
- **Lightweight VAE**: Accelerates decoding process
- **VAE Tiling Inference**: Reduces memory usage
#### Feature Caching
- **Tea Cache**: Caches intermediate features to accelerate generation
- **Cache Threshold**: Controls cache trigger conditions
- **Key Step Caching**: Writes cache only at key steps
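
The caching idea is to reuse an intermediate feature when it barely changes between denoising steps, and to recompute (a "key step") only when the relative change exceeds the threshold. A toy sketch of the threshold logic (this is not the actual Tea Cache implementation; the function and its metric are illustrative):

```python
def should_reuse_cache(prev_feat, cur_feat, thresh=0.05):
    """Reuse the cached feature when the mean relative change is below thresh."""
    if prev_feat is None:
        return False  # nothing cached yet: this is always a key step
    num = sum(abs(c - p) for c, p in zip(cur_feat, prev_feat)) / len(cur_feat)
    den = sum(abs(p) for p in prev_feat) / len(prev_feat) + 1e-8
    return num / den < thresh


cache = None
for step_feat in ([1.0] * 4, [1.01] * 4, [2.0] * 4):
    if should_reuse_cache(cache, step_feat):
        feat = cache      # small change: skip recomputation this step
    else:
        feat = step_feat  # key step: recompute and refresh the cache
        cache = step_feat
```

Raising the cache threshold skips more steps (faster, but riskier for quality), which is why the interface exposes it as a tunable.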
## 🔧 Auto-Configuration Feature
After enabling "Auto-configure Inference Options", the system will automatically optimize parameters based on your hardware configuration:
### GPU Memory Rules
- **80GB+**: Default configuration, no optimization needed
- **48GB**: Enable CPU offloading, offload ratio 50%
- **40GB**: Enable CPU offloading, offload ratio 80%
- **32GB**: Enable CPU offloading, offload ratio 100%
- **24GB**: Enable BF16 precision, VAE tiling
- **16GB**: Enable chunked offloading, rotary embedding chunking
- **12GB**: Enable cache cleaning, lightweight VAE
- **8GB**: Enable quantization, lazy loading
### CPU Memory Rules
- **128GB+**: Default configuration
- **64GB**: Enable DIT quantization
- **32GB**: Enable lazy loading
- **16GB**: Enable full model quantization
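
Internally, rules like these can be expressed as an ordered table of `(threshold, option updates)` pairs, matching the `gpu_rules`/`cpu_rules` structure in the demo code: walk the table from the largest threshold down and apply the first entry the detected memory satisfies. A sketch consistent with that style (option names echo the documented rules; exact values in the demo may differ):

```python
# First-match rule table: the entry for the largest threshold that the
# detected GPU memory still satisfies wins. Values are illustrative.
GPU_RULES = [
    (80, {}),                                               # 80GB+: defaults
    (48, {"cpu_offload": True, "offload_ratio": 0.5}),
    (40, {"cpu_offload": True, "offload_ratio": 0.8}),
    (32, {"cpu_offload": True, "offload_ratio": 1.0}),
    (24, {"precision_mode": "bf16", "use_tiling_vae": True}),
    (16, {"offload_granularity": "block", "rotary_chunk": True}),
    (12, {"clean_cuda_cache": True, "use_tiny_vae": True}),
    (8,  {"dit_quant_scheme": "int8", "lazy_load": True}),
]


def auto_configure(gpu_memory_gb: float) -> dict:
    """Return the option overrides for the first matching memory tier."""
    config = {}
    for threshold, updates in GPU_RULES:
        if gpu_memory_gb >= threshold:
            config.update(updates)
            break
    return config
```

For example, a 40 GB GPU falls through the 80 GB and 48 GB tiers and picks up the 80% offload-ratio entry.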
## ⚠️ Important Notes
### 🚀 Low-Resource Device Optimization Recommendations
**💡 For devices with insufficient VRAM or performance constraints**:
- **🎯 Model Selection**: Prioritize using distilled version models (StepDistill-CfgDistill)
- **⚡ Inference Steps**: Recommend setting to 4 steps
- **🔧 CFG Settings**: Recommend disabling CFG option to improve generation speed
- **🔄 Auto-Configuration**: Enable "Auto-configure Inference Options"
### 🔧 Quick Optimization Configuration Examples
```bash
# Start with the distilled model
bash run_gradio.sh --task i2v
```

Recommended interface settings:

- Inference Steps: 25
- CFG Scale Factor: 4
- Resolution: 832x480
- Auto-Configuration: Enabled
- Quantization Scheme: int8
- Tea Cache: Enabled
## 📁 File Structure
```
lightx2v/app/
├── gradio_demo.py # English interface demo
├── gradio_demo_zh.py # Chinese interface demo
├── run_gradio.sh # Startup script
├── README.md # Documentation
├── saved_videos/ # Generated video save directory
└── inference_logs.log # Inference logs
```
## 🎨 Interface Description
### Basic Settings Tab
- **Input Parameters**: Model type, prompts, resolution, and other basic settings
- **Video Parameters**: FPS, frame count, CFG, and other video generation parameters
- **Output Settings**: Video save path configuration
### Advanced Options Tab
- **GPU Memory Optimization**: Memory management related options
- **Asynchronous Offloading**: CPU offloading and lazy loading
- **Low-Precision Quantization**: Various quantization optimization options
- **VAE Optimization**: Variational Autoencoder optimization
- **Feature Caching**: Cache strategy configuration
## 🔍 Troubleshooting
### Common Issues
**💡 Tip**: In general, after enabling "Auto-configure Inference Options", the system automatically optimizes parameter settings for your hardware configuration, so performance problems are unlikely. If you do encounter issues, try the following solutions:
1. **CUDA Memory Insufficient**
- Enable CPU offloading
- Reduce resolution
- Enable quantization options
2. **System Memory Insufficient**
- Enable CPU offloading
- Enable lazy loading option
- Enable quantization options
3. **Slow Generation Speed**
- Reduce inference steps
- Enable auto-configuration
- Use lightweight models
- Enable Tea Cache
- Use quantization operators
- 💾 **Check if models are stored on SSD**
4. **Slow Model Loading**
- 💾 **Migrate models to SSD storage**
- Enable lazy loading option
- Check disk I/O performance
- Consider using NVMe SSD
5. **Poor Video Quality**
- Increase inference steps
- Increase CFG scale factor
- Use 14B models
- Optimize prompts
### Log Viewing
```bash
# View inference logs
tail -f inference_logs.log
# View GPU usage
nvidia-smi
# View system resources
htop
```
**Note**: Please comply with relevant laws and regulations when using videos generated by this tool, and do not use them for illegal purposes.
# Lightx2v Gradio Demo Interface

## 📖 Overview

Lightx2v is a lightweight video inference and generation engine that provides a Gradio-based web interface, supporting both Image-to-Video and Text-to-Video generation modes.

This project contains two main demo files:
- `gradio_demo.py` - English interface version
- `gradio_demo_zh.py` - Chinese interface version

## 🚀 Quick Start

### System Requirements
- Python 3.10+ (recommended)
- CUDA 12.4+ (recommended)
- At least 8GB GPU VRAM
- At least 16GB system memory
- At least 128 GB of SSD storage (**💾 We strongly recommend storing model files on an SSD: with "lazy loading" enabled at startup, it significantly improves model loading speed and inference performance**)

### Install Dependencies
```bash
# Install basic dependencies
pip install -r ../requirements.txt
pip install gradio
```

#### Recommended Optimization Library Configuration
- [Flash attention](https://github.com/Dao-AILab/flash-attention)
- [Sage attention](https://github.com/thu-ml/SageAttention)
- [vllm-kernel](https://github.com/vllm-project/vllm)
- [sglang-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel)
- [q8-kernel](https://github.com/KONAKONA666/q8_kernels) (only supports ADA architecture GPUs)

### 🤖 Supported Models

#### 🎬 Image-to-Video Models
| Model Name | Resolution | Parameters | Features | Recommended Use |
|------------|------------|------------|----------|-----------------|
| ✅ [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 480p | 14B | Standard version | Balance of speed and quality |
| ✅ [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 720p | 14B | HD version | High-quality output |
| ✅ [Wan2.1-I2V-14B-480P-Lightx2v-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 480p | 14B | Distilled optimized version | Faster inference |
| ✅ [Wan2.1-I2V-14B-720P-Lightx2v-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 720p | 14B | HD distilled version | High quality + fast inference |

#### 📝 Text-to-Video Models
| Model Name | Parameters | Features | Recommended Use |
|------------|------------|----------|-----------------|
| ✅ [Wan2.1-T2V-1.3B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 1.3B | Lightweight | Rapid prototyping and testing |
| ✅ [Wan2.1-T2V-14B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 14B | Standard version | Balance of speed and quality |
| ✅ [Wan2.1-T2V-14B-Lightx2v-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 14B | Distilled optimized version | High quality + fast inference |

**💡 Model Selection Recommendations**:
- **First-time use**: choose a distilled version
- **For quality**: choose 720p resolution or 14B-parameter models
- **For speed**: choose 480p resolution or 1.3B-parameter models
- **Resource-constrained**: prefer distilled versions and lower resolutions

### Startup Methods

#### Method 1: Using the Startup Script (Recommended)
```bash
# 1. Edit the startup script to configure the relevant paths
vim run_gradio.sh
# Configuration items to modify:
# - lightx2v_path: Lightx2v project root directory
# - i2v_model_path: image-to-video model path
# - t2v_model_path: text-to-video model path
# 💾 Important: point the model paths at SSD storage locations
# e.g. /mnt/ssd/models/ or /data/ssd/models/
# 2. Run the startup script
bash run_gradio.sh
# 3. Or start with parameters (recommended)
bash run_gradio.sh --task i2v --lang zh --port 8032
# bash run_gradio.sh --task t2v --lang zh --port 8032
```

#### Method 2: Direct Command-Line Startup

**Image-to-Video Mode:**
```bash
python gradio_demo_zh.py \
    --model_path /path/to/Wan2.1-I2V-14B-720P-Lightx2v \
    --task i2v \
    --server_name 0.0.0.0 \
    --server_port 7862
```

**Text-to-Video Mode:**
```bash
python gradio_demo_zh.py \
    --model_path /path/to/Wan2.1-T2V-1.3B \
    --task t2v \
    --server_name 0.0.0.0 \
    --server_port 7862
```

**English Interface Version:**
```bash
python gradio_demo.py \
    --model_path /path/to/model \
    --task i2v \
    --server_name 0.0.0.0 \
    --server_port 7862
```

## 📋 Command-Line Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `--model_path` | str | ✅ | - | Model folder path |
| `--model_cls` | str | ❌ | wan2.1 | Model class (currently only supports wan2.1) |
| `--task` | str | ✅ | - | Task type: `i2v` (image-to-video) or `t2v` (text-to-video) |
| `--server_port` | int | ❌ | 7862 | Server port |
| `--server_name` | str | ❌ | 0.0.0.0 | Server IP address |

## 🎯 Features

### Basic Settings

#### Model Type Selection
- **Wan2.1 14B**: large parameter count and high generation quality; suitable for high-quality video generation
- **Wan2.1 1.3B**: lightweight and fast; suitable for rapid prototyping and testing

#### Input Parameters
- **Prompt**: describes the expected video content
- **Negative Prompt**: specifies elements that should not appear
- **Resolution**: multiple presets supported (480p/540p/720p)
- **Random Seed**: controls the randomness of the generated results
- **Inference Steps**: trades off generation quality against speed

#### Video Parameters
- **FPS**: frames per second
- **Total Frames**: video length
- **CFG Scale Factor**: controls the prompt's influence strength (1-10)
- **Distribution Shift**: controls how far the generation deviates in style (0-10)

### Advanced Optimization Options

#### GPU Memory Optimization
- **Chunked Rotary Position Embedding**: saves GPU memory
- **Rotary Embedding Chunk Size**: controls chunk granularity
- **Clean CUDA Cache**: frees GPU memory promptly

#### Asynchronous Offloading
- **CPU Offloading**: moves part of the computation to the CPU
- **Lazy Loading**: loads model components on demand, significantly reducing system memory consumption
- **Offload Granularity Control**: fine-grained control of the offloading strategy

#### Low-Precision Quantization
- **Attention Operators**: Flash Attention, Sage Attention, etc.
- **Quantization Operators**: vLLM, SGL, Q8F, etc.
- **Precision Modes**: FP8, INT8, BF16, etc.

#### VAE Optimization
- **Lightweight VAE**: accelerates the decoding process
- **VAE Tiling Inference**: reduces memory usage

#### Feature Caching
- **Tea Cache**: caches intermediate features to accelerate generation
- **Cache Threshold**: controls when caching is triggered
- **Key Step Caching**: writes to the cache only at key steps

## 🔧 Auto-Configuration Feature

After enabling "Auto-configure Inference Options", the system automatically optimizes parameters based on your hardware configuration:

### GPU Memory Rules
- **80GB+**: default configuration, no optimization needed
- **48GB**: enable CPU offloading, offload ratio 50%
- **40GB**: enable CPU offloading, offload ratio 80%
- **32GB**: enable CPU offloading, offload ratio 100%
- **24GB**: enable BF16 precision and VAE tiling
- **16GB**: enable chunked offloading and rotary embedding chunking
- **12GB**: enable cache cleaning and the lightweight VAE
- **8GB**: enable quantization and lazy loading

### CPU Memory Rules
- **128GB+**: default configuration
- **64GB**: enable DIT quantization
- **32GB**: enable lazy loading
- **16GB**: enable full model quantization

## ⚠️ Important Notes

### 🚀 Low-Resource Device Optimization Recommendations

**💡 For devices with insufficient VRAM or limited performance**:
- **🎯 Model Selection**: prefer distilled models (StepDistill-CfgDistill)
- **⚡ Inference Steps**: 4 steps recommended
- **🔧 CFG Settings**: disable the CFG option to improve generation speed
- **🔄 Auto-Configuration**: enable "Auto-configure Inference Options"

### 🔧 Quick Optimization Configuration Examples
```bash
# Start with the distilled model
bash run_gradio.sh --task i2v
```

Recommended interface settings:

- Inference Steps: 25
- CFG Scale Factor: 4
- Resolution: 832x480
- Auto-Configuration: enabled
- Quantization Scheme: int8
- Tea Cache: enabled

## 📁 File Structure
```
lightx2v/app/
├── gradio_demo.py       # English interface demo
├── gradio_demo_zh.py    # Chinese interface demo
├── run_gradio.sh        # Startup script
├── README.md            # Documentation
├── saved_videos/        # Generated video save directory
└── inference_logs.log   # Inference logs
```

## 🎨 Interface Description

### Basic Settings Tab
- **Input Parameters**: model type, prompts, resolution, and other basic settings
- **Video Parameters**: FPS, frame count, CFG, and other video generation parameters
- **Output Settings**: video save path configuration

### Advanced Options Tab
- **GPU Memory Optimization**: memory management options
- **Asynchronous Offloading**: CPU offloading and lazy loading
- **Low-Precision Quantization**: various quantization optimization options
- **VAE Optimization**: variational autoencoder optimization
- **Feature Caching**: cache strategy configuration

## 🔍 Troubleshooting

### Common Issues

**💡 Tip**: In general, after enabling "Auto-configure Inference Options", the system automatically optimizes parameter settings for your hardware configuration, so performance problems are unlikely. If you do encounter issues, try the following solutions:

1. **Insufficient CUDA Memory**
   - Enable CPU offloading
   - Reduce the resolution
   - Enable quantization options
2. **Insufficient System Memory**
   - Enable CPU offloading
   - Enable the lazy loading option
   - Enable quantization options
3. **Slow Generation Speed**
   - Reduce inference steps
   - Enable auto-configuration
   - Use a lightweight model
   - Enable Tea Cache
   - Use quantization operators
   - 💾 **Check whether the models are stored on an SSD**
4. **Slow Model Loading**
   - 💾 **Migrate the models to SSD storage**
   - Enable the lazy loading option
   - Check disk I/O performance
   - Consider using an NVMe SSD
5. **Poor Video Quality**
   - Increase inference steps
   - Increase the CFG scale factor
   - Use the 14B models
   - Optimize the prompts

### Log Viewing
```bash
# View inference logs
tail -f inference_logs.log
# View GPU usage
nvidia-smi
# View system resources
htop
```

Issues and Pull Requests to improve this project are welcome!

**Note**: Please comply with relevant laws and regulations when using videos generated by this tool, and do not use them for illegal purposes.
...@@ -148,10 +148,8 @@ for op_name, is_installed in available_attn_ops: ...@@ -148,10 +148,8 @@ for op_name, is_installed in available_attn_ops:
def run_inference( def run_inference(
model_type, model_type,
task,
prompt, prompt,
negative_prompt, negative_prompt,
image_path,
save_video_path, save_video_path,
torch_compile, torch_compile,
infer_steps, infer_steps,
...@@ -181,22 +179,18 @@ def run_inference( ...@@ -181,22 +179,18 @@ def run_inference(
rotary_chunk, rotary_chunk,
rotary_chunk_size, rotary_chunk_size,
clean_cuda_cache, clean_cuda_cache,
image_path=None,
): ):
quant_op = quant_op.split("(")[0].strip() quant_op = quant_op.split("(")[0].strip()
attention_type = attention_type.split("(")[0].strip() attention_type = attention_type.split("(")[0].strip()
global global_runner, current_config, model_path global global_runner, current_config, model_path, task
global cur_dit_quant_scheme, cur_clip_quant_scheme, cur_t5_quant_scheme, cur_precision_mode, cur_enable_teacache global cur_dit_quant_scheme, cur_clip_quant_scheme, cur_t5_quant_scheme, cur_precision_mode, cur_enable_teacache
if os.path.exists(os.path.join(model_path, "config.json")): if os.path.exists(os.path.join(model_path, "config.json")):
with open(os.path.join(model_path, "config.json"), "r") as f: with open(os.path.join(model_path, "config.json"), "r") as f:
model_config = json.load(f) model_config = json.load(f)
if task == "Image to Video":
task = "i2v"
elif task == "Text to Video":
task = "t2v"
if task == "t2v": if task == "t2v":
if model_type == "Wan2.1 1.3B": if model_type == "Wan2.1 1.3B":
# 1.3B # 1.3B
...@@ -551,6 +545,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -551,6 +545,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
"rotary_chunk_val": True, "rotary_chunk_val": True,
"rotary_chunk_size_val": 100, "rotary_chunk_size_val": 100,
"clean_cuda_cache_val": True, "clean_cuda_cache_val": True,
"use_tiny_vae_val": True,
}, },
), ),
( (
...@@ -569,6 +564,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -569,6 +564,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"dit_quant_scheme_val": quant_type, "dit_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"use_tiny_vae_val": True,
}, },
), ),
] ]
...@@ -606,6 +602,7 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -606,6 +602,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
"lazy_load_val": True, "lazy_load_val": True,
"rotary_chunk_val": True, "rotary_chunk_val": True,
"rotary_chunk_size_val": 10000, "rotary_chunk_size_val": 10000,
"use_tiny_vae_val": True,
} }
if res == "540p" if res == "540p"
else { else {
...@@ -619,11 +616,15 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -619,11 +616,15 @@ def auto_configure(enable_auto_config, model_type, resolution):
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"dit_quant_scheme_val": quant_type, "dit_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"use_tiny_vae_val": True,
} }
), ),
), ),
] ]
else:
gpu_rules = {}
if is_14b: if is_14b:
cpu_rules = [ cpu_rules = [
(128, {}), (128, {}),
...@@ -639,6 +640,8 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -639,6 +640,8 @@ def auto_configure(enable_auto_config, model_type, resolution):
}, },
), ),
] ]
else:
cpu_rules = {}
for threshold, updates in gpu_rules: for threshold, updates in gpu_rules:
if gpu_memory >= threshold: if gpu_memory >= threshold:
...@@ -654,12 +657,6 @@ def auto_configure(enable_auto_config, model_type, resolution): ...@@ -654,12 +657,6 @@ def auto_configure(enable_auto_config, model_type, resolution):
def main(): def main():
def update_model_type(task_type):
if task_type == "Image to Video":
return gr.update(choices=["Wan2.1 14B"], value="Wan2.1 14B")
elif task_type == "Text to Video":
return gr.update(choices=["Wan2.1 14B", "Wan2.1 1.3B"], value="Wan2.1 14B")
def toggle_image_input(task): def toggle_image_input(task):
return gr.update(visible=(task == "Image to Video")) return gr.update(visible=(task == "Image to Video"))
...@@ -684,35 +681,27 @@ def main(): ...@@ -684,35 +681,27 @@ def main():
gr.Markdown("## 📥 Input Parameters") gr.Markdown("## 📥 Input Parameters")
with gr.Row(): with gr.Row():
task = gr.Dropdown( if task == "i2v":
choices=["Image to Video", "Text to Video"],
value="Image to Video",
label="Task Type",
)
model_type = gr.Dropdown( model_type = gr.Dropdown(
choices=["Wan2.1 14B"], choices=["Wan2.1 14B"],
value="Wan2.1 14B", value="Wan2.1 14B",
label="Model Type", label="Model Type",
) )
task.change( else:
fn=update_model_type, model_type = gr.Dropdown(
inputs=task, choices=["Wan2.1 14B", "Wan2.1 1.3B"],
outputs=model_type, value="Wan2.1 14B",
label="Model Type",
) )
if task == "i2v":
with gr.Row(): with gr.Row():
image_path = gr.Image( image_path = gr.Image(
label="Input Image", label="Input Image",
type="filepath", type="filepath",
height=300, height=300,
interactive=True, interactive=True,
visible=True, # Initially visible visible=True,
)
task.change(
fn=toggle_image_input,
inputs=task,
outputs=image_path,
) )
with gr.Row(): with gr.Row():
...@@ -755,6 +744,13 @@ def main(): ...@@ -755,6 +744,13 @@ def main():
value="832x480", value="832x480",
label="Maximum Resolution", label="Maximum Resolution",
) )
with gr.Column():
enable_auto_config = gr.Checkbox(
label="Auto-configure Inference Options",
value=False,
info="Automatically optimize GPU settings to match the current resolution. After changing the resolution, please re-check this option to prevent potential performance degradation or runtime errors.",
)
with gr.Column(scale=9): with gr.Column(scale=9):
seed = gr.Slider( seed = gr.Slider(
label="Random Seed", label="Random Seed",
...@@ -836,14 +832,6 @@ def main(): ...@@ -836,14 +832,6 @@ def main():
with gr.Tab("⚙️ Advanced Options", id=2): with gr.Tab("⚙️ Advanced Options", id=2):
with gr.Group(elem_classes="advanced-options"): with gr.Group(elem_classes="advanced-options"):
gr.Markdown("### Auto configuration")
with gr.Row():
enable_auto_config = gr.Checkbox(
label="Auto configuration",
value=False,
info="Auto-tune optimization settings for your GPU",
)
gr.Markdown("### GPU Memory Optimization") gr.Markdown("### GPU Memory Optimization")
with gr.Row(): with gr.Row():
rotary_chunk = gr.Checkbox( rotary_chunk = gr.Checkbox(
...@@ -1007,15 +995,53 @@ def main():
use_ret_steps, use_ret_steps,
], ],
) )
if task == "i2v":
infer_btn.click( infer_btn.click(
fn=run_inference, fn=run_inference,
inputs=[ inputs=[
model_type, model_type,
task,
prompt, prompt,
negative_prompt, negative_prompt,
save_video_path,
torch_compile,
infer_steps,
num_frames,
resolution,
seed,
sample_shift,
enable_teacache,
teacache_thresh,
use_ret_steps,
enable_cfg,
cfg_scale,
dit_quant_scheme,
t5_quant_scheme,
clip_quant_scheme,
fps,
use_tiny_vae,
use_tiling_vae,
lazy_load,
precision_mode,
cpu_offload,
offload_granularity,
offload_ratio,
t5_offload_granularity,
attention_type,
quant_op,
rotary_chunk,
rotary_chunk_size,
clean_cuda_cache,
image_path, image_path,
],
outputs=output_video,
)
else:
infer_btn.click(
fn=run_inference,
inputs=[
model_type,
prompt,
negative_prompt,
save_video_path, save_video_path,
torch_compile, torch_compile,
infer_steps, infer_steps,
...@@ -1062,6 +1088,7 @@ if __name__ == "__main__":
default="wan2.1", default="wan2.1",
help="Model class to use", help="Model class to use",
) )
parser.add_argument("--task", type=str, required=True, choices=["i2v", "t2v"], help="Specify the task type. 'i2v' for image-to-video translation, 't2v' for text-to-video generation.")
parser.add_argument("--server_port", type=int, default=7862, help="Server port") parser.add_argument("--server_port", type=int, default=7862, help="Server port")
parser.add_argument("--server_name", type=str, default="0.0.0.0", help="Server ip") parser.add_argument("--server_name", type=str, default="0.0.0.0", help="Server ip")
args = parser.parse_args() args = parser.parse_args()
...@@ -1069,5 +1096,6 @@ if __name__ == "__main__":
global model_path, model_cls global model_path, model_cls
model_path = args.model_path model_path = args.model_path
model_cls = args.model_cls model_cls = args.model_cls
task = args.task
main() main()
...@@ -13,7 +13,6 @@ import importlib.util
import psutil import psutil
import random import random
logger.add( logger.add(
"inference_logs.log", "inference_logs.log",
rotation="100 MB", rotation="100 MB",
...@@ -98,7 +97,7 @@ def get_gpu_memory(gpu_idx=0):
try: try:
with torch.cuda.device(gpu_idx): with torch.cuda.device(gpu_idx):
memory_info = torch.cuda.mem_get_info() memory_info = torch.cuda.mem_get_info()
total_memory = memory_info[1] / (1024**3) total_memory = memory_info[1] / (1024**3) # Convert bytes to GB
return total_memory return total_memory
except Exception as e: except Exception as e:
logger.warning(f"获取GPU内存失败: {e}") logger.warning(f"获取GPU内存失败: {e}")
...@@ -149,10 +148,8 @@ for op_name, is_installed in available_attn_ops:
def run_inference( def run_inference(
model_type, model_type,
task,
prompt, prompt,
negative_prompt, negative_prompt,
image_path,
save_video_path, save_video_path,
torch_compile, torch_compile,
infer_steps, infer_steps,
...@@ -182,22 +179,18 @@ def run_inference(
rotary_chunk, rotary_chunk,
rotary_chunk_size, rotary_chunk_size,
clean_cuda_cache, clean_cuda_cache,
image_path=None,
): ):
quant_op = quant_op.split("(")[0].strip() quant_op = quant_op.split("(")[0].strip()
attention_type = attention_type.split("(")[0].strip() attention_type = attention_type.split("(")[0].strip()
global global_runner, current_config, model_path global global_runner, current_config, model_path, task
global cur_dit_quant_scheme, cur_clip_quant_scheme, cur_t5_quant_scheme, cur_precision_mode, cur_enable_teacache global cur_dit_quant_scheme, cur_clip_quant_scheme, cur_t5_quant_scheme, cur_precision_mode, cur_enable_teacache
if os.path.exists(os.path.join(model_path, "config.json")): if os.path.exists(os.path.join(model_path, "config.json")):
with open(os.path.join(model_path, "config.json"), "r") as f: with open(os.path.join(model_path, "config.json"), "r") as f:
model_config = json.load(f) model_config = json.load(f)
if task == "图像生成视频":
task = "i2v"
elif task == "文本生成视频":
task = "t2v"
if task == "t2v": if task == "t2v":
if model_type == "Wan2.1 1.3B": if model_type == "Wan2.1 1.3B":
# 1.3B # 1.3B
...@@ -407,6 +400,7 @@ def run_inference(
logger.info(f"使用模型: {model_path}") logger.info(f"使用模型: {model_path}")
logger.info(f"推理配置:\n{json.dumps(config, indent=4, ensure_ascii=False)}") logger.info(f"推理配置:\n{json.dumps(config, indent=4, ensure_ascii=False)}")
# Initialize or reuse the runner
runner = global_runner runner = global_runner
if needs_reinit: if needs_reinit:
if runner is not None: if runner is not None:
...@@ -551,6 +545,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
"rotary_chunk_val": True, "rotary_chunk_val": True,
"rotary_chunk_size_val": 100, "rotary_chunk_size_val": 100,
"clean_cuda_cache_val": True, "clean_cuda_cache_val": True,
"use_tiny_vae_val": True,
}, },
), ),
( (
...@@ -569,6 +564,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"dit_quant_scheme_val": quant_type, "dit_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"use_tiny_vae_val": True,
}, },
), ),
] ]
...@@ -606,6 +602,7 @@ def auto_configure(enable_auto_config, model_type, resolution):
"lazy_load_val": True, "lazy_load_val": True,
"rotary_chunk_val": True, "rotary_chunk_val": True,
"rotary_chunk_size_val": 10000, "rotary_chunk_size_val": 10000,
"use_tiny_vae_val": True,
} }
if res == "540p" if res == "540p"
else { else {
...@@ -619,11 +616,15 @@ def auto_configure(enable_auto_config, model_type, resolution):
"clip_quant_scheme_val": quant_type, "clip_quant_scheme_val": quant_type,
"dit_quant_scheme_val": quant_type, "dit_quant_scheme_val": quant_type,
"lazy_load_val": True, "lazy_load_val": True,
"use_tiny_vae_val": True,
} }
), ),
), ),
] ]
else:
gpu_rules = {}
if is_14b: if is_14b:
cpu_rules = [ cpu_rules = [
(128, {}), (128, {}),
...@@ -639,6 +640,8 @@ def auto_configure(enable_auto_config, model_type, resolution):
}, },
), ),
] ]
else:
cpu_rules = {}
for threshold, updates in gpu_rules: for threshold, updates in gpu_rules:
if gpu_memory >= threshold: if gpu_memory >= threshold:
...@@ -654,17 +657,11 @@ def auto_configure(enable_auto_config, model_type, resolution):
def main(): def main():
def update_model_type(task_type):
if task_type == "图像生成视频":
return gr.update(choices=["Wan2.1 14B"], value="Wan2.1 14B")
elif task_type == "文本生成视频":
return gr.update(choices=["Wan2.1 14B", "Wan2.1 1.3B"], value="Wan2.1 14B")
def toggle_image_input(task): def toggle_image_input(task):
return gr.update(visible=(task == "图像生成视频")) return gr.update(visible=(task == "i2v"))
with gr.Blocks( with gr.Blocks(
title="Lightx2v (轻量级视频生成推理引擎)", title="Lightx2v (轻量级视频推理和生成引擎)",
css=""" css="""
.main-content { max-width: 1400px; margin: auto; } .main-content { max-width: 1400px; margin: auto; }
.output-video { max-height: 650px; } .output-video { max-height: 650px; }
...@@ -684,22 +681,20 @@ def main():
gr.Markdown("## 📥 输入参数") gr.Markdown("## 📥 输入参数")
with gr.Row(): with gr.Row():
task = gr.Dropdown( if task == "i2v":
choices=["图像生成视频", "文本生成视频"],
value="图像生成视频",
label="任务类型",
)
model_type = gr.Dropdown( model_type = gr.Dropdown(
choices=["Wan2.1 14B"], choices=["Wan2.1 14B"],
value="Wan2.1 14B", value="Wan2.1 14B",
label="模型类型", label="模型类型",
) )
task.change( else:
fn=update_model_type, model_type = gr.Dropdown(
inputs=task, choices=["Wan2.1 14B", "Wan2.1 1.3B"],
outputs=model_type, value="Wan2.1 14B",
label="模型类型",
) )
if task == "i2v":
with gr.Row(): with gr.Row():
image_path = gr.Image( image_path = gr.Image(
label="输入图像", label="输入图像",
...@@ -709,12 +704,6 @@ def main():
visible=True, visible=True,
) )
task.change(
fn=toggle_image_input,
inputs=task,
outputs=image_path,
)
with gr.Row(): with gr.Row():
with gr.Column(): with gr.Column():
prompt = gr.Textbox( prompt = gr.Textbox(
...@@ -755,6 +744,11 @@ def main():
value="832x480", value="832x480",
label="最大分辨率", label="最大分辨率",
) )
with gr.Column():
enable_auto_config = gr.Checkbox(
label="自动配置推理选项", value=False, info="自动优化GPU设置以匹配当前分辨率。修改分辨率后,请重新勾选此选项,否则可能导致性能下降或运行失败。"
)
with gr.Column(scale=9): with gr.Column(scale=9):
seed = gr.Slider( seed = gr.Slider(
label="随机种子", label="随机种子",
...@@ -764,9 +758,10 @@ def main():
value=generate_random_seed(), value=generate_random_seed(),
) )
with gr.Column(scale=1): with gr.Column(scale=1):
randomize_btn = gr.Button("🎲 生成随机种子", variant="secondary") randomize_btn = gr.Button("🎲 随机化", variant="secondary")
randomize_btn.click(fn=generate_random_seed, inputs=None, outputs=seed) randomize_btn.click(fn=generate_random_seed, inputs=None, outputs=seed)
with gr.Column(): with gr.Column():
infer_steps = gr.Slider( infer_steps = gr.Slider(
label="推理步数", label="推理步数",
...@@ -774,7 +769,7 @@ def main():
maximum=100, maximum=100,
step=1, step=1,
value=40, value=40,
info="视频生成的推理步数。增加步数可能提高质量但降低速度", info="视频生成的推理步数。增加步数可能提高质量但降低速度",
) )
enable_cfg = gr.Checkbox( enable_cfg = gr.Checkbox(
...@@ -788,7 +783,7 @@ def main():
maximum=10, maximum=10,
step=1, step=1,
value=5, value=5,
info="控制提示词的影响强度。值越高,提示词的影响越大", info="控制提示词的影响强度。值越高,提示词的影响越大",
) )
sample_shift = gr.Slider( sample_shift = gr.Slider(
label="分布偏移", label="分布偏移",
...@@ -796,7 +791,7 @@ def main():
minimum=0, minimum=0,
maximum=10, maximum=10,
step=1, step=1,
info="控制样本分布偏移的程度。值越大表示偏移越明显", info="控制样本分布偏移的程度。值越大表示偏移越明显",
) )
fps = gr.Slider( fps = gr.Slider(
...@@ -805,7 +800,7 @@ def main():
maximum=30, maximum=30,
step=1, step=1,
value=16, value=16,
info="视频的每秒帧数。较高的FPS会产生更流畅的视频", info="视频的每秒帧数。较高的FPS会产生更流畅的视频",
) )
num_frames = gr.Slider( num_frames = gr.Slider(
label="总帧数", label="总帧数",
...@@ -813,7 +808,7 @@ def main():
maximum=120, maximum=120,
step=1, step=1,
value=81, value=81,
info="视频中的总帧数。更多帧数会产生更长的视频", info="视频中的总帧数。更多帧数会产生更长的视频",
) )
save_video_path = gr.Textbox( save_video_path = gr.Textbox(
...@@ -835,14 +830,6 @@ def main():
with gr.Tab("⚙️ 高级选项", id=2): with gr.Tab("⚙️ 高级选项", id=2):
with gr.Group(elem_classes="advanced-options"): with gr.Group(elem_classes="advanced-options"):
gr.Markdown("### 自动配置")
with gr.Row():
enable_auto_config = gr.Checkbox(
label="自动配置",
value=False,
info="自动调整优化设置以适应您的GPU",
)
gr.Markdown("### GPU内存优化") gr.Markdown("### GPU内存优化")
with gr.Row(): with gr.Row():
rotary_chunk = gr.Checkbox( rotary_chunk = gr.Checkbox(
...@@ -857,13 +844,13 @@ def main():
minimum=100, minimum=100,
maximum=10000, maximum=10000,
step=100, step=100,
info="控制应用旋转编码的块大小, 较大的值可能提高性能但增加内存使用, 仅在'rotary_chunk'勾选时有效", info="控制应用旋转编码的块大小较大的值可能提高性能但增加内存使用仅在'rotary_chunk'勾选时有效",
) )
clean_cuda_cache = gr.Checkbox( clean_cuda_cache = gr.Checkbox(
label="清理CUDA内存缓存", label="清理CUDA内存缓存",
value=False, value=False,
info="及时释放GPU内存, 但会减慢推理速度。", info="启用时,及时释放GPU内存但会减慢推理速度。",
) )
gr.Markdown("### 异步卸载") gr.Markdown("### 异步卸载")
...@@ -877,14 +864,14 @@ def main():
lazy_load = gr.Checkbox( lazy_load = gr.Checkbox(
label="启用延迟加载", label="启用延迟加载",
value=False, value=False,
info="在推理过程中延迟加载模型组件, 仅在'cpu_offload'勾选和使用量化Dit模型时有效", info="在推理过程中延迟加载模型组件。需要CPU加载和DIT量化。",
) )
offload_granularity = gr.Dropdown( offload_granularity = gr.Dropdown(
label="Dit卸载粒度", label="Dit卸载粒度",
choices=["block", "phase"], choices=["block", "phase"],
value="phase", value="phase",
info="设置Dit模型卸载粒度: 块或计算阶段", info="设置Dit模型卸载粒度块或计算阶段",
) )
offload_ratio = gr.Slider( offload_ratio = gr.Slider(
label="Dit模型卸载比例", label="Dit模型卸载比例",
...@@ -926,25 +913,25 @@ def main():
label="Dit", label="Dit",
choices=["fp8", "int8", "bf16"], choices=["fp8", "int8", "bf16"],
value="bf16", value="bf16",
info="Dit模型的推理精度", info="Dit模型的量化精度",
) )
t5_quant_scheme = gr.Dropdown( t5_quant_scheme = gr.Dropdown(
label="T5编码器", label="T5编码器",
choices=["fp8", "int8", "bf16"], choices=["fp8", "int8", "bf16"],
value="bf16", value="bf16",
info="T5编码器模型的推理精度", info="T5编码器模型的量化精度",
) )
clip_quant_scheme = gr.Dropdown( clip_quant_scheme = gr.Dropdown(
label="Clip编码器", label="Clip编码器",
choices=["fp8", "int8", "fp16"], choices=["fp8", "int8", "fp16"],
value="fp16", value="fp16",
info="Clip编码器的推理精度", info="Clip编码器的量化精度",
) )
precision_mode = gr.Dropdown( precision_mode = gr.Dropdown(
label="敏感层精度", label="敏感层精度模式",
choices=["fp32", "bf16"], choices=["fp32", "bf16"],
value="fp32", value="fp32",
info="选择用于敏感层(如norm层和embedding层)的数值精度", info="选择用于关键模型组件(如归一化和嵌入层)的数值精度。FP32提供更高精度,而BF16在兼容硬件上提高性能。",
) )
gr.Markdown("### 变分自编码器(VAE)") gr.Markdown("### 变分自编码器(VAE)")
...@@ -1006,15 +993,53 @@ def main():
use_ret_steps, use_ret_steps,
], ],
) )
if task == "i2v":
infer_btn.click( infer_btn.click(
fn=run_inference, fn=run_inference,
inputs=[ inputs=[
model_type, model_type,
task,
prompt, prompt,
negative_prompt, negative_prompt,
save_video_path,
torch_compile,
infer_steps,
num_frames,
resolution,
seed,
sample_shift,
enable_teacache,
teacache_thresh,
use_ret_steps,
enable_cfg,
cfg_scale,
dit_quant_scheme,
t5_quant_scheme,
clip_quant_scheme,
fps,
use_tiny_vae,
use_tiling_vae,
lazy_load,
precision_mode,
cpu_offload,
offload_granularity,
offload_ratio,
t5_offload_granularity,
attention_type,
quant_op,
rotary_chunk,
rotary_chunk_size,
clean_cuda_cache,
image_path, image_path,
],
outputs=output_video,
)
else:
infer_btn.click(
fn=run_inference,
inputs=[
model_type,
prompt,
negative_prompt,
save_video_path, save_video_path,
torch_compile, torch_compile,
infer_steps, infer_steps,
...@@ -1061,6 +1086,7 @@ if __name__ == "__main__":
default="wan2.1", default="wan2.1",
help="要使用的模型类别", help="要使用的模型类别",
) )
parser.add_argument("--task", type=str, required=True, choices=["i2v", "t2v"], help="指定任务类型。'i2v'用于图像到视频转换,'t2v'用于文本到视频生成。")
parser.add_argument("--server_port", type=int, default=7862, help="服务器端口") parser.add_argument("--server_port", type=int, default=7862, help="服务器端口")
parser.add_argument("--server_name", type=str, default="0.0.0.0", help="服务器IP") parser.add_argument("--server_name", type=str, default="0.0.0.0", help="服务器IP")
args = parser.parse_args() args = parser.parse_args()
...@@ -1068,5 +1094,6 @@ if __name__ == "__main__":
global model_path, model_cls global model_path, model_cls
model_path = args.model_path model_path = args.model_path
model_cls = args.model_cls model_cls = args.model_cls
task = args.task
main() main()
#!/bin/bash #!/bin/bash
lightx2v_path=/mtc/gushiqiao/llmc_workspace/lightx2v_new/lightx2v # Lightx2v Gradio Demo Startup Script
model_path=/data/nvme0/gushiqiao/models/I2V/Wan2.1-I2V-14B-720P-Lightx2v-Step-Distill # Supports both Image-to-Video (i2v) and Text-to-Video (t2v) modes
export CUDA_VISIBLE_DEVICES=7 # ==================== Configuration Area ====================
# ⚠️ Important: Please modify the following paths according to your actual environment
# 🚨 Storage Performance Tips 🚨
# 💾 Strongly recommend storing model files on SSD solid-state drives!
# 📈 SSD can significantly improve model loading speed and inference performance
# 🐌 Using mechanical hard drives (HDD) may cause slow model loading and affect overall experience
# Lightx2v project root directory path
# Example: /home/user/lightx2v or /data/video_gen/lightx2v
lightx2v_path=/path/to/lightx2v
# Model path configuration
# Image-to-video model path (for i2v tasks)
# Example: /path/to/Wan2.1-I2V-14B-720P-Lightx2v
i2v_model_path=/path/to/Wan2.1-I2V-14B-720P-Lightx2v
# Text-to-video model path (for t2v tasks)
# Example: /path/to/Wan2.1-T2V-1.3B
t2v_model_path=/path/to/Wan2.1-T2V-1.3B
# Server configuration
server_name="0.0.0.0"
server_port=8032
# GPU configuration
gpu_id=0
# ==================== Environment Variables Setup ====================
export CUDA_VISIBLE_DEVICES=$gpu_id
export CUDA_LAUNCH_BLOCKING=1 export CUDA_LAUNCH_BLOCKING=1
export PYTHONPATH=${lightx2v_path}:$PYTHONPATH export PYTHONPATH=${lightx2v_path}:$PYTHONPATH
export ENABLE_PROFILING_DEBUG=true export ENABLE_PROFILING_DEBUG=true
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
python gradio_demo.py \ # ==================== Parameter Parsing ====================
--model_path $model_path \ # Default task type
--server_name 0.0.0.0 \ task="i2v"
--server_port 8005 # Default interface language
lang="zh"
# Parse command line arguments
while [[ $# -gt 0 ]]; do
case $1 in
--task)
task="$2"
shift 2
;;
--lang)
lang="$2"
shift 2
;;
--port)
server_port="$2"
shift 2
;;
--gpu)
gpu_id="$2"
export CUDA_VISIBLE_DEVICES=$gpu_id
shift 2
;;
--help)
echo "🎬 Lightx2v Gradio Demo Startup Script"
echo "=========================================="
echo "Usage: $0 [options]"
echo ""
echo "📋 Available options:"
echo " --task i2v|t2v Task type (default: i2v)"
echo " i2v: Image-to-video generation"
echo " t2v: Text-to-video generation"
echo " --lang zh|en Interface language (default: zh)"
echo " zh: Chinese interface"
echo " en: English interface"
echo " --port PORT Server port (default: 8032)"
echo " --gpu GPU_ID GPU device ID (default: 0)"
echo " --help Show this help message"
echo ""
echo "🚀 Usage examples:"
echo " $0 # Default startup for image-to-video mode"
echo " $0 --task i2v --lang zh --port 8032 # Start with specified parameters"
echo " $0 --task t2v --lang en --port 7860 # Text-to-video with English interface"
echo " $0 --task i2v --gpu 1 --port 8032 # Use GPU 1"
echo ""
echo "📝 Notes:"
echo " - Edit script to configure model paths before first use"
echo " - Ensure required Python dependencies are installed"
echo " - Recommended to use GPU with 8GB+ VRAM"
echo " - 🚨 Strongly recommend storing models on SSD for better performance"
exit 0
;;
*)
echo "Unknown parameter: $1"
echo "Use --help to see help information"
exit 1
;;
esac
done
# ==================== Parameter Validation ====================
if [[ "$task" != "i2v" && "$task" != "t2v" ]]; then
echo "Error: Task type must be 'i2v' or 't2v'"
exit 1
fi
if [[ "$lang" != "zh" && "$lang" != "en" ]]; then
echo "Error: Language must be 'zh' or 'en'"
exit 1
fi
# Select model path based on task type
if [[ "$task" == "i2v" ]]; then
model_path=$i2v_model_path
echo "🎬 Starting Image-to-Video mode"
else
model_path=$t2v_model_path
echo "🎬 Starting Text-to-Video mode"
fi
# Check if model path exists
if [[ ! -d "$model_path" ]]; then
echo "❌ Error: Model path does not exist"
echo "📁 Path: $model_path"
echo "🔧 Solutions:"
echo " 1. Check model path configuration in script"
echo " 2. Ensure model files are properly downloaded"
echo " 3. Verify path permissions are correct"
echo " 4. 💾 Recommend storing models on SSD for faster loading"
exit 1
fi
# Select demo file based on language
if [[ "$lang" == "zh" ]]; then
demo_file="gradio_demo_zh.py"
echo "🌏 Using Chinese interface"
else
demo_file="gradio_demo.py"
echo "🌏 Using English interface"
fi
# Check if demo file exists
if [[ ! -f "$demo_file" ]]; then
echo "❌ Error: Demo file does not exist"
echo "📄 File: $demo_file"
echo "🔧 Solutions:"
echo " 1. Ensure script is run in the correct directory"
echo " 2. Check if file has been renamed or moved"
echo " 3. Re-clone or download project files"
exit 1
fi
# ==================== System Information Display ====================
echo "=========================================="
echo "🚀 Lightx2v Gradio Demo Starting..."
echo "=========================================="
echo "📁 Project path: $lightx2v_path"
echo "🤖 Model path: $model_path"
echo "🎯 Task type: $task"
echo "🌏 Interface language: $lang"
echo "🖥️ GPU device: $gpu_id"
echo "🌐 Server address: $server_name:$server_port"
echo "=========================================="
# Display system resource information
echo "💻 System resource information:"
free -h | grep -E "Mem|Swap"
echo ""
# Display GPU information
if command -v nvidia-smi &> /dev/null; then
echo "🎮 GPU information:"
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader,nounits | head -1
echo ""
fi
# ==================== Start Demo ====================
echo "🎬 Starting Gradio demo..."
echo "📱 Please access in browser: http://$server_name:$server_port"
echo "⏹️ Press Ctrl+C to stop service"
echo "🔄 First startup may take several minutes to load model..."
echo "=========================================="
# Start Python demo
python $demo_file \
--model_path "$model_path" \
--task "$task" \
--server_name "$server_name" \
--server_port "$server_port"
# python gradio_demo_zh.py \ # Display final system resource usage
# --model_path $model_path \ echo ""
# --server_name 0.0.0.0 \ echo "=========================================="
# --server_port 8005 echo "📊 Final system resource usage:"
free -h | grep -E "Mem|Swap"
{
"infer_steps": 40,
"target_video_length": 81,
"target_height": 480,
"target_width": 832,
"self_attn_1_type": "flash_attn3",
"cross_attn_1_type": "flash_attn3",
"cross_attn_2_type": "flash_attn3",
"seed": 442,
"sample_guide_scale": 5,
"sample_shift": 3,
"enable_cfg": true,
"cpu_offload": false,
"feature_caching": "Ada"
}
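A config file like the one above is passed to the run scripts through `--config_json`; a minimal sketch of how such a file might be read and merged with command-line overrides (a hypothetical helper for illustration, not LightX2V's actual loader):

```python
import json

def load_config(config_json_path, cli_overrides=None):
    """Read a config JSON like the one above and apply CLI overrides on top."""
    with open(config_json_path) as f:
        config = json.load(f)
    for key, value in (cli_overrides or {}).items():
        if value is not None:       # only override keys the user actually set
            config[key] = value
    return config
```

Keys present in the JSON but untouched on the command line keep their file values, so the config file acts as the baseline.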
# comfyui部署 # ComfyUI Deployment
xxx This feature will be available soon.
...@@ -18,9 +18,8 @@ git clone https://github.com/ModelTC/lightx2v.git lightx2v && cd lightx2v
conda create -n lightx2v python=3.11 && conda activate lightx2v conda create -n lightx2v python=3.11 && conda activate lightx2v
pip install -r requirements.txt pip install -r requirements.txt
# Install again separately to bypass the version conflict check
# The Hunyuan model needs to run under this version of transformers. If you do not need to run the Hunyuan model, you can ignore this step. # The Hunyuan model needs to run under this version of transformers. If you do not need to run the Hunyuan model, you can ignore this step.
pip install transformers==4.45.2 # pip install transformers==4.45.2
# install flash-attention 2 # install flash-attention 2
git clone https://github.com/Dao-AILab/flash-attention.git --recursive git clone https://github.com/Dao-AILab/flash-attention.git --recursive
...@@ -34,7 +33,7 @@ cd flash-attention/hopper && python setup.py install
```shell ```shell
# Modify the path in the script # Modify the path in the script
bash scripts/run_wan_t2v.sh bash scripts/wan/run_wan_t2v.sh
``` ```
In addition to the existing input arguments in the script, there are also some necessary parameters in the `${lightx2v_path}/configs/wan_t2v.json` file specified by `--config_json`. You can modify them as needed. In addition to the existing input arguments in the script, there are also some necessary parameters in the `wan_t2v.json` file specified by `--config_json`. You can modify them as needed.
...@@ -2,17 +2,32 @@ Welcome to Lightx2v!
================== ==================
.. figure:: ../../../assets/img_lightx2v.png .. figure:: ../../../assets/img_lightx2v.png
:width: 100% :width: 80%
:align: center :align: center
:alt: Lightx2v :alt: Lightx2v
:class: no-scaled-link :class: no-scaled-link
.. raw:: html .. raw:: html
<p style="text-align:center"> <div align="center" style="font-family: charter;">
<strong>A Light Video Generation Inference Framework
</strong>
<a href="https://opensource.org/licenses/Apache-2.0"><img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" alt="License"></a>
<a href="https://deepwiki.com/ModelTC/lightx2v"><img src="https://deepwiki.com/badge.svg" alt="Ask DeepWiki"></a>
<a href="https://lightx2v-en.readthedocs.io/en/latest"><img src="https://img.shields.io/badge/docs-English-99cc2" alt="Doc"></a>
<a href="https://lightx2v-zhcn.readthedocs.io/zh-cn/latest"><img src="https://img.shields.io/badge/文档-中文-99cc2" alt="Doc"></a>
<a href="https://hub.docker.com/r/lightx2v/lightx2v/tags"><img src="https://badgen.net/badge/icon/docker?icon=docker&label" alt="Docker"></a>
</div>
<div align="center" style="font-family: charter;">
<strong>LightX2V: Light Video Generation Inference Framework</strong>
</div>
LightX2V is a lightweight video generation inference framework designed to provide an inference tool that leverages multiple advanced video generation inference techniques. As a unified inference platform, this framework supports various generation tasks such as text-to-video (T2V) and image-to-video (I2V) across different models. X2V means transforming different input modalities (such as text or images) to video output.
GitHub: https://github.com/ModelTC/lightx2v
HuggingFace: https://huggingface.co/lightx2v
Documentation Documentation
------------- -------------
...@@ -22,6 +37,7 @@ Documentation
:caption: Quick Start :caption: Quick Start
Quick Start <getting_started/quickstart.md> Quick Start <getting_started/quickstart.md>
Benchmark <getting_started/benchmark.md>
.. toctree:: .. toctree::
:maxdepth: 1 :maxdepth: 1
...@@ -32,6 +48,8 @@ Documentation
Attention Module <method_tutorials/attention.md> Attention Module <method_tutorials/attention.md>
Offloading <method_tutorials/offload.md> Offloading <method_tutorials/offload.md>
Parallel Inference <method_tutorials/parallel.md> Parallel Inference <method_tutorials/parallel.md>
Step Distill <method_tutorials/step_distill.md>
Autoregressive Distill <method_tutorials/autoregressive_distill.md>
.. toctree:: .. toctree::
:maxdepth: 1 :maxdepth: 1
...@@ -39,14 +57,8 @@ Documentation
Low Latency Deployment <deploy_guides/for_low_latency.md> Low Latency Deployment <deploy_guides/for_low_latency.md>
Low Resource Deployment <deploy_guides/for_low_resource.md> Low Resource Deployment <deploy_guides/for_low_resource.md>
Server Deployment <deploy_guides/deploy_service.md> Lora Deployment <deploy_guides/lora_deploy.md>
Service Deployment <deploy_guides/deploy_service.md>
Gradio Deployment <deploy_guides/deploy_gradio.md> Gradio Deployment <deploy_guides/deploy_gradio.md>
ComfyUI Deployment <deploy_guides/deploy_comfyui.md> ComfyUI Deployment <deploy_guides/deploy_comfyui.md>
Local Windows Deployment <deploy_guides/deploy_local_windows.md> Local Windows Deployment <deploy_guides/deploy_local_windows.md>
.. Indices and tables
.. ==================
.. * :ref:`genindex`
.. * :ref:`modindex`
# 注意力机制 # 🎯 Attention Type Configuration in DiT Model
xxx The DiT model in `LightX2V` currently uses three types of attention mechanisms. Each type of attention can be configured with a specific backend library.
---
## Attention Usage Locations
1. **Self-Attention on the image**
- Configuration key: `self_attn_1_type`
2. **Cross-Attention between image and prompt text**
- Configuration key: `cross_attn_1_type`
3. **Cross-Attention between image and reference image (in I2V mode)**
- Configuration key: `cross_attn_2_type`
---
## 🚀 Supported Attention Backends
| Name | Type Identifier | GitHub Link |
|--------------------|-------------------|-------------|
| Flash Attention 2 | `flash_attn2` | [flash-attention v2](https://github.com/Dao-AILab/flash-attention) |
| Flash Attention 3 | `flash_attn3` | [flash-attention v3](https://github.com/Dao-AILab/flash-attention) |
| Sage Attention 2 | `sage_attn2` | [SageAttention](https://github.com/thu-ml/SageAttention) |
| Radial Attention | `radial_attn` | [Radial Attention](https://github.com/mit-han-lab/radial-attention) |
| Sparge Attention | `sparge` | [Sparge Attention](https://github.com/thu-ml/SpargeAttn) |
---
## 🛠️ Configuration Example
In the `wan_i2v.json` configuration file, you can specify the attention types as follows:
```json
{
"self_attn_1_type": "radial_attn",
"cross_attn_1_type": "flash_attn3",
"cross_attn_2_type": "flash_attn3"
}
```
To use other attention backends, simply replace the values with the appropriate type identifiers listed above.
Tip: Due to the constraints of its sparse attention pattern, `radial_attn` can only be used for self-attention.
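One plausible way such type identifiers could map to backends is a small registry, sketched below; the registry keys mirror the table above, but the function names and signatures are stand-ins, not LightX2V's actual dispatch code:

```python
# Hypothetical backend registry: resolve a configured attention type
# identifier (e.g. "flash_attn2") to a callable. Illustrative only.

ATTN_BACKENDS = {}

def register(name):
    """Decorator that records a backend under its type identifier."""
    def deco(fn):
        ATTN_BACKENDS[name] = fn
        return fn
    return deco

@register("flash_attn2")
def flash_attn2(q, k, v):
    raise NotImplementedError("placeholder for the real kernel")

def get_attn(config, key):
    """Resolve e.g. config['self_attn_1_type'] to a backend callable."""
    name = config[key]
    if name not in ATTN_BACKENDS:
        raise ValueError(f"unknown attention type: {name}")
    return ATTN_BACKENDS[name]
```

A registry like this makes unknown identifiers fail fast at configuration time instead of deep inside a forward pass.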
---
For Sparge Attention, as in the `wan_t2v_sparge.json` configuration file, a post-trained weight path must also be provided:
```json
{
"self_attn_1_type": "flash_attn3",
"cross_attn_1_type": "flash_attn3",
"cross_attn_2_type": "flash_attn3",
"sparge": true,
"sparge_ckpt": "/path/to/sparge_wan2.1_t2v_1.3B.pt"
}
```
---
For further customization or behavior tuning, please refer to the official documentation of the respective attention libraries.
# 特征缓存 # Feature Caching
xxx ## Cache Acceleration Algorithm
- Cache reuse is an important acceleration algorithm in the inference process of diffusion models.
- Its core idea is to skip redundant computations at certain time steps by reusing historical cache results to improve inference efficiency.
- The key to the algorithm is how to decide at which time steps to perform cache reuse, usually based on dynamic judgment of model state changes or error thresholds.
- During inference, key intermediate results such as features, residuals, and attention outputs are cached. At reusable time steps, the cached content is used directly, or the current output is reconstructed through approximations such as Taylor expansion, reducing repeated computation and enabling efficient inference.
### TeaCache
The core idea of `TeaCache` is to accumulate the **relative L1** distance between the inputs of adjacent time steps: while the accumulated distance stays below a set threshold, the current time step can reuse the cache; once the threshold is crossed, a full computation is performed and the accumulator is reset.
- Specifically, at each inference step the algorithm computes the relative L1 distance between the current input and the previous step's input, and accumulates it.
- While the accumulated distance stays below the threshold, the model state is judged to have changed little, so the most recently cached content is reused and the redundant forward computation is skipped; when the threshold is exceeded, the model recomputes and the accumulator resets. This significantly reduces the number of forward passes and improves inference speed.
In practice, TeaCache achieves significant acceleration while preserving generation quality. The video comparison before and after acceleration is as follows:
| Before Acceleration | After Acceleration |
|:------:|:------:|
| Single H200 inference time: 58s | Single H200 inference time: 17.9s |
| ![Effect before acceleration](../../../../assets/gifs/1.gif) | ![Effect after acceleration](../../../../assets/gifs/2.gif) |
- Speedup ratio: **3.24**
- config: [wan_t2v_1_3b_tea_480p.json](https://github.com/ModelTC/lightx2v/tree/main/configs/caching/teacache/wan_t2v_1_3b_tea_480p.json)
- Reference paper: [https://arxiv.org/abs/2411.19108](https://arxiv.org/abs/2411.19108)
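The reuse rule can be sketched in a few lines of Python; this is an illustrative toy over flat feature lists, with all names invented here rather than taken from LightX2V:

```python
# Toy sketch of TeaCache-style reuse: accumulate the relative L1 distance
# between successive inputs and skip the forward pass while the accumulated
# change stays under a threshold. Illustrative only.

def rel_l1(prev, cur):
    """Relative L1 distance between two flat feature lists."""
    num = sum(abs(c - p) for p, c in zip(prev, cur))
    den = sum(abs(p) for p in prev) or 1.0
    return num / den

class TeaCache:
    def __init__(self, thresh):
        self.thresh = thresh
        self.acc = 0.0            # accumulated relative change
        self.prev_input = None
        self.cached_output = None

    def step(self, x, compute):
        """Return compute(x), or the cached output when change is small."""
        if self.prev_input is not None:
            self.acc += rel_l1(self.prev_input, x)
        self.prev_input = list(x)
        if self.cached_output is None or self.acc >= self.thresh:
            self.cached_output = compute(x)   # full forward pass
            self.acc = 0.0                    # reset after a real step
        return self.cached_output
```

With a threshold of 0 every step recomputes; a larger threshold trades a little accuracy for fewer forward passes.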
### TaylorSeer Cache
The core of `TaylorSeer Cache` is to use a Taylor expansion of the cached content as residual compensation at cache-reuse time steps. Concretely, at a reuse step the algorithm does not simply replay the historical cache; it approximately reconstructs the current output via a Taylor expansion of the cached history. This improves output accuracy while still reducing computation: the expansion captures the gradual drift of the model state and compensates for the error introduced by cache reuse, preserving generation quality under acceleration. `TaylorSeer Cache` suits scenarios with high output-precision requirements and further improves inference performance on top of plain cache reuse.
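The reconstruction step can be sketched with a Newton backward-difference form of the Taylor expansion: finite differences of the cached history stand in for derivatives, and the series extrapolates the feature `k` steps past the newest fully computed one. This is a scalar sketch under that assumption, not the shipped implementation (which operates on tensors).

```python
from math import comb

def taylor_extrapolate(history, k, order=2):
    """Predict the feature k steps after the newest cached value.
    `history` holds features from past full computations, oldest first."""
    diffs = [history[-1]]                  # 0th-order term: F(t)
    cur = list(history)
    for _ in range(order):
        # backward differences approximate higher-order derivatives at t
        cur = [b - a for a, b in zip(cur[:-1], cur[1:])]
        if not cur:
            break
        diffs.append(cur[-1])
    # F(t+k) ≈ Σ_i C(k+i-1, i) · ∇^i F(t)
    return sum(comb(k + i - 1, i) * d for i, d in enumerate(diffs))
```

For a quadratic trajectory `F(t) = t²` with cached values `[0, 1, 4]`, a second-order expansion reproduces the next values exactly (`9`, then `16`), showing how the expansion tracks smooth drift that plain cache replay would miss.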
| Before Acceleration | After Acceleration |
|:------:|:------:|
| Single H200 inference time: 57.7s | Single H200 inference time: 41.3s |
| ![Effect before acceleration](../../../../assets/gifs/3.gif) | ![Effect after acceleration](../../../../assets/gifs/4.gif) |
- Speedup ratio: **1.39**
- Config: [wan_t2v_taylorseer](https://github.com/ModelTC/lightx2v/tree/main/configs/caching/taylorseer/wan_t2v_taylorseer.json)
- Reference paper: [https://arxiv.org/abs/2503.06923](https://arxiv.org/abs/2503.06923)
### AdaCache
The core idea of `AdaCache` is to dynamically adjust the cache-reuse interval based on cached content from designated blocks.
- The algorithm measures the feature difference between two adjacent time steps within specific blocks and, based on the magnitude of that difference, adaptively chooses the interval until the next full computation.
- When the model state changes little, the interval grows automatically, reducing cache-update frequency; when it changes a lot, the interval shrinks to protect output quality.
This lets the cache strategy adapt to the dynamics of the actual inference run, yielding both efficient acceleration and good generation quality. AdaCache suits applications that demand both inference speed and generation quality.
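The adaptive schedule can be sketched as a lookup from the measured change rate to a reuse interval. The thresholds, interval values, and distance metric below are illustrative assumptions, not the values LightX2V ships with.

```python
import numpy as np

def adacache_next_interval(prev_feat, cur_feat,
                           thresholds=(0.05, 0.15), steps=(4, 2, 1)):
    """Map the feature change in a monitored block to a cache-reuse interval:
    the smaller the change, the more steps the cache may be reused."""
    # relative L1 change rate of the block's features between adjacent steps
    dist = np.abs(cur_feat - prev_feat).mean() / (np.abs(prev_feat).mean() + 1e-8)
    lo, hi = thresholds
    if dist < lo:       # state barely changed: reuse for a long stretch
        return steps[0]
    if dist < hi:       # moderate change: medium interval
        return steps[1]
    return steps[2]     # large change: recompute soon
```

In the real algorithm the schedule is finer-grained, but the shape is the same: a monotone mapping from observed change rate to skip length.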
| Before Acceleration | After Acceleration |
|:------:|:------:|
| Single H200 inference time: 227s | Single H200 inference time: 83s |
| ![Effect before acceleration](../../../../assets/gifs/5.gif) | ![Effect after acceleration](../../../../assets/gifs/6.gif) |
- Speedup ratio: **2.73**
- Config: [wan_i2v_ada](https://github.com/ModelTC/lightx2v/tree/main/configs/caching/adacache/wan_i2v_ada.json)
- Reference paper: [https://arxiv.org/abs/2411.02397](https://arxiv.org/abs/2411.02397)
### CustomCache
`CustomCache` combines the advantages of `TeaCache` and `TaylorSeer Cache`.
- It adopts `TeaCache`'s timely, well-founded cache decision, using a dynamic threshold to determine when to reuse the cache.
- At the same time, it uses `TaylorSeer`'s Taylor-expansion method to make the most of the cached content.
This both chooses the reuse timing efficiently and maximizes the value extracted from the cache, improving output accuracy and generation quality. In our tests, `CustomCache` produces better video quality than `TeaCache`, `TaylorSeer Cache`, or `AdaCache` alone across multiple generation tasks, making it one of the best-performing cache acceleration algorithms overall.
| Before Acceleration | After Acceleration |
|:------:|:------:|
| Single H200 inference time: 57.9s | Single H200 inference time: 16.6s |
| ![Effect before acceleration](../../../../assets/gifs/7.gif) | ![Effect after acceleration](../../../../assets/gifs/8.gif) |
- Speedup ratio: **3.49**
- config:[wan_t2v_custom_1_3b](https://github.com/ModelTC/lightx2v/tree/main/configs/caching/custom/wan_t2v_custom_1_3b.json)
## How to Run
The config files for feature caching are available [here](https://github.com/ModelTC/lightx2v/tree/main/configs/caching).
Specify `--config_json` with the path to a specific config file to test the different cache algorithms.
Ready-to-use run scripts are available [here](https://github.com/ModelTC/lightx2v/tree/main/scripts/cache).