Merge pull request #106 from ModelTC/dev_docs

Update gradio docs

b62c81e3 · gushiqiao · GitHub · 82661877 · 27920492 · b62c81e3
Commit b62c81e3 authored by gushiqiao on Jul 11, 2025; committed by GitHub on Jul 11, 2025
5 changed files
--- a/app/README.md
+++ b/app/README.md
-# Lightx2v Gradio Demo Interface
+# Gradio Demo

-## 📖 Overview
+Please refer to our Gradio Demo docs:

-Lightx2v is a lightweight video inference and generation engine that provides a web interface based on Gradio, supporting both Image-to-Video and Text-to-Video generation modes.
-
-This project contains two main demo files:
- `gradio_demo.py` - English interface version
- `gradio_demo_zh.py` - Chinese interface version
-
-## 🚀 Quick Start
-
-### System Requirements
-
- Python 3.10+ (recommended)
- CUDA 12.4+ (recommended)
- At least 8GB GPU VRAM
- At least 16GB system memory
- At least 128GB SSD solid-state drive (**💾 Strongly recommend using SSD solid-state drives to store model files! During "lazy loading" startup, significantly improves model loading speed and inference performance**)
-
-### Install Dependencies
-
-```bash
-# Install basic dependencies
-pip install -r ../requirements.txt
-pip install gradio
-```
-
-#### Recommended Optimization Library Configuration
-
- ✅ [Flash attention](https://github.com/Dao-AILab/flash-attention)
- ✅ [Sage attention](https://github.com/thu-ml/SageAttention)
- ✅ [vllm-kernel](https://github.com/vllm-project/vllm)
- ✅ [sglang-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel)
- ✅ [q8-kernel](https://github.com/KONAKONA666/q8_kernels) (only supports ADA architecture GPUs)
-
-### 🤖 Supported Models
-
-#### 🎬 Image-to-Video Models
-
-| Model Name | Resolution | Parameters | Features | Recommended Use |
-|------------|------------|------------|----------|-----------------|
-| ✅ [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 480p | 14B | Standard version | Balance speed and quality |
-| ✅ [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 720p | 14B | HD version | Pursue high-quality output |
-| ✅ [Wan2.1-I2V-14B-480P-Lightx2v-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 480p | 14B | Distilled optimized version | Faster inference speed |
-| ✅ [Wan2.1-I2V-14B-720P-Lightx2v-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 720p | 14B | HD distilled version | High quality + fast inference |
-
-#### 📝 Text-to-Video Models
-
-| Model Name | Parameters | Features | Recommended Use |
-|------------|------------|----------|-----------------|
-| ✅ [Wan2.1-T2V-1.3B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 1.3B | Lightweight | Fast prototyping and testing |
-| ✅ [Wan2.1-T2V-14B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 14B | Standard version | Balance speed and quality |
-| ✅ [Wan2.1-T2V-14B-Lightx2v-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 14B | Distilled optimized version | High quality + fast inference |
-
-**💡 Model Selection Recommendations**:
- **First-time use**: Recommend choosing distilled versions
- **Pursuing quality**: Choose 720p resolution or 14B parameter models
- **Pursuing speed**: Choose 480p resolution or 1.3B parameter models
- **Resource-constrained**: Prioritize distilled versions and lower resolutions
-
-### Startup Methods
-
-#### Method 1: Using Startup Script (Recommended)
-
-```bash
-# 1. Edit the startup script to configure relevant paths
-vim run_gradio.sh
-
-# Configuration items that need to be modified:
-# - lightx2v_path: Lightx2v project root directory path
-# - i2v_model_path: Image-to-video model path
-# - t2v_model_path: Text-to-video model path
-
-# 💾 Important note: Recommend pointing model paths to SSD storage locations
-# Example: /mnt/ssd/models/ or /data/ssd/models/
-
-# 2. Run the startup script
-bash run_gradio.sh
-
-# 3. Or start with parameters (recommended)
-bash run_gradio.sh --task i2v --lang en --port 8032
-# bash run_gradio.sh --task t2v --lang en --port 8032
-```
-
-#### Method 2: Direct Command Line Startup
-
-**Image-to-Video Mode:**
-```bash
-python gradio_demo.py \
-    --model_path /path/to/Wan2.1-I2V-14B-720P-Lightx2v \
-    --task i2v \
-    --server_name 0.0.0.0 \
-    --server_port 7862
-```
-
-**Text-to-Video Mode:**
-```bash
-python gradio_demo.py \
-    --model_path /path/to/Wan2.1-T2V-1.3B \
-    --task t2v \
-    --server_name 0.0.0.0 \
-    --server_port 7862
-```
-
-**Chinese Interface Version:**
-```bash
-python gradio_demo_zh.py \
-    --model_path /path/to/model \
-    --task i2v \
-    --server_name 0.0.0.0 \
-    --server_port 7862
-```
-
-## 📋 Command Line Parameters
-
-| Parameter | Type | Required | Default | Description |
-|-----------|------|----------|---------|-------------|
-| `--model_path` | str | ✅ | - | Model folder path |
-| `--model_cls` | str | ❌ | wan2.1 | Model class (currently only supports wan2.1) |
-| `--task` | str | ✅ | - | Task type: `i2v` (image-to-video) or `t2v` (text-to-video) |
-| `--server_port` | int | ❌ | 7862 | Server port |
-| `--server_name` | str | ❌ | 0.0.0.0 | Server IP address |
-
-## 🎯 Features
-
-### Basic Settings
-
-#### Model Type Selection
- **Wan2.1 14B**: Large parameter count, high generation quality, suitable for high-quality video generation
- **Wan2.1 1.3B**: Lightweight model, fast speed, suitable for rapid prototyping and testing
-
-#### Input Parameters
- **Prompt**: Describe the expected video content
- **Negative Prompt**: Specify elements you don't want to appear
- **Resolution**: Supports multiple preset resolutions (480p/540p/720p)
- **Random Seed**: Controls the randomness of generation results
- **Inference Steps**: Affects the balance between generation quality and speed
-
-#### Video Parameters
- **FPS**: Frames per second
- **Total Frames**: Video length
- **CFG Scale Factor**: Controls prompt influence strength (1-10)
- **Distribution Shift**: Controls generation style deviation degree (0-10)
-
-### Advanced Optimization Options
-
-#### GPU Memory Optimization
- **Chunked Rotary Position Embedding**: Saves GPU memory
- **Rotary Embedding Chunk Size**: Controls chunk granularity
- **Clean CUDA Cache**: Promptly frees GPU memory
-
-#### Asynchronous Offloading
- **CPU Offloading**: Transfers partial computation to CPU
- **Lazy Loading**: Loads model components on-demand, significantly reduces system memory consumption
- **Offload Granularity Control**: Fine-grained control of offloading strategies
-
-#### Low-Precision Quantization
- **Attention Operators**: Flash Attention, Sage Attention, etc.
- **Quantization Operators**: vLLM, SGL, Q8F, etc.
- **Precision Modes**: FP8, INT8, BF16, etc.
-
-#### VAE Optimization
- **Lightweight VAE**: Accelerates decoding process
- **VAE Tiling Inference**: Reduces memory usage
-
-#### Feature Caching
- **Tea Cache**: Caches intermediate features to accelerate generation
- **Cache Threshold**: Controls cache trigger conditions
- **Key Step Caching**: Writes cache only at key steps
-
-## 🔧 Auto-Configuration Feature
-
-After enabling "Auto-configure Inference Options", the system will automatically optimize parameters based on your hardware configuration:
-
-### GPU Memory Rules
- **80GB+**: Default configuration, no optimization needed
- **48GB**: Enable CPU offloading, offload ratio 50%
- **40GB**: Enable CPU offloading, offload ratio 80%
- **32GB**: Enable CPU offloading, offload ratio 100%
- **24GB**: Enable BF16 precision, VAE tiling
- **16GB**: Enable chunked offloading, rotary embedding chunking
- **12GB**: Enable cache cleaning, lightweight VAE
- **8GB**: Enable quantization, lazy loading
-
-### CPU Memory Rules
- **128GB+**: Default configuration
- **64GB**: Enable DIT quantization
- **32GB**: Enable lazy loading
- **16GB**: Enable full model quantization
-
-## ⚠️ Important Notes
-
-### 🚀 Low-Resource Device Optimization Recommendations
-
-**💡 For devices with insufficient VRAM or performance constraints**:
-
- **🎯 Model Selection**: Prioritize using distilled version models (StepDistill-CfgDistill)
- **⚡ Inference Steps**: Recommend setting to 4 steps
- **🔧 CFG Settings**: Recommend disabling CFG option to improve generation speed
- **🔄 Auto-Configuration**: Enable "Auto-configure Inference Options"
-
-### 🔧 Quick Optimization Configuration Examples
-
-```bash
-# Start with distilled model
-bash run_gradio.sh --task i2v
-
-# Interface setting recommendations
- Inference Steps: 25
- CFG Scale Factor: 4
- Resolution: 832x480
- Auto-Configuration: Enabled
- Quantization Scheme: int8
- Tea Cache: Enabled
-```
-
-## 📁 File Structure
-
-```
-lightx2v/app/
-├── gradio_demo.py          # English interface demo
-├── gradio_demo_zh.py       # Chinese interface demo
-├── run_gradio.sh          # Startup script
-├── README.md              # Documentation
-├── saved_videos/          # Generated video save directory
-└── inference_logs.log     # Inference logs
-```
-
-## 🎨 Interface Description
-
-### Basic Settings Tab
- **Input Parameters**: Model type, prompts, resolution, and other basic settings
- **Video Parameters**: FPS, frame count, CFG, and other video generation parameters
- **Output Settings**: Video save path configuration
-
-### Advanced Options Tab
- **GPU Memory Optimization**: Memory management related options
- **Asynchronous Offloading**: CPU offloading and lazy loading
- **Low-Precision Quantization**: Various quantization optimization options
- **VAE Optimization**: Variational Autoencoder optimization
- **Feature Caching**: Cache strategy configuration
-
-## 🔍 Troubleshooting
-
-### Common Issues
-
-**💡 Tip**: Generally, after enabling "Auto-configure Inference Options", the system will automatically optimize parameter settings based on your hardware configuration, and performance issues usually won't occur. If you encounter problems, please refer to the following solutions:
-
-1. **CUDA Memory Insufficient**
-   - Enable CPU offloading
-   - Reduce resolution
-   - Enable quantization options
-
-2. **System Memory Insufficient**
-   - Enable CPU offloading
-   - Enable lazy loading option
-   - Enable quantization options
-
-3. **Slow Generation Speed**
-   - Reduce inference steps
-   - Enable auto-configuration
-   - Use lightweight models
-   - Enable Tea Cache
-   - Use quantization operators
-   - 💾 **Check if models are stored on SSD**
-
-4. **Slow Model Loading**
-   - 💾 **Migrate models to SSD storage**
-   - Enable lazy loading option
-   - Check disk I/O performance
-   - Consider using NVMe SSD
-
-5. **Poor Video Quality**
-   - Increase inference steps
-   - Increase CFG scale factor
-   - Use 14B models
-   - Optimize prompts
-
-### Log Viewing
-
-```bash
-# View inference logs
-tail -f inference_logs.log
-
-# View GPU usage
-nvidia-smi
-
-# View system resources
-htop
-```
-
-
-**Note**: Please comply with relevant laws and regulations when using videos generated by this tool, and do not use them for illegal purposes.
+[English doc](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/deploy_gradio.html)
+[Chinese doc](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/deploy_gradio.html)
--- a/app/README_zh.md
+++ b/app/README_zh.md
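-# Lightx2v Gradio Demo Interface
-
-## 📖 Overview
-
-Lightx2v is a lightweight video inference and generation engine that provides a Gradio-based web interface supporting two generation modes: Image-to-Video and Text-to-Video.
-
-This project contains two main demo files:
- `gradio_demo.py` - English interface version
- `gradio_demo_zh.py` - Chinese interface version
-
-## 🚀 Quick Start
-
-### System Requirements
-
- Python 3.10+ (recommended)
- CUDA 12.4+ (recommended)
- At least 8GB of GPU VRAM
- At least 16GB of system memory
- At least 128GB of SSD storage (**💾 Storing model files on an SSD is strongly recommended! It significantly improves model loading speed and inference performance when starting with "lazy loading"**)
-
-
-### Install Dependencies
-
-```bash
-# Install basic dependencies
-pip install -r ../requirements.txt
-pip install gradio
-```
-#### Recommended Optimization Libraries
-
- ✅ [Flash attention](https://github.com/Dao-AILab/flash-attention)
- ✅ [Sage attention](https://github.com/thu-ml/SageAttention)
- ✅ [vllm-kernel](https://github.com/vllm-project/vllm)
- ✅ [sglang-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel)
- ✅ [q8-kernel](https://github.com/KONAKONA666/q8_kernels) (supports Ada architecture GPUs only)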
-
-### 🤖 Supported Models
-
-#### 🎬 Image-to-Video Models
-
-| Model Name | Resolution | Parameters | Features | Recommended Use |
-|------------|------------|------------|----------|-----------------|
-| ✅ [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 480p | 14B | Standard version | Balance of speed and quality |
-| ✅ [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 720p | 14B | HD version | High-quality output |
-| ✅ [Wan2.1-I2V-14B-480P-Lightx2v-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 480p | 14B | Distilled optimized version | Faster inference |
-| ✅ [Wan2.1-I2V-14B-720P-Lightx2v-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 720p | 14B | HD distilled version | High quality + fast inference |
-
-#### 📝 Text-to-Video Models
-
-| Model Name | Parameters | Features | Recommended Use |
-|------------|------------|----------|-----------------|
-| ✅ [Wan2.1-T2V-1.3B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 1.3B | Lightweight | Fast prototyping and testing |
-| ✅ [Wan2.1-T2V-14B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 14B | Standard version | Balance of speed and quality |
-| ✅ [Wan2.1-T2V-14B-Lightx2v-StepDistill-CfgDistill](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill) | 14B | Distilled optimized version | High quality + fast inference |
-
-
-**💡 Model Selection Recommendations**:
- **First-time use**: Choose a distilled version
- **Prioritizing quality**: Choose 720p resolution or a 14B model
- **Prioritizing speed**: Choose 480p resolution or the 1.3B model
- **Resource-constrained**: Prefer distilled versions and lower resolutions
-
-
-
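-### Startup Methods
-
-#### Method 1: Using the Startup Script (Recommended)
-
-```bash
-# 1. Edit the startup script to configure the relevant paths
-vim run_gradio.sh
-
-# Configuration items to modify:
-# - lightx2v_path: Lightx2v project root directory
-# - i2v_model_path: Image-to-video model path
-# - t2v_model_path: Text-to-video model path
-
-# 💾 Important: point the model paths to SSD storage
-# Example: /mnt/ssd/models/ or /data/ssd/models/
-
-# 2. Run the startup script
-bash run_gradio.sh
-
-# 3. Or start with parameters (recommended)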
-bash run_gradio.sh --task i2v --lang zh --port 8032
-# bash run_gradio.sh --task t2v --lang zh --port 8032
-```
-
-#### Method 2: Direct Command-Line Startup
-
-**Image-to-Video Mode:**
-```bash
-python gradio_demo_zh.py \
-    --model_path /path/to/Wan2.1-I2V-14B-720P-Lightx2v \
-    --task i2v \
-    --server_name 0.0.0.0 \
-    --server_port 7862
-```
-
-**Text-to-Video Mode:**
-```bash
-python gradio_demo_zh.py \
-    --model_path /path/to/Wan2.1-T2V-1.3B \
-    --task t2v \
-    --server_name 0.0.0.0 \
-    --server_port 7862
-```
-
-**English Interface Version:**
-```bash
-python gradio_demo.py \
-    --model_path /path/to/model \
-    --task i2v \
-    --server_name 0.0.0.0 \
-    --server_port 7862
-```
-
-## 📋 Command-Line Parameters
-
-| Parameter | Type | Required | Default | Description |
-|-----------|------|----------|---------|-------------|
-| `--model_path` | str | ✅ | - | Model folder path |
-| `--model_cls` | str | ❌ | wan2.1 | Model class (currently only wan2.1 is supported) |
-| `--task` | str | ✅ | - | Task type: `i2v` (image-to-video) or `t2v` (text-to-video) |
-| `--server_port` | int | ❌ | 7862 | Server port |
-| `--server_name` | str | ❌ | 0.0.0.0 | Server IP address |
-
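-## 🎯 Features
-
-### Basic Settings
-
-#### Model Type Selection
- **Wan2.1 14B**: Large parameter count and high generation quality; suited to high-quality video generation
- **Wan2.1 1.3B**: Lightweight and fast; suited to rapid prototyping and testing
-
-#### Input Parameters
- **Prompt**: Describes the desired video content
- **Negative Prompt**: Specifies elements that should not appear
- **Resolution**: Multiple preset resolutions (480p/540p/720p)
- **Random Seed**: Controls the randomness of the generated results
- **Inference Steps**: Trades off generation quality against speed
-
-#### Video Parameters
- **FPS**: Frames per second
- **Total Frames**: Video length
- **CFG Scale Factor**: Controls how strongly the prompt influences generation (1-10)
- **Distribution Shift**: Controls how far the generation style deviates (0-10)
-
-### Advanced Optimization Options
-
-#### GPU Memory Optimization
- **Chunked Rotary Position Embedding**: Saves GPU memory
- **Rotary Embedding Chunk Size**: Controls chunk granularity
- **Clean CUDA Cache**: Frees GPU memory promptly
-
-#### Asynchronous Offloading
- **CPU Offloading**: Moves part of the computation to the CPU
- **Lazy Loading**: Loads model components on demand, significantly reducing system memory consumption
- **Offload Granularity Control**: Fine-grained control over the offloading strategy
-
-#### Low-Precision Quantization
- **Attention Operators**: Flash Attention, Sage Attention, etc.
- **Quantization Operators**: vLLM, SGL, Q8F, etc.
- **Precision Modes**: FP8, INT8, BF16, etc.
-
-#### VAE Optimization
- **Lightweight VAE**: Speeds up decoding
- **VAE Tiling Inference**: Reduces memory usage
-
-#### Feature Caching
- **Tea Cache**: Caches intermediate features to speed up generation
- **Cache Threshold**: Controls when caching is triggered
- **Key Step Caching**: Writes the cache only at key steps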
-
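-## 🔧 Auto-Configuration
-
-When "Auto-configure Inference Options" is enabled, the system automatically tunes parameters to your hardware:
-
-
-### GPU Memory Rules
- **80GB+**: Default configuration; no optimization needed
- **48GB**: Enable CPU offloading with a 50% offload ratio
- **40GB**: Enable CPU offloading with an 80% offload ratio
- **32GB**: Enable CPU offloading with a 100% offload ratio
- **24GB**: Enable BF16 precision and VAE tiling
- **16GB**: Enable chunked offloading and rotary embedding chunking
- **12GB**: Enable cache cleaning and the lightweight VAE
- **8GB**: Enable quantization and lazy loading
-
-### CPU Memory Rules
- **128GB+**: Default configuration
- **64GB**: Enable DiT quantization
- **32GB**: Enable lazy loading
- **16GB**: Enable full-model quantization
-
-## ⚠️ Important Notes
-
-### 🚀 Optimization Tips for Low-Resource Devices
-
-**💡 For devices with limited VRAM or performance**:
-
- **🎯 Model Selection**: Prefer the distilled models (StepDistill-CfgDistill)
- **⚡ Inference Steps**: Set to 4 steps
- **🔧 CFG Settings**: Disable the CFG option to speed up generation
- **🔄 Auto-Configuration**: Enable "Auto-configure Inference Options"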
-
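-### 🔧 Quick Optimization Configuration Example
-
-```bash
-# Start with a distilled model
-bash run_gradio.sh --task i2v
-
-# Recommended interface settings
- Inference Steps: 25
- CFG Scale Factor: 4
- Resolution: 832x480
- Auto-Configuration: Enabled
- Quantization Scheme: int8
- Tea Cache: Enabled
-```
-
-## 📁 File Structure
-
-```
-lightx2v/app/
-├── gradio_demo.py          # English interface demo
-├── gradio_demo_zh.py       # Chinese interface demo
-├── run_gradio.sh          # Startup script
-├── README.md              # Documentation
-├── saved_videos/          # Generated video save directory
-└── inference_logs.log     # Inference logs
-```
-
-## 🎨 Interface Description
-
-### Basic Settings Tab
- **Input Parameters**: Model type, prompts, resolution, and other basic settings
- **Video Parameters**: FPS, frame count, CFG, and other video generation parameters
- **Output Settings**: Video save path configuration
-
-### Advanced Options Tab
- **GPU Memory Optimization**: Memory management options
- **Asynchronous Offloading**: CPU offloading and lazy loading
- **Low-Precision Quantization**: Various quantization options
- **VAE Optimization**: Variational autoencoder optimization
- **Feature Caching**: Cache strategy configuration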
-
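-## 🔍 Troubleshooting
-
-### Common Issues
-
-**💡 Tip**: In general, once "Auto-configure Inference Options" is enabled, the system tunes parameter settings to your hardware and performance problems are unlikely. If you do run into problems, try the following solutions:
-
-1. **Insufficient CUDA Memory**
-   - Enable CPU offloading
-   - Lower the resolution
-   - Enable quantization options
-
-2. **Insufficient System Memory**
-   - Enable CPU offloading
-   - Enable the lazy loading option
-   - Enable quantization options
-
-3. **Slow Generation**
-   - Reduce inference steps
-   - Enable auto-configuration
-   - Use lightweight models
-   - Enable Tea Cache
-   - Use quantization operators
-   - 💾 **Check whether the models are stored on an SSD**
-
-4. **Slow Model Loading**
-   - 💾 **Move the models to SSD storage**
-   - Enable the lazy loading option
-   - Check disk I/O performance
-   - Consider an NVMe SSD
-
-5. **Poor Video Quality**
-   - Increase inference steps
-   - Raise the CFG scale factor
-   - Use a 14B model
-   - Refine the prompt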
-
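-### Viewing Logs
-
-```bash
-# View inference logs
-tail -f inference_logs.log
-
-# View GPU usage
-nvidia-smi
-
-# View system resources
-htop
-```
-
-
-Issues and pull requests to improve this project are welcome!
-
-
-**Note**: Please comply with applicable laws and regulations when using videos generated by this tool, and do not use them for illegal purposes.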
--- a/docs/EN/source/deploy_guides/deploy_gradio.md
+++ b/docs/EN/source/deploy_guides/deploy_gradio.md
-# gradio deployment
+# Lightx2v Gradio Demo Interface

-xxx
+## 📖 Overview
+
+Lightx2v is a lightweight video inference and generation engine that provides a web interface based on Gradio, supporting both Image-to-Video and Text-to-Video generation modes.
+
+This project contains two main demo files:
+- `gradio_demo.py` - English interface version
+- `gradio_demo_zh.py` - Chinese interface version
+
+## 🚀 Quick Start
+
+### System Requirements
+
+- Python 3.10+ (recommended)
+- CUDA 12.4+ (recommended)
+- At least 8GB GPU VRAM
+- At least 16GB system memory
+- At least 128GB of SSD storage (**💾 Storing model files on an SSD is strongly recommended! It significantly improves model loading speed and inference performance when starting with "lazy loading"**)
+
+### Install Dependencies
+
+```bash
+# Install basic dependencies
+pip install -r requirements.txt
+pip install gradio
+```
+
+#### Recommended Optimization Library Configuration
+
+- ✅ [Flash attention](https://github.com/Dao-AILab/flash-attention)
+- ✅ [Sage attention](https://github.com/thu-ml/SageAttention)
+- ✅ [vllm-kernel](https://github.com/vllm-project/vllm)
+- ✅ [sglang-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel)
+- ✅ [q8-kernel](https://github.com/KONAKONA666/q8_kernels) (supports Ada architecture GPUs only)
+
+### 🤖 Supported Models
+
+#### 🎬 Image-to-Video Models
+
+| Model Name | Resolution | Parameters | Features | Recommended Use |
+|------------|------------|------------|----------|-----------------|
+| ✅ [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-Lightx2v) | 480p | 14B | Standard version | Balance speed and quality |
+| ✅ [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v) | 720p | 14B | HD version | High-quality output |
+| ✅ [Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v) | 480p | 14B | Distilled optimized version | Faster inference speed |
+| ✅ [Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v) | 720p | 14B | HD distilled version | High quality + fast inference |
+
+#### 📝 Text-to-Video Models
+
+| Model Name | Parameters | Features | Recommended Use |
+|------------|------------|----------|-----------------|
+| ✅ [Wan2.1-T2V-1.3B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-1.3B-Lightx2v) | 1.3B | Lightweight | Fast prototyping and testing |
+| ✅ [Wan2.1-T2V-14B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-Lightx2v) | 14B | Standard version | Balance speed and quality |
+| ✅ [Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v) | 14B | Distilled optimized version | High quality + fast inference |
+
+**💡 Model Selection Recommendations**:
+- **First-time use**: Choose a distilled version
+- **Prioritizing quality**: Choose 720p resolution or a 14B model
+- **Prioritizing speed**: Choose 480p resolution or the 1.3B model
+- **Resource-constrained**: Prefer distilled versions and lower resolutions
+
+### Startup Methods
+
+#### Method 1: Using Startup Script (Recommended)
+
+```bash
+# 1. Edit the startup script to configure relevant paths
+cd app/
+vim run_gradio.sh
+
+# Configuration items that need to be modified:
+# - lightx2v_path: Lightx2v project root directory path
+# - i2v_model_path: Image-to-video model path
+# - t2v_model_path: Text-to-video model path
+
+# 💾 Important: point the model paths to SSD storage
+# Example: /mnt/ssd/models/ or /data/ssd/models/
+
+# 2. Run the startup script
+bash run_gradio.sh
+
+# 3. Or start with parameters (recommended)
+bash run_gradio.sh --task i2v --lang en --port 8032
+# bash run_gradio.sh --task t2v --lang en --port 8032
+```
+
+#### Method 2: Direct Command Line Startup
+
+**Image-to-Video Mode:**
+```bash
+python gradio_demo.py \
+    --model_path /path/to/Wan2.1-I2V-14B-720P-Lightx2v \
+    --task i2v \
+    --server_name 0.0.0.0 \
+    --server_port 7862
+```
+
+**Text-to-Video Mode:**
+```bash
+python gradio_demo.py \
+    --model_path /path/to/Wan2.1-T2V-1.3B \
+    --task t2v \
+    --server_name 0.0.0.0 \
+    --server_port 7862
+```
+
+**Chinese Interface Version:**
+```bash
+python gradio_demo_zh.py \
+    --model_path /path/to/model \
+    --task i2v \
+    --server_name 0.0.0.0 \
+    --server_port 7862
+```
+
+## 📋 Command Line Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `--model_path` | str | ✅ | - | Model folder path |
+| `--model_cls` | str | ❌ | wan2.1 | Model class (currently only wan2.1 is supported) |
+| `--task` | str | ✅ | - | Task type: `i2v` (image-to-video) or `t2v` (text-to-video) |
+| `--server_port` | int | ❌ | 7862 | Server port |
+| `--server_name` | str | ❌ | 0.0.0.0 | Server IP address |
+
+## 🎯 Features
+
+### Basic Settings
+
+#### Model Type Selection
+- **Wan2.1 14B**: Large parameter count, high generation quality, suitable for high-quality video generation
+- **Wan2.1 1.3B**: Lightweight model, fast speed, suitable for rapid prototyping and testing
+
+#### Input Parameters
+- **Prompt**: Describe the expected video content
+- **Negative Prompt**: Specify elements you don't want to appear
+- **Resolution**: Supports multiple preset resolutions (480p/540p/720p)
+- **Random Seed**: Controls the randomness of generation results
+- **Inference Steps**: Affects the balance between generation quality and speed
+
+#### Video Parameters
+- **FPS**: Frames per second
+- **Total Frames**: Video length
+- **CFG Scale Factor**: Controls how strongly the prompt influences generation (1-10)
+- **Distribution Shift**: Controls how far the generation style deviates (0-10)
+
+### Advanced Optimization Options
+
+#### GPU Memory Optimization
+- **Chunked Rotary Position Embedding**: Saves GPU memory
+- **Rotary Embedding Chunk Size**: Controls chunk granularity
+- **Clean CUDA Cache**: Promptly frees GPU memory
+
+#### Asynchronous Offloading
+- **CPU Offloading**: Moves part of the computation to the CPU
+- **Lazy Loading**: Loads model components on demand, significantly reducing system memory consumption
+- **Offload Granularity Control**: Fine-grained control of offloading strategies
+
+#### Low-Precision Quantization
+- **Attention Operators**: Flash Attention, Sage Attention, etc.
+- **Quantization Operators**: vLLM, SGL, Q8F, etc.
+- **Precision Modes**: FP8, INT8, BF16, etc.
+
+#### VAE Optimization
+- **Lightweight VAE**: Speeds up decoding
+- **VAE Tiling Inference**: Reduces memory usage
+
+#### Feature Caching
+- **Tea Cache**: Caches intermediate features to accelerate generation
+- **Cache Threshold**: Controls cache trigger conditions
+- **Key Step Caching**: Writes cache only at key steps
+
+## 🔧 Auto-Configuration Feature
+
+After enabling "Auto-configure Inference Options", the system will automatically optimize parameters based on your hardware configuration:
+
+### GPU Memory Rules
+- **80GB+**: Default configuration, no optimization needed
+- **48GB**: Enable CPU offloading, offload ratio 50%
+- **40GB**: Enable CPU offloading, offload ratio 80%
+- **32GB**: Enable CPU offloading, offload ratio 100%
+- **24GB**: Enable BF16 precision, VAE tiling
+- **16GB**: Enable chunked offloading, rotary embedding chunking
+- **12GB**: Enable cache cleaning, lightweight VAE
+- **8GB**: Enable quantization, lazy loading
+
+### CPU Memory Rules
+- **128GB+**: Default configuration
+- **64GB**: Enable DiT quantization
+- **32GB**: Enable lazy loading
+- **16GB**: Enable full model quantization
+
+## ⚠️ Important Notes
+
+### 🚀 Low-Resource Device Optimization Recommendations
+
+**💡 For devices with insufficient VRAM or performance constraints**:
+
+- **🎯 Model Selection**: Prefer the distilled models (StepDistill-CfgDistill)
+- **⚡ Inference Steps**: Set to 4 steps
+- **🔧 CFG Settings**: Disable the CFG option to speed up generation
+- **🔄 Auto-Configuration**: Enable "Auto-configure Inference Options"
+
+
+## 📁 File Structure
+
+```
+lightx2v/app/
+├── gradio_demo.py          # English interface demo
+├── gradio_demo_zh.py       # Chinese interface demo
+├── run_gradio.sh           # Startup script
+├── README.md               # Documentation
+├── saved_videos/           # Generated video save directory
+└── inference_logs.log      # Inference logs
+```
+
+## 🎨 Interface Description
+
+### Basic Settings Tab
+- **Input Parameters**: Model type, prompts, resolution, and other basic settings
+- **Video Parameters**: FPS, frame count, CFG, and other video generation parameters
+- **Output Settings**: Video save path configuration
+
+### Advanced Options Tab
+- **GPU Memory Optimization**: Memory management related options
+- **Asynchronous Offloading**: CPU offloading and lazy loading
+- **Low-Precision Quantization**: Various quantization optimization options
+- **VAE Optimization**: Variational Autoencoder optimization
+- **Feature Caching**: Cache strategy configuration
+
+## 🔍 Troubleshooting
+
+### Common Issues
+
+**💡 Tip**: In general, once "Auto-configure Inference Options" is enabled, the system tunes parameter settings to your hardware and performance problems are unlikely. If you do run into problems, try the following solutions:
+
+1. **Insufficient CUDA Memory**
+   - Enable CPU offloading
+   - Reduce resolution
+   - Enable quantization options
+
+2. **Insufficient System Memory**
+   - Enable CPU offloading
+   - Enable lazy loading option
+   - Enable quantization options
+
+3. **Slow Generation Speed**
+   - Reduce inference steps
+   - Enable auto-configuration
+   - Use lightweight models
+   - Enable Tea Cache
+   - Use quantization operators
+   - 💾 **Check if models are stored on SSD**
+
+4. **Slow Model Loading**
+   - 💾 **Migrate models to SSD storage**
+   - Enable lazy loading option
+   - Check disk I/O performance
+   - Consider using NVMe SSD
+
+5. **Poor Video Quality**
+   - Increase inference steps
+   - Increase CFG scale factor
+   - Use 14B models
+   - Optimize prompts
+
+### Log Viewing
+
+```bash
+# View inference logs
+tail -f inference_logs.log
+
+# View GPU usage
+nvidia-smi
+
+# View system resources
+htop
+```
+
+
+**Note**: Please comply with relevant laws and regulations when using videos generated by this tool, and do not use them for illegal purposes.
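The GPU and CPU memory rules in the guide's Auto-Configuration section can be read as a tiered lookup. The sketch below is a hypothetical illustration, not the code behind the Gradio demo: the option names (`cpu_offload`, `offload_ratio`, `lazy_load`, and so on) are invented for this example, and it assumes that the rules are cumulative (a smaller card keeps every setting from the tiers above it) and that a memory size between two listed tiers falls into the lower one. The guide states neither.

```python
# Hypothetical sketch of the guide's auto-configuration tiers; option names are
# illustrative and do not come from the Lightx2v code base.
DEFAULTS = {
    "cpu_offload": False,
    "offload_ratio": 0.0,
    "bf16": False,
    "vae_tiling": False,
    "chunked_offload": False,
    "rotary_chunk": False,
    "clean_cuda_cache": False,
    "lightweight_vae": False,
    "quantization": False,
    "lazy_load": False,
    "dit_quantization": False,
    "full_model_quantization": False,
}

# (upper bound in GB, options applied when memory is below that bound).
# Rules run in order, so smaller tiers override values set by larger ones.
GPU_RULES = [
    (80, {"cpu_offload": True, "offload_ratio": 0.5}),          # 48GB tier
    (48, {"offload_ratio": 0.8}),                               # 40GB tier
    (40, {"offload_ratio": 1.0}),                               # 32GB tier
    (32, {"bf16": True, "vae_tiling": True}),                   # 24GB tier
    (24, {"chunked_offload": True, "rotary_chunk": True}),      # 16GB tier
    (16, {"clean_cuda_cache": True, "lightweight_vae": True}),  # 12GB tier
    (12, {"quantization": True, "lazy_load": True}),            # 8GB tier
]

CPU_RULES = [
    (128, {"dit_quantization": True}),        # 64GB tier
    (64, {"lazy_load": True}),                # 32GB tier
    (32, {"full_model_quantization": True}),  # 16GB tier
]


def auto_configure(gpu_mem_gb: float, cpu_mem_gb: float) -> dict:
    """Return the options the tiered rules would enable for the given memory sizes."""
    opts = dict(DEFAULTS)  # copy so the module-level defaults are never mutated
    for rules, mem_gb in ((GPU_RULES, gpu_mem_gb), (CPU_RULES, cpu_mem_gb)):
        for bound, updates in rules:
            if mem_gb < bound:
                opts.update(updates)
    return opts
```

For example, under these assumptions `auto_configure(24, 64)` turns on CPU offloading at ratio 1.0, BF16, VAE tiling and DiT quantization, and leaves every other option off. Encoding the tiers as data rather than nested `if` branches keeps the table in the guide and the lookup easy to compare line by line.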
--- a/docs/ZH_CN/source/deploy_guides/deploy_gradio.md
+++ b/docs/ZH_CN/source/deploy_guides/deploy_gradio.md
-# Gradio Deployment
+# Lightx2v Gradio Demo Interface

-xxx
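+## 📖 Overview
+
+Lightx2v is a lightweight video inference and generation engine that provides a Gradio-based web interface supporting two generation modes: Image-to-Video and Text-to-Video.
+
+This project contains two main demo files:
+- `gradio_demo.py` - English interface version
+- `gradio_demo_zh.py` - Chinese interface version
+
+## 🚀 Quick Start
+
+### System Requirements
+
+- Python 3.10+ (recommended)
+- CUDA 12.4+ (recommended)
+- At least 8GB of GPU VRAM
+- At least 16GB of system memory
+- At least 128GB of SSD storage (**💾 Storing model files on an SSD is strongly recommended! It significantly improves model loading speed and inference performance when starting with "lazy loading"**)
+
+
+### Install Dependencies
+
+```bash
+# Install basic dependencies
+pip install -r requirements.txt
+pip install gradio
+```
+#### Recommended Optimization Libraries
+
+- ✅ [Flash attention](https://github.com/Dao-AILab/flash-attention)
+- ✅ [Sage attention](https://github.com/thu-ml/SageAttention)
+- ✅ [vllm-kernel](https://github.com/vllm-project/vllm)
+- ✅ [sglang-kernel](https://github.com/sgl-project/sglang/tree/main/sgl-kernel)
+- ✅ [q8-kernel](https://github.com/KONAKONA666/q8_kernels) (supports Ada architecture GPUs only)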
+
+### 🤖 Supported Models
+
+#### 🎬 Image-to-Video Models
+
+| Model Name | Resolution | Parameters | Features | Recommended Use |
+|------------|------------|------------|----------|-----------------|
+| ✅ [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-Lightx2v) | 480p | 14B | Standard version | Balance of speed and quality |
+| ✅ [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v) | 720p | 14B | HD version | High-quality output |
+| ✅ [Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v) | 480p | 14B | Distilled optimized version | Faster inference |
+| ✅ [Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v) | 720p | 14B | HD distilled version | High quality + fast inference |
+
+#### 📝 Text-to-Video Models
+
+| Model Name | Parameters | Features | Recommended Use |
+|------------|------------|----------|-----------------|
+| ✅ [Wan2.1-T2V-1.3B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-1.3B-Lightx2v) | 1.3B | Lightweight | Fast prototyping and testing |
+| ✅ [Wan2.1-T2V-14B-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-Lightx2v) | 14B | Standard version | Balance of speed and quality |
+| ✅ [Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v) | 14B | Distilled optimized version | High quality + fast inference |
+
+
+**💡 Model Selection Recommendations**:
+- **First-time use**: Choose a distilled version
+- **Prioritizing quality**: Choose 720p resolution or a 14B model
+- **Prioritizing speed**: Choose 480p resolution or the 1.3B model
+- **Resource-constrained**: Prefer distilled versions and lower resolutions
+
+
+
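+### Startup Methods
+
+#### Method 1: Using the Startup Script (Recommended)
+
+```bash
+# 1. Edit the startup script to configure the relevant paths
+cd app/
+vim run_gradio.sh
+
+# Configuration items to modify:
+# - lightx2v_path: Lightx2v project root directory
+# - i2v_model_path: Image-to-video model path
+# - t2v_model_path: Text-to-video model path
+
+# 💾 Important: point the model paths to SSD storage
+# Example: /mnt/ssd/models/ or /data/ssd/models/
+
+# 2. Run the startup script
+bash run_gradio.sh
+
+# 3. Or start with parameters (recommended)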
+bash run_gradio.sh --task i2v --lang zh --port 8032
+# bash run_gradio.sh --task t2v --lang zh --port 8032
+```
+
+#### Method 2: Direct Command-Line Startup
+
+**Image-to-Video Mode:**
+```bash
+python gradio_demo_zh.py \
+    --model_path /path/to/Wan2.1-I2V-14B-720P-Lightx2v \
+    --task i2v \
+    --server_name 0.0.0.0 \
+    --server_port 7862
+```
+
+**Text-to-Video Mode:**
+```bash
+python gradio_demo_zh.py \
+    --model_path /path/to/Wan2.1-T2V-1.3B \
+    --task t2v \
+    --server_name 0.0.0.0 \
+    --server_port 7862
+```
+
+**English Interface Version:**
+```bash
+python gradio_demo.py \
+    --model_path /path/to/model \
+    --task i2v \
+    --server_name 0.0.0.0 \
+    --server_port 7862
+```
+
+## 📋 Command-Line Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `--model_path` | str | ✅ | - | Model folder path |
+| `--model_cls` | str | ❌ | wan2.1 | Model class (currently only wan2.1 is supported) |
+| `--task` | str | ✅ | - | Task type: `i2v` (image-to-video) or `t2v` (text-to-video) |
+| `--server_port` | int | ❌ | 7862 | Server port |
+| `--server_name` | str | ❌ | 0.0.0.0 | Server IP address |
+
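+## 🎯 Features
+
+### Basic Settings
+
+#### Model Type Selection
+- **Wan2.1 14B**: Large parameter count and high generation quality; suited to high-quality video generation
+- **Wan2.1 1.3B**: Lightweight and fast; suited to rapid prototyping and testing
+
+#### Input Parameters
+- **Prompt**: Describes the desired video content
+- **Negative Prompt**: Specifies elements that should not appear
+- **Resolution**: Multiple preset resolutions (480p/540p/720p)
+- **Random Seed**: Controls the randomness of the generated results
+- **Inference Steps**: Trades off generation quality against speed
+
+#### Video Parameters
+- **FPS**: Frames per second
+- **Total Frames**: Video length
+- **CFG Scale Factor**: Controls how strongly the prompt influences generation (1-10)
+- **Distribution Shift**: Controls how far the generation style deviates (0-10)
+
+### Advanced Optimization Options
+
+#### GPU Memory Optimization
+- **Chunked Rotary Position Embedding**: Saves GPU memory
+- **Rotary Embedding Chunk Size**: Controls chunk granularity
+- **Clean CUDA Cache**: Frees GPU memory promptly
+
+#### Asynchronous Offloading
+- **CPU Offloading**: Moves part of the computation to the CPU
+- **Lazy Loading**: Loads model components on demand, significantly reducing system memory consumption
+- **Offload Granularity Control**: Fine-grained control over the offloading strategy
+
+#### Low-Precision Quantization
+- **Attention Operators**: Flash Attention, Sage Attention, etc.
+- **Quantization Operators**: vLLM, SGL, Q8F, etc.
+- **Precision Modes**: FP8, INT8, BF16, etc.
+
+#### VAE Optimization
+- **Lightweight VAE**: Speeds up decoding
+- **VAE Tiling Inference**: Reduces memory usage
+
+#### Feature Caching
+- **Tea Cache**: Caches intermediate features to speed up generation
+- **Cache Threshold**: Controls when caching is triggered
+- **Key Step Caching**: Writes the cache only at key steps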
+
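+## 🔧 Auto-Configuration
+
+When "Auto-configure Inference Options" is enabled, the system automatically tunes parameters to your hardware:
+
+
+### GPU Memory Rules
+- **80GB+**: Default configuration; no optimization needed
+- **48GB**: Enable CPU offloading with a 50% offload ratio
+- **40GB**: Enable CPU offloading with an 80% offload ratio
+- **32GB**: Enable CPU offloading with a 100% offload ratio
+- **24GB**: Enable BF16 precision and VAE tiling
+- **16GB**: Enable chunked offloading and rotary embedding chunking
+- **12GB**: Enable cache cleaning and the lightweight VAE
+- **8GB**: Enable quantization and lazy loading
+
+### CPU Memory Rules
+- **128GB+**: Default configuration
+- **64GB**: Enable DiT quantization
+- **32GB**: Enable lazy loading
+- **16GB**: Enable full-model quantization
+
+## ⚠️ Important Notes
+
+### 🚀 Optimization Tips for Low-Resource Devices
+
+**💡 For devices with limited VRAM or performance**:
+
+- **🎯 Model Selection**: Prefer the distilled models (StepDistill-CfgDistill)
+- **⚡ Inference Steps**: Set to 4 steps
+- **🔧 CFG Settings**: Disable the CFG option to speed up generation
+- **🔄 Auto-Configuration**: Enable "Auto-configure Inference Options"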
+
+
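+## 📁 File Structure
+
+```
+lightx2v/app/
+├── gradio_demo.py          # English interface demo
+├── gradio_demo_zh.py       # Chinese interface demo
+├── run_gradio.sh           # Startup script
+├── README.md               # Documentation
+├── saved_videos/           # Generated video save directory
+└── inference_logs.log      # Inference logs
+```
+
+## 🎨 Interface Description
+
+### Basic Settings Tab
+- **Input Parameters**: Model type, prompts, resolution, and other basic settings
+- **Video Parameters**: FPS, frame count, CFG, and other video generation parameters
+- **Output Settings**: Video save path configuration
+
+### Advanced Options Tab
+- **GPU Memory Optimization**: Memory management options
+- **Asynchronous Offloading**: CPU offloading and lazy loading
+- **Low-Precision Quantization**: Various quantization options
+- **VAE Optimization**: Variational autoencoder optimization
+- **Feature Caching**: Cache strategy configuration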
+
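+## 🔍 Troubleshooting
+
+### Common Issues
+
+**💡 Tip**: In general, once "Auto-configure Inference Options" is enabled, the system tunes parameter settings to your hardware and performance problems are unlikely. If you do run into problems, try the following solutions:
+
+1. **Insufficient CUDA Memory**
+   - Enable CPU offloading
+   - Lower the resolution
+   - Enable quantization options
+
+2. **Insufficient System Memory**
+   - Enable CPU offloading
+   - Enable the lazy loading option
+   - Enable quantization options
+
+3. **Slow Generation**
+   - Reduce inference steps
+   - Enable auto-configuration
+   - Use lightweight models
+   - Enable Tea Cache
+   - Use quantization operators
+   - 💾 **Check whether the models are stored on an SSD**
+
+4. **Slow Model Loading**
+   - 💾 **Move the models to SSD storage**
+   - Enable the lazy loading option
+   - Check disk I/O performance
+   - Consider an NVMe SSD
+
+5. **Poor Video Quality**
+   - Increase inference steps
+   - Raise the CFG scale factor
+   - Use a 14B model
+   - Refine the prompt
+
+### Viewing Logs
+
+```bash
+# View inference logs
+tail -f inference_logs.log
+
+# View GPU usage
+nvidia-smi
+
+# View system resources
+htop
+```
+
+
+Issues and pull requests to improve this project are welcome!
+
+
+**Note**: Please comply with applicable laws and regulations when using videos generated by this tool, and do not use them for illegal purposes.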
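Both deploy guides document the same five command-line flags. As a hedged illustration of how they fit together, here is a minimal `argparse` parser that mirrors the parameter table's types, required flags, and defaults. It is not the actual argument handling in `gradio_demo.py`, and restricting `--model_cls` and `--task` with `choices` is an assumption drawn from the table's descriptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the guide's command-line parameter table."""
    parser = argparse.ArgumentParser(description="Lightx2v Gradio demo (sketch)")
    parser.add_argument("--model_path", type=str, required=True,
                        help="Model folder path")
    # Assumption: the table says only wan2.1 is supported, so restrict it here.
    parser.add_argument("--model_cls", type=str, default="wan2.1", choices=["wan2.1"],
                        help="Model class")
    parser.add_argument("--task", type=str, required=True, choices=["i2v", "t2v"],
                        help="i2v (image-to-video) or t2v (text-to-video)")
    parser.add_argument("--server_port", type=int, default=7862,
                        help="Server port")
    parser.add_argument("--server_name", type=str, default="0.0.0.0",
                        help="Server IP address")
    return parser
```

For example, `build_parser().parse_args(["--model_path", "/path/to/model", "--task", "i2v"])` yields `server_port=7862` and `server_name="0.0.0.0"`. Gradio's `launch()` method accepts `server_name` and `server_port` keyword arguments, so parsed values like these would typically be forwarded there, e.g. `demo.launch(server_name=args.server_name, server_port=args.server_port)`, where `demo` stands for a hypothetical Gradio `Blocks` app.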
--- a/lightx2v/models/networks/wan/model.py
+++ b/lightx2v/models/networks/wan/model.py
@@ -80,7 +80,12 @@ class WanModel:
        safetensors_files = glob.glob(safetensors_pattern)

        if not safetensors_files:
-            raise FileNotFoundError(f"No .safetensors files found in directory: {self.model_path}")
+            original_pattern = os.path.join(self.model_path, "original", "*.safetensors")
+            safetensors_files = glob.glob(original_pattern)
+
+            if not safetensors_files:
+                raise FileNotFoundError(f"No .safetensors files found in directory: {self.model_path}")
+
        weight_dict = {}
        for file_path in safetensors_files:
            file_weights = self._load_safetensor_to_dict(file_path, use_bf16, skip_bf16)
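The `model.py` hunk adds a fallback: when `model_path` contains no top-level `.safetensors` files, the loader also looks in an `original/` subdirectory before raising `FileNotFoundError`. A standalone sketch of that lookup order follows. It assumes that `safetensors_pattern` (defined above the hunk and not shown) is `os.path.join(model_path, "*.safetensors")`, and the function name `find_safetensors` is hypothetical. Sorting the result is an addition made here for a reproducible load order, since `glob.glob` returns matches in filesystem-dependent order.

```python
import glob
import os


def find_safetensors(model_path: str) -> list[str]:
    """Collect *.safetensors files, falling back to model_path/original/."""
    # Assumed equivalent of `safetensors_pattern` in WanModel.
    files = glob.glob(os.path.join(model_path, "*.safetensors"))
    if not files:
        # Fallback added by the commit: look inside the "original" subdirectory.
        files = glob.glob(os.path.join(model_path, "original", "*.safetensors"))
        if not files:
            raise FileNotFoundError(f"No .safetensors files found in directory: {model_path}")
    # glob order depends on the filesystem; sort for a reproducible load order.
    return sorted(files)
```

Top-level files still take precedence: the `original/` directory is only consulted when the model directory itself has no weights, which matches the order of the two `glob.glob` calls in the hunk.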