# LoRA Model Deployment and Related Tools
LoRA (Low-Rank Adaptation) is an efficient model fine-tuning technique that significantly reduces the number of trainable parameters through low-rank matrix decomposition. LightX2V fully supports LoRA technology, including LoRA inference, LoRA extraction, and LoRA merging functions.
## 🎯 LoRA Technical Features
- **Efficient Fine-tuning**: Dramatically reduces training parameters through low-rank adaptation
- **Flexible Deployment**: Supports dynamic loading and removal of LoRA weights
- **Multiple Formats**: Supports various LoRA weight formats and naming conventions
- **Comprehensive Tools**: Provides complete LoRA extraction and merging toolchain
## 📜 LoRA Inference Deployment
### Configuration File Method
Specify the LoRA path in the configuration file:
```json
{
  "lora_configs": [
    {
      "path": "/path/to/your/lora.safetensors",
      "strength": 1.0
    }
  ]
}
```
**Configuration Parameter Description:**
- `path`: Path to a LoRA weight file; `lora_configs` accepts multiple entries, so several LoRAs can be loaded simultaneously
- `strength`: LoRA strength coefficient (alpha), controls the LoRA's influence on the base model
### Command Line Method
Specify the LoRA path directly on the command line (supports loading a single LoRA only):
```bash
python -m lightx2v.infer \
--model_cls wan2.1 \
--task t2v \
--model_path /path/to/model \
--config_json /path/to/config.json \
--lora_path /path/to/your/lora.safetensors \
--lora_strength 0.8 \
--prompt "Your prompt here"
```
### Multiple LoRAs Configuration
To use multiple LoRAs with different strengths, specify them in the config JSON file:
```json
{
  "lora_configs": [
    {
      "path": "/path/to/first_lora.safetensors",
      "strength": 0.8
    },
    {
      "path": "/path/to/second_lora.safetensors",
      "strength": 0.5
    }
  ]
}
```
### Supported LoRA Formats
LightX2V supports multiple LoRA weight naming conventions:
| Format Type | Weight Naming | Description |
|-------------|---------------|-------------|
| **Standard LoRA** | `lora_A.weight`, `lora_B.weight` | Standard LoRA matrix decomposition format |
| **Down/Up Format** | `lora_down.weight`, `lora_up.weight` | Another common naming convention |
| **Diff Format** | `diff` | `weight` difference values |
| **Bias Diff** | `diff_b` | `bias` weight difference values |
| **Modulation Diff** | `diff_m` | `modulation` weight difference values |
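As a rough illustration of how these conventions can be handled uniformly, the sketch below pairs `lora_A`/`lora_B` and `lora_down`/`lora_up` tensors and treats `diff`/`diff_b`/`diff_m` entries as direct additive differences. It is illustrative only; the helper `pair_lora_keys` is not part of LightX2V's API:
```python
# Illustrative only: group the naming conventions above into a single
# (down, up) pair per layer, plus plain additive differences.
from safetensors.torch import load_file

def pair_lora_keys(lora_path):
    weights = load_file(lora_path)
    pairs = {}
    for key, tensor in weights.items():
        if key.endswith("lora_A.weight") or key.endswith("lora_down.weight"):
            base = key.rsplit(".", 2)[0]
            pairs.setdefault(base, {})["down"] = tensor
        elif key.endswith("lora_B.weight") or key.endswith("lora_up.weight"):
            base = key.rsplit(".", 2)[0]
            pairs.setdefault(base, {})["up"] = tensor
        elif key.endswith(".diff") or key.endswith(".diff_b") or key.endswith(".diff_m"):
            # weight/bias/modulation differences are added directly,
            # without a low-rank decomposition
            pairs.setdefault(key, {})["diff"] = tensor
    return pairs
```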
### Inference Script Examples
**Step Distillation LoRA Inference:**
```bash
# T2V LoRA Inference
bash scripts/wan/run_wan_t2v_distill_4step_cfg_lora.sh
# I2V LoRA Inference
bash scripts/wan/run_wan_i2v_distill_4step_cfg_lora.sh
```
**Audio-Driven LoRA Inference:**
```bash
bash scripts/wan/run_wan_i2v_audio.sh
```
### Using LoRA in API Service
Specify the LoRA through the [config file](wan_t2v_distill_4step_cfg_lora.json), and modify the startup command in [scripts/server/start_server.sh](https://github.com/ModelTC/lightx2v/blob/main/scripts/server/start_server.sh):
```bash
python -m lightx2v.api_server \
--model_cls wan2.1_distill \
--task t2v \
--model_path $model_path \
--config_json ${lightx2v_path}/configs/distill/wan_t2v_distill_4step_cfg_lora.json \
--port 8000 \
--nproc_per_node 1
```
## 🔧 LoRA Extraction Tool
Use `tools/extract/lora_extractor.py` to extract LoRA weights from the difference between two models.
### Basic Usage
```bash
python tools/extract/lora_extractor.py \
--source-model /path/to/base/model \
--target-model /path/to/finetuned/model \
--output /path/to/extracted/lora.safetensors \
--rank 32
```
### Parameter Description
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `--source-model` | str | ✅ | - | Base model path |
| `--target-model` | str | ✅ | - | Fine-tuned model path |
| `--output` | str | ✅ | - | Output LoRA file path |
| `--source-type` | str | ❌ | `safetensors` | Base model format (`safetensors`/`pytorch`) |
| `--target-type` | str | ❌ | `safetensors` | Fine-tuned model format (`safetensors`/`pytorch`) |
| `--output-format` | str | ❌ | `safetensors` | Output format (`safetensors`/`pytorch`) |
| `--rank` | int | ❌ | `32` | LoRA rank value |
| `--output-dtype` | str | ❌ | `bf16` | Output data type |
| `--diff-only` | bool | ❌ | `False` | Save weight differences only, without LoRA decomposition |
### Advanced Usage Examples
**Extract High-Rank LoRA:**
```bash
python tools/extract/lora_extractor.py \
--source-model /path/to/base/model \
--target-model /path/to/finetuned/model \
--output /path/to/high_rank_lora.safetensors \
--rank 64 \
--output-dtype fp16
```
**Save Weight Differences Only:**
```bash
python tools/extract/lora_extractor.py \
--source-model /path/to/base/model \
--target-model /path/to/finetuned/model \
--output /path/to/weight_diff.safetensors \
--diff-only
```
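Conceptually, the extractor computes the weight difference between the fine-tuned and base model and factorizes it into a rank-limited pair of matrices. Below is a minimal sketch of that idea using a truncated SVD; it is illustrative only and not the tool's actual implementation (`extract_lora_pair` is a hypothetical name):
```python
import torch

def extract_lora_pair(w_base: torch.Tensor, w_tuned: torch.Tensor, rank: int = 32):
    """Illustrative low-rank factorization of a weight difference.

    Returns (lora_down, lora_up) such that
    w_tuned ≈ w_base + lora_up @ lora_down.
    """
    diff = (w_tuned - w_base).float()
    u, s, vh = torch.linalg.svd(diff, full_matrices=False)
    u, s, vh = u[:, :rank], s[:rank], vh[:rank, :]
    lora_up = u * s.sqrt()               # (out_features, rank)
    lora_down = s.sqrt()[:, None] * vh   # (rank, in_features)
    return lora_down.to(torch.bfloat16), lora_up.to(torch.bfloat16)
```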
## 🔀 LoRA Merging Tool
Use `tools/extract/lora_merger.py` to merge LoRA weights into the base model for subsequent quantization and other operations.
### Basic Usage
```bash
python tools/extract/lora_merger.py \
--source-model /path/to/base/model \
--lora-model /path/to/lora.safetensors \
--output /path/to/merged/model.safetensors \
--alpha 1.0
```
### Parameter Description
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `--source-model` | str | ✅ | - | Base model path |
| `--lora-model` | str | ✅ | - | LoRA weights path |
| `--output` | str | ✅ | - | Output merged model path |
| `--source-type` | str | ❌ | `safetensors` | Base model format |
| `--lora-type` | str | ❌ | `safetensors` | LoRA weights format |
| `--output-format` | str | ❌ | `safetensors` | Output format |
| `--alpha` | float | ❌ | `1.0` | LoRA merge strength |
| `--output-dtype` | str | ❌ | `bf16` | Output data type |
### Advanced Usage Examples
**Partial Strength Merging:**
```bash
python tools/extract/lora_merger.py \
--source-model /path/to/base/model \
--lora-model /path/to/lora.safetensors \
--output /path/to/merged_model.safetensors \
--alpha 0.7 \
--output-dtype fp32
```
**Multi-Format Support:**
```bash
python tools/extract/lora_merger.py \
--source-model /path/to/base/model.pt \
--source-type pytorch \
--lora-model /path/to/lora.safetensors \
--lora-type safetensors \
--output /path/to/merged_model.safetensors \
--output-format safetensors \
--alpha 1.0
```
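Conceptually, merging folds each LoRA pair back into the corresponding base weight as `W' = W + alpha * (B @ A)`. Below is a minimal sketch of that update (illustrative only; `merge_lora_into_weight` is a hypothetical name, not the tool's API):
```python
# Minimal sketch of the fold-in update used when merging a LoRA into a base weight.
import torch

def merge_lora_into_weight(w: torch.Tensor,
                           lora_down: torch.Tensor,   # (rank, in_features)
                           lora_up: torch.Tensor,     # (out_features, rank)
                           alpha: float = 1.0) -> torch.Tensor:
    """W' = W + alpha * (up @ down), computed in fp32 and cast back."""
    delta = lora_up.float() @ lora_down.float()
    return (w.float() + alpha * delta).to(w.dtype)
```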
# Benchmark
For video playback and detailed performance comparisons, see the full benchmark with embedded videos and accompanying documentation on this [🔗 page](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/benchmark_source.md).
# 🚀 Benchmark
> This document showcases the performance test results of LightX2V across different hardware environments, including detailed comparison data for H200 and RTX 4090 platforms.
---
## 🖥️ H200 Environment (~140GB VRAM)
### 📋 Software Environment Configuration
| Component | Version |
|:----------|:--------|
| **Python** | 3.11 |
| **PyTorch** | 2.7.1+cu128 |
| **SageAttention** | 2.2.0 |
| **vLLM** | 0.9.2 |
| **sgl-kernel** | 0.1.8 |
---
### 🎬 480P 5s Video Test
**Test Configuration:**
- **Model**: [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-Lightx2v)
- **Parameters**: `infer_steps=40`, `seed=42`, `enable_cfg=True`
#### 📊 Performance Comparison Table
| Configuration | Inference Time(s) | GPU Memory(GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2.1 Official** | 366 | 71 | 1.0x | <video src="https://github.com/user-attachments/assets/24fb112e-c868-4484-b7f0-d9542979c2c3" width="200px"></video> |
| **FastVideo** | 292 | 26 | **1.25x** | <video src="https://github.com/user-attachments/assets/26c01987-441b-4064-b6f4-f89347fddc15" width="200px"></video> |
| **LightX2V_1** | 250 | 53 | **1.46x** | <video src="https://github.com/user-attachments/assets/7bffe48f-e433-430b-91dc-ac745908ba3a" width="200px"></video> |
| **LightX2V_2** | 216 | 50 | **1.70x** | <video src="https://github.com/user-attachments/assets/0a24ca47-c466-433e-8a53-96f259d19841" width="200px"></video> |
| **LightX2V_3** | 191 | 35 | **1.92x** | <video src="https://github.com/user-attachments/assets/970c73d3-1d60-444e-b64d-9bf8af9b19f1" width="200px"></video> |
| **LightX2V_3-Distill** | 14 | 35 | **🏆 20.85x** | <video src="https://github.com/user-attachments/assets/b4dc403c-919d-4ba1-b29f-ef53640c0334" width="200px"></video> |
| **LightX2V_4** | 107 | 35 | **3.41x** | <video src="https://github.com/user-attachments/assets/49cd2760-4be2-432c-bf4e-01af9a1303dd" width="200px"></video> |
---
### 🎬 720P 5s Video Test
**Test Configuration:**
- **Model**: [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v)
- **Parameters**: `infer_steps=40`, `seed=1234`, `enable_cfg=True`
#### 📊 Performance Comparison Table
| Configuration | Inference Time(s) | GPU Memory(GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2.1 Official** | 974 | 81 | 1.0x | <video src="https://github.com/user-attachments/assets/a28b3956-ec52-4a8e-aa97-c8baf3129771" width="200px"></video> |
| **FastVideo** | 914 | 40 | **1.07x** | <video src="https://github.com/user-attachments/assets/bd09a886-e61c-4214-ae0f-6ff2711cafa8" width="200px"></video> |
| **LightX2V_1** | 807 | 65 | **1.21x** | <video src="https://github.com/user-attachments/assets/a79aae87-9560-4935-8d05-7afc9909e993" width="200px"></video> |
| **LightX2V_2** | 751 | 57 | **1.30x** | <video src="https://github.com/user-attachments/assets/cb389492-9b33-40b6-a132-84e6cb9fa620" width="200px"></video> |
| **LightX2V_3** | 671 | 43 | **1.45x** | <video src="https://github.com/user-attachments/assets/71c3d085-5d8a-44e7-aac3-412c108d9c53" width="200px"></video> |
| **LightX2V_3-Distill** | 44 | 43 | **🏆 22.14x** | <video src="https://github.com/user-attachments/assets/9fad8806-938f-4527-b064-0c0b58f0f8c2" width="200px"></video> |
| **LightX2V_4** | 344 | 46 | **2.83x** | <video src="https://github.com/user-attachments/assets/c744d15d-9832-4746-b72c-85fa3b87ed0d" width="200px"></video> |
---
## 🖥️ RTX 4090 Environment (~24GB VRAM)
### 📋 Software Environment Configuration
| Component | Version |
|:----------|:--------|
| **Python** | 3.9.16 |
| **PyTorch** | 2.5.1+cu124 |
| **SageAttention** | 2.1.0 |
| **vLLM** | 0.6.6 |
| **sgl-kernel** | 0.0.5 |
| **q8-kernels** | 0.0.0 |
---
### 🎬 480P 5s Video Test
**Test Configuration:**
- **Model**: [Wan2.1-I2V-14B-480P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-Lightx2v)
- **Parameters**: `infer_steps=40`, `seed=42`, `enable_cfg=True`
#### 📊 Performance Comparison Table
| Configuration | Inference Time(s) | GPU Memory(GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2GP(profile=3)** | 779 | 20 | **1.0x** | <video src="https://github.com/user-attachments/assets/ba548a48-04f8-4616-a55a-ad7aed07d438" width="200px"></video> |
| **LightX2V_5** | 738 | 16 | **1.05x** | <video src="https://github.com/user-attachments/assets/ce72ab7d-50a7-4467-ac8c-a6ed1b3827a7" width="200px"></video> |
| **LightX2V_5-Distill** | 68 | 16 | **11.45x** | <video src="https://github.com/user-attachments/assets/5df4b8a7-3162-47f8-a359-e22fbb4d1836" width="200px"></video> |
| **LightX2V_6** | 630 | 12 | **1.24x** | <video src="https://github.com/user-attachments/assets/d13cd939-363b-4f8b-80d9-d3a145c46676" width="200px"></video> |
| **LightX2V_6-Distill** | 63 | 12 | **🏆 12.36x** | <video src="https://github.com/user-attachments/assets/f372bce4-3c2f-411d-aa6b-c4daeb467d90" width="200px"></video> |
---
### 🎬 720P 5s Video Test
**Test Configuration:**
- **Model**: [Wan2.1-I2V-14B-720P-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-Lightx2v)
- **Parameters**: `infer_steps=40`, `seed=1234`, `enable_cfg=True`
#### 📊 Performance Comparison Table
| Configuration | Inference Time(s) | GPU Memory(GB) | Speedup | Video Effect |
|:-------------|:-----------------:|:--------------:|:-------:|:------------:|
| **Wan2GP(profile=3)** | -- | OOM | -- | -- |
| **LightX2V_5** | 2473 | 23 | -- | <video src="https://github.com/user-attachments/assets/0e83b146-3297-4c63-831c-8462cc657cad" width="200px"></video> |
| **LightX2V_5-Distill** | 183 | 23 | -- | <video src="https://github.com/user-attachments/assets/976d0af0-244c-4abe-b2cb-01f68ad69d3c" width="200px"></video> |
| **LightX2V_6** | 2169 | 18 | -- | <video src="https://github.com/user-attachments/assets/cf9edf82-53e1-46af-a000-79a88af8ad4a" width="200px"></video> |
| **LightX2V_6-Distill** | 171 | 18 | -- | <video src="https://github.com/user-attachments/assets/e3064b03-6cd6-4c82-9e31-ab28b3165798" width="200px"></video> |
---
## 📖 Configuration Descriptions
### 🖥️ H200 Environment Configuration Descriptions
| Configuration | Technical Features |
|:--------------|:------------------|
| **Wan2.1 Official** | Based on [Wan2.1 official repository](https://github.com/Wan-Video/Wan2.1) original implementation |
| **FastVideo** | Based on [FastVideo official repository](https://github.com/hao-ai-lab/FastVideo), using SageAttention2 backend optimization |
| **LightX2V_1** | Uses SageAttention2 to replace native attention mechanism, adopts DIT BF16+FP32 (partial sensitive layers) mixed precision computation, improving computational efficiency while maintaining precision |
| **LightX2V_2** | Unified BF16 precision computation, further reducing memory usage and computational overhead while maintaining generation quality |
| **LightX2V_3** | Introduces FP8 quantization technology to significantly reduce computational precision requirements, combined with Tiling VAE technology to optimize memory usage |
| **LightX2V_3-Distill** | Based on LightX2V_3 using 4-step distillation model (`infer_steps=4`, `enable_cfg=False`), further reducing inference steps while maintaining generation quality |
| **LightX2V_4** | Based on LightX2V_3 with TeaCache (`teacache_thresh=0.2`) caching reuse technology, achieving acceleration through intelligent redundant computation skipping |
### 🖥️ RTX 4090 Environment Configuration Descriptions
| Configuration | Technical Features |
|:--------------|:------------------|
| **Wan2GP(profile=3)** | Implementation based on [Wan2GP repository](https://github.com/deepbeepmeep/Wan2GP), using MMGP optimization technology. Profile=3 configuration is suitable for RTX 3090/4090 environments with at least 32GB RAM and 24GB VRAM, adapting to limited memory resources by sacrificing VRAM. Uses quantized models: [480P model](https://huggingface.co/DeepBeepMeep/Wan2.1/blob/main/wan2.1_image2video_480p_14B_quanto_mbf16_int8.safetensors) and [720P model](https://huggingface.co/DeepBeepMeep/Wan2.1/blob/main/wan2.1_image2video_720p_14B_quanto_mbf16_int8.safetensors) |
| **LightX2V_5** | Uses SageAttention2 to replace native attention mechanism, adopts DIT FP8+FP32 (partial sensitive layers) mixed precision computation, enables CPU offload technology, executes partial sensitive layers with FP32 precision, asynchronously offloads DIT inference process data to CPU, saves VRAM, with block-level offload granularity |
| **LightX2V_5-Distill** | Based on LightX2V_5 using 4-step distillation model (`infer_steps=4`, `enable_cfg=False`), further reducing inference steps while maintaining generation quality |
| **LightX2V_6** | Based on LightX2V_3 with CPU offload technology enabled, executes partial sensitive layers with FP32 precision, asynchronously offloads DIT inference process data to CPU, saves VRAM, with block-level offload granularity |
| **LightX2V_6-Distill** | Based on LightX2V_6 using 4-step distillation model (`infer_steps=4`, `enable_cfg=False`), further reducing inference steps while maintaining generation quality |
---
## 📁 Configuration Files Reference
Benchmark-related configuration files and execution scripts are available at:
| Type | Link | Description |
|:-----|:-----|:------------|
| **Configuration Files** | [configs/bench](https://github.com/ModelTC/LightX2V/tree/main/configs/bench) | Contains JSON files with various optimization configurations |
| **Execution Scripts** | [scripts/bench](https://github.com/ModelTC/LightX2V/tree/main/scripts/bench) | Contains benchmark execution scripts |
---
> 💡 **Tip**: It is recommended to choose the appropriate optimization solution based on your hardware configuration to achieve the best performance.
# Model Format and Loading Guide
## 📖 Overview
LightX2V is a flexible video generation inference framework that supports multiple model sources and formats, providing users with rich options:
- **Wan Official Models**: Directly compatible with officially released complete models from Wan2.1 and Wan2.2
- **Single-File Models**: Supports single-file format models released by LightX2V (including quantized versions)
- **LoRA Models**: Supports loading distilled LoRAs released by LightX2V
This document provides detailed instructions on how to use various model formats, configuration parameters, and best practices.
---
## 🗂️ Format 1: Wan Official Models
### Model Repositories
- [Wan2.1 Collection](https://huggingface.co/collections/Wan-AI/wan21-68ac4ba85372ae5a8e282a1b)
- [Wan2.2 Collection](https://huggingface.co/collections/Wan-AI/wan22-68ac4ae80a8b477e79636fc8)
### Model Features
- **Official Guarantee**: Complete models officially released by Wan-AI with highest quality
- **Complete Components**: Includes all necessary components (DIT, T5, CLIP, VAE)
- **Original Precision**: Uses BF16/FP32 precision with no quantization loss
- **Strong Compatibility**: Fully compatible with Wan official toolchain
### Wan2.1 Official Models
#### Directory Structure
Using [Wan2.1-I2V-14B-720P](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P) as an example:
```
Wan2.1-I2V-14B-720P/
├── diffusion_pytorch_model-00001-of-00007.safetensors # DIT model shard 1
├── diffusion_pytorch_model-00002-of-00007.safetensors # DIT model shard 2
├── diffusion_pytorch_model-00003-of-00007.safetensors # DIT model shard 3
├── diffusion_pytorch_model-00004-of-00007.safetensors # DIT model shard 4
├── diffusion_pytorch_model-00005-of-00007.safetensors # DIT model shard 5
├── diffusion_pytorch_model-00006-of-00007.safetensors # DIT model shard 6
├── diffusion_pytorch_model-00007-of-00007.safetensors # DIT model shard 7
├── diffusion_pytorch_model.safetensors.index.json # Shard index file
├── models_t5_umt5-xxl-enc-bf16.pth # T5 text encoder
├── models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth # CLIP encoder
├── Wan2.1_VAE.pth # VAE encoder/decoder
├── config.json # Model configuration
├── xlm-roberta-large/ # CLIP tokenizer
├── google/ # T5 tokenizer
├── assets/
└── examples/
```
#### Usage
```bash
# Download model
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P \
--local-dir ./models/Wan2.1-I2V-14B-720P
# Configure launch script
model_path=./models/Wan2.1-I2V-14B-720P
lightx2v_path=/path/to/LightX2V
# Run inference
cd LightX2V/scripts
bash wan/run_wan_i2v.sh
```
### Wan2.2 Official Models
#### Directory Structure
Using [Wan2.2-I2V-A14B](https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B) as an example:
```
Wan2.2-I2V-A14B/
├── high_noise_model/ # High-noise model directory
│ ├── diffusion_pytorch_model-00001-of-00009.safetensors
│ ├── diffusion_pytorch_model-00002-of-00009.safetensors
│ ├── ...
│ ├── diffusion_pytorch_model-00009-of-00009.safetensors
│ └── diffusion_pytorch_model.safetensors.index.json
├── low_noise_model/ # Low-noise model directory
│ ├── diffusion_pytorch_model-00001-of-00009.safetensors
│ ├── diffusion_pytorch_model-00002-of-00009.safetensors
│ ├── ...
│ ├── diffusion_pytorch_model-00009-of-00009.safetensors
│ └── diffusion_pytorch_model.safetensors.index.json
├── models_t5_umt5-xxl-enc-bf16.pth # T5 text encoder
├── Wan2.1_VAE.pth # VAE encoder/decoder
├── configuration.json # Model configuration
├── google/ # T5 tokenizer
├── assets/ # Example assets (optional)
└── examples/ # Example files (optional)
```
#### Usage
```bash
# Download model
huggingface-cli download Wan-AI/Wan2.2-I2V-A14B \
--local-dir ./models/Wan2.2-I2V-A14B
# Configure launch script
model_path=./models/Wan2.2-I2V-A14B
lightx2v_path=/path/to/LightX2V
# Run inference
cd LightX2V/scripts
bash wan22/run_wan22_moe_i2v.sh
```
### Available Model List
#### Wan2.1 Official Model List
| Model Name | Download Link |
|---------|----------|
| Wan2.1-I2V-14B-720P | [Link](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P) |
| Wan2.1-I2V-14B-480P | [Link](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P) |
| Wan2.1-T2V-14B | [Link](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) |
| Wan2.1-T2V-1.3B | [Link](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B) |
| Wan2.1-FLF2V-14B-720P | [Link](https://huggingface.co/Wan-AI/Wan2.1-FLF2V-14B-720P) |
| Wan2.1-VACE-14B | [Link](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B) |
| Wan2.1-VACE-1.3B | [Link](https://huggingface.co/Wan-AI/Wan2.1-VACE-1.3B) |
#### Wan2.2 Official Model List
| Model Name | Download Link |
|---------|----------|
| Wan2.2-I2V-A14B | [Link](https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B) |
| Wan2.2-T2V-A14B | [Link](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B) |
| Wan2.2-TI2V-5B | [Link](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B) |
| Wan2.2-Animate-14B | [Link](https://huggingface.co/Wan-AI/Wan2.2-Animate-14B) |
### Usage Tips
> 💡 **Quantized Model Usage**: To use quantized models, refer to the [Model Conversion Script](https://github.com/ModelTC/LightX2V/blob/main/tools/convert/readme_zh.md) for conversion, or directly use pre-converted quantized models in Format 2 below
>
> 💡 **Memory Optimization**: For devices with RTX 4090 24GB or smaller memory, it's recommended to combine quantization techniques with CPU offload features:
> - Quantization Configuration: Refer to [Quantization Documentation](../method_tutorials/quantization.md)
> - CPU Offload: Refer to [Parameter Offload Documentation](../method_tutorials/offload.md)
> - Wan2.1 Configuration: Refer to [offload config files](https://github.com/ModelTC/LightX2V/tree/main/configs/offload)
> - Wan2.2 Configuration: Refer to [wan22 config files](https://github.com/ModelTC/LightX2V/tree/main/configs/wan22) with `4090` suffix
---
## 🗂️ Format 2: LightX2V Single-File Models (Recommended)
### Model Repositories
- [Wan2.1-LightX2V](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
- [Wan2.2-LightX2V](https://huggingface.co/lightx2v/Wan2.2-Distill-Models)
### Model Features
- **Single-File Management**: Single safetensors file, easy to manage and deploy
- **Multi-Precision Support**: Provides original precision, FP8, INT8, and other precision versions
- **Distillation Acceleration**: Supports 4-step fast inference
- **Tool Compatibility**: Compatible with ComfyUI and other tools
**Examples**:
- `wan2.1_i2v_720p_lightx2v_4step.safetensors` - 720P I2V original precision
- `wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors` - 720P I2V FP8 quantization
- `wan2.1_i2v_480p_int8_lightx2v_4step.safetensors` - 480P I2V INT8 quantization
- ...
### Wan2.1 Single-File Models
#### Scenario A: Download Single Model File
**Step 1: Select and Download Model**
```bash
# Create model directory
mkdir -p ./models/wan2.1_i2v_720p
# Download 720P I2V FP8 quantized model
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
--local-dir ./models/wan2.1_i2v_720p \
--include "wan2.1_i2v_720p_lightx2v_4step.safetensors"
```
**Step 2: Manually Organize Other Components**
Directory structure as follows:
```
wan2.1_i2v_720p/
├── wan2.1_i2v_720p_lightx2v_4step.safetensors # Original precision
└── t5/clip/vae/config.json/xlm-roberta-large/google and other components # Need manual organization
```
**Step 3: Configure Launch Script**
```bash
# Set in launch script (point to directory containing model file)
model_path=./models/wan2.1_i2v_720p
lightx2v_path=/path/to/LightX2V
# Run script
cd LightX2V/scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
```
> 💡 **Tip**: When there's only one model file in the directory, LightX2V will automatically load it.
#### Scenario B: Download Multiple Model Files
When you download multiple models with different precisions to the same directory, you need to explicitly specify which model to use in the configuration file.
**Step 1: Download Multiple Models**
```bash
# Create model directory
mkdir -p ./models/wan2.1_i2v_720p_multi
# Download original precision model
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
--local-dir ./models/wan2.1_i2v_720p_multi \
--include "wan2.1_i2v_720p_lightx2v_4step.safetensors"
# Download FP8 quantized model
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
--local-dir ./models/wan2.1_i2v_720p_multi \
--include "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors"
# Download INT8 quantized model
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
--local-dir ./models/wan2.1_i2v_720p_multi \
--include "wan2.1_i2v_720p_int8_lightx2v_4step.safetensors"
```
**Step 2: Manually Organize Other Components**
Directory structure as follows:
```
wan2.1_i2v_720p_multi/
├── wan2.1_i2v_720p_lightx2v_4step.safetensors # Original precision
├── wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors # FP8 quantization
├── wan2.1_i2v_720p_int8_lightx2v_4step.safetensors # INT8 quantization
└── t5/clip/vae/config.json/xlm-roberta-large/google and other components # Need manual organization
```
**Step 3: Specify Model in Configuration File**
Edit configuration file (e.g., `configs/distill/wan_i2v_distill_4step_cfg.json`):
```json
{
  // Use original precision model
  "dit_original_ckpt": "./models/wan2.1_i2v_720p_multi/wan2.1_i2v_720p_lightx2v_4step.safetensors",
  // Or use FP8 quantized model
  // "dit_quantized_ckpt": "./models/wan2.1_i2v_720p_multi/wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors",
  // "dit_quantized": true,
  // "dit_quant_scheme": "fp8-vllm",
  // Or use INT8 quantized model
  // "dit_quantized_ckpt": "./models/wan2.1_i2v_720p_multi/wan2.1_i2v_720p_int8_lightx2v_4step.safetensors",
  // "dit_quantized": true,
  // "dit_quant_scheme": "int8-vllm",
  // Other configurations...
}
```
**Step 4: Start Inference**
```bash
cd LightX2V/scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
```
> 💡 **Tip**: Other components (T5, CLIP, VAE, tokenizer, etc.) need to be manually organized into the model directory
### Usage Tips
> 💡 **Configuration Parameter Description**:
> - **dit_original_ckpt**: Used to specify the path to original precision models (BF16/FP32/FP16)
> - **dit_quantized_ckpt**: Used to specify the path to quantized models (FP8/INT8), must be used with `dit_quantized` and `dit_quant_scheme` parameters
### Wan2.2 Single-File Models
#### Directory Structure Requirements
When using Wan2.2 single-file models, you need to manually create a specific directory structure:
```
wan2.2_models/
├── high_noise_model/ # High-noise model directory (required)
│ └── wan2.2_i2v_A14b_high_noise_lightx2v_4step.safetensors
├── low_noise_model/ # Low-noise model directory (required)
│ └── wan2.2_i2v_A14b_low_noise_lightx2v_4step.safetensors
└── t5/clip/vae/config.json/... # Other components (manually organized)
```
#### Scenario A: Only One Model File Per Directory
```bash
# Create required subdirectories
mkdir -p ./models/wan2.2_models/high_noise_model
mkdir -p ./models/wan2.2_models/low_noise_model
# Download high-noise model to corresponding directory
huggingface-cli download lightx2v/Wan2.2-Distill-Models \
--local-dir ./models/wan2.2_models/high_noise_model \
--include "wan2.2_i2v_A14b_high_noise_scaled_fp8_e4m3_lightx2v_4step.safetensors"
# Download low-noise model to corresponding directory
huggingface-cli download lightx2v/Wan2.2-Distill-Models \
--local-dir ./models/wan2.2_models/low_noise_model \
--include "wan2.2_i2v_A14b_low_noise_scaled_fp8_e4m3_lightx2v_4step.safetensors"
# Configure launch script (point to parent directory)
model_path=./models/wan2.2_models
lightx2v_path=/path/to/LightX2V
# Run script
cd LightX2V/scripts
bash wan22/run_wan22_moe_i2v_distill.sh
```
> 💡 **Tip**: When there's only one model file in each subdirectory, LightX2V will automatically load it.
#### Scenario B: Multiple Model Files Per Directory
When you place multiple models with different precisions in both `high_noise_model/` and `low_noise_model/` directories, you need to explicitly specify them in the configuration file.
```bash
# Create directories
mkdir -p ./models/wan2.2_models_multi/high_noise_model
mkdir -p ./models/wan2.2_models_multi/low_noise_model
# Download multiple versions of high-noise model
huggingface-cli download lightx2v/Wan2.2-Distill-Models \
--local-dir ./models/wan2.2_models_multi/high_noise_model \
--include "wan2.2_i2v_A14b_high_noise_*.safetensors"
# Download multiple versions of low-noise model
huggingface-cli download lightx2v/Wan2.2-Distill-Models \
--local-dir ./models/wan2.2_models_multi/low_noise_model \
--include "wan2.2_i2v_A14b_low_noise_*.safetensors"
```
**Directory Structure**:
```
wan2.2_models_multi/
├── high_noise_model/
│ ├── wan2.2_i2v_A14b_high_noise_lightx2v_4step.safetensors # Original precision
│ ├── wan2.2_i2v_A14b_high_noise_fp8_e4m3_lightx2v_4step.safetensors # FP8 quantization
│ └── wan2.2_i2v_A14b_high_noise_int8_lightx2v_4step.safetensors # INT8 quantization
├── low_noise_model/
│ ├── wan2.2_i2v_A14b_low_noise_lightx2v_4step.safetensors # Original precision
│ ├── wan2.2_i2v_A14b_low_noise_fp8_e4m3_lightx2v_4step.safetensors # FP8 quantization
│ └── wan2.2_i2v_A14b_low_noise_int8_lightx2v_4step.safetensors # INT8 quantization
└── t5/vae/config.json/xlm-roberta-large/google and other components # Need manual organization
```
**Configuration File Settings**:
```json
{
  // Use original precision model
  "high_noise_original_ckpt": "./models/wan2.2_models_multi/high_noise_model/wan2.2_i2v_A14b_high_noise_lightx2v_4step.safetensors",
  "low_noise_original_ckpt": "./models/wan2.2_models_multi/low_noise_model/wan2.2_i2v_A14b_low_noise_lightx2v_4step.safetensors",
  // Or use FP8 quantized model
  // "high_noise_quantized_ckpt": "./models/wan2.2_models_multi/high_noise_model/wan2.2_i2v_A14b_high_noise_fp8_e4m3_lightx2v_4step.safetensors",
  // "low_noise_quantized_ckpt": "./models/wan2.2_models_multi/low_noise_model/wan2.2_i2v_A14b_low_noise_fp8_e4m3_lightx2v_4step.safetensors",
  // "dit_quantized": true,
  // "dit_quant_scheme": "fp8-vllm"
  // Or use INT8 quantized model
  // "high_noise_quantized_ckpt": "./models/wan2.2_models_multi/high_noise_model/wan2.2_i2v_A14b_high_noise_int8_lightx2v_4step.safetensors",
  // "low_noise_quantized_ckpt": "./models/wan2.2_models_multi/low_noise_model/wan2.2_i2v_A14b_low_noise_int8_lightx2v_4step.safetensors",
  // "dit_quantized": true,
  // "dit_quant_scheme": "int8-vllm"
}
```
### Usage Tips
> 💡 **Configuration Parameter Description**:
> - **high_noise_original_ckpt** / **low_noise_original_ckpt**: Used to specify the path to original precision models (BF16/FP32/FP16)
> - **high_noise_quantized_ckpt** / **low_noise_quantized_ckpt**: Used to specify the path to quantized models (FP8/INT8), must be used with `dit_quantized` and `dit_quant_scheme` parameters
### Available Model List
#### Wan2.1 Single-File Model List
**Image-to-Video Models (I2V)**
| Filename | Precision | Description |
|--------|------|------|
| `wan2.1_i2v_480p_lightx2v_4step.safetensors` | BF16 | 4-step model original precision |
| `wan2.1_i2v_480p_scaled_fp8_e4m3_lightx2v_4step.safetensors` | FP8 | 4-step model FP8 quantization |
| `wan2.1_i2v_480p_int8_lightx2v_4step.safetensors` | INT8 | 4-step model INT8 quantization |
| `wan2.1_i2v_480p_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors` | FP8 | 4-step model ComfyUI format |
| `wan2.1_i2v_720p_lightx2v_4step.safetensors` | BF16 | 4-step model original precision |
| `wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors` | FP8 | 4-step model FP8 quantization |
| `wan2.1_i2v_720p_int8_lightx2v_4step.safetensors` | INT8 | 4-step model INT8 quantization |
| `wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors` | FP8 | 4-step model ComfyUI format |
**Text-to-Video Models (T2V)**
| Filename | Precision | Description |
|--------|------|------|
| `wan2.1_t2v_14b_lightx2v_4step.safetensors` | BF16 | 4-step model original precision |
| `wan2.1_t2v_14b_scaled_fp8_e4m3_lightx2v_4step.safetensors` | FP8 | 4-step model FP8 quantization |
| `wan2.1_t2v_14b_int8_lightx2v_4step.safetensors` | INT8 | 4-step model INT8 quantization |
| `wan2.1_t2v_14b_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors` | FP8 | 4-step model ComfyUI format |
#### Wan2.2 Single-File Model List
**Image-to-Video Models (I2V) - A14B Series**
| Filename | Precision | Description |
|--------|------|------|
| `wan2.2_i2v_A14b_high_noise_lightx2v_4step.safetensors` | BF16 | High-noise model - 4-step original precision |
| `wan2.2_i2v_A14b_high_noise_scaled_fp8_e4m3_lightx2v_4step.safetensors` | FP8 | High-noise model - 4-step FP8 quantization |
| `wan2.2_i2v_A14b_high_noise_int8_lightx2v_4step.safetensors` | INT8 | High-noise model - 4-step INT8 quantization |
| `wan2.2_i2v_A14b_low_noise_lightx2v_4step.safetensors` | BF16 | Low-noise model - 4-step original precision |
| `wan2.2_i2v_A14b_low_noise_scaled_fp8_e4m3_lightx2v_4step.safetensors` | FP8 | Low-noise model - 4-step FP8 quantization |
| `wan2.2_i2v_A14b_low_noise_int8_lightx2v_4step.safetensors` | INT8 | Low-noise model - 4-step INT8 quantization |
> 💡 **Usage Tips**:
> - Wan2.2 models use a dual-noise architecture, requiring both high-noise and low-noise models to be downloaded
> - Refer to the "Wan2.2 Single-File Models" section above for detailed directory organization
---
## 🗂️ Format 3: LightX2V LoRA Models
LoRA (Low-Rank Adaptation) models provide a lightweight model fine-tuning solution that enables customization for specific effects without modifying the base model.
### Model Repositories
- **Wan2.1 LoRA Models**: [lightx2v/Wan2.1-Distill-Loras](https://huggingface.co/lightx2v/Wan2.1-Distill-Loras)
- **Wan2.2 LoRA Models**: [lightx2v/Wan2.2-Distill-Loras](https://huggingface.co/lightx2v/Wan2.2-Distill-Loras)
### Usage Methods
#### Method 1: Offline Merging
Merge LoRA weights offline into the base model to generate a new complete model file.
**Steps**:
Refer to the [Model Conversion Documentation](https://github.com/ModelTC/lightx2v/tree/main/tools/convert/readme_zh.md) for offline merging.
**Advantages**:
- ✅ No need to load LoRA during inference
- ✅ Better performance
**Disadvantages**:
- ❌ Requires additional storage space
- ❌ Switching different LoRAs requires re-merging
#### Method 2: Online Loading
Dynamically load LoRA weights during inference without modifying the base model.
**LoRA Application Principle**:
```python
# LoRA weight application formula:
#   lora_scale = alpha / rank
#   W' = W + lora_scale * (B @ A)
# where: B = up_proj   (out_features, rank)
#        A = down_proj (rank, in_features)
if weights_dict["alpha"] is not None:
    lora_scale = weights_dict["alpha"] / lora_down.shape[0]
elif alpha is not None:
    lora_scale = alpha / lora_down.shape[0]
else:
    lora_scale = 1.0
```
**Configuration Method**:
**Wan2.1 LoRA Configuration**:
```json
{
  "lora_configs": [
    {
      "path": "wan2.1_i2v_lora_rank64_lightx2v_4step.safetensors",
      "strength": 1.0,
      "alpha": null
    }
  ]
}
```
**Wan2.2 LoRA Configuration**:
Since Wan2.2 uses a dual-model architecture (high-noise/low-noise), LoRA needs to be configured separately for both models:
```json
{
  "lora_configs": [
    {
      "name": "low_noise_model",
      "path": "wan2.2_i2v_A14b_low_noise_lora_rank64_lightx2v_4step.safetensors",
      "strength": 1.0,
      "alpha": null
    },
    {
      "name": "high_noise_model",
      "path": "wan2.2_i2v_A14b_high_noise_lora_rank64_lightx2v_4step.safetensors",
      "strength": 1.0,
      "alpha": null
    }
  ]
}
```
**Parameter Description**:
| Parameter | Description | Default |
|------|------|--------|
| `path` | LoRA model file path | Required |
| `strength` | LoRA strength coefficient, range [0.0, 1.0] | 1.0 |
| `alpha` | LoRA scaling factor, uses model's built-in value when `null` | null |
| `name` | (Wan2.2 only) Specifies which model to apply to | Required |
**Advantages**:
- ✅ Flexible switching between different LoRAs
- ✅ Saves storage space
- ✅ Can dynamically adjust LoRA strength
**Disadvantages**:
- ❌ Additional loading time during inference
- ❌ Slightly increases memory usage
---
## 📚 Related Resources
### Official Repositories
- [LightX2V GitHub](https://github.com/ModelTC/LightX2V)
- [LightX2V Single-File Model Repository](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
- [Wan-AI Official Model Repository](https://huggingface.co/Wan-AI)
### Model Download Links
**Wan2.1 Series**
- [Wan2.1 Collection](https://huggingface.co/collections/Wan-AI/wan21-68ac4ba85372ae5a8e282a1b)
**Wan2.2 Series**
- [Wan2.2 Collection](https://huggingface.co/collections/Wan-AI/wan22-68ac4ae80a8b477e79636fc8)
**LightX2V Single-File Models**
- [Wan2.1-Distill-Models](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
- [Wan2.2-Distill-Models](https://huggingface.co/lightx2v/Wan2.2-Distill-Models)
### Documentation Links
- [Quantization Documentation](../method_tutorials/quantization.md)
- [Parameter Offload Documentation](../method_tutorials/offload.md)
- [Configuration File Examples](https://github.com/ModelTC/LightX2V/tree/main/configs)
---
Through this document, you should be able to:
✅ Understand all model formats supported by LightX2V
✅ Select appropriate models and precisions based on your needs
✅ Correctly download and organize model files
✅ Configure launch parameters and successfully run inference
✅ Resolve common model loading issues
If you have other questions, feel free to ask in [GitHub Issues](https://github.com/ModelTC/LightX2V/issues).
# LightX2V Quick Start Guide
Welcome to LightX2V! This guide will help you quickly set up the environment and start using LightX2V for video generation.
## 📋 Table of Contents
- [System Requirements](#system-requirements)
- [Linux Environment Setup](#linux-environment-setup)
- [Docker Environment (Recommended)](#docker-environment-recommended)
- [Conda Environment Setup](#conda-environment-setup)
- [Windows Environment Setup](#windows-environment-setup)
- [Inference Usage](#inference-usage)
## 🚀 System Requirements
- **Operating System**: Linux (Ubuntu 18.04+) or Windows 10/11
- **Python**: 3.10 or higher
- **GPU**: NVIDIA GPU with CUDA support, at least 8GB VRAM
- **Memory**: 16GB or more recommended
- **Storage**: At least 50GB available space
## 🐧 Linux Environment Setup
### 🐳 Docker Environment (Recommended)
We strongly recommend using the Docker environment, which is the simplest and fastest installation method.
#### 1. Pull Image
Visit LightX2V's [Docker Hub](https://hub.docker.com/r/lightx2v/lightx2v/tags), select a tag with the latest date, such as `25111101-cu128`:
```bash
docker pull lightx2v/lightx2v:25111101-cu128
```
We recommend using the `cuda128` environment for faster inference speed. If you need to use the `cuda124` environment, you can use image versions with the `-cu124` suffix:
```bash
docker pull lightx2v/lightx2v:25101501-cu124
```
#### 2. Run Container
```bash
docker run --gpus all -itd --ipc=host --name [container_name] -v [mount_settings] --entrypoint /bin/bash [image_id]
```
#### 3. China Mirror Source (Optional)
For users in mainland China, if the network is unstable when pulling images, you can pull from the Alibaba Cloud mirror instead:
```bash
# cuda128
docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/lightx2v:25111101-cu128
# cuda124
docker pull registry.cn-hangzhou.aliyuncs.com/yongyang/lightx2v:25101501-cu124
```
### 🐍 Conda Environment Setup
If you prefer to set up the environment yourself using Conda, please follow these steps:
#### Step 1: Clone Repository
```bash
# Download project code
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
```
#### Step 2: Create Conda Virtual Environment
```bash
# Create and activate conda environment
conda create -n lightx2v python=3.11 -y
conda activate lightx2v
```
#### Step 3: Install Dependencies
```bash
pip install -v -e .
```
#### Step 4: Install Attention Operators
**Option A: Flash Attention 2**
```bash
git clone https://github.com/Dao-AILab/flash-attention.git --recursive
cd flash-attention && python setup.py install
```
**Option B: Flash Attention 3 (for Hopper architecture GPUs)**
```bash
cd flash-attention/hopper && python setup.py install
```
**Option C: SageAttention 2 (Recommended)**
```bash
git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention && CUDA_ARCHITECTURES="8.0,8.6,8.9,9.0,12.0" EXT_PARALLEL=4 NVCC_APPEND_FLAGS="--threads 8" MAX_JOBS=32 pip install -v -e .
```
#### Step 5: Install Quantization Operators (Optional)
Quantization operators are used to support model quantization, which can significantly reduce memory usage and accelerate inference. Choose the appropriate quantization operator based on your needs:
**Option A: VLLM Kernels (Recommended)**
Suitable for various quantization schemes, supports FP8 and other quantization formats.
```bash
pip install vllm
```
Or install from source for the latest features:
```bash
git clone https://github.com/vllm-project/vllm.git
cd vllm
uv pip install -e .
```
**Option B: SGL Kernels**
Suitable for SGL quantization scheme, requires torch == 2.8.0.
```bash
pip install sgl-kernel --upgrade
```
**Option C: Q8 Kernels**
Suitable for Ada architecture GPUs (such as RTX 4090, L40S, etc.).
```bash
git clone https://github.com/KONAKONA666/q8_kernels.git
cd q8_kernels && git submodule init && git submodule update
python setup.py install
```
> 💡 **Note**:
> - You can skip this step if you don't need quantization functionality
> - Quantized models can be downloaded from [LightX2V HuggingFace](https://huggingface.co/lightx2v)
> - For more quantization information, please refer to the [Quantization Documentation](method_tutorials/quantization.html)
#### Step 6: Verify Installation
```python
import lightx2v
print(f"LightX2V Version: {lightx2v.__version__}")
```
## 🪟 Windows Environment Setup
Windows systems only support Conda environment setup. Please follow these steps:
### 🐍 Conda Environment Setup
#### Step 1: Check CUDA Version
First, confirm your GPU driver and CUDA version:
```cmd
nvidia-smi
```
Note the **CUDA Version** shown in the output; the versions you install in the following steps must match it.
#### Step 2: Create Python Environment
```cmd
# Create new environment (Python 3.12 recommended)
conda create -n lightx2v python=3.12 -y
# Activate environment
conda activate lightx2v
```
> 💡 **Note**: Python 3.10 or higher is recommended for best compatibility.
#### Step 3: Install PyTorch Framework
**Method 1: Download Official Wheel Package (Recommended)**
1. Visit the [PyTorch Official Download Page](https://download.pytorch.org/whl/torch/)
2. Select the corresponding version wheel package, paying attention to matching:
- **Python Version**: Consistent with your environment
- **CUDA Version**: Matches your GPU driver
- **Platform**: Select Windows version
**Example (Python 3.12 + PyTorch 2.6 + CUDA 12.4):**
```cmd
# Download and install PyTorch
pip install torch-2.6.0+cu124-cp312-cp312-win_amd64.whl
# Install supporting packages
pip install torchvision==0.21.0 torchaudio==2.6.0
```
**Method 2: Direct Installation via pip**
```cmd
# CUDA 12.4 version example
pip install torch==2.6.0+cu124 torchvision==0.21.0+cu124 torchaudio==2.6.0+cu124 --index-url https://download.pytorch.org/whl/cu124
```
#### Step 4: Install Windows Version vLLM
Download the corresponding wheel package from [vllm-windows releases](https://github.com/SystemPanic/vllm-windows/releases).
**Version Matching Requirements:**
- Python version matching
- PyTorch version matching
- CUDA version matching
```cmd
# Install vLLM (please adjust according to actual filename)
pip install vllm-0.9.1+cu124-cp312-cp312-win_amd64.whl
```
#### Step 5: Install Attention Mechanism Operators
**Option A: Flash Attention 2**
```cmd
pip install flash-attn==2.7.2.post1
```
**Option B: SageAttention 2 (Strongly Recommended)**
**Download Sources:**
- [Windows Special Version 1](https://github.com/woct0rdho/SageAttention/releases)
- [Windows Special Version 2](https://github.com/sdbds/SageAttention-for-windows/releases)
```cmd
# Install SageAttention (please adjust according to actual filename)
pip install sageattention-2.1.1+cu126torch2.6.0-cp312-cp312-win_amd64.whl
```
> ⚠️ **Note**: SageAttention's CUDA version doesn't need to be strictly aligned, but Python and PyTorch versions must match.
#### Step 6: Clone Repository
```cmd
# Clone project code
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
# Install Windows-specific dependencies
pip install -r requirements_win.txt
pip install -v -e .
```
#### Step 7: Install Quantization Operators (Optional)
Quantization operators are used to support model quantization, which can significantly reduce memory usage and accelerate inference.
**Install VLLM (Recommended):**
Download the corresponding wheel package from [vllm-windows releases](https://github.com/SystemPanic/vllm-windows/releases) and install it.
```cmd
# Install vLLM (please adjust according to actual filename)
pip install vllm-0.9.1+cu124-cp312-cp312-win_amd64.whl
```
> 💡 **Note**:
> - You can skip this step if you don't need quantization functionality
> - Quantized models can be downloaded from [LightX2V HuggingFace](https://huggingface.co/lightx2v)
> - For more quantization information, please refer to the [Quantization Documentation](method_tutorials/quantization.html)
## 🎯 Inference Usage
### 📥 Model Preparation
Before starting inference, you need to download the model files in advance. We recommend:
- **Download Source**: Download models from [LightX2V Official Hugging Face](https://huggingface.co/lightx2v/) or other open-source model repositories
- **Storage Location**: It's recommended to store models on SSD disks for better read performance
- **Available Models**: Including Wan2.1-I2V, Wan2.1-T2V, and other models supporting different resolutions and functionalities
### 📁 Configuration Files and Scripts
The configuration files used for inference are available [here](https://github.com/ModelTC/LightX2V/tree/main/configs), and scripts are available [here](https://github.com/ModelTC/LightX2V/tree/main/scripts).
You need to configure the downloaded model path in the run script. In addition to the input arguments in the script, there are also some necessary parameters in the configuration file specified by `--config_json`. You can modify them as needed.
### 🚀 Start Inference
#### Linux Environment
```bash
# Run after modifying the path in the script
bash scripts/wan/run_wan_t2v.sh
```
#### Windows Environment
```cmd
# Use Windows batch script
scripts\win\run_wan_t2v.bat
```
#### Python Script Launch
```python
from lightx2v import LightX2VPipeline

pipe = LightX2VPipeline(
    model_path="/path/to/Wan2.1-T2V-14B",
    model_cls="wan2.1",
    task="t2v",
)

pipe.create_generator(
    attn_mode="sage_attn2",
    infer_steps=50,
    height=480,   # 720
    width=832,    # 1280
    num_frames=81,
    guidance_scale=5.0,
    sample_shift=5.0,
)

seed = 42
prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
negative_prompt = "镜头晃动,色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
save_result_path = "/path/to/save_results/output.mp4"

pipe.generate(
    seed=seed,
    prompt=prompt,
    negative_prompt=negative_prompt,
    save_result_path=save_result_path,
)
```
> 💡 **More Examples**: For more usage examples including quantization, offloading, caching, and other advanced configurations, please refer to the [examples directory](https://github.com/ModelTC/LightX2V/tree/main/examples).
## 📞 Get Help
If you encounter problems during installation or usage, please:
1. Search for related issues in [GitHub Issues](https://github.com/ModelTC/LightX2V/issues)
2. Submit a new Issue describing your problem
---
🎉 **Congratulations!** You have successfully set up the LightX2V environment and can now start enjoying video generation!
Welcome to Lightx2v!
====================

.. figure:: ../../../assets/img_lightx2v.png
   :width: 80%
   :align: center
   :alt: Lightx2v
   :class: no-scaled-link

.. raw:: html

   <div align="center" style="font-family: charter;">
     <a href="https://opensource.org/licenses/Apache-2.0"><img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" alt="License"></a>
     <a href="https://deepwiki.com/ModelTC/lightx2v"><img src="https://deepwiki.com/badge.svg" alt="Ask DeepWiki"></a>
     <a href="https://lightx2v-en.readthedocs.io/en/latest"><img src="https://img.shields.io/badge/docs-English-99cc2" alt="Doc"></a>
     <a href="https://lightx2v-zhcn.readthedocs.io/zh-cn/latest"><img src="https://img.shields.io/badge/文档-中文-99cc2" alt="Doc"></a>
     <a href="https://hub.docker.com/r/lightx2v/lightx2v/tags"><img src="https://badgen.net/badge/icon/docker?icon=docker&label" alt="Docker"></a>
   </div>

   <div align="center" style="font-family: charter;">
     <strong>LightX2V: Light Video Generation Inference Framework</strong>
   </div>
LightX2V is a lightweight video generation inference framework designed to provide an inference tool that leverages multiple advanced video generation inference techniques. As a unified inference platform, this framework supports various generation tasks such as text-to-video (T2V) and image-to-video (I2V) across different models. X2V means transforming different input modalities (such as text or images) to video output.
GitHub: https://github.com/ModelTC/lightx2v
HuggingFace: https://huggingface.co/lightx2v
Documentation
-------------
.. toctree::
   :maxdepth: 1
   :caption: Quick Start

   Quick Start <getting_started/quickstart.md>
   Model Structure <getting_started/model_structure.md>
   Benchmark <getting_started/benchmark.md>

.. toctree::
   :maxdepth: 1
   :caption: Method Tutorials

   Model Quantization <method_tutorials/quantization.md>
   Feature Caching <method_tutorials/cache.md>
   Attention Module <method_tutorials/attention.md>
   Offload <method_tutorials/offload.md>
   Parallel Inference <method_tutorials/parallel.md>
   Changing Resolution Inference <method_tutorials/changing_resolution.md>
   Step Distill <method_tutorials/step_distill.md>
   Autoregressive Distill <method_tutorials/autoregressive_distill.md>
   Video Frame Interpolation <method_tutorials/video_frame_interpolation.md>

.. toctree::
   :maxdepth: 1
   :caption: Deployment Guides

   Low Latency Deployment <deploy_guides/for_low_latency.md>
   Low Resource Deployment <deploy_guides/for_low_resource.md>
   Lora Deployment <deploy_guides/lora_deploy.md>
   Service Deployment <deploy_guides/deploy_service.md>
   Gradio Deployment <deploy_guides/deploy_gradio.md>
   ComfyUI Deployment <deploy_guides/deploy_comfyui.md>
   Local Windows Deployment <deploy_guides/deploy_local_windows.md>
# Attention Mechanisms
## Attention Mechanisms Supported by LightX2V
| Name | Type Name | GitHub Link |
|--------------------|------------------|-------------|
| Flash Attention 2 | `flash_attn2` | [flash-attention v2](https://github.com/Dao-AILab/flash-attention) |
| Flash Attention 3 | `flash_attn3` | [flash-attention v3](https://github.com/Dao-AILab/flash-attention) |
| Sage Attention 2 | `sage_attn2` | [SageAttention](https://github.com/thu-ml/SageAttention) |
| Radial Attention | `radial_attn` | [Radial Attention](https://github.com/mit-han-lab/radial-attention) |
| Sparge Attention | `sparge_ckpt` | [Sparge Attention](https://github.com/thu-ml/SpargeAttn) |
---
## Configuration Examples
The configuration files for attention mechanisms are located [here](https://github.com/ModelTC/lightx2v/tree/main/configs/attentions)
By pointing `--config_json` to a specific config file, you can test different attention mechanisms.
For example, for radial_attn, the configuration is as follows:
```json
{
  "self_attn_1_type": "radial_attn",
  "cross_attn_1_type": "flash_attn3",
  "cross_attn_2_type": "flash_attn3"
}
```
To switch to other types, simply replace the corresponding values with the type names from the table above.
Tip: `radial_attn` can only be used for self-attention due to the constraints of its sparse algorithm design.
For further customization of attention mechanism behavior, please refer to the official documentation or implementation code of each attention library.
# Autoregressive Distillation
Autoregressive distillation is a technical exploration in LightX2V. By training distilled models, it reduces inference steps from the original 40-50 steps to **8 steps**, achieving inference acceleration while enabling infinite-length video generation through KV Cache technology.
> ⚠️ Warning: Autoregressive distillation currently yields mediocre results and the acceleration gains have not met expectations, but it remains a long-term research direction. At present, LightX2V only supports autoregressive models for T2V.
## 🔍 Technical Principle
Autoregressive distillation is implemented through [CausVid](https://github.com/tianweiy/CausVid) technology. CausVid performs step distillation and CFG distillation on 1.3B autoregressive models. LightX2V extends it with a series of enhancements:
1. **Larger Models**: Supports autoregressive distillation training for 14B models;
2. **More Complete Data Processing Pipeline**: Generates a training dataset of 50,000 prompt-video pairs;
For detailed implementation, refer to [CausVid-Plus](https://github.com/GoatWu/CausVid-Plus).
## 🛠️ Configuration Files
### Configuration File
Configuration options are provided in the [configs/causvid/](https://github.com/ModelTC/lightx2v/tree/main/configs/causvid) directory:
| Configuration File | Model Address |
|-------------------|---------------|
| [wan_t2v_causvid.json](https://github.com/ModelTC/lightx2v/blob/main/configs/causvid/wan_t2v_causvid.json) | https://huggingface.co/lightx2v/Wan2.1-T2V-14B-CausVid |
### Key Configuration Parameters
```json
{
  "enable_cfg": false,        // Disable CFG for speed improvement
  "num_fragments": 3,         // Number of video segments generated at once, 5s each
  "num_frames": 21,           // Frames per video segment, modify with caution!
  "num_frame_per_block": 3,   // Frames per autoregressive block, modify with caution!
  "num_blocks": 7,            // Autoregressive blocks per video segment, modify with caution!
  "frame_seq_length": 1560,   // Encoding length per frame, modify with caution!
  "denoising_step_list": [    // Denoising timestep list
    999, 934, 862, 756, 603, 410, 250, 140, 74
  ]
}
```
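To make the relationship between these parameters concrete, the following sanity check (not LightX2V code) shows how the values above fit together:
```python
# Sanity-check sketch for the configuration above (not LightX2V code).
num_frames = 21            # frames per video segment
num_frame_per_block = 3    # frames generated per autoregressive block
num_blocks = 7             # blocks per segment
frame_seq_length = 1560    # encoding length (tokens) per frame

assert num_frames == num_frame_per_block * num_blocks      # 3 * 7 = 21
tokens_per_block = num_frame_per_block * frame_seq_length  # 4,680
tokens_per_segment = num_frames * frame_seq_length         # 32,760
print(tokens_per_block, tokens_per_segment)
```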
## 📜 Usage
### Model Preparation
Place the downloaded model (`causal_model.pt` or `causal_model.safetensors`) in the `causvid_models/` folder under the Wan model root directory:
- For T2V: `Wan2.1-T2V-14B/causvid_models/`
### Inference Script
```bash
bash scripts/wan/run_wan_t2v_causvid.sh
```
# Feature Cache
For video playback demonstrations and the accompanying documentation, see this [🔗 page](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/method_tutorials/cache_source.md).
# Feature Caching
## Cache Acceleration Algorithm
- In the inference process of diffusion models, cache reuse is an important acceleration algorithm.
- The core idea is to skip redundant computations at certain time steps by reusing historical cache results to improve inference efficiency.
- The key to the algorithm is how to decide which time steps to perform cache reuse, usually based on dynamic judgment of model state changes or error thresholds.
- During inference, key content such as intermediate features, residuals, and attention outputs need to be cached. When entering reusable time steps, the cached content is directly utilized, and the current output is reconstructed through approximation methods like Taylor expansion, thereby reducing repeated calculations and achieving efficient inference.
### TeaCache
The core idea of `TeaCache` is to accumulate the **relative L1** distance between adjacent time step inputs. When the accumulated distance reaches a set threshold, it determines that the current time step should not use cache reuse; conversely, when the accumulated distance does not reach the set threshold, cache reuse is used to accelerate the inference process.
- Specifically, the algorithm calculates the relative L1 distance between the current input and the previous step input at each inference step and accumulates it.
- When the accumulated distance does not exceed the threshold, it indicates that the model state change is not obvious, so the most recently cached content is directly reused, skipping some redundant calculations. This can significantly reduce the number of forward computations of the model and improve inference speed.
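A minimal sketch of this accumulate-and-threshold rule (illustrative only, not LightX2V's implementation; the default threshold of 0.2 is just an example value):
```python
# Illustrative TeaCache-style decision rule: accumulate the relative L1
# distance between adjacent step inputs and reuse the cache until the
# accumulated change crosses a threshold.
import torch

class TeaCacheDecider:
    def __init__(self, thresh: float = 0.2):
        self.thresh = thresh
        self.accum = 0.0
        self.prev_input = None

    def should_reuse(self, cur_input: torch.Tensor) -> bool:
        if self.prev_input is None:
            self.prev_input = cur_input
            return False  # always compute the first step
        # Relative L1 distance between adjacent time-step inputs
        rel_l1 = (cur_input - self.prev_input).abs().mean() / (self.prev_input.abs().mean() + 1e-8)
        self.prev_input = cur_input
        self.accum += rel_l1.item()
        if self.accum >= self.thresh:
            self.accum = 0.0
            return False  # change is large enough: recompute and refresh the cache
        return True       # change is small: reuse the cached output
```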
In practical effects, TeaCache achieves significant acceleration while ensuring generation quality. On a single H200 card, the time consumption and video comparison before and after acceleration are as follows:
<table>
<tr>
<td align="center">
Before acceleration: 58s
</td>
<td align="center">
After acceleration: 17.9s
</td>
</tr>
<tr>
<td align="center">
<video src="https://github.com/user-attachments/assets/1781df9b-04df-4586-b22f-5d15f8e1bff6" width="100%"></video>
</td>
<td align="center">
<video src="https://github.com/user-attachments/assets/e93f91eb-3825-4866-90c2-351176263a2f" width="100%"></video>
</td>
</tr>
</table>
- Acceleration ratio: **3.24**
- Config: [wan_t2v_1_3b_tea_480p.json](https://github.com/ModelTC/lightx2v/tree/main/configs/caching/teacache/wan_t2v_1_3b_tea_480p.json)
- Reference paper: [https://arxiv.org/abs/2411.19108](https://arxiv.org/abs/2411.19108)
### TaylorSeer Cache
The core of `TaylorSeer Cache` lies in using Taylor's formula to recalculate cached content as residual compensation for cache reuse time steps.
- The specific approach is to not only simply reuse historical cache at cache reuse time steps, but also approximately reconstruct the current output through Taylor expansion. This can further improve output accuracy while reducing computational load.
- Taylor expansion can effectively capture minor changes in model state, allowing errors caused by cache reuse to be compensated, thereby ensuring generation quality while accelerating.
`TaylorSeer Cache` is suitable for scenarios with high output accuracy requirements and can further improve model inference performance based on cache reuse.
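The reconstruction step can be sketched as below: outputs cached at the most recent fully computed steps provide finite-difference estimates of the output's derivatives with respect to the timestep, and the output at a skipped step is extrapolated from them. This is an illustrative first/second-order sketch under assumed variable names, not the LightX2V implementation.

```python
import torch

def taylor_extrapolate(cached_outputs, cached_steps, target_step, order=2):
    """Approximate the model output at `target_step` from recently cached outputs
    via a truncated Taylor expansion in the timestep.

    cached_outputs: list of tensors y(t_i) from fully computed steps (most recent last)
    cached_steps:   matching list of timestep values t_i
    """
    y1, y0 = cached_outputs[-1], cached_outputs[-2]
    t1, t0 = cached_steps[-1], cached_steps[-2]
    dt = target_step - t1
    # First-order term: finite-difference estimate of dy/dt.
    dydt = (y1 - y0) / (t1 - t0)
    approx = y1 + dydt * dt
    if order >= 2 and len(cached_outputs) >= 3:
        y2, t2 = cached_outputs[-3], cached_steps[-3]
        # Crude second-order term estimated from three cached points.
        d2ydt2 = ((y1 - y0) / (t1 - t0) - (y0 - y2) / (t0 - t2)) / ((t1 - t2) / 2)
        approx = approx + 0.5 * d2ydt2 * dt ** 2
    return approx
```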
<table>
<tr>
<td align="center">
Before acceleration: 57.7s
</td>
<td align="center">
After acceleration: 41.3s
</td>
</tr>
<tr>
<td align="center">
<video src="https://github.com/user-attachments/assets/2d04005c-853b-4752-884b-29f8ea5717d2" width="100%"></video>
</td>
<td align="center">
<video src="https://github.com/user-attachments/assets/270e3624-c904-468c-813e-0c65daf1594d" width="100%"></video>
</td>
</tr>
</table>
- Acceleration ratio: **1.39**
- Config: [wan_t2v_taylorseer](https://github.com/ModelTC/lightx2v/tree/main/configs/caching/taylorseer/wan_t2v_taylorseer.json)
- Reference paper: [https://arxiv.org/abs/2503.06923](https://arxiv.org/abs/2503.06923)
### AdaCache
The core idea of `AdaCache` is to dynamically adjust the step size of cache reuse based on partial cached content in specified block chunks.
- The algorithm analyzes feature differences between two adjacent time steps within specific blocks and adaptively determines the next cache reuse time step interval based on the difference magnitude.
- When model state changes are small, the step size automatically increases, reducing cache update frequency; when state changes are large, the step size decreases to ensure output quality.
This allows flexible adjustment of caching strategies based on dynamic changes in the actual inference process, achieving more efficient acceleration and better generation results. AdaCache is suitable for application scenarios that have high requirements for both inference speed and generation quality.
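The adaptive interval selection can be sketched as follows; the distance metric, bucket thresholds, and interval lengths are illustrative assumptions rather than AdaCache's actual values.

```python
import torch

def adacache_next_interval(prev_feat: torch.Tensor, cur_feat: torch.Tensor) -> int:
    """Map the feature change inside a monitored block to a cache-reuse interval:
    small change -> reuse the cache for more steps, large change -> recompute soon.
    The thresholds and intervals below are illustrative, not LightX2V's values."""
    dist = ((cur_feat - prev_feat).abs().mean() / (prev_feat.abs().mean() + 1e-8)).item()
    if dist < 0.01:
        return 6   # nearly static: skip many steps before the next full compute
    elif dist < 0.05:
        return 3
    elif dist < 0.10:
        return 2
    return 1       # large change: compute every step
```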
<table>
<tr>
<td align="center">
Before acceleration: 227s
</td>
<td align="center">
After acceleration: 83s
</td>
</tr>
<tr>
<td align="center">
<video src="https://github.com/user-attachments/assets/33b2206d-17e6-4433-bed7-bfa890f9fa7d" width="100%"></video>
</td>
<td align="center">
<video src="https://github.com/user-attachments/assets/084dbe3d-6ff3-4afc-9a7c-453ec53b3672" width="100%"></video>
</td>
</tr>
</table>
- Acceleration ratio: **2.73**
- Config: [wan_i2v_ada](https://github.com/ModelTC/lightx2v/tree/main/configs/caching/adacache/wan_i2v_ada.json)
- Reference paper: [https://arxiv.org/abs/2411.02397](https://arxiv.org/abs/2411.02397)
### CustomCache
`CustomCache` combines the advantages of `TeaCache` and `TaylorSeer Cache`.
- It combines the real-time and reasonable cache decision-making of `TeaCache`, determining when to perform cache reuse through dynamic thresholds.
- At the same time, it utilizes `TaylorSeer`'s Taylor expansion method to make use of cached content.
This not only efficiently determines the timing of cache reuse but also maximizes the utilization of cached content, improving output accuracy and generation quality. Actual testing shows that `CustomCache` produces video quality superior to using `TeaCache`, `TaylorSeer Cache`, or `AdaCache` alone across multiple content generation tasks, making it one of the currently optimal comprehensive cache acceleration algorithms.
<table>
<tr>
<td align="center">
Before acceleration: 57.9s
</td>
<td align="center">
After acceleration: 16.6s
</td>
</tr>
<tr>
<td align="center">
<video src="https://github.com/user-attachments/assets/304ff1e8-ad1c-4013-bcf1-959ac140f67f" width="100%"></video>
</td>
<td align="center">
<video src="https://github.com/user-attachments/assets/d3fb474a-79af-4f33-b965-23d402d3cf16" width="100%"></video>
</td>
</tr>
</table>
- Acceleration ratio: **3.49**
- Config: [wan_t2v_custom_1_3b](https://github.com/ModelTC/lightx2v/tree/main/configs/caching/custom/wan_t2v_custom_1_3b.json)
## Usage
The config files for feature caching are located [here](https://github.com/ModelTC/lightx2v/tree/main/configs/caching).
You can test the different cache algorithms by pointing `--config_json` to a specific config file.
Example run scripts are available [here](https://github.com/ModelTC/lightx2v/tree/main/scripts/cache).
# Variable Resolution Inference
## Overview
Variable resolution inference is a technical strategy for optimizing the denoising process. It improves computational efficiency while maintaining generation quality by using different resolutions at different stages of the denoising process. The core idea of this method is to use lower resolution for coarse denoising in the early stages and switch to normal resolution for fine processing in the later stages.
## Technical Principles
### Multi-stage Denoising Strategy
Variable resolution inference is based on the following observations:
- **Early-stage denoising**: Mainly handles coarse noise and overall structure, requiring less detailed information
- **Late-stage denoising**: Focuses on detail optimization and high-frequency information recovery, requiring complete resolution information
### Resolution Switching Mechanism
1. **Low-resolution stage** (early stage)
- Downsample the input to a lower resolution (e.g., 0.75x of original size)
- Execute initial denoising steps
- Quickly remove most noise and establish basic structure
2. **Normal resolution stage** (late stage)
- Upsample the denoising result from the first step back to original resolution
- Continue executing remaining denoising steps
- Restore detailed information and complete fine processing
### U-shaped Resolution Strategy
If resolution is reduced at the very beginning of the denoising steps, it may cause significant differences between the final generated video and the video generated through normal inference. Therefore, a U-shaped resolution strategy can be adopted, where the original resolution is maintained for the first few steps, then resolution is reduced for inference.
## Usage
The config files for variable resolution inference are located [here](https://github.com/ModelTC/LightX2V/tree/main/configs/changing_resolution).
You can test variable resolution inference by pointing `--config_json` to a specific config file.
Example run scripts are available [here](https://github.com/ModelTC/LightX2V/blob/main/scripts/changing_resolution).
### Example 1:
```json
{
"infer_steps": 50,
"changing_resolution": true,
"resolution_rate": [0.75],
"changing_resolution_steps": [25]
}
```
This means a total of 50 steps, with resolution at 0.75x original resolution from step 1 to 25, and original resolution from step 26 to the final step.
### Example 2:
```json
{
"infer_steps": 50,
"changing_resolution": true,
"resolution_rate": [1.0, 0.75],
"changing_resolution_steps": [10, 35]
}
```
This means a total of 50 steps, with original resolution from step 1 to 10, 0.75x original resolution from step 11 to 35, and original resolution from step 36 to the final step.
Generally, if `changing_resolution_steps` is [A, B, C], denoising starts at step 1, and the total number of steps is X, then the inference process is divided into four segments: (0, A], (A, B], (B, C], and (C, X], where each segment is a left-open, right-closed interval.
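The following helper (illustrative only, not part of LightX2V) shows how a 1-indexed step maps to a resolution scale under this convention:

```python
def resolution_scale_for_step(step, changing_resolution_steps, resolution_rate):
    """Return the resolution scale for a 1-indexed denoising step.

    Steps in (0, steps[0]] use resolution_rate[0], steps in (steps[0], steps[1]]
    use resolution_rate[1], and so on; steps after the last boundary run at the
    original resolution (scale 1.0)."""
    for boundary, rate in zip(changing_resolution_steps, resolution_rate):
        if step <= boundary:
            return rate
    return 1.0

# Example 2 from above: 50 steps, boundaries [10, 35], rates [1.0, 0.75].
schedule = [resolution_scale_for_step(s, [10, 35], [1.0, 0.75]) for s in range(1, 51)]
assert schedule[0] == 1.0 and schedule[10] == 0.75 and schedule[35] == 1.0
```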
# Parameter Offload
## 📖 Overview
LightX2V implements an advanced parameter offload mechanism specifically designed for large model inference under limited hardware resources. The system provides an excellent speed-memory balance by intelligently managing model weights across different memory hierarchies.
**Core Features:**
- **Block/Phase-level Offload**: Efficiently manages model weights in block/phase units for optimal memory usage
- **Block**: The basic computational unit of Transformer models, containing complete Transformer layers (self-attention, cross-attention, feedforward networks, etc.), serving as a larger memory management unit
- **Phase**: Finer-grained computational stages within blocks, containing individual computational components (such as self-attention, cross-attention, feedforward networks, etc.), providing more precise memory control
- **Multi-tier Storage Support**: GPU → CPU → Disk hierarchy with intelligent caching
- **Asynchronous Operations**: Overlaps computation and data transfer using CUDA streams
- **Disk/NVMe Serialization**: Supports secondary storage when memory is insufficient
## 🎯 Offload Strategies
### Strategy 1: GPU-CPU Block/Phase Offload
**Use Case**: Insufficient GPU memory but sufficient system memory
**How It Works**: Manages model weights in block or phase units between GPU and CPU memory, utilizing CUDA streams to overlap computation and data transfer. Blocks contain complete Transformer layers, while Phases are individual computational components within blocks.
<div align="center">
<img alt="GPU-CPU block/phase offload workflow" src="https://raw.githubusercontent.com/ModelTC/LightX2V/main/assets/figs/offload/fig1_en.png" width="75%">
</div>
<div align="center">
<img alt="Swap operation" src="https://raw.githubusercontent.com/ModelTC/LightX2V/main/assets/figs/offload/fig2_en.png" width="75%">
</div>
<div align="center">
<img alt="Swap concept" src="https://raw.githubusercontent.com/ModelTC/LightX2V/main/assets/figs/offload/fig3_en.png" width="75%">
</div>
**Block vs Phase Explanation**:
- **Block Granularity**: Larger memory management unit containing complete Transformer layers (self-attention, cross-attention, feedforward networks, etc.), suitable for sufficient memory scenarios with reduced management overhead
- **Phase Granularity**: Finer-grained memory management containing individual computational components (such as self-attention, cross-attention, feedforward networks, etc.), suitable for memory-constrained scenarios with more flexible memory control
**Key Features:**
- **Asynchronous Transfer**: Uses three CUDA streams with different priorities for parallel computation and transfer
- Compute stream (priority=-1): High priority, handles current computation
- GPU load stream (priority=0): Medium priority, handles CPU to GPU prefetching
- CPU load stream (priority=0): Medium priority, handles GPU to CPU offloading
- **Prefetch Mechanism**: Preloads the next block/phase to GPU in advance
- **Intelligent Caching**: Maintains weight cache in CPU memory
- **Stream Synchronization**: Ensures correctness of data transfer and computation
- **Swap Operation**: Rotates block/phase positions after computation for continuous execution (a simplified sketch of the stream overlap follows)
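A minimal sketch of the compute/prefetch overlap with prioritized CUDA streams is shown below. The block layout, function names, and synchronization pattern are assumptions for illustration, not the LightX2V offload scheduler; real code would also use pinned CPU memory so that the copies are truly asynchronous.

```python
import torch

# Three CUDA streams with different priorities, mirroring the description above.
compute_stream = torch.cuda.Stream(priority=-1)   # current block's computation
gpu_load_stream = torch.cuda.Stream(priority=0)   # CPU -> GPU prefetch of the next block
cpu_load_stream = torch.cuda.Stream(priority=0)   # hands finished blocks back to the CPU side

def prefetch(block_cpu):
    """Copy one block's weights to the GPU on the prefetch stream."""
    with torch.cuda.stream(gpu_load_stream):
        return {k: v.to("cuda", non_blocking=True) for k, v in block_cpu.items()}

def run_blocks(blocks_cpu, forward_fn, x):
    """blocks_cpu: list of CPU-resident (ideally pinned) per-block weight dicts.
    forward_fn(weights, x) -> x runs one block's forward pass on the GPU."""
    cur = prefetch(blocks_cpu[0])
    for i in range(len(blocks_cpu)):
        # Ensure the current block's weights have finished transferring.
        compute_stream.wait_stream(gpu_load_stream)
        # Start prefetching the next block while the current one computes.
        nxt = prefetch(blocks_cpu[i + 1]) if i + 1 < len(blocks_cpu) else None
        with torch.cuda.stream(compute_stream):
            for v in cur.values():
                v.record_stream(compute_stream)  # keep the allocator aware of cross-stream use
            x = forward_fn(cur, x)
        # Inference weights are read-only, so the finished block's GPU copy can
        # simply be released once the compute stream is done with it.
        cpu_load_stream.wait_stream(compute_stream)
        cur = nxt
    torch.cuda.synchronize()
    return x
```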
### Strategy 2: Disk-CPU-GPU Block/Phase Offload (Lazy Loading)
**Use Case**: Both GPU memory and system memory are insufficient
**How It Works**: Builds upon Strategy 1 by introducing disk storage, implementing a three-tier storage hierarchy (Disk → CPU → GPU). CPU continues to serve as a cache pool with configurable size, suitable for devices with limited CPU memory.
<div align="center">
<img alt="Disk-CPU-GPU block/phase offload workflow" src="https://raw.githubusercontent.com/ModelTC/LightX2V/main/assets/figs/offload/fig4_en.png" width="75%">
</div>
<div align="center">
<img alt="Working steps" src="https://raw.githubusercontent.com/ModelTC/LightX2V/main/assets/figs/offload/fig5_en.png" width="75%">
</div>
**Key Features:**
- **Lazy Loading**: Model weights are loaded from disk on-demand, avoiding loading the entire model at once
- **Intelligent Caching**: CPU memory buffer uses FIFO strategy with configurable size
- **Multi-threaded Prefetch**: Uses multiple disk worker threads for parallel loading
- **Asynchronous Transfer**: Uses CUDA streams to overlap computation and data transfer
- **Swap Rotation**: Achieves continuous computation through position rotation, avoiding repeated loading/offloading
**Working Steps**:
- **Disk Storage**: Model weights are stored on SSD/NVMe by block, one .safetensors file per block
- **Task Scheduling**: When a block/phase is needed, priority task queue assigns disk worker threads
- **Asynchronous Loading**: Multiple disk threads load weight files from disk to CPU memory buffer in parallel
- **Intelligent Caching**: CPU memory buffer manages cache using FIFO strategy with configurable size
- **Cache Hit**: If weights are already in cache, transfer directly to GPU without disk read
- **Prefetch Transfer**: Weights in cache are asynchronously transferred to GPU memory (using GPU load stream)
- **Compute Execution**: Weights on GPU perform computation (using compute stream) while background continues prefetching next block/phase
- **Swap Rotation**: After computation completes, rotate block/phase positions for continuous computation
- **Memory Management**: When the CPU cache is full, the oldest cached weight block/phase is automatically evicted according to the FIFO policy (a minimal cache sketch follows)
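A compact sketch of such a cache is shown below. The eviction unit (block count rather than bytes), the file layout, and the class name are assumptions for illustration, not the LightX2V implementation.

```python
import threading
from collections import OrderedDict
from safetensors.torch import load_file

class FIFOWeightCache:
    """Minimal sketch of a fixed-size CPU cache for per-block .safetensors files:
    the oldest entries are evicted first, matching the FIFO strategy above."""

    def __init__(self, max_blocks: int = 4):
        self.max_blocks = max_blocks      # a real cache would likely budget bytes instead
        self.cache = OrderedDict()        # block_path -> dict of CPU tensors
        self.lock = threading.Lock()      # shared by the disk worker threads

    def get(self, block_path: str):
        with self.lock:
            if block_path in self.cache:  # cache hit: no disk read needed
                return self.cache[block_path]
        weights = load_file(block_path, device="cpu")   # disk read outside the lock
        with self.lock:
            if len(self.cache) >= self.max_blocks:
                self.cache.popitem(last=False)          # evict the oldest block (FIFO)
            self.cache[block_path] = weights
        return weights
```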
## ⚙️ Configuration Parameters
### GPU-CPU Offload Configuration
```python
config = {
"cpu_offload": True,
"offload_ratio": 1.0, # Offload ratio (0.0-1.0)
"offload_granularity": "block", # Offload granularity: "block" or "phase"
"lazy_load": False, # Disable lazy loading
}
```
### Disk-CPU-GPU Offload Configuration
```python
config = {
"cpu_offload": True,
"lazy_load": True, # Enable lazy loading
"offload_ratio": 1.0, # Offload ratio
"offload_granularity": "phase", # Recommended to use phase granularity
"num_disk_workers": 2, # Number of disk worker threads
"offload_to_disk": True, # Enable disk offload
}
```
**Intelligent Cache Key Parameters:**
- `max_memory`: Controls CPU cache size, affects cache hit rate and memory usage
- `num_disk_workers`: Controls number of disk loading threads, affects prefetch speed
- `offload_granularity`: Controls cache granularity (block or phase), affects cache efficiency
- `"block"`: Cache management in complete Transformer layer units
- `"phase"`: Cache management in individual computational component units
**Offload Configuration for Non-DIT Model Components (T5, CLIP, VAE):**
The offload behavior of these components follows these rules:
- **Default Behavior**: If not specified separately, T5, CLIP, VAE will follow the `cpu_offload` setting
- **Independent Configuration**: Can set offload strategy separately for each component for fine-grained control
**Configuration Example**:
```json
{
"cpu_offload": true, // DIT model offload switch
"t5_cpu_offload": false, // T5 encoder independent setting
"clip_cpu_offload": false, // CLIP encoder independent setting
"vae_cpu_offload": false // VAE encoder independent setting
}
```
For memory-constrained devices, a progressive offload strategy is recommended:
1. **Step 1**: Only enable `cpu_offload`, disable `t5_cpu_offload`, `clip_cpu_offload`, `vae_cpu_offload`
2. **Step 2**: If memory is still insufficient, gradually enable CPU offload for T5, CLIP, VAE
3. **Step 3**: If memory is still not enough, consider using quantization + CPU offload or enable `lazy_load`
**Practical Experience**:
- **RTX 4090 24GB + 14B Model**: Usually only need to enable `cpu_offload`, manually set other component offload to `false`, and use FP8 quantized version
- **Smaller Memory GPUs**: Need to combine quantization, CPU offload, and lazy loading
- **Quantization Schemes**: Refer to [Quantization Documentation](../method_tutorials/quantization.md) to select appropriate quantization strategy
**Configuration File Reference**:
- **Wan2.1 Series Models**: Refer to [offload config files](https://github.com/ModelTC/lightx2v/tree/main/configs/offload)
- **Wan2.2 Series Models**: Refer to [wan22 config files](https://github.com/ModelTC/lightx2v/tree/main/configs/wan22) with `4090` suffix
## 🎯 Usage Recommendations
- 🔄 GPU-CPU Block/Phase Offload: Suitable for insufficient GPU memory (RTX 3090/4090 24G) but sufficient system memory (>64/128G)
- 💾 Disk-CPU-GPU Block/Phase Offload: Suitable for both insufficient GPU memory (RTX 3060/4090 8G) and system memory (16/32G)
- 🚫 No Offload: Suitable for high-end hardware configurations pursuing best performance
## 🔍 Troubleshooting
### Common Issues and Solutions
1. **Disk I/O Bottleneck**
- Solution: Use NVMe SSD, increase num_disk_workers
2. **Memory Buffer Overflow**
- Solution: Increase max_memory or reduce num_disk_workers
3. **Loading Timeout**
- Solution: Check disk performance, optimize file system
**Note**: This offload mechanism is specifically designed for LightX2V, fully utilizing the asynchronous computing capabilities of modern hardware, significantly lowering the hardware threshold for large model inference.
# Parallel Inference
LightX2V supports distributed parallel inference, enabling the utilization of multiple GPUs for inference. The DiT component supports two parallel attention mechanisms: **Ulysses** and **Ring**, while also supporting **Cfg parallel inference**. Parallel inference significantly reduces inference time and alleviates memory overhead on each GPU.
## DiT Parallel Configuration
### 1. Ulysses Parallel
**Configuration method:**
```json
"parallel": {
"seq_p_size": 4,
"seq_p_attn_type": "ulysses"
}
```
### 2. Ring Parallel
**Configuration method:**
```json
"parallel": {
"seq_p_size": 4,
"seq_p_attn_type": "ring"
}
```
## Cfg Parallel Configuration
**Configuration method:**
```json
"parallel": {
"cfg_p_size": 2
}
```
## Hybrid Parallel Configuration
**Configuration method:**
```json
"parallel": {
"seq_p_size": 4,
"seq_p_attn_type": "ulysses",
"cfg_p_size": 2
}
```
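With hybrid parallelism, the total number of launched processes should cover both dimensions. Assuming the usual composition rule (sequence-parallel size times CFG-parallel size, which is an assumption here rather than something stated above), the example config requires 8 GPUs:

```python
# Assumed relationship: required processes = seq_p_size * cfg_p_size.
parallel = {"seq_p_size": 4, "seq_p_attn_type": "ulysses", "cfg_p_size": 2}
required_gpus = parallel["seq_p_size"] * parallel["cfg_p_size"]
print(f"launch with {required_gpus} processes, e.g. --nproc_per_node {required_gpus}")
```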
## Usage
Parallel inference configuration files are available [here](https://github.com/ModelTC/lightx2v/tree/main/configs/dist_infer).
You can test parallel inference by pointing `--config_json` to a specific config file.
Example run scripts are available [here](https://github.com/ModelTC/lightx2v/tree/main/scripts/dist_infer).
# Model Quantization Techniques
## 📖 Overview
LightX2V supports quantized inference for DIT, T5, and CLIP models, reducing memory usage and improving inference speed by lowering model precision.
---
## 🔧 Quantization Modes
| Quantization Mode | Weight Quantization | Activation Quantization | Compute Kernel | Supported Hardware |
|--------------|----------|----------|----------|----------|
| `fp8-vllm` | FP8 channel symmetric | FP8 channel dynamic symmetric | [VLLM](https://github.com/vllm-project/vllm) | H100/H200/H800, RTX 40 series, etc. |
| `int8-vllm` | INT8 channel symmetric | INT8 channel dynamic symmetric | [VLLM](https://github.com/vllm-project/vllm) | A100/A800, RTX 30/40 series, etc. |
| `fp8-sgl` | FP8 channel symmetric | FP8 channel dynamic symmetric | [SGL](https://github.com/sgl-project/sglang/tree/main/sgl-kernel) | H100/H200/H800, RTX 40 series, etc. |
| `int8-sgl` | INT8 channel symmetric | INT8 channel dynamic symmetric | [SGL](https://github.com/sgl-project/sglang/tree/main/sgl-kernel) | A100/A800, RTX 30/40 series, etc. |
| `fp8-q8f` | FP8 channel symmetric | FP8 channel dynamic symmetric | [Q8-Kernels](https://github.com/KONAKONA666/q8_kernels) | RTX 40 series, L40S, etc. |
| `int8-q8f` | INT8 channel symmetric | INT8 channel dynamic symmetric | [Q8-Kernels](https://github.com/KONAKONA666/q8_kernels) | RTX 40 series, L40S, etc. |
| `int8-torchao` | INT8 channel symmetric | INT8 channel dynamic symmetric | [TorchAO](https://github.com/pytorch/ao) | A100/A800, RTX 30/40 series, etc. |
| `int4-g128-marlin` | INT4 group symmetric | FP16 | [Marlin](https://github.com/IST-DASLab/marlin) | H200/H800/A100/A800, RTX 30/40 series, etc. |
| `fp8-b128-deepgemm` | FP8 block symmetric | FP8 group symmetric | [DeepGemm](https://github.com/deepseek-ai/DeepGEMM) | H100/H200/H800, RTX 40 series, etc.|
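Most schemes in the table use per-channel symmetric weight quantization with dynamic activation scales. The sketch below illustrates the per-channel symmetric idea for INT8 weights in plain PyTorch; it is a conceptual example, not the kernel code used by the backends listed above.

```python
import torch

def quantize_per_channel_int8(w: torch.Tensor):
    """Symmetric per-output-channel INT8 quantization of a [out, in] weight matrix."""
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0  # one scale per row
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)  # symmetric range
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(256, 512)
q, s = quantize_per_channel_int8(w)
print((dequantize(q, s) - w).abs().max())  # small per-channel reconstruction error
```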
---
## 🔧 Obtaining Quantized Models
### Method 1: Download Pre-Quantized Models
Download pre-quantized models from LightX2V model repositories:
**DIT Models**
Download pre-quantized DIT models from [Wan2.1-Distill-Models](https://huggingface.co/lightx2v/Wan2.1-Distill-Models):
```bash
# Download DIT FP8 quantized model
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
--local-dir ./models \
--include "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors"
```
**Encoder Models**
Download pre-quantized T5 and CLIP models from [Encoders-LightX2V](https://huggingface.co/lightx2v/Encoders-Lightx2v):
```bash
# Download T5 FP8 quantized model
huggingface-cli download lightx2v/Encoders-Lightx2v \
--local-dir ./models \
--include "models_t5_umt5-xxl-enc-fp8.pth"
# Download CLIP FP8 quantized model
huggingface-cli download lightx2v/Encoders-Lightx2v \
--local-dir ./models \
--include "models_clip_open-clip-xlm-roberta-large-vit-huge-14-fp8.pth"
```
### Method 2: Self-Quantize Models
For detailed quantization tool usage, refer to: [Model Conversion Documentation](https://github.com/ModelTC/lightx2v/tree/main/tools/convert/readme_zh.md)
---
## 🚀 Using Quantized Models
### DIT Model Quantization
#### Supported Quantization Modes
DIT quantization modes (`dit_quant_scheme`) support: `fp8-vllm`, `int8-vllm`, `fp8-sgl`, `int8-sgl`, `fp8-q8f`, `int8-q8f`, `int8-torchao`, `int4-g128-marlin`, `fp8-b128-deepgemm`
#### Configuration Example
```json
{
"dit_quantized": true,
"dit_quant_scheme": "fp8-sgl",
"dit_quantized_ckpt": "/path/to/dit_quantized_model" // Optional
}
```
> 💡 **Tip**: When there's only one DIT model in the script's `model_path`, `dit_quantized_ckpt` doesn't need to be specified separately.
### T5 Model Quantization
#### Supported Quantization Modes
T5 quantization modes (`t5_quant_scheme`) support: `int8-vllm`, `fp8-sgl`, `int8-q8f`, `fp8-q8f`, `int8-torchao`
#### Configuration Example
```json
{
"t5_quantized": true,
"t5_quant_scheme": "fp8-sgl",
"t5_quantized_ckpt": "/path/to/t5_quantized_model" // Optional
}
```
> 💡 **Tip**: When a T5 quantized model exists in the script's specified `model_path` (such as `models_t5_umt5-xxl-enc-fp8.pth` or `models_t5_umt5-xxl-enc-int8.pth`), `t5_quantized_ckpt` doesn't need to be specified separately.
### CLIP Model Quantization
#### Supported Quantization Modes
CLIP quantization modes (`clip_quant_scheme`) support: `int8-vllm`, `fp8-sgl`, `int8-q8f`, `fp8-q8f`, `int8-torchao`
#### Configuration Example
```json
{
"clip_quantized": true,
"clip_quant_scheme": "fp8-sgl",
"clip_quantized_ckpt": "/path/to/clip_quantized_model" // Optional
}
```
> 💡 **Tip**: When a CLIP quantized model exists in the script's specified `model_path` (such as `models_clip_open-clip-xlm-roberta-large-vit-huge-14-fp8.pth` or `models_clip_open-clip-xlm-roberta-large-vit-huge-14-int8.pth`), `clip_quantized_ckpt` doesn't need to be specified separately.
### Performance Optimization Strategy
If memory is insufficient, you can combine parameter offloading to further reduce memory usage. Refer to [Parameter Offload Documentation](../method_tutorials/offload.md):
> - **Wan2.1 Configuration**: Refer to [offload config files](https://github.com/ModelTC/LightX2V/tree/main/configs/offload)
> - **Wan2.2 Configuration**: Refer to [wan22 config files](https://github.com/ModelTC/LightX2V/tree/main/configs/wan22) with `4090` suffix
---
## 📚 Related Resources
### Configuration File Examples
- [INT8 Quantization Config](https://github.com/ModelTC/LightX2V/blob/main/configs/quantization/wan_i2v.json)
- [Q8F Quantization Config](https://github.com/ModelTC/LightX2V/blob/main/configs/quantization/wan_i2v_q8f.json)
- [TorchAO Quantization Config](https://github.com/ModelTC/LightX2V/blob/main/configs/quantization/wan_i2v_torchao.json)
### Run Scripts
- [Quantization Inference Scripts](https://github.com/ModelTC/LightX2V/tree/main/scripts/quantization)
### Tool Documentation
- [Quantization Tool Documentation](https://github.com/ModelTC/lightx2v/tree/main/tools/convert/readme_zh.md)
- [LightCompress Quantization Documentation](https://github.com/ModelTC/llmc/blob/main/docs/zh_cn/source/backend/lightx2v.md)
### Model Repositories
- [Wan2.1-LightX2V Quantized Models](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
- [Wan2.2-LightX2V Quantized Models](https://huggingface.co/lightx2v/Wan2.2-Distill-Models)
- [Encoders Quantized Models](https://huggingface.co/lightx2v/Encoders-Lightx2v)
---
Through this document, you should be able to:
✅ Understand quantization schemes supported by LightX2V
✅ Select appropriate quantization strategies based on hardware
✅ Correctly configure quantization parameters
✅ Obtain and use quantized models
✅ Optimize inference performance and memory usage
If you have other questions, feel free to ask in [GitHub Issues](https://github.com/ModelTC/LightX2V/issues).
# Step Distillation
Step distillation is an important optimization technique in LightX2V. By training distilled models, it significantly reduces inference steps from the original 40-50 steps to **4 steps**, dramatically improving inference speed while maintaining video quality. LightX2V implements step distillation along with CFG distillation to further enhance inference speed.
## 🔍 Technical Principle
### DMD Distillation
The core technology of step distillation is [DMD Distillation](https://arxiv.org/abs/2311.18828). The DMD distillation framework is shown in the following diagram:
<div align="center">
<img alt="DMD Distillation Framework" src="https://raw.githubusercontent.com/ModelTC/LightX2V/main/assets/figs/step_distill/fig_01.png" width="75%">
</div>
The core idea of DMD distillation is to minimize the KL divergence between the output distributions of the distilled model and the original model:
$$
\begin{aligned}
D_{KL}\left(p_{\text{fake}} \; \| \; p_{\text{real}} \right) &= \mathbb{E}_{x\sim p_\text{fake}}\left(\log\left(\frac{p_\text{fake}(x)}{p_\text{real}(x)}\right)\right)\\
&= \mathbb{E}_{\substack{
z \sim \mathcal{N}(0; \mathbf{I}) \\
x = G_\theta(z)
}}-\big(\log~p_\text{real}(x) - \log~p_\text{fake}(x)\big).
\end{aligned}
$$
Since directly computing the probability density is nearly impossible, DMD distillation instead computes the gradient of this KL divergence:
$$
\begin{aligned}
\nabla_\theta D_{KL}
&= \mathbb{E}_{\substack{
z \sim \mathcal{N}(0; \mathbf{I}) \\
x = G_\theta(z)
}} \Big[-
\big(
s_\text{real}(x) - s_\text{fake}(x)\big)
\hspace{.5mm} \frac{dG}{d\theta}
\Big],
\end{aligned}
$$
where $s_\text{real}(x) =\nabla_{x} \text{log}~p_\text{real}(x)$ and $s_\text{fake}(x) =\nabla_{x} \text{log}~p_\text{fake}(x)$ are score functions. Score functions can be computed by the model. Therefore, DMD distillation maintains three models in total:
- `real_score`, computes the score of the real distribution; since the real distribution is fixed, DMD distillation uses the original model with fixed weights as its score function;
- `fake_score`, computes the score of the fake distribution; since the fake distribution is constantly updated, DMD distillation initializes it with the original model and fine-tunes it to learn the output distribution of the generator;
- `generator`, the student model, trained with the KL-divergence gradient estimated from the difference between `real_score` and `fake_score`.
> References:
> 1. [DMD (One-step Diffusion with Distribution Matching Distillation)](https://arxiv.org/abs/2311.18828)
> 2. [DMD2 (Improved Distribution Matching Distillation for Fast Image Synthesis)](https://arxiv.org/abs/2405.14867)
### Self-Forcing
DMD distillation technology is designed for image generation. The step distillation in LightX2V is implemented based on [Self-Forcing](https://github.com/guandeh17/Self-Forcing) technology. The overall implementation of Self-Forcing is similar to DMD, but following DMD2, it removes the regression loss and uses ODE initialization instead. Additionally, Self-Forcing adds an important optimization for video generation tasks:
Current DMD distillation-based methods struggle to generate videos in a single step. Self-Forcing instead selects one timestep to optimize at a time, with the generator computing gradients only at that step. This significantly speeds up Self-Forcing's training and improves the denoising quality at intermediate timesteps, raising overall effectiveness.
> References:
> 1. [Self-Forcing (Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion)](https://arxiv.org/abs/2506.08009)
### LightX2V
Self-Forcing performs step distillation and CFG distillation on 1.3B autoregressive models. LightX2V extends it with a series of enhancements:
1. **Larger Models**: Supports step distillation training for 14B models;
2. **More Model Types**: Supports standard bidirectional models and I2V model step distillation training;
3. **Better Results**: LightX2V uses high-quality prompts from approximately 50,000 data entries for training;
For detailed implementation, refer to [Self-Forcing-Plus](https://github.com/GoatWu/Self-Forcing-Plus).
## 🎯 Technical Features
- **Inference Acceleration**: Reduces inference steps from 40-50 to 4 steps without CFG, achieving approximately **20-24x** speedup
- **Quality Preservation**: Maintains original video generation quality through distillation techniques
- **Strong Compatibility**: Supports both T2V and I2V tasks
- **Flexible Usage**: Supports loading complete step distillation models or loading step distillation LoRA on top of native models; compatible with int8/fp8 model quantization
## 🛠️ Configuration Files
### Basic Configuration Files
Multiple configuration options are provided in the [configs/distill/](https://github.com/ModelTC/lightx2v/tree/main/configs/distill) directory:
| Configuration File | Purpose | Model Address |
|-------------------|---------|---------------|
| [wan_t2v_distill_4step_cfg.json](https://github.com/ModelTC/lightx2v/blob/main/configs/distill/wan_t2v_distill_4step_cfg.json) | Load T2V 4-step distillation complete model | [hugging-face](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v/blob/main/distill_models/distill_model.safetensors) |
| [wan_i2v_distill_4step_cfg.json](https://github.com/ModelTC/lightx2v/blob/main/configs/distill/wan_i2v_distill_4step_cfg.json) | Load I2V 4-step distillation complete model | [hugging-face](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/blob/main/distill_models/distill_model.safetensors) |
| [wan_t2v_distill_4step_cfg_lora.json](https://github.com/ModelTC/lightx2v/blob/main/configs/distill/wan_t2v_distill_4step_cfg_lora.json) | Load Wan-T2V model and step distillation LoRA | [hugging-face](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v/blob/main/loras/Wan21_T2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors) |
| [wan_i2v_distill_4step_cfg_lora.json](https://github.com/ModelTC/lightx2v/blob/main/configs/distill/wan_i2v_distill_4step_cfg_lora.json) | Load Wan-I2V model and step distillation LoRA | [hugging-face](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/blob/main/loras/Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors) |
### Key Configuration Parameters
- Since DMD distillation only trains a few fixed timesteps, we recommend using `LCM Scheduler` for inference. In [WanStepDistillScheduler](https://github.com/ModelTC/LightX2V/blob/main/lightx2v/models/schedulers/wan/step_distill/scheduler.py), `LCM Scheduler` is already fixed in use, requiring no user configuration.
- `infer_steps`, `denoising_step_list` and `sample_shift` are set to parameters matching those during training, and are generally not recommended for user modification.
- `enable_cfg` must be set to `false` (equivalent to setting `sample_guide_scale = 1`), otherwise the video may become completely blurred.
- `lora_configs` supports merging multiple LoRAs with different strengths. When `lora_configs` is non-empty, the original `Wan2.1` model is loaded by default; therefore, when using `lora_configs` and you still want step distillation, include the path and strength of the step distillation LoRA in the list.
```json
{
"infer_steps": 4, // Inference steps
"denoising_step_list": [1000, 750, 500, 250], // Denoising timestep list
"sample_shift": 5, // Scheduler timestep shift
"enable_cfg": false, // Disable CFG for speed improvement
"lora_configs": [ // LoRA weights path (optional)
{
"path": "path/to/distill_lora.safetensors",
"strength": 1.0
}
]
}
```
## 📜 Usage
### Model Preparation
**Complete Model:**
Place the downloaded model (`distill_model.pt` or `distill_model.safetensors`) in the `distill_models/` folder under the Wan model root directory:
- For T2V: `Wan2.1-T2V-14B/distill_models/`
- For I2V-480P: `Wan2.1-I2V-14B-480P/distill_models/`
**LoRA:**
1. Place the downloaded LoRA in any location
2. Set the `path` field under `lora_configs` in the configuration file to the LoRA storage path
### Inference Scripts
**T2V Complete Model:**
```bash
bash scripts/wan/run_wan_t2v_distill_4step_cfg.sh
```
**I2V Complete Model:**
```bash
bash scripts/wan/run_wan_i2v_distill_4step_cfg.sh
```
### Step Distillation LoRA Inference Scripts
**T2V LoRA:**
```bash
bash scripts/wan/run_wan_t2v_distill_4step_cfg_lora.sh
```
**I2V LoRA:**
```bash
bash scripts/wan/run_wan_i2v_distill_4step_cfg_lora.sh
```
## 🔧 Service Deployment
### Start Distillation Model Service
Modify the startup command in [scripts/server/start_server.sh](https://github.com/ModelTC/lightx2v/blob/main/scripts/server/start_server.sh):
```bash
python -m lightx2v.api_server \
--model_cls wan2.1_distill \
--task t2v \
--model_path $model_path \
--config_json ${lightx2v_path}/configs/distill/wan_t2v_distill_4step_cfg.json \
--port 8000 \
--nproc_per_node 1
```
Run the service startup script:
```bash
scripts/server/start_server.sh
```
For more details, see [Service Deployment](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/deploy_service.html).
### Usage in Gradio Interface
See [Gradio Documentation](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/deploy_gradio.html)
# Video Frame Interpolation (VFI)
> **Important Note**: Video frame interpolation is enabled through configuration files, not command-line parameters. Please add a `video_frame_interpolation` configuration block to your JSON config file to enable this feature.
## Overview
Video Frame Interpolation (VFI) is a technique that generates intermediate frames between existing frames to increase the frame rate and create smoother video playback. LightX2V integrates the RIFE (Real-Time Intermediate Flow Estimation) model to provide high-quality frame interpolation capabilities.
## What is RIFE?
RIFE is a state-of-the-art video frame interpolation method that uses optical flow estimation to generate intermediate frames. It can effectively:
- Increase video frame rate (e.g., from 16 FPS to 32 FPS)
- Create smooth motion transitions
- Maintain high visual quality with minimal artifacts
- Process videos in real-time
## Installation and Setup
### Download RIFE Model
First, download the RIFE model weights using the provided script:
```bash
python tools/download_rife.py <target_directory>
```
For example, to download to `/path/to/rife/train_log`:
```bash
python tools/download_rife.py /path/to/rife/train_log
```
This script will:
- Download RIFEv4.26 model from HuggingFace
- Extract and place the model files in the correct directory
- Clean up temporary files
## Usage
### Configuration File Setup
Video frame interpolation is enabled through configuration files. Add a `video_frame_interpolation` configuration block to your JSON config file:
```json
{
"infer_steps": 50,
"target_video_length": 81,
"target_height": 480,
"target_width": 832,
"fps": 16,
"video_frame_interpolation": {
"algo": "rife",
"target_fps": 32,
"model_path": "/path/to/rife/train_log"
}
}
```
### Command Line Interface
Run inference using a configuration file that includes VFI settings:
```bash
python lightx2v/infer.py \
--model_cls wan2.1 \
--task t2v \
--model_path /path/to/model \
--config_json ./configs/video_frame_interpolation/wan_t2v.json \
--prompt "A beautiful sunset over the ocean" \
--save_result_path ./output.mp4
```
### Configuration Parameters
In the `video_frame_interpolation` configuration block:
- `algo`: Frame interpolation algorithm, currently supports "rife"
- `target_fps`: Target frame rate for the output video
- `model_path`: RIFE model path, typically "/path/to/rife/train_log"
Other related configurations:
- `fps`: Source video frame rate (default 16)
### Configuration Priority
The system automatically handles video frame rate configuration with the following priority:
1. `video_frame_interpolation.target_fps` - If video frame interpolation is enabled, this frame rate is used as the output frame rate
2. `fps` (default 16) - Always used as the source frame rate; if video frame interpolation is not enabled, it is also the output frame rate (see the helper sketched below)
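For a quick estimate of how many frames interpolation will produce, the small helper below can be used (illustrative only, not part of LightX2V):

```python
def interpolated_frame_count(num_src_frames: int, src_fps: int, target_fps: int) -> int:
    """Approximate number of output frames when interpolating from src_fps to
    target_fps while keeping the clip duration unchanged."""
    duration = (num_src_frames - 1) / src_fps          # seconds spanned by the clip
    return int(round(duration * target_fps)) + 1

# 81 frames at 16 FPS interpolated to 32 FPS -> roughly 161 frames, same 5 s clip.
print(interpolated_frame_count(81, 16, 32))
```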
## How It Works
### Frame Interpolation Process
1. **Source Video Generation**: The base model generates video frames at the source FPS
2. **Frame Analysis**: RIFE analyzes adjacent frames to estimate optical flow
3. **Intermediate Frame Generation**: New frames are generated between existing frames
4. **Temporal Smoothing**: The interpolated frames create smooth motion transitions
### Technical Details
- **Input Format**: ComfyUI Image tensors [N, H, W, C] in range [0, 1]
- **Output Format**: Interpolated ComfyUI Image tensors [M, H, W, C] in range [0, 1]
- **Processing**: Automatic padding and resolution handling
- **Memory Optimization**: Efficient GPU memory management
## Example Configurations
### Basic Frame Rate Doubling
Create configuration file `wan_t2v_vfi_32fps.json`:
```json
{
"infer_steps": 50,
"target_video_length": 81,
"target_height": 480,
"target_width": 832,
"seed": 42,
"sample_guide_scale": 6,
"enable_cfg": true,
"fps": 16,
"video_frame_interpolation": {
"algo": "rife",
"target_fps": 32,
"model_path": "/path/to/rife/train_log"
}
}
```
Run command:
```bash
python lightx2v/infer.py \
--model_cls wan2.1 \
--task t2v \
--model_path ./models/wan2.1 \
--config_json ./wan_t2v_vfi_32fps.json \
--prompt "A cat playing in the garden" \
--save_result_path ./output_32fps.mp4
```
### Higher Frame Rate Enhancement
Create configuration file `wan_i2v_vfi_60fps.json`:
```json
{
"infer_steps": 30,
"target_video_length": 81,
"target_height": 480,
"target_width": 832,
"seed": 42,
"sample_guide_scale": 6,
"enable_cfg": true,
"fps": 16,
"video_frame_interpolation": {
"algo": "rife",
"target_fps": 60,
"model_path": "/path/to/rife/train_log"
}
}
```
Run command:
```bash
python lightx2v/infer.py \
--model_cls wan2.1 \
--task i2v \
--model_path ./models/wan2.1 \
--config_json ./wan_i2v_vfi_60fps.json \
--image_path ./input.jpg \
--prompt "Smooth camera movement" \
--save_result_path ./output_60fps.mp4
```
## Performance Considerations
### Memory Usage
- RIFE processing requires additional GPU memory
- Memory usage scales with video resolution and length
- Consider using lower resolutions for longer videos
### Processing Time
- Frame interpolation adds processing overhead
- Higher target frame rates require more computation
- Processing time is roughly proportional to the number of interpolated frames
### Quality vs Speed Trade-offs
- Higher interpolation ratios may introduce artifacts
- Optimal range: 2x to 4x frame rate increase
- For extreme interpolation (>4x), consider multiple passes
## Best Practices
### Optimal Use Cases
- **Motion-heavy videos**: Benefit most from frame interpolation
- **Camera movements**: Smoother panning and zooming
- **Action sequences**: Reduced motion blur perception
- **Slow-motion effects**: Create fluid slow-motion videos
### Recommended Settings
- **Source FPS**: 16-24 FPS (generated by base model)
- **Target FPS**: 32-60 FPS (2x to 4x increase)
- **Resolution**: Up to 720p for best performance
### Troubleshooting
#### Common Issues
1. **Out of Memory**: Reduce video resolution or target FPS
2. **Artifacts in output**: Lower the interpolation ratio
3. **Slow processing**: Check GPU memory and consider using CPU offloading
#### Solutions
Solve issues by modifying the configuration file:
```json
{
// For memory issues, use lower resolution
"target_height": 480,
"target_width": 832,
// For quality issues, use moderate interpolation
"video_frame_interpolation": {
"target_fps": 24 // instead of 60
},
// For performance issues, enable offloading
"cpu_offload": true
}
```
## Technical Implementation
The RIFE integration in LightX2V includes:
- **RIFEWrapper**: ComfyUI-compatible wrapper for RIFE model
- **Automatic Model Loading**: Seamless integration with the inference pipeline
- **Memory Optimization**: Efficient tensor management and GPU memory usage
- **Quality Preservation**: Maintains original video quality while adding frames
version: 2
# Set the version of Python and other tools you might need
build:
os: ubuntu-20.04
tools:
python: "3.10"
formats:
- epub
sphinx:
configuration: docs/PAPERS_ZH_CN/source/conf.py
python:
install:
- requirements: requirements-docs.txt
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)
if "%1" == "" goto help
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
:end
popd
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
import logging
import os
import sys
from typing import List
import sphinxcontrib.redoc
from sphinx.ext import autodoc
logger = logging.getLogger(__name__)
sys.path.append(os.path.abspath("../.."))
# -- Project information -----------------------------------------------------
project = "Lightx2v"
copyright = "2025, Lightx2v Team"
author = "the Lightx2v Team"
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
"sphinx.ext.napoleon",
"sphinx.ext.viewcode",
"sphinx.ext.intersphinx",
"sphinx_copybutton",
"sphinx.ext.autodoc",
"sphinx.ext.autosummary",
"myst_parser",
"sphinxarg.ext",
"sphinxcontrib.redoc",
"sphinxcontrib.openapi",
]
html_static_path = ["_static"]
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns: List[str] = ["**/*.template.rst"]
# Exclude the prompt "$" when copying code
copybutton_prompt_text = r"\$ "
copybutton_prompt_is_regexp = True
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_title = project
html_theme = "sphinx_book_theme"
# html_theme = 'sphinx_rtd_theme'
html_logo = "../../../assets/img_lightx2v.png"
html_theme_options = {
"path_to_docs": "docs/ZH_CN/source",
"repository_url": "https://github.com/ModelTC/lightx2v",
"use_repository_button": True,
}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
# html_static_path = ['_static']
# Generate additional rst documentation here.
def setup(app):
# from docs.source.generate_examples import generate_examples
# generate_examples()
pass
# Mock out external dependencies here.
autodoc_mock_imports = [
"cpuinfo",
"torch",
"transformers",
"psutil",
"prometheus_client",
"sentencepiece",
"lightllmnumpy",
"tqdm",
"tensorizer",
]
for mock_target in autodoc_mock_imports:
if mock_target in sys.modules:
logger.info(
"Potentially problematic mock target (%s) found; autodoc_mock_imports cannot mock modules that have already been loaded into sys.modules when the sphinx build starts.",
mock_target,
)
class MockedClassDocumenter(autodoc.ClassDocumenter):
"""Remove note about base class when a class is derived from object."""
def add_line(self, line: str, source: str, *lineno: int) -> None:
if line == " Bases: :py:class:`object`":
return
super().add_line(line, source, *lineno)
autodoc.ClassDocumenter = MockedClassDocumenter
navigation_with_keys = False