Unverified commit 6f281bdd authored by gushiqiao, committed by GitHub

Update docs (#386)

parent 8b1e4f94
@@ -26,26 +26,23 @@

## 🤖 Supported Model Ecosystem

### Official Open-Source Models
- [Wan2.1 & Wan2.2](https://huggingface.co/Wan-AI/)
- [Qwen-Image](https://huggingface.co/Qwen/Qwen-Image)
- [Qwen-Image-Edit](https://huggingface.co/spaces/Qwen/Qwen-Image-Edit)

### Quantized and Distilled Models/LoRAs (**🚀 Recommended: 4-step inference**)
- [Wan2.1-Distill-Models](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
- [Wan2.2-Distill-Models](https://huggingface.co/lightx2v/Wan2.2-Distill-Models)
- [Wan2.1-Distill-Loras](https://huggingface.co/lightx2v/Wan2.1-Distill-Loras)
- [Wan2.2-Distill-Loras](https://huggingface.co/lightx2v/Wan2.2-Distill-Loras)

🔔 Follow our [HuggingFace page](https://huggingface.co/lightx2v) for the latest model releases from our team.

### Autoregressive Models
- [Wan2.1-T2V-CausVid](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-CausVid)

💡 Refer to the [Model Structure Documentation](https://lightx2v-en.readthedocs.io/en/latest/deploy_guides/model_structure.html) to quickly get started with LightX2V.

## 🚀 Frontend Interfaces
@@ -25,26 +25,23 @@

## 🤖 Supported Model Ecosystem

### Official Open-Source Models
- [Wan2.1 & Wan2.2](https://huggingface.co/Wan-AI/)
- [Qwen-Image](https://huggingface.co/Qwen/Qwen-Image)
- [Qwen-Image-Edit](https://huggingface.co/spaces/Qwen/Qwen-Image-Edit)

### Quantized and Distilled Models/LoRAs (**🚀 Recommended: 4-step inference**)
- [Wan2.1-Distill-Models](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
- [Wan2.2-Distill-Models](https://huggingface.co/lightx2v/Wan2.2-Distill-Models)
- [Wan2.1-Distill-Loras](https://huggingface.co/lightx2v/Wan2.1-Distill-Loras)
- [Wan2.2-Distill-Loras](https://huggingface.co/lightx2v/Wan2.2-Distill-Loras)

🔔 Follow our [HuggingFace page](https://huggingface.co/lightx2v) for the latest model releases from our team.

### Autoregressive Models
- [Wan2.1-T2V-CausVid](https://huggingface.co/lightx2v/Wan2.1-T2V-14B-CausVid)

💡 Refer to the [Model Structure Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/deploy_guides/model_structure.html) to quickly get started with LightX2V.

## 🚀 Frontend Interfaces

@@ -79,6 +76,7 @@
- **🔄 Parallel Inference Acceleration**: Multi-GPU parallel processing for significantly better performance
- **📱 Flexible Deployment Options**: Supports Gradio, service-based deployment, ComfyUI, and other deployment methods
- **🎛️ Dynamic Resolution Inference**: Adaptive resolution adjustment for optimized generation quality
- **🎞️ Video Frame Interpolation**: RIFE-based frame interpolation for smooth frame-rate upscaling

## 🏆 Performance Benchmarks
@@ -16,11 +16,6 @@
        500,
        250
    ],
    "dit_quantized": true,
    "dit_quant_scheme": "fp8-sgl"
}
{
    "infer_steps": 4,
    "target_fps": 16,
    "video_duration": 16,
    "audio_sr": 16000,
    "target_video_length": 81,
    "target_height": 720,
    "target_width": 1280,
    "self_attn_1_type": "flash_attn3",
    "cross_attn_1_type": "flash_attn3",
    "cross_attn_2_type": "flash_attn3",
    "sample_guide_scale": 1,
    "sample_shift": 5,
    "enable_cfg": false,
    "cpu_offload": true,
    "offload_granularity": "block",
    "t5_cpu_offload": true,
    "offload_ratio_val": 1,
    "t5_offload_granularity": "block",
    "use_tiling_vae": true,
    "audio_encoder_cpu_offload": false,
    "audio_adapter_cpu_offload": false
}
@@ -9,10 +9,15 @@
    "sample_guide_scale": 5,
    "sample_shift": 5,
    "enable_cfg": true,
    "dit_quantized": true,
    "dit_quant_scheme": "fp8-q8f",
    "t5_quantized": true,
    "t5_quant_scheme": "fp8-q8f",
    "clip_quantized": true,
    "clip_quant_scheme": "fp8-q8f",
    "cpu_offload": true,
    "offload_granularity": "block",
    "t5_cpu_offload": false,
    "vae_cpu_offload": false,
    "clip_cpu_offload": false
}
@@ -12,8 +12,7 @@
    "t5_cpu_offload": true,
    "t5_offload_granularity": "block",
    "t5_quantized": true,
    "t5_quant_scheme": "fp8-sgl",
    "unload_modules": false,
    "use_tiling_vae": false
}
@@ -10,9 +10,15 @@
    "sample_guide_scale": 6,
    "sample_shift": 8,
    "enable_cfg": true,
    "dit_quantized": true,
    "dit_quant_scheme": "fp8-q8f",
    "t5_quantized": true,
    "t5_quant_scheme": "fp8-q8f",
    "clip_quantized": true,
    "clip_quant_scheme": "fp8-q8f",
    "cpu_offload": true,
    "offload_granularity": "block",
    "t5_cpu_offload": false,
    "vae_cpu_offload": false,
    "clip_cpu_offload": false
}
@@ -11,7 +11,14 @@
    "enable_cfg": true,
    "cpu_offload": true,
    "offload_granularity": "phase",
    "t5_cpu_offload": false,
    "clip_cpu_offload": false,
    "vae_cpu_offload": false,
    "use_tiling_vae": false,
    "dit_quantized": true,
    "dit_quant_scheme": "fp8-q8f",
    "t5_quantized": true,
    "t5_quant_scheme": "fp8-q8f",
    "clip_quantized": true,
    "clip_quant_scheme": "fp8-q8f"
}
@@ -11,12 +11,15 @@
    "sample_shift": 8,
    "enable_cfg": true,
    "cpu_offload": true,
    "offload_granularity": "phase",
    "t5_cpu_offload": false,
    "clip_cpu_offload": false,
    "vae_cpu_offload": false,
    "use_tiling_vae": false,
    "dit_quantized": true,
    "dit_quant_scheme": "fp8-q8f",
    "t5_quantized": true,
    "t5_quant_scheme": "fp8-q8f",
    "clip_quantized": true,
    "clip_quant_scheme": "fp8-q8f"
}
@@ -24,7 +24,7 @@
    "dit_quantized": true,
    "dit_quant_scheme": "fp8-q8f",
    "t5_quantized": true,
    "t5_quant_scheme": "fp8-q8f",
    "clip_quantized": true,
    "clip_quant_scheme": "fp8-q8f"
}
# Model Format and Loading Guide

## 📖 Overview

LightX2V is a flexible video generation inference framework that supports multiple model sources and formats, providing users with rich options:

- **Wan Official Models**: Directly compatible with the officially released complete models from Wan2.1 and Wan2.2
- **Single-File Models**: Supports single-file format models released by LightX2V (including quantized versions)
- **LoRA Models**: Supports loading distilled LoRAs released by LightX2V

This document provides detailed instructions on how to use the various model formats, configuration parameters, and best practices.

---

## 🗂️ Format 1: Wan Official Models

### Model Repositories
- [Wan2.1 Collection](https://huggingface.co/collections/Wan-AI/wan21-68ac4ba85372ae5a8e282a1b)
- [Wan2.2 Collection](https://huggingface.co/collections/Wan-AI/wan22-68ac4ae80a8b477e79636fc8)

### Model Features
- **Official Guarantee**: Complete models officially released by Wan-AI with the highest quality
- **Complete Components**: Include all necessary components (DIT, T5, CLIP, VAE)
- **Original Precision**: Use BF16/FP32 precision with no quantization loss
- **Strong Compatibility**: Fully compatible with the Wan official toolchain

### Wan2.1 Official Models

#### Directory Structure

Using [Wan2.1-I2V-14B-720P](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P) as an example:

```
Wan2.1-I2V-14B-720P/
├── diffusion_pytorch_model-00001-of-00007.safetensors       # DIT model shard 1
├── diffusion_pytorch_model-00002-of-00007.safetensors       # DIT model shard 2
├── diffusion_pytorch_model-00003-of-00007.safetensors       # DIT model shard 3
├── diffusion_pytorch_model-00004-of-00007.safetensors       # DIT model shard 4
├── diffusion_pytorch_model-00005-of-00007.safetensors       # DIT model shard 5
├── diffusion_pytorch_model-00006-of-00007.safetensors       # DIT model shard 6
├── diffusion_pytorch_model-00007-of-00007.safetensors       # DIT model shard 7
├── diffusion_pytorch_model.safetensors.index.json           # Shard index file
├── models_t5_umt5-xxl-enc-bf16.pth                           # T5 text encoder
├── models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth   # CLIP encoder
├── Wan2.1_VAE.pth                                            # VAE encoder/decoder
├── config.json                                               # Model configuration
├── xlm-roberta-large/                                        # CLIP tokenizer
├── google/                                                   # T5 tokenizer
├── assets/
└── examples/
```

#### Usage

```bash
# Download model
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P \
  --local-dir ./models/Wan2.1-I2V-14B-720P

# Configure launch script
model_path=./models/Wan2.1-I2V-14B-720P
lightx2v_path=/path/to/LightX2V

# Run inference
cd LightX2V/scripts
bash wan/run_wan_i2v.sh
```

### Wan2.2 Official Models

#### Directory Structure

Using [Wan2.2-I2V-A14B](https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B) as an example:

```
Wan2.2-I2V-A14B/
├── high_noise_model/                          # High-noise model directory
│   ├── diffusion_pytorch_model-00001-of-00009.safetensors
│   ├── diffusion_pytorch_model-00002-of-00009.safetensors
│   ├── ...
│   ├── diffusion_pytorch_model-00009-of-00009.safetensors
│   └── diffusion_pytorch_model.safetensors.index.json
├── low_noise_model/                           # Low-noise model directory
│   ├── diffusion_pytorch_model-00001-of-00009.safetensors
│   ├── diffusion_pytorch_model-00002-of-00009.safetensors
│   ├── ...
│   ├── diffusion_pytorch_model-00009-of-00009.safetensors
│   └── diffusion_pytorch_model.safetensors.index.json
├── models_t5_umt5-xxl-enc-bf16.pth             # T5 text encoder
├── Wan2.1_VAE.pth                              # VAE encoder/decoder
├── configuration.json                          # Model configuration
├── google/                                     # T5 tokenizer
├── assets/                                     # Example assets (optional)
└── examples/                                   # Example files (optional)
```

#### Usage

```bash
# Download model
huggingface-cli download Wan-AI/Wan2.2-I2V-A14B \
  --local-dir ./models/Wan2.2-I2V-A14B

# Configure launch script
model_path=./models/Wan2.2-I2V-A14B
lightx2v_path=/path/to/LightX2V

# Run inference
cd LightX2V/scripts
bash wan22/run_wan22_moe_i2v.sh
```

### Available Model List

#### Wan2.1 Official Model List

| Model Name | Download Link |
|------------|---------------|
| Wan2.1-I2V-14B-720P | [Link](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P) |
| Wan2.1-I2V-14B-480P | [Link](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P) |
| Wan2.1-T2V-14B | [Link](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) |
| Wan2.1-T2V-1.3B | [Link](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B) |
| Wan2.1-FLF2V-14B-720P | [Link](https://huggingface.co/Wan-AI/Wan2.1-FLF2V-14B-720P) |
| Wan2.1-VACE-14B | [Link](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B) |
| Wan2.1-VACE-1.3B | [Link](https://huggingface.co/Wan-AI/Wan2.1-VACE-1.3B) |

#### Wan2.2 Official Model List

| Model Name | Download Link |
|------------|---------------|
| Wan2.2-I2V-A14B | [Link](https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B) |
| Wan2.2-T2V-A14B | [Link](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B) |
| Wan2.2-TI2V-5B | [Link](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B) |
| Wan2.2-Animate-14B | [Link](https://huggingface.co/Wan-AI/Wan2.2-Animate-14B) |

### Usage Tips

> 💡 **Quantized Model Usage**: To use quantized models, refer to the [Model Conversion Script](https://github.com/ModelTC/LightX2V/blob/main/tools/convert/readme_zh.md) for conversion, or directly use the pre-converted quantized models described in Format 2 below.
>
> 💡 **Memory Optimization**: For devices with an RTX 4090 (24 GB) or less GPU memory, it is recommended to combine quantization with the CPU offload features (a minimal combined example follows below):
> - Quantization configuration: refer to the [Quantization Documentation](../method_tutorials/quantization.md)
> - CPU offload: refer to the [Parameter Offload Documentation](../method_tutorials/offload.md)
> - Wan2.1 configuration: refer to the [offload config files](https://github.com/ModelTC/LightX2V/tree/main/configs/offload)
> - Wan2.2 configuration: refer to the [wan22 config files](https://github.com/ModelTC/LightX2V/tree/main/configs/wan22) with the `4090` suffix
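
The sketch below illustrates, as a Python dict, roughly how such a low-VRAM setup combines 4-step distilled inference, FP8 quantization, and block-level CPU offload. It only mirrors keys that appear in the config fragments elsewhere in this commit; the exact values, quantization backend, and any additional fields should be taken from the official config files for your GPU.

```python
# Illustrative low-VRAM configuration sketch (not an official config file).
# Keys mirror the JSON config fragments shown in this commit; adjust to your hardware.
low_vram_config = {
    "infer_steps": 4,                # 4-step distilled inference
    "enable_cfg": False,             # CFG-distilled models run without classifier-free guidance
    "dit_quantized": True,
    "dit_quant_scheme": "fp8-q8f",   # FP8 DIT weights
    "t5_quantized": True,
    "t5_quant_scheme": "fp8-q8f",
    "clip_quantized": True,
    "clip_quant_scheme": "fp8-q8f",
    "cpu_offload": True,             # offload DIT weights to CPU between uses
    "offload_granularity": "block",  # or "phase" for tighter memory control
    "t5_cpu_offload": False,         # quantized text encoder kept on GPU in the 4090 configs
}
```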
---

## 🗂️ Format 2: LightX2V Single-File Models (Recommended)

### Model Repositories
- [Wan2.1-LightX2V](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
- [Wan2.2-LightX2V](https://huggingface.co/lightx2v/Wan2.2-Distill-Models)

### Model Features
- **Single-File Management**: A single safetensors file, easy to manage and deploy
- **Multi-Precision Support**: Original precision, FP8, INT8, and other precision versions
- **Distillation Acceleration**: Supports 4-step fast inference
- **Tool Compatibility**: Compatible with ComfyUI and other tools

**Examples**:
- `wan2.1_i2v_720p_lightx2v_4step.safetensors` - 720P I2V original precision
- `wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors` - 720P I2V FP8 quantization
- `wan2.1_i2v_480p_int8_lightx2v_4step.safetensors` - 480P I2V INT8 quantization
- ...

### Wan2.1 Single-File Models

#### Scenario A: Download a Single Model File

**Step 1: Select and Download a Model**

```bash
# Create the model directory
mkdir -p ./models/wan2.1_i2v_720p

# Download the 720P I2V 4-step model (original precision)
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
  --local-dir ./models/wan2.1_i2v_720p \
  --include "wan2.1_i2v_720p_lightx2v_4step.safetensors"
```

**Step 2: Configure the Launch Script**

```bash
# Set in the launch script (point to the directory containing the model file)
model_path=./models/wan2.1_i2v_720p
lightx2v_path=/path/to/LightX2V

# Run the script
cd LightX2V/scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
```

> 💡 **Tip**: When there is only one model file in the directory, LightX2V will load it automatically.
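
As a mental model of this auto-loading behavior (an illustrative sketch only, not LightX2V's actual loader code), resolving a single-file model directory can be pictured as:

```python
from pathlib import Path

def resolve_single_checkpoint(model_path: str) -> Path:
    """Illustrative only: pick the lone .safetensors file in a model directory."""
    candidates = sorted(Path(model_path).glob("*.safetensors"))
    if len(candidates) == 1:
        return candidates[0]  # unambiguous, so it can be loaded automatically
    raise ValueError(
        f"Expected exactly one .safetensors file in {model_path}, found {len(candidates)}; "
        "specify the checkpoint explicitly in the config (see Scenario B below)."
    )
```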
#### Scenario B: Download Multiple Model Files

When you download multiple models with different precisions into the same directory, you need to explicitly specify which model to use in the configuration file.

**Step 1: Download Multiple Models**

```bash
# Create the model directory
mkdir -p ./models/wan2.1_i2v_720p_multi

# Download the original precision model
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
  --local-dir ./models/wan2.1_i2v_720p_multi \
  --include "wan2.1_i2v_720p_lightx2v_4step.safetensors"

# Download the FP8 quantized model
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
  --local-dir ./models/wan2.1_i2v_720p_multi \
  --include "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors"

# Download the INT8 quantized model
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
  --local-dir ./models/wan2.1_i2v_720p_multi \
  --include "wan2.1_i2v_720p_int8_lightx2v_4step.safetensors"
```

**Directory Structure**:

```
wan2.1_i2v_720p_multi/
├── wan2.1_i2v_720p_lightx2v_4step.safetensors                   # Original precision
├── wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors   # FP8 quantization
├── wan2.1_i2v_720p_int8_lightx2v_4step.safetensors              # INT8 quantization
└── t5/clip/vae/config.json/xlm-roberta-large/google and other components   # Manually organized
```

**Step 2: Specify the Model in the Configuration File**

Edit the configuration file (e.g., `configs/distill/wan_i2v_distill_4step_cfg.json`):

```json
{
    // Use the original precision model
    "dit_original_ckpt": "./models/wan2.1_i2v_720p_multi/wan2.1_i2v_720p_lightx2v_4step.safetensors",

    // Or use the FP8 quantized model
    // "dit_quantized_ckpt": "./models/wan2.1_i2v_720p_multi/wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors",
    // "dit_quantized": true,
    // "dit_quant_scheme": "fp8-vllm",

    // Or use the INT8 quantized model
    // "dit_quantized_ckpt": "./models/wan2.1_i2v_720p_multi/wan2.1_i2v_720p_int8_lightx2v_4step.safetensors",
    // "dit_quantized": true,
    // "dit_quant_scheme": "int8-vllm",

    // Other configurations...
}
```

### Usage Tips

> 💡 **Configuration Parameter Description**:
> - **dit_original_ckpt**: Specifies the path to an original precision model (BF16/FP32/FP16)
> - **dit_quantized_ckpt**: Specifies the path to a quantized model (FP8/INT8); must be used together with the `dit_quantized` and `dit_quant_scheme` parameters

**Step 3: Start Inference**

```bash
cd LightX2V/scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
```
### Wan2.2 Single-File Models

#### Directory Structure Requirements

When using Wan2.2 single-file models, you need to manually create a specific directory structure:

```
wan2.2_models/
├── high_noise_model/                                            # High-noise model directory (required)
│   └── wan2.2_i2v_A14b_high_noise_lightx2v_4step.safetensors    # High-noise model file
├── low_noise_model/                                             # Low-noise model directory (required)
│   └── wan2.2_i2v_A14b_low_noise_lightx2v_4step.safetensors     # Low-noise model file
└── t5/vae/config.json/xlm-roberta-large/google and other components   # Manually organized
```

#### Scenario A: Only One Model File Per Directory

```bash
# Create the required subdirectories
mkdir -p ./models/wan2.2_models/high_noise_model
mkdir -p ./models/wan2.2_models/low_noise_model

# Download the high-noise model to its directory
huggingface-cli download lightx2v/Wan2.2-Distill-Models \
  --local-dir ./models/wan2.2_models/high_noise_model \
  --include "wan2.2_i2v_A14b_high_noise_scaled_fp8_e4m3_lightx2v_4step.safetensors"

# Download the low-noise model to its directory
huggingface-cli download lightx2v/Wan2.2-Distill-Models \
  --local-dir ./models/wan2.2_models/low_noise_model \
  --include "wan2.2_i2v_A14b_low_noise_scaled_fp8_e4m3_lightx2v_4step.safetensors"

# Configure the launch script (point to the parent directory)
model_path=./models/wan2.2_models
lightx2v_path=/path/to/LightX2V

# Run the script
cd LightX2V/scripts
bash wan22/run_wan22_moe_i2v_distill.sh
```

> 💡 **Tip**: When there is only one model file in each subdirectory, LightX2V will load it automatically.

#### Scenario B: Multiple Model Files Per Directory

When you place multiple models with different precisions in both the `high_noise_model/` and `low_noise_model/` directories, you need to explicitly specify them in the configuration file.

```bash
# Create the directories
mkdir -p ./models/wan2.2_models_multi/high_noise_model
mkdir -p ./models/wan2.2_models_multi/low_noise_model

# Download multiple versions of the high-noise model
huggingface-cli download lightx2v/Wan2.2-Distill-Models \
  --local-dir ./models/wan2.2_models_multi/high_noise_model \
  --include "wan2.2_i2v_A14b_high_noise_*.safetensors"

# Download multiple versions of the low-noise model
huggingface-cli download lightx2v/Wan2.2-Distill-Models \
  --local-dir ./models/wan2.2_models_multi/low_noise_model \
  --include "wan2.2_i2v_A14b_low_noise_*.safetensors"
```

**Directory Structure**:

```
wan2.2_models_multi/
├── high_noise_model/
│   ├── wan2.2_i2v_A14b_high_noise_lightx2v_4step.safetensors            # Original precision
│   ├── wan2.2_i2v_A14b_high_noise_fp8_e4m3_lightx2v_4step.safetensors   # FP8 quantization
│   └── wan2.2_i2v_A14b_high_noise_int8_lightx2v_4step.safetensors       # INT8 quantization
└── low_noise_model/
    ├── wan2.2_i2v_A14b_low_noise_lightx2v_4step.safetensors             # Original precision
    ├── wan2.2_i2v_A14b_low_noise_fp8_e4m3_lightx2v_4step.safetensors    # FP8 quantization
    └── wan2.2_i2v_A14b_low_noise_int8_lightx2v_4step.safetensors        # INT8 quantization
```

**Configuration File Settings**:

```json
{
    // Use the original precision models
    "high_noise_original_ckpt": "./models/wan2.2_models_multi/high_noise_model/wan2.2_i2v_A14b_high_noise_lightx2v_4step.safetensors",
    "low_noise_original_ckpt": "./models/wan2.2_models_multi/low_noise_model/wan2.2_i2v_A14b_low_noise_lightx2v_4step.safetensors",

    // Or use the FP8 quantized models
    // "high_noise_quantized_ckpt": "./models/wan2.2_models_multi/high_noise_model/wan2.2_i2v_A14b_high_noise_fp8_e4m3_lightx2v_4step.safetensors",
    // "low_noise_quantized_ckpt": "./models/wan2.2_models_multi/low_noise_model/wan2.2_i2v_A14b_low_noise_fp8_e4m3_lightx2v_4step.safetensors",
    // "dit_quantized": true,
    // "dit_quant_scheme": "fp8-vllm"

    // Or use the INT8 quantized models
    // "high_noise_quantized_ckpt": "./models/wan2.2_models_multi/high_noise_model/wan2.2_i2v_A14b_high_noise_int8_lightx2v_4step.safetensors",
    // "low_noise_quantized_ckpt": "./models/wan2.2_models_multi/low_noise_model/wan2.2_i2v_A14b_low_noise_int8_lightx2v_4step.safetensors",
    // "dit_quantized": true,
    // "dit_quant_scheme": "int8-vllm"
}
```

### Usage Tips

> 💡 **Configuration Parameter Description**:
> - **high_noise_original_ckpt** / **low_noise_original_ckpt**: Specify the paths to original precision models (BF16/FP32/FP16)
> - **high_noise_quantized_ckpt** / **low_noise_quantized_ckpt**: Specify the paths to quantized models (FP8/INT8); must be used together with the `dit_quantized` and `dit_quant_scheme` parameters

### Available Model List

#### Wan2.1 Single-File Model List

**Image-to-Video Models (I2V)**

| Filename | Precision | Description |
|----------|-----------|-------------|
| `wan2.1_i2v_480p_lightx2v_4step.safetensors` | BF16 | 4-step model, original precision |
| `wan2.1_i2v_480p_scaled_fp8_e4m3_lightx2v_4step.safetensors` | FP8 | 4-step model, FP8 quantization |
| `wan2.1_i2v_480p_int8_lightx2v_4step.safetensors` | INT8 | 4-step model, INT8 quantization |
| `wan2.1_i2v_480p_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors` | FP8 | 4-step model, ComfyUI format |
| `wan2.1_i2v_720p_lightx2v_4step.safetensors` | BF16 | 4-step model, original precision |
| `wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors` | FP8 | 4-step model, FP8 quantization |
| `wan2.1_i2v_720p_int8_lightx2v_4step.safetensors` | INT8 | 4-step model, INT8 quantization |
| `wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors` | FP8 | 4-step model, ComfyUI format |

**Text-to-Video Models (T2V)**

| Filename | Precision | Description |
|----------|-----------|-------------|
| `wan2.1_t2v_14b_lightx2v_4step.safetensors` | BF16 | 4-step model, original precision |
| `wan2.1_t2v_14b_scaled_fp8_e4m3_lightx2v_4step.safetensors` | FP8 | 4-step model, FP8 quantization |
| `wan2.1_t2v_14b_int8_lightx2v_4step.safetensors` | INT8 | 4-step model, INT8 quantization |
| `wan2.1_t2v_14b_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors` | FP8 | 4-step model, ComfyUI format |

#### Wan2.2 Single-File Model List

**Image-to-Video Models (I2V) - A14B Series**

| Filename | Precision | Description |
|----------|-----------|-------------|
| `wan2.2_i2v_A14b_high_noise_lightx2v_4step.safetensors` | BF16 | High-noise model, 4-step original precision |
| `wan2.2_i2v_A14b_high_noise_scaled_fp8_e4m3_lightx2v_4step.safetensors` | FP8 | High-noise model, 4-step FP8 quantization |
| `wan2.2_i2v_A14b_high_noise_int8_lightx2v_4step.safetensors` | INT8 | High-noise model, 4-step INT8 quantization |
| `wan2.2_i2v_A14b_low_noise_lightx2v_4step.safetensors` | BF16 | Low-noise model, 4-step original precision |
| `wan2.2_i2v_A14b_low_noise_scaled_fp8_e4m3_lightx2v_4step.safetensors` | FP8 | Low-noise model, 4-step FP8 quantization |
| `wan2.2_i2v_A14b_low_noise_int8_lightx2v_4step.safetensors` | INT8 | Low-noise model, 4-step INT8 quantization |

> 💡 **Usage Tips**:
> - Wan2.2 models use a dual-noise architecture, so both the high-noise and low-noise models must be downloaded
> - Refer to the "Wan2.2 Single-File Models" section above for detailed directory organization

---

## 🗂️ Format 3: LightX2V LoRA Models

LoRA (Low-Rank Adaptation) models provide a lightweight fine-tuning solution that enables customization for specific effects without modifying the base model.

### Model Repositories
- **Wan2.1 LoRA Models**: [lightx2v/Wan2.1-Distill-Loras](https://huggingface.co/lightx2v/Wan2.1-Distill-Loras)
- **Wan2.2 LoRA Models**: [lightx2v/Wan2.2-Distill-Loras](https://huggingface.co/lightx2v/Wan2.2-Distill-Loras)

### Usage Methods

#### Method 1: Offline Merging

Merge LoRA weights offline into the base model to generate a new, complete model file.

**Steps**:
Refer to the [Model Conversion Documentation](https://github.com/ModelTC/lightx2v/tree/main/tools/convert/readme_zh.md) for offline merging; a conceptual sketch of what the merge does is shown after the pros and cons below.

**Advantages**:
- ✅ No need to load LoRA during inference
- ✅ Better performance

**Disadvantages**:
- ❌ Requires additional storage space
- ❌ Switching between different LoRAs requires re-merging
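
As a conceptual sketch of what offline merging does (use the official conversion script in practice; the key-naming convention below is an assumption), each LoRA up/down pair is folded back into its base weight:

```python
from safetensors.torch import load_file, save_file

def merge_lora(base_path: str, lora_path: str, out_path: str, strength: float = 1.0):
    """Illustrative sketch of offline merging: W' = W + strength * (alpha / rank) * (up @ down)."""
    base = load_file(base_path)
    lora = load_file(lora_path)
    for key in list(lora.keys()):
        if not key.endswith("lora_down.weight"):
            continue
        down = lora[key].float()                                  # (rank, in_features)
        up = lora[key.replace("lora_down", "lora_up")].float()    # (out_features, rank)
        rank = down.shape[0]
        alpha_key = key.replace("lora_down.weight", "alpha")
        alpha = float(lora[alpha_key]) if alpha_key in lora else rank
        target = key.replace(".lora_down.weight", ".weight")      # assumed base-weight naming
        if target in base:
            merged = base[target].float() + strength * (alpha / rank) * (up @ down)
            base[target] = merged.to(base[target].dtype)
    save_file(base, out_path)
```

The result is a regular single-file checkpoint, which is why no LoRA loading is needed at inference time.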
#### Method 2: Online Loading

Dynamically load LoRA weights during inference without modifying the base model.

**LoRA Application Principle**:

```python
# LoRA weight application formula:
#   W' = W + (alpha / rank) * B @ A
# where B = up_proj (out_features, rank) and A = down_proj (rank, in_features)
if weights_dict["alpha"] is not None:
    lora_alpha = weights_dict["alpha"] / lora_down.shape[0]
elif alpha is not None:
    lora_alpha = alpha / lora_down.shape[0]
else:
    lora_alpha = 1.0
```
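
Combining this scaling with the `strength` parameter described in the configuration below, a single online application step can be sketched as follows (illustrative only, not the framework's actual implementation):

```python
import torch

def apply_lora(weight: torch.Tensor, lora_up: torch.Tensor, lora_down: torch.Tensor,
               strength: float = 1.0, alpha=None) -> torch.Tensor:
    """Illustrative: W' = W + strength * (alpha / rank) * (up @ down)."""
    rank = lora_down.shape[0]
    scale = (alpha / rank) if alpha is not None else 1.0   # matches the lora_alpha logic above
    delta = (lora_up.float() @ lora_down.float()).to(weight.dtype)
    return weight + strength * scale * delta
```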
**Configuration Method**:

**Wan2.1 LoRA Configuration**:

```json
{
    "lora_configs": [
        {
            "path": "wan2.1_i2v_lora_rank64_lightx2v_4step.safetensors",
            "strength": 1.0,
            "alpha": null
        }
    ]
}
```

**Wan2.2 LoRA Configuration**:

Since Wan2.2 uses a dual-model architecture (high-noise/low-noise), LoRA needs to be configured separately for both models:

```json
{
    "lora_configs": [
        {
            "name": "low_noise_model",
            "path": "wan2.2_i2v_A14b_low_noise_lora_rank64_lightx2v_4step.safetensors",
            "strength": 1.0,
            "alpha": null
        },
        {
            "name": "high_noise_model",
            "path": "wan2.2_i2v_A14b_high_noise_lora_rank64_lightx2v_4step.safetensors",
            "strength": 1.0,
            "alpha": null
        }
    ]
}
```

**Parameter Description**:

| Parameter | Description | Default |
|-----------|-------------|---------|
| `path` | LoRA model file path | Required |
| `strength` | LoRA strength coefficient, range [0.0, 1.0] | 1.0 |
| `alpha` | LoRA scaling factor; uses the model's built-in value when `null`, and defaults to 1 if there is no built-in value | null |
| `name` | (Wan2.2 only) Specifies which model the LoRA applies to | Required |

**Advantages**:
- ✅ Flexible switching between different LoRAs
- ✅ Saves storage space
- ✅ LoRA strength can be adjusted dynamically

**Disadvantages**:
- ❌ Additional loading time during inference
- ❌ Slightly increased memory usage

---

## 📚 Related Resources

### Official Repositories
- [LightX2V GitHub](https://github.com/ModelTC/LightX2V)
- [LightX2V Single-File Model Repository](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
- [Wan-AI Official Model Repository](https://huggingface.co/Wan-AI)

### Model Download Links

**Wan2.1 Series**
- [Wan2.1 Collection](https://huggingface.co/collections/Wan-AI/wan21-68ac4ba85372ae5a8e282a1b)

**Wan2.2 Series**
- [Wan2.2 Collection](https://huggingface.co/collections/Wan-AI/wan22-68ac4ae80a8b477e79636fc8)

**LightX2V Single-File Models**
- [Wan2.1-Distill-Models](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
- [Wan2.2-Distill-Models](https://huggingface.co/lightx2v/Wan2.2-Distill-Models)

### Documentation Links
- [Quantization Documentation](../method_tutorials/quantization.md)
- [Parameter Offload Documentation](../method_tutorials/offload.md)
- [Configuration File Examples](https://github.com/ModelTC/LightX2V/tree/main/configs)

---

Through this document, you should be able to:

✅ Understand all model formats supported by LightX2V
✅ Select appropriate models and precisions based on your needs
✅ Correctly download and organize model files
✅ Configure launch parameters and successfully run inference
✅ Resolve common model loading issues

If you have other questions, feel free to ask in [GitHub Issues](https://github.com/ModelTC/LightX2V/issues).
# Parameter Offload

## 📖 Overview

LightX2V implements an advanced parameter offload mechanism designed for large-model inference under limited hardware resources. The system provides an excellent speed-memory balance by intelligently managing model weights across different memory tiers.

**Core Features:**
- **Block/Phase-Level Offload**: Manages model weights in block or phase units for optimal memory usage
  - **Block**: The basic computational unit of Transformer models, containing complete Transformer layers (self-attention, cross-attention, feed-forward networks, etc.); a larger memory-management unit
  - **Phase**: Finer-grained computational stages within a block, containing individual computational components (self-attention, cross-attention, feed-forward network, etc.); provides more precise memory control
- **Multi-Tier Storage Support**: GPU → CPU → disk hierarchy with intelligent caching
- **Asynchronous Operations**: Overlaps computation and data transfer using CUDA streams
- **Disk/NVMe Serialization**: Supports secondary storage when system memory is insufficient

## 🎯 Offload Strategies

### Strategy 1: GPU-CPU Block/Phase Offload

**Use Case**: Insufficient GPU memory but sufficient system memory.

**How It Works**: Manages model weights in block or phase units between GPU and CPU memory, using CUDA streams to overlap computation and data transfer. Blocks contain complete Transformer layers, while phases are individual computational components within blocks.

<div align="center">
<img alt="GPU-CPU block/phase offload workflow" src="https://raw.githubusercontent.com/ModelTC/LightX2V/main/assets/figs/offload/fig1_en.png" width="75%">
</div>

<div align="center">
<img alt="Swap operation" src="https://raw.githubusercontent.com/ModelTC/LightX2V/main/assets/figs/offload/fig2_en.png" width="75%">
</div>

<div align="center">
<img alt="Swap concept" src="https://raw.githubusercontent.com/ModelTC/LightX2V/main/assets/figs/offload/fig3_en.png" width="75%">
</div>

**Block vs Phase**:
- **Block granularity**: A larger memory-management unit containing complete Transformer layers (self-attention, cross-attention, feed-forward networks, etc.); suitable when memory is sufficient and reduces management overhead
- **Phase granularity**: Finer-grained management of individual computational components; suitable for memory-constrained scenarios and gives more flexible memory control

**Key Features:**
- **Asynchronous Transfer**: Uses three CUDA streams with different priorities to parallelize computation and transfer
  - Compute stream (priority=-1): high priority, runs the current computation
  - GPU load stream (priority=0): medium priority, prefetches weights from CPU to GPU
  - CPU load stream (priority=0): medium priority, offloads weights from GPU back to CPU
- **Prefetch Mechanism**: Preloads the next block/phase to the GPU in advance (see the sketch below)
- **Intelligent Caching**: Maintains a weight cache in CPU memory
- **Stream Synchronization**: Ensures correctness of data transfer and computation
- **Swap Operation**: Rotates block/phase positions after computation for continuous execution
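
A minimal sketch of the prefetch pattern described above, assuming PyTorch and placeholder block modules: prioritized CUDA streams overlap the next block's host-to-device copy with the current block's compute. This is illustrative only and omits the CPU load stream and cache management.

```python
import torch

compute_stream = torch.cuda.Stream(priority=-1)  # high priority: current block's compute
load_stream = torch.cuda.Stream(priority=0)      # medium priority: prefetch the next block

def run_blocks(blocks, x):
    """Run CPU-resident blocks on GPU, overlapping compute with weight prefetch (pinned memory helps)."""
    compute_stream.wait_stream(torch.cuda.current_stream())  # x may come from the default stream
    with torch.cuda.stream(load_stream):
        blocks[0].to("cuda", non_blocking=True)              # prefetch the first block
    for i, block in enumerate(blocks):
        compute_stream.wait_stream(load_stream)              # this block's weights must have arrived
        with torch.cuda.stream(compute_stream):
            x = block(x)                                      # compute the current block
        if i + 1 < len(blocks):
            with torch.cuda.stream(load_stream):
                blocks[i + 1].to("cuda", non_blocking=True)   # overlap: prefetch the next block
        compute_stream.synchronize()                          # finish compute before offloading
        block.to("cpu")                                       # offload the finished block back to CPU
    torch.cuda.current_stream().wait_stream(compute_stream)
    return x
```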
### Strategy 2: Disk-CPU-GPU Block/Phase Offload (Lazy Loading)

**Use Case**: Both GPU memory and system memory are insufficient.

**How It Works**: Builds on Strategy 1 by introducing a disk storage layer, forming a three-tier Disk → CPU → GPU hierarchy. The CPU continues to serve as a cache pool of configurable size, which suits devices with limited CPU memory.

<div align="center">
<img alt="Disk-CPU-GPU block/phase offload workflow" src="https://raw.githubusercontent.com/ModelTC/LightX2V/main/assets/figs/offload/fig4_en.png" width="75%">
</div>

<div align="center">
<img alt="Working steps" src="https://raw.githubusercontent.com/ModelTC/LightX2V/main/assets/figs/offload/fig5_en.png" width="75%">
</div>

**Key Features:**
- **Lazy Loading**: Model weights are loaded from disk on demand, avoiding loading the entire model at once
- **Intelligent Caching**: The CPU memory buffer uses a FIFO strategy with configurable size
- **Multi-Threaded Prefetch**: Uses multiple disk worker threads for parallel loading
- **Asynchronous Transfer**: Uses CUDA streams to overlap computation and data transfer
- **Swap Rotation**: Achieves continuous computation through position rotation, avoiding repeated loading/offloading

**Working Steps** (a sketch of the cache follows the list):
- **Disk Storage**: Model weights are stored on SSD/NVMe by block, one .safetensors file per block
- **Task Scheduling**: When a block/phase is needed, a priority task queue assigns disk worker threads
- **Asynchronous Loading**: Multiple disk threads load weight files from disk into the CPU memory buffer in parallel
- **Intelligent Caching**: The CPU memory buffer manages the cache with a FIFO strategy of configurable size
- **Cache Hit**: If weights are already cached, they are transferred directly to the GPU without a disk read
- **Prefetch Transfer**: Cached weights are asynchronously transferred to GPU memory (on the GPU load stream)
- **Compute Execution**: Weights on the GPU are used for computation (on the compute stream) while the next block/phase is prefetched in the background
- **Swap Rotation**: After computation completes, block/phase positions are rotated for continuous computation
- **Memory Management**: When the CPU cache is full, the earliest-loaded weight block/phase is evicted automatically (FIFO)
**Technical Features:**
- **On-demand Loading Mechanism**: Model weights loaded from disk only when needed, avoiding loading entire model at once
- **Configurable Cache Strategy**: CPU memory buffer supports FIFO strategy with dynamically adjustable size
- **Multi-threaded Parallel Loading**: Leverages multiple disk worker threads for parallel data loading
- **Asynchronous Transfer Optimization**: CUDA stream-based asynchronous data transfer for maximum hardware utilization
- **Continuous Computation Guarantee**: Achieves continuous computation through position rotation mechanism, avoiding repeated loading/unloading operations
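
The sketch below is a compact illustration of the lazy-loading loop just described, assuming per-block `.safetensors` files and a FIFO-bounded CPU cache. It is not LightX2V's actual implementation; the class, path, and block names are hypothetical.

```python
import threading
import queue
from collections import OrderedDict

import torch
from safetensors.torch import load_file  # one .safetensors file per block, as described above


class BlockCache:
    """Disk -> CPU cache (FIFO) -> GPU, fed by background disk worker threads."""

    def __init__(self, block_dir, max_blocks=4, num_disk_workers=2):
        self.block_dir = block_dir
        self.max_blocks = max_blocks              # rough stand-in for the `max_memory` bound
        self.cache = OrderedDict()                # FIFO: the oldest inserted block is evicted first
        self.lock = threading.Lock()
        self.tasks = queue.Queue()
        for _ in range(num_disk_workers):         # `num_disk_workers` background loader threads
            threading.Thread(target=self._disk_worker, daemon=True).start()

    def _disk_worker(self):
        while True:
            name = self.tasks.get()
            weights = load_file(f"{self.block_dir}/{name}.safetensors", device="cpu")
            with self.lock:
                if len(self.cache) >= self.max_blocks:
                    self.cache.popitem(last=False)    # evict the oldest cached block (FIFO)
                self.cache[name] = weights
            self.tasks.task_done()

    def prefetch(self, name):
        with self.lock:
            if name in self.cache:                # cache hit: no disk I/O needed
                return
        self.tasks.put(name)

    def to_gpu(self, name, load_stream):
        self.tasks.join()                         # simplification: wait for all queued disk reads
        with self.lock:
            weights = self.cache[name]            # assumes prefetch(name) was called earlier
        with torch.cuda.stream(load_stream):      # async H2D copy; truly overlaps if tensors are pinned
            return {k: v.to("cuda", non_blocking=True) for k, v in weights.items()}


# Hypothetical usage: prefetch the next block while the current one is still computing.
cache = BlockCache("/path/to/blocks", max_blocks=4, num_disk_workers=2)
load_stream = torch.cuda.Stream(priority=0)
cache.prefetch("block_0")
weights_on_gpu = cache.to_gpu("block_0", load_stream)
```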
## ⚙️ Configuration Parameters

### GPU-CPU Offload Configuration

```python
config = {
    "cpu_offload": True,              # Enable CPU offload
    "offload_ratio": 1.0,             # Offload ratio (0.0-1.0); 1.0 offloads everything
    "offload_granularity": "block",   # Offload granularity: "block" or "phase"
    "lazy_load": False,               # Disable lazy loading
}
```

### Disk-CPU-GPU Offload Configuration

```python
config = {
    "cpu_offload": True,              # Enable CPU offload
    "lazy_load": True,                # Enable lazy loading
    "offload_ratio": 1.0,             # Offload ratio
    "offload_granularity": "phase",   # Phase granularity is recommended for finer memory control
    "num_disk_workers": 2,            # Number of disk worker threads
    "offload_to_disk": True,          # Enable disk offload
}
```

**Intelligent Cache Key Parameters:**
- `max_memory`: Upper limit of the CPU cache size; affects cache hit rate and memory usage (see the snippet below)
- `num_disk_workers`: Number of disk loading threads; affects prefetch speed
- `offload_granularity`: Cache management granularity; affects cache efficiency and memory utilization
  - `"block"`: Cache managed in units of complete Transformer layers, suited to memory-sufficient environments
  - `"phase"`: Cache managed in units of individual computational components, suited to memory-constrained environments
**Offload Configuration for Non-DIT Model Components (T5, CLIP, VAE):**
The offload behavior of these components follows these rules:
- **Default Behavior**: If not configured separately, T5, CLIP, and VAE follow the `cpu_offload` setting
- **Independent Configuration**: Each component's offload strategy can also be set individually for fine-grained control
**Configuration Example**:
```json
{
"cpu_offload": true, // DIT model offload switch
"t5_cpu_offload": false, // T5 encoder independent setting
"clip_cpu_offload": false, // CLIP encoder independent setting
"vae_cpu_offload": false // VAE encoder independent setting
}
```
For memory-constrained devices, a progressive offload strategy is recommended:
1. **Step 1**: Enable only `cpu_offload`; keep `t5_cpu_offload`, `clip_cpu_offload`, and `vae_cpu_offload` disabled
2. **Step 2**: If memory is still insufficient, gradually enable CPU offload for T5, CLIP, and VAE
3. **Step 3**: If memory is still not enough, combine quantization with CPU offload or enable `lazy_load`

**Practical Experience**:
- **RTX 4090 24GB + 14B Model**: Usually it is enough to enable `cpu_offload`, explicitly set the other components' offload options to `false`, and use the FP8 quantized version (see the example config below)
- **GPUs with Less Memory**: Combine quantization, CPU offload, and lazy loading
- **Quantization Schemes**: Refer to the [Quantization Documentation](../method_tutorials/quantization.md) to select an appropriate quantization strategy
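
As a concrete illustration of the RTX 4090 recommendation above, the config sketch below combines DIT CPU offload with FP8 quantization while keeping the encoder offloads disabled. It is assembled from the keys documented on this page, not a file shipped with LightX2V; adjust paths and schemes to your setup.

```json
{
    "cpu_offload": true,              // offload the DIT blocks/phases to CPU
    "offload_granularity": "block",
    "offload_ratio": 1.0,
    "lazy_load": false,

    "t5_cpu_offload": false,          // keep the encoders resident on GPU first
    "clip_cpu_offload": false,
    "vae_cpu_offload": false,

    "dit_quantized": true,            // FP8 DIT weights (see the quantization chapter below)
    "dit_quant_scheme": "fp8-sgl"
}
```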
**Configuration File Reference**:
- **Wan2.1 Series Models**: Refer to the [offload config files](https://github.com/ModelTC/lightx2v/tree/main/configs/offload)
- **Wan2.2 Series Models**: Refer to the [wan22 config files](https://github.com/ModelTC/lightx2v/tree/main/configs/wan22) with the `4090` suffix

## 🎯 Usage Recommendations

- 🔄 GPU-CPU Block/Phase Offload: Suitable for insufficient GPU memory (RTX 3090/4090 24G) but sufficient system memory (>64/128G)
- 💾 Disk-CPU-GPU Block/Phase Offload: Suitable for both insufficient GPU memory (RTX 3060/4090 8G) and insufficient system memory (16/32G)
- 🚫 No Offload: Suitable for high-end hardware configurations pursuing the best performance

## 🔍 Troubleshooting

### Common Issues and Solutions

1. **Disk I/O Bottleneck**
   - Solution: Use an NVMe SSD, increase `num_disk_workers`

2. **Memory Buffer Overflow**
   - Solution: Increase `max_memory` or reduce `num_disk_workers`

3. **Loading Timeout**
   - Solution: Check disk performance, optimize the file system

**Note**: This offload mechanism is designed specifically for LightX2V; it makes full use of the asynchronous computing capabilities of modern hardware and significantly lowers the hardware threshold for large-model inference.
# Model Quantization Techniques

## 📖 Overview

LightX2V supports quantized inference for the DIT, T5, and CLIP models, reducing memory usage and improving inference speed by lowering model precision.

---

## 🔧 Quantization Modes

| Quantization Mode | Weight Quantization | Activation Quantization | Compute Kernel | Supported Hardware |
|-------------------|---------------------|-------------------------|----------------|--------------------|
| `fp8-vllm` | FP8 channel symmetric | FP8 channel dynamic symmetric | [VLLM](https://github.com/vllm-project/vllm) | H100/H200/H800, RTX 40 series, etc. |
| `int8-vllm` | INT8 channel symmetric | INT8 channel dynamic symmetric | [VLLM](https://github.com/vllm-project/vllm) | A100/A800, RTX 30/40 series, etc. |
| `fp8-sgl` | FP8 channel symmetric | FP8 channel dynamic symmetric | [SGL](https://github.com/sgl-project/sglang/tree/main/sgl-kernel) | H100/H200/H800, RTX 40 series, etc. |
| `int8-sgl` | INT8 channel symmetric | INT8 channel dynamic symmetric | [SGL](https://github.com/sgl-project/sglang/tree/main/sgl-kernel) | A100/A800, RTX 30/40 series, etc. |
| `fp8-q8f` | FP8 channel symmetric | FP8 channel dynamic symmetric | [Q8-Kernels](https://github.com/KONAKONA666/q8_kernels) | RTX 40 series, L40S, etc. |
| `int8-q8f` | INT8 channel symmetric | INT8 channel dynamic symmetric | [Q8-Kernels](https://github.com/KONAKONA666/q8_kernels) | RTX 40 series, L40S, etc. |
| `int8-torchao` | INT8 channel symmetric | INT8 channel dynamic symmetric | [TorchAO](https://github.com/pytorch/ao) | A100/A800, RTX 30/40 series, etc. |
| `int4-g128-marlin` | INT4 group symmetric | FP16 | [Marlin](https://github.com/IST-DASLab/marlin) | H200/H800/A100/A800, RTX 30/40 series, etc. |
| `fp8-b128-deepgemm` | FP8 block symmetric | FP8 group symmetric | [DeepGemm](https://github.com/deepseek-ai/DeepGEMM) | H100/H200/H800, RTX 40 series, etc. |
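
To make the "channel symmetric" entries in the table above concrete, here is a small, framework-agnostic sketch of W8A8 symmetric quantization: per-output-channel scales for weights and per-token dynamic scales for activations. It illustrates the arithmetic only and is not the code path used by the kernels listed above, which may differ in scale granularity and data layout.

```python
import torch

def quantize_per_channel_sym_int8(w: torch.Tensor):
    """Symmetric per-output-channel INT8 quantization of a (out_features, in_features) weight."""
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0   # one scale per output channel
    w_q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return w_q, scale

def quantize_per_token_sym_int8(x: torch.Tensor):
    """Symmetric per-token dynamic INT8 quantization of activations (tokens, in_features)."""
    scale = x.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0   # one scale per token
    x_q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return x_q, scale

# A W8A8 matmul accumulates in higher precision, then rescales by both sets of scales.
w = torch.randn(4096, 4096)
x = torch.randn(8, 4096)
w_q, w_scale = quantize_per_channel_sym_int8(w)
x_q, x_scale = quantize_per_token_sym_int8(x)
y = (x_q.double() @ w_q.double().t()) * x_scale.double() * w_scale.t().double()
y_ref = (x @ w.t()).double()
print((y - y_ref).abs().max())   # small residual: pure quantization error
```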
## 🔧 Obtaining Quantized Models

### Method 1: Download Pre-Quantized Models

Download pre-quantized models from the LightX2V model repositories:

**DIT Models**

Download pre-quantized DIT models from [Wan2.1-Distill-Models](https://huggingface.co/lightx2v/Wan2.1-Distill-Models):

```bash
# Download the DIT FP8 quantized model
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
    --local-dir ./models \
    --include "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors"
```

**Encoder Models**

Download pre-quantized T5 and CLIP models from [Encoders-LightX2V](https://huggingface.co/lightx2v/Encoders-Lightx2v):

```bash
# Download the T5 FP8 quantized model
huggingface-cli download lightx2v/Encoders-Lightx2v \
    --local-dir ./models \
    --include "models_t5_umt5-xxl-enc-fp8.pth"

# Download the CLIP FP8 quantized model
huggingface-cli download lightx2v/Encoders-Lightx2v \
    --local-dir ./models \
    --include "models_clip_open-clip-xlm-roberta-large-vit-huge-14-fp8.pth"
```

### Method 2: Self-Quantize Models

For detailed quantization tool usage, refer to the [Model Conversion Documentation](https://github.com/ModelTC/lightx2v/tree/main/tools/convert/readme_zh.md)

---
## 🚀 Using Quantized Models

### DIT Model Quantization

#### Supported Quantization Modes

DIT quantization modes (`dit_quant_scheme`) support: `fp8-vllm`, `int8-vllm`, `fp8-sgl`, `int8-sgl`, `fp8-q8f`, `int8-q8f`, `int8-torchao`, `int4-g128-marlin`, `fp8-b128-deepgemm`

#### Configuration Example

```json
{
    "dit_quantized": true,
    "dit_quant_scheme": "fp8-sgl",
    "dit_quantized_ckpt": "/path/to/dit_quantized_model" // Optional
}
```

> 💡 **Tip**: When there is only one DIT model in the script's `model_path`, `dit_quantized_ckpt` does not need to be specified separately.
### T5 Model Quantization

#### Supported Quantization Modes

T5 quantization modes (`t5_quant_scheme`) support: `int8-vllm`, `fp8-sgl`, `int8-q8f`, `fp8-q8f`, `int8-torchao`

#### Configuration Example

```json
{
    "t5_quantized": true,
    "t5_quant_scheme": "fp8-sgl",
    "t5_quantized_ckpt": "/path/to/t5_quantized_model" // Optional
}
```

> 💡 **Tip**: When a T5 quantized model exists in the script's specified `model_path` (such as `models_t5_umt5-xxl-enc-fp8.pth` or `models_t5_umt5-xxl-enc-int8.pth`), `t5_quantized_ckpt` does not need to be specified separately.
### CLIP Model Quantization

#### Supported Quantization Modes

CLIP quantization modes (`clip_quant_scheme`) support: `int8-vllm`, `fp8-sgl`, `int8-q8f`, `fp8-q8f`, `int8-torchao`

#### Configuration Example

```json
{
    "clip_quantized": true,
    "clip_quant_scheme": "fp8-sgl",
    "clip_quantized_ckpt": "/path/to/clip_quantized_model" // Optional
}
```

> 💡 **Tip**: When a CLIP quantized model exists in the script's specified `model_path` (such as `models_clip_open-clip-xlm-roberta-large-vit-huge-14-fp8.pth` or `models_clip_open-clip-xlm-roberta-large-vit-huge-14-int8.pth`), `clip_quantized_ckpt` does not need to be specified separately.
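
To run inference with one of these configs, the pattern used throughout these docs is to set the script variables and launch the corresponding run script, which passes the quantization config to the runner via `--config_json`. The paths below are placeholders and the script name is one of the examples shipped under `scripts/`:

```bash
# Placeholder paths; pick the run script that matches your model and task.
cd LightX2V/scripts

# Inside the script, set:
#   lightx2v_path=/path/to/LightX2V
#   model_path=/path/to/your/model
# and point --config_json at a quantization config, e.g. configs/quantization/wan_i2v.json

bash wan/run_wan_i2v_distill_4step_cfg.sh
```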
### Performance Optimization Strategy

If memory is still insufficient, you can combine parameter offloading to further reduce memory usage; refer to the [Parameter Offload Documentation](../method_tutorials/offload.md). A combined quantization config covering all three components is sketched after these references:
> - **Wan2.1 Configuration**: Refer to the [offload config files](https://github.com/ModelTC/LightX2V/tree/main/configs/offload)
> - **Wan2.2 Configuration**: Refer to the [wan22 config files](https://github.com/ModelTC/LightX2V/tree/main/configs/wan22) with the `4090` suffix
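
For reference, the quantization options for the three components can be combined in a single config. The sketch below simply merges the per-component examples above; checkpoint paths are omitted (they are optional, as noted in the tips), and mixing schemes, e.g. an FP8 DIT with INT8 encoders, is also possible as long as the hardware supports them.

```json
{
    "dit_quantized": true,
    "dit_quant_scheme": "fp8-sgl",

    "t5_quantized": true,
    "t5_quant_scheme": "fp8-sgl",

    "clip_quantized": true,
    "clip_quant_scheme": "fp8-sgl",

    "cpu_offload": true   // optionally combine with parameter offloading as described above
}
```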
---

## 📚 Related Resources

### Configuration File Examples
- [INT8 Quantization Config](https://github.com/ModelTC/LightX2V/blob/main/configs/quantization/wan_i2v.json)
- [Q8F Quantization Config](https://github.com/ModelTC/LightX2V/blob/main/configs/quantization/wan_i2v_q8f.json)
- [TorchAO Quantization Config](https://github.com/ModelTC/LightX2V/blob/main/configs/quantization/wan_i2v_torchao.json)

### Run Scripts
- [Quantization Inference Scripts](https://github.com/ModelTC/LightX2V/tree/main/scripts/quantization)

### Tool Documentation
- [Quantization Tool Documentation](https://github.com/ModelTC/lightx2v/tree/main/tools/convert/readme_zh.md)
- [LightCompress Quantization Documentation](https://github.com/ModelTC/llmc/blob/main/docs/zh_cn/source/backend/lightx2v.md)

### Model Repositories
- [Wan2.1-LightX2V Quantized Models](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
- [Wan2.2-LightX2V Quantized Models](https://huggingface.co/lightx2v/Wan2.2-Distill-Models)
- [Encoders Quantized Models](https://huggingface.co/lightx2v/Encoders-Lightx2v)

---

Through this document, you should be able to:

✅ Understand the quantization schemes supported by LightX2V
✅ Select an appropriate quantization strategy for your hardware
✅ Configure quantization parameters correctly
✅ Obtain and use quantized models
✅ Optimize inference performance and memory usage

If you have other questions, feel free to ask in [GitHub Issues](https://github.com/ModelTC/LightX2V/issues).
# 模型结构介绍 # 模型格式与加载指南
## 📖 概述 ## 📖 概述
本文档全面介绍 LightX2V 项目的模型目录结构,旨在帮助用户高效组织模型文件,实现便捷的使用体验。通过科学的目录组织方式,用户可以享受"一键启动"的便利,无需手动配置复杂的路径参数。同时,系统也支持灵活的手动路径配置,满足不同用户群体的多样化需求。 LightX2V 是一个灵活的视频生成推理框架,支持多种模型来源和格式,为用户提供丰富的选择:
## 🗂️ 模型目录结构 -**Wan 官方模型**:直接兼容 Wan2.1 和 Wan2.2 官方发布的完整模型
-**单文件模型**:支持 LightX2V 发布的单文件格式模型(包含量化版本)
-**LoRA 模型**:支持加载 LightX2V 发布的蒸馏 LoRA
### LightX2V 官方模型列表 本文档将详细介绍各种模型格式的使用方法、配置参数和最佳实践。
查看所有可用模型:[LightX2V 官方模型仓库](https://huggingface.co/lightx2v) ---
### 标准目录结构 ## 🗂️ 格式一:Wan 官方模型
`Wan2.1-I2V-14B-480P-LightX2V` 为例,标准文件结构如下: ### 模型仓库
- [Wan2.1 Collection](https://huggingface.co/collections/Wan-AI/wan21-68ac4ba85372ae5a8e282a1b)
- [Wan2.2 Collection](https://huggingface.co/collections/Wan-AI/wan22-68ac4ae80a8b477e79636fc8)
``` ### 模型特点
Wan2.1-I2V-14B-480P-LightX2V/ - **官方保证**:Wan-AI 官方发布的完整模型,质量最高
├── fp8/ # FP8 量化版本 (DIT/T5/CLIP) - **完整组件**:包含所有必需的组件(DIT、T5、CLIP、VAE)
│ ├── block_xx.safetensors # DIT 模型 FP8 量化版本 - **原始精度**:使用 BF16/FP32 精度,无量化损失
│ ├── models_t5_umt5-xxl-enc-fp8.pth # T5 编码器 FP8 量化版本 - **兼容性强**:与 Wan 官方工具链完全兼容
│ ├── clip-fp8.pth # CLIP 编码器 FP8 量化版本
│ ├── Wan2.1_VAE.pth # VAE 变分自编码器 ### Wan2.1 官方模型
│ ├── taew2_1.pth # 轻量级 VAE (可选)
│ └── config.json # 模型配置文件
├── int8/ # INT8 量化版本 (DIT/T5/CLIP)
│ ├── block_xx.safetensors # DIT 模型 INT8 量化版本
│ ├── models_t5_umt5-xxl-enc-int8.pth # T5 编码器 INT8 量化版本
│ ├── clip-int8.pth # CLIP 编码器 INT8 量化版本
│ ├── Wan2.1_VAE.pth # VAE 变分自编码器
│ ├── taew2_1.pth # 轻量级 VAE (可选)
│ └── config.json # 模型配置文件
├── original/ # 原始精度版本 (DIT/T5/CLIP)
│ ├── xx.safetensors # DIT 模型原始精度版本
│ ├── models_t5_umt5-xxl-enc-bf16.pth # T5 编码器原始精度版本
│ ├── models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth # CLIP 编码器原始精度版本
│ ├── Wan2.1_VAE.pth # VAE 变分自编码器
│ ├── taew2_1.pth # 轻量级 VAE (可选)
│ └── config.json # 模型配置文件
```
`Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V` 为例,标准文件结构如下: #### 目录结构
[Wan2.1-I2V-14B-720P](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P) 为例:
``` ```
Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/ Wan2.1-I2V-14B-720P/
├── distill_fp8/ # FP8 量化版本 (DIT/T5/CLIP) ├── diffusion_pytorch_model-00001-of-00007.safetensors # DIT 模型分片 1
│ ├── block_xx.safetensors # DIT 模型 FP8 量化版本 ├── diffusion_pytorch_model-00002-of-00007.safetensors # DIT 模型分片 2
│ ├── models_t5_umt5-xxl-enc-fp8.pth # T5 编码器 FP8 量化版本 ├── diffusion_pytorch_model-00003-of-00007.safetensors # DIT 模型分片 3
│ ├── clip-fp8.pth # CLIP 编码器 FP8 量化版本 ├── diffusion_pytorch_model-00004-of-00007.safetensors # DIT 模型分片 4
│ ├── Wan2.1_VAE.pth # VAE 变分自编码器 ├── diffusion_pytorch_model-00005-of-00007.safetensors # DIT 模型分片 5
│ ├── taew2_1.pth # 轻量级 VAE (可选) ├── diffusion_pytorch_model-00006-of-00007.safetensors # DIT 模型分片 6
│ └── config.json # 模型配置文件 ├── diffusion_pytorch_model-00007-of-00007.safetensors # DIT 模型分片 7
├── distill_int8/ # INT8 量化版本 (DIT/T5/CLIP) ├── diffusion_pytorch_model.safetensors.index.json # 分片索引文件
│ ├── block_xx.safetensors # DIT 模型 INT8 量化版本 ├── models_t5_umt5-xxl-enc-bf16.pth # T5 文本编码器
│ ├── models_t5_umt5-xxl-enc-int8.pth # T5 编码器 INT8 量化版本 ├── models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth # CLIP 编码器
│ ├── clip-int8.pth # CLIP 编码器 INT8 量化版本 ├── Wan2.1_VAE.pth # VAE 编解码器
│ ├── Wan2.1_VAE.pth # VAE 变分自编码器 ├── config.json # 模型配置
│ ├── taew2_1.pth # 轻量级 VAE (可选) ├── xlm-roberta-large/ # CLIP tokenizer
│ └── config.json # 模型配置文件 ├── google/ # T5 tokenizer
├── distill_models/ # 原始精度版本 (DIT/T5/CLIP) ├── assets/
│ ├── distill_model.safetensors # DIT 模型原始精度版本 └── examples/
│ ├── models_t5_umt5-xxl-enc-bf16.pth # T5 编码器原始精度版本
│ ├── models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth # CLIP 编码器原始精度版本
│ ├── Wan2.1_VAE.pth # VAE 变分自编码器
│ ├── taew2_1.pth # 轻量级 VAE (可选)
│ └── config.json # 模型配置文件
├── loras/
│ ├── Wan21_I2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors # 蒸馏模型lora
``` ```
### 💾 存储建议 #### 使用方法
**强烈建议将模型文件存储在 SSD 固态硬盘上**,此举可显著提升模型加载速度和推理性能。
**推荐存储路径**
```bash ```bash
/mnt/ssd/models/ # 独立 SSD 挂载点 # 下载模型
/data/ssd/models/ # 数据 SSD 目录 huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P \
/opt/models/ # 系统优化目录 --local-dir ./models/Wan2.1-I2V-14B-720P
```
### 量化版本说明 # 配置启动脚本
model_path=./models/Wan2.1-I2V-14B-720P
lightx2v_path=/path/to/LightX2V
每个模型均包含多个量化版本,适配不同硬件配置需求: # 运行推理
- **FP8 版本**:适用于支持 FP8 的 GPU(如 H100、A100、RTX 40系列),提供最佳性能表现 cd LightX2V/scripts
- **INT8 版本**:适用于大多数 GPU,在性能和兼容性间取得平衡,内存占用减少约50% bash wan/run_wan_i2v.sh
- **原始精度版本**:适用于对精度要求极高的应用场景,提供最高质量输出 ```
### Wan2.2 官方模型
## 🚀 使用方法 #### 目录结构
### 环境准备 [Wan2.2-I2V-A14B](https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B) 为例:
#### 安装 Hugging Face CLI ```
Wan2.2-I2V-A14B/
├── high_noise_model/ # 高噪声模型目录
│ ├── diffusion_pytorch_model-00001-of-00009.safetensors
│ ├── diffusion_pytorch_model-00002-of-00009.safetensors
│ ├── ...
│ ├── diffusion_pytorch_model-00009-of-00009.safetensors
│ └── diffusion_pytorch_model.safetensors.index.json
├── low_noise_model/ # 低噪声模型目录
│ ├── diffusion_pytorch_model-00001-of-00009.safetensors
│ ├── diffusion_pytorch_model-00002-of-00009.safetensors
│ ├── ...
│ ├── diffusion_pytorch_model-00009-of-00009.safetensors
│ └── diffusion_pytorch_model.safetensors.index.json
├── models_t5_umt5-xxl-enc-bf16.pth # T5 文本编码器
├── Wan2.1_VAE.pth # VAE 编解码器
├── configuration.json # 模型配置
├── google/ # T5 tokenizer
├── assets/ # 示例资源(可选)
└── examples/ # 示例文件(可选)
```
在开始下载模型之前,请确保已正确安装 Hugging Face CLI: #### 使用方法
```bash ```bash
# 安装 huggingface_hub # 下载模型
pip install huggingface_hub huggingface-cli download Wan-AI/Wan2.2-I2V-A14B \
--local-dir ./models/Wan2.2-I2V-A14B
# 或者安装 huggingface-cli # 配置启动脚本
pip install huggingface-cli model_path=./models/Wan2.2-I2V-A14B
lightx2v_path=/path/to/LightX2V
# 登录 Hugging Face(可选,但强烈推荐) # 运行推理
huggingface-cli login cd LightX2V/scripts
bash wan22/run_wan22_moe_i2v.sh
``` ```
### 方式一:完整模型下载(推荐) ### 可用模型列表
**优势**:下载完整模型后,系统将自动识别所有组件路径,无需手动配置,使用体验更加便捷 #### Wan2.1 官方模型列表
#### 1. 下载完整模型 | 模型名称 | 下载链接 |
|---------|----------|
| Wan2.1-I2V-14B-720P | [链接](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P) |
| Wan2.1-I2V-14B-480P | [链接](https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P) |
| Wan2.1-T2V-14B | [链接](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) |
| Wan2.1-T2V-1.3B | [链接](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B) |
| Wan2.1-FLF2V-14B-720P | [链接](https://huggingface.co/Wan-AI/Wan2.1-FLF2V-14B-720P) |
| Wan2.1-VACE-14B | [链接](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B) |
| Wan2.1-VACE-1.3B | [链接](https://huggingface.co/Wan-AI/Wan2.1-VACE-1.3B) |
```bash #### Wan2.2 官方模型列表
# 使用 Hugging Face CLI 下载完整模型
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
--local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V
```
#### 2. 启动推理 | 模型名称 | 下载链接 |
|---------|----------|
| Wan2.2-I2V-A14B | [链接](https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B) |
| Wan2.2-T2V-A14B | [链接](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B) |
| Wan2.2-TI2V-5B | [链接](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B) |
| Wan2.2-Animate-14B | [链接](https://huggingface.co/Wan-AI/Wan2.2-Animate-14B) |
##### Bash 脚本启动 ### 使用提示
###### 场景一:使用全精度模型 > 💡 **量化模型使用**:如需使用量化模型,可参考[模型转换脚本](https://github.com/ModelTC/LightX2V/blob/main/tools/convert/readme_zh.md)进行转换,或直接使用下方格式二中的预转换量化模型
>
> 💡 **显存优化**:对于 RTX 4090 24GB 或更小显存的设备,建议结合量化技术和 CPU 卸载功能:
> - 量化配置:参考[量化技术文档](../method_tutorials/quantization.md)
> - CPU 卸载:参考[参数卸载文档](../method_tutorials/offload.md)
> - Wan2.1 配置:参考 [offload 配置文件](https://github.com/ModelTC/LightX2V/tree/main/configs/offload)
> - Wan2.2 配置:参考 [wan22 配置文件](https://github.com/ModelTC/LightX2V/tree/main/configs/wan22) 中以 `4090` 结尾的配置
修改[运行脚本](https://github.com/ModelTC/LightX2V/tree/main/scripts/wan/run_wan_i2v_distill_4step_cfg.sh)中的配置: ---
- `model_path`:设置为下载的模型路径 `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V`
- `lightx2v_path`:设置为 `LightX2V` 项目根目录路径
###### 场景二:使用量化模型 ## 🗂️ 格式二:LightX2V 单文件模型(推荐)
当使用完整模型时,如需启用量化功能,请在[配置文件](https://github.com/ModelTC/LightX2V/tree/main/configs/distill/wan_i2v_distill_4step_cfg.json)中添加以下配置: ### 模型仓库
- [Wan2.1-LightX2V](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
- [Wan2.2-LightX2V](https://huggingface.co/lightx2v/Wan2.2-Distill-Models)
```json ### 模型特点
{ - **单文件管理**:单个 safetensors 文件,易于管理和部署
"mm_config": { - **多精度支持**:提供原始精度、FP8、INT8 等多种精度版本
"mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm" - **蒸馏加速**:支持 4-step 快速推理
}, // DIT 模型量化方案 - **工具兼容**:兼容 ComfyUI 等其他工具
"t5_quantized": true, // 启用 T5 量化
"t5_quant_scheme": "fp8", // T5 量化模式 **示例**
"clip_quantized": true, // 启用 CLIP 量化 - `wan2.1_i2v_720p_lightx2v_4step.safetensors` - 720P 图生视频原始精度
"clip_quant_scheme": "fp8" // CLIP 量化模式 - `wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors` - 720P 图生视频 FP8 量化
} - `wan2.1_i2v_480p_int8_lightx2v_4step.safetensors` - 480P 图生视频 INT8 量化
``` - ...
> **重要提示**:各模型的量化配置可以灵活组合。量化路径无需手动指定,系统将自动定位各模型的量化版本。 ### Wan2.1 单文件模型
有关量化技术的详细说明,请参考[量化文档](../method_tutorials/quantization.md) #### 场景 A:下载单个模型文件
使用提供的 bash 脚本快速启动: **步骤 1:选择并下载模型**
```bash ```bash
cd LightX2V/scripts # 创建模型目录
bash wan/run_wan_t2v_distill_4step_cfg.sh mkdir -p ./models/wan2.1_i2v_720p
```
##### Gradio 界面启动 # 下载 720P 图生视频 FP8 量化模型
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
--local-dir ./models/wan2.1_i2v_720p \
--include "wan2.1_i2v_720p_lightx2v_4step.safetensors"
```
通过 Gradio 界面进行推理时,只需在启动时指定模型根目录路径,轻量级 VAE 等可通过前端界面按钮灵活选择: **步骤 2:配置启动脚本**
```bash ```bash
# 图像到视频推理 (I2V) # 在启动脚本中设置(指向包含模型文件的目录)
python gradio_demo_zh.py \ model_path=./models/wan2.1_i2v_720p
--model_path ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \ lightx2v_path=/path/to/LightX2V
--model_size 14b \
--task i2v \ # 运行脚本
--model_cls wan2.1_distill cd LightX2V/scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
``` ```
### 方式二:选择性下载 > 💡 **提示**:当目录下只有一个模型文件时,LightX2V 会自动加载该文件。
**优势**:仅下载所需的版本(量化或非量化),有效节省存储空间和下载时间 #### 场景 B:下载多个模型文件
#### 1. 选择性下载 当您下载了多个不同精度的模型到同一目录时,需要在配置文件中明确指定使用哪个模型。
```bash **步骤 1:下载多个模型**
# 使用 Hugging Face CLI 选择性下载非量化版本
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
--local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
--include "distill_models/*"
```
```bash ```bash
# 使用 Hugging Face CLI 选择性下载 FP8 量化版本 # 创建模型目录
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \ mkdir -p ./models/wan2.1_i2v_720p_multi
--local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
--include "distill_fp8/*" # 下载原始精度模型
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
--local-dir ./models/wan2.1_i2v_720p_multi \
--include "wan2.1_i2v_720p_lightx2v_4step.safetensors"
# 下载 FP8 量化模型
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
--local-dir ./models/wan2.1_i2v_720p_multi \
--include "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors"
# 下载 INT8 量化模型
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
--local-dir ./models/wan2.1_i2v_720p_multi \
--include "wan2.1_i2v_720p_int8_lightx2v_4step.safetensors"
``` ```
```bash **目录结构**
# 使用 Hugging Face CLI 选择性下载 INT8 量化版本
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
--local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
--include "distill_int8/*"
```
> **重要提示**:当启动推理脚本或Gradio时,`model_path` 参数仍需要指定为不包含 `--include` 的完整路径。例如:`model_path=./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V`,而不是 `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/distill_int8`。 ```
wan2.1_i2v_720p_multi/
├── wan2.1_i2v_720p_lightx2v_4step.safetensors # 原始精度
├── wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors # FP8 量化
└── wan2.1_i2v_720p_int8_lightx2v_4step.safetensors # INT8 量化
└── t5/clip/vae/config.json/xlm-roberta-large/google等其他组件 # 需要手动组织
```
#### 2. 启动推理 **步骤 2:在配置文件中指定模型**
**以只下载了FP8版本的模型为例:** 编辑配置文件(如 `configs/distill/wan_i2v_distill_4step_cfg.json`):
##### Bash 脚本启动 ```json
{
// 使用原始精度模型
"dit_original_ckpt": "./models/wan2.1_i2v_720p_multi/wan2.1_i2v_720p_lightx2v_4step.safetensors",
###### 场景一:使用 FP8 DIT + FP8 T5 + FP8 CLIP // 或使用 FP8 量化模型
// "dit_quantized_ckpt": "./models/wan2.1_i2v_720p_multi/wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors",
// "dit_quantized": true,
// "dit_quant_scheme": "fp8-vllm",
[运行脚本](https://github.com/ModelTC/LightX2V/tree/main/scripts/wan/run_wan_i2v_distill_4step_cfg.sh)中的 `model_path` 指定为您下载好的模型路径 `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/``lightx2v_path` 指定为您的 `LightX2V` 项目路径。 // 或使用 INT8 量化模型
// "dit_quantized_ckpt": "./models/wan2.1_i2v_720p_multi/wan2.1_i2v_720p_int8_lightx2v_4step.safetensors",
// "dit_quantized": true,
// "dit_quant_scheme": "int8-vllm",
仅需修改配置文件中的量化模型配置如下: // 其他配置...
```json
{
"mm_config": {
"mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm"
}, // DIT 的量化方案
"t5_quantized": true, // 是否使用 T5 量化版本
"t5_quant_scheme": "fp8", // T5 的量化模式
"clip_quantized": true, // 是否使用 CLIP 量化版本
"clip_quant_scheme": "fp8", // CLIP 的量化模式
} }
``` ```
### 使用提示
> **重要提示**:此时各模型只能指定为量化版本。量化路径无需手动指定,系统将自动定位各模型的量化版本。 > 💡 **配置参数说明**:
> - **dit_original_ckpt**:用于指定原始精度模型(BF16/FP32/FP16)的路径
> - **dit_quantized_ckpt**:用于指定量化模型(FP8/INT8)的路径,需配合 `dit_quantized` 和 `dit_quant_scheme` 参数使用
###### 场景二:使用 FP8 DIT + 原始精度 T5 + 原始精度 CLIP **步骤 3:启动推理**
[运行脚本](https://github.com/ModelTC/LightX2V/tree/main/scripts/wan/run_wan_i2v_distill_4step_cfg.sh)中的 `model_path` 指定为您下载好的模型路径 `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V``lightx2v_path` 指定为您的 `LightX2V` 项目路径。 ```bash
cd LightX2V/scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
```
由于仅下载了量化权重,需要手动下载 T5 和 CLIP 的原始精度版本,并在配置文件的 `t5_original_ckpt``clip_original_ckpt` 中配置如下: ### Wan2.2 单文件模型
```json
{ #### 目录结构要求
"mm_config": {
"mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm" 使用 Wan2.2 单文件模型时,需要手动创建特定的目录结构:
}, // DIT 的量化方案
"t5_original_ckpt": "/path/to/models_t5_umt5-xxl-enc-bf16.pth", ```
"clip_original_ckpt": "/path/to/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth" wan2.2_models/
} ├── high_noise_model/ # 高噪声模型目录(必须)
│ └── wan2.2_i2v_A14b_high_noise_lightx2v_4step.safetensors # 高噪声模型文件
└── low_noise_model/ # 低噪声模型目录(必须)
│ └── wan2.2_i2v_A14b_low_noise_lightx2v_4step.safetensors # 低噪声模型文件
└── t5/vae/config.json/xlm-roberta-large/google等其他组件 # 需要手动组织
``` ```
使用提供的 bash 脚本快速启动: #### 场景 A:每个目录下只有一个模型文件
```bash ```bash
# 创建必需的子目录
mkdir -p ./models/wan2.2_models/high_noise_model
mkdir -p ./models/wan2.2_models/low_noise_model
# 下载高噪声模型到对应目录
huggingface-cli download lightx2v/Wan2.2-Distill-Models \
--local-dir ./models/wan2.2_models/high_noise_model \
--include "wan2.2_i2v_A14b_high_noise_scaled_fp8_e4m3_lightx2v_4step.safetensors"
# 下载低噪声模型到对应目录
huggingface-cli download lightx2v/Wan2.2-Distill-Models \
--local-dir ./models/wan2.2_models/low_noise_model \
--include "wan2.2_i2v_A14b_low_noise_scaled_fp8_e4m3_lightx2v_4step.safetensors"
# 配置启动脚本(指向父目录)
model_path=./models/wan2.2_models
lightx2v_path=/path/to/LightX2V
# 运行脚本
cd LightX2V/scripts cd LightX2V/scripts
bash wan/run_wan_t2v_distill_4step_cfg.sh bash wan22/run_wan22_moe_i2v_distill.sh
``` ```
##### Gradio 界面启动 > 💡 **提示**:当每个子目录下只有一个模型文件时,LightX2V 会自动加载。
#### 场景 B:每个目录下有多个模型文件
通过 Gradio 界面进行推理时,启动时指定模型根目录路径: 当您在 `high_noise_model/``low_noise_model/` 目录下分别放置了多个不同精度的模型时,需要在配置文件中明确指定。
```bash ```bash
# 图像到视频推理 (I2V) # 创建目录
python gradio_demo_zh.py \ mkdir -p ./models/wan2.2_models_multi/high_noise_model
--model_path ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/ \ mkdir -p ./models/wan2.2_models_multi/low_noise_model
--model_size 14b \
--task i2v \ # 下载高噪声模型的多个版本
--model_cls wan2.1_distill huggingface-cli download lightx2v/Wan2.2-Distill-Models \
--local-dir ./models/wan2.2_models_multi/high_noise_model \
--include "wan2.2_i2v_A14b_high_noise_*.safetensors"
# 下载低噪声模型的多个版本
huggingface-cli download lightx2v/Wan2.2-Distill-Models \
--local-dir ./models/wan2.2_models_multi/low_noise_model \
--include "wan2.2_i2v_A14b_low_noise_*.safetensors"
``` ```
> **重要提示**:由于模型根目录下仅包含各模型的量化版本,前端使用时,对于 DIT/T5/CLIP 模型的量化精度只能选择 fp8。如需使用非量化版本的T5/CLIP,请手动下载非量化权重并放置到gradio_demo的model_path目录(`./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/`)下,此时T5/CLIP的量化精度可以选择bf16/fp16。 **目录结构**
### 方式三:手动配置
用户可根据实际需求灵活配置各个组件的量化选项和路径,实现量化与非量化组件的混合使用。请确保所需的模型权重已正确下载并放置在指定路径。 ```
wan2.2_models_multi/
├── high_noise_model/
│ ├── wan2.2_i2v_A14b_high_noise_lightx2v_4step.safetensors # 原始精度
│ ├── wan2.2_i2v_A14b_high_noise_fp8_e4m3_lightx2v_4step.safetensors # FP8 量化
│ └── wan2.2_i2v_A14b_high_noise_int8_lightx2v_4step.safetensors # INT8 量化
└── low_noise_model/
├── wan2.2_i2v_A14b_low_noise_lightx2v_4step.safetensors # 原始精度
├── wan2.2_i2v_A14b_low_noise_fp8_e4m3_lightx2v_4step.safetensors # FP8 量化
└── wan2.2_i2v_A14b_low_noise_int8_lightx2v_4step.safetensors # INT8 量化
```
#### DIT 模型配置 **配置文件设置**
```json ```json
{ {
"dit_quantized_ckpt": "/path/to/dit_quantized_ckpt", // DIT 量化权重路径 // 使用原始精度模型
"dit_original_ckpt": "/path/to/dit_original_ckpt", // DIT 原始精度权重路径 "high_noise_original_ckpt": "./models/wan2.2_models_multi/high_noise_model/wan2.2_i2v_A14b_high_noise_lightx2v_4step.safetensors",
"mm_config": { "low_noise_original_ckpt": "./models/wan2.2_models_multi/low_noise_model/wan2.2_i2v_A14b_low_noise_lightx2v_4step.safetensors",
"mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm" // DIT 矩阵乘算子类型,非量化时指定为 "Default"
} // 或使用 FP8 量化模型
// "high_noise_quantized_ckpt": "./models/wan2.2_models_multi/high_noise_model/wan2.2_i2v_A14b_high_noise_fp8_e4m3_lightx2v_4step.safetensors",
// "low_noise_quantized_ckpt": "./models/wan2.2_models_multi/low_noise_model/wan2.2_i2v_A14b_low_noise_fp8_e4m3_lightx2v_4step.safetensors",
// "dit_quantized": true,
// "dit_quant_scheme": "fp8-vllm"
// 或使用 INT8 量化模型
// "high_noise_quantized_ckpt": "./models/wan2.2_models_multi/high_noise_model/wan2.2_i2v_A14b_high_noise_int8_lightx2v_4step.safetensors",
// "low_noise_quantized_ckpt": "./models/wan2.2_models_multi/low_noise_model/wan2.2_i2v_A14b_low_noise_int8_lightx2v_4step.safetensors",
// "dit_quantized": true,
// "dit_quant_scheme": "int8-vllm"
} }
``` ```
#### T5 模型配置 ### 使用提示
```json > 💡 **配置参数说明**:
{ > - **high_noise_original_ckpt** / **low_noise_original_ckpt**:用于指定原始精度模型(BF16/FP32/FP16)的路径
"t5_quantized_ckpt": "/path/to/t5_quantized_ckpt", // T5 量化权重路径 > - **high_noise_quantized_ckpt** / **low_noise_quantized_ckpt**:用于指定量化模型(FP8/INT8)的路径,需配合 `dit_quantized` 和 `dit_quant_scheme` 参数使用
"t5_original_ckpt": "/path/to/t5_original_ckpt", // T5 原始精度权重路径
"t5_quantized": true, // 是否启用 T5 量化
"t5_quant_scheme": "fp8" // T5 量化模式,仅在 t5_quantized true 时生效 ### 可用模型列表
}
#### Wan2.1 单文件模型列表
**图生视频模型(I2V)**
| 文件名 | 精度 | 说明 |
|--------|------|------|
| `wan2.1_i2v_480p_lightx2v_4step.safetensors` | BF16 | 4步模型原始精度 |
| `wan2.1_i2v_480p_scaled_fp8_e4m3_lightx2v_4step.safetensors` | FP8 | 4步模型FP8 量化 |
| `wan2.1_i2v_480p_int8_lightx2v_4step.safetensors` | INT8 | 4步模型INT8 量化 |
| `wan2.1_i2v_480p_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors` | FP8 | 4步模型ComfyUI 格式 |
| `wan2.1_i2v_720p_lightx2v_4step.safetensors` | BF16 | 4步模型原始精度 |
| `wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors` | FP8 | 4步模型FP8 量化 |
| `wan2.1_i2v_720p_int8_lightx2v_4step.safetensors` | INT8 | 4步模型INT8 量化 |
| `wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors` | FP8 | 4步模型ComfyUI 格式 |
**文生视频模型(T2V)**
| 文件名 | 精度 | 说明 |
|--------|------|------|
| `wan2.1_t2v_14b_lightx2v_4step.safetensors` | BF16 | 4步模型原始精度 |
| `wan2.1_t2v_14b_scaled_fp8_e4m3_lightx2v_4step.safetensors` | FP8 | 4步模型FP8 量化 |
| `wan2.1_t2v_14b_int8_lightx2v_4step.safetensors` | INT8 | 4步模型INT8 量化 |
| `wan2.1_t2v_14b_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors` | FP8 | 4步模型ComfyUI 格式 |
#### Wan2.2 单文件模型列表
**图生视频模型(I2V)- A14B 系列**
| 文件名 | 精度 | 说明 |
|--------|------|------|
| `wan2.2_i2v_A14b_high_noise_lightx2v_4step.safetensors` | BF16 | 高噪声模型-4步原始精度 |
| `wan2.2_i2v_A14b_high_noise_scaled_fp8_e4m3_lightx2v_4step.safetensors` | FP8 | 高噪声模型-4步FP8量化 |
| `wan2.2_i2v_A14b_high_noise_int8_lightx2v_4step.safetensors` | INT8 | 高噪声模型-4步INT8量化 |
| `wan2.2_i2v_A14b_low_noise_lightx2v_4step.safetensors` | BF16 | 低噪声模型-4步原始精度 |
| `wan2.2_i2v_A14b_low_noise_scaled_fp8_e4m3_lightx2v_4step.safetensors` | FP8 | 低噪声模型-4步FP8量化 |
| `wan2.2_i2v_A14b_low_noise_int8_lightx2v_4step.safetensors` | INT8 | 低噪声模型-4步INT8量化 |
> 💡 **使用提示**:
> - Wan2.2 模型采用双噪声架构,需要同时下载高噪声(high_noise)和低噪声(low_noise)模型
> - 详细的目录组织方式请参考上方"Wan2.2 单文件模型"部分
---
## 🗂️ 格式三:LightX2V LoRA 模型
LoRA(Low-Rank Adaptation)模型提供了一种轻量级的模型微调方案,可以在不修改基础模型的情况下实现特定效果的定制化。
### 模型仓库
- **Wan2.1 LoRA 模型**[lightx2v/Wan2.1-Distill-Loras](https://huggingface.co/lightx2v/Wan2.1-Distill-Loras)
- **Wan2.2 LoRA 模型**[lightx2v/Wan2.2-Distill-Loras](https://huggingface.co/lightx2v/Wan2.2-Distill-Loras)
### 使用方式
#### 方式一:离线合并
将 LoRA 权重离线合并到基础模型中,生成新的完整模型文件。
**操作步骤**
参考 [模型转换文档](https://github.com/ModelTC/lightx2v/tree/main/tools/convert/readme_zh.md) 进行离线合并。
**优点**
- ✅ 推理时无需额外加载 LoRA
- ✅ 性能更优
**缺点**
- ❌ 需要额外存储空间
- ❌ 切换不同 LoRA 需要重新合并
#### 方式二:在线加载
在推理时动态加载 LoRA 权重,无需修改基础模型。
**LoRA 应用原理**
```python
# LoRA 权重应用公式
# W' = W + (alpha/rank) * B @ A
# 其中:B = up_proj (out_features, rank)
# A = down_proj (rank, in_features)
if weights_dict["alpha"] is not None:
lora_alpha = weights_dict["alpha"] / lora_down.shape[0]
elif alpha is not None:
lora_alpha = alpha / lora_down.shape[0]
else:
lora_alpha = 1.0
``` ```
#### CLIP 模型配置 **配置方法**
**Wan2.1 LoRA 配置**
```json ```json
{ {
"clip_quantized_ckpt": "/path/to/clip_quantized_ckpt", // CLIP 量化权重路径 "lora_configs": [
"clip_original_ckpt": "/path/to/clip_original_ckpt", // CLIP 原始精度权重路径 {
"clip_quantized": true, // 是否启用 CLIP 量化 "path": "wan2.1_i2v_lora_rank64_lightx2v_4step.safetensors",
"clip_quant_scheme": "fp8" // CLIP 量化模式,仅在 clip_quantized true 时生效 "strength": 1.0,
"alpha": null
}
]
} }
``` ```
#### VAE 模型配置 **Wan2.2 LoRA 配置**
由于 Wan2.2 采用双模型架构(高噪声/低噪声),需要分别为两个模型配置 LoRA:
```json ```json
{ {
"vae_pth": "/path/to/Wan2.1_VAE.pth", // 原始 VAE 模型路径 "lora_configs": [
"use_tiny_vae": true, // 是否使用轻量级 VAE {
"tiny_vae_path": "/path/to/taew2_1.pth" // 轻量级 VAE 模型路径 "name": "low_noise_model",
"path": "wan2.2_i2v_A14b_low_noise_lora_rank64_lightx2v_4step.safetensors",
"strength": 1.0,
"alpha": null
},
{
"name": "high_noise_model",
"path": "wan2.2_i2v_A14b_high_noise_lora_rank64_lightx2v_4step.safetensors",
"strength": 1.0,
"alpha": null
}
]
} }
``` ```
> **配置说明**: **参数说明**
> - 量化权重和原始精度权重可以灵活混合使用,系统将根据配置自动选择对应的模型
> - 量化模式的选择取决于您的硬件支持情况,建议在 H100/A100 等高端 GPU 上使用 FP8
> - 轻量级 VAE 可以显著提升推理速度,但可能略微影响生成质量
## 💡 最佳实践
### 推荐配置
**完整模型用户**
- 下载完整模型,享受自动路径查找的便利
- 仅需配置量化方案和组件开关
- 推荐使用 bash 脚本快速启动
**存储空间受限用户**
- 选择性下载所需的量化版本
- 灵活混合使用量化和原始精度组件
- 使用 bash 脚本简化启动流程
**高级用户**
- 完全手动配置路径,实现最大灵活性
- 支持模型文件分散存储
- 可自定义 bash 脚本参数
### 性能优化建议 | 参数 | 说明 | 默认值 |
|------|------|--------|
| `path` | LoRA 模型文件路径 | 必填 |
| `strength` | LoRA 强度系数,范围 [0.0, 1.0] | 1.0 |
| `alpha` | LoRA 缩放因子,`null` 时使用模型内置值,如果没有内置值默认1 | null |
| `name` | (仅 Wan2.2)指定应用到哪个模型 | 必填 |
- **使用 SSD 存储**:显著提升模型加载速度和推理性能 **优点**
- **选择合适的量化方案** - ✅ 灵活切换不同 LoRA
- FP8:适用于 H100/A100 等高端 GPU,精度高 - ✅ 节省存储空间
- INT8:适用于通用 GPU,内存占用小 - ✅ 可动态调整 LoRA 强度
- **启用轻量级 VAE**`use_tiny_vae: true` 可提升推理速度
- **合理配置 CPU 卸载**`t5_cpu_offload: true` 可节省 GPU 内存
### 下载优化建议 **缺点**
- ❌ 推理时需额外加载时间
- ❌ 略微增加显存占用
- **使用 Hugging Face CLI**:比 git clone 更稳定,支持断点续传 ---
- **选择性下载**:仅下载所需的量化版本,节省时间和存储空间
- **网络优化**:使用稳定的网络连接,必要时使用代理
- **断点续传**:使用 `--resume-download` 参数支持中断后继续下载
## 🚨 常见问题
### Q: 模型文件过大,下载速度缓慢怎么办?
A: 建议使用选择性下载方式,仅下载所需的量化版本,或使用国内镜像源
### Q: 启动时提示模型路径不存在?
A: 请检查模型是否已正确下载,验证路径配置是否正确,确认自动查找机制是否正常工作
### Q: 如何切换不同的量化方案? ## 📚 相关资源
A: 修改配置文件中的 `mm_type`, `t5_quant_scheme`,`clip_quant_scheme`等参数,请参考[量化文档](../method_tutorials/quantization.md)
### Q: 如何混合使用量化和原始精度组件? ### 官方仓库
A: 通过 `t5_quantized``clip_quantized` 参数控制,并手动指定原始精度路径 - [LightX2V GitHub](https://github.com/ModelTC/LightX2V)
- [LightX2V 单文件模型仓库](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
- [Wan-AI 官方模型仓库](https://huggingface.co/Wan-AI)
### Q: 配置文件中的路径如何设置? ### 模型下载链接
A: 推荐使用自动路径查找,如需手动配置请参考"手动配置"部分
### Q: 如何验证自动路径查找是否正常工作? **Wan2.1 系列**
A: 查看启动日志,代码将输出实际使用的模型路径 - [Wan2.1 Collection](https://huggingface.co/collections/Wan-AI/wan21-68ac4ba85372ae5a8e282a1b)
### Q: bash 脚本启动失败怎么办? **Wan2.2 系列**
A: 检查脚本中的路径配置是否正确,确保 `lightx2v_path``model_path` 变量已正确设置 - [Wan2.2 Collection](https://huggingface.co/collections/Wan-AI/wan22-68ac4ae80a8b477e79636fc8)
## 📚 相关链接 **LightX2V 单文件模型**
- [Wan2.1-Distill-Models](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
- [Wan2.2-Distill-Models](https://huggingface.co/lightx2v/Wan2.2-Distill-Models)
- [LightX2V 官方模型仓库](https://huggingface.co/lightx2v) ### 文档链接
- [Gradio 部署指南](./deploy_gradio.md) - [量化技术文档](../method_tutorials/quantization.md)
- [参数卸载文档](../method_tutorials/offload.md)
- [配置文件示例](https://github.com/ModelTC/LightX2V/tree/main/configs) - [配置文件示例](https://github.com/ModelTC/LightX2V/tree/main/configs)
--- ---
通过科学的模型文件组织和灵活的配置选项,LightX2V 支持多种使用场景。完整模型下载提供最大的便利性,选择性下载节省存储空间,手动配置提供最大的灵活性。自动路径查找机制确保用户无需记忆复杂的路径配置,同时保持系统的可扩展性。 通过本文档,您应该能够:
✅ 理解 LightX2V 支持的所有模型格式
✅ 根据需求选择合适的模型和精度
✅ 正确下载和组织模型文件
✅ 配置启动参数并成功运行推理
✅ 解决常见的模型加载问题
如有其他问题,欢迎在 [GitHub Issues](https://github.com/ModelTC/LightX2V/issues) 中提问。
...@@ -109,7 +109,6 @@ config = { ...@@ -109,7 +109,6 @@ config = {
"offload_granularity": "phase", # 推荐使用phase粒度 "offload_granularity": "phase", # 推荐使用phase粒度
"num_disk_workers": 2, # 磁盘工作线程数 "num_disk_workers": 2, # 磁盘工作线程数
"offload_to_disk": True, # 启用磁盘卸载 "offload_to_disk": True, # 启用磁盘卸载
"offload_path": ".", # 磁盘卸载路径
} }
``` ```
...@@ -120,7 +119,37 @@ config = { ...@@ -120,7 +119,37 @@ config = {
- `"block"`:以完整的Transformer层为单位进行缓存管理 - `"block"`:以完整的Transformer层为单位进行缓存管理
- `"phase"`:以单个计算组件为单位进行缓存管理 - `"phase"`:以单个计算组件为单位进行缓存管理
详细配置文件可参考[config](https://github.com/ModelTC/lightx2v/tree/main/configs/offload) **非 DIT 模型组件(T5、CLIP、VAE)的卸载配置:**
这些组件的卸载行为遵循以下规则:
- **默认行为**:如果没有单独指定,T5、CLIP、VAE 会跟随 `cpu_offload` 的设置
- **独立配置**:可以为每个组件单独设置卸载策略,实现精细控制
**配置示例**
```json
{
"cpu_offload": true, // DIT 模型卸载开关
"t5_cpu_offload": false, // T5 编码器独立设置
"clip_cpu_offload": false, // CLIP 编码器独立设置
"vae_cpu_offload": false // VAE 编码器独立设置
}
```
在显存受限的设备上,建议采用渐进式卸载策略:
1. **第一步**:仅开启 `cpu_offload`,关闭 `t5_cpu_offload``clip_cpu_offload``vae_cpu_offload`
2. **第二步**:如果显存仍不足,逐步开启 T5、CLIP、VAE 的 CPU 卸载
3. **第三步**:如果显存仍然不够,考虑使用量化 + CPU 卸载或启用 `lazy_load`
**实践经验**
- **RTX 4090 24GB + 14B 模型**:通常只需开启 `cpu_offload`,其他组件卸载需要手动设为 `false`,同时使用 FP8 量化版本
- **更小显存的 GPU**:需要组合使用量化、CPU 卸载和延迟加载
- **量化方案**:建议参考[量化技术文档](../method_tutorials/quantization.md)选择合适的量化策略
**配置文件参考**
- **Wan2.1 系列模型**:参考 [offload 配置文件](https://github.com/ModelTC/lightx2v/tree/main/configs/offload)
- **Wan2.2 系列模型**:参考 [wan22 配置文件](https://github.com/ModelTC/lightx2v/tree/main/configs/wan22) 中以 `4090` 结尾的配置文件
## 🎯 使用建议 ## 🎯 使用建议
- 🔄 GPU-CPU分block/phase卸载:适合GPU显存不足(RTX 3090/4090 24G)但系统内存(>64/128G)充足 - 🔄 GPU-CPU分block/phase卸载:适合GPU显存不足(RTX 3090/4090 24G)但系统内存(>64/128G)充足
......
# 模型量化 # 模型量化技术
LightX2V支持对`Dit`中的线性层进行量化推理,支持`w8a8-int8``w8a8-fp8``w8a8-fp8block``w8a8-mxfp8``w4a4-nvfp4`的矩阵乘法。同时,LightX2V也支持对T5和CLIP编码器进行量化,以进一步提升推理性能。 ## 📖 概述
## 📊 量化方案概览 LightX2V 支持对 DIT、T5 和 CLIP 模型进行量化推理,通过降低模型精度来减少显存占用并提升推理速度。
### DIT 模型量化 ---
LightX2V支持多种DIT矩阵乘法量化方案,通过配置文件中的`mm_type`参数进行配置:
#### 支持的 mm_type 类型 ## 🔧 量化模式
| mm_type | 权重量化 | 激活量化 | 计算内核 | | 量化模式 | 权重量化 | 激活量化 | 计算内核 | 适用硬件 |
|---------|----------|----------|----------| |--------------|----------|----------|----------|----------|
| `Default` | 无量化 | 无量化 | PyTorch | | `fp8-vllm` | FP8 通道对称 | FP8 通道动态对称 | [VLLM](https://github.com/vllm-project/vllm) | H100/H200/H800, RTX 40系等 |
| `W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm` | FP8 通道对称 | FP8 通道动态对称 | VLLM | | `int8-vllm` | INT8 通道对称 | INT8 通道动态对称 | [VLLM](https://github.com/vllm-project/vllm) | A100/A800, RTX 30/40系等 |
| `W-int8-channel-sym-A-int8-channel-sym-dynamic-Vllm` | INT8 通道对称 | INT8 通道动态对称 | VLLM | | `fp8-sgl` | FP8 通道对称 | FP8 通道动态对称 | [SGL](https://github.com/sgl-project/sglang/tree/main/sgl-kernel) | H100/H200/H800, RTX 40系等 |
| `W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Q8F` | FP8 通道对称 | FP8 通道动态对称 | Q8F | | `int8-sgl` | INT8 通道对称 | INT8 通道动态对称 | [SGL](https://github.com/sgl-project/sglang/tree/main/sgl-kernel) | A100/A800, RTX 30/40系等 |
| `W-int8-channel-sym-A-int8-channel-sym-dynamic-Q8F` | INT8 通道对称 | INT8 通道动态对称 | Q8F | | `fp8-q8f` | FP8 通道对称 | FP8 通道动态对称 | [Q8-Kernels](https://github.com/KONAKONA666/q8_kernels) | RTX 40系, L40S等 |
| `W-fp8-block128-sym-A-fp8-channel-group128-sym-dynamic-Deepgemm` | FP8 块对称 | FP8 通道组对称 | DeepGEMM | | `int8-q8f` | INT8 通道对称 | INT8 通道动态对称 | [Q8-Kernels](https://github.com/KONAKONA666/q8_kernels) | RTX 40系, L40S等 |
| `W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Sgl` | FP8 通道对称 | FP8 通道动态对称 | SGL | | `int8-torchao` | INT8 通道对称 | INT8 通道动态对称 | [TorchAO](https://github.com/pytorch/ao) | A100/A800, RTX 30/40系等 |
| `int4-g128-marlin` | INT4 分组对称 | FP16 | [Marlin](https://github.com/IST-DASLab/marlin) | H200/H800/A100/A800, RTX 30/40系等 |
| `fp8-b128-deepgemm` | FP8 分块对称 | FP8 分组对称 | [DeepGemm](https://github.com/deepseek-ai/DeepGEMM) | H100/H200/H800, RTX 40系等|
#### 量化方案详细说明 ---
**FP8 量化方案** ## 🔧 量化模型获取
- **权重量化**:使用 `torch.float8_e4m3fn` 格式,按通道进行对称量化
- **激活量化**:动态量化,支持 per-token 和 per-channel 模式
- **优势**:在支持 FP8 的 GPU 上提供最佳性能,精度损失最小(通常<1%)
- **适用硬件**:H100、A100、RTX 40系列等支持FP8的GPU
**INT8 量化方案** ### 方式一:下载预量化模型
- **权重量化**:使用 `torch.int8` 格式,按通道进行对称量化
- **激活量化**:动态量化,支持 per-token 模式
- **优势**:兼容性最好,适用于大多数 GPU 硬件,内存占用减少约50%
- **适用硬件**:所有支持INT8的GPU
**块量化方案** 从 LightX2V 模型仓库下载预量化的模型:
- **权重量化**:按 128x128 块进行 FP8 量化
- **激活量化**:按通道组(组大小128)进行量化
- **优势**:特别适合大模型,内存效率更高,支持更大的batch size
### T5 编码器量化 **DIT 模型**
T5编码器支持以下量化方案 [Wan2.1-Distill-Models](https://huggingface.co/lightx2v/Wan2.1-Distill-Models) 下载预量化的 DIT 模型
#### 支持的 quant_scheme 类型 ```bash
# 下载 DIT FP8 量化模型
| quant_scheme | 量化精度 | 计算内核 | huggingface-cli download lightx2v/Wan2.1-Distill-Models \
|--------------|----------|----------| --local-dir ./models \
| `int8` | INT8 | VLLM | --include "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors"
| `fp8` | FP8 | VLLM | ```
| `int8-torchao` | INT8 | TorchAO |
| `int8-q8f` | INT8 | Q8F |
| `fp8-q8f` | FP8 | Q8F |
#### T5量化特性 **Encoder 模型**
- **线性层量化**:量化注意力层和FFN层中的线性变换 [Encoders-LightX2V](https://huggingface.co/lightx2v/Encoders-Lightx2v) 下载预量化的 T5 和 CLIP 模型:
- **动态量化**:激活在推理过程中动态量化,无需预计算
- **精度保持**:通过对称量化和缩放因子保持数值精度
### CLIP 编码器量化 ```bash
# 下载 T5 FP8 量化模型
huggingface-cli download lightx2v/Encoders-Lightx2v \
--local-dir ./models \
--include "models_t5_umt5-xxl-enc-fp8.pth"
CLIP编码器支持与T5相同的量化方案: # 下载 CLIP FP8 量化模型
huggingface-cli download lightx2v/Encoders-Lightx2v \
--local-dir ./models \
--include "models_clip_open-clip-xlm-roberta-large-vit-huge-14-fp8.pth"
```
#### CLIP量化特性 ### 方式二:自行量化模型
- **视觉编码器量化**:量化Vision Transformer中的线性层 详细量化工具使用方法请参考:[模型转换文档](https://github.com/ModelTC/lightx2v/tree/main/tools/convert/readme_zh.md)
- **文本编码器量化**:量化文本编码器中的线性层
- **多模态对齐**:保持视觉和文本特征之间的对齐精度
## 🚀 生产量化模型 ---
可通过[LightX2V 官方模型仓库](https://huggingface.co/lightx2v)下载量化模型,具体可参考[模型结构文档](../deploy_guides/model_structure.md) ## 🚀 量化模型使用
使用LightX2V的convert工具,将模型转换成量化模型,参考[文档](https://github.com/ModelTC/lightx2v/tree/main/tools/convert/readme_zh.md) ### DIT 模型量化
## 📥 加载量化模型进行推理 #### 支持的量化模式
### DIT 模型配置 DIT 量化模式(`dit_quant_scheme`)支持:`fp8-vllm``int8-vllm``fp8-sgl``int8-sgl``fp8-q8f``int8-q8f``int8-torchao``int4-g128-marlin``fp8-b128-deepgemm`
将转换后的量化权重的路径,写到[配置文件](https://github.com/ModelTC/lightx2v/blob/main/configs/quantization)中的`dit_quantized_ckpt`中。 #### 配置示例
```json ```json
{ {
"dit_quantized_ckpt": "/path/to/dit_quantized_ckpt", "dit_quantized": true,
"mm_config": { "dit_quant_scheme": "fp8-sgl",
"mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm" "dit_quantized_ckpt": "/path/to/dit_quantized_model" // 可选
}
} }
``` ```
### T5 编码器配置 > 💡 **提示**:当运行脚本的 `model_path` 中只有一个 DIT 模型时,`dit_quantized_ckpt` 可以不用单独指定。
```json ### T5 模型量化
{
"t5_quantized": true,
"t5_quant_scheme": "fp8",
"t5_quantized_ckpt": "/path/to/t5_quantized_ckpt"
}
```
### CLIP 编码器配置 #### 支持的量化模式
```json T5 量化模式(`t5_quant_scheme`)支持:`int8-vllm``fp8-sgl``int8-q8f``fp8-q8f``int8-torchao`
{
"clip_quantized": true,
"clip_quant_scheme": "fp8",
"clip_quantized_ckpt": "/path/to/clip_quantized_ckpt"
}
```
### 完整配置示例 #### 配置示例
```json ```json
{ {
"dit_quantized_ckpt": "/path/to/dit_quantized_ckpt",
"mm_config": {
"mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm"
},
"t5_quantized": true, "t5_quantized": true,
"t5_quant_scheme": "fp8", "t5_quant_scheme": "fp8-sgl",
"t5_quantized_ckpt": "/path/to/t5_quantized_ckpt", "t5_quantized_ckpt": "/path/to/t5_quantized_model" // 可选
"clip_quantized": true,
"clip_quant_scheme": "fp8",
"clip_quantized_ckpt": "/path/to/clip_quantized_ckpt"
} }
``` ```
通过指定`--config_json`到具体的config文件,即可以加载量化模型进行推理。 > 💡 **提示**:当运行脚本指定的 `model_path` 中存在 T5 量化模型(如 `models_t5_umt5-xxl-enc-fp8.pth` 或 `models_t5_umt5-xxl-enc-int8.pth`)时,`t5_quantized_ckpt` 可以不用单独指定。
[这里](https://github.com/ModelTC/lightx2v/tree/main/scripts/quantization)有一些运行脚本供使用。
## 💡 量化方案选择建议
### 硬件兼容性
- **H100/A100 GPU/RTX 4090/RTX 4060**:推荐使用 FP8 量化方案 ### CLIP 模型量化
- DIT: `W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm`
- T5/CLIP: `fp8`
- **A100/RTX 3090/RTX 3060**:推荐使用 INT8 量化方案
- DIT: `W-int8-channel-sym-A-int8-channel-sym-dynamic-Vllm`
- T5/CLIP: `int8`
- **其他 GPU**:根据硬件支持情况选择
### 性能优化 #### 支持的量化模式
- **内存受限**:选择 INT8 量化方案 CLIP 量化模式(`clip_quant_scheme`)支持:`int8-vllm``fp8-sgl``int8-q8f``fp8-q8f``int8-torchao`
- **速度优先**:选择 FP8 量化方案
- **精度要求高**:使用 FP8 或混合精度方案
### 混合量化策略 #### 配置示例
可以针对不同组件选择不同的量化方案:
```json ```json
{ {
"mm_config": {
"mm_type": "W-fp8-channel-sym-A-fp8-channel-sym-dynamic-Vllm"
},
"t5_quantized": true,
"t5_quant_scheme": "int8",
"clip_quantized": true, "clip_quantized": true,
"clip_quant_scheme": "fp8" "clip_quant_scheme": "fp8-sgl",
"clip_quantized_ckpt": "/path/to/clip_quantized_model" // 可选
} }
``` ```
## 🔧 高阶量化功能 > 💡 **提示**:当运行脚本指定的 `model_path` 中存在 CLIP 量化模型(如 `models_clip_open-clip-xlm-roberta-large-vit-huge-14-fp8.pth` 或 `models_clip_open-clip-xlm-roberta-large-vit-huge-14-int8.pth`)时,`clip_quantized_ckpt` 可以不用单独指定。
### 量化算法调优
具体可参考量化工具[LightCompress的文档](https://github.com/ModelTC/llmc/blob/main/docs/zh_cn/source/backend/lightx2v.md)
### 自定义量化内核 ### 性能优化策略
LightX2V支持自定义量化内核,可以通过以下方式扩展 如果显存不够,可以结合参数卸载来进一步减少显存占用,参考[参数卸载文档](../method_tutorials/offload.md)
1. **注册新的 mm_type**:在 `mm_weight.py` 中添加新的量化类 > - **Wan2.1 配置**:参考 [offload 配置文件](https://github.com/ModelTC/LightX2V/tree/main/configs/offload)
2. **实现量化函数**:定义权重和激活的量化方法 > - **Wan2.2 配置**:参考 [wan22 配置文件](https://github.com/ModelTC/LightX2V/tree/main/configs/wan22) 中以 `4090` 结尾的配置
3. **集成计算内核**:使用自定义的矩阵乘法实现
---
## 🚨 重要注意事项 ## 📚 相关资源
1. **硬件要求**:FP8 量化需要支持 FP8 的 GPU(如 H100、RTX40系) ### 配置文件示例
2. **精度影响**:量化会带来一定的精度损失,需要根据应用场景权衡 - [INT8 量化配置](https://github.com/ModelTC/LightX2V/blob/main/configs/quantization/wan_i2v.json)
- [Q8F 量化配置](https://github.com/ModelTC/LightX2V/blob/main/configs/quantization/wan_i2v_q8f.json)
- [TorchAO 量化配置](https://github.com/ModelTC/LightX2V/blob/main/configs/quantization/wan_i2v_torchao.json)
## 📚 相关资源 ### 运行脚本
- [量化推理脚本](https://github.com/ModelTC/LightX2V/tree/main/scripts/quantization)
### 工具文档
- [量化工具文档](https://github.com/ModelTC/lightx2v/tree/main/tools/convert/readme_zh.md) - [量化工具文档](https://github.com/ModelTC/lightx2v/tree/main/tools/convert/readme_zh.md)
- [运行脚本](https://github.com/ModelTC/lightx2v/tree/main/scripts/quantization)
- [配置文件示例](https://github.com/ModelTC/lightx2v/blob/main/configs/quantization)
- [LightCompress 量化文档](https://github.com/ModelTC/llmc/blob/main/docs/zh_cn/source/backend/lightx2v.md) - [LightCompress 量化文档](https://github.com/ModelTC/llmc/blob/main/docs/zh_cn/source/backend/lightx2v.md)
### 模型仓库
- [Wan2.1-LightX2V 量化模型](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
- [Wan2.2-LightX2V 量化模型](https://huggingface.co/lightx2v/Wan2.2-Distill-Models)
- [Encoders 量化模型](https://huggingface.co/lightx2v/Encoders-Lightx2v)
---
通过本文档,您应该能够:
✅ 理解 LightX2V 支持的量化方案
✅ 根据硬件选择合适的量化策略
✅ 正确配置量化参数
✅ 获取和使用量化模型
✅ 优化推理性能和显存使用
如有其他问题,欢迎在 [GitHub Issues](https://github.com/ModelTC/LightX2V/issues) 中提问。