- ✅ [q8-kernel](https://github.com/KONAKONA666/q8_kernels) (only supports Ada architecture GPUs)
Install according to the project homepage tutorials for each operator as needed.
### 🤖 Supported Models
### 📥 Model Download
#### 🎬 Image-to-Video Models
Refer to the [Model Structure Documentation](../getting_started/model_structure.md) to download complete models (including quantized and non-quantized versions) or download only quantized/non-quantized versions.
| Model Name | Resolution | Parameters | Features | Recommended Use |
|------------|------------|------------|----------|-----------------|
| ✅ [Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v](https://huggingface.co/lightx2v/Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v) | 720p | 14B | HD distilled version | High quality + fast inference |
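For example, a model repository can be fetched with the `huggingface_hub` Python package. This is a minimal sketch (not part of the project's tooling); it assumes `huggingface_hub` is installed, and the local directory is an example path:

```
# Minimal sketch: download a full model repository from Hugging Face.
# Assumes `pip install huggingface_hub`; adjust repo_id and local_dir as needed.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="lightx2v/Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v",
    local_dir="./models/Wan2.1-I2V-14B-720P-StepDistill-CfgDistill-Lightx2v",
)
```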
#### wan2.1 Model Directory Structure
#### 📝 Text-to-Video Models
| Model Name | Parameters | Features | Recommended Use |
|------------|------------|----------|-----------------|
- **Resource-constrained**: Prioritize distilled versions and lower resolutions
- **Real-time applications**: Strongly recommend using distilled models (`wan2.1_distill`)
**🎯 Model Category Description**:
- **`wan2.1`**: Standard model; provides the best video generation quality and suits scenarios with extremely high quality requirements
- **`wan2.1_distill`**: Distilled model; optimized through knowledge distillation, it significantly improves inference speed and greatly reduces computation time while maintaining good quality, making it suitable for most application scenarios
**📥 Model Download**:
Refer to the [Model Structure Documentation](./model_structure.md) to download complete models (including quantized and non-quantized versions) or download only quantized/non-quantized versions.
**Download Options**:
```
models/
├── wan2.1_i2v_720p_lightx2v_4step.safetensors # Original precision
├── t5/clip/xlm-roberta-large/google # text and image encoder
├── vae/lightvae/lighttae # vae
└── config.json # Model configuration file
```
- **Complete Model**: When downloading complete models with both quantized and non-quantized versions, you can freely choose the quantization precision for DIT/T5/CLIP in the advanced options of the `Gradio` Web frontend.
#### wan2.2 Model Directory Structure
- **Non-quantized Version Only**: When downloading only non-quantized versions, the quantization precision for `DIT/T5/CLIP` in the `Gradio` Web frontend can only be set to bf16/fp16. If you need to use quantized versions of models, please manually download quantized weights to the `i2v_model_path` or `t2v_model_path` directory where Gradio is started.
```
models/
├── wan2.2_i2v_A14b_high_noise_lightx2v_4step_1030.safetensors # high noise original precision
├── wan2.2_i2v_A14b_high_noise_fp8_e4m3_lightx2v_4step_1030.safetensors # high noise FP8 quantization
├── wan2.2_i2v_A14b_high_noise_int8_lightx2v_4step_1030.safetensors # high noise INT8 quantization
├── wan2.2_i2v_A14b_high_noise_int8_lightx2v_4step_1030_split # high noise INT8 quantization block storage directory
├── wan2.2_i2v_A14b_low_noise_lightx2v_4step.safetensors # low noise original precision
├── t5/clip/xlm-roberta-large/google # text and image encoder
├── vae/lightvae/lighttae # vae
└── config.json # Model configuration file
```
- **Quantized Version Only**: When downloading only quantized versions, the quantization precision for `DIT/T5/CLIP` in the `Gradio` Web frontend can only be set to fp8 or int8 (depending on the weights you downloaded). If you need to use non-quantized versions of models, please manually download non-quantized weights to the `i2v_model_path` or `t2v_model_path` directory where Gradio is started.
**📝 Download Instructions**:
- **Note**: Whether you download complete models or partial models, the `i2v_model_path` and `t2v_model_path` parameters should point to the first-level directory. For example: `Wan2.1-I2V-14B-480P-Lightx2v/`, not `Wan2.1-I2V-14B-480P-Lightx2v/int8`.
- Model weights can be downloaded from HuggingFace:
- Text and Image Encoders can be downloaded from [Encoders](https://huggingface.co/lightx2v/Encoderss)
- VAE can be downloaded from [Autoencoders](https://huggingface.co/lightx2v/Autoencoders)
- The `xxx_split` directories (e.g., `wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step_split`) store the weights as multiple per-block safetensors files, which suits devices with limited memory (e.g., 16GB or less); download them if that matches your setup.
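As an illustration of how a `_split` directory is organized (this is not the project's own loader, which can also lazy-load blocks one at a time), the per-block files can simply be merged into a single state dict. A minimal sketch assuming the `safetensors` and `torch` packages are installed, with an example directory name taken from the structure above:

```
# Illustrative sketch: merge block-split safetensors files into one state dict.
# Assumes `pip install safetensors torch`; the directory path is an example.
import os
from safetensors.torch import load_file

split_dir = "models/wan2.2_i2v_A14b_high_noise_int8_lightx2v_4step_1030_split"
weight_dict = {}
for name in sorted(os.listdir(split_dir)):
    if name.endswith(".safetensors"):
        # Each file holds the tensors of one or more blocks.
        weight_dict.update(load_file(os.path.join(split_dir, name)))
print(f"Loaded {len(weight_dict)} tensors")
```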
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `--model_path` | str | ✅ | - | Model root directory path (directory containing all model files) |
| `--server_port` | int | ❌ | 7862 | Server port |
| `--server_name` | str | ❌ | 0.0.0.0 | Server IP address |
| `--output_dir` | str | ❌ | ./outputs | Output video save directory |
**💡 Note**: Model type (wan2.1/wan2.2), task type (i2v/t2v), and specific model file selection are all configured in the Web interface.
## 🎯 Features
### Basic Settings
### Model Configuration
- **Model Type**: Supports the wan2.1 and wan2.2 model architectures
- **Task Type**: Supports Image-to-Video (i2v) and Text-to-Video (t2v) generation modes
- **Model Selection**: The frontend automatically identifies and filters available model files and supports automatic quantization precision detection
- **Encoder Configuration**: Supports selection of the T5 text encoder, CLIP image encoder, and VAE decoder
- **Operator Selection**: Supports multiple attention operators and quantized matrix multiplication operators; the system automatically sorts them by installation status
### Input Parameters
- **Prompt**: Describe the expected video content
- **Negative Prompt**: Specify elements you don't want to appear
- **Input Image**: Upload the input image required in i2v mode
- **Key Step Caching**: Writes cache only at key steps
## 🔧 Auto-Configuration Feature
The system automatically configures optimal inference options based on your hardware configuration (GPU VRAM and CPU memory) without manual adjustment. The best configuration is automatically applied on startup, including:
- **GPU Memory Optimization**: Automatically enables CPU offloading, VAE tiling inference, etc. based on VRAM size
- **CPU Memory Optimization**: Automatically enables lazy loading, module unloading, etc. based on system memory
- **Operator Selection**: Automatically selects the best installed operators (sorted by priority)
- **Quantization Configuration**: Automatically detects and applies quantization precision based on model file names (see the sketch below)
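As a rough illustration of the file-name based detection (a hypothetical sketch, not the project's actual code), the precision can be inferred from substrings such as `fp8` or `int8` in the weight file name, following the naming convention shown in the directory structures above:

```
# Hypothetical sketch: infer quantization precision from a weight file name.
# Based on the naming convention shown above (fp8_e4m3 / int8 markers);
# original-precision weights are assumed to run at bf16/fp16.
def detect_precision(filename: str) -> str:
    name = filename.lower()
    if "fp8" in name:
        return "fp8"
    if "int8" in name:
        return "int8"
    return "bf16"

print(detect_precision("wan2.2_i2v_A14b_high_noise_fp8_e4m3_lightx2v_4step_1030.safetensors"))  # fp8
```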
After enabling "Auto-configure Inference Options", the system will automatically optimize parameters based on your hardware configuration:
### GPU Memory Rules
- **80GB+**: Default configuration, no optimization needed
- **48GB**: Enable CPU offloading, offload ratio 50%
- **40GB**: Enable CPU offloading, offload ratio 80%
- **32GB**: Enable CPU offloading, offload ratio 100%
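For intuition, the rules above amount to a simple VRAM lookup. The following is a hypothetical sketch of that mapping, not the project's actual auto-configuration code:

```
# Hypothetical sketch of the VRAM-based offload rules listed above.
# Thresholds are in GB; offload_ratio is the fraction of weights offloaded to CPU.
def offload_config(vram_gb: float) -> dict:
    if vram_gb >= 80:
        return {"cpu_offload": False, "offload_ratio": 0.0}  # default config
    if vram_gb >= 48:
        return {"cpu_offload": True, "offload_ratio": 0.5}
    if vram_gb >= 40:
        return {"cpu_offload": True, "offload_ratio": 0.8}
    return {"cpu_offload": True, "offload_ratio": 1.0}       # 32 GB and below

print(offload_config(48))  # {'cpu_offload': True, 'offload_ratio': 0.5}
```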
**💡 Tip**: Generally, after enabling "Auto-configure Inference Options", the system will automatically optimize parameter settings based on your hardware configuration, and performance issues usually won't occur. If you encounter problems, please refer to the following solutions:
```
@@ -178,7 +178,7 @@ class WanModel(CompiledMethodsMixin):
         if os.path.exists(non_block_file):
             safetensors_files = [non_block_file]
         else:
-            raise ValueError(f"Non-block file not found in {safetensors_path}")
+            raise ValueError(f"Non-block file not found in {safetensors_path}. Please check the model path. Lazy load mode only supports loading chunked model weights.")
         weight_dict = {}
         for file_path in safetensors_files:
             ...
...
@@ -221,7 +221,7 @@ class WanModel(CompiledMethodsMixin):
         if os.path.exists(non_block_file):
             safetensors_files = [non_block_file]
         else:
-            raise ValueError(f"Non-block file not found in {safetensors_path}, Please check the lazy load model path")
+            raise ValueError(f"Non-block file not found in {safetensors_path}. Please check the model path. Lazy load mode only supports loading chunked model weights.")
```