logger.info(f"Found Hugging Face model files in: {path}")
returnpath
raiseFileNotFoundError(f"No Hugging Face model files (.safetensors) found.\nPlease download the model from: https://huggingface.co/lightx2v/ or specify the model path in the configuration file.")
logger.info(f"Found PyTorch model checkpoint: {path}")
returnpath
raiseFileNotFoundError(f"PyTorch model file '{filename}' not found.\nPlease download the model from https://huggingface.co/lightx2v/ or specify the model path in the configuration file.")
logger.info(f"Found Hugging Face model files in: {path}")
returnpath
raiseFileNotFoundError(f"No Hugging Face model files (.safetensors) found.\nPlease download the model from: https://huggingface.co/lightx2v/ or specify the model path in the configuration file.")
logger.info(f"Found PyTorch model checkpoint: {path}")
returnpath
raiseFileNotFoundError(f"PyTorch model file '{filename}' not found.\nPlease download the model from https://huggingface.co/lightx2v/ or specify the model path in the configuration file.")
- **`wan2.1`**: Standard model; provides the best video generation quality and is suitable for scenarios with extremely high quality requirements
- **`wan2.1_distill`**: Distilled model; optimized through knowledge distillation to significantly speed up inference while maintaining good quality, suitable for most application scenarios
**📥 Model Download**:
Refer to the [Model Structure Documentation](./model_structure.md) to download either the complete model (including both quantized and non-quantized versions) or only the quantized or non-quantized version.
**Download Options**:
- **Complete Model**: When downloading the complete model with both quantized and non-quantized versions, you can freely choose the quantization precision for DIT/T5/CLIP in the advanced options of the `Gradio` Web frontend.
- **Non-quantized Version Only**: When downloading only the non-quantized version, the quantization precision for `DIT/T5/CLIP` in the `Gradio` Web frontend can only be set to bf16/fp16. If you need to use quantized models, please manually download the quantized weights into the `i2v_model_path` or `t2v_model_path` directory used when starting Gradio.
- **Quantized Version Only**: When downloading only the quantized version, the quantization precision for `DIT/T5/CLIP` in the `Gradio` Web frontend can only be set to fp8 or int8 (depending on the weights you downloaded). If you need to use non-quantized models, please manually download the non-quantized weights into the `i2v_model_path` or `t2v_model_path` directory used when starting Gradio.
- **Note**: Whether you download the complete model or a partial model, the values of the `i2v_model_path` and `t2v_model_path` parameters should be the first-level directory paths. For example: `Wan2.1-I2V-14B-480P-Lightx2v/`, not `Wan2.1-I2V-14B-480P-Lightx2v/int8`.
├── Wan2.1-I2V-14B-480P-Lightx2v/ # Image-to-video model (480P)
...
...
└── Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v/ # Text-to-video model (4-step distillation)
```
**📥 Model Download**:
Refer to the [Model Structure Documentation](./model_structure.md) to download either the complete model (including both quantized and non-quantized versions) or only the quantized or non-quantized version.
**Download Options**:
- **Complete Model**: When downloading the complete model with both quantized and non-quantized versions, you can freely choose the quantization precision for DIT/T5/CLIP in the advanced options of the `Gradio` Web frontend.
- **Non-quantized Version Only**: When downloading only the non-quantized version, the quantization precision for `DIT/T5/CLIP` in the `Gradio` Web frontend can only be set to bf16/fp16. If you need to use quantized models, please manually download the quantized weights into the `i2v_model_path` or `t2v_model_path` directory used when starting Gradio.
- **Quantized Version Only**: When downloading only the quantized version, the quantization precision for `DIT/T5/CLIP` in the `Gradio` Web frontend can only be set to fp8 or int8 (depending on the weights you downloaded). If you need to use non-quantized models, please manually download the non-quantized weights into the `i2v_model_path` or `t2v_model_path` directory used when starting Gradio.
- **Note**: Whether you download the complete model or a partial model, the values of the `i2v_model_path` and `t2v_model_path` parameters should be the first-level directory paths. For example: `Wan2.1-I2V-14B-480P-Lightx2v/`, not `Wan2.1-I2V-14B-480P-Lightx2v/int8`.
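For example, downloading only the INT8 weights of the image-to-video model and pointing the configuration at the first-level directory might look like this (the repository id and paths are illustrative; check the [official model list](https://huggingface.co/lightx2v) for exact names):

```bash
# Selective download of only the int8 weights (repository id illustrative)
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-Lightx2v \
    --include "int8/*" \
    --local-dir ./Wan2.1-I2V-14B-480P-Lightx2v

# In lightx2v_config.txt, point to the first-level directory, not the int8 subdirectory:
# i2v_model_path=./Wan2.1-I2V-14B-480P-Lightx2v
```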
**📋 Configuration Parameters**
Edit the `lightx2v_config.txt` file and modify the following parameters as needed:
...
...
model_size=14b
# Model class (wan2.1: standard model, wan2.1_distill: distilled model)
This document provides a comprehensive introduction to the model directory structure of the LightX2V project, helping users organize model files efficiently for a convenient experience. With a well-organized directory layout, users can enjoy "one-click startup" without manually configuring complex path parameters, while flexible manual path configuration remains available for users who need it.
## 🗂️ Model Directory Structure
### LightX2V Official Model List
View all available models: [LightX2V Official Model Repository](https://huggingface.co/lightx2v)
### Standard Directory Structure
Using `Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V` as an example, the standard file structure is as follows:
**It is strongly recommended to store model files on an SSD**, as this significantly improves model loading speed and inference performance.
**Recommended storage paths**:
```bash
...
...
/opt/models/ # System optimization directory
```
### Quantization Version Description
Each model includes multiple quantized versions to suit different hardware configurations:
- **FP8 Version**: Suitable for GPUs that support FP8 (such as H100, A100, RTX 40 series), providing optimal performance
- **INT8 Version**: Suitable for most GPUs, balancing performance and compatibility, reducing memory usage by approximately 50%
- **Original Precision Version**: Suitable for applications with extremely high precision requirements, providing the highest quality output
The corresponding directory layout is:

```
Model Directory/
├── fp8/        # FP8 quantized version (H100/A100 high-end GPUs)
├── int8/       # INT8 quantized version (general GPUs)
└── original/   # Original precision version (DIT)
```

## 🚀 Usage Methods

### Environment Setup

#### Installing Hugging Face CLI

Before starting to download models, please ensure that the Hugging Face CLI is properly installed:

```bash
# Install huggingface_hub together with its CLI
pip install -U "huggingface_hub[cli]"

# Log in to Hugging Face (optional, but strongly recommended)
huggingface-cli login
```

### Method 1: Complete Model Download (Recommended)
### Method 1: Complete Model Download (Recommended)
**Advantage**: After downloading the complete model, the system will automatically identify all component paths without manual configuration, providing a more convenient user experience
**💡 Using Full Precision Models**: To use full precision models, simply copy the official weight files to the `original/` directory.
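For reference, a complete download with the Hugging Face CLI might look like this (the repository id is illustrative; see the official model list for the exact name):

```bash
huggingface-cli download lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V \
    --local-dir ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V
```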
#### 2. Start Inference
##### Bash Script Startup
###### Scenario 1: Using Full Precision Model
Modify the configuration in the [run script](https://github.com/ModelTC/LightX2V/tree/main/scripts/wan/run_wan_i2v_distill_4step_cfg.sh):
- `model_path`: Set to the downloaded model path `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V`
- `lightx2v_path`: Set to the LightX2V project root directory path
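The relevant variables inside the script would then look roughly like this (paths illustrative):

```bash
# Illustrative settings inside scripts/wan/run_wan_i2v_distill_4step_cfg.sh
lightx2v_path=/path/to/LightX2V
model_path=/path/to/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V
```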
###### Scenario 2: Using Quantized Model
When using the complete model, if you need to enable quantization, please add the following configuration to the [configuration file](https://github.com/ModelTC/LightX2V/tree/main/configs/distill/wan_i2v_distill_4step_cfg.json):
> **Important Note**: Quantization configurations for each model can be flexibly combined. Quantization paths do not need to be manually specified, as the system will automatically locate the quantized versions of each model.
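As a rough illustration, such a quantization block might look like the following; the key names are taken from the quantization-related sections of this documentation, while the values and the exact `mm_type` string are assumptions to be checked against the quantization documentation:

```json
{
  "mm_type": "w8a8-fp8",
  "t5_quantized": true,
  "t5_quant_scheme": "fp8",
  "clip_quantized": true,
  "clip_quant_scheme": "fp8"
}
```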
For a detailed explanation of quantization technology, please refer to the [Quantization Documentation](../method_tutorials/quantization.md).
Use the provided bash script for quick startup:
```bash
cd LightX2V/scripts
bash wan/run_wan_t2v_distill_4step_cfg.sh
```
##### Gradio Interface Startup
When performing inference through the Gradio interface, simply specify the model root directory path at startup, and lightweight VAE can be flexibly selected through frontend interface buttons:
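A startup command of roughly the following shape could be used (the demo script name and flag are assumptions for illustration, not the exact CLI):

```bash
# Illustrative only: the actual Gradio demo script name and arguments may differ
python gradio_demo.py --model_path ./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V
```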
> **Important Note**: When starting inference scripts or Gradio, the `model_path` parameter still needs to be the complete top-level path, even if only part of the model was downloaded with `--include`. For example: `model_path=./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V`, not `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/int8`.
#### 2. Start Inference
**The following takes a model with only the FP8 version downloaded as an example:**
##### Bash Script Startup
###### Scenario 1: Using FP8 DIT + FP8 T5 + FP8 CLIP
Set the `model_path` in the [run script](https://github.com/ModelTC/LightX2V/tree/main/scripts/wan/run_wan_i2v_distill_4step_cfg.sh) to your downloaded model path `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/`, and set `lightx2v_path` to your LightX2V project path.
You only need to modify the quantized-model configuration in the configuration file as follows:
> **Important Note**: In this case, each model can only be specified as a quantized version. Quantization paths do not need to be specified manually; the system will automatically locate the quantized version of each model.
###### Scenario 2: Using FP8 DIT + Original Precision T5 + Original Precision CLIP
Set the `model_path` in the [run script](https://github.com/ModelTC/LightX2V/tree/main/scripts/wan/run_wan_i2v_distill_4step_cfg.sh) to your downloaded model path `./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V`, and set `lightx2v_path` to your LightX2V project path.
Since only quantized weights were downloaded, you need to manually download the original precision versions of T5 and CLIP, and configure them in the configuration file's `t5_original_ckpt` and `clip_original_ckpt` as follows:
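A sketch of what that configuration might look like (paths are illustrative, and any keys beyond `t5_original_ckpt` and `clip_original_ckpt` are assumptions):

```json
{
  "t5_quantized": false,
  "t5_original_ckpt": "/path/to/original/t5/checkpoint.pth",
  "clip_quantized": false,
  "clip_original_ckpt": "/path/to/original/clip/checkpoint.pth"
}
```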
> **Important Note**: Since the model root directory contains only the quantized version of each model, the quantization precision for the DIT/T5/CLIP models can only be set to fp8 in the frontend. If you need to use non-quantized versions of T5/CLIP, please manually download the non-quantized weights and place them in the `model_path` directory used by the Gradio demo (`./Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/`); the T5/CLIP quantization precision can then be set to bf16/fp16.
### Method 3: Manual Configuration
Users can flexibly configure quantization options and paths for each component according to actual needs, achieving mixed use of quantized and non-quantized components. Please ensure that the required model weights have been correctly downloaded and placed in the specified paths.
When starting from a configuration file, such as this [configuration file](https://github.com/ModelTC/LightX2V/tree/main/configs/offload/disk/wan_i2v_phase_lazy_load_480p.json), the following path settings can be omitted (see the sketch after this list for the fully explicit form):
- `dit_quantized_ckpt`: No need to specify, the code will automatically search in the model directory
- `tiny_vae_path`: No need to specify, the code will automatically search in the model directory
- `clip_quantized_ckpt`: No need to specify, the code will automatically search in the model directory
- `t5_quantized_ckpt`: No need to specify, the code will automatically search in the model directory
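For comparison, a fully manual configuration spells these paths out explicitly; a sketch with illustrative paths:

```json
{
  "dit_quantized_ckpt": "/data/models/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/int8",
  "t5_quantized_ckpt": "/data/models/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/int8",
  "clip_quantized_ckpt": "/data/models/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/int8",
  "tiny_vae_path": "/data/models/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-LightX2V/tiny_vae.pth"
}
```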
#### CLIP Model Configuration
**💡 Simplified Configuration**: After organizing model files according to the recommended directory structure, most path configurations can be omitted as the code will handle them automatically.
**💡 Download Recommendations**: It is recommended to use SSD storage and ensure a stable network connection. For large files, you can use `git lfs` or download tools such as `aria2c`.
> **Configuration Notes**:
> - Quantized weights and original precision weights can be mixed flexibly; the system will automatically select the corresponding model based on the configuration
> - The choice of quantization mode depends on your hardware; FP8 is recommended on high-end GPUs such as H100/A100
> - A lightweight VAE can significantly improve inference speed but may slightly affect generation quality
## 💡 Best Practices
### Recommended Configurations
**Complete Model Users**:
- Download the complete model to enjoy the convenience of automatic path discovery
- Only quantization schemes and component switches need to be configured
- Flexibly mix quantized and original precision components
- Use the provided bash scripts for quick startup
**Advanced Users**:
- Fully manual path configuration for maximum flexibility
- Model files can be stored in scattered locations
- Bash script parameters can be customized
### Performance Optimization Recommendations
- **Use SSD Storage**: Significantly improves model loading speed and inference performance
- **Unified Directory Structure**: Makes it easy to manage and switch between different model versions
- **Reserve Sufficient Space**: Ensure adequate storage space (at least 200GB recommended)
- **Regular Cleanup**: Delete unnecessary model versions to save space
- **Network Optimization**: Use stable network connections and download tools
- **Choose an Appropriate Quantization Scheme**:
  - FP8: Suitable for high-end GPUs like H100/A100, high precision
  - INT8: Suitable for general GPUs, small memory footprint
- **Enable Lightweight VAE**: `use_tiny_vae: true` can improve inference speed
- **Configure CPU Offload Sensibly**: `t5_cpu_offload: true` can save GPU memory (see the sketch below)
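The last two switches are plain boolean fields in the inference configuration; a minimal sketch (assuming these keys sit at the top level of the config JSON):

```json
{
  "use_tiny_vae": true,
  "t5_cpu_offload": true
}
```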
### Download Optimization Recommendations
- **Use Hugging Face CLI**: More stable than git clone and supports resuming downloads
- **Selective Download**: Only download the quantized versions you need, saving time and storage space
- **Network Optimization**: Use a stable network connection, and a proxy when necessary
- **Resume Download**: Use the `--resume-download` parameter to continue after an interruption
## 🚨 Frequently Asked Questions
### Q: How to switch between different model versions?
A: Modify the model path parameter in the startup command; running multiple model instances simultaneously is supported

### Q: Model files are too large and download speed is slow, what should I do?
A: Use selective download to fetch only the quantized versions you need, or use a mirror source

### Q: Model loading is very slow?
A: Ensure models are stored on an SSD, enable lazy loading, and use quantized models

### Q: Model path does not exist when starting up?
A: Check that the model has been downloaded correctly, verify the path configuration, and confirm that the automatic discovery mechanism is working properly

### Q: How to switch between different quantization schemes?
A: Modify parameters such as `mm_type`, `t5_quant_scheme`, and `clip_quant_scheme` in the configuration file; see the [Quantization Documentation](../method_tutorials/quantization.md)

### Q: How to mix quantized and original precision components?
A: Control this with the `t5_quantized` and `clip_quantized` parameters, and manually specify the original precision paths
### Q: How to set paths in configuration files?
A: After organizing files according to the recommended directory structure, most path settings can be omitted and are discovered automatically; for manual configuration, refer to the "Manual Configuration" section
### Q: How to verify if automatic path discovery is working properly?
A: Check the startup logs, the code will output the actual model paths being used
### Q: What should I do if bash script startup fails?
A: Check that the path configuration in the script is correct and ensure that the `lightx2v_path` and `model_path` variables are set correctly
## 📚 Related Links
- [LightX2V Official Model Repository](https://huggingface.co/lightx2v)
Through well-organized model files and flexible configuration options, LightX2V supports multiple usage scenarios: downloading the complete model provides maximum convenience, selective download saves storage space, and manual configuration offers maximum flexibility. The automatic path discovery mechanism means users do not need to remember complex path configurations, while the system remains extensible. Organize model files according to the structure recommended in this document and take full advantage of SSD storage.
LightX2V supports quantization inference for linear layers in `Dit`, supporting `w8a8-int8`, `w8a8-fp8`, `w8a8-fp8block`, `w8a8-mxfp8`, and `w4a4-nvfp4` matrix multiplication. Additionally, LightX2V also supports quantization of T5 and CLIP encoders to further improve inference performance.
## 📊 Quantization Scheme Overview
### DIT Model Quantization
LightX2V supports multiple DIT matrix multiplication quantization schemes, configured through the `mm_type` parameter:
The CLIP encoder supports the same quantization schemes as T5.
## 🚀 Producing Quantized Models
Download quantized models from the [LightX2V Official Model Repository](https://huggingface.co/lightx2v), refer to the [Model Structure Documentation](../deploy_guides/model_structure.md) for details.
Use LightX2V's convert tool to convert models into quantized models. Refer to the [documentation](https://github.com/ModelTC/lightx2v/tree/main/tools/convert/readme.md).
## 📥 Loading Quantized Models for Inference
### DIT Model Configuration
Write the path of the converted quantized weights to the `dit_quantized_ckpt` field in the [configuration file](https://github.com/ModelTC/lightx2v/blob/main/configs/quantization).
By passing `--config_json` pointing to the corresponding config file, you can load the quantized model for inference.
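A minimal sketch of how the pieces fit together (the path is illustrative; `dit_quantized_ckpt` and `--config_json` are the documented key and flag):

```json
{
  "dit_quantized_ckpt": "/path/to/converted/quantized/dit"
}
```

The inference script is then launched with `--config_json` pointing at this file.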
For details, please refer to the documentation of the quantization tool [LLMC](https://github.com/ModelTC/llmc/blob/main/docs/en/source/backend/lightx2v.md)
### Custom Quantization Kernels
LightX2V supports custom quantization kernels that can be extended in the following ways:
1. **Register New mm_type**: Add new quantization classes in `mm_weight.py`
2. **Implement Quantization Functions**: Define quantization methods for weights and activations
3. **Integrate Compute Kernels**: Use custom matrix multiplication implementations
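Purely as an illustration of these three steps, a custom int8 scheme could be sketched as follows; the registry, class, and method names are hypothetical and do not reflect the actual `mm_weight.py` API:

```python
import torch

# Hypothetical registry; in practice the new class is wired up inside mm_weight.py.
MM_WEIGHT_REGISTRY = {}

def register_mm_weight(name):
    def wrap(cls):
        MM_WEIGHT_REGISTRY[name] = cls
        return cls
    return wrap

@register_mm_weight("w8a8-int8-custom")  # hypothetical mm_type string
class CustomInt8MMWeight:
    """Quantizes weights offline, quantizes activations on the fly, then calls a custom int8 matmul."""

    def __init__(self, weight: torch.Tensor):
        # Step 2: per-output-channel symmetric int8 weight quantization.
        self.scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        self.qweight = torch.clamp((weight / self.scale).round(), -128, 127).to(torch.int8)

    def apply(self, x: torch.Tensor) -> torch.Tensor:
        # Step 2: per-token dynamic int8 activation quantization.
        x_scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
        qx = torch.clamp((x / x_scale).round(), -128, 127).to(torch.int8)
        # Step 3: stand-in for a custom int8 GEMM kernel (e.g. a Triton/CUDA kernel).
        out = qx.to(torch.float32) @ self.qweight.to(torch.float32).t()
        return out * x_scale * self.scale.t()
```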
raiseFileNotFoundError(f"PyTorch model file '{filename}' not found.\nPlease download the model from https://huggingface.co/lightx2v/ or specify the model path in the configuration file.")