logger.info(f"Found Hugging Face model files in: {path}")
returnpath
raiseFileNotFoundError(f"No Hugging Face model files (.safetensors) found.\nPlease download the model from: https://huggingface.co/lightx2v/ or specify the model path in the configuration file.")
logger.info(f"Found PyTorch model checkpoint: {path}")
returnpath
raiseFileNotFoundError(f"PyTorch model file '{filename}' not found.\nPlease download the model from https://huggingface.co/lightx2v/ or specify the model path in the configuration file.")
logger.info(f"Found Hugging Face model files in: {path}")
returnpath
raiseFileNotFoundError(f"No Hugging Face model files (.safetensors) found.\nPlease download the model from: https://huggingface.co/lightx2v/ or specify the model path in the configuration file.")
logger.info(f"Found PyTorch model checkpoint: {path}")
returnpath
raiseFileNotFoundError(f"PyTorch model file '{filename}' not found.\nPlease download the model from https://huggingface.co/lightx2v/ or specify the model path in the configuration file.")
LightX2V provides two model classes:

- **`wan2.1`**: Standard model; provides the best video generation quality and suits scenarios with extremely high quality requirements
- **`wan2.1_distill`**: Distilled model optimized through knowledge distillation; inference is significantly faster and computation time greatly reduced while quality remains good, making it suitable for most application scenarios
**📥 Model Download**:

Refer to the [Model Structure Documentation](./model_structure.md) to download complete models (including both quantized and non-quantized versions) or only the quantized or non-quantized versions.

**Download Options**:

- **Complete Model**: When you download a complete model with both quantized and non-quantized versions, you can freely choose the quantization precision for `DIT/T5/CLIP` in the advanced options of the `Gradio` Web frontend.
- **Non-quantized Version Only**: When you download only the non-quantized version, the quantization precision for `DIT/T5/CLIP` in the `Gradio` Web frontend can only be set to bf16/fp16. If you need quantized models, manually download the quantized weights into the `i2v_model_path` or `t2v_model_path` directory used to start Gradio.
- **Quantized Version Only**: When you download only the quantized version, the quantization precision for `DIT/T5/CLIP` in the `Gradio` Web frontend can only be set to fp8 or int8 (depending on the weights you downloaded). If you need non-quantized models, manually download the non-quantized weights into the `i2v_model_path` or `t2v_model_path` directory used to start Gradio.
- **Note**: Whether you download a complete or partial model, the `i2v_model_path` and `t2v_model_path` parameters should point to the first-level directory, e.g. `Wan2.1-I2V-14B-480P-Lightx2v/`, not `Wan2.1-I2V-14B-480P-Lightx2v/int8`. An example follows below.
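For example, in `lightx2v_config.txt` (the absolute path below is illustrative):

```
# Point at the first-level model directory, not a precision subdirectory
i2v_model_path=/data/models/Wan2.1-I2V-14B-480P-Lightx2v
# Wrong: /data/models/Wan2.1-I2V-14B-480P-Lightx2v/int8
```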
After extraction, ensure the directory structure is as follows:

```
├── Wan2.1-I2V-14B-480P-Lightx2v/                      # Image-to-video model (480P)
...
└── Wan2.1-T2V-14B-StepDistill-CfgDistill-Lightx2v/    # Text-to-video model (4-step distillation)
```
**📋 Configuration Parameters**

Edit the `lightx2v_config.txt` file and modify the following parameters as needed:
...
model_size=14b

# Model class (wan2.1: standard model, wan2.1_distill: distilled model)
...
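The comment suggests the model class is selected on the following line; a plausible value, assuming the key is named `model_cls` (not confirmed by this excerpt), would be:

```
model_cls=wan2.1_distill
```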
LightX2V supports quantized inference for the linear layers in `Dit`, with `w8a8-int8`, `w8a8-fp8`, `w8a8-fp8block`, `w8a8-mxfp8`, and `w4a4-nvfp4` matrix multiplication. It also supports quantizing the T5 and CLIP encoders to further improve inference performance.
## 📊 Quantization Scheme Overview
### DIT Model Quantization
LightX2V supports multiple DIT matrix multiplication quantization schemes, configured through the `mm_type` parameter:
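As an illustration, a quantization config might select a scheme like this (a sketch only: the `mm_config` wrapper and the exact registered `mm_type` strings in `configs/quantization` may differ):

```json
{
  "mm_config": {
    "mm_type": "w8a8-int8"
  }
}
```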
### T5 and CLIP Encoder Quantization

The CLIP encoder supports the same quantization schemes as T5.
## 🚀 Producing Quantized Models
Download quantized models from the [LightX2V Official Model Repository](https://huggingface.co/lightx2v); refer to the [Model Structure Documentation](../deploy_guides/model_structure.md) for details.
Alternatively, use LightX2V's convert tool to produce quantized models yourself; refer to the [documentation](https://github.com/ModelTC/lightx2v/tree/main/tools/convert/readme.md).
## 📥 Loading Quantized Models for Inference
### DIT Model Configuration
Write the path of the converted quantized weights to the `dit_quantized_ckpt` field in the [configuration file](https://github.com/ModelTC/lightx2v/blob/main/configs/quantization).
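For instance (a minimal fragment; the path is a placeholder and the real configs contain additional keys):

```json
{
  "dit_quantized_ckpt": "/path/to/quantized/dit_weights"
}
```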
By passing `--config_json` with the path to the desired config file, you can load the quantized model for inference.
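A sketch of the launch command (only the `--config_json` flag comes from this document; the script and config file names are placeholders):

```bash
# Illustrative invocation; substitute your actual entry script and config
python infer.py --config_json configs/quantization/your_quant_config.json
```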
For details, please refer to the documentation of the quantization tool [LLMC](https://github.com/ModelTC/llmc/blob/main/docs/en/source/backend/lightx2v.md).
### Custom Quantization Kernels
LightX2V supports custom quantization kernels that can be extended in the following ways:
1. **Register New mm_type**: Add new quantization classes in `mm_weight.py` (see the sketch after this list)
2. **Implement Quantization Functions**: Define quantization methods for weights and activations
3. **Integrate Compute Kernels**: Use custom matrix multiplication implementations
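A hedged sketch of steps 1–2 (the registry, decorator, and class interface are assumptions for illustration, not LightX2V's actual `mm_weight.py` API):

```python
import torch

MM_WEIGHT_REGISTER = {}  # stand-in for the mm_type registry in mm_weight.py


def register_mm_weight(name):
    """Map a new mm_type string to its implementation class."""
    def wrap(cls):
        MM_WEIGHT_REGISTER[name] = cls
        return cls
    return wrap


@register_mm_weight("w8a8-int8-custom")
class CustomInt8MMWeight:
    """Per-channel symmetric int8 for weights, per-token symmetric int8 for activations."""

    def quant_weight(self, w: torch.Tensor):
        scale = (w.abs().amax(dim=1, keepdim=True) / 127.0).clamp_min(1e-8)  # (out, 1)
        return torch.clamp((w / scale).round(), -128, 127).to(torch.int8), scale

    def quant_act(self, x: torch.Tensor):
        scale = (x.abs().amax(dim=-1, keepdim=True) / 127.0).clamp_min(1e-8)  # (..., 1)
        return torch.clamp((x / scale).round(), -128, 127).to(torch.int8), scale

    def apply(self, x: torch.Tensor, w_q: torch.Tensor, w_scale: torch.Tensor):
        x_q, x_scale = self.quant_act(x)
        # Step 3 would replace this float emulation with a custom int8 GEMM kernel
        return (x_q.float() @ w_q.float().T) * (x_scale * w_scale.T)
```

At inference time, a dispatcher along these lines would look up `MM_WEIGHT_REGISTER[mm_type]` and route the quantized linear layers through `apply`.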
raiseFileNotFoundError(f"PyTorch model file '{filename}' not found.\nPlease download the model from https://huggingface.co/lightx2v/ or specify the model path in the configuration file.")
raiseFileNotFoundError(f"PyTorch model file '{filename}' not found.\nPlease download the model from https://huggingface.co/lightx2v/ or specify the model path in the configuration file.")